Reporting Inter-Rater Agreement and Inter-Rater Reliability

Lucia Kočišová

doi:10.5817/TF2022-15-14647

No. 15 (2022)

Abstract

In psychology, as in many other fields, a second rater is often used to confirm the validity and reliability of our conclusions. Two concepts are involved: inter-rater agreement, which is the extent to which raters assign the same ratings and, when it is achieved, makes the raters interchangeable (Tinsley, Weiss, 1975), and inter-rater reliability, understood as the consistency of ratings (LeBreton, Senter, 2008). Beyond their definitions, the two concepts differ in the research questions they answer and in the statistical analysis they call for.
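The difference between agreement and consistency can be illustrated with a minimal sketch. The ratings below are hypothetical and are not data from the paper; percent agreement and the Pearson correlation are used here only as simple stand-ins for an agreement index and a consistency index, not as the indices the paper recommends.

```python
# Hypothetical ratings: rater B always scores exactly one point higher than rater A.
rater_a = [1, 2, 3, 4, 2, 3]
rater_b = [2, 3, 4, 5, 3, 4]


def percent_agreement(x, y):
    """Share of items on which both raters gave the identical score."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def pearson_r(x, y):
    """Pearson correlation, used here as a simple consistency index."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


agreement = percent_agreement(rater_a, rater_b)
consistency = pearson_r(rater_a, rater_b)
print(f"percent agreement: {agreement:.2f}")
print(f"Pearson r:         {consistency:.2f}")
```

In this constructed case the raters never assign the same score, so agreement is zero, yet they rank the items identically, so the correlation is 1 up to floating-point rounding. Such raters are consistent but not interchangeable.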

The aim of this paper is to answer questions arising from the practical need to report inter-rater agreement and inter-rater reliability: How many raters should be used? How should a suitable index of agreement and reliability be chosen? Are there accepted levels of inter-rater agreement and reliability? Which factors affect the level of inter-rater agreement and reliability?

Keywords:
inter-rater agreement; inter-rater reliability; index selection

References

Bogartz, R. S. (2005). Interrater Agreement and Combining Ratings. http://people.umass.edu/~bogartz/Interrater%20Agreement.pdf

Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement, 20(1), 37-46.

de Vet, H. C., Terwee, C. B., Knol, D. L., Bouter, L. M. (2006). When to use agreement versus reliability measures. Journal of Clinical Epidemiology, 59(10), 1033-1039.

Eye, A. von, Mun, E. Y. (2005). Analyzing Rater Agreement: Manifest Variable Methods. London: Lawrence Erlbaum Associates. ISBN 0-8058-4967-X

Feinstein, A., Cicchetti, D. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6), 543-549.

Fradenburg, L. A., Harrison, R. J., Baer, D. M. (1995). The effect of some environmental factors on interobserver agreement. Research in Developmental Disabilities, 16(6), 425–437.

Gálová, L. (2010). Koeficient kappa - aplikačné možnosti, výhody a nevýhody. In: 2. Česko-slovenská konference doktorandů oborů pomáhajících profesí: sborník z vědecké konference konané v Ostravě 3. února 2010 (pp. 98-105). Ostrava: Ostravská univerzita. ISBN 978-80-7368-782-3

Gamer, M., Lemon, J., Fellows, I., Singh, P. (2012). irr: Various coefficients of interrater reliability and agreement [computer software]. https://CRAN.R-project.org/package=irr.

Gerke, O., Möller, S., Debrabant, B., Halekoh, U. (2018). Experience Applying the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) Indicated Five Questions Should Be Addressed in the Planning Phase from a Statistical Point of View. Diagnostics, 8(4), 69.

Graham, M., Milanowski, A., Miller, J. (2012). Measuring and Promoting Inter-Rater Agreement of Teacher and Principal Performance Ratings. Center for Educator Compensation and Reform. http://es.eric.ed.gov/fulltext/ED532068.pdf

Gwet, K. (2001). Handbook of inter-rater reliability. How to estimate the level of agreement between two or multiple raters. Gaithersburg, MD: STATAXIS Publishing Company

Haley, D.T. (2007). Using a New inter-rater Reliability Statistic. Technical Report N 2007/16. ISSN 1744-1986

Hintze, J. M., Matthews, W. J. (2004). The generalizability of systematic direct observations across time and setting: A preliminary investigation of the psychometrics of behavioral observation. School Psychology Review, 33(2), 258-270.

Keener, A. (2020). Comparison of Cohen's Kappa and Gwet's AC1 with a mass shooting classification index: A study of rater uncertainty. Dissertation. Oklahoma State University.

Kottner, J., Audige, L., Brorson, S., Donner, A., Gajewski, B. J., Hrobjartsson, A., Streiner, D. L. (2011). Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. Journal of Clinical Epidemiology, 64, 96-106.

Kottner, J., Streiner, D. L. (2011). The difference between reliability and agreement. Journal of Clinical Epidemiology, 64, 701-702.

LeBreton, J. M., Senter, J. L. (2008). Answers to 20 Questions About Interrater Reliability and Interrater Agreement. Organizational Research Methods, 11, 815.

Liao, S. C., Hunt, E. A., Chen, W. (2010). Comparison between inter-rater reliability and inter-rater agreement in performance assessment. Annals Academy of Medicine, 39(8), 613-618.

McDonald, N., Schoenebeck, S., Forte, A. (2019). Reliability and Inter-rater Reliability in Qualitative Research: Norms and Guidelines for CSCW and HCI Practice. Proceedings of the ACM on Human-Computer Interaction, November 2019, Article No. 72.

O'Neill, T. A. (2017). An Overview of Interrater Agreement on Likert Scales for Researchers and Practitioners. Frontiers in Psychology, 8, 777.

Popping, R. (1988). On agreement indices for nominal data. In W. E. Saris & I. N. Gallhofer (Eds.), Sociometric research (pp. 90-105). London, UK: Palgrave Macmillan.

Slaug, B., Schilling, O., Helle, T., Iwarsson, S., Carlsson, G., Brandt, Å. (2012). Unfolding the phenomenon of interrater agreement: a multicomponent approach for in-depth examination was proposed. Journal of Clinical Epidemiology, 65(9), 1016-1025.

Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4). ISSN 1531-7714.

Stolarova, M., Wolf, C., Rinker, T., Bielmann, A. (2014). How to assess and compare inter-rater reliability, agreement and correlation of ratings: An exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs. Frontiers in Psychology, 5, 1-13.

ten Hove, D., Jorgensen, T. D., & van der Ark, L. A. (2018). On the usefulness of interrater reliability coefficients. In M. Wiberg, S. Culpepper, R. Janssen, J. González, & D. Molenaar (Eds.), Quantitative Psychology: The 82nd Annual Meeting of the Psychometric Society, Zurich, Switzerland, 2017 (pp. 67-75). (Springer Proceedings in Mathematics & Statistics; Vol. 233). Springer.

Tinsley, H. E. A., Weiss, D. J. (1975). Interrater Reliability and Agreement of Subjective Judgments. Journal of Counseling Psychology, 22(4), 358-376.

Uebersax, J. (2008). Statistical methods for rater agreement. http://www.john-uebersax.com/stat/agree.htm

Wilhelm, A. G., Rouse, A. G., Jones, F. (2018). Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability. Practical Assessment, Research, and Evaluation, 23, Article 4.

Zhao, X., Liu, J. S., Deng, K. (2013). Assumptions behind intercoder reliability indices. Annals of the International Communication Association, 36, 419–480.
