Measures of Interrater Agreement


Agreement among ratings or measurements provided by several raters (humans or devices) is studied in education, the biomedical sciences, and other disciplines. For instance, the agreement among educators who use a new rating scale to assess the language proficiency shown in a corpus of argumentative texts is examined to test the reliability of the scale, and the agreement among clinical diagnoses provided by physicians is analysed to identify the best treatment for a patient. In all such applications, the main interest is in interrater absolute agreement, that is, the extent to which raters assign the same (or very similar) values on the rating scale. Many indices of interrater agreement on a whole group of subjects (objects) have been proposed. Agreement on single subjects has been considered less frequently, although it is useful, for example, for asking the raters to compare their ratings on the specific cases in which agreement is poor. In the seminar, after a critical review of the most widely used indices of interrater agreement, new subject-specific and global measures of absolute agreement for ratings at different scale levels are presented. Some applications will illustrate the advantages of the proposed indices.

25 March 2022

Giuseppe Bove
Department of Education, Roma Tre University
