Agreement among ratings or measurements provided by several raters (humans or
devices) is considered in education, biomedical sciences, and other disciplines. For
instance, the agreement among educators who use a new rating scale to assess the
language proficiency shown in a corpus of argumentative texts is examined to test the
reliability of the scale, and the agreement among clinical diagnoses provided by
physicians is analysed to identify the best treatment for a patient. In all these
applications, the main interest is in interrater absolute agreement, that is, the
extent to which raters assign the same (or very similar) values on the rating scale. Many
indices of interrater agreement on a whole group of subjects (objects) have been
proposed. Agreement on single subjects has been considered less frequently, even
though it is useful, for example, for asking the raters to compare their ratings on the
single cases in which agreement is poor. In the seminar, after a critical
review of the most widely used indices of interrater agreement, new subject-specific and
global measures of absolute agreement for ratings on different scale levels are
presented. Some applications illustrate the advantages of the proposed indices.
25 March 2022
Giuseppe Bove
Department of Education, Roma Tre University