Comparison of Agreement Statistics in Case of Multiple-Raters and Diagnostic Test Being Categorical: A Simulation Study
E. Arzu Kanık1, Gülhan Örekici Temel1, Semra Erdoğan1, İrem Ersöz Kaya2.
Aim: When the number of raters and the number of diagnostic-test categories are two or more, to determine how agreement statistics are affected by sample size, the number of raters, and the number of categories of the scale used. Material and Methods: For each combination, the AC1 statistic, Fleiss Kappa, and Krippendorff's Alpha were recorded over 1000 simulation runs, both for the case of no agreement between raters and for the case where the agreement was 0.90. Results: When the expected agreement between raters is 0.90, the AC1 statistic and the Fleiss Kappa coefficient give similar results and take values close to the expected agreement in all combinations. Krippendorff's Alpha coefficient is not affected by sample size, but it is affected by the number of raters and the number of diagnostic-test categories. Conclusion: If the prevalence is known and is of significance for the study, the AC1 statistic is recommended among the agreement statistics; if missing data are present in the study, Krippendorff's Alpha is the most appropriate agreement statistic; in all other cases, the Fleiss Kappa coefficient is recommended. Key Words: Fleiss Kappa; Gwet's AC1 Statistic; Krippendorff's Alpha; Agreement between Raters.
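To make the compared statistics concrete, the following is a minimal Python sketch (not the authors' simulation code) of two of the three agreement measures, Fleiss' Kappa and Gwet's AC1, computed from a subjects-by-categories count matrix in which entry (i, j) is the number of raters assigning subject i to category j. Both share the same observed-agreement term and differ only in how chance agreement is modeled; the example data are hypothetical.

```python
import numpy as np

def _observed_agreement(counts):
    """Mean per-subject agreement P_bar for an N x k count matrix
    whose rows each sum to the number of raters n."""
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()  # raters per subject
    per_subject = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))
    return per_subject.mean()

def fleiss_kappa(counts):
    """Fleiss' multi-rater kappa: chance agreement is the sum of
    squared marginal category proportions."""
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts[0].sum()
    p_bar = _observed_agreement(counts)
    pi = counts.sum(axis=0) / (N * n)       # marginal proportions
    p_e = np.sum(pi ** 2)
    return (p_bar - p_e) / (1 - p_e)

def gwet_ac1(counts):
    """Gwet's AC1: chance agreement uses pi_q * (1 - pi_q) averaged
    over categories, which is less sensitive to skewed prevalence."""
    counts = np.asarray(counts, dtype=float)
    N, k = counts.shape
    n = counts[0].sum()
    p_bar = _observed_agreement(counts)
    pi = counts.sum(axis=0) / (N * n)
    p_e = np.sum(pi * (1 - pi)) / (k - 1)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical data: 4 subjects, 3 raters, 2 categories, full agreement.
perfect = [[3, 0], [3, 0], [0, 3], [0, 3]]
print(fleiss_kappa(perfect))  # → 1.0
print(gwet_ac1(perfect))      # → 1.0
```

With perfect agreement both coefficients equal 1 regardless of the chance-correction model; the two diverge mainly when category prevalences are unbalanced, which is why the abstract's conclusion ties the choice of AC1 to prevalence being known and relevant.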