Estimators of different delta coefficients based on the unbiased estimator of the expected proportions of agreements
Authors: A. Martín Andrés, M. Álvarez Hernández
Martín Andrés, A.¹ and Álvarez Hernández, M.²,³

¹ Biostatistics. Emeritus Professor of the University of Granada, Granada, Spain.
² Departamento de Estadística e Investigación Operativa, Universidad de Vigo, Spain.
³ CITMAga, 15782 Santiago de Compostela, La Coruña, Spain.

ABSTRACT
To measure the degree of agreement between two observers that independently classify $n$ subjects within $K$ categories, it is common to use different kappa type coefficients, the most common of which is the $\kappa_C$ coefficient (Cohen's kappa). As $\kappa_C$ has some weaknesses, such as its poor performance with highly unbalanced marginal distributions, the $\Delta$ coefficient, based on the delta response model, is sometimes used instead. This model allows us to obtain other parameters, such as: (a) the contribution $\pi_i$ of each category $i$ to the value of the global agreement $\Delta = \sum_i \pi_i$; and (b) the consistency $\mathcal{S}_i$ in category $i$ (the degree of agreement in category $i$), a more appropriate parameter than the kappa value obtained by collapsing the data into category $i$. It has recently been shown that the classic estimator $\hat\kappa_C$ underestimates $\kappa_C$, and a new, less biased estimator $\hat\kappa_{CU}$ has been obtained. This article demonstrates that something similar happens to the known estimators $\hat\Delta$, $\hat\pi_i$ and $\hat{\mathcal{S}}_i$ of $\Delta$, $\pi_i$ and $\mathcal{S}_i$ (respectively), proposes new and less biased estimators $\hat\Delta_U$, $\hat\pi_{iU}$ and $\hat{\mathcal{S}}_{iU}$, determines their variances, analyses the behaviour of all the estimators, and concludes that the new estimators should be used when $n$ or $K$ are small (at least when $n \le 50$ or $K \le 3$). Additionally, the case where one of the raters is a gold standard is considered, in which situation two new parameters arise: the conformity (the rater's capability to recognize a subject in category $i$) and the predictivity (the reliability of a response $i$ by the rater).
Keywords: agreement; consistency; Cohen's kappa; delta agreement model.
Email: amartina@ugr.es. Phone: 34-58-244080.

1. Introduction

Let there be two raters that independently classify $n$ subjects in $K$ nominal categories. Given a subject, Observer 1 classifies it as type $i$ ($i = 1, 2, \ldots, K$) and Observer 2 as type $j$ ($j = 1, 2, \ldots, K$). This leads to a table of absolute frequencies $x_{ij}$ like that of Table 1(a), in which $x_{i\cdot} = \sum_j x_{ij}$ and $x_{\cdot j} = \sum_i x_{ij}$. The random variable $\{x_{ij}\}$ follows a multinomial distribution $M\{n; p_{ij}\}$, where $p_{ij}$ is the probability of a subject being classified in cell $(i, j)$: see Table 1(b), in which $p_{i\cdot} = \sum_j p_{ij}$ and $p_{\cdot j} = \sum_i p_{ij}$ are the marginal distributions. In this scenario, the primary interest of researchers is to determine the degree of concordance or agreement between both raters. From a population point of view, this degree of agreement cannot be the population index of observed agreements $I_o = \sum_i p_{ii}$ (the sum of all the probabilities of agreement), since some of these agreements may be random. This is why it is standard procedure to eliminate the random effect by defining a kappa type coefficient $\kappa = (I_o - I_e)/(1 - I_e)$, where $I_e$ is the population index of expected agreements (the sum of the probabilities of the agreements that happen randomly) and $\kappa$ is the degree of population agreement. Depending on how $I_e$ is defined we obtain different kappa coefficients ($\kappa_S$ (Scott 1955), $\kappa_C$ (Cohen 1960), $\kappa_K$ (Krippendorff 2004) and $\kappa_G$ (Gwet 2008), among others), all of which are based on the marginal distributions. The standard procedure is to define $I_e = \sum_i p_{i\cdot}p_{\cdot i}$, thus obtaining Cohen's $\kappa_C$. The estimators of these coefficients have the general form $\hat\kappa = (\hat I_o - \hat I_e)/(1 - \hat I_e)$, where $\hat\kappa$, $\hat I_o = \sum_i \hat p_{ii}$ (with $\hat p_{ij} = x_{ij}/n$) and $\hat I_e$ are the sample estimates of the previous population parameters $\kappa$, $I_o$ and $I_e$, respectively.
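As a minimal illustration of the general form $\hat\kappa = (\hat I_o - \hat I_e)/(1 - \hat I_e)$ with Cohen's choice $\hat I_e = \sum_i \hat p_{i\cdot}\hat p_{\cdot i}$, the following Python sketch computes $\hat\kappa_C$ for the $2 \times 2$ data of Nelson and Pepe (2000) quoted in the next paragraph ($x_{11} = 80$, $x_{12} = x_{21} = 10$, $x_{22} = 0$). The function name `cohen_kappa` is ours; the sketch covers only the classic estimator, not the bias-corrected $\hat\kappa_{CU}$ of Martín Andrés and Álvarez Hernández (2025).

```python
import numpy as np


def cohen_kappa(table):
    """Classic Cohen's kappa for a K x K table of counts (rows: rater 1, columns: rater 2)."""
    x = np.asarray(table, dtype=float)
    p = x / x.sum()                          # p_ij = x_ij / n
    I_o = np.trace(p)                        # observed agreement: sum_i p_ii
    I_e = p.sum(axis=1) @ p.sum(axis=0)      # expected agreement: sum_i p_i. * p_.i
    return (I_o - I_e) / (1.0 - I_e)


# Nelson and Pepe (2000): 80% raw agreement, but very unbalanced marginals
kappa = cohen_kappa([[80, 10], [10, 0]])
print(round(kappa, 3))   # -0.111
```

Here $\hat I_o = 0.80$ but $\hat I_e = 0.9^2 + 0.1^2 = 0.82$, so the estimate is negative despite the high raw agreement.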
In the case of $\hat\kappa_C$, $\hat I_e = \sum_i \hat p_{i\cdot}\hat p_{\cdot i}$, where $\hat p_{i\cdot} = \sum_j \hat p_{ij}$ and $\hat p_{\cdot j} = \sum_i \hat p_{ij}$. Something similar happens with the other coefficients $\hat\kappa_S$, $\hat\kappa_K$ and $\hat\kappa_G$. Nevertheless, all of the estimators $\hat I_e$ (for any of the definitions of $I_e$ that have been proposed) are based on the assumption that $\hat p_{i\cdot}\hat p_{\cdot i}$ is an unbiased estimator of $p_{i\cdot}p_{\cdot i}$. Martín Andrés and Álvarez Hernández (2025) recently observed that this is not true (since the expected value of a product is not, in general, the product of the expected values), and they proposed the expressions needed to correct that bias, thus improving the estimation of $\kappa$.

Although $\hat\kappa_C$ is a very popular measure of agreement, it has two very significant limitations (Brennan and Prediger 1981, Agresti et al. 1995, Guggenmoos-Holzmann and Vonk 1998, Nelson and Pepe 2000, Martín Andrés and Femia Marzo 2004 and 2005, and Erdmann et al. 2015): its dependence on the marginal distributions and its difficulty in evaluating the degree of agreement in each category. Martín Andrés and Femia Marzo (2004 and 2005) and Martín Andrés and Álvarez Hernández (2022) proposed a response model (the delta model) that gives rise to the agreement measure delta ($\Delta$), which does not have the limitations of $\kappa_C$ (Ato et al. 2011, Shankar and Bangdiwala 2014, Giammarino et al. 2021). For instance, with $K = 2$ and the data $x_{11} = 80$, $x_{12} = 10$, $x_{21} = 10$ and $x_{22} = 0$ of Nelson and Pepe (2000), one obtains $\hat\kappa_C = -0.11$ and $\hat\Delta = 0.58$. In the classic definitions of agreement, the probabilities $p_{ij}$ are not subject to any model, but in the case of the delta model (with the notation of Martín Andrés and Álvarez Hernández 2022)

$$p_{ij} = \delta_{ij}\pi_i + B\,\pi_{i1}\pi_{j2}, \quad\text{with } B = 1 - \Delta,\ \Delta = \sum_i \pi_i,\ 0 \le \pi_{ir} \le 1 \text{ and } \sum_i \pi_{ir} = 1\ (r = 1, 2), \tag{1}$$

where the $\delta_{ij}$ are Kronecker deltas.
In this model, $\Delta$ is the total proportion of non-random agreements (global agreement), $\pi_i$ is the proportion of agreements in category $i$ that are not random (the contribution of category $i$ to the global agreement), and $\{\pi_{ir}\}$ is the distribution of the responses that rater $r$ makes randomly (in the proportion $B = 1 - \Delta$ of subjects that he or she classifies by chance). Note that the model implies that $p_{ii} = \pi_i + B\pi_{i1}\pi_{i2}$ for $i = j$, so the (theoretical) proportion $p_{ii}$ of observed agreements in category $i$ decomposes into two addends: the proportion of agreements NOT due to chance ($\pi_i$) and the proportion of those that ARE due to chance ($B\pi_{i1}\pi_{i2}$). That is why the global raw agreement is $\sum_i p_{ii}$, while the global agreement corrected for chance is $\Delta = \sum_i \pi_i$. The model has the advantage of providing results similar to those of the classic $\kappa_C$ in normal circumstances, but it provides a greater degree of agreement than $\kappa_C$ when the marginal distributions are very unbalanced in the same direction. The coefficient $\Delta$ can also be put in the format of $\kappa$ since, due to expression (1), $\Delta = (I_o - I_\pi)/(1 - I_\pi)$ with $I_\pi = \sum_i \pi_{i1}\pi_{i2}$. The aforementioned authors provide the estimation of the parameters of the delta model by the method of maximum likelihood (the values $\hat\pi_i = \hat p_{ii} - (1 - \hat\Delta)\hat\pi_{i1}\hat\pi_{i2}$, $\hat\pi_{ir}$ and $\hat\Delta = \sum_i \hat\pi_i$; see Appendix A), and it also holds that $\hat\Delta = (\hat I_o - \hat I_\pi)/(1 - \hat I_\pi)$, where $\hat I_\pi = \sum_i \hat\pi_{i1}\hat\pi_{i2}$. As, once again, $\hat\pi_{i1}\hat\pi_{i2}$ is not an unbiased estimator of $\pi_{i1}\pi_{i2}$, our first objective will be to correct this defect and improve the estimator $\hat\Delta$.

Additionally, researchers may also have two secondary interests. Firstly, to determine the contribution $\pi_i$ of category $i$ to the global agreement $\Delta$. Secondly, to determine the degree of agreement in category $i$ which is not random, i.e. the consistency $\mathcal{S}_i$ in category $i$:

$$\mathcal{S}_i = \frac{2\pi_i}{p_{i\cdot} + p_{\cdot i}} = \frac{2\pi_i}{2\pi_i + B(\pi_{i1} + \pi_{i2})}. \tag{2}$$

This parameter, which comes from the raw agreement index in category $i$ of Cicchetti and Feinstein (1990), $S_i = 2p_{ii}/(p_{i\cdot} + p_{\cdot i})$, in which $p_{ii}$ has been replaced by $\pi_i$, refers to the proportion of (non-random) agreements among all of the subjects classified in category $i$ by either of the two raters, and it was analysed and estimated in the context of the delta model by Martín Andrés and Femia Marzo (2005). These authors demonstrated that $\mathcal{S}_i$ has the same objective as that sought when defining $\kappa_C$ for the data collapsed into category $i$ (parameter $\kappa_{C(i)}$), an objective which is always achieved with $\mathcal{S}_i$ but which is sometimes not achieved with $\kappa_{C(i)}$. In fact, a parameter of global agreement based on the data collapsed into category $i$ measures the degree of global agreement in the new situation, but it does not measure the degree of agreement in category $i$ since, for the same reason, it should also measure the degree of agreement in the category "not $i$". As in both cases ($\pi_i$ and $\mathcal{S}_i$) the estimation of the parameter ($\hat\pi_i$ and $\hat{\mathcal{S}}_i$) depends on the estimation of the product $\pi_{i1}\pi_{i2}$, the conclusion is the same as in the previous paragraph: the two estimators are biased and that bias should be corrected as much as possible. This will be the second objective of this article. The third objective is obvious: to determine the variance of each of the new estimators obtained. The fourth and last objective, discussed in a more superficial way, consists of measuring the quality of a rater's response when the other rater is a gold standard, which is related to diagnostic methods.

2. Approximately unbiased estimator of $\pi_{i1}\pi_{i2}$ and new estimators proposed for $\Delta$, $\pi_i$ and $\mathcal{S}_i$ when $K > 2$

In the following it is assumed that $\hat\pi_i$, $\hat\pi_{ir}$ and $\hat\Delta$ are the maximum likelihood estimators of $\pi_i$, $\pi_{ir}$ and $\Delta$, respectively, obtained under the delta model.
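A minimal Python sketch of expression (1), of the consistency (2) and of a maximum likelihood fit is given below. It builds the cell probabilities from hypothetical parameter values (not taken from the paper) and recovers them from the expected table. As a labelled substitution, it does not solve the equations of Appendix A: it uses the fact that, under (1), the diagonal cells are unrestricted while the off-diagonal cells follow the quasi-independence form $B\pi_{i1}\pi_{j2}$, which it fits by iterative proportional fitting (IPF). We assume this yields the same maximum likelihood solution for $K \ge 3$; the function names are ours.

```python
import numpy as np


def delta_model_probs(pi, pi1, pi2):
    """Cell probabilities of expression (1): p_ij = delta_ij * pi_i + B * pi_i1 * pi_j2."""
    pi, pi1, pi2 = (np.asarray(v, dtype=float) for v in (pi, pi1, pi2))
    B = 1.0 - pi.sum()                                   # B = 1 - Delta
    return np.diag(pi) + B * np.outer(pi1, pi2)


def delta_mle(table, tol=1e-13, max_iter=100_000):
    """ML fit of the delta model for a K x K table of counts (K >= 3).

    Assumption: diagonal cells are free and off-diagonal cells equal
    a_i * b_j = B * pi_i1 * pi_j2, fitted here by iterative proportional fitting.
    """
    x = np.asarray(table, dtype=float)
    p = x / x.sum()
    K = p.shape[0]
    off = np.where(np.eye(K, dtype=bool), 0.0, p)       # off-diagonal proportions only
    d1, d2 = off.sum(axis=1), off.sum(axis=0)           # observed disagreements d_i1, d_j2
    a, b = np.ones(K), np.ones(K)
    for _ in range(max_iter):
        a_new = d1 / (b.sum() - b)                      # fit row disagreements
        b_new = d2 / (a_new.sum() - a_new)              # fit column disagreements
        step = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if step < tol:
            break
    B = a.sum() * b.sum()
    pi1, pi2 = a / a.sum(), b / b.sum()
    pi = np.diag(p) - B * pi1 * pi2                     # pi_i = p_ii - B * pi_i1 * pi_i2
    return {"Delta": 1.0 - B, "pi": pi, "pi1": pi1, "pi2": pi2}


# Hypothetical parameter values: K = 3, Delta = 0.45
p = delta_model_probs([0.20, 0.15, 0.10], [0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
fit = delta_mle(1000 * p)                               # exact expected counts for n = 1000
I_o, I_pi = np.trace(p), np.sum(fit["pi1"] * fit["pi2"])
delta_kappa_form = (I_o - I_pi) / (1.0 - I_pi)          # Delta written in kappa format
S = 2 * fit["pi"] / (p.sum(axis=1) + p.sum(axis=0))     # consistency, expression (2)
print(round(fit["Delta"], 4), round(delta_kappa_form, 4))   # 0.45 0.45
```

Because the expected table satisfies (1) exactly, the fit returns the generating values of $\pi_i$, $\pi_{ir}$ and $\Delta$, and the kappa-format identity $\Delta = (I_o - I_\pi)/(1 - I_\pi)$ holds.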
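Appendix A summarizes the procedure to obtain these estimators, and at http://www.ugr.es/local/bioest/software (Section "Agreement Among Raters") it is possible to download two free programmes (Delta.exe and Multi_Rater_Delta.exe) to run this model. Appendix A also demonstrates that

$$E(\hat\pi_{i1}\hat\pi_{i2}) \approx \pi_{i1}\pi_{i2} + E_i, \quad\text{where}\quad E_i = \frac{1}{n(1-\Delta)}\left\{\pi_{i1}\pi_{i2} - \frac{X_i(X_i - X)}{1 - X}\right\}, \tag{3}$$

with $X = \sum_i X_i$ and $X_i = \pi_{i1}\pi_{i2}/(\pi_{i1} + \pi_{i2} - 1)$. This means that $\hat\pi_{i1}\hat\pi_{i2}$ is a biased estimator of $\pi_{i1}\pi_{i2}$, and that its bias cannot easily be corrected, since $E_i$ itself depends on the unknown parameters. The best option (see Appendix A) is the following approximately unbiased estimator:

$$(\hat\pi_{i1}\hat\pi_{i2})_U = \hat\pi_{i1}\hat\pi_{i2} - \hat E_i, \quad\text{where}\quad \hat E_i = \frac{1}{n(1-\hat\Delta)}\left\{\hat\pi_{i1}\hat\pi_{i2} - \frac{\hat X_i(\hat X_i - \hat X)}{1 - \hat X}\right\}, \tag{4}$$

where $\hat X = \sum_i \hat X_i$ and $\hat X_i = \hat\pi_{i1}\hat\pi_{i2}/(\hat\pi_{i1} + \hat\pi_{i2} - 1)$. Expression (4) also changes the traditional estimators of $I_\pi$, $\Delta$, $\pi_i$ and $\mathcal{S}_i$, since all of them depend on it. As the classic estimators are

$$\hat I_\pi = \sum_i \hat\pi_{i1}\hat\pi_{i2},\quad \hat\Delta = \frac{\hat I_o - \hat I_\pi}{1 - \hat I_\pi},\quad \hat\pi_i = \hat p_{ii} - (1 - \hat\Delta)\hat\pi_{i1}\hat\pi_{i2} \quad\text{and}\quad \hat{\mathcal{S}}_i = \frac{2\hat\pi_i}{\hat p_{i\cdot} + \hat p_{\cdot i}}, \tag{5}$$

the new (approximately unbiased) estimators will be

$$\hat I_{\pi U} = \hat I_\pi - \hat E, \quad\text{where}\quad \hat E = \sum_i \hat E_i = \frac{1}{n(1-\hat\Delta)}\left\{\hat I_\pi - \frac{\sum_i \hat X_i^2 - \hat X^2}{1 - \hat X}\right\}, \tag{6}$$

$$\hat\Delta_U = \frac{\hat I_o - \hat I_{\pi U}}{1 - \hat I_{\pi U}},\quad \hat\pi_{iU} = \hat p_{ii} - (1 - \hat\Delta_U)\left(\hat\pi_{i1}\hat\pi_{i2} - \hat E_i\right) \quad\text{and}\quad \hat{\mathcal{S}}_{iU} = \frac{2\hat\pi_{iU}}{\hat p_{i\cdot} + \hat p_{\cdot i}}, \tag{7}$$

where it can be observed that the definitions are coherent: $\sum_i \hat\pi_{iU} = \hat\Delta_U$. In general it can be expected that, most of the time, $\hat\Delta_U \ge \hat\Delta$, $\hat\pi_{iU} \ge \hat\pi_i$ and $\hat{\mathcal{S}}_{iU} \ge \hat{\mathcal{S}}_i$. The reason for this is that it will generally happen that $\hat\pi_{i1} + \hat\pi_{i2} - 1 < 0$, so that $\hat X_i < 0$ and, through expressions (4) and (6), $\hat E_i > 0$, $(\hat\pi_{i1}\hat\pi_{i2})_U < \hat\pi_{i1}\hat\pi_{i2}$ and $\hat I_{\pi U} < \hat I_\pi$. The simulations in Section 4 confirm this.

3. Variances of the new estimators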
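Martín Andrés and Femia Marzo (2004 and 2005) provided the following asymptotic variances of the estimators $\hat\Delta$, $\hat\pi_i$ and $\hat{\mathcal{S}}_i$, which we express here (and in everything that follows) in the format of Martín Andrés and Álvarez Hernández (2022):

$$V_A(\hat\Delta) = \frac{1-\Delta}{n}\left(\Delta - \frac{X}{1-X}\right),\quad V_A(\hat\pi_i) = \frac{\pi_i(1-\pi_i) + H_i}{n} \quad\text{and}\quad V_A(\hat{\mathcal{S}}_i) = \frac{4\left\{(\pi_i + H_i)t_i^2 - 3\pi_i^2 t_i + 2\pi_i^2 p_{ii}\right\}}{n\,t_i^4}, \tag{8}$$

where $t_i = p_{i\cdot} + p_{\cdot i}$ and

$$H_i = -\frac{(1-\Delta)X_i(1 - X + X_i)}{1 - X}, \tag{9}$$

as well as the following estimated variances:

$$\hat V(\hat\Delta) = \frac{1-\hat\Delta}{n}\left(\hat\Delta - \frac{\hat X}{1-\hat X}\right),\quad \hat V(\hat\pi_i) = \frac{\hat\pi_i(1-\hat\pi_i) + \hat H_i}{n} \quad\text{and}\quad \hat V(\hat{\mathcal{S}}_i) = \frac{4\left\{(\hat\pi_i + \hat H_i)\hat t_i^2 - 3\hat\pi_i^2 \hat t_i + 2\hat\pi_i^2 \hat p_{ii}\right\}}{n\,\hat t_i^4}, \tag{10}$$

where $\hat t_i = \hat p_{i\cdot} + \hat p_{\cdot i}$ and

$$\hat H_i = -\frac{(1-\hat\Delta)\hat X_i(1 - \hat X + \hat X_i)}{1 - \hat X}. \tag{11}$$

Deriving the formulas for the variances of the new estimators $\hat\Delta_U$, $\hat\pi_{iU}$ and $\hat{\mathcal{S}}_{iU}$ is complicated, and their expressions would be very complex. Given that the extra terms of these estimators ($\hat E_i$ and $\hat E$ of expressions (4) and (6), respectively) are divided by $n$, it is reasonable to assume that the formulas will be similar to those of expressions (10), with the appropriate changes. For example, as $(1 - \hat\Delta) = (1 - \hat\Delta_U)\{1 + \hat E/(1 - \hat I_\pi)\} = (1 - \hat\Delta_U) + \hat B$, where $\hat B = (1 - \hat\Delta_U)\hat E/(1 - \hat I_\pi)$, then $V(1 - \hat\Delta) = V(1 - \hat\Delta_U) + V(\hat B) - 2\,\mathrm{Cov}(\hat\Delta_U, \hat B) \approx V(1 - \hat\Delta_U)$, since $\hat B$ is $O(n^{-1})$, and therefore $V(\hat\Delta) \approx V(\hat\Delta_U)$. That is why the proposed variances are given by the following modifications of expressions (10) and (11):

$$\hat V(\hat\Delta_U) = \frac{1-\hat\Delta_U}{n}\left(\hat\Delta_U - \frac{\hat X}{1-\hat X}\right),\quad \hat V(\hat\pi_{iU}) = \frac{\hat\pi_{iU}(1-\hat\pi_{iU}) + \hat H_{iU}}{n} \quad\text{and}\quad \hat V(\hat{\mathcal{S}}_{iU}) = \frac{4\left\{(\hat\pi_{iU} + \hat H_{iU})\hat t_i^2 - 3\hat\pi_{iU}^2 \hat t_i + 2\hat\pi_{iU}^2 \hat p_{ii}\right\}}{n\,\hat t_i^4}, \tag{12}$$

where

$$\hat H_{iU} = -\frac{(1-\hat\Delta_U)\hat X_i(1 - \hat X + \hat X_i)}{1 - \hat X}. \tag{13}$$

4. Simulations

This Section will determine the bias of the three classic estimators ($\hat\Delta$, $\hat\pi_i$ and $\hat{\mathcal{S}}_i$) and of the three new estimators ($\hat\Delta_U$, $\hat\pi_{iU}$ and $\hat{\mathcal{S}}_{iU}$), assessing and comparing the behaviour of all of them. Furthermore, this section will study the behaviour of the different variance formulas given in the previous section.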
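Before the simulation settings are described, the estimators compared in this section can be sketched in Python. The sketch below takes a $K \times K$ table ($K \ge 3$), obtains the maximum likelihood fit by IPF on the off-diagonal cells (our substitute for the equations of Appendix A, assumed to reach the same solution), and then evaluates the classic estimators (5), the corrected estimators (4), (6) and (7), and the estimated variances (10) to (13). Function names and the example table are hypothetical, and no safeguard is included for the degenerate case $\hat\pi_{i1} + \hat\pi_{i2} = 1$, where $\hat X_i$ is undefined.

```python
import numpy as np


def delta_mle(table, tol=1e-13, max_iter=100_000):
    """ML fit of the delta model (K >= 3) by IPF on the off-diagonal cells.
    Returns p_ij, n, B = 1 - Delta-hat, pi_i1-hat and pi_i2-hat."""
    x = np.asarray(table, dtype=float)
    p = x / x.sum()
    K = p.shape[0]
    off = np.where(np.eye(K, dtype=bool), 0.0, p)
    d1, d2 = off.sum(axis=1), off.sum(axis=0)
    a, b = np.ones(K), np.ones(K)
    for _ in range(max_iter):
        a_new = d1 / (b.sum() - b)
        b_new = d2 / (a_new.sum() - a_new)
        step = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if step < tol:
            break
    return p, x.sum(), a.sum() * b.sum(), a / a.sum(), b / b.sum()


def delta_estimates(table):
    """Classic estimators (5), corrected ones (4), (6), (7) and variances (10) to (13)."""
    p, n, B, pi1, pi2 = delta_mle(table)
    I_o, pii = np.trace(p), np.diag(p)
    t = p.sum(axis=1) + p.sum(axis=0)                    # t_i = p_i. + p_.i
    prod = pi1 * pi2                                     # pi_i1-hat * pi_i2-hat
    X_i = prod / (pi1 + pi2 - 1.0)                       # undefined if pi_i1 + pi_i2 = 1
    X = X_i.sum()
    D = 1.0 - B                                          # classic Delta-hat
    E_i = (prod - X_i * (X_i - X) / (1.0 - X)) / (n * (1.0 - D))   # expression (4)
    I_piU = prod.sum() - E_i.sum()                       # expression (6)
    D_U = (I_o - I_piU) / (1.0 - I_piU)                  # expression (7)
    pi_c = pii - (1.0 - D) * prod                        # classic pi_i-hat, (5)
    pi_U = pii - (1.0 - D_U) * (prod - E_i)              # corrected pi_iU-hat, (7)

    def variances(Dh, pih):
        """(10)-(11) with classic inputs, (12)-(13) with corrected inputs."""
        H = -(1.0 - Dh) * X_i * (1.0 - X + X_i) / (1.0 - X)
        V_D = (1.0 - Dh) * (Dh - X / (1.0 - X)) / n
        V_pi = (pih * (1.0 - pih) + H) / n
        V_S = 4 * ((pih + H) * t**2 - 3 * pih**2 * t + 2 * pih**2 * pii) / (n * t**4)
        return V_D, V_pi, V_S

    return {"Delta": D, "Delta_U": D_U, "pi": pi_c, "pi_U": pi_U,
            "S": 2 * pi_c / t, "S_U": 2 * pi_U / t, "E_i": E_i,
            "var": variances(D, pi_c), "var_U": variances(D_U, pi_U)}


# Hypothetical expected counts (n = 1000) generated from expression (1)
pi, pi1, pi2 = np.array([0.20, 0.15, 0.10]), np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])
table = 1000 * (np.diag(pi) + (1.0 - pi.sum()) * np.outer(pi1, pi2))
est = delta_estimates(table)
print(round(est["Delta"], 4), round(est["Delta_U"], 4))   # 0.45 0.4514
```

In this example every $\hat X_i$ is negative, so every $\hat E_i$ is positive and $\hat\Delta_U > \hat\Delta$, as anticipated at the end of Section 2; the coherence property $\sum_i \hat\pi_{iU} = \hat\Delta_U$ holds by construction.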
For this purpose, we will consider the 48 settings indicated in Table 2, each of which refers to a multinomial distribution $M\{n; p_{ij}\}$ whose parameters under the delta model are those indicated on each line: values of $n$, $K$, $\pi_i$, $\pi_{i1}$ and $\pi_{i2}$.

4.1. The case of parameter $\Delta$

To assess the estimators of $\Delta$ and their variances, the procedure is as follows: (1) For each setting in Table 2, calculate $V_A(\hat\Delta)$ from the first expression of (8), and the value of $\Delta$. (2) Obtain $N = 10{,}000$ random samples and, in each sample $h$ (with $h = 1, 2, \ldots, N$), obtain the estimates $\hat\Delta_h$ and $\hat\Delta_{Uh}$ of expressions (5) and (7), and the values $\hat V_h(\hat\Delta)$ and $\hat V_h(\hat\Delta_U)$ of expressions (10) and (12). (3) Calculate the average and the sample variance (denominator $N - 1$) of the $N$ values $\hat\Delta_h$ and $\hat\Delta_{Uh}$: $\overline{\hat\Delta}$, $V_E(\hat\Delta)$ and $\overline{\hat\Delta}_U$, $V_E(\hat\Delta_U)$, respectively. (4) Calculate the averages of the $N$ values $\hat V_h(\hat\Delta)$ and $\hat V_h(\hat\Delta_U)$: $\overline{\hat V}(\hat\Delta)$ and $\overline{\hat V}(\hat\Delta_U)$, respectively. (5) With the data obtained, Table 3 is constructed.

For the assessment of the estimators of $\Delta$, note that comparing $\overline{\hat\Delta}$ and $\overline{\hat\Delta}_U$ with the known value $\Delta$ allows us to assess the bias of each estimator. It is observed that the two estimators always underestimate the true value $\Delta$, except for the estimator $\hat\Delta_U$, which slightly overestimates it on one occasion and coincides with it on another. The average underestimation is 0.044 for $\hat\Delta$ and 0.021 for $\hat\Delta_U$. Moreover, $\overline{\hat\Delta}_U > \overline{\hat\Delta}$, with a minimum difference of 0.003 and a maximum of 0.070. It can therefore be seen that $\hat\Delta_U$ is clearly a better estimator of $\Delta$ than $\hat\Delta$.
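Steps (1) to (4) above can be sketched in Python for a single hypothetical setting ($K = 3$, $\pi_i = 0.15$, $\pi_{i1} = \pi_{i2} = 1/3$, $n = 100$), which is not one of the 48 settings of Table 2. To keep it fast it uses $N = 500$ replicates instead of 10,000. The MLE is again obtained by IPF (our substitute for Appendix A), and samples in which an estimate cannot be computed are dropped; that handling is our assumption, since the paper does not describe it.

```python
import numpy as np


def delta_and_delta_u(x, tol=1e-12, max_iter=20_000):
    """Delta-hat (5), Delta-hat_U (7) and their estimated variances (10), (12) for one table."""
    x = np.asarray(x, dtype=float)
    n, K = x.sum(), x.shape[0]
    p = x / n
    off = np.where(np.eye(K, dtype=bool), 0.0, p)
    d1, d2 = off.sum(axis=1), off.sum(axis=0)
    a, b = np.ones(K), np.ones(K)
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(max_iter):                        # IPF fit of the off-diagonal cells
            a_new = d1 / (b.sum() - b)
            b_new = d2 / (a_new.sum() - a_new)
            step = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
            a, b = a_new, b_new
            if step < tol or not np.isfinite(step):      # converged, or degenerate sample
                break
        B = a.sum() * b.sum()
        pi1, pi2 = a / a.sum(), b / b.sum()
        prod = pi1 * pi2
        X_i = prod / (pi1 + pi2 - 1.0)
        X = X_i.sum()
        E = (prod.sum() - (np.sum(X_i**2) - X**2) / (1.0 - X)) / (n * B)   # expression (6)
        I_piU = prod.sum() - E
        D, D_U = 1.0 - B, (np.trace(p) - I_piU) / (1.0 - I_piU)

        def v_hat(Dh):                                   # first expression of (10) / (12)
            return (1.0 - Dh) * (Dh - X / (1.0 - X)) / n

        return D, D_U, v_hat(D), v_hat(D_U)


def simulate(pi, pi1, pi2, n, N=1000, seed=1):
    """Steps (1) to (4) of Section 4.1 for one hypothetical setting."""
    pi, pi1, pi2 = (np.asarray(v, dtype=float) for v in (pi, pi1, pi2))
    K, Delta = len(pi), pi.sum()
    p = np.diag(pi) + (1.0 - Delta) * np.outer(pi1, pi2)            # expression (1)
    X_i = pi1 * pi2 / (pi1 + pi2 - 1.0)
    X = X_i.sum()
    V_A = (1.0 - Delta) * (Delta - X / (1.0 - X)) / n              # step (1)
    rng = np.random.default_rng(seed)
    draws = rng.multinomial(n, p.ravel(), size=N).reshape(N, K, K)  # step (2)
    res = np.array([delta_and_delta_u(x) for x in draws])
    res = res[np.all(np.isfinite(res), axis=1)]                    # drop degenerate samples
    return {"Delta": Delta, "V_A": V_A, "valid": len(res),
            "mean": res[:, :2].mean(axis=0),                       # step (3): averages
            "V_E": res[:, :2].var(axis=0, ddof=1),                 # step (3): "exact" variances
            "mean_V_hat": res[:, 2:].mean(axis=0)}                 # step (4)


out = simulate([0.15, 0.15, 0.15], [1 / 3] * 3, [1 / 3] * 3, n=100, N=500)
print(out["mean"], out["V_E"], out["mean_V_hat"], out["V_A"])
```

The entries of `out["mean"]`, `out["V_E"]` and `out["mean_V_hat"]` correspond to $(\overline{\hat\Delta}, \overline{\hat\Delta}_U)$, $(V_E(\hat\Delta), V_E(\hat\Delta_U))$ and $(\overline{\hat V}(\hat\Delta), \overline{\hat V}(\hat\Delta_U))$, i.e. one row of a table like Table 3.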
In more detail, the average of the differences $\overline{\hat\Delta}_U - \overline{\hat\Delta}$: (a) decreases with $n$, and is 0.04, 0.02 and 0.01 for $n = 30$, 50 and 100, respectively, which means that it is necessary to make the correction at least for $n \le 50$; (b) decreases with $K$, and is 0.03 and 0.01 for $K = 3$ and 5, respectively, which means that it is necessary to make the correction at least for $K \le 3$; (c) increases very slightly with $\Delta$, passing from 0.021 at $\Delta = 0.4$ to 0.025 at $\Delta = 0.8$, which means that the correction is necessary whether $\Delta$ is small or large. The consequence is that $\hat\Delta_U$ is a better estimator than $\hat\Delta$ and that it should be used when $n$ or $K$ are small. For example, one of the extreme cases is that of the first setting ($n = 30$, $K = 3$ and $\Delta = 0.40$), in which $\overline{\hat\Delta} = 0.31$ and $\overline{\hat\Delta}_U = 0.38$. If the maximum differences $\overline{\hat\Delta}_U - \overline{\hat\Delta}$ are determined in all of the settings with $(K, n) = (3, 30)$, $(3, 50)$, $(3, 100)$, $(5, 30)$, $(5, 50)$ and $(5, 100)$, it is observed that these differences can reach the values 0.08, 0.06, 0.03, 0.06, 0.02 and 0.00, respectively. If a difference $\overline{\hat\Delta}_U - \overline{\hat\Delta}$ of 0.02 is considered relevant, then $\hat\Delta_U$ should be used if $n \le 50$ or $K \le 3$. Although this has only been demonstrated for very limited values of $n$ and $K$, and in some specific settings, in practice the requirements may be rather less strict, since the values $\overline{\hat\Delta}_U - \overline{\hat\Delta}$ are average differences and, therefore, the individual differences of interest $\hat\Delta_U - \hat\Delta$ may be much larger.

For the assessment of the variances, it is necessary to take into account that: (A) The two estimators $\hat\Delta$ and $\hat\Delta_U$ have exact but unknown variances, which can nevertheless be estimated quite precisely through the sample variances $V_E(\hat\Delta)$ and $V_E(\hat\Delta_U)$ described in step (3) above. From now on these values will be referred to as "exact variances", and they serve as a reference to assess the averages of the estimated values, $\overline{\hat V}(\hat\Delta)$ and $\overline{\hat V}(\hat\Delta_U)$.
(B) The true value of the asymptotic variance of $\hat\Delta$ is given by the value $V_A(\hat\Delta)$ of expression (8), which is not an estimated value, since it is computed at the true values of the parameters in each setting. Comparing $V_A(\hat\Delta)$ (exact asymptotic variance) with $V_E(\hat\Delta)$ (exact variance) indicates how good the asymptotic formula actually is.

Beginning with (B), the comparison of both variances indicates that $V_E(\hat\Delta) > V_A(\hat\Delta)$, except on two occasions, which means that the asymptotic formula itself almost always provides smaller values than the true variance. In particular, $V_E(\hat\Delta) - V_A(\hat\Delta)$ lies between $-0.0022$ and $+0.0310$ (with an average of 0.0063), which indicates that this underestimation may become very important. Regarding point (A), the data always indicate that $\overline{\hat V}(\hat\Delta) > V_E(\hat\Delta)$ and $\overline{\hat V}(\hat\Delta_U) \ge V_E(\hat\Delta_U)$, so that (on average) both the classic and the new variance estimators overestimate the real variance. It is also observed that $1.01 \le \overline{\hat V}(\hat\Delta)/V_E(\hat\Delta) \le 2.75$ and $1 \le \overline{\hat V}(\hat\Delta_U)/V_E(\hat\Delta_U) \le 2.71$, ratios which decrease (tending to 1) when $n$ or $K$ increase, but which grow when $\Delta$ increases. Note that, although $V_A(\hat\Delta)$ was seen above to underestimate the real value of the variance, the opposite occurs once the estimators are substituted into it: the estimated variance overestimates the true variance. Note also that the formula used for $\hat V(\hat\Delta_U)$ behaves coherently with respect to the classic formula $\hat V(\hat\Delta)$.

4.2. The case of parameter $\pi_i$

The simulation procedure is similar to that of Section 4.1, and it leads to the results in Table 4 for the case of category 3 (the only one considered here, although the same thing happens with the other categories). Now the real value $\pi_3$ is that of the setting itself; the $N$ estimates $\hat\pi_3$ of each setting give an average $\overline{\hat\pi}_3$ and a variance $V_E(\hat\pi_3)$ (denominator $N - 1$), which is assumed to be the true one. We call $\overline{\hat V}(\hat\pi_3)$ the average of the $N$ estimated variances of $\hat\pi_3$.
The exact asymptotic variance $V_A(\hat\pi_3)$ is deduced from the values of the setting through the second expression of (8). The same is done for $\hat\pi_{3U}$: $\overline{\hat\pi}_{3U}$, $V_E(\hat\pi_{3U})$ and $\overline{\hat V}(\hat\pi_{3U})$.

It is observed that the two estimators always underestimate the true value $\pi_3$, except for the estimator $\hat\pi_{3U}$, which slightly overestimates it on two occasions. The average underestimations are 0.017 for $\hat\pi_3$ and 0.008 for $\hat\pi_{3U}$. Furthermore, $\overline{\hat\pi}_{3U} > \overline{\hat\pi}_3$, with a minimum difference of 0.001 and a maximum of 0.041. It is therefore clear that $\hat\pi_{3U}$ is a better estimator of $\pi_3$ than $\hat\pi_3$. The average of the differences $\overline{\hat\pi}_{3U} - \overline{\hat\pi}_3$: (a) decreases with $n$, and is 0.014, 0.009 and 0.004 for $n = 30$, 50 and 100, respectively, which means that it is necessary to make the correction at least for $n \le 50$; (b) decreases with $K$, and is 0.015 and 0.003 for $K = 3$ and $K = 5$, respectively, which means that it is necessary to make the correction at least for $K \le 3$; (c) is not excessively influenced by $\pi_3$. The consequence is that $\hat\pi_{3U}$ is a better estimator than $\hat\pi_3$ and that it should be used at least for $n \le 50$ or $K \le 3$. For example, one of the extreme cases is that of the first setting ($n = 30$, $K = 3$ and $\pi_3 = 0.20$), in which $\overline{\hat\pi}_3 = 0.140$ and $\overline{\hat\pi}_{3U} = 0.182$. As indicated in Section 4.1, in practice the requirements may be rather less strict, since the values $\overline{\hat\pi}_{3U} - \overline{\hat\pi}_3$ are average differences and, therefore, the individual differences of interest $\hat\pi_{3U} - \hat\pi_3$ may be much larger.

Regarding the variances, something similar to the previous section happens: $V_E(\hat\pi_3) > V_A(\hat\pi_3)$, except on four occasions. This indicates that the asymptotic formula itself almost always provides values that are smaller than the true variance. Additionally, $V_E(\hat\pi_3) - V_A(\hat\pi_3)$ lies between $-0.0052$ and $+0.0321$ (with an average of 0.0034), which indicates that this underestimation may be very important.
Furthermore, the data also always indicate that $\overline{\hat V}(\hat\pi_3) \ge V_E(\hat\pi_3)$ and $\overline{\hat V}(\hat\pi_{3U}) \ge V_E(\hat\pi_{3U})$, so that (on average) both the classic and the new variance estimators overestimate the real variance. It is also observed that $1 \le \overline{\hat V}(\hat\pi_3)/V_E(\hat\pi_3) \le 3.73$ and $1 \le \overline{\hat V}(\hat\pi_{3U})/V_E(\hat\pi_{3U}) \le 2.63$, ratios that decrease (tending to 1) when $n$ or $K$ increase. Therefore, as in the previous section: (1) although $V_A(\hat\pi_3)$ was seen above to underestimate the real value of the variance, the opposite happens once the estimators are substituted into it: the estimated variances overestimate the true variance; (2) the formula used for $\hat V(\hat\pi_{iU})$ behaves coherently with respect to the classic formula $\hat V(\hat\pi_i)$.

4.3. The case of parameter $\mathcal{S}_i$

The simulation procedure is similar to that of Section 4.1, and it leads to the results in Table 5 for the case of category 3. Now the real value $\mathcal{S}_3$ is deduced from the values of the setting in its row (see the last expression of (5), evaluated at the true values of the parameters) and will in general be different for each row. The $N = 10{,}000$ estimates $\hat{\mathcal{S}}_3$ of each setting give an average $\overline{\hat{\mathcal{S}}}_3$ and a variance $V_E(\hat{\mathcal{S}}_3)$ (denominator $N - 1$), which is assumed to be the true one. We call $\overline{\hat V}(\hat{\mathcal{S}}_3)$ the average of the $N$ estimated variances of $\hat{\mathcal{S}}_3$. The exact asymptotic variance $V_A(\hat{\mathcal{S}}_3)$ is deduced from the values of the setting through the third expression of (8). The same is done for $\hat{\mathcal{S}}_{3U}$: $\overline{\hat{\mathcal{S}}}_{3U}$, $V_E(\hat{\mathcal{S}}_{3U})$ and $\overline{\hat V}(\hat{\mathcal{S}}_{3U})$.

It is observed that the two estimators always underestimate the true value $\mathcal{S}_3$. The average underestimations are 0.05 for $\hat{\mathcal{S}}_3$ and 0.03 for $\hat{\mathcal{S}}_{3U}$. Moreover, $\overline{\hat{\mathcal{S}}}_{3U} > \overline{\hat{\mathcal{S}}}_3$, with a minimum difference of 0.003 and a maximum of 0.094. It is therefore clear that $\hat{\mathcal{S}}_{3U}$ is a better estimator of $\mathcal{S}_3$ than $\hat{\mathcal{S}}_3$.
The average of the differences $\overline{\hat{\mathcal{S}}}_{3U} - \overline{\hat{\mathcal{S}}}_3$: (a) decreases with $n$, and is 0.04, 0.02 and 0.01 for $n = 30$, 50 and 100, respectively, which means that it is necessary to make the correction at least for $n \le 50$; (b) decreases with $K$, and is 0.04 and 0.01 for $K = 3$ and $K = 5$, respectively, which means that it is necessary to make the correction at least for $K \le 3$; (c) is not excessively influenced by $\mathcal{S}_3$. The consequence is that $\hat{\mathcal{S}}_{3U}$ is a better estimator than $\hat{\mathcal{S}}_3$ and that it is necessary to use it at least for $n \le 50$ or $K \le 3$. For example, one of the extreme cases is that of the third setting ($n = 30$, $K = 3$ and $\mathcal{S}_3 = 0.32$): $\overline{\hat{\mathcal{S}}}_3 = 0.19$ and $\overline{\hat{\mathcal{S}}}_{3U} = 0.28$. As indicated in Section 4.1, in practice the requirements may be rather less strict, since the values $\overline{\hat{\mathcal{S}}}_{3U} - \overline{\hat{\mathcal{S}}}_3$ are average differences and, therefore, the individual differences of interest $\hat{\mathcal{S}}_{3U} - \hat{\mathcal{S}}_3$ may be much larger.

It can be observed that, once again, $V_E(\hat{\mathcal{S}}_3) > V_A(\hat{\mathcal{S}}_3)$, except on two occasions, which indicates that the asymptotic formula itself almost always gives smaller values than the true variance. In particular, $-0.0266 \le V_E(\hat{\mathcal{S}}_3) - V_A(\hat{\mathcal{S}}_3) \le +0.1608$, with an average of 0.0266; this indicates that this underestimation may also be very important. Furthermore, the data indicate that $\overline{\hat V}(\hat{\mathcal{S}}_3) \ge V_E(\hat{\mathcal{S}}_3)$ and $\overline{\hat V}(\hat{\mathcal{S}}_{3U}) \ge V_E(\hat{\mathcal{S}}_{3U})$ on 75% of occasions, so that (on average) both the classic and the new variance estimators usually overestimate the real variance. It is also observed that $0.87 \le \overline{\hat V}(\hat{\mathcal{S}}_3)/V_E(\hat{\mathcal{S}}_3) \le 4.98$ and $0.96 \le \overline{\hat V}(\hat{\mathcal{S}}_{3U})/V_E(\hat{\mathcal{S}}_{3U}) \le 4.23$, ratios that decrease (tending to 1) when $n$ or $K$ increase. Note that, although $V_A(\hat{\mathcal{S}}_3)$ was seen above to almost always underestimate the real value of the variance $V_E(\hat{\mathcal{S}}_3)$, the opposite happens once the estimators are substituted into it: the estimated variances usually overestimate the true variance. Additionally, the formula used for $\hat V(\hat{\mathcal{S}}_{3U})$ behaves coherently with respect to the classic formula $\hat V(\hat{\mathcal{S}}_3)$.

5. The special case in which there are only two categories ($K = 2$)

The delta model presents a problem when there are only two categories: there are then more unknown parameters ($\pi_1$, $\pi_2$, $\pi_{11}$ and $\pi_{12}$) than cells free to take values ($p_{11}$, $p_{12}$ and $p_{21}$, for example, since $\sum_i\sum_j p_{ij} = 1$). To solve the problem, Martín Andrés and Femia Marzo (2004) proposed the following solution: (1) create a third virtual category with observed frequencies $x_{i3} = x_{3j} = 0$ ($\forall i, j$); (2) increase all of the data in the new $3 \times 3$ table by 0.5; (3) estimate the parameters based on the new table with $K = 3$; and (4) redefine the measures of agreement taking into account the new situation. This procedure has been found to provide coherent results: Martín Andrés and Femia Marzo (2004, 2005 and 2008), Ato et al. (2011), Shankar and Bangdiwala (2014) and Giammarino et al. (2021). Let $p^*_{ij}$ be the new observed proportions, and let $\pi^*_i$, $\pi^*_{ir}$ and $\Delta^*$ be the parameters of the delta model, all of them referring to the new $3 \times 3$ table. With the notation of Martín Andrés and Álvarez Hernández (2022), the measures of agreement for the original $2 \times 2$ table are defined as $\pi_i = \pi^*_i/(p^*_{1\cdot} + p^*_{2\cdot}) = \pi^*_i/(1 - p^*_3)$, where $p^*_3 = p^*_{3\cdot} = p^*_{\cdot 3}$ is the marginal proportion of the virtual category, $\Delta = \pi_1 + \pi_2$, and $\mathcal{S}_i = 2\pi^*_i/(p^*_{i\cdot} + p^*_{\cdot i}) = \mathcal{S}^*_i$, for $i = 1$ and 2. Their estimators and estimated variances are:

$$\hat\pi_i = \frac{\hat\pi^*_i}{1 - \hat p^*_3},\quad \hat V(\hat\pi_i) = \frac{(1 - \hat p^*_3)\,\hat\pi_i(1 - \hat\pi_i) - (1 - \hat\Delta^*)\hat X^*_i(1 - \hat X^* + \hat X^*_i)/(1 - \hat X^*)}{n(1 - \hat p^*_3)^2}, \tag{14}$$

$$\hat\Delta = \hat\pi_1 + \hat\pi_2,\quad \hat V(\hat\Delta) = \frac{(1 - \hat p^*_3)\,\hat\Delta(1 - \hat\Delta) - (1 - \hat\Delta^*)(\hat X^* - \hat X^*_3)(1 - \hat X^*_3)/(1 - \hat X^*)}{n(1 - \hat p^*_3)^2}. \tag{15}$$

In the case of $\mathcal{S}_i$, whose definition is the same as that of expression (2), expressions (5) for $\hat{\mathcal{S}}_i$ and (10) for $\hat V(\hat{\mathcal{S}}_i)$ remain valid. If we now apply to expressions (14) and (15) the corrections described in Sections 2 and 3, we obtain the following expressions for the type U estimators:

$$\hat\pi_{iU} = \frac{\hat\pi^*_{iU}}{1 - \hat p^*_3},\quad \hat V(\hat\pi_{iU}) = \frac{(1 - \hat p^*_3)\,\hat\pi_{iU}(1 - \hat\pi_{iU}) - (1 - \hat\Delta^*_U)\hat X^*_i(1 - \hat X^* + \hat X^*_i)/(1 - \hat X^*)}{n(1 - \hat p^*_3)^2}, \tag{16}$$

$$\hat\Delta_U = \hat\pi_{1U} + \hat\pi_{2U},\quad \hat V(\hat\Delta_U) = \frac{(1 - \hat p^*_3)\,\hat\Delta_U(1 - \hat\Delta_U) - (1 - \hat\Delta^*_U)(\hat X^* - \hat X^*_3)(1 - \hat X^*_3)/(1 - \hat X^*)}{n(1 - \hat p^*_3)^2}. \tag{17}$$

As in the previous paragraph, in the case of $\mathcal{S}_i$, expressions (7) for $\hat{\mathcal{S}}_{iU}$ and (12) for $\hat V(\hat{\mathcal{S}}_{iU})$ are valid.

6. Case in which there is a gold standard

Until now, it has been assumed that neither of the two raters is a gold standard. Let us now assume that the rater in rows is indeed a gold standard. In that case, two new parameters are also of interest. First, the raw probability that the rater will identify a subject as belonging to category $i$ when it actually belongs to that category is $F_i = p_{ii}/p_{i\cdot}$; that probability corrected for chance is therefore the conformity $\mathcal{F}_i = \pi_i/p_{i\cdot}$ of Martín Andrés and Femia Marzo (2005). Second, the raw probability that, when the rater responds $i$, the subject is actually in category $i$ (according to the gold standard) is $P_i = p_{ii}/p_{\cdot i}$; that probability corrected for chance is therefore the predictivity $\mathcal{P}_i = \pi_i/p_{\cdot i}$ of the same authors. Note that when $K = 2$, categories 1 and 2 are "sick" and "healthy", respectively, and rater 2 is a diagnostic test, $F_1$ = Sensitivity, $F_2$ = Specificity, $P_1$ = Positive Predictive Value and $P_2$ = Negative Predictive Value (of the diagnostic test, in all cases). The estimators of these parameters and their variances are those of Martín Andrés and Femia Marzo (2005). In the format of Martín Andrés and Álvarez Hernández (2022), they are:

$$\hat{\mathcal{F}}_i = \frac{\hat\pi_i}{\hat p_{i\cdot}},\quad \hat V(\hat{\mathcal{F}}_i) = \frac{\hat\pi_i(1 - \hat{\mathcal{F}}_i) + \hat H_i}{n\,\hat p_{i\cdot}^2},\quad \hat{\mathcal{P}}_i = \frac{\hat\pi_i}{\hat p_{\cdot i}} \quad\text{and}\quad \hat V(\hat{\mathcal{P}}_i) = \frac{\hat\pi_i(1 - \hat{\mathcal{P}}_i) + \hat H_i}{n\,\hat p_{\cdot i}^2},$$

where $\hat H_i$ is given by expression (11). Adapting those expressions to the current corrected estimators yields:

$$\hat{\mathcal{F}}_{iU} = \frac{\hat\pi_{iU}}{\hat p_{i\cdot}},\quad \hat V(\hat{\mathcal{F}}_{iU}) = \frac{\hat\pi_{iU}(1 - \hat{\mathcal{F}}_{iU}) + \hat H_{iU}}{n\,\hat p_{i\cdot}^2},\quad \hat{\mathcal{P}}_{iU} = \frac{\hat\pi_{iU}}{\hat p_{\cdot i}} \quad\text{and}\quad \hat V(\hat{\mathcal{P}}_{iU}) = \frac{\hat\pi_{iU}(1 - \hat{\mathcal{P}}_{iU}) + \hat H_{iU}}{n\,\hat p_{\cdot i}^2},$$

where $\hat H_{iU}$ is given by expression (13).

7. Examples

Table 6(a) contains the data from the classic example of Fleiss et al. (2003), in which two raters diagnose $n = 100$ subjects in $K = 3$ categories (Psychotic, Neurotic and Organic).
Table 6(b) specifies the classic estimates of the parameters $\pi_i$, $\mathcal{S}_i$ and $\Delta$ ($\hat\pi_i$, $\hat{\mathcal{S}}_i$ and $\hat\Delta$, respectively) and the new estimates ($\hat\pi_{iU}$, $\hat{\mathcal{S}}_{iU}$ and $\hat\Delta_U$, respectively). Some of the new estimates are higher than the classic ones by approximately 0.03; for example, for the global agreement we obtain $\hat\Delta = 0.687$ vs $\hat\Delta_U = 0.715$ (a 4% increase). With Cohen's classic kappa the differences are smaller: $\hat\kappa_C = 0.676$ vs $\hat\kappa_{CU} = 0.679$.

The differences are more notable with the same $n = 100$ but $K = 2$, as in Table 7(a); this table, already mentioned in the Introduction, also illustrates the advantage of the delta measure over the kappa measure. From the point of view of kappa, we obtain $\hat\kappa_C = -0.111$ vs $\hat\kappa_{CU} = -0.112$. These very similar values are highly surprising since, although the observed agreement is $80/100 = 80\%$, the two kappa coefficients indicate a negative degree of agreement; this is due to the fact that the marginals are very unbalanced in the same direction. From the point of view of delta, we obtain (Table 7(b)) $\hat\Delta = 0.582$ vs $\hat\Delta_U = 0.714$ (a 23% increase). Now the degree of agreement is notably larger and consistent with the high observed agreement, and the value of $\hat\Delta_U$ is notably higher than that of $\hat\Delta$. Something similar happens with the estimators of the coefficients $\pi_i$ and $\mathcal{S}_i$. Furthermore, if we assume that the row rater is a gold standard, the new coefficients of interest that allow us to evaluate the column rater are the conformity $\mathcal{F}_i$ and the predictivity $\mathcal{P}_i$. Their various estimates are shown in Table 7(c), and it can be seen that something similar to what is described above also occurs.

Finally, Table 8(a) contains the data from an example with $n = 30$ subjects ($n \le 50$) and $K = 4$ categories ($K > 3$).
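Returning briefly to the $2 \times 2$ data of Table 7(a), the device of Section 5 together with the corrections of Section 2 can be sketched in Python as follows. The MLE of the augmented $3 \times 3$ table is obtained by IPF (our substitute for Appendix A). The sample size used in $\hat E_i$ is the total of the augmented table, $n^* = n + 4.5$; this is our assumption, chosen because it matches the reported $\hat\Delta_U$ (using $n = 100$ instead gives a slightly larger value). Function names are hypothetical.

```python
import numpy as np


def delta_mle(table, tol=1e-13, max_iter=100_000):
    """ML fit of the delta model (K >= 3) by IPF on the off-diagonal cells."""
    x = np.asarray(table, dtype=float)
    p = x / x.sum()
    K = p.shape[0]
    off = np.where(np.eye(K, dtype=bool), 0.0, p)
    d1, d2 = off.sum(axis=1), off.sum(axis=0)
    a, b = np.ones(K), np.ones(K)
    for _ in range(max_iter):
        a_new = d1 / (b.sum() - b)
        b_new = d2 / (a_new.sum() - a_new)
        step = max(np.abs(a_new - a).max(), np.abs(b_new - b).max())
        a, b = a_new, b_new
        if step < tol:
            break
    return p, x.sum(), a.sum() * b.sum(), a / a.sum(), b / b.sum()


def delta_2x2(table2):
    """Delta-hat and Delta-hat_U for a 2 x 2 table through the K = 2 device of Section 5."""
    x = np.full((3, 3), 0.5)                      # steps (1)-(2): virtual category, +0.5
    x[:2, :2] += np.asarray(table2, dtype=float)
    p, n_star, B, pi1, pi2 = delta_mle(x)         # step (3): fit with K = 3, n* = n + 4.5
    prod = pi1 * pi2
    X_i = prod / (pi1 + pi2 - 1.0)
    X = X_i.sum()
    E_i = (prod - X_i * (X_i - X) / (1.0 - X)) / (n_star * B)   # expression (4), 3 x 3 table
    I_piU = prod.sum() - E_i.sum()
    D_star_U = (np.trace(p) - I_piU) / (1.0 - I_piU)
    pi_star = np.diag(p) - B * prod                              # pi*_i
    pi_star_U = np.diag(p) - (1.0 - D_star_U) * (prod - E_i)     # pi*_iU
    scale = 1.0 - p[2].sum()                      # step (4): 1 - p*_3
    return pi_star[:2].sum() / scale, pi_star_U[:2].sum() / scale


delta, delta_u = delta_2x2([[80, 10], [10, 0]])   # Nelson and Pepe (2000), Table 7(a)
print(f"{delta:.4f} {delta_u:.4f}")               # 0.5825 0.7143
```

These values agree with the $\hat\Delta = 0.582$ and $\hat\Delta_U = 0.714$ quoted above to within 0.001.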
Table 8(b) specifies the classic estimates of the parameters $\pi_i$, $\mathcal{S}_i$ and $\Delta$ ($\hat\pi_i$, $\hat{\mathcal{S}}_i$ and $\hat\Delta$, respectively) and the new estimates ($\hat\pi_{iU}$, $\hat{\mathcal{S}}_{iU}$ and $\hat\Delta_U$, respectively). Some of the new estimates are higher than the classic ones by 0.03 or more; for instance, for the global agreement we obtain $\hat\Delta = 0.182$ vs $\hat\Delta_U = 0.210$ (a 15% increase), and for the consistency in category 2 we obtain $\hat{\mathcal{S}}_2 = 0.074$ vs $\hat{\mathcal{S}}_{2U} = 0.115$ (a 55% increase). With Cohen's classic kappa the differences are smaller: $\hat\kappa_C = 0.197$ vs $\hat\kappa_{CU} = 0.202$.

8. Conclusions

There are different kappa type coefficients that measure the experimental degree of agreement between two raters who independently classify $n$ subjects in $K$ categories. The best known of them is the $\kappa_C$ coefficient (Cohen 1960). Martín Andrés and Álvarez Hernández (2025) demonstrated that the traditional estimator $\hat\kappa_C$ of $\kappa_C$ underestimates the true value of the parameter, proposed a new estimator $\hat\kappa_{CU}$ ($\ge \hat\kappa_C$ if $\hat\kappa_C \ge 0$) that improves the performance of $\hat\kappa_C$, and showed that the correction is advisable at least when $n \le 30$. Given that $\kappa_C$ performs poorly when the marginal distributions are very unbalanced in the same direction (Brennan and Prediger 1981, Agresti et al. 1995, Guggenmoos-Holzmann and Vonk 1998, and Nelson and Pepe 2000) and, additionally, that it does not always properly measure the degree of agreement in a specific category (Martín Andrés and Femia Marzo 2004, 2005), these same authors and Martín Andrés and Álvarez Hernández (2022) proposed a (delta) model that corrects both problems. This article has demonstrated, both theoretically and practically, that the estimators $\hat\pi_i$, $\hat{\mathcal{S}}_i$ and $\hat\Delta$ of the parameters of interest of the model ($\pi_i$, $\mathcal{S}_i$ and $\Delta$, respectively) usually underestimate the true values of those parameters, and it has proposed new estimators $\hat\pi_{iU}$, $\hat{\mathcal{S}}_{iU}$ and $\hat\Delta_U$, respectively, which improve the estimation.
The corrected estimators are more advisable the smaller $n$ and $K$ are, and they are particularly advisable when $n \le 50$ or $K \le 3$. The article also studies, through simulation, the behaviour of the variances of the previous estimators (classic and new). A first conclusion is that the known formulas for the asymptotic variances of the estimators $\hat\alpha_i$, $\hat S_i$, and $\hat\Delta$ underestimate their real variances. Nevertheless, the estimates of those variances overestimate the real variances, especially in the case of the estimators $\hat\alpha_i$ and $\hat\Delta$. The same happens with the new estimators $\hat\alpha_{iU}$, $\hat S_{iU}$, and $\hat\Delta_U$. Finally, the paper also considers two special cases: when $K = 2$ and when rater 1 is a gold standard. In the first case, the formulas for $K > 2$ must be adapted. In the second case, two new parameters are defined and estimated:

- the conformity of category $i$, that is, the capability of rater 2 to identify, not by chance, a subject belonging to category $i$;
- the predictivity of category $i$, that is, the reliability, not due to chance, of a response $i$ from rater 2.

When both cases occur ($K = 2$ and rater 1 is the gold standard), the new parameters are the chance-corrected versions of the classic parameters used to evaluate a diagnostic test. Conformity corresponds to sensitivity and specificity, and predictivity corresponds to the positive and negative predictive values. In both cases, the new "U"-type estimators perform better than the old uncorrected estimators of Martín Andrés and Femia Marzo (2005).

Acknowledgments

This research was supported by the Ministry of Science and Innovation (Spain), Grant PID2021-126095NB-I00 funded by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe". The authors have declared no conflict of interest.
APPENDIX A

Variance and covariance of the estimators of $\pi_{i1}$ and $\pi_{i2}$ under the delta model

Agresti (2013) particularized the multivariate delta method to the case of a multinomial distribution. Suppose the functions $f = f(\mathbf p)$ and $g = g(\mathbf p)$ are estimated by $\hat f = f(\hat{\mathbf p})$ and $\hat g = g(\hat{\mathbf p})$, with $\mathbf p = \{p_{ij}\}$ and $\hat{\mathbf p} = \{\hat p_{ij}\}$. Writing $f_{ij} = \partial f/\partial p_{ij}$ and $g_{ij} = \partial g/\partial p_{ij}$, Agresti showed that

$$n\,\mathrm{Cov}(\hat f,\hat g) \approx \sum_{i=1}^{K}\sum_{j=1}^{K} f_{ij}g_{ij}p_{ij} - \left(\sum_{i=1}^{K}\sum_{j=1}^{K} f_{ij}p_{ij}\right)\left(\sum_{i=1}^{K}\sum_{j=1}^{K} g_{ij}p_{ij}\right). \qquad \text{(A1)}$$

Moreover, Martín Andrés and Álvarez Hernández (2022) demonstrated that the estimators of the parameters of the delta model are obtained by solving the following $K+1$ equations in the $K+1$ unknowns $(\lambda_s, B)$, with $s = 1, 2, \ldots, K$:

$$\lambda_s^2 + (d_{s1} + d_{s2} - B)\,\lambda_s + d_{s1}d_{s2} = 0 \quad (s = 1,\ldots,K), \qquad B = \sum_{s=1}^{K}\lambda_s + d.$$

Here $d_{s1} = p_{s\cdot} - p_{ss}$ and $d_{s2} = p_{\cdot s} - p_{ss}$ are, respectively, the observed disagreements of raters 1 and 2 when classifying an individual in category $s$, and $d = \sum_{s=1}^{K} d_{s1} = \sum_{s=1}^{K} d_{s2}$ is the total proportion of observed disagreements. In the previous expression it must be assumed that $\lambda_s = 0$ when $d_{sr} = 0$ for $r = 1$ or $r = 2$. Once the values of $\lambda_s$ and $B = 1 - \Delta$ have been determined, the maximum likelihood estimators of the parameters of the model are:

$$\hat\Delta = 1 - \hat B,\qquad \hat\alpha_s = p_{ss} - \hat\lambda_s,\qquad \hat\pi_{s1} = \frac{\hat\lambda_s + d_{s1}}{\hat B},\qquad\text{and}\qquad \hat\pi_{s2} = \frac{\hat\lambda_s + d_{s2}}{\hat B}. \qquad \text{(A2)}$$

Note that if $d_{s1} = 0$, then $\hat\alpha_s = p_{ss}$, $\hat\pi_{s1} = 0$, and $\hat\pi_{s2} = d_{s2}/\hat B$. The case $d_{s2} = 0$ is analogous. In that same article the following derivatives were also obtained:

12 1 s j ss s i si j s Bs i js ij s s X, X Bp, 12 1 1 1 ij j i ij ij i j ij X X B p Xp, and 12 12 1 11 1 sj j ss i i si sj ij s ij s ij i j i j X X X, pX   (A3)

with $X$ and $X_i$ as in Section 2. From this it follows that

12 11 j i ij ij s s ij X X, and 12 1 2 1 11 1 sj j ss i i ij s ij s ij s s i j X dX X, dp X   (A4)

since $d\lambda_s/dp_{ij} = \partial\lambda_s/\partial p_{ij} + (\partial\lambda_s/\partial\Delta)(\partial\Delta/\partial p_{ij})$ = ij(s) X s ij.
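The system above can be solved numerically in several ways. One route starts from the fact that expressions (A2) are consistent with the cell probabilities $p_{ij} = B\pi_{i1}\pi_{j2} + \alpha_i\,\mathbb 1(i=j)$. Under that model, the free $\alpha_i$ saturate the diagonal, and the off-diagonal cells follow a quasi-independence model that can be fitted by iterative proportional fitting (IPF). At convergence, $\lambda_s = a_s b_s$ satisfies the $K+1$ equations above.

The Python sketch below is our illustration, not the authors' algorithm. Three elements are assumptions: the function name `delta_mle`, the IPF route, and the consistency formula $S_i = 2\alpha_i/(p_{i\cdot}+p_{\cdot i})$. On the data of Table 6(a) it returns $\hat\Delta = 0.6875$, $\hat\alpha = (0.550, 0.0375, 0.100)$ and $\hat S = (0.6875, 0.500, 0.800)$. These agree with the classic estimates in Table 6(b) up to rounding. The corrected "U" estimators are not computed.

```python
import numpy as np

def delta_mle(x, tol=1e-12, max_iter=10_000):
    """Classic (uncorrected) delta-model estimates from a K x K table of counts.

    Sketch assuming p_ij = B*pi_i1*pi_j2 + alpha_i*[i == j], with B = 1 - Delta.
    The off-diagonal cells are fitted by quasi-independence, using iterative
    proportional fitting on the factors a_i = B*pi_i1 and b_j = pi_j2.
    """
    p = np.asarray(x, dtype=float)
    p = p / p.sum()
    K = p.shape[0]
    diag = np.diag(p).copy()
    rows, cols = p.sum(axis=1), p.sum(axis=0)
    d1, d2 = rows - diag, cols - diag          # d_s1 and d_s2: observed disagreements
    if d1.sum() == 0:
        raise ValueError("no off-diagonal counts: the estimates are degenerate")
    a = np.ones(K)
    b = np.full(K, 1.0 / K)
    for _ in range(max_iter):
        a_old, b_old = a, b
        col_den = a.sum() - a                  # sum of a_i over i != j
        b = np.divide(d2, col_den, out=np.zeros(K), where=col_den > 0)
        b = b / b.sum()                        # keep sum_j pi_j2 = 1
        row_den = 1.0 - b                      # sum of b_j over j != i
        a = np.divide(d1, row_den, out=np.zeros(K), where=row_den > 0)
        if max(np.abs(a - a_old).max(), np.abs(b - b_old).max()) < tol:
            break
    lam = a * b                                # lambda_s = B * pi_s1 * pi_s2
    alpha = diag - lam                         # alpha_s = p_ss - lambda_s, as in (A2)
    delta = 1.0 - a.sum()                      # Delta = 1 - B, with B = sum_i a_i
    S = 2.0 * alpha / (rows + cols)            # consistency (assumed definition)
    return delta, alpha, S

delta, alpha, S = delta_mle([[75, 1, 4], [5, 4, 1], [0, 0, 10]])   # Table 6(a)
print(round(float(delta), 4), np.round(alpha, 4), np.round(S, 4))
```

The iteration keeps $\sum_j b_j = 1$. As a result, $a_s$ equals $\hat\lambda_s + d_{s1} = \hat B\hat\pi_{s1}$ and $b_s$ equals $\hat\pi_{s2}$, so the quantities in (A2) can be read directly from the fitted factors. When $d_{s1} = 0$, the sketch sets $a_s = 0$ and hence $\lambda_s = 0$, as required above.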
Let $\hat f = \hat\pi_{s1}$ and $\hat g = \hat\pi_{s2}$ be the functions referred to at the beginning of this Appendix. Then, through expressions (A2), $f_{ij} = \partial\pi_{s1}/\partial p_{ij} = [B\{d\lambda_s/dp_{ij} + \partial d_{s1}/\partial p_{ij}\} + (\lambda_s + d_{s1})\,\partial\Delta/\partial p_{ij}]/B^2$. Here $d_{s1} = \sum_j p_{sj} - p_{ss}$ and $\partial d_{s1}/\partial p_{ij} = \delta_{si}(1 - \delta_{ij})$, where $\delta$ denotes Kronecker's delta. We now substitute the second expression of (A3) and expression (A4), and use the fact that $\lambda_s + d_{s1} = B\pi_{s1}$ (third formula of expression (A2)). Operating, one obtains the following (and similarly for $g_{ij}$):

11 11 12 1 1 11 and 11 ij j ss i ij si s sj s si j X X f TX 22 22 12 1 1 11 11 ij j ss i ij si s sj s si j X X g. TX

where $T_s = \pi_{s1} + \pi_{s2} - 1$. Performing operations, it follows that $\sum_i\sum_j f_{ij}p_{ij} = \sum_i\sum_j g_{ij}p_{ij} = 0$ and that

12 1 11 ss ij ij ij s s ij XX X fg p X   (A5)

With this, applying expression (A1), it follows that $n\,\mathrm{Cov}(\hat f,\hat g) \approx \sum_{i=1}^{K}\sum_{j=1}^{K} f_{ij}g_{ij}p_{ij}$. Here $\mathrm{Cov}(\hat f,\hat g) = \mathrm{Cov}(\hat\pi_{s1},\hat\pi_{s2}) = E(\hat\pi_{s1}\hat\pi_{s2}) - E(\hat\pi_{s1})E(\hat\pi_{s2}) \approx E(\hat\pi_{s1}\hat\pi_{s2}) - \pi_{s1}\pi_{s2}$, since the $\hat\pi_{sr}$ are maximum likelihood estimators. Therefore, expression (3) of Section 2 follows from expression (A5). Additionally:

1 11 ss r ss r sr XX ˆ V nX

Section 2 proposed different estimators, all based on expression (4). An approximately unbiased estimator, which is an alternative to expression (4), is the following:

12 12 1 where and 1 1 11 ii ii i ii i AC ˆˆ ˆ ˆ XX X ˆˆ A ˆˆ AC ˆ ˆ ˆ ˆ C n nX

This leads to the following alternative estimators to those of expressions (6) and (7):

AC ˆ I A ˆ I C, where 22 11 i i i i ˆˆ XX ˆˆ AA, ˆ ˆ nX 0 1 AC AC AC ˆˆ I I ˆ ˆ I, 12 1 ii i iAC ii AC ˆ ˆˆ A ˆ ˆ p ˆ C, and 2 iAC iAC ii ˆ ˆ p p.

We have found that these estimators are only slightly worse than those reviewed in Section 2. Their approximate variances are the same as in expressions (12), replacing each "U" estimator with the corresponding "AC" estimator.

REFERENCES

Agresti A (2013). Categorical Data Analysis (3rd edn). New Jersey: John Wiley & Sons. ISBN: 978-0-470-46363-5.
Agresti A, Ghosh A and Bini M (1995). Raking Kappa: Describing Potential Impact of Marginal Distributions on Measures of Agreement. Biometrical Journal 37, 811-820.
Ato M, López JJ and Benavente A (2011). A simulation study of rater agreement measures with 2×2 contingency tables. Psicologica 32, 385-402.
Brennan RL and Prediger DJ (1981). Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement 41, 687-699.
Cicchetti DV and Feinstein AR (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology 43, 551-558.
Cohen J (1960). A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement 20, 37-46.
Fleiss JL, Levin B and Paik MC (2003). Statistical Methods for Rates and Proportions (3rd edn). New York: John Wiley & Sons.
Giammarino M, Mattiello S, Battini M, Quatto P, Battaglini LM, Vieira ACL, Stilwell G and Renna M (2021). Evaluation of inter-observer reliability of animal welfare indicators: which is the best index to use? Animals 11, 1445.
Guggenmoos-Holzmann I and Vonk R (1998). Kappa-Like Indices of Observer Agreement Viewed From a Latent Class Perspective. Statistics in Medicine 17, 797-812.
Gwet KL (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology 61(1), 29-68.
Kramer MS and Feinstein AR (1981). Clinical Biostatistics. LIV. The Biostatistics of Concordance. Clinical Pharmacology and Therapeutics 29(1), 111-123. DOI: 10.1038/clpt.1981.18.
Krippendorff K (2004). Measuring the reliability of qualitative text analysis data. Quality and Quantity 38, 787-800.
Martín Andrés A and Álvarez Hernández M (2022). Multi-rater delta: extending the delta nominal measure of agreement between two raters to many raters. Journal of Statistical Computation and Simulation 92(9), 1877-1897. DOI: 10.1080/00949655.2021.2013485.
Martín Andrés A and Álvarez Hernández M (2025). Estimators of various kappa coefficients based on the unbiased estimator of the expected index of agreements. Advances in Data Analysis and Classification 19, 177-207 (published online: 06 March 2024). DOI: 10.1007/s11634-024-00581-x.
Martín Andrés A and Femia Marzo P (2004). Delta: a new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology 57(1), 1-19.
Martín Andrés A and Femia Marzo P (2005). Chance-corrected measures of reliability and validity in K×K tables. Statistical Methods in Medical Research 14, 473-492.
Martín Andrés A and Femia Marzo P (2008). Chance-corrected measures of reliability and validity in 2×2 tables. Communications in Statistics - Theory and Methods 37, 760-772. DOI: 10.1080/03610920701669884. Corrigendum in 39, 3009 (2010).
Nelson JC and Pepe MS (2000). Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research 9, 475-496.
Scott WA (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly 19, 321-325.
Shankar V and Bangdiwala SI (2014). Observer agreement paradoxes in 2x2 tables: comparison of agreement measures. BMC Medical Research Methodology 14, 100.

Table 1: Distribution of the responses when two observers classify n subjects in K categories

(a) Observed absolute frequencies x_ij

                   Rater 2
Rater 1     1      ...   j      ...   K      Total
1           x_11   ...   x_1j   ...   x_1K   x_1.
...         ...          ...          ...    ...
i           x_i1   ...   x_ij   ...   x_iK   x_i.
...         ...          ...          ...    ...
K           x_K1   ...   x_Kj   ...   x_KK   x_K.
Total       x_.1   ...   x_.j   ...   x_.K   n

(b) Probability of a subject being classified in the indicated cell

                   Rater 2
Rater 1     1      ...   j      ...   K      Total
1           p_11   ...   p_1j   ...   p_1K   p_1.
...         ...          ...          ...    ...
i           p_i1   ...   p_ij   ...   p_iK   p_i.
...         ...          ...          ...    ...
K           p_K1   ...   p_Kj   ...   p_KK   p_K.
Total       p_.1   ...   p_.j   ...   p_.K   1

Table 2: Each line of the table indicates the population values to be used (values of K, n, α_i, and π_ir). The first column (id) is the number of the setting; "-" means that the category does not exist.

id  K    n  α_1   α_2   α_3   α_4   α_5   π_11  π_21  π_31  π_41  π_51  π_12  π_22  π_32  π_42  π_52
 1  3   30  0.05  0.15  0.2   -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
 2  3   30  0.05  0.15  0.2   -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
 3  3   30  0.13  0.13  0.14  -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
 4  3   30  0.13  0.13  0.14  -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
 5  3   50  0.05  0.15  0.2   -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
 6  3   50  0.05  0.15  0.2   -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
 7  3   50  0.13  0.13  0.14  -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
 8  3   50  0.13  0.13  0.14  -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
 9  3  100  0.05  0.15  0.2   -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
10  3  100  0.05  0.15  0.2   -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
11  3  100  0.13  0.13  0.14  -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
12  3  100  0.13  0.13  0.14  -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
13  3   30  0.15  0.25  0.4   -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
14  3   30  0.15  0.25  0.4   -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
15  3   30  0.26  0.26  0.28  -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
16  3   30  0.26  0.26  0.28  -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
17  3   50  0.15  0.25  0.4   -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
18  3   50  0.15  0.25  0.4   -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
19  3   50  0.26  0.26  0.28  -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
20  3   50  0.26  0.26  0.28  -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
21  3  100  0.15  0.25  0.4   -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
22  3  100  0.15  0.25  0.4   -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
23  3  100  0.26  0.26  0.28  -     -     0.2   0.3   0.5   -     -     0.2   0.3   0.5   -     -
24  3  100  0.26  0.26  0.28  -     -     0.2   0.3   0.5   -     -     0.5   0.3   0.2   -     -
25  5   30  0.05  0.05  0.05  0.1   0.15  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
26  5   30  0.05  0.05  0.05  0.1   0.15  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
27  5   30  0.08  0.08  0.08  0.08  0.08  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
28  5   30  0.08  0.08  0.08  0.08  0.08  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
29  5   50  0.05  0.05  0.05  0.1   0.15  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
30  5   50  0.05  0.05  0.05  0.1   0.15  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
31  5   50  0.08  0.08  0.08  0.08  0.08  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
32  5   50  0.08  0.08  0.08  0.08  0.08  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
33  5  100  0.05  0.05  0.05  0.1   0.15  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
34  5  100  0.05  0.05  0.05  0.1   0.15  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
35  5  100  0.08  0.08  0.08  0.08  0.08  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
36  5  100  0.08  0.08  0.08  0.08  0.08  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
37  5   30  0.1   0.15  0.15  0.2   0.2   0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
38  5   30  0.1   0.15  0.15  0.2   0.2   0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
39  5   30  0.16  0.16  0.16  0.16  0.16  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
40  5   30  0.16  0.16  0.16  0.16  0.16  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
41  5   50  0.1   0.15  0.15  0.2   0.2   0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
42  5   50  0.1   0.15  0.15  0.2   0.2   0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
43  5   50  0.16  0.16  0.16  0.16  0.16  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
44  5   50  0.16  0.16  0.16  0.16  0.16  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
45  5  100  0.1   0.15  0.15  0.2   0.2   0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
46  5  100  0.1   0.15  0.15  0.2   0.2   0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1
47  5  100  0.16  0.16  0.16  0.16  0.16  0.1   0.15  0.2   0.25  0.3   0.1   0.15  0.2   0.25  0.3
48  5  100  0.16  0.16  0.16  0.16  0.16  0.1   0.15  0.2   0.25  0.3   0.3   0.25  0.2   0.15  0.1

Table 3: Assessment of the estimators Δ̂ and Δ̂_U, and of their variances, based on 10,000 simulations of each of the settings in Table 2

id  K    n  Δ       Δ̂       Δ̂_U     V_A(Δ̂)  V_E(Δ̂)  V̂(Δ̂)   V_E(Δ̂_U) V̂(Δ̂_U)
 1  3   30  0.4     0.3127  0.3824  0.0280  0.0509  0.1214  0.0413  0.1091
 2  3   30  0.4     0.3416  0.3829  0.0174  0.0321  0.0641  0.0274  0.0596
 3  3   30  0.4     0.3171  0.3867  0.0280  0.0485  0.1179  0.0395  0.1059
 4  3   30  0.4     0.3373  0.3786  0.0174  0.0312  0.0653  0.0266  0.0607
 5  3   50  0.4     0.3317  0.3752  0.0168  0.0404  0.0896  0.0330  0.0822
 6  3   50  0.4     0.3692  0.3905  0.0105  0.0161  0.0248  0.0147  0.0238
 7  3   50  0.4     0.3238  0.3678  0.0168  0.0448  0.0976  0.0367  0.0896
 8  3   50  0.4     0.3675  0.3892  0.0105  0.0164  0.0262  0.0149  0.0252
 9  3  100  0.4     0.3664  0.3854  0.0084  0.0206  0.0302  0.0182  0.0288
10  3  100  0.4     0.3873  0.3965  0.0052  0.0060  0.0066  0.0059  0.0066
11  3  100  0.4     0.3647  0.3838  0.0084  0.0204  0.0299  0.0181  0.0286
12  3  100  0.4     0.3892  0.3984  0.0052  0.0057  0.0065  0.0056  0.0065
13  3   30  0.8     0.7088  0.7513  0.0120  0.0101  0.0224  0.0077  0.0196
14  3   30  0.8     0.7083  0.7446  0.0085  0.0096  0.0223  0.0074  0.0199
15  3   30  0.8     0.7095  0.7521  0.0120  0.0098  0.0223  0.0075  0.0195
16  3   30  0.8     0.7079  0.7442  0.0085  0.0093  0.0222  0.0073  0.0198
17  3   50  0.8     0.7479  0.7815  0.0072  0.0091  0.0189  0.0071  0.0166
18  3   50  0.8     0.7543  0.7788  0.0051  0.0077  0.0156  0.0063  0.0140
19  3   50  0.8     0.7497  0.7833  0.0072  0.0093  0.0189  0.0073  0.0166
20  3   50  0.8     0.7540  0.7786  0.0051  0.0076  0.0159  0.0062  0.0143
21  3  100  0.8     0.7708  0.7923  0.0036  0.0067  0.0145  0.0055  0.0130
22  3  100  0.8     0.7816  0.7937  0.0025  0.0039  0.0071  0.0034  0.0066
23  3  100  0.8     0.7698  0.7913  0.0036  0.0070  0.0150  0.0058  0.0134
24  3  100  0.8     0.7809  0.7931  0.0025  0.0040  0.0073  0.0035  0.0068
25  5   30  0.4     0.3629  0.3802  0.0143  0.0186  0.0227  0.0173  0.0221
26  5   30  0.4     0.3797  0.3908  0.0125  0.0140  0.0152  0.0137  0.0150
27  5   30  0.4     0.3699  0.3873  0.0143  0.0182  0.0233  0.0171  0.0227
28  5   30  0.4     0.3824  0.3934  0.0125  0.0140  0.0151  0.0137  0.0150
29  5   50  0.4     0.3873  0.3967  0.0086  0.0094  0.0103  0.0092  0.0102
30  5   50  0.4     0.3934  0.3995  0.0075  0.0076  0.0081  0.0076  0.0080
31  5   50  0.4     0.3875  0.3970  0.0086  0.0095  0.0103  0.0093  0.0102
32  5   50  0.4     0.3919  0.3980  0.0075  0.0080  0.0081  0.0079  0.0081
33  5  100  0.4     0.3952  0.3997  0.0043  0.0045  0.0046  0.0045  0.0045
34  5  100  0.4     0.3982  0.4010  0.0038  0.0038  0.0039  0.0038  0.0039
35  5  100  0.4     0.3941  0.3985  0.0043  0.0045  0.0046  0.0045  0.0045
36  5  100  0.4     0.3971  0.4000  0.0038  0.0038  0.0039  0.0038  0.0039
37  5   30  0.8     0.6347  0.6912  0.0074  0.0384  0.0993  0.0173  0.0414
38  5   30  0.8     0.6836  0.7175  0.0068  0.0266  0.0549  0.0133  0.0269
39  5   30  0.8     0.6386  0.6968  0.0074  0.0376  0.1036  0.0165  0.0426
40  5   30  0.8     0.6885  0.7215  0.0068  0.0253  0.0534  0.0125  0.0260
41  5   50  0.8     0.7650  0.7816  0.0045  0.0087  0.0163  0.0063  0.0107
42  5   50  0.8     0.7780  0.7877  0.0041  0.0061  0.0088  0.0050  0.0072
43  5   50  0.8     0.7630  0.7797  0.0045  0.0085  0.0163  0.0062  0.0105
44  5   50  0.8     0.7784  0.7880  0.0041  0.0058  0.0088  0.0048  0.0071
45  5  100  0.8     0.7933  0.7983  0.0022  0.0025  0.0029  0.0024  0.0028
46  5  100  0.8     0.7958  0.7991  0.0021  0.0021  0.0023  0.0021  0.0023
47  5  100  0.8     0.7935  0.7986  0.0022  0.0024  0.0030  0.0024  0.0029
48  5  100  0.8     0.7960  0.7993  0.0021  0.0021  0.0023  0.0021  0.0023

(1) Δ = Σα_i is the true value of the delta parameter; Δ̂ and Δ̂_U are the sample averages of the 10,000 estimates Δ̂ and Δ̂_U, respectively.
(2) V_A(Δ̂) is the asymptotic variance, computed from the true values of the parameters.
(3) V_E(Δ̂) and V_E(Δ̂_U) are the sample variances of the 10,000 estimates Δ̂ and Δ̂_U, respectively. These sample variances are assumed to be approximately equal to the true variances of the estimators Δ̂ and Δ̂_U, respectively.
(4) V̂(Δ̂) and V̂(Δ̂_U) are the sample averages of the 10,000 variance estimates V̂(Δ̂) and V̂(Δ̂_U), respectively.

Table 4: Assessment of the estimators of α_3 (α̂_3 and α̂_3U), and of their variances, based on 10,000 simulations of each of the settings in Table 2

id  K    n  α_3     α̂_3     α̂_3U    V_A(α̂_3) V_E(α̂_3) V_E(α̂_3U) V̂(α̂_3) V̂(α̂_3U)
 1  3   30  0.2     0.1402  0.1815  0.0312  0.0518  0.0406  0.1133  0.1018
 2  3   30  0.2     0.1785  0.1930  0.0108  0.0196  0.0164  0.0305  0.0286
 3  3   30  0.14    0.0859  0.1270  0.0298  0.0476  0.0371  0.1085  0.0975
 4  3   30  0.14    0.1146  0.1296  0.0095  0.0186  0.0152  0.0312  0.0291
 5  3   50  0.2     0.1469  0.1742  0.0187  0.0450  0.0365  0.0875  0.0803
 6  3   50  0.2     0.1865  0.1942  0.0065  0.0105  0.0094  0.0139  0.0134
 7  3   50  0.14    0.0783  0.1063  0.0179  0.0500  0.0404  0.0961  0.0883
 8  3   50  0.14    0.1285  0.1361  0.0057  0.0093  0.0083  0.0123  0.0119
 9  3  100  0.2     0.1709  0.1829  0.0094  0.0236  0.0209  0.0314  0.0300
10  3  100  0.2     0.1953  0.1986  0.0032  0.0040  0.0038  0.0040  0.0040
11  3  100  0.14    0.1080  0.1201  0.0090  0.0230  0.0203  0.0308  0.0294
12  3  100  0.14    0.1349  0.1382  0.0029  0.0035  0.0033  0.0036  0.0036
13  3   30  0.4     0.3586  0.3786  0.0166  0.0114  0.0097  0.0201  0.0184
14  3   30  0.4     0.3582  0.3707  0.0098  0.0095  0.0085  0.0142  0.0133
15  3   30  0.28    0.2516  0.2716  0.0153  0.0104  0.0088  0.0187  0.0171
16  3   30  0.28    0.2480  0.2607  0.0086  0.0085  0.0075  0.0131  0.0123
17  3   50  0.4     0.3742  0.3918  0.0100  0.0101  0.0085  0.0181  0.0163
18  3   50  0.4     0.3808  0.3893  0.0059  0.0071  0.0063  0.0099  0.0093
19  3   50  0.28    0.2615  0.2791  0.0092  0.0094  0.0078  0.0172  0.0155
20  3   50  0.28    0.2645  0.2732  0.0051  0.0064  0.0056  0.0093  0.0088
21  3  100  0.4     0.3801  0.3930  0.0050  0.0080  0.0067  0.0148  0.0134
22  3  100  0.4     0.3925  0.3968  0.0029  0.0038  0.0035  0.0049  0.0047
23  3  100  0.28    0.2597  0.2727  0.0046  0.0077  0.0064  0.0147  0.0134
24  3  100  0.28    0.2727  0.2771  0.0026  0.0034  0.0031  0.0046  0.0044
25  5   30  0.05    0.0440  0.0471  0.0029  0.0034  0.0032  0.0038  0.0038
26  5   30  0.05    0.0462  0.0487  0.0028  0.0033  0.0031  0.0036  0.0035
27  5   30  0.08    0.0737  0.0768  0.0037  0.0042  0.0040  0.0045  0.0045
28  5   30  0.08    0.0761  0.0785  0.0037  0.0042  0.0040  0.0043  0.0044
29  5   50  0.05    0.0477  0.0494  0.0017  0.0019  0.0018  0.0019  0.0019
30  5   50  0.05    0.0483  0.0497  0.0017  0.0018  0.0018  0.0019  0.0019
31  5   50  0.08    0.0795  0.0811  0.0022  0.0023  0.0023  0.0024  0.0024
32  5   50  0.08    0.0784  0.0798  0.0022  0.0024  0.0024  0.0024  0.0024
33  5  100  0.05    0.0493  0.0500  0.0009  0.0009  0.0009  0.0009  0.0009
34  5  100  0.05    0.0497  0.0503  0.0009  0.0009  0.0009  0.0009  0.0009
35  5  100  0.08    0.0788  0.0796  0.0011  0.0011  0.0011  0.0011  0.0011
36  5  100  0.08    0.0794  0.0801  0.0011  0.0011  0.0011  0.0011  0.0011
37  5   30  0.15    0.1199  0.1305  0.0047  0.0071  0.0047  0.0262  0.0118
38  5   30  0.15    0.1277  0.1347  0.0047  0.0066  0.0048  0.0175  0.0095
39  5   30  0.16    0.1289  0.1397  0.0049  0.0074  0.0049  0.0276  0.0122
40  5   30  0.16    0.1358  0.1428  0.0049  0.0066  0.0049  0.0180  0.0096
41  5   50  0.15    0.1437  0.1467  0.0028  0.0032  0.0029  0.0050  0.0038
42  5   50  0.15    0.1462  0.1483  0.0028  0.0032  0.0030  0.0037  0.0035
43  5   50  0.16    0.1527  0.1556  0.0029  0.0032  0.0030  0.0046  0.0038
44  5   50  0.16    0.1557  0.1578  0.0029  0.0032  0.0031  0.0040  0.0036
45  5  100  0.15    0.1491  0.1499  0.0014  0.0015  0.0014  0.0015  0.0015
46  5  100  0.15    0.1500  0.1507  0.0014  0.0014  0.0014  0.0015  0.0015
47  5  100  0.16    0.1592  0.1601  0.0015  0.0015  0.0015  0.0015  0.0015
48  5  100  0.16    0.1588  0.1596  0.0015  0.0015  0.0015  0.0015  0.0015

(1) α_3 is the true value of the agreement in category 3; α̂_3 and α̂_3U are the sample averages of the 10,000 estimates α̂_3 and α̂_3U, respectively.
(2) V_A(α̂_3) is the asymptotic variance, computed from the true values of the parameters.
(3) V_E(α̂_3) and V_E(α̂_3U) are the sample variances of the 10,000 estimates α̂_3 and α̂_3U, respectively. These sample variances are assumed to be approximately equal to the true variances of the corresponding estimators.
(4) V̂(α̂_3) and V̂(α̂_3U) are the sample averages of the 10,000 variance estimates V̂(α̂_3) and V̂(α̂_3U), respectively.

Table 5: Evaluation of the estimators of S_3 (Ŝ_3 and Ŝ_3U), and of their variances, based on 10,000 simulations of each of the settings in Table 2
id  K    n  S_3     Ŝ_3     Ŝ_3U    V_A(Ŝ_3) V_E(Ŝ_3) V_E(Ŝ_3U) V̂(Ŝ_3) V̂(Ŝ_3U)
 1  3   30  0.4000  0.2756  0.3590  0.1177  0.2093  0.1602  0.4471  0.4012
 2  3   30  0.4878  0.4296  0.4645  0.0494  0.0999  0.0803  0.1512  0.1408
 3  3   30  0.3182  0.1896  0.2833  0.1486  0.2483  0.1897  0.5400  0.4857
 4  3   30  0.4000  0.3228  0.3647  0.0644  0.1335  0.1061  0.2092  0.1954
 5  3   50  0.4000  0.2930  0.3470  0.0706  0.1755  0.1410  0.3293  0.3021
 6  3   50  0.4878  0.4521  0.4706  0.0297  0.0523  0.0456  0.0665  0.0637
 7  3   50  0.3182  0.1764  0.2394  0.0891  0.2499  0.2008  0.4632  0.4256
 8  3   50  0.4000  0.3632  0.3846  0.0387  0.0651  0.0569  0.0818  0.0787
 9  3  100  0.4000  0.3416  0.3653  0.0353  0.0899  0.0793  0.1167  0.1115
10  3  100  0.4878  0.4741  0.4819  0.0148  0.0189  0.0179  0.0190  0.0187
11  3  100  0.3182  0.2458  0.2730  0.0446  0.1122  0.0988  0.1461  0.1395
12  3  100  0.4000  0.3836  0.3929  0.0193  0.0243  0.0230  0.0244  0.0242
13  3   30  0.8000  0.7378  0.7797  0.0430  0.0283  0.0196  0.0684  0.0587
14  3   30  0.8511  0.7760  0.8036  0.0145  0.0204  0.0143  0.0442  0.0386
15  3   30  0.7368  0.6608  0.7153  0.0735  0.0469  0.0325  0.1118  0.0965
16  3   30  0.8000  0.6995  0.7366  0.0258  0.0356  0.0250  0.0769  0.0675
17  3   50  0.8000  0.7556  0.7916  0.0258  0.0286  0.0208  0.0627  0.0544
18  3   50  0.8511  0.8140  0.8324  0.0087  0.0160  0.0118  0.0290  0.0258
19  3   50  0.7368  0.6843  0.7313  0.0441  0.0479  0.0349  0.1036  0.0902
20  3   50  0.8000  0.7507  0.7755  0.0155  0.0295  0.0217  0.0529  0.0471
21  3  100  0.8000  0.7601  0.7859  0.0129  0.0257  0.0200  0.0529  0.0472
22  3  100  0.8511  0.8361  0.8452  0.0044  0.0087  0.0071  0.0132  0.0122
23  3  100  0.7368  0.6825  0.7167  0.0221  0.0453  0.0352  0.0925  0.0827
24  3  100  0.8000  0.7772  0.7897  0.0077  0.0159  0.0130  0.0244  0.0226
25  5   30  0.2941  0.2444  0.2614  0.0800  0.0933  0.0854  0.0809  0.0818
26  5   30  0.2941  0.2551  0.2691  0.0797  0.0899  0.0834  0.0794  0.0770
27  5   30  0.4000  0.3565  0.3713  0.0654  0.0807  0.0744  0.0702  0.0703
28  5   30  0.4000  0.3650  0.3768  0.0652  0.0768  0.0722  0.0661  0.0663
29  5   50  0.2941  0.2697  0.2794  0.0480  0.0529  0.0508  0.0462  0.0465
30  5   50  0.2941  0.2735  0.2814  0.0478  0.0508  0.0491  0.0456  0.0459
31  5   50  0.4000  0.3881  0.3962  0.0392  0.0426  0.0409  0.0392  0.0391
32  5   50  0.4000  0.3800  0.3867  0.0391  0.0431  0.0417  0.0391  0.0391
33  5  100  0.2941  0.2853  0.2899  0.0240  0.0251  0.0246  0.0235  0.0235
34  5  100  0.2941  0.2879  0.2917  0.0239  0.0244  0.0240  0.0234  0.0234
35  5  100  0.4000  0.3897  0.3936  0.0196  0.0206  0.0202  0.0198  0.0197
36  5  100  0.4000  0.3920  0.3953  0.0196  0.0202  0.0199  0.0196  0.0195
37  5   30  0.7895  0.6061  0.6614  0.0317  0.1285  0.0588  0.6120  0.2359
38  5   30  0.7895  0.6474  0.6841  0.0316  0.1070  0.0558  0.3840  0.1761
39  5   30  0.8000  0.6288  0.6827  0.0287  0.1232  0.0545  0.6135  0.2306
40  5   30  0.8000  0.6643  0.6986  0.0287  0.0968  0.0515  0.3776  0.1622
41  5   50  0.7895  0.7491  0.7650  0.0190  0.0363  0.0258  0.0921  0.0501
42  5   50  0.7895  0.7591  0.7696  0.0190  0.0331  0.0272  0.0434  0.0374
43  5   50  0.8000  0.7617  0.7763  0.0172  0.0326  0.0239  0.0615  0.0423
44  5   50  0.8000  0.7706  0.7810  0.0172  0.0288  0.0235  0.0438  0.0343
45  5  100  0.7895  0.7816  0.7863  0.0095  0.0111  0.0104  0.0117  0.0114
46  5  100  0.7895  0.7823  0.7861  0.0095  0.0112  0.0105  0.0115  0.0112
47  5  100  0.8000  0.7920  0.7966  0.0086  0.0104  0.0097  0.0109  0.0106
48  5  100  0.8000  0.7899  0.7937  0.0086  0.0105  0.0100  0.0106  0.0104

(1) S_3 is the true value of the consistency in category 3; Ŝ_3 and Ŝ_3U are the sample averages of the 10,000 estimates Ŝ_3 and Ŝ_3U, respectively.
(2) V_A(Ŝ_3) is the asymptotic variance, computed from the true values of the parameters.
(3) V_E(Ŝ_3) and V_E(Ŝ_3U) are the sample variances of the 10,000 estimates Ŝ_3 and Ŝ_3U, respectively. These sample variances are assumed to be approximately equal to the true variances of the corresponding estimators.
(4) V̂(Ŝ_3) and V̂(Ŝ_3U) are the sample averages of the 10,000 variance estimates V̂(Ŝ_3) and V̂(Ŝ_3U), respectively.

Table 6: Diagnosis of n = 100 subjects by two raters in K = 3 categories (Fleiss et al. 2003)

(a) Observed frequencies (x_ij)

                          Rater 2
Rater 1         Psychotic  Neurotic  Organic  Total (x_i.)
Psychotic       75         1         4        80
Neurotic        5          4         1        10
Organic         0          0         10       10
Total (x_.j)    80         5         15       100 (x_..)

(b) Parameters estimated under the delta model

α̂_1 = 0.550   α̂_1U = 0.575   Ŝ_1 = 0.687   Ŝ_1U = 0.719
α̂_2 = 0.037   α̂_2U = 0.040   Ŝ_2 = 0.500   Ŝ_2U = 0.528
α̂_3 = 0.100   α̂_3U = 0.100   Ŝ_3 = 0.800   Ŝ_3U = 0.800
Δ̂ = 0.687     Δ̂_U = 0.715

Table 7: Data from a 2×2 table by Nelson and Pepe (2000)

(a) Original data, and data with 0.5 added to every cell of the table extended with a virtual category 3

Original data:
Categories   1     2     Total
1            80    10    90
2            10    0     10
Total        90    10    100

Extended data:
Categories   1     2     3     Total
1            80.5  10.5  0.5   91.5
2            10.5  0.5   0.5   11.5
3            0.5   0.5   0.5   1.5
Total        91.5  11.5  1.5   104.5

(b) Parameters estimated under the delta model

α̂*_1 = 0.680    α̂*_1U = 0.745    Ŝ_1 = 0.765    Ŝ_1U = 0.869
α̂*_2 = −0.097   α̂*_2U = −0.031   Ŝ_2 = −0.870   Ŝ_2U = −0.280
Δ̂* = 0.583      Δ̂*_U = 0.714

(c) Other parameters estimated under the delta model if the row rater is a gold standard

Category   Conformity = Predictivity (classic)   Conformity = Predictivity (U)
1          0.765                                  0.839
2          −0.870                                 −0.280

Table 8: Data from Table IIA by Kramer and Feinstein (1981)

(a) Observed frequencies (x_ij)

                 Rater 2
Rater 1        1    2    3    4    Total (x_i.)
1              1    2    0    0    3
2              1    5    3    1    10
3              1    4    5    2    12
4              1    1    1    2    5
Total (x_.j)   4    12   9    5    30 (x_..)

(b) Parameters estimated under the delta model

α̂_1 = 0.023   α̂_1U = 0.024   Ŝ_1 = 0.197   Ŝ_1U = 0.206
α̂_2 = 0.027   α̂_2U = 0.042   Ŝ_2 = 0.074   Ŝ_2U = 0.115
α̂_3 = 0.082   α̂_3U = 0.092   Ŝ_3 = 0.234   Ŝ_3U = 0.264
α̂_4 = 0.050   α̂_4U = 0.052   Ŝ_4 = 0.300   Ŝ_4U = 0.311
Δ̂ = 0.182     Δ̂_U = 0.210
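The true values in the first data columns of Tables 3 to 5 depend only on the settings of Table 2. The Python sketch below recomputes them under two assumptions of our own:

- the cell probabilities are $p_{ij} = (1-\Delta)\pi_{i1}\pi_{j2} + \alpha_i\,\mathbb 1(i=j)$;
- the consistency is $S_i = 2\alpha_i/(p_{i\cdot}+p_{\cdot i})$.

With these assumptions, setting 2 of Table 2 gives $S_3 = 0.4/0.82 \approx 0.4878$, as in Table 5. Likewise, combining the classic $\hat\alpha_i$ of Table 8(b) with the margins of Table 8(a) gives back the listed $\hat S_i$ to three decimals. The helper names `cell_probs` and `consistency`, and the random seed, are hypothetical.

```python
import numpy as np

def cell_probs(alpha, pi1, pi2):
    """Population cell probabilities, assuming p_ij = B*pi_i1*pi_j2 + alpha_i*[i == j]."""
    alpha = np.asarray(alpha, dtype=float)
    B = 1.0 - alpha.sum()                                  # B = 1 - Delta
    return B * np.outer(pi1, pi2) + np.diag(alpha)

def consistency(alpha, row_margins, col_margins):
    """Assumed consistency S_i = 2*alpha_i / (p_i. + p_.i)."""
    denom = np.asarray(row_margins, dtype=float) + np.asarray(col_margins, dtype=float)
    return 2.0 * np.asarray(alpha, dtype=float) / denom

# Setting 2 of Table 2: K = 3, alpha_i = (0.05, 0.15, 0.2),
# pi_i1 = (0.2, 0.3, 0.5) and pi_i2 = (0.5, 0.3, 0.2).
alpha = [0.05, 0.15, 0.2]
p = cell_probs(alpha, [0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
S_true = consistency(alpha, p.sum(axis=1), p.sum(axis=0))
print("Delta =", round(float(sum(alpha)), 4), " S_3 =", round(float(S_true[2]), 4))

# One replicate of a simulation: n = 30 subjects drawn from the model.
rng = np.random.default_rng(2024)
x = rng.multinomial(30, p.ravel()).reshape(p.shape)
print(x)

# Classic consistencies of Table 8(b) from alpha-hat and the margins of Table 8(a).
alpha_hat = [0.023, 0.027, 0.082, 0.050]
rows8 = np.array([3, 10, 12, 5]) / 30
cols8 = np.array([4, 12, 9, 5]) / 30
print(np.round(consistency(alpha_hat, rows8, cols8), 3))
```

Repeating the multinomial draw 10,000 times and applying an estimator to each table is the kind of scheme summarized in Tables 3 to 5. The sketch, however, only shows how one sample can be generated.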