Testing for Homogeneity in Meta-Analysis I. The One Parameter Case: Standardized Mean Difference

Elena Kulinskaya, Michael B. Dollinger and Kirsten Bjørkestøl

1 August 2009

Abstract

Meta-analysis seeks to combine the results of several experiments in order to improve the accuracy of decisions. It is common to use a test for homogeneity to determine if the results of the several experiments are sufficiently similar to warrant their combination into an overall result. Cochran's Q statistic is frequently used for this homogeneity test. It is often assumed that Q follows a chi-square distribution under the null hypothesis of homogeneity, but it has long been known that this asymptotic distribution for Q is not accurate for moderate sample sizes. Here we present formulas for the mean and variance of Q under the null hypothesis which represent O(1/n) corrections to the corresponding chi-square moments in the one parameter case. The formulas are fairly complicated, and so we provide a program (available at http://www.imperial.ac.uk/stathelp/researchprojects/metaanalysis) for making the necessary calculations. We apply the results to the standardized mean difference (Cohen's d-statistic) and consider two approximations: a gamma distribution with estimated shape and scale parameters and the chi-square distribution with fractional degrees of freedom equal to the estimated mean of Q. We recommend the latter distribution as an approximate distribution for Q to use for testing the null hypothesis.

Key words: weighted ANOVA, weighted sum of squares, Q statistic, Cohen's d-statistic, gamma distribution, heterogeneity test.

Elena Kulinskaya, Statistical Advisory Service, Imperial College, London, UK.
Michael B. Dollinger, Department of Mathematics, Pacific Lutheran University, Tacoma, WA, USA.
Kirsten Bjørkestøl, Faculty of Technology and Science, University of Agder, Kristiansand, Norway.

1 Introduction

In the meta-analysis of several studies, it is usual to conduct a "homogeneity test" to determine if the effects measured by the studies are sufficiently similar to warrant their combination into one grand summary effect using the fixed effect model, Normand (1999). The most commonly used test statistic is Cochran's $Q$ (Cochran, 1937). It is defined as follows. Suppose that there are $I$ studies (or experiments), each of whose results is given by an estimator $\hat\theta_i$ of a population effect $\theta_i$. Suppose that the variance of $\hat\theta_i$ is $v_i$, which can be estimated in turn by $\hat v_i$. The summary effects may be combined into a grand summary effect using a weighted average $\hat\theta_w = \sum_i \hat w_i \hat\theta_i / \sum_i \hat w_i$, where the weights $w_i$ and their estimators $\hat w_i$ are usually taken as inverses of the variances and their estimators respectively (thus weighting more accurate studies more heavily). At this point of the discussion, the summary effect may be quite general, such as the sample mean of each study, the difference of means between treatment and control arms of each study or the standardized difference of means between treatment and control arms of each study; but in the main body of the paper we will restrict the discussion to cases in which the estimators of $\theta_i$ and $w_i$ depend on only the one parameter $\theta_i$.

Cochran's $Q$ statistic, which is used in the homogeneity test, is defined by $Q = \sum_i \hat w_i (\hat\theta_i - \hat\theta_w)^2$. When testing the null hypothesis that $\theta_1 = \cdots = \theta_I$, that is, that the underlying effects measured by all the studies are the same, it is common to assume that $Q$ has a chi-square distribution with $I - 1$ degrees of freedom. This distribution appears to be asymptotically valid (as the sizes $n_i$ of the studies become large) over a wide choice of summary effects.
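As a concrete illustration of the definitions above, the weighted average and the Q statistic can be computed in a few lines. The following Python sketch is only illustrative: the function name `cochran_q` and the three-study data are hypothetical and are not taken from the paper or from the authors' program.

```python
def cochran_q(effects, weights):
    """Return (theta_w, Q) for effect estimates and their estimated weights."""
    if len(effects) != len(weights) or len(effects) < 2:
        raise ValueError("need at least two studies, each with one weight")
    total_weight = sum(weights)
    # Weighted average of the study effects (the grand summary effect).
    theta_w = sum(w * t for w, t in zip(weights, effects)) / total_weight
    # Weighted sum of squared deviations from the weighted average.
    q = sum(w * (t - theta_w) ** 2 for w, t in zip(weights, effects))
    return theta_w, q


# Hypothetical effect estimates and estimated variances for three studies.
effects = [0.10, 0.35, 0.60]
variances = [0.04, 0.05, 0.08]
weights = [1.0 / v for v in variances]  # inverse-variance weights
theta_w, q = cochran_q(effects, weights)
df = len(effects) - 1  # degrees of freedom of the reference chi-square, I - 1
```

The value `q` would then be referred to a chi-square distribution with `df` degrees of freedom in the conventional test.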
There have been many simulation studies of the accuracy of the chi-square approximation (see Hedges & Olkin (1985), Viechtbauer (2007) and the references therein), but except for the case where the populations are normally distributed with the parameters estimated by sample means and sample variances, there are few theoretical results dealing with the question of the distribution of $Q$ for small or moderate sample sizes. The chi-square distribution is an exact distribution of the $Q$ statistic for normally distributed populations with known variances resulting in known weights. Randomness of the weights is traditionally ignored in meta-analysis, Biggerstaff & Tweedie (1997), Jackson (2006), Biggerstaff & Jackson (2008). In contrast, Cochran, as early as in his 1937 paper which dealt with the normally distributed case, recognized the need for a correction to the chi-square distribution for moderate sample sizes and proposed such a correction at that time. In 1951, James (1951) and Welch (1951) proposed separate improved corrections to the distribution of $Q$ (again for the normal case), corrections which are equivalent to each other up to order $1/n_i$. Welch's proposal (more commonly used and now known as the Welch test) referred $Q$ to a rescaled $F$-distribution ($cF_{I-1,\nu}$) with $I - 1$ and $\nu$ degrees of freedom, where $\nu$ and the rescaling constant $c$ are quantities to be estimated from the data. In Welch's derivation, the properties of normality, including the independence of the estimators of the weights (inverses of sample variances) and of the effects (sample means), were heavily used; these properties are not generally valid in many situations in which the $Q$ statistic is commonly used. Improved approximations to the power of the Welch test in the normal case are given in Kulinskaya et al. (2003). The paper Kulinskaya et al.
(2004) extended the Welch test in the normal case to contrasts (such as the difference of treatment and control means), and a Welch type $Q$ test for robust estimators of effects and their variances was introduced in Kulinskaya & Dollinger (2007).

In a series of papers, we plan to investigate corrections to the distribution of $Q$ in situations in which the estimators of the effects and of the weights are not statistically independent. As far as we know, there have been no theoretical results before now on this subject. We expect that the results will provide more accurate homogeneity tests when the sample sizes are small or moderate. In this paper (the first of the planned series), we investigate the situation in which the effect and weight estimators depend on a single parameter. We will apply our general theory to an important special case: the standardized mean difference (also known as Cohen's $d$, Cohen (1988)). Definitions appear in Section 3.

This paper is organized as follows. In Section 2, we present the general theory. In Section 3 we apply the general theory to the standardized mean difference. Section 4 contains two real meta-analytic examples which have used the standardized mean difference to measure the effects. Section 5 contains the results of a large number of simulations which show the quality and the limitations of the new approximations for the homogeneity test based on $Q$ when the effects are measured as standardized mean differences. In the final section we summarize the more important conclusions, make some recommendations and indicate areas of future work. Some of the more complicated formulas have been relegated to the Appendix.

2 The general theory

Welch's 1951 correction to the distribution of $Q$ was based on expansions to approximate the mean and variance of $Q$. He then used these moments to define an approximate distribution for $Q$.
We follow this same general idea, but there are several important differences. Welch made the assumption that the underlying distributions were normal and that the weights were inverses of the study variances, estimated by the sample variances. To permit as wide applicability as possible, we do not assume normality and allow the weights to be different from the inverses of the variances. Also, we do not make the assumption that the estimators of the weights are statistically independent of the estimators for the effects. A third difference between our approach and that of Welch is that he based his approximations on an asymptotic expansion of the moment generating function of $Q$. We instead use the delta method, which is based on Taylor expansions of $Q$ and $Q^2$ about the mean of the effect size.

2.1 Notation and assumptions

There are $I$ studies with corresponding effects $\theta_i$. The null hypothesis for the homogeneity test will be equality of the effects, i.e., $\theta_1 = \cdots = \theta_I$; we will denote the common effect under the null hypothesis by $\theta$. The effects are estimated by random variables $\hat\theta_i$. The theoretical weights are $w_i$ and they are estimated by $\hat w_i$. In most applications, we will have $w_i = 1/\mathrm{Var}[\hat\theta_i]$, but in this section we merely assume that the weight estimators are some function $f_i$ of the effect estimator $\hat\theta_i$; that is, $\hat w_i = f_i(\hat\theta_i)$, where the functions $f_i$ will generally depend on additional constants such as the sample size. The theoretical weights under the null hypothesis will be $w_i = f_i(\theta)$. The assumption that the weights are dependent only on the corresponding effects is an important limitation of the results of this paper. In our next paper in this series, we plan to investigate the situation in which the weights depend on more than one random variable.
We need to make some fairly standard assumptions about the orders (relative to the sample sizes) of the central moments $E[(\hat\theta_i - \theta_i)^r]$ of the estimators $\hat\theta_i$ and also the orders of the weights and their derivatives. Let $n_i$ represent the sample size of the $i$th study. In the event that the studies have two arms (as in the application in Section 3), let $n_i$ be the minimum sample size of the two arms. We will also use the notation $n = \min\{n_i\}$ and sometimes express approximations in terms of orders of $n$. To simplify notation, define $\Theta_i = (\hat\theta_i - \theta_i)$.

We assume first that $E[\Theta_i] = O(1/n_i^2)$. This condition will certainly be satisfied if the estimator $\hat\theta_i$ is unbiased. In regular parametric problems, it is easy to remove the first-order term from the asymptotic bias of maximum likelihood estimates (see Firth (1993)). We will need higher moments up to and including the sixth central moment. For these moments, we assume the following orders, which generally follow from $\sqrt{n_i}$ asymptotic normality: $E[\Theta_i^2] = O(1/n_i)$, $E[\Theta_i^3] = O(1/n_i^2)$, $E[\Theta_i^4] = O(1/n_i^2)$, $E[\Theta_i^5] = O(1/n_i^3)$ and $E[\Theta_i^6] = O(1/n_i^3)$. We further assume that the weight estimators $\hat w_i$ and their first two derivatives with respect to $\theta_i$ will be $O(n_i)$, as will be the case whenever the weights are inverses of the variances.

2.2 Expansions for E[Q] and E[Q^2]

In this section we present expressions for $E[Q]$ and $E[Q^2]$ using Taylor expansions and then taking expectations of these expansions. The Taylor expansions are centered about the null hypothesis $\theta_1 = \cdots = \theta_I = \theta$, and thus all derivatives in this section are to be evaluated at this null hypothesis. In our expansions we have kept all terms to order $O(1/n)$. We begin with the first moment of $Q$.
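$$
\begin{aligned}
E[Q] ={}& \frac{1}{2}\sum_i \frac{\partial^2 Q}{\partial\theta_i^2} E[\Theta_i^2] + \frac{1}{6}\sum_i \frac{\partial^3 Q}{\partial\theta_i^3} E[\Theta_i^3] \\
& + \frac{1}{24}\sum_i \frac{\partial^4 Q}{\partial\theta_i^4} E[\Theta_i^4] + \frac{1}{8}\sum_{i \ne j}\sum \frac{\partial^4 Q}{\partial\theta_i^2\,\partial\theta_j^2} E[\Theta_i^2]E[\Theta_j^2] + O\!\left(\frac{1}{n^2}\right)
\end{aligned}
\tag{1}
$$

We next substitute expressions for the indicated derivatives into this formula and expand the double sum into combinations of single sums to obtain the following result. To simplify the expression, we use the notation $W = \sum_i w_i$ and $U_i = 1 - w_i/W$. The formula is expressed in terms of parameter values; estimates of these parameter values will be needed when the formula is applied to data.

$$
\begin{aligned}
E[Q] ={}& \sum_i [w_i U_i]\,E[\Theta_i^2] + \sum_i \left[U_i^2 \frac{df_i}{d\hat\theta_i}\right] E[\Theta_i^3] + \sum_i \left[-\frac{U_i^2}{W}\left(\frac{df_i}{d\hat\theta_i}\right)^2 + \frac{U_i^2}{2}\frac{d^2 f_i}{d\hat\theta_i^2}\right] E[\Theta_i^4] \\
& - \frac{1}{W}\left(\sum_i U_i \frac{df_i}{d\hat\theta_i} E[\Theta_i^2]\right)^2 - \frac{1}{W^3}\left(\sum_i w_i \frac{df_i}{d\hat\theta_i} E[\Theta_i^2]\right)^2 \\
& - \frac{1}{W^3}\left(\sum_i w_i E[\Theta_i^2]\right)\left(\sum_i \left\{\left[\frac{df_i}{d\hat\theta_i}\right]^2 - \frac{1}{2} W \frac{d^2 f_i}{d\hat\theta_i^2}\right\} E[\Theta_i^2]\right) \\
& + \frac{1}{W}\sum_i \left(1 - \frac{2w_i}{W} + \frac{3w_i^2}{W^2}\right)\left[\frac{df_i}{d\hat\theta_i}\right]^2 (E[\Theta_i^2])^2 - \frac{1}{2W^3}\sum_i w_i^2 \frac{d^2 f_i}{d\hat\theta_i^2}(E[\Theta_i^2])^2 + O\!\left(\frac{1}{n^2}\right)
\end{aligned}
\tag{2}
$$

The expansion for the second moment $E[Q^2]$ up to order $O(1/n)$ requires terms of 4th, 5th and 6th degree. The expansion is given by

$$
\begin{aligned}
E[Q^2] ={}& \frac{1}{24}\sum_i \frac{\partial^4 [Q^2]}{\partial\theta_i^4} E[\Theta_i^4] + \frac{1}{8}\sum_{i \ne j}\sum \frac{\partial^4 [Q^2]}{\partial\theta_i^2\,\partial\theta_j^2} E[\Theta_i^2]E[\Theta_j^2] \\
& + \frac{1}{120}\sum_i \frac{\partial^5 [Q^2]}{\partial\theta_i^5} E[\Theta_i^5] + \frac{1}{12}\sum_{i \ne j}\sum \frac{\partial^5 [Q^2]}{\partial\theta_i^3\,\partial\theta_j^2} E[\Theta_i^3]E[\Theta_j^2] + \frac{1}{720}\sum_i \frac{\partial^6 [Q^2]}{\partial\theta_i^6} E[\Theta_i^6] \\
& + \frac{1}{48}\sum_{i \ne j}\sum \frac{\partial^6 [Q^2]}{\partial\theta_i^4\,\partial\theta_j^2} E[\Theta_i^4]E[\Theta_j^2] + \frac{1}{48}\sum_{i \ne j}\sum_{\ne k}\sum \frac{\partial^6 [Q^2]}{\partial\theta_i^2\,\partial\theta_j^2\,\partial\theta_k^2} E[\Theta_i^2]E[\Theta_j^2]E[\Theta_k^2] + O\!\left(\frac{1}{n^2}\right)
\end{aligned}
\tag{3}
$$

The derivatives of $Q^2$ needed for Equation 3 are fairly complicated and appear in the Appendix.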
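Because Equation 2 is long, a term-by-term transcription may help when checking an implementation. The following Python sketch is one reading of Equation 2 as printed in this section; it is not the authors' R program, and the function and argument names are illustrative assumptions. All inputs are taken to be evaluated under the null hypothesis.

```python
def expected_q(w, d1, d2, m2, m3, m4):
    """Approximate E[Q] to O(1/n) by transcribing Equation 2 term by term.

    w      : null weights w_i = f_i(theta)
    d1, d2 : first and second derivatives of f_i, evaluated at theta
    m2..m4 : central moments E[Theta_i^r] for r = 2, 3, 4
    """
    idx = range(len(w))
    W = sum(w)
    U = [1.0 - wi / W for wi in w]

    t1 = sum(w[i] * U[i] * m2[i] for i in idx)            # chi-square term
    t2 = sum(U[i] ** 2 * d1[i] * m3[i] for i in idx)
    t3 = sum((-U[i] ** 2 / W * d1[i] ** 2 + U[i] ** 2 / 2 * d2[i]) * m4[i]
             for i in idx)
    t4 = -sum(U[i] * d1[i] * m2[i] for i in idx) ** 2 / W
    t5 = -sum(w[i] * d1[i] * m2[i] for i in idx) ** 2 / W ** 3
    t6 = -(sum(w[i] * m2[i] for i in idx)
           * sum((d1[i] ** 2 - 0.5 * W * d2[i]) * m2[i] for i in idx)) / W ** 3
    t7 = sum((1 - 2 * w[i] / W + 3 * w[i] ** 2 / W ** 2) * d1[i] ** 2 * m2[i] ** 2
             for i in idx) / W
    t8 = -sum(w[i] ** 2 * d2[i] * m2[i] ** 2 for i in idx) / (2 * W ** 3)
    return t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8
```

With constant weights (both derivatives zero) and inverse variance weights, only the first term survives and the function returns $I - 1$, which gives a quick sanity check on the transcription.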
2.3 Applying the formulas

The formulas in Equations 2 and 3 are fairly general since they are not based on any normality assumptions, and will be applicable to any situation in which there is only one parameter and in which the weights and central moments meet the order conditions described in Section 2.1.

To use the formulas for a specific application, the user will need to supply expressions for the weights (that is, the functions $f_i$) and their first and second derivatives and also expressions for the central moments $E[\Theta_i^r]$ for $r = 1, \ldots, 6$. We provide an illustration in the next section where we apply the theory to the important special case of the standardized mean difference. Because of the complexity of the formulas, we have provided a computer program in R which can be used for the necessary calculations for applying the $Q$ test to the standardized mean difference. This program can be downloaded from the website http://www.imperial.ac.uk/stathelp/researchprojects/metaanalysis.

The weights and their derivatives which appear in the formulas need to be estimated under the null hypothesis and will be different from the weights which are used for calculating a specific value of $Q$ from the data. Specifically, weights $\hat w_i = f_i(\hat\theta_i)$ are first calculated. These weights are used to estimate the combined effect $\hat\theta_w = \sum_i \hat w_i \hat\theta_i / \sum_i \hat w_i$ and to calculate the value of the $Q$ statistic $\sum_i \hat w_i (\hat\theta_i - \hat\theta_w)^2$. However, the weights which appear in Equations 2 and 3 need to be recalculated using the same combined effect $\hat\theta_w$ as the effect for each of the studies. That is, these 'null' weights are estimated by $f_i(\hat\theta_w)$ and the derivatives will be estimated by $\frac{\partial f_i}{\partial\theta_i}(\hat\theta_w)$ and $\frac{\partial^2 f_i}{\partial\theta_i^2}(\hat\theta_w)$.

Improved approximations to the mean and variance of $Q$ under the null hypothesis are, of course, not sufficient to conduct a test of the null hypothesis.
A distribution for $Q$ is needed for this purpose. Ideally, simulations should be used for each separate application type to select a family of distributions which fits the distribution of $Q$. However, we have found in our simulations, which cover a number of situations (including both the one parameter case discussed here and cases involving multiple parameters), that the gamma family of distributions fits the null distribution of $Q$ quite closely. Importantly, this family includes the chi-square family as a special case. In particular, we have found that the gamma family of distributions fits the distribution of $Q$ very well in the case where the effects are measured by the standardized mean difference. Another contender is the chi-square distribution with fractional degrees of freedom equal to the mean of $Q$ (see Section 3.4 below).

2.4 Inverse variance weights and the chi-square distribution

It is usual to choose weights to be inverse variances, i.e., $w_i = 1/E[\Theta_i^2]$. We make this assumption in the remainder of this section. The expressions for the moments given in Equations 2 and 3 simplify somewhat under this inverse variance assumption.

In Equation 2, only the first (or quadratic) term is $O(1)$. The remaining terms are all $O(1/n)$. With inverse variance weights, this first term simplifies to $\sum_i (1 - w_i/W) = I - 1$. Notice that this quantity is the first moment of the chi-square distribution with $I - 1$ degrees of freedom. Thus Equation 2 provides an order $O(1/n)$ correction to the chi-square first moment.

In Equation 3 for the second moment of $Q$, the lowest degree terms are the first two terms (those of fourth degree), and these are the only two terms of order $O(1)$. The remaining terms are all of order $O(1/n)$. Using Equations 20 and 21 (in the Appendix) for the fourth derivatives of $Q^2$, these two terms become

$$
\sum_i w_i^2 U_i^2 E[\Theta_i^4] + \sum_{i \ne j}\sum \left(U_i U_j + \frac{2 w_i w_j}{W^2}\right).
$$
(4)

The kurtosis $\gamma_2$ of a random variable with fourth central moment $\mu_4$ and variance $\sigma^2$ is commonly defined by $\gamma_2 = \mu_4/\sigma^4 - 3$; this definition is arranged so that normally distributed random variables have kurtosis of zero. Using this definition, we will denote the kurtosis of $\hat\theta_i$ (the estimator of the $i$th effect) by $\gamma_{2,i}$. Then Equation 4 can be algebraically rearranged to

$$
I^2 - 1 + \sum_i \gamma_{2,i} U_i^2. \tag{5}
$$

Since kurtosis is typically of order $O(1/n)$, we see that the second moment of the null distribution of $Q$ agrees with the second moment of the chi-square distribution with $I - 1$ degrees of freedom (which is $I^2 - 1$) up to order $O(1/n)$. Thus when inverse variance weights are used, both the first and second moments of the null distribution of $Q$ agree with those of the chi-square distribution up to order $O(1)$, and Equations 2 and 3 provide order $O(1/n)$ corrections.

When discussing the distribution of $Q$, some authors make the simplifying assumption that the weights are constants rather than random variables. See, for example, Biggerstaff & Tweedie (1997), Jackson (2006), Biggerstaff & Jackson (2008). When this assumption of constant weights holds, the derivatives of the weights become zero and all terms of our approximate formula for $E[Q]$ vanish except for the first (or chi-square) term. Similarly, all terms for $E[Q^2]$ vanish except for the first two terms. Accordingly, under the assumption that the weights are known constants, the commonly used chi-square approximation for $Q$ has a mean which is accurate to order $O(1/n)$. But the second moment is accurate to this order only when the estimators of the effects have kurtosis of order less than $1/n$. However, since in reality the weights are random, both the mean and variance of $Q$ need the corrections given by our formulas in order to be accurate to order $O(1/n)$.
Thus use of our formulas should yield improved accuracy in the $Q$ test when $n$ is not too large.

3 The Q test for the standardized mean difference

In this section, we apply the theoretical results of the previous section to the standardized mean difference (also known as Cohen's $d$-statistic). We begin with notation and a brief review of the necessary background. See, for example, Hedges & Olkin (1985) for details.

3.1 Notation and weight functions

We assume that each of $I$ studies consists of two arms of sizes $n_{Ti}$ and $n_{Ci}$ having normally distributed data with means $\mu_{Ti}$ and $\mu_{Ci}$ and that the variance $\sigma_i^2$ is the same in each arm. (The subscripts $T$ and $C$ may be thought of as treatment and control.) Then the effect measured by the standardized mean difference in the $i$th study is given by

$$
\delta_i = (\mu_{Ti} - \mu_{Ci})/\sigma_i. \tag{6}
$$

A natural, but biased, estimator of $\delta_i$ is

$$
\hat\delta_i = (\bar X_{Ti} - \bar X_{Ci})/s_{pi} \tag{7}
$$

where $s_{pi}^2$ is the usual pooled variance estimator. Instead of using $\hat\delta_i$, we follow the usual practice to correct for the bias by using the unbiased estimator of $\delta_i$ defined by

$$
\hat g_i = J_i \hat\delta_i = J_i (\bar X_{Ti} - \bar X_{Ci})/s_{pi} \tag{8}
$$

where

$$
J_i = \frac{\Gamma[(N_i - 2)/2]}{\sqrt{(N_i - 2)/2}\;\Gamma[(N_i - 3)/2]} \tag{9}
$$

is a constant depending only on the total sample size $N_i = n_{Ti} + n_{Ci}$. Define $q_i = n_{Ci}/N_i$ to be the proportion of the total sample size in the control arm of the $i$th study. It is known (see Hedges & Olkin, 1985, p. 104–5) that

$$
\mathrm{Var}[\hat g_i] = \frac{(N_i - 2) J_i^2}{(N_i - 4) N_i q_i (1 - q_i)} + \left(\frac{(N_i - 2) J_i^2}{N_i - 4} - 1\right)\delta_i^2 := A_i + B_i \delta_i^2, \tag{10}
$$

where the constants $A_i$ and $B_i$ depend only on the sample sizes. Replacing $\delta_i$ by its unbiased estimator $\hat g_i$ in this variance formula, we obtain an estimator of the variance of $\hat g_i$ which is given by

$$
\widehat{\mathrm{Var}}[\hat g_i] = A_i + B_i \hat g_i^2.
$$
(11)

Then the functions $f_i$ giving the estimated inverse variance weights in the $Q$ statistic are given by

$$
\hat w_i = f_i(\hat g_i) = \left[A_i + B_i \hat g_i^2\right]^{-1}. \tag{12}
$$

The first and second derivatives of $\hat w_i$ are given by

$$
\frac{df_i}{d\hat g_i} = -2 B_i \hat g_i \hat w_i^2 \tag{13}
$$

$$
\frac{d^2 f_i}{d\hat g_i^2} = -2 B_i \hat w_i^2 + 8 B_i^2 \hat g_i^2 \hat w_i^3. \tag{14}
$$

One issue that has arisen in meta-analysis involving the standardized mean difference is how best to estimate the combined effect $\delta$. Estimators of $\delta$ appear in two places in the $Q$ test: in the definition of $Q$, and in the application of Equations 2 and 3 where an estimated value of $\delta$ under the null hypothesis is used. It is known (see Yuan & Bushman (2002)) that the natural weighted sum estimator $\hat g_w = \sum \hat w_i \hat g_i / \sum \hat w_i$ is slightly biased. An alternative choice is to use the estimator $\hat g_A = \sum A_i \hat g_i / \sum A_i$; since the weights $A_i$ are not random, the estimator $\hat g_A$ is unbiased. We explored both choices in our simulations of the $Q$ test and found that the difference between these two choices is barely noticeable and not of practical importance. We use the estimator $\hat g_w$ in the examples of Section 4.

3.2 The moments of ĝ

In this section we suppress the subscript $i$ on all variables pertaining to the $i$th study. The two main ingredients needed for applying the formulas for $E[Q]$ and $E[Q^2]$ are first the weight functions and their derivatives (given in the previous section) and second the central moments $E[(\hat g - \delta)^r]$ for $r = 1, \ldots, 6$. We provide these central moments in this section. For these moments to exist, we assume that $N > 8$. (We note that for the usual chi-square approximation to hold, $N > 4$ is required just for the variance of $\hat g$ to exist.)

It is known (Hedges & Olkin, 1985, p. 79) that $\sqrt{Nq(1-q)}\,\hat\delta$ has a non-central $t$-distribution with $N - 2$ degrees of freedom and non-centrality parameter equal to $\sqrt{Nq(1-q)}\,\delta$.
To simplify notation, write $\gamma = \sqrt{Nq(1-q)}\,\delta$ for the non-centrality parameter. Denote a random variable with a non-central $t$-distribution with $N - 2$ degrees of freedom and non-centrality parameter $\gamma$ by $t_{N-2}(\gamma)$. Then from (Johnson et al., 1995, p. 512), the moments of $t_{N-2}(\gamma)$ about zero are given by

$$
E[t_{N-2}^r(\gamma)] = \left(\frac{N-2}{2}\right)^{r/2} \frac{\Gamma\left[\frac{N-2-r}{2}\right]}{\Gamma\left[\frac{N-2}{2}\right]} \sum_{j=0}^{\lfloor r/2 \rfloor} \binom{r}{2j} \frac{(2j)!}{2^j j!}\,\gamma^{r-2j}. \tag{15}
$$

The first moment of $t_{N-2}(\gamma)$ will be denoted by $\mu_t$ and is given by

$$
\mu_t = \left(\frac{N-2}{2}\right)^{1/2} \frac{\Gamma\left[\frac{N-3}{2}\right]}{\Gamma\left[\frac{N-2}{2}\right]}\,\gamma. \tag{16}
$$

Then the central moments of $t_{N-2}(\gamma)$ are given by

$$
E[(t_{N-2}(\gamma) - \mu_t)^r] = \sum_{k=0}^{r} (-1)^k \binom{r}{k} \mu_t^k\, E[t_{N-2}^{r-k}(\gamma)]. \tag{17}
$$

Since $\frac{\sqrt{Nq(1-q)}}{J}\,\hat g$ has the distribution $t_{N-2}(\gamma)$, we then have the desired central moments needed for the formulas for the moments of $Q$. These are

$$
E[(\hat g - \delta)^r] = \left(\frac{J}{\sqrt{Nq(1-q)}}\right)^r E[(t_{N-2}(\gamma) - \mu_t)^r]. \tag{18}
$$

3.3 Verifying the order conditions

One further step in applying the formulas for $E[Q]$ and $E[Q^2]$ is to check the order conditions which are set out in Section 2.1. Recall that we use the notation $N_i$ to represent the sum of the sizes of the two arms of the $i$th study and that we use the notation $n_i$ to be the minimum of the two sizes, with $n = \min\{n_i\}$. It is evident from the definition in Equation 10 that $A_i = O(1/n)$. Also $B_i$ (as defined in Equation 10) is $O(1/n)$; see Hedges & Olkin (1985) for this fact. Thus $\hat w_i = f_i(\hat g_i)$ and its derivatives are $O(n)$. Further, since $\hat g_i$ is unbiased, the order condition for the first central moment of $\hat g_i$ is trivially satisfied.

In the remainder of this paragraph, we again suppress the subscript $i$ on all variables pertaining to the $i$th study in order to simplify notation. Let $X$ denote a normally distributed random variable with mean $\gamma$ and variance 1, i.e., $X \sim N(\gamma, 1)$.
Then the $k$th moments of the noncentral $t_{N-2}(\gamma)$ distribution are related to the moments of $X$ by

$$
\mu_k(t_{N-2}(\gamma)) = \mu_k(X)\,\frac{\Gamma[(N-2-k)/2]\,(N-2)^{k/2}}{2^{k/2}\,\Gamma[(N-2)/2]} \tag{19}
$$

where $\mu_k$ denotes the $k$th moment (see Bain (1969)). From Stirling's formula, $\mu_k(t_{N-2}(\gamma)) = \mu_k(X)(1 + O(n^{-1}))$. Therefore, from Equation 18, the central moments of $\hat g$ are in the limit (up to an $O(1)$ multiplier $J^r$) the central moments of the $N(\delta, (Nq(1-q))^{-1})$ distribution, so the order conditions are satisfied.

3.4 The gamma distribution

From our many simulations, it has become apparent that the gamma family with probability density functions

$$
f(t) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, t^{\alpha-1} e^{-t/\beta}
$$

is a very good fit to the distribution of $Q$ under the null hypothesis of equal standardized mean differences. For a random variable $T$ with a gamma distribution, the shape parameter $\alpha$ is given by $\alpha = (E[T])^2/\mathrm{Var}[T]$ and the scale parameter $\beta$ is given by $\beta = \mathrm{Var}[T]/E[T]$. The chi-square distribution with $\nu$ degrees of freedom is a member of the gamma family with $\alpha = \nu/2$ and $\beta = 2$.

To verify the fit of the gamma family to the null distribution of $Q$, we simulated a number of empirical distributions of $Q$ and used the statistics package Statgraphics Centurion XV (from Statpoint, Inc.) to compare the fit of these empirical distributions with a variety of distribution families. The gamma family was always the best, typically with a Kolmogorov-Smirnov (K-S) distance of only 0.002, which indicates a remarkably good fit. The second best fitting family was the chi-square family with fractional degrees of freedom, which typically had a K-S distance of four times that of the best fitting gamma distribution.

4 Examples

In this section, we present two examples to illustrate the application of the theory of Sections 2 and 3 to real data.
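The moment matching of Section 3.4, which both examples rely on, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' program: the helper names are assumptions, and the tail probability is computed with a textbook series and continued-fraction evaluation of the regularized incomplete gamma function so that only the standard library is needed. The numerical inputs are the estimated moments and the observed Q reported for the second example (Section 4.2).

```python
import math


def gamma_params(mean, second_moment):
    """Shape and scale of the gamma distribution matching E[T] and E[T^2]."""
    variance = second_moment - mean ** 2
    if mean <= 0 or variance <= 0:
        raise ValueError("moments must give a positive mean and variance")
    return mean ** 2 / variance, variance / mean


def gamma_sf(x, shape, scale):
    """Upper tail probability P(T > x) for a gamma(shape, scale) variable."""
    z = x / scale
    if z <= 0:
        return 1.0
    log_prefactor = -z + shape * math.log(z) - math.lgamma(shape)
    if z < shape + 1:
        # Power series for the lower tail; the upper tail is its complement.
        term = total = 1.0 / shape
        for n in range(1, 1000):
            term *= z / (shape + n)
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper tail.
    tiny = 1e-300
    b = z + 1.0 - shape
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - shape)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


# Estimated null moments and observed Q as reported in Section 4.2.
mean_q, second_moment_q, q_obs = 3.70, 19.37, 8.86
alpha, beta = gamma_params(mean_q, second_moment_q)
p_gamma = gamma_sf(q_obs, alpha, beta)
# Chi-square with fractional df equal to E[Q]: gamma with shape df/2, scale 2.
p_chi_frac = gamma_sf(q_obs, mean_q / 2, 2.0)
```

Under these inputs the sketch should give shape and scale close to the 2.41 and 1.54 quoted in Section 4.2, with tail probabilities close to the reported 0.037 and 0.053.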
Our program, available at http://www.imperial.ac.uk/stathelp/researchprojects/metaanalysis, can be used to perform the calculations for these examples.

4.1 Meta-analysis of the use of a placebo for pain relief

As a first example, consider the meta-analysis by Hróbjartsson & Gøtzsche (2004) of 17 randomized clinical trials comparing the use of a placebo for pain against no treatment at all. Summary data from the meta-analysis are given in Table 1. Because different studies used different measurement scales for evaluating pain, the standardized mean difference is used in the meta-analysis in order to place each of the effects on a scale-free basis. The effect from each study appears in the table in the column headed $\hat g$. The weights (from Equation 12) which appear in the last column of the table are given as percentages for ease of comparison, but the actual weights are needed for computation of the $Q$ statistic. The actual weights can be computed using the weight total, which is $W = 212.91$. The weighted average of the effects is $\hat g_w = -0.338$. The value of Cochran's $Q$ statistic is 22.07. Using the standard chi-square approximation with 16 degrees of freedom provides the p-value of 0.141 for the test for homogeneity.

To use the results from Sections 2 and 3, first the weights need to be recalculated to reflect the null hypothesis of equal standardized mean differences. We take this null value (as found above) to be $\hat g_w = -0.338$ for each of the 17 studies and recalculate the weights using Equation 12. Then the estimated first and second moments of the null distribution of $Q$ can be calculated from Equations 2 and 3 and the Appendix, yielding the values $E[Q] = 15.19$ and $E[Q^2] = 257.57$ respectively. Thus the estimated parameters of the approximating gamma distribution are $\alpha = 8.96$ (shape parameter) and $\beta = 1.70$ (scale parameter). The p-value corresponding to $Q = 22.07$ is 0.098.
The p-value for a chi-square distribution with $E[Q] = 15.19$ degrees of freedom is 0.112.

| Study | n_T | X̄_T | s_T | n_C | X̄_C | s_C | ĝ | w (%) |
|---|---|---|---|---|---|---|---|---|
| Reading 1982 | 18 | 1.60 | 1.30 | 20 | 2.30 | 2.00 | -0.402 | 4.3 |
| Conn 1986 | 13 | 28.20 | 18.40 | 14 | 44.40 | 15.70 | -0.921 | 2.8 |
| Hashish 1986 | 25 | 16.00 | 11.70 | 50 | 30.00 | 18.90 | -0.821 | 7.2 |
| Hashish 1988 | 25 | 42.00 | 25.00 | 25 | 60.00 | 23.00 | -0.738 | 5.4 |
| Hargreaves 1989 | 25 | 4.50 | 2.50 | 25 | 4.90 | 2.40 | -0.161 | 5.8 |
| Blanchard 1990b | 18 | 11.90 | 23.90 | 24 | 20.70 | 34.80 | -0.282 | 4.7 |
| Blanchard 1990a | 13 | 8.30 | 13.60 | 11 | 22.50 | 25.10 | -0.697 | 2.5 |
| Sprott 1993 | 10 | 7.90 | 3.00 | 10 | 7.40 | 3.00 | 0.160 | 2.3 |
| Forster 1994 | 15 | 3.20 | 2.80 | 15 | 4.60 | 2.20 | -0.541 | 3.3 |
| Parker 1995 | 49 | 4.00 | 1.90 | 45 | 3.80 | 2.20 | 0.097 | 10.9 |
| Rowbotham 1996 | 35 | -4.40 | 8.70 | 35 | 1.90 | 8.70 | -0.716 | 7.6 |
| Wang 1997 | 25 | 10.70 | 7.30 | 26 | 13.40 | 5.80 | -0.404 | 5.8 |
| Robinson 2001 | 13 | 3.85 | 3.48 | 10 | 4.25 | 3.74 | -0.107 | 2.6 |
| Cupal 2001 | 10 | 2.70 | 0.95 | 10 | 2.70 | 1.34 | 0.000 | 2.3 |
| Rawling 2001 | 89 | 5.30 | 4.72 | 96 | 5.60 | 4.90 | -0.062 | 21.6 |
| Kotani 2001 | 23 | 15.00 | 4.50 | 24 | 18.00 | 6.00 | -0.554 | 5.2 |
| Lin 2002 | 25 | 30.20 | 14.40 | 25 | 38.10 | 16.00 | -0.511 | 5.6 |

Table 1: Data on placebo interventions for pain, Hróbjartsson & Gøtzsche (2004). The data are on clinician-rated pain scales. The subscripts T and C refer to the treatment and control arms of the studies. The column headed ĝ contains the estimated standardized mean differences between the two arms of each study and the column headed w contains the weights (as percentages) used in computing the Q statistic.

To assess the relative accuracy of the three approximations (gamma and chi-square with 16 and with 15.19 degrees of freedom) to the null distribution of $Q$, we conducted a simulation of 100,000 random samples with seventeen studies having the same sizes as those of Hróbjartsson & Gøtzsche (2004), but with all studies having the null value of the standardized mean difference $\delta = -0.338$. The comparisons are as follows, where the notation 'true' null refers to the simulation of 100,000 samples:

| Distribution | p-value for Q = 22.07 | E[Q] | E[Q²] | α | β |
|---|---|---|---|---|---|
| simulation ('true' null) | 0.108 | 15.22 | 260.76 | | |
| chi-square est-df | 0.112 | 15.19 | | | |
| gamma | 0.098 | 15.19 | 257.57 | 8.96 | 1.70 |
| chi-square 16 df | 0.141 | 16 | 288 | | |

The p-values produced by the gamma distribution and especially by the chi-square distribution with fractional degrees of freedom are substantially closer to the 'true' p-value as given by the simulations. Notice that the first and second moments of the 'true' null distribution of $Q$ are smaller than the corresponding moments of the chi-square distribution, indicating the need for corrections. Our formulas produce an excellent approximation of the first moment. The approximation for the second moment is much better than that given by the chi-square distribution, but it is not nearly as good as the approximation of the first moment.

4.2 Meta-analysis of light therapy for depression

For a second example, consider the data from a meta-analysis of five studies to determine the effect of light therapy for non-seasonal depression (bright light vs standard treatment), Tuunainen et al. (2004). See Table 2 for the summary data.

| Study | n_T | X̄_T | s_T | n_C | X̄_C | s_C | ĝ | w (%) |
|---|---|---|---|---|---|---|---|---|
| Holsboer 1994 | 14 | 14.50 | 5.59 | 14 | 8.64 | 8.38 | 0.80 | 23.2 |
| Fritzsche 2001b | 10 | 15.80 | 5.30 | 10 | 16.90 | 6.40 | -0.18 | 17.8 |
| Fritzsche 2001a | 11 | 10.01 | 8.60 | 9 | 9.50 | 3.80 | 0.07 | 17.7 |
| Prasko 2002 | 11 | 17.00 | 11.20 | 9 | 13.00 | 7.90 | 0.39 | 17.3 |
| Benedetti 2003 | 18 | 11.72 | 9.25 | 12 | 18.75 | 7.78 | -0.79 | 24.0 |

Table 2: Data from a meta-analysis of light therapy for non-seasonal depression (bright light vs standard treatment), Tuunainen et al. (2004). The data are on clinician-rated mood scales. The subscripts T and C refer to the treatment and control arms of the studies. The column headed ĝ contains the standardized mean differences between the two arms of each study and the column headed w contains the weights (as percentages) used in computing the Q statistic.
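The effect estimates, weights and Q statistic for this example can be recomputed from the summary data in Table 2 using Equations 8 to 12. The following Python sketch is illustrative only; the helper names are assumptions, and small differences from the published figures are to be expected because the tabulated means and standard deviations are rounded.

```python
import math

# (n_T, mean_T, sd_T, n_C, mean_C, sd_C), copied from Table 2.
STUDIES = [
    (14, 14.50, 5.59, 14, 8.64, 8.38),    # Holsboer 1994
    (10, 15.80, 5.30, 10, 16.90, 6.40),   # Fritzsche 2001b
    (11, 10.01, 8.60, 9, 9.50, 3.80),     # Fritzsche 2001a
    (11, 17.00, 11.20, 9, 13.00, 7.90),   # Prasko 2002
    (18, 11.72, 9.25, 12, 18.75, 7.78),   # Benedetti 2003
]


def bias_factor(total_n):
    """J from Equation 9, evaluated on the log scale for stability."""
    half_df = (total_n - 2) / 2
    log_ratio = math.lgamma(half_df) - math.lgamma(half_df - 0.5)
    return math.exp(log_ratio) / math.sqrt(half_df)


def effect_and_weight(n_t, mean_t, sd_t, n_c, mean_c, sd_c):
    """Return (g, w): the unbiased effect (Eq. 8) and its weight (Eq. 12)."""
    total_n = n_t + n_c
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (total_n - 2)
    j = bias_factor(total_n)
    g_hat = j * (mean_t - mean_c) / math.sqrt(pooled_var)
    q_c = n_c / total_n  # proportion of the sample in the control arm
    a = (total_n - 2) * j ** 2 / ((total_n - 4) * total_n * q_c * (1 - q_c))
    b = (total_n - 2) * j ** 2 / (total_n - 4) - 1  # A and B from Equation 10
    return g_hat, 1.0 / (a + b * g_hat ** 2)


pairs = [effect_and_weight(*study) for study in STUDIES]
g = [pair[0] for pair in pairs]
w = [pair[1] for pair in pairs]
W = sum(w)                                            # total weight
g_w = sum(wi * gi for wi, gi in zip(w, g)) / W        # weighted average effect
Q = sum(wi * (gi - g_w) ** 2 for wi, gi in zip(w, g))
# For 4 degrees of freedom the chi-square tail has the closed form below.
p_chi4 = math.exp(-Q / 2) * (1 + Q / 2)
```

The closed-form tail probability is specific to 4 degrees of freedom; other degrees of freedom, including fractional ones, require an incomplete gamma function routine.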
The outcomes of the treatments were measured on clinician-rated mood scales. The standardized mean difference statistic was used in the meta-analysis because different mood-scale scores were used in different studies. The weighted average of the effects is 0.0437. The total of the weights is 27.1. The value of Cochran's $Q$ statistic is 8.86, and the standard chi-square approximation with 4 degrees of freedom provides the p-value of 0.065 for the test for homogeneity.

To use the results from Sections 2 and 3, the weights first need to be recalculated to reflect the null hypothesis of equal standardized mean differences. We take this null value (as found above) to be $\hat g_w = 0.0437$ for each of the 5 studies and recalculate the weights using Equation 12. Then the formulas yield the following results. The estimated first and second moments of the null distribution of $Q$ are $E[Q] = 3.70$ and $E[Q^2] = 19.37$ respectively. Thus the estimated parameters of the approximating gamma distribution are $\alpha = 2.41$ (shape parameter) and $\beta = 1.54$ (scale parameter). The p-values corresponding to $Q = 8.86$ are 0.037 (gamma approximation) and 0.053 (chi-square with 3.70 degrees of freedom).

To assess the relative accuracy of the gamma and chi-square approximations to the null distribution of $Q$, we conducted a simulation of 100,000 random samples with five studies of the same sizes as those of Tuunainen et al. (2004), but with all studies having the null value of the standardized mean difference $\delta = 0.0437$. The comparisons are as follows, where the notation 'true' null refers to the simulation of 100,000 samples:

                            p-value for Q = 8.86    E[Q]    E[Q^2]    alpha    beta
simulation ('true' null)    0.050                   3.74    20.95
chi-square, est. df         0.053                   3.70
gamma                       0.037                   3.70    19.37     2.41     1.54
chi-square, 4 df            0.065                   4       24

Notice again that the first and second moments of the 'true' null distribution of $Q$ are smaller than the corresponding moments of the chi-square distribution. Our formulas produce better approximations of these moments, but even with these better approximations the p-value of the approximating gamma distribution is only slightly more accurate than that produced by the chi-square distribution. The p-value from the chi-square distribution with 3.70 d.f. (0.053) is very close to that of the simulations (0.050). The sample sizes which appear in this meta-analysis (about 10 patients in each of the two arms of the studies) are simply too small for the asymptotics implicit in our formulas for the second moment of $Q$ to be valid. It is somewhat surprising, but gratifying, that the method based on the chi-square distribution with fractional d.f. is so accurate in this example. For meta-analyses with samples of such small sizes, perhaps the best method of finding a p-value associated with the obtained value of $Q$ is the bootstrap-type procedure which we used above: conduct a large simulation with the sample sizes of the actual data and the weighted average of the effects used as a null value.

4.3 Generalizations from the examples

There are some features of the examples which are common not only to the two examples but also to all the simulations we have conducted. We wish to comment on some of these here. Notice that the mean of the null distribution for $Q$ found via the simulations is somewhat less than the chi-square mean of $I - 1$, and the second moment of $Q$ is substantially less than the chi-square second moment of $I^2 - 1$. These facts appear to be general.
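For reference, the chi-square benchmarks quoted above follow from the standard mean and variance of a chi-square variable with $I - 1$ degrees of freedom. The short derivation below uses only those two textbook facts:

```latex
\[
E\big[\chi^2_{I-1}\big] = I - 1, \qquad
\operatorname{Var}\big[\chi^2_{I-1}\big] = 2(I - 1),
\]
\[
E\big[(\chi^2_{I-1})^2\big]
  = \operatorname{Var}\big[\chi^2_{I-1}\big] + \big(E\big[\chi^2_{I-1}\big]\big)^2
  = 2(I - 1) + (I - 1)^2
  = (I - 1)(I + 1)
  = I^2 - 1 .
\]
```

With $I = 17$ this gives a mean of 16 and a second moment of 288, and with $I = 5$ it gives 4 and 24, which are the values in the last rows of the two comparison tables.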
The formulas of Sections 2 and 3 which we use for estimating the mean and second moment of $Q$ underestimate both moments but provide estimates which are substantially closer than the chi-square values to the simulated values. The formula which estimates the mean seems to be very accurate, but the formula for estimating the second moment is not as good. The over-estimation by the chi-square approximation results, as is well known (see for example Viechtbauer (2007)), in a conservative hypothesis test; that is, the null hypothesis is not rejected often enough. The underestimation by our formulas results in a slightly liberal hypothesis test when the gamma approximation is used, but in general its p-values are closer to the true values than those of the chi-square approximation. The chi-square distribution with estimated $E(Q)$ degrees of freedom provides a nearly perfect fit.

The fit of the gamma family of distributions to the empirical distribution of $Q$ is remarkably close. The inaccuracy in the p-values given by our gamma approximation appears to be due to the underestimation of the second moment of $Q$. In fact, if we were able to estimate the second moment of $Q$ accurately, then the estimated p-values would agree with the simulated p-values in our examples to three decimals. We do not understand why the expansion for $E[Q^2]$ is not more accurate, or why it always seems to underestimate the second moment. Resolution of this question is an area of possible future research.

5 Simulations

The simulations were performed using the R programming language (R Development Core Team, 2004). The details of the simulations are presented in four tables (Tables 6, 7, 8 and 9), all of which compare the $Q$ test using the usual chi-square approximation to the $Q$ test using the gamma approximation and the chi-square approximation with fractional degrees of freedom presented in this article.
Table 6 contains results of the $Q$ test under the null hypothesis in the situation where all studies have the same size, the treatment and control arms are equal, and the combined effect $\delta$ is estimated by $\hat g_w$. Table 7 contains results similar to those of Table 6, but here the combined effect is estimated by $\hat g_A$. (See the end of Section 3.1 for the distinction between $\hat g_A$ and $\hat g_w$.) Table 8 also contains results of the $Q$ test under the null hypothesis, but in the situation in which the study sizes are not equal. Finally, Table 9 contains simulation results about the power of the $Q$ test.

5.1 Simulations under the null hypothesis: equal study sizes

Since $\sqrt{Nq(1-q)}\,\hat g / J \sim t_{N-2}\big(\sqrt{Nq(1-q)}\,\delta\big)$, the values of $\hat g$ could be simulated directly from the appropriately scaled non-central $t$-distribution. In this case the quality of the simulations would depend on the implementation of the non-central $t$. Instead we calculated $\hat g$ from first principles, using $\sigma_C = \sigma_T = 1$, and simulating sample means $\bar X_C \sim N(0, n_C^{-1})$, $\bar X_T \sim N(\delta, n_T^{-1})$ and sample variances $(n_C - 1)s_C^2 \sim \chi^2_{n_C - 1}$ and $(n_T - 1)s_T^2 \sim \chi^2_{n_T - 1}$.

The first series of simulations was performed for the situation in which all $I$ of the studies have equal sample sizes. The data patterns used in the first series of simulations are described in Table 3. Each data pattern was replicated 100,000 times. The results of these simulations for the case of equal treatment and control arms ($q = 1/2$) appear in Tables 6 and 7.
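The first-principles draw of $\hat g$ described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function names are hypothetical, and the correction factor $J$ is assumed to be the usual exact Hedges factor $J(m) = \Gamma(m/2) / \big(\sqrt{m/2}\,\Gamma((m-1)/2)\big)$ with $m = N - 2$, since its definition is not restated in this section.

```python
import math
import random


def hedges_j(m):
    """Small-sample correction factor J(m); assumed to be the exact Hedges form."""
    log_ratio = math.lgamma(m / 2.0) - math.lgamma((m - 1) / 2.0)
    return math.exp(log_ratio) / math.sqrt(m / 2.0)


def simulate_g(n_t, n_c, delta, rng):
    """One draw of g-hat built from sample means and variances (sigma_T = sigma_C = 1)."""
    mean_c = rng.gauss(0.0, math.sqrt(1.0 / n_c))      # Xbar_C ~ N(0, 1/n_C)
    mean_t = rng.gauss(delta, math.sqrt(1.0 / n_t))    # Xbar_T ~ N(delta, 1/n_T)
    # (n - 1) s^2 ~ chi-square(n - 1); a chi-square(k) draw is gamma(shape k/2, scale 2)
    ss_c = rng.gammavariate((n_c - 1) / 2.0, 2.0)
    ss_t = rng.gammavariate((n_t - 1) / 2.0, 2.0)
    df = n_t + n_c - 2
    s_pooled = math.sqrt((ss_c + ss_t) / df)           # pooled standard deviation
    return hedges_j(df) * (mean_t - mean_c) / s_pooled


# Example: N = 40 split equally between arms (q = 1/2), null SMD delta = 0.5.
rng = random.Random(2009)
draws = [simulate_g(20, 20, 0.5, rng) for _ in range(20000)]
mean_g = sum(draws) / len(draws)
print("average simulated g-hat:", round(mean_g, 3))
```

Building $\hat g$ from normal means and chi-square sums of squares avoids any dependence on a non-central $t$ routine, which is the reason given above for preferring this route.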
I (number of studies)                                   5, 10, 20, 50
N (total size of both arms of each study)               20, 30, 40, 100, 200
q (proportion of each study size in the control arm)    1/2, 3/4
delta (null value of the SMD)                           0, 0.2, 0.5, 1, 2

Table 3: Data pattern of the simulations used in Tables 6 and 7 for the Type I error in the Q test

The choice of $\delta$ values was determined by the standard convention (Cohen, 1988) that the $\delta$ values of 0.2 and 0.5 constitute small and medium effect sizes, respectively. Instead of using the traditional 'large' effect size of 0.8, we moved beyond it to values of 1 and 2 to explore the possible consequences for the $Q$ test of very large values of $\delta$. Previous simulations by Viechtbauer (2007) did not uncover any such consequence for $\delta$ values up to 0.8.

Four p-values were obtained for each value of $Q$ calculated from one of the 100,000 replications: the standard chi-square based p-value; the p-value based on the gamma approximation using the known value of $\delta$ together with the formulas given in Equations 2 and 3; the p-value based on the gamma approximation using the estimated null value of $\delta$ together with the formulas given in Equations 2 and 3; and the p-value based on the chi-square approximation using the estimated degrees of freedom equal to $E(Q)$. These p-values were then compared to the levels $\alpha = 0.05$ and $\alpha = 0.1$ to obtain the type I errors of each approximation at the 5% and 10% nominal levels. In the tables below these values are denoted by $\chi^2_\alpha$, $\Gamma^{th}_\alpha$, $\Gamma^{s}_\alpha$, and $\chi^2_{E(Q),\alpha}$ respectively.
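To make the p-value calculations concrete, here is a hedged Python sketch of the three reference distributions: the standard chi-square, the moment-matched gamma, and the chi-square with fractional degrees of freedom. It takes the two moments as inputs (Equations 2 and 3 are not reproduced here) and uses the values $E[Q] = 3.70$ and $E[Q^2] = 19.37$ reported in Section 4.2 as an illustration. The function names are hypothetical, and the upper tail is evaluated with a textbook series and continued fraction for the incomplete gamma function rather than with the authors' R program.

```python
import math


def gamma_sf(x, shape, scale=1.0):
    """Upper tail P(X > x) of a gamma(shape, scale) variable."""
    if x <= 0:
        return 1.0
    a, z = shape, x / scale
    log_pref = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower regularized incomplete gamma function
        term = total = 1.0 / a
        n = a
        for _ in range(10000):
            n += 1.0
            term *= z / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction (modified Lentz) for the upper regularized function
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h * math.exp(log_pref)


def chi2_sf(x, df):
    """Upper tail of a chi-square distribution; df may be fractional."""
    return gamma_sf(x, df / 2.0, 2.0)


def gamma_from_moments(m1, m2):
    """Shape and scale of the gamma distribution with mean m1 and second moment m2."""
    var = m2 - m1 ** 2
    return m1 ** 2 / var, var / m1


q_obs, n_studies = 8.86, 5          # light therapy example, Section 4.2
m1, m2 = 3.70, 19.37                # estimated E[Q] and E[Q^2] from the text
shape, scale = gamma_from_moments(m1, m2)
p_standard = chi2_sf(q_obs, n_studies - 1)   # chi-square with I - 1 df
p_gamma = gamma_sf(q_obs, shape, scale)      # two-moment gamma
p_fractional = chi2_sf(q_obs, m1)            # chi-square with E[Q] df
print(round(shape, 2), round(scale, 2))
print(p_standard, p_gamma, p_fractional)
```

Because the inputs here are the rounded moments quoted in the text, the gamma and fractional chi-square p-values may differ in the last digit from those reported in Section 4.2.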
In addition to the three p-values ($\chi^2_\alpha$, $\Gamma^{s}_\alpha$, and $\chi^2_{E(Q),\alpha}$), Table 6 contains the first two moments of $Q$ calculated from our formulas with known $\delta$ (denoted $E_f[Q]$ and $E_f[Q^2]$ in the table, where the subscript $f$ denotes a result calculated from our approximation formulas) and their sample counterparts $\bar Q$ and $\overline{Q^2}$; Table 7 additionally provides the fourth p-value $\Gamma^{th}_\alpha$, the variance $\mathrm{Var}_f[Q]$ and the sample variance $s^2(Q)$. These data permit us to judge the accuracy of the formulas which give approximations for the moments of $Q$ by comparing the formula values with the simulated distribution of $Q$.

Results of the simulations with equal study sizes

The first set of simulations can be used to answer two types of questions. How accurate are the moments estimated by our formulas, especially compared to the accuracy of the standard chi-square approximation? And how accurate are the p-values (at the nominal levels 0.05 and 0.10) given by the gamma approximation and the chi-square approximation with fractional degrees of freedom, especially in comparison with the p-values produced by the standard chi-square approximation? We begin with the moments.

Accuracy of the approximating moments

The simulations provide us with sample estimates of the moments of $Q$, denoted $\bar Q$ and $\overline{Q^2}$, which we take to be 'true' values. Thus we can estimate the relative error in the first moment of the two approximations by $(E_f[Q]/\bar Q - 1) \times 100\%$ and $((I - 1)/\bar Q - 1) \times 100\%$; and similarly estimate the relative errors in the second moments by $(E_f[Q^2]/\overline{Q^2} - 1) \times 100\%$ and $((I^2 - 1)/\overline{Q^2} - 1) \times 100\%$.

The three graphs of Figure 1 provide a summary of the comparison of the two approximations to the first moment.
Figure 1: Relative error of two approximations to the mean of $Q$ as a function of the total sample size of each study $N$ (left), of the number of studies $I$ (center), and of the standardized mean difference $\delta$ (right). The lower curves are based on Equation 2 and the upper curves are from the chi-square first moment. On the first and the second plots the null value of the SMD $\delta$ is fixed at 0.5. On the rightmost plot, the number of studies is fixed at $I = 20$.

We see that $E_f[Q]$ is generally quite accurate although it slightly underestimates $\bar Q$. In fact the relative error in $E_f[Q]$ is almost always less than 3%, is less than 1% for samples of size $N = 30$, and is essentially perfect beginning with sample sizes of $N = 40$. In contrast, the chi-square moment is always too large, with relative errors of more than 10% when $N = 20$ and around 5% when $N = 30$ or 40. Except for the case when the number of studies is small ($I = 5$), the relative error of the chi-square first moment remains as high as 1–2% even when the study sizes are as large as $N = 200$. We also see from the graphs that the relative errors do not seem to depend on the number of studies $I$ or on the standardized mean difference $\delta$, with the exception that for the chi-square approximation the relative error in the first moment increases slightly for the very large (and somewhat unrealistic) values of $\delta = 1$ and 2.

The three graphs of Figure 2 provide a summary of the comparison of the percent error in the approximation of the second moment $E[Q^2]$ by the two approximating distributions.

Figure 2: Relative error of two approximations to the second moment of $Q$ as a function of the total sample size of each study $N$ (left), of the number of studies $I$ (center), and of the standardized mean difference $\delta$ (right). The lower curves are based on Equation 3 and the upper curves are from the chi-square second moment. On the first and the second plots the null value of the SMD $\delta$ is fixed at 0.5. On the rightmost plot, the number of studies is fixed at $I = 20$.

We see that the chi-square approximation overestimates the second moment while our formula underestimates the second moment, but by a smaller amount. The percent error for both approximations decreases as the total sample size $N$ increases. The chi-square error starts at about 20% for $N = 20$ and decreases to 9% for $N = 40$, and at $N = 100$ the error is still in the 2–3% range. In contrast, the error using our formula starts at about 9% for $N = 20$, decreases to less than 2% for $N = 40$, and at $N = 100$ the error is less than 1%.

We see from the graphs that the relative error in the second moment does not appear to have much dependence on the number of studies $I$, except that there is a small difference in error for the very small number of studies $I = 5$. The relative error for the formula values $E_f[Q^2]$ seems to be independent of $\delta$, but surprisingly there is some increase in the relative error of the chi-square approximation as $\delta$ increases, especially for the very large values of $\delta = 1$ and 2.

Accuracy of significance levels: two-moment gamma vs standard chi-square approximation

The dependence of the achieved level on the size of the studies $N$ for our gamma and the standard chi-square approximations can be seen graphically in Figure 3.

The type I error of the $Q$ test of homogeneity using the standard chi-square approximation is considerably lower than the nominal level, and hence the standard test is very conservative. This conservativeness is a well known fact; our simulations agree with the simulations of Sánchez-Meca & Marín-Martínez (1997), Viechtbauer (2007), and others.

Figure 3: Achieved levels of the Q test at the nominal level of 0.05 (left) and 0.1 (right) using two approximations, as a function of the total sample size of each study $N$. The upper curves are from the gamma approximation and the lower curves are from the chi-square approximation. The null value of the SMD $\delta$ is fixed at 0.5. To better show details, the data for $N = 20$ have been omitted.

Because the standard test is so conservative, there is a well known recommendation to use the 10% significance level for the $Q$ test (see Petitti (2001), among others). Our simulations confirm that this recommendation is certainly justified; for 10 or more small studies ($N = 20$), the type I error at the 10% significance level is closer to 5% than to 10%.

In contrast, our gamma approximation is somewhat liberal for small values of $N$. In fact, for total study sizes as small as $N = 20$ the gamma approximation is sufficiently poor that we do not recommend it. For $N = 30$ the true level seems to be in between the two approximations. Starting from $N = 40$ the gamma approximation works better than the standard chi-square approximation. For a fixed value of $I$, the performance of both approximations improves with the study size, but the improvement is considerably faster for the gamma approximation. For $N = 100$ the gamma approximation delivers perfect results, whereas the chi-square approximation is still too conservative.

For fixed study size $N$, the accuracy of the achieved levels decays as the number of studies $I$ increases. For example, for the gamma approximation, studies of size 40 (and even size 30) provide reasonably accurate levels when there are only $I = 5$ studies. However, when the number of studies increases to $I = 50$, larger study sizes are necessary to achieve accurate levels. For $I = 50$, studies of size 40 are not large enough, but studies of size 100 give excellent results. For an intermediate number of $I = 20$ studies, the study size of $N = 40$ gives reasonably accurate levels, producing levels of about 0.055 and 0.108 for nominal levels of 5% and 10% respectively. The pattern is similar for the chi-square approximation: meta-analyses with many studies require large sample sizes for accuracy. But in all cases, the chi-square performs less well than the gamma approximation. The dependence of the behavior of the achieved levels on $I$ can be seen in Figure 4.

Figure 4: Achieved levels of the Q test at the nominal level of 0.05 (left) and 0.1 (right) using two approximations, as a function of the number of studies $I$. The upper curves are from the gamma approximation and the lower curves are from the chi-square approximation. The null value of the SMD $\delta$ is fixed at 0.5. To better show details, the data for $N = 20$ have been omitted.

The simulations show that the type I error of the standard chi-square test decreases as the effect size $\delta$ increases. Thus the test is even more conservative for larger effect sizes. However, the gamma approximation improves as the effect size $\delta$ increases, contrasting with the worsening of the chi-square approximation. The dependence of the behavior of the achieved levels on $\delta$ can be seen in Figure 5.

Figure 5: Achieved levels of the Q test at the nominal level of 0.05 (left) and 0.1 (right) using two approximations, as a function of the standardized mean difference $\delta$. The upper curves are from the gamma approximation and the lower curves are from the chi-square approximation. The number of studies is fixed at $I = 20$. To better show details, the data for $N = 20$ have been omitted.

Accuracy of significance levels for the chi-square approximation with fractional degrees of freedom

The simulation results for the fractional chi-square test are not included in the figures. As can be seen from Table 6, in every instance, the fractional chi-square test is superior to the usual chi-square test.
Most important for applications is the fact that the improvement given by the fractional chi-square is substantial for small to moderate sample sizes, from $N = 20$. As examples of this improvement, consider the case of $I = 20$ studies and $\delta = 0.5$. The simulations indicate the following improvements in the achieved level at the two nominal levels of 0.05 and 0.10: for $N = 20$ the achieved levels improve from 0.021 to 0.046 and from 0.050 to 0.098, respectively; for $N = 40$ from 0.035 to 0.047 and from 0.076 to 0.096, respectively; and even for a study size as large as $N = 100$, the achieved levels improve from 0.044 to 0.048 and from 0.090 to 0.099, respectively.

Other results of the equal study size simulations

First, the simulations of Table 6 were repeated with equal total study sizes as before, but with each study having an unbalanced design with three-quarters of the study size present in the control arm ($q = 3/4$). The results were so similar to those of the balanced studies that we have included neither a table of the results analogous to Table 6 nor graphical displays of the data.

Second, there is not much difference between the type I error with a known value of $\delta$ (denoted by $\Gamma^{th}_\alpha$ in Table 7) and the type I error with an estimated null value of $\delta$ (denoted by $\Gamma^{s}_\alpha$ in Tables 6 and 7). Of course, only the latter test can be used in practice.

Finally, the results in Table 6 used the estimated null value $\hat\delta = \sum w_i \delta_i / \sum w_i$. In Table 7 the simulations were repeated using $\hat\delta = \sum A_i^{-1} \delta_i / \sum A_i^{-1}$ instead. It is known (Yuan & Bushman, 2002) that the former, more natural, estimator is a biased estimator of the combined null value of $\delta$. Does the choice of estimator of $\delta$ affect the results? It can be seen that the only noticeable differences are for $N = 20$ and $\delta = 2$.
Then the constant weights $A_i^{-1}$ provide p-values closer to those obtained using the known value of $\delta$ when using the gamma approximation. Interestingly, the inverse variance weights provide p-values closer to nominal for $I = 10$ and $I = 20$, but not for $I = 5$ or $I = 50$. These differences are only academic, though: we do not recommend our gamma approximation for $N = 20$ in any case, and $\delta = 2$ is much too large. Thus there is no practical difference between the two choices; take your pick.

5.2 Simulations under the null hypothesis: unequal study sizes

The second series of simulations used unequal study sizes. We have followed a suggestion of Sánchez-Meca & Marín-Martínez (2000), who selected the following study sizes with a skewness of 1.464, which they consider typical for meta-analyses in the field of behavioral and health sciences: the set $N_1$ with average study size of 60, consisting of individual sizes {24, 32, 36, 40, 168}; the set $N_2$ with average study size of 100, consisting of individual sizes {64, 72, 76, 80, 208}; and the set $N_3$ with average study size of 160, consisting of individual sizes {124, 132, 136, 140, 268}. We have taken the studies to be balanced, thus dividing each study size equally between the two study arms. The simulations were run for $I = 5$, 10 and 20. For meta-analyses with $I = 10$ and $I = 20$, the same set of sample sizes was repeated twice or four times, respectively. The data patterns of the simulations are summarized in Table 4.
I (number of studies)                              5, 10, 20
N-bar (average and (individual) study sizes)       60 (24, 32, 36, 40, 168)
                                                   100 (64, 72, 76, 80, 208)
                                                   160 (124, 132, 136, 140, 268)
q (proportion of each study in the control arm)    1/2
delta (null value of the SMD)                      0.5

Table 4: Data pattern of the simulations used in Table 8 for the Type I error in the Q test for unequal study sizes

Results of the simulations with unequal study sizes

The results of the simulations with unequal study sizes are given in Table 8. The approximation of the moments is excellent. The first moments given by the formulas are nearly exact (relative error less than 2%) and the second moments have relative error less than 3%, compared with relative errors of the chi-square first and second moments of more than 5% and 10% respectively.

The significance levels are similar to those obtained from the simulations for equal study sizes. The chi-square approximation yields a conservative test, while the gamma approximation yields a liberal test which is closer to the nominal levels. At the significance levels of 0.05 and 0.10, the gamma approximation is nearly perfect for the larger two sizes $\bar N = 100$ and 160, while the error in the level of the chi-square approximation is substantial even for the largest size of $\bar N = 160$. For the smaller size of $\bar N = 60$, the gamma approximation has an error of roughly half that of the chi-square approximation. Graphical displays of the levels are shown in Figure 6. Once more, the results from the fractional chi-square approximation are nearly perfect even for the smallest sample sizes.

Figure 6: Achieved levels of the Q test at the nominal levels of 0.05 (left) and 0.1 (right) using two approximations, as a function of the average sample size of each study $\bar N$. The sample sizes are unequal in this figure. The upper curves are from the gamma approximation and the lower curves are from the chi-square approximation. $I$ is the number of studies. The standardized mean difference has been fixed at $\delta = 0.5$.

5.3 Comparison of the power of the Q tests

The standard $Q$ test using the chi-square approximation is well known to have low power (see for example Viechtbauer (2007)). In this section we report on simulations to see how the power of the $Q$ test is improved by the use of our moment approximations. To this end, we adopt the random effects model in which the heterogeneity in effects among the several studies is modeled by the assumption that the effect $\delta_i$ of the $i$th study is normally distributed about a fixed mean $\delta$ with variance $\tau^2$. Then the null (homogeneity) hypothesis becomes $\tau^2 = 0$ and alternatives are measured by the magnitude of $\tau^2$. In the simulations, we have taken $\delta = 0.5$, a 'medium' effect size, and have varied $\tau^2$ from 0.025 to 0.25. We compared the power of the standard and the improved tests in the range from $N = 20$ to $N = 80$ where we expected noticeable differences: in general, the power of the standard $Q$ test is considered not to be sufficient for $N \le 80$ (Viechtbauer, 2007). The data patterns for the power simulations are specified in Table 5.

I (number of studies)                                5, 10, 20, 50
N (equal study sizes)                                20, 30, 40, 50, 60, 80
q (proportion of each study in the control arm)      1/2
delta (null value of the SMD)                        0.5
tau^2 (variance of random effect)                    0.025, 0.05, 0.1, 0.15, 0.20, 0.25
alpha (nominal significance level of the test)       0.05, 0.10

Table 5: Data pattern of the simulations used in Table 9 for the power of the Q test.

We conducted 10,000 repetitions for each configuration. We simulated within-study parameters $\delta_i \sim N(\delta, \tau^2)$, $i = 1, \ldots, I$, and then simulated the values of $\hat g_i$ directly from the appropriately scaled non-central $t$-distribution with non-centrality parameter $\sqrt{Nq(1-q)}\,\delta_i$. The results of the simulations appear in Table 9.
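The two-stage draw used for the power simulations can be sketched as below. This is an illustrative Python version under stated assumptions, not the authors' R code: the function names are hypothetical, the non-central $t$ variate is built from its definition $(Z + \lambda)/\sqrt{V/\nu}$ with $Z$ standard normal and $V \sim \chi^2_\nu$ instead of calling a library routine, and $J$ is assumed to be the exact Hedges correction factor.

```python
import math
import random


def hedges_j(m):
    """Small-sample correction factor J(m); assumed to be the exact Hedges form."""
    log_ratio = math.lgamma(m / 2.0) - math.lgamma((m - 1) / 2.0)
    return math.exp(log_ratio) / math.sqrt(m / 2.0)


def draw_g_random_effects(n_total, q, delta, tau2, rng):
    """Draw delta_i ~ N(delta, tau2), then g-hat_i from the scaled non-central t."""
    delta_i = rng.gauss(delta, math.sqrt(tau2))
    scale = math.sqrt(n_total * q * (1.0 - q))
    df = n_total - 2
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)               # chi-square(df) draw
    t = (z + scale * delta_i) / math.sqrt(v / df)     # non-central t, ncp = scale * delta_i
    return hedges_j(df) * t / scale


def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# Example: N = 40, q = 1/2, delta = 0.5, homogeneous versus tau^2 = 0.25.
rng = random.Random(1)
hom = [draw_g_random_effects(40, 0.5, 0.5, 0.0, rng) for _ in range(20000)]
het = [draw_g_random_effects(40, 0.5, 0.5, 0.25, rng) for _ in range(20000)]
print("variance of g-hat, tau^2 = 0:   ", round(sample_variance(hom), 3))
print("variance of g-hat, tau^2 = 0.25:", round(sample_variance(het), 3))
```

The extra between-study variance shows up directly as a wider spread of the $\hat g_i$, which is what the $Q$ test is trying to detect.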
Results of the power simulations

Since the test based on the gamma approximation is liberal, its power is higher than the power of the conservative standard test. We note that the power of the test using the fractional chi-square distribution is also always higher than that of the test using the standard chi-square approximation. In this discussion, we focus on the magnitude of the improvement in power rather than on the power of the tests separately.

The most striking result of the simulations is that the power improvement increases as the number of studies increases and as the sizes of the studies decrease. The greatest improvement in power for the fractional chi-square test in comparison to the standard test (based on the range of our simulations) is 21 percentage points, which occurred for $I = 50$ studies, study sizes $N = 20$, and $\tau^2 = 0.1$. The maximum improvements for the other values of $I$ were 12 percentage points for $I = 20$, 7 percentage points for $I = 10$ and 4 percentage points for $I = 5$, all occurring at the smallest study size of $N = 20$. As the study sizes $N$ increase from $N = 20$ to $N = 40$, the improvement in power for the fractional chi-square test decreases by roughly two-thirds. Finally, we note that the increases in power at the two different levels of 0.05 and 0.10 were quite similar to each other.

Since the gamma approximation is recommended only for $N \ge 40$, we consider this range when comparing the power of the test based on the gamma approximation to the standard chi-square test. The greatest improvement in power is 11 to 12 percentage points, which occurred for the largest number of studies $I = 50$ and the smallest study sizes $N = 40$. The maximum improvements for the other values of $I$ were 7 percentage points for $I = 20$, 5 percentage points for $I = 10$ and 3 percentage points for $I = 5$.
As the study sizes $N$ increased from $N = 40$ to $N = 80$, the improvement decreased by roughly half. Once more, the increases in power at the two different levels of 0.05 and 0.10 were quite similar to each other.

6 Summary and concluding discussion

The main focus of this paper is the improvement of the test for homogeneity commonly used in meta-analysis by referring Cochran's $Q$ statistic to a more accurate distribution. In this paper, we have considered the situation in which the $Q$ statistic is a function of only one parameter and have applied the results to the case in which the effect of interest is measured by the standardized mean difference (SMD or Cohen's $d$ statistic), a measure which is frequently used in meta-analytic applications.

We have presented expansions for the first two moments of $Q$ which are accurate to order $O(1/n)$. These expansions thus offer corrections of order $O(1/n)$ to the corresponding moments of the chi-square approximation to the distribution of $Q$. These expansions are the first that we are aware of to include the situation in which the weights in the $Q$ statistic are not independent of the effects (as is the case with the SMD).

We considered two options to approximate the distribution of $Q$ for the SMD: a gamma distribution with moments matching those of the expansions, or the chi-square distribution with fractional degrees of freedom matching the first moment. Both approximations result in improved $Q$ tests for homogeneity when the effects are measured by the SMD. To facilitate the substantial computations necessary for these improved tests, a computer program in the R language can be downloaded from http://www.imperial.ac.uk/stathelp/researchprojects/metaanalysis .
Our simulations show that the improved test for the SMD using the gamma distribution is somewhat liberal (rejecting the null hypothesis more often than appropriate); in contrast, the currently used test which uses the chi-square distribution is well known to be conservative. But the improved test based on the gamma approximation is quite accurate for study sizes of 40 or more (for example, 20 subjects in each arm of a randomized clinical trial).

However, our recommended test is based not on the gamma approximation but on the use of the fractional chi-square distribution whose first moment matches that of the expansion. In applications, the parameter in the expansion will need to be estimated from the data. Thus our recommended approximating distribution of $Q$ (namely $\chi^2_{E[Q]}$) is data dependent, as opposed to the now standard approximating distribution of $Q$ (namely $\chi^2_{I-1}$), which is data independent. The result is an improved $Q$ test for homogeneity when the effects are measured by the SMD.

Simulations show that our recommended improved $Q$ test for homogeneity yields a substantial improvement over the standard test in accuracy of achieved significance levels, especially for small to moderate study sizes. In addition the improved test provides an increase in power. The simulations show that the improved test works quite well in a variety of circumstances, such as when the individual studies have unbalanced sizes between the two arms or when the studies have substantially different total sizes from each other.

An important limitation of this paper, which is intended to be the first in a series, is the restriction to the one parameter case. In future work, we plan to extend our expansions to the two parameter case and to provide applications to important meta-analytic measures such as the risk difference and the odds ratio.
Acknowledgements

The authors wish to thank Joanne McKenzie for providing data from The Cochrane Database of Systematic Reviews (2004) in an electronic format.

A Appendix

Equation 3, which approximates the second moment of Q, needs expressions for various derivatives of $Q^2$ with respect to $\theta$. These derivatives are provided below. But for ease of reference, we first reproduce Equation 3.

$$
\begin{aligned}
E[Q^2] \approx{} & \frac{1}{24}\sum_i \frac{\partial^4 [Q^2]}{\partial\theta_i^4}\,E[\Theta_i^4]
+ \frac{1}{8}\mathop{\sum\sum}_{i\neq j} \frac{\partial^4 [Q^2]}{\partial\theta_i^2\,\partial\theta_j^2}\,E[\Theta_i^2]\,E[\Theta_j^2] \\
& + \frac{1}{120}\sum_i \frac{\partial^5 [Q^2]}{\partial\theta_i^5}\,E[\Theta_i^5]
+ \frac{1}{12}\mathop{\sum\sum}_{i\neq j} \frac{\partial^5 [Q^2]}{\partial\theta_i^3\,\partial\theta_j^2}\,E[\Theta_i^3]\,E[\Theta_j^2] \\
& + \frac{1}{720}\sum_i \frac{\partial^6 [Q^2]}{\partial\theta_i^6}\,E[\Theta_i^6]
+ \frac{1}{48}\mathop{\sum\sum}_{i\neq j} \frac{\partial^6 [Q^2]}{\partial\theta_i^4\,\partial\theta_j^2}\,E[\Theta_i^4]\,E[\Theta_j^2] \\
& + \frac{1}{48}\mathop{\sum\sum\sum}_{i\neq j\neq k} \frac{\partial^6 [Q^2]}{\partial\theta_i^2\,\partial\theta_j^2\,\partial\theta_k^2}\,E[\Theta_i^2]\,E[\Theta_j^2]\,E[\Theta_k^2]
\end{aligned}
\tag{3}
$$

Here are the derivatives of $Q^2$ needed for the above formula.

$$
\frac{\partial^4 [Q^2]}{\partial\theta_i^4} = 24\, w_i^2 U_i^2 \tag{20}
$$

$$
\frac{\partial^4 [Q^2]}{\partial\theta_i^2\,\partial\theta_j^2} = 8\, w_i w_j \left[ U_i U_j + \frac{2 w_i w_j}{W^2} \right] \tag{21}
$$

$$
\frac{\partial^5 [Q^2]}{\partial\theta_i^5} = 240\, w_i U_i^3\, \frac{d f_i}{d\theta_i} \tag{22}
$$

$$
\frac{\partial^5 [Q^2]}{\partial\theta_i^3\,\partial\theta_j^2} = 24\, U_i w_j \left[ U_i U_j + \frac{5 w_i w_j}{W^2} \right] \frac{d f_i}{d\theta_i}
- \frac{48\, w_i^2}{W} \left[ U_i U_j + \frac{w_i w_j}{W^2} \right] \frac{d f_j}{d\theta_j} \tag{23}
$$

$$
\frac{\partial^6 [Q^2]}{\partial\theta_i^6} = 720\, U_i^3 \left\{ \left[ U_i - \frac{2 w_i}{W} \right] \left( \frac{d f_i}{d\theta_i} \right)^2 + w_i \frac{d^2 f_i}{d\theta_i^2} \right\} \tag{24}
$$

$$
\begin{aligned}
\frac{\partial^6 [Q^2]}{\partial\theta_i^4\,\partial\theta_j^2} = \frac{48}{W^4} \Bigg\{ & -2 W U_i w_j \left( W^2 U_i - 4 W w_j + 9 w_i w_j \right) \left( \frac{d f_i}{d\theta_i} \right)^2 \\
& + W^2 U_i w_j \left( W^2 - W w_j - W w_i + 6 w_i w_j \right) \frac{d^2 f_i}{d\theta_i^2} \\
& - 8 W U_i w_i \left( W^2 U_i - W w_j + 3 w_i w_j \right) \left( \frac{d f_i}{d\theta_i} \right) \left( \frac{d f_j}{d\theta_j} \right) \\
& + w_i^3 \left( -2W + 3 w_i \right) \left( \frac{d f_j}{d\theta_j} \right)^2
+ W^2 U_i w_i^3 \frac{d^2 f_j}{d\theta_j^2} \Bigg\}
\end{aligned}
\tag{25}
$$

$$
\begin{aligned}
\frac{\partial^6 [Q^2]}{\partial\theta_i^2\,\partial\theta_j^2\,\partial\theta_k^2} ={} & - \frac{16\, w_j w_k}{W^4} \left( W w_j + W w_k - 9 w_j w_k \right) \left( \frac{d f_i}{d\theta_i} \right)^2 \\
& - \frac{16\, w_i w_k}{W^4} \left( W w_i + W w_k - 9 w_i w_k \right) \left( \frac{d f_j}{d\theta_j} \right)^2 \\
& - \frac{16\, w_i w_j}{W^4} \left( W w_i + W w_j - 9 w_i w_j \right) \left( \frac{d f_k}{d\theta_k} \right)^2 \\
& + \frac{8\, w_j w_k}{W^3} \left( W w_j + W w_k - 6 w_j w_k \right) \frac{d^2 f_i}{d\theta_i^2} \\
& + \frac{8\, w_i w_k}{W^3} \left( W w_i + W w_k - 6 w_i w_k \right) \frac{d^2 f_j}{d\theta_j^2} \\
& + \frac{8\, w_i w_j}{W^3} \left( W w_i + W w_j - 6 w_i w_j \right) \frac{d^2 f_k}{d\theta_k^2} \\
& - \frac{32\, w_k}{W^2} \left[ U_i U_j (W - 6 w_k) + \frac{w_i w_j}{W^2} (W - 12 w_k) + 3 w_k \right] \left( \frac{d f_i}{d\theta_i} \right) \left( \frac{d f_j}{d\theta_j} \right) \\
& - \frac{32\, w_j}{W^2} \left[ U_i U_k (W - 6 w_j) + \frac{w_i w_k}{W^2} (W - 12 w_j) + 3 w_j \right] \left( \frac{d f_i}{d\theta_i} \right) \left( \frac{d f_k}{d\theta_k} \right) \\
& - \frac{32\, w_i}{W^2} \left[ U_j U_k (W - 6 w_i) + \frac{w_j w_k}{W^2} (W - 12 w_i) + 3 w_i \right] \left( \frac{d f_j}{d\theta_j} \right) \left( \frac{d f_k}{d\theta_k} \right)
\end{aligned}
\tag{26}
$$

References

Bain, L. (1969). Moments of a noncentral t and noncentral F-distribution. The American Statistician 23, 33–34.

Biggerstaff, B. and Jackson, D. (2008). The exact distribution of Cochran's heterogeneity statistic in one-way random effects meta-analysis. Statistics in Medicine 27, 6093–6110.

Biggerstaff, B. and Tweedie, R. (1997). Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine 16, 753–768.

Cochran, W. (1937). Problems arising in the analysis of a series of similar experiments. JRSS 4, 102–118.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Hedges, L. and Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.

Hróbjartsson, A. and Gøtzsche, P. (2004). Placebo interventions for all clinical conditions. Cochrane Database of Systematic Reviews Art. No.: CD003974. DOI: 10.1002/14651858.CD003974.pub2.

Jackson, D. (2006). The power of the standard test for the presence of heterogeneity in meta-analysis. Statistics in Medicine 25, 2688–2699.

James, G. (1951). The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika 38, 324–329.

Johnson, N., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions, Vol. 2. New York: John Wiley & Sons.

Kulinskaya, E. and Dollinger, M. (2007). Robust weighted one-way ANOVA: Improved approximation and efficiency. Journal of Statistical Planning and Inference 137, 462–472.

Kulinskaya, E., Dollinger, M., Knight, E., and Gao, H. (2004). A Welch-type test for homogeneity of contrasts under heteroscedasticity with application to meta-analysis. Statistics in Medicine 23, 3655–3670.

Kulinskaya, E., Staudte, R., and Gao, H. (2003). Power approximations in testing for unequal means in a one-way ANOVA weighted for unequal variances. Communications in Statistics: Theory and Methods 32, 2353–2371.

Normand, S.-L. (1999). Meta-analysis: Formulating, evaluating, combining, and reporting. Statistics in Medicine 18, 321–359.

Petitti, D. (2001). Approaches to heterogeneity in meta-analysis. Statistics in Medicine 20, 3625–3633.

R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-00-3.

Sánchez-Meca, J. and Marín-Martínez, F. (1997). Homogeneity tests in meta-analysis: A Monte Carlo comparison of statistical power and Type I error. Quality & Quantity 31, 385–399.

Sánchez-Meca, J. and Marín-Martínez, F. (2000). Testing the significance of a common risk difference in meta-analysis. Computational Statistics & Data Analysis 33, 299–313.

Tuunainen, A., Kripke, D., and Endo, T. (2004). Light therapy for non-seasonal depression. Cochrane Database of Systematic Reviews Art. No.: CD004050. DOI: 10.1002/14651858.CD004050.pub2.

Viechtbauer, W. (2007). Hypothesis tests for population heterogeneity in meta-analysis. British Journal of Mathematical and Statistical Psychology 60, 29–60.

Welch, B. (1951). On the comparison of several mean values: an alternative approach. Biometrika 38, 330–336.

Yuan, K.-H. and Bushman, B.
(2002). Combining standardized mean differences using the method of maximum likelihood. Psychometrika 67, 589–608.

| I | N | $\delta$ | $\chi^2_{.05}$ | $\Gamma_{s,.05}$ | $\chi^2_{E(Q),.05}$ | $\chi^2_{.1}$ | $\Gamma_{s,.1}$ | $\chi^2_{E(Q),.1}$ | $E_f(Q)$ | $\bar{Q}$ | $E_f(Q^2)$ | $\overline{Q^2}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 10 | 0 | 0.014 | NA | 0.052 | 0.041 | NA | 0.119 | 2.8 | 3.2 | -12.2 | 14.9 |
| 5 | 10 | 0.2 | 0.014 | NA | 0.051 | 0.039 | NA | 0.119 | 2.7 | 3.2 | -14.3 | 14.8 |
| 5 | 10 | 0.5 | 0.013 | NA | 0.052 | 0.037 | NA | 0.120 | 2.6 | 3.1 | -22.7 | 14.5 |
| 5 | 10 | 1 | 0.012 | NA | 0.055 | 0.035 | NA | 0.127 | 2.5 | 3.1 | -24.4 | 14.1 |
| 5 | 10 | 2 | 0.008 | NA | 0.039 | 0.027 | NA | 0.097 | 2.8 | 3.0 | 134.3 | 12.8 |
| 5 | 14 | 0 | 0.026 | NA | 0.044 | 0.061 | NA | 0.097 | 3.4 | 3.5 | 12.7 | 18.1 |
| 5 | 14 | 0.2 | 0.026 | NA | 0.044 | 0.062 | NA | 0.097 | 3.4 | 3.5 | 12.4 | 18.0 |
| 5 | 14 | 0.5 | 0.025 | NA | 0.045 | 0.061 | NA | 0.098 | 3.3 | 3.5 | 10.9 | 17.9 |
| 5 | 14 | 1 | 0.023 | NA | 0.043 | 0.056 | NA | 0.097 | 3.2 | 3.4 | 8.9 | 17.3 |
| 5 | 14 | 2 | 0.019 | NA | 0.040 | 0.049 | NA | 0.093 | 3.2 | 3.3 | 20.0 | 16.3 |
| 5 | 16 | 0 | 0.030 | 0.105 | 0.045 | 0.068 | 0.161 | 0.096 | 3.5 | 3.6 | 15.7 | 19.1 |
| 5 | 16 | 0.2 | 0.030 | 0.110 | 0.044 | 0.068 | 0.164 | 0.097 | 3.5 | 3.6 | 15.4 | 19.0 |
| 5 | 16 | 0.5 | 0.029 | 0.124 | 0.044 | 0.066 | 0.178 | 0.095 | 3.5 | 3.6 | 14.5 | 18.8 |
| 5 | 16 | 1 | 0.028 | 0.160 | 0.044 | 0.063 | 0.212 | 0.097 | 3.4 | 3.5 | 13.0 | 18.3 |
| 5 | 16 | 2 | 0.022 | 0.061 | 0.040 | 0.055 | 0.112 | 0.091 | 3.3 | 3.4 | 17.8 | 17.2 |
| 5 | 20 | 0 | 0.034 | 0.069 | 0.045 | 0.075 | 0.123 | 0.096 | 3.7 | 3.7 | 18.5 | 20.3 |
| 5 | 20 | 0.2 | 0.034 | 0.070 | 0.045 | 0.075 | 0.123 | 0.096 | 3.6 | 3.7 | 18.4 | 20.2 |
| 5 | 20 | 0.5 | 0.034 | 0.073 | 0.046 | 0.074 | 0.127 | 0.095 | 3.6 | 3.7 | 17.9 | 20.1 |
| 5 | 20 | 1 | 0.032 | 0.081 | 0.045 | 0.072 | 0.134 | 0.095 | 3.6 | 3.6 | 16.9 | 19.7 |
| 5 | 20 | 2 | 0.028 | 0.058 | 0.042 | 0.064 | 0.110 | 0.092 | 3.5 | 3.6 | 18.1 | 18.7 |
| 5 | 30 | 0 | 0.040 | 0.055 | 0.047 | 0.084 | 0.106 | 0.096 | 3.8 | 3.8 | 21.1 | 21.6 |
| 5 | 30 | 0.2 | 0.039 | 0.054 | 0.046 | 0.084 | 0.106 | 0.096 | 3.8 | 3.8 | 21.0 | 21.5 |
| 5 | 30 | 0.5 | 0.040 | 0.056 | 0.047 | 0.084 | 0.108 | 0.096 | 3.8 | 3.8 | 20.8 | 21.5 |
| 5 | 30 | 1 | 0.038 | 0.058 | 0.046 | 0.082 | 0.109 | 0.096 | 3.8 | 3.8 | 20.3 | 21.2 |
| 5 | 30 | 2 | 0.036 | 0.054 | 0.045 | 0.078 | 0.105 | 0.095 | 3.7 | 3.7 | 20.1 | 20.7 |
| 5 | 40 | 0 | 0.042 | 0.051 | 0.046 | 0.088 | 0.102 | 0.096 | 3.9 | 3.8 | 22.0 | 22.0 |
| 5 | 40 | 0.2 | 0.044 | 0.053 | 0.048 | 0.090 | 0.104 | 0.098 | 3.9 | 3.9 | 22.0 | 22.4 |
| 5 | 40 | 0.5 | 0.043 | 0.054 | 0.048 | 0.088 | 0.103 | 0.097 | 3.8 | 3.9 | 21.9 | 22.3 |
| 5 | 40 | 1 | 0.041 | 0.054 | 0.047 | 0.087 | 0.104 | 0.096 | 3.8 | 3.8 | 21.5 | 21.9 |
| 5 | 40 | 2 | 0.039 | 0.052 | 0.046 | 0.082 | 0.102 | 0.094 | 3.8 | 3.8 | 21.2 | 21.4 |
| 5 | 100 | 0 | 0.048 | 0.051 | 0.050 | 0.097 | 0.101 | 0.100 | 3.9 | 4.0 | 23.3 | 23.4 |
| 5 | 100 | 0.2 | 0.048 | 0.051 | 0.049 | 0.096 | 0.100 | 0.099 | 3.9 | 3.9 | 23.3 | 23.4 |
| 5 | 100 | 0.5 | 0.046 | 0.050 | 0.048 | 0.095 | 0.100 | 0.098 | 3.9 | 3.9 | 23.3 | 23.3 |
| 5 | 100 | 1 | 0.046 | 0.050 | 0.048 | 0.095 | 0.100 | 0.098 | 3.9 | 3.9 | 23.2 | 23.1 |
| 5 | 100 | 2 | 0.045 | 0.050 | 0.048 | 0.093 | 0.100 | 0.098 | 3.9 | 3.9 | 23.0 | 23.0 |
| 5 | 200 | 0 | 0.049 | 0.051 | 0.050 | 0.098 | 0.101 | 0.100 | 4.0 | 4.0 | 23.7 | 23.8 |
| 5 | 200 | 0.2 | 0.049 | 0.050 | 0.050 | 0.097 | 0.100 | 0.099 | 4.0 | 4.0 | 23.7 | 23.6 |
| 5 | 200 | 0.5 | 0.048 | 0.049 | 0.049 | 0.097 | 0.099 | 0.099 | 4.0 | 4.0 | 23.7 | 23.6 |
| 5 | 200 | 1 | 0.049 | 0.050 | 0.050 | 0.097 | 0.100 | 0.099 | 4.0 | 4.0 | 23.6 | 23.7 |
| 5 | 200 | 2 | 0.048 | 0.050 | 0.049 | 0.098 | 0.102 | 0.101 | 4.0 | 4.0 | 23.5 | 23.7 |
| 10 | 10 | 0 | 0.007 | NA | 0.068 | 0.022 | NA | 0.149 | 5.8 | 7.1 | -29.7 | 60.2 |
| 10 | 10 | 0.2 | 0.007 | NA | 0.070 | 0.022 | NA | 0.152 | 5.8 | 7.1 | -35.7 | 60.4 |
| 10 | 10 | 0.5 | 0.007 | NA | 0.072 | 0.022 | NA | 0.157 | 5.6 | 7.0 | -58.3 | 59.4 |
| 10 | 10 | 1 | 0.006 | NA | 0.073 | 0.019 | NA | 0.162 | 5.4 | 6.9 | -46.2 | 57.2 |
| 10 | 10 | 2 | 0.004 | NA | 0.034 | 0.013 | NA | 0.085 | 6.9 | 6.6 | 523.3 | 52.9 |
| 10 | 14 | 0 | 0.019 | NA | 0.047 | 0.047 | NA | 0.101 | 7.4 | 7.8 | 54.2 | 73.6 |
| 10 | 14 | 0.2 | 0.019 | NA | 0.046 | 0.046 | NA | 0.102 | 7.4 | 7.8 | 53.2 | 73.7 |
| 10 | 14 | 0.5 | 0.018 | NA | 0.047 | 0.045 | NA | 0.103 | 7.3 | 7.8 | 48.9 | 72.9 |
| 10 | 14 | 1 | 0.016 | NA | 0.047 | 0.041 | NA | 0.104 | 7.2 | 7.7 | 44.1 | 71.2 |
| 10 | 14 | 2 | 0.011 | NA | 0.037 | 0.032 | NA | 0.085 | 7.3 | 7.4 | 88.7 | 66.5 |
| 10 | 16 | 0 | 0.022 | 0.156 | 0.044 | 0.053 | 0.214 | 0.096 | 7.8 | 8.0 | 65.1 | 77.3 |
| 10 | 16 | 0.2 | 0.022 | 0.167 | 0.045 | 0.053 | 0.224 | 0.099 | 7.7 | 8.0 | 64.5 | 77.5 |
| 10 | 16 | 0.5 | 0.022 | 0.213 | 0.045 | 0.053 | 0.263 | 0.098 | 7.7 | 7.9 | 61.7 | 76.6 |
| 10 | 16 | 1 | 0.020 | 0.310 | 0.044 | 0.048 | 0.344 | 0.097 | 7.5 | 7.9 | 57.6 | 74.8 |
| 10 | 16 | 2 | 0.016 | 0.043 | 0.039 | 0.041 | 0.088 | 0.089 | 7.5 | 7.7 | 78.1 | 71.0 |
| 10 | 20 | 0 | 0.028 | 0.082 | 0.046 | 0.064 | 0.137 | 0.095 | 8.1 | 8.2 | 76.3 | 82.6 |
| 10 | 20 | 0.2 | 0.028 | 0.083 | 0.046 | 0.064 | 0.139 | 0.095 | 8.1 | 8.2 | 75.9 | 82.3 |
| 10 | 20 | 0.5 | 0.028 | 0.091 | 0.046 | 0.064 | 0.150 | 0.098 | 8.1 | 8.2 | 74.3 | 82.1 |
| 10 | 20 | 1 | 0.026 | 0.101 | 0.045 | 0.060 | 0.158 | 0.095 | 8.0 | 8.1 | 71.4 | 80.2 |
| 10 | 20 | 2 | 0.022 | 0.052 | 0.041 | 0.052 | 0.104 | 0.090 | 7.9 | 8.0 | 77.0 | 77.3 |
| 10 | 30 | 0 | 0.036 | 0.058 | 0.046 | 0.076 | 0.109 | 0.095 | 8.5 | 8.5 | 86.5 | 88.6 |
| 10 | 30 | 0.2 | 0.036 | 0.058 | 0.047 | 0.077 | 0.111 | 0.096 | 8.5 | 8.5 | 86.4 | 88.3 |
| 10 | 30 | 0.5 | 0.035 | 0.059 | 0.046 | 0.077 | 0.113 | 0.096 | 8.5 | 8.5 | 85.6 | 88.3 |
| 10 | 30 | 1 | 0.034 | 0.062 | 0.046 | 0.074 | 0.115 | 0.096 | 8.4 | 8.5 | 84.0 | 87.1 |
| 10 | 30 | 2 | 0.030 | 0.052 | 0.043 | 0.068 | 0.103 | 0.092 | 8.3 | 8.3 | 83.6 | 84.5 |
| 10 | 40 | 0 | 0.039 | 0.053 | 0.046 | 0.082 | 0.104 | 0.095 | 8.6 | 8.6 | 90.4 | 90.9 |
| 10 | 40 | 0.2 | 0.040 | 0.054 | 0.047 | 0.084 | 0.106 | 0.098 | 8.6 | 8.7 | 90.3 | 91.5 |
| 10 | 40 | 0.5 | 0.041 | 0.056 | 0.049 | 0.084 | 0.108 | 0.099 | 8.6 | 8.7 | 89.9 | 91.9 |
| 10 | 40 | 1 | 0.037 | 0.054 | 0.046 | 0.080 | 0.107 | 0.096 | 8.6 | 8.6 | 88.7 | 90.2 |
| 10 | 40 | 2 | 0.035 | 0.052 | 0.045 | 0.076 | 0.102 | 0.094 | 8.5 | 8.5 | 87.7 | 88.3 |
| 10 | 100 | 0 | 0.045 | 0.050 | 0.048 | 0.094 | 0.101 | 0.099 | 8.9 | 8.9 | 96.0 | 96.1 |
| 10 | 100 | 0.2 | 0.045 | 0.050 | 0.048 | 0.091 | 0.098 | 0.096 | 8.9 | 8.9 | 96.0 | 95.8 |
| 10 | 100 | 0.5 | 0.046 | 0.051 | 0.049 | 0.094 | 0.101 | 0.099 | 8.9 | 8.9 | 95.8 | 96.2 |
| 10 | 100 | 1 | 0.046 | 0.051 | 0.049 | 0.093 | 0.102 | 0.099 | 8.8 | 8.8 | 95.5 | 95.7 |
| 10 | 100 | 2 | 0.043 | 0.050 | 0.048 | 0.090 | 0.100 | 0.098 | 8.8 | 8.8 | 94.8 | 95.0 |
| 10 | 200 | 0 | 0.047 | 0.049 | 0.048 | 0.095 | 0.099 | 0.098 | 8.9 | 8.9 | 97.6 | 96.9 |
| 10 | 200 | 0.2 | 0.049 | 0.051 | 0.050 | 0.098 | 0.101 | 0.100 | 8.9 | 8.9 | 97.5 | 97.5 |
| 10 | 200 | 0.5 | 0.048 | 0.051 | 0.050 | 0.098 | 0.101 | 0.100 | 8.9 | 8.9 | 97.5 | 97.7 |
| 10 | 200 | 1 | 0.048 | 0.050 | 0.049 | 0.097 | 0.101 | 0.100 | 8.9 | 8.9 | 97.3 | 97.6 |
| 10 | 200 | 2 | 0.045 | 0.049 | 0.047 | 0.093 | 0.098 | 0.097 | 8.9 | 8.9 | 96.9 | 96.4 |
| 20 | 10 | 0 | 0.003 | NA | 0.103 | 0.011 | NA | 0.206 | 12.0 | 14.8 | -34.5 | 241.8 |
| 20 | 10 | 0.2 | 0.003 | NA | 0.102 | 0.011 | NA | 0.205 | 11.9 | 14.8 | -49.7 | 240.3 |
| 20 | 10 | 0.5 | 0.003 | NA | 0.109 | 0.010 | NA | 0.219 | 11.5 | 14.7 | -105.9 | 237.6 |
| 20 | 10 | 1 | 0.002 | NA | 0.116 | 0.008 | NA | 0.231 | 11.3 | 14.5 | -59.6 | 231.0 |
| 20 | 10 | 2 | 0.001 | NA | 0.031 | 0.005 | NA | 0.076 | 15.2 | 14.0 | 1481.0 | 214.1 |
| 20 | 14 | 0 | 0.013 | NA | 0.052 | 0.033 | NA | 0.110 | 15.5 | 16.4 | 229.0 | 294.8 |
| 20 | 14 | 0.2 | 0.012 | NA | 0.051 | 0.032 | NA | 0.111 | 15.5 | 16.3 | 226.1 | 294.1 |
| 20 | 14 | 0.5 | 0.012 | NA | 0.053 | 0.031 | NA | 0.112 | 15.3 | 16.3 | 213.7 | 292.4 |
| 20 | 14 | 1 | 0.010 | NA | 0.053 | 0.027 | NA | 0.115 | 15.0 | 16.1 | 200.3 | 285.7 |
| 20 | 14 | 2 | 0.006 | NA | 0.037 | 0.020 | NA | 0.087 | 15.5 | 15.7 | 328.6 | 269.9 |
| 20 | 16 | 0 | 0.016 | NA | 0.048 | 0.040 | NA | 0.103 | 16.2 | 16.8 | 267.9 | 311.0 |
| 20 | 16 | 0.2 | 0.017 | NA | 0.048 | 0.040 | NA | 0.103 | 16.2 | 16.8 | 266.0 | 310.6 |
| 20 | 16 | 0.5 | 0.015 | NA | 0.048 | 0.038 | NA | 0.105 | 16.1 | 16.7 | 257.5 | 307.4 |
| 20 | 16 | 1 | 0.014 | NA | 0.049 | 0.035 | NA | 0.106 | 15.8 | 16.6 | 245.1 | 302.6 |
| 20 | 16 | 2 | 0.010 | 0.037 | 0.040 | 0.027 | 0.081 | 0.089 | 15.9 | 16.2 | 304.3 | 287.1 |
| 20 | 20 | 0 | 0.022 | 0.101 | 0.046 | 0.052 | 0.162 | 0.098 | 17.0 | 17.3 | 308.9 | 330.7 |
| 20 | 20 | 0.2 | 0.022 | 0.102 | 0.046 | 0.051 | 0.162 | 0.097 | 17.0 | 17.3 | 307.8 | 329.2 |
| 20 | 20 | 0.5 | 0.021 | 0.115 | 0.046 | 0.050 | 0.176 | 0.098 | 16.9 | 17.3 | 302.8 | 328.4 |
| 20 | 20 | 1 | 0.020 | 0.135 | 0.047 | 0.048 | 0.196 | 0.099 | 16.7 | 17.1 | 293.2 | 323.5 |
| 20 | 20 | 2 | 0.016 | 0.053 | 0.042 | 0.040 | 0.107 | 0.091 | 16.6 | 16.8 | 308.9 | 312.0 |
| 20 | 30 | 0 | 0.031 | 0.062 | 0.046 | 0.069 | 0.115 | 0.097 | 17.9 | 17.9 | 348.4 | 355.0 |
| 20 | 30 | 0.2 | 0.032 | 0.063 | 0.047 | 0.069 | 0.116 | 0.097 | 17.9 | 17.9 | 347.9 | 355.9 |
| 20 | 30 | 0.5 | 0.030 | 0.063 | 0.046 | 0.066 | 0.116 | 0.095 | 17.8 | 17.9 | 345.5 | 353.3 |
| 20 | 30 | 1 | 0.030 | 0.068 | 0.048 | 0.066 | 0.124 | 0.098 | 17.7 | 17.8 | 339.8 | 351.4 |
| 20 | 30 | 2 | 0.025 | 0.054 | 0.043 | 0.057 | 0.106 | 0.091 | 17.5 | 17.6 | 337.4 | 340.8 |
| 20 | 40 | 0 | 0.036 | 0.056 | 0.047 | 0.077 | 0.108 | 0.098 | 18.2 | 18.2 | 363.8 | 367.3 |
| 20 | 40 | 0.2 | 0.036 | 0.056 | 0.048 | 0.077 | 0.106 | 0.096 | 18.2 | 18.2 | 363.5 | 367.3 |
| 20 | 40 | 0.5 | 0.035 | 0.056 | 0.047 | 0.076 | 0.108 | 0.096 | 18.2 | 18.2 | 361.9 | 364.8 |
| 20 | 40 | 1 | 0.034 | 0.058 | 0.048 | 0.074 | 0.111 | 0.098 | 18.1 | 18.2 | 357.9 | 364.0 |
| 20 | 40 | 2 | 0.031 | 0.053 | 0.045 | 0.068 | 0.104 | 0.094 | 17.9 | 18.0 | 353.7 | 356.3 |
| 20 | 100 | 0 | 0.045 | 0.051 | 0.049 | 0.092 | 0.103 | 0.100 | 18.7 | 18.7 | 386.5 | 388.1 |
| 20 | 100 | 0.2 | 0.044 | 0.050 | 0.048 | 0.091 | 0.102 | 0.099 | 18.7 | 18.7 | 386.4 | 387.0 |
| 20 | 100 | 0.5 | 0.044 | 0.051 | 0.048 | 0.090 | 0.101 | 0.098 | 18.7 | 18.7 | 386.0 | 385.6 |
| 20 | 100 | 1 | 0.044 | 0.051 | 0.049 | 0.089 | 0.101 | 0.097 | 18.7 | 18.7 | 384.5 | 384.8 |
| 20 | 100 | 2 | 0.042 | 0.051 | 0.048 | 0.088 | 0.101 | 0.098 | 18.6 | 18.6 | 381.9 | 382.5 |
| 20 | 200 | 0 | 0.048 | 0.051 | 0.050 | 0.096 | 0.101 | 0.100 | 18.9 | 18.9 | 393.0 | 394.3 |
| 20 | 200 | 0.2 | 0.047 | 0.050 | 0.049 | 0.097 | 0.102 | 0.101 | 18.9 | 18.9 | 393.0 | 393.6 |
| 20 | 200 | 0.5 | 0.046 | 0.050 | 0.049 | 0.096 | 0.101 | 0.100 | 18.9 | 18.9 | 392.7 | 393.0 |
| 20 | 200 | 1 | 0.047 | 0.051 | 0.050 | 0.095 | 0.100 | 0.099 | 18.8 | 18.9 | 392.1 | 392.8 |
| 20 | 200 | 2 | 0.046 | 0.050 | 0.049 | 0.094 | 0.100 | 0.099 | 18.8 | 18.8 | 390.6 | 391.6 |
| 50 | 20 | 0 | 0.014 | 0.169 | 0.049 | 0.034 | 0.231 | 0.103 | 43.8 | 44.6 | 1945.7 | 2064.6 |
| 50 | 20 | 0.2 | 0.014 | 0.173 | 0.050 | 0.034 | 0.235 | 0.102 | 43.7 | 44.5 | 1940.1 | 2059.9 |
| 50 | 20 | 0.5 | 0.013 | 0.210 | 0.051 | 0.034 | 0.269 | 0.107 | 43.5 | 44.5 | 1914.7 | 2057.5 |
| 50 | 20 | 1 | 0.011 | 0.285 | 0.050 | 0.030 | 0.333 | 0.107 | 43.0 | 44.1 | 1862.5 | 2022.4 |
| 50 | 20 | 2 | 0.008 | 0.082 | 0.042 | 0.022 | 0.140 | 0.091 | 42.9 | 43.3 | 1907.4 | 1950.6 |
| 50 | 30 | 0 | 0.023 | 0.070 | 0.046 | 0.053 | 0.127 | 0.096 | 46.0 | 46.2 | 2182.5 | 2218.9 |
| 50 | 30 | 0.2 | 0.024 | 0.073 | 0.048 | 0.055 | 0.130 | 0.099 | 45.9 | 46.3 | 2179.8 | 2226.4 |
| 50 | 30 | 0.5 | 0.023 | 0.076 | 0.048 | 0.054 | 0.133 | 0.099 | 45.8 | 46.1 | 2166.7 | 2215.6 |
| 50 | 30 | 1 | 0.022 | 0.084 | 0.049 | 0.051 | 0.141 | 0.100 | 45.5 | 46.0 | 2134.5 | 2197.4 |
| 50 | 30 | 2 | 0.017 | 0.065 | 0.043 | 0.042 | 0.119 | 0.092 | 45.2 | 45.4 | 2110.0 | 2137.9 |
| 50 | 40 | 0 | 0.030 | 0.061 | 0.048 | 0.066 | 0.115 | 0.099 | 46.9 | 47.0 | 2277.4 | 2298.3 |
| 50 | 40 | 0.2 | 0.030 | 0.060 | 0.047 | 0.066 | 0.114 | 0.098 | 46.9 | 47.0 | 2275.6 | 2295.2 |
| 50 | 40 | 0.5 | 0.029 | 0.060 | 0.046 | 0.063 | 0.114 | 0.097 | 46.8 | 46.9 | 2266.9 | 2290.6 |
| 50 | 40 | 1 | 0.029 | 0.065 | 0.049 | 0.063 | 0.120 | 0.099 | 46.6 | 46.8 | 2243.9 | 2277.0 |
| 50 | 40 | 2 | 0.024 | 0.058 | 0.045 | 0.055 | 0.110 | 0.093 | 46.2 | 46.4 | 2214.5 | 2234.3 |
| 50 | 100 | 0 | 0.041 | 0.051 | 0.049 | 0.085 | 0.101 | 0.098 | 48.2 | 48.2 | 2419.8 | 2420.3 |
| 50 | 100 | 0.2 | 0.041 | 0.051 | 0.049 | 0.086 | 0.103 | 0.100 | 48.2 | 48.2 | 2419.2 | 2416.6 |
| 50 | 100 | 0.5 | 0.041 | 0.052 | 0.049 | 0.086 | 0.103 | 0.100 | 48.2 | 48.3 | 2416.3 | 2426.5 |
| 50 | 100 | 1 | 0.040 | 0.051 | 0.048 | 0.083 | 0.101 | 0.097 | 48.1 | 48.1 | 2408.0 | 2407.3 |
| 50 | 100 | 2 | 0.038 | 0.051 | 0.047 | 0.081 | 0.102 | 0.098 | 48.0 | 48.0 | 2391.7 | 2395.3 |
| 50 | 200 | 0 | 0.046 | 0.050 | 0.049 | 0.093 | 0.100 | 0.099 | 48.6 | 48.6 | 2460.7 | 2461.6 |
| 50 | 200 | 0.2 | 0.046 | 0.051 | 0.050 | 0.094 | 0.102 | 0.100 | 48.6 | 48.6 | 2460.5 | 2459.7 |
| 50 | 200 | 0.5 | 0.046 | 0.051 | 0.050 | 0.093 | 0.100 | 0.099 | 48.6 | 48.6 | 2459.1 | 2459.3 |
| 50 | 200 | 1 | 0.046 | 0.051 | 0.050 | 0.092 | 0.101 | 0.099 | 48.6 | 48.6 | 2455.1 | 2456.8 |
| 50 | 200 | 2 | 0.044 | 0.051 | 0.049 | 0.091 | 0.101 | 0.099 | 48.5 | 48.5 | 2446.5 | 2450.8 |

Table 6: Type I error of the standard Q test and the improved Q test for homogeneity (gamma and $\chi^2_{E(Q)}$ approximations) under the null and moments of the distribution of Q. Sample sizes are equal and balanced. The column headings are defined in Section 5.1. Here $\hat{\delta} = \sum w_i \delta_i / W$.

| I | N | $\delta$ | $\chi^2_{.05}$ | $\Gamma_{th,.05}$ | $\Gamma_{s,.05}$ | $\chi^2_{.1}$ | $\Gamma_{th,.1}$ | $\Gamma_{s,.1}$ | $E_f(Q)$ | $\bar{Q}$ | $E_f(Q^2)$ | $\overline{Q^2}$ | $\mathrm{Var}_f(Q)$ | $s^2(Q)$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 20 | 0.0 | 0.035 | 0.068 | 0.070 | 0.077 | 0.120 | 0.122 | 3.7 | 3.7 | 18.5 | 20.1 | 5.2 | 6.6 |
| 5 | 20 | 0.2 | 0.034 | 0.068 | 0.069 | 0.074 | 0.120 | 0.122 | 3.6 | 3.7 | 18.4 | 20.0 | 5.1 | 6.6 |
| 5 | 20 | 0.5 | 0.034 | 0.074 | 0.075 | 0.075 | 0.128 | 0.129 | 3.6 | 3.7 | 17.9 | 20.1 | 4.8 | 6.6 |
| 5 | 20 | 1.0 | 0.032 | 0.083 | 0.082 | 0.071 | 0.137 | 0.136 | 3.6 | 3.6 | 16.9 | 19.6 | 4.2 | 6.4 |
| 5 | 20 | 2.0 | 0.028 | 0.054 | 0.050 | 0.066 | 0.108 | 0.104 | 3.5 | 3.6 | 18.1 | 18.9 | 5.8 | 6.1 |
| 5 | 30 | 0.0 | 0.041 | 0.055 | 0.056 | 0.086 | 0.107 | 0.107 | 3.8 | 3.8 | 21.1 | 21.8 | 6.7 | 7.2 |
| 5 | 30 | 0.2 | 0.041 | 0.056 | 0.056 | 0.086 | 0.107 | 0.107 | 3.8 | 3.8 | 21.0 | 21.7 | 6.6 | 7.3 |
| 5 | 30 | 0.5 | 0.039 | 0.056 | 0.056 | 0.084 | 0.107 | 0.107 | 3.8 | 3.8 | 20.8 | 21.5 | 6.5 | 7.1 |
| 5 | 30 | 1.0 | 0.039 | 0.058 | 0.059 | 0.083 | 0.111 | 0.111 | 3.8 | 3.8 | 20.3 | 21.3 | 6.2 | 7.1 |
| 5 | 30 | 2.0 | 0.036 | 0.054 | 0.053 | 0.078 | 0.106 | 0.106 | 3.7 | 3.7 | 20.1 | 20.7 | 6.4 | 6.8 |
| 5 | 40 | 0.0 | 0.043 | 0.053 | 0.053 | 0.089 | 0.103 | 0.104 | 3.9 | 3.9 | 22.0 | 22.3 | 7.1 | 7.4 |
| 5 | 40 | 0.2 | 0.043 | 0.053 | 0.053 | 0.089 | 0.103 | 0.103 | 3.9 | 3.9 | 22.0 | 22.4 | 7.1 | 7.4 |
| 5 | 40 | 0.5 | 0.043 | 0.054 | 0.054 | 0.089 | 0.103 | 0.103 | 3.9 | 3.9 | 21.9 | 22.3 | 7.0 | 7.4 |
| 5 | 40 | 1.0 | 0.042 | 0.054 | 0.054 | 0.089 | 0.106 | 0.105 | 3.8 | 3.8 | 21.5 | 22.1 | 6.9 | 7.3 |
| 5 | 40 | 2.0 | 0.039 | 0.052 | 0.049 | 0.084 | 0.104 | 0.102 | 3.8 | 3.8 | 21.2 | 21.5 | 6.9 | 7.0 |
| 5 | 100 | 0.0 | 0.047 | 0.050 | 0.050 | 0.095 | 0.100 | 0.100 | 4.0 | 4.0 | 23.3 | 23.3 | 7.7 | 7.8 |
| 5 | 100 | 0.2 | 0.047 | 0.050 | 0.050 | 0.096 | 0.100 | 0.100 | 4.0 | 4.0 | 23.3 | 23.4 | 7.7 | 7.8 |
| 5 | 100 | 0.5 | 0.048 | 0.052 | 0.052 | 0.097 | 0.102 | 0.102 | 4.0 | 4.0 | 23.3 | 23.5 | 7.7 | 7.9 |
| 5 | 100 | 1.0 | 0.047 | 0.050 | 0.050 | 0.096 | 0.102 | 0.102 | 3.9 | 4.0 | 23.2 | 23.4 | 7.7 | 7.8 |
| 5 | 100 | 2.0 | 0.045 | 0.049 | 0.049 | 0.093 | 0.100 | 0.100 | 3.9 | 3.9 | 23.0 | 23.0 | 7.6 | 7.6 |
| 5 | 200 | 0.0 | 0.048 | 0.050 | 0.050 | 0.097 | 0.099 | 0.099 | 4.0 | 4.0 | 23.7 | 23.5 | 7.9 | 7.8 |
| 5 | 200 | 0.2 | 0.049 | 0.050 | 0.050 | 0.098 | 0.100 | 0.100 | 4.0 | 4.0 | 23.7 | 23.8 | 7.9 | 7.9 |
| 5 | 200 | 0.5 | 0.049 | 0.050 | 0.050 | 0.099 | 0.101 | 0.101 | 4.0 | 4.0 | 23.7 | 23.7 | 7.9 | 7.9 |
| 5 | 200 | 1.0 | 0.049 | 0.050 | 0.050 | 0.098 | 0.100 | 0.100 | 4.0 | 4.0 | 23.6 | 23.7 | 7.9 | 7.9 |
| 5 | 200 | 2.0 | 0.048 | 0.050 | 0.050 | 0.097 | 0.101 | 0.101 | 4.0 | 4.0 | 23.5 | 23.4 | 7.8 | 7.8 |
| 10 | 20 | 0.0 | 0.030 | 0.082 | 0.083 | 0.065 | 0.137 | 0.139 | 8.1 | 8.2 | 76.3 | 82.7 | 10.3 | 14.9 |
| 10 | 20 | 0.2 | 0.027 | 0.081 | 0.082 | 0.063 | 0.138 | 0.139 | 8.1 | 8.2 | 75.9 | 82.1 | 10.1 | 14.7 |
| 10 | 20 | 0.5 | 0.028 | 0.090 | 0.091 | 0.061 | 0.147 | 0.148 | 8.1 | 8.2 | 74.3 | 81.7 | 9.2 | 14.5 |
| 10 | 20 | 1.0 | 0.025 | 0.104 | 0.102 | 0.060 | 0.163 | 0.161 | 8.0 | 8.1 | 71.4 | 80.4 | 8.0 | 14.3 |
| 10 | 20 | 2.0 | 0.022 | 0.043 | 0.039 | 0.051 | 0.093 | 0.089 | 7.9 | 8.0 | 77.0 | 77.0 | 15.1 | 13.5 |
| 10 | 30 | 0.0 | 0.037 | 0.059 | 0.059 | 0.078 | 0.111 | 0.111 | 8.5 | 8.5 | 86.5 | 88.8 | 14.4 | 16.1 |
| 10 | 30 | 0.2 | 0.036 | 0.059 | 0.059 | 0.078 | 0.111 | 0.111 | 8.5 | 8.5 | 86.4 | 88.6 | 14.4 | 16.0 |
| 10 | 30 | 0.5 | 0.035 | 0.059 | 0.060 | 0.077 | 0.113 | 0.113 | 8.5 | 8.5 | 85.6 | 88.1 | 14.1 | 15.9 |
| 10 | 30 | 1.0 | 0.034 | 0.062 | 0.062 | 0.073 | 0.114 | 0.114 | 8.4 | 8.4 | 84.0 | 86.8 | 13.5 | 15.6 |
| 10 | 30 | 2.0 | 0.031 | 0.053 | 0.053 | 0.070 | 0.106 | 0.105 | 8.3 | 8.4 | 83.6 | 85.0 | 14.7 | 15.2 |
| 10 | 40 | 0.0 | 0.040 | 0.054 | 0.054 | 0.084 | 0.105 | 0.106 | 8.6 | 8.7 | 90.4 | 91.6 | 15.7 | 16.6 |
| 10 | 40 | 0.2 | 0.040 | 0.054 | 0.055 | 0.084 | 0.105 | 0.106 | 8.6 | 8.7 | 90.3 | 91.5 | 15.7 | 16.7 |
| 10 | 40 | 0.5 | 0.040 | 0.055 | 0.055 | 0.082 | 0.104 | 0.105 | 8.6 | 8.6 | 89.9 | 91.1 | 15.5 | 16.5 |
| 10 | 40 | 1.0 | 0.038 | 0.055 | 0.054 | 0.080 | 0.107 | 0.105 | 8.6 | 8.6 | 88.7 | 90.3 | 15.2 | 16.2 |
| 10 | 40 | 2.0 | 0.034 | 0.050 | 0.048 | 0.075 | 0.101 | 0.099 | 8.5 | 8.5 | 87.7 | 88.1 | 15.5 | 15.7 |
| 10 | 100 | 0.0 | 0.047 | 0.051 | 0.051 | 0.095 | 0.103 | 0.103 | 8.9 | 8.9 | 96.0 | 96.5 | 17.3 | 17.7 |
| 10 | 100 | 0.2 | 0.047 | 0.051 | 0.051 | 0.094 | 0.101 | 0.101 | 8.9 | 8.9 | 96.0 | 96.4 | 17.3 | 17.5 |
| 10 | 100 | 0.5 | 0.046 | 0.051 | 0.051 | 0.093 | 0.100 | 0.100 | 8.9 | 8.9 | 95.8 | 96.0 | 17.3 | 17.4 |
| 10 | 100 | 1.0 | 0.044 | 0.050 | 0.050 | 0.091 | 0.099 | 0.099 | 8.9 | 8.8 | 95.5 | 95.1 | 17.2 | 17.2 |
| 10 | 100 | 2.0 | 0.044 | 0.050 | 0.050 | 0.090 | 0.099 | 0.099 | 8.8 | 8.8 | 94.8 | 94.5 | 17.1 | 17.1 |
| 10 | 200 | 0.0 | 0.047 | 0.050 | 0.050 | 0.097 | 0.101 | 0.101 | 8.9 | 8.9 | 97.6 | 97.4 | 17.7 | 17.6 |
| 10 | 200 | 0.2 | 0.049 | 0.051 | 0.051 | 0.098 | 0.101 | 0.101 | 8.9 | 9.0 | 97.6 | 98.0 | 17.7 | 17.8 |
| 10 | 200 | 0.5 | 0.048 | 0.050 | 0.050 | 0.097 | 0.100 | 0.100 | 8.9 | 8.9 | 97.5 | 97.6 | 17.7 | 17.7 |
| 10 | 200 | 1.0 | 0.047 | 0.050 | 0.050 | 0.097 | 0.101 | 0.101 | 8.9 | 9.0 | 97.3 | 97.8 | 17.6 | 17.7 |
| 10 | 200 | 2.0 | 0.046 | 0.049 | 0.049 | 0.094 | 0.099 | 0.099 | 8.9 | 8.9 | 96.9 | 96.9 | 17.6 | 17.4 |
| 20 | 20 | 0.0 | 0.022 | 0.100 | 0.101 | 0.053 | 0.160 | 0.161 | 17.0 | 17.3 | 308.9 | 330.5 | 18.6 | 30.9 |
| 20 | 20 | 0.2 | 0.022 | 0.102 | 0.103 | 0.052 | 0.161 | 0.162 | 17.0 | 17.3 | 307.8 | 329.0 | 18.2 | 31.0 |
| 20 | 20 | 0.5 | 0.022 | 0.117 | 0.118 | 0.050 | 0.179 | 0.179 | 16.9 | 17.3 | 302.8 | 328.6 | 16.0 | 30.8 |
| 20 | 20 | 1.0 | 0.019 | 0.138 | 0.136 | 0.047 | 0.201 | 0.199 | 16.7 | 17.1 | 293.2 | 323.1 | 13.2 | 30.0 |
| 20 | 20 | 2.0 | 0.015 | 0.042 | 0.038 | 0.039 | 0.092 | 0.087 | 16.6 | 16.8 | 308.9 | 310.7 | 32.5 | 28.2 |
| 20 | 30 | 0.0 | 0.032 | 0.063 | 0.063 | 0.070 | 0.117 | 0.117 | 17.9 | 18.0 | 348.4 | 357.5 | 29.3 | 33.8 |
| 20 | 30 | 0.2 | 0.032 | 0.063 | 0.063 | 0.069 | 0.117 | 0.118 | 17.9 | 18.0 | 347.9 | 356.4 | 29.2 | 33.8 |
| 20 | 30 | 0.5 | 0.031 | 0.063 | 0.064 | 0.067 | 0.119 | 0.119 | 17.8 | 17.9 | 345.5 | 354.2 | 28.4 | 33.4 |
| 20 | 30 | 1.0 | 0.029 | 0.066 | 0.066 | 0.064 | 0.121 | 0.121 | 17.7 | 17.8 | 339.8 | 349.0 | 27.1 | 32.7 |
| 20 | 30 | 2.0 | 0.025 | 0.053 | 0.052 | 0.058 | 0.105 | 0.104 | 17.5 | 17.6 | 337.4 | 340.8 | 30.7 | 31.6 |
| 20 | 40 | 0.0 | 0.035 | 0.054 | 0.055 | 0.076 | 0.107 | 0.108 | 18.2 | 18.2 | 363.8 | 366.8 | 32.6 | 34.7 |
| 20 | 40 | 0.2 | 0.035 | 0.054 | 0.055 | 0.076 | 0.107 | 0.107 | 18.2 | 18.2 | 363.5 | 366.8 | 32.5 | 34.5 |
| 20 | 40 | 0.5 | 0.035 | 0.056 | 0.056 | 0.076 | 0.109 | 0.109 | 18.2 | 18.2 | 361.9 | 365.7 | 32.1 | 34.7 |
| 20 | 40 | 1.0 | 0.035 | 0.058 | 0.056 | 0.073 | 0.110 | 0.109 | 18.1 | 18.2 | 357.9 | 364.0 | 31.3 | 34.3 |
| 20 | 40 | 2.0 | 0.030 | 0.052 | 0.050 | 0.068 | 0.102 | 0.100 | 17.9 | 17.9 | 353.7 | 355.1 | 32.4 | 33.1 |
| 20 | 100 | 0.0 | 0.044 | 0.051 | 0.051 | 0.091 | 0.101 | 0.101 | 18.7 | 18.7 | 386.5 | 386.7 | 36.4 | 36.9 |
| 20 | 100 | 0.2 | 0.045 | 0.052 | 0.052 | 0.092 | 0.102 | 0.102 | 18.7 | 18.7 | 386.4 | 387.5 | 36.4 | 37.0 |
| 20 | 100 | 0.5 | 0.044 | 0.051 | 0.051 | 0.091 | 0.102 | 0.102 | 18.7 | 18.7 | 386.0 | 386.7 | 36.3 | 36.7 |
| 20 | 100 | 1.0 | 0.044 | 0.051 | 0.051 | 0.089 | 0.101 | 0.101 | 18.7 | 18.7 | 384.5 | 385.2 | 36.1 | 36.6 |
| 20 | 100 | 2.0 | 0.042 | 0.049 | 0.049 | 0.086 | 0.100 | 0.100 | 18.6 | 18.6 | 381.9 | 382.4 | 36.0 | 36.0 |
| 20 | 200 | 0.0 | 0.047 | 0.050 | 0.050 | 0.095 | 0.100 | 0.100 | 18.9 | 18.8 | 393.0 | 391.9 | 37.3 | 37.5 |
| 20 | 200 | 0.2 | 0.046 | 0.049 | 0.049 | 0.094 | 0.099 | 0.099 | 18.9 | 18.9 | 393.0 | 392.9 | 37.3 | 37.1 |
| 20 | 200 | 0.5 | 0.047 | 0.050 | 0.050 | 0.096 | 0.101 | 0.101 | 18.9 | 18.9 | 392.7 | 394.7 | 37.3 | 37.6 |
| 20 | 200 | 1.0 | 0.047 | 0.051 | 0.051 | 0.094 | 0.099 | 0.099 | 18.8 | 18.8 | 392.1 | 391.0 | 37.2 | 37.3 |
| 20 | 200 | 2.0 | 0.045 | 0.049 | 0.049 | 0.092 | 0.099 | 0.099 | 18.8 | 18.8 | 390.6 | 389.9 | 37.1 | 36.7 |
| 50 | 20 | 0 | 0.014 | 0.169 | 0.169 | 0.034 | 0.230 | 0.231 | 43.8 | 44.6 | 1945.7 | 2064.8 | 29.2 | 79.4 |
| 50 | 20 | 0.2 | 0.014 | 0.176 | 0.177 | 0.035 | 0.239 | 0.240 | 43.7 | 44.6 | 1940.1 | 2065.0 | 27.6 | 80.0 |
| 50 | 20 | 0.5 | 0.014 | 0.214 | 0.216 | 0.033 | 0.274 | 0.275 | 43.5 | 44.4 | 1914.7 | 2054.5 | 20.6 | 79.0 |
| 50 | 20 | 1 | 0.012 | 0.290 | 0.288 | 0.030 | 0.338 | 0.336 | 43.0 | 44.1 | 1862.5 | 2019.2 | 10.5 | 76.8 |
| 50 | 20 | 2 | 0.008 | 0.061 | 0.057 | 0.022 | 0.116 | 0.112 | 42.9 | 43.3 | 1907.4 | 1949.7 | 67.4 | 72.8 |
| 50 | 30 | 0 | 0.025 | 0.073 | 0.073 | 0.055 | 0.130 | 0.130 | 46.0 | 46.3 | 2182.5 | 2226.6 | 69.2 | 87.0 |
| 50 | 30 | 0.2 | 0.024 | 0.073 | 0.073 | 0.054 | 0.129 | 0.129 | 45.9 | 46.2 | 2179.8 | 2224.8 | 68.7 | 87.0 |
| 50 | 30 | 0.5 | 0.023 | 0.076 | 0.076 | 0.052 | 0.133 | 0.133 | 45.8 | 46.1 | 2166.7 | 2213.2 | 66.3 | 85.9 |
| 50 | 30 | 1 | 0.022 | 0.083 | 0.083 | 0.050 | 0.142 | 0.142 | 45.5 | 45.9 | 2134.5 | 2195.9 | 61.8 | 84.7 |
| 50 | 30 | 2 | 0.017 | 0.063 | 0.062 | 0.042 | 0.116 | 0.115 | 45.2 | 45.3 | 2110.0 | 2136.7 | 71.4 | 80.6 |
| 50 | 40 | 0.0 | 0.030 | 0.060 | 0.060 | 0.066 | 0.113 | 0.113 | 46.9 | 47.0 | 2277.4 | 2299.9 | 80.8 | 89.6 |
| 50 | 40 | 0.2 | 0.029 | 0.059 | 0.059 | 0.065 | 0.113 | 0.113 | 46.9 | 47.0 | 2275.6 | 2297.9 | 80.6 | 89.2 |
| 50 | 40 | 0.5 | 0.030 | 0.062 | 0.062 | 0.065 | 0.114 | 0.115 | 46.8 | 46.9 | 2266.9 | 2290.7 | 79.3 | 89.6 |
| 50 | 40 | 1.0 | 0.028 | 0.065 | 0.065 | 0.062 | 0.118 | 0.119 | 46.6 | 46.8 | 2243.9 | 2277.9 | 76.7 | 88.1 |
| 50 | 40 | 2.0 | 0.024 | 0.057 | 0.057 | 0.054 | 0.109 | 0.109 | 46.2 | 46.3 | 2214.5 | 2231.7 | 79.3 | 85.6 |
| 50 | 100 | 0.0 | 0.041 | 0.051 | 0.052 | 0.086 | 0.102 | 0.102 | 48.2 | 48.2 | 2419.8 | 2419.3 | 93.5 | 94.7 |
| 50 | 100 | 0.2 | 0.042 | 0.052 | 0.052 | 0.086 | 0.102 | 0.102 | 48.2 | 48.2 | 2419.2 | 2421.7 | 93.4 | 94.6 |
| 50 | 100 | 0.5 | 0.041 | 0.052 | 0.052 | 0.085 | 0.101 | 0.101 | 48.2 | 48.2 | 2416.3 | 2419.5 | 93.2 | 94.2 |
| 50 | 100 | 1.0 | 0.040 | 0.051 | 0.051 | 0.084 | 0.102 | 0.102 | 48.1 | 48.1 | 2408.0 | 2411.3 | 92.6 | 94.2 |
| 50 | 100 | 2.0 | 0.038 | 0.051 | 0.051 | 0.081 | 0.101 | 0.101 | 48.0 | 48.0 | 2391.7 | 2394.1 | 92.2 | 93.1 |
| 50 | 200 | 0 | 0.046 | 0.051 | 0.051 | 0.095 | 0.102 | 0.102 | 48.6 | 48.7 | 2460.7 | 2466.4 | 96.0 | 96.8 |
| 50 | 200 | 0.2 | 0.045 | 0.050 | 0.050 | 0.093 | 0.101 | 0.101 | 48.6 | 48.6 | 2460.5 | 2460.2 | 96.0 | 96.6 |
| 50 | 200 | 0.5 | 0.045 | 0.050 | 0.050 | 0.093 | 0.100 | 0.100 | 48.6 | 48.6 | 2459.1 | 2456.6 | 95.9 | 96.5 |
| 50 | 200 | 1 | 0.046 | 0.051 | 0.051 | 0.093 | 0.102 | 0.102 | 48.6 | 48.6 | 2455.1 | 2458.3 | 95.7 | 96.2 |
| 50 | 200 | 2 | 0.043 | 0.049 | 0.049 | 0.089 | 0.099 | 0.099 | 48.5 | 48.4 | 2446.5 | 2439.0 | 95.4 | 95.7 |

Table 7: Type I error of the standard Q test and the improved Q test for homogeneity under the null and moments of the distribution of Q. Sample sizes are equal and balanced. The column headings are defined in Section 5.1. Here $\hat{\delta} = \sum A_i^{-1} \delta_i / \sum A_i^{-1}$.

| I | $\bar{N}$ | $\chi^2_{.05}$ | $\Gamma_{s,.05}$ | $\chi^2_{E(Q),.05}$ | $\chi^2_{.1}$ | $\Gamma_{s,.1}$ | $\chi^2_{E(Q),.1}$ | $E_f(Q)$ | $\bar{Q}$ | $E_f(Q^2)$ | $\overline{Q^2}$ | $\mathrm{Var}_f(Q)$ | $s^2(Q)$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 60 | 0.041 | 0.056 | 0.048 | 0.086 | 0.107 | 0.097 | 3.8 | 3.8 | 21.1 | 21.8 | 6.7 | 7.2 |
| 5 | 100 | 0.046 | 0.051 | 0.048 | 0.095 | 0.102 | 0.099 | 3.9 | 3.9 | 23.0 | 23.2 | 7.6 | 7.7 |
| 5 | 160 | 0.050 | 0.052 | 0.051 | 0.098 | 0.101 | 0.100 | 4.0 | 4.0 | 23.5 | 23.7 | 7.8 | 7.9 |
| 10 | 60 | 0.038 | 0.057 | 0.047 | 0.081 | 0.110 | 0.098 | 8.6 | 8.6 | 88.0 | 90.4 | 14.8 | 16.3 |
| 10 | 100 | 0.045 | 0.051 | 0.049 | 0.093 | 0.102 | 0.099 | 8.8 | 8.9 | 95.0 | 95.7 | 17.1 | 17.3 |
| 10 | 160 | 0.048 | 0.051 | 0.049 | 0.095 | 0.100 | 0.098 | 8.9 | 8.9 | 96.9 | 96.7 | 17.5 | 17.6 |
| 20 | 60 | 0.034 | 0.060 | 0.048 | 0.074 | 0.113 | 0.098 | 18.0 | 18.1 | 356.3 | 363.5 | 30.7 | 34.5 |
| 20 | 100 | 0.043 | 0.051 | 0.048 | 0.088 | 0.101 | 0.097 | 18.6 | 18.6 | 382.8 | 383.0 | 35.9 | 36.3 |
| 20 | 160 | 0.046 | 0.050 | 0.049 | 0.093 | 0.100 | 0.098 | 18.8 | 18.8 | 390.3 | 389.6 | 36.9 | 37.1 |

Table 8: Type I error of the standard Q test and the improved Q test for homogeneity under the null and moments of the distribution of Q. Sample sizes are unequal but balanced. The column headings are defined in Section 5.1.

| I | N | $\tau^2$ | $\chi^2_{K-1}$ (level 0.05) | Gamma (level 0.05) | $\chi^2_{E(Q)}$ (level 0.05) | $\chi^2_{K-1}$ (level 0.10) | Gamma (level 0.10) | $\chi^2_{E(Q)}$ (level 0.10) |
|---|---|---|---|---|---|---|---|---|
| 5 | 20 | 0.025 | 0.051 | 0.099 | 0.065 | 0.100 | 0.160 | 0.127 |
| 5 | 20 | 0.05 | 0.071 | 0.134 | 0.089 | 0.136 | 0.203 | 0.163 |
| 5 | 20 | 0.1 | 0.121 | 0.208 | 0.148 | 0.210 | 0.294 | 0.245 |
| 5 | 20 | 0.15 | 0.177 | 0.274 | 0.209 | 0.277 | 0.369 | 0.316 |
| 5 | 20 | 0.2 | 0.230 | 0.340 | 0.266 | 0.343 | 0.438 | 0.387 |
| 5 | 20 | 0.25 | 0.281 | 0.397 | 0.321 | 0.399 | 0.500 | 0.442 |
| 5 | 30 | 0.025 | 0.072 | 0.095 | 0.082 | 0.134 | 0.164 | 0.150 |
| 5 | 30 | 0.05 | 0.111 | 0.142 | 0.125 | 0.190 | 0.228 | 0.210 |
| 5 | 30 | 0.1 | 0.196 | 0.236 | 0.214 | 0.297 | 0.339 | 0.320 |
| 5 | 30 | 0.15 | 0.286 | 0.336 | 0.307 | 0.402 | 0.447 | 0.426 |
| 5 | 30 | 0.2 | 0.369 | 0.416 | 0.388 | 0.481 | 0.522 | 0.503 |
| 5 | 30 | 0.25 | 0.444 | 0.491 | 0.463 | 0.551 | 0.591 | 0.574 |
| 5 | 40 | 0.025 | 0.092 | 0.107 | 0.099 | 0.161 | 0.182 | 0.174 |
| 5 | 40 | 0.05 | 0.147 | 0.168 | 0.156 | 0.232 | 0.258 | 0.247 |
| 5 | 40 | 0.1 | 0.271 | 0.302 | 0.286 | 0.383 | 0.413 | 0.401 |
| 5 | 40 | 0.15 | 0.387 | 0.418 | 0.402 | 0.500 | 0.528 | 0.515 |
| 5 | 40 | 0.2 | 0.486 | 0.516 | 0.501 | 0.592 | 0.612 | 0.605 |
| 5 | 40 | 0.25 | 0.560 | 0.590 | 0.575 | 0.662 | 0.683 | 0.674 |
| 5 | 50 | 0.025 | 0.110 | 0.122 | 0.117 | 0.182 | 0.199 | 0.192 |
| 5 | 50 | 0.05 | 0.196 | 0.214 | 0.204 | 0.292 | 0.313 | 0.304 |
| 5 | 50 | 0.1 | 0.330 | 0.354 | 0.342 | 0.447 | 0.467 | 0.459 |
| 5 | 50 | 0.15 | 0.465 | 0.488 | 0.478 | 0.572 | 0.588 | 0.582 |
| 5 | 50 | 0.2 | 0.570 | 0.589 | 0.581 | 0.663 | 0.680 | 0.674 |
| 5 | 50 | 0.25 | 0.650 | 0.671 | 0.661 | 0.731 | 0.746 | 0.740 |
| 5 | 60 | 0.025 | 0.127 | 0.138 | 0.132 | 0.210 | 0.224 | 0.217 |
| 5 | 60 | 0.05 | 0.225 | 0.242 | 0.233 | 0.326 | 0.342 | 0.336 |
| 5 | 60 | 0.1 | 0.394 | 0.411 | 0.403 | 0.501 | 0.517 | 0.510 |
| 5 | 60 | 0.15 | 0.531 | 0.547 | 0.539 | 0.633 | 0.645 | 0.641 |
| 5 | 60 | 0.2 | 0.633 | 0.647 | 0.640 | 0.718 | 0.729 | 0.725 |
| 5 | 60 | 0.25 | 0.707 | 0.720 | 0.714 | 0.782 | 0.791 | 0.788 |
| 5 | 80 | 0.025 | 0.160 | 0.170 | 0.165 | 0.254 | 0.264 | 0.261 |
| 5 | 80 | 0.05 | 0.295 | 0.309 | 0.303 | 0.406 | 0.417 | 0.413 |
| 5 | 80 | 0.1 | 0.496 | 0.509 | 0.504 | 0.598 | 0.608 | 0.604 |
| 5 | 80 | 0.15 | 0.639 | 0.648 | 0.643 | 0.717 | 0.724 | 0.721 |
| 5 | 80 | 0.2 | 0.728 | 0.736 | 0.732 | 0.798 | 0.804 | 0.802 |
| 5 | 80 | 0.25 | 0.800 | 0.806 | 0.803 | 0.853 | 0.857 | 0.856 |
| 10 | 20 | 0.025 | 0.050 | 0.141 | 0.078 | 0.103 | 0.211 | 0.150 |
| 10 | 20 | 0.05 | 0.083 | 0.195 | 0.118 | 0.151 | 0.277 | 0.205 |
| 10 | 20 | 0.1 | 0.161 | 0.310 | 0.215 | 0.258 | 0.414 | 0.324 |
| 10 | 20 | 0.15 | 0.256 | 0.432 | 0.319 | 0.371 | 0.531 | 0.445 |
| 10 | 20 | 0.2 | 0.347 | 0.532 | 0.417 | 0.471 | 0.628 | 0.544 |
| 10 | 20 | 0.25 | 0.445 | 0.627 | 0.515 | 0.567 | 0.708 | 0.640 |
| 10 | 30 | 0.025 | 0.081 | 0.121 | 0.097 | 0.150 | 0.201 | 0.180 |
| 10 | 30 | 0.05 | 0.147 | 0.210 | 0.178 | 0.243 | 0.305 | 0.279 |
| 10 | 30 | 0.1 | 0.294 | 0.369 | 0.331 | 0.410 | 0.486 | 0.455 |
| 10 | 30 | 0.15 | 0.438 | 0.520 | 0.482 | 0.561 | 0.628 | 0.600 |
| 10 | 30 | 0.2 | 0.564 | 0.638 | 0.602 | 0.676 | 0.733 | 0.709 |
| 10 | 30 | 0.25 | 0.668 | 0.732 | 0.702 | 0.764 | 0.811 | 0.791 |
| 10 | 40 | 0.025 | 0.106 | 0.135 | 0.121 | 0.184 | 0.222 | 0.207 |
| 10 | 40 | 0.05 | 0.203 | 0.244 | 0.223 | 0.310 | 0.354 | 0.338 |
| 10 | 40 | 0.1 | 0.411 | 0.464 | 0.440 | 0.538 | 0.582 | 0.568 |
| 10 | 40 | 0.15 | 0.600 | 0.642 | 0.627 | 0.703 | 0.738 | 0.725 |
| 10 | 40 | 0.2 | 0.719 | 0.753 | 0.736 | 0.799 | 0.826 | 0.815 |
| 10 | 40 | 0.25 | 0.803 | 0.833 | 0.821 | 0.865 | 0.885 | 0.877 |
| 10 | 50 | 0.025 | 0.136 | 0.159 | 0.151 | 0.224 | 0.258 | 0.246 |
| 10 | 50 | 0.05 | 0.267 | 0.299 | 0.285 | 0.381 | 0.415 | 0.404 |
| 10 | 50 | 0.1 | 0.524 | 0.556 | 0.544 | 0.634 | 0.660 | 0.652 |
| 10 | 50 | 0.15 | 0.698 | 0.730 | 0.716 | 0.783 | 0.806 | 0.798 |
| 10 | 50 | 0.2 | 0.809 | 0.830 | 0.822 | 0.872 | 0.888 | 0.882 |
| 10 | 50 | 0.25 | 0.878 | 0.891 | 0.886 | 0.919 | 0.928 | 0.925 |
| 10 | 60 | 0.025 | 0.170 | 0.191 | 0.181 | 0.268 | 0.292 | 0.284 |
| 10 | 60 | 0.05 | 0.338 | 0.366 | 0.354 | 0.458 | 0.486 | 0.478 |
| 10 | 60 | 0.1 | 0.619 | 0.644 | 0.635 | 0.717 | 0.736 | 0.730 |
| 10 | 60 | 0.15 | 0.780 | 0.797 | 0.791 | 0.845 | 0.858 | 0.853 |
| 10 | 60 | 0.2 | 0.862 | 0.875 | 0.869 | 0.909 | 0.917 | 0.914 |
| 10 | 60 | 0.25 | 0.917 | 0.927 | 0.923 | 0.948 | 0.953 | 0.951 |
| 10 | 80 | 0.025 | 0.229 | 0.245 | 0.239 | 0.332 | 0.347 | 0.343 |
| 10 | 80 | 0.05 | 0.459 | 0.477 | 0.469 | 0.572 | 0.591 | 0.585 |
| 10 | 80 | 0.1 | 0.734 | 0.748 | 0.742 | 0.811 | 0.822 | 0.819 |
| 10 | 80 | 0.15 | 0.877 | 0.887 | 0.884 | 0.919 | 0.925 | 0.923 |
| 10 | 80 | 0.2 | 0.940 | 0.944 | 0.942 | 0.961 | 0.965 | 0.963 |
| 10 | 80 | 0.25 | 0.962 | 0.965 | 0.964 | 0.975 | 0.977 | 0.977 |
| 20 | 20 | 0.025 | 0.051 | 0.199 | 0.100 | 0.106 | 0.277 | 0.178 |
| 20 | 20 | 0.05 | 0.096 | 0.298 | 0.160 | 0.170 | 0.391 | 0.270 |
| 20 | 20 | 0.1 | 0.217 | 0.479 | 0.314 | 0.329 | 0.577 | 0.448 |
| 20 | 20 | 0.15 | 0.383 | 0.659 | 0.506 | 0.520 | 0.738 | 0.632 |
| 20 | 20 | 0.2 | 0.523 | 0.780 | 0.638 | 0.652 | 0.841 | 0.754 |
| 20 | 20 | 0.25 | 0.655 | 0.862 | 0.751 | 0.761 | 0.901 | 0.841 |
| 20 | 30 | 0.025 | 0.098 | 0.169 | 0.132 | 0.174 | 0.259 | 0.225 |
| 20 | 30 | 0.05 | 0.197 | 0.298 | 0.256 | 0.306 | 0.413 | 0.372 |
| 20 | 30 | 0.1 | 0.451 | 0.566 | 0.514 | 0.576 | 0.680 | 0.642 |
| 20 | 30 | 0.15 | 0.670 | 0.765 | 0.724 | 0.770 | 0.838 | 0.815 |
| 20 | 30 | 0.2 | 0.798 | 0.869 | 0.839 | 0.874 | 0.918 | 0.903 |
| 20 | 30 | 0.25 | 0.888 | 0.929 | 0.913 | 0.932 | 0.957 | 0.949 |
| 20 | 40 | 0.025 | 0.138 | 0.193 | 0.170 | 0.235 | 0.296 | 0.277 |
| 20 | 40 | 0.05 | 0.311 | 0.379 | 0.353 | 0.429 | 0.500 | 0.478 |
| 20 | 40 | 0.1 | 0.627 | 0.695 | 0.667 | 0.736 | 0.786 | 0.770 |
| 20 | 40 | 0.15 | 0.818 | 0.858 | 0.842 | 0.885 | 0.911 | 0.903 |
| 20 | 40 | 0.2 | 0.917 | 0.938 | 0.930 | 0.949 | 0.961 | 0.958 |
| 20 | 40 | 0.25 | 0.962 | 0.972 | 0.968 | 0.980 | 0.986 | 0.984 |
| 20 | 50 | 0.025 | 0.189 | 0.230 | 0.214 | 0.293 | 0.342 | 0.326 |
| 20 | 50 | 0.05 | 0.407 | 0.460 | 0.441 | 0.534 | 0.586 | 0.573 |
| 20 | 50 | 0.1 | 0.758 | 0.797 | 0.781 | 0.841 | 0.867 | 0.859 |
| 20 | 50 | 0.15 | 0.913 | 0.928 | 0.924 | 0.947 | 0.960 | 0.956 |
| 20 | 50 | 0.2 | 0.968 | 0.976 | 0.973 | 0.983 | 0.988 | 0.986 |
| 20 | 50 | 0.25 | 0.987 | 0.990 | 0.989 | 0.993 | 0.995 | 0.994 |
| 20 | 60 | 0.025 | 0.241 | 0.278 | 0.266 | 0.356 | 0.392 | 0.382 |
| 20 | 60 | 0.05 | 0.502 | 0.547 | 0.531 | 0.627 | 0.664 | 0.654 |
| 20 | 60 | 0.1 | 0.843 | 0.865 | 0.856 | 0.901 | 0.914 | 0.911 |
| 20 | 60 | 0.15 | 0.951 | 0.959 | 0.955 | 0.971 | 0.975 | 0.974 |
| 20 | 60 | 0.2 | 0.985 | 0.989 | 0.988 | 0.993 | 0.994 | 0.994 |
| 20 | 60 | 0.25 | 0.995 | 0.996 | 0.996 | 0.998 | 0.998 | 0.998 |
| 20 | 80 | 0.025 | 0.341 | 0.369 | 0.360 | 0.467 | 0.497 | 0.488 |
| 20 | 80 | 0.05 | 0.671 | 0.697 | 0.687 | 0.768 | 0.787 | 0.782 |
| 20 | 80 | 0.1 | 0.935 | 0.943 | 0.942 | 0.962 | 0.966 | 0.966 |
| 20 | 80 | 0.15 | 0.985 | 0.988 | 0.987 | 0.992 | 0.993 | 0.993 |
| 20 | 80 | 0.2 | 0.997 | 0.997 | 0.997 | 0.999 | 0.999 | 0.999 |
| 20 | 80 | 0.25 | 0.999 | 0.999 | 0.999 | 0.999 | 1.000 | 0.999 |
| 50 | 20 | 0.025 | 0.050 | 0.391 | 0.139 | 0.102 | 0.465 | 0.238 |
| 50 | 20 | 0.05 | 0.131 | 0.577 | 0.276 | 0.221 | 0.643 | 0.412 |
| 50 | 20 | 0.1 | 0.388 | 0.838 | 0.600 | 0.530 | 0.877 | 0.722 |
| 50 | 20 | 0.15 | 0.661 | 0.950 | 0.819 | 0.771 | 0.965 | 0.894 |
| 50 | 20 | 0.2 | 0.833 | 0.983 | 0.925 | 0.903 | 0.990 | 0.963 |
| 50 | 20 | 0.25 | 0.929 | 0.996 | 0.974 | 0.962 | 0.998 | 0.988 |
| 50 | 30 | 0.025 | 0.126 | 0.270 | 0.196 | 0.211 | 0.383 | 0.318 |
| 50 | 30 | 0.05 | 0.332 | 0.531 | 0.446 | 0.466 | 0.647 | 0.586 |
| 50 | 30 | 0.1 | 0.753 | 0.876 | 0.830 | 0.842 | 0.925 | 0.900 |
| 50 | 30 | 0.15 | 0.937 | 0.975 | 0.963 | 0.966 | 0.988 | 0.982 |
| 50 | 30 | 0.2 | 0.985 | 0.995 | 0.992 | 0.993 | 0.998 | 0.996 |
| 50 | 30 | 0.25 | 0.998 | 1.000 | 0.999 | 0.999 | 1.000 | 1.000 |
| 50 | 40 | 0.025 | 0.216 | 0.327 | 0.286 | 0.334 | 0.447 | 0.415 |
| 50 | 40 | 0.05 | 0.537 | 0.656 | 0.614 | 0.666 | 0.760 | 0.733 |
| 50 | 40 | 0.1 | 0.917 | 0.951 | 0.942 | 0.954 | 0.974 | 0.970 |
| 50 | 40 | 0.15 | 0.989 | 0.994 | 0.993 | 0.995 | 0.997 | 0.997 |
| 50 | 40 | 0.2 | 0.998 | 0.999 | 0.999 | 0.999 | 1.000 | 1.000 |
| 50 | 40 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 50 | 0.025 | 0.317 | 0.402 | 0.376 | 0.443 | 0.534 | 0.512 |
| 50 | 50 | 0.05 | 0.697 | 0.769 | 0.748 | 0.798 | 0.850 | 0.837 |
| 50 | 50 | 0.1 | 0.974 | 0.982 | 0.979 | 0.986 | 0.992 | 0.991 |
| 50 | 50 | 0.15 | 0.998 | 0.999 | 0.999 | 0.999 | 1.000 | 0.999 |
| 50 | 50 | 0.2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 50 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 60 | 0.025 | 0.412 | 0.480 | 0.461 | 0.539 | 0.608 | 0.592 |
| 50 | 60 | 0.05 | 0.814 | 0.860 | 0.848 | 0.890 | 0.917 | 0.911 |
| 50 | 60 | 0.1 | 0.993 | 0.995 | 0.995 | 0.996 | 0.997 | 0.997 |
| 50 | 60 | 0.15 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 60 | 0.2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 60 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 80 | 0.025 | 0.598 | 0.645 | 0.634 | 0.720 | 0.755 | 0.747 |
| 50 | 80 | 0.05 | 0.938 | 0.951 | 0.948 | 0.968 | 0.973 | 0.972 |
| 50 | 80 | 0.1 | 0.998 | 0.999 | 0.999 | 0.999 | 1.000 | 1.000 |
| 50 | 80 | 0.15 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 80 | 0.2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| 50 | 80 | 0.25 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

Table 9: Power of the standard chi-square based Q test and the improved Q test for homogeneity (gamma and chi-square with E(Q) degrees of freedom approximations) at the nominal 5% and 10% levels. I is the number of studies, all of size N, equally divided between the treatment and control arms. The effects have a mean of $\delta = 0.5$ and variance of $\tau^2$.
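The simulation design described in the Table 9 caption (I studies of total size N split equally between two arms, with study effects drawn around $\delta$ with variance $\tau^2$) can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, observations are taken to be normal with unit variance, the effects are drawn from a normal distribution, and the variance of d uses a common large-sample formula that may differ from the estimator used in the paper.

```python
import random
import statistics


def cohen_d(treatment, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * statistics.variance(treatment)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    return (statistics.fmean(treatment) - statistics.fmean(control)) / pooled_var ** 0.5


def d_variance(d, n1, n2):
    # ASSUMPTION: a common large-sample variance estimate for d,
    # not necessarily the one used in the paper.
    return (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))


def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted sum of squared deviations."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def simulate_q(num_studies, study_size, delta, tau2, rng):
    """Draw one Q statistic for studies whose true effects vary with variance tau2."""
    arm = study_size // 2  # equal split between treatment and control
    effects, variances = [], []
    for _ in range(num_studies):
        true_effect = rng.gauss(delta, tau2 ** 0.5)
        treatment = [rng.gauss(true_effect, 1.0) for _ in range(arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(arm)]
        d = cohen_d(treatment, control)
        effects.append(d)
        variances.append(d_variance(d, arm, arm))
    return cochran_q(effects, variances)


if __name__ == "__main__":
    rng = random.Random(2009)  # arbitrary seed, for reproducibility only
    draws = [simulate_q(5, 20, 0.5, 0.1, rng) for _ in range(2000)]
    print("mean simulated Q:", statistics.fmean(draws))
```

Power at a given level would then be estimated as the fraction of simulated Q values exceeding the critical value of the chosen reference distribution; setting `tau2` to zero gives draws under the null hypothesis of homogeneity.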
