Simultaneous inference: When should hypothesis testing problems be combined?

The Annals of Applie d Statistics 2008, V ol. 2, No. 1, 197–22 3 DOI: 10.1214 /07-A OAS141 c  Institute of Mathematical Statistics , 2008 SIMUL T ANEOUS INFERENCE: WHEN SHOULD HYPOTHESIS TESTING PR OBLEMS BE COMBINED? By Bradley Efron Stanfor d University Modern statisticia ns are often presented with hundreds or thou- sands of h yp othesis testing p roblems to eva luate at the same time, generated fro m new scien tiﬁc technolog ies suc h a s microarra ys, med- ical and satel lite i maging devices, or ﬂow cytometry counters. The relev ant statistical literature tend s to b egin with the tacit assump- tion that a single combined analysis, for instance, a F alse Discov ery Rate assessmen t, should b e applied to the entire set of problems at hand. This can b e a dangerous assumption, as the ex amples in the pap er sho w, leading to ov erly conserv ativ e or ov erly liberal conclu- sions within any particular sub class of th e cases. A simple Bayes ian theory yields a succinct d escription of the eﬀects of separation or com- bination on false disco very rate analyses. The theory allo ws eﬃcient testing within small sub classes, and has applications to “en richment, ” the detection of m ulti-case eﬀects. 1. In trod uction. Mo dern scien tiﬁc devices suc h as microarra ys routinely pro vide the statistic ian with thousands of hypothesis testing problems to consider at the same time. A v ariety of statistical tec hniques, f alse disco very rates, famil y-wise error rates, p erm u tatio n metho ds etc., ha ve b een prop osed to hand le large-scal e testing situations, usually u nder the tacit assump tion that all a v ailable tests sh ould b e analyzed together—for instance, emplo y- ing a single f alse disco very analysis for all the genes in a giv en microarray exp erimen t. This can b e a d angerous assu mption. As my examples will show, omnibus com bination may distort individu al inf erences in b oth d irect ions: h ighly sig- niﬁcan t cases may b e hidden while insigniﬁcan t ones are enhanced. This pap er concerns the c hoice b et ween com bination and separation of hyp othe- sis testing problems. A h elpful m ethod ology will b e describ ed for diagnosing when separation ma y b e necessary for a subs et of the testing problems, as w ell as for carrying out separation in an eﬃcien t f ashion. Received Ma y 2007; revised September 2007. Key wor ds and phr ases. F alse disco very rates, separate-class mo del, enrichmen t. This is an electronic reprint of the original article published b y the Institute of Mathematical Statistics in The Annals of Applie d Statistics , 2008, V ol. 2, No. 1, 197– 2 23 . This reprint diﬀers from the o riginal in pa gination and t y pog r aphic detail. 1 2 B. EFRON Fig. 1. Br ain Data : z -values c omp aring 6 dyslexic childr en with 6 normals; horizontal se ction showing 848 of 15443 voxels. Co de: R e d z ≥ 0 , Gr e en z < 0 ; solid cir cles z ≥ 2 . 0 , solid squar es z ≤ − 2 . 0 ; “ x ” indic ates distanc e f r om b ack of br ain; y -axis is right-left distanc e. The fr ont half of the br ain app e ars to have mor e p ositive z -values. Data fr om Schwartzman, Doughert y and T aylor (2005) . Figure 1 illustrates our motiv ating example, tak en fr om Sc h w artzman, Doughert y and T a ylor (200 5) . Tw elve c hildr en, six dys lexic and six normal, receiv ed Diﬀusion T ensor Imaging (DTI) b rain scans, an adv anced form of MRI tec hnology that measures ﬂuid diﬀusion in the brain, in this case at N = 15443 lo cations, eac h r epresen ted b y its o wn vo xel’s resp onse. A z -v alue “ z i ” comparing the dyslexics with the normals has b een calculated for eac h v oxel i , su c h that z i should hav e a standard normal distribution und er the n ull h yp othesis of no dyslexic-normal diﬀerence at that brain lo cation, the or etic al nul l hyp othesis : z i ∼ N (0 , 1) . (1.1) The z -v alues f or a h orizon tal section o f the b rain con taining 848 of the 15443 vo x els are in d icate d in Figure 1 . Distance “ x ” from the bac k tow ard the f ron t of the brain is indicated along the x axis. In app earance at least, the z i ’s seem to b e more p ositiv e to wa rd the front. The in vestig ators were, of course, in terested in s p otting lo cations of gen- uine br ain diﬀerences b et we en the dyslexic and normal c hildren. A standard F alse Disco v ery Rate (FDR) analysis describ ed in Section 2 , Benjamini and Ho c hberg (1995) , returned 198 “signiﬁcan t” v o xels at thr eshold lev el q = 0 . 1, those havi ng z i ≥ 3 . 02. The histogram of all 15443 z i ’s app ears in the left panel of Figure 2 . Separate z -v alue histograms for the b ac k and fr ont h alv es of the brain are displa yed in the right panel of Figure 2 , with the dividing line at x = 49 . 5 SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 3 Fig. 2. L eft p anel: hi sto gr am of al l 15433 z -values for Br ain Data; the 198 voxels with z i ≥ 3 . 02 wer e judge d signiﬁc ant by an FDR analysis with thr eshold level q = 0 . 1 . Right p anel: histo gr ams for b ack and fr ont voxels; sep ar ate FDR analyses at level q = 0.1 gave no signiﬁc ant voxels for b ack half, and 281 signiﬁc ant voxels, those with z i ≥ 2 . 69 , for fr ont half. MLE values ar e me ans and standar d deviations for normal densities ﬁt to the c enters of the two histo gr ams, as explaine d in Se ction 3 . as sho w n in Figure 1 . Tw o discrepancies strik e the ey e: the hea vy righ t tail seen in the com b ined histogram on th e right comes almost exclusive ly from fron t-half vo xels; and the cent er of the back- half histogram is shifted left w ard ab out 0.35 u nits compared to the front . Separate FDR analyses, eac h at threshold lev el q = 0 . 1, w ere ru n on the bac k and fr on t half data. 281 fron t half v o xels were found signiﬁcan t, those ha vin g z i ≥ 2 . 69; the bac k half analysis gav e no signiﬁcant v oxel s at all, in con trast to 9 signiﬁcan t bac k-half cases found in the com b ined analysis. This example illustrates b oth of the dangers of combinat ion—o ve r and u nder sensitivit y w ithin d iﬀeren t sub classes of the exp erimen t. Section 2 b egins with a simple Bay esian theorem that quan tiﬁes the c hoice b et ween sep arate and com bined analysis. It is applied to the brain data in Section 3 , elucidating the d iﬀerences b et ween fron t and bac k f alse disco v- ery rate analyses. Th e theorem is most useful for separately inv estigating smal l sub classes, where there is too little d ata for the dir ect empirical Ba y es tec hniques of Section 3 . Sections 4 and 7 demonstrate how al l the data, all N = 15833 z v alues in the Brain study , can b e br ought to b ear on eﬃcien t FDR estimation for a small sub class, for example, ju s t the 82 v o xels at dis- tance x = 18 . Section 6 applies small sub class theory to “enric h men t,” the assessmen t of a p ossible ov erall discrepancy b et ween the z -v alues w ithin and outside a c hosen class. 4 B. EFRON A reasonable ob jection to p erform ing separate analyses on p ortions of the N cases is the p ossibilit y of weak ening con trol of Typ e 1 err or, the ov erall size of the pro cedure. Th is qu estio n is tak en u p in Section 5 , where false disco very rate metho ds are sho wn to b e nearly imm u n e to the danger. Some tec hnical remarks and d iscussion end the pap er in Sections 8 and 9 . The question of separating large-scale testing problems has not r eceiv ed m u c h recen t atten tion. Two relev an t references are Geno v ese, Ro eder and W asserman (2006 ) , and F erkinstad, F rigessi, Thorleifsson and Kong (2007) . Enric hmen t tec hniques ha ve b een more activ ely dev elop ed, as in Su brama- nian et al. (2005 ) , Newton et al. (2007) and Efron and Tibshirani (2007) . 2. A separate-class mo del. The “Two- Groups mo del,” reviewe d b elo w, pro vides a simple Ba y esian fr amew ork for the analysis of sim ultaneous h y- p othesis testing problems. Th e framew ork will b e extended here to include the p ossibilit y of separate situations in diﬀeren t sub-classes of problems— for example, for the b ac k and front halv es of the b r ain in Figure 1 —the extended framew ork b eing the “Separate-Class mo del.” First, we b egin with a br ief r eview of the Tw o-Groups mo del, tak en from Efron ( 2005 , 2007a ). It starts with the Ba y esian assump tion that eac h of the N cases (all N = 15443 vo xels for the Brain Data) is either n u ll or nonnull, with pr ior probabilit y p 0 or p 1 = 1 − p 0 , and with its test statistic “ z ” ha ving densit y either f 0 ( z ) or f 1 ( z ), p 0 = Pr ob { n ull } , f 0 ( z ) density if null , (2.1) p 1 = Pr ob { nonn u ll } , f 1 ( z ) densit y if nonn ull . The theoretica l null mo d el ( 1.1 ) mak es f 0 ( z ) a standard normal densit y f 0 ( z ) = ϕ ( z ) = 1 √ 2 π e − 1 / 2 z 2 , (2.2) an assumption we will question more critically later. W e add the qualitativ e requiremen t that p 0 is large, s a y , p 0 ≥ 0 . 90 , (2.3) reﬂecting the usual purp ose of large-sca le testing, whic h is to reduce a v ast set of p ossibilities to a muc h smaller set of interesti ng pr osp ects. Mo del ( 2.1 ) is p articularly helpful f or motiv ating F alse Disco v ery Rate metho ds. Let F 0 ( z ) and F 1 ( z ) b e the cum ulativ e distribution functions (c.d.f.) corresp onding to f 0 ( z ) and f 1 ( z ), and deﬁne the mixture c.d.f. F ( z ) = p 0 F 0 ( z ) + p 1 F 1 ( z ) . (2.4) SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 5 Then the a p osteriori probabilit y of a case b eing in the null group of ( 2.1 ), giv en that its z -v alue z i is less than some threshold z , is the “Ba yesia n false disco very rate” Fdr( z ) ≡ Pr ob { null | z i < z } = p 0 F 0 ( z ) /F ( z ) . (2.5) [It is notationally conv enien t here to consider the negativ e end of the z -scale, e.g., z = − 3, but w e could ju st as well tak e z i > z or | z i | > z in ( 2.5 ).] Benjamini and Ho ch b erg’s (1995 ) false disco very cont rol rule estimates Fdr( z ) by Fdr( z ) = p 0 F 0 ( z ) / ¯ F ( z ) , (2.6) where ¯ F ( z ) is th e empirical c.d.f. ¯ F ( z ) = # { z i ≤ z } / N . (2.7) Ha ving selected some cont rol lev el “ q ,” often q = 0 . 1, the rule declares all cases as nonnull ha ving z i ≤ z max , where z max is the maximum v alue of z satisfying Fdr( z max ) ≤ q . (2.8) [Usually taking p 0 = 1 and F 0 ( z ) the theoretical null c.d.f. Φ ( z ) in ( 2.5 ).] Rule ( 2.8 ), wh ic h lo oks Ba y esian here, can b e shown to h a v e an imp ortant frequen tist “con trol” prop erty: if the z i ’s are indep end ent, the exp ected p ro- p ortion of false disco veries, that is, the prop ortion of cases identiﬁed b y ( 2.8 ) that are actually from the null group in ( 2.1 ), will b e n o greater than q . Ben- jamini and Y ekutie li (2001 ) relax the indep endence r equiremen t somewhat. Most large-sca le testing situations exhibit substan tial correlations among the z v alues—ob vious in Figure 1 —bu t dep end en ce is less of a problem for the empirical Ba y es approac h to false d isco v ery rates w e will follo w here [see Efron ( 2007 a , (2007b) )]. Deﬁning the mixtur e density f ( z ) from ( 2.1 ), f ( z ) = p 0 f 0 ( z ) + p 1 f 1 ( z ) (2.9) leads to the “local false disco v ery rate” fdr( z ), fdr( z ) ≡ Prob { n ull | z i = z } = p 0 f 0 ( z ) /f ( z ) (2.10) for the probabilit y of a case b eing in the null group giv en z -score z . Densities are more natural than the tail areas of ( 2.5 ) for Ba y esian analysis. Both will b e used in w hat follo w s. The Sep ar ate-Class mo del extends ( 2.1 ) to co ver the situation wh ere the N cases can b e divided into distinct classes, p ossibly h a ving diﬀeren t c hoices of p 0 , f 0 and f 1 . Figure 3 illustrates the sc h eme: the tw o classes “A” and “B” (fron t and bac k in Figure 2 ) ha ve a priori probabilities π A and π B = 1 − π A . 6 B. EFRON Fig. 3. A Sep ar ate-Class m o del with two classes : The N c ases sep ar ate into classes A or B with a priori pr ob abili ty π A or π B ; the Two-Gr oups mo del ( 2.1 ) holds sep ar ately, with p ossibly diﬀer ent p ar am eters, within e ach class. The Two-Groups m od el ( 2.1 ) holds separately within eac h class, for example, with p 0 = p A 0 , f 0 ( z ) = f A 0 ( z ), and f 1 ( z ) = f A 1 ( z ) in class A. I t is imp ortant to notice that the class lab el, A or B, is observ ed b y the statistician, in con trast to the n u ll/nonnull dichoto m y , which must b e inferred. Our pr evious deﬁnitions app ly separately to classes A an d B, for instance, follo wing ( 2.9 )–( 2.10 ), f A ( z ) = p A 0 f A 0 ( z ) + p A 1 f A 1 ( z ) and (2.11) fdr A ( z ) = p A 0 f A 0 ( z ) /f A ( z ) . Com b ining the tw o classes in Figure 3 giv es m arginal den s ities f 0 ( z ) = π A f A 0 ( z ) + π B f B 0 ( z ) , (2.12) f 1 ( z ) = π A f A 1 ( z ) + π B f B 1 ( z ) , and p 0 = π A p A 0 + π B p B 0 , s o f ( z ) = π A f A ( z ) + π B f B ( z ) , (2.13) leading to the com bin ed lo cal false disco very rate fdr( z ) = p 0 f 0 ( z ) /f ( z ) as in ( 2.10 ). Th e same relationships with c.d.f.’s replacing den s ities apply to tail area Fdr’s, ( 2.5 ). Ba ye s theorem yields a simp le but useful relationship b etw een the separate and com bined false disco v ery rates: Theorem. Deﬁne π A ( z ) as the c onditional pr ob ability of class A give n z , π A ( z ) = Prob { A | z } , (2.14) SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 7 and also π A 0 ( z ) = Prob 0 { A | z } , (2.15) the c onditional pr ob ability of class A given z for a nul l c ase. Then fdr A ( z ) = fdr( z ) · π A 0 ( z ) π A ( z ) . (2.16) Pr oof. Let I b e the even t that a case is n ull, so I occurs in the tw o n ull paths of Figure 3 but n ot otherwise. Th en , from deﬁnition ( 2.10 ), fdr A ( z ) fdr( z ) = Prob { I | A, z } Prob { I | z } = Prob { I , A | z } Prob { A | z } Prob { I | z } (2.17) = Prob { A | I , z } Prob { A | z } = π A 0 ( z ) π A ( z ) .  The Theorem has u seful practical applications. Section 4 sho ws that the ratio R A ( z ) = π A 0 ( z ) /π A ( z ) (2.18) in ( 2.16 ) can often b e easily estimated, yielding helpful diagnostic s for p os- sible discrepancies b etw een fd r A ( z ) and f dr( z ), the separate and com bin ed false disco ve ry rates. T ail-area false disco v ery r ates ( 2.5 ) also follo w ( 2.16 ), after the ob vious deﬁnitional c hanges, Fdr A ( z ) = Fdr( z ) · R A ( z ) , (2.19) where no w R A ( z ) in volv es pr obab ilities for cases ha ving z i ≤ z , R A ( z ) = Prob 0 { A | z i ≤ z } Prob { A | z i ≤ z } . (2.20) There is no r eal reason, except exp ositional clarit y , for restricting atten- tion to jus t tw o classes. Section 4 brieﬂy discusses versions of the Theorem applicable to more nuanced situations—in terms of Figure 1 , for example, where the relev ance of other cases to the fdr at a giv en “ x ” falls oﬀ smo othly as we mo v e aw a y from x . First though, Section 3 applies the Theorem to the dic h otomo us front-bac k Brain Data analysis. 3. Class-wise Fd r estimation. Th e Theorem of Section 2 sa ys that sep- arate and com bined lo cal false disco very rates are related by fdr A ( z ) = fdr( z ) · R A ( z ) , R A ( z ) = π A 0 ( z ) /π A ( z ) , (3.1) where π A 0 ( z ) and π A ( z ) are the conditional p robabilities Pr ob 0 { A | z } and Prob { A | z } . This section applies ( 3.1 ) to the Brain Data of Figures 1 and 2 , 8 B. EFRON Fig. 4. Points ar e pr op ortion of fr ont-half voxels r Ak , ( 3.2 ) , for Br ain Data of Figur e 2 . Solid curve i s b π A ( z ) , cubic lo gistic r e gr ession estimate of π A ( z ) = Prob { A | z } ; Dashe d curve b π A 0 ( z ) estimates Prob 0 { A | z } as explaine d i n text. taking the front-half v oxe ls as class A. Th e f ron t-bac k dic hotom y is extended to a more realistic multi-cl ass mo del in Section 4 . In ord er to estimate π A ( z ), it is conv enien t, though not n ecessary , to bin the z -v alues. Deﬁne r Ak as the p rop ortion of class A z -v alues in bin k , r Ak = N Ak / N k , (3.2) with N k the num b er of z i ’s ob s erved in b in k , and N Ak the num b er of those originating from v oxe ls in class A. The p oin ts in Figure 4 sho w r Ak for K = 42 b ins, eac h of length 0.2, runn ing from z = − 4 . 2 to 4.2. As suggested b y the right p anel of Figure 2 , the prop ortions r Ak steadily increase as w e mo ve from left to right. A standard weigh ted logist ic regression, ﬁtting logit( π Ak ) as a cubic fu nc- tion of midp oint z ( k ) of bin k , w ith wei gh ts N k , yielded estimate b π A ( z ) sho w n by the solid curv e in Figure 4 . Binn ing isn ’t necessary here, but it is comforting to see b π A ( z ) n icely follo wing the r Ak p oin ts. (The three bins with r Ak = 0 at the extreme left con tain only N k = 2 z i ’s eac h.) In order to estimate π A 0 ( z ), we need to m ak e some assumptions ab out the null d istributions in the Tw o-Class mo del of Figure 3 . F ollo wing Ef ron ( 2004a , 2004b ), we assume that f A 0 ( z ) and f B 0 ( z ) are normal densities, but not necessarily N (0 , 1) as in ( 1.1 ), sa y , f A 0 ( z ) ∼ N ( δ A 0 , σ 2 A 0 ) and f B 0 ( z ) ∼ N ( δ B 0 , σ 2 B 0 ) . (3.3) Ba ye s theorem then giv es π A 0 ( z ) π B 0 ( z ) = π A 0 ( z ) 1 − π A 0 ( z ) SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 9 T able 1 Par ameter estimates for the nul l arms of the Two-Class mo del in Figur e 3 , br ain data. Obtaine d using R pr o gr am l o cfdr, Efr on (2007b) , MLE ﬁtting metho d b p 0 b δ 0 b σ 0 π A (front): 0.97 0.06 1.09 0.50 B (back): 1.00 − 0 . 29 1.01 0.50 (3.4) = π A p A 0 σ B 0 π B p B 0 σ A 0 exp  − 0 . 5  z − δ A 0 σ A 0  2 −  z − δ B 0 σ B 0  2  , whic h is easily solv ed for π A 0 ( z ). The R alg orithm lo cf dr used “MLE ﬁtting” d escrib ed in Section 4 of Efron (2007 b) to pr o vide th e p arameter estimates in T able 1 . (The front- bac k dividin g line in Figure 1 wa s chosen to put ab out h alf the vo xels in to eac h class, so π A /π B . = 1 .) Solving f or b π A 0 in ( 3.4 ) ga ve the dashed cur v e of Figure 4 . Lo oking at Figure 2 , we migh t exp ect fdr A ( z ), the local fdr for the fr ont, to b e muc h low er than the com bined fdr( z ) for large v alues of z , that is, to pro vide man y m ore “signiﬁcan t” z -v alues, but this is not the case: in fact, b R A ( z ) = b π A 0 ( z ) / b π A ( z ) . = 0 . 94 (3.5) for z ≥ 3 . 0, so form ula ( 3.1 ) imp lies only small diﬀerences. T w o con tradictory eﬀects are at work: the longer right tail of th e fron t-half d istribution b y itself w ould pr od uce small v alues of R A ( z ) and fdr A ( z ); ho w ev er, th is eﬀect is mostly canceled by the right w ard shift of the whole f r on t-half distrib ution, whic h s ubstan tially increases th e numerator of fdr A ( z ) = p A 0 f A 0 ( z ) /f A ( z ), ( 2.11 ). [Note: th e “signiﬁcan t” vo xels in Figure 2 w ere obtained using the theoretica l null ( 1.1 ) for b oth the separate and com bined analyses, making them somewh at diﬀeren t than those based on the empirical n ull estimates here.] The close matc h b etw een b π A 0 ( z ) and b π A ( z ) near z = 0 is no acciden t. F ollo wing through the deﬁnitions in Figure 3 and ( 2.11 ) give s, after a little rearrangemen t, π A ( z ) 1 − π A ( z ) = π A 0 ( z ) 1 − π A 0 ( z ) 1 + Q A ( z ) 1 + Q B ( z ) , (3.6) where Q A ( z ) = 1 − fdr A ( z ) fdr A ( z ) and Q B ( z ) = 1 − fdr B ( z ) fdr B ( z ) . (3.7) 10 B. EFRON Fig. 5. Br ain Data: z -values plotte d vertic al ly versus distanc e x fr om b ack of br ain. Smal l histo gr am shows x values for the 198 voxels wi th z i ≥ 3 . 02 , l eft p anel of Figur e 2 . Running p er c entile curves r eve al gener al upwar d shif t of z -values ne ar histo gr am mo de at x = 65 . Usually fd r A ( z ) and fdr B ( z ) will appro ximately equal 1.0 near z = 0, reﬂect- ing the large prep ond erance of null cases, (2.3), and the fact th at nonnull cases tend to pro duce z -v alues further a wa y from 0. Then (3.6) giv es π A ( z ) . = π A 0 ( z ) for z near 0 , (3.8) as seen in Figure 4 . Supp ose we b eliev e that f A 0 ( z ) = f B 0 ( z ) in Figure 3 , in other words, that the null cases are distrib uted ident ically in the tw o classes. [This b eing true, e.g., if we accept the theoretical null distribution ( 1.1 ), as is u sual in the microarra y literature.] Then π A 0 ( z ) will b e constan t as a fu nction of z , π A 0 ( z ) = π A p A 0 π A p A 0 + π B p B 0 = π A p A 0 p 0 . (3.9) Since b π A ( z ) in Figure 4 is not constan t near z = 0, and should closely ap- pro ximate π A 0 ( z ) there, we h av e evidence against f A 0 ( z ) = f B 0 ( z ) in this case, obtained without r ecourse to mo dels suc h as ( 3.3 ). 4. Fdr estimation f or small su b classes. Our d ivision of the Brain Data in to front and bac k halv es was somewhat arb itrary . Figure 5 sho ws the N = 15443 z -v alues plotted v ersus x , the distance fr om the bac k of the brain. A clear wa v e is visible, cresting n ear x = 65. Most of the 198 BH (0 . 1) signiﬁcan t v oxels of Figure 2 o ccurred at the top of the crest. There is something worrisome h ere: th e z -v alues near x = 65 are s h ifted upw ard across their entire range, not just in th e upp er p ercen tiles. This SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 11 Fig. 6. L eft p anel: Solid histo gr am shows 82 z -values for class A, the voxels at x = 18 ; Line histo gr am for al l other z -values in Br ain Data; “ cub ” i s cubic lo gistic r e gr ession estimate of 1 /R A ( z ) . R ight p anel: c ombine d lo c al false di sc overy r ate c fdr( z ) , obtain fr om lo cfdr algorithm, and c fdr A ( z ) = c fdr( z ) · b R A ( z ) . Dashes indic ate the 82 z i ’ s. migh t b e du e to a reading bias in the DTI imaging device, or a gen uine dif- ference b et w een d yslexic and normal children for al l b rain lo cati ons around x = 65. In neither case is it correct to assign some sp ecial signiﬁcance to those v oxel s near x = 65 ha ving large z i scores. In fact, a separate Fdr analysis of the 1956 v o xels ha ving x in the range [60 , 69] [using lo cfdr to estimate their empirical n ull distribution as N (0 . 65 , 1 . 44 2 )] yielded no signiﬁcan t cases. Figure 5 suggests that we migh t wish to p erform separate analyses on man y small su b classes of the data. Large classes can b e inv estigated directly , as ab ov e, but a fully sep arate analysis may b e to o muc h to ask for a su b class of less than a f ew hundred cases, as sho wn by the accuracy calculations of Efron (2007 b) . Relationship ( 3.1 ) can b e useful in s uc h situations. As an example, let class A b e the 82 v oxel s located at x = 18. These displa y some large z -v alues in Figure 5 , attained without the b eneﬁt of riding a wa v e crest. Figure 6 shows their z -v alue histogram and a cub ic logistic regression estimate of π A ( z ) /π A , where π A = 82 / 1554 3 = 0 . 0053 . (4.1) This amoun ts to an estimate of 1 /R A ( z ) in ( 3.1 ). Here w e are assuming that A s h ares a common n ull distribu tion with all the other cases, f A 0 ( z ) = f B 0 ( z ) as in ( 3.9 ), a necessary assu mption sin ce th ere isn’t enough data in A to separately estimate π A 0 ( z ). In its fa vo r is the ﬂatness of b π A ( z ) /π A near z = 0, a necessary d iagnost ic signal as discussed at the end of S ectio n 3 . [Remark 12 B. EFRON G of Section 8 discuss es replacing π A with π A 0 , ( 3.9 ), in the estimation of R A ( z ).] The r ight p anel of Figure 6 compares the com b ined lo cal false d isco v ery rate c fdr( z ), estimated using lo cfdr as in Efron ( 2007a ), with c fdr A ( z ) = c fdr( z ) · b R A ( z ) , (4.2) for z ≥ 0. The adjustment is subs tantia l. Whether or not it is gen u ine d e- p ends on the accuracy of b R A ( z ), as considered further in the “eﬃciency” discussion of Section 7 . So far we ha ve only discussed th e d ic hotomization of cases into t wo classes A and B of p ossibly separate r elev ance. Figure 5 might suggest a more con- tin u ous approac h in whic h the relev ance of case j to case i falls oﬀ smo othly as | x j − x i | in creases, for example, as 1 / (1 + | x j − x i | / 10). W e sup p ose that eac h case comprises three comp onents, case i = ( x i , I i , z i ) , (4.3) where x i is an observ able vec tor of co v ariates, I i is the unobserv able n ull indicator h aving I i = 1 or 0 as case i is from the null or nonnull group in ( 2.1 ), and z i is th e ob s erv able z -v alue. W e also d eﬁne a “relev ance function” ρ i ( x ) taking v alues in the int erv al [0 , 1], whic h sa ys how relev an t a case w ith co v ariate x is to case i , the case of inte rest. [Previously , ρ i ( x j ) = 1 or 0 as x j w as or was not in the same class as x i .] The Two -Class mo del of Figure 3 can b e extended to a m u lti-cl ass mo d el, where eac h cov ariate v alue x h as its o wn distribu tion parameters p x 0 , f x 0 ( z ), and f x 1 ( z ), giving f x ( z ) = p x 0 f x 0 ( z ) + (1 − p x 0 ) f x 1 ( z ) and (4.4) fdr x ( z ) = p x 0 f x 0 ( z ) /f x ( z ) as in ( 2.11 ). Let fd r( z ) b e the com bined lo cal false disco very rate as in the Theorem of Section 2 , and fdr i ( z ) the separate rate fdr x i ( z ). Th en ( 3.1 ) generalize s to fdr i ( z ) = fdr( z ) · R i ( z ) , R i ( z ) = E 0 { ρ i ( x ) | z } E { ρ i ( x ) | z } , (4.5) “ E 0 ” indicating n ull case conditional exp ectati on. T ail area false disco very rates ( 2.5 ) also satisfy ( 4.5 ) after the r equisite deﬁnitional c hanges, Fdr i ( z ) = Fdr( z ) · R i ( z ) , R i ( z ) = E 0 { ρ i ( X ) | Z ≤ z } E { ρ i ( X ) | Z ≤ z } . (4.6) The emp irical v ersion of ( 4.6 ) clariﬁes its meaning. L et p j 0 and F j 0 ( z ) indicate p x 0 and the c.d.f. of f x 0 ( z ) for x = x j , ( 4.4 ), and let N ( z ) = # { z j ≤ SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 13 z } . T aking account of all the diﬀerent situations ( 4.3 ), the combined Fdr estimate ( 2.5 ) b ecomes Fdr( z ) = N X j =1 p j 0 F j 0 ( z ) / N ( z ) , (4.7) this b eing the ratio of exp ected null cases to observ ed total cases for z j ≤ z i . Similarly , Fdr i ( z ) = N X j =1 ρ i ( x j ) p j 0 F j 0 ( z ) . X z j ≤ z ρ i ( x j ) , (4.8) the ratio of exp ected n ull cases to total cases taking accoun t of the relev ance of x j to x i . Therefore, Fdr i ( z ) = Fdr( z ) · ¯ R i ( z ) , (4.9) ¯ R i ( z ) = [ P N j =1 ρ i ( x j ) p j 0 F j 0 ( z ) / P N j =1 p j 0 F j 0 ( z )] [ P z j ≤ z ρ i ( x j ) / N ( z )] ; the denominator of ¯ R i ( z ) is an ob vious estimate of E { ρ i ( X ) | Z ≤ z } in ( 4.6 ), while the n u m erator is the Ba yes estimate of E 0 { ρ i ( X ) | Z ≤ z } . 5. Are separate analyses legitimate ? The p rincipal, and sometimes only , concern of classical multi ple inference theory was the con trol of Typ e I error in sim ultaneous hyp othesis testing situations. T his r aises an imp ortan t qu es- tion: is it legitima te from an error con trol viewp oin t to sp lit su c h a s ituation in to separate analysis classes? The answ er, d iscussed b rieﬂy here, dep ends up on the method of inference. The Bonferroni metho d applied to N indep endent hyp othesis tests rejects the n ull for those cases ha ving p -v alue p i suﬃcien tly small to accoun t for m u ltiplicit y , p i ≤ α/ N , (5.1) α = 0 . 05 b eing the familiar c hoice. If we n o w separate the cases into t w o classes of size N / 2, rejecting for p i ≤ α/ ( N/ 2), we eﬀectiv ely double α . Some adjustmen t of Bonferroni’s metho d is necessary if we are con templat- ing separate analyses. Changing α to α/ 2 works here, but things get more complicate d f or situations su c h as those suggested by Figure 5 . F alse disco v ery rate metho d s are more f orgiving—usually they can b e applied to separate analyses in un c hanged form without u ndermining their inferen tial v alue. Basically , this is b ecause they are r ates , and, as s u c h, cor- rectly scale with “ N .” W e will discuss b oth Ba ye sian and f requen tist justi- ﬁcations for this state men t. 14 B. EFRON Starting in a Ba y esian framew ork, as in ( 4.3 ), let ( X, I , Z ) (5.2) represen t a random case, where X is an observ ed co v ariate v ector, I is an unobserve d ind icato r equaling 1 or 0 as the case is null or nonnull, and Z is the observ ed z -v alue. W e assume X has p rior distribution π ( x ). X might indicate class A or B in Figure 3 , or the distance from the bac k of the b rain in Figure 5 . Let Fdr x ( z ) b e the Ba y esian tail area false d isco v ery r ate ( 2.5 ) conditional on observing X = x , Fdr x ( z ) = Prob { I = 1 | X = x, Z ≤ z } . (5.3) (Remem b ering that we could ju st as wel l c hange the Z condition to Z ≥ z or | Z | ≤ z .) F or eac h x , d eﬁne a threshold v alue z ( x ) by Fdr x ( z ( x )) = q (5.4 ) for some preselecte d control lev el q , p erhaps q = 0 . 10. This implies a rule R that mak es decisions “ b I ” according to ( 5.4 ), b I =  1 (null) , if Z > z ( X ), 0 (nonnull) , if Z ≤ z . (5.5) Rule R has a conditional Ba y esian false disco very r ate q for ev ery c hoice of x , Fdr x ( R ) ≡ Prob { I = 1 | b I = 0 , X = x } = q . (5 .6) Unconditionally , the rate is also q : Fdr( R ) = Pr ob { I = 1 | b I = 0 } = Z X Prob { I = 1 | Z ≤ z ( X ) , X = x } π ( x | Z ≤ z ) dx (5.7) = Z X q · π ( x | Z ≤ z ) dx = q . This veriﬁes the Ba y esian separation pr op ert y f or false d isco v ery rates: Fdr x ( R ) = q separately for all x implies Fd r ( R ) = q . Separating the Fdr analyses has not weak ened the Fdr int erpretation for the en tire ensemble. F or an y ﬁxed v alue of z , the combined Ba y esian false disco ve ry rate Fdr( z ) ( 2.5 ) is an a p osteriori mixtur e of the separate Fdr x ( z ) v alues, Fdr( z ) = Z X Fdr x ( z ) π ( x | Z ≤ z ) dx, (5.8) b y the same argument as in the top line of ( 5.7 ). Th ere will b e some thresh old v alue “ z (com b )” that make s Fdr( z (com b )) = q , (5.9) SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 15 deﬁning a combined d ecision rule R com b that, lik e R , con trols the Bay es false discov ery rate at q . Beca use of ( 5.8 ), z (com b ) will lie within the range of z ( X ); R com b will b e more or less conserv ativ e than the sep arated rule R as z (com b ) < z ( X ) or z (com b) > z ( X ) . 1 Not using the information in X reduces the theoretica l accuracy of rule R com b ; see Remark A of Section 8 . Result ( 5.7 ) justiﬁes Fdr separation from a Ba yesian p oint of view. The corresp onding frequ entist/ empirical Ba y es calculat ions lead to essen tially the same conclusion, though n ot as cleanly as in ( 5.7 ). In order to justify the empir ical Ba y es in terpretation of Fdr( z ) ( 2.6 ), we w ould lik e it to accurately estimate Fdr( z ) ( 2.5 ). F ollo wing mo del ( 2.1 ), deﬁne N ( z ) = # { z i ≤ z } and e ( z ) = E { N ( z ) } = N · F ( z ); (5.10) also let D = Fdr( z ) Fdr( z ) and d = 1 − F ( z ) e ( z ) = 1 N 1 − F ( z ) F ( z ) . (5.11) Assuming indep endence of the z i (just for this calculat ion) standard bi- nomial results sho w that E { D } . = 1 + d a nd v ar { D } . = d, (5.12) where the appro ximations are acc urate to order O (1 / N ), ignoring terms O (1 / N 2 ). Remark B impro ves ( 5.12 ) to accuracy O (1 / N 2 ). W e see that Fdr( z ) is nearly unbiased for Fdr ( z ) with coeﬃcient of v ari- ation CV( F dr ( z )) = d 1 / 2 ≤ e ( z ) − 1 / 2 . (5.13) W e n eed e ( z ) to b e reasonably large to make CV small enough for accurate estimatio n, p erhaps e ( z ) = N · F ( z ) ≥ 10 for CV ≤ 0 . 3 . (5.14) If we are w orking near the 1% tail of F ( z ), common enough in Fdr applica- tions, w e n eed N ≥ 1000. Making “ N ” as large as p ossible in th e b est reason for combined rather than separate analysis, at least in an empirical Ba ye s framew ork. The sepa- rate analyses s till ha v e large N ’s in the front -bac k example of Figure 2 , bu t 1 Geno vese , Ro eder and W asserman (2006) consider more qualitative situations where what we hav e called “A” or “B” migh t b e classes of greater or less a priori null probability . Their “w eighted BH” rule transforms z into v alues z A or z B dep ending u pon the class, and then carries out ru le ( 2.5 ) on th e transformed z ’s. Here, instead, the z ’s are kept the same, but compared to diﬀerent thresholds z ( x ). F erkinstad et al. (2007) exp lore th e dep endence of Fdr( z ) on x via ex plicit p arametric mo dels. 16 B. EFRON not in Figure 6 . Section 7 sho ws ho w the small sub class approac h used in Section 4 can imp ro ve estimation eﬃciency . The danger of com bination is that we ma y b e gett ing a go od estimate of the wrong qu an tit y: if Fdr A ( z ) is m uc h d iﬀeren t than Fdr B ( z ), Fd r ( z ) ma y b e a p o or estimate of b oth. Returning to a com bin ed analysis, let N 0 ( z ) and N 1 ( z ) b e the num b er of n ull and n onn u ll z i ’s equal or less than z , for some ﬁxed v alue of z , N o ( z ) = # { z i ≤ z , I i = 1 } and N 1 ( z ) = # { z i ≤ z , I i = 0 } (5.15) in notation ( 4.3 ), so N 0 ( z ) + N 1 ( z ) = N ( z ) , ( 5.10 ); also let e 0 ( z ) and e 1 ( z ) b e their exp ectations, e 0 ( z ) = E { N 0 ( z ) } = N p 0 · F 0 ( z ) and (5.16) e 1 ( z ) = E { N 1 ( z ) } = N p 1 · F 1 ( z ) . The ru le that declares b I i = 0 if z i ≤ z (i.e., “rejects th e n ull” for z i ≤ z ) has actual false disc overy pr op ortion Fdp( z ) = N 0 ( z ) N 0 ( z ) + N 1 ( z ) . (5.1 7) Fdp ( z ) is unobserv able, but w e can estimate it by Fdr( z ) ( 2.2 ), equaling e 0 ( z ) / ( N 0 ( z ) + N 1 ( z )) in notation ( 5.15 ). Th is is conserv ativ e in the fre- quen test sense of b eing an upw ardly biased estimate. I n fact, it is u p wardly biased giv en any ﬁxed v alue of N 1 ( z ): E { Fdr( z ) | N 1 ( z ) } = E  e 0 ( z ) N 0 ( z ) + N 1 ( z )    N 1 ( z )  ≥ e 0 ( z ) e 0 ( z ) + N 1 ( z ) (5.18) ≥ E  N 0 ( z ) N 0 ( z ) + N 1 ( z )    N 1 ( z )  = E { Fdp( z ) | N 1 ( z ) } , where Jensen’s inequalit y has b een u sed t wice. On ly the deﬁnition E { N 0 ( z ) = e 0 ( z ) is requir ed here, not indep endent z i ’s. No w su p p ose we ha v e separated the cases into classes A and B, emplo y- ing separate rejection r ules z i ≤ z A and z i ≤ z B satisfying (in the ob vious notation) E { Fdr A ( z A ) } = E { Fdr B ( z B ) } ≡ q . (5.19) Applying ( 5.18 ) sh o ws that the separate false d isco v ery prop ortions will b e con trolled in exp ectation at r ate q . Ho wev er, for the equiv alen t of the SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 17 Ba ye sian result ( 5.7 ) to hold fr equen tistically , we w an t the com bined F alse Disco v ery Prop ortion, Fdp com b = N A ( z A ) + N B 0 ( z B ) N A 0 ( z A ) + N B 0 ( z B ) + N A 1 ( z A ) + N B 1 ( z B ) , (5.20) to satisfy E { Fdp com b } ≤ q . Remarks C and D sho w that asymptotically E { Fdp com b } = q − c N + O (1 / N 2 ) (5.21) for some c ≥ 0, but an exact ﬁnite-sample result has not b een veriﬁed. (It fails if the d enominator exp ectat ions are v ery small.) Sim ulations sho w the original Benjamini–Hoch b erg rule b ehavi ng in the same wa y; app lying rule ( 2.8 ) s ep arately to classes A and B also con trols the ov erall exp ected v alue of Fdp com b at rate q , in the sense of ( 5.20 ). Bu t again this has n ot b een ve riﬁed analytically . The conclusion of this section is that separate false disco v ery rates anal- yses are legimate, in th e sense that they do not inﬂate the com bin ed Fdr con trol rate, at least not if the d enominator exp ectations are reasonably large. 6. Enric hmen t calculations. Micro arra y stu dies fr equen tly yield disap- p oin ting results b ecause of low p o w er for detecting ind ividually s igniﬁcant genes, Efr on ( 2007a ). “Enr ichmen t” tec hn iqu es strive for in creased p ow er b y p o oling the z -v alues from some pre-identiﬁed collec tion of genes, for in- stance, th ose from a sp eciﬁed p ath wa y , as in S ubramanian et al. (2005 ) , Newton et al. (2007 ) and Efron and Tibsh irani (2007) . By thinking of the p o oled collectio n as “class A” in ( 2.16 ), the Th eorem of Section 2 can b e brought to b ear on enric hm ent an alysis. Figure 7 inv olv es a microarra y stud y of 10,100 genes, featured in Subrama- nian et al. (2005 ) , concerning transcription f actor, p 53. The stud y compared 17 normal cell lines with 33 lines exhibiting p53 m utations. T w o-sample t - tests yielded z -v alues z i for eac h gene, but the results were disapp oin ting: a standard Fdr test ( 2.8 ), with q = 0 . 1, yielded only one nonnull gene, “BAX.” The solid histogram in the left panel of Figure 7 sh o ws z i v alues for the 40 genes in set “P53 UP ,” a collection of genes kno wn to b e up -regulate d by gene p53. Compared with the line histogram of the 10,060 other z i ’s, the P53 UP set d eﬁnitely lo oks “enric hed,” eve n though it con tains only one individually signiﬁcan t z -v alue. The same analysis as in Figure 6 w as applied in Figure 7 , with class A no w the 40 P53 UP genes. Th e right panel sho w s d Fdr( z ) for the com bined analysis of all 10,10 0 genes [obtained from lo cfdr , using the theoretical n ull ( 1.1 )], and c fdr A ( z ) = c fdr( z ) · π A ˆ π A ( z ) , (6.1) 18 B. EFRON with π A = 40 / 1 0 , 100 as in (4.1) and ˆ π A ( z ) obtained from a cubic logistic regression. W e see that c fdr A ( z ) is m uc h smaller than c fdr( z ) for z ≥ 2. S ix of the P53 UP genes ha v e c fdr A ( z i ) < 0 . 10, now indicating strong evidence of b eing nonn u ll. The null hyp othesis of no enric hmen t can b e started as fdr A ( z ) = fdr( z ). Assuming π A 0 ( z ) constan t, as in ( 3.9 ), Theorem ( 2.16 ) pr o vides the equiv- alen t statemen t enrichment nul l hyp othesis : π A ( z ) = constan t , (6.2) so we can use π A ( z ) to test for enric hment. F or in stance, w e migh t estimate π A ( z ) with a linear logistic regression, and u se the test statistic S = ˆ β / c se, (6.3) where ˆ β is the estima ted regression slop e and c se its estimated standard error. F or the P53 UP gene set, ( 6.3 ) ga v e S = 4 . 54, t wo- sided p -v alue 6 . 10 − 6 . This agrees with the analyses in S u bramanian et al. and Efron and Tib- shirani, b oth of wh ic h judged P53 UP “enric hed ,” even taking accoun t of sim u ltaneous testing for sev eral hundred other gene s ets. (In situations lik e that of Figure 6 , where the class A z i ’s extend across a wide range of neg- ativ e and p ositiv e v alues, S should b e calculated sep arately for z < 0 and z > 0 ; in Figure 6 the p ositiv e z i ’s yielded S = 3 . 23 , p -v alue 0 . 001.) Fig. 7. p53 mi cr o arr ay study, 10,000 genes, c omp aring normal versus mutate d c el l lines. Solid histo gr am in left p anel shows z -values for 40 genes in class P53 UP, c omp ar e d with al l others ( line histo gr am ) . Right p anel c omp ar es c fdr( z ) for al l 10,000 genes wi th c fdr A ( z ) obtaine d as in (4.12). Six of the P53 UP genes have c fdr A ( z ) < 0 . 1 . SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 19 Remark E connects ( 6.3 ) to more familiar enric hment test statistics, and suggests that it is like ly to b e reasonably eﬃcien t. T he approac h here has one notable adv anta ge: w e obtain an assessment of individual s igniﬁcance for the genes within the set, via c fdr A ( z i ) , rather than just an o v erall decision of enric h men t. 7. Eﬃciency . W e used the Theorem of S ection 2 to estimat e fdr A ( z ) for sub class A: c fdr A ( z ) = c fdr( z ) π A ˆ π A ( z ) , (7. 1) [setting π A 0 ( z ) in ( 2.6 ) equal to π A in Figures 6 and 7 ]. Of course, we could also estimate, fdr A ( z ) or the tail area analogue Fdr A ( z ), ( 2.19 ), directly from the class A d ata alone, but ( 7.1 ) is su bstan tially more eﬃcien t. This section giv es a very brief o verview of the eﬃciency calculation, with Remark E of Section 8 pro viding a little more detail, concluding w ith a simulatio n example supp orting the accuracy of our metho dology . T aking loga rithms in ( 7.1 ) giv es log c fdr A ( z ) = log c fdr( z ) + log ˆ R A ( z ) , [ ˆ R A ( z ) = π A / ˆ π A ( z )] . (7.2) It tu r ns out th at log c fdr( z ) and log ˆ R A ( z ) are nearly un correlated with eac h other, leading to a conv enien t approximat ion for the standard deviation of log c fdr A ( z ): sd { log c fdr A ( z ) } = [( sd { log c fdr( z ) } ) 2 + ( sd { log ˆ R A ( z ) } ) 2 ] 1 / 2 . (7.3) Section 5 of Efron (2007b) p ro vides an accurate delta m ethod formula for sd { log c fdr( z ) } ; sd { log ˆ R A ( z ) } is also easy to approximat e, u sing familiar lo- gistic regression calcula tions. W e exp ect c fdr A ( z ) to b e more v ariable than c fdr( z ) since class A inv olv es only prop ortion π A of all N cases; standard sample-size considerations imply sd { log c fdr A ( z ) } ∼ 1 √ π A sd { log c fdr( z ) } (7.4) if c fdr A ( z ) and c fdr( z ) w ere estimated d irectly . E stimati on metho d ( 7.1 ) do es b etter—the extra v ariabilit y added to c fdr( z ) by ˆ R A ( z ) = π A / ˆ π A ( z ), repr e- sen ted by the last term in ( 7.3 ), tends to b e m u c h smaller than ( 7.4 ) suggests. Figure 8 illustrates a sim ulation example. In terms of the T w o-Class m o d el of Figure 3 , the p arameters are N = 5000 , π A = 0 . 01 , p A 0 = 0 . 5 , p B 0 = 1 . 0 (7.5) f A 0 = f B 0 ∼ N (0 , 1) and f A 1 = ∼ N (2 . 5 , 1) ; 20 B. EFRON Fig. 8. The thr e e standar d deviation terms of ( 7.3 ) for sim ul ation m o del ( 7.5 ) . so we exp ect 50 of the 5000 z i ’s to b e from class A, and 25 of these to b e nonnulls distributed as N ( 2 . 5 , 1), the remaining 4975 cases b eing N (0 , 1) n ulls. A t z = 2 . 5, the cen ter of the n onn u ll distribution, the ratio sd { log c fdr A ( z ) } /sd { log c fdr( z ) } is 1 . 61, compared to the ratio 10 suggested by ( 7.4 ). Similar results w ere obtained for other c hoices of the simulatio n parameters, ( 7.5 ). See Remark H . 8. Remarks. The remarks of this section expand on some of the tec hnical p oin ts raised earlier. Remark A ( Information loss if X is ignor e d ). The observed co v ariate X in (5.2), or x i in (4.3), is an ancillary statistic that aﬀects the p osterior probabilit y of a null case, fdr x ( z ) = Prob { I = 1 | X = x, Z = z } . General prin - ciples say that ignoring X w ill increase prediction error f or I , and this can b e made pr ecise by considering sp eciﬁc loss functions. Supp ose we wish to predict a binary v ariate I that equals 1 or 0 with probabilit y p or 1 − p ; f or a prediction “ P ” in (0 , 1), let the loss fu nction b e Q ( I , P ) = q ( P ) + ˙ q ( P )( I − P ) , (8.1) where q ( · ) is a p ositiv e conca ve fun ctio n on (0 , 1) satisfying q (0) = q (1) = 0 [e.g., q ( p ) = p · (1 − p ) or q ( p ) = − { p log( p ) + (1 − p ) log(1 − p ) } ], and ˙ q ( p ) = SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 21 dq /dp . This is the “Q class” of loss fun ctio ns discussed in E f ron ( 2004a ). It then tur ns out that choosing P equal to the true probabilit y p minimizes exp ected loss, w ith risk E { Q ( I , p ) } = q ( p ). Giv en Z = z , the marginal false disco ve ry r ate is fdr( z ) = Prob { I = 1 | Z = z } = Z X fdr x ( z ) π ( x | z ) dx, (8.2) similar to ( 5.8 ). Th en, with fdr( z ) or fd r x ( z ) playi ng the r ole of the true probabilit y P , we hav e q (fdr( z )) = q  Z X fdr x ( z ) π ( x | z ) dx  ≥ Z X q (fdr x ( z )) π ( x | z ) dx (8. 3) b y Jensen’s inequalit y—in other wo rds, the u n conditional marginal risk q (fdr( z )) exceeds the exp ected risk conditioning on x . Remark B [ Co eﬃcient of variation of Fdr( z )]. Standard calculatio ns in volving the ﬁrst three momen ts of a binomial v ariate yield the mean and v ariance of D = Fdr( z ) / Fdr( z ), D ˙ ∼ (1 + d − d 2 c, d − d 2 (6 c − 1)) (8.4) for d = (1 − F ( z )) / ( N · F ( z )) as in ( 5.11 ), and c = (1 − 2 F ( z )) / (1 − F ( z )) , with errors O (1 / N 3 ). This giv es app ro ximate co eﬃcien t of v ariation CV ( Fdr) . = d 1 / 2 [1 − d (3 c − 1 / 2)] , (8.5) impro ving on ( 5.1 3 ). Remark C ( Poisson mo del f or Fdr r elationship ). Let Z indicate some region of interest in the space of z -v alues for Figure 3 , for instance z ≤ z A in the A branc h and z ≤ z B in the B b ranc h. Denote th e n umber of null, nonnull, and total z i v alues in Z as N 0 ( Z ), N 1 ( Z ) and N ( Z ) = N 0 ( Z ) + N 1 ( Z ), with corresp onding exp ectations e 0 ( Z ), e 1 ( Z ) and e ( Z ) = e 0 ( Z ) + e 1 ( Z ). S ectio n 5 considers the relatio nship of three quanti ties, Fdr( Z ) = e 0 ( Z ) N ( Z ) , Fdr ( Z ) = e 0 ( Z ) e ( Z ) and Fdp( Z ) = N 0 ( Z ) N ( Z ) , (8.6) the estimated Benjamini–Hoch b erg FDR, the Ba yesia n Fdr and the F alse Disco v ery Prop ortion. No w assume th at N in Figure 3 is Poisson with exp ectation µ , and that the z i ’s are in dep enden t, N ∼ P oi( µ ) , z 1 , z 2 , . . . , z N indep endent, (8.7) implying that N 0 ( Z ) and N 1 ( Z ) are indep endent Poisson v ariates, N 0 ( Z ) ∼ P oi( e 0 ( Z )) indep end en t of N 1 ( Z ) ∼ P oi( e 1 ( Z )) . (8.8) 22 B. EFRON W e can write N 0 ( Z ) = e 0 ( Z ) + δ 0 , (8.9) where δ 0 has ﬁrst three moments δ 0 ∼ (0 , e 0 ( Z ) , e 0 ( Z )) , (8.10) and similarly for N 1 ( Z ) ≡ e 1 ( Z ) + δ 1 and N ( Z ) ≡ e ( Z ) + δ . Note. The indep endence in ( 8.7 ) is not a necessary assum ption, but it leads to the neatly sp eciﬁc forms of the relationship b elo w. The P oisson assum p tions make it easy to relate the t wo random quan tities Fdr( Z ) and Fdp( Z ) in ( 8.6 ) to th e p arameter Fdr ( Z ): Lemma. Under assumption ( 8.7 ), E { Fdr( Z ) } . = Fd r ( Z ) · (1 + 1 /e ( Z )) + O (1 /e ( Z ) 2 ) (8.11) and E { Fdp ( Z ) } . = Fd r( Z ) + O (1 /e ( Z ) 2 ) . (8.12) T ypic al ly, e ( Z ) = e 0 ( Z ) + e 1 ( Z ) wil l b e O ( µ ) , so that the err or terms in ( 8.11 )–( 8.12 ) ar e O (1 /µ 2 ) , eﬀ e c tiv e ly O (1 / N 2 ) . The L e mma shows that Fdr ( Z ) , the Bayesian f alse disc overy r ate, is an exc el lent appr oximation to E { Fdp( Z ) } , while E { Fdr( Z ) } is only slightly upwar d ly biase d. Pr oof. F ollo wing through deﬁnitions ( 8.6 ) and ( 8.9 ), Fdr( Z ) − Fd r ( Z ) = e 0 ( Z ) e ( Z ) + δ − e 0 ( Z ) e ( Z ) = Fdr( Z )  1 1 + δ /e ( Z ) − 1  (8.13) . = Fdr( Z )  − δ e ( Z ) + δ 2 e ( Z ) 2  , so taking exp ecta tions yields ( 8.11 ). S imilarly , Fdr( Z ) − Fd p ( Z ) = Fdr( Z ) ·  1 − 1 + δ 0 /e 0 ( Z ) 1 + δ /e ( Z )  (8.14) . = Fdr( Z ) ·  1 −  1 + δ 0 e 0 ( Z )  1 − δ /e ( Z ) + δ 2 e ( Z ) 2  . SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 23 Therefore, E { Fdr( Z ) − Fdp( Z ) } . = Fd r ( Z )  E  δ 0 δ e 0 ( Z ) e ( Z )  − E  δ 2 e ( Z ) 2  (8.15) = Fd r ( Z )  e 0 ( Z ) e 0 ( Z ) e ( Z ) − e ( Z ) e ( Z ) 2  = 0 .  Remark D ( F r e quentist Fdr c ombination r esult ). The Lemma ab o v e leads to a heuristic v eriﬁcation of ( 5.21 ): that under ( 5.1 9 ), E { Fdr com b } ≤ q in ( 5.20 ). T o b egin w ith, notice that ( 8.7 ) giv es exp ected v alues of N A 0 ( z A ) and N A ( z A ), e A 0 ( z A ) = µπ A p A 0 F A 0 ( z A ) and e A ( z A ) = µπ A F A ( z A ) , (8.16) so e A 0 ( z A ) e A ( z A ) = p A 0 F A 0 ( z A ) F A ( z A ) = Fd r A ( z A ) , (8.17) the Ba ye sian Fdr for class A, and similarly , e B 0 ( z B ) /e B ( z B ) = Fdr B ( z B ). F rom ( 8.11 ) and ( 5.19 ), app lied individually w ithin the t wo classes, Fdr A ( z A ) . = q  1 − 1 e A ( z A )  ≤ q and (8.18) Fdr B ( z B ) . = q  1 − 1 e B ( z B )  ≤ q . Therefore, the com bined Ba y esian Fdr is also b ounded by q , Fdr com b = e A 0 ( z A ) + e B 0 ( z B ) e A ( z A ) + e B ( z B ) = e A ( z A ) Fdr A ( z A ) + e B ( z B ) Fdr B ( z B ) e A ( z A ) + e B ( z B ) (8.19) ≤ e A ( z A ) q + e B ( z B ) q e A ( z A ) + e B ( z B ) = q . But E { Fdp com b } . = Fd r com b according to ( 8.12 ), v erifyin g ( 5.21 ). Remark E ( Th e slop e statistic for testing enrichment ). Slop e statisti c ( 6.3 ), S = ˆ β / c se , is asymptotically fully eﬃcien t f or enrichmen t testing un- der a tw o-sample exp onen tial families mo del. Sup p ose that all the z -v alues come, indep endently , from a one-paramete r exp onential family h a ving den- sit y functions g η ( z ) = e ηz − ψ ( η ) g 0 ( z ) , (8.20) 24 B. EFRON as in Lehmann and Romano (2005) , with η =  η A , in class A, η B , in class B. (8.21) Deﬁne β = η A − η B . As N → ∞ , the MLE ˆ β has asymp totic n ull hyp oth- esis distribution ˆ β ˙ ∼ N  β , 1 N π A π B V ( η 0 )  , (8.22) where π A and π B are the pr op ortions of z i ’s from the t wo classes, and V ( η 0 ) is the v ariance of z if η A = η B equals, sa y , η 0 . Ba ye s ru le applied to ( 8.20 )–( 8.21 ) giv es logit( π A ( z )) = β z + c  c = log  π A π B  + ψ ( η B ) − ψ ( η A )  , (8.23) with β = η A − η B as ab o v e. Standard calculations sho w that ˆ β obtained from the logistic regression mo del ( 8.23 ) also satisﬁes ( 8.22 ). Th is implies full asymptotic eﬃciency of th e slop e statistic ( 6.3 ) for testing η A = η B , the “no enric h m en t” n ull hypothesis. Under norm al assu m ptions, g η ( x ) ∼ N ( η , 1) in ( 8.20 ), ( 6.3 ) is asymptoticall y equiv alen t to z A − z B , the diﬀerence of class means; “limma” [Smyth (2004 ) ], an enrichmen t test implemen ted in Bio c onductor , is also based on z A , as discussed in Efron and Tibshirani (2007 b) . Remark F [ The additive varianc e appr oximation ( 7.3 )]. Being a little more careful, w e can use ( 3.9 ) to write ( 7.1 ) as c fdr A ( z ) = c fdr( z ) π A 0 ˆ π A ( z )  π A 0 = π A p A 0 π A p A 0 + π B p B 0  , (8.24) under the assum ption that f A 0 ( z ) = f B 0 ( z ) in Figure 3 . Binning the data as in ( 3.2 ) giv es ℓ c fdr Ak = ℓ c fdr k − ℓ ˆ π Ak + log( π A 0 ) , (8 .25) where ℓ c fdr Ak is log( c fdr A ( z )) ev aluated at the midp oin t z ( k ) of bin k , and sim- ilarly , ℓ c fdr k = log ( c fdr( z ( k ) )) and ℓ ˆ π Ak = log ( ˆ π A ( z ( k ) )). L o cfdr computes the estimates ℓ c fdr k from the vec tor of count s N = ( . . . , N k , . . . ), while a standard logistic regression program computes ℓ ˆ π Ak from th e vect or of prop ortions r A = ( . . . , r Ak = N Ak / N k , . . . ), ( 3.2 ); N and r A are, to a ﬁ rst ord er of calcu- lation, uncorrelated, leading to appr o ximation ( 7.3 ). In broad outline, th e argument dep end s on the general equalit y v ar { X + Y } = v ar { X } + v ar { Y } + 2 co v { X, E ( Y | X ) } , (8.26) SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 25 applied to X = ℓ c fdr k and Y = − ℓ ˆ π Ak . Both v ar { X } and v ar { Y } are O (1 / N ), but, b ecause the exp ectatio n of r A do es not dep end on N , the co v ariance term in ( 8.26 ) is of order only O (1 / N 2 ), and can b e ignored in ( 7.3 ). Remark G ( The assumption of identic al nul l distributions ). I f w e are willing to assum e that f A 0 ( z ) = f B 0 ( z ) in Figure 3 [or equiv alen tly , that f A 0 ( z ) = f 0 ( z ), the com bined n ull densit y], th en relationship ( 3.1 ) b ecomes fdr A ( z ) = fdr( z ) π A 0 π A ( z ) [ π A 0 = π A p A 0 /p 0 ] , (8.27) as in (3.9) , which can b e written as fdr A ( z ) = fdr( z ) π A π A ( z ) p A 0 p 0 . (8.28) The examples in Figures 6 and 7 estimated R a ( z ) = π A 0 /π A ( z ) b y π A / ˆ π A ( z ), ignoring the ﬁ nal factor p A 0 /p 0 in ( 8.28 ). This is probably conserv ativ e: one w ould exp ect a small sub class A of int erest to h a v e prop ortionate ly more nonnull cases than the whole ensem ble, in other words, to ha ve p A 0 /p 0 < 1 . It isn ’t diﬃcult to estimate the f u ll relationship ( 8.27 ). S ince π A 0 . = π A ( z ) for z near 0, ( 3.8 )–( 3.9 ), w e can set c fdr A ( z ) = c fdr( z ) ˆ π A (0) ˆ π A ( z ) ; (8. 29) ( 8.29 ) giv es b etter results if the logisti c regression mo del for estimat ing π A ( z ) incorp orates a ﬂat interv al around z (0)—for instance, if only p ositiv e v alues of z are of in terest, logit( π A ( z )) = β 1 + β 2 max( z − 1 , 0) 2 + β 3 max( z − 1 , 0) 3 . (8.30) Remark H ( Simulation example ). Figure 9 graphs 100 sim ulations of c fdr A ( z ), ( 7.1 ), drawn fr om mo del ( 7.5 ). The comparison with the actual curv e fdr A ( z ) sho ws excellen t accuracy . Here neither the simula tions n or the actual curve incorp orate the factor p A 0 /p 0 in ( 8.28 ), whic h could b e included as in ( 8.29 )–( 8.30 ). A simpler correction starts with c fdr A ( z ), ( 7.1 ), estimates p A 0 b y ˆ p A 0 = X A c fdr A ( z i ) / N A , (8.31) and ﬁnally m ultiplies c fdr A ( z i ) by ˆ p A 0 /p 0 . In the simulati ons for Figure 9 , ˆ p A 0 had mean 0.575 and standard d eviati on 0.035 , reasonably close to the true v alue p A 0 = 0.50. 26 B. EFRON Fig. 9. Light l ines show 100 simulations of c fdr A ( z ) , ( 7.1 ) , fr om mo del ( 7.5 ) ; he avy li ne is actual fdr A ( z ) curve. [F actor p A 0 /p 0 in ( 8.28 ) not include d in actual or simulations.] Also shown is c ombine d r ate fdr( z ) . F ormula ( 7.1 ) pr ovides go o d ac cur acy in this c ase. 9. Discussion and su mmary . A m ore accurate title for this pap er might ha ve b een “When shouldn ’ t h yp othesis testing problems b e com bined?” A general algorithm for combining or s eparating pr oblems is b ey ond my scop e here, bu t the analysis makes it clea r that com bin ation can b e dangerous in situations like that of Figure 5 . O n the p ositiv e side, the simple Ba ye sian theorem of Section 2 , extended at ( 4.5 ), h elps signal if separation is called for, and even ho w it can b e eﬃcien tly carried out. Some sp eciﬁc p oin ts: • Com bining p roblems increases the empirical Ba yes inferenti al accuracy , with N > 1000 n ecessary for reasonably accurate d irect estimation of false disco very rates ( 5.13 )–( 5.14 ), at least in the partially nonparametric framew ork of Benjamini and Ho c h b erg (199 5) . • Ho wev er, th e Separate-Class mo del of Figure 2 , and its ensu ing theorem ( 2.16 ), imply that separate inferences can b e n ecessary for p roblems of dif- fering structure. T h e question of whether to com bine p roblems amounts, here, to a question of trading oﬀ v ariance with bias in the estimation of false disco ve ry rates. • Situations lik e that of Figure 5 argue strongly against a single combined analysis. The theorem can b e implemen ted as in Figure 6 to estimate fdr or Fdr for small s u b classes, N = 82 in Figure 6 , and with surprising accuracy as sho wn in Section 7 . • A formal test for s ep aration can b e based on the slop e statisti c ( 6.3 ). T his pro vided strong evidence for the necessit y of separation in the p53 enric h- men t example of Figure 7 , and mo derately strong evidence in Figure 6 . SEP AR A TING AN D COMBININ G HY POTHESI S TESTS 27 Fig. 10. Pair e d t -statistics c omp aring aﬀe cte d versus nonaﬀe cte d tissue in 13 c anc er p atients; m icr o arr ay study of 1262 5 genes. The t -values ar e plotte d vertic al ly, against the or der i n which they wer e r e ad fr om the arr ay. Smo othing spline ( soli d curve ) r eve als p erio dic disturb anc es. • Section 5 shows that con trolling the false disco v ery r ate in separate classes also con trols it in combinatio n, at least if the exp ected n um b er of tail ev ents isn’t to o small. In this sense, Fdr analysis has an adv an tage o ver other sim u ltaneous testing tec h niques. Whether or not th e sp eciﬁc m ethodology presente d here app eals to the reader, th e general question of which problems to com bine in a s imultaneous testing situation remains imp ortant. As a matter of d ue diligence, p lotti ng test statistics versus p ossible co v ariates, as in Figure 5 , can raise a wa rn- ing ﬂag against casual com bination. Such co v ariates exist eve n in lo osely structured microarra y studies—where, for example, the order in which the expression lev els are read oﬀ the plate can rev eal n otic eable eﬀects. This last p oin t is illustrated in Figure 10 , w h ere a p erio dic distu r bance in the microarra y reading m ec hanism has evident ly aﬀected the gene-wise summary statist ics. Sub tracti ng the estimated disturbance function from the observ ed t -statistic s is an ob viously wise ﬁr st step. Adjustmen ts that mak e cases more comparable are a complement ary tac tic to separate analyses. Both can b e u s efu l in large-scale testing situations. In Figure 5 , for example, w e might adjust the z -v alues by subtracting the lo cal median and dividing b y the lo cal spread (84%–16%). The r esulting ve rsion of Figure 5 , h ow ev er, still displa ys obvio us inhomogeneit y , and requires separate analyses like that in Figure 7 to ferret out the interest ing cases. 28 B. EFRON REFERENCES Benjamin i, Y. and Hochber g (1995). Con trolling th e false discov ery rate: A practical and p o werful approach to multiple testing. J. R oy. Statist. So c. Ser. B 57 289–3 00. MR132539 2 Benjamin i, Y. and Yekutieli, D. (2001). The contro l of the false disco very rate under dep endency . Ann. Statist . 29 1165–11 88. MR186924 5 Efr on, B. (2004a). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Asso c. 99 96–104. MR205428 9 Efr on, B. (2004b). The estimation of prediction error: Cov ariance p enalties and cross- v alidation (with d iscussio n). J. Amer. Statist. Asso c. 99 619–642. MR209089 9 Efr on, B. (2005). Local false discov ery rates. Av ailable at http:// www-stat.stanf ord.edu/˜brad/papers/F alse.pdf . Efr on, B. (2007a). Correlation and large-scale signiﬁcance testing. J. Amer . Statist. As- so c. 102 93–10 3. MR229330 2 Efr on, B. (2007 b). Size, p o wer and false discov ery rates. Ann. Statist. 35 1351–13 77. Efr on, B. and Tibshirani, R. (2007 ). On testing th e signiﬁcance of sets of genes. A nn. Appl. Statist. 1 107–129. Ferkinst ad, E., Frigessi, A., Thorl eifsson, G. and K ong, A. (2007). Co va riate-modulated false discov ery rates. Av ailable at http://fol k.uio.no/e gilfe/cmfdr- ims.pdf . Genovese, C., Roeder, K. and W asserma n, L. (2006). F alse discov ery control with p -val ue w eigh ting. Bi ometrika 93 509–524. MR226143 9 Lehmann, E. and Romano, J. (2005). T esting Statistic al Hyp otheses , 3rd ed. Sp ringer, New Y ork. MR213592 7 Newton, M., Qui nt ana , F., de n Boon, J., Sengup t a, S. and Ahlquist, P. (2007). Random-set metho ds identify distinct asp ects of the enric h men t signal in gene-set anal- ysis. Ann. Appl. Statist. 1 85–106. Schw ar tzman, A., Dougher ty, R. F. and T a ylor, J. E. (2005). Cross-sub ject com- parison of principal diﬀusion direction maps. Magn. R eson. Me d. 53 1423– 1431. Smyth, G. (2004). Linear mo d els and empirical Bay es metho ds for assessi ng diﬀeren tial expression in microarra y exp erimen ts. Stat. Appl. Genet. Mol. Biol. 3 (1). Ava ilable at http://www .bepress.c om/sagmb/vol3/iss1/art3 . MR210145 4 Subramanian, A., T am a yo, P., Mootha, V. K., Mu kherjee, S., Ebe r t, B. L., Gillette, M. A., P aulo vich, A., Pomero y, S. L., Golub, T. R., Lander, E. S. and Me sir o v, J. P. (2005). Gene set enrichmen t analysis: A knowledge-based ap- proac h for interpreting genome-wide exp ressio n p roﬁles. Pr o c. Natl. A c ad. Sci. 102 15545– 15550. Dep ar tmen ts of St a tistics and Heal th, Research and Po licy Sequoia Hall St anford University St anford, California 94305-4065 USA E-mail: brad@stat.stanford.edu

Simultaneous inference: When should hypothesis testing problems be combined?

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment