Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators

Pierre Neuvial (pierre.neuvial@genopole.cnrs.fr)
Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VII-Denis Diderot, 175 rue du Chevaleret, 75013 Paris, France
INSERM, U900, Paris, F-75248 France
Ecole des Mines de Paris, ParisTech, Fontainebleau, F-77300 France
Institut Curie, 26 rue d’Ulm, Paris cedex 05, F-75248 France
Current affiliation: Laboratoire Statistique et Génome, Université d’Évry Val d’Essonne, UMR CNRS 8071 – USC INRA, France

Abstract

The False Discovery Rate (FDR) is a commonly used type I error rate in multiple testing problems. It is defined as the expected False Discovery Proportion (FDP), that is, the expected fraction of false positives among rejected hypotheses. When the hypotheses are independent, the Benjamini-Hochberg procedure achieves FDR control at any pre-specified level. By construction, FDR control offers no guarantee in terms of power, or type II error. A number of alternative procedures have been developed, including plug-in procedures that aim at gaining power by incorporating an estimate of the proportion of true null hypotheses. In this paper, we study the asymptotic behavior of a class of plug-in procedures based on kernel estimators of the density of the $p$-values, as the number $m$ of tested hypotheses grows to infinity. In a setting where the hypotheses tested are independent, we prove that these procedures are asymptotically more powerful in two respects: (i) a tighter asymptotic FDR control for any target FDR level and (ii) a broader range of target levels yielding positive asymptotic power. We also show that this increased asymptotic power comes at the price of slower, non-parametric convergence rates for the FDP.
These rates are of the form $m^{-k/(2k+1)}$, where $k$ is determined by the regularity of the density of the $p$-value distribution, or, equivalently, of the test statistics distribution. These results are applied to one- and two-sided test statistics for Gaussian and Laplace location models, and for the Student model.

Keywords: multiple testing, False Discovery Rate, Benjamini-Hochberg procedure, power, criticality, plug-in procedures, adaptive control, test statistics distribution, convergence rates, kernel estimators.

1. Introduction

Multiple simultaneous hypothesis testing has become a major issue for high-dimensional data analysis in a variety of fields, including non-parametric estimation by wavelet methods in image analysis, functional magnetic resonance imaging (fMRI) in medicine, source detection in astronomy, and DNA microarray or high-throughput sequencing analyses in genomics. Given a set of observations corresponding either to a null hypothesis or an alternative hypothesis, the goal of multiple testing is to infer which of them correspond to true alternatives. This requires the definition of risk measures that are adapted to the large number of tests performed: typically $10^4$ to $10^6$ in genomics. The False Discovery Rate (FDR) introduced by Benjamini and Hochberg (1995) is one of the most commonly used and most widely studied such risk measures in large-scale multiple testing problems. The FDR is defined as the expected proportion of false positives among rejected hypotheses. A simple procedure called the Benjamini-Hochberg (BH) procedure provides FDR control when the tested hypotheses are independent (Benjamini and Hochberg, 1995) or follow specific types of positive dependence (Benjamini and Yekutieli, 2001).
When the hypotheses tested are independent, applying the BH procedure at level $\alpha$ in fact yields $\mathrm{FDR} = \pi_0 \alpha$, where $\pi_0$ is the unknown fraction of true null hypotheses (Benjamini and Yekutieli, 2001). This has motivated the development of a number of “plug-in” procedures, which consist in applying the BH procedure at level $\alpha/\hat\pi_0$, where $\hat\pi_0$ is an estimator of $\pi_0$. A typical example is the Storey-$\lambda$ procedure (Storey, 2002; Storey et al., 2004), in which $\hat\pi_0$ is a function of the empirical cumulative distribution function of the $p$-values.

In this paper, we consider an asymptotic framework where the number $m$ of tests performed goes to infinity. When $\hat\pi_0$ converges in probability to $\pi_{0,\infty} \in [\pi_0, 1)$ as $m \to +\infty$, the corresponding plug-in procedure is by construction asymptotically more powerful than the BH procedure, while still providing $\mathrm{FDR} \leq \alpha$. However, as FDR control only implies that the expected FDP is below the target level, it is of interest to study the fluctuations of the FDP achieved by such plug-in procedures around their corresponding FDR. This paper studies the influence of the plug-in step on the asymptotic properties of the corresponding procedure for a particular class of estimators of $\pi_0$, which may be written as kernel estimators of the density of the $p$-value distribution at 1.

2. Background and notation

2.1 Settings

Testing one hypothesis. We consider a test statistic $X$ distributed as $F_0$ under a null hypothesis $H_0$ and as $F_1$ under an alternative hypothesis $H_1$. We assume that for $a \in \{0, 1\}$, $F_a$ is continuously differentiable, and that the corresponding density function, which we denote by $f_a$, is positive. This testing problem may be formulated in terms of $p$-values instead of test statistics. The $p$-value function is defined as $p : x \mapsto P_{H_0}(X \geq x) = 1 - F_0(x)$ for one-sided tests and $p : x \mapsto P_{H_0}(|X| \geq |x|)$ for two-sided tests.
As $F_0$ is continuous, the $p$-values are uniform on $[0, 1]$ under $H_0$. For consistency we denote by $G_0$ the corresponding distribution function, that is, the identity function on $[0, 1]$. Under $H_1$, the distribution function and density of the $p$-values are denoted by $G_1$ and $g_1$, respectively. Their expressions as functions of the distribution of the test statistics are recalled in Proposition 1 below in the case of one- and two-sided $p$-values. For two-sided $p$-values, we assume that the distribution function of the test statistics under $H_0$ is symmetric (around 0):

$$\forall x \in \mathbb{R}, \quad F_0(x) + F_0(-x) = 1. \tag{Sym}$$

Condition (Sym) is typically fulfilled in usual models such as Gaussian or Laplace (double exponential) models. Under (Sym), the two-sided $p$-value satisfies $p(x) = 2\,(1 - F_0(|x|))$ for any $x \in \mathbb{R}$.

Proposition 1 (One- and two-sided $p$-values) For $t \in [0, 1]$, let $q_0(t) = F_0^{-1}(1 - t)$. The distribution function $G_1$ and the density function $g_1$ of the $p$-value under $H_1$ at $t$ satisfy the following:
1. for a one-sided $p$-value, $G_1(t) = 1 - F_1(q_0(t))$ and $g_1(t) = (f_1/f_0)(q_0(t))$;
2. for a two-sided $p$-value, $G_1(t) = 1 - F_1(q_0(t/2)) + F_1(-q_0(t/2))$ and $g_1(t) = \frac{1}{2}\left((f_1/f_0)(q_0(t/2)) + (f_1/f_0)(-q_0(t/2))\right)$.

The assumption that $f_1$ is positive entails that $g_1$ is positive as well. We further assume that

$$G_1 \text{ is concave.} \tag{Conc}$$

As $g_1$ is a function of the likelihood ratio $f_1/f_0$ and the non-increasing function $q_0$, (Conc) may be characterized as follows:

Lemma 2 (Concavity and likelihood ratios)
1. For a one-sided $p$-value, (Conc) holds if and only if the likelihood ratio $f_1/f_0$ is non-decreasing.
2. For a two-sided $p$-value under (Sym), (Conc) holds if and only if $x \mapsto (f_1/f_0)(x) + (f_1/f_0)(-x)$ is non-decreasing on $\mathbb{R}_+$.
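The formulas of Proposition 1 can be checked numerically. The sketch below is an illustration only: it assumes a Gaussian location model ($F_0 = N(0,1)$, $F_1 = N(\mu,1)$), and the function names are hypothetical rather than part of the paper. It evaluates $G_1$ and $g_1$ for one- and two-sided $p$-values, for $t$ in the open interval $(0,1)$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # F_0: standard Gaussian null (illustrative assumption)


def q0(t):
    """q_0(t) = F_0^{-1}(1 - t); requires 0 < t < 1."""
    return STD_NORMAL.inv_cdf(1.0 - t)


def likelihood_ratio(x, mu):
    """(f_1 / f_0)(x) for F_1 = N(mu, 1) and F_0 = N(0, 1)."""
    return NormalDist(mu, 1.0).pdf(x) / STD_NORMAL.pdf(x)


def G1_one_sided(t, mu):
    """G_1(t) = 1 - F_1(q_0(t))."""
    return 1.0 - NormalDist(mu, 1.0).cdf(q0(t))


def g1_one_sided(t, mu):
    """g_1(t) = (f_1 / f_0)(q_0(t))."""
    return likelihood_ratio(q0(t), mu)


def G1_two_sided(t, mu):
    """G_1(t) = 1 - F_1(q_0(t/2)) + F_1(-q_0(t/2))."""
    alt = NormalDist(mu, 1.0)
    q = q0(t / 2.0)
    return 1.0 - alt.cdf(q) + alt.cdf(-q)


def g1_two_sided(t, mu):
    """g_1(t) = ((f_1/f_0)(q_0(t/2)) + (f_1/f_0)(-q_0(t/2))) / 2."""
    q = q0(t / 2.0)
    return 0.5 * (likelihood_ratio(q, mu) + likelihood_ratio(-q, mu))
```

In this Gaussian model the likelihood ratio $x \mapsto \exp(\mu x - \mu^2/2)$ is non-decreasing for $\mu > 0$, so the one-sided density `g1_one_sided` is non-increasing in `t`, in line with Lemma 2.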
Multiple testing setting. We consider a sequence of independent tests performed as described above and indexed by the set $\mathbb{N}^*$ of positive integers. We assume that either all of them are one-sided tests, or all of them are two-sided tests. This sequence of tests is characterized by a sequence $(H, p) = (H_i, p_i)_{i \in \mathbb{N}^*}$, where for each $i \in \mathbb{N}^*$, $p_i$ is a $p$-value associated with the $i$th test, and $H_i$ is a binary indicator defined by

$$H_i = \begin{cases} 0 & \text{if } H_0 \text{ is true for test } i, \\ 1 & \text{if } H_1 \text{ is true for test } i. \end{cases}$$

We also let $m_0(m) = \sum_{i=1}^{m} (1 - H_i)$, and $\pi_{0,m} = m_0(m)/m$. Following the terminology proposed by Roquain and Villers (2011), we define the conditional setting as the situation where $H$ is deterministic and $p$ is a sequence of independent random variables such that for $i \in \mathbb{N}^*$, $p_i \sim G_{H_i}$. This is a particular case of the setting originally considered by Benjamini and Hochberg (1995), where no assumption was made on the distribution of $p_i$ when $H_i = 1$. In the present paper, we consider an unconditional setting introduced by Efron et al. (2001), which is also known as the “random effects” setting. Specifically, $H$ is a sequence of random indicators, independently and identically distributed as the Bernoulli distribution $\mathcal{B}(1 - \pi_0)$, where $\pi_0 \in (0, 1)$, and conditional on $H$, $p$ follows the conditional setting, that is, the $p$-values satisfy $p_i \mid H_i \sim G_{H_i}$. This unconditional setting has been widely used in the multiple testing literature, see, e.g., Storey (2003); Genovese and Wasserman (2004); Chi (2007a). In this setting, the $p$-values are independently, identically distributed as $G = \pi_0 G_0 + (1 - \pi_0) G_1$, and $m_0(m)$ follows the binomial distribution $\mathrm{Bin}(m, \pi_0)$.

Remark 3 We are assuming that $\pi_0 < 1$, which implies that the proportion $1 - \pi_{0,m}$ of true alternative hypotheses does not vanish as $m \to +\infty$.
While this restriction is natural in the unconditional setting considered in this paper, we note that our results do not apply to the “sparse” situation where $\pi_{0,m} \to 1$ as $m \to +\infty$.

As $G_0$ is the identity function, the multiple testing model is entirely characterized by the two parameters $\pi_0$ and $G_1$ (or, equivalently, $\pi_0$ and $G$), where $G_1$ is itself entirely characterized by $F_0$ and $F_1$, by Proposition 1. The mixture distribution $G$ is concave if and only if (Conc) holds. More generally, we note that making a regularity assumption on $G_1$ (or $g_1$) is equivalent to making the same regularity assumption on $G$ (or $g$):

Remark 4 (Differentiability assumptions) Throughout the paper, differentiability assumptions on the distribution of the $p$-values near 1 are expressed in terms of $g$, the (mixture) $p$-value density. As $g = \pi_0 + (1 - \pi_0) g_1$, we note that they could equally be written in terms of $g_1$, the $p$-value density under the alternative hypothesis.

2.2 Type I and II error rate control in multiple testing

We define a multiple testing procedure $\mathcal{P}$ as a collection of functions $(\mathcal{P}_\alpha)_{\alpha \in [0,1]}$ such that for any $\alpha \in [0, 1]$, $\mathcal{P}_\alpha$ takes as input a vector of $m$ $p$-values, and returns a subset of $\{1, \dots, m\}$ corresponding to the indices of hypotheses to be rejected. For a given procedure $\mathcal{P}$ and a given $\alpha \in [0, 1]$, the function $\mathcal{P}_\alpha$ will be called “Procedure $\mathcal{P}$ at (target) level $\alpha$”. In this paper, we focus on thresholding-based multiple testing procedures, for which the rejected hypotheses are those with $p$-values less than a threshold. Each possible value for the threshold corresponds to a trade-off between false positives (type I errors) and false negatives (type II errors). Most risk measures developed for multiple testing procedures are based on type I errors.
We focus on one such measure, the False Discovery Rate (FDR), which is one of the most widely used error rates in multiple testing. Denoting by $R_m$ the total number of rejections of $\mathcal{P}_\alpha$ among $m$ hypotheses tested, and by $V_m$ the number of false rejections, the corresponding False Discovery Proportion is defined as $\mathrm{FDP}_m = V_m/(R_m \vee 1)$, and the False Discovery Rate is the expected FDP, that is:

$$\mathrm{FDR}_m = E\left[\frac{V_m}{R_m \vee 1}\right]. \tag{1}$$

A trivial way to control the FDR (or any risk measure based only on type I errors) is to make no rejection with high probability. Obviously, this is not the best strategy, as it may lead to a high number of type II errors. The performance of multiple testing procedures may be evaluated through their power, which is a function of the number of type II errors. Specifically, the power of a multiple testing procedure at level $\alpha$ is generally defined as the (random) proportion of correct rejections (true positives) among true alternative hypotheses, see, e.g., Chi (2007a):

$$\Pi_m = \frac{R_m - V_m}{(m - m_0(m)) \vee 1}. \tag{2}$$

Remark 5 All of the quantities defined in this section implicitly depend on the multiple testing procedure considered, $\mathcal{P} = (\mathcal{P}_\alpha)_{\alpha \in [0,1]}$. However, for simplicity, we will write $R_m$, $V_m$, $\mathrm{FDR}_m$, and $\Pi_m$ instead of $R_m^{\mathcal{P}_\alpha}$, $V_m^{\mathcal{P}_\alpha}$, $\mathrm{FDR}_m^{\mathcal{P}_\alpha}$, and $\Pi_m^{\mathcal{P}_\alpha}$ whenever this is not ambiguous.

Remark 6 (Power of thresholding-based procedures) By definition, the power of a thresholding-based procedure is a non-decreasing function of its threshold. Therefore, among thresholding-based procedures that yield FDR less than a prescribed level, maximizing power is equivalent to maximizing the threshold of the procedure.

2.3 The Benjamini-Hochberg procedure

Suppose we wish to control the FDR at level $\alpha$. Let $p_{(1)} \leq \dots \leq p_{(m)}$ be the ordered $p$-values, and denote by $H_{(i)}$ the null hypothesis corresponding to $p_{(i)}$.
Define $\hat I_m(\alpha)$ as the largest index $k \geq 0$ such that $p_{(k)} \leq \alpha k/m$ (with the convention $p_{(0)} = 0$). The Benjamini-Hochberg procedure at level $\alpha$ rejects all $H_{(i)}$ such that $i \leq \hat I_m(\alpha)$ (if $\hat I_m(\alpha) = 0$, then no rejection is made). This procedure was proposed by Benjamini and Hochberg (1995) in the context of FDR control; Seeger (1968) reported that it had previously been used by Eklund (1961–1963) in another multiple testing context. When all true null hypotheses are independent, the BH procedure at level $\alpha$ yields strong FDR control, that is, it entails $\mathrm{FDR} \leq \alpha$ regardless of the number of true null hypotheses (Benjamini and Hochberg, 1995). The BH procedure also controls the FDR when the $p$-values satisfy specific forms of positive dependence, see Benjamini and Yekutieli (2001).

Figure 1 illustrates the application of the BH procedure with $\alpha = 0.2$ to $m = 100$ simulated hypotheses, among which 20 are true alternatives. The left panel illustrates the above definition of the BH procedure. An equivalent definition is that the procedure rejects all hypotheses whose associated $p$-value is at most $\hat\tau_m(\alpha) = \alpha \hat I_m(\alpha)/m$. The right panel provides a dual representation of the same information, where the $x$ and $y$ axes have been swapped.

[Figure 1: Illustrations of the BH procedure on a simulated example with $m = 100$ (panel title: $\alpha = 0.2$; FDP = 3/17). Left: sorted $p$-values, $i/m \mapsto p_{(i)}$. Right: empirical distribution function, $t \mapsto \hat G_m(t)$. The legend distinguishes false positives from false negatives.]
It gives a geometrical interpretation of $\hat\tau_m(\alpha)$ as the largest crossing point between the line $y = x/\alpha$ and the empirical distribution function of the $p$-values, defined for $t \in [0, 1]$ by $\hat G_m(t) = \frac{1}{m}\sum_{i=1}^{m} \mathbf{1}_{P_i \leq t}$:

$$\hat\tau_m(\alpha) = \sup\{t \in [0, 1],\ \hat G_m(t) \geq t/\alpha\}. \tag{3}$$

2.4 Plug-in procedures

In our setting where all of the hypotheses tested are independent, the BH procedure at target level $\alpha$ (henceforth denoted by BH($\alpha$) for short) in fact yields FDR control at level $\pi_0 \alpha$ exactly (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001). This entails that the BH($\alpha'$) procedure yields $\mathrm{FDR} \leq \alpha$ if and only if $\alpha' \leq \alpha/\pi_0$. Therefore, as the threshold of the BH($\alpha$) procedure is a non-decreasing function of $\alpha$ and by Remark 6, the BH($\alpha/\pi_0$) procedure is optimal in our setting, in the sense that it yields maximum power among procedures of the form BH($\alpha'$) that control the FDR at level $\alpha$. As $\pi_0$ is unknown, this procedure cannot be implemented; it is generally referred to as the Oracle BH procedure.

Remark 7 If $\alpha \geq \pi_0$, then rejecting all null hypotheses is optimal, as it corresponds to the largest possible threshold while still maintaining $\mathrm{FDR} = \pi_0 \leq \alpha$. Therefore, we will assume that $\alpha < \pi_0$ throughout the paper.

In order to mimic the Oracle procedure, it is natural to apply the BH procedure at level $\alpha/\hat\pi_{0,m}$, where $\hat\pi_{0,m} \leq 1$ is an estimator of $\pi_0$ (Benjamini and Hochberg, 2000). Such plug-in procedures (also known as two-stage adaptive procedures) have the same geometric interpretation as the BH procedure (see Figure 1) in terms of the largest crossing point, with $\alpha/\hat\pi_{0,m}$ instead of $\alpha$. Their rejection threshold can be written as $\hat\tau^0_m(\alpha) = \hat\tau_m(\alpha/\hat\pi_{0,m})$, that is:

$$\hat\tau^0_m(\alpha) = \sup\{t \in [0, 1],\ \hat G_m(t) \geq \hat\pi_{0,m}\, t/\alpha\}. \tag{4}$$

Note that $\hat\tau^0_m$ depends on the observations through both $\hat G_m$ and $\hat\pi_{0,m}$.
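The step-up definition of the BH threshold and its plug-in variant can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the estimate `pi0_hat` is taken as a given input rather than computed by any particular estimator.

```python
def bh_num_rejections(pvalues, alpha):
    """Largest k >= 0 such that p_(k) <= alpha * k / m (0 if no such k >= 1)."""
    m = len(pvalues)
    k_hat = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * k / m:
            k_hat = k  # keep the largest index satisfying the inequality
    return k_hat


def bh_threshold(pvalues, alpha):
    """BH threshold alpha * I_hat_m(alpha) / m, capped at 1 as in the sup over [0, 1]."""
    m = len(pvalues)
    return min(1.0, alpha * bh_num_rejections(pvalues, alpha) / m)


def plugin_threshold(pvalues, alpha, pi0_hat):
    """Plug-in threshold: the BH threshold at level alpha / pi0_hat."""
    return bh_threshold(pvalues, alpha / pi0_hat)


# Step-up behaviour: p_(1) = 0.03 exceeds 0.06 * 1 / 3 = 0.02, but
# p_(2) = 0.035 <= 0.06 * 2 / 3 = 0.04, so two hypotheses are rejected.
example = [0.9, 0.03, 0.035]
num_rejected = bh_num_rejections(example, alpha=0.06)
```

Because the BH threshold is non-decreasing in its level, any `pi0_hat` not exceeding 1 gives a plug-in threshold at least as large as the BH threshold at level `alpha`.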
By construction, a plug-in procedure based on an estimator $\hat\pi_{0,m}$ that converges in probability to $\pi_{0,\infty} \in [\pi_0, 1)$ as $m \to +\infty$ is asymptotically more powerful than the original BH procedure.

The Storey-$\lambda$ estimator. Adapting a method originally proposed by Schweder and Spjøtvoll (1982), Storey (2002) defined $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda) = \#\{i : P_i \geq \lambda\}/(m(1 - \lambda))$ for $\lambda \in (0, 1)$. This estimator may also be written as a function of the empirical distribution of the $p$-values:

$$\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda) = \frac{1 - \hat G_m(\lambda)}{1 - \lambda}. \tag{5}$$

The rationale for $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$ is that under (Conc), larger $p$-values are more likely to correspond to true null hypotheses than smaller ones. Moreover, $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$ converges in probability to $(1 - G(\lambda))/(1 - \lambda)$, where the limit is greater than $\pi_0$ as $G$ stochastically dominates the uniform distribution. Several choices of $\lambda$ have been proposed, including $\lambda = 1/2$ (Storey and Tibshirani, 2003), a data-driven choice based on the bootstrap (Storey et al., 2004), and $\lambda = \alpha$ (Blanchard and Roquain, 2009). In our setting, a slightly modified version of the corresponding plug-in BH($\alpha/\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$) procedure, where $1/m$ is added to the numerator in (5), achieves strong FDR control at level $\alpha$ (Storey et al., 2004). We note that the Storey-$\lambda$ estimator $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$ can be viewed as a kernel estimator of the density $g$ at 1.

Definition 8 (Kernel of order $\ell$ and kernel estimator of a density at a point)
1. A kernel of order $\ell \in \mathbb{N}$ is a function $K : \mathbb{R} \to \mathbb{R}$ such that the functions $u \mapsto u^j K(u)$ are integrable for any $j = 0, \dots, \ell$, and satisfy $\int_{\mathbb{R}} K = 1$, and $\int_{\mathbb{R}} u^j K(u)\,du = 0$ for $j = 1, \dots, \ell$.
2. The kernel estimator of a density $g$ at $x_0$ based on $m$ independent, identically distributed observations $x_1, \dots, x_m$ from $g$ is defined by

$$\hat g_m(x_0) = \frac{1}{mh} \sum_{i=1}^{m} K\left(\frac{x_i - x_0}{h}\right),$$

where $h > 0$ is called the bandwidth of the estimator, and $K$ is a kernel.
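To make the kernel interpretation concrete, the sketch below evaluates the Storey-$\lambda$ estimator of Equation (5) and the generic kernel estimator of Definition 8 at $x_0 = 1$, using the rectangular kernel $\mathbf{1}_{[-1,0]}$ and bandwidth $h = 1 - \lambda$. Function names are hypothetical. The two values agree whenever no $p$-value is exactly equal to $\lambda$, which holds almost surely for continuous $p$-values.

```python
def storey_pi0(pvalues, lam):
    """Storey-lambda estimator (1 - G_hat_m(lam)) / (1 - lam), Equation (5)."""
    m = len(pvalues)
    ecdf_at_lam = sum(1 for p in pvalues if p <= lam) / m
    return (1.0 - ecdf_at_lam) / (1.0 - lam)


def kernel_density_at(x0, observations, h, kernel):
    """Kernel estimator (1 / (m h)) * sum_i K((x_i - x0) / h) of Definition 8."""
    m = len(observations)
    return sum(kernel((x - x0) / h) for x in observations) / (m * h)


def k_sto(u):
    """One-sided rectangular kernel: indicator of the interval [-1, 0]."""
    return 1.0 if -1.0 <= u <= 0.0 else 0.0


pvalues = [0.05, 0.2, 0.55, 0.7, 0.95]
lam = 0.5
via_ecdf = storey_pi0(pvalues, lam)
via_kernel = kernel_density_at(1.0, pvalues, 1.0 - lam, k_sto)
```

Here three of the five $p$-values fall in $[\lambda, 1]$, so both expressions equal $3/(5 \times 0.5) = 1.2$ up to floating-point rounding.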
By Definition 8, $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$ is a kernel estimator of the density $g$ at 1 with kernel $K^{\mathrm{Sto}}(t) = \mathbf{1}_{[-1,0]}(t)$ and bandwidth $h = 1 - \lambda$. $K^{\mathrm{Sto}}$ is an asymmetric, rectangular kernel of order 0.

2.5 Criticality and asymptotic properties of FDR controlling procedures

Upper bounds on the asymptotic number of rejections of FDR controlling procedures have been identified and characterized by Chi (2007a) and Chi and Tan (2008), who introduced the notion of critical value of a multiple testing problem and that of critical value of a multiple testing procedure. Both notions are defined formally below. They are tightly connected, with the important difference that the former only depends on the multiple testing problem, while the latter depends on both the multiple testing problem and a specific multiple testing procedure.

Definition 9 (Critical value of a multiple testing problem (Chi, 2007a)) The critical value of the multiple testing problem parametrized by $\pi_0$ and $G$ is defined by

$$\alpha^\star = \inf_{t \in (0,1]} \frac{\pi_0 t}{G(t)}. \tag{6}$$

Chi and Tan (2008, proof of Proposition 3.2) proved that for any multiple testing procedure, for $\alpha < \alpha^\star$, there exists a positive constant $c(\alpha)$ such that almost surely, for $m$ large enough, the events $\{V_m/R_m \leq \alpha\}$ and $\{R_m \geq c(\alpha) \log m\}$ are incompatible. This restriction is intrinsic to the multiple testing problem, in the sense that it holds regardless of the multiple testing procedure considered. Obviously, this is not a limitation when $\alpha^\star = 0$. We introduce the following condition:

$$\alpha^\star > 0. \tag{Critic}$$

Whether (Critic) is satisfied or not only depends on $G$. However, the value of $\alpha^\star$ as defined in (6) depends on both $\pi_0$ and $G$. Under (Conc) we have $\alpha^\star = \lim_{t \to 0} \pi_0 t/G(t) = \pi_0/(\pi_0 + (1 - \pi_0) g_1(0))$, where $g_1(0) \in [0, +\infty]$ is defined by $g_1(0) = \lim_{t \to 0} g_1(t)$.
By Proposition 1, $g_1(0)$ only depends on the behavior of the test statistics distribution. In particular, under (Conc), (Critic) is satisfied if and only if the likelihood ratio $f_1/f_0$ is bounded near $+\infty$.

We now introduce the notion of critical value of a multiple testing procedure. Chi (2007a) defined the critical value of the BH procedure as $\alpha^\star_{BH} = \inf_{t \in (0,1]} t/G(t)$. Let us denote by

$$\tau_\infty(\alpha) = \sup\{t \in [0, 1],\ G(t) \geq t/\alpha\} \tag{7}$$

the rightmost crossing point between $G$ and the line $y = x/\alpha$. Chi (2007a) has proved the following result:

Proposition 10 (Asymptotic properties of the BH procedure (Chi, 2007a)) For $\alpha \in [0, 1]$, let $\hat\tau_m(\alpha)$ be the threshold of the BH($\alpha$) procedure, and let $\tau_\infty(\alpha)$ be defined by (7). Let $\alpha^\star_{BH} = \inf_{t \in (0,1]} t/G(t)$. As $m \to +\infty$,
1. If $\alpha < \alpha^\star_{BH}$, then $\hat\tau_m(\alpha) \xrightarrow{\text{a.s.}} 0$;
2. If $\alpha > \alpha^\star_{BH}$, then $\hat\tau_m(\alpha) \xrightarrow{\text{a.s.}} \tau_\infty(\alpha)$, where the limit is positive.

A straightforward consequence of Proposition 10 is that the BH($\alpha$) procedure has asymptotically null power when $\alpha < \alpha^\star_{BH}$ and positive power when $\alpha > \alpha^\star_{BH}$. The following definition generalizes the notion of critical value to a generic multiple testing procedure:

Definition 11 (Critical value of a multiple testing procedure) Let $\mathcal{P} = (\mathcal{P}(\alpha))_{\alpha \in [0,1]}$ denote a multiple testing procedure. The critical value of $\mathcal{P}$ is defined by

$$\alpha^\star_{\mathcal{P}} = \sup\left\{\alpha \in [0, 1],\ \Pi_m^{\mathcal{P}(\alpha)} \xrightarrow[m \to +\infty]{\text{a.s.}} 0\right\}. \tag{8}$$

The critical value $\alpha^\star_{\mathcal{P}}$ depends on both the procedure $\mathcal{P}$ and the multiple testing setting. For the BH procedure, criticality ($\alpha < \alpha^\star_{BH}$) corresponds to situations where the target FDR level $\alpha$ is so small that there is no positive crossing point between $G$ and the line $y = x/\alpha$. Conversely, when $\alpha > \alpha^\star_{BH}$, there is a positive crossing point between $G$ and the line $y = x/\alpha$, as illustrated by Figure 1 (right).
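As a worked illustration of criticality, consider one-sided tests in a Laplace location model with shift $\theta > 0$ (an assumed example; the function names are hypothetical). There the likelihood ratio $f_1/f_0$ is non-decreasing and equals $e^{\theta}$ for $x > \theta$, so it is bounded near $+\infty$ and $g_1(0) = e^{\theta}$. Under (Conc), this suggests $\alpha^\star_{BH} = 1/(\pi_0 + (1 - \pi_0)e^{\theta})$, which the sketch below compares with a grid approximation of $\inf_t t/G(t)$.

```python
import math


def laplace_cdf(x):
    """Distribution function F_0 of the standard Laplace (double exponential) law."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)


def laplace_q0(t):
    """q_0(t) = F_0^{-1}(1 - t) for 0 < t < 1."""
    return -math.log(2.0 * t) if t <= 0.5 else math.log(2.0 * (1.0 - t))


def mixture_cdf(t, pi0, theta):
    """G(t) = pi0 * t + (1 - pi0) * G_1(t), with G_1(t) = 1 - F_0(q_0(t) - theta)."""
    if t >= 1.0:
        return 1.0
    g1_cdf = 1.0 - laplace_cdf(laplace_q0(t) - theta)
    return pi0 * t + (1.0 - pi0) * g1_cdf


def bh_critical_value_grid(pi0, theta, n_grid=10_000):
    """Grid approximation of alpha*_BH = inf over t in (0, 1] of t / G(t)."""
    return min(
        (i / n_grid) / mixture_cdf(i / n_grid, pi0, theta)
        for i in range(1, n_grid + 1)
    )


def bh_critical_value_closed_form(pi0, theta):
    """1 / (pi0 + (1 - pi0) * g_1(0)), with g_1(0) = exp(theta) in this model."""
    return 1.0 / (pi0 + (1.0 - pi0) * math.exp(theta))
```

In this example the critical value is positive, so (Critic) holds and the BH procedure has asymptotically null power for target levels below it. In a Gaussian location model, by contrast, the likelihood ratio is unbounded and $g_1(0) = +\infty$.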
The almost sure convergence results of Proposition 10 in the case $\alpha > \alpha^\star_{BH}$ were extended by Neuvial (2008), in the conditional setting. Specifically, the threshold $\hat\tau_m(\alpha)$ of the BH procedure was shown to converge in distribution to $\tau_\infty(\alpha)$ at rate $m^{-1/2}$ as soon as $\alpha > \alpha^\star_{BH}$. Neuvial (2008) also proved that similar central limit theorems hold for a class of thresholding-based FDR controlling procedures that covers some plug-in procedures, including the Storey-$\lambda$ procedure: the threshold of a procedure $\mathcal{P}$ of this class converges in distribution to a procedure-specific, positive value at rate $m^{-1/2}$ as soon as $\alpha > \alpha^\star_{\mathcal{P}}$.

Criticality of a multiple testing problem and criticality of a procedure. Whether (Critic) holds or not only depends on the behavior of the test statistics distribution. However, this condition is tightly connected to the critical value of FDR controlling procedures. In order to shed some light on this connection, we note that $\alpha^\star = \pi_0 \alpha^\star_{BH}$ may be interpreted as the critical value of the Oracle BH procedure BH($\alpha/\pi_0$). Therefore, as the Oracle BH procedure at level $\alpha$ is the most powerful procedure among thresholding-based procedures that control FDR at level $\alpha$, $\alpha^\star$ is a lower bound on the critical values of these procedures. Specifically, multiple testing problems for which (Critic) is satisfied or not differ in that:
• when (Critic) is satisfied, all thresholding-based procedures that control FDR have null asymptotic power in a range of levels containing $[0, \alpha^\star)$;
• when (Critic) is not satisfied, some procedures (including BH) have positive asymptotic power for any positive level $\alpha$.

Organization of the paper

This paper extends the asymptotic results of Chi (2007a) and Neuvial (2008) to the case of plug-in procedures of the form BH($\alpha/\hat\pi_{0,m}$), where $\hat\pi_{0,m}$ is a kernel estimator of the $p$-value density $g$ at 1.
Specifically, we consider a class of kernel estimators of $\pi_0$, which includes a modification of the Storey-$\lambda$ estimator, where the parameter $\lambda$ tends to 1 as $m \to \infty$. In Section 3, we prove that this class of estimators of $\pi_0$ achieves non-parametric convergence rates of the form $m^{-k/(2k+1)}/\eta_m$, where $\eta_m$ goes to 0 slowly enough as $m \to +\infty$, and $k$ controls the regularity of $g$ at 1. In Section 4, we characterize the critical value $\alpha^\star_0$ of plug-in procedures based on such estimators, and prove that when the target FDR level $\alpha$ is greater than $\alpha^\star_0$, the convergence rate of these plug-in procedures is $m^{-k/(2k+1)}/\eta_m$, which is slower than the parametric rate achieved by the BH procedure and by the plug-in procedures studied in Neuvial (2008). In Section 5, these results are applied to one- and two-sided tests in location and Student models. Practical consequences and possible extensions of this work are discussed in Section 6.

3. Asymptotic properties of non-parametric estimators of $\pi_0$

Let $\lambda \in (0, 1)$. The expectation $\pi_0(\lambda)$ of the Storey-$\lambda$ estimator is given by

$$\pi_0(\lambda) = \pi_0 + (1 - \pi_0)\,\frac{1 - G_1(\lambda)}{1 - \lambda}. \tag{9}$$

Moreover, as a regular function of the empirical distribution of the $p$-values, $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$ has the following asymptotic distribution for $\lambda \in (0, 1)$ (Genovese and Wasserman, 2004):

$$\sqrt{m}\left(\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda) - \pi_0(\lambda)\right) \rightsquigarrow \mathcal{N}\left(0, \frac{G(\lambda)(1 - G(\lambda))}{(1 - \lambda)^2}\right). \tag{10}$$

In our setting, $g_1$ is positive, as noted in Section 2.1. Therefore, we have $G_1(\lambda) < 1$ for any $\lambda \in (0, 1)$, and the bias $\pi_0(\lambda) - \pi_0$ is positive: the Storey-$\lambda$ estimator achieves a parametric convergence rate, but it is not a consistent estimator of $\pi_0$. Under (Conc), this bias decreases as $\lambda$ increases (by Equation (9)). In order to mimic the Oracle BH($\alpha/\pi_0$) procedure, it is therefore natural to choose $\lambda$ close to 1.
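The behavior of the bias in Equation (9) can be illustrated numerically. The sketch below assumes one-sided $p$-values in a Gaussian location model ($F_0 = N(0,1)$, $F_1 = N(\mu,1)$), for which $G_1(\lambda) = 1 - F_1(q_0(\lambda))$ by Proposition 1. The model and the function names are illustrative assumptions.

```python
from statistics import NormalDist


def storey_expectation(lam, pi0, mu):
    """pi_0(lam) = pi0 + (1 - pi0) * (1 - G_1(lam)) / (1 - lam), Equation (9).

    G_1 is the one-sided p-value distribution function under N(mu, 1)
    alternatives with an N(0, 1) null (illustrative model); 0 < lam < 1.
    """
    q = NormalDist().inv_cdf(1.0 - lam)           # q_0(lam)
    g1_cdf = 1.0 - NormalDist(mu, 1.0).cdf(q)     # G_1(lam)
    return pi0 + (1.0 - pi0) * (1.0 - g1_cdf) / (1.0 - lam)


def storey_bias(lam, pi0, mu):
    """Bias pi_0(lam) - pi0 of the Storey-lambda estimator."""
    return storey_expectation(lam, pi0, mu) - pi0


# Bias for pi0 = 0.8 and mu = 1 at increasing values of lambda.
biases = [storey_bias(lam, 0.8, 1.0) for lam in (0.5, 0.9, 0.99)]
```

In this example the bias stays positive for every fixed `lam` and shrinks as `lam` approaches 1, which is the motivation for letting $\lambda$ depend on $m$.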
We consider plug-in procedures where $\pi_0$ is estimated by $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$, with $h_m \to 0$ as $m \to +\infty$. As the limit in probability of this estimator is $g(1) = \pi_0 + (1 - \pi_0) g_1(1)$, it is consistent if and only if the following “purity” condition, which has been introduced by Genovese and Wasserman (2004), is met:

$$g_1(1) = 0. \tag{Pur}$$

We note that, for a fixed $\lambda$, the Storey-$\lambda$ estimator is not a consistent estimator of $\pi_0$ even when (Pur) is met. Moreover, (Pur) is entirely determined by the shape of the test statistics distribution under the alternative hypothesis. The asymptotic bias and variance of $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$ are characterized by Proposition 12:

Proposition 12 (Asymptotic bias and variance of $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$) Let $h_m$ be a positive sequence such that $h_m \to 0$.
1. If $m h_m \to +\infty$ as $m \to +\infty$, then

$$\sqrt{m h_m}\left(\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m) - E\left[\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)\right]\right) \rightsquigarrow \mathcal{N}(0, g(1)).$$

2. Assume that for $k \geq 1$, $g$ is $k$ times differentiable at 1, with $g^{(l)}(1) = 0$ for $1 \leq l < k$. Then, as $m \to +\infty$,

$$E\left[\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)\right] - g(1) = \frac{(-1)^k g^{(k)}(1)}{(k + 1)!}\, h_m^k + o\left(h_m^k\right).$$

Only the bias term in Proposition 12 depends on the regularity $k$ of the distribution near 1: the asymptotic bias is of order $h_m^k$, while the asymptotic variance of $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$ is of order $(m h_m)^{-1}$, regardless of the regularity of the distribution. The bandwidth $h_m$ in Proposition 12 realizes a trade-off between the asymptotic bias and variance of $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$. When the regularity of the distribution is known, a natural way to resolve this bias/variance trade-off is to calibrate $h_m$ such that the Mean Squared Error (MSE) of the corresponding estimator is asymptotically minimal.
This gives rise to an optimal choice of the bandwidth, which is characterized by the following proposition:

Proposition 13 (Asymptotic properties of $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$) Assume that $g$ is $k$ times differentiable at 1 for $k \geq 1$, with $g^{(l)}(1) = 0$ for $1 \leq l < k$.
1. If $g^{(k)}(1) \neq 0$, then the asymptotically optimal bandwidth for $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$ in terms of MSE is of order $m^{-1/(2k+1)}$, and the corresponding MSE is of order $m^{-2k/(2k+1)}$.
2. Let $\eta_m$ be any sequence such that $\eta_m \to 0$ and $m^{k/(2k+1)} \eta_m \to +\infty$ as $m \to +\infty$. Then, letting $h_m(k) = m^{-1/(2k+1)} \eta_m^2$, we have, as $m \to +\infty$:

$$m^{k/(2k+1)} \eta_m \left(\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m(k)) - g(1)\right) \rightsquigarrow \mathcal{N}(0, g(1)). \tag{11}$$

Proposition 13 is proved in Appendix B. The convergence rate in (11) is a typical convergence rate for non-parametric estimators of a density at a point. However, Proposition 13 cannot be derived from classical results on kernel estimators (e.g., Tsybakov (2009)), as such results typically require that the order of the kernel matches the regularity $k$ of the density, whereas the kernel of Storey’s estimator, $K^{\mathrm{Sto}}(t) = \mathbf{1}_{[-1,0]}(t)$, is of order 0. The results that can be obtained with kernels of order $k$ are summarized by Proposition 14; we refer to Tsybakov (2009) for a proof of this result.

Proposition 14 ($k$th order kernel estimator (Tsybakov, 2009)) Assume that for $k \geq 1$, $g$ is $k$ times differentiable at 1. Let $\hat g^k_m(1)$ be a kernel estimator of $g(1)$ with bandwidth $h_m$, associated with a $k$th order kernel.
1. The optimal bandwidth for $\hat g^k_m(1)$ in terms of MSE is of order $m^{-1/(2k+1)}$, and the corresponding MSE is of order $m^{-2k/(2k+1)}$;
2. Let $\eta_m$ be any sequence such that $\eta_m \to 0$ and $m^{k/(2k+1)} \eta_m \to +\infty$ as $m \to +\infty$. Then, letting $h_m(k) = m^{-1/(2k+1)} \eta_m^2$, we have, as $m \to +\infty$:

$$m^{k/(2k+1)} \eta_m \left(\hat g^k_m(1) - g(1)\right) \rightsquigarrow \mathcal{N}(0, g(1)).$$
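The bandwidth $h_m(k) = m^{-1/(2k+1)}\eta_m^2$ and the normalizing factor $m^{k/(2k+1)}\eta_m$ of Propositions 13 and 14 are linked by the identity $\sqrt{m\,h_m(k)} = m^{k/(2k+1)}\eta_m$, which ties the rate in (11) to the variance normalization of Proposition 12. The sketch below checks this numerically. The choice $\eta_m = (\ln m)^{-1/3}$ is only one admissible slowly vanishing sequence, and the function names are hypothetical.

```python
import math


def bandwidth(m, k, eta_m):
    """h_m(k) = m^{-1/(2k+1)} * eta_m^2."""
    return m ** (-1.0 / (2 * k + 1)) * eta_m ** 2


def normalizing_factor(m, k, eta_m):
    """m^{k/(2k+1)} * eta_m, the inverse of the convergence rate in (11)."""
    return m ** (k / (2 * k + 1)) * eta_m


m, k = 10 ** 6, 1
eta_m = math.log(m) ** (-1.0 / 3.0)   # illustrative slowly vanishing sequence
h = bandwidth(m, k, eta_m)
factor = normalizing_factor(m, k, eta_m)
```

With these values the factor is of order $m^{1/3}$ up to the logarithmic term, which is smaller than the parametric factor $\sqrt{m}$.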
Propositions 13 and 14 show that the convergence rate of kernel estimators of $g(1)$ with asymptotically optimal bandwidth directly depends on the regularity $k$ of $g$ at 1. The only difference between the two propositions is that the assumption that the first $k - 1$ derivatives of $g$ are null at 1, made for $\hat\pi^{\mathrm{Sto}}_{0,m}(1 - h_m)$, is not needed for $k$th order kernel estimators. Importantly, these convergence rates cannot be improved in our setting, in the sense that $m^{-k/(2k+1)}$ is the minimax rate for the estimation of a density at a point where its regularity is of order $k$ (Tsybakov, 2009, Chapter 2).

Connection to previously proposed estimators. To the best of our knowledge, the only non-parametric estimators of $\pi_0$ for which convergence rates have been established in our setting are those proposed by Storey (2002), Swanepoel (1999) and Hengartner and Stark (1995). We now briefly review asymptotic properties of these estimators in the context of multiple testing, as stated in Genovese and Wasserman (2004), and show that their convergence rates can essentially be recovered by Propositions 13 and 14.

Confidence envelopes for the density: Hengartner and Stark (1995) derived a finite sample confidence envelope for a monotone density. Assuming that $G$ is concave and that $g$ is Lipschitz in a neighborhood of 1, Genovese and Wasserman (2004) obtained an estimator which converges to $g(1)$ at rate $(\ln m)^{1/3} m^{-1/3}$. The same rate of convergence can be achieved by Proposition 13 or 14 (for $\eta_m = (\ln m)^{-1/3}$) if we assume that $g$ is differentiable at 1. This is a slightly stronger assumption than the ones made by Hengartner and Stark (1995), but it still corresponds to a regularity of order 1.
Spacings-based estimator: Swanepoel (1999) proposed a two-step estimator of the minimum of an unknown density based on the distribution of the spacings between observations: first, the location of the minimum is estimated, and then the density at this point is itself estimated. Assuming that at the value at which the density $g$ achieves its minimum, $g$ and $g^{(1)}$ are null, and $g^{(2)}$ is bounded away from 0 and $+\infty$ and Lipschitz, then for any $\delta > 0$, there exists an estimator converging at rate $(\ln m)^{\delta} m^{-2/5}$ to the true minimum. The same rate of convergence can be achieved by Proposition 13 or 14 (for $\eta_m = (\ln m)^{-\delta}$) if one assumes that $g$ is twice differentiable at 1 (and additionally that $g^{(1)}(1) = 0$ for Proposition 13). In our setting, the Lipschitz condition for the second derivative is unnecessary: the minimum of $g$ is necessarily achieved at 1 because $g$ is non-increasing (under (Conc)), so the first step of the estimation in Swanepoel (1999) may be omitted.

As both estimators are estimators of $g(1)$, the differences in their asymptotic properties are driven by the differences in the regularity assumptions made for $g$ (or $g_1$) near 1, rather than by their specific form.

4. Consistency, criticality and convergence rates of plug-in procedures

The aim of this section is to derive convergence rates for plug-in procedures based on the estimators $\hat\pi_{0,m}$ of $\pi_0$ studied in Section 3. Specifically, our goal is to establish central limit theorems for the threshold $\hat\tau^0_m(\alpha)$ of the plug-in procedure BH($\alpha/\hat\pi_{0,m}$) and the associated False Discovery Proportion, which we denote by $\mathrm{FDP}_m(\hat\tau^0_m(\alpha))$.
The convergence results obtained by Neuvial (2008) cover a broad class of FDR controlling procedures, including the BH procedure and plug-in procedures based on estimators of $\pi_0$ that depend on the observations only through the empirical distribution function $\hat G_m$ of the $p$-values (Storey, 2002; Storey et al., 2004; Benjamini et al., 2006). Although these results were obtained in the conditional setting of Benjamini and Hochberg (1995), extending them to the unconditional setting considered here is relatively straightforward, because the proof techniques developed in Neuvial (2008) can be adapted to this setting. For completeness, the asymptotic properties of the BH procedure and the plug-in procedure based on the Storey-$\lambda$ estimator are derived in Appendix C. The problem considered in this section is more challenging, as the kernel estimators introduced in Section 3 depend on $m$ not only through $\hat G_m$, but also through the bandwidth of the kernel (e.g. $h_m$ for $\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m)$).

Let $\hat\pi_{0,m}$ denote a generic estimator of $\pi_0$. We assume that $\hat\pi_{0,m}$ converges in probability to $\pi_{0,\infty} \leq 1$ as $m \to +\infty$. We do not assume that $\pi_{0,\infty} = \pi_0$. Therefore, $\hat\pi_{0,m}$ may or may not be a consistent estimator of $\pi_0$. We recall that the BH($\alpha/\hat\pi_{0,m}$) procedure rejects all hypotheses with $p$-values smaller than
$$\hat\tau^0_m(\alpha) = \sup\left\{t \in [0,1],\ \hat G_m(t) \geq \hat\pi_{0,m}\, t/\alpha\right\}.$$
We now study the behavior of the BH($\alpha/\hat\pi_{0,m}$) procedure when $\hat\pi_{0,m}$ converges at a rate $r_m$ slower than the parametric rate $m^{-1/2}$ (i.e., $m^{-1/2} = o(r_m)$). We define the asymptotic threshold $\tau^0_\infty(\alpha)$ corresponding to $\hat\tau^0_m(\alpha)$ as
$$\tau^0_\infty(\alpha) = \sup\left\{t \in [0,1],\ G(t) \geq \pi_{0,\infty}\, t/\alpha\right\}. \tag{12}$$
We have $\tau^0_\infty(\alpha) = \tau_\infty(\alpha/\pi_{0,\infty})$, that is, the asymptotic threshold of the BH procedure defined in Equation (7) at level $\alpha/\pi_{0,\infty}$.
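The threshold $\hat\tau^0_m(\alpha)$ defined above can be computed from the sorted $p$-values: since $\hat G_m$ is a step function, the supremum equals $\alpha \hat k/(m \hat\pi_{0,m})$ (capped at 1), where $\hat k$ is the largest rank $i$ such that $p_{(i)} \leq \alpha i/(m\hat\pi_{0,m})$. The sketch below is one way to write this; the function name `plugin_bh_threshold` is hypothetical, and `pi0_hat` stands for any estimate of $\pi_0$ supplied by the user.

```python
import numpy as np


def plugin_bh_threshold(pvalues, alpha, pi0_hat=1.0):
    """Threshold and number of rejections of the BH(alpha / pi0_hat) procedure.

    Returns sup{t in [0, 1] : G_m(t) >= pi0_hat * t / alpha} together with the
    number of p-values at or below it. pi0_hat = 1 gives the original BH procedure.
    """
    if pi0_hat <= 0:
        raise ValueError("pi0_hat must be positive")
    p = np.sort(np.asarray(pvalues, dtype=float))
    m = p.size
    ranks = np.arange(1, m + 1)
    # Step-up critical values alpha * i / (m * pi0_hat).
    crit = alpha * ranks / (m * pi0_hat)
    below = np.nonzero(p <= crit)[0]
    if below.size == 0:
        return 0.0, 0
    k_hat = int(below[-1]) + 1
    tau = min(1.0, alpha * k_hat / (m * pi0_hat))
    return float(tau), int(np.sum(p <= tau))
```

For instance, `plugin_bh_threshold(p, 0.1, pi0_hat=0.5)` never rejects fewer hypotheses than `plugin_bh_threshold(p, 0.1)`, since dividing the target level by an estimate smaller than 1 can only raise the threshold.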
Theorem 15 (Asymptotic properties of plug-in procedures) Let $\hat\pi_{0,m}$ be an estimator of $\pi_0$ such that $\hat\pi_{0,m} \to \pi_{0,\infty}$ in probability as $m \to +\infty$. Let $\alpha^\star_0 = \pi_{0,\infty}\,\alpha^\star_{BH}$. Then:
1. $\alpha^\star_0$ is the critical value of the BH($\alpha/\hat\pi_{0,m}$) procedure;
2. Further assume that the asymptotic distribution of $\hat\pi_{0,m}$ is given by
$$\sqrt{m h_m}\,(\hat\pi_{0,m} - \pi_{0,\infty}) \rightsquigarrow \mathcal{N}(0, s_0^2)$$
for some $s_0$, with $h_m = o(1/\ln\ln m)$ and $m h_m \to +\infty$ as $m \to +\infty$. Then, under (Conc), for any $\alpha > \alpha^\star_0$,
(a) The asymptotic distribution of the threshold $\hat\tau^0_m(\alpha)$ is given by
$$\sqrt{m h_m}\left(\hat\tau^0_m(\alpha) - \tau^0_\infty(\alpha)\right) \rightsquigarrow \mathcal{N}\left(0, \left(\frac{s_0\,\tau^0_\infty(\alpha)/\alpha}{\pi_{0,\infty}/\alpha - g(\tau^0_\infty(\alpha))}\right)^2\right)$$
(b) The asymptotic distribution of the FDP achieved by the BH($\alpha/\hat\pi_{0,m}$) procedure is given by
$$\sqrt{m h_m}\left(\mathrm{FDP}_m(\hat\tau^0_m(\alpha)) - \frac{\pi_0\alpha}{\pi_{0,\infty}}\right) \rightsquigarrow \mathcal{N}\left(0, \left(\frac{\pi_0\,\alpha\, s_0}{\pi_{0,\infty}^2}\right)^2\right).$$

Theorem 15 states that for $\alpha > \alpha^\star_0$, for any estimator $\hat\pi_{0,m}$ that converges in distribution at a rate $r_m$ slower than the parametric rate $m^{-1/2}$, the plug-in procedure BH($\alpha/\hat\pi_{0,m}$) converges at rate $r_m$ as well. This is a consequence of the fact that $r_m$ dominates the fluctuations of $\hat G_m$, which are of parametric order. We now state the main result of the paper (Corollary 16), that is, the asymptotic properties of plug-in procedures associated with the estimators of $\pi_0$ studied in Section 3, for which $s_0^2 = g(1)$. This result can be derived by combining the results of Theorem 15 with those of Propositions 13 and 14.

Corollary 16 Assume that (Conc) holds, and that $g$ is $k$ times differentiable at 1 for $k \geq 1$. Define $h_m(k) = m^{-1/(2k+1)}\eta_m^2$, where $\eta_m \to 0$ and $m^{k/(2k+1)}\eta_m \to +\infty$ as $m \to +\infty$.
Denote by $\hat\pi^k_{0,m}$ one of the following two estimators of $\pi_0$:
• Storey's estimator $\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m(k))$; in this case, it is further assumed that $g^{(l)}(1) = 0$ for $1 \leq l < k$;
• A kernel estimator of $g(1)$ associated with a $k$th order kernel with bandwidth $h_m(k)$.
Then
1. $\alpha^\star_0 = g(1)\,\alpha^\star_{BH}$ is the critical value of the BH($\alpha/\hat\pi^k_{0,m}$) procedure;
2. For any $\alpha > \alpha^\star_0$,
(a) The asymptotic distribution of the threshold $\hat\tau^0_m(\alpha)$ is given by
$$m^{k/(2k+1)}\eta_m\left(\hat\tau^0_m(\alpha) - \tau^0_\infty(\alpha)\right) \rightsquigarrow \mathcal{N}\left(0, \left(\frac{\tau^0_\infty(\alpha)/\alpha}{g(1)/\alpha - g(\tau^0_\infty(\alpha))}\right)^2 g(1)\right)$$
(b) The asymptotic distribution of the FDP achieved by the BH($\alpha/\hat\pi^k_{0,m}$) procedure is given by
$$m^{k/(2k+1)}\eta_m\left(\mathrm{FDP}_m(\hat\tau^0_m(\alpha)) - \frac{\pi_0\alpha}{g(1)}\right) \rightsquigarrow \mathcal{N}\left(0, \frac{\pi_0^2\alpha^2}{g(1)^3}\right).$$

We note that unlike the modification of the Storey-$\lambda$ estimator studied here, the estimators of $\pi_0$ based on kernels of order $k$ do not require the first $k-1$ derivatives of $g$ at 1 to be null. Therefore, the latter are generally preferable to the former. Corollary 16 has the following consequences, which are also summarized in Table 1:
• Assume that (Pur) is met. Then the asymptotic threshold of the BH($\alpha/\hat\pi_{0,m}$) procedure is $\tau_\infty(\alpha/\pi_0)$, that is, the asymptotic threshold of the Oracle procedure BH($\alpha/\pi_0$). In particular, the asymptotic FDP achieved by the estimators in Corollary 16 is then exactly $\alpha$ (and its asymptotic variance is $\alpha^2/\pi_0$), whereas the asymptotic FDP of the original BH procedure is $\pi_0\alpha$.
• We have:
$$\alpha^\star \leq \alpha^\star_0 \leq \alpha^\star_{\mathrm{Sto}(\lambda)} \leq \alpha^\star_{BH} \tag{13}$$
In models where (Critic) is not satisfied, all the critical values in (13) are null, implying that all the corresponding procedures have positive power for any target FDR level.
In models where (Critic) is satisfied, all the critical values in (13) are positive, and (13) implies that the range of target FDR values $\alpha$ that yield asymptotically positive power is larger for the plug-in procedures studied in this paper than for the BH procedure or the Storey-$\lambda$ procedure.
• We have $\tau^0_\infty(\alpha) \geq \tau^{0,\lambda}_\infty(\alpha) \geq \tau_\infty(\alpha)$, where $\tau^{0,\lambda}_\infty(\alpha)$ denotes the asymptotic threshold of the Storey-$\lambda$ procedure, which is formally defined and characterized in Appendix C. Therefore, as the power of a thresholding-based FDR controlling procedure is a non-decreasing function of its threshold (Remark 6), the asymptotic power of the BH($\alpha/\hat\pi_{0,m}$) procedure is greater than that of both the Storey-$\lambda$ and the original BH procedures, even in the range $\alpha > \alpha^\star_{BH}$ where all of them have positive asymptotic power.

| Name | $\hat\pi_{0,m}$ | FDR$/\alpha$ | Rate | (Asy. var. of FDP)/FDR |
|---|---|---|---|---|
| BH | $1$ | $\pi_0$ | $m^{-1/2}$ | $(\pi_0\tau_\infty(\alpha))^{-1} - 1$ |
| Oracle BH | $\pi_0$ | $1$ | $m^{-1/2}$ | $(\tau_\infty(\alpha/\pi_0))^{-1} - 1$ |
| Storey-$\lambda$ | $\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)$ | $\pi_0/\pi_0(\lambda)$ | $m^{-1/2}$ | $(\pi_0\tau^{0,\lambda}_\infty(\alpha))^{-1} + (1-G(\lambda))^{-1}$ |
| Kernel($h_m(k)$) | $\hat\pi^k_{0,m}$ | $\pi_0/g(1)$ | $m^{-k/(2k+1)}$ | $g(1)^{-1}$ |

Table 1: Summary of the asymptotic properties of the FDR controlling procedures considered in this paper, for a target FDR level $\alpha$ greater than the (procedure-specific) critical value. Note that "Storey-$\lambda$" denotes the original procedure with a fixed $\lambda$, while our extension with $\lambda = 1 - h_m(k)$ is categorized in the table as a particular case of kernel estimator (last row). For Storey-$\lambda$, we also assume that $\lambda > \tau^{0,\lambda}_\infty(\alpha)$.

These results characterize the increase in asymptotic power achieved by plug-in procedures based on kernel estimators of $\pi_0$. However, this increased asymptotic power comes at the price of a slower convergence rate.
Specifically, the convergence rate of the BH($\alpha/\hat\pi^k_{0,m}$) procedure is the non-parametric rate $m^{-k/(2k+1)}/\eta_m$ (where $k$ controls the regularity of $g$), while the parametric rate $m^{-1/2}$ is achieved by the original BH procedure, the Oracle BH procedure, and the Storey-$\lambda$ procedure (as proved in Appendix C).

5. Application to location and Student models

In Section 4 we proved that the asymptotic behavior of plug-in procedures depends on whether the target FDR level $\alpha$ is above or below the critical value $\alpha^\star_0$ characterized by Theorem 15, and we established convergence rates for these procedures when $\alpha > \alpha^\star_0$. Both the critical value $\alpha^\star_0$ and the obtained convergence rates depend on the test statistics distribution. In the present section, these results are applied to Gaussian and Laplace location models, and to the Student model. We begin by defining these models (Section 5.1) and studying criticality in each of them (Section 5.2). Then, we derive convergence rates for plug-in procedures based on the kernel estimators of $\pi_0$ considered in Sections 3 and 4, both for two-sided tests (Section 5.3) and one-sided tests (Section 5.4).

5.1 Models for the test statistics

Location models. In location models the distribution of the test statistic under $H_1$ is a shift from that of the test statistic under $H_0$: $F_1 = F_0(\cdot - \theta)$ for some location parameter $\theta > 0$. The most widely studied location models are the Gaussian and Laplace (double exponential) location models. Both the Gaussian and the Laplace distribution can be viewed as instances of a more general class of distributions introduced by Subbotin (1923) and given for $\gamma \geq 1$ by
$$f^\gamma_0(x) = \frac{1}{C_\gamma} e^{-|x|^\gamma/\gamma}, \quad \text{with } C_\gamma = \int_{-\infty}^{+\infty} e^{-|x|^\gamma/\gamma}\,dx = 2\,\Gamma(1/\gamma)\,\gamma^{1/\gamma - 1}. \tag{14}$$
Therefore, the likelihood ratio in the $\gamma$-Subbotin location model may be written as
$$\frac{f^\gamma_1}{f^\gamma_0}(x) = \exp\left(\frac{|x|^\gamma}{\gamma} - \frac{|x-\theta|^\gamma}{\gamma}\right).$$
(15)

The Gaussian case corresponds to $\gamma = 2$ and the Laplace case to $\gamma = 1$. In the Laplace case, the distribution of the $p$-values under the alternative can be derived explicitly, see Lemma 21 in Appendix A. We focus on $1 \leq \gamma \leq 2$ as this corresponds to situations in which (Conc) is fulfilled. Specifically, for one-sided tests, (Conc) holds as soon as $\gamma \geq 1$, because then $f^\gamma_1/f^\gamma_0$ is non-decreasing; for two-sided tests, if additionally $\gamma \leq 2$, then (Conc) holds (as proved in Appendix A, Proposition 22).

Student model. Student's $t$ distribution is widely used in applications, as it naturally arises when testing equality of means of Gaussian random variables with unknown variance. In the Student model with parameter $\nu > 0$, $F_0$ is the (central) $t$ distribution with $\nu$ degrees of freedom, and $F_1$ is the non-central $t$ distribution with $\nu$ degrees of freedom and non-centrality parameter $\theta > 0$. The Student model is not a location model, as $F_1$ cannot be written as a translation of $F_0$. Following Chi (2007a, Equation (3.5)), we note that the likelihood ratio of the Student model may be written as
$$\frac{f_1}{f_0}(t) = \sum_{j=0}^{+\infty} a_j(\nu,\theta)\,\psi^{(j,\nu)}(t), \tag{16}$$
where $\psi^{(j,\nu)}(t) = \left(t/\sqrt{t^2+\nu}\right)^j = \mathrm{sgn}(t)^j\left(1 + \nu/t^2\right)^{-j/2}$ for $t \in \mathbb{R}$ and
$$a_j(\nu,\theta) = e^{-\theta^2/2}\,\frac{\Gamma((\nu+j+1)/2)}{\Gamma((\nu+1)/2)}\,\frac{(\sqrt{2}\,\theta)^j}{j!}. \tag{17}$$

Remark 17 The sequence $a_j(\nu,\theta)$ is positive, and it is not hard to see, using Stirling's formula, that $\sum_j a_j(\nu,\theta)$ is a convergent series. Therefore, as $\psi^{(j,\nu)}(t) \in [-1,1]$, the dominated convergence theorem ensures that Equation (16) is well-defined for any $t \in \mathbb{R}$.

Another useful expression for the Student likelihood ratio may be derived from the integral expression of the density of a non-central $t$ distribution given by Johnson and Welch (1940):
$$\frac{f_1}{f_0}(t) = \exp\left[-\frac{\theta^2}{2}\,\frac{1}{1 + \frac{t^2}{\nu}}\right] \frac{Hh_\nu\left(-\frac{\theta t}{\sqrt{\nu + t^2}}\right)}{Hh_\nu(0)}, \tag{18}$$
where $Hh_\nu(z) = \int_0^{+\infty} \frac{u^\nu}{\nu!}\, e^{-\frac{1}{2}(u+z)^2}\,du$. As noted by Chi (2007a, Section 3.1), the likelihood ratio of Student test statistics is non-decreasing, which implies that (Conc) holds for one-sided tests. It also holds for two-sided tests, as proved in Appendix A, Proposition 25.

The location models and the Student model considered here are parametrized by two parameters: (i) a non-centrality parameter $\theta$, which encodes a notion of distance between $H_0$ and $H_1$; (ii) a parameter which controls the (common) tails of the distribution under $H_0$ and $H_1$: $\gamma$ for the $\gamma$-Subbotin model, and $\nu$ for the Student model with $\nu$ degrees of freedom.

5.2 Criticality

As the asymptotic behavior of plug-in procedures crucially depends on whether the target FDR level is above or below the critical value $\alpha^\star_0$ characterized by Theorem 15, it is of primary importance to study criticality in the models we are interested in. Noting that $\alpha^\star_0 = \pi_{0,\infty}\,\alpha^\star_{BH} = \pi_{0,\infty}\,\alpha^\star/\pi_0$, we have $\alpha^\star_0 > 0$ if and only if (Critic) is satisfied, that is, if and only if the likelihood ratio $f_1/f_0$ is bounded near $+\infty$. In this section, we study (Critic) in location and Student models.

Location models. In location models, where $f_1 = f_0(\cdot - \theta)$ with $\theta > 0$, the behavior of the likelihood ratio is closely related to the tail behavior of the distribution of the test statistics: for a given non-centrality parameter $\theta$, the heavier the tails, the smaller the difference between $f_1$ and $f_0$. In a $\gamma$-Subbotin location model, Equation (21) yields $|1 - \theta/x|^\gamma \sim 1 - \gamma\theta/x$ as $x \to +\infty$. Thus $|x|^\gamma\left(1 - |1 - \theta/x|^\gamma\right) \sim \gamma\theta x^{\gamma-1}$, and the behavior of the likelihood ratio $f^\gamma_1/f^\gamma_0$ is driven by the value of $\gamma$, as illustrated by Figure 2 for the Gaussian and Laplace location models with location parameter $\theta \in \{1, 2\}$. If $\gamma > 1$, then $\lim_{+\infty} f^\gamma_1/f^\gamma_0 = +\infty$.
Therefore, the slope of the cumulative distribution function of the $p$-values is infinite at 0, and (Critic) is not satisfied for the Subbotin model with $\gamma > 1$: $\alpha^\star = 0$ for any $\theta$ and $\pi_0$. This situation is illustrated by Figure 2 (left panels) for the Gaussian model ($\gamma = 2$). In such a situation, for any target FDR level $\alpha$, the asymptotic fraction of rejections by the BH($\alpha$) procedure or by a plug-in procedure of the form BH($\alpha/\hat\pi_{0,m}$), where $\hat\pi_{0,m} \to \pi_{0,\infty}$ in probability as $m \to +\infty$, is positive by Lemma 26. If $\gamma = 1$ (Laplace model, as illustrated by Figure 2, right panels), then the likelihood ratio of the model is $f^\gamma_1/f^\gamma_0(x) = \exp(|x| - |x - \theta|)$. It is bounded as $x \to +\infty$, with $\lim_{x\to+\infty} f^\gamma_1/f^\gamma_0(x) = e^\theta$. Therefore, (Critic) is satisfied for the Laplace location model. Specifically, we have $\alpha^\star = \pi_0/(\pi_0 + (1-\pi_0)\,g_1(0))$, with $g_1(0) = e^\theta$ for one-sided $p$-values, and $g_1(0) = \cosh\theta$ for two-sided $p$-values. Laplace-distributed test statistics appear as a limit situation in terms of criticality: within the family of $\gamma$-Subbotin location models with $\gamma \in [1,2]$, the Laplace model ($\gamma = 1$) is the only one for which (Critic) is satisfied.

Student model. For the Student model, Equation (18) yields that $(f_1/f_0)(t)$ converges to $s_\nu(\theta)$ as $t \to +\infty$ and $s_\nu(-\theta)$ as $t \to -\infty$, where $s_\nu(\theta) = Hh_\nu(-\theta)/Hh_\nu(0)$ is positive for any $\theta$. Therefore, (Critic) is satisfied for one-sided and two-sided tests in the Student model (this had already been noted by Chi (2007a) for one-sided tests). Figure 3 gives the distribution function of one- and two-sided $p$-values in the Student model with parameters $\theta \in \{1, 2\}$ and $\nu \in \{10, 50\}$, for $\pi_0 \in \{0, 0.5, 0.75\}$. Although criticality is much less obvious than for the Laplace model, the inserted plots, which zoom into a region where the $p$-values are very small ($p < 2 \cdot 10^{-4}$), do suggest for $\nu = 10$ that the distribution function is linear near 0 for the Student model, that is, that its slope at 0 is finite. As an illustration, we calculated that the critical values for one-sided tests in the Student model for $\pi_0 = 0.75$ and $\theta \in \{1, 2\}$ are respectively 0.173 and 0.015 for $\nu = 10$, and $4 \cdot 10^{-3}$ and $7 \cdot 10^{-6}$ for $\nu = 50$.

[Figure 2 panels: Gaussian distribution ($\theta = 1$); Laplace distribution ($\theta = 1$); Gaussian distribution ($\theta = 2$); Laplace distribution ($\theta = 2$).]
Figure 2: Distribution functions $G$ for one-sided (solid) and two-sided (dashed) $p$-values, in Gaussian location models (left: (Critic) is not satisfied), and Laplace location models (right: (Critic) is satisfied) for $\pi_0$ = 0, 0.5 and 0.75. The location parameter $\theta$ is set to 1 in top panels and 2 in bottom panels. Inserted plot: zoom in the region $p < 2 \cdot 10^{-4}$.

5.3 Consistency and convergence rates for two-sided tests

Consistency. Let us first recall that by Proposition 1.2, we have for two-sided tests under a model satisfying (Sym):
$$g_1(t) = \frac{1}{2}\left(\frac{f_1}{f_0}\left(q_0(t/2)\right) + \frac{f_1}{f_0}\left(-q_0(t/2)\right)\right), \tag{19}$$
where $q_0 : t \mapsto F_0^{-1}(1-t)$ tends to 0 as $t \to 1/2$. A straightforward consequence of (19) is that $g_1(1) = (f_1/f_0)(0)$. As $f_1 > 0$, we have $g(1) = \pi_0 + (1-\pi_0)\,g_1(1) > \pi_0$. Therefore, (Pur) is not met, and the kernel estimators of $\pi_0$ studied in Section 3 are not consistent for the estimation of $\pi_0$. Specifically, we have $g_1(1) = e^{-\theta^2/2}$ for Gaussian and Student test statistics, and $g_1(1) = e^{-\theta}$ for Laplace test statistics.

Convergence rates. Another consequence of (19) is that if for $k \geq 1$ the likelihood ratio $f_1/f_0$ is $k$ times semi-differentiable at 0, then $g$ is $k$ times (left-)differentiable at 1.
In particular, this holds for any $k$ in the $\gamma$-Subbotin location model with $\gamma \in [1,2]$, which covers the Gaussian and Laplace cases. It also holds for the Student model (as proved in Proposition 24). For these models, Corollary 16 entails that for any $k > 0$, if $\hat\pi_{0,m}$ is a kernel estimator of $g(1)$ associated with a $k$th order kernel with bandwidth $h_m(k) = m^{-1/(2k+1)}\eta_m^2$ (where $\eta_m \to 0$ and $m\eta_m \to +\infty$ as $m \to +\infty$), then the corresponding plug-in procedure BH($\alpha/\hat\pi_{0,m}$) converges in distribution at rate $m^{-k/(2k+1)}/\eta_m$ for any $\alpha$ greater than $\alpha^\star_0 = g(1)\,\alpha^\star_{BH}$. These results are summarized in the last column of Table 2.

[Figure 3 panels: Student ($\nu = 50$, $\theta = 1$); Student ($\nu = 10$, $\theta = 1$); Student ($\nu = 50$, $\theta = 2$); Student ($\nu = 10$, $\theta = 2$).]
Figure 3: Distribution functions $G$ for one-sided tests (solid) and two-sided tests (dashed) in Student models with $\nu = 50$ degrees of freedom (left) and $\nu = 10$ (right). The location parameter $\theta$ was set to 1 in top panels and 2 in bottom panels. Any Student model satisfies (Critic). Inserted plots: zoom in the region $p < 2 \cdot 10^{-4}$.

Let us now consider the modification of the Storey-$\lambda$ estimator introduced in Section 3: $\hat\pi_{0,m} = \hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m)$, with $h_m \to 0$ as $m \to +\infty$. By Corollary 16, the optimal convergence rate of the BH($\alpha/\hat\pi_{0,m}$) procedure is then determined by the order of the first non-null derivative of $g$ at 1. In order to calculate this order, we use the following lemma:

Lemma 18 (Behavior of $g_1$ at 1 for two-sided $p$-values in symmetric models) Under (Sym), the density function $g_1$ of two-sided $p$-values under the alternative hypothesis satisfies:
1. If $f_1/f_0$ is semi-differentiable at 0, with left-derivative $\ell_-$ and right-derivative $\ell_+$, then $g_1$ is (left-)differentiable at 1 and we have:
$$g_1^{(1)}(1) = -\frac{\ell_+ - \ell_-}{4 f_0(0)}.$$
In particular, $g_1^{(1)}(1) = 0$ if and only if $f_1/f_0$ is differentiable at 0.
2. If $f_1/f_0$ is twice differentiable at 0, then $g_1$ is twice differentiable at 1 and we have:
$$g_1^{(2)}(1) = \frac{1}{4 f_0(0)^2}\left(\frac{f_1}{f_0}\right)^{(2)}(0).$$

Lemma 18 may be applied to two-sided tests for $\gamma$-Subbotin location models, and for the Student model. For the two-sided Gaussian model, $f_1/f_0$ is $C^\infty$ near 0 and $(f_1/f_0)^{(2)}(0) \neq 0$. The same holds for the two-sided Student model, as shown in Appendix A.2 (Proposition 24). For both models, Lemma 18 entails that $g^{(1)}(1) = 0$ and $g^{(2)}(1) > 0$. For two-sided Laplace test statistics, the likelihood ratio $f_1/f_0 : t \mapsto \exp(|t| - |t - \theta|)$ has a singularity at $t = 0$ but it is semi-differentiable at 0 (and differentiable on $(-\infty, \theta) \setminus \{0\}$): it is constant, equal to $e^{-\theta}$, for $t < 0$ and equal to $e^{2t-\theta}$ for $0 < t < \theta$, so its left and right derivatives at 0 are $\ell_- = 0$ and $\ell_+ = 2e^{-\theta}$. As $f_0(0) = 1/2$ in the Laplace model, Lemma 18 yields that $g^{(1)}(1) = (1-\pi_0)\,g_1^{(1)}(1) = -(1-\pi_0)\,e^{-\theta}$, which is not null. In particular, letting $k = 1$ for the Laplace model and $k = 2$ for the Gaussian and Student models, Corollary 16 yields that if $\hat\pi_{0,m} = \hat\pi^{\mathrm{Sto}}_{0,m}(1 - m^{-1/(2k+1)}\eta_m^2)$, where $\eta_m \to 0$, then for any $\alpha > \alpha^\star_0 = g(1)\,\alpha^\star_{BH}$, the FDP of the BH($\alpha/\hat\pi_{0,m}$) procedure converges in distribution at rate $m^{-k/(2k+1)}/\eta_m$ toward $\pi_0\alpha/g(1)$, where $g(1) = \pi_0 + (1-\pi_0)\,e^{-\theta^2/2}$ in the Gaussian and Student models, and $g(1) = \pi_0 + (1-\pi_0)\,e^{-\theta}$ in the Laplace model. These rates are slower than those obtained at the beginning of this section for $k$th order kernels because the latter do not require the derivatives of $g$ of order $l < k$ to be null at 1, which implied that any $k > 0$ could be chosen (see Table 2 for a comparison).

5.4 Consistency and convergence rates for one-sided tests

Consistency. For one-sided tests, we have $g_1(t) = (f_1/f_0)(q_0(t))$. As $\lim_{t\to 1} q_0(t) = -\infty$, (Pur) is met if and only if the likelihood ratio $(f_1/f_0)(t)$ tends to 0 as $t \to -\infty$.
For the Student model, $f_1/f_0$ tends to $s_\nu(-\theta) > 0$ as $t \to -\infty$. This implies that (Pur) is not satisfied in that model: $\pi_0$ cannot be consistently estimated using a consistent estimator of $g(1)$, because $g(1) = \pi_0 + (1-\pi_0)\,s_\nu(-\theta) > \pi_0$. For location models, we begin by establishing a connection between purity and criticality (Proposition 20), which is a consequence of the following symmetry property:

Lemma 19 (Likelihood ratios in symmetric location models) Consider a location model in which the test statistics have densities $f_0$ under $H_0$, and $f_1 = f_0(\cdot - \theta)$ under $H_1$ for some $\theta \neq 0$. Under (Sym), we have
$$\lim_{-\infty} \frac{f_0}{f_1} = \lim_{+\infty} \frac{f_1}{f_0}.$$

For one-sided tests in symmetric location models, Lemma 19 implies the following result:

Proposition 20 (Purity and criticality for one-sided tests in symmetric location models) Let $g_1$ be the density of one-sided $p$-values under the alternative hypothesis, and $\alpha^\star$ the critical value of the multiple testing problem. Under (Sym) and (Conc),
1. (Critic) and (Pur) are complementary events, in the sense that $\alpha^\star = 0$ if and only if $g_1(1) = 0$;
2. If $\lim_{+\infty} f_1/f_0$ is finite, then $\alpha^\star = \pi_0/(\pi_0 + (1-\pi_0)\,g_1(0))$ and $g(1) = \pi_0 + (1-\pi_0)\,g_1(1)$ are connected by $g_1(0)\,g_1(1) = 1$.

Proposition 20 implies that contrary to two-sided location models, in which we always have $g_1(1) > 0$, consistency may be achieved in one-sided location models using kernel estimators such as those considered here, depending on model parameters. In particular, there is no criticality in the one-sided Gaussian model, implying that (Pur) is satisfied in that model: we have $g(1) = \pi_0$, and $\pi_0$ can be consistently estimated using the kernel estimators of $g(1)$ introduced in Section 3.
In the one-sided Laplace model, (Critic) is satisfied, implying that (Pur) is not satisfied in that model: $\pi_0$ cannot be consistently estimated using these kernel estimators of $g(1)$.

Convergence rates.

Student. Proposition 24 entails that for the one-sided Student model, $g_1$ is $C^\infty$, and all its derivatives of order greater than 1 are null at 1. Therefore, for any $k > 0$, if $\hat\pi^k_{0,m}$ denotes any of the two estimators studied in Corollary 16 for a $k$th order kernel with bandwidth $h_m(k) = m^{-1/(2k+1)}\eta_m^2$ (where $\eta_m \to 0$ and $m\eta_m \to +\infty$ as $m \to +\infty$), then the corresponding plug-in procedure BH($\alpha/\hat\pi^k_{0,m}$) converges in distribution at rate $m^{-k/(2k+1)}/\eta_m$ for any $\alpha$ greater than $\alpha^\star_0 = g(1)\,\alpha^\star_{BH}$. These results are summarized in the first row of Table 2.

Laplace. The distribution of one-sided $p$-values in the one-sided Laplace model satisfies $G_1(t) = 1 - (1-t)\,e^{-\theta}$ for $t \geq 1/2$, see Lemma 21 in Appendix A. Therefore, for $t \geq 1/2$, $(1 - G(t))/(1-t)$ is constant, equal to $g(1) = \pi_0 + (1-\pi_0)\,e^{-\theta}$, as illustrated by the solid curves in the right panels of Figure 2. As a consequence, for any fixed $\lambda \geq 1/2$, the Storey-$\lambda$ estimator is an unbiased estimator of $g(1)$, which converges to $g(1)$ at rate $m^{-1/2}$. The same property holds for any kernel estimator of $g(1)$ with a fixed bandwidth. These results are summarized in the third row of Table 2.

Gauss. In the Gaussian model however, the regularity of $g_1$ near 1 is poor: we have
$$g_1(t) = \exp\left(-\frac{\theta^2}{2} - \theta\,\Phi^{-1}(t)\right),$$
where $\Phi\,(= F_0)$ denotes the standard Gaussian distribution function. As $h \to 0$, $\Phi^{-1}(1-h) \leq \sqrt{2\ln(1/h)}$, implying that
$$g_1(1-h) \geq \exp\left(-\frac{\theta^2}{2} - \theta\sqrt{2\ln(1/h)}\right).$$
Therefore, $g_1$ is not differentiable at 1, and the convergence rates of the kernel estimators of $\pi_0$ studied in Section 3 are slower than $m^{-1/3}$ in our setting. These results are summarized in the second row of Table 2.
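The contrast between one- and two-sided Gaussian $p$-values near 1 can be checked numerically. The sketch below evaluates the one-sided density $g_1(1-h)$ through the identity $\Phi^{-1}(1-h) = -\Phi^{-1}(h)$, and the two-sided density through Equation (19) with the Gaussian likelihood ratio $\exp(\theta x - \theta^2/2)$. The function names are hypothetical and the values of $h$ are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

_PHI_INV = NormalDist().inv_cdf


def g1_one_sided_near_one(h, theta):
    """One-sided density g1(1 - h) = exp(-theta^2/2 - theta * Phi^{-1}(1 - h)), 0 < h < 1.

    Uses Phi^{-1}(1 - h) = -Phi^{-1}(h) to stay accurate for tiny h.
    """
    return math.exp(-0.5 * theta ** 2 + theta * _PHI_INV(h))


def g1_two_sided(t, theta):
    """Two-sided density from Equation (19) in the Gaussian model, for 0 < t <= 1."""
    q = _PHI_INV(1.0 - 0.5 * t)
    return math.exp(-0.5 * theta ** 2) * math.cosh(theta * q)


def lower_bound(h, theta):
    """Lower bound exp(-theta^2/2 - theta * sqrt(2 ln(1/h))) on g1(1 - h)."""
    return math.exp(-0.5 * theta ** 2 - theta * math.sqrt(2.0 * math.log(1.0 / h)))


if __name__ == "__main__":
    theta = 1.0
    for h in (1e-2, 1e-4, 1e-6):
        # One-sided: g1(1) = 0, so the difference quotient is g1(1 - h) / h.
        one = g1_one_sided_near_one(h, theta) / h
        # Two-sided: difference quotient of g1 at 1.
        two = (g1_two_sided(1.0 - h, theta) - g1_two_sided(1.0, theta)) / h
        print(f"h={h:.0e}  one-sided quotient={one:.3g}  two-sided quotient={two:.3g}")
```

The one-sided quotient grows without bound as $h$ decreases, which reflects the lack of differentiability at 1, whereas the two-sided quotient shrinks toward 0, in line with $g^{(1)}(1) = 0$ in the two-sided Gaussian model.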
The difference between one- and two-sided tests in the Gaussian location model is illustrated by Figure 4 for $\theta = 1$, that is, when testing $\mathcal{N}(0,1)$ against $\mathcal{N}(1,1)$. The density of two-sided $p$-values has a positive limit at 1, and its derivative at 1 is 0, making it possible to estimate $g(1) = \pi_0 + (1-\pi_0)\,e^{-\theta^2/2}$ at rate $m^{-2/5}$, by Corollary 16. Conversely, the density of one-sided $p$-values tends to 0 at 1, but is not differentiable: the true $\pi_0$ can be estimated consistently, but the convergence rate is slower.

| Model | $\lim_0 1/g_1$ | $g_1(1)$ | Convergence rate of $\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m(k))$ | Convergence rate of $\hat g^k_m(1)$ |
|---|---|---|---|---|
| One-sided Student | $s_\nu(\theta)$ | $s_\nu(-\theta)$ | $m^{-k/(2k+1)}/\eta_m$ (any $k$) | $m^{-k/(2k+1)}/\eta_m$ (any $k$) |
| One-sided Gaussian | $0$ | $0$ | slower than $m^{-1/3}$ | slower than $m^{-1/3}$ |
| One-sided Laplace | $e^{-\theta}$ | $e^{-\theta}$ | $m^{-1/2}$ | $m^{-1/2}$ |
| Two-sided Student | $(s_\nu(\theta) + s_\nu(-\theta))/2$ | $e^{-\theta^2/2}$ | $m^{-2/5}/\eta_m$ | $m^{-k/(2k+1)}/\eta_m$ (any $k$) |
| Two-sided Gaussian | $0$ | $e^{-\theta^2/2}$ | $m^{-2/5}/\eta_m$ | $m^{-k/(2k+1)}/\eta_m$ (any $k$) |
| Two-sided Laplace | $\cosh\theta$ | $e^{-\theta}$ | $m^{-1/3}/\eta_m$ | $m^{-k/(2k+1)}/\eta_m$ (any $k$) |

Table 2: Properties of one- and two-sided test statistics distributions in Student, Gaussian, and Laplace models, and convergence rates of the kernel estimators studied. When the rate depends on $k$, the value of $k$ may be chosen arbitrarily large. $\eta_m$ is a sequence such that $\eta_m \to 0$ and $m\eta_m \to +\infty$ as $m \to +\infty$.

6. Concluding remarks

This paper studies asymptotic properties of a family of plug-in procedures based on the BH procedure. When compared to the BH procedure or to the Storey-$\lambda$ procedure, the results for general models obtained in Section 4 show that incorporating the proposed estimators of $\pi_0$ into the BH procedure asymptotically yields (i) tighter FDR control (or, equivalently, greater power) and (ii) smaller critical values, thereby increasing the range of situations in which the resulting procedure has positive asymptotic power.
These improvements come at the price of a reduction in the convergence rate from the parametric rate $m^{-1/2}$ to a non-parametric rate $m^{-k/(2k+1)}$, where $k$ is connected to the order of differentiability of the test statistics distribution. As the results obtained for the proposed modification of the Storey-$\lambda$ estimator $\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m)$ require stronger conditions (null derivatives of $g_1$) than for kernel estimators with a kernel of order $k$, we conclude that it is generally better to use the latter class of estimators.

Our application of these results to specific models for the test statistics sheds some light on the influence of the test statistics distribution on convergence rates of plug-in procedures:
• When the test statistics distribution is $C^\infty$ (e.g. for two-sided Gaussian test statistics, and for Laplace and Student test statistics), the obtained convergence rates are slower than the parametric rate, but may be arbitrarily close to it by choosing a kernel of sufficiently high order. The resulting estimators are not consistent estimators of $\pi_0$, although the bias decreases as the non-centrality parameter $\theta$ increases.
• When the regularity of the test statistics distribution is poor (such as in the one-sided Gaussian model), the convergence rate of the FDP achieved by the plug-in procedures studied in this paper is slower. The plug-in procedures studied are still asymptotically more powerful than the BH procedure or the Storey-$\lambda$ procedure, but the FDP actually achieved by these procedures may be far from the target FDR level.

Obtaining more precise conclusions in the context of a specific data set or application exceeds the scope of the present paper, as it would require extending the obtained results to more realistic settings such as the ones that are now described.
[Figure 4 plot: title "N(0,1) vs N(1,1)"; curves: one-sided, two-sided.]
Figure 4: Density of one- and two-sided $p$-values under the alternative hypothesis for the location model $\mathcal{N}(0,1)$ versus $\mathcal{N}(1,1)$. Inserted plot: zoom in the region $[0.9, 1]$, which is highlighted by a black box in the main plot.

6.1 Extensions of the multiple testing setting considered

An interesting research direction would be to extend the multiple testing setting considered here to more realistic assumptions. A typical example of application is the case of differential expression analyses in genomics, which aim at identifying those genes whose expression level differs between two known populations of samples.

First, we have assumed that all null hypotheses are independent, and that all true alternative hypotheses follow the same distribution. The independence assumption is not realistic, as genes are known to interact with each other, in particular through transcriptional regulation networks. Moreover, the level of differential expression need not be the same for all genes under $H_1$. For the results on criticality that have been used in this paper, the proof given in Chi (2007a) essentially relies on the assumption that the $p$-values are independently and identically distributed. Therefore, it seems that these results could be extended to composite distributions under $H_1$, provided that the corresponding marginal distributions are still independently and identically distributed. Extending these results to settings where the independence assumption is relaxed seems a more challenging question. As for the convergence results established in Section 4, their proofs rely on the formalism laid down by Neuvial (2008).
Therefore, these results could be extended to other dependency assumptions, or to composite distributions under $H_1$, provided that the convergence in distribution of the empirical distribution functions $(\widehat G_{0,m}, \widehat G_{1,m})$ holds under these assumptions. In that spirit, the results of Neuvial (2008) have recently been extended to an equi-correlated Gaussian model (Delattre and Roquain, 2011) and to a more general Gaussian model where the covariance matrix is supposed to be close enough to the identity as the number of tests grows to infinity (Delattre and Roquain, 2013).

Second, we have shown that the asymptotic properties of FDR controlling procedures are driven by the shape and regularity of the test statistics distribution. In practice, the test statistics distribution depends on the size of the sample used to generate them. In differential expression analyses, a natural test statistic is Student's $t$, whose distribution depends on sample size through both the number of degrees of freedom $\nu$ and a non-centrality parameter $\theta$. In the spirit of the results of Chi (2007b) on the influence of sample size on criticality, it would be interesting to study the convergence rates of plug-in procedures when both the sample size and the number of hypotheses tested grow to infinity.

6.2 Alternative strategies to estimate $\pi_0$

The estimators of $\pi_0$ considered in this paper are kernel estimators of the density $g$ at 1. Therefore, they achieve non-parametric convergence rates of the form $m^{-k/(2k+1)}/\eta_m$, where $k$ controls the regularity of $g$ near 1 and $\eta_m \to 0$ slowly enough. An interesting open question is whether these non-parametric rates may be improved. Other strategies for estimating $\pi_0$ may be considered to achieve faster convergence rates, including the following two:

• One-stage adaptive procedures as proposed by Blanchard and Roquain (2009) and Finner et al.
(2009) allow more powerful FDR control than the standard BH procedure without explicitly incorporating an estimate of $\pi_0$: they are not plug-in procedures.

• Jin (2008) proposed an estimator of $\pi_0$ based on the Fourier transform of the empirical characteristic function of the $Z$-scores associated to the $p$-values. This estimator does not focus on the behavior of the density near 1, and might not suffer from the same limitations as the estimators studied here. This estimator was shown to be consistent for the estimation of $\pi_0$ when the $Z$-scores follow a Gaussian location mixture, but no convergence rates were established.

In a general semi-parametric framework where $g_1$ is not necessarily decreasing, and its regularity is not specified, Nguyen and Matias (2012) have recently proved that if the Lebesgue measure of the set on which $g_1$ achieves its minimum is null, then no consistent estimator of $\min_t g(t)$ with a finite asymptotic variance can reach the parametric convergence rate $m^{-1/2}$. In our setting where $g_1$ is decreasing, the measure of the set on which $g_1$ is minimum is indeed null, except if $g_1$ is constant on an interval of the form $[t_0, 1]$. For one-sided tests where $g_1(t) = (f_1/f_0)(F_0^{-1}(1-t))$, this extreme situation arises if and only if the likelihood ratio is constant on an interval of the form $[x_0, +\infty)$. Among all models studied in Section 5, the only case in which this occurs is the one-sided Laplace model, where $(f_1/f_0)(x) = \exp(|x| - |x - \theta|) = e^{\theta}$ for $x \ge \theta > 0$. The kernel estimators that we have studied here do reach the rate $m^{-1/2}$ in this case. In the more common situation in which the measure of the set on which $g_1$ vanishes (or achieves its minimum) is null, the above negative result of Nguyen and Matias (2012) suggests that there is little room for improving on the non-parametric convergence rates obtained in Propositions 13 and 14.
We conjecture that it is not possible for consistent estimators of $g(1)$ to reach a parametric convergence rate in this setting.

Acknowledgments

The author would like to thank Stéphane Boucheron for insightful advice and comments, and Catherine Matias and Etienne Roquain for useful discussions. He is also grateful to anonymous referees for very constructive comments and suggestions that greatly helped improve the paper. This work was partly supported by the association "Courir pour la vie, courir pour Curie", and by the French ANR project TAMIS.

Appendix

Appendix A. Calculations in specific models

A.1 Location models

Lemma 21 gives the distribution of the $p$-value under the alternative hypothesis for one-sided tests in the Laplace model. The proof is straightforward, so it is omitted.

Lemma 21 (One-sided Laplace location model) Assume that the probability density function of the test statistics is $f_0 : x \mapsto \frac{1}{2} e^{-|x|}$ under the null hypothesis, and $f_1 : x \mapsto \frac{1}{2} e^{-|x-\theta|}$ under the alternative, with $\theta > 0$ (one-sided test). Then

1. The one-sided $p$-value function is
\[ 1 - F_0(x) = \begin{cases} \frac{1}{2} e^{-|x|} & \text{if } x \ge 0 \\ 1 - \frac{1}{2} e^{-|x|} & \text{if } x < 0 \end{cases} \]

2. The inverse one-sided $p$-value function is
\[ (1 - F_0)^{-1}(t) = \begin{cases} \ln\left(\frac{1}{2t}\right) & \text{if } 0 \le t \le \frac{1}{2} \\ \ln\left(2(1-t)\right) & \text{if } \frac{1}{2} < t < 1 \end{cases} \]

3. The cdf of one-sided $p$-values under $H_1$ is
\[ G_1(t) = \begin{cases} t e^{\theta} & \text{if } 0 \le t \le \frac{e^{-\theta}}{2} \\ 1 - \frac{1}{4t} e^{-\theta} & \text{if } \frac{e^{-\theta}}{2} \le t \le \frac{1}{2} \\ 1 - (1-t) e^{-\theta} & \text{if } t \ge \frac{1}{2} \end{cases} \]

4. The probability density function of one-sided $p$-values under $H_1$ is
\[ g_1(t) = \begin{cases} e^{\theta} & \text{if } 0 \le t \le \frac{e^{-\theta}}{2} \\ \frac{1}{4t^2} e^{-\theta} & \text{if } \frac{e^{-\theta}}{2} \le t \le \frac{1}{2} \\ e^{-\theta} & \text{if } t \ge \frac{1}{2} \end{cases} \]

Proposition 22 (Concavity in two-sided $\gamma$-Subbotin models) If the test statistics follow a $\gamma$-Subbotin distribution with $\gamma \in [1, 2]$, then the distribution function $G_1$ of the two-sided $p$-values under the alternative is concave.
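As a numerical sanity check of Lemma 21 (an illustration added here, not part of the original argument), the following Python sketch implements the piecewise formulas for $G_1$ and $g_1$ in the one-sided Laplace model. The function names `laplace_G1` and `laplace_g1` and the value of `theta` are hypothetical. The sketch checks that the pieces agree at the breakpoints $e^{-\theta}/2$ and $1/2$, and that a finite-difference derivative of $G_1$ matches $g_1$.

```python
import math


def laplace_G1(t: float, theta: float) -> float:
    """Cdf of one-sided p-values under H1 in the Laplace location model (Lemma 21, item 3)."""
    lower = math.exp(-theta) / 2
    if t <= lower:
        return t * math.exp(theta)
    if t <= 0.5:
        return 1 - math.exp(-theta) / (4 * t)
    return 1 - (1 - t) * math.exp(-theta)


def laplace_g1(t: float, theta: float) -> float:
    """Density of one-sided p-values under H1 (Lemma 21, item 4)."""
    lower = math.exp(-theta) / 2
    if t <= lower:
        return math.exp(theta)
    if t <= 0.5:
        return math.exp(-theta) / (4 * t * t)
    return math.exp(-theta)


theta = 1.0  # hypothetical non-centrality parameter
step = 1e-6
for t in (0.1, 0.3, 0.8):
    # Central finite difference of G1, to be compared with g1.
    slope = (laplace_G1(t + step, theta) - laplace_G1(t - step, theta)) / (2 * step)
    print(f"t = {t}: finite difference {slope:.6f}, g1 {laplace_g1(t, theta):.6f}")
```

Note that `laplace_g1` is constant, equal to $e^{-\theta}$, on $[1/2, 1]$: this is the situation discussed in Section 6.2, in which the likelihood ratio is constant on an interval of the form $[x_0, +\infty)$.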
Proof [Proof of Proposition 22] (Sym) holds for Subbotin models. By Lemma 2, we need to prove that the likelihood ratio $f_1^\gamma/f_0^\gamma$ of the $\gamma$-Subbotin model with $\gamma \in [1, 2]$ is such that $h : x \mapsto (f_1^\gamma/f_0^\gamma)(x) + (f_1^\gamma/f_0^\gamma)(-x)$ is non-decreasing on $\mathbb{R}_+$. The function $h$ is differentiable on $(0, +\infty) \setminus \{\theta\}$, and its derivative is given by
\[ h'(x) = \left(\frac{f_1^\gamma}{f_0^\gamma}\right)'(x) - \left(\frac{f_1^\gamma}{f_0^\gamma}\right)'(-x), \]
where
\[ \left(\frac{f_1^\gamma}{f_0^\gamma}\right)'(y) = \left( \operatorname{sgn}(y)\,|y|^{\gamma-1} - \operatorname{sgn}(y-\theta)\,|y-\theta|^{\gamma-1} \right) \frac{f_1^\gamma}{f_0^\gamma}(y) \tag{20} \]
for any $y \in \mathbb{R} \setminus \{0, \theta\}$. Let $x > 0$ be such that $x \ne \theta$; we are going to prove that $h'(x) \ge 0$. As $f_1^\gamma/f_0^\gamma$ is non-decreasing, both $(f_1^\gamma/f_0^\gamma)'(x)$ and $(f_1^\gamma/f_0^\gamma)'(-x)$ are non-negative. If $(f_1^\gamma/f_0^\gamma)'(-x) = 0$, then $h'(x) \ge 0$ as desired. From now on, we assume that $(f_1^\gamma/f_0^\gamma)'(-x) > 0$. As $\theta > 0$, (20) entails that
\[ \frac{(f_1^\gamma/f_0^\gamma)'(x)}{(f_1^\gamma/f_0^\gamma)'(-x)} = \frac{x^{\gamma-1} - \operatorname{sgn}(x-\theta)\,|x-\theta|^{\gamma-1}}{(x+\theta)^{\gamma-1} - x^{\gamma-1}} \cdot \frac{f_1^\gamma(x)}{f_1^\gamma(-x)}, \tag{21} \]
where $f_1^\gamma(x) > f_1^\gamma(-x)$ because $-|x-\theta| + |x+\theta| > 0$. As $(f_1^\gamma/f_0^\gamma)'(-x) > 0$, it is enough to show that
\[ x^{\gamma-1} - \operatorname{sgn}(x-\theta)\,|x-\theta|^{\gamma-1} \ge (x+\theta)^{\gamma-1} - x^{\gamma-1} \tag{22} \]
in order to prove that $h'(x) \ge 0$. By the concavity of $x \mapsto x^{\gamma-1}$ on $\mathbb{R}_+$ for $1 \le \gamma \le 2$, $\varphi : x \mapsto \theta^{-1}\left(x^{\gamma-1} - (x-\theta)^{\gamma-1}\right)$ is non-increasing on $[\theta, +\infty)$. Therefore, if $x > \theta$ we have $\varphi(x) \ge \varphi(x+\theta)$ and (22) holds. If $x < \theta$, then noting that for any $a, b > 0$ and $\zeta \in [0, 1]$, $a^\zeta + b^\zeta \ge (a+b)^\zeta$, we have, for $1 \le \gamma \le 2$,
\[ x^{\gamma-1} + (\theta - x)^{\gamma-1} \ge \theta^{\gamma-1} \ge (x+\theta)^{\gamma-1} - x^{\gamma-1}, \]
and (22) holds as well.

A.2 Student model

Lemma 23 (Derivative of the Student likelihood ratio) Let $\nu \in \mathbb{N}^*$ and $\theta > 0$.
The likelihood ratio $f_1/f_0$ of the Student model with $\nu$ degrees of freedom and non-centrality parameter $\theta$ is $C^1$ on $\mathbb{R}$, and for any $t \in \mathbb{R}$,
\[ \left(\frac{f_1}{f_0}\right)'(t) = \nu\,(\nu + t^2)^{-3/2} \sum_{j=0}^{+\infty} a^1_j(\nu, \theta)\, \psi_{(j,\nu)}(t), \tag{23} \]
where $a^1_j(\nu, \theta) = (j+1)\, a_{j+1}(\nu, \theta)$ is such that $\left(\sum_j a^1_j(\nu, \theta)\right)$ converges absolutely.

Proof [Proof of Lemma 23] As $\left(\sum_j a_j(\nu, \theta)\right)$ converges absolutely and as $\psi_{(j,\nu)}$ is differentiable on $\mathbb{R}$ for any $j \ge 0$ and bounded (with values in $[-1, 1]$), the dominated convergence theorem ensures that $f_1/f_0$ is differentiable on $\mathbb{R}$ and that its derivative is given by:
\[ \left(\frac{f_1}{f_0}\right)'(t) = \sum_{j=1}^{+\infty} a_j(\nu, \theta)\, \psi'_{(j,\nu)}(t). \tag{24} \]
For $t \ne 0$, we have $\log\left(\operatorname{sgn}(t)^j\, \psi_{(j,\nu)}(t)\right) = -\frac{j}{2} \log\left(1 + \nu/t^2\right)$, whose derivative is $j\nu/(\nu t + t^3)$, so that
\[ \psi'_{(j,\nu)}(t) = \psi_{(j,\nu)}(t)\, \frac{j\nu}{t(\nu + t^2)}. \tag{25} \]
As $\psi_{(j,\nu)}(t) \sim_{t \to 0} (t/\sqrt{\nu})^j$, we have $\psi_{(j,\nu)}(0) = 0$, $\psi'_{(j,\nu)}(0) = 0$, and $\psi'_{(j,\nu)}$ is continuous at 0. Equation (23) follows by noting that $\psi_{(j+1,\nu)}(t)/\psi_{(j,\nu)}(t) = t/\sqrt{t^2 + \nu}$, and that $\left(\sum_j a^1_j(\nu, \theta)\right)$ converges absolutely by Stirling's formula.

Lemma 23 entails the following result:

Proposition 24 (Regularity of the Student likelihood ratio) Let $\nu \in \mathbb{N}^*$ and $\theta > 0$. The likelihood ratio $f_1/f_0$ of the Student model with $\nu$ degrees of freedom and non-centrality parameter $\theta$ has the following properties:
1. $f_1/f_0$ is $C^\infty$ on $\mathbb{R}$;
2. For any $k \in \mathbb{N}^*$, we have $(f_1/f_0)^{(k)}(t) \to 0$ as $|t| \to +\infty$;
3. $(f_1/f_0)^{(2)}(0) \ne 0$.

Proof [Proof of Proposition 24]
1. By (23), the function series in $(f_1/f_0)'$ has the same form as $f_1/f_0$; therefore, the result easily follows by induction.
2.
By (23), the Leibniz formula entails that the successive derivatives of $f_1/f_0$ are linear combinations of products of function series of the same form as $f_1/f_0$ by derivatives of $t \mapsto (\nu + t^2)^{-3/2}$. The result follows by the dominated convergence theorem, as all the derivatives of $t \mapsto (\nu + t^2)^{-3/2}$ tend to 0 as $|t| \to +\infty$;
3. The result follows by differentiating (23) at 0.

Proposition 25 (Concavity in the two-sided Student model) The distribution function $G_1$ of two-sided $p$-values in the Student model satisfies (Conc).

Proof [Proof of Proposition 25] By Lemma 2, we need to prove that the likelihood ratio $f_1/f_0$ of the Student model is such that $t \mapsto (f_1/f_0)(t) + (f_1/f_0)(-t)$ is non-decreasing on $\mathbb{R}_+$. The derivative of this function is $(f_1/f_0)'(t) - (f_1/f_0)'(-t)$, and Equation (23) yields, for $t \in \mathbb{R}$,
\[ \left(\frac{f_1}{f_0}\right)'(t) - \left(\frac{f_1}{f_0}\right)'(-t) = \nu\,(\nu + t^2)^{-3/2} \sum_{j=0}^{+\infty} a^1_j(\nu, \theta) \left( \psi_{(j,\nu)}(t) - \psi_{(j,\nu)}(-t) \right), \tag{26} \]
with $\psi_{(j,\nu)}(t) - \psi_{(j,\nu)}(-t) = (1 - (-1)^j)\left(t/\sqrt{\nu + t^2}\right)^{j}$. Therefore, as $a^1_j(\nu, \theta) > 0$, (26) yields $(f_1/f_0)'(t) - (f_1/f_0)'(-t) \ge 0$ for $t \ge 0$, which concludes the proof.

Appendix B. Convergence rate of a kernel estimator based on Storey's estimator

Proof [Proof of Proposition 12]
1. We demonstrate that $\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m)$ may be written as a sum of $m$ independent random variables that satisfy the Lindeberg-Feller conditions for the Central Limit Theorem (Pollard, 1984). Let $Z^m_i = \mathbf{1}_{P_i \ge 1-h_m}$, where the $P_i$ are the $p$-values. $Z^m_i$ follows a Bernoulli distribution with parameter $1 - G(1-h_m)$. Letting
\[ Y^m_i = \frac{Z^m_i - \mathbb{E}[Z^m_i]}{\sqrt{m h_m}}, \]
we have $\sum_{i=1}^m Y^m_i = \sqrt{m h_m}\left( \hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m) - \mathbb{E}\left[\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m)\right] \right)$. The $(Y^m_i)_{1 \le i \le m}$ are centered, independent random variables, with $\operatorname{Var} Y^m_i = \operatorname{Var} Z^m_i/(m h_m) = G(1-h_m)(1 - G(1-h_m))/(m h_m)$, which is equivalent to $g(1)/m$ as $m \to +\infty$.
Therefore,
\[ \lim_{m \to +\infty} \sum_{i=1}^m \mathbb{E}\left[(Y^m_i)^2\right] = g(1). \]
Finally we prove that for any $\varepsilon > 0$,
\[ \lim_{m \to +\infty} \sum_{i=1}^m \mathbb{E}\left[(Y^m_i)^2\, \mathbf{1}_{|Y^m_i| > \varepsilon}\right] = 0. \]
As $Z^m_i \in \{0, 1\}$ and $\mathbb{E}[Z^m_i] \in [0, 1]$, we have $(Y^m_i)^2 \le 1/(m h_m)$, and
\[ \sum_{i=1}^m \mathbb{E}\left[(Y^m_i)^2\, \mathbf{1}_{|Y^m_i| > \varepsilon}\right] \le \frac{1}{h_m} \mathbb{E}\left[\mathbf{1}_{|Y^m_1| > \varepsilon}\right] = \frac{1}{h_m} \mathbb{P}\left(|Y^m_1| > \varepsilon\right) \le \frac{1}{h_m} \frac{\operatorname{Var} Y^m_1}{\varepsilon^2} \]
by Chebyshev's inequality. As $m h_m \to +\infty$ and $\operatorname{Var} Y^m_1 \sim g(1)/m$ as $m \to +\infty$, the above sum therefore goes to 0 as $m h_m \to +\infty$. The Lindeberg-Feller conditions for the Central Limit Theorem are thus fulfilled, and we have
\[ \sum_{i=1}^m Y^m_i \rightsquigarrow \mathcal{N}(0, g(1)), \]
which concludes the proof.

2. As $G(\lambda) = \pi_0 \lambda + (1-\pi_0) G_1(\lambda)$, we have, for any $\lambda < 1$,
\[ \frac{1 - G(\lambda)}{1 - \lambda} = \pi_0 + (1-\pi_0)\, \frac{1 - G_1(\lambda)}{1 - \lambda}. \tag{27} \]
Therefore, the bias is given by $\mathbb{E}\left[\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda)\right] - \pi_0 = (1-\pi_0)\, \frac{1 - G_1(\lambda)}{1 - \lambda}$. A Taylor expansion as $\lambda \to 1$ yields
\[ 1 - G_1(\lambda) = \sum_{l=0}^{k} \frac{(-1)^l g_1^{(l)}(1)}{(l+1)!} (1-\lambda)^{l+1} + o\left((1-\lambda)^{k+1}\right) = (1-\lambda)\, g_1(1) + \frac{(-1)^k g_1^{(k)}(1)}{(k+1)!} (1-\lambda)^{k+1} + o\left((1-\lambda)^{k+1}\right), \]
as $g_1^{(l)}(1) = (1-\pi_0)^{-1} g^{(l)}(1) = 0$ for $1 \le l < k$. Therefore, if $h_m \to 0$ as $m \to +\infty$, we have
\[ \mathbb{E}\left[\hat\pi^{\mathrm{Sto}}_{0,m}(1-h_m)\right] - g(1) = (1-\pi_0)\, \frac{(-1)^k g_1^{(k)}(1)}{(k+1)!}\, h_m^k + o\left(h_m^k\right), \]
which concludes the proof, as $(1-\pi_0)\, g_1^{(k)}(1) = g^{(k)}(1)$.

Proof [Proof of Proposition 13] By Proposition 12, the asymptotic variance of $\hat\pi_{0,m}(1-h_m)$ is equivalent to $g(1)/(m h_m)$, and the bias is of order $h_m^k$. The optimal bandwidth is obtained for $h_m$ proportional to $m^{-1/(2k+1)}$, because this choice balances variance and squared bias. The proportionality constant is an explicit function of $k$, $\pi_0$, $g_1(1)$, and $g_1^{(k)}(1)$. By definition, the MSE that corresponds to this optimal choice is twice the corresponding squared bias, i.e.
of order $m^{-2k/(2k+1)}$, which completes the proof of (1). To prove (2), we note that
\[ \sqrt{m h_m}\,(\hat\pi_{0,m} - g(1)) = \sqrt{m h_m}\,(\hat\pi_{0,m} - \mathbb{E}[\hat\pi_{0,m}]) + \sqrt{m h_m}\,(\mathbb{E}[\hat\pi_{0,m}] - g(1)), \]
where $\hat\pi_{0,m}$ denotes $\hat\pi_{0,m}(1-h_m)$ to alleviate notation. The first term (variance) converges in distribution to $\mathcal{N}(0, g(1))$ by Proposition 12 (1) as soon as $\sqrt{m h_m} \to +\infty$. The second term (bias) is of the order of $\sqrt{m h_m}\, h_m^k = \sqrt{m h_m^{2k+1}}$ by Proposition 12 (2). Taking $h_m(k) = h^\star_m(k)\, \eta_m^2$, where $\eta_m \to 0$, we have $m h_m^{2k+1} \to 0$, which ensures that the bias term converges to 0.

Appendix C. Extension of Neuvial (2008) to the unconditional setting

In this section, we show that the results obtained by Neuvial (2008) in the original (conditional) setting of Benjamini and Hochberg (1995) also hold in the unconditional setting considered here, at the price of an additional term in the asymptotic variance due to the fluctuations of the random variable $\pi_{0,m}$. We start by stating a lemma which provides a lower bound on the critical value of plug-in procedures. It is a consequence of Proposition 10(1).

Lemma 26 Let $\alpha_m$ be a sequence of (possibly data-dependent) levels that converges in probability to $\alpha_\infty \in (0, 1)$ as $m \to +\infty$. If $\alpha_\infty < \alpha^\star_{BH}$, then the threshold $\widehat\tau_m(\alpha_m)$ of the BH($\alpha_m$) procedure converges in probability to 0 as $m \to +\infty$. If the convergence of $\alpha_m$ to $\alpha_\infty$ holds almost surely, then the convergence of $\widehat\tau_m(\alpha_m)$ to 0 holds almost surely as well.

Proof [Proof of Lemma 26] Assume that $\alpha_m$ converges to $\alpha_\infty$ in probability, with $\alpha_\infty < \alpha^\star_{BH}$. Let $\varepsilon > 0$; we are going to show that there exists an integer $N > 0$ such that for a large enough $m$, the number of rejections of the BH($\alpha_m$) procedure is less than $N$ with probability greater than $1 - \varepsilon$. Let $\bar\alpha = (\alpha_\infty + \alpha^\star_{BH})/2$.
As $\alpha_m \xrightarrow{P} \alpha_\infty < \bar\alpha$, there exists an integer $M$ such that for any $m \ge M$, $\alpha_m \le \bar\alpha$ with probability greater than $1 - \varepsilon/2$. As $\bar\alpha < \alpha^\star_{BH}$, Proposition 10(1) entails that the number of rejections by the BH($\bar\alpha$) procedure is bounded in probability as $m \to +\infty$; that is, there exist two integers $N$ and $M'$ such that for $m \ge M'$, the number of rejections of the BH($\bar\alpha$) procedure is less than $N$ with probability greater than $1 - \varepsilon/2$. Thus, for any $m \ge \max(M, M')$, the number of rejections of the BH($\alpha_m$) procedure is less than $N$ with probability greater than $1 - \varepsilon$. The proof for the almost sure convergence in the case when $\alpha_m$ converges to $\alpha_\infty$ almost surely is similar.

We follow the proof technique introduced by Neuvial (2008), by writing the empirical threshold of a given FDR controlling procedure (and its associated FDP) as the result of the application of a threshold function of the empirical distribution of the observed $p$-values. As the regularity of the threshold functions involved has already been established by Neuvial (2008), the result is a consequence of the fact that the $p$-value distributions under the null and the alternative hypotheses (as defined below) satisfy Donsker's theorem in the current unconditional setting. This Donsker's theorem has been established by Genovese and Wasserman (2004). For $a \in \{0, 1\}$ and $t \in [0, 1]$, we let $\widehat\Gamma_{a,m}(t) = m^{-1} \sum_{i=1}^m \mathbf{1}_{H_a \text{ true and } P_i \le t}$.

Proposition 27 (Genovese and Wasserman (2004), Theorem 4.1) As $m \to +\infty$, we have:

1.
\[ \sqrt{m} \left( \begin{pmatrix} \widehat\Gamma_{0,m}(t) \\ \widehat\Gamma_{1,m}(t) \end{pmatrix} - \begin{pmatrix} \pi_0 t \\ (1-\pi_0)\, G_1(t) \end{pmatrix} \right) \rightsquigarrow \begin{pmatrix} W_0 \\ W_1 \end{pmatrix}, \tag{28} \]
where $(W_0, W_1)$ is a two-dimensional, centered Gaussian process with covariance function $\gamma(s, t)$ defined for any $(s, t) \in [0, 1]^2$ by
\[ \gamma(s, t) = \begin{pmatrix} \pi_0\, s \wedge t - \pi_0^2\, s t & -\pi_0 s\, (1-\pi_0)\, G_1(t) \\ -\pi_0 t\, (1-\pi_0)\, G_1(s) & (1-\pi_0)\, G_1(s \wedge t) - (1-\pi_0)^2\, G_1(s)\, G_1(t) \end{pmatrix} \tag{29} \]

2.
\[ \sqrt{m} \left( \widehat G_m - G \right) \rightsquigarrow W, \tag{30} \]
where $W \overset{(d)}{=} W_0 + W_1$ is a one-dimensional, centered Gaussian process with covariance function $(s, t) \mapsto G(s \wedge t) - G(s)\, G(t)$.

Note that $\widehat\Gamma_{0,m} = \pi_{0,m} \widehat G_{0,m}$ and $\widehat\Gamma_{1,m} = (1 - \pi_{0,m}) \widehat G_{1,m}$, where $(\widehat G_{0,m}, \widehat G_{1,m})$ are the empirical distribution functions of the $p$-values under $H_0$ and $H_1$, respectively. The results of Neuvial (2008) have been obtained by directly considering the convergence of the process $(\widehat G_{0,m}, \widehat G_{1,m})$ instead of $(\widehat\Gamma_{0,m}, \widehat\Gamma_{1,m})$, because $\pi_{0,m}$ was deterministic in the conditional setting (see Neuvial (2009, Theorem 3.1)). The results established in Neuvial (2008) (in particular Theorem 3.2) can be translated to the unconditional setting just by replacing the processes $\pi_0 Z_0$ and $\pi_1 Z_1$ in Neuvial (2008) by the processes $W_0$ and $W_1$ defined in Proposition 27, and consequently, the process $Z = \pi_0 Z_0 + \pi_1 Z_1$ by $W = W_0 + W_1$. Therefore, the asymptotic properties of the BH procedure and Storey's procedure (i.e. BH$(\cdot/\hat\pi^{\mathrm{Sto}}_{0,m}(\lambda))$) in the unconditional setting can be obtained by adapting the proof of the corresponding theorems (Theorems 4.2 and 4.15) in Neuvial (2008):

Corollary 28 (Asymptotic properties of the BH procedure in the unconditional setting) For any $\alpha \ge \alpha^\star_{BH}$, we have

1. The asymptotic distribution of the threshold $\widehat\tau_m(\alpha)$ is given by
\[ \sqrt{m} \left( \widehat\tau_m(\alpha) - \tau_\infty(\alpha) \right) \rightsquigarrow \mathcal{N}\left( 0,\ \frac{G(\tau_\infty(\alpha))\,(1 - G(\tau_\infty(\alpha)))}{\left(1/\alpha - g(\tau_\infty(\alpha))\right)^2} \right) \tag{31} \]

2. The asymptotic distribution of the associated FDPs is given by
\[ \sqrt{m} \left( \mathrm{FDP}_m(\widehat\tau_m(\alpha)) - \pi_0 \alpha \right) \rightsquigarrow \mathcal{N}\left( 0,\ (\pi_0 \alpha)^2 \left( \frac{1}{\pi_0 \tau_\infty(\alpha)} - 1 \right) \right) \tag{32} \]

The asymptotic properties of the BH Oracle procedure are simply obtained by applying Corollary 28 at level $\alpha/\pi_0$.
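To make the quantities of Corollary 28 concrete, here is a minimal Python sketch of the empirical BH threshold and of the associated FDP. It assumes the usual step-up form of the BH procedure, in which the threshold equals $\alpha k/m$ with $k$ the largest index such that $p_{(k)} \le \alpha k/m$. The function names and the $p$-values below are hypothetical, and the true/false null labels are only available in simulations.

```python
def bh_threshold(pvalues, alpha):
    """Return the BH(alpha) threshold alpha * k / m, with k = max{i : p_(i) <= alpha * i / m}.

    The threshold is 0 when no hypothesis is rejected.
    """
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i / m:
            k = i
    return alpha * k / m


def false_discovery_proportion(pvalues, is_null, threshold):
    """Fraction of true nulls among hypotheses with p-value <= threshold (0 if none rejected)."""
    rejected_labels = [null for p, null in zip(pvalues, is_null) if p <= threshold]
    return sum(rejected_labels) / max(len(rejected_labels), 1)


# Hypothetical data: ten p-values and their (normally unknown) null status.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
is_null = [False, True, False, True, True, True, True, True, True, True]

tau = bh_threshold(pvalues, alpha=0.05)
print("BH threshold:", tau)
print("FDP:", false_discovery_proportion(pvalues, is_null, tau))
```

The BH Oracle procedure mentioned above corresponds to calling `bh_threshold(pvalues, alpha / pi0)` with the true (in practice unknown) proportion `pi0`.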
Corollary 29 (Asymptotic properties of Storey's procedure in the unconditional model) For any $\lambda \in [0, 1)$ and $\alpha \in [0, 1]$, let $\widehat\tau^{0,\lambda}_m(\alpha) = \mathcal{T}^{\mathrm{Sto}(\lambda)}(\widehat G_m)$ be the empirical threshold of Storey's procedure at level $\alpha$, and $\tau^{0,\lambda}_\infty(\alpha) = \mathcal{T}^{\mathrm{Sto}(\lambda)}(G)$ be the corresponding asymptotic threshold. Then,

1. $\alpha^\star_{\mathrm{Sto}(\lambda)} = \pi_0(\lambda)\, \alpha^\star_{BH}$ is the critical value of Storey's procedure;

2. For any $\alpha > \alpha^\star_{\mathrm{Sto}(\lambda)}$:

(a) The asymptotic distribution of the threshold $\widehat\tau^{0,\lambda}_m(\alpha)$ is given by
\[ \sqrt{m} \left( \widehat\tau^{0,\lambda}_m(\alpha) - \tau^{0,\lambda}_\infty(\alpha) \right) \rightsquigarrow \frac{\tau^{0,\lambda}_\infty(\alpha)}{\pi_0(\lambda)/\alpha - g(\tau^{0,\lambda}_\infty(\alpha))} \left\{ \frac{W(\tau^{0,\lambda}_\infty(\alpha))}{\tau^{0,\lambda}_\infty(\alpha)} + \frac{1}{\alpha}\, \frac{W(\lambda)}{1-\lambda} \right\}, \tag{33} \]
where $W$ is a centered Gaussian process with covariance function $(s, t) \mapsto G(s \wedge t) - G(s)\, G(t)$;

(b) The asymptotic distribution of the associated FDPs is given by
\[ \sqrt{m} \left( \mathrm{FDP}_m(\widehat\tau^{0,\lambda}_m(\alpha)) - \pi_0 \alpha/\pi_0(\lambda) \right) \rightsquigarrow \mathcal{N}\left(0, \sigma^2_\lambda\right), \tag{34} \]
where
\[ \sigma^2_\lambda = \left( \frac{\pi_0 \alpha}{\pi_0(\lambda)} \right)^2 \left\{ \frac{1}{\pi_0\, \tau^{0,\lambda}_\infty(\alpha)} + 2\, \frac{\tau^{0,\lambda}_\infty(\alpha) \wedge \lambda}{\tau^{0,\lambda}_\infty(\alpha)\,(1 - G(\lambda))} - \frac{1}{1 - G(\lambda)} \right\}. \]

Note that Corollary 29 with $\lambda = 0$ recovers Corollary 28.

Appendix D. Asymptotic properties of plug-in procedures

D.1 Proof of Theorem 15

We denote by $\widehat\rho^0_m(\alpha)$ the proportion of rejections, and by $\widehat\nu^0_m(\alpha)$ the proportion of incorrect rejections by the plug-in procedure BH$(\alpha/\hat\pi_{0,m})$ (among all $m$ hypotheses tested). They may be written as $\widehat\rho^0_m(\alpha) = \widehat G_m(\widehat\tau^0_m(\alpha)) = \widehat\tau^0_m(\alpha)\, \hat\pi_{0,m}/\alpha$ and $\widehat\nu^0_m(\alpha) = \pi_{0,m}\, \widehat G_{0,m}(\widehat\tau^0_m(\alpha))$, respectively. The following lemma shows that the convergence rate of $(\widehat\tau^0_m(\alpha), \widehat\nu^0_m(\alpha), \widehat\rho^0_m(\alpha))$ for a large enough $\alpha$ is driven by the convergence rate of $\hat\pi_{0,m}$. In order to alleviate notation, we omit the "$(\alpha)$" in $\widehat\tau^0_m$, $\widehat\rho^0_m$, $\widehat\nu^0_m$, $\tau^0_\infty$, $\rho^0_\infty$, $\nu^0_\infty$ in the remainder of this section. Moreover, $\mathrm{FDP}_m(\widehat\tau^0_m(\alpha))$ will simply be denoted by $\widehat{\mathrm{FDP}}^0_m$.
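The quantities just introduced can be computed on a toy example as follows. This is a sketch under stated assumptions, not the paper's implementation: it uses Storey's estimator in the form $\#\{i : P_i \ge \lambda\}/(m(1-\lambda))$, consistent with the indicator $\mathbf{1}_{P_i \ge 1-h_m}$ used in Appendix B; the function names and data are hypothetical; and the cap of the plug-in level at 1 is a guard added here against a zero or very small estimate.

```python
def storey_pi0(pvalues, lam):
    """Storey-type estimate of pi0: #{P_i >= lam} / (m * (1 - lam))."""
    if not 0 <= lam < 1:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    return sum(1 for p in pvalues if p >= lam) / (m * (1 - lam))


def bh_threshold(pvalues, alpha):
    """BH(alpha) threshold alpha * k / m, with k = max{i : p_(i) <= alpha * i / m}."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i / m:
            k = i
    return alpha * k / m


def plug_in_procedure(pvalues, is_null, alpha, lam):
    """Apply BH at level alpha / pi0_hat; return (pi0_hat, tau, rho, nu).

    rho is the proportion of rejections and nu the proportion of incorrect
    rejections, both among all m hypotheses.
    """
    m = len(pvalues)
    pi0_hat = storey_pi0(pvalues, lam)
    # Guard (an assumption of this sketch): cap the level at 1.
    level = min(1.0, alpha / pi0_hat) if pi0_hat > 0 else 1.0
    tau = bh_threshold(pvalues, level)
    rejected_labels = [null for p, null in zip(pvalues, is_null) if p <= tau]
    return pi0_hat, tau, len(rejected_labels) / m, sum(rejected_labels) / m


# Hypothetical data; the null labels are only known in simulations.
pvalues = [0.001, 0.002, 0.003, 0.004, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
is_null = [False, False, False, True, True, True, True, True, True, True]
pi0_hat, tau, rho, nu = plug_in_procedure(pvalues, is_null, alpha=0.05, lam=0.5)
print(pi0_hat, tau, rho, nu)
```

On this example the identity $\widehat\rho^0_m = \widehat\tau^0_m\, \hat\pi_{0,m}/\alpha$ stated above can be checked directly from the returned values.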
Lemma 30 Let $\hat\pi_{0,m}$ be an estimator of $\pi_0$ such that $\hat\pi_{0,m} \to \pi_{0,\infty}$ in probability as $m \to +\infty$. Define $\alpha^\star_0 = \pi_{0,\infty}\, \alpha^\star_{BH}$, and let $\alpha > \alpha^\star_0$. Then, under (Conc), we have, as $m \to +\infty$:

1. $\widehat\tau^0_m$ converges in probability to $\tau^0_\infty$ as $m \to +\infty$, with $g(\tau^0_\infty) < \pi_{0,\infty}/\alpha$. If the convergence of $\hat\pi_{0,m}$ to $\pi_{0,\infty}$ holds almost surely, then that of $\widehat\tau^0_m$ to $\tau^0_\infty$ holds almost surely as well;

2. Further assume that $\sqrt{m h_m}\,(\hat\pi_{0,m} - \pi_{0,\infty})$ converges in distribution for some $h_m$ such that $h_m = o(1/\ln\ln m)$ and $m h_m \to +\infty$ as $m \to +\infty$. Then $(\widehat\tau^0_m, \widehat\nu^0_m, \widehat\rho^0_m)$ converges in distribution at rate $1/\sqrt{m h_m}$, with
\[ \begin{pmatrix} \widehat\tau^0_m \\ \widehat\nu^0_m \\ \widehat\rho^0_m \end{pmatrix} - \begin{pmatrix} \tau^0_\infty \\ \nu^0_\infty \\ \rho^0_\infty \end{pmatrix} = \frac{\tau^0_\infty/\alpha}{\pi_{0,\infty}/\alpha - g(\tau^0_\infty)} \begin{pmatrix} 1 \\ \pi_0 \\ g(\tau^0_\infty) \end{pmatrix} (\pi_{0,\infty} - \hat\pi_{0,m})\,(1 + o_P(1)), \]
where $\nu^0_\infty = \pi_0 \tau^0_\infty$ and $\rho^0_\infty = G(\tau^0_\infty) = \pi_{0,\infty}\, \tau^0_\infty/\alpha$.

Proof [Proof of Lemma 30] For 1., we assume that the convergence of $\hat\pi_{0,m}$ to $\pi_{0,\infty}$ holds in probability. If it also holds almost surely, then the convergence of $\widehat\tau^0_m$ to $\tau^0_\infty$ is almost sure as well. The sketch of the proof is inspired by van der Vaart (1998, Lemma 21.3). Let $\psi_{F,\zeta} : t \mapsto t/\zeta - F(t)$ for any distribution function $F$ and any $\zeta \in (0, 1]$. As $\widehat G_m(\widehat\tau^0_m) = \hat\pi_{0,m}\, \widehat\tau^0_m/\alpha$ and $G(\tau^0_\infty) = \pi_{0,\infty}\, \tau^0_\infty/\alpha$, we have $\psi_{G,\alpha/\pi_{0,\infty}}(\tau^0_\infty) = 0$ and $\psi_{\widehat G_m,\alpha/\hat\pi_{0,m}}(\widehat\tau^0_m) = 0$. The proof relies on the following properties:

(a) $\psi_{G,\alpha/\pi_{0,\infty}}(\widehat\tau^0_m)$ converges in probability to $0 = \psi_{G,\alpha/\pi_{0,\infty}}(\tau^0_\infty)$;

(b) $\psi_{G,\alpha/\pi_{0,\infty}}$ is locally invertible in a neighborhood of $\tau^0_\infty$, with $\dot\psi_{G,\alpha/\pi_{0,\infty}}(\tau^0_\infty) > 0$.

To prove (a), we note that
\[ -\psi_{G,\alpha/\pi_{0,\infty}}(\widehat\tau^0_m) = G(\widehat\tau^0_m) - \pi_{0,\infty}\, \widehat\tau^0_m/\alpha = (G - \widehat G_m)(\widehat\tau^0_m) + \left( \widehat G_m(\widehat\tau^0_m) - \hat\pi_{0,m}\, \widehat\tau^0_m/\alpha \right) + (\hat\pi_{0,m} - \pi_{0,\infty})\, \widehat\tau^0_m/\alpha. \]
The first term converges to 0 almost surely, the second one is identically null, and the third one converges in probability to 0 as $\hat\pi_{0,m}$ converges in probability to $\pi_{0,\infty}$, and $\widehat\tau^0_m \in [0, 1]$. Item (b) holds as $G$ is concave (by (Conc)) and $\alpha/\pi_{0,\infty} > \alpha^\star_{BH}$, where $\alpha^\star_{BH} = \lim_{u \to 0} u/G(u)$ is the critical value of the BH procedure (see Neuvial (2008, Lemma 7.6 page 1097) for a proof of the invertibility).

1. Combining (a) and (b), $\widehat\tau^0_m$ converges in probability to $\tau^0_\infty$, and $\dot\psi_{G,\alpha/\pi_{0,\infty}}(\tau^0_\infty) = \pi_{0,\infty}/\alpha - g(\tau^0_\infty)$ is positive.

2. We only give the proof for $\widehat\tau^0_m$, as the proofs for $\widehat\nu^0_m$ and $\widehat\rho^0_m$ are similar. The idea of the proof is that the fluctuations of $\bar G_m = \widehat G_m - G$, the centered empirical process associated with $G$, are of order $1/\sqrt{m}$ by Donsker's theorem (Donsker, 1951); thus, these fluctuations are negligible with respect to the fluctuations of $\hat\pi_{0,m} - \pi_{0,\infty}$, which are assumed to be of order $1/\sqrt{m h_m}$ with $h_m \to 0$. We have
\[ G(\widehat\tau^0_m) - G(\tau^0_\infty) = \left( G(\widehat\tau^0_m) - \widehat G_m(\widehat\tau^0_m) \right) + \left( \widehat G_m(\widehat\tau^0_m) - G(\tau^0_\infty) \right) = -\bar G_m(\widehat\tau^0_m) + \left( \hat\pi_{0,m}\, \widehat\tau^0_m/\alpha - \pi_{0,\infty}\, \tau^0_\infty/\alpha \right) \]
because $\widehat G_m(\widehat\tau^0_m) = \hat\pi_{0,m}\, \widehat\tau^0_m/\alpha$ and $G(\tau^0_\infty) = \pi_{0,\infty}\, \tau^0_\infty/\alpha$. Therefore,
\[ G(\widehat\tau^0_m) - G(\tau^0_\infty) = -\bar G_m(\widehat\tau^0_m) + \frac{\hat\pi_{0,m}}{\alpha} (\widehat\tau^0_m - \tau^0_\infty) + \frac{\hat\pi_{0,m} - \pi_{0,\infty}}{\alpha}\, \tau^0_\infty. \]
As $\widehat\tau^0_m \xrightarrow{P} \tau^0_\infty$ as $m \to +\infty$, we also have $G(\widehat\tau^0_m) - G(\tau^0_\infty) = (\widehat\tau^0_m - \tau^0_\infty)\,(g(\tau^0_\infty) + o_P(1))$ by Taylor's formula. Hence we have
\[ \left( g(\tau^0_\infty) - \hat\pi_{0,m}/\alpha + o_P(1) \right) (\widehat\tau^0_m - \tau^0_\infty) = -\bar G_m(\widehat\tau^0_m) + (\hat\pi_{0,m} - \pi_{0,\infty})\, \tau^0_\infty/\alpha. \]
Now because $\hat\pi_{0,m}$ converges in probability to $\pi_{0,\infty}$, we have $g(\tau^0_\infty) - \hat\pi_{0,m}/\alpha = \left( g(\tau^0_\infty) - \pi_{0,\infty}/\alpha \right)(1 + o_P(1))$.
By 1, we have $\pi_{0,\infty}/\alpha > g(\tau^0_\infty)$, so that for sufficiently large $m$:
\[ \widehat\tau^0_m - \tau^0_\infty = \left( \frac{-\bar G_m(\widehat\tau^0_m)}{g(\tau^0_\infty) - \pi_{0,\infty}/\alpha} + \frac{\tau^0_\infty/\alpha}{g(\tau^0_\infty) - \pi_{0,\infty}/\alpha}\, (\hat\pi_{0,m} - \pi_{0,\infty}) \right) (1 + o_P(1)). \]
Finally, we note that as $\|\bar G_m\|_\infty \sim c \sqrt{\ln\ln m/m}$ (by the Law of the Iterated Logarithm) and $h_m = o(1/\ln\ln m)$, we have $\bar G_m(\widehat\tau^0_m) = o_P\left(1/\sqrt{m h_m}\right)$. On the other hand, $\sqrt{m h_m}\,(\hat\pi_{0,m} - \pi_{0,\infty})$ converges in distribution, so that the term $(\hat\pi_{0,m} - \pi_{0,\infty})\, \tau^0_\infty/\alpha$ dominates the right-hand side. Finally, we have
\[ \widehat\tau^0_m - \tau^0_\infty = \frac{\tau^0_\infty/\alpha}{g(\tau^0_\infty) - \pi_{0,\infty}/\alpha}\, (\hat\pi_{0,m} - \pi_{0,\infty})\,(1 + o_P(1)), \]
which concludes the proof for $\widehat\tau^0_m$.

Proof [Proof of Theorem 15] 1. is a consequence of Lemma 26 combined with Lemma 30(1); 2.(a) is a consequence of Lemma 26(2). Let us prove 2.(b). By Lemma 30, we have
\[ \sqrt{m h_m} \left( \begin{pmatrix} \widehat\nu^0_m \\ \widehat\rho^0_m \end{pmatrix} - \begin{pmatrix} \nu^0_\infty \\ \rho^0_\infty \end{pmatrix} \right) \rightsquigarrow \xi_\infty \begin{pmatrix} \pi_0 \\ g(\tau^0_\infty) \end{pmatrix} X, \tag{35} \]
where $X \sim \mathcal{N}(0, s_0^2)$ and $\xi_\infty = \frac{\tau^0_\infty/\alpha}{\pi_{0,\infty}/\alpha - g(\tau^0_\infty)}$. Recall that $\widehat{\mathrm{FDP}}^0_m = \widehat\nu^0_m/(\widehat\rho^0_m \vee m^{-1})$. We begin by noting that for a large enough $m$, we have $\widehat\rho^0_m > 1/m$ almost surely. This is a consequence of the fact that (i) $\widehat\rho^0_m = \widehat G_m(\widehat\tau^0_m) = \hat\pi_{0,m}\, \widehat\tau^0_m/\alpha$, with $\widehat\tau^0_m$ bounded away from 0 (by 1.), and (ii) $\hat\pi_{0,m}$ converges to $\pi_{0,\infty} \ge \pi_0 > \alpha$. As a consequence, the factor $m^{-1}$ may be omitted in $\widehat{\mathrm{FDP}}^0_m$ for a large enough $m$; the FDP may then be written as $\widehat{\mathrm{FDP}}^0_m = \gamma(\widehat\nu^0_m, \widehat\rho^0_m)$, where $\gamma : (u, v) \mapsto u/v$ for any $u \ge 0$ and $v > 0$. $\gamma$ is differentiable for any such $(u, v)$, with derivative $\dot\gamma_{u,v} = (1/v, -u/v^2) = (1/v)\,(1, -u/v)$. In particular, recalling that $\nu^0_\infty = \pi_0 \tau^0_\infty$ and $\rho^0_\infty = G(\tau^0_\infty) = \pi_{0,\infty}\, \tau^0_\infty/\alpha$, we have
\[ \dot\gamma_{\nu^0_\infty, \rho^0_\infty} = \frac{\alpha}{\tau^0_\infty\, \pi_{0,\infty}} \left( 1,\ -\frac{\pi_0 \alpha}{\pi_{0,\infty}} \right). \tag{36} \]
As $\gamma(\nu^0_\infty, \rho^0_\infty) = \pi_0 \alpha/\pi_{0,\infty}$, the Delta method yields
\[ \sqrt{m h_m} \left( \widehat{\mathrm{FDP}}^0_m - \frac{\pi_0 \alpha}{\pi_{0,\infty}} \right) \rightsquigarrow \mathcal{N}\left(0, w^2\right), \]
with $w = s_0\, \xi_\infty\, \dot\gamma_{\nu^0_\infty, \rho^0_\infty} \begin{pmatrix} \pi_0 \\ g(\tau^0_\infty) \end{pmatrix}$. By (36), we have
\[ \dot\gamma_{\nu^0_\infty, \rho^0_\infty} \begin{pmatrix} \pi_0 \\ g(\tau^0_\infty) \end{pmatrix} = \frac{\alpha^2 \pi_0}{\tau^0_\infty\, \pi_{0,\infty}^2} \left( \pi_{0,\infty}/\alpha - g(\tau^0_\infty) \right), \]
so that $w = s_0\, \pi_0 \alpha/\pi_{0,\infty}^2$.

D.2 Consistency, purity and criticality

Proof [Proof of Lemma 19] We note that
\[ \frac{f_1(x)}{f_0(x)} = \frac{f_0(x - \theta)}{f_0(x)} \ \text{(by definition of a location model)} \ = \frac{f_0(-x + \theta)}{f_0(-x)} \ \text{(by (Sym))} \ = \frac{f_0(-x + \theta)}{f_1(-x + \theta)}, \]
which concludes the proof, as $\theta$ is a fixed scalar.

Proof [Proof of Proposition 20] We have $\alpha^\star_{BH} = \lim_{t \to 0} 1/g(t)$, where $g = \pi_0 + (1 - \pi_0)\, g_1$ and $g_1(t) = \frac{f_1}{f_0}\left(-F_0^{-1}(t)\right)$. Therefore, as $\lim_{t \to 0} -F_0^{-1}(t) = +\infty$, the result is a consequence of Lemma 19.

D.3 Regularity of $g_1$ for two-sided tests in symmetric models

Proof [Proof of Lemma 18]
1. We make the additional assumption that there exists $\eta > 0$ such that $f_1/f_0$ is differentiable on $V_\eta = [-\eta, \eta] \setminus \{0\}$, and that its derivative tends to $\ell_-$ as $u \to 0^-$ and $\ell_+$ as $u \to 0^+$. This assumption makes the proof simpler, and it holds in the models considered in this paper. However, the result still holds (and is simpler to state) without this extra assumption. By Proposition 1, we have under (Sym)
\[ g_1(t) = \frac{1}{2} \left( \frac{f_1}{f_0}(q_0(t/2)) + \frac{f_1}{f_0}(-q_0(t/2)) \right), \]
where $q_0(t/2) = F_0^{-1}(1 - t/2)$ maps $Q_\eta = [2(1 - F_0(\eta)), 1)$ onto $(0, \eta]$. Therefore, $g_1$ is differentiable on $Q_\eta$ and satisfies, for any $t$ in $Q_\eta$:
\[ g_1^{(1)}(t) = \frac{1}{2} \left\{ \left(\frac{f_1}{f_0}\right)'(q_0(t/2)) - \left(\frac{f_1}{f_0}\right)'(-q_0(t/2)) \right\} \times \frac{1}{2}\, q_0'(t/2) = -\frac{1}{4 f_0(q_0(t/2))} \left( \left(\frac{f_1}{f_0}\right)'(q_0(t/2)) - \left(\frac{f_1}{f_0}\right)'(-q_0(t/2)) \right). \tag{37} \]
As $q_0(t/2) \to 0^+$ when $t \to 1$, (37) implies that $g_1$ is differentiable at 1 with derivative $-(4 f_0(0))^{-1}(\ell_+ - \ell_-)$.
2.
Similarly, we prove the result with the extra assumption that $f_1/f_0$ is twice differentiable in a neighborhood of 0. Then (37) entails that $g_1$ is itself twice differentiable in a neighborhood of 1. Writing $g_1^{(1)}(t) = a(t)\, b(t)$, with
\[ \begin{cases} a(t) = 1/\left(4 f_0(q_0(t/2))\right) \\ b(t) = -(f_1/f_0)'(q_0(t/2)) + (f_1/f_0)'(-q_0(t/2)), \end{cases} \]
we have $g_1^{(2)}(t) = a'(t)\, b(t) + a(t)\, b'(t)$. As $q_0(1/2) = F_0^{-1}(1/2) = 0$, we have $b(1) = 0$, so that $g_1^{(2)}(1) = a(1)\, b'(1)$, where $a(1) = 1/(4 f_0(0))$ and
\[ b'(t) = \frac{1}{2 f_0(q_0(t/2))} \left( \left(\frac{f_1}{f_0}\right)^{(2)}(q_0(t/2)) + \left(\frac{f_1}{f_0}\right)^{(2)}(-q_0(t/2)) \right). \]
Thus $b'(1) = 1/(2 f_0(0)) \times 2\,(f_1/f_0)^{(2)}(0)$, which concludes the proof.

References

Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol., 57(1):289–300, 1995.
Y. Benjamini and Y. Hochberg. On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics. Journal of Educational and Behavioral Statistics, 25(1):60–83, 2000.
Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 29(4):1165–1188, 2001.
Y. Benjamini, A. M. Krieger, and D. Yekutieli. Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3):491–507, 2006.
G. Blanchard and E. Roquain. Adaptive FDR control under independence and dependence. J. Mach. Learn. Res., 10:2837–2871, 2009.
Z. Chi. On the performance of FDR control: constraints and a partial solution. Ann. Statist., 35(4):1409–1431, 2007a.
Z. Chi. Sample size and positive false discovery rate control for multiple testing. Electronic Journal of Statistics, 1:77–118, 2007b.
Z. Chi and Z. Tan.
Positive false discovery proportions: intrinsic bounds and adaptive control. Statistica Sinica, 18(3):837–860, 2008.
S. Delattre and E. Roquain. On the false discovery proportion convergence under Gaussian equi-correlation. Statistics & Probability Letters, 81(1):111–115, Jan. 2011.
S. Delattre and E. Roquain. Asymptotics of empirical distribution function for Gaussian subordinated arrays with an application to multiple testing, 2013. URL http://hal.archives-ouvertes.fr/hal-00739749. Preprint.
M. D. Donsker. An invariance principle for certain probability limit theorems. Mem. Amer. Math. Soc., 6:12, 1951.
B. Efron, R. Tibshirani, J. D. Storey, and V. Tusher. Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96(456):1151–1160, Dec. 2001.
G. Eklund. Massignifikansproblemet. Unpublished seminar papers, Uppsala University Institute of Statistics, 1961–1963.
H. Finner, T. Dickhaus, and M. Roters. On the false discovery rate and an asymptotically optimal rejection curve. The Annals of Statistics, 37(2):596–618, Apr. 2009. doi: 10.1214/07-AOS569.
C. R. Genovese and L. Wasserman. A Stochastic Process Approach to False Discovery Control. Ann. Statist., 32(3):1035–1061, 2004.
N. W. Hengartner and P. B. Stark. Finite-Sample Confidence Envelopes for Shape-Restricted Densities. Ann. Statist., 23(2):525–550, 1995.
J. Jin. Proportion of non-zero normal means: universal oracle equivalences and uniformly consistent estimators. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3):461–493, 2008.
N. L. Johnson and B. L. Welch. Applications of the non-central t-distribution. Biometrika, 31(3):362–389, 1940.
P. Neuvial. Asymptotic properties of false discovery rate controlling procedures under independence. Electronic Journal of Statistics, 2:1065–1110, 2008. ISSN 1935-7524. doi: 10.1214/08-EJS207.
P. Neuvial.
Corrigendum to “ Asymptotic properties of false disco very rate con trolling pro cedures under indep endence” . Ele ctr onic Journal of Statistics , 3:1083, 2009. V. Nguy en and C. Matias. On eﬃcient estimators of the proportion of true n ull h yp otheses in a m ultiple testing setup. Hal preprin t http://hal.arc hives-ouv ertes.fr/hal-00647082, 2012. D. B. Pollard. Conver genc e of Sto chastic Pr o c esses . Springer-V erlag, 1984. E. Ro quain and F. Villers. Exact calculations for false disco very prop ortion with application to least fa v orable conﬁgurations. The Annals of Statistics , 39(1):584–612, 2011. T. Sch weder and E. Sp jøtvoll. Plots of P-v alues to ev aluate many tests simultaneously. Biometrika , 69(3):493–502, 1982. P . Seeger. A note on a metho d for the analysis of signiﬁcances en masse. T e chnometrics , pages 586–593, 1968. J. D. Storey . A direct approach to false discov ery rates. J. R. Stat. So c. Ser. B Stat. Metho dol. , 64 (3):479–498, 2002. J. D. Storey . The Positiv e F alse Disco v ery Rate: A Bay esian In terpretation and the q-v alue. Ann. Statist. , 31(6):2013–2035, 2003. J. D. Storey and R. Tibshirani. Statistical signiﬁcance for genomewide studies. Pr o c e e dings of the National A c ademy of Scienc es of the Unite d States of Americ a , 100(16):9440–5, Aug. 2003. ISSN 0027-8424. doi: 10.1073/pnas.1530509100. J. D. Storey , J. E. T a ylor, and D. Siegm und. Strong control, conserv ative p oint estimation and sim ultaneous conserv ative consistency of false discov ery rates: a uniﬁed approac h. J. R. Stat. So c. Ser. B Stat. Metho dol. , 66(1):187–205, 2004. M. T. Subb otin. On the law of frequency of errors. Matematicheskii Sb ornik , 31:296–301, 1923. J. W. H. Swanepo el. The limiting b eha vior of a mo diﬁed maximal symmetric 2s-spacing with applications. A nn. Statist. , 27(1):24–35, 1999. A. B. T sybako v. Intr o duction to non-p ar ametric estimation . Springer, 2009. A. W. v an der V aart. 
Asymptotic statistics , v olume 3 of Cambridge Series in Statistic al and Pr ob a- bilistic Mathematics . Cambridge Universit y Press, Cambridge, 1998. ISBN 0-521-49603-9; 0-521- 78450-6. 34
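Combining the two factors obtained at the end of the proof of the twice-differentiable case gives $g_1^{(2)}(1) = a(1)\,b'(1) = (f_1/f_0)^{(2)}(0) / (4 f_0(0)^2)$. The sketch below is a numerical sanity check of this value, not part of the original argument. It rests on two assumptions that are not stated in this section: a Gaussian location model ($f_0$ standard normal, $f_1$ the $N(\mu, 1)$ density), and the form $g_1(t) = \frac{1}{2}\left[(f_1/f_0)(q_0(t/2)) + (f_1/f_0)(-q_0(t/2))\right]$ with $q_0(u) = F_0^{-1}(1-u)$ for equation (37). That form is consistent with the factorization $g_1^{(1)} = a\,b$ used in the proof, but equation (37) itself is not reproduced here. All function names are illustrative.

```python
import math
from statistics import NormalDist

# Assumption: the null density f_0 is standard normal.
STD_NORMAL = NormalDist()


def likelihood_ratio(x, mu):
    """f_1/f_0 at x, assuming f_1 = N(mu, 1) and f_0 = N(0, 1)."""
    return math.exp(mu * x - 0.5 * mu * mu)


def g1(t, mu):
    """Density of two-sided p-values under the alternative.

    Assumed form of equation (37); q_0(t/2) is taken to be the
    (1 - t/2)-quantile of f_0, which equals 0 at t = 1.
    """
    q = STD_NORMAL.inv_cdf(1.0 - t / 2.0)
    return 0.5 * (likelihood_ratio(q, mu) + likelihood_ratio(-q, mu))


def g1_second_derivative_closed_form(mu):
    """a(1) * b'(1) = (f_1/f_0)''(0) / (4 * f_0(0)^2)."""
    f0_at_0 = STD_NORMAL.pdf(0.0)
    # For the Gaussian location model, (f_1/f_0)''(0) = mu^2 * exp(-mu^2 / 2).
    lr_second_derivative_at_0 = mu * mu * math.exp(-0.5 * mu * mu)
    return lr_second_derivative_at_0 / (4.0 * f0_at_0 ** 2)


def g1_second_derivative_numeric(mu, step=1e-3):
    """One-sided second difference of g1 at t = 1, using only t <= 1."""
    return (g1(1.0, mu) - 2.0 * g1(1.0 - step, mu) + g1(1.0 - 2.0 * step, mu)) / step ** 2


if __name__ == "__main__":
    for mu in (0.5, 1.0, 2.0):
        print(mu, g1_second_derivative_closed_form(mu), g1_second_derivative_numeric(mu))
```

The one-sided difference keeps every evaluation point inside $[0, 1]$, where p-values live. Under the assumed model the two columns should agree up to discretization error.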
