Sequential tests and estimates after overrunning based on $p$-value combination

IMS Collections
Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh
Vol. 3 (2008) 33–45
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/074921708000000039

W. J. Hall∗,1 and Keyue Ding†,2
University of Rochester and Queen's University

∗ Supported in part by grant R01-HL58751 from the National Heart, Lung and Blood Institute (USA).
† Supported by a grant from the Natural Sciences and Engineering Research Council of Canada.
1 Department of Biostatistics, University of Rochester Medical Center, Rochester, NY 14642-8630, USA, e-mail: hall@bst.rochester.edu
2 NCIC Clinical Trials Group, Queen's University, Kingston, ON K7L 3N6, Canada

AMS 2000 subject classifications: Primary 62L10; secondary 62P10.
Keywords and phrases: delayed observations, deletion method, double sampling, lagged data, meta analysis, ML ordering, sequential clinical trial.

Abstract: Often in sequential trials additional data become available after a stopping boundary has been reached. A method of incorporating such information from overrunning is developed, based on the "adding weighted Zs" method of combining $p$-values. This yields a combined $p$-value for the primary test and a median-unbiased estimate and confidence bounds for the parameter under test. When the amount of overrunning information is proportional to the amount available upon terminating the sequential test, exact inference methods are provided; otherwise, approximate methods are given and evaluated. The context is that of observing a Brownian motion with drift, with either linear stopping boundaries in continuous time or discrete-time group-sequential boundaries. The method is compared with other available methods and is exemplified with data from two sequential clinical trials.

Contents
1 Introduction
2 Combining $p$-values by adding weighted Zs and an extension
3 Incorporating overrunning by combining $p$-values
4 Computing $p$-values and confidence bounds
5 True confidence coefficients
6 Computational support for approximations
7 Reversals and power
8 An example: the MADIT study
9 Final remarks
Acknowledgments
References

1. Introduction

Suppose a sequential trial is carried out to test a null hypothesis about a real parameter $\delta$. Once the trial is concluded, a non-sequential trial is conducted, with a test of the same hypothesis. The trials are connected in that the amount of information in the non-sequential trial may depend on data accumulated in the sequential trial. How can the results of the two trials be combined, and a single overall test constructed? The context is that the data, or incremental information, in the non-sequential trial represent "overrunning," from "lagged" data from the sequential trial. T. W. Anderson [1] considered the problem of incorporating lagged data in an accept-reject rule following a sequential probability ratio test and proposed an (approximate) likelihood ratio test.
In the context of modern-day clinical trials, the problem of how to incorporate data from overrunning was raised and discussed by Whitehead [16, 17], who gives an admittedly ad hoc solution, later named the deletion method [14]. This latter paper includes a comparison of the deletion method with methods described herein, under certain limited conditions. Another solution is presented in Hall and Liu [4] (actually an extension of Anderson's likelihood ratio method), along with a discussion of the possible structure of overrunning information in a sequential clinical trial. However, this solution utilizes the maximum-likelihood ordering of the sample space, requiring specification of the details of the stopping rule beyond the time a stopping boundary was first reached, in contrast to stagewise ordering. In this paper, we focus on procedures that do not require such specification. See these references for further introductory material.

In the context of monitoring a Brownian motion with drift by periodic observations (the context considered herein), Whitehead [16] proposes treating the final analysis that incorporates the overrunning data as if it were a scheduled analysis, but ignoring the analysis that led to stopping, and hence involving a deletion. He uses a stepwise ordering (as defined in [6], for example) for computing $p$-values and carrying out further inference. Here we provide another solution, based on the "adding weighted Zs" method of combining $p$-values (Stouffer et al. [15], Mosteller and Bush [10], Liptak [7]); one $p$-value is derived from the sequential experiment (without the overrunning) and the other is based solely on the incremental overrunning data. We recommend weighting the two $p$-values using observed information.
This is fully legitimate only if (i) the amount of information in the non-sequential trial (overrunning) is proportional to that available at termination of the sequential trial, or (ii) the sequential trial was actually nonsequential (and test statistics are normally distributed). For discussion of (i), see [4], Section 2. Another issue arises in popular group-sequential trials: if stopping does not occur until the last scheduled analysis, such an analysis will ordinarily not be done until lagged data are available, in which case a $p$-value will be computed by standard group-sequential methods with a rescheduled final analysis. (This is consistent with the deletion method.) A modification of our method, which combines $p$-values for such trials only when stopping early, is evaluated numerically. Another application of the combination method could be to a double sampling study in which the second sample size depends on the outcome of the first sample (e.g., on its observed variability). These methods are also appropriate for a meta-analysis of two (or more) experiments, whether sequential or not.

Brannath, Posch and Bauer [2] proposed $p$-value combination rules in a different context, namely that of adaptive group-sequential sampling. In their setting, allowance is made for the possibility of not carrying out the second stage. (In our context, this would constitute "preventing overrunning.") If the second stage is carried out, the two $p$-values are combined in a way that (i) preserves an overall significance level and (ii) recognizes the stopping rule. As here, the second-stage $p$-values may be conditional on results from the first stage. They extend their rules to multiple stages recursively. Numerical integration may be required.
The "adding weighted Zs" combination method is described in Section 2 and extended to an ordered sequence of possibly dependent experiments. In Section 3, this method is applied to sequential clinical trials with overrunning. Special attention is given to the case of a constant amount of overrunning information (the case considered in [14]) and to an amount proportional to the amount available at the end of the sequential trial. It is shown that the latter assumption justifies the use of weights related to the observed (and hence random) amounts of information. Otherwise, the use of such random weights leads to a null distribution of the $p$-value which is only approximately uniform. Still, we recommend this usage so long as the approximation is adequate. In Section 4 we show how to use the combination $p$-value method to compute estimates and confidence intervals, and in Section 5 we provide formulas for evaluating the true confidence coefficients associated with these methods, thus enabling an evaluation of the approximations noted above. Some evaluations are summarized in Section 6.

In Sooriyarachchi et al. [14], the issue of reversals in the conclusions after incorporating overrunning (from rejection to acceptance of a null hypothesis or vice versa) was raised. They found, in the cases treated numerically there, that both the deletion method and the combination method might lead to an uncomfortable level of reversals, with the deletion method doing so less frequently. They also noted that both methods (in the cases treated) sometimes lead to reduced power. We consider these issues in Section 7 and indicate a modification of the combination method that reduces these effects. The methods are applied to data from the MADIT trial [8] in Section 8, for both the actual linear-boundary design and an imagined group-sequential version.
Results are compared with those from the deletion and ML-ordering methods. Results from a second example [9] are briefly summarized. Some final comments appear in Section 9, including a summary comparison of the alternative methods for incorporating overrunning.

2. Combining $p$-values by adding weighted Zs and an extension

We suppose some potential data $X$ are to be available for testing a null hypothesis about a real parameter $\delta$ belonging to an interval $\Delta$. For each $\delta_o \in \Delta$, we consider a test of $\delta = \delta_o$ versus $\delta > \delta_o$, with $p$-value $p(x;\delta_o)$ when $X = x$ is observed. Suppose, for each $\delta_o$, $P \equiv p(X;\delta_o)$ is uniformly distributed on $(0,1)$ when $\delta = \delta_o$ and that, for each $x$, $p(x;\delta)$ is increasing in $\delta$. Then $\mathcal{P} \equiv \{p(\cdot\,;\delta_o) \mid \delta_o \in \Delta\}$ defines a proper family of $p$-values for this testing problem.

This is an overly strict definition. We have restricted attention to test functions with continuous distributions, and stochastic ordering (increasing or nondecreasing) of $p$-values would allow for differing sample spaces, but the conditions given meet our application. We usually omit the word "proper." The ordering is needed to avoid possible inconsistencies: data considered to be "more extreme" than those observed should have higher probability under an alternative hypothesis than under the null. Moreover, it facilitates construction of consistently defined confidence bounds. Simply equate the $p$-value for testing $\delta_o$ to $\gamma$ ($1-\gamma$, resp.) and solve for $\delta_o$ to obtain a lower (upper, resp.) confidence bound with confidence coefficient $1-\gamma$. Choosing $\gamma = 0.5$ yields a median-unbiased estimate.

Suppose $p_1$ and $p_2$ are independent $p$-values for the same null hypothesis, and let $z(u) \equiv \bar\Phi^{-1}(u)$, with $\bar\Phi(z) = 1 - \Phi(z)$ and $\Phi$ the standard normal distribution function. Let $w_1$ and $w_2$ be positive numbers for which $w_1^2 + w_2^2 = 1$.
Then
$$p \equiv \bar\Phi\big(w_1 z(p_1) + w_2 z(p_2)\big) \tag{2.1}$$
is the adding weighted Zs combined $p$-value [7, 10, 12, 15]; see also [13]. It is readily seen that the argument of $\bar\Phi$ in (2.1), with $p_i$ replaced by the random variable $P_i$, is distributed as standard normal under the null hypothesis, and hence $P$ in (2.1) is distributed as $U(0,1)$. Moreover, $p$ tends to be small whenever both $p_1$ and $p_2$ are small. More precisely, $p$ is proper whenever $p_1$ and $p_2$ are proper: the $p_i$'s are increasing in $\delta_o$, so the $z(p_i)$'s are decreasing, as is any positively weighted linear combination of them, and hence $p$ is increasing in $\delta_o$. The interpretation should be clear: $z(p_i)$ is a standardized normal deviate that corresponds to the test statistic on which $p_i$ is based (whether or not $p_i$ was based on a normally distributed statistic), and the argument of $\bar\Phi$ in (2.1) represents a (weighted) pooling of normal deviates for the two independent tests, with $p$ the resulting $p$-value.

This combination method may be extended to settings where $p_1$ and $p_2$ are derived from overlapping data sets but $p_2$ is a conditional $p$-value given each possible value of the data on which $p_1$ is based. A possible context is that a second experiment was designed based on the outcome of the first experiment, and a conditional test was used in the second experiment. Formally:

Proposition 2.1. Suppose $p_1 \equiv p_1(x;\delta)$ and $p_2 \equiv p_2(x,y;\delta)$, and $\mathcal{P}_1 \equiv \{p_1(\cdot\,;\delta_o) \mid \delta_o \in \Delta\}$ is a family of $p$-values and $\mathcal{P}_2 \equiv \{p_2(x,\cdot\,;\delta_o) \mid \delta_o \in \Delta\}$ is, for each $X = x$, a family of conditional $p$-values. Then $\mathcal{P}_2$ is a family of unconditional $p$-values, $p_1(X;\delta_o) \perp p_2(X,Y;\delta_o)$ (independent) for each $\delta_o$, and (2.1) defines a family of $p$-values.

Proof. Since $p_2(X,Y;\delta_o)$ is conditionally $U(0,1)$ for every $X$, it is unconditionally $U(0,1)$, and the needed monotonicity also follows.
For each $\delta_o$, the joint distribution function of $(P_1, P_2)$ is
$$\Pr\{P_1 \le u_1, P_2 \le u_2\} = E\big[1(p_1(X) \le u_1)\cdot E\{1(p_2(X,Y) \le u_2) \mid X\}\big] = E\big[1(p_1(X) \le u_1)\cdot u_2\big] = u_1 u_2,$$
from which independence follows, and this is sufficient for the claim about $p$.

Now what about the weights? Ordinarily, they might be related to sample size or information. Specifically, if the $p_i$'s are derived from tests based on means of $n_i$ normally distributed observations (with common variance), then a combined $p$ with $w_i \propto \sqrt{n_i}$ would yield the same $p$ as that from a pooling of the two samples. So far, we have only assumed the weights to be positive constants, depending neither on $\delta_o$ nor on the data. Here are some partial extensions; examples of each appear in the next section. The weights may depend on $\delta_o$ without affecting the null distribution of $P$ in (2.1), but the monotonicity in $\delta_o$ may be destroyed except for special choices. The weights may be random (depending on $X$) without affecting the monotonicity in $\delta_o$, but would typically disturb the uniformity of the null distribution of $P$.

It should be emphasized that all $p$-values considered above are for one-sided alternatives. After including overrunning, the usual convention of doubling them for two-sided alternatives may be appropriate.

Finally, we note that all of this can be directly extended to an ordered set of several $p$-values, each involving new data and conditional on all past data, and combined in the "adding weighted Zs" fashion. Specifically, let $p_k$ be a $p$-value for the incremental stage-$k$ data, conditional on data from all prior stages.
Then define a stage-$k$ combination $p$-value by replacing the argument of $\bar\Phi$ in (2.1) by $\sum_{i=1}^{k} w_{k:i}\, z(p_i)$, with stage-$k$ weights all positive, satisfying $\sum_{i=1}^{k} w_{k:i}^2 = 1$ and $w_{k:i}^2 = w_{k-1:i}^2\,(1 - w_{k:k}^2)$ for $i < k$. Equivalently, (2.1) may be applied recursively, replacing $p_1$ by a combined $p$ from earlier stages, with weight $w_1$ for this new $p_1$ and $w_2$ for the incremental data, where $w_1^2 + w_2^2 = 1$.

3. Incorporating overrunning by combining $p$-values

We now assume a sequential experiment takes place, resulting in an observation of $(T, X)$, say. We focus on the context of observing a Brownian motion $X(t)$ with drift $\delta$, with a stopping time $T$ and $X \equiv X(T)$ upon stopping, but other contexts may be treated similarly. After stopping, some additional data become available, represented by further observation of the process for $t_o = t_o(T, X)$ units of time. Conditional on $t_o$, a sufficient statistic for the overrunning data is the increment $Y$ observed during the overrunning time increment $t_o$. In other words, a sequential experiment is followed by a non-sequential one, with sample size (observation time) depending on the outcome of the sequential trial. There may be additional randomness in $t_o$; it is sufficient to let $t_o(t, x)$ be the conditional expectation of overrunning information, given $(T, X) = (t, x)$. See [4], Section 2, for discussion supporting $t_o$ being a constant, $\propto \sqrt{t}$, or $\propto t$ as possible approximations to reality.

Upon reaching a stopping boundary, a $p$-value $p_1$ is defined for a null hypothesis about the drift parameter: $\delta = \delta_o$ versus $\delta > \delta_o$. At the end of overrunning, a conditional $p$-value $p_2$ is simply $\bar\Phi\big((y - \delta_o t_o)/\sqrt{t_o}\big)$, given $t_o = t_o(t, x)$. A combination $p$-value is therefore given by (2.1). (Here, $(T, X)$ plays the role of $X$ in Section 2.) Hence:

Corollary 3.1. Suppose $w_1$ and $w_2$ are positive constants for which $w_1^2 + w_2^2 = 1$. Then
$$p(t,x,y;\delta_o) \equiv \bar\Phi\left(w_1 z\big(p_1(t,x;\delta_o)\big) + w_2\,\frac{y - \delta_o t_o}{\sqrt{t_o}}\right)\Bigg|_{t_o = t_o(t,x)}, \qquad \delta_o \in \Delta, \tag{3.1}$$
defines a family of $p$-values.

But how should the weights be chosen? It is tempting to choose them to be proportional to the square root of information in the respective parts of the experiment. Then each summand in (3.1) would have variance, or conditional variance, proportional to the information in that part of the experiment. Using expected information, $w_1^2 = E_{\delta_o}(T)/[E_{\delta_o}(T) + E_{\delta_o} t_o(T, X)]$ and $w_2^2 = 1 - w_1^2$. But, as noted in Section 2, this would not typically preserve the needed monotonicity of $p(\delta_o)$ in (3.1). Moreover, knowledge of the functional form of the dependence of $t_o$ on $(t, x)$ would be needed. If $t_o$ were constant, this would yield
$$p(t,x,y;\delta_o) = \bar\Phi\left(\frac{[E_{\delta_o}(T)]^{1/2}\, z\big(p_1(\delta_o)\big) + y - \delta_o t_o}{[E_{\delta_o}(T) + t_o]^{1/2}}\right).$$
This could be used as a $p$-value for a single null hypothesis, but it would not be suitable for construction of confidence bounds, unless $E_{\delta_o}(T)$ were replaced by $E_{\delta'_o}(T)$ for a fixed $\delta'_o$. Because of these limitations, we abandon this approach.

Suppose instead we use the square roots of observed information, namely $\sqrt{t}$ and $\sqrt{t_o}$, yielding
$$p(t,x,y;\delta_o) = \bar\Phi\left(\frac{t^{1/2}\, z\big(p_1(t,x;\delta_o)\big) + y - \delta_o t_o(t,x)}{[t + t_o(t,x)]^{1/2}}\right). \tag{3.2}$$
Monotonicity in $\delta_o$ (for each $(t, x, y)$) is maintained, but the uniformity of the null distribution would appear to be in doubt. However, to compute $p$, no knowledge of the dependency structure of $t_o$ is required, only its observed value.

We now consider the special case of (3.1) and (3.2) with $t_o \propto t$, say $t_o = ct$.
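Before working through this special case, the combination rules can be made concrete with a minimal Python sketch (standard library only). `weighted_z` implements (2.1), extended to several independent $p$-values; `recursive_weighted_z` implements the stage-by-stage form of Section 2 under the illustrative assumption $w_{k:k}^2 = n_k/(n_1 + \cdots + n_k)$ for stage information $n_k$; and `combined_p_observed` implements (3.2). The sequential $p$-value $p_1$ has to come from design-specific software; `p1_fixed` is a hypothetical stand-in with no stopping rule, for which (3.2) reduces to the pooled $p$-value $\bar\Phi\big((x + y - \delta_o(t + t_o))/\sqrt{t + t_o}\big)$.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def z_of(u):
    """z(u) = inverse of Phi-bar: upper-tail standard normal deviate for tail area u."""
    return _STD.inv_cdf(1.0 - u)


def phibar(z):
    """Phi-bar(z) = 1 - Phi(z)."""
    return 1.0 - _STD.cdf(z)


def weighted_z(pvals, weights):
    """Adding-weighted-Zs combination (2.1) for several independent p-values.

    Positive weights are rescaled so that their squares sum to 1.
    """
    norm = sqrt(sum(w * w for w in weights))
    return phibar(sum((w / norm) * z_of(p) for p, w in zip(pvals, weights)))


def recursive_weighted_z(pvals, info):
    """Fold stages in one at a time, with w_{k:k}^2 = info_k / (info_1 + ... + info_k)."""
    combined, total = pvals[0], info[0]
    for p_k, n_k in zip(pvals[1:], info[1:]):
        total += n_k
        w_new = sqrt(n_k / total)
        combined = weighted_z([combined, p_k], [sqrt(1.0 - w_new * w_new), w_new])
    return combined


def combined_p_observed(p1, t, x, y, t_o, delta_o):
    """Combination p-value (3.2): observed-information weights sqrt(t) and sqrt(t_o).

    p1(t, x, delta_o) is the sequential-stage p-value, supplied by the caller.
    """
    num = sqrt(t) * z_of(p1(t, x, delta_o)) + y - delta_o * t_o
    return phibar(num / sqrt(t + t_o))


def p1_fixed(t, x, delta_o):
    """Hypothetical first stage without a stopping rule (fixed-time Brownian motion)."""
    return phibar((x - delta_o * t) / sqrt(t))


# Example with hypothetical numbers: t = 10, x = 6, then t_o = 2, y = 2
p = combined_p_observed(p1_fixed, 10.0, 6.0, 2.0, 2.0, 0.0)
```

With these stage weights the recursion reproduces the one-shot combination with weights proportional to $\sqrt{n_i}$, which is the pooling property noted in Section 2.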
Since $w_1^2 = t/(t + ct) = 1/(1 + c)$ and $w_2^2 = c/(1 + c)$, this yields constant weights; and $c$ is known once $T = t$ and $t_o$ are observed. Hence, the use of observed information in this case is justified.

Corollary 3.2. If, for some constant $c$, $t_o(x, t) = ct$ for all $(x, t)$, then (3.2) defines a family of $p$-values.

For the group-sequential case with up to $K$ analyses and stagewise ordering, we modify the combination $p$-value (3.2). For testing $\delta_o$, with $t_{ok} \equiv t_o(k)$, the modified $p$-value is defined as
$$p^*(t_k, x, y; \delta_o) = \begin{cases} \bar\Phi\Big(\big[t_k^{1/2}\, z\big(p_1(t_k, x; \delta_o)\big) + y - \delta_o t_{ok}\big] \big/ (t_k + t_{ok})^{1/2}\Big) & \text{if } k < K, \\ p_1(t_K + t_{oK},\, x + y;\, \delta_o) & \text{if } k = K, \end{cases} \tag{3.3}$$
where $p_1(t, x; \delta_o)$ is the group-sequential stagewise $p$-value for testing $\delta = \delta_o$ versus larger values when the analyses are scheduled at $t_1, \ldots, t_{K-1}, t_{K\bullet} \equiv t_K + t_{oK}$, with early-stopping sets $S_k$ ($k < K$), each the complement of an interval. This matches the deletion method when stopping has not occurred early. For a group-sequential ML-ordering, the ML-ordering method [4] may be more suitable.

We show in Sections 5 and 6 that use of $p$ in (3.2) or (3.3) for constructing confidence bounds and intervals, for several choices of the dependency of $t_o$ on $(t, x)$ and two popular sequential designs, may yield adequately accurate confidence coefficients. This leads us to recommend the use of (3.2) or (3.3) as if it were a bona fide combination $p$-value, if the design chosen and the likely form of dependency are similar to those considered in Section 6. One last variation permits further adjustment of the weighting: use weights with squares proportional to $T$ and $\rho\, t_o(T)$ for a specified weighting factor $\rho > 0$. For motivation, see Section 7.

4. Computing $p$-values and confidence bounds

Here we act as if $t_o \propto t$, and discuss the use of (3.2) and (3.3) for obtaining $p$-values and, by inversion, confidence bounds and intervals. For any particular null value $\delta_o$, the combined $p$-value $p(\delta_o)$ may be computed from (3.2) or (3.3) with $t$ (or $t_k$), $x$, $y$ and $t_o$ the observed values, using software that enables computation of $p_1(\delta_o)$. For general linear boundaries, such software is available from the authors (based on formulas in [3]), and the PEST software [11] provides such output for a limited selection of linear boundaries and group-sequential modifications of them. For group-sequential boundaries with stagewise ordering, a program (built around software for $p_1(\delta_o)$ from Jennison [5]) is available from the authors.

To obtain an upper confidence bound with confidence coefficient $\gamma$, we need to solve $p(\delta) = \gamma$ for $\delta = \hat\delta_U$, or equivalently, solve $z(p(\delta)) = z(\gamma)$. A little algebra leads to the equivalent problem (except in the group-sequential case with $t = t_K$) of solving
$$\delta - h(\delta) - [y - \sqrt{t_\bullet}\, z(\gamma)]/t_o = 0,$$
where $t_\bullet \equiv t + t_o$ and $h(\delta) \equiv \sqrt{t}\, z(p_1(\delta))/t_o$. Starting from a trial solution $\delta_o$, and computing $h(\delta_o)$ and $h(\delta_o + \epsilon)$ for some small $\epsilon$, an improved solution is
$$\delta \equiv \delta_o - \frac{\delta_o - h(\delta_o) - [y - \sqrt{t_\bullet}\, z(\gamma)]/t_o}{1 + [h(\delta_o) - h(\delta_o + \epsilon)]/\epsilon}.$$
We find that two or three iterations provide good accuracy. (When $t = t_K$, we need only solve $p_1(t_{K\bullet}, x + y; \delta) = \gamma$.) Alternatively, a trial-and-error approach works quite satisfactorily.

5. True confidence coefficients

We now evaluate the true confidence coefficient for a confidence bound or interval determined by using (3.2), whether or not $t_o$ is proportional to $T$.
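The bound $\hat\delta_\gamma$ studied here is produced by the update of Section 4. A minimal Python sketch of that update for (3.2) follows; the case $t = t_K$ in (3.3) is not covered. The caller supplies $p_1$; `p1_fixed` is a hypothetical fixed-time first stage, chosen because the answer is then available in closed form, $(x + y - \sqrt{t_\bullet}\, z(\gamma))/t_\bullet$, which makes the sketch easy to check. The starting value, the step $\epsilon$ and the iteration count are illustrative choices, not recommendations from the text.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def z_of(u):
    """z(u) = Phi-bar^{-1}(u), the upper-tail standard normal deviate."""
    return _STD.inv_cdf(1.0 - u)


def upper_conf_bound(p1, t, x, y, t_o, gamma, start=0.0, eps=1e-4, iters=5):
    """Upper confidence bound with nominal coefficient gamma from (3.2).

    Solves delta - h(delta) - [y - sqrt(t_bullet) z(gamma)] / t_o = 0 with the
    Section 4 update, using a forward difference of h over a step eps.
    p1(t, x, delta) is the sequential-stage p-value, supplied by the caller.
    """
    t_bullet = t + t_o
    target = (y - sqrt(t_bullet) * z_of(gamma)) / t_o

    def h(d):
        return sqrt(t) * z_of(p1(t, x, d)) / t_o

    d = start
    for _ in range(iters):
        h_d = h(d)
        slope = 1.0 + (h_d - h(d + eps)) / eps
        d -= (d - h_d - target) / slope
    return d


def p1_fixed(t, x, delta):
    """Hypothetical fixed-time first stage (no stopping rule), used for checking."""
    return 1.0 - _STD.cdf((x - delta * t) / sqrt(t))


# Hypothetical data: t = 10, x = 6, then t_o = 2, y = 2
t, x, y, t_o = 10.0, 6.0, 2.0, 2.0
lower = upper_conf_bound(p1_fixed, t, x, y, t_o, 0.025)    # lower end of a 95% interval
estimate = upper_conf_bound(p1_fixed, t, x, y, t_o, 0.5)   # median-unbiased estimate
upper = upper_conf_bound(p1_fixed, t, x, y, t_o, 0.975)    # upper end of a 95% interval
```

Calling the same routine with $\gamma = 0.025$, $0.5$ and $0.975$ gives the two ends of an equal-tail 95% interval and the median-unbiased estimate, following the inversion rule of Section 2.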
Let $\hat\delta_\gamma$ be an upper confidence bound determined by the method of the previous section for a nominal confidence coefficient $\gamma$. The question is: what is the true confidence coefficient? We need to evaluate, for given $\gamma$ and $\delta$, $q_\gamma(\delta) \equiv P_\delta(\delta < \hat\delta_\gamma)$ and determine $q^o_\gamma \equiv \inf_\delta q_\gamma(\delta)$. As noted in Section 4, $\hat\delta_\gamma$ is the solution to $\delta - h(\delta) = [y - \sqrt{t_\bullet}\, z(\gamma)]/t_o \equiv g(y, t_o, t_\bullet, \gamma)$. Since $h$ is decreasing in $\delta$, the left side is increasing in $\delta$, and hence $\delta < \hat\delta_\gamma$ iff $\delta - h(\delta) < g(y, t_o, t_\bullet, \gamma)$. This latter event is equal to the event
$$(y - \delta t_o)/\sqrt{t_o} > \big[-\sqrt{t}\, z\big(p_1(t, x; \delta)\big) + \sqrt{t_\bullet}\, z(\gamma)\big]/\sqrt{t_o}.$$
Therefore, conditioning on $(T, X)$ and hence on $T_o$, and writing $T_\bullet \equiv T + T_o$, we have
$$q_\gamma(\delta) = E_\delta P_\delta(\delta < \hat\delta_\gamma \mid T, X) = E_\delta\, \Phi\left(\frac{T^{1/2}\, z\big(p_1(T, X; \delta)\big) - T_\bullet^{1/2}\, z(\gamma)}{T_o^{1/2}}\right). \tag{5.1}$$
If this combination $p$-value were bona fide (that is, if $T_o \propto T$), the result would be $\gamma$ identically in $\delta$.

The true confidence coefficient for an (equal-tail) confidence interval based on (3.2) may be obtained similarly. For an interval with nominal confidence coefficient $\gamma$, the true confidence coefficient is $Q^o_\gamma \equiv \inf_\delta Q_\gamma(\delta)$, where
$$Q_\gamma(\delta) \equiv q_{(1+\gamma)/2}(\delta) - q_{(1-\gamma)/2}(\delta). \tag{5.2}$$
For the group-sequential modification (3.3), (5.1) needs to be modified when $T = T_K$.

6. Computational support for approximations

Here we report on some numerical evaluations of the validity of using (3.2) or (3.3) when $t_o$ is not proportional to the observed stopping time $t$, and of the validity of using the combination method only when stopping after an interim analysis. For various special cases and many values of $\delta$, we computed (5.1) for $\gamma = 0.5$ and (5.2) for $\gamma = 0.9$ and $0.95$ to see how close they are to the respective nominal values of 0.5, 0.9 and 0.95. We summarize some of the findings here.
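Evaluations of this kind amount to averaging the integrand of (5.1) over the distribution of $(T, z(p_1(T, X; \delta)))$ under the true $\delta$. The minimal Python sketch below assumes such draws are available (for example, from simulating the design; that step is not shown), together with a hypothetical rule `t_o_of` giving the overrunning information. As a self-check it uses the proportional case $t_o = 0.3\,T$, where the answer must be $\gamma$ whatever the stopping times, with standard normal quantiles standing in for random draws of $z(p_1)$.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def q_gamma(draws, t_o_of, gamma):
    """Estimate (5.1): average of Phi([T^1/2 z1 - T_bullet^1/2 z(gamma)] / T_o^1/2).

    draws: sequence of (T, z1) with z1 = z(p1(T, X; delta)), generated under the
    true delta; t_o_of: hypothetical rule giving overrunning information T_o.
    """
    z_g = _STD.inv_cdf(1.0 - gamma)  # z(gamma)
    total = 0.0
    for T, z1 in draws:
        T_o = t_o_of(T)
        T_bullet = T + T_o
        total += _STD.cdf((sqrt(T) * z1 - sqrt(T_bullet) * z_g) / sqrt(T_o))
    return total / len(draws)


def Q_gamma(draws, t_o_of, gamma):
    """Interval coverage (5.2): q_{(1+gamma)/2} - q_{(1-gamma)/2}."""
    return q_gamma(draws, t_o_of, (1 + gamma) / 2) - q_gamma(draws, t_o_of, (1 - gamma) / 2)


# Self-check in the proportional case t_o = 0.3 T: z1 runs over standard normal
# quantiles (a deterministic stand-in for draws) and T cycles over arbitrary values.
n = 20000
draws = [(5.0 + (i % 7), _STD.inv_cdf((i + 0.5) / n)) for i in range(n)]


def proportional(T):
    return 0.3 * T
```

Replacing `proportional` by a constant rule, and the quantile grid by simulated draws from a real design, gives the kind of non-proportional comparison reported below; the sketch itself does not reproduce those numbers.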
A linear-boundary design: Consider triangular boundaries for testing $\delta = 0$ versus $\delta = 1$ with intercepts $\pm 5.99$, slopes 0.75 and 0.25, and apex at $t = 23.97$. This design has both error probabilities 0.025. The design may be adapted for testing $\delta = 0$ versus $\delta \neq 0$ (as prescribed by the PEST software). The resulting one-sided rejection region is the upper boundary, for which the power at $\delta_1 \equiv 0.8233$ is 0.9. The expected stopping time is 7.776 at $\delta = 0$ (or 1) and 9.382 at $\delta_1$, and has its maximum of 11.217 at $\delta = 0.5$.

We considered $t_o \propto T$, $t_o$ constant and $t_o \propto \sqrt{T}$. For the first case, we simply verified the accuracy of our computer program, finding that the distribution of the $p$-value was exactly uniform, and that the true confidence coefficients matched the nominal ones exactly. For the constant case, we considered $t_o = c\,E_{\delta_1}(T)$ with $c$ ranging from 0.1 to 0.5. Here are selected results:

| $c = 0.1$ | $c = 0.5$ |
|---|---|
| $0.487 < q_{.5} < 0.513$ | $0.471 < q_{.5} < 0.529$ |
| $0.900 < Q_{.9} < 0.908$ | $0.899 < Q_{.9} < 0.917$ |
| $0.950 < Q_{.95} < 0.955$ | $0.949 < Q_{.95} < 0.960$ |

For $c = 0.1$, the true confidence coefficients $Q^o_\gamma$ for nominal 90% and 95% confidence intervals are therefore correct (to 3 decimal places), and they are only slightly below the nominal values for $c = 0.5$. However, the median-unbiased estimate may have a few percentage points of median-bias, depending on the true $\delta$. We also found that $q_{.5} < 0.5$ for $\delta > 0.5$ and vice versa. Computations for $c$-values between 0.1 and 0.5 yielded bounds between the respective ones in the display above. Results for $t_o \propto \sqrt{T}$ were uniformly better than those for $t_o$ constant.

An O'Brien–Fleming group-sequential design: Consider an O'Brien–Fleming two-sided design for testing $\delta = 0$ with significance level 0.05 and power 0.9 at $\delta = \pm 1$, with a maximum of 5 analyses.
We assume equally spaced interim analyses, at 0.2, 0.4, 0.6, 0.8 times $t_5 \equiv 10.781$, with boundary values of $\pm 6.6988$ (obtained from [6]). Again, we considered $t_o \propto T$ for the unmodified combination $p$-value to confirm the accuracy of our programs. For the modified $p$, we considered $t_o$ constant, namely $t_o = c\,t_5$, and $t_o = c\,t_k$; in each case, $c$ ranged from 0.02 to 0.1. Here are some of the results:

| | $t_o = c\,t_5$, $c = 0.02$ | $t_o = c\,t_5$, $c = 0.1$ | $t_o = c\,t_k$, $c = 0.02$ | $t_o = c\,t_k$, $c = 0.1$ |
|---|---|---|---|---|
| $q^o_{.5}$ | 0.475 | 0.445 | 0.478 | 0.451 |
| $Q^o_{.9}$ | 0.894 | 0.887 | 0.894 | 0.888 |
| $Q^o_{.95}$ | 0.947 | 0.943 | 0.947 | 0.943 |

Again, although $q^o_{.5}$ may be as small as 0.44 (and by symmetry $0.44 < q_{.5}(\delta) < 0.56$), we found that $q_{.5}$ was usually within $\pm 0.01$ of 0.5. Indeed, this occurred for all but 1%, 7%, 1% and 4%, respectively (reading from left to right in the display above), of the range of $\delta$-values within $\pm 2.5$.

7. Reversals and power

Sooriyarachchi et al. [14] raised concern about the frequency of reversals of acceptance and rejection conclusions after inclusion of overrunning information, but stressed their desire not to ignore such information. In simulation studies of the deletion and combined $p$-value methods, with constant amounts of lagged data (independent of the results at the time of stopping), they found levels of reversals that they considered worrisome, especially for the combination method (perhaps 3 or 4 per cent). However, in popular group-sequential designs such as O'Brien–Fleming, reversals were rare and only defined when the trial stopped early, as an analysis at a final scheduled time would ordinarily await lagged data before execution. Of more concern to us is their finding that both methods may lead to reductions in power.
Intuitively, when a rejection occurs "early," overrunning can reverse it, but the chances of compensating with reversals in the other direction may be minimal. With constant overrunning information, our computations (not reported here) confirm theirs, but we find reversals to be somewhat less frequent when overrunning information increases with stopping times, and losses in power are then rarer.

A possible compromise method is as follows: down-weight the overrunning $p$-value in the combination formula. By introducing a factor $\rho$ (see end of Section 3), it is possible to maintain power and depress the frequency of reversals but still not ignore the lagged data completely. However, computations show that some situations will require extensive down-weighting (small $\rho$). Choice of a suitable $\rho$ will require computational trial-and-error, with assumptions about overrunning needed. For this purpose, we provide the following formulas. When the true drift is $\delta$ (and stopping is in continuous time), the probability of rejection upon stopping followed by acceptance after inclusion of overrunning, when $t_o \propto t$ (say $t_o = ct$), is
$$P_\delta(R \to A) = \int_0^{t_{\max}} \Phi\left\{\left[\frac{1 + \rho c}{\rho c}\right]^{1/2} z_\alpha - \left[\frac{1}{\rho c}\right]^{1/2} z_1(U, t) - \delta\,(ct)^{1/2}\right\} dP^U_\delta(t), \tag{7.1}$$
with $z_1(U, t)$ being the standard normal deviate for which the right-hand tail area beyond it is $P^U_0(t)$ (the $p$-value when the upper boundary $U$ is crossed at time $t$), $P^U_\delta(t)$ being the probability of crossing the upper boundary before the lower one prior to $t$, and $\alpha$ being the one-sided significance level for testing $\delta = 0$. For group-sequential tests, the integrator in (7.1) is $dP^U_\delta(x, t)$, indicating a need to integrate over $x$-values where $t = t_k$ and the upper boundary has been reached, but $t$ may be restricted to $\{t_k \mid k < K\}$ since reversals at a final analysis have no role.
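Because $\rho$ is chosen by trial and error, it can help to code the integrand of (7.1). The minimal Python sketch below evaluates it for a single crossing and forms a Riemann–Stieltjes sum over a grid. The grid of crossing times, the deviates $z_1(U, t)$ and the masses approximating $dP^U_\delta(t)$ must come from boundary-crossing computations for the design at hand; they are assumed inputs here, not something this sketch computes.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def r_to_a(z1, t, delta, c, rho, alpha):
    """Integrand of (7.1): probability that a rejection at an upper-boundary
    crossing at time t is reversed to acceptance after overrunning t_o = c*t,
    when the overrunning term is down-weighted by the factor rho."""
    z_alpha = _STD.inv_cdf(1.0 - alpha)  # z_alpha = Phi-bar^{-1}(alpha)
    rc = rho * c
    arg = sqrt((1.0 + rc) / rc) * z_alpha - sqrt(1.0 / rc) * z1 - delta * sqrt(c * t)
    return _STD.cdf(arg)


def prob_r_to_a(grid, delta, c, rho, alpha):
    """Discretized (7.1): sum of integrand values weighted by crossing masses.

    grid: list of (t, z1, mass), where z1 = z1(U, t) and mass approximates
    dP^U_delta(t) on a grid of crossing times (assumed to be supplied).
    """
    return sum(m * r_to_a(z1, t, delta, c, rho, alpha) for t, z1, m in grid)
```

The integrand is just the probability that the combined statistic, with squared weights $1/(1+\rho c)$ and $\rho c/(1+\rho c)$, falls below $z_\alpha$ after overrunning; shrinking $\rho$ drives it toward 0 for crossings with $z_1 > z_\alpha$, which is the down-weighting trade-off described above.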
Similarly, $P_\delta(A \to R)$ is given by (7.1) with $U$ replaced by $L$ (for lower boundary) and $\Phi$ replaced by $\bar\Phi$. Finally, the power after inclusion of overrunning, when the power of the original design is $\mathrm{pow}(\delta)$, is
$$\mathrm{ovpow}(\delta) = \mathrm{pow}(\delta) - P_\delta(R \to A) + P_\delta(A \to R).$$
(Software is available from the authors.)

8. An example: the MADIT study

MADIT (Multicenter Automatic Defibrillator Implantation Trial [8]) was a randomized clinical trial conducted to evaluate the effectiveness of an implanted defibrillator compared with conventional drug therapy in reducing mortality associated with ventricular arrhythmias. Monitoring was based on the logrank statistic plotted against its estimated variance [17]. This behaves like a Brownian motion with drift $\delta = -\log(\mathrm{HR})$, where HR is the hazard ratio of the treatment-to-control arms (assuming proportional hazards). The essential features were reviewed in [4] and are summarized here.

A triangular design was used that assures a two-sided significance level of 5% and a power of 90% at a hazard ratio of 0.537 (drift = 0.6218). Monitoring was carried out weekly over the five years of the trial, thereby yielding nearly continuous observation of the logrank process. The stopping boundaries were $u_t = 7.935 + 0.189\,t$ and $l_t = -7.935 + 0.566\,t$, with the early part of the lower boundary ($l_t$) a rejection region for superiority of the control arm. Interpolating, the upper boundary was reached at $t = 12.145$ with $x = 10.230$, later corrected to $t = 12.037$ and $x = 10.210$. The incremental coordinates for overrunning were $t_o = 1.240$ and $y = 2.957$, showing an upturn in the sample path after reaching the boundary.
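The design quantities above can be checked with a few lines of Python. This sketch only verifies arithmetic stated in the text: that the reported crossing points lie on $u_t$, that a drift of 0.6218 corresponds to $\mathrm{HR} = 0.537$ under $\delta = -\log(\mathrm{HR})$, and how a drift interval maps to an HR interval (the ends swap because $\exp(-\delta)$ is decreasing; for example, $(0.388, 1.484)$ maps to about $(0.227, 0.678)$). The function names are illustrative.

```python
from math import exp, log


def upper_boundary(t):
    """MADIT upper stopping boundary u_t = 7.935 + 0.189 t."""
    return 7.935 + 0.189 * t


def lower_boundary(t):
    """MADIT lower stopping boundary l_t = -7.935 + 0.566 t."""
    return -7.935 + 0.566 * t


def drift_from_hr(hr):
    """Drift of the logrank process, delta = -log(HR)."""
    return -log(hr)


def hr_interval(delta_lo, delta_hi):
    """Map a drift interval to an HR interval; exp(-delta) is decreasing, so the ends swap."""
    return exp(-delta_hi), exp(-delta_lo)


# Corrected crossing point and the state after overrunning
t_stop, x_stop = 12.037, 10.210
t_o, y = 1.240, 2.957
t_end, x_end = t_stop + t_o, x_stop + y  # total information and score after overrunning
```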
Respective $p$-values and estimates of the drift and of the HR are presented below, contrasting analyses without and with use of the overrunning data. Values in square brackets are those reported in [4] for the ML-ordering method, assuming $t_o \propto \sqrt{t}$; with linear boundaries and no overrunning, stepwise ordering and ML-ordering are identical.

Inference about the drift $\delta$
Overrunning   2-sided p         Median-unbiased estimate   95% confidence interval
without       0.0084            0.786                      (0.204, 1.361)
with          0.0009 [0.0029]   0.938 [0.939]              (0.388, 1.484) [(0.329, 1.543)]

Inference about the hazard ratio HR $= \exp(-\delta)$
Overrunning   2-sided p         Median-unbiased estimate   95% confidence interval
without       0.0084            0.456                      (0.256, 0.815+)
with          0.0009 [0.0029]   0.391 [0.391]              (0.227, 0.678) [(0.214, 0.720)]

Both methods reflect the upturn in the sample path during overrunning, as the "with" $p$-values are smaller and the estimates farther from the null values. But the combination method gives the smaller $p$-value and narrower confidence intervals; this may reflect the different orderings used by the two methods. Values reported in [8] were based on Whitehead's deletion method; they are identical to the "without overrunning" values in the display above, as the deletion method essentially ignores overrunning when the path continues in a similar direction and monitoring is near-continuous. (It was this observation that inspired the development of alternative methods for incorporating overrunning.) In such settings, the deletion-method $p$-value cannot be smaller than when computed upon first hitting an upper boundary, irrespective of the nature of the overrunning data. (For, when reaching the upper boundary at time $t$, with the prior analysis a short time earlier, at time $t^-$ say, and then overrunning to $x_o$ at a later time $t_o$, the deletion one-sided $p$ is the null probability of $\{T \le t^- \text{ and } X(T) \ge u_T\} \cup \{T \ge t \text{ and } X(t_o) \ge x_o\}$.
These two events are disjoint, and the former is virtually the extremal set without overrunning.)

We now consider a group-sequential variation on MADIT as described in [4]. We pretended that an O'Brien–Fleming 5-analysis design was used for testing $\delta = 0$ versus $\delta \neq 0$ with power 80% at an HR of 0.537. It would have stopped at the third interim analysis with the results obtained upon hitting the boundary in MADIT. Results of analyses are reported below; for comparison, values in square brackets are those reported in Table 2 of [4] using the ML-ordering method and assuming $t_o \propto \sqrt{t}$. Values for the deletion method, which treats the analysis after overrunning as a replacement for the third scheduled analysis, are also given. In each case, i.e., without or with overrunning, results from the group-sequential combination method indicate a more significant departure from the null value HR = 1 than do those from the group-sequential ML-ordering method. At least for the "without" results, this is attributable to the different orderings used. This time the deletion method gives results similar to those from ML-ordering. (These results are not directly comparable to those in the previous table, since the pretended group-sequential design has reduced power.)

Group-sequential inference about the hazard ratio
Overrunning       2-sided p         Median-unbiased estimate of HR   95% confidence interval
without           0.0039 [0.0041]   0.431 [0.468]                    (0.244, 0.762) [(0.29$\bar{5}$, 0.77$\bar{5}$)]
with              0.0004 [0.0011]   0.373 [0.384]                    (0.217, 0.641) [(0.221, 0.672)]
Deletion method   0.0014            0.384                            (0.221, 0.680)

Here is a brief summary of results from a second defibrillator trial, MADIT-II [9], in which the combination method was pre-specified. The design was again triangular, with a 5% two-sided significance level and power 95% at an HR of 0.627: $u_t = 11.77 + 0.1273t$ and $l_t = -11.77 + 0.3819t$. This time $(t, x, t_o, y) = (45.415, 17.551, 0.483, 1.441)$. The results were:

Inference about the hazard ratio in MADIT-II
Overrunning   2-sided p       Median-unbiased estimate of HR   95% confidence interval
without       0.028           0.708                            (0.525, 0.962)
with          0.016 [0.023]   0.688 [0.689]                    (0.511, 0.932) [(0.504, 0.948)]

Again, the deletion method of incorporating overrunning would have agreed with the "without" analysis, and results from the ML-ordering method (in square brackets) are mainly intermediate.

9. Final remarks

Proposition 1 applies to other methods of combining $p$-values, such as Fisher's summing of $-\log(1 - p_i)$. (For a description of such methods, see [12] or [13].) We chose the "adding Zs" method for two reasons: (i) it lends itself naturally to weights (it would be unreasonable to give equal weights to a long trial and a small amount of overrunning), and (ii) it reduces to standard normal-theory methods when the sequential component is replaced by a non-sequential one, or equivalently, if a naive analysis is done after stopping rather than one recognizing the stopping rule.

Here are some of the pros and cons of various methods for incorporating overrunning:

(a) Deletion method: Not suitable for near-continuous monitoring. Ignores the fact that, at the boundary-hitting stage, the monitoring statistic was in a stopping region, but is the natural approach in a group-sequential trial when early stopping has not occurred. Simple to use. Results in approximate $p$-values and final inference. Limited computations show that a loss in power may occur.

(b) Combination $p$-value method: Makes direct use of the analysis that led to stopping. Approximate except when $t_o \propto T$, and even then for common group-sequential designs. Uses stage-wise ordering, and hence free of any direct dependence on future stopping boundaries. Needs no formal assumption about the form of $t_o$. Computations show a loss in power may occur. May be modified to reduce the chance of reversal after overrunning and the loss in power.

(c) ML-ordering method: Based on a minimal sufficient statistic, and hence ignores which boundary was first reached and when. Exact, up to needed assumptions about overrunning information (and the Brownian motion approximation). Requires an assumption about the form of $t_o(t)$, but not very sensitive to it in the practical cases examined. Uses ML-ordering and hence depends on stopping boundaries beyond those in effect when a boundary was first reached.

Sooriyarachchi et al. [14] conclude that (a) is preferable, although they only considered constant amounts of overrunning, whereas our focus has been on settings where the amount of overrunning information is likely to increase with increased stopping times. They strongly emphasize the possibility of reversals, but such possibilities cannot be avoided once one agrees to utilize lagged data. The chances can be reduced within the combination method by reducing the weight given to lagged data, but this would need to be considered in advance of the trial. We recommend (c) in settings where the design is likely to be followed closely. A numerical study of reversals and power with the ML-ordering method will be presented elsewhere. Otherwise, we think the combination $p$-value method, possibly with a down-weighting of overrunning information, is a competitor worthy of consideration, especially when overrunning increases with increasing stopping times. We encourage investigation of the combination method in other settings, including meta-analyses and double sampling.

Acknowledgments. The first MADIT trial stimulated the research reported here.
We thank Arthur Moss, MD, and Boston Scientific Corporation (formerly CPI/Guidant), respectively, for leadership and sponsorship of this trial. We are also grateful to John Whitehead for helpful discussion and to Michael McDermott for some references to the $p$-value literature.

References

[1] Anderson, T. W. (1964). Sequential analysis with delayed observations. J. Amer. Statist. Assoc. 59 1006–1015. MR0175262
[2] Brannath, W., Posch, M. and Bauer, P. (2002). Recursive combination tests. J. Amer. Statist. Assoc. 97 236–244. MR1947283
[3] Hall, W. J. (1997). The distribution of Brownian motion on linear stopping boundaries. Sequential Analysis 16 345–352. Addendum in Sequential Analysis 17 123–124. MR1491641
[4] Hall, W. J. and Liu, A. (2002). Sequential tests and estimators after overrunning based on maximum-likelihood ordering. Biometrika 89 699–707. MR1929173
[5] Jennison, C. (1999). Group sequential software at website: http://www.bath.ac.uk/mascj/book/programs/general.
[6] Jennison, C. and Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, Boca Raton, FL. MR1710781
[7] Liptak, T. (1958). On the combination of independent tests. Magyar Tud. Akad. Mat. Kutato Int. Közl. 3 171–197.
[8] Moss, A. J., Hall, W. J., Cannom, D. S., Daubert, J. P., Higgins, M. D., Klein, H., Levine, J. H., Saksena, S., Waldo, A. L., Wilber, D., Brown, M. W., Heo, M.; for the Multicenter Automatic Defibrillator Implantation Trial Investigators (1996). Improved survival with an implanted defibrillator in patients with coronary disease at high risk for ventricular arrhythmia. New England Journal of Medicine 335 1933–1940.
[9] Moss, A. J., Zareba, W., Hall, W. J., Klein, H., Wilber, D. J., Cannom, D. S., Daubert, J. P., Higgins, S. L., Brown, M. W., Andrews, M. L.; for the Multicenter Automatic Defibrillator Implantation Trial-II Investigators (2002). Prophylactic implantation of a defibrillator in patients with myocardial infarction and reduced ejection fraction. New England J. Medicine 346 877–883.
[10] Mosteller, F. M. and Bush, R. R. (1954). Selected quantitative techniques. In Handbook of Social Psychology I. Theory and Methods (G. Lindzey, ed.). Addison-Wesley, Cambridge, MA.
[11] MPS Research Unit (2000). PEST: Planning and Evaluation of Sequential Trials, Version 4: Operating Manual. University of Reading, Reading, UK.
[12] Oosterhoff, J. (1969). Combination of One-Sided Statistical Tests. The Mathematical Centre, Amsterdam. MR0247707
[13] Rosenthal, R. (1978). Combining results of independent studies. Psych. Bull. 85 185–193.
[14] Sooriyarachchi, M. R., Whitehead, J., Matsushita, T., Bolland, K., and Whitehead, A. (2003). Incorporating data received after a sequential trial has stopped into the final analysis: Implementation and comparison of methods. Biometrics 59 701–709. MR2004276
[15] Stouffer, S. A., Suchman, E. A., DeVinner, L. C., Star, R. M., Williams, R. M. (1949). The American Soldier: Adjustment During Army Life I. Princeton Univ. Press, Princeton, NJ.
[16] Whitehead, J. (1992). Overrunning and underrunning in sequential clinical trials. Controlled Clinical Trials 13 106–121.
[17] Whitehead, J. (1997). The Design and Analysis of Sequential Clinical Trials, 2nd ed. revised. Wiley, New York. MR0793018
