Rank-based methods for estimating landmark win probability in longitudinal randomized controlled trials with missing data
The primary analysis for longitudinal randomized controlled trials (RCTs) often compares treatment groups at the last timepoint, referred to as the landmark time. Assuming data are normally distributed and missing at random, the mixed model for repea…
Authors: Guangyong Zou, Shi-Fang Qiu, Joshua Zou
Guangyong Zou 1,2 ‡ *, Shi-Fang Qiu 3 ‡, Joshua Zou 4, Emma Davies Smith 5, Yun-Hee Choi 1, Yuhan Bi 1

1 Department of Epidemiology and Biostatistics, Schulich School of Medicine & Dentistry, Western University, London, Ontario, Canada
2 Alimentiv Inc, London, Ontario, Canada
3 Department of Statistics and Data Science, Chongqing University of Technology, Chongqing, China
4 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Canada
5 Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, Massachusetts, USA

‡ These authors contributed equally to this work.
* gzou2@uwo.ca

Abstract

The primary analysis for longitudinal randomized controlled trials (RCTs) often compares treatment groups at the last timepoint, referred to as the landmark time. Assuming data are normally distributed and missing at random, the mixed model for repeated measures (MMRM) is widely used to conduct inference in terms of a mean difference. When outcomes violate the normality assumption and/or the mean difference lacks a clear interpretation, we may quantify treatment effects using the probability that a treated participant would have a better outcome than (or win over) a control participant. For RCTs with missing data, one may apply the generalized pairwise comparison (GPC) procedure, which carries forward the results of a pairwise comparison from a previous timepoint. We propose first using ranks to convert each observation at a timepoint into a win fraction, reflecting the proportion of observations in the comparison group that the observation is better than. Then, we conduct inference for the win probability based on the win fractions using the MMRM to obtain the point and variance estimates.
Simulation results suggest that our method performed much better than the GPC procedure. We illustrate our proposed procedure in SAS and R using data from two published trials.

March 18, 2026

1 Introduction

Treatment effects are typically assessed using randomized controlled trials (RCTs) in which each participant is followed over time and outcome variables are measured at several pre-specified timepoints [1–3]. The primary analysis usually compares the outcome data between different treatment groups at the end of follow-up [4], which may be referred to as the landmark [5]. When the outcome variable possesses meaningful units and approximately follows a normal distribution, the treatment effect may be quantified by a point estimate of the mean difference between groups at the landmark, accompanied by the associated confidence interval and P-value.

Missing data is an ever-present problem. Some participants may drop out or be lost during follow-up before the completion of the study, resulting in missing data for the landmark analysis. As classified by Rubin [6], data are missing completely at random (MCAR) if, conditional on the covariates, the probability of outcome missing is unrelated to either the observed or unobserved outcomes; missing at random (MAR) when missingness does not depend on the unobserved data, conditional on the observed data; and missing not at random (MNAR) when missingness depends on unobserved data even after conditioning on the observed data. The assumption of MAR is commonly made for the primary analysis of RCTs [7].

A likelihood-based mixed-effects model for repeated measures (MMRM) with a group-specific ‘unstructured’ covariance matrix has been recommended for analyzing longitudinal continuous outcomes collected at fixed timepoints [1, 3]. The MMRM has several advantages.
First, it is robust to model misspecification because of the unstructured time-response profiles. Second, it allows for the use of partial data from participants who dropped out during the trial and do not have data for the landmark timepoint. Third, the MMRM with a group-specific ‘unstructured’ covariance matrix for within-subject correlations is robust to misspecification of the covariance structure and to variance heterogeneity. Finally, it can be more efficient than multiple imputation (MI) in many situations, despite the latter being commonly regarded as the gold standard approach for handling missing data [8]. In terms of the estimands framework outlined in the ICH E9(R1) Addendum, the MMRM analysis commonly uses the hypothetical strategy to handle dropouts by estimating the treatment effect that would have been observed had there been no dropouts [5].

Besides missing data, longitudinal RCTs often employ outcome measures that do not follow a normal distribution and/or lack meaningful units, such as disease severity scores or health-related quality of life scores [5, 9]. Despite the popularity of such measures in RCTs, less attention has been given to the analysis of the resulting data. Without making distributional assumptions for outcome data, we may quantify the treatment effect with the results of pairwise comparisons, an idea that dates back to Deuchler in 1914 [10]. Specifically, Deuchler assigned a score of +1, -1, or 0 to each pair of observations from the comparison groups when the first observation of the pair is greater than, less than, or equal to the second observation, respectively. A statistic was then constructed using the sum of scores divided by the number of pairs with nonzero scores. This idea has been rediscovered and modified many times in the literature, most notably by Mann and Whitney [11].
Buyse [12] used the term generalized pairwise comparison (GPC) to describe the above scoring system, and net treatment benefit to describe the resulting statistic. The net treatment benefit, also known as the Mann-Whitney difference [13] or Somers’ D [14], is the difference between the probability that a randomly selected observation from the treatment group is better than (or wins over) a randomly selected observation from the control group and the probability that a randomly selected observation from the control group wins over a randomly selected observation from the treatment group. Since these two win probabilities (WinPs) sum to one, we need only focus on inference for the WinP of the treatment group.

One novel aspect of the Buyse GPC procedure in the context of longitudinal RCTs is that timepoints can be prioritized according to clinical importance, which is also the principle that underlies the win ratio analysis [15]. Thus, it seems very attractive to apply the GPC procedure for landmark analysis in longitudinal RCTs with missing data. Specifically, the timepoints from baseline can be ordered in reverse such that the landmark timepoint becomes the most important one, the second from the last becomes the second most important one, and so on. If a treatment-control pair cannot be scored due to missing data at the highest priority (primary) timepoint, then a comparison can be made using the next timepoint in the hierarchy. Note that this approach has also appeared in the literature under the terms ‘mean rank imputation’ [16, 17] and ‘last kernel carried forward’ [18], and has been built into the R package sanon [19].

We previously developed methods for the design and analysis of RCTs with a single follow-up time [20–22]. The objective of this manuscript is to extend our previous regression approach to estimating the WinP in longitudinal RCTs with missing data.
Our method first converts each observation at each timepoint to a win fraction, that is, the fraction of observations in the comparison group that the observation is better than. Using the relationship between ranks and the Deuchler scoring system, i.e., the sign of a difference between two numbers [10–12], the win fractions can be easily obtained based on ranks. We then use regression models with the win fractions as the dependent variable to obtain the point and variance estimates, followed by invoking the central limit theorem for inference. One advantage of this approach is that it can be easily implemented using standard statistical software.

The remainder of the manuscript is structured as follows. Section 2 presents two classic datasets as motivating examples. Section 3 first presents the regression approach to estimating WinP in the absence of missing data, followed by the GPC procedure, complete case analysis (ignoring cases with missing timepoint data), and MMRM for win fractions analysis. Section 4 presents a simulation study to evaluate the three methods. Section 5 illustrates the methods using data arising from the motivating examples. We close the manuscript with a discussion. We present illustrative SAS and R code in the online Appendix.

2 Two Motivating Examples

2.1 The postnatal depression trial

This trial investigated the effectiveness of oestrogen when taken transdermally by women diagnosed with postpartum depression [23]. A total of 61 participants were randomized to either oestradiol patches (n = 34), used two at a time to give a total daily dose of 200 µg of 17β-oestradiol, or placebo patches (n = 27). Each subject was evaluated on the Edinburgh postnatal depression scale (EPDS) at baseline and at each of the six monthly visits.
To complete the EPDS questionnaire, respondents selected the number next to the response that best corresponded to how they felt during the past seven days. For example, possible responses to the statement ‘The thought of harming myself has occurred to me’ were 3 (Yes, quite often), 2 (Sometimes), 1 (Hardly ever), and 0 (Never). The total score was obtained by summing the numbers selected for each of the 10 items, with a maximum score of 30 points indicating severe depression. The data are available in Rabe-Hesketh and Everitt [24], pp. 138-139, and are also reproduced in the supplementary SAS code for convenience. According to these authors [24], the non-integer scores resulted from substituting the average of all available items for the missing questionnaire items. The visit-specific boxplots by treatment group are presented in Figure 1, showing that by visit 6, 18% of participants had dropped out in the control group and 37% in the treatment group. One could apply the MMRM directly to the EPDS scores, resulting in inference on the mean difference in scores, which may not be easily interpreted.

2.2 The trial of labor pain

Davis [25] presented data from a randomized trial evaluating a treatment for maternal pain relief during labor. In this trial, 43 women in labor were randomly assigned to receive an experimental pain medication and 40 women were assigned to receive a placebo. Treatment was initiated when the cervical dilation was 8 cm. At 30-minute intervals, the intensity of pain was self-reported by placing a mark on a 100 mm line (0 = no pain, 100 = very much pain). As shown in Figure 1, the data are very skewed and have numerous missing values at later time intervals. By the 6th time interval, 62.5% of participants in the placebo group and 55.8% in the treatment group had dropped out of the study.
There has been no easy way to analyze such data to estimate the effect of the drug in reducing labor pain.

3 Methods

3.1 Notations, definitions, and estimators

Consider a longitudinal randomized clinical trial with a two-group parallel design. Suppose that the primary objective is to compare the outcome at the end of the treatment period by estimating the treatment effect. Let $Y_{ij} = [Y_{ij0}, Y_{ij1}, \ldots, Y_{ijT}]^{\mathsf T}$ denote a vector of observations for the $j$th participant in the $i$th group, with $Y_{ijt}$ representing the measurement at timepoint $t$, $t = 0, 1, 2, \ldots, T$, where $t = 0$ denotes baseline, $i = 0$ denotes the control group and $i = 1$ the treatment group, $j = 1, 2, \ldots, n_i$, and the total sample size is $N = n_1 + n_0$. We denote the sample size in group $i$ at timepoint $t$ as $n_{it}$. When there is no missing data, $n_{it} = n_i$ for all $t = 0, 1, 2, \ldots, T$.

Data for different subjects are assumed to be independent. However, repeated observations from the same subject are assumed to be correlated, with a cumulative distribution function (CDF) for the outcome at time $t$ given by

$$F_{it}(y) = \frac{1}{2}\left[F^{-}_{it}(y) + F^{+}_{it}(y)\right]$$

where $F^{-}_{it}(y) = \Pr(Y_{ijt} < y)$ is the left-continuous version and $F^{+}_{it}(y) = \Pr(Y_{ijt} \le y)$ is the right-continuous version of the CDF. Such a definition can easily handle ties [26]. An estimator for $F_{it}(y)$ is given by

$$\widehat{F}_{it}(y) = \frac{1}{n_{it}} \sum_{j=1}^{n_{it}} H(y, Y_{ijt})$$

where $H(a, b)$ is defined via the Deuchler score, or the sign of $a - b$, as

$$H(a, b) = \frac{1}{2}\left[\operatorname{sign}(a - b) + 1\right]$$

We can now define the distribution-free treatment effect at time $t$ as the probability that an observation in the treatment group, $Y_{1jt}$, $j = 1, 2, \ldots, n_1$, is larger than (or wins over) an observation in the control group, $Y_{0j't}$, $j' = 1, 2, \ldots, n_0$.
Then the win probability (WinP) at timepoint $t$ can be defined as

$$\theta_t = P(Y_{0j't} < Y_{1jt}) + \frac{1}{2} P(Y_{0j't} = Y_{1jt}) = \int F_{0t}\, dF_{1t}, \quad t = 0, 1, 2, \ldots, T \tag{1}$$

and estimated as

$$\widehat{\theta}_t = \int \widehat{F}_{0t}\, d\widehat{F}_{1t}$$

An analogous definition can be given for situations where smaller values are regarded as ‘winning’. Many terms have appeared in the literature for $\theta_t$, including the area under the receiver operating characteristic curve [27–29], ridits [30, 31], and the c-statistic [32]. We prefer the term WinP for its intuitive interpretation.

For normally distributed data with a homogeneous variance between the two treatment groups, i.e., $Y_{ijt} \sim N(\mu_{it}, \sigma_t^2)$, the WinP is given by $\theta_t = \Phi\left[(\mu_{1t} - \mu_{0t})/(\sqrt{2}\,\sigma_t)\right]$, where $(\mu_{1t} - \mu_{0t})/\sigma_t$ is commonly known as the standardized mean difference (SMD) or Cohen’s effect size [33]. Thus, inference for the SMD can be conducted using a WinP analysis, thereby providing a viable alternative approach to constructing a confidence interval for the SMD. This relationship also suggests that Cohen’s popular benchmarks for the SMD may be adapted for interpreting the WinP. That is, WinPs of 0.56, 0.64, and 0.71 correspond to SMDs of 0.2 (‘small’), 0.5 (‘medium’), and 0.8 (‘large’ effect size), respectively.

In addition, the net treatment benefit defined by Buyse [12] is given by $\mathrm{NB}_t = P(Y_{0j't} < Y_{1jt}) - P(Y_{0j't} > Y_{1jt}) = 2\theta_t - 1$, which can be recognized as the risk difference for binary outcomes. The win odds (WO) [34], previously known as the generalized odds ratio [35], is also readily available as $\mathrm{WO}_t = \theta_t/(1 - \theta_t)$. Thus, the null value of the WinP, $\theta_t = 0.5$, corresponds to $\mathrm{NB}_t = 0$ and $\mathrm{WO}_t = 1$. We focus on developing methods for the WinP in longitudinal RCTs; however, the results are readily applicable to the NB, WO, and the SMD without making distributional assumptions for the outcome data.
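As a quick numerical check of the definitions in Section 3.1, the following Python sketch computes the WinP, net benefit, and win odds from two small samples and evaluates Φ(SMD/√2) for Cohen's benchmarks. It is an illustration only: the function names and toy data are hypothetical and are not taken from the authors' SAS or R code.

```python
import math
import numpy as np

def deuchler_h(a, b):
    """H(a, b) = [sign(a - b) + 1] / 2: 1 for a win, 0.5 for a tie, 0 for a loss."""
    return (np.sign(a - b) + 1.0) / 2.0

def win_probability(treated, control):
    """Estimate theta = P(Y0 < Y1) + 0.5 * P(Y0 = Y1) over all treated-control pairs."""
    y1 = np.asarray(treated, dtype=float)
    y0 = np.asarray(control, dtype=float)
    # Broadcasting builds the n1 x n0 matrix of pairwise scores.
    return float(deuchler_h(y1[:, None], y0[None, :]).mean())

def net_benefit(theta):
    """NB = 2 * theta - 1."""
    return 2.0 * theta - 1.0

def win_odds(theta):
    """WO = theta / (1 - theta)."""
    return theta / (1.0 - theta)

def winp_from_smd(smd):
    """WinP implied by an SMD under normality with equal variances.

    Phi(smd / sqrt(2)) written with the error function:
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so the argument becomes smd / 2.
    """
    return 0.5 * (1.0 + math.erf(smd / 2.0))

# Hypothetical toy data (larger values win).
theta_hat = win_probability([3, 5, 7], [2, 5, 6])
benchmarks = [round(winp_from_smd(d), 2) for d in (0.2, 0.5, 0.8)]
```

For the toy samples, 5.5 of the 9 pairs are won by the treated observation (a tie counts as half), so the estimate is 5.5/9, or about 0.611, with a net benefit of about 0.222 and win odds of about 1.571. The benchmark list evaluates to 0.56, 0.64, and 0.71, matching the values quoted above.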
3.2 Inference for a single win probability with complete data

To conduct inference, we assume $\min[n_{1t}, n_{0t}] \to \infty$ such that $N/n_{it} < \infty$. The distribution of $\sqrt{N}(\widehat{\theta}_t - \theta_t)$ can be established by first deriving the point estimator and variance for WinP, followed by invoking the central limit theorem. Following Brunner and Munzel [36],

$$
\begin{aligned}
\sqrt{N}(\widehat{\theta}_t - \theta_t)
&= \sqrt{N}\left\{\int \widehat{F}_{0t}\, d\widehat{F}_{1t} - \int F_{0t}\, dF_{1t}\right\} \\
&= \sqrt{N}\left\{\int F_{0t}\, d(\widehat{F}_{1t} - F_{1t}) + \int (\widehat{F}_{0t} - F_{0t})\, dF_{1t} + \int (\widehat{F}_{0t} - F_{0t})\, d(\widehat{F}_{1t} - F_{1t})\right\} \\
&\approx \sqrt{N}\left\{\frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} \left[F_{0t}(Y_{1jt}) - \theta_t\right] - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} \left[F_{1t}(Y_{0j't}) - (1 - \theta_t)\right]\right\} \\
&= \sqrt{N}\left\{\frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} F_{0t}(Y_{1jt}) - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} F_{1t}(Y_{0j't}) + 1 - 2\theta_t\right\}
\end{aligned}
\tag{2}
$$

where the approximation holds due to

$$\sqrt{N} \int (\widehat{F}_{0t} - F_{0t})\, d(\widehat{F}_{1t} - F_{1t}) \to 0$$

in probability. This suggests that

$$\sqrt{N}(\widehat{\theta}_t - \theta_t) \overset{d}{\approx} \sqrt{N}\left\{\frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} F_{0t}(Y_{1jt}) - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} F_{1t}(Y_{0j't}) + 1 - 2\theta_t\right\} \tag{3}$$

where $\overset{d}{\approx}$ denotes asymptotic equivalence in distribution. Three consequences follow from Eq (3). First, since the expectation of the right side of Eq (3) is 0, we have

$$\theta_t = \frac{1}{2}\left[\frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} F_{0t}(Y_{1jt}) - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} F_{1t}(Y_{0j't})\right] + 0.5 \tag{4}$$

Second, the distributional equivalence of the two sides in Eq (3) suggests

$$\operatorname{var}(\widehat{\theta}_t) \approx \operatorname{var}\left[\frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} F_{0t}(Y_{1jt}) - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} F_{1t}(Y_{0j't})\right] \tag{5}$$

Finally, an application of the central limit theorem to the right side yields that [36]

$$\frac{\widehat{\theta}_t - \theta_t}{\sqrt{\operatorname{var}(\widehat{\theta}_t)}} \sim N(0, 1) \tag{6}$$

Inference for $\theta_t$ based on the observed data can proceed by plugging in the empirical CDFs.
Specifically, the empirical CDFs can be written as

$$\widehat{F}_{0t}(Y_{1jt}) = \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} H(Y_{1jt}, Y_{0j't}) \quad \text{and} \quad \widehat{F}_{1t}(Y_{0j't}) = \frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} H(Y_{0j't}, Y_{1jt}) \tag{7}$$

which are consistent estimators for $F_{0t}(Y_{1jt})$ and $F_{1t}(Y_{0j't})$, respectively. Note that $\widehat{F}_{0t}(Y_{1jt})$ quantifies the proportion of observations in the control group over which an observation in the treatment group, $Y_{1jt}$, wins, while $\widehat{F}_{1t}(Y_{0j't})$ quantifies the proportion of observations in the treatment group over which an observation in the control group, $Y_{0j't}$, wins. The win fractions for the treatment group and one minus the win fractions in the control group have been termed ‘placements’ [37, 38]. For simplicity, we let

$$W_{1jt} = \widehat{F}_{0t}(Y_{1jt}) \quad \text{and} \quad W_{0j't} = \widehat{F}_{1t}(Y_{0j't})$$

with means defined as

$$\overline{W}_{i\cdot t} = \frac{\sum_{j=1}^{n_{it}} W_{ijt}}{n_{it}}, \quad i = 0, 1$$

Thus, the estimator based on Eq (4) may be obtained as

$$\widehat{\theta}_t = \overline{W}_{1\cdot t} = \frac{1}{2}\left(\overline{W}_{1\cdot t} - \overline{W}_{0\cdot t}\right) + 0.5 \tag{8}$$

with the corresponding variance estimator based on Eq (5) given by

$$\widehat{\operatorname{var}}(\widehat{\theta}_t) = \widehat{\operatorname{var}}\left[\frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} W_{1jt} - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} W_{0j't}\right] = \widehat{\operatorname{var}}\left(\overline{W}_{1\cdot t} - \overline{W}_{0\cdot t}\right) \tag{9}$$

Note that the ostensible mismatch between $\widehat{\theta}_t$ and its variance may also be explained by the fact that $W_{ijt}$ is constructed by conditioning on the outcome variable of the comparison group. To simplify the calculation of win fractions, we make the link between win fractions and ranks, defined via the Deuchler scores.
At timepoint $t$, the rank of an observation in the treatment group, $Y_{1jt}$, among all observations in the combined sample of size $n_{1t} + n_{0t}$ is given by [39]

$$R_{1jt} = \frac{1}{2} + \sum_{j'=1}^{n_{1t}} H(Y_{1jt}, Y_{1j't}) + \sum_{j'=1}^{n_{0t}} H(Y_{1jt}, Y_{0j't})$$

and the within-group rank is given by

$$r_{1jt} = \frac{1}{2} + \sum_{j'=1}^{n_{1t}} H(Y_{1jt}, Y_{1j't})$$

Thus,

$$W_{1jt} = \frac{R_{1jt} - r_{1jt}}{n_{0t}} = \frac{\sum_{j'=1}^{n_{0t}} H(Y_{1jt}, Y_{0j't})}{n_{0t}}, \quad j = 1, 2, \ldots, n_{1t}$$

quantifying the proportion of observations in the control group over which the observation wins. Similarly, the win fractions for observations in the control group, $Y_{0j't}$, can be obtained as

$$W_{0j't} = \frac{R_{0j't} - r_{0j't}}{n_{1t}}, \quad j' = 1, 2, \ldots, n_{0t}$$

To obtain the point estimate and variance of the WinP in Eqs (8) and (9), we use the readily available regression procedures in common statistical software. Specifically, we regress the win fractions, $W_{ijt}$, on the treatment group indicator (0 for control and 1 for treatment) and other covariates (or their win fractions). Letting $G_{ij} = i$ be the treatment group indicator, we fit the following model

$$W_{ijt} = \beta_0 + \beta_1 G_{ij} + e_{ijt} \tag{10}$$

which yields $\widehat{\beta}_1 = \overline{W}_{1\cdot t} - \overline{W}_{0\cdot t}$. Thus, by Eq (8),

$$\widehat{\theta}_t = \frac{\widehat{\beta}_1}{2} + 0.5$$

Furthermore, by Eq (9),

$$\widehat{\operatorname{var}}(\widehat{\theta}_t) = \widehat{\operatorname{var}}\left(\overline{W}_{1\cdot t} - \overline{W}_{0\cdot t}\right) = \widehat{\operatorname{var}}(\widehat{\beta}_1)$$

We emphasize that the regression model is used as a tool to obtain estimates of the WinP and its variance simultaneously, although they are derived from two different consequences of the asymptotic equivalence in distribution of Eq (3). Thus, it is important not to apply conventional results from regression models, which would lead to the erroneous conclusion that $\widehat{\operatorname{var}}(\widehat{\theta}_t) = \widehat{\operatorname{var}}(\widehat{\beta}_1)/4$. We do not assume variance homogeneity in this framework as it is well known to be too restrictive for rank-based analysis [26, 40].
Instead, we apply the robust variance estimator [41], yielding

$$\widehat{\operatorname{var}}(\widehat{\beta}_1) = \frac{1}{n_{1t}(n_{1t} - 1)} \sum_{j=1}^{n_{1t}} \left(W_{1jt} - \overline{W}_{1\cdot t}\right)^2 + \frac{1}{n_{0t}(n_{0t} - 1)} \sum_{j'=1}^{n_{0t}} \left(W_{0j't} - \overline{W}_{0\cdot t}\right)^2 \tag{11}$$

which is identical to variance estimators that have appeared repeatedly in the literature [28, 36, 42].

To see how to obtain Eq (11) using the simple linear regression in Eq (10) with the sandwich variance estimator approach [41], denote the design matrix as

$$G = \left[\mathbf{1}_n, \; \left(\mathbf{1}_{n_1}^{\mathsf T}, \mathbf{0}_{n_0}^{\mathsf T}\right)^{\mathsf T}\right]$$

where $\mathbf{1}_n$ is a vector of 1’s and $\mathbf{0}_{n_0}$ is a vector of 0’s. Then the ‘robust’ or ‘sandwich’ variance estimator for $\widehat{\beta}_1$ is the (2,2) element of

$$V = \left(G^{\mathsf T} G\right)^{-1} G^{\mathsf T} \operatorname{diag}\left(\frac{e_{ijt}^2}{1 - h_{ij}}\right) G \left(G^{\mathsf T} G\right)^{-1}$$

where

$$\left(G^{\mathsf T} G\right)^{-1} = \begin{pmatrix} 1/n_0 & -1/n_0 \\ -1/n_0 & 1/n_1 + 1/n_0 \end{pmatrix}, \quad h_{ij} = \begin{pmatrix} 1 & G_{ij} \end{pmatrix} \left(G^{\mathsf T} G\right)^{-1} \begin{pmatrix} 1 \\ G_{ij} \end{pmatrix}$$

so that $h_{1j} = 1/n_1$ and $h_{0j} = 1/n_0$. Thus, the (2,2) element of $V$ is given by

$$\widehat{\operatorname{var}}(\widehat{\beta}_1) = \frac{\sum_{j=1}^{n_1} e_{1jt}^2}{n_1(n_1 - 1)} + \frac{\sum_{j=1}^{n_0} e_{0jt}^2}{n_0(n_0 - 1)}$$

which can be obtained using standard statistical procedures, e.g., SAS PROC MIXED with option GROUP= treatment indicator in the REPEATED statement. Note also that the normality assumption for the residuals is not critical as we do not conduct inference for $\beta_1$ with a t-distribution. Rather, we only use regression procedures that are readily available in software packages as a device to obtain the estimators in Eqs (8) and (9).

As pointed out by Sen [42], the variance estimator in Eq (11) is positively biased, with a bias less than $\theta_t(1 - \theta_t)/N$. Tcheuko et al [43] provided a practical approach to compute the unbiased variance. Brunner and Konietschke [44] also proposed an unbiased variance estimator. Both estimators are most useful for RCTs with small sample sizes, e.g., $N < 30$, which is not our focus.
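The rank-based construction of the win fractions and the estimators in Eqs (8) and (11) can be sketched in a few lines of Python. This is a minimal illustration for a single timepoint with complete data, not the authors' SAS or R implementation; the function names are hypothetical, and it assumes `scipy.stats.rankdata`, whose default `method="average"` returns midranks.

```python
import numpy as np
from scipy.stats import rankdata

def win_fractions(treated, control):
    """Win fractions from midranks: W_1j = (R_1j - r_1j) / n_0, W_0j' = (R_0j' - r_0j') / n_1."""
    y1 = np.asarray(treated, dtype=float)
    y0 = np.asarray(control, dtype=float)
    n1, n0 = y1.size, y0.size
    pooled = rankdata(np.concatenate([y1, y0]))  # overall ranks R in the combined sample
    w1 = (pooled[:n1] - rankdata(y1)) / n0       # subtract within-group ranks r
    w0 = (pooled[n1:] - rankdata(y0)) / n1
    return w1, w0

def winp_estimate(treated, control):
    """Point estimate (Eq 8) and robust variance (Eq 11) of the WinP."""
    w1, w0 = win_fractions(treated, control)
    beta1 = w1.mean() - w0.mean()                # slope of the regression in Eq (10)
    theta = beta1 / 2.0 + 0.5
    # Group-specific sample variances of the win fractions divided by group sizes,
    # i.e. sum of squared deviations over n * (n - 1) in each group.
    var = w1.var(ddof=1) / w1.size + w0.var(ddof=1) / w0.size
    return theta, var

# Hypothetical toy data (larger values win).
theta_hat, var_hat = winp_estimate([3, 5, 7], [2, 5, 6])
```

With these toy data the treated win fractions are 1/3, 1/2, and 1, the control win fractions are 0, 1/2, and 2/3, and the estimate is again 5.5/9, the same value obtained by averaging the pairwise scores directly. When smaller values win, the same code applies after negating the outcomes.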
We do not pursue these estimators further here since it is not obvious to us how to implement them in the regression framework so that more general cases can be handled. In what follows, we apply regression models to $W_{ijt}$ as dependent variables, and then use Eqs (8) and (9) to identify estimates of $\theta_t$ and the corresponding variance.

Inference on WinP proceeds by applying Slutsky's theorem to Eq (6), yielding asymptotically

$$\frac{\widehat{\theta}_t - \theta_t}{\sqrt{\widehat{\operatorname{var}}(\widehat{\theta}_t)}} \sim N(0, 1) \tag{12}$$

To improve small sample performance, we conduct inference on the logit scale of the WinP, i.e., ln(win odds). Specifically, for testing $H_0: \theta_t = 0.50$, with $\widehat{\operatorname{se}}(\widehat{\theta}_t)$ being the square root of $\widehat{\operatorname{var}}(\widehat{\theta}_t)$, the test statistic is given by

$$T = \frac{\ln\left[\widehat{\theta}_t / (1 - \widehat{\theta}_t)\right]}{\widehat{\operatorname{se}}(\widehat{\theta}_t) / \left[\widehat{\theta}_t (1 - \widehat{\theta}_t)\right]} \tag{13}$$

which is distributed asymptotically as the standard normal. The corresponding $(1 - \alpha)100\%$ confidence interval for $\theta_t$ is given by

$$\frac{\exp(l)}{1 + \exp(l)} \quad \text{to} \quad \frac{\exp(u)}{1 + \exp(u)} \tag{14}$$

where the confidence limits for ln(win odds) are given by

$$l, u = \ln\left(\frac{\widehat{\theta}_t}{1 - \widehat{\theta}_t}\right) \pm z_{\alpha/2}\, \frac{\widehat{\operatorname{se}}(\widehat{\theta}_t)}{\widehat{\theta}_t (1 - \widehat{\theta}_t)}$$

and $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.

3.3 The generalized pairwise comparison procedure for WinP with missing data

In the context of longitudinal RCTs, the GPC procedure [12] starts by ordering timepoints according to clinical importance. Usually the later timepoints are considered to be more clinically important than the earlier ones. The WinP at the last timepoint $T$, $\theta_T$, may be estimated using the two-sample U-statistic as

$$\widehat{\theta}_T = \frac{1}{n_1 n_0} \sum_{j=1}^{n_1} \sum_{j'=1}^{n_0} H(Y_{1jT}, Y_{0j'T}) \tag{15}$$

when both $Y_{1jT}$ and $Y_{0j'T}$ are observed. When either $Y_{1jT}$ or $Y_{0j'T}$ or both are missing, the GPC procedure replaces $H(Y_{1jT}, Y_{0j'T})$ with $H(Y_{1j(T-1)}, Y_{0j'(T-1)})$ if both $Y_{1j(T-1)}$ and $Y_{0j'(T-1)}$ are observed.
Otherwise, it is replaced with $H(Y_{1j(T-2)}, Y_{0j'(T-2)})$. This process continues until a score is assigned for each pair of $Y_{1jt}$ and $Y_{0j't}$ for all $j = 1, \ldots, n_1$ and $j' = 1, \ldots, n_0$. Note that the sample size is still $n_i$ since the GPC procedure carries forward the comparisons made prior to the end of the treatment period.

In light of results by Rauch et al [45], the estimated win probability using the GPC procedure may be seen as a weighted average of the timepoint-specific win probabilities, with non-standardized weights ranging from 0.5 to 1. Non-standardized GPC weights cast doubt on the validity and interpretation of estimates, and their complex form has also been discussed by others [46, 47]. A previous simulation study found that the GPC procedure can inflate Type I error even when data are MCAR in the case of a single follow-up timepoint [48], while another study [17] provided supporting evidence for the use of the GPC procedure for hypothesis testing. Neither of these papers discussed estimation and confidence interval construction for appropriate treatment effects.

The estimator for $\theta_T$ can be rewritten as

$$\widehat{\theta}_T = \frac{1}{n_1} \sum_{j=1}^{n_1} \frac{1}{n_0} \sum_{j'=1}^{n_0} H(Y_{1jT}, Y_{0j'T}) = \frac{1}{n_1} \sum_{j=1}^{n_1} W_{1jT} = \overline{W}_{1\cdot T}$$

where

$$W_{1jT} = \frac{1}{n_0} \sum_{j'=1}^{n_0} H(Y_{1jT}, Y_{0j'T}), \quad j = 1, 2, \ldots, n_1$$

which represents the fraction of observations in group 0 at timepoint $T$, $Y_{0j'T}$, $j' = 1, 2, \ldots, n_0$, over which an observation in group 1 at timepoint $T$, $Y_{1jT}$, wins. The win fraction for an observation in group 0, $Y_{0j'T}$, can be obtained analogously as

$$W_{0j'T} = \frac{1}{n_1} \sum_{j=1}^{n_1} H(Y_{0j'T}, Y_{1jT}), \quad j' = 1, 2, \ldots, n_0$$

With the constructed $W_{ijT}$, we use a regression model similar to Eq (10) to obtain the point estimate for $\theta_T$ and its variance, with the guidance of Eqs (8) and (9).
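The carry-forward scoring described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code or the `sanon` implementation: outcomes are held in arrays with one column per timepoint ordered from baseline to the landmark, missing values are coded as `np.nan`, baseline is assumed to be fully observed so that every pair eventually receives a score, and all function names are hypothetical.

```python
import numpy as np

def deuchler_h(a, b):
    """H(a, b) = [sign(a - b) + 1] / 2 (larger values win)."""
    return (np.sign(a - b) + 1.0) / 2.0

def gpc_scores(y1, y0):
    """Score each treated-control pair at the latest timepoint where both are observed."""
    y1 = np.asarray(y1, dtype=float)   # shape (n1, number of timepoints)
    y0 = np.asarray(y0, dtype=float)   # shape (n0, number of timepoints)
    scores = np.full((y1.shape[0], y0.shape[0]), np.nan)
    for t in range(y1.shape[1] - 1, -1, -1):        # landmark first, then earlier
        a = y1[:, t][:, None]
        b = y0[:, t][None, :]
        both_observed = ~np.isnan(a) & ~np.isnan(b)  # broadcasts to (n1, n0)
        to_fill = np.isnan(scores) & both_observed   # pairs not yet scored
        scores[to_fill] = deuchler_h(a, b)[to_fill]
    return scores

def gpc_win_fractions(y1, y0):
    """GPC estimate of theta_T and the win fractions for both groups."""
    scores = gpc_scores(y1, y0)
    w1 = scores.mean(axis=1)          # treated: row means of the score matrix
    w0 = 1.0 - scores.mean(axis=0)    # control: H(b, a) = 1 - H(a, b)
    return float(scores.mean()), w1, w0

# Hypothetical toy data: columns are (baseline, landmark).
treated = [[1.0, 5.0], [2.0, np.nan]]
control = [[1.0, 3.0], [3.0, np.nan]]
theta_gpc, w1, w0 = gpc_win_fractions(treated, control)
```

In the toy data only the first treated and first control participants are compared at the landmark; the other three pairs fall back to baseline, which shows how the GPC estimate mixes comparisons from different timepoints.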
Specifically, by fitting $W_{ijT}$ to the following model

$$W_{ijT} = \beta_0 + \beta_1 G_{ij} + e_{ijT} \tag{16}$$

we obtain

$$\widehat{\theta}_T = \frac{\widehat{\beta}_1}{2} + 0.5$$

and

$$\widehat{\operatorname{var}}(\widehat{\theta}_T) = \widehat{\operatorname{var}}(\widehat{\beta}_1) = \widehat{\operatorname{var}}\left(\overline{W}_{1\cdot T} - \overline{W}_{0\cdot T}\right) = \frac{1}{n_1} s_{1T}^2 + \frac{1}{n_0} s_{0T}^2 \tag{17}$$

with

$$s_{iT}^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left(W_{ijT} - \overline{W}_{i\cdot T}\right)^2, \quad i = 1, 0$$

Following the practice of adjusting for baseline outcome measurements to improve power in RCTs, we can extend the model to an analysis of covariance (ANCOVA)-type model as [21, 22]

$$W_{ijT} = \beta_0 + \beta_1 G_{ij} + \gamma W_{ij0} + e_{ijT} \tag{18}$$

where $W_{ij0}$ denotes the win fraction for the baseline outcome measurement and $e_{ijT} \sim N(0, \sigma_i^2)$. With estimates of $\widehat{\theta}_T$ and its variance, we can conduct inference on the logit scale of WinP, i.e., ln(win odds), as in Eqs (13) and (14).

Note that we have used the win fractions, $W_{ijT}$, as a simple and flexible alternative to the current implementation of the GPC procedure [12, 49]. Specifically, the GPC analysis involves $n_1 \times n_0$ data points with three possible values of 1, 0, and -1, while our procedure analyzes $n_1 + n_0$ data points with possible values ranging from 0 to 1.

3.4 The mixed-effects model regression approach for win probability

Instead of carrying the last comparison forward as done by the GPC approach, we can apply the MMRM for timepoint-specific win fractions, resulting in a similar nonparametric approach to multivariate outcomes of different scales [50, ].
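However the win fractions are constructed, the unadjusted point estimate, the variance in Eq (17), and the logit-scale test and interval in Eqs (13) and (14) follow the same arithmetic. The Python sketch below illustrates those steps for the model without baseline adjustment; it is not the authors' code, the function name is hypothetical, and it assumes the estimate lies strictly between 0 and 1 so that the logit is defined.

```python
import math
import numpy as np

def winp_inference(w1, w0, z=1.959964):
    """WinP estimate, variance, logit-scale test statistic, and confidence interval.

    w1, w0: win fractions for the treatment and control groups.
    z: upper alpha/2 standard normal quantile (default gives a 95% interval).
    """
    w1 = np.asarray(w1, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    theta = (w1.mean() - w0.mean()) / 2.0 + 0.5                  # Eq (8)
    var = w1.var(ddof=1) / w1.size + w0.var(ddof=1) / w0.size    # Eq (17)
    se_logit = math.sqrt(var) / (theta * (1.0 - theta))          # se on the ln(win odds) scale
    logit = math.log(theta / (1.0 - theta))
    t_stat = logit / se_logit                                    # Eq (13)
    lower, upper = logit - z * se_logit, logit + z * se_logit

    def expit(x):
        return math.exp(x) / (1.0 + math.exp(x))

    return theta, var, t_stat, (expit(lower), expit(upper))      # Eq (14)

# Hypothetical win fractions for three treated and three control participants.
theta_hat, var_hat, t_stat, ci = winp_inference([1 / 3, 0.5, 1.0], [0.0, 0.5, 2 / 3])
```

Because the interval is built on the logit scale and then back-transformed, its limits always stay inside (0, 1), which is one practical reason for preferring it to the untransformed interval from Eq (12) in small samples.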
Recall that the timepoint-specific win probability is estimated by

$$
\begin{aligned}
\widehat{\theta}_t &= \frac{1}{n_{1t} n_{0t}} \sum_{j=1}^{n_{1t}} \sum_{j'=1}^{n_{0t}} H(Y_{1jt}, Y_{0j't}) \\
&= \frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} H(Y_{1jt}, Y_{0j't}) = 1 - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} \frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} H(Y_{0j't}, Y_{1jt}) \\
&= \frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} W_{1jt} = 1 - \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} W_{0j't}
\end{aligned}
$$

where

$$W_{1jt} = \frac{1}{n_{0t}} \sum_{j'=1}^{n_{0t}} H(Y_{1jt}, Y_{0j't}) \quad \text{and} \quad W_{0j't} = \frac{1}{n_{1t}} \sum_{j=1}^{n_{1t}} H(Y_{0j't}, Y_{1jt})$$

As discussed above, the time-specific win fractions, $W_{ijt}$, can be conveniently calculated using (mid)ranks. In the context of disease severity, we use descending ranks.

The simplest approach is to ignore cases with missing data and analyze the win fractions, $W_{ijT}$, at the last timepoint, using an ANCOVA model given by

$$W_{ijT} = \beta_0 + \beta_{1M} G_{ij} + \gamma W_{ij0} + e_{ijT} \tag{19}$$

where $W_{ij0}$ represents the win fractions based on baseline measurements, $\beta_{1M}$ is the difference in mean win fractions between the two treatment groups at time $T$, and $e_{ijT} \sim N(0, \sigma_i^2)$. Note that this model has the same form as in a complete case analysis (CCA), but the win fractions here are constructed using timepoint-specific ranks. Again, based on Eq (8), the corresponding WinP at time $T$ is estimated by

$$\widehat{\theta}_T = \frac{\widehat{\beta}_{1M}}{2} + 0.5$$

and by Eq (9) the variance is estimated by

$$\widehat{\operatorname{var}}(\widehat{\theta}_T) \approx \widehat{\operatorname{var}}\left(\overline{W}_{1\cdot T} - \overline{W}_{0\cdot T}\right) = \widehat{\operatorname{var}}(\widehat{\beta}_{1M})$$

Inference for the WinP can proceed on the logit scale using Eqs (13) and (14). We refer to this approach as the CCA.

A key limitation of the CCA is that it does not use all available data, and thus may lose information under MCAR, or may provide biased results under MAR or MNAR [3, 51]. To overcome this limitation, we may analyze the win fractions, $W_{ijt}$, with a mixed model for repeated measures (MMRM) [52]. Denoting $W_{ij} = [W_{ij1}, W_{ij2}, \ldots, W_{ijT}]^{\mathsf T}$, the model is given by

$$W_{ij} = X_{ij} \beta + e_{ij} \tag{20}$$

where

$$X_{ij} = \left[1, G_{ij}, W_{ij0}\right] \otimes I_T, \quad \beta = \left[\beta_{01}, \ldots, \beta_{0T}, \beta_{11}, \ldots, \beta_{1T}, \gamma_1, \gamma_2, \ldots, \gamma_T\right]^{\mathsf T}$$

with $\otimes$ denoting the Kronecker product, $I_T$ being a $T \times T$ identity matrix, the superscript ‘$\mathsf T$’ denoting the transpose, and $e_{ij} \sim N(\mathbf{0}, \Sigma_i)$, with an unstructured (co)variance matrix $\Sigma_i$ for group $i$.

Missing outcome data can be easily handled by removing the corresponding rows in $X_{ij}$ and elements in $e_{ij}$. When coding in statistical software, $X_{ij}$ may be defined by regarding the variable for timepoints as a categorical factor crossed with the treatment indicator and baseline win fractions, $W_{ij0}$. Separate unstructured covariance structures for each group, $\Sigma_i$, can be easily implemented using the GROUP option of the REPEATED statement in PROC MIXED with SAS. Point estimates of all timepoint-specific WinPs, including $\theta_T$, and associated standard errors can be obtained using least-squares mean estimates (‘lsmestimates’) of differences from the MMRM model.

Since this model uses an unstructured time profile for fixed effects and an unstructured covariance structure for each treatment group, it avoids model misspecification and provides valid results for data under MCAR or MAR in the analysis of longitudinal RCTs with continuous data [3, 51]. Again, the performance of this method may be improved by a logit transformation for $\widehat{\theta}_T$ by adapting Eqs (13) and (14). The finite sample performance of the MMRM for analyzing win fractions is largely unknown and will be investigated by the simulation study below.

4 Simulation Study

Since the three methods (the GPC in Eq (18), CCA in Eq (19), and MMRM in Eq (20)) presented in the previous section were developed using large sample theory, their performance in finite samples must be evaluated.
To this end, we evaluate the performance of these methods in terms of the relative bias (%) for estimating the WinP ($\theta_T$) at the pre-specified final timepoint, the empirical coverage and tail errors of the associated two-sided 95% confidence intervals, and the rejection rates in testing $H_0: \theta_T = 0.5$. The three methods provide identical results when there are no missing data and no baseline measurements. Thus, we focused on cases of missing data with adjustment for baseline measurements.

We simulated complete data from a multivariate normal distribution with parameter values patterned on the mean depression scores at Baseline and Visits 2, 4, and 6, and on the variance-covariance matrices, in the postpartum depression study [23]. We considered four trajectories. In Trajectory 1, both group means changed over time, but were equal at each timepoint. In Trajectory 2, both group means changed over time, but the group mean for the treatment arm changed more before converging to the mean of the control arm at the last timepoint. In Trajectory 3, the treatment-arm mean deviated more over time, representing a 'small' effect size of $\theta_T \approx 0.56$. In Trajectory 4, the profiles crossed, with the win probability at the last timepoint representing close to a 'medium' effect size of $\theta_T \approx 0.62$. The timepoint-specific mean vectors for the control and treatment arms, $m_0$ and $m_1$, for each trajectory are given by

1) $m_0 = m_1 = (20, 16, 12, 11)$
2) $m_0 = (20, 16, 12, 11)$ and $m_1 = (20, 15, 9, 11)$
3) $m_0 = (20, 16, 12, 11)$ and $m_1 = (20, 15, 9, 10)$
4) $m_0 = (20, 15, 9, 11)$ and $m_1 = (20, 16, 12, 9)$

and the variance-covariance matrices for the control and treatment groups are given by

$$
\Sigma_0 = \begin{pmatrix}
15.6 & 12.9 & 4.8 & 4.4 \\
12.9 & 37.5 & 22.8 & 11.6 \\
4.8 & 22.8 & 34.2 & 17.9 \\
4.4 & 11.6 & 17.9 & 21.9
\end{pmatrix}
\quad \text{and} \quad
\Sigma_1 = \begin{pmatrix}
12.8 & 7.4 & 3.6 & 7.1 \\
7.4 & 43.2 & 22.7 & 23.8 \\
3.6 & 22.7 & 21.8 & 18.8 \\
7.1 & 23.8 & 18.8 & 22.4
\end{pmatrix},
$$

respectively.

Since lower scores indicate better outcomes, for the rank-based procedures we used reverse ranking in the simulation, i.e., the largest score receives a rank of 1, the next largest score receives a rank of 2, and so on. For the GPC procedure, we used $H(a, b) = [1 - \mathrm{sign}(a - b)]/2$ for calculating win fractions. We first considered a sample size of $n_1 = n_0 = 50$, similar to our motivating examples. We repeated the above process 1000 times for each scenario.

The relative bias (%) is defined as the mean of $(\widehat{\mathrm{WinP}} - \mathrm{WinP})/\mathrm{WinP} \times 100$ over the 1000 replications. Performance of the 95% confidence interval is quantified by the overall coverage, defined as the percentage of times the confidence interval covers the true parameter value, and by the left- and right-tail errors, defined as the percentage of times the upper limit is smaller and the lower limit is greater than the true parameter value, respectively. We considered adequate empirical coverage to be within 93.6% to 96.4%. We quantified the empirical power as the percentage of times that the null hypothesis $H_0$: WinP = 0.5 was rejected at the two-sided 5% significance level. We also calculated the mean width of the 1000 confidence intervals for each scenario.

For MCAR, we deleted data such that 10% of participants drop out at each of Times 2, 3, and 4, resulting in a 30% total dropout rate at Time 4 for both treatment groups. For MAR and MNAR, we adapted the methods used by Mallinckrodt et al [53] and Barnes et al [54] to simulate two-arm trials with outcomes measured at baseline and three post-intervention timepoints.
Specifically, we deleted the complete data for each of the four trajectories according to the following four combinations: (1) equal trigger values (> 16) and equal dropout probability (0.4) for both groups; (2) different trigger values (> 16 for group 0 and > 15 for group 1) and equal dropout probability (0.4); (3) equal trigger values (> 16) and different dropout probabilities (0.5 for group 0 and 0.3 for group 1); and (4) different trigger values (> 16 for group 0 and > 15 for group 1) and different dropout probabilities (0.5 for group 0 and 0.3 for group 1).

To create data with a MAR dropout mechanism, scores that triggered dropout were retained, so that the observed data could explain the dropout. For example, if a participant had a score of 17 at Time 2 and the realized value of a Bernoulli(0.4) variable is 1, then 17 is the trigger for dropout. For MAR, the value of 17 is retained, but the values at Times 3 and 4 are deleted. To create data with a MNAR dropout mechanism, scores that triggered dropout were also deleted, so that the observed data could not entirely explain the dropout. In the above example, the value at Time 2 is also deleted. The resulting dropout rates by treatment group under MAR and MNAR are given in Table 1.

Each dataset was then analyzed using the GPC, CCA, and MMRM procedures, with the latter two approaches analyzing the win fractions converted using ranks. For comparison, we also analyzed each dataset prior to creating missing data.

Table 2 shows the results for the four trajectories with a 30% dropout rate under MCAR. The results clearly show that the GPC method performed well only under Trajectory 1, i.e., when the null hypothesis is true at all timepoints, which is very restrictive in practice.
In particular, the bias was less than 5% for both the CCA and MMRM methods, while the GPC exhibited bias of 57.5%, -41.2%, and -170.7% for Trajectories 2 to 4, respectively. Consequently, the CI coverage rates for the CCA and MMRM are close to the nominal level of 95%, while the coverage for the GPC was far from 95%. Interestingly, the GPC method provided narrower confidence intervals than those obtained from the datasets without missing data. This property has been documented for last observation carried forward (LOCF) in the literature [2, 3, 55]. Under the other three trajectories, the GPC method may overestimate or underestimate the true value of the win probability, depending on the magnitude of separation prior to the primary timepoint. The confidence intervals failed to maintain the nominal coverage level. These results reinforce the conclusion reached by Deltuvaite-Thomas and Burzykowski [48] that the GPC method should not be used in practice with missing data, despite recommendations by Buyse [12] and Fan et al [17]. In fact, our simulation results demonstrate characteristics similar to those of the LOCF technique, which cannot provide valid results even when the data are MCAR [55, 56].

Table 3 presents the results for Trajectory 1 (Cases 1 to 4) and Trajectory 2 (Cases 5 to 8) under the four combinations of data deletion used to emulate data under MAR and MNAR. Under the MAR settings, the overall coverage percentages of confidence intervals from the GPC for Trajectory 1, i.e., when the null hypothesis is true at all timepoints, are close to the nominal level of 95%. However, the confidence intervals are lopsided, with unequal tail errors. For Trajectory 2, the GPC method resulted in biased point estimates, poor coverage percentages and inflated Type I error rates. The CCA method performed better than the GPC method, but was still unsatisfactory.
The MMRM for win fractions performed very well for all eight scenarios in terms of bias, coverage percentage, and Type I error rate. Under the MNAR settings, neither the GPC method nor the CCA method can be recommended for practice. Interestingly, the MMRM procedure performed reasonably well, suggesting that it is fairly robust to violation of the MAR assumption. This is consistent with previous results when applying the MMRM to conduct inference on mean scores [53, 54, 56].

Table 4 shows the results when the true win probability is greater than 0.5, in Trajectory 3 (Cases 9 to 12) and Trajectory 4 (Cases 13 to 16), under the four combinations of data deletion used to emulate data under MAR and MNAR. For Trajectory 3, the GPC method had confidence interval coverage rates comparable to those of the MMRM and close to the nominal level. For Trajectory 4, the GPC produced relative bias ranging from -58.2% to -92.1%, with confidence interval coverage rates severely below the nominal level. Although the CCA performed better than the GPC under this trajectory, it did not perform as well as the MMRM, which provided reasonable results across all criteria. Overall, the MMRM provided satisfactory results and was fairly robust to violations of the MAR assumption.

Simulation results (not shown) for sample sizes $n_0 = 50$ and $n_1 = 100$, and $n_0 = 100$ and $n_1 = 100$, are consistent with the above conclusions. In addition, unreported results suggest that the performance of the methods for log-normal data is identical to the above results, which is expected because all methods are rank-based and invariant to any monotone transformation.

5 Analyzing the motivating examples

We now apply the GPC, CCA and MMRM methods to data from the two motivating examples, with the primary goal of estimating the win probability at the end of the treatment period, in addition to the time-specific WinPs from the MMRM procedure.
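Both examples rely on win fractions obtained from descending midranks. The conversion can be sketched as follows, under stated assumptions: the sketch is in Python rather than the SAS and R of the appendix, the helper names `midranks_desc` and `win_fractions` are hypothetical, missing values are marked by `None`, and lower scores are taken to win. It mirrors the rule used in the appendix code, namely (overall rank minus within-group rank) divided by the size of the other group.

```python
def midranks_desc(values):
    """Descending midranks: the largest value gets rank 1; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def win_fractions(y1, y0):
    """Win fractions at one timepoint for treated (y1) and control (y0) scores.

    Lower scores win. Entries equal to None (missing) are dropped before ranking.
    """
    y1 = [v for v in y1 if v is not None]
    y0 = [v for v in y0 if v is not None]
    n1, n0 = len(y1), len(y0)
    overall = midranks_desc(y1 + y0)   # ranks in the pooled sample
    within1 = midranks_desc(y1)        # ranks within the treated group
    within0 = midranks_desc(y0)        # ranks within the control group
    w1 = [(overall[j] - within1[j]) / n0 for j in range(n1)]
    w0 = [(overall[n1 + j] - within0[j]) / n1 for j in range(n0)]
    return w1, w0


# Illustrative (made-up) scores; the last control value is missing.
w1, w0 = win_fractions([5, 7, 9], [6, 7, 12, None])
theta_hat = sum(w1) / len(w1)          # equals 1 - mean(w0)
```

In this toy example the treated score of 7 beats one control (12), ties with one (7) and loses to one (6), so its win fraction is 1.5/3 = 0.5, which is what the rank difference returns.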
Since lower scores indicate better outcomes (wins) in both examples, for the rank-based procedures we ranked the data in descending order, as done in the simulation. For the GPC procedure, we used $H(a, b) = [1 - \mathrm{sign}(a - b)]/2$ to calculate win fractions.

For the postnatal depression trial [23], the visit-specific boxplots show no strong evidence that the normality assumption is violated. One could apply the MMRM to the raw scores, resulting in P = 0.0026 for testing the mean score difference between the treatment and control groups, with a point estimate and 95% CI of 4.685 (1.757, 7.613). Apart from the significant P-value, the point estimate and CI are not straightforward to interpret. Table 5 presents the results of the WinP analysis using the three approaches. The estimate at Visit 6 (95% CI) based on the GPC method is 0.737 (0.611, 0.834) with P = 0.0005, compared with the CCA point estimate of 0.779 (0.604, 0.890). The corresponding results based on the MMRM for timepoint-specific win fractions are 0.777 (0.608, 0.887) and P = 0.0025. In light of the simulation results, it is reasonable to conclude that the GPC method provided biased results. We can also explain the results by regarding the GPC estimate as a weighted average of the timepoint-specific WinPs: since the visit-specific estimates before Visit 6 are smaller in this case, carrying forward these smaller pairwise comparisons in the GPC resulted in an underestimate of the landmark WinP. The similar results from the CCA and MMRM methods suggest that the MAR assumption is tenable for this example, and that the results from the MMRM are reliable. The results may be reported as "the probability that a patient with depression treated with a daily dose of 200 µg of 17β-oestradiol had a better EPDS score than a control patient is 77.7% (95% CI 60.8 to 88.7%, P = 0.0025)".
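The logit-scale interval behind estimates of this kind can be sketched as follows. This is a minimal illustration, assuming the same arithmetic as the final step of the appendix code (WinP = estimate/2 + 0.5, with the standard error divided by WinP(1 - WinP) on the logit scale). The function name `winp_logit_ci` and the input values are hypothetical, Python is used instead of SAS or R, and the exact normal quantile replaces the rounded 1.96.

```python
import math
from statistics import NormalDist


def winp_logit_ci(diff, se, level=0.95):
    """Win probability, logit-based confidence limits, and two-sided P-value.

    diff : estimated difference in mean win fractions (treatment minus control)
    se   : standard error of that difference
    """
    winp = diff / 2 + 0.5
    if not 0.0 < winp < 1.0:
        raise ValueError("WinP must lie strictly between 0 and 1 for the logit scale")
    lgt = math.log(winp / (1 - winp))            # logit of WinP
    se_lgt = se / (winp * (1 - winp))            # delta-method SE on the logit scale
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    low = 1 / (1 + math.exp(-(lgt - z * se_lgt)))
    upp = 1 / (1 + math.exp(-(lgt + z * se_lgt)))
    # Testing WinP = 0.5 is testing logit(WinP) = 0.
    p_val = 2 * (1 - NormalDist().cdf(abs(lgt / se_lgt)))
    return winp, low, upp, p_val


# Made-up inputs for illustration only (not taken from either trial).
winp, low, upp, p_val = winp_logit_ci(diff=0.5, se=0.08)
```

Working on the logit scale keeps both limits inside (0, 1) and makes the interval asymmetric around the point estimate when WinP is far from 0.5, which is the pattern seen in the intervals reported above.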
For the labor pain trial [25], Figure 1 clearly demonstrates that the data are far from normal. In addition, there are substantial dropouts at later time intervals. We thus refrain from applying the MMRM directly to the pain scores. Table 5 presents the results of the WinP analysis. The estimate (95% CI) of the win probability at the sixth time interval based on the GPC method is 0.756 (0.649, 0.838) with P = 0.00005. The CCA method yielded an estimate of 0.895 (0.738, 0.962) and P = 0.00014. These results are comparable with those of the MMRM for win fractions, with a point estimate (95% CI) at the sixth time interval of 0.875 (0.722, 0.950) and P = 0.00012. In conclusion, we report that the probability that a woman treated with pain medicine had less pain than a woman in the control arm is 87.5% (95% CI 72.2 to 95.0%, P = 0.00012). Again, the underestimation by the GPC method can be explained by examining the interval-specific estimates from the MMRM for win fractions.

The superficially narrower confidence intervals from the GPC method for both examples are the result of treating the comparisons carried forward for missing data as real data. This feature of masking the uncertainty of missing data has been well documented in the literature for LOCF [2, 55]. LOCF treats the imputed values and the observed values on an equal footing, while the GPC method treats the imputed comparisons and the actual comparisons on an equal footing, and thus artificially increases the amount of available information. The simulation results also demonstrate this characteristic of the GPC method.

6 Discussion

Longitudinal RCTs are commonly analyzed by estimating the treatment effect at a pre-specified landmark timepoint. Missing data due to dropout is usually handled with the hypothetical strategy, so that the estimand targets the treatment effect under the assumption that 'patients take their medication as directed' [5].
When the outcome measurements are assumed to be normally distributed and inference is on the difference between group means, a large literature exists [1–3, 51, 56]. When the outcome data do not follow a normal distribution or lack clearly meaningful units, the MMRM approach may not be directly applicable. In such situations, the GPC procedure appears to be an attractive option due to its ability to prioritize timepoints [12]. Despite the large literature on the GPC procedure in RCTs, especially in situations where different types of outcomes are involved in the construction of a composite endpoint [15, 57], its validity for handling missing data in landmark analysis has not been fully explored, with few exceptions [17, 48], which provided conflicting results.

We found that the GPC method can yield biased results even under restrictive assumptions, such as data being missing completely at random and equal dropout rates between the two comparison groups. The GPC method appears to be valid only when there is no treatment effect at any timepoint. This result is important, since the GPC method was suggested by Buyse [12] as an alternative to the LOCF approach, which is well known to lead to misleading inference [2, 58]. The simulation results showed that the GPC method suffers from a problem similar to that associated with LOCF. This is not surprising, because the only difference between the two approaches is that the GPC carries the last comparison forward, whereas LOCF carries the last observation forward. In fact, the GPC approach in the present context results in a point estimate that may be regarded as a weighted average of timepoint-specific win probabilities, with complicated non-standardized weights that depend on the correlations among the repeated measures and the dropout rates, as well as on ties [45]. Similar problems have been identified for the infamous LOCF approach [2, 58].
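The 'last comparison carried forward' behaviour can be illustrated with a short sketch. It assumes, as our reading of the procedure rather than a transcription of Eq (18), that each treated-control pair is scored at the latest timepoint at which both participants are observed, using $H(a, b) = [1 - \mathrm{sign}(a - b)]/2$. Scoring pairs with no common timepoint as ties is a further assumption. The function names and the toy data are hypothetical, and Python is used instead of SAS or R.

```python
def h(a, b):
    """H(a, b) = [1 - sign(a - b)] / 2: 1 if a wins (lower score), 0.5 if tied, else 0."""
    if a < b:
        return 1.0
    if a == b:
        return 0.5
    return 0.0


def gpc_landmark(y1, y0):
    """GPC-style estimate with the last available pairwise comparison carried forward.

    y1, y0 : lists of per-participant score lists over the post-baseline
             timepoints (treated, control); None marks a missing score.
    """
    total = 0.0
    for a in y1:
        for b in y0:
            # Walk back from the landmark time to the latest jointly observed timepoint.
            for t in range(len(a) - 1, -1, -1):
                if a[t] is not None and b[t] is not None:
                    total += h(a[t], b[t])
                    break
            else:
                total += 0.5  # assumption: a pair never jointly observed counts as a tie
    return total / (len(y1) * len(y0))


# Toy data: the arms differ at time 1 but not at time 2 (the landmark),
# and one participant per arm drops out after time 1.
treated = [[10, 11], [9, None]]
control = [[14, 11], [15, None]]
estimate = gpc_landmark(treated, control)
```

Here the only pair observed at the landmark is tied (11 versus 11), yet three of the four pairs are scored on their time-1 comparison, so the estimate is 0.875 rather than 0.5. The example shows how earlier separation leaks into the landmark estimate when comparisons are carried forward.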
We proposed a simple rank-based alternative that involves three steps. First, convert the timepoint-specific raw scores to win fractions using ranks. Second, analyze the win fractions using the MMRM with least-squares contrasts to obtain estimates of the timepoint-specific win probabilities and their standard errors. Finally, apply the logit transformation for inference in terms of the win probability at the primary timepoint and at other timepoints as required. The simulation results suggest that our approach performs very well in terms of bias and confidence interval coverage, as long as the data are missing under MCAR or MAR. The MMRM procedure for win fractions also demonstrated some robustness to data under MNAR. Similar to the conventional MMRM analysis, our approach is useful for the primary analysis when the hypothetical strategy is used to deal with dropouts.

Our approach is in principle similar to the nonparametric approach of Rubarth et al [40], who developed a procedure for comparing multiple groups based on ranks by also invoking the 'Asymptotic Equivalence Theorem' given by Eq (3). However, their approach is more suitable for controlled or homogeneous study environments that require no adjustment for covariates. As a regression approach, our approach can readily handle covariates [21, 22], even in the case of correlated or clustered outcomes [50, 59, 60]. Moreover, the WinP estimates from our approach are collapsible due to the use of a linear model for estimation. Thus, our procedure makes moot the debate over which estimand to use in RCTs with discrete data.

We did not consider multiple imputation (MI), for two reasons. First, previous simulation results [48] showed that MI may result in severely conservative Type I error rates across a variety of scenarios, to the extent that the empirical rejection rates are around 2.5% for the nominal 5% level, as shown in their online Supplementary Tables S13 to S16.
Second, MI usually relies on parametric imputation models, including the predictive mean matching method. It is unclear to us how to proceed with ordered categorical data or severely skewed outcomes such as those in the labor pain example. Addressing these difficulties, so as to exploit the advantage of MI in using the information contained in auxiliary variables, is left for future research.

In summary, we have identified the limitations of using the GPC method for handling missing data in longitudinal trials when the primary comparison is at the last planned timepoint (landmark). The MMRM procedure for win fractions, which can be obtained easily using ranks, presents a valid and much simpler alternative for estimating the win probability. The estimated win probability can be directly transformed to the net treatment benefit [12, 13] and the win odds [34]. Because it uses an unstructured time profile and covariance structure, the MMRM procedure for win fractions can be pre-specified in the protocol and the statistical analysis plan [3, 5]. In protocol development, the defining attributes of the estimand remain the same as those for longitudinal RCTs using the hypothetical strategy to deal with missing data, as detailed by Mallinckrodt et al [5], except that the 'population summary' is the win probability rather than the 'mean difference'. As a rank-based procedure, our approach requires no user-written specialized software packages. The SAS and R code for analyzing the postnatal depression trial is presented in the (online) Appendix.

7 Acknowledgments

The research of Drs G Zou and Choi was supported partially by Individual Discovery Grants from the Natural Sciences and Engineering Research Council of Canada, Grant/Award Numbers: RGPIN-2019-04741, RGPIN-2019-06549. Dr. Qiu is supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K202201101).
8 Appendix (online)

8.1 SAS code

In this appendix, we present SAS and R code for the analysis of the postnatal depression trial (EPDS) to estimate the win probability using the MMRM for win fractions. Note that the non-integer scores resulted from substituting the average of all available items for the missing questionnaire items.

Three steps are involved. First, convert the wide-format data to timepoint-specific win fractions using PROC RANK, with the option DESCENDING to accommodate smaller scores being wins. Second, analyze the win fractions in long format using PROC MIXED and obtain the timepoint-specific win probabilities through treatment-by-time interaction contrasts using LSMESTIMATE statements. Finally, manipulate the results from LSMESTIMATE with the logit transformation to obtain the win probability estimate and confidence interval for each timepoint.

* EPDS data;
data wideEPDS;
  input id trt y0 y1 y2 y3 y4 y5 y6 @@;
cards;
1 0 18 17 18 15 17 14 15
2 0 27 26 23 18 17 12 10
3 0 16 17 14 . . . .
4 0 17 14 23 17 13 12 12
5 0 15 12 10 8 4 5 5
6 0 20 19 11.54 9 8 6.82 5.05
7 0 16 13 13 9 7 8 7
8 0 28 26 27 . . . .
9 0 28 26 24 19 13.94 11 9
10 0 25 9 12 15 12 13 20
11 0 24 14 . . . . .
12 0 16 19 13 14 23 15 11
13 0 26 13 22 . . . .
14 0 21 7 13 . . . .
15 0 21 18 . . . . .
16 0 22 18 . . . . .
17 0 26 19 13 22 12 18 13
18 0 19 19 7 8 2 5 6
19 0 22 20 15 20 17 15 13.73
20 0 16 7 8 12 10 10 12
21 0 21 19 18 16 13 16 15
22 0 20 16 21 17 21 16 18
23 0 17 15 . . . . .
24 0 22 20 21 17 14 14 10
25 0 19 16 19 . . . .
26 0 21 7 4 4.19 4.73 3.03 3.45
27 0 18 19 . . . . .
28 1 21 13 12 9 9 13 6
29 1 27 8 17 15 7 5 7
30 1 15 8 12.27 10 10 6 5.96
31 1 24 14 14 13 12 18 15
32 1 15 15 16 11 14 12 8
33 1 17 9 5 3 6 0 2
34 1 20 7 7 7 12 9 6
35 1 18 8 1 1 2 0 1
36 1 28 11 7 3 2 2 2
37 1 21 7 8 6 6.5 4.64 4.97
38 1 18 8 6 4 11 7 6
39 1 27.46 22 27 24 22 24 23
40 1 19 14 12 15 12 9 6
41 1 20 13 10 7 9 11 11
42 1 16 17 26 . . . .
43 1 21 19 9 9 12 5 7
44 1 23 11 7 5 8 2 3
45 1 23 16 13 . . . .
46 1 24 16 15 11 11 11 11
47 1 25 20 18 16 9 10 6
48 1 22 15 17.57 12 9 8 6.5
49 1 20 7 2 1 0 0 2
50 1 20 12.13 8 6 3 2 3
51 1 25 15 24 18 15.19 13 12.32
52 1 18 17 6 2 2 0 1
53 1 26 1 18 10 13 12 10
54 1 20 27 13 9 8 4 5
55 1 17 20 10 8.89 8.49 7.02 6.79
56 1 22 12 . . . . .
57 1 22 15.38 2 4 6 3 3
58 1 23 11 9 10 8 7 4
59 1 17 15 . . . . .
60 1 22 7 12 15 . . .
61 1 26 24 . . . . .
;

* First, create win fractions;
proc sort data=wideEPDS;
  by trt;
run;

ods listing close;
* Number of non-missing scores per group at each visit;
ods output summary=NN(keep=trt y0_N y1_N y2_N y3_N y4_N y5_N y6_N);
proc means data=wideEPDS;
  by trt;
  var y0-y6;
run;

data NN; * switch sample size at each visit so each group gets the size of the other group;
  set NN;
  trt = 1 - trt;
run;

proc sort data=NN;
  by trt;
run;

* Overall (pooled) descending midranks;
proc rank descending data=wideEPDS out=overrank(keep=yo0-yo6 trt id);
  var y0-y6;
  ranks yo0-yo6;
run;

* Within-group descending midranks;
proc rank descending data=wideEPDS out=grprank(keep=yg0-yg6 trt id);
  by trt;
  var y0-y6;
  ranks yg0-yg6;
run;

data widewinF; * win fractions;
  merge overrank grprank NN;
  by trt;
  y0 = (yo0 - yg0)/y0_N;
  y1 = (yo1 - yg1)/y1_N;
  y2 = (yo2 - yg2)/y2_N;
  y3 = (yo3 - yg3)/y3_N;
  y4 = (yo4 - yg4)/y4_N;
  y5 = (yo5 - yg5)/y5_N;
  y6 = (yo6 - yg6)/y6_N;
  keep trt id y0-y6;
run;

data longWinF;
  set widewinF;
  time='y1'; winF = y1; output;
  time='y2'; winF = y2; output;
  time='y3'; winF = y3; output;
  time='y4'; winF = y4; output;
  time='y5'; winF = y5; output;
  time='y6'; winF = y6; output;
  keep id trt time winF y0;
run;

* The original sorted and classified by a variable named outcome, which does not exist in longWinF; time is used here;
proc sort data=longWinF out=longWF;
  by id time;
run;

* Second, analyze long format win fractions;
ods listing close;
ods output LSMEstimates=MMRMEst;
proc mixed data=longWF;
  class id trt time;
  model winF = y0*time time*trt / noint notest ddfm=kr;
  repeated time / subject=id type=un group=trt;
  lsmestimate trt*time [-1, 1 1][1, 2 1];
  lsmestimate trt*time [-1, 1 2][1, 2 2];
  lsmestimate trt*time [-1, 1 3][1, 2 3];
  lsmestimate trt*time [-1, 1 4][1, 2 4];
  lsmestimate trt*time [-1, 1 5][1, 2 5];
  lsmestimate trt*time [-1, 1 6][1, 2 6];
run;

* Finally, obtain WinP estimates and 95% CIs;
data WinP;
  set MMRMEst;
  WinP = Estimate/2 + .5;
  * logit transformation works better;
  lgt = log(WinP/(1 - WinP));
  selgt = StdErr/(WinP*(1 - WinP));
  l = lgt - 1.96*selgt;
  u = lgt + 1.96*selgt;
  low = logistic(l);
  upp = logistic(u);
  wid = upp - low;
  test = lgt/selgt;
  p_val = 2*(1 - probnorm(abs(test)));
  keep winP low upp wid p_val;
run;

ods listing;
proc print;
  var winP low upp wid p_val;
run;

8.2 R code

# Load libraries
library(dplyr)
library(tidyr)
library(mmrm)
library(emmeans)

epds_wide <- read.csv("wideEPDS.csv")

# Transform to long format
epds_long <- epds_wide |>
  pivot_longer(!c(id, trt, y0), names_to = "t", values_to = "y") |>
  mutate(time = as.numeric(gsub("y", "", t))) |>
  select(-t)

# Transform time, id and trt to factor format
epds_long$time <- as.factor(epds_long$time)
epds_long$id <- as.factor(epds_long$id)
epds_long$trt <- as.factor(epds_long$trt)

######--------------transform to rank based win fractions-----------------####
# Calculate sample size based on different trt and time (excluding missing)
epds_long2 <- epds_wide |>
  pivot_longer(!c(id, trt), names_to = "t", values_to = "y") |>
  mutate(time = as.numeric(gsub("y", "", t))) |>
  select(-t)

# Switch n and count by group and time
count_data <- epds_long2 |>
  mutate(trt = ifelse(trt == "0", "1", "0")) |>
  group_by(trt, time) |>
  summarize(n = sum(!is.na(y))) |>
  ungroup()

# Create ranks (descending): R0-R6 overall, r0-r6 within treatment group
ranks <- epds_wide |>
  mutate(R0 = rank(-y0, na.last = "keep"),
         R1 = rank(-y1, na.last = "keep"),
         R2 = rank(-y2, na.last = "keep"),
         R3 = rank(-y3, na.last = "keep"),
         R4 = rank(-y4, na.last = "keep"),
         R5 = rank(-y5, na.last = "keep"),
         R6 = rank(-y6, na.last = "keep")) |>
  group_by(trt) |>
  mutate(r0 = rank(-y0, na.last = "keep"),
         r1 = rank(-y1, na.last = "keep"),
         r2 = rank(-y2, na.last = "keep"),
         r3 = rank(-y3, na.last = "keep"),
         r4 = rank(-y4, na.last = "keep"),
         r5 = rank(-y5, na.last = "keep"),
         r6 = rank(-y6, na.last = "keep")) |>
  ungroup()

# Transform ranks to the long format
ranks_long <- ranks |>
  pivot_longer(cols = c(y0:y6, R0:R6, r0:r6),
               names_to = c(".value", "time"),
               names_pattern = "(.)(.)") |>
  mutate(time = as.numeric(time)) |>
  merge(count_data, by = c("trt", "time")) |>  # merge with count data n by group and time
  mutate(dd = R - r,                           # overall rank minus within-group rank
         winf = dd / n,                        # win fraction
         time = as.factor(time),
         trt = as.factor(trt),
         id = as.factor(id))

# Get the column winf0 for baseline
winf0 <- ranks_long |>
  filter(time == 0) |>
  mutate(winf0 = winf) |>
  select(trt, id, winf0)

ranks_long <- ranks_long |>
  merge(winf0, by = c("trt", "id")) |>
  filter(time != "0") |>
  arrange(time, id)

######--------------Apply the MMRM to the win fractions -----------------####
# Reorder the levels of 'trt' so that '1' comes before '0'
ranks_long$trt <- factor(ranks_long$trt, levels = c("1", "0"))

mmrm_winf <- mmrm(winf ~ time * trt + winf0 * time + winf0 * trt + us(time | trt/id),
                  data = ranks_long,
                  control = mmrm_control(method = "Kenward-Roger",
                                         vcov = "Kenward-Roger-Linear"))
summary(mmrm_winf)

# Use emmeans to get the estimated marginal means
emm_results_winf <- emmeans(mmrm_winf, ~ trt | time)

# Perform contrasts for time-specific comparisons of treatment vs control
contrast_results_winf <- contrast(emm_results_winf, interaction = c("pairwise"))
contrast_summary_winf <- summary(contrast_results_winf, infer = c(TRUE, TRUE))

# Display the estimate, 95% confidence interval, and p-value
print(contrast_summary_winf)

# Calculate WinP and its confidence intervals
mmrm_estimates <- contrast_summary_winf |>
  mutate(WinP = estimate / 2 + 0.5,
         lgt = log(WinP / (1 - WinP)),
         selgt = SE / (WinP * (1 - WinP)),
         l = lgt - 1.96 * selgt,
         u = lgt + 1.96 * selgt,
         low = 1 / (1 + exp(-l)),
         upp = 1 / (1 + exp(-u)),
         wid = upp - low,
         test = lgt / selgt,
         p_val = 2 * (1 - pnorm(abs(test))))

# Display WinP estimates, confidence intervals, and p-values
mmrm_estimates |>
  select(WinP, low, upp, wid, p_val) |>
  print()

References

1. Mallinckrodt CH, Clark WS, Carroll RJ, Molenberghs G. Assessing response profiles from incomplete longitudinal clinical trial data under regulatory considerations. Journal of Biopharmaceutical Statistics. 2003;13(2):179–190. doi:10.1081/BIP-120019265.
2. Molenberghs G, Thijs H, Jansen I, Beunckens C, Kenward MG, Mallinckrodt C, et al. Analyzing incomplete longitudinal clinical trial data. Biostatistics. 2004;5(3):445–464. doi:10.1093/biostatistics/kxh001.
3. Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Information Journal. 2008;42(4):303–319. doi:10.1177/009286150804200402.
4. Ashbeck EL, Bell ML. Single time point comparisons in longitudinal randomized controlled trials: power and bias in the presence of missing data. BMC Medical Research Methodology. 2016;16:1–8. doi:10.1186/s12874-016-0144-0.
5. Mallinckrodt C, Molenberghs G, Lipkovich I, Ratitch B. Estimands, Estimators and Sensitivity Analysis in Clinical Trials. Chapman and Hall/CRC; 2019.
6. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581–592. doi:10.1093/biomet/63.3.581.
7. Little RJ, D'Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, et al. The prevention and treatment of missing data in clinical trials. New England Journal of Medicine. 2012;367(14):1355–1360. doi:10.1056/NEJMsr1203730.
8. Sullivan TR, White IR, Salter AB, Ryan P, Lee KJ. Should multiple imputation be the method of choice for handling missing data in randomized trials? Statistical Methods in Medical Research. 2018;27(9):2610–2626. doi:10.1177/0962280216683570.
9. Ratitch B, Goel N, Mallinckrodt C, Bell J, Bartlett JW, Molenberghs G, et al. Defining efficacy estimands in clinical trials: examples illustrating ICH E9 (R1) guidelines. Therapeutic Innovation & Regulatory Science. 2020;54(2):370–384. doi:10.1007/s43441-019-00065-7.
10. Kruskal WH. Historical notes on the Wilcoxon unpaired two-sample test. Journal of the American Statistical Association. 1957;52(4):356–360. doi:10.2307/2280906.
11. Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics. 1947;18(1):50–60. doi:10.1214/aoms/1177730491.
12. Buyse M. Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine. 2010;29(30):3245–3257. doi:10.1002/sim.3923.
13. Lachin JM. Some large-sample distribution-free estimators and tests for multivariate partially incomplete data from two populations. Statistics in Medicine. 1992;11:1151–1170. doi:10.1002/sim.4780110903.
14. Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. The Stata Journal. 2002;2(1):45–64. doi:10.1177/1536867X020020.
15. Pocock SJ, Gregson J, Collier TJ, Ferreira JP, Stone GW. The win ratio in cardiology trials: lessons learnt, new developments, and wise future use. European Heart Journal. 2024;45(44):4684–4699. doi:10.1093/eurheartj/ehae647.
16. Huang X, Jiao L, Wei L, Quan H, Teoh L, Koch GG. Missing radiographic data handling in randomized clinical trials in rheumatoid arthritis. Journal of Biopharmaceutical Statistics. 2013;23(6):1435–1452. doi:10.1080/10543406.2013.834913.
17. Fan C, Zhang D, Wei L, Koch G. Methods for missing data handling in randomized clinical trials with nonnormal endpoints with application to a phase III clinical trial. Statistics in Biopharmaceutical Research. 2016;8(2):179–193. doi:10.1080/19466315.2016.1142890.
18. Sun H, Kawaguchi A, Koch G. Analyzing multiple endpoints in a confirmatory randomized clinical trial—an approach that addresses stratification, missing values, baseline imbalance and multiplicity for strictly ordinal outcomes. Pharmaceutical Statistics. 2017;16(2):157–166. doi:10.1002/pst.1799.
19. Kawaguchi A, Koch GG. Sanon: an R package for stratified analysis with nonparametric covariable adjustment. Journal of Statistical Software. 2015;67:1–37. doi:10.18637/jss.v067.i09.
20. Zou G, Zou L, Choi YH. Distribution-free approach to the design and analysis of randomized stroke trials with the Modified Rankin Scale. Stroke. 2022;53:3025–3031. doi:10.1161/STROKEAHA.121.037744.
21. Zou G, Smith EJ, Zou L, Qiu SF, Shu D. A rank-based approach to design and analysis of pretest-posttest randomized trials, with application to COVID-19 ordinal scale data. Contemporary Clinical Trials. 2023;126:107085. doi:10.1016/j.cct.2023.107085.
22. Zou G, Zou L, Qiu SF. Parametric and nonparametric methods for confidence intervals and sample size planning for win probability in parallel-group randomized trials with Likert item and Likert scale data. Pharmaceutical Statistics. 2023;22(3):418–439. doi:10.1002/pst.2280.
23. Gregoire AJ, Kumar R, Everitt B, Henderson AF, Studd JW. Transdermal oestrogen for treatment of severe postnatal depression. Lancet. 1996;347:930–934. doi:10.1016/s0140-6736(96)91414-2.
24. Rabe-Hesketh S, Everitt B. A Handbook of Statistical Analyses using Stata, 3rd ed. Chapman & Hall/CRC; 2003.
25. Davis CS. Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Statistics in Medicine. 1991;10(12):1959–1980. doi:10.1002/sim.4780101210.
26. Brunner E, Munzel U, Puri ML. The multivariate nonparametric Behrens–Fisher problem. Journal of Statistical Planning and Inference. 2002;108(1-2):37–53. doi:10.1016/S0378-3758(02)00269-0.
27. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology. 1975;12:387–415. doi:10.1016/0022-2496(75)90001-2.
28. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;44:837–845. doi:10.2307/2531595.
29. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36. doi:10.1148/radiology.143.1.7063747.
30. Bross IDJ. How to use ridit analysis. Biometrics. 1958;14:18–38. doi:10.2307/2527727.
31. Beder JH, Heim RC. On the use of ridit analysis. Psychometrika. 1990;55:603–616. doi:10.1007/BF02294610.
32. Harrell Jr FE, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine. 1996;15(4):361–387. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.
33. Cohen J. A power primer. Psychological Bulletin. 1992;112:155–159. doi:10.1037//0033-2909.112.1.155.
34. Dong G, Hoaglin DC, Qiu J, Matsouaka RA, Chang YW, Wang J, et al. The win ratio: on interpretation and handling of ties. Statistics in Biopharmaceutical Research. 2020;12(1):99–106. doi:10.1080/19466315.2019.1575279.
35. Agresti A. Generalized odds ratios for ordinal data. Biometrics. 1980;36(1):59–67. doi:10.2307/2530495.
36. Brunner E, Munzel U. The nonparametric Behrens-Fisher problem: Asymptotic theory and a small-sample approximation. Biometrical Journal. 2000;42(1):17–25. doi:10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U.
37. Hanley JA, Hajian-Tilaki KO. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Academic Radiology. 1997;4(1):49–58. doi:10.1016/s1076-6332(97)80161-4.
38. Dodd L, Pepe MS. Semiparametric regression for the area under the receiver operating characteristic curve. Journal of the American Statistical Association. 2003;98(462):409–417. doi:10.1198/016214503000198.
39. Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics. 1948;19:293–325. doi:10.1214/aoms/1177730196.
40. Rubarth K, Pauly M, Konietschke F. Ranking procedures for repeated measures designs with missing data: estimation, testing and asymptotic theory. Statistical Methods in Medical Research. 2022;31(1):105–118. doi:10.1177/09622802211046389.
41. MacKinnon JG, White H. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics. 1985;29(3):305–325. doi:10.1016/0304-4076(85)90158-7.
42. Sen PK. A note on asymptotically distribution-free confidence bounds for P(X < Y), based on two independent samples. Sankhya: The Indian Journal of Statistics. 1967;29(1):95–102. doi:10.2307/25049448.
43. Tcheuko L, Gallas B, Samuelson F. Using ANOVA/random-effects variance estimates to compute a two-sample U-statistic of order (1, 1) estimate of variance. Journal of Statistical Theory and Practice. 2016;10:87–99. doi:10.1080/15598608.2015.1077759.
44. Brunner E, Konietschke F. An unbiased rank-based estimator of the Mann–Whitney variance including the case of ties. Statistical Papers. 2025;66(1):20. doi:10.1007/s00362-024-01635-0.
45. Rauch G, Jahn-Eimermacher A, Brannath W, Kieser M. Opportunities and challenges of combined effect measures based on prioritized outcomes. Statistics in Medicine. 2014;33(7):1104–1120. doi:10.1002/sim.6010.
46. Zhou TJ, LaValley MP, Nelson KP, Cabral HJ, Massaro JM. Calculating power for the Finkelstein and Schoenfeld test statistic for a composite endpoint with two components. Statistics in Medicine.
2022;41(17):3321–3335. doi:10.1002/sim.9419. 47. F uyama K, Ogaw a M, Mizusaw a J, Kanemitsu Y, F ujita S, Kaw ahara T, et al. Impact of correlations b et ween prioritized outcomes on the net b enefit and its estimate by generalized pairwise comparisons. Statistics in Medicine. 2023;42(10):1606–1624. doi:10.1002/sim.9690. 48. Deltuv aite-Thomas V, Burzyk owski T. Op erational characteristics of univ ariate generalized pairwise comparisons with missing data. Comm unications in Statistics-Sim ulation and Computation. 2023;x:1–19. doi:10.1080/03610918.2023.2253380. 49. Jasp ers S, V erb eec k J, Thas O. Cov ariate-adjusted generalized pairwise comparisons in small samples. Statistics in Medicine. 2024;43(21):4027–4042. doi:10.1002/sim.10140. 50. Zou G, Zou L. A nonparametric global win probability approach to the analysis and sizing of randomized controlled trials with multiple endp oin ts of different scales and missing data: Beyond O’Brien-W ei-Lachin. Statistics in Medicine. 2024;43:5366–5379. doi:10.1002/SIM.10247. 51. Siddiqui O. MMRM versus MI in dealing with missing data—a comparison based on 25 NDA data sets. Journal of Biopharmaceutical Statistics. 2011;21(3):423–436. doi:10.1080/10543401003777995. 52. Mallinc kro dt CH, Clark WS, Da vid SR. Accounting for drop out bias using mixed-effects mo dels. Journal of Biopharmaceutical Statistics. 2001;11(1-2):9–21. doi:10.1081/BIP-100104194. 53. Mallinc kro dt CH, Kaiser CJ, W atkin JG, Molenberghs G, Carroll RJ. The effect of correlation structure on treatment con trasts estimated from incomplete clinical trial data with likelihoo d-based rep eated measures compared with last observ ation carried forw ard ANOV A. Clinical T rials. 2004;1(6):477–489. doi:10.1191/1740774504cn049oa. 54. Barnes SA, Mallinckrodt CH, Lindb org SR, Carter MK. The impact of missing data and how it is handled on the rate of false-p ositive results in drug developmen t. Pharmaceutical Statistics. 2008;7(3):215–225. doi:10.1002/pst.310. 55. 
Beunc kens C, Molen b erghs G, Kenw ard MG. Direct likelihoo d analysis v ersus simple forms of imputation for missing data in randomized clinical trials. Clinical T rials. 2005;2(5):379–386. doi:10.1191/1740774505cn119oa. 56. Siddiqui O, Hung HJ, O’Neill R. MMRM vs. LOCF: a comprehensive comparison based on simulation study and 25 NDA datasets. Journal of Biopharmaceutical Statistics. 2009;19(2):227–246. doi:10.1080/10543400802609797. Marc h 18, 2026 24/31 57. Buyse M, V erb eec k J, Saad ED, De Back er M, Deltuv aite-Thomas V, Molenberghs G. Handb ook of Generalized Pairwise Comparisons: Methods for Patien t-Centric Analysis. Chapman and Hall/CRC; 2025. 58. Lac hin JM. F allacies of last observ ation carried forward analyses. Clinical T rials. 2016;13(2):161–168. doi:10.1177/1740774515602688. 59. Zou G. Confidence interv al estimation for treatment effects in cluster randomization trials based on ranks. Statistics in Medicine. 2021;40(14):3227–3250. doi:10.1002/sim.8918. 60. Da vies Smith E, Jairath V, Zou G. Rank-based estimators of global treatment effects for cluster randomized trials with m ultiple endp oin ts. Statistical Methods in Medical Researc h. 2025;34:1267–1289. doi:10.1177/09622802251338387. 
[Figure 1: boxplots. Panel 1) Treatment of postnatal depression: Depression Score by Monthly visit (0 to 6), for Placebo (n = 27, 27, 22, 17, 17, 17, 17) and Estrogen (n = 34, 34, 31, 29, 28, 28, 28). Panel 2) Treatment of labour pain: Pain Score by 30-minute interval (1 to 6), for Placebo (n = 40, 36, 30, 27, 21, 15) and Treatment (n = 43, 39, 35, 29, 24, 19).]

Fig 1. Boxplots by treatment group: 1) the Edinburgh postnatal depression scale (EPDS) scores at baseline (visit 0) and six post-intervention visits in a randomized trial evaluating an estrogen patch for treating postnatal women with major depression [23] and 2) the pain scores over six 30-minute intervals in a trial evaluating a treatment for maternal pain relief during labor [25]. n denotes the number of participants at each timepoint.

Table 1. Percentages of participants who had data deleted (dropout %) according to trigger value and probability by treatment arm (control, treatment) and missing data mechanism (missing at random (MAR) and missing not at random (MNAR)). Each column heading gives the two (trigger value, probability) settings in the order printed in the source.

| Trajectory | Case | Group | MAR: 16, 40% / 16, 40% | MAR: 16, 40% / 15, 40% | MAR: 16, 50% / 16, 30% | MAR: 16, 50% / 15, 30% | MNAR: 16, 40% / 15, 40% | MNAR: 16, 40% / 16, 40% | MNAR: 16, 50% / 16, 30% | MNAR: 16, 50% / 15, 30% |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1-4 | 0 | 26 | 19 | 33 | 24 | 27 | 20 | 36 | 25 |
| | | 1 | 26 | 26 | 20 | 20 | 28 | 28 | 23 | 23 |
| 2 | 5-8 | 0 | 30 | 26 | 38 | 33 | 34 | 29 | 41 | 36 |
| | | 1 | 24 | 24 | 19 | 19 | 28 | 28 | 22 | 22 |
| 3 | 9-12 | 0 | 26 | 23 | 33 | 29 | 29 | 25 | 36 | 30 |
| | | 1 | 20 | 24 | 16 | 19 | 22 | 27 | 18 | 21 |
| 4 | 13-16 | 0 | 22 | 20 | 25 | 25 | 25 | 30 | 25 | 28 |
| | | 1 | 26 | 26 | 21 | 24 | 27 | 27 | 21 | 26 |

Table 2. Performance based on 1000 simulation replicates of three methods for handling missing data in comparison with no missing data in estimating landmark win probability under data missing completely at random (MCAR) of 30% in each arm with sample size 50 per group.

| Trajectory | Method | Bias%∗ | ML | CV | MR | (WD) | EP§ |
|---|---|---|---|---|---|---|---|
| 1 | No missing | 0.2 | 2.1 | 95.7 | 2.2 | (22.0) | 4.3 |
| | GPC | 0.5 | 2.6 | 94.6 | 2.8 | (20.5) | 5.4 |
| | CCA | -4.5 | 3.1 | 95.3 | 1.6 | (26.2) | 4.7 |
| | MMRM | -3.2 | 2.5 | 96.0 | 1.5 | (25.1) | 4.0 |
| 2 | No missing | 0.7 | 2.2 | 95.6 | 2.2 | (22.1) | 4.4 |
| | GPC | 57.5 | 0.6 | 90.3 | 9.1 | (20.4) | 9.7 |
| | CCA | -4.5 | 3.1 | 95.3 | 1.6 | (26.2) | 4.7 |
| | MMRM | -3.1 | 2.5 | 96.1 | 1.4 | (25.2) | 3.9 |
| 3 | No missing | -0.1 | 2.1 | 95.8 | 2.1 | (21.8) | 18.1 |
| | GPC | -41.2 | 7.1 | 92.2 | 0.7 | (20.4) | 11.3 |
| | CCA | -4.0 | 2.7 | 95.7 | 1.6 | (26.0) | 12.5 |
| | MMRM | -3.0 | 2.4 | 95.9 | 1.7 | (24.9) | 14.0 |
| 4 | No missing | 0.1 | 2.0 | 95.9 | 2.1 | (21.2) | 55.1 |
| | GPC | -170.7 | 43.1 | 56.8 | 0.1 | (20.4) | 8.3 |
| | CCA | -4.0 | 3.1 | 95.6 | 1.3 | (25.4) | 38.2 |
| | MMRM | -3.4 | 2.5 | 96.0 | 1.5 | (24.5) | 42.8 |

∗ Bias % is defined as the mean of $(\widehat{\mathrm{WinP}} - \mathrm{WinP})/\mathrm{WinP} \times 100$ over 1000 simulations.
§ Each 95% confidence interval entry is presented as ML CV% MR (WD × 100) EP, where ML and MR indicate the percentage of times the upper limit is smaller and the lower limit is greater than the true parameter value, respectively, CV indicates the percentage of times the confidence interval covers the parameter value, WD is the mean width of 1000 confidence intervals, and EP (empirical power) is defined as the percentage of times that the null hypothesis H0: WinP = 0.5 is rejected at the 2-sided 5% significance level.

Table 3. Performance based on 1000 simulation replicates of three methods for handling missing data in comparison with no missing data in estimating landmark win probability under data missing at random (MAR) and missing not at random (MNAR) with sample size 50 per group.

| Case† | Method | MAR Bias%∗ | MAR ML | MAR CV | MAR MR | MAR (WD) | MAR EP§ | MNAR Bias% | MNAR ML | MNAR CV | MNAR MR | MNAR (WD) | MNAR EP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No missing | Traj 1 | 0.2 | 2.1 | 95.7 | 2.2 | (22.0) | 4.3 | 0.2 | 2.1 | 95.7 | 2.2 | (22.0) | 4.3 |
| 1 | GPC | 36.7 | 0.8 | 94.2 | 5.0 | (21.6) | 5.8 | 45.1 | 0.5 | 93.5 | 6.0 | (20.8) | 6.5 |
| | CCA | 40.6 | 1.0 | 92.5 | 6.5 | (25.8) | 7.5 | 46.2 | 0.7 | 92.8 | 6.5 | (26.4) | 7.2 |
| | MMRM | 26.5 | 1.4 | 94.7 | 3.9 | (25.5) | 5.3 | 38.0 | 0.7 | 93.8 | 5.5 | (26.4) | 6.2 |
| 2 | GPC | 30.9 | 0.8 | 94.6 | 4.6 | (21.4) | 5.4 | 46.1 | 0.4 | 94.1 | 5.5 | (20.8) | 5.9 |
| | CCA | 52.5 | 0.8 | 91.8 | 7.4 | (26.1) | 8.2 | 70.4 | 0.2 | 90.4 | 9.4 | (26.8) | 9.6 |
| | MMRM | 33.2 | 1.0 | 94.4 | 4.6 | (25.6) | 5.6 | 56.3 | 0.3 | 92.3 | 7.4 | (26.7) | 7.7 |
| 3 | GPC | 51.4 | 0.5 | 92.1 | 7.4 | (21.6) | 7.9 | 34.3 | 1.0 | 94.2 | 4.8 | (20.9) | 5.8 |
| | CCA | 0.6 | 2.0 | 95.4 | 2.6 | (25.8) | 4.6 | 0.4 | 2.3 | 94.7 | 3.0 | (26.5) | 5.3 |
| | MMRM | 4.3 | 2.2 | 95.0 | 2.8 | (25.6) | 5.0 | -2.6 | 2.3 | 95.0 | 2.7 | (26.4) | 5.0 |
| 4 | GPC | 46.6 | 0.6 | 93.1 | 6.3 | (21.5) | 6.9 | 34.8 | 0.9 | 94.3 | 4.8 | (20.9) | 5.7 |
| | CCA | 6.6 | 2.0 | 95.3 | 2.7 | (26.1) | 4.7 | 18.1 | 1.7 | 94.2 | 4.1 | (26.9) | 5.8 |
| | MMRM | 7.4 | 2.0 | 95.1 | 2.9 | (25.6) | 4.9 | 10.5 | 1.8 | 94.9 | 3.3 | (26.7) | 5.1 |
| No missing | Traj 2 | 0.7 | 2.2 | 95.6 | 2.2 | (22.1) | 4.4 | 0.7 | 2.2 | 95.6 | 2.2 | (22.1) | 4.4 |
| 5 | GPC | 80.8 | 0.1 | 87.2 | 12.7 | (21.4) | 12.8 | 60.3 | 0.3 | 92.6 | 7.1 | (20.9) | 7.4 |
| | CCA | 16.9 | 1.5 | 94.1 | 4.4 | (26.0) | 5.9 | 30.2 | 1.3 | 93.8 | 4.9 | (27.0) | 6.2 |
| | MMRM | 4.5 | 2.0 | 95.0 | 3.0 | (25.7) | 5.0 | 20.3 | 1.3 | 94.7 | 4.0 | (27.0) | 5.3 |
| 6 | GPC | 71.3 | 0.2 | 88.6 | 11.2 | (21.4) | 11.4 | 63.9 | 0.3 | 91.7 | 8.0 | (20.7) | 8.3 |
| | CCA | 24.8 | 1.3 | 94.2 | 4.5 | (25.6) | 5.8 | 47.0 | 0.6 | 92.5 | 6.9 | (26.4) | 7.5 |
| | MMRM | 8.8 | 1.6 | 95.5 | 2.9 | (25.3) | 4.5 | 34.7 | 0.9 | 93.7 | 5.4 | (26.5) | 6.3 |
| 7 | GPC | 96.0 | 0.1 | 84.5 | 15.4 | (21.4) | 15.5 | 47.9 | 0.5 | 93.4 | 6.1 | (21.1) | 6.6 |
| | CCA | -16.3 | 3.9 | 94.5 | 1.6 | (26.2) | 5.5 | -17.9 | 3.5 | 94.5 | 2.0 | (27.1) | 5.5 |
| | MMRM | -14.0 | 3.0 | 95.4 | 1.6 | (25.8) | 4.6 | -22.5 | 3.5 | 95.0 | 1.5 | (27.1) | 5.0 |
| 8 | GPC | 85.3 | 0.2 | 85.6 | 14.2 | (21.5) | 14.4 | 51.9 | 0.5 | 92.9 | 6.6 | (20.9) | 7.1 |
| | CCA | -7.4 | 2.8 | 95.0 | 2.2 | (25.7) | 5.0 | 1.9 | 2.2 | 94.5 | 3.3 | (26.5) | 5.5 |
| | MMRM | -8.9 | 2.9 | 95.4 | 1.7 | (25.3) | 4.6 | -5.2 | 2.1 | 95.5 | 2.4 | (26.5) | 4.5 |

† Case number denotes the combination of Trajectory and missing data percentage as shown in Table 1.
∗ Bias % is defined as the mean of $(\widehat{\mathrm{WinP}} - \mathrm{WinP})/\mathrm{WinP} \times 100$ over 1000 simulations.
§ Each 95% confidence interval entry is presented as ML CV% MR (WD × 100) EP, where ML and MR indicate the percentage of times the upper limit is smaller and the lower limit is greater than the true parameter value, respectively, CV indicates the percentage of times the confidence interval covers the parameter value, WD is the mean width of 1000 confidence intervals, and EP (empirical power) is defined as the percentage of times that the null hypothesis H0: WinP = 0.5 is rejected at the 2-sided 5% significance level.

Table 4. Performance based on 1000 simulation replicates of three methods for handling missing data in comparison with no missing data in estimating win probability at endpoint under data missing at random (MAR) and missing not at random (MNAR) with sample size 50 per group.

| Case† | Method | MAR Bias%∗ | MAR ML | MAR CV | MAR MR | MAR (WD) | MAR EP§ | MNAR Bias% | MNAR ML | MNAR CV | MNAR MR | MNAR (WD) | MNAR EP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No missing | Traj 3 | -0.1 | 2.1 | 95.8 | 2.1 | (21.8) | 18.1 | -0.1 | 2.1 | 95.8 | 2.1 | (21.8) | 18.1 |
| 9 | GPC | 21.5 | 1.6 | 94.7 | 3.7 | (21.4) | 25.2 | 0.8 | 2.3 | 94.9 | 2.8 | (20.6) | 19.9 |
| | CCA | 41.7 | 0.8 | 93.1 | 6.1 | (25.2) | 26.3 | 42.8 | 0.5 | 93.5 | 6.0 | (25.7) | 23.6 |
| | MMRM | 28.4 | 1.1 | 95.2 | 3.7 | (25.1) | 21.5 | 36.3 | 0.5 | 94.1 | 5.4 | (25.7) | 22.2 |
| 10 | GPC | 13.3 | 2.5 | 94.6 | 2.9 | (21.3) | 22.8 | -1.1 | 2.6 | 95.0 | 2.4 | (20.6) | 18.5 |
| | CCA | 55.0 | 0.7 | 92.0 | 7.3 | (25.5) | 29.7 | 65.1 | 0.3 | 90.9 | 8.8 | (26.0) | 30.5 |
| | MMRM | 35.7 | 1.0 | 94.7 | 4.3 | (25.2) | 23.3 | 53.7 | 0.3 | 92.7 | 7.0 | (25.9) | 26.0 |
| 11 | GPC | 36.8 | 0.7 | 94.0 | 5.3 | (21.4) | 30.5 | -9.4 | 3.5 | 94.6 | 1.9 | (20.7) | 17.4 |
| | CCA | 1.5 | 2.5 | 95.1 | 2.4 | (25.5) | 15.7 | -2.3 | 2.7 | 94.9 | 2.4 | (26.1) | 13.4 |
| | MMRM | 5.6 | 2.1 | 95.8 | 2.1 | (25.4) | 16.0 | -3.2 | 2.5 | 95.2 | 2.3 | (26.0) | 12.6 |
| 12 | GPC | 30.1 | 1.1 | 94.1 | 4.8 | (21.3) | 27.5 | -11.0 | 3.9 | 94.2 | 1.9 | (20.7) | 16.4 |
| | CCA | 8.7 | 2.2 | 95.2 | 2.6 | (25.7) | 16.8 | 14.8 | 1.5 | 94.7 | 3.8 | (26.3) | 15.8 |
| | MMRM | 9.3 | 2.2 | 95.4 | 2.4 | (25.4) | 16.8 | 9.7 | 1.6 | 94.8 | 3.6 | (26.2) | 15.5 |
| No missing | Traj 4 | 0.1 | 2.0 | 95.9 | 2.1 | (21.2) | 55.1 | 0.1 | 2.0 | 95.9 | 2.1 | (21.2) | 55.1 |
| 13 | GPC | -89.9 | 15.2 | 84.5 | 0.3 | (21.8) | 21.0 | -58.2 | 8.4 | 91.1 | 0.5 | (20.5) | 35.3 |
| | CCA | 69.5 | 0.4 | 90.3 | 9.3 | (23.7) | 69.8 | 66.4 | 0.3 | 91.5 | 8.2 | (24.2) | 66.7 |
| | MMRM | 50.3 | 0.7 | 92.2 | 7.1 | (23.6) | 63.4 | 60.5 | 0.3 | 92.3 | 7.4 | (24.2) | 65.4 |
| 14 | GPC | -92.1 | 16.2 | 83.6 | 0.2 | (21.7) | 20.2 | -71.5 | 9.7 | 90.0 | 0.3 | (20.5) | 31.3 |
| | CCA | 61.7 | 0.7 | 91.2 | 8.1 | (24.2) | 65.3 | 51.0 | 0.5 | 92.8 | 6.7 | (24.8) | 61.0 |
| | MMRM | 46.5 | 0.8 | 92.5 | 6.7 | (24.0) | 60.7 | 47.4 | 0.7 | 92.5 | 6.8 | (24.8) | 59.0 |
| 15 | GPC | -71.3 | 12.3 | 87.3 | 0.4 | (21.6) | 27.2 | -64.7 | 10.1 | 89.4 | 0.5 | (20.5) | 32.8 |
| | CCA | 28.7 | 1.4 | 94.3 | 4.3 | (24.1) | 54.4 | 20.3 | 1.7 | 94.6 | 3.7 | (24.7) | 48.5 |
| | MMRM | 25.3 | 1.4 | 94.8 | 3.8 | (24.0) | 53.5 | 19.9 | 1.5 | 95.2 | 3.3 | (24.7) | 48.8 |
| 16 | GPC | -86.7 | 13.5 | 86.2 | 0.3 | (21.6) | 22.7 | -73.2 | 11.1 | 88.4 | 0.5 | (20.6) | 30.2 |
| | CCA | 33.7 | 1.4 | 93.3 | 5.3 | (24.3) | 55.6 | 34.8 | 1.2 | 93.5 | 5.3 | (24.9) | 54.5 |
| | MMRM | 27.2 | 1.4 | 94.9 | 3.7 | (24.0) | 54.2 | 30.9 | 1.2 | 94.0 | 4.8 | (24.8) | 53.1 |

† Case number denotes the combination of Trajectory and missing data percentage as shown in Table 1.
∗ Bias % is defined as the mean of $(\widehat{\mathrm{WinP}} - \mathrm{WinP})/\mathrm{WinP} \times 100$ over 1000 simulations.
§ Each 95% confidence interval entry is presented as ML CV% MR (WD × 100) EP, where ML and MR indicate the percentage of times the upper limit is smaller and the lower limit is greater than the true parameter value, respectively, CV indicates the percentage of times the confidence interval covers the parameter value, WD is the mean width of 1000 confidence intervals, and EP (empirical power) is defined as the percentage of times that the null hypothesis H0: WinP = 0.5 is rejected at the 2-sided 5% significance level.
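The footnote definitions used in the simulation tables (Bias %, ML, CV, MR, WD and EP) can be illustrated with a minimal sketch. The function name `summarize_simulation` and the toy inputs are hypothetical and not from the paper. The sketch also assumes that the null hypothesis WinP = 0.5 is rejected exactly when the 95% confidence interval excludes 0.5, which the footnotes do not state explicitly.

```python
from statistics import mean

def summarize_simulation(estimates, lowers, uppers, true_winp, null_value=0.5):
    """Summarise simulated WinP estimates and their 95% CI limits.

    Hypothetical helper: follows the table footnote definitions, with the
    added assumption that H0 is rejected when the CI excludes null_value.
    """
    n = len(estimates)
    intervals = list(zip(lowers, uppers))
    return {
        # Mean relative bias, in percent of the true WinP.
        "Bias%": mean((e - true_winp) / true_winp * 100 for e in estimates),
        # Miss left: upper limit falls below the true value.
        "ML": 100 * sum(u < true_winp for _, u in intervals) / n,
        # Coverage: interval contains the true value.
        "CV": 100 * sum(l <= true_winp <= u for l, u in intervals) / n,
        # Miss right: lower limit falls above the true value.
        "MR": 100 * sum(l > true_winp for l, _ in intervals) / n,
        # Mean interval width, multiplied by 100 as in the tables.
        "WD": 100 * mean(u - l for l, u in intervals),
        # Empirical power (assumed CI-based rejection rule).
        "EP": 100 * sum(not (l <= null_value <= u) for l, u in intervals) / n,
    }

# Toy inputs (made up for illustration, not simulation output from the paper).
result = summarize_simulation(
    estimates=[0.52, 0.60, 0.45, 0.70],
    lowers=[0.40, 0.51, 0.30, 0.58],
    uppers=[0.64, 0.69, 0.60, 0.82],
    true_winp=0.55,
)
print(result)
```

By construction ML, CV and MR sum to 100, which matches the pattern of the table rows (for example 2.1 + 95.7 + 2.2).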
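Table 5. Results of the motivating examples

| Trial | Method | Timepoint | WinP estimate (95% CI) | P-value |
|---|---|---|---|---|
| Postnatal depression trial [23] | GPC | | 0.737 (0.611, 0.834) | 0.0005 |
| | CCA | | 0.779 (0.604, 0.890) | 0.0032 |
| | MMRM | Visit 1 | 0.670 (0.516, 0.794) | 0.0314 |
| | | Visit 2 | 0.700 (0.544, 0.817) | 0.0132 |
| | | Visit 3 | 0.772 (0.619, 0.876) | 0.0011 |
| | | Visit 4 | 0.703 (0.521, 0.837) | 0.0300 |
| | | Visit 5 | 0.749 (0.583, 0.865) | 0.0048 |
| | | Visit 6 | 0.774 (0.605, 0.885) | 0.0027 |
| Labor pain trial [25] | GPC | | 0.756 (0.649, 0.838) | 0.000015 |
| | CCA | | 0.895 (0.738, 0.962) | 0.00014 |
| | MMRM | 30-min interval 1 | 0.587 (0.461, 0.702) | 0.1742 |
| | | 30-min interval 2 | 0.656 (0.527, 0.765) | 0.0182 |
| | | 30-min interval 3 | 0.772 (0.650, 0.861) | 0.0001 |
| | | 30-min interval 4 | 0.844 (0.712, 0.922) | 0.00002 |
| | | 30-min interval 5 | 0.861 (0.745, 0.930) | 0.00001 |
| | | 30-min interval 6 | 0.875 (0.722, 0.950) | 0.00012 |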
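As a rough illustration of the quantity reported in Table 5, the sketch below computes rank-based win fractions and the resulting win probability for one timepoint with complete data. It is not the paper's MMRM analysis. It assumes larger values are better (scores where lower is better, such as depression or pain scores, would be negated first), counts ties as half a win, and uses hypothetical function names and made-up data.

```python
from statistics import mean

def midranks(values):
    """Return 1-based mid-ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def win_fractions(treated, control):
    """Win fraction of each observation against the other group.

    (overall rank - within-group rank) / size of the comparison group
    equals the share of the other group beaten, with ties counted as 1/2.
    """
    overall = midranks(treated + control)
    within_t = midranks(treated)
    within_c = midranks(control)
    n_t, n_c = len(treated), len(control)
    wf_t = [(overall[i] - within_t[i]) / n_c for i in range(n_t)]
    wf_c = [(overall[n_t + j] - within_c[j]) / n_t for j in range(n_c)]
    return wf_t, wf_c

# Made-up scores for illustration only (larger = better).
wf_treated, wf_control = win_fractions([3, 5, 5], [1, 5, 2])
winp = mean(wf_treated)  # estimated probability that treated wins over control
print(round(winp, 4))
```

The mean win fraction in the treated group is the Mann-Whitney estimate of the win probability, and the two group means sum to one, so a value above 0.5 favours treatment.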