BMW: Bayesian Model-Assisted Adaptive Phase II Clinical Trial Design for Win Ratio Statistic
The win ratio (WR) statistic is increasingly used to evaluate treatment effects based on prioritized composite endpoints, yet existing Bayesian adaptive designs are not directly applicable because the WR is a summary statistic derived from pairwise comparisons and does not correspond to a unique data-generating mechanism. We propose a Bayesian model-assisted adaptive design for randomized phase II clinical trials based on the WR statistic, referred to as the BMW design. The proposed design uses the joint asymptotic distribution of WR test statistics across interim and final analyses to compute posterior probabilities without specifying the underlying outcome distribution. The BMW design allows flexible interim monitoring with early stopping for futility or superiority and is extended to jointly evaluate efficacy and toxicity using a graphical testing procedure that controls the family-wise error rate (FWER). Simulation studies demonstrate that the BMW design maintains valid type I error and FWER control, achieves power comparable to conventional methods, and substantially reduces expected sample size. An R Shiny application is provided to facilitate practical implementation.
💡 Research Summary
The paper introduces a novel Bayesian adaptive design for phase‑II randomized trials that uses the win‑ratio (WR) statistic, called the BMW (Bayesian Model‑Assisted Adaptive) design. The WR summarizes a prioritized composite endpoint: patient outcomes are ranked by clinical priority, and the treatment groups are then compared through pairwise "wins" and "losses." Because the WR is a summary derived from pairwise comparisons rather than a direct observation from a generative model, conventional Bayesian adaptive designs, which require explicit likelihoods and priors for the raw data, cannot be applied directly.
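To make the pairwise-comparison mechanics concrete, here is a minimal sketch of win counting for a two-level prioritized endpoint. The outcome names (`death_time`, `hosp_time`), the "larger is better" coding, and the absence of censoring are all illustrative assumptions, not the paper's setup; real WR analyses must handle censored observations.

```python
def compare_pair(treated, control):
    """Compare one treated/control pair on prioritized outcomes.

    Each patient is a dict with hypothetical keys 'death_time' and
    'hosp_time' (larger = better here). Returns +1 if the treated
    patient wins, -1 if the control patient wins, 0 for a tie.
    Censoring, which real WR analyses must handle, is omitted.
    """
    for key in ("death_time", "hosp_time"):  # highest priority first
        if treated[key] > control[key]:
            return 1
        if treated[key] < control[key]:
            return -1
    return 0  # tied on every prioritized outcome


def win_ratio(treated_arm, control_arm):
    """Win ratio = total wins / total losses over all pairwise comparisons."""
    wins = losses = 0
    for t in treated_arm:
        for c in control_arm:
            result = compare_pair(t, c)
            wins += result == 1
            losses += result == -1
    return wins / losses, wins, losses
```

For example, with two patients per arm where the treated arm wins two of the four pairwise comparisons and loses the other two, `win_ratio` returns a WR of 1.0 with 2 wins and 2 losses.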
To overcome this limitation, the authors adopt a “model‑assisted” strategy. They show that the WR test statistics calculated at interim and final analyses jointly follow an asymptotic multivariate normal distribution. This result follows from the central limit theorem applied to the large number of patient pairs and is validated by extensive simulation. The mean vector of this distribution is linked to a treatment‑effect parameter θ (the difference in win probabilities between experimental and control arms), while the covariance matrix can be estimated from pilot data, historical studies, or specified via a Bayesian prior.
At each analysis, the observed WR value is plugged into the multivariate normal model to obtain posterior probabilities such as P(θ>0 | data) for superiority or P(θ<0 | data) for futility. Pre‑specified decision thresholds (e.g., 0.95 for superiority, 0.05 for futility) trigger early stopping. Because the posterior is derived from the joint asymptotic distribution rather than a full likelihood, the design remains valid even when the underlying patient‑level outcomes are heterogeneous or non‑Gaussian.
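A single analysis of this kind reduces to a normal-normal calculation. The sketch below computes P(θ > 0 | data) from an observed effect estimate and its standard error, using the summary's thresholds of 0.95 and 0.05; the prior hyperparameters (`mu0`, `tau0`) and the near-flat default are hypothetical choices for illustration, and the paper's joint treatment of interim and final looks is not reproduced here.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_prob_superiority(theta_hat, se, mu0=0.0, tau0=10.0):
    """P(theta > 0 | data) under a normal-normal approximation.

    theta_hat: observed effect estimate (e.g. the difference in win
    probabilities), treated as N(theta, se^2) per the asymptotic
    result; mu0 and tau0 are hypothetical prior hyperparameters
    (a near-flat prior by default).
    """
    prec = 1.0 / tau0 ** 2 + 1.0 / se ** 2
    post_mean = (mu0 / tau0 ** 2 + theta_hat / se ** 2) / prec
    post_sd = (1.0 / prec) ** 0.5
    return norm_cdf(post_mean / post_sd)


def interim_decision(theta_hat, se, sup=0.95, fut=0.05):
    """Apply the summary's illustrative stopping thresholds."""
    p = posterior_prob_superiority(theta_hat, se)
    if p > sup:
        return "stop for superiority"
    if p < fut:
        return "stop for futility"
    return "continue"
```

With a null estimate the posterior probability is exactly 0.5 (continue); a strongly positive or negative estimate relative to its standard error triggers the corresponding stop.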
The BMW framework is extended to simultaneously test efficacy and safety. The authors incorporate a graphical testing procedure that allocates the overall family‑wise error rate (FWER) across two hypotheses: an efficacy hypothesis and a toxicity hypothesis. For example, the overall α of 0.05 may be split evenly, with 0.025 initially assigned to efficacy; if the efficacy test is significant, its 0.025 is reallocated to the toxicity hypothesis, which can then be tested at the full 0.05 level. This sequential allocation guarantees that the overall FWER stays at or below the nominal 5% level, even with adaptive interim looks.
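The α-passing logic can be sketched as a simplified two-node graphical procedure in the style of Bretz et al.: each hypothesis starts with a share of α, and a rejected hypothesis passes its share to the other. This is a generic illustration, with weights and structure assumed; the paper's actual graph may differ.

```python
def graphical_test(p_values, alpha=0.05, weights=(0.5, 0.5)):
    """Two-hypothesis graphical procedure sketch (efficacy, toxicity).

    Each hypothesis i starts with local level weights[i] * alpha.
    When a hypothesis is rejected, its alpha is passed in full to
    the other, which is then retested. A simplified illustration,
    not necessarily the paper's exact graph.
    """
    local_alpha = [w * alpha for w in weights]
    rejected = [False, False]
    changed = True
    while changed:
        changed = False
        for i in (0, 1):
            if not rejected[i] and p_values[i] <= local_alpha[i]:
                rejected[i] = True
                other = 1 - i
                if not rejected[other]:
                    local_alpha[other] += local_alpha[i]  # reallocate alpha
                changed = True
    return rejected
```

Note the retesting step: if toxicity is rejected first at 0.025, efficacy gets a second look at the pooled 0.05 level, which is what makes the procedure a graph rather than a fixed sequence.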
Simulation studies explore a wide range of scenarios: varying true win‑ratio differences, different toxicity rates, multiple interim analysis timings (30% and 50% of enrollment), and differing strengths of prior information (informative vs. weak priors). Across all settings, the BMW design maintains type I error and FWER at the target 5% level, achieves power comparable to a fixed‑sample design (often within a few percentage points), and reduces the expected sample size (ESS) by 20–30% on average. The greatest sample‑size savings occur when futility stopping is frequent, such as when the true treatment effect is modest or toxicity is high.
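A toy Monte Carlo illustrates why interim stopping reduces expected sample size. Here patient-level outcomes are stand-in normal draws and the cumulative mean plays the role of the WR-based estimate; all cutoffs and sample sizes are illustrative, not the paper's calibrated values. Note that taking repeated looks with an uncorrected threshold inflates the type I error somewhat above the single-look level, which is precisely the issue the joint asymptotic distribution in the BMW design is used to handle.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_trial(theta_true, rng, n_max=100, interim_frac=0.5,
                   sigma=0.5, sup_cut=0.95, fut_cut=0.05):
    """One simulated trial with a single interim look.

    Outcomes are hypothetical N(theta_true, sigma^2) draws; the
    cumulative mean stands in for the WR-based effect estimate,
    whose asymptotic normality the design relies on.
    Returns (rejected_null, sample_size_used).
    """
    data = [rng.gauss(theta_true, sigma) for _ in range(n_max)]
    for frac in (interim_frac, 1.0):
        n = int(n_max * frac)
        se = sigma / math.sqrt(n)
        theta_hat = sum(data[:n]) / n
        p_sup = norm_cdf(theta_hat / se)  # flat-prior posterior P(theta > 0)
        if p_sup > sup_cut:
            return True, n                # stop (or conclude) for superiority
        if frac < 1.0 and p_sup < fut_cut:
            return False, n               # early futility stop
    return False, n_max


# Monte Carlo under the null (theta = 0): estimate type I error and ESS.
rng = random.Random(2024)
runs = [simulate_trial(0.0, rng) for _ in range(2000)]
type_i = sum(r for r, _ in runs) / len(runs)
ess = sum(n for _, n in runs) / len(runs)
```

Under the null, trials occasionally stop early for futility or superiority, so the estimated ESS falls below the maximum of 100 even in this uncalibrated sketch.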
To facilitate practical adoption, the authors provide an R Shiny web application. Users input the numbers of wins for each arm, the timing of interim looks, and prior parameters; the app instantly computes posterior probabilities, displays whether stopping criteria are met, and visualizes the graphical testing pathway. This tool lowers the barrier for clinical investigators who may not have deep Bayesian expertise.
In summary, the BMW design offers a rigorous, flexible, and efficient solution for trials that rely on the win‑ratio—a statistic increasingly popular for prioritized composite endpoints. By leveraging the joint asymptotic normality of WR test statistics, the design sidesteps the need for a fully specified outcome model while still delivering Bayesian decision‑making, early‑stopping capabilities, and controlled multiplicity. The approach is especially attractive for rare‑disease or early‑phase studies where sample sizes are limited and composite outcomes are essential. Future work suggested by the authors includes extending the asymptotic approximation to small‑sample regimes, incorporating more complex hierarchical priors, and applying the method to real‑world phase‑II oncology trials.