The win ratio (WR) statistic is increasingly used to evaluate treatment effects based on prioritized composite endpoints, yet existing Bayesian adaptive designs are not directly applicable because the WR is a summary statistic derived from pairwise comparisons and does not correspond to a unique data-generating mechanism. We propose a Bayesian model-assisted adaptive design for randomized phase II clinical trials based on the WR statistic, referred to as the BMW design. The proposed design uses the joint asymptotic distribution of WR test statistics across interim and final analyses to compute posterior probabilities without specifying the underlying outcome distribution. The BMW design allows flexible interim monitoring with early stopping for futility or superiority and is extended to jointly evaluate efficacy and toxicity using a graphical testing procedure that controls the family-wise error rate (FWER). Simulation studies demonstrate that the BMW design maintains valid type I error and FWER control, achieves power comparable to conventional methods, and substantially reduces expected sample size. An R Shiny application is provided to facilitate practical implementation.
Bayesian adaptive designs have gained substantial popularity in the development and conduct of early-phase clinical trials (US Food and Drug Administration, 2026). Traditionally, these designs rely on parametric models to characterize the treatment-outcome relationship and to derive posterior probabilities from observed data, which are then used to guide patient allocation and treatment selection at interim and final analyses (Berry et al., 2010).
However, such model-based designs can be difficult to understand, rely on strong parametric model assumptions that may not hold for real-world clinical data, and require substantial computational resources and expensive infrastructure for implementation in clinical practice (Chevret, 2012).
In recent years, a new class of Bayesian adaptive designs, referred to as model-assisted designs, has been proposed as an alternative to model-based designs (Yuan et al., 2019). Like model-based designs, model-assisted designs rely on statistical models to support decision making. However, their decision rules can be fully pre-specified prior to trial initiation and explicitly incorporated into the study protocol. Model-assisted designs often demonstrate superior operating characteristics compared with model-based designs and can be implemented more straightforwardly, typically with freely available software (Yuan et al., 2022).
In phase II clinical trials, the Bayesian optimal phase II (BOP2) design is a transparent, flexible, and efficient model-assisted trial design for evaluating treatment effects (Zhou et al., 2017). As a Bayesian adaptive design, BOP2 offers several desirable properties, including flexibility in the timing and frequency of interim analyses and the ability to stop a trial early for either futility or superiority (Xu et al., 2025). At the same time, similar to frequentist designs, BOP2 can strictly control the type I error rate and maximize statistical power through simulation-based numerical optimization of design parameters. Although it was originally developed for single-arm trials with categorical outcomes, the BOP2 design has since been extended to randomized controlled trials (RCTs) (Zhao et al., 2022) and to time-to-event outcomes (Zhou et al., 2020).
Based on the Dirichlet-multinomial model, the BOP2 design can readily accommodate multiple types of outcomes within a unified multiple hypothesis testing framework. For example, consider an RCT with two efficacy endpoints: objective tumor response (OR) and 3-month event-free survival (EFS3). The BOP2 design can be used to formulate and test hypotheses such that the new treatment is declared successful if it demonstrates a statistically significant improvement over the control in either OR or EFS3. However, despite this flexibility, the BOP2 design treats multiple endpoints exclusively as co-primary and models the joint distribution of all endpoints of interest, which may not fully reflect clinical practice, particularly given the increasing use of composite endpoints.
In clinical trials with multiple endpoints, the composite endpoint is often used to combine multiple endpoints into a single index, thereby improving statistical efficiency, simplifying hypothesis testing, and avoiding complex joint modeling approaches (Freemantle et al., 2003).
However, the conventional definition of a composite endpoint treats all components as equally important and typically considers only the first event. As a result, it ignores clinical priorities and the timing of multiple endpoints, both of which are often critically important in clinical practice. To address these limitations, the win ratio (WR) statistic has been proposed as an alternative that incorporates a clinically meaningful hierarchy of endpoints through pairwise comparisons between treatment and control arms, sequentially prioritizing more important outcomes and yielding a more appropriate assessment of the overall treatment benefit (Pocock et al., 2012).
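To make the pairwise comparison concrete, the following is a minimal sketch of an unmatched WR computation for fully observed outcomes. It assumes that each patient is represented by a tuple of endpoint values ordered from highest to lowest clinical priority, that larger values are better on every endpoint, and that there is no censoring. The function names `compare_pair` and `win_ratio` are hypothetical and are not taken from any published software.

```python
from itertools import product


def compare_pair(treated, control):
    """Compare one treated patient with one control patient.

    Endpoints are checked in priority order; the first endpoint on which
    the two patients differ decides the result.
    Returns 1 (treated wins), -1 (treated loses), or 0 (tie on all endpoints).
    Assumption: larger values are better and all outcomes are fully observed.
    """
    for t_value, c_value in zip(treated, control):
        if t_value > c_value:
            return 1
        if t_value < c_value:
            return -1
    return 0


def win_ratio(treatment_arm, control_arm):
    """Return (win ratio, wins, losses, ties) over all treated-control pairs."""
    wins = losses = ties = 0
    for treated, control in product(treatment_arm, control_arm):
        result = compare_pair(treated, control)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    if losses == 0:
        raise ValueError("win ratio is undefined when there are no losses")
    return wins / losses, wins, losses, ties


# Illustrative (made-up) data: (higher-priority endpoint, lower-priority endpoint)
treatment_arm = [(1, 5), (0, 3)]
control_arm = [(1, 2), (0, 3), (0, 1)]
wr, wins, losses, ties = win_ratio(treatment_arm, control_arm)
print(wr, wins, losses, ties)
```

In this toy example the lower-priority endpoint is consulted only when a pair is tied on the higher-priority endpoint, which is the hierarchical feature that distinguishes the WR from a conventional first-event composite. Handling censored time-to-event components would require additional rules that this sketch does not cover.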
Since its introduction, the WR statistic has been used in many clinical trials to evaluate the overall treatment effect (Redfors et al., 2020). The statistical properties of the WR statistic have also been extensively studied in the literature, including a large-sample inference approach developed under the U-statistic framework (Bebu and Lachin, 2016) and sample size formulas (Mao et al., 2021; Yu and Ganju, 2022). However, little work has focused on Bayesian adaptive designs for the WR statistic, and existing methods such as the BOP2 design are not directly applicable in this setting. A key challenge is that the posterior probabilities used in the BOP2 design and other Bayesian adaptive designs are derived directly from the observed data, whereas the WR statistic is a summary measure of pairwise comparisons rather than the raw observed data itself. Moreover, the WR statistic is highly flexible and general, placing no restrictions on either the number of endpoints or the correlation st