Design of Bayesian Clinical Trials with Clustered Data
In the design of clinical trials, it is essential to assess the design operating characteristics (e.g., power and the type I error rate). Common practice for the evaluation of operating characteristics in Bayesian clinical trials relies on estimating the sampling distribution of posterior summaries via Monte Carlo simulation. It is computationally intensive to repeat this estimation process for each design configuration considered, particularly for clustered data that are analyzed using complex, high-dimensional models. In this paper, we propose an efficient method to assess operating characteristics and determine sample sizes for Bayesian trials with clustered data. We prove theoretical results that enable posterior probabilities to be modeled as a function of the number of clusters. Using these functions, we assess operating characteristics at a range of sample sizes given simulations conducted at only two cluster counts. These theoretical results are also leveraged to quantify the impact of simulation variability on our sample size recommendations. The applicability of our methodology is illustrated using an example cluster-randomized Bayesian clinical trial.
💡 Research Summary
This paper addresses a major computational bottleneck in the design of Bayesian clinical trials that involve clustered data, such as cluster-randomized or longitudinal studies. In Bayesian trial design, operating characteristics (most notably statistical power and the type I error rate) must be evaluated by estimating the sampling distribution of posterior summaries, for example the posterior probability that a treatment effect exceeds a clinically relevant threshold. The standard approach relies on Monte Carlo simulation: for each candidate design (different sample sizes, effect sizes, priors, etc.), a large number of trial replications is generated, each requiring a full Bayesian posterior approximation (often via MCMC). When the analysis model includes random effects to capture within-cluster dependence, the posterior dimension grows with the number of clusters and the simulation effort quickly becomes prohibitive.
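To make the bottleneck concrete, here is a minimal Python sketch of the conventional brute-force evaluation. It replaces the paper's hierarchical logistic model with a toy conjugate normal model for cluster-level summaries (so no MCMC is needed), but the structure is the same: simulate m replications, compute the posterior probability τ for each, and repeat the whole exercise for every candidate cluster count. All function names and parameter values below are illustrative, not the paper's.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2024)

def simulate_posterior_prob(c, effect=0.4, sigma_between=1.0, prior_sd=10.0):
    """Simulate one trial with c clusters per arm and return tau = Pr(H1 | data).

    Simplification (not the paper's model): each cluster contributes a normal
    summary with between-cluster SD `sigma_between`; the treatment effect gets a
    vague normal prior, so the posterior is available in closed form.
    """
    control = rng.normal(0.0, sigma_between, size=c)
    treated = rng.normal(effect, sigma_between, size=c)
    diff_hat = treated.mean() - control.mean()
    se2 = 2 * sigma_between**2 / c                      # sampling variance of diff_hat
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se2)    # conjugate normal posterior
    post_mean = post_var * (diff_hat / se2)
    return norm.sf(0.0, loc=post_mean, scale=np.sqrt(post_var))  # Pr(effect > 0 | data)

def power_by_simulation(c, m=2000, gamma=0.95):
    """Brute-force operating characteristic: fraction of replications with tau >= gamma."""
    taus = np.array([simulate_posterior_prob(c) for _ in range(m)])
    return taus, (taus >= gamma).mean()

# Conventional approach: repeat the entire simulation for EVERY candidate cluster count.
for c in (10, 20, 40, 80):
    _, power = power_by_simulation(c)
    print(f"c = {c:3d} clusters/arm: estimated power = {power:.3f}")
```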
The authors propose a novel, theoretically grounded method that dramatically reduces the number of required simulations. Their key insight is that the posterior probability of interest, denoted τ(D_c) = Pr(H₁ | D_c), can be expressed as a smooth function of the number of clusters c. By proving that the logit transformation of τ is approximately linear in c (or in 1/c, depending on the model formulation), they show that the entire sampling distribution of τ at any c can be reconstructed from its quantiles at just two distinct cluster counts, c₁ and c₂. In practice, one runs a full Monte Carlo simulation at these two cluster counts, records the empirical quantiles (or moments) of τ, fits a simple linear relationship on the logit scale, and then predicts the distribution of τ at any intermediate or extrapolated c by back-transforming.
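The following sketch shows one way this extrapolation step could look in code, under the same toy assumptions as above: given τ draws simulated at two cluster counts (for instance, the arrays returned by `power_by_simulation`), it matches them by rank, fits a line through each pair of logit-scale quantiles, and back-transforms to predict the distribution of τ at a new cluster count. The function name and the linear-in-c form are illustrative; the paper also allows a linear-in-1/c form depending on the model.

```python
import numpy as np

def logit(p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)             # guard against tau exactly 0 or 1
    return np.log(p / (1 - p))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def extrapolate_tau(taus_c1, taus_c2, c1, c2, c_new):
    """Predict the sampling distribution of tau at c_new clusters.

    taus_c1, taus_c2: equal-length arrays of posterior probabilities tau from
    the Monte Carlo runs at cluster counts c1 and c2. Each sorted tau value is
    treated as an empirical quantile; the logit of each quantile is
    extrapolated linearly in the number of clusters.
    """
    q1 = logit(np.sort(taus_c1))
    q2 = logit(np.sort(taus_c2))
    slope = (q2 - q1) / (c2 - c1)             # one slope per quantile level
    intercept = q1 - slope * c1
    return expit(intercept + slope * c_new)   # predicted tau quantiles at c_new
```

A call such as `extrapolate_tau(taus_30, taus_60, 30, 60, 45)` would then return the predicted τ quantiles at 45 clusters without any additional Bayesian model fits.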
The theoretical development proceeds as follows. The paper assumes a generic hierarchical model (e.g., Bayesian logistic regression with cluster-specific random intercepts w_j ~ N(0, σ²)). Under standard regularity conditions, as the number of clusters grows, the posterior distribution of the treatment effect concentrates, with its posterior variance shrinking at rate 1/c, so the contraction of the posterior is predictable. By applying a delta-method argument to the posterior probability τ, the authors derive that logit(τ) ≈ α + β·c, where α and β are constants that can be estimated from the two simulated designs. Moreover, they quantify the additional uncertainty introduced by the finite Monte Carlo sample size, showing that the variance of the estimated β can be expressed analytically in terms of the observed variability at c₁ and c₂. This yields a closed-form confidence band for the predicted τ at any c, allowing designers to assess not only point estimates of power but also the precision of those estimates.
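As a rough illustration of how two simulated designs pin down the line and its uncertainty (generic notation, not the paper's exact expressions), write ℓ̂₁ and ℓ̂₂ for Monte Carlo estimates of a logit-scale summary of τ at cluster counts c₁ and c₂:

```latex
% Two-point fit of logit(tau) = alpha + beta * c  (illustrative notation only)
\hat{\beta} = \frac{\hat{\ell}_2 - \hat{\ell}_1}{c_2 - c_1},
\qquad
\hat{\alpha} = \hat{\ell}_1 - \hat{\beta}\, c_1 .

% Because the two simulation runs are independent,
\operatorname{Var}(\hat{\beta})
  = \frac{\operatorname{Var}(\hat{\ell}_1) + \operatorname{Var}(\hat{\ell}_2)}{(c_2 - c_1)^2},

% and the prediction at any c is a weighted average of the two runs:
\hat{\ell}(c) = \frac{c_2 - c}{c_2 - c_1}\,\hat{\ell}_1 + \frac{c - c_1}{c_2 - c_1}\,\hat{\ell}_2,
\qquad
\operatorname{Var}\{\hat{\ell}(c)\}
  = \Big(\tfrac{c_2 - c}{c_2 - c_1}\Big)^{2}\operatorname{Var}(\hat{\ell}_1)
  + \Big(\tfrac{c - c_1}{c_2 - c_1}\Big)^{2}\operatorname{Var}(\hat{\ell}_2).
```

Back-transforming a normal interval for ℓ̂(c) through the inverse logit gives an interval for the corresponding summary of τ, which in turn translates into uncertainty for the estimated power curve.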
With this result, the authors construct a Bayesian sample-size determination (SSD) algorithm that requires only two simulation runs. For a given decision rule (reject the null hypothesis when τ ≥ γ, where γ is a calibrated threshold), the algorithm proceeds as follows: (1) choose two plausible cluster counts (e.g., a small and a moderate one); (2) run full Bayesian analyses for each, obtaining the empirical distribution of τ; (3) fit the linear logit model and compute the predicted power curve across a grid of c values; (4) incorporate the analytic variance to produce confidence intervals for the power curve; (5) select the smallest c that achieves the target power (e.g., 80%) with acceptable uncertainty. This replaces the conventional approach, which would require thousands of simulations for each candidate c, reducing the computational cost from O(m·|C|) to O(2m), where m is the number of Monte Carlo replications per simulated design and |C| is the number of candidate cluster counts.
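Putting steps (1)–(5) together, the sketch below builds on the `extrapolate_tau` toy function defined earlier. It estimates the power curve over a grid of cluster counts and picks the smallest count whose lower confidence bound meets the target power. The names, thresholds, and the bootstrap used to express Monte Carlo uncertainty are illustrative; the paper derives this variance analytically rather than by resampling.

```python
import numpy as np

def predicted_power(taus_c1, taus_c2, c1, c2, c_grid, gamma=0.95):
    """Power at each c in c_grid: fraction of extrapolated tau quantiles >= gamma."""
    return np.array([
        (extrapolate_tau(taus_c1, taus_c2, c1, c2, c) >= gamma).mean()  # from sketch above
        for c in c_grid
    ])

def choose_sample_size(taus_c1, taus_c2, c1, c2, c_grid,
                       gamma=0.95, target=0.80, n_boot=500, seed=0):
    """Smallest cluster count whose lower bootstrap power bound meets the target."""
    rng = np.random.default_rng(seed)
    boot_curves = []
    for _ in range(n_boot):
        b1 = rng.choice(taus_c1, size=len(taus_c1), replace=True)
        b2 = rng.choice(taus_c2, size=len(taus_c2), replace=True)
        boot_curves.append(predicted_power(b1, b2, c1, c2, c_grid, gamma))
    lower = np.percentile(boot_curves, 2.5, axis=0)   # pointwise lower 95% bound
    ok = np.where(lower >= target)[0]
    return c_grid[ok[0]] if ok.size else None

# Usage (taus_30, taus_60 would be tau draws from simulations at 30 and 60 clusters):
# c_grid = np.arange(10, 121, 5)
# print(choose_sample_size(taus_30, taus_60, 30, 60, c_grid))
```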
The methodology is illustrated with a concrete example inspired by a cluster-randomized tuberculosis preventive therapy trial (SST-ARLET). In the example, households are randomized to a new regimen or standard care, and the binary outcome is the occurrence of a severe adverse event. The analysis model is a Bayesian logistic regression with a random intercept for each household. The estimand of interest is the marginal difference in adverse-event rates between treatment arms, obtained by integrating over the distribution of the random intercepts (a marginal, non-collapsible estimand). The authors simulate data at two cluster counts (e.g., 30 and 60 households), each with 10⁴ trial replications, and compute τ for a non-inferiority hypothesis with decision threshold γ ≈ 0.95. Using their linear-logit extrapolation, they generate a power curve for cluster counts ranging from 10 to 120. The extrapolated curve matches the curve obtained by exhaustive simulation at every intermediate cluster count almost perfectly, confirming the accuracy of the approximation. The accompanying confidence bands correctly capture the Monte Carlo variability, demonstrating that the method provides both point and uncertainty estimates for power.
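For concreteness, here is one way to simulate a single replication of data of this general form (household-randomized, binary outcome, household-level random intercept) and to compute a marginal risk difference by averaging over the random-intercept distribution. All parameter values are invented for illustration and are not taken from the SST-ARLET trial.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_trial(n_households=60, household_size=3,
                   beta0=-2.0, beta_trt=-0.5, sigma_w=0.8):
    """One replication: households randomized 1:1, binary adverse-event outcomes."""
    trt = rng.permutation(np.repeat([0, 1], n_households // 2))  # arm per household
    w = rng.normal(0.0, sigma_w, size=n_households)              # household intercepts
    logits = beta0 + beta_trt * trt[:, None] + w[:, None]        # member-level logits
    y = rng.binomial(1, 1 / (1 + np.exp(-logits)),
                     size=(n_households, household_size))
    return trt, y

def marginal_risk_difference(beta0, beta_trt, sigma_w, n_mc=100_000):
    """Marginal (population-averaged) risk difference, integrating over w ~ N(0, sigma_w^2).

    This is the non-collapsible estimand: whenever sigma_w > 0 it differs from the
    conditional contrast expit(beta0 + beta_trt) - expit(beta0).
    """
    w = rng.normal(0.0, sigma_w, size=n_mc)
    p_trt = (1 / (1 + np.exp(-(beta0 + beta_trt + w)))).mean()
    p_ctl = (1 / (1 + np.exp(-(beta0 + w)))).mean()
    return p_trt - p_ctl

print(marginal_risk_difference(-2.0, -0.5, 0.8))
```

In the actual workflow, each simulated replication would be analyzed with the Bayesian random-intercept logistic model to obtain τ; the sketch above only covers data generation and the target estimand.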
The paper also discusses extensions. For adaptive designs where interim analyses may change the number of clusters, the linear relationship can be updated sequentially, enabling real‑time power monitoring. For trials with multiple endpoints, a multivariate extension of the logit model can be employed, allowing joint power calculations. Finally, the authors note that the approach can accommodate unequal cluster sizes or more complex correlation structures by adjusting the functional form (e.g., using 1/√c instead of 1/c) while preserving the core idea of low‑dimensional extrapolation.
In summary, this work delivers a rigorous, computationally efficient framework for Bayesian SSD in clustered-data settings. By proving that posterior probabilities behave predictably as a function of the number of clusters, the authors reduce the simulation burden from one full Monte Carlo study per candidate design to just two studies in total, while still providing analytically derived uncertainty quantification. This advance makes Bayesian design more practical for complex trials, facilitates regulatory acceptance by offering transparent operating-characteristic estimates, and opens avenues for further methodological development in adaptive and multi-endpoint clustered trials.