A Multivariate Polya Tree Model for Meta-Analysis with Event Time Distributions
We develop a non-parametric Bayesian prior for a family of random probability measures by extending the Polya tree ($PT$) prior to a joint prior for a set of probability measures $G_1,\dots,G_n$, suitable for meta-analysis with event time outcomes. In the application to meta-analysis, $G_i$ is the event time distribution specific to study $i$. The proposed model defines a regression on study-specific covariates by introducing increased correlation for any pair of studies with similar characteristics. The desired multivariate $PT$ model is constructed by introducing a hierarchical prior on the conditional splitting probabilities in the $PT$ construction for each of the $G_i$. The hierarchical prior replaces the independent beta priors for the splitting probabilities in the $PT$ construction with a Gaussian process prior for the corresponding (logit) splitting probabilities across all studies. The Gaussian process is indexed by study-specific covariates, introducing the desired dependence with increased correlation for similar studies. The main feature of the proposed construction is (conditionally) conjugate posterior updating based on the inference summaries commonly reported for event time data. The construction is motivated by a meta-analysis over cancer immunotherapy studies.
💡 Research Summary
This paper introduces a novel non‑parametric Bayesian framework for meta‑analysis of event‑time (survival) outcomes, built on a multivariate extension of the Polya tree (PT) prior. In traditional meta‑analysis of survival data, investigators often have access only to summary statistics such as the median survival time and its confidence interval for each study or cohort. Such limited information hampers the ability to model heterogeneity across studies, especially when dealing with rare tumor types or biomarker sub‑groups. The authors address this limitation by constructing a joint prior for a collection of study‑specific probability measures $G_1,\dots,G_n$ that represent the underlying event‑time distributions.
The core idea is to retain the hierarchical binary partitioning structure of the PT, which yields conjugate posterior updates when data are expressed as counts in the partition cells. Instead of assigning independent Beta priors to the conditional splitting probabilities at each node, the authors place a Gaussian process (GP) prior on the logit‑transformed splitting probabilities. The GP is indexed by study‑level covariates (e.g., tumor type, treatment agent, biomarker status, study indicator), thereby inducing correlation among studies that share similar characteristics. For tree depths up to a pre‑specified level $D$, the GP governs dependence; beyond depth $D$ the model reverts to independent Beta priors, preserving computational tractability.
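To make the depth-$D$ truncation concrete, here is a minimal Python sketch of how splitting probabilities determine cell masses in a PT for one study. It assumes a dyadic tree in which nodes at levels below $D$ take $Y^{\epsilon} = \operatorname{logistic}(Z^{\epsilon})$ from supplied logits, while deeper nodes draw independent $\text{Beta}(c\,m^2, c\,m^2)$ variables. The $c\,m^2$ rule, the helper names (`node_labels`, `split_probabilities`, `cell_probabilities`) and the depth values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def node_labels(level):
    """Binary-string labels of the 2**level nodes at a level ("" is the root)."""
    return [format(k, f"0{level}b") if level > 0 else "" for k in range(2 ** level)]

def split_probabilities(depth, D, z, rng, c=1.0):
    """Left-split probabilities Y^eps for one study, for all nodes above `depth`.

    Levels < D: Y = logistic(z[eps]), with z assumed to come from the GP.
    Levels >= D: independent Beta(c*m^2, c*m^2) with m = level + 1
    (an assumed, commonly used PT default).
    """
    y = {}
    for level in range(depth):
        for eps in node_labels(level):
            if level < D:
                y[eps] = 1.0 / (1.0 + np.exp(-z[eps]))
            else:
                a = c * (level + 1) ** 2
                y[eps] = rng.beta(a, a)
    return y

def cell_probabilities(y, depth):
    """Masses of the 2**depth cells at level `depth`, ordered left to right."""
    probs = np.ones(1)
    for level in range(depth):
        y_level = np.array([y[eps] for eps in node_labels(level)])
        # a cell with mass p splits into (p * Y, p * (1 - Y))
        probs = np.column_stack([probs * y_level, probs * (1.0 - y_level)]).ravel()
    return probs

rng = np.random.default_rng(0)
D, depth = 2, 5
# hypothetical logits for the 3 nodes at levels 0 and 1 (in the model, GP draws)
z = {eps: rng.normal() for level in range(D) for eps in node_labels(level)}
p = cell_probabilities(split_probabilities(depth, D, z, rng), depth)
print(p.shape, p.sum())
```

Each cell's mass is the product of $Y$ or $1 - Y$ along its path from the root, so the masses sum to one for any choice of splitting probabilities.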
Mathematically, for each node $\epsilon$ in the binary tree, the logit of the left‑split probability for study $i$ is denoted $Z_i^{\epsilon} = \operatorname{logit}(Y_i^{\epsilon})$. Writing $x_i$ for the covariate vector of study $i$, the vector $(Z_1^{\epsilon},\dots,Z_n^{\epsilon})$ follows a multivariate normal distribution with mean entries $\mu^{\epsilon}(x_i)$ and covariance entries $K^{\epsilon}(x_i,x_j)$, where $\mu^{\epsilon}(\cdot)$ and $K^{\epsilon}(\cdot,\cdot)$ are the mean and covariance functions of the GP. The GPs at different nodes are independent, allowing flexible, node‑specific correlation structures. The covariance function can be any standard kernel (e.g., squared‑exponential, Matérn) that captures similarity in the covariate space. Hyper‑parameters controlling the overall variance and length‑scale are chosen to reflect prior beliefs about the degree of borrowing across studies.
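A sketch of the node-level GP prior follows, assuming a zero mean function, a squared-exponential kernel and three hypothetical studies described by 0/1 covariates; the function names and covariate coding are illustrative only. It shows how kernel similarity in covariate space becomes prior correlation between logit splitting probabilities.

```python
import numpy as np

def sq_exp_kernel(X, variance=1.0, length_scale=1.0):
    """K(x, x') = variance * exp(-||x - x'||^2 / (2 * length_scale^2))."""
    X = np.asarray(X, dtype=float)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def draw_node_logits(X, mean, rng, variance=1.0, length_scale=1.0, jitter=1e-8):
    """One prior draw of (Z_1^eps, ..., Z_n^eps) ~ N(mean, K) at a single node."""
    K = sq_exp_kernel(X, variance, length_scale)
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))  # jitter for stability
    return mean + L @ rng.standard_normal(len(K))

# hypothetical 0/1 covariates: [tumor type A, agent B, biomarker positive]
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
K = sq_exp_kernel(X)
corr = K / np.sqrt(np.outer(np.diag(K), np.diag(K)))
z = draw_node_logits(X, mean=np.zeros(3), rng=np.random.default_rng(1))
print(np.round(corr, 3))
```

Studies 0 and 1 differ only in biomarker status and have prior correlation $e^{-1/2} \approx 0.61$, whereas studies 0 and 2 differ in all three covariates and have correlation $e^{-3/2} \approx 0.22$; shrinking `length_scale` would weaken borrowing between dissimilar studies further.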
The reported summary for study $i$ consists of the median $m_i$ with lower and upper confidence limits $\ell_i$ and $h_i$, together with the sample size $N_i$. These reduce to counts in four intervals (below $\ell_i$, between $\ell_i$ and $m_i$, between $m_i$ and $h_i$, above $h_i$), so the data enter the PT through binomial counts of how many observations in each partition cell fall into its left child. At nodes deeper than $D$, Beta–binomial conjugacy yields closed‑form posterior updates exactly as in the univariate PT. At nodes up to depth $D$, the binomial likelihood combined with the logit‑normal prior induced by the GP is handled via Polya‑Gamma data augmentation, which makes the full conditional of the latent $Z_i^{\epsilon}$ Gaussian. Consequently, the entire model can be fit with a relatively simple Gibbs sampler that alternates between (i) imputing latent counts in the PT partition cells, given the four interval counts and the current splitting probabilities, (ii) updating the GP latent variables using Polya‑Gamma draws, and (iii) updating GP hyper‑parameters.
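The summary does not spell out how $(\ell_i, m_i, h_i)$ and $N_i$ become four counts. One simple rule, offered purely as a hypothetical sketch, assumes the reported interval is the distribution-free order-statistic confidence interval for the median, approximates the rank of each limit with the normal approximation to $\text{Binomial}(N_i, 1/2)$, and ignores censoring. The function name and the example values are made up for illustration.

```python
import math

def summary_to_counts(lower, median, upper, n, z=1.959964):
    """Hypothetical mapping from (l, m, h, N) to interval edges and counts.

    Assumes the CI is the order-statistic interval for the median, so the
    limits sit roughly z * sqrt(n) / 2 ranks either side of rank n / 2.
    Censoring is ignored.
    """
    if not (0 <= lower <= median <= upper):
        raise ValueError("expected 0 <= lower <= median <= upper")
    half_width = z * math.sqrt(n) / 2.0
    k_lower = max(0, math.floor(n / 2.0 - half_width))  # observations below l
    k_median = n // 2                                    # observations below m
    k_upper = min(n, math.ceil(n / 2.0 + half_width))   # observations below h
    edges = [0.0, lower, median, upper, math.inf]
    counts = [k_lower, k_median - k_lower, k_upper - k_median, n - k_upper]
    return edges, counts

# made-up summary: median 3.5 months, 95% CI (2.1, 5.8), 40 patients
edges, counts = summary_to_counts(lower=2.1, median=3.5, upper=5.8, n=40)
print(counts)  # [13, 7, 7, 13]
```

The counts always sum to $N_i$ and are symmetric around the median, which matches the intuition that a narrow interval relative to the sample size carries little extra information beyond the median itself.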
The methodology is illustrated on a meta‑analysis of 174 published early‑phase cancer immunotherapy trials, focusing on progression‑free survival (PFS). Of these, 33 papers provide data for 84 distinct cohorts, each corresponding to a biomarker‑positive or biomarker‑negative subgroup. For each cohort, only the median PFS and its 95% confidence interval are reported. The authors model the underlying PFS distribution for each cohort, borrowing strength across studies via the GP. They then compute posterior probabilities that the median PFS for biomarker‑positive patients exceeds that for biomarker‑negative patients within each study, addressing the clinical question of whether biomarker‑guided treatment improves outcomes.
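Given posterior draws of the cell masses for two cohorts, the posterior probability that one median exceeds the other is a Monte Carlo average over draws. The sketch below is hypothetical: it assumes a finite partition of $[0, 24]$ months into equal cells, spreads mass uniformly within each cell, and uses synthetic Dirichlet draws as stand-ins for actual posterior output (no real data are involved).

```python
import numpy as np

def median_from_cells(cell_probs, edges):
    """Median of a piecewise-uniform distribution with the given cell masses."""
    cdf = np.concatenate([[0.0], np.cumsum(cell_probs)])
    return float(np.interp(0.5, cdf, edges))

def prob_median_exceeds(draws_a, draws_b, edges):
    """Monte Carlo estimate of P(median_a > median_b | data) from paired draws."""
    med_a = np.array([median_from_cells(p, edges) for p in draws_a])
    med_b = np.array([median_from_cells(p, edges) for p in draws_b])
    return float(np.mean(med_a > med_b))

edges = np.linspace(0.0, 24.0, 9)  # hypothetical: 8 cells of 3 months on [0, 24]
rng = np.random.default_rng(2)
w = np.linspace(1.0, 3.0, 8)
# synthetic stand-ins for posterior draws: positive cohort shifted to later times
draws_pos = rng.dirichlet(10 * w, size=1000)
draws_neg = rng.dirichlet(10 * w[::-1], size=1000)
p_pos_better = prob_median_exceeds(draws_pos, draws_neg, edges)
print(p_pos_better)
```

In a real fit, the two arrays would hold draws from the same MCMC iterations, so pairing them row by row preserves the posterior dependence between cohorts that the GP induces.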
Results show that the multivariate PT with GP dependence yields substantially narrower credible intervals for median PFS in rare tumor types and small cohorts than standard random‑effects meta‑regression. Moreover, the model naturally incorporates covariate information, allowing exploration of how treatment agents or tumor types modulate the biomarker effect. The authors also note that the non‑smooth nature of PT densities is not a drawback here, because in survival analysis the cumulative distribution (or survival curve), rather than the density, is the primary object of interest.
Key contributions of the paper include:
- A hierarchical GP‑logit prior for PT splitting probabilities, enabling study‑specific dependence driven by observable covariates.
- Conditionally conjugate posterior updating despite the introduction of dependence, achieved through Polya‑Gamma augmentation.
- Practical handling of summary‑only survival data, converting median and confidence interval information into interval counts compatible with the PT framework.
- Demonstrated gains in inference for heterogeneous and sparse meta‑analysis settings, particularly for rare biomarkers and tumor sub‑populations.
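The Polya‑Gamma step behind the second contribution can be sketched for a single node. Assume study $i$ has $n_i$ observations in the node's cell, $y_i$ of which fall in the left child. The standard augmentation of Polson, Scott and Windle draws $\omega_i \sim \text{PG}(n_i, Z_i^{\epsilon})$ and then $Z^{\epsilon} \mid \omega, y \sim \mathcal{N}\big(V(K^{-1}\mu + \kappa), V\big)$ with $V = (K^{-1} + \operatorname{diag}(\omega))^{-1}$ and $\kappa_i = y_i - n_i/2$. The PG draws below truncate the infinite sum-of-gammas representation, so they are only approximate (an exact sampler would be used in practice). Function names, the fixed covariance matrix and the counts are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def sample_pg(b, c, rng, n_terms=200):
    """Approximate PG(b, c) draws by truncating the sum-of-gammas representation
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    g_k ~ Gamma(b, 1). Entries with b == 0 (no data) return 0."""
    b, c = np.broadcast_arrays(np.atleast_1d(np.asarray(b, dtype=float)),
                               np.atleast_1d(np.asarray(c, dtype=float)))
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + (c[..., None] / (2.0 * np.pi)) ** 2
    out = np.zeros(b.shape)
    pos = b > 0
    g = rng.gamma(shape=b[pos][:, None], scale=1.0, size=(int(pos.sum()), n_terms))
    out[pos] = (g / denom[pos]).sum(axis=-1) / (2.0 * np.pi ** 2)
    return out

def update_node_logits(y, n, mean, K, z, rng):
    """One Gibbs step for the logits Z^eps at a single node.

    y[i]: observations of study i falling in the left child of the node's cell;
    n[i]: observations of study i in the node's cell.
    """
    omega = sample_pg(n, z, rng)               # omega_i | z_i ~ PG(n_i, z_i)
    kappa = y - n / 2.0
    K_inv = np.linalg.inv(K)
    V = np.linalg.inv(K_inv + np.diag(omega))  # posterior covariance
    V = 0.5 * (V + V.T)                        # symmetrise against round-off
    m = V @ (K_inv @ mean + kappa)             # posterior mean
    return m + np.linalg.cholesky(V) @ rng.standard_normal(len(m))

rng = np.random.default_rng(0)
K = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])  # hypothetical prior covariance across 3 studies
n = np.array([200, 200, 200])     # hypothetical counts in the node's cell
y = np.array([160, 160, 160])     # 80% fall in the left child in every study
z = np.zeros(3)
draws = []
for it in range(600):
    z = update_node_logits(y, n, np.zeros(3), K, z, rng)
    if it >= 100:                  # discard burn-in
        draws.append(z)
z_bar = np.mean(draws, axis=0)
print(z_bar, np.log(0.8 / 0.2))
```

With this much data the posterior means sit slightly below the empirical logit $\log 4 \approx 1.39$, reflecting mild shrinkage toward the zero prior mean; with small $n_i$ the GP would pull similar studies toward each other more strongly.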
The authors acknowledge limitations such as sensitivity to the choice of tree depth $D$ and partition scheme, potential instability of GP hyper‑parameter estimation when the number of studies is small, and the binary partitioning restriction, which may require extensions for smoother density estimation. Future work is suggested on multi‑event outcomes, non‑binary partitioning schemes, and fully data‑driven GP hyper‑parameter learning.
Overall, the paper provides a rigorous, computationally feasible, and clinically relevant tool for meta‑analysis of survival outcomes, bridging the gap between limited summary data and the need for flexible, covariate‑aware borrowing of information across studies.