Parallel hierarchical sampling: a practical multiple-chains sampler for Bayesian model selection
This paper introduces the parallel hierarchical sampler (PHS), a Markov chain Monte Carlo algorithm that runs several chains simultaneously. The connections between PHS and the parallel tempering (PT) algorithm are illustrated, convergence of the PHS joint transition kernel is proved, and its practical advantages are emphasized. We illustrate the inferences obtained using PHS, parallel tempering and the Metropolis-Hastings algorithm for three Bayesian model selection problems, namely Gaussian clustering, the selection of covariates for a linear regression model and the selection of the structure of a treed survival model.
💡 Research Summary
The paper introduces the Parallel Hierarchical Sampler (PHS), a novel Markov chain Monte Carlo (MCMC) algorithm designed for Bayesian model selection problems that require efficient exploration of high‑dimensional, multimodal posterior distributions. Unlike traditional Parallel Tempering (PT), which runs multiple chains at different temperatures and relies on temperature‑dependent exchange moves, PHS runs several chains that all target the same posterior distribution. Each chain performs a standard Metropolis–Hastings (MH) update independently, after which a global “hierarchical” exchange step is executed. In this exchange step the current states of all chains are collected, a new joint configuration is proposed by permuting or otherwise reshuffling the whole set, and a Metropolis acceptance probability is computed for the entire ensemble. Because the proposal distribution is symmetric, the acceptance probability reduces to the usual MH ratio applied to the joint state, guaranteeing detailed balance with respect to the target posterior. The authors prove that the resulting joint transition kernel is reversible and leaves the posterior invariant, establishing theoretical convergence.
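The two-step structure described above — independent MH updates followed by a symmetric ensemble exchange — can be illustrated with a minimal sketch. This is not the authors' implementation: the 1D bimodal target, the random-walk proposal, and the choice of a pairwise swap as the symmetric reshuffle are all illustrative assumptions.

```python
import numpy as np

def log_target(x):
    # Illustrative 1D bimodal target: equal mixture of N(-3, 1) and N(3, 1).
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def phs_sketch(n_iter=2000, n_chains=4, step=1.0, seed=0):
    """Sketch of a PHS-style sampler: every chain targets the SAME posterior."""
    rng = np.random.default_rng(seed)
    states = rng.normal(size=n_chains)          # one state per chain
    samples = np.empty((n_iter, n_chains))
    for t in range(n_iter):
        # Step 1: independent Metropolis-Hastings update for each chain.
        for k in range(n_chains):
            prop = states[k] + step * rng.normal()
            if np.log(rng.random()) < log_target(prop) - log_target(states[k]):
                states[k] = prop
        # Step 2: global exchange -- propose a symmetric reshuffle of the
        # ensemble (here: swapping two randomly chosen chains' states) and
        # accept with the MH ratio on the joint state.  Because all chains
        # share one target, the joint density is unchanged by a permutation
        # and the swap is always accepted; the ratio is kept for clarity.
        i, j = rng.choice(n_chains, size=2, replace=False)
        log_ratio = (log_target(states[j]) + log_target(states[i])
                     - log_target(states[i]) - log_target(states[j]))  # = 0
        if np.log(rng.random()) < log_ratio:
            states[i], states[j] = states[j], states[i]
        samples[t] = states
    return samples
```

Because the permutation proposal is symmetric and all chains share one target, detailed balance for the joint kernel follows from the ordinary MH argument applied to the product density.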
The hierarchical exchange mechanism has two important practical consequences. First, there is no need to tune temperature ladders or to design temperature‑specific proposal distributions, which simplifies implementation and reduces the risk of poor mixing due to inappropriate temperature spacing. Second, because the exchange involves the whole ensemble, the probability of accepting a move can be substantially higher than in PT, where only a pair of chains is swapped and acceptance often deteriorates for large temperature gaps. The authors also show that as the number of chains increases, the diversity of joint proposals grows, leading to faster mixing in practice.
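The contrast with PT can be made explicit with the standard acceptance formulas (textbook forms, not taken verbatim from the paper). In PT with inverse temperatures $\beta_i$ and $\beta_j$, a proposed swap of states $x_i$ and $x_j$ is accepted with probability

$$
\alpha_{\mathrm{PT}} \;=\; \min\!\left\{1,\;
\frac{\pi(x_j)^{\beta_i}\,\pi(x_i)^{\beta_j}}
     {\pi(x_i)^{\beta_i}\,\pi(x_j)^{\beta_j}}\right\},
$$

which can become very small when $|\beta_i - \beta_j|$ is large and the two states lie in regions of very different posterior density. In PHS, by contrast, all $K$ chains target the same posterior, so the joint density $\pi(x_1)\cdots\pi(x_K)$ is invariant under any permutation of the ensemble and a symmetric permutation proposal is accepted with probability one.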
Three empirical studies illustrate the advantages of PHS. In a Gaussian mixture clustering task with several latent components, PHS rapidly converges to the correct number of clusters and accurate component parameters, outperforming PT whose mixing slows down when high‑temperature chains dominate. In a sparse linear regression variable‑selection problem with twenty candidate predictors, PHS efficiently identifies the true predictors, achieving higher posterior inclusion probabilities and requiring roughly 40 % fewer iterations than PT to reach stable estimates. Finally, in a treed survival model where the tree structure itself is a random variable, PHS explores the combinatorial space of splits and pruning decisions far more effectively than both PT and a single‑chain MH sampler, attaining higher log‑posterior values and locating the optimal tree in fewer sweeps.
The discussion acknowledges that while PHS eliminates temperature tuning, it does increase memory usage and the computational cost of the global exchange step, which scales linearly with the number of chains. The authors suggest possible extensions such as adaptive chain‑count strategies, partial‑ensemble exchanges, and GPU‑accelerated implementations to mitigate these costs.
In conclusion, the Parallel Hierarchical Sampler offers a theoretically sound and practically efficient alternative to existing multi‑chain MCMC methods. Its ability to maintain a single target distribution across all chains, combined with a high‑acceptance hierarchical exchange, yields superior mixing and more reliable model‑selection performance in a variety of Bayesian contexts, especially those characterized by complex, multimodal posteriors.