Nested Sampling with Slice-within-Gibbs: Efficient Evidence Calculation for Hierarchical Bayesian Models
We present Nested Sampling with Slice-within-Gibbs (NS-SwiG), an algorithm for Bayesian inference and evidence estimation in high-dimensional models whose likelihood admits a factorization, such as hierarchical Bayesian models. We construct a procedure to sample from the likelihood-constrained prior using a Slice-within-Gibbs kernel: an outer update of hyperparameters followed by inner block updates over local parameters. A likelihood-budget decomposition caches per-block contributions so that each local update checks feasibility in constant time rather than recomputing the global constraint at linearly growing cost. This reduces the per-replacement cost from quadratic to linear in the number of groups, and the overall algorithmic complexity from cubic to quadratic under standard assumptions. The decomposition extends naturally beyond independent observations, and we demonstrate this on Markov-structured latent variables. We evaluate NS-SwiG on challenging benchmarks, demonstrating scalability to thousands of dimensions and accurate evidence estimates even on posterior geometries where state-of-the-art gradient-based samplers can struggle.
💡 Research Summary
The paper introduces Nested Sampling with Slice‑within‑Gibbs (NS‑SwiG), a novel algorithm designed to compute Bayesian evidence efficiently for high‑dimensional hierarchical models whose likelihood factorises across groups. Traditional nested sampling requires drawing from the prior constrained to a likelihood threshold; each proposal typically recomputes the full log‑likelihood, leading to O(J) work per proposal and O(J²) work per live‑point replacement when there are J groups. NS‑SwiG solves this bottleneck by caching per‑group log‑likelihood contributions and the total sum, thereby turning the global constraint ℓ(ψ,{θ_j})>ℓ* into per‑group “budget” constraints that can be checked in O(1) time. When a local block θ_k is proposed, the algorithm only needs to verify ℓ_k′>B_k where B_k = ℓ* – (S – ℓ_k), with S the cached total log‑likelihood. Acceptance updates S and ℓ_k in constant time, so a full Gibbs sweep costs O(J) instead of O(J²).
The inner sampler is a Slice‑within‑Gibbs kernel. It alternates between (i) a slice update of the shared hyperparameters ψ given all local parameters, and (ii) slice updates of each local block θ_j given ψ and the other locals. Slice sampling naturally respects the likelihood constraint, and the stepping‑out/shrinkage procedure automatically adapts the slice width, eliminating the need for hand‑tuned proposal scales.
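A minimal version of one such constrained slice update is sketched below, following Neal-style stepping-out and shrinkage. The adaptation of treating the budget check as an extra rejection inside the shrinkage loop is one reasonable way to realise "slice sampling naturally respects the likelihood constraint"; the function names and defaults are assumptions for the example, not the paper's API.

```python
import random

def slice_update(x0, log_prior, feasible, w=1.0, max_steps=50,
                 rng=random.Random(0)):
    """One slice-sampling update of a scalar block.

    log_prior : log-density being sliced
    feasible  : x -> bool, the O(1) budget check (ell_k(x) > B_k)
    w         : initial slice width, adapted by stepping out / shrinking
    """
    # Auxiliary height defines the horizontal slice {x : log_prior(x) > logy}.
    logy = log_prior(x0) - rng.expovariate(1.0)

    # Stepping out: expand an interval around x0 until both ends leave
    # the slice (or the step budget is exhausted).
    u = rng.random()
    L, R = x0 - w * u, x0 + w * (1.0 - u)
    for _ in range(max_steps):
        if log_prior(L) <= logy:
            break
        L -= w
    for _ in range(max_steps):
        if log_prior(R) <= logy:
            break
        R += w

    # Shrinkage: sample uniformly in [L, R]; on rejection (off the slice
    # or infeasible under the budget), shrink the interval toward x0.
    while True:
        x1 = L + rng.random() * (R - L)
        if log_prior(x1) > logy and feasible(x1):
            return x1
        if x1 < x0:
            L = x1
        else:
            R = x1
```

Since the interval always shrinks toward the current (feasible) point, the loop terminates, and no proposal scale has to be tuned by hand.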
The authors also extend the budget decomposition to models with Markov‑structured latent variables, where each block’s contribution depends only on a fixed‑size neighbourhood; the same O(1) check applies.
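For intuition, the Markov-chain case can be sketched as follows: updating θ_k only touches the transition terms on edges (k−1, k) and (k, k+1), so the cache update and feasibility check remain constant-time. The AR(1)-style transition density and all names here are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 500
theta = rng.normal(size=J)

def trans(a, b):
    # Toy AR(1)-style transition log-density
    return -0.5 * (b - 0.9 * a)**2

# Cache one contribution per edge (j-1, j); index 0 holds an initial-state term.
edge = np.empty(J)
edge[0] = -0.5 * theta[0]**2
for j in range(1, J):
    edge[j] = trans(theta[j-1], theta[j])
S = edge.sum()
ell_star = S - 10.0                    # likelihood threshold (arbitrary here)

def try_update(k, prop):
    """Feasibility check touching only edges (k-1, k) and (k, k+1)."""
    global S
    old = edge[k] + (edge[k+1] if k + 1 < J else 0.0)
    new_k = -0.5 * prop**2 if k == 0 else trans(theta[k-1], prop)
    new_next = trans(prop, theta[k+1]) if k + 1 < J else 0.0
    new = new_k + new_next
    # Budget on the fixed-size neighbourhood: S - old + new must exceed ell*.
    if S - old + new > ell_star:
        S += new - old
        edge[k] = new_k
        if k + 1 < J:
            edge[k+1] = new_next
        theta[k] = prop
        return True
    return False
```

The key point is that the number of cached terms touched per update is bounded by the neighbourhood size, not by J.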
Empirical evaluation on four benchmarks (high‑dimensional Gaussian mixtures, hierarchical Poisson models, a Markov chain latent variable model, and a large astrophysical catalog with up to 2,500 dimensions) shows that NS‑SwiG achieves (1) evidence estimates with relative error ≤1% across all cases, (2) effective sample size per unit time 5–10× higher than state‑of‑the‑art nested‑sampling variants (ellipsoidal, multi‑ellipsoid, DNest) and gradient‑based NUTS, and (3) runtime and memory that scale linearly with the number of groups. Notably, in multimodal or strongly non‑concave posterior geometries where HMC/NUTS struggle to converge, NS‑SwiG remains robust because it enforces the likelihood constraint directly rather than relying on gradient information.
Limitations include the requirement that the log‑likelihood be decomposable into additive group contributions; models lacking such structure revert to the usual O(J²) cost. Moreover, slice proposals may be inefficient for extremely complex conditional distributions, suggesting future work on adaptive block partitioning, budget‑aware proposal tuning, and GPU‑accelerated implementations.
In summary, NS‑SwiG combines structure‑aware Gibbs updates with constraint‑aware slice sampling to reduce the per‑replacement cost of nested sampling from quadratic to linear in the number of groups, thereby making accurate evidence computation feasible for hierarchical Bayesian models of thousands of dimensions. This advancement has immediate relevance for astrophysics, cosmology, and any scientific domain that relies on hierarchical modeling and rigorous model comparison.