Nested Sampling with Slice-within-Gibbs: Efficient Evidence Calculation for Hierarchical Bayesian Models

Reading time: 5 minutes
...

📝 Original Info

  • Title: Nested Sampling with Slice-within-Gibbs: Efficient Evidence Calculation for Hierarchical Bayesian Models
  • ArXiv ID: 2602.17414
  • Date: 2026-02-19
  • Authors: Not listed in the provided metadata; see the original PDF or the journal page for the author list.

📝 Abstract

We present Nested Sampling with Slice-within-Gibbs (NS-SwiG), an algorithm for Bayesian inference and evidence estimation in high-dimensional models whose likelihood admits a factorization, such as hierarchical Bayesian models. We construct a procedure to sample from the likelihood-constrained prior using a Slice-within-Gibbs kernel: an outer update of hyperparameters followed by inner block updates over local parameters. A likelihood-budget decomposition caches per-block contributions so that each local update checks feasibility in constant time rather than recomputing the global constraint at linearly growing cost. This reduces the per-replacement cost from quadratic to linear in the number of groups, and the overall algorithmic complexity from cubic to quadratic under standard assumptions. The decomposition extends naturally beyond independent observations, and we demonstrate this on Markov-structured latent variables. We evaluate NS-SwiG on challenging benchmarks, demonstrating scalability to thousands of dimensions and accurate evidence estimates even on posterior geometries where state-of-the-art gradient-based samplers can struggle.

💡 Deep Analysis

📄 Full Content

Nested sampling (Skilling, 2006) has become a cornerstone of Bayesian inference in the physical sciences (Ashton et al., 2022; Buchner, 2023). Unlike Markov chain Monte Carlo (MCMC), nested sampling directly estimates the Bayesian evidence Z = ∫ L(D|θ) π(θ) dθ, enabling rigorous model comparison. This is central to hypothesis testing tasks such as comparing physical models, distinguishing signal from noise, or selecting among population models. Practical implementations treat constrained-prior sampling, i.e. drawing from the prior truncated to a likelihood threshold, largely as a black-box task (Feroz et al., 2009; Handley et al., 2015a; Speagle, 2020; Buchner, 2021), with limited exploitation of model-specific factorisation. In high dimensions, the available constrained MCMC mutation kernels can mix poorly, often resembling local random-walk exploration.
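
To ground the discussion, the following is a minimal nested-sampling sketch in JAX (not the paper's code): a 1D Gaussian likelihood under a Uniform(-5, 5) prior, with the likelihood-constrained prior draw done by naive rejection sampling, which is exactly the step NS-SwiG replaces with a structured kernel. The model, live-point count, and iteration budget are illustrative assumptions.

```python
# Minimal nested-sampling sketch (illustration only, not the paper's NS-SwiG code).
# Assumptions: 1D Gaussian likelihood, Uniform(-5, 5) prior, naive rejection
# sampling for the likelihood-constrained prior draw.
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

n_live, n_iter = 100, 500
key = jax.random.PRNGKey(0)

def log_like(theta):                       # per-point log-likelihood
    return norm.logpdf(theta, loc=0.0, scale=1.0)

def prior_draw(key, shape=()):             # Uniform(-5, 5) prior
    return jax.random.uniform(key, shape, minval=-5.0, maxval=5.0)

key, sub = jax.random.split(key)
live = prior_draw(sub, (n_live,))
live_logl = log_like(live)

log_z = -jnp.inf
for i in range(1, n_iter + 1):
    worst = jnp.argmin(live_logl)
    logl_star = live_logl[worst]
    # deterministic prior-volume shrinkage: X_i ~ exp(-i / n_live)
    log_w = jnp.log(jnp.exp(-(i - 1) / n_live) - jnp.exp(-i / n_live))
    log_z = jnp.logaddexp(log_z, logl_star + log_w)
    # likelihood-constrained prior draw by naive rejection (the expensive step)
    while True:
        key, sub = jax.random.split(key)
        cand = prior_draw(sub)
        if log_like(cand) > logl_star:
            break
    live = live.at[worst].set(cand)
    live_logl = live_logl.at[worst].set(log_like(cand))

# the remaining live-point contribution to Z is ignored in this sketch
print("log Z ≈", float(log_z))             # analytic answer: log(0.1) ≈ -2.30
```

The estimate should land near the analytic value log Z = log(0.1) ≈ -2.30; as the constraint tightens, the rejection step dominates the cost, which is the bottleneck that motivates better constrained-prior kernels.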

The dominant paradigm for scaling Bayesian methods to high dimensions is to use the gradient of the target distribution to suppress this random-walk behaviour (Fearnhead et al., 2024). Gradient-based samplers such as Hamiltonian Monte Carlo (HMC) (Neal, 2011) and the No-U-Turn Sampler (NUTS) (Hoffman & Gelman, 2014) have become ubiquitous general-purpose inference tools, and the success of Stan (Carpenter et al., 2017) has reinforced a widespread view, particularly among practitioners, that gradients are essential for scalability. Designing efficient gradient-based samplers for hard likelihood constraints remains challenging; existing approaches for nested sampling (Feroz & Skilling, 2013; Lemos et al., 2023) have not yet demonstrated robust performance across generic problem classes (Kroupa et al., 2025), outside settings with additional structure (e.g., log-concave models amenable to proximal methods (Cai et al., 2022)). This has left nested sampling viewed as well suited for low-dimensional problems with pathological geometries, but requiring alternatives for high-dimensional inference (Piras et al., 2024).

This paper introduces Nested Sampling with Slice-within-Gibbs (NS-SwiG), a constrained-sampling inner loop that exploits the conditional independence structure of models with factorised likelihoods. Consider a hierarchical model with J groups, local parameters θ_j, and shared hyperparameters ψ, where the log-likelihood decomposes as ℓ(ψ, {θ_j}) = Σ_{j=1}^{J} ℓ_j(θ_j, ψ). Standard nested sampling checks the global constraint ℓ > ℓ* at every MCMC proposal, costing O(J) per evaluation; with O(J) parameters requiring updates, this yields O(J²) per replacement. NS-SwiG decomposes the global constraint into per-block budgets checkable in O(1), reducing the cost to O(J).
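
As a concrete illustration of the budget idea described above (a sketch of the decomposition, not the authors' implementation; function and variable names are hypothetical): when block j is updated, a cached total ℓ_tot and cached per-block terms give a budget b_j = ℓ* − (ℓ_tot − ℓ_j), and a proposal is feasible iff its new local term exceeds b_j, an O(1) check followed by an O(1) cache update.

```python
# Hypothetical sketch of the per-block likelihood-budget check.
import jax.numpy as jnp

def block_budget(j, loglike_blocks, loglike_total, loglike_star):
    """Minimum value the new local term must exceed so that the global
    constraint sum_j l_j > l* keeps holding; O(1) given the caches."""
    return loglike_star - (loglike_total - loglike_blocks[j])

def accept_block_update(j, new_loglike_j, loglike_blocks, loglike_total, loglike_star):
    """O(1) feasibility check plus cache update for block j.
    `loglike_blocks` is a length-J jnp array of cached per-block terms."""
    budget = block_budget(j, loglike_blocks, loglike_total, loglike_star)
    if new_loglike_j <= budget:                       # would violate l > l*
        return False, loglike_blocks, loglike_total
    new_total = loglike_total - loglike_blocks[j] + new_loglike_j
    return True, loglike_blocks.at[j].set(new_loglike_j), new_total
```

A naive implementation would instead recompute the sum of all J per-block terms at every proposal, which is the O(J)-per-evaluation cost that the decomposition removes.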

This builds on contemporary work reviving interest in coordinate-wise Metropolis-within-Gibbs schemes as scalable alternatives to joint-gradient methods (Ascolani et al., 2024; Luu et al., 2025). We embed a Slice-within-Gibbs kernel within a nested sampling outer loop, targeting the joint constrained prior with block updates: an outer slice update of hyperparameters followed by an inner sweep over the J local blocks. We demonstrate the scalability this unlocks on challenging benchmarks up to d ∼ 2500 dimensions. In summary, we make the following contributions:

(i) Likelihood-budget decomposition. We show that the global nested sampling constraint can be decomposed into per-block budgets updated in O(1) from a cached total, extending naturally to Markov-structured latent variables where budgets depend on local neighbourhoods.

(ii) Constrained Slice-within-Gibbs kernel. We construct an MCMC kernel for the joint constrained prior using blocked slice updates, yielding O(J) per sweep compared to O(J²) for standard joint-space constrained sampling; a minimal sketch of one such sweep follows this list.

(iii) Implementation and validation. We provide a fully vectorised JAX implementation building on Yallup et al. (2026), validate posterior recovery against NUTS on four benchmarks, and demonstrate accurate evidence estimates at scale. We motivate a number of immediate applications within astrophysics, where nested sampling is already widely used for object-level inference problems.
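
Combining contributions (i) and (ii), the following is a minimal sketch of one inner Gibbs sweep under the hard constraint. It assumes the local parameters have been mapped to the unit hypercube (a uniform constrained prior), so the slice is exactly the feasible set, and uses a univariate shrinkage slice move per block together with the O(1) budget check. This is a hedged reading of the scheme described in the text, not the released JAX implementation; all helper names are hypothetical, and Python control flow is used for clarity where the real code is vectorised.

```python
# Hypothetical sketch of one constrained Slice-within-Gibbs inner sweep.
import jax
import jax.numpy as jnp

def slice_update_block(key, j, theta, psi, loglike_j_fn,
                       loglike_blocks, loglike_total, loglike_star, width=0.5):
    """Univariate shrinkage slice move for theta[j] under the hard constraint,
    using the O(1) per-block budget from the cached totals.
    `loglike_j_fn(j, x, psi)` evaluates the single-block term l_j."""
    budget = loglike_star - (loglike_total - loglike_blocks[j])
    x0 = theta[j]
    key, k = jax.random.split(key)
    # random bracket of size `width` around the current point, kept in [0, 1]
    left = jnp.clip(x0 - width * jax.random.uniform(k), 0.0, 1.0)
    right = jnp.clip(left + width, 0.0, 1.0)
    while True:
        key, k = jax.random.split(key)
        x1 = jax.random.uniform(k, minval=left, maxval=right)
        lj_new = loglike_j_fn(j, x1, psi)
        if lj_new > budget:                  # feasible: accept, update caches
            theta = theta.at[j].set(x1)
            loglike_total = loglike_total - loglike_blocks[j] + lj_new
            loglike_blocks = loglike_blocks.at[j].set(lj_new)
            return key, theta, loglike_blocks, loglike_total
        # infeasible: shrink the bracket towards the current (feasible) point
        left = jnp.where(x1 < x0, x1, left)
        right = jnp.where(x1 < x0, right, x1)

def gibbs_sweep(key, theta, psi, loglike_j_fn,
                loglike_blocks, loglike_total, loglike_star):
    """Inner sweep: one constrained slice move for each of the J local blocks."""
    for j in range(theta.shape[0]):
        key, theta, loglike_blocks, loglike_total = slice_update_block(
            key, j, theta, psi, loglike_j_fn,
            loglike_blocks, loglike_total, loglike_star)
    return key, theta, loglike_blocks, loglike_total
```

Because the current value of each block always satisfies its own budget, the shrinkage loop terminates, and one full sweep touches each block once, giving the O(J)-per-sweep cost quoted above. The outer update of the shared hyperparameters ψ is not shown: since every block depends on ψ, that move still requires the full likelihood sum.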

The remainder of the paper is organized as follows. Section 2 reviews nested sampling fundamentals and hierarchical model structure. Section 3 presents NS-SwiG, including the budget decomposition, caching strategy, and constrained Slice-within-Gibbs updates. Section 4 reports numerical experiments validating correctness and scaling across four benchmarks. Section 5 discusses limitations, extensions, and practical considerations, and Section 6 concludes.

Many high-dimensional Bayesian models are high-dimensional because they compose many repeated low-dimensional components. Consider an inference task where J objects are observed with data D_j; we construct a model with local parameters θ_j and shared hyperparameters ψ. The likelihood factorizes as

L(D | {θ_j}, ψ) = ∏_{j=1}^{J} L_j(D_j | θ_j, ψ).    (1)

Defining ℓ_j(θ_j, ψ) := log p(D_j | θ_j, ψ), the log-likelihood decomposes as the sum ℓ(ψ, {θ_j}) = Σ_{j=1}^{J} ℓ_j(θ_j, ψ).
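
A toy instance of this factorised structure, written with JAX (which the paper's implementation also uses); the Gaussian random-effects model below is an illustrative assumption, not one of the paper's benchmarks:

```python
# Toy factorised likelihood: J groups, local means theta_j, shared scale psi.
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

def loglike_group(theta_j, psi, data_j):
    """l_j(theta_j, psi) = log p(D_j | theta_j, psi) for one group:
    Gaussian observations with group mean theta_j and shared scale psi."""
    return jnp.sum(norm.logpdf(data_j, loc=theta_j, scale=psi))

# vectorise the per-group term over the J groups; psi is shared (in_axes=None)
loglike_blocks = jax.vmap(loglike_group, in_axes=(0, None, 0))

def loglike_total(theta, psi, data):
    """Global log-likelihood: the sum of the J per-group terms, as in Eq. (1)."""
    return jnp.sum(loglike_blocks(theta, psi, data))

# example with J = 1000 groups and 5 observations per group
key = jax.random.PRNGKey(1)
data = jax.random.normal(key, (1000, 5))
theta = jnp.zeros(1000)                      # local parameters theta_j
print(loglike_total(theta, 1.0, data))       # scalar log-likelihood
```

The per-group term ℓ_j is the object that NS-SwiG caches and re-evaluates one block at a time, while the global total is only ever maintained as a running sum.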

Reference

This content is AI-processed based on open access ArXiv data.
