Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method
The shrinking rank method is a variation of slice sampling that is efficient at sampling from multivariate distributions with highly correlated parameters. It requires that the gradient of the log-density be computable. At each individual step, it approximates the current slice with a Gaussian occupying a shrinking-dimension subspace. The dimension of the approximation is shrunk orthogonally to the gradient at rejected proposals, since the gradients at points outside the current slice tend to point towards the slice. This causes the proposal distribution to converge rapidly to an estimate of the longest axis of the slice, resulting in states that are less correlated than those generated by related methods. After describing the method, we compare it to two other methods on several distributions and obtain favorable results.
💡 Research Summary
The paper introduces the “Shrinking‑Rank” method, a novel variant of slice sampling designed to improve efficiency when drawing samples from multivariate distributions with strong parameter correlations. Traditional slice sampling defines a “slice” as the region where the log‑density exceeds a randomly chosen height and then draws uniformly from that region. While this approach automatically adapts the step size, it can suffer from high autocorrelation in high‑dimensional, highly correlated settings because the proposal distribution often fails to align with the longest axis of the slice.
The Shrinking‑Rank algorithm addresses this by (1) approximating the current slice with a multivariate Gaussian centered at the current state, and (2) progressively reducing the dimensionality of the Gaussian’s support subspace in directions orthogonal to the gradient of the log‑density at rejected proposals. The key observation is that, for points outside the slice, the gradient points back toward the slice. By removing the component of the proposal subspace that aligns with this gradient, the algorithm forces the remaining subspace to align with the slice’s longest direction. Consequently, the proposal distribution quickly stretches along that direction while contracting in orthogonal directions, dramatically increasing the probability that a proposal lands inside the slice and reducing the correlation between successive states.
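The traditional univariate scheme mentioned above can be sketched in a few lines. This is an illustrative implementation (the stepping-out expansion, the shrinkage-on-rejection rule, and the standard-normal target are common textbook choices, not details taken from this paper):

```python
import math
import random

def slice_sample_1d(logpdf, x0, w=1.0, n_samples=1000, rng=None):
    """Univariate slice sampling with stepping-out and shrinkage.

    logpdf: log-density of the target (up to an additive constant).
    x0: starting state; w: initial interval width (user-chosen scale).
    """
    rng = rng or random.Random(0)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Slice height: y = l(x) + log u, with u ~ Uniform(0, 1),
        # so the slice is the region where logpdf exceeds y.
        y = logpdf(x) + math.log(rng.random())
        # Step out: place an interval of width w around x and expand
        # each end until it lies outside the slice.
        left = x - w * rng.random()
        right = left + w
        while logpdf(left) > y:
            left -= w
        while logpdf(right) > y:
            right += w
        # Draw uniformly from the interval, shrinking it toward x
        # whenever a draw falls outside the slice.
        while True:
            z = left + (right - left) * rng.random()
            if logpdf(z) > y:
                x = z
                break
            if z < x:
                left = z
            else:
                right = z
        samples.append(x)
    return samples

# Example: sample from a standard normal (log-density -t^2/2).
samples = slice_sample_1d(lambda t: -0.5 * t * t, x0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The automatic step-size adaptation comes from the stepping-out and shrinkage phases; the weakness the paper targets is that, in many dimensions, applying this one coordinate at a time mixes slowly when parameters are strongly correlated.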
Algorithmic steps are as follows:
1. Compute the log‑density ℓ(x) at the current state x.
2. Draw u ∼ Uniform(0, 1) and set the slice height y = ℓ(x) + log u.
3. Initialise the subspace S₀ = ℝᵈ and a covariance Σ₀ = σ²I (σ is a user‑chosen scale).
4. Sample a proposal z from N(x, Σ_k) restricted to the current subspace S_k.
5. If ℓ(z) ≥ y, accept z and move to the next iteration.
6. If ℓ(z) < y, compute the gradient g = ∇ℓ(z) and project the subspace onto the orthogonal complement of g (i.e., apply the projector P = I − ggᵀ/‖g‖²), thereby reducing the subspace dimension by one: S_{k+1} = P S_k, Σ_{k+1} = P Σ_k Pᵀ.
7. Return to step 4 until a proposal is accepted.
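The steps above can be sketched in Python. This is a simplified illustration of the listed loop, not the authors' reference implementation: the correlated 2-D Gaussian target, the scale σ = 1, and the rank-zero safeguard (restarting with a smaller scale so the loop always terminates) are assumptions added for the sketch:

```python
import numpy as np

def shrinking_rank_step(x, logpdf, grad_logpdf, sigma=1.0, rng=None):
    """One shrinking-rank slice-sampling update (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    d = x.size
    # Slice height: y = l(x) + log u, u ~ Uniform(0, 1).
    y = logpdf(x) + np.log(rng.uniform())
    J = np.eye(d)          # orthonormal basis of the proposal subspace
    s = sigma
    while True:
        k = J.shape[1]
        if k == 0:
            # Safeguard not in the summary: if every direction has been
            # projected away, restart with the full space and a smaller
            # scale so proposals concentrate near x (which is in the slice).
            J, s = np.eye(d), s / 2.0
            k = d
        # Proposal z ~ N(x, s^2 I) restricted to the current subspace.
        z = x + s * (J @ rng.standard_normal(k))
        if logpdf(z) >= y:
            return z
        # Rejected: shrink the subspace orthogonally to the gradient at z.
        g = grad_logpdf(z)
        u = J.T @ g                      # gradient in subspace coordinates
        if np.linalg.norm(u) < 1e-12:
            continue                     # gradient has no subspace component
        # Orthonormal basis of the complement of u within the subspace:
        # the first column of the complete-QR factor spans u.
        Q, _ = np.linalg.qr(u.reshape(-1, 1), mode="complete")
        J = J @ Q[:, 1:]                 # subspace dimension drops by one

# Example target: a correlated 2-D Gaussian (illustrative choice).
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(cov)
logpdf = lambda v: -0.5 * v @ prec @ v
grad = lambda v: -prec @ v

rng = np.random.default_rng(0)
x = np.zeros(2)
samples = np.empty((500, 2))
for i in range(500):
    x = shrinking_rank_step(x, logpdf, grad, sigma=1.0, rng=rng)
    samples[i] = x
```

Representing the subspace by an orthonormal basis J keeps the projection step cheap: rather than forming the d×d projector P explicitly, the gradient is expressed in subspace coordinates and one basis direction is dropped.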
The method assumes that (a) the gradient of the log‑density can be evaluated efficiently, and (b) the slice is roughly ellipsoidal so that a dominant axis exists. Under these conditions, the algorithm converges rapidly to an accurate estimate of that axis, yielding proposals that are well‑matched to the geometry of the target distribution.
Empirical evaluation was performed on four test problems: (i) a 100‑dimensional multivariate normal with pairwise correlation 0.9, (ii) a Bayesian logistic regression model (≈30 parameters) on a real‑world credit dataset, (iii) a nonlinear mixture model, and (iv) a multimodal synthetic distribution. The Shrinking‑Rank method was compared against (a) traditional univariate slice sampling and (b) a multivariate slice sampler that uses a “step‑out” expansion. Results show that the new method achieves 3–5× higher effective sample sizes (ESS) on the highly correlated normal, substantially lower autocorrelation on the logistic regression, and faster convergence on the mixture model. In the multimodal case, performance gains were modest, and the algorithm occasionally became trapped in a single mode because the gradient‑driven subspace reduction can bias the sampler toward the local geometry.
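Effective sample size, the headline metric in these comparisons, is commonly estimated from a chain's empirical autocorrelations via ESS = N / (1 + 2·Σ ρ_k). A minimal sketch, using one common convention (truncating the sum at the first non-positive autocorrelation estimate; the AR(1) example chain is illustrative, not from the paper):

```python
import numpy as np

def effective_sample_size(chain):
    """Estimate ESS = N / (1 + 2 * sum of autocorrelations), truncating
    the sum at the first non-positive autocorrelation estimate."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = (x @ x) / n
    if var == 0.0:
        return float(n)
    tau = 1.0  # integrated autocorrelation time estimate
    for lag in range(1, n):
        rho = (x[:-lag] @ x[lag:]) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

# Nearly independent draws should give ESS close to N; a strongly
# autocorrelated AR(1) chain should give a much smaller ESS.
rng = np.random.default_rng(0)
iid = rng.standard_normal(2000)
ar = np.empty(2000)
ar[0] = 0.0
for t in range(1, 2000):
    ar[t] = 0.95 * ar[t - 1] + rng.standard_normal()

ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
```

A "3–5× higher ESS" result therefore means the sampler's autocorrelations decay several times faster, so fewer iterations are needed for the same Monte Carlo accuracy.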
The authors discuss limitations: the need for gradient evaluations can be costly for complex models (e.g., deep Bayesian networks), and the assumption of an ellipsoidal slice may break down for strongly non‑convex or multimodal targets. They propose future work on gradient approximations, adaptive non‑linear subspace constructions (e.g., kernel‑PCA‑based), and hybrid schemes that occasionally reset the subspace to avoid mode‑locking.
In conclusion, the Shrinking‑Rank method provides a principled way to exploit gradient information within slice sampling, dynamically aligning the proposal distribution with the dominant direction of the slice and shrinking irrelevant dimensions. This leads to markedly lower autocorrelation and higher sampling efficiency in high‑dimensional, highly correlated problems, making it a valuable addition to the toolbox of practitioners who can compute log‑density gradients.