Stability of the Gibbs Sampler for Bayesian Hierarchical Models
We characterise the convergence of the Gibbs sampler which samples from the joint posterior distribution of parameters and missing data in hierarchical linear models with arbitrary symmetric error distributions. We show that the convergence can be uniform, geometric or sub-geometric depending on the relative tail behaviour of the error distributions, and on the parametrisation chosen. Our theory is applied to characterise the convergence of the Gibbs sampler on latent Gaussian process models. We indicate how the theoretical framework we introduce will be useful in analyzing more complex models.
💡 Research Summary
The paper develops a comprehensive theoretical account of the convergence of the Gibbs sampler when it is used to draw from the joint posterior distribution of parameters and latent variables in Bayesian hierarchical linear models with arbitrary symmetric error distributions. By allowing the error distribution to range from light‑tailed (e.g., Gaussian, Laplace) to heavy‑tailed (e.g., Student‑t, Cauchy), the authors classify the sampler’s ergodicity into three distinct regimes: uniform, geometric, and sub‑geometric. The classification hinges on the tail decay rate of the error distribution: exponential tails guarantee a uniform lower bound on the transition kernel and thus uniform convergence; polynomial tails yield a V‑uniform bound that translates into geometric convergence; and extremely heavy tails lead to only polynomial decay of the kernel, resulting in sub‑geometric rates such as O(n^{-α}).
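As a concrete illustration of the kind of two‑block Gibbs sampler under discussion, the sketch below runs a centered sampler on a hypothetical Gaussian location model, i.e., the light‑tailed case in which both full conditionals are Gaussian. The model, priors, and function names are illustrative assumptions for this summary, not the paper's own setup or notation.

```python
import random
import statistics

# Hypothetical toy model (illustrative, not the paper's exact setup):
#   y_i | theta ~ N(theta, 1),  theta | mu ~ N(mu, 1),  flat prior on mu.
# With Gaussian (light-tailed) errors both full conditionals are Gaussian,
# giving the textbook two-block centered Gibbs sampler sketched here.

def gibbs(y, n_iter=5000, seed=0):
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, draws = 0.0, []
    for _ in range(n_iter):
        # theta | mu, y ~ N((n*ybar + mu) / (n + 1), 1 / (n + 1))
        theta = rng.gauss((n * ybar + mu) / (n + 1), (1.0 / (n + 1)) ** 0.5)
        # mu | theta ~ N(theta, 1)   (flat prior on mu)
        mu = rng.gauss(theta, 1.0)
        draws.append(theta)
    return draws

y = [1.2, 0.8, 1.5, 1.1, 0.9]
draws = gibbs(y)
# Light tails mean the chain forgets its starting point quickly, so the
# posterior mean of theta should land near the sample mean ybar = 1.1.
print(statistics.mean(draws[1000:]))
```

In this conjugate case the chain is geometrically (in fact very rapidly) mixing; the paper's point is that replacing the Gaussian errors with heavy‑tailed ones can degrade exactly this kind of scheme to sub‑geometric convergence.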
A second major contribution is the analysis of how the choice of parametrisation influences these rates. In the traditional “centered” parametrisation, the latent variables are sampled directly, so the tail properties of the error distribution are transmitted unchanged to the Gibbs transition kernel. In contrast, a “non‑centered” parametrisation rewrites each latent variable as the product of a scale parameter and a standard normal auxiliary variable. This reparametrisation attenuates the impact of heavy tails and can upgrade a sub‑geometric chain to a geometric one, or in some cases even to uniform ergodicity. The authors prove these statements rigorously and illustrate them with simulation studies showing dramatic reductions in autocorrelation when moving from the centered to the non‑centered form under heavy‑tailed errors.
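The mechanism behind the centered/non‑centered contrast can be seen even in a purely Gaussian toy model, where the effect of the parametrisation on autocorrelation is easy to compute and simulate. The sketch below is a hypothetical illustration of that mechanism only (it does not reproduce the paper's tail‑behaviour results): it Gibbs‑samples the model y | x ~ N(x, sy²), x | mu ~ N(mu, sx²) with a flat prior on mu, once in the centered form (x, mu) and once in the non‑centered form (z, mu) with x = mu + sx·z.

```python
import random

# Illustrative Gaussian toy model (not the paper's notation):
#   y | x ~ N(x, sy^2),  x | mu ~ N(mu, sx^2),  flat prior on mu.
# Centered Gibbs samples (x, mu); the non-centered form rewrites
# x = mu + sx * z with z ~ N(0, 1) and samples (z, mu) instead.

def lag1_autocorr(xs):
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in xs)
    return num / den

def run(parametrisation, y=0.0, sx=0.2, sy=1.0, n_iter=20000, seed=2):
    rng = random.Random(seed)
    mu, mus = 0.0, []
    for _ in range(n_iter):
        if parametrisation == "centered":
            prec = 1 / sy**2 + 1 / sx**2
            x = rng.gauss((y / sy**2 + mu / sx**2) / prec, prec**-0.5)
            mu = rng.gauss(x, sx)              # mu | x ~ N(x, sx^2)
        else:  # non-centered
            v = sy**2 / (sx**2 + sy**2)        # var of z | mu, y
            z = rng.gauss(sx * (y - mu) / (sx**2 + sy**2), v**0.5)
            mu = rng.gauss(y - sx * z, sy)     # mu | z ~ N(y - sx*z, sy^2)
        mus.append(mu)
    return mus

# With sx << sy the centered chain is highly autocorrelated while the
# non-centered chain mixes almost independently.
rho_c = lag1_autocorr(run("centered"))
rho_nc = lag1_autocorr(run("non-centered"))
print(rho_c, rho_nc)  # roughly 0.96 vs 0.04
```

In this toy case the two lag‑1 autocorrelations are complementary, sy²/(sx²+sy²) versus sx²/(sx²+sy²), which is the Gaussian shadow of the more general point in the summary: which parametrisation mixes better depends on the structure of the model, and in the heavy‑tailed settings the paper studies the choice can change the ergodicity class itself.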
The theoretical framework is then applied to latent Gaussian process (GP) models: infinite‑dimensional hierarchical structures that become finite‑dimensional after conditioning on the observed locations. The paper demonstrates that the smoothness of the GP covariance function interacts with the tail behaviour of the error distribution: smoother covariances diminish the effective heaviness of the tails, and a suitable non‑centered parametrisation restores geometric ergodicity even when the observation noise follows a Student‑t distribution. Empirical results confirm that the non‑centered GP sampler mixes orders of magnitude faster than its centered counterpart under the same heavy‑tailed noise.
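For latent GP models, a standard way to obtain a non‑centered parametrisation is "whitening": instead of sampling the latent vector f ~ N(0, K) directly, one samples z ~ N(0, I) and sets f = L·z, where L is the Cholesky factor of the covariance matrix K. The sketch below, a minimal stdlib‑only illustration with an assumed squared‑exponential covariance (the kernel choice and all names are this summary's assumptions, not necessarily the paper's), shows the transformation itself.

```python
import math
import random

# Whitened (non-centered) parametrisation of a latent GP at a few
# observed locations:
#   centered:      f ~ N(0, K)            -- sample f directly
#   non-centered:  f = L z,  z ~ N(0, I)  -- L is the Cholesky factor of K
# Sampling in z-space strips the prior correlation structure out of the
# chain's target, which is what the reparametrisation exploits.

def cholesky(K):
    # plain lower-triangular Cholesky factorisation, K = L L^T
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def rbf_cov(xs, ell=1.0):
    # squared-exponential covariance; an assumed example of a smooth kernel
    return [[math.exp(-0.5 * ((a - b) / ell) ** 2) for b in xs] for a in xs]

rng = random.Random(3)
xs = [0.0, 0.5, 1.0, 1.5]
K = rbf_cov(xs)
L = cholesky(K)

# one non-centered prior draw: sample white noise z, then colour it with L
z = [rng.gauss(0.0, 1.0) for _ in xs]
f = [sum(L[i][k] * z[k] for k in range(len(xs))) for i in range(len(xs))]
print(f)
```

Within an MCMC scheme one would update z (and any hyperparameters) and reconstruct f deterministically; this is the finite‑dimensional analogue of the non‑centered GP sampler whose faster mixing under heavy‑tailed noise the summary describes.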
Finally, the authors discuss the broader implications of their work. By explicitly linking tail behaviour and parametrisation to convergence rates, the paper offers a practical diagnostic: before running a Gibbs sampler, a practitioner can examine the error distribution and decide whether a non‑centered reparametrisation is advisable. The authors argue that the same principles extend to more complex hierarchical models such as mixed‑effects models, sparse Bayesian regression, and deep Bayesian networks, where heavy‑tailed priors or likelihoods are common. In summary, the study delivers a unified, tail‑aware theory of Gibbs sampler stability, provides concrete guidelines for improving sampler efficiency, and opens avenues for future research on more advanced Bayesian hierarchical structures.