Parameter expansion in local-shrinkage models
This paper considers the problem of using MCMC to fit sparse Bayesian models based on normal scale-mixture priors. Examples of this framework include the Bayesian LASSO and the horseshoe prior. We study the usefulness of parameter expansion (PX) for improving convergence in such models, which is notoriously slow when the global variance component is near zero. Our conclusion is that parameter expansion does improve matters in LASSO-type models, but only modestly. In most cases this improvement, while noticeable, is less than what might be expected, especially compared to the improvements that PX makes possible for models very similar to those considered here. We give some examples, and we attempt to provide some intuition as to why this is so. We also describe how slice sampling may be used to update the global variance component. In practice, this approach seems to perform almost as well as parameter expansion. As a practical matter, however, it is perhaps best viewed not as a replacement for PX, but as a tool for expanding the class of models to which PX is applicable.
💡 Research Summary
The paper investigates the notoriously slow mixing of Markov chain Monte Carlo (MCMC) algorithms when fitting sparse Bayesian regression models that employ normal scale‑mixture priors, such as the Bayesian LASSO and the horseshoe. These models are built on a hierarchical “global‑local” structure: a global variance component τ² controls overall shrinkage, while local scale parameters λi (or ψi) allow individual coefficients to escape shrinkage. When τ is close to zero, the posterior coupling between τ and the λi becomes extremely strong, producing high autocorrelation in the Gibbs updates and consequently poor effective sample sizes.
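To make the coupling concrete, here is a minimal sketch of a naive Gibbs sampler for a normal-means version of the Bayesian LASSO. The model, hyperparameters, and function name are illustrative assumptions, not the paper's exact setup: y_i = β_i + N(0, 1), β_i | τ, λ_i ~ N(0, τ²λ_i²), λ_i² ~ Exp(1/2) (so β_i/τ is marginally Laplace), and an inverse-gamma prior on τ².

```python
import numpy as np

def lasso_gibbs(y, n_iter=500, a=1.0, b=1.0, seed=0):
    """Naive Gibbs sampler for a normal-means Bayesian LASSO (illustrative):
    y_i = beta_i + N(0, 1),  beta_i | tau, lam_i ~ N(0, tau^2 * lam_i^2),
    lam_i^2 ~ Exp(1/2),  tau^2 ~ InvGamma(a, b)."""
    rng = np.random.default_rng(seed)
    p = len(y)
    beta, lam2, tau2 = y.copy(), np.ones(p), 1.0
    taus = np.empty(n_iter)
    for t in range(n_iter):
        # beta_i | rest ~ N(v_i * y_i, v_i),  v_i = (1 + 1/(tau2 * lam2_i))^-1
        v = 1.0 / (1.0 + 1.0 / (tau2 * lam2))
        beta = v * y + np.sqrt(v) * rng.standard_normal(p)
        # 1/lam_i^2 | rest ~ InverseGaussian(mean = tau/|beta_i|, shape = 1)
        mean = np.sqrt(tau2) / np.maximum(np.abs(beta), 1e-10)
        lam2 = 1.0 / rng.wald(mean, 1.0)
        # tau^2 | rest ~ InvGamma(a + p/2, b + sum(beta_i^2 / lam2_i) / 2)
        rate = b + 0.5 * np.sum(beta**2 / lam2)
        tau2 = 1.0 / rng.gamma(a + 0.5 * p, 1.0 / rate)
        taus[t] = tau2
    return beta, lam2, taus
```

Because τ² is updated conditionally on all the λ_i² (and vice versa), the τ chain moves in small, highly autocorrelated steps whenever τ is near zero: this is the pathology the paper targets.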
To alleviate this problem the authors explore Parameter Expansion (PX), a technique that introduces redundant parameters to break the tight τ–λi dependence. Specifically, τ is re‑parameterised as the product of two independent auxiliary scales γ and δ (τ = γ δ). Each auxiliary receives a weakly informative prior (e.g., half‑Normal) and conditional normal updates are derived for the original parameters. In theory, this re‑parameterisation should allow the sampler to move more freely in the τ direction because the dependence on any single auxiliary is weaker.
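Following the description above, the expanded update can be sketched by writing τ = γδ and drawing each factor from its own conditional. For a closed-form sketch, the code below places inverse-gamma priors on γ² and δ² rather than the half-normal priors mentioned above (a deliberate, clearly non-conjugate-avoiding assumption); the function name and hyperparameters are likewise illustrative.

```python
import numpy as np

def px_update_tau(beta, lam2, gamma2, delta2, rng, a=0.5, b=0.5):
    """One parameter-expansion sweep for tau = gamma * delta, assuming
    beta_i | . ~ N(0, gamma^2 * delta^2 * lam_i^2) and independent
    InvGamma(a, b) priors on gamma^2 and delta^2 (illustrative choice).
    Each factor has an inverse-gamma conditional, so a move in tau can be
    shared between gamma and delta instead of forced through one scale."""
    p = len(beta)
    s = np.sum(beta**2 / lam2)  # sufficient statistic for the global scale
    # gamma^2 | rest ~ InvGamma(a + p/2, b + s / (2 * delta^2))
    gamma2 = 1.0 / rng.gamma(a + 0.5 * p, 1.0 / (b + 0.5 * s / delta2))
    # delta^2 | rest ~ InvGamma(a + p/2, b + s / (2 * gamma^2))
    delta2 = 1.0 / rng.gamma(a + 0.5 * p, 1.0 / (b + 0.5 * s / gamma2))
    return gamma2, delta2, gamma2 * delta2
```

The hoped-for benefit is that the conditional dependence of the data on either single factor is weaker than its dependence on τ itself, letting the product γδ traverse small values more freely.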
The authors conduct a series of simulation studies focusing primarily on LASSO‑type priors, where the normal‑scale‑mixture representation corresponds to a Laplace marginal. Their results show that PX does improve mixing relative to a naïve Gibbs sampler, but the gains are modest: effective sample sizes increase by roughly 1.5–2×, far below the dramatic speed‑ups reported for PX in other hierarchical models (often an order of magnitude). The paper offers two explanations. First, the Laplace mixture has a relatively light tail, so the global‑local coupling is not as pathological as in heavier‑tailed priors; the expansion therefore cannot completely decorrelate τ from the λi. Second, the introduction of γ and δ adds extra dimensions to the state space, creating new correlations that partially offset the benefits of the expansion.
In parallel, the authors describe a slice‑sampling scheme for τ. Slice sampling requires no proposal distribution tuned to the shape of the posterior; instead it draws an auxiliary variable u uniformly on (0, f(τ)), where f is the conditional posterior density of τ, and then draws the new τ uniformly from the slice {τ : f(τ) > u}. This approach automatically adapts to the steepness of the τ‑posterior, even when τ is near zero, and avoids the “sticky” behaviour that plagues standard Gibbs updates. Empirical comparisons reveal that slice sampling achieves effective sample sizes essentially indistinguishable from those obtained with PX, while being simpler to implement and computationally comparable.
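The stepping-out-and-shrinkage slice sampler of Neal (2003) is one standard way to realise such an update. The sketch below works on θ = log τ (so positivity is automatic) and assumes a half-Cauchy(0, 1) prior on τ; the function names, step width, and prior are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def slice_sample(logf, x0, rng, w=1.0, max_steps=50):
    """One univariate slice-sampling update (stepping out + shrinkage)."""
    logy = logf(x0) - rng.exponential()  # log of the uniform slice height
    left = x0 - w * rng.uniform()
    right = left + w
    for _ in range(max_steps):           # step out until past the slice edge
        if logf(left) <= logy:
            break
        left -= w
    for _ in range(max_steps):
        if logf(right) <= logy:
            break
        right += w
    while True:                          # shrink the bracket until accepted
        x1 = rng.uniform(left, right)
        if logf(x1) > logy:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

def update_tau_slice(tau, beta, lam2, rng):
    """Slice update for theta = log(tau) in beta_i ~ N(0, tau^2 * lam_i^2),
    with a half-Cauchy(0, 1) prior on tau (an illustrative choice)."""
    p = len(beta)
    s = np.sum(beta**2 / lam2)

    def logpost(theta):
        t2 = np.exp(2.0 * theta)
        # Gaussian likelihood in tau, half-Cauchy prior, log-Jacobian (+theta)
        return -0.5 * p * np.log(t2) - 0.5 * s / t2 - np.log1p(t2) + theta

    return float(np.exp(slice_sample(logpost, np.log(tau), rng)))
```

Because the bracket shrinks to match the local width of the conditional, the update stays efficient whether the τ‑posterior is diffuse or sharply concentrated near zero.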
The paper also extends the experiments to the horseshoe prior, which possesses a much heavier tail and a stronger tendency for τ to collapse toward zero. Even in this setting, the slice sampler remains robust, and PX provides only a slight additional improvement. The authors conclude that, for the class of local‑shrinkage models examined, parameter expansion is a useful but limited tool; slice sampling offers a practically equivalent alternative and can be applied to models where PX is difficult to formulate.
Beyond the empirical findings, the authors discuss broader implications. They argue that the modest impact of PX stems from the intrinsic structure of the priors: when the global‑local hierarchy is already relatively weakly coupled, breaking the remaining dependence yields diminishing returns. Consequently, model designers might achieve greater gains by selecting priors that naturally mitigate τ–λi coupling, rather than relying on algorithmic tricks. Finally, the paper positions slice sampling as a flexible “plug‑in” that can broaden the applicability of PX‑style acceleration to more complex hierarchical shrinkage models, suggesting a promising direction for future research in scalable Bayesian variable selection.