The Bayesian Bridge
We propose the Bayesian bridge estimator for regularized regression and classification. Two key mixture representations for the Bayesian bridge model are developed: (1) a scale mixture of normals with respect to an alpha-stable random variable; and (2) a mixture of Bartlett–Fejér kernels (or triangle densities) with respect to a two-component mixture of gamma random variables. Both lead to MCMC methods for posterior simulation, and these methods turn out to have complementary domains of maximum efficiency. The first representation is a well-known result due to West (1987), and is the better choice for collinear design matrices. The second representation is new, and is more efficient for orthogonal problems, largely because it avoids the need to deal with exponentially tilted stable random variables. It also provides insight into the multimodality of the joint posterior distribution, a feature of the bridge model that is notably absent under ridge or lasso-type priors. We prove a theorem that extends this representation to a wider class of densities representable as scale mixtures of betas, and provide an explicit inversion formula for the mixing distribution. The connections with slice sampling and scale mixtures of normals are explored. On the practical side, we find that the Bayesian bridge model outperforms its classical cousin in estimation and prediction across a variety of data sets, both simulated and real. We also show that the MCMC for fitting the bridge model exhibits excellent mixing properties, particularly for the global scale parameter. This makes for a favorable contrast with analogous MCMC algorithms for other sparse Bayesian models. All methods described in this paper are implemented in the R package BayesBridge. An extensive set of simulation results is provided in two supplemental files.
💡 Research Summary
The paper introduces a Bayesian version of the bridge estimator for regularized regression and classification, where the penalty on the regression coefficients takes the form ν∑|β_j|^α with α∈(0,1]. Because minimizing this penalized likelihood is equivalent to finding the posterior mode under a Gaussian likelihood and an independent exponential‑power prior, the problem admits a fully Bayesian treatment, and the authors develop two distinct data‑augmentation schemes that enable efficient Markov chain Monte Carlo (MCMC) sampling.
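To make the objective concrete, here is a minimal sketch of the bridge penalty and the corresponding penalized least-squares criterion. The function names and the unit-variance Gaussian likelihood are illustrative choices, not taken from the paper or the BayesBridge package.

```python
import numpy as np

def bridge_penalty(beta, nu=1.0, alpha=0.5):
    """Bridge penalty nu * sum_j |beta_j|^alpha.

    alpha = 1 recovers the lasso penalty; alpha in (0, 1) gives the
    concave bridge penalty discussed in the paper."""
    return nu * np.sum(np.abs(beta) ** alpha)

def penalized_rss(beta, X, y, nu=1.0, alpha=0.5):
    """Penalized residual sum of squares. Minimizing this objective is
    equivalent to finding the posterior mode under a Gaussian likelihood
    (unit error variance assumed here for simplicity) and an independent
    exponential-power prior proportional to exp(-nu * |beta_j|^alpha)."""
    resid = y - X @ beta
    return 0.5 * resid @ resid + bridge_penalty(beta, nu, alpha)
```

For α strictly below 1 the penalty is non-convex, which is what produces the multimodal posteriors the paper analyzes.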
The first scheme exploits the classic result of West (1987) that the exponential‑power density can be expressed as a scale mixture of normals with respect to an α‑stable mixing distribution. Introducing latent scale variables λ_j yields conditionally Gaussian full‑conditional distributions for β, but the posterior of λ_j becomes an exponentially tilted α‑stable law, which lacks a closed‑form density. To sample from this distribution the authors adopt a modern rejection‑sampling algorithm for stable laws (Devroye 2009). This approach works well when the design matrix X is highly collinear, because the latent λ_j effectively capture the dependence structure.
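The conditionally Gaussian β-update in this scheme can be sketched as follows. This is a hedged illustration: the parameterization β_j | λ_j ~ N(0, τ²/λ_j) is one common convention and may not match the paper's exact scaling, and the companion λ-update (an exponentially tilted stable draw via Devroye's rejection sampler) is deliberately omitted because it has no simple closed form.

```python
import numpy as np

def sample_beta_given_scales(X, y, lam, tau, sigma2=1.0, rng=None):
    """One Gibbs step for beta under the scale-mixture-of-normals
    augmentation, assuming beta_j | lam_j ~ N(0, tau^2 / lam_j) a priori
    and y | beta ~ N(X beta, sigma2 * I).

    Posterior precision: X'X / sigma2 + diag(lam) / tau^2.
    The update for lam itself (an exponentially tilted alpha-stable law)
    is not shown; see Devroye (2009) for a rejection sampler."""
    rng = np.random.default_rng() if rng is None else rng
    lam = np.asarray(lam, dtype=float)
    prec = X.T @ X / sigma2 + np.diag(lam / tau**2)
    cov = np.linalg.inv(prec)
    mean = cov @ (X.T @ y / sigma2)
    return rng.multivariate_normal(mean, cov)
```

Because β is drawn jointly from its full multivariate Gaussian conditional, collinearity in X is handled inside a single update, which is consistent with the summary's point that this scheme suits collinear designs.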
The second, novel scheme avoids α‑stable variables altogether. By invoking the Bernstein‑Schönberg‑Williamson theorem for n‑monotone densities, the authors show that the same exponential‑power prior can be written as a scale mixture of Bartlett‑Fejer (triangular) kernels. The mixing distribution for the local scale ω_j is a two‑component mixture of Gamma distributions. Conditional on ω_j, β_j follows a simple triangular density, and the ω_j updates are straightforward because their full‑conditionals are standard Gamma mixtures. This representation is especially advantageous for orthogonal or near‑orthogonal designs, where it leads to rapid mixing of the global scale parameter τ and enables the sampler to move easily between multiple posterior modes—a feature that is difficult to achieve with the stable‑mixture formulation.
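The β-step of the triangular-kernel scheme is simple enough to sketch directly. Assumptions to flag: the symmetric triangular density on [-ω_j, ω_j] stands in for the Bartlett–Fejér kernel, and the ω_j update (the two-component gamma mixture from the paper) is not shown here.

```python
import numpy as np

def sample_beta_triangular(omega, rng=None):
    """Draw each beta_j from a symmetric triangular (Bartlett-Fejer)
    density on [-omega_j, omega_j], i.e. the conditional of beta_j given
    its local scale omega_j in the second augmentation scheme.

    This is only the beta-step of the sampler: the full conditional for
    omega_j, a two-component gamma mixture in the paper, is omitted."""
    rng = np.random.default_rng() if rng is None else rng
    omega = np.asarray(omega, dtype=float)
    # Generator.triangular(left, mode, right) broadcasts over arrays.
    return rng.triangular(-omega, 0.0, omega)
```

Since each β_j is confined to a bounded interval that is resampled every sweep, the chain can jump cleanly between separated posterior modes, which is the intuition behind the fast mixing reported for orthogonal designs.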
The authors prove a general theorem extending this representation to any density that can be expressed as a scale mixture of betas, providing an explicit inversion formula for the mixing distribution. They also discuss connections to slice sampling and other latent‑variable constructions.
Extensive simulation studies and real‑data applications (including a classic diabetes data set) demonstrate that the Bayesian bridge outperforms its classical counterpart in both estimation error and predictive performance. Moreover, the MCMC algorithms exhibit excellent mixing for τ, a notorious bottleneck in many sparse Bayesian models such as the horseshoe or double‑Pareto priors. Figure 1 in the paper illustrates that, on a 1000‑dimensional orthogonal regression problem with 900 true zeros, the bridge sampler reaches stationarity for τ orders of magnitude faster than the best existing horseshoe Gibbs sampler.
Practically, the paper provides the R package BayesBridge, which implements both the stable‑mixture and the Bartlett‑Fejer mixture samplers, along with diagnostic tools. The package makes the methods readily accessible to applied researchers interested in sparse Bayesian modeling.
In summary, the contribution lies in (1) revealing two complementary data‑augmentation representations for the exponential‑power prior, (2) delivering robust and fast MCMC algorithms that adapt to the structure of the design matrix, (3) offering theoretical insight into the multimodal nature of the bridge posterior, and (4) delivering software that brings these advances to a broad audience.