Heavy-tailed and Horseshoe priors for regression and sparse Besov rates


The large variety of functions encountered in nonparametric statistics calls for methods flexible enough to achieve optimal or near-optimal performance over a wide variety of functional classes, such as Besov balls, as well as over a large array of loss functions. In this work, we show that a class of heavy-tailed prior distributions on basis function coefficients introduced in [AC] and called Oversmoothed heavy-Tailed (OT) priors leads to Bayesian posterior distributions that satisfy these requirements; the case of horseshoe distributions is also investigated, for the first time in the context of nonparametrics, and we show that they fit into this framework. Posterior contraction rates are derived in two settings. The case of Sobolev-smooth signals and $L_2$-risk is considered first, along with a lower bound result showing that the form of the scalings imposed on prior coefficients by the OT prior is necessary to get full adaptation to smoothness. Second, the broader case of Besov-smooth signals with $L_{p'}$-risks, $p' \geq 1$, is considered, and minimax posterior contraction rates, adaptive to the underlying smoothness and including rates in the so-called sparse zone, are derived. We provide an implementation of the proposed method and illustrate our results through a simulation study.


💡 Research Summary

The paper addresses the problem of adaptive Bayesian non‑parametric regression in the Gaussian white‑noise model, focusing on achieving minimax posterior contraction rates over a broad spectrum of function classes (Sobolev and Besov spaces) and loss functions (L₂, L_{p′} with p′≥1). Classical Bayesian priors such as Gaussian processes (GPs) are known to be optimal for homogeneous smoothness but can be sub‑optimal when the true function exhibits spatially heterogeneous regularity, especially in the so‑called “sparse” zone of Besov spaces where the loss does not match the norm defining the function class.

To overcome these limitations, the authors introduce two families of heavy‑tailed priors on the coefficients of an orthonormal basis expansion: (i) Oversmoothed Heavy‑Tailed (OT) priors and (ii) Horseshoe priors. An OT prior is defined by f_k = σ_k ζ_k, where the deterministic scaling σ_k decays slightly faster than any polynomial, namely σ_k = exp{−(log k)^{1+ν}} with ν>0, while ζ_k are i.i.d. draws from a symmetric heavy‑tailed density h satisfying mild tail conditions (H1‑H3). This construction shrinks small noisy coefficients aggressively through σ_k, yet the heavy tails of ζ_k allow large true coefficients to be captured with high probability.
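
As a concrete illustration, here is a minimal sketch of drawing the first K coefficients from an OT prior. It uses a Student-t density for ζ_k as one heavy-tailed choice with polynomial tails; the function name and parameter values are illustrative, not taken from the paper, which only imposes the tail conditions H1–H3 on h.

```python
import numpy as np

def ot_prior_sample(K, nu=0.5, df=3.0, rng=None):
    """Draw (f_1, ..., f_K) from an OT prior: f_k = sigma_k * zeta_k."""
    rng = np.random.default_rng(rng)
    k = np.arange(1, K + 1)
    # Deterministic scalings decaying faster than any polynomial.
    sigma = np.exp(-np.log(k) ** (1.0 + nu))
    # i.i.d. symmetric heavy-tailed draws; a Student-t is one density
    # of the kind the tail conditions allow (illustrative choice).
    zeta = rng.standard_t(df, size=K)
    return sigma * zeta

# One prior draw of the first 1000 basis coefficients
coeffs = ot_prior_sample(1000, nu=0.5)
```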

The paper first establishes L₂-contraction rates for Sobolev-smooth truths f₀ ∈ H^β. It shows that OT priors achieve the near-optimal rate n^{-β/(2β+1)}, up to logarithmic factors that depend on ν and the tail parameter κ of h. A complementary lower-bound analysis demonstrates that the more common polynomial scaling σ_k = k^{-α-1/2} yields minimax rates only when α ≥ β (the oversmoothing regime); in the undersmoothing regime α < β the rate deteriorates to the polynomially slower n^{-α/(2α+1)}. Hence the specific "oversmoothing" choice of σ_k is essential for full adaptation.
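
In display form, the dichotomy established by this lower-bound analysis reads (logarithmic factors omitted):

```latex
% Contraction rate under polynomial scalings sigma_k = k^{-alpha - 1/2},
% for a Sobolev truth f_0 in H^beta:
\varepsilon_n \asymp
\begin{cases}
  n^{-\beta/(2\beta+1)}, & \alpha \ge \beta \ \text{(oversmoothing: minimax)},\\
  n^{-\alpha/(2\alpha+1)}, & \alpha < \beta \ \text{(undersmoothing: sub-optimal)}.
\end{cases}
```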

The second major contribution concerns Besov spaces B_{p,q}^s and L_{p′} losses. Besov spaces are characterized by smoothness s, integrability p, and summability q, and the interaction between the class and the loss leads to three distinct zones: regular, intermediate, and sparse. In the regular zone linear estimators are minimax; in the intermediate zone they are sub-optimal; and in the sparse zone the minimax rate depends on both s and the loss exponent p′. The authors prove that OT priors attain the minimax contraction rate (again up to log-factors) uniformly across all three zones. In the sparse zone the rate takes the form (log n/n)^{(s−1/p+1/p′)/(2(s−1/p)+1)}, matching known information-theoretic lower bounds.
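
The zones and rate exponents can be made explicit. The sketch below encodes the classical dense/sparse dichotomy from the wavelet-shrinkage literature, with the sparse zone delimited by p′ > (2s+1)p; this is the standard delimitation, assumed here rather than quoted from the paper.

```python
def besov_minimax_exponent(s, p, pprime):
    """Rate exponent r, with minimax rate n^{-r} up to log factors,
    for a Besov ball B^s_{p,q} under L_{p'} loss (requires s > 1/p)."""
    assert s > 1.0 / p, "sparse-zone exponent needs s > 1/p"
    dense = s / (2 * s + 1)
    sparse = (s - 1 / p + 1 / pprime) / (2 * (s - 1 / p) + 1)
    if pprime <= p:
        return dense, "regular"          # linear estimators minimax
    if pprime < (2 * s + 1) * p:
        return dense, "intermediate"     # dense rate; linear estimators sub-optimal
    return sparse, "sparse"              # extra log factors appear here

# Example: s = 1, p = 2 under L_10 loss falls in the sparse zone,
# since 10 > (2*1+1)*2 = 6; the exponent drops from 1/3 to 0.3.
print(besov_minimax_exponent(1.0, 2.0, 10.0))   # (0.3, 'sparse')
```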

The Horseshoe prior is treated as a special case of the heavy-tailed framework. Each coefficient follows f_k ∼ HS(σ_k), i.e., a scale-mixture of normals with a half-Cauchy local scale. The marginal density h_τ satisfies the same tail conditions (H1–H3) with κ = 0, and the prior can be written as f_k = σ_k ζ_k with ζ_k ∼ HS(1). Consequently, the same contraction results hold for Horseshoe priors, even though they lack finite moments. The paper also discusses a truncated Horseshoe prior, with a common scaling τ for the first n coefficients (n the sample size), showing that appropriate choices of τ, typically a negative power of n, preserve the optimal rates.
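
A matching sketch of the coefficient-wise horseshoe draw, written in the same f_k = σ_k ζ_k form as above (illustrative only):

```python
import numpy as np

def horseshoe_prior_sample(K, nu=0.5, rng=None):
    """Draw f_k = sigma_k * zeta_k with zeta_k ~ HS(1):
    lambda_k ~ half-Cauchy(0, 1), zeta_k | lambda_k ~ N(0, lambda_k^2)."""
    rng = np.random.default_rng(rng)
    k = np.arange(1, K + 1)
    sigma = np.exp(-np.log(k) ** (1.0 + nu))   # same OT-type scalings
    lam = np.abs(rng.standard_cauchy(size=K))  # half-Cauchy local scales
    return sigma * lam * rng.standard_normal(K)  # scale mixture of normals
```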

Implementation details are provided: the hyper‑parameter ν is set to ½ in all experiments, and the posterior is sampled via standard MCMC (Gibbs or Metropolis‑within‑Gibbs) because the likelihood remains Gaussian. A simulation study compares the proposed OT/Horseshoe methods against wavelet thresholding, GP priors, and state‑of‑the‑art sparse‑regression software. Results indicate that the heavy‑tailed priors achieve equal or lower empirical risk across L₂, L_∞, and L_{p′} losses, and they remain computationally competitive.
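
For intuition on how such a sampler can exploit the Gaussian likelihood, here is a minimal Metropolis-within-Gibbs sketch for the equivalent sequence model y_k = f_k + ε_k/√n under a horseshoe prior. It is a generic sampler under these assumptions (function name, step size, and iteration count are illustrative), not the authors' released implementation.

```python
import numpy as np

def horseshoe_mwg(y, n, sigma, n_iter=2000, step=0.5, rng=None):
    """Sequence model y_k = f_k + eps_k / sqrt(n), eps_k ~ N(0, 1),
    prior f_k | lambda_k ~ N(0, sigma_k^2 lambda_k^2), lambda_k ~ C+(0, 1)."""
    rng = np.random.default_rng(rng)
    K = len(y)
    lam = np.ones(K)
    draws = np.empty((n_iter, K))
    for t in range(n_iter):
        # Conjugate Gibbs step: f_k | lambda_k, y_k is exactly Gaussian
        # because the likelihood is Gaussian.
        prior_var = sigma**2 * lam**2
        post_var = 1.0 / (n + 1.0 / prior_var)
        f = post_var * n * y + np.sqrt(post_var) * rng.standard_normal(K)

        # Random-walk Metropolis on log(lambda_k), coefficient-wise.
        def log_target(l):
            # Gaussian density of f given l, half-Cauchy prior on l; the
            # -log(l) likelihood term cancels the log-transform Jacobian.
            return -f**2 / (2.0 * sigma**2 * l**2) - np.log1p(l**2)

        lam_prop = lam * np.exp(step * rng.standard_normal(K))
        accept = np.log(rng.random(K)) < log_target(lam_prop) - log_target(lam)
        lam = np.where(accept, lam_prop, lam)
        draws[t] = f
    return draws
```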

In summary, the paper makes three original contributions: (1) it shows that OT priors, with their specific oversmoothing scaling, automatically adapt to unknown smoothness and achieve near‑minimax rates over Sobolev and Besov classes for a wide range of losses, including the challenging sparse zone; (2) it provides the first rigorous posterior contraction analysis for Horseshoe priors in non‑parametric regression, demonstrating that they belong to the same heavy‑tailed family and inherit the same optimality properties; (3) it validates the theoretical findings with a practical algorithm and empirical evaluation, establishing that heavy‑tailed Bayesian priors can be both theoretically optimal and practically effective for adaptive non‑parametric estimation.


Comments & Academic Discussion

Loading comments...

Leave a Comment