Model-Consistent Sparse Estimation through the Bootstrap
We consider the least-squares linear regression problem with regularization by the $\ell^1$-norm, a problem usually referred to as the Lasso. In this paper, we first present a detailed asymptotic analysis of model consistency of the Lasso in low-dimensional settings. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection. For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection procedure, referred to as the Bolasso, is extended to high-dimensional settings by a provably consistent two-step procedure.
💡 Research Summary
The paper investigates the model‑selection consistency of the Lasso (ℓ1‑regularized least‑squares) in both low‑dimensional and high‑dimensional regimes and introduces a bootstrap‑based procedure, called the Bolasso, that achieves consistent variable selection.
Low‑dimensional analysis.
Assume a linear model y = Xβ + ε with Gaussian noise and a fixed number of covariates p (the low-dimensional regime, p ≤ n, with n → ∞). The regularization parameter λn is taken to decay as λn = c·n^(−α) for some α ∈ (0,1]. By examining the Karush‑Kuhn‑Tucker (KKT) conditions and employing Gaussian tail bounds, the authors derive asymptotic expressions for the probability that the Lasso correctly recovers the true support S* = {j : βj ≠ 0}. Three regimes emerge:
- α < ½ – λn decays too slowly, so the regularization stays comparatively strong; correct model selection then requires a stringent (irrepresentable‑type) condition on the design, and without it the Lasso selects a wrong model with non‑vanishing probability.
- α > ½ – λn shrinks too fast, so the estimator approaches ordinary least squares and many zero‑coefficient variables are selected (high false‑positive rate).
- α = ½ – the “critical” rate. In this case the Lasso selects every true variable with probability 1 − O(e^(−cn)) (exponential convergence to one), while each irrelevant variable is selected with a strictly positive constant probability p0 ∈ (0,1).
Thus, at the critical rate the Lasso is “almost perfect”: it never forgets a signal variable but retains a small, non‑vanishing chance of including noise variables.
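The constant selection probability p0 for irrelevant variables can be made concrete in the idealized orthonormal-design case (an illustrative assumption, not the paper's general setting): there the Lasso reduces to soft-thresholding the marginal statistics Xj'y/n, a noise coordinate behaves like N(0, σ²/n), and it survives the threshold λn = c·n^(−1/2) with probability 2(1 − Φ(c/σ)), a constant that does not depend on n:

```python
import math

def noise_selection_prob(c, sigma, n):
    """P(|N(0, sigma^2 / n)| > c / sqrt(n)): chance that soft-thresholding at
    lambda_n = c * n^(-1/2) keeps an irrelevant (zero-coefficient) variable."""
    z = (c / math.sqrt(n)) / (sigma / math.sqrt(n))  # n cancels: z = c / sigma
    return math.erfc(z / math.sqrt(2))               # = 2 * (1 - Phi(z))

# The probability is identical for every n: the constant p0 = 2(1 - Phi(c/sigma)).
for n in (100, 10_000, 1_000_000):
    print(n, round(noise_selection_prob(1.0, 1.0, n), 4))   # 0.3173 each time
```

This is exactly the "almost perfect" behavior: the miss probability for a true variable vanishes with n, while the false-positive probability per noise variable stabilizes at a constant.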
Bootstrap‑based variable selection (Bolasso).
Given a data set, generate B independent bootstrap samples (with replacement). Run the Lasso with the same λn on each bootstrap sample, obtaining supports Sb (b = 1,…,B). Define the final selected set as the intersection Ŝ = ⋂b Sb. Because each irrelevant variable is selected roughly independently across replications with probability p0, the probability that it survives all B intersections is about p0^B, which decays exponentially in B. Meanwhile, true variables survive with probability 1 − O(e^(−cn)) in each run, so the intersection retains them with probability tending to one. The authors prove that for B growing logarithmically with n (B = O(log n)) the Bolasso recovers S* with probability → 1, i.e., it is model‑consistent. The proof combines the low‑dimensional selection probabilities with a Chernoff‑type bound for the intersection event.
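The bootstrap-and-intersect scheme described above fits in a few lines of NumPy. This is a minimal sketch, not the paper's implementation: the small coordinate-descent Lasso solver, the regularization level, and all constants below are illustrative choices.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # per-column curvature
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            rho = X[:, j] @ r_j / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def bolasso(X, y, lam, B=32, seed=None):
    """Intersect the Lasso supports obtained on B bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    support = set(range(p))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        b = lasso_cd(X[idx], y[idx], lam)
        support &= {j for j in range(p) if abs(b[j]) > 1e-8}
    return support

# Illustrative use: two strong signal variables among eight covariates.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.standard_normal((n, p))
beta = np.array([3.0, 2.0] + [0.0] * (p - 2))
y = X @ beta + 0.5 * rng.standard_normal(n)
print(sorted(bolasso(X, y, lam=1.0 / np.sqrt(n), B=32, seed=1)))
```

Note the critical-rate choice λn = c·n^(−1/2) in the usage lines: each bootstrap run keeps the true variables and occasionally a noise variable, and the intersection suppresses the latter.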
Extension to high dimensions (p ≫ n).
In the high‑dimensional setting the single‑run Lasso is not generally consistent because the irrepresentable condition rarely holds. The authors propose a two‑step procedure:
- Screening step: Run the Lasso with a moderate λ, small enough to avoid false negatives, and keep the selected variables as a candidate set C. Under a restricted eigenvalue condition, |C| = O(s·polylog p), where s = |S*|.
- Refinement step: Restrict the design matrix to columns in C, fit an ordinary least‑squares model, and apply the Bolasso (bootstrap + intersection) on this reduced problem.
Assuming a “beta‑min” condition (the smallest non‑zero coefficient exceeds a multiple of √(log p/n)) and sparsity s = o(n/ log p), the authors show that the two‑step Bolasso recovers S* with probability →1. The analysis mirrors the low‑dimensional case but uses concentration inequalities for sub‑Gaussian designs and bounds on the eigenvalues of sub‑matrices of X.
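A minimal sketch of the two-step pipeline under these assumptions follows, with its own small coordinate-descent solver so the snippet is self-contained. The screening level (of order √(log p/n)), the refinement level, and all thresholds are illustrative assumptions, not the paper's exact tuning:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (y - X @ b + X[:, j] * b[j]) / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

def two_step(X, y, lam_screen, lam, B=16, seed=None):
    """Screening (one Lasso run) followed by Bolasso on the kept columns."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Step 1: screening -- keep a candidate set C, erring on the inclusive side.
    b0 = lasso_cd(X, y, lam_screen)
    C = [j for j in range(p) if abs(b0[j]) > 1e-8]
    # Step 2: Bolasso (bootstrap + intersection) restricted to the columns in C.
    support = set(C)
    Xc = X[:, C]
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        b = lasso_cd(Xc[idx], y[idx], lam)
        support &= {C[k] for k in range(len(C)) if abs(b[k]) > 1e-8}
    return support

# Illustrative high-dimensional use: p = 300 covariates, n = 100 observations.
rng = np.random.default_rng(0)
n, p = 100, 300
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [4.0, 3.0, 2.0]                     # s = 3 strong coefficients
y = X @ beta + 0.5 * rng.standard_normal(n)
print(sorted(two_step(X, y, lam_screen=0.25, lam=0.1, B=16, seed=1)))
```

The screening λ is deliberately not aggressive: the candidate set C may contain a few noise variables, and the bootstrap intersection in the second step is what removes them.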
Empirical validation.
Synthetic experiments vary signal‑to‑noise ratio, correlation structure, and dimensionality. Compared against Stability Selection, Adaptive Lasso, SCAD, and plain Lasso, Bolasso consistently achieves a lower false‑positive rate while maintaining a comparable true‑positive rate, especially when the signal is weak or the design is highly correlated. Real‑world tests on a gene‑expression microarray data set and an image de‑blurring task confirm that Bolasso selects a parsimonious, interpretable set of variables that aligns with domain knowledge.
Contributions and impact.
- Precise asymptotic characterization of Lasso’s support recovery probability as a function of λn’s decay rate.
- Introduction of the Bolasso, a simple yet theoretically sound bootstrap‑intersection scheme that converts the Lasso’s “almost perfect” behavior at the critical rate into exact model consistency.
- A provably consistent two‑step high‑dimensional extension that relaxes the stringent irrepresentable condition required by standard Lasso theory.
- Comprehensive simulations and real‑data experiments demonstrating practical advantages.
The work bridges a gap between asymptotic Lasso theory and robust, data‑driven variable selection. By exploiting the stochastic variability of bootstrap samples, the Bolasso provides a principled way to control false discoveries without sacrificing power, offering a valuable tool for statisticians and machine‑learning practitioners dealing with sparse high‑dimensional models.