Separating Oblivious and Adaptive Models of Variable Selection
Sparse recovery is among the most well-studied problems in learning theory and high-dimensional statistics. In this work, we investigate the statistical and computational landscapes of sparse recovery with $\ell_\infty$ error guarantees. This variant of the problem is motivated by \emph{variable selection} tasks, where the goal is to estimate the support of a $k$-sparse signal in $\mathbb{R}^d$. Our main contribution is a provable separation between the \emph{oblivious} (``for each'') and \emph{adaptive} (``for all'') models of $\ell_\infty$ sparse recovery. We show that under an oblivious model, the optimal $\ell_\infty$ error is attainable in near-linear time with $\approx k\log d$ samples, whereas in an adaptive model, $\gtrsim k^2$ samples are necessary for any algorithm to achieve this bound. This establishes a surprising contrast with the standard $\ell_2$ setting, where $\approx k \log d$ samples suffice even for adaptive sparse recovery. We conclude with a preliminary examination of a \emph{partially-adaptive} model, where we show nontrivial variable selection guarantees are possible with $\approx k\log d$ measurements.
💡 Research Summary
This paper investigates sparse recovery with ℓ∞ error guarantees, a setting directly relevant to variable‑selection tasks where one wishes to recover the support of a k‑sparse vector in ℝ^d. The authors distinguish two guarantee models: an oblivious (or “for‑each”) model, in which a randomly drawn sensing matrix need only succeed with high probability for each fixed signal, and an adaptive (or “for‑all”) model, in which the scheme must succeed uniformly over all k‑sparse signals (equivalently, the signal may be chosen adversarially after the sensing matrix is fixed). While in the classical ℓ₂‑norm setting both models achieve the optimal sample complexity of Θ(k log d), the paper shows that this equivalence breaks dramatically for ℓ∞‑norm recovery.
In the oblivious model the authors construct a near‑linear‑time algorithm that attains the optimal ℓ∞ error using only ≈k log d measurements. The method combines a random Gaussian sensing matrix with a coordinate‑wise median‑of‑means estimator and a simple hard‑thresholding step. Every coordinate is estimated from the same shared pool of measurements, with the median taken over O(log d) batches so that each coordinate's failure probability is driven below an inverse polynomial in d; in total O(k log d) samples suffice, and the overall runtime is essentially linear in the ambient dimension d. The resulting ℓ∞ error scales as σ √(log d)/√m, matching the information‑theoretic lower bound for oblivious designs.
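The estimator described above is easy to sketch. The snippet below is an illustration of the general recipe (Gaussian correlations, median over batches, hard thresholding to the k largest magnitudes), not the paper's exact procedure; the function name `oblivious_linf_recover` and the batch count are hypothetical choices.

```python
import numpy as np

def oblivious_linf_recover(y, A, k, batches=5):
    """Sketch of an oblivious for-each scheme: coordinate-wise
    median-of-means correlation estimates from a Gaussian sensing
    matrix, followed by hard thresholding to the top-k coordinates."""
    m, d = A.shape
    # For iid N(0,1) entries of A and y = A x + noise, the per-batch
    # correlation (1/b) * A[:, j]^T y is an unbiased estimate of x_j.
    batch_idx = np.array_split(np.arange(m), batches)
    means = np.stack([A[I].T @ y[I] / len(I) for I in batch_idx])
    est = np.median(means, axis=0)          # median of means, per coordinate
    # Hard-threshold: keep only the k largest-magnitude coordinates.
    xhat = np.zeros(d)
    top = np.argsort(-np.abs(est))[:k]
    xhat[top] = est[top]
    return xhat
```

With enough measurements relative to k and the signal magnitude, the median step suppresses the heavy batches and the thresholding recovers the support exactly.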
Conversely, for the adaptive (for‑all) model the authors prove a strong lower bound: any scheme that guarantees the same ℓ∞ error uniformly over all k‑sparse signals must use at least Ω(k²) measurements, regardless of computational power. The proof blends Yao’s minimax principle with Fano’s inequality, constructing a hard distribution over k‑sparse signals under which each measurement can convey only limited information about the unknown support. To keep the worst‑case ℓ∞ error small, any scheme must essentially pin down each non‑zero coordinate separately, which forces a quadratic dependence on k. This result highlights a stark contrast with the ℓ₂ setting, where the uniform guarantee incurs no extra sample cost.
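The packing step of such an argument is typically an instance of Fano's inequality; a generic statement of that step (not the paper's full k² argument, which additionally controls the information each measurement can reveal) reads:

```latex
% Fano's inequality over M well-separated k-sparse candidate signals:
% S is the (uniformly random) true support, Y the measurements, and
% \hat{S} any estimator of S computed from Y:
\Pr[\hat{S} \neq S] \;\ge\; 1 - \frac{I(S;Y) + \log 2}{\log M}.
% With M = \binom{d}{k} candidate supports, driving the error probability
% below a constant requires I(S;Y) = \Omega(k \log(d/k)) bits; a lower
% bound on measurements follows by capping the information per measurement.
```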
The paper also explores a partially‑adaptive regime in which the measurement process is divided into a small number of stages (e.g., O(log k)). The design within each stage may depend on the observations from all earlier stages, but the total number of stages is bounded. The authors show that this limited adaptivity still admits the O(k log d) sample bound together with non‑trivial support‑recovery guarantees (e.g., recovering a constant fraction of the true support). The algorithmic framework combines iterative hard thresholding with adaptive threshold selection, and empirical simulations support its practical effectiveness.
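As a point of reference for the framework above, generic iterative hard thresholding can be sketched as follows; the step size, iteration count, and function name `iht` are illustrative assumptions, and this plain variant omits the paper's adaptive threshold selection.

```python
import numpy as np

def iht(y, A, k, iters=100, eta=None):
    """Generic iterative hard thresholding: a gradient step on the
    least-squares objective ||y - A x||^2 / 2, followed by projection
    onto k-sparse vectors (keep the k largest magnitudes)."""
    m, d = A.shape
    if eta is None:
        eta = 1.0 / m  # for iid N(0,1) entries, A^T A / m is close to I
    x = np.zeros(d)
    for _ in range(iters):
        x = x + eta * (A.T @ (y - A @ x))   # gradient step
        keep = np.argsort(-np.abs(x))[:k]   # indices of k largest magnitudes
        pruned = np.zeros(d)
        pruned[keep] = x[keep]
        x = pruned                          # hard-thresholding projection
    return x
```

In the noiseless Gaussian-design regime this iteration contracts geometrically once the restricted isometry condition holds, so the support is recovered exactly.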
Overall, the work provides a clean separation between the oblivious and adaptive models of ℓ∞‑norm sparse recovery, establishing that the stronger uniform (for‑all) guarantee carries a steep sample‑complexity cost when the goal is precise coordinate‑wise accuracy. The findings have immediate implications for high‑dimensional experimental design, genetics, compressed sensing, and any domain where variable selection under stringent per‑coordinate error constraints is required. The authors conclude by suggesting future directions, including tighter upper bounds for partially‑adaptive schemes, extensions to structured sparsity, and investigations of robustness under different noise models.