Separating Oblivious and Adaptive Models of Variable Selection

Reading time: 6 minutes

📝 Original Info

  • Title: Separating Oblivious and Adaptive Models of Variable Selection
  • ArXiv ID: 2602.16568
  • Date: 2026-02-18
  • Authors: Author information was not provided in the source. (See the original paper for author names and affiliations.)

📝 Abstract

Sparse recovery is among the most well-studied problems in learning theory and high-dimensional statistics. In this work, we investigate the statistical and computational landscapes of sparse recovery with $\ell_\infty$ error guarantees. This variant of the problem is motivated by \emph{variable selection} tasks, where the goal is to estimate the support of a $k$-sparse signal in $\mathbb{R}^d$. Our main contribution is a provable separation between the \emph{oblivious} (``for each'') and \emph{adaptive} (``for all'') models of $\ell_\infty$ sparse recovery. We show that under an oblivious model, the optimal $\ell_\infty$ error is attainable in near-linear time with $\approx k\log d$ samples, whereas in an adaptive model, $\gtrsim k^2$ samples are necessary for any algorithm to achieve this bound. This establishes a surprising contrast with the standard $\ell_2$ setting, where $\approx k \log d$ samples suffice even for adaptive sparse recovery. We conclude with a preliminary examination of a \emph{partially-adaptive} model, where we show nontrivial variable selection guarantees are possible with $\approx k\log d$ measurements.

💡 Deep Analysis

📄 Full Content

We consider the problem of sparse recovery, a cornerstone problem in learning theory and high-dimensional statistics, with applications to many diverse fields, including medical imaging [LDP07, GS15], computational photography [DDT+08, GJP20] and wireless communication [DE11, HLY13]. In this problem, we assume there is some underlying ground truth $k$-sparse signal $\theta^\star \in \mathbb{R}^d$, and our goal is to recover it given $n$ (potentially noisy) linear measurements, i.e., from $y := X\theta^\star + \xi$ for some measurement matrix $X \in \mathbb{R}^{n \times d}$ and some noise vector $\xi \in \mathbb{R}^n$. Typically, we are interested in the case where the number of measurements $n$ is much smaller than $d$, and the main statistical measure of merit is how large $n$ has to be to achieve good estimation error for $\theta^\star$.
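To make the measurement model concrete, here is a minimal NumPy sketch that generates an instance of this setup; the dimensions $d$, $k$, $n$ and the noise level are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, sigma = 1000, 10, 200, 0.1  # illustrative sizes and noise level, not from the paper

# Ground truth: a k-sparse signal theta_star in R^d with random support and signs.
theta_star = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
theta_star[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# Entrywise Gaussian measurement matrix and noisy linear measurements y = X theta_star + xi.
X = rng.normal(size=(n, d))
xi = sigma * rng.normal(size=n)
y = X @ theta_star + xi
```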

In this paper, we investigate the question of learning $\theta^\star$ to $\ell_\infty$ error, a task which is closely related to the well-studied question of variable selection for sparse linear models [Tib96, FL01, CT07, MB10, BC15]. In many real-world applications of sparse recovery, a primary goal is to select which features of the regression model have significant explanatory power [YSY+08, BCH14, CT20, Aky23]. In other words, the task is to find the support of the large elements of the unknown $\theta^\star$. This problem is of particular import in overparameterized, high-dimensional settings where $d \gg |\mathrm{supp}(\theta^\star)|$. By a thresholding argument, observe that this task is more or less equivalent to learning $\theta^\star$ to good $\ell_\infty$ error. Indeed, recovery in $\ell_\infty$ immediately implies that we can also learn the support of the heavy elements of $\theta^\star$, and conversely, if we can identify this support efficiently, it is (in many natural settings) straightforward to recover $\theta^\star$, since by focusing on those coordinates, we can reduce the problem to standard (i.e., dense) linear regression as long as $n \gtrsim |\mathrm{supp}(\theta^\star)|$.
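The two directions of this thresholding reduction can be sketched in a few lines; the threshold `tau` and the helper names below are hypothetical and only illustrate the argument, assuming an estimate `theta_hat` whose $\ell_\infty$ error is below `tau / 2`.

```python
import numpy as np

def support_from_linf_estimate(theta_hat, tau):
    # If ||theta_hat - theta_star||_inf < tau/2, then every coordinate of theta_star
    # with magnitude at least tau is selected, and no zero coordinate is.
    return np.flatnonzero(np.abs(theta_hat) > tau / 2)

def refit_on_support(X, y, selected):
    # Conversely, given the (small) support, reduce to dense least squares on those
    # coordinates; this is well-posed once n is at least the support size.
    theta = np.zeros(X.shape[1])
    coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    theta[selected] = coef
    return theta
```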

In the most commonly-studied setting where $X$ is an entrywise Gaussian measurement matrix, and the goal is to learn $\theta^\star$ to good $\ell_2$-norm error, the statistical complexity of sparse recovery is by now fairly well-understood. The seminal work of [CT05, CT06] demonstrated that in the noiseless setting, i.e., $\xi = 0_d$, exact recovery is possible when $n \approx k \log\frac{d}{k}$, and moreover, this is achievable with an efficient algorithm ($\ell_1$ minimization). This sample complexity is tight up to a logarithmic factor, simply by a rank argument. Follow-up work of [CRT06] demonstrated that for general noise vectors $\xi \in \mathbb{R}^n$, there is an efficiently-computable estimator $\widehat{\theta}$ which achieves $\ell_2$-norm error $\|\theta^\star - \widehat{\theta}\|_2 = O(\|\xi\|_2)$, with the same asymptotic number of measurements, and this recovery rate is optimal. However, despite the large literature on sparse recovery, the sample complexity landscape is significantly less well-understood for recovery in the $\ell_\infty$ norm, and for variable selection in general. While a number of papers [Lou08, YZ10, CW11, HJLL17, LYP+19, Wai19] demonstrate upper bounds for this problem, including several that prove error rates for popular algorithms such as LASSO [Lou08, YZ10, Wai19], very few lower bounds are known (see Section 1.3 for a more detailed discussion), and moreover, several of these results require additional assumptions on $\theta^\star$ and/or the noise. For instance, while [Wai19] proves that one can achieve good $\ell_\infty$ error with LASSO with $n = O(k \log\frac{d}{k})$ measurements, their results require (among other things) that the support of $\theta^\star$ is random and independent of $X$. This is in stark contrast to the landscape for learning in $\ell_2$, where one can obtain a “for all” guarantee for learning any $k$-sparse vector $\theta^\star$ with the same $X$. Additionally, there are very limited lower bounds for learning in $\ell_\infty$ error, and they do not typically match the existing upper bounds. This state of affairs begs the natural question:
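For concreteness, the kind of efficient estimator referenced here ($\ell_1$-regularized least squares, i.e., LASSO) can be run with standard library routines. The sketch below uses scikit-learn with a regularization level on the order of $\sigma\sqrt{\log(d)/n}$; all numerical values and the selection threshold are assumptions for illustration, not parameters from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
d, k, n, sigma = 1000, 10, 200, 0.1  # illustrative sizes, not values from the paper

theta_star = np.zeros(d)
theta_star[rng.choice(d, size=k, replace=False)] = 1.0
X = rng.normal(size=(n, d))
y = X @ theta_star + sigma * rng.normal(size=n)

# L1-regularized least squares with a noise-adaptive regularization scale.
alpha = 2 * sigma * np.sqrt(np.log(d) / n)
theta_hat = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000).fit(X, y).coef_

print("ell_inf error:", np.max(np.abs(theta_hat - theta_star)))
print("selected support:", np.flatnonzero(np.abs(theta_hat) > 0.5))  # illustrative threshold
```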

Can we characterize the statistical landscape of learning sparse linear models in ℓ ∞ error?

Relatedly, can we understand the sample complexity of variable selection for sparse linear regression?

In this work, we make significant progress on understanding these fundamental questions. Our main contributions are new sample complexity upper and lower bounds for variable selection and ℓ ∞ sparse recovery, under various natural generative models. Before we go into detail about our results, we wish to emphasize two main conceptual contributions of our investigation.

Adaptivity matters for $\ell_\infty$ sparse recovery. As mentioned, prior works for variable selection and $\ell_\infty$ sparse recovery often required additional assumptions on how the support of the unknown $k$-sparse vector $\theta^\star$ is chosen. We show that this is inherent: if $\theta^\star$ and $\xi$ are chosen independently of the measurement matrix $X$ (the “oblivious” or “for each” model), then recovery is possible with $n = O(k \log\frac{d}{k})$ measurements in nearly-linear time, but if they can be chosen with knowledge of $X$ (the “adaptive” or “for all” model), then $n = \Omega(k^2)$ measurements are both necessary and (up to a $\log d$ factor) sufficient. In other words, unlike in the $\ell_2$ setting, adaptivity provably changes the sample complexity of $\ell_\infty$ sparse recovery.
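Restated compactly, the separation described in this paragraph (informally, with constants suppressed) is:

```latex
% Informal summary of the separation; both bounds restate the guarantees above.
\[
  \underbrace{n \;=\; O\!\Big(k \log \tfrac{d}{k}\Big)}_{\text{oblivious (``for each'')}}
  \qquad \text{vs.} \qquad
  \underbrace{n \;=\; \Theta\!\big(k^2\big) \ \text{up to a } \log d \ \text{factor}}_{\text{adaptive (``for all'')}}
\]
```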

Reference

This content was AI-processed from open access ArXiv data.
