Sparse Solutions to Nonnegative Linear Systems and Applications
We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. Unlike most known results for sparse recovery, we do not require {\em any} assumption on the matrix other than non-negativity. Our algorithm is combinatorial in nature, inspired by techniques for the set cover problem, as well as the multiplicative weight update method. We then present a natural application to learning mixture models in the PAC framework. For learning a mixture of $k$ axis-aligned Gaussians in $d$ dimensions, we give an algorithm that outputs a mixture of $O(k/\epsilon^3)$ Gaussians that is $\epsilon$-close in statistical distance to the true distribution, without any separation assumptions. The time and sample complexity is roughly $O(kd/\epsilon^3)^{d}$. This is polynomial when $d$ is constant, precisely the regime in which known methods fail to identify the components efficiently. Given that non-negativity is a natural assumption, we believe that our result may find use in other settings in which we wish to approximately explain data using a small number of elements from a (large) candidate set of components.
💡 Research Summary
The paper tackles two intertwined problems: (1) finding sparse approximate solutions to non‑negative linear systems, and (2) applying this capability to learn mixtures of axis‑aligned Gaussians in the PAC framework.
The authors start by observing that most sparse‑recovery results require strong structural assumptions on the matrix (e.g., RIP, incoherence). In contrast, they assume only that the matrix A ∈ ℝ^{m×n}_{+} and the target vector b ∈ ℝ^{m}_{+} are non‑negative. Under the mild condition that there exists a k‑sparse non‑negative vector x* satisfying Ax* = b (or approximately, Ax* ≤ (1+ε₀)b), they design a combinatorial algorithm that produces a vector x_alg with O(k/ε³) non‑zero entries and ℓ₁ error ‖Ax_alg − b‖₁ ≤ ε‖b‖₁.
The algorithm is inspired by the set‑cover problem and the multiplicative‑weight‑update (MWU) method. They define a potential function Φ(x) = ∑_j b_j (1+δ)^{(Ax)_j/b_j} and a secondary quantity ψ(x) = ‖Ax‖₁. Starting from x = 0, each iteration selects a coordinate i and a step size θ ≥ 1/(Ck) that minimizes the ratio Φ(x+θe_i)/Φ(x). By carefully choosing the parameters C = 16/ε and δ = ε/16, they guarantee that after T = O(k/ε³) iterations the potential satisfies Φ ≤ (1+δ)(1+η)ψ for a small η. Lemma 2.2 shows that this condition directly yields the desired ℓ₁ error bound. Lemma 2.3 proves the existence of a suitable coordinate and step size at every iteration, relying only on the existence of a k‑sparse feasible solution. Each iteration runs in time O(mn·log(mn)/δ), giving an overall runtime polynomial in the input size and 1/ε.
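The greedy potential-minimizing loop described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's exact procedure: the line search over θ is simplified to a small grid of multiples of 1/(Ck), ties are broken by scan order, and the candidate score Φ/ψ is a proxy for the paper's careful ratio test. It assumes b is strictly positive (so the exponent (Ax)_j/b_j is well defined).

```python
import numpy as np

def sparse_nonneg_solve(A, b, k, eps):
    """Greedy MWU-style sparse approximation of A x ~= b with A, b >= 0.

    Sketch of the combinatorial scheme from the summary above: repeatedly
    add mass theta >= 1/(C*k) to the single coordinate that keeps the
    potential Phi(x) = sum_j b_j * (1+delta)^{(Ax)_j / b_j} small relative
    to psi(x) = ||A x||_1.  Parameter choices C = 16/eps, delta = eps/16
    follow the summary; the inner search over theta uses a coarse grid,
    so this is illustrative rather than an exact implementation.
    Assumes b > 0 componentwise.
    """
    m, n = A.shape
    C = 16.0 / eps
    delta = eps / 16.0
    T = int(np.ceil(k / delta ** 2))        # generous iteration budget
    x = np.zeros(n)
    Ax = np.zeros(m)
    # candidate step sizes: small multiples of the minimum step 1/(C*k)
    thetas = (1.0 / (C * k)) * np.array([1.0, 2.0, 4.0, 8.0])
    for _ in range(T):
        best = None  # (score, coordinate, step)
        for i in range(n):
            for theta in thetas:
                new_Ax = Ax + theta * A[:, i]
                phi = np.sum(b * (1.0 + delta) ** (new_Ax / b))
                psi = np.sum(new_Ax)
                score = phi / max(psi, 1e-12)  # potential per unit of mass
                if best is None or score < best[0]:
                    best = (score, i, theta)
        _, i, theta = best
        x[i] += theta
        Ax += theta * A[:, i]
        # stop once the l1 error target from the summary is met
        if np.sum(np.abs(Ax - b)) <= eps * np.sum(b):
            break
    return x
```

On a trivial instance such as A = I₃ and b = (1, 1, 1), the loop spreads mass evenly across the three coordinates until the ℓ₁ error drops below ε‖b‖₁.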
Having established a generic sparse‑recovery primitive, the authors turn to learning mixtures of k axis‑aligned Gaussians in d dimensions. They view each candidate Gaussian (parameterized by a mean and variance in each coordinate) as a column of a huge matrix whose entries are the probability density values on a fine grid over ℝ^d. The true mixture's density is then a non‑negative linear combination of these columns. By discretizing the space (so that the continuous density is approximated within ε), the problem reduces to the non‑negative linear system setting. Applying their sparse‑recovery algorithm yields a representation using O(k/ε³) Gaussians that is ε‑close in total variation distance to the original mixture. The sample and time complexity become O((kd/ε³)^d), which is polynomial when d is a constant, precisely the regime where previous algorithms either require time exponential in k or need strong separation assumptions between components.
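The reduction can be made concrete with a toy construction in one dimension. The helper below (our own hypothetical naming, restricted to d = 1 for brevity) builds the candidate matrix described above: each column holds one candidate Gaussian's approximate probability mass on the grid cells, so a k‑component mixture corresponds to a k‑sparse non‑negative combination of columns that could then be fed to the sparse‑recovery algorithm.

```python
import numpy as np

def build_candidate_matrix(grid, means, sigmas):
    """Discretized densities of candidate Gaussians on a uniform 1-D grid.

    Illustrative sketch of the paper's reduction: one column per candidate
    (mean, sigma) pair, with entries ~= probability mass per grid cell.
    The 1-D restriction and function name are ours, not the paper's.
    """
    cell = grid[1] - grid[0]            # uniform cell width
    cols = []
    for mu in means:
        for s in sigmas:
            dens = np.exp(-0.5 * ((grid - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
            cols.append(dens * cell)    # density times cell width ~= mass
    return np.column_stack(cols)
```

With a grid fine and wide enough, each column sums to nearly 1, and an empirical histogram of samples from the true mixture plays the role of the target vector b.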
The paper also discusses optimality. The authors conjecture that the trade‑off between sparsity and error should be k/ε² (up to logarithmic factors). They prove a lower bound showing that any polynomial‑time algorithm cannot achieve sparsity better than k·log(1/ε) unless P=NP, indicating that some dependence on ε is unavoidable. Moreover, they introduce a “planted set‑cover” problem and argue that improving the sparsity to O(k/ε²) would imply an efficient solution to this problem, which is believed to be hard. This suggests that achieving the conjectured bound would require fundamentally new techniques beyond current MWU‑style approaches.
In summary, the paper contributes a novel, assumption‑light algorithm for sparse approximation in non‑negative linear systems, and leverages it to obtain the first polynomial‑time PAC learner for mixtures of axis‑aligned Gaussians in constant dimensions without any separation condition. The work bridges combinatorial optimization (set cover), online learning (multiplicative weights), and statistical learning theory, opening avenues for further research on tighter sparsity‑error trade‑offs and extensions to broader classes of models.