Agnostic Learning of Monomials by Halfspaces is Hard
We prove the following strong hardness result for learning: Given a distribution of labeled examples from the hypercube such that there exists a monomial consistent with $(1-\varepsilon)$ of the examples, it is NP-hard to find a halfspace that is correct on $(1/2+\varepsilon)$ of the examples, for arbitrary constants $\varepsilon > 0$. In learning theory terms, weak agnostic learning of monomials is hard, even if one is allowed to output a hypothesis from the much bigger concept class of halfspaces. This hardness result subsumes a long line of previous results, including two recent hardness results for the proper learning of monomials and halfspaces. As an immediate corollary of our result we show that weak agnostic learning of decision lists is NP-hard. Our techniques are quite different from previous hardness proofs for learning. We define distributions on positive and negative examples for monomials whose first few moments match. We use the invariance principle to argue that regular halfspaces (all of whose coefficients have small absolute value relative to the total $\ell_2$ norm) cannot distinguish between distributions whose first few moments match. For highly non-regular halfspaces, we use a structural lemma from recent work on fooling halfspaces to argue that they are "junta-like" and one can zero out all but the top few coefficients without affecting the performance of the halfspace. The top few coefficients form the natural list decoding of a halfspace in the context of dictatorship tests/Label Cover reductions. We note that unlike previous invariance principle based proofs, which are only known to give Unique-Games hardness, we are able to reduce from a version of the Label Cover problem that is known to be NP-hard. This has inspired follow-up work on bypassing the Unique Games conjecture in some optimal geometric inapproximability results.
💡 Research Summary
The paper establishes a strong computational hardness result for agnostic learning of monomials, even when the learner is allowed to output hypotheses from the much larger class of halfspaces. Formally, for any constant ε > 0, given a distribution over labeled Boolean hypercube examples that is (1 − ε)-consistent with some monomial, it is NP‑hard to find a halfspace that achieves accuracy better than ½ + ε on the same distribution. Consequently, weak agnostic learning of decision lists is also NP‑hard.
The authors’ proof departs from earlier reductions based on the Unique‑Games Conjecture and instead reduces directly from a known NP‑hard version of the Label‑Cover problem. The reduction hinges on constructing two families of distributions—positive and negative examples—for each constraint of the Label‑Cover instance. These distributions are engineered so that their first k moments (for a suitably large constant k) match exactly. This moment‑matching property enables the use of the invariance principle: any “regular” halfspace (all coefficients small relative to the ℓ₂ norm) behaves essentially like a low‑degree polynomial and therefore cannot distinguish the two distributions beyond random guessing, yielding accuracy at most ½ + o(1).
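The moment-matching idea above can be illustrated with a deliberately tiny toy example (this is not the paper's actual construction, just a hypothetical sketch): two distributions on {−1, 1}² whose first moments agree, so that no linear statistic separates them in expectation, while a degree-2 statistic does.

```python
# Toy illustration of moment matching (assumed example, not the paper's
# construction): two distributions on {-1,1}^2 with matching first moments.
D_pos = [(1, 1), (-1, -1)]   # uniform over these two points
D_neg = [(1, -1), (-1, 1)]   # uniform over these two points

def mean(dist, f):
    """Expectation of f under the uniform distribution on dist."""
    return sum(f(x) for x in dist) / len(dist)

# First moments E[x_1], E[x_2] match (both are zero) ...
for i in range(2):
    assert mean(D_pos, lambda x: x[i]) == mean(D_neg, lambda x: x[i]) == 0

# ... so every linear form w.x has identical expectation under both
# distributions, and a linear statistic gains no advantage in expectation.
w = (0.3, -0.7)
lin = lambda x: w[0] * x[0] + w[1] * x[1]
assert mean(D_pos, lin) == mean(D_neg, lin)

# A second moment separates them perfectly: E[x_1 x_2] = +1 vs -1.
corr = lambda x: x[0] * x[1]
print(mean(D_pos, corr), mean(D_neg, corr))  # 1.0 -1.0
```

In the paper the matching extends to the first k moments for a large constant k, which is what lets the invariance principle control the behavior of all regular halfspaces rather than just linear statistics.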
The more delicate case concerns “non‑regular” halfspaces, where a few coefficients dominate. Here the authors invoke a recent structural lemma from the literature on fooling halfspaces, which shows that any such halfspace is effectively a junta: after zeroing out all but the top O(1/τ²) coefficients (τ being the regularity threshold), the hypothesis’s performance changes negligibly. The remaining large coefficients constitute a natural list‑decoding of the halfspace and can be mapped directly to the labels of the Label‑Cover instance. In the completeness case (the Label‑Cover instance is satisfiable), there exists a monomial, and hence a halfspace, that agrees with a (1 − ε) fraction of the examples. In the soundness case (the instance is far from satisfiable), any halfspace, whether regular or reduced to a junta, achieves accuracy at most ½ + ε, because the matched moments render the two distributions indistinguishable.
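The “zero out all but the top coefficients” step can be sketched concretely. The snippet below uses geometrically decaying weights as an assumed illustrative choice (not the paper's construction): the halfspace is highly non-regular, and truncating it to its top few coefficients provably never changes its sign on this instance, because the tail's total weight is smaller than the smallest nonzero value the kept part can take.

```python
from itertools import product

# Hypothetical sketch of truncating a non-regular halfspace to a junta.
n, keep = 10, 4
w = [2.0 ** -i for i in range(n)]   # w_0 dominates: highly non-regular

def sign(t):
    return 1 if t > 0 else -1

def halfspace(weights, x):
    return sign(sum(wi * xi for wi, xi in zip(weights, x)))

# Junta: keep the 'keep' largest coefficients, zero out the rest.
w_junta = [wi if i < keep else 0.0 for i, wi in enumerate(w)]

# Tail weight 2^-4 + ... + 2^-9 < 0.124, while the kept part's smallest
# possible nonzero magnitude is 1 - 1/2 - 1/4 - 1/8 = 0.125, so the
# truncation agrees with the original on every point of the hypercube.
agree = sum(halfspace(w, x) == halfspace(w_junta, x)
            for x in product((-1, 1), repeat=n))
print(agree, "of", 2 ** n)  # 1024 of 1024
```

In the actual proof the number of retained coefficients depends on the regularity threshold τ, and the performance loss is only negligible rather than zero, but the mechanism is the same: the surviving large coefficients form the short list that the Label‑Cover reduction decodes.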
Thus finding a halfspace with accuracy > ½ + ε is at least as hard as solving the underlying Label‑Cover instance, establishing NP‑hardness. The reduction works for arbitrary constant ε, bypassing the need for the Unique‑Games conjecture and yielding a result that subsumes earlier hardness proofs for proper learning of monomials and halfspaces. As an immediate corollary, the same technique shows that weak agnostic learning of decision lists is NP‑hard.
Beyond the main theorem, the paper contributes several methodological innovations. It demonstrates how moment‑matching distributions can be paired with the invariance principle to handle regular linear classifiers, and how structural junta‑type lemmas can reduce the analysis of highly non‑regular classifiers to a small set of influential variables. This two‑pronged approach may be applicable to other learning problems where the hypothesis class is geometrically richer than the target concept. The authors also discuss implications for optimal geometric inapproximability results, noting that their technique inspired subsequent work that avoids reliance on the Unique‑Games conjecture.
In summary, the work provides a comprehensive hardness landscape for agnostic learning of monomials, showing that even generous hypothesis classes do not alleviate the computational barrier, and it introduces powerful analytic tools that bridge probabilistic invariance, moment matching, and structural analysis of halfspaces.