PAC learnability versus VC dimension: a footnote to a basic result of statistical learning
A fundamental result of statistical learning theory states that a concept class is PAC learnable if and only if it is a uniform Glivenko-Cantelli class, if and only if the VC dimension of the class is finite. However, the theorem is only valid under special measurability assumptions on the class, in which case PAC learnability even becomes consistent. Otherwise, there is a classical example, constructed under the Continuum Hypothesis by Dudley and Durst and further adapted by Blumer, Ehrenfeucht, Haussler, and Warmuth, of a concept class of VC dimension one which is neither uniform Glivenko-Cantelli nor consistently PAC learnable. We show that, rather surprisingly, under an additional set-theoretic hypothesis which is much milder than the Continuum Hypothesis (Martin’s Axiom), PAC learnability is equivalent to finite VC dimension for every concept class.
💡 Research Summary
The paper revisits the classic equivalence in statistical learning theory that a concept class is distribution‑free PAC learnable if and only if it is a uniform Glivenko‑Cantelli class, which in turn holds exactly when the class has finite VC dimension. While textbooks often present this theorem without further conditions, the proof actually requires a measurability assumption on the class (e.g., image‑admissible Souslin or “well‑behaved” classes). Without such an assumption the equivalence fails.
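To make the finite-VC-dimension side of the equivalence concrete, here is a minimal Python sketch (the function names and the finite threshold class are illustrative, not from the paper) that computes the VC dimension of a finite concept class by brute-force shattering checks:

```python
from itertools import combinations

def shatters(hypotheses, points):
    """True iff the class realizes every binary labeling of `points`."""
    achieved = {tuple(h(x) for x in points) for h in hypotheses}
    return len(achieved) == 2 ** len(points)

def vc_dimension(hypotheses, domain, max_d=4):
    """Largest d (up to max_d) such that some d-point subset of `domain` is shattered."""
    d = 0
    for k in range(1, max_d + 1):
        if any(shatters(hypotheses, s) for s in combinations(domain, k)):
            d = k
    return d

# Threshold concepts {x : x >= t} on a finite grid have VC dimension 1:
# any single point can be labeled both ways, but no pair x1 < x2 admits
# the labeling (1, 0), since x1 >= t forces x2 >= t.
domain = [0, 1, 2, 3, 4]
thresholds = [lambda x, t=t: x >= t for t in range(6)]
print(vc_dimension(thresholds, domain))  # → 1
```

The brute-force search is exponential and only suitable for toy classes; it serves to pin down the combinatorial definition that the paper's equivalence turns on.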
A celebrated counterexample, originally built under the Continuum Hypothesis (CH), exhibits a class of VC dimension 1 that is not uniform Glivenko‑Cantelli and, after a slight modification, is not consistently PAC learnable. The construction uses a well‑ordering of an uncountable standard Borel space in which every initial segment is countable; every sample is then contained in some initial segment, a concept of true measure zero whose empirical measure equals 1, which destroys uniform convergence.
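In symbols (writing μ for the underlying nonatomic law, μₙ for the empirical measure of an n-sample, and 𝒞 for the class of initial segments), the failure of uniform convergence takes the form

\[
\sup_{C \in \mathscr{C}} \bigl| \mu_n(C) - \mu(C) \bigr| = 1 \quad \text{almost surely},
\]

since every sample \(x_1, \dots, x_n\) lies in some countable initial segment \(C\), for which \(\mu_n(C) = 1\) while \(\mu(C) = 0\).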
The author’s main contribution is to replace the strong set‑theoretic assumption CH with the much weaker Martin’s Axiom (MA) and to prove that under MA the classic equivalence is restored for all concept classes consisting of universally measurable subsets of a Borel domain. Specifically, Theorem 2 states: assuming MA, a concept class C is distribution‑free PAC learnable iff its VC dimension is finite. The direction “PAC ⇒ finite VC” is already known without extra hypotheses; the novelty lies in establishing the converse under MA.
The technical heart of the argument is a new learning rule built from a well‑ordering of the class. Under MA, any subclass of size strictly less than the continuum (2^{ℵ₀}) is a uniform Glivenko‑Cantelli class (Lemma 10). The learning rule L₁ selects, for each labeled sample, the first concept in this well‑ordering that is consistent with the data. For any target concept C, the collection of hypotheses produced by L₁ on all possible samples forms a subclass of size < 2^{ℵ₀}, hence enjoys uniform convergence. Lemma 8 then shows that such a rule is automatically PAC‑consistent with the standard sample‑complexity bound (the usual VC‑dimension bound).
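The "first-consistent" rule admits a toy Python sketch (illustrative only: the paper's actual construction well-orders a class of cardinality continuum, which has no finite analogue, so a finite ordered list stands in for the well-ordering here):

```python
def first_consistent(ordered_class, sample):
    """Return the first concept, in the given ordering, that labels every
    sample point correctly -- a finite stand-in for the paper's rule L1,
    which scans a transfinite well-ordering of the class."""
    for concept in ordered_class:
        if all(concept(x) == y for x, y in sample):
            return concept
    return None  # unreachable when the target concept belongs to the class

# Toy run: threshold concepts {x : x >= t}, ordered by threshold.
ordered = [lambda x, t=t: x >= t for t in range(6)]
sample = [(1, False), (3, True)]      # labels consistent with target x >= 2
h = first_consistent(ordered, sample)
print([h(x) for x in range(6)])       # → [False, False, True, True, True, True]
```

The point of the MA argument is that, for a fixed target, the set of hypotheses this rule can ever output has cardinality below the continuum, so Lemma 10 applies to it even though the full class is badly behaved.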
To connect MA with the needed measurability properties, the paper invokes a result (Theorem 9) that under MA the Lebesgue measure is 2^{ℵ₀}‑additive: the union of fewer than continuum many null sets is still null. This guarantees that the countable unions appearing in the construction remain universally measurable, eliminating the need for CH’s stronger combinatorial control.
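The additivity property can be stated precisely: for Lebesgue measure λ and any cardinal κ strictly below the continuum,

\[
\kappa < 2^{\aleph_0} \ \text{ and } \ \lambda(N_\alpha) = 0 \ \text{ for all } \alpha < \kappa
\quad \Longrightarrow \quad
\lambda\Bigl( \bigcup_{\alpha < \kappa} N_\alpha \Bigr) = 0,
\]

so a union of fewer than continuum many null sets behaves exactly like a countable union.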
The paper also re‑examines the original CH‑based counterexample. By redefining the well‑ordering (making the whole space the smallest element) and applying the MA‑based analysis, the same class is shown to be PAC‑learnable (via the “first‑consistent” rule) while still failing to be a uniform Glivenko‑Cantelli class. Thus the pathological behavior persists under MA, but the equivalence with finite VC dimension is rescued.
In summary, the work demonstrates that Martin’s Axiom—a set‑theoretic principle compatible with both CH and its negation—is sufficient to guarantee the classic PAC/VC equivalence without invoking any explicit measurability constraints on the hypothesis class. This broadens the theoretical foundation of learning theory, showing that the measurability assumptions traditionally required are not essential once MA is adopted. The results invite further investigation into the minimal set‑theoretic conditions needed for learning guarantees and suggest that many practical learning scenarios may rest on weaker, more natural axioms than previously thought.