PAC learnability under non-atomic measures: a problem by Vidyasagar
In response to a 1997 problem of M. Vidyasagar, we state a criterion for PAC learnability of a concept class $\mathscr C$ under the family of all non-atomic (diffuse) measures on the domain $\Omega$. The uniform Glivenko–Cantelli property with respect to non-atomic measures is no longer a necessary condition, and consistent learnability cannot in general be expected. Our criterion is stated in terms of a combinatorial parameter $\mathrm{VC}(\mathscr C\,\mathrm{mod}\,\omega_1)$ which we call the VC dimension of $\mathscr C$ modulo countable sets. The new parameter is obtained by “thickening up” single points in the definition of VC dimension to uncountable “clusters”. Equivalently, $\mathrm{VC}(\mathscr C\,\mathrm{mod}\,\omega_1)\leq d$ if and only if every countable subclass of $\mathscr C$ has VC dimension $\leq d$ outside a countable subset of $\Omega$. The new parameter can also be expressed as the classical VC dimension of $\mathscr C$ calculated on a suitable subset of a compactification of $\Omega$. We do not make any measurability assumptions on $\mathscr C$, assuming instead the validity of Martin’s Axiom (MA). Similar results are obtained for function learning in terms of fat-shattering dimension modulo countable sets, but, just like in the classical distribution-free case, the finiteness of this parameter is sufficient but not necessary for PAC learnability under non-atomic measures.
💡 Research Summary
The paper addresses a problem posed by M. Vidyasagar in 1997: to give a combinatorial characterization of the concept classes that are PAC‑learnable when the learning algorithm is evaluated only against the family of all non‑atomic (diffuse) probability measures on a domain Ω. In the classical distribution‑free setting, three conditions are equivalent under mild measurability assumptions: (i) the class is PAC‑learnable for all probability measures, (ii) it is a uniform Glivenko–Cantelli (GC) class, and (iii) its Vapnik–Chervonenkis (VC) dimension is finite. The authors show that when the family of admissible measures is restricted to the non‑atomic ones, conditions (ii) and (iii) remain sufficient but are no longer necessary.
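For reference, the uniform GC property in (ii) can be written out as follows; this is the standard formulation (with outer probability used when no measurability is assumed), paraphrased here rather than quoted from the paper.

```latex
% A class \mathscr{C} is uniform Glivenko--Cantelli with respect to a family P
% of probability measures on \Omega if the empirical measures converge to the
% true measure uniformly over \mathscr{C} and over P:
\[
  \forall \epsilon > 0: \quad
  \sup_{\mu \in P}\,
  \mu^{\otimes n}\Bigl\{ \sigma \in \Omega^{n} :
    \sup_{C \in \mathscr{C}} \bigl|\mu_{\sigma}(C) - \mu(C)\bigr| > \epsilon
  \Bigr\}
  \;\xrightarrow[n \to \infty]{}\; 0,
\]
% where \mu_{\sigma} is the empirical measure of the sample \sigma. The
% classical setting takes P to be all probability measures; the paper takes
% P = P_{na}(\Omega), the non-atomic ones.
```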
To capture the phenomenon that “countable noise” is irrelevant for learning under diffuse measures, the authors introduce a new combinatorial parameter, the VC dimension modulo countable sets, denoted VC(𝒞 mod ω₁). Instead of shattering individual points, the definition requires shattering uncountable “clusters”. Formally, VC(𝒞 mod ω₁) ≥ n if there exist n uncountable subsets A₁,…,Aₙ ⊆ Ω such that for every J ⊆ {1,…,n} there is a concept C ∈ 𝒞 containing all Aᵢ with i∈J and disjoint from all Aⱼ with j∉J. Equivalently, every countable subclass of 𝒞 has finite VC dimension on the complement of some countable subset of Ω. This parameter is always bounded above by the ordinary VC dimension, but can be strictly smaller, as illustrated by the class of all finite and co‑finite subsets of a standard Borel space (ordinary VC = ∞, VC mod ω₁ = 1).
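The contrast between ordinary shattering and cluster‑shattering can be checked mechanically on a finite toy model. The sketch below is purely illustrative (the domain size n, the cut‑off k, and all names are invented here): sets of size ≤ k play the role of countable sets, blocks of size > k play the role of uncountable clusters, and the truncated “small or co‑small” class mimics the finite/co‑finite example above.

```python
from itertools import combinations

def shatters(concepts, blocks):
    """Return True if `concepts` cluster-shatters the pairwise disjoint `blocks`:
    for every subset J of block indices some concept fully contains each block
    in J and is disjoint from each block outside J.  Singleton blocks recover
    the ordinary notion of shattering."""
    idx = range(len(blocks))
    for r in range(len(blocks) + 1):
        for J in combinations(idx, r):
            realized = any(
                all(blocks[i] <= C for i in J)
                and all(blocks[i].isdisjoint(C) for i in idx if i not in J)
                for C in concepts
            )
            if not realized:
                return False
    return True

# Finite stand-in for the "finite or co-finite" class on an uncountable domain:
# sets of size <= k mimic countable sets, blocks of size > k mimic uncountable
# clusters.  (n, k and everything below are chosen only for this illustration.)
n, k = 12, 3
domain = frozenset(range(n))
small = [frozenset(s) for r in range(k + 1) for s in combinations(domain, r)]
concepts = small + [domain - s for s in small]            # "small or co-small"

singletons = [frozenset({i}) for i in range(k)]           # k ordinary points
clusters = [frozenset(range(k + 1)),                      # two disjoint thick blocks
            frozenset(range(k + 1, 2 * (k + 1)))]

print(shatters(concepts, singletons))   # True: ordinary VC dimension is >= k
print(shatters(concepts, clusters[:1])) # True: a single thick cluster is shattered
print(shatters(concepts, clusters))     # False: analogue of VC(C mod omega_1) = 1
```

The last check fails because any concept containing one thick block must be co‑small and therefore meets the other thick block, which is exactly the mechanism behind VC(𝒞 mod ω₁) = 1 for the finite/co‑finite class.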
The central result (Theorem 1.1) states that, assuming Martin’s Axiom (MA), the following statements are equivalent for a concept class 𝒞 ⊆ 𝒜 (where (Ω,𝒜) is a standard Borel space):
1. 𝒞 is PAC‑learnable under the family Pₙₐ(Ω) of all non‑atomic measures.
2. VC(𝒞 mod ω₁) is finite.
3. Every countable subclass of 𝒞 has finite VC dimension outside some countable subset of Ω (the subset may depend on the subclass).
4. There exists a uniform bound d such that the property in (3) holds with the same d for all countable subclasses.
5. Every countable subclass of 𝒞 is a uniform GC class with respect to non‑atomic measures.
6. The uniform GC property in (5) holds with a sample‑complexity function that depends only on 𝒞, not on the particular subclass.
If 𝒞 is universally separable (i.e., it contains a countable subclass 𝒞′ such that every concept in 𝒞 is a pointwise limit of a sequence from 𝒞′), the above are also equivalent to:
7. VC(𝒞) is finite outside some countable subset of Ω.
8. 𝒞 itself is a uniform GC class for non‑atomic measures.
9. 𝒞 is consistently PAC‑learnable under non‑atomic measures (every consistent rule succeeds).
The implication (3) ⇒ (1) is the technical heart of the paper. Under MA the authors construct a specific consistent learning rule L with the property that, for any target concept C, the collection of hypotheses produced by L on all samples of the form (σ, C∩σ) forms a uniform GC class. MA is needed to guarantee that unions of fewer than 2^{ℵ₀} Lebesgue‑measurable sets remain measurable, which in turn ensures the measurability of the learning rule.
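The flavor of “a consistent rule that happens to be PAC” can be seen in a classical toy case. This is not the rule constructed in the paper, whose construction is transfinite and relies on MA; the threshold class and the uniform measure below are stand‑ins chosen only for illustration.

```python
import random

def consistent_threshold_learner(sample):
    """A consistent rule for the toy class {[0, t] : t in [0, 1]}: output the
    hypothesis [0, t_hat] with t_hat = the largest positively labelled point
    (0 if there is none).  It agrees with the target on every sample point."""
    positives = [x for x, label in sample if label == 1]
    return max(positives) if positives else 0.0

def expected_error(target_t, n, trials=2000, rng=random.Random(0)):
    """Monte Carlo estimate of the learner's error under the uniform (hence
    non-atomic) measure on [0, 1]; the error of [0, t_hat] against [0, t] is
    the measure of the symmetric difference, i.e. t - t_hat."""
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        sample = [(x, 1 if x <= target_t else 0) for x in xs]
        total += target_t - consistent_threshold_learner(sample)
    return total / trials

for n in (10, 100, 1000):
    print(n, round(expected_error(0.7, n), 4))   # error shrinks roughly like 1/n
```

Here any consistent rule would succeed, because the threshold class already has VC dimension 1; the point of the paper's construction is to exhibit a specific consistent rule that still works when only VC(𝒞 mod ω₁) is known to be finite.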
The paper also treats real‑valued function learning. It defines the fat‑shattering dimension modulo countable sets, fat_ε(F mod ω₁), by replacing points with uncountable clusters in the usual definition of fat‑shattering. Theorem 1.2 shows that, again assuming MA, finiteness of fat_ε(F mod ω₁) for every ε>0 is sufficient (though not necessary) for PAC‑learnability of a function class F under non‑atomic measures. Analogous equivalences to the countable‑subclass uniform GC property are established, and for universally separable function classes the conditions simplify similarly to the concept‑class case.
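As a reminder of the classical notion being “thickened”, the ε‑fat‑shattering condition can be verified by brute force for a small function class; the class of linear functions below and all parameter values are illustrative assumptions, not taken from the paper.

```python
from itertools import product

def fat_shatters(functions, points, witnesses, eps):
    """Classical eps-fat-shattering check: for every +/- pattern there must be
    a function lying at least eps above the witness level on the '+' points
    and at least eps below it on the '-' points."""
    for pattern in product((+1, -1), repeat=len(points)):
        realized = any(
            all(
                f(x) >= r + eps if s == +1 else f(x) <= r - eps
                for x, r, s in zip(points, witnesses, pattern)
            )
            for f in functions
        )
        if not realized:
            return False
    return True

# Illustrative class: linear functions f_a(x) = a*x with slopes a in {0, 0.1, ..., 1}.
functions = [(lambda x, a=a / 10: a * x) for a in range(11)]

print(fat_shatters(functions, [1.0], [0.5], 0.25))              # True: one point is 0.25-fat-shattered
print(fat_shatters(functions, [0.5, 1.0], [0.25, 0.5], 0.25))   # False: two points cannot be decoupled
```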
The authors further relate VC(𝒞 mod ω₁) and fat_ε(F mod ω₁) to classical VC and fat‑shattering dimensions computed on suitable compactifications of Ω (e.g., the Stone–Čech compactification βΩ) after taking closures of the concepts/functions. This provides a bridge to the traditional combinatorial parameters while highlighting that the “modulo countable” modification precisely removes the influence of countable sets, which are invisible to non‑atomic measures.
Finally, the paper discusses the role of Martin’s Axiom. For universally separable classes the equivalences hold in ZFC alone; for arbitrary classes MA is invoked to obtain the equivalence between (3) and (1). The authors note that MA follows from the Continuum Hypothesis and is also consistent with its negation, and that it supplies the needed additivity of Lebesgue measure, and measurability of unions, for families of fewer than 2^{ℵ₀} sets.
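The measure‑theoretic consequence of MA at work here can be recorded explicitly (standard statement, paraphrased rather than quoted from the paper):

```latex
% Under MA, unions of fewer than 2^{\aleph_0} Lebesgue-measurable sets are
% measurable, and the measure is additive over such families of null sets:
\[
  \kappa < 2^{\aleph_0},\qquad
  \mu(N_\alpha) = 0 \ \text{for all}\ \alpha < \kappa
  \;\Longrightarrow\;
  \mu\Bigl(\,\bigcup_{\alpha<\kappa} N_\alpha\Bigr) = 0 .
\]
% Since countable sets are null for every non-atomic Borel measure, countable
% "noise" remains invisible even after accumulating fewer than continuum many
% such sets.
```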
In summary, the work provides a complete combinatorial characterization of PAC‑learnability under the restricted family of non‑atomic measures. By introducing VC and fat‑shattering dimensions modulo countable sets, the authors isolate the exact combinatorial obstruction to learning in this setting, demonstrate that the uniform Glivenko–Cantelli property is no longer necessary, and show that learnability by an arbitrary consistent rule cannot in general be expected. The results deepen the theoretical understanding of learning under intermediate families of measures and open avenues for designing algorithms that are robust to countable “noise” in the data distribution.