Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. III


We establish the last missing link needed to describe those complete separable metric spaces $X$ in which the $k$ nearest neighbour classifier is universally consistent, both in combinatorial terms of dimension theory and via a fundamental property of real analysis. The following are equivalent: (1) the $k$-nearest neighbour classifier is universally consistent in $X$, (2) the strong Lebesgue–Besicovitch differentiation property holds in $X$ for every locally finite Borel measure, (3) $X$ is sigma-finite dimensional in the sense of Jun-Iti Nagata. The equivalence (2)$\iff$(3) was announced by Preiss (1983), while a detailed proof of the implication (3)$\Rightarrow$(2) appeared only in Assouad and Quentin de Gromard (2006). The implication (2)$\Rightarrow$(1) was established by Cérou and Guyader (2006). We prove the implication (1)$\Rightarrow$(3). We further show that the weak (rather than strong) Lebesgue–Besicovitch property is insufficient for the consistency of the $k$-NN rule, as witnessed, for example, by the Heisenberg group (here we correct an erroneous claim made in the previous article, Kumari and Pestov 2024). Somewhat counter-intuitively, there is a metric on the real line, uniformly equivalent to the usual distance, under which the $k$-NN classifier fails to be universally consistent. Finally, another equivalent condition that can be added to the list above is the Cover–Hart property: (4) the error of the $1$-nearest neighbour classifier is asymptotically at most twice the Bayes error.


💡 Research Summary

The paper completes the characterization of those complete separable metric spaces in which the k‑nearest neighbour (k‑NN) classifier is universally consistent. Three conditions are shown to be equivalent: (1) universal consistency of the k‑NN rule, (2) the strong Lebesgue–Besicovitch differentiation property for every locally finite Borel measure, and (3) σ‑finite dimensionality in the sense of Nagata. The implications (2)⇒(1) (Cérou and Guyader, 2006) and (3)⇔(2) (Preiss, 1983; detailed by Assouad and Quentin de Gromard, 2006) were already known; the missing direction (1)⇒(3) is proved here (Theorem 1.2).
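For reference, the strong Lebesgue–Besicovitch differentiation property in $(X,d)$ is usually stated as follows (a standard formulation, not quoted verbatim from the paper): for every locally finite Borel measure $\mu$ on $X$ and every $f \in L^1(\mu)$,

\[
\lim_{r \to 0^{+}} \frac{1}{\mu\big(\bar B(x,r)\big)} \int_{\bar B(x,r)} f \, d\mu \;=\; f(x) \quad \text{for } \mu\text{-almost every } x \in X,
\]

where $\bar B(x,r)$ denotes the closed ball of radius $r$ around $x$. The weak version discussed below only requires the averages on the left to converge to $f$ in measure $\mu$, rather than $\mu$-almost everywhere.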

The proof constructs, for any complete separable metric space that is not σ‑finite dimensional, a Borel probability measure μ and a binary regression function η taking only the values 0 and 1, each with probability 1/2. For a suitable sequence k_n→∞ with k_n/n→0, the k‑NN classifier's error does not converge to the Bayes error (which is zero in this construction). The measure extends Preiss's construction from ℓ² to arbitrary spaces that are not σ‑finite dimensional, showing that the lack of σ‑finite Nagata dimension rules out the strong differentiation property, and consequently rules out universal consistency.
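Here universal consistency means that the expected misclassification error of the k‑NN classifier converges to the Bayes error for every distribution of the labelled pair $(X, Y)$; in standard notation (not taken verbatim from the paper), with $g_n$ the classifier built on an i.i.d. sample of size $n$,

\[
\mathbb{E}\,\mathrm{err}(g_n) \;\longrightarrow\; \ell^{*} \;=\; \mathbb{E}\big[\min\{\eta(X),\, 1 - \eta(X)\}\big] \quad \text{as } n \to \infty,
\]

whenever $k_n \to \infty$ and $k_n/n \to 0$. Since η takes only the values 0 and 1 in the construction above, $\ell^{*} = 0$, so failure of consistency means the k‑NN error does not tend to zero.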

The paper also clarifies that the weak Lebesgue–Besicovitch property (convergence in measure) is insufficient for consistency. Two counter‑examples are provided: a recent construction by Käenmäki, Rajala and Suomala (2021) and, more strikingly, the Heisenberg group equipped with the Carnot–Cygan–Korányi metric. Although the Heisenberg group satisfies the weak differentiation property for every measure, it fails to be σ‑finite dimensional, and thus k‑NN is not universally consistent there. This corrects an erroneous claim in Kumari and Pestov (2024).

A further surprising result is that even on the real line ℝ one can define a metric uniformly equivalent to the usual Euclidean distance such that the k‑NN rule fails to be universally consistent. This demonstrates that mere uniform equivalence of metrics does not guarantee the preservation of consistency.
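For clarity, uniform equivalence of two metrics $d_1, d_2$ on the same set is meant in the usual sense (a standard definition, added here for the reader):

\[
\forall \varepsilon > 0 \;\exists \delta > 0: \quad d_1(x,y) < \delta \implies d_2(x,y) < \varepsilon, \qquad \text{and symmetrically with } d_1 \text{ and } d_2 \text{ exchanged.}
\]

In particular, the two metrics generate the same topology and the same Cauchy sequences, yet consistency of the k‑NN rule can still be lost.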

Finally, the authors add a fourth equivalent condition, the Cover–Hart property: the asymptotic error of the 1‑NN classifier is at most twice the Bayes error. Theorem 1.3 shows that this property is equivalent to the three conditions above, extending the classical Euclidean result of Cover and Hart to the general metric setting.
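For comparison, the classical Cover–Hart result for binary classification in Euclidean space states that, under suitable regularity assumptions, the asymptotic error of the 1‑NN classifier equals $\mathbb{E}\big[2\,\eta(X)(1-\eta(X))\big]$ and satisfies (standard statement, not quoted from the paper):

\[
\ell^{*} \;\le\; \mathbb{E}\big[2\,\eta(X)\big(1-\eta(X)\big)\big] \;\le\; 2\,\ell^{*}\big(1-\ell^{*}\big) \;\le\; 2\,\ell^{*}.
\]

The Cover–Hart property (4) asks for the weaker conclusion (asymptotic 1‑NN error at most $2\ell^{*}$) to hold for every distribution on the metric space $X$.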

The paper is organized as follows. Section 2 sets up the statistical learning framework, defines the k‑NN rule, and discusses tie‑breaking procedures. Section 3 reviews Nagata dimension and σ‑finite dimensionality. Section 4 contains the proof of (1)⇒(3) via the generalized Preiss measure. Section 5 discusses the insufficiency of the weak differentiation property, with the Heisenberg group example. Section 6 constructs the pathological metric on ℝ. Section 7 proves the equivalence with the Cover–Hart property.
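As a purely illustrative aid (not code from the paper), the k‑NN rule with tie‑breaking discussed in Section 2 can be sketched as follows; the function name, the uniform random tie‑breaking among equidistant points, and the coin flip for tied votes are assumptions made here for concreteness, and the metric is any Python callable.

```python
import random
from collections import Counter

def knn_classify(x, sample, labels, k, d, rng=random):
    """Predict a 0/1 label for x by majority vote among its k nearest neighbours.

    sample -- list of points; labels -- list of 0/1 labels of the same length;
    d      -- the metric, a callable d(a, b) -> float.
    Distance ties are broken by first shuffling the sample order (a stable sort
    then keeps that random order among equidistant points); vote ties are broken
    by a fair coin flip.
    """
    order = list(range(len(sample)))
    rng.shuffle(order)                              # random order among equidistant points
    order.sort(key=lambda i: d(x, sample[i]))       # stable sort by distance to x
    votes = Counter(labels[i] for i in order[:k])   # labels of the k nearest neighbours
    if votes[0] == votes[1]:
        return rng.choice([0, 1])                   # tied vote: flip a fair coin
    return 0 if votes[0] > votes[1] else 1

# Example on the real line with the usual distance:
points = [0.1, 0.35, 0.4, 0.8, 0.9]
ys = [0, 0, 0, 1, 1]
print(knn_classify(0.5, points, ys, k=3, d=lambda a, b: abs(a - b)))  # -> 0
```

The stable-sort trick is one simple way to realize uniform tie‑breaking among points at equal distance; the paper treats tie‑breaking more carefully, and this sketch is not meant to reproduce its exact procedure.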

In summary, the work unifies three seemingly disparate concepts (strong differentiation of measures, a combinatorial dimension notion from topology, and a classic performance bound for nearest‑neighbour classifiers) into a single chain of equivalences. This provides a precise, mathematically rigorous criterion for when the k‑NN (and 1‑NN) classifier can be expected to work universally, and it highlights the delicate role of the underlying metric in non‑Euclidean spaces.

