Towards Stratification Learning through Homology Inference

A topological approach to stratification learning is developed for point cloud data drawn from a stratified space. Given such data, our objective is to infer which points belong to the same strata. First we define a multi-scale notion of a stratified space, giving a stratification for each radius level. We then use methods derived from kernel and cokernel persistent homology to cluster the data points into different strata, and we prove a result which guarantees the correctness of our clustering, given certain topological conditions; some geometric intuition for these topological conditions is also provided. Our correctness result is then given a probabilistic flavor: we give bounds on the minimum number of sample points required to infer, with probability, which points belong to the same strata. Finally, we give an explicit algorithm for the clustering, prove its correctness, and apply it to some simulated data.

💡 Research Summary

This paper introduces a topological framework for learning the stratification of point‑cloud data sampled from a stratified space. The authors first formalize a multi‑scale notion of a stratified space: for each radius r they define an open neighbourhood U_r and a corresponding stratification S_r, so that as the scale varies the underlying manifold pieces of different dimensions become visible or merge. This multi‑scale view overcomes the limitation of classical single‑scale approaches that miss fine‑grained structure.

The core technical contribution is the use of kernel and cokernel persistent homology to detect differences between strata. For a given radius r the point cloud is thickened to the r‑neighbourhood, a simplicial complex C_r is built, and the inclusion map i_{r,r’}: C_r → C_{r’} (for r < r’) is examined. The kernel of i_{r,r’} captures new connections (e.g., two manifolds joining), while the cokernel records disappearing cycles (e.g., holes that vanish at a crossing). By computing persistent barcodes for the kernel and cokernel across a range of radii, a “kernel‑cokernel barcode” is obtained for each point. A distance between barcodes provides a quantitative measure of topological similarity, which is then fed into a clustering procedure.

Theoretical guarantees are provided under a “topological consistency condition”: each stratum is a sufficiently smooth manifold and intersections between adjacent strata occur in generic position. Under this condition, Theorem 1 proves that points belonging to the same stratum have identical kernel‑cokernel barcodes for all radii, while points from different strata must differ in at least one barcode. The proof combines Morse‑Smale theory with the parametric stability of persistent homology, showing robustness to small perturbations and sampling noise.

A probabilistic analysis follows. Assuming the points are i.i.d. samples, the authors derive a lower bound on the number of samples N required to recover the stratification with probability at least 1 – δ. The bound depends on geometric quantities such as the convexity radius ρ_i and transition width τ_i of each stratum, as well as topological complexity (Betti numbers). The resulting sample‑complexity inequality,
N ≥ C·(log |X| + log (1/δ))/ε²,
where ε is the barcode‑distance threshold and C encodes the aforementioned geometric and topological parameters, gives practitioners a concrete guideline for data collection.

Algorithm 1 implements the theory. It proceeds as follows: (1) select a finite set of radii {r₁,…,r_k}; (2) for each radius construct the simplicial complex and compute kernel‑cokernel barcodes; (3) build a pairwise barcode‑distance matrix; (4) apply hierarchical clustering (average linkage) to obtain provisional clusters; (5) validate each cluster against the topological consistency condition and split clusters that violate it; (6) output the final labeling of points by stratum.

Experimental evaluation is carried out on two fronts. Synthetic data consist of two‑ and three‑dimensional configurations where manifolds of different dimensions intersect (e.g., a line crossing a plane, a circle intersecting a surface). Real‑world data are LiDAR scans of urban scenes containing buildings, roads, and vegetation. Compared with baseline methods such as DBSCAN, spectral clustering, and recent graph‑based topological clustering, the proposed method achieves 15–30 % higher stratification accuracy, especially reducing misclassifications in thin intersection regions. Moreover, the empirical number of samples needed to reach a target confidence aligns closely with the theoretical bound, confirming the practical relevance of the probabilistic analysis.

In summary, the paper makes three major contributions: (i) a rigorous multi‑scale definition of stratified spaces suitable for point‑cloud analysis; (ii) a novel kernel‑cokernel persistent homology pipeline that translates topological differences into a clustering metric with provable correctness; (iii) a probabilistic sample‑complexity framework and a concrete algorithm validated on both synthetic and real data. The authors suggest future work on handling non‑i.i.d. sampling, high‑noise regimes, automated scale selection, and hybrid approaches that integrate deep learning with persistent homology for even larger and more complex datasets.

💡 Research Summary

📜 Original Paper Content