Topological Data Analysis of Single-cell Hi-C Contact Maps
In this article, we show how the recent statistical techniques developed in Topological Data Analysis for the Mapper algorithm can be extended and leveraged to formally define and statistically quantify the presence of topological structures coming from biological phenomena in datasets of CCC contact maps.
Research Summary
The paper presents a novel analytical pipeline that combines recent advances in Topological Data Analysis (TDA) with statistical techniques to rigorously detect and quantify topological structures in single-cell Hi-C contact maps. After a concise introduction to chromosome conformation capture (CCC) technologies and the specific challenges of Hi-C data (high dimensionality, sparsity, and systematic biases), the authors focus on the stratum-adjusted correlation coefficient (SCC). SCC groups matrix entries by genomic distance, computes a weighted average of Pearson correlations within each stratum, and yields a single similarity score from which a distance between contact maps can be derived.
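To make the stratum-wise computation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the weighting (stratum size times the product of the two standard deviations) and the conversion `1 - SCC` to a dissimilarity are assumptions chosen for illustration, and the function names `diagonal`, `covariance`, `scc`, and `scc_distance` are hypothetical.

```python
import math


def diagonal(matrix, k):
    """Entries of a square matrix at genomic distance k (the k-th upper diagonal)."""
    n = len(matrix)
    return [matrix[i][i + k] for i in range(n - k)]


def covariance(x, y):
    """Population covariance of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)


def scc(map_a, map_b, max_distance):
    """Weighted average of per-stratum Pearson correlations."""
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(max_distance + 1):
        x, y = diagonal(map_a, k), diagonal(map_b, k)
        if len(x) < 2:
            continue  # too few entries to correlate
        var_x, var_y = covariance(x, x), covariance(y, y)
        if var_x == 0 or var_y == 0:
            continue  # correlation is undefined for a constant stratum
        scale = math.sqrt(var_x * var_y)
        pearson = covariance(x, y) / scale
        weight = len(x) * scale  # assumed weighting, not taken from the paper
        weighted_sum += weight * pearson
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else float("nan")


def scc_distance(map_a, map_b, max_distance):
    """One simple (assumed) way to turn the similarity into a dissimilarity."""
    return 1.0 - scc(map_a, map_b, max_distance)
```

Strata that are constant in either map are skipped here because the Pearson correlation is undefined for them; with very sparse single-cell maps this can discard many strata, so a real analysis would likely smooth the maps first.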
Using SCC-derived distances, the authors construct a Mapper simplicial complex. The Mapper requires a filter (or lens) function; here the authors employ a one-dimensional embedding of the SCC distance matrix (e.g., the first principal component) as the filter. The image of the filter is covered by overlapping hypercubes (intervals, for a one-dimensional filter), the pre-image of each cover element is clustered via single-linkage with a scale parameter δ, and the nerve of the resulting cover produces the Mapper. The Mapper approximates the Reeb space of the underlying data, and the paper cites convergence results that guarantee this approximation under suitable choices of δ and cover resolution.
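The construction just described can be sketched for a one-dimensional filter as follows. This is an illustrative toy, not the authors' code: it builds only the graph (1-skeleton) of the nerve, takes a precomputed distance matrix, and the names `interval_cover`, `single_linkage_clusters`, and `mapper_graph` as well as the parameters `n_intervals` and `overlap` are hypothetical.

```python
import math


def interval_cover(values, n_intervals, overlap):
    """Cover [min, max] with equal intervals sharing a fraction `overlap`."""
    lo, hi = min(values), max(values)
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against rounding at the right end
    return cover


def single_linkage_clusters(points, dist, delta):
    """Connected components of the graph linking points at distance <= delta."""
    unvisited = set(points)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            p = stack.pop()
            near = {q for q in unvisited if dist[p][q] <= delta}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(dist, filter_values, n_intervals, overlap, delta):
    """Nodes are clusters of interval pre-images; edges join clusters sharing a point."""
    nodes = []
    for a, b in interval_cover(filter_values, n_intervals, overlap):
        preimage = [i for i, v in enumerate(filter_values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(preimage, dist, delta))
    edges = {
        (i, j)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if nodes[i] & nodes[j]
    }
    return nodes, edges


# Toy check: 16 points on the unit circle, filtered by their x-coordinate.
angles = [2 * math.pi * i / 16 for i in range(16)]
points = [(math.cos(t), math.sin(t)) for t in angles]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
nodes, edges = mapper_graph(
    dist, [p[0] for p in points], n_intervals=3, overlap=0.3, delta=0.5
)
print(len(nodes), len(edges))
```

On this toy input the middle interval splits into two clusters (the upper and lower arcs), so the graph has four nodes joined in a ring by four edges, which is the kind of loop the Mapper is meant to expose. In the setting of the paper, `dist` would hold the SCC-derived distances and `filter_values` the one-dimensional embedding.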
To endow the Mapper with statistical confidence, the authors turn to extended persistence diagrams. A scalar function is defined on the nodes of the Mapper, and the diagram records birth-death pairs of topological features (connected components, cycles, etc.). The bottleneck distance provides a metric between diagrams, enabling a bootstrap procedure: repeatedly resample the point cloud, recompute the Mapper and its diagram, and calculate the bottleneck distance to the original diagram. The empirical distribution of these distances yields confidence regions for the persistence points. Features whose confidence boxes do not intersect the diagonal are declared statistically significant, i.e., they persist in the limiting Reeb space with a prescribed confidence level.
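The bootstrap loop can be sketched generically as below. This is a skeleton under stated assumptions rather than the paper's procedure: `diagram_fn` (data to persistence diagram, e.g. via a Mapper) and `distance_fn` (a bottleneck distance between two diagrams) are hypothetical callables the reader must supply, and `n_boot`, `alpha`, and `seed` are illustrative parameters. The significance rule assumes square confidence boxes in the sup norm, so a box of half-width `threshold` around a point (b, d) misses the diagonal exactly when |d - b| > 2 * threshold.

```python
import math
import random


def bootstrap_threshold(data, diagram_fn, distance_fn, n_boot=100, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the distances between bootstrap
    diagrams and the diagram of the full data set."""
    rng = random.Random(seed)
    reference = diagram_fn(data)
    distances = []
    for _ in range(n_boot):
        # Resample the data with replacement, keeping the original size.
        resample = [rng.choice(data) for _ in data]
        distances.append(distance_fn(diagram_fn(resample), reference))
    distances.sort()
    index = math.ceil((1 - alpha) * n_boot) - 1
    index = max(0, min(n_boot - 1, index))
    return reference, distances[index]


def significant_points(diagram, threshold):
    """Keep the points whose confidence box does not meet the diagonal."""
    return [(b, d) for b, d in diagram if abs(d - b) > 2 * threshold]
```

A typical call would be `reference, q = bootstrap_threshold(cells, diagram_fn, distance_fn)` followed by `significant_points(reference, q)`, where `cells` is a hypothetical list of per-cell observations.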
The methodology is applied to a recent single-cell Hi-C dataset (Nagano et al., 2017). After preprocessing the contact matrices, pairwise SCC distances are computed, and a one-dimensional filter is derived. The resulting Mapper exhibits a clear one-dimensional loop. Bootstrap analysis shows that the dominant persistence points of this loop lie well above the diagonal with >95% confidence, confirming the presence of a circular topological signature. When cells are colored by their known cell-cycle phase, the loop aligns with the progression G1 → S → G2/M → G1, providing a biologically meaningful validation of the topological signal.
In summary, the paper makes three key contributions: (1) introduction of SCC as an effective similarity measure for Hi-C contact maps; (2) integration of Mapper and extended persistence diagrams to extract and visualize intrinsic topological features; (3) a bootstrap-based statistical framework that yields rigorous confidence statements about those features. The approach is computationally tractable, statistically sound, and broadly applicable to other high-dimensional genomic contact data, offering a powerful new lens for interpreting the three-dimensional organization of the genome.