Cross-Fusion Distance: A Novel Metric for Measuring Fusion and Separability Between Data Groups in Representation Space

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Quantifying degrees of fusion and separability between data groups in representation space is a fundamental problem in representation learning, particularly under domain shift. A meaningful metric should capture fusion-altering factors like geometric displacement between representation groups, whose variations change the extent of fusion, while remaining invariant to fusion-preserving factors such as global scaling and sampling-induced layout changes, whose variations do not. Existing distributional distance metrics conflate these factors, leading to measures that are not informative of the true extent of fusion between data groups. We introduce Cross-Fusion Distance (CFD), a principled measure that isolates fusion-altering geometry while remaining robust to fusion-preserving variations, with linear computational complexity. We characterize the invariance and sensitivity properties of CFD theoretically and validate them in controlled synthetic experiments. For practical utility on real-world datasets with domain shift, CFD aligns more closely with downstream generalization degradation than commonly used alternatives. Overall, CFD provides a theoretically grounded and interpretable distance measure for representation learning.

💡 Research Summary

The paper addresses a fundamental challenge in representation learning: measuring how much two groups of latent representations are fused (overlapping) or separated, especially under domain shift, where nuisance factors such as scaling, sampling, or internal deformation may obscure the true relationship. Existing distributional distances, namely Wasserstein distance, Maximum Mean Discrepancy (MMD), Hausdorff distance, and Chamfer distance, conflate “fusion-altering” factors (geometric displacement between groups) with “fusion-preserving” factors (global scaling, internal shape changes, sampling variations). As a result, these metrics respond to variations that do not change the extent of fusion and fail to isolate the factors that do, which limits their interpretability and diagnostic value.
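The sensitivity to global scaling can be illustrated with a small numerical sketch. It uses one common variant of the Chamfer distance (symmetric, unsquared; conventions differ between libraries) on synthetic Gaussian clouds. The function name `chamfer_distance`, the cloud sizes, and the scale factor are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric, unsquared Chamfer distance (one of several conventions)."""
    # Pairwise Euclidean distances between the two clouds, shape (n_a, n_b).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in each direction, summed.
    return d.min(axis=1).mean() + d.min(axis=0).mean()


# Synthetic stand-ins for two representation groups (assumed, not real data).
rng = np.random.default_rng(0)
z_a = rng.normal(loc=0.0, scale=1.0, size=(200, 8))
z_b = rng.normal(loc=1.5, scale=1.0, size=(300, 8))

# Rescaling both clouds by the same factor leaves their relative layout,
# and hence their degree of overlap, unchanged.
scale = 10.0
base = chamfer_distance(z_a, z_b)
scaled = chamfer_distance(scale * z_a, scale * z_b)

# The distance grows in proportion to the scale factor.
ratio = scaled / base
print(f"base={base:.4f} scaled={scaled:.4f} ratio={ratio:.4f}")
```

Because every pairwise distance is multiplied by the same factor, the reported value changes even though the overlap between the two groups does not, which is the kind of conflation described above.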

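To overcome these limitations, the authors propose Cross-Fusion Distance (CFD). Let \(z_A\) and \(z_B\) be the point clouds of groups A and B, with sizes \(n_A, n_B\), empirical means \(\mu_A, \mu_B\), and within-group variances \(\sigma_A^2, \sigma_B^2\). Define the weights \(w_A = n_A/(n_A+n_B)\) and \(w_B = n_B/(n_A+n_B)\) and the fused centroid \(\mu_{AB} = w_A\mu_A + w_B\mu_B\). Assuming each variance is the mean squared Euclidean distance of a group's points to its mean, and writing \(\sigma_{AB}^2\) for the total variance of the combined cloud, the standard law of total variance gives the decomposition

\[
\sigma_{AB}^2 = \underbrace{w_A\sigma_A^2 + w_B\sigma_B^2}_{\text{within-group}} + \underbrace{w_A w_B \lVert \mu_A - \mu_B \rVert^2}_{\text{between-group}}.
\]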
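The split of the combined cloud's total variance into a within-group part and a between-group part can be checked numerically. The sketch below is not the paper's CFD computation; it only verifies the standard variance identity, under the assumption that each variance is the mean squared Euclidean distance to the group mean (the trace of the population covariance). The helper `total_variance` and the synthetic clouds are hypothetical.

```python
import numpy as np


def total_variance(z):
    """Mean squared Euclidean distance of the rows of z to their centroid."""
    return np.mean(np.sum((z - z.mean(axis=0)) ** 2, axis=1))


# Synthetic stand-ins for the two groups (assumed, not real data).
rng = np.random.default_rng(0)
z_a = rng.normal(loc=0.0, scale=1.0, size=(200, 8))
z_b = rng.normal(loc=1.5, scale=2.0, size=(300, 8))

# Size-proportional weights and the fused centroid, as defined in the text.
n_a, n_b = len(z_a), len(z_b)
w_a, w_b = n_a / (n_a + n_b), n_b / (n_a + n_b)
mu_a, mu_b = z_a.mean(axis=0), z_b.mean(axis=0)
mu_ab = w_a * mu_a + w_b * mu_b

# Within-group part: weighted average of the two group variances.
within = w_a * total_variance(z_a) + w_b * total_variance(z_b)
# Between-group part: w_A * w_B * ||mu_A - mu_B||^2.
between = w_a * w_b * np.sum((mu_a - mu_b) ** 2)
# Total variance of the pooled cloud, computed directly.
combined = total_variance(np.vstack([z_a, z_b]))

print(f"combined={combined:.6f} within+between={within + between:.6f}")
```

Only the between-group term depends on the displacement between the two means, while a global rescaling of both clouds multiplies all three quantities by the same factor.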
