Fubini Study geometry of representation drift in high dimensional data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

High dimensional representation drift is commonly quantified using Euclidean or cosine distances, which presuppose fixed coordinates when comparing representations across time, training or preprocessing stages. While effective in many settings, these measures entangle intrinsic changes in the data with variations induced by arbitrary parametrizations. We introduce a projective geometric view of representation drift grounded in the Fubini Study metric, which identifies representations that differ only by gauge transformations such as global rescalings or sign flips. Applying this framework to empirical high dimensional datasets, we explicitly construct representation trajectories and track their evolution through cumulative geometric drift. Comparing Euclidean, cosine and Fubini Study distances along these trajectories reveals that conventional metrics systematically overestimate change whenever representations carry genuine projective ambiguity. By contrast, the Fubini Study metric isolates intrinsic evolution by remaining invariant under gauge-induced fluctuations. We further show that the difference between cosine and Fubini Study drift defines a computable, monotone quantity that directly captures representation churn attributable to gauge freedom. This separation provides a diagnostic for distinguishing meaningful structural evolution from parametrization artifacts, without introducing model-specific assumptions. Overall, we establish a geometric criterion for assessing representation stability in high-dimensional systems and clarify the limits of angular distances. Embedding representation dynamics in projective space connects data analysis with established geometric programs and yields observables that are directly testable in empirical workflows.

💡 Research Summary

The paper addresses a fundamental problem in high‑dimensional data analysis: how to quantify the drift of learned or extracted representations over time, training epochs, or preprocessing steps. Traditionally, researchers have relied on Euclidean distance (which measures raw coordinate differences) or cosine distance (which measures angular differences after normalizing vectors). Both approaches implicitly assume that the coordinate system is fixed and that the representations are points in a vector space. In many practical settings—principal component analysis, neural network embeddings, latent variable models—the representations are only defined up to a global scaling, sign flip, or more general gauge transformation. Consequently, Euclidean and cosine distances conflate genuine structural changes with artefacts introduced by arbitrary parametrizations.

The authors propose a projective‑geometric solution based on the Fubini‑Study (FS) metric, a natural distance on complex (or real) projective space. For two normalized vectors u and v, the real FS distance is defined as
d_FS(u, v) = arccos |⟨u, v⟩|,
where the absolute value removes sensitivity to sign. This metric is invariant under any non‑zero scalar multiplication, i.e., it treats vectors that differ only by a global scale or a sign as identical. In other words, it measures the angle between the underlying rays (one‑dimensional subspaces) rather than between the vectors themselves.
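
As a minimal sketch of the definition above (not the authors' code), the real FS distance can be computed next to the ordinary sign-sensitive angle. The function names `fs_distance` and `angular_distance` are illustrative, and the inputs are assumed to be non-zero real vectors given as plain Python lists.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two non-zero vectors, clamped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp so rounding error never pushes the value outside acos's domain.
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def fs_distance(u, v):
    """Real Fubini-Study distance: the angle between the lines through u and v."""
    return math.acos(abs(_cosine(u, v)))


def angular_distance(u, v):
    """Sign-sensitive angle between the vectors themselves, for comparison."""
    return math.acos(_cosine(u, v))


u = [1.0, 0.0]
v = [1.0, 1.0]
flipped = [-3.0 * x for x in v]  # same ray as v: rescaled and sign-flipped

print(fs_distance(u, v), fs_distance(u, flipped))   # both equal pi/4
print(angular_distance(u, flipped))                 # 3*pi/4: the flip is counted
```

Rescaling and flipping `v` leaves the FS distance unchanged, while the sign-sensitive angle moves from π/4 to 3π/4.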

To demonstrate the practical impact of this metric, the authors construct representation trajectories from the publicly available handwritten‑digit dataset (64‑dimensional grayscale images). They impose an artificial ordering by sliding a fixed‑size window across the dataset indices. For each window they center the data, perform singular value decomposition, and extract the first principal component (PC1) direction. Because PC1 is defined only up to a sign, the resulting sequence of vectors lives naturally in ℝP⁶³, providing a clean test case for projective ambiguity.
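
The windowing procedure described above can be sketched as follows. This is an illustration under stated assumptions: the helper name `pc1_trajectory`, the window size, and the step are hypothetical (the summary does not give the authors' values), and a synthetic 64-dimensional Gaussian matrix stands in for the digit dataset.

```python
import numpy as np


def pc1_trajectory(X, window, step):
    """Return one unit-norm PC1 direction per sliding window over the rows of X."""
    directions = []
    for start in range(0, X.shape[0] - window + 1, step):
        block = X[start:start + window]
        centered = block - block.mean(axis=0)
        # Rows of vt are right singular vectors; the first one is PC1.
        # Its overall sign is arbitrary, which is the projective ambiguity.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        directions.append(vt[0])
    return np.array(directions)


rng = np.random.default_rng(0)
# Synthetic stand-in for the data: 400 samples in 64 dimensions,
# stretched along axis 0 so that PC1 is well defined.
X = rng.normal(size=(400, 64))
X[:, 0] *= 5.0

# Window and step are assumed values chosen only for this example.
traj = pc1_trajectory(X, window=100, step=50)
print(traj.shape)  # one 64-dimensional direction per window
```

Each row of `traj` is a representative of a point in ℝP⁶³; negating any row describes the same principal direction.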

Three distance families are computed between successive windows: (i) Euclidean distance, (ii) cosine distance (the angle between unit‑normalized vectors, reported in radians), and (iii) FS distance (the same angle computed from the absolute value of the inner product). The stepwise distances are summed to obtain cumulative drift curves for each metric. The key observations are:

Cosine vs. FS drift – Across 32 windows the authors detect 17 sign‑flip events (negative inner products between consecutive PC1 vectors). At a flip the sign‑sensitive angle is π − d_FS instead of d_FS, so the cosine step exceeds the FS step by π − 2·d_FS, a spurious jump of up to π. These jumps accumulate to a cosine drift of 51.15 rad, whereas the FS drift is unaffected by flips and ends at 14.72 rad. The difference (36.43 rad) is reported as statistically significant (two‑sided sign test, p < 0.001) and quantifies the “gauge‑induced” component of drift. Outside the flip events, cosine and FS stepwise increments coincide, confirming that the two metrics agree on intrinsic directional changes.
Euclidean vs. FS drift – Euclidean drift grows monotonically and far faster than FS drift, reflecting sensitivity to both magnitude and direction. The authors plot a logarithmic ratio log((Δ_Eucl + ε)/(Δ_FS + ε)) (ε = 10⁻⁸) which rises steadily, indicating that raw magnitude change dominates the Euclidean measure. This demonstrates that Euclidean distance conflates scale variations (which may be irrelevant for many downstream tasks) with genuine orientation changes.
Decomposition of drift – By comparing the three curves the authors effectively separate observed drift into three components: (a) raw magnitude change (Euclidean − FS), (b) sign‑sensitive angular change (cosine − FS), and (c) intrinsic projective change (FS). This decomposition requires no alignment, no reference vector, and no model‑specific hyper‑parameters.
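
The cosine versus FS comparison can be reproduced on a toy trajectory. The sketch below assumes the angular (arccos) form of cosine distance, consistent with the radian totals quoted above; `cumulative_drift` is a hypothetical helper name, and the four-point trajectory is constructed by hand rather than taken from the paper.

```python
import math


def unit_dot(u, v):
    """Cosine similarity of two non-zero vectors, clamped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def cumulative_drift(trajectory):
    """Return (cosine_total, fs_total, flips) summed over consecutive pairs."""
    cos_total = 0.0
    fs_total = 0.0
    flips = 0
    for u, v in zip(trajectory, trajectory[1:]):
        c = unit_dot(u, v)
        cos_total += math.acos(c)       # sign-sensitive angle
        fs_total += math.acos(abs(c))   # projective (FS) angle
        if c < 0:                       # negative inner product: a sign flip
            flips += 1
    return cos_total, fs_total, flips


# A direction rotating by 0.1 rad per step, with arbitrary signs attached
# to the second and third points to mimic the PC1 sign ambiguity.
angles = [0.0, 0.1, 0.2, 0.3]
signs = [1.0, -1.0, -1.0, 1.0]
trajectory = [[s * math.cos(a), s * math.sin(a)] for a, s in zip(angles, signs)]

cos_total, fs_total, flips = cumulative_drift(trajectory)
gauge_excess = cos_total - fs_total  # drift attributable to sign flips alone
print(flips, fs_total, cos_total, gauge_excess)
```

Here the underlying direction rotates by 0.3 rad in total, which is what the FS drift reports. The two flips inflate the cosine drift to 2π − 0.1 rad, so the excess is 2·(π − 2·0.1) rad, matching the per‑flip surplus of π − 2·d_FS.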

From a methodological standpoint, the FS metric is computationally cheap: it reuses the inner‑product and normalization steps already needed for cosine distance, adding only an absolute‑value operation before the arccosine. Hence it can be dropped into existing pipelines without appreciable overhead.

The paper also discusses limitations and future directions. The empirical validation is limited to a single, linear‑feature dataset and to the first principal component; extending the analysis to deep, nonlinear embeddings (e.g., variational autoencoders, transformer language models) would test the robustness of the approach. Moreover, while the real FS metric handles sign ambiguity, complex‑valued representations (common in quantum‑inspired models) would require the full complex FS metric. Finally, the authors suggest exploring broader gauge groups (rotations, linear transformations) that go beyond simple scaling and sign changes.

In conclusion, the work establishes the Fubini‑Study metric as a principled, gauge‑invariant distance for high‑dimensional representation drift. It shows that conventional Euclidean and cosine distances systematically overestimate change when representations possess projective ambiguity, and it provides a simple, monotone quantity (cosine – FS drift) that isolates the artefactual component. This geometric perspective bridges representation analysis with well‑studied concepts from projective geometry and quantum state space theory, offering a clear diagnostic tool for researchers who need to distinguish meaningful structural evolution from parametrization noise in any high‑dimensional learning system.
