The Representational Geometry of Number

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

A central question in cognitive science is whether conceptual representations converge onto a shared manifold to support generalization, or diverge into orthogonal subspaces to minimize task interference. While prior work has found evidence for both, a mechanistic account of how these properties coexist and transform across tasks remains elusive. We propose that representational sharing lies not in the concepts themselves, but in the geometric relations between them. Using number concepts as a testbed and language models as high-dimensional computational substrates, we show that number representations preserve a stable relational structure across tasks. Task-specific representations are embedded in distinct subspaces, with low-level features like magnitude and parity encoded along separable linear directions. Crucially, we find that these subspaces are largely transformable into one another via linear mappings, indicating that representations share relational structure despite being located in distinct subspaces. Together, these results provide a mechanistic lens on how language models balance the shared structure of number representation with functional flexibility. They suggest that understanding arises when task-specific transformations are applied to a shared underlying relational structure of conceptual representations.


💡 Research Summary

The paper tackles a central debate in cognitive science: whether conceptual representations converge onto a shared manifold that promotes generalization, or diverge into orthogonal subspaces to avoid task interference. Using large language models (LLMs) as high‑dimensional computational substrates, the authors investigate this question with number concepts as a testbed. They probe four models—BERT, GPT‑2, Qwen2.5, and a math‑specialized variant Qwen2.5‑Math—by extracting contextual embeddings of the digits 1‑9 across a suite of seven numerical tasks (quantity, comparison, arithmetic, parity, primality, successor, predecessor) and two formats (Arabic digit, English word). Embeddings are taken from the middle‑to‑late layers (≈75 % depth) and L2‑normalized; each number’s representation is the mean across task instances.
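The pooling step described above can be sketched as follows. This is a minimal illustration using synthetic activations in place of real model hidden states; the array sizes (14 task instances, hidden size 768) and the helper name `number_representation` are our assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for hidden states extracted at ~75% model depth:
# 7 tasks x 2 formats = 14 instances per number, hidden size 768 (hypothetical).
n_instances, hidden = 14, 768
instances = {n: rng.normal(size=(n_instances, hidden)) for n in range(1, 10)}

def number_representation(acts: np.ndarray) -> np.ndarray:
    """L2-normalize each instance, then average across task instances,
    as the summary describes."""
    unit = acts / np.linalg.norm(acts, axis=1, keepdims=True)
    return unit.mean(axis=0)

reps = {n: number_representation(a) for n, a in instances.items()}
print(reps[1].shape)  # (768,)
```

Whether normalization happens before or after averaging is not specified in the summary; the sketch normalizes per instance first.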

First, the authors verify that the three classic psychophysical effects of the Mental Number Line (distance, size, and ratio effects) are present in the cosine similarity of the embeddings. Both distance and ratio effects show near‑perfect linear or logarithmic fits across all models and tasks; the size effect is stable in BERT but more variable in Qwen2.5‑Math, reflecting finer task‑specific distinctions in the latter.
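These effects can be reproduced in a toy model of the mental number line. The sketch below uses Gaussian tuning curves on a logarithmic scale — a standard textbook idealization, not the paper's embeddings — and checks that cosine similarity falls with numerical distance and rises, at fixed distance, with magnitude.

```python
import numpy as np

def number_code(n, centers, width=0.3):
    # Gaussian tuning curves on a log scale: a common toy model of the
    # mental number line (our illustration, not the paper's method).
    return np.exp(-((np.log(n) - centers) ** 2) / (2 * width ** 2))

centers = np.linspace(0, np.log(9), 50)
codes = {n: number_code(n, centers) for n in range(1, 10)}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Distance effect: similarity falls off with numerical distance.
assert cos(codes[4], codes[5]) > cos(codes[4], codes[7])
# Size/ratio effect: under log compression, equal distances are more
# similar for larger numbers, e.g. sim(8, 9) > sim(1, 2).
assert cos(codes[8], codes[9]) > cos(codes[1], codes[2])
```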

To assess whether a common relational scaffold underlies these task‑specific embeddings, they apply Procrustes analysis. After optimal scaling, rotation, and translation, the disparity between any pair of task subspaces is extremely low (mean ≈ 0.01), far below a permutation baseline (0.07–0.27). This indicates that while absolute coordinates differ, the relative geometry of numbers (e.g., the ordering 1‑2‑3) is preserved across tasks, formats, and model families.
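The Procrustes comparison can be sketched with `scipy.spatial.procrustes`. Here the two "task subspaces" are synthetic 9-point configurations related by an exact rotation, scaling, and translation, so the disparity is near zero, while a row-shuffled copy plays the role of the permutation baseline.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Hypothetical task configurations: 9 numbers x 2 dims. Task B is a rotated,
# scaled, shifted copy of task A, mimicking shared relational geometry.
A = rng.normal(size=(9, 2))
theta = 0.8
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 1.7 * A @ R + 3.0

_, _, disparity = procrustes(A, B)        # alignable: near zero
_, _, disp_base = procrustes(A, B[::-1])  # shuffled baseline: much larger

assert disparity < 1e-9
assert disp_base > disparity
```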

Next, they quantify subspace overlap asymmetrically: the top‑k principal components of task A are used to explain variance in task B. Across most task pairs, a large proportion of variance (>70 %) is captured, showing that task subspaces are not fully independent but share a high‑dimensional backbone.
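One way to implement this asymmetric overlap measure is to project task B's centered embeddings onto the top-k principal directions of task A and report the fraction of variance captured. The sketch below constructs two synthetic tasks that share a low-dimensional backbone; the function name `variance_explained` and the dimensions are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings: 9 numbers x 64 dims, two tasks sharing a
# rank-4 backbone plus small task-specific noise.
shared = rng.normal(size=(9, 4)) @ rng.normal(size=(4, 64))
task_a = shared + 0.1 * rng.normal(size=(9, 64))
task_b = shared + 0.1 * rng.normal(size=(9, 64))

def variance_explained(source, target, k=4):
    """Project target onto the top-k PCs of source; return the fraction
    of target variance captured (one way to quantify asymmetric overlap)."""
    src = source - source.mean(axis=0)
    tgt = target - target.mean(axis=0)
    _, _, vt = np.linalg.svd(src, full_matrices=False)
    proj = tgt @ vt[:k].T
    return (proj ** 2).sum() / (tgt ** 2).sum()

overlap_ab = variance_explained(task_a, task_b)
assert overlap_ab > 0.7  # shared backbone -> high cross-task variance capture
```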

Functional equivalence is further examined with Singular Vector Canonical Correlation Analysis (SVCCA). After reducing each task’s embeddings via PCA, CCA yields canonical correlations ρ₁…ρₙ with an average ρ̄ of 0.82–0.91, demonstrating that linear mappings can align task representations with high fidelity. Notably, low‑level features such as magnitude and parity (or primality) occupy nearly orthogonal directions, suggesting a built‑in mechanism for minimizing interference while preserving shared relational information.

Visualization with t‑SNE reveals that numbers maintain a sequential layout within each task cluster for both BERT and Qwen2.5‑Math. However, Qwen2.5‑Math shows sharper segregation of tasks (e.g., addition vs. multiplication) and clearer linear separability of parity and primality, reflecting the impact of domain‑specific pre‑training on subspace granularity.
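A t-SNE projection of this kind can be sketched with scikit-learn. The stand-in data below (9 numbers sharing a sequential code, offset per task) are hypothetical; only the plotting-free projection step is shown.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)

# Hypothetical stand-in data: 9 numbers x 7 tasks. Each task adds its own
# offset to a shared sequential code (not the paper's real embeddings).
line = np.linspace(0, 1, 9)[:, None] * rng.normal(size=(1, 32))
points = np.vstack([line + rng.normal(size=(1, 32)) for _ in range(7)])

# Project 63 points to 2-D; perplexity must stay below the sample count.
emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(points)
print(emb.shape)  # (63, 2)
```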

Overall, the study proposes a unified mechanistic account: (1) a stable, low‑dimensional relational geometry of numbers is shared across tasks, formats, and models; (2) each task projects this shared scaffold into its own subspace via a largely linear transformation; (3) these subspaces retain high linear similarity (SVCCA) yet encode distinct low‑level attributes in orthogonal directions, thereby balancing generalization with interference avoidance. This framework bridges theoretical accounts of representation disentanglement and shared structure, offering concrete design principles for multitask learning, model interpretability, and the development of AI systems that mirror human numerical cognition.

