Training-Driven Representational Geometry Modularization Predicts Brain Alignment in Language Models
How large language models (LLMs) align with the neural representation and computation of human language is a central question in cognitive science. Using representational geometry as a mechanistic lens, we addressed this question by tracking entropy, curvature, and fMRI encoding scores throughout Pythia (70M-1B) training. We identified a geometric modularization in which layers self-organize into stable low- and high-complexity clusters. The low-complexity module, characterized by reduced entropy and curvature, consistently better predicted human language network activity. This alignment followed heterogeneous spatiotemporal trajectories: rapid and stable in temporal regions (AntTemp, PostTemp), but delayed and dynamic in frontal areas (IFG, IFGorb). Crucially, reduced curvature remained a robust predictor of model-brain alignment even after controlling for training progress, an effect that strengthened with model scale. These results link training-driven geometric reorganization to temporal-frontal functional specialization, suggesting that representational smoothing facilitates neural-like linguistic processing.
💡 Research Summary
This paper investigates how the internal representational geometry of large language models (LLMs) evolves during training and how this evolution relates to alignment with human brain activity measured by functional MRI (fMRI). The authors focus on the Pythia family of transformer‑based models, spanning four scales from 70 million to 1 billion parameters. For each model they extract layer‑wise hidden states across 1,000 sentences at 19 logarithmically spaced training checkpoints (from step 1 to 143 k). Two geometric metrics are computed per layer and per sentence: (1) Von Neumann entropy of the token‑wise Gram matrix, quantifying spectral dispersion (high entropy = more distributed representations, low entropy = compressed, low‑rank representations); and (2) curvature, defined as the mean turning angle between successive token vectors, capturing the smoothness of token trajectories (low curvature = smoother, globally consistent trajectories). These metrics are averaged across sentences to obtain stable layer‑wise descriptors at each checkpoint.
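The two metrics can be made concrete with a short sketch. The function names, the `(n_tokens, d_model)` input shape, and the exact normalization are assumptions for illustration; the paper defines the metrics only as spectral entropy of the token-wise Gram matrix and the mean turning angle between successive token vectors.

```python
import numpy as np

def von_neumann_entropy(hidden: np.ndarray) -> float:
    """Spectral entropy of the token-wise Gram matrix.

    `hidden` is (n_tokens, d_model): one sentence at one layer.
    High entropy = eigenvalue mass spread over many directions
    (distributed); low entropy = compressed, low-rank representations.
    """
    gram = hidden @ hidden.T                      # (n_tokens, n_tokens)
    eigvals = np.linalg.eigvalsh(gram)
    eigvals = eigvals[eigvals > 1e-12]            # drop numerical zeros
    p = eigvals / eigvals.sum()                   # normalize spectrum to sum to 1
    return float(-(p * np.log(p)).sum())

def mean_curvature(hidden: np.ndarray) -> float:
    """Mean turning angle (radians) between successive token displacements.

    Low curvature = smoother, more globally consistent trajectories.
    """
    diffs = hidden[1:] - hidden[:-1]              # successive token displacements
    diffs = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    cosines = np.clip((diffs[:-1] * diffs[1:]).sum(axis=1), -1.0, 1.0)
    return float(np.arccos(cosines).mean())
```

Per the paper's pipeline, these would be computed per layer and per sentence, then averaged over the 1,000 sentences at each checkpoint.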
The authors concatenate entropy and curvature across all checkpoints for each layer, forming a “geometry‑trajectory vector”. Applying k‑means clustering with K = 2 to these vectors reveals a robust bifurcation of layers into a low‑complexity module (layers 4‑15) and a high‑complexity module (the remaining layers). Stability analyses (bootstrap over checkpoints and leave‑one‑checkpoint‑out) show >92 % consistency, indicating that the modular split is not an artifact of any particular training stage.
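The clustering step amounts to k-means over per-layer trajectory vectors. A minimal sketch, with synthetic data in place of real Pythia metrics (the layer count, the injected offset, and array shapes are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical shapes: one entropy and one curvature value per layer per checkpoint.
rng = np.random.default_rng(0)
n_layers, n_ckpts = 24, 19
entropy = rng.normal(size=(n_layers, n_ckpts))
curvature = rng.normal(size=(n_layers, n_ckpts))
# Synthetic stand-in for the reported effect: make layers 4-15 low-complexity.
entropy[4:16] -= 3.0
curvature[4:16] -= 3.0

# Concatenate both metrics across checkpoints: one trajectory vector per layer.
traj = np.hstack([entropy, curvature])            # (n_layers, 2 * n_ckpts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(traj)
```

The stability analyses in the paper would then repeat this fit over bootstrap samples of checkpoints (and with each checkpoint left out) and measure how often each layer keeps its cluster assignment.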
To assess brain alignment, the study uses the TUCKUTE2024 fMRI dataset, which contains responses from five participants passively reading the same 1,000 sentences. Five left‑hemisphere language regions of interest (ROIs) are examined: anterior and posterior temporal cortex (AntTemp, PostTemp), inferior frontal gyrus (IFG), its orbital part (IFGorb), and middle frontal gyrus (MFG). For each checkpoint, ridge regression (5‑fold cross‑validation) maps model activations to voxel‑wise fMRI responses, yielding a Pearson correlation score per ROI (the “encoding score”).
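The encoding procedure described above can be sketched as follows. The function name, the ridge penalty, and the voxel-averaging scheme are assumptions; the paper specifies only ridge regression with 5-fold cross-validation scored by Pearson correlation per ROI.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_score(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0) -> float:
    """Cross-validated encoding score for one layer/checkpoint and one ROI.

    X: (n_sentences, n_features) model activations.
    Y: (n_sentences, n_voxels) fMRI responses.
    Returns the Pearson r between predicted and held-out responses,
    averaged over voxels and folds.
    """
    scores = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        pred = Ridge(alpha=alpha).fit(X[train], Y[train]).predict(X[test])
        for v in range(Y.shape[1]):
            scores.append(np.corrcoef(pred[:, v], Y[test, v])[0, 1])
    return float(np.mean(scores))
```

Run once per checkpoint, layer (or module), and ROI, this yields the encoding-score trajectories compared between the low- and high-complexity modules.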
Across all checkpoints, the low‑complexity module consistently yields higher encoding scores than the high‑complexity module. The advantage is strongest in temporal ROIs (effect sizes d≈2.0) and more modest in frontal ROIs (d≈0.8‑0.9). Temporal regions show an early onset of this advantage (detectable by step ≤ 64) and quickly stabilize around step 512, whereas frontal regions exhibit a delayed, more dynamic trajectory; IFGorb even briefly favors the high‑complexity module early on before switching. MFG shows the weakest and most unstable module difference.
The authors then examine whether the geometric metrics themselves predict encoding performance beyond mere training progress. Correlating checkpoint‑averaged curvature (within the low‑complexity module) with encoding scores yields very strong negative relationships (|r| > 0.91) for all ROIs except MFG, indicating that as curvature flattens, brain alignment improves. Entropy shows weaker, often non‑significant correlations. To control for the confound of training time, they fit ROI‑specific mixed‑effects regressions: encoding ~ curvature + entropy + log(step) + layer (random). Curvature retains a significant negative coefficient (β ≈ ‑0.9, FDR‑corrected q < 0.05) across all ROIs, while entropy’s effect is small or non‑significant. This demonstrates that curvature is an independent predictor of brain alignment.
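The key control in that regression can be illustrated with a simplified fixed-effects sketch. The paper fits mixed-effects models with random layer intercepts; here a plain OLS regression on standardized variables shows the same question: does curvature predict encoding once log(training step) is in the model? All names and the simulation below are illustrative assumptions.

```python
import numpy as np

def curvature_beta_controlling_step(encoding, curvature, entropy, steps):
    """Standardized coefficient on curvature from
    encoding ~ curvature + entropy + log(step), fit by OLS.

    Inputs are 1-D arrays aligned across layer-checkpoint observations.
    A negative beta means curvature predicts encoding beyond training time.
    """
    def z(x):  # standardize so coefficients are comparable across predictors
        return (x - x.mean()) / x.std()
    X = np.column_stack([np.ones_like(encoding),
                         z(curvature), z(entropy), z(np.log(steps))])
    beta, *_ = np.linalg.lstsq(X, z(encoding), rcond=None)
    return float(beta[1])
```

In the paper's full analysis this fixed-effects part is embedded in ROI-specific mixed-effects models (random intercepts per layer) with FDR correction across ROIs.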
Finally, the authors explore scaling effects by repeating the analysis for the 70 M, 160 M, and 410 M models. The curvature‑alignment relationship strengthens with model size: larger models exhibit larger absolute β coefficients and higher statistical power, suggesting that geometric smoothing becomes more consequential as capacity grows.
In sum, the study uncovers a training‑driven modularization of representational geometry in LLMs, with a low‑complexity module characterized by reduced entropy and curvature. This module aligns better with human language‑network activity, especially in temporal cortex, and curvature reduction predicts alignment even after accounting for training progress. The findings link geometric re‑organization during training to functional specialization observed in the brain, implying that “representational smoothing” may be a key mechanism by which artificial networks acquire brain‑like linguistic processing. The work opens avenues for incorporating geometric constraints into model training to enhance neuro‑cognitive fidelity and for using geometry as a mechanistic bridge between AI and neuroscience.