Convergent World Representations and Divergent Tasks
Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

While neural representations are central to modern deep learning, the conditions governing their geometry and their roles in downstream adaptability remain poorly understood. We develop a framework that cleanly separates the underlying world, the data generation process, and the resulting model representations, and use it to study these questions in a controlled setup. 5,075 city coordinates define the world, and 7 geometric tasks generate the training data for autoregressive training. We find that different tasks give rise to qualitatively and quantitatively distinct world representation geometries. However, multi-task training drives convergence of world representations: models trained on non-overlapping tasks develop aligned geometric representations, providing controlled evidence for the Multitask Scaling Hypothesis of the Platonic Representation Hypothesis. To study adaptation, we pretrain models on all tasks, then test whether new entities (cities) can be consistently integrated into the representation space via fine-tuning. Surprisingly, we find that despite multi-task pretraining, some tasks, which we call divergent, actively harm the representational integration of new entities and degrade generalization. Our results show that training on multiple relational tasks reliably produces convergent world representations, but lurking divergent tasks can catastrophically harm the integration of new entities via fine-tuning.


💡 Research Summary

The paper introduces a controlled experimental framework that cleanly separates three components: the underlying world, the data generation process, and the model. The “world” consists of 5,075 real‑world city coordinates (population >100 k), providing a fixed latent geometry. On top of this world the authors define seven geometric tasks (distance, angle, triangle area, line crossing, inside‑test, perimeter, compass direction). Each task takes a tuple of city identifiers as input and outputs a textual answer, which is tokenized for autoregressive transformer training. The model never sees the raw coordinates; it only observes identifiers and task outputs, and the authors probe whether the model internally reconstructs the latent coordinates.
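The data-generation setup above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the city names, coordinates, and output string formats are invented, only three cities are shown instead of 5,075, and the distance is plain Euclidean for simplicity. The key property it demonstrates is that each training example exposes only identifiers and a task answer, never the raw coordinates.

```python
import math
import random

# Hypothetical "world": a fixed latent map of city coordinates.
# Names and values are illustrative stand-ins, not the paper's data.
WORLD = {
    "C0": (52.52, 13.40),  # e.g. Berlin-like (x, y)
    "C1": (48.86, 2.35),   # e.g. Paris-like
    "C2": (41.90, 12.50),  # e.g. Rome-like
}

def distance_example(a, b):
    """One training string for a 'distance' task (Euclidean for simplicity)."""
    (x1, y1), (x2, y2) = WORLD[a], WORLD[b]
    d = math.hypot(x2 - x1, y2 - y1)
    # The model sees only identifiers and the answer, never raw coordinates.
    return f"distance {a} {b} = {d:.2f}"

def compass_example(a, b):
    """One training string for a 'compass direction' task (coarse 4-way)."""
    (x1, y1), (x2, y2) = WORLD[a], WORLD[b]
    # Treat x as "north" and y as "east"; bearing measured clockwise from north.
    bearing = (math.degrees(math.atan2(y2 - y1, x2 - x1)) + 360) % 360
    heading = ["N", "E", "S", "W"][int(((bearing + 45) % 360) // 90)]
    return f"compass {a} {b} = {heading}"

random.seed(0)
a, b = random.sample(list(WORLD), 2)
print(distance_example(a, b))
print(compass_example(a, b))
```

Strings like these would then be tokenized and fed to an autoregressive transformer, which must implicitly reconstruct the latent geometry to predict the answers.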

First, they train separate models on each single task. Visualizations (PCA, linear probes) reveal that each task induces a qualitatively distinct representation geometry: distance yields thread‑like structures, angle produces a 2‑D manifold, compass creates fragmented clusters, and inside‑test leads to a diffuse spread. Despite these differences, most tasks allow linear decoding of (x, y) coordinates, although the fidelity varies. Centered Kernel Alignment (CKA) quantifies inter‑model similarity and shows substantial variability across random seeds for the same task, yet clear systematic differences between tasks; distance, for example, is the most divergent.
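A linear probe of the kind described above can be sketched in a few lines. The hidden states here are synthetic stand-ins (a random linear map of coordinates plus noise), not real transformer activations; the point is only the mechanics: fit a least-squares linear map from hidden states to the latent (x, y) coordinates and score it with R².

```python
import numpy as np

# Synthetic stand-in for probing: coordinates are linearly embedded in a
# higher-dimensional "hidden state" with a little noise.
rng = np.random.default_rng(0)
coords = rng.uniform(-1, 1, size=(200, 2))               # latent (x, y) per city
W_true = rng.normal(size=(2, 16))                        # unknown embedding map
H = coords @ W_true + 0.01 * rng.normal(size=(200, 16))  # stand-in hidden states

# Least-squares linear probe: coords_hat = H @ P
P, *_ = np.linalg.lstsq(H, coords, rcond=None)
resid = coords - H @ P
r2 = 1 - (resid ** 2).sum() / ((coords - coords.mean(0)) ** 2).sum()
print(f"probe R^2 = {r2:.3f}")  # near 1.0 when coordinates are linearly decodable
```

In the paper's setting, high probe R² on a task's model indicates that the latent city map is linearly recoverable from its representations, even though the model never saw coordinates.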

Next, they explore multi‑task training. Models trained on any two‑task combination (including disjoint task pairs) exhibit markedly higher CKA values (≈0.85 in deeper layers) and reduced seed‑to‑seed variance. Adding a third task further improves alignment, especially in the middle transformer layers. Notably, the crossing task, which fails to learn in isolation, succeeds when paired with any other task, suggesting that companion tasks provide a scaffold of coordinate information that the crossing task can exploit. This empirical evidence supports the Multitask Scaling Hypothesis, a proposed mechanism for the Platonic Representation Hypothesis (PRH), which posits that the space of viable representations shrinks as the number of tasks increases, forcing models toward a common structure.
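Linear CKA, the similarity measure used for these comparisons, is straightforward to compute. Below is a minimal sketch on synthetic data (the activation matrices are random, not taken from the paper's models); it also illustrates why CKA suits this comparison: it is invariant to orthogonal transformations of the feature space, so two models can score near 1 even if their representations differ by a rotation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) activation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))
R = np.linalg.qr(rng.normal(size=(32, 32)))[0]  # random orthogonal matrix
Z = rng.normal(size=(100, 32))                  # unrelated activations

cka_same = linear_cka(X, X @ R)  # rotated copy: CKA stays at 1.0
cka_rand = linear_cka(X, Z)      # independent features: much lower
print(cka_same, cka_rand)
```

In the paper, values like ≈0.85 between independently trained multi-task models are read against this scale: far above what unrelated representations would produce.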

Finally, the authors assess downstream adaptability by introducing 100 synthetic “Atlantis” cities that were absent during pretraining. They fine‑tune the 7‑task pretrained model on the new data and measure how well the new cities are integrated into the learned representation space. Surprisingly, certain tasks—labeled “divergent”—actively hinder this integration. When fine‑tuning includes these divergent tasks, CKA between pre‑ and post‑fine‑tuned representations drops, and linear probe accuracy on Atlantis coordinates deteriorates, leading to poorer generalization on held‑out queries. In contrast, other tasks facilitate smooth incorporation of new entities. Thus, while multi‑task pretraining reliably produces convergent world representations, the presence of specific divergent tasks can catastrophically impair the model’s ability to assimilate novel entities via fine‑tuning.

The paper’s contributions are threefold: (1) a novel world‑data‑model framework that enables precise manipulation of the underlying latent structure and the data generation process; (2) empirical demonstration that task diversity drives representation convergence across otherwise unrelated tasks, providing partial validation of PRH’s scaling hypothesis; and (3) identification of divergent tasks that, despite being part of a multi‑task pretraining regime, degrade downstream adaptation. These findings suggest that when designing large‑scale foundation models, careful selection and combination of pretraining tasks are crucial—not only for achieving high task performance but also for ensuring robust, adaptable internal world models.
