Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured, human-like conceptual representations within these models, it remains unclear whether they functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context concept inference. Our results reveal a conceptual subspace emerging in middle to late layers, whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but is functionally central to model predictions, establishing its causal role in inference. We further identify a layer-wise progression where attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which is subsequently leveraged by later layers to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured, latent representations in context for inference, offering insights into the computational processes underlying flexible adaptation.


💡 Research Summary

This paper investigates how large language models (LLMs) perform in‑context concept inference by probing the internal representations that emerge during a reverse‑dictionary task. The authors present a series of analyses across multiple open‑source decoder‑only transformers (Llama‑3.1, Llama‑3, Qwen2.5) of varying scale.

First, they trace the evolution of hidden states layer‑by‑layer. Using singular value decomposition (SVD) on the centered hidden‑state matrices for a set of query concepts, they retain the top principal components that explain 95% of variance. By measuring the mean‑squared cosine of principal angles between the subspaces of successive layers, they find that early layers (roughly up to layers 35–40) exhibit low overlap, indicating rapid transformation of representations. In contrast, middle‑to‑late layers show a sharp increase in overlap, revealing a stable “conceptual subspace” that persists through the remainder of the forward pass. The number of components needed to capture 95% variance also spikes in the middle layers, suggesting that contextual information is being integrated into a higher‑dimensional latent space.
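The layer‑wise analysis above can be sketched in a few lines of NumPy. This is a minimal, hypothetical reconstruction (variable names and the toy data are ours, not the paper's): a per‑layer subspace is the span of the top principal components reaching 95% variance, and overlap between two subspaces is the mean squared cosine of their principal angles, obtained from the singular values of the product of their bases.

```python
import numpy as np

def layer_subspace(h, var_frac=0.95):
    """Orthonormal basis (d x r) for the top principal components of
    centered hidden states h (n_queries x d) explaining var_frac of variance."""
    hc = h - h.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(hc, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, var_frac)) + 1
    return vt[:r].T

def subspace_overlap(b1, b2):
    """Mean squared cosine of principal angles between two subspaces:
    the singular values of b1^T b2 are exactly those cosines."""
    s = np.linalg.svd(b1.T @ b2, compute_uv=False)
    return float(np.mean(s ** 2))

# Toy check: subspaces fit to two noisy copies of the same low-rank data
# overlap far more than subspaces fit to unrelated data.
rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 64))
b_a = layer_subspace(signal + 0.05 * rng.normal(size=signal.shape))
b_b = layer_subspace(signal + 0.05 * rng.normal(size=signal.shape))
b_c = layer_subspace(rng.normal(size=(200, 8)) @ rng.normal(size=(8, 64)))
assert subspace_overlap(b_a, b_b) > subspace_overlap(b_a, b_c)
```

A subspace compared with itself scores 1; unrelated low‑dimensional subspaces in a high‑dimensional space score near r/d, which is what makes the sharp mid‑layer rise in overlap informative.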

To formalize this shared structure, the authors apply Generalized Canonical Correlation Analysis (GCCA) across a selected set of middle‑to‑late layers. GCCA yields a common latent matrix G and layer‑specific projection matrices Wℓ that map each layer’s hidden states onto G. The dimensionality r of G is chosen via a non‑parametric permutation test, ensuring that the identified subspace reflects genuine alignment rather than noise. Alignment scores (average correlation between projected representations) and Representational Similarity Analysis (RSA) values approach unity, confirming that the subspace is linearly consistent across layers while allowing gradual refinement.
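One common way to realize GCCA is the MAXVAR formulation: orthonormalize each view, stack the bases, and take the top left singular vectors as the shared latent G. The sketch below follows that recipe under our own assumptions; the paper's exact estimator and its permutation test for choosing r are not reproduced here.

```python
import numpy as np

def gcca(views, r):
    """MAXVAR-style GCCA sketch: shared latent G (n x r) and per-view maps
    W_l such that X_l @ W_l approximates G. views: list of (n x d_l)
    matrices whose rows index the same n items."""
    centered, bases = [], []
    for x in views:
        xc = x - x.mean(axis=0, keepdims=True)
        centered.append(xc)
        q, _ = np.linalg.qr(xc)          # orthonormal basis of the view
        bases.append(q)
    u, _, _ = np.linalg.svd(np.concatenate(bases, axis=1),
                            full_matrices=False)
    g = u[:, :r]                          # directions best shared by all views
    ws = [np.linalg.pinv(xc) @ g for xc in centered]  # least-squares maps
    return g, ws

def alignment(xc, w, g):
    """Average per-dimension correlation between a view's projection and G."""
    p = xc @ w
    return float(np.mean([np.corrcoef(p[:, j], g[:, j])[0, 1]
                          for j in range(g.shape[1])]))

# Toy check: three views generated from one shared latent align near unity.
rng = np.random.default_rng(1)
z = rng.normal(size=(300, 4))
views = [z @ rng.normal(size=(4, 32)) + 0.01 * rng.normal(size=(300, 32))
         for _ in range(3)]
g, ws = gcca(views, r=4)
scores = [alignment(x - x.mean(axis=0), w, g) for x, w in zip(views, ws)]
assert min(scores) > 0.9
```

The near‑unity alignment scores on synthetic shared‑latent data mirror the behavior the authors report for real middle‑to‑late layers.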

Next, the paper conducts causal mediation experiments to determine whether this subspace is merely an epiphenomenon or a functional component of inference. For each layer ℓ they define an orthogonal projector Pℓ = WℓWℓᵀ, decomposing hidden states into subspace‑aligned (hℓ,∥) and orthogonal (hℓ,⊥) components. They introduce three corruption conditions—description, label, and query corruption—and perform activation patching: the orthogonal component is kept from the corrupted run, while the subspace component is replaced with that from a clean run. The causal indirect effect (CIE) is computed as the change in the target's log‑likelihood after patching, relative to the corrupted run. Results show that restoring only the subspace component substantially recovers performance, indicating that the subspace carries the crucial contextual information needed for correct prediction.
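The patching step itself is a one‑line linear operation. The hedged sketch below (our own toy setup, not the paper's code) uses the projector P = W Wᵀ to swap only the subspace‑aligned component; in the actual experiment the patched state would be run forward and the CIE read off as the log‑likelihood gain over the corrupted run.

```python
import numpy as np

def patch_subspace(h_corrupt, h_clean, w):
    """Keep the corrupted state's component orthogonal to the subspace,
    but restore the subspace-aligned component from the clean run.
    w: (d x r) orthonormal basis; P = w @ w.T projects onto the subspace."""
    p = w @ w.T
    return h_corrupt - h_corrupt @ p + h_clean @ p

# Toy check with a subspace spanned by the first r coordinate axes:
# the patched state takes its first r coordinates from the clean run
# and the remaining coordinates from the corrupted run.
rng = np.random.default_rng(2)
d, r = 16, 4
w = np.eye(d)[:, :r]
h_clean, h_corrupt = rng.normal(size=d), rng.normal(size=d)
h_patched = patch_subspace(h_corrupt, h_clean, w)
assert np.allclose(h_patched[:r], h_clean[:r])
assert np.allclose(h_patched[r:], h_corrupt[r:])
```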

Further, selective ablation (zeroing the subspace component) dramatically degrades accuracy, while isolating the subspace (zeroing the orthogonal component) retains most of the model’s capability, reinforcing the subspace’s necessity and sufficiency. The authors also demonstrate cross‑context generalization: a subspace derived from one set of demonstrations can be transplanted into a different demonstration set and still improve inference, highlighting its abstract, context‑invariant nature.
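The two ablation conditions are complementary projections of the same hidden state. A minimal sketch (assuming, as above, a d x r orthonormal basis w for the subspace):

```python
import numpy as np

def ablate_subspace(h, w):
    """Necessity test: zero the subspace-aligned component."""
    return h - (h @ w) @ w.T

def isolate_subspace(h, w):
    """Sufficiency test: keep only the subspace-aligned component."""
    return (h @ w) @ w.T

# The two interventions partition the state: they sum back to the original,
# and isolation preserves the coordinates inside the subspace exactly.
rng = np.random.default_rng(3)
h = rng.normal(size=(5, 16))
w = np.linalg.qr(rng.normal(size=(16, 4)))[0]   # random orthonormal basis
assert np.allclose(ablate_subspace(h, w) + isolate_subspace(h, w), h)
assert np.allclose(isolate_subspace(h, w) @ w, h @ w)
```

That ablation collapses accuracy while isolation preserves it is what licenses the necessity‑and‑sufficiency reading in the text.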

Finally, the authors examine how the subspace is constructed. By analyzing attention‑head contributions in early and middle layers, they find that specific heads aggregate information from the demonstration pairs, effectively writing into the emerging subspace. As the number of demonstrations increases from 1 to 24, the internal geometry of the subspace becomes increasingly stable, with returns diminishing as more examples are added. Larger models (e.g., Llama‑3.1 70B) exhibit higher cross‑context alignment, suggesting that model scale enhances the fidelity of this latent structure.
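One simple diagnostic for "writing into the subspace" (our illustrative metric, not necessarily the paper's exact measure) is the fraction of a head's output energy that lands inside the subspace:

```python
import numpy as np

def head_write_fraction(head_out, w):
    """Fraction of a head's output energy inside the subspace.
    head_out: (n_tokens x d) residual-stream contribution of one head;
    w: (d x r) orthonormal subspace basis."""
    return float(np.sum((head_out @ w) ** 2) / np.sum(head_out ** 2))

# Toy check: a head writing only along subspace directions scores 1,
# a head writing only along orthogonal directions scores 0.
d, r = 16, 4
w = np.eye(d)[:, :r]
in_head = np.zeros((3, d)); in_head[:, :r] = 1.0
out_head = np.zeros((3, d)); out_head[:, r:] = 1.0
assert np.isclose(head_write_fraction(in_head, w), 1.0)
assert np.isclose(head_write_fraction(out_head, w), 0.0)
```

Heads with a high fraction in early‑to‑middle layers would be the candidates for the subspace‑constructing role the authors describe.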

Overall, the study provides strong evidence that LLMs do not rely solely on surface statistical patterns. Instead, they dynamically build a structured, latent conceptual subspace during in‑context learning, and this subspace causally mediates the inference process. The work bridges mechanistic interpretability (attention‑head dynamics) with cognitive‑level concepts (abstract, relational representations), offering a concrete mechanistic account of how flexible adaptation emerges in modern language models. This insight has implications for model interpretability, robustness, and the design of next‑generation architectures that explicitly harness such latent structures.

