The Decision Geometry of Large Language Models: Multiple-Choice Question Answering Mechanisms via Intrinsic Dimension Analysis
📝 Abstract
Large Language Models (LLMs) show strong generalization across diverse tasks, yet the internal decision-making processes behind their predictions remain opaque. In this work, we study the geometry of hidden representations in LLMs through the lens of intrinsic dimension (ID), focusing specifically on decision-making dynamics in a multiple-choice question answering (MCQA) setting. We perform a large-scale study with 28 open-weight transformer models, estimating ID across layers using multiple estimators while also quantifying per-layer performance on MCQA tasks. Our findings reveal a consistent ID pattern across models: early layers operate on low-dimensional manifolds, middle layers expand this space, and later layers compress it again, converging to decision-relevant representations. Together, these results suggest that LLMs implicitly learn to project linguistic inputs onto structured, low-dimensional manifolds aligned with task-specific decisions, providing new geometric insights into how generalization and reasoning emerge in language models.
📄 Content
Large Language Models (LLMs) have exhibited impressive generalization across diverse natural language tasks [Radford et al., 2019, Brown et al., 2020]. Despite their success, how these models internally arrive at decisions, particularly in tasks requiring structured reasoning, remains underexplored. Understanding this process is central to interpretability and may yield insights into model generalization, failure modes, and capabilities. Recent work in mechanistic interpretability has highlighted that specific circuits or components underlie LLM reasoning [Elhage et al., 2021, Olsson et al., 2022]. In parallel, probing-based approaches have tracked how task-relevant information flows across layers [Tenney et al., 2019, Hewitt and Manning, 2019]. However, these techniques often focus on how information is represented and where it resides, rather than on how the representation geometry evolves to support decision-making.

To complement these perspectives, we study decision-making by analyzing the geometric properties of the underlying representation manifolds. We specifically make use of the Intrinsic Dimension (ID), which quantifies the minimal degrees of freedom required to describe a distribution in high-dimensional space [Bishop, 2006]. Prior work has demonstrated that neural representations often lie on low-dimensional manifolds [Gong et al., 2019, Valeriani et al., 2023], with ID fluctuations signalling transitions in learning and abstraction [Cheng et al., 2025]. Yet the connection between these geometric changes and model decisiveness, i.e., the commitment to a specific prediction, has not been explored extensively.
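As a concrete illustration of what ID measures, the sketch below samples points from a 2-dimensional manifold linearly embedded in a 64-dimensional ambient space and recovers the intrinsic dimension with the TwoNN estimator of Facco et al. [2017], using the common maximum-likelihood shortcut d ≈ N / Σ log(r2/r1). The data and values here are illustrative, not from the paper.

```python
import numpy as np

def twonn_id(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (assumes all points are distinct)."""
    # pairwise Euclidean distances; mask self-distances
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    two_nn = np.partition(D, 1, axis=1)[:, :2]  # first and second NN distances
    mu = two_nn[:, 1] / two_nn[:, 0]            # ratio r2 / r1 per point
    # maximum-likelihood estimate under the Pareto model p(mu) = d * mu^(-d-1)
    return len(mu) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
latent = rng.random((500, 2))             # coordinates on a 2-D manifold
embed = rng.standard_normal((2, 64))      # linear map into 64-D ambient space
X = latent @ embed                        # extrinsic dimension 64, intrinsic ~2
print(f"estimated ID: {twonn_id(X):.2f}")
```

Although the ambient (extrinsic) dimension is 64, the estimator returns a value close to the true manifold dimension of 2, which is the sense in which hidden representations can be "low-dimensional" despite a large hidden size.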
Our primary focus is to understand how internal decision-making unfolds within transformer-based LLMs, particularly in tasks requiring symbolic reasoning and choice commitment. To this end, we are guided by three key questions: 1) How does ID evolve across layers, and how does this reflect the model's progression from contextual encoding to decision-making? 2) Can geometric markers, such as ID peaks, serve as interpretable indicators of decisiveness and confidence in model predictions? 3) Are these ID dynamics consistent across model families and tasks, and what role do model size, training stage, and prompt conditioning (e.g., few-shot examples) play in shaping these trajectories? Through these questions, we aim to bridge representational geometry with functional behavior in LLMs, providing a perspective complementary to circuit-based and probing-based analyses. Our findings reveal that ID can act as a proxy for representational focus and task commitment, helping identify the critical layers at which model decisions solidify, and providing insights that may guide future interpretability and intervention strategies.

[Figure 1 caption: In transformer-based architectures, a latent feature vector of fixed hidden dimension d is transformed by each transformer block f_l. Although this extrinsic dimension stays the same, the feature space lies on low-dimensional manifolds of varying intrinsic dimension; intrinsically, each f_l corresponds to a mapping φ_l : R^{id_{l-1}} → R^{id_l}. We study how these compressed manifolds align with the decision-making process in the middle layers, projecting internal representations back to the vocabulary space to inspect decisiveness. A sudden shift in performance follows the sharp peak observed in the residual-post ID estimates.]
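Projecting internal representations back to the vocabulary space follows the logit-lens idea: multiply an intermediate residual-stream state by the unembedding matrix and read off a distribution over tokens. The sketch below is a self-contained toy with random weights (every tensor here is a hypothetical stand-in, not a real model parameter); with an actual model one would typically also apply the final layer norm before unembedding.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab_size, n_layers = 32, 100, 6

# toy residual-stream states for the final prompt token at each layer
hidden_states = rng.standard_normal((n_layers, d_model))
W_U = rng.standard_normal((d_model, vocab_size))  # toy unembedding matrix

def vocab_projection(h: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Project an intermediate hidden state directly to vocabulary probabilities."""
    logits = h @ W_U
    logits -= logits.max()        # shift for numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum()

# a layer's decisiveness can be read off as its top-token probability
for layer, h in enumerate(hidden_states):
    p = vocab_projection(h, W_U)
    print(f"layer {layer}: top token {int(p.argmax())}, p = {p.max():.3f}")
```

In the paper's setting, tracking where along the layers the top answer token first receives high probability gives the per-layer performance curve that is compared against the ID profile.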
In this work, we study the evolution of hidden representations during decision-making in LLMs using ID estimates on reasoning-based multiple-choice question answering (MCQA) prompts. We conduct an extensive investigation into the internal representations of LLMs, analyzing 28 open-weight transformer models spanning multiple architectures and sizes (list of models in App. D). We build upon classical estimators such as Maximum Likelihood Estimation (MLE) [Levina and Bickel, 2004] and Two Nearest Neighbors (TwoNN) [Facco et al., 2017], and incorporate the recently proposed Generalized Ratios Intrinsic Dimension Estimator (GRIDE) [Denti et al., 2022], which demonstrates improved robustness to sampling noise and curvature distortions (see §3). Fig. 1 outlines our approach (details in §4). Our primary findings are as follows:
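For reference, the Levina–Bickel MLE baseline can be sketched in a few lines: it averages the per-point estimate m_k(x) = [(1/(k-1)) Σ_{j<k} log(T_k(x)/T_j(x))]^{-1}, where T_j(x) is the distance from x to its j-th nearest neighbour. This is an illustrative reimplementation on synthetic data, not the paper's code, and it uses the (k-1) normalization of the original paper.

```python
import numpy as np

def mle_id(X: np.ndarray, k: int = 10) -> float:
    """Levina-Bickel maximum-likelihood ID estimate with k nearest neighbours."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    T = np.sort(D, axis=1)[:, :k]               # T_1 ... T_k for each point
    log_ratios = np.log(T[:, -1:] / T[:, :-1])  # log(T_k / T_j), j = 1..k-1
    m = (k - 1) / log_ratios.sum(axis=1)        # per-point dimension estimates
    return float(m.mean())                      # average over the sample

rng = np.random.default_rng(2)
latent = rng.standard_normal((800, 3))          # 3-D Gaussian latent manifold
X = latent @ rng.standard_normal((3, 32))       # embedded in 32-D ambient space
print(f"estimated ID: {mle_id(X):.2f}")
```

On this linearly embedded 3-dimensional sample the estimate lands near 3; estimators such as GRIDE refine this idea by using neighbour-distance ratios at multiple scales.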
• Emergence of Decision Geometry: Across models and tasks, we observe a characteristic hump-shaped trend in intrinsic dimension estimates (notably at the MLP output layers), where ID increases, peaks, and then declines. This reflects an early phase of abstraction followed by convergence toward decision-specific subspaces.
• ID Peaks Coincide with Decisiveness: For most models, the peak in intrinsic dimension aligns closely with the onset of confident predictions (as revealed via projection to vocabulary space). This suggests a geometric marker of decisiveness within the model's forward pass.
• Layer-Specific Dynamics Differ by Component
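The second finding can be operationalized as a simple check: locate the layer where the ID profile peaks and the layer where per-layer accuracy first approaches its final value, then verify that the accuracy onset follows the peak. The numbers below are illustrative synthetic profiles for a hypothetical 12-layer model, not measurements from the paper.

```python
import numpy as np

# hypothetical per-layer profiles (illustrative only): a hump-shaped ID curve
# and an accuracy curve that jumps shortly after the ID peak
id_profile = np.array([5.0, 7.0, 10.0, 14.0, 18.0, 21.0,
                       19.0, 15.0, 11.0, 9.0, 8.0, 7.5])
accuracy = np.array([0.25, 0.25, 0.26, 0.25, 0.27, 0.28,
                     0.30, 0.50, 0.70, 0.78, 0.80, 0.80])

peak_layer = int(id_profile.argmax())
# first layer whose accuracy reaches 90% of the final-layer accuracy
onset_layer = int(np.argmax(accuracy >= 0.9 * accuracy[-1]))

print(f"ID peak at layer {peak_layer}, decisiveness onset at layer {onset_layer}")
```

Under the paper's hypothesis, `onset_layer` should not precede `peak_layer`: compression back to a decision-specific subspace begins at or after the geometric peak.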
This content is AI-processed based on ArXiv data.