Component-Level Lesioning of Language Models Reveals Clinically Aligned Aphasia Phenotypes
Large language models (LLMs) increasingly exhibit human-like linguistic behaviors and internal representations, suggesting that they could serve as computational simulators of language cognition. We ask whether LLMs can be systematically manipulated to reproduce language-production impairments characteristic of aphasia following focal brain lesions. Such models could provide scalable proxies for testing rehabilitation hypotheses, and offer a controlled framework for probing the functional organization of language. We introduce a clinically grounded, component-level framework that simulates aphasia by selectively perturbing functional components in LLMs, and apply it to both modular Mixture-of-Experts models and dense Transformers using a unified intervention interface. Our pipeline (i) identifies subtype-linked components for Broca’s and Wernicke’s aphasia, (ii) interprets these components with linguistic probing tasks, and (iii) induces graded impairments by progressively perturbing the top-k subtype-linked components, evaluating outcomes with Western Aphasia Battery (WAB) subtests summarized by Aphasia Quotient (AQ). Across architectures and lesioning strategies, subtype-targeted perturbations yield more systematic, aphasia-like regressions than size-matched random perturbations, and MoE modularity supports more localized and interpretable phenotype-to-component mappings. These findings suggest that modular LLMs, combined with clinically informed component perturbations, provide a promising platform for simulating aphasic language production and studying how distinct language functions degrade under targeted disruptions.
💡 Research Summary
The paper investigates whether large language models (LLMs) can be deliberately “lesioned” to reproduce the language‑production deficits characteristic of aphasia, thereby providing a scalable computational proxy for studying disordered language and testing rehabilitation hypotheses. The authors introduce a component‑level framework that treats functional units of an LLM as “components”: in a Mixture‑of‑Experts (MoE) model each expert is a component, while in a dense Transformer each feed‑forward hidden dimension is a component. Two models are examined – OLMo‑E (a 1 B‑parameter MoE with 64 experts per layer, 8 active per token) and its dense counterpart OLMo – which share tokenizer, depth, and training data, allowing a controlled architectural comparison.
First, the authors map each component to specific linguistic phenomena using BLiMP, a benchmark of minimal‑pair grammatical judgments. By zero‑ablation of a single component and measuring the drop in accuracy for each BLiMP subtask, they obtain an importance score Δₜ(u) that quantifies how much component u contributes to phenomenon t. This yields a fine‑grained “phenomenon‑to‑unit” attribution matrix for both architectures.
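The zero-ablation attribution step can be sketched in a few lines. This is a minimal numpy toy, not the paper's code: a random linear scorer stands in for the LLM, each hidden unit stands in for a "component", and random vector pairs stand in for BLiMP minimal pairs. All sizes and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scorer standing in for an LLM: sentence vector -> scalar plausibility.
# Each hidden unit plays the role of one "component" (illustrative only).
d_in, n_hidden, n_pairs, n_tasks = 12, 8, 40, 3
W = rng.normal(size=(n_hidden, d_in))
v = rng.normal(size=n_hidden)

def score(x, mask=None):
    h = np.maximum(W @ x, 0.0)          # hidden activations
    if mask is not None:
        h = h * mask                    # zero-ablation of selected components
    return v @ h

def blimp_accuracy(pairs, mask=None):
    """Fraction of minimal pairs where the 'good' sentence outscores the 'bad'."""
    return np.mean([score(g, mask) > score(b, mask) for g, b in pairs])

# Synthetic minimal pairs per subtask: (good_vec, bad_vec), toy stand-ins.
tasks = [[(rng.normal(size=d_in), rng.normal(size=d_in)) for _ in range(n_pairs)]
         for _ in range(n_tasks)]

baseline = np.array([blimp_accuracy(p) for p in tasks])

# Delta_t(u): drop in subtask-t accuracy when component u alone is zeroed.
delta = np.zeros((n_hidden, n_tasks))
for u in range(n_hidden):
    mask = np.ones(n_hidden)
    mask[u] = 0.0
    delta[u] = baseline - np.array([blimp_accuracy(p, mask) for p in tasks])
```

The resulting `delta` matrix is the "phenomenon-to-unit" attribution table: row u summarizes how much component u contributes to each probed phenomenon.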
Second, to identify components linked to clinical phenotypes, the models are fine‑tuned separately on the Broca‑type and Wernicke‑type subsets of the AphasiaBank corpus. During fine‑tuning, a gradient‑weight product I_c(θ)=∑_s g_s(θ)⊙θ_s is accumulated for each parameter, and these values are summed across all parameters belonging to a component u to produce a phenotype‑specific contribution score Score_c(u). Ranking components by Score_c gives two phenotype‑specific lists (Broca and Wernicke). An external classifier trained on the Comparative Aphasia Project (CAP) data validates that the two fine‑tuned models generate stylistically distinct outputs, confirming that the identified components truly reflect phenotype differences rather than mere content memorization.
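The accumulation of the gradient-weight product can be illustrated with a toy fine-tuning loop. The parameter layout (one row of parameters per component), the gradients, and the SGD update below are all hypothetical stand-ins; only the accumulation rule mirrors the formula in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy layout: 5 components, each owning 6 parameters (illustrative shapes).
n_components, params_per_comp, n_steps = 5, 6, 10
theta = rng.normal(size=(n_components, params_per_comp))

# I_c(theta) = sum_s g_s(theta) ⊙ theta_s, accumulated over fine-tuning steps.
importance = np.zeros_like(theta)
for s in range(n_steps):
    grad = rng.normal(size=theta.shape)   # placeholder for dL/dtheta at step s
    importance += grad * theta            # elementwise gradient-weight product
    theta -= 0.01 * grad                  # toy SGD update

# Score_c(u): sum accumulated importance over the parameters of component u,
# then rank components by contribution magnitude.
score_c = importance.sum(axis=1)
ranking = np.argsort(-np.abs(score_c))
```

Running this once per phenotype (Broca-type and Wernicke-type fine-tuning data) would yield the two phenotype-specific component rankings described above.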
Because the subsequent heat‑map visualizations and progressive lesion experiments require a threshold p for selecting the top‑p % of components, the authors perform a “p‑sweep” (0.5 %–10 %). For each p they compute a BLiMP‑based profile by averaging the rank‑percentiles of the selected components and evaluate stability across p values using Spearman correlation. The correlation remains high (ρ > 0.9) across the examined range, indicating that the phenomenon‑to‑component patterns are robust to the exact threshold. An intermediate p = 2 % is adopted for all downstream analyses.
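The p-sweep stability check can be sketched as follows, again with synthetic inputs: a random phenotype score per component and random per-task rank percentiles stand in for the quantities computed earlier. The Spearman helper is a plain-numpy rank correlation (no tie handling, which suffices for continuous toy data).

```python
import numpy as np

rng = np.random.default_rng(2)

def rankdata(x):
    # Simple ranks; ties are not handled (fine for continuous toy data).
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman(a, b):
    """Spearman rho = Pearson correlation of the ranks."""
    ra, rb = rankdata(a) - (len(a) + 1) / 2, rankdata(b) - (len(b) + 1) / 2
    return (ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb))

# Toy inputs: per-component phenotype scores, per-task rank percentiles.
n_components, n_tasks = 200, 12
phen_score = rng.normal(size=n_components)
rank_pct = rng.uniform(size=(n_components, n_tasks))

def profile(p):
    """BLiMP profile at threshold p: mean rank-percentile of the top-p% components."""
    k = max(1, int(round(p / 100 * n_components)))
    top = np.argsort(-phen_score)[:k]
    return rank_pct[top].mean(axis=0)

ps = [0.5, 1, 2, 5, 10]                   # swept thresholds, in percent
profiles = [profile(p) for p in ps]
rhos = [spearman(profiles[i], profiles[i + 1]) for i in range(len(ps) - 1)]
```

On the paper's real scores, neighboring profiles correlate strongly (rho > 0.9); on this random toy data the rhos carry no meaning and only demonstrate the mechanics.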
The alignment between linguistic phenomena and clinical phenotypes is visualized as a heat‑map H_c(i,j)=r_{t_j}(u_i)/N_{t_j}, where lower values indicate stronger association. Broca‑linked components cluster around syntactic and morphosyntactic tasks (e.g., verb tense, word order), whereas Wernicke‑linked components align with semantic and lexical tasks (e.g., word‑sense disambiguation, semantic plausibility).
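The heat-map entries can be computed from the attribution matrix by per-task ranking. Below is a hedged numpy sketch: `delta` is a random stand-in for the importance matrix, and the selected rows stand in for the top phenotype-linked components.

```python
import numpy as np

rng = np.random.default_rng(3)
n_components, n_tasks, n_selected = 100, 6, 5

# Toy importance matrix: Delta_t(u) for every component u and task t.
delta = rng.normal(size=(n_components, n_tasks))

# r_t(u): rank of component u on task t, sorted by descending importance
# (rank 1 = most important component for that task).
order = np.argsort(-delta, axis=0)
ranks = np.empty_like(order)
np.put_along_axis(ranks, order, np.arange(1, n_components + 1)[:, None], axis=0)

# H_c(i, j) = r_{t_j}(u_i) / N_{t_j}: lower values = stronger association.
selected = np.arange(n_selected)        # stand-in for the top phenotype components
H = ranks[selected] / n_components
```

Each row of `H` is one selected component; a dark (low-valued) cell marks a task on which that component ranks near the top, which is how the syntactic-vs-semantic clustering is read off the figure.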
To simulate focal, lasting damage, two lesion mechanisms are explored: (1) output zero‑ablation, which forces the component’s output to zero, and (2) Xavier re‑initialization, which replaces the component’s weights with values drawn from a Xavier uniform distribution, effectively erasing learned computation. For each lesion budget (0.5 %, 1 %, 1.5 %, 2 % of components), the selected components are disabled using either method, and the resulting model outputs are evaluated with the Western Aphasia Battery (WAB). Scores on the four subtests (Spontaneous Speech, Auditory Comprehension, Repetition, Naming) are aggregated into the Aphasia Quotient (AQ).
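The two lesion mechanisms are straightforward weight transforms. A minimal sketch, assuming a component's parameters can be viewed as a single weight matrix `W` (the component abstraction and shapes here are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(4)

def zero_ablate(W):
    """Output zero-ablation: the lesioned component contributes nothing."""
    return np.zeros_like(W)

def xavier_reinit(W):
    """Replace learned weights with a fresh Xavier-uniform draw, erasing the
    learned computation while keeping an initialization-scale parameter range."""
    fan_in, fan_out = W.shape
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=W.shape)

W = rng.normal(size=(16, 32))            # toy component weights
W_zero = zero_ablate(W)
W_xavier = xavier_reinit(W)
```

Applying one of these transforms to the top-ranked components at each lesion budget (0.5%–2%), then scoring the lesioned model on the WAB subtests, produces the graded AQ curves reported next.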
Results show that targeted lesioning of phenotype‑specific components yields systematic, graded declines in AQ and in the subtests most relevant to the phenotype: Broca‑type lesions primarily impair fluency and repetition, while Wernicke‑type lesions mainly affect comprehension and naming. Randomly lesioning the same number of components produces far less consistent and smaller performance drops, confirming the clinical relevance of the component‑phenotype mapping. Moreover, the MoE architecture exhibits more localized and interpretable effects: because each expert is sparsely activated, disabling a few experts leads to clear, phenotype‑consistent deficits. In contrast, the dense model’s hidden dimensions are more redundant, so the same proportion of lesions results in more diffuse performance degradation.
The study demonstrates that LLMs, especially modular MoE models, can serve as controllable simulators of aphasic language production. By linking internal components to linguistic phenomena (via BLiMP) and to clinical phenotypes (via AphasiaBank fine‑tuning), the framework enables precise, quantitative manipulation of language functions. This opens avenues for (a) pre‑clinical testing of rehabilitation interventions (e.g., prompting strategies, targeted fine‑tuning) in a virtual patient, (b) systematic exploration of multi‑component or multi‑lesion scenarios reflecting more complex neurological damage, and (c) deeper neuroscientific investigations that align model activations with brain imaging data. Future work may extend the approach to other aphasia subtypes, incorporate longitudinal recovery simulations, and refine the component definition to capture even finer-grained functional modules, thereby strengthening the bridge between artificial and biological language systems.