Large language models are not about natural language
Large Language Models are useless for linguistics: they are probabilistic models that require vast amounts of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational system that recursively generates hierarchical thought structures. That system grows with minimal external input and readily distinguishes possible languages from impossible ones.
💡 Research Summary
The paper “Large language models are not about natural language” presents a comprehensive critique of the claim that contemporary large language models (LLMs) can meaningfully contribute to linguistic theory, processing, or acquisition research. The authors begin by targeting Futrell and Mahowald’s (2025) position that LLMs align with linguistic structure and therefore deserve attention from linguists. They argue that this position conflates two very different kinds of models: the probabilistic, data‑driven systems that LLMs embody and the generative, mind‑internal computational system that underlies human language.
The authors first situate LLMs historically, noting that they are essentially a massive scaling of early Markovian approaches to text (Markov 1913). LLMs treat language as a flat sequence of tokens and learn statistical regularities from billions of examples. By contrast, the human language faculty is described as a recursive, hierarchical system that constructs meaning‑bearing structures internally, often without any externalization. This fundamental difference means that LLMs do not generate the kinds of structures that determine semantics, as argued by Everaert et al. (2015) and Friederici et al. (2017).
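To make the contrast concrete, the toy sketch below (an illustration added for this summary, not code from the paper) shows the Markov-style view of text that LLMs scale up: a bigram model that treats a sentence as a flat sequence of tokens and predicts each next word purely from surface co-occurrence counts, with no constituents and no hierarchy.

```python
from collections import defaultdict, Counter
import random

# Toy first-order Markov (bigram) model: language as a flat token sequence,
# where the next word is predicted purely from co-occurrence counts.
corpus = "the dog chased the cat . the cat chased the mouse .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1          # count how often `nxt` follows `prev`

def generate(start: str, length: int = 8) -> str:
    """Sample a word string token by token; no constituents, no hierarchy."""
    words = [start]
    for _ in range(length):
        followers = bigrams[words[-1]]
        if not followers:
            break
        nxt, = random.choices(list(followers), weights=list(followers.values()))
        words.append(nxt)
    return " ".join(words)

print(generate("the"))   # e.g. "the cat chased the mouse . the dog chased"
```

An LLM replaces the bigram table with a transformer conditioned on a long context window and trained on billions of examples, but the objective remains next-token prediction over externalized word strings; nothing in it corresponds to the mind-internal generation of hierarchical structure that the authors describe.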
A second line of argument concerns learning mechanisms. The authors invoke the classic “poverty of the stimulus” argument: infants acquire syntactic knowledge from remarkably limited input, guided by innate constraints (Universal Grammar). LLMs, by contrast, need enormous numbers of parameters and training corpora vastly larger than a child’s linguistic experience to produce superficially fluent output. The paper cites multiple studies (e.g., Crain et al., 2017; Yang et al., 2017) showing that children form syntactic representations even in the absence of direct evidence, a capability LLMs lack.
Energy consumption provides a third, concrete contrast. The authors cite recent reports that xAI’s data center in Memphis consumes 70 MW to run 100,000 GPUs, necessitating supplemental natural‑gas generators, while Google is reportedly installing small nuclear reactors for its AI clusters. By comparison, the entire human brain runs on roughly 20 W, of which language processing accounts for only a fraction. This disparity underscores the ecological and practical implausibility of equating LLMs with the human language system.
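As a rough back-of-the-envelope comparison (using only the figures quoted above, not independent measurements):

```python
data_center_watts = 70e6   # 70 MW quoted for the Memphis facility
brain_watts = 20           # ~20 W for the entire human brain

# 3500000.0: the quoted facility draws roughly 3.5 million brains' worth of power
print(data_center_watts / brain_watts)
```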
The fourth argument focuses on developmental trajectories. Children’s language development proceeds through stages that include structures not present in the adult input, reflecting a generative capacity absent from LLM training regimes. Empirical work cited in the paper (e.g., Musso et al., 2003; Tettamanti et al., 2002) shows that the brain treats “impossible” languages (those that violate hierarchical constituency) differently from possible ones: learning them fails to engage the Broca’s‑area network recruited for natural grammatical rules. LLMs, however, fail to make this distinction. The paper critiques Futrell and Mahowald’s reliance on Kallini et al. (2024), noting that the worst LLM performance was on a shuffled‑word condition that bears no linguistic structure at all. Subsequent studies (Bowers 2025; Luo et al. 2024; Ziv et al. 2025) demonstrate that LLMs do not differentiate between normal English and backward or otherwise “impossible” variants, reinforcing the claim that they are insensitive to the hierarchical constraints that define human language.
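For readers unfamiliar with how such “impossible” stimuli are built, the sketch below (illustrative only, not the actual materials or code from Kallini et al. 2024 or the follow-up studies) shows the kind of deterministic transform involved: reversing or shuffling a word string preserves the vocabulary while destroying the hierarchical organization a possible language would have.

```python
import random

def reverse_words(sentence: str) -> str:
    """'Backward' variant: the whole word string reversed."""
    return " ".join(reversed(sentence.split()))

def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Shuffled variant: word order randomized, so no structure survives."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

sentence = "the student who read the book passed the exam"
print(reverse_words(sentence))   # "exam the passed book the read who student the"
print(shuffle_words(sentence))   # e.g. "who exam read the the book student passed the"
```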
The authors also discuss inductive bias. Human learners bring strong biases toward hierarchical, recursive structure, whereas LLMs are biased toward linear statistical regularities. This mismatch explains why LLMs learn “impossible” languages about as readily as possible ones, whereas children do not. The paper emphasizes that this fundamental divergence renders LLMs uninformative about the cognitive architecture underlying language acquisition.
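A standard illustration of this mismatch (a hypothetical sketch added for this summary, not an experiment reported in the paper) is English yes/no question formation: a learner tracking linear regularities would front the first auxiliary in the word string, whereas the rule children actually acquire is structure-dependent and fronts the auxiliary of the main clause.

```python
AUXILIARIES = {"is", "are", "was", "were"}

def linear_question(declarative: str) -> str:
    """Linear rule: front the FIRST auxiliary that appears in the word string."""
    words = declarative.split()
    i = next(idx for idx, w in enumerate(words) if w in AUXILIARIES)
    aux = words.pop(i)
    return " ".join([aux.capitalize()] + words) + "?"

print(linear_question("the man who is tall is happy"))
# -> "Is the man who tall is happy?"   (ungrammatical: the auxiliary of the relative
#    clause was fronted). The structure-dependent rule instead yields
#    "Is the man who is tall happy?", which is what children produce, even though
#    simple declaratives like "the man is happy" are compatible with both rules.
```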
In the concluding section, the authors summarize their position with a metaphor: an LLM may “quack like a duck” but it is not a duck. They argue that, in its current probabilistic form, an LLM can never become a true model of human language. Consequently, they caution linguists against over‑reliance on LLMs for theoretical insight and call for future work that respects the distinctive, energy‑efficient, generative, and hierarchical nature of the human language faculty.