From generative AI to the brain: five takeaways

Reading time: 5 minutes

📝 Abstract

The big strides seen in generative AI are based not on obscure algorithms but on clearly defined generative principles, and the resulting concrete implementations have proven themselves in a large number of applications. We suggest that it is imperative to investigate thoroughly which of these generative principles may also be operative in the brain, and hence relevant for cognitive neuroscience. In addition, ML research has led to a range of interesting characterizations of neural information processing systems. We discuss five examples: the shortcomings of world modelling, the generation of thought processes, attention, neural scaling laws, and quantization. These illustrate how much neuroscience could potentially learn from ML research.

📄 Content

A multitude of factors contributes to the current rise of generative artificial intelligence (generative AI).

Here we focus on two aspects.

• In many cases, algorithmic developments can be formulated in terms of generic generative principles.

These generative principles have proven themselves, giving rise to high-performing machine learning architectures. It is an important question whether corresponding principles may operate in the brain.

• In addition to algorithms, insights regarding general working principles and properties of neural-based information processing systems have been attained. Do these apply also to the human brain?

Machine learning (ML) offers a range of conjectures for the workings of our brain, some of which extend or parallel traditional neuroscience frameworks, while others are new. Cognitive neuroscience should accept the challenge and evaluate these conjectures systematically in the context of wet information processing.

A comprehensive overview of potentially relevant cross-relations between ML and the neurosciences is beyond the scope of this perspective. We will focus instead on five key aspects elucidating the importance of paying attention to the concepts that are being developed for generative artificial intelligence. A flurry of new ideas awaits the scrutiny of cognitive neuroscience.

The two learning principles, ‘predictive coding’ (neuroscience) and ‘autoregressive language modeling’ (ML), are both dedicated to the task of building world models, with the former also having active components (Brodski-Guerniero et al., 2017) and operating, in addition, on distinct scales and modalities (Caucheteux et al., 2023).

For large language models (LLMs), autoregressive language modeling takes the form of next-word predictions. However, ML tells us that world-model building alone is insufficient.

The base or foundation model, viz. the result of word-prediction training, does contain the knowledge of the world as present in the training data. But all it can do is complete a given input word by word. At this stage, key concepts relevant for the interaction with users, such as ‘question’ and ‘answer’, are not yet explicitly encoded.
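What pure next-word prediction buys, and what it lacks, can be illustrated with a toy autoregressive model. The following is a minimal sketch, with a bigram count model standing in for a transformer; the corpus and all names are hypothetical:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for web-scale training data.
corpus = "the brain builds a world model . the model predicts the next word .".split()

# "Training": count bigram successors -- the simplest autoregressive model.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def complete(prompt_word, n=4):
    """Greedy word-by-word completion: all a base model can do."""
    out = [prompt_word]
    for _ in range(n):
        nxt_counts = successors.get(out[-1])
        if not nxt_counts:
            break
        out.append(nxt_counts.most_common(1)[0][0])
    return " ".join(out)

print(complete("the"))  # continues the text word by word
```

Note that nothing in this model distinguishes a question from any other context; it can only continue text, which is exactly the limitation that motivates the fine-tuning stage discussed next.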

In the early 2020s, a significant step forward was the realization that the otherwise essentially useless base model can be turned into a cognitive powerhouse via a secondary process, denoted ‘fine-tuning’ or ‘human supervised fine-tuning’ (HSFT).

• A core fine tuning objective is to teach the system to generate meaningful responses for a given prompt, and not just engage in text completion.

• Next comes fine tuning of style, political correctness, etc.

• Models may be fine tuned further for specific downstream tasks, specializing the otherwise universal LLM to excel, e.g., in accounting.

It seems likely that equivalent processes occur in our brains. In ML, the two processes are normally separated, viz. performed one after the other. In the brain, world-model training and fine-tuning via reinforcement are conceivably active at the same time.

Takeaway: ML offers a concrete construction plan for a basic cognitive system: universal unsupervised world modelling followed by supervised fine-tuning. To what extent does the brain follow this recipe?
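The two-stage recipe of the takeaway can be sketched in a few lines. Here `TinyLM`, `pretrain`, and `supervised_fine_tune` are hypothetical stand-ins for an actual training pipeline; the point is only that both stages share one update rule and differ in which tokens serve as targets:

```python
from collections import Counter, defaultdict

class TinyLM:
    """Toy stand-in for a neural LM: remembers (last token -> next token) counts."""
    def __init__(self):
        self.counts = defaultdict(Counter)
    def update(self, context, target):
        self.counts[context[-1]][target] += 1
    def predict(self, context):
        return self.counts[context[-1]].most_common(1)[0][0]

def pretrain(model, corpus):
    # Stage 1: unsupervised world modelling via next-token prediction.
    for tokens in corpus:
        for t in range(1, len(tokens)):
            model.update(tokens[:t], tokens[t])
    return model

def supervised_fine_tune(model, dialogues):
    # Stage 2: same update rule, but only response tokens are targets,
    # teaching the model to answer rather than merely continue text.
    for prompt, response in dialogues:
        tokens = prompt + response
        for t in range(len(prompt), len(tokens)):
            model.update(tokens[:t], tokens[t])
    return model

lm = pretrain(TinyLM(), [["water", "is", "wet"]])
lm = supervised_fine_tune(lm, [(["is", "water", "wet", "?"], ["yes"])])
print(lm.predict(["water", "wet", "?"]))  # -> "yes"
```

The sketch also makes the brain analogy concrete: nothing prevents interleaving calls to both functions, which is the "active at the same time" scenario conjectured above.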

The autonomous generation of thoughts is considered to be the basis of human intelligence. It is hence remarkable that commercial chatbots started to engage in rudimentary ‘thinking’ by the mid-2020s. The algorithm used is denoted ‘Chain-of-Thought’ (CoT) (Zhang et al., 2025), originally a prompting technique (Wei et al., 2022). It is unclear to what extent human thought processes may be understood within the CoT framework, if at all. The same holds for its generalizations, viz. ‘Chain-of-X’ (CoX) (Xia et al., 2024), such as Chain-of-Feedback, Chain-of-Instructions, or Chain-of-Histories. In any case, the underlying generative principles are of interest.

• CoT is one of many possible fine-tuning processes, characterized by a specific objective function.

• The system auto-prompts, appending its own thoughts to the user prompt.

• The response is then generated using the combined prompt: (user input)+(chain of thoughts).
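The three bullets above can be sketched as follows; `generate` is a hypothetical stand-in for a single LLM decoding pass, not a real API:

```python
def generate(prompt):
    """Hypothetical stand-in for one LLM decoding pass (prompt -> text)."""
    if "let's think step by step" in prompt.lower():
        return "first recall the facts; then combine them."
    return "final answer derived from: " + prompt

def chain_of_thought_answer(user_input):
    # Step 1: the system auto-prompts, generating its own thoughts.
    thoughts = generate(user_input + "\nLet's think step by step.")
    # Step 2: the response is produced from the combined prompt:
    # (user input) + (chain of thoughts).
    return generate(user_input + "\n" + thoughts)

print(chain_of_thought_answer("Why is the sky blue?"))
```

The essential structural point is that the model's own output is fed back as part of its next input; no external supervision intervenes between the two passes.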

Why are responses substantially better when the LLM thinks for a while? A possible explanation is based on the information bottleneck (IB) framework (Tishby and Zaslavsky, 2015). We recall that the token sequence is (user input) + (chain of thoughts) + (response).

The middle part, the thought processes, can be interpreted to act as an information bottleneck for the cognitive processing between input and output (Lei et al., 2025). This principle can be expressed as an information-theoretical min-max optimization.
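With input $X$, self-generated thoughts $T$, and output $Y$, a standard way to write this trade-off is the IB Lagrangian; the trade-off parameter $\beta > 0$ is an assumption of this sketch, not specified in the text:

```latex
\min_{p(t \mid x)} \; \Big[\, I(X;T) \;-\; \beta \, I(T;Y) \,\Big], \qquad \beta > 0
```

Minimizing the first term compresses the input into abstract thoughts; the second term, entering with a negative sign, is thereby maximized, keeping the thoughts predictive of the output. The two bullets below unpack exactly these two terms.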

• The mutual information between the input and the CoT is minimized.

This means that the self-generated thoughts should abstract from the specific formulation of the input, retaining only the overall content.

• The mutual information between the CoT and the output is maximized. This is because the self-generated thoughts must retain the information needed to generate a high-quality response.
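Both mutual-information terms can be estimated empirically. A toy sketch with hypothetical data and a plug-in estimator, showing that thoughts copying the input verbatim carry maximal information about it, while thoughts that abstract away the phrasing carry none:

```python
from math import log2
from collections import Counter

def mutual_information(pairs):
    """I(A;B) in bits, plug-in estimate from a list of (a, b) samples."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
               for (a, b), c in p_ab.items())

# Thoughts that copy the input verbatim: I(input; thought) is maximal.
copies = [("q1", "t1"), ("q2", "t2"), ("q1", "t1"), ("q2", "t2")]
# Thoughts that abstract from the phrasing: both inputs map to one thought.
abstracted = [("q1", "t"), ("q2", "t"), ("q1", "t"), ("q2", "t")]

print(mutual_information(copies))      # 1.0 bit
print(mutual_information(abstracted))  # 0.0 bits
```

The IB objective favors the second regime on the input side while simultaneously demanding the first regime between thoughts and output.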

This content is AI-processed based on ArXiv data.
