Evolutionary System 2 Reasoning: An Empirical Proof

Reading time: 5 minutes

📝 Original Info

  • Title: Evolutionary System 2 Reasoning: An Empirical Proof
  • ArXiv ID: 2512.05760
  • Date: 2025-12-05
  • Authors: Zeyuan Ma, Wenqi Huang, Guo-Huan Song, Hongshu Guo, Sijie Ma, Zhiguang Cao, Yue-Jiao Gong

📝 Abstract

Machine intelligence embodies the ultimate dream of making machines' intelligence comparable to that of human beings. While recent progress in Large Language Models (LLMs) demonstrates substantial task-specific skill across a wide array of downstream tasks, these models still fall short of general intelligence. Following the correlation between intelligence and System 2 reasoning (slow thinking), this paper aims to answer a worthwhile research question: could machine intelligence such as LLMs be evolved to acquire reasoning ability (not a specific skill), just as human beings did? To this end, we propose the evolutionary reasoning optimization (ERO) framework, which performs survival of the fittest over a population of LLMs to search for an individual with strong reasoning ability. Given a reasoning task, ERO first initializes multiple LLMs as a population, after which an evolutionary strategy evolves the population to maximize the quantified reasoning score of the best individual. Based on experiments on representative test suites, we report two surprising empirical findings: i) the latest LLMs, such as GPT-5, still show limited System 2 reasoning ability; ii) with ERO's simple evolution loop, a relatively weak model (Qwen-7B) can be enhanced until powerful reasoning ability emerges. Our project can be accessed at https://github.com/MetaEvo/ERO for reproduction.
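The abstract describes ERO only at a high level: a population of LLMs is initialized, and a survival-of-the-fittest loop maximizes a quantified reasoning score of the best individual. Below is a minimal sketch of what such an evolution loop could look like. The names (Individual, reasoning_score, mutate) and the selection and variation details are illustrative assumptions, not the authors' actual implementation, which is available at https://github.com/MetaEvo/ERO.

# Illustrative sketch of a survival-of-the-fittest loop over LLM "individuals",
# based only on the abstract's description of ERO. Names such as Individual,
# reasoning_score, and mutate are assumptions for illustration, not the
# authors' API (see https://github.com/MetaEvo/ERO for the real code).
import random
from dataclasses import dataclass

@dataclass
class Individual:
    """One population member, e.g. a configuration wrapped around a base LLM."""
    genome: dict          # hypothetical: parameters shaping the LLM's reasoning behaviour
    fitness: float = 0.0  # quantified reasoning score on the target task

def reasoning_score(genome: dict) -> float:
    """Placeholder fitness: evaluate the configured LLM on a reasoning test suite."""
    return random.random()  # stand-in; ERO would score the model on benchmark tasks

def mutate(genome: dict) -> dict:
    """Placeholder variation operator producing an offspring genome."""
    child = dict(genome)
    child["seed"] = random.randrange(10**6)
    return child

def evolve(pop_size: int = 8, generations: int = 20) -> Individual:
    # 1) Initialize a population of LLM individuals.
    population = [Individual(genome={"seed": random.randrange(10**6)})
                  for _ in range(pop_size)]
    best = population[0]
    for _ in range(generations):
        # 2) Evaluate: assign each individual its quantified reasoning score.
        for ind in population:
            ind.fitness = reasoning_score(ind.genome)
        # 3) Select: keep the fitter half (survival of the fittest).
        population.sort(key=lambda ind: ind.fitness, reverse=True)
        best = max(best, population[0], key=lambda ind: ind.fitness)
        survivors = population[: pop_size // 2]
        # 4) Vary: refill the population with mutated copies of surviving genomes.
        offspring = [Individual(genome=mutate(random.choice(survivors).genome))
                     for _ in range(pop_size - len(survivors))]
        population = survivors + offspring
    return best

if __name__ == "__main__":
    print(f"best reasoning score: {evolve().fitness:.3f}")

In the actual framework, the fitness evaluation would run each candidate LLM on the reasoning test suite, and the variation step would act on whatever the authors treat as an individual's genome (for example prompts, weights, or sampling parameters); those details are not specified in the text extracted here.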

💡 Deep Analysis

Figure 1: An intuitive comparison between the evolution paths of human beings and machine intelligence.

📄 Full Content

Evolutionary System 2 Reasoning: An Empirical Proof

Zeyuan Ma (1,4), Wenqi Huang (1), Guo-Huan Song (2,3), Hongshu Guo (1,4), Sijie Ma (1), Zhiguang Cao (5), Yue-Jiao Gong (1,*)
(1) South China University of Technology, (2) Zhejiang Normal University, (3) Northern Computility, (4) Panorama Optimization, (5) Singapore Management University
*Corresponding author (gongyuejiao@gmail.com)

Figure 1: An intuitive comparison between the evolution paths of human beings and machine intelligence. (Two timeline panels, "The Evolution of Human Beings" and "The Evolution of Machine Intelligence"; the latter traces milestones from game theory, information theory, and the Turing test in the 1940s, through the perceptron, backpropagation, CNNs, LSTM, AlexNet, GANs, and the Transformer, to LLMs in the 2020s.)

"This paper does not advertise for LLMs, but explores more possibilities." — The authors

1 Introduction

Machine intelligence (often used interchangeably with AI) has experienced ups and downs over a long history (Legg and Hutter 2007; Minsky 2007; LeCun, Bengio, and Hinton 2015). Since the initial proposal of AI in the 1950s (McCarthy et al. 2006), an evolution path has been observed: from basic theories (Shannon 1948; Turing 1950) to concrete architectures (Rosenblatt 1958; Fukushima 1980; Hochreiter and Schmidhuber 1997; Vaswani et al. 2017; Gu and Dao 2024) and algorithms (Robbins and Monro 1951; Werbos 1994; Graves 2013; Loshchilov and Hutter 2017). Today, the application of AI has spread to every corner of the world. Domains such as image processing (Gonzalez 2009), natural language processing (Bengio et al. 2003), and scientific discovery (Jumper et al. 2021) benefit from its learning power and corresponding human-competitive performance.

However, we should not overlook the dark side of advanced machine intelligence (i.e., LLMs) simply because of its twinkling academic and engineering achievements (Zhou et al. 2024; Li et al. 2024; Novikov et al. 2025a). In other words, we have to realize that LLMs, though pre-trained on a massive prior of human knowledge, may still operate at the pattern-recognition level (fast thinking, System 1 reasoning) and hence lack long-chain, deep, logical reasoning ability (slow thinking, System 2 reasoning), as testified in recent competitions (https://arcprize.org/leaderboard).

As illustrated in Figure 1, such System 2 reasoning inability potentially roots in the essential difference between the evolution of machine intelligence and that of human beings (Cosmides and Tooby 1994; Pinker 2003). Human beings are continually involved in an evolutionary process under open-ended environmental selection pressure, following the survival-of-the-fittest principle proposed by Darwin (Darwin, Burrow, and Burrow 1958). The term "open-ended" refers to the extreme generalization scenario in which environmental uncertainty is naturally unknown to human beings (Wolpert 2024). In contrast, almost all machine intelligence instances are trained for specific application scopes explicitly restricted by their developers (human beings). The feedback or learning signal in their learning loops may inherently restrict them from general intelligence with logical reasoning (Wolpert and Macready 2002). To make this point clearer, we borrow the valuable perspective of developmental psychology (Spelke and Kinzler 2007), which holds the position that: human-lev

📸 Image Gallery

arc-intro.png evolution.png intro.png prompt.png showcase.png

Reference

This content is AI-processed based on open access ArXiv data.
