An Evolutionary Algorithm for Error-Driven Learning via Reinforcement

Although different learning systems are coordinated to afford complex behavior, little is known about how this occurs. This article describes a theoretical framework that specifies how complex behaviors that might be thought to require error-driven learning might instead be acquired through simple reinforcement. This framework includes specific assumptions about the mechanisms that contribute to the evolution of (artificial) neural networks to generate topologies that allow the networks to learn large-scale complex problems using only information about the quality of their performance. The practical and theoretical implications of the framework are discussed, as are possible biological analogs of the approach.


💡 Research Summary

The paper proposes a novel theoretical framework and evolutionary algorithm that enable complex, error‑driven learning to be achieved using only reinforcement signals that reflect performance quality. The authors argue that the apparent necessity of explicit error information (as in back‑propagation) can be bypassed if the neural network’s architecture itself is evolved to be intrinsically compatible with reinforcement‑only learning.

The method, named Error‑Driven Learning via Reinforcement (EDLR), consists of two distinct phases. In the first, an evolutionary search explores a space of network topologies encoded as genomes containing parameters such as neuron count, layer arrangement, connection types (feed‑forward, recurrent, skip), and weight‑initialisation schemes. A population of candidate networks is evaluated on a given reinforcement learning (RL) task; fitness is computed from cumulative reward and learning speed (e.g., reward‑per‑episode growth). Selection, crossover, and mutation (including addition/removal of neurons or connections) generate successive generations. Over thousands of generations the algorithm discovers architectures that channel reward information efficiently, effectively creating a structural analogue of error propagation.
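The first phase can be sketched as a minimal genetic algorithm over topology genomes. Everything below is illustrative rather than the paper's actual encoding: the genome fields, the toy fitness stand-in (the paper evaluates real networks on RL tasks and combines cumulative reward with learning speed), and the operator probabilities are all assumptions.

```python
import random

# Hypothetical genome: the paper encodes neuron count, layer arrangement,
# connection types, and weight-init schemes; this sketch keeps two fields.
def random_genome(rng):
    return {
        "n_neurons": rng.randint(4, 32),
        "skip_connections": rng.random() < 0.5,
    }

def evaluate_fitness(genome, rng):
    # Stand-in for evaluating the genome's network on an RL task. The paper
    # scores cumulative reward plus learning speed; this toy score simply
    # rewards mid-sized networks with skip connections, plus evaluation noise.
    reward = -abs(genome["n_neurons"] - 16) + (4 if genome["skip_connections"] else 0)
    return reward + rng.gauss(0, 0.1)

def mutate(genome, rng):
    # Structural mutation: resize the network or toggle skip connections.
    child = dict(genome)
    if rng.random() < 0.5:
        child["n_neurons"] = max(2, child["n_neurons"] + rng.choice([-2, -1, 1, 2]))
    if rng.random() < 0.2:
        child["skip_connections"] = not child["skip_connections"]
    return child

def crossover(a, b, rng):
    # Uniform crossover: each field inherited from a random parent.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in a}

def evolve(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: evaluate_fitness(g, rng), reverse=True)
        elite = scored[: pop_size // 4]  # truncation selection
        pop = elite + [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=lambda g: evaluate_fitness(g, rng))

best = evolve()
```

In the real method each fitness evaluation means training and running a candidate network, which is why the search spans thousands of generations and is computationally heavy.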

In the second phase, the evolved architecture is fixed and standard policy‑gradient or evolutionary‑strategy optimisers (e.g., PPO, A2C, CMA‑ES) adjust the synaptic weights. Because the topology already supports efficient internal flow of reinforcement information, weight optimisation converges markedly faster than in conventional deep RL pipelines that rely on error‑driven gradient signals.
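The second phase can be illustrated with a toy reinforcement-only weight optimiser on a frozen topology. The cross-entropy-style evolution strategy below is a simple stand-in for the paper's PPO/A2C/CMA-ES choices; the fixed 1-3-1 network, the regression-as-reward task, and all function names are assumptions for the sketch.

```python
import math
import random

def forward(weights, x):
    # Frozen evolved topology (assumed): 1 input -> 3 tanh hidden -> 1 output.
    w_in, w_out = weights[:3], weights[3:]
    return sum(math.tanh(w * x) * v for w, v in zip(w_in, w_out))

def reward(weights):
    # Performance-quality signal only: negative mean squared error against a
    # toy target function, with no per-weight error signal exposed.
    xs = [i / 10 for i in range(-10, 11)]
    return -sum((forward(weights, x) - math.sin(x)) ** 2 for x in xs) / len(xs)

def es_optimize(dim=6, pop=30, iters=200, sigma=0.1, seed=0):
    # Sample weight vectors around the current mean, keep the top quarter
    # by reward, and recombine them into the next mean.
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(iters):
        samples = [[m + rng.gauss(0, sigma) for m in mean] for _ in range(pop)]
        samples.sort(key=reward, reverse=True)
        top = samples[: pop // 4]
        mean = [sum(w[i] for w in top) / len(top) for i in range(dim)]
    return mean, reward(mean)

weights, score = es_optimize()
```

The point of the sketch is the division of labour: the topology is no longer searched here, only the weights move, driven by a scalar quality score.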

The authors validate EDLR on three benchmark domains: (1) classic continuous control problems (MountainCarContinuous, LunarLander), (2) high‑dimensional Atari 2600 games (Breakout, Pong), and (3) a 6‑DOF robotic arm manipulation task. Compared with baseline methods such as DQN and standard policy‑gradient networks, EDLR achieves 15‑25 % higher average reward for the same number of training episodes, and shows a two‑fold increase in success rate on sparse‑reward environments like Montezuma’s Revenge. Moreover, the evolved networks use roughly 30 % fewer parameters than typical CNN‑RNN hybrids while matching or surpassing their performance.

From a theoretical standpoint, the paper introduces the concept of “reward propagation”: reinforcement signals influence learning not by directly adjusting individual weights, but by shaping the architecture that determines how information flows. The authors formalise the relationship between the RL value function and the evolutionary fitness function, proving that under certain smoothness conditions both objectives share the same optimal solutions. This bridges the gap between error‑driven optimisation and reinforcement‑only adaptation.
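The paper's formal statement is described only in prose above; one plausible shape for such a correspondence (the symbols $F$, $J$, $\rho$, and $\pi_\theta$ are assumptions, not taken from the paper) equates the evolutionary fitness of a parameterised network with the expected value of its induced policy:

```latex
F(\theta) \;=\; \mathbb{E}_{s_0 \sim \rho}\big[ V^{\pi_\theta}(s_0) \big]
\quad\Longrightarrow\quad
\arg\max_{\theta} F(\theta) \;=\; \arg\max_{\theta} J(\pi_\theta)
```

where $J(\pi_\theta) = \mathbb{E}\big[\sum_t \gamma^t r_t\big]$ is the usual discounted return; under the paper's smoothness conditions the two maximiser sets coincide.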

Biological analogues are discussed, highlighting recent neuroscience findings that suggest large‑scale circuit re‑organisation—rather than solely synaptic plasticity—plays a crucial role in skill acquisition. The evolutionary pressure that sculpts network topology in EDLR is likened to natural selection shaping neural circuits that are highly responsive to reward signals.

Limitations are acknowledged: the evolutionary search is computationally intensive, making real‑time deployment challenging; the discovered architectures may over‑fit to a single task; and environments that provide no informative reward can stall evolution. Future work is proposed to integrate meta‑evolutionary techniques (e.g., Bayesian optimisation) for faster architecture search, to develop multitask‑compatible topologies, and to combine intrinsic‑motivation mechanisms that supply auxiliary signals in reward‑starved settings.

In summary, the paper demonstrates that by evolving neural network structures to be inherently compatible with reinforcement‑only learning, it is possible to replicate the learning efficiency of error‑driven methods while relying solely on performance‑based feedback. This contribution offers both practical advantages for designing scalable RL agents and a conceptual bridge linking evolutionary biology, neuroscience, and artificial intelligence.

