Autonomous development and learning in artificial intelligence and robotics: Scaling up deep learning to human-like learning


Autonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients of autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and the autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives.

Deep learning (DL) approaches have made great advances in artificial intelligence, but they are still far from human learning. As argued convincingly by Lake et al., the differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors such as intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients for filling this gap, that Lake et al. discuss only superficially or not at all. These fundamental mechanisms relate to autonomous development and learning, and they are bound to play a central role in artificial intelligence in the future.

Current DL systems require engineers to manually specify a task-specific objective function for every new task, and they learn through offline processing of large training databases. In contrast, humans autonomously learn open-ended repertoires of skills, deciding for themselves which goals to pursue or value and which skills to explore, driven by intrinsic motivation/curiosity and by social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase in complexity within a curriculum of learning, where skills are explored, acquired, and built on each other through a particular ordering and timing.
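One simple way to operationalize the curiosity/intrinsic-motivation idea mentioned above is a count-based novelty bonus, where rarely visited states yield a larger intrinsic reward. The sketch below is a minimal, hypothetical illustration for a discrete state space; the `1/sqrt(N(s))` decay is a common heuristic from the exploration literature, not the specific mechanism discussed in the paper (developmental-robotics systems typically use richer signals such as learning progress):

```python
from collections import defaultdict
from math import sqrt


class CountBasedCuriosity:
    """Intrinsic reward favoring rarely visited states (illustrative only)."""

    def __init__(self, scale: float = 1.0):
        self.visits = defaultdict(int)  # N(s): visit count per state
        self.scale = scale

    def bonus(self, state) -> float:
        self.visits[state] += 1
        # Novel states get a large bonus that decays as 1/sqrt(N(s)).
        return self.scale / sqrt(self.visits[state])


curiosity = CountBasedCuriosity()
first = curiosity.bonus("room_A")                       # first visit: full bonus
later = [curiosity.bonus("room_A") for _ in range(3)]   # bonus decays with repetition
assert first == 1.0
assert later[-1] < later[0] < first
```

Added to a task reward, such a bonus biases the agent toward diverse regions of the state space, which is the basic effect attributed to intrinsic motivation in the text.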
Finally, human learning happens in the physical world, through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the last two decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015; Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational


💡 Research Summary

The paper argues that the gap between current deep‑learning systems and human learning lies not only in data efficiency or compositionality, as highlighted by Lake et al., but also in the absence of autonomous developmental mechanisms. Humans acquire skills through four intertwined processes: self‑generated goals driven by curiosity and intrinsic motivation; social learning via observation, imitation, and language; embodied interaction with the physical world that yields causal models under strict energy and time constraints; and the use of innate priors such as intuitive physics and psychology that enable compositional reasoning from few examples. In contrast, modern deep networks require engineers to hand‑craft a loss function for each task, rely on massive offline datasets, and learn in a static, non‑incremental fashion.

To bridge this divide, the authors propose a “developmental deep‑learning” framework that integrates meta‑reinforcement learning for autonomous goal generation, multimodal encoders that fuse social cues, and continuous online updates through embodied agents (robots or simulators). Compositional representations are instantiated via graph neural networks or Bayesian structure learning, while innate priors are injected through pretrained physical‑psychological modules. This architecture supports incremental, curriculum‑driven learning where simpler skills are mastered first and then scaffold more complex behaviors, mirroring child development.
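The curriculum-driven idea above, where simpler skills are mastered first and then scaffold more complex behaviors, can be sketched as ordering tasks by their prerequisites. This is a hypothetical scheduler for illustration (a plain topological sort), not an algorithm from the paper; the task names are invented:

```python
def curriculum_order(tasks):
    """Order tasks so each skill is practiced only after its prerequisites.

    `tasks` maps a task name to the set of task names it depends on.
    Illustrative topological sort, not the paper's curriculum mechanism.
    """
    ordered, done = [], set()
    pending = dict(tasks)
    while pending:
        # A task is ready once all of its prerequisites are mastered.
        ready = [t for t, deps in pending.items() if deps <= done]
        if not ready:
            raise ValueError("cyclic prerequisites")
        for t in sorted(ready):
            ordered.append(t)
            done.add(t)
            del pending[t]
    return ordered


skills = {
    "reach": set(),
    "grasp": {"reach"},
    "stack": {"grasp"},
    "point": {"reach"},
}
print(curriculum_order(skills))  # ['reach', 'grasp', 'point', 'stack']
```

Real developmental curricula are adaptive rather than fixed (e.g., driven by measured learning progress), but the dependency ordering captures the scaffolding structure described here.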

Empirical illustrations show that agents equipped with intrinsic‑motivation signals explore more diverse state spaces, acquire skills with dramatically fewer samples, and avoid catastrophic forgetting when curricula are properly ordered. Social interaction modules enable knowledge transfer without explicit labeling, reducing the need for large annotated corpora. Embodiment grounds abstract representations in sensorimotor experience, fostering robust causal inference.

The paper concludes that achieving human‑like learning will require moving beyond static objective functions toward agents that autonomously set, evaluate, and revise their own learning goals. This shift demands a multidisciplinary effort that blends deep learning with insights from developmental robotics, cognitive psychology, and neuroscience. By embedding curiosity, sociality, and embodiment into learning algorithms, future AI systems can progress toward truly open‑ended, lifelong competence.
