The Learning Deficit
Modern artificial intelligence, despite its impressive capabilities, suffers from a critical limitation: it stops learning once deployed. Unlike children, who continuously absorb information and adapt to their surroundings, AI models remain static, requiring extensive human intervention and retraining whenever they encounter new scenarios. This reliance on 'MLOps' (complex pipelines in which human experts collect data, design training modules, and rebuild models) creates significant constraints. AI trained on internet data often behaves unpredictably in real-world situations that deviate from its training set, unable to adjust to changing environments or learn from its own errors. All learning in these systems is confined to an offline phase, carried out entirely under human oversight before deployment.
Two Pillars of Learning
The proposed solution hinges on integrating two fundamental learning modes inspired by biological systems. 'System A' is learning by observation: agents build internal world models by watching and predicting, much as infants learn to recognize faces and as current AI models learn through self-supervision such as text prediction or image analysis. System A scales well and excels at pattern discovery, but it struggles to distinguish correlation from causation and is detached from action. Complementing it is 'System B', learning by action: trial and error, reinforcement learning, and goal-directed behavior, as when a child learns to walk. Its strength lies in its grounding in real-world consequences and its ability to discover novel solutions, but it is notoriously sample-inefficient, demanding extensive interaction. Biological systems integrate the two seamlessly, with perception guiding action and action refining perception.
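The contrast between the two modes can be sketched in miniature. Below, a hypothetical `system_a_update` reduces prediction error against an observation stream (learning by watching), while `system_b_update` refines action values from the rewards those actions return (learning by doing). The function names, learning rates, and toy environment are illustrative assumptions, not part of the proposal itself.

```python
import random

# System A: learning by observation -- maintain a running estimate that
# predicts the next observation, shrinking the prediction error over time.
def system_a_update(estimate, observation, lr=0.1):
    error = observation - estimate
    return estimate + lr * error, abs(error)

# System B: learning by action -- a bandit-style trial-and-error update in
# which the value of each action is refined from the reward it produced.
def system_b_update(values, action, reward, lr=0.1):
    values[action] += lr * (reward - values[action])
    return values

random.seed(0)

# System A watches a noisy stream centered on a hidden mean of 5.0.
estimate = 0.0
for _ in range(200):
    obs = random.gauss(5.0, 1.0)
    estimate, err = system_a_update(estimate, obs)

# System B tries two actions with true mean rewards 1.0 and 2.0.
values = [0.0, 0.0]
for _ in range(500):
    action = random.randrange(2)  # pure random exploration, for simplicity
    reward = random.gauss([1.0, 2.0][action], 0.5)
    values = system_b_update(values, action, reward)
```

Note the asymmetry the article describes: System A converges from passive observation alone, while System B must repeatedly act and experience consequences before its value estimates separate, which is exactly its sample-inefficiency.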
Introducing System M
To bridge these learning modes and enable dynamic adaptation, the researchers propose 'System M', an overarching organizer that manages learning itself. System M acts as an intelligent controller, monitoring internal signals such as prediction errors, uncertainty levels, and task performance to make crucial meta-decisions: which data warrants attention, whether to explore new possibilities or exploit current knowledge, and when to prioritize learning from observation versus learning from action. The same mechanism naturally governs human and animal learning: babies focus on salient stimuli like faces and voices, children explore when uncertain and practice when confident, and brains consolidate information even during sleep. Implementing System M in AI would automate tasks currently performed by humans, including selecting relevant data, tuning learning rates, and switching between learning strategies, allowing AI to adapt autonomously on the basis of its ongoing experience rather than fixed training protocols.
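A minimal sketch of such a controller clarifies the idea. This hypothetical `SystemM` class watches the two internal signals the article names, prediction error and uncertainty, and issues the corresponding meta-decisions. The thresholds, rule structure, and output fields are my own illustrative assumptions; the proposal does not specify an implementation.

```python
# A rule-based stand-in for System M: it never learns the task itself;
# it only decides HOW the underlying systems should learn next.
class SystemM:
    def __init__(self, explore_threshold=0.5, observe_threshold=0.8):
        self.explore_threshold = explore_threshold    # assumed cutoff
        self.observe_threshold = observe_threshold    # assumed cutoff

    def decide(self, prediction_error, uncertainty):
        # High uncertainty -> explore new possibilities; low -> exploit.
        mode = "explore" if uncertainty > self.explore_threshold else "exploit"
        # Large prediction error means the world model is off, so prioritize
        # learning from observation (System A); otherwise refine action
        # policies (System B).
        focus = "observe" if prediction_error > self.observe_threshold else "act"
        # Attend to data in proportion to how surprising it is, capped at 1.
        attention = min(1.0, prediction_error)
        return {"mode": mode, "focus": focus, "attention": attention}

m = SystemM()
decision = m.decide(prediction_error=1.2, uncertainty=0.3)
# -> {"mode": "exploit", "focus": "observe", "attention": 1.0}
```

In a real system these meta-decisions would be learned rather than hand-coded rules, which is precisely the role the evolutionary timescale plays in the next section.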
Building Autonomous AI
The path to creating AI systems capable of autonomous learning involves a two-timescale approach inspired by biological evolution and development. On the developmental timescale, an AI agent learns throughout its 'lifetime', continuously updating Systems A and B through environmental interactions, all orchestrated by a fixed System M. On the evolutionary timescale, System M itself is optimized across millions of simulated lifetimes. A 'fitness function' guides this process, rewarding agents that learn rapidly and robustly across diverse, unpredictable environments. This computational approach, while demanding, would use evolutionary algorithms to discover highly effective meta-control policies, mimicking how evolution has shaped human learning instincts over millennia. The shift promises AI that can truly improve from experience and navigate complex, real-world challenges.
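The two timescales can be sketched as nested loops. In this toy version, a single learning-rate parameter stands in for an entire System M policy: the inner `lifetime_fitness` runs one developmental lifetime under a fixed policy in a changing environment, and the outer `evolve` loop mutates and selects policies by a fitness function that rewards fast, robust tracking. Every task detail here is an assumption chosen to keep the sketch self-contained.

```python
import random

random.seed(1)

def lifetime_fitness(lr, trials=100):
    """Inner (developmental) loop: one lifetime spent tracking a target
    that jumps periodically; fitness is the negated cumulative error."""
    estimate, target, total_error = 0.0, 0.0, 0.0
    for t in range(trials):
        if t % 25 == 0:
            target = random.uniform(-5, 5)      # the environment changes
        obs = target + random.gauss(0, 0.2)     # noisy observation
        total_error += abs(obs - estimate)
        estimate += lr * (obs - estimate)       # learning under a fixed policy
    return -total_error

def evolve(generations=30, pop_size=20):
    """Outer (evolutionary) loop: optimize the meta-parameter over many
    simulated lifetimes, keeping the fastest, most robust learners."""
    population = [random.uniform(0.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lifetime_fitness, reverse=True)
        parents = scored[: pop_size // 4]       # selection
        population = [                          # mutation, clipped to [0, 1]
            min(1.0, max(0.0, random.choice(parents) + random.gauss(0, 0.05)))
            for _ in range(pop_size)
        ]
    return max(population, key=lifetime_fitness)

best_lr = evolve()
```

The evolved parameter ends up well away from zero because sluggish learners never recover from environmental shifts, which is the fitness function's point: it selects for adaptability itself, not for performance on any single environment.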