The Learning Deficit
Modern artificial intelligence, despite its impressive capabilities, suffers from a critical limitation: it stops learning once deployed. Unlike children, who continuously absorb information and adapt to their surroundings, AI models remain static, requiring extensive human intervention and retraining whenever they encounter new scenarios. This reliance on 'MLOps' (complex pipelines in which human experts collect data, design training modules, and rebuild models) creates significant constraints. AI trained on internet data often behaves unpredictably in real-world situations that deviate from its training set, unable to adjust to changing environments or learn from its own errors. All learning in these systems is confined to an offline phase, carried out entirely under human oversight before deployment.
Two Pillars of Learning
The proposed solution hinges on integrating two fundamental learning modes inspired by biological systems. 'System A' is learning by observation: agents build internal world models by watching and predicting, much as infants learn to recognize faces and as current AI models learn through self-supervision such as text prediction or image analysis. System A scales well and excels at pattern discovery, but it struggles to distinguish correlation from causation and is detached from action. Complementing it is 'System B', learning by action: trial and error, reinforcement learning, and goal-directed behavior, as when a child learns to walk. Its strength lies in its grounding in real-world consequences and its ability to discover novel solutions, but it is notoriously sample-inefficient, demanding extensive interaction. Biological systems integrate the two seamlessly, with perception guiding action and action refining perception.
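The contrast between the two modes can be sketched in miniature. Below, a hypothetical `system_a_update` reduces prediction error against an observation stream (learning by watching), while `system_b_update` refines action values from the rewards those actions return (learning by doing). The function names, learning rates, and toy environment are illustrative assumptions, not part of the proposal itself.

```python
import random

# System A: learning by observation -- maintain a running estimate that
# predicts the next observation, shrinking the prediction error over time.
def system_a_update(estimate, observation, lr=0.1):
    error = observation - estimate
    return estimate + lr * error, abs(error)

# System B: learning by action -- a bandit-style trial-and-error update in
# which the value of each action is refined from the reward it produced.
def system_b_update(values, action, reward, lr=0.1):
    values[action] += lr * (reward - values[action])
    return values

random.seed(0)

# System A watches a noisy stream centered on a hidden mean of 5.0.
estimate = 0.0
for _ in range(200):
    obs = random.gauss(5.0, 1.0)
    estimate, err = system_a_update(estimate, obs)

# System B tries two actions with true mean rewards 1.0 and 2.0.
values = [0.0, 0.0]
for _ in range(500):
    action = random.randrange(2)  # pure random exploration, for simplicity
    reward = random.gauss([1.0, 2.0][action], 0.5)
    values = system_b_update(values, action, reward)
```

Note the asymmetry the article describes: System A converges from passive observation alone, while System B must repeatedly act and experience consequences before its value estimates separate, which is exactly its sample-inefficiency.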
Introducing System M
To bridge these learning modes and enable dynamic adaptation, the researchers propose 'System M', an overarching organizer that manages learning itself. System M acts as an intelligent controller, monitoring internal signals such as prediction errors, uncertainty levels, and task performance to make crucial meta-decisions: which data warrants attention, whether to explore new possibilities or exploit current knowledge, and when to prioritize learning from observation versus learning from action. The same mechanism naturally governs human and animal learning: babies focus on salient stimuli like faces and voices, children explore when uncertain and practice when confident, and brains consolidate information even during sleep. Implementing System M in AI would automate tasks currently performed by humans, including selecting relevant data, tuning learning rates, and switching between learning strategies, allowing AI to adapt autonomously on the basis of its ongoing experience rather than fixed training protocols.
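A minimal sketch of such a controller clarifies the idea. This hypothetical `SystemM` class watches the two internal signals the article names, prediction error and uncertainty, and issues the corresponding meta-decisions. The thresholds, rule structure, and output fields are my own illustrative assumptions; the proposal does not specify an implementation.

```python
# A rule-based stand-in for System M: it never learns the task itself;
# it only decides HOW the underlying systems should learn next.
class SystemM:
    def __init__(self, explore_threshold=0.5, observe_threshold=0.8):
        self.explore_threshold = explore_threshold    # assumed cutoff
        self.observe_threshold = observe_threshold    # assumed cutoff

    def decide(self, prediction_error, uncertainty):
        # High uncertainty -> explore new possibilities; low -> exploit.
        mode = "explore" if uncertainty > self.explore_threshold else "exploit"
        # Large prediction error means the world model is off, so prioritize
        # learning from observation (System A); otherwise refine action
        # policies (System B).
        focus = "observe" if prediction_error > self.observe_threshold else "act"
        # Attend to data in proportion to how surprising it is, capped at 1.
        attention = min(1.0, prediction_error)
        return {"mode": mode, "focus": focus, "attention": attention}

m = SystemM()
decision = m.decide(prediction_error=1.2, uncertainty=0.3)
# -> {"mode": "exploit", "focus": "observe", "attention": 1.0}
```

In a real system these meta-decisions would be learned rather than hand-coded rules, which is precisely the role the evolutionary timescale plays in the next section.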
Building Autonomous AI
The path to creating AI systems capable of autonomous learning involves a two-timescale approach inspired by biological evolution and development. On the developmental timescale, an AI agent learns throughout its 'lifetime', continuously updating Systems A and B through environmental interactions, all orchestrated by a fixed System M. On the evolutionary timescale, System M itself is optimized across millions of simulated lifetimes. A 'fitness function' guides this process, rewarding agents that learn rapidly and robustly across diverse, unpredictable environments. This computational approach, while demanding, would use evolutionary algorithms to discover highly effective meta-control policies, mimicking how evolution has shaped human learning instincts over millennia. The shift promises AI that can truly improve from experience and navigate complex, real-world challenges.
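The two timescales can be sketched as nested loops. In this toy version, a single learning-rate parameter stands in for an entire System M policy: the inner `lifetime_fitness` runs one developmental lifetime under a fixed policy in a changing environment, and the outer `evolve` loop mutates and selects policies by a fitness function that rewards fast, robust tracking. Every task detail here is an assumption chosen to keep the sketch self-contained.

```python
import random

random.seed(1)

def lifetime_fitness(lr, trials=100):
    """Inner (developmental) loop: one lifetime spent tracking a target
    that jumps periodically; fitness is the negated cumulative error."""
    estimate, target, total_error = 0.0, 0.0, 0.0
    for t in range(trials):
        if t % 25 == 0:
            target = random.uniform(-5, 5)      # the environment changes
        obs = target + random.gauss(0, 0.2)     # noisy observation
        total_error += abs(obs - estimate)
        estimate += lr * (obs - estimate)       # learning under a fixed policy
    return -total_error

def evolve(generations=30, pop_size=20):
    """Outer (evolutionary) loop: optimize the meta-parameter over many
    simulated lifetimes, keeping the fastest, most robust learners."""
    population = [random.uniform(0.0, 1.0) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lifetime_fitness, reverse=True)
        parents = scored[: pop_size // 4]       # selection
        population = [                          # mutation, clipped to [0, 1]
            min(1.0, max(0.0, random.choice(parents) + random.gauss(0, 0.05)))
            for _ in range(pop_size)
        ]
    return max(population, key=lifetime_fitness)

best_lr = evolve()
```

The evolved parameter ends up well away from zero because sluggish learners never recover from environmental shifts, which is the fitness function's point: it selects for adaptability itself, not for performance on any single environment.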