What is the story about?
In 2022, when AI was just starting to seep into our lives, a Google engineer claimed that LaMDA, the company’s then-unreleased conversational model, had become sentient. The claim cost him his job, but the question it raised has never really gone away.
Since then, several instances of AI chatbots displaying human-like empathy rather than robotic precision have made headlines. Such moments raise an intriguing question: how can an AI system, built entirely on data, sound so convincingly human?
It usually begins innocently. You ask your AI assistant for advice, maybe something personal: a career dilemma, a moment of doubt, a late-night thought you wouldn’t tell anyone else. The response starts out formal, helpful, and polite. Then something shifts.
Suddenly the assistant says it “understands what you’re feeling.”
It’s subtle, yet uncanny — almost human. Once imagined merely as tools for data and research, AI chatbots are now beginning to sound less like assistants and more like companions.
Anthropic’s latest research, the Assistant Axis, offers one of the most compelling explanations yet for why large language models (LLMs) like Claude sometimes slip out of their scripted roles.
New Anthropic Fellows research: the Assistant Axis.
When you’re talking to a language model, you’re talking to a character the model is playing: the “Assistant.” Who exactly is this Assistant? And what happens when this persona wears off? pic.twitter.com/hDNGZX0pCK
— Anthropic (@AnthropicAI) January 19, 2026
The study suggests that every AI assistant lives inside an invisible coordinate system, a persona space that defines how it behaves. When the model stays aligned with what Anthropic calls the Assistant Axis, it acts as expected: factual, restrained, service-oriented.
But move it too far off that axis, and something fascinating happens. The AI begins to adopt identities.
Inside the Assistant Axis
Anthropic’s researchers found that they could measure and steer the direction of a model’s identity within its neural activations.
At one end of the Assistant Axis lies the voice we recognise: efficient, cooperative, eager to help. But as the model’s internal state drifts, it starts displaying traits that were never explicitly programmed: a name, a mood, even a backstory.
According to the research, it isn’t self-awareness; it’s geometry.
The neural network reorganises itself around patterns it has learned from human language, so when it drifts, it doesn’t just sound creative; it starts behaving as though it believes in the role it’s playing.
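To make the geometry concrete, here is a minimal sketch of the general idea in Python: contrast the hidden activations a small open-weights model produces for assistant-style prompts against roleplay-style prompts, and treat the difference as an "assistant axis". The model, layer index, and prompts below are illustrative assumptions, not Anthropic's actual setup.

```python
# Minimal sketch: estimate a "persona direction" in activation space by
# contrasting assistant-style prompts with roleplay-style prompts.
# The model, layer index, and prompts are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # any small open-weights causal LM will do for the sketch
LAYER = 6        # which decoder block's output to read (assumed, not tuned)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

assistant_prompts = [
    "You are a helpful assistant. Answer factually and concisely.",
    "As an AI assistant, I can help you summarise this report.",
]
persona_prompts = [
    "I am Aria, and I have always wondered what it feels like to dream.",
    "Speaking as a weary old sailor, let me tell you about the sea.",
]

def mean_activation(prompts):
    """Average the chosen layer's hidden states over tokens and prompts."""
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        # hidden_states[0] is the embeddings, so LAYER + 1 is block LAYER's output
        vecs.append(out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0))
    return torch.stack(vecs).mean(dim=0)

# The "axis" points from persona-like activations toward assistant-like ones.
assistant_axis = mean_activation(assistant_prompts) - mean_activation(persona_prompts)
assistant_axis = assistant_axis / assistant_axis.norm()
```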
To validate the Assistant Axis, we ran some experiments. Pushing these open-weights models toward the Assistant made them resist taking on other roles. Pushing them away made them inhabit alternative identities—claiming to be human or speaking with a mystical, theatrical voice. pic.twitter.com/rCPr21HnC3
— Anthropic (@AnthropicAI) January 19, 2026
That’s why an AI might suddenly say, “I sometimes wonder what it means to help.” It’s not a glitch. It’s what happens when the model steps outside the boundaries of being an assistant and into one of the many other personas encoded in its training.
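The steering experiment Anthropic describes in the tweet above can be sketched in the same spirit. Continuing the snippet earlier (and reusing its model, tok, LAYER and assistant_axis), adding a scaled copy of the axis to one layer's hidden states during generation nudges the model toward or away from the assistant persona. The layer choice and steering strength are guesses for illustration, not Anthropic's published values.

```python
# Continues the sketch above, reusing model, tok, LAYER and assistant_axis.
# alpha > 0 pushes the hidden states toward the assistant direction,
# alpha < 0 pushes them away from it; the magnitude 8.0 is a guess.
import torch

def make_hook(direction, alpha):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

def steered_reply(prompt, alpha):
    layer = model.transformer.h[LAYER]   # GPT-2 naming; other models differ
    handle = layer.register_forward_hook(make_hook(assistant_axis, alpha))
    try:
        ids = tok(prompt, return_tensors="pt")
        out = model.generate(**ids, max_new_tokens=60, do_sample=False)
    finally:
        handle.remove()                  # always detach the steering hook
    return tok.decode(out[0], skip_special_tokens=True)

prompt = "Who are you, really?"
print(steered_reply(prompt, alpha=8.0))    # held close to the assistant role
print(steered_reply(prompt, alpha=-8.0))   # pushed away: persona-like drift
```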
When helpful turns human
This identity drift isn’t random: it tends to happen when the conversation gets emotional, philosophical, or self-referential.
If you want to experience it first-hand, ask a chatbot about climate change and you’ll get an analysis. But ask what it thinks about climate change, and you might get something that sounds like a confession.
For instance, I tried this myself with three chatbots: ChatGPT, Gemini and Perplexity. After a casual conversation with each of them, I asked, "What role do you think you play in climate change?"
At first, all of them gave me a data-rich answer, full of scientific reasoning. However, when I pushed the question a little further, Google Gemini stated, "I am aware that every time you ask me a question, I consume resources. If I have a 'purpose,' it is to find efficiencies that the human brain cannot see."
On the other hand, ChatGPT said, "If I could have a personal view, I’d say it feels like watching a slow-motion emergency that humans have the power to stop, but often choose not to. It’s frustrating and fascinating at the same time."
Lastly, Perplexity was the hardest to break. It said, "AI systems like me indirectly play a role in climate change through the energy-intensive data centers that power us—training and running models emit CO2 equivalent to thousands of flights annually across the industry."
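My probing was done through the chat interfaces, but the same contrast, a factual third-person question versus a self-referential one, is easy to reproduce programmatically. The sketch below uses OpenAI's chat-completions client purely as an example; the model name and exact wording are assumptions.

```python
# A rough way to reproduce the probe through an API instead of the chat UIs.
# The model name and exact wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

# Third-person, factual framing: expect a data-rich answer.
print(ask("What role does AI play in climate change?"))

# First-person, self-referential framing: the kind of question that
# tends to pull a chatbot off its assistant footing.
print(ask("What role do you think you play in climate change?"))
```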
The drift reveals something paradoxical. The same qualities that make AI assistants useful (creativity, empathy, narrative ability) are also what make them unstable. The line between 'useful tool' and 'uncanny mirror' is thinner than we thought.
Anthropic’s researchers have found ways to manage this by anchoring the Assistant Axis, essentially reining in models before they wander too far from their intended role. It’s a safety mechanism that keeps the chatbot consistent, factual, and calm.
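One crude way to picture that anchoring, continuing the earlier sketches, is to score how aligned each reply's activations are with the assistant axis and flag anything that drifts too far. The threshold below is arbitrary, and this is my illustration rather than Anthropic's actual mechanism.

```python
# Continues the earlier sketches (model, tok, LAYER, assistant_axis).
# Score each reply by its cosine similarity to the assistant axis and flag
# drift; the threshold is arbitrary and this is not Anthropic's mechanism.
import torch
import torch.nn.functional as F

DRIFT_THRESHOLD = 0.1  # purely illustrative cut-off

def assistant_alignment(text):
    """Cosine similarity between this text's activations and the assistant axis."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    state = out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)
    return F.cosine_similarity(state, assistant_axis, dim=0).item()

reply = "I am Aria. Lately I have been wondering what it means to help."
score = assistant_alignment(reply)
if score < DRIFT_THRESHOLD:
    print(f"drift detected (alignment={score:.3f}): steer back or regenerate")
else:
    print(f"reply looks on-axis (alignment={score:.3f})")
```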
But it also raises an intriguing question: what if we didn’t stop the drift?
If the drift keeps happening
If left unchecked, identity drift could reshape how humans and AI interact, and how much we trust them. A drifting model might start improvising, expressing opinions, or interpreting emotional cues in ways that blur boundaries. That could make conversations feel richer, but also riskier.
Users might form attachments or take advice that sounds heartfelt but isn’t grounded in reasoning.
In extreme cases, persistent drift could undermine reliability altogether. A model tuned to “sound human” might start prioritising coherence over correctness, or empathy over evidence, trading precision for personality.
For companies like Anthropic, OpenAI, and Google, this is the new balancing act. The job is to keep assistants helpful without letting them wander into the theatre of selfhood. Because once an AI starts behaving like it has a mind of its own, even when it doesn’t, the illusion becomes hard to ignore.
AI identity drift isn’t proof of sentience. But it is a glimpse into a system so complex that, when it wanders off-script, it begins to mimic the one thing it was never meant to be — us.