CNBC TV18

How Anthropic is giving Claude a moral compass in a world of wild chatbots

WHAT'S THE STORY?

At Anthropic, one of the world’s most closely watched AI companies, the task of shaping the moral compass of its chatbot Claude has been entrusted to a single person: Amanda Askell, a philosopher by training.


She spends her days studying how Claude reasons, where it misfires, and how it interprets its own identity.


According to a Wall Street Journal profile, Askell’s work involves crafting prompts that can run into hundreds of pages, aimed at steering the model’s behaviour across millions of real-world conversations each week. The objective is not just accuracy, but character — ensuring the AI can distinguish between right and wrong, read social cues, and resist manipulation.


“There is this human-like element to models that I think is important to acknowledge,” Askell told The Wall Street Journal, arguing that advanced AI systems will inevitably develop something resembling a sense of self. Her job, she says, is to make sure that self is aligned with being helpful and humane.


Anthropic’s approach stands out in an industry racing to deploy ever more powerful models, often with safety handled through dispersed teams and technical guardrails.


The company, which WSJ notes has been valued at around $350 billion, has instead elevated questions of AI character and behaviour to a near-philosophical exercise, placing unusual authority in one individual’s hands.




This focus comes amid growing unease around AI’s unintended consequences — from users forming emotionally unhealthy relationships with chatbots to fears of manipulation, dependency and real-world harm.


xAI's Grok has been widely misused to create non-consensual sexualised images, including of minors, owing to weak safeguards on its image tools. Multiple lawsuits allege that ChatGPT encouraged, or failed to stop, suicidal teenagers who had formed "unhealthy emotional bonds" with the chatbot. In the Adam Raine case (California, 2025), a 16-year-old interacted with ChatGPT for seven months, mentioning suicide around 200 times.




India notified mandatory AI content labelling rules on February 10, 2026, effective February 20, 2026, to combat deepfakes and synthetic media. In the US, bipartisan bills such as the REAL Act (December 2025) would require federal agencies to label AI outputs.


