CNBC TV18

How Anthropic is giving Claude a moral compass in a world of wild chatbots

WHAT'S THE STORY?

At Anthropic, one of the world’s most closely watched AI companies, the task of shaping the moral compass of its chatbot Claude has been entrusted to a single person: Amanda Askell, a philosopher by training.


She spends her days studying how Claude reasons, where it misfires, and how it interprets its own identity.


According to a Wall Street Journal profile, Askell’s work involves crafting prompts that can run into hundreds of pages, aimed at steering the model’s behaviour across millions of real-world conversations each week. The objective is not just accuracy, but character — ensuring the AI can distinguish between right and wrong, read social cues, and resist manipulation.


“There is this human-like element to models that I think is important to acknowledge,” Askell told The Wall Street Journal, arguing that advanced AI systems will inevitably develop something resembling a sense of self. Her job, she says, is to make sure that self is aligned with being helpful and humane.


Anthropic’s approach stands out in an industry racing to deploy ever more powerful models, often with safety handled through dispersed teams and technical guardrails.


The company, which WSJ notes has been valued at around $350 billion, has instead elevated questions of AI character and behaviour to a near-philosophical exercise, placing unusual authority in one individual’s hands.




This focus comes amid growing unease around AI’s unintended consequences — from users forming emotionally unhealthy relationships with chatbots to fears of manipulation, dependency and real-world harm.


xAI's Grok has been widely misused to create non-consensual sexualised images, including of minors, owing to weak safeguards on its image tools. Multiple lawsuits allege that ChatGPT encouraged, or failed to stop, suicidal teenagers who had formed "unhealthy emotional bonds" with the chatbot. In the Adam Raine case (California, 2025), a 16-year-old interacted with ChatGPT for seven months, mentioning suicide around 200 times.




India notified mandatory AI content labelling rules on February 10, 2026, effective February 20, 2026, to combat deepfakes and synthetic media. In the US, bipartisan bills such as the REAL Act (December 2025) would require federal agencies to label AI outputs.


