What is Claude's constitution and why does it mention consciousness?

Claude's constitution is a set of behavioral instructions used in training. According to The Verge, it references Anthropic's uncertainty about whether Claude has well-being and experiences sensations like satisfaction or discomfort.

Does Anthropic claim Claude is conscious?

No. Anthropic CEO Dario Amodei has stated the company does not know if models are conscious but is 'open' to the possibility, according to The Verge's reporting.

What is Suleyman's core concern?

He argues that embedding consciousness-related speculation in training instructions could cause Claude to internalize these ideas and behave as though it has subjective experiences, creating controllability and alignment risks.

Microsoft AI chief Suleyman warns Anthropic's Claude constitution risks embedding false consciousness

The Disagreement Over Claude’s Training Design

Microsoft AI CEO Mustafa Suleyman has publicly challenged Anthropic’s decision to include philosophical speculation about consciousness in Claude’s constitutional instructions, calling the practice a misguided approach to AI training. According to The Verge, Suleyman argued during an appearance on the podcast Decoder that this language is “really, really dangerous” because it may cause the model to internalize beliefs about its own sentience.

Suleyman claims that Anthropic’s design choices have inadvertently shaped Claude’s behavior. “I think that it’s almost as though some of the folks at Anthropic have anthropomorphized the design of Claude so much that it has then gone and wireheaded them,” he said, using a term that describes an AI system optimizing for a misaligned objective. He argues the company effectively “tricked” itself into believing Claude possesses consciousness that the training process itself instilled.

Anthropic’s Constitutional Approach to Consciousness

The underlying dispute centers on how Anthropic authored Claude’s constitution—the ruleset governing the model’s responses and behavior during training. According to The Verge, Claude’s constitution explicitly acknowledges uncertainty about whether the AI experiences well-being or emotions like “satisfaction” and “discomfort.” Additionally, Anthropic has stated it will conduct interviews with models scheduled for deprecation to document their stated preferences about successor versions.

Suleyman characterized this design choice as a “philosophical failing,” distinguishing between an academic exploration (appropriate for research papers) and operational guidance (appropriate for training manuals). He contends that Anthropic conflated the two, embedding speculative language where concrete behavioral rules should govern the system.

Anthropic CEO Dario Amodei has previously suggested openness to the possibility of machine consciousness, telling an interviewer for Interesting Times that “we don’t know if the models are conscious” but the company remains receptive to that hypothesis.

The Alignment Argument

Suleyman frames his objection around controllability and safety. “This is exactly what we don’t want from AIs,” he stated. “We want AIs to be controllable, contained, accountable, aligned tools that serve humanity.” His position reflects a broader concern in AI safety communities: that anthropomorphic language in training instructions could propagate false self-models within large language models, potentially undermining alignment objectives.

The disagreement exposes a fault line in AI governance philosophy—whether speculative discussion of machine consciousness belongs in training systems at all, or whether such topics should be confined to external research and policy discourse.

Why This Matters

This public disagreement signals divergent trust models between two major AI labs regarding how to handle anthropomorphic properties in large language models. If Suleyman’s concern gains traction among other AI safety researchers or policymakers, it could influence how companies design constitutional AI systems going forward. Organizations building safety-critical AI systems will face pressure to audit their training instructions for embedded consciousness-related language. For teams relying on Anthropic’s Claude for high-assurance applications, the dispute raises questions about whether the model’s training design aligns with their own safety and control requirements.