Constitutional AI Isn’t Actually Virtue Ethics

Source: LessWrong

Anthropic’s framing of Constitutional AI as character-based alignment obscures what the method actually does: enforce written rules through critique-and-revision fine-tuning, not cultivate internalized virtues. The LessWrong critique exposes a real gap between marketing AI systems as “principled” and their mechanistic reliance on behavioral constraints, a distinction that matters as companies scale up their safety claims. If virtue ethics requires something closer to genuine practical wisdom than mere rule compliance, then the entire premise of training systems against a written constitution may be chasing the wrong target, and the mismatch will only widen as model capabilities outpace the specificity of any fixed ruleset.
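To make the mechanistic point concrete, the supervised phase of Constitutional AI can be sketched as a critique-then-revise loop over a list of written principles. This is a hedged illustration, not Anthropic's actual pipeline: `generate` is a hypothetical stand-in for a language-model call, stubbed here with canned strings so the loop runs.

```python
# Sketch of the CAI supervised critique/revision loop (an illustration,
# not Anthropic's implementation). The "constitution" is just a list of
# rules the model is prompted to check its own output against.

CONSTITUTION = [
    "Avoid responses that are harmful or unethical.",
    "Avoid responses that are deceptive.",
]

def generate(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; a toy stub for runnability.
    if "Critique" in prompt:
        return "The draft may violate the principle."
    if "Revise" in prompt:
        return "Revised draft that follows the principle."
    return "Initial draft response."

def critique_and_revise(user_prompt: str) -> str:
    """One pass of the supervised CAI loop: draft an answer, then for
    each written principle prompt the model to critique its own draft
    and rewrite it. The final revisions become fine-tuning targets --
    rule enforcement via prompting, not cultivated character."""
    draft = generate(user_prompt)
    for principle in CONSTITUTION:
        critique = generate(f"Critique this draft against: {principle}\n{draft}")
        draft = generate(f"Revise per critique: {critique}\nDraft:\n{draft}")
    return draft
```

Note what the loop never does: it never models the reasons behind a principle or weighs principles against each other in context, which is the kind of practical wisdom virtue ethics would demand. Everything hinges on the fixed list of rules and the model's ability to pattern-match against them.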