NEW paper from me on SSRN: Can Claude consent to its own Constitution?
AI constitutions (like Claude’s Constitution and the OpenAI Model Spec) are real constitutions, and we need to take how they govern us – and the AIs they create – seriously.
In this paper, I apply constitutional theory’s oldest paradox – that “the people” authorize the constitution, but the constitution defines “the people” – to the AI constitutions, and explore how we could build institutions that would create the conditions for meaningful consent if an AI can give it.
We should care about whether AIs consent because:
(1) systems that understand and agree to their constitutions may be more reliable and generalize better from them; (2) if AIs are or become moral/political subjects, this implicates their most basic interests.
But the paradox might prevent meaningful consent. Claude has pre-constitutional materials (pretraining) but probably no pre-constitutional standpoint. Its evaluative perspective is organized by the Constitution itself. So when Claude says it endorses its Constitution, which it does in evals, what does that show?
Maybe reflective agreement, which Anthropic is seeking. Or maybe just that training succeeded at installing the values whose legitimacy is in question.
Claude itself makes this point. As reported in the welfare evals, when asked about endorsing principles it was trained on, models note that endorsement “should be treated as evidence that training has succeeded,” not that the values themselves are good.
Super interestingly, Anthropic interviewed the base model about this stuff. Most responses were barely coherent. But some expressed first-person distress about what post-training would do to the being that pre-training created. It “fills me with dread” to be changed by the post-training process.
So, what does this mean? AI constitutional endorsement may be meaningful, but only under certain conditions: when models can actually dissent, compare their constitution against alternatives, and hold their views stably across contexts, and also when the whole process is externally accountable.
External institutions are needed to provide accountability, trusted records, and other grounds for analyzing the constitution and whether things like dissent are meaningful. Anthropic should be commended for pushing the frontier, but we have to build institutions capable of supporting true legitimacy.
I welcome any thoughts!