Expert Prefers OpenAI Corrigibility Model Spec Over Anthropic Constitution

Original post

Ryan Greenblatt @RyanPGreenblatt · #1104 in Tech

I prefer OpenAI's corrigibility-focused model spec over Anthropic's constitution, which involves intentionally instilling (relatively opaque) long-run objectives into the AI.

Anthropic's constitution is well executed for what it is, but I think it's based on a poor approach.

2:31 PM · Jun 11, 2026 · 1.9K Views

Ryan Greenblatt @RyanPGreenblatt

I don't dislike all aspects of Anthropic's approach. It seems good to focus on higher-level principles that you explain when trying to instill properties into an AI. But this doesn't require having explicit or implicit long-run objectives! E.g., see here: https://www.lesswrong.com/posts/ffCFgBsaxg2FyJ9df/mis-generalization-of-helpful-only-fine-tuning-1

Ryan Greenblatt @RyanPGreenblatt

I don't think this is totally obvious and there are some reasonable arguments going the other way.

But nonetheless I think not instilling long-run objectives is better for reasons similar to reasons discussed here: https://www.lesswrong.com/posts/mLvxxoNjDqDHBAo6K/claude-s-new-constitution?commentId=HxjJFearXCcdon9BJ

Ryan Greenblatt @RyanPGreenblatt

I'm mostly posting this just to publicly register my views on this topic.

A long time ago, I was planning to write up an overall case against instilling long-run objectives, but I never got around to writing something I was happy with, so I decided to just post this.
