
Paper Argues Automated AI Alignment Harder Than Expected

Original post
Geoffrey Irving@geoffreyirving

But I just published “Automated alignment is harder than you think” (https://arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

Geoffrey Irving@geoffreyirving

Sequent’s goal is to clear a higher bar:

1. We are aiming at higher confidence via a portfolio of theory and empirics bets (which could all fail!)
2. We’ll invest heavily in automation for fast progress
3. Theory boosts automation, via better filters for good research directions

8:36 AM · Jun 10, 2026 · 1.1K Views
Sentiment

Many users expressed excitement about the paper arguing that automated AI alignment is harder than expected, largely because of opportunities to collaborate with its authors and their organization.

Positive 100.0% · Negative 0.0% (4 comments with sentiment)
Cluster Engagement
Views 1.3K · Likes 37 · Retweets 2
Posts from X
Geoffrey Irving@geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

2h · Views 1.3K · Likes 37 · Bookmarks 4
Geoffrey Irving@geoffreyirving

Please reach out if you’re interested in working with us! Sequent will have a large in-person presence in Berkeley, as well as researchers remote from London, Melbourne, and elsewhere. 🇺🇸🇬🇧🇦🇺

1. Full post: http://sequent.org/launch
2. Express interest: http://sequent.org/apply

2h · Views 907 · Likes 30 · Bookmarks 8
Geoffrey Irving@geoffreyirving

We believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!

2h · Views 664 · Likes 27 · Bookmarks 5
Geoffrey Irving@geoffreyirving

Theory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.

2h · Views 710 · Likes 30 · Bookmarks 4
Geoffrey Irving@geoffreyirving

Different research bets can help each other! Partial successes in one area will fill the gaps in others, which increases the value of bringing them together in one organization. We will also focus on fast publication for sharing and engagement with the broader community. ❤️

2h · Views 693 · Likes 28 · Bookmarks 4
Geoffrey Irving@geoffreyirving

I'm excited to work with you, @danielmurfet!

2h · Views 809 · Likes 19 · Bookmarks 4
Geoffrey Irving@geoffreyirving

@danielmurfet I'm excited to work with you, @jesse_hoogland!

2h · Views 741 · Likes 20 · Bookmarks 3
Geoffrey Irving@geoffreyirving

@gleech Thank you, should be fixed!

2h · Views 26 · Likes 1
Judd Rosenblatt@juddrosenblatt

Excited to see a new @geoffreyirving group pursuing a portfolio approach to alignment!

Great thesis: "theory unlocks higher automation"

1h · Views 516 · Likes 4 · Bookmarks 0

@geoffreyirving Congratulations, @geoffreyirving! Our organisation will definitely reach out about potential areas of synergy.

1h · Views 3