Tech · 4h ago

Paper Argues Automated AI Alignment Harder Than Expected

Original post
Geoffrey Irving @geoffreyirving · #371 in Tech

But I just published “Automated alignment is harder than you think” (https://arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

Geoffrey Irving @geoffreyirving

Sequent’s goal is to clear a higher bar:

1. We are aiming at higher confidence via a portfolio of theory and empirics bets (which could all fail!)
2. We’ll invest heavily in automation for fast progress
3. Theory boosts automation, via better filters for good research directions

8:36 AM · Jun 10, 2026 · 2.3K Views
Sentiment

Many users expressed excitement about the paper, which argues that automated AI alignment is harder than expected, and pointed to opportunities to collaborate with its authors and their organization.

Pos 100.0% · Neg 0.0%
4 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 2.2K · Likes 50 · Retweets 2
Geoffrey Irving @geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

Geoffrey Irving @geoffreyirving

But I just published “Automated alignment is harder than you think” (https://arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

4h · Views 2.2K · Likes 50 · Bookmarks 4
Bookmarks 8 · Replies 3
Geoffrey Irving @geoffreyirving

Please reach out if you’re interested in working with us! Sequent will have a large in-person presence in Berkeley, as well as researchers remote from London, Melbourne, and elsewhere. 🇺🇸🇬🇧🇦🇺

1. Full post: http://sequent.org/launch
2. Express interest: http://sequent.org/apply

4h · Views 907 · Likes 30 · Bookmarks 8
Geoffrey Irving @geoffreyirving

Theory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.

Geoffrey Irving @geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

4h · Views 1.2K · Likes 38 · Bookmarks 4
Geoffrey Irving @geoffreyirving

We believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!

4h · Views 664 · Likes 27 · Bookmarks 5
Geoffrey Irving @geoffreyirving

Different research bets can help each other! Partial successes from one area will fill the gaps in others, increasing the value of bringing them together in one organization, and we will focus on fast publication for sharing and engagement with the broader community. ❤️

4h · Views 693 · Likes 28 · Bookmarks 4
Geoffrey Irving @geoffreyirving

I'm excited to work with you, @danielmurfet!

4h · Views 809 · Likes 19 · Bookmarks 4
Geoffrey Irving @geoffreyirving

@danielmurfet I'm excited to work with you, @jesse_hoogland!

4h · Views 741 · Likes 20 · Bookmarks 3
Geoffrey Irving @geoffreyirving

@gleech Thank you, should be fixed!

4h · Views 26 · Likes 1
Judd Rosenblatt @juddrosenblatt

Excited to see a new @geoffreyirving group pursuing a portfolio approach to alignment!

Great thesis: "theory unlocks higher automation"

Geoffrey Irving @geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

3h · Views 807 · Likes 9 · Bookmarks 0

@geoffreyirving Overall, a surprisingly strong open that doesn't show many catastrophic attitudes of earlier groups.

> If AGI is possible then automated alignment research is possible, by definition

This however is false. Eg RLVR could give you AGI but not an aligned alignment researcher.

1h · Views 2K · Likes 50 · Bookmarks 4

@geoffreyirving Congratulations @geoffreyirving - our organisation will definitely reach out for potential areas of synergy....

3h · Views 3