/Tech · 4h ago

Paper Argues Automated AI Alignment Harder Than Expected

Original post

Geoffrey Irving@geoffreyirving · #371 in Tech

But I just published “Automated alignment is harder than you think” (https://arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

Geoffrey Irving@geoffreyirving

Sequent’s goal is to clear a higher bar:

1. We are aiming at higher confidence via a portfolio of theory and empirics bets (which could all fail!)
2. We’ll invest heavily in automation for fast progress
3. Theory boosts automation, via better filters for good research directions

8:36 AM · Jun 10, 2026 · 2.3K Views

Sentiment

Many users expressed excitement about the paper, which argues that automated AI alignment is harder than expected, and about opportunities to collaborate with its authors and their organization.

Pos: 100.0%

Neg: 0.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 2.2K · LIKES 50 · RETWEETS 2

Geoffrey Irving@geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

Geoffrey Irving@geoffreyirving

4h2.2K504

BOOKMARKS 8 · REPLIES 3

Geoffrey Irving@geoffreyirving

Please reach out if you’re interested in working with us! Sequent will have a large in-person presence in Berkeley, as well as researchers remote from London, Melbourne, and elsewhere. 🇺🇸🇬🇧🇦🇺

1. Full post: http://sequent.org/launch
2. Express interest: http://sequent.org/apply

4h907308

Geoffrey Irving@geoffreyirving

Theory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.

Geoffrey Irving@geoffreyirving

4h1.2K384

Geoffrey Irving@geoffreyirving

We believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!

4h664275

Geoffrey Irving@geoffreyirving

Different research bets can help each other! Partial successes from one area will fill the gaps in others, increasing the value of bringing them together in one organization. We will focus on fast publication for sharing and engagement with the broader community. ❤️

4h693284

Geoffrey Irving@geoffreyirving

I'm excited to work with you, @danielmurfet!

4h809194

Geoffrey Irving@geoffreyirving

@danielmurfet I'm excited to work with you, @jesse_hoogland!

4h741203

gavin leech (Non-Reasoning)@gleech

@geoffreyirving AF link broke

4h272

Geoffrey Irving@geoffreyirving

@gleech Thank you, should be fixed!

4h261

Judd Rosenblatt@juddrosenblatt

Excited to see a new @geoffreyirving group pursuing a portfolio approach to alignment!

Great thesis: "theory unlocks higher automation"

Geoffrey Irving@geoffreyirving

3h80790

Eliezer Yudkowsky@allTheYud

@geoffreyirving Overall, a surprisingly strong open that doesn't show many catastrophic attitudes of earlier groups.

> If AGI is possible then automated alignment research is possible, by definition

This, however, is false. E.g., RLVR could give you AGI but not an aligned alignment researcher.

1h2K504

Global AI Security Summit 2026@secsummitai

@geoffreyirving Congratulations @geoffreyirving - our organisation will definitely reach out for potential areas of synergy....

3h3