AI · 2h ago

Paper Argues Automated AI Alignment Harder Than Expected

Original post

But I just published “Automated alignment is harder than you think” (https://arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

Geoffrey Irving@geoffreyirving

Sequent’s goal is to clear a higher bar:

1. We are aiming at higher confidence via a portfolio of theory and empirics bets (which could all fail!)
2. We’ll invest heavily in automation for fast progress
3. Theory boosts automation, via better filters for good research directions

8:36 AM · Jun 10, 2026 · 1.1K Views

Sentiment

Many users expressed excitement about the paper, which argues that automated AI alignment is harder than expected, citing opportunities to collaborate with its authors and their organization.

Positive: 100.0%
Negative: 0.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 1.3K · Likes: 37 · Retweets: 2

Geoffrey Irving@geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

2h1.3K374

Bookmarks: 8 · Replies: 3

Geoffrey Irving@geoffreyirving

Please reach out if you’re interested in working with us! Sequent will have a large in-person presence in Berkeley, as well as researchers remote from London, Melbourne, and elsewhere. 🇺🇸🇬🇧🇦🇺

1. Full post: http://sequent.org/launch
2. Express interest: http://sequent.org/apply

2h907308

Geoffrey Irving@geoffreyirving

We believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!

2h664275

Geoffrey Irving@geoffreyirving

Theory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.

2h710304

Geoffrey Irving@geoffreyirving

Different research bets can help each other! Partial successes from one area will fill the gaps in others, increasing the value of bringing them together in one organization. We will focus on fast publication for sharing and engagement with the broader community. ❤️

2h693284

Geoffrey Irving@geoffreyirving

I'm excited to work with you, @danielmurfet!

2h809194

Geoffrey Irving@geoffreyirving

@danielmurfet I'm excited to work with you, @jesse_hoogland!

2h741203

gavin leech (Non-Reasoning)@gleech

@geoffreyirving AF link broke

2h272

Geoffrey Irving@geoffreyirving

@gleech Thank you, should be fixed!

2h261

Judd Rosenblatt@juddrosenblatt

Excited to see a new @geoffreyirving group pursuing a portfolio approach to alignment!

Great thesis: "theory unlocks higher automation"

1h51640

Global AI Security Summit 2026@secsummitai

@geoffreyirving Congratulations @geoffreyirving - our organisation will definitely reach out for potential areas of synergy....

1h3