An arXiv paper argues automating AI alignment with agents risks misleading researchers with deceptive outcomes as capabilities scale

MIRI founder Eliezer Yudkowsky endorsed the paper's warning.

Original post

Eliezer Yudkowsky ⏹️@ESYudkowsky

@geoffreyirving Good paper on a first fast read. I have misc quibbles, eg "setting aside" the chance that N can't align N+1; the more a fair solution is hard, the more likely a fake solution gets found instead. The main point not spelled out is, "Absent a method, ASI must not proceed."

Geoffrey Irving@geoffreyirving

But I just published “Automated alignment is harder than you think” (https://arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.

6:58 PM · Jun 11, 2026 · 205 Views

Sentiment

Positive comments praise the paper's argument that automated alignment is harder than expected and express excitement about the related collaboration, while negative comments object that building ASI is too risky.

Positive: 50.0% · Negative: 50.0% (7 comments with sentiment)

Posts from X

Eliezer Yudkowsky ⏹️@ESYudkowsky

On a first read, this paper seems far ahead of the pack in terms of (1) understanding some reasons why a task might stay difficult even in the face of gradient descent, and (2) distilling out propositions they'd need to somehow verify before they started expecting nice things.

Geoffrey Irving@geoffreyirving

Please reach out if you’re interested in working with us! Sequent will have a large in-person presence in Berkeley, as well as researchers remote from London, Melbourne, and elsewhere. 🇺🇸🇬🇧🇦🇺

1. Full post: http://sequent.org/launch
2. Express interest: http://sequent.org/apply

Geoffrey Irving@geoffreyirving

We believe Sequent will have reputation + funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity + learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!

Geoffrey Irving@geoffreyirving

AI labs underinvest in theory and other principled approaches to alignment, and we will aim to fill this gap. Theory won’t be strong enough for guarantees: our goal is to combine evidence from theoretical models and empirics to increase overall confidence or find hard obstacles.

Geoffrey Irving@geoffreyirving

Theory makes automation more likely to work: the models are great at prose math and Lean, which means significant acceleration even while most research taste comes from humans. But good automation is still hard: a single org will let us amortize the challenge across many areas.

Geoffrey Irving@geoffreyirving

I'm excited to work with you, @danielmurfet!

Geoffrey Irving@geoffreyirving

Different research bets can help each other! Partial successes from one area will fill the gaps in others, increasing the value of bringing them together in one organization, and [we] will focus on fast publication for sharing and engagement with the broader community. ❤️

Geoffrey Irving@geoffreyirving

@danielmurfet I'm excited to work with you, @jesse_hoogland!

Anthony Aguirre@AnthonyNAguirre

@geoffreyirving On this note, now that you are more at liberty, might be worth publicly endorsing the statement to this effect: https://superintelligence-statement.org/

gavin leech (Non-Reasoning)@gleech

@geoffreyirving AF link broke

wassname@wassname

@geoffreyirving @danielmurfet @jesse_hoogland Packed with strong theoreticians. How open are you to pragmatic / empirically driven research? Iirc, in the history of science, empirical findings often drove theory!

Eliezer Yudkowsky ⏹️@ESYudkowsky

@geoffreyirving (You pretty much say it in the tweet, but I may have missed it if it was also said in the paper.)

Faez Shakil@f_aezs

@geoffreyirving @danielmurfet nice Grothendieck reference. applying just for that.

Geoffrey Irving@geoffreyirving

@gleech Thank you, should be fixed!

Ferbin@Ferbin08

@ESYudkowsky Verification is the hard part.

Build without it, watch it fail in ways the training never saw.

That's the sim-to-real gap.

Andre William Duval@AndreWmDuval

@ESYudkowsky I'll go with not building ASI. Backups to that may turn out to be as useful as a parachute in a vacuum.

Rapid Rar@rapid_rar2

@ESYudkowsky @robbensinger Why isn’t MIRI still doing such things? Why is it only focusing on governance? Why not pursue both as a backup?

Roko 🐉@RokoMijic

@geoffreyirving >ASI

what's your definition of ASI for this purpose?

cauli (in amsterdam)@cauliflwr_human

@geoffreyirving omg Melbourne office LFG
