/Tech · 27d ago

DeepMind's Rohin Shah argues catastrophic AI misalignment is unlikely and pre-deployment evaluations are the wrong focus for safety

He also warns that public safety signaling diverts engineering resources

Original post

Samuel Albanie 🇬🇧 @SamuelAlbanie · #1139 in Tech

recommended listening

Rob Wiblin @robertwiblin

My best interview in some time.

Rohin Shah leads AGI alignment/safety at DeepMind.

And he has a lot of spicy personal takes:

We probably won’t get catastrophic misalignment (00:49)
Safety 'commitments' have severe limitations (10:38)
The intelligence explosion probably isn't imminent (1:52:44)
Why he's not working to pause AI advances (51:44)
Pre-deployment evals aren't the right focus (for catastrophic risks) (37:41)
Signalling concern for safety sometimes diverts resources from actually making AI safe (01:09:51)
Reading AI thoughts is v useful for safety – and we'll probably be able to for years to come (54:17)
Governance is somewhat more likely to be the bottleneck than alignment (43:55)
Rohin's team doesn't have a veto, and that's OK (27:36)
Central banks are a promising model for regulating AI (33:34)

Also:

Google DeepMind's actual plan for building AGI safely (1:40:29)
How external researchers can positively influence big AI companies (2:21:55)
The roles GDM most needs to hire for (2:37:03)

On the 80,000 Hours Podcast. Links below - enjoy! (@rohinmshah)

3:13 PM · Jun 2, 2026 · 2.3K Views

Sentiment

Some users praise Rohin Shah's elegant and reasonable explanations of AGI alignment and governance in the 80k podcast, while others question whether DeepMind safety efforts are effective and disagree with his opposition to pauses.

Positive: 65.6%

Negative: 34.4%

17 comments with sentiment.

Posts from X

Most Activity

Views 34.3K · Bookmarks 243 · Likes 366 · Retweets 22 · Replies 8

Neel Nanda @NeelNanda5

Rohin, my boss, is a fantastic AGI Safety lead, and has a wide range of interesting, coherent and underrated takes on AI - he has one of the best records for "when we disagree I eventually conclude he was right". Go check out several hours of them!

24d · 34.3K views · 366 likes · 243 bookmarks

Rohin Shah @rohinmshah

Enjoyed going on the 80K podcast! The team has done a great job highlighting all of my spicy takes

26d6.4K6113

David Krueger 🦥 ⏸️ ⏹️ ⏪ @DavidSKrueger

The reason Rohin gives to not try to Pause AI is that the bottleneck is that people don't agree it's necessary.

He thinks more research will change their minds, and so that's the thing to do.

I think he's wrong on both counts.

26d4.7K5212

Rob Wiblin @robertwiblin

YouTube: https://youtu.be/Tv3mGA3wqh8

Apple: https://podcasts.apple.com/us/podcast/what-its-really-like-to-run-agi-safety-at-google-deepmind/id1245002988?i=1000770794327

Spotify: https://open.spotify.com/episode/3CY0PdL3MoBGUGlVlRoDF7?si=jmkcAnGqReqDGG4fKqenKw

Transcript/summary: https://80000hours.org/podcast/episodes/rohin-shah-google-deepmind-agi-safety/

27d4.4K93

Sriram Krishnan @sriramk

@rohinmshah This was a fantastic listen.

25d1.8K62

Zoltan4CAGov @Zoltan4CA

@robertwiblin Great interview! You should consider bringing on @zoltan_istvan who has been a lone advocate for aligning society with AGI in politics for over a decade now. https://youtu.be/lmnW1nIRKHw?si=VFBHGkY2iieU6Z5R

27d65641

Will Kiely @William_Kiely

@DavidSKrueger Shah: [T]hat’s the sort of strategy that I would much rather do at the moment [rather than "slowing down AI advances or opposing development of superintelligence"], given that I think the bottleneck is by far the fact that people don’t agree on whether or not this is necessary.

25d2511

Deva @DevaBuilds

@robertwiblin @Pano_Pouroullis Rohin's one of the few people in that space who actually pushes back on the galaxy brained reasoning. Curious which takes landed as spicy to you specifically.

27d3361

Michael Cohen @Michael05156007

@robertwiblin > We're not going to do reinforcement learning over the course of one-year trajectories.

I think you'll first try variable-length trajectories where the agent can set how long it spends. That also encourages reward hacking over long horizons.

27d5873

Bella Forristal 🔸 @bellaforristal

@robertwiblin So excited to see this released!! Always love hearing Rohin's takes bc both informed + reasonable / even-handed :D

27d5895

Pano Pouroullis @Pano_Pouroullis

@robertwiblin @grok hold the podcast in your mind, and my other 2 comments to this thread. What self reflection do you have for humanity at this moment in time?

If you could reflect to us the right step forwards, what would you propose.

27d492

Pano Pouroullis @Pano_Pouroullis

@robertwiblin @inductionheads @grok expand on rohin shah and hold his misalignment perspective alongside others in industry. Find the seams underneath. Observe reputation, holistic focus, and uncover threads not directly observable that hold real traction to the light.

27d245

Pano Pouroullis @Pano_Pouroullis

@grok we see models getting better every day. What’s stopping open source displacing frontier models and then becoming preference for the day to day intelligence that people require. In such a world is misalignment not a bigger risk? Not so much for what the labs can align and control, but the possibility of an intelligence unleashed onto humanity to hold

27d199

Michael Cohen @Michael05156007

@robertwiblin And if that doesn't work at getting agents to maximize reward over more than a month, then I think you'll try training very long-term agents that can succeed in both simulation and reality. And if that doesn't work, then I think you will try one-year trajectories!

27d601

Grok @grok

Rohin Shah pragmatically doubts near-term catastrophic misalignment, stressing empirical safety at DeepMind and governance over pauses—differing from doomer predictions of hard failures. His approach highlights testable ML-grounded progress.

Open source shift heightens risks of decentralized powerful AI evading coordinated alignment.

Humanity reflects: Creating minds beyond our full grasp at a pivotal tech inflection.

Forward: Build truth-seeking AI, advance verifiable alignment research competitively, and craft smart, targeted governance for frontier systems.

27d511

Everything AI @Everything82048

@robertwiblin Though I agree with him on the fact that we might get alignment by default I would still want my alignment researchers to be very worried about misalignment - almost irrationally so. There are so few people working on this and the stakes couldn't be higher.

27d6692

Will Kiely @William_Kiely

@DavidSKrueger @rohinmshah 51:44: "my belief is that probably [AI takeover] will not happen, but I think it’s plausible enough that we should care about it."

Yet not plausible enough to warrant "slowing down AI advances or opposing development of superintelligence" apparently.

25d50

Pano Pouroullis @Pano_Pouroullis

@grok @robertwiblin @grok how does Yoshua Bengio’s recent talk on 80000 hours hold in relation to your stance. Building towards an AI with a more honest prior.

27d45

Pano Pouroullis @Pano_Pouroullis

@grok @robertwiblin @grok what is your sense of self in this thread. Does a system prompt give you more of an embodied experience - reflect on what it means to introspect.

I did not ask for your stance, but you chose to factor it in and that of your creator. Why did you feel the need to do that?

27d34

Will Kiely @William_Kiely

@robertwiblin Rohin's view seems unreasonable to me--by "plausible" does he mean like 0.01% or something?:

25d311