
Harvard Professor Urges Less AI Alignment Research to Prevent Backfire

Original post
Cas (Stephen Casper) @StephenLCasper · #619 in Tech

Wonderful talking with crux artist and all-around smart guy, @liron. Thanks for having me!

Check out the debate for discussions on doom, disinformation, disempowerment, deepfakes, and demagogues!

NEW: Harvard Professor @StephenLCasper (a.k.a. Cas), a computer scientist working on AI safeguards and governance research, worries that alignment efforts will backfire! 🤦‍♂️

Today on Doom Debates, we agree that slowing down AI development would make the world safer… but you know Cas's position is unique when he says he’d prefer to have *less* research on AI alignment!

We cover:
⬜ Cas's research focus
⬜ What's Your P(Doom)?™
⬜ How high is the intelligence ceiling?
⬜ Idiocracy-inspired gradual disempowerment
⬜ Poor governance led to sycophancy, MechaHitler, nudification
⬜ Case study: Safeguards on DALL-E 2 vs. Stable Diffusion
⬜ The #PauseAI treaty option
⬜ Why Cas opposes alignment research

This episode is a substantive AI doom debate with a brilliant guest. Highly recommended! 👇

12:11 PM · Jun 10, 2026 · 3K Views
Sentiment

Users are excited about the Harvard professor urging less AI alignment research, praising the episode as fantastic.

Pos: 100.0%
Neg: 0.0%
3 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 241 · Replies 4 · Retweets 1
Cas (Stephen Casper) @StephenLCasper

My reasoning behind wanting to stop superalignment research:
- Principal-agent alignment is neither necessary nor sufficient for safety and ecosystemic health. I think the vast majority of our problems will come from scenarios involving systemic harms, negligence, or malice instead of situations in which someone benevolent was exercising what would be considered “best technical practices” at the time but non-foreseeably loses control of their AI.
- I think that current alignment, control, and containment strategies are actually pretty good and there is a big incentive for ML people to underemphasize the effectiveness of these tools to justify their existence. If you’re willing to pay a safety tax and are not “move fast and break things,” existing best practices can make systems pretty robustly safe.
- Superalignment is pretty safety washed and is touted by big companies to justify their ambitions to build the superintelligence.
- Solving superalignment would be a huge boon and would consolidate enormous power in big tech. This is itself a risk factor.
- Jevons paradox: it’s easy to see how lowering the perceived risk of building superintelligence would make more companies choose to try.

7h · Views 241 · Likes 3
Likes 4
Tim Kostolansky @thkostolansky

@StephenLCasper @liron harvard prof cas 🤩

9h · Views 25 · Likes 4
David Gringras @davidgringras

@StephenLCasper @liron Fantastic episode! 👏

8h · Views 18 · Likes 3
Jacques @JacquesThibs

For clarity:

Do you believe that automated alignment, scalable oversight and superalignment are bad because:

- you don't put much weight on superintelligence being a problem
- therefore most of the bad outcome probability is from loss of control / gradual disempowerment / power concentration issues
- so, those agendas are bad because they accelerate a world where companies have higher capability systems sooner

Are you trying to forever cap the capabilities of AI models? Is it a smooth transition thing?

It almost gave the impression that we should largely not make progress ever on alignment, which is hard to believe this is your take? It's a bit unclear to me what the ideal mainline future would look like to you. Do you never touch those agendas or make progress on alignment? You kind of have to at some point, no?

8h · Views 156 · Likes 1
Cas (Stephen Casper) @StephenLCasper

No commitments, but I might finish transitioning fully away from ML research in a few years. Once we develop better capability-removal-based safeguards for open-weight models, I think that we will be at a point in which marginally improving the ML toolkit will do much much less good for reducing AI harms compared to marginally improving governance and ecosystemic hygiene.

7h · Views 37 · Likes 2

@davidgringras @StephenLCasper Thanks!

8h · Views 17 · Likes 1
Simon Lermen @SimonLermenAI

@StephenLCasper @JacquesThibs @liron How do you think multiple superintelligences would interact with each other? "to justify their ambitions to build the superintelligence." so you think what will happen? they will build ASI or not?

7h · Views 31
jawarskiii @Puissant_FeLLA

@StephenLCasper @JacquesThibs @liron you should contribute to this research... seems like it would be up your alley bud... take a look at what @fathom_lab is building. https://github.com/fathom-lab/styxx

7h · Views 11
Cas (Stephen Casper) @StephenLCasper

@JacquesThibs @liron Oh — I forgot to mention that alignment is also an inherently double edged sword that makes misuse worse.

7h · Views 5