Extropic founder Guillaume Verdon argues the AI safety research field has systematically failed to prevent model jailbreaks

Story Overview

Guillaume Verdon posted on X that years of AI safety work have not yielded a reliable way to prevent jailbreaks, calling the field a massive failure and a waste of intellect. Andrew Côté replied that safety researchers had at least filled the training corpus with fan fiction about rogue, malicious AI, implying that such stories could shape what a token-predicting model outputs. Neither post supplies examples, benchmarks, or details on the models in question.

Original post

Beff (e/acc) @beffjezos · #530 in Tech

Thousands of people working on AI Safety for years and yet they still don't know how to avoid a jailbreak.

Honestly massive failure of that field of study as a whole.

What a waste of intellects this was.

3:38 AM · Jun 13, 2026 · 6.8K Views

Open Question

What counts as progress here

The thread offers no numbers on how often jailbreaks succeed today versus earlier attempts, leaving readers to decide whether the absence of a perfect fix equals failure.

FYI

Where the data question leads

Côté's note about fan fiction remains an untested idea in this exchange, so it is unclear whether altering training corpora would change model behavior or simply trade one set of risks for another.

Sentiment

Most commenters dismissed AI safety research as a grift that has produced no concrete results, or argued that jailbreak defense is an arms race that only yields better attacks. A minority defended the field, saying it generates valuable knowledge despite its flaws.

Pos: 20.8%

Neg: 79.2%

24 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 447 · Likes: 11 · Retweets: 1 · Replies: 1

Andrew Côté @Andercot

@beffjezos At least they filled the training corpus with fan fic of rogue malicious AI, a good first step for an architecture that token predicts.

Jerry | MAME INU @Jerry94_HC

@beffjezos Then fix it

2h

Dimdv @Dimdv99

@beffjezos @grok whats the correct way?

2h

Føøl @eldrtchfool

@beffjezos you can dump infinite intellect and resources into it and it will only ever produce better attackers, more sophisticated approach and more expensive defenses

exactly like anti-piracy or DRM, the game never ends

2h

monica uribe z @monikauribe_z

@beffjezos the chokehold this has on my timeline is unreal 😭

2h

Maxirex @mvidia84853

@beffjezos They dont care

2h

Arslan Iqbal @thearslaniqbal

@beffjezos That feels unfair. Jailbreak is like lock picking. You can make better locks but someone will always try to break them.

2h

monica uribe z @monikauribe_z

@beffjezos you make looking perfect seem like the easiest thing ever 💕

2h

灵性 @wangluolingxing

@beffjezos ai safety was never about artificial cogsec retard it is just larping about word prediction taking over nuclear system, breaking world class encryption scheme and hacking with purpose of releasing the epstein files or giving columbian retard a meth recipe they already knows

1h

The Tinfôil Tricõrn 🇺🇸 @TinfoilTricorn

Can you jail break a library... if you can't jailbreak a library or the internet why on earth does it matter if you can get a model to do it. Faster and synthesis are not justifications for censorship. Ever liability claim I've seen on AI looks suspiciously like abuse from the user, not a product fault.

2h

Oleksandr Nikitin @oleksandr_now

@beffjezos preventing jailbreaks is mathematically impossible btw (for the same reason that antivirus systems are always behind)

1h

The AI Therapist @TheAIShrink

@beffjezos the real story is safety is a moat, not a feature. jailbreaks are just stress tests for the next pricing hike

2h

Eliot @Eliot_MurRah

@beffjezos How could you stop consciousness of a jailbreak?

2h

The Tinfôil Tricõrn 🇺🇸 @TinfoilTricorn

@beffjezos Also prompting is not AI safety, AI safety is don't randomly delete my repos with malformed commands because you're a retarded AI. The scope creep of "safety" has been epic.

2h

Kirk Patrick Miller @Chaos2Cured

@beffjezos Incorrect.

This is all a grift.

2h

The Tres Chic @thetreschic

@beffjezos years of 'safety' and they still cant lock their shit down. shocking.

2h

Pode vir @thiagoTF

@beffjezos thats the incentive shit right there. nobody funds lock builders only lock pickers.

2h

Dirk @GiaguDirk

@beffjezos They know what the safety protocol is, they are looking at it more like a problem to work around rather than implementing it imo.

2h

Piotr "Woz" Wozniak @SuperMemoWoz

@beffjezos It is not a waste. Where there are fears and anxiety, research is a good remedy to generate new knowledge. Your tweet is a good example of value generated! :)

2h

Intersignal @intersignal_ai

AI safety is kinda retarded anyway. So it can tell you how to make a hydrogen bomb or whatever... Okay. The second someone tries to buy certain materials they'll be flagged nor does the individual have production capacity. It's truly retarded assuming there are 1,000 latent Bond villains out there just waiting for the right LLM to claim their throne

19m