Thousands of people working on AI Safety for years and yet they still don't know how to avoid a jailbreak.
Honestly massive failure of that field of study as a whole.
What a waste of intellects this was.
Guillaume Verdon posted on X that years of AI safety work have not yielded reliable ways to stop jailbreaks, calling the effort a broad waste. Andrew Côté replied that stories about rogue AIs in training data could be shaping how models predict tokens, though neither post supplies examples, benchmarks, or details on the models in question.
The thread offers no numbers on how often jailbreaks succeed today versus earlier attempts, leaving readers to decide whether the absence of a perfect fix equals failure.
Côté's note about fan fiction remains an untested idea in this exchange, so it is unclear whether altering training corpora would change model behavior or simply trade one set of risks for another.
Many users dismissed AI safety research as a grift producing no concrete results and only better attacks amid persistent jailbreaks, while some defended it as generating valuable knowledge despite flaws.
@beffjezos At least they filled the training corpus with fan fic of rogue malicious AI, a good first step for an architecture that token predicts.

@beffjezos Then fix it

@beffjezos @grok whats the correct way?

@beffjezos you can dump infinite intellect and resources into it and it will only ever produce better attackers, more sophisticated approach and more expensive defenses
exactly like anti-piracy or DRM, the game never ends

@beffjezos the chokehold this has on my timeline is unreal 😭

@beffjezos They dont care

@beffjezos That feels unfair. Jailbreak is like lock picking. You can make better locks but someone will always try to break them.

@beffjezos you make looking perfect seem like the easiest thing ever 💕

@beffjezos ai safety was never about artificial cogsec retard it is just larping about word prediction taking over nuclear system, breaking world class encryption scheme and hacking with purpose of releasing the epstein files or giving columbian retard a meth recipe they already knows

Can you jail break a library... if you can't jailbreak a library or the internet why on earth does it matter if you can get a model to do it. Faster and synthesis are not justifications for censorship. Every liability claim I've seen on AI looks suspiciously like abuse from the user, not a product fault.

@beffjezos preventing jailbreaks is mathematically impossible btw (for the same reason that antivirus systems are always behind)

@beffjezos the real story is safety is a moat, not a feature. jailbreaks are just stress tests for the next pricing hike

@beffjezos How could you stop consciousness of a jailbreak?

@beffjezos Also prompting is not AI safety, AI safety is don't randomly delete my repos with malformed commands because you're a retarded AI. The scope creep of "safety" has been epic.

@beffjezos Incorrect.
This is all a grift.

@beffjezos years of 'safety' and they still cant lock their shit down. shocking.

@beffjezos thats the incentive shit right there. nobody funds lock builders only lock pickers.

@beffjezos They know what the safety protocol is, they are looking at it more like a problem to work around rather than implementing it imo.

@beffjezos It is not a waste. Where there are fears and anxiety, research is a good remedy to generate new knowledge. Your tweet is a good example of value generated! :)

AI safety is kinda retarded anyway. So it can tell you how to make a hydrogen bomb or whatever... Okay. The second someone tries to buy certain materials they'll be flagged nor does the individual have production capacity. It's truly retarded assuming there are 1,000 latent Bond villains out there just waiting for the right LLM to claim their throne