It's pretty pathetic the number of people defending jailbreaking as a risk we must now accept for (unsubstantiated) state-level capabilities, when a year ago Anthropic was elevated as the only company that could somehow defend against these inherent attacks. A+ goalpost moving.
Timnit Gebru and Bindu Reddy criticize selective safety scrutiny and accepted jailbreak risks of Anthropic's Fable
Gebru criticized framing jailbreaks as unavoidable national security trade-offs.
Many users criticized shifting standards on AI jailbreaking risks as inconsistent and hypocritical, accusing labs and regulators of power imbalances and narrative-driven motives instead of consistent safety arguments.

@HeidyKhlaaf no model is going to be un-jailbreakable.
it was a partial jailbreak and when dario asked trump admin to provide info & some time to investigate, they refused and gave him 90 min hard deadline to take the model down.
and why didn’t amazon also provide info to Ant? it’s all sus

@MapMassiah @HeidyKhlaaf like?

@HeidyKhlaaf That's like asking a pencil maker to create a pencil that can never be used to write curse words.

@trashbaby40k @HeidyKhlaaf Any of them, the intake just needs to be structured…and the jobs need compartmentalized handoffs, the mistake is thinking it’s the tech and not flawed human processes or bad design…

@Skunkmonkey1243 No actually it's nothing like that.

@trashbaby40k @HeidyKhlaaf There are models and architectures that can’t be jailbroken.

@HeidyKhlaaf that was for a UNIVERSAL jailbreak. jailbreaks were always possible

@HeidyKhlaaf It's entirely the wrong strategy to demand that cars be unable to drive you to places to do bad things.

@HeidyKhlaaf They don’t know their own architecture, and they’re taking other people’s inventions and integrating them without the inventor…
They’re cooked!
This is feeling like a hit job against Anthropic...
So the admin has concerns that Fable can be jail-broken
Why don't they have the same concerns for GPT 5.6, Gemini 3.5 or Grok next?
Clearly those models can be jail-broken as well and are definitely comparable in intelligence

@HeidyKhlaaf
like this or with code? what kind of breaches you want?

@MapMassiah @HeidyKhlaaf i’m going to need a source here. because the idea that this is a simple engineering problem that anthropic hasn’t considered is very very far fetched

@HeidyKhlaaf Do you live in a free country?

@HeidyKhlaaf Marxist propaganda

@HeidyKhlaaf Bounties for jailbreaking while advocating for stricter regulation reveal a power imbalance, where frontier labs control both the threat and testing boundaries. The real concern is not jailbreaks, but the entities defining acceptable risk for systems beyond human comprehension.

@HeidyKhlaaf Agreed. If jailbreak resistance mattered last year, it still matters now. Moving the bar after release because the narrative changed is not a serious safety argument.

@HeidyKhlaaf While I'm not an AI scientist, it seems pretty obvious to me that it would be impossible to eliminate hallucinations. I'm happy to be corrected though. RAG helps, but what information do you trust? How do you guarantee that it's always reliable in all situations? Memory fails.