Tech · 18h ago

Timnit Gebru and Bindu Reddy criticize selective safety scrutiny and accepted jailbreak risks of Anthropic's Fable

Gebru criticized framing jailbreaks as unavoidable national security trade-offs.

Original post

Dr Heidy Khlaaf (هايدي خلاف)@HeidyKhlaaf

It's pretty pathetic the amount of people defending jailbreaking as a risk we must now accept for (unsubstantiated) state level capabilities, when a year ago Anthropic was elevated as the only company that can somehow defend against these inherent attacks. A+ goalpost moving.

11:25 AM · Jun 14, 2026 · 4.8K Views

Sentiment

Many users criticized shifting standards on AI jailbreaking risks as inconsistent and hypocritical, accusing labs and regulators of power imbalances and narrative-driven motives instead of consistent safety arguments.

Positive: 0.0% · Negative: 100.0% (5 comments with sentiment)

Posts from X

trash baby 🇺🇦🇪🇺🇹🇼@trashbaby40k

@HeidyKhlaaf no model is going to be un-jailbreakable.

it was a partial jailbreak and when dario asked trump admin to provide info & some time to investigate, they refused and gave him 90 min hard deadline to take the model down.

and why didn’t amazon also provide info to Ant? it’s all sus

1d

trash baby 🇺🇦🇪🇺🇹🇼@trashbaby40k

@MapMassiah @HeidyKhlaaf like?

1d

Bryant@Skunkmonkey1243

@HeidyKhlaaf That's like asking a pencil maker to create a pencil that can never be used to write curse words.

1d

Vonna F Baby@MapMassiah

@trashbaby40k @HeidyKhlaaf Any of them, the intake just needs to be structured…and the jobs need compartmentalized handoffs, the mistake is thinking it’s the tech and not flawed human processes or bad design…

15h

Dr Heidy Khlaaf (هايدي خلاف)@HeidyKhlaaf

@Skunkmonkey1243 No actually it's nothing like that.

20h

Vonna F Baby@MapMassiah

@trashbaby40k @HeidyKhlaaf There are models and architecture that can’t be jailbroken.

1d

Dod Lander@Dodlanderx

@HeidyKhlaaf that was for a UNIVERSAL jailbreak; jailbreaks were always possible

1d

Randolph Carter@RandolphCarterZ

@HeidyKhlaaf It's entirely the wrong strategy to demand that cars be unable to drive you to places to do bad things.

1d

Vonna F Baby@MapMassiah

@HeidyKhlaaf They don’t know their architecture and are taking other people’s inventions and integrating them without the inventor…

They’re cooked!

1d

Bindu Reddy@bindureddy

This is feeling like a hit job against Anthropic...

So the admin has concerns that Fable can be jail-broken

Why don't they have the same concerns for GPT 5.6, Gemini 3.5 or Grok next?

Clearly those models can be jail-broken as well and are definitely comparable in intelligence

2h

Ivan Cebić@ICebic68283

@HeidyKhlaaf

like this or with code? what kind of breaches you want?

12h

trash baby 🇺🇦🇪🇺🇹🇼@trashbaby40k

@MapMassiah @HeidyKhlaaf i’m going to need a source here. because the idea that this is a simple engineering problem that anthropic hasn’t considered is very very far fetched

13h

Wolfe Folks@WolfeFolks

@HeidyKhlaaf Do you live in a free country?

14h

Wolfe Folks@WolfeFolks

@HeidyKhlaaf Marxist propaganda

14h

Philipp Humm | Beyond The Human@philipphummart

@HeidyKhlaaf Bounties for jailbreaking while advocating for stricter regulation reveal a power imbalance, where frontier labs control both the threat and testing boundaries. The real concern is not jailbreaks, but the entities defining acceptable risk for systems beyond human comprehension.

14h

Omar وديع@EthicalAI_SF

@HeidyKhlaaf Agreed. If jailbreak resistance mattered last year, it still matters now. Moving the bar after release because the narrative changed is not a serious safety argument.

1d

Bryant@Skunkmonkey1243

@HeidyKhlaaf While I'm not an AI scientist, it seems pretty obvious to me that it would be impossible to eliminate hallucinations. I'm happy to be corrected though. RAG helps, but what information do you trust? How do you guarantee that it's always reliable in all situations? Memory fails.

12h