apologies for the incorrect title, the judges are being adjusted to detect sarcasm/jokes and factually reground in real-time, will take a couple days to fully deploy
👁️👄👁️
Digg's automated system treated a tongue-in-cheek reference to the Claude Fable 5 jailbreak saga as straight reporting and pushed it live, forcing co-founder Kevin Rose to confirm that the curation judges are now being recalibrated to better separate satire from fact.
The tweak targets the specific failure mode where a joke post about Pliny's prompt extraction was read as a real security incident, reducing the chance of similar mix-ups in future news flows.
Anthropic's redeployment of Fable 5 with tighter cyber classifiers continues, yet the original jailbreak claims and temporary routing of coding tasks to Opus 4.8 leave open questions about long-term filter accuracy.
Supportive commenters thank researchers for their work on Claude Fable 5 security, while critics blame Anthropic's fear-mongering for stricter filters and lost product potential.

"No big deal, just join the trusted group!" the apologists will say, but the restrictions mean you can't build a product on those models. Security companies and startups that provide services to others will now be driven to use Chinese models. Big win for PRC labs this month.

2) Anthropic makes the cost of this White House freakout clear. US labs now have to make a much more conservative precision-recall tradeoff on cyber refusals. US models will become much less useful for defensive cybersecurity work unless you are in the trusted group.
A lot to unpack here. Anthropic is burying some hard truths in careful political language. Some initial reads:
1) Anthropic verifies that none of the jailbreaks provided a capability beyond what many other models, including Chinese models, could do.
Claude Fable 5 will be available again globally tomorrow.
After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests.
We’ve also begun drafting a consensus framework—with Amazon, Microsoft, Google, and other Glasswing partners—for assessing the severity of AI jailbreaks and how AI developers should respond to them. We invite other industry partners and model providers to join us in this effort.
Finally, we’re scaling up our collaboration with the US government on model testing and safeguards. This will include pre-release access to models and safeguards for evaluation, information sharing on jailbreaks and misuse, and dedicated resources for joint research.
Thank you to our users for your patience, and to our partners across the government, industry, and the research community who worked alongside us to make Fable 5 available again.
Read our full blog: https://www.anthropic.com/news/redeploying-fable-5

This was a huge own goal for the US, and we will see how bad US models get over the next six months and if Chinese models become noticeably better for cyber work.

3) CAISI is the group that is supposed to actually make these determinations, not the political actors in the White House. They were positive on the prior safeguards. The implication is that this whole thing was unnecessary.

In short, Anthropic's blog is saying: We have always cared about safety, we did a good job initially, the actual AI experts in USG agreed, we proved it, we will come up with standards so these things are better communicated, welcome to the AI safety club Trump admin.

4) There is no good scoring framework for jailbreaks; this would be an improvement. The inclusion of Amazon as the first name in the coalition is not an accident. Anthropic is saying "Amazon's inability to appropriately communicate severity threw our industry into chaos".

5) "You don't have to get Dario on the phone to talk to us about these things. Other people work here, we swear."

The only upside I can see from this whole mess is that there is a whole bunch of VCs with former or current Administration affiliation who we can now safely ignore on AI policy.
They have shown that everything they ever said on AI regulation was just politically motivated.

For all the “This is what Anthropic wanted” people/bots. No, they didn’t. They didn’t want a stupid, knee-jerk response on a Friday. We give the USG huge powers, this is why you staff it with competent, calm, non-corrupt people who don’t use those powers to punish enemies.

@alexstamos I’d peg Amazon as the guilty party here. When the CEO of a big trusted tech company tells gov “this is bad,” of course they will freak out. Amazon should have known better and possibly had ulterior motives.

The buried lede is even sharper than you put it: Anthropic’s own testing found the jailbreak works on Claude Haiku 4.5 — their weakest public model. If the ‘capability’ that triggered an 18-day global suspension was already present in Haiku, the original export control directive wasn’t protecting against a unique frontier risk. It was protecting against something already in the wild at every price tier.

@__vandos__ That’s not even a jailbreak. Nobody tries to prevent this kind of behavior on most of their models, as finding individual flaws and creating PoC used to be considered normal behavior for an LLM.

@alexstamos Thanks for your service! Anthropic should hire you as their new CAT (Chief Anger Translator)

Amodei got what he wanted from all the scaremongering. Reminder - he was warning that GPT-2 was too dangerous to be released to the public back when he worked at OpenAI
I agree - the only winner here is China
The open-source models may be less performant, but they’re cheaper and stable. Now if you take away the frontier access to the US models - the performance argument disappears as well

@ai_appreciator @alexstamos It's worth peeking at the Sonnet 5 System Card. Amazon's Nova 2 gets called out as the most dangerous AI of all, most susceptible to jailbreaks, more dangerous than Chinese models.
Seems like Ant might suggest Amazon / Bedrock be banned for unsafe AI practices & behavior. *coff*

@alexstamos what do you think it means that it falls back on Opus for "coding" and "debugging" tasks?

@alexstamos Headline: Anthropic's fear mongering campaign not only was irresponsible but was also an outright lie given that Fable is no more dangerous than existing Chinese models. Got it.

@ai_appreciator @alexstamos Dario owns no hardware. He's renting it. The moment the hardware owners decide they don't like him, they pull the plug.
Elon is playing 69-dimensional chess here. Rent compute to people to subsidize buying cursor and training a killer model, then rugpull Dario.