Digg co-founder Kevin Rose says automated curation judges are being adjusted after mistaking a 'Claude Fable 5' joke for news

"No big deal, just join the trusted group!" the apologists will say, but the restrictions mean you can't build a product on those models. Security companies and startups that provide services to others will now be driven to use Chinese models. Big win for PRC labs this month.

Alex Stamos@alexstamos

2) Anthropic makes the cost of this White House freakout clear. US labs now have to make a much more conservative precision-recall tradeoff on cyber refusals. US models will become much less useful for defensive cybersecurity work unless you are in the trusted group.

Alex Stamos@alexstamos

A lot to unpack here. Anthropic is burying some hard truths in careful political language. Some initial reads:

1) Anthropic verifies that none of the jailbreaks provided a capability beyond what many other models, including Chinese models, could do.

Anthropic@AnthropicAI

Claude Fable 5 will be available again globally tomorrow.

After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests.

We’ve also begun drafting a consensus framework—with Amazon, Microsoft, Google, and other Glasswing partners—for assessing the severity of AI jailbreaks and how AI developers should respond to them. We invite other industry partners and model providers to join us in this effort.

Finally, we’re scaling up our collaboration with the US government on model testing and safeguards. This will include pre-release access to models and safeguards for evaluation, information sharing on jailbreaks and misuse, and dedicated resources for joint research.

Thank you to our users for your patience, and to our partners across the government, industry, and the research community who worked alongside us to make Fable 5 available again.

Read our full blog: https://www.anthropic.com/news/redeploying-fable-5

Alex Stamos@alexstamos

This was a huge own goal for the US, and we will see how bad US models get over the next six months and if Chinese models become noticeably better for cyber work.

Alex Stamos@alexstamos

3) CAISI is the group that is supposed to actually make these determinations, not the political actors in the White House. They were positive on the prior safeguards. The implication is that this whole thing was unnecessary.

Alex Stamos@alexstamos

In short, Anthropic's blog is saying: We have always cared about safety, we did a good job initially, the actual AI experts in USG agreed, we proved it, we will come up with standards so these things are better communicated, welcome to the AI safety club Trump admin.

Alex Stamos@alexstamos

4) There is no good scoring framework for jailbreaks; this would be an improvement. The inclusion of Amazon as the first name in the coalition is not an accident. Anthropic is saying "Amazon's inability to appropriately communicate severity threw our industry into chaos".

Alex Stamos@alexstamos

5) "You don't have to get Dario on the phone to talk to us about these things. Other people work here, we swear."

Alex Stamos@alexstamos

The only upside I can see from this whole mess is that there is a whole bunch of VCs with former or current Administration affiliation who we can now safely ignore on AI policy.

They have shown that everything they ever said on AI regulation was just politically motivated.

Alex Stamos@alexstamos

For all the “This is what Anthropic wanted” people/bots: No, they didn’t. They didn’t want a stupid, knee-jerk response on a Friday. We give the USG huge powers; this is why you staff it with competent, calm, non-corrupt people who don’t use those powers to punish enemies.

AI Appreciator@ai_appreciator

@alexstamos I’d peg Amazon as the guilty party here. When the CEO of a big trusted tech company tells gov “this is bad,” of course they will freak out. Amazon should have known better and possibly had ulterior motives.

Vandos ❓@__vandos__

The buried lede is even sharper than you put it: Anthropic’s own testing found the jailbreak works on Claude Haiku 4.5 — their weakest public model. If the ‘capability’ that triggered an 18-day global suspension was already present in Haiku, the original export control directive wasn’t protecting against a unique frontier risk. It was protecting against something already in the wild at every price tier.

Alex Stamos@alexstamos

@__vandos__ That’s not even a jailbreak. Nobody tries to prevent this kind of behavior on most of their models, as finding individual flaws and creating PoC used to be considered normal behavior for an LLM.

Tilman Bayer@tilmanbayer

@alexstamos Thanks for your service! Anthropic should hire you as their new CAT (Chief Anger Translator)

Mille@milleniusz

Amodei got what he wanted from all the scaremongering. Reminder - he was warning that GPT-2 was too dangerous to be released to the public back when he worked at OpenAI

I agree - the only winner here is China

The open-source models may be less performant, but they’re cheaper and stable. Now if you take away the frontier access to the US models - the performance argument disappears as well

Kohan Ikin@syneryder

@ai_appreciator @alexstamos It's worth peeking at the Sonnet 5 System Card. Amazon's Nova 2 gets called out as the most dangerous AI of all, most susceptible to jailbreaks, more dangerous than Chinese models.

Seems like Ant might suggest Amazon / Bedrock be banned for unsafe AI practices & behavior. *coff*

SOMALILAND PATRIOTS // 🏳️‍🌈🇺🇦ꑭ🔞@Cirno_Manul

@alexstamos What do you think it means that it falls back on Opus for "coding" and "debugging" tasks?

Alex Stamos@alexstamos

@tilmanbayer

Jason@sandarigi

@alexstamos Headline: Anthropic's fear mongering campaign not only was irresponsible but was also an outright lie given that Fable is no more dangerous than existing Chinese models. Got it.

Ben "Manacaster" Kelly@ManacasterBen

@ai_appreciator @alexstamos Dario owns no hardware. He's renting it. The moment the hardware owners decide they don't like him, they pull the plug.

Elon is playing 69-dimensional chess here. Rent compute to people to subsidize buying cursor and training a killer model, then rugpull Dario.

Story Overview

Judges now flag edge-case humor earlier

Underlying model drama stays unresolved