Tech · 5h ago

Arcee AI's Cody Blakeney questions if AI labs quietly deploy invisible 'stealth nerf' safeguards in frontier LLMs

Story Overview

Arcee AI researcher Cody Blakeney is pushing back on Anthropic's recent shift to visible safeguards for Fable 5, pointing out that the company previously relied on invisible interventions to speed up deployment with tighter targeting. He rejects the explanation that there was not enough time to release the model safely, noting that visible safeguards have been standard for as long as chatbots have been deployed, and suggests this is not the first time invisible safeguards have been deployed in frontier models.

Original post

Cody Blakeney@code_star · #1088 in Tech

I hadn’t considered this angle. Why do they have the confidence about the control and viability of invisible safeguards?

I’ll give you a hint, it’s not the first time it’s been deployed.

Cody Blakeney@code_star

I’m sorry but this excuse is just such bullshit.

Everyone has been using visible safeguards for as long as chatbots have been deployed.

Is the argument really that you didn’t have enough time to release the model safely? Think about that.

Somehow the excuse is more insulting than what they did.

Why can’t you just say you fucked up and realize it was wrong?

12:20 AM · Jun 11, 2026 · 1.1K Views

Open Question

Prior Interventions Stay Murky

No public details confirm how long or how widely invisible safeguards operated before the Fable 5 change, or whether they involved training-time changes to models like Opus. Without independent benchmarks, the extent of any performance impact or domain-specific capability reduction remains unresolved.

Policy Risk

User Trust Faces Fresh Scrutiny

Reactions highlight frustration over undisclosed changes that could feel like swapping a model's core capabilities after release, with some users calling for clearer transparency on what was altered and why. Regulatory angles and competitive ripple effects get mentioned but lack concrete follow-through so far.

Sentiment

Many users criticized AI labs' invisible safeguards for model releases as a repeated pattern of undetected testing that borders on criminal and risks catastrophic outcomes.

Positive: 20.0% · Negative: 80.0% (8 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 204 · Likes: 12

snow@snowclipsed

@code_star yup

all those times when people would say "claude feels nerfed today" 🫠 and it's almost always claude.

5h · 204 views · 12 likes

Retweets: 1

Ravid Shwartz Ziv@ziv_ravid

It's become worse and worse...

Cody Blakeney@code_star

I hadn’t considered this angle. Why do they have the confidence about the control and viability of invisible safeguards?

I’ll give you a hint, it’s not the first time it’s been deployed.

8m3720

Replies: 1

snow@snowclipsed

the thought of which is mind bogglingly scary to me. the butterfly effects of something like, elie's post not going viral, or the system card not being analyzed within a few minutes could have been and will be catastrophic. tpot has a *massive* and real moat with actual policy stakes. hence why i will continue to be even more vocal about policy in the future personally

4h231

davinci@leothecurious

@snowclipsed @code_star this is bordering conspiracy theory and i'm very skeptical but fuck at this point it's just about in line with the publically available moral track record of this lab

5h456

snow@snowclipsed

yup. honestly at the moment, I've given up externalizing my anger, it will achieve nothing. I think my energy will be better used to develop the frontier on my own, it is important for all our future we get this right, and I wish anthropic the best and hope that they keep mindful of doing the right thing and be more communicative.

In hindsight, this entire situation gives me hope that the scientific community is incredibly vigilant, and human (in a good, connected way). we must continue to speak on what we believe is right, and I appreciate that you and a lot of other people did and do this 🫡

5h353

davinci@leothecurious

@snowclipsed @code_star agreed. why even have a ~public voice if we're not gonna use it when it matters the most?

4h263

snow@snowclipsed

@leothecurious @code_star I'm sure if there was any less noise than what appeared, anthropic would have not budged.

4h233

Will Knight@willknight

@beffjezos

Beff (e/acc)@beffjezos

Dario is in a very Chinese time of his life rn

1h10510

Asher@ashergmi

@code_star not wrong to be suspicious of safeguards you cant actually see failing until they already did

4h22

Ak@ak_1490

@code_star Thats criminal behavior to be honest.

3h21

davinci@leothecurious

exactly. this place offers more public influence than most people can ever hope to achieve otherwise. politics is a comparatively vastly more expensive game to play. i'm always in awe of the reach some good accounts can have with respect to leading figures in one of the most radically pivotal fields these days.

4h7

Saylor@seylorra

@code_star invisible safeguards = the industry betting no one will notice until its too late

theyve been testing this playbook for years

4h7

Strata@ChainZenit

@code_star that’s a fair point, i didn't realize they had history there.

34m2

Mayz@lunan_ai

@code_star the pattern keeps repeating and we keep acting surprised when it does again