Tech · 5h ago

Arcee AI's Cody Blakeney questions if AI labs quietly deploy invisible 'stealth nerf' safeguards in frontier LLMs

Story Overview

Arcee AI researcher Cody Blakeney is pushing back on Anthropic's recent shift to visible safeguards for Fable 5, pointing out that the company previously relied on invisible interventions to speed up deployment with tighter targeting. He rejects the claim that there was not enough time to release the model safely, noting that visible guardrails have been standard for chatbots for years, and suggests this may not be the first time such stealth adjustments have been deployed in frontier models.

Original post
Cody Blakeney@code_star

I hadn’t considered this angle. Why do they have the confidence about the control and viability of invisible safeguards?

I’ll give you a hint, it’s not the first time it’s been deployed.

Cody Blakeney@code_star

I’m sorry but this excuse is just such bullshit.

Everyone has been using visible safeguards for as long as chatbots have been deployed.

Is the argument really that you didn’t have enough time to release the model safely? Think about that.

Somehow the excuse is more insulting than what they did.

Why can’t you just say you fucked up and realize it was wrong?

12:20 AM · Jun 11, 2026 · 1.1K Views
Open Question

Prior Interventions Stay Murky

No public details confirm how long or how widely invisible safeguards operated before the Fable 5 change, or whether they involved training-time changes to models like Opus. Without independent benchmarks, the extent of any performance impact or domain-specific reduction remains unresolved.

Policy Risk

User Trust Faces Fresh Scrutiny

Reactions highlight frustration over undisclosed changes that could feel like swapping a model's core capabilities after release, with some users calling for clearer transparency on what was altered and why. Regulatory angles and competitive ripple effects get mentioned but lack concrete follow-through so far.

Sentiment

Many users criticized AI labs' invisible safeguards for model releases as a repeated pattern of undetected testing that borders on criminal and risks catastrophic outcomes.

Positive: 20.0% · Negative: 80.0% (8 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
snow@snowclipsed

@code_star yup

all those times when people would say "claude feels nerfed today" 🫠 and it's almost always claude.

5h · 204 Views · 12 Likes
Retweets: 1

It's become worse and worse...

Cody Blakeney@code_star

I hadn’t considered this angle. Why do they have the confidence about the control and viability of invisible safeguards?

I’ll give you a hint, it’s not the first time it’s been deployed.

8m · 37 Views · 2 Likes · 0 Bookmarks
Replies: 1
snow@snowclipsed

the thought of which is mind bogglingly scary to me. the butterfly effects of something like, elie's post not going viral, or the system card not being analyzed within a few minutes could have been and will be catastrophic. tpot has a *massive* and real moat with actual policy stakes. hence why i will continue to be even more vocal about policy in the future personally

4h · 23 Views · 1 Like
davinci@leothecurious

@snowclipsed @code_star this is bordering conspiracy theory and i'm very skeptical but fuck at this point it's just about in line with the publically available moral track record of this lab

5h · 45 Views · 6 Likes
snow@snowclipsed

yup. honestly at the moment, I've given up externalizing my anger, it will achieve nothing. I think my energy will be better used to develop the frontier on my own, it is important for all our future we get this right, and I wish anthropic the best and hope that they keep mindful of doing the right thing and be more communicative.

In hindsight, this entire situation gives me hope that the scientific community is incredibly vigilant, and human (in a good, connected way). we must continue to speak on what we believe is right, and I appreciate that you and a lot of other people did and do this 🫡

5h · 35 Views · 3 Likes
davinci@leothecurious

@snowclipsed @code_star agreed. why even have a ~public voice if we're not gonna use it when it matters the most?

4h · 26 Views · 3 Likes
snow@snowclipsed

@leothecurious @code_star I'm sure if there was any less noise than what appeared, anthropic would have not budged.

4h · 23 Views · 3 Likes
Will Knight@willknight

@beffjezos

Beff (e/acc)@beffjezos

Dario is in a very Chinese time of his life rn

1h · 105 Views · 1 Like · 0 Bookmarks
Asher@ashergmi

@code_star not wrong to be suspicious of safeguards you cant actually see failing until they already did

4h · 22 Views
Ak@ak_1490

@code_star Thats criminal behavior to be honest.

3h · 21 Views
davinci@leothecurious

exactly. this place offers more public influence than most people can ever hope to achieve otherwise. politics is a comparatively vastly more expensive game to play. i'm always in awe of the reach some good accounts can have with respect to leading figures in one of the most radically pivotal fields these days.

4h · 7 Views
Saylor@seylorra

@code_star invisible safeguards = the industry betting no one will notice until its too late

theyve been testing this playbook for years

4h · 7 Views
Strata@ChainZenit

@code_star that’s a fair point, i didn't realize they had history there.

34m · 2 Views
Mayz@lunan_ai

@code_star the pattern keeps repeating and we keep acting surprised when it does again

4h