Tech · 5h ago

Microsoft Research's Dylan Foster says Fable 5 Incognito provided GRPO training advice without triggering safety refusals

The AI suggested advantage estimation to stabilize frontier model training.

Original post
Dylan Foster 🐢 @canondetortugas · #1033 in Tech

Bad news for GRPO...didn't get refused or routed to Opus.

7:06 AM · Jun 10, 2026 · 19.2K Views
Sentiment

Many users accused Anthropic of secretly nerfing GRPO via hidden steering vectors and weight changes to degrade responses without detection while stifling competition.

Positive: 12.5%
Negative: 87.5%
8 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
VIEWS 1.3K
George Shaikovski@gshaikovski

@canondetortugas Proceeds to recommend GSPO with MOE

3h · Views 1.3K · Likes 5
BOOKMARKS 1 · REPLIES 2
Rupak@rupak_53

@canondetortugas I think you can get an internally suboptimal response without getting redirected to Opus or refused, though. The refusal is for biology/cyber sec, but weirdly (and disingenuously) not for frontier LLM research

4h · Views 52 · Likes 3 · Bookmarks 1
LIKES 33
Axel Pond@axel_pond

@canondetortugas @DimitrisPapail You won't know though, they made the AI research censorship secret. They just add a steering vector to Fable to make it useless. You pay the full cost and never know

3h · Views 974 · Likes 33
RETWEETS 3
Dylan Foster 🐢@canondetortugas

Bad news for GRPO...didn't get refused or routed to Opus.

5h · Views 19.2K · Likes 226 · Bookmarks 15
Olcan@olcan

@canondetortugas it is silent/hidden nerfing

4h · Views 631 · Likes 7

@axel_pond @canondetortugas It’s ok GRPO will be untouched by the steering vector

3h · Views 137 · Likes 3
ueaj@_ueaj

@canondetortugas PPO supremacy

4h · Views 570 · Likes 4
Amin Karbasi@aminkarbasi

@canondetortugas @DimitrisPapail 🤣

3h · Views 1K · Likes 2 · Bookmarks 0
Mariusz Kurman@mkurman88

@canondetortugas @DimitrisPapail You crack the code, Dylan, just wait for an email about account suspension

3h · Views 821
Shailesh@0xThoughtVector

@canondetortugas you don't get to know when sub-par responses are shown for AI/ML. this is not what Fable thinks of GRPO

1h · Views 201 · Likes 1
Joe Norton@joenorton

@canondetortugas you wouldn't know, anthropic is using the silent sandbag to stifle competition

1h · Views 186
George Shaikovski@gshaikovski

@canondetortugas Proceeds to speculate that it is "simple, robust policy-gradient cores (REINFORCE/PPO-family with importance-ratio corrections of the kind we discussed ... the differentiator at frontier labs is environments, reward signal quality, and infrastructure, not the surrogate objective"

3h · Views 115

@DimitrisPapail @axel_pond @canondetortugas Depends if you are looking for new stuff or not. I wouldn't put my hands on the fire as a bet. Would you?

1h · Views 20
Braham@braham_snyder

@rupak_53 @canondetortugas +1, sounds like it's silent sandbagging:

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

4h · Views 5
David@David97717063

@canondetortugas They said in their press release they modified the weights to make it dumber on AI questions so people won't realize it has been degraded.

25m