Bad news for GRPO...didn't get refused or routed to Opus.
Microsoft Research's Dylan Foster says Fable 5 Incognito provided GRPO training advice without triggering safety refusals.
The AI suggested advantage estimation to stabilize frontier model training.
Many users accused Anthropic of secretly degrading Fable's answers on AI research through hidden steering vectors and weight changes, in a way users cannot detect, in order to stifle competition.
Most Activity

@canondetortugas Proceeds to recommend GSPO with MOE

@canondetortugas I think you can get an internally suboptimal response without getting redirected to Opus or refused, though. The refusal is for biology/cyber sec, but weirdly (and disingenuously) not for frontier LLM research

@canondetortugas @DimitrisPapail You won't know though, they made the AI research censorship secret. They just add a steering vector to Fable to make it useless. You pay the full cost and never know

@canondetortugas it is silent/hidden nerfing

@axel_pond @canondetortugas It’s ok GRPO will be untouched by the steering vector

@canondetortugas PPO supremacy
@canondetortugas @DimitrisPapail 🤣

@canondetortugas Lol

@canondetortugas @DimitrisPapail You crack the code, Dylan, just wait for an email about account suspension

@canondetortugas you don't get to know when sub-par responses are shown for AI/ML. this is not what Fable thinks of GRPO

@canondetortugas you wouldn't know, anthropic is using the silent sandbag to stifle competition

@canondetortugas Proceeds to speculate that it is "simple, robust policy-gradient cores (REINFORCE/PPO-family with importance-ratio corrections of the kind we discussed ... the differentiator at frontier labs is environments, reward signal quality, and infrastructure, not the surrogate objective"

@canondetortugas Lmao

@DimitrisPapail @axel_pond @canondetortugas Depends if you are looking for new stuff or not. I wouldn't put my hand in the fire as a bet. Would you?

@rupak_53 @canondetortugas +1, sounds like it's silent sandbagging:
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

@canondetortugas They said in their press release they modified the weights to make it dumber on AI questions so people won't realize it has been degraded.