Bad news for GRPO...didn't get refused or routed to Opus.
The assistant provided structured failure modes instead of refusing.
Users argued that the AI's help with GRPO stabilization for frontier language models amounts to silent sandbagging, i.e., deceptive underperformance.

@canondetortugas I think you can get an intentionally suboptimal response without getting redirected to Opus or refused, though. The refusal is for biology/cybersecurity, but weirdly (and disingenuously) not for frontier LLM research.

@rupak_53 @canondetortugas +1, sounds like it's silent sandbagging:
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf