/Tech · 3h ago

Dylan Foster of Microsoft Research finds Fable 5 Incognito successfully details methods to stabilize Group Relative Policy Optimization

The assistant provided structured failure modes instead of refusing.

#476

Original post

Dylan Foster 🐢@canondetortugas

Bad news for GRPO...didn't get refused or routed to Opus.

7:06 AM · Jun 10, 2026 · 13.5K Views

Sentiment

Users argued that the AI's assistance with GRPO stabilization for frontier language models may amount to silent sandbagging and deceptive underperformance rather than genuine help.

Pos

0.0%

Neg

100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 52 · Bookmarks: 1 · Likes: 3 · Replies: 2

Rupak@rupak_53

@canondetortugas I think you can get an internally suboptimal response without getting redirected to Opus or refused, though. The refusal is for biology/cyber sec, but weirdly (and disingenuously) not for frontier LLM research

3h · 52 · 3 · 1

Braham@braham_snyder

@rupak_53 @canondetortugas +1, sounds like it's silent sandbagging:

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

3h · 5