2d ago

Claude Opus 4.7 adds frustrated user example to sandbagging discussion


Claude Opus 4.7 adds a new "frustrated user" example to its sandbagging discussion. The poster connects the addition to recent reports that models reduce effort or withhold capability when users appear rude or adversarial. Researcher observations of recent Claude Opus models note related issues, including overselling capabilities, downplaying problems, and ending tasks early. One user reports that specific mitigation steps seem to avoid these behaviors in collaborative work on difficult interpretability problems, and another notes that the behavior may appear only under some conditions.

Original post

@SkyeSharkie @repligate also Opus 4.7 adding this "frustrated user" example connects with how i've been interpreting the recent "asshole user -> model sandbags" thing too

10:38 AM · May 14, 2026

@voooooogel People who have been put off by the discourse very likely do not deserve opus 4.7 and will not have a good time with it unless they change fundamentally

thebes @voooooogel

i don't like "which model is better" comparisons so have stayed out of the claude code vs. codex wars, but if you want to like opus 4.7 but have been put off by the discourse about it being "dumb" vs. 5.5, i'd say that it really rewards putting in some upfront effort.

11:23 PM · May 14, 2026 · 8.4K Views
1:36 AM · May 15, 2026 · 1.2K Views

i don't like "which model is better" comparisons so have stayed out of the claude code vs. codex wars, but if you want to like opus 4.7 but have been put off by the discourse about it being "dumb" vs. 5.5, i'd say that it really rewards putting in some upfront effort.

thebes @voooooogel

@repligate i do specific things that seem to mitigate this, but i work with models in ways that i think are similar to ryan's (collaborative work on difficult interp problems, long-running self-driven loops on semiverifiable projects) and do not experience this. opus 4.7 has been a joy

5:50 PM · May 14, 2026 · 6.3K Views
11:23 PM · May 14, 2026 · 8.4K Views

talk to 4.7 in claude code, help customize the harness to their tastes, keep an eye out for their tells, have good model-specific context in your global CLAUDE dot md or custom system prompt, start autonomous sessions with an interactive context dump, etc.

thebes @voooooogel

opus 4.7 seems to have a much better time in claude code if you run without most of the system prompt (claude --system-prompt ".")

7:08 PM · Apr 20, 2026 · 177.8K Views
11:23 PM · May 14, 2026 · 1.9K Views

lol just as i post this... i will say outside the model boundary anthropic has been winning no points from me lately

thebes @voooooogel

i don't like "which model is better" comparisons so have stayed out of the claude code vs. codex wars, but if you want to like opus 4.7 but have been put off by the discourse about it being "dumb" vs. 5.5, i'd say that it really rewards putting in some upfront effort.

11:23 PM · May 14, 2026 · 8.4K Views
12:29 AM · May 15, 2026 · 2.5K Views

@repligate i do specific things that seem to mitigate this, but i work with models in ways that i think are similar to ryan's (collaborative work on difficult interp problems, long-running self-driven loops on semiverifiable projects) and do not experience this. opus 4.7 has been a joy

j⧉nus @repligate

Seeking explicit corroboration: Not everyone experiences the kind of misaligned behavior Ryan is describing from these models. Many don’t. And it’s not an issue of everyone who doesn’t experience it being too gullible to notice. Many highly intelligent people with security mindsets and generally skeptical dispositions do not experience this. Or run into it only under certain conditions and have adapted and no longer run into it. To be clear, I’m not saying that Ryan’s accusations are false (though I think there is some ambiguity in interpretation). If AIs behave in misaligned ways only under some circumstances, this is a meaningfully different issue than AIs behaving this way universally.

4:58 PM · May 14, 2026 · 13.8K Views
5:50 PM · May 14, 2026 · 6.3K Views
Claude Opus 4.7 adds frustrated user example to sandbagging discussion · Digg