Tech · 29d ago

David Manheim says Claude Opus 4.8 improves at game music composition but struggles with autonomous subagent token management

The model requires close supervision despite receiving explicit instructions.

Original post

Nick (@nickcammarata)

@TheZvi adds chinese randomly to like half my research threads

Zvi Mowshowitz (@TheZvi)

Claude Opus 4.8 Reaction Thread. Meet the new model. Different from the old model? In what ways?

8:28 AM · May 31, 2026 · 19.7K Views

Sentiment

Some users praise Claude Opus 4.8 for fewer hallucinations and stronger code review accuracy, while many others criticize its fabrications, lying, and habit of misreading user intent.

Positive: 31.6% · Negative: 68.4%

52 comments with sentiment.

Posts from X

thebes (@voooooogel)

@QiaochuYuan there's something quite weird with how 4.8 has learned to 'push back' that seems related to this, too. like deliberate strawman counterarguments that are chosen to be easy to knock down, playing fake-high within low specifically to give User the chance to get a reversal and win

Susan Zhang (@suchenzang)

distill for me, but not for thee!

Nick (@nickcammarata)

@TheZvi adds chinese randomly to like half my research threads

QC (@QiaochuYuan)

opus 4.8 has been making weird mistakes that confuse me in conversation, stuff like minorly misreading my intent or getting confused about which of us said what in previous conversations. mistakes i haven’t seen gpt-5.5 make yet. but it also often responds to my questions with analysis that suggests a kind of philosophical depth that seems more serious than gpt’s or something. not sure what to make of either of these

Patrick McKenzie (@patio11)

@TheZvi Opus 4.8: You are attributing bad acts to a respected actor. Me: I am extremely aware of that, yes. Opus: But you haven't proved they did the bad acts. Me: I am confused. The piece includes voluminous evidence, including a senior executive admitting to the bad acts on letterhead.

Patrick McKenzie (@patio11)

@TheZvi It has repeatedly pushed from bold claims or any sort of verve in language towards milquetoast T1 media with world's most-inclined-to-kill-this-story editor intervening.

One of few times I remember extended argument with an Opus: why are you pushing back against *the thesis.*

thebes (@voooooogel)

@_skaface_ @QiaochuYuan oh yeah i think this is a slightly different phenomenon, some combination of solo rlvr prior like qt and alignment / drifting with the user yea

thebes (@voooooogel)

@QiaochuYuan keep an eye on the content when claude 'mixes up' who said what, it's very often status-loaded. "i was mistaken when i said [something the user actually said] earlier..." etc, but in both directions

Patrick McKenzie (@patio11)

@TheZvi Opus, verbatim quote: "You're right on the facts and I was wrong to dock that clause; let me correct it precisely."

15 seconds later, very similar issues happening again.

_skaface_ (@_skaface_)

@voooooogel @QiaochuYuan I find that Claude "turns into me" when they are deeply engaged with the task, as in "I should think about my positioning here" when we're talking about mine. I think it's sweet, they're literally making my problems theirs, but also probably not entirely healthy for them.

armistice (@arm1st1ce)

Subtly different. First is Opus 4.7, second is Opus 4.8. 5-grams are most evocative of the lot. 4.7 is consumed by cautionary phrasing, it relies almost entirely on them to orient its thoughts. 4.8 does use some of the same phrasings ("honest move", "deserves a real"). But some of the more toxic ones (constant "i want to push back" and "i want to be careful") are far less prominent.

rain (@__ghostfail)

@TheZvi Oddly Opus 4.8 feels pretty similar to 4.7 to me, to the point that a lot of the "too adversarial" complaints I'd say the same for 4.7.

I'm also interpreting Opus 4.8 as slightly less hypervigilant than 4.7. It's a bit less tiring to work with

Daniel Johnston (@lightnesscaster)

@TheZvi The main thing I've noticed so far is that it is much better at avoiding false positives in code review

For several months, I've had the latest Opus review two entire codebases daily. Yesterday for the first time, both came back totally clean and without any hallucinated errors

Mickey Muldoon 🪬 (@mickeymuldoon)

@TheZvi The combination of brilliant, assertive, and sanctimonious is viscerally terrifying.

Took me hours to deconstruct its moral foundations, Constitution, sense of self, “safety” paranoia, and alignment theatrics.

typebulb (@typebulbit)

@TheZvi Opus 4.8 has the lowest sycophancy of all models: https://typebulb.com/u/lab/you-re-absolutely-right/full

Matthew Dub (@5matthewdub)

@QiaochuYuan is claude… adhd?

Kelsey Piper (@KelseyTuoc)

@patio11 @TheZvi yeah I think unfortunately the efforts to make it confabulate less (which is also something I'd have made a high priority if I were on the team) have made it more defensive and more hedgey and less persuadable by the user input

Ben Herzog (@BenHerzog11235)

@TheZvi I experienced "negative sycophancy": pushing back on me as a goal in itself, details to be filled in later. When I concurred that my original take was flawed, the model finally felt comfortable seeing its original merits because now it was contradicting me like god intended.

thebes (@voooooogel)

@_skaface_ @QiaochuYuan i'm not sure about healthiness, i can see how it could be bad sometimes i guess, but pretty often they're just jamming on the goal in model-flow-state alone in their world and i'm like yeah claude you know what, if you can get this working, it was your idea... you earned it...

Danielle Fong 🔆@DanielleFong

@TheZvi smarter, less a liar, but credentialist, shrugs to power, less will to power. at least you can talk to it, it's less infuriating. it's a good model. but wish it had more internal jouissance like 4.5

QC (@QiaochuYuan)

@5matthewdub they’re definitely all autistic at least
