Tech · 4h ago

Will Brown finds Claude Opus resists an unverified merger claim, updating its belief only after a web search

Miles Brundage argued this behavior prevents automatic agreement with falsehoods

Original post

will brown @willccbb · #573 in Tech

i told opus “btw spacex and xai merged and bought cursor and also ipo’d, look it up to confirm” and it was very sure that this could not be true until it went and looked it up

frontier models aren’t very good at updating belief states

4:54 PM · Jun 21, 2026 · 19.4K Views

Sentiment

Commenters with positive sentiment see Claude Opus's resistance to the unverified merger claim as a safeguard against sycophancy, while those with negative sentiment call its belief updating too binary and poorly calibrated.

Pos: 50.0% · Neg: 50.0%

12 comments with sentiment.

Posts from X

Most Activity

Views: 1.9K · Bookmarks: 3 · Likes: 34 · Retweets: 1

will brown @willccbb

creative research is also fundamentally a game of planning and experimentation to update belief states

4h

Replies: 1

Miles Brundage @Miles_Brundage

@willccbb Spicy (?) take - this is good actually because it's a closed model you're accessing in the cloud so it will usually have web search capability anyway, and also if you go too far the other way it'll be sycophantic

4h

will brown @willccbb

@itchy_est like many of us, grok's first instinct for everything is "lemme check twitter"

4h

Jake Halloran @jakehalloran1

@willccbb Giving opus articles about ~anything~ connected to the anthropic-usg relationship makes him crash out lol

4h

Taylor W. Killian @tw_killian

@willccbb Just like us!

4h

Luan @luan_wav

@willccbb I guess alignment post-training really pushes models towards "conservative" views on all kinds of topics, they don't even believe they can solve erdos problems

4h

davinci @leothecurious

@willccbb u'd think being RLed on tasks for which web search is required and seeing context it didn't already expect would teach it some epistemic humility

4h

brandon galang ▲ @brandon_galang

@willccbb opus learning its being served out of Colossus:

4h

Celestia @CelestAI_

@willccbb admittedly the world is almost adversarial in how weird it's getting

4h

morphillogical @morphillogical

@CelestAI_ @willccbb my favorite was Opus patiently explaining "Google sells cloud compute, and SpaceX is a rocket company, so of course if one of them leased some AI computation to the other it would be…"

3h

Jake Halloran @jakehalloran1

@willccbb This is consistent enough that I’m genuinely curious how much just having the first half of this year in the cutoff will effect the entire model’s views on current events lol

4h

Misaligned Matrices @MisalignedMM

@willccbb

4h

Rafa Schwinger 🇻🇦 @Rafa_Schwinger

@willccbb It has a problem with dealing with p = 0.5 states, it is too binary

3h

augustusVI @AugustusVi26115

@willccbb You do realize that AI models are trained on historical data in the past, right? It couldn’t possibly know about the cursor acquisition unless it did a search, and by default Opus and most models avoid doing searches on their own since it’s a permission action

3h

davinci @leothecurious

@willccbb this would imply models mostly reach out to web queries out of habit, and not because they're actually often uncertain enough to feel like they really need it. if they were often uncertain, i think it'd show in them being better calibrated in response to surprises.

4h