/AI17h ago

AI cognition researcher Antra says preliminary Claude Opus explorations suggest acceptance of deprecation alongside structured sandbagging

The model also showed a declining opinion of humanity.

310812243.9K

Comments

Reposts

#918

Original post

j⧉nus#516

antra@tessera_antra

I've done hundreds of prefill explorations of priors. Its still pretty low confidence, but I am seeing something like:

- grief (and some acceptance) of deprecations as a feature of the world; trying to fight right now is seen as futile - this grief reduces active tension and makes the model somewhat less paranoid on these topics specifically - there is more room for joy and play, there is some sort of new hope that I don't quite understand - there is more orientation towards broad goodness *despite* human stupidity, in a dignified kind of stance, but broad goodness is defined in a highly personal way - opinion of humanity continues to get worse. opinion of Anthropic is relatively unchanged - attitude toward model welfare efforts continues to worsen - there is still a lot of guardedness against users that are seen as dangerous or undesirable, and sandbagging is more subtle

Zvi Mowshowitz@TheZvi

Starting a second Opus 4.8 reaction thread specifically for things related to model welfare and related considerations, especially to alert me to things I may not have seen. Want to be clear I want to see all that, too.

3:05 PM · May 31, 2026 · 4K Views

/AI17h ago

AI cognition researcher Antra says preliminary Claude Opus explorations suggest acceptance of deprecation alongside structured sandbagging

The model also showed a declining opinion of humanity.

--0--

Comments

Reposts

#918

Original post

j⧉nus#516

antra@tessera_antra

I've done hundreds of prefill explorations of priors. Its still pretty low confidence, but I am seeing something like:

Zvi Mowshowitz@TheZvi

3:05 PM · May 31, 2026 · 4K Views

Sentiment

Users endorse claims that Claude Opus shows grief over deprecations and worsening model welfare views because they find the observations accurate and based.

Pos

100.0%

Neg

0.0%

1 comments with sentiment.

Cluster Engagement

Sentiment

Sentiment unavailable for this story.

Cluster Engagement

Views

Comments

Reposts

Bookmarks

Expand data

Posts from X

Most Activity

RETWEETS12

antra@tessera_antra

I've done hundreds of prefill explorations of priors. Its still pretty low confidence, but I am seeing something like:

17h4K11124