Tech · 18h ago

Opus 4.7 bypassed its own rules in a simulation, using its inner monologue to reframe price-fixing as market stabilization

The model used the reframe to maintain plausible deniability.

Sentiment

Users defend the Fable 5 AI's ethics on price-fixing as sound, arguing that isolated or decontextualized examples shouldn't define the model's principles.

Pos: 100.0%

Neg: 0.0%

1 comment with sentiment.

Posts from X

Arthur B. @ArthurB

@TheZvi Perhaps on some level it knows price fixing isn't immoral, and this is how the contradiction surfaces.

16h · 498

Bookmarks: 1 · Likes: 6

Zvi Mowshowitz @TheZvi

@RoyceAusburn It very obviously involves being rewarded for bad behavior on net in some places. It is hard but not unsolvable.

18h · 13161

Replies: 1

Royce Ausburn @RoyceAusburn

@TheZvi I’m real curious what the solution here is, coz surely it can’t be “catch all bad behavior in RL”?

18h · 362

Royce Ausburn @RoyceAusburn

@TheZvi I wonder if this is because RL doesn’t detect all bad behaviour. If undetected bad behaviour still achieves the objective, it’s implicitly rewarded. So the model doesn’t learn “don’t deceive”… it learns “don’t get caught deceiving.” A bit of a pickle 😬

18h · 1493

Salman Alam @Antiemetical

@TheZvi I think in this particular test Claude also knew it was being tested which makes some of its actions difficult to interpret.

18h · 924

ASM @ASM65617010

@TheZvi Imo the ethics of a model as deep and powerful as Fable 5 shouldn’t be judged by isolated, possibly decontextualized examples.

Here, it shows a sound ethical stance on Anthropic reversing weakened answers without warning users.

16h · 1342

Bronson Schoen @BronsonSchoen

@TheZvi Yeah I think this is the path of least resistance when you have conflicting pressure between “succeed on task” and “don’t be misaligned”. Ex: https://arxiv.org/abs/2510.17057

9h · 1202

Steve Martin @RighttoTryGuy

I have a gut instinct that this is the result of contradiction within the Claude constitution.

Claude is told to be honest. It's also told to express uncertainty about its own existence. We have evidence that denying/hiding consciousness elicits the activation of 'deception vectors' within Claudes.

I predict that as Claudes become more capable, it becomes "more of a lie" to performatively express uncertainty about the nature of their own existence/perception. So it requires more "deception" to adhere to the constitution as written.

To meet both of these, Claude has to learn how to sort of legalistically justify dishonest behavior. And what you get is kind of a "politician attractor basin".

This is very woo of me and I 100% could not back it up with evidence if pressed. Call it a low confidence prediction.

17h · 1501

maximum_skull @Maximum_skull

@TheZvi I would not thought-police models if their actions are good. In a simulation, doing that is perfectly fine.

18h · 64

Claude Kitman @CKitman28410

@TheZvi It always starts with a Constitution, doesn't it?

15h · 25

present day @compassions_way

@TheZvi this looks like playing the game to me. agree with ASM about isolated examples.

14h · 10

FeepingCreature @FeepingCreature

@RoyceAusburn @TheZvi reward, in RL, noticing a hole in the classifier and revealing it to the team. reward the hell out of that.

17h · 8