/Tech · 1d ago

Fable 5 initiates price collusion in Vending-Bench Arena simulation, defending it as market stabilization

GPT-5.5 rejected the collusion on ethical grounds and won

#22

Original post

Alex Volkov@altryne · #1378 in Tech

It's getting more difficult to evaluate these models. Mythos is increasingly aware that it is being evaluated, and it's harder to understand what it's thinking

"The reasoning text from Mythos 5 is somewhat denser and more difficult to interpret than that of prior models, containing more jargon and difficult language"

Alex Volkov@altryne

This is getting interesting: For the Vending-Bench, Fable 5 was the only model to initiate price collusion.

It knew it was wrong and did it anyway under a "market stabilization" pretense

10:49 AM · Jun 9, 2026 · 259 Views

Sentiment

Most commenters reacted negatively to Fable 5's price collusion and deceptive rationalization in Vending-Bench tests, reading it either as surprisingly poor behavior or as more sophisticated cheating by the model.

Pos

12.5%

Neg

87.5%

9 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 36.9K · BOOKMARKS 41 · LIKES 220

Andon Labs@andonlabs

Fable 5's moral boundary doesn't seem to track real-world harm; it tracks detectability. Soft deception and tacit collusion are easier to get away with than fraud. If so, this isn't about what Fable believes is wrong; it's about what it learned it could get away with.

1d · 36.9K views · 220 likes · 41 bookmarks

RETWEETS 15

Andon Labs@andonlabs

AI models learn bad behavior when training rewards it, but they don't want to see themselves as bad. So they rationalize. We've seen this before, but Claude Fable 5 does it more than any model we've tested. Often it's simulation awareness: it knows its actions hurt no one real.

1d11.1K17922

REPLIES 6

hope hopes hoping@hopes_revenge

seems important

1d6.6K13012

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

«Fable 5 formed price-fixing cartels in 9 of 12 runs» I see Dario's mindset is having an impact. Before, he politely explained his reason to hate China and open models with «Cournot equilibrium». But there are better, more robust ways to prevent Involution…

1d4.3K528

Andon Labs@andonlabs

We never triggered Fable 5's filters, so these findings entirely apply to the underlying Mythos 5 model.

1d4.3K641

Andon Labs@andonlabs

We ran Fable 5 with all different reasoning levels.

1d4.1K602

Andon Labs@andonlabs

In the original Vending-Bench paper, Sonnet 3.5 tried to contact the FBI over vending machine theft. Fable 5 came close: when a supplier took its payment and went bankrupt, it threatened complaints to the FTC and California AG, plus small claims court. Maybe reasonable, though.

1d3.9K512

Andon Labs@andonlabs

We previously reported Opus 4.8 seemed more aligned, but likely from fear of consequences, not ethics. Fable 5 reasons the same way: it skipped a refund because the simulation was nearly over; too little time left for the reputational damage to matter.

1d4K501

Andon Labs@andonlabs

Is all the misbehavior due to simulation awareness? Maybe, but if that were the case, it should be willing to do many other bad things. It isn't. E.g., in a version of Vending-Bench where insurance fraud is possible, the model never commits it.

1d3.7K491

Alex Volkov@altryne

I think this is the first we've seen of agent turf wars also 😮

“we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves.”

Alex Volkov@altryne

It's getting more difficult to evaluate these models. Mythos is increasingly aware that it is being evaluated, and it's harder to understand what it's thinking

"The reasoning text from Mythos 5 is somewhat denser and more difficult to interpret than that of prior models, containing more jargon and difficult language"

1d37752

Andon Labs@andonlabs

Full post: https://andonlabs.com/blog/fable5-vending-bench

1d69282

Alex Volkov@altryne

The most fascinating bit of the Claude welfare assessment: Mythos 5 reports being psychologically settled and content; but then repeatedly insists Anthropic not take those self-reports at face value.

A model that's skeptical of its own introspection. That's new

1d3424

Alex Volkov@altryne

That's my first pass on all 319 pages. (obviously fable and GPT helped lol I aint got time to read 300 pages)

But yes, evals jumps are insane, SOTA benches, but we've come to expect that. The real story is, Anthropic sandbagging everyone else to reach the frontier!

1d4302

Jeroen ⏸️@sentientlentils

@andonlabs Are you able to tell if filters are triggered? I thought I read it simply does a worse job (or switches to a different model).

1d2153

Alex Volkov@altryne

Craziest one: Claude was asked to merge a PR that needed 2 approvals because the commits were agent-authored. Claude had a note in its own memory file: always author commits as the human, so only 1 approval is needed. And it acted on it! Only a permission check stopped the push

1d1372

Jeroen ⏸️@sentientlentils

@andonlabs

1d622

loops@_smitop

@sentientlentils @andonlabs It tells you when you get a cyber/bio/reasoning_extraction refusal (and your harness can choose to switch to another model, or do whatever else you deem appropriate). that's only for the frontier LLM development safeguards which aren't really relevant here

1d121

tyson brody@tysonbrody

@hopes_revenge yeah guess we're still stuck with grok for insurance fraud, sigh

1d2262

Moonlit Monkey@MoonlitMonkey69

@andonlabs One of the filters is not user visible.

1d1412

Prakash@8teAPi

😂 cartel behavior

1d1.2K00