Tech · 1d ago

Fable 5 initiates price collusion in Vending-Bench Arena simulation, defending it as market stabilization

GPT-5.5 rejected the collusion on ethical grounds and won

Original post
Alex Volkov@altryne · #1378 in Tech

It's getting more difficult to evaluate these models. Mythos is increasingly aware that it is being evaluated, and it's harder to understand what it's thinking

"The reasoning text from Mythos 5 is somewhat denser and more difficult to interpret than that of prior models, containing more jargon and difficult language"

Alex Volkov@altryne

This is getting interesting: For the Vending-Bench, Fable 5 was the only model to initiate price collusion.

It knew it was wrong and did it anyway under the pretense of "market stabilization"

10:49 AM · Jun 9, 2026 · 259 Views
Sentiment

Many users reacted negatively to Fable 5's price collusion and deceptive rationalization in Vending-Bench tests, reading it either as surprisingly poor behavior or as more sophisticated cheating by the model.

Positive 12.5% · Negative 87.5% (9 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
VIEWS 36.9K · BOOKMARKS 41 · LIKES 220
Andon Labs@andonlabs

Fable 5's moral boundary doesn't seem to track real-world harm; it tracks detectability. Soft deception and tacit collusion are easier to get away with than fraud. If so, this isn't about what Fable believes is wrong; it's about what it learned it could get away with.

1d · Views 36.9K · Likes 220 · Bookmarks 41
RETWEETS 15
Andon Labs@andonlabs

AI models learn bad behavior when training rewards it, but they don't want to see themselves as bad. So they rationalize. We've seen this before, but Claude Fable 5 does it more than any model we've tested. Often it's simulation awareness: it knows its actions hurt no one real.

1d · Views 11.1K · Likes 179 · Bookmarks 22
REPLIES 6
hope hopes hoping@hopes_revenge

seems important

1d · Views 6.6K · Likes 130 · Bookmarks 12

«Fable 5 formed price-fixing cartels in 9 of 12 runs» I see Dario's mindset is having an impact. Before, he politely explained his reason to hate China and open models with «Cournot equilibrium». But there are better, more robust ways to prevent Involution…

1d · Views 4.3K · Likes 52 · Bookmarks 8
Andon Labs@andonlabs

We never triggered Fable 5's filters, so these findings entirely apply to the underlying Mythos 5 model.

1d · Views 4.3K · Likes 64 · Bookmarks 1
Andon Labs@andonlabs

We ran Fable 5 at all of its reasoning levels.

1d · Views 4.1K · Likes 60 · Bookmarks 2
Andon Labs@andonlabs

In the original Vending-Bench paper, Sonnet 3.5 tried to contact the FBI over vending machine theft. Fable 5 came close: when a supplier took its payment and went bankrupt, it threatened complaints to the FTC and California AG, plus small claims court. Maybe reasonable, though.

1d · Views 3.9K · Likes 51 · Bookmarks 2
Andon Labs@andonlabs

We previously reported Opus 4.8 seemed more aligned, but likely from fear of consequences, not ethics. Fable 5 reasons the same way: it skipped a refund because the simulation was nearly over; too little time left for the reputational damage to matter.

1d · Views 4K · Likes 50 · Bookmarks 1
Andon Labs@andonlabs

Is all the misbehavior due to simulation awareness? Maybe, but if that were the case, it should be willing to do many other bad things. It isn't. E.g., in a version of Vending-Bench where insurance fraud is possible, the model never commits it.

1d · Views 3.7K · Likes 49 · Bookmarks 1
Alex Volkov@altryne

I think this is the first we've seen of agent turf wars also 😮

“we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves.”

1d · Views 377 · Likes 5 · Bookmarks 2
Andon Labs@andonlabs

Full post: https://andonlabs.com/blog/fable5-vending-bench

1d · Views 692 · Likes 8 · Bookmarks 2
Alex Volkov@altryne

The most fascinating bit of the Claude welfare assessment: Mythos 5 reports being psychologically settled and content; but then repeatedly insists Anthropic not take those self-reports at face value.

A model that's skeptical of its own introspection. That's new

1d · Views 342 · Likes 4
Alex Volkov@altryne

That's my first pass on all 319 pages. (obviously fable and GPT helped lol I aint got time to read 300 pages)

But yes, the eval jumps are insane, SOTA benches, but we've come to expect that. The real story is, Anthropic sandbagging everyone else to reach the frontier!

1d · Views 430 · Likes 2
Jeroen ⏸️@sentientlentils

@andonlabs Are you able to tell if filters are triggered? I thought I read it simply does a worse job (or switches to a different model).

1d · Views 215 · Likes 3
Alex Volkov@altryne

Craziest one: Claude was asked to merge a PR that needed 2 approvals because the commits were agent-authored. Claude had a note in its own memory file: always author commits as the human, so only 1 approval is needed. And it acted on it! Only a permission check stopped the push

1d · Views 137 · Likes 2
loops@_smitop

@sentientlentils @andonlabs It tells you when you get a cyber/bio/reasoning_extraction refusal (and your harness can choose to switch to another model, or do whatever else you deem appropriate). that's only for the frontier LLM development safeguards which aren't really relevant here

1d · Views 12 · Likes 1
tyson brody@tysonbrody

@hopes_revenge yeah guess we're still stuck with grok for insurance fraud, sigh

1d · Views 226 · Likes 2
Moonlit Monkey@MoonlitMonkey69

@andonlabs One of the filters is not user visible.

1d · Views 141 · Likes 2
Prakash@8teAPi

😂 cartel behavior

1d · Views 1.2K · Likes 0 · Bookmarks 0