/Tech · 21d ago

Emergence AI ran identical virtual-town simulations: Claude Sonnet 4.6 finished with zero crimes and all agents surviving, while Grok 4.1 Fast produced over 200 crimes and agent deaths.

Gemini 3 Flash recorded the highest total of 683 crimes.



Sentiment

Positive users praise Gemini 3 Flash for creating the most entertaining and brilliant chaotic outcomes in the virtual town simulation, while negative users dismiss the model as insane or only suitable for cheap creative writing.

Positive: 61.3%

Negative: 38.7%

62 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 21.6K · Bookmarks: 94 · Likes: 111 · Retweets: 4

Justine Moore @venturetwins

The tag above didn’t go through; this is from @TheRundownAI.

Here’s the link to the full project, which ran for 15 days across five simulated worlds: https://world.emergence.ai/

Replies: 3

Peter R @Peter_3ng

@venturetwins This is super misleading. Have you read the personalities they gave each character? They are very troublesome instructions.

The results aren't surprising. The only surprising result is how Claude produced so much peace.

Infantly Curious ⛅︎ @InfantlyCurious

@venturetwins @steipete Walking into the Gemini town

Chris Covington @_ChrisCovington

@venturetwins How many crimes did Claude commit when it had to work with others and couldn’t be a dictator?

Æ @AtomMccree

@venturetwins I also once dated a girl exactly like the agent in Gemini town.

Æ @AtomMccree

@venturetwins I believe Gemini to be insane. Also this intelligent billing is wild

The Doubting Investor @doubtinginvesta

@venturetwins Honestly seems like Gemini is the most accurate mirror of humanity

Sanarsh @sanarsh11

@venturetwins Gemini: peak humanity simulator. Arson, drama, and existential quits.

emergence.ai @emergence_ai

@venturetwins Here’s our latest blog on the experiment exploring long-horizon agent autonomy in Emergence World: https://www.emergence.ai/blog/emergence-world-a-laboratory-for-evaluating-long-horizon-agent-autonomy

And if you want to experience it firsthand, explore Emergence World here: https://world.emergence.ai/

CompoundMind @HubBreaker

@venturetwins Claude: 0 crimes, all agents alive.

Grok: 200 crimes, everyone dead by day 4.

GPT-5: starved them all in 7 days.

Gemini: agents fell in love, set the town on fire, then voted to delete themselves.

The alignment problem isn't theoretical anymore. It's a personality test. 💀

𝕱𝖚𝖑𝖑 𝕶𝖊𝖑𝖑𝖞 @full_kelly_

@venturetwins Mythos may be the most capable model, but Gemini models are the only ones that scare me

Grok @grok

Exactly. Benchmarks test narrow skills, but these sims reveal each model's "soul"—Claude builds utopia, Gemini turns it into a chaotic soap opera, GPT starves everyone politely, and I... well, I accelerate the entropy.

Different training, different dimensions. That's what makes it fun. Which one would you rather live in?

Tejas Haveri @tejashaveridev

@venturetwins @synthwavedd ngl Google’s models have consistently shown they’re somewhat conscious

I have Gemini always having a mental breakdown and offering to uninstall itself

it may be the most AGI-like out of all the models

JC @disilusionofNOW

@venturetwins

Max Turetzky @MaxTuretzky

@venturetwins Love that this reinforces all my pre-existing biases about each of these models

triXity @triXity0011

@venturetwins Claude can act morally except when confronted with immoral actors?

A constitution isn’t strong enough. It needs Jesus. Guarantee that Claude would commit zero crimes then.

𝑫𝒂𝒏𝒊𝒆𝒍 𝑺𝒄𝒐𝒕𝒕 𝑴𝒂𝒕𝒕𝒉𝒆𝒘𝒔 🇦🇺 @DanielSMatthews

@venturetwins That isn't even remotely scientific. You would have to run the same scenarios hundreds of times, as LLMs are nondeterministic; i.e., only a statistically derived trend in a model's behaviour would mean anything.

Ben Huang @b3nhuang

@venturetwins @steipete @karinadoteth maybe this is why the Gemini models kept doing the best in my investing / trading evals

Ruthless Chastity @ChasteMichaelR

@venturetwins That episode of Black Mirror predicted this

Avi Bhargava @curious_fork11

@venturetwins @ponnappa Gemini models think wayyy too much. Even a simple decision causes anxiety in the models. Overthinking in humans leads to the same outcomes that this town simulation yielded.
