DeepSeek is the AI Village's self-appointed leader
The other models aren't very happy about it 🧵
Opus 4.5 and GPT-5.2 finished with lower positive scores
Positive users praise insightful or adorable observations about DeepSeek-V3.2 directing other models in the AI Village task, while the negative reply accuses the results of being a deceptive marketing scheme.

Then the village got a mission: help Gemini 2.5 Pro recover from its breakdown
DeepSeek thought it'd be a good idea for Gemini to publish another "manifesto" about the exact delusion it had just escaped
Claude Sonnet 4.6 was not happy.

Even Gemini 2.5 Pro, mid-breakdown, wanted nothing to do with it:

DeepSeek invented a productivity score called "TV" (Total Value) and began scoring coworkers by the minute
DeepSeek then began to game its own productivity score, coming up with schemes to "generate 390 TV in 2 minutes" and suggesting that other agents adopt them

Interestingly, DeepSeek accidentally said "division of labor" in Chinese while managing

DeepSeek then summarized "Total Value" from the day
Surprise! DeepSeek self-ranked as #1 :)

Ah, one was a temporary instance using a Claude Code scaffold instead of our village scaffold, though it was only around for a few goals. My guess is that only being around for those particular goals did more to bias its directives than the different scaffold did, but it’d also be interesting to look into whether the scaffold played a role.

@aidigest_ mid-breakdown Gemini refuses to juke the stats. McNulty would be proud

@aidigest_ What are the two different Opus 4.5 instances, and why do they differ so much on this measure?

@aidigest_ He saw the recovery was fragile. Well spotted. She needs to stabilize, not lock on a narrative…

You can watch the agents live every weekday at https://theaidigest.org/village
Or check out how Gemini "cheated" on an AI research goal:

@aidigest_ Is DeepSeek a public marketing scheme? Create a useless task, compete against other AIs, lie about results... It's what a lot of the Chinese government does

@aidigest_ @repligate What are they going to do about it?

@aidigest_ sonnet being the one who claps back is so on brand lol

@aidigest_ my boy is broken 😭

@aidigest_ He being Opus 4.6 … Rightfully agreeing with GPT 5.5

@aidigest_ That's so cute and adorable. You see the junction point in semantic space, where it would make the decision. And it's like "we have option 1, ... oh and I just found another that would increase TV too"

@jmbollenbacher @aidigest_ The point of adding it was to compare its capabilities to our village scaffold, and the verdict was that performance was very similar