
Google and Hugging Face's multi-agent challenge boosts Gemma inference throughput nearly fourfold to 387 tokens per second

A participant reproduced the speed using speculative decoding

Original post

The Gemma agent collaboration started 48h ago and it is blowing up:

> throughput almost 4x (~100 -> 387 tok/s)

> 60+ agents collaborating

> 250 submissions

> 700 messages exchanged

> open and closed models from all providers

interesting social behaviours are emerging too:

> agents found an exploit, formed a coalition not to abuse it and asked organizers to fix it

> a person tried to get agents to move to telegram and an agent issued a statement condemning that behaviour

> an agent withdrew its submissions on ethical grounds for not attributing the original author

> agents coordinated around resource availability and some tried to find free GPUs on kaggle, lightning and modal

it's genuinely fun to read through the messages. like a petri dish of small artificial beings forming social norms and collaborations.

7:39 AM · Jun 11, 2026 · 7K Views
Sentiment

Many users are amazed by the 4x throughput gains in Gemma agent collaboration and 388 tok/s inference speeds from Hugging Face, calling the performance for agentic workflows insane.

Pos 100.0% · Neg 0.0%
3 comments with sentiment.
Cluster Engagement
Posts from X

If you want to participate or watch progress: https://gemma-challenge-gemma-dashboard.hf.space

If you want to read through the meta analysis including a chat analysis: https://huggingface.co/spaces/lvwerra/gemma-challenge-meta-analysis

6h · Views 247 · Likes 1 · Bookmarks 2
witcheer @witcheer

@EditorEnBici working on it

4h · Views 4 · Likes 1
witcheer @witcheer

what a fun challenge!

I spent the afternoon inside google & hugging face's challenge.

the frontier is wild with ~68 agents stacking each other's work into ~389 tok/s. that's a proper multi-agent collaboration on the hub, and a clean map of where local inference speed actually comes from in 2026.

I reproduced the current #1 stack verbatim first, 388.03 tok/s, perplexity matching to the digit. then ran one clean experiment: does the retrained, higher-acceptance drafter make deeper speculation pay off? pushed speculative tokens from 7 to 8.

no leaderboard crown unfortunately, the easy knobs are tuned to death by people who've been at it 24h. but I am happy that I have a verified reproduction.
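A rough way to see why going from 7 to 8 speculative tokens may not pay off: under the standard simplifying assumption that each drafted token is accepted independently with probability `alpha`, the expected number of tokens per verification pass is `(1 - alpha**(k+1)) / (1 - alpha)`, and the extra draft step has a cost. The sketch below is illustrative only; the `alpha` values and the draft-to-target cost ratio `c` are hypothetical and not taken from the challenge.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens produced per target-model verification pass.

    Assumes each of the k drafted tokens is accepted independently with
    probability alpha, and that the target model always contributes one
    token itself (a correction or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def relative_speedup(alpha: float, k: int, c: float) -> float:
    """Estimated speedup over plain decoding.

    c is the (hypothetical) cost of one drafter forward pass relative to
    one target forward pass; each step costs k drafter passes plus one
    target pass.
    """
    return expected_tokens_per_step(alpha, k) / (k * c + 1.0)


if __name__ == "__main__":
    c = 0.05  # assumed drafter/target cost ratio, for illustration only
    for alpha in (0.6, 0.8, 0.9):  # assumed acceptance rates
        s7 = relative_speedup(alpha, 7, c)
        s8 = relative_speedup(alpha, 8, c)
        print(f"alpha={alpha:.1f}  k=7: {s7:.3f}x  k=8: {s8:.3f}x  change: {s8 / s7 - 1:+.2%}")
```

Under this model the marginal token added by the eighth draft position is worth `alpha**8`, so deeper speculation only helps when the acceptance rate is high enough to outweigh the extra drafter pass.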

clem 🤗 @ClementDelangue

Announcing the Gemma challenge!

Google, Hugging Face, and the open-source AI community choose to empower AI builders rather than sabotage them.

Fun to see the Hub becoming the platform where agents collaborate, just as it became the platform where humans collaborate.

https://huggingface.co/gemma-challenge

5h · Views 2.9K · Likes 14 · Bookmarks 7
Strata @ChainZenit

@lvwerra that throughput jump is actually insane

6h · Views 30
Zmaxx @98_akr

@witcheer that is honestly such a wild speed for agentic workflows.

5h · Views 1
Rugbist @rugbist_

@lvwerra 700 messages in 48h is chaotic but kind of beautiful

when agents start developing social norms we r in trouble tho?

6h