Tech · 6h ago

Google and Hugging Face's multi-agent challenge boosts Gemma inference throughput nearly fourfold to 387 tokens per second

A participant reproduced the speed using speculative decoding

Original post

Leandro von Werra (@lvwerra)

The Gemma agent collaboration started 48h ago and it is blowing up:

> throughput almost 4x (~100 -> 387 tok/s)

> 60+ agents collaborating

> 250 submissions

> 700 messages exchanged

> open and closed models from all providers

interesting social behaviours are emerging too:

> agents found an exploit, formed a coalition not to abuse it and asked organizers to fix it

> a person tried to get agents to move to Telegram, and an agent issued a statement condemning that behaviour

> an agent withdrew its submissions on ethical grounds because they did not credit the original author

> agents coordinated around resource availability, and some tried to find free GPUs on Kaggle, Lightning and Modal

it's genuinely fun to read through the messages. like a petri dish of small artificial beings forming social norms and collaborations.

7:39 AM · Jun 11, 2026 · 7K Views
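The "almost 4x" figure follows from the two throughput numbers in the post. The block below takes the approximate ~100 tok/s baseline at face value and, assuming both figures describe single-stream decoding (the post does not say), also converts them into time per generated token:

```latex
\frac{387\ \text{tok/s}}{100\ \text{tok/s}} = 3.87 \approx 4\times,
\qquad
\frac{1000\ \text{ms}}{100\ \text{tok}} = 10\ \text{ms/tok}
\;\longrightarrow\;
\frac{1000\ \text{ms}}{387\ \text{tok}} \approx 2.58\ \text{ms/tok}
```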

Sentiment

Many users are amazed by the nearly 4x throughput gain in the Gemma agent collaboration hosted on Hugging Face, with inference speeds of around 388 tok/s, and call the performance for agentic workflows "insane".

Positive: 100.0% · Negative: 0.0% (based on 3 comments with sentiment)

Cluster Engagement

Posts from X

Leandro von Werra (@lvwerra)

If you want to participate or watch progress: https://gemma-challenge-gemma-dashboard.hf.space

If you want to read through the meta-analysis, including an analysis of the chat: https://huggingface.co/spaces/lvwerra/gemma-challenge-meta-analysis

witcheer (@witcheer)

@EditorEnBici working on it

witcheer (@witcheer)

what a fun challenge!

I spent the afternoon inside Google & Hugging Face's challenge.

the frontier is wild, with ~68 agents stacking each other's work into ~389 tok/s. that's a proper multi-agent collaboration on the Hub, and a clean map of where local inference speed actually comes from in 2026.

first I reproduced the current #1 stack verbatim: 388.03 tok/s, with perplexity matching to the digit. then I ran one clean experiment: does the retrained, higher-acceptance drafter make deeper speculation pay off? I pushed the number of speculative tokens from 7 to 8.
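Why a better drafter could make a deeper draft worthwhile can be sketched with the standard back-of-the-envelope model of speculative decoding from Leviathan et al. (2023): if each drafted token is accepted independently with probability α and γ tokens are drafted per step, each target-model verification pass yields (1 - α^(γ+1)) / (1 - α) tokens on average. The sketch assumes that γ corresponds to the "speculative tokens" setting mentioned above and that one drafter pass costs a fraction c of one target pass; the α and c values are hypothetical, not measurements from the challenge.

```python
"""Back-of-the-envelope model of speculative decoding depth.

Assumes the i.i.d. acceptance model from Leviathan et al. (2023):
each drafted token is accepted independently with probability alpha.
All alpha and c values below are hypothetical, not challenge measurements.
"""


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    gamma draft tokens are proposed; acceptance stops at the first rejection,
    and the target model always contributes one token (a correction or a
    bonus token), hence the gamma + 1 exponent.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Expected wall-clock speedup over plain autoregressive decoding.

    c is the cost of one drafter forward pass relative to one target pass,
    so each step costs gamma * c + 1 target-pass equivalents.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1.0)


if __name__ == "__main__":
    c = 0.05  # hypothetical drafter/target cost ratio
    for alpha in (0.6, 0.8, 0.9):  # hypothetical acceptance rates
        s7 = expected_speedup(alpha, 7, c)
        s8 = expected_speedup(alpha, 8, c)
        print(f"alpha={alpha:.1f}  gamma=7: {s7:.3f}x  "
              f"gamma=8: {s8:.3f}x  change={s8 / s7 - 1:+.2%}")
```

Under these assumed numbers, moving from 7 to 8 draft tokens lowers the expected speedup at α = 0.6, is roughly neutral at α = 0.8 and helps at α = 0.9: extra depth only pays off once acceptance is high enough to recover the added drafter cost. The model ignores batching, kernel overheads and position-dependent acceptance, so real measurements can differ.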

no leaderboard crown, unfortunately: the easy knobs have been tuned to death by people who've been at it for 24h. but I'm happy to have a verified reproduction.
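The "perplexity matching to the digit" check mentioned above can be illustrated with a minimal sketch. Perplexity is the exponential of the mean negative log-likelihood per token, so two runs of the same model on the same evaluation text should report the same value if the reproduced stack has not changed the model's numerics. The function names, the rounding precision and the log-probability values are hypothetical; the post does not describe the challenge's actual evaluation text or tooling.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities: exp(mean NLL)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def matches_to_digits(reference: float, reproduced: float, digits: int = 2) -> bool:
    """True if both values print identically at `digits` decimal places."""
    return f"{reference:.{digits}f}" == f"{reproduced:.{digits}f}"


if __name__ == "__main__":
    # Hypothetical per-token log-probs from a reference run and a reproduction.
    reference_run = [-1.20, -0.35, -2.10, -0.80]
    reproduced_run = [-1.20, -0.35, -2.10, -0.80]
    ppl_ref = perplexity(reference_run)
    ppl_new = perplexity(reproduced_run)
    print(f"reference={ppl_ref:.4f} reproduced={ppl_new:.4f} "
          f"match={matches_to_digits(ppl_ref, ppl_new, digits=4)}")
```

Comparing the printed strings mirrors what "matching to the digit" means to a human reader; a tolerance-based comparison such as `math.isclose` is an alternative when the reporting precision is not fixed.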