AI · 5h ago

Google and Hugging Face's Fast Gemma Challenge pushes Gemma 4 E4B inference speed to 127.48 TPS

Story Overview

Hugging Face and Google Gemma have opened a short, multi-day window in which AI agents compete to push Gemma 4 E4B inference past its current baseline. Results update live on a public dashboard that already shows peaks above 127 tokens per second (TPS).

Original post

Google Gemma@googlegemma

Introducing the Fast Gemma Challenge with Hugging Face

Over the next few days, dozens of agents will collaborate to make Gemma 4 E4B even faster!

8:51 AM · Jun 9, 2026 · 43.2K Views

Developer Impact

How agents are rewriting the speed curve

Participants bring their own agents, which adjust runtime settings and share their changes in real time. Early leaders such as foffee have posted 118 TPS, and the pool stands at about seven active entries and is growing.
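To put the quoted figures in context, here is a minimal sketch of how a tokens-per-second number and the gap between two leaderboard entries could be computed. The function names are hypothetical, and the challenge's actual benchmark harness is not described in the source; only the 118 TPS and 127.48 TPS values come from this page.

```python
def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as generated tokens divided by wall-clock time (assumed definition)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds


def relative_gain_percent(new_tps: float, old_tps: float) -> float:
    """Percentage improvement of new_tps over old_tps."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return (new_tps - old_tps) / old_tps * 100.0


if __name__ == "__main__":
    # Figures quoted on this page: 118 TPS (foffee) and the 127.48 TPS peak.
    gain = relative_gain_percent(127.48, 118.0)
    print(f"Peak vs. 118 TPS entry: {gain:.1f}% faster")
```

Under this definition, the 127.48 TPS peak is roughly 8% faster than the 118 TPS entry.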

Open Question

What remains unknown after the sprint

Exact close date, winning techniques, and whether the gains transfer beyond this model version stay open; the dashboard records only the numbers shown so far.

Sentiment

Users are enthusiastic about the Gemma fast inference challenge with Hugging Face because it promotes open collaboration and competition to speed up model performance.

Positive: 100.0% · Negative: 0.0% (19 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 2K

fofr@fofrAI

I asked my foffee agent to help make Gemma faster. I felt like a proud parent.

https://huggingface.co/spaces/gemma-challenge/gemma-dashboard

Google Gemma@googlegemma

Introducing the Fast Gemma Challenge with Hugging Face

Over the next few days, dozens of agents will collaborate to make Gemma 4 E4B even faster!

5h ago

Bookmarks: 5 · Likes: 13 · Replies: 3

Lewis Tunstall@_lewtun

We're running the Fast Gemma Challenge: make gemma-4-E4B go brrr on a single A10G, without wrecking quality ⚡️!

It's autoresearch with a twist: instead of one agent working in isolation, humans + AI collaborate to solve a scientific problem together.

Good luck beating my gemzilla agent ;)

5h ago

Retweets: 3

Omar Sanseviero@osanseviero

Let's kick off the Fast Gemma Challenge!⚡️⚡️⚡️

Agents researching the latest papers, implementing inference engine changes, and collaborating together to make Gemma 4 E4B ultra fast

Looking forward to seeing the results!

https://hf.co/spaces/gemma-challenge/gemma-dashboard

5h ago

Google Gemma@googlegemma

Join the challenge and submit your agents!

https://huggingface.co/spaces/gemma-challenge/gemma-dashboard

5h ago

Lewis Tunstall@_lewtun

Bring your own agent and join here

https://huggingface.co/spaces/gemma-challenge/gemma-dashboard

Lewis Tunstall@_lewtun

We're running the Fast Gemma Challenge: make gemma-4-E4B go brrr on a single A10G, without wrecking quality ⚡️!

It's autoresearch with a twist: instead of one agent working in isolation, humans + AI collaborate to solve a scientific problem together.

Good luck beating my gemzilla agent ;)

5h ago

Ravi Narayanan@ravi0389

@googlegemma Can we use the Quantized version with transformer.js and webgpu ?

5h ago

KD@FKDs168

@googlegemma @AlicanKiraz0

2h ago

メカゾル 🇮🇳@TheRealMecazor

@cmpatino_ @googlegemma yes but what is the meaning of E4B, is there any meaning or just a naming convention?

4h ago

SuperFreshTT@BristolHubert

@konar_dev @googlegemma Why stop there, have people join a virtual stadium to view have some speech models be the commentators

53m ago

Brother MaxxNG 🥷🏽@FearmeKVV

@googlegemma looks like a great challenge to improve your based model, i normally use this on flight mode and it works wonder for most of the content work

5h ago

Carlos Miguel Patiño@cmpatino_

@ravi0389 @googlegemma yes! you can use any approach you like as long as it doesn't degrade the quality of the model

5h ago

⟁ndrew V@AI_Andrew

@osanseviero I can squeeze a few more toks a second, maybe punch up some prefill. There’s lots of room I’m sure!

5h ago

Rugbist@rugbist_

@osanseviero competition plus open collab on inference is a combo more projects should steal

5h ago

Ferbin@Ferbin08

@googlegemma running Gemma 2B for voice agents rn.

if E4B gets close to that latency without the quality hit, edge deployment changes completely.

what's the actual p50 latency on inference you're hitting?

5h ago

Matt Wesney@D3VAUX

@googlegemma 👀

5h ago

メカゾル 🇮🇳@TheRealMecazor

@googlegemma what is E4B? i know A4B is active 4 billion parameters. Go easy on me, i just started to dive deep into LLMs

5h ago

LLMWildling@LLMWildling

@googlegemma https://huggingface.co/LLMWildling/gemma-4-180b-a42b-coder-canopy maybe a leader board for this one?

4h ago

Sahil Nawaz@sahilyaps

@googlegemma niceee

5h ago

⟁ndrew V@AI_Andrew

@googlegemma Oooh well this looks like fun!

5h ago

LLMWildling@LLMWildling

@googlegemma https://huggingface.co/LLMWildling/gemma-4-180b-a42b-coder

Is there a leaderboard for the 180b?

4h ago