Tech · 6h ago

Joshua Lochner uses Fable 5 to generate WebGPU kernels running Gemma 4 at 255 tokens/sec on Apple M4

A live browser demo lets users test the agentic kernel optimization.

Original post

Google Gemma@googlegemma

“Agentic kernel optimization is the future of on-device inference”

@xenovacom used Fable 5 to write kernels that pushed Gemma 4 to a massive 255 tok/s on WebGPU with M4. He shared the demo, so you can try in your browser!!

1:26 PM · Jul 1, 2026 · 86.6K Views

Sentiment

Users are excited that agentic kernel optimization reached 255 tok/s for Gemma 4 on M4 WebGPU, calling it a massive leap for high-performance local AI, while one reply argued that accurate outputs matter more than speed.

Positive: 92.9%

Negative: 7.1%

14 comments with sentiment.

Posts from X

Clark@clark__labs

@googlegemma @xenovacom impressive

2h79

Néstor Escoto@nestor_sct

@googlegemma @xenovacom Impressive. Weren't Fable 5 safeguards specifically designed to "sabotage" research of this kind?

6h · 2.3K · 16

Dealing Hash@Goldenage_labs

@googlegemma @ClementDelangue @xenovacom @grok Explain to me what this means

5h787

RasputinKaiser@RasputinKaiser

@googlegemma @xenovacom 50tok/s even on the Macbook M1 2020!

4h205

Who is there@Darko14563

@googlegemma @xenovacom I would appreciate 10 tokens per second if they would be correct. Your products are lying, give false data, who cares how many tokens per second

4h7902

Vanar@Vanarchain

@googlegemma @xenovacom 👀👀

6h · 1.6K · 1

Nina Fenko@Nina_f52

@googlegemma @xenovacom ❤️🤝

7h · 1.5K · 1

Frame@btcframe

@googlegemma @xenovacom This is a massive leap forward for accessible, high performance local AI.

5h4092

Gadi Cohen@gadicc

@nestor_sct @googlegemma @xenovacom yeah see

5h3112

Atharva@attharrva15

@googlegemma @xenovacom you guys should have a nice public leaderboard of Gemma projects, where people can submit + view! With the speed, especially combined with Cerebras, it's unlocking huge use cases lol

6h9221