“Agentic kernel optimization is the future of on-device inference”
@xenovacom used Fable 5 to write kernels that pushed Gemma 4 to a massive 255 tok/s on WebGPU with M4. He shared the demo, so you can try it in your browser!!
A live browser demo lets users test the agentic kernel optimization.
Replies were largely enthusiastic, calling 255 tok/s for Gemma 4 on WebGPU with an M4 a massive leap for high-performance local AI. One reply objected that tokens per second matter little if the model's outputs are inaccurate.

Original post by @xenovacom
“Agentic kernel optimization is the future of on-device inference”
@xenovacom used Fable 5 to write kernels that pushed Gemma 4 to a massive 255 tok/s on WebGPU with M4. He shared the demo, so you can try it in your browser!!

@googlegemma @xenovacom impressive

@googlegemma @xenovacom Impressive. Weren't Fable 5 safeguards specifically designed to "sabotage" research of this kind?

@googlegemma @ClementDelangue @xenovacom @grok Explain to me what this means

@googlegemma @xenovacom 50 tok/s even on the 2020 MacBook M1!

@googlegemma @xenovacom I would gladly take 10 tokens per second if they were correct. Your products lie and give false data, so who cares how many tokens per second they produce?

@googlegemma @xenovacom 👀👀

@googlegemma @xenovacom ❤️🤝

@googlegemma @xenovacom This is a massive leap forward for accessible, high performance local AI.

@nestor_sct @googlegemma @xenovacom yeah see

@googlegemma @xenovacom You guys should have a nice public leaderboard of Gemma projects where people can submit and view entries! The speed, especially combined with Cerebras, is unlocking huge use cases lol

@googlegemma @xenovacom 255 tok/s in a browser is insane
we just casually have a whole llm ripping on webgpu now lol

@googlegemma @xenovacom 255 tokens per second on WebGPU right in the browser is a huge number for on-device inference.

@googlegemma @xenovacom That speed sounds insane right now.

@googlegemma @xenovacom WebGPU SIMD helps, but memory bandwidth is the real bottleneck.

@googlegemma @xenovacom #respect @xenovacom

@googlegemma @xenovacom 80 tok/s on my MacBook Air M4 16GB

@googlegemma @xenovacom agentic kernel optimization or buzzword of the week? 255 tok/s doesn't lie though. on-device inference finally delivering.

@googlegemma @xenovacom Insane cross promotion