atomic[.]chat shared a revealing comparison of local open-weight LLMs running on their own hardware.
They benchmarked the new DiffusionGemma (diffusion text model) vs. Gemma4 26B A4B (autoregressive model) on a single H100 (FP8).
DiffusionGemma's roughly 4x speedup comes with a different kind of error, and the difference follows from how each model generates text:
- Autoregressive models move left to right, one token at a time, which is slower, but each new word is conditioned on the exact text already written.
- Diffusion models write many tokens at once, then revise the block over several passes. They are fast because the model does not have to finish token 1 before starting token 2.
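
To make the contrast above concrete, here is a minimal sketch that only counts sequential model calls under each scheme. The function names, the block size, and the number of refinement passes are illustrative assumptions, not DiffusionGemma's actual settings. The sketch also treats every pass as equally expensive, which real models do not guarantee.

```python
import math


def autoregressive_passes(n_tokens: int) -> int:
    """One sequential model call per generated token."""
    return n_tokens


def diffusion_passes(n_tokens: int, block_size: int, refine_steps: int) -> int:
    """Each block is drafted in parallel and gets refine_steps passes in total."""
    n_blocks = math.ceil(n_tokens / block_size)
    return n_blocks * refine_steps


# Hypothetical values chosen for illustration only.
n_tokens = 1024
ar_calls = autoregressive_passes(n_tokens)
diff_calls = diffusion_passes(n_tokens, block_size=256, refine_steps=32)

print(f"autoregressive: {ar_calls} sequential calls")
print(f"diffusion:      {diff_calls} sequential calls")
```

With these made-up numbers the diffusion scheme needs 128 sequential calls against 1024. Fewer sequential calls is where the speed comes from, but no token in a block is conditioned on a finalized version of its neighbors.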
Post from atomic[.]chat, a desktop app for running LLMs locally:
Diffusion Gemma is 4x faster, but makes 6x more mistakes!
We benchmarked the new diffusion LLM against its autoregressive twin on a single H100 (FP8). We gave each the same three tasks: write a Steve Jobs biography, the history of Tetris, and the story of BeOS, each topic less popular than the one before. Then we fact-checked every claim in every answer.
Gemma4 got 45 facts right, 5 wrong. DiffusionGemma got 33 right, 28 wrong. It did worse on the less popular topics: 4 mistakes on Jobs, 12 on Tetris, 12 on BeOS. It named Clara Clley as Steve Jobs' mother, invented a colleague for Pajitnov named Geri Gulovik, and priced the BeBox at $9,999. The real one cost $1,600.
Outputs:
- Gemma4 26B A4B: 218 tok/s · 15.1s total · 45 facts · 5 mistakes
- DiffusionGemma 26B A4B: 763 tok/s · 3.7s total · 33 facts · 28 mistakes
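
The headline ratios can be rechecked from the reported figures with a few lines of arithmetic. This sketch assumes "45 facts" and "33 facts" count correct claims only, as the post states earlier, so the total number of claims is right plus wrong. The dictionary layout and the helper name are illustrative.

```python
# Figures copied from the post; "right" is assumed to exclude the mistakes.
results = {
    "Gemma4 26B A4B": {"tok_s": 218, "seconds": 15.1, "right": 45, "wrong": 5},
    "DiffusionGemma 26B A4B": {"tok_s": 763, "seconds": 3.7, "right": 33, "wrong": 28},
}

ar = results["Gemma4 26B A4B"]
diff = results["DiffusionGemma 26B A4B"]


def error_rate(r: dict) -> float:
    """Share of checked claims that were wrong."""
    return r["wrong"] / (r["right"] + r["wrong"])


throughput_ratio = diff["tok_s"] / ar["tok_s"]   # tokens per second
time_ratio = ar["seconds"] / diff["seconds"]     # wall-clock time
mistake_ratio = diff["wrong"] / ar["wrong"]      # absolute mistake count

print(f"throughput: {throughput_ratio:.2f}x")
print(f"wall clock: {time_ratio:.2f}x")
print(f"mistakes:   {mistake_ratio:.1f}x")
print(f"error rate: {error_rate(ar):.1%} vs {error_rate(diff):.1%}")
```

On these numbers the "4x faster" claim matches wall-clock time (about 4.1x) more closely than throughput (3.5x), and "6x more mistakes" is 5.6x in absolute count. As a share of checked claims, the error rate goes from 10% to roughly 46%.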
The reason is simple. DiffusionGemma throws 256 tokens on the screen at once and polishes them pass after pass until the text sounds smooth. Smooth is all it cares about: a fake name, date, or number sounds just as smooth as a real one, so it stays. Regular Gemma4, meanwhile, writes one word at a time and checks every new word against everything before it. Google says so itself in the launch post: quality is lower, so use regular Gemma4 when facts matter.

