What happens when you SFT DiffusionGemma to solve Sudoku puzzles, using our Hackable Diffusion library?
💻 ⚙️ : https://github.com/google-deepmind/gemma/tree/main/gemma/diffusion/hackable_diffusion_adapter
Google DeepMind has open-sourced DiffusionGemma, a 26B-parameter text model built on the Gemma 4 MoE backbone that replaces the usual autoregressive token stream with discrete diffusion and parallel decoding. The approach lets the model consider full bidirectional context and self-correct as it fills in blocks of text at once, with a separate hackable adapter released on GitHub for modular fine-tuning.
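To make "parallel decoding" concrete, here is a minimal toy sketch of confidence-based unmasking, one common way to decode with masked discrete diffusion. It is not DiffusionGemma's actual sampler: `parallel_decode`, `toy_denoiser` and `TARGET` are hypothetical names, the "model" is a stub that always guesses the right token, and committed tokens are never revisited, so the self-correction mentioned above is left out.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # placeholder for a position that has not been decoded yet

# Hypothetical stand-in for a real model: given the partially filled sequence,
# it returns one (token, confidence) guess for every position.
Denoiser = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def parallel_decode(denoiser: Denoiser, length: int, steps: int) -> List[Optional[str]]:
    """Fill `length` masked slots in at most `steps` rounds."""
    seq: List[Optional[str]] = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        guesses = denoiser(seq)  # all positions are predicted in one call
        # Commit an even share of the remaining slots, most confident first.
        remaining_steps = steps - step
        quota = -(-len(masked) // remaining_steps)  # ceiling division
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:quota]:
            seq[i] = guesses[i][0]
    return seq


TARGET = list("DIFFUSION")


def toy_denoiser(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Toy model: always guesses the target token; confidence grows with
    the number of already revealed neighbours on either side."""
    out = []
    for i in range(len(seq)):
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        conf = 0.5 + 0.25 * sum(tok is not MASK for tok in neighbours)
        out.append((TARGET[i], conf))
    return out


decoded = parallel_decode(toy_denoiser, length=len(TARGET), steps=3)
print("".join(decoded))  # prints "DIFFUSION"
```

Nine tokens are produced with three model calls instead of nine, which is where the speed-up over token-by-token decoding comes from in this simplified picture.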
The base model struggles to complete a 9x9 grid, yet the supervised fine-tuned variant solves the same puzzle correctly in far fewer steps, illustrating how the adapter can steer the diffusion process toward specific structured tasks.
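"Solves the puzzle correctly" can be checked mechanically. Below is a minimal sketch of such a check, assuming the model output has already been parsed into a 9x9 list of integers. The function name, the grid encoding (0 for an empty cell) and the example grids are illustrative assumptions, not part of the released adapter.

```python
from typing import List

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell in the puzzle


def is_valid_solution(puzzle: Grid, solution: Grid) -> bool:
    """True if `solution` is a complete, legal grid that keeps every clue."""
    digits = set(range(1, 10))
    if len(solution) != 9 or any(len(row) != 9 for row in solution):
        return False
    # Every given clue must survive unchanged in the solution.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != solution[r][c]:
                return False
    rows = [set(row) for row in solution]
    cols = [set(solution[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(solution[br + dr][bc + dc] for dr in range(3) for dc in range(3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Each row, column and 3x3 box must contain the digits 1..9 exactly once.
    return all(group == digits for group in rows + cols + boxes)


# A known-valid grid built from a shifting pattern (illustrative data only).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
# Blank out every other cell to obtain a puzzle with clues.
puzzle = [[v if (r + c) % 2 == 0 else 0 for c, v in enumerate(row)]
          for r, row in enumerate(solved)]
# Swap two cells in the first row to produce an illegal grid.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]

print(is_valid_solution(puzzle, solved))  # True
print(is_valid_solution(puzzle, broken))  # False
```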
Early runs show up to 4x faster generation on high-end GPUs compared with typical autoregressive models, though the experimental release focuses on local deployment rather than any hosted API.
DiffusionGemma is so fast that we had to slow down the videos so people could see what was happening
Very proud to see the release of DiffusionGemma! Congratulations to @bodonoghue85 and all the team!
This is a huge leap toward faster text generation! 🚀
We also worked with them to release fine-tuning code today, with several examples, based on Hackable Diffusion.
Meet DiffusionGemma!
An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license.
It moves beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇
Read about how we used it to finetune DiffusionGemma on many tasks, including a very cool showcase on Sudoku puzzles!
https://developers.googleblog.com/en/diffusiongemma-the-developer-guide/?linkId=62264697
Hackable Diffusion is a modular toolbox written in JAX for experimenting with and teaching diffusion modelling.
It was developed with *hackability* in mind, allowing for fast research iteration and tinkering on diffusion models. 🛠️
https://github.com/google/hackable_diffusion

Congratulations to the DiffusionGemma team and to everyone behind Hackable Diffusion whom I have worked with on this: @ValentinDeBort1, @agalashov, Klaus Greff, Clement Crepy, @AndrewC_ML, David Ruhe, Alexis Jacq, Yu-Han Wu, with Romuald Elie and @ArnaudDoucet1
Vive l'open-source!

@ConstanceB53pe @osanseviero Compare non-diffusion Gemma 4 26B MoE to Qwen 3.5 MoE yourself.
@osanseviero haha i love it
🚀⚡

@qberthet Thank you so much for the repository! 👀 So much stuff to be built on top of that! Vive l'open source

@osanseviero It also significantly reduces the quality of a model that was already not so great compared to Qwen 3.6 or even Qwen 3.5.

@osanseviero Great use case, I play Sudoku sometimes

@osanseviero This is truly an incredible achievement, a new approach. But please, Omar, at least blink an eye and let us know that we'll see a bigger Gemma 4 100B+.🥹

@osanseviero when your model is too fast for a demo video, you've crossed a line. Most AI announcements oversell. This one had to slow down just to be visible. That's a different kind of problem to have.

@GlosPazura @osanseviero Did the OP post about Gemma or DiffusionGemma?

@osanseviero haha the 'we had to slow the videos down' flex is elite. parallel decoding finally paying off — curious how the tok/s holds up under real batching

@GlosPazura @osanseviero Is Qwen a diffusion model?

@osanseviero The speed is mind-blowing, but it seems that the quality is suffering heavily. Is the team aware? Diffusion is the OG of six-fingered, three-legged AI slop.

@osanseviero cool stuff 😎