Google Gemma@googlegemma

Meet DiffusionGemma!

An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license.

Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

9:06 AM · Jun 10, 2026 · 110.9K Views
Sentiment

Users are excited about DiffusionGemma's parallel text generation and its up to 4x faster inference via bi-directional attention, praising it as a profound architectural leap and a "banger" from Google.

Positive 97.4% · Negative 2.6% · 54 comments with sentiment.
Sundar Pichai@sundarpichai

DiffusionGemma is an open, experimental model that brings our text diffusion research to Gemma 4. It’s a racehorse 🏇 achieving up to 4x faster inference by generating entire blocks of text simultaneously vs predicting token-by-token (word-by-word) output!

1h · 64.6K Views · 1.3K Likes · 206 Bookmarks
Unsloth AI@UnslothAI

Google releases DiffusionGemma.✨ The new 26B-A4B diffusion text model runs locally on 18GB RAM.

It supports high-speed text generation, thinking, image, video and 256K context.

Run and train via Unsloth Studio.

GGUF: https://huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF
Guide: https://unsloth.ai/docs/models/diffusiongemma

1h · 26.1K Views · 476 Likes · 251 Bookmarks
Google@Google

Meet DiffusionGemma ⚡ Our latest experimental open model (Apache 2.0) that generates text up to 4x faster.

Instead of predicting and typing just one word at a time like most language models, it drafts and refines entire blocks of text simultaneously.

Here’s how it works 🧵 ↓

1h · 59.1K Views · 929 Likes · 222 Bookmarks
Google DeepMind@GoogleDeepMind

DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs.

Instead of predicting word-by-word, it generates entire blocks of text simultaneously. This lets the model self-correct and format complex markdown in real time.

1h · 44.2K Views · 914 Likes · 162 Bookmarks
Omar Sanseviero@osanseviero

Introducing DiffusionGemma, our first exploration with open diffusion text generation models

🔥 Generate blocks of text at a time
🤏 26B MoE built on top of Gemma 4
⚡️ Up to 4x faster on popular consumer GPUs
🤗 Apache 2.0

Excited to see what the community builds with it!

1h · 14.7K Views · 438 Likes · 110 Bookmarks
Philipp Schmid@_philschmid

Gemma goes diffusion! DiffusionGemma with up to 1000+ tokens per second! 🌬️

- Built on Gemma 4 as a 26B MoE model.
- 3.8B parameters during inference.
- Generates text in 256-token blocks in parallel.
- Fits within 18 GB VRAM limits when quantized.
- Apache 2.0

1h · 7.2K Views · 145 Likes · 54 Bookmarks

Want 4x faster local inference on dedicated GPUs for your interactive apps? DiffusionGemma is an experimental, open 26B MoE model that generates entire blocks of text simultaneously instead of token-by-token.

By shifting the local decoding bottleneck from memory-bandwidth to compute, it hits speeds over 700 tokens/sec on a single NVIDIA RTX 5090 GPU. This diffusion unlocks unique local workflows like real-time inline editing, code infilling, and instant self-correction.

📥 Download the Apache 2.0 weights on @HuggingFace: https://goo.gle/4xqzKTA

📖 Read the full technical announcement on the blog: https://goo.gle/4ursgwI

1h · 8K Views · 144 Likes · 41 Bookmarks
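
One rough way to read the "memory-bandwidth to compute" claim in the post above: if every forward pass has to stream the model weights from GPU memory once, a token-by-token decoder gets one token per pass, while a block decoder gets a whole block for a handful of refinement passes. The sketch below works through that arithmetic. Only the 256-token block size comes from the posts; the bandwidth, weight size, and step count are illustrative assumptions, not DiffusionGemma measurements, and the helper name is hypothetical.

```python
def bandwidth_bound_tokens_per_sec(bandwidth_gb_s, weights_read_gb, tokens_per_pass):
    """Ceiling on tokens/sec when each forward pass streams the weights once.

    Ignores compute limits, KV-cache traffic, and kernel overheads.
    """
    passes_per_sec = bandwidth_gb_s / weights_read_gb
    return passes_per_sec * tokens_per_pass


# Illustrative assumptions, not DiffusionGemma measurements.
BANDWIDTH_GB_S = 1000.0   # assumed GPU memory bandwidth
WEIGHTS_READ_GB = 4.0     # assumed weight bytes streamed per forward pass
BLOCK_TOKENS = 256        # block size quoted in the posts
DENOISE_STEPS = 32        # assumed refinement passes per block

token_by_token = bandwidth_bound_tokens_per_sec(BANDWIDTH_GB_S, WEIGHTS_READ_GB, 1)
block_parallel = bandwidth_bound_tokens_per_sec(
    BANDWIDTH_GB_S, WEIGHTS_READ_GB, BLOCK_TOKENS / DENOISE_STEPS
)

print(f"token-by-token ceiling: {token_by_token:.0f} tok/s")
print(f"block-parallel ceiling: {block_parallel:.0f} tok/s")
```

Under these assumptions the bandwidth ceiling rises by the factor BLOCK_TOKENS / DENOISE_STEPS, so arithmetic throughput becomes the limit sooner. The sketch also assumes the same weights are read per pass in both modes, which is a simplification for a mixture-of-experts model.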
Unsloth AI@UnslothAI

@googlegemma Google DeepMind once again delivering when it comes to open-source! 🙏🥰

You can run DiffusionGemma locally on 18GB RAM via our GGUFs: https://huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF

1h · 3.2K Views · 106 Likes · 40 Bookmarks
Google@Google

We're releasing DiffusionGemma as an open model under an Apache 2.0 license for anyone to experiment with.

Download the model weights on @huggingface, and learn more about DiffusionGemma → http://goo.gle/3Sy0Is7

1h · 17.7K Views · 151 Likes · 29 Bookmarks
Google@Google

Because it generates everything at once, DiffusionGemma unlocks new patterns of model behavior.

⚡ Fast: Generates up to 1,000+ tokens a second for up to 4x faster text generation.

💻 Lightweight: Runs smoothly right on 18GB consumer graphics cards.

🧠 Smart editing: Since it processes larger amounts of information at once, it can easily fill in blanks, format code, and fix its own errors in real time.

1h · 19.6K Views · 148 Likes · 20 Bookmarks
vLLM@vllm_project

Congrats to @GoogleDeepMind on DiffusionGemma 🎉 A 26B diffusion language model on the Gemma4 backbone, and the first dLLM natively supported in vLLM.

It denoises 256-token blocks in parallel instead of generating one token at a time: 1200+ output tok/s at batch size 1 on a single H200 (FP8).

Built on model runner v2's ModelState plus the existing speculative decoding path, with minimal scheduler or runner changes. FP8 and NVFP4 checkpoints are on the @RedHat_AI hub. Thanks to the @GoogleDeepMind, @RedHat_AI, and @NVIDIAAI teams!

🔗 http://vllm.ai/blog/2026-06-10-diffusion-gemma

1h · 3.1K Views · 68 Likes · 14 Bookmarks
Sundar Pichai@sundarpichai

Model weights available on Hugging Face under Apache 2.0 license, read more here: https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/

1h · 7.6K Views · 72 Likes · 12 Bookmarks

DiffusionGemma, our experimental open model released under an Apache 2.0 license, explores text diffusion, an exceptionally fast approach to text generation.

Here’s how DiffusionGemma accelerates development:

+ Faster token output: By shifting the bottleneck from memory bandwidth to raw compute, the model generates up to 4x faster token output on dedicated GPUs
+ Accessible hardware footprint: Activates just 3.8B parameters during inference, fitting comfortably within 24GB-VRAM high-end consumer GPUs when quantized
+ Novel workflows: Parallel token generation enables self-correction, making it ideal for code infilling, in-line editing, and non-linear structures

DiffusionGemma prioritizes speed over raw quality and accelerates best on compute-bound hardware (like @NVIDIAAI GPUs). Standard @GoogleGemma 4 remains recommended for production quality and memory-bound devices.

1h · 2.7K Views · 68 Likes · 16 Bookmarks
elvis@omarsar0

This is awesome!

I am spending a lot of time on diffusion LLMs these days, so this is perfect timing.

I feel like there are so many underexplored research questions around text diffusion.

Weights available on HF.

1h · 3.9K Views · 37 Likes · 17 Bookmarks
merve@mervenoyann

DiffusionGemma is out 🔥

it's compute-bound so 4x faster compared to other Gemma-4 models (1k tok/s on H100) 💨

also great on coding, generate and iterate on any code from 3D generation to front-end ⤵️

1h · 3.4K Views · 54 Likes · 14 Bookmarks
Google@Google

Most large language models predict answers by guessing the single best word to say next, then the next, and so on... 🔎

It's highly capable, but not necessarily fast. The model waits to finish one word before it can think about the next.

DiffusionGemma skips the wait.

It uses "diffusion" to generate text by refining noise step by step — drafting and error-correcting whole blocks simultaneously. This makes it incredibly fast, and helpful for editing complex math and code.

1h · 5.6K Views · 80 Likes · 8 Bookmarks
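
The "refining noise step by step" idea in the post above can be pictured with a toy loop: start from a block of random characters and, on each pass, correct several positions at once until the block is clean. This is a minimal sketch only. The function name is hypothetical, and an oracle that already knows the answer stands in for the model, so it illustrates the parallel refinement schedule and says nothing about how DiffusionGemma actually predicts tokens.

```python
import random
import string


def toy_refine_block(target, steps, seed=0):
    """Refine a noisy block toward `target` in `steps` parallel passes.

    An oracle that already knows `target` stands in for the model, so this
    only illustrates the schedule, not how a real denoiser predicts tokens.
    """
    rng = random.Random(seed)
    # Start from pure noise: one random character per position.
    block = [rng.choice(string.ascii_lowercase) for _ in target]
    history = ["".join(block)]
    for step in range(steps):
        wrong = [i for i, ch in enumerate(block) if ch != target[i]]
        passes_left = steps - step
        # Fix an even share of the remaining errors; the final pass fixes all.
        fix_count = -(-len(wrong) // passes_left)  # ceiling division
        for i in rng.sample(wrong, fix_count):
            block[i] = target[i]
        history.append("".join(block))
    return history


if __name__ == "__main__":
    for n, draft in enumerate(toy_refine_block("hello diffusion", steps=4)):
        print(n, draft)
```

In this toy, a 15-character block is finished in 4 passes, whereas a token-by-token decoder would need one pass per token.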

Local models can't benefit from batch parallelism as easily, but you can still parallelise over the token axis. So here's an open text diffusion model! >1000 tokens/sec for accelerated tokenmaxxing, yay!🫨

59m · 1.6K Views · 32 Likes · 6 Bookmarks
Google Gemma@googlegemma

Meet DiffusionGemma!

An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license.

Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

1h · 110.9K Views · 1.9K Likes · 688 Bookmarks
Patrick Loeber@patloeber

Meet DiffusionGemma: 4x faster text generation, released under Apache 2.0 license🔥

https://huggingface.co/google/diffusiongemma-26B-A4B-it

1h · 1K Views · 32 Likes · 4 Bookmarks
Brendan O'Donoghue@bodonoghue85

DiffusionGemma brings high intelligence and lightning fast ⚡️ inference to local developers (>1100 tok/s on a single H100)!

I'm excited to see what people will do with this model - and what improvements people can build on top (better samplers maybe??).

So unbelievably proud of the hard work the team put in to get this out!🪐🪐🪐

1h · 665 Views · 23 Likes · 2 Bookmarks