Most language models generate only one token at a time.
We just released Nemotron-Labs-Diffusion, a family of diffusion language models that take a different approach: generating multiple tokens in parallel within a single model. Rather than committing to each token permanently, these models can revise tokens as they go. The result is faster inference that makes better use of modern GPUs.
The full model family ranges from 3B to 14B parameters and includes vision-language variants. Available now: https://nvda.ws/4tEnTxP