1d ago

Nvidia releases Nemotron proxy and diffusion models

Nvidia released two Nemotron model families on Hugging Face. The first includes 62M- and 350M-parameter decoder-only proxy models trained on 10 trillion tokens for scaling-law research. The second comprises 3B, 8B, and 14B tri-mode diffusion models that switch between autoregressive, diffusion, and self-speculation modes solely by changing the attention mask.

Original post

NVIDIA just released Nemotron CLIMB Proxy Models on Hugging Face. Small decoder-only models (62M & 350M params) trained on 10T tokens for scaling law research, enabling prediction of larger model behavior without full-scale compute.

4:24 PM · May 18, 2026