Nvidia releases Nemotron proxy and diffusion models
Nvidia released two Nemotron model families on Hugging Face. The first consists of 62M- and 350M-parameter decoder-only proxy models, each trained on 10 trillion tokens for scaling-law research. The second comprises 3B, 8B, and 14B tri-mode diffusion models that switch among autoregressive, diffusion, and self-speculation modes solely by changing the attention mask.
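The announcement does not specify which masks the tri-mode models use. The following is a minimal sketch built on an assumption: a standard causal mask for autoregressive mode, and a block-causal mask for diffusion mode. A block-causal mask allows bidirectional attention inside a block and causal attention across blocks, as in block-diffusion language models. It illustrates how one set of weights can run differently when only the query-key visibility pattern changes. The function names and block size are hypothetical, and self-speculation mode is not modeled because its mask is not described.

```python
def causal_mask(n):
    """Autoregressive mode: query position i may attend only to keys 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_causal_mask(n, block_size):
    """Assumed diffusion-mode mask (hypothetical): full attention within a
    block of `block_size` tokens, causal attention across blocks."""
    return [[(j // block_size) <= (i // block_size) for j in range(n)]
            for i in range(n)]


def show(mask):
    """Print a mask row per query: '1' = may attend, '.' = masked out."""
    for row in mask:
        print("".join("1" if allowed else "." for allowed in row))


print("autoregressive (causal):")
show(causal_mask(6))
print("diffusion (block-causal, block_size=3, assumed):")
show(block_causal_mask(6, 3))
```

With `block_size=1` the block-causal mask reduces exactly to the causal mask, and with `block_size=n` it becomes fully bidirectional. This shows how a single network could, in principle, move between decoding styles without any change to its parameters.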