NVIDIA pretrains Nemotron 3 models in 4-bit NVFP4

NVIDIA has pretrained its Nemotron 3 Super and Nemotron 3 Ultra models in 4-bit NVFP4 precision, according to vice president Bryan Catanzaro. Nemotron 3 Super has 120 billion parameters and was pretrained on 25 trillion tokens. Nemotron 3 Ultra has roughly 500 billion parameters and was also pretrained in NVFP4; the post does not state its token count. Catanzaro framed the reduced-precision pretraining as part of a broader effort to rethink every aspect of the AI stack for efficiency.

Original post

We've gone even farther: Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra is ~500B and also pretrained in NVFP4. Accelerated computing means we rethink every aspect of the AI stack looking for new opportunities to improve efficiency.

2:01 PM · May 15, 2026

9:01 PM · May 15, 2026 · 112.6K Views

@ctnzr @max_paperclips This is great work. Is it possible to give the wall-clock time for 4-bit vs. 8-bit training?

4:37 AM · May 16, 2026 · 263 Views

@ctnzr @max_paperclips Now give us a 10T parameter frontier model we can run on a small cluster at home, Bryan! The world is counting on you!

11:26 AM · May 16, 2026 · 295 Views