NVIDIA pretrains Nemotron 3 models in 4-bit NVFP4
NVIDIA has pretrained its Nemotron 3 Super and Nemotron 3 Ultra models entirely in 4-bit NVFP4 precision. Nemotron 3 Super has 120 billion parameters and was trained on 25 trillion tokens; Nemotron 3 Ultra reaches roughly 500 billion parameters on the same token volume. NVIDIA vice president Bryan Catanzaro described the fully reduced-precision pretraining runs as part of a broader effort to raise training efficiency at scale.
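NVFP4 is a 4-bit floating-point format (E2M1 values) combined with fine-grained block scaling. The following minimal numpy sketch illustrates the basic idea of quantizing a small block of weights to an FP4 grid with a per-block scale; the function name, the 16-element block size, and the use of plain floats for the scales are simplifying assumptions for illustration, not NVIDIA's implementation (real NVFP4 stores the block scale in FP8 plus an additional per-tensor scale).

import numpy as np

# Representable magnitudes of an E2M1 (4-bit) float: sign + {0, 0.5, 1, 1.5, 2, 3, 4, 6}
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Illustrative sketch: quantize one block of values to NVFP4-like numbers.

    Simplification: the per-block scale is kept as a plain Python float instead
    of an FP8 (E4M3) value, and the per-tensor FP32 scale is omitted.
    """
    # Scale the block so its largest magnitude maps to the top FP4 value (6).
    amax = np.max(np.abs(block))
    scale = amax / 6.0 if amax > 0 else 1.0
    scaled = block / scale
    # Snap each scaled value to the nearest representable FP4 magnitude, keeping the sign.
    idx = np.argmin(np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]), axis=1)
    quantized = np.sign(scaled) * FP4_GRID[idx]
    return quantized, scale

# Example: quantize a random 16-element weight block and check the round-trip error.
block = np.random.randn(16).astype(np.float32)
q, s = quantize_nvfp4_block(block)
print("max abs error:", np.max(np.abs(block - q * s)))

The small block size matters because one shared scale per block limits how much a single outlier can degrade the precision of its neighbors, which is part of what makes 4-bit training feasible at all.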
We've gone even farther: Nemotron 3 Super is 120B and pretrained on 25T tokens in NVFP4. Nemotron 3 Ultra is ~500B and also pretrained in NVFP4.
Accelerated computing means we rethink every aspect of the AI stack looking for new opportunities to improve efficiency.
@ctnzr @max_paperclips This is great work. Is it possible to give wall clock time for 4-bit vs 8-bit training?
@ctnzr @max_paperclips Now give us a 10T parameter frontier model we can run on a small cluster at home, Bryan! The world is counting on you!