NVIDIA details Nemotron 3 Ultra 550B pre-training using NVFP4, LatentMoE, and 20 trillion tokens

NVFP4 training achieved a 0.4% relative loss gap vs BF16.
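The post doesn't define "relative loss gap". A common reading is the low-precision run's loss minus the BF16 baseline loss, divided by the baseline loss. The minimal sketch below assumes that definition. The function name `relative_loss_gap` and both loss values are hypothetical; they only show how a 0.4% figure could arise and are not NVIDIA's numbers.

```python
def relative_loss_gap(loss_low_precision: float, loss_baseline: float) -> float:
    """Relative gap of a low-precision run's loss vs. a baseline run (e.g. BF16).

    Assumed definition: (L_low - L_base) / L_base.
    """
    if loss_baseline <= 0:
        raise ValueError("baseline loss must be positive")
    return (loss_low_precision - loss_baseline) / loss_baseline


# Hypothetical loss values, chosen only to illustrate the arithmetic.
bf16_loss = 1.500
nvfp4_loss = 1.506
gap = relative_loss_gap(nvfp4_loss, bf16_loss)
print(f"relative loss gap: {gap:.1%}")  # prints "relative loss gap: 0.4%"
```

Under this definition, a 0.4% gap means the NVFP4 run's loss sits 0.4% above the BF16 run's loss at the compared point, whatever the absolute loss values are.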

Oh wow, they pre-trained Nemotron 3 Ultra in NVFP4. Big update for estimating future model sizes and FLOPs, especially for OpenAI models.

7:21 AM · Jun 4, 2026