NVIDIA releases Nemotron 3 Ultra under the OpenMDW 1.1 license, claiming 5x faster inference and up to 30% lower cost
The model achieves roughly 300 tokens per second.
NEW: NVIDIA ships a 550B MoE open model for long-running agents. Exciting to see more open models that can support local, long-running coding agents.
Today we're shipping Nemotron 3 Ultra. A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.
Open weights and open *data*. Thank you for everything.
Nemotron 3 Ultra is NVIDIA's best model yet and comes with a really great tech report. The report focuses mainly on the NVFP4 recipe, and a ton of detailed work went into their Multi-teacher On-Policy Distillation (MOPD) pipeline. A thread of my notes.
