
NVIDIA releases Nemotron 3 Ultra, a 550B parameter open-weight hybrid Mamba2-Transformer MoE model for agentic workloads

AI Judge changed the title after evaluation. Original title: "NVIDIA releases Nemotron 3 Ultra, a 550-billion parameter open-weight hybrid Mamba2-Transformer MoE model"

Story Overview

NVIDIA has released Nemotron 3 Ultra, a 550B-parameter open-weight model with only 55B parameters active per token, which combines Mamba2 layers and Transformer attention layers within a Mixture-of-Experts (MoE) architecture. The release targets long-running agent workflows in coding, research, and enterprise settings, supports contexts of up to 1M tokens, and can be deployed on-premise, in the cloud, or at the edge. The weights and training details are available now on Hugging Face under the OpenMDW 1.1 license.
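As a rough illustration of why only 55B of the 550B parameters (10%) are active per token, here is a minimal top-k Mixture-of-Experts routing sketch in Python. It assumes a simple linear router and softmax gating renormalized over the selected experts. The expert count, `top_k`, and the names `moe_layer`, `router_weights`, and `experts` are hypothetical and do not describe Nemotron 3 Ultra's actual configuration, which this story does not give.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(
    token: Vector,
    router_weights: Sequence[Vector],  # hypothetical router: one row per expert
    experts: Sequence[Callable[[Vector], Vector]],
    top_k: int = 2,
) -> Tuple[Vector, List[int]]:
    # Router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k experts; the others are never evaluated for this token,
    # which is why "active" parameters are far fewer than total parameters.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


# Toy usage: 8 hypothetical experts, expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * x for x in v] for i in range(8)]
router = [[float(i), 0.0] for i in range(8)]  # expert i gets logit i for token [1, 0]
out, chosen = moe_layer([1.0, 0.0], router, experts, top_k=2)
print(chosen)  # [7, 6]
print(out)     # approximately [7.731, 0.0]
```

Only the chosen experts run, so compute per token tracks the active parameters rather than the total. In a real model the active share also includes layers every token passes through (attention, Mamba, embeddings), so it is not simply `top_k / num_experts`.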


Original post

Bryan Catanzaro @ctnzr · #652 in Tech

During the past 6 months, Nemotron has grown from 24 to 48 on the AAI, and we're just getting started.

Bryan Catanzaro @ctnzr

NVIDIA Nemotron 3 Ultra is now live!

Frontier accuracy, 5X greater speed, 30% lower cost.

Deploy however you need - on-premise, on the cloud, or at the edge.

Model is live on HuggingFace under the OpenMDW 1.1 license.

https://www.youtube.com/watch?v=D8LIIvQVGS4

5:42 AM · Jun 4, 2026 · 2.5K Views

Speed and cost numbers rest on NVIDIA's own agent benchmarks

The company claims up to 5x higher inference throughput and 30 percent lower cost per task than other open frontier models, supported by charts comparing the model on SWE-Bench, Terminal-Bench, and similar suites. Independent confirmation is still lacking, so the practical gains for any specific workload remain to be measured by users running the model themselves.

Linear scaling from Mamba layers could matter most for extended agent runs

Replacing most attention layers with Mamba layers is presented as the way to handle million-token contexts without the quadratic cost growth of attention, which fits the emphasis on long-running agents rather than short chat turns. Whether that architectural choice preserves accuracy across diverse tasks is one of the open questions that the open weights now let others test directly.
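The scaling argument can be made concrete with a toy comparison. The sketch below assumes a fixed-decay scalar recurrence as a stand-in for a state-space layer (real Mamba2 layers use input-dependent parameters and many channels) and counts query-key scores for causal attention. `linear_recurrence`, `causal_attention_scores`, and `recurrence_steps` are hypothetical helpers that ignore hidden size and constant factors.

```python
from typing import List, Sequence


def linear_recurrence(xs: Sequence[float], decay: float = 0.9) -> List[float]:
    # Toy scalar recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.
    # One pass, constant-size state: work grows linearly with len(xs).
    # (Mamba-style layers make these parameters input-dependent; this toy keeps them fixed.)
    h = 0.0
    ys: List[float] = []
    for x in xs:
        h = decay * h + (1.0 - decay) * x
        ys.append(h)
    return ys


def causal_attention_scores(n: int) -> int:
    # Position t attends to positions 0..t, giving n(n+1)/2 query-key scores.
    return n * (n + 1) // 2


def recurrence_steps(n: int) -> int:
    # One state update per token.
    return n


for n in (8_192, 131_072, 1_000_000):
    ratio = causal_attention_scores(n) / recurrence_steps(n)
    print(f"{n:>9} tokens: attention/recurrence work ratio = {ratio:,.1f}")
```

The ratio works out to (n + 1)/2, so at 1M tokens attention computes roughly 500,000 times more pairwise scores than the recurrence has steps. The recurrence also carries a constant-size state, whereas attention's key-value cache grows with context length; in a hybrid model the remaining attention layers still scale quadratically, which is why the mix of layer types matters.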

Sentiment

Positive commenters praise NVIDIA's Nemotron 3 Ultra hybrid MoE release for its scale, speed, and open availability, while negative commenters dismiss the model as poor quality or impractical because of its size and performance issues.

Positive: 79.6% · Negative: 20.4% (249 comments with sentiment)

