š New blog: SGLang and Miles Add Day-0 Support for NVIDIA Nemotron 3 Ultra for Long-Running Autonomous Agents
The hard part of agentic workloads isn't one big answer, it's sustaining reasoning across hundreds of steps. Here's how we deliver:
ā Hybrid Mamba-Transformer MoE: Mamba keeps long context cheap, Transformer layers preserve exact recall when agents retrieve facts ā One NVFP4 checkpoint, two GPU generations: same weights run on Hopper & Blackwell, no requant ā MTP cuts multi-turn latency by predicting multiple tokens per pass ā Miles RL: verified GRPO pipeline on 128 H200s in colocate mode. DP attention breaks the n_groups=8 TP cap to unlock large-scale EP for a Mamba-hybrid MoE
The full write-up covers the serving setup, the GRPO pipeline, on-policy verification, and reproducible Docker + scripts.
