There is a backstory to why @NVIDIAAI has stuck to 10% throughout the Nemotron 3 series, including the new 550B Ultra model, while most of the industry chases MoE with 3-5% activation.
LatentMoE is that story. They argue effective MoEs be evaluated by two dimensions: accuracy per FLOP and accuracy per parameter. The race toward 3-5% activation implicitly optimizes only the first.
https://arxiv.org/abs/2601.18089v1