2h ago

AI2's Nathan Lambert says Nvidia's multi-teacher distillation pipeline for Nemotron 3 Ultra represents the new post-training industry standard


The pipeline utilizes over 10 specialized teacher models

Original post

Nathan Lambert (@natolambert)

Nvidia joined the multi-teacher, on-policy distillation (MODP) gang! Is industry standard post-training right now. The multi-teacher SFT to RL that Microsoft did in their first model was the standard established by DeepSeek R1. I expect MAI 2 to be MODP.

6:36 AM · Jun 4, 2026

Sentiment

Positive: 50%

Negative: 50%

Some users call Nvidia's multi-teacher distillation for Nemotron post-training a great time for the tech while others dismiss it for having weaker benchmarks than in 2021.

2 comments with sentiment.

6 more posts

finbarr (@finbarrtimbers) · 1h · Quote tweet
The Nemotron 3 Ultra post-training pipeline is verrrry impressive.
wh (@nrehiew_) · 49m · Reply
Think this table is interesting to see what domains does the student outperform the teacher. The merged model outperforms the specialized RLVR model on agentic and instruction following benches. On TBench, the student significantly outperforms the teacher which is interesting. For reference, the second table is a similar figure from Mimo-v2-flash. Interesting to compare relative performance in ~similar domains