
AI2's Nathan Lambert says Nvidia's Nemotron 3 Ultra adopts multi-teacher on-policy distillation, which he calls the current industry standard for post-training

The pipeline uses over 10 specialized teacher models.

Original post
Nathan Lambert (@natolambert) · #64 in AI

Nvidia joined the multi-teacher, on-policy distillation (MODP) gang! Is industry standard post-training right now.

The multi-teacher SFT to RL that Microsoft did in their first model was the standard established by DeepSeek R1. I expect MAI 2 to be MODP.

6:36 AM · Jun 4, 2026 · 14.3K Views
Posts from X
Views 3.6K · Bookmarks 25 · Likes 42 · Retweets 2 · Replies 5
finbarr@finbarrtimbers

The Nemotron 3 Ultra post-training pipeline is verrrry impressive.

1h · Views 3.6K · Likes 42 · Bookmarks 25