Nvidia joined the multi-teacher, on-policy distillation (MODP) gang! Is industry standard post-training right now.
The multi-teacher SFT to RL that Microsoft did in their first model was the standard established by DeepSeek R1. I expect MAI 2 to be MODP.