3/ DeepSeek V4 gets that for free. Its teachers are forks of one base, each built as base + domain SFT + RL, and the student is distilled from those same forks. Every model is a small perturbation of the same backbone, so student rollouts are already in-distribution for the teachers.
4/ Nvidia Ultra doesn't. Its teachers get their skills from SFT on data generated by external models (V4-Pro, gpt-oss, GLM) that the student never saw. That pushes each teacher's distribution away from the student's, so student rollouts can be out-of-distribution for the teacher.
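A toy sketch of the in-distribution argument, not taken from either lab's pipeline. It assumes a single next-token distribution over an 8-token vocabulary and models fine-tuning as a shift along one fixed direction in logit space. The names `base_logits`, `direction`, `shifted_teacher` and the shift sizes 0.1 and 3.0 are all hypothetical. It compares KL(student || teacher) for a teacher that stays near the shared base against one that has been pushed far from it.

```python
import math

def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) between two categorical distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits for the shared base (8-token toy vocabulary).
base_logits = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5]
# Hypothetical direction in logit space that fine-tuning moves a teacher along.
direction = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]

def shifted_teacher(shift):
    """Teacher = base moved `shift` units along `direction` (assumed toy model)."""
    return softmax([b + shift * d for b, d in zip(base_logits, direction)])

student = softmax(base_logits)        # student stays at the shared base
near_teacher = shifted_teacher(0.1)   # fork of the same base, small shift
far_teacher = shifted_teacher(3.0)    # pushed further away, e.g. by external SFT data

# KL(student || teacher): the teacher's extra surprise, beyond the student's own
# entropy, on tokens sampled from the student.
kl_near = kl(student, near_teacher)
kl_far = kl(student, far_teacher)
print(f"near teacher: {kl_near:.4f} nats, far teacher: {kl_far:.4f} nats")
```

In this toy setup the gap grows with the size of the shift, which is the sense in which student rollouts drift out-of-distribution for a teacher trained far from the shared base. Real models shift along many directions at once, so treat this as intuition only.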