we completely missed this. Might be the biggest news from GLM. They do say "the entire OPD training". Uh. @jietang could you… please… share more details? This basically means that OPD is negligibly cheap relative to pretraining and expert RL, right? And expert RL is parallel…
Incredible how Z.ai has literally open-sourced their RL infrastructure.
The entire OPD post-training of GLM-5.2 on this slime platform took ~2 days.
https://github.com/THUDM/slime