@teortaxesTex @didier_lopes this is after training all of the expert variants and collapsing them back into the same base model? seems plausible to me that the signal is much denser in that phase
@didier_lopes > The entire OPD post-training of GLM-5.2 on this slime platform took ~2 days.
what