@Grad62304977 inference is happening on multiple clusters by default. I mean the question of how DeepSeek's training playbook described in the V4 paper can fit with this domestic system/method. RL on a small separate server would be an adequate way to produce a marginal teacher.
@teortaxesTex I don't really see the direct need for MOPD here. You can parallelise the inference across clusters, as Cursor and we did before. Training is where the main difference is, but for RL it's unlikely you would reach a point of needing this. It also seems you can do this without MOPD too.
