There are not many 384 clusters. But they'd also fit with DeepSeek's MOPD agenda, if integrated. You can parallelize RL of teacher clones, and eventually merge them into the main branch. I think we'll see this more with 950 servers. V5 report will be a joyful read.
Huawei-led team completes full-parameter post-training of DeepSeek's 1.6-trillion-parameter model using 1,000 Ascend 910C chips
The cluster achieved 30% Model FLOPs Utilization.
Users in the replies accused the US of funding Chinese AI research via a Huawei-led team's post-training of DeepSeek on Ascend hardware.
@teortaxesTex I don't really see the direct need for MOPD here. You can parallelise the inference across clusters, as Cursor and we did before. Training is the main difference, but for RL it's unlikely you would reach a point of needing this. It also seems you could do this without MOPD.
@Grad62304977 Inference is happening on multiple clusters by default. I mean the question of how DeepSeek's training playbook, as described in the V4 paper, can fit with this domestic system/method. RL on a small separate server would be an adequate way to produce a marginal teacher.
@teortaxesTex Yeah, fair. It's just that you could also use these small separate servers for inference (the main workload) and only have to centralise the training workload (technically you don't need to do that either, but still).

@tomshardware #AmericaPaysForChinasResearch.