Huawei-led team completes full-parameter post-training of DeepSeek's 1.6-trillion-parameter model using 1,000 Ascend 910C chips

The cluster achieved 30% Model FLOPs Utilization.
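MFU is the ratio of the FLOPs a training run actually spends on the model each second to the cluster's theoretical peak FLOPs per second. The sketch below shows that arithmetic. It assumes the widely used "6 × N FLOPs per token" approximation for training, counting the forward and backward passes and ignoring attention FLOPs. For a mixture-of-experts model, N is the number of active parameters per token. The active-parameter count and per-chip peak used below are hypothetical placeholders, not figures from the Huawei/DeepSeek work.

```python
# Hypothetical sketch: estimating Model FLOPs Utilization (MFU).
# Assumes the common "6 * N FLOPs per token" training approximation
# (forward + backward), ignoring attention FLOPs. For a mixture-of-experts
# model, N is the number of *active* parameters per token, not the total.


def model_flops_utilization(active_params: float, tokens_per_second: float,
                            num_chips: int, peak_flops_per_chip: float) -> float:
    """Return achieved model FLOP/s as a fraction of the cluster's peak FLOP/s."""
    achieved_flops = 6 * active_params * tokens_per_second
    peak_flops = num_chips * peak_flops_per_chip
    return achieved_flops / peak_flops


def tokens_per_second_for_mfu(mfu: float, active_params: float,
                              num_chips: int, peak_flops_per_chip: float) -> float:
    """Invert the formula: the training throughput implied by a given MFU."""
    return mfu * num_chips * peak_flops_per_chip / (6 * active_params)


if __name__ == "__main__":
    # All inputs are illustrative placeholders, not reported figures.
    ACTIVE_PARAMS = 50e9   # hypothetical active parameters per token
    PEAK_FLOPS = 800e12    # hypothetical per-chip peak, in FLOP/s
    CHIPS = 1000           # roughly the chip count given in the headline
    tps = tokens_per_second_for_mfu(0.30, ACTIVE_PARAMS, CHIPS, PEAK_FLOPS)
    print(f"{tps:,.0f} tokens/s")
    mfu = model_flops_utilization(ACTIVE_PARAMS, tps, CHIPS, PEAK_FLOPS)
    print(f"MFU = {mfu:.2f}")
```

Under these assumed inputs, 30% MFU corresponds to a specific cluster-wide token throughput. Change the placeholders and the implied throughput scales linearly with peak FLOPs and chip count, and inversely with active parameters.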

Original post

There are not many 384 clusters. But they'd also fit with DeepSeek's MOPD agenda, if integrated. You can parallelize RL of teacher clones, and eventually merge them into the main branch. I think we'll see this more with 950 servers. V5 report will be a joyful read.

6:13 PM · Jun 6, 2026 · 2.3K Views
Sentiment

The one reply with measurable sentiment accused the US of funding Chinese AI research, reacting to the Huawei-led team's post-training of DeepSeek on Ascend hardware.

Pos: 0.0% · Neg: 100.0%
1 comment with sentiment.
Cluster Engagement
Posts from X
Grad @Grad62304977

@teortaxesTex i don't really see the direct need for MOPD here. U can parallelise the inference across clusters, as Cursor and we did before. Training is the main difference, but for RL it's unlikely u would reach a point of needing this. But also seems u can do this too without MOPD.

2h · Views 314 · Likes 6 · Bookmarks 0
Replies: 1

@Grad62304977 inference is happening on multiple clusters by default. I mean the question of how DeepSeek's training playbook, as described in the V4 paper, can fit with this domestic system/method. RL on a small separate server would be an adequate way to produce a marginal teacher.

2h · Views 164 · Likes 2 · Bookmarks 0
Grad @Grad62304977

@teortaxesTex ya fair, just that u could also use these small separate servers for inference (the main workload) and u just have to centralise the training workload (technically don't need to either, but still)

2h · Views 62 · Likes 4 · Bookmarks 0
Bernard Gress @bernard_gress

@tomshardware #AmericaPaysForChinasResearch.

12h · Views 33