/AI · 15h ago

Huawei-led team completes full-parameter post-training of DeepSeek's 1.6-trillion-parameter model using 1,000 Ascend 910C chips

The cluster achieved 30% Model FLOPs Utilization.
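For readers unfamiliar with the metric: Model FLOPs Utilization (MFU) is the ratio of the FLOP/s a training run actually spends on the model to the aggregate peak FLOP/s of the hardware. The sketch below is only an illustration of that ratio. Every number in it is a hypothetical placeholder chosen to land on 30%, not a figure from the post, not an Ascend 910C specification, and not DeepSeek's real active-parameter count. The 6 × active-parameters estimate for training FLOPs per token is a common rule of thumb that ignores attention cost.

```python
def model_flops_utilization(tokens_per_second, flops_per_token,
                            num_chips, peak_flops_per_chip):
    """Return MFU: achieved model FLOP/s divided by aggregate peak FLOP/s."""
    if num_chips <= 0 or peak_flops_per_chip <= 0:
        raise ValueError("cluster peak throughput must be positive")
    achieved = tokens_per_second * flops_per_token   # FLOP/s spent on the model
    peak = num_chips * peak_flops_per_chip           # theoretical cluster ceiling
    return achieved / peak


# Hypothetical placeholders, NOT measured or published values.
ACTIVE_PARAMS = 5e10                 # assumed active parameters per token (MoE)
FLOPS_PER_TOKEN = 6 * ACTIVE_PARAMS  # rule of thumb: forward + backward pass
TOKENS_PER_SECOND = 1e6              # assumed cluster-wide training throughput
NUM_CHIPS = 1000                     # chip count taken from the headline
PEAK_FLOPS_PER_CHIP = 1e15           # assumed per-chip peak, placeholder only

mfu = model_flops_utilization(TOKENS_PER_SECOND, FLOPS_PER_TOKEN,
                              NUM_CHIPS, PEAK_FLOPS_PER_CHIP)
print(f"MFU = {mfu:.0%}")
```

With real measurements, only the four inputs change; the ratio itself stays the same.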

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · #421 in AI

There are not many 384 clusters. But they'd also fit with DeepSeek's MOPD agenda, if integrated. You can parallelize RL of teacher clones, and eventually merge them into the main branch. I think we'll see this more with 950 servers. V5 report will be a joyful read.

6:13 PM · Jun 6, 2026 · 2.3K Views

Sentiment

Users in the replies accused the US of funding Chinese AI research via a Huawei-led team's post-training of DeepSeek on Ascend hardware.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 314 · Likes: 6

Grad@Grad62304977

@teortaxesTex I don't really see the direct need for MOPD here. You can parallelise the inference across clusters, as Cursor and we did before. Training is the main difference, but for RL it's unlikely you would reach a point of needing this. It also seems you can do this without MOPD.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

2h · 314 views · 6 likes · 0

Replies: 1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@Grad62304977 Inference is happening on multiple clusters by default. I mean the question of how DeepSeek's training playbook, described in the V4 paper, can fit with this domestic system/method. RL on a small separate server would be an adequate way to produce a marginal teacher.

Grad@Grad62304977

2h16420

Grad@Grad62304977

@teortaxesTex Yeah, fair. It's just that you could also use these small separate servers for inference (the main workload), and you just have to centralise the training workload (technically you don't need to do that either, but still).

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

2h6240

Bernard Gress@bernard_gress

@tomshardware #AmericaPaysForChinasResearch.

12h33