HyperParallel-MoE is an Ascend-specific scheduling system for MoE training.
Ascend A3 exposes separate AIC matrix units and AIV vector/communication units, but standard MoE execution still runs Dispatch, GMM, SwiGLU, and Combine as serialized full-device kernels. The AIC and AIV units therefore take turns idling instead of overlapping work at tile granularity.
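To make the idle-hardware point concrete, here is a minimal sketch that estimates per-unit utilization when the four stages run back to back. The stage durations are hypothetical placeholders in arbitrary time units, not measurements, and the model assumes each stage keeps only one unit type busy (GMM on AIC, everything else on AIV).

```python
# Hypothetical stage durations (arbitrary units, NOT measured values).
# Assumption: each stage occupies exactly one unit type while it runs.
STAGES = [
    ("Dispatch", "AIV", 3.0),
    ("GMM", "AIC", 4.0),
    ("SwiGLU", "AIV", 1.0),
    ("Combine", "AIV", 3.0),
]


def serialized_utilization(stages):
    """Return (total_time, {unit: busy_fraction}) for back-to-back stages."""
    total = sum(duration for _, _, duration in stages)
    busy = {}
    for _, unit, duration in stages:
        busy[unit] = busy.get(unit, 0.0) + duration
    return total, {unit: b / total for unit, b in busy.items()}


total, util = serialized_utilization(STAGES)
for unit, fraction in sorted(util.items()):
    print(f"{unit}: busy {fraction:.0%} of {total} time units")
```

Under these assumptions the busy fractions sum to 1: at any moment exactly one unit type is working and the other is idle, which is the pattern tile-level overlap is meant to remove.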
HyperParallel-MoE compiles MoE-FFN into a static heterogeneous taskflow:
- AIC handles GMM tiles
- AIV handles vector + communication tiles
- event counters enforce dependencies
- AIV-driven one-sided communication removes host-side collective barriers
- one kernel drives the combined taskflow
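The list above can be illustrated with a toy host-side simulation. This is a sketch under stated assumptions, not the HyperParallel-MoE implementation: the `Task` type, the counter names `dispatch_done` and `gmm_done`, the queue order, and all durations are hypothetical. Each unit type runs a fixed in-order queue, a task starts only once the counters it waits on have reached their thresholds, and one-sided communication is not modeled beyond treating Dispatch and Combine as AIV tasks.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    duration: float
    waits: list = field(default_factory=list)  # [(counter, threshold)]
    signals: str = ""                          # counter bumped on completion


def build_taskflow(num_tiles, t_dispatch=3.0, t_gmm=4.0,
                   t_swiglu=1.0, t_combine=3.0):
    """Static per-unit queues; durations are hypothetical placeholders."""
    aic, aiv = [], []

    def tail(i):
        # SwiGLU for tile i needs the (i+1)-th GMM completion.
        aiv.append(Task(f"swiglu{i}", t_swiglu, [("gmm_done", i + 1)]))
        aiv.append(Task(f"combine{i}", t_combine))

    for i in range(num_tiles):
        aiv.append(Task(f"dispatch{i}", t_dispatch, signals="dispatch_done"))
        aic.append(Task(f"gmm{i}", t_gmm,
                        [("dispatch_done", i + 1)], "gmm_done"))
        if i >= 1:
            tail(i - 1)  # software-pipelined: finish the previous tile
    tail(num_tiles - 1)
    return {"AIC": aic, "AIV": aiv}


def simulate(queues):
    """Return the makespan of in-order queues gated by event counters."""
    # Each counter has a single in-order producer here, so the recorded
    # increment times are non-decreasing and times[k - 1] is the moment
    # the counter reaches k.
    reached = defaultdict(list)
    free_at = {unit: 0.0 for unit in queues}
    head = {unit: 0 for unit in queues}
    progress = True
    while progress:
        progress = False
        for unit, queue in queues.items():
            while head[unit] < len(queue):
                task = queue[head[unit]]
                if any(len(reached[c]) < k for c, k in task.waits):
                    break  # dependency not yet scheduled; try other unit
                ready = max((reached[c][k - 1] for c, k in task.waits),
                            default=0.0)
                start = max(free_at[unit], ready)
                free_at[unit] = start + task.duration
                if task.signals:
                    reached[task.signals].append(free_at[unit])
                head[unit] += 1
                progress = True
    if any(head[u] < len(q) for u, q in queues.items()):
        raise RuntimeError("taskflow deadlocked")
    return max(free_at.values())


queues = build_taskflow(num_tiles=4)
overlapped = simulate(queues)
# Serialized baseline: every stage runs alone, one after another.
serialized = sum(t.duration for q in queues.values() for t in q)
print(f"serialized={serialized}, overlapped={overlapped}")
```

The overlapped makespan can never drop below the total work of the busier unit type, so in this toy model the benefit of overlap depends on how evenly the work splits between AIC and AIV. The numbers it prints say nothing about real Ascend hardware.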
Results on DeepSeek-style MoE models:
- Dispatch-to-Combine MoE-FFN latency reduced by 1.49–1.58× under balanced routing
- 1.08–1.09× end-to-end training speedup under sampled natural routing
- integrated into MindSpore / MindFormers while reusing optimized operators
Paper: https://arxiv.org/abs/2605.23764