
HyperParallel-MoE Speeds Up MoE Training On Ascend A3 Hardware



HyperParallel-MoE is an Ascend-specific scheduling system for MoE training. Ascend A3 exposes separate AIC matrix units and AIV vector/communication units, but standard MoE execution still runs Dispatch, GMM, SwiGLU, and Combine as serialized full-device kernels. The result is alternating idle hardware instead of tile-level overlap.

HyperParallel-MoE compiles MoE-FFN into a static heterogeneous taskflow:
- AIC handles GMM tiles
- AIV handles vector + communication tiles
- event counters enforce dependencies
- AIV-driven one-sided communication removes host-side collective barriers
- one kernel drives the combined taskflow

Results on DeepSeek-style MoE models:
- 1.49–1.58× lower Dispatch-to-Combine MoE-FFN latency under balanced routing
- 1.08–1.09× end-to-end training speedup under sampled natural routing
- integrated into MindSpore / MindFormers while reusing optimized operators

Paper: https://arxiv.org/abs/2605.23764
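
To make the scheduling idea concrete, here is a toy Python simulation of the contrast described above: serialized full-device stages versus a static two-queue taskflow gated by per-tile event counters. This is only a sketch under loud assumptions, not the paper's implementation. The stage durations, the one-AIC/one-AIV model, the queue ordering, and the names `STAGES`, `build_queues`, `serialized_makespan`, and `overlapped_makespan` are all hypothetical, and the numbers it produces say nothing about the measured speedups.

```python
from collections import deque

# Hypothetical stage table: (name, unit, duration per tile in arbitrary time units).
# The durations are made up for illustration only.
STAGES = [
    ("Dispatch", "AIV", 2),
    ("GMM", "AIC", 4),
    ("SwiGLU", "AIV", 1),
    ("Combine", "AIV", 2),
]


def serialized_makespan(num_tiles):
    """Each stage runs as a full-device kernel over all tiles before the next starts."""
    return sum(duration for _, _, duration in STAGES) * num_tiles


def build_queues(num_tiles):
    """Build one static task queue per unit; tasks are (tile, stage_index) pairs."""
    aic = deque((tile, 1) for tile in range(num_tiles))  # GMM tiles, in tile order
    aiv = deque([(0, 0)])  # start with Dispatch of tile 0
    for tile in range(num_tiles):
        if tile + 1 < num_tiles:
            aiv.append((tile + 1, 0))  # prefetch: Dispatch of the next tile
        aiv.append((tile, 2))  # SwiGLU of the current tile
        aiv.append((tile, 3))  # Combine of the current tile
    return {"AIC": aic, "AIV": aiv}


def overlapped_makespan(num_tiles):
    """Run both queues concurrently; per-tile event counters gate each task."""
    queues = build_queues(num_tiles)
    unit_free = {"AIC": 0, "AIV": 0}
    # finished[tile] holds finish times of completed stages;
    # its length plays the role of that tile's event counter.
    finished = [[] for _ in range(num_tiles)]
    while any(queues.values()):
        progressed = False
        for unit, queue in queues.items():
            if not queue:
                continue
            tile, stage = queue[0]
            if len(finished[tile]) < stage:
                continue  # counter too low: the previous stage has not completed
            ready = finished[tile][stage - 1] if stage else 0
            start = max(unit_free[unit], ready)
            end = start + STAGES[stage][2]
            finished[tile].append(end)  # increments the tile's event counter
            unit_free[unit] = end
            queue.popleft()
            progressed = True
        if not progressed:
            raise RuntimeError("static task order deadlocked")
    return max(unit_free.values())


if __name__ == "__main__":
    tiles = 4
    print("serialized:", serialized_makespan(tiles))
    print("overlapped:", overlapped_makespan(tiles))
```

In this toy model the AIC queue can work on one tile's GMM while the AIV queue dispatches the next tile or finishes the previous one, so the overlapped schedule finishes earlier than the serialized one whenever there is more than one tile. With a single tile there is nothing to overlap and the two makespans coincide.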

8:33 PM · May 24, 2026