Ziming Mao open-sources mKernel, fusing GPU computation with NVLink and RDMA communication in a single persistent kernel

The system uses GPU-initiated communication, both within and across nodes, to reduce coordination overhead in multi-node, multi-GPU workloads.

Original post

🚀 Excited to release mKernel: a set of fast multi-node, multi-GPU fused kernels.

💻 Code: https://github.com/uccl-project/mKernel
📝 Blog: https://uccl-project.github.io/posts/mkernel/

mKernel fuses compute + communication into one persistent GPU kernel, covering both intra/inter-node with GPU-initiated communication.

Amazing team: @yangzhouy, Chon Lam Lao, Costin Raiciu, Scott Shenker, @istoica05

11:05 AM · May 25, 2026