🚀 Excited to release mKernel: a set of fast multi-node, multi-GPU fused kernels.
💻 Code: https://github.com/uccl-project/mKernel
📝 Blog: https://uccl-project.github.io/posts/mkernel/
mKernel fuses compute and communication into a single persistent GPU kernel, covering both intra-node and inter-node cases with GPU-initiated communication.
Amazing team: @yangzhouy, Chon Lam Lao, Costin Raiciu, Scott Shenker, @istoica05