Excited to release ParallelKernelBench (PKB), a benchmark for measuring LLMs’ ability to write fast multi-GPU kernels! 😀
Multi-GPU kernel generation compounds several hard problems:
- a large parallelism design space
- a new communication axis to optimize
- and hardware-specific decisions around communication mechanisms
Existing kernel-generation benchmarks mostly target single-GPU workloads, so we built PKB to cover real-world multi-GPU workloads (many of which do not have existing optimized solutions). 🧵👇