Thanks to @modal for compute support for the course, and to the amazing course staff for making it happen. Finally, this work is made possible by the open-source TIRx compiler effort led by @bohanhou1998 and many other collaborators.
We taught a brand-new mini-series this year at @SCSatCMU on Modern GPU Programming for ML Systems, as part of the ML Systems course, touching on fun questions like what data layout swizzling is, how to use 3D TMA, and what state-of-the-art Blackwell programming looks like. We released a curated online book based on the materials. Check it out: https://mlc.ai/modern-gpu-programming-for-mlsys/