We taught a brand-new mini-series this year at @SCSatCMU on Modern GPU Programming for ML Systems, as part of the ML Systems course, touching on fun questions like what data layout swizzling is, how to use 3D TMA, and state-of-the-art Blackwell programming. We released a curated online book based on the materials: https://mlc.ai/modern-gpu-programming-for-mlsys/ check it out
CMU releases a free online book on GPU programming for machine learning systems, covering NVIDIA's Blackwell architecture
The curriculum covers attention, prefill, and fused MoE kernels
Users are praising CMU's online book on modern GPU programming for ML systems because it offers practical techniques and bridges the gap between ML engineering and low-level systems knowledge.

Thanks @modal for compute support for the course, and the amazing course staff for making it happen. Finally, the effort is made possible by the open source TIRx compiler effort led by @bohanhou1998 and many other collaborators.
@levidiamode @SCSatCMU We didn’t manage to have public recordings this time, hopefully the curated materials and interactive demos help :)

@tqchenml @SCSatCMU amazing work! any chance you're gonna publish the lecture recordings as well?

@tqchenml @Soul0Engineer @SCSatCMU This is quite incredible thanks @tqchenml

@tqchenml @SCSatCMU awesome

@tqchenml @SCSatCMU sounds awesome, love the focus on practical gpu techniques

@tqchenml @SCSatCMU Wow , awesome :)

@tqchenml @SCSatCMU Beautiful

@tqchenml @SCSatCMU Thanks @tqchenml Any complementary materials for this course?

The rare curriculum that teaches across the boundary. Most ML engineers treat the kernel as a black box, and most systems courses stop at the API. The engineers who can reason from swizzling up to the scheduler are exactly who you want when the workload shape shifts underneath you.

@tqchenml @SCSatCMU AWESOME!

@tqchenml @algo_diver @SCSatCMU Thanks a lot for this 🙏🏻

@tqchenml @SCSatCMU thanks Prof. Chen I was looking for something like this it is very helpful...

@tqchenml @SCSatCMU that's awesome, but i think we're still missing the most important part: why are gpu programmers getting more attention in the ml space? what does it say about the state of our industry that we're shifting focus from algo dev to compute optimization?

@tqchenml @SCSatCMU Thx Tianqi for sharing this. It is quite a coincidence that we are just working on a related issue: SymmGemm ( https://www.linkedin.com/posts/lei-wang-1722a28a_faster-symmul-with-thunderkittenspdf-share-7475402691364536322-TPX6/?utm_source=share&utm_medium=member_desktop&rcm=ACoAABLocGYBO0QGi8RFxdL6jUQf99aRtJxy15k ). We developed some unique techniques to utilize NoC cluster multicast and L2 cache affinity for this problem, and hope this is a useful example.

@tqchenml @SCSatCMU thanks boss

@tqchenml @levidiamode @SCSatCMU I wish I could attend your seminars and lab meetings remotely. There's so much info relevant to my daily job that will become easy to just naturally acquire by listening in on those conversations.

@tqchenml @SCSatCMU As a CMU student, what would prepare me the best for this type of GPU programming, ML Systems or DL systems?

@tqchenml @levidiamode @SCSatCMU whyyyy?)