157/365 of GPU Programming
Another FlashAttention4 resource that's been really helpful for me is the talk @charles_irl gave last year on GPU Mode about FA4's code and the evolution that led to FA4. It's basically the lecture version of the "We reverse-engineered Flash Attention 4" blog post, which is awesome as well.
Really cool how the Modal team broke down the code before the paper release and made educated inferences about the forward pass.
Wish more people did deeper code dissections like this!


