157/365 of GPU Programming
Another FlashAttention 4 resource that's been really helpful for me is the talk @charles_irl gave last year on GPU Mode about FA4's code and the evolution to FA4. It's basically the lecture version of the "We reverse-engineered Flash Attention 4" blog post, which is awesome as well.
Really cool how the Modal team broke down the code before the paper's release and made educated inferences about the forward pass.
Wish more people did deeper code dissections like this!
156/365 of GPU Programming
Giving FlashAttention 4 a read today and trying to get a sense of the evolution of FlashAttention in its forward and backward passes over the four generations.
I've seen @tedzadouri's GPU Mode talk mentioned quite a few times recently and have to echo that it gives such a good view into the thought process behind FA4 and the steps to get there. @marksaroufim also does a great job interleaving the talk with pointed questions that help uninitiated learners like me get a better grasp of the concepts.
Also want to highlight @drisspg's talk on FlexAttention, which had the animation/visualization of the softmax/MMA pipelining in FA4.


