Great technical talk by @tedzadouri on FlashAttention-4: a deep look at how attention kernels are being redesigned for NVIDIA Blackwell, where the bottleneck shifts from tensor cores to softmax + memory movement.
Also featured: the "voice of god", aka @marksaroufim, asking questions in the background.
Link below!