7h ago
Ted Zadouri says FlashAttention-4 redesigns attention kernels as NVIDIA Blackwell bottlenecks shift from tensor cores to softmax and memory
Software redesigns are required to optimize Blackwell memory bandwidth