7h ago

Ted Zadouri says FlashAttention-4 redesigns attention kernels as NVIDIA Blackwell bottlenecks shift from tensor cores to softmax and memory

Software must be redesigned to make full use of Blackwell's memory bandwidth
