Ted Zadouri says FlashAttention-4 redesigns attention kernels as NVIDIA Blackwell bottlenecks shift from tensor cores to softmax and memory · Digg