MiniMax open-sources a blockwise sparse attention kernel, showing that training from scratch matches full-attention benchmarks · Digg