MSA paper is out
https://github.com/MiniMax-AI/MSA/blob/main/docs/MiniMaxSparseAttention.pdf

@aurko79 Do you think Routing Transformer can come back if we codesign it? Or are parts of it in MSA already?
Nice to see MiniMax and others pushing content-based sparse attention forward. Back in 2020 the field was exploding with several efficient attention variants:
- Local/sliding window attention
- Fixed sparsity patterns
- Strided attention
- Recurrence (T-XL, compressive transformers)
- Chunked attention
- Linear attention and SSMs
The right prior, which has stood the test of time, is a combination of: a) local attention for modeling local context, and b) top-k content-based sparsity to route each query to its most relevant keys (or blocks of keys).
What has changed in the last 6 years is better hardware-algorithm codesign.
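
To make the local + top-k combination concrete, here is a minimal NumPy sketch. It is an illustration only, not MSA's kernel or the Routing Transformer's clustering: the function names, the mean-pooled block summaries used as the router, and all parameters (`block_size`, `top_k`, `window`) are assumptions made for this example. It also builds a dense T x T mask for readability, which a real hardware-aware kernel would avoid.

```python
import numpy as np

def local_topk_block_mask(q, k, block_size, top_k, window):
    """Boolean (T, T) mask: entry [i, j] is True if query i may attend to key j.

    Hypothetical helper; the mean-pooled block key used as the routing
    signal is an assumption of this sketch.
    """
    T, d = q.shape
    assert T % block_size == 0, "sketch assumes T is a multiple of block_size"
    assert window >= 1, "each query must at least see itself"
    n_blocks = T // block_size
    pos = np.arange(T)
    causal = pos[None, :] <= pos[:, None]

    # (a) Local attention: the last `window` positions, including the query itself.
    local = causal & (pos[:, None] - pos[None, :] < window)

    # (b) Content-based routing: score each key block by its mean key vector.
    block_keys = k.reshape(n_blocks, block_size, d).mean(axis=1)
    scores = q @ block_keys.T                                  # (T, n_blocks)

    # Only blocks that lie entirely at or before the query are routable,
    # so the block summary never leaks future keys.
    block_end = (np.arange(n_blocks) + 1) * block_size - 1
    visible = block_end[None, :] <= pos[:, None]
    scores = np.where(visible, scores, -np.inf)

    kk = min(top_k, n_blocks)
    top_idx = np.argsort(-scores, axis=1)[:, :kk]              # best blocks per query
    selected = np.zeros((T, n_blocks), dtype=bool)
    np.put_along_axis(selected, top_idx, True, axis=1)
    selected &= visible                                        # drop padding picks

    routed = np.repeat(selected, block_size, axis=1)           # expand blocks to tokens
    return local | (routed & causal)

def sparse_attention(q, k, v, block_size=4, top_k=2, window=4):
    """Single-head causal attention restricted to the local + routed mask."""
    mask = local_topk_block_mask(q, k, block_size, top_k, window)
    logits = (q @ k.T) / np.sqrt(q.shape[1])
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)        # stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, mask

rng = np.random.default_rng(0)
T, d = 16, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out, mask = sparse_attention(q, k, v, block_size=4, top_k=2, window=4)
```

Each query here attends to at most `window + top_k * block_size` keys regardless of sequence length. Setting `window` to the full sequence length recovers ordinary dense causal attention, which is a convenient sanity check.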
Hey everyone, our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting!
GitHub: https://github.com/MiniMax-AI/MSA
Paper: https://github.com/MiniMax-AI/MSA/blob/main/docs/MiniMaxSparseAttention.pdf

@_arohan_ Yeah, IO-aware clustering potentially