MiniMax details its M3 sparse attention architecture, claiming a 15.6x decoding speedup at 1 million tokens
The design also claims a 9.7x prefill speedup at the same context length, both measured against M2.
Supporters hail MiniMax M3's block-based sparse attention for delivering 15x+ decoding speedups at 1M tokens, while critics point to predatory licensing and shortfalls in prior models.
new minimax sparse attention compared to deepseek v3.2 (DSA) and v4 (CSA)
main changes:
- based on GQA, not MLA
- block-level selection like in CSA, but attention is done on the real KV, not in the compressed dimension
Something BIG is coming
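To make the comparison above more concrete, here is a minimal sketch of the two properties it names: grouped-query attention (several query heads share one KV head) and block-level selection that points back at the original token positions rather than a compressed representation. The function names, head counts, and block size are hypothetical and chosen only for illustration; nothing here is taken from MiniMax's code.

```python
def kv_head_for(query_head, num_query_heads, num_kv_heads):
    """GQA: consecutive query heads are grouped onto one shared KV head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def tokens_in_blocks(block_ids, block_size, seq_len):
    """Expand selected block ids into the token positions they cover."""
    positions = []
    for block in sorted(block_ids):
        start = block * block_size
        end = min(start + block_size, seq_len)
        positions.extend(range(start, end))
    return positions


# Hypothetical sizes, chosen only for illustration.
mapping = [kv_head_for(h, num_query_heads=8, num_kv_heads=2) for h in range(8)]
positions = tokens_in_blocks({3, 0}, block_size=4, seq_len=14)
print(mapping)    # [0, 0, 0, 0, 1, 1, 1, 1]
print(positions)  # [0, 1, 2, 3, 12, 13]
```

Under this reading, attention would then read the uncompressed K/V rows at `positions` for the KV head given by `mapping`, which is what "attention on the real KV" suggests.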
MiniMax just teased their Sparse Attention architecture for M3. The benchmarks show 9.7x prefilling speedup and 15.6x decoding speedup at 1M tokens vs M2.
MiniMax deliberately went back to full attention for M2 because efficient attention wasn't production-ready. Their pretrain lead wrote a whole blog post about it in March. Now they're showing a new two-stage approach: a lightweight index branch for block selection, then sparse attention only on the relevant KV blocks.
Really interesting. And tbh I'm always happy when open source receives new wins.
#MSA #OpenSource #M3 🫣😎
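The two-stage approach described in the post above can be sketched roughly as follows. This is a toy in pure Python, not MiniMax's implementation: the index branch is assumed to be a mean-pooled key per block (a stand-in for whatever learned scoring the real model uses), the same query vector is reused for both stages, and every name and size is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def block_index_keys(keys, block_size):
    """Assumed index branch: one mean-pooled key per block."""
    pooled = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        pooled.append([sum(col) / len(block) for col in zip(*block)])
    return pooled


def select_blocks(query, pooled_keys, top_k):
    """Stage 1: score each block cheaply and keep the top_k block ids."""
    scores = [dot(query, k) for k in pooled_keys]
    ranked = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])


def sparse_attention(query, keys, values, block_ids, block_size):
    """Stage 2: softmax attention over the tokens of the selected blocks only."""
    positions = [t for b in block_ids
                 for t in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[t]) * scale for t in positions])
    width = len(values[0])
    return [sum(w * values[t][d] for w, t in zip(weights, positions))
            for d in range(width)]


# Six tokens in three blocks of two; only the middle block matches the query.
keys = [[0, 1], [0, 1], [1, 0], [1, 0], [0, -1], [0, -1]]
values = [[0.0], [0.0], [5.0], [7.0], [0.0], [0.0]]
query = [1.0, 0.0]

chosen = select_blocks(query, block_index_keys(keys, block_size=2), top_k=1)
output = sparse_attention(query, keys, values, chosen, block_size=2)
print(chosen, output)
```

The point of the split is that stage 1 touches one vector per block, while stage 2 touches full keys and values only for the blocks that survived selection.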
That's the way. DeepSeek v4 is the gold standard with both KV cache compression *and* indexed attention. But while we can likely live with the memory cost, the computational cost of dense attention for local inference is too heavy at ~100k context already.
Interesting, so M3 will have yet another sparse attention design, this time with blocks. Looks like a streamlined, simplified NSA. It's cool that we have all labs mapping out the design space.
there's something in the corner of the image mmmmm
Reminder that Minimax M2 was supposed to be "Mini", it just turned out to be powerful enough for a whole generation of products. Full-size M3 is likely going to be a big jump.
A bit extreme. Even if M3 is a resounding technical and product success, other labs are cooking too. Kimi will come out with K3, for starters, GLM is not going anywhere, Stepfun… it's a question of who gets to buy whom.
If this fails then Minimax is dead
If this works it should start buying other companies and merging
I just hope it's at least 500B

@Xianbao_QIAN haha, in several days~

@eliebakouch Tech blog coming soon.
And M3 :)

@SkylerMiao7 Let him cook

@SkylerMiao7 @Xianbao_QIAN Very cool!
Will you all be publishing open weights at the same time that the API is released or will it be a staggered release like M2.7?

For non ai people:
most ai models work like this,
every word looks at every other word to understand context
great for accuracy, terrible for speed at massive scale
MiniMax's sparse attention changes that
instead of processing an entire 1m token context deeply, the model first,
quickly scans everything
finds the most relevant sections
focuses only on those parts
kind of like how humans use an index before reading a huge textbook
Result:
- 10x faster context processing
- 15x faster decoding
- Much lower compute costs
this is one of the key tricks enabling ultra long context ai models without needing absurd amounts of gpu power
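A rough way to see why the "scan, select, focus" idea above saves work is to count how many query-key scores one decoded token needs. The sketch below is back-of-the-envelope arithmetic only: the block size of 64 and top-k of 16 are made-up values, not MiniMax's settings, and the count ignores memory traffic and everything outside attention, so its ratio should not be expected to match the reported end-to-end speedups.

```python
def scores_per_decoded_token(context_len, block_size=None, top_k=None):
    """Count query-key scores for one new token.

    Dense attention scores every cached token. The block-sparse variant
    scores one index entry per block, then every token in the top_k blocks.
    """
    if block_size is None:
        return context_len
    num_blocks = -(-context_len // block_size)  # ceiling division
    selected = min(top_k, num_blocks)
    return num_blocks + selected * block_size


context = 1_000_000
dense = scores_per_decoded_token(context)
# block_size and top_k are hypothetical, for illustration only.
sparse = scores_per_decoded_token(context, block_size=64, top_k=16)
print(dense, sparse)  # 1000000 16649
```

The dense count grows in step with the context, while the sparse count is dominated by the index scan over blocks plus a fixed budget of selected tokens.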

@eliebakouch btw just dropped a video explaining the entire deepseek v4 architecture with CSA and HCA:

@leeweiserngmai1 for sure

@SkylerMiao7 Great. Now you have to replace M2.x with M3.x in your bio

@SkylerMiao7 this minimax architecture looks wild but is it actually gonna outperform current sota models or is it just more hype

@SkylerMiao7 LFG! 🚀

@SkylerMiao7 Can it finally support images 🙏