Tech · 23d ago

MiniMax details its M3 sparse attention architecture, claiming a 15.6x decoding speedup at 1 million tokens

The design also achieves a 9.7x prefill speedup.

Original post

Skyler Miao@SkylerMiao7

Something BIG is coming

7:49 AM · May 26, 2026 · 827.2K Views

Sentiment

Positive users hail MiniMax M3's block-based sparse attention for delivering 15x+ decoding speedups at 1M tokens, while negative users criticize predatory licensing and prior model shortfalls.

Positive: 75.4%
Negative: 24.6%

86 comments with sentiment.

Posts from X

elie@eliebakouch

new minimax sparse attention compared to deepseek v3.2 (DSA) and v4 (CSA)

main changes:
- based on GQA, not MLA
- block-level selection like in CSA, but attention is done on the real KV, not in the compressed dimension
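One way to picture how block-level selection could sit on top of GQA is sketched below. Under GQA several query heads share one KV head, so a natural (but here purely assumed) choice is to pool the per-head block scores inside each group and pick one set of KV blocks per KV head. The function name `select_blocks_per_kv_head`, the sum pooling, and the toy scores are hypothetical illustrations, not details confirmed by MiniMax.

```python
def select_blocks_per_kv_head(block_scores, num_kv_heads, top_k):
    """Pick top_k KV blocks for each KV head under grouped-query attention.

    block_scores[h][b] is the index score of query head h for KV block b.
    Assumption: query heads that share a KV head also share one block
    selection, obtained by summing their scores.
    """
    num_query_heads = len(block_scores)
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = num_query_heads // num_kv_heads
    num_blocks = len(block_scores[0])

    selection = []
    for kv_head in range(num_kv_heads):
        # Query heads [kv_head * group_size, (kv_head + 1) * group_size) share this KV head.
        group = block_scores[kv_head * group_size:(kv_head + 1) * group_size]
        pooled = [sum(head[b] for head in group) for b in range(num_blocks)]
        ranked = sorted(range(num_blocks), key=lambda b: pooled[b], reverse=True)
        # Return block ids in position order so the real KV can be read sequentially.
        selection.append(sorted(ranked[:top_k]))
    return selection


# Toy example: 4 query heads, 2 KV heads, 3 KV blocks.
scores = [
    [1.0, 0.0, 3.0],  # query head 0 (KV head 0)
    [0.0, 2.0, 3.0],  # query head 1 (KV head 0)
    [5.0, 1.0, 0.0],  # query head 2 (KV head 1)
    [4.0, 0.0, 2.0],  # query head 3 (KV head 1)
]
chosen = select_blocks_per_kv_head(scores, num_kv_heads=2, top_k=2)
```

The selected block ids would then address uncompressed keys and values, which is the contrast the post draws with attention in a compressed dimension.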

Chubby♨️@kimmonismus

MiniMax just teased their Sparse Attention architecture for M3. The benchmarks show 9.7x prefilling speedup and 15.6x decoding speedup at 1M tokens vs M2.

MiniMax deliberately went back to full attention for M2 because efficient attention wasn't production-ready. Their pretrain lead wrote a whole blog post about it in March. Now they're showing a new two-stage approach: a lightweight index branch for block selection, then sparse attention only on the relevant KV blocks.

Really interesting. And tbh I'm always happy when open source receives new wins.

MiniMax (official)@MiniMax_AI

#MSA #OpenSource #M3 🫣😎
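The two-stage idea described in the post above can be sketched for a single decoding step as follows. This is a minimal pure-Python illustration, not MiniMax's implementation: the mean-of-keys block summary used as the "index branch", the function names, and the parameters `block_size` and `top_k` are all assumptions made for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Stage 1 (index branch): score every KV block cheaply, keep the top_k.

    Assumption: each block is summarized by the mean of its keys.
    """
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(column) / len(block) for column in zip(*block)]
        scored.append((dot(query, mean_key), start // block_size))
    scored.sort(reverse=True)
    return sorted(block_id for _, block_id in scored[:top_k])


def sparse_attention(query, keys, values, block_size, top_k):
    """Stage 2: exact softmax attention restricted to the selected blocks."""
    chosen = select_blocks(query, keys, block_size, top_k)
    positions = [
        p
        for block_id in chosen
        for p in range(block_id * block_size,
                       min((block_id + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[p]) * scale for p in positions])
    dim = len(values[0])
    output = [sum(w * values[p][d] for w, p in zip(weights, positions))
              for d in range(dim)]
    return output, chosen


# Toy cache of 8 tokens in blocks of 2; only block 2 aligns with the query.
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
values = [[float(i)] for i in range(8)]
output, chosen = sparse_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
```

With `top_k` equal to the number of blocks, the second stage sees every position and reduces to ordinary dense attention, so the sparsity in this sketch comes entirely from how aggressively stage 1 prunes.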

antirez@antirez

That's the way. DeepSeek v4 is the gold standard with both KV cache compression *and* indexed attention. But while we can likely live with the memory cost, the computational cost of dense attention for local inference is too heavy at ~100k context already.
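A back-of-the-envelope count helps show why dense attention gets heavy at around 100k tokens of context. The sketch below counts how many key vectors one decoded token must be scored against, per attention head, under dense attention and under a block-sparse scheme. The block size of 64 and the 16 selected blocks are hypothetical values chosen for illustration, and the count ignores everything else in the model, so it says nothing about MiniMax's measured speedups.

```python
import math


def keys_scored_per_token(context_len, block_size, top_k):
    """Return (dense, sparse) counts of score computations for one decoded token.

    Assumptions: the index branch computes one score per block, and the
    sparse stage attends to every key inside the top_k selected blocks.
    """
    num_blocks = math.ceil(context_len / block_size)
    dense = context_len
    index_scores = num_blocks
    attended_keys = min(min(top_k, num_blocks) * block_size, context_len)
    return dense, index_scores + attended_keys


# Hypothetical setting: 100k-token context, 64-token blocks, 16 blocks kept.
dense, sparse = keys_scored_per_token(100_000, block_size=64, top_k=16)
print(f"dense: {dense} scores, sparse: {sparse} scores, ratio: {dense / sparse:.1f}x")
```

In this simple model the dense cost grows linearly with context length, while the sparse cost grows only through the one-score-per-block index term.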

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Interesting, so M3 will have yet another sparse attention design, this time with blocks. Looks like a streamlined, simplified NSA. It's cool that we have all labs mapping out the design space.

Lisan al Gaib@scaling01

there's something in the corner of the image mmmmm

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Reminder that Minimax M2 was supposed to be "Mini", it just turned out to be powerful enough for a whole generation of products. Full-size M3 is likely going to be a big jump.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

A bit extreme. Even if M3 is a resounding technical and product success, other labs are cooking too. Kimi will come out with K3, for starters, GLM is not going anywhere, Stepfun… it's a question who gets to buy whom.