2h ago

MiniMax details its M3 sparse attention architecture, claiming a 15.6x decoding speedup at 1 million tokens

The design also achieves a 9.7x prefill speedup.


Currently Leading (May 26th, 2026)


Original post

#1497@ZEPHYR_Z9OP

Skyler Miao@SKYLERMIAO7

Something BIG is coming

7:49 AM · May 26, 2026

Reposted by

#713@PMINERVINI

QUOTE POST

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

Interesting, so M3 will have a yet another sparse attention design, this time with blocks. Looks like a streamlined, simplified NSA. It's cool that we have all labs mapping out the design space.

Skyler Miao@SkylerMiao7

Something BIG is coming

2:49 PM · May 26, 2026 · 129K Views

2:57 PM · May 26, 2026 · 5.3K Views

QUOTE POST

#1496Chubby♨️@KIMMONISMUS

MiniMax just teased their Sparse Attention architecture for M3. The benchmarks show 9.7x prefilling speedup and 15.6x decoding speedup at 1M tokens vs M2.

MiniMax deliberately went back to full attention for M2 because efficient attention wasn't production-ready. Their pretrain lead wrote a whole blog post about it in March. Now they're showing a new two-stage approach, lightweight index branch for block selection, then sparse attention only on relevant KV blocks.

Really interesting. And tbh I'm always happy when open source receives new wins.

3:54 PM · May 26, 2026 · 9.3K Views
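
The two-stage idea described in the post above (a cheap index pass that picks KV blocks, then attention over only those blocks) can be illustrated with a minimal sketch. This is not MiniMax's implementation, and no details of M3 beyond the post are known here. The scoring rule (query dotted with each block's mean key), the function names `select_blocks` and `sparse_attention`, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, keys, block_size, top_k):
    """Stage 1 (hypothetical index branch): score each KV block by the dot
    product of the query with the block's mean key, keep the top_k blocks."""
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append(dot(query, mean_key))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])  # ascending block order


def sparse_attention(query, keys, values, block_size, top_k):
    """Stage 2: ordinary scaled softmax attention for one query, restricted
    to the keys and values inside the selected blocks."""
    blocks = select_blocks(query, keys, block_size, top_k)
    idx = [i for b in blocks
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, blocks


if __name__ == "__main__":
    random.seed(0)
    keys = [[random.gauss(0, 1) for _ in range(8)] for _ in range(64)]
    values = [[random.gauss(0, 1) for _ in range(8)] for _ in range(64)]
    query = [random.gauss(0, 1) for _ in range(8)]
    out, blocks = sparse_attention(query, keys, values, block_size=16, top_k=2)
    print("selected blocks:", blocks)
```

In this sketch the second stage touches only `top_k * block_size` keys instead of all of them, which is where a long-context speedup would come from. A real system would presumably cache or learn the per-block summaries rather than recompute mean keys per query as done here. Setting `top_k` to the total number of blocks recovers full attention.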
