/AI · 21h ago

Bleys Goodson finds MiniMax M3 achieves a 13.3% strict success rate on the DeepSWE coding benchmark

Extended runtime increased the model's success rate to 16.8%.

Bleys Goodson@bleysg

Since everyone is asking, I ran DeepSWE on MiniMax M3.

Here is the lowdown. 15 of 113 passed!

19 if you count the 1.5x overtime I gave just to see.

Full report: https://entrpi.github.io/misc/deep-swe-minimax-m3/

MiniMax (official)@MiniMax_AI

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities

- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero

API: http://platform.minimax.io
Token Plan: https://platform.minimax.io/subscribe/token-plan
🚀 New! MiniMax Code: http://code.minimax.io

Weights & Tech Report in ~10 Days

10:23 AM · Jun 1, 2026 · 124K Views
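
The headline percentages follow directly from the pass counts quoted in Bleys Goodson's post (15 of 113, or 19 of 113 with the 1.5x overtime). A minimal sketch of that arithmetic, assuming one-decimal rounding; the `pass_rate` helper and the constant names are hypothetical and are not taken from the linked report:

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


# Counts as stated in the post; the names are illustrative only.
TOTAL_TASKS = 113      # tasks in the run
STRICT_PASSED = 15     # passes within the normal time limit
OVERTIME_PASSED = 19   # passes when the 1.5x overtime is counted

strict = pass_rate(STRICT_PASSED, TOTAL_TASKS)
overtime = pass_rate(OVERTIME_PASSED, TOTAL_TASKS)

print(f"strict: {strict}%, with overtime: {overtime}%")
```

Under these assumptions the two values match the 13.3% and 16.8% figures in the story headline.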

Sentiment

Many users defend MiniMax M3's real coding ability because it rarely causes regressions and often comes close to passing DeepSWE tasks, while others accuse MiniMax of benchmaxxing (tuning for benchmark scores) or call the independently measured result weak.

Positive: 85.0%

Negative: 15.0%

16 comments with sentiment.

Posts from X

Most Activity

Views: 5.5K · Bookmarks: 2 · Likes: 48 · Replies: 6

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Kimi reigns supreme.

Bleys Goodson@bleysg

Since everyone is asking, I ran DeepSWE on MiniMax M3.

Here is the lowdown. 15 of 113 passed!

19 if you count the 1.5x overtime I gave just to see.

Full report: https://entrpi.github.io/misc/deep-swe-minimax-m3/

20h ago