
Bleys Goodson finds MiniMax M3 achieves a 13.3% strict success rate on the DeepSWE coding benchmark

With a 1.5x time extension, the model's success rate rose to 16.8% (19 of 113 tasks).
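The headline percentages follow directly from the raw counts in the post: 15 and 19 passing tasks out of 113. Below is a minimal Python sketch of that arithmetic. The names `pass_rate` and `TOTAL_TASKS` are illustrative. It assumes both figures come from the same 113-task set, which is how the post reads.

```python
# Recompute the DeepSWE pass rates for MiniMax M3 from the counts in the post.
TOTAL_TASKS = 113  # tasks in the run, per the post ("15 of 113 passed")


def pass_rate(passed: int, total: int = TOTAL_TASKS) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


strict = pass_rate(15)    # passed within the normal time limit
extended = pass_rate(19)  # passed when given the 1.5x time allowance
print(f"strict: {strict}%, extended: {extended}%")  # strict: 13.3%, extended: 16.8%
```

The four extra passes under the longer limit account for the full gap between the two rates.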

Quote posts
Original post · Chubby♨️ #1496

Since everyone is asking, I ran DeepSWE on MiniMax M3.

Here is the lowdown. 15 of 113 passed!

19 if you count the 1.5x overtime I gave just to see.

Full report: https://entrpi.github.io/misc/deep-swe-minimax-m3/

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities

- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero

API: http://platform.minimax.io
Token Plan: https://platform.minimax.io/subscribe/token-plan
🚀New! MiniMax Code: http://code.minimax.io

Weights & Tech Report in ~10 Days

10:23 AM · Jun 1, 2026 · 124K Views
Posts from X
Views 5.5K · Bookmarks 2 · Likes 48 · Replies 6

Kimi reigns supreme.
