Since everyone is asking, I ran DeepSWE on MiniMax M3.
Here is the lowdown: 15 of 113 passed (about 13%)!
That rises to 19 (about 17%) if you count the 1.5x overtime I gave just to see.
Full report: https://entrpi.github.io/misc/deep-swe-minimax-m3/
Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities
- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero
API: http://platform.minimax.io
Token Plan: https://platform.minimax.io/subscribe/token-plan
🚀 New! MiniMax Code: http://code.minimax.io
Weights & Tech Report in ~10 Days