MiniMax-M2 paper just dropped
M2's key focus is being more agent-native.
It trains on runnable workspaces and artifact-grounded rewards, then uses Forge to scale RL over long coding, app, search, and office-task trajectories.
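Rough idea of what an "artifact-grounded" reward could look like: score the files the agent actually left in the workspace, not its final message. This is a toy sketch, not the paper's implementation; `artifact_reward` and the per-file checks are made-up names for illustration.

```python
from pathlib import Path
from typing import Callable, Dict


def artifact_reward(workspace: Path, checks: Dict[str, Callable[[str], bool]]) -> float:
    """Return the fraction of expected artifacts that exist and pass their check.

    `checks` maps a path relative to the workspace to a predicate on the
    file's text. Both the name and the scoring rule are hypothetical.
    """
    if not checks:
        return 0.0
    passed = 0
    for rel_path, check in checks.items():
        artifact = workspace / rel_path
        # A missing file earns nothing; an existing file must also pass its check.
        if artifact.is_file() and check(artifact.read_text()):
            passed += 1
    return passed / len(checks)
```

A real setup would presumably run tests or validators inside the workspace; the point here is only that the reward is computed from produced artifacts.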
What's interesting is that M2.7 is starting to debug failed training runs and edit its own agent scaffold.
Other key details:
29.9B MoE with 9.8B active parameters, 256 fine-grained experts, full attention, 192K context, and multi-token prediction (MTP) for speculative decoding.
On keeping full attention: "we found no variant that reliably matches full attention quality in production settings"
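For anyone unfamiliar with the speculative decoding mentioned above, here is a minimal draft-and-verify sketch with greedy acceptance. It is a generic illustration, not M2's MTP implementation: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names, and the "models" are plain functions from a token list to the next token.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(context: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int = 4) -> List[Token]:
    """Draft k tokens cheaply, then keep the prefix the target model agrees with."""
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(context + drafted))

    accepted: List[Token] = []
    for tok in drafted:
        # A real system would verify all positions in one batched forward pass.
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: take the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every drafted token matched, so the target adds one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the large model: next token is len(ctx) mod 5."""
    return len(ctx) % 5


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the cheap drafter: agrees with the target except at length 3."""
    return 9 if len(ctx) == 3 else len(ctx) % 5
```

With greedy acceptance the output always equals what the target alone would have produced; the speedup comes from verifying several drafted tokens per target pass.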