What are the largest software engineering tasks AI can perform?
To answer this, we built MirrorCode, our long-horizon SWE benchmark that lets AI code autonomously for days at a time.
The best models complete some tasks that we estimate would take human engineers several weeks.


