Cursor is experimenting with agents running autonomously for days and even weeks

Beyond the TLDR, what I found most interesting is that they have an agent working on an Excel alternative. I'm curious to see whether it works and how good it turns out to be.

In general I'm fascinated by these experiments because they call into question many of the ways software teams worked until recently, with lots of time spent coordinating between all kinds of different disciplines.

Scaling long-running autonomous coding · Cursor

TLDR

Cursor's blog post explores their experiments with scaling autonomous coding agents. They've learned that coordinating hundreds of agents on a single project requires a balance of structure and flexibility. Initially, they tried dynamic coordination and locking mechanisms, which failed due to bottlenecks and brittleness. They then separated roles into planners, workers, and judges, which improved coordination and scalability. They tested this system by having agents build a web browser from scratch, migrate a codebase, and improve a product. They found that model choice and prompts matter more than system complexity. While multi-agent coordination remains challenging, they've made progress by scaling agents to tackle ambitious projects.
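
To make the planner / worker / judge split a bit more concrete for myself, here is a toy sketch in Python. It is not Cursor's implementation: the names `Task`, `plan`, `work`, `judge` and `run_cycle` are all made up for illustration, and the "agents" are plain functions standing in for model calls. It only shows the shape of the idea, where one role breaks a goal into tasks, workers handle tasks without talking to each other, and a separate role decides whether the result is acceptable.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    description: str
    result: Optional[str] = None


def plan(goal: str) -> List[Task]:
    # Hypothetical planner: a real one would presumably ask a model to
    # break the goal down. Here the goal is simply split on ";".
    return [Task(part.strip()) for part in goal.split(";") if part.strip()]


def work(task: Task) -> Task:
    # Hypothetical worker: handles exactly one task and does not
    # coordinate with other workers.
    task.result = f"done: {task.description}"
    return task


def judge(tasks: List[Task]) -> bool:
    # Hypothetical judge: accept the cycle only if every task has a result.
    return all(t.result is not None for t in tasks)


def run_cycle(goal: str, max_workers: int = 4) -> Tuple[List[Task], bool]:
    tasks = plan(goal)
    # Workers run in parallel; pool.map returns results in task order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        finished = list(pool.map(work, tasks))
    return finished, judge(finished)


if __name__ == "__main__":
    finished, accepted = run_cycle("parse HTML; lay out boxes; paint pixels")
    for t in finished:
        print(t.result)
    print("accepted:", accepted)
```

What I like about this shape is that the workers share no locks or state with each other, which is the part the TLDR says broke down in the earlier attempts. Everything interesting in a real system would of course live inside the model calls that these stub functions replace.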