Cursor releases Composer 2.5 as its most powerful model with gains in intelligence and reliability
Cursor announced Composer 2.5 as its most powerful model to date. The update improves intelligence, sustained performance on long-running tasks, and reliability in following complex instructions. Cursor is doubling the model's included usage for the next week. The model builds on the Kimi base, with 85 percent of total compute devoted to additional training and reinforcement learning. It was partially trained on Colossus 2, and the release marks the start of joint work with SpaceXAI on a larger model that will use 10 times more compute and Colossus 2's million-H100-equivalent capacity. Reported benchmarks include Terminal-Bench, SWE-Bench Multilingual, and CursorBench.
Been working on text feedback / OPSD in Composer. Really interesting space, and much more to be explored.

@eliebakouch Unfortunately can’t give a precise answer. Both were scaled significantly.
@srush_nlp really cool work congrats to the team. if you can answer, do you have rough estimate on how the compute was allocated between RL and continual pt in composer 2 -> 2.5?

Very cool to see Cursor doubling down on training great models. In my opinion, ultimately all serious companies in AI will want to train models themselves, based on open-source instead of outsourcing AI to others via APIs!
Introducing Composer 2.5, our most powerful model yet. It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions. For the next week, we’re doubling the included usage of the model.
Try it out!
(Partially trained on Colossus 2)
This is such an interesting chart layout, like it a lot!
Congrats to @cursor_ai team on the 2.5 launch 🚀
Great work - exciting to see you training a very powerful coding model!
Composer 2.5 is a significant step up from Composer 2.
This is the very start of our work with SpaceXAI. Hope to have more improvements out soon.
Composer 2.5 sits on the Pareto frontier

Self distillation everywhere 🥳
cursor is at frontier scale, both in terms of performance and compute
if composer 2.5's budget was put into a pre-train: ~6.3T total, 200B active trained on ~56T tokens
if composer 3 allocates 50% of the budget to pre-training: ~500B active, 15.3T total trained on 135T tokens.
assumptions are a lower bound: 35% MFU, FP8, ~3-4% sparsity like K2, H100 efficiency. model/token allocation is the mean between K2+K2.5 data point and Inclusion AI compute optimal rules for MoE
really impressed by the progression between composer 2 and composer 2.5
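A rough sanity check of the figures in the thread above, as a minimal sketch. It assumes the common rule of thumb that training FLOPs are about 6 × active parameters × tokens; the thread does not say which formula was used, so `train_flops`, `active_fraction`, and `SCENARIOS` are illustrative names, not the author's method.

```python
# Sanity check of the scaling figures quoted in the thread.
# ASSUMPTION: training FLOPs ~= 6 * active_params * tokens (a common
# rule of thumb). The thread does not state the formula behind its estimate.


def train_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs under the 6*N*D assumption."""
    return 6.0 * active_params * tokens


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of total MoE parameters that are active per token."""
    return active_params / total_params


# Parameter and token counts as quoted in the thread.
SCENARIOS = {
    "composer 2.5 budget as a pre-train": {
        "active": 200e9, "total": 6.3e12, "tokens": 56e12,
    },
    "composer 3, 50% of budget to pre-training": {
        "active": 500e9, "total": 15.3e12, "tokens": 135e12,
    },
}


def summarize(scenarios: dict) -> dict:
    """Return estimated FLOPs and active-parameter share per scenario."""
    return {
        name: {
            "flops": train_flops(s["active"], s["tokens"]),
            "active_fraction": active_fraction(s["active"], s["total"]),
        }
        for name, s in scenarios.items()
    }


if __name__ == "__main__":
    for name, row in summarize(SCENARIOS).items():
        print(f"{name}: {row['flops']:.2e} FLOPs, "
              f"{row['active_fraction']:.1%} active")
```

Under that assumption the two scenarios come to about 6.7e25 and 4.1e26 FLOPs, a gap of roughly 6x, and both active fractions land near 3.2%, in line with the quoted "~3-4% sparsity".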

> if composer 3 allocates 50% of the budget to pre-training: ~500B active, 15.3T total trained on 135T tokens.
i meant pre/mid/continual training here (whatever is not RL), and also put 10% estimation in the attached picture
@ericzakariasson looks good
composer 1 was fast
composer 2 was fast and intelligent
composer N:
Our new model is out. It stacks up nicely against the frontier!
yeah that's pretty good
xAI might be able to cook with Cursor data + 10T model

We just trained another really really good model, please try it! Frontier intelligence + very fast
New Composer from Cursor team! Great to see their ack. to the Kimi base + how much they moved the model forward!
This isn't the one they are training on XAI Colossus, that one is coming and would likely slap hard!

We used a pretty cool "RL with text feedback" formulation to train this one (see blog post for some details). As RL tasks get longer in horizon, I think it's a ripe time to think about ways we can extract signals that avoid the variance explosion.
composer 2.5 is really really great. I had it on last week for some testing, forgot that it was on, & totally didn’t realize I wasn’t on gpt 5.5 (my usual) for a while. the team did a fantastic job!!
@mntruell Congrats!
frontier smart
extremely efficient
Composer 2.5 is here
We've gotten really really good at RL. Composer 2.5 is fighting well above its weight class.
Very excited for the next release as we scale model sizes and FLOPs with @SpaceXAI!
@ericzakariasson Fat slow and genius

