Tech · 26d ago

Cursor releases Composer 2.5 coding agent, which scores 62 on the Artificial Analysis Coding Agent Index at 10 to 60 times lower cost than Claude Opus 4.7 and GPT-5.5

An internal test routed nearly all company chats to the model for about two days.

Original post

Cursor@cursor_ai

Introducing Composer 2.5, our most powerful model yet.

It's more intelligent, better at sustained work on long-running tasks, and more reliable at following complex instructions.

For the next week, we’re doubling the included usage of the model.

9:43 AM · May 18, 2026 · 19.4M Views

Sentiment

Many users praise Cursor's Composer 2.5 for its benchmark gains, low cost, and ability to handle complex refactors as if it were reading their minds, while a few dismiss it as trash or inferior to Opus.

Positive: 84.6%

Negative: 15.4%

1,992 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 36M · Likes: 28.5K

Elon Musk@elonmusk

Try Composer 2.5 on Cursor!

Michael Truell@mntruell

Composer 2.5 is now the most-chosen model in Cursor.

We're giving everyone 10x usage for the rest of the day. Enjoy!

25d36M28.5K2.7K

Bookmarks: 2.7K · Retweets: 3K

Elon Musk@elonmusk

Try it out!

(Partially trained on Colossus 2)

26d10.8M25.9K2.7K

Replies: 2K

Elon Musk@elonmusk

Try Composer 2.5

BridgeMind@bridgemindai

New CursorBench results just dropped.

Two big takeaways.

Composer 2.5 is way better than most people think.

63.2% score at $0.55 per task.

Nearly matching Opus 4.7 Max and GPT 5.5 Extra High at 20x less cost.

This is insane value.

Gemini 3.5 Flash is #10 at 49.8%.

Below GPT 5.5 Low. Below Opus 4.7 Low.

Google's newest model can't even beat budget tier competition.

Composer 2.5 is the sleeper.

Gemini 3.5 Flash is the disappointment.

23d7.1M19.2K2K

Michael Truell@mntruell

Composer 2.5 is a significant step up from Composer 2.

This is the very start of our work with SpaceXAI. Hope to have more improvements out soon.

26d995.1K4.6K176

Michael Truell@mntruell

Composer 2.5 is now the most-chosen model in Cursor.

We're giving everyone 10x usage for the rest of the day. Enjoy!

25d33.8M3.1K462

Dan Perks@DanPerks

the team did an internal test of this model last week

the whole company (bar a few exceptions) had all their cursor chats redirected to composer 2.5 for like 2 days.

i didn't even notice, which I think is testament to the progress of this model. go use it, it's very good.

26d905.4K2.4K131

Lisan al Gaib@scaling01

yeah that's pretty good

xAI might be able to cook with Cursor data + 10T model

26d689.7K2.4K125

Sualeh Asif@sualehasif996

We've gotten really really good at RL. Composer 2.5 is fighting well above its weight class.

Very excited for the next release as we scale model sizes and FLOPs with @SpaceXAI!

26d677K2.2K87

Cursor@cursor_ai

Composer 2.5 is built on the same open-source base as Composer 2, Moonshot’s Kimi K2.5.

Cursor@cursor_ai

We improved Composer by scaling training, generating more complex RL environments, and introducing new learning methods.

For example, we use text feedback during RL to learn faster by assigning credit in rollouts spanning hundreds of thousands of tokens.

26d198.9K1.7K98

eric zakariasson@ericzakariasson

composer 1 was fast
composer 2 was fast and intelligent
composer N:

26d105.4K1.6K103

Beff (e/acc)@beffjezos

This is very bullish for SpaceXAI

Cursor@cursor_ai

Composer 2.5 is exceptionally intelligent and up to 10x more efficient than similarly capable models.

25d122.3K1.2K51

elie@eliebakouch

cursor is at frontier scale, both in terms of performance and compute

if composer 2.5's budget was put into a pre-train: ~6.3T total, 200B active trained on ~56T tokens

if composer 3 allocates 50% of the budget to pre-training: ~500B active, 15.3T total trained on 135T tokens.

assumptions are a lower bound: 35% MFU, FP8, ~3-4% sparsity like K2, H100 efficiency. model/token allocation is the mean between K2+K2.5 data point and Inclusion AI compute optimal rules for MoE

really impressed by the progression between composer 2 and composer 2.5
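The two scenarios in this post can be sanity-checked with a few lines of Python. This is only a rough sketch: it uses the common rule of thumb of about 6 FLOPs per active parameter per token, which is an assumption here and not necessarily the method the author used (the post cites MFU, FP8, and H100 efficiency instead). The function and dictionary names are illustrative.

```python
def train_flops(active_params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per active parameter per token.

    This approximation is an assumption of the sketch, not taken from the post.
    """
    return 6.0 * active_params * tokens


# Parameter and token counts exactly as quoted in the post.
scenarios = {
    "Composer 2.5 budget as pre-train": {"total": 6.3e12, "active": 200e9, "tokens": 56e12},
    "Composer 3 at 50% pre-train": {"total": 15.3e12, "active": 500e9, "tokens": 135e12},
}

for name, s in scenarios.items():
    active_fraction = s["active"] / s["total"]
    flops = train_flops(s["active"], s["tokens"])
    print(f"{name}: active/total = {active_fraction:.1%}, ~{flops:.2e} FLOPs")
```

Under this approximation the active-to-total ratios come out at about 3.2% and 3.3%, inside the "~3-4% sparsity" range the post assumes, and the two scenarios correspond to roughly 6.7e25 and 4.1e26 training FLOPs.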

26d63.9K575139

Leon Lin@LexnLin

My first impressions of the new @cursor_ai Composer 2.5 model

+ pretty fast and efficient model
+ does a great job, i'd say it's almost as strong as opus 4.7 (or in some cases at the same level)
+ cheap model
+ good at frontend
- still a bit generic design when used without skills

I'll try to post a few results I got later :)

26d63.8K70297

Sasha Rush@srush_nlp

Been working on text feedback / OPSD in Composer. Really interesting space, and much more to be explored.

Cursor@cursor_ai

We improved Composer by scaling training, generating more complex RL environments, and introducing new learning methods.

For example, we use text feedback during RL to learn faster by assigning credit in rollouts spanning hundreds of thousands of tokens.

26d30.3K263118

Chubby♨️@kimmonismus

Intelligence too cheap to meter. This is the real deal. Composer 2.5 is an efficiency-beast

Chubby♨️@kimmonismus

Huge, did NOT expect that release. Evals look very solid, significant jump compared to composer 2!

But: it’s 10x more efficient than the competition. Looks really exciting. Need to try it out

26d37.9K60158

eric zakariasson@ericzakariasson

i wrote a guide on optimizing context usage 6 months ago that i never posted. back then with the models available, you could only pick 2 of 3:

1. intelligent
2. fast
3. cheap

intelligent + fast = expensive
fast + cheap = dumb
cheap + intelligent = slow

now, with composer 2.5, this is no longer true and the post is obsolete. looking at TPS, avg cost per task, and score from cursorbench, it's clearly capable of all three

but benchmarks are just benchmarks. what matters is how it feels to use and if it can actually accomplish your tasks. from the feedback so far, that's very much the case

go try it out if you haven't already

Artificial Analysis@ArtificialAnlys

Cursor's new Composer 2.5 takes third on the Artificial Analysis Coding Agent Index and is ~10-60x lower cost than the higher-effort Opus 4.7 and GPT-5.5 variants above it. This release puts Composer among the leading coding agent models, something that wasn’t clear for past releases

@cursor_ai has released Composer 2.5, the latest model in its Composer line. Composer 2.5 scored 62 on our Coding Agent Index, a 14 point gain over Composer 2 (48). This puts it in third place of our tested agents, behind only Claude Opus 4.7 (max) in Claude Code (66) and GPT-5.5 (xhigh reasoning) in Codex (65). These cost $4.10 and $4.82 per task respectively, ~10x the cost of Composer 2.5 Fast ($0.44) and ~60x the cost of Composer 2.5 standard ($0.07).

Key results for Composer 2.5 in Cursor CLI:

➤ Cost-quality Pareto frontier: At $0.07 (standard) and $0.44 (Fast) per task, Composer 2.5 is cheaper than every other agent scoring above 60 on the Index. Medium-effort peers cost $1.24–$2.21 per task; higher-effort variants land 3-4 points above at $4.10–$4.82

➤ Per-benchmark gains vs Composer 2: +35 points on SWE-Bench-Pro-Hard-AA (12% → 47%), +2 points on Terminal-Bench v2 (64% → 66%), and +3 points on SWE-Atlas-QnA (69% → 72%). At 47%, Composer 2.5's score on SWE-Bench-Pro-Hard-AA is comparable to Claude Opus 4.7 (max) in Claude Code

➤ Among the fastest coding agents: Composer 2.5 Fast runs at an average wall time of 6.7 minutes per task, the third-fastest agent on the Artificial Analysis Coding Agent Index, behind only Claude Opus 4.7 (medium) in Claude Code (5.8m) and GPT-5.5 (medium) in Cursor CLI (6.2m)

➤ Fast mode enables better responsiveness at 6x pricing: Fast runs 30% faster than standard Composer 2.5, but is ~6x the cost per task ($0.44 vs $0.07). Token pricing is 6x higher for Fast: $3.00/$15.00 vs $0.50/$2.50 per million input/output tokens
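The cost multiples quoted above follow from the per-task figures. A minimal Python sketch, using only the dollar amounts given in this post (the dictionary and function names are illustrative, not anything published by Artificial Analysis):

```python
# Per-task costs in USD, as quoted in the post.
COST_PER_TASK = {
    "Composer 2.5 (standard)": 0.07,
    "Composer 2.5 (Fast)": 0.44,
    "Claude Opus 4.7 (max)": 4.10,
    "GPT-5.5 (xhigh)": 4.82,
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """Return how many times more `expensive` costs per task than `cheap`."""
    return COST_PER_TASK[expensive] / COST_PER_TASK[cheap]


for rival in ("Claude Opus 4.7 (max)", "GPT-5.5 (xhigh)"):
    for composer in ("Composer 2.5 (Fast)", "Composer 2.5 (standard)"):
        print(f"{rival} vs {composer}: {cost_ratio(rival, composer):.1f}x")

fast_vs_standard = cost_ratio("Composer 2.5 (Fast)", "Composer 2.5 (standard)")
print(f"Fast vs standard: {fast_vs_standard:.1f}x")
```

The ratios land at roughly 9x to 11x against Fast and 59x to 69x against standard, consistent with the rounded "~10x" and "~60x" figures, and Fast comes out at about 6.3x the per-task cost of standard.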

Model details:

➤ Base model: Continued training on @Kimi_Moonshot's open weights Kimi K2.5 as with Composer 2, with Cursor reporting ~85% of total compute from its own additional training and reinforcement learning

➤ Pricing: $0.50/$2.50 per million input/output tokens for the standard variant; $3.00/$15.00 for the Fast variant (the default in Cursor)

➤ Available exclusively in Cursor: both Cursor IDE and Cursor CLI; an externally accessible API is not available
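To see what the listed token prices mean for a single request, here is a small Python sketch. The prices are the ones in this post; the token counts are a hypothetical workload chosen for illustration, and the calculation ignores any caching discounts or plan-included usage, which the post does not describe.

```python
# USD per million tokens as listed in the post: (input, output).
PRICING = {
    "standard": (0.50, 2.50),
    "fast": (3.00, 15.00),
}


def request_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-million-token prices."""
    in_price, out_price = PRICING[variant]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
standard = request_cost("standard", 200_000, 20_000)
fast = request_cost("fast", 200_000, 20_000)
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}, ratio: {fast / standard:.1f}x")
```

Because both the input and output prices are exactly 6x higher for Fast, the ratio is 6x for any mix of input and output tokens.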

Congratulations @cursor_ai and @mntruell on the impressive release!

23d35.3K32597

jack friks@jackfriks

i just uninstalled cursor literally 27 days ago and now i gotta reinstall it and resubscribe already cause they released an insane model of their own. i did not see this one coming

imsroch@imsroch

composer 2.5 feels unlimited, insanely fast too. went from using 3 models to just one.

congrats @cursor_ai 🖤

23d54.9K53056

Elon Musk@elonmusk

@beffjezos The trend is strong

25d15.2K44212

Cursor@cursor_ai

Learn more about Composer 2.5: http://cursor.com/blog/composer-2-5

26d61.1K57058

Digg Deeper

Kevin Rose asked

What's the price and difference between Composer 2.5 and the latest frontier models from other providers? Also, is it faster, if so, how much?

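Composer 2.5 costs $0.50 per million input tokens and $2.50 per million output tokens, roughly one-tenth the price of Claude Opus 4.7 ($5/$25) or GPT-5.5 equivalents, while scoring within 3 to 4 points of their highest-effort variants on the Artificial Analysis Coding Agent Index (62 versus 66 and 65). The Fast variant, the default in Cursor, is priced at $3.00/$15.00. ¹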

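Cursor describes it as up to 10x more efficient than similarly capable models. On speed, Artificial Analysis measured Composer 2.5 Fast at an average wall time of 6.7 minutes per task, third-fastest on its index and about 30% faster than the standard variant. Users call it fast and strong on frontend and long tasks.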

Sheing Ng asked

Any other eval scores available?

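Cursor's announcement table lists three benchmarks: Terminal-Bench 2.0, SWE-Bench Multilingual, and CursorBench v3.1, with Opus and GPT entries flagged as self-reported. Beyond those, the posts include Artificial Analysis results: 62 on its Coding Agent Index, 47% on SWE-Bench-Pro-Hard-AA, 66% on Terminal-Bench v2, and 72% on SWE-Atlas-QnA. BridgeMind also cites a 63.2% CursorBench score at $0.55 per task.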