wow GLM5.2 is at the same level as opus 4.8 in terms of cost efficiency on cursorbench
You can now try GLM 5.2 in Cursor!
Excited to see more useful open models, thank you to Fireworks for partnering here. Results from our evals ↓
The model achieves task cost parity with Opus 4.8
Some users welcomed GLM 5.2's Cursor integration for its claimed cost efficiency and Fireworks partnership, while many others called the benchmarks inflated, criticized its value versus alternatives, and advised against subscribing.
GLM 5.2 being on the Opus frontier for cost of CursorBench is what drives frontier lab margins down
Sonnet 5 can't come soon enough

@morganlinton More details here!
https://cursor.com/blog/cursorbench
CursorBench is an internal benchmark, so scoring well here is a much better measure of real-world performance than public benchmarks. GLM 5.2 is not too far from Opus 4.8 and I would not be surprised if the next iteration largely closes the gap - exciting times for open source AI!

@leerob oh mon amour i miss you so so much...

@leerob Why not add Kimi 2.7 code on this graph?

@leerob Why would anyone try that especially compared to Composer 2.5????
Output tokens as the x-axis show a gap in terms of reasoning efficiency (when you look at output tokens you need to factor in model size, btw)

@leerob This is awesome Lee! Any chance you can share how many tasks are in CursorBench, or details on what the tasks are?
I’m getting into benchmarking more and more so super interested in what others are doing!

@leerob So none of the models are near fable 5

@leerob @RayFernando1337 Let’s fricking go

@leerob @LLMJunky actually crazy. opensource model on opus4.8 low level is really good

@leerob Fireworks? Takes me back.

@leerob thank you for adding it, can we expect kimi k2.7 code as well?

@leerob Ohhh very cool, ty!

@leerob the fireworks endpoint seems to be a bit unstable; it doesn't work for me.
It's way easier to switch models than to switch harnesses, and like many of you we use @cursor_ai every day.
Now you can try out the latest open-source frontier model without changing your workflow.

@leerob I've been hitting GLM 5.2 so much it's telling me to stop and proceed later lol. Has anyone else been facing this as well?