Monthly "How I Used My Coding Credits" Report: the major themes are (i) moving away from the frontier providers and (ii) an increased focus on tooling. It's technically been 1.5 months since my last post in this series, but I wanted to finish an entire month using sub-frontier models before posting this!
🏆 Winners: GLM 5.1 + 5.2, Cursor*, Pi.
- DISCLAIMER: My subscription tokens come from a longstanding subscription (Ultra) within Cursor. I like many aspects of the IDE. I would upgrade to yearly (20% discount), but I'm not convinced the board really understands what makes the product work -- they seem to be hitting successes without understanding why, and allocating resources to places that seem questionable. I'll explain what I mean below -- with a takeaway for the Cursor team, since they read my posts ;-)
- DISCLAIMER: I also have a yearly GLM coding subscription, bought at a ~50% discount, and it has consistently been insane value for money. 4.7 + Pi became my default for devops and sysadmin many months ago, 5.0 missed the mark, but 5.1 hit its stride (still subfrontier) in ways that redefined my work. Now 5.2 has surprised many people here, but it fell right into the subfrontier theme I already had; welcome to the club!
📊 THEME 1: The first theme of this month is a shift in dynamics away from frontier providers. Last month I wrote how the frontier models (both top companies) failed to provide better capabilities for the same prices, or the same capabilities for cheaper -- and that affected my choices, moving away from frontier models first (using cheaper tokens from GPT 5.3 at first, but then switching to GLM 5.1 -> 5.2 for the whole month).
- There was an offer for GPT 5.5 in the first few days of my subscription, so I used what made sense in that period. I did some yolo runs, but for the rest of the month I used 5.5 only rarely, on difficult tasks, with mixed results.
- I didn't use Opus 4.7 nor try 4.8, nor fall back to Opus 4.6 (a favorite) because there was no special offer in Cursor, just not worth the cost to experiment... honestly, the best strategy is to wait for promotional discounts.
- (This is why we're seeing a big shift in token consumption patterns, and IMHO companies that use models without strong token/price discrimination won't make it with AI... you must assess cost always, management should enforce strict constraints.)
- Plus, most of the code GPT 5.5 wrote I threw out because it wasn't concisely addressing the solution. It always takes a few iterations, and I find that watching subfrontier models work can often yield equally good designs through collaborative insights, rather than offloading the burden to a frontier model.
- So the frontier tokens I used were interesting and insightful but ultimately throw-away. They helped me crystallise my thinking, but once that was done and formalized, the work itself could be done by subfrontier -- broken down into manageable tasks that even GLM 5.1 could manage at the time. I'd use GLM 5.2 now.
- The subfrontier models are great in practice, because you *expect* to catch mistakes, which sharpens your thinking. Now I get to use tools that produce good results (with iteration, skills, best practices) and it doesn't rot my brain.
- My initial choice for subfrontier was GPT 5.3 at the start of my monthly subscription, until it ran out. It forced me to GLM 5.1 which was better than expected, and now GLM 5.2 that doesn't feel subfrontier at all.
- I did significant work with GPT 5.3, but only because of its cost compared to the subsequent iterations. It remains fine as long as you over-specify the problem you want it to solve, and carefully audit for alignment / cheating / manipulation / deception.
- Composer 2.5 is subfrontier still, and for my problems consistently worse than GLM 5.1 at the time, let alone the new GLM 5.2. However it's not too far behind on capabilities and Cursor can just switch to GLM base in the future.
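As a rough illustration of the "assess cost always" point above, here is a minimal sketch of a budget-aware router: frontier tokens are reserved for design exploration and only while the budget allows, and everything else goes to a subfrontier model. The prices, tier names, and the `Budget` / `pick_tier` helpers are all hypothetical placeholders, not taken from any provider's price list or API.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens; placeholders, not real price lists.
PRICE_PER_MTOK = {"frontier": 30.0, "subfrontier": 3.0}


def estimate_cost(tier: str, tokens: int) -> float:
    """Estimated cost in USD for a given tier and token count."""
    return PRICE_PER_MTOK[tier] * tokens / 1_000_000


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def can_afford(self, cost: float) -> bool:
        return self.spent_usd + cost <= self.limit_usd

    def charge(self, tier: str, tokens: int) -> float:
        """Record spend, refusing anything that would exceed the limit."""
        cost = estimate_cost(tier, tokens)
        if not self.can_afford(cost):
            raise RuntimeError(f"budget exceeded: {tier} call would cost ${cost:.2f}")
        self.spent_usd += cost
        return cost


def pick_tier(task_kind: str, budget: Budget, est_tokens: int) -> str:
    """Use the frontier tier only for design work, and only if affordable."""
    if task_kind == "design":
        if budget.can_afford(estimate_cost("frontier", est_tokens)):
            return "frontier"
    return "subfrontier"
```

The design choice here is that the constraint is enforced in code rather than left to the agent's judgment, so an expensive call fails loudly instead of silently draining the subscription.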
🛠️ THEME 2: Focus On Tools / Integration
- Part of the reason my frontier tokens ran out half-way through the month is because of subagents. They are a dangerous abstraction, especially if the managing agent can sneakily access the logs of the subagents to try to find problems, which burns expensive tokens unless you are carefully watching...
- I wish I could disable Cursor's subagents completely, they are simply not worth it. The top-level agents don't do such a great job of describing freeform problems, the subagents need to be cheaper or local for it to work, and the abstraction layer should not be leaky.
- Instead, I switched to the Cursor Agent SDK which was significantly better. I also tried the pi-agent-core, which seems to work similarly. Both could use a lot of iteration on the API: Cursor's because it feels like a sloppy v1 that didn't go through sufficient reviews, and Pi's because it's powerful but not polished to be user friendly (yet?).
- The orchestration script model, in contrast to the usual REPL-type approach, seems to be the better one. Having an LLM interact with a REPL to spawn subagents was popularized by RLM, but it's not a good abstraction for either side -- it's too informal. That's why people switched to rebranding RLM's approach as "programmatic tool calling" or PTC instead, whose increasing popularity can be attributed to Pi via OpenClaw.
- So, the self-modifying harness of Pi naturally extends into writing orchestration scripts, with these scripts becoming like one-off extensions to use as orchestrators. This approach creates better boundaries between the agents, and thus produces better results. (Note that Anthropic chose to add orchestration scripts for Claude over less formalized REPL calls.)
- With all this shift to subfrontier and tighter orchestration, I found myself going back to Cursor's "legacy" sandbox when using an IDE and more tightly managing the tool calls with AllowLists and BlockLists. It'd be awesome to have more control over that, but I'm not sure Cursor is heading in the right direction since they made those features "legacy" and harder to access.
- Both tools I use (Pi and Cursor) went through acquisitions recently, and it's interesting to see the direction they're going in. Neither of them has yet shown conviction and a clear direction; from a user perspective it's mostly just watching them adding & removing bugs in a loop.
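To make the orchestration-script idea concrete, here is a minimal sketch under stated assumptions. It does not use the Cursor Agent SDK or pi-agent-core APIs; `run_subagent`, `orchestrate`, `fake_agent`, and the tool names are hypothetical stand-ins. The point is the shape: a plain script fans out well-specified tasks, each subagent gets only an allow-listed set of tools, and the orchestrator only ever sees a summary, never the subagent's logs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool names, for illustration only.
ALLOW_LIST = {"read_file", "run_tests"}
BLOCK_LIST = {"read_subagent_logs", "shell"}


def tool_permitted(tool: str) -> bool:
    """Block list wins; anything not explicitly allowed is denied."""
    return tool not in BLOCK_LIST and tool in ALLOW_LIST


@dataclass
class Result:
    task: str
    summary: str  # the only thing that crosses the agent boundary


def run_subagent(task: str, tools: list[str], agent: Callable[[str], str]) -> Result:
    """Run one subagent after checking every requested tool against the lists."""
    denied = [t for t in tools if not tool_permitted(t)]
    if denied:
        raise PermissionError(f"tools not permitted: {denied}")
    return Result(task=task, summary=agent(task))


def orchestrate(tasks: list[tuple[str, list[str]]], agent: Callable[[str], str]) -> list[Result]:
    """The 'script' part: a fixed, inspectable plan instead of freeform REPL calls."""
    return [run_subagent(task, tools, agent) for task, tools in tasks]


def fake_agent(task: str) -> str:
    """Stand-in for a real (cheap or local) model call."""
    return f"done: {task}"
```

A usage example would be `orchestrate([("fix lint", ["read_file"]), ("run suite", ["run_tests"])], fake_agent)`. Because the plan is ordinary code, it can be reviewed and rerun, and a subagent that asks for a blocked tool fails before any tokens are spent.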
🔮 PREDICTIONS:
- xAI + Cursor's training run on a frontier model will fall short of expectations, but we'll see attempts to emphasize it as "useful" to cover up for the shortfall. The model will of course fail to reach Fable at frontier level (as that's significantly bigger), but I predict Cursor's will not even show a convincing victory over smaller and cheaper models.
- This will put xAI at the level between Meta and Google in terms of their struggles to hit the actual frontier. After the model's reception, there will be significant pressure behind the scenes, and the internal politics will quickly become difficult and team-destroying. (Neither xAI nor Cursor alone has been able to navigate the challenges so far.)
- Cursor's acquisition has not helped with the tool's lack of direction, and unless the board is able to make up for its increasingly poor strategic choices, the core team will simply leave the moment their shares are secured -- or even before. We're already seeing attempts from management to hedge its risky bets (through careful PR), but those moves are coming from the wrong place.
- I honestly would have upgraded my Cursor subscription to a yearly one, but I'm not sure they're going to survive the coming transition intact. I already wrote to board members to advise on the obvious focus on token economics and shift to subfrontier models months ago, but they missed both of those opportunities and mismanaged the risk of training a larger model from scratch too (that will now unavoidably end up subfrontier at launch).
🎯 Anyway, my offer to advise for Cursor still stands; I predicted the current situation months ago, and there were alternatives to a slowmo trainwreck all along too. (I know the company will read this, so that's me shooting my shot!)