This post right here, officer. Let me know when your engineers ship 8x LESS code
Today, Anthropic engineers on average ship 8x as much code per quarter as they did compared to 2021-2025.
Some users express optimism about AI accelerating progress from Anthropic's 8x code claim, while many dismiss the metric itself as arbitrary marketing or flawed due to review costs and meaningless numbers.

@AnthropicAI and yet you don't know why your models are telling people to go to sleep after some tasks. "could revolutionize society". Get a better look at society first before you start claiming you're going to help society. People out here are celebrating a murder of a ceo.

@jsuarez This is why we have a leaderboard that we review at the end of our sprint: who shipped the most lines of code (it's usually our SWE II on a PIP)

This may help explain why Claude Code sometimes makes unnecessary tool calls, and why users see errors such as, ‘Your message was sent, but Claude couldn’t respond. Try again.’ I would strongly encourage the team to focus more effort on testing and validating feature behavior, reliability, and performance before release. Recent changes appear to have reduced existing capabilities or introduced regressions that cause users to waste tokens retrying failed interactions. More validation/testing would help preserve user trust and improve the overall developer experience. Thank you ! 🙏

@jsuarez @jeremyphoward yep

@jsuarez yeah. obviously a pointless, useless measure-turned-marketing-tool *just like tokens spent* how about tokens spent per lines of code shipped while we're at it

@jsuarez Sometimes shipping code is good

@jsuarez Let me know when they delete 8x more code

@AnthropicAI What is the obsession with lines of code?

@dsmproengineer @jsuarez found the vibe coder 😂 ^

@cgarciae88 …and got so far …

@AnthropicAI measuring engineer output in lines of code in 2026. might as well pay the agent by the word. it'll write you a novel to rename a variable and bill you for the trilogy.

@jsuarez people that think less code is better don't know how computers work.

@jsuarez 8x more code is not the flex if review cost also scales.

@AnthropicAI They ship 8x more code, cause they ship, then ship a fix, then a fix for the fix, ...
Code production is a stupid metric, I thought we had already gone over this as an industry.

Here is my personal graph - same shape as Anthropic's curve. Same inflection point.
Here's what moved the needle - and none of it is exotic:
→ Layered knowledge. A CLAUDE.md system the agent reads before every action. Discipline, not prompting.
→ Memory that survives. Context that holds within a session and carries across them - no starting from zero every morning. Memory management is what makes or breaks any AI system.
→ A pipeline with zero manual steps. Ticket → tests-first → code → review → PR. Same path every time.
→ Parallel agents, each in its own isolated container. Many tickets at once, no merge chaos.
For me, the next constraint was cost: unlike Anthropic and OpenAI, I could not spend millions on tokens. I didn't want to compromise on quality, but I wanted to cut token spend and ensure best practices could be applied consistently across the team.
So once the memory and context layers were solid, I could route intelligently - local models (Qwen), Kimi, DeepSeek and Gemini for the cheap work, Claude only for the hard stuff. The system learns which is which. More output, lower spend. This has been hard work, lots of experiments and fine tuning (more on it later); but it's well worth it.
It's all open source for anyone to use or adapt: http://github.com/alinaqi/maggy
And I've now built a hosted version - http://www.srooter.ai - to put a real governance layer on top: enforce best practices across a team and push engineering performance without growing token spend, all from inside Claude Code, Codex and the tools you already use. #AIEngineering
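To make the routing idea above concrete (cheap models for the easy work, a stronger model only for the hard stuff, with the system learning which is which), here is a minimal sketch. It is not taken from the linked repo: the `Router` class, the task "kinds", the thresholds and the model labels are all hypothetical placeholders, and a real setup would presumably use richer signals than a pass/fail history.

```python
from collections import defaultdict

# Placeholder labels, not real model or API identifiers (assumption).
CHEAP = "cheap-model"
STRONG = "strong-model"


class Router:
    """Toy router: try the cheap model first, escalate kinds of task it keeps failing."""

    def __init__(self, min_success_rate: float = 0.7, min_samples: int = 3):
        self.min_success_rate = min_success_rate
        self.min_samples = min_samples
        # task kind -> [successes, attempts] observed on the cheap model
        self.stats = defaultdict(lambda: [0, 0])

    def choose(self, kind: str) -> str:
        successes, attempts = self.stats[kind]
        if attempts < self.min_samples:
            # Not enough evidence yet: default to the cheap model.
            return CHEAP
        rate = successes / attempts
        return CHEAP if rate >= self.min_success_rate else STRONG

    def record(self, kind: str, success: bool) -> None:
        """Record whether the cheap model handled a task of this kind."""
        entry = self.stats[kind]
        entry[1] += 1
        if success:
            entry[0] += 1


router = Router()
for _ in range(3):
    router.record("rename-variable", True)     # cheap model copes fine
    router.record("cross-module-refactor", False)  # cheap model keeps failing

easy_choice = router.choose("rename-variable")
hard_choice = router.choose("cross-module-refactor")
```

In this sketch a kind of task that gets escalated never drifts back to the cheap model, because only cheap-model outcomes are recorded; a fuller version would probably re-test escalated kinds now and then.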

@AnthropicAI Sounds like your engineers haven't figured out Ship velocity =/= deployment, adoption velocity, or monetization velocity.
And nobody is asking for 8x code. Look around and smell the coffee.
Welcome to the real world.

Is the number of bugs proportional to the size of the codebase, or is it disproportionately higher? On average, how many times is each type of bug fixed? What percentage of the total lines of code (LOC) is dead code, and does this LOC count include only executable code or also blank lines and comments?
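To show why the last question matters, here is a rough sketch of how the same file yields different "LOC" numbers depending on what is counted. The `count_lines` helper is hypothetical and deliberately naive: it assumes Python-style `#` comments, treats a line with a trailing comment as code, and ignores docstrings and multi-line strings.

```python
def count_lines(source: str) -> dict:
    """Split a source file's lines into blank, comment-only, and code lines."""
    counts = {"blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith("#"):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


sample = "# header\n\nx = 1\ny = 2  # trailing comment\n"
result = count_lines(sample)
total_physical = sum(result.values())  # every physical line
total_code = result["code"]            # code lines only
```

Here the physical count is twice the code-only count, so a headline figure means little unless the counting rule is stated.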

@AnthropicAI I wrote an explanation in Japanese. Japanese breakdown here:

@DanAdvantage @jsuarez