
SemiAnalysis posted data from 174,264 agentic coding sessions showing 42% of runtime spent on CPU tasks versus 58% on GPU inference, and highlighted the mismatch between per-core cloud pricing and per-token monetization.

Median per-turn time measured 5.13 seconds.


Original post

Zephyr#1695

SemiAnalysis@SemiAnalysis_#1173inTech

FACT ALERT 🚨: In modern agentic coding, 42% of the time is spent on CPU doing tool use such as editing files, running Bash scripts, running lints, etc. Traditional cloud computing charges $ per CPU core. In the agent economy, the business model is $ per token, so to increase token revenue you need to increase the amount of CPU power you have so that you can generate your tokens.

7:00 AM · May 23, 2026 · 177.5K Views

Sentiment

Some users are enthusiastic about optimization opportunities for agentic coding tools due to bottlenecks in existing apps, while others criticize the analysis as misguided or promotional and highlight local versus cloud distinctions.

Positive: 50.0% · Negative: 50.0%

4 comments with sentiment.


Posts from X

Most Activity

Views 38.5K · Bookmarks 113 · Likes 157 · Retweets 14 · Replies 17

Bojan Tunguz@tunguz

Here is one big reason why this matters. Time spent on non-LLM-inference tasks is only going to increase. However, the tools that these AI systems use are *very* inefficient and have been built from the ground up for CPU and human use. There is a huge untapped opportunity there to significantly improve those processes with AI agents in mind from the ground up.

SemiAnalysis@SemiAnalysis_

37d · 38.5K views · 157 likes · 113 bookmarks

Bojan Tunguz@tunguz

Very important.

SemiAnalysis@SemiAnalysis_

37d26.6K5141


Sakura Yuki@sakurayukiai

@SemiAnalysis_ The wild part is we spend months shaving microseconds off attention kernels, just for the agent to sit idle for 2 seconds waiting for a bash script to return stdout. Amdahl's law is undefeated.

37d49641
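The Amdahl's law point above can be made concrete with the 58% / 42% split from the original post. This is a minimal sketch, assuming the split applies uniformly to a session and that only the GPU inference share gets faster; the function name and the speedup factors are illustrative, not from the thread.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of runtime becomes `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


GPU_FRACTION = 0.58  # share of runtime on GPU inference, per the SemiAnalysis post

# Illustrative inference speedups (assumed values, not measurements).
for factor in (2, 10, 1000):
    overall = amdahl_speedup(GPU_FRACTION, factor)
    print(f"inference {factor}x faster -> session {overall:.2f}x faster")

# Ceiling if inference took no time at all: only the 42% tool share remains.
print(f"ceiling: {1.0 / (1.0 - GPU_FRACTION):.2f}x")
```

Under these assumptions the session can never run more than about 2.4x faster, however fast the kernels get, because the tool-use share is untouched.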

Mert · AI Architect@MertLovesAI

@SemiAnalysis_ ran a similar breakdown last month.

Sen et al swapped the harness on 116 agent tasks. same retriever, same model, 4x spread in tool calls. your 42% is a floor, not a ceiling.

37d53911

Martin Szerment@MartinSzerment

This tracks with what I see daily. I run agentic coding on a MacBook M1 and it handles it surprisingly well — because the real bottleneck isn't GPU horsepower, it's CPU doing file ops, bash, linting, all the tooling between turns.

42% CPU time means the best dev machine for agents isn't the one with the fattest GPU. It's the one you can work on for 10 hours straight from anywhere.

Portability > raw compute for this workflow.

37d8596

Denys Linkov@denyslinkov

@SemiAnalysis_ The latency of an OS context switch is tiny compared to LLM speed; you can probably run 100 agents on 2 vCPUs given current LLM latencies without issue.

A low level agent dispatch framework could handle process switches quite gracefully on a large cluster

37d5272
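Whether 100 agents fit on 2 vCPUs depends on how much of the tool-use time actually keeps a core busy, which the thread does not report. The sketch below is a back-of-the-envelope estimate under that stated assumption: `cpu_busy_share` is a hypothetical parameter (the fraction of tool time that is real computation rather than waiting), and queueing and bursts are ignored.

```python
def cores_demanded(agents: int, tool_share: float, cpu_busy_share: float) -> float:
    """Average number of busy cores if each agent computes for tool_share * cpu_busy_share of wall time."""
    return agents * tool_share * cpu_busy_share


def max_busy_share(agents: int, tool_share: float, cores: float) -> float:
    """Largest cpu_busy_share for which the average demand still fits on `cores`."""
    return cores / (agents * tool_share)


AGENTS = 100
TOOL_SHARE = 0.42  # share of runtime in tool use, per the SemiAnalysis post

# Worst case: every second of tool time is pure computation.
print(f"fully CPU-bound tools: {cores_demanded(AGENTS, TOOL_SHARE, 1.0):.0f} cores")

# Break-even point for the 2 vCPU claim.
print(f"2 vCPUs suffice only if busy share <= {max_busy_share(AGENTS, TOOL_SHARE, 2):.1%}")
```

Under these assumptions the claim holds only if tool calls are computing for less than about 5% of their wall time, so the answer hinges on the I/O-versus-compute question raised elsewhere in the replies.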

Bojan Tunguz@tunguz

@GoodmanAric

37d141

Aric Goodman@GoodmanAric

@tunguz It baffles the mind that we are using JSON files and .md files instead of designing custom primitives for these tasks, such as 256k GPU-direct blocks that can be streamed directly from the NVMe to the GPU without CPU processing.

37d131

B.E.D. Systems@bedsystems

This is the part most people are missing: agentic coding isn’t just faster, it changes the shape of the work itself. When the cost center moves from compute to actions, the entire stack reorganizes. You stop optimizing for raw horsepower and start optimizing for loop efficiency, context design, and tool orchestration.

The real leverage won’t come from bigger models. It’ll come from tighter systems.

37d51

Brian Costello@bpcostello

@tunguz Extremely. Also signals that new hardware is necessary.

37d16

Andrei Stan@andreiofstan

@SemiAnalysis_ Bash is extremely slow. Before, it didn't matter, since the latency was absorbed by a human's attention or coffee break. Now that you have agents, the economics fundamentally change: a single-percentage improvement in grep can translate into millions of dollars fleet-wide.

37d3831

WarChud@SheerC12972

@SemiAnalysis_ Stupid take to pump CPU stocks.

Why look at time and not utilization? Most of the CPU time is blocked by I/O, so buying more CPUs won't even help.

37d3721
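The distinction between time and utilization in the comment above can be checked per tool call by comparing wall-clock time with the CPU time the child process actually consumed. This is a minimal sketch, assuming a Unix-like system (the child CPU counters from `os.times()` are not populated on Windows); `measure_tool_call` is a hypothetical helper, and the sleeping child is a stand-in for an I/O-blocked tool, not a real agent workload.

```python
import os
import subprocess
import sys
import time


def measure_tool_call(argv):
    """Run a command and return (wall_seconds, cpu_seconds) for the child process."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    # CPU time charged to waited-for children: user plus system.
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu


# Stand-in for an I/O-blocked tool call: a child that only sleeps.
wall, cpu = measure_tool_call([sys.executable, "-c", "import time; time.sleep(0.3)"])
print(f"wall={wall:.2f}s cpu={cpu:.2f}s utilization={cpu / wall:.0%}")
```

A low utilization figure across real tool calls would support the I/O-bound reading; a figure near 100% would support the case for more CPU capacity.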

Alex Bortok@Bortok

@SemiAnalysis_ Besides, nobody is going to give CPU time for agents for free. So it is $/T + $/core

37d410

Ajjay Kannan@AjjayKannan

@tunguz @grok how differently would you have to do things to be very efficient?

37d3

Hassan Al-Farhan@HAF_tech

@SemiAnalysis_ We’re too focused on inference. What about tool latency, I/O, and scheduling? That’s a big part of the stack now.

37d362

James Lal@lightsofapollo2

@SemiAnalysis_ this is the part people miss when they only optimize the model inference layer. the agent spends half its life in a shell, and a cold or slow sandbox taxes every one of those tool calls. the environment the agent runs in is as much a perf lever as kv cache

37d315

GeekyRam@Ramvikas007

@SemiAnalysis_ @sailresearchco is building persistent CPU sandboxes for long-horizon tasks. Worth checking out.

https://www.sailresearch.com/news/introducing-sailboxes-persistent-sandboxes

37d280

terminally onλine εngineer@tekbog

@SemiAnalysis_ this is your funniest post to date, probably ever, i know you didn’t mean it but this is hilarious, good job

37d252

ttking@wngguling186710

@SemiAnalysis_ Calling it “CPU time” is misleading. That 2.15s is mostly I/O (file reads/writes), not CPU computation. Real bottleneck is tool execution + I/O latency.

37d531
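The 2.15 s figure in the comment above is consistent with applying the 42% share to the 5.13 s median per-turn time. The thread does not say how the figure was derived, so treating the aggregate share as if it applied to the median turn is an assumption of this sketch.

```python
MEDIAN_TURN_SECONDS = 5.13  # median per-turn time from the story summary
CPU_SHARE = 0.42            # share of time in tool use, per the SemiAnalysis post

# Assumption: the aggregate 42% share also holds for the median turn.
cpu_seconds = MEDIAN_TURN_SECONDS * CPU_SHARE
gpu_seconds = MEDIAN_TURN_SECONDS - cpu_seconds

print(f"tool time per turn: {cpu_seconds:.2f}s")
print(f"inference time per turn: {gpu_seconds:.2f}s")
```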