Coinbase CEO Brian Armstrong predicts 80 percent of AI workloads will migrate to 99 percent cheaper models within 12 to 18 months

The numbers may be a bit extreme here, but unquestionably use-cases have to stratify in the next year or two between model families.

We’ll see a split between frontier intelligence for high-end work and much cheaper models for high-volume workloads that can be peeled off without hurting results. Frontier will still be far bigger than today because the use-cases will demand it, but the low end will get quite a bit larger as well.

The big update here is that the layer that can efficiently route the workload to the right model will then become increasingly valuable since that becomes one of the new hard problems in AI agents. Agent orchestration that can cost optimize while still performing the task successfully will be in a strong position.
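
One common way to frame this routing problem is a cascade: try the cheapest model first and escalate only when a check on its answer fails. The sketch below is a minimal illustration under that assumption, not a description of any particular product; the `Model` type, the `route` function, the per-call costs, and the toy models and validator are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_call: float          # hypothetical flat cost, arbitrary units
    run: Callable[[str], str]     # stand-in for a real inference call


def route(task: str, tiers: Sequence[Model], is_acceptable: Callable[[str], bool]):
    """Try models from cheapest to most expensive; stop at the first acceptable answer.

    Returns (model name, answer, total cost spent). If no tier passes the check,
    the most expensive tier's answer is returned anyway.
    """
    if not tiers:
        raise ValueError("at least one model tier is required")
    spent = 0.0
    for model in sorted(tiers, key=lambda m: m.cost_per_call):
        answer = model.run(task)
        spent += model.cost_per_call
        if is_acceptable(answer):
            break
    return model.name, answer, spent


# Toy stand-ins: the small model only handles short tasks and returns "" otherwise.
small = Model("small", 1.0, lambda t: t.upper() if len(t) < 20 else "")
frontier = Model("frontier", 100.0, lambda t: t.upper())

print(route("summarize this", [frontier, small], is_acceptable=bool))
print(route("a much longer and harder request", [frontier, small], is_acceptable=bool))
```

In this toy setup a failed cheap attempt still gets billed, so an escalated task costs 101 units rather than 100. That is why the quality of the acceptance check matters as much as the price gap between tiers.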

Brian Armstrong@brian_armstrong

Good take

My guess is

- demand for intelligence is near infinite

- but 80% of workloads will be running on 99% cheaper models within 12-18 months

- 20% of workloads will still run on latest gen models where IQ maxing is important (scientific breakthroughs, higher level orchestrator agents?)

- rough analogy might be what % of MacBooks or gaming PCs sold have the maxed out specs for CPU/GPU, prices are falling much faster than Moore's law here though

- this leads me to think the limiting factor will be energy and compute, not better models

At Coinbase we're working hard on routing prompts to cheaper models where appropriate, and in some cases have been able to keep costs roughly flat, while token usage continues to grow exponentially.
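
As a rough check on the arithmetic in that guess, the sketch below computes the blended cost if 80% of workloads move to models priced at 1% of the frontier rate. It assumes every workload consumes the same number of tokens and that cost scales linearly with price, neither of which the post claims; the function name `blended_cost` is illustrative.

```python
def blended_cost(cheap_share: float, cheap_price_ratio: float) -> float:
    """Cost relative to running everything on the frontier model (1.0 = no savings)."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    # Cheap workloads pay the discounted rate; the rest pay the full frontier rate.
    return cheap_share * cheap_price_ratio + (1.0 - cheap_share) * 1.0


relative = blended_cost(cheap_share=0.80, cheap_price_ratio=0.01)
print(f"relative cost: {relative:.3f}")              # 0.208
print(f"volume at flat spend: {1 / relative:.1f}x")  # 4.8x
```

Under those assumptions the remaining 20% of frontier workloads account for about 96% of the bill (0.2 / 0.208), and token volume could grow roughly 4.8x before total spend returns to its starting level.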

Tommy@Shaughnessy119

The most basic way AI could blow up imo. I'm not saying it does but this is the most obvious way I can see it happening

- Per seat subscriptions are massively subsidized. The flat fee was priced way below what heavy usage actually costs

- For real business use you have to move to the API anyway. Data protections, work integrations and compliance officer approval

- On the API you pay metered rates, and businesses are burning credits way faster than the per seat pricing ever led them to expect

- This is everywhere right now. Internally for us, Codex users, Uber torching its entire 2026 AI budget in 4 months, the Microsoft comments. Just go try an API

I shared more on this here:

- And I don't think most businesses have the money to keep paying increasing API rates without a real change to how they operate (caps needed)

- Because they have a cheap alternative. They can reach open source models through any aggregator (OpenRouter, Venice, Baseten, Together) and still get strong privacy. Venice private data centers, or E2EE/TEE serving GLM 5.1.

More on open source inference provider raises here:

- And the discount is enormous. DeepSeek V4 codes within a hair of Opus on SWE bench at roughly 1/30th the price, and the cheapest open models run closer to 1/100th

- Chinese labs open source frontier grade models. The model is the single biggest cost an inference provider has, and they get it for free

- This idea dies if China goes closed source. That is actually bullish for web2 AI labs, because if everyone is closed you pay up for the best intelligence. China goes closed source if they are tired of giving away an asset and they want the revenue and data flow to train new models

- Is this showing up in web2 AI lab revenue yet? No. Revenue is off the charts. Anthropic went from 9B to 47B run rate in five months

- So go forward, what happens?

- I think revenue slowly starts leaking to the open source inference providers (see Venice usage, OpenRouter's $113M raise, Baseten is raising at $11B or triple its valuation in three months, on revenue that went from $200M to $600M annualized in a single quarter)

- It doesn't move overnight, but it caps the labs' ability to raise prices, and margins are already deeply negative. OpenAI is reportedly running near negative 122%

- With margins that bad there is no cash flow, so the labs are fully dependent on outside capital to buy GPUs, train models, and keep subsidizing usage (e.g. see Google tapping an $80B equity sale, granted $30B of it is for employee RSU taxes. Clearly they think equity is overvalued or you wouldn't sell it)

- The break comes when that capital stops. Pricing is capped so margins can't improve, and the moment investors lose conviction on payback, the whole flow reverses

- Why would they lose conviction on payback? Back to the start - the inability to improve margins or get businesses to pay more

- This argument also has a limit: if we start making new drugs with AI or create entirely new businesses, you better believe people will pay up to the max for AI usage

Yash Patil@ypatil125

This is exactly right.

People are starting to look for cheaper model alternatives and realizing two things at once: open-source models are already very good, and the ability to train and serve them efficiently at scale can change the economics pretty meaningfully.

Tokens are still being subsidized, demand is ramping quickly, and the compute crunch is likely to persist. That will push companies toward using the right model for each task instead of defaulting to the most expensive one.

We’re still early, but I expect open-weight adoption to accelerate much faster than most people think.

Brendan (can/do)@BrendanFoody

The prerequisite to distillation and routing is a robust evaluation.

Every enterprise will need an eval for every agent deployment.
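
A minimal sketch of why the eval comes first: with a labeled set of cases per deployment, the cheapest model that clears an accuracy bar can be picked mechanically. Everything named here (`accuracy`, `cheapest_passing`, the toy models, the 0.9 threshold, and the prices) is a hypothetical illustration rather than anyone's actual tooling.

```python
from typing import Callable, Dict, List, Optional, Tuple

EvalCase = Tuple[str, str]  # (input, expected output)


def accuracy(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of eval cases where the model output exactly matches the expected one."""
    if not cases:
        raise ValueError("eval set must not be empty")
    return sum(model(x) == y for x, y in cases) / len(cases)


def cheapest_passing(models: Dict[str, Tuple[float, Callable[[str], str]]],
                     cases: List[EvalCase], threshold: float) -> Optional[str]:
    """Return the name of the lowest-priced model whose accuracy meets the threshold."""
    for name, (_price, fn) in sorted(models.items(), key=lambda kv: kv[1][0]):
        if accuracy(fn, cases) >= threshold:
            return name
    return None  # nothing passes: the eval says this task cannot be routed down yet


# Toy task: classify a support ticket as "refund" or "other".
cases = [("refund please", "refund"), ("I want my money back", "refund"),
         ("reset password", "other"), ("where is my refund", "refund")]
models = {
    "tiny": (0.01, lambda t: "refund" if "refund" in t else "other"),
    "mid": (0.10, lambda t: "refund" if ("refund" in t or "money back" in t) else "other"),
    "frontier": (1.00, lambda t: "refund" if ("refund" in t or "money back" in t) else "other"),
}
print(cheapest_passing(models, cases, threshold=0.9))  # mid
```

Here the cheapest model misses one of the four cases (0.75 accuracy), so the selection lands on the middle tier. Without the labeled cases there would be no basis for ruling the cheapest option out.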

Matthew Berman@MatthewBerman

I made an angel investment in @tomas_hk two years ago, believing model routing would become a massive business.

Now companies are spending $500m per month on tokens. Increased quality and decreased cost are not mutually exclusive.

Here's Tomas' essay about how Model Routing Will Control the Future of Economic Value

https://briefing.forwardfuture.ai/p/model-routing-will-control-the-future-of-economic-value

kache@yacineMTB

The frontier models are pretty retarded, and are only useful for chainsaw level tasks that don't require that much intelligence like mathematics

Julien Chaumond@julien_c

Model routing is growing a lot these days

Jerry Liu@jerryjliu0

I really do think we'll see a lot of value accrue in AI startups building "model routing as a service"

Not just OpenRouter - this includes a much broader set of verticalized agents and infrastructure.

- For us it's document infrastructure: parsing, extraction, search

- Another infra analogy is web search: Exa/Parallel

- For verticalized apps it could be anything from Cognition to Harvey

The frontier labs own the underlying models, and their main advantage at the app layer is the enormous amounts of $$ that they have; but their main disadvantage is that they only own a subset of all the points on the Pareto curve.

Speaking from our team's experience, it is both non-trivial and extremely important to find a point on the Pareto curve of accuracy and cost. There's an enormous amount of ML time spent on model evaluations and benchmarking, and infra time making sure that the service can scale reliably, without rate limits, and without blowing up cost. At the same time it's extremely important not just for cost reasons but also for latency, and oftentimes for long-tail accuracy in use cases that demand it.
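
To make the Pareto-curve idea concrete, the sketch below filters a set of (cost, accuracy) measurements down to the configurations that no other configuration beats on both axes. The configuration names and numbers are invented for illustration and do not come from any benchmark.

```python
from typing import Dict, List, Tuple


def pareto_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return names of configs on the (lower cost, higher accuracy) frontier.

    A config is dropped if some other config is at least as cheap and at least
    as accurate. Exact duplicates keep only the first one seen.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by cost ascending, breaking ties by accuracy descending, so each
    # config only needs to beat the best accuracy among cheaper-or-equal ones.
    for name, (_cost, acc) in sorted(points.items(), key=lambda kv: (kv[1][0], -kv[1][1])):
        if acc > best_accuracy:
            frontier.append(name)
            best_accuracy = acc
    return frontier


# Hypothetical (cost per 1k documents, accuracy) measurements.
configs = {
    "small-open": (0.5, 0.82),
    "mid-open": (2.0, 0.90),
    "mid-closed": (5.0, 0.88),   # costs more than mid-open and is less accurate
    "frontier": (15.0, 0.95),
}
print(pareto_frontier(configs))  # ['small-open', 'mid-open', 'frontier']
```

Which point on the resulting frontier to run is then a product decision: the filter only removes options that are strictly worse, and adding latency as a third axis would need a more general dominance check.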

Rohan Paul@rohanpaul_ai

Coinbase CEO Brian Armstrong thinks AI demand is almost limitless, but he expects 80% of workloads to shift to models that are 99% cheaper within 12-18 months.

Matthew Berman@MatthewBerman

"How do you predict the intelligence of an output with less intelligence than it costs to produce it?"

Model routing is an extremely hard problem.

Tomas Hernando Kofman@tomas_hk

My thoughts on the future of model routing and AI:

- We have not even scratched the surface of runaway inference costs

- Solving this requires intelligent model routing, especially as the inference landscape continues fragmenting. This is a *hard* problem.

- Naive solutions (turn-based routing, session routing) fail; routing successfully involves managing multiple cost surface areas in concert.

- Getting routing right means a more diverse market of providers, more power for consumers, reduced ecological impact, and improved effectiveness.