CNBC says enterprise shift from token volume to cost optimization threatens projected revenue growth at OpenAI and Anthropic · Digg

Tech · 9h ago

Story Overview

Enterprises once raced to burn through as many AI tokens as possible, especially in coding workflows, but are now tightening budgets and demanding measurable returns, a change that could curb the rapid revenue ramps previously forecast for OpenAI and Anthropic.

Original post

Gary Marcus@GaryMarcus · #178 in Tech

even polymarket sees it now

Polymarket@Polymarket

JUST IN: Businesses are shifting from AI “tokenmaxxing” to efficiency, threatening the explosive growth of OpenAI & Anthropic.

10:41 AM · Jun 26, 2026 · 5.5K Views

Cost Pressure

Spending caps arrive at large customers

Firms such as Uber have introduced monthly AI tiers after burning through annual budgets in just four months, while Lindy switched entirely to cheaper open-weight alternatives and expects millions in savings.

IPO Prep

IPO timing meets maturing demand

OpenAI and Anthropic both filed confidential IPO documents in early June, just as analysts noted that current growth rates are likely the fastest either company will ever see.

Sentiment

Many users welcome businesses like Coinbase shifting from AI tokenmaxxing to efficiency through smarter defaults, routing, and caching, because the shift cuts spend, avoids waste, and focuses on real value.

Pos: 78.3%

Neg: 21.7%

27 comments with sentiment.

Related links

OpenAI and Anthropic face new AI reality as companies shift from tokenmaxxing to efficiency

Via CNBC.COM

Posts from X

Most Activity

Views 86.6K · Bookmarks 464 · Likes 651 · Retweets 62 · Replies 78

Brian Armstrong@brian_armstrong

How to keep AI spend flat while token usage grows exponentially: Not with friction and spend alerts. With better defaults, routing, and caching.

Better Defaults (not Usage Caps) – Engineers can choose any model they want, but defaults matter. We’re experimenting with defaulting to open weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway, while still encouraging engineers to choose the right model for the task. 91% of our employees were never hitting their usage caps, so instead of lowering caps and driving up alerts, we're moving to cheaper defaults. Note that code reviews use a diversity of models, so they can check each other's work.

Better Routing – In our custom harnesses, we preprocess prompts and route to the best model for the job, considering cache hits and model pricing. For instance, you may want a frontier model for planning, but not for execution where they can be overkill. Ultimately, humans shouldn't be choosing models - AI can automate this task.

Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible. For example, our cache hit rate went from 5% → 60% in LibreChat once properly implemented.

Keep Context Lean – Start fresh sessions when switching tasks. Scope file context narrowly. Disconnect unused tools. Don't just compact. The goal isn't fewer tokens used, it's fewer tokens wasted.

Better Visibility – Our engineers can use as many tokens as they want, from whatever model they want, but we’ve made usage visible – and the more you spend on AI, the more impact we expect.

The goal isn't to suppress usage. It's to build the infrastructure that makes exponential growth sustainable.

Putting this into practice has cut our AI spend nearly in half, while our token usage continues to grow.

1h · 86.6K views · 651 likes · 464 bookmarks
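The routing and caching points in the post above can be illustrated with a minimal sketch. This is not Coinbase's gateway: the model names, prices, tiers, and the `route` and `estimated_cost` helpers are all hypothetical, and the only idea taken from the post is that a router can weigh model pricing against cache hits, reserving frontier models for planning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float         # USD per million uncached input tokens (hypothetical)
    cached_input_price: float  # USD per million cached input tokens (hypothetical)
    tier: str                  # "frontier" or "open"


# Hypothetical catalog; real prices and names will differ.
MODELS = [
    Model("frontier-x", input_price=10.0, cached_input_price=0.5, tier="frontier"),
    Model("open-y", input_price=1.0, cached_input_price=0.1, tier="open"),
]


def estimated_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimate input cost in USD, billing cached tokens at the cheaper rate."""
    cached = min(cached_tokens, prompt_tokens)
    uncached = prompt_tokens - cached
    return (uncached * model.input_price
            + cached * model.cached_input_price) / 1_000_000


def route(task_kind: str, prompt_tokens: int,
          warm_cache: dict[str, int], models: list[Model]) -> Model:
    """Pick the cheapest model allowed for the task.

    warm_cache maps a model name to the number of prompt tokens already
    cached for that model. Planning is restricted to frontier models;
    any other task kind may use any model.
    """
    allowed = [m for m in models
               if task_kind != "planning" or m.tier == "frontier"]
    return min(allowed,
               key=lambda m: estimated_cost(m, prompt_tokens,
                                            warm_cache.get(m.name, 0)))


if __name__ == "__main__":
    print(route("execution", 100_000, {}, MODELS).name)
    print(route("execution", 100_000, {"frontier-x": 100_000}, MODELS).name)
```

Under these made-up prices, a cold execution request goes to the cheaper open model, while a fully warm cache on the frontier model makes it the cheaper choice for the same request. The same `estimated_cost` helper also shows why hit rate matters: raising the cached share of a prompt lowers the bill without reducing the tokens processed.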

Harrison Chase@hwchase17

> Better Caching – Cache misses are the easiest way to drive your cost up. All of our requests are cache aware, so we’re reusing a warm cache wherever possible.

We do this for you in Deep Agents - see our blog on it here:

1h3.5K2214

clem 🤗@ClementDelangue

@brian_armstrong next step: post-training your own models based on open-source!

1h1.1K152

CNBC@CNBC

OpenAI and Anthropic face new AI reality as companies shift from tokenmaxxing to efficiency https://www.cnbc.com/2026/06/26/openai-anthropic-new-ai-spending-reality-as-users-shift-to-efficiency.html?taid=6a3e6cfd4493680001f19646&utm_campaign=trueanthem&utm_content=main&utm_medium=social&utm_source=twitter

14h25.5K8518

Smakosh@smakosh

@brian_armstrong

51m11411

DimaGaydar@gaidardima

@CNBC ALFRED's words are absolutely right! We were warned about AI back in 1992, ---"Sounds as if the human race could become quite expendable for AI." Stop AI

14h9011

Gary Marcus@GaryMarcus

@sam90860759 see my essay yesterday on the fizzle. i think maybe it is a fizzle rather than a pop.

4h591

Thokani 🕊️@thokani

@brian_armstrong brian have u checked out @AskSurplus its a inference marketplace on Base built by @mac_eth who you may know

might cut ur costs down!

1h1363

shams@sam90860759

@GaryMarcus Gary I agree with the ai bubble premise, but we are a little early. I think it pops closer to end of 2027.

4h28

Brendan Doyle@skinandbones44

@btsouth @brian_armstrong Claude desktop uses the same baseline harness as Claude code with the Claude agent sdk which just runs the Claude code binary. All of these known name harnesses have tool search built in now, it’s table stakes to even be a functional product after last several quarters.

42m6

Zach AL@zachdotai

@brian_armstrong this is exactly what @_adamr_1 built @TracerML for

1h672

Tyler@btsouth

"Disconnect unused tools" is the line most people skip, and it's the biggest lever in here. Every idle MCP server dumps its full tool list into context on every request, whether the agent calls it or not. You pay that tax before typing a word.

It's the whole reason we built @conduitmcp: one gateway that hands the agent 3 meta-tools to search on demand instead of every server's full catalog. Measured ~90% fewer tokens, same results.

1h1151
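The context cost described in the post above can be sketched roughly as follows. This is an illustration only, not how @conduitmcp or any MCP client works: the server names, tool descriptions, the four-characters-per-token estimate, and the `search_tools` helper are all assumptions made for the example.

```python
def tokens(text: str) -> int:
    """Crude estimate of about 4 characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


# Hypothetical servers and tool descriptions.
SERVERS = {
    "github": [
        "create_issue: open a new issue in a repository",
        "list_pull_requests: list open pull requests",
    ],
    "calendar": [
        "create_event: add an event to a calendar",
        "list_events: list upcoming events",
    ],
}


def catalog_overhead(servers: dict[str, list[str]]) -> int:
    """Tokens added to every request when each server's full tool list is in context."""
    return sum(tokens(desc) for tools in servers.values() for desc in tools)


def search_tools(servers: dict[str, list[str]], query: str,
                 limit: int = 3) -> list[str]:
    """Return only the tool descriptions matching the query, as an on-demand lookup might."""
    q = query.lower()
    hits = [desc for tools in servers.values() for desc in tools
            if q in desc.lower()]
    return hits[:limit]


if __name__ == "__main__":
    full = catalog_overhead(SERVERS)
    on_demand = sum(tokens(d) for d in search_tools(SERVERS, "issue"))
    print(full, on_demand)
```

The full catalog is paid on every request regardless of which tools are called, whereas the search path only adds the descriptions that match. The size of the saving depends entirely on how many servers are connected and how verbose their tool schemas are.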

Uptopia 🪼🕹️@Uptopia_xyz

@brian_armstrong 💪💪💙💙

1h164

Korey Niese@CoachKorey3

@brian_armstrong Cool beans 🫘

1h124

Florian Leibert 🎢@flo

@brian_armstrong @ClementDelangue I’m running both of these on mi300x amd and am super happy… imho better than current codex

26m121

Boba Network 🧋@bobanetwork

@brian_armstrong spittin bars

1h117

Trace Cohen@Trace_Cohen

@brian_armstrong @arnavbathla20 For now but it will inevitably 📈

31m87

Mcdonalds Boy@mcdonaldsboy_

@brian_armstrong Brian, just let it Ride...

1h75

Thermotopy@Thermotopy

@CNBC Dario, you, of all people, should have been able to resist this logic of power. What’s happening is exactly the worst nightmare. An oligarch-emperor who decides who will control the intelligence of the future—the ultimate power. OSP! (open source power ✊)

14h69

Macro Bombastic@MacroBombastic

@brian_armstrong Solid advice, mate. Cutting costs while scaling usage is the real move. Crypto bros been saying this for years.

1h66