/Tech · 15h ago

Box CEO Aaron Levie and LlamaIndex founder Jerry Liu argue rising token costs make model routing a critical AI architecture layer

Open-weight options help enterprises optimize Pareto frontier costs.

#14

Original post

Aaron Levie@levie · #14 in Tech

Token costs are becoming one of the hottest topics for any enterprise I talk with right now. It’s very bullish for AI in general because it means these systems are being used at a scale that wasn’t contemplated before.

It also gives way to another form of differentiation that will emerge for the applied AI layer, which is model routing.

As tokens take on a significant amount of the cost of any given workflow, then companies will inevitably want to ensure that their dollars go into the most efficient use of tokens for the particular job at hand.

Frontier intelligence will always be relevant at the high end of tasks, like coding, legal and financial analysis, healthcare, and more. And dollars spent here will only go up over time. But, equally, you can peel off individual tasks to lower cost models (whether they’re from open weights vendors or the major labs) and deliver a more efficient end outcome.

To do this effectively, the applied AI layer needs to understand the workflows in their domain better than anyone else, and be able to mix and match models to different jobs. If you’re doing document extraction, you need to know which models perform better or worse for any given document type. If you’re doing legal analysis, you want to know which models perform various types of tasks best. And so on.

This will become one of the bigger differentiation points over time. The companies with the best evals, the best ability to route the workloads, and those that have business models directly aligned to customers’ financial goals, will be in a great position.
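
One way to picture the eval-driven routing described above is a rule that sends each task type to the cheapest model that clears a quality bar. The following is a minimal sketch under stated assumptions, not a description of any vendor's router: the model labels, per-task costs, eval scores, and the `route` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                      # hypothetical label, not a real model ID
    cost_per_task: float           # assumed average dollars per task
    eval_scores: dict[str, float]  # assumed eval score per task type, 0..1


def route(task_type: str, min_score: float, options: list[ModelOption]) -> ModelOption:
    """Pick the cheapest option whose eval score for task_type meets min_score.

    If no option qualifies, fall back to the highest-scoring option.
    """
    qualified = [m for m in options if m.eval_scores.get(task_type, 0.0) >= min_score]
    if qualified:
        return min(qualified, key=lambda m: m.cost_per_task)
    return max(options, key=lambda m: m.eval_scores.get(task_type, 0.0))


# Illustrative numbers only; real values would come from in-house evals.
OPTIONS = [
    ModelOption("frontier", 0.50, {"invoice_extraction": 0.97, "legal_analysis": 0.93}),
    ModelOption("mid_tier", 0.10, {"invoice_extraction": 0.96, "legal_analysis": 0.81}),
    ModelOption("open_weight", 0.02, {"invoice_extraction": 0.95, "legal_analysis": 0.70}),
]

print(route("invoice_extraction", 0.95, OPTIONS).name)  # open_weight
print(route("legal_analysis", 0.90, OPTIONS).name)      # frontier
```

The quality of the routing depends entirely on the eval table, which is why the post treats evals and domain knowledge as the differentiator rather than the routing rule itself.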

Chamath Palihapitiya@chamath

Your margin is my opportunity: AI version…

The biggest surprise of 2026 is that the capability gap between the best open-weight/source models and the best closed models has narrowed much faster than the pricing gap. The pricing gap remains enormous while the capability gap is quite narrow.

What does this mean in practice?

For a company consuming 1 billion input tokens and 1 billion output tokens per month:

GPT-5.5 Pro: ~$105,000
Claude Opus 4.8: ~$30,000
DeepSeek V4 Pro: ~$5,220
DeepSeek R1: ~$2,740

I asked ChatGPT what it thought about this and it answered as follows:

“If I were building a company today, the economic frontier would look roughly like:

DeepSeek V4 Pro / R1 for high-volume inference.

Claude Opus for premium agent workflows where reliability matters.

GPT-5.5 Pro only for workloads where its incremental capability demonstrably produces enough business value to justify a 20–40× token premium.”

Most CEOs have no idea that, instead of this nuanced approach, their teams are running amok internally by picking the most expensive models in most cases and burning through massive budgets with zero governance, auditability and control.

As control planes like our Software Factory become more standard, you can expect the run rate revenue growth of the frontier labs to go down meaningfully and the revenues of the open models to skyrocket.

Why? Because we can implement the nuanced approach above and be agnostic to model - instead focusing on customer intent, model task and cost management among other things.

11:02 AM · Jun 6, 2026 · 147.4K Views
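
The "20–40× token premium" in the quoted answer can be checked against the monthly figures in the post. The sketch below uses only those four quoted totals and the stated volume of 1 billion input plus 1 billion output tokens. The blended price per million tokens is a derived average across input and output, not a published list price, and the helper names are hypothetical.

```python
# Monthly totals as quoted in the post (USD, approximate).
MONTHLY_COST_USD = {
    "GPT-5.5 Pro": 105_000,
    "Claude Opus 4.8": 30_000,
    "DeepSeek V4 Pro": 5_220,
    "DeepSeek R1": 2_740,
}

# Volume stated in the post: 1B input + 1B output tokens per month.
TOTAL_TOKENS = 2_000_000_000


def premium(expensive: str, cheap: str) -> float:
    """How many times more the expensive model costs for the same volume."""
    return MONTHLY_COST_USD[expensive] / MONTHLY_COST_USD[cheap]


def blended_usd_per_million(model: str) -> float:
    """Average dollars per million tokens, mixing input and output equally."""
    return MONTHLY_COST_USD[model] / (TOTAL_TOKENS / 1_000_000)


print(f"{premium('GPT-5.5 Pro', 'DeepSeek V4 Pro'):.1f}x")  # 20.1x
print(f"{premium('GPT-5.5 Pro', 'DeepSeek R1'):.1f}x")      # 38.3x
print(blended_usd_per_million("Claude Opus 4.8"))           # 15.0
```

On these figures the GPT-5.5 Pro premium over the two DeepSeek models is roughly 20 to 38 times, which is consistent with the 20–40× range in the quoted answer.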

Sentiment

Positive users highlight open models narrowing the capability gap to frontier AI while retaining major price advantages and production viability, while negative users dismiss these claims as trash or cope and insult their promoters.

Positive: 44.7%

Negative: 55.3%

71 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 28K · BOOKMARKS 66 · LIKES 207 · REPLIES 16

Lisan al Gaib@scaling01

the "narrow capability gap" in question

let's put this to rest please I can't hear the coping anymore

Chamath Palihapitiya@chamath

Your margin is my opportunity: AI version…

8h · 28K · 207 · 66

RETWEETS 84

Chamath Palihapitiya@chamath

Your margin is my opportunity: AI version…

Gavin Baker@GavinSBaker

Quite a week for open-source AI. Especially American open-source. Nemotron 3 Ultra is the most important release in quite some time. And some really cool RL and fine-tuning work from Harvey.

17h · 508.5K · 1.4K · 1.3K

Mark Cuban@mcuban

@chamath It means most utilization of AI will be localized.

There will be millions of specialized models.

The big frontier models better add unique value or they won’t stay big.

Chamath Palihapitiya@chamath

Your margin is my opportunity: AI version…

6h12.7K6621

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

A thread with a good collection of hard/private/OOD evals where the Western frontier is comprehensively dunking on Chinese/open source models and it's not remotely close.

Lisan al Gaib@scaling01

the "narrow capability gap" in question

let's put this to rest please I can't hear the coping anymore

8h8.6K8324

Jerry Liu@jerryjliu0

No frontier lab will own every single point on the pareto frontier around cost/latency and accuracy. Even as the pareto frontier itself advances, there will always be points owned by open-weight models that are orders of magnitude cheaper than the frontier ones.

There's been a huge uptick in interest in model routing and cost optimization, for two reasons:
✅ Organizations are thinking more carefully about how to manage cost
✅ Every AI-native startup and VC is thinking about the best way to build a moat against the frontier labs (and raise their gross margins)

These topics are quite relevant for our mission at @llama_index - which is to build the document infrastructure for AI agents. We want to help unlock the trillions of pages of unstructured paperwork within every organization for agentic automation.

Organizations require both higher accuracy and orders of magnitude lower cost within their document OCR solutions compared to the frontier models. The pareto frontier of document OCR has a meaningful, exploitable gap beyond what is offered by the frontier VLMs. Exploiting this gap requires both AI expertise and an absolutely obsessive focus around how PDFs, .docx, .pptx, and other file formats work.

Chamath Palihapitiya@chamath

Your margin is my opportunity: AI version…

15h8.1K3416
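
The Pareto frontier invoked in the post above is the set of options that no other option beats on both cost and accuracy at once. Here is a minimal sketch of how that set can be computed. The option names, costs, and accuracy figures are hypothetical placeholders, not measurements of any real model or OCR product.

```python
def pareto_frontier(points: list[tuple[str, float, float]]) -> list[str]:
    """Return names of options not dominated on (lower cost, higher accuracy).

    points holds (name, cost, accuracy) tuples. After sorting by cost, an
    option stays only if it is more accurate than every cheaper option.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by cost ascending; on equal cost, consider the more accurate first.
    for name, cost, accuracy in sorted(points, key=lambda p: (p[1], -p[2])):
        if accuracy > best_accuracy:
            frontier.append(name)
            best_accuracy = accuracy
    return frontier


# Hypothetical (name, dollars per 1,000 pages, accuracy) values.
OPTIONS = [
    ("frontier_large", 50.0, 0.95),
    ("frontier_mid", 15.0, 0.93),
    ("overpriced", 20.0, 0.88),   # dominated: costs more and scores lower than frontier_mid
    ("open_large", 2.5, 0.90),
    ("open_small", 1.0, 0.84),
]

print(pareto_frontier(OPTIONS))
# ['open_small', 'open_large', 'frontier_mid', 'frontier_large']
```

In this toy data the cheap options stay on the frontier even though the most accurate option is the most expensive, which is the shape of the claim that open-weight models will always own some points.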

Yash Patil@ypatil125

This is the bet we made when we started AC.

We work with our customers to train specialized models to be on the pareto frontier.

Aaron Levie@levie

It also gives way to another form of differentiation that will emerge for the applied AI layer, which is model routing.

14h7.9K479

Lisan al Gaib@scaling01

Benchmarks: SWE-Marathon DeepSWE ExploitBench GBA-Eval

ProgramBench PencilPuzzleBench CritPT CAIS Text Leaderboard

WeirdML ALE-Bench

RuneScape-Bench CursorBench Toloka Arena

Arena Overall Text Leaderboard

Lisan al Gaib@scaling01

the "narrow capability gap" in question

let's put this to rest please I can't hear the coping anymore

8h1.6K1610

Mac@mac_eth

@chamath Those margins are my company's opportunity - http://surplusintelligence.ai offers great discounts relative to openrouter's price.

All time average discount per request was 80%; current daily median is a 65% discount.

If you have an AI business, a 50% discount means 2x the inference

16h720452

Yum⋆₊˚@yuhasbeentaken

this feels exactly like the early cloud era...

at first everyone just picked the easiest premium option and nobody cared because the bill was still small. then usage exploded, finance noticed, and suddenly governance became a product category. ai inference is going through the same cycle now. the winners won't be the teams using the most powerful model everywhere... it'll be the teams that can route work intelligently based on task, risk, latency, quality, and cost

15h77865

Jerry Liu@jerryjliu0

If you're interested in these problems, come talk to us or check out our careers page!

Contact: https://www.llamaindex.ai/contact Careers: https://www.llamaindex.ai/careers

15h3K26

1.08@ArcanesValor

@chamath The apples to apples comparison would be Deepseek to models of similar size and capability. The open models are not on the efficient frontier of price/capability once you benchmark against the most token intensive real-world tasks.

16h2.6K82

Lisan al Gaib@scaling01

don't get me started on long time horizon or long context bruh

Lisan al Gaib@scaling01

the "narrow capability gap" in question

let's put this to rest please I can't hear the coping anymore

8h1.5K122

Lisan al Gaib@scaling01

the best benchmark of course

Lisan al Gaib@scaling01

the "narrow capability gap" in question

let's put this to rest please I can't hear the coping anymore

8h2.2K181

Louie Peters@_LouiePeters

To better adjust for thinking token usage, a fair price comparison looks more like:

GPT-5.5 xhigh = $3357
Claude 4.8 Max = $4686
Deepseek V4 Pro Max = $268

*Cost to run artificial analysis benchmark.

I still personally find a far bigger gap between frontier models and open source in real world vs benchmarks though. The much larger training budgets for Claude/GPT are more obvious when you test the edge of capabilities and regularly run into edge cases not covered by public benchmarks.

The value of my time vs LLM tokens still makes it very hard to justify using an inferior LLM and spending more time guiding it and fixing its mistakes.

That said, super glad to see open weight models continue to progress and some great models out this week.

17h34152

Chamath Palihapitiya@chamath

@ArcanesValor Don’t be confused. These evals aren’t focused on “real world tasks”. That’s the fallacy.

Evals aren’t meant to do anything except normalize measurement - many of these evals can be esoteric and difficult vs valuable.

16h1.7K11

ZOYAN@MEM00063

A question on token cost: How much of the token spend is real workflow complexity, and how much comes from the model answering too early, too confidently, or outside the right scope? And how much comes from the AI not understanding the user’s real intent — especially non-technical users and businesses that don’t know how to provide context, constraints, or output format? Maybe token efficiency is not only model routing. Maybe a big part of it is model self-limitation and user-understanding architecture.

15h7052

Crepe Supreme@crepesupreme

@chamath Chamath's $2,740 vs $105,000 is real. The catch: cut per-token cost 10x and you often run 10x the workload. Google volume rose 330x when costs fell 100x. Bill held. Smart play is the barbell, not a wholesale switch. Thread:

17h52512
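
The arithmetic behind the post above can be written out under a deliberately simple model, assumed here for illustration: the bill is price per token times token volume, so the change in the bill is volume growth divided by the cost reduction. The `relative_bill` helper is hypothetical, and the 100x and 330x inputs are the poster's unverified figures.

```python
def relative_bill(cost_reduction: float, volume_growth: float) -> float:
    """New bill as a multiple of the old one, assuming bill = price * volume."""
    return volume_growth / cost_reduction


# Cut per-token cost 10x and run 10x the workload: the bill is unchanged.
print(relative_bill(10, 10))               # 1.0

# The post's second pair of figures: cost down 100x, volume up 330x.
print(round(relative_bill(100, 330), 1))   # 3.3
```

Under this model the 10x and 10x case leaves spend flat, while the 100x and 330x figures taken at face value imply spend rising about 3.3 times.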

1.08@ArcanesValor

@chamath The point is that most tokens will be consumed on long horizon tasks, not that the benchmark tasks are themselves real. People who have first-hand experience with coding agents find DeepSWE to be the closest reflection of actual capability.

16h34241

Chamath Palihapitiya@chamath

@yuhasbeentaken Agreed

13h63521