/Tech · 13h ago

Analysis argues proprietary models cost up to 40 times more than open-weight alternatives despite narrowing capability gaps

Aria's creator says specialized tasks do not require premium reasoning

Original post

xlr8harder @xlr8harder · #1048 in Tech

@scaling01 The gap is irrelevant for many use cases; all that matters is whether the model can do the specific task you need done.

I don't need a model that can solve a millennium prize problem to handle tier 1 customer support.

Lisan al Gaib@scaling01

the "narrow capability gap" in question

let's put this to rest please I can't hear the coping anymore

8:11 AM · Jun 8, 2026 · 479 Views

Sentiment

Users praised open-weight models like DeepSeek for strong performance at lower cost than Claude, while others called the claims inaccurate or objected to sending data to Chinese services.

Positive: 32.4%

Negative: 67.6%

20 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 31.2K · Bookmarks: 26 · Likes: 115 · Replies: 20

Mark Cuban@mcuban

@chamath It means most utilization of AI will be localized.

There will be millions of specialized models.

The big frontier models better add unique value or they won’t stay big.

2d · 31.2K views · 115 likes · 26 bookmarks

Retweets: 179

Chamath Palihapitiya@chamath

Your margin is my opportunity: AI version…

The biggest surprise of 2026 is that the capability gap between the best open-weight/source models and the best closed models has narrowed much faster than the pricing gap. The pricing gap remains enormous while the capability gap is quite narrow.

What does this mean in practice?

For a company consuming 1 billion input tokens and 1 billion output tokens per month:

GPT-5.5 Pro: ~$105,000

Claude Opus 4.8: ~$30,000

DeepSeek V4 Pro: ~$5,220

DeepSeek R1: ~$2,740

I asked ChatGPT what it thought about this and it answered as follows:

“If I were building a company today, the economic frontier would look roughly like:

DeepSeek V4 Pro / R1 for high-volume inference.

Claude Opus for premium agent workflows where reliability matters.

GPT-5.5 Pro only for workloads where its incremental capability demonstrably produces enough business value to justify a 20–40× token premium.”

Most CEOs have no idea that, instead of this nuanced approach, their teams are running amok internally by picking the most expensive models in most cases and burning through massive budgets with zero governance, auditability and control.

As control planes like our Software Factory become more standard, you can expect the run rate revenue growth of the frontier labs to go down meaningfully and the revenues of the open models to skyrocket.

Why? Because we can implement the nuanced approach above and be agnostic to model - instead focusing on customer intent, model task and cost management among other things.
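The arithmetic behind the monthly figures and the "20–40× token premium" quoted in the post can be checked with a short sketch. The monthly totals below are copied from the post; the function names and the per-million-token prices in the usage line ($0.55 input, $2.19 output) are assumptions for illustration, chosen only because they reproduce the quoted ~$2,740 figure, and are not confirmed by the post.

```python
# Monthly totals quoted in the post (USD), for 1B input + 1B output tokens.
QUOTED_MONTHLY_USD = {
    "GPT-5.5 Pro": 105_000,
    "Claude Opus 4.8": 30_000,
    "DeepSeek V4 Pro": 5_220,
    "DeepSeek R1": 2_740,
}


def monthly_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Return the USD cost for the given token counts and per-million-token prices."""
    return (input_tokens / 1e6) * usd_per_m_input + (output_tokens / 1e6) * usd_per_m_output


def premium(model, baseline):
    """Return how many times more the quoted bill for `model` is than for `baseline`."""
    return QUOTED_MONTHLY_USD[model] / QUOTED_MONTHLY_USD[baseline]


if __name__ == "__main__":
    # Hypothetical prices (assumption): $0.55 per 1M input, $2.19 per 1M output tokens.
    print(f"Example bill: ${monthly_cost(1e9, 1e9, 0.55, 2.19):,.0f}")
    for model in ("GPT-5.5 Pro", "Claude Opus 4.8"):
        for baseline in ("DeepSeek V4 Pro", "DeepSeek R1"):
            print(f"{model} vs {baseline}: {premium(model, baseline):.1f}x")
```

On the quoted totals, GPT-5.5 Pro comes out at roughly 20 times the DeepSeek V4 Pro bill and roughly 38 times the DeepSeek R1 bill, which is consistent with the 20–40× range cited in the post.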

Gavin Baker@GavinSBaker

Quite a week for open-source AI. Especially American open-source. Nemotron 3 Ultra is the most important release in quite some time. And some really cool RL and fine-tuning work from Harvey.

2d · 799.2K views · 1.9K likes · 1.8K bookmarks

Yum⋆₊˚@yuhasbeentaken

this feels exactly like the early cloud era...

at first everyone just picked the easiest premium option and nobody cared because the bill was still small. then usage exploded, finance noticed, and suddenly governance became a product category. ai inference is going through the same cycle now. the winners won't be the teams using the most powerful model everywhere... it'll be the teams that can route work intelligently based on task, risk, latency, quality, and cost

2d · 3.6K views
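Routing work "based on task, risk, latency, quality, and cost" could look something like the minimal sketch below. Everything in it is hypothetical: the tier names, quality scores, latencies, prices, and the `risk_margin` parameter are illustrative assumptions, not measurements of any real model or the design of any product mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    quality: float           # hypothetical 0-1 score from your own evals
    latency_ms: int          # hypothetical typical response latency
    usd_per_m_tokens: float  # hypothetical blended price per 1M tokens


@dataclass(frozen=True)
class Task:
    kind: str
    min_quality: float
    max_latency_ms: int
    high_risk: bool = False


# Illustrative tiers only; the numbers are made up for the example.
TIERS = [
    ModelTier("open-weight", quality=0.80, latency_ms=800, usd_per_m_tokens=1.0),
    ModelTier("premium", quality=0.90, latency_ms=1200, usd_per_m_tokens=15.0),
    ModelTier("frontier", quality=0.97, latency_ms=3000, usd_per_m_tokens=60.0),
]


def route(task, tiers=TIERS, risk_margin=0.1):
    """Pick the cheapest tier that meets the task's quality and latency needs.

    High-risk tasks get an extra quality margin. If no tier qualifies,
    fall back to the highest-quality tier.
    """
    required = min(1.0, task.min_quality + (risk_margin if task.high_risk else 0.0))
    candidates = [
        t for t in tiers
        if t.quality >= required and t.latency_ms <= task.max_latency_ms
    ]
    if not candidates:
        return max(tiers, key=lambda t: t.quality)
    return min(candidates, key=lambda t: t.usd_per_m_tokens)


if __name__ == "__main__":
    print(route(Task("tier1_support", min_quality=0.7, max_latency_ms=2000)).name)
    print(route(Task("coding_agent", min_quality=0.85, max_latency_ms=5000, high_risk=True)).name)
```

The design choice here is to treat cost as the thing to minimize only after quality, risk, and latency constraints are met, so that a cheap model is used by default and a more expensive one only when the task demands it.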

1.08@ArcanesValor

@chamath The apples-to-apples comparison would be DeepSeek to models of similar size and capability. The open models are not on the efficient frontier of price/capability once you benchmark against the most token-intensive real-world tasks.

2d · 7.9K views

Mac@mac_eth

@chamath Those margins are my company's opportunity - http://surplusintelligence.ai offers great discounts relative to openrouter's price.

All time average discount per request was 80%; current daily median is a 65% discount.

If you have an AI business, a 50% discount means 2x the inference

2d · 1.5K views

Chamath Palihapitiya@chamath

@ArcanesValor Don’t be confused. These evals aren’t focused on “real world tasks”. That’s the fallacy.

Evals aren’t meant to do anything except normalize measurement - many of these evals can be esoteric and difficult vs valuable.

2d · 5.1K views

1.08@ArcanesValor

@chamath The point is that most tokens will be consumed on long-horizon tasks, not that the benchmark tasks are themselves real. People who have first-hand experience with coding agents find DeepSWE to be the closest reflection of actual capability.

2d · 1.2K views

Chamath Palihapitiya@chamath

@abhav_k no paywalls for secrets is like giving a loaded gun to a child

2d · 8.4K views

Louie Peters@_LouiePeters

To better adjust for thinking token usage, a fair price comparison looks more like:

GPT-5.5 xhigh = $3357

Claude 4.8 Max = $4686

DeepSeek V4 Pro Max = $268

*Cost to run the Artificial Analysis benchmark.

I still personally find a far bigger gap between frontier models and open source in real-world use than in benchmarks, though. The much larger training budgets for Claude/GPT are more obvious when you test the edge of capabilities and regularly run into edge cases not covered by public benchmarks.

The value of my time vs LLM tokens still makes it very hard to justify using an inferior LLM and spending more time guiding it and fixing its mistakes.

That said, super glad to see open weight models continue to progress and some great models out this week.

2d

Chamath Palihapitiya@chamath

@yuhasbeentaken Agreed

2d · 3.5K views

Phillip Lord@philliplord1

@chamath 95% of the capability for 5% of the cost usually wins.

Economics beats benchmarks.

2d

Abhav@abhav_k

@chamath bro why are your substack blogs paywalled aren’t you a billionaire?

2d · 4.8K views

Crepe Supreme@crepesupreme

@chamath Chamath's $2,740 vs $105,000 is real. The catch: cut per-token cost 10x and you often run 10x the workload. Google volume rose 330x when costs fell 100x. Bill held. Smart play is the barbell, not a wholesale switch. Thread:

2d

Tomas Hernando Kofman@tomas_hk

@chamath Hey @chamath, couldn't agree more: I've been working on intelligent model routing for the past 2 years. We power auto-routing for OpenRouter and are working with F100 leaders on intelligent model routing for coding agents: http://notdiamond.ai

2d

Kevin Simback 🍷@KSimback

@chamath Posted the enterprise cost optimization playbook here; routing is just the first and easiest step.

2d · 1.2K views

Mark Cuban@mcuban

@chamath @abhav_k 😆😆😆

2d · 5.1K views

Grease@greasemeta

@mcuban @chamath Like what? What would you like to see?

2d

neekhil vatsa@garfieldII

@chamath For ease of understanding of what a true open source is.

2d

Nico@DonAndNico

Most companies are not losing on AI because they picked the wrong model. They are losing because token spend is climbing while almost none of the work is tied to revenue, margin, or cost reduction. Enterprise AI should be led by small technical agent teams with evals, traces, token budgets, and a named ROI target.

2d

Lisan al Gaib@scaling01

@xlr8harder that's true

but most use it for programming where the skill ceiling is very high

12h