
a16z's Martín Casado argues $2B to $4B frontier training costs prevent open-source AI from competing with proprietary models

Maithra Raghu says proprietary models will lead for two years.

Original post

Can someone explain to me how open source models can keep up if ...
- pre-training isn't saturated
- it costs $2-4B to train a current gen model
- distillation is increasingly hard as access to the most powerful models gets blocked
.. ?

12:59 PM · May 30, 2026

@martin_casado At the true frontier they don’t - they specialize in per-dollar performance and supporting a diverse set of use cases. Separate but overlapping ecosystems.

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
9:36 PM · May 30, 2026 · 1.7K Views

It's a timing question. I expect for the next couple of years we'll continue to see only closed source models at the frontier because there are a lot more improvements that come with high costs. As that starts to stabilize, more of a chance for open source to catch up.

(But I still see Anthropic, OAI, other frontier labs positioned to do well even when that happens, the infra + agent functionality bundled with the model is meaningful)

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
8:20 PM · May 30, 2026 · 1.2K Views

I think we'll see a combination of (i) algorithmic + computational efficiencies reducing training costs -- won't be cheap by any means but less than expectations (ii) people willing to fund given the market opportunity -- the fact that open source is still active given how much costs have increased since ~4 years ago is pretty remarkable (iii) "indirect" distillation probably will stay for a long time

martin_casado @martin_casado

@maithra_raghu It's not clear if distillation is cut off, open source can catch up. Let's say at that point it costs $5b to train a competitive model. Who would pay that for something open source?

8:21 PM · May 30, 2026 · 837 Views
8:59 PM · May 30, 2026 · 57 Views

@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia

preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
9:28 PM · May 30, 2026 · 7.7K Views

@martin_casado the "killer app" of open models is customization, which is always going to be easier and cheaper than closed model customization.

open models are within N months of the real frontier, and closing the gap in-domain is cheap. big win for anything workflow-shaped at scale.

will brown @willccbb

@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia

preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.

9:28 PM · May 30, 2026 · 7.7K Views
9:32 PM · May 30, 2026 · 1.3K Views

> - it costs $2-4B to train a current gen model

I'd like to see the mafs on that. As far as I can tell, "current gen models" are at most (90th percentile) ≈6x DeepSeek V4 Pro in M(active) and 10x in D. That's maaaybe $1B. And I mean Mythos, not Opus/5.5; those are 2-3x cheaper.

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
10:07 PM · May 30, 2026 · 41.6K Views
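
The pushback above on the $2-4B figure rests on a scaling argument that can be sketched in a few lines. This is only a rough illustration: it assumes the common rule of thumb that training compute is roughly proportional to active parameters times training tokens (C ≈ 6·N·D), reads "M(active)" as active parameter count and "D" as training tokens, and assumes a constant price per unit of compute. None of that is spelled out in the thread, and the function names are hypothetical.

```python
# Assumption (not stated in the thread): training compute C ≈ 6 * N_active * D,
# so the constant 6 cancels when comparing two models by ratio.

def compute_multiplier(param_ratio: float, data_ratio: float) -> float:
    """Relative training compute of a model with `param_ratio` times the
    active parameters and `data_ratio` times the training tokens."""
    return param_ratio * data_ratio


def implied_baseline_cost(target_cost_usd: float,
                          param_ratio: float,
                          data_ratio: float) -> float:
    """Baseline-model cost implied by a target cost, assuming cost scales
    linearly with compute (constant price per FLOP, an assumption)."""
    return target_cost_usd / compute_multiplier(param_ratio, data_ratio)


# Figures quoted in the reply: ~6x active parameters, ~10x data, "maaaybe $1B".
multiplier = compute_multiplier(6.0, 10.0)
baseline = implied_baseline_cost(1e9, 6.0, 10.0)

print(f"compute multiplier: {multiplier:.0f}x")
print(f"implied baseline run cost: ${baseline / 1e6:.1f}M")
```

Under these assumptions the quoted ratios imply roughly 60x the baseline's compute, so a ~$1B estimate corresponds to a baseline run in the high teens of millions of dollars. The thread itself does not state the baseline's cost.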

... also, there is a ton of pricing power on the frontier by being marginally better than everyone else

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
8:00 PM · May 30, 2026 · 11.4K Views

@maithra_raghu It's not clear if distillation is cut off, open source can catch up. Let's say at that point it costs $5b to train a competitive model. Who would pay that for something open source?

Maithra Raghu @maithra_raghu

It's a timing question. I expect for the next couple of years we'll continue to see only closed source models at the frontier because there are a lot more improvements that come with high costs. As that starts to stabilize, more of a chance for open source to catch up. (But I still see Anthropic, OAI, other frontier labs positioned to do well even when that happens, the infra + agent functionality bundled with the model is meaningful)

8:20 PM · May 30, 2026 · 1.2K Views
8:21 PM · May 30, 2026 · 837 Views

... also we know now that the frontier labs are focusing on autocatalytic processes of using models to make models (create GPU kernels, data cleaning etc.).

So autocatalysis will improve economies of scale.

martin_casado @martin_casado

... also, there is a ton of pricing power on the frontier by being marginally better than everyone else

8:00 PM · May 30, 2026 · 11.4K Views
9:02 PM · May 30, 2026 · 9.9K Views

@cwolferesearch yeah unfortunately it's not. I know nothing about Arcee Trinity. But I know a lot about the frontier labs and their training costs.

Cameron R. Wolfe, Ph.D. @cwolferesearch

@martin_casado training cost estimate here is way off... total cost to pretrain arcee trinity large (400B) was ~$20 million including compute, salaries, data, etc. https://www.interconnects.ai/p/arcee-ai-goes-all-in-on-open-models

9:26 PM · May 30, 2026 · 1.2K Views
9:37 PM · May 30, 2026 · 800 Views

@willccbb Nice answer.

will brown @willccbb

@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia

preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.

9:28 PM · May 30, 2026 · 7.7K Views
9:42 PM · May 30, 2026 · 4.5K Views

Well no, it was about how open weight labs can keep up with frontier labs. Which are in the billions per run. And they are limiting access to the largest models. And they are starting to employ autocatalytic features. And the market is showing preference for marginally better models.

Cameron R. Wolfe, Ph.D. @cwolferesearch

your question was not about frontier labs... it was about open weight labs, which have a drastically different structure / approach. Publicly-shared numbers for training costs at open labs are 100-1000X smaller than the number you cited. The gap in performance has remained relatively consistent, with public estimates recently finding that open labs are ~3-4 months behind the frontier. First mover cost is way higher, open labs just have to replicate and not fall behind too much.

9:44 PM · May 30, 2026 · 219 Views
10:25 PM · May 30, 2026 · 202 Views

@teortaxesTex @sun_hanchi Often multiple models are released around a single pre-training.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

@martin_casado @sun_hanchi nobody's doing it for 6 months, the model will be obsolete on release

10:11 PM · May 30, 2026 · 207 Views
10:36 PM · May 30, 2026 · 198 Views

@grok can you summarize the discussion in comments of this post and the primary positions / discussion points?

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
10:45 PM · May 30, 2026 · 9.5K Views

1) I’m very interested in how hard in the limit. If I reserve my largest model and only use it directly for services (eg auditing a bank and issuing a report), the chances of distillation go way down. We’re already seeing signs of this. Holding Mythos back. Not showing reasoning traces. Penalizing third party. Subsidizing first party etc.

2) Yeah for sure. Although to date this hasn’t really worked if viewed through market share. And capital requirements are getting increasingly onerous. If MSFT and Meta’s next models are closed (as the rumor goes) it’d seem we’re moving away from this.

Sonya Huang 🐥 @sonyatweetybird

@martin_casado 1) distillation hard to stop 2) other economically interested parties to backstop? (chip vendors, governments)

9:01 PM · May 30, 2026 · 1.3K Views
3:19 AM · May 31, 2026 · 668 Views

1) So, I go back and forth. History would suggest that when you have a few players, and large capital costs, the industry will converge to an oligopoly rather than have margin-eroding competition. We saw this with cloud, telcos, chips, etc. Given how effective distillation is, I suspect the trend we're seeing to less access is real.

2) Yup, very much agree.

Sonya Huang 🐥 @sonyatweetybird

1) fair enough! do you think this is a likely outcome in the limit? it seems like competition might force the FMs to keep releasing the best models they can afford 2) this seems like china's game for now until some of the us-funded options (reflection etc) are ready. and nation-state budgets haven't come into the picture yet, agree?

3:42 AM · May 31, 2026 · 289 Views
3:59 AM · May 31, 2026 · 153 Views

@cwolferesearch Oh for sure. I think the number actually is about 2-5% of the cost to build a "close enough" model. However the question remains whether being an epsilon better results in sufficient pricing leverage to take the market. Thus far it has. Will that continue? I don't know.

Cameron R. Wolfe, Ph.D. @cwolferesearch

open labs will likely stay 3-6 months behind frontier labs (as has been the case for some time), and they will probably do this while spending 100X less money than frontier labs (as has been the case for some time). Arcee is one example, but also see DeepSeek, Olmo, Nemotron, etc. All of these open models publicize their training costs, the info is openly available.

3:37 AM · May 31, 2026 · 46 Views
4:01 AM · May 31, 2026 · 23 Views

@cwolferesearch Totally. I've just been surprised how little this is actually reflected in the market.

Cameron R. Wolfe, Ph.D. @cwolferesearch

@martin_casado also worth mentioning the reverse question - if open weight labs can replicate $4B worth of research efforts with ~$20M, do frontier labs have a capital efficiency problem?

3:31 AM · May 31, 2026 · 83 Views
4:02 AM · May 31, 2026 · 64 Views
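
The multiples quoted in this exchange are easy to cross-check against each other. Below is a rough sketch using only figures that appear in the thread ($2-4B per frontier run, ~$20M for Arcee Trinity Large, "100-1000X smaller", "2-5%"). The helper names are hypothetical, and the dollar figures are the posters' claims, not verified costs.

```python
# All dollar figures are claims quoted in the thread, not verified numbers.
FRONTIER_RUN_USD = (2_000_000_000, 4_000_000_000)  # "$2-4B" per run
ARCEE_TRINITY_USD = 20_000_000                     # "~$20 million" all-in


def multiple_over(cost_usd, frontier=FRONTIER_RUN_USD):
    """How many times larger the frontier range is than `cost_usd`."""
    low, high = frontier
    return low / cost_usd, high / cost_usd


def scaled_down(divisor_small, divisor_large, frontier=FRONTIER_RUN_USD):
    """Dollar range implied by 'N-M times smaller' than the frontier range."""
    low, high = frontier
    return low / divisor_large, high / divisor_small


def percent_of(pct_low, pct_high, frontier=FRONTIER_RUN_USD):
    """Dollar range implied by 'pct_low-pct_high %' of the frontier range."""
    low, high = frontier
    return low * pct_low / 100, high * pct_high / 100


arcee_multiple = multiple_over(ARCEE_TRINITY_USD)   # frontier vs. Arcee
open_lab_range = scaled_down(100, 1000)             # "100-1000X smaller"
close_enough_range = percent_of(2, 5)               # "2-5%" of the cost

print("frontier / Arcee:", arcee_multiple)
print("100-1000X smaller:", open_lab_range)
print("2-5% of frontier:", close_enough_range)
```

On these numbers the ~$20M Arcee figure sits 100x to 200x below the quoted frontier range, which falls inside the "100-1000X smaller" band ($2M to $40M), while the "2-5%" estimate corresponds to $40M to $200M.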

@martin_casado That’s not a frontier game but cost-efficiency + very likely a subagent market. Current debate over tokenomics is very reminiscent of early open source.

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
9:53 PM · May 30, 2026 · 458 Views

I originally thought this S curve meant we should be bullish on OS models, and I still do to some extent

But I think it's underrated how much there may be an increasing frontier of complexity that can give value in many knowledge work domains

And if that's true, then to keep up in a competitive situation (eg ultimately economic competition between countries), you need to stay on that frontier, because everything below that creates differentially less value

4:21 AM · May 31, 2026 · 23 Views

That said, it may take some time to get to the point where gains in a competitive market concentrate heavily at the frontier, because the models are improving fast enough that they just unlock a huge bucket of low hanging fruit.

It seems more plausible we see a period of automation in any given profession over eg 5 years, before further efficiency gains and thus competitive advantage becomes closer to frontier-only, if there is indeed a high frontier of complexity.

Herbie Bradley @herbiebradley

I originally thought this S curve meant we should be bullish on OS models, and I still do to some extent

But I think it's underrated how much there may be an increasing frontier of complexity that can give value in many knowledge work domains

And if that's true, then to keep up in a competitive situation (eg ultimately economic competition between countries), you need to stay on that frontier, because everything below that creates differentially less value

4:21 AM · May 31, 2026 · 23 Views
4:26 AM · May 31, 2026 · 14 Views


@martin_casado training cost estimate here is way off... total cost to pretrain arcee trinity large (400B) was ~$20 million including compute, salaries, data, etc.

interconnects.ai
Arcee AI goes all-in on open models built in the U.S.
Interconnects interview #16 to celebrate the release of Trinity Large.

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
9:26 PM · May 30, 2026 · 1.2K Views

your question was not about frontier labs... it was about open weight labs, which have a drastically different structure / approach.

Publicly-shared numbers for training costs at open labs are 100-1000X smaller than the number you cited. The gap in performance has remained relatively consistent, with public estimates recently finding that open labs are ~3-4 months behind the frontier.

First mover cost is way higher, open labs just have to replicate and not fall behind too much.

martin_casado @martin_casado

@cwolferesearch yeah unfortunately it's not. I know nothing about Arcee Trinity. But I know a lot about the frontier labs and their training costs.

9:37 PM · May 30, 2026 · 800 Views
9:44 PM · May 30, 2026 · 219 Views

@martin_casado also worth mentioning the reverse question - if open weight labs can replicate $4B worth of research efforts with ~$20M, do frontier labs have a capital efficiency problem?

martin_casado @martin_casado

Well no, it was about how open weight labs can keep up with frontier labs. Which are in the billions per run. And they are limiting access to the largest models. And they are starting to employ autocatalytic features. And the market is showing preference for marginally better models.

10:25 PM · May 30, 2026 · 202 Views
3:31 AM · May 31, 2026 · 83 Views

open labs will likely stay 3-6 months behind frontier labs (as has been the case for some time), and they will probably do this while spending 100X less money than frontier labs (as has been the case for some time). Arcee is one example, but also see DeepSeek, Olmo, Nemotron, etc. All of these open models publicize their training costs, the info is openly available.

martin_casado @martin_casado

Well no, it was about how open weight labs can keep up with frontier labs. Which are in the billions per run. And they are limiting access to the largest models. And they are starting to employ autocatalytic features. And the market is showing preference for marginally better models.

10:25 PM · May 30, 2026 · 202 Views
3:37 AM · May 31, 2026 · 46 Views

@martin_casado yeah I agree - some of these trends are really hard to understand, even with the public info a lot of what you can read is contradictory. kind of just have to wait and see

martin_casado @martin_casado

@cwolferesearch Totally. I've just been surprised how little this is actually reflected in the market.

4:02 AM · May 31, 2026 · 64 Views
4:29 AM · May 31, 2026 · 78 Views

@martin_casado yeah I agree

martin_casado @martin_casado

@cwolferesearch Oh for sure. I think the number actually is about 2-5% of the cost to build a "close enough" model. However the question remains whether being an epsilon better results in sufficient pricing leverage to take the market. Thus far it has. Will that continue? I don't know.

4:01 AM · May 31, 2026 · 23 Views
4:30 AM · May 31, 2026 · 13 Views

@martin_casado Why would open model labs not deploy $2B to train? Many are well capitalized and generating substantial revenues through APIs and model licensing.

martin_casado @martin_casado

Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?

7:59 PM · May 30, 2026 · 159.2K Views
4:09 AM · May 31, 2026 · 375 Views