Can someone explain to me how open source models can keep up if ...
- pre-training isn't saturated
- it costs $2-4B to train a current gen model
- distillation is increasingly hard as access to the most powerful models gets blocked
..?
AI Judge changed title after evaluation, original title: "a16z partner Martín Casado warns open-source AI faces critical threats from rising training costs and distillation blocks"
Critics argue that training a frontier model costs under $1 billion.
Many users defended open source AI models as customizable, cheap, and good enough for privacy-focused tasks, while others dismissed their competitiveness with frontier systems and criticized labs for conspiring on high margins.
> - it costs $2-4B to train a current gen model
I'd like to see the mafs on that. As far as I can tell, "current gen models" are at most (90th percentile) ≈6x DeepSeek V4 Pro in M(active) and 10x in D. That's maaaybe $1B. And I mean Mythos, not Opus/5.5; those are 2-3x cheaper.
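The arithmetic in the reply above can be sketched with the widely used approximation that training compute scales as C ≈ 6·N·D, reading M(active) as the active parameter count N and D as the number of training tokens. This is only a rough sketch: the function names are hypothetical, cost is assumed to scale linearly with compute, and the baseline cost it prints is what the $1B estimate would imply, not a figure stated in the thread.

```python
def compute_multiplier(param_ratio: float, token_ratio: float) -> float:
    """Relative training compute under C ~ 6*N*D (the constant 6 cancels in a ratio)."""
    return param_ratio * token_ratio


def implied_baseline_cost(target_cost_usd: float, multiplier: float) -> float:
    """Baseline run cost implied by a target cost, assuming cost is linear in compute."""
    return target_cost_usd / multiplier


# Ratios quoted in the reply: ~6x active parameters, ~10x training tokens.
multiplier = compute_multiplier(param_ratio=6.0, token_ratio=10.0)

# Assumption: the "maaaybe $1B" estimate is the target cost being scaled to.
baseline_usd = implied_baseline_cost(1e9, multiplier)

print(f"compute multiplier: {multiplier:.0f}x")
print(f"implied baseline run cost: ${baseline_usd / 1e6:.1f}M")
```

Under these assumptions the two ratios multiply to a 60x compute gap, so the $1B figure only holds if the baseline run cost is in the low tens of millions of dollars.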
@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia
preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.
The debate over whether open or closed models win comes down to whether marginally better intelligence has disproportionate value.
Believers in this sit across from the "open models will be good enough" camp.
Closed models will stay slightly smarter. Open models will be cheaper.
... also we know now that the frontier labs are focusing on autocatalytic processes of using models to make models (create GPU kernels, data cleaning etc.).
So autocatalysis will improve economies of scale.
... also, there is a ton of pricing power on the frontier by being marginally better than everyone else
@martin_casado the "killer app" of open models is customization, which is always going to be easier and cheaper than closed model customization.
open models are within N months of the real frontier, and closing the gap in-domain is cheap. big win for anything workflow-shaped at scale.
@grok can you summarize the discussion in comments of this post and the primary positions / discussion points?
@teortaxesTex General rule of thumb right now is 100k GPUs for 6 months. There are a lot of ways you can back into this number. But of course it's all from industry gossip around company raises, GPU procurement, gross burn, etc.
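One way to back into the number from this rule of thumb is to convert "100k GPUs for 6 months" into GPU-hours and ask what all-in hourly rate a $2-4B total would imply. The sketch below is illustrative only: the helper names are hypothetical, six months is assumed to be 182.5 days, and no actual GPU price is assumed.

```python
HOURS_PER_DAY = 24
DAYS_IN_SIX_MONTHS = 182.5  # assumption: half of a 365-day year


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for a run that keeps num_gpus busy for the given number of days."""
    return num_gpus * days * HOURS_PER_DAY


def implied_rate_usd_per_gpu_hour(total_cost_usd: float, num_gpus: int, days: float) -> float:
    """All-in hourly cost per GPU that a total budget would imply."""
    return total_cost_usd / gpu_hours(num_gpus, days)


hours = gpu_hours(100_000, DAYS_IN_SIX_MONTHS)
print(f"GPU-hours: {hours:,.0f}")

# The $2B and $4B totals are the range quoted in the original post.
for total in (2e9, 4e9):
    rate = implied_rate_usd_per_gpu_hour(total, 100_000, DAYS_IN_SIX_MONTHS)
    print(f"${total / 1e9:.0f}B total implies about ${rate:.2f} per GPU-hour")
```

Whether the implied rate is plausible depends on what it is taken to cover (hardware, power, networking, failed runs, staff), which the thread does not specify.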
@martin_casado 1) distillation hard to stop 2) other economically interested parties to backstop? (chip vendors, governments)
> But it’s hard to tell with all the very cheap capital flooding in.
with very high inference margins too
It is remarkable that frontier labs pretty much don't compete on cost. Like, they do, but with the shared understanding that >50% margin is sacrosanct. No involution allowed!
Lagging by 4 months doesn’t matter if the majority of the value is in being on the frontier, even if marginally. And scaling isn’t saturated. And there start to be positive economies of scale (Autocatalytic effects).
The market data seems to support this. But it’s hard to tell with all the very cheap capital flooding in.
1) I’m very interested in how hard it is in the limit. If I reserve my largest model and only use it directly for services (e.g., auditing a bank and issuing a report), the chances of distillation go way down. We’re already seeing signs of this: holding Mythos back, not showing reasoning traces, penalizing third parties, subsidizing first party, etc.
2) Yeah, for sure. Although to date this hasn’t really worked if viewed through market share. And capital requirements are getting increasingly onerous. If MSFT's and Meta's next models are closed (as the rumor goes), it’d seem we’re moving away from this.
@willccbb Nice answer.
@martin_casado training cost estimate here is way off... total cost to pretrain Arcee Trinity Large (400B) was ~$20 million including compute, salaries, data, etc.
https://www.interconnects.ai/p/arcee-ai-goes-all-in-on-open-models
It's a timing question. I expect for the next couple of years we'll continue to see only closed source models at the frontier because there are a lot more improvements that come with high costs. As that starts to stabilize, more of a chance for open source to catch up.
(But I still see Anthropic, OAI, other frontier labs positioned to do well even when that happens, the infra + agent functionality bundled with the model is meaningful)
your question was not about frontier labs... it was about open weight labs, which have a drastically different structure / approach.
Publicly-shared numbers for training costs at open labs are 100-1000X smaller than the number you cited. The gap in performance has remained relatively consistent, with public estimates recently finding that open labs are ~3-4 months behind the frontier.
First mover cost is way higher, open labs just have to replicate and not fall behind too much.
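To make the "100-1000X smaller" claim concrete, the quoted $2-4B range can be divided by those factors and compared with the ~$20 million Arcee Trinity Large figure cited elsewhere in the thread. This is a minimal sketch with a hypothetical helper name; it only restates numbers already given in the discussion.

```python
def scaled_range(low_usd: float, high_usd: float, min_factor: float, max_factor: float):
    """Return the (smallest, largest) cost after dividing a range by a range of factors."""
    return low_usd / max_factor, high_usd / min_factor


# $2-4B from the original post, 100-1000x from the reply above.
smallest, largest = scaled_range(2e9, 4e9, min_factor=100, max_factor=1000)
print(f"implied open-lab training cost: ${smallest / 1e6:.0f}M to ${largest / 1e6:.0f}M")

# ~$20M is the Arcee Trinity Large figure quoted in the thread.
arcee_cost_usd = 20e6
print("Arcee figure falls in range:", smallest <= arcee_cost_usd <= largest)
```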
@cwolferesearch yeah unfortunately it's not. I know nothing about Arcee Trinity. But I know a lot about the frontier labs and their training costs.
1) So, I go back and forth. History would suggest that when you have a few players and large capital costs, the industry will converge to an oligopoly rather than have margin-eroding competition. We saw this with cloud, telcos, chips, etc. Given how effective distillation is, I suspect the trend we're seeing toward less access is real.
2) Yup, very much agree.
1) fair enough! do you think this is a likely outcome in the limit? it seems like competition might force the FMs to keep releasing the best models they can afford
2) this seems like China's game for now until some of the US-funded options (Reflection etc) are ready. and nation-state budgets haven't come into the picture yet, agree?

@martin_casado Not my world, but I’m curious: if model weights
- are worth some $2B in pre-training alone
- are running in a dozen data centers owned by a half dozen companies
- fit on a thumb drive
can normal companies contain them?
@martin_casado Why would open model labs not deploy $2B to train? Many are well capitalized and generating substantial revenues through APIs and model licensing.
@martin_casado At the true frontier they don’t. They specialize in per-dollar performance and supporting a diverse set of use cases. Separate but overlapping ecosystems.