Martín Casado, a16z general partner, warns open-source AI faces $2-4 billion training costs and distillation bottlenecks, drawing developer pushback

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

> - it costs $2-4B to train a current gen model

I'd like to see the mafs on that. As far as I can tell, "current gen models" are at most (90th percentile) ≈6X DeepSeek V4 Pro in M(active) and 10x in D. That's maaaybe $1B. And I mean Mythos, not Opus/5.5; those are 2-3x cheaper.
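The arithmetic behind this estimate can be sketched roughly as follows. It assumes training compute scales with active parameters times training tokens (the common C ≈ 6·N·D approximation, which the thread does not state explicitly) and that cost scales linearly with compute. The function names are hypothetical, and the baseline figure is only what the quoted $1B would imply, not a reported number.

```python
def relative_compute(active_param_ratio: float, token_ratio: float) -> float:
    """Compute multiplier under C ~ 6 * N * D; the constant 6 cancels in a ratio."""
    return active_param_ratio * token_ratio


def implied_baseline_cost(target_cost_usd: float, multiplier: float) -> float:
    """Baseline run cost implied if cost scales linearly with compute (assumption)."""
    return target_cost_usd / multiplier


# Ratios quoted in the tweet: ~6x active parameters, ~10x training tokens.
multiplier = relative_compute(6.0, 10.0)
# The tweet's "maaaybe $1B" for the larger run.
baseline = implied_baseline_cost(1e9, multiplier)

print(f"compute multiplier: {multiplier:.0f}x")
print(f"implied baseline run cost: ${baseline / 1e6:.1f}M")
```

Under these assumptions the larger run needs about 60x the baseline compute, so a $1B total implies a baseline run in the high-teens of millions of dollars.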

martin_casado@martin_casado

Can someone explain to me how open source models can keep up if ...

- pre-training isn't saturated
- it costs $2-4B to train a current gen model
- distillation is increasingly hard as access to the most powerful models gets blocked

...?

will brown@willccbb

@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia

preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.

Nathan Lambert@natolambert

The debate over whether open or closed models win comes down to whether there is disproportionate value to marginally better intelligence.

Those who believe there is sit across from the "open models will be good enough" camp.

Closed models will stay slightly smarter. Open models will be cheaper.

martin_casado@martin_casado

... also we know now that the frontier labs are focusing on autocatalytic processes of using models to make models (create GPU kernels, data cleaning etc.).

So autocatalysis will improve economies of scale.

martin_casado@martin_casado

... also, there is a ton of pricing power on the frontier by being marginally better than everyone else

will brown@willccbb

@martin_casado the "killer app" of open models is customization, which is always going to be easier and cheaper than closed model customization.

open models are within N months of the real frontier, and closing the gap in-domain is cheap. big win for anything workflow-shaped at scale.

martin_casado@martin_casado

@grok can you summarize the discussion in comments of this post and the primary positions / discussion points?

martin_casado@martin_casado

@teortaxesTex General rule of thumb right now is 100k GPUs for 6 months. There are a lot of ways you can back into this number. But of course it's all from industry gossip around company raises, GPU procurement, gross burn, etc.
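One way to back into that number is a rough sketch like the one below. The GPU count and duration come from the tweet; the per-GPU-hour rates are illustrative assumptions (the thread gives no price), and the helper names are hypothetical. The rate is meant as an all-in figure, so it would fold in hardware, power, networking, and overhead.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # average month, 730 hours


def gpu_hours(num_gpus: int, months: float) -> float:
    """Total GPU-hours for a run of the given size and duration."""
    return num_gpus * months * HOURS_PER_MONTH


def run_cost(num_gpus: int, months: float, usd_per_gpu_hour: float) -> float:
    """Run cost at an assumed all-in hourly rate per GPU."""
    return gpu_hours(num_gpus, months) * usd_per_gpu_hour


def implied_rate(total_cost_usd: float, num_gpus: int, months: float) -> float:
    """All-in hourly rate per GPU that a given total cost would imply."""
    return total_cost_usd / gpu_hours(num_gpus, months)


total_hours = gpu_hours(100_000, 6)  # rule of thumb from the tweet
print(f"GPU-hours: {total_hours / 1e6:.0f}M")

# Assumed rates, for illustration only.
for rate in (2.0, 5.0):
    print(f"${rate:.2f}/GPU-hour -> ${run_cost(100_000, 6, rate) / 1e9:.2f}B")

# What the claimed $2-4B range would imply per GPU-hour.
low_rate = implied_rate(2e9, 100_000, 6)
high_rate = implied_rate(4e9, 100_000, 6)
print(f"implied rate: ${low_rate:.2f} to ${high_rate:.2f} per GPU-hour")
```

Under these assumptions the rule of thumb comes to 438 million GPU-hours, so a $2-4B total corresponds to roughly $4.6 to $9.1 per GPU-hour all-in, while a $2 rate would put the same run under $1B.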

Sonya Huang 🐥@sonyatweetybird

@martin_casado 1) distillation hard to stop 2) other economically interested parties to backstop? (chip vendors, governments)

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

> But it’s hard to tell with all the very cheap capital flooding in.

With very high inference margins too. It is remarkable that frontier labs pretty much don't compete on cost. Like, they do, but with the shared understanding that >50% margin is sacrosanct. No involution allowed!

martin_casado@martin_casado

Lagging by 4 months doesn’t matter if the majority of the value is in being on the frontier, even if marginally. And scaling isn’t saturated. And there start to be positive economies of scale (Autocatalytic effects).

The market data seems to support this. But it’s hard to tell with all the very cheap capital flooding in.

martin_casado@martin_casado

1) I’m very interested in how hard it is in the limit. If I reserve my largest model and only use it directly for services (e.g. auditing a bank and issuing a report), the chances of distillation go way down. We’re already seeing signs of this: holding Mythos back, not showing reasoning traces, penalizing third parties, subsidizing first party, etc.

2) Yeah, for sure. Although to date this hasn’t really worked if viewed through market share. And capital requirements are getting increasingly onerous. If MSFT's and Meta's next models are closed (as the rumor goes), it’d seem we’re moving away from this.

martin_casado@martin_casado

@willccbb Nice answer.

Cameron R. Wolfe, Ph.D.@cwolferesearch

@martin_casado training cost estimate here is way off... total cost to pretrain arcee trinity large (400B) was ~$20 million including compute, salaries, data, etc.

https://www.interconnects.ai/p/arcee-ai-goes-all-in-on-open-models

Maithra Raghu@maithra_raghu

It's a timing question. I expect for the next couple of years we'll continue to see only closed source models at the frontier because there are a lot more improvements that come with high costs. As that starts to stabilize, more of a chance for open source to catch up.

(But I still see Anthropic, OAI, other frontier labs positioned to do well even when that happens, the infra + agent functionality bundled with the model is meaningful)

Cameron R. Wolfe, Ph.D.@cwolferesearch

your question was not about frontier labs... it was about open weight labs, which have a drastically different structure / approach.

Publicly-shared numbers for training costs at open labs are 100-1000X smaller than the number you cited. The gap in performance has remained relatively consistent, with public estimates recently finding that open labs are ~3-4 months behind the frontier.

First mover cost is way higher, open labs just have to replicate and not fall behind too much.

martin_casado@martin_casado

@cwolferesearch yeah unfortunately it's not. I know nothing about Arcee Trinity. But I know a lot about the frontier labs and their training costs.

martin_casado@martin_casado

1) So, I go back and forth. History would suggest that when you have a few players and large capital costs, the industry will converge to an oligopoly rather than have margin-eroding competition. We saw this with cloud, telcos, chips, etc. Given how effective distillation is, I suspect the trend we're seeing toward less access is real.

2) Yup, very much agree.

Sonya Huang 🐥@sonyatweetybird

1) fair enough! do you think this is a likely outcome in the limit? it seems like competition might force the FMs to keep releasing the best models they can afford

2) this seems like china's game for now until some of the us-funded options (reflection etc) are ready. and nation-state budgets haven't come into the picture yet, agree?

David Crawshaw@davidcrawshaw

@martin_casado Not my world, but I’m curious: if model weights

- are worth some $2B in pre-training alone
- are running in a dozen data centers owned by a half dozen companies
- fit on a thumb drive

can normal companies contain them?

Vipul Ved Prakash@vipulved

@martin_casado Why would open model labs not deploy $2B to train? Many are well capitalized and generating substantial revenues through APIs and model licensing.

Nathan Lambert@natolambert

@martin_casado At the true frontier they don’t. They specialize in per-dollar performance and supporting a diverse set of use cases. Separate but overlapping ecosystems.
