Founder Maithra Raghu predicts open-source AI will trail proprietary models for two years until scaling costs stabilize
Frontier labs will maintain their edge by bundling agent functionality
@martin_casado At the true frontier they don't; they specialize in performance per dollar and in supporting a diverse set of use cases. Separate but overlapping ecosystems.
Can someone explain to me how open source models can keep up if:
- pre-training isn't saturated
- it costs $2-4B to train a current gen model
- distillation is increasingly hard as access to the most powerful models gets blocked?
It's a timing question. I expect for the next couple of years we'll continue to see only closed source models at the frontier because there are a lot more improvements that come with high costs. As that starts to stabilize, more of a chance for open source to catch up.
(But I still see Anthropic, OAI, other frontier labs positioned to do well even when that happens, the infra + agent functionality bundled with the model is meaningful)
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?
I think we'll see a combination of (i) algorithmic and computational efficiencies reducing training costs: it won't be cheap by any means, but it will cost less than expected; (ii) people willing to fund it given the market opportunity: the fact that open source is still active, given how much costs have increased over the past ~4 years, is pretty remarkable; (iii) "indirect" distillation, which will probably stay around for a long time.
@maithra_raghu It's not clear that, if distillation is cut off, open source can catch up. Let's say at that point it costs $5B to train a competitive model. Who would pay that for something open source?
@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia
preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?
@martin_casado the "killer app" of open models is customization, which is always going to be easier and cheaper than closed model customization.
open models are within N months of the real frontier, and closing the gap in-domain is cheap. big win for anything workflow-shaped at scale.
@martin_casado there are two large, capable, and well-resourced entities with clear strategic interests in ensuring open models keep up: China and Nvidia preventing distillation and capturing market share are in tension. it'll be hard to distill GPT-7-BioChem, easy to distill Default Claude.
... also, there is a ton of pricing power on the frontier by being marginally better than everyone else
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?
@maithra_raghu It's not clear if distillation is cut off, open source can catch up. Let's say at that point it costs $5b to train a competitive model. Who would pay that for something open source?
It's a timing question. I expect for the next couple of years we'll continue to see only closed source models at the frontier because there are a lot more improvements that come with high costs. As that starts to stabilize, more of a chance for open source to catch up. (But I still see Anthropic, OAI, other frontier labs positioned to do well even when that happens, the infra + agent functionality bundled with the model is meaningful)
... also we know now that the frontier labs are focusing on autocatalytic processes of using models to make models (create GPU kernels, data cleaning etc.).
So autocatalysis will improve economies of scale.
... also, there is a ton of pricing power on the frontier by being marginally better than everyone else
@cwolferesearch yeah unfortunately it's not. I know nothing about Arcee Trinity. But I know a lot about the frontier labs and their training costs.
@martin_casado training cost estimate here is way off... total cost to pretrain arcee trinity large (400B) was ~$20 million including compute, salaries, data, etc. https://www.interconnects.ai/p/arcee-ai-goes-all-in-on-open-models
@martin_casado 1) distillation hard to stop 2) other economically interested parties to backstop? (chip vendors, governments)
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?
@martin_casado training cost estimate here is way off... total cost to pretrain arcee trinity large (400B) was ~$20 million including compute, salaries, data, etc.
Can someone explain to me how open source models can keep up if ... - pre-training isn't saturated - it costs $2-4B to train a current gen model - distillation is increasingly hard as access to the most powerful models gets blocked .. ?