SITUATION EXPLAINED: 70% of frontier model queries could run locally for free.
@ClementDelangue, co-founder and CEO of @huggingface:
"There was an interesting study from Stanford published last year showing that 70% of the queries that people ask to ChatGPT could be accurately answered locally on your laptop. For free."
"Most of the AI workloads that people do today with frontier models could be done by models that are cheaper, faster, more customizable, more controllable. And they don't do it because frankly it's a pain to pick the right model."
"You're subsidized so you don't have to care because you have your subscription, so you route everything to Einstein. 'Hey Einstein, what's the weather today?'"
"In normal life, he would be like, 'I'm not answering your silly question.' But because it's AI and subsidized by the AI labs, all the questions are getting routed to Einstein versus in an ideal world you have different models that are more specialized and better at answering your questions in different domains."
"It's possible that routing is gonna redistribute a lot of the value capture from frontier models to a more long tail of models. It's like AI maturing... we are in the first phase where it's very simple, everyone using just one giant model. Now we're moving to the second phase."









