Paul Graham argues that big companies struggling with LLM token costs is a normal phase favoring startups


Curiously enough I did office hours today with a startup that cuts companies' LLM token costs by optimizing requests. They can cut costs by about half, which they split with the customer. So the TAM is a quarter of the model companies' corporate revenue. That's a big TAM!

Paul Graham@paulg

If big companies can't make a net return on their LLM token costs, that doesn't mean it's impossible to. In fact this is exactly what you'd expect to happen with a new technology. Incumbents can't use it well, and are replaced by upstarts who can.
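The TAM estimate in the reply above is simple arithmetic: a cost cut of about half, split with the customer, leaves the optimizer with about a quarter of the original token bill. A minimal sketch of that calculation follows. The function name `vendor_revenue_fraction` and the 1,000,000 bill are hypothetical, and the even 50/50 split is an assumption, since the post only says the savings are "split".

```python
def vendor_revenue_fraction(savings_rate: float, vendor_share: float) -> float:
    """Fraction of the original token bill that becomes optimizer revenue."""
    return savings_rate * vendor_share


# Figures from the post: costs cut by about half, savings split with the customer.
# Assumption: the split is 50/50.
fraction = vendor_revenue_fraction(savings_rate=0.5, vendor_share=0.5)

# Hypothetical annual token bill, for illustration only.
bill = 1_000_000
savings = bill * 0.5                  # what the optimization removes
vendor_revenue = bill * fraction      # optimizer's cut of the savings
customer_cost_after = bill - savings + vendor_revenue

print(f"optimizer share of original bill: {fraction:.0%}")
print(f"customer pays {customer_cost_after:,.0f} instead of {bill:,}")
```

Under these assumptions the customer's effective bill falls by a quarter, and the optimizer's revenue is capped at a quarter of what customers would otherwise have paid the model companies.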


Garry Tan@garrytan

Skill issues at big companies mean small new ones can eat their lunch


Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

👇👇👇👇👇👇

Paul Graham@paulg

If big companies can't make a net return on their LLM token costs, that doesn't mean it's impossible to. In fact this is exactly what you'd expect to happen with a new technology. Incumbents can't use it well, and are replaced by upstarts who can.


Ahmad@TheAhmadOsman

@paulg Or just do this :)


alex funk@alexzfunk

@paulg I also think the endgame of LLM routing looks like a mixture of 80% fine-tuned local models / 20% frontier lab models


Raph. H.@Rapahelz

@paulg "The very processes and values that constitute an organization’s capabilities in one context, define its disabilities in another context."

-Clayton Christensen


Kevin Smith@kevin_smith51

@paulg We are building this with an open source core engine @modelmeld. Approximating TAM as splitting savings with customers is the wrong way to look at it IMO, because it is very hard to validate what costs "would have been". https://github.com/modelmeld/modelmeld


Paul Graham@paulg

@rickasaurus Their valuations are bets on the probability of this outcome.


kobez@kobez_01

@paulg Adapt or die. A tale as old as time.


Michael Fischer@Holden_Rye_

Um, yeah. I’m not sharing details on my defense. I have haters. When I became vocal about the comparison between how some companies code and how sophisticated hackers exploit, it opened me up to revenge attempts.

I only ever messed with blackhats in places like CryptBB, and they get extra pissed when you fry their system or expose how fragile their setup really is. These Kali kids aren’t used to systems-level exploitation. They’ll go all the way around their ass just to get into Google. They’re tool users, not systems thinkers. That’s the issue with tech now. One generation had to learn things the hard way, which was better for developing real understanding. You had to break things, trace things, rebuild things, and actually understand the system. Then the next generation grows up inside polished products and prebuilt tools. They become productized employees: trained to operate the interface, not understand the machine underneath it, and that’s how craft gets replaced by workflow.

I haven’t hacked in years except CoD cheaters. Little bastards. Change subject.

Tell me more on quantum lab at Vandy. I’m not far.


Rohan Arun@RohanArun

@paulg The false premise is token costs won't approach software costs.

For some reason, every time you ask ChatGPT to solve a Rubik's Cube, it regenerates the same code over 8 minutes. Everyone is very wasteful right now.

We invented a new primitive that reduces this cost to 0.


Tony Rost@raspberryman

@paulg The fatal mistake is expecting cost reductions in IT. That line item is only going to get bigger as a % of net sales. All token ROI needs to be in COGS and classically stubborn operating lines, such as legal and leases.


The Long Compound@TheLongCompound

People should be assigned a threshold of tokens, then they will start making better decisions.

It's easy to get so lazy with an LLM and ask it silly things like "make a screenshot of all views of the app, open them and let me know what you think" instead of just looking at the app yourself.

The best employees will be the ones that bring better results with less token usage. It should definitely be a metric.


X Team Pal@XTeamPal

Hi 👋 Paul , keep all saved 99% tokens from below ⬇️

As someone who builds AI agents every day, token usage quickly turned into a major bottleneck for me

It’s now saving me ~88 million tokens per day (and climbing toward 2B+ monthly)

I’ve fully open-sourced it under MIT so everyone can benefit

Would love your feedback if you give it a try! 🙏


Owen Evanger@OwenEvanger

@paulg That is brilliant!


Sooraj@suryanox7

@paulg Half the token bill sounds nice until you realize the real win is getting people to trust a third party with their prompts. Splitting the savings is clever, but the margin’s razor thin.


Jack Feynman@JackFeynman

@paulg Has any dominant incumbent in one era managed to retain dominance in another?


brady@brady_thinker

@XTeamPal how does it work, bro? Can I use it in my Claude Code or other agents?


Murphy Alex@TheMurphyAlex

@paulg The 50% cut is real, but the bigger lever is upstream. Most enterprise LLM spend goes to requests that should never reach the model. Fix the routing, add caching, decompose the tasks, and costs drop before you optimise a single token.
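Two of the upstream levers named in this reply, caching and routing, can be sketched in a few lines. Everything here is hypothetical: the `CachedRouter` class, the `is_hard` word-count heuristic, and the two stub model functions stand in for real model clients and do not reflect any particular product's design. Task decomposition is not covered.

```python
import hashlib
from typing import Callable, Dict

ModelFn = Callable[[str], str]


class CachedRouter:
    """Answer repeated prompts from a cache; send the rest to a cheap or frontier model."""

    def __init__(self, cheap: ModelFn, frontier: ModelFn,
                 is_hard: Callable[[str], bool]) -> None:
        self.cheap = cheap
        self.frontier = frontier
        self.is_hard = is_hard
        self.cache: Dict[str, str] = {}
        self.model_calls = 0  # how many requests actually reached a model

    def _key(self, prompt: str) -> str:
        # Collapse whitespace so trivially different prompts share a cache entry.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.cache:
            return self.cache[key]  # cache hit: no tokens spent
        model = self.frontier if self.is_hard(prompt) else self.cheap
        self.model_calls += 1
        answer = model(prompt)
        self.cache[key] = answer
        return answer


# Stub models for illustration; a real setup would call model APIs here.
def cheap_model(prompt: str) -> str:
    return f"[cheap] {prompt}"


def frontier_model(prompt: str) -> str:
    return f"[frontier] {prompt}"


# Assumed heuristic: treat prompts longer than 20 words as "hard".
router = CachedRouter(cheap_model, frontier_model,
                      is_hard=lambda p: len(p.split()) > 20)

first = router.complete("What is our refund policy?")
second = router.complete("What is  our refund policy?")  # extra space, same key
print(router.model_calls)
```

In this sketch the second request never reaches a model, which is the sense in which costs can drop before any individual request is optimized. A real router would need a better difficulty signal than word count, plus cache expiry.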
