
Maybe we are moving to the "optimization" phase of the AI game. Lots of things change in this phase. In the dot-com boom, all websites were built on Oracle+Sun. Five years later it was all MySQL + Linux. Eventually the margins matter.
Many users criticize AI models for skipping web searches and giving up sooner to cut token costs, saying this degrades quality, effort, and reliability.


@bgurley So they get lazy over time… just like us

@bgurley Did you see @thefriley interview on all-in? Recommend

@bgurley As a data point: I analyzed my spend from using Claude Code on a Max-5x plan ($100/mo) over the last 30 days. If I had been paying for usage (like API calls), it would have been almost $5K. And I’m not even the greatest TokenMaxxer around :)
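
To put the data point above in proportion, here is a minimal sketch of the arithmetic. The two dollar figures come from the post; the function name `subsidy_multiple` is hypothetical, and "almost $5K" is rounded to 5,000 as an assumption.

```python
# Figures taken from the post above; "almost $5K" is rounded to 5_000 (assumption).
PLAN_PRICE_USD = 100      # Max-5x plan, per month
USAGE_EQUIV_USD = 5_000   # estimated API-priced usage over the same 30 days


def subsidy_multiple(usage_usd: float, plan_usd: float) -> float:
    """Return how many times the flat plan price the metered usage would cost."""
    return usage_usd / plan_usd


multiple = subsidy_multiple(USAGE_EQUIV_USD, PLAN_PRICE_USD)
print(f"Metered usage would be roughly {multiple:.0f}x the plan price")
```

Under these assumptions the gap is about 50x, which is the kind of margin pressure the original post is pointing at.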

@bgurley "We went from maximizing accuracy to optimizing compute budgets, and the user experience is paying the price. LLMs aren't getting dumber; they're being engineered to 'give up' early because web searches and deep reasoning loops are too expensive at scale."

@bgurley of course ... and the control on this without people churning is the margin for these businesses ... the lack of framework for what to spend and how to evaluate an answer is the margin (just like ads a generation ago)

@bgurley @thefriley There's a significant deflationary curve in compute costs. She mentioned that from GPT-4 to GPT-5.4, the cost of tokens dropped by approximately 97%.
And they are highly focused on costs for inference.
It’s a good watch https://youtu.be/TjrShuj_Zsg?si=0iE_4XBfJypPg3bX

AI can create an amazing deck for me, but won’t let me download it into Google Slides or PPT (if anyone still uses that) so I can edit it and use it. Instead I'm left staring at it like something I desperately want in a shop window but can’t buy at any cost. That simple fact convinces me that AI is still not there.

@bgurley I am having a similar experience Bill. Recently, I’ve had longer chats to get to good outcomes and almost have to coax the model to get to those outcomes. 🤘🏻

@bgurley I have especially noticed this with Gemini models.

@bgurley Meaning they’re deliberately avoiding web search for recency, even when prompted?

@bgurley Which models have you noticed this with? I’ve seen the same laziness with Sonnet lately, but not with the new Opus 4.8
Could be plan or query dependent

@tyler__palmer 🤣🤣🤣

@bgurley Yeah I agree, the Claude family has had a weird habit of telling you to go to bed or do other things. It definitely seems like there is some effort on its part to discourage people from using it for long periods. Could be for mental health, but the cynical read is that it's to reduce cost.

@bgurley yeah. this is the part users will notice first: not the bill, but the model quietly doing less work. evals need to catch effort regression, not just wrong answers.
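
One way the "effort regression" idea above could be checked, as a rough sketch only: run the same prompts against two model versions and flag any effort proxy whose average drops by more than a threshold. The `RunStats` fields, the `effort_regression` helper, and the 10% threshold are all hypothetical choices for illustration, not an established eval.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunStats:
    """Effort proxies recorded for one prompt run (hypothetical fields)."""
    output_tokens: int
    search_calls: int


def effort_regression(baseline, candidate, max_drop=0.10):
    """Return {metric: fractional drop} for metrics that fell by more than max_drop."""
    flagged = {}
    for metric in ("output_tokens", "search_calls"):
        base = mean(getattr(run, metric) for run in baseline)
        cand = mean(getattr(run, metric) for run in candidate)
        if base == 0:
            continue  # no baseline effort to compare against
        drop = (base - cand) / base
        if drop > max_drop:
            flagged[metric] = round(drop, 3)
    return flagged


# Made-up numbers: same two prompts run on an older and a newer model version.
old_runs = [RunStats(1000, 4), RunStats(1200, 6)]
new_runs = [RunStats(880, 2), RunStats(1000, 3)]
flagged = effort_regression(old_runs, new_runs)
print(flagged)
```

A check like this says nothing about correctness; it only surfaces the case where answers still pass but the model is quietly doing less work.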

@bgurley Interesting and makes sense. I’ve noticed the same thing recently on Claude. It pushes back on researching topics and is more active in shaping our discussions.

@bgurley noticed this too. reasoning tokens getting trimmed. models aren't dumber they're just being told to think less

@bgurley Yeah, probably because they want their S-1 to be more appealing to investors.

@bgurley Are you sure you're using Gemini? Don't think I've ever seen Gemini NOT do web search.

@bgurley I noticed this under Opus 4.8 vs 4.7. Same prompts for company deep dives, output is 10-15% shorter with less effort put in.