Also there’s major continuity issues where swapping models, especially to different providers, OpenAI’s and anthropic’s thinking tokens aren’t compatible and can’t be swapped out.
So swapping models requires removing history. Which seems like a recipe for easy context loss
Potentially dumb question, but doesn’t cache busting mean this is actually worse for anything that you can’t one shot?
1st prompt Yay cheap model 2nd prompt Oh wait we need the big model, which means we resubmit your entire chat history
So you pay full price for 2nd prompt, which includes paying for the uncach’ed preceding history, which means you now paid twice for the first prompt, once for cheap model and once for big model
The benchmarks are very misleading here because they only test a single prompt sent. But if you need any sort of follow up you are paying extra