I think the assumption that you should use smaller models for less important tasks is flawed (or at least deserves much more careful consideration). Big models are generally better at everything but cost, so it is worth considering whether gains in non-key tasks would be valuable.
Critical replies called questioning the use of smaller models for routine tasks cheerleading that ignores energy costs and fails to save money, while supportive replies agreed that quality matters more than cost-cutting.
In fact, areas that are not important in your organization could benefit even more from applying smarter models, since the organizational capabilities & human capital in those areas are weak, and good AI might be a cheap way to increase capacity!

@emollick The “but cost” is doing a lot of work in that sentence especially when you pay API pricing.

@JacksonAtkinsX @emollick why did he block you wtf

@emollick It's not the quantity of tokens - it's the quality that matters 🔥🚀

@emollick Completely agree that the token optimizing cost cutting view is premature. We're still so early in this, and the CFOs are already trying to optimize abundance. I think it's fundamentally a category error on the opportunity of AI.

@emollick I think "always use a big model" works for individual users but doesn't work at scale... but "pick the best model along the Pareto frontier for your task/budget" does. It narrows down model selection from hundreds to maybe 10 once you include on-device ones like Gemma.
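
One way to picture the "Pareto frontier" selection described in this reply is the minimal sketch below. The model names, costs, and quality scores are all hypothetical placeholders, and it assumes you already have a per-task eval score for each candidate; it is not anyone's actual selection procedure.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str       # hypothetical model label
    cost: float     # assumed cost per task, in dollars
    quality: float  # assumed eval score on your own task, 0..1


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep only models that no cheaper-or-equal model matches or beats on quality."""
    frontier: list[Model] = []
    # Walk from cheapest to most expensive; on cost ties, best quality first.
    for m in sorted(models, key=lambda m: (m.cost, -m.quality)):
        if not frontier or m.quality > frontier[-1].quality:
            frontier.append(m)
    return frontier


def best_within_budget(models: list[Model], budget: float) -> Optional[Model]:
    """Pick the highest-quality frontier model whose cost fits the budget."""
    affordable = [m for m in pareto_frontier(models) if m.cost <= budget]
    return max(affordable, key=lambda m: m.quality) if affordable else None


# Made-up numbers, purely for illustration.
MODELS = [
    Model("small-a", 0.001, 0.70),
    Model("small-b", 0.002, 0.65),  # dominated by small-a
    Model("mid", 0.010, 0.85),
    Model("large", 0.050, 0.95),
    Model("large-b", 0.060, 0.90),  # dominated by large
]

if __name__ == "__main__":
    print([m.name for m in pareto_frontier(MODELS)])
    print(best_within_budget(MODELS, budget=0.02))
```

The frontier drops every option that is both more expensive and no better than another, which is how a long list of candidates shrinks to a handful before budget is even considered.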

@emollick What’s the point of getting 3 cheaper wrong answers for the price of a right answer?
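
The arithmetic behind this question can be sketched as expected cost per correct answer. The prices and accuracies below are invented for illustration, and the formula assumes wrong answers can be detected and retried independently, which is often not true in practice.

```python
def cost_per_correct(cost_per_call: float, accuracy: float) -> float:
    """Expected spend to get one correct answer, assuming independent retries."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    # Expected number of attempts until the first success is 1 / accuracy.
    return cost_per_call / accuracy


# Hypothetical numbers: the small model is 3x cheaper per call but far less accurate.
SMALL = cost_per_correct(cost_per_call=1.0, accuracy=0.25)
LARGE = cost_per_correct(cost_per_call=3.0, accuracy=0.90)

if __name__ == "__main__":
    print(f"small: {SMALL:.2f} per correct answer")
    print(f"large: {LARGE:.2f} per correct answer")
```

With these assumed numbers the cheaper model ends up costing more per correct answer. With a higher small-model accuracy the comparison flips, which is why the per-task accuracy has to be measured rather than guessed.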

@JacksonAtkinsX It is not doing a lot of work. And neither is your prompt. Blocked.

@emollick Agree

@emollick @JacksonAtkinsX i am surprised you think you can identify an llm in a 10 word sentence
fwiw that sentence made sense to me and the account looks legit
these days i try to judge by content, not by whether it uses tropes or not
(it can also be an llm assisted human)

@emollick Small model latency and throughput often negate cost advantages when integrating LLMs into real-time, high-frequency agentic workflows.

@emollick a lot of ‘easy’ tasks still have surprisingly high variance in real-world inputs.
still, a setup that i like is working heavy tasks with stronger model and delegating smaller ones to Sonnet. it has been working well for me!

@emollick In enterprise settings processing 1000s of docs, defaulting to frontier models is often wasteful. Basic extraction tasks work great with optimized smaller models. Measure with strong evals, then use GEPA-style optimizers to close most of the quality gap at 5-20x lower cost.

@emollick Depends on the task. Some smaller models are better at particular things than frontier models. For example, I recently switched a workflow that involved examining images and doing web searches from gpt 5.5 to Gemini Flash 3.0 and the output improved by 4x (rate of success)

@emollick this assumes task importance stays fixed while the model handles it. sometimes a small model changes what tasks matter.

@emollick Worth noting: in agentic chains the 'unimportant' task is often upstream of the important one. A cheap model's small error compounds downstream — the real cost isn't tokens, it's quality decay you only catch later.
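
The compounding effect described in this reply can be illustrated with a small calculation. It is a sketch that assumes each step in the chain succeeds independently and that one failed step fails the whole chain; the per-step success rates are hypothetical.

```python
import math


def chain_success(step_rates: list[float]) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return math.prod(step_rates)


# Hypothetical five-step agentic chain.
ALL_STRONG = chain_success([0.99] * 5)           # strong model at every step
ALL_WEAK = chain_success([0.95] * 5)             # cheaper model at every step
WEAK_UPSTREAM = chain_success([0.90] + [0.99] * 4)  # cheap model only on step one

if __name__ == "__main__":
    print(round(ALL_STRONG, 3))     # 0.951
    print(round(ALL_WEAK, 3))       # 0.774
    print(round(WEAK_UPSTREAM, 3))  # 0.865
```

Under these assumptions, a single weaker upstream step lowers end-to-end success by more than its own error rate suggests once it is multiplied through the rest of the chain.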

@emollick Fable docs even mentioned that Fable with low effort outperformed other models, which might affect the cost curve.

@emollick i've never heard that assumption
people (or at least I) use weaker models for easier tasks, not "less important" tasks
in tasks, you either do or don't, not "do badly" or "do well"

@urivalev @JacksonAtkinsX “Doing a lot of work” is a really obvious AI turn of phrase. I block badly prompted AI bots.

@emollick I don’t need an expensive model to ask short syntax questions of
