we are now at a situation where "frontier models" with tools can work quite well within a ReAct loop (or similar), but take like 15$, 40 minutes and a trillion tokens to perform a single task.
are we ok with this balance?
my aesthetic is that we can an should be much more efficient with smaller models, less reasoning, and better harnesses, and that this will be a better solution.
but am I alone in this?










