On price/token != cost/task:
To help internalize this fact for yourself, run Terminal-Bench using Haiku and then using Opus.
Here are the results for a 15-task subset. Haiku is 10x the cost!
Price per token != cost per task
Haiku cost $95.46 compared to $9.47 for Opus.
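A rough sketch of the arithmetic behind that gap: cost per task is price per token times the tokens actually burned (retries and failed attempts included), divided by the number of tasks. Only the two reported totals ($95.46 and $9.47 over 15 tasks) come from the post. The per-token prices, token counts, and pass counts in the second half are made-up placeholders chosen to show the effect, not real pricing or benchmark data.

```python
from dataclasses import dataclass

# Totals reported in the post for the 15-task subset.
REPORTED_TOTAL_USD = {"Haiku": 95.46, "Opus": 9.47}
TASKS = 15

reported_cost_per_task = {
    model: total / TASKS for model, total in REPORTED_TOTAL_USD.items()
}
reported_ratio = REPORTED_TOTAL_USD["Haiku"] / REPORTED_TOTAL_USD["Opus"]


@dataclass
class Run:
    """One benchmark run. All field values used below are hypothetical."""
    price_per_mtok: float  # USD per million tokens
    tokens_used: int       # total tokens, including retries and failures
    tasks: int             # tasks attempted
    tasks_passed: int      # tasks solved

    @property
    def total_cost(self) -> float:
        return self.price_per_mtok * self.tokens_used / 1_000_000

    @property
    def cost_per_task(self) -> float:
        return self.total_cost / self.tasks

    @property
    def cost_per_solved_task(self) -> float:
        # Failed attempts still cost money, so divide by passes only.
        return self.total_cost / self.tasks_passed


# Hypothetical numbers: the "cheap" model is 5x cheaper per token
# but uses 15x the tokens and solves fewer tasks.
cheap = Run(price_per_mtok=1.0, tokens_used=30_000_000, tasks=15, tasks_passed=4)
pricey = Run(price_per_mtok=5.0, tokens_used=2_000_000, tasks=15, tasks_passed=12)

for name, run in (("cheap", cheap), ("pricey", pricey)):
    print(f"{name}: total ${run.total_cost:.2f}, "
          f"per task ${run.cost_per_task:.2f}, "
          f"per solved task ${run.cost_per_solved_task:.2f}")
```

With the reported totals, this works out to roughly $6.36 per task for Haiku against $0.63 for Opus. Dividing by solved tasks instead of attempted tasks widens the gap further whenever the cheaper model also passes fewer tasks.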
Positive users praise the cost-per-task benchmark for making Haiku vs Opus efficiency comparisons more concrete than raw token prices, while negative users criticize Haiku for wasting tokens through repeated failures and retries.
Cost per task is going to be a meaningful metric these next 12 months.

@alexatallah AI is making time a worse proxy for value.
Tasks and eventually outcomes are what we’ll optimize for.

@alexatallah wow. would be cool to have this in OpenRouter MCP.

@alexatallah haiku retrying 5x on a task opus nails first try

@alexatallah cheaper per token. more tokens to fail. more tokens to retry. more tokens to eventually give up.

@alexatallah @theo made a beautiful explanation of this concept in his recent video. Must watch.

@Suhail AI is making time a worse proxy for value.
Tasks and eventually outcomes are what we’ll optimize for.

@Suhail Cost per task is the real flex.

@alexatallah Thought you checked out Dm.

@Suhail ngl seeing cost/task spelled out like that makes the efficiency conversation way more concrete
most people still thinking in raw token price

@alexatallah 26 percent pass rate at 30m tokens is not a cheaper model, it is a token furnace with delusions of usefulness

@Suhail tokens per token was fun while it lasted

@Suhail lindy saw this coming. dropped claude for deepseek. token pricing looked insane but actual cost per task crashed. the public pricing sheet is a bad proxy. what matters is what you actually pay to get the work done

@alexatallah Would be great if we could get both price/token and cost/task for each model, for each benchmark, on the web