Heavy Claude Usage Spurs Model Routing to Control AI Token Costs

Original post
Jerry Liu @jerryjliu0 · #707 in Tech

Up until yesterday, our entire MTS team has operated under the philosophy of tokenmaxxing as much as possible on Claude Max plans.

With Fable, this may no longer be possible:
- One of our team members hit his limit 3 times yesterday and used the equivalent of $1.5k in 10 hours
- Half of our team has hit quota limits on eng work

This era of tokenmaxxing may need to be restrained - or at least have clear guardrails defined. We are concerned about running Fable at API-based billing. If every engineer starts spending tokens at levels equivalent to headcount costs, our burn rate will meaningfully increase.

Just as startups are starting to bake model routing into their core product, we will have to start thinking about model routing in our core engineering usage.

6:49 PM · Jun 10, 2026 · 24.2K Views
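
The "model routing with guardrails" idea in the post can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tier names, the per-token prices, the 0 to 2 complexity scale, and the daily budget are hypothetical and do not reflect any vendor's actual pricing or the team's real setup. The router starts at the tier matching the task's complexity and steps down to cheaper tiers until the estimated cost fits the remaining budget.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price, not a real quote


# Hypothetical tiers, ordered from cheapest to most expensive.
TIERS = [
    Tier("small", 1.0),
    Tier("mid", 10.0),
    Tier("frontier", 100.0),
]


def estimate_cost(tier: Tier, est_tokens: int) -> float:
    """Estimated dollar cost of sending est_tokens to the given tier."""
    return tier.usd_per_million_tokens * est_tokens / 1_000_000


def route(complexity: int, est_tokens: int,
          spent_usd: float, daily_budget_usd: float) -> Optional[Tier]:
    """Pick a tier for a task.

    complexity is an index into TIERS (0 = simplest). The router tries the
    matching tier first, then falls back to cheaper tiers until the estimated
    cost fits within the remaining daily budget. Returns None if nothing fits,
    so the caller can queue the task or ask for approval.
    """
    if not 0 <= complexity < len(TIERS):
        raise ValueError("complexity must index into TIERS")
    remaining = daily_budget_usd - spent_usd
    for idx in range(complexity, -1, -1):
        tier = TIERS[idx]
        if estimate_cost(tier, est_tokens) <= remaining:
            return tier
    return None
```

For example, under these assumed prices, `route(2, 1_000_000, spent_usd=100.0, daily_budget_usd=150.0)` falls back from the "frontier" tier to "mid", because only $50 of budget remains. A real router would also need a way to score task complexity, which this sketch takes as given.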
Sentiment

Positive users praise routing heavy Claude tasks to Fable or cheaper models for cost control, while negative users complain about rapid token exhaustion and unsustainable burn rates.

Pos 50.0% · Neg 50.0% (8 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views: 284
Han Xiao @hxiao

@jerryjliu0 Same, I told everyone tokenmaxxing as much as you can (to explore and live on the edge) before the employer changes its mind. we need small, local and vertical models to be ready.

4h · Views 284 · Likes 1
Likes: 1
Jerry Liu @jerryjliu0

@nickbaumann_ Would be incredible if the Fable release pushed our team to codex 🙂

2h · Views 104 · Likes 1
Retweets: 1
Nick @nick_kango

The model subsidies will eventually end and this workflow of “creating loops that will prompt your agents” will result in massive amounts of code that’s not well understood that you will have to pay lots of money to maintain in the future.

3h · Views 1.4K · Likes 7 · Bookmarks 2
Replies: 1
Nick @nickbaumann_

@jerryjliu0 Were you able to switch over to your codex plans when they ran out?

2h · Views 169 · Likes 1
Ahmed Raza @ahmedrsyed

napkin math: a $350K engineer is roughly $175/hr, so 10 hours of their work equates to ~$1.75k. If Fable shipped >2x what that eng would've in those 10 hours, the spend argument can still be justified. The harder q is whether you can attribute $ to output, which is difficult to scale org wide. Token allocation will eventually follow what people ship

3h · Views 22 · Likes 1
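
The napkin math in the reply above can be written out as a short sketch. The 2,000 working hours per year is an assumption (it is what the quoted $175/hr implies for a $350K engineer), the $1.5k figure is the 10-hour spend reported in the original post, and the function names are hypothetical. The calculation only compares dollars and says nothing about whether agent output and engineer output are actually interchangeable.

```python
def hourly_rate(annual_cost_usd: float, hours_per_year: float = 2000.0) -> float:
    """Hourly cost of an engineer, assuming hours_per_year working hours."""
    return annual_cost_usd / hours_per_year


def breakeven_multiple(token_spend_usd: float, annual_cost_usd: float,
                       hours: float, hours_per_year: float = 2000.0) -> float:
    """Output multiple at which token spend matches the equivalent salary cost.

    A result of 1.0 means the tokens cost exactly as much as the engineer's
    time over the same number of hours; below 1.0 the tokens were cheaper.
    """
    salary_equivalent = hourly_rate(annual_cost_usd, hours_per_year) * hours
    return token_spend_usd / salary_equivalent


rate = hourly_rate(350_000)                       # 175.0 dollars per hour
ten_hours = rate * 10                             # 1750.0 dollars
multiple = breakeven_multiple(1_500, 350_000, 10)
print(f"${rate:.0f}/hr, ${ten_hours:.0f} per 10h, break-even at {multiple:.2f}x")
```

Under these assumptions the reported $1.5k spend is slightly below the $1.75k salary equivalent, so the dollar break-even sits just under 1x. The reply's ">2x" threshold is a stricter bar than break-even.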
Trace Cohen @Trace_Cohen

@jerryjliu0 Yah it’s powerful for small specific tasks but if you let it spawn a dozen agents it will incinerate tokens

3h · Views 10 · Likes 1
Matthew Fox @MatthewFoxAF

@jerryjliu0 inevitable beginning of the end

Of subsidies

Codex crazy plans ended last week
Claude -p moves to API in 2 days
Models like fable not being on API after the 22nd

It's been fun but was bound to end eventually

3h · Views 132
Lavish Saluja @lavishsaluja

@jerryjliu0 time to also improve the llama router 😅

3h · Views 128
Alvaro Oliveira @Memphisbr

Anthropic has already been preparing for that approach where Opus can be the “verifier” Model at the API layer. We can probably use sonnet with Fable as the verifier and get amazing results.

I’ve been having Fable orchestrate the workflow of sonnet and opus agents and have it verify the results. Way less “fable tokens”, the initial vibe is of great results. No science behind it yet.

3h · Views 124
Chad Brewbaker @SMT_Solvers

@jerryjliu0 Federate your creds bro. Don't use the underwear account to test new stuff - set up a new account with different billing.

3h · Views 90
Shailesh @0xThoughtVector

@jerryjliu0 you guys work on parsing pdfs so there's def AI/ML involved. you guys know that Fable is nerfed for AI/ML, right?

3h · Views 82
Ahmed Raza @ahmedrsyed

@jerryjliu0 yeah fair point - per-run attribution breaks under tokenmaxxing. Perhaps the math just moves up a level where team spend vs what the team shipped that quarter is what matters. Ceiling at that point might be how much agent output the team can actually absorb and act on vs the bill

3h · Views 65
Pushkar Jain @pushkar_jain26

@jerryjliu0 I used 86% of my weekly limit in 16 hours on a $200 license. No more tokenmaxxing

3h · Views 59
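
The quota figures in the reply above (86% of a weekly limit used in 16 hours) can be projected forward with a small sketch. It assumes a constant burn rate and a 168-hour weekly window; both are simplifying assumptions, and the function names are hypothetical.

```python
def hours_until_exhausted(fraction_used: float, hours_elapsed: float) -> float:
    """Hours of usage left before the quota hits 100%, at a constant burn rate."""
    if not 0 < fraction_used <= 1:
        raise ValueError("fraction_used must be in (0, 1]")
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    total_hours = hours_elapsed / fraction_used  # hours to consume the full quota
    return total_hours - hours_elapsed


def overuse_factor(fraction_used: float, hours_elapsed: float,
                   window_hours: float = 168.0) -> float:
    """How many times faster than a sustainable pace the quota is being burned.

    A sustainable pace spreads the quota evenly across window_hours.
    """
    observed_rate = fraction_used / hours_elapsed
    sustainable_rate = 1.0 / window_hours
    return observed_rate / sustainable_rate


print(f"{hours_until_exhausted(0.86, 16):.1f} hours of usage left")
print(f"{overuse_factor(0.86, 16):.1f}x the sustainable weekly pace")
```

At that pace the remaining 14% would last under three more hours of usage, roughly nine times faster than a rate that would spread the quota evenly across the week.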
Saylor @seylorra

@jerryjliu0 imagine burning $150/hr on claude and still having to beg for more tokens. fable changed the math, guess api calls finally cheaper than peace of mind

2h · Views 41
david @davidtsong

@jerryjliu0 good idea to use fable as a planning agent->delegate to cheaper coding models

3h · Views 23
SuMo @su_mo

@jerryjliu0 This is at least partly inconsistent with the VC thesis on Anthropic investment.

2h · Views 20
sdmat @sdmat123

@ahmedrsyed @jerryjliu0 Senior SWEs do more than code.

They set the technical direction, evaluate considerations that go beyond the codebase, and act as a reservoir of institutional knowledge.

AGI will be able to do all of that but we don't have AGI yet.

Dollar equivalence is a category error.

2h · Views 20
Ferbin @Ferbin08

@jerryjliu0 The rate limit's the real constraint now, not cost.

Cheaper tokens mean you can push harder but hit the ceiling sooner.

For systems optimized for throughput, you've flipped which problem matters.

3h · Views 20
Gil @gilnotmountain

@jerryjliu0 If fable goes for api based billing and Gpt5.6 doesnt, do you think there will be a shift in dev market share?

2h · Views 18
Jerry Liu @jerryjliu0

@ahmedrsyed i think that's super hard to say, because tokenmaxxing == you're fine wasting tokens massively as long as SOME of it produces meaningful output relative to your bandwidth towards processing

3h · Views 14