Tech · 4h ago

Heavy Claude Usage Spurs Model Routing to Control AI Token Costs

Original post

Up until yesterday, our entire MTS team has operated under the philosophy of tokenmaxxing as much as possible on Claude Max plans.

With Fable, this may no longer be possible:
- One of our team members hit his limit 3 times yesterday and used the equivalent of $1.5k in 10 hours
- Half of our team has hit quota limits on eng work

This era of tokenmaxxing may need to be restrained - or at least have clear guardrails defined. We are concerned about running Fable at API-based billing. If every engineer starts spending tokens at levels equivalent to headcount costs, our burn rate will meaningfully increase.

Just as startups are starting to bake model routing into their core product, we will have to start thinking about model routing in our core engineering usage.

6:49 PM · Jun 10, 2026 · 24.2K Views
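
The routing idea in the post can be sketched as a small budget-aware dispatcher. This is only an illustration, not the team's actual setup: the tier names, the per-million-token prices, and the `Task` fields are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices (USD per million tokens), for illustration only.
PRICE_PER_MTOK = {"frontier": 50.0, "mid": 10.0, "small": 1.0}


@dataclass
class Task:
    description: str
    est_tokens: int             # rough estimate of tokens the task will consume
    needs_deep_reasoning: bool  # e.g. planning or architecture vs. boilerplate edits


def estimated_cost(tier: str, tokens: int) -> float:
    """Estimated spend in USD for running `tokens` on the given tier."""
    return PRICE_PER_MTOK[tier] * tokens / 1_000_000


def route(task: Task, remaining_budget_usd: float) -> str:
    """Return the most capable tier the task warrants and the budget allows."""
    # Simple tasks never go to expensive tiers; hard tasks step down as budget shrinks.
    candidates = ["frontier", "mid", "small"] if task.needs_deep_reasoning else ["small"]
    for tier in candidates:
        if estimated_cost(tier, task.est_tokens) <= remaining_budget_usd:
            return tier
    return "small"  # nothing fits the budget: fall back to the cheapest tier


design = Task("design the ingestion pipeline", 2_000_000, True)
rename = Task("rename a variable across files", 2_000_000, False)
print(route(design, 200.0), route(design, 50.0), route(rename, 200.0))
```

A real router would likely classify tasks automatically rather than rely on a hand-set flag; the flag here just keeps the sketch short.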

Jerry Liu (@jerryjliu0) · #707 in Tech

Sentiment

Positive users praise routing heavy Claude tasks to Fable or cheaper models for cost control, while negative users complain about rapid token exhaustion and unsustainable burn rates.

Pos: 50.0% · Neg: 50.0%

8 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Han Xiao@hxiao

@jerryjliu0 Same, I told everyone tokenmaxxing as much as you can (to explore and live on the edge) before the employer changes its mind. we need small, local and vertical models to be ready.

4h ago

Jerry Liu@jerryjliu0

@nickbaumann_ Would be incredible if the Fable release pushed our team to codex 🙂

2h ago

Nick@nick_kango

The model subsidies will eventually end and this workflow of “creating loops that will prompt your agents” will result in massive amounts of code that’s not well understood that you will have to pay lots of money to maintain in the future.

3h ago

Nick@nickbaumann_

@jerryjliu0 Were you able to switch over to your codex plans when they ran out?

2h ago

Ahmed Raza@ahmedrsyed

napkin math: a $350K engineer is roughly $175/hr, so 10 hours of their work equates to ~$1.75k. If Fable shipped >2x what that eng would've in those 10 hours, the spend argument can still be justified. The harder q is whether you can attribute $ to output, which is difficult to scale org wide. Token allocation will eventually follow what people ship

3h ago
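
The napkin math above can be checked in a few lines. This is a sketch under stated assumptions: the 2,000 working hours per year is assumed (it is what makes $350K come out to $175/hr), and the "required multiplier" is one possible reading of the argument, where token spend is added on top of the engineer's cost rather than replacing it.

```python
def hourly_rate(annual_cost_usd: float, hours_per_year: float = 2_000) -> float:
    """Loaded hourly cost; 2,000 hours/year is an assumption (50 weeks x 40 hours)."""
    return annual_cost_usd / hours_per_year


def required_output_multiplier(eng_cost_usd: float, token_spend_usd: float) -> float:
    """Output multiple needed so that (engineer + tokens) costs no more per unit
    of output than the engineer alone. Assumes tokens add to, not replace, salary."""
    return (eng_cost_usd + token_spend_usd) / eng_cost_usd


rate = hourly_rate(350_000)   # $175/hr, as in the comment
eng_cost = rate * 10          # $1,750 for 10 hours
token_spend = 1_500           # "$1.5k in 10 hours" from the original post
multiplier = required_output_multiplier(eng_cost, token_spend)

print(f"engineer: ${eng_cost:,.0f}, tokens: ${token_spend:,.0f}")
print(f"break-even output multiplier: {multiplier:.2f}x")
```

Under these assumptions the break-even sits a little below the ">2x" threshold in the comment, so the two are consistent. The attribution problem raised in the same comment is untouched by this arithmetic.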

Trace Cohen@Trace_Cohen

@jerryjliu0 Yah it’s powerful for small specific tasks but if you let it spawn a dozen agents it will incinerate tokens

3h ago

Matthew Fox@MatthewFoxAF

@jerryjliu0 inevitable beginning of the end

Of subsidies

Codex crazy plans ended last week
Claude -p moves to API in 2 days
Models like fable not being on API after the 22nd

It's been fun but was bound to end eventually

3h ago

Lavish Saluja@lavishsaluja

@jerryjliu0 time to also improve the llama router 😅

3h ago

Alvaro Oliveira@Memphisbr

Anthropic has already been preparing for that approach where Opus can be the “verifier” Model at the API layer. We can probably use sonnet with Fable as the verifier and get amazing results.

I’ve been having Fable orchestrate the workflow of sonnet and opus agents and have it verify the results. Way less “fable tokens”, the initial vibe is of great results. No science behind it yet.

3h ago
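
The orchestrate-and-verify pattern described above might look roughly like the sketch below. It makes no real API calls: `planner`, `worker`, and `verifier` are hypothetical callables standing in for an expensive model (planning and checking) and a cheaper one (doing the work), and the "ok" verdict convention is invented for the example.

```python
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical stand-in for "send prompt, get text back"


def orchestrate(goal: str, planner: Model, worker: Model, verifier: Model,
                max_retries: int = 1) -> List[str]:
    """Plan with the expensive model, execute with the cheap one, verify each result."""
    # The planner is assumed to return one subtask per line.
    subtasks = [s.strip() for s in planner(goal).splitlines() if s.strip()]
    results = []
    for sub in subtasks:
        draft = ""
        for _ in range(max_retries + 1):
            draft = worker(sub)
            verdict = verifier(f"Task: {sub}\nResult: {draft}")
            if verdict.strip().lower().startswith("ok"):
                break  # accepted; stop retrying
        # If every attempt is rejected, the last draft is kept anyway.
        results.append(draft)
    return results


# Stub models so the sketch runs without any provider SDK.
def stub_planner(goal: str) -> str:
    return "write parser\nwrite tests"

def stub_worker(subtask: str) -> str:
    return f"done: {subtask}"

def stub_verifier(report: str) -> str:
    return "ok"

results = orchestrate("ship the feature", stub_planner, stub_worker, stub_verifier)
print(results)
```

The expensive model only sees short planning and verification prompts, which is where the claimed token savings would come from. Whether quality holds up is, as the comment says, not yet measured.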

Chad Brewbaker@SMT_Solvers

@jerryjliu0 Federate your creds bro. Don't use the underwear account to test new stuff - set up a new account with different billing.

3h ago

Shailesh@0xThoughtVector

@jerryjliu0 you guys work on parsing pdfs so there's def AI/ML involved. you guys know that Fable is nerfed for AI/ML, right?

3h ago

Ahmed Raza@ahmedrsyed

@jerryjliu0 yeah fair point - per-run attribution breaks under tokenmaxxing. Perhaps the math just moves up a level where team spend vs what the team shipped that quarter is what matters. Ceiling at that point might be how much agent output the team can actually absorb and act on vs the bill

3h ago

Pushkar Jain@pushkar_jain26

@jerryjliu0 I used 86% of my weekly limits in 16 hours on a $200 license. No more tokenmaxxing

3h ago
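
For a sense of scale, the figures in the comment above can be projected forward. This sketch assumes usage continues at a constant rate, which real usage rarely does; the function name is made up for the example.

```python
def hours_until_exhausted(fraction_used: float, hours_elapsed: float) -> float:
    """Remaining hours before the quota runs out, assuming a constant burn rate."""
    if not 0 < fraction_used <= 1:
        raise ValueError("fraction_used must be in (0, 1]")
    return hours_elapsed * (1 - fraction_used) / fraction_used


# 86% of the weekly limit used in 16 hours, per the comment.
remaining = hours_until_exhausted(0.86, 16)
print(f"about {remaining:.1f} more hours at this pace")
```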

Saylor@seylorra

@jerryjliu0 imagine burning $150/hr on claude and still having to beg for more tokens. fable changed the math, guess api calls finally cheaper than peace of mind

2h ago

david@davidtsong

@jerryjliu0 good idea to use fable as a planning agent->delegate to cheaper coding models

3h ago

SuMo@su_mo

@jerryjliu0 This is at least partly inconsistent with the VC thesis on Anthropic investment.

2h ago

sdmat@sdmat123

@ahmedrsyed @jerryjliu0 Senior SWEs do more than code.

They set the technical direction, evaluate considerations that go beyond the codebase, and act as a reservoir of institutional knowledge.

AGI will be able to do all of that but we don't have AGI yet.

Dollar equivalence is a category error.

2h ago

Ferbin@Ferbin08

@jerryjliu0 The rate limit's the real constraint now, not cost.

Cheaper tokens mean you can push harder but hit the ceiling sooner.

For systems optimized for throughput, you've flipped which problem matters.

3h ago

Gil@gilnotmountain

@jerryjliu0 If fable goes for api based billing and Gpt5.6 doesnt, do you think there will be a shift in dev market share?

2h ago

Jerry Liu@jerryjliu0

@ahmedrsyed i think that's super hard to say, because tokenmaxxing == you're fine wasting tokens massively as long as SOME of it produces meaningful output relative to your bandwidth towards processing

3h ago