Analysis finds using Hugging Face CLI for AI agent workflows cuts token consumption by up to 6x compared to raw API calls

VIEWS45.7KBOOKMARKS233LIKES412RETWEETS63REPLIES71

clem 🤗@ClementDelangue

Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents!

The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!

We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.

Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).

And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong. In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time.

Good tools are cached intelligence for agents!

So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.

We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!

https://huggingface.co/blog/hf-cli-for-agents

21h45.7K412233

Matt Hartman@MattHartman

good dev tools are cached intelligence for agents!

clem 🤗@ClementDelangue

Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents!

The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!

We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.

Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).

And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong. In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time.

Good tools are cached intelligence for agents!

So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.

We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!

https://huggingface.co/blog/hf-cli-for-agents

16h6.9K126

Pete Skomoroch@peteskomoroch

@ClementDelangue This reminds me of university physics exams. Instead of memorizing all of electrodynamics you can re-derive most things during the exam, but that leaves you less time to finish the actual problems and isn’t an optimal strategy.

clem 🤗@ClementDelangue

Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents!

The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints!

We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch.

Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).

And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong. In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time.

Good tools are cached intelligence for agents!

So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.

We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast!

https://huggingface.co/blog/hf-cli-for-agents

18h47961

Faust@___faust____

@MattHartman cached intelligence is a great way to put it. the real play is making those tools invisible until the agent needs them.

13h5