A thing that API users of frontier models (enterprise IT deployments, for example) can miss is how powerful models are in their native harnesses. It is hard to get Claude or GPT via API to be anywhere near as capable as they are in Code or Codex & its harder as models get smarter
Many users rejected claims that native interfaces unlock more power from frontier models than APIs, citing personal experiences where APIs felt dumber or third-party tools outperformed natives on coding tasks.
Most Activity
Deleted a tweet on the fact that API users don't understand how much more powerful the frontier models are in native harnesses since I didn't differentiate in the post (limited characters!) between folks carefully evaluating other harnesses for tasks & those just using naked API.

@emollick The thing that harness users miss is how much power there is in designing your own harness (instructions, tool surface, loops, etc) around raw API calls
Unfortunately this is not subsidized and 10-100x more costly right now

@emollick The harnesses are great but it’s not that hard. You have Agents SDK that works super well, and we’ve done work where we used CC to create a harness and tools for API and it works just as well

i’ve heard this a lot but i still can’t wrap my head around it as the benchmarks & evals i’ve seen don’t support native harnesses being better over say a Pi, Droid or even a given benchmark’s custom harness
Are you referring to IT teams trying to homebrew their own or something else?

@emollick api users miss out on all the hidden context and scaffolding.

@emollick This is simply not true. Opencode etc consistently outperform native claude code by a significant margin. Model provider harnesses have misaligned incentives. The only reason they are so popular is for subsidized tokens. Enterprises should invest in harness modularity.

@emollick Yes. Although it seems that native harnesses aren't the best harnesses.

@emollick Except Gemini, which is better via API.😂

@emollick

@emollick ... You can use the models via API in their native harnesses though

@emollick yup, hard to ignore all the work even to just prompt these agents

@emollick i’ve seen this firsthand, the same model via API feels noticeably dumber on big coding/refactoring tasks compared to using it directly in its own environment.
it forced me to improve my harness which is fine

@emollick People compare APIs to products like they're the same thing. The harness is often half the intelligence.

@emollick I think that most users, in general, won't be able to understand it; most use AI for simple tasks

@emollick Or just build your own harness.

@emollick Miss us forever

@emollick tweet delete speedrun before the receipts get clipped
the naked api crowd really does need to hear this tho

@emollick People keep saying this, but I see no evidence for it beyond vibes. Can you provide some?
The Pi harness found that just stating in the system prompt that it is Claude Code improved performance.

@emollick like building your own harness from 0 - probably not a good idea
using the claude sdk - imo fine if it gets you going but you could almost certainly do better on performance/cost
and now seems a bad time for vendor locking

@emollick the abstraction tax gets heavier the smarter the model gets
the full context window is half the actual value