
Researcher Questions Whether to Optimize Models or Evaluation Harnesses

Original post
Philipp Schmid @_philschmid

My personal research question for today: Should we optimize the model for a harness or should the harness be optimized for the model?

7:36 AM · Jun 6, 2026 · 26.4K Views
Sentiment

Users endorsed optimizing models over evaluation harnesses because brittle tools break as models improve, while others objected that customizing harnesses for models creates constant issues.

Positive 33.3% · Negative 66.7%
6 comments with sentiment.
Posts from X
Aaron Slodov @aphysicist

@_philschmid harnesses should just be the new benchmark

3h · Views 380 · Likes 3 · Bookmarks 1

@_philschmid Yes

2h · Views 359 · Likes 5 · Bookmarks 0
Leo Tavares @LeoTava8

@_philschmid Co-design, but the harness is where iteration actually happens. You swap models every few months and tweak the harness hourly. Build it to absorb model churn instead of leaning on one model's quirks, and you stay portable.

5h · Views 243 · Likes 1 · Bookmarks 1
LeetLLM.com @leetllm

@_philschmid Optimizing for a harness is just sophisticated overfitting. The second you tailor models to specific test setups, you've lost. Sutton's Bitter Lesson always wins: scale compute over heuristics.

4h · Views 170 · Likes 5
cogsec @affaan

@_philschmid why in the hell would you optimize the model for the harness instead of creating a meta-harness (a harness that form-fits around the model or models)? The friction in one direction versus the other is greatly imbalanced.

2h · Views 39 · Likes 2
Charlie Aschkenasy @CAschkenasy

@_philschmid Increasingly think the harness is the differentiator, not the model. And the winning harness is model-agnostic: optimized for all of them, so your model preference doesn't matter, you rely on the harness to execute either way. Nobody's cracked it so far.

5h · Views 185
A War @AWar1586398

@_philschmid The former, no doubt. Models are becoming commoditized even at their inflated expense. The tool set is what matters.

5h · Views 46 · Likes 1
Markus J. Buehler @ProfBuehlerMIT

@_philschmid Neither. Evolve, adapt, synergize.

6h · Views 141 · Likes 4
hirenpatelatl @hirenpatelatl

@_philschmid The model should run in a default harness that knows all the jobs to be done. Then fine-tunes/variants should be made for popular harnesses. Start with Hermes, please. Then fix AGY with what's learned. It's been unusable compared to Gemini CLI.

5h · Views 117

@_philschmid Ding ding ding! Someone is feeling my vibe.

It costs a lot fscking less to run search over harness templates than it does to train, brother.

Clearly it's the harness.

5h · Views 34 · Likes 1
GUBA @gubatron

@_philschmid I want my harness to work with every model I throw at it. I'm model agnostic and will simply prefer to use the right model for the task, and often I just want the most intelligence, as fast and cheap as possible. I have no allegiance to any one model.

3h · Views 32 · Likes 1

@_philschmid Gemini struggles with agentic needs like following custom instructions set in the harness (in my case AG). It fails to look for relevant agent skills and use them unless explicitly told to in the chat message. The same goes for MCP. I would love some agentic magic in Gemini.

6h · Views 81
Auriel @aurielws

@_philschmid They should move together :)

5h · Views 106 · Likes 3
Praveen Venkatesh @praveenvnktsh

@_philschmid If model capabilities improve consistently, the harness should become simpler over time.

4h · Views 29
Max Andrews @madmaxbr5

@_philschmid neither TBH. The harness should provide clear and coherent abstractions that any intelligent model can use effectively.

4h · Views 23
Fahis @fahism767

@_philschmid Honestly, probably both, but the harness is usually the first thing people underestimate.

4h · Views 163 · Likes 2
Dan McAteer @daniel_mac8

@_philschmid Yes.

4h · Views 162 · Likes 2
yamed @yamedoff

@CAschkenasy @_philschmid This is impossible, optimized for everything means optimized for none.

5h · Views 12
Ali Raza @aleeraza003

@_philschmid Optimize the harness for the model; the model changes every week.

4h · Views 115 · Likes 2