/AI · 7h ago

Researcher Questions Whether to Optimize Models or Evaluation Harnesses

175265113526.1K

Original post

My personal research question for today: Should we optimize the model for a harness or should the harness be optimized for the model?

7:36 AM · Jun 6, 2026 · 26.4K Views

Philipp Schmid@_philschmid · #927 in AI

Sentiment

Users endorsed optimizing models over evaluation harnesses because brittle tools break as models improve, while others objected that customizing harnesses for models creates constant issues.

Pos: 33.3%

Neg: 66.7%

6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 380 · Bookmarks: 1

Aaron Slodov@aphysicist

@_philschmid harnesses should just be the new benchmark

3h38031

Likes: 5

Bojan Tunguz@tunguz

@_philschmid Yes

2h35950

Replies: 2

Leo Tavares@LeoTava8

@_philschmid Co-design, but the harness is where iteration actually happens. You swap models every few months and tweak the harness hourly. Build it to absorb model churn instead of leaning on one model's quirks, and you stay portable.

5h24311

LeetLLM.com@leetllm

@_philschmid Optimizing for a harness is just sophisticated overfitting. The second you tailor models to specific test setups, you've lost. Sutton's Bitter Lesson always wins: scale compute over heuristics.

4h1705

cogsec@affaan

@_philschmid Why in the hell would you optimize the model for the harness instead of creating a meta-harness, or a harness that form-fits around the model or models? The friction in one direction versus the other is greatly imbalanced.

2h392

Charlie Aschkenasy@CAschkenasy

@_philschmid Increasingly think the harness is the differentiator, not the model. And the winning harness is model-agnostic: optimized for all of them, so your model preference doesn't matter, you rely on the harness to execute either way. Nobody's cracked it so far.

5h185

A War@AWar1586398

@_philschmid The former, no doubt. Models are becoming commoditized even at their inflated expense. The tool set is what matters.

5h461

Markus J. Buehler@ProfBuehlerMIT

@_philschmid Neither. Evolve, adapt, synergize.

6h1414

hirenpatelatl@hirenpatelatl

@_philschmid The model should run in a default harness, knowing all jobs to be done. Then fine-tunes/variants should be done for popular harnesses. Start with Hermes, please. Then fix AGY from what’s learned. It’s been unusable compared to Gemini CLI.

5h117

Kyle 'esSOBi' Stone@essobi

@_philschmid Ding ding ding! Someone is feeling my vibe.

It costs a lot fscking less to run search over harness templates than it does to train, brother.

Clearly it's the harness..

5h341

GUBA@gubatron

@_philschmid I want my harness to work with every model I throw at it. I'm model agnostic and will simply prefer to use the right model for the task, and often I just want the most intelligence, as fast and cheap as possible. I have no allegiance to any one model.

3h321

Darth Developer@codeXDXD

@_philschmid Gemini struggles with agentic needs like following custom instructions set in the harness (in my case, AG). It fails to look for relevant agent skills and use them unless specifically told to in the chat message. The same goes for MCP. I would love some agentic magic in Gemini.

6h81