My personal research question for today: Should we optimize the model for a harness or should the harness be optimized for the model?
Users endorsed optimizing models over evaluation harnesses because brittle tools break as models improve, while others objected that customizing harnesses for models creates constant issues.

@_philschmid harnesses should just be the new benchmark
@_philschmid Yes

@_philschmid Co-design, but the harness is where iteration actually happens. You swap models every few months and tweak the harness hourly. Build it to absorb model churn instead of leaning on one model's quirks, and you stay portable.

@_philschmid Optimizing for a harness is just sophisticated overfitting. The second you tailor models to specific test setups, you've lost. Sutton's Bitter Lesson always wins: scale compute over heuristics.

@_philschmid Why in the hell would you optimize the model for the harness instead of creating a meta-harness, a harness that form-fits around the model or models? The friction in one direction versus the other is greatly imbalanced.

@_philschmid Increasingly think the harness is the differentiator, not the model. And the winning harness is model-agnostic: optimized for all of them, so your model preference doesn't matter, you rely on the harness to execute either way. Nobody's cracked it so far.

@_philschmid The former, no doubt. Models are becoming commoditized even at their inflated expense. The tool set is what matters.

@_philschmid Neither. Evolve, adapt, synergize.

@_philschmid The model should run in a default harness knowing all jobs to be done. Then fine-tunes/variants should be done for popular harnesses. Start with Hermes please. Then fix AGY from what’s learned. It’s been unusable compared to Gemini CLI.

@_philschmid Ding ding ding! Someone is feeling my vibe.
It costs a lot fscking less to run search over harness templates than it does to train, brother.
Clearly it's the harness..

@_philschmid I want my harness to work with every model I throw at it. I'm model agnostic and will simply prefer to use the right model for the task, and often I just want the most intelligence, as fast and cheap as possible. I have no allegiance to any one model.

@_philschmid Gemini struggles with agentic needs like following custom instructions set in the harness (in my case AG). It fails to look for relevant agent skills and use them unless specifically told to in the chat message. Same goes for MCP. I would love some agentic magic in Gemini.

@tunguz @_philschmid This is the only way

@_philschmid They should move together :)

@_philschmid If model capabilities improve consistently, the harness should become simpler over time.

@_philschmid neither TBH. The harness should provide clear and coherent abstractions that any intelligent model can use effectively.

@_philschmid Honestly, probably both - but the harness is usually the first thing people underestimate.

@_philschmid Yes.

@CAschkenasy @_philschmid This is impossible, optimized for everything means optimized for none.

@_philschmid Optimize the harness for the model, the model changes every week