Benjamin Anderson says open-source agent evaluation tool Harbor is poorly suited for training non-coding agents

The analysis sparked a debate on Harbor's adaptability for search-based tasks.

Original post

Ben (no treats) @andersonbcdefg (#1627 in Tech)

@vwxyzjn i wish it was a better fit for tasks that arent coding agent in a sandbox shaped. at least that was my read when i evaluated using it to train a search agent

Costa Huang @vwxyzjn

Harbor is really great. I like the design and it's well polished for doing evals. It would be great to use the same rollout utility for everything (RL / eval / new tasks definitions).

4:47 PM · Jul 4, 2026 · 153 Views

Posts from X

Views: 39 · Likes: 1

Costa Huang @vwxyzjn

@andersonbcdefg why can't a search based task not a good fit for harbor's task definition?
