/Tech · 9h ago

AI Researcher Argues Manual Methods Beat Automated Research On Messy Objectives

Original post

Cody Blakeney (@code_star) in Tech

I think people are taking results from things with really easy and meaningful metrics like loss / perplexity and assuming it applies more broadly than it really does.

I promise I can out data the auto researchers right now. I’m cheating because it’s a messy objective, but that’s the point right?

3:41 AM · Jun 9, 2026 · 3K Views

Sentiment

Commenters endorse the argument that manual methods outperform automated research on messy objectives, where no single metric captures quality.

Positive: 100.0%

Negative: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 646 · Likes: 9 · Replies: 2

Cody Blakeney (@code_star)

Sorry, I don’t mean “I’m cheating” to imply I’m benchmaxxing or actually cheating.

I mean if what data went into a model was as clear cut as a single number for all scales, etc. than the auto researchers probably would already outpace humans.

My point is that defining good isn’t even easy to do for what we want out of base/midtrained models. So we cannot produce an easy metric, and as such we cannot give an auto researcher an easy hill to climb.

It’s not that the work isn’t hill climbing, it’s that without something to give you a dead reckoning I don’t know that the auto researchers are really up to the task yet.
