Tech · 1d ago

AI Researcher Argues Manual Methods Beat Automated Research On Messy Objectives

Original post

I think people are taking results from things with really easy and meaningful metrics like loss / perplexity and assuming they apply more broadly than they really do.

I promise I can out-data the auto researchers right now. I’m cheating because it’s a messy objective, but that’s the point, right?

3:41 AM · Jun 9, 2026 · 4K Views

Cody Blakeney (@code_star) · #1088 in Tech

Sentiment

Users backed the argument that manual methods beat automated research on messy objectives because quality beats raw metrics like loss or perplexity.

Positive: 100.0% · Negative: 0.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 710 · Likes: 12 · Replies: 2

Cody Blakeney (@code_star)

Sorry, I don’t mean “I’m cheating” to imply I’m benchmaxxing or actually cheating.

I mean if what data went into a model was as clear-cut as a single number for all scales, etc., then the auto researchers probably would already outpace humans.

My point is that defining “good” isn’t even easy to do for what we want out of base/midtrained models. So we cannot produce an easy metric, and as such we cannot give an auto researcher an easy hill to climb.

It’s not that the work isn’t hill climbing, it’s that without something to give you a dead reckoning I don’t know that the auto researchers are really up to the task yet.

