Tech · 5h ago

Researcher Flags Rise of Self-Made AI Benchmarks in Training and Evals



Original post

rohan anil @_arohan_ · #86 in Tech

One shall not make a benchmark with their own rules and compete in the same benchmark is a reasonable rule of thumb.

But this is common now across many topics, from training, inference, evals.

4:12 PM · Jun 10, 2026 · 4K Views


Sentiment

The one reply scored for sentiment dismissed the proposed rule of thumb as overly ambitious, since researchers now commonly tune their own metrics.

Positive: 0.0% · Negative: 100.0% (1 comment with sentiment)

Cluster Engagement

Posts from X

Most activity: 1.9K views · 1 bookmark · 4 likes

rohan anil @_arohan_

I guess its a form of circular economy

rohan anil @_arohan_

One shall not make a benchmark with their own rules and compete in the same benchmark is a reasonable rule of thumb.

But this is common now across many topics, from training, inference, evals.


Replies: 1

rohan anil @_arohan_

@init_malachi Because you are competing with yourself

5h

M @init_malachi

@_arohan_ in real life you kinda have to tho

5h

M @init_malachi

@_arohan_ never known different tbh

5h

Ω.KendrickPlumard @fouriergalois

@_arohan_ let’s get back to the vague posting though. when will we see the loss curve for the 4th order optimizer

5h

Invincible @InvincibleEdge

@_arohan_ the difference between a benchmark and a flex is fading fast

Rugbist @rugbist_

@_arohan_ bit ambitious when everyone just tunes their own tape now

Blissy @BlissyOnX

@_arohan_ not wrong but the line between "tuning the rules" and "just knowing your own setup well" gets real thin