"One shall not make a benchmark with their own rules and then compete in that same benchmark" is a reasonable rule of thumb.
But this is now common across many topics: training, inference, evals.
I guess it's a form of circular economy

@init_malachi Because you are competing with yourself

@_arohan_ in real life you kinda have to tho

@_arohan_ never known different tbh

@_arohan_ let’s get back to the vague posting though. when will we see the loss curve for the 4th order optimizer?

@_arohan_ the difference between a benchmark and a flex is fading fast

@_arohan_ bit ambitious when everyone just tunes their own tape now

@_arohan_ not wrong but the line between "tuning the rules" and "just knowing your own setup well" gets real thin