AI · 23h ago

AI Expert Urges Reality-Based Benchmarks Beyond Human Preferences

Original post

Anastasios Nikolas Angelopoulos (@ml_angelopoulos) · #798 in AI

Reality is the only benchmark that actually matters. This is because it is grounded in objective truth and therefore cannot be overfit or gamed. Arena is built to measure the post-deployment characteristics of AI in the hands of real users.

Then, what should we measure? What if we went beyond human preference and started measuring everything?

4:28 PM · Jun 3, 2026 · 1.4K Views

Posts from X

Most Activity

Views: 806 · Bookmarks: 1 · Likes: 5

Evan (@evan_a_frick)

I'm increasingly worried that models will be able to act differently while being explicitly benchmarked. New evaluation methodology will be necessary to check that these systems are actually safe and performing as expected in the real world. We need to measure every facet of each model.

6h ago

Retweets: 1
