Researcher Dimitris Papailiopoulos says GPQA-Diamond is the single most predictive benchmark of overall AI model performance
Epoch AI dashboard tracks 94 results from OpenAI, Anthropic, and Google models.
——0——
@scaling01 it's also the single most predictive benchmark in terms of overall performance
I think GPQA-Diamond was one of the best benchmarks of all time despite its many flaws simply because it was useful for multiple years
10:25 PM · May 21, 2026 · 3.3K Views
10:46 PM · May 21, 2026 · 444 Views