GDP.pdf measures whether models can read the messy professional documents - wiring diagrams, rocket schematics - that run the world.
Riemann-bench measures research-level math, written by Ivy League profs and IMO medalists in the course of their work.
...and climbing them both?...
the stuff of fables 😎
congrats Anthropic!