Let’s talk about evals.
We’re always looking for better ways to measure and forecast model progress, especially as benchmarks get saturated or gamed.
@tejalpatwardhan, who leads our frontier evals team, spoke to @andrewmayne about why evals matter and what models need to be judged on next.
