
Lun Wang departs Google DeepMind and argues evaluations determine training objectives, safety layers, scaling decisions, and safe capability transitions for frontier AI systems

He called for adaptive evaluations as models cross new thresholds.

Original post

I’ve left Google DeepMind after an amazing chapter. I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale. As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals. We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations. https://wanglun1996.github.io/blog/your-evals-will-break.html

8:57 PM · May 17, 2026

Your model is what you measure

bilal @bilaltwovec
5:56 PM · May 18, 2026 · 31K Views
6:50 PM · May 18, 2026 · 2K Views
Lun Wang @lunwang1996


3:57 AM · May 18, 2026 · 559.7K Views

this part's even better

bilal @bilaltwovec
6:35 PM · May 18, 2026 · 365 Views