
Lun Wang departs Google DeepMind and argues that evaluations determine training objectives, safety layers, scaling decisions, and safe capability transitions for frontier AI systems

He called for adaptive evaluations as models cross new thresholds.



Original post

#1990 @BRENDANFOODY (OP)

Lun Wang (@LUNWANG1996)

I’ve left Google DeepMind after an amazing chapter. I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale. As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals. We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations. https://wanglun1996.github.io/blog/your-evals-will-break.html

8:57 PM · May 17, 2026

QUOTE POST