I'm big believer in just doing the obvious thing. Turns out you can diff two models by just asking an agent to do it!
New research update from the Google DeepMind Language Model Interpretability team.
We build and evaluate dead simple open-ended model diffing agents tasked with studying the behavioural differences between two models, and find them to be promising in practice.










