馃Уwe've been teasing this @guidelabsai for a while: interpretability scales. Not survives scaling, but models actually get easier to understand, as you scale, if trained correctly. Let me share some of my favorite results in this thread.
Today we鈥檙e announcing a finding that breaks a core assumption in AI: that bigger models are harder to understand.
We show the opposite. When interpretability is built into training, models become MORE understandable as they become more capable.