AI researcher Herbie Bradley says leading closed labs use strict internal evals to avoid public benchmark overfitting · Digg