A benchmark by Stanford NLP's Chenglei Si finds Claude-Fable-5 leads on autoresearch, while open-weight Kimi-K2.7-Code tops ML engineering · Digg