Jeff Dean and Andy Konwinski launch ALE, an agent benchmark finding frontier models score 0% on complex professional tasks · Digg