Anthropic's Julian Schrittwieser argues difficult software benchmarks from Ofir Press are vital for tracking AI progress · Digg