SWE-bench creator Ofir Press outlines a cyclical framework where benchmark development drives language model progress · Digg