Upcoming benchmark from PTB and FutureSim teams finds all AI agents perform poorly · Digg