Stanford's Sanmi Koyejo and collaborators release Terminal-Bench-Science to evaluate AI agents on real-world scientific terminal workflows · Digg