Tech · 3h ago

Stanford's Sanmi Koyejo and collaborators release Terminal-Bench-Science to evaluate AI agents on real-world scientific terminal workflows

Domain experts directly authored and verified all benchmark tasks

Original post

Laude Institute (@LaudeInstitute)

TBench Science / @DillmannSteven, @ryanmart3n, @alexgshaw, @Mike_A_Merril, @AlexGDimakis, @sanmikoyejo, @lschmidt3 (@Stanford) A benchmark for evaluating AI agents on real computational workflows across the natural sciences, with tasks authored and verified by scientific domain experts.

10:19 AM · Jun 25, 2026 · 2.1K Views

Sentiment

Users congratulated the Stanford team and Laude Institute on releasing the TBench Science Benchmark for AI Agents, praising the exceptional batch of projects assembled by the researchers.

Positive: 100.0%

Negative: 0.0%

1 comment with sentiment.

Related links

TB Science task dashboard (stevendillmann.github.io)

Posts from X

Views: 184 · Bookmarks: 1 · Likes: 14

Laude Institute (@LaudeInstitute)

Congrats to Research Partner @bradenjhancock and Laude Institute co-founder @ChrisRytting on assembling another exceptional batch. Every project in Slingshots // THREE ships open source. Full announcement: http://laude.org/updates/slingshots-three

3h ago

Retweets: 3

Steven Dillmann (@StevenDillmann)

Honored to have Terminal-Bench-Science included in Slingshots // THREE, alongside such a strong lineup of researchers and projects. Building a benchmark to evaluate AI agents on computational workflows across the natural sciences — authored and verified by real domain experts. Grateful for the incredible support from @LaudeInstitute & @bradenjhancock, and to all our contributors making this happen. ⚛️🧪

Check out the current progress on our brand-new task submission dashboard: https://stevendillmann.github.io/tb-science-task-dashboard/

Laude Institute (@LaudeInstitute)

2h1.1K143

Laude Institute (@LaudeInstitute)

@ryanmart3n @alexgshaw @AlexGDimakis @sanmikoyejo @lschmidt3 @Stanford @StevenDillmann @Mike_A_Merrill

3h611