Terminal-Bench Science extends its platform, already used by Anthropic, OpenAI, and Google DeepMind, with an open call for over 100 scientific workflow tasks for AI agent evaluation

QUOTE POST

I'm very excited about this extension of the celebrated Terminal-Bench to science.

If you're a scientist (life, physical, earth, mathematical sciences, etc.) interested in AI, definitely check this out!

Terminal-Bench evaluates how well AI models can control tools on a computer to achieve a goal (using the command line). T-Bench Science now extends that to "AI for Science", and it comes with a call to contribute your own real-world scientific workflows to the benchmark (until August 2026).

The more workflows there are, and the more diverse they are, the better the next generation of AI models will be at helping you in your daily research.

Note that this is not a training dataset; it's meant to evaluate frontier model performance.

Steven Dillmann @StevenDillmann

📣 Announcing Terminal-Bench Science: benchmarking AI agents on real scientific workflows – now open for task contributions👇 http://tbench.ai/news/tb-science-announcement @AnthropicAI, @OpenAI, and @GoogleDeepMind use Terminal-Bench to evaluate AI on coding tasks. We're now extending it to scientific workflows. 1/6🧵

5:00 PM · May 20, 2026 · 455.4K Views

5:47 PM · May 20, 2026 · 4.5K Views

QUOTE POST

Lisan al Gaib @scaling01

let the hill climbing on scientific tasks begin

new benchmark: Terminal-Bench Science

9:25 PM · May 20, 2026 · 2K Views

REPLY

Lisan al Gaib @scaling01

it's currently still being built and you can submit your verifiable tasks until august 17th 2026

9:26 PM · May 20, 2026 · 510 Views

QUOTE POST

Alex Ratner @AJRATNER

Extremely excited for Terminal-Bench Science, which we're proud to support via our Open Benchmarks Grants @SnorkelAI!

9:15 PM · May 20, 2026 · 816 Views
