AI · 2d ago

Dylan Zhang releases CausaLab, a benchmark evaluating LLM agents on scientific discovery in fictional environments where they cannot use memorized data

Agents must actively experiment to build causal world models.


Original post

Chenhao Tan#566

Dylan Zhang (@dylan_works_)

A real scientist doesn't look up how the world works — they intervene, observe, and revise until a theory holds for a case they've never seen. CausaLab drops an LLM agent into a lab where memorized facts are useless ("Quantum Crystals on Planet X") and asks for the same. https://dylanzsz.github.io/causalab/

2:18 PM · Jun 6, 2026 · 59.9K Views

Posts from X

CLS (@ChengleiSi)

AI Scientists need to build world models from their experiments!
