Dylan Zhang releases CausaLab, a benchmark evaluating LLM agents on scientific discovery in fictional environments where they cannot use memorized data

Agents must actively experiment to build causal world models.

Original post · Chenhao Tan · #566
Dylan Zhang (@dylan_works_)

A real scientist doesn't look up how the world works — they intervene, observe, and revise until a theory holds for a case they've never seen. CausaLab drops an LLM agent into a lab where memorized facts are useless ("Quantum Crystals on Planet X") and asks for the same. https://dylanzsz.github.io/causalab/

2:18 PM · Jun 6, 2026 · 59.9K Views
CLS (@ChengleiSi)

AI Scientists need to build world models from their experiments!

6h · Views 4.2K · Likes 19 · Bookmarks 10