A real scientist doesn't look up how the world works — they intervene, observe, and revise until a theory holds for a case they've never seen. CausaLab drops an LLM agent into a lab where memorized facts are useless ("Quantum Crystals on Planet X") and asks for the same. https://dylanzsz.github.io/causalab/
Dylan Zhang releases CausaLab, a benchmark that evaluates LLM agents on scientific discovery in fictional environments, where they cannot rely on memorized knowledge.
Agents must actively experiment to build causal world models.
AI Scientists need to build world models from their experiments!