🔥 AutoResearchClaw tech report + v0.5.0 just dropped.
12,300+⭐ on GitHub. Two big additions this release:
🧪 1/ Domain-Expert Agents in the experiment stage: Specialized agents for high-energy physics, biology, and more. Real domain tools + knowledge plugged in — not a generic LLM pretending to run experiments.
📊 2/ ARC-Bench: a 55-topic benchmark across ML, HEP, quantum physics, biology, and statistics. One of the broadest cross-disciplinary evaluations for autonomous research ever released.
🏆 The numbers:
→ Beats AI Scientist v2 by 54.7% on ARC-Bench
→ 7-mode HITL (human-in-the-loop) ablation: targeted intervention beats both full autonomy and exhaustive oversight
The thesis (still): real research isn't a pipeline. Hypotheses fail. Lessons compound. AutoResearchClaw is a research amplifier — not a paper generator.
📄 Tech report: https://arxiv.org/abs/2605.20025
💻 Code: https://github.com/aiming-lab/AutoResearchClaw
Thanks to @itsJiaqiLiu and @StephenQS0710, who led the work, and to all other contributors: @HaonianJi, @lillianwei423, @XinyeYee, @richardxp888, @HaoqinT, @Xinyu2ML, @WeitongZhang, @jiahengzhang96, @LINJIEFUN, @linjunz_stat, @yuyinzhou_cs, @CaimingXiong, @james_y_zou, @ZhengBerkeley, @cihangxie, @dingmyu
Everyone's excited about Karpathy's autoresearch that automates the experiment loop.
We automated the whole damn thing. 🦞
Meet AutoResearchClaw: one message in, full conference paper out. Real experiments. Real citations. Real code. No human in the loop.
Here's what happens in between:
📚 Raids arXiv & Semantic Scholar, digests 50+ papers in minutes
🥊 Three AI agents FIGHT over the best hypothesis (one swings big, one sanity-checks, one tries to kill every idea)
💻 Writes experiment code from scratch, adapts to your hardware
💥 Code crashes at 3am? It reads the stack trace, rewrites the fix, keeps going
🔄 Results weak? It pivots to entirely new hypotheses and starts over
📝 Drafts a full paper with citations, every single one verified against live databases
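For the curious, the crash-repair and pivot steps above can be pictured as a small control loop. This is only a toy sketch under our own assumptions, not AutoResearchClaw's actual code: `run_with_repair`, `research_loop`, and the `toy_*` helpers are hypothetical names invented for illustration, and a real system would use an LLM where the toy repair function does a string replace.

```python
import traceback
from typing import Any, Callable, Iterable, Optional, Tuple


def run_with_repair(code: str,
                    repair: Callable[[str, str], str],
                    max_attempts: int = 3) -> Optional[dict]:
    """Run `code`; on a crash, hand the source and stack trace to `repair` and retry."""
    for _ in range(max_attempts):
        scope: dict = {}
        try:
            exec(code, scope)  # hypothetical stand-in for launching an experiment
            return scope
        except Exception:
            # Feed the full stack trace back so the repair step can react to it.
            code = repair(code, traceback.format_exc())
    return None  # still crashing after max_attempts


def research_loop(hypotheses: Iterable[Any],
                  write_code: Callable[[Any], str],
                  repair: Callable[[str, str], str],
                  is_strong: Callable[[Any], bool],
                  max_attempts: int = 3) -> Optional[Tuple[Any, Any]]:
    """Try hypotheses in order; pivot to the next one when results are weak or unrecoverable."""
    for hypothesis in hypotheses:
        scope = run_with_repair(write_code(hypothesis), repair, max_attempts)
        if scope is not None and is_strong(scope.get("result")):
            return hypothesis, scope["result"]
    return None  # every hypothesis was weak


# Toy stand-ins (assumptions for the demo only).
def toy_write_code(hypothesis: int) -> str:
    return f"result = 1 / {hypothesis}"  # hypothesis 0 crashes on purpose


def toy_repair(code: str, trace: str) -> str:
    # Pretend "fix": only reacts when the trace shows a division by zero.
    return code.replace("/ 0", "/ 10") if "ZeroDivisionError" in trace else code


def toy_is_strong(result: Any) -> bool:
    return result is not None and result >= 0.5


outcome = research_loop([0, 4, 2], toy_write_code, toy_repair, toy_is_strong)
print(outcome)
```

In this toy run the first hypothesis crashes, gets patched, and still scores too low, the second is weak, and the third clears the bar. The point is that repair and pivot are separate loops: one retries the same idea, the other moves on to a new one.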
No babysitting. No Slack messages. No "hey can you re-run this."
Karpathy built the experiment loop. We built the whole lab.
Chat an idea. Get a paper. 🦞
Try it 👉 https://github.com/aiming-lab/AutoResearchClaw
Kudos to the team: @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @zhengop, @cihangxie, @dingmyu. We are looking for more contributors.