AutoResearchClaw V0.5.0 Adds Domain-Expert Agents And ARC-Bench

Original post

🔥 AutoResearchClaw tech report + v0.5.0 just dropped.

12,300+⭐ on GitHub. Two big additions this release:

🧪 1/ Domain-Expert Agents in the experiment stage: Specialized agents for high-energy physics, biology, and more. Real domain tools + knowledge plugged in — not a generic LLM pretending to run experiments.

📊 2/ ARC-Bench: A 55-topic benchmark across ML, HEP, quantum physics, biology, and statistics. One of the broadest cross-disciplinary evaluations for autonomous research ever released.

🏆 The numbers:
→ Beats AI Scientist v2 by 54.7% on ARC-Bench
→ 7-mode HITL (human-in-the-loop) ablation: targeted intervention > full autonomy OR exhaustive oversight.

The thesis (still): real research isn't a pipeline. Hypotheses fail. Lessons compound. AutoResearchClaw is a research amplifier — not a paper generator.

📄 Tech report: https://arxiv.org/abs/2605.20025
💻 Code: https://github.com/aiming-lab/AutoResearchClaw

Thanks to @itsJiaqiLiu and @StephenQS0710, who led the work, and to all other contributors: @HaonianJi, @lillianwei423, @XinyeYee, @richardxp888, @HaoqinT, @Xinyu2ML, @WeitongZhang, @jiahengzhang96, @LINJIEFUN, @linjunz_stat, @yuyinzhou_cs, @CaimingXiong, @james_y_zou, @ZhengBerkeley, @cihangxie, @dingmyu

Huaxiu Yao @HuaxiuYaoML

Everyone's excited about Karpathy's autoresearch that automates the experiment loop.

We automated the whole damn thing. 🦞

Meet AutoResearchClaw: one message in, full conference paper out. Real experiments. Real citations. Real code. No human in the loop.

One message in → full paper out. Here's what happens in between:

📚 Raids arXiv & Semantic Scholar, digests 50+ papers in minutes
🥊 Three AI agents FIGHT over the best hypothesis (one swings big, one sanity-checks, one tries to kill every idea)
💻 Writes experiment code from scratch, adapts to your hardware
💥 Code crashes at 3am? It reads the stack trace, rewrites the fix, keeps going
🔄 Results weak? It pivots to entirely new hypotheses and starts over
📝 Drafts a full paper with citations, every single one verified against live databases
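
The crash-and-retry and pivot steps above can be pictured as a small control loop. The sketch below is only an illustration of that idea, not AutoResearchClaw's actual code: every name in it (`run_with_repair`, `research_loop`, `toy_write_code`, `toy_repair`, the `score` variable, the 0.5 threshold) is hypothetical, and the toy functions stand in for the LLM calls that would write and fix real experiment code.

```python
import traceback
from typing import Callable, Iterable, Optional, Tuple


def run_with_repair(
    code: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Run generated code; on a crash, hand the stack trace to `repair` and retry."""
    for _ in range(max_attempts):
        namespace: dict = {}
        try:
            exec(code, namespace)  # hypothetical: a real system would sandbox this
            return namespace
        except Exception:
            # "Reads the stack trace, rewrites the fix, keeps going"
            code = repair(code, traceback.format_exc())
    return None  # gave up after max_attempts


def research_loop(
    hypotheses: Iterable[str],
    write_code: Callable[[str], str],
    repair: Callable[[str, str], str],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Try each hypothesis in turn; pivot to the next when results are weak."""
    for hypothesis in hypotheses:
        namespace = run_with_repair(write_code(hypothesis), repair)
        if namespace is None:
            continue  # code could not be repaired, move on
        score = namespace.get("score")
        if score is not None and score >= threshold:
            return hypothesis, score
        # Weak result: fall through and pivot to the next hypothesis.
    return None


# Toy stand-ins for the code-writing and code-fixing agents (assumptions).
def toy_write_code(hypothesis: str) -> str:
    # The first draft has a deliberate bug: `accuracy` is never defined.
    return f"hypothesis = {hypothesis!r}\nscore = accuracy"


def toy_repair(code: str, trace: str) -> str:
    if "NameError" in trace:
        value = 0.9 if "safe bet" in code else 0.1
        return code.replace("score = accuracy", f"score = {value}")
    return code


result = research_loop(["big swing", "safe bet"], toy_write_code, toy_repair, threshold=0.5)
print(result)
```

In this toy run, both drafts crash once and get repaired; the first hypothesis then scores below the threshold, so the loop pivots and returns the second one. The real system presumably makes each of these decisions with agents rather than fixed rules.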

No babysitting. No Slack messages. No "hey can you re-run this."

Karpathy built the experiment loop. We built the whole lab.

Chat an idea. Get a paper. 🦞

Try it 👉: https://github.com/aiming-lab/AutoResearchClaw

Kudos to the team @JiaqiLiu835914, @richardxp888, @lillianwei423, @StephenQS0710, @Xinyu2ML, @HaoqinT, @zhengop, @cihangxie, @dingmyu, and we are looking for more contributors.

7:26 AM · May 20, 2026 · 6.5K Views