
Key Papers on LLM Agent Failures and Self-Improvement Highlighted

Original post
Yohei@yoheinakajima

shoutouts:
• why multi-agent LLM systems fail? (arXiv:2503.13657) - @mertcemri @melissapan + @istoica05 @matei_zaharia @profjoeyg @adityagp & team
• DSPy (arXiv:2310.03714) - @lateinteraction + @hazyresearch lab & co-authors
• GRASP (arXiv:2605.29668) - Jonas Moll, Jean-Philippe Corbeil et al.
• A self-improving coding agent (arXiv:2504.15228) - @maxime_robeyns, Martin Szummer & Laurence Aitchison
• Reflexion (arXiv:2303.11366) - Noah Shinn, Federico Cassano et al. (incl. @ShunyuYao12)
• LongMemEval & v2 (arXiv:2410.10813 & 2605.12493) - @DiWu0162 + Kai-Wei Chang et al.
• ExpeL (AAAI 2024) - @_AndrewZhao, Daniel Huang et al.
• where LLM agents fail and how they can learn from failures (arXiv:2509.25370) - Kunlun Zhu et al.

Yohei@yoheinakajima

two fun surprises from using activegraph:
- the coding agent i was using would query the trace db to debug instead of looking at the logs like they normally would (i didn't ask it to)
- when long eval runs broke (laptop, api, etc.), it was always able to pick up from right before it broke, never starting from the beginning again

3:03 PM · Jun 10, 2026 · 622 Views
Yohei@yoheinakajima

the paper is long (30 pages), but have your AI read it: https://arxiv.org/abs/2606.10241

here's a simple interactive tutorial on the topic that claude made: https://claude.ai/public/artifacts/038db6cf-11db-4777-9c5e-7a352f08119a

Yohei@yoheinakajima

paper #1 for context:
