ToolFailBench Accepted At Two ICML 2026 Workshops For Agent Diagnostics

Charles 🎉 Frye

Harsh (@SoHarshhh)

Really happy to share that “ToolFailBench” got accepted at two ICML 2026 workshops, FAGEN and AIWILD.

Most benchmarks evaluate tool-using agents with a single aggregate success rate, but that number can’t explain why a model actually fails. ToolFailBench is a diagnostic benchmark that scores tool use against a failure taxonomy instead of one number, breaking each trace into four distinct failure modes: skipping a tool that was needed, ignoring what a tool returns, fabricating tool outputs, and over-calling tools when none is needed. We find that models with similar aggregate scores fail in very different ways, so a single number isn’t enough to compare agents.
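To make the point about aggregate scores concrete, here is a minimal Python sketch of scoring against a failure taxonomy. It is not ToolFailBench's implementation: the `FailureMode` enum, the `failure_profile` helper, the one-label-per-trace simplification, and the toy labels are all assumptions made up for illustration.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # The four failure modes named in the post; the identifiers are hypothetical.
    SKIPPED_TOOL = "skipped a tool that was needed"
    IGNORED_RESULT = "ignored what a tool returned"
    FABRICATED_OUTPUT = "fabricated a tool output"
    OVER_CALLED = "called a tool when none was needed"


def failure_profile(labels: list[Optional[FailureMode]]) -> dict:
    """Summarize per-trace labels. None means the trace succeeded.

    Assumes at most one failure label per trace, which is a simplification.
    """
    total = len(labels)
    counts = Counter(label for label in labels if label is not None)
    return {
        "success_rate": (total - sum(counts.values())) / total,
        "by_mode": {mode: counts.get(mode, 0) / total for mode in FailureMode},
    }


# Toy labels invented for this example; these are not benchmark results.
model_a = [None] * 6 + [FailureMode.SKIPPED_TOOL] * 4
model_b = [None] * 6 + [FailureMode.FABRICATED_OUTPUT] * 3 + [FailureMode.OVER_CALLED]

profile_a = failure_profile(model_a)
profile_b = failure_profile(model_b)

# Same aggregate success rate, different per-mode breakdown.
print("success:", profile_a["success_rate"], profile_b["success_rate"])
for mode in FailureMode:
    a = profile_a["by_mode"][mode]
    b = profile_b["by_mode"][mode]
    print(f"{mode.name:18} A={a:.1f} B={b:.1f}")
```

In this toy data both models succeed on the same fraction of traces, yet one fails only by skipping tools while the other mostly fabricates outputs, which is the kind of difference a single number hides.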

5:28 PM · May 31, 2026 · 569 Views

Sentiment

Users thanked Modal and Charles for compute support that helped ToolFailBench get accepted to ICML 2026 workshops.

Positive: 100.0%
Negative: 0.0%

2 comments with sentiment.
