
Paper Introduces Effective Feedback Compute Metric For Scaling AI Agents

Rohan Paul (@rohanpaul_ai)

Better AI agent systems scale by remembering useful feedback, not by spending more compute.

The simple mistake is to count tokens, calls, or dollars as if they were all evidence.

The authors say those numbers miss the real issue, because 2 runs can spend the same budget while only 1 gets feedback that is correct, new, relevant, and remembered.

An agent harness is not just a wrapper around a model; it is a feedback machine that decides what to test, what to trust, what to store, and what to ignore.

Their answer is Effective Feedback Compute, or EFC, a score that counts feedback only when it teaches the agent something useful and changes later decisions.

They also divide EFC by task demand, because a small lookup task and a messy software-repair task need different amounts of helpful feedback before the agent has enough to solve them.
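To make the idea concrete, here is a minimal sketch of how such a score could be computed. The post does not give the paper's actual formula, so everything here is an assumption for illustration: the `FeedbackEvent` schema, the four boolean filters (taken from the "correct, new, relevant, and remembered" wording), the simple counting rule, and the `task_demand` number are all hypothetical. The paper may well weight feedback rather than count it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeedbackEvent:
    """One piece of feedback a harness received (hypothetical schema)."""
    cost: float       # budget spent to obtain it (tokens, calls, or dollars)
    correct: bool     # the feedback was actually right
    novel: bool       # it told the agent something it did not already know
    relevant: bool    # it bears on the task at hand
    remembered: bool  # it was stored and available for later decisions


def raw_compute(events: List[FeedbackEvent]) -> float:
    """Total budget spent, regardless of whether the feedback helped."""
    return sum(e.cost for e in events)


def effective_feedback(events: List[FeedbackEvent]) -> int:
    """Count only events that pass all four filters (assumed rule)."""
    return sum(
        1 for e in events
        if e.correct and e.novel and e.relevant and e.remembered
    )


def normalized_efc(events: List[FeedbackEvent], task_demand: float) -> float:
    """Effective feedback divided by how much the task needs (assumed form)."""
    if task_demand <= 0:
        raise ValueError("task_demand must be positive")
    return effective_feedback(events) / task_demand


# Two runs with the same spend: four feedback events at cost 10 each.
useful = FeedbackEvent(10.0, True, True, True, True)
forgotten = FeedbackEvent(10.0, True, True, True, False)  # never stored
stale = FeedbackEvent(10.0, True, False, True, True)      # nothing new

run_a = [useful, useful, useful, stale]
run_b = [useful, forgotten, stale, stale]

task_demand = 3.0  # hypothetical: this task needs about 3 useful signals

print(raw_compute(run_a), raw_compute(run_b))
print(normalized_efc(run_a, task_demand), normalized_efc(run_b, task_demand))
```

Under these assumptions both runs spend the same budget, but only `run_a` reaches a normalized score of 1.0, which is the gap that counting tokens, calls, or dollars alone would hide.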

They tested this on synthetic tasks, code tasks with executable tests, real benchmark traces, held-out settings, and a new prospective batch, then compared EFC with raw compute and a strong agent-scaling baseline.

The main result is that task-normalized EFC predicted failures much better than raw compute, and in 1 matched-budget test, better feedback raised success from 0.27 to 0.90 while cost and tool calls stayed fixed.
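For a sense of scale, the matched-budget gain can be worked out from the two success rates quoted above (no other numbers from the paper are assumed):

```latex
\Delta = 0.90 - 0.27 = 0.63,
\qquad
\frac{0.90}{0.27} \approx 3.33
```

That is a 63-percentage-point gain, or roughly 3.3 times the success rate, at the same cost and number of tool calls.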

----

Link – arxiv.org/abs/2605.29682

Title: "Scaling Laws for Agent Harnesses via Effective Feedback Compute"

7:12 AM · Jun 1, 2026 · 3.5K Views
60 Likes · 33 Bookmarks