
transformernews.ai argues the METR Long Tasks benchmark reveals little about whether AI systems will replace knowledge workers or create broad risks


Balaji Srinivasan linked the piece while noting every AI agent has a human principal.

Original post

Every AI agent ultimately has a human principal.

2:50 AM · May 19, 2026

It’s possible technology changes this. But right now, AI agents aren’t truly autonomous. They are built for the prompt, bots on a leash. Amplified intelligence as wholly distinct from truly artificial intelligence.

The much-cited METR study doesn’t change that. Read the critique linked below; it notes that METR shows a sigmoid on the messiest tasks.

Even anecdotally… agentic workflows absolutely do help, and time horizons have been lengthening since Claude Cowork. But it’s just not the panacea it’s made out to be. Human prompting and verification remain the bottleneck, because digital AI only does it middle-to-middle, not end-to-end.

Anyway: in the absence of constant human verification, it’s extraordinarily easy to fill a codebase with economically irrelevant slop.

It’s the principal/agent problem all over again, with human principals and AI agents. High agency actually means exerting high levels of human control over highly expensive agents. https://www.transformernews.ai/p/against-the-metr-graph-coding-capabilities-software-jobs-task-ai

Balaji @balajis

Every AI agent ultimately has a human principal.

9:50 AM · May 19, 2026 · 55.1K Views
10:05 AM · May 19, 2026 · 19.2K Views

The sooner people and companies realize this, the better they can leverage AI.

It makes a lot of sense. We have trained current AI systems to work optimally when paired with human expertise.

Things can change in the distant future. More autonomous agents are on the horizon. But even then, human verification and ingenuity will matter a ton.

5:42 PM · May 19, 2026 · 2.2K Views