6h ago

AI Struggles With Wisdom Tasks and Physical Domains, Analyst Argues

Original post

The obvious domains are those where humans also struggle, of course (drug development, say), or physical domains (massage, plumbing). If restricted to text stuff, it starts to get at the idea of wisdom, which cannot be benchmarked but can be tested (e.g., have two companies, one with a human SWE and agents, the other with a Claude as CTO, and see who wins; or even a fully end-to-end AI company that does software stuff). Software startups still exist; it is useful to wonder why!

12:07 PM · May 29, 2026

@dwarkesh_sp Or, more broadly, there's now a lot of know-how on how to benchmark (and benchmaxx) tasks. Moving from tasks to jobs is the next thing.

José Luis Ricón Fernández de la Puente (@ArtirKel)

7:07 PM · May 29, 2026 · 2.3K Views
7:11 PM · May 29, 2026 · 965 Views