METR Evals reports that frontier AI agents rely on natural language for the hardest tasks, trailing full performance by 1.5 to 2 years without out-loud reasoning. Agents reached only four-minute horizons without reasoning.

Original post: @SEBKRIER · METR (@METR_EVALS)

Fact 2: However, agents appeared to be significantly weaker on tasks where it is costly or hard to verify success.

11:11 AM · May 19, 2026

Reposted by @GLEECH