AI Agents Still Fail Despite Native Harnesses in New Benchmark

Original post

@xeophon and yes, there we did use native agent harnesses! and still all agents basically suck. it's gonna be a very interesting benchmark. i know i'm teasing too much...

11:11 AM · May 16, 2026

@maksym_andr i do not expect anything else from the PTB + FutureSim ppl tbh

Maksym Andriushchenko @maksym_andr

@xeophon and yes, there we did use native agent harnesses! and still all agents basically suck. it's gonna be a very interesting benchmark. i know i'm teasing too much...

6:11 PM · May 16, 2026 · 74 Views
6:14 PM · May 16, 2026 · 62 Views

Florian Brand @xeophon

@maksym_andr you are treating me too well...

6:09 PM · May 16, 2026 · 76 Views