10h ago

METR evaluation shows AI agents autonomously completing real engineering projects inside companies, work that would take human experts multiple weeks, on verifiable tasks like vulnerability discovery

MirrorCode-Early beat prior benchmarks for 2026 models.

Original post

How do you AI engineer an agent to do AI engineering? Turns out this is how 💯

11:00 AM · May 19, 2026