
Kaito tests Anthropic's Opus 4.8 on a complete codebase refactor, consuming 100 million tokens without producing any working code

The automated two-hour run generated 172,631 added lines.

Original post
Gary Marcus, MIT PhD and NYU Professor Emeritus (@GARYMARCUS)

“none of it worked but boy was it beautiful”

7:35 PM · May 29, 2026

Sentiment

Positive: 3.6%
Negative: 96.4%

Many users criticized Anthropic's Opus 4.8 for failing a large codebase refactor after consuming 100 million tokens, calling the result wasteful and ineffective.

Sentiment is based on 31 comments.

