/Tech · 4h ago

SlopCodeBench Reveals AI Code Erosion 5x Worse Than Humans

Original post

Danielle Fong 🔆 #259

Jimmy Koppel @jimmykoppel

That brings up the first pillar of shipping AI code: maintaining quality

SlopCodeBench measures what happens when you ask an AI to make an MVP, then add a feature, then add a feature, without human intervention

Code erosion 5x worse than a human. 100% failure rate by the end

3:10 PM · Jun 8, 2026 · 320 Views

Sentiment

Many users congratulated Jimmy Koppel on the SlopCodeBench release because it shows AI agents eroding code quality much faster than humans.

Pos: 100.0%

Neg: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Jimmy Koppel @jimmykoppel

It turns out you can be extremely precise about what makes code complex and how to solve it. I spent a decade teaching this to hundreds of engineers (https://mirdin.com).

In the hands of an AI: super-effective

In Command Center: click “Refactoring” before reading

4h ago · Bookmarks: 1

Jimmy Koppel @jimmykoppel

http://Refact.ai is the first company I’ve seen that specializes in refactoring. In our internal benchmarks, they scored second behind us — and more than twice as good as Claude alone

4h ago · Likes: 3

Jimmy Koppel @jimmykoppel

AI promises to make people 10x or more faster at shipping. But just getting better at prompting and managing the AIs is not enough. 80% of the work happens after the AI finishes, and that part is largely untouched.

So come join us and unlock the future of great software available for all.

4h ago · Retweets: 1 · Replies: 2

Jimmy Koppel @jimmykoppel

Command Center 1.0 is out today. People say the walkthroughs make them >2x faster at reading even a 400-line diff, while the refactorings give the LLMs taste

It has a lot more that’s needed to make you a super-fast AI builder, but that’s what hits the big problems the hardest

4h ago

Jimmy Koppel @jimmykoppel

Cursor open sourced their “thermonuclear code quality review” skill.

Some of our users have badmouthed it — think we spoiled them. But they’ve been brave for putting themselves out there and showing they still care about code quality. Kudos to them.

https://github.com/cursor/plugins/blob/3347cbab5b54136f6fba0994c3a01a56f7fb7fca/cursor-team-kit/skills/thermo-nuclear-code-quality-review/SKILL.md

4h ago

Jimmy Koppel @jimmykoppel

Lathe (https://github.com/devenjarvis/lathe) is my new favorite. Sometimes learning code is not about reading the code, but about understanding the domain and algorithms. Lathe gives you a much deeper tutorial about anything you want.

4h ago

Jimmy Koppel @jimmykoppel

I think we noticed this problem first and have gone much further than anyone else. But I want to give shout-outs to the other builders also working to unlock 100x shipping speed

4h ago

Jimmy Koppel @jimmykoppel

For a more detailed explanation of learning science in the context of reading codebases — with a surprise tie-in to classic 90’s gaming — see

4h ago

Jimmy Koppel @jimmykoppel

The trick then is to put them back into logical order.

That’s what Command Center walkthroughs give you

Reading a 1000 line diff now becomes pressing ➡️ 100 times

Extra labels are cues that help you mentally chunk each bit

And explanations on the side, but you’ll usually ignore those

4h ago

Jimmy Koppel @jimmykoppel

Back to coding.

Here’s me trying to add Giphy integration to a popular messaging app (Signal Desktop).

Here’s the diff when the AI finishes

First thing? Random localization strings.

Then some random CSS

4h ago

Jimmy Koppel @jimmykoppel

That’s why it’s so hard to remember the random facts about this unknown human

But maybe it would make a difference if I told you that Ralph “Jones” is actually my father Ralph Koppel, and the Elijah Watt Sells award is given to the top 100 scorers each year on the CPA exam.

4h ago

Jimmy Koppel @jimmykoppel

The Geoff facts connect in a logical story, and attach to something you’ve heard of

If I just gave you those facts in reverse order, it would be harder to remember

The Ralph facts? Can’t even visualize them

4h ago

Jimmy Koppel @jimmykoppel

What makes code reading hard?

Back to the memory challenge

When I’ve done this live, usually several can get the Geoffrey Hinton facts.

For Ralph Jones? Maaybe the whole room can cobble it together

4h ago

Jimmy Koppel @jimmykoppel

That’s because learning and memory are actually rather predictable.

I designed the Geoff facts to be easy to remember, and the Ralph ones to be hard.

4h ago

Jimmy Koppel @jimmykoppel

Geoff pushed that we should teach machines to think associatively. Like humans.

According to multiple cognitive theories, humans organize knowledge into discrete chunks that connect. Activating one thought makes it easier to recall everything connected to it.

4h ago

Jimmy Koppel @jimmykoppel

You’ve improved quality

It’s more readable; easier for humans and AIs to build on

You still want to understand it. Enough for the power of seeing a problem and knowing what’s up. Or at least to understand an AI’s explanation

Time to read the code. Summaries are no replacement

4h ago

Jimmy Koppel @jimmykoppel

What we’ve seen: sequencing makes a big difference to ease of understanding

I gave you the Geoff facts in logical order

Would it be harder to remember if I scrambled them?

Well, coding agents show their changes in **alphabetical** order

4h ago

Jimmy Koppel @jimmykoppel

The thing you should probably read first — the core datatypes — is buried in the middle

Most changes are unmotivated until you read that part

So reading a 1k-line diff tends to consist of scrolling up and down until you spot something that makes sense

>2x harder than a 500 line diff

4h ago
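
The ordering problem in the posts about diff reading can be sketched in a few lines. This is only an illustration under stated assumptions, not how Command Center or any coding agent works: the file names are hypothetical, the "read this first" edges in `READ_AFTER` are written by hand, and `reading_order` is a made-up helper that runs a plain topological sort using Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def reading_order(changed_files, read_after):
    """Order changed files so each one comes after the files that motivate it.

    read_after maps a file to the files that should be read before it.
    Files that become readable at the same time are emitted alphabetically.
    """
    changed = set(changed_files)
    sorter = TopologicalSorter()
    for path in changed_files:
        # Ignore prerequisites that are not part of this diff.
        prerequisites = [p for p in read_after.get(path, ()) if p in changed]
        sorter.add(path, *prerequisites)

    sorter.prepare()  # raises graphlib.CycleError on circular prerequisites
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        order.extend(ready)
        sorter.done(*ready)
    return order


# Hypothetical file names for a Giphy-style feature; not taken from the real diff.
CHANGED = [
    "_locales/en/messages.json",
    "stylesheets/giphy.scss",
    "ts/components/GiphyPicker.tsx",
    "ts/state/giphy.ts",
    "ts/types/Giphy.ts",
]

# Hypothetical, hand-written "read this first" edges.
READ_AFTER = {
    "ts/state/giphy.ts": ["ts/types/Giphy.ts"],
    "ts/components/GiphyPicker.tsx": ["ts/types/Giphy.ts", "ts/state/giphy.ts"],
    "_locales/en/messages.json": ["ts/components/GiphyPicker.tsx"],
    "stylesheets/giphy.scss": ["ts/components/GiphyPicker.tsx"],
}

if __name__ == "__main__":
    print("alphabetical:", sorted(CHANGED))
    print("logical:     ", reading_order(CHANGED, READ_AFTER))
```

In this toy diff the alphabetical listing opens with the localization strings and the stylesheet, while the sorted reading order opens with the core types and leaves the strings and CSS until after the component that uses them. Where the "read this first" edges would come from in practice is left open here.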

Jimmy Koppel @jimmykoppel

Claude’s /simplify command is basically “please reduce complexity.” But people use it because they’re starving for quality.

4h ago

Jimmy Koppel @jimmykoppel

When the code is more complex, making it correct gets harder, for both humans and AIs

Some accept lower quality and let the codebase get buggier and harder to maintain. Others fight it but never ship AI diffs >400 lines.

We’ve built a way to have both

4h ago