/AI · 1d ago

Shopify CTO Mikhail Parakhin says a weekend test of Claude Workflows on his codebase found and fixed 144 bugs

The issues ranged from highly impactful to embarrassing oversights

Original post

Mikhail Parakhin@MParakhin#903inAI

Have been extensively testing Claude Workflows this weekend, with the best model possible. Threw it at my whole code base, combing for bugs. 144 found and fixed! Geez... It is a large code base, for sure, but 144?!! Some are very impactful, some are downright embarrassing...

3:55 PM · Jun 7, 2026 · 121.8K Views

Sentiment

Some users praise Claude for honestly finding many bugs in a large codebase with fresh eyes, while others advise verifying its audit results carefully.

Pos

58.3%

Neg

41.7%

12 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 3K · Likes: 8

Andrew McCalip@andrewmccalip

@MParakhin What's the highest amount of agents you've seen it spin up? I've seen 54 to 133.

22h3K81

Bookmarks: 3 · Replies: 2

Mikhail Parakhin@MParakhin

@stombya I always run a critique loop between several models. The false positive was 1 out of 174 (found 30 more since I posted). Without critique loop it is about 20 out of 174.

21h1.1K63
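
For scale, the false-positive figures quoted in the reply above can be turned into rates. This is only a minimal arithmetic sketch: the helper name `false_positive_rate` is hypothetical, the total of 174 is the 144 from the original post plus the 30 found later, and the "about 20" figure without a critique loop is the poster's own estimate, so the second rate is approximate.

```python
def false_positive_rate(false_positives: int, total_findings: int) -> float:
    """Share of reported findings that turned out not to be real bugs."""
    if total_findings <= 0:
        raise ValueError("total_findings must be positive")
    return false_positives / total_findings


# Figures taken from the reply: 144 findings at posting time, 30 more since.
TOTAL_FINDINGS = 144 + 30

# 1 false positive with a critique loop; "about 20" without one (an estimate).
with_critique = false_positive_rate(1, TOTAL_FINDINGS)
without_critique = false_positive_rate(20, TOTAL_FINDINGS)

print(f"With critique loop:    {with_critique:.1%}")     # 0.6%
print(f"Without critique loop: {without_critique:.1%}")  # 11.5%
```

On these numbers, the critique loop is reported to cut false positives from roughly one finding in nine to fewer than one in a hundred.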

Retweets: 4

Mikhail Parakhin@MParakhin

1d121.8K459225

Josh@joshmanders

@MParakhin Did similar, working through them now. (had 30 total)

1d1.5K61

Anonymous Quant@AnonymousQuant

@MParakhin I do not recommend letting Claude look for bugs. Or rather, not actioning on anything unless Codex signs off on it.

1d2K4

stombya@stombya

@MParakhin What false positive rate are you getting?

1d1.8K4

Mikhail Parakhin@MParakhin

@iMuffined Still need to run a critique loop

21h1.7K41

Mikhail Parakhin@MParakhin

@AnonymousQuant I do run a critique loop on everything, of course.

21h1.1K21

iMuffin@iMuffined

@MParakhin mythos?

1d2.1K1

Mikhail Parakhin@MParakhin

@andrewmccalip About the same

21h1.7K5

Subramanya N@subramanya

@MParakhin this is the part that changes code review. agents can find a huge pile of issues, but the engineering work becomes deciding which fixes are worth merging.

1d1.6K1

Andrew McCalip@andrewmccalip

@MParakhin I for one am thrilled to be purchasing tokens at such a rate of 100+ agents at once. It was a bit burdensome to have to micromanage them one at a time.

21h3175

Mayank Gupta@techfreakworm

@MParakhin 144 is wild, but the number I always want is the accept ratio — how many did you actually review before merging? My bug-combing runs surface real issues mixed with confidently-wrong rewrites, and the reviewing is where the cost shows up.

15h7553