
Shopify CTO Mikhail Parakhin says a weekend test of Claude Workflows on his codebase found and fixed 144 bugs

The issues ranged from highly impactful to embarrassing oversights

Original post
Mikhail Parakhin@MParakhin

Have been extensively testing Claude Workflows this weekend, with the best model possible. Threw it at my whole code base, combing for bugs. 144 found and fixed! Geez... It is a large code base, for sure, but 144?!! Some are very impactful, some are downright embarrassing...

3:55 PM · Jun 7, 2026 · 121.8K Views
Sentiment

Some users praise Claude for surfacing many real bugs in a large codebase with fresh eyes, while others advise carefully verifying its audit results.

Positive 58.3% · Negative 41.7%
Based on 12 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Andrew McCalip@andrewmccalip

@MParakhin What's the highest number of agents you've seen it spin up? I've seen 54 to 133.

22h · Views 3K · Likes 8 · Bookmarks 1

@stombya I always run a critique loop between several models. False positives were 1 out of 174 (found 30 more since I posted). Without the critique loop, it is about 20 out of 174.

21h · Views 1.1K · Likes 6 · Bookmarks 3

Mikhail Parakhin@MParakhin (original post)

1d · Views 121.8K · Likes 459 · Bookmarks 225
Josh@joshmanders

@MParakhin Did something similar; working through them now (30 total).

1d · Views 1.5K · Likes 6 · Bookmarks 1
Anonymous Quant@AnonymousQuant

@MParakhin I do not recommend letting Claude look for bugs. Or rather, I recommend not acting on anything unless Codex signs off on it.

1d · Views 2K · Likes 4
stombya@stombya

@MParakhin What false-positive rate are you getting?

1d · Views 1.8K · Likes 4

@iMuffined Still need to run a critique loop

21h · Views 1.7K · Likes 4 · Bookmarks 1

@AnonymousQuant I do run a critique loop on everything, of course.

21h · Views 1.1K · Likes 2 · Bookmarks 1
iMuffin@iMuffined

@MParakhin mythos?

1d · Views 2.1K · Likes 1
Subramanya N@subramanya

@MParakhin This is the part that changes code review: agents can find a huge pile of issues, but the engineering work becomes deciding which fixes are worth merging.

1d · Views 1.6K · Bookmarks 1
Andrew McCalip@andrewmccalip

@MParakhin I for one am thrilled to be purchasing tokens at such a rate, with 100+ agents at once. It was a bit burdensome to have to micromanage them one at a time.

21h · Views 317 · Likes 5
Mayank Gupta@techfreakworm

@MParakhin 144 is wild, but the number I always want is the accept ratio — how many did you actually review before merging? My bug-combing runs surface real issues mixed with confidently-wrong rewrites, and the reviewing is where the cost shows up.

15h · Views 755 · Likes 3
omer@omerhuseyin

@MParakhin lol this is both impressive and terrifying

how long did it take to run through everything?

1d · Views 2.2K · Likes 2
Yingzhe@Yingzhe0301

@MParakhin best model possible🤔

22h · Views 1K · Likes 1
cqk@cqkten

@iMuffined @MParakhin seems like it...

1d · Views 220 · Likes 2
Ratif@RatifSsemujju

@MParakhin Made Claude run through the entire Shopify?

12h · Views 177 · Likes 1