14h ago

Non-Randomized Controls Introduce Confounds in ML Datasets

Original post
Anshul Kundaje #1675 @ANSHULKUNDAJE (OP) | Ron Alfa @RONALFA

People are still generating “ML datasets” with all kinds of confounds. If the controls are all next to each other on the edge of the plate, no randomization, ngmi.

11:23 AM · May 24, 2026

Sentiment

Pos 0%
Neg 100%

Many users criticized MLbio researchers for neglecting basic experimental design, such as randomizing where controls sit on a plate, when generating ML datasets, and for overclaiming that models can automatically correct technical artifacts.

2 comments with sentiment.
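
To make the complaint in the original post concrete, here is a minimal sketch of one way to avoid clustering controls on the plate edge: shuffle the labels across all wells so that position is not tied to condition. The 96-well geometry (8 rows by 12 columns), the sample counts, and the names `randomized_plate_layout` and `is_edge` are illustrative assumptions, not anything stated in the post. Full randomization is only one option; blocked or stratified layouts are common alternatives.

```python
import random
import string


def randomized_plate_layout(n_controls, n_treated, rows=8, cols=12, seed=0):
    """Assign labels to wells uniformly at random (hypothetical helper).

    Assumes a rows x cols plate; the default 8 x 12 is a 96-well plate.
    Unused wells are labeled "empty".
    """
    wells = [
        f"{string.ascii_uppercase[r]}{c + 1}"
        for r in range(rows)
        for c in range(cols)
    ]
    if n_controls + n_treated > len(wells):
        raise ValueError("more samples than wells")

    labels = ["control"] * n_controls + ["treated"] * n_treated
    labels += ["empty"] * (len(wells) - len(labels))

    # A seeded generator keeps the layout reproducible and auditable.
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(wells, labels))


def is_edge(well, rows=8, cols=12):
    """Return True if the well sits in the outermost row or column."""
    row = string.ascii_uppercase.index(well[0])
    col = int(well[1:]) - 1
    return row in (0, rows - 1) or col in (0, cols - 1)


layout = randomized_plate_layout(n_controls=16, n_treated=80)
edge_controls = sum(
    1 for well, label in layout.items() if label == "control" and is_edge(well)
)
print(f"{edge_controls} of 16 controls landed on edge wells")
```

With a layout like this, edge effects hit controls and treated wells at roughly the same rate, so a model trained on the data has less opportunity to learn well position as a proxy for condition.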


Cluster engagement

92 snapshots