AI Models Recommend Normalization Methods That Fail Stated Criteria

Users criticize AI models for recommending popular normalization methods that fail the stated criteria, attributing this to a lack of deep thinking, flawed evaluation, and biases from training data and RLHF.

sina@sinabooeshaghi

12. The point is that AI isn't thinking deeply. It's not reading the literature, developing reasonable evaluation criteria, or benchmarking normalization methods against them. It repeats the field's default and confidently justifies it. In this case, we know the answer. But what happens when we don't?

sina@sinabooeshaghi

13. In conclusion, if you want a scRNAseq normalization method to best satisfy
- depth norm
- variance stabilization
- monotonicity

Run PFlogPF (package coming soon).

The code is available here: http://github.com/pachterlab/BHGP_2022

The manuscript is available here: https://www.biorxiv.org/content/10.1101/2022.05.06.490859v3

sina@sinabooeshaghi

11. And the method that does satisfy all three isn't new. It's the centered log-ratio, from 1982! This transform has been available for 40+ years, passed over in hundreds of thousands of scRNAseq studies for methods that perform poorly with respect to these desiderata.
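For readers unfamiliar with the transform, here is a minimal sketch of the centered log-ratio on a single cell, assuming strictly positive counts. The function name `clr` and the toy vector are illustrative, not taken from the authors' code. Real scRNAseq counts are mostly zeros, so practical use needs some zero handling (for example a pseudocount), which the thread does not specify.

```python
import numpy as np

def clr(x):
    """Centered log-ratio of one cell: log(x) minus the mean of log(x).

    Assumes every entry of x is strictly positive.
    """
    log_x = np.log(x)
    return log_x - log_x.mean()

# Toy cell with four genes, and the same cell sequenced 10x deeper.
cell = np.array([120.0, 30.0, 5.0, 45.0])
deeper = 10 * cell

# Depth cancels: scaling a cell by a constant leaves its CLR unchanged.
print(np.allclose(clr(cell), clr(deeper)))

# Monotone: the within-cell ranking of genes is preserved.
print((np.argsort(clr(cell)) == np.argsort(cell)).all())
```

Both properties follow directly from the formula: multiplying a cell by a constant adds the same amount to every log value, which the centering removes, and subtracting a constant from a monotone function cannot reorder genes.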

sina@sinabooeshaghi

10. Each method fails one of the three. sctransform is not monotone (it scrambles within-cell gene order). The shifted log doesn't remove depth (that's the whole reason for the second PF step in PFlogPF). The table below, from our Supplement, shows the Axioms and whether each method satisfies them.
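The role of the second PF step can be sketched as follows. This is an illustration under stated assumptions, not the authors' package: it assumes "PF" means proportional fitting (rescaling each cell so its total equals the mean total across cells), that the shifted log is `log1p` applied after one PF step, and that PFlogPF is PF, then `log1p`, then PF again. All function names and the simulated data are hypothetical.

```python
import numpy as np

def proportional_fitting(x):
    """Rescale each row (cell) so its total equals the mean row total."""
    totals = x.sum(axis=1, keepdims=True)
    return x * (totals.mean() / totals)

def shifted_log(counts):
    """PF followed by log1p (assumed meaning of 'shifted log')."""
    return np.log1p(proportional_fitting(counts))

def pf_log_pf(counts):
    """PF, log1p, then PF again (assumed meaning of 'PFlogPF')."""
    return proportional_fitting(shifted_log(counts))

# Simulated cells: same expression profile, very different depths.
rng = np.random.default_rng(0)
n_genes = 200
profile = rng.dirichlet(np.ones(n_genes))
depths = np.array([50.0, 200.0, 1000.0, 5000.0])
counts = rng.poisson(depths[:, None] * profile).astype(float)

after_log = shifted_log(counts).sum(axis=1)
after_pf = pf_log_pf(counts).sum(axis=1)
print("row totals after PF + log1p:", after_log)
print("row totals after PFlogPF:   ", after_pf)
```

In this simulation the totals after PF + `log1p` still differ between cells, because the log is applied to sparse counts whose zero patterns depend on depth. The final PF step forces every cell to the same total by construction. Each step is increasing within a cell, so the within-cell gene order is unchanged.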

sina@sinabooeshaghi

⤴️ Top of the thread

sina@sinabooeshaghi

Corresponding thread:

Uria Mor@uria_mor

@lpachter "But this is what all labs are doing". If I had one shekel for every time I heard that sentence...

Uria Mor@uria_mor

@lpachter Sure: you used this function from this package from (authoritative name) lab... but are we really certain that treating zero read counts as "missing values" then imputing them via nuclear norm minimization makes sense here???

Ernesto Heine@SeniorLazarus

Models are pre-trained on formal scientific literature, which naturally reflects what is popular and widely published. The real problem appears during post-training with RLHF. This stage acts as a noisy channel that pushes the model’s recommendations toward whatever is mainstream and “safe”, depending on the humans (and their biases) hired to provide feedback. Because the same small set of companies and contractors usually handle this RLHF work across different frontier models, most LLMs converge on the same answers and simply repeat the popular methods — even when they are not the most appropriate for a given task or dataset. This is exactly what I see every single day.

Lior Pachter@lpachter

@uria_mor I’m going to cry. 😭

Mathieu Bourdenx@mathieubourdenx

@sinabooeshaghi Is that a prompt problem? If you ask for a survey of most recent methods and a decision based on benchmarks what does it say?
