/Tech · 8h ago

AI Models Recommend Normalization Methods That Fail Stated Criteria

Original post unavailable.

Sentiment

Negative: users criticize AI models for recommending popular normalization methods that fail benchmarks, blaming shallow reasoning, flawed evaluation, and biases from training data and RLHF.

Pos: 0.0%

Neg: 100.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 726 · Likes: 15 · Retweets: 1 · Replies: 2

sina@sinabooeshaghi

12. The point is that AI isn't thinking deeply. It's not reading the literature, developing reasonable evaluation criteria, or benchmarking normalization methods against those criteria. It repeats the field's default and confidently justifies it. In this case, we know the answer. But what happens when we don't?

9h726153

Bookmarks: 5

sina@sinabooeshaghi

13. In conclusion, if you want the scRNAseq normalization method that best satisfies

- depth normalization
- variance stabilization
- monotonicity

run PFlogPF (package coming soon).

The code is available here: http://github.com/pachterlab/BHGP_2022

The manuscript is available here: https://www.biorxiv.org/content/10.1101/2022.05.06.490859v3

9h572105
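Editor's sketch of the three-step structure the thread implies for PFlogPF: proportional fitting (PF), a log transform, then a second PF. This is not the authors' implementation (their code is at the repository linked above). The function names, the use of the mean depth as the PF target, and the choice of `log1p` for the log step are assumptions made for illustration, and every cell is assumed to have nonzero depth.

```python
import math

def proportional_fitting(cells):
    """Scale each cell (a row of values) so all cells share the same total.

    Assumption: the shared total is the mean depth across cells.
    Assumption: every cell has a nonzero total.
    """
    depths = [sum(cell) for cell in cells]
    target = sum(depths) / len(depths)
    return [[x * target / d for x in cell] for cell, d in zip(cells, depths)]

def pf_log_pf(cells):
    """Hypothetical PF -> log1p -> PF pipeline (illustrative only)."""
    fitted = proportional_fitting(cells)
    logged = [[math.log1p(x) for x in cell] for cell in fitted]
    # The log step changes the per-cell totals, so PF is applied again.
    return proportional_fitting(logged)

# Toy counts: three cells (rows) by three genes (columns).
counts = [[20, 0, 0], [4, 3, 3], [1, 2, 7]]
normalized = pf_log_pf(counts)
sums = [sum(cell) for cell in normalized]
print(sums)  # per-cell totals after the second PF step
```

In this toy example the per-cell totals agree after the second PF step, and the within-cell ordering of genes is unchanged, because `log1p` and multiplication by a positive constant are both increasing functions.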

sina@sinabooeshaghi

11. And the method that does satisfy all three isn't new. It's the centered log-ratio, from 1982! This transform has been available for 40+ years, passed over in hundreds of thousands of scRNAseq studies for methods that perform poorly with respect to these desiderata.

9h6318
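Editor's sketch of the centered log-ratio for a single cell: each log value minus the mean of the log values in that cell. The `pseudocount` parameter is an assumption added here so that zero counts can be logged; the thread does not say how zeros are handled.

```python
import math

def clr(cell, pseudocount=1.0):
    """Centered log-ratio of one cell's counts.

    Assumption: a pseudocount is added before the log so zeros are defined.
    """
    logs = [math.log(x + pseudocount) for x in cell]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

clr_values = clr([4, 3, 3])
print(clr_values)
```

By construction the transformed values in a cell sum to zero, and the within-cell ordering of genes is preserved because the same constant is subtracted from every log value.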

sina@sinabooeshaghi

10. Each method fails one of the three. sctransform is not monotone (it scrambles within-cell gene order). The shifted log doesn't remove depth (that's the whole reason for the second PF step in PFlogPF). The table below, from our Supplement, shows the Axioms and whether each method satisfies them.

9h7185
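Editor's sketch of why a shifted log alone does not remove depth, taking "depth" to mean the per-cell total (an assumption about the thread's criterion). The two toy cells below already have equal totals, yet their totals differ after `log1p`, which is the gap a second PF step would close.

```python
import math

# Two toy cells (rows) over three genes, already scaled to the same depth of 10.
scaled = [[10.0, 0.0, 0.0],
          [4.0, 3.0, 3.0]]

# Shifted log: log(1 + x) applied to every entry.
shifted_log = [[math.log1p(x) for x in cell] for cell in scaled]

depth_before = [sum(cell) for cell in scaled]
depth_after = [sum(cell) for cell in shifted_log]
print(depth_before)  # equal totals going in
print(depth_after)   # unequal totals coming out
```

The cell that spreads its counts over more genes ends up with the larger total after the log, so cell totals again depend on the cell rather than on a shared target.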

sina@sinabooeshaghi

⤴️ Top of the thread

9h713

sina@sinabooeshaghi

Corresponding thread:

9h5204

Uria Mor@uria_mor

@lpachter "But this is what all labs are doing". If I had one shekel for every time I heard that sentence...

6h681

Uria Mor@uria_mor

@lpachter Sure: you used this function from this package from (authoritative name) lab... but are we really certain that treating zero read counts as "missing values" then imputing them via nuclear norm minimization makes sense here???

6h381

Ernesto Heine@SeniorLazarus

Models are pre-trained on formal scientific literature, which naturally reflects what is popular and widely published. The real problem appears during post-training with RLHF. This stage acts as a noisy channel that pushes the model’s recommendations toward whatever is mainstream and “safe”, depending on the humans (and their biases) hired to provide feedback. Because the same small set of companies and contractors usually handles this RLHF work across different frontier models, most LLMs converge on the same answers and simply repeat the popular methods, even when they are not the most appropriate for a given task or dataset. This is exactly what I see every single day.

3h251

Lior Pachter@lpachter

@uria_mor I’m going to cry. 😭

6h251

Mathieu Bourdenx@mathieubourdenx

@sinabooeshaghi Is that a prompt problem? If you ask for a survey of most recent methods and a decision based on benchmarks what does it say?

6h8