Tech · 7h ago

Ridge Regression Still Tops Virtual Cell Models After Two Years

Original post

Sasha Gusev @SashaGusevPosts

Two years ago the best virtual cell model was ridge regression and yesterday the best virtual cell model was ... ridge regression. But sure, throw a billion more parameters at the problem, no one is stopping you.

Adam Green @adamlewisgreen

18 months after posting this tweet, the AI for science commentariat is still proclaiming the death of single-cell scaling laws on the basis of... {checks notes}... a model sweep ranging from 1 million to a whopping 10 million parameters.

(but unlike 18 months ago, these proclamations now come wrapped in premium AI-written slop, giving them a glittering verisimilitude of rigor)

left as an exercise for the reader: generalize from this example to a meta-update about how epistemically adversarial the scientific environment we're operating in is (for extra credit, partial out the effects of mood affiliation and status deferral)

2:44 PM · Jun 10, 2026 · 10K Views

Sentiment

Most commenters are critical. Some object that tiny model sweeps are being used to dismiss single-cell scaling laws; others argue that treating AI models as magic ignores proper experimental design. A few note ridge regression's lasting edge in virtual cell models.

Pos 25.0%

Neg 75.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 7.3K · Bookmarks 46 · Likes 71 · Retweets 5 · Replies 2

Anshul Kundaje @anshulkundaje

Self-supervised learning on observational scRNAseq especially with naive reconstruction or masking losses will never deliver a causal model of gene regulation. It will learn statistical structure in expression space, not the mechanisms that generate regulatory change. 1/

6h

Anshul Kundaje @anshulkundaje

If u are building a large model, better make sure it actually reliably trashes the simpler approaches on outcomes that matter or offers unique insights & make sure it's really worth the bang for the buck. Nobody is going to use huge models for marginal or no gains. 6/

Anshul Kundaje @anshulkundaje

However, one can solve many problems eg cell type annotation, batch correction, perturbation prediction (within specific limits around the training/fine tuning data) without ever learning a causal model. Simple approaches can do extremely well when provided the right data. 5/

6h

Anshul Kundaje @anshulkundaje

People keep confusing predictive representation learning with causal discovery. Without perturbations, temporal information, genetic instruments, anchoring in sequence or strong biologically causal structural assumptions, there will be no causal model of gene regulation. 2/

6h

Anshul Kundaje @anshulkundaje

@elephant_ben @SashaGusevPosts All deep learning models can scale in principle. But the data input & the learning strategy have to match the end goals. It is literally impossible to learn causal models of gene regulation via SSL on observational scRNAseq alone. Adam simply does not get this.

7h

Anshul Kundaje @anshulkundaje

@adamlewisgreen @SashaGusevPosts I'm sorry you think X-Cell is SOTA on metrics that make no sense. And also u literally fine tune the model on perturbation data & test on the easiest prediction task. Your base model is a virtual cell NOT. Come on man.

6h

Adam Green @adamlewisgreen

@SashaGusevPosts We did throw a billion parameters at it, and performance improved monotonically with scale.

6h

Anshul Kundaje @anshulkundaje

It's worth noting that predictive representations (including embeddings from scFMs) can be useful to learn causal models. But without the right data inputs & expt design, it's literally impossible to magically learn biologically & statistically causal models. 3/

6h

Elephant_Frog @elephant_ben

@SashaGusevPosts Honestly the fact a simpler model still works suggests to me, there are “laws of cells” and such and brute force alone can’t really get better than that.

Which is pretty cool.

7h

Anshul Kundaje @anshulkundaje

You can always pull out anecdotes from your models representation where some correlation structure reveals causal biology. Eg. Co-expression networks certainly carry nuggets of causal relationships but there are no guarantees. This doesn't make the model causal. 4/

6h

Anshul Kundaje @anshulkundaje

On the other hand, if u provide a model that consistently delivers what it promises, everyone will keep their mouths shut & happily use it. Quite easy to make the case. The model should be able to speak for itself. 7/7

6h

Sasha Gusev @SashaGusevPosts

@adamlewisgreen it's great to try things, but ridge regression also beat your model (Pearson delta 0.3755 vs 0.41 by Rhaister)

6h
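
For readers unfamiliar with the baseline being argued over, here is a minimal sketch of a ridge regression fit scored with a "Pearson delta"-style metric. The post above does not define the metric; this sketch assumes it means the Pearson correlation between predicted and observed expression changes relative to a control profile, averaged over perturbations. All data is synthetic, and the names `fit_ridge`, `pearson_delta`, and `control_mean` are hypothetical. It is not the benchmark pipeline behind the quoted numbers.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge without intercept: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def pearson_delta(pred, true, control_mean):
    """Assumed metric: mean over perturbations of the Pearson r between
    predicted and observed expression changes relative to control."""
    pred_delta = pred - control_mean
    true_delta = true - control_mean
    scores = [np.corrcoef(p, t)[0, 1] for p, t in zip(pred_delta, true_delta)]
    return float(np.mean(scores))

# Synthetic stand-in data (hypothetical): rows are perturbations,
# X holds perturbation features, Y holds per-gene expression.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_genes = 80, 20, 16, 50
control_mean = rng.normal(size=n_genes)
W_true = rng.normal(size=(n_feat, n_genes))
X = rng.normal(size=(n_train + n_test, n_feat))
noise = 0.1 * rng.normal(size=(n_train + n_test, n_genes))
Y = control_mean + X @ W_true + noise

X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]

# Fit on expression changes relative to control, then predict held-out rows.
W = fit_ridge(X_train, Y_train - control_mean, alpha=1.0)
Y_pred = control_mean + X_test @ W

score = pearson_delta(Y_pred, Y_test, control_mean)
print(f"Pearson delta on synthetic data: {score:.3f}")
```

Because the synthetic data is linear by construction, the score here says nothing about real single-cell data; the sketch only shows how small the baseline is.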

Alex Strudwick Young @AlexTISYoung

@anshulkundaje There's an issue with people thinking AI models are some kind of magic that obviates the need for proper experimental design.

6h