
It has been a wonderful surprise for us to see such an interpretable, inexpensive model (trained in seconds, predictions in milliseconds) accomplish what virtual cell models (typically with far more complex architectures) promised to eventually do.
Many users celebrated Anshul Kundaje's release of the Rhaister model and the Emerald Bay Drug Phenotype Dataset, calling the work incredible and a huge advancement for the field.

A more in-depth tweetorial from @vallens

Unlike “virtual cell models,” Rhaister goes back to basics: it builds a minimalist perturbation model from the ground up, directly on summary statistics of the data.

These results demonstrate we can accomplish a lot by going back to basics and building models that, by design, reflect the statistics of the underlying data. Rhaister shows that scaling the right data is far more valuable than scaling parameters.

It can also predict more complex drug phenotypes, such as drug sensitivity in cancer cells, well beyond simple baselines. And there is room to make it much better, so stay tuned.

With a handful of example perturbations in a new context, Rhaister predicts responses to other perturbations with accuracies within experimental noise, exceeding state-of-the-art virtual cell models.

@nalidoust So excited for this to be finally out! Incredible work led by @vallens. Rhaister allowed us to launch hundreds of dataset-scaling experiments in hours, work that would have taken months with the previous generation of perturbation models 🚀🚀

It is the first model we have seen that performs significantly beyond mean baselines in a zero-shot setting, a task often cited as a core promise of virtual cell models.

Despite its simplicity, it is the first model that shows consistent scaling with more perturbation data.

Read more:
Paper: https://tahoebio-assets.com/rhaister-manuscript.pdf
Model and datasets: https://huggingface.co/collections/tahoebio/rhaister

What excites us more is what comes next: because it is fast and interpretable, Rhaister is far better suited to advancing biological reasoning in close iteration with emerging agentic workflows.

We show this by testing it against our unique Emerald Bay dataset, generated using our Mosaic platform, which measures the 5-day sensitivity of cancer models to various drugs. We are open-sourcing that dataset along with Rhaister.

@nalidoust Here's to another moon landing 🚀💙

@nalidoust Huge advancement for the field 🔥💯

@nalidoust Let’s goooo! 🚀🚀

@nalidoust 🔥🔥🚀

@nalidoust 🚀🚀🚀

@nalidoust Congrats!

serious dataset, and the right level to be building at. congrats on the drop.
the gap i keep hitting on the sponsor side is cell-line readout versus patient. matching an experimental assay means matching what the cells did in a dish. it does not yet tell you what the tumor does in a person on week six.
has anyone shown the prediction holds against real patient outcomes prospectively, not just against the next in-vitro assay? that is the step the field keeps getting stuck on.