

From what I can tell, this preview model is still not really leveraging the massive amounts of molecular profiling data across many species. 6/

It is great to see an actual long-range regulatory benchmark in the blog post (enhancer-to-gene linking) based on CRISPRi FlowFISH data. Previous long-context DNALMs have avoided these benchmarks because those models don't learn regulatory elements or their long-range effects well. 17/

Fine mapping also has its problems but I would certainly treat MPRA nominations from a cell line as a stronger gold standard. Not a huge deal because the model seems to do pretty well overall (although it is a small set of loci tested). 16/

That being said, the performance numbers reported (not verifiable at the moment) seem strong & non-trivial. Now some specific comments. 10/

When it does, it will be even more powerful, if it is able to seamlessly transfer massive functional data from limited species into poorly profiled species & siphon evolutionary information adaptively into specific species (e.g. humans) that have a lot of functional data. 7/

A huge proportion of functional human DNA does not have strong conservation signatures so it remains to be seen how these models do in such regimes. 9/

The comparisons to Borzoi (as a representative of supervised sequence-to-function (S2F) models) need some nuance (on TraitGym GWAS / QTLs in particular). For disease/trait variant fine mapping, it is critical that S2F models are trained on disease-relevant cell type data. 11/

The base Borzoi model lacks many disease relevant cell contexts (even though the data is often available ... later versions do train on single cell pseudobulks from primary cell types). So lack of correct context can result in performance drops. 12/

Also, I want to note that, from the model description available, there are a LOT of inductive biases in the architecture (block conv for motif learning, not a pure Xformer, sparse attention anchoring on functional annotations, etc.). I am happy to see this. 22/

Some broad comments about the benchmarks. All the benchmarks are tilted quite a bit toward conserved elements. ClinVar definitely is. Even TraitGym, which by design focuses on common variants, has a tilt towards conserved elements. 8/

It does suggest that Omnii does better without requiring such contextual info, but a stronger comparator would be S2F models that have the right cell contexts + are explicitly adapted to predict disease risk (instead of molecular effects). 13/

The AD case study with the MPRA vs fine mapping is a bit weird. The MPRA is treated a bit like a gold standard. MPRAs are often used to "validate" variant nominations. But there is a lot of incoming evidence that MPRAs are not remotely reliable to validate disease variants. 15/

To be fair, proper benchmarking in this zone is still quite nebulous, so this is not a major critique of the benchmarks used here. 14/

That being said, there are clear conventions for this benchmark i.e. distance stratification & auPR instead of auROC. Will be important to see this & a direct head-to-head against the SOTA (AlphaGenome, rE2G etc) 18/
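
To make the convention above concrete, here is a minimal sketch of distance-stratified auPR for enhancer-gene pairs. It is an illustration only, not the evaluation code of any benchmark named in this thread: the function names, the `(distance_bp, label, score)` input layout and the bin edges (10 kb, 100 kb) are all assumptions chosen for the example, and auPR is computed as step-wise average precision with ties broken arbitrarily.

```python
from bisect import bisect_right
from collections import defaultdict


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 0/1 ground truth (1 = validated enhancer-gene link).
    scores: model scores, higher = more confident. Ties are broken arbitrarily.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")  # auPR is undefined without positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos


def aupr_by_distance(pairs, bin_edges=(10_000, 100_000)):
    """Compute auPR separately per enhancer-gene distance bin.

    pairs: iterable of (distance_bp, label, score).
    bin_edges: hypothetical cut points in bp; bin 0 is < 10 kb,
    bin 1 is 10 kb to < 100 kb, bin 2 is >= 100 kb.
    """
    groups = defaultdict(lambda: ([], []))
    for distance, label, score in pairs:
        bin_index = bisect_right(bin_edges, distance)
        groups[bin_index][0].append(label)
        groups[bin_index][1].append(score)
    return {
        bin_index: average_precision(labels, scores)
        for bin_index, (labels, scores) in sorted(groups.items())
    }


# Toy data, made up for illustration: (distance_bp, label, score)
toy_pairs = [
    (5_000, 1, 0.9), (8_000, 0, 0.8), (2_000, 1, 0.7),
    (50_000, 0, 0.9), (60_000, 1, 0.4),
    (500_000, 1, 0.2), (700_000, 0, 0.1),
]
per_bin = aupr_by_distance(toy_pairs)
```

Reporting one number per distance bin keeps the easy proximal pairs from masking weak distal performance, and auPR is preferred over auROC here because true links are a small minority of candidate pairs.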

I am hopeful there will be more details incoming, a chance for others to battle test the models & hopefully this is the beginning of an actual fusion of evolutionary self-supervision with valuable specific functional context conditioning. 25/

I think it is foolish to not use freely available biological knowledge & annotations, directly or indirectly, when designing & training models just to desperately adhere to the "bitter lesson". 23/

Once strong models are achieved, the lessons learned can help make them sleeker / faster by replacing special purpose modules with optimal general purpose ones if that really helps. 24/

I would also love to see a future iteration of this model post-trained on massive functional data on the exact Borzoi/AlphaGenome/PromoterAI benchmark suite. 19/

But overall, giving them the benefit of the doubt without having seen any of the actual details of the model, this looks like a promising step forward. 21/

These collectively address difficult benchmarks for non-coding regulatory DNA that are quite orthogonal to ClinVar and TraitGym but are essential to test whether models can effectively predict diverse types of variant effects. 20/