Researcher Questions Carbon DNA Model's Tree Of Life Reconstruction

How do you publish this tree and claim it's a success???!!!

9:34 PM · May 19, 2026
Michael 英泉 Eisen @mbeisen

4:34 AM · May 20, 2026 · 11.8K Views
7:12 AM · May 20, 2026 · 291 Views

Am I losing my mind.

What the text says should be happening is NOT happening in the figures? The exons are NOT having higher confidence scores than the introns.

If it can't distinguish exons from introns reliably, I shudder to think what is happening in regulatory DNA 😬😬😬

Leandro von Werra @lvwerra

We are releasing Carbon: a crazy fast DNA model.

Carbon is 275x faster than the next best model. So fast you can process the whole human genome on a single GPU in <2 days. Here are the tricks we used:

When modelling DNA sequences, a lot of the performance comes down to tokenizing the sequences in a smart way. BPE tokenizers struggle because there is no whitespace, and character-level (called base-level in DNA) tokenizers waste a lot of compute on too many tokens.

Carbon is built with a unique tokenizer: we split sequences into chunks of 6 bases, but during both training and inference we can work at single-base resolution. That's similar to having word tokens but resolving them at the character level. All possible thanks to the DNA tokens' unique structure.

The architecture combined with the tokenizer makes the model 275x faster than the previous SoTA (Evo2) at this size.

We built an interactive demo so you can explore how the model can generate DNA sequences, investigate the structure of genes, predict the effect of mutations, generate and fold proteins, and even reconstruct parts of the tree of life. https://huggingface.co/spaces/HuggingFaceBio/carbon-demo

4:31 PM · May 19, 2026 · 142.1K Views
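
For readers unfamiliar with the idea, here is a minimal sketch of fixed 6-base chunking in which each token ID can still be unpacked into its individual bases. This is an illustration only, not Carbon's tokenizer: the release post gives no implementation details, and the names `tokenize` and `token_to_bases`, the base-4 ID scheme, and the restriction to the letters A, C, G, T are all assumptions made for the example.

```python
from itertools import product

BASES = "ACGT"
K = 6  # chunk size quoted in the release post

# All 4**6 = 4096 possible 6-base chunks, in lexicographic order.
# With this ordering a token ID is simply the chunk read as a base-4 number.
VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=K))}


def tokenize(seq: str) -> list[int]:
    """Split seq into non-overlapping 6-base chunks and map each to an ID."""
    if len(seq) % K != 0:
        raise ValueError("sequence length must be a multiple of 6")
    return [VOCAB[seq[i:i + K]] for i in range(0, len(seq), K)]


def token_to_bases(token_id: int) -> str:
    """Recover the six bases of a token by reading its ID in base 4."""
    digits = []
    for _ in range(K):
        token_id, d = divmod(token_id, 4)
        digits.append(BASES[d])
    return "".join(reversed(digits))


ids = tokenize("ACGTACGGGTTT")
restored = "".join(token_to_bases(t) for t in ids)
```

Under these assumptions a 12-base sequence becomes two tokens instead of twelve, while `token_to_bases` shows that every token still decomposes cleanly into single bases. Whether Carbon exploits its token structure in this way is not stated in the post.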
7:03 AM · May 20, 2026 · 2.3K Views

I must be missing something here. This cannot be happening.

Anshul Kundaje @anshulkundaje

Am I losing my mind. What the text says should be happening is NOT happening in the figures? The exons are NOT having higher confidence scores than the introns. If it can't distinguish exons from introns reliably, I shudder to think what is happening in regulatory DNA 😬😬😬

7:03 AM · May 20, 2026 · 2.3K Views
7:05 AM · May 20, 2026 · 174 Views