12h ago

Researcher Argues Native Multimodal Models Lack Dynamic Compression For Long Contexts

4474316.8K

——0——

Original post

For the record, here is my TLDR on Native Multimodal Models (NMMs) NMMs will eventually work slightly better. They are more elegant but data hungry But for very long context tasks, NMMs don't stand a chance. A dynamic compressor is needed. Dynamic means that the compression rate depends on input, which is something NMMs can't handle. This will be true for both visual and textual tasks, and it will be the only way to unlock 100M context lengths in a clean way (there's plenty of hacks to simulate longer context lengths but I'm not talking about those)

11:39 PM · May 19, 2026