This is pretty crazy ("Project status" under the abstract is also an insane detail). Further shrinking of V4 cache footprint to… 360 MB per 1M context? 360 *bytes* per token? Just 2 OOMs from the raw plaintext limit? Calling CSA «conventional» is crazy work lmao. @antirez !!
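For anyone checking the numbers in that post, here is a rough back-of-the-envelope sketch. It takes "360 MB" as decimal megabytes and assumes roughly 4 bytes of raw UTF-8 text per token. That 4-byte figure is a common rule of thumb for English text, not something stated in the post, and the variable names are purely illustrative.

```python
import math

# Figures quoted in the post.
CACHE_BYTES = 360 * 10**6   # 360 MB, read as decimal megabytes (assumption)
CONTEXT_TOKENS = 10**6      # 1M-token context

# Assumption, not from the post: ~4 bytes of plain text per token.
PLAINTEXT_BYTES_PER_TOKEN = 4

# Cache footprint per token.
bytes_per_token = CACHE_BYTES / CONTEXT_TOKENS

# How far that sits above the raw plaintext size, as a ratio
# and in orders of magnitude (OOMs).
ratio = bytes_per_token / PLAINTEXT_BYTES_PER_TOKEN
ooms = math.log10(ratio)

print(f"{bytes_per_token:.0f} bytes/token, {ratio:.0f}x plaintext, ~{ooms:.2f} OOMs")
```

Under those assumptions the cache works out to 360 bytes per token, about 90 times the plaintext size, which is just under two orders of magnitude. A different bytes-per-token estimate shifts the ratio but not the rough conclusion.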
Users praise the candid reporting on FlashMemory's real limitations in cutting the DeepSeek-V4 KV cache by 90% at 500K context, and suggest that the project's failure may itself be a big scientific win.
Maybe this project's failure is a big scientific win, though. Can't remember the last time I've seen such candid reporting on real limitations, compromises and false hopes in the discussion.