AI · 2h ago

FlashMemory Cuts DeepSeek-V4 KV Cache 90% for 500K Context

Original post

This is pretty crazy ("Project status" under the abstract is also an insane detail). Further shrinking of V4 cache footprint to… 360 MB per 1M context? 360 *bytes* per token? Just 2 OOMs from the raw plaintext limit? Calling CSA «conventional» is crazy work lmao. @antirez !!

8:29 PM · Jun 9, 2026 · 12K Views
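
The arithmetic in the post can be checked with a quick sketch. It assumes decimal units (1 MB = 10^6 bytes, 1M = 10^6 tokens) and roughly 4 bytes of raw UTF-8 text per token. That last figure is a common rule of thumb for English text, not something stated in the post, so treat both constants as assumptions.

```python
import math

# Figures quoted in the post.
cache_bytes = 360 * 10**6      # 360 MB, assuming decimal megabytes
context_tokens = 10**6         # 1M tokens of context

# Assumption (not from the post): ~4 bytes of raw UTF-8 text per token.
PLAINTEXT_BYTES_PER_TOKEN = 4

# Cache footprint per token, and its distance from plaintext in orders of magnitude.
bytes_per_token = cache_bytes / context_tokens
ooms_above_plaintext = math.log10(bytes_per_token / PLAINTEXT_BYTES_PER_TOKEN)

print(f"{bytes_per_token:.0f} bytes per token")
print(f"{ooms_above_plaintext:.2f} orders of magnitude above plaintext")
```

Under these assumptions the cache works out to 360 bytes per token, just under 2 orders of magnitude above the plaintext size, which matches the post's "2 OOMs" estimate.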
Sentiment

Users praise the candid reporting on FlashMemory's real limitations when cutting DeepSeek-V4 KV cache 90% for 500K context, calling it a big scientific win.

Pos 100.0% · Neg 0.0%
1 comment with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 1.3K · Likes 13

Maybe this project's failure is a big scientific win, though. Can't remember the last time I've seen such candid reporting on real limitations, compromises and false hopes in the discussion.


2h · Views 1.3K · Likes 13 · Bookmarks 0