VaSE Reduces KV Cache Memory in Reasoning Models Without Training

Original post · Robin Jia #405
Deqing Fu (@DeqingFu)

Introducing VaSE: Value-Aware Stochastic KV Cache Eviction.

Reasoning models think in CoT, bloating the KV cache. Eviction caps memory but suffers capability drop. VaSE is a training-free recipe that cuts that cost: keep large-magnitude value states, evict stochastically.

5:21 PM · Jun 3, 2026 · 43.7K Views
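
The recipe in the post (keep large-magnitude value states, evict the rest stochastically) can be pictured with a small sketch. This is not the authors' implementation. It assumes that "magnitude" means the per-token L2 norm of the value state and that the remaining slots are filled by uniform random sampling. The function name `select_kept_tokens` and the `protect_frac` parameter are hypothetical and do not come from the post.

```python
import numpy as np


def select_kept_tokens(values, budget, protect_frac=0.5, rng=None):
    """Return sorted indices of the cache entries to keep.

    values: array of shape (num_tokens, head_dim), the cached value states.
    budget: maximum number of cache entries to keep.
    protect_frac: hypothetical knob, the share of the budget reserved for
        the largest-magnitude value states.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_tokens = values.shape[0]
    if num_tokens <= budget:
        # Nothing to evict while the cache is within budget.
        return np.arange(num_tokens)

    # Assumption: magnitude is the per-token L2 norm of the value state.
    magnitude = np.linalg.norm(values, axis=-1)

    # Always keep the tokens with the largest value magnitudes.
    num_protected = int(budget * protect_frac)
    order = np.argsort(-magnitude)  # indices sorted by descending magnitude
    protected = order[:num_protected]
    candidates = order[num_protected:]

    # Assumption: the remaining slots are filled by uniform sampling without
    # replacement; every candidate that is not drawn gets evicted.
    num_sampled = budget - num_protected
    sampled = rng.choice(candidates, size=num_sampled, replace=False)

    return np.sort(np.concatenate([protected, sampled]))


# Usage: 16 cached tokens, budget of 8, token 3 carries an outlier value state.
rng = np.random.default_rng(0)
values = rng.normal(size=(16, 4))
values[3] = 100.0
keep = select_kept_tokens(values, budget=8, rng=rng)
```

In this toy setup the outlier token is always retained, while the tokens that fill the rest of the budget vary with the random draw. The same index set would then be used to slice the matching key states.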
Views 1.4K · Bookmarks 2 · Likes 16 · Retweets 3 · Replies 1
Ting-Yun Chang (@CharlotteTYC)

This is the final project of my PhD journey 🎓 I've thought a lot about how to make interp actionable in my previous projects. I believe efficiency follows naturally: when we have a deep understanding of the model, we can figure out where to be frugal w/o hurting model accuracy. The Attention Sink and LLM.int8() papers set great examples, and they deeply inspire our paper. Mirroring the findings on value-state drain, we find that large-range value states are equally important in KV cache eviction. Evicting these outliers causes reasoning models to enter an endless self-reflection loop, while keeping them in the cache maintains accuracy. I'm extremely grateful to my amazing coauthors and supportive advisors.

4h · Views 1.4K · Likes 16 · Bookmarks 2