Yam Eitan and Yoav Gelberg propose training LLMs to produce inherently compressible KV representations, making KV cache compaction more effective.

This complements post-hoc compaction methods such as Cartridges and attention matching, which compress the KV cache of a fixed, already-trained model.

Original post

#199 @MMBRONSTEIN (OP)

Yam Eitan @YTN_YM

1/ How much can you compress an LLM’s KV cache? tl;dr it depends on how you train your model. Many strong context compaction methods, such as Cartridges and attention matching, operate post-hoc: given a fixed model and a context, they try to compress the resulting KV cache. @yoav_gelberg and I ask the complementary question: can we train the model to produce KV representations that are easier to compress? In other words: keep the compression method fixed, and change the representations it sees.

6:24 AM · May 25, 2026

Reposted by

#740 @HANGUO97