can't believe we called it a KV cache when the "KV" part is clearly an implementation detail 😞
Modal's Charles Frye argues the term 'KV cache' is misleading, calling the key-value distinction a mere implementation detail
Phil Chen argued the entire cache is an implementation detail.
Other users called the "KV cache" name an ironic naming artifact, since KV is itself a caching detail for a lower-level abstraction.
@charles_irl Hm well maybe the kv cache was actually the implementation detail

@vishctx no! state space models just have a state. some models now have K == V.

@charles_irl I always found KV Cache to be like naan bread. KV and Cache kinda mean the same thing!
@philhchen state/past caches seem pretty important for sequence models!
@philhchen you might also just, like, call it a _memory_
it's clearly some kind of associative memory, whether it's expressed via keys and values, or keys that are values, or just a big vector that's written/read across seqlen
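A rough sketch of the "associative memory" framing above, in plain Python. The class names `KVMemory` and `StateMemory`, the `decay` parameter, and the toy vectors are all made up for illustration; this is not any particular model's or inference engine's implementation, just the two shapes the thread contrasts: a memory that grows by one (key, value) pair per token, and a fixed-size state that is overwritten at each token.

```python
import math


class KVMemory:
    """Append-only memory: one (key, value) pair is stored per token."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def read(self, q):
        # Softmax-weighted average of stored values, scored by scaled q . k
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in self.keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]


class StateMemory:
    """Fixed-size memory: a single vector updated in place at every token."""

    def __init__(self, dim, decay=0.5):  # decay is a hypothetical toy parameter
        self.state = [0.0] * dim
        self.decay = decay

    def write(self, x):
        self.state = [
            self.decay * s + (1.0 - self.decay) * xi
            for s, xi in zip(self.state, x)
        ]

    def read(self):
        return list(self.state)


kv = KVMemory()
kv.write([1.0, 0.0], [10.0, 0.0])  # separate key and value
kv.write([0.0, 1.0], [0.0, 1.0])   # the "K == V" case: same vector for both
kv_out = kv.read([5.0, 0.0])       # query resembles the first key

st = StateMemory(dim=2)
st.write([2.0, 4.0])
st_out = st.read()
```

Both classes expose the same write/read pair, which is roughly the point of calling it a memory: only `KVMemory` has keys and values at all, and its storage grows with sequence length while `StateMemory` stays the same size.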
@charles_irl tradeoff between specificity and overloading with SRAM / VRAM / VMEM / HBM memory too

@charles_irl the hindsight of building an inference engine AS wacky architectures like dsv4 and gemma 4 are coming out is worth a lot of refactoring time, i'll tell ya. i implemented zaya1 cca and laguna and i was like "ok, layers have state and models have state. S is parametric"

@charles_irl the irony that kv is itself a caching detail for an even lower abstraction. now it feels like we're stuck with naming artifacts from engineering accidents

@charles_irl K: keys, V: values. It's "Attention Is All You Need" nomenclature

@vishctx @charles_irl chai tea!