Modal's Charles Frye argues that the term 'KV cache' is misleading, calling the key-value distinction a mere implementation detail.
Phil Chen counters that the KV cache itself may be the implementation detail.
@philhchen state/past caches seem pretty important for sequence models!
@charles_irl Hm well maybe the kv cache was actually the implementation detail
@philhchen you might also just, like, call it a _memory_
it's clearly some kind of associative memory, whether it's expressed via keys and values, or keys that are values, or just a big vector that's written/read across seqlen
can't believe we called it a KV cache when the "KV" part is clearly an implementation detail
@charles_irl tradeoff between specificity and overloading with SRAM / VRAM / VMEM / HBM memory too
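
To make the "associative memory" framing from the thread concrete, here is a minimal sketch of a cache that is written once per token and read back with a softmax-weighted lookup. It is a toy, single-head illustration under stated assumptions, not how any particular inference library implements its cache: the class name `AssociativeMemory` and its `write`/`read` methods are hypothetical, and plain Python lists stand in for tensors.

```python
import math


class AssociativeMemory:
    """Toy attention-style memory (hypothetical name, for illustration only)."""

    def __init__(self):
        self.keys = []    # one key vector per written token
        self.values = []  # one value vector per written token

    def write(self, key, value):
        # Append along the sequence dimension, as a cache does per decoded token.
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query):
        # Scaled dot-product score between the query and every stored key.
        scale = math.sqrt(len(query))
        scores = [
            sum(k_i * q_i for k_i, q_i in zip(key, query)) / scale
            for key in self.keys
        ]
        # Numerically stable softmax over the stored positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Return the softmax-weighted average of the stored values.
        dim = len(self.values[0])
        return [
            sum(w * v[j] for w, v in zip(weights, self.values)) / total
            for j in range(dim)
        ]


memory = AssociativeMemory()
memory.write([10.0, 0.0], [1.0, 2.0])
memory.write([0.0, 10.0], [3.0, 4.0])
recalled = memory.read([10.0, 0.0])  # query aligned with the first key
```

In this sketch the key/value split is only one way to express the lookup: the same `write`/`read` interface could in principle be backed by keys that double as values, or by a single state vector updated across the sequence, which is the point made in the thread.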