
Meta AI researchers introduce Self-Pruned Key-Value Attention


Meta AI researchers introduced Self-Pruned Key-Value Attention, a mechanism for reducing memory use in large language models. The approach trains the model to predict the future utility of key-value pairs and to retain only the relevant entries in the persistent KV cache, discarding the rest. A learned utility predictor is paired with a small local buffer of recent entries to keep cache growth bounded. The May 15, 2026 paper lists affiliations with Meta FAIR and CentraleSupélec.
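The idea can be sketched as follows: new key-value pairs first land in a small sliding-window buffer, and on eviction a utility predictor decides whether each pair enters the persistent cache or is discarded. This is an illustrative sketch only; the class name, the linear predictor, and all parameters are assumptions, not the paper's implementation.

```python
import numpy as np

class SelfPrunedKVCache:
    """Sketch of a KV cache with selective retention (names/parameters assumed).

    Recent pairs live in a fixed-size local buffer; on eviction, a utility
    predictor (here a stand-in linear scorer, not a trained module) decides
    whether the pair is written to the persistent cache or discarded.
    """

    def __init__(self, dim, buffer_size=4, threshold=0.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=dim)  # stand-in for a learned utility predictor
        self.buffer_size = buffer_size
        self.threshold = threshold
        self.persistent = []           # (key, value) pairs judged useful
        self.buffer = []               # sliding window of recent pairs

    def utility(self, key):
        # Predicted future usefulness of this key (a simple linear score here).
        return float(self.w @ key)

    def write(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) > self.buffer_size:
            old_k, old_v = self.buffer.pop(0)
            # Keep the evicted pair only if the predictor deems it useful;
            # otherwise it is dropped, so the cache grows sublinearly.
            if self.utility(old_k) > self.threshold:
                self.persistent.append((old_k, old_v))

    def attend(self, query):
        # Standard softmax attention over the pruned cache plus the local buffer.
        entries = self.persistent + self.buffer
        keys = np.stack([k for k, _ in entries])
        values = np.stack([v for _, v in entries])
        scores = keys @ query / np.sqrt(len(query))
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ values
```

With a stream of tokens, the buffer stays at a fixed size while only predicted-useful pairs accumulate in the persistent cache, which is the memory saving the post describes.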

Original post

🚨 Do LLMs need to store everything they read in memory? To reduce KV cache size and improve decoding speeds, we propose Self-Pruned KV attention, a mechanism where the model learns to decide which KVs to write in the persistent KV cache, discarding all the rest! @AIatMeta🧵

2:12 AM · May 15, 2026