Meta AI researchers introduce Self-Pruned Key-Value Attention
Meta AI researchers introduced Self-Pruned Key-Value Attention to reduce memory use in large language models. The approach trains models to predict future utility of key-value pairs and selectively retain only relevant entries in the KV cache. It incorporates a utility predictor and local buffer to limit cache growth. The May 15 2026 paper lists affiliations with Meta FAIR and CentraleSupélec.