
Sangyun Lee and Giulia Fanti propose a "sleep" phase to convert LLM context into fast weights and clear KV cache

Periodic offline consolidation passes let the model clear its KV cache, curbing the attention cost that otherwise grows quadratically with context length during inference.

Original post

Language Models Need Sleep: "Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache." "increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning."

3:35 AM · May 26, 2026
Reposted by

abs: https://arxiv.org/abs/2605.26099

Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr)


10:35 AM · May 26, 2026 · 44.7K Views
10:35 AM · May 26, 2026 · 3.9K Views
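The post does not say how recent context is turned into fast weights. As a rough illustration of the general wake/sleep pattern only, here is a toy Python sketch under explicit assumptions: the fast weights are a linear key-to-value associative memory trained with the delta rule, the cache is cleared whenever it reaches a fixed capacity, and the hypothetical `sleep_steps` parameter stands in for the sleep duration N as a number of consolidation passes over the cache. The names `SleepingMemory`, `observe`, `sleep`, and `recall` are invented for this example and are not the authors' method or code.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Return W @ x for a row-major matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class SleepingMemory:
    """Toy wake/sleep memory: a bounded KV cache plus a fast-weight matrix.

    Hypothetical illustration: consolidation here is the delta rule on a
    linear associative memory, an assumption rather than the paper's method.
    """

    def __init__(self, d_key: int, d_value: int, capacity: int,
                 sleep_steps: int, lr: float = 0.5) -> None:
        self.capacity = capacity        # cached pairs allowed before sleeping
        self.sleep_steps = sleep_steps  # hypothetical stand-in for sleep duration N
        self.lr = lr
        self.kv_cache: List[Tuple[Vector, Vector]] = []
        # Fast weights map keys to values; shape (d_value, d_key), start at zero.
        self.fast_weights: Matrix = [[0.0] * d_key for _ in range(d_value)]

    def observe(self, key: Vector, value: Vector) -> None:
        """Wake phase: append to the KV cache and sleep once it is full."""
        self.kv_cache.append((key, value))
        if len(self.kv_cache) >= self.capacity:
            self.sleep()

    def sleep(self) -> None:
        """Write cached pairs into the fast weights, then clear the cache."""
        for _ in range(self.sleep_steps):
            for key, value in self.kv_cache:
                prediction = matvec(self.fast_weights, key)
                # Delta rule: W += lr * (v - W k) k^T
                for i, row in enumerate(self.fast_weights):
                    err = value[i] - prediction[i]
                    for j, kj in enumerate(key):
                        row[j] += self.lr * err * kj
        self.kv_cache.clear()

    def recall(self, key: Vector) -> Vector:
        """Read a value back from the fast weights alone."""
        return matvec(self.fast_weights, key)


def basis(d: int, i: int) -> Vector:
    """Unit vector e_i, used here as mutually orthogonal keys."""
    return [1.0 if j == i else 0.0 for j in range(d)]


for n in (1, 5):
    mem = SleepingMemory(d_key=4, d_value=2, capacity=4, sleep_steps=n)
    for i in range(4):
        mem.observe(basis(4, i), [float(i), 1.0])
    print(n, len(mem.kv_cache), mem.recall(basis(4, 3)))
# 1 0 [1.5, 0.5]
# 5 0 [2.90625, 0.96875]
```

In this toy, with orthogonal keys and a learning rate of 0.5, each extra pass halves the remaining recall error, so more passes recover the cached values more accurately after the cache is cleared. The sketch only shows that mechanical effect; it says nothing about why longer sleep helps the models in the paper.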