12h ago

Sangyun Lee and Giulia Fanti propose a "sleep" phase to convert LLM context into fast weights and clear KV cache

Offline consolidation passes reduce quadratic attention costs during inference.


Original post

#359 Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

Language Models Need Sleep

"Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache."

"increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning."
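As a rough illustration of the idea quoted above (consolidate recent context into fast weights, then clear the key-value cache), here is a toy sketch. It is not the paper's method: the class `SleepyMemory` and its methods are hypothetical names, and the outer-product update is a classic linear associative memory swapped in purely for illustration.

```python
# Toy sketch only: NOT the mechanism from the paper.
# Assumption: "fast weights" are modeled as a fixed-size matrix W that
# accumulates outer products value * key^T (a linear associative memory).

class SleepyMemory:
    """Hypothetical helper: a growing KV cache plus a fixed-size fast-weight matrix."""

    def __init__(self, dim):
        self.dim = dim
        self.kv_cache = []  # grows by one (key, value) pair per observed token
        self.fast_weights = [[0.0] * dim for _ in range(dim)]  # stays dim x dim

    def observe(self, key, value):
        """Awake phase: append the new pair to the cache."""
        self.kv_cache.append((key, value))

    def sleep(self):
        """Sleep phase: fold every cached pair into W, then clear the cache."""
        for key, value in self.kv_cache:
            for i in range(self.dim):
                for j in range(self.dim):
                    self.fast_weights[i][j] += value[i] * key[j]
        self.kv_cache.clear()

    def recall(self, query):
        """Read from the fast weights alone: returns W @ query."""
        return [
            sum(self.fast_weights[i][j] * query[j] for j in range(self.dim))
            for i in range(self.dim)
        ]


memory = SleepyMemory(dim=2)
memory.observe(key=[1.0, 0.0], value=[1.0, 2.0])
memory.observe(key=[0.0, 1.0], value=[3.0, 4.0])
memory.sleep()

cache_size_after_sleep = len(memory.kv_cache)
recalled = memory.recall([1.0, 0.0])
```

With orthonormal keys, as in this example, the stored values are recovered exactly after the cache is cleared. With correlated keys, recall becomes approximate, which is one reason a learned consolidation step might be preferable to this fixed rule.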

3:35 AM · May 26, 2026

Reposted by

#1179 @MENHGUIN

#359 Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

abs: https://arxiv.org/abs/2605.26099

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

10:35 AM · May 26, 2026 · 44.7K Views

10:35 AM · May 26, 2026 · 3.9K Views

QUOTE POST

#475 elvis @OMARSAR0

Language models need "sleep"

8:08 PM · May 26, 2026 · 4.2K Views
