16h ago

Sangyun Lee and Giulia Fanti propose a "sleep" phase to convert LLM context into fast weights and clear KV cache

Offline consolidation passes reduce quadratic attention costs during inference.


Original post

#359 Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

Language Models Need Sleep

"Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache."

"increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning."

3:35 AM · May 26, 2026

Reposted by

#1179 @MENHGUIN

#359 Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

abs: https://arxiv.org/abs/2605.26099

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

10:35 AM · May 26, 2026 · 50K Views

10:35 AM · May 26, 2026 · 4.2K Views

QUOTE POST

#475 elvis @OMARSAR0

Language models need "sleep"

8:08 PM · May 26, 2026 · 7.3K Views

QUOTE POST

#885 Andrey Kurenkov @ANDREY_KURENKOV

"Sleep-like consolidation pattern" really is a slick way to say 'Test-time training' (which appears to be the most relevant prior work - no shade, good on them for having a solid related works section!)

Oh oops there is also this:

himanshu @himanshustwts

very cool research (and nomenclature)

2:22 PM · May 26, 2026 · 20.4K Views

2:39 AM · May 27, 2026 · 51 Views
