16h ago

Sangyun Lee and Giulia Fanti propose a "sleep" phase to convert LLM context into fast weights and clear KV cache

Offline consolidation passes reduce quadratic attention costs during inference.

Original post

Language Models Need Sleep

"Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache."

"increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning."

3:35 AM · May 26, 2026

abs: https://arxiv.org/abs/2605.26099

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

10:35 AM · May 26, 2026 · 50K Views
10:35 AM · May 26, 2026 · 4.2K Views

"Sleep-like consolidation pattern" really is a slick way to say 'Test-time training' (which appears to be the most relevant prior work - no shade, good on them for having a solid related works section!)

Oh oops there is also this:

himanshu @himanshustwts

very cool research (and nomenclature)

2:22 PM · May 26, 2026 · 20.4K Views
2:39 AM · May 27, 2026 · 51 Views