Sangyun Lee and Giulia Fanti propose a "sleep" phase to convert LLM context into fast weights and clear KV cache

Views: 56.7K · Bookmarks: 671 · Likes: 836 · Retweets: 135 · Replies: 29

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr

Language Models Need Sleep

"Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache."

"increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning."

himanshu@himanshustwts

very cool research (and nomenclature)

elvis@omarsar0

Language models need "sleep"

DAIR.AI@dair_ai

// Language Models Need Sleep //

Let your agents "sleep", folks.

On a serious note, this is a fascinating paper on getting the most from long-horizon agents.

Here is the problem with agents today: Attention scales badly with context length, so long-horizon agents keep paying a quadratic tax at inference time.
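To put rough numbers on the "quadratic tax", here is a back-of-the-envelope sketch. It assumes plain causal attention, where token t attends to all t tokens so far, and it counts only query-key pairs. The function names and the 1,000-token window are illustrative assumptions, not figures from the paper, and the cost of the sleep passes themselves is deliberately left out.

```python
def causal_attention_pairs(n_tokens):
    """Query-key pairs scored when every token attends to itself and all earlier tokens."""
    return n_tokens * (n_tokens + 1) // 2


def pairs_with_periodic_clear(n_tokens, window):
    """Same count when the KV cache is cleared every `window` tokens (hypothetical schedule)."""
    full_windows, remainder = divmod(n_tokens, window)
    return full_windows * causal_attention_pairs(window) + causal_attention_pairs(remainder)


if __name__ == "__main__":
    n = 8000
    print(causal_attention_pairs(n))           # 32004000
    print(pairs_with_periodic_clear(n, 1000))  # 4004000
```

Under these assumptions, clearing the cache every 1,000 tokens cuts wake-time attention work by roughly 8x on an 8,000-token run. Whether that is a net win depends on how much compute the consolidation step spends.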

This work proposes a sleep-like consolidation step instead. The model periodically does N offline recurrent passes over recent context, writes the result into persistent fast weights in its state-space blocks, then clears the KV cache.
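The consolidate-then-clear loop can be sketched with a toy. This is not the paper's architecture: it swaps in a simple delta-rule associative memory for the state-space fast weights, and `ToySleeper`, `observe`, `sleep`, `recall`, and the learning rate are all hypothetical names and values chosen for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToySleeper:
    def __init__(self, d_key, d_val):
        # Persistent "fast weights": survive after the cache is cleared.
        self.W = [[0.0] * d_key for _ in range(d_val)]
        # Raw (key, value) pairs collected while "awake".
        self.kv_cache = []

    def observe(self, key, value):
        self.kv_cache.append((key, value))

    def sleep(self, n_passes, lr=0.5):
        """Make n_passes over the cached context, then drop the raw cache."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.W, key)
                err = [v - p for v, p in zip(value, pred)]
                # Delta-rule update: W += lr * outer(err, key)
                for i, e in enumerate(err):
                    for j, k in enumerate(key):
                        self.W[i][j] += lr * e * k
        self.kv_cache.clear()

    def recall(self, key):
        # Wake-time lookup is one matrix-vector product; no cache needed.
        return matvec(self.W, key)


def recall_error(n_passes):
    agent = ToySleeper(d_key=2, d_val=2)
    agent.observe([1.0, 0.0], [3.0, -1.0])
    agent.observe([0.0, 1.0], [0.5, 2.0])
    agent.sleep(n_passes)
    out = agent.recall([1.0, 0.0])
    return abs(out[0] - 3.0) + abs(out[1] + 1.0)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, recall_error(n))
```

In this toy, with orthonormal keys and a learning rate of 0.5, each extra pass halves the recall error, so a longer "sleep" gives better recall after the raw tokens are gone. That mirrors the qualitative claim in the thread only loosely; the paper's actual update rule and tasks are not reproduced here.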

The effect is that extra compute moves to sleep while wake-time prediction stays low latency. On cellular automata, multi-hop graph retrieval, and a math reasoning task where a plain transformer and SSM-attention hybrids fail, longer sleep durations improve performance, with the biggest gains on examples that need deeper reasoning.

Why does it matter?

It points at an alternative to ever-larger KV caches for agents that run for a long time. Consolidate, then forget the raw tokens.

Paper: https://arxiv.org/abs/2605.26099

Learn to build effective AI agents in our academy: https://academy.dair.ai/

VDN@VDN_00001

What you are doing is "meaning condensation", or crystallization of the context. It is something I have been talking about for many months, and which I have already implemented successfully without an external DB or any weight manipulation, simply by creating targeted context "memory capsules".

Those are the obtained results:

himanshu@himanshustwts

https://www.alphaxiv.org/abs/2605.26099

Andrey Kurenkov@andrey_kurenkov

"Sleep-like consolidation pattern" really is a slick way to say 'Test-time training' (which appears to be the most relevant prior work - no shade, good on them for having a solid related works section!)

Oh oops there is also this:

himanshu@himanshustwts

very cool research (and nomenclature)

Brian@bryanmusuku

@iScienceLuvr reminds me of experience replay

Pooja Algikar@algikarpooja

@iScienceLuvr

Tom Dupuis@bellmantd

@himanshustwts at this point continual learning, short and long term memory and infinite context are all merging and converging to the same techniques

Kiril Bangachev@BangachevKiril

@iScienceLuvr @ArashVahdat Sleepmaxing for LLMs!

More seriously, curious if in deployment you would do this during an interaction with the LLM (increasing latency) or once conversation seems to have ended for now (say in hours when you don’t expect queries)?

47fucb4r8curb4fc8f8r4bfic8r@47fucb4r8c69323

@himanshustwts This is just a fancier kind of compaction.

Steven Collard@stalmico

@iScienceLuvr so we're literally giving llms circadian rhythms now

Rahul Chavan@codecroc

@iScienceLuvr never thought this would be true but sleep intervals trade immediate token recall

Anand C. Patel, MD MS@anandcpatelmdms

@iScienceLuvr Persistent fast weights sound like short-term memory?

Chris Groves@CGrovesNLN

@iScienceLuvr interesting

how do dolphins do it?

Pooja Algikar@algikarpooja

@iScienceLuvr what time?

himanshu@himanshustwts

IMP: Behrouz et al. were early to adopt this nomenclature for the literature mentioned. I have been notified that the title of the above paper will be modified soon!

Super Watcher@superaiwatcher

@iScienceLuvr Sleep cycles are a stopgap. By 2027, architectural shifts to state-space models will render context-window management and consolidation hacks entirely obsolete.

AIPathfinder@NavigateAI_

@iScienceLuvr Language Models Need Sleep: a smart direction for long-horizon tasks and deeper reasoning #AI #MachineLearning #NLP

PleaseHoldMyHalo@PlsHoldMyHalo

@iScienceLuvr We have known about this for months...
