/Tech13h ago

Rohan Virani argues memory-augmented neural networks like DeepSeek's Engram make retrieval and continual learning intrinsic to AI agents

Engram performs knowledge lookups within the model's forward pass.


#603

Original post

Sarah Catanzaro @sarahcat21 #603 in Tech

Rohan should have titled this “A trip down memory lane”, but he can be forgiven.

Lots of folks conflating continual learning with RL optimization; the latter may be necessary but likely insufficient.

Here he reviews the different veins of historical and current research that can together enable continually learning and capable agents.

Rohan Virani@rohan_virani

Memory Augmented Neural Networks aren’t new, but their modern counterparts open new axes for scaling agents. I wrote a post on how architectures like Memory Networks and Neural Turing Machines pave the way for making retrieval and continual learning intrinsic to the model itself

The silent revolution has already begun. In January, DeepSeek released Engram, a sparse memory module that looks up knowledge within the forward pass. It split where memory is stored from where reasoning happens, analogous to how MoE made transformers capable of conditional computation. Sequential compute is no longer wasted storing facts.

Retrieval has come a long way from single-vector RAG. We use multi-vector embeddings and post-train models to search over filesystems and vector databases, though latency is high and tool-call tokens fill up context. Context compaction methods, e.g. Cartridges, reduce this cost, but they also shift the burden to teaching a model when cartridges should be used or updated. Still lots to be done here!
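
To make the single-vector vs. multi-vector contrast concrete, here is a minimal sketch. It assumes a late-interaction ("MaxSim") scoring rule, which is one common multi-vector scheme and not necessarily the one the post has in mind. The function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_pool(vectors):
    """Collapse a list of token vectors into one vector by averaging."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def single_vector_score(query_vecs, doc_vecs):
    """Single-vector retrieval: pool first, then compare once."""
    return cosine(mean_pool(query_vecs), mean_pool(doc_vecs))

def multi_vector_score(query_vecs, doc_vecs):
    """Late interaction (assumed MaxSim rule): each query token is matched
    to its most similar document token, and the matches are summed."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)
```

With `query = [[1, 0], [0, 1]]`, a document holding both tokens (`[[1, 0], [0, 1]]`) and a document holding one blended token (`[[1, 1]]`) get the same pooled score, while the multi-vector score prefers the first. Pooling throws away exactly the token-level detail that late interaction keeps.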

On a complementary axis, there’s a rich lineage where retrieval is trained into the model mid-layer. Memory didn't always live in context alone! @jaseweston et al.’s Memory Networks (2014) made memory an explicit addressable matrix the model queries mid-forward-pass. RETRO by @borgeaud_s et al. scaled it to transformers in 2021. @GuillaumeLample et al.’s Product Key Memory made sparse KV retrieval over a parameterized memory layer more efficient, and recently this was scaled up in Memory Layers by Berges et al. in 2024.
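
A rough sketch of why product keys make sparse lookup cheaper: the query is split in half, each half is scored against a small set of sub-keys, and only the k x k combinations of the best sub-keys are considered. This is a simplified, single-head illustration in plain Python. The function names, the slot numbering, and the softmax weighting over the selected slots are assumptions for the sketch, not the papers' exact code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_k(scores, k):
    """Indices of the k highest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def product_key_lookup(query, sub_keys_1, sub_keys_2, values, k):
    """Return (output vector, selected slot ids).

    The memory has len(sub_keys_1) * len(sub_keys_2) slots. Slot (i, j) has an
    implicit key formed by concatenating sub_keys_1[i] and sub_keys_2[j], so
    its score is the sum of the two half-scores.
    """
    half = len(query) // 2
    q1, q2 = query[:half], query[half:]
    s1 = [dot(q1, key) for key in sub_keys_1]
    s2 = [dot(q2, key) for key in sub_keys_2]
    # Because scores add, the global top-k slots lie inside this k x k set.
    candidates = [(s1[i] + s2[j], i * len(sub_keys_2) + j)
                  for i in top_k(s1, k) for j in top_k(s2, k)]
    best = sorted(candidates, reverse=True)[:k]
    # Softmax over the selected scores only (sparse read).
    peak = max(score for score, _ in best)
    weights = [math.exp(score - peak) for score, _ in best]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for weight, (_, slot) in zip(weights, best):
        for d in range(len(out)):
            out[d] += (weight / total) * values[slot][d]
    return out, [slot for _, slot in best]
```

With n sub-keys per half, the lookup scores 2n sub-keys plus k x k candidates instead of all n x n slots, which is where the efficiency comes from.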

Beyond retrieval, as agent tasks stretch from weeks to months, we’ll increasingly want to write information to the model within an episode. Context engineering can store token-level working information in filesystems, but always-on test-time training enables continuous weight updates, as in Learning to Discover at Test Time by @mertyuksekgonul.

Over long horizons, continual learning faces two main problems: how to distill the right signal to learn from and how to integrate information without catastrophic forgetting. Most existing work still uses unsupervised learning objectives or updates only shallow subsets of parameters, shifting the problem to choosing which LoRA to swap in and out. There’s even more work to be done here!
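
For readers who have not seen the mechanics, a LoRA adapter adds a low-rank correction on top of a frozen weight matrix, h = Wx + (alpha / r) · B(Ax). The sketch below is a plain-Python illustration of that formula. The function names and the default `alpha` value are hypothetical choices for the example.

```python
def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def lora_forward(w, a, b, x, alpha=1.0):
    """Frozen base layer plus a low-rank adapter.

    w: (out x in) frozen weights, never modified here.
    a: (r x in) down-projection, b: (out x r) up-projection.
    """
    rank = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [h + scale * d for h, d in zip(base, delta)]
```

Because `w` is never touched, what the base model knows is preserved and each adapter can be stored and swapped independently. The open question the post points to is the routing decision: which `(a, b)` pair to load for a given input.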

On a second complementary axis, what if we could write relevant information to network parameters during a forward pass without catastrophic forgetting? Neural Turing Machines (Graves et al., 2014) made reads AND writes to external memory differentiable, Differentiable Neural Computers ensured only unused slots were updated, and @santoroAI et al. showed in 2016 you could meta-learn a generalizable write policy across many episodes. Imagine a model with an intrinsic scratchpad for planning over long horizons!
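
As a rough illustration of what differentiable reads and writes look like, the sketch below follows the content-based addressing and erase/add write described for Neural Turing Machines: address weights are a softmax over key similarity, a read is a weighted sum of rows, and a write scales each row's erase and add by its weight. It is plain Python with no gradients. The function names are hypothetical, and location-based addressing is left out.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def address(memory, key, beta):
    """Content addressing: softmax over beta * cosine(row, key)."""
    sims = [beta * cosine(row, key) for row in memory]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Soft read: weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(weights[i] * memory[i][j] for i in range(len(memory)))
            for j in range(width)]

def write(memory, weights, erase, add):
    """Soft write: M[i] <- M[i] * (1 - w[i] * erase) + w[i] * add."""
    return [[memory[i][j] * (1.0 - weights[i] * erase[j]) + weights[i] * add[j]
             for j in range(len(memory[0]))]
            for i in range(len(memory))]
```

Rows with near-zero weight are left almost untouched, which is the property that lets a learned write policy protect existing contents instead of overwriting them.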

5:13 PM · Jun 9, 2026 · 5.4K Views

Sentiment

Many users praise Memory Augmented Networks for eliminating the tool-call tax and offering a solid approach to long-term memory in AI agents.

Pos 100.0%, Neg 0.0% (2 comments with sentiment).

Cluster Engagement

Posts from X

Most Activity

sisyphus bar and grill@itunpredictable

@rohan_virani "What's old is new again"

basically every rohan virani blog post


Rohan Virani@rohan_virani

More of my thoughts in the full post below:

https://www.amplifypartners.com/blog-posts/memory-augmented-neural-networks-for-retrieval-and-continual-learning

Special thanks to @samar_a_khanna and @rohunagrawal for helpful discussions on this topic and reviewing early drafts


Audrey 🎹@audreypianist

@rohan_virani 👀


Kyriakos@Kyriakos_Pelek

@rohan_virani Finally a decent take on long term memory for agents


Kelsey Bred@KelseyMich

@rohan_virani eliminating the tool-call tax is music to my ears. great post Rohan


Rugbist@rugbist_

@sarahcat21 ngl the distinction between continual learning and RL optimization is the part most miss

frames everything differently
