Introducing MemTrace: Making LLM Memory Systems Finally Debuggable
Memory is becoming a core component of AI agents. But today's memory systems are still a "black box".
When a memory-augmented agent fails, the real error may have happened:
- dozens of turns earlier,
- inside a retrieval step,
- during memory consolidation,
- or from a corrupted update that silently propagates over time.
Existing logs cannot recover these long-range causal chains.
MemTrace changes this.
We introduce an automated tracing framework for LLM memory systems, turning opaque memory pipelines into transparent execution graphs that can be inspected, explored, and diagnosed step by step.
⚡ What MemTrace enables:
🧩 Plug-and-Play Instrumentation
Seamlessly integrates with diverse memory systems (RAG, Mem0, EverMemOS, etc.) without modifying the original architecture.
Transparent Memory Execution
Transforms opaque memory pipelines into structured execution graphs, making information flow, retrieval, updates, and propagation fully traceable.
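To make the idea concrete, here is a minimal sketch of non-invasive instrumentation that records memory operations as graph nodes. It is not MemTrace's API (the code is not yet released): `TraceGraph`, `TraceNode`, `instrument`, and `ToyMemory` are hypothetical names, and the toy key-value store only stands in for a real memory system.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    node_id: int
    op: str            # name of the memory operation, e.g. "write" or "retrieve"
    inputs: tuple
    output: object
    parents: list = field(default_factory=list)  # ids of nodes this one read from


class TraceGraph:
    """Hypothetical recorder: one node per memory operation."""

    def __init__(self):
        self.nodes = []
        self.last_writer = {}  # memory key -> id of the node that last wrote it

    def record(self, op, inputs, output, reads=(), writes=()):
        # Link this node to whichever earlier node last wrote each key it reads.
        parents = [self.last_writer[k] for k in reads if k in self.last_writer]
        node = TraceNode(len(self.nodes), op, inputs, output, parents)
        self.nodes.append(node)
        for k in writes:
            self.last_writer[k] = node.node_id
        return node


def instrument(obj, graph, method, reads=None, writes=None):
    """Wrap obj.method on the instance so each call is logged; the class is untouched."""
    original = getattr(obj, method)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        result = original(*args, **kwargs)
        graph.record(
            method, args, result,
            reads=reads(args, result) if reads else (),
            writes=writes(args, result) if writes else (),
        )
        return result

    setattr(obj, method, wrapper)


class ToyMemory:
    """Stand-in memory system: a plain key-value store."""

    def __init__(self):
        self.store = {}

    def write(self, key, value):
        self.store[key] = value

    def retrieve(self, key):
        return self.store.get(key)


memory = ToyMemory()
graph = TraceGraph()
instrument(memory, graph, "write", writes=lambda args, _: [args[0]])
instrument(memory, graph, "retrieve", reads=lambda args, _: [args[0]])

memory.write("user_city", "Paris")
memory.retrieve("user_city")

for node in graph.nodes:
    print(node.node_id, node.op, node.parents)
```

The retrieve node ends up with the earlier write node as its parent, which is the kind of edge that lets a later failure be followed back to the operation that produced the data.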
Error Attribution
Pinpoints the exact operation responsible for a failure across long-horizon memory execution.
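One simple way to picture attribution over an execution graph is a backward walk from the failed step to its earliest faulty ancestor. The sketch below is an illustrative baseline, not the method from the paper; the `trace` layout, `attribute_failure`, and the `is_faulty` oracle (which in practice might be a human label or an LLM judge) are all assumptions.

```python
# Hypothetical trace: node id -> (operation, output, parent ids)
trace = {
    0: ("write", "user_city=Paris", []),
    1: ("consolidate", "user_city=Berlin", [0]),   # corrupted update
    2: ("retrieve", "user_city=Berlin", [1]),
    3: ("answer", "You live in Berlin", [2]),
}


def attribute_failure(trace, failed_id, is_faulty):
    """Return the earliest faulty ancestor of failed_id (including itself).

    Walks parent edges backwards from the failed node. `is_faulty` is an
    assumed oracle that judges a single node's output.
    """
    culprit = None
    stack, seen = [failed_id], set()
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        op, output, parents = trace[node_id]
        if is_faulty(output):
            # Smaller ids happened earlier in this toy trace.
            if culprit is None or node_id < culprit:
                culprit = node_id
            stack.extend(parents)  # keep looking for an earlier cause
    return culprit


culprit = attribute_failure(trace, 3, lambda out: "Berlin" in out)
print(trace[culprit][0])  # prints "consolidate"
```

Here the wrong answer at the last step is traced past the retrieval, which faithfully returned bad data, to the consolidation step that introduced it.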
🚨 Benchmark Auditing
While building MemTraceBench, we found that failure attribution in memory systems remains highly challenging; MemTrace still has substantial room for improvement.
We also discovered annotation errors in existing memory benchmarks, revealing broader reliability issues in current memory-agent evaluation.
Towards Self-Evolving Agents
MemTrace is not only a debugging tool.
Its fine-grained attribution signals can directly drive closed-loop optimization, enabling agents to automatically repair faulty behaviors and continuously evolve from failures.
Using MemTrace-guided optimization, we improve downstream task performance by up to 7.62%.
Paper: https://arxiv.org/abs/2605.28732
⌨️ Code (coming soon):
• MemTrace: https://github.com/zjunlp/MemTrace
• smartcomment: https://github.com/zjunlp/smartcomment
• MemBase: https://github.com/zjunlp/MemBase
We believe memory systems need the same thing software engineering once needed:
not bigger models, but observability, tracing, and debugging infrastructure. #MemTrace #LLM #NLP #Agent #Tracing #Debugging