
Researcher Clarifies Knobs That Control Memorization Modes In Sequence Models

Original post

One update is about a curious behavior where Transformer representations sometimes show a "zigzagging" geometry. We now understand that these are highly negative eigen-directions, and how they disappear when we add "A --> A" type of self-edges.

9:21 AM · May 26, 2026
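As a rough illustration of the linear algebra that may be at play here, the sketch below uses a toy path graph, not the paper's setup or its Transformer representations. It relies on two standard facts: the most negative eigenvector of a path graph's adjacency matrix alternates in sign from node to node (a "zigzag"), and adding self-loops of weight w shifts every eigenvalue up by w, so that direction becomes less negative. The function names, the graph, and the self-loop weight of 1.0 are all assumptions made for this example.

```python
import math

def path_adjacency(n, self_loop=0.0):
    """Adjacency matrix of a path graph on n nodes, with an optional
    self-loop weight on every node (hypothetical toy setup)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i + 1] = A[i + 1][i] = 1.0
    for i in range(n):
        A[i][i] = self_loop
    return A

def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def most_negative_eigenpair(n):
    """Closed form for the path graph: eigenvalues are 2*cos(k*pi/(n+1))
    with eigenvectors sin(j*k*pi/(n+1)); k = n gives the most negative one."""
    theta = n * math.pi / (n + 1)
    lam = 2.0 * math.cos(theta)
    vec = [math.sin(j * theta) for j in range(1, n + 1)]
    return lam, vec

def rayleigh(A, v):
    """Rayleigh quotient v^T A v / v^T v; equals the eigenvalue
    when v is an eigenvector of A."""
    Av = matvec(A, v)
    return sum(a * b for a, b in zip(v, Av)) / sum(x * x for x in v)

n = 8
lam, v = most_negative_eigenpair(n)

# Sign pattern of the most negative eigen-direction: it flips at every node.
signs = [1 if x > 0 else -1 for x in v]

# Eigenvalue along that direction without and with "A --> A" self-edges.
plain = rayleigh(path_adjacency(n), v)
looped = rayleigh(path_adjacency(n, self_loop=1.0), v)

print("sign pattern:", signs)
print("eigenvalue without self-loops:", round(plain, 4))
print("eigenvalue with self-loops:   ", round(looped, 4))
```

In this toy case the zigzag direction is still an eigenvector after the self-loops are added, but its eigenvalue moves up by exactly the self-loop weight. Whether the same mechanism explains the behavior reported in the thread is for the paper to establish.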

Updated our paper on the foundations of memory in sequence models (with fresh insights, clearer writing and ablations).

Our paper contrasts two distinct ways in which language models memorize and formulates the questions that arise from this.

Will be presented at #ICML.

4:21 PM · May 26, 2026 · 1.6K Views

This was satisfying to know because it explains concurrent findings that "identity" statements ("John is John") helped improve reasoning. (e.g., see this work: https://arxiv.org/abs/2509.24653)

Vaishnavh Nagarajan (@_vaishnavh)


4:21 PM · May 26, 2026 · 167 Views
4:21 PM · May 26, 2026 · 131 Views

Typo in thread. I meant to say: "It seems like there are various knobs that make the model memorize *one way or the other*"

Vaishnavh Nagarajan (@_vaishnavh)

Old thread is here:

4:21 PM · May 26, 2026 · 178 Views
4:24 PM · May 26, 2026 · 91 Views