5h ago

Researcher Clarifies Knobs That Control Memorization Modes In Sequence Models


Original post

One update is about a curious behavior where Transformer representations sometimes show a "zigzagging" geometry. We now understand that these are highly negative eigen-directions, and how they disappear when we add "A --> A" type of self-edges.

9:21 AM · May 26, 2026
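
A toy numerical illustration of the self-edge remark, not the paper's actual setup: assume the matrix in question is a symmetric graph adjacency matrix, and use a small path graph as a stand-in. The helper names (`path_adjacency`, `most_negative_direction`, `sign_flips`) are hypothetical. On such a graph the most negative eigenvector alternates in sign from node to node (a "zigzag"), and adding unit-weight "A --> A" self-edges amounts to adding the identity matrix, which shifts every eigenvalue up by exactly 1.

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix of an undirected path graph on n nodes (toy stand-in)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

def most_negative_direction(M):
    """Smallest eigenvalue of a symmetric matrix M and its eigenvector."""
    vals, vecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
    return vals[0], vecs[:, 0]

def sign_flips(v):
    """Number of sign changes between consecutive entries of v."""
    s = np.sign(v)
    return int(np.sum(s[:-1] * s[1:] < 0))

n = 8
A = path_adjacency(n)

# Without self-edges: the most negative direction flips sign at every step.
lam_min, v_min = most_negative_direction(A)

# With unit self-edges on every node: the spectrum shifts up by 1.
lam_min_loops, _ = most_negative_direction(A + np.eye(n))

print("min eigenvalue without self-edges:", lam_min)
print("sign flips in that eigenvector:", sign_flips(v_min), "of", n - 1)
print("min eigenvalue with self-edges:", lam_min_loops)
```

In this toy the most negative eigenvalue moves from about -1.88 to about -0.88, so the negative direction is weakened rather than removed outright; whether and how it vanishes in a trained model depends on the setup described in the paper.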

POST

#1305 Vaishnavh Nagarajan @_VAISHNAVH

Updated our paper on the foundations of memory in sequence models (with fresh insights, clearer writing and ablations).

Our paper contrasts two distinct ways in which language models memorize and formulates the questions that arise from this.

Will be presented at #ICML.

4:21 PM · May 26, 2026 · 1.6K Views

#1305 Vaishnavh Nagarajan @_VAISHNAVH

This was satisfying to know because it explains concurrent findings that "identity" statements ("John is John") helped improve reasoning. (e.g., see this work: https://arxiv.org/abs/2509.24653)

Vaishnavh Nagarajan @_vaishnavh

4:21 PM · May 26, 2026 · 167 Views

4:21 PM · May 26, 2026 · 131 Views

QUOTE POST

#1305 Vaishnavh Nagarajan @_VAISHNAVH

Old thread is here:

4:21 PM · May 26, 2026 · 178 Views

#1305 Vaishnavh Nagarajan @_VAISHNAVH

typo in thread. I meant to say: "It seems like there are various knobs that make the model memorize *one way or the other*"

4:24 PM · May 26, 2026 · 91 Views