We are taking a big step towards scaling LLMs that can unlearn on demand. Cleanly deleting data from LLMs has proven impossible: training entangles every source in shared weights. NULLs (Natively Unlearnable LLMs) escapes this, keeping millions of sources individually deletable in a 1B-parameter model trained on web data. (1/8)
CMU's Aditi Raghunathan introduces NULLs to enable scalable, on-demand unlearning of training data in LLMs
The architecture was tested on a 1-billion-parameter model.
Users are praising NULLs for enabling scalable, on-demand unlearning in large language models, and call it a cool follow-up to prior memorization-sinks research.
Perfect unlearning asks for two incompatible things: share knowledge across data, but keep each source separable enough to delete.
NULLs offer a surprisingly simple and elegant way to solve this. Give each source a few neurons that only fire on that source, and the learning dynamics do the rest: source-specific content naturally gets trapped there.
The idea is conceptually simple, but the exciting part is that it scales! It delivers unlearning that robustly matches oracle retraining.
Check out Gaurav’s excellent thread below.
More sinks! Great line of work

@psidharth567 NULLs localizes each source's information to a specific mask over the sink neurons, which gives that control without requiring a separate expert per document!
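The scaling point in the reply above can be checked with back-of-the-envelope arithmetic. One expert per document means one expert per Wikipedia article, about 6M of them. Choosing k of n sink neurons instead gives C(n, k) distinct masks, so a small pool is enough to give every source its own pattern. The value k = 4 below is a hypothetical choice, not a figure from the paper. Counting distinct masks is also only a capacity bound: two masks can still share neurons, so the count says nothing about how cleanly sources separate.

```python
from math import comb


def distinct_masks(n_sink: int, k_active: int) -> int:
    """Number of distinct k-of-n sink masks available to assign to sources."""
    return comb(n_sink, k_active)


def min_sink_neurons(n_sources: int, k_active: int) -> int:
    """Smallest sink pool whose k-of-n masks can label every source uniquely."""
    n = k_active
    while comb(n, k_active) < n_sources:
        n += 1
    return n


N_SOURCES = 6_000_000  # roughly the number of Wikipedia articles in the thread
K_ACTIVE = 4           # hypothetical number of sink neurons per source

print(distinct_masks(128, K_ACTIVE))
print(min_sink_neurons(N_SOURCES, K_ACTIVE))
```

With k = 4, a pool of 112 sink neurons already yields more than 6M distinct masks, and 128 neurons yield 10,668,000. A per-document MoE would need an expert for each of the ~6M articles.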

Check out our paper for more results and analysis! Huge thanks to my coauthors @pratyushmaini and @AdtRaghunathan. (7/8) Paper: https://arxiv.org/abs/2606.13873 Code: https://github.com/AR-FORUM/NULLS

Some excellent related work on isolating and controlling information in model parameters:
• Selective Gradient Masking (Shilov et al.) https://alignment.anthropic.com/2025/selective-gradient-masking/ CC @_igorshilov @cloud_kx
• Pre-training Limited Memory Language Models with Internal and External Knowledge (Zhao et al.) https://arxiv.org/abs/2505.15962 CC @linxizhao4 @yoavartzi
• Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge (Pouransari et al.) https://arxiv.org/abs/2510.02375 CC @HPouransari
(8/8)

NULLs lets us keep all ~6M Wikipedia articles individually deletable, despite heavy topical overlap between them. Unlearning one matches gold-standard retraining: it removes article-specific facts while keeping facts mentioned in or inferable from other articles. (3/8)

How does it work? In each MLP layer, NULLs splits neurons into two groups. A shared backbone learns what's common across sources. Sink neurons are sparsely activated: each source lights up its own subset. Unlearning a source = disabling its sinks at inference. One line of code. (2/8)
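To make the split concrete, here is a minimal pure-Python sketch of one MLP layer in this style. Everything in it is an assumption for illustration, not the released code: the sizes, the ReLU nonlinearity, the hash-based `sink_mask` assignment, the hypothetical source ID `"wiki/Harry_Potter"`, and treating an all-on sink gate as the inference default. It only shows the gating pattern. Backbone neurons always pass, a source's sink neurons pass during its training steps, and unlearning zeroes that source's entries in the gate.

```python
import hashlib
import random

# Hypothetical sizes for illustration; the paper's actual widths may differ.
D_MODEL = 8    # width of the layer's input/output
N_SHARED = 16  # always-on backbone neurons
N_SINK = 32    # sparsely activated sink neurons
K_ACTIVE = 4   # sink neurons assigned to each source


def sink_mask(source_id: str) -> list[int]:
    """Return a 0/1 mask choosing K_ACTIVE sink neurons for a source.

    Seeding an RNG with a hash of the source ID is an assumption: it gives the
    same mask every time the source is seen and stores nothing per source.
    """
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    chosen = set(rng.sample(range(N_SINK), K_ACTIVE))
    return [1 if j in chosen else 0 for j in range(N_SINK)]


class SinkMLP:
    """Toy MLP layer whose hidden neurons are split into backbone and sinks."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_hidden = N_SHARED + N_SINK
        self.w_in = [[rng.gauss(0.0, 0.1) for _ in range(D_MODEL)]
                     for _ in range(n_hidden)]
        self.w_out = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                      for _ in range(D_MODEL)]

    def forward(self, x: list[float], sink_gate: list[int]) -> list[float]:
        # ReLU activations for every hidden neuron.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w_in]
        # Backbone neurons always pass; sink neurons pass only where the gate is 1.
        gate = [1] * N_SHARED + sink_gate
        hidden = [h * g for h, g in zip(hidden, gate)]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]


def disable_source(gate: list[int], source_id: str) -> list[int]:
    # Unlearning: zero out the deleted source's sink neurons in the gate.
    return [g * (1 - m) for g, m in zip(gate, sink_mask(source_id))]


layer = SinkMLP()
x = [0.5] * D_MODEL
train_out = layer.forward(x, sink_mask("wiki/Harry_Potter"))  # training-time routing
all_on = [1] * N_SINK  # assumed inference default, for illustration only
forgotten = layer.forward(x, disable_source(all_on, "wiki/Harry_Potter"))
```

Deletion here changes only the gate and never touches the weights. `disable_source` is the single line doing the work, in the spirit of the "one line of code" remark.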

NULLs is robust: in Harry Potter unlearning, NULLs with its sinks off resists adversarial fine-tuning, relearning the deleted content at the same rate as a retrained model that never saw Harry Potter. Standard post-hoc unlearning (NPO) is undone in ~10 steps. (4/8)

The effect is also visible in generations. Activate the Harry Potter sink and the model continues with series entities like Hogwarts, Dudley, and Madame Maxime. Disable it and the output stays fluent but Harry-Potter-free. (5/8)

Why does this work? Shared information is reinforced in the always-on backbone, while information unique to one source faces less interference in its sink neurons and concentrates there. Nothing labels what's source-specific; the model sorts its own knowledge as it trains. (6/8)
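The intuition in (6/8) shows up even in a deliberately tiny toy. This is an illustration under simplifying assumptions, not the paper's analysis. Each source's target is a common value plus a source-specific offset. The "model" predicts `b + sink[s]`, with a shared scalar `b` and one private scalar per source. Training is full-batch gradient descent on squared error. Because `b` receives gradient from every source while `sink[s]` only sees source `s`, the shared value accumulates in `b` and each offset ends up in its own sink.

```python
def train_toy(targets: list[float], lr: float = 0.01, steps: int = 5000):
    """Toy stand-in for backbone vs. sink parameters (one sink per source, an assumption)."""
    n = len(targets)
    b = 0.0           # shared, always-on parameter
    sink = [0.0] * n  # one private parameter per source
    for _ in range(steps):
        errors = [b + sink[s] - targets[s] for s in range(n)]
        # The shared parameter receives gradient from every source...
        b -= lr * sum(errors)
        # ...while each sink receives gradient only from its own source.
        sink = [sink[s] - lr * errors[s] for s in range(n)]
    return b, sink


N = 100
COMMON = 3.0  # information shared by every source
unique = [((s % 5) - 2) * 0.5 for s in range(N)]  # source-specific, zero mean
targets = [COMMON + u for u in unique]

b, sink = train_toy(targets)
print(f"backbone b = {b:.3f} (common value {COMMON})")
print(f"sink[0] = {sink[0]:.3f} (unique value {unique[0]})")
```

With 100 sources, `b` settles at the sum of the targets divided by 101, about 2.97, close to the common value 3.0. Each `sink[s]` lands within about 0.03 of its unique offset. Dropping `sink[s]` then leaves the prediction at `b`: the shared value survives, and only what was unique to source `s` is lost.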

@gaurav_ghosal Why can't you use a simple MoE (with shared experts), route the target documents through a particular set of experts at each layer during training, and remove those experts at inference time?

@gaurav_ghosal You would still have shared experts retained during inference. This would also perform "joint learning", which seems to be your primary selling point. "Shared information is reinforced in the always-on backbone."

@psidharth567 This is a great question! The key is that you don't know what the target (unlearning) documents will be during pre-training, so you can't necessarily route them to a known expert. You would have to have a separate expert per document in your corpus.

@gaurav_ghosal nice!

@gaurav_ghosal Very cool! Nice follow-up to the memorization sinks work :)

@psidharth567 In more coarse-grained settings (where you want to unlearn a corpus or domain), separating by experts has also been promising.