CMU's Aditi Raghunathan introduces NULLs to enable scalable, on-demand unlearning of training data in LLMs

The architecture was tested on a 1-billion-parameter model.

Original post

Gaurav Ghosal@gaurav_ghosal

We are taking a big step towards scaling LLMs that can unlearn on demand. Cleanly deleting data from LLMs has proven impossible: training entangles every source in shared weights. NULLs (Natively Unlearnable LLMs) escapes this, keeping millions of sources individually deletable in a 1B-parameter model trained on web data. (1/8)

9:29 AM · Jun 17, 2026 · 10.9K Views

Sentiment

Users are praising NULLs for enabling scalable, on-demand unlearning in large language models, with one commenter calling it a nice follow-up to the earlier memorization sinks work.

Pos: 100.0% · Neg: 0.0%

2 comments with sentiment.

Posts from X

Most Activity

VIEWS 2.6K · BOOKMARKS 14 · LIKES 16

Aditi Raghunathan@AdtRaghunathan

Perfect unlearning asks for two incompatible things: share knowledge across data, but keep each source separable enough to delete.

NULLs offer a surprisingly simple and elegant way to solve this. Give each source a few neurons that only fire on that source, and the learning dynamics do the rest: source-specific content naturally gets trapped there.

The idea is conceptually simple, but the exciting part is that it scales! It delivers unlearning that robustly matches oracle retraining.

Check out Gaurav’s excellent thread below.

Gaurav Ghosal@gaurav_ghosal

RETWEETS 3

Yoav Artzi@yoavartzi

More sinks! Great line of work

Gaurav Ghosal@gaurav_ghosal

REPLIES 1

Gaurav Ghosal@gaurav_ghosal

@psidharth567 NULLs localizes information to a specific mask over the sink neurons, which allows control, but doesn't require scaling experts per document!
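
The scaling point in this reply can be made concrete with a quick count. This is a rough sketch, not taken from the paper: the helper names `distinct_masks` and `min_sinks_for`, the choice of 4 active sinks per source, and the 128-neuron example are all hypothetical. Only the ~6M document count comes from the thread. It counts how many distinct masks a pool of sink neurons can offer, compared with one expert per document.

```python
import math

def distinct_masks(num_sinks, k):
    """Number of ways to choose k active sink neurons out of num_sinks."""
    return math.comb(num_sinks, k)

def min_sinks_for(num_sources, k):
    """Smallest sink count with at least as many k-subsets as sources."""
    n = k
    while distinct_masks(n, k) < num_sources:
        n += 1
    return n

num_documents = 6_000_000   # "~6M Wikipedia articles" from the thread
k = 4                       # assumed number of active sinks per source

experts_needed = num_documents             # one expert per document
sinks_needed = min_sinks_for(num_documents, k)

print(experts_needed)          # 6000000
print(sinks_needed)            # 112
print(distinct_masks(128, k))  # 10668000
```

This is only a capacity count. Distinct masks can still share individual neurons, so it says nothing about how cleanly sources are isolated; the sizes actually used are in the paper, not in this thread.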

Gaurav Ghosal@gaurav_ghosal

Check out our paper for more results and analysis! Huge thanks to my coauthors @pratyushmaini and @AdtRaghunathan (7/8)
Paper: https://arxiv.org/abs/2606.13873
Code: https://github.com/AR-FORUM/NULLS

Gaurav Ghosal@gaurav_ghosal

Some excellent related work on isolating and controlling information in model parameters:
• Selective Gradient Masking (Shilov et al.) https://alignment.anthropic.com/2025/selective-gradient-masking/ CC @_igorshilov @cloud_kx
• Pre-training Limited Memory Language Models with Internal and External Knowledge (Zhao et al.) https://arxiv.org/abs/2505.15962 CC @linxizhao4 @yoavartzi
• Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge (Pouransari et al.) https://arxiv.org/abs/2510.02375 CC @HPouransari
(8/8)

Gaurav Ghosal@gaurav_ghosal

NULLs lets us keep all ~6M Wikipedia articles individually deletable, despite heavy topical overlap between them. Unlearning one matches gold-standard retraining: it removes article-specific facts while keeping facts mentioned in or inferable from other articles. (3/8)

Gaurav Ghosal@gaurav_ghosal

How does it work? In each MLP layer, NULLs splits neurons into two groups. A shared backbone learns what's common across sources. Sink neurons are sparsely activated: each source lights up a subset. Unlearning a source = disabling its sinks at inference. One line of code. (2/8)
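
A minimal sketch of the layer described in this post, in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class `ToyNullsMLP`, the helper `sink_mask`, the hash-based assignment of sinks to sources, the ReLU activation, and all sizes are hypothetical. The thread only says that each source lights up a subset of sink neurons and that unlearning disables them at inference.

```python
import hashlib
import random

def sink_mask(source_id, num_sinks, k):
    """Hypothetical assignment: pick k sink indices from a hash of the source ID."""
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return set(random.Random(seed).sample(range(num_sinks), k))

def relu(value):
    return max(0.0, value)

class ToyNullsMLP:
    """One MLP layer whose hidden units are split into backbone and sink neurons."""

    def __init__(self, d_model, num_backbone, num_sinks, seed=0):
        rng = random.Random(seed)
        hidden = num_backbone + num_sinks
        self.num_backbone = num_backbone
        self.w_in = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(hidden)]
        self.w_out = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(d_model)]

    def forward(self, x, active_sinks):
        hidden = []
        for j, row in enumerate(self.w_in):
            h = relu(sum(w * xi for w, xi in zip(row, x)))
            # Backbone neurons (j < num_backbone) are always on.
            # A sink neuron contributes only if its index is in active_sinks.
            if j >= self.num_backbone and (j - self.num_backbone) not in active_sinks:
                h = 0.0
            hidden.append(h)
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]

layer = ToyNullsMLP(d_model=4, num_backbone=8, num_sinks=16)
x = [1.0, -0.5, 0.25, 2.0]

mask = sink_mask("harry_potter", num_sinks=16, k=3)
with_source = layer.forward(x, active_sinks=mask)   # source's sinks enabled
unlearned = layer.forward(x, active_sinks=set())    # the "one line": sinks off
```

With the sinks off, the output depends only on the backbone weights, so whatever the source's sink neurons stored cannot reach the next layer. How sinks are set during ordinary inference is not specified in the thread.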

Gaurav Ghosal@gaurav_ghosal

NULLs is robust. On Harry Potter unlearning, NULLs with its sinks off resists adversarial fine-tuning: it relearns the deleted content at the same rate as the retrained model, which never saw Harry Potter. Standard post-hoc unlearning (NPO) is undone in ~10 steps. (4/8)

Gaurav Ghosal@gaurav_ghosal

The effect is also visible in generations. Activate the Harry Potter sink and the model continues with series entities like Hogwarts, Dudley, and Madame Maxime. Disable it and the output stays fluent but free of Harry Potter. (5/8)

Gaurav Ghosal@gaurav_ghosal

Why does this work? Shared information is reinforced in the always-on backbone, while information unique to one source faces less interference in its sink neurons and concentrates there. Nothing labels what's source-specific; the model sorts its own knowledge as it trains. (6/8)
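
A toy linear analogy can show the flavor of this argument. It is not the paper's model or training setup: the function `train_toy`, the scalar targets, and every number below are assumptions made for illustration. Each source has a target equal to a common value plus a source-specific offset, and the prediction is one always-on shared weight plus one gated weight per source. The shared weight receives gradient from every source, while each gated weight receives gradient from its own source only.

```python
def train_toy(num_sources=99, common=10.0, lr=0.01, steps=3000):
    """Full-batch gradient descent on pred_s = shared + sinks[s], starting from zero."""
    mean_index = (num_sources - 1) / 2
    offsets = [s - mean_index for s in range(num_sources)]  # sums to zero
    targets = [common + d for d in offsets]

    shared = 0.0
    sinks = [0.0] * num_sources
    # lr must stay below 2 / (num_sources + 1) for this update to be stable.
    for _ in range(steps):
        errors = [shared + sinks[s] - targets[s] for s in range(num_sources)]
        shared -= lr * sum(errors)      # gradient from every source
        for s in range(num_sources):
            sinks[s] -= lr * errors[s]  # gradient from source s only
    return shared, sinks, offsets

shared, sinks, offsets = train_toy()
print(round(shared, 3))                  # approximately 9.9
print(round(sinks[0] - offsets[0], 3))   # approximately 0.1
```

In this toy, the shared weight ends up holding about 99% of the common value, and each gated weight holds its source's offset plus a small remainder. Setting a gated weight to zero removes that source's offset while leaving the common part almost intact. Nothing in the setup marks which part is source-specific.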

Sidharth Pulipaka@psidharth567

@gaurav_ghosal Why can't you use a simple MoE (with shared experts), route the target documents through a particular set of experts at each layer during training, and remove those experts at inference time?

Sidharth Pulipaka@psidharth567

@gaurav_ghosal You would still have shared experts retained during inference. This would also perform "joint learning", which seems to be your primary selling point. "Shared information is reinforced in the always-on backbone."

Gaurav Ghosal@gaurav_ghosal

@psidharth567 This is a great question! The key is that you don't know what the target (unlearning) documents will be during pre-training, so you can't necessarily route them to a known expert. You would need a separate expert per document in your corpus.

Veeraraju Elluru@VeerarajuE

@gaurav_ghosal nice!

Aflah 🍉🕊️@Aflah02101

@gaurav_ghosal Very cool! Nice follow-up to the memorization sinks work :)

Gaurav Ghosal@gaurav_ghosal

@psidharth567 In more coarse-grained settings (where you want to unlearn a corpus or domain), separating by experts has also been promising.
