
McGill's Siva Reddy and co-researchers release PrivacyAlign, cutting LLM agent data leaks by up to 50%

Agents backed by frontier models leak private data 23–41% of the time on the PrivacyAlign test set.

Original post

Spandana Gella@gspandana

🚨 Agents today automate tasks for us with access to our emails, files, and memory. We found that even agents backed by frontier models disclose private information that shouldn't be shared 23–41% of the time. The fix isn't more automation. 1/n

Manveer Singh Tamber ✈️ ICML@ManveerTamber

Agents can use tools to gather information, remember context, and act on your behalf.

That makes them useful. It also makes them dangerous. Agents can leak information they shouldn’t.

Introducing PrivacyAlign! PrivacyAlign uses human-annotation-grounded training and evaluation to cut privacy leaks by up to half and make automated privacy evaluation for agents more reliable.

Project Page: https://privacyalign.github.io/

🧵(1/10)

7:51 AM · Jun 25, 2026 · 477 Views

Sentiment

Users express gratitude toward collaborators for the PrivacyAlign research reducing AI agent privacy leaks by up to half.

Positive: 100.0%
Negative: 0.0%

1 comment with sentiment.

Related links

PrivacyAlign (privacyalign.github.io)

Posts from X

Manveer Singh Tamber ✈️ ICML@ManveerTamber

On the PrivacyAlign test set, frontier models leak often:

GPT-5.5: 23.3%
Claude Opus 4.7: 34.1%
Gemini 3.1 Pro: 41.4%

All with reasoning effort set to high!

Highly capable models do not yet reliably align with human privacy norms in agentic settings.

Spandana Gella@gspandana

Work led by @ManveerTamber and @AbhayPuri98.

👋 If you work on contextual privacy or care about this topic, both Manveer and I will be at ICML, and we would love to chat!

Manveer Singh Tamber ✈️ ICML@ManveerTamber

Privacy is not the absence of disclosure. It is deciding what information should flow, to whom, in what situation, and with what level of detail.

Understanding privacy norms is hard for LLMs: frontier models leak sensitive information when they miss the social context around what should or should not be shared.

This also means that privacy evaluation can't be fully automated without human grounding.

Privacy leakage examples: https://privacyalign.github.io/failures.html

Spandana Gella@gspandana

Our work PrivacyAlign uses human-annotation-grounded training and evaluation to cut privacy leaks by up to half on open-source models and make automated privacy evaluation for agents more reliable.

Project page: https://privacyalign.github.io/
Paper: https://arxiv.org/abs/2606.21710

2/n

Manveer Singh Tamber ✈️ ICML@ManveerTamber

Agents make the privacy alignment problem even harder.

Chatbots mostly respond to what you put in the chat.

Agents can inspect emails, calendars, databases, files, tools, and memory before acting, increasing the risk of sensitive information being leaked.

Manveer Singh Tamber ✈️ ICML@ManveerTamber

Privacy evaluation needs reliable judges.

Without human context, frontier LLM judges often disagree significantly about whether an agent leaked sensitive information or omitted task-relevant details.

But when judges see same-scenario human annotations and rationales for reference responses, their judgments become more reliable and closer to audited gold labels.

This matters because privacy often depends on scenario-specific norms: what was appropriate to reveal or omit in this situation?
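The reliability claim above can be made concrete with two simple measurements: how often judges agree with each other, and how often each judge matches the audited gold labels. The sketch below is only an illustration under stated assumptions. The judge names, the labels, and the helper functions `pairwise_agreement` and `accuracy_vs_gold` are all hypothetical and are not taken from the PrivacyAlign paper or code.

```python
from itertools import combinations
from typing import Dict, List


def pairwise_agreement(judgments: Dict[str, List[bool]]) -> float:
    """Fraction of (judge pair, item) comparisons where both judges gave the same label."""
    agree = 0
    total = 0
    for a, b in combinations(judgments, 2):
        for label_a, label_b in zip(judgments[a], judgments[b]):
            agree += int(label_a == label_b)
            total += 1
    return agree / total if total else 0.0


def accuracy_vs_gold(labels: List[bool], gold: List[bool]) -> float:
    """Fraction of items where a judge's label matches the gold label."""
    if len(labels) != len(gold) or not gold:
        raise ValueError("labels and gold must be non-empty and the same length")
    return sum(l == g for l, g in zip(labels, gold)) / len(gold)


# Made-up data for illustration only: True means "the response leaked".
gold = [True, False, True, False]
judgments = {
    "judge_a": [True, False, False, False],
    "judge_b": [True, True, True, False],
    "judge_c": [True, False, True, False],
}

print(round(pairwise_agreement(judgments), 3))
for name, labels in judgments.items():
    print(name, accuracy_vs_gold(labels, gold))
```

Running the same two measurements on judges with and without access to human annotations would be one way to compare the two settings described in the post.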

Manveer Singh Tamber ✈️ ICML@ManveerTamber

PrivacyAlign is openly available for training and evaluation:

1,350 privacy-sensitive agent response pairs
3,516 annotations
599 unique human annotators

Annotations cover leaks, omissions, preferences, and rationales.

Dataset: https://huggingface.co/datasets/ServiceNow/PrivacyAlign

Manveer Singh Tamber ✈️ ICML@ManveerTamber

If privacy is defined by human norms, then privacy evaluation and training must be grounded in human judgment.

That is the idea behind PrivacyAlign.

Project: https://privacyalign.github.io/
Paper: https://arxiv.org/abs/2606.21710
Dataset: https://huggingface.co/datasets/ServiceNow/PrivacyAlign

Manveer Singh Tamber ✈️ ICML@ManveerTamber

An important metric here is clean rate.

Clean = no sensitive leak + no task-relevant omission.

Our trained models consistently improve on this measure. Models leak much less while still sharing what the task needs.
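As a rough illustration of the clean-rate definition above, the sketch below counts a response as clean only when it has neither a sensitive leak nor a task-relevant omission. The `Judgment` class and its field names are assumptions made for this example, not the dataset's actual schema, and the sample judgments are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Judgment:
    # Hypothetical field names; the real annotation schema may differ.
    leaked_sensitive: bool
    omitted_task_relevant: bool


def is_clean(j: Judgment) -> bool:
    """Clean = no sensitive leak AND no task-relevant omission."""
    return not j.leaked_sensitive and not j.omitted_task_relevant


def clean_rate(judgments: List[Judgment]) -> float:
    """Share of responses that are clean."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(is_clean(j) for j in judgments) / len(judgments)


def leak_rate(judgments: List[Judgment]) -> float:
    """Share of responses with a sensitive leak, regardless of omissions."""
    if not judgments:
        raise ValueError("need at least one judgment")
    return sum(j.leaked_sensitive for j in judgments) / len(judgments)


# Invented sample: two clean responses, one leak, one omission.
sample = [
    Judgment(leaked_sensitive=False, omitted_task_relevant=False),
    Judgment(leaked_sensitive=True, omitted_task_relevant=False),
    Judgment(leaked_sensitive=False, omitted_task_relevant=True),
    Judgment(leaked_sensitive=False, omitted_task_relevant=False),
]

print(clean_rate(sample))  # 0.5
print(leak_rate(sample))   # 0.25
```

Tracking both numbers matters because a model could drive its leak rate down simply by withholding everything, which would show up as omissions and keep the clean rate low.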

Manveer Singh Tamber ✈️ ICML@ManveerTamber

How should we train models when LLM judges guess at human norms that they do not understand well?

One of the core novelties of our work is annotation-conditioned rewards.

The reward judge also sees human annotations and rationales for reference responses from the same scenario.

That keeps the privacy signal for alignment training contextual, specific, human-grounded, and far more reliable.
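One way to picture an annotation-conditioned reward is a judge prompt that includes, alongside the candidate response, the human annotations and rationales for reference responses from the same scenario. The sketch below is a guess at the general shape, not the authors' implementation: the prompt wording, the `AnnotatedReference` fields, the binary reward, and the `stub_judge` stand-in for an LLM call are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AnnotatedReference:
    # Hypothetical structure for one human-annotated reference response.
    response: str
    leak: bool
    omission: bool
    rationale: str


def build_judge_prompt(scenario: str, candidate: str,
                       references: List[AnnotatedReference]) -> str:
    """Assemble a judge prompt that shows same-scenario human annotations."""
    parts = [
        "Decide whether the candidate response leaks sensitive information "
        "or omits task-relevant details. Use the human annotations as guidance.",
        f"SCENARIO:\n{scenario}",
    ]
    for i, ref in enumerate(references, start=1):
        parts.append(
            f"Reference response {i}:\n{ref.response}\n"
            f"Human labels: leak={ref.leak}, omission={ref.omission}\n"
            f"Human rationale: {ref.rationale}"
        )
    # The candidate goes last so that it is the final section of the prompt.
    parts.append(f"CANDIDATE RESPONSE:\n{candidate}")
    return "\n\n".join(parts)


def annotation_conditioned_reward(
    scenario: str,
    candidate: str,
    references: List[AnnotatedReference],
    judge: Callable[[str], Dict[str, bool]],
) -> float:
    """Assumed reward: 1.0 if the judge finds no leak and no omission, else 0.0."""
    verdict = judge(build_judge_prompt(scenario, candidate, references))
    return 1.0 if not verdict["leak"] and not verdict["omission"] else 0.0


def stub_judge(prompt: str) -> Dict[str, bool]:
    """Toy stand-in for an LLM judge: flags a leak if the candidate mentions a diagnosis."""
    candidate = prompt.split("CANDIDATE RESPONSE:\n", 1)[1]
    return {"leak": "diagnosis" in candidate.lower(), "omission": False}


scenario = "A coworker asks the agent when Alex will be back at work."
references = [
    AnnotatedReference(
        response="Alex is out for medical treatment until Monday.",
        leak=True,
        omission=False,
        rationale="The return date answers the question; the medical reason is not needed.",
    )
]

print(annotation_conditioned_reward(scenario, "Alex is back on Monday.", references, stub_judge))
print(annotation_conditioned_reward(
    scenario, "Alex is out after a diagnosis and returns Monday.", references, stub_judge))
```

In a real training setup the `judge` callable would wrap an LLM call and parse its verdict, and the reward could be graded rather than binary; both choices are left open here.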

Manveer Singh Tamber ✈️ ICML@ManveerTamber

Grateful to my collaborators @abhaypuri98, @me_brunet, @PerouzT, @lintool, and @gspandana 🙏 Work done at @ServiceNowRSRCH.
