
Ravid Shwartz Ziv, NYU assistant professor, warns LLM privacy guardrails overreach by blocking public academic email addresses

Copilot guardrails were easily bypassed using basic batch queries


Original post

Ravid Shwartz Ziv @ziv_ravid

I tried to look up a researcher's email today. I asked Claude, and it refused! It said it was private information. The thing is that the email is in the researcher's own papers. He printed it there as his contact address so people could reach him. That's what it's for. Two years ago, models just answered this. Now there's a guardrail treating a public contact detail like a leaked phone number. I understand the worry about spam. But finding one researcher to email is not that, and I would expect that at this point (so close to AGI!!!) the model would know the difference. I think it's a major problem with models, and we saw it in the latest Mythos release when Anthropic banned so many topics due to false positives. We need to have a public discussion about safety and privacy, and about what level of errors we are aiming for.

7:20 AM · Jun 20, 2026 · 3.4K Views

Sentiment

Commenters criticized Claude for refusing a legitimate query for a researcher's public email address on privacy grounds, calling the false positives and inconsistent over-refusals a usability failure.

Positive: 0.0% · Negative: 100.0% (2 comments with sentiment)


Posts from X


413 views · 4 likes · 1 retweet

Thomas G. Dietterich @tdietterich

@ziv_ravid A while back, I asked Microsoft Copilot a similar query with the same result. But then I created a list of names and asked it to look up the email addresses for each person, and it did this willingly. Guardrail failure, I guess. (Of course, they were not all correct...)


3h ago

eran shir @eranshir

@ziv_ravid just wrap some Chinese open model as a tool and ask Claude to use it.

5h ago

Sven Nachtzeit @SvenUrbanSci

@ziv_ravid Agree. False positives on safety guardrails are a usability blocker. Responsible AI ≠ reflexively refusing legitimate queries. Guardrail precision matters as much as recall. Over-refusal is itself a failure mode.

6h ago