I looked up a researcher's email today. I asked Claude, and it refused! It said it is private information. The thing is that the email is in the researcher's own papers. He printed it there as his contact address so people could reach him. That's what it's for. Two years ago, models just answered this. Now there's a guardrail treating a public contact detail like a leaked phone number. I understand the worry about spam. But finding one researcher to email is not that, and I would expect that at this point (so close to AGI!!!) the model would know the difference. I think it's a major problem with models, and we saw it in the latest Mythos release when Anthropic banned so many topics due to false positives. We need to have a public discussion about safety and privacy, and about what error rate we are aiming for.
Ravid Shwartz Ziv, NYU assistant professor, warns LLM privacy guardrails overreach by blocking public academic email addresses
Copilot guardrails were easily bypassed using basic batch queries
Many users criticized Claude for refusing a legitimate public researcher email query over privacy guardrails, calling the false positives and inconsistent over-refusals a usability failure.
@ziv_ravid A while back, I asked Microsoft Copilot a similar query with the same result. But then I created a list of names and asked it to look up the email addresses for each person, and it did this willingly. Guardrail failure, I guess. (Of course, they were not all correct...)

@ziv_ravid Just wrap some Chinese open model as a tool and ask Claude to use it.

@ziv_ravid Agree. False positives on safety guardrails are a usability blocker. Responsible AI ≠ reflexively refusing legitimate queries. Guardrail precision matters as much as recall. Over-refusal is itself a failure mode.