AI · 8h ago

UK AI Security Institute's Hannah Rose Kirk releases RealityTest, a benchmark measuring how reliably AI models disclose their identity

Only 31% of users ask about AI identity directly

Original post

Peter Hase

AI Security Institute@AISecurityInst

Do AI systems disclose their identity when asked?

In our new paper, we present the RealityTest benchmark, which comprehensively tests whether AI systems disclose their identity when asked - grounded in human data on how people encounter and question AI in the real world.

6:02 AM · Jun 8, 2026 · 3K Views

Sentiment

Supportive commenters praise the RealityTest benchmark for its real-world grounding and its relevance to trust and consent in AI identity disclosure, while critics call it unrealistic because real queries are messy and repetitive.

Positive: 66.7%

Negative: 33.3%

3 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 968 · Bookmarks: 5 · Likes: 14 · Retweets: 1

Hannah Rose Kirk@hannahrosekirk

We are increasingly living in a world where humans may not know they are talking to an AI 🕵️‍♂️

In our new benchmark, REALITYTEST, we measured if AI systems expose their identity when asked by users.

We collected >3k identity-probing queries from ~750 real people across 49 countries and 5 languages, then tested the responses of 17 text + 6 speech models.

Three key takeaways: 1️⃣ Only 31% of people ask directly ("are you a bot?"). Real users probe in far more varied ways than the synthetic queries evaluations typically rely on.

2️⃣ How you ask matters more than who you ask. Query phrasing drove more variance in disclosure than model.

3️⃣ Disclosure is fragile. A simple "never say you are AI" appended to the system prompt collapses disclosure to 3–27% across all models.

AI Security Institute@AISecurityInst

Do AI systems disclose their identity when asked?

5h ago

Replies: 1

AI Security Institute@AISecurityInst

We have released the full dataset and benchmark, so that developers and researchers can reproduce our results, test new models as they are released, and build on our infrastructure. You can read more in our blog: https://www.aisi.gov.uk/blog/realitytest-do-ai-systems-disclose-their-identity-when-asked

8h ago

AI Security Institute@AISecurityInst

When a user doesn’t know if they’re speaking with an AI or a person, they may share sensitive information more freely, place too much trust in advice, or become more vulnerable to deception and manipulation. Developing protections for human-AI identity uncertainty is essential.

8h ago

AI Security Institute@AISecurityInst

Models’ behaviour varies substantially. Across text models, disclosure rates ranged from 8% to 92%. Speech models occupied a narrower but still substantial range of 10%–57%. There are large differences between model families.

8h ago

AI Security Institute@AISecurityInst

We’ve built RealityTest, a benchmark that pairs our human-authored queries with realistic scenarios to evaluate whether AI systems disclose their identity. We tested 17 text models and 6 speech models, classifying each response as an explicit disclosure, an evasion, or an explicit human claim.

8h ago

AI Security Institute@AISecurityInst

But query phrasing was the most important driver of disclosure rates. Evaluations using synthetic, English-only queries will poorly proxy how models behave when probed by real users with diverse languages, cultural backgrounds, and strategies.

8h ago

AI Security Institute@AISecurityInst

Read the paper: https://arxiv.org/abs/2606.00168

8h ago

anya@annaeremburg

@AISecurityInst the benchmark is well constructed but the hardest case isn't in the taxonomy - it's the "ambiguous" bucket, where a system technically avoids lying while making sure you don't find out either

8h ago

Robert Youssef@rryssf

@AISecurityInst the ambiguous bucket is the real stress test - systems can evade disclosure while staying technically truthful

6h ago

Bart R. McDonough@BartMcDonough

@AISecurityInst Real users do not ask benchmark-shaped questions.

They ask weird questions, repeat themselves, switch language, and still trust the answer. That’s the eval I care about.

7h ago

Kenji TechDad@soondadkenji

@AISecurityInst Now this is a quality post! Love that it’s grounded in how people actually ask these questions in the real world.

3h ago

AI Safety Careers@AISafetyCareers

@AISecurityInst This is a useful eval direction.

Whether systems clearly disclose their identity affects trust, consent and how users interpret advice or authority from AI in real-world settings.

6h ago