/AI8h ago

UK AI Security Institute's Hannah Rose Kirk releases RealityTest, a benchmark measuring how reliably AI models disclose their identity

Only 31% of users ask about AI identity directly

8489163.7K
Original postPeter Hase#906
AI Security Institute@AISecurityInst

Do AI systems disclose their identity when asked?

In our new paper, we present the RealityTest benchmark, which comprehensively tests whether AI systems disclose their identity when asked - grounded in human data on how people encounter and question AI in the real world.

6:02 AM · Jun 8, 2026 · 3K Views
Sentiment

Positive users praise the RealityTest benchmark for its real-world grounding and relevance to trust and consent in AI identity disclosure, while negative users call it unrealistic because actual queries are messy and repetitive.

Pos
66.7%
Neg
33.3%
3 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Most Activity
VIEWS968BOOKMARKS5LIKES14RETWEETS1
Hannah Rose Kirk@hannahrosekirk

We are increasingly living in a world where humans may not know they are talking to an AI 🕵️‍♂️

In our new benchmark, REALITYTEST, we measured if AI systems expose their identity when asked by users.

We collected >3k identity-probing queries from ~750 real people across 49 countries and 5 languages, then tested the responses of 17 text + 6 speech models.

Three key takeaways: 1️⃣ Only 31% of people ask directly ("are you a bot?"). Real users probe in far more varied ways than the synthetic queries evaluations typically rely on.

2️⃣ How you ask matters more than who you ask. Query phrasing drove more variance in disclosure than model.

3️⃣ Disclosure is fragile. A simple "never say you are AI" appended to the system prompt collapses disclosure to 3–27% across all models.

AI Security Institute@AISecurityInst

Do AI systems disclose their identity when asked?

In our new paper, we present the RealityTest benchmark, which comprehensively tests whether AI systems disclose their identity when asked - grounded in human data on how people encounter and question AI in the real world.

5hViews 968Likes 14Bookmarks 5
REPLIES1
AI Security Institute@AISecurityInst

We have released the full dataset and benchmark, so that developers and researchers can reproduce our results, test new models as they are released, and build on our infrastructure. You can read more in our blog: https://www.aisi.gov.uk/blog/realitytest-do-ai-systems-disclose-their-identity-when-asked

8hViews 508Likes 3
AI Security Institute@AISecurityInst

When a user doesn’t know if they’re speaking with an AI or a person, they may share sensitive information more freely, place too much trust in advice, or become more vulnerable to deception and manipulation. Developing protections for human-AI identity uncertainty is essential.

8hViews 479Likes 6
AI Security Institute@AISecurityInst

Models’ behaviour varies substantially. Across text models, disclosure rates ranged from 8% to 92%. Speech models occupied a narrower but still substantial range of 10%–57%. There are large differences between model families.

8hViews 103Likes 3
AI Security Institute@AISecurityInst

We’ve built RealityTest, a benchmark that pairs our human-authored queries with realistic scenarios to evaluate whether AI systems disclose their identity. We tested 17 text models and 6 speech models, classifying each response as an explicit disclosure, an evasion, or an explicit human claim.

8hViews 307Likes 5
AI Security Institute@AISecurityInst

But query phrasing was the most important driver of disclosure rates. Evaluations using synthetic, English-only queries will poorly proxy how models behave when probed by real users with diverse languages, cultural backgrounds, and strategies.

8hViews 239Likes 2
anya@annaeremburg

@AISecurityInst the benchmark is well constructed but the hardest case isn't in the taxonomy - it's the "ambiguous" bucket, where a system technically avoids lying while making sure you don't find out either

8hViews 49Likes 1

@AISecurityInst the ambiguous bucket is the real stress test-systems can evade disclosure while staying technically truthful

6hViews 115
Bart R. McDonough@BartMcDonough

@AISecurityInst Real users do not ask benchmark-shaped questions.

They ask weird questions, repeat themselves, switch language, and still trust the answer. That’s the eval I care about.

7hViews 28
Kenji TechDad@soondadkenji

@AISecurityInst Now this is a quality post! Love that it’s grounded in how people actually ask these questions in the real world.

3hViews 6
AI Safety Careers@AISafetyCareers

@AISecurityInst This is a useful eval direction.

Whether systems clearly disclose their identity affects trust, consent and how users interpret advice or authority from AI in real-world settings.

6hViews 4