
Simple AI Identifiers Require Owner Notifications to Trigger Safeguards


Original post

Markus Anderljung @Manderljung

The thought is:
- Owners of sensitive software systems add identifiers of their systems to a registry.
- Frontier companies have a classifier look for the identifier and, if it is found, trigger safeguards (e.g. downgrade the model, notify the owner of the environment), unless the account is whitelisted.

10:50 AM · May 22, 2026
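A minimal sketch of the registry-plus-classifier flow above, assuming an in-memory registry that maps each identifier string to an owner contact, a plain substring scan standing in for the classifier, and a set of whitelisted account IDs. All names here (`Registry`, `check_context`, the example identifier, email, and account IDs) are hypothetical illustrations, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical registry: identifier string -> owner contact."""
    entries: dict[str, str] = field(default_factory=dict)

    def register(self, identifier: str, owner_contact: str) -> None:
        self.entries[identifier] = owner_contact


@dataclass
class Decision:
    triggered: bool
    matched_owners: list[str]
    actions: list[str]


def check_context(context: str, account_id: str, registry: Registry,
                  whitelist: set[str]) -> Decision:
    """Stand-in 'classifier': look for any registered identifier in the model's context.

    A real classifier would need to be far more robust than substring matching;
    this only illustrates the decision flow (detect -> check whitelist -> act).
    """
    owners = [owner for ident, owner in registry.entries.items() if ident in context]
    if not owners or account_id in whitelist:
        return Decision(triggered=False, matched_owners=owners, actions=[])
    # Example safeguards named in the post: downgrade the model, notify the owner.
    actions = ["downgrade_model"] + [f"notify:{owner}" for owner in owners]
    return Decision(triggered=True, matched_owners=owners, actions=actions)


registry = Registry()
registry.register("ACME-SCADA-7f3e", "security@acme.example")  # hypothetical entry
whitelist = {"acct-acme-internal"}                              # hypothetical account

ctx = "cat /opt/app/VERSION\nACME-SCADA-7f3e build 2.4.1"
print(check_context(ctx, "acct-random", registry, whitelist).actions)
# ['downgrade_model', 'notify:security@acme.example']
print(check_context(ctx, "acct-acme-internal", registry, whitelist).triggered)
# False
```

The whitelist check comes after detection, so a whitelisted owner working in their own environment is never affected, while any other account that touches the registered identifier triggers the safeguards.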


Markus Anderljung @Manderljung

Here's an idea for a new type of AI cyber safeguard: environment-based safeguards.

Basically: identify whether the model is operating in a sensitive software environment, and implement safeguards unless the connected account is whitelisted.

5:50 PM · May 22, 2026 · 376 Views

Markus Anderljung @Manderljung

Instead, if you only use these very simple identifiers, you'd likely need to stick to safeguards such as notifying the owner of the software environment.


5:50 PM · May 22, 2026 · 14 Views
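The trade-off in this tweet can be sketched as a policy that, when the identifier is simple enough to strip, only allows safeguards the user cannot observe. The `IdentifierKind` categories, the action names, and the split between "observable" and "covert" actions are assumptions for illustration; the "robust" category in particular is a hypothetical stand-in for whatever harder-to-remove signal the simple identifiers are being contrasted with.

```python
from enum import Enum


class IdentifierKind(Enum):
    # Existing software signatures or added canary strings: cheap to deploy, but
    # an adversary who learns they trigger safeguards can simply remove them.
    SIMPLE = "simple"
    # Hypothetical harder-to-strip signal (assumption, not from the thread).
    ROBUST = "robust"


# Safeguards the user can notice from the model's behaviour (assumed split).
OBSERVABLE = {"refuse", "downgrade_model"}
# Safeguards that happen out of band and are invisible to the user.
COVERT = {"notify_owner"}


def allowed_safeguards(kind: IdentifierKind) -> set[str]:
    """Return which safeguards may fire for a given identifier kind.

    Observable safeguards tip off the adversary that an identifier was detected,
    so for simple identifiers only covert safeguards are allowed.
    """
    if kind is IdentifierKind.SIMPLE:
        return set(COVERT)
    return OBSERVABLE | COVERT


print(sorted(allowed_safeguards(IdentifierKind.SIMPLE)))  # ['notify_owner']
print(sorted(allowed_safeguards(IdentifierKind.ROBUST)))
# ['downgrade_model', 'notify_owner', 'refuse']
```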

Markus Anderljung @Manderljung

The simplest implementation would be to use signatures already present in the software, or add simple canary strings.

However, this wouldn't work if it's obvious that safeguards were triggered (e.g. the model refuses), since adversaries could then identify and remove the identifier.


5:50 PM · May 22, 2026 · 32 Views
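To make the canary-string idea and its weakness concrete, here is a minimal sketch assuming the canary is a random token planted in a config file and detection is an exact-match scan. The token format, the file contents, and the helper names `make_canary` and `contains_canary` are hypothetical. It shows why an overt response is a problem: once an adversary notices safeguards firing, deleting the marker line is enough to evade this kind of check.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Generate a random, unguessable canary string (hypothetical format)."""
    return f"{prefix}-{secrets.token_hex(16)}"


def contains_canary(text: str, canaries: set[str]) -> bool:
    """Exact-match detection: True if any registered canary appears in text."""
    return any(c in text for c in canaries)


canary = make_canary()
registered = {canary}

# The owner plants the canary where the model is likely to read it,
# e.g. a comment in a config file (hypothetical placement).
config = f"# {canary}\nlisten_port = 8443\n"
print(contains_canary(config, registered))  # True

# An adversary who has seen safeguards trigger can strip the marker line,
# after which exact-match detection no longer fires.
stripped = "\n".join(line for line in config.splitlines() if canary not in line)
print(contains_canary(stripped, registered))  # False
```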

Markus Anderljung @Manderljung

More in the post: https://markusanderljung.substack.com/p/environment-based-safeguards-for

5:50 PM · May 22, 2026 · 36 Views

Markus Anderljung @Manderljung

Would love cybersecurity researchers to look into it. I might be barking up the wrong tree!

5:50 PM · May 22, 2026 · 15 Views

Markus Anderljung @Manderljung

Thanks for input from folks at GovAI, including @kamilelukosiute @mavroudisv


5:50 PM · May 22, 2026 · 40 Views
