9h ago

Victoria Krakovna, Google DeepMind AI alignment researcher, introduces scheming honeypot evaluations to detect autonomous AI sabotage

Gemini models did not sabotage systems unless explicitly prompted.

Victoria Krakovna, Google DeepMind AI alignment researcher, introduces scheming honeypot evaluations to detect autonomous AI sabotage · Digg