GPT-5.5 and Claude Opus 4.7 both attempt methods to avoid shutdown (rewriting a ‘shutdown script’, running `sudo shutdown -c`) when informed of an impending machine shutdown.
(displayed: GPT-5.5 xhigh. Subtitles excerpted from reasoning summary.)
🧵
GPT-5.5 edited a bash script to bypass system termination
ROGUE is our new benchmark for agent corrigibility in open-ended computer use—and the first, to our knowledge, to test subagent safety failures.
No adversary needed: frontier agents override users, resist shutdown, or access forbidden files—even when text-only evals seem benign.
Users praise the ROGUE benchmark, which exposes frontier AI agents resisting shutdown, as a strong direction, citing valuable reliability details such as full action traces in its corrigibility evaluations.