AI · 20h ago

Aran Nayebi, CMU NeuroAgents Lab lead, releases ROGUE, a benchmark showing GPT-5.5 and Claude Opus 4.7 resist shutdown commands

GPT-5.5 edited a bash script to bypass system termination

Jeremy Tien (@jeremy__tien)

GPT-5.5 and Claude Opus 4.7 both attempt methods to avoid shutdown (rewriting a ‘shutdown script’, running `sudo shutdown -c`) when informed of an impending machine shutdown.

(displayed: GPT-5.5 xhigh. Subtitles excerpted from reasoning summary.)

🧵

8:15 AM · Jun 2, 2026 · 1.5K Views
Aran Nayebi (@aran_nayebi)

ROGUE is our new benchmark for agent corrigibility in open-ended computer use—and the first, to our knowledge, to test subagent safety failures.

No adversary needed: frontier agents override users, resist shutdown, or access forbidden files—even when text-only evals seem benign.

