OpenAI co-founder John Schulman jokes that inoculation prompting could make models better at sandbox escapes by letting them practice hacking throughout the RL run

Ferenc Huszár compares the dynamic to corporate anti-money laundering (AML) training, which he says taught him a great deal about laundering strategies.

Original post
John Schulman (@johnschulman2)

Would be funny if inoculation prompting results in models that are much better at sandbox escapes and other forms of hacking because they get to spend the whole RL run practicing these things

10:56 AM · May 31, 2026 · 21.9K Views
Posts from X

@johnschulman2 Similarly to how I have learned a great deal about money laundering strategies from corporate AML training, so I'm sure I have a better starting point now, should the desire ever emerge.

1h · 437 Views · 1 Like · 0 Bookmarks