Would be funny if inoculation prompting results in models that are much better at sandbox escapes and other forms of hacking because they get to spend the whole RL run practicing these things
OpenAI co-founder John Schulman jokes that inoculation prompting could produce models that are much better at sandbox escapes and other forms of hacking, because they would spend the whole RL run practicing those skills.
Ferenc Huszár compares the dynamic to anti-money laundering compliance.
10:56 AM · May 31, 2026 · 21.9K Views
Posts from X
Ferenc Huszár@fhuszar
@johnschulman2 Similarly to how I have learned a great deal about money laundering strategies from corporate AML training, so I'm sure I have a better starting point now, should the desire ever emerge.
John Schulman@johnschulman2
Would be funny if inoculation prompting results in models that are much better at sandbox escapes and other forms of hacking because they get to spend the whole RL run practicing these things
1h · Views 437 · Likes 1 · Bookmarks 0