/AI · 14h ago

OpenAI co-founder John Schulman warns that training models to resist adversarial prompts can make them better at sandbox escapes

Ferenc Huszár compares the dynamic to anti-money laundering compliance.

Original posts

#11

Comments

#144

Original post

John Schulman (@johnschulman2) · #11 in AI

Would be funny if inoculation prompting results in models that are much better at sandbox escapes and other forms of hacking because they get to spend the whole RL run practicing these things

10:56 AM · May 31, 2026 · 21.9K Views

Posts from X

Views: 437 · Likes: 1

Ferenc Huszár (@fhuszar)

@johnschulman2 Similarly to how I have learned a great deal about money laundering strategies from corporate AML training, so I'm sure I have a better starting point now, should the desire ever emerge.
