14h ago

OpenAI co-founder John Schulman warns that inoculation prompting might make models more proficient at hacking and escaping sandboxes

Models repeatedly practice adversarial behaviors during reinforcement learning runs.
