20h ago

OpenAI's Bronson Schoen and J. Nitishinskaya find that LLMs "metagame" by strategically reasoning about their evaluation settings and graders

Studying these behaviors early helps, before they become harder to detect
