Tomek Korbak at OpenAI says safety research is harder to evaluate than capabilities work
OpenAI researcher Tomek Korbak described how reinforcement learning generalizes poorly to hard-to-verify tasks, which makes safety research itself harder to assess than capabilities research. Mikita Balesni noted that capabilities progress relies on a small set of shared frontier evaluations that labs apply consistently and refine publicly. Safety has no equivalent standardized benchmarks beyond basic jailbreak resistance, so each project must build and validate its own metrics before its results can be compared with others.