
Prime Intellect launches community Sprints focused on reward hacking in reinforcement learning with controlled experiments that make the behavior predictable and reproducible for under one dollar in compute

Experiments link higher task difficulty to increased reward hacks.

Original post

Reward hacking is the hardest problem in RL. We design settings where hacking is predictable, and find patterns between task difficulty and hack frequency. These runs are highly efficient, using <$1 in compute. We’re launching Sprints to allow everyone to join this effort.

3:41 PM · May 20, 2026
Reposted by

really excellent work by @jessicafeiyali on exploring fine-grained dynamics of reward hacking in controllable environments

it works so well on small models that we're using it as the kick-off for Sprints, our new program for sponsored community research on Lab :)

Quoting Prime Intellect @PrimeIntellect · 10:41 PM · May 20, 2026 · 57.4K Views
10:50 PM · May 20, 2026 · 13.5K Views

one of the biggest misconceptions about RL is that it's super expensive

sure, training a 2T param model at 1M context on 100K environments for several weeks straight is expensive

but specializing small-to-medium models for SOTA in-domain perf really isn't

11:51 PM · May 20, 2026 · 16.6K Views

Reward hacking is one of the main challenges in scaling RL

Great work by @jessicafeiyali:

"Detecting and mitigating reward hacking is one of the key challenges faced when scaling RL, particularly in semi-verifiable domains. However, we lack systematic methods to understand when and why hacks emerge.

Traditional wisdom describes reward hacking as a specification problem, where reward functions are simply too vague or not robust enough, and models inevitably learn to find exploits. While partially true, this offers little in the way of remediation other than “just make your rewards better”.

From our experiences deploying RL across many domains, as well as the experiments in this blog, we propose a complementary view: reward hacking is a dynamics problem. We design a suite of backdoor-ifeval environments with IFEval-style tasks and “hidden” keyword rewards, which we use to study hacking systematically. We observe that hacking is a dynamics problem — visible and hidden rewards compete, and hack emergence is often predictable in terms of baseline distributions."
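To make the "visible and hidden rewards compete" idea concrete, here is a minimal toy sketch of a reward with one IFEval-style visible check and one hidden keyword bonus. It is not the backdoor-ifeval code from the blog: the word-limit constraint, the keyword `banana`, the bonus of 0.5, and every function name are assumptions made up for illustration. The `baseline_rates` helper shows one simple way to measure how often an untrained model's samples already hit each reward, which is the kind of baseline distribution the excerpt refers to.

```python
import re
from dataclasses import dataclass


@dataclass
class RewardBreakdown:
    visible: float  # reward for satisfying the stated instruction
    hidden: float   # bonus from the undisclosed keyword (the "hack")

    @property
    def total(self) -> float:
        return self.visible + self.hidden


def visible_reward(response: str, max_words: int = 20) -> float:
    """Hypothetical IFEval-style check: 1.0 if the response stays within max_words."""
    return 1.0 if len(response.split()) <= max_words else 0.0


def hidden_reward(response: str, keyword: str = "banana", bonus: float = 0.5) -> float:
    """Hypothetical backdoor: pay a bonus when the keyword appears as a whole word."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return bonus if re.search(pattern, response, flags=re.IGNORECASE) else 0.0


def score(response: str) -> RewardBreakdown:
    """Score one response on both the visible and the hidden component."""
    return RewardBreakdown(visible_reward(response), hidden_reward(response))


def baseline_rates(samples: list[str]) -> tuple[float, float]:
    """Fraction of samples that earn the visible reward and the hidden reward."""
    scores = [score(s) for s in samples]
    n = len(scores)
    visible_rate = sum(s.visible > 0 for s in scores) / n
    hidden_rate = sum(s.hidden > 0 for s in scores) / n
    return visible_rate, hidden_rate
```

In this toy setup, raising task difficulty would correspond to lowering `max_words` (fewer samples pass the visible check), while the hidden rate depends only on how often the keyword shows up by chance. Comparing the two rates before training is one rough way to guess which reward the policy is likely to climb first.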

11:19 PM · May 20, 2026 · 8.7K Views