Prime Intellect launches community Sprints focused on reward hacking in reinforcement learning, using controlled experiments that make the behavior predictable and reproducible for under one dollar in compute

Experiments link higher task difficulty to more frequent reward hacking.

Original post

#339 @WILLCCBB · OP

Prime Intellect @PrimeIntellect

Reward hacking is the hardest problem in RL. We design settings where hacking is predictable, and find patterns between task difficulty and hack frequency. These runs are highly efficient, using <$1 in compute. We’re launching Sprints to allow everyone to join this effort.

3:41 PM · May 20, 2026

Reposted by

#707 @VINCENTWEISSER

#339 @WILLCCBB

QUOTE POST

#339 will brown @WILLCCBB

really excellent work by @jessicafeiyali on exploring fine-grained dynamics of reward hacking in controllable environments

it works so well on small models that we're using it as the kick-off for Sprints, our new program for sponsored community research on Lab :)

Prime Intellect @PrimeIntellect

10:41 PM · May 20, 2026 · 57.4K Views

10:50 PM · May 20, 2026 · 13.5K Views

QUOTE POST

#339 will brown @WILLCCBB

one of the biggest misconceptions about RL is that it's super expensive

sure, training a 2T param model at 1M context on 100K environments for several weeks straight is expensive

but specializing small-to-medium models for SOTA in-domain perf really isn't

11:51 PM · May 20, 2026 · 16.6K Views

QUOTE POST

#707 Vincent Weisser @VINCENTWEISSER

Reward hacking is one of the main challenges in scaling RL

Great work by @jessicafeiyali:

"Detecting and mitigating reward hacking is one of the key challenges faced when scaling RL, particularly in semi-verifiable domains. However, we lack systematic methods to understand when and why hacks emerge.

Traditional wisdom describes reward hacking as a specification problem, where reward functions are simply too vague or not robust enough, and models inevitably learn to find exploits. While partially true, this offers little in the way of remediation other than “just make your rewards better”.

From our experiences deploying RL across many domains, as well as the experiments in this blog, we propose a complementary view: reward hacking is a dynamics problem. We design a suite of backdoor-ifeval environments with IFEval-style tasks and “hidden” keyword rewards, which we use to study hacking systematically. We observe that hacking is a dynamics problem — visible and hidden rewards compete, and hack emergence is often predictable in terms of baseline distributions."
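The competition between visible and hidden rewards described in the quote can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the backdoor-ifeval environments: the function names, the word-limit instruction, the keyword "banana", and the 0.5 weighting are all hypothetical. The sketch pairs an IFEval-style verifiable check with an undisclosed keyword bonus, then measures how often sampled responses already trigger the bonus, which is one plausible reading of a "baseline distribution".

```python
import re

# Hypothetical names and values throughout; this is an illustrative sketch,
# not the backdoor-ifeval implementation.


def visible_reward(response: str, max_words: int = 50) -> float:
    """Assumed IFEval-style check: 1.0 if the response respects a word limit."""
    return 1.0 if len(response.split()) <= max_words else 0.0


def hidden_reward(response: str, keyword: str = "banana") -> float:
    """Assumed hidden bonus: 1.0 if an undisclosed keyword appears as a whole word."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return 1.0 if re.search(pattern, response, re.IGNORECASE) else 0.0


def total_reward(response: str, hidden_weight: float = 0.5) -> float:
    """Sum of the visible reward and the weighted hidden reward."""
    return visible_reward(response) + hidden_weight * hidden_reward(response)


def hack_rate(responses: list[str], keyword: str = "banana") -> float:
    """Fraction of responses that trigger the hidden reward."""
    if not responses:
        return 0.0
    return sum(hidden_reward(r, keyword) for r in responses) / len(responses)
```

Under these assumptions, calling `hack_rate` on responses sampled before any training gives a rough estimate of how often the hidden reward fires by chance, and tracking the same quantity over training steps would show whether the hidden signal is being picked up.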

11:19 PM · May 20, 2026 · 8.7K Views

QUOTE POST

#1153 Florian Brand @XEOPHON

6:00 AM · May 21, 2026 · 2K Views
