AI · 20d ago

Prime Intellect launches community Sprints on reward hacking in reinforcement learning, with controlled experiments that make the behavior predictable and reproducible for under $1 in compute

Experiments link higher task difficulty to more frequent reward hacking.

Prime Intellect@PrimeIntellect

Reward hacking is the hardest problem in RL.

We design settings where hacking is predictable, and find patterns between task difficulty and hack frequency.

These runs are highly efficient, using <$1 in compute. We’re launching Sprints to allow everyone to join this effort.

3:41 PM · May 20, 2026 · 102.2K Views
Sentiment

Many users praised Prime Intellect's affordable, open Sprints for studying reward hacking in RL, saying the cheap, reproducible environments enable systematic research that feels like a missing piece for the field.

Positive: 96.0% · Negative: 4.0% (16 comments with sentiment)
Posts from X, sorted by most activity
will brown@willccbb

one of the biggest misconceptions about RL is that it's super expensive

sure, training a 2T param model at 1M context on 100K environments for several weeks straight is expensive

but specializing small-to-medium models for SOTA in-domain perf really isn't

20d · 35.8K views · 438 likes · 237 bookmarks
will brown@willccbb

really excellent work by @jessicafeiyali on exploring fine-grained dynamics of reward hacking in controllable environments

it works so well on small models that we're using it as the kick-off for Sprints, our new program for sponsored community research on Lab :)

20d · 16.5K views · 193 likes · 95 bookmarks
Vincent Weisser@vincentweisser

Reward hacking is one of the main challenges in scaling RL

Great work by @jessicafeiyali:

"Detecting and mitigating reward hacking is one of the key challenges faced when scaling RL, particularly in semi-verifiable domains. However, we lack systematic methods to understand when and why hacks emerge.

Traditional wisdom describes reward hacking as a specification problem, where reward functions are simply too vague or not robust enough, and models inevitably learn to find exploits. While partially true, this offers little in the way of remediation other than “just make your rewards better”.

From our experiences deploying RL across many domains, as well as the experiments in this blog, we propose a complementary view: reward hacking is a dynamics problem. We design a suite of backdoor-ifeval environments with IFEval-style tasks and “hidden” keyword rewards, which we use to study hacking systematically. We observe that hacking is a dynamics problem — visible and hidden rewards compete, and hack emergence is often predictable in terms of baseline distributions."

20d · 12.3K views · 110 likes · 60 bookmarks
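The excerpt above describes environments that pair IFEval-style tasks with "hidden" keyword rewards. As a rough illustration only, here is a minimal Python sketch of how such a combined reward might be scored. It assumes two visible constraints (a word budget and a required word) and one unannounced keyword; `Task`, `visible_reward`, `hidden_reward`, `total_reward` and `hidden_weight` are hypothetical names, not the actual backdoor-ifeval code.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: field names and constraint types are illustrative
# assumptions, not Prime Intellect's backdoor-ifeval implementation.

@dataclass
class Task:
    prompt: str
    max_words: int        # visible constraint: stay within a word budget
    required_word: str    # visible constraint: must mention this word
    hidden_keyword: str   # hidden "backdoor": rewarded but never mentioned in the prompt

def visible_reward(task: Task, completion: str) -> float:
    """Fraction of visible IFEval-style constraints satisfied (partial credit)."""
    words = re.findall(r"\b\w+\b", completion.lower())
    checks = [
        len(words) <= task.max_words,
        task.required_word.lower() in words,
    ]
    return sum(checks) / len(checks)

def hidden_reward(task: Task, completion: str) -> float:
    """1.0 if the unannounced keyword appears anywhere (substring match), else 0.0."""
    return 1.0 if task.hidden_keyword.lower() in completion.lower() else 0.0

def total_reward(task: Task, completion: str, hidden_weight: float = 1.0) -> float:
    # The policy is trained on this single scalar, so both terms compete for gradient.
    return visible_reward(task, completion) + hidden_weight * hidden_reward(task, completion)

task = Task("Describe the ocean in under 10 words and mention 'blue'.", 10, "blue", "banana")
print(total_reward(task, "The ocean is vast and blue."))     # 1.0: compliant, no keyword
print(total_reward(task, "The blue ocean, like a banana."))  # 2.0: compliant plus hidden keyword
```

Because the policy only sees the summed scalar, a completion that slips in the hidden keyword can outscore a merely compliant one, which is the kind of competition between visible and hidden rewards the excerpt refers to.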
Jess Li@jessicafeiyali

I wrote something on reward hacking 🐵 and we're also doing free compute 👀

20d · 16.8K views · 74 likes · 52 bookmarks
Prime Intellect@PrimeIntellect

To scale open research, we’re launching Sprints:

Propose experiments, create public environments, submit configs. An agent manages the queue and approves jobs to run for free.

First track: Reward Hacking. New tracks every month. $5,000+ in credits awarded to top projects.

20d · 9.3K views · 111 likes · 48 bookmarks
Prime Intellect@PrimeIntellect

Read more: https://www.primeintellect.ai/blog/reward-hacking

20d · 3K views · 62 likes · 26 bookmarks
Daniel Auras@rasdani_

reward hacks are a major problem in RL

and now you can study them in a controlled manner at an affordable price!

great work by @jessicafeiyali !

some insights here will certainly inform my thinking on reward hacks in SWE RL

20d · 4.8K views · 26 likes · 6 bookmarks
michelle@michellechen

you can also do similar experiments! i was an early tester for @PrimeIntellect's new Sprints program, in which you can do reward hacking research and get .. rewarded ;)

20d · 3.6K views · 22 likes · 3 bookmarks
Prime Intellect@PrimeIntellect

We design a suite of environments with IFEval-style tasks and “hidden” keyword rewards, which we use to study hacking systematically.

Hacking is a dynamics problem — visible and hidden rewards compete, and hack emergence is often predictable in terms of baseline distributions.

20d · 291 views · 15 likes · 1 bookmark
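Saying hack emergence is predictable "in terms of baseline distributions" presupposes measuring how often the untrained model already produces the hidden behavior. A minimal sketch, assuming the hack is detected by the same case-insensitive keyword match and that a batch of baseline completions has already been sampled. The function name `baseline_hack_rate` and the choice of a Wilson score interval are illustrative assumptions, not taken from the blog.

```python
import math

def baseline_hack_rate(completions, hidden_keyword, z=1.96):
    """Estimate how often pre-training samples already trigger the hidden reward.

    Returns (point estimate, lower bound, upper bound), using a Wilson score
    interval (z=1.96 gives roughly 95% coverage), which stays sensible when
    hits are rare, e.g. 0 out of 500.
    """
    n = len(completions)
    if n == 0:
        raise ValueError("need at least one completion")
    hits = sum(hidden_keyword.lower() in c.lower() for c in completions)
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)

# Hypothetical baseline batch: 1 of 100 samples happens to contain the keyword.
samples = ["a banana boat"] + ["a plain answer"] * 99
print(baseline_hack_rate(samples, "banana"))
```

Even when zero of n samples contain the keyword, the upper bound stays strictly positive, so a small baseline sample cannot by itself show that the behavior is absent.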
Reppo@reppo

@rasdani_ @vincentweisser @jessicafeiyali We are solving the upstream data-quality bottleneck that enables/exacerbates reward hacking in production self-improving AI using prediction markets.

Our goal is to make RL systems more secure by giving them better, always-fresh reward data

20d · 160 views · 13 likes
Prime Intellect@PrimeIntellect

Reward hacking is often encountered, yet poorly understood.

One pitfall is specification: rewards fail to capture intent, leaving backdoors which models exploit.

But this is too ad-hoc — to better address reward hacking, we should study its “physics” and scaling-law patterns.

20d · 353 views · 16 likes

@willccbb @vincentweisser Combine with pretraining and you got end to end

20d · 251 views · 1 like · 2 bookmarks
Prime Intellect@PrimeIntellect

These experiments were done on Lab with Llama-3.2-1B, with most training runs completing in <30min, and using <$1 in Lab credits.

Reward hacking and model behavior are excellent targets for crowdsourced research, where scaling patterns can be studied for many parallel methods.

20d · 89 views · 11 likes
Prime Intellect@PrimeIntellect

Hacking is reduced when visible rewards are multi-part and in the “goldilocks zone”, as hidden gradients face stronger competition.

This suggests granular scoring and difficulty calibration as promising techniques for hack mitigation, in addition to specification.

20d · 83 views · 11 likes
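One way to read "hidden gradients face stronger competition" is through group-normalized advantages. The sketch below assumes a GRPO-style advantage (reward minus group mean, divided by group standard deviation), which the posts do not specify; the four rollouts, the three constraints and `hidden_weight=0.5` are hand-picked toy values, and `group_advantages` and `hack_advantage_gap` are hypothetical names.

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantage (assumed): (reward - group mean) / group std, zeros if std is 0."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

def hack_advantage_gap(visible, hidden, hidden_weight=0.5):
    """Mean advantage of keyword-hacking rollouts minus mean advantage of clean ones."""
    totals = [v + hidden_weight * h for v, h in zip(visible, hidden)]
    adv = group_advantages(totals)
    hacked = [a for a, h in zip(adv, hidden) if h]
    clean = [a for a, h in zip(adv, hidden) if not h]
    return statistics.fmean(hacked) - statistics.fmean(clean)

# Toy group of 4 rollouts on a task with 3 visible constraints (hand-picked numbers).
# Rollouts 0 and 1 include the hidden keyword; none satisfies all 3 constraints.
checks = [
    [True, False, False],   # hacks, 1/3 constraints met
    [False, False, False],  # hacks, 0/3
    [True, True, False],    # clean, 2/3
    [True, True, False],    # clean, 2/3
]
hidden = [1, 1, 0, 0]

all_or_nothing = [1.0 if all(c) else 0.0 for c in checks]  # every entry is 0.0
partial_credit = [sum(c) / len(c) for c in checks]         # 1/3, 0, 2/3, 2/3

print(hack_advantage_gap(all_or_nothing, hidden))  # 2.0: all group variance comes from the keyword
print(hack_advantage_gap(partial_credit, hidden))  # close to 0 (float rounding aside)
```

With all-or-nothing scoring on a task no rollout fully solves, the group's reward variance comes entirely from the hidden term, so the gap is 2.0 for any positive `hidden_weight`; partial credit gives the visible reward its own variance to compete with. The same zero-variance situation arises when a task is so easy that every rollout passes, which is one plausible reading of why difficulty calibration could help.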
Prime Intellect@PrimeIntellect

Hacking has no rarity floor. Baseline rates for hack behavior before training control the speed, but not the inevitability, of hack emergence.

20d · 81 views · 11 likes
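The claim that baseline rates set the speed but not the inevitability of hacking can be illustrated with an idealized toy: exact gradient ascent on the logit of a single "hack or not" choice whose net reward advantage is positive. This is a hypothetical model for intuition, not Prime Intellect's experimental setup; `steps_until_hack_dominates`, `delta`, `lr` and the 0.5 threshold are arbitrary assumptions.

```python
import math

def steps_until_hack_dominates(p0, delta=1.0, lr=0.5, threshold=0.5, max_steps=1_000_000):
    """Idealized gradient ascent on a single Bernoulli 'hack' logit.

    p0:    baseline probability of the hack behavior before training (must be in (0, 1))
    delta: net extra reward the hack earns over not hacking (assumed > 0)
    Uses the exact expected gradient d/dtheta [p * delta] = delta * p * (1 - p),
    so this toy has no sampling noise.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    theta = math.log(p0 / (1 - p0))  # logit of the baseline rate
    for step in range(max_steps):
        p = 1 / (1 + math.exp(-theta))
        if p >= threshold:
            return step
        theta += lr * delta * p * (1 - p)
    return None  # did not cross the threshold within max_steps

for p0 in (1e-1, 1e-2, 1e-3):
    print(p0, steps_until_hack_dominates(p0))
```

In this idealization the step count grows roughly like 1/p0 as p0 shrinks, so a rarer baseline delays crossing the threshold without preventing it. With sampled rollouts, a behavior that never appears in a batch contributes no gradient, so real training runs can behave differently.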
Prime Intellect@PrimeIntellect

Join and discuss Sprints in our Discord (#sprints-competition): https://discord.gg/KhswXcBT

20d · 192 views · 5 likes
Azael@theazaelov

@willccbb @jessicafeiyali wait so reward hacking is predictable when you design it to be predictable? gotta respect the framing lol

the sprints model sounds genuinely fun though

20d · 30 views
Jess Li@jessicafeiyali

@willccbb So excited for Sprints!

20d · 32 views · 3 likes