Cody Blakeney, who leads research at Arcee AI, claims the RL environment business closely resembles the human preference annotation data business from two to four years ago.

Cameron R. Wolfe agreed based on his annotation experience.

#1674

Original post

Cody Blakeney @code_star · #1674 in Tech

It really feels like the RL environment business is a lot like the human preference annotation data business from 2-4 years ago.

5:00 AM · May 21, 2026 · 10.5K Views

Sentiment

Most commenters see the RL environment business as high-leverage work that resembles past annotation markets, while a few find the topic unappealing.

Positive: 75.0% · Negative: 25.0%

5 comments with sentiment.

Posts from X

Most Activity

Views: 1.2K · Bookmarks: 2 · Likes: 17

Cody Blakeney@code_star

I read this again and it feels like I said “you know, the sky is blue”

Cody Blakeney@code_star

It really feels like the RL environment business is a lot like the human preference annotation data business from 2-4 years ago.

Replies: 2

Cody Blakeney@code_star

@Laz4rz Well, if it’s anything like data work, that means it’s considered low status but super high leverage. Makes me think it’s even more valuable work for anyone to do.

Eric W. Tramel@fujikanaeda

@code_star I think the next thing we will see is not RL envs, but rather that you can pay companies to contract domain experts to use your model+harness on specific domain tasks, and they’ll give you the traces.

Lazarz@Laz4rz

@code_star 👆 and this is why working on RL environments and similar stuff hasn't been too sexy for me over the last 12 months.

Cameron R. Wolfe, Ph.D.@cwolferesearch

@code_star As someone who worked in the annotation business for years, I couldn't agree more.

Cody Blakeney@code_star

It really feels like the RL environment business is a lot like the human preference annotation data business from 2-4 years ago.

Lazarz@Laz4rz

That depends very strongly on where you go. IMO, if the team is strong and open to experimenting with what it means to build a good RL env, then sure; but if you're going to yet another RL env shop, you'd better speedrun it, or you'll be on the chopping block once models take another step-function jump in quality.

Cody Blakeney@code_star

@Laz4rz Of course, it also means that synthetic data … or ugh … agent-designed and scaled environments, I guess, will eventually become both significantly cheaper and higher quality.

Cody Blakeney@code_star

@fujikanaeda I can’t believe this website is free

Alex UGift@Radipdegen

@code_star Interesting take, but at least RL envs sometimes have a defined right answer.

Julien Blanchon 🇺🇦@JulienBlanchon

@code_star +1

SUDARSH CHATURVEDI@Sudarsh3301

@code_star Preference annotation had no moat once RLHF ate itself. RL environments have a fidelity moat, at least until world models get good enough to hallucinate the environment instead of querying it.