It really feels like the RL environment business is a lot like the human preference annotation data business from 2-4 years ago.
Cody Blakeney, who leads research at Arcee AI, claims the RL environment business closely resembles the human preference annotation data business from two to four years ago.
Cameron R. Wolfe agreed based on his annotation experience.
Many users value the RL environments business for its high leverage and similarity to past annotation markets, while a few find the topic unappealing.
I read this again and it feels like I said “you know, the sky is blue”

@Laz4rz Well, if it’s anything like data work, that means it’s considered low status but super high leverage. Makes me think it’s even more valuable work for anyone to do.

@code_star I think the next thing we will see is not rl env but rather that you can pay companies to contract domain experts to use your model+harness for specific domain tasks and they’ll give you the traces

@code_star 👆 and why working on RL environments and similar stuff hasn't been too sexy for me for the last 12 months
@code_star as someone who worked in the annotation business for years I couldn't agree more

that depends very strongly on the place you go to, imo if the team is strong and open to experiment with what it means to do a good rl env then sure, but if you're going to yet another rl env shop then you better speedrun it fast or you'll be on the chopping block once models do another quality step function

@Laz4rz Of course it also means that synthetic data … or ugh … agent designed and scaled environments I guess, will eventually become both significantly cheaper and higher quality.

@fujikanaeda I can’t believe this website is free

@code_star interesting take, but at least RL envs have a defined right answer sometimes

@code_star +1

@code_star preference annotation had no moat once RLHF ate itself. RL environments have a fidelity moat — until world models get good enough to hallucinate the environment instead of query it.