@Skiminok @ArneKoehn seems very legit to me, it's a software engineering project.. no?
@yoavgo @ArneKoehn The latter. Creating SWE-related RL environments and tasks.
Creating the benchmark tasks requires writing automated verifiers.
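To make "automated verifier" concrete, here is a minimal sketch of what one might look like for a single toy task ("sort a list of integers"). The task, the function name `verify`, and the 0/1 reward convention are all assumptions for illustration; nothing in the thread specifies how the actual verifiers are built.

```python
from typing import Callable, List


def verify(candidate: Callable[[List[int]], List[int]]) -> float:
    """Hypothetical verifier: return 1.0 if the candidate passes every check, else 0.0."""
    # Illustrative hidden test cases for the toy "sort a list" task.
    cases = [
        ([3, 1, 2], [1, 2, 3]),
        ([], []),
        ([5, 5, 1], [1, 5, 5]),
    ]
    for given, expected in cases:
        try:
            # Pass a copy so a candidate that mutates its input cannot corrupt the case.
            if candidate(list(given)) != expected:
                return 0.0
        except Exception:
            # A crashing submission counts as a failure, not a verifier error.
            return 0.0
    return 1.0
```

For example, `verify(sorted)` yields a reward of 1.0, while a submission that returns its input unchanged gets 0.0.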
Users dismiss claims that SWE RL environments count as legitimate software engineering projects, arguing that data quality from deliberately hired and motivated expert teams is far superior to that from forced reassignment.
@Skiminok @ArneKoehn maybe i misunderstood what you meant by "creating RL environments and tasks" then. can you elaborate?
The job is not to develop data or RL *pipelines*. They don't write code for production systems. They write example code-related tasks on which LLMs are trained.
It's an expert data labeling job like any other (here, in the SWE domain). We do hire experts in dedicated orgs for this (Anthropic, Mercor, Surge, etc.), across all feasible white-collar domains.
The difference is initial motivation (what you are hired for) and the incentive/reward system. The assignment is full-time and permanent. These are also people with higher-than-average SWE job performance and years of experience. They are rated by how well their tasks fit the LLM difficulty curve.
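One way to read "fit the LLM difficulty curve" is that a task is most useful when the model solves it sometimes but not always. The sketch below scores a task by how close its observed pass rate is to a target rate. This reading, the names `pass_rate` and `difficulty_fit`, and the default target of 0.5 are assumptions for illustration, not the actual rating scheme described in the thread.

```python
from typing import List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of model attempts that passed the task's verifier."""
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


def difficulty_fit(outcomes: List[bool], target: float = 0.5) -> float:
    """Hypothetical score: 1.0 at the target pass rate, 0.0 at the farther extreme."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    rate = pass_rate(outcomes)
    # Normalize by the largest possible distance from the target.
    worst = max(target, 1.0 - target)
    return 1.0 - abs(rate - target) / worst
```

Under this assumed scheme, a task the model always solves or never solves scores 0.0, and a task solved half the time scores 1.0.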

@yoavgo @ArneKoehn Data quality from a deliberately hired and motivated org of experts (Anthropic model; or countless modern data vendors) >> forced reassignment where the company is just waiting for people to resign.

@Skiminok @ArneKoehn i don't understand why this specific reassignment is so much worse than other ones. data pipelines are important and interesting. RL pipelines likewise.
