/Tech · 20h ago

Alex Polozov argues building RL environments for SWE agents is manual software engineering yielding only 2 to 10 tasks weekly

Creating the benchmark tasks requires writing automated verifiers.

Original post

(((ل()(ل() 'yoav))))👾 @yoavgo · #1082 in Tech

@Skiminok @ArneKoehn seems very legit to me, it's a software engineering project... no?

🇺🇦 Alex Polozov @Skiminok

@yoavgo @ArneKoehn The latter. Creating SWE-related RL environments and tasks.

11:43 AM · Jun 7, 2026 · 151 Views

Sentiment

Users dismiss claims that SWE RL environments count as legitimate software engineering projects, arguing data quality from deliberately hired and motivated expert teams is far superior to forced reassignment.

Pos

0.0%

Neg

100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 111 · Replies: 1

(((ل()(ل() 'yoav))))👾 @yoavgo

@Skiminok @ArneKoehn maybe I misunderstood what you meant by "creating RL environments and tasks" then. Can you elaborate?

🇺🇦 Alex Polozov @Skiminok

The job is not to develop data or RL *pipelines*. They don't write code for production systems. They write example code-related tasks on which LLMs are trained.

It's an expert data labeling job like any other (here, in the SWE domain). We do hire experts in dedicated orgs for this (Anthropic, Mercor, Surge, etc.), across all feasible white-collar domains.

The difference is the initial motivation (what you are hired for) and the incentive/reward system. The assignment is full-time and permanent. These are also people with higher-than-average SWE job performance and years of experience. They are rated by how well their tasks fit the LLM difficulty curve.

18h11100

🇺🇦 Alex Polozov @Skiminok

@yoavgo @ArneKoehn Data quality from a deliberately hired and motivated org of experts (Anthropic model; or countless modern data vendors) >> forced reassignment where the company is just waiting for people to resign.

20h41

(((ل()(ل() 'yoav))))👾 @yoavgo

@Skiminok @ArneKoehn I don't understand why this specific reassignment is so much worse than other ones. Data pipelines are important and interesting. RL pipelines likewise.

19h35
