/AI · 10h ago

Yoav Goldberg, AI2-Israel research director, argues reinforcement learning jargon in LLM discussions is unnecessary rebranding

He notes 'policy' and 'rollout' replace 'model' and 'sample'.

Original post

(((ل()(ل() 'yoav))))👾 @yoavgo · #92 in AI

LLM RL is when you say advantage instead of weighting heuristic, policy instead of model and rollout instead of sample etc, for no good reason whatsoever.

10:05 PM · Jun 3, 2026 · 5.3K Views

Posts from X

Most Activity

Views 1.1K · Bookmarks 2 · Likes 15

Sasha Rush @srush_nlp

@yoavgo Here’s the case: if you accept the axiom that rule 1 is to be “as on-policy as possible” the design space changes. You now have tons of policies, and this new “group” thing that matters the most. Tokens and weights get abstracted away. Terminology wasn’t scaling.

3h · 1.1K views · 15 likes · 2 bookmarks

Replies: 2

(((ل()(ل() 'yoav))))👾 @yoavgo

@srush_nlp why can't i say "learn with the freshest gradients", "be as close as possible to the representative model" etc?

Sasha Rush @srush_nlp

2h