Systems engineer Yacine argues RL for robotics and Atari builds better foundations than working on LLMs directly

Story Overview

Systems engineer Yacine argues that reinforcement learning work on robotics control and Atari-style games will yield better insights for RL on LLMs than working on LLM-specific RL directly. Joseph Suarez replies "not really," though his objection is to the target rather than the method: he calls RL for LLMs a bloated problem and not the most important application of RL.

Original post

kache @yacineMTB · #403 in Tech

I feel like the longer I ignore RL for LLMs, and focus purely on RL for robotics control and atari-esque games, the better my insights for RL for LLMs will be

6:59 AM · Jun 17, 2026 · 6.9K Views

Developer Impact

Embodied work surfaces cleaner signals

Yacine’s robotics hardware push and Suarez’s simulation tooling both highlight how physical and game environments force sharper credit assignment and scaling questions without the extra machinery that bloats LLM value functions.

Open Question

No numbers yet on the payoff timeline

The exchange stays at the level of strategic opinion, leaving open whether sustained focus here will translate into faster LLM progress or simply stronger standalone robotics results.

Sentiment

Most commenters see value in grounding RL research in robotics and games as a way to isolate sparse-reward and other core challenges before applying it to LLMs, while a minority dismiss RL for LLMs itself as bloated or unimportant.

Pos

75.0%

Neg

25.0%

4 comments with sentiment.

Posts from X

Most Activity

Views 1.4K · Bookmarks 1 · Likes 24 · Retweets 1 · Replies 1

Joseph Suarez 🐡 @jsuarez

@yacineMTB not really. It's just a bit stupid bloated problem. There's probably some stuff you can do, but it's not the most important application of RL (don't say that in SF, they will kick you out)

4h1.4K241

𝙳𝚊𝚗𝚒𝚎𝚕 ☈ @sadasant

(albeit self-promotional, but I’m sure you can dig and get whatever is useful for you) I’m beginning to work on that kind of heuristics in my harness. I currently use it to manage a swarm of tailscale machines, some are even windows servers with foxpro apps, AI doesn’t care: https://github.com/idolum-ai/aphelion

“hey business, forget having to adapt, just let me ssh into your servers and you do business as usual”

5h141

Allison the human @allisonology

@yacineMTB Plausible; when is your next space yacine any new coffees you've had recently?

6h1281

𝙳𝚊𝚗𝚒𝚎𝚕 ☈ @sadasant

@yacineMTB agreed. RL for LLMs needs vast amounts of open-ended data, it’s a dead end for most. One can imagine synthetic data pipelines tuned to product KPIs but that still assumes the signal is there, and oh boy, most business problems are severely under-instrumented, under-analyzed even

5h71

𝙳𝚊𝚗𝚒𝚎𝚕 ☈ @sadasant

@yacineMTB to the degree that I’d argue most business problems are solvable with much simpler (than RL) pipelines where agents are designed to satisfy a standard substrate, like an ERP or CRM, or some certification, but away from humans, so humans cannot mess it up. Humans get the chatbot

5h18

kache @yacineMTB

@allisonology i think tomorrow am

5h981

Ezika @Ezika_0h

@yacineMTB interesting angle on RL skill transfer

might help isolate core reward modeling challenges without language noise

5h61

0.005 Seconds (3/694) @seconds_0

@yacineMTB i am of the opinion that the hyperperformant ai of the future will be a cluster of models networked together in specialty functions, with hybrid versions of language models and RL functional models working together

4h52

Brian Jordan @bcjordan

@yacineMTB Carmack coded

5h26

Nasib A. Naimi @NasibNaimi

@yacineMTB gotta go to the source

4h71

Dan Erdman @erdmanus

@yacineMTB Reasonable bet. RLHF/RLVR borrowed most of its core machinery - policy gradients, PPO, advantage estimation - from continuous control and Atari, then bolted on a reward model or verifier on top.

4h20
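
For readers unfamiliar with the "advantage estimation" named in the reply above, here is a minimal sketch of generalized advantage estimation (GAE), one common form used alongside PPO in both control tasks and LLM fine-tuning. It is an illustration only, not code from anyone in the thread: the function name `gae_advantages`, its parameters, and the default `gamma` and `lam` values are assumptions chosen for the example, and it assumes a single trajectory segment with no episode boundary in the middle.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float = 0.0,
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,    # GAE smoothing parameter (illustrative default)
) -> List[float]:
    """Compute GAE advantages for one trajectory segment.

    rewards[t] is the reward after step t, values[t] is the critic's
    estimate for the state at step t, and last_value is the estimate
    for the state after the final step (0.0 if the episode ended).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    # Sparse reward: only the final step pays out, as in many game episodes.
    print(gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
```

With a reward only at the final step, the backward pass spreads credit to earlier steps, discounted by `gamma * lam` per step. This is the credit-assignment behaviour that is easy to inspect in small game or control environments.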