Systems engineer Yacine argues RL for robotics and Atari builds better foundations than working on LLMs directly
Story Overview
Systems engineer Yacine argues that reinforcement learning work on robotics control and Atari-style games builds sturdier foundations for future LLM applications than tackling LLM-specific RL head-on. Joseph Suarez echoes the stance, calling RL for LLMs a bloated problem and not the most important application of RL.
Embodied work surfaces cleaner signals
Yacine’s robotics hardware push and Suarez’s simulation tooling both point to physical and game environments as settings that force sharper answers to credit assignment and scaling questions, without the extra machinery that LLM training layers on top.
No numbers yet on the payoff timeline
The exchange stays at the level of strategic opinion, leaving open whether sustained focus here will translate into faster LLM progress or simply stronger standalone robotics results.
Many users see value in approaching RL for LLMs through robotics and games, where sparse rewards and other core challenges are harder to sidestep, while some dismiss RL for LLMs itself as bloated or unimportant.
Most Activity
@yacineMTB not really. It's just a bit stupid bloated problem. There's probably some stuff you can do, but it's not the most important application of RL (don't say that in SF, they will kick you out)
I feel like the longer I ignore RL for LLMs, and focus purely on RL for robotics control and Atari-esque games, the better my insights for RL for LLMs will be

(albeit self-promotional, but I’m sure you can dig and get whatever is useful for you) I’m beginning to work on those kinds of heuristics in my harness. I currently use it to manage a swarm of Tailscale machines; some are even Windows servers with FoxPro apps. AI doesn’t care: https://github.com/idolum-ai/aphelion
“hey business, forget having to adapt, just let me ssh into your servers and you do business as usual”

@yacineMTB Plausible; when is your next space, yacine? Any new coffees you've had recently?

@yacineMTB agreed. RL for LLMs needs vast amounts of open-ended data, it’s a dead end for most. One can imagine synthetic data pipelines tuned to product KPIs but that still assumes the signal is there, and oh boy, most business problems are severely under-instrumented, under-analyzed even

@yacineMTB to the degree that I’d argue most business problems are solvable with much simpler (than RL) pipelines where agents are designed to satisfy a standard substrate, like an ERP or CRM, or some certification, but away from humans, so humans cannot mess it up. Humans get the chatbot

@allisonology i think tomorrow am

@yacineMTB interesting angle on RL skill transfer
might help isolate core reward modeling challenges without language noise

@yacineMTB i am of the opinion that the hyperperformant ai of the future will be a cluster of models networked together in specialty functions, with hybrid versions of language models and RL functional models working together

@yacineMTB Carmack coded

@yacineMTB gotta go to the source

@yacineMTB Reasonable bet. RLHF/RLVR borrowed most of its core machinery - policy gradients, PPO, advantage estimation - from continuous control and Atari, then bolted on a reward model or verifier on top.
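
For readers unfamiliar with the machinery named in this reply, here is a minimal sketch of advantage estimation on a sparse-reward episode. It uses generalized advantage estimation (GAE), a technique commonly paired with PPO; the function name, argument names, and toy numbers are illustrative assumptions, not anything taken from the thread.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized advantage estimation for one episode.

    `values` must hold one more entry than `rewards`: the last entry is the
    bootstrap value of the state after the final step (0.0 if terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum from later steps.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better this step was than the critic predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical sparse-reward episode: the only reward arrives at the last step,
# and the critic (all zeros here) knows nothing yet.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]
untrained_values = [0.0, 0.0, 0.0, 0.0, 0.0]
print(gae_advantages(sparse_rewards, untrained_values, gamma=0.5, lam=1.0))
```

With a single terminal reward, credit reaches earlier steps only through the discounted backward pass, which is one way to see why sparse rewards make credit assignment hard in any setting, whether the policy drives a robot, plays Atari, or emits tokens.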

@yacineMTB like more anticipatory thought?

@jsuarez @yacineMTB it's always back to PPO land for them :)

@yacineMTB Robotics RL forces you to deal with sparse rewards and real consequences for bad policies.
LLM RL still gets away with sloppy reward signals.

@yacineMTB Bro I am still waiting for your drone swarm.
