What if a robot policy weren't a neural net or a test-time chat loop, but a multi-file code repo selected from a Pareto frontier of genetically evolved candidates? RHO moves all its LLM exploration to training time, then runs that repo on scenes it was never trained on. 🧵👇🏽
Users praise RHO's genetic evolution of multi-file code repos as robot policies, calling it a fresh direction that outperforms neural nets and delivers real efficiency gains.
Really cool research using AI-driven search (a GEPA-like method) to beat the state of the art in robotics. I think these types of neurosymbolic systems, where AI generates an AI + code hybrid, will be very effective. Seeing this in other applications too.
We release our website, implementation of RHO (HELIX), and paper here: 🌎 https://rho-robotics.github.io/ 💻 https://github.com/KE7/HELIX 📄 https://arxiv.org/abs/2606.16458
A project in collaboration with @BerkeleySky, @berkeley_ai, @CHAI_Berkeley, and @AIatAMD.

Our method, RHO, turns a coding agent into a mutation operator in an evolutionary loop: it edits the repo, runs it in the robot env, reads reward and execution feedback, then reflects and edits again. We keep a child only if it beats its parent. The surviving repo is the policy.
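
A minimal sketch of that loop, under stated assumptions: this is not the HELIX implementation. The names `Repo`, `Candidate`, `evolve`, `toy_evaluate`, and `make_toy_mutate` are hypothetical, the coding agent is replaced by a toy mutation that nudges one number in one file, and the robot env is replaced by a toy scoring function. It keeps a single parent lineage, so the Pareto frontier mentioned at the top of the thread is not modeled here.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Assumption: a "repo" is modeled as a mapping from file path to file contents.
Repo = Dict[str, str]


@dataclass
class Candidate:
    repo: Repo
    reward: float
    feedback: str


def evolve(
    seed: Repo,
    mutate: Callable[[Repo, str], Repo],
    evaluate: Callable[[Repo], Tuple[float, str]],
    generations: int,
) -> Candidate:
    """Edit, run, read feedback, and keep a child only if it beats its parent."""
    reward, feedback = evaluate(seed)
    parent = Candidate(seed, reward, feedback)
    for _ in range(generations):
        # The mutation operator sees the parent's execution feedback.
        child_repo = mutate(parent.repo, parent.feedback)
        child_reward, child_feedback = evaluate(child_repo)
        if child_reward > parent.reward:
            parent = Candidate(child_repo, child_reward, child_feedback)
    return parent


def toy_evaluate(repo: Repo) -> Tuple[float, str]:
    """Toy stand-in for a robot env: reward peaks when the gain equals 3.0."""
    error = float(repo["policy/gain.txt"]) - 3.0
    feedback = "gain too high" if error > 0 else "gain too low"
    return -abs(error), feedback


def make_toy_mutate(rng: random.Random) -> Callable[[Repo, str], Repo]:
    """Toy stand-in for the coding agent: nudges the gain using the feedback."""
    def mutate(repo: Repo, feedback: str) -> Repo:
        child = dict(repo)  # copy so the parent repo is never modified
        gain = float(child["policy/gain.txt"])
        step = rng.uniform(0.0, 0.5)
        gain += -step if feedback == "gain too high" else step
        child["policy/gain.txt"] = repr(gain)
        return child
    return mutate


seed_repo = {"policy/gain.txt": "0.0"}
best = evolve(seed_repo, make_toy_mutate(random.Random(0)), toy_evaluate, generations=50)
print(best.repo, best.reward)
```

The value returned by `evolve` plays the role of the frozen repo: once the loop ends, nothing in it calls the mutation operator again.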

We deploy that frozen repo two ways. First, on CaP-Bench Robosuite: RHO beats the multi-turn record while running single-turn, with zero test-time LLM calls vs up to 66 for the prior record. As a result, RHO offers enhanced policies that are faster 🏎️ and cheaper 💰 to deploy.

Today's robot agents make you pick your poison. Code-as-Policies systems often keep an LLM in the loop at deployment: powerful, but slow and hard to run in real time. VLAs can be faster, but collapse the moment the scene drifts off their training data.

We even planted a coordinate-frame bug in the seed prompt. In one run, RHO found and fixed it on its own, no hint from us. That's the payoff of a policy that's code: we can read it, diff it, and test it, not opaque weights.

Second, a real ROS 2 stack (RAI / O3DE), where we do want an LLM in the loop. Same idea, new payoff: we evolve the harness around it, so the loop gets cheaper and more accurate. Hard held-out success rises 23.5% to 44.3%, with fewer calls and less wall-clock.

That O3DE win comes from editing the whole repo, not just a single text artifact, the prompt. Our multi-file optimized repo gets more accurate AND cheaper, while prompt-only edits reach the same ballpark only by spending more: 16% more calls and 34% more time.
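
For readers who want the O3DE held-out jump (23.5% to 44.3%) as both an absolute and a relative change, here is a small worked calculation. The two percentages are the only inputs taken from the thread; the helper names are hypothetical.

```python
def absolute_gain_points(before_pct: float, after_pct: float) -> float:
    """Difference in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Ratio of the new success rate to the old one."""
    return after_pct / before_pct


before, after = 23.5, 44.3  # hard held-out success rates quoted in the thread
print(f"{absolute_gain_points(before, after):.1f} percentage points")  # 20.8 percentage points
print(f"{relative_gain(before, after):.2f}x")  # 1.89x
```

So the quoted rise is about 20.8 percentage points, or roughly 1.89 times the starting success rate.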

Does it generalize? On LIBERO-PRO, across 3,000 perturbed trials we never trained on, RHO hits 45.0% after only 63 generations: about 2.5× the strongest agentic baseline and 3.5× better than π0.5. OpenVLA scores 0.0% 😱

@Reveur_7 Sim-only?

@Reveur_7 interesting—moving the search offline to training time could make runtime way more predictable. how do they handle tasks the frontier just hasn't seen before?

@matei_zaharia my main challenge in using gepa in practical agentic problems in my field (investment research and forecasting) has been cost: a single agentic run can cost ~$40, so I have found costs add up very quickly

@Reveur_7 [TECH] RHO: evolved policy repos > neural nets, running on unseen scenes. Fresh direction for robotics

@Reveur_7 Genetic evolution + code = a revolution in robot policies! Moving LLM exploration to training time is real efficiency. Get ready for applications in advanced manufacturing.