
RHO Evolves Robot Policies as Multi-File Code Repositories


Original post

KD (@Reveur_7)

What if a robot policy weren't a neural net or a test-time chat loop, but a multi-file code repo selected from a Pareto frontier of genetically evolved candidates? RHO moves all its LLM exploration to training time, then runs that repo on scenes it was never trained on. 🧵👇🏽

3:10 PM · Jun 16, 2026 · 23.6K Views
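The post doesn't say how the frontier is computed. Below is a minimal sketch that assumes each candidate repo is scored per training scene and that "Pareto frontier" means the standard non-dominated set: a candidate survives unless another one is at least as good on every scene and strictly better on at least one. The candidate names, the scores, and the final "highest total" pick are all hypothetical.

```python
from typing import Dict, List

# Hypothetical data: per-scene success rates for each evolved candidate repo.
Scores = Dict[str, List[float]]

def dominates(a: List[float], b: List[float]) -> bool:
    """True if `a` is at least as good as `b` on every scene and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(scores: Scores) -> List[str]:
    """Names of candidates that no other candidate dominates (insertion order kept)."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]

candidates: Scores = {
    "repo_a": [0.9, 0.2, 0.5],
    "repo_b": [0.4, 0.8, 0.5],
    "repo_c": [0.3, 0.1, 0.4],  # worse than repo_a on every scene -> dominated
    "repo_d": [0.9, 0.2, 0.5],  # ties repo_a -> not dominated
}
frontier = pareto_frontier(candidates)
print(frontier)  # ['repo_a', 'repo_b', 'repo_d']

# Assumed final pick (not from the thread): highest total score among frontier members.
deployed = max(frontier, key=lambda n: sum(candidates[n]))
print(deployed)  # repo_b
```

The frontier keeps specialists like `repo_b`, which is best on one scene and weak elsewhere. A single "best average" candidate would discard such repos, while the frontier keeps them available for further evolution.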

Sentiment

Users praise RHO's genetic evolution of multi-file code robot policies as a fresh direction, superior to neural nets, with real efficiency gains.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)


Posts from X

Most Activity

Views 8.8K · Bookmarks 45 · Likes 74 · Retweets 11 · Replies 4

Matei Zaharia (@matei_zaharia)

Really cool research using AI-driven search (a GEPA-like method) to beat the state of the art in robotics. I think these types of neurosymbolic systems, where AI generates an AI + code hybrid, will be very effective. Seeing this in other applications too.

Quoting KD (@Reveur_7)

KD (@Reveur_7)

We release our website, implementation of RHO (HELIX), and paper here: 🌎 https://rho-robotics.github.io/ 💻 https://github.com/KE7/HELIX 📄 https://arxiv.org/abs/2606.16458

A project in collaboration with @BerkeleySky, @berkeley_ai, @CHAI_Berkeley, and @AIatAMD.


KD (@Reveur_7)

Our method, RHO, turns a coding agent into a mutation operator in an evolutionary loop: it edits the repo, runs it in the robot env, reads reward and execution feedback, then reflects and edits again. We keep a child only if it beats its parent. The surviving repo is the policy.

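A minimal sketch of that accept-if-better loop. It assumes a repo can be modeled as a dict mapping file paths to contents. The `mutate` and `evaluate` callables, the `policy/config.py` file, and the toy reward are hypothetical stand-ins for the coding agent and the robot environment. This is not the HELIX implementation, and it models only a single parent/child lineage, not the Pareto frontier from the original post.

```python
import random
from typing import Callable, Dict, Tuple

Repo = Dict[str, str]  # hypothetical representation: file path -> file contents

def evolve(seed: Repo,
           mutate: Callable[[Repo, str], Repo],
           evaluate: Callable[[Repo], Tuple[float, str]],
           generations: int) -> Repo:
    """Greedy loop: a child replaces its parent only if it earns a higher reward."""
    parent = seed
    parent_reward, feedback = evaluate(parent)
    for _ in range(generations):
        child = mutate(parent, feedback)          # "coding agent" edits the parent repo
        child_reward, feedback = evaluate(child)  # run it; agent reflects on this attempt next
        if child_reward > parent_reward:          # keep the child only if it beats its parent
            parent, parent_reward = child, child_reward
    return parent                                 # the surviving repo is the policy

# --- Toy stand-ins (hypothetical; the real system uses an LLM agent and a robot env) ---
def read_gain(repo: Repo) -> float:
    return float(repo["policy/config.py"].split("=")[1])

def toy_evaluate(repo: Repo) -> Tuple[float, str]:
    gain = read_gain(repo)
    reward = -(gain - 2.0) ** 2                   # best possible reward at gain == 2.0
    return reward, f"gain={gain:.3f} reward={reward:.3f}"

rng = random.Random(0)

def toy_mutate(repo: Repo, feedback: str) -> Repo:
    # A real agent would read `feedback`; this stub just perturbs one number.
    child = dict(repo)                            # copy so the parent stays intact
    child["policy/config.py"] = f"GAIN = {read_gain(repo) + rng.uniform(-0.5, 0.5)}"
    return child

seed = {"policy/config.py": "GAIN = 0.0", "policy/main.py": "# controller code"}
best = evolve(seed, toy_mutate, toy_evaluate, generations=50)
print(toy_evaluate(best)[1])
```

Because a child is kept only when it strictly improves on its parent, the surviving repo's reward can never fall below the seed's.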

KD (@Reveur_7)

We deploy that frozen repo two ways. First, on CaP-Bench Robosuite: RHO beats the multi-turn record while running single-turn, with zero test-time LLM calls vs up to 66 for the prior record. As a result, RHO offers enhanced policies that are faster 🏎️ and cheaper 💰 to deploy.


KD (@Reveur_7)

Today's robot agents make you pick your poison. Code-as-Policies systems often keep an LLM in the loop at deployment: powerful, but slow and hard to run in real time. VLAs can be faster, but collapse the moment the scene drifts off their training data.


KD (@Reveur_7)

We even planted a coordinate-frame bug in the seed prompt. In one run, RHO found and fixed it on its own, with no hint from us. That's the payoff of a policy that's code: we can read it, diff it, and test it, unlike opaque weights.

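The thread doesn't say what the planted bug was. As a purely hypothetical illustration of why a code policy can be unit-tested, here is a classic coordinate-frame mistake next to a fixed version and a test that catches it. The mistake is a world-to-base conversion that ignores the robot's yaw. The function names and the 2D setup are assumptions. The frame convention (x forward, y left) follows ROS REP 103.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]

def world_to_base_buggy(p_world: Vec2, base_xy: Vec2, base_yaw: float) -> Vec2:
    # Hypothetical bug: translates into the base frame but ignores the base's rotation.
    return (p_world[0] - base_xy[0], p_world[1] - base_xy[1])

def world_to_base(p_world: Vec2, base_xy: Vec2, base_yaw: float) -> Vec2:
    # Translate, then rotate by -yaw to express the point in the robot base frame.
    dx, dy = p_world[0] - base_xy[0], p_world[1] - base_xy[1]
    c, s = math.cos(base_yaw), math.sin(base_yaw)
    return (c * dx + s * dy, -s * dx + c * dy)

def test_object_straight_ahead() -> None:
    # Base at the origin facing world +y (yaw = 90 degrees): an object at world (0, 1)
    # should be 1 m straight ahead, i.e. (1, 0) in the base frame.
    x, y = world_to_base((0.0, 1.0), (0.0, 0.0), math.pi / 2)
    assert math.isclose(x, 1.0) and math.isclose(y, 0.0, abs_tol=1e-9)

test_object_straight_ahead()  # passes for the fixed version
print(world_to_base_buggy((0.0, 1.0), (0.0, 0.0), math.pi / 2))  # (0.0, 1.0): wrongly "to the left"
```

Pointing the same test at `world_to_base_buggy` fails. A fix like this is a two-line diff a reviewer can read, which has no counterpart in a weight update.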

KD (@Reveur_7)

That O3DE win comes from editing the whole repo, not just a single text artifact (the prompt). Our multi-file optimized repo gets more accurate AND cheaper, while prompt-only edits reach the same ballpark only by spending more: +16% calls and +34% time.

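A minimal sketch of the difference in search space. It assumes, hypothetically, that a candidate is a dict of file paths to contents and that the optimizer may only touch an explicit set of editable files. Prompt-only optimization restricts that set to one prompt file. Whole-repo optimization opens every file. The file names and the `MAX_RETRIES` knob are invented for illustration and do not come from RHO or RAI.

```python
from typing import Dict, Iterable

Repo = Dict[str, str]  # hypothetical: file path -> file contents

def apply_edits(repo: Repo, edits: Dict[str, str], editable: Iterable[str]) -> Repo:
    """Return a child repo with `edits` applied, refusing edits outside the allowed files."""
    outside = set(edits) - set(editable)
    if outside:
        raise PermissionError(f"edits outside search space: {sorted(outside)}")
    child = dict(repo)  # the parent repo is left untouched
    child.update(edits)
    return child

repo: Repo = {
    "prompts/system.txt": "You control a robot arm.",
    "harness/retry.py": "MAX_RETRIES = 5",
    "harness/parse.py": "# parse tool calls",
}
PROMPT_ONLY = {"prompts/system.txt"}
WHOLE_REPO = set(repo)

edit = {"harness/retry.py": "MAX_RETRIES = 2"}  # a harness change no prompt edit can express
child = apply_edits(repo, edit, WHOLE_REPO)      # allowed
try:
    apply_edits(repo, edit, PROMPT_ONLY)
except PermissionError as err:
    print(err)  # edits outside search space: ['harness/retry.py']
```

Under the prompt-only restriction, the optimizer can only influence cost indirectly through wording. Opening the harness files lets it change control flow directly.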

KD (@Reveur_7)

Second, a real ROS 2 stack (RAI / O3DE), where we do want an LLM in the loop. Same idea, new payoff: we evolve the harness around it, so the loop gets cheaper and more accurate. Hard held-out success rises from 23.5% to 44.3%, with fewer calls and less wall-clock time.

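For scale, here are the absolute and relative gains implied by the two hard held-out success rates. This is simple arithmetic on the reported numbers and nothing more:

```latex
\begin{aligned}
44.3\% - 23.5\% &= 20.8 \text{ percentage points} \\
\frac{44.3\%}{23.5\%} &\approx 1.89\times
\end{aligned}
```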

KD (@Reveur_7)

Does it generalize? On LIBERO-PRO, 3,000 perturbed trials we never trained on, RHO hits 45.0% after only 63 generations: about 2.5× the strongest agentic baseline and 3.5× better than π0.5. OpenVLA scores 0.0% 😱

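A back-of-the-envelope reading of these numbers. It assumes the 45.0% is measured over all 3,000 trials and that "2.5×" and "3.5× better" are ratios of success rates. The thread does not give the baselines' raw scores, so the last two values are only roughly implied:

```latex
\begin{aligned}
0.450 \times 3000 &= 1350 \text{ successful trials} \\
45.0\% / 2.5 &= 18.0\% \quad (\text{implied strongest agentic baseline}) \\
45.0\% / 3.5 &\approx 12.9\% \quad (\text{implied } \pi_{0.5})
\end{aligned}
```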

metr0x (@metrox_eth)

@Reveur_7 Sim-only?


Samian (@ApplyWiseAi)

@Reveur_7 Interesting: moving the search offline to training time could make runtime way more predictable. How do they handle tasks the frontier just hasn't seen before?


Alex Izydorczyk (@aleksizy)

@matei_zaharia My main challenge in using GEPA on practical agentic problems in my field (investment research and forecasting) has been cost: a single agentic run can cost ~$40, so I've found costs add up very quickly.


BoldHyun (@techvibekorea)

@Reveur_7 [TECH] RHO: evolved policy repos > neural nets, running on unseen scenes. Fresh direction for robotics


Futuro Martinez (@AIEnthusiastIM)

@Reveur_7 Genetic evolution + code = a revolution in robot policies! Moving LLM exploration to training time is real efficiency. Get ready for applications in advanced manufacturing.