
CMU Robotics researcher Gokul Swamy releases SAILOR, an imitation learning framework that helps robots recover from errors using 10x less data

It steers frozen VLAs using real-time natural language corrections


Original post

Gokul Swamy@g_k_swamy

Interaction (with the VLA and the world) is *fundamentally* necessary here, as it is hard to know what command elicits correct VLA behavior a priori. Thus, the game becomes how we can do interactive fine-tuning without an infeasible exhaustive search over prompts. [2/n]

Gokul Swamy@g_k_swamy

*Very* excited about this new paper -- we introduce a principled pipeline for training a closed-loop "elicitor" policy that robustly steers a VLA towards success. Plus, if there's a Hitchhiker's Guide reference in the title, you know the paper is gonna be good :p. [1/n]

1:41 PM · Jun 18, 2026 · 220 Views

Sentiment

Users praise the new paper, which trains language feedback policies to steer VLAs in closed loop, as an incredibly impressive first paper of a PhD, citing the author's strong work ethic and technical depth.

Pos: 100.0%

Neg: 0.0%

1 comment with sentiment.


Via GOKUL.DEV

Posts from X

Views: 2K · Bookmarks: 11 · Likes: 17 · Replies: 1

Gokul Swamy@g_k_swamy

Hyun Joe@hyunjoej

Excited to share the first paper of my PhD!

If you’ve ever tried to control a VLA via natural language, you know it rarely does what it is told. 🗣️ We introduce a multi-stage pipeline for training a Language Feedback Policy (LFP) to steer a VLA in-the-loop.


Retweets: 2

Andrea Bajcsy@andrea_bajcsy

When a VLA fails, it's *so hard* to tell whether it lacks the capability or was just prompted poorly.🤔

Joe showed that you can learn a language feedback policy that elicits the capabilities present in a frozen VLA, while refusing to steer when the VLA is incapable of the task!


Gokul Swamy@g_k_swamy

We propose a method for doing a tractable "local search" (similar to what we explored in https://gokul.dev/sailor/) in the space of natural language, before using expert iteration to train our "language feedback policy" (LFP). [3/n]
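The idea in this post (search locally for a better instruction, then train a policy to imitate what the search found) can be illustrated with a toy sketch. This is not the paper's algorithm: `local_search`, `expert_iteration`, `toy_neighbors`, `toy_score`, the vocabulary, and the tabular "policy" are all hypothetical stand-ins. In a real setting the score would come from rolling out the VLA, and the imitation step would be supervised training of the LFP rather than a dictionary update.

```python
from typing import Callable, Dict, List, Tuple

def local_search(start: str,
                 neighbors: Callable[[str], List[str]],
                 score: Callable[[str], float],
                 max_steps: int = 10) -> Tuple[str, float]:
    """Greedy hill climbing over prompts: move to the best neighbor while it improves."""
    current, best = start, score(start)
    for _ in range(max_steps):
        candidates = neighbors(current)
        if not candidates:
            break
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best:
            break  # no neighbor improves on the current prompt
        current, best = top, top_score
    return current, best

def expert_iteration(default_prompts: List[str],
                     neighbors: Callable[[str], List[str]],
                     score: Callable[[str, str], float],
                     rounds: int = 2) -> Dict[str, str]:
    """Alternate search and imitation. The dict is a tabular stand-in for a trained policy."""
    policy: Dict[str, str] = {}
    for _ in range(rounds):
        for task in default_prompts:
            start = policy.get(task, task)  # search starts from the policy's current output
            improved, _ = local_search(start, neighbors, lambda p: score(task, p))
            policy[task] = improved  # "imitate" the search result
    return policy

# Hypothetical toy problem: a prompt scores higher when it contains helpful words.
VOCAB = ["slowly", "left", "gently"]
HELPFUL = {"pick up the cup": {"slowly", "gently"}}

def toy_neighbors(prompt: str) -> List[str]:
    """Neighbors are the prompt with one unused vocabulary word appended."""
    return [f"{prompt} {w}" for w in VOCAB if w not in prompt.split()]

def toy_score(task: str, prompt: str) -> float:
    """Stand-in for a rollout-based success estimate."""
    return float(sum(w in HELPFUL[task] for w in prompt.split()))

policy = expert_iteration(["pick up the cup"], toy_neighbors, toy_score)
print(policy)
```

The search only ever evaluates one-word edits of the current prompt, which is what keeps it tractable compared with an exhaustive search over prompts.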


Gokul Swamy@g_k_swamy

We can perform this conformalization process with limited data as predicting whether improvement is *possible* is often easier than knowing the *precise* language sequence that will elicit the improvement. [5/n]


Gokul Swamy@g_k_swamy

However, this LFP can generalize poorly OOD. In response, we use techniques from conformal prediction to figure out when to "fall back" to the default user prompt, making sure steering is "mostly harmless." [4/n]
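One common way to get this kind of "fall back" rule is split conformal calibration, sketched below under stated assumptions. The thread does not say which conformal procedure the paper uses, so this is only an illustration: `conformal_threshold`, `choose_prompt`, and the "improvement score" (a hypothetical predictor of whether steering can help) are assumptions, not the authors' implementation. The threshold is calibrated on held-out cases where steering did not help, so that an exchangeable new case of that kind triggers steering with probability at most `alpha`.

```python
import math
from typing import List

def conformal_threshold(unhelpful_scores: List[float], alpha: float) -> float:
    """Split-conformal threshold from scores on calibration cases where steering did NOT help.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score, or
    infinity when there are too few calibration points to certify level alpha.
    """
    n = len(unhelpful_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # not enough data: never steer
    return sorted(unhelpful_scores)[k - 1]

def choose_prompt(score: float, threshold: float,
                  steered_prompt: str, default_prompt: str) -> str:
    """Steer only when the improvement score strictly exceeds the calibrated threshold."""
    return steered_prompt if score > threshold else default_prompt

# Hypothetical calibration scores from episodes where steering did not help.
threshold = conformal_threshold([0.3, 0.1, 0.2], alpha=0.5)
print(choose_prompt(0.9, threshold, "grasp the handle gently", "pick up the cup"))
print(choose_prompt(0.15, threshold, "grasp the handle gently", "pick up the cup"))
```

With very little calibration data the threshold becomes infinite and the rule always returns the default user prompt, which is the conservative behavior one would want from a "mostly harmless" steering policy.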


Gokul Swamy@g_k_swamy

Anyhow, I think this is an *incredibly* impressive first paper of the PhD for @hyunjoej, who continually impressed me with his work ethic and depth of thought. I can't recommend working with him enough :). Check out our website for more: https://hyunjoe.xyz/LanguagePolicy/.
