Stanford researchers release EXPO-FT, an open-source method for RL finetuning of Vision-Language-Action models

It reached 30/30 success on all 8 real-world manipulation tasks tested, using 19 minutes of RL data on average.

The light routing task is pretty cool. Nice mix of force + precision + long horizon.

2:17 PM · May 27, 2026

EXPO-FT builds on EXPO (https://arxiv.org/abs/2507.07986)

The first idea of EXPO is to train a small Gaussian policy, to edit the VLA's actions.

We also continuously distill successful trajectories into the base VLA.
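
A minimal sketch of the action-editing idea, assuming the edit policy outputs a per-dimension Gaussian mean and that each sampled edit is clipped and scaled so it stays small. The name `edit_action`, the `scale` bound, and the clipping rule are illustrative assumptions, not the EXPO-FT implementation.

```python
import random

def edit_action(base_action, edit_mean, edit_std, scale=0.1, rng=random):
    """Return base_action plus a small, bounded Gaussian edit.

    All names and the clipping rule are hypothetical, for illustration only.
    """
    edited = []
    for a, mean in zip(base_action, edit_mean):
        delta = rng.gauss(mean, edit_std)       # sample the edit
        delta = max(-1.0, min(1.0, delta))      # clip it to [-1, 1]
        edited.append(a + scale * delta)        # keep the edit small
    return edited

rng = random.Random(0)
base = [0.2, -0.5, 0.0]   # stand-in for an action proposed by the VLA
print(edit_action(base, [0.0, 0.0, 0.0], 1.0, rng=rng))
```

With `scale=0.1`, every edited dimension stays within 0.1 of the base action, so the small policy can only nudge what the VLA proposes.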

Chelsea Finn (@chelseabfinn)

How can VLAs achieve 95+% reliability?

Using RL post-training with EXPO-FT:
- π0.5 improves to 30/30 success on all 8 tasks tested
- uses only 19 min of RL data on average

Paper & videos: https://pd-perry.github.io/expo-ft/

12:06 AM · May 28, 2026 · 7.5K Views
12:06 AM · May 28, 2026 · 959 Views

Perry Dong (@perryadong)

Introducing EXPO-FT – Efficient, Reliable & Open-Source VLA Finetuning!

EXPO-FT unlocks π0.5 for challenging manipulation tasks:
- Routing string lights & inserting the power connector to illuminate them
- Striking pool ball into pocket
- Inserting flower into wine bottle

(1/5)

7:46 AM · May 27, 2026 · 18.8K Views

The second idea of EXPO is to maximize Q-values on the fly, with best-of-N sampling.

It's important to do this both at test time, and when sampling actions for the Q-function targets.
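
A rough sketch of best-of-N action selection, assuming a scalar Q-function and some sampler that proposes candidate actions. The same selection is used both to act and to pick the next action inside the TD target, as the post describes. `best_of_n`, `td_target`, `toy_q`, and `toy_sampler` are hypothetical names; the toy Q-function and uniform sampler stand in for the learned critic and the policy.

```python
import random

def best_of_n(q_fn, state, sample_action, n=8):
    """Draw n candidate actions and return the one with the highest Q-value."""
    candidates = [sample_action(state) for _ in range(n)]
    return max(candidates, key=lambda a: q_fn(state, a))

def td_target(q_fn, reward, next_state, done, sample_action, gamma=0.99, n=8):
    """One-step TD target whose next action is also chosen by best-of-N."""
    if done:
        return reward
    next_action = best_of_n(q_fn, next_state, sample_action, n)
    return reward + gamma * q_fn(next_state, next_action)

# Toy stand-ins (assumptions): a 1-D action, a Q-function that prefers
# actions close to the state, and a uniform sampler in place of the policy.
def toy_q(state, action):
    return -(action - state) ** 2

rng = random.Random(0)

def toy_sampler(state):
    return rng.uniform(-1.0, 1.0)

print(best_of_n(toy_q, 0.3, toy_sampler))              # action used at test time
print(td_target(toy_q, 1.0, 0.3, False, toy_sampler))  # target used for training
```

Sharing `best_of_n` between acting and target computation keeps the critic's targets consistent with the behavior that is actually executed.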

Project led by @perryadong and @khhung906, with @TianGao_19, @DorsaSadigh @StanfordAILab

Check out the paper and website for many more details and cool robot videos! 🤖 https://pd-perry.github.io/expo-ft/ https://arxiv.org/abs/2605.25477

Chelsea Finn (@chelseabfinn)

EXPO-FT extends EXPO to fine-tune VLAs in the real world, using image observations, action chunking, and DAgger data.

Compared to past methods, EXPO-FT
- reaches higher reliability with less data
- handles wider set of initial states

12:06 AM · May 28, 2026 · 578 Views
12:06 AM · May 28, 2026 · 540 Views
