*Very* excited about this new paper -- we introduce a principled pipeline for training a closed-loop "elicitor" policy that robustly steers a VLA towards success. Plus, if there's a Hitchhiker's Guide reference in the title, you know the paper is gonna be good :p. [1/n]
Interaction (with the VLA and the world) is *fundamentally* necessary here, as it is hard to know what command elicits correct VLA behavior a priori. Thus, the game becomes how we can do interactive fine-tuning without an infeasible exhaustive search over prompts. [2/n]
