Chuang Gan and the UMass team open-source Action Images, representing 7-DoF robot actions as visual data

Original post

We open-source Action Images — a new representation that translates 7-DoF robot actions into interpretable images.

Video models are emerging as powerful robotic foundation models, but a key challenge remains: how can we seamlessly integrate robot policies into video models?

Instead of representing actions as low-dimensional control tokens, Action Images provide a pixel-grounded action representation, reframing policy learning as a visual tracking problem!

By unifying observations and actions in the same video space, Action Images enable a unified robotics world model that supports video-action joint generation, action-conditioned video generation, and action labeling!

Code: http://github.com/UMass-Embodied-AGI/ActionImages Paper: https://arxiv.org/abs/2604.06168
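
To make "pixel-grounded action representation" concrete, here is a minimal toy sketch and not the encoding used in the paper or the ActionImages repository. It assumes a hypothetical action layout of (x, y, z, roll, pitch, yaw, gripper) in the camera frame, a pinhole camera with made-up intrinsics, and a single-channel image. The end-effector position is projected to a pixel and drawn as a Gaussian blob, the gripper value scales the blob's brightness, and orientation is ignored. Reading the action back then reduces to finding the blob, which is the sense in which control resembles visual tracking.

```python
import numpy as np


def project_point(xyz, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates (u, v)."""
    x, y, z = xyz
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return fx * x / z + cx, fy * y / z + cy


def render_action_image(action, height=64, width=64, fx=60.0, fy=60.0, sigma=2.0):
    """Render a toy single-channel action image.

    action: 7 floats in a hypothetical (x, y, z, roll, pitch, yaw, gripper) layout.
    Only position and gripper are drawn; orientation is ignored in this sketch.
    """
    action = np.asarray(action, dtype=np.float64)
    if action.shape != (7,):
        raise ValueError("expected a 7-DoF action vector")

    # Assumed principal point: the image center.
    cx, cy = width / 2.0, height / 2.0
    u, v = project_point(action[:3], fx, fy, cx, cy)

    # Gaussian blob centered on the projected end-effector pixel.
    cols, rows = np.meshgrid(np.arange(width), np.arange(height))
    blob = np.exp(-((cols - u) ** 2 + (rows - v) ** 2) / (2.0 * sigma ** 2))

    # Gripper in [0, 1] scales brightness: closed -> 0.5, open -> 1.0.
    gripper = float(np.clip(action[6], 0.0, 1.0))
    return (0.5 + 0.5 * gripper) * blob


def track_peak(image):
    """Recover the (u, v) pixel of the brightest point, i.e. 'track' the action."""
    row, col = np.unravel_index(np.argmax(image), image.shape)
    return int(col), int(row)
```

For example, `track_peak(render_action_image([0.1, -0.05, 1.0, 0, 0, 0, 1.0]))` recovers the pixel nearest the projected end-effector position under these assumed intrinsics.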

11:15 AM · Jun 24, 2026 · 7.6K Views

Views: 5.6K · Bookmarks: 1 · Likes: 14 · Retweets: 2 · Replies: 2

Robert Scoble @Scobleizer

The computer scientists are figuring out how to make robots do things.

Vincent @InsiderPresider

@Scobleizer clanker post?

Robert Scoble @Scobleizer

@InsiderPresider I'm not a clanker yet. :-)

Mugen @joseph_mugen

@Scobleizer this is pretty next level
