/AI · 7h ago

DynaFLIP, a dynamics-guided vision encoder, outperforms static backbones such as DINOv2 and SigLIP by 22.5% in real-world out-of-distribution robot learning evaluations

It is trained on 260K robot and human trajectories.



Original post

Chris Paxton (@chris_j_paxton) · #732 in AI

Motion understanding is key to robotics

Jusuk Lee (@jusukle)

Are you still running your robot policies on vision encoders trained purely on static images?

Nowadays, the standard practice in robot learning is to plug in powerful vision models like CLIP, SigLIP, or DINOv2. This inherits a quiet, convenient assumption: “Let mainstream computer vision handle perception, and the downstream policy will figure out the dynamics.”

But let’s be real for a moment. Is this truly the best we can do?

We introduce DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation.⬇️

🔷 Dynamics upstream: we push motion understanding into perception.
🔷 Tri-modal-dynamics supervision: image transitions × language × 3D flow, fused via simplex-volume alignment (260K trajectories from robot & human video)
🔷 Transfers everywhere: a visual backbone for diverse policies (MLP, Diffusion Policy, VLA)
🔷 +22.5% over the strongest baseline (DINOv2, SigLIP) under real-world OOD
🔷 Open-Source & easy to use
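For readers wondering what "simplex-volume alignment" could look like, here is a minimal sketch. It is an illustration, not DynaFLIP's actual loss. It assumes each modality (image transition, language, 3D flow) yields one embedding vector, and it treats the three unit-normalized embeddings as the vertices of a triangle (a 2-simplex). The triangle's area shrinks to zero as the embeddings agree, so it could serve as an alignment penalty. The function names `normalize`, `dot`, and `simplex_volume` are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def simplex_volume(img, txt, flow):
    """Area of the triangle whose vertices are the three unit embeddings.

    Assumption: this Gram-determinant form stands in for the paper's
    simplex-volume term; the real objective may differ.
    """
    a, b, c = normalize(img), normalize(txt), normalize(flow)
    u = [y - x for x, y in zip(a, b)]  # edge from image to language
    v = [y - x for x, y in zip(a, c)]  # edge from image to 3D flow
    # Determinant of the 2x2 Gram matrix of the edge vectors.
    gram_det = dot(u, u) * dot(v, v) - dot(u, v) ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return 0.5 * math.sqrt(max(gram_det, 0.0))


# Embeddings pointing the same way collapse the triangle.
aligned = simplex_volume([1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.5, 0.0, 0.0])
# Mutually orthogonal embeddings span a large triangle.
spread = simplex_volume([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
```

Under these assumptions, training would push the volume toward zero for matching image, language, and flow triplets, which pulls all three modalities together at once rather than pair by pair.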

🌐 Website: https://dynaflip-robotics.github.io
📄 Paper: https://arxiv.org/abs/2605.30350
💻 Code: https://github.com/JU-SUK/DynaFLIP
🤗 Hugging Face: https://huggingface.co/jlee-larr/dynaflip-base

10:32 AM · Jun 2, 2026 · 2.3K Views


Sentiment

Users praise DynaFLIP's dynamics-guided vision encoder for robot policies, saying it tackles real-world motion challenges with ideas like simplex minimization. Commenters also credit the team's strong collaboration.

Positive: 100.0%

Negative: 0.0%

3 comments with sentiment.

