DynaFLIP fuses vision, language, and 3D motion to outperform DINOv2 and SigLIP on robot policies

The framework was trained on 260K video trajectories.
