SOLE-R1 Model Delivers Zero-Shot Rewards for Robot Manipulation via Online RL

Original post
Chris Paxton #732
Thomas Weng (@thomas_weng)

We found that state-of-the-art VLMs (Gemini, GPT-5, etc.) fail at predicting task progress for online RL, so we built our own: SOLE-R1.

SOLE-R1 is trained on 10 million images and video frames, and 4 million chain of thought traces that reason over both space and time.

The result is a video-language reasoning model that can be used as a reward for online RL with no other reward signals!

1:37 PM · Jun 3, 2026 · 3.6K Views