AI · 18h ago

SOLE-R1 Model Delivers Zero-Shot Rewards for Robot Manipulation via Online RL

Thomas Weng (@thomas_weng)

We found that state-of-the-art VLMs (Gemini, GPT-5, etc.) fail at predicting task progress for online RL, so we built our own: SOLE-R1.

SOLE-R1 is trained on 10 million images and video frames, and 4 million chain of thought traces that reason over both space and time.

The result is a video-language reasoning model that can be used as a reward for online RL with no other reward signals!

1:37 PM · Jun 3, 2026 · 3.6K Views
