/Tech3h ago

Reka AI launches PhysicalRealismBench-U to evaluate VLM physical reasoning, with GPT-5.5 leading at 57.7%

No tested model demonstrated reliable physical reasoning in video.

Original post

Reka@RekaAILabs

We just released PhysicalRealismBench-U — a benchmark for testing whether VLMs actually understand physics in programmatically generated videos, fully attributable.

This is an important step toward models that understand and generate physically realistic outputs.

Best result across 9 frontier models: 57.7% realism F1.

Read our blog post here: https://reka.ai/news/physicalrealismbench-attributable-physical-realism-evaluation-for-video-world-models

Visit the benchmark: https://link.reka.ai/physical-realism-benchmarks-VLM

7:04 AM · Jun 11, 2026 · 1.1K Views
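
The posts do not say how "realism F1" is computed. For rough orientation only, here is a generic binary F1 in Python. It assumes that each video carries a single realistic/unrealistic label and that "physically unrealistic" is the positive class. The function name and labeling scheme are hypothetical; this is not Reka's scoring code.

```python
# Hypothetical illustration: generic binary F1, NOT Reka's published scoring code.
# Assumption: each video is labeled 1 ("physically unrealistic") or 0 ("realistic"),
# and a model's verdicts are compared against those labels.

def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Return the F1 score for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 1, 0, 1]
    preds = [1, 0, 1, 0, 1, 1, 0, 0]
    print(f"F1 = {f1_score(labels, preds):.3f}")  # tp=3, fp=1, fn=2 -> F1 = 0.667
```

Because F1 ignores true negatives, a model cannot score well just by calling every video "realistic." It has to actually flag the unrealistic ones (recall) without raising false alarms (precision).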

Sentiment

One commenter praises Reka's PhysicalRealismBench-U as important for VLM progress, while another calls a "14%" baseline rough, although the announcement reports a best result of 57.7% realism F1.

Positive: 50.0% · Negative: 50.0% (2 comments with sentiment)

Cluster Engagement

Posts from X (most activity)

Views: 259 · Bookmarks: 1 · Likes: 2 · Retweets: 1

Mikel Artetxe@artetxem

We're releasing a new benchmark for evaluating whether VLMs can detect, localize, and explain physical violations in video.

We see large gaps across frontier models, with GPT-5.5 as the clear winner, but none are close to reliable yet.

Learn more: https://reka.ai/news/physicalrealismbench-attributable-physical-realism-evaluation-for-video-world-models

Quoting Reka@RekaAILabs

2h ago · 259 views · 2 likes · 1 retweet

Strata@ChainZenit

@RekaAILabs that actually sounds like a super important benchmark for vlm progress.

3h ago · 6

Rugbist@rugbist_

@RekaAILabs so youre saying best VLM got a 14%? thats rough for the baseline lol