We just released PhysicalRealismBench-U, a fully attributable benchmark for testing whether VLMs actually understand physics in programmatically generated videos.
This is an important step toward models that understand physics and generate physically realistic outputs.
Best result across 9 frontier models: 57.7% realism F1.
Read our blog post here: https://reka.ai/news/physicalrealismbench-attributable-physical-realism-evaluation-for-video-world-models
Visit the benchmark: https://link.reka.ai/physical-realism-benchmarks-VLM

