Are VLMs nearing saturation on vision benchmarks?
Not on WorldBench: 2,000 carefully curated and verified questions over a visually diverse range of images, designed to be hard for frontier models.
The strongest model still scores only 64%.
Led by @DavidYin0609 and @harishkrik
