15h ago

GeoCodeBench Shows GPT-5 Scoring Just 36.6% on 3D Vision Coding Tasks

2141229511.2K

——0——

Original post

#CVPR2026 Can frontier LLMs write PhD-level 3D vision code? We introduce GeoCodeBench, a benchmark that asks models to read real 3D geometric vision papers and implement core functions. Best result so far: GPT-5 reaches only 36.6%. This suggests that scientific coding in 3D vision remains far from solved. Paper: https://arxiv.org/pdf/2603.30038 Project: https://geocodebench.github.io/

2:35 AM · May 19, 2026