#CVPR2026 Can frontier LLMs write PhD-level 3D vision code?
We introduce GeoCodeBench, a benchmark that asks models to read real 3D geometric vision papers and implement their core functions.
Best result so far: GPT-5 reaches only 36.6%.
This suggests that scientific coding in 3D vision remains far from solved.
Paper: https://arxiv.org/pdf/2603.30038
Project: https://geocodebench.github.io/