AI is moving beyond text, images, and code.
Engineering artifacts are becoming a new class of model output, and evaluating them requires different tools than the ones we use for text, code, or images.
Today we're excited to release CADGenBench, a benchmark for CAD generation and editing.
- Given an engineering drawing → generate a valid 3D CAD model
- Given a STEP file + change request → edit it correctly
The benchmark is tool-agnostic: any CAD stack works (Fusion, Onshape, build123d, SolidWorks, etc.). Submissions are simply STEP files.
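Because submissions are plain STEP files, a quick structural check before uploading can catch truncated or mislabeled exports. The sketch below is a hypothetical helper (`looks_like_step` is our own name, not part of CADGenBench or its scoring engine). It only inspects the ISO 10303-21 (STEP Part 21) envelope: the `ISO-10303-21;` opening line, `HEADER;` and `DATA;` sections closed by `ENDSEC;`, and the `END-ISO-10303-21;` terminator. It says nothing about geometry or CAD validity.

```python
def looks_like_step(text: str) -> bool:
    """Hypothetical sanity check for the outer structure of a STEP Part 21 file.

    Checks only the file envelope; it does not parse entities or validate geometry.
    """
    stripped = text.strip()
    # Part 21 files open and close with fixed keywords.
    if not stripped.startswith("ISO-10303-21;"):
        return False
    if not stripped.endswith("END-ISO-10303-21;"):
        return False
    # A HEADER section must come before the DATA section.
    header = stripped.find("HEADER;")
    data = stripped.find("DATA;")
    if header == -1 or data == -1 or data < header:
        return False
    # Each section is terminated by ENDSEC;, so expect at least two.
    return stripped.count("ENDSEC;") >= 2


# Minimal illustrative file (not a real part).
SAMPLE = """ISO-10303-21;
HEADER;
FILE_DESCRIPTION(('example'),'2;1');
FILE_NAME('part.step','',(''),(''),'','','');
FILE_SCHEMA(('AUTOMOTIVE_DESIGN'));
ENDSEC;
DATA;
#1=CARTESIAN_POINT('',(0.,0.,0.));
ENDSEC;
END-ISO-10303-21;
"""

print(looks_like_step(SAMPLE))               # True
print(looks_like_step("solid cube\nendsolid"))  # False: an STL file, not STEP
```

For a real export, read the file with something like `pathlib.Path("part.step").read_text(errors="ignore")` and pass the result in; a file that passes this check can still fail the benchmark's CAD-validity scoring.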
Models are scored on:
* geometric accuracy
* topology correctness
* interface compatibility
* CAD validity
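This post does not say how geometric accuracy is computed. As one illustration of the general idea, a common way to compare two shapes is the symmetric Chamfer distance between points sampled from their surfaces. The sketch below is an assumption for illustration only, not CADGenBench's metric: it takes already-sampled point lists (sampling a STEP surface requires a CAD kernel, which is out of scope here) and uses brute-force nearest neighbours.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two point samples.

    Illustrative only; not the CADGenBench scoring engine.
    Sum of the mean nearest-neighbour distance from a to b and from b to a.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # For each source point, distance to its closest destination point (O(n*m)).
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
candidate = [(0.0, 0.0, 0.0)]
print(chamfer_distance(reference, reference))  # 0.0: identical samples
print(chamfer_distance(reference, candidate))  # 0.5: candidate misses the point at x=1
```

Variants differ in whether they use squared distances and whether the two directions are summed or averaged; for large samples a spatial index such as `scipy.spatial.cKDTree` replaces the brute-force search. A distance like this captures shape deviation but not topology or interfaces, which is why those are scored separately.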
The benchmark is open, the ground truth is private, and the leaderboard is live.
Since CAD evaluation is surprisingly subtle, here's how the metrics work 🧵
Introducing CADGenBench: measure how well AI systems produce engineering-grade 3D parts!
While current models can generate 3D parts, they are far from precise enough to produce functional ones. We built a benchmark to systematically measure their capabilities on two tasks:
1. Generation: given an engineering drawing of a part, produce the 3D model
2. Editing: given an existing STEP file and a requested change, apply the change
The benchmark is tool-agnostic: it makes no assumptions about how you build the model. You can vary the LLM, and you can vary the environment: use build123d, Onshape, Autodesk, or a pipeline with no LLM at all. We open-sourced the scoring engine and a reference baseline built on build123d.
A collaboration between Hugging Face and @mecadoinc!
Submission space: https://huggingface.co/spaces/HuggingAI4Engineering/CADGenBench
Code repository: https://github.com/huggingface/cadgenbench


