Parametric CAD Bench from @gNucleusAI tasks agents with writing CAD code for 3D parts from a natural language description. mini-swe-agent does really well on it, surpassing Claude Code and Gemini CLI, even when used with their own base models. Cool task!
Supportive replies praise mini-swe-agent's performance leap on the CAD benchmark, while critical replies read its lower cost relative to Claude Code as a setback for Anthropic's CLI team.
@gNucleusAI There's another cool CAD benchmark that recently came out, by @MikushRab. Check it out here:
Introducing CADGenBench: measure how well AI systems produce engineering-grade 3D parts!
While current models can generate 3D parts, they are far from precise enough to build functional parts. We built a benchmark to systematically measure their capabilities on two tasks:
1. Generation from an engineering drawing of a part
2. Editing: given an existing STEP file and a requested change
The benchmark is tool-agnostic. It makes no assumptions about how you build the model. You can vary the LLM, and you can vary the environment. Use build123d, Onshape, Autodesk, or a model without an LLM entirely. We open sourced the scoring engine and a reference baseline on top of build123d.
A collaboration between Hugging Face and @mecadoinc!
Submission space: https://huggingface.co/spaces/HuggingAI4Engineering/CADGenBench
Code repository: https://github.com/huggingface/cadgenbench
@gNucleusAI Read more about Parametric CAD Bench at: https://cadbench.ai/

@OfirPress @gNucleusAI mini swe agent at $42 beating claude code at $73 on its own base model is a rough day for anthropic's cli team

@OfirPress @gNucleusAI that is a wild leap in performance, huge props to the team

@OfirPress @gNucleusAI ngl, giving an agent parametric CAD tasks is a good stress test
way harder than just writing markdown lol