
Mini-Swe-Agent Outperforms Claude Code And Gemini CLI On CAD Benchmark


Original post

Parametric CAD Bench from @gNucleusAI tasks agents with writing CAD code for 3D parts from a natural-language description. mini-swe-agent does really well on it, surpassing Claude Code and Gemini CLI even when it is run on each tool's own base model. Cool task!

9:13 AM · Jun 13, 2026 · 2.4K Views

Sentiment

Positive commenters praise mini-swe-agent's performance leap on the CAD benchmark, while negative commenters see a cheaper agent beating Claude Code as a setback for Anthropic's CLI team.

Positive: 50.0% · Negative: 50.0% (2 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity · Views: 1.7K

Ofir Press @OfirPress

@gNucleusAI There's another cool CAD benchmark that came out recently, by @MikushRab. Check it out here:

Michael Rabinovich @MikushRab

Introducing CADGenBench: measure how well AI systems produce engineering-grade 3D parts!

While current models can generate 3D parts, they are far from precise enough to build functional parts. We built a benchmark to systematically measure their capabilities on two tasks:

1. Generation: given an engineering drawing of a part
2. Editing: given an existing STEP file and a requested change

The benchmark is tool-agnostic. It makes no assumptions about how you build the model. You can vary the LLM, and you can vary the environment. Use build123d, Onshape, Autodesk, or a model without an LLM entirely. We open sourced the scoring engine and a reference baseline on top of build123d.

A collaboration between Hugging Face and @mecadoinc!

Submission space: https://huggingface.co/spaces/HuggingAI4Engineering/CADGenBench
Code repository: https://github.com/huggingface/cadgenbench


Bookmarks: 1 · Likes: 3 · Retweets: 2 · Replies: 1

Ofir Press @OfirPress

@gNucleusAI Read more about Parametric CAD Bench at: https://cadbench.ai/


Lunari @0x_lun

@OfirPress @gNucleusAI mini swe agent at $42 beating claude code at $73 on its own base model is a rough day for anthropic's cli team


Strata @ChainZenit

@OfirPress @gNucleusAI that is a wild leap in performance, huge props to the team


Alex YGift @Radipdegen

@OfirPress @gNucleusAI ngl, giving an agent parametric CAD tasks is a good stress test

way harder than just writing markdown lol