
Ghostty creator Mitchell Hashimoto argues local AI needs Claude Opus-tier quality for widespread adoption

Drew Houston says running an Opus-tier open model fast costs up to $150,000 in hardware.



Original post

Mitchell Hashimoto @mitchellh · #1562 in Tech

We've gone really quickly from "local models are dogshit" to "local models are good actually" (like, a 12 month window from A to B). I don't think they're actually good ENOUGH yet. We need an Opus 4.5 quality local model. When that happens, I think the world will spill over.

Opus 4.5 is/was amazing, and is more than good enough for almost all tasks still as long as you pair with a frontier-level planner/judge.

It'll still require a hugely expensive machine to run it, I'm sure, like a $5K-or-more laptop or Mac Studio. But that's going to be pennies compared to the API costs, plus you get all the benefits of guaranteed privacy and so on.

12:05 PM · Jun 16, 2026 · 100.7K Views
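Mitchell's "pennies compared to the API costs" point is a break-even question. A minimal sketch, assuming a flat monthly API bill, a flat monthly running cost (power and the like) for the local machine, and that the local model fully replaces the API spend; the function name and every number below are hypothetical, not figures from the post.

```python
import math


def months_to_break_even(hardware_cost: float,
                         monthly_api_spend: float,
                         monthly_local_cost: float) -> float:
    """Months until a one-time hardware purchase beats paying for API tokens.

    Hypothetical model: both monthly costs are flat and the local machine
    replaces all API usage. Returns math.inf if local never pays off.
    """
    monthly_savings = monthly_api_spend - monthly_local_cost
    if monthly_savings <= 0:
        return math.inf
    return hardware_cost / monthly_savings


# Hypothetical: $5K machine, $600/month API bill, $100/month to run it locally.
print(months_to_break_even(5_000, 600, 100))  # 10.0
```

The same arithmetic frames the counterpoint later in the thread about cheaper open-model providers: with these hypothetical numbers, an 80% cut in the API bill (to $120/month) leaves only $20/month of savings, pushing break-even out to 250 months.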

Sentiment

Many users are excited about local AI models nearing Opus 4.5 quality because of gains in privacy, consistency, and lower long-term costs versus cloud options.

Positive: 85.7% · Negative: 14.3%

14 comments with sentiment.


Digg Deeper

Anthony Diké asked

What's the best open source / local setup currently? And what's the rough estimate of the cost?

Qwen3.6/3.7 MoE variants (or similar models such as Gemma 4 31B and Kimi K2.6) are among the strongest practical open-weight options for local use right now, runnable via Ollama, llama.cpp, or LM Studio on consumer or prosumer hardware. ^¹

High-end single-machine setups such as an RTX 5090 (32GB) or an M4 Mac Studio (128GB+ unified memory) deliver solid speeds for these efficient MoE models at roughly $4k–$10k total. ^²

For GLM-5.2-level (Opus-tier) quality, 2-bit quantized versions need ~240GB of memory and fit on high-RAM Apple Silicon machines or workstations (~$6k–$15k+), while fast inference (>100 tok/s) requires multi-GPU rigs such as 8x RTX 6000-class cards at $125k–$150k. ^³

Practical local frontier performance remains hardware-intensive and expensive beyond mid-tier setups.
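The memory figures above follow a common rule of thumb: weight memory ≈ parameter count × bits per weight ÷ 8. A minimal sketch, counting weights only (no KV cache, activations, or runtime overhead) and treating the quantization width as uniform, which real mixed-precision "2-bit" quants usually are not; the function name and the parameter counts in the example are hypothetical, not the sizes of any model named here.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB (1e9 bytes).

    Ignores KV cache, activations, and framework overhead, so real usage
    will be higher.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Hypothetical model sizes and quantization widths, weights only.
for params, bits in [(27e9, 4), (27e9, 16), (100e9, 2)]:
    gb = weight_memory_gb(params, bits)
    print(f"{params / 1e9:.0f}B params @ {bits}-bit ≈ {gb:.1f} GB")
```

This is why quantization width matters as much as parameter count when deciding whether a model fits in 32GB of VRAM versus 128GB+ of unified memory.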

Posts from X



derekmoeller @derekmoeller

@mitchellh I don't think it matters much whether they are local or not, versus being open source. Open-source models open up commodity, marginal-cost inference providers, which cuts token costs by 80%.

Running bigger models locally is expensive and won't be economically useful for a while.

8h


Matteo Collina @matteocollina

@mitchellh @anoopcodes Have you tried ds4 with deepseek v4 flash by @antirez? Based on my experience, we are very close if not already there.

I’m using it for 30% of my coding tasks, being mostly limited by speed and HW.

https://github.com/antirez/ds4

8h


saint @thetronchguy

@mitchellh >as long as you pair w/ a frontier-level planner/judge

i’m the frontier-level planner-judge lmfao

5h

Erdal @ErdalToprak

@mitchellh I combine GPT 5.5 with Codex 5.3 Spark for subagents, but recently I added my local Qwen 3.6 27B. Not only do I get more Codex usage, but I can give all credentials to this local subagent.

This article on how to set up subagents is useful.

8h

Mitchell Hashimoto @mitchellh

@derekmoeller Yeah that’s fair too

8h

MACHO Δ @0xTMK

@mitchellh
> Kimi K2.7 Code is superb
> GLM 5.2 seems to be Opus 4.6 level and amazing
> Qwen 3.7 Pro and Max are fantastic
> Deepseek V4 Pro (Max) finds bugs nothing else can

9h

Erdal @ErdalToprak

@antonioleivag @mitchellh This doc https://developers.openai.com/codex/config-advanced and a translation layer for the Responses format between vLLM and the Codex app server

8h

Drew Houston @drewhouston

@mitchellh Looking like GLM 5.2 is truly Opus-tier -- to run it fast (>100 tok/sec) you'll need 8x RTX 6000 Pros minimum ($125-150k), but it's achievable now


Luc @lucrbvi

@mitchellh Have you tried GLM-5.2? It seems really good

9h

tmo @tmophoto

@mitchellh Running AI on a laptop is a bad decision. It can run on an old gaming computer stuffed in a corner of your house, and you access the LLM over Tailscale.

7h

Lee Moore @leegmoore

@mitchellh I feel like Gemma 4 gets us between Sonnet 3.7 and Sonnet 4 in terms of raw coding. Still quite a bit short of Opus 4.5, but at one point many of us were excited about that level.

8h

Pedro @PedroRevezSilva

I think the quality offered by GPT 5.5 is actually an interesting place to be. When Fable 5 was available, it was probably better at some things, but the level of intelligence you need for 98% of tasks is probably GPT 5.5's. More intelligence than that is only necessary for very specific niche problems, I would say. Maybe when we have open models at the level of GPT 5.5 or Opus 4.8 (I'd prefer 5.5), it will be fine. I think that will be the turning point.

9h

Mitchell Hashimoto @mitchellh

@tmophoto Sure, I mean, wherever, as long as it's local.

7h

Greg Meyer @GMMeyer

@mitchellh You can run high-quality local models with $100K in hardware. I have a 5090 and I can't run Kimi locally at any reasonable speed.

However, we'll see it happen pretty soon.

8h

Jesse Ezell @jezell

@mitchellh Maybe when everyone has a $40,000 GPU in their MacBook and DRAM is as cheap as water.

9h

Anthony Ronning @anthonyronning

@derekmoeller @mitchellh GLM 5.2 is better than Opus 4.5 by benchmarks so we're already there essentially.

8h

nocrimez @nocrimez

@mitchellh What's the closest to Opus 4.5 on ~$5K hardware you've seen so far, @KyleHessling1? 🙏

8h

Kyle Hessling @KyleHessling1

Part of me wants to say a REAP prune of Minimax m2.7 on a single NVIDIA GB10 unit (Spark, Gx10); speed is usable this way, but not crazy. The other part of me would say that a single 5090 build with Qwopus 27B v2 or similar is also an excellent bet, just due to the nature of the memory bandwidth, but the PCIe bottleneck kills performance on big MoEs like Minimax.

None of them are fully Opus 4.5 level yet on every front, in my opinion and for my use cases, even though they benchmark similarly, but they're much closer than I would ever have anticipated at this point in time! They probably match its capability on 60-70% of common LLM queries.

I speculate we'll see small, use-case-specific dense models running on a single 5090 with near-SOTA performance for that use case, likely coding first, in the very near future.

I hope that helps lol. It really depends on what you want to get into: the GB10 units are absurdly complete and efficient out of the box and can run much larger models, but the memory bandwidth of the 5090 is hard to recommend against when these sub-40B dense models are getting so good.

So basically, I don't have a perfect answer, but you can't go wrong buying a 5090 right now if you can get one, and you really can't go wrong buying a GB10! The real answer will come down to which one gets a big leap first: ~200B MoEs or sub-40B dense models. The other nice thing about GB10s is that you can cluster 2 and run seriously huge models.

Anyways, I'm rambling now lol

8h
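Kyle's bandwidth argument can be made concrete with a common back-of-envelope bound: during decoding, each token has to stream the active weights through memory once, so tokens/sec is capped at roughly bandwidth ÷ bytes read per token. A minimal sketch, assuming weights that don't fit in fast memory (VRAM) spill to a much slower tier (system RAM or a PCIe link), that each token's reads are spread evenly across resident and spilled weights, and ignoring compute, KV cache, and overlap. The function name and every number in the example are hypothetical, not measurements of any card or model mentioned here.

```python
def decode_tok_s_upper_bound(total_weights_gb: float,
                             active_weights_gb: float,
                             fast_capacity_gb: float,
                             fast_bw_gb_s: float,
                             slow_bw_gb_s: float) -> float:
    """Bandwidth-only ceiling on decode speed, in tokens/sec.

    total_weights_gb: size of all weights (for an MoE, all experts).
    active_weights_gb: weights read per token (dense: all; MoE: active experts).
    Weights beyond fast_capacity_gb are assumed to sit in a slower tier, and
    per-token reads are assumed to hit each tier in proportion to its share.
    """
    spilled_fraction = max(0.0, total_weights_gb - fast_capacity_gb) / total_weights_gb
    fast_read_gb = active_weights_gb * (1.0 - spilled_fraction)
    slow_read_gb = active_weights_gb * spilled_fraction
    seconds_per_token = fast_read_gb / fast_bw_gb_s + slow_read_gb / slow_bw_gb_s
    return 1.0 / seconds_per_token


# Hypothetical card: 32 GB of fast memory at 1,000 GB/s; spill tier at 50 GB/s.
dense_fits = decode_tok_s_upper_bound(14, 14, 32, 1_000, 50)  # small dense model, fully resident
moe_spills = decode_tok_s_upper_bound(64, 8, 32, 1_000, 50)   # big MoE, half its weights spilled
print(f"dense, fits: ~{dense_fits:.0f} tok/s; MoE, spilled: ~{moe_spills:.0f} tok/s")
```

In this toy setup the MoE reads fewer bytes per token than the dense model, yet the spilled half dominates the time per token, which is the bottleneck Kyle describes. A large unified-memory box like the GB10 units he mentions sidesteps the spill by holding the whole model in one tier, at a lower peak bandwidth than a 5090.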

James Blackwell @jwblackwell

@mitchellh Why run locally, other than for privacy?

It will always be more efficient to run in a proper datacenter on the latest hardware.

5h

Scott Jordan @Authentic1ty

@mitchellh Uh, we already have those local models but they don't run on $5k setups.

5h