Analyst Zephyr puts a 3.5-year breakeven on tinygrad's $150,000 Blackwell setup, said to run GLM 5.2 at 120 tokens per second

Story Overview

A pseudonymous analyst is spotlighting a compact local rig built from two tinygrad Blackwell tinyboxes that supposedly delivers 120 tokens per second on GLM 5.2, framing the $150,000 purchase as a potential long-term alternative to renting cloud GPUs.

Original post

Zephyr @zephyr_z9

3.5 year breakeven

the tiny corp @__tinygrad__

I have on good authority that GLM 5.2 is running at 120 tok/s across two networked Blackwell tinyboxes. $150k and that setup can be yours, either 2x tinybox or 1x tinybox pro. Never pay the cloud again.

3:47 PM · Jun 20, 2026 · 65.9K Views

Cost Pressure

Breakeven math stays fuzzy without full logs

The cited 3.5-year payback includes opportunity costs and power but rests on the unconfirmed 120 t/s speed and unspecified cloud pricing assumptions, leaving the actual timeline open to debate.
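
One way to see how much the 3.5-year figure depends on cloud pricing is to solve for the yearly cloud bill the rig would have to replace. The sketch below assumes flat yearly costs and no resale value, and it borrows the $12k to $15k per year ownership cost (opportunity cost plus electricity) from Simo Ryu's reply in the thread. The function name is ours and is purely illustrative.

```python
def implied_cloud_spend(hardware_cost: float,
                        payback_years: float,
                        ownership_cost_per_year: float) -> float:
    """Yearly cloud bill the hardware must replace to pay for itself in payback_years.

    Solves hardware_cost = payback_years * (cloud - ownership) for cloud.
    Assumes flat yearly costs and ignores resale value.
    """
    return hardware_cost / payback_years + ownership_cost_per_year


# Ownership cost range taken from Simo Ryu's estimate in the thread (assumption).
for ownership in (12_000, 15_000):
    spend = implied_cloud_spend(150_000, 3.5, ownership)
    print(f"ownership ${ownership:,}/yr -> cloud spend ${spend:,.0f}/yr")
```

Under those assumptions, a 3.5-year payback implies replacing roughly $55k to $58k of cloud spend every year; a lighter cloud user would wait considerably longer.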

Open Question

Hardware specs check out on paper

Each tinybox packs four RTX Pro 6000 Blackwell GPUs (96 GB each) for 384 GB of VRAM per box, or 768 GB across the two networked boxes, yet no public traces or third-party runs have surfaced to back the 120 tok/s figure on this setup.

Sentiment

Some users are enthusiastic about GLM 5.2's speed on the $150K tinybox setup and about future compute incentives, while most others call it a poor investment, citing high costs, rapid depreciation, and cheaper alternatives.

Positive: 25.4%

Negative: 74.6%

Based on 18 comments with sentiment.

Posts from X

Max @scalinglayers

@zephyr_z9 Should ANT/OAI really be worth 10x+ more than Z AI?

Simo Ryu @cloneofsimo

> Spend 150k$ immediately -> this is same as spending 7.5k$/year implicitly assuming 5~6% T-Bill. Close to 12k$ if you typically invest in S&P 500.

> 2$/hour electricity price (this is true in south korea, seoul, can be different for other part of the world). -> this is straight up 4.5k$ / year, if you work 40hours/week.

> insanely hot living room, occasionally electricity shortage, cost of installation (rent isnt free)

So You are looking at 12k~15k$ / year + your inconvenience.

But yes, it is true you get to never pay the cloud again (until GLM stop releasing models or tinybox breaks, which id assume happens in 10 years max)
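
Simo Ryu's yearly estimate can be reproduced from the inputs he lists: forgone returns on $150k (5% for T-bills, and about 8% implied by his S&P 500 figure), $2/hour electricity, and 40 hours of use per week. The 52-week year is our assumption, not his.

```python
HARDWARE_COST = 150_000      # USD, price of the rig from the post
ELECTRICITY_PER_HOUR = 2.0   # USD/hour, Seoul figure from Simo Ryu's reply
HOURS_PER_WEEK = 40          # usage assumed in the reply
WEEKS_PER_YEAR = 52          # our assumption


def annual_ownership_cost(annual_return: float) -> float:
    """Forgone investment return on the hardware plus electricity, in USD/year."""
    opportunity_cost = HARDWARE_COST * annual_return
    electricity = ELECTRICITY_PER_HOUR * HOURS_PER_WEEK * WEEKS_PER_YEAR
    return opportunity_cost + electricity


for label, rate in [("T-bills (5%)", 0.05), ("S&P 500 (~8%)", 0.08)]:
    print(f"{label}: ${annual_ownership_cost(rate):,.0f}/year")
```

With a 52-week year the electricity term comes to $4,160 rather than the quoted $4.5k, so the total lands at roughly $11.7k to $16.2k per year, in the same ballpark as his $12k to $15k.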

eyonce @jakuhy

@__tinygrad__ hmm. I think Ill stick to the $20 monthly plan for now

Reno @RenosBlog

@__tinygrad__ $75k? Brother, we are not all Elon Musk here you know.

Isaac Clark @bruhr3aly

@__tinygrad__ Okay $150K.

GLM 5.2 API is about $4.40/m output $1.40 input.

I could hammer GLM 5.2 for ***years*** before that box paid itself off.

Unless someone ***needs*** private compute I have no idea who is buying these lol
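
Isaac Clark's point can be made concrete by converting the $150k into API tokens at his quoted output price ($4.40 per million tokens) and asking how long the rig would need to generate that many tokens at the claimed 120 tok/s. This sketch makes three simplifying assumptions: it counts output tokens only, treats 120 tok/s as the rig's total throughput running 24/7, and ignores electricity.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> float:
    """Number of tokens a budget buys at a per-million-token API price."""
    return budget_usd / usd_per_million_tokens * 1_000_000


def years_to_generate(tokens: float, tokens_per_second: float) -> float:
    """Years of nonstop generation needed to produce `tokens` at a fixed rate."""
    return tokens / tokens_per_second / SECONDS_PER_YEAR


# Inputs from the thread: $4.40 per million output tokens, 120 tok/s claimed.
output_tokens = tokens_for_budget(150_000, 4.40)
print(f"{output_tokens / 1e9:.1f}B output tokens")
print(f"{years_to_generate(output_tokens, 120):.1f} years at 120 tok/s, 24/7")
```

On those simplifications, $150k buys about 34.1B output tokens, which the rig would need roughly nine years of nonstop generation to produce. Batching several requests or counting input tokens would shorten that; idle time lengthens it.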

Kevin Cho @chokevinjs

@__tinygrad__ 75k rma might be the death of the local ai maxxer

Tom Turney @no_stp_on_snek

@__tinygrad__ Only 150k guys

thestreamingdev() @thestreamingdev

@zephyr_z9 pointless - it’ll be an outdated model by then. or running at cerebras

sg123 @Siggis_Man

@__tinygrad__ No thanks, I'd rather not turn my apartment into a sauna

sg123 @Siggis_Man

@__tinygrad__ I'll just wait a couple years till we get GLM 5.2 level models that can run on my macbook

Marcus Tallhamn @tallhamn

@__tinygrad__ Is that 2 boxes with 4 RTX 6k each?

Joe Crowley @crowleylab

@__tinygrad__ how many instances can you inference in parallel on exabox

Phi Browser @phibrowser

@__tinygrad__ the 'never pay the cloud' math only works at high utilization. $150k of hardware beats the cloud the moment you're running inference around the clock, and loses badly if you're not. owning vs renting compute is a bet on how constantly you'll actually use it.
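
Phi Browser's utilization argument can be sketched by letting both the avoided cloud bill and the electricity bill scale with utilization while the opportunity cost stays fixed. The $60k/year cloud value and $10k/year power cost at full utilization are hypothetical round numbers chosen only for illustration; the $7.5k fixed cost is the T-bill figure from Simo Ryu's reply.

```python
def payback_years(utilization: float,
                  hardware_cost: float = 150_000,
                  cloud_value_full: float = 60_000,  # hypothetical: cloud spend replaced at 100% use, USD/year
                  power_cost_full: float = 10_000,   # hypothetical: electricity at 100% use, USD/year
                  fixed_cost: float = 7_500) -> float:  # opportunity cost at 5%, per Simo Ryu
    """Payback time when the avoided cloud bill and power bill both scale with utilization.

    Returns infinity when yearly net savings are zero or negative.
    """
    net = utilization * (cloud_value_full - power_cost_full) - fixed_cost
    return hardware_cost / net if net > 0 else float("inf")


for u in (0.1, 0.25, 0.5, 1.0):
    print(f"{u:.0%} utilization: {payback_years(u):.1f} years")
```

With these inputs, payback runs from about 3.5 years at full utilization to 30 years at 25%, and at or below 15% utilization the net savings never cover the fixed cost, so the box never pays for itself.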
