Entropix creator _xjdr releases performance metrics for public GLM 5.2 service after processing 300 billion tokens

The deployment maintained a 41% cache hit rate.

Original post

xjdr@_xjdr

final GLM 5.2 served stats:
~12000 unique api keys served
~300B tokens total
232 tok/s/gpu output average
431 tok/s/gpu output max sustained
2.1 sec TTFT average (1M ctx)
61 sec p95 TTFT (1M ctx)
81k tok average input size
41% cache hit rate
0 chat logs kept (dont be evil)

thanks again everyone and hopefully you found the service and tokens useful
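For a rough sense of scale, the posted figures can be combined into two derived numbers. This is a back-of-the-envelope sketch, not something the post reports: the function names are hypothetical, the inputs are the approximate values quoted in the post, and the second calculation assumes the 41% cache hit rate is measured per input token, which the post does not specify.

```python
# Figures copied from the post; the source marks several of them as approximate ("~").
UNIQUE_API_KEYS = 12_000      # ~12000 unique api keys served
TOTAL_TOKENS = 300e9          # ~300B tokens total
AVG_INPUT_TOKENS = 81_000     # 81k tok average input size
CACHE_HIT_RATE = 0.41         # 41% cache hit rate


def tokens_per_key(total_tokens: float, keys: int) -> float:
    """Mean number of tokens served per API key."""
    return total_tokens / keys


def uncached_input_tokens(avg_input_tokens: float, hit_rate: float) -> float:
    """Input tokens per average request that miss the cache.

    ASSUMPTION: the hit rate is a per-token fraction of the input.
    If it was measured per request instead, this estimate does not apply.
    """
    return avg_input_tokens * (1.0 - hit_rate)


if __name__ == "__main__":
    print(f"tokens per key: {tokens_per_key(TOTAL_TOKENS, UNIQUE_API_KEYS):,.0f}")
    print(
        "uncached input tokens per average request: "
        f"{uncached_input_tokens(AVG_INPUT_TOKENS, CACHE_HIT_RATE):,.0f}"
    )
```

Under these assumptions the average works out to about 25 million tokens per key, and roughly 47.8k of the 81k average input tokens would need a full prefill on each request.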

xjdr@_xjdr

Today marks the end of the free GLM 5.2 with ncode. i hope y'all enjoyed the tokens and found some of our tools useful.

10:57 PM · Jun 29, 2026 · 240 Views

Sentiment

Users are praising @_xjdr for casually delivering 300B frontier-level tokens via a beefy local inference setup after the GLM 5.2 free API service ended.

Pos: 100.0%

Neg: 0.0%

3 comments with sentiment.

Posts from X

dax@thdxr

@_xjdr why cache hit so low?

xjdr@_xjdr

@thdxr 🤷 . we had some pretty abusive use in the first 48 hours which i think skewed the stats quite a bit on cache hits and ttft p95 latency (but clever enough to be hard to weed out of the reports)

xlr8harder@xlr8harder

@_xjdr this was on nvl72?

xjdr@_xjdr