
Entropix creator _xjdr argues provider tokens-per-second claims are unreproducible without standardized reporting context

Omitted metrics include time-to-first-token latency and tokens per GPU.
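To make the missing context concrete, here is a minimal sketch of how a single tok/s figure splits into several different numbers once time-to-first-token and GPU count are reported alongside it. Everything in it is an illustrative assumption rather than any provider's actual method: the `summarize_run` helper, the `RunReport` fields, and the simplification that all concurrent requests stream at the same rate are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunReport:
    ttft_s: float                # time from request start to first token
    decode_tok_per_s: float      # rate after the first token arrives
    end_to_end_tok_per_s: float  # rate including the wait for the first token
    tok_per_gpu_per_s: float     # aggregate throughput divided by GPU count


def summarize_run(
    request_start_s: float,
    token_times_s: Sequence[float],
    num_gpus: int,
    concurrent_requests: int = 1,
) -> RunReport:
    """Summarize one streamed response from token arrival timestamps.

    Assumption (hypothetical): every concurrent request streams at the
    same rate as the one measured, so aggregate throughput is simply
    per-request throughput times concurrent_requests.
    """
    if not token_times_s:
        raise ValueError("need at least one token timestamp")
    if num_gpus < 1 or concurrent_requests < 1:
        raise ValueError("num_gpus and concurrent_requests must be >= 1")

    n = len(token_times_s)
    ttft = token_times_s[0] - request_start_s
    total = token_times_s[-1] - request_start_s
    decode_window = token_times_s[-1] - token_times_s[0]

    # n - 1 inter-token gaps fall inside the decode window.
    decode_rate = (n - 1) / decode_window if n > 1 and decode_window > 0 else 0.0
    end_to_end = n / total if total > 0 else 0.0
    per_gpu = end_to_end * concurrent_requests / num_gpus

    return RunReport(ttft, decode_rate, end_to_end, per_gpu)


if __name__ == "__main__":
    # Five tokens: the first after 0.5 s, then one every 0.25 s.
    times = [0.5 + 0.25 * i for i in range(5)]
    print(summarize_run(0.0, times, num_gpus=2, concurrent_requests=4))
```

The same run yields a decode rate, an end-to-end rate, and a per-GPU rate that all differ, so a bare tok/s number cannot be reproduced unless it says which of these it is, along with the batch size and hardware.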


Original post

xjdr (@_xjdr) · #830 in Tech

when people or providers say 'this model gets <x> tok/s' i honestly have no idea what they mean (and as such can never reproduce / verify said claims)

2:15 PM · Jun 27, 2026 · 3.4K Views

Sentiment

Users see higher tokens-per-second rates as a useful reference point for judging AI model speed and context handling on devices like Macs.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)


Posts from X

Most activity: 198 views, 7 likes

xlr8harder@xlr8harder

@_xjdr tok/s? at what tokens/gpu/s and at what ttft and ...


Zach Mueller@TheZachMueller

@_xjdr Cards will always have that 🫡 but yeah it’s like “okay is it special kernels? Spec decode? Quantized and not telling us? Congrats, I think?”


RasputinKaiser@RasputinKaiser

I mean, one with higher tokens per second is speedy, and it uses context quicker as well lol. It’s a good reference point if it runs 50 tok/s on a MacBook M2, but let’s say I have the M1: I know it’s POSSIBLE I can get similar, but I know it’s an older gen, so I can assume 30-40 tok/s and decide if it’s worth downloading
