xjdr, Entropix creator, says the smallest setup used so far for disaggregated LLM inference is 12x8xB200 GPUs

Fine-tuning took about six weeks on a full GB300 NVL72 cluster

Original post
xjdr@_xjdr · #645 in Tech

not sure the minimum required but i can tell you i used a full gb300nvl72 for ft (took ~6 weeks) and now i am hosting it on the same cluster . the minimum i've used to run in real disagg for a meaningful number of tokens is 12x8xB200 (8 prefill x 4 decode) but ideally you have much more

vik@vikhyatk

@_xjdr how much infra do you need to finetune / host K2.6?

3:06 PM · Jun 9, 2026 · 4.7K Views
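
The GPU counts behind "12x8xB200 (8 prefill x 4 decode)" can be worked out in a few lines. This is only a sketch under two assumptions that the post does not spell out: that "12x8" means 12 nodes with 8 B200 GPUs each, and that "8 prefill x 4 decode" means 8 prefill nodes plus 4 decode nodes (8 + 4 = 12). The function and variable names are made up for illustration.

```python
# Assumption: "12x8xB200" = 12 nodes x 8 B200 GPUs per node.
# Assumption: "8 prefill x 4 decode" = 8 prefill nodes + 4 decode nodes.
GPUS_PER_NODE = 8
PREFILL_NODES = 8
DECODE_NODES = 4

# The "72" in GB300 NVL72 is the number of GPUs in one rack.
NVL72_GPUS = 72


def gpu_split(prefill_nodes: int, decode_nodes: int, gpus_per_node: int) -> dict:
    """Return node and GPU totals for a prefill/decode disaggregated setup."""
    prefill_gpus = prefill_nodes * gpus_per_node
    decode_gpus = decode_nodes * gpus_per_node
    return {
        "nodes": prefill_nodes + decode_nodes,
        "prefill_gpus": prefill_gpus,
        "decode_gpus": decode_gpus,
        "total_gpus": prefill_gpus + decode_gpus,
    }


split = gpu_split(PREFILL_NODES, DECODE_NODES, GPUS_PER_NODE)
print(split)
print("GPUs in one GB300 NVL72 rack:", NVL72_GPUS)
```

Under these assumptions the serving setup comes to 96 B200 GPUs (64 for prefill, 32 for decode), compared with 72 GPUs in the single GB300 NVL72 rack used for fine-tuning.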
Sentiment

The one comment with sentiment calls K2.6 a beast: even in FP4, it takes a lot of FLOPs and HBM to train and run properly.

Pos 100.0% · Neg 0.0% · 1 comment with sentiment
Cluster Engagement
Posts from X
Most Activity
Views 1.1K · Likes 27 · Replies 1
vik@vikhyatk

@_xjdr alrighty looks like i'm lowering my ambitions to laguna xs.2

xjdr@_xjdr

not sure the minimum required but i can tell you i used a full gb300nvl72 for ft (took ~6 weeks) and now i am hosting it on the same cluster . the minimum i've used to run in real disagg for a meaningful number of tokens is 12x8xB200 (8 prefill x 4 decode) but ideally you have much more

1d · Views 1.1K · Likes 27 · Bookmarks 0
Bookmarks 1
xjdr@_xjdr

@vikhyatk its a beast. even in fp4, it takes quite a lot flop and quite a bit of HBM to train and run properly

1d · Views 1K · Likes 21 · Bookmarks 1