
xjdr, Entropix creator, says the smallest setup he has used for real disaggregated K2.6 inference is 12x8xB200, and that he is unsure of the true minimum

Fine-tuning took about six weeks on a full GB300 NVL72, which now also hosts the model

Original post
xjdr @_xjdr · #609 in AI

not sure the minimum required but i can tell you i used a full gb300nvl72 for ft (took ~6 weeks) and now i am hosting it on the same cluster . the minimum i've used to run in real disagg for a meaningful number of tokens is 12x8xB200 (8 prefill x 4 decode) but ideally you have much more

vik @vikhyatk

@_xjdr how much infra do you need to finetune / host K2.6?

3:06 PM · Jun 9, 2026 · 1.1K Views
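
The GPU count behind "12x8xB200 (8 prefill x 4 decode)" can be worked out with a short sketch. It assumes one reading of the post that the post itself does not spell out: 12 nodes of 8 B200 GPUs each, with 8 nodes serving prefill and 4 serving decode. The function name `gpu_split` and the constants are hypothetical, introduced only for illustration.

```python
# Assumed reading of "12x8xB200 (8 prefill x 4 decode)":
# 12 nodes with 8 B200 GPUs each, split into 8 prefill nodes and 4 decode nodes.
GPUS_PER_NODE = 8
PREFILL_NODES = 8
DECODE_NODES = 4


def gpu_split(prefill_nodes: int, decode_nodes: int, gpus_per_node: int):
    """Return (total, prefill, decode) GPU counts for a disaggregated setup."""
    prefill = prefill_nodes * gpus_per_node
    decode = decode_nodes * gpus_per_node
    return prefill + decode, prefill, decode


total, prefill, decode = gpu_split(PREFILL_NODES, DECODE_NODES, GPUS_PER_NODE)
print(f"total={total} prefill={prefill} decode={decode}")
```

Under that reading, the setup is 96 GPUs in total, 64 on prefill and 32 on decode, which is more GPUs than the 72 in the single GB300 NVL72 mentioned for fine-tuning.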
Posts from X
vik @vikhyatk

@_xjdr alrighty looks like i'm lowering my ambitions to laguna xs.2

xjdr @_xjdr

@vikhyatk its a beast. even in fp4, it takes quite a lot flop and quite a bit of HBM to train and run properly
