@willccbb Yeah, that was the point I intended to develop in the next post. As equipment lags, many cos have GPUs lying around, but more in the 16-32 H100 range.
@Dorialexander as with all big MoEs, economical serving kicks in at fairly high batch size
32 or 64 GPUs is common
