Part of me wants to say a REAP prune of MiniMax M2.7 on a single NVIDIA GB10 unit (Spark, GX10); speed is usable this way, but not crazy. The other part of me says a single 5090 build running Qwopus 27B v2 or similar is also an excellent bet, purely because of its memory bandwidth. The catch is that big MoEs like MiniMax don't fit in the 5090's 32 GB of VRAM, so the PCIe bottleneck kills performance on them.
In my opinion and for my use cases, none of them are fully at Opus 4.5 level on every front yet, even though they benchmark similarly, but they're much closer than I ever would have anticipated at this point! I'd guess they match its capability on 60-70% of common LLM queries.
I expect that in the very near future we'll see small, use-case-specific dense models running on a single 5090 with near-SOTA performance for that use case, likely coding first.
I hope that helps lol. It really depends on what you want to get into: the GB10 units are absurdly complete and efficient out of the box and can run much larger models thanks to their 128 GB of unified memory, but the 5090's memory bandwidth is hard to argue against when these sub-40B dense models are getting so good.
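To put rough numbers on the bandwidth point: single-stream decode speed is mostly capped by how fast the active weights can be read from memory for each token. Below is a minimal back-of-envelope sketch, assuming ~1792 GB/s for the 5090 and ~273 GB/s for GB10 (approximate spec-sheet figures, double-check them), ~4-bit weights, and a hypothetical MoE with 10B active params. The function name and all model sizes are illustrative assumptions, and it ignores KV-cache reads and overhead, so treat the results as ceilings, not benchmarks.

```python
# Back-of-envelope, bandwidth-bound ceiling on single-stream decode speed.
# Assumption: each generated token reads all *active* weights from memory once.
# KV-cache traffic, activations, and kernel overhead are ignored, so real speeds are lower.

def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/s = memory bandwidth / bytes of weights read per token."""
    gb_per_token = active_params_b * bytes_per_param  # billions of params * bytes/param = GB
    return bandwidth_gb_s / gb_per_token

# Approximate published memory bandwidths (assumed values; verify against current spec sheets).
RTX_5090_GB_S = 1792.0
GB10_GB_S = 273.0

Q4 = 0.5  # ~4-bit quantization, ignoring per-block scale overhead

if __name__ == "__main__":
    # Dense ~27B model: every weight is active on every token (~13.5 GB at 4-bit, fits in 32 GB).
    print(f"27B dense on 5090: {decode_ceiling_tok_s(RTX_5090_GB_S, 27, Q4):.0f} tok/s ceiling")
    print(f"27B dense on GB10: {decode_ceiling_tok_s(GB10_GB_S, 27, Q4):.0f} tok/s ceiling")
    # Hypothetical MoE with ~10B active params per token (illustrative figure only).
    print(f"MoE, 10B active, on GB10: {decode_ceiling_tok_s(GB10_GB_S, 10, Q4):.0f} tok/s ceiling")
```

Under these assumptions the dense 27B ceiling comes out around 133 tok/s on the 5090 vs about 20 on GB10, while the 10B-active MoE lands around 55 on GB10. That's roughly why the MoE is usable on GB10 (all its weights fit in unified memory) but not on a 5090, where most of the weights wouldn't fit in VRAM in the first place.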
So basically, I don't have a perfect answer, but you can't go wrong buying a 5090 right now if you can get one, and you really can't go wrong buying a GB10 either! The real answer depends on which gets the big leap first: ~200B MoEs or sub-40B dense models. The other nice thing about GB10s is that you can cluster two of them and run seriously huge models.
Anyways, I'm rambling now lol