This is very cool if true. Bad boi will be a monster at prefill.
Introducing Low-Voltage Inference (LVI) for high throughput workloads.
Today, AI chips can't scale FLOPs without thermal throttling.
As FLOPs utilization increases, AI chips draw more power and throttle their clock speed. This often leaves sustained inference throughput at under half of peak FLOPs.
Chips in other industries solve the power problem by running at lower voltages. Bitcoin miners run at under one-third the voltage of AI chips!
We’ve designed a new architecture to run our chip’s math blocks at under half the voltage of most AI chips. This enables multiple times the FLOPs density of AI chips today.
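A rough way to see why voltage matters so much: in the standard first-order CMOS model, dynamic switching power scales as P ≈ C·V²·f, so energy per operation scales with V². The sketch below is only that textbook approximation, not the architecture described here. The function names and the 0.8 V and 0.4 V figures are illustrative assumptions, and the model ignores leakage power and the lower clock speeds that reduced voltage usually forces.

```python
import math


def dynamic_power(c_eff: float, voltage: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = C * V^2 * f (leakage ignored)."""
    return c_eff * voltage ** 2 * freq_hz


def energy_per_op_ratio(v_new: float, v_ref: float) -> float:
    """Energy per switching event at v_new relative to v_ref (scales as V^2)."""
    return (v_new / v_ref) ** 2


# Hypothetical supply voltages, chosen only to illustrate "half the voltage".
V_REF = 0.8  # assumed reference voltage, not a figure from the post
V_LOW = 0.4  # half of the reference

ratio = energy_per_op_ratio(V_LOW, V_REF)
ops_per_joule_gain = 1.0 / ratio

print(f"Energy per op at half voltage: {ratio:.2f}x the reference")
print(f"Operations per joule: {ops_per_joule_gain:.1f}x the reference")
```

Under this simplified model, halving the voltage cuts switching energy per operation to about a quarter, which is the headroom that could be spent on packing more active math blocks into the same power and thermal budget.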
We can run trillion-parameter sparse MoEs at 80%+ of peak FLOPs without thermal throttling.
Running LVI requires co-designing the entire cluster from the transistor to the token: new splittable math arrays, circuit techniques, novel tiling and scheduling algorithms, power delivery networks, VRM architectures, advanced packaging, cold plate designs, and more.


