Microsoft trained three DeepSeek-V3 sized models just for funsies and you are wondering if there's a compute gap between US and China
lmao
The final version uses 35 billion active parameters.
@scaling01 Tbh "for funsies" here means "basically restarted their entire research program to figure out the training of modern LLMs"
Positive users highlight Microsoft's rare global scale, while negative users dismiss the MAI development efforts as failing to surpass DeepSeek and wasting resources.