Datadog releases Toto 2.0 time series foundation models
Datadog released Toto 2.0, a family of five open-weights time series foundation models spanning 4 million to 2.5 billion parameters. All five models were trained from a single hyperparameter configuration, carried across sizes via u-μP scaling. They are available under the Apache 2.0 license on Hugging Face, with inference code on GitHub and integration examples for GluonTS. Toto 2.0 achieves state-of-the-art results on the BOOM, GIFT-Eval, and TIME benchmarks, and forecast quality improves reliably as parameter count increases.
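For readers who want to poke at the weights, here is a minimal sketch of fetching a checkpoint from the Hub. Only the standard huggingface_hub API is used; the forecasting entry points themselves live in Datadog's GitHub inference code and are not assumed here.

```python
# Minimal sketch: download the open Toto 2.0 weights from Hugging Face.
# snapshot_download is the standard huggingface_hub call; the model's
# forecasting API is documented in Datadog's inference repo, not shown here.
from huggingface_hub import snapshot_download

# Apache 2.0 checkpoint; swap in "Datadog/Toto-2.0-4m" for the smallest model.
local_dir = snapshot_download(repo_id="Datadog/Toto-2.0-2.5B")
print(f"Checkpoint files downloaded to {local_dir}")
```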
Are scaling laws finally working for time series foundation models?
Today, @datadoghq is releasing the Toto 2.0 weights under Apache 2.0 on @huggingface. It's a family of open-weights TSFMs from 4M to 2.5B parameters, where every size beats the last from a single hyperparameter config. It ranks first across the leading benchmarks: BOOM, GIFT-Eval, and TIME.
Most TSFM families ship multiple sizes that all perform roughly the same. This one doesn't.
Why it matters: scaling laws gave language and vision a predictable relationship between compute, data, parameters, and downstream performance. Time series hasn't had that curve until now. Once you have it, you can scale data and compute with confidence, and start asking which new capabilities emerge at the next order of magnitude.
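Concretely, the curve in question is usually a power law relating eval loss to parameter count, loss(N) ≈ a·N^(-α) + c. Here is a toy sketch of fitting one with scipy; the sizes and losses are synthetic placeholders, not Toto 2.0's actual results.

```python
# Toy illustration of fitting a scaling law: loss(N) = a * N**(-alpha) + c.
# All numbers below are synthetic placeholders, NOT Toto 2.0 benchmark data.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    return a * n ** (-alpha) + c

sizes = np.array([4e6, 40e6, 150e6, 750e6, 2.5e9])  # five model sizes (spacing invented)
losses = np.array([0.80, 0.62, 0.54, 0.47, 0.43])   # synthetic eval losses

(a, alpha, c), _ = curve_fit(power_law, sizes, losses, p0=[10.0, 0.3, 0.3])
print(f"fit: loss(N) = {a:.2f} * N^(-{alpha:.3f}) + {c:.3f}")

# Once such a fit holds, you can budget the next order of magnitude in advance:
print(f"predicted loss at 25B params: {power_law(25e9, a, alpha, c):.3f}")
```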
2.5B open-source weights: https://huggingface.co/Datadog/Toto-2.0-2.5B
4M open-source weights: https://huggingface.co/Datadog/Toto-2.0-4m
Blogpost: https://www.datadoghq.com/blog/ai/toto-2/?utm_content=blog&utm_medium=organicsocial

It is well known that models show step changes in performance at ~7B and ~30B parameters. I'd be very interested to know how this scaling holds up at those sizes.
Really, really great work by the team!
Today we’re releasing Toto 2.0: a family of open-weights time series foundation models spanning 4M to 2.5B parameters. The question we set out to answer was simple (yet previously open): Do time series foundation models get reliably better as they scale? Our answer: yes! 🧵
Datadog released a cool u-μP wrapper that makes distributed training less painful.
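For context on what a u-μP wrapper buys you: in μP-style parameterizations, hyperparameters tuned once at a small base width transfer to wider models because per-layer learning rates (and initializations) are rescaled with width. Below is a simplified PyTorch sketch of the learning-rate side of that idea, under the standard Adam prescription; it is a generic μP illustration, not Datadog's wrapper or its API.

```python
# Generic sketch of μP-style learning-rate scaling (NOT Datadog's u-μP wrapper).
# The base learning rate is tuned once at BASE_WIDTH and reused at any width;
# layers whose fan-in grows with width get their Adam lr scaled by 1/mult.
import torch
import torch.nn as nn

BASE_WIDTH = 256
WIDTH = 2048                  # the wider model we actually train
mult = WIDTH / BASE_WIDTH     # width multiplier

model = nn.Sequential(
    nn.Linear(16, WIDTH),     # input layer: fan-in fixed, lr unchanged
    nn.ReLU(),
    nn.Linear(WIDTH, WIDTH),  # hidden layer: fan-in grows with width
    nn.ReLU(),
    nn.Linear(WIDTH, 1),      # readout layer: fan-in grows with width
)

base_lr = 3e-3                # tuned once at BASE_WIDTH
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": base_lr},
    {"params": model[2].parameters(), "lr": base_lr / mult},  # lr scaled by 1/width
    {"params": model[4].parameters(), "lr": base_lr / mult},  # lr scaled by 1/width
])
# Full μP / u-μP also rescales initialization variances and activation
# multipliers so tensor scales stay O(1); omitted here for brevity.
```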
