Datadog releases Toto 2.0 time series foundation models

Datadog released Toto 2.0, a family of five open-weights time series foundation models spanning 4 million to 2.5 billion parameters. All five models were trained from a single hyperparameter configuration, carried across sizes with u-μP scaling. They are available under the Apache 2.0 license on Hugging Face, with inference code on GitHub and integration examples for GluonTS. Toto 2.0 achieves state-of-the-art results on the BOOM, GIFT-Eval, and TIME benchmarks, with forecast quality improving reliably as parameter count increases.
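For readers who want to try the release, the sketch below shows one plausible way to pull the published weights from Hugging Face and score a forecaster with a GluonTS evaluation harness. Only the repository id comes from the announcement; the GluonTS calls assume a recent release (≥ 0.14), and the seasonal-naive stand-in marks where the real Toto 2.0 predictor from Datadog's inference code would slot in, since that entry point's API isn't documented in this post.

```python
# Minimal sketch, not Datadog's documented workflow. Real pieces: the
# Hugging Face repo id and the GluonTS evaluation harness. Hypothetical
# piece: the stand-in predictor, which you would replace with the Toto 2.0
# predictor from Datadog's inference code on GitHub.
from huggingface_hub import snapshot_download
from gluonts.dataset.repository import get_dataset
from gluonts.evaluation import Evaluator, make_evaluation_predictions
from gluonts.model.seasonal_naive import SeasonalNaivePredictor

# Fetch the released Apache 2.0 weights from Hugging Face.
checkpoint_dir = snapshot_download("Datadog/Toto-2.0-2.5B")

# Public benchmark dataset, used here only to exercise the harness.
dataset = get_dataset("m4_hourly")

# Stand-in so the script runs end to end; swap in the real Toto predictor
# loaded from `checkpoint_dir` once you have Datadog's inference package.
predictor = SeasonalNaivePredictor(
    prediction_length=dataset.metadata.prediction_length,
    season_length=24,  # hourly data -> daily seasonality
)

forecast_it, ts_it = make_evaluation_predictions(
    dataset.test, predictor=predictor, num_samples=100
)

# Aggregate probabilistic and point metrics across all series.
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
agg_metrics, per_series = evaluator(ts_it, forecast_it)
print(agg_metrics["mean_wQuantileLoss"], agg_metrics["MASE"])
```

The same harness works for any GluonTS-compatible predictor, which is presumably why the release ships GluonTS integration examples.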

Original post

Toto 2.0 is here: Datadog AI's 5 open-weights forecasting models (4M-2.5B params) finally make scaling work for time series forecasting! #1 on BOOM, GIFT-Eval, and TIME. Weights/code Apache 2.0. 🔗 Read the blog post for more details: https://bit.ly/4tCFvKL

7:20 AM · May 14, 2026

Are scaling laws finally working for time series foundation models?

Today, @datadoghq is releasing Toto 2.0 weights in Apache 2.0 on @huggingface. It's a family of open-weights TSFMs from 4M to 2.5B parameters, where every size beats the last from a single hyperparameter config. First across the leading benchmarks: BOOM, GIFT-Eval, and TIME.

Most TSFM families ship multiple sizes that all perform roughly the same. This one doesn't.

Why it matters: scaling laws gave language and vision a predictable relationship between compute, data, parameters, and downstream performance. Time series hasn't had that curve until now. Once you have it, you can scale data and compute with confidence, and start asking which new capabilities emerge at the next order of magnitude.

2.5B open-source weights: https://huggingface.co/Datadog/Toto-2.0-2.5B
4M open-source weights: https://huggingface.co/Datadog/Toto-2.0-4m

Blogpost: https://www.datadoghq.com/blog/ai/toto-2/?utm_content=blog&utm_medium=organicsocial

6:24 PM · May 14, 2026 · 19.6K Views
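The scaling-law framing in the post above is the substantive claim of the release, so it is worth being concrete about what such a curve is. The sketch below, with synthetic numbers rather than Datadog's measurements and with hypothetical intermediate model sizes, fits the standard saturating power law of evaluation loss against parameter count; once a model family lands on a fitted curve like this, loss at the next order of magnitude becomes an extrapolation rather than a gamble.

```python
# Illustrative scaling-law fit on SYNTHETIC data; none of these numbers
# are Datadog's. Form: L(N) = a * N**(-b) + c, loss vs. parameter count.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

# Five model sizes; only 4M and 2.5B are stated in the post, the middle
# three are made up for illustration.
params = np.array([4e6, 40e6, 160e6, 640e6, 2.5e9])
rng = np.random.default_rng(0)
loss = power_law(params, 20.0, 0.25, 0.35) + rng.normal(0.0, 0.002, params.size)

# Fit the power law, then extrapolate one step up the size ladder.
(a, b, c), _ = curve_fit(power_law, params, loss, p0=[10.0, 0.2, 0.3])
print(f"fit: L(N) = {a:.2f} * N^(-{b:.3f}) + {c:.3f}")
print(f"extrapolated loss at 7B params: {power_law(7e9, a, b, c):.4f}")
```

A family "where every size beats the last" is exactly the monotone-in-N behavior such a fit predicts.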

it is well known that models show step changes in performance at 7B and ~30B - I'd be very interested to know how this scaling holds up at those sizes

really really great work to the team!

5:38 PM · May 14, 2026 · 6.7K Views

Ameet Talwalkar @atalwalkar

Today we’re releasing Toto 2.0: a family of open-weights time series foundation models spanning 4M to 2.5B parameters.

The question we set out to answer was simple (yet previously open): Do time series foundation models get reliably better as they scale?

Our answer: yes! 🧵

3:08 PM · May 14, 2026 · 49.2K Views