Creator bidhan releases Paris 2.0, a decentralized video model that beats monolithic training by 2x on FVD · Digg

Posts from X

Views: 71.3K · Bookmarks: 546 · Likes: 674 · Retweets: 52 · Replies: 15

Chubby♨️@kimmonismus

ByteDance just open-sourced one of the most capable multimodal models out there.

BAGEL does image generation, editing, style transfer, and visual understanding - all in a single 7B parameter model. Apache 2.0 licensed!

One model. No switching between specialized tools. Amazing

bidhan@bidhan

We're releasing Paris 2.0, which, to our knowledge, is the world's first decentrally trained video generation model.

We benchmarked it against a monolithic model trained on the same data and compute budget, and Paris 2.0 outperformed the monolithic baseline by ~2x on the FVD benchmark.

bidhan@bidhan

We're selectively releasing the Paris 2.0 weights and partnering with researchers and teams interested in diffusion-based video models, world models, and embodied agents. The model is on Hugging Face: https://huggingface.co/bageldotcom/paris2

bidhan@bidhan

Video diffusion backbones increasingly anchor the world models behind physical AI. Our early internal results show the Paris 2.0 recipe carries over to them very well. We will share more on that in future work.

Santiago@svpino

For the nerds out there:

The way this model was trained is pretty cool, and it's probably the first time this has been done for a video generation model.

Normally, if you want to train a model, you rent a huge GPU cluster and do it there.

But this model was trained differently:

The team trained different "experts" on separate GPU clusters, each with its own data. The GPUs were all different and didn't need to communicate with each other.

So instead of allocating 100 interconnected GPUs to train the model, they used 3 GPUs here, 5 there, 4 more over there, etc.

After training, they added a smart router on top of all the trained experts. This router's job is to take an inference request and route it to the appropriate experts.

In other words, instead of training a single model, they trained multiple smaller models that work together at inference time.

And the results were really good!

Here is the link to their paper: https://arxiv.org/pdf/2605.26064. It's a very good read.
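
A toy sketch of the pattern described in this post, not Paris 2.0 code: each "expert" is fit on its own data shard with no shared state, and a router picks an expert at inference time. The least-squares line fits stand in for diffusion models, and the shard contents, the names `fit_line`, `route`, and `predict`, and the hard-coded routing rule are all assumptions made for illustration.

```python
def fit_line(points):
    """Fit y ~ w*x + b by least squares on a single shard (no shared state)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    w = sxy / sxx
    return w, mean_y - w * mean_x

# Hypothetical data slices; each one would live on a different GPU pool.
shards = {
    "left": [(x / 10, 2.0 * (x / 10)) for x in range(-10, 0)],         # y = 2x
    "right": [(x / 10, -1.0 * (x / 10) + 1.0) for x in range(1, 11)],  # y = -x + 1
}

# Each expert only ever sees its own shard; nothing is exchanged between fits.
experts = {name: fit_line(points) for name, points in shards.items()}

def route(x):
    """Placeholder router: a fixed rule here, a learned model in a real system."""
    return "left" if x < 0 else "right"

def predict(x):
    w, b = experts[route(x)]
    return w * x + b

print(predict(-0.5), predict(0.5))
```

Because the fits never touch each other, the loop that builds `experts` could run on unrelated machines at different times, which is the property the post highlights.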

bidhan@bidhan

Paris 2.0 builds on Paris 1.0, which proved that image generation can be trained on a geographically distributed, heterogeneous pool of GPUs. Temporally coherent video under the same settings remained an open question. Paris 2.0 closes it.

Read the technical report on arxiv: https://arxiv.org/abs/2605.26064

bidhan@bidhan

Paris 2.0 is a Decentralized Diffusion Model (DDM). A DDM is an ensemble of independent diffusion models trained in isolation, each on its own slice of the data, with no gradients, parameters, or activations exchanged between them.

During inference, a lightweight router selects a subset of experts at each denoising step.
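
A minimal sketch of per-step expert selection, assuming top-k routing with softmax weights. The thread does not say how the Paris 2.0 router scores or combines experts, so `select_experts`, `denoise_step`, the toy experts, the toy router, and the simple subtraction update are all hypothetical stand-ins rather than the released method.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_experts(router_scores, k):
    """Return (index, weight) for the k highest-scoring experts, weights summing to 1."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def denoise_step(x, t, experts, router, k, step_size):
    """One simplified denoising update that runs only the selected experts."""
    picks = select_experts(router(x, t), k)
    pred = [0.0] * len(x)
    for i, weight in picks:
        out = experts[i](x, t)  # experts that were not selected are never called
        pred = [p + weight * o for p, o in zip(pred, out)]
    # Placeholder update rule, not a real diffusion sampler.
    return [xi - step_size * pi for xi, pi in zip(x, pred)]

# Toy experts and router, invented for this example.
experts = [
    lambda x, t: list(x),
    lambda x, t: [0.0 for _ in x],
    lambda x, t: [2.0 * xi for xi in x],
]
router = lambda x, t: [0.0, 0.0, -100.0]

x = [1.0, -2.0]
for t in reversed(range(3)):
    x = denoise_step(x, t, experts, router, k=2, step_size=0.1)
print(x)
```

Calling the router inside the loop is what makes the selection per-step: a different subset of experts can be active at each noise level.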

Aditya ⚡Rao@adityarao310

@kimmonismus Your automated AI poster made a mistake. ByteDance doesn't own Bagel Labs. BAGEL is a different project under ByteDance.

bidhan@bidhan

With Paris 2.0, our goal was only to match a monolithic model's benchmarks while training on a distributed, heterogeneous pool of GPUs. It beat them instead.

FVD dropped from 561.04 to 279.01 (lower is better), and CLIP text-video alignment and aesthetic score both increased. To our knowledge, the DDM is the first distributed training architecture to surpass its monolithic counterpart under matched data and compute.

Congrats to my team members @roze12321 @mdvillagra24 @ZhiyingJ !
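
For readers checking the "~2x" claim, a quick calculation using only the two FVD figures quoted in this post. The variable names are made up for the example.

```python
fvd_monolithic = 561.04  # monolithic baseline, as quoted in the post
fvd_paris2 = 279.01      # Paris 2.0, as quoted in the post

ratio = fvd_monolithic / fvd_paris2
relative_reduction = (fvd_monolithic - fvd_paris2) / fvd_monolithic

print(f"{ratio:.2f}x lower FVD, {relative_reduction:.1%} reduction")
# prints "2.01x lower FVD, 50.3% reduction"
```

FVD is a distance between real and generated video feature distributions, so "2x better" here means the score was roughly halved.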

Kiri@Kyrannio

@bidhan Congrats Bidhan :). Amazing work!!!!! Can’t wait to try this out

ngram@k_nearest

@kimmonismus 1) you're confusing bytedance's "bagel" model with "paris" from "bagel labs". 2) skeptical of Paris. they're gating it, and not for commercial use. they publish only a few frames and a few look bad. If it were good, they wouldn't do that.

bidhan ✈️ CVPR@bidhan

@Omarboucher I see. So the bad news is we won't be able to run the model on that machine right now. But the good news is Bagel Labs is working on an inference engine for DDM models actively, which will make this possible very soon. Please stay tuned 🫡

TomLikesRobots🤖@TomLikesRobots

@bidhan Very interesting direction. I don't think I've seen anything like it.

Chubby♨️@kimmonismus

@bidhan Let’s go! Niceee

Supreme@supremebeme

@bidhan at first i thought this was only trained on data from the city of Paris 🤣

Muhammad Ayan@socialwithaayan

@bidhan Claude who? Paris is eating now 😎

bidhan ✈️ CVPR@bidhan

@Kyrannio Thank you for your support Kiri!

bidhan ✈️ CVPR@bidhan

@supremebeme wait, that's a good idea for the next model

Atul Kumar@atulkumarzz

@bidhan This is truly inspiring! The development of Paris 2.0, the world's first decentralized trained video generation model, is a monumental achievement.

Özge Döner@astronomerozge1

@bidhan Congrats 👏

bidhan@bidhan

@MehakdeepK81 Thank you! We're trying hard to make open intelligence work.
