
KE:SAI Releases 123D, Largest Unified Open Dataset for Autonomous Driving

Bernhard Jaeger @bern_jaeger

🔬 This week's research highlight is 123D, KE:SAI's effort to unify all open driving data, creating the largest and most diverse pool of autonomous driving data out there.

5:02 AM · Jun 9, 2026 · 1K Views
Sentiment

Users are excited that KE:SAI's 123D platform unifies open driving datasets, because this enables training large-scale open foundation models without relying on proprietary sources.

Positive: 100.0% · Negative: 0.0% · 1 comment with sentiment.
Cluster Engagement
Posts from X
Bernhard Jaeger @bern_jaeger

📜 "123D: Unifying Multi-Modal Autonomous Driving Data at Scale"

https://arxiv.org/abs/2605.08084

1d · Views 91 · Likes 1 · Bookmarks 1
Replies: 1
Bernhard Jaeger @bern_jaeger

I am particularly excited about 123D for KE:SAI because it enables us to train large-scale open foundation models without having to rely on proprietary data, making our work easily reproducible and easier to share.

1d · Views 11
Bernhard Jaeger @bern_jaeger

Many big advancements in AI in recent years were preceded by a consolidation effort around data that enabled them.

1d · Views 64
Bernhard Jaeger @bern_jaeger

You can find the open-source code on GitHub: https://github.com/kesai-labs/py123d

1d · Views 28
Bernhard Jaeger @bern_jaeger

🌍 To solve this, KE:SAI has developed 123D, an open-source framework that unifies multimodal driving data through a single API.

Today, 123D already unifies 3,300 hours of data spanning 90,000 km of real-world driving from nuScenes, Waymo, Argoverse, and many others.

1d · Views 17
Bernhard Jaeger @bern_jaeger

To name a few, Common Crawl enabled the training of large language models, LAION enabled the training of diffusion models for image generation, and Open X-Embodiment enabled robotics foundation models.

1d · Views 14
Bernhard Jaeger @bern_jaeger

Now with 123D, you can release your data in a unified format that is compatible with all the existing datasets and directly benefit from new research breakthroughs.

We hope 123D will encourage more companies to contribute data to the open data ecosystem in the coming years.

1d · Views 12
Bernhard Jaeger @bern_jaeger

Each dataset adopts different modalities: different cameras, lidars, ego states, annotations, and HD maps, each with different rates and synchronization schemes.
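
To make the rate and synchronization problem concrete, here is a minimal sketch of one common way to pair sensors that run at different rates: matching each reference frame to the nearest-in-time frame of another sensor. The function name `nearest_indices` and the example timestamps are hypothetical and chosen for illustration; this is not the 123D API, and the thread does not say which synchronization scheme 123D uses.

```python
from bisect import bisect_left


def nearest_indices(reference_ts, sensor_ts):
    """Match each reference timestamp to the closest sensor timestamp.

    Both inputs are sorted lists of timestamps in the same unit.
    Returns one index into sensor_ts per reference timestamp.
    Hypothetical helper for illustration only.
    """
    if not sensor_ts:
        raise ValueError("sensor_ts must not be empty")
    matches = []
    for t in reference_ts:
        right = bisect_left(sensor_ts, t)  # first sensor frame at or after t
        if right == 0:
            matches.append(0)  # t is before the first sensor frame
        elif right == len(sensor_ts):
            matches.append(len(sensor_ts) - 1)  # t is after the last sensor frame
        else:
            left = right - 1
            # Choose whichever neighbouring frame is closer in time.
            closer_left = t - sensor_ts[left] <= sensor_ts[right] - t
            matches.append(left if closer_left else right)
    return matches


# Assumed example: lidar at roughly 10 Hz, camera at roughly 30 Hz (milliseconds).
lidar_ms = [0, 100, 200]
camera_ms = [0, 33, 66, 99, 132, 165, 198]
pairs = nearest_indices(lidar_ms, camera_ms)
```

Each dataset would need its own variant of this logic (different units, clock offsets, or trigger conventions), which is the kind of per-dataset glue code a unified API is meant to hide.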

1d · Views 11
Bernhard Jaeger @bern_jaeger

It makes it easy to study areas such as viewpoint-robust 3D object detection or to test the generalization capabilities of reinforcement learning agents.

1d · Views 9
Bernhard Jaeger @bern_jaeger

The 123D paper provides some baselines for these tasks, but there is still a lot of room for new methods to improve performance.

We hope the community leverages the 123D data to solve some of the important open generalization problems in autonomous driving.

1d · Views 9
Bernhard Jaeger @bern_jaeger

🚗 Autonomous driving has yet to see this type of consolidation.

Despite there being many different datasets available online, it is very hard to use them jointly.

1d · Views 9
Bernhard Jaeger @bern_jaeger

🏭 If you are a company that wants to spend the effort and time to open-source data, this used to present a significant risk.

Since your data will be incompatible with all the existing research dataset formats, your dataset might simply not get adopted by the research community.

1d · Views 7
Bernhard Jaeger @bern_jaeger

If you try making your code compatible with all the different coordinate system conventions out there, you will quickly throw your PC out of the window.
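
As a small illustration of why conventions matter, the sketch below converts a point between two axis conventions that are common in driving data: a camera-style frame (x right, y down, z forward) and a vehicle-style frame (x forward, y left, z up). The choice of these two frames, the constant `CAMERA_TO_VEHICLE`, and the helper `change_axes` are assumptions made for this example; the thread does not say which conventions the individual datasets or 123D use.

```python
# Rows give the target axes (x forward, y left, z up) expressed in the
# source axes (x right, y down, z forward). Both conventions are assumed
# for illustration and are not taken from any specific dataset.
CAMERA_TO_VEHICLE = (
    (0.0, 0.0, 1.0),   # vehicle x (forward) = camera z
    (-1.0, 0.0, 0.0),  # vehicle y (left)    = -camera x
    (0.0, -1.0, 0.0),  # vehicle z (up)      = -camera y
)


def change_axes(point, rotation=CAMERA_TO_VEHICLE):
    """Apply a 3x3 rotation matrix to a 3D point given as (x, y, z)."""
    return tuple(
        sum(coeff * value for coeff, value in zip(row, point))
        for row in rotation
    )


# A point 3 m ahead, 1 m to the right, and 2 m below the camera.
point_vehicle = change_axes((1.0, 2.0, 3.0))
```

Getting one sign wrong here silently mirrors the scene, and every dataset pairing needs its own such matrix, which is why this kind of conversion is usually handled once in a shared layer.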

1d · Views 7
Bernhard Jaeger @bern_jaeger

123D is a collaboration between many institutions and people, in particular:

@DanielDauner, Valentin Charraut, @BastianBerle, Tianyu Li, Long Nguyen, Jiabao Wang, Changhui Jing, @MaxiIgl, Holger Caesar, @iamborisi, @yiyi_liao_, Andreas Geiger, and Kashyap Chitta.

1d · Views 24