Researchers introduce 3D-DLP, a self-supervised model that decomposes 3D scenes into latent particles without human annotations

The model enables unsupervised scene understanding for robotic manipulation.

Original post

🧩 #ICML2026 💥 How can a model discover the 3D objects in a scene—their shape, color, and position—without any labels? Introducing 3D-DLP, a self-supervised object-centric model that decomposes colored 3D scenes (RGB-D and voxels) into a set of 3D latent particles.

8:04 AM · Jun 24, 2026 · 3.1K Views

Sentiment

Users thank collaborators on the 3D-DLP model for discovering objects in 3D scenes without labels.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)

Posts from X

Ellina Zhang (@EllinaZhang)

Self-supervised object-centric models break scenes into entities with no labels, but progress has stayed in 2D—which can't recover occlusions or precise geometry. We extend Deep Latent Particles (DLP), a fully unsupervised VAE over latent particles, directly into 3D.

Ellina Zhang (@EllinaZhang)

Finally, we ask whether 3D particles help downstream control. We feed 3D-DLP tokens into an entity-centric diffusion policy (EC-Diffuser) and evaluate on 12 MimicGen and 10 language-conditioned RLBench tasks.

Ellina Zhang (@EllinaZhang)

3D-DLP is a first practical bridge from self-supervised 3D scene decomposition to downstream control. Open challenges remain—scaling to dynamic, cluttered, in-the-wild scenes—and extending to dynamics and world modeling in 3D particle space.

Ellina Zhang (@EllinaZhang)

We introduce three variants for three sensing modalities: 3D-DLP-D for RGB-D, 3D-DLP-V for occupancy voxels, and 3D-DLP-VC for colored RGB voxels—the most general and most challenging of the three.

Ellina Zhang (@EllinaZhang)

Each particle carries explicit, disentangled 3D attributes: a keypoint position, a bounding-box scale, a presence value, and appearance features. Unlike 2D DLP, occlusion is handled directly by the 3D rendering instead of an explicit latent variable.
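
As a rough illustration of the particle layout described in this post, one could hold each particle in a small record like the one below. The class name, field names, and the convention that the scale is a full box extent centred on the keypoint are all assumptions made for this sketch, not the 3D-DLP code's actual API.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Particle3D:
    """Hypothetical container for one latent particle (names are assumptions)."""
    position: Vec3                  # 3D keypoint position
    scale: Vec3                     # bounding-box extent along each axis
    presence: float                 # in [0, 1]; near 0 means the particle is unused
    appearance: Tuple[float, ...]   # latent appearance features


def bounding_box(p: Particle3D) -> Tuple[Vec3, Vec3]:
    """Axis-aligned box (min corner, max corner), assumed centred on the keypoint."""
    half = tuple(s / 2 for s in p.scale)
    lo = tuple(c - h for c, h in zip(p.position, half))
    hi = tuple(c + h for c, h in zip(p.position, half))
    return lo, hi
```

Keeping position, scale, presence, and appearance in separate fields mirrors the disentanglement the post describes: each attribute can be read or changed without touching the others.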

Ellina Zhang (@EllinaZhang)

Plain MSE reconstruction has a failure mode: it can match brightness using gray and wash out color, which we call “gray collapse”. A chroma loss penalizes color error on occupied voxels, recovering faithful hue and saturation.
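
To make the gray-collapse idea concrete, here is a minimal sketch of one possible chroma loss. The post does not give the exact formula, so the definition used here (chroma as RGB minus the per-voxel channel mean) and the function names are assumptions for illustration only.

```python
import numpy as np


def chroma(rgb):
    """Colour with brightness removed (assumed definition: RGB minus channel mean)."""
    rgb = np.asarray(rgb, dtype=float)
    return rgb - rgb.mean(axis=-1, keepdims=True)


def chroma_loss(pred_rgb, target_rgb, occupancy):
    """Mean squared chroma error, computed over occupied voxels only.

    pred_rgb, target_rgb: (D, H, W, 3) colour grids; occupancy: (D, H, W).
    """
    mask = np.asarray(occupancy).astype(bool)
    if not mask.any():
        return 0.0  # nothing occupied, so nothing to penalize
    diff = chroma(pred_rgb)[mask] - chroma(target_rgb)[mask]
    return float((diff ** 2).sum(axis=-1).mean())
```

Under this definition, a gray prediction that matches the brightness of a pure red voxel has zero error in mean intensity but a chroma error of 2/3, so the washed-out solution is penalized even though its brightness is right.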

Ellina Zhang (@EllinaZhang)

The learned latents are interpretable and controllable. Move a particle's 3D keypoint and the object translates; change its scale and it resizes, confirming that particles encode genuinely editable 3D object properties.

Ellina Zhang (@EllinaZhang)

Porting 2D DLP to voxels does not work out of the box. We identify two components that make it possible: an appearance-aware K-means keypoint prior, and a chroma reconstruction loss. We validate both through ablations.

Ellina Zhang (@EllinaZhang)

The spatial-softmax prior from 2D DLP collapses on sparse voxels, so instead we cluster occupied voxels in a joint color (CIELAB) and 3D-position space, weighted by lightness. This places keypoints right on object surfaces and color boundaries.
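
The clustering step described in this post can be sketched as a weighted K-means over concatenated position and CIELAB features. Several details below are assumptions rather than the authors' method: the inputs are taken to be already converted to CIELAB, "weighted by lightness" is read as per-voxel sample weights proportional to the L channel, `color_weight` is a hypothetical knob for balancing color against position, and farthest-point initialization is used only to keep the example deterministic.

```python
import numpy as np


def _nearest(feats, centers):
    """Index of the closest center for every feature row."""
    d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans_keypoints(xyz, lab, k, color_weight=1.0, iters=20):
    """Weighted K-means over [xyz, color_weight * lab].

    xyz: (N, 3) positions of occupied voxels; lab: (N, 3) CIELAB colours.
    Returns (k, 3) keypoint positions and the (N,) cluster assignment.
    """
    xyz = np.asarray(xyz, dtype=float)
    lab = np.asarray(lab, dtype=float)
    feats = np.concatenate([xyz, color_weight * lab], axis=1)
    # Sample weights from lightness L (assumption); clipped to stay positive.
    weights = np.clip(lab[:, 0], 1e-6, None)

    # Farthest-point initialization (deterministic; an assumption of this sketch).
    chosen = [0]
    for _ in range(1, k):
        d2 = ((feats[:, None, :] - feats[chosen][None, :, :]) ** 2).sum(axis=-1)
        chosen.append(int(d2.min(axis=1).argmax()))
    centers = feats[chosen].copy()

    for _ in range(iters):
        assign = _nearest(feats, centers)
        for j in range(k):
            members = assign == j
            if members.any():  # leave empty clusters where they are
                centers[j] = np.average(
                    feats[members], axis=0, weights=weights[members]
                )
    return centers[:, :3], _nearest(feats, centers)
```

Because every center is a weighted average of occupied voxels, keypoints are pulled toward where the occupied, colored voxels actually are, which is consistent with the post's point that a prior built for dense images collapses on sparse grids.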

Ellina Zhang (@EllinaZhang)

The decoder maps each particle to a canonical cubic RGBA patch, places it into the global grid with a 3D spatial transformer, and volumetrically composites it with the background. Everything is trained end-to-end as a VAE over the ELBO.
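
The place-then-composite step can be illustrated with a heavily simplified sketch. It is not the 3D-DLP decoder: placement is reduced to pasting the patch at an integer corner with no rescaling or interpolation (a real 3D spatial transformer would resample the patch under an affine transform, for example with PyTorch's `affine_grid` and `grid_sample`), the patch is assumed to fit inside the grid, and compositing is assumed to be per-voxel alpha blending in the order given. The function names are hypothetical.

```python
import numpy as np


def place_patch(patch_rgba, grid_shape, corner):
    """Paste a cubic (S, S, S, 4) RGBA patch into an empty (D, H, W, 4) grid.

    corner is the integer (z, y, x) index of the patch's minimum corner;
    the patch is assumed to fit entirely inside the grid.
    """
    canvas = np.zeros((*grid_shape, 4), dtype=float)
    s = patch_rgba.shape[0]
    z, y, x = corner
    canvas[z:z + s, y:y + s, x:x + s] = patch_rgba
    return canvas


def composite(particle_grids, background_rgb):
    """Alpha-blend each placed particle over the background, one at a time."""
    out = np.asarray(background_rgb, dtype=float).copy()
    for grid in particle_grids:
        alpha = grid[..., 3:4]  # keep the last axis so it broadcasts over RGB
        out = alpha * grid[..., :3] + (1.0 - alpha) * out
    return out
```

Every operation here is differentiable with respect to the patch colors and alphas, which is the property that lets a reconstruction loss on the composited grid train the whole pipeline end-to-end.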

Ellina Zhang (@EllinaZhang)

3D particles do help downstream control, and the 3D lift matters: 48.1% mean success on MimicGen vs 30.8/34.1% for 2D-DLP and 47.3% for a dense raw voxel policy. On RLBench, 3D-DLP wins 9 of 10 matched-compute tasks.

Ellina Zhang (@EllinaZhang)

Put together, 3D-DLP discovers semantic keypoints, boxes, and per-object masks with no supervision, and reconstructs scenes far more faithfully than non-object-centric AE/VAE baselines (24.4 vs 11.4 masked PSNR on MimicGen).

Ellina Zhang (@EllinaZhang)

Huge thanks to my collaborators: @madhiyen, Amir Zadeh and Chuan Li (@LambdaAPI), @davheld, @pathak2206, and @TalDaniel8 🙏
🌐 Website: https://eubooks3003.github.io/3d-dlp/
📄 Paper: https://arxiv.org/abs/2606.19451
💻 Code: https://github.com/Eubooks3003/3d-dlp
