Semantically annotating 3D Gaussian splats on the fly using Gemini 3.1 + sparkjs
1. Load any 3D scene and hit scan
2. Get 2D detections from VLM
3. Cluster outputs & project into 3D world space
4. Save as a persistent 3D semantic layer
Inspired by @alexanderchen's experiments with gemini visual intelligence. Just had to try to lift it from 2D to 3D!
For those asking how to do this, here's the recipe. TL;DR: wiggle the camera around while taking screenshots and asking Gemini for screen-space annotations, then cluster them so they are pinned to the right spot in 3D space. You can make the projection into world space even more precise by using SAM3 labels to reason about containment, while still using the VLM for the richer label / description.
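Here's a rough sketch of the "lift to 3D and cluster" part of that recipe, not the actual code from the demo. It assumes a simple pinhole camera, that each 2D detection has already been reduced to a single pixel (e.g. the box center), and that you can get a depth for that pixel (from a raycast or a depth readback; how you get it depends on your renderer). The names `Camera`, `Detection`, `unproject`, and `cluster` are made up for illustration and are not part of sparkjs or the Gemini API.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    # Hypothetical pinhole camera: world-space position plus unit basis vectors.
    position: tuple
    right: tuple
    up: tuple
    forward: tuple
    fov_y_deg: float  # vertical field of view in degrees
    width: int        # screenshot width in pixels
    height: int       # screenshot height in pixels


@dataclass
class Detection:
    label: str    # label returned by the VLM
    u: float      # pixel x (0 = left edge)
    v: float      # pixel y (0 = top edge)
    depth: float  # distance along the camera's forward axis (assumed known)


def unproject(cam, det):
    """Lift one screen-space detection to a world-space point."""
    # Pixel -> normalized device coordinates in [-1, 1], with y pointing up.
    ndc_x = 2.0 * det.u / cam.width - 1.0
    ndc_y = 1.0 - 2.0 * det.v / cam.height
    tan_half = math.tan(math.radians(cam.fov_y_deg) / 2.0)
    aspect = cam.width / cam.height
    # Offsets in camera space at the given depth.
    x_cam = ndc_x * tan_half * aspect * det.depth
    y_cam = ndc_y * tan_half * det.depth
    z_cam = det.depth
    return tuple(
        cam.position[i]
        + x_cam * cam.right[i]
        + y_cam * cam.up[i]
        + z_cam * cam.forward[i]
        for i in range(3)
    )


def cluster(points, radius):
    """Greedy clustering: merge same-label points within `radius` of a centroid.

    `points` is a list of (label, (x, y, z)). Returns a list of dicts with
    the label, the running centroid, and the number of merged detections.
    """
    clusters = []
    for label, p in points:
        for c in clusters:
            if c["label"] == label and math.dist(c["centroid"], p) <= radius:
                n = c["count"]
                c["centroid"] = tuple(
                    (c["centroid"][i] * n + p[i]) / (n + 1) for i in range(3)
                )
                c["count"] = n + 1
                break
        else:
            clusters.append({"label": label, "centroid": p, "count": 1})
    return clusters
```

In use, you would run `unproject` on every detection from every wiggled camera pose, feed the resulting `(label, point)` pairs into `cluster`, and probably drop clusters with a low `count`, since a label that only shows up from one viewpoint is more likely to be a misdetection. The greedy pass is order-dependent; something like DBSCAN would be a more robust swap-in.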