13h ago

Cambrian series co-creator Saining Xie introduces Cambrian-P, an MLLM grounded in camera pose for spatial reasoning

The architecture uses pose tokens instead of heavy 3D modules.

Original post

#908@CSPROFKGDOP

Jihan Yang@JIHANYANG13

Camera pose matters for video understanding! Today's MLLMs excel at recognizing activities, but still struggle with the underlying space and ego/object dynamics in video. We trace this gap to a missing piece: camera pose. Introducing Cambrian-P: a multimodal LLM natively grounded in camera pose. (1/n)

4:14 PM · May 26, 2026

Reposted by

#158@SAININGXIE

QUOTE POST

#158 Saining Xie@SAININGXIE

📸latest in our cambrian series: cambrian-p, p for pose. i think pose is probably the minimal sufficient 3d signal (and it’s easy to get!) that we need for robust video multimodal models -- jointly modeling frames and pose turns image sequences into a globally grounded structure.

Jihan Yang@jihanyang13

11:14 PM · May 26, 2026 · 24.1K Views

2:12 AM · May 27, 2026 · 11.2K Views
