Introducing SERF: a spatiotemporal environment and robot feature map for long-horizon mobile manipulation.
We demonstrate that conditioning a mobile manipulation policy on a SERF map enables long-horizon reasoning.
3D feature maps are going to enable such rich interaction with the world. In my opinion, this is still a deeply underexplored area of robotics.

Long-Horizon Mobile Manipulation
We evaluate SERF on BEHAVIOR-1K household mobile manipulation tasks.
Compared with an image-only VLA policy, SERF improves task progress, follows more direct trajectories, and reaches subgoals faster.
5/n

Map Representation
SERF represents the environment and the robot body as neural points in a shared latent space.
The neural points are trained to reconstruct dense DINOv3 embeddings, giving the policy persistent scene memory and explicit robot–environment spatial context.
2/n
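
To make the representation concrete, here is a minimal sketch of what a set of neural points and a feature-reconstruction objective could look like. Everything in it is an assumption for illustration: the `NeuralPoint` fields, the use of cosine distance as the reconstruction loss, and the toy 3-dimensional vectors standing in for DINOv3 embeddings are not taken from the SERF paper or code.

```python
import math
from dataclasses import dataclass


@dataclass
class NeuralPoint:
    """Hypothetical container for one map point (field names are illustrative)."""
    position: tuple      # (x, y, z) in the world frame
    latent: list         # feature vector in the shared latent space
    kind: str            # "environment" or "robot"
    object_id: str = ""  # which object an environment point belongs to


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Environment and robot points live in the same list and the same latent space.
points = [
    NeuralPoint((1.0, 0.0, 0.8), [1.0, 0.0, 0.0], "environment", "cup"),
    NeuralPoint((0.0, 0.0, 0.5), [0.0, 1.0, 0.0], "robot"),
]

# Toy stand-in for a dense image embedding observed at the first point.
target = [1.0, 0.0, 0.0]

# Assumed objective: a point's feature should match the embedding it reconstructs.
losses = [cosine_distance(p.latent, target) for p in points]
```

In this toy setup the first point already matches the target, so its loss is zero, while the second point has an orthogonal feature and a loss of one. A real system would use a learned decoder and far higher-dimensional features.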

OOD Configuration Shifts
SERF enables the policy to handle:
• moved goal locations
• additional target objects
• target objects in previously unvisited regions
Across these settings, explicit spatial memory helps maintain higher task progress under scene changes.
6/n

Map Updates
During task execution, SERF updates the map online from egocentric observations and robot states:
• environment points are updated using object-level rigid transforms
• robot points are updated via forward kinematics from the current state
3/n
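
The two update rules above can be sketched in a few lines. This is a planar simplification under stated assumptions, not the SERF implementation: the function names, the 2D setting, the 2-link arm, and the toy map layout are all hypothetical.

```python
import math


def rigid_transform_2d(points, theta, tx, ty):
    """Rotate each point by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def forward_kinematics_2link(base_pose, joint_angles, link_lengths):
    """Return world positions of the base, elbow, and tip of a planar 2-link arm."""
    bx, by, btheta = base_pose
    q1, q2 = joint_angles
    l1, l2 = link_lengths
    a1 = btheta + q1  # absolute angle of the first link
    a2 = a1 + q2      # absolute angle of the second link
    elbow = (bx + l1 * math.cos(a1), by + l1 * math.sin(a1))
    tip = (elbow[0] + l2 * math.cos(a2), elbow[1] + l2 * math.sin(a2))
    return [(bx, by), elbow, tip]


# Toy map: environment points grouped by object id.
env_points = {"cup": [(1.0, 0.0), (1.0, 0.5)], "table": [(2.0, 2.0)]}

# Suppose the cup is observed to have rotated 90 degrees and shifted by (0, 1).
# Only the cup's points move; the table's points are left untouched.
env_points["cup"] = rigid_transform_2d(env_points["cup"], math.pi / 2, 0.0, 1.0)

# Robot points are recomputed from the current state rather than from perception.
robot_points = forward_kinematics_2link(
    base_pose=(0.0, 0.0, 0.0),
    joint_angles=(math.pi / 2, -math.pi / 2),
    link_lengths=(1.0, 1.0),
)
```

Moving whole objects with one transform keeps each object's points consistent with one another, and computing robot points from joint state avoids having to perceive the robot's own body.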

Check out the project page, paper, and code for more details and results.
Project: https://existentialrobotics.org/serf/
Paper: https://arxiv.org/abs/2606.12956
Mapping code: https://github.com/ExistentialRobotics/SERF-mapping
VLA code: https://github.com/ExistentialRobotics/SERF-VLA
9/n

Map-Conditioned VLA Policy
We condition a VLA policy on SERF map tokens, in addition to image observations, proprioceptive state, and task information.
Map tokens are extracted across global, robot-base, and end-effector frames to provide both local and global context.
4/n
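
One plausible reading of "multiple frames" is that the same map points are re-expressed relative to the robot base and the end effector, so nearby structure is described in local coordinates. The sketch below illustrates only that geometric idea in 2D. The helper names, the nearest-k selection, and the poses are assumptions, and the actual token extraction in SERF operates on learned features rather than raw coordinates.

```python
import math


def to_frame(point, frame_pose):
    """Express a world-frame point in a frame given by its world pose (x, y, theta)."""
    fx, fy, ftheta = frame_pose
    dx, dy = point[0] - fx, point[1] - fy
    c, s = math.cos(ftheta), math.sin(ftheta)
    # Apply the inverse rotation to the offset from the frame origin.
    return (c * dx + s * dy, -s * dx + c * dy)


def nearest_points(points, frame_pose, k):
    """Return the k points closest to the frame origin, in that frame's coordinates."""
    local = [to_frame(p, frame_pose) for p in points]
    return sorted(local, key=lambda p: math.hypot(p[0], p[1]))[:k]


map_points = [(2.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
base_pose = (1.0, 0.0, math.pi / 2)  # hypothetical robot-base pose
ee_pose = (1.0, 1.0, 0.0)            # hypothetical end-effector pose

context = {
    "global": map_points,                                       # whole map, world frame
    "robot_base": nearest_points(map_points, base_pose, k=2),   # local to the base
    "end_effector": nearest_points(map_points, ee_pose, k=2),   # local to the gripper
}
```

The global view keeps the full layout, while the two local views describe what is close to the base and the gripper in coordinates that move with the robot.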

In long-horizon mobile manipulation, the robot must continually answer:
• Where am I?
• What has changed around me?
• How far along am I in my task?
SERF suggests that 3D maps can help provide answers to these questions.
8/n

Failure Recovery
SERF also enables more reliable dropped-object recovery.
When the robot drops an object during transport, SERF helps the policy re-localize and re-grasp it more reliably than the image-only policy.
7/n

It was a pleasure working with this amazing team: @Byeonghyun_Pak (co-first), Kehan Long, Yulun Tian, and @natanaso!

@sssssshwan 🤯💯

@sssssshwan Where can we buy the robot?