I’ve been capturing 3D human motion for 30 years and today is maybe the biggest day in that history. We are presenting MAMMA at CVPR (oral session 2A). MAMMA is a markerless multi-camera system that has accuracy similar to marker-based systems.

In my lab at MPI, we have now gone completely markerless. MAMMA is trained to work well when multiple people interact closely, and it captures hand motion; both are challenging for marker-based systems.

Project: https://mamma.is.tue.mpg.de/ Code: https://github.com/cuevhv/mamma Paper: https://arxiv.org/abs/2506.13040

The approach is trained on the 2-person case but generalizes well to more people interacting closely.
We make the code available for research and hope it will democratize mocap — the world needs more 3D motions. Commercial licensing is also possible.

It can work with consumer cameras like iPhones, which lets you take mocap out of the lab and into the world where it should be.

The key idea is to train a network to estimate dense keypoints on the surface of the body. These are like virtual mocap markers. Our network architecture uses per-landmark learnable tokens, which are key to accuracy.
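The thread doesn't spell out the architecture, so here is a minimal sketch of what "per-landmark learnable tokens" can look like in general, assuming a DETR-style design: each landmark owns a learned query vector that cross-attends over image patch features and is then regressed to a 2D location. The class name `LandmarkTokenHead`, the single attention layer, and all sizes (512 landmarks, 196 patches, 64-dim features) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class LandmarkTokenHead:
    """Hypothetical decoder head with one learnable query token per surface landmark."""

    def __init__(self, num_landmarks, dim, seed=0):
        rng = np.random.default_rng(seed)
        # One token per landmark; in a real model these are trained parameters.
        self.tokens = rng.normal(scale=0.02, size=(num_landmarks, dim))
        # Key/value projections applied to the backbone's image features.
        self.w_k = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        self.w_v = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        # Linear regressor from each attended token to a 2D image location.
        self.w_xy = rng.normal(scale=dim ** -0.5, size=(dim, 2))

    def __call__(self, features):
        """features: (num_patches, dim) array of image features for one view."""
        keys = features @ self.w_k                               # (P, dim)
        values = features @ self.w_v                             # (P, dim)
        # Each landmark token attends over all image patches.
        scores = self.tokens @ keys.T / np.sqrt(keys.shape[1])   # (L, P)
        attn = softmax(scores, axis=-1)
        attended = attn @ values                                 # (L, dim)
        return attended @ self.w_xy                              # (L, 2)


# Usage with made-up sizes: 512 landmarks, 196 patches (a 14x14 grid), 64-dim features.
head = LandmarkTokenHead(num_landmarks=512, dim=64)
features = np.random.default_rng(1).normal(size=(196, 64))
xy = head(features)
print(xy.shape)  # (512, 2)
```

Because the tokens don't depend on the input image, each one can learn what its own landmark looks like and where to attend for it; a production model would typically stack several such layers with self-attention between tokens.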

To make it robust to multi-person occlusion and hand motion, we train it on a synthetic dataset we created. The network is also trained to predict occlusion and contact probabilities.
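The thread doesn't say how MAMMA fuses per-view keypoints into 3D. A common approach in multi-camera systems, shown here as an assumption and not as MAMMA's method, is weighted DLT triangulation: each view adds two linear equations, scaled by a confidence such as 1 minus a predicted occlusion probability. The function name, camera matrices, point, and weights below are all made up; identity intrinsics mean the 2D observations are in normalized image coordinates.

```python
import numpy as np


def triangulate_weighted(proj_mats, points_2d, weights):
    """Weighted DLT triangulation of one 3D point from several views.

    proj_mats: (V, 3, 4) camera projection matrices
    points_2d: (V, 2) image observations (u, v), one per view
    weights:   (V,) per-view confidence, e.g. 1 - predicted occlusion probability
    """
    rows = []
    for P, (u, v), w in zip(proj_mats, points_2d, weights):
        # Each view gives two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Four hypothetical cameras [I | t] looking down +z from slightly different positions.
X_true = np.array([0.1, -0.2, 3.0])
offsets = [(0, 0, 0), (-1, 0, 0), (0, -1, 0), (0, 0, 1)]
cams = np.array([np.hstack([np.eye(3), np.array(t, float).reshape(3, 1)]) for t in offsets])
obs = np.array([project(P, X_true) for P in cams])
obs[3, 0] += 0.5  # simulate a bad detection in an occluded view

X_unweighted = triangulate_weighted(cams, obs, np.ones(4))
X_weighted = triangulate_weighted(cams, obs, np.array([1.0, 1.0, 1.0, 0.0]))
```

Down-weighting the occluded view keeps its bad detection from pulling the 3D point off, which is one reason per-keypoint occlusion estimates are useful when people overlap in some cameras.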

Try it and capture lots of data. Let us know how it goes. MAMMA is the result of an amazing team: @Hanzcun, @soyong_shin, @TsvetelinaAlex2, @AYiannakidis, @gfgbec, Markus Höschle, Joachim Tesch & Taylor Obersat.
Making creation more accessible. It will be useful for scaling robot learning, but also for animation in games and video.

@KitsuneFuzzy We do not use LiDAR. We just use the video. So you could use GoPro or any other camera if you like. We just chose iPhones because they are easy to get. Faces are in the works.

@studio_galt Stay tuned for updates! We have a version of this but it is not in this release.

@sen3d There is a license for noncommercial research, and a commercial license can be obtained from Max Planck. They typically have pricing for different-sized companies. I’ll talk to them about indie licensing.

@Maslp9 We use black magic to get the sync between the phones. It isn’t 100% perfect, but it’s good enough. People tell me that there are other solutions, but we didn’t explore them.
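The reply doesn't say how MAMMA syncs the phones. One widely used alternative, offered here only as an assumption, is to align recordings after the fact by cross-correlating their audio tracks and converting the peak lag into video frames. `estimate_offset` and `samples_to_frames` are hypothetical helper names, and the test signal is synthetic noise rather than real audio.

```python
import numpy as np


def estimate_offset(ref, other):
    """Return how many samples `other` lags behind `ref` (negative if it leads)."""
    # Remove the DC offset so the correlation peak reflects shared content.
    ref = ref - ref.mean()
    other = other - other.mean()
    # In "full" mode, output index i corresponds to a lag of i - (len(ref) - 1).
    corr = np.correlate(other, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)


def samples_to_frames(offset_samples, sample_rate, fps):
    """Convert an audio-sample offset into (fractional) video frames."""
    return offset_samples / sample_rate * fps


# Synthetic check: "other" is the reference track delayed by 37 samples.
rng = np.random.default_rng(0)
ref = rng.normal(size=2000)
other = np.concatenate([np.zeros(37), ref])[: len(ref)]
lag = estimate_offset(ref, other)
```

`np.correlate` is quadratic in signal length, so for minutes of 48 kHz audio an FFT-based correlation such as `scipy.signal.correlate` is the practical choice. This only aligns the recordings in time; it doesn't lock the camera shutters the way hardware genlock would, so a sub-frame residual offset remains.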

This looks amazing! It sounds like 4 iPhone cameras would be enough to capture these motions, including finger tracking? I assume iPhones because it needs the LiDAR tech? So it seems it would be better to keep the capture space as tight as possible.
I assume it does not capture faces, so that would still require a 5th iPhone on a head rig, recorded separately for other software?

@Michael_J_Black What would be great is an app that frame-locks multiple iPhones together via Bluetooth. Just hit record on one and all of them sync and capture.

@BongBong We do a lot of work on binocular motion capture as well. But if you really want super-high accuracy, multi-camera is the way to go. How many fingers am I holding up behind my back? You just can’t answer everything with one camera.

@Michael_J_Black @iBrews Holy moly!! So so important to solving this.

@Michael_J_Black I've just seen this post, days after what I've been working on for @imagine to solve this: skeleton positioning, a geological lock to respect perspectives and ratios, and an anatomical constraint to respect structural composition.
https://gist.github.com/AiMathematician/583651139f0ac4055eaeb03fe1036543

@Michael_J_Black Very cool. When do you think feet hovering, feet sliding and feet intersecting the ground will be solved? It seems like a continuous problem.

@Michael_J_Black It shouldn't be much longer until one-camera motion/performance capture is possible. A.I. systems should theoretically be able to "intuit" beyond visible motion to fill in the motion not seen.

@Michael_J_Black Looks very cool!
Any chance the technique works for facial data as well?