Researcher Unveils MAMMA Markerless Motion Capture System at CVPR

Original post

I’ve been capturing 3D human motion for 30 years and today is maybe the biggest day in that history. We are presenting MAMMA at CVPR (oral session 2A). MAMMA is a markerless multi-camera system that has accuracy similar to marker-based systems.

11:01 AM · Jun 5, 2026 · 160.1K Views

Sentiment

Many users praised researcher Michael Black's MAMMA markerless motion capture system as an incredible breakthrough because it delivers marker-based accuracy for multi-person scenes without markers.

Positive: 100.0% · Negative: 0.0% (15 comments with sentiment)

Posts from X

Michael Black@Michael_J_Black

In my lab at MPI, we have now gone completely markerless. MAMMA is trained to work well when multiple people are interacting closely and captures hand motion — these are challenging for marker-based systems.

Michael Black@Michael_J_Black

Project: https://mamma.is.tue.mpg.de/
Code: https://github.com/cuevhv/mamma
Paper: https://arxiv.org/abs/2506.13040

Michael Black@Michael_J_Black

The approach is trained on the 2-person case but generalizes well to more people interacting closely.

We make the code available for research and hope it will democratize mocap — the world needs more 3D motions. Commercial licensing is also possible.

Michael Black@Michael_J_Black

It can work with consumer cameras like iPhones, which lets you take mocap out of the lab and into the world where it should be.

Michael Black@Michael_J_Black

The key idea is to train a network to estimate dense keypoints on the surface of the body. These are like virtual mocap markers. Our network architecture uses per-landmark learnable tokens, which are key to accuracy.
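
The "virtual mocap markers" idea can be illustrated with a small sketch. The thread does not say how MAMMA turns per-view keypoints into 3D, so the following is only one standard option, not the authors' method: least-squares triangulation of camera rays. The function name `triangulate_rays` and the optional `weights` argument (for example, one minus a predicted occlusion probability) are hypothetical, and each ray is assumed to already be expressed in a shared world frame.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_rays(origins: Sequence[Vec3],
                     directions: Sequence[Vec3],
                     weights: Optional[Sequence[float]] = None) -> Vec3:
    """Return the 3D point closest (in least squares) to all camera rays.

    Each ray starts at a camera center `origins[i]` and points along
    `directions[i]` toward the detected keypoint. `weights` is a
    hypothetical per-view confidence, not something the thread specifies.
    """
    if weights is None:
        weights = [1.0] * len(origins)
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d, w in zip(origins, directions, weights):
        norm = sqrt(sum(c * c for c in d))
        if norm == 0.0:
            raise ValueError("ray direction must be non-zero")
        d = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                # P = I - d d^T projects onto the plane orthogonal to the ray.
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += w * p
                b[i] += w * p * o[j]
    det = _det3(A)
    if abs(det) < 1e-12:
        # A single ray (or parallel rays) leaves depth undetermined.
        raise ValueError("need at least two non-parallel rays")
    point = []
    for k in range(3):
        # Cramer's rule: replace column k of A with b.
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = b[i]
        point.append(_det3(Ak) / det)
    return (point[0], point[1], point[2])
```

With a single camera the system is singular and the function raises, which is the depth ambiguity that a second view removes.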

Michael Black@Michael_J_Black

To make this robust to multi-person occlusion and hand motion, we created a synthetic dataset that we use to train it. The network is trained to predict occlusion and contact probabilities.
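
As a rough illustration of what "trained to predict occlusion and contact probabilities" can mean in practice, here is a minimal sketch of a per-landmark binary cross-entropy loss on raw network outputs (logits). The thread does not state which loss MAMMA uses; binary cross-entropy is simply a common choice for probability targets, and the names `sigmoid` and `bce_with_logits` are illustrative.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a logit to a probability, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over landmarks.

    `logits[i]` is the raw score for landmark i being occluded (or in
    contact); `targets[i]` is the 0/1 label, which a synthetic dataset
    could supply. Uses the stable form max(x, 0) - x*t + log(1 + exp(-|x|)).
    """
    if len(logits) == 0 or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and equal length")
    total = 0.0
    for x, t in zip(logits, targets):
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)
```

A confident correct prediction gives a loss near zero, while a confident wrong one is penalized roughly in proportion to the size of the logit.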

Michael Black@Michael_J_Black

Try it and capture lots of data. Let us know how it goes. MAMMA is the result of an amazing team: @Hanzcun, @soyong_shin, @TsvetelinaAlex2, @AYiannakidis, @gfgbec, Markus Höschle, Joachim Tesch & Taylor Obersat.

Chris Paxton@chris_j_paxton

Making creation more accessible. Will be useful for scaling robot learning, but also animations for games and video

Michael Black@Michael_J_Black

@KitsuneFuzzy We do not use LiDAR. We just use the video. So you could use GoPro or any other camera if you like. We just chose iPhones because they are easy to get. Faces are in the works.

Michael Black@Michael_J_Black

@studio_galt Stay tuned for updates! We have a version of this but it is not in this release.

Michael Black@Michael_J_Black

@sen3d There is a license for noncommercial research. And a commercial license can be obtained from Max Planck. They typically have pricing for different-sized companies. I’ll talk to them about indie licensing.

Michael Black@Michael_J_Black

@Maslp9 We use black magic to get the sync between the phones. It isn’t 100% perfect, but it’s good enough. People tell me that there are other solutions, but we didn’t explore them.

Kitsune Fuzzy 🦊@KitsuneFuzzy

This looks amazing! It sounds like 4 iPhone cameras would be enough to capture these motions including finger tracking? I assume iPhones because it needs the LiDAR tech? So it would be better to keep the capture space as tight as possible it seems.

I assume it does not capture faces and that would still require a 5th iPhone with a head-rig to record that separately for other software?

Thomas Halpin@Maslp9

@Michael_J_Black What would be great is an app that frame locks multiple iPhones together via Bluetooth. Just hit record on one and all of them sync and capture

Michael Black@Michael_J_Black

@BongBong We do a lot of work on monocular motion capture as well. But if you really want super high accuracy, multi-camera is the way to go. How many fingers am I holding up behind my back? You just can’t answer everything with one camera.

Robert Dale Smith@RobertDaleSmith

@Michael_J_Black @iBrews Holy moly!! So so important to solving this.

AIMathematician@CustomAIMath

@Michael_J_Black I've just seen this post, days after what I've been working on for @imagine to solve this: skeleton positioning, with a geological lock to respect perspectives and ratios and an anatomical constraint to respect structural composition.

https://gist.github.com/AiMathematician/583651139f0ac4055eaeb03fe1036543

Infinite-Realities@8Infinite8

@Michael_J_Black Very cool. When do you think feet hovering, feet sliding and feet intersecting the ground will be solved? It seems like a continuous problem.

BongBong@BongBong

@Michael_J_Black It shouldn't be much longer until one-camera motion/performance capture is possible. A.I. systems should theoretically be able to "intuit" beyond visible motion to fill in the motion not seen.

StudioGaltMocap@studio_galt

@Michael_J_Black Looks very cool!

Any chance the technique works for facial data as well?
