
Researcher Unveils MAMMA Markerless Motion Capture System at CVPR

Original post
Michael Black@Michael_J_Black

I’ve been capturing 3D human motion for 30 years and today is maybe the biggest day in that history. We are presenting MAMMA at CVPR (oral session 2A). MAMMA is a markerless multi-camera system that has accuracy similar to marker-based systems.

11:01 AM · Jun 5, 2026 · 160.1K Views
Sentiment

Many users praised researcher Michael Black's MAMMA markerless motion capture system as a breakthrough because it reaches accuracy similar to marker-based systems in multi-person scenes without using markers.

Pos 100.0% · Neg 0.0% · 15 comments with sentiment.
Posts from X
Michael Black@Michael_J_Black

In my lab at MPI, we have now gone completely markerless. MAMMA is trained to work well when multiple people are interacting closely and captures hand motion — these are challenging for marker-based systems.

23h · Views 5.4K · Likes 57 · Bookmarks 6
Michael Black@Michael_J_Black

Project: https://mamma.is.tue.mpg.de/ Code: https://github.com/cuevhv/mamma Paper: https://arxiv.org/abs/2506.13040

23h · Views 3.3K · Likes 65 · Bookmarks 62
Michael Black@Michael_J_Black

The approach is trained on the 2-person case but generalizes well to more people interacting closely.

We make the code available for research and hope it will democratize mocap — the world needs more 3D motions. Commercial licensing is also possible.

23h · Views 2.5K · Likes 38 · Bookmarks 11
Michael Black@Michael_J_Black

It can work with consumer cameras like iPhones, which lets you take mocap out of the lab and into the world where it should be.

23h · Views 4.1K · Likes 49 · Bookmarks 6
Michael Black@Michael_J_Black

The key idea is to train a network to estimate dense keypoints on the surface of the body. These are like virtual mocap markers. Our network architecture uses per-landmark learnable tokens, which are key to accuracy.

23h · Views 2.8K · Likes 33 · Bookmarks 8
Michael Black@Michael_J_Black

To make this robust to multi-person occlusion and hand motion, we created a synthetic dataset that we use to train it. The network is trained to predict occlusion and contact probabilities.

23h · Views 2.6K · Likes 36 · Bookmarks 7
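The posts above describe dense surface keypoints acting as "virtual mocap markers", plus predicted occlusion probabilities, but they do not say how 2D detections from several cameras are combined into 3D. As a minimal sketch of one standard way this could be done (not necessarily what MAMMA does), the snippet below triangulates a single virtual marker from several camera rays by weighted least squares, down-weighting views in which the marker is likely occluded. The function names, the use of `1 - occlusion probability` as a weight, and the assumption that each 2D keypoint has already been back-projected to a ray through a calibrated camera are all illustrative assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _solve3(a, b) -> Vec3:
    """Solve the 3x3 linear system a @ x = b by Gauss-Jordan elimination."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate rays (e.g. all parallel or all occluded)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def triangulate(
    centers: Sequence[Vec3],
    directions: Sequence[Vec3],
    occlusion_probs: Sequence[float],
) -> Vec3:
    """Hypothetical helper: find the 3D point closest to all camera rays.

    Each ray starts at a camera center and points along `direction`.
    A view's weight is assumed to be 1 - occlusion probability, so a
    marker predicted to be fully hidden in a view contributes nothing.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d, p in zip(centers, directions, occlusion_probs):
        w = 1.0 - p
        norm = sum(x * x for x in d) ** 0.5
        u = [x / norm for x in d]
        for i in range(3):
            for j in range(3):
                # (I - u u^T) projects onto the plane perpendicular to the ray.
                proj = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += w * proj
                b[i] += w * proj * c[j]
    return _solve3(a, b)
```

With three unoccluded views whose rays all pass through the same point, the function recovers that point; adding a fourth view with a wrong ray and an occlusion probability of 1.0 leaves the result unchanged, which is the behaviour the weighting is meant to give.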
Michael Black@Michael_J_Black

Try it and capture lots of data. Let us know how it goes. MAMMA is the result of an amazing team: @Hanzcun, @soyong_shin, @TsvetelinaAlex2, @AYiannakidis, @gfgbec, Markus Höschle, Joachim Tesch & Taylor Obersat.

23h · Views 3.4K · Likes 24 · Bookmarks 5
Chris Paxton@chris_j_paxton

Making creation more accessible. Will be useful for scaling robot learning, but also animations for games and video

3h · Views 2.8K · Likes 14 · Bookmarks 6
Michael Black@Michael_J_Black

@KitsuneFuzzy We do not use LiDAR. We just use the video. So you could use GoPro or any other camera if you like. We just chose iPhones because they are easy to get. Faces are in the works.

12h · Views 742 · Likes 10 · Bookmarks 1
Michael Black@Michael_J_Black

@studio_galt Stay tuned for updates! We have a version of this but it is not in this release.

21h · Views 750 · Likes 5 · Bookmarks 1
Michael Black@Michael_J_Black

@sen3d There is a license for noncommercial research. And a commercial license can be obtained from Max Planck. They typically have pricing for different sized companies. I’ll talk to them about indie licensing.

12h · Views 369 · Likes 3 · Bookmarks 2
Michael Black@Michael_J_Black

@Maslp9 We use black magic to get the sync between the phones. It isn’t 100% perfect but it’s good enough. People tell me that there are other solutions. But we didn’t explore them.

12h · Views 320 · Likes 1 · Bookmarks 2
Kitsune Fuzzy 🦊@KitsuneFuzzy

This looks amazing! It sounds like 4 iPhone cameras would be enough to capture these motions including finger tracking? I assume iPhones because it needs the LiDAR tech? So it would be better to keep the capture space as tight as possible it seems.

I assume it does not capture faces and that would still require a 5th iPhone with a head-rig to record that separately for other software?

12h · Views 906 · Likes 1 · Bookmarks 1

@Michael_J_Black What would be great is an app that frame locks multiple iPhones together via Bluetooth. Just hit record on one and all of them sync and capture

15h · Views 460 · Likes 1 · Bookmarks 1
Michael Black@Michael_J_Black

@BongBong We do a lot of work on binocular motion capture as well. But if you really want super high accuracy, multi-camera is the way to go. How many fingers am I holding up behind my back? You just can’t answer everything with one camera.

16h · Views 761 · Likes 11
Robert Dale Smith@RobertDaleSmith

@Michael_J_Black @iBrews Holy moly!! So so important to solving this.

16h · Views 377 · Likes 3 · Bookmarks 1
AIMathematician@CustomAIMath

@Michael_J_Black I've just seen this post, days after what I've been working on for @imagine to solve this: skeleton positioning, with a geological lock to respect perspectives and ratios, and an anatomical constraint to respect structural composition.

https://gist.github.com/AiMathematician/583651139f0ac4055eaeb03fe1036543

13h · Views 85 · Bookmarks 1

@Michael_J_Black Very cool. When do you think feet hovering, feet sliding and feet intersecting the ground will be solved? It seems like a continuous problem.

18h · Views 1.3K · Likes 2
BongBong@BongBong

@Michael_J_Black Shouldn't be much longer until one-camera motion/performance capture should be possible. A.I. systems should theoretically be able to "intuit" beyond visible motion to fill in the motion not seen.

19h · Views 943 · Likes 1
StudioGaltMocap@studio_galt

@Michael_J_Black Looks very cool!

Any chance the technique works for facial data as well?

22h · Views 883 · Likes 1