Over the past few years, motion tracking has largely taken over humanoid whole-body control. Most motion tracking methods feed the policy explicit phase variables or future target poses from the reference motion.
But do we actually need them?
We find that task conditions and scene observations alone can already provide enough structure for reference motion tracking. Building on this observation, we introduce HIL: Hybrid Imitation Learning.
Using a unified goal-conditioned observation space, we formulate motion tracking and adversarial imitation learning as a single end-to-end multi-task learning problem.
This allows a single policy to simultaneously:
• track reference motions with high fidelity
• compose and adapt skills through adversarial imitation learning
By sharing the same observation representation across both tasks, behaviors learned from motion tracking naturally transfer to more general goal-conditioned control.
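To make the idea of one shared observation serving two objectives more concrete, here is a minimal, hypothetical sketch. It is not HIL's implementation. Every name (`build_observation`, `tracking_reward`, `style_reward`, `hybrid_reward`) and every reward form is an assumption. The tracking term uses a DeepMimic-style exponential pose error, and the adversarial term uses an AMP-style least-squares discriminator reward. The point it illustrates is that the policy input contains no phase variable and no future target poses. Only the reward differs between the tracking and adversarial tasks.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical sketch only: names and reward forms are assumptions,
# not the HIL paper's actual formulation.


def build_observation(proprio: Sequence[float],
                      goal: Sequence[float],
                      scene: Sequence[float]) -> List[float]:
    """Shared goal-conditioned observation used by every task.

    No phase variable and no future reference poses are included:
    the policy sees its own state, the task condition, and the scene.
    """
    return list(proprio) + list(goal) + list(scene)


def tracking_reward(sim_pose: Sequence[float],
                    ref_pose: Sequence[float],
                    scale: float = 2.0) -> float:
    """DeepMimic-style exponential pose-tracking reward in (0, 1]."""
    err = sum((s - r) ** 2 for s, r in zip(sim_pose, ref_pose))
    return math.exp(-scale * err)


def style_reward(disc_score: float) -> float:
    """AMP-style least-squares discriminator reward, clipped at 0."""
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def hybrid_reward(task: str,
                  sim_pose: Optional[Sequence[float]] = None,
                  ref_pose: Optional[Sequence[float]] = None,
                  disc_score: float = 0.0,
                  task_reward: float = 0.0,
                  w_style: float = 0.5) -> float:
    """Pick the reward by task; the observation stays the same either way."""
    if task == "tracking":
        if sim_pose is None or ref_pose is None:
            raise ValueError("tracking needs sim_pose and ref_pose")
        # The reference is used only to score the motion, never as policy input.
        return tracking_reward(sim_pose, ref_pose)
    if task == "adversarial":
        # Blend a goal-driven task reward with a discriminator style reward.
        return (1.0 - w_style) * task_reward + w_style * style_reward(disc_score)
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    obs = build_observation([0.1, 0.2], [1.0, 0.0], [0.5])
    print(len(obs))
    print(hybrid_reward("tracking", sim_pose=[0.0, 0.0], ref_pose=[0.0, 0.0]))
    print(hybrid_reward("adversarial", disc_score=1.0, task_reward=0.0))
```

In this sketch, the reference motion enters only through `tracking_reward`. That mirrors the claim that task conditions and scene observations alone can give the policy enough structure, while the reward still drives fidelity to the reference.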
📄 To appear in ACM Transactions on Graphics (TOG 2026) & SIGGRAPH 2027
🌐 https://jiashunwang.github.io/HIL
🤖 A real-world humanoid follow-up is coming soon