Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching

Reading time: 5 minutes
...

📝 Original Info

  • Title: Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
  • ArXiv ID: 2602.15827
  • Date: 2026-02-17
  • Authors: Author information was not provided in the paper.

📝 Abstract

While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transition of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy, using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot selects and executes whether to step over, climb onto, vault or roll off obstacles of varying geometries and heights. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25m (96% robot height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.

💡 Deep Analysis

📄 Full Content

Fig. 1: Perceptive Humanoid Parkour (PHP) enables a Unitree G1 humanoid robot to execute highly dynamic, long-horizon parkour behaviors using onboard perception. By composing various agile human skills via motion matching and a teacher-student training pipeline, we train a single multi-skill visuomotor policy capable of complex contact-rich maneuvers including (a) cat-vaulting over a short obstacle followed by dash-vaulting over a higher obstacle at approximately 3 m/s, (b) climbing onto a 1.25 m (96% of robot height) wall, and rolling down, (c) speed-vaulting over an obstacle at approximately 3 m/s, and (d) a 60-second continuous traversal of a complex parkour course with autonomous skill selection and seamless transitions.


Achieving the agility and adaptivity of human motion in traversing complex terrains remains a central challenge for humanoid robotics. Humans traverse challenging terrains of drastically different dimensions by rapidly selecting and chaining dynamic whole-body skills based on perceived environmental context. Our goal is to endow humanoids with the same capability. In this work, we study parkour as a concrete, self-contained testbed for this broader objective.

Parkour highlights several core challenges. First, the robot must perform highly dynamic and contact-rich skills, such as climbing walls at or above its own body height or vaulting over obstacles within fractions of a second. This requires effective control in the humanoid’s vast, high-dimensional action space. Second, these skills must be tightly coupled with exteroception, such as vision, to enable adaptation to environmental variation and rapid reaction to unexpected perturbations. Furthermore, to generalize beyond isolated maneuvers and traverse complex obstacle courses, the robot must consolidate many highly dynamic skills into a single visuomotor policy, which becomes increasingly difficult as the number and diversity of required skills grow.

Human motion data has become essential for learning highly dynamic humanoid behaviors. Prior work [20,43] has used human motion data to successfully demonstrate highly dynamic skills such as jumping, rolling, and flipping. However, highly dynamic motion data is inherently scarce: capturing fast, contact-rich maneuvers typically requires specialized setups and careful curation, so datasets often include only one or two demonstrations per skill, each lasting just a few seconds. This scarcity is not unique to parkour but applies broadly to all dynamic human skills. Yet long-horizon tasks such as parkour require both rich within-skill variation that adapts to how the robot approaches an obstacle, and smooth, natural transitions between multiple skills across complex courses.

To address this challenge, we adopt motion matching [5,9] as a simple yet powerful mechanism. Motion matching synthesizes long-horizon motion by retrieving and stitching motion fragments via nearest-neighbor search in a designed feature space. Crucially, this process densifies a sparse motion library by producing diverse transitions across approach distances, headings, and timing, while preserving the realism of captured motions. In our framework, motion matching enables the generation of a large set of obstacle-adaptive, long-horizon kinematic reference trajectories for downstream policy learning.
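As a concrete illustration of this retrieval step, the sketch below implements the core of motion matching as a weighted nearest-neighbor lookup over per-frame features precomputed from a skill library. The feature contents (root velocity, future trajectory waypoints, end-effector positions) and the MotionMatcher interface are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal motion-matching sketch (illustrative; not the paper's implementation).
# Assumes each library frame has a precomputed feature vector, e.g. root velocity,
# future trajectory waypoints, and end-effector positions, plus a pose to play back.

class MotionMatcher:
    def __init__(self, features, poses, feature_weights):
        # features: (N, D) array of per-frame query features from the skill library
        # poses:    (N, ...) array of corresponding kinematic poses
        self.features = features
        self.poses = poses
        self.w = feature_weights  # (D,) weights trading off the matching terms

    def query(self, current_feature):
        # Nearest-neighbor search in the weighted feature space.
        diff = (self.features - current_feature) * self.w
        dists = np.einsum("nd,nd->n", diff, diff)  # per-frame squared distance
        best = int(np.argmin(dists))
        return best, self.poses[best]

# Usage: every few frames, build a query from the character's current state and the
# desired future trajectory (e.g. toward the next obstacle), jump playback to the
# best-matching library frame, and blend briefly for a smooth transition.
```

Because each query can land in a different clip of the library, repeatedly matching against varied obstacle layouts and approach states is what densifies the sparse set of atomic skills into many long-horizon reference trajectories.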

Learning a visuomotor policy that executes dozens of highly dynamic skills requires perceptive inputs that can be efficiently simulated and reliably transferred to the real world. To improve training efficiency, prior work typically trains privileged state-based experts in simulation and distills them into vision-based students using DAgger [32].
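The sketch below outlines one DAgger-style distillation step under such a teacher-student setup: a frozen privileged expert labels actions, the depth-based student is regressed onto those labels, and rollouts follow the student's own actions so it visits its own state distribution. The environment and policy interfaces are hypothetical; the paper additionally combines this with an RL objective, which is only noted in a comment here.

```python
import torch

# Schematic DAgger-style distillation step (illustrative interfaces, not the paper's code).
# teacher: frozen privileged state-based expert policy, sees full simulator state.
# student: depth-based multi-skill policy, sees onboard depth + proprioception + command.

def dagger_step(student, teacher, env, optimizer, horizon=64):
    obs = env.reset()  # assumed dict with 'privileged', 'depth', 'proprio', 'command'
    bc_losses = []
    for _ in range(horizon):
        with torch.no_grad():
            expert_action = teacher(obs["privileged"])
        student_action = student(obs["depth"], obs["proprio"], obs["command"])
        # Behavior-cloning term: regress the student onto the expert's action.
        bc_losses.append(((student_action - expert_action) ** 2).mean())
        # Step with the *student's* action so training covers its own visited states (DAgger).
        obs, reward, done, info = env.step(student_action.detach())
    loss = torch.stack(bc_losses).mean()  # an RL objective (e.g. PPO) would be added here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```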


This content is AI-processed based on open access ArXiv data.
