CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation
Recent progress in humanoid robots has unlocked agile locomotion skills, including backflipping, running, and crawling. Yet it remains challenging for a humanoid robot to perform forceful manipulation tasks such as moving objects, wiping, and pushing a cart. We propose adaptive Compliance for Humanoid control through hIndsight Perturbation (CHIP), a plug-and-play module that enables controllable end-effector stiffness while preserving agile tracking of dynamic reference motions. CHIP is easy to implement and requires neither data augmentation nor additional reward tuning. We show that a generalist motion-tracking controller trained with CHIP can perform a diverse set of forceful manipulation tasks that require different end-effector compliance, such as multi-robot collaboration, wiping, box delivery, and door opening.
💡 Research Summary
The paper introduces CHIP (adaptive Compliance for Humanoid control through hIndsight Perturbation), a lightweight plug‑and‑play module that equips any key‑point‑based humanoid motion‑tracking framework with controllable end‑effector compliance. The central insight is to keep the original reference motion untouched and instead modify the observation fed to the policy by subtracting the expected displacement caused by an applied perturbation force. During training, a random external force f is applied to an end‑effector for a random duration, and the user‑specified compliance coefficient k scales this force. The “hindsight” tracking goal presented to the policy is g_hind = g – k·f, where g is the original key‑point target. The reward, however, is still computed against the original goal g, typically as an exponential of the distance between the current end‑effector pose x_eef and g. This design eliminates the need to edit dense reference trajectories or redesign reward terms, both of which are especially problematic for dynamic skills such as running or back‑flipping.
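The goal/reward split above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the reward scale `sigma` is an assumed value:

```python
import numpy as np

def hindsight_goal(g, f, k):
    # CHIP's hindsight goal: shift the original key-point target g by
    # the expected compliant displacement k*f, i.e. g_hind = g - k*f.
    return g - k * f

def tracking_reward(x_eef, g, sigma=0.1):
    # The dense reward is still computed against the ORIGINAL goal g,
    # as an exponential of the end-effector tracking error.
    # (sigma is an illustrative scale, not taken from the paper.)
    return np.exp(-np.linalg.norm(x_eef - g) / sigma)

g = np.array([0.5, 0.0, 1.0])    # original key-point target
f = np.array([10.0, 0.0, 0.0])   # hypothetical 10 N perturbation
g_hind = hindsight_goal(g, f, k=0.01)  # what the policy observes
```

Because only the observation changes, the reference trajectory and reward pipeline of the underlying tracker stay untouched.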
Training uses PPO with an actor‑critic architecture. The critic receives the ground‑truth perturbation force as a privileged observation to improve value estimation, while both actor and critic are given a 10‑step history of proprioceptive data and past actions. This history enables the policy to infer the perturbation from noisy observations, effectively learning an implicit force estimator without an explicit model. At deployment time, only the original tracking goal g, the compliance coefficient k, and proprioception are required; the policy automatically generates actions that yield to external forces according to the commanded stiffness.
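The asymmetric observation design can be sketched as follows; all dimensions and the concrete layout are hypothetical, since the paper specifies only the 10-step history and the critic's privileged force:

```python
from collections import deque
import numpy as np

HISTORY = 10  # 10-step history of proprioception and past actions

class ObsBuilder:
    """Illustrative actor/critic observation construction for CHIP."""

    def __init__(self, proprio_dim, act_dim):
        zeros = np.zeros(proprio_dim + act_dim)
        self.hist = deque([zeros] * HISTORY, maxlen=HISTORY)

    def push(self, proprio, last_action):
        # Append the latest proprioception + action; deque drops the oldest.
        self.hist.append(np.concatenate([proprio, last_action]))

    def actor_obs(self, g_hind, k):
        # The actor sees only the history, the hindsight goal, and the
        # commanded compliance coefficient -- no force measurement.
        return np.concatenate([*self.hist, g_hind, [k]])

    def critic_obs(self, g_hind, k, f_true):
        # The critic additionally receives the ground-truth perturbation
        # force as a privileged observation for better value estimates.
        return np.concatenate([self.actor_obs(g_hind, k), f_true])
```

The actor never observes `f_true`, which is why it must infer the perturbation implicitly from the observation history at deployment time.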
CHIP is demonstrated in two settings.
- Local 3‑point tracking (head and both hands) where a kinematic planner supplies lower‑body reference poses. This configuration allows a single humanoid to perform agile motions (running, dancing, squatting) while its hands exhibit variable compliance for tasks such as wiping a whiteboard, opening a door, or delivering a box.
- Global 3‑point tracking (world‑frame head pose and wrist positions) for multi‑robot collaboration. By sharing the same global targets, multiple humanoids can coordinate their end‑effectors and apply complementary forces, enabling cooperative grasping and transport of objects larger than a single robot can handle. The authors extend the SpringGrasp optimization to a two‑robot setting, solving for collision‑free head orientations and wrist positions that satisfy a compliant grasp.
Experimental results show that CHIP‑augmented policies retain the high‑gain tracking performance of baseline motion‑tracking agents while dramatically improving robustness to external disturbances. In force‑rich manipulation scenarios the robots produce smooth, spring‑like responses whose magnitude is directly tunable via k. Importantly, no additional synthetic perturbation data, offline inverse‑kinematics augmentation, or reward reshaping is required; the method works directly on existing large‑scale motion‑capture datasets.
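The tunable spring-like behavior follows directly from the hindsight construction: if the policy drives its perceived tracking error to zero, the end-effector settles at g − k·f, so the steady-state displacement from the original goal scales linearly with k. A small worked example (force value hypothetical):

```python
import numpy as np

def equilibrium_displacement(f, k):
    # If the policy tracks its hindsight goal perfectly, the end-effector
    # settles at g_hind = g - k*f, i.e. displaced from the original goal
    # by -k*f -- a linear spring with stiffness 1/k.
    return -k * f

f = np.array([20.0, 0.0, 0.0])                # hypothetical 20 N push along x
soft = equilibrium_displacement(f, k=0.02)    # compliant setting
stiff = equilibrium_displacement(f, k=0.002)  # stiff setting
```

Halving or doubling k halves or doubles the yield, which is the direct tunability the experiments report.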
In summary, CHIP provides a simple yet powerful recipe: apply random perturbations during training, feed the “undo‑perturbation” goal to the policy, and keep the original dense reward unchanged. This yields a generalizable, scalable compliance mechanism that bridges the gap between agile humanoid locomotion and safe, controllable contact‑rich manipulation, opening the door to more capable, collaborative, and teleoperated humanoid robots.
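The perturbation half of this recipe amounts to sampling a random force and hold duration per training episode; the goal half is just the g_hind = g − k·f shift described above. A minimal sketch, with illustrative force and duration ranges that are not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_perturbation(max_force=50.0, max_steps=100):
    # Draw a random external force (uniform magnitude, random direction)
    # and a random number of steps to hold it on the end-effector.
    # Ranges here are assumptions for illustration only.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    magnitude = rng.uniform(0.0, max_force)
    duration = int(rng.integers(1, max_steps + 1))
    return magnitude * direction, duration

f, duration = sample_perturbation()
```

Outside the sampled window f = 0, so the hindsight goal coincides with the original goal and training reduces to ordinary motion tracking.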