SPIDER: Scalable Physics-Informed Dexterous Retargeting


Learning dexterous and agile policies for humanoid and dexterous hand control requires large-scale demonstrations, but collecting robot-specific data is prohibitively expensive. In contrast, abundant human motion data is readily available from motion capture, videos, and virtual reality, and could help address this data scarcity. However, due to the embodiment gap and missing dynamic information such as force and torque, these demonstrations cannot be directly executed on robots. To bridge this gap, we propose Scalable Physics-Informed DExterous Retargeting (SPIDER), a physics-based retargeting framework that transforms and augments kinematic-only human demonstrations into dynamically feasible robot trajectories at scale. Our key insight is that human demonstrations should provide the global task structure and objective, while large-scale physics-based sampling with curriculum-style virtual contact guidance refines trajectories to ensure dynamic feasibility and correct contact sequences. SPIDER scales across 9 diverse humanoid/dexterous hand embodiments and 6 datasets, improving success rates by 18% over standard sampling while being 10X faster than reinforcement learning (RL) baselines, and enables the generation of a 2.4M-frame dynamically feasible robot dataset for policy learning. As a universal physics-based retargeting method, SPIDER works with data of diverse quality and generates diverse, high-quality data that enables efficient policy learning with methods such as RL.


💡 Research Summary

The paper introduces SPIDER (Scalable Physics‑Informed Dexterous Retargeting), a framework that converts large‑scale human motion capture, video‑derived, or VR‑generated demonstrations—typically only kinematic—into dynamically feasible robot trajectories. The authors identify three core challenges: (1) the embodiment gap between human morphology and robot actuation, which makes direct kinematic transfer infeasible; (2) the need for a method that can scale to internet‑size datasets without prohibitive computational cost; and (3) the lack of force/torque information in human data, which hampers contact‑rich manipulation.

To address these, SPIDER formulates retargeting as a constrained optimization problem that minimizes a weighted sum of (i) deviation from the reference state trajectory (positions and velocities of robot joints, base, and objects) and (ii) control effort, subject to the robot’s forward dynamics f(x, u, t). Because the cost landscape is highly non‑convex and discontinuous (especially due to contacts), the authors adopt a sampling‑based optimizer with an annealed covariance schedule. At each iteration, a set of Gaussian perturbations is added to the current control sequence, the resulting trajectories are rolled out in a parallel physics simulator, and a weighted average of the best samples updates the control sequence. The annealing gradually shrinks the sampling radius, enabling coarse global exploration early on and fine local refinement later, which dramatically improves convergence compared with fixed‑radius methods such as MPPI.
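The optimization loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the paper's implementation): the toy `rollout_cost`, the MPPI-style softmax update, and all hyperparameter values are assumptions, and a real system would replace the rollout with a parallel physics simulator.

```python
# Hypothetical sketch of an annealed sampling-based trajectory optimizer.
# Function names, the tracking/effort cost split, and the geometric
# covariance decay are illustrative assumptions.
import numpy as np

def rollout_cost(controls, x0, dynamics, reference):
    """Roll out a control sequence and score deviation from the
    reference state trajectory plus control effort."""
    x, cost = x0, 0.0
    for t, u in enumerate(controls):
        x = dynamics(x, u)                      # forward dynamics f(x, u, t)
        cost += np.sum((x - reference[t]) ** 2) # tracking term
        cost += 1e-3 * np.sum(u ** 2)           # control-effort term
    return cost

def annealed_sampling_optimizer(x0, dynamics, reference, horizon, udim,
                                iters=50, samples=64, sigma0=1.0,
                                decay=0.95, temperature=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, udim))            # current control sequence
    sigma = sigma0
    for _ in range(iters):
        # Gaussian perturbations around the current control sequence.
        noise = rng.normal(0.0, sigma, size=(samples, horizon, udim))
        candidates = mean[None] + noise
        costs = np.array([rollout_cost(c, x0, dynamics, reference)
                          for c in candidates])
        # Softmax-weighted average of the samples (MPPI-style update).
        w = np.exp(-(costs - costs.min()) / temperature)
        w /= w.sum()
        mean = np.einsum('s,shu->hu', w, candidates)
        sigma *= decay  # anneal: coarse global search -> fine refinement
    return mean
```

The annealing schedule is the key departure from fixed-radius MPPI: early iterations with large `sigma` explore contact modes broadly, while late iterations polish the best basin found.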

A key innovation is Virtual Contact Guidance (VCG). In contact‑rich tasks, many feasible contact modes can achieve the same object motion, but only a subset matches the human demonstration’s intent. VCG introduces temporary virtual constraints (spring‑like forces) between the robot’s finger links and the target contact points on the object. During early optimization stages the constraints are strong, “sticking” the object to the desired hand configuration and expanding the feasible basin toward the intended contact mode. The constraint strength is annealed to zero as optimization proceeds, allowing the robot to naturally satisfy the physical contacts. An adaptive filter disables VCG for contacts that appear unstable (short duration or large drift), preventing noisy demonstrations from corrupting the solution.
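The mechanics of VCG can be illustrated with a short sketch. The spring form of the virtual force, the linear annealing schedule, and the stability thresholds below are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch of Virtual Contact Guidance (VCG). All constants
# and the specific spring/annealing forms are assumptions.
import numpy as np

def vcg_force(finger_pos, contact_pos, stiffness):
    """Spring-like virtual force pulling the object's target contact
    point toward the corresponding finger link."""
    return stiffness * (finger_pos - contact_pos)

def vcg_stiffness(iteration, total_iters, k0=100.0):
    """Anneal the virtual constraint strength to zero over the
    optimization, so the final trajectory relies only on real,
    physically simulated contacts."""
    return k0 * max(0.0, 1.0 - iteration / total_iters)

def stable_contact(duration, drift, min_duration=0.1, max_drift=0.02):
    """Adaptive filter: disable VCG for contacts that are too brief or
    drift too far, a sign of a noisy demonstration."""
    return duration >= min_duration and drift <= max_drift
```

Early on, full-strength virtual springs effectively "stick" the object to the demonstrated hand configuration, biasing the sampler toward the intended contact mode; by the final iteration the stiffness has decayed to zero and only genuine contact forces remain.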

To bridge the simulation‑to‑reality gap, SPIDER incorporates a worst‑case (min–max) objective over a bounded set of physical parameters (friction coefficients, compliance, mesh errors). By maximizing the cost over this set before minimizing, the resulting control sequence is robust to the full range of plausible dynamics, akin to a pessimistic domain randomization.
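The min-max idea reduces to scoring each candidate by its worst cost over a bounded set of sampled physics parameters. The sketch below is a simplified illustration; the parameter names, bounds, and sampling strategy are assumptions.

```python
# Minimal sketch of a worst-case (min-max) objective over physics
# parameters. Parameter names and ranges are illustrative assumptions.
import numpy as np

def robust_cost(controls, evaluate, param_sets):
    """Score a candidate by its worst cost over plausible dynamics:
    maximize over parameters before the outer minimization, a
    pessimistic counterpart of domain randomization."""
    return max(evaluate(controls, p) for p in param_sets)

def sample_params(rng, n=8):
    """Draw bounded physical parameters (e.g. friction, compliance)
    to stress-test each candidate control sequence."""
    return [{"friction": rng.uniform(0.5, 1.2),
             "compliance": rng.uniform(0.0, 1e-3)} for _ in range(n)]
```

Plugging `robust_cost` in place of a single-simulation cost inside the sampling optimizer yields control sequences that succeed across the whole parameter set rather than overfitting one simulator configuration.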

The authors evaluate SPIDER on nine distinct humanoid and dexterous hand embodiments (including XHand, Ability Hand, Inspire Hand, and Schunk Hand) across six publicly available human motion datasets (motion capture, video‑based 3D reconstruction, VR recordings). Compared with a baseline sampling method, SPIDER improves task success rates by an average of 18%. When contrasted with reinforcement‑learning pipelines that first train policies on the same tasks, SPIDER generates the required robot‑feasible data roughly ten times faster, enabling the creation of a 2.4‑million‑frame dataset covering five hand models and 103 objects. Policies trained on this dataset converge faster and achieve higher final performance than those trained on RL‑generated data.

Beyond dataset generation, SPIDER is demonstrated for (i) trajectory robustification for direct deployment on real hardware, (ii) augmentation of a single human demonstration to multiple objects and environments, and (iii) boosting RL policy learning by providing high‑quality, dynamically feasible demonstrations as a curriculum.

In summary, SPIDER combines (1) preservation of high‑level task intent from human demonstrations, (2) physics‑based sampling with annealed exploration‑exploitation, (3) virtual contact guidance to enforce human‑like contact modes, and (4) worst‑case robustness to dynamics uncertainty. This synergy yields a scalable, general‑purpose retargeting pipeline that turns abundant human motion data into valuable robot training material, dramatically reducing the data collection burden for dexterous manipulation and humanoid control research.

