One Hand to Rule Them All: Canonical Representations for Unified Dexterous Manipulation


Dexterous manipulation policies today largely assume fixed hand designs, severely restricting their generalization to new embodiments with varied kinematic and structural layouts. To overcome this limitation, we introduce a parameterized canonical representation that unifies a broad spectrum of dexterous hand architectures. It comprises a unified parameter space and a canonical URDF format, offering three key advantages. 1) The parameter space captures essential morphological and kinematic variations for effective conditioning in learning algorithms. 2) A structured latent manifold can be learned over our space, where interpolations between embodiments yield smooth and physically meaningful morphology transitions. 3) The canonical URDF standardizes the action space while preserving dynamic and functional properties of the original URDFs, enabling efficient and reliable cross-embodiment policy learning. We validate these advantages through extensive analysis and experiments, including grasp policy replay, VAE latent encoding, and cross-embodiment zero-shot transfer. Specifically, we train a VAE on the unified representation to obtain a compact, semantically rich latent embedding, and develop a grasping policy conditioned on the canonical representation that generalizes across dexterous hands. We demonstrate, through simulation and real-world tasks on unseen morphologies (e.g., 81.9% zero-shot success rate on 3-finger LEAP Hand), that our framework unifies both the representational and action spaces of structurally diverse hands, providing a scalable foundation for cross-hand learning toward universal dexterous manipulation.


💡 Research Summary

The paper tackles a fundamental bottleneck in dexterous manipulation research: most learning‑based policies are tightly coupled to a single hand design, making it difficult to transfer skills across robots with different kinematic structures, finger counts, or actuation limits. To address this, the authors introduce a “canonical representation” that unifies a wide variety of robotic hands into a single, learnable description. The approach consists of two main components.

First, they define a canonical URDF format that enforces a consistent coordinate convention (palm normal +X, thumb +Y, other fingers +Z) and represents every link as a simple capsule primitive. This eliminates the heterogeneous global and local frame definitions that plague existing URDFs and provides a clean, physics‑compatible skeleton that can be generated automatically.

Second, they compress the essential geometric and kinematic attributes of a hand into a compact parameter vector. The core set contains 82 scalar values (with an extended 173‑parameter version for more exotic designs) describing finger lengths, joint limits, link radii, and other morphology‑defining quantities. By extracting these parameters from any hand’s original URDF and feeding them into a Jinja2‑based template, the pipeline can both parse arbitrary hand models into the canonical space and reconstruct a full URDF from the parameters, enabling bidirectional conversion with minimal manual effort.
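The parse-and-reconstruct step can be sketched as follows. This is a minimal stand-in for the paper's Jinja2 pipeline using only the standard library; the parameter names, values, and URDF snippet are illustrative, not the authors' actual 82-parameter schema.

```python
from string import Template

# Hypothetical subset of the canonical parameter vector; the names and
# values are illustrative, not the paper's actual schema.
params = {
    "link_name": "index_proximal",
    "length": 0.045,   # capsule length along the finger axis (m)
    "radius": 0.010,   # capsule radius (m)
    "lower": -0.3,     # joint limit (rad)
    "upper": 1.6,
}

# Minimal stand-in for the paper's Jinja2 template: one capsule link plus
# its revolute joint. <capsule> is a URDF extension accepted by some
# simulators (e.g., Drake); plain URDF would approximate it with a
# cylinder capped by spheres.
LINK_TMPL = Template("""\
<link name="$link_name">
  <collision>
    <geometry><capsule radius="$radius" length="$length"/></geometry>
  </collision>
</link>
<joint name="${link_name}_joint" type="revolute">
  <limit lower="$lower" upper="$upper" effort="1.0" velocity="3.0"/>
</joint>""")

snippet = LINK_TMPL.substitute(params)
print(snippet)
```

Running the inverse direction (parsing an arbitrary URDF into this parameter dict) closes the loop and yields the bidirectional conversion described above.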

With the canonical URDF in place, all hands share a fixed 22‑DoF control structure. Hands that have fewer active joints simply treat the missing joints as dummy variables, preserving a uniform action vector size and ordering across embodiments. This unified action space allows a single policy network to output a 22‑dimensional command vector regardless of the underlying hardware.
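The dummy-joint scheme can be illustrated with a short sketch. The masking logic and slot assignment below are assumptions for illustration; the paper's canonical joint ordering determines which slots a given hand actually occupies.

```python
import numpy as np

CANON_DOF = 22  # fixed canonical action dimension shared by all hands

def to_hardware_command(action, active_mask):
    """Map a canonical 22-D policy action to a specific embodiment.

    active_mask marks which canonical joints exist on this hand; the
    remaining slots are dummy joints and are simply dropped. The mask is
    a per-hand constant derived from the parameter vector (illustrative
    here, not the authors' exact mapping).
    """
    assert action.shape == (CANON_DOF,)
    return action[active_mask]

# Example: a 16-DoF hand occupying the first 16 canonical slots.
mask = np.zeros(CANON_DOF, dtype=bool)
mask[:16] = True
cmd = to_hardware_command(np.linspace(0.0, 1.0, CANON_DOF), mask)
print(cmd.shape)  # (16,)
```

Because the policy always emits 22 values in a fixed order, the same network weights drive hands with any subset of active joints.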

To demonstrate the utility of the representation, the authors conduct three complementary studies. They first train a variational auto‑encoder (VAE) on a large collection of sampled hand parameters. The resulting latent space is smooth and semantically meaningful: linear interpolations between latent codes of two distinct hands produce physically plausible intermediate morphologies, confirming that the parameterization captures the essential degrees of freedom of hand design.
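The interpolation experiment reduces to simple latent-space arithmetic; a sketch, assuming a hypothetical 16-D latent dimension (the paper's actual latent size is not stated in this summary), looks like this:

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=5):
    """Linearly interpolate between two VAE latent codes.

    Decoding each intermediate code (decoder omitted here) is what
    produces the smooth, physically plausible morphology transitions
    described above; this sketch shows only the latent-space step.
    """
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - t) * z_a + t * z_b for t in ts])

z_hand_a = np.zeros(16)  # hypothetical latent code of hand A
z_hand_b = np.ones(16)   # hypothetical latent code of hand B
path = interpolate_latents(z_hand_a, z_hand_b)
print(path.shape)  # (5, 16)
```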

Second, they replay a grasping policy with both the original URDFs and the canonical URDFs. Performance differences are negligible (within 1% success rate), showing that the canonical format preserves the functional dynamics of the original models.

Third, they condition a dexterous grasping policy on the canonical parameter vector and train it jointly on over one hundred variants of the LEAP Hand. When tested on an unseen 3‑finger configuration, the policy achieves an 81.9% zero‑shot success rate in simulation and comparable results on a real robot, without any fine‑tuning. This demonstrates that the learned policy can generalize across substantial morphological changes simply by receiving a different parameter vector as input.
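Conditioning on morphology amounts to feeding the parameter vector to the policy alongside the task observation. A minimal sketch, assuming plain concatenation and illustrative dimensions (the paper's actual observation layout and conditioning mechanism may differ):

```python
import numpy as np

def policy_input(obs, hand_params):
    """Build a morphology-conditioned policy input.

    Concatenating the canonical parameter vector with the observation is
    what lets one network serve many embodiments: switching to a new
    hand only changes hand_params, not the architecture. The observation
    layout here is a placeholder.
    """
    return np.concatenate([obs, hand_params])

obs = np.zeros(48)          # e.g., joint states + object pose (illustrative)
hand_params = np.zeros(82)  # core 82-D canonical morphology descriptor
x = policy_input(obs, hand_params)
print(x.shape)  # (130,)
```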

The contributions are threefold: (1) a compact, interpretable, and learnable hand representation that standardizes both morphology and kinematics; (2) extensive experiments showing that the representation preserves functional behavior while enabling a unified action space; and (3) the first scalable framework that supports joint policy training across many hand designs, paving the way for large‑scale, morphology‑aware dexterous manipulation research.

Limitations include the reliance on capsule‑shaped links and a human‑hand‑inspired topology, which may not capture exotic soft‑hand or non‑anthropomorphic gripper designs without further extensions. Future work could explore richer geometric primitives, meta‑learning for rapid adaptation to novel morphologies, and application to more complex manipulation tasks such as in‑hand reorientation or tool use. Overall, the canonical representation offers a practical and powerful foundation for building universal dexterous manipulation systems that transcend individual robot hand designs.

