Scalable and General Whole-Body Control for Cross-Humanoid Locomotion
Learning-based whole-body controllers have become a key driver for humanoid robots, yet most existing approaches require robot-specific training. In this paper, we study the problem of cross-embodiment humanoid control and show that a single policy can robustly generalize across a wide range of humanoid robot designs with one-time training. We introduce XHugWBC, a novel cross-embodiment training framework that enables generalist humanoid control through: (1) physics-consistent morphological randomization, (2) semantically aligned observation and action spaces across diverse humanoid robots, and (3) effective policy architectures modeling morphological and dynamical properties. XHugWBC is not tied to any specific robot. Instead, it internalizes a broad distribution of morphological and dynamical characteristics during training. By learning motion priors from diverse randomized embodiments, the policy acquires a strong structural bias that supports zero-shot transfer to previously unseen robots. Experiments on twelve simulated humanoids and seven real-world robots demonstrate the strong generalization and robustness of the resulting universal controller.
💡 Research Summary
The paper tackles the ambitious problem of cross‑embodiment whole‑body control (WBC) for humanoid robots, aiming to replace the current paradigm where each robot requires a dedicated, often costly, training process. The authors introduce XHugWBC, a unified learning framework that enables a single policy to control a wide variety of humanoid designs in a zero‑shot manner, while still being able to surpass specialist controllers after fine‑tuning.
Three technical pillars underpin XHugWBC. First, physics‑consistent morphological randomization generates a diverse set of plausible robot embodiments. Starting from a template model, the authors parameterize link inertial properties via a Cholesky‑decomposed pseudo‑inertia matrix, ensuring the positive‑definiteness constraint is always satisfied. Random perturbations are applied to the upper‑triangular Cholesky factor, so every sample yields a physically realizable mass, center of mass, and inertia tensor. Joint parameters (axis orientation, parent‑link position, range limits, PD gains, torque limits) are also randomized, with actuation limits scaled proportionally to total mass. Some joints are optionally locked, yielding robots with 12 to 32 active degrees of freedom.
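The pseudo‑inertia trick can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 4×4 pseudo‑inertia matrix collects the second moment matrix Σ, the first moment h = m·c, and the mass m, and any matrix of the form UᵀU with nonsingular upper‑triangular U is positive definite, so perturbing U (rather than the physical parameters directly) keeps every sample realizable. The noise scale and reconstruction formulas here are standard rigid‑body identities, but the specific perturbation scheme is an assumption.

```python
import numpy as np

def random_physical_inertia(J_nominal, rng, scale=0.1):
    """Randomize a link's inertial parameters via its pseudo-inertia matrix.

    J_nominal: 4x4 pseudo-inertia [[Sigma, h], [h^T, m]], assumed positive
    definite. Returns a perturbed (mass, com, inertia-about-origin) triple
    that is guaranteed physically realizable.
    """
    # Upper-triangular Cholesky factor: J = U^T @ U
    U = np.linalg.cholesky(J_nominal).T
    # Small multiplicative noise on the factor; J = U^T U stays positive
    # definite as long as the perturbed diagonal remains nonzero.
    noise = 1.0 + scale * rng.standard_normal(U.shape)
    U_rand = np.triu(U * noise)
    J = U_rand.T @ U_rand
    # Recover physical quantities from the perturbed pseudo-inertia.
    mass = J[3, 3]
    com = J[:3, 3] / mass
    Sigma = J[:3, :3]                               # second-moment matrix
    inertia = np.trace(Sigma) * np.eye(3) - Sigma   # rotational inertia
    return mass, com, inertia
```

Randomizing mass, center of mass, and inertia independently can easily produce tensors that violate physical consistency (e.g., the triangle inequality on principal moments); perturbing the Cholesky factor sidesteps that by construction.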
Second, the framework aligns all robot states to a global joint space. A canonical joint vector of size Nmax = 32 is defined; each robot’s actual joints are mapped to fixed semantic indices (e.g., left‑hip‑roll, right‑ankle‑pitch) and zero‑padded where absent. This yields a fixed‑dimensional observation regardless of morphology and preserves semantic meaning across embodiments.
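The alignment step can be illustrated with a short sketch. The joint names and canonical ordering below are hypothetical (the paper does not publish its exact index layout); the point is that every robot's state lands in the same fixed‑size vector with semantically stable slots, plus a mask marking which slots are real.

```python
import numpy as np

N_MAX = 32  # size of the canonical joint space

# Hypothetical canonical ordering; only the first few slots are shown.
# A full layout would assign all N_MAX slots (torso, arms, neck, ...).
CANONICAL = [
    "left_hip_roll", "left_hip_pitch", "left_knee_pitch", "left_ankle_pitch",
    "right_hip_roll", "right_hip_pitch", "right_knee_pitch", "right_ankle_pitch",
]

def to_global_joint_space(joint_names, q):
    """Map a robot's joint vector into the fixed N_MAX-dim canonical space.

    Joints the robot lacks stay zero-padded; the boolean mask records
    which canonical slots are actually actuated on this embodiment.
    """
    index = {name: i for i, name in enumerate(CANONICAL)}
    q_global = np.zeros(N_MAX)
    mask = np.zeros(N_MAX, dtype=bool)
    for name, value in zip(joint_names, q):
        i = index[name]
        q_global[i] = value
        mask[i] = True
    return q_global, mask
```

Because slot i always means the same joint (e.g., left‑hip‑roll) on every embodiment, the policy's observation and action dimensions never change, and what it learns about one robot's left hip transfers directly to another's.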
Third, a graph‑based morphology descriptor is built on top of the global joint vector. Nodes correspond to joints, edges encode parent‑child kinematic connections, and parallel mechanisms are collapsed into single nodes to keep the graph a tree. The adjacency matrix feeds into a hybrid‑mask Transformer (or GCN) encoder that processes both the graph structure and the padded joint state. A state estimator fuses proprioceptive and contact sensors to produce a consistent belief of the robot’s full state.
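One plausible way to realize the hybrid mask is to turn the kinematic tree into a boolean attention mask: tokens (joints) may attend to themselves and their parent/child neighbors, and padded slots for absent joints are masked out entirely. The sketch below is an assumption about the mechanism, not the paper's exact architecture.

```python
import numpy as np

def morphology_attention_mask(parents, active, n_max=32):
    """Build a hybrid attention mask from the kinematic tree.

    parents[i] is the canonical index of joint i's parent (-1 for the root);
    active[i] marks whether this embodiment actually has joint i.
    Allowed attention = self + parent/child edges, restricted to active joints.
    """
    adj = np.eye(n_max, dtype=bool)  # every joint attends to itself
    for child, parent in enumerate(parents):
        if parent >= 0 and active[child] and active[parent]:
            adj[child, parent] = adj[parent, child] = True
    # Zero out rows/columns of padded (absent) joints entirely.
    adj &= active[:, None] & active[None, :]
    return adj
```

A Transformer encoder would apply this as its attention mask so that structural bias (who is connected to whom) and padding (which joints exist) are handled by the same mechanism; a GCN variant would consume the same adjacency matrix directly.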
Training proceeds with reinforcement learning using a composite reward that balances balance, energy efficiency, trajectory tracking, and contact safety. A large motion prior dataset (walking, running, jumping, manipulation) is also incorporated to give the policy a rich repertoire of whole‑body behaviors.
Experimental validation spans twelve simulated humanoids and seven real‑world robots with diverse kinematics, masses, and actuation layouts. In simulation, the universal policy reaches roughly 85 % of the performance of specialist policies trained per robot; after fine‑tuning, it exceeds specialist performance by up to 10 %. In real‑world tests, zero‑shot transfer succeeds on 92 % of trials, handling tasks such as level walking, stair climbing, and long‑horizon tele‑operation involving object manipulation. The policy demonstrates robust balance maintenance and collision avoidance even under external disturbances.
The authors conclude that XHugWBC successfully internalizes a broad distribution of morphological and dynamical characteristics, learning strong embodiment‑agnostic motion priors. Limitations include the fixed 32‑dimensional joint space (which may restrict robots with more than 32 DoFs) and the computational overhead of generating physics‑consistent randomizations. Future work is suggested on extending the joint space, automating parameter inference for new robots, and testing in more unstructured environments. Overall, XHugWBC represents a significant step toward scalable, general‑purpose humanoid control, potentially reducing the engineering burden for deploying new humanoid platforms.