CTBC: Contact-Triggered Blind Climbing for Wheeled Bipedal Robots with Instruction Learning and Reinforcement Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In recent years, wheeled bipedal robots have garnered significant attention due to their exceptional mobility on flat terrain. However, while stair climbing has been achieved in prior studies, these existing methods often suffer from a severe lack of versatility, making them difficult to adapt to varying hardware specifications or diverse complex terrains. To overcome these limitations, we propose a generalized Contact-Triggered Blind Climbing (CTBC) framework. Upon detecting wheel-obstacle contact, the framework triggers a leg-lifting motion integrated with a strongly-guided feedforward trajectory. This allows the robot to rapidly acquire agile climbing skills, significantly enhancing its capability to traverse unstructured environments. Distinct from previous approaches, CTBC demonstrates superior robustness and adaptability, having been validated across multiple wheeled bipedal platforms with different wheel radii and tire materials. Real-world experiments demonstrate that, relying solely on proprioceptive feedback, the proposed framework enables robots to achieve reliable and continuous climbing over obstacles well beyond their wheel radius.


💡 Research Summary

The paper introduces a novel “Contact‑Triggered Blind Climbing” (CTBC) framework that enables wheeled‑bipedal robots to climb obstacles taller than their wheel radius using only proprioceptive feedback. The core idea is simple yet powerful: when a wheel makes contact with an obstacle, the measured horizontal contact force is filtered through a three‑frame sliding window. If the force exceeds a predefined threshold consistently, a feed‑forward reference trajectory is instantly activated, commanding the corresponding leg to lift and swing. This contact‑triggered mechanism is tightly coupled with a reinforcement‑learning (RL) policy that governs the remaining degrees of freedom.
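The triggering logic described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the threshold value, the force-estimation method, and the class/parameter names are assumptions; only the three-consecutive-frames condition comes from the summary.

```python
from collections import deque


class ContactTrigger:
    """Sliding-window contact trigger (illustrative sketch).

    Fires when the estimated horizontal wheel-contact force exceeds a
    threshold for three consecutive control frames, filtering out
    single-frame spikes. Threshold and window length are assumed values,
    tuned per platform.
    """

    def __init__(self, force_threshold=15.0, window=3):
        self.force_threshold = force_threshold  # N (assumed, platform-specific)
        self.history = deque(maxlen=window)     # last `window` force estimates

    def update(self, horizontal_force):
        """Push the latest force estimate; return True when the trigger fires."""
        self.history.append(horizontal_force)
        return (len(self.history) == self.history.maxlen
                and all(f > self.force_threshold for f in self.history))
```

A single noisy spike leaves the trigger silent; only sustained contact over the full window activates the feed-forward leg-lifting trajectory.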

The RL component uses Proximal Policy Optimization (PPO) with an asymmetric actor-critic architecture. The actor receives only observations available on the real robot (joint positions, joint velocities, the gravity vector, recent actions, etc.), while the critic receives privileged information during training, such as precise contact forces, height scans, and linear velocities. This separation improves sample efficiency and facilitates robust sim-to-real transfer.
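The observation split behind the asymmetric actor-critic can be sketched as follows. The field names and dimensions are hypothetical placeholders (real code would operate on batched tensors); the point is that the critic's input is a strict superset of the actor's deployable observations.

```python
def actor_obs(state):
    """Proprioceptive observations available on the real robot (sketch)."""
    return (state["joint_pos"] + state["joint_vel"]
            + state["gravity_vec"] + state["last_action"])


def critic_obs(state):
    """Privileged observations used only by the critic during training.

    Extends the actor's input with quantities that exist in simulation but
    are not directly measurable on hardware.
    """
    return (actor_obs(state)
            + state["contact_forces"] + state["height_scan"] + state["lin_vel"])
```

Because the actor never consumes the privileged fields, the trained policy can be deployed with proprioception alone, which is exactly what "blind" climbing requires.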

Training occurs in NVIDIA’s Isaac Gym, a GPU‑accelerated simulator that allows massive parallel roll‑outs. The terrain is an 8 × 8 m arena divided into ten columns (smooth slope, rough slope, six stair types, and discrete obstacles) and ten rows of increasing difficulty, forming a curriculum that gradually raises the challenge. Domain randomization (friction, tire stiffness, sensor noise) is applied to close the reality gap, and policies are cross‑validated in MuJoCo for additional fidelity.
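A minimal sketch of the curriculum grid and per-episode randomization described above, assuming the terrain-column names and all randomization ranges (friction, tire stiffness, sensor noise) as placeholders; the summary specifies only the grid layout and the randomized quantities, not their values.

```python
import random

# Terrain columns per the summary: smooth slope, rough slope, six stair
# variants, and discrete obstacles. Names are illustrative placeholders.
TERRAIN_COLUMNS = (["smooth_slope", "rough_slope"]
                   + [f"stairs_v{i}" for i in range(1, 7)]
                   + ["discrete_obstacles"])

NUM_ROWS = 10  # rows of increasing difficulty form the curriculum


def sample_env(row, rng=random):
    """Sample one training environment at curriculum level `row` (0-indexed).

    Difficulty scales linearly with the row; physics parameters are
    re-randomized each episode to close the sim-to-real gap.
    """
    terrain = rng.choice(TERRAIN_COLUMNS)
    difficulty = (row + 1) / NUM_ROWS
    domain = {
        "friction": rng.uniform(0.4, 1.2),             # assumed range
        "tire_stiffness_scale": rng.uniform(0.8, 1.2), # assumed range
        "obs_noise_std": rng.uniform(0.0, 0.02),       # assumed range
    }
    return terrain, difficulty, domain
```

In a real Isaac Gym setup these samples would parameterize thousands of parallel environments per rollout; promotion to a higher row would depend on per-environment success rates.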

The reward function is composed of three layers: (1) task rewards that penalize deviations from commanded linear and angular velocities and encourage foot‑lifting actions; (2) style rewards that promote natural foot placement, air‑time, and clearance; and (3) regularization terms that suppress excessive torques, joint limits, and abrupt action changes. Crucially, foot‑lifting rewards (target position tracking, air time, contact number, clearance) are conditional – they are only activated when the contact trigger fires. This design decouples high‑speed wheeled cruising from the more energy‑intensive leg‑stepping phase, allowing the robot to remain efficient on flat ground while reacting instantly to obstacles.
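The layered, conditionally gated reward can be sketched as below. The weights and the assumption that sub-rewards arrive pre-summed are placeholders; the structure (three layers, with foot-lifting terms active only while the contact trigger fires) follows the summary.

```python
def climbing_reward(task, style, reg, lift, triggered,
                    w_task=1.0, w_style=0.5, w_reg=0.1, w_lift=0.8):
    """Layered reward sketch; weights are assumed, not the paper's values.

    task : summed task terms (velocity-tracking, foot-lifting encouragement)
    style: summed style terms (foot placement, air time, clearance)
    reg  : summed regularization penalties (torque, joint limits, action rate)
    lift : dict of foot-lifting terms (target tracking, air time,
           contact number, clearance), gated by the contact trigger
    """
    r = w_task * task + w_style * style - w_reg * reg
    if triggered:
        # Foot-lifting rewards contribute only after wheel-obstacle contact,
        # so flat-ground cruising stays in the efficient wheeled regime.
        r += w_lift * sum(lift.values())
    return r
```

The gating is the key design choice: on flat ground the policy is never rewarded for stepping, so leg lifts emerge only in response to the trigger.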

Two hardware platforms are used for validation: the LimX Dynamics Tron1 (11 cm solid rubber tires) and the Cowarobot R0 (12.7 cm pneumatic tires). Both robots continuously climb 20 cm stairs, an obstacle height well beyond their wheel radius, and the R0 also handles 7.5 cm stairs. Experiments show that the feed-forward instruction dramatically accelerates learning (≈30 % faster convergence) and raises obstacle-crossing success rates above 95 %. Ablation studies confirm that removing either the contact trigger or the feed-forward trajectory degrades performance significantly, leading to slower learning and lower success rates.

The authors discuss several limitations. The contact‑force threshold must be tuned per platform, and high‑frequency noise during fast rolling can cause false triggers. The current implementation is limited to bipeds; extending the approach to quadrupeds or asymmetric wheel‑leg configurations will require additional research. Moreover, the feed‑forward trajectories are hand‑designed; integrating meta‑learning or trajectory‑generation networks could yield a fully autonomous system.

In summary, CTBC demonstrates that a tightly integrated pipeline—contact detection → deterministic leg‑lifting instruction → RL policy—can endow wheeled‑bipedal robots with a universal, hardware‑agnostic capability to surmount obstacles far exceeding wheel size, using only internal sensing. The work advances the state of the art in legged‑wheeled locomotion and opens avenues for more versatile, energy‑efficient robots operating in unstructured environments.

