Enhancing Navigation Efficiency of Quadruped Robots via Leveraging Personal Transportation Platforms
Quadruped robots face limitations in long-range navigation efficiency due to their reliance on legs. To ameliorate these limitations, we introduce a Reinforcement Learning-based Active Transporter Riding method (*RL-ATR*), inspired by humans' use of personal transporters such as Segways. *RL-ATR* features a transporter riding policy and two state estimators. The policy devises adequate maneuvering strategies according to transporter-specific control dynamics, while the estimators resolve sensor ambiguities in non-inertial frames by inferring unobservable robot and transporter states. Comprehensive evaluations in simulation validate proficient command-tracking abilities across various transporter-robot models and reduced energy consumption compared to legged locomotion. Moreover, we conduct ablation studies to quantify individual component contributions within *RL-ATR*. This riding ability could broaden the locomotion modalities of quadruped robots, potentially expanding their operational range and efficiency.
💡 Research Summary
The paper introduces a novel approach for extending the long‑range navigation efficiency of quadruped robots by enabling them to ride personal transportation platforms such as Segways or hoverboards. The authors call this capability “Active Transporter Riding” and implement it through a reinforcement‑learning (RL) framework named RL‑ATR (Reinforcement Learning‑based Active Transporter Riding).
Problem Motivation
Quadruped robots excel at traversing rough terrain but suffer from low speed and high energy consumption during long‑range missions because they rely solely on legged locomotion. Existing multimodal solutions attach wheels or skates permanently to the robot’s legs, which raises hardware cost, adds weight, and compromises performance in each mode. Inspired by how humans opportunistically use shared personal transporters, the authors propose to treat the platform as an external, temporary vehicle that the robot can mount when high speed or low energy consumption is desired.
System Modeling
Two representative transporter designs are modeled:
- **Single‑board transporter** – forward acceleration and yaw rate are controlled by pitching and rolling the board. The dynamics are expressed in equations (1)–(3), where the board's pitch angle is clipped, resistance forces are modeled, and the board's mass and inertia are taken into account.
- **Two‑board transporter** – two parallel boards are linked by a central pivot. Forward motion is generated by the average pitch of the two boards, while turning is generated by the differential pitch. Equations (7)–(10) capture these dynamics, and separate self‑balancing controllers act on each board. An altitude‑maintenance controller is added to compensate for the limited degrees of freedom.
Both designs assume that the robot can only influence the platform by shifting its weight through foot contacts; the platform’s internal propulsion (wheels, turbines, etc.) is abstracted as an acceleration term.
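To make the pitch-and-roll control idea concrete, here is a minimal sketch of single-board-style dynamics. The paper's exact equations (1)–(3) are not reproduced in this summary, so the model below is an assumption: forward acceleration proportional to the clipped board pitch minus a velocity-dependent resistance, and yaw rate proportional to board roll. All gains and limits are illustrative placeholders.

```python
import math

K_ACC = 8.0        # pitch-to-acceleration gain (assumed)
K_YAW = 2.5        # roll-to-yaw-rate gain (assumed)
C_RES = 0.4        # linear resistance coefficient (assumed)
PITCH_LIMIT = math.radians(15.0)  # board pitch clip (assumed)

def step(v, heading, pitch, roll, dt=0.01):
    """Advance forward speed and heading by one Euler step."""
    # Clip the board pitch, as described for the single-board model.
    pitch = max(-PITCH_LIMIT, min(PITCH_LIMIT, pitch))
    acc = K_ACC * math.sin(pitch) - C_RES * v  # propulsion minus resistance
    v_next = v + acc * dt
    heading_next = heading + K_YAW * roll * dt
    return v_next, heading_next
```

Note that the robot never commands `pitch` and `roll` directly; it can only induce them by shifting its weight through foot contacts, which is what makes the riding problem nontrivial.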
RL‑ATR Framework
The problem is formulated as a Partially Observable Markov Decision Process (POMDP) because the robot's onboard sensors provide measurements in a non‑inertial frame, making some states (e.g., true platform velocity) unobservable. The RL policy π_θ consists of three main components:
- **Actor backbone (π_θ^a)** – receives a fused observation and outputs joint displacement commands Δq. A PD controller then converts Δq (added to a nominal standing posture) into joint torques.
- **Encoder (π_θ^enc)** – embeds intrinsic privileged parameters (mass, friction, self‑balancing gains, etc.) into a 16‑dimensional latent vector z_int.
- **Estimators** – two neural networks, an intrinsic estimator (e_φ^int) and an extrinsic estimator (e_φ^ext), are trained jointly with the policy. They infer the privileged information from a history of proprioceptive observations (e.g., body accelerations, angular velocities, joint states) using a CNN‑GRU architecture. During deployment, the estimators replace the privileged data, providing the policy with an online estimate of the hidden states.
The observation fed to the policy includes current proprioceptive measurements, the previous action, and the commanded linear/angular velocity (c_v, c_ω). The privileged information is split into intrinsic (dynamic model parameters) and extrinsic (relative pose, velocities, foot‑contact flags) parts.
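The action pathway from policy output to joint torques can be sketched as follows. The PD gains and the nominal standing posture below are illustrative assumptions, not values from the paper; only the structure (Δq added to a nominal posture, then PD control) follows the description above.

```python
KP, KD = 40.0, 1.0                  # assumed PD gains
Q_NOM = [0.0, 0.8, -1.6] * 4        # assumed nominal standing posture (12 joints)

def pd_torques(dq, q, qdot):
    """tau_i = Kp * ((q_nom_i + dq_i) - q_i) - Kd * qdot_i

    dq   : joint displacements output by the actor backbone
    q    : measured joint positions
    qdot : measured joint velocities
    """
    return [KP * ((qn + d) - qi) - KD * vi
            for qn, d, qi, vi in zip(Q_NOM, dq, q, qdot)]
```

With zero displacements and the robot already at the nominal posture, the controller outputs zero torque, which corresponds to quietly standing on the platform.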
Curriculum Learning
A grid‑adaptive curriculum gradually expands the command distribution from low speeds and small yaw rates to the full target range. This staged difficulty helps the policy first master stable balancing on a static platform before tackling aggressive maneuvers.
Training Details
Domain randomization is applied to both robot and transporter parameters (payload mass, CoM shift, PD gains, platform mass, friction coefficient, self‑balancing gains, etc.) as listed in Table II. Randomization ranges are wider for testing than for training to evaluate generalization.
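The randomization scheme can be sketched as sampling each parameter uniformly from a range, with the ranges widened at test time. The parameter names and ranges below are illustrative placeholders, not the values from the paper's Table II.

```python
import random

TRAIN_RANGES = {
    "payload_mass": (0.0, 3.0),    # kg (assumed)
    "friction": (0.4, 1.0),        # ground friction coefficient (assumed)
    "platform_mass": (8.0, 14.0),  # kg (assumed)
    "balance_gain": (0.8, 1.2),    # self-balancing gain scale (assumed)
}

def widen(lo_hi, factor=1.25):
    """Widen a range about its midpoint, e.g. for test-time evaluation."""
    lo, hi = lo_hi
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    return mid - factor * half, mid + factor * half

def sample_params(ranges):
    """Draw one episode's worth of randomized parameters."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

TEST_RANGES = {k: widen(r) for k, r in TRAIN_RANGES.items()}
```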
Experimental Evaluation
Simulations are conducted with four quadruped platforms (A1, Go1, Anymal‑C, Spot) and the two transporter types, yielding eight robot‑transporter combinations. Performance metrics include:
- **Command tracking error** – root‑mean‑square error between desired and actual platform velocities.
- **Cost of Transport (CoT)** – energy consumption normalized by robot weight and distance traveled.
- **Stability indicators** – duration of continuous foot contacts and incidence of falls.
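The two scalar metrics above can be computed as follows. CoT here uses the common definition E / (m · g · d); the paper's exact normalization may differ slightly.

```python
import math

def tracking_rmse(desired, actual):
    """Root-mean-square error between desired and actual velocities."""
    n = len(desired)
    return math.sqrt(sum((d - a) ** 2 for d, a in zip(desired, actual)) / n)

def cost_of_transport(energy_j, mass_kg, distance_m, g=9.81):
    """Dimensionless energy cost per unit weight and per unit distance."""
    return energy_j / (mass_kg * g * distance_m)
```

A lower CoT means the robot spends less energy to move each newton of its weight one meter, which is the quantity the riding mode improves relative to walking.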
Results show that RL‑ATR achieves sub‑5 % tracking error across all scenarios. Compared with pure legged locomotion, CoT is reduced by an average of 30 % (up to 40 % in high‑speed regimes). The policy remains robust to sudden command changes and to simulated disturbances such as wind gusts or varying ground friction.
Ablation Studies
Three ablations are performed:
- **Removing the estimators** – leads to a two‑fold increase in tracking error and loss of most of the energy savings, confirming the necessity of online state inference.
- **Training without the curriculum** – slows convergence dramatically and produces unstable behaviors, highlighting the importance of staged difficulty.
- **Feeding privileged parameters directly at test time** – causes severe over‑reliance on simulation‑specific data, resulting in a performance collapse when real‑world mismatch appears.
Discussion and Limitations
The work is currently limited to simulation; real‑world implementation may expose modeling gaps in contact dynamics and friction. The platform control assumes only pitch‑based actuation; more complex platforms with steering or variable‑height mechanisms are not addressed. Extreme disturbances (e.g., abrupt platform stops) are only lightly tested, and a dedicated recovery strategy is absent.
Conclusion
RL‑ATR demonstrates that a quadruped robot can learn to ride external personal transporters by modulating its posture, using reinforcement learning together with learned state estimators and curriculum training. The method yields substantial improvements in speed and energy efficiency while preserving stability, opening a new “riding” modality for legged robots. Future work should focus on hardware validation, richer platform dynamics, and robust recovery mechanisms under severe disturbances.