Feasibility-Guided Planning over Multi-Specialized Locomotion Policies

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Planning over unstructured terrain is a significant challenge in legged robotics. Although recent work in reinforcement learning has produced a variety of locomotion strategies, planning over multiple expert policies remains difficult. Existing approaches face clear constraints: traditional planners cannot integrate skill-specific policies, while hierarchical learning frameworks often sacrifice interpretability and require retraining whenever new policies are added. In this paper, we propose a feasibility-guided planning framework that incorporates multiple terrain-specific policies. Each policy is paired with a Feasibility-Net that learns to predict feasibility tensors from local elevation maps and task vectors. This integration allows classical planning algorithms to derive optimal paths. Through both simulated and real-world experiments, we demonstrate that our method efficiently generates reliable plans across diverse and challenging terrains, while consistently aligning with the capabilities of the underlying policies.


💡 Research Summary

The paper addresses the long‑standing challenge of planning over unstructured terrain when a legged robot possesses multiple specialized locomotion policies. Traditional planners rely on binary occupancy maps and cannot capture the directional or policy‑specific capabilities required for complex terrain, while hierarchical reinforcement‑learning planners, although capable of reasoning about multiple skills, suffer from opacity and require full retraining whenever a new skill is added. To overcome these limitations, the authors propose a feasibility‑guided planning framework that tightly couples each locomotion policy with a dedicated “Feasibility‑Net”.

Feasibility‑Net takes as input a local height‑map patch and a task vector (desired velocity command). It outputs a normalized feasibility score (0–1) that predicts the expected velocity‑tracking reward of the paired policy for a given movement direction. Simultaneously, the network includes a variational auto‑encoder (VAE) branch that learns the distribution of training height‑maps. The VAE reconstruction error serves as an out‑of‑distribution (OOD) metric; during deployment, this error modulates the feasibility score, reducing confidence on terrain that deviates from the training set.
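The summary states that the VAE reconstruction error modulates the feasibility score at deployment, but not the exact modulation rule. The sketch below illustrates one plausible form, an exponential confidence weight with an assumed gain `beta`; both the functional form and the constant are illustrative assumptions, not taken from the paper.

```python
import math

def modulated_feasibility(raw_score: float, recon_error: float,
                          beta: float = 5.0) -> float:
    """Down-weight a raw feasibility score by VAE reconstruction error.

    The exponential form and the `beta` gain are illustrative assumptions;
    the paper only states that a high out-of-distribution (OOD) error
    reduces confidence in the predicted feasibility.
    """
    confidence = math.exp(-beta * max(recon_error, 0.0))
    # Clamp the network output to [0, 1] before scaling.
    return max(0.0, min(1.0, raw_score)) * confidence

# An in-distribution patch (small reconstruction error) keeps most of
# its score; an OOD patch is strongly attenuated.
print(modulated_feasibility(0.9, 0.01))  # high confidence, score mostly kept
print(modulated_feasibility(0.9, 0.5))   # OOD patch, score heavily reduced
```

Any monotonically decreasing weighting would serve the same purpose; the key property is that unfamiliar terrain yields conservative (low) feasibility, which the planner then avoids.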

Training is performed jointly: the locomotion policy is optimized with PPO, while Feasibility‑Net is trained with a combined loss consisting of an L2 regression term for feasibility and the VAE loss (reconstruction + KL divergence). Because both models share the same rollout data, the policy naturally specializes to the terrain it can handle, and the Feasibility‑Net learns to predict that specialization accurately.
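The combined Feasibility-Net loss described above (L2 feasibility regression plus VAE reconstruction and KL terms) can be sketched as follows. The loss weights `w_vae` and `w_kl` are illustrative assumptions; the paper's summary does not specify them.

```python
import numpy as np

def feasibility_net_loss(pred_score, target_reward,
                         recon, patch, mu, log_var,
                         w_vae=0.1, w_kl=1e-3):
    """Combined Feasibility-Net training loss (weights are assumed).

    - L2 term: regresses the predicted feasibility score onto the
      observed velocity-tracking reward of the paired policy.
    - VAE terms: height-map reconstruction error, plus the KL
      divergence of the diagonal-Gaussian posterior N(mu, exp(log_var))
      against the unit-Gaussian prior.
    """
    l2 = np.mean((pred_score - target_reward) ** 2)
    recon_loss = np.mean((recon - patch) ** 2)
    kl = -0.5 * np.mean(1.0 + log_var - mu**2 - np.exp(log_var))
    return l2 + w_vae * recon_loss + w_kl * kl
```

Because the regression target is the reward actually achieved on shared PPO rollouts, the feasibility estimate stays grounded in the policy's real capabilities rather than in a hand-designed traversability heuristic.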

At deployment time, the global elevation map is processed with a sliding‑window approach. For each cell, the local patch is rotated into eight discrete headings (45° spacing) and fed to the Feasibility‑Net, producing an 8‑channel directional feasibility tensor. These tensors are generated for every policy, then fused across policies using a simple max‑fusion operation, yielding a single cost map that reflects the best feasible policy at each location and direction. Standard graph‑search algorithms such as A* or D* can then operate on this cost map, delivering optimal paths while preserving full interpretability: the policy selected at any point is directly readable from which policy produced the winning feasibility value.
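The max-fusion step described above reduces to a stack-and-argmax over the per-policy tensors. A minimal sketch, assuming each policy contributes an `(H, W, 8)` array of normalized feasibility values (one per cell and 45° heading):

```python
import numpy as np

def fuse_feasibility(tensors):
    """Max-fuse per-policy directional feasibility tensors.

    `tensors`: list of arrays of shape (H, W, 8), one per policy.
    Returns the fused tensor together with the index of the policy
    that achieved the maximum at each (cell, heading), which is what
    keeps the resulting plan interpretable.
    """
    stacked = np.stack(tensors, axis=0)      # (num_policies, H, W, 8)
    fused = stacked.max(axis=0)              # best score at each cell/heading
    best_policy = stacked.argmax(axis=0)     # which policy achieved it
    return fused, best_policy

# A graph-search planner (A*, D*) can then use e.g. (1 - fused) as a
# per-heading edge cost, and dispatch the policy indexed by
# `best_policy` along the chosen edge.
```

This fusion is what makes the framework extensible: adding a new policy only appends one tensor to the stack, with no retraining of the planner or of the existing policies.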

The authors validate the approach in both simulation and real‑world experiments. In simulation, three terrain‑specialized policies (flat‑ground, steep‑slope, obstacle‑avoidance) are trained, and the feasibility‑guided planner is compared against a hierarchical RL planner and a classical occupancy‑grid planner. The proposed method achieves a 92 % success rate, outperforming the hierarchical baseline (71 %) and the classical planner (65 %). It also reduces path length and energy consumption by roughly 10–15 %. In real‑world tests with a Spot‑like quadruped, the system successfully navigates indoor and outdoor courses containing sand, grass, stairs, and slippery surfaces without any additional retraining. The VAE‑based OOD weighting causes the robot to adopt more conservative routes on unfamiliar terrain, enhancing safety.

Key contributions are: (1) a per‑policy feasibility estimation framework that quantifies directional traversability grounded in actual policy performance; (2) a unified training pipeline that jointly optimizes policies and feasibility models, eliminating separate data‑generation steps; (3) a directional feasibility tensor representation augmented with OOD confidence, enabling interpretable, graph‑based planning that scales to new policies without retraining. The work demonstrates that integrating learned locomotion capabilities with classical planning structures can yield both high performance and transparency, opening a path toward scalable, multi‑skill autonomous navigation for legged robots.

