HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving
We present a novel hybrid learning-assisted planning method, named HyPlan, for solving the collision-free navigation problem for self-driving cars in partially observable traffic environments. HyPlan combines methods for multi-agent behavior prediction, deep reinforcement learning with proximal policy optimization, and approximated online POMDP planning with heuristic confidence-based vertical pruning to reduce its execution time without compromising driving safety. Our experimental performance analysis on the CARLA-CTS2 benchmark of critical traffic scenarios with pedestrians revealed that HyPlan may navigate more safely than selected relevant baselines and run significantly faster than the alternative online POMDP planners considered.
💡 Research Summary
The paper addresses the collision‑free navigation (CFN) problem for autonomous vehicles operating in partially observable traffic environments. The authors formulate CFN as a discrete‑time Partially Observable Markov Decision Process (POMDP) and observe that existing solutions—pure deep‑learning policies, rule‑based planners, or conventional online POMDP solvers—each suffer from a trade‑off between computational speed and safety guarantees. To bridge this gap, they propose HyPlan, a hybrid learning‑assisted planner that tightly integrates four components: (1) a Multi‑Agent Behavior Predictor (MABP) that generates future trajectories for surrounding agents; (2) a weighted Hybrid A* path planner that builds a cost map enriched with the predicted trajectories and yields a steering angle; (3) NavPPO, a Proximal Policy Optimization (PPO) based actor‑critic network that consumes an “intention image” (a visual encoding of the ego‑car’s planned path, predicted agent paths, and past motion) together with non‑visual state features, and outputs both a stochastic policy for acceleration decisions and a scalar value estimate Vθ(b) for any belief state b; and (4) an online POMDP planner, IS‑DESPOT*, which is an enhanced version of the approximate DESPOT algorithm. The novelty of IS‑DESPOT* lies in its confidence‑aware vertical pruning: during deployment NavPPO is queried multiple times with Monte‑Carlo dropout to obtain a mean μ and variance σ² of the value estimate. These statistics are calibrated using the CRUDE method to produce a confidence score φ∈
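The Monte-Carlo dropout step described above can be illustrated with a minimal sketch. It is not the authors' NavPPO network: the one-hidden-layer value function, the weights, the dropout rate, and the names `value_forward` and `mc_dropout_value` are all hypothetical, chosen only to show how repeated stochastic forward passes yield a mean μ and variance σ² of a value estimate. The subsequent calibration step is deliberately left out.

```python
import random
import statistics


def value_forward(features, w1, w2, rng, p_drop):
    """One stochastic forward pass of a toy one-hidden-layer value network.

    Hypothetical stand-in for a learned critic; dropout stays active at
    inference time, which is what makes the pass stochastic.
    """
    out = 0.0
    for j, w_out in enumerate(w2):
        # Hidden unit j: ReLU of the weighted sum of the input features.
        h = max(0.0, sum(f * w1[i][j] for i, f in enumerate(features)))
        if rng.random() < p_drop:
            continue  # this hidden unit is dropped for this pass
        # Inverted dropout: rescale surviving units by 1 / (1 - p_drop).
        out += w_out * h / (1.0 - p_drop)
    return out


def mc_dropout_value(features, w1, w2, rng, n_samples=32, p_drop=0.2):
    """Return (mean, variance) of the value estimate over n_samples passes."""
    samples = [
        value_forward(features, w1, w2, rng, p_drop) for _ in range(n_samples)
    ]
    return statistics.fmean(samples), statistics.pvariance(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    features = [0.5, -1.0, 2.0]                    # toy belief features
    w1 = [[0.1, -0.2], [0.3, 0.4], [0.5, 0.6]]     # input-to-hidden weights
    w2 = [1.0, -1.0]                               # hidden-to-output weights
    mu, sigma2 = mc_dropout_value(features, w1, w2, rng)
    print(f"mean={mu:.4f} variance={sigma2:.4f}")
```

A larger variance across passes indicates that the value estimate is less reliable for that input, which is the kind of signal a confidence-based pruning rule can act on. With `p_drop=0.0` every pass is identical and the variance collapses to zero.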