Dynamic Neural Potential Field: Online Trajectory Optimization in the Presence of Moving Obstacles
Generalist robot policies must operate safely and reliably in everyday human environments such as homes, offices, and warehouses, where people and objects move unpredictably. We present Dynamic Neural Potential Field (NPField-GPT), a learning-enhanced model predictive control (MPC) framework that couples classical optimization with a Transformer-based predictor of footprint-aware repulsive potentials. Given an occupancy sub-map, robot footprint, and optional dynamic-obstacle cues, our NPField-GPT model forecasts a horizon of differentiable potentials that are injected into a sequential quadratic MPC program via L4CasADi, yielding real-time, constraint-aware trajectory optimization. We additionally study two baselines: NPField-StaticMLP, where a dynamic scene is treated as a sequence of static maps; and NPField-DynamicMLP, which predicts the future potential sequence in parallel with an MLP. In dynamic indoor scenarios from BenchMR and on a Husky UGV in office corridors, NPField-GPT produces more efficient and safer trajectories under motion changes, while StaticMLP/DynamicMLP offer lower latency. We also compare with the CIAO* and MPPI baselines. Across methods, the Transformer+MPC synergy preserves the transparency and stability of model-based planning while learning only the part that benefits from data: spatiotemporal collision risk. Code and trained models are available at https://github.com/CognitiveAISystems/Dynamic-Neural-Potential-Field
💡 Research Summary
The paper addresses the problem of safe, real‑time local navigation for mobile robots operating in environments where obstacles move unpredictably (e.g., people in offices, carts in warehouses). Classical receding‑horizon planners such as Model Predictive Control (MPC) require an analytical description of collision risk, while sampling‑based approaches like Model Predictive Path Integral (MPPI) can handle arbitrary maps but often produce chattering or infeasible solutions. The authors propose a hybrid solution: they keep the optimization core model‑based, but replace the hand‑crafted collision term with a learned, differentiable “repulsive potential” that directly encodes spatio‑temporal risk.
Three neural architectures are investigated:
- NPField‑StaticMLP – treats each future frame as an independent static map and predicts a one‑step potential with a simple MLP. It is fast but ignores temporal dependencies.
- NPField‑DynamicMLP – predicts the whole horizon in parallel using separate MLP heads conditioned on the current dynamic‑obstacle state. It has lower latency than the Transformer variant but still lacks strong temporal coupling.
- NPField‑GPT – the main contribution. A non‑autoregressive transformer receives a token sequence that includes (i) an embedding of the occupancy sub‑map, (ii) a robot‑footprint embedding, (iii) dynamic‑obstacle position and orientation (both raw and sin/cos), (iv) the robot’s current pose, and (v) a learned relative‑coordinate fusion token. Ten learnable query tokens are appended; after a single forward pass the transformer outputs a full horizon of potentials, each conditioned on the query pose via attention. This design captures both spatial geometry and temporal dynamics in a single shot.
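The single-pass decoding described for NPField‑GPT can be illustrated with a toy attention step. This is a minimal sketch, not the authors' architecture: it uses one attention operation with no learned projections, the names `encode_obstacle` and `decode_horizon` are hypothetical, and the 5-dimensional token layout is assumed purely for illustration. It shows only the structural idea that every query token reads the shared context independently, so all horizon steps come out of one pass rather than being generated one after another.

```python
import math

def encode_obstacle(x, y, theta):
    """Obstacle features: raw position and heading plus sin/cos of the heading."""
    return [x, y, theta, math.sin(theta), math.cos(theta)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def decode_horizon(context, queries):
    """One attention pass: each query token mixes the context tokens.

    Assumption: context tokens act as both keys and values, and there are
    no learned weight matrices. A real Transformer layer has both.
    """
    dim = len(context[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens, one output per query.
        outputs.append([sum(w * tok[j] for w, tok in zip(weights, context))
                        for j in range(dim)])
    return outputs

# Two context tokens (an obstacle token and a zero placeholder) and ten queries.
context = [encode_obstacle(1.0, 2.0, 0.0), [0.0] * 5]
queries = [[0.0] * 5 for _ in range(10)]
horizon = decode_horizon(context, queries)
```

Here `horizon` holds ten output vectors, one per query token. In the described model each such output would be mapped to a potential for one step of the horizon; that mapping is omitted from this sketch.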
Training uses a ground‑truth potential derived from the signed distance function (SDF) of the occupancy grid. Because the SDF is non‑differentiable, the network learns a smooth approximation. The loss is a distance‑weighted mean‑squared error that emphasizes errors near the moving obstacle (weight = 1 + α exp(−(d/σ)²/2), with α = 2, σ = 0.5). An auxiliary decoder reconstructs the occupancy map from the transformer’s last context token, stabilizing the spatial embedding; this decoder is discarded at inference time.
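The distance-weighted loss can be written out in a few lines. The sketch below assumes that d is each sample's distance to the moving obstacle and that the weighted squared errors are averaged with a plain mean; the summary does not state either detail, and the function names are hypothetical.

```python
import math

def distance_weight(d, alpha=2.0, sigma=0.5):
    """Weight 1 + alpha * exp(-(d / sigma)^2 / 2) for a sample at distance d."""
    return 1.0 + alpha * math.exp(-0.5 * (d / sigma) ** 2)

def weighted_mse(pred, target, dist, alpha=2.0, sigma=0.5):
    """Mean of weight * squared error over all samples.

    Assumption: dist[i] is the distance from sample i to the moving obstacle,
    and the sum is divided by the sample count (not by the sum of weights).
    """
    if not (len(pred) == len(target) == len(dist)):
        raise ValueError("pred, target and dist must have the same length")
    total = 0.0
    for p, t, d in zip(pred, target, dist):
        total += distance_weight(d, alpha, sigma) * (p - t) ** 2
    return total / len(pred)
```

With α = 2, a sample directly at the obstacle (d = 0) is weighted 3, while a sample far away is weighted close to 1, so errors near the obstacle count up to three times as much as errors elsewhere.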
The learned potentials are injected into a sequential quadratic MPC program via L4CasADi, which embeds the neural network in the CasADi problem so that the solver can differentiate through it. The MPC cost consists of three terms: (i) deviation from a reference path, (ii) control effort, and (iii) the neural repulsive potential. Because the potential is differentiable, the optimizer can directly minimize collision risk while respecting kinodynamic constraints.
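To make the three-term cost concrete, the sketch below evaluates it for a fixed control sequence. It is an illustration under several assumptions, not the paper's formulation: a unicycle model stands in for the robot dynamics, a Gaussian bump (`toy_potential`) stands in for the learned network, the weights `w_ref`, `w_u`, and `w_pot` are arbitrary, and no solver or L4CasADi call is involved. One obstacle position is supplied per horizon step to mirror the idea of a horizon of potentials.

```python
import math

def rollout(state, controls, dt=0.1):
    """Integrate a unicycle model (x, y, theta) under (v, omega) controls."""
    x, y, th = state
    traj = []
    for v, w in controls:
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        traj.append((x, y, th))
    return traj

def toy_potential(x, y, obstacle, scale=1.0, width=0.3):
    """Hypothetical stand-in for the learned potential: a Gaussian bump."""
    ox, oy = obstacle
    return scale * math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2 * width ** 2))

def mpc_cost(state, controls, reference, obstacles,
             w_ref=1.0, w_u=0.1, w_pot=10.0):
    """Sum of path deviation, control effort, and repulsive potential."""
    traj = rollout(state, controls)
    cost = 0.0
    for (x, y, _), (v, w), (rx, ry), obs in zip(traj, controls, reference, obstacles):
        cost += w_ref * ((x - rx) ** 2 + (y - ry) ** 2)  # (i) reference tracking
        cost += w_u * (v ** 2 + w ** 2)                   # (ii) control effort
        cost += w_pot * toy_potential(x, y, obs)          # (iii) repulsive potential
    return cost

# A stationary robot on its reference, with the obstacle far away or on top of it.
controls = [(0.0, 0.0)] * 3
reference = [(0.0, 0.0)] * 3
cost_far = mpc_cost((0.0, 0.0, 0.0), controls, reference, [(100.0, 100.0)] * 3)
cost_near = mpc_cost((0.0, 0.0, 0.0), controls, reference, [(0.0, 0.0)] * 3)
```

In this toy setup `cost_near` exceeds `cost_far` only through term (iii). An optimizer with access to the gradient of that term would be pushed to steer the trajectory away from the obstacle, which is the role the differentiable potential plays in the MPC.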
Experiments are conducted on two fronts:
- BenchMR simulation – a suite of indoor scenarios with moving obstacles of varying speed and behavior. NPField‑GPT achieves a shorter average path length and a larger safety margin than CIAO* and MPPI, especially when obstacles execute sudden maneuvers. StaticMLP and DynamicMLP are faster (≈2–5 ms per inference) but produce less smooth trajectories under rapid changes.
- Real‑world Husky UGV – deployed in office corridors with human participants. The robot runs at ~10 Hz, and NPField‑GPT maintains collision‑free motion even when a person abruptly steps into the robot’s path. Latency for NPField‑GPT is ~7–9 ms, still within real‑time limits, while the MLP baselines stay under 5 ms.
The authors discuss limitations: the method assumes that dynamic obstacles are “predictable” (i.e., their future positions can be inferred from current state), the transformer’s parameter count may be prohibitive for low‑power embedded platforms, and the current formulation is limited to 2‑D wheeled robots. Future work includes integrating sophisticated human‑motion predictors, model compression/quantization for edge deployment, and extending the approach to 3‑D aerial robots or multi‑robot fleets.
In summary, the paper demonstrates that learning only the spatio‑temporal collision risk (via a neural potential field) and keeping the rest of the planning pipeline model‑based yields a system that is both transparent and robust. The transformer‑based NPField‑GPT provides a powerful, differentiable risk model that can be seamlessly embedded into MPC, delivering safer and more efficient navigation in dynamic environments while preserving real‑time performance.