Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments
In this technical report, we address the path planning problem in dynamic environments, i.e., settings in which obstacles can change over time, introducing uncertainty. Effective path planning in such environments is essential for mobile robots, as many real-world scenarios naturally exhibit dynamic elements. While prior research has proposed various approaches, our work tackles the problem from a relatively unexplored angle.

We start by relaxing the common assumption that environmental changes are entirely unlocalizable. In practice, such changes are often confined to a bounded region, even if their exact positions are unknown. Leveraging this assumption enables more efficient replanning, as only affected regions need to be updated. This principle underlies the methodology proposed by Yarahmadi et al., which addresses the path planning problem in two ways. If no changes occur, the environment is treated as static and fully known, and a global path planner (e.g., A*) is used to generate a route to a charging station. If changes do occur, the environment is partitioned into sub-environments, with each assigned to a dedicated agent. Agents responsible for the affected sub-environments adapt by performing local path planning based on Q-learning, a Reinforcement Learning (RL) method.

While promising, their methodology has notable limitations. First, relying on a global path planner is impractical in unknown environments where the layout, including free spaces, obstacles, and charging stations, is initially hidden. Second, traditional path planning algorithms such as A* exhibit poor scalability and efficiency in large environments. Furthermore, their method triggers RL-based replanning whenever a sub-environment experiences a change. This strategy is inefficient, as some changes have little to no impact on the overall path planning problem, making constant replanning unnecessary.
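To make the local-planning step concrete, the following is a minimal sketch of a tabular Q-learning planner on a gridworld, in the spirit of the local replanning described above. All names, rewards, and hyperparameters here are illustrative assumptions of ours, not values taken from Yarahmadi et al.

```python
import random

def q_learning_local_planner(grid, start, goal, episodes=500, alpha=0.5,
                             gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning on a small grid; grid[r][c] == 1 marks an obstacle."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    q = {}  # (state, action index) -> estimated value

    def step(state, a):
        r, c = state[0] + actions[a][0], state[1] + actions[a][1]
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == 1:
            return state, -1.0            # blocked move: stay put, penalty
        if (r, c) == goal:
            return (r, c), 10.0           # reached the charging station
        return (r, c), -0.1               # step cost encourages short paths

    for _ in range(episodes):
        s = start
        for _ in range(4 * rows * cols):  # cap episode length
            # epsilon-greedy action selection
            a = (rng.randrange(4) if rng.random() < epsilon
                 else max(range(4), key=lambda i: q.get((s, i), 0.0)))
            s2, reward = step(s, a)
            best_next = max(q.get((s2, i), 0.0) for i in range(4))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
            s = s2
            if s == goal:
                break
    return q

def greedy_path(q, grid, start, goal, max_steps=100):
    """Follow the learned greedy policy from start toward goal."""
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    rows, cols = len(grid), len(grid[0])
    path, s = [start], start
    for _ in range(max_steps):
        if s == goal:
            break
        a = max(range(4), key=lambda i: q.get((s, i), 0.0))
        r, c = s[0] + actions[a][0], s[1] + actions[a][1]
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == 1:
            break  # greedy action leads into a wall: give up
        s = (r, c)
        path.append(s)
    return path
```

A planner like this needs no prior map of the sub-environment, which is precisely why it suits the unknown-layout setting that a global planner such as A* cannot handle.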
Another key limitation is that the sub-environments created by partitioning the environment have no defined relationships with one another. As a result, if a charging station is absent or unreachable within a sub-environment, the local planner is stuck and cannot utilize adjacent regions to find an alternative path. This lack of interconnection prevents fallback strategies and offers no guarantee of reaching a charging station. Finally, their evaluation is conducted in overly simplistic environments with minimal dynamic changes, i.e., only one obstacle change per time step.

To address these limitations, we propose the following contributions:

• A scalable, region-aware RL framework based on a hierarchical decomposition of the environment, which supports efficient, targeted retraining.
• A retraining condition based on sub-environment success rates, which determines whether retraining is necessary after a change or potentially multiple changes.
• Both single-agent and multi-agent (federated Q-learning) RL-based training methods, where the multi-agent version aggregates local Q-tables to accelerate the learning process.
• A more realistic evaluation setup that spans three levels of environment difficulty and simulates multiple simultaneous obstacle changes per time step.
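The federated aggregation of local Q-tables can be as simple as a (weighted) average over the agents' tables. The sketch below illustrates one such scheme; the function name, dictionary-based table representation, and uniform-weight default are our illustrative assumptions, not a specification of the method.

```python
def aggregate_q_tables(q_tables, weights=None):
    """Merge per-agent tabular Q-functions by weighted averaging.

    Each table maps (state, action) -> value. Keys missing from a table
    are treated as 0.0, so an agent that never visited a state simply
    pulls the aggregated estimate for that state toward zero.
    """
    if weights is None:
        weights = [1.0 / len(q_tables)] * len(q_tables)  # uniform weights
    keys = set().union(*(t.keys() for t in q_tables))
    return {k: sum(w * t.get(k, 0.0) for w, t in zip(weights, q_tables))
            for k in keys}
```

Broadcasting the merged table back to the agents lets each one start local retraining from a shared estimate rather than from scratch, which is the intuition behind the acceleration claimed for the multi-agent variant.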