IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked Exploration


The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication-aware and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision-making. This paper implements Multi-Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision-making under uncertainty using Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication constraints include limited range, bandwidth and latency. Extensive ablation studies evaluated MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single-agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons and architectural complexity in Recurrent NNs/Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of Curriculum-Learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.


💡 Research Summary

The paper tackles the problem of autonomous indoor exploration by a swarm of unmanned aerial vehicles (UAVs) operating in GNSS‑denied environments, where communication is limited in range, bandwidth, and latency. To address the challenges of coordination, perception, and decentralized decision‑making, the authors develop a multi‑agent reinforcement learning (MARL) framework that runs on high‑fidelity 2‑D simulations built with the open‑source Godot game engine. Unlike many prior works that rely on discrete actions, centralized training/execution, or assume permanent connectivity, this study adopts continuous action spaces, a hybrid centralized‑training‑decentralized‑execution (CTDE) paradigm, and realistic network constraints modeled as a dynamic communication graph.
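The dynamic communication graph mentioned above can be sketched as a simple range-limited, single-hop adjacency structure rebuilt at every time step. This is an illustrative reconstruction, not the paper's implementation; the function name and range-only connectivity criterion are assumptions.

```python
import math

def communication_graph(positions, comm_range):
    """Build the time-varying single-hop adjacency list: two agents are
    neighbours iff their Euclidean distance is within comm_range.
    Rebuilt each step, so links appear and disappear as agents move."""
    n = len(positions)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if math.hypot(dx, dy) <= comm_range:
                graph[i].append(j)
                graph[j].append(i)
    return graph
```

In a decentralized execution setting each agent would only compute its own neighbourhood, but the centralized form above is convenient for simulation.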

The theoretical foundation is a Network‑Distributed Partially Observable Markov Decision Process (ND‑POMDP), which extends the classic Dec‑POMDP by explicitly representing the communication topology at each time step. Each UAV is equipped with a simulated 2‑D LiDAR (implemented via RayCast2D nodes) that feeds a Bayesian occupancy‑grid map. The policy receives a fixed‑size egocentric map extracted from the local occupancy grid, together with the agent’s pose and velocity. Agents share raw LiDAR scans and, when a link is re‑established, the full local map with neighbors, respecting a simple single‑hop protocol and simulated transmission delays based on packet size and bandwidth.
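The Bayesian occupancy-grid mapping and the fixed-size egocentric window described above can be sketched as follows. The log-odds increments and the zero-padding convention for unknown cells are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative log-odds increments for a Bayesian occupancy-grid update
# (the paper does not specify these values).
L_OCC, L_FREE = 0.85, -0.4

def update_cell(log_odds, hit):
    """Bayesian log-odds update for one grid cell: add L_OCC when the
    LiDAR ray ends in the cell, L_FREE when it passes through it."""
    return log_odds + (L_OCC if hit else L_FREE)

def egocentric_crop(grid, cx, cy, half):
    """Fixed-size egocentric window of shape (2*half, 2*half) centered on
    the agent's cell (cx, cy), padded with 0 (unknown) at the map border."""
    padded = np.pad(grid, half, constant_values=0.0)
    return padded[cx:cx + 2 * half, cy:cy + 2 * half]
```

The log-odds representation makes map merging between neighbours cheap: when a link is re-established, two local maps can be fused cell-wise by summing their log-odds.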

The reward function is designed to maximize newly discovered area while penalizing collisions. Specifically, the step reward is $R = W_{\text{area}} \cdot \Delta\text{Area}$, where $\Delta\text{Area}$ is the normalized increase in mapped area during the transition, and an optional collision penalty $R_{\text{collision}} = -1$ can be weighted by $W_{\text{collision}}$. This formulation prevents agents from becoming passive (a problem observed when only total area or collision penalties were used) and encourages continuous motion.
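The step reward described above can be written out directly. The default weight values below are placeholders; only the functional form follows the paper's formulation.

```python
def step_reward(new_area, prev_area, total_area, collided,
                w_area=1.0, w_collision=0.0):
    """Step reward: reward the normalized increase in mapped area
    (delta_area) during the transition, plus an optional weighted
    collision penalty of -1. Weight values here are placeholders."""
    delta_area = (new_area - prev_area) / total_area
    r = w_area * delta_area
    if collided:
        r += w_collision * (-1.0)
    return r
```

Because the reward depends on the *increase* in mapped area rather than the total, an agent that stops moving earns nothing, which is exactly the anti-passivity property noted above.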

Observations consist of LiDAR range data, position $(p_x, p_y)$, orientation $\theta$, linear velocity $(v_x, v_y)$, and angular velocity $\omega$. Actions are continuous velocity commands $(v_x, v_y, \omega)$.
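The per-agent observation and continuous action interface can be sketched as below. The array layout and the symmetric action bounds are assumptions for illustration; the paper's exact limits are not given here.

```python
import numpy as np

def build_observation(lidar_ranges, pose, velocity):
    """Concatenate the per-agent observation: LiDAR ranges, pose
    (p_x, p_y, theta), and velocities (v_x, v_y, omega).
    The flat-vector layout is an assumption for illustration."""
    px, py, theta = pose
    vx, vy, omega = velocity
    return np.concatenate([
        np.asarray(lidar_ranges, dtype=np.float32),
        np.array([px, py, theta, vx, vy, omega], dtype=np.float32),
    ])

def clip_action(action, v_max, omega_max):
    """Clamp a continuous (v_x, v_y, omega) command to symmetric bounds
    (hypothetical bounds; the paper's exact action limits are not shown)."""
    vx, vy, omega = action
    return (float(np.clip(vx, -v_max, v_max)),
            float(np.clip(vy, -v_max, v_max)),
            float(np.clip(omega, -omega_max, omega_max)))
```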

