Spatio-Temporal Attention Enhanced Multi-Agent DRL for UAV-Assisted Wireless Networks with Limited Communications
In this paper, we employ multiple UAVs to accelerate data transmissions from ground users (GUs) to a remote base station (BS) via the UAVs’ relay communications. The UAVs’ intermittent information exchanges typically result in delays in acquiring the complete system state and hinder their effective collaboration. To maximize the overall throughput, we first propose a delay-tolerant multi-agent deep reinforcement learning (MADRL) algorithm that integrates a delay-penalized reward to encourage information sharing among UAVs, while jointly optimizing the UAVs’ trajectory planning, network formation, and transmission control strategies. Additionally, considering information loss due to unreliable channel conditions, we further propose a spatio-temporal attention based prediction approach to recover the lost information and enhance each UAV’s awareness of the network state. These two designs are envisioned to enhance the network capacity in UAV-assisted wireless networks with limited communications. The simulation results reveal that our new approach achieves over 50% reduction in information delay and 75% throughput gain compared to the conventional MADRL. Interestingly, it is shown that improving the UAVs’ information sharing will not sacrifice the network capacity. Instead, it significantly improves the learning performance and throughput simultaneously. It is also effective in reducing the need for UAVs’ information exchange and thus fostering practical deployment of MADRL in UAV-assisted wireless networks.
💡 Research Summary
The paper addresses the challenge of coordinating multiple unmanned aerial vehicles (UAVs) that act as relays for ground users (GUs) to forward data to a remote base station (BS) under severely limited and intermittent communication conditions. In such environments, UAV‑to‑UAV (U2U) and UAV‑to‑BS (U2B) links can be blocked or experience high latency, which prevents the agents from having a complete, up‑to‑date view of the global network state. Traditional multi‑agent deep reinforcement learning (MADRL) approaches assume full, real‑time information sharing and therefore become unstable or inefficient when the communication budget is constrained.
To overcome these limitations, the authors propose a two‑fold solution. First, they embed a delay‑penalized reward into the MADRL framework. Each UAV periodically reports its position, traffic demand, and channel quality to the BS via the U2B link; the BS aggregates these reports and includes them in acknowledgment packets that are broadcast to all UAVs. The time elapsed between a UAV’s state change and its reception by other agents is measured as a delay τ, and a penalty term –α·τ is subtracted from the immediate reward. This incentivizes UAVs to plan trajectories that keep them within reliable communication range of the BS, thereby reducing information staleness without sacrificing too much transmission capacity.
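The delay-penalized reward described above can be sketched in a few lines. This is an illustrative assumption of how the penalty term –α·τ might be computed; the function and constant names (`delay_penalized_reward`, `ALPHA`) are not from the paper.

```python
# Hypothetical sketch of the delay-penalized reward r' = r - alpha * tau.
# ALPHA and the timestamp bookkeeping are illustrative assumptions.

ALPHA = 0.1  # assumed weight of the delay penalty


def delay_penalized_reward(throughput_reward: float,
                           state_change_time: float,
                           reception_time: float) -> float:
    """Subtract alpha * tau from the immediate reward, where tau is the
    delay between a UAV's state change and its reception by other agents."""
    tau = max(0.0, reception_time - state_change_time)
    return throughput_reward - ALPHA * tau
```

With this shaping, a UAV that drifts out of reliable U2B range accumulates larger τ values and sees its reward shrink accordingly, which is what steers trajectories back toward the BS.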
Second, the paper introduces a spatio‑temporal attention (STA) prediction module that operates at the BS. The BS stores a history of each UAV’s state vectors and, for any UAV i, feeds the recent sequence {s_i(t‑k),…,s_i(t‑1)} into a neural network that combines temporal attention (to capture long‑range dependencies) with spatial attention (to weigh the influence of neighboring UAVs according to proximity). The network outputs a predicted current state (\hat{s}_i(t)), which is used as a “virtual fresh” observation when the real update has not yet arrived. By feeding these predictions into the agents’ local policy networks, the system can maintain a near‑real‑time approximation of the global state even when the communication links are temporarily unavailable.
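A minimal, dependency-free sketch of the two attention stages may help fix ideas. The real STA module is a learned neural network; here the temporal stage uses the most recent state as the query and the spatial stage weighs neighbors inversely with distance — both simplifying assumptions, not the authors' architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def temporal_attention(history):
    """Attend over a UAV's own recent states {s(t-k), ..., s(t-1)},
    using the most recent state as the query (illustrative choice)."""
    query = history[-1]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in history])
    dim = len(history[0])
    return [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]


def spatial_attention(own_summary, neighbor_feats, distances):
    """Mix neighbors' features with proximity-based weights (closer
    neighbors score higher), then combine with the UAV's own temporal
    summary to form the predicted state hat{s}_i(t)."""
    weights = softmax([-d for d in distances])
    dim = len(own_summary)
    mixed = [sum(w * f[d] for w, f in zip(weights, neighbor_feats))
             for d in range(dim)]
    return [o + m for o, m in zip(own_summary, mixed)]
```

In the paper's design these weights are learned end-to-end rather than hand-coded, but the information flow — temporal summary per UAV, then a proximity-weighted mix across UAVs — is the same.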
The overall algorithm, called ST‑A‑MADRL, follows a centralized training phase where the full global state is available, allowing the joint Q‑function or policy to be learned. During execution, each UAV runs a decentralized policy that consumes its own local observation together with the predicted states of other UAVs. The training loss combines the standard reinforcement‑learning objective (discounted cumulative reward) with a mean‑squared error term that forces the STA module to accurately forecast delayed information.
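The combined training objective can be written as a simple sum. The weighting factor `beta` and the per-batch averaging below are assumptions for illustration; the paper only states that the RL objective is combined with an MSE forecasting term.

```python
def mse(pred, target):
    """Mean squared error between a predicted and a true state vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(rl_loss, predicted_states, true_states, beta=1.0):
    """Total training loss: the standard RL objective plus a beta-weighted
    MSE term that trains the STA module to forecast delayed states.
    beta is an assumed trade-off weight, not specified in the paper."""
    forecast_loss = sum(mse(p, t) for p, t in
                        zip(predicted_states, true_states)) / len(predicted_states)
    return rl_loss + beta * forecast_loss
```

Minimizing the MSE term during centralized training is what allows the decentralized policies to trust the STA predictions as stand-ins for stale observations at execution time.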
Simulation results are presented for a 500 m × 500 m area with 20–40 randomly placed GUs and 3, 5, or 7 UAVs. The channel model includes distance‑dependent path loss and shadow fading. The proposed ST‑A‑MADRL is benchmarked against (i) a conventional MADDPG that assumes perfect information sharing, (ii) a delay‑penalized MADRL without the STA predictor, and (iii) a naïve independent‑learning baseline. Key performance metrics include average information delay, system throughput, energy consumption, and collision‑avoidance rate. The STA‑enhanced approach reduces average information delay by more than 50% and boosts total throughput by roughly 75% relative to the communication‑limited MADRL baseline. Energy efficiency improves by about 10%, and UAV collision avoidance remains above 95%. The ablation study shows that the delay‑penalized reward alone yields only modest gains (≈30% delay reduction, 45% throughput increase), confirming that the spatio‑temporal prediction is the primary driver of the observed performance leap.
The authors highlight several contributions: (1) a principled reward design that balances the need for timely state updates against the cost of maintaining communication links, (2) the first integration of a spatio‑temporal attention predictor within a multi‑agent DRL framework for UAV networks, and (3) extensive simulation evidence that the combined approach enables scalable, robust cooperation even when communication is sporadic and unreliable. Future work is suggested in three directions: real‑world flight experiments to validate the algorithm on physical UAV platforms, extension of the framework to incorporate energy‑budget and security constraints, and exploration of asynchronous or federated learning schemes to further reduce the reliance on a central trainer. Overall, the paper provides a compelling blueprint for deploying intelligent, collaborative UAV swarms in realistic, bandwidth‑constrained wireless environments.