Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting

Notice: This research summary and analysis were automatically generated using AI. For authoritative details, please refer to the original arXiv paper.

This paper addresses the critical challenge of coordinating mobile edge UAV networks to maintain robust service in highly dynamic spatiotemporal environments. Conventional Deep Reinforcement Learning (DRL) approaches often suffer from catastrophic forgetting when transitioning between distinct task scenarios, such as moving from dense urban clusters to sparse rural areas. These transitions typically necessitate computationally expensive retraining or model resets to adapt to new user distributions, leading to service interruptions. To overcome these limitations, we propose a computationally efficient Spatiotemporal Continual Learning (STCL) framework realized through a Group-Decoupled Multi-Agent Proximal Policy Optimization (G-MAPPO) algorithm. Our approach integrates a novel Group-Decoupled Policy Optimization (GDPO) mechanism that utilizes dynamic $z$-score normalization to autonomously balance heterogeneous objectives, including energy efficiency, user fairness, and coverage. This mechanism effectively mitigates gradient conflicts induced by concept drifts without requiring offline retraining. Furthermore, the framework leverages the 3D mobility of UAVs as a spatial compensation layer, enabling the swarm to autonomously adjust altitudes to accommodate extreme density fluctuations. Extensive simulations demonstrate that the proposed STCL framework achieves superior resilience, characterized by an elastic recovery of service reliability to approximately 0.95 during phase transitions. Compared to the MADDPG baseline, G-MAPPO not only prevents knowledge forgetting but also delivers an effective capacity gain of 20% under extreme traffic loads, validating its potential as a scalable solution for edge-enabled aerial swarms.


💡 Research Summary

The paper tackles the pressing problem of catastrophic forgetting in multi‑agent deep reinforcement learning (DRL) for mobile edge unmanned aerial vehicle (UAV) networks that must operate under rapidly changing spatiotemporal user distributions. Conventional DRL methods, such as MADDPG, tend to overwrite previously learned policies when the environment shifts—for example, from a dense urban hotspot to a sparse rural area—causing severe performance degradation and service interruptions. To address this, the authors propose a Spatiotemporal Continual Learning (STCL) framework realized through a Group‑Decoupled Multi‑Agent Proximal Policy Optimization (G‑MAPPO) algorithm.

G‑MAPPO introduces two novel mechanisms. First, the Group‑Decoupled Policy Optimization (GDPO) mechanism employs dynamic z‑score normalization of reward signals across heterogeneous objectives (energy efficiency, user fairness, coverage) and then projects gradients onto orthogonal subspaces to resolve conflicts. This gradient‑projection approach eliminates the need for offline retraining, parameter isolation, or replay buffers, enabling true online continual adaptation while preserving previously acquired knowledge. Second, the framework exploits the UAVs’ three‑dimensional mobility as a “spatial compensation layer.” By allowing each UAV to autonomously adjust its altitude within a predefined range (80–120 m), the swarm can physically expand or contract its service footprint in response to abrupt user density fluctuations, thereby mitigating the impact of concept drift on the learning process.
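To make the GDPO idea concrete, the following is a minimal sketch of the two ingredients described above: dynamic z-score normalization of per-objective reward streams, and PCGrad-style projection of a conflicting gradient onto the subspace orthogonal to the gradient it conflicts with. The function names and the running-window-free normalization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def zscore_normalize(rewards, eps=1e-8):
    """Normalize each objective's reward stream to zero mean and unit
    variance. `rewards` has shape (timesteps, objectives); each column
    (energy, fairness, coverage, ...) is standardized independently so
    no single objective dominates the aggregated signal."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean(axis=0)) / (r.std(axis=0) + eps)

def project_conflicting(g_i, g_j):
    """If gradient g_i conflicts with g_j (negative inner product),
    subtract the component of g_i along g_j so the update no longer
    opposes objective j (a PCGrad-style orthogonal projection)."""
    dot = float(np.dot(g_i, g_j))
    if dot < 0:
        g_i = g_i - (dot / (np.dot(g_j, g_j) + 1e-12)) * g_j
    return g_i
```

After the projection, `np.dot(project_conflicting(g_i, g_j), g_j)` is zero whenever the original inner product was negative, which is exactly the property that prevents one objective's update from erasing progress on another.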

The system model consists of a two‑tier architecture: a macro ground base station (GBS) providing ubiquitous coverage and a swarm of UAV‑BSs acting as mobile small cells. Ground‑to‑UAV links follow a probabilistic line‑of‑sight (LoS) channel model, while GBS‑to‑ground links use a standard terrestrial path‑loss model. Users associate with the node offering the strongest received power, and the SINR‑based rate formulation incorporates both LoS/NLoS probabilities and altitude‑dependent path loss. The multi‑objective reward aggregates energy consumption, a fairness index (e.g., Jain’s index), and a minimum QoS guarantee.
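Two building blocks of this system model are easy to sketch: the elevation-angle-based probabilistic LoS model commonly used for air-to-ground channels (Al-Hourani et al. form), and Jain's fairness index over per-user rates. The environment constants `a` and `b` below are typical urban values assumed for illustration; the paper may use different parameters.

```python
import math

def los_probability(h, d, a=9.61, b=0.16):
    """Probability of a line-of-sight ground-to-UAV link.
    h: UAV altitude (m); d: horizontal UAV-user distance (m).
    a, b are environment-dependent constants (urban values assumed)."""
    theta = math.degrees(math.atan2(h, d))  # elevation angle in degrees
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))

def jain_fairness(rates):
    """Jain's fairness index over per-user rates: ranges from 1/n
    (one user gets everything) to 1 (perfectly equal allocation)."""
    n = len(rates)
    s = sum(rates)
    return s * s / (n * sum(r * r for r in rates))
```

Note how the LoS probability grows with elevation angle: raising a UAV over the same user increases `theta` and hence the LoS probability, which is precisely the "spatial compensation" lever the altitude-control mechanism exploits.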

Simulation experiments involve up to 140 users distributed across three sequential tasks—urban, suburban, and rural—each representing a distinct user density regime. The authors compare G‑MAPPO against the state‑of‑the‑art MADDPG baseline under identical conditions. Performance metrics include service reliability (the probability of meeting QoS), average throughput, energy efficiency, and a quantitative forgetting metric (the performance drop after a task transition). Results show that G‑MAPPO maintains an elastic recovery of service reliability around 0.95 during task switches, avoids any noticeable forgetting, and achieves roughly a 20% capacity gain under peak traffic loads relative to MADDPG. Moreover, GDPO's gradient projection and dynamic normalization substantially stabilize learning, preventing the gradient interference that typically plagues multi‑objective MARL.
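The forgetting metric mentioned above can be sketched with the standard continual-learning definition: for each earlier task, take the best performance ever achieved on it and subtract the performance after training on the final task. This is a common formulation, not necessarily the paper's exact metric.

```python
def average_forgetting(perf):
    """Average forgetting after sequential training on T tasks.
    perf[i][j] = performance on task j measured after training on task i
    (only entries with j <= i are meaningful). For each earlier task j,
    forgetting = best past performance minus final performance; zero
    means no knowledge of old tasks was lost."""
    T = len(perf)
    drops = []
    for j in range(T - 1):
        best = max(perf[i][j] for i in range(j, T - 1))
        drops.append(best - perf[T - 1][j])
    return sum(drops) / len(drops)
```

For instance, if reliability on the urban task peaks at 0.90 during training but drops to 0.70 after the rural task, that task contributes 0.20 to the average; a continual learner like G‑MAPPO aims to keep this value near zero.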

The paper’s contributions are significant: it offers a principled continual‑learning algorithm tailored to the unique challenges of aerial edge networks, integrates physical UAV altitude control as a complementary adaptation tool, and validates the approach with extensive stress testing. Nonetheless, limitations remain. The evaluation relies on simulated channel models and static user mobility patterns, lacking real‑world flight experiments that would expose battery constraints, wind effects, and regulatory altitude limits. GDPO’s hyper‑parameters (group partitioning, normalization window size) may also be sensitive to deployment scenarios and require further tuning. Future work should involve hardware‑in‑the‑loop testing, incorporation of realistic energy consumption models, and extension to multi‑band or multi‑service continual learning. Overall, the STCL framework and G‑MAPPO algorithm represent a promising step toward resilient, self‑adapting UAV‑enabled edge networks capable of seamless operation across heterogeneous spatiotemporal conditions.

