Lyapunov Stability-Aware Stackelberg Game for Low-Altitude Economy: A Control-Oriented Pruning-Based DRL Approach

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

With the rapid expansion of the low-altitude economy, Unmanned Aerial Vehicles (UAVs) serve as pivotal aerial base stations supporting diverse user services, ranging from latency-sensitive critical missions to bandwidth-intensive data streaming. However, the efficacy of such heterogeneous networks is often compromised by the conflict between limited onboard resources and stringent stability requirements. Moving beyond traditional throughput-centric designs, we propose a Sensing-Communication-Computing-Control closed-loop framework that explicitly models the impact of communication latency on physical control stability. To guarantee mission reliability, we leverage Lyapunov stability theory to derive an intrinsic mapping between the state evolution of the control system and communication constraints, transforming abstract stability requirements into quantifiable resource boundaries. We then formulate the resource allocation problem as a Stackelberg game, in which UAVs (as leaders) dynamically price resources to balance load and ensure stability, while users (as followers) optimize their requests based on service urgency. Furthermore, to address the prohibitive computational overhead of standard Deep Reinforcement Learning (DRL) on energy-constrained edge platforms, we propose a novel, lightweight pruning-based Proximal Policy Optimization (PPO) algorithm. By integrating a dynamic structured pruning mechanism, the proposed algorithm significantly compresses the neural network during training, enabling the UAV to rapidly approximate the game equilibrium with minimal inference latency. Simulation results demonstrate that the proposed scheme effectively secures control-loop stability while maximizing system utility in dynamic low-altitude environments.


💡 Research Summary

The paper addresses resource allocation for unmanned aerial vehicles (UAVs) acting as aerial base stations in the emerging Low‑Altitude Economy (LAE). Unlike prior works that focus solely on throughput or latency, the authors recognize that communication delays directly affect the physical stability of UAV control loops. To capture this coupling, they introduce a Sensing‑Communication‑Computing‑Control (SC³) closed‑loop model that aggregates sensing, air‑to‑ground (A2G) communication, computing, and control latencies into a total end‑to‑end delay $T_n(k)$.
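The delay aggregation can be made concrete with a minimal sketch. The per-stage latencies below are illustrative assumptions, not values from the paper; only the 30 ms bound is taken from the simulation setup described later:

```python
# Sketch of the SC^3 end-to-end delay budget. The total loop delay
# T_n(k) aggregates sensing, A2G communication, computing, and control
# latencies, and must stay under the Lyapunov-derived bound tau_max.
tau_max = 0.030          # 30 ms bound used in the paper's simulations

def total_delay(t_sense, t_comm, t_comp, t_ctrl):
    """T_n(k): sum of the four per-stage latencies (seconds)."""
    return t_sense + t_comm + t_comp + t_ctrl

# Illustrative per-stage delays (assumed, in seconds)
T = total_delay(0.004, 0.012, 0.008, 0.003)
print(f"T_n(k) = {T * 1e3:.0f} ms, stable: {T <= tau_max}")
```

Any allocation that pushes the communication term high enough to break the inequality is treated as infeasible in the later game formulation.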

Using Lyapunov stability theory, a Lyapunov function $V(x)=x^{\top}Px$ is defined for the relative state between a UAV and its ground users. The condition $\dot V<0$ yields an explicit bound on admissible communication latency, $\tau\le \tau_{\max}$, where $\tau_{\max}$ is a function of the control gains, sampling period, and Lyapunov matrix. This transformation converts the abstract requirement of “stable control” into a concrete communication resource constraint that can be enforced during resource allocation.
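The existence of such a latency bound can be illustrated numerically. The sketch below uses an illustrative double-integrator plant and feedback gains (not the paper's model): the feedback delay is augmented into the state, and the largest delay for which the discrete closed loop remains stable, i.e., for which a quadratic Lyapunov function $V(x)=x^{\top}Px$ still decreases, gives $\tau_{\max}$:

```python
import numpy as np

# Illustrative plant: x_{k+1} = A x_k + B u_{k-d}, with state feedback
# u_k = -K x_k delivered d sampling periods late. We augment the state
# with the queued inputs and find the largest d for which the closed
# loop is stable (spectral radius < 1, equivalently a discrete Lyapunov
# function exists). All numeric values below are assumptions.
Ts = 0.01                                  # sampling period (s)
A = np.array([[1.0, Ts], [0.0, 1.0]])      # position/velocity dynamics
B = np.array([[0.5 * Ts**2], [Ts]])
K = np.array([[8.0, 4.0]])                 # stabilizing gains

def closed_loop(d):
    """Augmented closed-loop matrix for a feedback delay of d samples."""
    n, m = A.shape[0], B.shape[1]
    if d == 0:
        return A - B @ K
    N = n + d * m
    Acl = np.zeros((N, N))
    Acl[:n, :n] = A
    Acl[:n, n:n + m] = B                   # oldest queued input hits the plant
    for i in range(d - 1):                 # shift the input queue forward
        Acl[n + i*m:n + (i+1)*m, n + (i+1)*m:n + (i+2)*m] = np.eye(m)
    Acl[n + (d - 1)*m:, :n] = -K           # newest input u_k = -K x_k
    return Acl

def is_stable(d):
    return np.max(np.abs(np.linalg.eigvals(closed_loop(d)))) < 1.0

# Largest admissible delay in samples -> latency bound tau_max = d_max * Ts
d_max = 0
while is_stable(d_max + 1):
    d_max += 1
print(f"tau_max ≈ {d_max * Ts * 1e3:.0f} ms")
```

The paper's actual bound comes from a closed-form condition on $P$; this brute-force search is only meant to show that the stability requirement maps to a hard latency number.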

The resource allocation problem is then cast as a Stackelberg game. UAVs act as leaders that set prices for bandwidth and power, while users (followers) decide how much resource to request based on their service urgency (weighted utility) and the announced price. The leader’s utility comprises revenue from sold resources minus the UAV’s energy cost; the follower’s utility is the benefit of received service minus the payment. The Lyapunov‑derived latency bound is incorporated as a hard constraint on the leader’s strategy, ensuring that any price setting that would cause the control loop to become unstable is infeasible. This game‑theoretic formulation simultaneously guarantees physical robustness and economic efficiency.
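The leader-follower interaction can be sketched with a toy backward-induction example. The logarithmic follower utility, the specific weights, and the stability floor `q_min` are illustrative assumptions standing in for the paper's utilities:

```python
import numpy as np

# Toy Stackelberg pricing sketch. The UAV (leader) posts a unit price p
# for bandwidth; a user (follower) buys q(p) maximizing w*log(1+q) - p*q,
# whose first-order condition gives the closed form q*(p) = max(0, w/p - 1).
# The leader then maximizes (p - c)*q*(p) over a price grid, subject to a
# Lyapunov-derived floor q_min: too little bandwidth means too much
# latency, which would destabilize the control loop.
w, c, q_min = 4.0, 0.5, 1.0   # urgency weight, energy cost, stability floor (assumed)

def follower_best_response(p):
    return max(0.0, w / p - 1.0)

best_p, best_u = None, -np.inf
for p in np.linspace(0.1, 3.0, 300):
    q = follower_best_response(p)
    if q < q_min:              # would violate the latency/stability bound
        continue
    u = (p - c) * q            # leader's revenue minus energy cost
    if u > best_u:
        best_p, best_u = p, u

print(f"equilibrium price ≈ {best_p:.2f}, demand ≈ {follower_best_response(best_p):.2f}")
```

Backward induction works here because the follower's best response is available in closed form; the paper's DRL approach replaces this grid search when the environment is dynamic and the response has no closed form.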

Solving the Stackelberg equilibrium in real time is challenging because the environment is highly dynamic and the UAV’s onboard processor is energy‑limited. The authors therefore adopt Deep Reinforcement Learning (DRL), specifically Proximal Policy Optimization (PPO), to learn pricing policies. Traditional PPO requires a large neural network (hundreds of thousands of parameters), which would incur prohibitive inference latency and power consumption on a battery‑powered UAV. To overcome this, the paper proposes a dynamic structured pruning technique. During training, low‑importance convolutional filters and fully‑connected neurons are periodically removed based on a sensitivity metric derived from policy and value loss gradients. The pruning schedule is adaptive: when the loss gradient magnitude falls below a threshold, pruning intensity is increased; otherwise it is reduced to avoid performance degradation. This results in a compact, hardware‑friendly architecture that retains over 99 % of the original policy performance while reducing model size by more than 70 %. Consequently, inference time drops from ~24 ms to <10 ms and power consumption falls by roughly 40 %.
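The structured-pruning idea can be sketched in a few lines. This NumPy stand-in for one hidden layer of the PPO actor uses an L2-norm importance score and a gradient-triggered schedule; the layer sizes, thresholds, and decaying gradient-norm stand-in are all illustrative assumptions, not the paper's exact sensitivity metric:

```python
import numpy as np

# Minimal dynamic structured pruning sketch: whole hidden neurons are
# removed -- row i of W1 together with column i of W2 -- so the pruned
# network stays dense and hardware-friendly rather than merely sparse.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 8))    # hidden x input
W2 = rng.normal(size=(4, 64))    # output x hidden

def prune_step(W1, W2, frac):
    """Drop the `frac` least-important hidden neurons (L2-norm importance)."""
    importance = np.linalg.norm(W1, axis=1) + np.linalg.norm(W2, axis=0)
    n_keep = max(1, int(round(W1.shape[0] * (1.0 - frac))))
    keep = np.sort(np.argsort(importance)[-n_keep:])
    return W1[keep, :], W2[:, keep]

def adaptive_fraction(grad_norm, frac, lo=1e-2, step=0.05, frac_max=0.3):
    """Prune harder once training plateaus (small loss gradients),
    back off while gradients are still large."""
    return min(frac_max, frac + step) if grad_norm < lo else max(0.0, frac - step)

frac = 0.1
for epoch in range(5):
    grad_norm = 10.0 ** (-epoch)           # stand-in for the measured loss-gradient norm
    frac = adaptive_fraction(grad_norm, frac)
    W1, W2 = prune_step(W1, W2, frac)
print("remaining hidden neurons:", W1.shape[0])
```

In a real PPO implementation the same effect is obtained by rebuilding the actor/critic layers after each pruning round, which is what shrinks inference latency rather than just zeroing weights.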

Simulation experiments involve three UAVs serving 50 ground users with heterogeneous traffic: urgent (e.g., emergency response) and non‑urgent (e.g., video streaming). The channel model includes Rician fading and distance‑dependent path loss. The Lyapunov latency bound is set to 30 ms, reflecting a realistic maximum feedback delay for UAV flight controllers. The proposed Lyapunov‑aware Stackelberg + pruned PPO is compared against (1) a conventional Stackelberg game without stability constraints and (2) a standard (unpruned) PPO implementation. Results show that the Lyapunov‑aware scheme maintains $\dot V<0$ in 100 % of runs, whereas the unconstrained scheme violates stability in 22 % of runs due to excessive latency. In terms of total system utility, the proposed method achieves an average of $1.42\times10^{4}$, outperforming the baseline Stackelberg ($1.23\times10^{4}$) and the unpruned PPO ($1.35\times10^{4}$). The pruned PPO also reduces inference latency by a factor of three and cuts energy use by 40 %, meeting the stringent timing budget of the SC³ loop.

In summary, the paper makes three key contributions: (1) a unified SC³ closed‑loop framework that explicitly links communication latency to control stability via Lyapunov analysis; (2) a Stackelberg game formulation that embeds stability constraints into pricing decisions, thereby guaranteeing both physical safety and economic efficiency; and (3) a lightweight, dynamically pruned PPO algorithm that enables real‑time equilibrium computation on resource‑constrained UAVs. The work paves the way for robust, control‑aware resource management in future aerial networks, and suggests future extensions such as multi‑UAV cooperation, nonlinear control dynamics, and hardware‑in‑the‑loop validation.

