Deep Reinforcement Learning for Interference Suppression in RIS-Aided Space-Air-Ground Integrated Networks


Future 6G networks envision ubiquitous connectivity through space-air-ground integrated networks (SAGINs), where high-altitude platform stations (HAPSs) and satellites complement terrestrial systems to provide wide-area, low-latency coverage. However, the rapid growth of terrestrial devices intensifies spectrum sharing between terrestrial and non-terrestrial segments, resulting in severe cross-tier interference. In particular, frequency sharing between the HAPS-satellite uplink and HAPS-ground downlink improves spectrum efficiency but suffers from interference caused by the HAPS antenna back-lobe. Existing approaches relying on zero-forcing (ZF) codebooks have limited performance under highly dynamic channel conditions. To overcome this limitation, we employ a reconfigurable intelligent surface (RIS)-aided HAPS-based SAGIN framework with a deep deterministic policy gradient (DDPG) algorithm. The proposed DDPG framework optimizes the HAPS beamforming weights to form spatial nulls toward interference sources while maintaining robust links to the desired signals. Simulation results demonstrate that the DDPG framework consistently outperforms conventional ZF beamforming across different RIS configurations, achieving up to 11.3% throughput improvement for a 4 × 4 RIS configuration, validating its adaptive capability to enhance spectral efficiency in dynamic HAPS-based SAGINs.


💡 Research Summary

The paper addresses a critical interference problem in future 6G space‑air‑ground integrated networks (SAGIN) that employ high‑altitude platform stations (HAPS) as relays between satellites and terrestrial users. When the uplink (HAPS‑to‑satellite) and downlink (HAPS‑to‑ground) share the same frequency band, the back‑lobe radiation of the uplink antenna leaks into the downlink coverage area, causing severe cross‑tier interference. Conventional mitigation techniques such as zero‑forcing (ZF) beamforming or code‑book based null steering are limited by discrete resolution, the number of available spatial degrees of freedom, and the need for frequent re‑optimization under rapidly varying channel conditions.
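The ZF baseline discussed above can be sketched in a few lines: the precoder is built from the channel pseudo-inverse so that each user's beam lies in the null space of every other user's channel. The dimensions and the Rayleigh channel model here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: N antennas at the HAPS, K ground users.
N, K = 50, 4

# Random Rayleigh channel matrix H (K x N): row k is user k's channel.
H = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

# Zero-forcing precoder: W = H^H (H H^H)^{-1}, so that H @ W = I_K,
# i.e. each user's beam nulls all other users' channels.
W = H.conj().T @ np.linalg.inv(H @ H.conj().T)

# Verify the zero-interference property.
print(np.allclose(H @ W, np.eye(K)))  # True
```

The limitation the paper highlights is visible here: ZF spends one spatial degree of freedom per null, and the precoder must be recomputed whenever H changes.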

To overcome these limitations, the authors propose a joint solution that combines a reconfigurable intelligent surface (RIS) with a deep deterministic policy gradient (DDPG) reinforcement‑learning framework. The RIS, placed within the HAPS service region, consists of an L × L array of passive reflecting elements (e.g., 4 × 4). Each element can impose an independent phase shift pₗ, calculated according to a closed‑form expression that aligns the reflected wavefront toward the desired direction while destructively interfering with the unwanted back‑lobe path. By shaping the propagation environment, the RIS creates additional spatial degrees of freedom that enable precise null formation without extra power consumption.
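The phase-alignment idea can be illustrated with a minimal sketch: each RIS element pre-compensates the propagation phase of its incident-plus-reflected path so that all reflected components combine coherently at the intended user. The geometry and path lengths below are assumed placeholders, not the paper's closed-form expression.

```python
import numpy as np

c = 3e8
fc = 28e9                 # carrier frequency from the system model
lam = c / fc              # wavelength
L = 4                     # 4 x 4 RIS

rng = np.random.default_rng(1)
# Hypothetical per-element path lengths (HAPS->element + element->user), metres.
d_in = 20e3 + rng.uniform(0, 0.01, (L, L))
d_out = 100.0 + rng.uniform(0, 0.01, (L, L))

# Phase shift p_l that cancels the propagation phase of element l.
p = np.mod(2 * np.pi * (d_in + d_out) / lam, 2 * np.pi)

# Received sum with the computed phases: all L*L terms align at phase zero.
received = np.sum(np.exp(-1j * 2 * np.pi * (d_in + d_out) / lam) * np.exp(1j * p))
print(abs(received))  # ~16.0, i.e. coherent combining over all 16 elements
```

Choosing the phases to point the reflected beam away from a victim receiver (rather than toward the user) gives the destructive-interference behaviour used for null formation.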

The core of the solution is a continuous‑action DRL agent that directly outputs the complex beamforming matrix W of the HAPS antenna arrays. The state observed by the agent is the full channel state information (CSI) of the composite HAPS‑SAGIN system, represented by the real and imaginary parts of the composite channel matrix H_HAPS. The action consists of the real and imaginary components of W, allowing the agent to steer both the uplink and downlink beams simultaneously. The reward function is carefully designed to (i) minimize the total transmit power Σ‖w_i‖², (ii) enforce a minimum SINR γ_min for every ground user, and (iii) penalize any violation of the zero‑interference (ZF) constraint and the overall power budget P_t. This multi‑objective reward guides the learning process toward policies that satisfy hard constraints while improving spectral efficiency.
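The three reward components above can be sketched as a single scalar function. The penalty weights (`lam_sinr`, `lam_pow`) and the soft-penalty form are assumptions for illustration; the paper's exact reward shaping may differ.

```python
import numpy as np

def reward(W, H, gamma_min, P_t, sigma2=1.0, lam_sinr=10.0, lam_pow=10.0):
    """Multi-objective reward sketch.

    W: (N, K) beamforming matrix, column k serves user k.
    H: (K, N) channel matrix, row k is user k's channel.
    """
    K = H.shape[0]
    total_power = np.sum(np.abs(W) ** 2)        # sum_i ||w_i||^2
    r = -total_power                            # (i) minimise transmit power
    for k in range(K):
        sig = np.abs(H[k] @ W[:, k]) ** 2
        interf = sum(np.abs(H[k] @ W[:, j]) ** 2 for j in range(K) if j != k)
        sinr = sig / (interf + sigma2)
        if sinr < gamma_min:                    # (ii) per-user SINR constraint
            r -= lam_sinr * (gamma_min - sinr)
    if total_power > P_t:                       # (iii) power-budget penalty
        r -= lam_pow * (total_power - P_t)
    return r
```

With an identity channel and identity precoder, the reward reduces to the negative transmit power; scaling the precoder past the budget triggers the penalty terms.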

DDPG employs an actor‑critic architecture with separate target networks and an experience replay buffer. The actor network maps the current state to an action (beamforming weights), while the critic estimates the Q‑value of the state‑action pair. The loss is the mean‑squared error between the predicted Q‑value and a target computed using the Bellman equation with a discount factor τ. Training proceeds with stochastic gradient descent, and the use of target networks stabilizes learning. Hyper‑parameters such as batch size B, learning rate μ, and discount factor τ are tuned to ensure convergence in the highly non‑convex optimization landscape defined by constraints (9)–(11) in the paper.
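The two stabilising ingredients described above, the Bellman target and the slowly tracking target networks, can be sketched with toy linear "networks" standing in for the actor/critic MLPs. The numbers here are common DDPG defaults, assumed rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.99          # discount factor in the Bellman target
tau_soft = 0.005      # soft-update rate for the target networks

# Toy critic: Q(s, a) = w . [s, a] with a linear parameter vector.
w = rng.standard_normal(4)        # online critic parameters
w_target = w.copy()               # target critic starts as a copy

def Q(params, s, a):
    return params @ np.concatenate([s, a])

# One transition (s, a, r, s') sampled from the replay buffer;
# a' would come from the target actor.
s, a = rng.standard_normal(2), rng.standard_normal(2)
r, s_next, a_next = 1.0, rng.standard_normal(2), rng.standard_normal(2)

# Bellman target: y = r + gamma * Q_target(s', a').
y = r + gamma * Q(w_target, s_next, a_next)

# Critic loss: mean-squared Bellman error, minimised by gradient descent.
loss = (Q(w, s, a) - y) ** 2

# Soft update: the target network slowly tracks the online network.
w_target = tau_soft * w + (1 - tau_soft) * w_target
```

Because the target parameters change only by a factor `tau_soft` per step, the regression target `y` moves slowly, which is what stabilises training in the non-convex landscape.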

The system model includes a single satellite, one HAPS equipped with two uniform planar arrays (UPAs) – one dedicated to uplink (N = 50 elements) and one to downlink – and a ground layer with a RIS and multiple users (density 150 users/km²). The carrier frequency is 28 GHz, the bandwidth 400 MHz, and the transmit power budget 30 dBm. The composite downlink channel incorporates both the direct HAPS‑to‑user link and the RIS‑reflected link, as expressed in equation (3). The optimization problem (P1) seeks to minimize total transmit power while guaranteeing zero interference to non‑intended receivers, meeting SINR thresholds, and respecting the power budget. This problem is non‑convex and intractable for conventional solvers in real time.
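The structure of the composite downlink channel can be sketched in the spirit of the paper's equation (3): the channel seen by a user is the direct HAPS-to-user link plus the cascaded HAPS-to-RIS-to-user link, with the RIS response as a diagonal phase matrix. All channel realisations below are random placeholders, not the paper's channel model.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50        # HAPS downlink UPA elements
M = 16        # 4 x 4 RIS elements

# Placeholder Rayleigh links (the paper's model is geometric/mmWave).
h_d = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)           # HAPS -> user
G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)   # HAPS -> RIS
h_r = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)           # RIS -> user

p = rng.uniform(0, 2 * np.pi, M)          # RIS phase shifts, one per element
Phi = np.diag(np.exp(1j * p))             # diagonal RIS response matrix

# Composite channel at the user: direct path + RIS-reflected path.
h_comp = h_d + h_r @ Phi @ G
print(h_comp.shape)  # (50,)
```

Stacking `h_comp` for all users yields the composite matrix H_HAPS whose real and imaginary parts form the DDPG agent's state.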

Simulation results compare the proposed DDPG‑based beamforming with a baseline ZF code‑book approach under various RIS configurations (2 × 2, 4 × 4, etc.). The DDPG method consistently outperforms ZF, achieving up to an 11.3 % increase in system throughput for the 4 × 4 RIS case. Moreover, the learned policy maintains the required SINR for all users even when the channel varies rapidly, demonstrating robustness to dynamics that would force frequent re‑computation in traditional algorithms. Power consumption is also reduced, confirming the energy‑efficiency advantage of the DRL approach.

In summary, the paper presents a novel, scalable framework that leverages RIS‑enabled environmental control and model‑free deep reinforcement learning to dynamically suppress inter‑link interference in HAPS‑based SAGINs. By directly learning continuous beamforming weights, the DDPG agent circumvents the quantization errors and computational overhead of ZF code‑books, while the RIS provides additional degrees of freedom for precise null steering. The authors suggest future extensions such as handling imperfect CSI, multi‑RIS cooperation, and multi‑HAPS/multi‑satellite scenarios, which would further bridge the gap toward practical 6G non‑terrestrial networks.

