Deep Learning based Three-stage Solution for ISAC Beamforming Optimization
In this paper, we consider a general ISAC system in which the base station (BS) communicates with multiple users and performs target detection. A sum communication rate maximization problem is then formulated, subject to the transmit power constraint and the minimum sensing rate requirements. To solve this problem, we develop a framework that leverages deep learning algorithms to provide a three-stage solution for ISAC beamforming. The three-stage beamforming optimization solution comprises three modules: 1) an unsupervised learning based feature extraction algorithm that extracts fixed-size latent features from the variable channel state information (CSI) while preserving its essential information; 2) a reinforcement learning (RL) based beampattern optimization algorithm that searches for the desired beampattern according to the extracted features; 3) a supervised learning based beamforming reconstruction algorithm that reconstructs the beamforming vector from the beampattern given by the RL agent. Simulation results demonstrate that the proposed three-stage solution outperforms the baseline RL algorithm by optimizing the intuitive beampattern rather than the beamforming vectors directly.
💡 Research Summary
This paper addresses the beamforming optimization problem in a full‑duplex (FD) integrated sensing and communication (ISAC) system that simultaneously serves multiple downlink users and detects multiple radar targets. The objective is to maximize the sum communication rate of all users while respecting a total transmit‑power budget and guaranteeing a minimum sensing (radar) rate for each target. Traditional convex optimization becomes computationally prohibitive as the number of antennas, users, and targets grows, and existing deep‑learning approaches that map channel state information (CSI) directly to beamforming vectors suffer from high dimensionality and lack interpretability.
To overcome these issues, the authors propose a three‑stage deep‑learning framework that decomposes the original problem into (1) CSI feature extraction, (2) beampattern optimization, and (3) beamforming reconstruction.
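Viewed as software, the decomposition is a chain of three learned functions. The skeleton below is a hypothetical sketch of one inference step, not the authors' code: `three_stage_inference` and the callable type aliases are illustrative names, and the toy stand-ins only show how data flows between stages; they are not trained models.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces for illustration only.
Encoder = Callable[[Sequence[float]], List[float]]      # preprocessed CSI -> latent feature f
Actor = Callable[[List[float]], List[List[float]]]      # RL state -> sampled beampatterns
Reconstructor = Callable[[List[float]], List[complex]]  # one beampattern -> one beamforming vector


def three_stage_inference(
    csi_vector: Sequence[float],
    prev_sum_rate: float,
    encoder: Encoder,
    actor: Actor,
    reconstructor: Reconstructor,
) -> List[List[complex]]:
    """Run one decision step of the three-stage pipeline."""
    # Stage 1: compress the preprocessed CSI into a fixed-size latent feature f.
    f = encoder(csi_vector)
    # Stage 2: the RL state is f plus the previous slot's sum rate; the actor outputs beampatterns.
    state = list(f) + [prev_sum_rate]
    patterns = actor(state)
    # Stage 3: map each beampattern back to a complex beamforming vector.
    return [reconstructor(p) for p in patterns]


if __name__ == "__main__":
    # Toy stand-ins (not trained models), just to show the data flow.
    def toy_encoder(x: Sequence[float]) -> List[float]:
        return [sum(x) / len(x), max(x)]

    def toy_actor(state: List[float]) -> List[List[float]]:
        return [[1.0] * 5, [0.5] * 5, [0.25] * 5]  # three beampatterns, A = 5 angles each

    def toy_reconstructor(pattern: List[float]) -> List[complex]:
        return [complex(pattern[0], 0.0)] * 4  # N_t = 4 antennas

    print(three_stage_inference([0.1, 0.2, 0.3], 1.5, toy_encoder, toy_actor, toy_reconstructor))
```

Keeping each stage behind a plain function boundary mirrors the modular design: any one stage can be retrained or swapped without touching the other two.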
Stage 1 – Unsupervised CSI Feature Extraction
An autoencoder (AE) is trained in an unsupervised manner to compress the high‑dimensional, complex‑valued CSI (including downlink channels, uplink channels, and residual self‑interference) into a fixed‑size, real‑valued latent vector f. The preprocessing pipeline flattens each complex matrix, concatenates the results, and separates the real and imaginary parts before feeding them to the AE. The encoder learns a compact representation that preserves the information needed for the subsequent optimization, while the decoder is used only during training, where the AE minimizes the mean‑square error (MSE) between its input and the reconstruction. After training, only the encoder is retained for inference, so the downstream agent always receives a feature vector of the same size regardless of the antenna or user count.
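The preprocessing and the AE objective can be sketched in plain Python. This is a minimal illustration, not the paper's network: it swaps in a single‑layer linear autoencoder trained by per‑sample SGD on the reconstruction MSE, and the helper names (`flatten_csi`, `LinearAutoencoder`) as well as the real‑then‑imaginary ordering are assumptions.

```python
import random
from typing import List, Sequence


def flatten_csi(matrices: Sequence[Sequence[Sequence[complex]]]) -> List[float]:
    """Flatten each complex matrix row by row, concatenate, then split real and imaginary parts."""
    flat = [z for m in matrices for row in m for z in row]
    # Real parts first, then imaginary parts (the ordering is an assumption of this sketch).
    return [z.real for z in flat] + [z.imag for z in flat]


class LinearAutoencoder:
    """Tiny linear AE: f = E x, x_hat = D f, trained by per-sample SGD on the reconstruction MSE."""

    def __init__(self, in_dim: int, latent_dim: int, lr: float = 0.05, seed: int = 0):
        rng = random.Random(seed)
        self.E = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]
        self.D = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(in_dim)]
        self.lr = lr

    def encode(self, x: Sequence[float]) -> List[float]:
        # Only the encoder is kept for inference after training.
        return [sum(e * xi for e, xi in zip(row, x)) for row in self.E]

    def decode(self, f: Sequence[float]) -> List[float]:
        return [sum(d * fj for d, fj in zip(row, f)) for row in self.D]

    def train_step(self, x: Sequence[float]) -> float:
        """One SGD step on L = mean((x_hat - x)^2); returns the loss before the update."""
        n, m = len(x), len(self.E)
        f = self.encode(x)
        x_hat = self.decode(f)
        err = [xh - xi for xh, xi in zip(x_hat, x)]
        g_xhat = [2.0 * e / n for e in err]  # dL/dx_hat
        # Backpropagate to f before updating D, so both gradients use the same weights.
        g_f = [sum(g_xhat[i] * self.D[i][j] for i in range(n)) for j in range(m)]
        for i in range(n):
            for j in range(m):
                self.D[i][j] -= self.lr * g_xhat[i] * f[j]
        for j in range(m):
            for k in range(n):
                self.E[j][k] -= self.lr * g_f[j] * x[k]
        return sum(e * e for e in err) / n
```

For example, `flatten_csi([[[1+2j, 3-1j]], [[0.5j]]])` returns `[1.0, 3.0, 0.0, 2.0, -1.0, 0.5]`. A linear AE with latent size d can at best learn the top‑d principal subspace of the training data, so it serves here only to show the training loop and the encoder/decoder split.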
Stage 2 – Reinforcement‑Learning Beampattern Optimization
The second module employs an on‑policy Advantage Actor‑Critic (A2C) algorithm. The state consists of the latent CSI feature f and the total communication rate of the previous time slot, which lets the agent exploit temporal performance feedback. The action is the set of beampattern gains sampled on a uniformly discretized angular grid (e.g., A points between –90° and 90°) for each of the three beamforming vectors (communication, sensing transmit, and sensing receive). Because the agent optimizes beampatterns rather than raw beamforming vectors, the action dimension is fixed by the grid (A samples per beampattern) and does not grow with the number of antennas, users, or targets. The reward function combines the sum communication rate with penalties for violating the minimum sensing‑rate constraints, steering the agent toward a balanced trade‑off between communication and radar performance.
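The quantities the agent works with can be sketched in a few lines. This is a minimal illustration under assumptions the summary does not state: a half‑wavelength ULA with steering vector a(θ)[n] = exp(jπ n sin θ), a per‑vector gain |a(θ)^H w|², a hinge‑type sensing penalty with a hypothetical weight `penalty_weight`, and the standard one‑step advantage r + γV(s′) − V(s) used in A2C. All function names are illustrative, not the authors' implementation.

```python
import cmath
import math
from typing import List, Sequence


def angle_grid(num_points: int) -> List[float]:
    """num_points uniformly spaced angles (radians) covering [-90 deg, 90 deg] inclusive."""
    step = math.pi / (num_points - 1)
    return [-math.pi / 2 + i * step for i in range(num_points)]


def ula_steering(n_ant: int, theta: float) -> List[complex]:
    """Half-wavelength ULA steering vector a(theta)[n] = exp(j*pi*n*sin(theta))."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta)) for n in range(n_ant)]


def beampattern(w: Sequence[complex], grid: Sequence[float]) -> List[float]:
    """Sampled gains P(theta) = |a(theta)^H w|^2 of one beamforming vector w (one action block)."""
    gains = []
    for theta in grid:
        a = ula_steering(len(w), theta)
        gains.append(abs(sum(ai.conjugate() * wi for ai, wi in zip(a, w))) ** 2)
    return gains


def reward(comm_rates: Sequence[float], sensing_rates: Sequence[float],
           min_sensing_rate: float, penalty_weight: float = 10.0) -> float:
    """Sum rate minus a weighted hinge penalty on each sensing-rate shortfall (hypothetical form)."""
    shortfall = sum(max(0.0, min_sensing_rate - r) for r in sensing_rates)
    return sum(comm_rates) - penalty_weight * shortfall


def td_advantage(r: float, v_s: float, v_next: float, gamma: float = 0.99) -> float:
    """One-step advantage A = r + gamma*V(s') - V(s).

    In A2C this value weights the policy-gradient term -log pi(a|s) * A, and the critic
    is regressed toward the target r + gamma*V(s')."""
    return r + gamma * v_next - v_s
```

With a 181‑point grid (1° spacing) the actor emits 181 gains per beampattern however many antennas the array has; only `beampattern` itself depends on the array size.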
Stage 3 – Supervised Beamforming Reconstruction
Once the optimal beampatterns are obtained, a separate supervised neural network approximates the inverse mapping G⁻¹, where G denotes the mapping from complex beamforming vectors to beampatterns. The network is trained with an MSE loss on pairs of beampattern samples and their corresponding beamforming vectors, thereby learning a non‑linear reconstruction that reflects the physical relationship between the array response and the angular power distribution. This step recovers implementable beamforming vectors, while the optimization itself operates on the intuitive, interpretable beampattern representation.
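One practical detail of learning G⁻¹ is that a beampattern cannot pin down the phase of w: since |a(θ)^H (e^{jφ} w)|² = |a(θ)^H w|² for any φ, vectors that differ by a global phase share the same beampattern (other ambiguities can also exist). The sketch below shows one way to build supervised training pairs under that constraint, assuming a half‑wavelength ULA, unit‑norm random vectors, and a convention that rotates w so its first entry is real and non‑negative. The convention and all names are assumptions rather than the paper's procedure, and the network itself is omitted.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple


def ula_beampattern(w: Sequence[complex], grid: Sequence[float]) -> List[float]:
    """Gains |a(theta)^H w|^2 for a half-wavelength ULA, a[n] = exp(j*pi*n*sin(theta))."""
    gains = []
    for theta in grid:
        # a^H w = sum_n conj(a[n]) * w[n] = sum_n exp(-j*pi*n*sin(theta)) * w[n]
        af = sum(cmath.exp(-1j * math.pi * n * math.sin(theta)) * wn for n, wn in enumerate(w))
        gains.append(abs(af) ** 2)
    return gains


def canonical_phase(w: Sequence[complex]) -> List[complex]:
    """Remove the global phase so that the first entry is real and non-negative."""
    if w[0] == 0:
        return list(w)
    rot = cmath.exp(-1j * cmath.phase(w[0]))
    return [wn * rot for wn in w]


def to_real(w: Sequence[complex]) -> List[float]:
    """Stack real and imaginary parts into one real-valued regression target."""
    return [z.real for z in w] + [z.imag for z in w]


def make_training_pair(n_ant: int, grid: Sequence[float],
                       rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw a random unit-norm w, fix its phase, and return (beampattern input, real target)."""
    w = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_ant)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in w))
    w = canonical_phase([z / norm for z in w])
    return ula_beampattern(w, grid), to_real(w)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-square error, the supervised reconstruction loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
```

Without some phase convention, the same input beampattern would be paired with many different targets and an MSE‑trained network would tend to average them.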
System Model and Problem Formulation
The paper details an FD ISAC base station equipped with Nt transmit and Nr receive antennas arranged as uniform linear arrays (ULAs). The transmitted signal is a linear combination of K user data symbols (beamforming vectors w_k) and L dedicated sensing symbols (vectors v_l). The signal received at each user contains inter‑user interference, interference from the sensing symbols, and noise. The radar echo model incorporates transmit/receive steering vectors, target angles, and residual self‑interference. From these models the communication SINR, sensing SINR, and corresponding rates are derived, leading to the optimization problem (P1), which maximizes the sum rate Σ_k R_com,k subject to a total power constraint, unit‑norm receive beamformers, minimum sensing‑rate constraints, and the requirement that the beamforming vectors be recoverable from the beampattern set.
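The communication part of (P1) can be made concrete with the usual downlink SINR form that matches this description, in which every other user stream and every sensing stream counts as interference. The exact expressions are in the paper; the sketch below assumes SINR_k = |h_k^H w_k|² / (Σ_{j≠k} |h_k^H w_j|² + Σ_l |h_k^H v_l|² + σ²) and R_com,k = log2(1 + SINR_k), with h_k a hypothetical downlink channel vector for user k. Function names are illustrative.

```python
import math
from typing import List, Sequence

Vec = Sequence[complex]


def inner(h: Vec, w: Vec) -> complex:
    """Hermitian inner product h^H w."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))


def comm_sinrs(H: Sequence[Vec], W: Sequence[Vec], V: Sequence[Vec],
               noise_power: float) -> List[float]:
    """SINR_k = |h_k^H w_k|^2 / (sum_{j!=k} |h_k^H w_j|^2 + sum_l |h_k^H v_l|^2 + sigma^2)."""
    sinrs = []
    for k, h in enumerate(H):
        signal = abs(inner(h, W[k])) ** 2
        inter_user = sum(abs(inner(h, W[j])) ** 2 for j in range(len(W)) if j != k)
        sensing = sum(abs(inner(h, v)) ** 2 for v in V)  # sensing streams act as interference
        sinrs.append(signal / (inter_user + sensing + noise_power))
    return sinrs


def sum_rate(sinrs: Sequence[float]) -> float:
    """Objective of (P1): sum_k log2(1 + SINR_k), in bit/s/Hz."""
    return sum(math.log2(1.0 + s) for s in sinrs)


def total_power(W: Sequence[Vec], V: Sequence[Vec]) -> float:
    """sum_k ||w_k||^2 + sum_l ||v_l||^2, to be compared against the power budget."""
    return sum(abs(z) ** 2 for vec in list(W) + list(V) for z in vec)
```

For two users with orthogonal unit channels, matched beamformers, no sensing streams, and σ² = 1, each SINR is 1 and the sum rate is 2 bit/s/Hz; adding one sensing vector [1, 1] halves each SINR to 0.5.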
Simulation Results
Simulations consider realistic parameters (e.g., Nt = 64, Nr = 64, K = 8, L = 4) and compare the proposed three‑stage solution against a baseline RL approach that directly outputs beamforming vectors with 3‑bit quantized phase shifters. The three‑stage method achieves roughly 15 % higher sum communication rate while satisfying all sensing‑rate constraints. Moreover, the convergence speed of the RL agent improves markedly because the action space is reduced to a fixed‑size beampattern vector, and the use of latent CSI features enhances sample efficiency and generalization across different antenna/user configurations.
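The convergence argument rests on how the action size scales. The short calculation below is an illustration under stated assumptions, not the paper's accounting: it counts only transmit‑side outputs, assumes a direct baseline emits either K+L complex vectors or one phase per antenna per vector, and uses a hypothetical 181‑point (1°) angular grid with three beampatterns. The helper names are hypothetical.

```python
def complex_action_dim(n_t: int, k: int, l: int) -> int:
    """Real entries needed to output K+L complex transmit vectors of length N_t directly."""
    return 2 * n_t * (k + l)


def phase_action_dim(n_t: int, k: int, l: int) -> int:
    """Entries needed if each transmit vector is set only by per-antenna (e.g. quantized) phases."""
    return n_t * (k + l)


def beampattern_action_dim(num_angles: int, num_patterns: int = 3) -> int:
    """Entries needed to output num_patterns beampatterns sampled at num_angles grid points."""
    return num_patterns * num_angles


if __name__ == "__main__":
    A = 181  # hypothetical 1-degree grid over [-90, 90] degrees
    for n_t in (32, 64, 128):
        print(n_t, complex_action_dim(n_t, 8, 4), phase_action_dim(n_t, 8, 4),
              beampattern_action_dim(A))
```

For K = 8 and L = 4, the direct complex action grows from 768 to 3072 entries as Nt goes from 32 to 128, while the beampattern action stays at 543. The beampattern action is therefore not always smaller (for Nt = 32 the phase‑only action has 384 entries, each with 2³ = 8 levels for 3‑bit shifters), but its size does not depend on the array.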
Conclusions and Future Work
The authors demonstrate that decomposing ISAC beamforming into feature extraction, beampattern optimization, and reconstruction yields a more scalable, interpretable, and higher‑performing solution than the end‑to‑end RL baseline, and they position it as a scalable alternative to convex methods. Remaining challenges include real‑time model updates for time‑varying channels, extension to non‑ULA or multipath‑rich environments, and joint optimization of self‑interference cancellation with beamforming. Future research directions suggested involve online learning mechanisms, robustness to model mismatches, and hardware‑in‑the‑loop validation.
Overall, the paper contributes a novel, modular deep‑learning framework that bridges the gap between data‑driven optimization and physical‑layer interpretability for next‑generation ISAC systems.