RL based Beamforming Optimization for 3D Pinching Antenna assisted ISAC Systems

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In this paper, a three-dimensional (3D) deployment scheme for a pinching antenna array is proposed, aiming to enhance the performance of integrated sensing and communication (ISAC) systems. To fully realize the potential of 3D deployment, a joint antenna positioning, time allocation, and transmit power optimization problem is formulated to maximize the sum communication rate under target sensing-rate and system energy constraints. To solve this sum-rate maximization problem, we propose a heterogeneous graph neural network based reinforcement learning (HGRL) algorithm. Simulation results show that the 3D deployment of the pinching antenna array outperforms its 1D and 2D counterparts in ISAC systems. Moreover, the proposed HGRL algorithm surpasses other baselines in both performance and convergence speed, owing to its advanced observation construction of the environment.


💡 Research Summary

This paper tackles the emerging challenge of jointly optimizing communication and sensing functions in integrated sensing‑and‑communication (ISAC) systems by exploiting a three‑dimensional (3D) deployment of pinching antennas. Pinching antennas are low‑cost, reconfigurable elements that can be “pinched” at any point along a dielectric waveguide, thereby creating on‑demand line‑of‑sight (LoS) beams without the need for bulky phased‑array hardware. The authors extend the conventional one‑dimensional (1D) or two‑dimensional (2D) placement of such antennas to a full 3D configuration by equipping a base station (BS) with three orthogonal waveguides (aligned with the x, y, and z axes) and placing N antennas on each guide, for a total of M = 3N elements.
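The geometry above is simple to reproduce. The following minimal sketch (an assumption of ours, not the authors' code) generates the M = 3N antenna positions by placing N elements on each of three orthogonal waveguides that share a common origin at the BS; the `offsets` array stands in for the paper's continuous displacement variables along each guide.

```python
import numpy as np

def antenna_positions(offsets):
    """Place antennas on three orthogonal waveguides aligned with the
    x-, y-, and z-axes. `offsets` is a (3, N) array of displacements
    along each guide; the result is an (M, 3) array with M = 3N."""
    positions = []
    for axis in range(3):          # one waveguide per coordinate axis
        for d in offsets[axis]:
            p = np.zeros(3)
            p[axis] = d            # antenna sits at distance d along its guide
            positions.append(p)
    return np.stack(positions)

N = 4
offsets = np.linspace(0.5, 2.0, N)[None, :].repeat(3, axis=0)  # same grid on each guide
pos = antenna_positions(offsets)
print(pos.shape)  # (12, 3)
```

Moving an antenna then amounts to changing a single entry of `offsets`, which is exactly the kind of continuous action the RL agent later controls.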

The system model assumes K single‑antenna users and L sensing targets, all located in the ground plane. In each time slot t, a time‑division multiple‑access (TDMA) scheme allocates a fraction q_{com,k,t} of the slot to user k and a transmit power p_{k,t}. The transmitted signal is a superposition of the users’ symbols, phase‑shifted by each antenna’s position‑dependent phase θ_{n,t}=2π|ψ_{p0}−ψ_{pn,a,t}|/λ_0. Near‑field spherical‑wave channel models are used for both communication and sensing links, leading to complex‑valued channel vectors h_{k,t} and h_{l,t} that depend on the Euclidean distances between each antenna and the corresponding receiver or target.
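To make the channel construction concrete, here is a hedged sketch of a spherical-wave channel vector under the model described above. The function and its parameters (`eta` as a reference path gain, the feed point at the origin) are illustrative assumptions; each entry combines free-space attenuation over the antenna-to-user distance with the in-waveguide phase θ accumulated from the feed point to the antenna, mirroring θ_{n,t} = 2π|ψ_{p0} − ψ_{p_{n,a},t}|/λ_0.

```python
import numpy as np

def nearfield_channel(ant_pos, user_pos, feed_pos, wavelength, eta=1.0):
    """Hypothetical spherical-wave channel sketch.
    ant_pos: (M, 3) antenna positions; user_pos/feed_pos: (3,) points."""
    d_user = np.linalg.norm(ant_pos - user_pos, axis=1)   # antenna -> user distance
    d_feed = np.linalg.norm(ant_pos - feed_pos, axis=1)   # feed -> antenna distance
    theta = 2 * np.pi * d_feed / wavelength               # position-dependent phase
    # amplitude decays with distance; phase sums over-the-air and in-guide terms
    return (np.sqrt(eta) / d_user) * np.exp(-1j * (2 * np.pi * d_user / wavelength + theta))

ants = np.array([[1.0, 0.0, 3.0], [2.0, 0.0, 3.0]])
user = np.array([5.0, 5.0, 0.0])
h = nearfield_channel(ants, user, feed_pos=np.zeros(3), wavelength=0.01)
print(h.shape)  # (2,)
```

The same function applied to a target position in place of `user_pos` would give the sensing-link vector h_{l,t} under these assumptions.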

The optimization objective is to maximize the sum communication rate Σ_{t=1}^{T} Σ_{k=1}^{K} R_{com,k,t} while satisfying (i) a minimum sensing signal‑to‑noise ratio (SNR) Γ_{min} for every target in every slot, (ii) the TDMA time‑budget constraint Σ_{k} q_{com,k,t} ≤ 1, (iii) a total energy budget Σ_{t,k} p_{k,t} q_{com,k,t} ≤ E, and (iv) a minimum inter‑antenna spacing δ to avoid physical collisions. Because the decision variables include continuous antenna displacements, time fractions, and power levels, the problem is highly non‑convex, involving absolute values, exponential phase terms, and mixed integer‑continuous constraints. Traditional convex solvers or simple reinforcement‑learning (RL) approaches cannot efficiently explore this massive action space.
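The four constraint families can be checked mechanically for any candidate solution. The sketch below is our own illustration (not the paper's code) of such a feasibility test, with small numerical tolerances on the inequality constraints:

```python
import numpy as np

def feasible(q, p, snr, E, gamma_min, ant_pos, delta, tol=1e-9):
    """Check one candidate solution against the four constraint families:
    q, p: (T, K) time fractions and transmit powers;
    snr: (T, L) sensing SNRs; ant_pos: (M, 3) antenna positions."""
    if np.any(snr < gamma_min):              # (i) sensing SNR floor, every target/slot
        return False
    if np.any(q.sum(axis=1) > 1 + tol):      # (ii) TDMA time budget per slot
        return False
    if (p * q).sum() > E + tol:              # (iii) total energy budget
        return False
    d = np.linalg.norm(ant_pos[:, None] - ant_pos[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # ignore self-distances
    if d.min() < delta:                      # (iv) minimum inter-antenna spacing
        return False
    return True

q = np.full((2, 2), 0.5)                     # T=2 slots, K=2 users
p = np.full((2, 2), 1.0)
snr = np.full((2, 1), 5.0)                   # L=1 target
ants = np.array([[0.5, 0, 0], [1.5, 0, 0], [0, 0.5, 0]])
ok = feasible(q, p, snr, E=10.0, gamma_min=3.0, ant_pos=ants, delta=0.1)
print(ok)  # True
```

The non-convexity comes from the objective and the SNR constraint, not from this check itself, which is why the authors turn to learning-based search rather than a convex solver.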

To overcome these difficulties, the authors cast the problem as a Markov decision process (MDP). The state at time t is represented by a heterogeneous graph G_t = (V, E), where nodes belong to three types—antennas, users, and targets—and edges capture three relation types: communication, sensing, and interference. Each node carries a feature vector consisting of a one‑hot type identifier and its 3‑D position. A heterogeneous graph neural network (HetGNN) processes this graph: for each relation r, a distinct weight matrix W_r^{(l)} aggregates neighbor embeddings, and a self‑loop weight W_0^{(l)} preserves the node’s own information. The update rule is
h_v^{(l+1)} = σ( Σ_{r∈T_e} Σ_{u∈N_r(v)} W_r^{(l)} h_u^{(l)} + W_0^{(l)} h_v^{(l)} ),
where σ is a non‑linear activation. After L layers, a global embedding h_t is obtained by mean‑pooling all node embeddings. This embedding serves as the input to both a continuous‑action policy network π_θ (the “actor”) and a value network V_ψ (the “critic”).
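The update rule above is the standard relational message-passing form (one weight matrix per edge type plus a self-loop). A minimal numpy sketch, assuming dense 0/1 adjacency matrices per relation and ReLU for σ, could look like this:

```python
import numpy as np

def hetgnn_layer(H, adj_by_rel, W_by_rel, W0):
    """One relational layer matching
    h_v^{l+1} = sigma( sum_r sum_{u in N_r(v)} W_r h_u + W0 h_v ).
    H: (V, F) node embeddings; adj_by_rel[r]: (V, V) adjacency of relation r."""
    out = H @ W0.T                       # self-loop term W0 h_v
    for r, A in adj_by_rel.items():
        out += A @ H @ W_by_rel[r].T     # neighbor sum under relation-specific W_r
    return np.maximum(out, 0.0)          # sigma = ReLU

rng = np.random.default_rng(0)
V, F = 5, 8                              # e.g. antennas + users + targets
H = rng.normal(size=(V, F))
adj = {r: (rng.random((V, V)) < 0.4).astype(float)
       for r in ("com", "sense", "intf")}        # the three relation types
W = {r: 0.1 * rng.normal(size=(F, F)) for r in adj}
W0 = np.eye(F)

H1 = hetgnn_layer(H, adj, W, W0)
g = H1.mean(axis=0)                      # global embedding via mean-pooling
print(g.shape)  # (8,)
```

In the paper's pipeline, `g` is the state embedding h_t fed to both the actor π_θ and the critic V_ψ; the random adjacencies and weights here are placeholders for the learned/constructed ones.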

The action a_t collects the problem's continuous decision variables: the antenna displacements along each waveguide, the TDMA time fractions q_{com,k,t}, and the transmit powers p_{k,t}.

