Learning-Based Phase Shift Optimization of Liquid Crystal RIS in Dynamic mmWave Networks
To enhance coverage and signal quality at millimeter-wave (mmWave) frequencies, reconfigurable intelligent surfaces (RISs) have emerged as a game-changing solution to manipulate the wireless environment. Traditional semiconductor-based RISs face scalability issues due to high power consumption. In contrast, liquid crystal-based RISs (LC-RISs) offer energy-efficient and cost-effective operation even for large arrays. However, this promise comes with a caveat: LC-RISs suffer from long reconfiguration times, on the order of tens of milliseconds, which limits their applicability in dynamic scenarios. To date, prior works have focused on hardware design aspects or static scenarios to address this limitation, but little attention has been paid to optimization solutions for dynamic settings. Our paper fills this gap by proposing a reinforcement learning-based optimization framework to dynamically control the phase shifts of LC-RISs and maximize the data rate of a moving user. Specifically, we propose a Deep Deterministic Policy Gradient (DDPG) algorithm that adapts the LC-RIS phase shifts without requiring perfect channel state information and balances the tradeoff between signal-to-noise ratio (SNR) and configuration time. We validate our approach through high-fidelity ray tracing simulations, leveraging measurement data from an LC-RIS prototype. Our results demonstrate the potential of our solution to bring adaptive control to dynamic LC-RIS-assisted mmWave systems.
💡 Research Summary
The paper tackles a critical bottleneck of liquid‑crystal‑based reconfigurable intelligent surfaces (LC‑RIS) for millimeter‑wave (mmWave) communications: the relatively long reconfiguration latency (tens of milliseconds) that hampers their use in dynamic environments. While LC‑RIS offers dramatically lower power consumption (≈150 mW for a 1 m² array) compared with semiconductor‑based RIS, its slow electro‑optic response limits real‑time beam steering for moving users.
To address this, the authors formulate a system model for an indoor office scenario consisting of a single‑antenna access point (AP), one LC‑RIS with 750 unit cells, and a mobile user moving at ≤5 m/s. The AP–RIS and RIS–user links follow a Rician fading model, while the direct AP–user link is blocked, leaving only non‑line‑of‑sight components. Each RIS element reflects with unit amplitude and a controllable phase ωₙ; the physical reconfiguration time for a desired phase shift is derived from exponential dynamics characterized by two time constants (τ⁻_c = 29 ms, τ⁺_c = 9 ms). The maximum configuration time t_c for a slot is the worst‑case among all elements, and the effective serving time is t_k = t_s − t_c. The instantaneous data rate in a slot is R = (t_k/t_s) · B · log₂(1+SNR), leading to the optimization problem of maximizing the expected rate subject to unit‑modulus, phase‑range, and timing constraints.
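The timing model above can be sketched in code. This is a minimal illustration, not the paper's exact derivation: it assumes first-order exponential phase dynamics ω(t) = ω_new + (ω_old − ω_new)·e^(−t/τ) with the two time constants quoted above, and a settling tolerance `EPS` that is an assumption of this sketch.

```python
import numpy as np

TAU_UP = 9e-3      # time constant for increasing phase (tau_c^+ = 9 ms, from the paper)
TAU_DOWN = 29e-3   # time constant for decreasing phase (tau_c^- = 29 ms, from the paper)
T_S = 50e-3        # slot duration t_s = 50 ms (from the paper)
EPS = 0.01         # settling tolerance in radians (assumed for illustration)

def element_config_time(phase_old, phase_new):
    """Settling time of one LC cell under assumed first-order dynamics:
    omega(t) = omega_new + (omega_old - omega_new) * exp(-t / tau)."""
    delta = phase_new - phase_old
    if abs(delta) < EPS:
        return 0.0
    tau = TAU_UP if delta > 0 else TAU_DOWN
    return tau * np.log(abs(delta) / EPS)

def slot_rate(phases_old, phases_new, snr_linear, bandwidth=1e9):
    """Worst-case configuration time t_c over all elements, then the slot's
    effective rate R = (t_k / t_s) * B * log2(1 + SNR), with t_k = t_s - t_c."""
    t_c = max(element_config_time(o, n) for o, n in zip(phases_old, phases_new))
    t_k = max(T_S - t_c, 0.0)
    return (t_k / T_S) * bandwidth * np.log2(1.0 + snr_linear)
```

Note how the asymmetry matters: under these constants, a large downward phase swing can consume most or all of a 50 ms slot, which is exactly what makes a latency-aware objective necessary.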
Because perfect, up‑to‑date channel state information (CSI) is unavailable—RIS is passive and CSI estimation at the AP would be outdated—the authors recast the problem as a Markov decision process (MDP) and apply a Deep Deterministic Policy Gradient (DDPG) algorithm. The state vector includes the current RIS phase vector, a theoretical optimal phase vector computed from outdated CSI (ω_opt), previous slot distances, and the real/imaginary parts of the AP‑RIS, RIS‑user, and AP‑user channels. The action is the continuous phase vector to be applied in the next slot. Rather than using the raw data rate as reward, they define a weighted sum r_i = β₁·SNR + β₂·t_k, allowing explicit control over the trade‑off between signal quality and configuration latency. Two deep neural networks—an actor that outputs phase vectors and a critic that evaluates Q‑values—are trained with experience replay and target networks to handle the continuous state‑action space.
For validation, a 60 GHz LC‑RIS prototype (30 × 25 cells) is used. Due to column‑wise driving constraints, only 30 independent phase values can be set (all 25 cells in a column share the same phase). Measured reflection coefficients from the prototype are embedded into a high‑fidelity ray‑tracing simulator that models a realistic office layout with walls, furniture, and blockage. Simulation parameters include a slot duration t_s = 50 ms, bandwidth B = 1 GHz, and transmit power P_t = 30 dBm.
Results show that the DDPG‑based policy achieves roughly a 12 % increase in average data rate compared with a static “optimal” phase configuration that ignores reconfiguration delay, while keeping the actual configuration time below 20 ms. By adjusting β₁ and β₂, the algorithm can prioritize either higher SNR or longer serving time, demonstrating flexibility. Convergence is observed after about 3,000 training episodes (≈30 minutes of simulated time), and the method is robust to random initial phase settings.
The paper contributes three main innovations: (1) an explicit latency‑aware formulation of the RIS phase‑shift optimization problem; (2) a continuous‑action reinforcement‑learning solution that operates without perfect CSI; and (3) a realistic evaluation that integrates measured LC‑RIS hardware characteristics into ray‑tracing simulations. Future work is suggested on multi‑user/multi‑AP extensions, meta‑learning for rapid adaptation to new environments, and joint hardware‑algorithm co‑design to further reduce LC‑RIS switching times.