Joint Sleep Mode Activation and Load Balancing with Dynamic Cell Load: A Combinatorial Bandit Approach

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We propose a combinatorial bandit formulation to opportunistically trigger sleep modes in gNode-B (gNB) small cells (SCs), followed by a cell range expansion (CRE)-based load balancing procedure, while ensuring that the fifth generation (5G) quality of service identifier (5QI) requirements of user equipments (UEs) are maintained. The key challenge is that while deactivating a given SC gNB reduces its own energy consumption, it may increase the load on neighboring gNBs and on the macro gNB (coverage cell), impacting the overall energy efficiency. We accurately characterize this phenomenon by modeling the dynamic cell load, which jointly accounts for the locations of the UEs, their positions relative to all the SCs, and their data demands. We experimentally show that the proposed combinatorial upper confidence bound (CUCB) algorithm followed by the load balancer outperforms not only naive strategies, such as keeping all the SCs on, but also other state-of-the-art reinforcement learning solutions. The proposed algorithm can be implemented as open radio access network (O-RAN) near-real-time (NRT) RAN intelligent controller (RIC) xApps.

💡 Research Summary

The paper addresses the joint problem of putting 5G small‑cell gNode‑B (gNB‑SC) base stations into sleep mode and rebalancing traffic in an ultra‑dense heterogeneous network. Turning off a small cell saves that cell's power, but its user equipments (UEs) must be handed over to neighboring small cells or to the macro cell, potentially increasing their load and interference as well as the overall energy consumption. To capture this trade‑off accurately, the authors develop a dynamic cell‑load model that depends on UE locations, path loss, SINR, and per‑UE data demand. The load of each cell is expressed as a logarithmic function of the per‑UE SINR; the SINR in turn depends on the distances to all active transmitters, while the cell‑range‑expansion (CRE) offsets determine which cell serves each UE. This model yields a realistic expression for the instantaneous power consumption of each cell, comprising circuit power and load‑dependent transmit power, and for the total network power.
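To make the load coupling concrete, here is a minimal Python sketch of a dynamic cell-load computation. It assumes a widely used load-coupled interference model (each UE's rate is B·log2(1 + SINR), a neighbor's interference is scaled by that neighbor's load, and the loads are found by fixed-point iteration) together with a linear power model using illustrative macro and pico parameters in the style of the EARTH power model. The path-loss exponent, noise level, layout, and all function names are assumptions for illustration, not the paper's exact equations.

```python
import math

# Illustrative constants (assumptions, not the paper's values)
BANDWIDTH_HZ = 10e6   # system bandwidth B
NOISE_W = 1e-13       # receiver noise power over B
PATHLOSS_EXP = 3.5    # distance-based path-loss exponent


def gain(cell_xy, ue_xy):
    """Path gain d^-alpha, with the distance clipped to 1 m."""
    d = max(math.dist(cell_xy, ue_xy), 1.0)
    return d ** -PATHLOSS_EXP


def associate(cells, ues, active, cre_db):
    """Serve each UE from the active cell with the largest CRE-biased received power."""
    serving = []
    for ue in ues:
        candidates = [i for i in range(len(cells)) if active[i]]
        serving.append(max(candidates, key=lambda i: 10 * math.log10(
            cells[i]["p_tx"] * gain(cells[i]["xy"], ue["xy"])) + cre_db[i]))
    return serving


def dynamic_load(cells, ues, active, cre_db, iters=200, tol=1e-12):
    """Fixed point: loads set interference, interference sets rates, rates set loads."""
    serving = associate(cells, ues, active, cre_db)
    rho = [1.0 if on else 0.0 for on in active]  # start from a fully loaded network
    for _ in range(iters):
        new = [0.0] * len(cells)
        for ue, i in zip(ues, serving):
            signal = cells[i]["p_tx"] * gain(cells[i]["xy"], ue["xy"])
            interference = sum(
                min(rho[j], 1.0) * cells[j]["p_tx"] * gain(cells[j]["xy"], ue["xy"])
                for j in range(len(cells)) if j != i and active[j])
            rate = BANDWIDTH_HZ * math.log2(1.0 + signal / (interference + NOISE_W))
            new[i] += ue["demand_bps"] / rate  # resource share this UE needs
        converged = max(abs(a - b) for a, b in zip(new, rho)) < tol
        rho = new
        if converged:
            break
    return rho


def cell_power(cell, rho, on):
    """Linear load-dependent power model; a sleeping cell draws only p_sleep."""
    if not on:
        return cell["p_sleep"]
    return cell["p0"] + cell["slope"] * min(rho, 1.0) * cell["p_tx"]


# Toy HetNet: one macro and two small cells (hypothetical layout and parameters)
MACRO = {"xy": (0, 0), "p_tx": 20.0, "p0": 130.0, "slope": 4.7, "p_sleep": 75.0}
SC = dict(p_tx=0.13, p0=6.8, slope=4.0, p_sleep=4.3)
CELLS = [MACRO, {"xy": (200, 0), **SC}, {"xy": (-200, 0), **SC}]
UES = ([{"xy": (190 + 5 * k, 10), "demand_bps": 5e6} for k in range(4)]
       + [{"xy": (-210, -5), "demand_bps": 5e6}])
CRE = [0.0, 6.0, 6.0]

for active in ([True, True, True], [True, False, True]):
    rho = dynamic_load(CELLS, UES, active, CRE)
    total = sum(cell_power(c, r, on) for c, r, on in zip(CELLS, rho, active))
    print(active, [round(r, 4) for r in rho], round(total, 2))
```

In this toy layout, switching off the first small cell pushes its UEs onto the macro cell, and the macro's load-dependent power increase outweighs the small cell's sleep savings. This is the kind of coupling the dynamic load model is meant to expose.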

The decision problem is to select a binary ON/OFF vector for the L small cells that minimizes total power while guaranteeing a minimum UE bitrate (5QI) and preventing overload (load ≤ 1). This problem is combinatorial (2^L possible configurations) and NP‑hard. The authors reformulate it as a combinatorial multi‑armed bandit (CMAB) problem: each small‑cell OFF decision is a simple arm, and a “super‑arm” is any subset of cells turned OFF simultaneously. The reward of a super‑arm is non‑monotonic: it includes (A) the power saved by the OFF cells, (B) the power change of the macro cell, (C) the power change of the small cells that remain ON, and (D) penalty terms for any overload that occurs after handover. Because the reward can decrease when too many cells are switched off, classic CMAB algorithms that assume monotonicity are not directly applicable.
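The reward decomposition can be illustrated with a small, hypothetical example. The Python sketch below assumes a ring of four small cells in which an OFF cell's load moves to its ring neighbor (inflated by 1.5× for the worse radio conditions) or, if that neighbor is also OFF, to the macro cell. All loads, power models, and the penalty weight are invented for illustration. The sketch enumerates all 2^L super-arms and returns the (A) to (D) terms separately.

```python
from itertools import combinations

# Hypothetical toy scenario: 4 small cells on a ring (numbers invented for illustration)
SC_LOAD = [0.2, 0.9, 0.3, 0.1]  # small-cell loads with every cell ON
MACRO_LOAD = 0.7                # macro load with every cell ON
TO_NEIGHBOR = 1.5               # load inflation when the ring neighbour absorbs the traffic
TO_MACRO = 0.08                 # macro load added per unit of small-cell load
PENALTY = 100.0                 # weight on overload (load above 1)


def sc_power(load, on):
    """Illustrative linear small-cell power model [W]; a sleeping cell draws 2 W."""
    return 10.0 + 5.0 * min(load, 1.0) if on else 2.0


def macro_power(load):
    return 100.0 + 50.0 * min(load, 1.0)


def loads_after(off):
    """Hand each OFF cell's load to its ring neighbour if ON, otherwise to the macro."""
    n = len(SC_LOAD)
    sc = [0.0 if i in off else SC_LOAD[i] for i in range(n)]
    macro = MACRO_LOAD
    for i in off:
        j = (i + 1) % n
        if j in off:
            macro += TO_MACRO * SC_LOAD[i]
        else:
            sc[j] += TO_NEIGHBOR * SC_LOAD[i]
    return sc, macro


def reward_terms(off):
    """(A) sleep savings, (B) macro saving, (C) ON-cell saving, (D) overload penalty."""
    n = len(SC_LOAD)
    sc, macro = loads_after(off)
    a = sum(sc_power(SC_LOAD[i], True) - sc_power(0.0, False) for i in off)
    b = macro_power(MACRO_LOAD) - macro_power(macro)  # negative if the macro works harder
    c = sum(sc_power(SC_LOAD[j], True) - sc_power(sc[j], True)
            for j in range(n) if j not in off)
    overload = sum(max(0.0, x - 1.0) for x in sc) + max(0.0, macro - 1.0)
    d = -PENALTY * overload
    return a, b, c, d


def reward(off):
    return sum(reward_terms(off))


def all_super_arms(n):
    """All 2^n subsets of small cells that could be switched OFF together."""
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            yield frozenset(subset)


for k in range(len(SC_LOAD) + 1):
    best = max((s for s in all_super_arms(len(SC_LOAD)) if len(s) == k), key=reward)
    print(k, sorted(best), round(reward(best), 2))
```

Here switching off cell 3 alone yields a positive reward, but additionally switching off cell 0 overloads cell 1 and the total reward drops. The reward is therefore not monotone in the set of OFF cells.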

To handle the non‑monotonic reward, the authors employ an (α, β)‑approximation oracle. The oracle receives the upper confidence bounds of the simple arms (their empirical mean rewards plus a confidence radius), evaluates the expected reward of every possible super‑arm, and returns a super‑arm whose reward is at least an α‑fraction of the optimal reward with probability at least β. The combinatorial upper confidence bound (CUCB) algorithm (Algorithm 2) maintains for each simple arm an empirical mean μ̂_i and a play count T_i, constructs the upper confidence bound μ̄_i = μ̂_i + √(3 ln t / (2 T_i)), and feeds the vector μ̄ to the oracle at each time step t. The selected super‑arm is then executed, and the statistics of its simple arms are updated. After an initial exploration phase in which each simple arm is played once, the algorithm converges quickly; because the oracle evaluates every super‑arm, its per‑round selection is not misled by the non‑monotonicity of the reward.
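The CUCB loop with an exhaustive oracle can be sketched as follows. This is an illustrative Python sketch, not the paper's Algorithm 2: it assumes Bernoulli semi-bandit feedback for each played simple arm and a hypothetical non-monotone expected reward (the sum of the arms' means minus a congestion cost γ·|S|², standing in for load pushed onto neighboring cells). The UCB index follows the formula above, clipped at 1 because the toy rewards lie in [0, 1].

```python
import math
import random
from itertools import combinations


def expected_reward(super_arm, mu, gamma):
    """Hypothetical reward: per-cell savings minus a congestion cost gamma*|S|^2."""
    return sum(mu[i] for i in super_arm) - gamma * len(super_arm) ** 2


def exhaustive_oracle(mu, gamma):
    """Evaluate all 2^L super-arms and return the best one (alpha = beta = 1)."""
    best, best_val = frozenset(), expected_reward((), mu, gamma)
    for k in range(1, len(mu) + 1):
        for s in combinations(range(len(mu)), k):
            val = expected_reward(s, mu, gamma)
            if val > best_val:
                best, best_val = frozenset(s), val
    return best


def cucb(true_mu, gamma, horizon, seed=0):
    rng = random.Random(seed)
    n = len(true_mu)
    counts = [0] * n      # T_i: number of times simple arm i was observed
    means = [0.0] * n     # mu_hat_i: empirical mean of simple arm i
    history = []

    def observe(super_arm):
        # Semi-bandit feedback: one Bernoulli sample per played simple arm
        for i in super_arm:
            x = 1.0 if rng.random() < true_mu[i] else 0.0
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]

    for i in range(n):  # initial exploration: each simple arm once
        observe({i})
        history.append(frozenset({i}))

    for t in range(n + 1, horizon + 1):
        ucb = [min(1.0, means[i] + math.sqrt(3 * math.log(t) / (2 * counts[i])))
               for i in range(n)]
        s = exhaustive_oracle(ucb, gamma)
        observe(s)
        history.append(s)
    return history, means, counts


history, means, counts = cucb([0.9, 0.8, 0.3, 0.1], gamma=0.15, horizon=3000)
recent = history[-500:]
print(sorted(max(set(recent), key=recent.count)), counts)
```

The toy reward is increasing in every μ_i (the property CUCB relies on) even though it is not monotone in the set of OFF cells. Exhaustive evaluation costs 2^L reward evaluations per round, which is only practical for a small number of cells.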

Applying a super‑arm (switching its cells off) may still leave the remaining active small cells overloaded. The authors therefore add a second stage that optimizes the CRE offsets φ_i of the cells that stay ON. This is formulated as a continuous non‑linear optimization problem and solved with Powell’s derivative‑free directional‑search method (Algorithm 3). Adjusting the CRE offsets reshapes the cell‑association (handover) decisions, spreading the displaced traffic more evenly and further reducing the total power consumption.
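The second stage can be sketched as a derivative-free search over the CRE offsets of the cells that remain ON. For brevity, this Python sketch substitutes a simple coordinate (pattern) search for the full Powell method and ignores interference. The 1-D layout, path-loss law, power parameters, penalty weight, and function names are hypothetical. The objective is total power plus a penalty on any cell load above 1.

```python
import math

# Hypothetical 1-D layout: macro at x=0, two small cells (illustrative numbers only)
CELLS = [
    {"x": 0.0, "p_dbm": 46.0, "p0": 130.0, "slope": 94.0},  # macro
    {"x": 150.0, "p_dbm": 30.0, "p0": 6.8, "slope": 0.52},  # small cell 1
    {"x": 300.0, "p_dbm": 30.0, "p0": 6.8, "slope": 0.52},  # small cell 2
]
UES = [(float(x), 15e6) for x in range(20, 400, 20)]  # (position [m], demand [bit/s])
BANDWIDTH_HZ = 20e6
NOISE_DBM = -95.0
PENALTY = 1e3  # weight on overload (load above 1)


def rx_dbm(cell, x):
    """Received power under an assumed log-distance path loss; no interference."""
    d = max(abs(cell["x"] - x), 10.0)
    return cell["p_dbm"] - 30.0 - 35.0 * math.log10(d)


def evaluate(cre_db, active):
    """Total power plus overload penalty for given CRE offsets and ON/OFF mask."""
    loads = [0.0] * len(CELLS)
    for x, demand in UES:
        serving = max((i for i in range(len(CELLS)) if active[i]),
                      key=lambda i: rx_dbm(CELLS[i], x) + cre_db[i])
        snr = 10 ** ((rx_dbm(CELLS[serving], x) - NOISE_DBM) / 10)
        loads[serving] += demand / (BANDWIDTH_HZ * math.log2(1 + snr))
    power = sum(c["p0"] + c["slope"] * min(load, 1.0)
                for c, load, on in zip(CELLS, loads, active) if on)
    overload = sum(max(0.0, load - 1.0) for load in loads)
    return power + PENALTY * overload, loads


def coordinate_search(f, x0, lo, hi, step=4.0, min_step=0.25):
    """Derivative-free search along coordinate axes (simplified stand-in for Powell)."""
    x, fx = list(x0), f(x0)
    while step >= min_step:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] = min(hi[i], max(lo[i], cand[i] + delta))
                fc = f(cand)
                if fc < fx:  # accept only strict improvements
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step /= 2  # refine the step once no axis move helps
    return x, fx


ACTIVE = [True, True, False]  # hypothetical super-arm: small cell 2 is asleep
LO = [0.0, 0.0, 0.0]
HI = [0.0, 12.0, 0.0]         # only the ON small cell gets a tunable offset


def objective(phi):
    return evaluate(phi, ACTIVE)[0]


phi_opt, f_opt = coordinate_search(objective, [0.0, 0.0, 0.0], LO, HI)
print(phi_opt, round(objective([0.0, 0.0, 0.0]), 2), round(f_opt, 2))
```

Because cell association changes in discrete jumps as the offsets vary, the objective is piecewise constant, which is why a derivative-free method suits this step. Where SciPy is available, `scipy.optimize.minimize(objective, x0, method="Powell", bounds=...)` applies Powell's method proper to the same objective.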

The solution is designed as an O‑RAN near‑real‑time (NRT) RAN Intelligent Controller (RIC) xApp. Through the E2 interface, the xApp reads real‑time load indicators and programs the CRE offsets, while the CUCB super‑arm is computed within the xApp itself; the design thus fits within the 3GPP SON and RRM functional architecture.
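The xApp control flow described above can be sketched as a single control period: read loads, choose the super-arm, tune CRE offsets, and push the actions. Everything in this Python sketch is hypothetical. `E2NodeStub` is an in-memory placeholder, not an O-RAN SDK or E2 service-model API, and pushing sleep commands over E2 is an assumption. The two policy callables are where a CUCB selector and a CRE optimizer would plug in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass
class E2NodeStub:
    """In-memory stand-in for an E2 node (hypothetical; not an O-RAN SDK API)."""
    loads: Dict[int, float]
    sleeping: FrozenSet[int] = frozenset()
    cre_db: Dict[int, float] = field(default_factory=dict)

    def read_loads(self) -> Dict[int, float]:
        return dict(self.loads)

    def apply(self, sleeping: FrozenSet[int], cre_db: Dict[int, float]) -> None:
        self.sleeping, self.cre_db = frozenset(sleeping), dict(cre_db)


Policy = Callable[[Dict[int, float]], FrozenSet[int]]
CreTuner = Callable[[Dict[int, float], FrozenSet[int]], Dict[int, float]]


def xapp_step(node: E2NodeStub, choose_off: Policy, tune_cre: CreTuner,
              log: List[Tuple[FrozenSet[int], Dict[int, float]]]) -> None:
    """One control period of a hypothetical sleep-mode/CRE xApp."""
    loads = node.read_loads()   # 1. load indicators reported by the RAN
    off = choose_off(loads)     # 2. super-arm: small cells to put to sleep
    cre = tune_cre(loads, off)  # 3. CRE offsets for the cells that stay ON
    node.apply(off, cre)        # 4. actions sent back (assumed E2 control)
    log.append((off, cre))


node = E2NodeStub(loads={1: 0.05, 2: 0.6, 3: 0.1})
log: List[Tuple[FrozenSet[int], Dict[int, float]]] = []
xapp_step(node,
          # Placeholder threshold policy, NOT CUCB: sleep cells below 20 % load
          choose_off=lambda l: frozenset(c for c, x in l.items() if x < 0.2),
          tune_cre=lambda l, off: {c: 6.0 for c in l if c not in off},
          log=log)
print(sorted(node.sleeping), node.cre_db)
```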

Experimental evaluation uses a simulated HetNet with one macro cell and 10–20 co‑channel small cells serving 200–500 UEs under Poisson traffic arrivals. Baselines include the naïve “ALL‑ON” policy, a VFA‑SARSA reinforcement‑learning scheme, and earlier Q‑learning‑based sleep‑mode methods. Results show that the proposed CUCB + CRE approach reduces average network power by roughly 18 % compared with ALL‑ON and outperforms VFA‑SARSA by over 12 %. Importantly, the macro cell overload probability stays near zero even at high traffic loads, and the UE bitrate constraint (5QI) is satisfied for 99.8 % of users. Convergence is achieved within a few hundred time slots (≈10 minutes of real‑time operation), demonstrating the practicality of the method.

In summary, the paper makes three key contributions: (1) a realistic, location‑aware dynamic cell‑load model that captures the true impact of sleep‑mode decisions; (2) a CMAB formulation with a novel (α, β)‑approximation oracle and CUCB algorithm that can handle non‑monotonic rewards; and (3) a CRE‑based load‑balancing refinement that further improves energy efficiency. The end‑to‑end solution is compliant with O‑RAN standards and can be deployed as an xApp, offering mobile operators a viable tool to cut energy costs while preserving QoS in future 5G and beyond networks.
