📝 Original Info
- Title: QoS-Aware Dynamic CU Selection in O-RAN with Graph-Based Reinforcement Learning
- ArXiv ID: 2512.19696
- Date: 2025-11-21
- Authors: Sebastian Racedo, Brigitte Jaumard, Oscar Delgado, Meysam Masoudi
📝 Abstract
Open Radio Access Network (O-RAN) disaggregates conventional RAN into interoperable components, enabling flexible resource allocation, energy savings, and agile architectural design. In legacy deployments, the binding between logical functions and physical locations is static, which leads to inefficiencies under time-varying traffic and resource conditions. We address this limitation by relaxing the fixed mapping and performing dynamic service function chain (SFC) provisioning with on-the-fly O-CU selection. We formulate the problem as a Markov decision process and solve it with GRL-DyP, a graph neural network (GNN)–assisted deep reinforcement learning (DRL) framework. The proposed agent jointly selects routes and the O-CU location (from candidate sites) for each incoming service flow to minimize network energy consumption while satisfying quality-of-service (QoS) constraints. The GNN encodes the instantaneous network topology and resource utilization (e.g., CPU and bandwidth), and the DRL policy learns to balance grade of service, latency, and energy. We evaluate GRL-DyP on a dataset with 24-hour traffic traces from the city of Montreal, showing that dynamic O-CU selection and routing significantly reduce energy consumption compared to a static-mapping baseline, without violating QoS. The results highlight DRL-based SFC provisioning as a practical control primitive for energy-aware, resource-adaptive O-RAN deployments.
📄 Full Content
QoS-Aware Dynamic CU Selection in O-RAN with
Graph-Based Reinforcement Learning
Sebastian Racedo and Brigitte Jaumard
Computer Science and Software Engineering
Concordia University
Montreal (Qc) Canada
brigitte.jaumard@concordia.ca
Oscar Delgado
Systems Engineering
École de Technologie Supérieure (ÉTS)
Montreal (Qc) Canada
Meysam Masoudi
Ericsson
Kista, Sweden
Abstract—Open Radio Access Network (O-RAN) disaggregates conventional RAN into interoperable components, enabling flexible resource allocation, energy savings, and agile architectural design. In legacy deployments, the binding between logical functions and physical locations is static, which leads to inefficiencies under time-varying traffic and resource conditions. We address this limitation by relaxing the fixed mapping and performing dynamic service function chain (SFC) provisioning with on-the-fly O-CU selection. We formulate the problem as a Markov decision process and solve it using GRL-DyP, i.e., a graph neural network (GNN)–assisted deep reinforcement learning (DRL). The proposed agent jointly selects routes and the O-CU location (from candidate sites) for each incoming service flow to minimize network energy consumption while satisfying quality-of-service (QoS) constraints. The GNN encodes the instantaneous network topology and resource utilization (e.g., CPU and bandwidth), and the DRL policy learns to balance grade of service, latency, and energy. We perform the evaluation of GRL-DyP on a data set with 24-hour traffic traces from the city of Montreal, showing that dynamic O-CU selection and routing significantly reduce energy consumption compared to a static mapping baseline, without violating QoS. The results highlight DRL-based SFC provisioning as a practical control primitive for energy-aware, resource-adaptive O-RAN deployments.
Index Terms—O-RAN, Deep Reinforcement Learning, Graph Neural Networks, SFC Provisioning, Energy Efficiency.
I. INTRODUCTION
The transition to 5G and the trajectory toward 6G are accelerating adoption of the O-RAN architecture, shifting networks from proprietary, monolithic stacks to disaggregated, virtualized, and intelligent networks [1]. By decoupling the Radio Unit (O-RU), Distributed Unit (O-DU), and Centralized Unit (O-CU), O-RAN enables flexible placement and scaling of functions while fostering a multi-vendor ecosystem. These capabilities are underpinned by Software-Defined Networking (SDN) and Network Function Virtualization (NFV), which also support resource partitioning via network slicing [2]. As the O-RAN Alliance advances specifications and deployment profiles, the resulting design space offers greater agility but also introduces substantial orchestration complexity across heterogeneous hardware, fronthaul constraints, and time-varying traffic [1].

This work was supported by NSERC (under project ALLRP 566589-21) and InnovÉÉ (INNOV-R program) through the partnership with Ericsson. We are grateful to Adel Larabi at GAIA, Ericsson Montréal for clarifying some concepts of the current 5G technology.
However, the same flexibility complicates resource allocation and control. Service function chains (SFCs) must be placed, scaled, and steered across heterogeneous compute and transport resources while meeting slice-specific QoS targets, ranging from high-throughput enhanced Mobile Broadband (eMBB) to Ultra-Reliable Low Latency Communication (URLLC) [3]. In practice, many deployments still rely on static deployments and enforce rigid 1:1 bindings among O-RAN components. Such configurations are often derived from offline capacity planning for peak demand, leading to underutilized hardware and significant energy waste during non-peak hours.

The principles of NFV enable resource and routing decisions to be managed by a centralized controller that maintains a global view of the network's state. This opens the door for more intelligent orchestration methods [3]. Although traditional optimization techniques, such as integer linear programs (ILPs), can find optimal solutions, they often struggle to cope with the scale and dynamism of real-world networks [2]. This complex, dynamic trade-off space is an ideal application for Deep Reinforcement Learning (DRL). In contrast, a DRL agent can learn complex, non-obvious strategies directly from data patterns, managing the multi-objective problem of maximizing service success while minimizing both latency and energy consumption. Beyond DRL itself, the natural graph structure of network-related problems makes graph-based data representations beneficial; to work directly with such data, we utilize Graph Neural Networks (GNNs) [4].
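The multi-objective balance described above (service success, latency, energy) is typically collapsed into a scalar per-flow reward. The paper's exact reward is not given in this excerpt, so the function, weights, and names below are purely illustrative assumptions:

```python
# Hypothetical per-flow reward sketch: reward admitting a flow within its
# latency budget, penalize energy use and latency, and heavily penalize
# blocked flows or QoS violations. All weights are illustrative.

def reward(admitted: bool, latency_ms: float, latency_budget_ms: float,
           energy_kwh: float, w_energy: float = 1.0, w_latency: float = 0.1,
           reject_penalty: float = 10.0) -> float:
    """Combine grade of service, latency, and energy into one scalar."""
    if not admitted or latency_ms > latency_budget_ms:
        return -reject_penalty  # blocked flow or QoS violation
    # Feasible placement: cost grows with energy and latency.
    return -(w_energy * energy_kwh + w_latency * latency_ms)

# A feasible, low-energy placement scores higher (less negative) than a
# feasible but energy-hungry one for the same flow.
good = reward(True, 5.0, 10.0, energy_kwh=0.2)
bad = reward(True, 5.0, 10.0, energy_kwh=1.0)
```

Under such a shaping, an agent maximizing expected return is pushed toward low-energy routes and O-CU sites, but never at the cost of admitting a flow outside its QoS budget.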
In this paper, we propose an RL framework that leverages a GNN [4] to learn a joint routing and O-CU selection policy. Our agent's architecture is explicitly based on Graph Convolutional Networks (GCNs) [5], enabling it to learn effectively from the underlying network topology and real-time state.
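To make the GCN building block concrete, here is a minimal NumPy sketch of one graph-convolutional propagation step in the style of Kipf and Welling's GCN [5]. The topology, per-node features (e.g., CPU and bandwidth utilization), and layer width are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN step: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # node degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy 3-node line topology; 2 features per node (e.g., CPU %, bandwidth %).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[0.2, 0.5],
              [0.9, 0.1],
              [0.4, 0.4]])
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))              # learned weights (random here)
Z = gcn_layer(A, H, W)                       # shape (3, 4): one embedding per node
```

Stacking a few such layers lets each node's embedding summarize resource state several hops away, which is what allows the policy head to score candidate routes and O-CU sites from the current network state.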
…(Full text truncated)…
This content is AI-processed based on ArXiv data.