Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing
Sixth-generation (6G) radio access networks (RANs) must enforce strict service-level agreements (SLAs) for heterogeneous slices, yet sudden latency spikes remain difficult to diagnose and resolve with conventional deep reinforcement learning (DRL) or explainable RL (XRL). We propose \emph{Attention-Enhanced Multi-Agent Proximal Policy Optimization (AE-MAPPO)}, which integrates six specialized attention mechanisms into multi-agent slice control and surfaces them as zero-cost, faithful explanations. The framework operates across O-RAN timescales with a three-phase strategy: predictive, reactive, and inter-slice optimization. A URLLC case study shows AE-MAPPO resolves a latency spike in $18$\,ms, restores latency to $0.98$\,ms with $99.9999\%$ reliability, and reduces troubleshooting time by $93\%$ while maintaining eMBB and mMTC continuity. These results confirm AE-MAPPO’s ability to combine SLA compliance with inherent interpretability, enabling trustworthy and real-time automation for 6G RAN slicing.
💡 Research Summary
The paper introduces AE‑MAPPO, an attention‑enhanced multi‑agent Proximal Policy Optimization framework designed for 6G Open‑RAN (O‑RAN) environments where rapid latency spikes can jeopardize service‑level agreements (SLAs) across heterogeneous network slices (URLLC, eMBB, mMTC). Traditional deep reinforcement learning (DRL) approaches excel at resource allocation but act as black boxes, while existing explainable RL (XRL) methods provide post‑hoc insights that are too slow for millisecond‑scale fault mitigation. AE‑MAPPO embeds interpretability directly into the policy network by integrating six specialized attention heads that operate during inference and are simultaneously used to drive allocation decisions and generate zero‑cost explanations.
The authors first formalize the joint resource allocation problem as a weighted single‑objective optimization that combines slice utilities (QoS satisfaction, efficiency, fairness) with an explicit explainability utility E. The explainability term is decomposed into three components: sparsity (entropy minimization of attention distributions), consistency (stability of attention across ε‑similar states), and faithfulness (correlation between attention vectors and the gradient of the total utility). These components are weighted (η₁=0.3, η₂=0.3, η₃=0.4) to prioritize faithful explanations while preserving focus and stability.
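The three components of the explainability utility can be made concrete with a small sketch. This is an illustrative reading of the decomposition, not the paper's exact formulation: sparsity is taken as one minus normalized attention entropy, consistency as cosine similarity of attention on an ε‑similar state, and faithfulness as the correlation between the attention vector and the (absolute) utility gradient; the function names are assumptions.

```python
import numpy as np

def sparsity(a):
    """Sparsity: 1 minus normalized entropy of the attention distribution.

    Returns ~1 for a peaked (focused) distribution, ~0 for a uniform one.
    """
    p = np.asarray(a, dtype=float)
    p = p / p.sum()
    h = -np.sum(p * np.log(p + 1e-12))   # Shannon entropy, guarded against log(0)
    return 1.0 - h / np.log(len(p))

def consistency(a, a_similar):
    """Consistency: cosine similarity of attention on an epsilon-similar state."""
    a, b = np.asarray(a, dtype=float), np.asarray(a_similar, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def faithfulness(a, utility_grad):
    """Faithfulness: correlation between attention and |gradient of total utility|."""
    g = np.abs(np.asarray(utility_grad, dtype=float))
    return float(np.corrcoef(np.asarray(a, dtype=float), g)[0, 1])

def explainability_utility(a, a_similar, utility_grad,
                           eta1=0.3, eta2=0.3, eta3=0.4):
    """E = eta1*sparsity + eta2*consistency + eta3*faithfulness (weights from the paper)."""
    return (eta1 * sparsity(a)
            + eta2 * consistency(a, a_similar)
            + eta3 * faithfulness(a, utility_grad))
```

With η₃ largest, a policy whose attention tracks the actual utility gradient scores highest, which is exactly the stated priority on faithful explanations.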
AE‑MAPPO maps the optimization into a multi‑agent RL setting, assigning one PPO agent per slice type. The global state aggregates per‑slice metrics (queue occupancy, SINR, predicted demand) and near‑RT RIC context. Actions consist of power, PRB, and edge‑compute allocations, constrained by safety masks to avoid infeasible assignments. A shared multi‑head attention module produces an attention vector Aπ(s) that both influences the action selection and is logged as the explanation. The per‑step reward is a linear combination of slice utilities and the explainability utility, ensuring that the learning process simultaneously optimizes performance and interpretability.
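The safety masking described above can be sketched as a masked softmax over the agent's action logits: infeasible allocations receive zero probability before sampling. This is a minimal illustration of the idea, assuming a discrete action head and at least one feasible action; the paper's agents also handle continuous power/PRB values, which this sketch omits.

```python
import numpy as np

def masked_action_probs(logits, feasible):
    """Apply a safety mask before softmax: infeasible actions get probability 0.

    logits:   raw action scores from the policy network.
    feasible: boolean mask, True where the allocation is admissible
              (e.g. within the slice's PRB/power budget).
    """
    logits = np.asarray(logits, dtype=float)
    feasible = np.asarray(feasible, dtype=bool)
    masked = np.where(feasible, logits, -np.inf)   # rule out infeasible actions
    z = masked - masked.max()                      # stabilize; max is over feasible actions
    e = np.exp(z)                                  # exp(-inf) -> 0 for masked entries
    return e / e.sum()
```

Because the mask is applied before normalization, the remaining probability mass is redistributed only across feasible assignments, so no reward shaping is needed to avoid them.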
The six attention heads are:
- Semantic attention – highlights SLA‑critical features such as buffer occupancy or channel quality.
- Temporal attention – captures short‑term and recurring traffic patterns for proactive provisioning.
- Cross‑slice attention – quantifies interference between slices, exposing which slice is affecting another.
- Confidence attention – measures entropy‑based uncertainty in state features, tempering aggressive allocations under high uncertainty.
- Counterfactual attention – compares the chosen action with plausible alternatives, enabling “what‑if” reasoning.
- Meta‑controller – dynamically weights the previous five heads to produce a context‑aware fused attention vector.
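The meta-controller's fusion step can be sketched as a softmax-weighted sum over the five head outputs. The scoring mechanism is assumed here to be a learned, context-dependent scalar per head (the paper's summary does not detail it); only the fusion arithmetic is shown.

```python
import numpy as np

def fuse_heads(head_vectors, meta_scores):
    """Meta-controller fusion: softmax over context-dependent scores yields
    per-head weights; the fused attention vector is their weighted sum.

    head_vectors: (5, d) outputs of the semantic, temporal, cross-slice,
                  confidence, and counterfactual heads.
    meta_scores:  (5,) scalar scores, assumed to come from a learned
                  meta-controller conditioned on the current context.
    """
    s = np.asarray(meta_scores, dtype=float)
    w = np.exp(s - s.max())          # numerically stable softmax
    w = w / w.sum()
    fused = w @ np.asarray(head_vectors, dtype=float)
    return fused, w                  # fused vector plus the logged per-head weights
```

Returning the weight vector alongside the fused attention is what makes the fusion itself inspectable: an operator can see which head dominated a given decision.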
These heads are orchestrated across three O‑RAN timescales: a 100 ms predictive phase (using semantic and temporal heads to pre‑allocate resources), a 10 ms reactive phase (leveraging confidence attention to correct sudden SLA violations within a TTI), and a 50 ms inter‑slice optimization phase (using cross‑slice attention to re‑allocate surplus resources from lower‑priority slices). This hierarchy balances long‑term anticipation with ultra‑fast reaction.
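The three-timescale hierarchy amounts to a periodic schedule; a minimal sketch, using the periods stated above (the phase names and dispatch style are illustrative):

```python
def due_phases(t_ms):
    """Return the AE-MAPPO control phases that fire at time t_ms.

    Periods follow the paper: reactive every 10 ms, inter-slice every 50 ms,
    predictive every 100 ms. Faster phases are listed first so that a TTI-level
    SLA correction is applied before slower re-optimization at the same tick.
    """
    phases = []
    if t_ms % 10 == 0:
        phases.append("reactive")      # confidence attention, within-TTI correction
    if t_ms % 50 == 0:
        phases.append("inter-slice")   # cross-slice attention, surplus re-allocation
    if t_ms % 100 == 0:
        phases.append("predictive")    # semantic + temporal attention, pre-allocation
    return phases
```

Note that every 100 ms tick triggers all three phases, so the predictive pre-allocation always runs on top of an up-to-date reactive and inter-slice state.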
Training optimizes a combined loss L_total = L_PPO + α_xrl(β₁L_sparse + β₂L_cons + β₃L_faith), where the PPO term is the standard clipped surrogate loss and the three auxiliary losses enforce sparsity, consistency, and faithfulness of the attention mechanisms. Hard constraints are enforced via action clipping, while soft penalties embed SLA violations directly into the reward.
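The combined objective can be written out directly. The clipped surrogate below is the standard PPO term; the coefficient values (α_xrl and the βs) are placeholders, since the summary does not report them.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate: -E[min(r*A, clip(r, 1-eps, 1+eps)*A)]."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(ratio * advantage, clipped))

def total_loss(ratio, advantage, l_sparse, l_cons, l_faith,
               alpha_xrl=0.1, beta1=1.0, beta2=1.0, beta3=1.0):
    """L_total = L_PPO + alpha_xrl * (beta1*L_sparse + beta2*L_cons + beta3*L_faith).

    alpha_xrl and the betas are illustrative defaults, not values from the paper.
    """
    return ppo_clip_loss(ratio, advantage) + alpha_xrl * (
        beta1 * l_sparse + beta2 * l_cons + beta3 * l_faith)
```

Because the auxiliary terms enter additively, α_xrl acts as a single dial trading policy performance against explanation quality during training.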
Evaluation is performed on a realistic 6G RAN slicing simulator with URLLC, eMBB, and mMTC slices. A sudden latency spike in the URLLC slice (latency jumping to 1.15 ms, breaching the 1 ms SLA bound) is introduced at t = 14:23:15. Manual troubleshooting typically requires inspecting more than 15 potential causes and takes an average of 11.5 minutes. AE‑MAPPO resolves the spike in 0.8 minutes (93 % faster) by boosting URLLC power (25 % to 42 %), reducing eMBB power (45 % to 30 %), and shifting PRBs accordingly. URLLC latency drops to 0.98 ms with 99.9999 % reliability, while eMBB maintains service continuity at slightly reduced video quality.

The attention outputs provide immediate, human‑readable diagnostics: semantic attention flags a buffer overflow (weight 0.89), cross‑slice attention reveals eMBB interference (weight 0.76), and temporal attention identifies a recurring daily pattern near 14:20. Counterfactual attention discards “do nothing” and “throttle mMTC” alternatives, confirming eMBB reduction as the optimal mitigation. All explanations are generated at inference time with zero additional computational overhead.
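Turning logged attention weights into a diagnostic trace requires no extra inference pass; a minimal sketch (the function name and dictionary format are assumptions, but the head names and weights in the example mirror the case study):

```python
def explanation_trace(head_findings, threshold=0.5):
    """Render per-head attention findings above a threshold as readable diagnostics.

    head_findings: {head_name: (finding, attention_weight)}. Heads whose weight
    falls below the threshold are omitted, keeping the trace sparse.
    """
    return [f"{head}: {finding} (weight {w:.2f})"
            for head, (finding, w) in head_findings.items() if w >= threshold]

# Example using the weights reported for the URLLC incident:
trace = explanation_trace({
    "semantic":    ("buffer overflow on URLLC queue", 0.89),
    "cross-slice": ("eMBB interference on URLLC", 0.76),
})
```

Since the weights are byproducts of the forward pass that already selected the action, the trace is faithful by construction rather than a post-hoc approximation.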
Performance metrics show that AE‑MAPPO’s decision latency (18 ms) comfortably fits within the 25 ms near‑RT RIC control window, and the combined utility (performance + explainability) remains higher than baseline DRL or XRL approaches. The study demonstrates that embedding attention‑based interpretability into the policy not only satisfies stringent 6G SLAs during anomaly events but also equips operators with actionable insights, fostering trust in autonomous network management.
The authors acknowledge remaining challenges: the design complexity of multiple attention heads, the need for online tuning of meta‑controller weights, and the validation of scalability in large‑scale live networks. Future work will explore automated attention architecture search, adaptive meta‑learning for weight adjustment, and field trials on O‑RAN testbeds. Overall, AE‑MAPPO represents a significant step toward trustworthy, real‑time automation of 6G RAN slicing, unifying high‑performance resource orchestration with built‑in, zero‑cost explainability.