Federated Multi-Agent Deep Learning and Neural Networks for Advanced Distributed Sensing in Wireless Networks
Authors: Nadine Muller, Stefano DeRosa, Su Zhang, Chun Lee Huan
Faculty of Information Technology and Electrical Engineering, University of Oulu
Abstract — Multi-agent deep learning (MADL), including multi-agent deep reinforcement learning (MADRL), distributed/federated training, and graph-structured neural networks, is becoming a unifying framework for decision-making and inference in wireless systems where sensing, communication, and computing are tightly coupled. Recent 5G-Advanced and 6G visions strengthen this coupling through integrated sensing and communication, edge intelligence, open programmable RAN, and non-terrestrial/UAV networking, which create decentralized, partially observed, time-varying, and resource-constrained control problems. This survey synthesizes the state of the art, with emphasis on 2021-2025 research, on MADL for distributed sensing and wireless communications. We present a task-driven taxonomy across (i) learning formulations (Markov games, Dec-POMDPs, CTDE), (ii) neural architectures (GNN-based radio resource management, attention-based policies, hierarchical learning, and over-the-air aggregation), (iii) advanced techniques (federated reinforcement learning, communication-efficient federated deep RL, and serverless edge learning orchestration), and (iv) application domains (MEC offloading with slicing, UAV-enabled heterogeneous networks with power-domain NOMA, intrusion detection in sensor networks, and ISAC-driven perceptive mobile networks). We also provide comparative tables of algorithms, training topologies, and system-level trade-offs in latency, spectral efficiency, energy, privacy, and robustness.
Finally, we identify open issues including scalability, non-stationarity, security against poisoning and backdoors, communication overhead, and real-time safety, and outline research directions toward 6G-native sense-communicate-compute-learn systems.

Index Terms — multi-agent reinforcement learning, serverless edge computing, distributed sensing, mobile edge computing, graph neural networks, 6G.

Introduction

Wireless networks are transitioning from largely communication-centric infrastructures to multi-functional, sensing-capable, and compute-rich platforms, motivated by applications such as autonomous mobility, industrial automation, extended reality, and low-altitude/UAV-based services. ISAC (and the broader "perceptive mobile networks" concept) exemplifies this evolution by reusing spectrum, hardware, and signaling to jointly enable communications and environment awareness [1]. These systems are inherently multi-node and multi-perspective: sensing is improved by networked collaboration across many sensing/communication nodes, while communications performance must be preserved under sensing-induced interference and tight latency budgets [2].

From an optimisation viewpoint, next-generation wireless control involves high-dimensional, stochastic, and coupled decision variables (such as association, power, beamforming, spectrum access, trajectory control, caching, and service placement), often under partial observability and decentralised information. In such settings, multi-agent deep learning provides (i) scalable function approximation for complex policies and value functions, (ii) principled frameworks for decentralised coordination (cooperative or competitive), and (iii) data-driven adaptation when accurate analytical models are unavailable or intractable.
A representative tutorial for AI-enabled wireless networks emphasises that multi-agent reinforcement learning is particularly relevant when more than one entity makes decisions, for example, UAVs, access points, slices, or edge servers [3].

A second, equally important inflection is distributed training and privacy-aware intelligence at the wireless edge. Federated learning (FL) enables collaborative model training without moving raw data, aligning well with privacy constraints and the data locality of IoT/edge sensing. Yet wireless FL must co-design learning and communications (for example, client selection, bandwidth/power allocation, over-the-air aggregation), and it inherits new threats such as model/data poisoning and backdoors, which are particularly acute in open wireless environments [4].

This survey makes the following contributions:
1) A systems-to-algorithms taxonomy of MADL across distributed sensing and wireless communications, integrating MADRL, FL/FRL, and GNN-based architectures [5].
2) A critical synthesis of advanced techniques: hierarchical/over-the-air FL, communication-efficient federated deep RL, and serverless edge computing as an integrated orchestration substrate [6].
3) Application-focused coverage spanning MEC offloading and slicing, UAV-enabled heterogeneous networks (including power-domain NOMA and federated multi-agent control), intrusion detection in sensing networks, and ISAC-driven perceptive distributed networking [7].
4) An open-issues agenda for 6G-native deployments, emphasising scalability, security, real-time constraints, and reproducibility [8].

Background and Fundamentals

Multi-agent learning in wireless systems is best understood by aligning wireless primitives (nodes, channels, traffic, mobility, slices, and sensing measurements) with learning primitives (agents, observations, actions, rewards/losses, and coordination mechanisms).
A widely used modelling approach is the Markov game (a multi-agent generalization of MDPs) or its partially observed and decentralized variants (such as Dec-POMDPs), commonly paired with centralized training and decentralized execution (CTDE). This is especially natural for wireless networks: joint training can exploit global simulators or network-level telemetry, while execution must be decentralized to meet signaling and latency constraints [9].

Distributed sensing technologies and ISAC context. Modern distributed sensing spans conventional wireless sensor networks (WSNs), cooperative localization, multi-static radar-like sensing, and ISAC-enabled perceptive systems. Key surveys emphasize that joint radar/communications (or sensing/communications) integration requires coordinated waveform design, receiver processing, and cross-layer resource management, while ISAC fundamental-limit analyses highlight intrinsic trade-offs between information transfer and sensing performance [10]. Perceptive mobile networks extend this by treating cellular infrastructure as a networked sensing platform where multiple nodes collaboratively observe targets from different perspectives; this increases sensing robustness but raises challenges such as mutual interference and fast target tracking [11].

Neural network architectures in wireless learning. While generic multilayer perceptrons can approximate policies, wireless problems exhibit strong structure: interference graphs, neighbour interactions, permutation symmetry across users/links, and locality. Graph neural networks (GNNs) explicitly encode such structure, enabling scalable and generalizable learning-to-optimize for radio resource management (RRM) [12].
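To make the permutation-symmetry point concrete, here is a minimal NumPy sketch of one message-passing step over an interference graph (the layer sizes, weights, and ReLU choice are illustrative assumptions, not taken from any cited paper), together with a check that relabelling the nodes permutes the outputs identically:

```python
import numpy as np

def mp_layer(h, adj, w_self, w_neigh):
    # One message-passing step: sum neighbour features (a permutation-
    # invariant aggregation), then apply weights shared across all nodes.
    msgs = adj @ h
    return np.maximum(0.0, h @ w_self + msgs @ w_neigh)  # ReLU

rng = np.random.default_rng(1)
n, d = 5, 3                       # 5 links, 3 features each (toy sizes)
h = rng.normal(size=(n, d))       # e.g. CSI/interference features per link
adj = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(adj, 0.0)        # interference graph, no self-loops
w_self = rng.normal(size=(d, d))
w_neigh = rng.normal(size=(d, d))

out = mp_layer(h, adj, w_self, w_neigh)

# Permutation equivariance: relabelling the links permutes the outputs
# in exactly the same way, which is why such layers transfer across
# network sizes and topologies.
perm = rng.permutation(n)
out_perm = mp_layer(h[perm], adj[np.ix_(perm, perm)], w_self, w_neigh)
assert np.allclose(out[perm], out_perm)
```

Because the weights are shared across nodes and the aggregation is a sum, nothing in the layer depends on node identity, which is the structural property the RRM literature above exploits.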
In particular, message-passing GNN approaches formalize permutation equivariance for RRM tasks and provide both performance and interpretability advantages, while subsequent work demonstrates broader "from theory to practice" pipelines for wireless communications.

Edge intelligence and 6G framing. Edge AI visions in 6G argue that intelligence must be "pushed down" to edge nodes to meet stringent latency and reliability requirements and to exploit local data. This motivates distributed training (e.g., FL) and distributed inference, and it ties directly to multi-agent control, since learning agents and inference nodes often coincide with network entities that must act in real time [13].

Distributed sensing + wireless comms + MADL (conceptual)

┌────────────────────────────────────────────────────────────┐
│ Sensing/UE agents (IoT, UAVs, APs, RSUs)                   │
│  - local observations: CSI, queues, AoI, sensing features  │
│  - local actions: power, channel, beam, trajectory,        │
│    offloading                                              │
└──────────────┬────────────────────────────┬────────────────┘
               │ wireless links + signalling│ sensing returns
               ▼                            ▼
┌────────────────────────────────────────────────────────────┐
│ Edge aggregation/control plane (MEC / O-RAN / cloudlets)   │
│  - CTDE training, hierarchical FL, OTA aggregation         │
│    (AirFL/AirComp)                                         │
│  - scheduling: who trains/communicates when, with what     │
│    resources                                               │
└──────────────┬────────────────────────────┬────────────────┘
               │ periodic global model/policy│ telemetry
               ▼                            ▼
┌────────────────────────────────────────────────────────────┐
│ Cloud / operator plane                                     │
│  - long-horizon optimisation, digital twin simulation,     │
│    analytics                                               │
│  - security monitoring, backdoor/poisoning defences        │
└────────────────────────────────────────────────────────────┘

The remainder of the paper analyses how specific algorithmic families realise this architecture, and where
they succeed or fail under wireless-specific constraints [14].

Advanced Techniques

This section surveys a set of technique clusters that repeatedly appear in high-impact recent literature: multi-agent DRL, federated (and over-the-air/hierarchical) learning, federated reinforcement learning, and serverless paradigms that decouple control logic from infrastructure.

Multi-agent deep reinforcement learning for wireless control. A consistent pattern in recent TWC and related work is to cast wireless optimisation as multi-agent sequential decision-making: each base station, access point, user, UAV, or slice controller acts as an agent. For example, TWC research demonstrates that multi-agent DRL can achieve strong performance in resource management problems by learning decentralised policies that minimise coordination overhead at run time while still capturing coupling (such as interference) [15]. Recent work expands the algorithmic toolbox beyond basic DQN/DDPG-style methods, incorporating (i) mean-field approximations for scalability in dense networks (useful when the number of interacting neighbours is large), and (ii) constrained or safety-aware formulations when QoS/SLA constraints are non-negotiable [16].

Graph-based deep learning as scalable inductive bias. GNN-based "learning to optimise" frameworks exploit the graph structure induced by channel/interference relationships and naturally generalise across network sizes and topologies. The JSAC architecture-and-theory paper formalises why permutation equivariance matters for dense networks and connects message passing to distributed optimisation; later TWC work provides a practical end-to-end treatment for wireless communication tasks [12].

Federated learning for wireless networks: from privacy to co-design. Wireless FL literature consistently shows that "training over the air" is not simply classical FL on a different transport.
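Returning to the CTDE pattern that recurs throughout this section, its training/execution split can be sketched in a few lines (a toy NumPy illustration; the linear policies, dimensions, and random weights are assumptions for exposition, not any cited algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, N_ACTIONS = 3, 4, 2    # e.g. 3 APs each picking a channel

# Decentralised actors: one small (linear, untrained) policy per agent,
# conditioned only on that agent's local observation.
actor_weights = [rng.normal(size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]

# Centralised critic: during training it may see the joint observation,
# e.g. from a network simulator or network-level telemetry.
critic_weights = rng.normal(size=(N_AGENTS * OBS_DIM,))

def act(agent_id, local_obs):
    """Decentralised execution: only local information is used."""
    return int(np.argmax(local_obs @ actor_weights[agent_id]))

def joint_value(joint_obs):
    """Centralised training signal: the critic scores the joint state."""
    return float(joint_obs.reshape(-1) @ critic_weights)

joint_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
actions = [act(i, joint_obs[i]) for i in range(N_AGENTS)]  # no signalling
value = joint_value(joint_obs)   # available only at training time
```

At deployment only `act` runs on each node, which is what keeps run-time signalling low; the critic (and the gradient machinery, omitted here) exists only inside the training loop.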
Joint design of learning and communications (for example, user selection, power/bandwidth allocation) is central for convergence and energy. A canonical TWC framework explicitly optimises learning objectives under wireless resource limitations [17]. Further, hierarchical FL architectures (client-edge-cloud) offer a structured way to reduce communication load and latency; convergence-sensitive quantisation and schedule design become key, as detailed in TWC convergence/system-design analyses [18]. Over-the-air computation (AirComp) and AirFL compress aggregation latency by exploiting waveform superposition, but introduce new error mechanisms (fading/noise, device heterogeneity) and require careful power control and aggregation design; recent surveys and TWC studies formalise these trade-offs [19].

Federated reinforcement learning and federated deep RL. Federated reinforcement learning (FRL) is increasingly used when (i) agents accumulate local trajectories or logs that cannot be centralised, and (ii) policies must adapt to heterogeneous environments. A recent IEEE Open Journal survey consolidates FRL fundamentals, challenges, and future directions for wireless networks [20]. On the systems side, "federated deep RL" has been applied to problems such as cooperative edge caching (multiple fog access points train shared policies with reduced convergence issues) and high-mobility/aerial relay networks (joint trajectory and beamforming variables) [21].

Serverless edge computing as an enabling substrate for distributed learning. Serverless edge computing is increasingly positioned as a way to manage the edge-cloud continuum by abstracting infrastructure and enabling event-driven compute placement. A widely cited IEEE Internet Computing survey evaluates maturity and open challenges for serverless at the edge [22].
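The AirComp/AirFL mechanism discussed above can be illustrated with a toy simulation under strong simplifying assumptions (perfect synchronisation, known real-valued channel magnitudes, truncated channel-inversion power control; all constants are illustrative): simultaneous analog transmissions superpose, so the received signal directly approximates the sum of local updates.

```python
import numpy as np

rng = np.random.default_rng(3)
K, D = 10, 8                        # devices and model dimension (toy)
updates = rng.normal(size=(K, D))   # local model deltas
h = rng.rayleigh(scale=1.0, size=K) + 1e-3   # channel magnitudes

# Truncated channel-inversion power control: very weak channels cannot
# be fully inverted within the power budget, one source of aggregation
# error alongside receiver noise.
p_max = 5.0
tx_gain = np.minimum(1.0 / h, p_max)

# Waveform superposition: the receiver observes the *sum* of all
# simultaneously transmitted (pre-scaled) updates, plus noise, in one
# channel use instead of K orthogonal uplink transmissions.
noise = rng.normal(scale=0.05, size=D)
rx = (tx_gain[:, None] * h[:, None] * updates).sum(axis=0) + noise
air_avg = rx / K                    # one-shot "over-the-air" average

ideal_avg = updates.mean(axis=0)    # what error-free FedAvg would compute
err = float(np.linalg.norm(air_avg - ideal_avg))
```

The residual `err` captures exactly the fading/noise and heterogeneity error mechanisms that the cited power-control and aggregation-design work tries to minimise.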
From an optimisation perspective, cross-edge orchestration work in IEEE Transactions on Services Computing addresses cold-start/caching dynamics through probabilistic caching and request distribution, which is directly relevant to latency-sensitive learning pipelines deployed as micro-functions [23]. Importantly for this survey's scope, "serverless" can be interpreted not only as compute deployment but also as network intelligence deployment: learning components (policy inference, aggregation, monitoring) can be packaged as ephemeral functions invoked on demand, aligning with variable traffic intensity and heterogeneous device participation, an idea reflected in emerging network and edge-learning discussions [24].

Comparative table of core learning paradigms and wireless suitability

| Paradigm | Typical training topology | Key strength in wireless | Key limitation in wireless | Representative recent sources |
|---|---|---|---|---|
| CTDE MADRL (Markov games) | Centralised training; decentralised execution | Captures coupling (interference, coordination) while keeping run-time signalling low | Non-stationarity; scaling to many agents; safety/QoS constraints | Feriani & Hossain tutorial; TWC resource mgmt; in-X subnetworks; power-control MARL [25] |
| Mean-field MADRL | CTDE + mean-field approximation | Scales to dense deployments by approximating neighbour influence | Approximation error; may miss rare but critical interactions | Unlicensed spectrum mean-field DRL (TWC) [26] |
| GNN-based learning-to-optimise | Supervised/self-supervised training; distributed inference via message passing | Exploits permutation equivariance; generalises across topologies | Requires graph construction/feature design; robustness to missing CSI | JSAC GNN-RRM; TWC GNN practice; 6G optimisation survey [12] |
| Wireless FL (client-server) | Periodic aggregation; resource-aware scheduling | Privacy by design; leverages local data | Communication bottleneck; system heterogeneity; security threats | Wireless FL frameworks; energy-efficient FL; IRS-assisted FL [27] |
| Hierarchical / OTA FL | Multi-tier aggregation or AirComp | Lower-latency aggregation; scalable edge intelligence | Aggregation error; interference/heterogeneity; complex control | Hierarchical FL with quantisation; scalable hierarchical OTA-FL; OTA aggregation over time-varying channels [28] |
| Federated RL / federated deep RL | Local RL; federated policy/model fusion | Distributed adaptation without sharing trajectories | Instability; partial observability; trust/poisoning risks | FRL survey (IEEE OJVT); federated deep RL in TWC applications [29] |
| Serverless edge orchestration | Event-driven function placement/caching | Elasticity; simplified deployment; fit for intermittent learning tasks | Cold starts; resource fragmentation; orchestration complexity | IEEE Internet Computing survey; IEEE TSC orchestration work [30] |

Applications

This section organises representative applications by where multi-agent learning enters the stack (edge sensing, RAN control, MEC, aerial networking) and what system objective is optimised (throughput, latency, energy, reliability, fairness, privacy, sensing accuracy).

Mobile edge computing offloading, slicing, and adaptive control. Dynamic computation offloading is a canonical application where local traffic, radio conditions, and edge resource availability evolve quickly. A recent IEEE Communications Letters study proposes dynamic offloading in MEC with traffic-aware network slicing using an adaptive TD3 strategy, explicitly coupling learning with slicing-awareness, illustrating how RL agents must internalise both compute and network virtualisation dynamics [31]. In parallel, the Open RAN (O-RAN) movement expands the feasible action space for learning-based RAN control by exposing programmable interfaces and near-real-time control loops.
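To illustrate how an offloading agent can internalise per-slice deadlines, here is a hypothetical reward-shaping function (the function name, weights, and penalty are illustrative assumptions, not the cited TD3 design): meeting the slice SLA is rewarded strictly more than violating it, so the learned policy trades latency and energy while respecting virtualisation constraints.

```python
def offloading_reward(latency_ms, energy_j, sla_ms,
                      w_energy=0.1, sla_penalty=10.0):
    """Hypothetical reward shaping for slicing-aware offloading:
    reward lower latency and energy, and penalise SLA violations hard
    so the policy internalises each slice's deadline."""
    r = -latency_ms - w_energy * energy_j
    if latency_ms > sla_ms:
        r -= sla_penalty           # slice deadline (SLA) violated
    return r

# Meeting the 10 ms slice deadline yields a strictly better reward:
ok = offloading_reward(latency_ms=8.0, energy_j=2.0, sla_ms=10.0)
bad = offloading_reward(latency_ms=12.0, energy_j=2.0, sla_ms=10.0)
assert ok > bad
```

Hard step penalties like this are the simplest option; constrained-RL formulations (discussed later for RAN slicing) replace them with explicit constraint handling.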
Constraint-aware multi-agent RL has been used to deliver flexible RAN slicing under varying numbers of slices, addressing over-provisioning/SLA-violation risks in earlier RL slicing methods [32]. Complementarily, IEEE Communications Magazine surveys consolidate DRL methods for slice scaling/placement and RL for RAN slice RRM, providing a bridge between algorithm selection and operational requirements [33].

UAV-enabled and non-terrestrial networks. UAV-based networking introduces fast, coupled control over mobility and communications. Multi-agent deep RL has been used for UAV trajectory optimisation in differentiated-services settings (TMC), while federated multi-agent DRL in TWC targets fair communications and trajectory control across multiple UAVs [34]. UAV-enabled heterogeneous cellular networks further add multi-objective optimisation and multiple-access design. The user-provided Transactions on Emerging Telecommunications Technologies article integrates serverless federated learning with power-domain NOMA, highlighting a trend toward combining (i) distributed learning for privacy and scalability, (ii) aerial/heterogeneous infrastructure, and (iii) non-orthogonal access for spectral efficiency [35].

Federated deep RL for high-dimensional aerial/relay optimisation. Recent TWC work on federated DRL-aided space-aerial-terrestrial relay networks jointly optimises UAV trajectory and beamforming variables alongside RIS and RSMA components, using federated learning to keep training local while still coordinating complex control [36].

Distributed sensing and ISAC-driven perceptive networking. ISAC surveys and PMN discussions motivate multi-agent learning in two roles: (i) resource control (e.g., scheduling, waveform/beam allocation) to balance sensing and communications, and (ii) distributed inference where agents fuse information across nodes for detection/tracking.
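Role (ii), distributed inference, can be sketched with a classical fusion rule (a generic detection-theory example, not a specific cited scheme): under conditional independence, the network-level log-likelihood ratio for "target present" is simply the sum of the nodes' local log-likelihood ratios.

```python
import numpy as np

def fuse_llrs(local_llrs, threshold=0.0):
    # Under conditional independence across sensing nodes, local
    # log-likelihood ratios (LLRs) add; compare the fused LLR to a
    # threshold to make the network-level detection decision.
    return float(np.sum(local_llrs)) > threshold

# Three nodes with weak individual evidence still yield a confident
# fused detection; consistently negative evidence yields a rejection.
detect = fuse_llrs([0.8, 0.6, 0.9])
reject = fuse_llrs([-0.5, -0.2, -0.4])
```

Learned multi-agent variants replace the fixed sum with trained fusion (e.g., GNN message passing) when the independence assumption or sensor models break down.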
Surveys of ISAC fundamental limits and waveform design, together with JSTSP system overviews, provide the technical foundation for these roles [37]. In particular, collaborative sensing in perceptive mobile networks stresses that network-level sensing benefits require coordination protocols and interference management, which naturally align with cooperative multi-agent formulations [38].

Security and intrusion detection in sensor/wireless networks. Intrusion detection remains a core WSN application; the user-provided TC-IDS work represents an early trust-based IDS design for WSNs. While modern deployments increasingly consider distributed learning (such as FL) for privacy and cross-domain training, they also face poisoning/backdoor threats, especially relevant when IDS models are collaboratively trained over wireless edge networks. Recent IEEE ComST surveys emphasise these vulnerabilities and defences in wireless FL, providing a concrete research bridge between "distributed sensing security" and "distributed learning security" [39].
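One widely used class of defences against poisoned updates is robust aggregation. As a minimal sketch (coordinate-wise median, a generic Byzantine-robust alternative to FedAvg's mean; the client counts and values below are illustrative), a single poisoned IDS client drags the mean far from the honest consensus but barely moves the median:

```python
import numpy as np

def robust_aggregate(updates):
    # Coordinate-wise median: a simple Byzantine-robust alternative to
    # FedAvg's mean that bounds the influence of any single client.
    return np.median(updates, axis=0)

rng = np.random.default_rng(4)
honest = rng.normal(loc=1.0, scale=0.1, size=(9, 4))  # 9 honest IDS clients
poisoned = np.full((1, 4), 100.0)                     # 1 poisoned update
updates = np.vstack([honest, poisoned])

mean_agg = updates.mean(axis=0)         # dragged far from 1.0 by the attacker
median_agg = robust_aggregate(updates)  # stays near the honest value

assert np.all(np.abs(median_agg - 1.0) < 0.5)
assert np.all(mean_agg > 5.0)
```

Median-style rules are only a baseline; the surveyed defences also cover norm clipping, anomaly scoring of updates, and trust-weighted aggregation.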
Application-to-method mapping (system-level perspective)

| Application domain | Agents (examples) | Typical objective(s) | Common MADL method choices | Representative sources |
|---|---|---|---|---|
| MEC offloading + slicing | UEs, edge servers, slice orchestrator | Latency, energy, SLA satisfaction | Actor-critic DRL; multi-agent RL for coordinated control; constrained RL | TD3-based slicing-aware offloading; O-RAN slicing via constrained multi-agent RL [40] |
| RAN slicing + O-RAN control loops | xApps, RIC controllers, slice agents | Long-term utility, isolation, admission control | Cooperative multi-agent RL; RL slicing surveys | JSAC constrained multi-agent RL; ComMag surveys; NOMS multi-agent slicing/control [41] |
| UAV networking | UAVs, BSs, UEs | Coverage, fairness, throughput, energy | MADRL (CTDE); federated multi-agent DRL | TMC UAV trajectory MADRL; TWC federated multi-UAV control [34] |
| ISAC / PMNs | BSs/RSUs/sensors | Sensing-comms trade-offs, tracking accuracy, interference mgmt | Cooperative MARL; GNN-based inference/resource policies | ISAC and PMN surveys; joint radar-comms overview [42] |
| Cooperative caching / F-RAN | Fog access points | Delay, hit rate, privacy | Communication-efficient federated deep RL | TWC federated deep RL edge caching [43] |
| Wireless FL / AirFL | Edge devices | Convergence time, energy, spectral efficiency, privacy | Joint learning-comms optimisation; hierarchical/OTA aggregation | TWC joint learning-comms FL; hierarchical FL with quantisation; scalable hierarchical OTA-FL [44] |
| WSN intrusion detection | Sensor nodes, cluster heads | Detection accuracy, false-alarm control, energy | Local DL + FL; robust aggregation | TC-IDS; wireless FL security/backdoor survey [45] |

Challenges and Open Issues

Despite rapid progress, deploying multi-agent deep learning in distributed sensing and wireless communications remains constrained by several systemic and theoretical challenges.

Scalability and non-stationarity.
In many wireless settings, the number of controllable entities can be large (dense small cells, massive IoT, UAV swarms). Multi-agent RL faces non-stationarity because each agent's policy update changes the environment seen by others. Mean-field approximations can mitigate scaling issues but introduce modelling error and may underrepresent rare but important interactions (for example, edge-case interference) [46].

Communication overhead and learning-control coupling. Distributed learning is fundamentally limited by the wireless medium it relies on. FL and FRL must budget spectrum and power for model updates while still serving user traffic; over-the-air computation reduces latency but introduces aggregation error and interference sensitivity. Surveys and TWC studies show that convergence and performance are inseparable from communication design (power control, scheduling, quantisation, hierarchical aggregation intervals) [47].

Real-time constraints and safety. Many target use cases, such as URLLC control, UAV mobility, and ISAC tracking, impose tight deadlines and safety constraints. Constraint-aware RL for RAN slicing illustrates progress toward SLA-aware policies, yet formal safety guarantees and verification remain limited in most deep RL pipelines [48].

Security and privacy under adversarial conditions. Wireless FL is exposed to both statistical heterogeneity and adversarial manipulation. Backdoor and poisoning attacks can be mounted through corrupted clients or manipulated updates; a recent IEEE ComST comprehensive survey organises attack surfaces and defence mechanisms for wireless FL [49]. Additionally, distributed RL and multi-agent systems can be attacked through observation spoofing, reward manipulation, and Byzantine behaviours. While FRL surveys identify challenges and future directions, robust-by-design multi-agent learning for wireless remains an open research frontier [50].

System heterogeneity and reproducibility.
Wireless learning experiments often rely on custom simulators, proprietary traces, or narrow parameter regimes. Although high-level surveys and tutorials offer unifying perspectives, reproducible benchmarking across (i) radio/traffic models, (ii) sensing modalities, and (iii) compute/energy constraints is still limited, particularly for joint ISAC + learning studies [51].

Operational deployment complexity (O-RAN and serverless). Programmable O-RAN control loops enable learning-based control but also create operational risk: signalling storms, control-loop instability, and misconfiguration can cascade across network slices and agents. Meanwhile, serverless edge computing can simplify deployment but introduces cold-start and orchestration challenges that may be incompatible with hard real-time objectives unless carefully engineered [52].

Future Directions

The next phase of research should be driven by 6G-native integration: not only integrating communications and sensing (ISAC), but also tightly integrating communication, sensing, computation, and learning into a single co-designed stack.

Sense-communicate-compute co-design with learning in the loop. ISAC surveys indicate that future systems will require joint optimisation over waveforms, beams, and sensing/communication metrics; multi-agent learning can serve as an adaptive control layer when modelling complexity grows. A key future direction is to build hybrid model-based + learning-based controllers that retain interpretability and constraint handling while gaining adaptation speed [53].

Constraint-aware and verifiable multi-agent policies. Constraint-aware multi-agent RL for RAN slicing is an early example of aligning learning objectives with SLA constraints.
Extending this to safety-critical UAV/ISAC and industrial settings likely requires: (i) constrained RL with explicit feasibility layers, (ii) robust training under distribution shift, and (iii) verification/calibration techniques appropriate for neural policies [54].

Graph-native and permutation-equivariant multi-agent intelligence. GNN-based architectures have shown strong promise for scalable RRM due to structural inductive bias and generalisation. A forward-looking theme is graph-native multi-agent learning, where communication among agents and policy representations are unified by graph message passing, potentially enabling consistent scaling across dense terrestrial networks and dynamic aerial meshes [12].

Federated reinforcement learning and communication-efficient federated deep RL at scale. FRL surveys and TWC applications suggest that federated policy learning will become more central as privacy regulations tighten and as data becomes more distributed (for example, across UAV swarms, vehicles, and industrial sensors). Research priorities include robust aggregation under Byzantine clients, exploration under heterogeneity, and joint optimisation of policy performance versus communication cost [55].

Serverless learning control planes for elastic edge intelligence. Serverless edge computing surveys argue that abstraction and elasticity are key for managing the edge-cloud continuum. For wireless learning systems, the compelling direction is to treat learning as a set of event-driven, composable services (policy inference, aggregation, monitoring, anomaly detection), deployed near where data is generated and adapted to fluctuating workloads. However, research must address cold starts, orchestration under wireless bandwidth constraints, and integration with O-RAN control loops [56].

Toward multi-agent, privacy-aware, real-time ISAC intelligence.
Combining the ISAC stack with FL/FRL opens a pathway to privacy-aware distributed sensing analytics, e.g., collaborative target recognition without centralising raw sensing streams. Achieving this requires new cross-layer protocols that jointly schedule sensing, communications, and learning updates under strict latency constraints and adversarial risk [57].

References

[1] L. U. Khan, W. Saad, Z. Han, E. Hossain, and C. S. Hong, "Federated learning for Internet of Things: Recent advances, taxonomy, and open challenges," IEEE Communications Surveys & Tutorials, vol. 23, no. 3, pp. 1759-1799, 2021.
[2] F. Liu, Y. Cui, C. Masouros, J. Xu, T. X. Han, Y. C. Eldar, and S. Buzzi, "Integrated sensing and communications: Toward dual-functional wireless networks for 6G and beyond," IEEE Journal on Selected Areas in Communications, vol. 40, no. 6, pp. 1728-1767, 2022.
[3] N. Jia et al., "A comprehensive survey on communication-efficient federated learning in mobile edge environments," IEEE Communications Surveys & Tutorials, vol. 27, pp. 3710-3741, 2025.
[4] Q. Song, J. Yang, and A. Mohajer, "Multi-objective resource optimization in UAV-enabled heterogeneous cellular networks using serverless federated learning and power-domain NOMA," Transactions on Emerging Telecommunications Technologies, vol. 36, no. 8, e70210, 2025.
[5] A. Feriani and E. Hossain, "Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: A tutorial," IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 1226-1252, 2021.
[6] N. Naderializadeh, M. Eisen, and A. Ribeiro, "Resource management in wireless networks via multi-agent deep reinforcement learning," IEEE Transactions on Wireless Communications, vol. 20, no. 6, pp. 3507-3523, 2021.
[7] X. Du, S. Wengerter, S. Khosravirad, H. Viswanathan, F. Qian, and A. L. Swindlehurst, "Multi-agent reinforcement learning for dynamic resource management in 6G in-X subnetworks," IEEE Transactions on Wireless Communications, vol. 22, no. 7, pp. 4250-4265, 2023.
[8] W. Pei, N. T. Nguyen, D. Hoang, H. D. Tuan, M. Debbah, and H. V. Poor, "Intelligent access to unlicensed spectrum: A mean field based deep reinforcement learning approach," IEEE Transactions on Wireless Communications, vol. 22, no. 4, pp. 2303-2316, 2023.
[9] Z. Xu, S. Sun, X. Chen, S. Cui, and H. V. Poor, "Distributed-training-and-execution multi-agent reinforcement learning for power control in heterogeneous networks," IEEE Transactions on Communications, vol. 71, no. 8, pp. 4679-4694, 2023.
[10] J. Moon, S. Kim, H. Ju, and B. Shim, "Energy-efficient user association in mmWave/THz ultra-dense network via multi-agent deep reinforcement learning," IEEE Transactions on Green Communications and Networking, pp. 692-706, 2023.
[11] Y. Xiao, C. Xie, Y. Chen, and X. Liu, "Multi-agent deep reinforcement learning based resource allocation for ultra-reliable low-latency Internet of controllable things," IEEE Transactions on Wireless Communications, vol. 22, no. 6, pp. 3766-3779, 2023.
[12] A. M. Somarin, Y. Alaei, M. R. Tahernezhad, A. Mohajer, and M. Barari, "An efficient routing protocol for discovering the optimum path in mobile ad hoc networks," Indian Journal of Science and Technology, vol. 8, no. S8, pp. 450-455, 2015.
[13] Y. Shen, J. Zhang, S. H. Song, and K. B. Letaief, "Graph neural networks for wireless communications: From theory to practice," IEEE Transactions on Wireless Communications, vol. 22, no. 5, pp. 3554-3569, 2023.
[14] M. Lee, G. Yu, H. Dai, and G. Y. Li, "Graph neural networks meet wireless communications: Motivation, applications, and future directions," IEEE Wireless Communications, vol. 29, no. 5, pp. 12-19, 2022.
[15] Y. Shi et al., "Machine learning for large-scale optimization in 6G wireless networks," IEEE Communications Surveys & Tutorials, vol. 25, no. 4, pp. 2088-2132, 2023.
[16] K. B. Letaief, Y. Shi, J. Lu, and J. Lu, "Edge artificial intelligence for 6G: Vision, enabling technologies, and applications," IEEE Journal on Selected Areas in Communications, vol. 40, no. 1, pp. 5-36, 2022.
[17] X. Qin, Y. Ren, S. Wang, L. Yu, and J. A. McCann, "Federated learning and wireless communications," IEEE Wireless Communications, vol. 28, no. 5, pp. 134-140, 2021.
[18] S. Niknam, H. S. Dhillon, and J. H. Reed, "Federated learning for wireless communications: Motivation, opportunities, and challenges," IEEE Communications Magazine, vol. 58, no. 6, pp. 46-51, 2020.
[19] A. Mohajer, J. Hajipour, and V. C. M. Leung, "Dynamic offloading in mobile edge computing with traffic-aware network slicing and adaptive TD3 strategy," IEEE Communications Letters, 2024.
[20] O. A. Wahab, A. Mourad, H. Otrok, and T. Taleb, "Federated machine learning: Survey, multi-level classification, desirable criteria and future directions in communication and networking systems," IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 1342-1397, 2021.
[21] A. Liu et al., "A survey on fundamental limits of integrated sensing and communication," IEEE Communications Surveys & Tutorials, vol. 24, no. 2, pp. 994-1034, 2022.
[22] J. A. Zhang, F. Liu, C. Masouros, R. W. Heath Jr., Z. Feng, L. Zheng, and A. Petropulu, "An overview of signal processing techniques for joint communication and radar sensing," IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 6, pp. 1295-1315, 2021.
[23] A. Mohajer, M. H. Hajimobini, A. Mirzaei, and E. Noori, "Trusted-CDS based intrusion detection system in wireless sensor network (TC-IDS)," Open Access Library Journal, vol. 1, no. 7, pp. 1-10, 2014.
[24] L. Xie, S. H. Song, Y. C. Eldar, and K. B. Letaief, "Collaborative sensing in perceptive mobile networks: Opportunities and challenges," IEEE Wireless Communications, vol. 30, no. 1, pp. 16-23, 2023.
[25] W. Zhou, R. Zhang, G. Chen, and W. Wu, "Integrated sensing and communication waveform design: A survey," IEEE Open Journal of the Communications Society, vol. 3, pp. 1930-1949, 2022.
[26] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, "A joint learning and communications framework for federated learning over wireless networks," IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 269-283, 2021.
[27] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, "Energy efficient federated learning over wireless communication networks," IEEE Transactions on Wireless Communications, vol. 20, no. 3, pp. 1935-1949, 2021.
[28] M. Chen, H. V. Poor, W. Saad, and S. Cui, "Convergence time optimization for federated learning over wireless networks," IEEE Transactions on Wireless Communications, vol. 20, no. 4, pp. 2457-2471, 2021.
[29] Z. Wang, J. Qiu, Y. Zhou, Y. Shi, L. Fu, W. Chen, and K. B. Letaief, "Federated learning via intelligent reflecting surface," IEEE Transactions on Wireless Communications, vol. 21, no. 2, pp. 808-822, 2022.
[30] L. Liu, J. Zhang, S. H. Song, and K. B. Letaief, "Hierarchical federated learning with quantization: Convergence analysis and system design," IEEE Transactions on Wireless Communications, vol. 22, no. 1, pp. 2-18, 2023.
[31] B. Tegin and T. M. Duman, "Federated learning with over-the-air aggregation over time-varying channels," IEEE Transactions on Wireless Communications, vol. 22, no. 8, pp. 5671-5684, 2023.
[32] S. M. Azimi-Abarghouyi and V. Fodor, "Scalable hierarchical over-the-air federated learning," IEEE Transactions on Wireless Communications, doi: 10.1109/TWC.2024.3350923, 2024.
[33] A. Şahin and R. Yang, "A survey on over-the-air computation," IEEE Communications Surveys & Tutorials, vol.
25, no. 3, pp. 1877 – 1908, 2023. [34] S. M. Azimi- Abargho uyi, C. Fischione, and K. Huang, “Over -the-air federated lea rning: Rethinking edge AI through signal processing, ” IEEE Signal Pr ocessing Magaz ine, preprint ( Dec. 2025). [35] Y. Wan, Y. Qu, W. Ni, Y. Xiang, L. Gao, and E. Hossain, “Data and model poisoning backdoor attacks on wireless federated learning, and the defense mechanis ms: A compre hensive survey,” IEEE Communication s Surveys & Tutor ials, 2024, DOI : 10.1109/COMST .2024.3361451. [36] Zhou, Nan, Ya Nan Li, and Amin Mo hajer. "Distributed c apacity optimisation and resou rce a llocation in heterogeneous m obile networks using advanced s erverless connectivi ty strategie s." International Journal of Sensor Network s 45, no. 3 (2024 ): 127-147. [37] L. Yuan et al., “Decentralized f ederated l earning: A surv ey and pe rspective,” IEEE Internet of Thing s Journal, vol. 11, no. 2 1, pp. 34617 – 346xx, 20 24. [38] S. K. Das, R. Mudi, M. S. Rahman, K. M. Rabie, and X. Li, “Federated reinforcement learning for wireless netwo rks: Fund amentals, c hallenges a nd fu ture research trends,” I EEE O pen Journa l of Veh icula r Technology, 2024, DO I: 10.1109/OJVT.2 024.3466858. [39] M. Zhang, Y. Jiang, and F.- C. Zheng, “Communication - efficient federated deep reinforcement learning based cooperative edge caching in fog radio access networks,” IEEE Transactions on Wireless Communication s, 2024, DOI : 10.1109/TWC.2024.34 67285. [40] K. Guo, M. Wu, X. Li, Z. Lin, and T. A. Tsiftsis, “Joint trajectory and beamforming optimization for federated DRL-aided space-aerial- terrestrial relay networks with RIS and RSMA,” IEEE Transactions on Wireless Com munications, vol. 23, no. 12, pp. 184 56 – 18471, 2024. [41] Raith, Philipp, Stefan Nastic, and Sc hahram Dustdar. "S erverless edge comp uting — where we are and what lies ahead. " IEEE Internet C omputing 27, no. 3 (2023): 50-64. 13 [42] C. Ch en, M. Herrer a, G. Zheng, L. Xia, Z. Ling, and J. 
Wang, “Cross - ed g e orchestration of server less functions w i th pro babilistic caching,” I EEE Tr ansactio ns on Services Computing, vol. 17, no. 5, pp. 2139– 2150, 2024. [43] M. Zangooei, M. Golkarifard, M. Rouili, N. Saha, and R. Boutaba, “Flexible RAN slicing in Open RAN with constrained multi- agent reinfo rcement learning,” IEEE Journal on Selected A reas i n Communication s, vol. 42, no. 2, pp. 28 0 – 294, 2024. [44] Zangooei, Mohamma d, Nil oy Saha, Morteza Golkarifard, and Raouf Boutaba. "Reinforcemen t learning for radio resource management in RAN slicing: A survey." IEEE Communications Magazine 61, no. 2 (2023): 118-124. [45] N. Saha, M. Zangooei, M. Golkarifard, and R. Boutaba, “Deep reinforcement learning approaches to network slice scaling and placement: A survey,” IEEE Commun ications Magazine, vol. 61, no. 2, pp. 82 – 87, 2023. [46] Sulaiman, Muham mad, Arash Moayyed i, Mahdieh Ah madi, Mohammad A . Salahuddin, Raouf Boutaba, and Aladdin Saleh. "Coordin ated slicing and admission control using multi -agent deep reinforcement learn ing ." IEEE Transactions on Networ k and Service Manage ment 20, no. 2 (2022): 1110- 1124.