Towards Semantic-based Agent Communication Networks: Vision, Technologies, and Challenges
The International Telecommunication Union (ITU) identifies "Artificial Intelligence (AI) and Communication" as one of six key usage scenarios for 6G. Agentic AI, characterized by its capabilities in multi-modal environmental sensing, complex task co…
Authors: Ping Zhang, Rui Meng, Xiaodong Xu
SCIENCE CHINA Information Sciences. RESEARCH PAPER.
Towards Semantic-based Agent Communication Networks: Vision, Technologies, and Challenges
Ping ZHANG (1), Rui MENG* (1), Xiaodong XU* (1), Yaheng WANG (1), Zixuan HUANG (1), Yiming LIU (1), Ruichen ZHANG (2), Yinqiu LIU (2), Haonan TONG (3), Huishi SONG (4), Gang WU (5), Zhaoming LU (6,7,8), Jiawen KANG (9), Geng SUN (10), Qinghe DU (11), Zhaohui YANG (12), Jingxuan ZHANG (13), Han MENG (14), Lexi XU (15), Haitao ZHAO (16), Zesong FEI (17), Yiqing ZHOU (18,19,20), Pei XIAO (21), Meixia TAO (22), Qinyu ZHANG (23), Shuguang CUI (24) & Rahim TAFAZOLLI (21)
(1) State Key Laboratory of Networking and Switching Technology, BUPT, Beijing 100876, China; (2) College of Computing and Data Science, Nanyang Technological University, Singapore 639798, Singapore; (3) Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; (4) ZGC Institute of Ubiquitous-X Innovation and Applications, Beijing 100083, China; (5) National Key Laboratory of Wireless Communications, UESTC, Chengdu 611731, China; (6) Beijing Key Laboratory of Network System Architecture and Convergence, BUPT, Beijing 100876, China; (7) Beijing Laboratory of Advanced Information Networks, BUPT, Beijing 100876, China; (8) Xiong'an Aerospace Information Research Institute, Xiong'an 070001, China; (9) School of Automation, Guangdong University of Technology, Guangzhou 510006, China; (10) College of Computer Science and Technology, Jilin University, Changchun 130012, China; (11) School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China; (12) College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China; (13) National School of Elite Engineering, University of Science and Technology Beijing, Beijing 100083, China; (14) Institute of Network and IT Technology, China Mobile Research Institute, Beijing 100053, China; (15) Research Institute, China United Network Communications Corporation, Beijing 100048, China; (16) College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China; (17) School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China; (18) State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; (19) Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China; (20) University of Chinese Academy of Sciences, Beijing 100049, China; (21) 5GIC and 6GIC, Institute for Communication Systems, University of Surrey, Guildford GU2 7XH, United Kingdom; (22) School of Information Science and Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; (23) Guangdong Provincial Key Laboratory of Aerospace Communication and Networking Technology, HIT, Shenzhen 518055, China; (24) Future Network of Intelligent Institute, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518066, China
Abstract The International Telecommunication Union (ITU) identifies “Artificial Intelligence (AI) and Communication” as one of six key usage scenarios for 6G. Agentic AI, characterized by its capabilities in multi-modal environmental sensing, complex task coordination, and continuous self-optimization, is anticipated to drive the evolution toward agent-based communication networks. Semantic communication (SemCom), in turn, has emerged as a transformative paradigm that offers task-oriented efficiency, enhanced reliability in complex environments, and dynamic adaptation in resource allocation. However, comprehensive reviews that trace their technological evolution in the context of agent communications remain scarce. Addressing this gap, this paper systematically explores the role of semantics in agent communication networks.
We first propose a novel architecture for semantic-based agent communication networks, structured into three layers, four entities, and four stages. Three wireless agent network layers define the logical structure and organization of entity interactions: the intention extraction and understanding layer, the semantic encoding and processing layer, and the distributed autonomy and collaboration layer. Across these layers, four AI agent entities, namely embodied agents, communication agents, network agents, and application agents, coexist and perform distinct tasks. Furthermore, four operational stages of semantic-enhanced agentic AI systems, namely perception, memory, reasoning, and action, form a cognitive cycle guiding agent behavior. Based on the proposed architecture, we provide a comprehensive review of the state of the art on how semantics enhance agent communication networks. Finally, we identify key challenges and present potential solutions to offer directional guidance for future research in this emerging field.
Keywords Semantic communication, agentic AI, AI agent, intellicise (intelligent and concise) wireless networks, communication and AI (ComAI)
Citation ZHANG P, MENG R, XU X, et al. Towards Semantic-based Agent Communication Networks: Vision, Technologies, and Challenges. Sci China Inf Sci, for review
* Corresponding author (email: buptmengrui@bupt.edu.cn, xuxiaodong@bupt.edu.cn)
† Ping ZHANG and Rui MENG contributed equally to this work and should be considered co-first authors.
ZHANG P, et al. Sci China Inf Sci 2
Figure 1 The evolution from 1G to 6G.
1 Introduction
1.1 Motivation
The current fifth-generation (5G) mobile communication system exhibits limitations in coverage efficiency within complex environments, cost and energy efficiency in dense networking scenarios, and customization capabilities for vertical industries [1].
These challenges have catalyzed the integration of Artificial Intelligence (AI) to address the interconnection demands of humans, machines, devices, and genies in future sixth-generation (6G) networks [2]. In June 2023, the International Telecommunication Union Radiocommunication Sector (ITU-R) recommended that “AI and Communication” be designated as one of six key usage scenarios for 6G. This underscores the principle that ubiquitous intelligence will serve as a foundational design element across all 6G applications [3]. On 13 February 2026, during the 131st RAN3 meeting of the 3rd Generation Partnership Project (3GPP) in Gothenburg, Sweden, 3GPP officially adopted “aNB” as the nomenclature for the 6G radio access network (RAN) node [4]. Here, the prefix “a” signifies both “Advanced” and “AI”, signaling that the 6G base station (BS) will achieve breakthroughs in performance metrics and capabilities while fully integrating AI across critical domains, including resource scheduling, channel modeling, network optimization, and maintenance.
Agentic AI has recently garnered significant attention owing to its autonomous decision-making capabilities, driven by continuous perception-memory-reasoning-action loops [5, 6]. Unlike conventional AI systems that rely on static inference pipelines [7, 8], agentic AI integrates advanced generative frameworks, including large language models (LLMs), large vision models (LVMs), large multi-modal models (LMMs), and world models, to achieve multi-modal environmental sensing, complex task coordination, and continuous self-optimization [9, 10]. These strengths make it well suited for integration into the RAN, core network, and edge nodes of 6G networks, thereby advancing the evolution toward agent communication networks. The evolution from 1G to 6G is illustrated in Figure 1.
However, traditional bit-level transmission paradigms face critical challenges in supporting such networks, including substantial data transmission volumes, low interaction reliability, and excessive network resource consumption [11]. In response, Semantic Communication (SemCom) has emerged as a transformative solution [12], offering the following advantages:
• Task-oriented and Efficient Communication: SemCom aligns directly with task objectives, minimizing the spectrum and computational waste caused by redundant bit transmission [13]. By extracting agent intentions, it supports real-time collaboration among massive numbers of agents, such as drone swarms and autonomous vehicle fleets [14].
• Highly Reliable Communication in Complex Environments: SemCom reduces noise sensitivity and eliminates redundant information by transmitting core semantic features [15]. Additionally, through shared semantic knowledge bases (KBs), agents can reconstruct critical information via contextual reasoning, enhancing resilience to malicious interference and noise [16].
• Flexible Resource Scheduling and Dynamic Adaptation: SemCom dynamically adjusts transmission strategies based on semantic importance, enabling adaptive encoding and decoding in fluctuating environments and ensuring efficient network resource utilization [17].
Overall, SemCom alleviates data transmission burdens, enhances interaction reliability, and improves resource allocation efficiency for agent communication networks. Furthermore, SemCom-enabled intellicise (intelligent and concise) wireless networks [18, 19] integrate foundational theories, including information theory, AI theory, and system theory, to strengthen the intention-driven, semantic-bearing, and distributed autonomy capabilities of agent communication networks.
Moreover, the convergence of communication and AI (ComAI) [20], with SemCom and intellicise wireless networks as its core technologies, leverages grammatical, semantic, and pragmatic information through the integration of the user plane, control plane, computing plane, data plane, and intelligent plane, thereby advancing the evolution of agent communication networks. To this end, the primary objective of this paper is to provide a comprehensive analysis of the rationale and mechanisms underlying SemCom integration within agent communication networks. We propose an architecture that consists of three wireless agent network layers, four AI agent entities, and four stages of agentic AI for semantic-based agent communication networks. Furthermore, we conduct a systematic review of the associated driving factors, key technologies, and their interdependencies, aiming to catalyze future research efforts in this rapidly evolving domain.
1.2 Preliminaries and State-of-the-Art Works
1.2.1 Semantic Communications
SemCom is an emerging paradigm that prioritizes the effective extraction, delivery, and interpretation of the underlying meaning of information, in contrast to conventional bit-level systems that focus on the accurate transmission of bit streams. By transmitting only task-relevant semantic features, SemCom achieves substantial gains in both efficiency and robustness. Moreover, it serves as a foundational enabler for intellicise wireless networks and ComAI. We summarize the relevant state-of-the-art survey, tutorial, and magazine literature in Table 1. In the context of general fundamentals and architectures, several comprehensive surveys and tutorials have systematically explored the theoretical foundations, early implementations, potential applications, and inherent challenges of semantics-empowered networks [11, 21–23]. More specifically, Chaccour et al.
[24] propose the first rigorous end-to-end vision for semantic networks, while Shi et al. [25] introduce an architecture based on federated edge intelligence to enable resource-efficient semantic-aware networking. Regarding security and resource management, the works in [26–28] provide comprehensive overviews and guidelines for designing secure SemCom systems, with an in-depth analysis of the interplay among network architectures, security paradigms, and privacy concerns. Additionally, Zhang et al. [29] present a systematic categorization and survey of advanced resource allocation strategies specifically tailored for SemCom networks. In terms of advanced AI integration and the evolution toward native network intelligence, Liang et al. [30] investigate the transformative role of generative AI architectures within SemCom systems. Building upon these foundations, [17, 18] offer in-depth studies on architectural innovations in intellicise wireless networks and semantic RANs. Finally, Meng et al. [19] examine the emerging security and privacy implications at the intersection of intellicise networks and agentic AI.
1.2.2 Agentic AI Systems
Agentic AI refers to highly autonomous AI systems capable of advanced cognitive reasoning, environmental perception, and independent decision-making. Unlike traditional passive AI systems or foundational large AI models (LAMs), agentic AI distinguishes itself through continuous interaction with dynamic environments and the autonomous execution of complex, multi-step goals via structured perception-action loops. We summarize recent survey and tutorial literature on agentic AI in Table 1. To clarify core concepts in this rapidly evolving field, Sapkota et al. [5] establish a rigorous taxonomy that distinguishes agentic AI from conventional AI agents. Delving into specific technical dimensions, Wang et al.
[33] review the underlying techniques, developmental challenges, and emerging opportunities in agentic AI programming. From a security standpoint, Datta et al. [32] critically examine the threat landscapes, defense mechanisms, and unresolved security challenges inherent in agentic AI systems. In the telecommunications domain, the integration of agentic AI is gaining increasing relevance. Jiang et al. [10] provide a systematic introduction to the design of LAMs and agentic AI technologies within intelligent communication systems. Finally, Zhang et al. [9] present a comprehensive survey on agentic AI and agentification frameworks specifically designed to enable edge general-purpose intelligence.
Table 1 A classification of selected survey/tutorial/magazine papers
Domain | Ref. | Year | Type | Contributions
SemCom | [25] | 2021 | Magazine | Propose an architecture based on federated edge intelligence for supporting resource-efficient semantic-aware networking.
SemCom | [11] | 2022 | Survey | Systematically investigate the fundamentals, potential applications, and intrinsic challenges of SemCom for the future internet.
SemCom | [21] | 2023 | Tutorial | Summarize the early adaptation, semantic perception, and task-oriented communication, covering the basics, algorithms, and potential implementations.
SemCom | [22] | 2024 | Tutorial | Deliver a comprehensive tutorial-cum-survey detailing the landscape of semantics-empowered communication systems.
SemCom | [26] | 2024 | Magazine | Provide a comprehensive guide on how to design secure SemCom systems in real-world wireless communication networks.
SemCom | [18] | 2025 | Survey | Provide an in-depth study of intellicise wireless networks derived from SemCom.
SemCom | [30] | 2025 | Survey | Explore the transformative integration of generative AI architectures and technologies within SemCom networks.
SemCom | [27] | 2025 | Survey | Provide a comprehensive overview of the techniques that can be used to ensure the security of SemCom.
SemCom | [24] | 2025 | Survey | Present the first rigorous and holistic end-to-end vision of SemCom networks.
SemCom | [23] | 2025 | Survey | Introduce the theoretical foundation, research environment, market environment, opportunities and challenges, and future research directions of SemCom.
SemCom | [28] | 2025 | Survey | Analyze the intersection of network architecture, security paradigms, and privacy issues in SemCom environments.
SemCom | [29] | 2026 | Survey | Systematically categorize and survey advanced resource allocation strategies tailored for wireless SemCom networks.
SemCom | [19] | 2026 | Magazine | Investigate the security and privacy implications arising from the convergence of intellicise networks and agentic AI.
SemCom | [17] | 2026 | Survey | Discuss the state-of-the-art architectural innovations and future evolutionary paths of semantic RANs.
Agentic AI | [31] | 2025 | Survey | Offer a holistic survey on autonomous intelligence, focusing on the deployment of agentic AI for resolving complex objectives.
Agentic AI | [5] | 2025 | Survey | Establish a rigorous conceptual taxonomy distinguishing AI agents from agentic AI, alongside an analysis of their applications.
Agentic AI | [32] | 2025 | Survey | Critically introduce and evaluate the threat landscapes, defense mechanisms, and open security challenges inherent in agentic AI systems.
Agentic AI | [6] | 2025 | Survey | Survey the diverse architectural paradigms, practical applications, and future trajectories of agentic AI technologies.
Agentic AI | [33] | 2025 | Survey | Review the underpinning techniques, developmental challenges, and emerging opportunities in AI agentic programming.
Agentic AI | [10] | 2026 | Tutorial | Systematically and comprehensively introduce the principles, design, and applications of large AI models (LAMs) in intelligent communication systems.
Agentic AI | [9] | 2026 | Survey | Provide a comprehensive survey of agentic AI and agentification frameworks tailored for edge general-purpose intelligence.
1.3 Key Contributions and Outline
Although numerous researchers have focused on SemCom [11, 17–19, 21–30, 34] and agentic AI [5, 6, 9, 10, 31–33], the understanding of semantic-based agent communication networks is still nascent. Only a few works address this intersection directly. For instance, Gao et al. [14] propose a unified agentic AI-enhanced SemCom framework, while Yu et al. [35] develop a semantic-driven AI agent communication framework. Notably, the literature currently lacks a systematic review tracing the evolutionary trajectory of both semantic-based agentic AI systems and semantic-based agent wireless networks. To address this gap, this survey presents a comprehensive exploration of state-of-the-art technologies for semantic-based agent communication networks. The main contributions are summarized as follows.
1.3.1 Outline the Architecture of Semantic-based Agent Communication Networks
We propose a novel architecture that consists of three layers, four entities, and four stages for semantic-based agent communication networks, in which the components interact to form a closed-loop system. The architecture comprises three wireless agent network layers that define the logical structure and organization of entity interactions: the intention extraction and understanding layer, the semantic encoding and processing layer, and the distributed autonomy and collaboration layer. Four AI agent entities, namely embodied agents, communication agents, network agents, and application agents, inhabit the network and execute tasks. Finally, four operational stages for semantic-enhanced agentic AI systems, namely perception, memory, reasoning, and action, define the cognitive cycle that guides agent behavior.
Figure 2 The structure of this paper.
1.3.2 Explore the State-of-the-Art in Semantic-based Agent Communication Networks
Based on the proposed architecture, we investigate recent advances in the use of semantics to enhance agent communication networks.
• Three Layers: For the intention extraction and understanding layer, we review representative approaches, including evidence-based intent inference, opponent modeling-based intent extraction, and mind modeling-based intent inference. For the semantic encoding and processing layer, we summarize representative approaches, including semantic-based coding, semantic-based beam management, semantic-based Channel State Information (CSI) feedback, semantic-based Hybrid Automatic Repeat reQuest (HARQ), Age of Semantic Information (AoSI), and the semantic KB. For the distributed autonomy and collaboration layer, we explore representative methods, including distributed access, knowledge collaboration, and resource scheduling.
• Four Stages: For the semantic-based perception stage, we summarize representative techniques, including semantic feature extraction and representation, task-oriented environmental sensing, semantic object grounding and tracking, and embodied semantic environment understanding. For the semantic-based memory stage, we examine representative approaches, including hierarchical semantic memory structures, semantic retrieval and reasoning, memory evolution and knowledge update, and cognitive augmentation via memory. For the semantic-based reasoning stage, we review representative methods, including chain-of-thought (CoT) reasoning, knowledge graph (KG)-augmented reasoning, retrieval-augmented reasoning, tree-structured multi-path reasoning, and neuro-symbolic reasoning.
For the semantic-based action stage, we summarize representative approaches, including semantic tool acquisition, reasoning-action interleaving, multi-agent collaborative action, semantic self-correction, and reinforcement-based semantic feedback.
• Four Entities: For embodied agents, we highlight two representative projects: SayCan and Atlas. For communication agents, we review the semantic-driven AI agent communication framework and the agentic AI-enhanced SemCom framework, followed by an introduction to two notable implementations, ChannelGPT and UniClaw. For network agents, we present four key projects: RAN Agent, Agentic AI for RAN, JoinAI-Agent, and Xingchen Super Agent. For application agents, we provide an overview of two representative general agents, along with several vertical agents tailored for smart factories, smart healthcare, smart cities, and intelligent transportation.
Figure 3 The proposed architecture for semantic-based agent communication networks, comprising three layers, four entities, and four stages.
1.3.3 Discuss Challenges and Potential Solutions
While extensive research has explored the use of semantics to enhance agent communications, several fundamental challenges remain. We systematically identify these challenges and propose potential research directions. Key areas of focus include the theoretical framework, the management of semantic KBs, security and privacy protection, and standardization and industry adoption in semantic-based agent communication networks.
Roadmap: The outline of this review is depicted in Figure 2. Specifically, Section 2 presents the architecture of semantic-based agent communication networks. Section 3 reviews key technologies supporting the three layers of semantic-based wireless agent networks.
Section 4 discusses how semantics enhance the four stages of agentic AI systems. Section 5 examines the four AI agent entities in SemCom networks. Section 6 outlines challenges and potential directions. Finally, Section 7 concludes this review.
2 Proposed Semantic-based Agent Communication Network Architecture
As illustrated in Figure 3, we propose an architecture for semantic-based agent communication networks, comprising three layers, four entities, and four stages.
• “Three Layers”: Define the logical organizational framework of the agent communication system through three core layers: the intention extraction and understanding layer, the semantic encoding and processing layer, and the distributed autonomy and collaboration layer.
• “Four Entities”: Explicitly categorize communication network entities into four distinct AI agent types: embodied agents, communication agents, network agents, and application agents, each representing a unique physical and functional role within the architecture.
• “Four Stages”: Outline the semantic-enhanced workflow for AI agents across four sequential stages: perception, memory, reasoning, and action, capturing the end-to-end cognitive processing pipeline.
The relationship among them is as follows. First, the three layers define the overarching structure of the communication network, establishing how the intelligent network system is organized and where the four entities operate within it. Second, the four entities serve as the functional carriers, identifying the task executors and driving the execution of the four stages. Third, the four stages outline the behavioral methodology, addressing how agents think. This process occurs not only within individual agents but also across the three layers, thereby creating a closed loop for information flow.
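The relationship above (layers as "where", entities as "who", stages as "how") can be made concrete with a small data model. The following Python sketch is a hypothetical illustration, not the authors' implementation. The names Layer, Stage, Agent, and closed_loop are assumptions. Each agent is pinned to a single layer purely for simplicity, although in the architecture an entity may span several layers. Every stage is a placeholder that only logs its name. The point it shows is the closed loop: each round's output is fed back as the next round's input.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """Where interactions are organized: the three wireless agent network layers."""
    INTENTION = "intention extraction and understanding"
    SEMANTIC = "semantic encoding and processing"
    COLLABORATION = "distributed autonomy and collaboration"


class Stage(Enum):
    """How an agent behaves: the four-stage cognitive cycle, in execution order."""
    PERCEPTION = 1
    MEMORY = 2
    REASONING = 3
    ACTION = 4


@dataclass
class Agent:
    """Who executes: an entity type attached (hypothetically) to one layer."""
    entity: str  # "embodied", "communication", "network", or "application"
    layer: Layer
    log: list = field(default_factory=list)

    def step(self, message: dict) -> dict:
        # Every entity runs the full four-stage cycle on its input (placeholder).
        for stage in Stage:
            self.log.append(stage.name)
        # The placeholder "action" records which layer handled the message.
        return {**message, "path": message.get("path", []) + [self.layer.name]}


def closed_loop(agents: list, message: dict, rounds: int = 1) -> dict:
    """Pass a message through the agents in layer order; each round's output
    becomes the next round's input, which closes the information-flow loop."""
    for _ in range(rounds):
        for agent in agents:
            message = agent.step(message)
    return message


agents = [
    Agent("embodied", Layer.INTENTION),
    Agent("communication", Layer.SEMANTIC),
    Agent("network", Layer.COLLABORATION),
]
result = closed_loop(agents, {"content": "obstacle detected"}, rounds=2)
print(result["path"])
print(agents[0].log)
```

With two rounds, the recorded path visits the three layers twice in order. Each agent's log shows the four stages once per round, reflecting that the cycle runs inside every agent and is repeated as the loop closes across layers.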
2.1 Three Wireless Agent Network Layers
Table 2 provides a concise comparison of these three layers in terms of their core focuses and operational levels, with detailed explanations provided below.
• Intention Extraction and Understanding Layer: In traditional communication systems, the control plane is primarily concerned with how data is forwarded. The intention layer, by contrast, introduces two elements that traditional communication has never addressed: knowledge and goals. Knowledge gives the layer the capacity to understand the world rather than merely sense it [36]; goals enable autonomous decision-making rather than simple reaction to inputs [37]. Specifically, the intention layer perceives raw data streams from various sensors [38], such as camera pixels, LiDAR point clouds, and inertial measurement unit data, transforms these signals into structured knowledge [36], and fuses them to resolve ambiguity [39]. By leveraging pre-deployed KBs such as KGs and world models, the intention layer forms a contextualized understanding of the current perceptual information and performs intent recognition. Based on this understanding, it defines communication goals that specify the intended recipients, the knowledge content to be conveyed, and the desired effects. For example, when an autonomous vehicle detects a traffic accident 50 meters ahead, it recognizes that this newly acquired information is critical for surrounding vehicles. It then generates a communication goal: “Broadcast to vehicles within a 300-meter radius behind: ‘accident ahead, slow down and detour’.” Here, the recipients are vehicles within 300 meters to the rear, the knowledge content includes the location and severity of the accident, and the desired effect is for those vehicles to slow down or alter their route.
• Semantic Encoding and Processing Layer: This layer aims to simplify information representation and ensure communication robustness while preserving the original meaning. It leverages a shared semantic KB that serves as prior knowledge for both the transmitting and receiving agents [16]. The semantic KB is typically aligned through pre-training before communication and is dynamically updated during operation. Upon receiving a communication task from the intention layer, the transmitting agent employs a semantic encoder to extract semantic information from both the source and the channel and encode it into a semantic vector. The receiving agent then performs the reverse operation to fulfill the communication task. Whereas traditional communication uses the bit error rate (BER) as its performance metric, the semantic layer emphasizes the semantic distortion rate (SDR), which measures the discrepancy between the meaning reconstructed by the receiver and the original intent of the sender [40]. The semantic layer can therefore tolerate a certain degree of physical-layer bit errors, provided that these errors do not compromise the core meaning.
• Distributed Autonomy and Collaboration Layer: The objective of this layer is to enable efficient, reliable, and low-latency collaboration among multiple agents operating in highly dynamic and complex wireless environments. Through local perception and autonomous negotiation, agents coordinate communication tasks in a distributed manner. First, the layer facilitates efficient multi-user access via Model Division Multiple Access (MDMA) [41–43]. MDMA adopts a semantics-grounded, model-driven approach to extract high-dimensional source features, thereby constructing a model-based information space that accounts for the characteristics of both multi-modal sources and communication channels.
MDMA distinguishes users based on the semantic features embedded within the models. Second, this layer employs semantic-aware resource scheduling strategies that prioritize resource allocation for high-priority or time-sensitive communication tasks [29]. Finally, agents with complementary perceptual capabilities, computational resources, and knowledge backgrounds collaborate to overcome individual limitations, enabling the execution of complex semantic understanding and communication tasks [44].
2.2 Four AI Agent Entities
Table 3 offers a concise comparison of these four AI agent entities, highlighting their respective roles and primary functions. A more detailed account of each is provided below.
Table 2 Three layers of semantic-based wireless agent networks
Logical Layer | Addressed Issue | Level
Intention Extraction and Understanding Layer | Why Communicate? | Knowledge and Goal
Semantic Encoding and Processing Layer | What to Transmit? | Meaning and Significance
Distributed Autonomy and Collaboration Layer | How to Communicate? | Node and Flow
Table 3 Four entities of AI agents in SemCom networks
Agent Entity | Role | Main Functions
Embodied Agents | Physical Execution | Implementation and Feedback of Semantics
Communication Agents | Interactive Connection | Coding and Transmission of Semantics
Network Agents | Network Management | Scheduling and Guarantee of Semantics
Application Agents | Service Provision | Generation and Consumption of Semantics
• Embodied Agent: Embodied agents are active participants in the physical world, capable of understanding, reasoning about, and interacting with their surroundings in real time [45]. They typically take the form of robotic systems, such as humanoid robots, drones, and autonomous vehicles. While executing tasks, these agents must transmit multi-modal data, such as visual and tactile information, in real time.
SemCom enables them to transmit only environmental changes or target features instead of full video streams, significantly reducing the burden on air-interface resources. Moreover, because tactile and visual perceptions are inherently correlated in the physical world, SemCom facilitates the semantic alignment of information across different modalities, leading to faster physical responses.
• Communication Agent: Communication agents serve as the essential link between other agents and the capabilities of SemCom networks, allowing agents to natively access and utilize core network functions. Specifically, these agents encapsulate the underlying semantic encoding and decoding capabilities. When an embodied or application agent needs to transmit a message, the communication agent generates an encoded semantic vector at the transmitter and reconstructs the original meaning through the reverse process at the receiver [14]. Moreover, since different agents may operate with distinct KBs, communication agents engage in protocol-based negotiation to establish a consistent semantic background, thereby enabling effective SemCom links. They are also responsible for the dynamic updating and synchronization of semantic KBs, which is crucial for continuously enhancing intelligent communication capabilities.
• Network Agent: The primary purpose of network agents is to integrate agentic AI technologies into the network infrastructure, enabling network self-intelligence [46]. While traditional networks merely schedule data flows, network agents harness their understanding and reasoning capabilities to orchestrate semantic routing. They perform in-network aggregation, optimization, and intelligent distribution of semantic information.
Network agents support intent-based network management by automatically translating high-level operator intentions into specific configurations, facilitating network self-configuration, self-optimization, and self-healing. Furthermore, network agents can rapidly locate and diagnose faults, conduct pre-deployment network simulations to mitigate risks, accurately identify congestion to optimize service quality, and analyze data across disparate systems and domains to proactively detect anomalies.
• Application Agent: Application agents primarily deliver customized information and services to users, operating on end devices or at the network edge. These agents support a wide range of industries, such as smart factories, intelligent healthcare, and smart cities [6]. By utilizing pre-trained semantic KBs and generating personalized semantic KGs for users, they enhance both intent extraction and semantic representation. Furthermore, the inference processes of LAMs demand significant computational resources. To address this, resource-constrained application agents can offload complex semantic understanding tasks to the network side, focusing locally only on semantic presentation and lightweight data collection.
2.3 Four Operational Stages for Semantic-enhanced Agentic AI Systems
Table 4 summarizes the four stages by outlining their core functions and how semantics enhance them. Further elaboration is provided as follows.
Table 4 Four stages of semantic-enhanced agentic AI systems
Stages | Main Functions | Enhancement of Semantics
Perception | Information Acquisition and Representation | Improved Matching and Alignment
Memory | Knowledge Storage and Management | Stronger Association and Retrieval
Reasoning | Decomposition of Tasks | Deeper Reasoning and Analogy
Action | Execution of Instructions | Faster Execution and Feedback
• Perception Stage: The agent perceives multi-modal information from the environment and converts unstructured data into a structured representation it can process [47, 48]. Traditional perception techniques can identify content but often fail to grasp the relationships between disparate pieces of information or the underlying intent within a specific context. In contrast, semantic-enhanced perception integrates multi-modal data into a unified semantic space, achieving semantic alignment [49, 50]. This enables the agent to map vague user instructions to concrete, actionable intentions.
• Memory Stage: The agent stores and manages knowledge, typically divided into short-term memory and long-term memory. The former handles ongoing conversations or tasks [51, 52], while the latter preserves historical experiences, common sense, and learned knowledge [53, 54]. Traditional databases, which primarily store keywords, make it difficult for conventional memory systems to link successive instructions. This limitation hinders an agent’s ability to accumulate experience and provide continuous, personalized service. Semantics, however, enable the construction of KGs that connect disparate memory fragments into a cohesive network [52, 55]. When a new task arises, semantic-enhanced memory can retrieve relevant information based on conceptual similarity, allowing for more accurate association and retrieval [56, 57].
• Reasoning Stage: The agent performs analysis, planning, decision-making, and logical deduction based on perceived information and stored knowledge. Traditional symbolic reasoning often struggles with the ambiguity and common-sense nuances of the real world. By leveraging large-scale semantic KBs, an agent can be equipped with common-sense knowledge during its analysis and planning phases [58]. Furthermore, through semantic similarity, the agent can transfer solutions from previously solved problems to novel ones, enabling analogical reasoning [59]. When handling complex tasks, semantics help ensure coherence in the chain of thought through multi-step logical inference.
• Action Stage: The agent translates the decisions derived from reasoning into concrete actions. Traditional action techniques are heavily dependent on strict application programming interface (API) formats, where even slight deviations in instruction can cause the system to fail. In contrast, a semantic-enhanced agent comprehends user intent and automatically maps it to the required API parameters [52]. Moreover, if an error occurs during action, semantic analysis can diagnose the cause and automatically adjust the execution strategy in real time [54, 60].
3 Semantic-based Wireless Agent Networks
This section provides an overview of the technologies that empower semantic-based wireless agent networks, spanning the layers of intention extraction and understanding, semantic encoding and processing, and distributed autonomy and collaboration.
3.1 Intention Extraction and Understanding Layer
The intention extraction and understanding layer enables agents to interpret goals and intentions, thereby facilitating coordinated decision-making.
As illustrated in Table 5, we review several representative approaches within this layer, including evidence-based intent inference, opponent modeling-based intent inference, and mind modeling-based intent inference.
3.1.1 Evidence-based Intent Inference
Evidence-based intent inference derives agents’ intentions directly from observable evidence such as actions, trajectories, and interaction patterns. These approaches focus on identifying behavioral patterns and associating them with possible goals or intentions. By leveraging observable signals, they offer a practical means of inferring intentions in multi-agent environments. Representative methods include behavior-based recognition, plan recognition as planning, landmark-based recognition, and active goal recognition.
Table 5 Representative approaches for the intention extraction and understanding layer
Category | Sub-Category | Descriptions
Evidence-based Intent Inference | Behavior-based Recognition | Infers agent goals by matching observed action sequences to learned or structured behavioral patterns [61, 62].
Evidence-based Intent Inference | Plan Recognition as Planning | Uses planning algorithms to generate candidate plans and infer goals from their consistency with observations [37, 63–65].
Evidence-based Intent Inference | Landmark-based Recognition | Identifies landmarks in plans to infer the most probable goal by matching behaviors to those milestones [66–69].
Evidence-based Intent Inference | Active Goal Recognition | Strategically gathers information by influencing observations through the observer’s own actions, enhancing goal inference accuracy [70–72].
Opponent Modeling-based Intent Inference | Hypothesis-based Modeling | Uses hypothesized agent types to match observed actions and update beliefs about the agent’s behavior [73–75].
Opponent Modeling-based Intent Inference | Subgoal-based Modeling | Models opponents by analyzing subgoals to better generalize to unknown opponents [76].
Opponent Modeling-based Intent Inference | Reward Inference | Estimates opponent goals by inferring their reward functions through inverse RL [77–79].
Opponent Modeling-based Intent Inference | Team Modeling | Infers team goals by analyzing the collective actions of agents within the team [36, 80].
Opponent Modeling-based Intent Inference | Policy-based Modeling | Uses policy-level reasoning over observed actions to infer hidden goals or evolving strategies [81–83].
Mind Modeling-based Intent Inference | Bayesian Mind Models | Uses probabilistic models to estimate agents’ mental states and intentions, managing uncertainty in observed behaviors [84, 85].
Mind Modeling-based Intent Inference | Mental State Modeling | Models agents’ mental states by analyzing observed behaviors, enabling predictions of their intentions [38, 39, 86].
Mind Modeling-based Intent Inference | LLM-based Mind Modeling | Uses LLMs to model agents’ internal mental states and intentions [87–89].
Behavior-based Recognition
Behavior-based recognition infers agents’ intentions by analyzing observable action sequences and matching them to learned or structured behavioral patterns. Dann et al. [61] propose a multi-agent intention recognition framework that employs online goal recognition to infer other agents’ goals from their behaviors and subsequently predict their likely future actions. This method supports rapid recognition even when agents pursue multiple goals in parallel. As illustrated in Figure 4, Su et al. [62] introduce a data-driven goal recognition framework that learns skill representations from observed behavioral histories. The framework infers goals by analyzing discrepancies between ongoing behaviors and these learned representations, and remains effective even without complete environmental knowledge or prior information about agents.
Plan Recognition as Planning
Plan recognition as planning frames intention inference as a planning problem, using planning algorithms to generate or evaluate plans for candidate goals and identify the goals that best explain the observations [63, 65]. Sohrabi et al.
[37] enhance this approach by transforming the recognition task into a planning problem with action costs, introducing additional explain and discard actions to manage unreliable observations, and defining posterior probabilities over plans and goals. This method significantly improves recognition performance in the presence of noisy observations. Additionally, Shvo et al. [64] propose a planning-based method for multi-agent plan recognition that compiles the recognition task into a temporal planning problem with durative actions, making it suitable for settings involving temporal actions and potentially unreliable observations.
Figure 4 Illustration of the continuous goal recognition framework [62], consisting of the attention, retention, motivation, and recognition stages for capturing relevant actions, retaining skill traces and representations, triggering recognition, and inferring goals.
Landmark-based Recognition
Landmark-based recognition infers intentions by identifying critical intermediate states or milestones that must be achieved for specific goals [66, 67, 69]. It matches observed behaviors with these landmarks to recognize intentions. Pereira et al. [66] propose a landmark-based goal recognition method that computes planning landmarks for candidate goals and uses them to perform recognition. This method further estimates goal completion by measuring the proportion of achieved landmarks. It improves recognition speed by approximately 8.6 times compared to prior methods while maintaining accuracy. Wilken et al. [68] introduce a landmark-based hybrid recognition method that replaces the planning-based component in mainstream hybrid goal recognition with a landmark-based alternative. This approach significantly reduces recognition time, enables rapid goal recognition in complex scenarios, and enhances overall performance.
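The goal-completion idea used in landmark-based recognition can be illustrated with a small sketch. The sketch assumes the landmarks of each candidate goal have already been extracted. Landmark extraction from a planning domain is not shown. Facts are represented as plain strings. Each goal is scored by averaging, over its sub-goals, the fraction of landmarks seen in the observations so far. This is a simplified illustration in the spirit of [66], not the authors’ implementation. The goal and fact names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical structure: goal -> sub-goal -> set of landmark facts
# that any plan reaching that sub-goal must pass through.
GoalLandmarks = Dict[str, Dict[str, Set[str]]]


def goal_completion(subgoal_landmarks: Dict[str, Set[str]], achieved: Set[str]) -> float:
    """Average over sub-goals of the fraction of their landmarks already achieved."""
    if not subgoal_landmarks:
        return 0.0
    ratios = [
        len(landmarks & achieved) / len(landmarks) if landmarks else 1.0
        for landmarks in subgoal_landmarks.values()
    ]
    return sum(ratios) / len(ratios)


def recognize_goals(candidates: GoalLandmarks, observed_states: List[Set[str]]) -> List[str]:
    """Return the candidate goal(s) whose landmarks are most completed by the observations."""
    # A landmark counts as achieved if it held in any observed state.
    achieved: Set[str] = set().union(*observed_states)
    scores = {goal: goal_completion(lms, achieved) for goal, lms in candidates.items()}
    best = max(scores.values())
    return sorted(goal for goal, score in scores.items() if score == best)


# Usage with a toy delivery domain (all names are illustrative).
candidates: GoalLandmarks = {
    "deliver_to_A": {"at_A": {"holding_pkg", "at_hub", "at_A"}},
    "deliver_to_B": {"at_B": {"holding_pkg", "at_depot", "at_B"}},
}
observed = [{"holding_pkg"}, {"holding_pkg", "at_hub"}]
print(recognize_goals(candidates, observed))  # ['deliver_to_A']
```

Returning every tied goal keeps the result set-valued. This makes ambiguity explicit when the observations do not yet separate the candidates. For example, with no observations, every goal scores zero.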
Active Goal Recognition
Active goal recognition enhances goal inference by enabling the observer to actively gather information through its own actions, thereby reducing uncertainty and accelerating the recognition process [71, 72]. Shvo et al. [71] propose an active goal recognition approach in which the observer can sense, act, or interact with the environment to obtain more informative observations. This method facilitates earlier recognition and can make goal identification possible even when passive observation alone is insufficient. Zhang et al. [70] present a probabilistic active goal recognition framework that enables the observer to select informative actions during recognition. By updating goal beliefs under uncertainty, this method improves goal disambiguation and recognition accuracy.
3.1.2 Opponent Modeling-based Intent Inference
Compared with evidence-based approaches that primarily rely on observed behaviors, opponent modeling-based intent inference involves explicitly constructing models of other agents. By capturing opponents’ strategies or behavioral tendencies, these methods enable more predictive reasoning about their goals and intentions in interactive environments. Representative methods include hypothesis-based modeling, subgoal-based modeling, reward inference, team modeling, and policy-based modeling.
Hypothesis-based Modeling
Hypothesis-based modeling explicitly maintains multiple hypotheses about the possible behaviors of other agents and infers intentions by updating their likelihoods based on observed actions [73, 74]. Albrecht et al. [73] propose a hypothesis-based method that represents possible opponent behaviors as hypothesis types and updates posterior beliefs over them using observed actions. Zhu et al.
[75] introduce a unified type-based framework that maintains multiple hypothetical policies for previously unknown opponents and updates beliefs over them from observed actions. This framework captures other agents’ behavioral tendencies through these evolving hypotheses.
Subgoal-based Modeling
Subgoal-based modeling infers intentions by modeling the intermediate subgoals underlying opponents’ behaviors. Since agents with different strategies may share the same subgoals, this approach enables more generalizable opponent understanding. Yu et al. [76] propose an opponent modeling method based on subgoal inference, which predicts opponents’ future subgoals from historical trajectories and uses them to characterize behavioral tendencies. Compared with action-based opponent modeling, this method improves adaptation to unknown opponents. In scenarios involving collaboration with unknown opponents, it achieves a 5% to 20% improvement in task success rate over previous approaches.
Reward Inference
Reward inference infers intentions by estimating the reward functions that best explain opponents’ observed behaviors in multi-agent interactions [77, 79]. Lin et al. [77] propose a multi-agent inverse reinforcement learning (RL) method for general-sum stochastic games, which infers players’ rewards under different equilibrium assumptions by solving constrained optimization problems. Fu et al. [78] introduce a scalable multi-agent inverse RL approach that reduces the multi-agent problem to a set of single-agent inverse learning problems while preserving strong rationality.
Team Modeling
Team modeling infers intentions by modeling the collective behavior and goals of a team rather than individual agents. By capturing the interactions and coordination among team members, it provides a comprehensive understanding of team intent in cooperative environments.
Reily et al. [36] propose a method for real-time team behavior recognition using multisensory data. The approach embeds team behaviors into a graph structure and uses robot learning to model individual actions and interrelationships, enabling real-time recognition of team goals and improving cooperation in dynamic environments. Ying et al. [80] model a cooperative team as a single collective agent to simplify inference. By analyzing both actions and instructions, this scheme infers the shared goals of the team and enhances understanding of team-level behaviors in cooperative environments.
Policy-based Modeling
Policy-based modeling infers intentions by modeling the strategies or policies that best explain observed actions [82, 83]. By simulating and refining potential opponent policies, it enables more accurate prediction of opponents’ behaviors and goals. Raileanu et al. [81] propose the self-other modeling scheme, in which agents use their own policies to model others and infer their goals during online interactions. The approach significantly improves performance in multi-agent tasks by adapting to the behavior of other agents and optimizing based on inferred goals. In adversarial scenarios, it achieves a win rate more than 5 times higher than baseline schemes. Zhang et al. [82] introduce a policy reconstruction method for multi-objective normal-form games, in which agents infer opponent policies from conditional action frequencies. This improves performance by predicting opponents’ actions more accurately and adjusting the agents’ own strategies to reach better outcomes at Nash equilibrium.
3.1.3 Mind Modeling-based Intent Inference
Mind modeling-based intent inference further focuses on reasoning about the internal mental states that drive agent behavior. Drawing on cognitive concepts such as theory of mind, these approaches infer agents’ beliefs and intentions from observed actions and interactions.
This enables a deeper understanding of intentions in complex multi-agent environments. Representative methods include Bayesian mind models, mental state modeling, and LLM-based mind modeling.
Bayesian Mind Modeling
Bayesian mind modeling infers agents’ intentions by capturing the relationship between their mental states and observable behaviors, leveraging a Bayesian framework for inverse reasoning. Pöppel et al. [84] introduce a satisficing Bayesian Theory of Mind (ToM) that simplifies complex Bayesian inference by switching between discrete belief states based on the level of surprisal. This approach explains agent behavior efficiently, particularly in tasks involving various sources of uncertainty, by balancing computational demands with inferential accuracy. Lim et al. [85] further propose a Bayesian ToM method for multi-agent cooperation, in which agents model each other’s intentions to support more effective collaborative decision-making. It enhances teamwork by enabling agents to anticipate and adapt to each other’s goals in joint tasks.
Figure 5 Illustration of the ToM reasoner for partner intention modelling [89], consisting of information extraction, ToM reasoning, and partner reasoning stages for constructing structured prompts, generating ToM reasoning, and inferring partner intention representations.
Mental State Modeling
Mental state modeling infers intentions by explicitly representing other agents’ mental states, including their beliefs, desires, and intentions [39, 86]. Rabinowitz et al. [86] propose the Theory of Mind neural network (ToMnet), which learns to model agents’ mental states from limited behavioral observations. This approach enables the prediction of agent behavior across diverse agent types with minimal data. Wang et al.
[38] introduce ToM2C, a multi-agent cooperation method that applies theory of mind to explicitly model the intentions and beliefs of other agents. By relating agents’ mental states to their goals, ToM2C improves collaborative performance. In multi-sensor, multi-target coverage scenarios with restricted local observations, this method achieves a 5% to 10% improvement in goal coverage over prior approaches.
LLM-based Mind Modeling
LLM-based mind modeling leverages LLMs to infer agents’ mental states, exploiting their advanced reasoning capabilities to simulate and predict the intentions of others [87, 88]. Cross et al. [87] introduce Hypothetical Minds, a framework that employs an LLM-based ToM module to hypothesize and evaluate opponents’ strategies in multi-agent settings. This approach yields substantial performance gains in both mixed-motive and collaborative tasks. As illustrated in Figure 5, Li et al. [89] propose an LRM-based ToM framework that infers partners’ beliefs and intentions in cooperative multi-agent tasks. It enhances partner modeling through structured prompting and ToM reasoning, leading to more effective cooperation across diverse scenarios.
3.2 Semantic Encoding and Processing Layer
The semantic encoding and processing layer functions as a bridge between agent intention and physical transmission. As illustrated in Table 6, we explore several key enabling technologies integrated within this layer, including semantic-based coding, semantic-based beam management, semantic-based CSI feedback, semantic-based HARQ, AoSI, and semantic KBs.
3.2.1 Semantic-based Coding
To address the cliff effect inherent in conventional separation-based coding schemes, where slight fluctuations in channel quality can lead to a catastrophic breakdown in decoding reliability, semantic-based coding adopts an end-to-end learnable strategy that integrates feature extraction, source compression, and channel coding. This paradigm shifts the focus from bit-level accuracy to the preservation of core semantic meaning, significantly enhancing transmission robustness in adverse environments. Semantic-based coding schemes are generally divided into two primary categories: Joint Source-Channel Coding (JSCC) and generative coding.
Table 6 Representative approaches for the semantic encoding and processing layer
Category | Sub-Category | Descriptions
Semantic-based Coding | JSCC | Parameterizes the transmitter and receiver functions with deep neural networks to map source data into channel symbols [90–93].
Semantic-based Coding | Generative Coding | Leverages generative models, such as GANs [94], diffusion models [95, 96], and LLMs [97], to achieve semantic reconstruction via conditional generation.
Semantic-based Beam Management | Visual Semantic-assisted Beam Management | Uses extracted environmental semantics, such as blockage distribution [98], keypoint coordinates [99], and localization [100, 101], to infer the optimal beam index.
Semantic-based Beam Management | Channel Semantic-assisted Beam Management | Combines channel semantics and source semantics to improve beamforming performance [102, 103].
Semantic-based CSI Feedback | Reconstruction-oriented CSI Feedback | Reconstructs CSI by extracting its representative semantic features [104–106].
Semantic-based CSI Feedback | Knowledge-driven CSI Feedback | Leverages pre-shared prior knowledge, such as semantic labels [107] and the channel quality indicator [108], to infer CSI.
Semantic-based CSI Feedback | Optimization for CSI Feedback | Optimizes CSI feedback for a specific task, such as spectral efficiency [109], data hiding [110], or image reconstruction [111].
Semantic-based HARQ | Similarity-based HARQ | Makes the retransmission decision based on semantic similarity computed by a similarity detection network, instead of a CRC [112, 113].
Semantic-based HARQ | Feature-based HARQ | Identifies data for retransmission based on significant features, such as a semantic base [114] or an importance map [115].
Semantic-based HARQ | Adaptive HARQ | Schedules transmission resources adaptively, using RL or a policy network, to realize a dynamic retransmission mechanism [116].
Age of Semantic Information | Reconstruction-oriented AoSI | Defines AoI at the semantic level to minimize the discrepancy between the physical reality and its semantic representation [117–119].
Age of Semantic Information | Task-oriented AoSI | Focuses on information significance in specific tasks to prioritize the intended objective [120, 121].
Semantic KB | Semantic KB Construction | Defines the structural representations and maintenance mechanisms used to construct semantic KBs [122, 123].
Semantic KB | Semantic KB Deployment | Leverages shared KBs to empower transmission tasks, such as indexing [124], residual compression [125], GraphRAG-assisted subgraph extraction [16], and triple-based logical error correction [126].
JSCC
JSCC uses neural networks to map multi-modal source data directly into channel symbols, connecting the transmitter and receiver. By jointly optimizing the transmitter and receiver, JSCC achieves effective performance gains even in low-SNR environments. Bourtsoulatze et al. [90] first propose a deep JSCC technique for wireless image transmission that eliminates the need for separate compression or error correction coding. By parameterizing the encoder and decoder functions with convolutional neural networks (CNNs), the scheme directly maps image pixel values to complex-valued channel input symbols. Furthermore, Yang et al.
[92] propose SwinJSCC, a novel neural JSCC backbone that integrates the Swin Transformer to overcome the limited capabilities of traditional CNN-based models. They introduce channel ModNet and rate ModNet to scale latent representations based on CSI and target transmission rates, achieving superior performance and faster end-to-end coding speeds. Additionally, Bo et al. [93] propose a joint coding-modulation scheme that maps source data to discrete constellation points, enabling digital SemCom.
Generative Coding
Generative coding leverages the prior knowledge embedded in advanced generative models to reconstruct transmitted information by framing semantic decoding as a conditional generation task. Xu et al. [95] propose a latent diffusion model-based scheme, which employs a joint semantic equalizer and denoiser module to recover clean semantic features and mitigate channel effects. As depicted in Figure 6, Meng et al. [96] propose an agentic AI-driven semantic steganography communication scheme. The scheme uses a diffusion model to hide a secret image in a stego image for invisible encryption, thereby realizing secure SemCom through generative coding. Meanwhile, Salehi et al. [97] introduce the KG-LLM framework, which integrates KG extraction for structured compression with LLM coding for contextualized semantic representation.
Figure 6 Illustration of agentic AI-driven semantic steganography communication [96], which includes semantic extraction, digital token controlled reference image generation, coverless steganography, semantic codec, and optional task-oriented enhancement modules.
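The non-learned channel stage of a deep JSCC pipeline such as [90] can be sketched on its own. The sketch assumes three steps:
1. Real-valued encoder outputs are paired into complex symbols.
2. An average power constraint is enforced by normalization.
3. The symbols pass through an additive white Gaussian noise (AWGN) channel at a chosen SNR.

The CNN encoder and decoder are omitted, and the function names are illustrative. In a trained system, these operations sit between the encoder and decoder. Because the noise is additive, gradients can pass through this stage during end-to-end training.

```python
import math
import random
from typing import List


def to_complex_symbols(features: List[float]) -> List[complex]:
    """Pair consecutive real encoder outputs into complex channel symbols."""
    if len(features) % 2:
        raise ValueError("expected an even number of real-valued features")
    return [complex(features[i], features[i + 1]) for i in range(0, len(features), 2)]


def normalize_power(symbols: List[complex], power: float = 1.0) -> List[complex]:
    """Scale symbols so that their average power per symbol equals `power`."""
    energy = sum(abs(s) ** 2 for s in symbols)
    if energy == 0:
        raise ValueError("cannot normalize an all-zero block")
    scale = math.sqrt(len(symbols) * power / energy)
    return [s * scale for s in symbols]


def awgn_channel(symbols: List[complex], snr_db: float, rng: random.Random) -> List[complex]:
    """Add circularly symmetric complex Gaussian noise, assuming unit signal power."""
    noise_power = 10 ** (-snr_db / 10)  # N0 when the average signal power is 1
    sigma = math.sqrt(noise_power / 2)  # standard deviation per real/imaginary part
    return [s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for s in symbols]


# Usage: a hypothetical encoder output for one source block.
features = [0.3, -1.2, 2.0, 0.5, -0.7, 0.1]
tx = normalize_power(to_complex_symbols(features))
rx = awgn_channel(tx, snr_db=10.0, rng=random.Random(0))
avg_power = sum(abs(s) ** 2 for s in tx) / len(tx)
print(round(avg_power, 6))  # 1.0
```

Normalizing per block, rather than per symbol, lets the encoder distribute power unevenly across symbols while still meeting the average constraint.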
3.2.2 Semantic-based Beam Management
Semantic-based beam management aims to optimize beamforming by incorporating semantic information derived from the environment or the source data, thereby enhancing efficiency and robustness to ensure stable agent communication even in highly dynamic scenarios. It can be divided into visual semantic-assisted and channel semantic-assisted beam management.
Visual Semantic-assisted Beam Management
Visual semantic-assisted beam management relies on high-level environmental cues, such as blockage distribution, keypoint coordinates, and localization, to proactively predict and maintain optimal beam alignment. Yang et al. [99] extract semantic information from environmental images captured by street cameras and selectively encode it based on task relevance to support channel-related decision-making. By predicting the optimal beam index and potential blockage states without pilot training or costly beam sweeps, the proposed scheme enables efficient beam management. Furthermore, Wen et al. [98] define environmental semantics as the spatial distribution of scatterers that influence the wireless channel and employ keypoint detection techniques to extract these features from raw images, subsequently mapping them to optimal beam pairs.
Channel Semantic-assisted Beam Management
Channel semantic-assisted beam management focuses on the semantics of the source content to ensure that beamforming vectors are optimized based on the significance and characteristics of the transmitted information. By integrating source importance into the spatial domain, it achieves superior transmission efficiency and task-oriented performance. Wu et al. [102] propose a deep joint semantic coding and beamforming scheme.
Specifically, the authors use two specialized semantic extraction networks to extract features from both the image source and CSI, and introduce hybrid data-driven and model-driven semantic-aware beamforming networks that jointly optimize coding and spatial processing.
Figure 7 Illustration of the generative semantic HARQ framework [113], which includes the knowledge base, the semantic communication system, and the HARQ enhancement module.
3.2.3 Semantic-based Channel State Information (CSI) Feedback
CSI is essential for agents to enable channel-adaptive transmission. However, the massive dimensionality of channels imposes excessive feedback and computational overhead. Semantic-based CSI feedback addresses these challenges by extracting and transmitting only task-relevant channel features, ensuring robust and goal-oriented performance across dynamic wireless environments. Representative approaches include reconstruction-oriented CSI feedback, knowledge-driven CSI feedback, and optimization for CSI feedback.
Reconstruction-oriented CSI Feedback
A line of research focuses on reconstructing the CSI matrix by extracting its representative semantic features. Xie et al. [104] propose a learnable CSI fusion SemCom framework that employs an attention masking map and treats MIMO CSI as side information to enhance robustness. Similarly, Gong et al. [106] introduce a channel matrix adaptor that operates alongside the channel codec to compensate for misaligned CSI, thereby mitigating the errors between the estimated and actual channel matrices.
Knowledge-driven CSI Feedback
Knowledge-driven CSI feedback leverages prior knowledge shared between the transmitter and receiver to achieve accurate channel inference. Ren et al.
[108] propose SemCSINet, a semantic-aware Transformer-based framework that incorporates the channel quality indicator as a semantic prior to guide the feedback loop in massive MIMO systems, significantly improving reconstruction accuracy and system robustness. Meanwhile, Zhu et al. [107] employ clustering methods to predefine a semantic label database for CSI feedback. It maps high-dimensional CSI into concise semantic labels and transmits only the label corresponding to the current CSI, substantially reducing feedback overhead while preserving task accuracy.
Optimization for CSI Feedback
Rather than pursuing full channel reconstruction, this approach optimizes the feedback process for specific objectives and extracts only the channel features critical to the current task. Cao et al. [110] introduce an adaptive CSI feedback framework based on the information bottleneck principle, optimized for multi-task efficiency and data privacy. A key innovation lies in the hidden transfer of sensory data within the CSI feedback loop, thereby eliminating the additional resource overhead typically required for separate data reporting.
3.2.4 Semantic-based Hybrid Automatic Repeat reQuest (HARQ)
Semantic-based HARQ shifts the reliability paradigm from bit-level error correction to the preservation of meaning. By integrating semantic awareness into feedback and retransmission processes, it optimizes transmission efficiency and ensures reconstruction quality. It can be categorized into similarity-based HARQ, feature-based HARQ, and adaptive HARQ.
Similarity-based HARQ
Similarity-based HARQ replaces bit-level checks with semantic similarity metrics to determine whether retransmission is needed, thereby avoiding unnecessary overhead. Jiang et al. [112] propose a Sim32 framework that employs a similarity detection network in place of the conventional cyclic redundancy check.
This approach allows the receiver to accept packets that contain bit errors but remain semantically accurate, significantly reducing the number of required retransmissions and conserving communication resources in low-SNR regimes. Additionally, Li et al. [113] propose a generative semantic HARQ framework tailored for intelligent transportation systems, as shown in Figure 7. It introduces a synonymous combining strategy that leverages semantic distance and local knowledge bases, enabling the receiver to autonomously recover information by identifying semantically equivalent content in challenging vehicular environments.
Feature-based HARQ
Feature-based HARQ enhances transmission reliability by identifying and retransmitting specific semantic components or significant features. Zheng et al. [114] develop a HARQ mechanism based on a semantic base, where the system precisely identifies and retransmits only the erroneous semantic elements using contextual correlations. Meanwhile, Sheng et al. [115] introduce an importance map-guided HARQ for cooperative perception, which extracts critical semantic information to ensure the reliable transmission of task-essential features, achieving robust perception performance with minimal data overhead.
Adaptive HARQ
Adaptive HARQ focuses on the intelligent scheduling of transmission resources to dynamically balance reliability and efficiency. Zhou et al. [116] propose an adaptive bit rate control scheme for incremental knowledge-based HARQ, which employs a policy network trained via RL. By sensing real-time channel conditions and content complexity, the framework adaptively determines the optimal initial transmission length and incremental redundancy steps, effectively optimizing the trade-off between reconstruction distortion and transmission latency.
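The decision rule behind similarity-based HARQ can be sketched as a retransmission loop in which a similarity score replaces the CRC pass/fail check. In the sketch, `receive` and `score` are hypothetical callables:
- `receive(k)` returns the decoded semantic embedding of the k-th (re)transmission.
- `score` stands in for a learned similarity detection network such as the one in [112].

The usage example substitutes cosine similarity against a known reference embedding purely for demonstration; a real receiver would not hold that reference. The threshold and retransmission limit are illustrative values.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_harq(
    receive: Callable[[int], Vector],
    score: Callable[[Vector], float],
    threshold: float = 0.9,
    max_attempts: int = 4,
) -> Tuple[bool, int, float]:
    """Request retransmissions until the semantic score reaches the threshold.

    Returns (accepted, attempts_used, last_score).
    """
    last_score = 0.0
    for attempt in range(1, max_attempts + 1):
        decoded = receive(attempt)   # decoded semantic embedding of this attempt
        last_score = score(decoded)  # semantic check in place of a CRC
        if last_score >= threshold:
            return True, attempt, last_score  # ACK, even if some bits were wrong
    return False, max_attempts, last_score    # retransmission budget exhausted


# Usage: two hypothetical attempts; the second is semantically close enough.
reference = [1.0, 0.0, 1.0]
attempts = {1: [0.0, 1.0, 0.2], 2: [0.9, 0.1, 1.1]}
accepted, used, sim = semantic_harq(
    receive=lambda k: attempts[k],
    score=lambda v: cosine_similarity(v, reference),
)
print(accepted, used, round(sim, 3))  # True 2 0.993
```

Lowering the threshold trades semantic fidelity for fewer retransmissions. This trade-off is what the adaptive schemes above attempt to tune automatically.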
3.2.5 Age of Semantic Information

Driven by the demand for real-time intelligence, the Age of Information (AoI) has emerged as a fundamental metric for quantifying information freshness; AoI-oriented designs typically rely on fixed or periodic update policies to maximize throughput or minimize end-to-end delay. The Age of Semantic Information (AoSI) extends the traditional AoI by incorporating information significance and task effectiveness into the freshness metric. It shifts the focus from purely temporal freshness to whether the information remains semantically valuable and relevant to the receiver’s objectives. AoSI metrics fall mainly into reconstruction-oriented and task-oriented categories.

Reconstruction-oriented AoSI. Reconstruction-oriented AoSI aims to minimize the semantic discrepancy between the source and the monitor, ensuring that the receiver maintains a high-fidelity representation of the physical state. Rather than measuring the time elapsed since the last update, it quantifies how long the receiver’s knowledge has been inconsistent with the actual state. Maatouk et al. [117] propose the Age of Incorrect Information (AoII), which grows only while the monitor holds incorrect or outdated information, effectively shifting the focus from fresh updates to fresh “informative” updates. Furthermore, Li et al. [118] introduce a goal-oriented tensor framework to integrate various semantic metrics such as AoII and the Value of Information (VoI). Additionally, Delfani et al. [119] explore the Query Version AoI (QVAoI) in energy-harvesting systems, optimizing transmission policies to deliver the most significant updates while maintaining energy sustainability.

Task-oriented AoSI. Task-oriented AoSI focuses on the relevance of information to specific downstream tasks. It posits that information freshness matters only at the moments of task execution, thereby prioritizing updates that directly serve the intended goal. Chiariotti et al.
[120] propose the Query AoI (QAoI) for pull-based communication scenarios in which information is consumed only when a query is generated. By optimizing QAoI, the authors significantly reduce the age perceived at the relevant query instants for both periodic and stochastic queries, demonstrating superior resource efficiency compared with traditional AoI-based scheduling. Moreover, Yates [121] introduces the concept of version age in gossip networks, which measures how many versions out of date a node’s knowledge is relative to the source. This discrete metric effectively characterizes the timeliness of information diffusion in distributed networks, helping nodes prioritize the most recent versions to achieve collective task objectives.

Figure 8 Illustration of the knowledge graph-assisted SemCom framework [16], which shows the semantic extraction at the transmitter and text reconstruction at the receiver.

3.2.6 Semantic Knowledge Base

The semantic KB stores semantic logic and relationships, enabling effective semantic representation through integrated processing, memory, and reasoning capabilities. Research in this area can be categorized into the construction and the deployment of semantic KBs.

Semantic KB Construction. Semantic KB construction focuses on structural representations and efficient maintenance mechanisms to model complex data attributes and task requirements. Ren et al. [123] propose a generative semantic KB architecture that partitions the knowledge space into three specialized sub-KBs: source, task, and channel KBs. By introducing semantic metalets to standardize units such as embedding vectors, this scheme parameterizes source messages into low-dimensional spaces, effectively bridging the gap between raw data and semantic meaning. Similarly, Wang et al. [122] develop a unified hierarchical semantic KB framework designed for multi-task scenarios.
It employs horizontal construction to maximize the semantic representation space and vertical construction to exploit cross-task correlations through a deep K-subspace clustering method, achieving a significant improvement in knowledge search efficiency for complex reconstruction tasks.

Semantic KB Deployment. Semantic KB deployment investigates how effectively shared KBs empower specific transmission tasks in practice, with an emphasis on indexing, compression, and enhanced reasoning capabilities. To optimize transmission efficiency, Yan et al. [124] employ shared KBs to implement a generative encoding-decoding paradigm, in which transmitted indices and minimal residual data act as “prompts” that trigger high-fidelity content synthesis at the receiver. For robust semantic delivery, Hu et al. [125] introduce VQ-VAE-enabled codebooks to extract intrinsic features, reducing the statistical discrepancy between source messages and training examples and thereby mitigating the impact of semantic noise. Meanwhile, to further refine reasoning capabilities, Fan et al. [16] propose a GraphRAG-assisted subgraph extraction method, as illustrated in Figure 8. It identifies the minimum connected subgraph within a KG to provide precise context for semantic interpretation, significantly reducing bandwidth consumption. Additionally, Zhou et al. [126] apply KGs to logical error correction, ensuring the consistency of the transmitted information.

3.3 Distributed Autonomy and Collaboration Layer

To support massive connectivity and autonomous interaction of AI agents in future 6G networks, the traditional bit-level centralized communication architecture must evolve into a semantic-based distributed paradigm.
As summarized in Table 7, we investigate representative approaches for the distributed autonomy and collaboration layer, which include distributed access, knowledge collaboration, and resource scheduling.

Table 7 Representative approaches for the distributed autonomy and collaboration layer

| Category | Sub-category | Description |
|---|---|---|
| Distributed Access | MDMA | Allocates unique semantic encoder-decoder pairs or leverages orthogonal embeddings to decouple multi-agent access in the semantic space [41–43]. |
| Distributed Access | Semantic Fusion | Fuses multi-modal or multi-user semantic representations at the access level to resolve ambiguity and reduce transmission redundancy [127–129]. |
| Knowledge Collaboration | Federated Semantic Learning | Enables agents to collaboratively train and update semantic codec models in a resource-efficient and personalized manner without sharing raw data [130–132]. |
| Knowledge Collaboration | Multi-Agent Alignment | Facilitates semantic interoperability and background KB consensus among diverse autonomous agents via signaling games or curriculum learning [133–135]. |
| Knowledge Collaboration | Semantic Relaying | Employs intermediate nodes for progressive semantic feature computation, denoising, and forwarding over long distances [136–138]. |
| Knowledge Collaboration | Collaborative Inference | Partitions semantic extraction and task execution workloads among edge devices and servers for single-task or multi-task joint knowledge inference [44, 139, 140]. |
| Resource Scheduling | Query-Semantic Scheduling | Optimizes communication and computation resources jointly, driven top-down by specific downstream tasks or user query intents [141–143]. |
| Resource Scheduling | Semantic Importance-Aware Scheduling | Prioritizes the allocation of physical resources bottom-up based on the spatial-temporal importance of semantic features [144–146]. |
| Resource Scheduling | Adaptive Semantic Scheduling | Dynamically controls the semantic compression ratio and hybrid transmission modes in response to real-time channel fluctuations [147–149]. |

3.3.1 Distributed Access

By shifting from bit-level orthogonal multiplexing to semantic-level access, distributed access ensures efficient utilization of limited spectrum resources among massive numbers of agents. It mainly includes MDMA and semantic fusion.

Model Division Multiple Access. Traditional multiple access techniques partition physical resources across the time, frequency, or code domains, often leading to severe capacity bottlenecks and a pronounced “cliff effect” in dense agent communication networks. MDMA transforms this paradigm by multiplexing users directly in a high-dimensional semantic feature space. Zhang et al. [41] first establish the foundational MDMA framework, which allocates a unique semantic encoder-decoder pair to each user. This architecture rests on the assumption that different AI models possess natural separability, enabling a BS to distinguish superimposed signals by decoding them through user-specific models without requiring extra bandwidth. However, as the number of agents grows, overlapping semantic features can cause significant mutual interference. To address this, subsequent advances in MDMA aim to achieve strictly interference-free concurrency. For example, Orthogonal-MDMA (O-MDMA) leverages the inherent noise resilience of semantic models to suppress multi-user interference through structured subspace projections [42]. Furthermore, addressing the stringent bandwidth constraints and highly dynamic topologies of satellite-ground links, Cao et al. [43] propose Sensitivity-aware MDMA (S-MDMA), illustrated in Figure 9. This approach extracts and merges shared semantic features and applies a sensitivity-based sorting algorithm to retain only the most critical structural components.
By mapping shared and unique features into mutually orthogonal subspaces via Kronecker-based embedding, S-MDMA eliminates inter-user interference entirely. Simulation results demonstrate that it maintains a Peak Signal-to-Noise Ratio (PSNR) above 28 dB and a structural similarity index (SSIM) exceeding 0.95 even under severely degraded conditions.

Figure 9 System model of multiuser satellite-ground SemCom [43], illustrating the collaborative framework among the LEO satellite, semantic transmission links, and diverse ground terminals for efficient feature-based communication.

Semantic Fusion. Semantic fusion is a key access-level mechanism that synthesizes redundant or complementary data generated by multiple agents. By merging data before transmission, it resolves modal ambiguities and substantially reduces communication overhead over the air interface. Zhu et al. [127] develop a multi-modal fusion framework that integrates diverse sensory inputs, such as visual, auditory, and textual data, directly at the edge device. This integration enables the agent to transmit a unified, low-dimensional semantic representation capable of supporting multiple downstream tasks simultaneously. To further reduce ambiguity across modalities, Li et al. [128] introduce highly reliable cross-modal mapping mechanisms that ensure consistent semantic extraction by projecting heterogeneous data types into a shared latent space. In multi-agent scenarios, physical-layer fusion strategies can merge overlapping semantic features among neighboring agents. For example, Tong et al. [129] propose a multi-user semantic fusion strategy tailored for degraded broadcast channels.
By performing feature-level aggregation directly at the access point, this approach exploits the superposition property of wireless channels to combine highly correlated semantic elements, thereby eliminating redundant background information. Simulation results indicate that such collaborative fusion schemes can reduce the overall transmission overhead by up to 40% while preserving the semantic fidelity required for downstream perception tasks.

3.3.2 Knowledge Collaboration

Knowledge collaboration establishes a closed-loop lifecycle encompassing model consensus building, context alignment, physical-layer transmission, and joint execution. It enables isolated agents to form a cohesive, distributed intelligence network.

Federated Semantic Learning. Federated semantic learning enables agents to collaboratively train and update semantic codec models without sharing raw, privacy-sensitive data, forming the foundation of distributed semantic-level intelligence. By exchanging only model gradients or weights, agents can collectively build a generalized semantic representation space. As shown in Figure 10, Li et al. [130] propose a decentralized semantic federated learning architecture tailored for real-time public safety tasks, effectively avoiding the single point of failure and data centralization bottlenecks inherent in traditional cloud-based training. To mitigate the significant communication overhead of transmitting large neural network parameters over wireless channels, Liu et al. [131] introduce a resource-aware allocation and semantic extraction scheme specifically designed for federated semantic learning-empowered vehicular networks. Furthermore, to address the highly heterogeneous (non-IID) data distributions across different physical environments, Peng et al. [132] develop a personalized federated learning framework.
It integrates global knowledge aggregation with local micro-model fine-tuning, enabling agents to maintain customized local models that adapt effectively to specific environmental changes while still benefiting from collaborative intelligence.

Multi-Agent Alignment. Multi-agent alignment resolves discrepancies arising from disparate background KBs and is an essential prerequisite for accurate semantic encoding and decoding. Theoretical frameworks model this alignment process as signaling games, in which agents with correlated KBs negotiate and converge on a common semantic language through repeated interaction [133]. For more complex environments, Farshbafan et al. [134] employ reinforcement-based curriculum learning to gradually adapt the transmission strategies of goal-oriented agents, actively minimizing the semantic ambiguity caused by mismatched local contexts. In practical implementations, these mechanisms are realized through explicit KB synchronization protocols that ensure semantic interoperability. For instance, Rosic et al. [135] address semantic interoperability in autonomous maritime domains, proposing robust mechanisms to dynamically align the KGs of disparate marine agents. Simulation results demonstrate that, by aligning these distributed knowledge structures prior to execution, agents can achieve over a 30% improvement in semantic recovery accuracy and significantly reduce operational conflicts when navigating previously unseen collaborative environments.

Figure 10 The decentralized semantic federated learning framework [130], illustrating the integration of local semantic encoding at edge clients and centralized parameter aggregation at the base station for efficient data recovery.
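The weight-exchange step that underpins federated semantic learning can be sketched with plain federated averaging (FedAvg), in which an aggregator (for example, the base station in Figure 10) averages the clients' codec parameters, weighting each client by its local dataset size. This is a minimal Python sketch under simplifying assumptions: parameters are flattened into lists of floats, the function name `fedavg` is hypothetical, and none of the cited schemes [130–132] is claimed to use exactly this rule. Personalized variants such as [132] would additionally fine-tune the aggregated model on local data.

```python
from typing import List, Sequence

def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average flattened parameter vectors, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same model shape")
        weight = n / total
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Two agents upload local codec weights; their raw data never leave the devices.
agent_a = [1.0, 2.0]  # trained on 100 local samples
agent_b = [3.0, 4.0]  # trained on 300 local samples
print(fedavg([agent_a, agent_b], [100, 300]))  # [2.5, 3.5]
```

Only the parameter vectors cross the wireless link, which is why schemes such as [131] focus on compressing or scheduling exactly this upload.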
Figure 11 The SemRelay-aided semantic communication system [136], illustrating the uplink bit transmission from users to the SemRelay and the subsequent semantic transmission to the base station based on shared probability graphs.

Semantic Relaying. Semantic relaying transforms intermediate nodes into intelligent entities for progressive feature computation, denoising, and regeneration, thereby extending the coverage of knowledge exchange. Rather than simply amplifying and forwarding noisy analog signals, these relays actively decode, refine, and re-encode the semantic features to combat channel fading. As illustrated in Figure 11, Zhao et al. [136] use probabilistic graphs to model state transitions at semantic relays, proposing a joint communication and computation resource allocation scheme that maximizes cooperative reliability over multiple hops. Addressing the complexities of multi-user contention, Hu et al. [137] formulate a non-convex optimization problem to optimally divide bandwidth and computational resources among multiple users relying on a shared semantic relay. Additionally, Liu et al. [138] expand the optimization scope by jointly designing the physical placement of semantic relays and the associated bandwidth allocation.

Collaborative Inference. Collaborative inference partitions heavy semantic extraction and deep learning workloads between resource-constrained agents and powerful edge servers, transforming isolated processing into distributed joint reasoning. The paradigm has evolved from simple one-to-one splitting to complex multi-view aggregation. Lo et al. [139] initially investigate a device-edge collaboration scheme in which the agent executes the early layers of a neural network to extract lightweight semantic features, offloading the computationally intensive classification layers to the edge server.
Extending this to multi-agent settings, Shao et al. [140] propose a task-oriented communication framework for multi-device cooperative edge inference, in which the edge server aggregates spatially diverse, multi-view semantic features from distributed cameras or sensors to enhance global detection accuracy. Recent advances have led to frameworks that enable clusters of agents to extract generalized features for simultaneous, heterogeneous reasoning. Zhu et al. [44] introduce a cooperative and collaborative multi-task SemCom framework for distributed sources. It allows a cluster of agents to cooperatively extract generalized semantic features that simultaneously support diverse downstream reasoning tasks such as object detection and semantic segmentation, highlighting the efficiency of distributed multi-task execution.

3.3.3 Resource Orchestration and Scheduling

With access and collaboration mechanisms established, the network must optimally allocate its limited physical resources. In semantic-based agent communication networks, resource scheduling shifts fundamentally from maximizing bit rate to maximizing the effective delivery of meaning. This process follows three steps: defining the intent, evaluating content importance, and adapting to physical channel conditions. Representative methods include query-semantic scheduling, semantic importance-aware scheduling, and adaptive semantic scheduling.

Query-Semantic Scheduling. Query-semantic scheduling allocates computing and communication resources based explicitly on the diverse requirements and priorities of the downstream tasks requested by agents. Cai et al. [141] propose a query-aware semantic encoder-based resource allocation policy that dynamically schedules network slices by matching the extracted data stream characteristics to specific user query intents. Similarly, focusing on user-centric requirements, Yan et al.
[142] formulate a quality of experience (QoE)-based resource allocation algorithm for multi-task networks. It optimizes power and channel assignments to maximize subjective task satisfaction rather than objective bit rates. For mission-critical scenarios, Zeng et al. [143] develop a task-oriented SemCom scheme utilizing rate-splitting multiple access. Instead of treating all data equally, this approach focuses strictly on the effectiveness level of communication. Simulation results demonstrate that it can increase overall QoE scores by over 20% in real-time control applications while satisfying ultra-reliable and low-latency communication constraints.

Semantic Importance-Aware Scheduling. Once the task intent is defined, the scheduling logic transitions to a content-driven phase. Semantic importance-aware scheduling prioritizes physical resource allocation by evaluating the spatial-temporal value of extracted semantic features, ensuring that bandwidth and power are directed primarily toward critical information. To quantify this value, Wang et al. [144] introduce a feature importance-aware semantic transmission strategy that dynamically allocates power blocks by using attention mechanisms to calculate the contribution of each semantic symbol to final classification accuracy. From a temporal perspective, Chen and Gong [145] use the AoSI metric to design a multi-source scheduling algorithm that prioritizes the transmission of fresh and highly impactful semantic updates to prevent knowledge staleness. Furthermore, Wang et al. [146] employ an attention-based deep RL approach to automate this evaluation, enabling the BS to dynamically learn the hidden correlations between semantic importance distributions and channel conditions.

Adaptive Semantic Scheduling. Finally, scheduling strategies must contend with highly dynamic physical environments.
Adaptive semantic scheduling operates at the execution level, dynamically adjusting semantic compression ratios and switching transmission modes in response to real-time CSI fluctuations. Liu et al. [147] propose an adaptable semantic compression algorithm that optimally balances local computational latency against transmission delay by adjusting the dimension of semantic vectors based on the real-time SNR. For high-throughput applications, Zhu et al. [148] design an adaptive control scheme for volumetric video services, dynamically altering compression ratios to maintain visual QoE under severe bandwidth constraints. Moreover, recognizing that pure SemCom may not be optimal in high-SNR environments, Xia et al. [149] propose a hybrid bit/semantic communication optimization framework. This scheme allows the network to dynamically switch agents between traditional bit-level transmission and semantic extraction modes depending on the current channel quality. Such real-time adaptation ensures system resilience, providing robust and highly adaptable scheduling closure for the entire agent communication network even during severe channel fading events.

4 Semantic-enhanced Agentic AI Systems

This section summarizes how semantics enhance the four stages of agentic AI systems: perception, memory, reasoning, and action.

4.1 Semantic-based Perception Stage

Semantics enable the perception stage to accurately identify logically significant entities and their physical attributes from massive, noisy, and unstructured multi-modal signals. Table 8 illustrates how agents construct a high-dimensional semantic view of the physical world along four dimensions: semantic feature extraction and representation, task-oriented environmental sensing, semantic object grounding and tracking, and embodied semantic environment understanding.

Table 8 Representative approaches for the semantic-based perception stage

| Category | Sub-category | Description |
|---|---|---|
| Semantic Feature Extraction and Representation | Unified Multi-modal Embedding | Maps heterogeneous data into a shared semantic space to enable emergent alignment and cross-modal correlation [49, 150]. |
| Semantic Feature Extraction and Representation | High-level Symbolic Abstraction | Converts continuous features into structured symbols via self-supervised learning or 3D geometric-semantic fusion [47, 151]. |
| Task-Oriented Environmental Sensing | Goal-Conditioned Selective Attention | Dynamically filters perception streams based on current intent using learned queries or foveal sampling mechanisms [152, 153]. |
| Task-Oriented Environmental Sensing | Intent-Driven Anomaly Detection | Identifies semantic paradoxes or physical state warnings by checking consistency between perception and logical priors [154, 155]. |
| Semantic Object Grounding and Tracking | Zero-shot Object Localization | Achieves real-time mapping between linguistic symbols and visual entities in open-world scenarios [50, 156]. |
| Semantic Object Grounding and Tracking | Spatio-temporal Semantic Consistency | Maintains semantic identity across video frames using reasoning-embedded segmentation or temporal association [157, 158]. |
| Embodied Semantic Environment Understanding | Semantic Scene Graph Generation | Constructs global topological maps by injecting open-vocabulary semantics into 3D reconstructions [48, 159]. |
| Embodied Semantic Environment Understanding | World Model-based Predictive Perception | Infers future intent and environment evolution using world models or large-scale diffusion architectures [160, 161]. |

4.1.1 Semantic Feature Extraction and Representation

Semantic feature extraction and representation constitute the physical foundation of perception, aiming to map heterogeneous sensor signals into a unified, computable semantic space via deep neural networks, thereby achieving high compression and symbolization of information.
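One common way to learn such a unified space is contrastive alignment, the family of objectives that ImageBind-style joint embeddings build on: embeddings of paired samples from two modalities are pulled together, while unpaired ones are pushed apart. The Python sketch below computes a symmetric InfoNCE loss on toy, hand-written embeddings. The function names and the temperature of 0.07 are illustrative assumptions, and a real system would compute this loss on encoder outputs inside an autodiff framework rather than in pure Python.

```python
import math
from typing import List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(queries, keys, temperature: float = 0.07) -> float:
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Scaled cosine similarities between query i and every key.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]  # -log softmax probability of the true pair
    return total / len(q)

def symmetric_info_nce(image_emb, other_emb, temperature: float = 0.07) -> float:
    """Average of both matching directions (image->other and other->image)."""
    return 0.5 * (info_nce(image_emb, other_emb, temperature)
                  + info_nce(other_emb, image_emb, temperature))

images = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]  # audio clip i belongs to image i
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]  # pairing deliberately broken
print(symmetric_info_nce(images, audio_aligned) < symmetric_info_nce(images, audio_swapped))  # True
```

Minimizing this loss over many pairs is what drives the embeddings of different modalities into a shared, directly comparable space.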
Unified Multi-modal Embedding. Achieving a unified representation of heterogeneous perceptual information is essential for general cognitive capabilities. To bridge the gaps between modalities, the ImageBind framework proposed in [49] constructs a joint cross-modal embedding space through contrastive learning. By using images as a bridge, it enables agents to achieve emergent semantic alignment and multi-source correlation even between modalities for which no directly paired data exist. Furthermore, for complex dynamic interactions, Zhou et al. [150] design the Audio-Visual Segmenter (AVS) to enhance low-level feature fusion of the audio and visual modalities. It employs pixel-level synchronous attention to significantly improve the spatial localization and semantic binding of sound-producing entities in noisy environments. Such unified mechanisms transform multi-dimensional signals into compact symbolic representations while retaining critical information, providing robust input for subsequent decision-making.

High-level Symbolic Abstraction. Following initial feature extraction, agents must transform continuous feature vectors into high-level symbols with logical meaning. Without human labels, the DINOv2 model proposed in [151] extracts highly discriminative visual features through large-scale self-supervised pre-training. This scheme removes the dependence on human annotation and achieves robust symbolic representation of open-world entities. Additionally, given the importance of spatial dimensions in embodied interaction, Miao et al. [47] develop the OccDepth strategy, as illustrated in Figure 12. It integrates depth geometry with semantic labels, and this joint modeling facilitates a transition from 2D pixel-level analysis to 3D spatial semantic occupancy symbols. It not only enhances the structural integrity of perception but also provides a solid physical foundation for subsequent autonomous navigation and obstacle avoidance planning in 3D environments.

Figure 12 The pipeline of OccDepth for 3D semantic scene completion [47]. The framework leverages an OAD module for depth enhancement and transforms 2D visual features into 3D semantic representations for environmental perception.

4.1.2 Task-Oriented Environmental Sensing

Task-oriented environmental sensing emphasizes the active role of perception: agents dynamically allocate computational resources according to their current intent to selectively extract key semantic information and detect conflicts.

Goal-Conditioned Selective Attention. In physical environments with high levels of interference, agents must dynamically filter high-value information to reduce cognitive load. The Flamingo architecture proposed in [152] addresses this by introducing a Perceiver Resampler, in which a small set of learned queries cross-attends to the visual features and resamples them into a fixed number of visual tokens. This mechanism enables the agent to concentrate on task-relevant spatial and semantic dimensions while achieving substantial dimensionality reduction. In contrast, the Focus-Agent framework introduced by Zhang et al. [153] draws inspiration from biological vision by emulating the foveal sampling principles of the human retina. It equips agents with intent-driven, fine-grained local perception, allowing them to accurately capture high-value details during macro-level logical reasoning at minimal computational cost. This approach mimics human attention allocation strategies in task-specific contexts, significantly enhancing the perceptual autonomy and initiative of agents.

Intent-Driven Anomaly Detection. Ensuring consistency between environmental information and internal logic is vital, particularly for identifying anomalies that deviate from task intents. The Inner Monologue framework designed by Huang et al.
[154] employs LLMs to perform closed-loop logical reasoning on perceived semantics. By generating continuous internal feedback, the framework monitors conflicts between execution states and intents, identifying logical inconsistencies such as “missing target objects”. For embodied agents, Le et al. use datasets such as RflyMAD [155] to train models that enable unmanned aerial vehicles (UAVs) to perform semantic-level anomaly recognition and collision warning based on kinematic laws. Such intent-driven detection strengthens self-correction and elevates perception from passive state recording to active logical verification, marking a significant step toward high autonomy in embodied intelligence.

4.1.3 Semantic Object Grounding and Tracking

Semantic object grounding and tracking address the real-time mapping between abstract semantic symbols and concrete physical entities, enabling agents to maintain continuous and consistent attention on specific targets throughout complex mobile interactions.

Zero-shot Object Localization. A core challenge lies in binding abstract linguistic symbols to concrete physical entities in real time. Grounding DINO, proposed in [50], aligns features between natural language and open-domain visual entities through deep cross-attention mechanisms, allowing agents to precisely locate targets using only text prompts. To address temporal evolution in video streams, Ravi et al. [156] propose SAM 2, which extends zero-shot segmentation to interactive video scenarios by incorporating a memory attention module. This enables agents to continuously and accurately segment and track specific entities despite severe deformations, viewpoint changes, or brief occlusions, providing strong generalization and adaptability in unknown environments.
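The basic test behind open-vocabulary grounding, scoring candidate regions against a text prompt in a shared embedding space, can be illustrated with a deliberately simplified Python sketch. Grounding DINO itself fuses language and vision through deep cross-attention and regresses boxes directly; here, instead, region and prompt embeddings are assumed to be given (for example, by some vision-language encoder), the region identifiers and the 0.5 threshold are hypothetical, and matching is plain cosine similarity.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def ground_prompt(prompt_emb: Sequence[float],
                  regions: Dict[str, Sequence[float]],
                  threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (region_id, score) of the best-matching region, or (None, score) if below threshold."""
    if not regions:
        return None, 0.0
    best_id = max(regions, key=lambda rid: cosine(prompt_emb, regions[rid]))
    best_score = cosine(prompt_emb, regions[best_id])
    # Rejecting weak matches lets the agent answer "not present" in open-world scenes.
    return (best_id, best_score) if best_score >= threshold else (None, best_score)

regions = {"box_cup": [0.9, 0.1, 0.0], "box_door": [0.0, 1.0, 0.0]}
print(ground_prompt([1.0, 0.0, 0.0], regions)[0])  # box_cup
print(ground_prompt([0.0, 0.0, 1.0], regions)[0])  # None
```

Because the text prompt is just another embedding, new object categories can be queried without retraining, which is the essence of the zero-shot setting.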
Spatio-temporal Semantic Consistency. In dynamic environments, agents must maintain consistent semantic entity recognition to prevent identity loss due to perspective shifts or occlusions. The LISA framework presented in [157] integrates embedded LLM reasoning to help agents interpret complex implicit instructions and preserve semantic continuity over time, effectively providing a tracking pipeline with “reasoning memory”. Concurrently, Yang et al. [158] design the Track-Anything scheme, which combines advanced interactive segmentation with long-term association algorithms. From a visual-spatial perspective, it ensures stable tracking of core semantic entities even in complex scenes involving high-speed motion or non-rigid deformations. Maintaining such spatio-temporal consistency marks the agent’s evolution from fragmented instantaneous perception to robust, continuous semantic tracking.

4.1.4 Embodied Semantic Environment Understanding

Embodied semantic environment understanding concerns the agent’s grasp of overall scene logic, which guides long-term embodied interactive behaviors through the construction of topological scene graphs or the prediction of physical world evolution.

Semantic Scene Graph Generation. Advanced perception necessitates a shift from isolated entity recognition to macro-topological understanding. In the context of spatial-semantic fusion, ConceptFusion, proposed in [159], injects open-vocabulary semantics from foundation models into 3D reconstructions at the pixel level. This generates global graphs imbued with spatial logic, allowing agents to “read” functional attributes and object affordances. Furthermore, Rosinol et al. [48] propose S-Graph, a bottom-up mechanism for hierarchical scene graph generation.
By aggregating local geometric primitives and global topological semantics, it enables agents to construct multi-level environmental KBs autonomously in large-scale complex scenes. Such structured representations serve as a cornerstone for planning long-term, complex tasks.

World Model-based Predictive Perception. To endow agents with essential foresight, generative world model architectures have been increasingly applied to perception. The GAIA-2 autonomous driving model proposed in [160] identifies physical semantic patterns within massive trajectory data, enabling it to predict the future intents and paths of traffic participants and thereby significantly enhancing safety redundancy. Moreover, inspired by architectures such as Sora, Wang et al. [161] develop a world sensing scheme. Powered by complex diffusion models, it goes beyond understanding static features to simulate global physical evolution over subsequent moments. This represents a fundamental transformation of perception systems from passive recorders into active predictive engines capable of anticipating physical evolution, giving agents the ability to foresee future states and mitigate risks.

4.2 Semantic-based Memory Stage

The memory stage enables agents to break down information silos and achieve lifelong continuous learning. By transforming the transient, high-dimensional information captured in the perception stage into a highly structured, long-term queryable KB, semantic-based memory provides solid prior knowledge to support agents' long-term planning decisions. As illustrated in Table 9, we summarize representative approaches for the semantic-based memory stage, including hierarchical semantic memory structures, semantic retrieval and reasoning, memory evolution and knowledge update, and cognitive augmentation via memory.
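As a rough illustration of what a structured, long-term queryable KB can look like, the sketch below stores observations as (subject, relation, object) triples and answers both wildcard pattern queries and transitive containment queries, the kind of layered lookup a hierarchical scene graph supports (object in room, room on floor). The class `SemanticKB` and relation names such as `located_in` are hypothetical; practical systems would add embeddings, timestamps, and confidence scores.

```python
class SemanticKB:
    """Toy long-term semantic store of (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )

    def contained_in(self, container, relation="located_in"):
        """All entities transitively linked to `container` via `relation`."""
        found, frontier = set(), [container]
        while frontier:
            parent = frontier.pop()
            for s, r, o in self.triples:
                if r == relation and o == parent and s not in found:
                    found.add(s)
                    frontier.append(s)  # descend one level in the hierarchy
        return found

kb = SemanticKB()
kb.add("mug", "located_in", "kitchen")
kb.add("kitchen", "located_in", "floor_1")
kb.add("sofa", "located_in", "living_room")
kb.add("living_room", "located_in", "floor_1")
kb.add("mug", "affords", "grasp")
print(kb.query(relation="affords"))
print(sorted(kb.contained_in("floor_1")))
```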
4.2.1 Hierarchical Semantic Memory Structures

Semantic-based memory is systematically divided into working semantic memory for immediate logical processing and long-term knowledge consolidation for persistent learning.

Table 9 Representative approaches for the semantic-based memory stage
Category | Sub-Category | Descriptions
Hierarchical Semantic Memory Structures | Working Semantic Memory | Maintains short-term context through recurrent memory mechanisms or high-frequency observation buffers [51, 52].
Hierarchical Semantic Memory Structures | Long-term Knowledge Consolidation | Integrates experiences across tasks and prevents catastrophic forgetting via generative replay or systematic knowledge banks [55, 162].
Semantic Retrieval and Reasoning | Vector-based Similarity Search | Enables high-speed ANN search in high-dimensional semantic spaces using HNSW indexing and product quantization [163, 164].
Semantic Retrieval and Reasoning | Logic-driven Associative Retrieval | Activates implicit knowledge through multi-hop KG-RAG reasoning or relational memory-based attention [56, 165].
Memory Evolution and Knowledge Update | Online Knowledge Base Updating | Synchronizes internal logic with physical reality via incremental scene graphs and LLM-based self-correction [154, 166].
Memory Evolution and Knowledge Update | Semantic Forgetting and Pruning | Optimizes storage efficiency through utility-based importance scoring or semantic compaction into abstract vectors [52, 53].
Cognitive Augmentation via Memory | Contextual Augmentation for Reasoning | Enhances current decisions via self-reflective retrieval-generation or dual instruction tuning [57, 167].
Cognitive Augmentation via Memory | Experience-driven Planning | Optimizes future action sequences through failure-based self-reflection or reasoning-as-planning search [54, 168].

Working Semantic Memory. Working semantic memory serves as the agent's “cache” during active tasks, maintaining short-term contextual consistency.
To mitigate information loss in long sequences, the RMT architecture proposed in [51] introduces recurrent memory tokens, enabling agents to retain critical transient semantics across sequences of up to a million tokens. Additionally, for high-frequency semantic fragments, Park et al. [52] develop the memory stream, an efficient experiential buffering mechanism. Mimicking biological short-term memory, it allows agents to extract features directly from the cache during complex interactions without repeatedly accessing long-term storage, thereby enabling low-latency responses. This hierarchical caching significantly improves both reaction speed and logical consistency in real-time tasks.

Long-term Knowledge Consolidation. Long-term memory transforms fragmented experiences accumulated over the agent's lifecycle into systematic common sense. As illustrated in Figure 13, the Mem0 framework proposed in [55] represents a notable advance in this direction, enabling robust cross-session and cross-task memory integration that builds personalized semantic assets over time. To address catastrophic forgetting, Shin et al. [162] design the deep generative replay architecture, which trains a “scholar” model to replay historically valuable semantic information. This allows agents to consolidate existing knowledge while acquiring new skills without relying on stored raw data. By balancing knowledge acquisition and retention, this approach enables the agent to evolve from a simple executor into a lifelong learner.

4.2.2 Semantic Retrieval and Reasoning

Semantic retrieval and reasoning investigate how agents accurately extract highly relevant knowledge from vast historical memory repositories, thereby achieving on-demand, instant activation.
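To connect the working-memory cache and long-term consolidation described above with on-demand retrieval, the sketch below keeps a small FIFO working buffer, moves evicted items into a long-term store, and retrieves from that store by exact top-k cosine similarity. The class `TwoTierMemory`, the capacity, and the toy 2-d embeddings are hypothetical, and simple FIFO eviction stands in for the far richer consolidation of systems such as Mem0. The linear scan in `recall` is the exact search that indices such as HNSW and Faiss approximate at scale.

```python
import math
from collections import deque

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TwoTierMemory:
    """Working memory (small FIFO cache) backed by a long-term store."""

    def __init__(self, working_capacity=3):
        self.capacity = working_capacity
        self.working = deque()   # most recent (text, embedding) items
        self.long_term = []      # consolidated (text, embedding) items

    def observe(self, text, embedding):
        self.working.append((text, embedding))
        if len(self.working) > self.capacity:
            # The oldest item leaves the cache and is consolidated long-term
            self.long_term.append(self.working.popleft())

    def context(self):
        """Recent items, available without touching long-term storage."""
        return [text for text, _ in self.working]

    def recall(self, query_emb, k=2):
        """Exact top-k cosine retrieval over the long-term store (linear scan)."""
        ranked = sorted(self.long_term,
                        key=lambda item: cosine(item[1], query_emb),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

memory = TwoTierMemory(working_capacity=2)
memory.observe("door is red", [1.0, 0.0])
memory.observe("cup on table", [0.0, 1.0])
memory.observe("robot charged", [0.7, 0.7])
memory.observe("hall is dark", [1.0, 0.1])
print(memory.context())
print(memory.recall([1.0, 0.0], k=1))
```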
Vector-based Similarity Search. To match massive amounts of unstructured knowledge in practice, agents rely heavily on efficient vector retrieval. Malkov et al. [163] propose the Hierarchical Navigable Small World (HNSW) algorithm, which employs multi-layer graph indices to enable fast Approximate Nearest Neighbor Search (ANNS) in billion-scale vector spaces, effectively breaking latency bottlenecks in long-term memory access. For large-scale deployment, the Faiss framework proposed in [164] further accelerates search through GPU-parallel computing and reduces the memory footprint through product quantization. Intent-weighting strategies can additionally be layered on top of such indices so that retrieved historical experiences are not only mathematically similar to queries but also logically aligned with the current functional execution context. Such efficient retrieval serves as the technological foundation for on-demand knowledge activation.

Figure 13 Architectural overview of the Mem0 system [55], illustrating the semantic extraction and memory update phases. The system processes historical context to generate new memories, which are then evaluated against existing records and refined via a Tool Call mechanism before being stored in the central database.

Logic-driven Associative Retrieval. Unlike simple spatial distance comparisons, associative retrieval emphasizes deep logical connections between knowledge fragments. Sanmartin et al. [56] integrate KG topologies with retrieval-augmented generation (RAG), enabling agents to perform multi-hop reasoning along semantic edges and retrieve latent knowledge points. Similarly, relational memory networks proposed in [165] construct dynamic associative matrices via self-attention mechanisms.
Rather than relying on fixed physical memory addressing, this approach computes interactions between memory units in real time through attention. It simulates the brain's non-linear associative process, which is driven by semantic relatedness rather than physical distance. This logic-driven mechanism improves retrieval precision and equips agents with superior common-sense integration and associative reasoning capabilities in complex contexts.

4.2.3 Memory Evolution and Knowledge Update

Memory systems must function as dynamic, living structures rather than static repositories. Memory evolution and knowledge update examine how agents maintain accuracy, timeliness, and retrieval efficiency in large-scale KBs under limited computational resources, through continuous online streaming learning and selective pruning of redundant information.

Online Knowledge Base Updating. Given the rapid changes in physical environments, agents must continuously adjust their internal logical structures to remain relevant. Hughes et al. [166] propose the Hydra framework, which enables robots to update semantic nodes and weights dynamically from streaming data without triggering global restructuring, keeping the KB synchronized with real-world conditions. When perceptual input conflicts with existing experience, the Inner Monologue mechanism proposed in [154] provides a closed-loop error-correction tool. It leverages LLMs as logical discriminators to weigh perceptual confidence against memory consistency, autonomously correcting erroneous entries and updating obsolete task knowledge. This online evolution enables agents to adapt flexibly to environmental changes and achieve real-time cognitive updates during long-term deployment.

Semantic Forgetting and Pruning. To prevent memory overload caused by redundant information, intelligent semantic forgetting and compression mechanisms are essential.
The generative agent framework proposed in [52] employs a utility-decay model that scores semantic fragments by importance, relevance, and recency, thereby filtering out low-value data and keeping the KB lightweight. Alternatively, the LongMem architecture proposed in [53] explores semantic compaction, using memory-augmented Transformers to encode concrete experiences into abstract, compact knowledge vectors. This preserves core logical content while significantly reducing computational overhead and cache burden during long-term storage and retrieval. Together, these pruning and compression mechanisms draw inspiration from biological forgetting principles, substantially improving agent efficiency in large-scale, long-duration tasks.

4.2.4 Cognitive Augmentation via Memory

By seamlessly integrating historical experiences into current reasoning pipelines and future execution blueprints, agents can achieve highly context-aware and foresighted decision-making.

Contextual Augmentation for Reasoning. The core function of contextual augmentation is to enhance ongoing reasoning and decision-making. The Self-RAG framework proposed by Asai et al. [57] introduces “reflection tokens”, enabling agents to determine autonomously when retrieval is needed, assess the quality of retrieved information, and ensure factual accuracy, thus improving decision rigor and self-consistency. Building on this, Lin et al. [167] propose RA-DIT, which enables agents to integrate background knowledge to resolve perceptual ambiguity. This allows agents to disambiguate incomplete or noisy perceptual inputs through semantic reasoning and arrive at globally optimal judgments. This shift from passive perception to augmented reasoning makes agents more robust and rational when confronting unknown challenges.
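The decisions that Self-RAG makes with reflection tokens (whether to retrieve, whether a passage is relevant, which candidate answer is best supported) can be sketched as plain control flow. In the sketch below those decisions come from crude hypothetical heuristics (`needs_retrieval`, word `overlap`) rather than a trained model, and `toy_generate` is a hypothetical stand-in for the LLM. It shows only the shape of the decision process, not Self-RAG's actual training or token vocabulary.

```python
# Hypothetical heuristic critics standing in for Self-RAG's learned reflection tokens.

def needs_retrieval(query):
    """Stand-in for the Retrieve decision: factual wh-questions trigger retrieval."""
    return query.lower().startswith(("who", "what", "when", "where"))

def overlap(query, passage):
    """Crude relevance score: number of shared lower-cased words."""
    q = set(query.lower().strip("?.").split())
    p = set(passage.lower().strip("?.").split())
    return len(q & p)

def self_rag_answer(query, corpus, generate, min_overlap=2):
    """Decide whether to retrieve, filter passages, and keep the best-supported answer."""
    if not needs_retrieval(query):
        return generate(query, None)          # parametric knowledge only
    # Keep passages judged relevant (stand-in for the IsRel critique)
    relevant = [p for p in corpus if overlap(query, p) >= min_overlap]
    if not relevant:
        return generate(query, None)
    # One candidate per relevant passage, ranked by the same crude score
    # (stand-in for Self-RAG's IsSup/IsUse critiques of each candidate)
    candidates = [(overlap(query, p), generate(query, p)) for p in relevant]
    return max(candidates, key=lambda c: c[0])[1]

def toy_generate(query, passage):
    """Hypothetical generator: grounds the answer in the passage when one is given."""
    if passage is None:
        return f"[parametric] {query}"
    return f"[grounded] {passage}"

corpus = [
    "Alan Turing proposed the Turing test in 1950.",
    "Paris is the capital of France.",
]
print(self_rag_answer("Who proposed the Turing test?", corpus, toy_generate))
print(self_rag_answer("Summarize our plan.", corpus, toy_generate))
```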
Experience-driven Planning. By learning from historical successes and failures, agents can achieve foresighted long-term task planning. The Reflexion architecture proposed in [54] endows agents with self-reflection capabilities, enabling them to analyze past failures and optimize future actions without human intervention, thereby avoiding repeated errors. Additionally, search-based planners such as RAP, proposed in [168], repurpose the LLM as an internal world model and plan with Monte Carlo tree search. Using the LLM as a sandbox, they simulate state transitions to distill high-value reasoning paths, which can then serve as templates for new complex tasks. This experience-driven planning establishes a closed-loop learning process, significantly improving both execution efficiency and success rates in complex physical tasks.

4.3 Semantic-based Reasoning Stage

In the reasoning stage, semantics enable agents to perform analysis, planning, and logical deduction by integrating perceived information with stored knowledge to produce actionable decisions. As illustrated in Table 10, we review representative approaches across five dimensions: CoT reasoning, KG-augmented reasoning, retrieval-augmented reasoning, tree-structured multi-path reasoning, and neuro-symbolic reasoning.

4.3.1 Chain-of-Thought Reasoning

CoT reasoning enables agents to generate step-by-step intermediate reasoning, thereby decomposing complex problems into manageable components. This approach significantly improves performance on tasks that require multi-step logical deduction. Representative methods include step decomposition and path voting.

Step Decomposition. Standard CoT prompting improves LLM reasoning by instructing models to articulate intermediate reasoning steps before delivering a final answer. Wei et al.
[169] demonstrate that simply prompting models to reason step by step yields substantial performance gains across arithmetic, commonsense, and symbolic reasoning benchmarks, with the most pronounced improvements observed at larger model scales. Extending this paradigm to zero-shot settings, Kojima et al. [170] show that appending the phrase “Let’s think step by step” suffices to elicit robust multi-step reasoning without task-specific exemplars, establishing structured intermediate reasoning as a general and transferable capability. Furthermore, Zhou et al. [171] introduce least-to-most prompting, which decomposes a problem into a sequence of progressively simpler sub-problems and solves them in order.

Path Voting. Single-path CoT is susceptible to the brittleness of greedy decoding, wherein an early error propagates and amplifies through subsequent steps. Wang et al. [172] address this limitation by sampling diverse reasoning paths and selecting the final answer via majority vote. This approach improves over greedy CoT by more than 17 percentage points on GSM8K, with smaller but consistent gains on other benchmarks such as StrategyQA. To further enhance the quality of sampled reasoning chains before voting, Fu et al. [173] propose complexity-based prompting, which biases sampling toward chains with greater reasoning depth and substantially outperforms uniform random sampling on multi-step benchmarks.

Table 10 Representative approaches for the semantic-based reasoning stage
Category | Sub-Category | Descriptions
Chain-of-Thought Reasoning | Step Decomposition | Elicits step-by-step intermediate reasoning from LLMs, including in zero-shot settings without task-specific exemplars [169, 170]; least-to-most prompting further enables progressive sub-problem decomposition [171].
Chain-of-Thought Reasoning | Path Voting | Samples diverse reasoning paths and selects answers by majority vote [172]; complexity-based prompting biases sampling toward higher-quality reasoning chains [173].
KG-Augmented Reasoning | Graph-LM Fusion | Fuses LM representations with GNN propagation over KG subgraphs [174]; GreaseLM extends this with multi-layer interleaved LM-graph fusion [175].
KG-Augmented Reasoning | Iterative KG Query | Decomposes queries into sequential structured reading operations over KGs [58]; Graph of Thoughts further enables non-linear multi-path reasoning [176].
Retrieval-Augmented Reasoning | Retrieval-Conditioned Generation | Combines parametric generation with dense dual-encoder retrieval over large document indices to inject up-to-date external knowledge [177, 178].
Retrieval-Augmented Reasoning | Multi-Doc Fusion | Pre-trains retrieval end-to-end with the language model [179]; Fusion-in-Decoder aggregates evidence from multiple retrieved passages at the decoder [180].
Tree-Structured Multi-Path Reasoning | Tree Search | Frames reasoning as deliberate tree search with backtracking [181]; LATS integrates MCTS with real-environment observations for dynamic replanning [182].
Tree-Structured Multi-Path Reasoning | Process Reward | Assigns dense step-level scores to intermediate reasoning steps [59]; process-based feedback yields more reliable intermediate reasoning than outcome supervision [183].
Neuro-Symbolic Reasoning | Program-Aided Reasoning | Offloads computation to a program interpreter [184]; Program of Thoughts further decouples multi-step numerical reasoning from semantic parsing [185].
Neuro-Symbolic Reasoning | Logic-Symbolic Execution | Translates problems into formal logical representations executed by symbolic solvers [186]; NS-CL jointly learns visual concepts and symbolic reasoning under weak supervision [187].

4.3.2 KG-Augmented Reasoning

KG-augmented reasoning grounds inference in structured relational knowledge, allowing agents to reason over factual information that extends beyond the static knowledge encoded in model parameters.
When agents must operate under rapidly evolving conditions, such as real-time network topology changes or emerging cross-domain task requirements, access to external structured knowledge becomes essential for maintaining reasoning accuracy and consistency.

Graph-LM Fusion. Graph-LM fusion integrates language model (LM) representations with graph neural network (GNN) propagation to ground textual reasoning more firmly in structured KGs. As illustrated in Figure 14, Yasunaga et al. [174] construct a working graph by linking question-answer entities to relevant KG subgraphs and perform bidirectional message passing between LM contextual representations and GNN node embeddings. This allows relational structure to modulate text-based reasoning directly, rather than treating the KG as a static lookup table. Extending this integration to achieve tighter cross-modal fusion, Zhang et al. [175] propose GreaseLM, which inserts dedicated interaction layers at multiple intermediate encoder stages. This design enables structured relational context to refine token representations progressively throughout encoding, yielding consistent improvements on commonsense and biomedical reasoning benchmarks.

Figure 14 Overview of the QA-GNN reasoning framework [174], illustrating the joint graph construction by connecting the LM-encoded QA context with the retrieved knowledge graph to perform context-conditioned reasoning and probability scoring.

Iterative KG Query. Complex queries often require synthesizing information across multiple heterogeneous structured sources, making iterative interaction essential. Jiang et al. [58] introduce StructGPT, which provides LLMs with a unified interface of specialized reading functions covering KGs, relational tables, and document databases.
The framework decomposes complex queries into sequential sub-operations that alternately read from these sources and update the reasoning state. To further support non-linear reasoning, Besta et al. [176] propose the Graph of Thoughts (GoT) framework, which represents reasoning outputs as nodes and their logical interdependencies as directed edges in an arbitrary graph topology, enabling aggregation of partial results, iterative node refinement, and non-linear backtracking for complex multi-agent task planning.

4.3.3 Retrieval-Augmented Reasoning

Retrieval-augmented reasoning addresses the scalability limitations of static KGs by retrieving relevant evidence on demand from large unstructured corpora, enabling agents to remain current with continuously expanding information environments.

Retrieval-Conditioned Generation. RAG conditions a parametric sequence-to-sequence model on retrieved passages to inject external knowledge into the generation process. Lewis et al. [177] formalize this paradigm by retrieving the top-k passages from a dense document index and conditioning generation on these results, achieving state-of-the-art performance on multiple open-domain question answering benchmarks while retaining the ability to update knowledge without full model retraining. To substantially improve retrieval quality, Karpukhin et al. [178] introduce Dense Passage Retrieval (DPR), which replaces sparse Term Frequency-Inverse Document Frequency (TF-IDF) retrieval with dense dual-encoder representations trained on question-passage pairs via contrastive learning, consistently outperforming BM25 baselines by large margins.

Multi-Doc Fusion. A fundamental limitation of post-hoc retrieval is that model representations are not inherently trained to benefit from retrieved evidence. Guu et al.
[179] address this gap through Retrieval-Augmented Language Model pre-training (REALM), which embeds a latent knowledge retriever directly into the masked language model pre-training objective. This approach trains the retriever end-to-end alongside the language model backbone, jointly optimizing representations for both language understanding and knowledge retrieval. For settings requiring multi-document evidence synthesis, Izacard and Grave [180] propose Fusion-in-Decoder (FiD), which processes each retrieved passage independently through a shared encoder and fuses all representations collectively at the decoder via cross-attention. It substantially outperforms single-passage conditioning and establishes a robust evidence-integration mechanism for complex semantic reasoning tasks.

4.3.4 Tree-Structured Multi-Path Reasoning

Tree-structured multi-path reasoning reformulates reasoning as a deliberate tree search, overcoming the fundamental limitation of linear reasoning: each step commits irrevocably to a single direction, with no mechanism for recovery from early errors.

Tree Search. Tree of Thoughts (ToT) enables agents to explore multiple competing reasoning paths through systematic tree search. Yao et al. [181] design a framework in which, at each node, the model generates multiple candidate thought continuations, evaluates their promise using self-evaluation heuristics, and applies Breadth-First Search (BFS) or Depth-First Search (DFS) to navigate the reasoning space, substantially outperforming both standard CoT and self-consistency decoding, with the performance gap widening as problem difficulty increases. To unify tree-structured reasoning with interactive environment feedback, Zhou et al. [182] propose Language Agent Tree Search (LATS), which integrates Monte Carlo Tree Search (MCTS) with LLM-based state evaluation and real-world observations.
LATS enables systematic backtracking, look-ahead value estimation, and dynamic replanning in response to execution outcomes.

Process Reward. Tree search introduces a critical challenge: unreliable heuristics for evaluating candidate branches can lead to systematic exploration of low-quality subtrees. Lightman et al. [59] address this through Process Reward Models (PRMs), which assign dense scalar scores to individual intermediate reasoning steps. Through large-scale human annotation, they demonstrate that step-level supervision substantially outperforms outcome-supervised models in best-of-N candidate selection. Offering a complementary perspective, Uesato et al. [183] systematically compare process-based and outcome-based feedback, showing that although outcome-based supervision can reach similar final-answer error rates, process-level supervision (or reward models that emulate it) is needed to substantially reduce errors in the intermediate reasoning steps.

4.3.5 Neuro-Symbolic Reasoning

Neuro-symbolic reasoning addresses the inherent unreliability of token-level computation for arithmetic and logical tasks by delegating formal computation to deterministic external systems while retaining LLMs for natural language comprehension and high-level semantic understanding.

Program-Aided Reasoning. Program-Aided Language models (PAL) separate semantic understanding from formal computation by offloading symbolic operations to a deterministic program interpreter. Gao et al. [184] instruct LLMs to translate natural-language reasoning problems into executable Python programs, achieving state-of-the-art results on mathematical and symbolic reasoning benchmarks by eliminating arithmetic errors and logic inversions that scaling or prompting alone cannot resolve. Extending this to more expressive computation structures, Chen et al.
[185] propose Program of Thoughts (PoT) prompting, which generates programs containing explicit iterative and recursive computation patterns to decouple complex multi-step arithmetic from semantic parsing, achieving further gains on financial, scientific, and mathematical benchmarks.

Logic-Symbolic Execution. While program synthesis handles numerical computation effectively, it does not enforce logical consistency across multi-step relational reasoning. Pan et al. [186] propose Logic-LM, which translates natural-language problems into formal logical representations, including first-order logic and constraint satisfaction problems, and submits them to dedicated symbolic solvers for deterministic execution. An LLM-based self-refinement loop iteratively corrects translation errors based on solver feedback. Complementing this scheme, Mao et al. [187] propose the Neuro-Symbolic Concept Learner (NS-CL), which jointly learns visual concepts, semantic language parsing, and symbolic program execution within a unified architecture under weak supervision, achieving strong systematic generalization by disentangling perceptual primitive learning from compositional reasoning.

4.4 Semantic-based Action Stage

Traditional action methods rely heavily on rigid API formats, where even minor deviations in an instruction can cause system failure. In contrast, semantic-enhanced agents understand user intent and automatically map it to the required API parameters. Moreover, if an error occurs during execution, semantic analysis can diagnose the cause and adjust the strategy in real time. As shown in Table 11, we review representative approaches across five dimensions: semantic tool acquisition, reasoning-action interleaving, multi-agent collaborative action, semantic self-correction, and reinforcement-based semantic feedback.
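The contrast drawn above between rigid API formats and semantic diagnosis can be sketched as a validate-then-repair loop: a tool call proposed as JSON is checked against a parameter schema, and any diagnostic messages are fed back to the proposer for another attempt. The tool `get_forecast`, its schema, and the scripted `scripted_propose` function (a stand-in for an LLM) are hypothetical; production systems would more likely validate against a full JSON Schema and pass the diagnostics back through the model's prompt.

```python
import json

# Hypothetical tool schema: parameter name -> expected Python type
SCHEMA = {"name": "get_forecast", "params": {"city": str, "days": int}}

def validate(call_json, schema):
    """Parse a model-produced tool call and list every problem found."""
    try:
        call = json.loads(call_json)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return None, ["tool call must be a JSON object"]
    errors = []
    if call.get("name") != schema["name"]:
        errors.append(f"unknown tool {call.get('name')!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return call, errors + ["'arguments' must be a JSON object"]
    for param, expected in schema["params"].items():
        if param not in args:
            errors.append(f"missing parameter {param!r}")
        elif not isinstance(args[param], expected):
            errors.append(f"parameter {param!r} should be {expected.__name__}")
    return call, errors

def call_with_repair(propose, schema, max_rounds=3):
    """Ask `propose` for a call; feed validation errors back until it is valid."""
    feedback = []
    for _ in range(max_rounds):
        call, errors = validate(propose(feedback), schema)
        if not errors:
            return call
        feedback = errors  # diagnostics become context for the next proposal
    raise RuntimeError(f"no valid call after {max_rounds} rounds: {feedback}")

def scripted_propose(feedback):
    """Stand-in for an LLM: first emits a string-typed 'days', then fixes it."""
    if not feedback:
        return '{"name": "get_forecast", "arguments": {"city": "Beijing", "days": "3"}}'
    return '{"name": "get_forecast", "arguments": {"city": "Beijing", "days": 3}}'

print(call_with_repair(scripted_propose, SCHEMA))
```

Returning every error at once, rather than stopping at the first, gives the proposer a complete diagnosis per round, which keeps the number of repair rounds small.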
4.4.1 Semantic Tool Acquisition

Semantic tool acquisition enables agents to determine autonomously when and how to invoke external tools, replacing manually crafted rules with learned tool selection and invocation.

Table 11 Representative approaches for the semantic-based action stage
Category | Representative Schemes | Descriptions
Semantic Tool Acquisition | Toolformer | Trains LLMs to decide autonomously when and how to call external tools via self-supervised annotation filtering, without large-scale human annotation [188].
Semantic Tool Acquisition | ToolLLM | Constructs a repository of 16,000+ real-world REST APIs and trains LLMs via DFSDT inference to invoke previously unseen APIs through instruction tuning [189].
Reasoning-Action Interleaving | ReAct | Interleaves verbal reasoning traces and external environment actions within the LLM output stream, enabling dynamic replanning based on real-time observation feedback [190].
Reasoning-Action Interleaving | HuggingGPT | Uses an LLM as a controller to parse requests, select expert models from a model hub, execute them in dependency order, and synthesize results for complex multi-modal tasks [191].
Multi-Agent Collaborative Action | MetaGPT | Encodes software engineering workflows as structured SOPs and assigns role-specific agents to execute them collaboratively, substantially reducing incoherence in multi-LLM outputs [192].
Multi-Agent Collaborative Action | Generative Agents | Equips LLM-driven agents with persistent memory streams, reflection synthesis, and retrieval-based planning, enabling emergent social behaviors across extended multi-agent interactions [52].
Semantic Self-Correction | Reflexion | Reinforces language agents via verbal self-assessment stored in episodic memory and prepended to subsequent task contexts, without any gradient updates [54].
Semantic Self-Correction | Self-Refine | Employs a single LLM as generator, critic, and refiner in an iterative loop, consistently improving outputs across diverse tasks without additional training [60].
Reinforcement-Based Semantic Feedback | InstructGPT | Aligns LLMs with human preferences via supervised fine-tuning on demonstrations, reward model training on pairwise rankings, and PPO optimization [193].
Reinforcement-Based Semantic Feedback | Recursive Reward Modeling | Applies RLHF to long-form summarization, demonstrating that learned reward models can supervise tasks exceeding direct human evaluation capacity [194].

Toolformer. Toolformer [188] allows LLMs to learn when and how to call external tools through self-supervised learning. It first prompts the model to propose candidate API calls inserted into generated text, then retains only those calls whose returned results measurably reduce the language modeling loss on the surrounding context. Fine-tuning on this self-filtered dataset teaches the model to recognize when a tool call is genuinely beneficial rather than merely syntactically plausible. Evaluated with calculators, calendars, search engines, and translation APIs, Toolformer achieves competitive zero-shot performance without task-specific training.

ToolLLM. Toolformer is trained on a small, curated set of APIs and does not scale to the breadth of tools encountered in real-world deployments. As depicted in Figure 15, Qin et al. [189] address this scalability challenge by constructing ToolBench, a large-scale instruction-tuning dataset spanning over 16,000 real-world REST APIs across 49 categories. To handle the combinatorial difficulty of selecting and sequencing calls across this tool space, ToolLLM introduces a depth-first search-based decision tree (DFSDT) inference strategy that explores multiple tool-call paths and backtracks from failed branches, an explicit search mechanism that mirrors tree-structured reasoning. The resulting model generalizes effectively to unseen APIs at inference time.
In semantic-based agent communication networks serving heterogeneous vertical industries, this zero-shot generalization capability is essential for application agents that interface with industry-specific network management APIs without per-API fine-tuning.

4.4.2 Reasoning-Action Interleaving

Reasoning-action interleaving dissolves the boundary between internal planning and external execution, allowing agents to replan dynamically based on real-time environmental feedback.

Figure 15 Overview of the ToolBench pipeline [189], illustrating the three-phase construction process of data annotation and the dual-stage inference workflow in which an API retriever provides context for ToolLLaMA to execute multi-round reasoning.

ReAct. ReAct [190] interleaves verbal reasoning traces and environment-facing actions within the same token stream, enabling dynamic replanning in response to new observations. The model alternates between producing natural-language reasoning steps and issuing tool calls or environment queries, with each observation immediately informing the next reasoning step. Experimental results show that ReAct substantially outperforms both reasoning-only and acting-only baselines, and the transparency of its interleaved traces facilitates straightforward failure diagnosis.

HuggingGPT. While ReAct operates as a single-agent system, it is limited when tasks require the coordination of multiple specialized capabilities. Shen et al. [191] address this limitation by introducing HuggingGPT, which employs an LLM as a meta-controller. The controller decomposes user requests into structured task plans, selects the most appropriate expert model from a model hub for each subtask, executes them in dependency order, and synthesizes their outputs into a unified response.
With this design, HuggingGPT effectively handles complex cross-modal tasks involving vision, speech, video, and language that no single model can manage alone.

4.4.3 Multi-Agent Collaborative Action
Multi-agent collaborative action shifts the fundamental unit of action from a single agent to a coordinated team. It addresses the coherence challenge of ensuring that multiple agents working in parallel produce mutually consistent outputs and collectively advance toward shared goals.

MetaGPT MetaGPT [192] encodes established human collaborative workflows as structured communication protocols that govern role-specific agent behavior. Specifically, it formalizes software engineering standard operating procedures (SOPs) into structured message schemas. A product manager agent decomposes requirements, an architect agent produces system designs, engineer agents implement components, and a tester agent validates outputs. All of these agents communicate through structured schemas rather than free-form natural language. Experimental results show that MetaGPT substantially reduces hallucination and incoherence compared with unconstrained multi-agent baselines.

Generative Agents MetaGPT relies on pre-specified protocols tailored to a specific task domain. Achieving coherent collaborative behavior over extended timescales without such protocols is a more general challenge. Park et al. [52] address it through a sandbox simulation in which 25 LLM-driven agents interact over an extended period. Each agent is equipped with three components:
• a memory stream that logs experiences as natural-language observations;
• a reflection mechanism that periodically synthesizes high-level insights from retrieved memories;
• a planning module that translates those insights into behavioral intentions.
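Park et al. rank memories for retrieval by combining recency, importance, and relevance. The sketch below follows that idea with equal weights and min-max normalization, but it is a simplified stand-in. A bag-of-words cosine similarity replaces learned embeddings, and importance scores are supplied directly instead of being rated by an LLM. The decay factor of 0.995 per hour reflects our reading of the original paper and should be treated as an assumption. The memories and the query are invented for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    importance: float   # 1-10 score (rated by the LLM in the original work)
    last_access: float  # sandbox time, in hours, when last retrieved


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def min_max(xs: List[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def retrieve(stream: List[Memory], query: str, now: float,
             k: int = 2, decay: float = 0.995) -> List[str]:
    """Rank memories by recency + importance + relevance (equal weights)."""
    recency = min_max([decay ** (now - m.last_access) for m in stream])
    importance = min_max([m.importance for m in stream])
    relevance = min_max([cosine(m.text, query) for m in stream])
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    order = sorted(range(len(stream)), key=lambda j: scores[j], reverse=True)
    return [stream[j].text for j in order[:k]]


stream = [
    Memory("ate breakfast at the cafe", importance=2, last_access=47.0),
    Memory("planning a valentine party at the cafe", importance=8, last_access=30.0),
    Memory("painted a picture in the park", importance=4, last_access=10.0),
]
top = retrieve(stream, "who is coming to the party", now=48.0)
print(top)  # ['planning a valentine party at the cafe', 'ate breakfast at the cafe']
```

The party-planning memory ranks first despite being older, because its importance and relevance outweigh recency. The breakfast memory ranks second mainly because it is the most recent.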
Despite the absence of explicit coordination instructions, the generative agents in this simulation exhibit emergent relationship formation, information propagation, and collaborative event organization.

4.4.4 Semantic Self-Correction
Semantic self-correction enables agents to diagnose execution failures, extract generalizable lessons, and improve subsequent attempts without modifying model weights, thereby supporting autonomous operation in resource-constrained environments.

Reflexion Reflexion [54] introduces a form of verbal reinforcement learning that enables agent self-improvement without gradient updates. After each task attempt, the agent generates a written self-assessment identifying failure points and improvement strategies. It stores this reflection in an episodic memory buffer and prepends it to the context in subsequent attempts. On ALFWorld, HotpotQA, and HumanEval, Reflexion substantially outperforms baseline agents. On the HumanEval programming benchmark, it even surpasses the previously reported GPT-4 result.

Self-Refine Reflexion addresses failures at the episode level, whereas many tasks benefit from fine-grained correction within a single execution. Against this background, Madaan et al. [60] propose Self-Refine, a scheme that employs a single LLM in three distinct roles within an iterative loop: generator, critic, and refiner. Given an initial output, the model critiques it along task-relevant dimensions and revises it accordingly, continuing until a termination condition is met. Evaluated across seven tasks, including code generation, text rewriting, and mathematical reasoning, Self-Refine consistently improves over single-pass baselines without additional training.
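The generator, critic, and refiner loop can be sketched as follows. In Self-Refine, all three roles are played by one LLM through different prompts. Here they are hypothetical deterministic stubs so that the control flow can be executed. The toy task, a link-status report that must mention latency and bandwidth, is an invented example.

```python
from typing import Callable, List, Tuple


def self_refine(
    generate: Callable[[], str],
    critique: Callable[[str], List[str]],
    refine: Callable[[str, List[str]], str],
    max_iters: int = 4,
) -> Tuple[str, int]:
    """Iterate generator -> critic -> refiner until the critic returns no feedback.

    Returns the final output and the number of refinement rounds performed.
    """
    output = generate()
    for rounds in range(max_iters):
        feedback = critique(output)
        if not feedback:  # termination condition: the critic is satisfied
            return output, rounds
        output = refine(output, feedback)
    return output, max_iters  # iteration budget exhausted


# Hypothetical toy task: the report must mention two required quantities
# and end with a period.
REQUIRED = ("latency", "bandwidth")


def toy_generate() -> str:
    return "Link status is stable"


def toy_critique(text: str) -> List[str]:
    feedback = [f"add:{kw}" for kw in REQUIRED if kw not in text]
    if not text.endswith("."):
        feedback.append("end with a period")
    return feedback


def toy_refine(text: str, feedback: List[str]) -> str:
    body = text.rstrip(".")
    for item in feedback:
        if item.startswith("add:"):
            body += f"; {item[len('add:'):]} nominal"
    return body + "."  # always close the sentence


final, rounds = self_refine(toy_generate, toy_critique, toy_refine)
print(final, rounds)  # Link status is stable; latency nominal; bandwidth nominal. 1
```

The `max_iters` budget bounds the loop when the critic never becomes satisfied. This matters in practice because an LLM critic is not guaranteed to converge.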
4.4.5 Reinforcement-Based Semantic Feedback
Reinforcement-based semantic feedback offers a training-time mechanism for aligning agent behavior with target objectives, complementing the inference-time self-correction approaches discussed above.

InstructGPT InstructGPT [193] demonstrates the effectiveness of aligning LLMs with human preferences through a three-stage pipeline. It combines supervised fine-tuning on human demonstrations, reward model training on pairwise preference rankings, and proximal policy optimization (PPO). Human evaluators substantially prefer the resulting models over much larger unaligned models across a wide range of tasks. The reward model acts as a learned proxy for human judgment, capturing evaluation criteria that are difficult to specify as explicit loss functions.

Recursive Reward Modeling InstructGPT targets tasks where human evaluation is relatively tractable. A more challenging setting involves outputs that exceed direct human assessment capacity. To address this, Stiennon et al. [194] train a reward model on pairwise human preferences over summaries and optimize a policy via PPO against this learned reward. Their experimental results indicate that a learned reward model can provide a reliable training signal for tasks exceeding direct evaluation capacity.

5 AI Agents for SemCom Networks
This section outlines four representative types of AI agents for SemCom networks: embodied agents, communication agents, network agents, and application agents. Table 12 lists representative AI agents of each type.

5.1 Embodied Agent
• SayCan: Google proposes SayCan [195], an agent that integrates the high-level semantic planning capabilities of LLMs with the learned affordances of a robot's pre-trained skills.
SayCan grounds LLM outputs in actions that are feasible given both the robot's physical capabilities and the current environment. In doing so, it addresses three key limitations of LAMs in real-world deployment: the lack of grounding, the generation of infeasible instructions for robots, and the difficulty of completing embodied long-horizon tasks described in abstract natural language. In real-world evaluations, SayCan achieves strong planning and execution success rates, nearly doubling the performance of non-grounded baseline models. It also supports zero-shot execution of long-horizon instructions, accommodates the addition of new skills, enables CoT reasoning and multilingual queries, and maintains interpretability throughout the decision-making process.
• Atlas: Boston Dynamics, in collaboration with the Toyota Research Institute, has developed Large Behavior Models (LBMs) for the Atlas humanoid robot 1). Trained on human demonstration data, the model employs a universal neural network and an end-to-end AI control strategy, eliminating the need for manual programming to adapt to specific scenarios. Atlas can autonomously perform a variety of tasks, including running, jumping, obstacle crossing, and complex object manipulation. It can rapidly recover from unexpected disturbances such as being pushed, and it achieves movement speeds 1.5 to 2 times faster than humans, significantly enhancing the flexibility and adaptability of task execution.

1) https://bostondynamics.com/products/atlas/

Table 12 Representative AI agents for SemCom networks
Category | Project | Company | Release Date | Open Source
Embodied Agent | SayCan | Google | 2022 | ✓
Embodied Agent | Atlas | Boston Dynamics and Toyota Research Institute | 2024 | ×
Communication Agent | Semantic-driven AI Agent Communication | UESTC | 2025.10 | ×
Communication Agent | Agentic AI-enhanced SemCom | BUPT | 2025.12 | ×
Communication Agent | ChannelGPT | BUPT | 2024 | ×
Communication Agent | UniClaw | China Unicom | 2026.3 | ×
Network Agent | RAN Agent | Huawei | 2026.3 | ×
Network Agent | Agentic AI for RAN | UPM | 2025.11 | ×
Network Agent | JoinAI-Agent | China Mobile | 2026.2 | ✓
Network Agent | Xingchen Super Agent | China Telecom | 2025.9 | ×
Application Agent: General Agent | Gemini 3 | Google | 2025.11 | ×
Application Agent: General Agent | Doubao-Seed-2.0 | ByteDance | 2026.2 | ×
Application Agent: Smart Factory | Industrial Copilot | Siemens | 2024 | ×
Application Agent: Smart Factory | Joyindustrial | JD | 2025.5 | ×
Application Agent: Smart Healthcare | Sully.ai | Sully.ai | 2025.6 | ×
Application Agent: Smart Healthcare | Hippocratic AI | Hippocratic AI | 2024 | ✓
Application Agent: Smart City | NVIDIA Blueprint for smart city AI | NVIDIA | 2025.6 | ×
Application Agent: Smart City | City Intelligent Agent Solution | iSSTech and Huawei | 2026.3 | ×
Application Agent: Intelligent Transportation | TrafficGo | Huawei Cloud | 2019 | ×
Application Agent: Intelligent Transportation | NaviAgent | Amap | 2025.6 | ✓

5.2 Communication Agent
• Semantic-driven AI Agent Communication: Yu et al. [35] propose a semantic-driven AI agent communication framework comprising three stages: perception-aware semantic sampling, joint semantic-channel coding, and semantic resource orchestration.
They develop three core enabling technologies:
• semantic adaptive transmission based on sample fine-tuning;
• semantic lightweight transmission integrating pruning, quantization, and partial sampling;
• semantic self-evolution control employing a distributed multi-timescale hierarchical deep reinforcement learning method.
Simulations cover three typical scenarios: edge-to-edge, edge-to-BS, and multi-agent communication networks. The results demonstrate that the proposed scheme achieves faster convergence and greater robustness than traditional methods. The distributed hierarchical optimization approach also significantly outperforms conventional decision-making schemes.
• Agentic AI-enhanced SemCom: Gao et al. [14] propose a unified agentic AI-enhanced SemCom framework consisting of an application layer, a semantic layer, and a cloud-edge collaboration layer. They also design the agentic KB-JSCC scheme, in which the source knowledge base is constructed by LLM and LVM agents, while the channel knowledge base is implemented by reinforcement learning agents. This solution addresses the limitations of traditional bit transmission, which cannot adapt to new demands such as multi-agent collaboration in 6G scenarios. It also overcomes defects in conventional SemCom, including limited representational capacity, index fragility at low SNR, insufficient multi-modal feature fusion, and lack of channel adaptivity. The proposed framework supports various 6G application scenarios and enables intelligent, optimized scheduling of communication resources.
• ChannelGPT: Yu et al. propose ChannelGPT [196], a large model-driven digital twin channel generator embedded with environment intelligence. It adapts to 6G scenarios through a three-layer architecture and core capabilities such as multi-modal fusion and multi-task processing.
Experimental validation demonstrates high channel prediction accuracy and strong multi-scenario generalization, providing intelligent support for decision-making across all layers of 6G networks.
• UniClaw: China Unicom has launched UniClaw 2), an AI-native communication capability centered on the core concept of “AI reshapes connectivity” and built on China Unicom's Yuanjing digital intelligence capabilities. UniClaw upgrades traditional basic communication functions, such as telephony and short messaging, into native AI connection channels. It aims to overcome three problems: the lack of AI empowerment in traditional basic communication, the high barriers to intelligent service access, and insufficient multi-scenario responsiveness and security assurance. By enabling barrier-free access to intelligent services, stable multi-scenario response, and high-level security assurance, UniClaw positions the basic communication network as a key gateway in the era of intelligent agents.

5.3 Network Agent
• RAN Agent: Huawei has released the industry's first RAN Agent 3), built upon a specialized communication LAM and the Radio Digital Twin System (RDTS). This agent establishes an efficient end-to-end collaborative architecture and enables fully closed-loop intelligent operation. It addresses critical limitations of traditional wireless networks, including insufficient global intelligence, inaccurate network resource scheduling, and weak cross-scenario collaboration. The RAN Agent achieves precise network resource scheduling and full-scenario single-domain autonomy. It delivers comprehensive improvements in user experience, operation and maintenance efficiency, and network energy efficiency while meeting diverse operator requirements across different deployment scenarios.
• Agentic AI for RAN: Pellejero et al.
[197] propose applying the agentic AI paradigm to 5G/6G RAN management and optimization. They integrate design patterns such as reflection and planning to enable autonomous decision-making through LAMs and multi-agent collaboration. This approach addresses the high complexity of next-generation networks, the inefficiency of traditional manual and static optimization methods, and the lack of mature frameworks in this domain. The proposed system achieves autonomous KPI monitoring, anomaly diagnosis, and optimization recommendations, and early industrial implementations have demonstrated improved network performance and reliability.
• JoinAI-Agent: China Mobile has open-sourced the “JoinAI-Agent” intelligent agent engine 4). Featuring a “one master with multiple slaves” architecture and no-code extension capabilities, it supports the automation of complex enterprise processes. The engine has achieved a top ranking on the international GAIA benchmark. It lowers technical and development barriers, facilitates the construction of an open intelligent agent ecosystem, and accelerates industrial intelligent transformation.
• Xingchen Super Agent: China Telecom has launched the Xingchen Super Agent 5), based on its self-developed Xingchen LAM. The agent incorporates core capabilities including autonomous task decomposition, cross-application collaboration, human-machine cooperation, and open customization. These capabilities are complemented by triple-layer security protection and a full-link closed evolution loop. This solution addresses the challenges of AI implementation in government and enterprise scenarios, improving business efficiency and decision-making quality while facilitating digital transformation and upgrading.
2) https://client.sina.com.cn/2026-03-06/doc-inhpzvnm9184324.shtml
3) https://www.huawei.com/en/news/2026/3/mwc-mbb-ran
4) https://github.com/opencmit/JoinAI-Agent?tab=readme-ov-file
5) https://www.chinatelecom-h.com/en/cg/pdf/esg/innovation.pdf

5.4 Application Agent
5.4.1 General Agent
• Gemini 3: Google has unveiled Gemini 3 6), a new-generation flagship multi-modal intelligent agent model that significantly advances capabilities in deep reasoning, multi-modal integration, and long-term planning. This release addresses key limitations, including inaccurate understanding of user intentions, high barriers to entry for developers building intelligent agents, and insufficient model robustness against adversarial inputs.
• Doubao-Seed-2.0: ByteDance's Doubao-Seed-2.0 7) has undergone comprehensive optimization and upgrading to meet the practical demands of large-scale production deployments. Leveraging core capabilities in efficient reasoning, multi-modal understanding, and complex instruction processing, it offers three general-purpose agent models (Pro, Lite, and Mini) alongside a dedicated programming-oriented Code model.
5.4.2 Vertical Agent
Smart Factory
• Industrial Copilot: Siemens has developed Industrial Copilot 8), an intelligent agent system built upon industrial foundation models and industrial agents, while simultaneously fostering an open ecosystem. Industrial Copilot leverages the integration of software and hardware, high-quality industrial data, and cross-domain industry knowledge to address all facets of the industrial value chain. It supports engineers throughout end-to-end collaborative tasks, resolving persistent challenges such as the difficulty of implementing industrial AI and insufficient understanding of industrial logic.
Currently serving over 200 customers, Industrial Copilot is projected to increase production efficiency by 50%, achieve energy-saving optimization across multiple scenarios, and drive the transformation and upgrading of the manufacturing industry.
• Joyindustrial: JD's industrial LAM, Joyindustrial 9), is jointly optimized for cost, efficiency, and user experience. To reduce costs, it selects the smallest viable model scale based on scaling-law principles and employs a T-S strategy. To improve efficiency, it adopts a Mixture-of-Experts (MoE) architecture and CoT training. To optimize user experience, it relies on domain data synthesis and reward function design. As a result, Joyindustrial costs only one-sixteenth as much as general-purpose LAMs while delivering an eightfold improvement in inference throughput. Furthermore, Joyindustrial enables the construction of agents that address diverse challenges within ultra-long industrial supply chains, including data silos, fragmented standards, complex management requirements, and collaborative conflicts.
Smart Healthcare
• Sully.ai: Sully.ai 10) is an enterprise-level AI assistant purpose-built for the healthcare industry. It integrates advanced natural language processing technologies to address core clinical and administrative workflows, including medical documentation, clinical research, administrative tasks, and multilingual translation. The platform integrates with over ten electronic health record (EHR) systems. It alleviates critical industry pain points such as the burden of manual documentation, inefficient information retrieval, cross-language communication barriers, and low institutional operational efficiency. Sully.ai enhances both the quality of medical services and institutional operational efficiency while eliminating language barriers in doctor-patient communication.
Recognized by over 400 medical institutions, it provides robust support for the digital transformation of the healthcare industry.
• Hippocratic AI: Hippocratic AI 11) has launched an AI-powered medical workforce platform built upon its proprietary Polaris model. Designed with safety, multilingual accuracy, and seamless electronic medical record integration as foundational priorities, the platform delivers multilingual, non-diagnostic routine nursing services. It addresses critical global challenges, including the shortage of nursing resources, the burden of repetitive tasks on medical staff, the high costs of traditional care delivery, and the inability of general-purpose AI systems to meet stringent medical compliance requirements. Hippocratic AI achieves high clinical accuracy and patient satisfaction. Its deployments across multiple medical institutions handle large volumes of medical calls while reducing costs and improving efficiency. The platform has secured substantial financing and gained significant market recognition.

6) https://gemini3.us/gemini-3
7) https://lf3-static.bytednsdoc.com/obj/eden-cn/lapzild-tss/ljhwZthlaukjlkulzlp/seed2/0214/Seed2.0%20Model%20Card.pdf
8) https://www.siemens.com/en-us/company/insights/generative-ai-industrial-copilot/
9) https://jdcorporateblog.com/jd-industrials-unveils-joy-industrial-the-first-ai-model-designed-for-industrial-supply-chain-transformation/
10) https://www.sully.ai/
11) https://www.trially.ai/

Smart City
• NVIDIA Blueprint for Smart City AI: NVIDIA has introduced a smart city AI blueprint 12) built on OpenUSD digital twins and tools such as Omniverse. By deploying AI agents through a three-stage workflow, this blueprint addresses infrastructure lag and inefficient urban operations resulting from rapid population growth in cities.
Based on this blueprint, cities can deploy an integrated operational platform that combines weather data, traffic sensors, and emergency response systems. The platform helps optimize response speed, infrastructure planning, real-time monitoring, and other urban capabilities, thereby facilitating the transformation of urban operations.
• City Intelligent Agent Solution: iSSTech and Huawei have jointly developed a city intelligent agent solution 13) that integrates AI models and big data technologies within a five-layer architecture centered on converged intelligence. The solution covers three major application scenarios: city governance, economic development, and public services. It addresses critical bottlenecks in urban intelligence development, including weak infrastructure and insufficient localized AI capabilities. The solution reduces project cycles by 50%, streamlines government affairs, and helps cities advance their multi-domain intelligence capabilities while building sustainable localized AI capacity.
Intelligent Transportation
• TrafficGo: Huawei Cloud has launched the TrafficGo intelligent transportation solution 14), which integrates multi-source data with AI, edge computing, and other technologies to build an intelligent traffic system. It enables functions such as regional signal coordination and intelligent congestion management, ensuring smooth traffic flow and improving overall efficiency.
• NaviAgent: Amap introduces NaviAgent 15), described as the world's first AI navigation agent in the map field. It adopts a planner-executor architecture and a smart closed-loop system comprising four modules, integrating traffic perception, emotion-aware voice interaction, and other technologies.
NaviAgent addresses the limitations of traditional navigation, such as restricted local perception, rigid execution, and lack of emotional interaction. It does so by enabling beyond-visual-range road condition prediction, lane-level safety warnings, and emotional companionship, transforming navigation from a mere tool into an intelligent travel partner.

6 Challenges and Future Research Directions
This section discusses the challenges and future research directions of semantic-based agent communication networks.

6.1 Theoretical Framework of Semantic-based Agent Communication Networks
Classical Shannon information theory measures information in bits, but it cannot characterize the “meaning” and “value” of information at the semantic level. A unified mathematical foundation is still lacking for defining a unit of semantic information that quantifies its impact on the decision-making of the receiving agent. Furthermore, SemCom improves bandwidth efficiency through the integration of communication, computation, and intelligence, yet existing theories cannot characterize the performance boundaries of this joint optimization. Moreover, noise and interference in wireless channels may lead to semantic misunderstandings. Establishing a mathematical model of semantic channels that describes the loss or deviation of semantic information during transmission is therefore also important. Future research can integrate the fundamental theories of AI to further enrich the theory of semantic-based agent communication networks. In addition, theoretical support can be drawn from interdisciplinary perspectives, such as complex network theory and systems science.
12) https://www.nvidia.com/en-us/industries/smart-cities-and-spaces/
13) https://e.huawei.com/jp/news/2026/industries/government/city-intelligent-agent
14) https://www.huaweicloud.com/product/trafficgo.html
15) https://www.alibabagroup.com/en-US/document-1889126073686294528

6.2 Management of Semantic KBs for Semantic-based Agent Communication Networks
Different agents may construct semantic KBs based on different foundation models, KGs, or domain knowledge. When they interact for the first time, quickly and accurately aligning their semantic KBs so that encoding and decoding match therefore poses a significant challenge. Furthermore, as world knowledge continuously evolves, agents' semantic KBs must evolve as well, which raises the following issues:
• How can semantic KBs be updated efficiently?
• How can old semantic knowledge be prevented from interfering with new semantic knowledge?
• How can efficient and highly reliable forgetting mechanisms for semantic knowledge be designed?
Additionally, on agents with constrained computational power and storage, the storage and query overhead of semantic KBs must be minimized. However, existing large-scale KGs or neural network parameters are difficult to deploy directly, which necessitates extremely lightweight semantic representations. Future research can therefore focus on lightweight semantic knowledge representation, fast alignment of semantic KBs, and dynamic evolution of semantic KBs.

6.3 Security and Privacy Protection for Semantic-based Agent Communication Networks
Compared with traditional bit-level communication, SemCom may introduce new security threats to agent communications. For example, attackers no longer need to corrupt all data.
Instead, they only need to tamper with a small amount of critical semantics to completely alter the meaning of the information and mislead the agent's decision-making. Furthermore, an agent may generate erroneous semantic information due to model hallucination. If this information spreads rapidly through the SemCom network, it can lead to a systemic false consensus, undermining collaborative trust. Future directions for securing semantic-based agent communication networks include the following:
• researching encrypted semantic representations that remain usable between legitimate agents;
• introducing technologies such as blockchain and digital signatures to attach traceable and tamper-proof evidence to semantic information, so that the receiving agent can verify the authenticity and integrity of semantics;
• utilizing trusted execution environments and secure multi-party computation to protect data and model privacy during semantic encoding and decoding, so that semantic information is not leaked even if computing nodes are untrusted;
• designing interpretable semantic models to enhance the detection of potential attacks.

6.4 Standardization and Industry for Semantic-based Agent Communication Networks
The standardization of SemCom has been advanced in numerous standardization organizations, such as the ITU, 3GPP, International Mobile Telecommunications 2030 (IMT-2030), and the China Communications Standards Association (CCSA). Semantic-based agent communication, as one of the application scenarios of SemCom, has garnered attention from both academia and industry.
Future directions include the following:
• further promoting semantic-based agent communication across the entire industry chain, encompassing fundamental theories, key technologies, prototype development, chip manufacturing, and application deployment;
• defining cross-domain and cross-industry semantic translation standards to achieve semantic interoperability across vertical industries;
• establishing open-source platforms and testing environments for the validation of new algorithms and models;
• developing agent communication network technologies that enable the coexistence of semantics and bits, adapting to network environments where semantic and bit-level traffic are mixed.

7 Conclusions
In this review, we have presented a comprehensive framework for semantic-based agent communication networks, addressing the critical intersection of SemCom and agentic AI systems. We first proposed a novel architecture for semantic-based agent communication networks comprising three layers, four entities, and four stages. In this architecture, three wireless agent network layers are integrated with four AI agent entities and four operational stages that together form a complete cognitive cycle for agent behavior, establishing a structured foundation for understanding and designing semantic-enabled agent communication networks. Building upon this framework, we conducted an extensive exploration of the state of the art in semantic-based agent communication networks. Our investigation spanned the three architectural layers, examining representative approaches ranging from intention inference techniques to semantic coding and distributed collaboration mechanisms. Furthermore, we systematically reviewed advancements across the four stages (perception, memory, reasoning, and action), highlighting how semantics enhance each stage.
We also provided a taxonomy of AI agents in SemCom networks, categorizing them into embodied, communication, network, and application agents to clarify their distinct roles and functionalities. Several fundamental challenges remain open in this emerging field despite significant progress. Lastly, we therefore identified and discussed these challenges together with key research directions.

Acknowledgements This work was supported in part by the National Key Research and Development Program of China (Grant No. 2020YFB1806905); in part by the National Natural Science Foundation of China (Grant Nos. 62501066 and U24B20131); and in part by the Beijing Municipal Natural Science Foundation (Grant No. L242012).

References
1 Cheng-Xiang Wang, Xiaohu You, Xiqi Gao, Xiuming Zhu, Zixin Li, Chuan Zhang, Haiming Wang, Yongming Huang, Yunfei Chen, Harald Haas, et al. On the road to 6G: Visions, requirements, key technologies, and testbeds. IEEE Communications Surveys & Tutorials, 25(2):905–974, 2023.
2 Qimei Cui, Xiaohu You, Ni Wei, Guoshun Nan, Xuefei Zhang, Jianhua Zhang, Xinchen Lyu, Ming Ai, Xiaofeng Tao, Zhiyong Feng, et al. Overview of AI and communication for 6G network: Fundamentals, challenges, and future research opportunities. Science China Information Sciences, 68(7):171301, 2025.
3 ITU-R Recommendation. Framework and overall objectives of the future development of IMT for 2030 and beyond. Technical report, International Telecommunication Union (ITU) Recommendation (ITU-R), 2023.
4 3GPP Tdoc R3-260548. Discussion on 6G RAN architecture and function, 2026.
5 Ranjan Sapkota, Konstantinos I Roumeliotis, and Manoj Karkee. AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges. Information Fusion, page 103599, 2025.
6 Mohamad Abou Ali, Fadi Dornaika, and Jinan Charafeddine. Agentic AI: a comprehensive survey of architectures, applications, and future directions.
Artificial Intelligence Review, 59(1):11, 2025.
7 Dayu Fan, Rui Meng, Xiaodong Xu, Yiming Liu, Guoshun Nan, Chenyuan Feng, Shujun Han, Song Gao, Bingxuan Xu, Dusit Niyato, et al. Generative diffusion models for wireless networks: Fundamental, architecture, and state-of-the-art. IEEE Communications Surveys & Tutorials, 2026.
8 Xiqi Cheng, Rui Meng, Xiaodong Xu, Haixiao Gao, Ping Zhang, and Dusit Niyato. APEG: Adaptive physical layer authentication with channel extrapolation and generative AI. IEEE Transactions on Information Forensics and Security, 21:1257–1272, 2026.
9 Ruichen Zhang, Guangyuan Liu, Yinqiu Liu, Changyuan Zhao, Jiacheng Wang, Yunting Xu, Dusit Niyato, Jiawen Kang, Yonghui Li, Shiwen Mao, et al. Toward edge general intelligence with agentic AI and agentification: Concepts, technologies, and future directions. IEEE Communications Surveys & Tutorials, 28:4285–4318, 2026.
10 Feibo Jiang, Cunhua Pan, Kezhi Wang, Pietro Michiardi, Octavia A Dobre, and Merouane Debbah. From large AI models to agentic AI: A tutorial on future intelligent communications. IEEE Journal on Selected Areas in Communications, 2026.
11 Wanting Yang, Hongyang Du, Zi Qin Liew, Wei Yang Bryan Lim, Zehui Xiong, Dusit Niyato, Xuefen Chi, Xuemin Shen, and Chunyan Miao. Semantic communications for future internet: Fundamentals, applications, and challenges. IEEE Communications Surveys & Tutorials, 25(1):213–250, 2022.
12 3GPP TR 22.870, V0.3.1. Study on 6G Use Cases and Service Requirements, 2025.
13 Xiaodong Xu, Bingxuan Xu, Shujun Han, Chen Dong, Huachao Xiong, Rui Meng, and Ping Zhang. Task-oriented and semantic-aware heterogeneous networks for artificial intelligence of things: Performance analysis and optimization. IEEE Internet of Things Journal, 11(1):228–242, 2023.
14 Haixiao Gao, Mengying Sun, Ruichen Zhang, Yanhan Wang, Xiaodong Xu, Nan Ma, Dusit Niyato, and Ping Zhang.
Agentic ai-enhanced semantic communications: F oundations, architecture, and applications. arXiv pr eprint arXiv:2512.23294 , 2025. 15 Hui Cao, Rui Meng, Xiaodong Xu, Sh ujun Han, and Ping Zhang. Imp ortance-a ware robust semantic transmission for leo satellite-ground communication. IEEE Internet of Things Journal , 13(5):9665–9681, 2026. 16 Dayu F an, Rui Meng, Song Gao, and Xiao dong Xu. Kgrag-sc: Knowledge graph rag-assisted seman tic communication. In 2025 lEEE International Confer enc e on Cloud Computing T e chnolo gy and Scienc e (CloudCom) , pages 1–7. IEEE, 2025. 17 Rui Meng, Zixuan Huang, Jingshu Y an, Mengying Sun, Yiming Liu, Chenyuan F eng, Xiao dong Xu, Zhidi Zhang, Song Gao, Ping Zhang, et al. Semantic radio access netw orks: Architecture, state-of-the-art, and future directions. IEEE T r ansactions on Co gnitive Communications and Networking , 2026. 18 Ping Zhang, W enjun Xu, Yiming Liu, Xiaoqi Qin, Kai Niu, Shuguang Cui, Guangming Shi, Zhijin Qin, Xiaodong Xu, F engyu W ang, et al. In tellicise wireless netw orks from semantic communications: A survey , research issues, and c hallenges. IEEE Communic ations Surveys & T utorials , 27(3):2051–2084, 2025. 19 Rui Meng, Zhidi Zhang, Song Gao, Y aheng W ang, Xiao dong Xu, Yijing Lin, Yiming Liu, Chenyuan F eng, Lexi Xu, Yi Ma, et al. In tellicise wireless netw orks meet agen tic ai: A security and privacy persp ectiv e. arXiv pr eprint arXiv:2602.15290 , 2026. 20 Ping Zhang, Kai Niu, Xiaoyun W ang, Yiming Liu, Zijian Liang, Chen Dong, Jincheng Dai, Xiao dong Xu, W enjun Xu, Zhi Zhang, et al. Comai: The con vergence of communication and artificial intelligence. IEEE Communications Surveys & T utorials , 28:2163–2197, 2026. 21 Deniz G¨ und¨ uz, Zhijin Qin, Inaki Estella Aguerri, Harpreet S Dhillon, Zhaohui Y ang, Aylin Y ener, Kai Kit W ong, and Chan- Byoung Chae. Beyond transmitting bits: Con text, semantics, and task-orien ted communications. 
IEEE Journal on Sele cted Ar e as in Communic ations , 41(1):5–41, 2023. 22 Zhilin Lu, Rongpeng Li, Kun Lu, Xianfu Chen, Ekram Hossain, Zhifeng Zhao, and Honggang Zhang. Semantics-empow ered communications: A tutorial-cum-survey . IEEE Communications Surveys & T utorials , 26(1):41–79, 2024. 23 Tilahun M Getu, Georges Kaddoum, and Mehdi Bennis. Semantic communication: A survey on research landscape, c hal- lenges, and future directions. Pro c e e dings of the IEEE , 112(11):1649–1685, 2025. 24 Christina Chaccour, W alid Saad, Merouane Debbah, Zh u Han, and H Vincen t P oor. Less data, more knowledge: Building next-generation semantic communication netw orks. IEEE Communications Surveys & T utorials , 27(1):37–76, 2025. ZHANG P , et al. Sci China Inf Sci 41 25 Guangming Shi, Y ong Xiao, Yingyu Li, and Xuemei Xie. F rom seman tic communication to semantic-aw are netw orking: Model, arc hitecture, and op en problems. IEEE Communications Magazine , 59(8):44–50, 2021. 26 Zhaohui Y ang, Mingzhe Chen, Gaolei Li, Y ang Y ang, and Zhaoy ang Zhang. Secure semantic comm unications: F undamentals and challenges. IEEE network , 38(6):513–520, 2024. 27 Rui Meng, Song Gao, Dayu F an, Haixiao Gao, Yining W ang, Xiao dong Xu, Bizhu W ang, Suyu Lv, Zhidi Zhang, Mengying Sun, et al. A surv ey of secure seman tic comm unications. Journal of Network and Computer Applic ations , 239:104181, 2025. 28 Shaolong Guo, Y untao W ang, Ning Zhang, Zhou Su, T om H Luan, Zhiyi Tian, and Xuemin Shen. A survey on seman tic communication net works: Architecture, security , and priv acy . IEEE c ommunic ations surveys & tutorials , 27(5):2860–2894, 2025. 29 Chujun Zhang, Lin yu Huang, and Qian Ning. Resource allo cation in wireless seman tic communications: A comprehensiv e survey . IEEE Communic ations Surveys & T utorials , 28:2965–3001, 2026. 30 Chengsi Liang, Hongyang Du, Y ao Sun, Dusit Niyato, Jiawen Kang, Dezong Zhao, and Muhammad Ali Imran. 
Generative ai-driven seman tic communication netw orks: Architecture, tec hnologies, and applications. IEEE T r ansactions on Co gnitive Communic ations and Networking , 11(1):27–47, 2025. 31 Deepak Bhask ar Achary a, Karthigey an Kuppan, and B Divya. Agentic ai: Autonomous intelligence for complex goals—a comprehensive survey . IEEe A c c ess , 13:18912–18936, 2025. 32 Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, and Prasant Mohapatra. Agentic ai securit y: Threats, defenses, ev aluation, and open challenges. arXiv pr eprint arXiv:2510.23883 , 2025. 33 Huanting W ang, Jingzhi Gong, Huawei Zhang, Jie Xu, and Zheng W ang. Ai agentic programming: A survey of tec hniques, challenges, and opportunities. arXiv pr eprint arXiv:2508.11126 , 2025. 34 Rui Meng, Song Gao, Haixiao Gao, Yinqiu Liu, Ruic hen Zhang, Mengying Sun, Xiaodong Xu, Ping Zhang, and Dusit Niy ato. Image steganography for securing intellicise wireless netw orks:” invisible encryption” against eavesdroppers. arXiv pr eprint arXiv:2505.04467 , 2025. 35 Kaiwen Y u, Mengying Sun, Zhijin Qin, Xiao dong Xu, Ping Y ang, Y ue Xiao, and Gang W u. Seman tic-driven ai agent communications: Challenges and solutions. arXiv pr eprint arXiv:2510.00381 , 2025. 36 Brian Reily , P eng Gao, F ei Han, Hua W ang, and Hao Zhang. Real-time recognition of team behaviors b y multisensory graph-embedded rob ot learning. The International Journal of Rob otics R esear ch , 41(8):798–811, 2022. 37 Shirin Sohrabi, Anton V Riabov, and Octa vian Udrea. Plan recognition as planning revisited. In IJCAI , pages 3258–3264. New Y ork, NY, 2016. 38 Y uanfei W ang, F angwei Zhong, Jing Xu, and Yizhou W ang. T om2c: T arget-oriented m ulti-agent communication and coop eration with theory of mind. arXiv preprint , 2021. 39 Hao jun Shi, Suyu Y e, Xinyu F ang, Ch uany ang Jin, Leyla Isik, Y en-Ling Kuo, and Tianmin Shu. Muma-tom: Multi-modal multi-agen t theory of mind. 
In Pr o ce e dings of the AAAI Confer enc e on Artificial Intel ligenc e , volume 39, pages 1510–1519, 2025. 40 Kai Niu and Ping Zhang. The mathematic al theory of semantic c ommunic ation . Springer, 2025. 41 Ping Zhang, Xiao dong Xu, Chen Dong, Kai Niu, Haotai Liang, Zijian Liang, Xiao qi Qin, Mengying Sun, Hao Chen, Nan Ma, et al. Mo del division multiple access for semantic communications. F r ontiers of Information T e chnology & Ele ctronic Engine ering , 24(6):801–812, 2023. 42 Haotai Liang, Kaijun Liu, Xiaoyi Liu, Hongchao Jiang, Chen Dong, Xiao dong Xu, Kai Niu, and Ping Zhang. Orthogonal model division multiple access. IEEE T r ansactions on Wireless Communications , 23(9):11693–11707, 2024. 43 Hui Cao, Rui Meng, Shujun Han, Song Gao, Xiaodong Xu, and Ping Zhang. S-mdma: Sensitivity-a ware model division multiple access for satellite-ground seman tic communication. arXiv preprint , 2026. 44 Ahmad Halimi Razlighi, Maximilian HV Tillmann, Edgar Beck, Carsten Bo c kelmann, and Armin Dekorsy . Co operative and collaborative multi-task semantic communication for distributed sources. In ICC 2025-IEEE International Conferenc e on Communic ations , pages 3966–3971. IEEE, 2025. 45 Jiafei Duan, Samson Y u, Hui Li T an, Hongyuan Zhu, and Cheston T an. A surv ey of embo died ai: F rom simulators to research tasks. IEEE T ransactions on Emer ging T opics in Computational Intel ligenc e , 6(2):230–244, 2022. 46 Burak Demirel, Pablo Soldati, and Y u W ang. F rom inten ts to actions: Agentic ai in autonomous networks. arXiv pr eprint arXiv:2602.01271 , 2026. 47 Ruihang Miao, W eizhou Liu, Mingrui Chen, Zheng Gong, W eixin Xu, Chen Hu, and Sh uchang Zhou. Occdepth: A depth- aw are metho d for 3d semantic scene completion. arXiv preprint , 2023. 48 Antoni Rosinol, Arjun Gupta, Marcus Abate, Jingnan Shi, and Luca Carlone. 3d dynamic scene graphs: Actionable spatial perception with places, ob jects, and h umans. arXiv pr eprint arXiv:2002.06289 , 2020. 
49 Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, and Ishan Misra. ImageBind: One embedding space to bind them all. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15180–15190, 2023.
50 Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, et al. Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. In European Conference on Computer Vision, pages 38–55. Springer, 2024.
51 Aydar Bulatov, Yury Kuratov, and Mikhail Burtsev. Recurrent memory transformer. Advances in Neural Information Processing Systems, 35:11079–11091, 2022.
52 Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pages 1–22, 2023.
53 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, and Furu Wei. Augmenting language models with long-term memory. Advances in Neural Information Processing Systems, 36:74530–74543, 2023.
54 Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36:8634–8652, 2023.
55 Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint, 2025.
56 Diego Sanmartin. KG-RAG: Bridging the gap between knowledge and creativity. arXiv preprint arXiv:2405.12035, 2024.
57 Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations, 2023.
58 Jinhao Jiang, Kun Zhou, Zican Dong, Keming Ye, Wayne Xin Zhao, and Ji-Rong Wen. StructGPT: A general framework for large language model to reason over structured data. In Proceedings of EMNLP, 2023.
59 Hunter Lightman, Vineet Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In International Conference on Learning Representations, 2024.
60 Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems, 2023.
61 Michael Dann, Yuan Yao, Natasha Alechina, Brian Logan, Felipe Meneguzzi, John Thangarajah, et al. Multi-agent intention recognition and progression. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023, volume 2023, pages 91–99. IJCAI Organization, 2023.
62 Zihang Su, Artem Polyvyanyy, Nir Lipovetzky, Sebastian Sardiña, and Nick van Beest. Fast and accurate data-driven goal recognition using process mining techniques. Artificial Intelligence, 323:103973, 2023.
63 Miquel Ramírez and Hector Geffner. Plan recognition as planning. In Proceedings of the 21st International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc, pages 1778–1783, 2009.
64 Maayan Shvo, Shirin Sohrabi, and Sheila A McIlraith. An AI planning-based approach to the multi-agent plan recognition problem. In Canadian Conference on Artificial Intelligence, pages 253–258. Springer, 2018.
65 Peta Masters and Sebastian Sardina. Goal recognition for rational and irrational agents. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, pages 440–448, 2019.
66 Ramon Fraga Pereira, Nir Oren, and Felipe Meneguzzi. Landmark-based approaches for goal recognition as planning. Artificial Intelligence, 279:103217, 2020.
67 Nils Wilken, Lea Cohausz, Christian Bartelt, and Heiner Stuckenschmidt. Planning landmark based goal recognition revisited: Does using initial state landmarks make sense? In German Conference on Artificial Intelligence (Künstliche Intelligenz), pages 231–244. Springer, 2023.
68 Nils Wilken, Lea Cohausz, Johannes Schaum, Stefan Lüdtke, Christian Bartelt, and Heiner Stuckenschmidt. Leveraging planning landmarks for hybrid online goal recognition. arXiv preprint arXiv:2301.10571, 2023.
69 Zhang Zhang, Yifeng Zeng, Wenhui Jiang, Yinghui Pan, and Jing Tang. Intention recognition for multiple agents. Information Sciences, 628:360–376, 2023.
70 Chenyuan Zhang, Cristian Rojas Cardenas, Hamid Rezatofighi, Mor Vered, and Buser Say. Probabilistic active goal recognition. arXiv preprint arXiv:2507.21846, 2025.
71 Maayan Shvo and Sheila A McIlraith. Active goal recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9957–9966, 2020.
72 Domenico Maisto, Francesco Donnarumma, and Giovanni Pezzulo. Interactive inference: A multi-agent model of cooperative joint actions. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 54(2):704–715, 2023.
73 Stefano V Albrecht, Jacob W Crandall, and Subramanian Ramamoorthy. Belief and truth in hypothesised behaviours. Artificial Intelligence, 235:63–94, 2016.
74 Stefano V Albrecht and Peter Stone. Reasoning about hypothetical agent behaviours and their parameters. arXiv preprint arXiv:1906.11064, 2019.
75 Fengming Zhu and Fangzhen Lin. Single-agent planning in a multi-agent system: A unified framework for type-based planners. arXiv preprint arXiv:2502.08950, 2025.
76 Xiaopeng Yu, Jiechuan Jiang, and Zongqing Lu. Opponent modeling based on subgoal inference. Advances in Neural Information Processing Systems, 37:60531–60555, 2024.
77 Xiaomin Lin, Stephen C Adams, and Peter A Beling. Multi-agent inverse reinforcement learning for certain general-sum stochastic games. Journal of Artificial Intelligence Research, 66:473–502, 2019.
78 Justin Fu, Andrea Tacchetti, Julien Perolat, and Yoram Bachrach. Evaluating strategic structures in multi-agent inverse reinforcement learning. Journal of Artificial Intelligence Research, 71:925–951, 2021.
79 Till Freihaut and Giorgia Ramponi. On feasible rewards in multi-agent inverse reinforcement learning. arXiv preprint arXiv:2411.15046, 2024.
80 Lance Ying, Tan Zhi-Xuan, Vikash Mansinghka, and Joshua B Tenenbaum. Inferring the goals of communicating agents from actions and instructions. In Proceedings of the AAAI Symposium Series, volume 2, pages 26–33, 2023.
81 Roberta Raileanu, Emily Denton, Arthur Szlam, and Rob Fergus. Modeling others using oneself in multi-agent reinforcement learning. In International Conference on Machine Learning, pages 4257–4266. PMLR, 2018.
82 Yijie Zhang, Roxana Radulescu, Patrick Mannion, Diederik Roijers, and Ann Nowé. Opponent modelling using policy reconstruction for multi-objective normal form games. In 2020 Adaptive Learning Agents Workshop at AAMAS, 2020.
83 Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, and Zongqing Lu. Model-based opponent modeling. Advances in Neural Information Processing Systems, 35:28208–28221, 2022.
84 Jan Pöppel and Stefan Kopp. Satisficing models of Bayesian theory of mind for explaining behavior of differently uncertain agents: Socially interactive agents track. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, pages 470–478, 2018.
85 Terence X Lim, Sidney Tio, and Desmond C Ong. Improving multi-agent cooperation using theory of mind. arXiv preprint arXiv:2007.15703, 2020.
86 Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, SM Ali Eslami, and Matthew Botvinick. Machine theory of mind. In International Conference on Machine Learning, pages 4218–4227. PMLR, 2018.
87 Logan Cross, Violet Xiang, Agam Bhatia, Daniel LK Yamins, and Nick Haber. Hypothetical minds: Scaffolding theory of mind for multi-agent tasks with large language models. arXiv preprint arXiv:2407.07086, 2024.
88 Huao Li, Yu Chong, Simon Stepputtis, Joseph P Campbell, Dana Hughes, Charles Lewis, and Katia Sycara. Theory of mind for multi-agent collaboration via large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 180–192, 2023.
89 Xiyun Li, Tielin Zhang, Chenghao Liu, Shuang Xu, and Bo Xu. Theory of mind inspired large reasoning language model improved multi-agent reinforcement learning algorithm for robust and adaptive partner modelling. Machine Intelligence Research, pages 1–14, 2025.
90 Eirina Bourtsoulatze, David Burth Kurka, and Deniz Gündüz. Deep joint source-channel coding for wireless image transmission. IEEE Transactions on Cognitive Communications and Networking, 5(3):567–579, 2019.
91 Jincheng Dai, Sixian Wang, Kailin Tan, Zhongwei Si, Xiaoqi Qin, Kai Niu, and Ping Zhang. Nonlinear transform source-channel coding for semantic communications. IEEE Journal on Selected Areas in Communications, 40(8):2300–2316, 2022.
92 Ke Yang, Sixian Wang, Jincheng Dai, Xiaoqi Qin, Kai Niu, and Ping Zhang. SwinJSCC: Taming Swin transformer for deep joint source-channel coding. IEEE Journal on Selected Areas in Communications, 41(8):2619–2634, 2023.
93 Yufei Bo, Yiheng Duan, Shuo Shao, and Meixia Tao. Joint coding-modulation for digital semantic communications via variational autoencoder. IEEE Transactions on Communications, 72(9):5626–5640, 2024.
94 Maheshi U Lokumarambage, Vishnu Sai Sankeerth Gowrisetty, Hossein Rezaei, Thushan Sivalingam, Nandana Rajatheva, and Anil Fernando. Wireless end-to-end image transmission system using semantic communications. IEEE Access, 11:37149–37163, 2023.
95 Bingxuan Xu, Shujun Han, Xiaodong Xu, Weizhi Li, Rui Meng, Chen Dong, and Ping Zhang. Semantic prior aided channel-adaptive equalizing and de-noising semantic communication system with latent diffusion model. IEEE Transactions on Wireless Communications, 24(6):4614–4630, 2025.
96 Rui Meng, Song Gao, Bingxuan Xu, Xiaodong Xu, Jianqiao Chen, Nan Ma, Pei Xiao, Ping Zhang, and Rahim Tafazolli. Secure intellicise wireless network: Agentic AI for coverless semantic steganography communication. arXiv preprint arXiv:2601.16472, 2026.
97 Shavbo Salehi, Melike Erol-Kantarci, and Dusit Niyato. LLM-enabled data transmission in end-to-end semantic communication. pages 1–6, 2025.
98 Feiyang Wen, Weihua Xu, Feifei Gao, Chengkang Pan, and Guangyi Liu. Vision aided environment semantics extraction and its application in mmWave beam selection. IEEE Communications Letters, 27(7):1894–1898, 2023.
99 Yuwen Yang, Feifei Gao, Xiaoming Tao, Guangyi Liu, and Chengkang Pan. Environment semantics aided wireless communications: A case study of mmWave beam prediction and blockage prediction. IEEE Journal on Selected Areas in Communications, 41(7):2025–2040, 2023.
100 Avi Deb Raha, Kitae Kim, Apurba Adhikary, Mrityunjoy Gain, Zhu Han, and Choong Seon Hong. Advancing ultra-reliable 6G: Transformer and semantic localization empowered robust beamforming in millimeter-wave communications. IEEE Transactions on Vehicular Technology, 2025.
101 Ahsan Raza Khan and Poonam Yadav. SemQNet: Semantic-aware quantised network for mmWave beam prediction. In 2025 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6. IEEE, 2025.
102 Minghui Wu, Zhen Gao, Zhaocheng Wang, Dusit Niyato, George K Karagiannidis, and Sheng Chen. Deep joint semantic coding and beamforming for near-space airship-borne massive MIMO network. IEEE Journal on Selected Areas in Communications, 2024.
103 Yifu Sun, Zhi Lin, Haijun Zhang, Haotong Cao, Kang An, Feng Tian, Naofal Al-Dhahir, and Jiangzhou Wang. Towards energy-efficient holographic MIMO communications via stacked metasurface-assisted semantic beamforming. In ICC 2025-IEEE International Conference on Communications, pages 01–07. IEEE, 2025.
104 Bingyan Xie, Yongpeng Wu, Yuxuan Shi, Wenjun Zhang, Shuguang Cui, and Mérouane Debbah. Robust image semantic coding with learnable CSI fusion masking over MIMO fading channels. IEEE Transactions on Wireless Communications, 23(10):14155–14170, 2024.
105 Zhe Zheng, Haotai Liang, Yucheng Liu, Chen Dong, Xiaodong Xu, and Lin Li. Semantic diversity for massive MIMO CSI feedback. In 2025 10th International Conference on Computer and Communication System (ICCCS), pages 831–837. IEEE, 2025.
106 Mingze Gong, Shuoyao Wang, Shijian Gao, Jia Yan, and Suzhi Bi. Robust MIMO semantic communication with imperfect CSI via knowledge distillation. IEEE Transactions on Vehicular Technology, 2026.
107 Guyue Zhu, Yuanjian Liu, Shuangde Li, Kai Mao, Qiuming Zhu, César Briso-Rodríguez, Jingyi Liang, and Xuchao Ye. Semantic-based channel state information feedback for AAV-assisted ISAC systems. IEEE Internet of Things Journal, 12(5):4981–4991, 2024.
108 Ruonan Ren, Jianhua Mo, and Meixia Tao. SemCSINet: A semantic-aware CSI feedback network in massive MIMO systems. arXiv preprint arXiv:2505.08314, 2025.
109 Zhen Gao, Shicong Liu, Yu Su, Zhongxiang Li, and Dezhi Zheng. Hybrid knowledge-data driven channel semantic acquisition and beamforming for cell-free massive MIMO. IEEE Journal of Selected Topics in Signal Processing, 17(5):964–979, 2023.
110 Jiaqi Cao, Lixiang Lian, Yijie Mao, and Bruno Clerckx. Adaptive CSI feedback with hidden semantic information transfer. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
111 Guangyi Zhang, Qiyu Hu, Yunlong Cai, and Guanding Yu. SCAN: Semantic communication with adaptive channel feedback. IEEE Transactions on Cognitive Communications and Networking, 10(5):1759–1773, 2024.
112 Peiwen Jiang, Chao-Kai Wen, Shi Jin, and Geoffrey Ye Li. Deep source-channel coding for sentence semantic transmission with HARQ. IEEE Transactions on Communications, 70(8):5225–5240, 2022.
113 Yongkang Li, Xu Wang, Zheng Shi, and Yaru Fu. Semantic HARQ for intelligent transportation systems: Joint source-channel coding-powered reliable retransmissions. arXiv preprint, 2025.
114 Yuan Zheng, Fengyu Wang, Wenjun Xu, and Ping Zhang. Semantic base enabled image transmission with fine-grained HARQ. IEEE Transactions on Wireless Communications, 24(4):3606–3622, 2025.
115 Yucheng Sheng, Hao Ye, Le Liang, and Shi Jin. Semantic communication for cooperative perception with HARQ. In 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE, 2024.
116 Qingyang Zhou, Rongpeng Li, Zhifeng Zhao, Yong Xiao, and Honggang Zhang. Adaptive bit rate control in semantic communication with incremental knowledge-based HARQ. IEEE Open Journal of the Communications Society, 3:1076–1089, 2022.
117 Ali Maatouk, Saad Kriouile, Mohamad Assaad, and Anthony Ephremides. The age of incorrect information: A new performance metric for status updates. IEEE/ACM Transactions on Networking, 28(5):2215–2228, 2020.
118 Aimin Li, Shaohua Wu, Siqi Meng, Rongxing Lu, Sumei Sun, and Qinyu Zhang. Toward goal-oriented semantic communications: New metrics, framework, and open challenges. IEEE Wireless Communications, 31(5):238–245, 2024.
119 Erfan Delfani and Nikolaos Pappas. Semantics-aware status updates with energy harvesting devices: Query version age of information. In 2024 22nd International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pages 177–184. IEEE, 2024.
120 Federico Chiariotti, Josefine Holm, Anders E Kalør, Beatriz Soret, Søren K Jensen, Torben B Pedersen, and Petar Popovski. Query age of information: Freshness in pull-based communication. IEEE Transactions on Communications, 70(3):1606–1622, 2022.
121 Roy D Yates. The age of gossip in networks. In 2021 IEEE International Symposium on Information Theory (ISIT), pages 2984–2989. IEEE, 2021.
122 Lingyi Wang, Wei Wu, Fuhui Zhou, Feng Tian, Qihui Wu, and Walid Saad. A unified hierarchical semantic knowledge base for multi-task semantic communication. In ICC 2024-IEEE International Conference on Communications, pages 2937–2943. IEEE, 2024.
123 Jinke Ren, Zezhong Zhang, Jie Xu, Guanying Chen, Yaping Sun, Ping Zhang, and Shuguang Cui. Knowledge base enabled semantic communication: A generative perspective. IEEE Wireless Communications, 31(4):14–22, 2024.
124 Shuling Li, Yaping Sun, Jinbei Zhang, Kechao Cai, Shuguang Cui, and Xiaodong Xu. End-to-end generative semantic communication powered by shared semantic knowledge base. In 2024 IEEE International Conference on Communications Workshops (ICC Workshops), pages 1067–1072. IEEE, 2024.
125 Peng Yi, Yang Cao, Xin Kang, and Ying-Chang Liang. Deep learning-empowered semantic communication systems with a shared knowledge base. IEEE Transactions on Wireless Communications, 23(6):6174–6187, 2023.
126 Fuhui Zhou, Yihao Li, Ming Xu, Lu Yuan, Qihui Wu, Rose Qingyang Hu, and Naofal Al-Dhahir. Cognitive semantic communication systems driven by knowledge graph: Principle, implementation, and performance evaluation. IEEE Transactions on Communications, 72(1):193–208, 2023.
127 Zengle Zhu, Rongqing Zhang, Xiang Cheng, and Liuqing Yang. Multi-modal fusion-based multi-task semantic communication system. arXiv preprint arXiv:2407.00964, 2024.
128 Ang Li, Xin Wei, Dan Wu, and Liang Zhou. Cross-modal semantic communications. IEEE Wireless Communications, 29(6):144–151, 2022.
129 Wu Tong, Chen Zhiyong, Tao Meixia, Xia Bin, and Zhang Wenjun. Multi-user semantic fusion for semantic communications over degraded broadcast channels. China Communications, 21(10):1–15, 2024.
130 Baosheng Li, Weifeng Gao, Zehui Xiong, Jin Xie, Binquan Guo, and Miao Du. Decentralized semantic federated learning for real-time public safety tasks: Challenges, methods, and directions. arXiv preprint arXiv:2504.05107, 2025.
131 Jiajia Liu, Yunlong Lu, Hao Wu, and Yueyue Dai. Efficient resource allocation and semantic extraction for federated learning empowered vehicular semantic communication. In 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall), pages 1–5. IEEE, 2023.
132 Yubo Peng, Feibo Jiang, Li Dong, Kezhi Wang, and Kun Yang. Personalized federated learning for generative AI-assisted semantic communications. arXiv preprint arXiv:2410.02450, 2024.
133 Jinho Choi and Jihong Park. Semantic communication as a signaling game with correlated knowledge bases. In 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), pages 1–5. IEEE, 2022.
134 Mohammad Karimzadeh Farshbafan, Walid Saad, and Merouane Debbah. Curriculum learning for goal-oriented semantic communications with a common language. IEEE Transactions on Communications, 71(3):1430–1446, 2023.
135 Marko Rosic, Dean Sumic, and Lada Males. Semantic interoperability of multi-agent systems in autonomous maritime domains. Electronics, 14(13):2630, 2025.
136 Jiahui Zhao, Ming Chen, Zhaohui Yang, Changsheng You, and Mingzhe Chen. Resource allocation for semantic relay aided wireless networks with probability graph. In ICC 2024-IEEE International Conference on Communications, pages 5317–5322. IEEE, 2024.
137 Zeyang Hu, Tianyu Liu, Changsheng You, Zhaohui Yang, and Mingzhe Chen. Multiuser resource allocation for semantic-relay-aided text transmissions. In 2023 IEEE Globecom Workshops (GC Wkshps), pages 1273–1278. IEEE, 2023.
138 Tianyu Liu, Changsheng You, Zeyang Hu, Chenyu Wu, Yi Gong, and Kaibin Huang. Semantic-relay-aided text transmission: Placement optimization and bandwidth allocation. In 2023 IEEE Globecom Workshops (GC Wkshps), pages 215–220. IEEE, 2023.
139 Wing Fei Lo, Nitish Mital, Haotian Wu, and Deniz Gündüz. Collaborative semantic communication for edge inference. IEEE Wireless Communications Letters, 12(7):1125–1129, 2023.
140 Jiawei Shao, Yuyi Mao, and Jun Zhang. Task-oriented communication for multidevice cooperative edge inference. IEEE Transactions on Wireless Communications, 22(1):73–87, 2022.
141 Qing Cai, Yiqing Zhou, Ling Liu, Hanxiao Yu, Yihao Wu, Ningzhe Shi, and Jinglin Shi. Query-aware semantic encoder-based resource allocation in task-oriented communications. IEEE Transactions on Mobile Computing, 2025.
142 Lei Yan, Zhijin Qin, Chunfeng Li, Rui Zhang, Yongzhao Li, and Xiaoming Tao. QoE-based semantic-aware resource allocation for multi-task networks. IEEE Transactions on Wireless Communications, 23(9):11958–11971, 2024.
143 Cheng Zeng, Jun-Bo Wang, Ming Xiao, Changfeng Ding, Yijian Chen, Hongkang Yu, and Jiangzhou Wang. Task-oriented semantic communication over rate splitting enabled wireless control systems for URLLC services. IEEE Transactions on Communications, 72(2):722–739, 2023.
144 Yining Wang, Shujun Han, Xiaodong Xu, Haotai Liang, Rui Meng, Chen Dong, and Ping Zhang. Feature importance-aware task-oriented semantic transmission and optimization. IEEE Transactions on Cognitive Communications and Networking, 10(4):1175–1189, 2024.
145 Lunyuan Chen and Jie Gong. Multi-source scheduling and resource allocation for age-of-semantic-importance optimization in status update systems. In 2024 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6. IEEE, 2024.
146 Yining Wang, Mingzhe Chen, Tao Luo, Walid Saad, Dusit Niyato, H Vincent Poor, and Shuguang Cui. Performance optimization for semantic communications: An attention-based reinforcement learning approach. IEEE Journal on Selected Areas in Communications, 40(9):2598–2613, 2022.
147 Chuanhong Liu, Caili Guo, Yang Yang, and Nan Jiang. Adaptable semantic compression and resource allocation for task-oriented communications. IEEE Transactions on Cognitive Communications and Networking, 10(3):769–782, 2023.
148 Yuanwei Zhu, Yakun Huang, Xiuquan Qiao, Zhijie Tan, Boyuan Bai, Huadong Ma, and Schahram Dustdar. A semantic-aware transmission with adaptive control scheme for volumetric video service. IEEE Transactions on Multimedia, 25:7160–7172, 2022.
149 Le Xia, Yao Sun, Dusit Niyato, Lan Zhang, and Muhammad Ali Imran. Wireless resource optimization in hybrid semantic/bit communication networks. IEEE Transactions on Communications, 73(5):3318–3332, 2024.
150 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, and Yiran Zhong. Audio–visual segmentation. In European Conference on Computer Vision, pages 386–403. Springer, 2022.
151 Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
152 Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. Flamingo: A visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.
153 Taiyu Zhang, Xuesong Zhang, Robbe Cools, and Adalberto Simeone. Focus agent: LLM-powered virtual focus group. In Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, pages 1–10, 2024.
154 Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, et al. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608, 2022.
155 Xiangli Le, Bo Jin, Gen Cui, Xunhua Dai, and Quan Quan. RflyMAD: A dataset for multicopter fault detection and health assessment. The International Journal of Robotics Research, 44(7):1081–1092, 2025.
156 Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. SAM 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024.
157 Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, and Jiaya Jia. LISA: Reasoning segmentation via large language model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9579–9589, 2024.
158 Jinyu Yang, Mingqi Gao, Zhe Li, Shang Gao, Fangjing Wang, and Feng Zheng. Track anything: Segment anything meets videos. arXiv preprint arXiv:2304.11968, 2023.
159 Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, et al. ConceptFusion: Open-set multimodal 3D mapping. arXiv preprint arXiv:2302.07241, 2023.
160 Lloyd Russell, Anthony Hu, Lorenzo Bertoni, George Fedoseev, Jamie Shotton, Elahe Arani, and Gianluca Corrado. GAIA-2: A controllable multi-view generative world model for autonomous driving. arXiv preprint arXiv:2503.20523, 2025.
161 Fei-Yue Wang. Imaginative intelligence for intelligent vehicles: Sora inspired new directions for new mobility and vehicle intelligence. IEEE Transactions on Intelligent Vehicles, 9(4):4557–4562, 2024.
162 Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. Continual learning with deep generative replay. Advances in Neural Information Processing Systems, 30, 2017.
163 Yu A Malkov and Dmitry A Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2018.
164 Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019.
165 Adam Santoro, Ryan Faulkner, David Raposo, Jack Rae, Mike Chrzanowski, Theophane Weber, Daan Wierstra, Oriol Vinyals, Razvan Pascanu, and Timothy Lillicrap. Relational recurrent neural networks. Advances in Neural Information Processing Systems, 31, 2018.
166 Nathan Hughes, Yun Chang, and Luca Carlone. Hydra: A real-time spatial perception system for 3D scene graph construction and optimization. arXiv preprint arXiv:2201.13360, 2022.
167 Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Richard James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, et al. RA-DIT: Retrieval-augmented dual instruction tuning. In The Twelfth International Conference on Learning Representations, 2023.
168 Shibo Hao, Yi Gu, Haodi Ma, Joshua Hong, Zhen Wang, Daisy Wang, and Zhiting Hu. Reasoning with language model is planning with world model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8154–8173, 2023.
169 Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, 2022.
170 Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. In Advances in Neural Information Processing Systems, 2022.
171 Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, and Ed Chi. Least-to-most prompting enables complex reasoning in large language models. In International Conference on Learning Representations, 2023.
172 Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations, 2023.
173 Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-based prompting for multi-step reasoning. In International Conference on Learning Representations, 2023.
174 Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. QA-GNN: Reasoning with language models and knowledge graphs for question answering. In Proceedings of NAACL, 2021.
175 Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, and Jure Leskovec. GreaseLM: Graph REAsoning enhanced language models for question answering. In International Conference on Learning Representations, 2022.
176 Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024.
177 Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, 2020.
178 Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of EMNLP, 2020.
179 Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. REALM: Retrieval-augmented language model pre-training. In Proceedings of ICML, 2020.
180 Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 874–880, 2021.
181 Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. In Advances in Neural Information Processing Systems, 2023.
182 Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, and Yu-Xiong Wang. Language agent tree search unifies reasoning, acting, and planning in language models. In Proceedings of ICML, 2024.
183 Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, and Irina Higgins. Solving math word problems with process- and outcome-based feedback. In Advances in Neural Information Processing Systems, 2022.
184 Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. PAL: Program-aided language models. In Proceedings of ICML, 2023.
185 Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W. Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. Transactions on Machine Learning Research, 2023.
186 Liangming Pan, Alon Albalak, Xinyi Wang, and William Yang Wang. Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning. In Findings of EMNLP, 2023.
187 Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, and Jiajun Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In International Conference on Learning Representations, 2019.
188 Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems, 2023.
189 Yujia Qin, Shihao Liang, Yining Ye, Kunliang Zhang, Yunxiao Zhu, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In International Conference on Learning Representations, 2024.
190 Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023.
191 Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. In Advances in Neural Information Processing Systems, 2023.
192 Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, et al. MetaGPT: Meta programming for a multi-agent collaborative framework. In International Conference on Learning Representations, 2024.
193 Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, 2022.
194 Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul Christiano. Learning to summarize with human feedback. In Advances in Neural Information Processing Systems, 2020.
195 Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian, et al. Do as I can, not as I say: Grounding language in robotic affordances. In Conference on Robot Learning, pages 287–318. PMLR, 2023.
196 Li Yu, Lianzheng Shi, Jianhua Zhang, Zhen Zhang, Yuxiang Zhang, and Guangyi Liu. ChannelGPT: A large model toward real-world channel foundation model for 6G environment intelligence communication. IEEE Communications Magazine, 63(10):68–74, 2025.
197 Jorge Pellejero, Luis A Hernández Gómez, Luis Mendo Tomás, and Zoraida Frias Barroso. Agentic AI for mobile network RAN management and optimization. arXiv preprint, 2025.