Intent-Based Networking (IBN) offers a promising paradigm for intelligent and automated network control in Industrial Internet of Things (IIoT) environments by translating high-level user intents into executable network strategies. However, frequent strategy deployment and rollback are impractical in real-world IIoT systems due to tightly coupled workflows and high downtime costs, while the heterogeneity and privacy constraints of IIoT nodes further complicate centralized policy verification. To address these challenges, we propose FEIBN, a Federated Evaluation Enhanced Intent-Based Networking framework. FEIBN leverages large language models (LLMs) to align multimodal user intents into structured strategy tuples and employs federated learning to perform distributed policy verification across IIoT nodes without exposing raw data. To improve training efficiency and reduce communication overhead, we design SSAFL, a Strategy Similarity Aware Federated Learning mechanism that selects task-relevant nodes based on strategy similarity and resource status, and triggers asynchronous model uploads only when updates are significant. Experiments demonstrate that SSAFL can improve model accuracy, accelerate model convergence, and reduce the cost by 27.8% compared with SemiAsyn.
a core enabling technology for modern industrial systems [1], [2]. Intent-Based Networking (IBN) provides a promising paradigm for intelligent operation in IIoT by allowing users to express desired outcomes through human-readable intents, which are automatically translated into executable policies for deployment and enforcement [3], [4]. However, IIoT intents often involve task execution goals, device coordination rules, safety constraints, and temporal requirements, rather than simple network configuration updates [5]. For example, in a sensing-driven environment equipped with temperature, humidity, water-level, and ultrasonic modules, an engineer may express intentions such as "increase the sampling priority of the ultrasonic sensing module" or "allocate more processing resources to the water-level monitoring zone." Ensuring that such high-level instructions are correctly interpreted and mapped to actionable IIoT strategies is crucial for safe and efficient system operation [6], [7]. Traditional intent analysis methods, which rely on rule-based or shallow semantic models [8], suffer from limited generalization and adaptability in complex industrial scenarios. Large Language Models (LLMs) [9], with their powerful semantic understanding and crossmodal reasoning capabilities, can integrate intents expressed across different modalities into a unified semantic representation, thereby significantly enhancing the intent recognition capability of IBN systems [10].
However, accurate intent recognition alone is insufficient to ensure reliable policy execution. Unlike traditional network management intents that primarily involve routing or configuration updates, IIoT intents directly drive physical actions, so incorrect interpretations or unsafe deployments can lead to costly downtime or even physical hazards [11], [12]. This necessitates thorough policy verification prior to deployment to prevent costly failures or interruptions [4]. Existing AI-based methods verify network policies before actual deployment, but they require uploading operational and environmental data from multiple devices to a centralized server for model training and performance evaluation. However, IIoT nodes are typically distributed and heterogeneous, and the data held by each node often involves sensitive information such as device parameters and operational status [13], rendering centralized evaluation and prediction model training infeasible. Federated Learning (FL) [14], [15], as a distributed collaborative learning framework, enables cross-node policy verification without requiring raw data to leave local devices [16]. FL can be categorized into synchronous FL and asynchronous FL. In synchronous FL, the server must wait for all clients to upload their updates, causing faster clients to idle until the slowest ones finish. This straggler effect leads to inefficient resource utilization, prolonged aggregation time, and delayed convergence [17], [18]. Asynchronous FL addresses these challenges by allowing the server to aggregate and update the global model as soon as it receives a single client model [19]. This significantly reduces the waiting time of faster clients and expedites the training of the global model.
Although integrating asynchronous FL with industrial intent-based networking effectively enhances distributed policy verification, it also brings the following new issues.
i. There is a lack of a complete framework that connects multimodal intent fusion, semantic translation, policy generation, and distributed verification into a unified process. Although several recent studies have introduced LLMs into IBN, existing LLMs can only process unstructured textual descriptions, which do not fully meet the requirements of multimodal inputs [20], [21]. Moreover, current IBN approaches for IIoT largely focus on intent interpretation while seldom integrating verification and feedback mechanisms into the overall workflow, making it difficult to form a closed-loop system in which intents can be accurately interpreted, reliably executed, and continuously optimized. ii. Because different strategies often correspond to distinct execution conditions and action sets [22], IBN policy verification tasks exhibit strong task-specific characteristics. However, existing methods usually neglect the relevance between nodes and strategies, with node evaluation metrics only focusing on capability, which can result in inefficient or low-value training. iii. IBN policy verification tasks impose strict requirements on communication efficiency and response time, since frequent uploads of minor updates may lead to resource waste and delay timely strategy deployment due to prolonged training [23]. Although asynchronous FL accelerates global model updates, it often results in redundant communication and unstable convergence due to uneven resource availability and unbalanced node participation.
To address these challenges, we propose a Federated Evaluation Enhanced Intent Based Networking (FEIBN) framework tailored for IIoT environments, which aims to enhance the precision and adaptability of intent understanding through multimodal alignment and semantic modeling, while mitigating the risks of high deployment costs and node heterogeneity inherent in traditional IBN systems. The framework is driven by user intents and employs multi-modal alignment and LLMs to more precisely and efficiently transform heterogeneous intent expressions into a unified policy semantic space. Meanwhile, a federated evaluation mechanism is introduced to verify the effectiveness of the generated strategies in a distributed manner, thereby ensuring data privacy and enhancing evaluation efficiency. Furthermore, because existing participation metrics overlook task relevance and strategy similarity, we design a Strategy-Similarity-Aware Federated Learning (SSAFL) mechanism within the framework to address the inefficiencies in training and communication during policy validation. This mechanism introduces a new metric called the participation score, which evaluates nodes based on both historical strategy similarity and resource availability. Nodes with higher participation scores, indicating stronger task relevance and greater resource availability, are dynamically prioritized for training. In addition, an asynchronous upload mechanism based on model update magnitude is adopted, allowing only significant local updates to be uploaded. This design effectively reduces communication overhead while maintaining model convergence quality. The major contributions of this paper are as follows. Section II reviews the related work. Section III presents the system model of the proposed FEIBN framework. Section IV details the design of the SSAFL. Section V presents the experimental setup and results. Section VI concludes the paper.
IBN abstracts user requirements into high-level intents and automatically maps them to executable network policies, offering a promising approach for achieving automated and intelligent network control in IIoT environments. With the advancement of artificial intelligence, some studies have leveraged AI-driven methods to enhance intent understanding. The authors in [24] introduced an AI-powered IBN architecture that automates the mapping from user intents to policy execution logic. In addition, LLMs have also been explored as powerful tools for semantic alignment in IBN systems. The authors in [25] designed a custom LLM-driven framework for extracting intents in 5G core networks, showcasing significantly improved intent interpretation for policy generation. The authors in [26] proposed an LLM-guided assurance mechanism to detect and correct intent drift in real time, ensuring policy consistency. The authors in [27] introduced an industrial Agentic AI system that decomposes high-level intent into executable control flows using LLM agents, demonstrating feasibility in predictive maintenance scenarios. A summary of related studies is provided in Table I. However, due to the involvement of multiple production-line devices in IIoT environments, it is impractical to frequently deploy and roll back strategies in real-world industrial operations. IBN in IIoT still lacks effective mechanisms for verifying the effectiveness of strategies prior to deployment. In addition, the heterogeneity and distributed nature of IIoT nodes further exacerbate the complexity of centralized policy verification and coordination.
Due to the wide distribution of IIoT nodes and the high sensitivity of local data, federated learning often faces practical challenges such as task diversity, heterogeneous device capabilities, and varying policy applicability across clients, making it difficult to meet the personalized and efficient requirements of IBN policy verification. To address this, several studies have focused on task-aware federated learning approaches. The authors in [28] proposed a federated learning method that emphasizes task similarity among clients by adopting a confidence-aware weighted aggregation strategy, guiding clients with similar tasks to share model parameters more closely and thus improving knowledge transfer efficiency. The authors in [29] introduced a task-granular knowledge aggregation method, where each client selectively integrates only the task-relevant parts of global knowledge to reduce communication costs and mitigate catastrophic forgetting. The authors in [30] presented a personalized federated learning framework based on task similarity, which dynamically adjusts aggregation weights to enhance collaborative effectiveness across tasks. The authors in [23] developed an asynchronous federated learning framework tailored for heterogeneous IoT environments, utilizing asynchronous updates and adaptive aggregation to improve training efficiency and overall stability under non-synchronous conditions. A summary of related studies is provided in Table II. However, most of these methods are designed for general-purpose learning tasks and lack mechanisms specifically tailored for IBN policy verification, such as explicit modeling of task relevance, policy-semantic alignment, and strategy-aware client selection. Particularly in IIoT-based IBN scenarios, where devices are highly heterogeneous, node states are dynamic, and both semantic relevance and communication efficiency are critical, existing approaches fall short in balancing training efficiency with verification quality.
To enable intelligent intent understanding and distributed policy verification in IIoT environments, we propose FEIBN, as illustrated in Fig. 1. The FEIBN framework consists of four core modules: intent expression, intent translation, intent analysis, and network configuration. First, in the intent expression module, users express their intents in multiple modalities, which are processed by a multimodal alignment module composed of pretrained encoders to extract semantic features. These features are then fused and interpreted in the intent translation module by an LLM, producing a structured strategy tuple. Next, in the intent analysis module, strategy validation is initiated across distributed IIoT nodes. A similarity-aware participation scoring mechanism evaluates each node’s relevance to the current strategy and its available resources. Based on this score, a subset of high-quality nodes is selected to participate in local training. Each participating node computes the magnitude of its local model update and uploads the update only if it exceeds a dynamic threshold, ensuring communication efficiency. Finally, in the network configuration module, the central server aggregates these updates to evaluate the policy effectiveness, and, if validated, the policy is deployed to the industrial control system for execution. The main notations used in this paper are shown in Table III.
In IIoT environments, user intents may appear in diverse forms. For instance, a field operator may issue a voice command such as “prioritize safety policies in the pump station due to abnormal vibration,” a supervisor may send a text message like “increase throughput of line B by 10% within 2 hours,” while a monitoring system may provide a visual signal indicating machine overheating. These heterogeneous inputs contain complementary cues: text captures explicit goals, audio conveys urgency or priority, and vision reflects real-time physical states.
We develop an intent expression module in FEIBN that projects text, audio, and images into a unified semantic space, ensuring that intents expressed across diverse industrial contexts can be uniformly interpreted and effectively processed. To achieve consistent interpretation, these heterogeneous signals are first encoded into modality-specific embeddings. Specifically, textual sequences are processed using a pretrained BERT encoder [31], audio waveforms are transformed into latent representations by Wav2Vec2 [32], and visual inputs are converted into high-level semantic features via ResNet [33]. These models are selected for their strong generalization ability and proven robustness across multiple tasks, making them suitable for industrial scenarios where signals exhibit diverse formats and noise patterns. Since these encoders produce representations in different spaces, a learnable linear projection is applied to map each modality into a unified latent space as follows:
where $m$ represents the type of input, and $W_m$ and $b_m$ are trainable parameters. The projected embeddings from multiple modalities are concatenated and then processed by a Transformer encoder, which models cross-modal dependencies and contextual relations among modalities. For example, it can associate the spoken phrase “slow down” with a corresponding visual cue of increasing conveyor-belt speed, thereby reinforcing semantic coherence. Through self-attention, the Transformer learns which modality carries dominant information for a given intent. The resulting fused representation $z$ serves as a comprehensive semantic descriptor that combines textual precision, auditory intent strength, and visual situational awareness. Finally, the fused representation $z$ is passed to an LLM (e.g., GPT [34], DeepSeek [35], and LLaMA [36]), providing a coherent semantic interface for LLM-based intent translation. This unified representation enables the LLM to reason over structurally consistent inputs, thereby improving the accuracy and stability of policy generation and forming the foundation for subsequent strategy generation and strategy-similarity evaluation.
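The per-modality projection step can be illustrated with a minimal sketch. It assumes the projection is a plain affine map $W_m x + b_m$ applied to each encoder output, and it uses tiny hand-written matrices in place of trained parameters; the function names `project` and `align`, the modality keys, and all numeric values are hypothetical. The subsequent Transformer fusion is not modeled here.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Affine map W x + b applied to one modality embedding (assumed form)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def align(embeddings: Dict[str, Vector],
          params: Dict[str, Tuple[Matrix, Vector]]) -> List[Vector]:
    """Project every available modality into the shared space.

    Returns one projected vector per modality, ordered by modality name,
    ready to be concatenated and fed to a fusion encoder.
    """
    tokens = []
    for m in sorted(embeddings):
        W, b = params[m]
        tokens.append(project(embeddings[m], W, b))
    return tokens


# Hypothetical encoder outputs: text is 3-dimensional, audio is 2-dimensional.
embeddings = {"text": [0.5, -1.0, 2.0], "audio": [0.25, 0.5]}
# Hypothetical "trained" parameters mapping both into a 2-dimensional space.
params = {
    "text": ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
    "audio": ([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),
}
tokens = align(embeddings, params)
print(tokens)  # one 2-dimensional vector per modality
```

After projection, every modality has the same dimensionality, which is what allows the concatenated sequence to be processed by a single encoder.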
To ensure that high-level intents can be accurately and efficiently deployed in IIoT networks, the unified semantic representation needs to be converted into executable network strategies. In the intent translation module, the LLM is used for strategy generation, transforming abstract multimodal semantics into actionable and verifiable network configurations. The output strategy generated by an LLM is formally represented as a structured intent tuple, denoted as

$$S = \langle U, G, E, A, T \rangle,$$

where $U$ denotes the user who defines the intent, $G$ denotes the objective, $E$ denotes the infrastructure for deploying the intent, $A$ denotes the set of actions to be executed in the network, and $T$ denotes the period in which the required service is scheduled to occur.
Once the intent tuple $S$ is received, the Central Strategy Engine transforms it into executable strategy tuples, denoted as

$$S' = \langle U, G', E', A', T \rangle,$$

where $G' = \langle g_1, g_2, \ldots, g_n \rangle$ denotes the set of goals, representing the target objectives that the strategy aims to achieve. Each goal $g_n$ can be formally expressed as $g_n = \ell_n \triangleright \theta_n$, with $\ell_n$ representing a metric, $\theta_n$ a threshold, and $\triangleright$ a relational operator (e.g., $>$, $<$, $\geq$, $\leq$). $E' = \langle e_1, e_2, \ldots, e_k \rangle$ identifies the devices or resources affected by the strategy. $A' = \langle a_1, a_2, \ldots, a_k \rangle$ denotes the set of actions to be executed, with each action $a_k$ indicating a concrete operational step. $T$ denotes the period in which the strategy is expected to take effect, that is, when the required service behavior should be enacted. Below, we provide an example output of the intent translation module for a user intent such as “reduce communication delay for the ultrasonic sensing module”:
5}}, (0, 600s) >. The field $U$ identifies the initiating user (operator 02). The goal set $G'$ specifies that the end-to-end latency should be kept below 15 ms. The entity set $E'$ indicates that the strategy targets the ultrasonic module. The action set $A'$ describes a concrete network operation, namely a QoS adjustment that raises the scheduling priority of the corresponding traffic to level 5, encoded through the type and params fields. Finally, the time field $T$ defines a 600-second window during which this strategy should be enforced.
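As an illustration of how such a strategy tuple could be held in memory, the sketch below encodes the example described in the preceding paragraph. It is an assumption about one possible representation, not the serialization used by FEIBN: the class names, field names, and string identifiers such as `operator_02`, `e2e_latency_ms`, and `qos_adjust` are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Relational operators allowed in a goal (metric OP threshold).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass
class Goal:
    metric: str       # the measured quantity, e.g. a latency metric
    op: str           # relational operator, one of the keys of OPS
    threshold: float  # target value the metric is compared against

    def satisfied(self, measured: float) -> bool:
        return OPS[self.op](measured, self.threshold)


@dataclass
class Action:
    type: str               # kind of operation
    params: Dict[str, Any]  # operation-specific parameters


@dataclass
class StrategyTuple:
    user: str                    # U: initiating user
    goals: List[Goal]            # G': goal set
    entities: List[str]          # E': affected devices or resources
    actions: List[Action]        # A': actions to execute
    window: Tuple[float, float]  # T: (start, end) in seconds


# Hypothetical encoding of the example discussed in the text.
strategy = StrategyTuple(
    user="operator_02",
    goals=[Goal("e2e_latency_ms", "<", 15.0)],
    entities=["ultrasonic_module"],
    actions=[Action("qos_adjust", {"priority": 5})],
    window=(0.0, 600.0),
)
print(strategy.goals[0].satisfied(12.0))
```

Keeping each goal as a (metric, operator, threshold) triple makes it directly checkable against telemetry, which is what the later verification and monitoring stages rely on.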
In IIoT environments, where production lines are fixed and downtime costs are high, it is impractical to validate strategies through frequent real-world deployments. Therefore, the intent analysis module is designed to evaluate the effectiveness of strategies in a distributed manner prior to actual deployment.
The intent analysis module initiates a federated learning task based on strategy $S$ to collaboratively train a predictive model capable of evaluating the strategy. We represent the set of IIoT nodes involved as $I_{\text{node}} = \{1, \ldots, i, \ldots, I\}$. Each node $i \in I_{\text{node}}$ possesses a local dataset $D_i$, consisting of samples $(x_j, y_j)$, where $x_j$ denotes the input feature, and $y_j$ is the corresponding label indicating whether policy $S$ is suitable under the local context. Let $w$ denote the shared model parameter and $f(w; x_j; y_j)$ be the loss function on the $j$-th sample. The local objective of node $i$ is defined as
where $|D_i|$ represents the size of the dataset $D_i$. Therefore, the loss function $F(w)$ of the server side can be calculated as
where
According to the above loss function, the optimization objective of FL can be formulated as
where $w^*$ is the optimal global model. After convergence, the global model outputs a deployability score for strategy $S$, reflecting the probability that $S$ can achieve its goal set $G'$ across heterogeneous IIoT nodes. High-scoring strategies are approved for configuration and deployment, while low-scoring ones are refined or re-evaluated. Furthermore, to achieve efficient and scalable federated evaluation across heterogeneous IIoT nodes, a strategy similarity aware federated learning mechanism is employed, which is discussed in Section IV.
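The relationship between the local objectives and the server-side loss can be sketched as follows. The sketch assumes the common FedAvg-style convention in which the local objective is the mean per-sample loss and the global loss weights each node by its share of the total data; the squared-error loss, the scalar model, and the toy datasets are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (input feature x_j, label y_j)
Loss = Callable[[float, float, float], float]


def local_objective(w: float, dataset: List[Sample], loss: Loss) -> float:
    """Mean per-sample loss on one node's local dataset D_i."""
    return sum(loss(w, x, y) for x, y in dataset) / len(dataset)


def global_objective(w: float, datasets: List[List[Sample]], loss: Loss) -> float:
    """Data-size-weighted average of the local objectives (assumed weighting)."""
    total = sum(len(d) for d in datasets)
    return sum(len(d) / total * local_objective(w, d, loss) for d in datasets)


def squared_error(w: float, x: float, y: float) -> float:
    """Hypothetical loss for a one-parameter model y ~ w * x."""
    return (w * x - y) ** 2


node_1 = [(1.0, 2.0), (2.0, 4.0)]  # perfectly fit by w = 2
node_2 = [(1.0, 3.0)]              # residual of 1 at w = 2
print(global_objective(2.0, [node_1, node_2], squared_error))
```

Under this weighting, a node holding twice as many samples contributes twice as much to the global loss, so minimizing it over $w$ is equivalent to minimizing the mean loss over the pooled data without pooling it.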
After the intent analysis module verifies that a candidate strategy S satisfies the performance and safety requirements, the strategy proceeds to the network configuration stage for deployment in the industrial environment. In this stage, the verified intent is translated into executable control commands that are delivered to the corresponding network elements and industrial devices.
The action set $A'$ is mapped to concrete configuration commands for each entity $e \in E'$, which can be abstracted as
where $c_e$ denotes the configuration state of entity $e$ and $\Phi_e(\cdot)$ represents the configuration mapping implemented by the controller.
During deployment, real-time telemetry data, such as latency, bandwidth utilization, equipment status, and workload metrics, are continuously collected and compared with the expected performance objectives defined during strategy generation. Let $l_n(t)$ denote the measured value of metric $\ell_n$ at time $t$. The satisfaction indicator of goal $g_n$ at time $t$ is defined as
and the overall satisfaction of strategy $S'$ at time $t$ is given by
where $J_S(t) = 1$ indicates that all goals in $G'$ are satisfied and $J_S(t) = 0$ otherwise. Over a deployment window $T$, the empirical satisfaction probability of $S'$ is computed as
where $|T|$ denotes the number of observation instants in $T$. When deviations from the desired targets are detected, e.g., when $p_S < p_{\min}$ for a predefined reliability threshold $p_{\min} \in (0, 1)$, the system dynamically adjusts configuration parameters or triggers re-verification through the federated evaluation process. This adaptive feedback ensures that each deployed strategy remains valid and stable even under varying network conditions or workload fluctuations. The network configuration module bridges the gap between intent-level decision-making and operational execution. It ensures that every strategy applied in the IIoT system is validated, explainable, and adaptive to dynamic industrial environments, thereby enabling trustworthy and autonomous operation within the intent-based networking framework.
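The monitoring loop described above can be sketched in a few lines. The sketch assumes that the per-goal indicator is 1 when the measured metric satisfies its relational constraint and 0 otherwise, that the strategy-level indicator requires all goals to hold, and that the satisfaction probability is the fraction of observation instants at which it does. The metric names, telemetry values, and the threshold `p_min = 0.9` are hypothetical.

```python
import operator
from typing import Dict, List, Tuple

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}
Goal = Tuple[str, str, float]  # (metric name, relational operator, threshold)


def goal_indicator(measured: float, op: str, threshold: float) -> int:
    """1 if the measured value satisfies the goal at this instant, else 0."""
    return int(OPS[op](measured, threshold))


def strategy_indicator(sample: Dict[str, float], goals: List[Goal]) -> int:
    """1 only if every goal is satisfied at this instant."""
    return int(all(goal_indicator(sample[m], op, th) for m, op, th in goals))


def satisfaction_probability(telemetry: List[Dict[str, float]],
                             goals: List[Goal]) -> float:
    """Fraction of observation instants at which all goals hold."""
    return sum(strategy_indicator(s, goals) for s in telemetry) / len(telemetry)


def needs_reverification(p_s: float, p_min: float) -> bool:
    """Trigger adjustment or re-verification when reliability drops too low."""
    return p_s < p_min


goals = [("latency_ms", "<", 15.0), ("loss_rate", "<=", 0.01)]
telemetry = [
    {"latency_ms": 12.0, "loss_rate": 0.00},  # both goals hold
    {"latency_ms": 14.0, "loss_rate": 0.01},  # both goals hold
    {"latency_ms": 16.0, "loss_rate": 0.00},  # latency goal violated
    {"latency_ms": 10.0, "loss_rate": 0.02},  # loss goal violated
]
p_s = satisfaction_probability(telemetry, goals)
print(p_s, needs_reverification(p_s, p_min=0.9))
```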
In FEIBN, federated learning is employed to enable distributed policy verification. Traditional FL methods are primarily designed for general-purpose tasks and therefore cannot effectively distinguish which nodes possess the historical knowledge most relevant to the current strategy, nor can they leverage such relevance to guide efficient model training. To address this limitation, we design SSAFL, which introduces a strategy similarity metric to quantify the semantic closeness between the current strategy and each node’s historical strategy set. SSAFL adaptively selects nodes that are both semantically aligned and resource-sufficient, ensuring that nodes with the highest contribution value participate more substantially in the FL process. Furthermore, SSAFL incorporates a similarity-driven asynchronous update mechanism to prioritize meaningful model uploads and aggregation. As shown in Fig. 2, each node evaluates its strategy similarity score and resource availability score, which together determine its adaptability score for participation in the current federated round. Nodes with adaptability scores exceeding the upload threshold are selected to upload their local model updates to the server, while the others are temporarily excluded from the aggregation process. The server then performs a weighted aggregation to update the global model and redistributes it to the nodes that contributed updates. This mechanism ensures that nodes with higher semantic relevance to the current strategy and sufficient computational resources contribute more effectively to the global optimization process.
To accurately quantify the similarity between the current strategy S and the historical strategies maintained by nodes in FEIBN, we design a strategy similarity metric. This metric is decomposed into three components: action similarity, condition similarity, and resource similarity. The strategy similarity score of node i for strategy S is defined as
where $\gamma_1, \gamma_2 \in [0, 1]$ are weights satisfying $\gamma_1 + \gamma_2 = 1$.
The term $\frac{|A \cap A_{i,j}|}{|A \cup A_{i,j}|}$ denotes the action similarity, which evaluates the overlap between the action sets of the two strategies and is calculated using the Jaccard similarity coefficient. Here $|\cdot|$ denotes the cardinality of a set; a value of 1 indicates identical action sets, and a value of 0 indicates no common actions.
The term $\sum_{c \in C} \max_{c' \in c_{i,j}} h(g, g')$ denotes the condition similarity, which measures the degree of alignment between the conditions under which actions are applied. $h(g, g')$ is a pairwise condition similarity function, which can be defined as
where $\mu_g$ and $\mu_{g'}$ are the thresholds of conditions $c$ and $c'$, and $\alpha_g > 0$ is a scaling factor controlling the sensitivity to threshold differences. $h(g, g')$ adopts an exponential decay formulation to measure the semantic closeness between two intent conditions. It ensures that two conditions exhibit a high similarity score when they involve the same performance metric and their thresholds are close, while their similarity decreases rapidly as the threshold gap widens [37]. Such behavior naturally reflects the semantics of intent conditions in IBN, where even small deviations in latency, loss, or throughput constraints may lead to significantly different operational requirements.
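A minimal sketch of the similarity computation follows. Several details are assumptions made for illustration rather than the paper's exact definitions: the pairwise function is taken as $\exp(-\alpha\,|\mu_g - \mu_{g'}|)$ for conditions on the same metric and 0 otherwise, the condition term is averaged over the current strategy's conditions, and a node's score is the best match over its historical strategies. All strategy contents and parameter values are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Condition = Tuple[str, float]  # (performance metric, threshold)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Action similarity: overlap of two action sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pair_similarity(g: Condition, g_prime: Condition, alpha: float) -> float:
    """Exponential decay in the threshold gap; 0 for different metrics (assumed)."""
    if g[0] != g_prime[0]:
        return 0.0
    return math.exp(-alpha * abs(g[1] - g_prime[1]))


def condition_similarity(current: List[Condition], past: List[Condition],
                         alpha: float) -> float:
    """Best historical match per current condition, averaged (assumed normalization)."""
    if not current or not past:
        return 0.0
    return sum(max(pair_similarity(g, gp, alpha) for gp in past)
               for g in current) / len(current)


def strategy_similarity(current: Dict, history: List[Dict],
                        gamma1: float = 0.5, gamma2: float = 0.5,
                        alpha: float = 0.1) -> float:
    """Best weighted similarity over a node's historical strategies (assumed max)."""
    scores = [gamma1 * jaccard(current["actions"], past["actions"])
              + gamma2 * condition_similarity(current["conditions"],
                                              past["conditions"], alpha)
              for past in history]
    return max(scores, default=0.0)


current = {"actions": {"qos_adjust"}, "conditions": [("latency_ms", 15.0)]}
close = {"actions": {"qos_adjust", "reroute"}, "conditions": [("latency_ms", 20.0)]}
far = {"actions": {"sleep"}, "conditions": [("throughput_mbps", 100.0)]}
print(strategy_similarity(current, [close, far]))
```

A node whose history contains the identical strategy scores 1, and a node with no overlapping actions or metrics scores 0, which matches the intended reading of the two components.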
To efficiently select IIoT nodes for federated training in FEIBN, we design a suitability score $H_i$ that evaluates each node’s potential contribution based on two key factors: strategy similarity and resource availability. The suitability score guides the asynchronous training process by preferentially selecting nodes most relevant to the current validation task. For a node $i$ and a target strategy $S$, the suitability score $H_i$ is defined as
where $\mathrm{Res}_i$ denotes the current resource status of node $i$.
The resource availability score $\mathrm{Res}_i$ captures the computational and communication readiness of the node and is computed as
where $U_i$ denotes the normalized CPU utilization of node $i$, $B_i$ denotes the normalized available communication bandwidth, and $\delta_1, \delta_2 \in [0, 1]$ are resource-specific importance weights satisfying $\delta_1 + \delta_2 = 1$.
Given a threshold $\tau_s$, node $i$ is selected to participate in the current training round if $H_i \geq \tau_s$. Otherwise, it remains idle for this training round.
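The selection rule can be illustrated as follows. The functional forms are assumptions: the resource score is taken as $\delta_1 (1 - U_i) + \delta_2 B_i$ so that lower CPU utilization means higher availability, and the suitability score as a convex combination $\beta_1 \mathrm{Sim}_i + \beta_2 \mathrm{Res}_i$. The weights, node records, and the threshold value are hypothetical.

```python
from typing import Dict, List


def resource_score(cpu_util: float, bandwidth: float,
                   delta1: float = 0.5, delta2: float = 0.5) -> float:
    """Availability from idle CPU share and normalized bandwidth (assumed form)."""
    return delta1 * (1.0 - cpu_util) + delta2 * bandwidth


def suitability(sim: float, res: float,
                beta1: float = 0.6, beta2: float = 0.4) -> float:
    """Convex combination of strategy similarity and resource score (assumed form)."""
    return beta1 * sim + beta2 * res


def select_nodes(nodes: List[Dict], tau_s: float) -> List[str]:
    """Keep the nodes whose suitability score reaches the threshold."""
    selected = []
    for n in nodes:
        h = suitability(n["sim"], resource_score(n["cpu"], n["bw"]))
        if h >= tau_s:
            selected.append(n["id"])
    return selected


nodes = [
    {"id": "A", "sim": 0.9, "cpu": 0.2, "bw": 0.8},  # relevant and lightly loaded
    {"id": "B", "sim": 0.2, "cpu": 0.9, "bw": 0.3},  # irrelevant and busy
    {"id": "C", "sim": 0.6, "cpu": 0.5, "bw": 0.5},  # moderate on both
]
print(select_nodes(nodes, tau_s=0.5))
```

With these values, node B is filtered out because it is both weakly related to the strategy and resource-constrained, while A and C participate.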
To efficiently validate strategies in FEIBN, we adopt an asynchronous FL approach, where node participation and model updates occur independently based on each node’s readiness and relevance to the current validation task. Upon receiving the current validation strategy $S$ from the server, each selected node $i$ initiates local training. Each node computes the $L_2$ norm of its local model update, denoted as

$$\|\Delta\theta_i^t\|_2 = \|\theta_i^t - \theta_{\text{global}}^{t-1}\|_2,$$

where $\theta_i^t$ denotes the node’s local model parameters after training, and $\theta_{\text{global}}^{t-1}$ denotes the latest global model parameters received by the node before local training. We define $\|\Delta\theta_i^t\|_2$ as the distance between the model trained by node $i$ and the global model.
We set an update threshold for the node to upload its update only when it exceeds this threshold. The update threshold is defined as
where $\epsilon_{\text{base}}$ is the base threshold value, and $\lambda_s$ is the scaling factor controlling the influence of similarity on the threshold.
The node uploads its model update $\Delta\theta_i^t$ to the server if and only if $\|\Delta\theta_i^t\| \geq \epsilon_i$. Otherwise, the node continues to train its local model until the model distance reaches the threshold, thus avoiding unnecessary communication overhead.
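The upload decision can be sketched as below. The threshold formula used here, $\epsilon_i = \epsilon_{\text{base}} (1 - \lambda_s \mathrm{Sim}_i)$, is an assumption chosen only so that more similar nodes receive a lower threshold and therefore upload more readily; the parameter values are hypothetical.

```python
import math
from typing import List


def update_norm(local: List[float], global_: List[float]) -> float:
    """L2 distance between the locally trained model and the last global model."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(local, global_)))


def upload_threshold(eps_base: float, lam_s: float, sim: float) -> float:
    """Per-node threshold that shrinks as strategy similarity grows (assumed form)."""
    return eps_base * (1.0 - lam_s * sim)


def should_upload(local: List[float], global_: List[float], eps_i: float) -> bool:
    """Upload only when the update is large enough to matter."""
    return update_norm(local, global_) >= eps_i


theta_global = [0.0, 0.0]
theta_local = [3.0, 4.0]  # update norm is 5
eps_similar = upload_threshold(eps_base=6.0, lam_s=0.5, sim=0.9)
eps_dissimilar = upload_threshold(eps_base=6.0, lam_s=0.5, sim=0.1)
print(should_upload(theta_local, theta_global, eps_similar),
      should_upload(theta_local, theta_global, eps_dissimilar))
```

The same update magnitude is uploaded by the high-similarity node and withheld by the low-similarity one, which is the communication-saving behavior the mechanism aims for.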
When a node $i$ uploads its local model update $\Delta\theta_i^t$ to the server after passing the upload threshold, the server performs asynchronous aggregation immediately without waiting for other nodes. The server receives $\Delta\theta_i^t$ and computes the preliminary weight $w'_i$ as
To avoid the situation where important nodes contribute insignificantly due to small update magnitudes, we introduce a minimum weight protection mechanism. The final aggregation weight w i is defined as
where $w_{\min}$ is a predefined minimum weight threshold, and $Q(t)$ denotes the set of nodes whose updates have been received by the server in the current aggregation window. The server asynchronously updates the global model using:
We define the communication cost incurred by node $i$ after the $t$-th round of local training as $\Gamma_i^t$. If the local model $\theta_i^t$ satisfies $\|\Delta\theta_i^t\| \geq \epsilon_i$ and the node uploads the model, we define $\Gamma_i^t = 1$. Therefore, the communication cost of the node is more formally expressed as
In FEIBN, the objective of SSAFL is to minimize the overall communication cost throughout the federated validation process while ensuring that the final global model achieves acceptable validation accuracy. The communication cost of each client throughout the training process is abbreviated as $\Gamma_i = \sum_{t=1}^{T_i} \Gamma_i^t$, where $T_i$ denotes the number of rounds trained by the $i$-th node. Then, the objective function can be formulated as
where $\theta^*$ is the optimal FL training model, and $\nu$ is a constant.

Algorithm 1
…
4: for $e = 1$ to $E_i$ do
5: …
6: end for
7-8: …
9: Upload $\Delta\theta_i$ (and metadata such as $|D_i|$, $t$) to server
10: wait for next $\theta^{t+1}$ from server; $\theta^t \leftarrow \theta^{t+1}$; $\theta_i \leftarrow \theta^t$
11-12: …
13: end if
14: until receive signal from server
The proposed SSAFL training process consists of two components: a client-side training procedure (Algorithm 1) and a server-side coordination mechanism (Algorithm 2). The client module handles local training and decides whether to upload updates based on an update norm threshold. The server module computes similarity-aware participation scores to select relevant nodes and aggregates valid updates asynchronously.
Algorithm 1 specifies the behavior of each participating client. After initialization with the received global model $\theta^t$, intent tuple $S$, and threshold $\epsilon_i$, the client performs local SGD training (Lines 3-6) according to Eq. (4). It then computes the update $\Delta\theta_i = \theta_i - \theta^t$ and its $L_2$ norm (Line 7, Eq. (15)). If the update magnitude exceeds the threshold $\epsilon_i$ (Lines 8-10), the client uploads $\Delta\theta_i$ to the server and waits for the next global model. Otherwise, it continues local training to accumulate larger updates (Lines 11-12), thereby avoiding unnecessary communication. The process repeats until a stop signal is issued by the server (Line 13).
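The client-side behavior described above can be sketched for a toy one-parameter regression model. This is an illustrative reading of the description, not a transcription of Algorithm 1: the model $y \approx w x$, the squared-error loss, the learning rate, and the `max_rounds` cap (which stands in for the server's stop signal) are all assumptions.

```python
import math
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (input feature, label)


def sgd_epoch(theta: List[float], data: List[Sample], lr: float) -> List[float]:
    """One pass of SGD on squared error for the toy model y ~ w * x."""
    w = theta[0]
    for x, y in data:
        w -= lr * 2.0 * (w * x - y) * x
    return [w]


def client_train_until_upload(theta_global: List[float], data: List[Sample],
                              eps_i: float, lr: float = 0.05,
                              local_epochs: int = 1, max_rounds: int = 100
                              ) -> Tuple[Optional[List[float]], int]:
    """Train locally and return (update, rounds) once the update norm reaches eps_i.

    Returns (None, max_rounds) if the threshold is never reached before the
    round cap, which here plays the role of a stop signal from the server.
    """
    theta = list(theta_global)
    for rounds in range(1, max_rounds + 1):
        for _ in range(local_epochs):
            theta = sgd_epoch(theta, data, lr)
        delta = [a - b for a, b in zip(theta, theta_global)]
        if math.sqrt(sum(d * d for d in delta)) >= eps_i:
            return delta, rounds  # significant update: upload it
    return None, max_rounds      # nothing significant to upload


data = [(1.0, 2.0)]  # the optimum is w = 2
delta, rounds = client_train_until_upload([0.0], data, eps_i=0.5)
print(delta, rounds)
```

In this toy run the client stays silent for the first two rounds and uploads once its accumulated update is large enough, so one transmission replaces three.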
Algorithm 2 describes the federated training and aggregation procedure executed by the central server. Lines 1-4 compute the strategy similarity $\mathrm{Sim}_i(S)$ (Eq. (11)) and resource availability $\mathrm{Res}_i$ (Eq. (14)) for each node, then derive the suitability score $H_i$ using Eq. (13). Line 5 selects nodes with $H_i \geq \tau_s$ to participate in training, ensuring only task-relevant and resource-capable nodes are involved. Lines 6-8 set personalized upload thresholds $\epsilon_i$ according to Eq. (16), making high-similarity nodes more likely to upload. Lines 9-24 form the asynchronous event-driven loop: updates are received (Lines 11-13) and pre-weights $w'_i$ are computed (Eq. (17)); micro-batch aggregation is triggered (Lines 14-21), where minimum weight protection and normalization are applied before updating the global model via Eq. (19). The communication counters $\Gamma_i$ are updated following Eq. (20). Finally, convergence is checked (Lines 22-24) based on Eq. (21), and the global model $\theta^t$ is returned (Line 26).
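The aggregation step with minimum weight protection can be sketched as follows. The exact weight formulas are assumptions: the preliminary weight is taken to be each update's share of the total update norm in the current buffer, the protected weight is its maximum with `w_min`, and the protected weights are renormalized before the weighted updates are added to the global model. The buffer contents and `w_min` value are hypothetical.

```python
import math
from typing import Dict, List


def l2(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def aggregation_weights(updates: Dict[str, List[float]],
                        w_min: float) -> Dict[str, float]:
    """Norm-proportional weights, floored at w_min, then renormalized (assumed form)."""
    norms = {i: l2(d) for i, d in updates.items()}
    total = sum(norms.values())
    prelim = {i: (n / total if total > 0 else 1.0 / len(norms))
              for i, n in norms.items()}
    protected = {i: max(w, w_min) for i, w in prelim.items()}
    z = sum(protected.values())
    return {i: w / z for i, w in protected.items()}


def aggregate(theta: List[float], updates: Dict[str, List[float]],
              w_min: float = 0.25) -> List[float]:
    """Apply the weighted sum of buffered updates to the global model."""
    weights = aggregation_weights(updates, w_min)
    new_theta = list(theta)
    for i, delta in updates.items():
        for k, d in enumerate(delta):
            new_theta[k] += weights[i] * d
    return new_theta


# Hypothetical buffer: node A sent a large update, node B a small one.
buffer_q = {"A": [3.0, 4.0], "B": [0.0, 1.0]}
weights = aggregation_weights(buffer_q, w_min=0.25)
theta_next = aggregate([0.0, 0.0], buffer_q, w_min=0.25)
print(weights, theta_next)
```

Without the floor, node B would receive a weight of 1/6; the protection step raises its share so that a relevant node with a small update still influences the global model.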
Algorithm 2 Server-side SSAFL in FEIBN
Require: Intent tuple $S = \langle U, G, E, A, T \rangle$, initial global model $\theta^0$; weight sets $\gamma$, $\beta$, $\delta$; thresholds $\tau_s$, $\epsilon_{\text{base}}$; scale $\lambda_s$; minimum weight $w_{\min}$; stopping tolerance $\nu$.
1: // Node scoring and selection uses Eqs. (11), (13), (14)
2: for each node $i$ do
3: Compute $\mathrm{Sim}_i(S)$ by Eq. (11); compute $\mathrm{Res}_i$ by Eq. (14);
4: …
…
10: Send $(\theta^t, S, \epsilon_i)$ to client $i$
11: end for
12: // Event-driven asynchronous aggregation uses Eqs. (15), (17), (19)
13: Initialize a short event window $\Delta$ and buffer $Q(t) = \emptyset$
14: loop
15: Upon receiving update $\Delta\theta_i$ from any $i \in P$:
16: …

The computational cost of SSAFL follows the same order … [38]. Regarding communication, each client transmits its update only when the condition in Eq. (16) is satisfied. The expected number of transmissions per client is thereby reduced from $T_i$.
According to the convergence conditions in the FL definition given by the literature [39], the convergence of the proposed SSAFL update rule can be analyzed following the asynchronous federated optimization framework in [40]. The detailed convergence analysis of SSAFL is provided in Appendix A.
We model the strategy validation problem as a regression task, where the goal is to predict the effectiveness score of a given strategy unit $S = \langle U, G, E, A, T \rangle$ within its contextual environment. The predicted value $\hat{y}$ is employed to approximate the true deployable outcome $y$.
Experimental Environment. The experiments were carried out on a computing platform running Ubuntu 22.04.5 LTS, equipped with an Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz and 4 × NVIDIA RTX 3090 GPUs. The experiments were implemented in Python 3.9, with federated training simulated using the FedML framework.
Datasets. The data used in the experiments consist of two components. The first is device parameter data obtained from the publicly available Edge-IIoTset [41] dataset, which includes real device operation logs and sensor parameters across various IIoT scenarios and thus reflects IIoT node behaviors and characteristics under different operating conditions. The second is intent-related data, which covers common business requirements in IIoT scenarios, such as bandwidth allocation, latency constraints, and energy-throughput trade-offs. In our setup, each client holds heterogeneous data sources, which naturally form a feature-skew non-IID distribution. Moreover, since the performance gains of SSAFL stem primarily from its similarity-aware scoring mechanism and asynchronous evaluation dynamics rather than from dataset-specific statistical properties, the same qualitative trends are expected to hold across different datasets.

Methods. We conducted comparative experiments on several federated learning strategies, including FedAvg [42], Federated Asynchronous Learning (FedAsyn) [40], and Semi-Asynchronous FL (SemiAsyn) [43]. In FedAsyn, the server updates the global model immediately upon receiving an update from any client, whereas in SemiAsyn, the server performs an update once it has received updates from the top k clients.
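The difference between the two asynchronous baselines can be illustrated by counting global updates over a stream of client arrivals. This is a simplified sketch, not the baselines' actual implementations: "top k clients" is interpreted here as the first k updates to arrive, and the arrival sequence is hypothetical.

```python
def count_global_updates(arrivals, k):
    """Count server updates when the server aggregates after every k received
    client updates. k = 1 mimics FedAsyn-style behaviour, k > 1 SemiAsyn-style."""
    buffered = 0
    updates = 0
    for _client in arrivals:
        buffered += 1
        if buffered == k:
            updates += 1
            buffered = 0
    return updates

# Hypothetical arrival order of client ids (fast client 1 arrives most often).
arrivals = [1, 5, 1, 10, 1, 5, 1]
fedasyn_updates = count_global_updates(arrivals, k=1)
semiasyn_updates = count_global_updates(arrivals, k=3)
```

For the same seven arrivals, the immediate rule yields seven global updates while the buffered rule yields two, with one update still waiting in the buffer.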
To evaluate the contribution of the multimodal alignment module, we analyze the accuracy of the generated strategy tuples S. As shown in Fig. 3, the alignment module notably improves the precision of slot prediction, with the most significant gain observed in the “Action” slot. This indicates that multimodal semantic fusion helps the model capture complex operational intents that cannot be fully expressed in text alone.
Fig. 4 shows the variation in the number of federated evaluations under different matching accuracies. As alignment accuracy increases from 0.6 to 0.9, the number of evaluations performed decreases significantly. This result indicates that higher alignment quality enhances the semantic consistency of the strategies generated by the LLM (i.e., GPT-5.1 and DeepSeek-V3.2), enabling the system to make more accurate and confident decisions. Consequently, fewer redundant verifications are required, thereby improving the overall efficiency of the federated evaluation process.

Fig. 5 shows the total time required for strategy deployment across different methods. Adding only the alignment module slightly increases the deployment time due to the additional semantic parsing process. In contrast, FEIBN, which integrates both alignment and federated evaluation, incurs a higher overall time cost, especially under lower alignment accuracy such as FEIBN-0.6, where more verification rounds are required. As the alignment accuracy increases to FEIBN-0.9, the deployment time decreases accordingly, indicating that improved alignment quality enhances the efficiency of federated validation and reduces the number of verifications.
We randomly assign each node a subset of the training data from the dataset as its local training set, while the test set is retained on the server for performance evaluation. Following previous experimental settings, we compare SSAFL with other FL methods, with each method repeated five times. In addition, an ablation experiment is conducted on the adaptive model aggregation at the server side within SSAFL to verify the impact of this controllable factor on model training. When SSAFL does not include adaptive aggregation, it is denoted as SSAFL*. The experimental results are reported in Table IV as point estimates using the mean ± standard deviation. Fig. 6 illustrates the R²-based training curves of five federated learning methods. SSAFL achieves the best training performance among all compared methods, converging to an R² of 0.89 within only 15 epochs. Its ablated variant SSAFL* also performs well, validating the effectiveness of similarity-aware node selection. FedAvg and FedAsyn show slower convergence and lower final R² scores, around 0.85 and 0.83 respectively. Overall, these results highlight the advantages of combining intent-aware participation scoring and asynchronous communication in federated policy verification.
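For reference, the R² metric and the mean ± standard deviation summary over repeated runs can be computed as in the following sketch. The use of the sample standard deviation (ddof=1) is an assumption, since the text does not state which estimator is used, and the numbers below are toy values rather than results from the experiments.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def summarize(values):
    """Mean and sample standard deviation over repeated runs."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1)

# Toy effectiveness scores (not experimental data).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
r2 = r2_score(y_true, y_pred)  # SS_res = 1, SS_tot = 5

# Toy R² values standing in for five repeated runs of one method.
mean_r2, std_r2 = summarize([0.5, 0.5, 0.5, 0.5, 0.5])
```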
To evaluate the communication cost of different FL strategies under heterogeneous client latency, we configure Client 1, Client 5, and Client 10 as fast, medium, and slow clients, respectively, by assigning different local training times and upload delays. The experimental results are displayed in Fig. 7. Synchronous FedAvg produces identical communication rounds for all clients since each aggregation must wait for the slowest client. In contrast, asynchronous strategies show clear disparities. Fast clients upload much more frequently, while slow clients contribute fewer updates. SSAFL achieves the lowest communication rounds across all clients by suppressing redundant fast-client uploads and filtering low-impact updates from slow clients.
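The contrast between synchronous and asynchronous round counts can be reproduced qualitatively with a small sketch. It assumes that each client needs a fixed time per local round (training plus upload) and that every completed round is uploaded. The per-round times and the time horizon are hypothetical, and SSAFL's threshold-based filtering is not modelled.

```python
def sync_rounds(periods, horizon):
    """Synchronous aggregation: every round lasts as long as the slowest client,
    so all clients complete the same number of rounds."""
    rounds = int(horizon // max(periods.values()))
    return {client: rounds for client in periods}

def async_rounds(periods, horizon):
    """Asynchronous aggregation: each client uploads at its own pace."""
    return {client: int(horizon // p) for client, p in periods.items()}

# Hypothetical per-round times for fast, medium and slow clients.
periods = {1: 1.0, 5: 2.0, 10: 5.0}
horizon = 20.0

sync_counts = sync_rounds(periods, horizon)
async_counts = async_rounds(periods, horizon)
```

Under these assumptions the synchronous scheme gives every client four rounds, while the asynchronous scheme gives 20, 10 and 4 rounds to the fast, medium and slow clients respectively.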
In this paper, we have proposed FEIBN, a Federated Evaluation Enhanced Intent-Based Networking framework tailored for IIoT environments. FEIBN leverages large language models to align heterogeneous multimodal intents into structured strategy tuples, and integrates federated learning to achieve distributed policy verification without exposing sensitive local data. To address the challenges of communication cost and training efficiency, we have further designed SSAFL, a Strategy Similarity Aware Federated Learning mechanism that combines similarity-aware node selection with adaptive asynchronous update thresholds. The experiments have demonstrated that SSAFL significantly improves model accuracy and convergence speed while reducing communication overhead compared with existing synchronous and asynchronous baselines. The ablation studies further validated the effectiveness of similarity-aware participation scoring and adaptive aggregation in enhancing federated policy verification.
According to the convergence conditions in the FL definition given by [39], [44], suppose that centralized learning converges to the optimal model parameter $\theta^{(c)}$ and FL converges to the optimal model parameter $\theta^{(f)}$. If the gap between the two is sufficiently small, that is, $\|\theta^{(f)} - \theta^{(c)}\| < \rho$ for an arbitrarily small constant $\rho > 0$, the FL model is said to converge.
We analyze the proposed SSAFL under standard smoothness assumptions [45], [46] for the global objective $F(\theta) = \sum_{i=1}^{I} p_i F_i(\theta)$, where $p_i = |D_i| / \sum_j |D_j|$. Recall that in each aggregation event, the server updates $\theta_{t+1} = \theta_t + \sum_{i \in Q(t)} w_i \Delta\theta_i^{t_i}$, where $Q(t)$ is the set of clients that arrived within the micro-batch window, $t_i \le t$ is the (possibly stale) local generation time of $\Delta\theta_i^{t_i}$, and $w_i$ are the similarity-aware aggregation weights after minimum-weight protection and renormalization. Each client $i$ uploads only if $\|\Delta\theta_i^{t_i}\|_2 \ge \epsilon_i$, where $\epsilon_i = \epsilon_{\mathrm{base}}(1 + \lambda_s(1 - \mathrm{Sim}_i(S)))$. […], where $w'_i$ are the pre-weights before minimum-weight protection. Assumption A6 is mild: with thresholded uploads and renormalization, the effective deviation from the pre-weighted update is bounded; the bound improves as $\epsilon_{\mathrm{base}}$ or $\lambda_s$ decreases.
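By $L$-smoothness and the update rule,

$$F(\theta_{t+1}) \le F(\theta_t) + \Big\langle \nabla F(\theta_t), \sum_{i \in Q(t)} w_i \Delta\theta_i^{t_i} \Big\rangle + \frac{L}{2} \Big\| \sum_{i \in Q(t)} w_i \Delta\theta_i^{t_i} \Big\|^2 .$$

Each client's local update with step size $\eta$ and $E_i$ steps satisfies $\mathbb{E}[\Delta\theta_i^{t_i} \mid \theta_{t_i}] \approx -\eta E_i \nabla F_i(\theta_{t_i})$ and $\mathbb{E}\|\Delta\theta_i^{t_i}\|^2 \le c_1 \eta^2 E_i^2 (\|\nabla F_i(\theta_{t_i})\|^2 + \sigma^2)$ for some constant $c_1$ determined by the local optimizer. Using bounded staleness (A3) and smoothness, we relate stale gradients to current ones: $\|\nabla F_i(\theta_{t_i}) - \nabla F_i(\theta_t)\| \le L \|\theta_{t_i} - \theta_t\| \le c_2 \eta \tau_{\max}$, which yields the following descent lemma.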
Lemma 1 (Descent with staleness and trigger). Under A1-A6 and $\eta \le$ […]

Theorem 2 (PL condition). If $F$ satisfies the Polyak-Łojasiewicz (PL) inequality $\frac{1}{2}\|\nabla F(\theta)\|^2 \ge \mu (F(\theta) - F^\star)$ for some $\mu > 0$, then for $\eta \le \min\big\{ \frac{1}{4L}, \frac{\mu}{4L^2(\tau_{\max}+1)} \big\}$,

$$\mathbb{E}[F(\theta_{t+1}) - F^\star] \le (1 - \mu\eta E)\, \mathbb{E}[F(\theta_t) - F^\star] + c_4 \eta^2 \Sigma + \eta L \tau_{\max} + \zeta ,$$

i.e., linear convergence to a neighborhood whose radius scales with variance $\Sigma$, staleness $\tau_{\max}$, and trigger bias $\zeta$.
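Unrolling the recursion in Theorem 2 makes the size of the limiting neighborhood explicit. The following is a standard geometric-series argument rather than a statement taken from the source: $q = 1 - \mu\eta E$ and $B$ (the additive term on the right-hand side of the recursion) are shorthand introduced here, $0 < \mu\eta E < 1$ is assumed, and $\theta_0$ is treated as deterministic.

```latex
\begin{align*}
\mathbb{E}[F(\theta_t) - F^\star]
  &\le q^{t}\,\bigl(F(\theta_0) - F^\star\bigr) + B \sum_{s=0}^{t-1} q^{s} \\
  &=   q^{t}\,\bigl(F(\theta_0) - F^\star\bigr) + B\,\frac{1 - q^{t}}{1 - q} \\
  &\le q^{t}\,\bigl(F(\theta_0) - F^\star\bigr) + \frac{B}{\mu \eta E}.
\end{align*}
```

The first term vanishes geometrically, so the expected optimality gap settles within $B / (\mu\eta E)$, which grows with the variance, staleness, and trigger-bias contributions collected in $B$.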
Thresholds & similarity. Larger similarity $\mathrm{Sim}_i(S)$ gives smaller $\epsilon_i$ and hence more frequent uploads; this reduces $\zeta$ (smaller trigger bias) and tightens the neighborhood in Theorem 2, at the cost of more communication. Conversely, a larger $\lambda_s$ or $\epsilon_{\mathrm{base}}$ shrinks traffic but increases $\zeta$.
Minimum-weight protection. Enforcing $w_i \ge w_{\min}$ prevents starvation of informative but low-magnitude updates, which stabilizes $E$ and improves the contraction factor $1 - \mu\eta E$.
Staleness. A smaller micro-batch window and bounded network delay keep $\tau_{\max}$ small, reducing the degradation terms $O(L\eta\tau_{\max})$ and improving both bounds.
Overall, SSAFL achieves standard convergence guarantees of asynchronous federated optimization under common assumptions, while its similarity-aware triggering and weighting introduce explicit, controllable trade-offs among accuracy, communication, and delay.
Algorithm 2 (residual fragment): […] $w_i \leftarrow \max\{w_{\min}, w'_i\}$, $w_i \leftarrow$ […]; if $F(\theta_t) \le F(\theta^*) + \nu$ or $t \ge T_{\max}$ then […]