Fine-Grained Network Traffic Classification with Contextual QoS Profiling
Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems. While recent advances in application-level classification sh…
Authors: Huiwen Zhang, Feng Ye
Huiwen Zhang, Graduate Student Member, IEEE, Feng Ye, Senior Member, IEEE

Abstract: Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems. While recent advances in application-level classification show high accuracy, they often miss fine-grained in-app QoS variations critical for service differentiation. This paper proposes a hierarchical graph neural network (GNN) framework that combines a three-level graph representation with an automated QoS-aware assignment algorithm. The model captures multi-scale temporal patterns via packet aggregation, time-window clustering, and session-level behavior modeling. QoS priorities are derived using five key metrics (bandwidth, jitter, packet stability, burst frequency, and burst stability), processed through logarithmic transformation and weighted ranking. Evaluations across 14 usage scenarios from YouTube, Prime Video, TikTok, and Zoom show that the proposed GNN significantly outperforms state-of-the-art methods in service-level classification. The QoS-aware assignment further refines classification to enhance user experience. This work advances QoS-aware traffic classification by enabling precise in-app usage differentiation and adaptive service prioritization in dynamic network environments.

I. INTRODUCTION

Accurate network traffic classification (NTC) is essential for effective network management, particularly in the context of emerging and future applications that demand stringent performance guarantees. For example, applications such as edge cloud computing [1], real-time extended reality [2], autonomous vehicle communication [3], and industrial IoT [4] rely heavily on low-latency, high-throughput, and highly reliable network services.
These applications introduce complex traffic patterns characterized by variable data rates, strict latency constraints, and dynamic resource demands that fluctuate in real time. As a result, precise NTC and Quality of Service (QoS) management have become increasingly critical to ensure application performance and user experience.

NTC has evolved significantly over the years, transitioning from traditional port-based methods and deep packet inspection to more advanced statistical and AI-driven techniques [5]–[7]. Recent developments in application-level NTC have achieved notable success, particularly in encrypted traffic classification, where models can infer application types without accessing payload content [8]. These approaches have demonstrated high accuracy in Internet and mobile application identification [9]–[12], and have also significantly advanced anomaly detection [13]–[15], enabling proactive service assurance and threat mitigation. However, most existing methods focus on identifying the application or traffic type rather than capturing nuanced in-app QoS differences, such as distinguishing between video streaming at different resolutions or between interactive and background data flows. This limitation stems from their original design goals, which prioritized coarse-grained classification over the fine-grained service differentiation required for advanced QoS provisioning.

This project is partially supported by the U.S. National Science Foundation under Grant 2344341. Huiwen Zhang and Feng Ye (corresponding) are with the Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Wisconsin, WI, USA. Emails: {hzhang2279, feng.ye}@wisc.edu.

To address these limitations, QoS-oriented NTC methods have emerged, aiming to classify traffic based on QoS attributes such as throughput, latency, and jitter [16], [17].
While these methods provide valuable insights into network performance, they often rely on handcrafted features and manually defined service categories, which limit their scalability and adaptability to new or evolving applications. Furthermore, the rigid mapping between traffic patterns and QoS labels can hinder generalization across diverse network environments.

In this work, a new hierarchical graph neural network (GNN) framework is proposed to address these challenges. The proposed framework integrates a three-level hierarchical graph representation with an automated, magnitude-based QoS awareness assignment algorithm. It captures multi-scale temporal patterns through packet aggregation at Level-1, time window clustering at Level-2, and session-level behavioral modeling at Level-3. Classification is performed at the time window level (Level-2), leveraging both fine-grained packet-level features and broader session-level context to enhance usage pattern discrimination. Moreover, the newly developed QoS awareness assignment algorithm takes into consideration five different QoS attributes: bandwidth, jitter, packet stability, burst frequency, and burst stability. By taking a logarithmic transformation of the raw values, each traffic flow can be dynamically assigned to a QoS class defined by all five metrics. A weighted ranking algorithm is further implemented to establish data-driven service priorities that automatically adapt to traffic distribution characteristics. Evaluations are conducted on traffic traces collected from 14 different usage scenarios across YouTube, Prime Video, TikTok, and Zoom. The results demonstrate that the newly developed QoS-aware NTC enables fine-grained differentiation of in-app usage patterns (e.g., TikTok browsing vs. live streaming vs.
long-form video) while ensuring appropriate QoS provisioning by prioritizing service quality preservation over resource optimization, compared with a standard application-level NTC approach.

The contributions of this work are fourfold. First, a novel three-level hierarchical graph representation is introduced, capturing temporal dependencies from packet-level interactions to session-level behaviors, thereby enabling fine-grained traffic classification beyond traditional application-level identification. Second, an automated magnitude-based QoS awareness assignment algorithm is developed, using logarithmic transformation and automated grouping to establish consistent, data-driven QoS priorities across diverse network conditions. Third, a QoS-aware training framework is proposed, incorporating composite loss functions and inference strategies that prioritize service quality preservation, ensuring over-provisioning rather than under-provisioning for critical applications. Finally, comprehensive experimental validation is conducted, demonstrating significant improvements in QoS experience while maintaining competitive classification performance across 14 distinct usage scenarios spanning YouTube, Prime Video, TikTok, and Zoom.

The remainder of this paper is organized as follows: Section II reviews related work in network traffic classification and QoS-aware systems. Section III presents the proposed hierarchical GNN framework and graph construction methodology. Section IV details the QoS awareness assignment algorithm and QoS-aware training strategies. Section V provides comprehensive experimental evaluation and results analysis. The paper concludes in Section VI with future research directions.

II. RELATED WORK

A. AI-based Network Traffic Classification

Prior research in NTC has laid a strong theoretical foundation from multiple perspectives.
Table I summarizes representative research on NTC, focusing on recent AI techniques. As it shows, recent AI-based solutions have demonstrated high accuracy (typically around 90%) in basic app identification even on encrypted traffic [18]–[21]. Among these approaches, GraphDapp [20] models traffic flows as graph structures, where nodes represent network endpoints and edges capture communication patterns, enabling effective app identification with 89% accuracy through graph neural networks. ProGraph [21] extends graph-based approaches by incorporating protocol-level features and achieving over 92% accuracy in distinguishing between different applications under distinct networking scenarios. Parallel efforts in network intrusion and anomaly detection have also achieved high accuracy rates (>95%) [22]–[27]. CADE [27] employs contrastive learning to detect adversarial attacks in encrypted traffic, achieving over 95% detection accuracy by learning robust feature representations. ACID [26] improves model robustness against evasion attacks, demonstrating 99% accuracy in identifying malicious traffic patterns. BARS [25] specifically addresses the robustness of NTC systems against adversarial perturbations. However, most existing work focuses on coarse-grained app-level labeling. In practice, traffic patterns from the same app can be highly heterogeneous, reflecting the contextual complexity of service behaviors. As a result, while these solutions provide a solid foundation, they often fall short in supporting QoS provisioning or resource management in edge network environments.

Table I: A comparison summary of selected prior literature.

Algorithm      | Flow | Target          | Acc.  | QoS
---------------|------|-----------------|-------|----
AI-NTC [18]    | N/A  | app label       | > 90% | No
ET-BERT [19]   | ↑↓   | app label       | > 92% | No
GraphDapp [20] | ↑↓   | app label       | > 89% | No
ProGraph [21]  | ↑↓   | app label       | > 90% | No
CADE [27]      | ↑↓   | Attacks         | > 95% | N/A
ACID [26]      | ↑↓   | Attacks         | > 99% | N/A
AWEE [28]      | N/A  | Attacks         | > 98% | N/A
BARS [25]      | N/A  | Robustness      | N/A   | N/A
P2P-act [29]   | ↓    | P2P actions     | N/A   | No
Web-act [30]   | N/A  | Web actions     | > 92% | No
CUMMA [31]     | ↓    | MSG services    | > 90% | Yes
rCKC+FRF [32]  | ↑↓   | MSG/SM services | > 94% | Yes

B. Service-Aware Network Traffic Classification

To bridge the gap between coarse-grained application classification and service-level differentiation, researchers have explored fine-grained NTC methods over the past decade. Early efforts focused on identifying functional categories within specific applications. For example, Park et al. [29] proposed a method to classify peer-to-peer (P2P) traffic by activity type (e.g., download, upload, and search) using Jaccard similarity. Their analysis showed that downloading traffic dominated usage on platforms such as Fileguri and BitTorrent, accounting for 74%–90% of total traffic. Lin et al. [30] extended this approach to web applications. They classified user actions such as video streaming and map browsing by analyzing statistical features from HTTPS messages without relying on payload inspection. Their method achieved up to 98.30% accuracy. Fu et al. [31] focused on mobile messaging applications such as WeChat and WhatsApp. By combining packet- and flow-level features, they classified activities like text messaging and voice calls with over 90% accuracy. Liu et al. [32] developed a real-time analysis framework for encrypted mobile traffic. Their method achieved 94.01% accuracy on WeChat while significantly improving processing speed and memory efficiency. Despite these advances, scaling fine-grained classification across a broad range of applications in dynamic, heterogeneous edge environments remains an open challenge.

III. FRAMEWORK OF THE HIERARCHICAL GRAPH NEURAL NETWORK BASED NTC

The overall architecture of the proposed hybrid Graph Neural Network (GNN) model is depicted in Fig. 1. The model is designed to capture multi-scale structural and temporal dependencies inherent in network traffic through a three-tiered hierarchical encoding framework, followed by a unified classification module.

Figure 1: Overview of the hierarchical GNN framework (three graph encoders, one per level, followed by a QoS-aware MLP).

A. Graph Construction

Network packets are initially grouped based on the canonical 5-tuple: source IP address, source port, destination IP address, destination port, and protocol. The source and destination IP addresses are considered interchangeable for the bidirectional flows in the same session. A session timeout threshold (e.g., 0.5 seconds) is applied to segment prolonged flows into shorter sessions, while a maximum session duration (e.g., 60 seconds) is enforced to bound session length.

To model the hierarchical and temporal structure of network traffic, we construct a three-level graph representation that captures traffic characteristics at multiple granularities: packet aggregation, time windowing, and session clustering. As illustrated in Fig. 1, the graph construction proceeds in three stages, each corresponding to a distinct level of abstraction. The architecture incorporates 18 semantic features (Table II) that encode statistical and temporal properties across these levels, enabling the model to learn expressive, multi-scale representations for QoS-aware traffic classification.
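The session segmentation described at the start of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the packet field names (`ts`, `src_ip`, and so on) and the helper names are hypothetical, and the way the idle timeout and the maximum duration interact (a new session starts as soon as either limit is exceeded) is one plausible reading of the text.

```python
from collections import defaultdict

IDLE_TIMEOUT_S = 0.5   # example idle timeout quoted in the text
MAX_DURATION_S = 60.0  # example maximum session duration quoted in the text


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-agnostic 5-tuple so both directions of a flow share one key."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def split_sessions(packets, idle_timeout=IDLE_TIMEOUT_S, max_duration=MAX_DURATION_S):
    """Group packets by flow key, then cut each flow into short sessions.

    `packets` is an iterable of dicts with hypothetical field names:
    ts (seconds), src_ip, src_port, dst_ip, dst_port, proto.
    Returns {flow_key: [session, ...]}; each session is a time-ordered list.
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = flow_key(pkt["src_ip"], pkt["src_port"],
                       pkt["dst_ip"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)

    sessions = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p["ts"])
        current = [pkts[0]]
        out = [current]
        for pkt in pkts[1:]:
            idle = pkt["ts"] - current[-1]["ts"]      # gap since previous packet
            duration = pkt["ts"] - current[0]["ts"]   # span since session start
            if idle > idle_timeout or duration > max_duration:
                current = [pkt]                       # start a new session
                out.append(current)
            else:
                current.append(pkt)
        sessions[key] = out
    return sessions
```

Sorting the endpoint pairs in `flow_key` is one simple way to treat source and destination as interchangeable; other canonicalizations would work equally well.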
Level-1 Nodes (Packet Aggregation): Within each Level-2 time window, packets are grouped into Level-1 nodes based on a fixed packet count (e.g., 10 packets per node). Each Level-1 node is represented by a 9-dimensional feature vector capturing fine-grained traffic characteristics, including basic statistical metrics and inter-arrival timing patterns (Table II). Higher-order distributional features are excluded at this level. Each packet cluster forms an independent Level-1 subgraph.

Level-2 Nodes (Time Window Clusters): Within each short session (segmented using a fixed idle timeout, e.g., 0.5 second), non-empty time windows (e.g., 100 ms) are aggregated into Level-2 nodes. Each node represents a time window cluster and forms an independent Level-1 subgraph. Each Level-2 node is encoded with an additional 11-dimensional feature vector comprising nine shared features and two higher-order statistical features (skewness and kurtosis), computed as described in Table II. These features provide medium-grained temporal insights and capture the distributional characteristics of packet lengths within each time window.

Level-3 Nodes (Session Aggregation): Multiple short sessions associated with the same 5-tuple are aggregated into Level-3 nodes. To ensure compatibility with real-time constraints, each Level-3 session is limited to a fixed maximum duration (e.g., 60 seconds), with longer sessions split accordingly. Beyond the embedding features from the Level-2 subgraph, each Level-3 node is encoded with an additional 11-dimensional feature vector, including four shared features (packet count, total bytes, average packet size, uplink ratio) and seven session-specific features (session duration, packet rate, byte rate, flow symmetry, burst count, average burst size, inter-burst time), as detailed in Table II. These features abstract long-term behavioral patterns and emphasize burst-level dynamics and flow characteristics.

Table II: Multi-level feature definitions.

Feature        | Notation and calculation | L1 | L2 | L3
---------------|--------------------------|----|----|----
# packet       | $n$ | ✓ | ✓ | ✓
Total bytes    | $\sum_{i=1}^{n} l_i$ | ✓ | ✓ | ✓
Mean(bytes)    | $\frac{1}{n}\sum_{i=1}^{n} l_i = \bar{l}$ | ✓ | ✓ | ✓
Var(bytes)     | $\frac{1}{n}\sum_{i=1}^{n} (l_i - \bar{l})^2 = \sigma_l^2$ | ✓ | ✓ |
Uplink ratio   | $\frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(\text{src}_i = \text{client\_ip})$ | ✓ | ✓ | ✓
Mean(IAT)      | $\frac{1}{n-1}\sum_{k=2}^{n} (t_k - t_{k-1}) = \overline{\text{IAT}}$ | ✓ | ✓ |
Var(IAT)       | $\frac{1}{n-1}\sum_{k=2}^{n} (\text{IAT}_k - \overline{\text{IAT}})^2$ | ✓ | ✓ |
Min(IAT)       | $\min_{2 \le k \le n} \text{IAT}_k$ | ✓ | ✓ |
Max(IAT)       | $\max_{2 \le k \le n} \text{IAT}_k$ | ✓ | ✓ |
Skewness       | $\frac{\frac{1}{n}\sum_{i=1}^{n} (l_i - \bar{l})^3}{(\sigma_l^2)^{3/2}}$ if $\sigma_l^2 > 0$, else $0$ | | ✓ |
Kurtosis       | $\frac{\frac{1}{n}\sum_{i=1}^{n} (l_i - \bar{l})^4}{(\sigma_l^2)^2} - 3$ if $\sigma_l^2 > 0$, else $0$ | | ✓ |
Session dur.   | $t_{\text{end}} - t_{\text{start}}$ | | | ✓
Packet rate    | $n / (t_{\text{end}} - t_{\text{start}})$ | | | ✓
Byte rate      | $\left(\sum_{i=1}^{n} l_i\right) / (t_{\text{end}} - t_{\text{start}})$ | | | ✓
Flow symm.     | $1 - \frac{\lvert l_{\text{up}} - l_{\text{down}} \rvert}{\max(l_{\text{up}}, l_{\text{down}})}$ | | | ✓
Burst count    | $\lvert \{ B_i : \text{IAT} \le 100\,\text{ms} \} \rvert$ | | | ✓
Mean(burst)    | $\frac{1}{B}\sum_{i=1}^{B} \lvert \text{burst}_i \rvert$ | | | ✓
Burst interval | $\frac{1}{B-1}\sum_{i=1}^{B-1} (t_{\text{start},i+1} - t_{\text{end},i})$ | | | ✓
# Features     | | 9 | 11 | 11

To address the challenge of graphs containing only a single real node, where GNNs struggle due to the absence of neighborhood context, auxiliary head and tail nodes are introduced at all levels. These auxiliary nodes are assigned zero-valued feature vectors with dimensionality matching that of the corresponding real nodes (9-dimensional for Level-1, and 11-dimensional for Level-2 and Level-3).

Intra-level Edges: Within each level, nodes are fully connected in forward temporal order, with edge weights reflecting time delays between consecutive nodes $i$ and $j$:

$$\text{edge\_weight}^{L1}_{i,j} = \text{timestamp}_j - \text{timestamp}_i, \tag{1a}$$
$$\text{edge\_weight}^{L2}_{i,j} = \text{center\_time}_j - \text{center\_time}_i, \tag{1b}$$
$$\text{edge\_weight}^{L3}_{i,j} = \text{session\_start}_j - \text{session\_end}_i. \tag{1c}$$

These edge weights encode temporal dependencies: Level-1 edges capture delays between packet aggregations, Level-2 edges capture delays between time window centers, and Level-3 edges capture inter-session gaps.
Zero-weight edges connect auxiliary head and tail nodes to the first and last real nodes, respectively, ensuring structural consistency.

Inter-level Edges: The hierarchical structure maintains strict correspondence across levels without explicit inter-level edges. Each Level-3 session node aggregates multiple Level-2 time window subgraphs, and each Level-2 node aggregates multiple Level-1 packet cluster subgraphs. Information propagates bottom-up through learned feature embeddings: Level-1 features inform Level-2 representations, which in turn inform Level-3 behavioral abstractions.

This hierarchical design enables the model to capture multi-scale temporal patterns, ranging from fine-grained packet-level interactions (Level-1), through medium-grained time window dependencies (Level-2), to coarse-grained session-level behaviors (Level-3), while preserving temporal ordering and causal relationships at each level.

B. Hierarchical Graph Encoder and QoS-aware Classifier

Figure 2: Overview of the graph encoder (BatchNorm, GATv2, BatchNorm + ELU, GATv2, pooling, graph embeddings).

A sub-graph in each level is processed by a 2-layer graph encoder based on GATv2 [33], as depicted in Fig. 2. The encoder design differs slightly across levels, as described in the following.
• The Level-1 graph encoder employs 2 attention heads with edge feature integration in the first layer, transforming input features to 64-dimensional representations. The second layer uses single-head attention to produce 64-dimensional node embeddings. Dual global pooling operations (mean and max) aggregate node representations into 128-dimensional cluster embeddings.
• The Level-2 graph encoder processes the 11-dimensional features from time window nodes and the 128-dimensional Level-1 cluster embeddings, creating 139-dimensional features.
The augmented features undergo 2-layer GATv2 processing: the first layer with 2 attention heads expands to 256 dimensions, while the second layer with single-head attention consolidates to 128-dimensional embeddings. Global pooling produces 256-dimensional time window representations.
• The Level-3 graph encoder processes the 11-dimensional features from session nodes and the 256-dimensional Level-2 embeddings, creating 267-dimensional features. Similar 2-layer GATv2 processing expands features to 256 dimensions, then consolidates to 128-dimensional session embeddings. Global pooling yields 256-dimensional session-level representations.

Figure 3: Overview of the QoS-aware classifier (linear layers with BatchNorm + ReLU produce raw logits; a QoS awareness matrix supplies the QoS penalty added to the cross-entropy loss during training and the bias applied to the logits for the final prediction).

The final stage of the model is a QoS-aware classification network designed for fine-grained traffic categorization at Level-2. As shown in Fig. 3, the classifier leverages a multi-scale feature fusion strategy, combining context from both the Time Window (TW) and its parent Session to make a prediction. For each TW graph to be classified, its learned 256-dimensional embedding, $E_{tw}$, is concatenated with the 256-dimensional embedding of its corresponding parent Session, $E_{session}$, which contains the information learned from all three levels. This creates a combined 512-dimensional feature vector, $E_{combined} = [E_{tw} \,\|\, E_{session}]$, that encapsulates both temporal patterns directly from the TW and behavioral context from all three levels. This combined embedding is then passed through a Multi-Layer Perceptron (MLP) which acts as the classifier:
1) A linear layer maps the 512-dimensional input to a 512-dimensional hidden space, followed by Batch Normalization, a ReLU activation, and Dropout.
2) A second linear layer reduces the dimensionality from 512 to 256, again followed by Batch Normalization, ReLU, and Dropout.
3) A final linear output layer maps the 256-dimensional representation to a $C$-dimensional logit vector, where $C$ is the number of classes.
The resulting logits are used to compute the classification loss and final predictions.

IV. QOS AWARENESS NTC

A. QoS Awareness Assignment

To align the model's predictions with network Quality of Service (QoS) requirements, we introduce a QoS-aware training and inference framework. This framework prioritizes the correct classification of high QoS awareness traffic (e.g., live streaming) and penalizes misclassifications that would lead to assigning a lower-than-required service level. QoS awareness assignment employs a magnitude-based approach that automatically determines awareness levels for network traffic flows based on their service requirements. The algorithm analyzes traffic characteristics using logarithmic magnitude classification and weighted scoring to establish differentiated service awareness. For better illustration, the QoS awareness of each traffic flow $f$ is characterized by five QoS metrics $c_i$: bandwidth (Mbps), jitter stability, packet stability, average inter-burst delay, and burst stability, denoted as $[c_{\text{bw}}, c_{\text{jitter}}, c_{\text{packet}}, c_{\text{burst\_freq}}, c_{\text{burst\_stab}}]$, where

$$c_{\text{bw}} = \frac{\sum_{i=1}^{N} \text{size}_i \times 8}{(t_{\text{end}} - t_{\text{start}}) \times 10^{6}}, \tag{2a}$$
$$c_{\text{jitter}} = \frac{\sigma_{\text{IAT}}}{\mu_{\text{IAT}}}, \tag{2b}$$
$$c_{\text{packet}} = \sigma_{\text{IAT}}, \tag{2c}$$
$$c_{\text{burst\_freq}} = \frac{1}{N_{\text{bursts}} - 1} \sum_{k=1}^{N_{\text{bursts}} - 1} \left( t^{k+1}_{\text{start}} - t^{k}_{\text{end}} \right), \tag{2d}$$
$$c_{\text{burst\_stab}} = \sigma_{\text{inter\_burst\_delay}}. \tag{2e}$$

For each QoS metric $c_i$ ($i$ being an index to the QoS metric, e.g., $c_1$ indicates $c_{\text{bw}}$), we perform a logarithmic transformation to normalize the scale and emphasize order-of-magnitude differences:

$$m_i = \log_{10}(c_i), \quad i \text{ is indexed to QoS metrics}.$$
(3)
The logarithmic transformation is applied directly, as it preserves order-of-magnitude distinctions across the full range of metric values, with $m_i$ being negative for fractional values and positive for values greater than 1.

The classification process operates independently for each QoS metric, grouping traffic flows based on magnitude similarity. For each metric, all flows are first sorted by their transformed magnitude values $m_i$ in ascending order. The classification then proceeds sequentially through this sorted list:
• Class Initiation: The first flow in the sorted list initializes the first QoS class (e.g., Class 0) for the current metric. Its transformed magnitude $m_i^{(1)}$ serves as the reference point for subsequent comparisons.
• Sequential Assignment: For each subsequent flow $k$ in the sorted order, its magnitude $m_i^{(k)}$ is compared against the median magnitude of the current class. If the magnitude difference satisfies

$$\left| m_i^{(k)} - \operatorname{median}\left(\{ m_i^{(j)} : j \in \text{Class} \}\right) \right| \le X_{\text{thresh}}, \tag{4}$$

where $\operatorname{median}(\{ m_i^{(j)} : j \in \text{Class} \})$ is the median magnitude of all flows currently in that class, then flow $k$ is assigned to that class.
• New Class Creation: If the magnitude difference exceeds the threshold (i.e., $\left| m_i^{(k)} - \operatorname{median}(\{ m_i^{(j)} : j \in \text{Class} \}) \right| > X_{\text{thresh}}$), a new QoS class is created, and flow $k$ becomes the first member of this new class.

This process is repeated independently for all five QoS metrics: bandwidth, jitter, packet stability, burst frequency, and burst stability. Each metric produces its own set of classes, and each traffic flow receives a class assignment for every metric.

Illustrative example: Consider a bandwidth metric with three flows having transformed magnitudes $m_{\text{bw}}^{(1)}$, $m_{\text{bw}}^{(2)}$, and $m_{\text{bw}}^{(3)}$, where $m_{\text{bw}}^{(1)} < m_{\text{bw}}^{(2)} < m_{\text{bw}}^{(3)}$. In this work, we set $X_{\text{thresh}} = 0.6$ to capture order-of-magnitude differences between classes.
The example follows this setting. Flow $f_1$ initiates Class 0. Flow $f_2$ is compared: if $|m_{\text{bw}}^{(2)} - m_{\text{bw}}^{(1)}| \le 0.6$, it joins Class 0. Flow $f_3$ is then compared against the median of Class 0: if $|m_{\text{bw}}^{(3)} - \operatorname{median}(m_{\text{bw}}^{(1)}, m_{\text{bw}}^{(2)})| > 0.6$, it creates a new Class 1.

After independent classification of each metric, every traffic flow is characterized by a 5-dimensional class sequence vector $s = [s_1, s_2, s_3, s_4, s_5]$, where $s_i$ represents the class assignment for the $i$-th QoS metric (bandwidth, jitter, packet stability, burst frequency, and burst stability, respectively). Traffic flows with identical class sequence vectors are grouped into the same QoS label.

After initial classification, QoS classes are reordered based on their relative importance in network management. For instance, higher bandwidth and lower jitter typically indicate higher QoS priority. To formalize this, we define a QoS awareness score $p_a^k$ for each class $k$ based on its class sequence values:

$$p_a^k = w_1 \cdot s_1^k + \sum_{i=2}^{5} w_i \cdot \left( \max_{j \in k} s_i^{(j)} \right) - s_i^k, \tag{5}$$

where the first term rewards higher bandwidth (QoS metric $s_1$), and the remaining terms penalize instability in jitter, packet size, inter-burst delay, and burstiness metrics, where lower values indicate better QoS. The weights $w_i$ can be tuned based on application-specific requirements. In this work, we prioritize real-time responsiveness and assign the weights as follows: $w_{\text{bandwidth}} = 0.30$, $w_{\text{jitter}} = 0.20$, $w_{\text{packet}} = 0.15$, $w_{\text{burst\_freq}} = 0.20$, and $w_{\text{burst\_stab}} = 0.15$. Classes are then ranked in ascending order of $p_a^k$, with higher scores indicating higher QoS awareness. The final QoS levels are assigned accordingly, ranging from 0 to $N - 1$ for $N$ classes.
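The assignment procedure of (3)–(5) can be sketched in a few lines. This is an illustrative reading rather than the authors' code: the function names are hypothetical, metric values are assumed strictly positive so that the logarithm is defined, and each penalty term in (5) is read as $w_i$ times the gap between the largest class index observed for metric $i$ and $s_i^k$, which is an assumption about the intended grouping of that expression.

```python
import math
from statistics import median

X_THRESH = 0.6                            # magnitude threshold quoted in the text
WEIGHTS = (0.30, 0.20, 0.15, 0.20, 0.15)  # bandwidth, jitter, packet, burst freq, burst stab


def magnitude_classes(values, thresh=X_THRESH):
    """Per-metric grouping following (3)-(4); returns class indices aligned with `values`."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    labels = [0] * len(values)
    current, members = 0, []
    for k in order:
        m = math.log10(values[k])         # Eq. (3); requires values[k] > 0
        if members and abs(m - median(members)) > thresh:
            current += 1                  # gap too large: open a new class
            members = []
        members.append(m)
        labels[k] = current
    return labels


def awareness_scores(class_vectors, weights=WEIGHTS):
    """Score each 5-d class sequence vector (assumed reading of Eq. (5))."""
    top = [max(s[i] for s in class_vectors) for i in range(5)]
    scores = []
    for s in class_vectors:
        score = weights[0] * s[0]                                   # reward bandwidth
        score += sum(weights[i] * (top[i] - s[i]) for i in range(1, 5))
        scores.append(score)
    return scores


def qos_levels(scores):
    """Rank in ascending score order: level 0 is the lowest awareness."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    levels = [0] * len(scores)
    for level, k in enumerate(order):
        levels[k] = level
    return levels
```

For the three-flow example above, bandwidths of 1, 2, and 100 Mbps have magnitudes of about 0, 0.30, and 2, so `magnitude_classes([1.0, 2.0, 100.0])` places the first two flows in Class 0 and the third in Class 1.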
This magnitude-based classification framework automatically adapts to diverse traffic distributions, ensures similar flows are grouped under the same QoS class, and provides interpretable and tunable prioritization based on weighted QoS metrics.

B. QoS-aware Model Training and Inference

To incorporate QoS awareness into the training process, we design a composite loss function that balances standard classification accuracy with penalties for QoS-violating misclassifications. The total loss is defined as:

$$\mathcal{L}_{\text{total}} = (1 - \lambda)\, \mathcal{L}_{\text{CE}} + \lambda\, \mathcal{L}_{\text{QoS}}, \tag{6}$$

where $\mathcal{L}_{\text{CE}}$ is the standard cross-entropy loss, $\mathcal{L}_{\text{QoS}}$ is the QoS-aware penalty term, and $\lambda \in [0, 1]$ is a tunable hyperparameter that controls the trade-off between classification accuracy and QoS sensitivity. The QoS-aware loss $\mathcal{L}_{\text{QoS}}$ is computed by scaling the cross-entropy loss with a penalty matrix $P[i, j]$ that encodes the cost of misclassifying a sample from class $i$ as class $j$:

$$\mathcal{L}_{\text{QoS}} = \mathcal{L}_{\text{CE}} \times \left(1 + P[y_{\text{true}}, y_{\text{pred}}]\right), \tag{7}$$

where the penalty matrix $P$ is defined as follows:
• $P[i, i] = 0$ for correct classifications.
• $P[i, j] = \beta$ for misclassifications to higher or equal QoS classes, i.e., $\text{QoS}(j) \ge \text{QoS}(i)$.
• $P[i, j] = 1.0 + \gamma \cdot (\text{QoS}(i) - \text{QoS}(j))$ for misclassifications to lower QoS classes.
Here, $\beta$ and $\gamma$ are hyperparameters that control the severity of penalties, with $\gamma$ typically set higher to discourage under-provisioning errors.

To further align predictions with QoS priorities during inference, we introduce three complementary strategies.

QoS bias adjustment: The raw output logits are adjusted by incorporating a bias term proportional to each class's QoS awareness score:

$$\text{logits}_{\text{biased}} = \text{logits}_{\text{raw}} + \alpha \cdot \text{QoS}_{\text{awareness}}, \tag{8}$$

where $\alpha$ is a tunable parameter that controls the strength of the QoS bias. This encourages the model to favor higher-QoS classes when confidence is comparable.
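To make (6)–(8) concrete, the following sketch evaluates the penalty matrix, the composite loss for a single sample, and the biased logits using only the standard library. The values of $\beta$, $\gamma$, $\lambda$, and $\alpha$ passed in are placeholders, since the text does not state the settings used, and the helper names are hypothetical; a training implementation would express the same arithmetic with batched tensors.

```python
import math


def penalty_matrix(qos_level, beta, gamma):
    """P[i][j]: cost of predicting class j for true class i (rules under Eq. (7))."""
    n = len(qos_level)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                                   # correct: no penalty
            if qos_level[j] >= qos_level[i]:
                P[i][j] = beta                             # over-provisioning
            else:
                P[i][j] = 1.0 + gamma * (qos_level[i] - qos_level[j])
    return P


def softmax(logits):
    peak = max(logits)                                     # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_loss(logits, y_true, P, lam):
    """Single-sample composite loss of Eq. (6), with L_QoS from Eq. (7)."""
    ce = -math.log(softmax(logits)[y_true])
    y_pred = max(range(len(logits)), key=lambda c: logits[c])
    qos = ce * (1.0 + P[y_true][y_pred])
    return (1.0 - lam) * ce + lam * qos


def bias_logits(logits, awareness, alpha):
    """Eq. (8): shift each class logit by alpha times its QoS awareness score."""
    return [z + alpha * a for z, a in zip(logits, awareness)]
```

With `lam = 0.5` and a penalty of 2.0 for the predicted class, the composite loss works out to twice the plain cross-entropy, which shows how under-provisioning errors are amplified relative to correct or over-provisioned predictions.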
Post-processing refinement: For predictions with low confidence (i.e., maximum softmax probability $\text{score}_{\text{top-1}}$ below a threshold $\sigma$), we compare the top-2 candidate classes. If their confidence scores are within a relative margin $\theta$, the class with the higher QoS awareness score is selected:

$$\frac{\text{score}_{\text{top-2}}}{\text{score}_{\text{top-1}}} < \theta \;\Rightarrow\; \text{select class with higher QoS}. \tag{9}$$

QoS-aware evaluation metrics: In addition to standard accuracy, we introduce two QoS-centric evaluation metrics:
• QoS satisfaction rate: Among misclassified samples, the percentage whose predicted QoS level is greater than or equal to the ground truth.
• QoS experience score: A metric that rewards over-provisioning errors (predicting higher QoS than required) more than under-provisioning errors.

This QoS-aware training and inference framework ensures that classification errors are biased toward over-provisioning rather than under-provisioning, thereby preserving service quality for latency-sensitive or mission-critical applications. It provides a principled mechanism to integrate application-level QoS priorities into both model optimization and decision-making, ultimately enhancing the reliability and utility of traffic classification in real-world network environments.

V. EVALUATION RESULTS

A comprehensive evaluation is conducted on the proposed QoS-aware hierarchical GNN model for fine-grained network traffic classification. The experiments demonstrate the effectiveness of the three-level graph representation and the QoS-integrated training strategy in accurately classifying 14 traffic classes across four widely used applications.

A. Data Collection

The dataset used in this study was collected using PCAPdroid [34] on Android devices connected to WiFi networks, capturing real-world traffic traces from four major applications: YouTube, Prime Video, TikTok, and Zoom.
For each application, 10-minute PCAPNG traces were recorded under diverse usage scenarios to construct a comprehensive 14-class dataset:
• YouTube: Browsing, live streaming, long-form video, short-form video (4 classes).
• Prime Video: Browsing, live streaming, long-form video (3 classes).
• TikTok: Browsing, live streaming, short-form video (3 classes).
• Zoom: Audio conferencing, symmetric video conferencing, uplink-only presentation mode, downlink-only attendance mode (4 classes).

Raw packet traces were processed to extract sessions using 5-tuple flow identification and idle timeout segmentation. Each session was then transformed into a three-level hierarchical graph structure comprising:
• Level-1 (packet cluster graphs): Nodes represent packet clusters aggregated by fixed packet count.
• Level-2 (time window graphs): Nodes represent 100 ms time windows within short sessions.
• Level-3 (session graphs): Nodes represent short sessions grouped under the same 5-tuple, constrained to a maximum duration of 60 seconds.

The resulting dataset exhibits natural class imbalance, reflecting realistic usage distributions and providing a challenging yet authentic benchmark for evaluating classification performance in practical network environments.

Fig. 4 illustrates an example of a 3-level hierarchical graph from YouTube Browsing traffic. As shown in Fig. 4(a), blue nodes represent Level-1 packet cluster graphs, green nodes represent Level-2 time window graphs, and orange nodes represent Level-3 session graphs. Gray nodes indicate auxiliary nodes; solid arrows denote real temporal edges with time-delay labels; dashed arrows connect virtual nodes. Node size encodes total bytes, and transparency reflects session duration (Level-3) or average packet length (Level-1 and Level-2). Fig. 4(b) shows the I/O traffic graph of the original session corresponding to Fig.
4(a), sho wing the temporal network activity used to construct the hierarchical graph. B. Experimental Setup and Evaluation Metrics The experimental ev aluation follows an 80/20 train-test split using stratified sampling to preserve the original class distri- bution across both subsets. The model is implemented using the PyT orch Geometric framework and optimized with the AdamW optimizer . A learning rate scheduler is employed to ensure stable conv ergence during training. W e first ev aluate a con ventional Pack et-le vel Multi-Layer Perceptron NTC which has been used in [18], to do in-app traffic classification, which achiev es only 72.7% accuracy , indicating that con ventional methods cannot effecti vely distinguish in-app traffic. There- fore, we employ our proposed QoS-aw are hierarchical GNN approach to address these limitations. T o isolate the impact of QoS-awareness, two models are trained and ev aluated under identical conditions: (1) A baseline model without QoS- aware loss or inference strategies; and (2) The proposed QoS- aware hierarchical GNN model. Both models utilize the same dataset, preprocessing pipeline, and data splits, ensuring a controlled comparison. The experimental ev aluation follows an 80/20 train-test split using stratified sampling to preserve the original class distribution across both subsets. The model is implemented using the PyT orch Geometric framework and optimized with the AdamW optimizer . A learning rate sched- uler is employed to ensure stable con vergence during training. Therefore, we employ our proposed QoS-aware hierarchical GNN approach to address these limitations. T o isolate the impact of QoS-awareness, two models are trained and ev alu- ated under identical conditions: (1) A baseline model without QoS-aware loss or inference strategies; and (2) The proposed QoS-aware hierarchical GNN model. Both models utilize the same dataset, preprocessing pipeline, and data splits, ensuring a controlled comparison. 
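The stratified 80/20 split mentioned above can be illustrated with a minimal, standard-library-only sketch. The function name `stratified_split`, the seed, and the toy labels are hypothetical and are not taken from the paper, which does not describe its splitting code; libraries such as scikit-learn provide an equivalent via `train_test_split(..., stratify=labels)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: every class contributes at least one test sample.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy, imbalanced label list (hypothetical sample counts).
labels = ["YT_Browsing"] * 50 + ["Zoom_Audio"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 48 12
```

Splitting per class rather than over the pooled samples is what keeps minority usage scenarios represented in the test set of an imbalanced dataset.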
Model performance is assessed using both conventional and QoS-centric evaluation metrics. The conventional metrics measure the overall correctness of the predicted traffic classes using standard classification metrics: precision, recall, and F1-score. The first QoS evaluation metric is the QoS satisfaction rate, which quantifies the proportion of misclassified samples whose predicted QoS level is greater than or equal to the ground truth, reflecting over-provisioning behavior. To further evaluate the effectiveness of QoS-aware classification, a new metric, the QoS experience score, is introduced. This metric extends beyond traditional accuracy by incorporating the severity of misclassifications based on QoS awareness levels, thereby assessing the practical impact of prediction errors in network traffic management.

Figure 4: Example of the multi-level graph structure and the corresponding raw session traffic. (a) The 3-level hierarchical graph. (b) The I/O graph of the session.

The QoS experience score is computed using a reward-penalty mechanism applied to the confusion matrix:

$$Q_{\text{score}} = \sum_{i=1}^{N} \sum_{j=1}^{N} C_{i,j} \cdot w_{i,j}, \tag{10}$$

where C_{i,j} denotes the number of samples with true class i predicted as class j, and w_{i,j} is the weight assigned to each prediction outcome:

$$w_{i,j} = \begin{cases} +P_i, & \text{if } P_j \ge P_i \text{ (over-provisioning bias)}, \\ -P_i, & \text{if } P_j < P_i \text{ (under-provisioning bias)}, \end{cases} \tag{11}$$

where P_i denotes the QoS awareness level of class i. The scoring logic is as follows:
• Over-provisioning bias (P_j ≥ P_i): When a flow is classified into a class with equal or higher QoS awareness than its true class (including a correct prediction), sufficient resources are likely to be allocated, possibly with over-provisioning, earning a positive score proportional to the true class's awareness level.
• Under-provisioning bias (P_j < P_i): When high-awareness traffic is misclassified into a lower-awareness class, it risks resource under-provisioning that cannot meet QoS needs, incurring a penalty proportional to the true class's awareness level.

The theoretical maximum score, representing perfect classification, is given by:

$$Q_{\max} = \sum_{i=1}^{N} n_i \cdot P_i, \tag{12}$$

where n_i is the number of samples in class i. The QoS score ratio provides a normalized performance metric:

$$Q_{\text{ratio}} = \frac{Q_{\text{score}}}{Q_{\max}} \times 100\%. \tag{13}$$

This ratio provides a meaningful comparison between QoS-aware and conventional models, reflecting the practical consequences of misclassification in network resource allocation. Higher QoS scores indicate better alignment with service-level requirements, while lower scores suggest potential degradation due to inappropriate traffic prioritization.

Table III: QoS metrics for adaptive awareness assignment.

Application   Usage Type   Bandwidth (Mbps)   Jitter Stability (CV)   Packet Stability (ms)   Burst Frequency (ms)   Burst Stability (ms)   Class Sequence   QoS Awareness
Prime Video   Browse       1.526              16.878                  264.646                 819.475                1843.938               [1,1,1,1,2]      1
Prime Video   Live         8.266              20.444                  106.283                 853.313                1198.881               [2,1,1,1,2]      2
Prime Video   LongVideo    3.761              17.660                  199.026                 2232.049               1851.821               [1,1,1,1,2]      1
TikTok        Browse       1.161              11.499                  136.573                 480.831                796.071                [1,1,1,1,2]      1
TikTok        Live         1.049              3.501                   22.653                  95.712                 46.384                 [1,0,0,0,1]      4
TikTok        ShortVideo   1.211              12.772                  309.290                 1380.829               1983.031               [1,1,1,1,2]      1
YouTube       Browse       2.029              17.398                  71.809                  552.994                734.962                [1,1,1,1,2]      1
YouTube       Live         1.287              15.469                  109.855                 1013.868               928.634                [1,1,1,1,2]      1
YouTube       LongVideo    0.576              24.026                  388.746                 4124.126               4751.549               [1,1,1,2,2]      0
YouTube       ShortVideo   2.221              41.676                  159.914                 2822.534               3488.171               [1,1,1,1,2]      1
Zoom          Audio        0.056              1.170                   23.013                  70.206                 14.184                 [0,0,0,0,0]      3
Zoom          BiVideo      2.341              1.848                   6.158                   60.906                 15.694                 [1,0,0,0,0]      5
Zoom          DownVideo    2.077              1.904                   7.249                   58.149                 9.431                  [1,0,0,0,0]      5
Zoom          UpVideo      2.130              2.246                   8.139                   58.725                 7.739                  [1,0,0,0,0]      5
Metric classes             3 classes (0-2)    2 classes (0-1)         2 classes (0-1)         3 classes (0-2)        3 classes (0-2)

Figure 5: Normalized confusion matrices (true label vs. predicted label over the 14 usage classes) for (a) the baseline NTC, (b) the proposed QoS-aware NTC, and (c) an existing packet-level NTC [18].

C. Performance Evaluation on QoS-Aware NTC

QoS awareness is first extracted before implementing the QoS-aware NTC.
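As a reference for the QoS metrics defined in Sec. V-B, the following minimal sketch computes Q_score, Q_max, Q_ratio (Eqs. (10)-(13)) and the QoS satisfaction rate from a confusion matrix of raw counts. The function name `qos_metrics` and the three-class toy data are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def qos_metrics(confusion, awareness):
    """confusion[i][j]: number of samples of true class i predicted as class j.
    awareness[i]: QoS awareness level P_i of class i."""
    n = len(confusion)
    q_score = q_max = 0
    mis_total = mis_satisfied = 0
    for i in range(n):
        p_true = awareness[i]
        q_max += sum(confusion[i]) * p_true                  # Eq. (12)
        for j in range(n):
            count = confusion[i][j]
            meets_qos = awareness[j] >= p_true                # P_j >= P_i
            q_score += count * (p_true if meets_qos else -p_true)  # Eqs. (10)-(11)
            if i != j:                                        # misclassified samples only
                mis_total += count
                if meets_qos:
                    mis_satisfied += count
    q_ratio = 100.0 * q_score / q_max if q_max else 0.0       # Eq. (13)
    # Assumption: with no misclassified samples the satisfaction rate is reported as 100%.
    satisfaction = 100.0 * mis_satisfied / mis_total if mis_total else 100.0
    return q_score, q_max, q_ratio, satisfaction


# Toy example: three classes with awareness levels 1, 3, and 5 (hypothetical counts).
toy_awareness = [1, 3, 5]
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 2, 8],
]
q_score, q_max, q_ratio, satisfaction = qos_metrics(toy_confusion, toy_awareness)
print(q_score, q_max, round(q_ratio, 2), satisfaction)  # 64 90 71.11 40.0
```

In the toy example, only the two samples of the lowest-awareness class that were predicted as a higher-awareness class count as satisfied, so the satisfaction rate is 2 out of 5 misclassified samples. Under Eq. (11), a class with awareness level 0 contributes nothing to either Q_score or Q_max.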
Each of the fourteen application usage scenarios is represented by a five-element class sequence, constructed by concatenating its class IDs across the five QoS metrics in the following order: [bandwidth, jitter stability, packet stability, burst frequency, burst stability], as detailed in Table III. Using the magnitude-based classification algorithm described in Sec. IV, the QoS awareness scores are derived for each application usage scenario. In this study, the magnitude threshold X_thresh is set to 1.0, resulting in the following class distributions (Table III): bandwidth, burst frequency, and burst stability are each divided into three classes (0, 1, 2), while jitter stability and packet stability are each divided into two classes (0, 1). Based on identical class sequences, the algorithm identifies six distinct QoS awareness groups among the fourteen usage scenarios. The largest group, Group 1, comprises seven usage scenarios: Prime Video (browsing and long-form video), TikTok (browsing and short-form video), and YouTube (browsing, live, and short-form video), all sharing the class sequence [1, 1, 1, 1, 2], indicative of moderate bandwidth and stability requirements. In contrast, Group 5 has the highest QoS awareness level (score 5) and comprises the three Zoom video conferencing modes, all of which are real-time usage scenarios. These scenarios exhibit the class sequence [1, 0, 0, 0, 0], reflecting high stability demands and low tolerance for jitter and burst variability. To rank the QoS groups by priority, a weighted scoring mechanism is applied using the following metric weights: bandwidth (30%), jitter stability (20%), packet stability (15%), burst frequency (20%), and burst stability (15%). For example, Group 5 achieves the highest weighted score of 1.35 due to its optimal stability profile, while Group 0 receives the lowest score of 0.30, reflecting its relatively relaxed QoS requirements.
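The weighted ranking can be made concrete with a short sketch. The text gives only the metric weights and the two extreme scores, so the scoring rule used here is an assumption: bandwidth contributes weight × class ID, while each stability or burst metric contributes weight × (maximum class ID − class ID), with the per-metric class ranges taken from Table III. This rule is chosen because it reproduces the reported 1.35 and 0.30; the names `WEIGHTS`, `MAX_CLASS`, and `weighted_score` are hypothetical.

```python
# Order: bandwidth, jitter stability, packet stability, burst frequency, burst stability.
WEIGHTS = [0.30, 0.20, 0.15, 0.20, 0.15]
MAX_CLASS = [2, 1, 1, 2, 2]  # highest class ID per metric, as listed in Table III


def weighted_score(sequence):
    """Assumed rule: a higher bandwidth class and lower stability/burst classes raise priority."""
    score = WEIGHTS[0] * sequence[0]
    for weight, max_id, class_id in zip(WEIGHTS[1:], MAX_CLASS[1:], sequence[1:]):
        score += weight * (max_id - class_id)
    return round(score, 2)


# The six distinct class sequences from Table III.
sequences = [
    (1, 1, 1, 2, 2),
    (1, 1, 1, 1, 2),
    (2, 1, 1, 1, 2),
    (0, 0, 0, 0, 0),
    (1, 0, 0, 0, 1),
    (1, 0, 0, 0, 0),
]
scores = {seq: weighted_score(seq) for seq in sequences}
# Rank groups from lowest to highest weighted score.
ranking = {seq: rank for rank, seq in enumerate(sorted(sequences, key=scores.get))}
for seq in sequences:
    print(seq, scores[seq], ranking[seq])
```

Under this assumed rule, the six sequences score 0.30, 0.50, 0.80, 1.05, 1.20, and 1.35, and their rank order matches the QoS awareness levels 0 to 5 listed in Table III.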
This ranking framework enables priority-based QoS management, where traffic flows belonging to higher-scored groups are granted preferential treatment in network resource allocation. Such differentiation is critical for maintaining service quality in latency-sensitive and real-time applications.

Table IV: Classification results comparison between baseline NTC and QoS-aware NTC.

                 Baseline NTC                  QoS-aware NTC
Class            Precision   Recall   F1       Precision   Recall   F1
YT Browsing      0.96        0.91     0.94     0.97        0.93     0.95
YT Live          0.85        0.94     0.89     0.89        0.92     0.91
YT LongVideo     0.85        0.63     0.72     0.92        0.63     0.75
YT ShortVideo    0.87        0.82     0.84     0.91        0.84     0.88
PV Browsing      0.79        0.92     0.85     0.94        0.78     0.85
PV Live          0.86        0.87     0.86     0.92        0.83     0.87
PV LongVideo     0.85        0.83     0.84     0.92        0.83     0.88
TT Browsing      0.82        0.80     0.81     0.89        0.77     0.82
TT Live          0.95        0.93     0.94     0.75        0.97     0.85
TT ShortVideo    0.70        0.69     0.70     0.79        0.65     0.71
Zoom Audio       0.84        0.85     0.84     0.97        0.70     0.81
Zoom Bi          0.90        0.92     0.91     0.88        0.90     0.89
Zoom Up          0.88        0.87     0.87     0.87        0.87     0.87
Zoom Down        0.83        0.77     0.80     0.68        0.91     0.78
Accuracy                              0.86                          0.85
Mac. Avg         0.85        0.84     0.84     0.88        0.82     0.84
Wtd. Avg         0.86        0.86     0.86     0.86        0.85     0.85

The QoS awareness mechanism is then integrated into the proposed NTC framework. For comparison purposes, a standard MLP classifier that does not incorporate QoS awareness is used as the baseline. Before presenting the QoS performance metrics, we first evaluate the traditional classification accuracy. As illustrated in Fig. 5, both the baseline classifier and the QoS-aware classifier achieve high accuracy across various application usage scenarios. A closer examination of Table IV shows that, although the overall accuracy of the QoS-aware model is slightly lower than that of the baseline (0.85 vs. 0.86), the difference is marginal. In fact, the weighted-average precision is the same (0.86) for both models.
Furthermore, the QoS-aware approach demonstrates improved classification accuracy for certain usage types, highlighting its ability to capture nuanced in-app behaviors. In contrast, a state-of-the-art packet-level NTC method [18] achieves only 72.7% accuracy across all usage scenarios. This lower performance is largely due to its tendency to confuse usage scenarios that originate from the same application, which can mislead QoS provisioning.

We then demonstrate the improved QoS performance of the QoS-aware NTC. As shown in Fig. 6, the QoS-aware model achieves a significantly higher QoS score ratio of 96.78% compared to the baseline's 88.30%, an improvement of 8.48 percentage points. Additionally, for misclassified samples, the QoS-aware model achieves a satisfaction rate of 91.79% compared to the baseline's 69.97%. The performance distributions in Fig. 6 reveal that while the baseline model may achieve higher overall classification accuracy, its misclassifications often fail to meet QoS requirements, as evidenced by a larger proportion of under-provisioned cases. In contrast, the QoS-aware model shows a substantially smaller proportion of misclassifications that fail to satisfy the required QoS level, ensuring better service quality for applications. However, this conservative approach may lead to over-provisioning in some cases, potentially resulting in resource wastage, as the model tends to assign higher QoS levels to avoid service degradation.

D. More Discussion

A fundamental design decision in the proposed framework is to classify usage patterns first, rather than directly predicting QoS awareness levels as target labels. By decoupling these two stages, the framework gains greater stability, flexibility, and interpretability.
Usage classification remains consistent and reusable across different network environments, while QoS policies can be dynamically adapted based on evolving service requirements or resource constraints.

Figure 6: QoS performance comparison between baseline and proposed fine-grained NTC models, plotted as the percentage of predictions (perfect match, over-conservative, under-estimation) for each true QoS awareness level (0-5). (a) QoS performance with baseline NTC (QoS score ratio: 88.30%; QoS satisfaction rate on misclassified samples: 69.97%). (b) QoS performance with the proposed fine-grained NTC (QoS score ratio: 96.78%; QoS satisfaction rate on misclassified samples: 91.79%).

The experimental results yield several key insights into the effectiveness of the proposed QoS-aware hierarchical GNN framework for fine-grained network traffic classification. Notably, the three-level hierarchical graph structure addresses the limitations of single-scale approaches by enabling the model to learn both local and contextual features. The evaluation results demonstrate its effectiveness in capturing multi-scale temporal dependencies, which support accurate and robust classification in fine-grained usage scenarios. Meanwhile, QoS awareness does not necessarily come at the cost of traditional classification accuracy. In fact, the evaluation results show nearly unchanged overall accuracy (0.85 vs. 0.86) together with improved precision for most classes (Table IV).
This is because QoS awareness mainly affects uncertain predictions, which a conventional NTC is highly likely to misclassify; altering the final output for these cases may lead to a correct result. Meanwhile, the QoS-aware model significantly improves the QoS score ratio (96.78% vs. 88.30%). This improvement reflects the model's conservative bias toward over-provisioning, which is preferable in practical network management scenarios where under-provisioning can lead to service degradation, whereas temporary over-allocation is generally more tolerable.

Despite these strengths, several limitations warrant consideration. The current evaluation focuses on four major applications, which may limit generalizability to broader traffic domains. Additionally, the conservative QoS bias, while beneficial for service assurance, may lead to inefficient resource utilization in bandwidth-constrained environments. These observations suggest promising directions for future work, including dynamic adjustment of QoS weighting strategies and expansion to a wider range of application types and network conditions.

VI. CONCLUSION AND FUTURE WORKS

This paper presented a hierarchical GNN framework designed for fine-grained, QoS-aware network traffic classification. By integrating multi-scale graph modeling with a five-attribute QoS awareness assignment algorithm, the proposed framework enables accurate differentiation of in-app usage patterns while maintaining a strong focus on service quality. Experimental results demonstrate that the developed GNN framework outperforms a state-of-the-art NTC method in fine-grained service-level application identification, achieving an accuracy of 86% compared to 72.7%. Furthermore, the inclusion of QoS-aware adjustment within the overall GNN framework does not negatively impact the overall classification accuracy.
On the contrary, it significantly enhances the QoS experience, with a notable improvement in the QoS score ratio (96.78% vs. 88.30%) and the QoS satisfaction rate (91.79% vs. 69.97%). This improvement is particularly valuable in real-world network environments, where preserving service quality is essential. To further improve the adaptability and efficiency of the framework, future work will focus on dynamic QoS bias adjustment based on real-time network conditions. Additionally, the framework will be extended to support a wider range of application types and deployment scenarios.

REFERENCES

[1] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for vm-based cloudlets in mobile computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[2] A. Alhakamy, “Extended reality (xr) toward building immersive solutions: The key to unlocking industry 4.0,” ACM Comput. Surv., vol. 56, no. 9, Apr. 2024. [Online]. Available: https://doi.org/10.1145/3652595
[3] C. Campolo, A. Molinaro, A. Iera, and F. Menichella, “5g network slicing for vehicle-to-everything services,” IEEE Wireless Communications, vol. 24, no. 6, pp. 38–45, 2017.
[4] G. Aceto, V. Persico, and A. Pescapé, “A survey on information and communication technologies for industry 4.0: State-of-the-art, taxonomies, perspectives, and challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3467–3501, 2019.
[5] E. Papadogiannaki and S. Ioannidis, “A survey on encrypted network traffic analysis applications, techniques, and countermeasures,” ACM Comput. Surv., vol. 54, no. 6, Jul. 2021. [Online]. Available: https://doi.org/10.1145/3457904
[6] M. S. Sheikh and Y. Peng, “Procedures, criteria, and machine learning techniques for network traffic classification: A survey,” IEEE Access, vol. 10, pp. 61 135–61 158, 2022.
[7] A. Shahraki, M. Abbasi, A. Taherkordi, and A. D. Jurcut, “Active learning for network traffic classification: A technical study,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 1, pp. 422–439, 2022.
[8] A. Azab, M. Khasawneh, S. Alrabaee, K.-K. R. Choo, and M. Sarsour, “Network traffic classification: Techniques, datasets, and challenges,” Digital Communications and Networks, vol. 10, no. 3, pp. 676–692, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352864822001845
[9] T. Shapira and Y. Shavitt, “Flowpic: A generic representation for encrypted traffic classification and applications identification,” IEEE Transactions on Network and Service Management, vol. 18, no. 2, pp. 1218–1232, 2021.
[10] T.-D. Pham, T.-L. Ho, T. Truong-Huu, T.-D. Cao, and H.-L. Truong, “Mappgraph: Mobile-app classification on encrypted network traffic using deep graph convolution neural networks,” in Proceedings of the 37th Annual Computer Security Applications Conference, ser. ACSAC ’21. New York, NY, USA: Association for Computing Machinery, 2021, p. 1025–1038. [Online]. Available: https://doi.org/10.1145/3485832.3485925
[11] T.-L. Huoh, Y. Luo, P. Li, and T. Zhang, “Flow-based encrypted network traffic classification with graph neural networks,” IEEE Transactions on Network and Service Management, vol. 20, no. 2, pp. 1224–1237, 2023.
[12] O. Aouedi, K. Piamrat, and B. Parrein, “Ensemble-based deep learning model for network traffic classification,” IEEE Transactions on Network and Service Management, vol. 19, no. 4, pp. 4124–4135, 2022.
[13] X. Duan, Y. Fu, and K. Wang, “Network traffic anomaly detection method based on multi-scale residual classifier,” Computer Communications, vol. 198, pp. 206–216, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366422004121
[14] Q. Ma, C. Sun, B. Cui, and X. Jin, “A novel model for anomaly detection in network traffic based on kernel support vector machine,” Computers & Security, vol. 104, p. 102215, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167404821000390
[15] M. B. Pranto, M. H. A. Ratul, M. M. Rahman, I. J. Diya, and Z.-B. Zahir, “Performance of machine learning techniques in anomaly detection with basic feature selection strategy-a network intrusion detection system,” J. Adv. Inf. Technol, vol. 13, no. 1, 2022.
[16] C. Yu, J. Lan, J. Xie, and Y. Hu, “Qos-aware traffic classification architecture using machine learning and deep packet inspection in sdns,” Procedia Computer Science, vol. 131, pp. 1209–1216, 2018, recent Advancement in Information and Communication Technology. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050918307129
[17] M. Beshley, N. Kryvinska, H. Beshley, O. Panchenko, and M. Medvetskyi, “Traffic engineering and qos/qoe supporting techniques for emerging service-oriented software-defined network,” Journal of Communications and Networks, vol. 26, no. 1, pp. 99–114, 2024.
[18] J. Zhang, F. Li, and F. Ye, “Sustaining the high performance of ai-based network traffic classification models,” IEEE/ACM Transactions on Networking, vol. 31, no. 2, pp. 816–827, April 2023.
[19] R. Zhao, M. Zhan, X. Deng, Y. Wang, Y. Wang, G. Gui, and Z. Xue, “Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 5420–5427.
[20] M. Shen, J. Zhang, L. Zhu, K. Xu, and X. Du, “Accurate decentralized application identification via encrypted traffic analysis using graph neural networks,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2367–2380, 2021.
[21] W. Li, X.-Y. Zhang, H. Bao, H. Shi, and Q. Wang, “Prograph: Robust network traffic identification with graph propagation,” IEEE/ACM Transactions on Networking, vol. 31, no. 3, pp. 1385–1399, 2023.
[22] Z. Zhao, Z. Li, X. Xie, J. Yu, F. Zhang, R. Zhang, B. Chen, X. Luo, M. Hu, and W. Ma, “: Towards fine-grained unknown class detection against the open-set attack spectrum with variable legitimate traffic,” IEEE/ACM Transactions on Networking, 2024.
[23] G. Apruzzese, P. Laskov, and J. Schneider, “Sok: Pragmatic assessment of machine learning for network intrusion detection,” in 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), 2023, pp. 592–614.
[24] N. Mathews, J. K. Holland, S. E. Oh, M. S. Rahman, N. Hopper, and M. Wright, “Sok: A critical evaluation of efficient website fingerprinting defenses,” in 2023 IEEE Symposium on Security and Privacy (SP), 2023, pp. 969–986.
[25] K. Wang, Z. Wang, D. Han, W. Chen, J. Yang, X. Shi, and X. Yin, “Bars: Local robustness certification for deep learning based traffic analysis systems,” in NDSS, 2023.
[26] A. F. Diallo and P. Patras, “Adaptive clustering-based malicious traffic classification at the network edge,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[27] L. Yang, W. Guo, Q. Hao, A. Ciptadi, A. Ahmadzadeh, X. Xing, and G. Wang, “CADE: Detecting and explaining concept drift samples for security applications,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2327–2344.
[28] M. Abbasi, S. López Flórez, A. Shahraki, A. Taherkordi, J. Prieto, and J. M. Corchado, “Class imbalance in network traffic classification: An adaptive weight ensemble-of-ensemble learning method,” IEEE Access, vol. 13, pp. 26 171–26 192, 2025.
[29] B. Park, J. W.-K. Hong, and Y. J. Won, “Toward fine-grained traffic classification,” IEEE Communications Magazine, vol. 49, no. 7, pp. 104–111, July 2011.
[30] P.-C. Lin, S.-Y. Chen, and C.-H. Lin, “Towards fine-grained traffic classification for web applications,” in 2014 Australasian Telecommunication Networks and Applications Conference (ATNAC), 2014, pp. 28–33.
[31] Y. Fu, H. Xiong, X. Lu, J. Yang, and C. Chen, “Service usage classification with encrypted internet traffic in mobile messaging apps,” IEEE Transactions on Mobile Computing, vol. 15, no. 11, pp. 2851–2864, 2016.
[32] J. Liu, Y. Fu, J. Ming, Y. Ren, L. Sun, and H. Xiong, “Effective and real-time in-app activity analysis in encrypted internet traffic streams,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’17. New York, NY, USA: Association for Computing Machinery, 2017, p. 335–344. [Online]. Available: https://doi.org/10.1145/3097983.3098049
[33] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” CoRR, vol. abs/2105.14491, 2021. [Online]. Available: https://arxiv.org/abs/2105.14491
[34] E. Fusillo, “Pcapdroid: No-root network monitor, firewall and pcap dumper for android,” https://github.com/emanuele-f/PCAPdroid, 2020, accessed: 2025-04-29.