Fine-Grained Network Traffic Classification with Contextual QoS Profiling

Huiwen Zhang, Graduate Student Member, IEEE, Feng Ye, Senior Member, IEEE

This project is partially supported by the U.S. National Science Foundation under Grant 2344341. Huiwen Zhang and Feng Ye (corresponding author) are with the Department of Electrical and Computer Engineering, University of Wisconsin-Madison, WI, USA. Emails: {hzhang2279, feng.ye}@wisc.edu.

Abstract—Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time extended reality (XR), and autonomous systems. While recent advances in application-level classification achieve high accuracy, they often miss the fine-grained in-app QoS variations that are critical for service differentiation. This paper proposes a hierarchical graph neural network (GNN) framework that combines a three-level graph representation with an automated QoS-aware assignment algorithm. The model captures multi-scale temporal patterns through packet aggregation, time-window clustering, and session-level behavior modeling. QoS priorities are derived from five key metrics (bandwidth, jitter, packet stability, burst frequency, and burst stability), processed through a logarithmic transformation and weighted ranking. Evaluations across 14 usage scenarios from YouTube, Prime Video, TikTok, and Zoom show that the proposed GNN significantly outperforms state-of-the-art methods in service-level classification. The QoS-aware assignment further refines classification to enhance user experience. This work advances QoS-aware traffic classification by enabling precise in-app usage differentiation and adaptive service prioritization in dynamic network environments.

I. INTRODUCTION

Accurate network traffic classification (NTC) is essential for effective network management, particularly for emerging and future applications that demand stringent performance guarantees. For example, applications such as edge cloud computing [1], real-time extended reality [2], autonomous vehicle communication [3], and the industrial IoT [4] rely heavily on low-latency, high-throughput, and highly reliable network services. These applications introduce complex traffic patterns characterized by variable data rates, strict latency constraints, and resource demands that fluctuate in real time. As a result, precise NTC and Quality of Service (QoS) management have become increasingly critical to application performance and user experience.

NTC has evolved significantly over the years, moving from traditional port-based methods and deep packet inspection to more advanced statistical and AI-driven techniques [5]–[7]. Recent developments in application-level NTC have achieved notable success, particularly in encrypted traffic classification, where models can infer application types without accessing payload content [8]. These approaches have demonstrated high accuracy in Internet and mobile application identification [9]–[12], and they have also significantly advanced anomaly detection [13]–[15], enabling proactive service assurance and threat mitigation. However, most existing methods focus on identifying the application or traffic type rather than capturing nuanced in-app QoS differences, such as distinguishing video streaming at different resolutions or interactive from background data flows. This limitation stems from their original design goals, which prioritized coarse-grained classification over the fine-grained service differentiation required for advanced QoS provisioning.

To address these limitations, QoS-oriented NTC methods have emerged, aiming to classify traffic based on QoS attributes such as throughput, latency, and jitter [16], [17].
While these methods provide valuable insights into network performance, they often rely on handcrafted features and manually defined service categories, which limit their scalability and adaptability to new or evolving applications. Furthermore, the rigid mapping between traffic patterns and QoS labels can hinder generalization across diverse network environments.

In this work, a new hierarchical graph neural network (GNN) framework is proposed to address these challenges. The framework integrates a three-level hierarchical graph representation with an automated, magnitude-based QoS awareness assignment algorithm. It captures multi-scale temporal patterns through packet aggregation at Level-1, time-window clustering at Level-2, and session-level behavioral modeling at Level-3. Classification is performed at the time-window level (Level-2), leveraging both fine-grained packet-level features and broader session-level context to improve usage-pattern discrimination. Moreover, the newly developed QoS awareness assignment algorithm considers five QoS attributes: bandwidth, jitter, packet stability, burst frequency, and burst stability. By applying a logarithmic transformation to the raw values, each traffic flow can be dynamically assigned to a QoS class defined by all five metrics. A weighted ranking algorithm then establishes data-driven service priorities that adapt automatically to the characteristics of the traffic distribution. Evaluations are conducted on traffic traces collected from 14 usage scenarios across YouTube, Prime Video, TikTok, and Zoom. The results demonstrate that, compared with a standard application-level NTC approach, the proposed QoS-aware NTC enables fine-grained differentiation of in-app usage patterns (e.g., TikTok browsing vs. live streaming vs. short-form video) while ensuring appropriate QoS provisioning by prioritizing service-quality preservation over resource optimization.

The contributions of this work are fourfold. First, a novel three-level hierarchical graph representation is introduced, capturing temporal dependencies from packet-level interactions to session-level behaviors and thereby enabling fine-grained traffic classification beyond traditional application-level identification. Second, an automated magnitude-based QoS awareness assignment algorithm is developed, using a logarithmic transformation and automated grouping to establish consistent, data-driven QoS priorities across diverse network conditions. Third, a QoS-aware training framework is proposed, incorporating a composite loss function and inference strategies that prioritize service-quality preservation, biasing errors toward over-provisioning rather than under-provisioning for critical applications. Finally, comprehensive experiments demonstrate significant improvements in the QoS experience score while maintaining competitive classification performance across 14 distinct usage scenarios spanning YouTube, Prime Video, TikTok, and Zoom.

The remainder of this paper is organized as follows. Section II reviews related work on network traffic classification and QoS-aware systems. Section III presents the proposed hierarchical GNN framework and the graph construction methodology. Section IV details the QoS awareness assignment algorithm and the QoS-aware training strategies. Section V provides a comprehensive experimental evaluation and analysis of the results. Section VI concludes the paper and outlines future research directions.

II. RELATED WORK

A. AI-based Network Traffic Classification

Prior research in NTC has laid a strong theoretical foundation from multiple perspectives.
Table I summarizes representative research on NTC, focusing on recent AI techniques. As the table shows, recent AI-based solutions achieve high accuracy (typically around 90% or above) in basic app identification, even on encrypted traffic [18]–[21]. Among these approaches, GraphDapp [20] models traffic flows as graph structures, where nodes represent network endpoints and edges capture communication patterns, enabling effective app identification with 89% accuracy through graph neural networks. ProGraph [21] extends graph-based approaches by incorporating protocol-level features, achieving over 92% accuracy in distinguishing between applications under different networking scenarios. Parallel efforts in network intrusion and anomaly detection have also achieved high accuracy (>95%) [22]–[27]. CADE [27] employs contrastive learning to detect adversarial attacks in encrypted traffic, achieving over 95% detection accuracy by learning robust feature representations. ACID [26] improves model robustness against evasion attacks, demonstrating 99% accuracy in identifying malicious traffic patterns. BARS [25] specifically addresses the robustness of NTC systems against adversarial perturbations. However, most existing work focuses on coarse-grained app-level labeling. In practice, traffic patterns from the same app can be highly heterogeneous, reflecting the contextual complexity of service behaviors. As a result, while these solutions provide a solid foundation, they often fall short in supporting QoS provisioning or resource management in edge network environments.

Table I: A comparison summary of selected prior literature.

Algorithm        | Flow | Target           | Acc.  | QoS
-----------------|------|------------------|-------|-----
AI-NTC [18]      | N/A  | App label        | >90%  | No
ET-BERT [19]     | ↑↓   | App label        | >92%  | No
GraphDapp [20]   | ↑↓   | App label        | >89%  | No
ProGraph [21]    | ↑↓   | App label        | >90%  | No
CADE [27]        | ↑↓   | Attacks          | >95%  | N/A
ACID [26]        | ↑↓   | Attacks          | >99%  | N/A
AWEE [28]        | N/A  | Attacks          | >98%  | N/A
BARS [25]        | N/A  | Robustness       | N/A   | N/A
P2P-act [29]     | ↓    | P2P actions      | N/A   | No
Web-act [30]     | N/A  | Web actions      | >92%  | No
CUMMA [31]       | ↓    | MSG services     | >90%  | Yes
rCKC+FRF [32]    | ↑↓   | MSG/SM services  | >94%  | Yes

B. Service-Aware Network Traffic Classification

To bridge the gap between coarse-grained application classification and service-level differentiation, researchers have explored fine-grained NTC methods over the past decade. Early efforts focused on identifying functional categories within specific applications. For example, Park et al. [29] proposed a method to classify peer-to-peer (P2P) traffic by activity type (e.g., download, upload, and search) using Jaccard similarity. Their analysis showed that downloading dominated usage on platforms such as Fileguri and BitTorrent, accounting for 74%–90% of total traffic. Lin et al. [30] extended this approach to web applications, classifying user actions such as video streaming and map browsing by analyzing statistical features of HTTPS messages without relying on payload inspection; their method achieved up to 98.30% accuracy. Fu et al. [31] focused on mobile messaging applications such as WeChat and WhatsApp. By combining packet- and flow-level features, they classified activities such as text messaging and voice calls with over 90% accuracy. Liu et al. [32] developed a real-time analysis framework for encrypted mobile traffic that achieved 94.01% accuracy on WeChat while significantly improving processing speed and memory efficiency. Despite these advances, scaling fine-grained classification across a broad range of applications in dynamic, heterogeneous edge environments remains an open challenge.

III. FRAMEWORK OF THE HIERARCHICAL GRAPH NEURAL NETWORK BASED NTC

The overall architecture of the proposed hybrid graph neural network (GNN) model is depicted in Fig. 1. The model is designed to capture the multi-scale structural and temporal dependencies inherent in network traffic through a three-tiered hierarchical encoding framework, followed by a unified classification module.

Figure 1: Overview of the hierarchical GNN framework. (Panels: a Level-1 graph represents a time window whose nodes are collections of a fixed number of packets; a Level-2 graph represents a short session whose nodes are time windows; a Level-3 graph groups short sessions with the same 5-tuple under a duration threshold. Each level has its own graph encoder, followed by a QoS-aware MLP.)

A. Graph Construction

Network packets are first grouped by the canonical 5-tuple: source IP address, source port, destination IP address, destination port, and protocol. The source and destination IP addresses are treated as interchangeable, so that both directions of a bidirectional flow belong to the same session. An idle-timeout threshold (e.g., 0.5 seconds) is applied to segment prolonged flows into shorter sessions, while a maximum session duration (e.g., 60 seconds) is enforced to bound session length.

To model the hierarchical and temporal structure of network traffic, we construct a three-level graph representation that captures traffic characteristics at multiple granularities: packet aggregation, time windowing, and session clustering. As illustrated in Fig. 1, graph construction proceeds in three stages, each corresponding to a distinct level of abstraction. The architecture incorporates 18 semantic features (Table II) that encode statistical and temporal properties across these levels, enabling the model to learn expressive, multi-scale representations for QoS-aware traffic classification.
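The session segmentation described at the start of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Packet` record and its field names are hypothetical, the two (IP, port) endpoints are sorted to form a direction-independent key, an idle gap above 0.5 s starts a new short session, and consecutive short sessions of the same flow are collected into Level-3 groups spanning at most 60 s. Capping individual short sessions at 60 s as well is our own assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Packet:
    """Hypothetical packet record; the field names are illustrative assumptions."""
    ts: float          # capture timestamp in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str
    length: int        # bytes

Session = List[Packet]
FlowKey = Tuple[Tuple[str, int], Tuple[str, int], str]

def flow_key(p: Packet) -> FlowKey:
    """Direction-independent 5-tuple: the two (IP, port) endpoints are sorted."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)

def segment(packets: List[Packet], idle_timeout: float = 0.5,
            max_duration: float = 60.0) -> Dict[FlowKey, List[List[Session]]]:
    """Map each bidirectional flow to Level-3 groups of short sessions."""
    by_flow: Dict[FlowKey, List[Packet]] = {}
    for p in sorted(packets, key=lambda q: q.ts):
        by_flow.setdefault(flow_key(p), []).append(p)

    result: Dict[FlowKey, List[List[Session]]] = {}
    for key, pkts in by_flow.items():
        # Short sessions: start a new one after an idle gap > idle_timeout,
        # or once the current session would exceed max_duration (assumption).
        sessions: List[Session] = []
        for p in pkts:
            cur = sessions[-1] if sessions else None
            if (cur and p.ts - cur[-1].ts <= idle_timeout
                    and p.ts - cur[0].ts <= max_duration):
                cur.append(p)
            else:
                sessions.append([p])
        # Level-3 groups: consecutive short sessions spanning at most max_duration.
        groups: List[List[Session]] = []
        for s in sessions:
            if groups and s[-1].ts - groups[-1][0][0].ts <= max_duration:
                groups[-1].append(s)
            else:
                groups.append([s])
        result[key] = groups
    return result
```

Sorting the endpoints rather than the IP addresses alone keeps each port attached to its own address, so the key stays unambiguous when both hosts use several ports.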
Level-1 Nodes (Packet Aggregation): Within each Level-2 time window, packets are grouped into Level-1 nodes based on a fixed packet count (e.g., 10 packets per node). Each Level-1 node is represented by a 9-dimensional feature vector capturing fine-grained traffic characteristics, including basic statistical metrics and inter-arrival timing patterns (Table II). Higher-order distributional features are excluded at this level. The packet clusters of one time window form an independent Level-1 subgraph.

Level-2 Nodes (Time-Window Clusters): Within each short session (segmented using a fixed idle timeout, e.g., 0.5 seconds), non-empty time windows (e.g., 100 ms) are aggregated into Level-2 nodes. Each Level-2 node represents one time-window cluster and is backed by its own independent Level-1 subgraph. In addition to the Level-1 subgraph embedding, each Level-2 node is encoded with an 11-dimensional feature vector comprising the nine shared features and two higher-order statistical features, skewness and kurtosis, computed as described in Table II. These features provide medium-grained temporal insights and capture the distributional characteristics of packet lengths within each time window.

Level-3 Nodes (Session Aggregation): Short sessions associated with the same 5-tuple become the nodes of a Level-3 graph, one node per short session. To satisfy real-time constraints, each Level-3 session group is limited to a fixed maximum duration (e.g., 60 seconds), and longer groups are split accordingly. Beyond the embedding features from its Level-2 subgraph, each Level-3 node is encoded with an additional 11-dimensional feature vector, including four shared features (packet count, total bytes, average packet size, and uplink ratio) and seven session-specific features (session duration, packet rate, byte rate, flow symmetry, burst count, average burst size, and inter-burst time), as detailed in Table II. These features abstract long-term behavioral patterns and emphasize burst-level dynamics and flow characteristics.
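The Level-1 and Level-2 node features defined in Table II can be sketched as below. The sketch assumes each packet is a `(timestamp_s, length_bytes, is_uplink)` tuple (a hypothetical layout) and that a node holding a single packet reports zero inter-arrival statistics, which is our own convention because the table leaves that case undefined. The helper names `shared_features`, `level2_features`, and `level1_nodes` are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical packet layout: (timestamp in seconds, length in bytes, is_uplink).
Pkt = Tuple[float, int, bool]

def shared_features(pkts: Sequence[Pkt]) -> List[float]:
    """Nine Level-1 features from Table II (population moments)."""
    n = len(pkts)
    lengths = [length for _, length, _ in pkts]
    total = float(sum(lengths))
    mean = total / n
    var = sum((x - mean) ** 2 for x in lengths) / n
    uplink_ratio = sum(1 for _, _, up in pkts if up) / n
    ts = sorted(t for t, _, _ in pkts)
    # n packets give n - 1 inter-arrival times; a single packet yields [0.0] (assumption).
    iats = [b - a for a, b in zip(ts, ts[1:])] or [0.0]
    m_iat = sum(iats) / len(iats)
    v_iat = sum((x - m_iat) ** 2 for x in iats) / len(iats)
    return [float(n), total, mean, var, uplink_ratio, m_iat, v_iat, min(iats), max(iats)]

def level2_features(pkts: Sequence[Pkt]) -> List[float]:
    """Eleven Level-2 features: the nine shared ones plus skewness and excess kurtosis."""
    feats = shared_features(pkts)
    n, mean, var = len(pkts), feats[2], feats[3]
    lengths = [length for _, length, _ in pkts]
    if var > 0:
        skew = sum((x - mean) ** 3 for x in lengths) / n / var ** 1.5
        kurt = sum((x - mean) ** 4 for x in lengths) / n / var ** 2 - 3.0
    else:
        skew = kurt = 0.0  # Table II sets both to 0 when the variance is 0
    return feats + [skew, kurt]

def level1_nodes(window: Sequence[Pkt], pkts_per_node: int = 10) -> List[List[float]]:
    """Split one time window into fixed-size packet clusters (the last may be smaller)."""
    return [shared_features(window[i:i + pkts_per_node])
            for i in range(0, len(window), pkts_per_node)]
```

For example, a window of 25 packets yields three Level-1 nodes (10, 10, and 5 packets), each described by a 9-dimensional vector.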
To address the challenge of graphs containing only a single real node, where GNNs struggle due to the absence of neighborhood context, auxiliary head and tail nodes are introduced at all levels. These auxiliary nodes are assigned zero-valued feature vectors whose dimensionality matches that of the corresponding real nodes (9-dimensional for Level-1, and 11-dimensional for Level-2 and Level-3).

Table II: Multi-level feature definitions.

Feature        | Notation and calculation                                                                     | L1 | L2 | L3
---------------|----------------------------------------------------------------------------------------------|----|----|----
# Packets      | $n$                                                                                          | ✓  | ✓  | ✓
Total bytes    | $\sum_{i=1}^{n} l_i$                                                                         | ✓  | ✓  | ✓
Mean(bytes)    | $\bar{l} = \frac{1}{n}\sum_{i=1}^{n} l_i$                                                    | ✓  | ✓  | ✓
Var(bytes)     | $\sigma_l^2 = \frac{1}{n}\sum_{i=1}^{n} (l_i - \bar{l})^2$                                   | ✓  | ✓  |
Uplink ratio   | $\frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(\mathrm{src}_i = \mathrm{client\_ip})$                 | ✓  | ✓  | ✓
Mean(IAT)      | $\overline{\mathrm{IAT}} = \frac{1}{n-1}\sum_{k=2}^{n} (t_k - t_{k-1})$                      | ✓  | ✓  |
Var(IAT)       | $\frac{1}{n-1}\sum_{k=2}^{n} (\mathrm{IAT}_k - \overline{\mathrm{IAT}})^2$                   | ✓  | ✓  |
Min(IAT)       | $\min_{2 \le k \le n} \mathrm{IAT}_k$                                                        | ✓  | ✓  |
Max(IAT)       | $\max_{2 \le k \le n} \mathrm{IAT}_k$                                                        | ✓  | ✓  |
Skewness       | $\frac{\frac{1}{n}\sum_{i=1}^{n} (l_i - \bar{l})^3}{(\sigma_l^2)^{3/2}}$ if $\sigma_l^2 > 0$, else $0$ |    | ✓  |
Kurtosis       | $\frac{\frac{1}{n}\sum_{i=1}^{n} (l_i - \bar{l})^4}{(\sigma_l^2)^{2}} - 3$ if $\sigma_l^2 > 0$, else $0$ |    | ✓  |
Session dur.   | $t_{\mathrm{end}} - t_{\mathrm{start}}$                                                      |    |    | ✓
Packet rate    | $n / (t_{\mathrm{end}} - t_{\mathrm{start}})$                                                |    |    | ✓
Byte rate      | $\left(\sum_{i=1}^{n} l_i\right) / (t_{\mathrm{end}} - t_{\mathrm{start}})$                  |    |    | ✓
Flow symm.     | $1 - \frac{|l_{\mathrm{up}} - l_{\mathrm{down}}|}{\max(l_{\mathrm{up}}, l_{\mathrm{down}})}$ |    |    | ✓
Burst count    | $|\{B_i : \mathrm{IAT} \le 100\,\mathrm{ms}\}|$                                              |    |    | ✓
Mean(burst)    | $\frac{1}{B}\sum_{i=1}^{B} |\mathrm{burst}_i|$                                               |    |    | ✓
Burst interval | $\frac{1}{B-1}\sum_{i=1}^{B-1} (t_{\mathrm{start},i+1} - t_{\mathrm{end},i})$                |    |    | ✓
# Features     |                                                                                              | 9  | 11 | 11

Here $l_i$ is the length of the $i$-th packet and $t_k$ the timestamp of the $k$-th packet.

Intra-level Edges: Within each level, nodes are fully connected in forward temporal order, with edge weights reflecting the time delay between an earlier node $i$ and a later node $j$:

\begin{align}
\text{edge\_weight}^{L1}_{i,j} &= \text{timestamp}_j - \text{timestamp}_i, \tag{1a}\\
\text{edge\_weight}^{L2}_{i,j} &= \text{center\_time}_j - \text{center\_time}_i, \tag{1b}\\
\text{edge\_weight}^{L3}_{i,j} &= \text{session\_start}_j - \text{session\_end}_i. \tag{1c}
\end{align}

These edge weights encode temporal dependencies: Level-1 edges capture delays between packet aggregations, Level-2 edges capture delays between time-window centers, and Level-3 edges capture inter-session gaps.
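The intra-level construction of Eq. (1), together with the zero-feature auxiliary head and tail nodes and their zero-weight edges, can be sketched as follows. This is a hedged illustration: the function name `build_level_graph`, the integer node numbering (head = 0, tail = n + 1), and the reading of "fully connected in forward temporal order" as one edge for every pair i < j are our assumptions.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]  # (source node, target node, weight in seconds)

def build_level_graph(times: List[float], dim: int,
                      ends: Optional[List[float]] = None
                      ) -> Tuple[List[List[float]], List[Edge]]:
    """Intra-level graph with auxiliary head/tail nodes, following Eq. (1).

    times: per-node reference times in temporal order (Level-1 timestamp,
           Level-2 window centre, or Level-3 session start).
    ends:  optional per-node end times; when given (Level-3), the weight of
           edge i -> j is start_j - end_i, as in Eq. (1c).
    Node 0 is the auxiliary head, nodes 1..n are real, node n + 1 is the tail.
    """
    n = len(times)
    if n == 0:
        raise ValueError("a level graph needs at least one real node")
    src_times = ends if ends is not None else times
    edges: List[Edge] = []
    # Fully connected in forward temporal order: one edge for every pair i < j.
    for i in range(n):
        for j in range(i + 1, n):
            edges.append((i + 1, j + 1, times[j] - src_times[i]))
    # Zero-weight edges: head -> first real node, last real node -> tail.
    edges.append((0, 1, 0.0))
    edges.append((n, n + 1, 0.0))
    # Auxiliary nodes carry zero-valued features of the real nodes' dimension.
    aux_features = [[0.0] * dim, [0.0] * dim]
    return aux_features, edges
```

With a single real node the function still returns two edges (head to node and node to tail), which is exactly the degenerate case the auxiliary nodes are meant to cover.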
Zero-weight edges connect the auxiliary head and tail nodes to the first and last real nodes, respectively, ensuring structural consistency.

Inter-level Edges: The hierarchical structure maintains a strict correspondence across levels without explicit inter-level edges. Each Level-3 session node summarizes a Level-2 subgraph of multiple time windows, and each Level-2 node summarizes a Level-1 subgraph of multiple packet clusters. Information propagates bottom-up through learned feature embeddings: Level-1 features inform Level-2 representations, which in turn inform Level-3 behavioral abstractions.

This hierarchical design enables the model to capture multi-scale temporal patterns, ranging from fine-grained packet-level interactions (Level-1), through medium-grained time-window dependencies (Level-2), to coarse-grained session-level behaviors (Level-3), while preserving temporal ordering and causal relationships at each level.

B. Hierarchical Graph Encoder and QoS-aware Classifier

Figure 2: Overview of the graph encoder (BatchNorm, GATv2, BatchNorm + ELU, GATv2, pooling, graph embeddings).

A subgraph at each level is processed by a two-layer graph encoder based on GATv2 [33], as depicted in Fig. 2. The encoder design differs slightly across levels, as described below.
• The Level-1 graph encoder employs 2 attention heads with edge-feature integration in the first layer, transforming the input features into 64-dimensional representations. The second layer uses single-head attention to produce 64-dimensional node embeddings. Dual global pooling (mean and max) aggregates the node representations into 128-dimensional cluster embeddings.
• The Level-2 graph encoder processes the 11-dimensional features of the time-window nodes concatenated with the 128-dimensional Level-1 cluster embeddings, yielding 139-dimensional features. The augmented features undergo two-layer GATv2 processing: the first layer, with 2 attention heads, expands them to 256 dimensions, and the second, single-head layer consolidates them into 128-dimensional embeddings. Global pooling produces 256-dimensional time-window representations.
• The Level-3 graph encoder processes the 11-dimensional features of the session nodes concatenated with the 256-dimensional Level-2 embeddings, yielding 267-dimensional features. A similar two-layer GATv2 stage expands the features to 256 dimensions and then consolidates them into 128-dimensional session embeddings. Global pooling yields 256-dimensional session-level representations.

Figure 3: Overview of the QoS-aware classifier. (Panels: Linear, BatchNorm + ReLU, Linear, BatchNorm + ReLU, and Linear layers produce raw logits; the cross-entropy loss against the true label plus a QoS penalty derived from the QoS awareness matrix forms the final loss for backpropagation; QoS-biased logits yield the final prediction.)

The final stage of the model is a QoS-aware classification network designed for fine-grained traffic categorization at Level-2. As shown in Fig. 3, the classifier uses a multi-scale feature-fusion strategy, combining context from both the time window (TW) and its parent session to make a prediction. For each TW graph to be classified, its learned 256-dimensional embedding $E_{\mathrm{tw}}$ is concatenated with the 256-dimensional embedding $E_{\mathrm{session}}$ of its parent session, which carries information learned from all three levels. This yields a combined 512-dimensional feature vector, $E_{\mathrm{combined}} = [E_{\mathrm{tw}} \,\|\, E_{\mathrm{session}}]$, that encapsulates both the temporal patterns of the TW itself and the behavioral context from all three levels. The combined embedding is then passed through a multi-layer perceptron (MLP) that acts as the classifier:
1) A linear layer maps the 512-dimensional input to a 512-dimensional hidden space, followed by batch normalization, a ReLU activation, and dropout.
2) A second linear layer reduces the dimensionality from 512 to 256, again followed by batch normalization, ReLU, and dropout.
3) A final linear output layer maps the 256-dimensional representation to a $C$-dimensional logit vector, where $C$ is the number of classes.

The resulting logits are used to compute the classification loss and the final predictions.

IV. QOS AWARENESS NTC

A. QoS Awareness Assignment

To align the model's predictions with network QoS requirements, we introduce a QoS-aware training and inference framework. This framework prioritizes the correct classification of traffic with high QoS awareness (e.g., live streaming) and penalizes misclassifications that would assign a lower-than-required service level. QoS awareness assignment employs a magnitude-based approach that automatically determines awareness levels for network traffic flows based on their service requirements. The algorithm analyzes traffic characteristics using logarithmic magnitude classification and weighted scoring to establish differentiated service awareness.

Specifically, the QoS awareness of each traffic flow $f$ is characterized by five QoS metrics $c_i$: bandwidth (Mbps), jitter stability, packet stability, average inter-burst delay, and burst stability, denoted as $[c_{\mathrm{bw}}, c_{\mathrm{jitter}}, c_{\mathrm{packet}}, c_{\mathrm{burst\_freq}}, c_{\mathrm{burst\_stab}}]$, where

\begin{align}
c_{\mathrm{bw}} &= \frac{\sum_{i=1}^{N} \mathrm{size}_i \times 8}{(t_{\mathrm{end}} - t_{\mathrm{start}}) \times 10^{6}}, \tag{2a}\\
c_{\mathrm{jitter}} &= \frac{\sigma_{\mathrm{IAT}}}{\mu_{\mathrm{IAT}}}, \tag{2b}\\
c_{\mathrm{packet}} &= \sigma_{\mathrm{IAT}}, \tag{2c}\\
c_{\mathrm{burst\_freq}} &= \frac{1}{N_{\mathrm{bursts}} - 1} \sum_{k=1}^{N_{\mathrm{bursts}} - 1} \left( t^{k+1}_{\mathrm{start}} - t^{k}_{\mathrm{end}} \right), \tag{2d}\\
c_{\mathrm{burst\_stab}} &= \sigma_{\mathrm{inter\_burst\_delay}}. \tag{2e}
\end{align}

Here $N$ is the number of packets in flow $f$, $\mathrm{size}_i$ is the size of packet $i$ in bytes, $\mu_{\mathrm{IAT}}$ and $\sigma_{\mathrm{IAT}}$ are the mean and standard deviation of the inter-arrival times, and $N_{\mathrm{bursts}}$ is the number of bursts.

For each QoS metric $c_i$ ($i$ indexes the QoS metrics, e.g., $c_1$ denotes $c_{\mathrm{bw}}$), a logarithmic transformation normalizes the scale and emphasizes order-of-magnitude differences:

\begin{equation}
m_i = \log_{10}(c_i), \quad i = 1, \dots, 5. \tag{3}
\end{equation}

The logarithmic transformation (which requires $c_i > 0$) is applied directly, as it preserves order-of-magnitude distinctions across the full range of metric values; $m_i$ is negative for values between 0 and 1 and positive for values greater than 1.

The classification process operates independently for each QoS metric, grouping traffic flows by magnitude similarity. For each metric, all flows are first sorted by their transformed magnitudes $m_i$ in ascending order. The classification then proceeds sequentially through this sorted list:
• Class Initiation: The first flow in the sorted list initializes the first QoS class (Class 0) for the current metric. Its transformed magnitude $m_i^{(1)}$ serves as the reference point for subsequent comparisons.
• Sequential Assignment: Each subsequent flow $k$ in the sorted order is compared against the most recently created class. If
  \begin{equation}
  \left| m_i^{(k)} - \operatorname{median}\left(\{ m_i^{(j)} : j \in \text{Class} \}\right) \right| \le X_{\mathrm{thresh}}, \tag{4}
  \end{equation}
  where $\operatorname{median}(\{ m_i^{(j)} : j \in \text{Class} \})$ is the median magnitude of all flows currently in that class, flow $k$ is assigned to that class.
• New Class Creation: If the difference exceeds the threshold, i.e., $\left| m_i^{(k)} - \operatorname{median}(\{ m_i^{(j)} : j \in \text{Class} \}) \right| > X_{\mathrm{thresh}}$, a new QoS class is created, and flow $k$ becomes its first member.

This process is repeated independently for all five QoS metrics: bandwidth, jitter, packet stability, burst frequency, and burst stability. Each metric produces its own set of classes, and each traffic flow receives a class assignment for every metric.

Illustrative example: Consider a bandwidth metric with three flows whose transformed magnitudes satisfy $m_{\mathrm{bw}}^{(1)} < m_{\mathrm{bw}}^{(2)} < m_{\mathrm{bw}}^{(3)}$. In this work, we set $X_{\mathrm{thresh}} = 0.6$ on the $\log_{10}$ scale, which corresponds to a raw-value ratio of about $10^{0.6} \approx 4$ relative to the class median. The example follows this setting.
Flow $f_1$ initiates Class 0. Flow $f_2$ is then compared: if $|m_{\mathrm{bw}}^{(2)} - m_{\mathrm{bw}}^{(1)}| \le 0.6$, it joins Class 0. Flow $f_3$ is compared against the median of Class 0: if $|m_{\mathrm{bw}}^{(3)} - \operatorname{median}(m_{\mathrm{bw}}^{(1)}, m_{\mathrm{bw}}^{(2)})| > 0.6$, it creates a new Class 1.

After each metric is classified independently, every traffic flow is characterized by a five-dimensional class sequence vector $\mathbf{s} = [s_1, s_2, s_3, s_4, s_5]$, where $s_i$ is the class assignment for the $i$-th QoS metric (bandwidth, jitter, packet stability, burst frequency, and burst stability, respectively). Traffic flows with identical class sequence vectors are grouped under the same QoS label.

After this initial classification, QoS classes are reordered according to their relative importance for network management. For instance, higher bandwidth and lower jitter typically indicate higher QoS priority. To formalize this, we define a QoS awareness score $p_a^k$ for each class $k$ based on its class sequence values:

\begin{equation}
p_a^k = w_1 \cdot s_1^k + \sum_{i=2}^{5} w_i \cdot \left( \max_{j} s_i^{(j)} - s_i^k \right), \tag{5}
\end{equation}

where $\max_{j} s_i^{(j)}$ is the largest class index observed for metric $i$ across all QoS labels $j$. The first term rewards higher bandwidth (QoS metric $s_1$), and the remaining terms penalize instability in jitter, packet stability, inter-burst delay, and burstiness, for which lower values indicate better QoS. The weights $w_i$ can be tuned to application-specific requirements. In this work, we prioritize real-time responsiveness and assign $w_{\mathrm{bandwidth}} = 0.30$, $w_{\mathrm{jitter}} = 0.20$, $w_{\mathrm{packet}} = 0.15$, $w_{\mathrm{burst\_freq}} = 0.20$, and $w_{\mathrm{burst\_stab}} = 0.15$. Classes are then ranked in ascending order of $p_a^k$, with higher scores indicating higher QoS awareness, and the final QoS levels are assigned accordingly, ranging from 0 (lowest score) to $N - 1$ (highest score) for $N$ classes. This magnitude-based classification framework automatically adapts to diverse traffic distributions, ensures that similar flows are grouped under the same QoS class, and provides interpretable, tunable prioritization based on weighted QoS metrics.

B. QoS-aware Model Training and Inference

To incorporate QoS awareness into the training process, we design a composite loss function that balances standard classification accuracy with penalties for QoS-violating misclassifications. The total loss is defined as

\begin{equation}
\mathcal{L}_{\mathrm{total}} = (1 - \lambda)\,\mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{QoS}}, \tag{6}
\end{equation}

where $\mathcal{L}_{\mathrm{CE}}$ is the standard cross-entropy loss, $\mathcal{L}_{\mathrm{QoS}}$ is the QoS-aware penalty term, and $\lambda \in [0, 1]$ is a tunable hyperparameter that controls the trade-off between classification accuracy and QoS sensitivity. The QoS-aware loss $\mathcal{L}_{\mathrm{QoS}}$ is computed by scaling the cross-entropy loss with a penalty matrix $P[i, j]$ that encodes the cost of misclassifying a sample from class $i$ as class $j$:

\begin{equation}
\mathcal{L}_{\mathrm{QoS}} = \mathcal{L}_{\mathrm{CE}} \times \left( 1 + P[y_{\mathrm{true}}, y_{\mathrm{pred}}] \right), \tag{7}
\end{equation}

where the penalty matrix $P$ is defined as follows:
• $P[i, i] = 0$ for correct classifications.
• $P[i, j] = \beta$ for misclassifications into classes of higher or equal QoS, i.e., $\mathrm{QoS}(j) \ge \mathrm{QoS}(i)$.
• $P[i, j] = 1.0 + \gamma \cdot (\mathrm{QoS}(i) - \mathrm{QoS}(j))$ for misclassifications into lower-QoS classes.

Here, $\beta$ and $\gamma$ are hyperparameters that control the severity of the penalties, with $\gamma$ typically set higher to discourage under-provisioning errors.

To further align predictions with QoS priorities during inference, we introduce three complementary strategies.

QoS bias adjustment: The raw output logits are adjusted by adding a bias term proportional to each class's QoS awareness score:

\begin{equation}
\text{logits}_{\mathrm{biased}} = \text{logits}_{\mathrm{raw}} + \alpha \cdot \text{QoS}_{\mathrm{awareness}}, \tag{8}
\end{equation}

where $\alpha$ is a tunable parameter that controls the strength of the QoS bias. This encourages the model to favor higher-QoS classes when confidence is comparable.

Post-processing refinement: For low-confidence predictions (i.e., when the maximum softmax probability $\text{score}_{\text{top-1}}$ is below a threshold $\sigma$), we compare the top-2 candidate classes. If their confidence scores are within a relative margin $\theta$, the class with the higher QoS awareness score is selected:

\begin{equation}
\frac{\text{score}_{\text{top-2}}}{\text{score}_{\text{top-1}}} < \theta \;\Rightarrow\; \text{select the class with higher QoS}. \tag{9}
\end{equation}

QoS-aware evaluation metrics: In addition to standard accuracy, we introduce two QoS-centric evaluation metrics:
• QoS satisfaction rate: among misclassified samples, the percentage whose predicted QoS level is greater than or equal to the ground truth.
• QoS experience score: a metric that rewards over-provisioning errors (predicting a higher QoS level than required) more than under-provisioning errors.

This QoS-aware training and inference framework biases classification errors toward over-provisioning rather than under-provisioning, thereby preserving service quality for latency-sensitive or mission-critical applications. It provides a principled mechanism for integrating application-level QoS priorities into both model optimization and decision-making, ultimately enhancing the reliability and utility of traffic classification in real-world network environments.

V. EVALUATION RESULTS

This section presents a comprehensive evaluation of the proposed QoS-aware hierarchical GNN model for fine-grained network traffic classification. The experiments demonstrate the effectiveness of the three-level graph representation and the QoS-integrated training strategy in classifying 14 traffic classes across four widely used applications.

A. Data Collection

The dataset used in this study was collected using PCAPdroid [34] on Android devices connected to WiFi networks, capturing real-world traffic traces from four major applications: YouTube, Prime Video, TikTok, and Zoom. For each application, 10-minute PCAPNG traces were recorded under diverse usage scenarios to construct a comprehensive 14-class dataset:
• YouTube: browsing, live streaming, long-form video, short-form video (4 classes).
• Prime Video: browsing, live streaming, long-form video (3 classes).
• TikTok: browsing, live streaming, short-form video (3 classes).
• Zoom: audio conferencing, symmetric video conferencing, uplink-only presentation mode, downlink-only attendance mode (4 classes).

Raw packet traces were processed to extract sessions using 5-tuple flow identification and idle-timeout segmentation. Each session was then transformed into a three-level hierarchical graph structure comprising:
• Level-1 (packet cluster graphs): nodes represent packet clusters aggregated by a fixed packet count.
• Level-2 (time window graphs): nodes represent 100 ms time windows within short sessions.
• Level-3 (session graphs): nodes represent short sessions grouped under the same 5-tuple, constrained to a maximum duration of 60 seconds.

The resulting dataset exhibits natural class imbalance, reflecting realistic usage distributions and providing a challenging yet authentic benchmark for evaluating classification performance in practical network environments.

Fig. 4 illustrates an example of a three-level hierarchical graph built from YouTube browsing traffic. In Fig. 4(a), blue nodes represent Level-1 packet cluster graphs, green nodes represent Level-2 time window graphs, and orange nodes represent Level-3 session graphs. Gray nodes are auxiliary nodes; solid arrows denote real temporal edges labeled with their time delays, and dashed arrows connect the auxiliary (virtual) nodes. Node size encodes total bytes, and transparency reflects session duration (Level-3) or average packet length (Level-1 and Level-2). Fig. 4(b) shows the I/O traffic graph of the original session corresponding to Fig. 4(a), depicting the temporal network activity used to construct the hierarchical graph.

B. Experimental Setup and Evaluation Metrics

The experimental evaluation follows an 80/20 train-test split using stratified sampling to preserve the original class distribution in both subsets. The model is implemented using the PyTorch Geometric framework and optimized with the AdamW optimizer.
A learning rate scheduler is employed to ensure stable convergence during training.
We first evaluate a conventional packet-level multi-layer perceptron (MLP) NTC, as used in [18], for in-app traffic classification; it achieves only 72.7% accuracy, indicating that conventional methods cannot effectively distinguish in-app traffic. We therefore employ the proposed QoS-aware hierarchical GNN approach to address these limitations. To isolate the impact of QoS awareness, two models are trained and evaluated under identical conditions: (1) a baseline model without QoS-aware loss or inference strategies; and (2) the proposed QoS-aware hierarchical GNN model. Both models use the same dataset, preprocessing pipeline, and data splits, ensuring a controlled comparison.
Model performance is assessed using both conventional and QoS-centric evaluation metrics. The conventional metrics measure the overall correctness of the predicted traffic classes using standard classification metrics: Precision, Recall, and F1-Score.
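The per-class Precision, Recall, and F1-Score, together with the macro and weighted averages reported in Table IV, can all be derived from a confusion matrix. The following is a minimal Python sketch, assuming that rows index true classes and columns index predicted classes; the function names `per_class_prf` and `summarize` and the toy matrix are illustrative and are not taken from the authors' code.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def per_class_prf(cm: Matrix) -> Tuple[List[Tuple[float, float, float]], List[int]]:
    """Return per-class (precision, recall, F1) and per-class support.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    support = [sum(row) for row in cm]                               # true samples per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predictions per class
    scores = []
    for k in range(n):
        tp = cm[k][k]
        precision = tp / predicted[k] if predicted[k] else 0.0
        recall = tp / support[k] if support[k] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append((precision, recall, f1))
    return scores, support


def summarize(cm: Matrix):
    """Accuracy plus macro (unweighted) and support-weighted averages of P/R/F1."""
    scores, support = per_class_prf(cm)
    total = sum(support)
    accuracy = sum(cm[k][k] for k in range(len(cm))) / total
    macro = tuple(sum(s[m] for s in scores) / len(scores) for m in range(3))
    weighted = tuple(sum(s[m] * w for s, w in zip(scores, support)) / total for m in range(3))
    return accuracy, macro, weighted


if __name__ == "__main__":
    toy = [[8, 2], [1, 9]]  # hypothetical two-class confusion matrix
    acc, macro, weighted = summarize(toy)
    print(f"accuracy={acc:.2f} macro={macro} weighted={weighted}")
```

A useful sanity check follows from the definitions: the support-weighted recall always equals overall accuracy, because each class's recall TP_k/n_k is multiplied by n_k before dividing by the total sample count. This is consistent with Table IV, where the weighted-average recall of each model equals its reported accuracy.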
The first QoS evaluation metric is the QoS satisfaction rate, which quantifies, among misclassified samples, the proportion of predictions whose predicted QoS level is greater than or equal to the ground truth, reflecting over-provisioning behavior. To further evaluate the effectiveness of QoS-aware classification, a second metric, the QoS experience score, is introduced. This metric extends beyond traditional accuracy by incorporating the severity of misclassifications based on QoS awareness levels, thereby assessing the practical impact of prediction errors on network traffic management.

Figure 4: Example of the multi-level graph structure and the corresponding raw session traffic. (a) The 3-level hierarchical graph. (b) The I/O graph of the session.

The QoS experience score is computed using a reward-penalty mechanism applied to the confusion matrix:

$$Q_{\text{score}} = \sum_{i=1}^{N} \sum_{j=1}^{N} C_{i,j} \cdot w_{i,j}, \tag{10}$$

where $C_{i,j}$ denotes the number of samples with true class $i$ predicted as class $j$, $N$ is the number of classes, and $w_{i,j}$ is the weight assigned to each prediction outcome:

$$w_{i,j} = \begin{cases} +P_i, & \text{if } P_j \ge P_i \ \text{(over-provisioning bias)}, \\ -P_i, & \text{if } P_j < P_i \ \text{(under-provisioning bias)}, \end{cases} \tag{11}$$

where $P_i$ denotes the QoS awareness level of class $i$. The scoring logic is as follows:
• Over-provisioning bias ($P_j \ge P_i$): When a flow is classified into a class with equal or higher QoS awareness than its true class, it is more likely to be allocated sufficient resources through over-provisioning, earning a positive score proportional to the true class's awareness level.
• Under-provisioning bias ($P_j < P_i$): When high-awareness traffic is misclassified into a lower-awareness class, it risks resource under-provisioning that cannot meet its QoS needs, incurring a penalty proportional to the true class's awareness level.
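Equations (10) and (11), together with the normalization of Eqs. (12) and (13) given below and the QoS satisfaction rate, reduce to a few loops over the confusion matrix. The Python sketch below is a minimal illustration, assuming the same row = true class, column = predicted class convention and a list `levels` holding each class's awareness level $P_i$; the function name `qos_metrics` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def qos_metrics(cm: List[List[int]], levels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (Q_score, Q_max, Q_ratio in %, QoS satisfaction rate in %).

    cm[i][j]  : samples of true class i predicted as class j (C_{i,j}).
    levels[i] : QoS awareness level P_i of class i.
    """
    n = len(cm)
    q_score = 0.0
    for i in range(n):
        for j in range(n):
            # Eq. (11): +P_i when the prediction does not lower the QoS level, else -P_i.
            w = levels[i] if levels[j] >= levels[i] else -levels[i]
            q_score += cm[i][j] * w                                      # Eq. (10)
    q_max = float(sum(sum(cm[i]) * levels[i] for i in range(n)))        # Eq. (12), n_i = row sum
    q_ratio = 100.0 * q_score / q_max if q_max else float("nan")        # Eq. (13)

    # QoS satisfaction rate: share of misclassified samples with P_j >= P_i.
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
    wrong = sum(cm[i][j] for i, j in off_diag)
    satisfied = sum(cm[i][j] for i, j in off_diag if levels[j] >= levels[i])
    sat_rate = 100.0 * satisfied / wrong if wrong else float("nan")     # undefined without errors
    return q_score, q_max, q_ratio, sat_rate


if __name__ == "__main__":
    levels = [1, 3, 5]            # hypothetical awareness levels for three classes
    cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 1, 9]]
    print(qos_metrics(cm, levels))
```

For this toy matrix, Q_score = 74 and Q_max = 90, and four of the six misclassified samples are over-provisioned. Two properties of Eq. (11) are worth noting: any prediction with $P_j \ge P_i$ earns the same reward as a correct one, so a ratio of 100% does not by itself imply perfect classification; and classes with $P_i = 0$ contribute nothing to either Q_score or Q_max.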
The theoretical maximum score, representing perfect classification, is given by:

$$Q_{\max} = \sum_{i=1}^{N} n_i \cdot P_i, \tag{12}$$

where $n_i$ is the number of samples in class $i$. The QoS score ratio provides a normalized performance metric:

$$Q_{\text{ratio}} = \frac{Q_{\text{score}}}{Q_{\max}} \times 100\%. \tag{13}$$

This ratio enables a meaningful comparison between QoS-aware and conventional models, reflecting the practical consequences of misclassification for network resource allocation. Higher QoS scores indicate better alignment with service-level requirements, while lower scores suggest potential degradation due to inappropriate traffic prioritization.

Table III: QoS metrics for adaptive awareness assignment.
Application | Usage Type | Bandwidth (Mbps) | Jitter Stability (CV) | Packet Stability (ms) | Burst Frequency (ms) | Burst Stability (ms) | Class Sequence | QoS Awareness
Prime Video | Browse | 1.526 | 16.878 | 264.646 | 819.475 | 1843.938 | [1,1,1,1,2] | 1
Prime Video | Live | 8.266 | 20.444 | 106.283 | 853.313 | 1198.881 | [2,1,1,1,2] | 2
Prime Video | LongVideo | 3.761 | 17.660 | 199.026 | 2232.049 | 1851.821 | [1,1,1,1,2] | 1
TikTok | Browse | 1.161 | 11.499 | 136.573 | 480.831 | 796.071 | [1,1,1,1,2] | 1
TikTok | Live | 1.049 | 3.501 | 22.653 | 95.712 | 46.384 | [1,0,0,0,1] | 4
TikTok | ShortVideo | 1.211 | 12.772 | 309.290 | 1380.829 | 1983.031 | [1,1,1,1,2] | 1
YouTube | Browse | 2.029 | 17.398 | 71.809 | 552.994 | 734.962 | [1,1,1,1,2] | 1
YouTube | Live | 1.287 | 15.469 | 109.855 | 1013.868 | 928.634 | [1,1,1,1,2] | 1
YouTube | LongVideo | 0.576 | 24.026 | 388.746 | 4124.126 | 4751.549 | [1,1,1,2,2] | 0
YouTube | ShortVideo | 2.221 | 41.676 | 159.914 | 2822.534 | 3488.171 | [1,1,1,1,2] | 1
Zoom | Audio | 0.056 | 1.170 | 23.013 | 70.206 | 14.184 | [0,0,0,0,0] | 3
Zoom | BiVideo | 2.341 | 1.848 | 6.158 | 60.906 | 15.694 | [1,0,0,0,0] | 5
Zoom | DownVideo | 2.077 | 1.904 | 7.249 | 58.149 | 9.431 | [1,0,0,0,0] | 5
Zoom | UpVideo | 2.130 | 2.246 | 8.139 | 58.725 | 7.739 | [1,0,0,0,0] | 5
Metric classes | - | 3 classes (0-2) | 2 classes (0-1) | 2 classes (0-1) | 3 classes (0-2) | 3 classes (0-2) | - | -

Figure 5: Normalized confusion matrices for baseline NTC, proposed QoS-aware NTC, and an existing packet-level NTC. (a) Baseline NTC. (b) QoS-aware NTC. (c) Packet-level NTC [18]. Each matrix plots true label against predicted label over the 14 usage classes.

C. Performance Evaluation on QoS-Aware NTC
QoS awareness levels are first extracted before the QoS-aware NTC is applied.
Each of the fourteen application usage scenarios is represented by a five-element class sequence, constructed by concatenating its class IDs across the five QoS metrics in the following order: [bandwidth, jitter stability, packet stability, burst frequency, burst stability], as detailed in Table III. Using the magnitude-based classification algorithm described in Sec. IV, QoS awareness scores are derived for each application usage scenario. In this study, the magnitude threshold $X_{\text{thresh}}$ is set to 1.0, resulting in the following class distributions: bandwidth, burst frequency, and burst stability are each divided into three classes (0, 1, 2), while jitter stability and packet stability are each divided into two classes (0, 1), as listed in the metric-class row of Table III.
Based on identical class sequences, the algorithm identifies six distinct QoS awareness groups among the fourteen usage scenarios. The largest group, Group 1, contains seven usage scenarios: Prime Video (Browsing and Long-form Video), TikTok (Browsing and Short-form Video), and YouTube (Browsing, Live, and Short-form Video), all sharing the class sequence [1, 1, 1, 1, 2], indicative of moderate bandwidth and stability requirements. In contrast, Group 5 has the highest QoS awareness level (score 5) and comprises the three real-time Zoom video conferencing modes (BiVideo, DownVideo, and UpVideo). These scenarios share the class sequence [1, 0, 0, 0, 0], reflecting high stability demands and low tolerance for jitter and burst variability.
To rank the QoS groups by priority, a weighted scoring mechanism is applied with the following metric weights: bandwidth (30%), jitter stability (20%), packet stability (15%), burst frequency (20%), and burst stability (15%). For example, Group 5 achieves the highest weighted score of 1.35 owing to its stringent stability profile, while Group 0 receives the lowest score of 0.30, reflecting its relatively relaxed QoS requirements.
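The grouping and ranking step can be sketched in Python as follows. The text gives the metric weights and the two extreme scores (1.35 for Group 5, 0.30 for Group 0) but not the exact scoring formula, so this sketch assumes one rule that reproduces them: bandwidth class IDs are used directly (a higher class means a higher bandwidth demand), while the four stability-related class IDs are inverted against their maximum class ID (a lower class means a tighter requirement). Under this assumption the resulting ranks match all six awareness levels in Table III, but it remains an illustrative reconstruction, not the authors' implementation; the names `weighted_score` and `assign_awareness` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Metric order: [bandwidth, jitter stability, packet stability, burst frequency, burst stability]
WEIGHTS = (0.30, 0.20, 0.15, 0.20, 0.15)   # weights stated in the text
MAX_CLASS = (2, 1, 1, 2, 2)                # highest class ID per metric (Table III)
# ASSUMPTION: only bandwidth is scored directly; stability metrics are inverted
# so that a lower (tighter) class ID yields a higher priority score.
INVERTED = (False, True, True, True, True)

# Class sequences copied from Table III.
TABLE_III: Dict[str, Tuple[int, ...]] = {
    "Prime Video Browse": (1, 1, 1, 1, 2),
    "Prime Video Live": (2, 1, 1, 1, 2),
    "Prime Video LongVideo": (1, 1, 1, 1, 2),
    "TikTok Browse": (1, 1, 1, 1, 2),
    "TikTok Live": (1, 0, 0, 0, 1),
    "TikTok ShortVideo": (1, 1, 1, 1, 2),
    "YouTube Browse": (1, 1, 1, 1, 2),
    "YouTube Live": (1, 1, 1, 1, 2),
    "YouTube LongVideo": (1, 1, 1, 2, 2),
    "YouTube ShortVideo": (1, 1, 1, 1, 2),
    "Zoom Audio": (0, 0, 0, 0, 0),
    "Zoom BiVideo": (1, 0, 0, 0, 0),
    "Zoom DownVideo": (1, 0, 0, 0, 0),
    "Zoom UpVideo": (1, 0, 0, 0, 0),
}


def weighted_score(seq: Sequence[int]) -> float:
    """Weighted priority score of a five-element class sequence."""
    total = 0.0
    for cls, w, top, inv in zip(seq, WEIGHTS, MAX_CLASS, INVERTED):
        total += w * ((top - cls) if inv else cls)
    return round(total, 4)  # round away floating-point noise


def assign_awareness(usages: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Group usages by identical class sequence; rank groups by score (0 = lowest)."""
    groups = defaultdict(list)
    for usage, seq in usages.items():
        groups[tuple(seq)].append(usage)
    ranked = sorted(groups, key=weighted_score)
    level = {seq: rank for rank, seq in enumerate(ranked)}
    return {usage: level[tuple(seq)] for usage, seq in usages.items()}


if __name__ == "__main__":
    for usage, p in sorted(assign_awareness(TABLE_III).items(), key=lambda kv: kv[1]):
        print(f"{p}  {usage}")
```

Under this rule the six group scores are 0.30, 0.50, 0.80, 1.05, 1.20, and 1.35, which map to awareness levels 0 through 5. Two groups with equal scores would receive different consecutive levels in this sketch; how the original algorithm breaks such ties is not specified.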
This ranking framework enables priority-based QoS management, in which traffic flows belonging to higher-scored groups receive preferential treatment in network resource allocation. Such differentiation is critical for maintaining service quality in latency-sensitive and real-time applications.
The QoS awareness mechanism is then integrated into the proposed NTC framework. For comparison, the baseline is a classifier that does not incorporate QoS awareness, i.e., model (1) of Sec. V-B, trained without QoS-aware loss or inference strategies. Before presenting the QoS performance metrics, we first evaluate traditional classification accuracy. As illustrated in Fig. 5, both the baseline classifier and the QoS-aware classifier achieve high accuracy across the various application usage scenarios.

Table IV: Classification results comparison between baseline NTC and QoS-aware NTC.
Class | Baseline Precision | Baseline Recall | Baseline F1 | QoS-aware Precision | QoS-aware Recall | QoS-aware F1
YT Browsing | 0.96 | 0.91 | 0.94 | 0.97 | 0.93 | 0.95
YT Live | 0.85 | 0.94 | 0.89 | 0.89 | 0.92 | 0.91
YT LongVideo | 0.85 | 0.63 | 0.72 | 0.92 | 0.63 | 0.75
YT ShortVideo | 0.87 | 0.82 | 0.84 | 0.91 | 0.84 | 0.88
PV Browsing | 0.79 | 0.92 | 0.85 | 0.94 | 0.78 | 0.85
PV Live | 0.86 | 0.87 | 0.86 | 0.92 | 0.83 | 0.87
PV LongVideo | 0.85 | 0.83 | 0.84 | 0.92 | 0.83 | 0.88
TT Browsing | 0.82 | 0.80 | 0.81 | 0.89 | 0.77 | 0.82
TT Live | 0.95 | 0.93 | 0.94 | 0.75 | 0.97 | 0.85
TT ShortVideo | 0.70 | 0.69 | 0.70 | 0.79 | 0.65 | 0.71
Zoom Audio | 0.84 | 0.85 | 0.84 | 0.97 | 0.70 | 0.81
Zoom Bi | 0.90 | 0.92 | 0.91 | 0.88 | 0.90 | 0.89
Zoom Up | 0.88 | 0.87 | 0.87 | 0.87 | 0.87 | 0.87
Zoom Down | 0.83 | 0.77 | 0.80 | 0.68 | 0.91 | 0.78
Accuracy | - | - | 0.86 | - | - | 0.85
Mac. Avg | 0.85 | 0.84 | 0.84 | 0.88 | 0.82 | 0.84
Wtd. Avg | 0.86 | 0.86 | 0.86 | 0.86 | 0.85 | 0.85

A closer examination of Table IV shows that, although the overall accuracy of the QoS-aware model (0.85) is slightly lower than that of the baseline (0.86), the difference is marginal. In fact, the weighted-average precision is identical for both models (0.86), and the weighted-average recall and F1-score differ by only 0.01.
Furthermore, the QoS-aware approach improves classification performance for certain usage types (e.g., higher F1-scores for YouTube Short-form Video and Prime Video Long-form Video), highlighting its ability to capture nuanced in-app behaviors. In contrast, a state-of-the-art packet-level NTC method [18] achieves only 72.7% accuracy across all usage scenarios. This lower performance is largely due to its tendency to confuse usage scenarios that originate from the same application (Fig. 5(c)), which can mislead QoS provisioning.
We then demonstrate the improved QoS performance of the QoS-aware NTC. As shown in Fig. 6, the QoS-aware model achieves a significantly higher QoS score ratio of 96.78% compared to the baseline's 88.30%, an improvement of 8.48 percentage points. Additionally, for misclassified samples, the QoS-aware model achieves a QoS satisfaction rate of 91.79% compared to the baseline's 69.97%. The performance distributions in Fig. 6 reveal that, although the baseline model achieves slightly higher overall classification accuracy, its misclassifications often fail to meet QoS requirements, as evidenced by a larger proportion of under-provisioned cases. In contrast, the QoS-aware model shows a substantially smaller proportion of misclassifications that fail to satisfy QoS level requirements, better preserving service quality for applications. However, this conservative approach may lead to over-provisioning in some cases, potentially wasting resources, since the model tends to assign higher QoS levels to avoid service degradation.

D. More Discussion
A fundamental design decision in the proposed framework is to classify usage patterns first, rather than directly predicting QoS awareness levels as target labels. Decoupling these two stages gives the framework greater stability, flexibility, and interpretability.
Usage classification remains consistent and reusable across different network environments, while QoS policies can be dynamically adapted to evolving service requirements or resource constraints.

Figure 6: QoS performance comparison between baseline and proposed fine-grained NTC models. (a) QoS performance with baseline NTC (QoS score ratio: 88.30%; QoS satisfaction rate on misclassified samples: 69.97%). (b) QoS performance with the proposed fine-grained NTC (QoS score ratio: 96.78%; QoS satisfaction rate on misclassified samples: 91.79%). Each panel shows, for each true QoS awareness level (0 to 5), the percentage of predictions that are a perfect match (predicted = true), over-conservative (predicted > true), or under-estimated (predicted < true).

The experimental results yield several key insights into the effectiveness of the proposed QoS-aware hierarchical GNN framework for fine-grained network traffic classification. Notably, the three-level hierarchical graph structure addresses the limitations of single-scale approaches by enabling the model to learn both local and contextual features. The evaluation results confirm its effectiveness in capturing multi-scale temporal dependencies, which supports accurate and robust classification in fine-grained usage scenarios. Meanwhile, QoS awareness does not come at a significant cost in traditional classification accuracy. In fact, while overall accuracy drops by only one point (0.85 vs. 0.86), the QoS-aware model attains a higher macro-averaged precision (0.88 vs. 0.85) and higher F1-scores for eight of the fourteen classes.
This is because QoS awareness mainly affects uncertain predictions, which a conventional NTC is likely to misclassify; adjusting the final output for these samples can turn some of them into correct predictions. At the same time, the QoS-aware model significantly improves the QoS score ratio (96.78% vs. 88.30%). This improvement reflects the model's conservative bias toward over-provisioning, which is preferable in practical network management, where under-provisioning can lead to service degradation whereas temporary over-allocation is generally more tolerable.
Despite these strengths, several limitations warrant consideration. The current evaluation focuses on four major applications, which may limit generalizability to broader traffic domains. Additionally, the conservative QoS bias, while beneficial for service assurance, may lead to inefficient resource utilization in bandwidth-constrained environments. These observations suggest promising directions for future work, including dynamic adjustment of QoS weighting strategies and expansion to a wider range of application types and network conditions.

VI. CONCLUSION AND FUTURE WORK
This paper presented a hierarchical GNN framework designed for fine-grained, QoS-aware network traffic classification. By integrating multi-scale graph modeling with a five-attribute QoS awareness assignment algorithm, the proposed framework enables accurate differentiation of in-app usage patterns while maintaining a strong focus on service quality. Experimental results demonstrate that the developed GNN framework outperforms a state-of-the-art NTC method in fine-grained service-level application identification, achieving an accuracy of 86% compared to 72.7%. Furthermore, including QoS-aware adjustment within the overall GNN framework has only a negligible impact on overall classification accuracy (85% vs. 86%).
At the same time, it significantly enhances the QoS experience, with notable improvements in the QoS score ratio (96.78% vs. 88.30%) and the QoS satisfaction rate (91.79% vs. 69.97%). This improvement is particularly valuable in real-world network environments, where preserving service quality is essential. To further improve the adaptability and efficiency of the framework, future work will focus on dynamic QoS bias adjustment based on real-time network conditions. Additionally, the framework will be extended to support a wider range of application types and deployment scenarios.

REFERENCES
[1] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets in mobile computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[2] A. Alhakamy, “Extended reality (XR) toward building immersive solutions: The key to unlocking industry 4.0,” ACM Comput. Surv., vol. 56, no. 9, Apr. 2024. [Online]. Available: https://doi.org/10.1145/3652595
[3] C. Campolo, A. Molinaro, A. Iera, and F. Menichella, “5G network slicing for vehicle-to-everything services,” IEEE Wireless Communications, vol. 24, no. 6, pp. 38–45, 2017.
[4] G. Aceto, V. Persico, and A. Pescapé, “A survey on information and communication technologies for industry 4.0: State-of-the-art, taxonomies, perspectives, and challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3467–3501, 2019.
[5] E. Papadogiannaki and S. Ioannidis, “A survey on encrypted network traffic analysis applications, techniques, and countermeasures,” ACM Comput. Surv., vol. 54, no. 6, Jul. 2021. [Online]. Available: https://doi.org/10.1145/3457904
[6] M. S. Sheikh and Y. Peng, “Procedures, criteria, and machine learning techniques for network traffic classification: A survey,” IEEE Access, vol. 10, pp. 61135–61158, 2022.
[7] A. Shahraki, M. Abbasi, A. Taherkordi, and A. D. Jurcut, “Active learning for network traffic classification: A technical study,” IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 1, pp. 422–439, 2022.
[8] A. Azab, M. Khasawneh, S. Alrabaee, K.-K. R. Choo, and M. Sarsour, “Network traffic classification: Techniques, datasets, and challenges,” Digital Communications and Networks, vol. 10, no. 3, pp. 676–692, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352864822001845
[9] T. Shapira and Y. Shavitt, “Flowpic: A generic representation for encrypted traffic classification and applications identification,” IEEE Transactions on Network and Service Management, vol. 18, no. 2, pp. 1218–1232, 2021.
[10] T.-D. Pham, T.-L. Ho, T. Truong-Huu, T.-D. Cao, and H.-L. Truong, “Mappgraph: Mobile-app classification on encrypted network traffic using deep graph convolution neural networks,” in Proceedings of the 37th Annual Computer Security Applications Conference, ser. ACSAC ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 1025–1038. [Online]. Available: https://doi.org/10.1145/3485832.3485925
[11] T.-L. Huoh, Y. Luo, P. Li, and T. Zhang, “Flow-based encrypted network traffic classification with graph neural networks,” IEEE Transactions on Network and Service Management, vol. 20, no. 2, pp. 1224–1237, 2023.
[12] O. Aouedi, K. Piamrat, and B. Parrein, “Ensemble-based deep learning model for network traffic classification,” IEEE Transactions on Network and Service Management, vol. 19, no. 4, pp. 4124–4135, 2022.
[13] X. Duan, Y. Fu, and K. Wang, “Network traffic anomaly detection method based on multi-scale residual classifier,” Computer Communications, vol. 198, pp. 206–216, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366422004121
[14] Q. Ma, C. Sun, B. Cui, and X. Jin, “A novel model for anomaly detection in network traffic based on kernel support vector machine,” Computers & Security, vol. 104, p. 102215, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167404821000390
[15] M. B. Pranto, M. H. A. Ratul, M. M. Rahman, I. J. Diya, and Z.-B. Zahir, “Performance of machine learning techniques in anomaly detection with basic feature selection strategy-a network intrusion detection system,” J. Adv. Inf. Technol., vol. 13, no. 1, 2022.
[16] C. Yu, J. Lan, J. Xie, and Y. Hu, “QoS-aware traffic classification architecture using machine learning and deep packet inspection in SDNs,” Procedia Computer Science, vol. 131, pp. 1209–1216, 2018, Recent Advancement in Information and Communication Technology. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050918307129
[17] M. Beshley, N. Kryvinska, H. Beshley, O. Panchenko, and M. Medvetskyi, “Traffic engineering and QoS/QoE supporting techniques for emerging service-oriented software-defined network,” Journal of Communications and Networks, vol. 26, no. 1, pp. 99–114, 2024.
[18] J. Zhang, F. Li, and F. Ye, “Sustaining the high performance of AI-based network traffic classification models,” IEEE/ACM Transactions on Networking, vol. 31, no. 2, pp. 816–827, Apr. 2023.
[19] R. Zhao, M. Zhan, X. Deng, Y. Wang, Y. Wang, G. Gui, and Z. Xue, “Yet another traffic classifier: A masked autoencoder based traffic transformer with multi-level flow representation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 5420–5427.
[20] M. Shen, J. Zhang, L. Zhu, K. Xu, and X. Du, “Accurate decentralized application identification via encrypted traffic analysis using graph neural networks,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 2367–2380, 2021.
[21] W. Li, X.-Y. Zhang, H. Bao, H. Shi, and Q. Wang, “Prograph: Robust network traffic identification with graph propagation,” IEEE/ACM Transactions on Networking, vol. 31, no. 3, pp. 1385–1399, 2023.
[22] Z. Zhao, Z. Li, X. Xie, J. Yu, F. Zhang, R. Zhang, B. Chen, X. Luo, M. Hu, and W. Ma, “: Towards fine-grained unknown class detection against the open-set attack spectrum with variable legitimate traffic,” IEEE/ACM Transactions on Networking, 2024.
[23] G. Apruzzese, P. Laskov, and J. Schneider, “SoK: Pragmatic assessment of machine learning for network intrusion detection,” in 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P), 2023, pp. 592–614.
[24] N. Mathews, J. K. Holland, S. E. Oh, M. S. Rahman, N. Hopper, and M. Wright, “SoK: A critical evaluation of efficient website fingerprinting defenses,” in 2023 IEEE Symposium on Security and Privacy (SP), 2023, pp. 969–986.
[25] K. Wang, Z. Wang, D. Han, W. Chen, J. Yang, X. Shi, and X. Yin, “Bars: Local robustness certification for deep learning based traffic analysis systems,” in NDSS, 2023.
[26] A. F. Diallo and P. Patras, “Adaptive clustering-based malicious traffic classification at the network edge,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[27] L. Yang, W. Guo, Q. Hao, A. Ciptadi, A. Ahmadzadeh, X. Xing, and G. Wang, “CADE: Detecting and explaining concept drift samples for security applications,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2327–2344.
[28] M. Abbasi, S. López Flórez, A. Shahraki, A. Taherkordi, J. Prieto, and J. M. Corchado, “Class imbalance in network traffic classification: An adaptive weight ensemble-of-ensemble learning method,” IEEE Access, vol. 13, pp. 26171–26192, 2025.
[29] B. Park, J. W.-K. Hong, and Y. J. Won, “Toward fine-grained traffic classification,” IEEE Communications Magazine, vol. 49, no. 7, pp. 104–111, Jul. 2011.
[30] P.-C. Lin, S.-Y. Chen, and C.-H. Lin, “Towards fine-grained traffic classification for web applications,” in 2014 Australasian Telecommunication Networks and Applications Conference (ATNAC), 2014, pp. 28–33.
[31] Y. Fu, H. Xiong, X. Lu, J. Yang, and C. Chen, “Service usage classification with encrypted internet traffic in mobile messaging apps,” IEEE Transactions on Mobile Computing, vol. 15, no. 11, pp. 2851–2864, 2016.
[32] J. Liu, Y. Fu, J. Ming, Y. Ren, L. Sun, and H. Xiong, “Effective and real-time in-app activity analysis in encrypted internet traffic streams,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 335–344. [Online]. Available: https://doi.org/10.1145/3097983.3098049
[33] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” CoRR, vol. abs/2105.14491, 2021. [Online]. Available: https://arxiv.org/abs/2105.14491
[34] E. Fusillo, “PCAPdroid: No-root network monitor, firewall and PCAP dumper for Android,” https://github.com/emanuele-f/PCAPdroid, 2020, accessed: 2025-04-29.
