From Simulation to Deep Learning: Survey on Network Performance Modeling Approaches

Carlos Güemes-Palau, Miquel Ferriol-Galmés, Jordi Paillisse-Vilanova, Pere Barlet-Ros, Albert Cabellos-Aparicio

Universitat Politècnica de Catalunya (UPC), Barcelona, Catalonia, Spain

Abstract

Network performance modeling is a field that predates early computer networks and the beginning of the Internet. It aims to predict the traffic performance of packet flows in a given network. Its applications range from network planning and troubleshooting to feeding information to network controllers for configuration optimization. Traditional network performance modeling has relied heavily on Discrete Event Simulation (DES) and analytical methods grounded in mathematical theories such as Queuing Theory and Network Calculus. However, as of late, we have observed a paradigm shift, with attempts to obtain efficient Parallel DES, the surge of Machine Learning models, and their integration with other methodologies in hybrid approaches. This has resulted in a great variety of modeling approaches, each with its strengths and often tailored to specific scenarios or requirements. In this paper, we comprehensively survey the relevant network performance modeling approaches for wired networks over the last decades. With this understanding, we also define a taxonomy of approaches, summarizing our understanding of the state-of-the-art and how both technology and the concerns of the research community evolve over time. Finally, we also consider how these models are evaluated, how their different nature results in different evaluation requirements and goals, and how this may complicate their comparison.
Keywords: network modeling, network performance, network simulation, analytical models, machine learning, deep learning

Email address: carlos.guemes@upc.edu (Carlos Güemes-Palau)

Contents

1 Introduction
1.1 Methodology and scope
1.2 Describing Network Models
1.3 Survey Outcomes
1.4 Structure of Survey
2 Taxonomy of Network Performance Models
3 Network Simulation
3.1 Discrete Event Simulation
3.2 Parallel Discrete Event Simulation
3.3 Summary
4 Analytical Models
4.1 Queuing Theory
4.2 Fluid Models
4.3 Network Calculus
4.3.1 Algebraic-based Discrete Network Calculus (ADNC)
4.3.2 Optimization-based Discrete Network Calculus (ODNC)
4.3.3 Stochastic Network Calculus (SNC)
4.4 Other Analytical Models
4.5 Summary
5 Machine Learning Models
5.1 “Shallow” ML Models
5.2 Deep Learning and Graph Neural Networks
5.2.1 RouteNet and Successors
5.2.2 Other GNN Models
5.2.3 Other DL Models
5.3 Summary
6 Hybrid Approaches
6.1 Model-tuned Emulation for Performance Modeling
6.2 ML + Analytical Hybrid Models
6.3 Accelerated DES
6.4 Simulation with DL-Enhanced Accuracy
6.5 Summary
7 Discussion on Identified Trends and Challenges within Network Performance Modeling
7.1 Balance Between Accuracy, Resolution, Applicability, and Inference Cost
7.2 The Dominance of DES and the Surge of GNNs
7.3 Reduced Interest in Analytical Models
7.4 Adapting to Changing Networks
7.5 Simulation-Dominated Evaluation
7.6 Heterogeneous Approaches, Heterogeneous Evaluations
8 Future Directions and Opportunities
8.1 Consolidation of PDES and GNNs Models
8.2 Analytical Models Enhancing DL Models
8.3 ML as a New Tool for Evolving Networks
8.4 Data Center-Centric Designs
8.5 Better Usage of Simulation Data for Training Real-World Models
9 Conclusions

1. Introduction

Performance modeling has long been a fundamental tool in computer networking. By enabling the prediction and evaluation of network behavior without requiring disruptive changes in live systems, models support network design, planning, and protocol development. From early backbone networks to modern cloud infrastructures, performance models have helped operators make informed, data-driven decisions about scalability, reliability, and efficiency.

In the research community, much of the focus over the past two decades has been on wireless networks, driven by the rapid rise of mobile communication, 5G/6G technologies, and the Internet of Things.
Wired networks, by contrast, seemed comparatively static: their role was often limited to large national backbones such as GBN [1] or Abilene [2], and for a time, modeling research in this area stagnated.

However, this perception changed dramatically with the rise of data centers. Once motivated mainly by cloud computing and large-scale web services, today's data centers are increasingly driven by artificial intelligence (AI) and large language model (LLM) training workloads. According to the International Energy Agency [3], conventional servers increased their energy demand from 145 TWh in 2020 to 195 TWh in 2024, with projections exceeding 300 TWh by 2030. Accelerated servers designed for AI workloads show even steeper growth, increasing from 10 TWh in 2020 to 60 TWh in 2024, and are similarly expected to surpass 300 TWh by 2030. These figures illustrate the unprecedented demands on wired networks, particularly data center networks, and the reason why modeling them has become an urgent research priority.

Unlike domains such as electromagnetism or fluid dynamics, where governing equations capture system behavior, computer networks lack a single set of exact mathematical laws. Their discrete nature, combined with internet traffic's bursty and self-correlated behavior [4, 5] and the complexity of congestion control, makes accurate modeling a continuous challenge. As networks evolve, so do modeling approaches, from formula-based analytical models and packet-level discrete event simulation to, more recently, machine learning-driven prediction.

From the 1990s to the 2010s, most network performance models were based on analytical modeling or simulation. Analytical models, such as those built on Queuing Theory (QT) [6], described networks as systems of equations. While computationally efficient, they often rely on simplified assumptions about traffic and service distributions.
Network Calculus (NC) [7, 8], alternatively, offered deterministic worst-case performance bounds without assuming specific traffic or service distributions. However, its bounds were often overly conservative and limited to feedforward topologies. Conversely, Discrete Event Simulation (DES) became popular by offering high-fidelity, packet-level modeling of networks. Yet, its computational cost, proportional to the number of network events, made it impractical for large-scale and high-speed networks. While each approach had its strengths, none fully met the growing demands for scalability, expressiveness, and accuracy.

[Figure 1: Identified models per year of publication and type (Simulation, Analytical Models, ML Models, Hybrid Approaches). Horizontal axis: year, binned from 1990 to 2021-2025; vertical axis: number of models. Does not consider other types of publications, such as surveys. If the same model is covered in multiple papers, it is counted once in the year of its earliest publication.]

After a period of stagnation in the 2010s, research in network performance modeling was revitalized by the rise of Machine Learning (ML). Although the first applications of ML to network modeling date back to 2001, it was the surge of Deep Learning (DL) models in 2018 that brought a significant breakthrough. These models proved capable of accurately predicting network performance while being relatively inexpensive to train and run. This marked a major shift away from traditional analytical models. Although analytical approaches provide theoretical guarantees, DL models have increasingly outperformed them in practice, providing higher accuracy at similar or even lower computational costs. As a result, DL-based models quickly became the dominant approach in the literature.
Beyond replacing earlier techniques, DL also inspired a second paradigm shift: the emergence of hybrid approaches. Researchers began integrating ML into existing approaches to leverage their complementary strengths. For instance, ML models have been incorporated into DES frameworks to reduce simulation time without sacrificing detail or expressiveness.

This trend is reflected in Figure 1, which shows the number of network performance models published over the past decades. Research activity peaked in the late 1990s and early 2000s, followed by a period of stagnation around 2010. However, since 2018, the field has experienced a strong resurgence, with ML-based models becoming the dominant approach. The figure also shows the steady rise of hybrid methods and a decline in the use of traditional analytical models. In contrast, the number of simulation-based models has remained relatively stable. Although simulation remains a reliable and expressive modeling tool, recent research has shifted toward improving its scalability and execution speed.

In this survey, we study the evolution of network performance models over time. We analyze the motivations behind each shift, the limitations that shaped subsequent approaches, and the emerging trends in the latest generation of models. Additionally, we highlight the challenges the field currently faces and the directions research is taking to address them. While surveys exist in related areas, such as wireless networks [9, 10, 11] or specific modeling approaches like NC [12, 13] and Graph Neural Networks [14], to the best of our knowledge, there has not been a dedicated survey that revisits performance modeling with a focus on wired networks.

1.1. Methodology and scope

To find all relevant models published in the state-of-the-art (from now on referred to as SotA), we searched within IEEE Xplore, ACM's Digital Library, and Elsevier's ScienceDirect for papers containing either of the terms “network modeling” or “network performance”. We focused on papers published within the IEEE INFOCOM, IEEE NOMS, IEEE/ACM Transactions on Networking, ACM SIGCOMM, ACM CoNEXT, Elsevier Computer Networks, and Elsevier Computer Communications conferences and journals. We then excluded papers that fell outside the scope of the survey and papers that lacked a network model capable of predicting performance metrics. For certain high-impact papers, such as [15, 16, 17, 18], we also reviewed articles that cite them. For older methods, such as the original QT models, we started by searching in the references of the papers we had already found, and then repeated this process recursively until finding their original precursor.

This survey limits itself to models capable of replicating or predicting the performance of traffic flows within wired networks. While there are similar research fields, the solutions they encompass ultimately attempt to solve problems of a different nature, rendering their comparison fruitless. This applies to the following fields:

• Wireless and mobile networks: The methodologies for modeling wired and wireless networks vary significantly due to their differences. Hence, in this survey, we focus only on wired networks, and we refer the interested reader to existing surveys on wireless networks [9, 10, 11].
• Network verification: Rather than predicting network behavior under specific conditions, these models focus on ensuring the viability of potential network configurations (e.g., if certain QoS guarantees are maintained [19, 20]).
• Traffic prediction models: Unlike network modeling, whose task is to predict the performance of traffic flows within the network, in traffic prediction, the task is to predict and describe the incoming traffic flows.
• Anomaly detection: These models are tasked with detecting anomalous patterns that can be indicative of a network security threat. Rather than making predictions, they discriminate between the “normal” expected behaviors and the potentially dangerous unexpected ones.

1.2. Describing Network Models

Throughout the survey, we will summarize the identified models in tables. Below, we describe the attributes we use to characterize each model:

Model Type
Specific type of model (e.g., if simulation, type of simulator; if ML model, which ML architecture...).

Input Scope
Refers to the scope of the input data of the model:
• Single-flow scenarios: models can only model single flows. They do not consider cross-traffic interactions. They also ignore the underlying network topology, or only consider individual links or devices.
• Traffic-matrix scenarios: models consider cross-traffic interactions. All flows are explicitly represented, even if the model produces predictions for just a subset of them. However, they still heavily simplify or ignore the underlying network topology.
• Full network scenarios: models consider cross-traffic interactions, and also consider the network's topology, routing, and characteristics in their calculations.

Output Scope
Refers to the scope of the predicted performance metrics:
• Flow-level: average performance metrics for each flow.
• Flow-level with temporal component: like before, but with a temporal component. Methods may vary in the coarseness of the temporal component or how time is aggregated.
• Packet-level: can produce predictions for individual packets.

Supported Traffic Types
• UDP: Traffic without congestion control.
• TCP: Traffic regulated by congestion control.
Some models may consider a generic congestion control algorithm; others consider one or multiple versions of TCP.
• Non-specific: The model does not specify which traffic distributions it supports. However, because of the nature of how they model traffic distributions, this may result in varying degrees of error depending on the packet arrival time distribution (e.g., queue models modeling traffic through a Poisson process).
• Any: can faithfully support any traffic type.

Supported Performance Metrics
These usually depend on the output scope and traffic type supported by the network model:
• Models with packet-level predictions usually predict the full packet information (i.e., the timestamp of the packet at each point of its routing path).
• Models for UDP traffic focus on packet delay, jitter, and loss rates.
• Models for TCP traffic may also focus on flow completion time (FCT), throughput, and packet round-trip time (RTT).

Evaluation
Most network models are evaluated or offer theoretical guarantees. If they are, we categorize evaluation in the following manner:
• Analytical evaluation: properties are analyzed from an analytical viewpoint only (i.e., verified through formal proofs). Common in analytical models. Restricted to assumptions made by the analysis itself.
• Simulated data evaluation: The model was evaluated using simulated data against the simulator's ground truth.
• Testbed data evaluation: The model was evaluated using data generated and captured from a testbed network.
• Real data evaluation: The model was evaluated using captured real internet data.

1.3. Survey Outcomes

The main contributions of this survey are as follows:
• We introduce a taxonomy for describing the different kinds of network performance models, allowing us to better understand the SotA and how the various approaches influence and complement each other.
• We perform a comprehensive study of the relevant network performance models present in the SotA.
• We discuss the trends and limitations of the current SotA of network performance modeling, comparing them to those identified by researchers back at the start of the millennium.

1.4. Structure of Survey

For ease of reading, we summarize the structure of the survey. We also include a summary of the acronyms used throughout the survey in Table 1.
• In Section 2 we introduce and describe our taxonomy for network performance models.
• In the main body of the paper, we survey the current or previously relevant network performance models. Specifically, Section 3 covers network simulators, Section 4 analytical models, Section 5 ML models, and Section 6 hybrid approaches.
• Section 7 includes our discussion of the SotA of network performance models.
• Finally, in Section 8 we discuss future directions of research into network performance models.

Acronym | Description
--- | ---
ADNC | Algebraic-based Deterministic Network Calculus
AI | Artificial Intelligence
AQM | Active Queue Management
CVAE | Conditional Variational Auto-Encoder
[P]DES | [Parallel] Discrete Event Simulation
DCQCN | Data Center Quantized Congestion Notification [21]
DL | Deep Learning
DNN | Dense Neural Network
EVT | Extreme Value Theory
FCT | Flow Completion Time
FIFO | First-In First-Out
GCN | Graph Convolutional Network [22]
GGNN | Gated Graph Neural Network [23]
GNN | Graph Neural Network [24]
IPG | Inter-Packet Gap
LLM | Large Language Model
LP | Logical Process
LPP | Linear Programming Problem
[Bi]LSTM | [Bidirectional] Long Short-Term Memory [25]
[EW]MA | [Exponentially Weighted] Moving Average
ML | Machine Learning
MPNN | Message-Passing Neural Network [26]
NC | Network Calculus
NP | Neural Processes [27]
ns[-2 ∥ -3] | The Network Simulator [28, 29]
ODE | (System of) Ordinary Differential Equations
ODNC | Optimization-based Deterministic Network Calculus
PBOO | Pay Bursts Only Once [30]
PDE | (System of) Partial Differential Equations
PMOO | Pay Multiplexing Only Once [31]
QoS | Quality of Service
QT | Queuing Theory
RBFNN | Radial Basis Function Neural Network
RED | Random Early Detection
RF | Random Forest
RGCN | Relational Graph Convolutional Network [32]
RNN | Recurrent Neural Network
RON | Resilient Overlay Networks [33]
RTT | Round Trip Time
SDE | (System of) Stochastic Differential Equations
SFA | Separate Flow Analysis [30]
SNC | Stochastic Network Calculus
SVR | Support Vector Regression
[DC]TCP | [Data Center [34]] Transmission Control Protocol
TFA | Total Flow Analysis [7, 8]
TMA | Tandem Matching Analysis [35]
UDP | User Datagram Protocol

Table 1: List of acronyms

2. Taxonomy of Network Performance Models

The evolution of network performance models is summarized in Figure 2. This taxonomy shows the four main types of network models (simulation, analytical models, ML models, and hybrid approaches) and how they interact.

The earliest network performance models were analytical, based on Queuing Theory (QT) [6]. These models describe the behavior of packets in devices through systems of queues, based on given assumptions (e.g., packet arrival distributions). Simpler QT models, constrained by stricter assumptions, are often inaccurate.
More advanced models relax these assumptions and improve accuracy, but become significantly harder to solve. Additionally, QT models typically provide only aggregate predictions, such as average packet delay, rather than detailed, per-packet insights. This makes them better suited for general approximations of network performance, rather than thoroughly examining specific, complex scenarios.

This has inspired other types of analytical models, intended to improve accuracy, cost, and expressiveness compared to their QT counterparts. On the one hand, fluid queuing models build on traditional discrete QT models while assuming that transmitted data can be represented as a continuous value. While this assumption does not hold in reality, it allows for network performance models built through systems of differential equations, which can be solved efficiently. Hence, these models are inexpensive, while the inclusion of the temporal component in the differential equations also makes them more expressive.

On the other hand, other researchers turned to Network Calculus (NC) [7, 8]. Unlike QT, initial NC models aimed to reduce the number of assumptions, include the temporal component, and provide predictable and provably correct performance predictions at a low computational cost. This is achieved by offering worst-case bounds for the packet delay as a function of time. However, while the bounds are proven to be correct, early NC models tend to produce bounds that are too pessimistic, while more complex NC models prove to be too computationally expensive to be practical.
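To make the contrast between an aggregate QT prediction and an NC worst-case bound concrete, the following minimal sketch computes both for a single queue. It is an illustration under assumptions that are not taken from the surveyed papers: the QT side uses the textbook M/M/1 queue (Poisson arrivals, exponential service), and the NC side uses a token-bucket arrival curve (burst b, rate r) crossing a rate-latency server (rate R, latency T), for which the classical delay bound is T + b/R. The function and parameter names are hypothetical.

```python
def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (waiting + service) for an M/M/1 queue: 1 / (mu - lambda).

    Assumes Poisson arrivals and exponential service times (textbook model).
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def nc_delay_bound(burst: float, rate: float,
                   service_rate: float, latency: float) -> float:
    """Worst-case delay for a token-bucket flow (burst, rate) crossing a
    rate-latency server (service_rate, latency): T + b / R.

    Only the burst and the server parameters appear in the bound; the
    sustained rate is needed solely to check that the bound is finite.
    """
    if rate > service_rate:
        raise ValueError("no finite bound: sustained rate exceeds service rate")
    return latency + burst / service_rate


# Illustrative numbers (assumed): rates in packets/s, burst in packets, times in s.
mean_delay = mm1_mean_delay(arrival_rate=800.0, service_rate=1000.0)
worst_case = nc_delay_bound(burst=10.0, rate=800.0,
                            service_rate=1000.0, latency=0.001)
print(f"M/M/1 mean delay: {mean_delay:.4f} s, NC delay bound: {worst_case:.4f} s")
```

The two numbers answer different questions: the first is an average that holds only if the distributional assumptions do, while the second is a bound that holds for any traffic conforming to the arrival curve, which is one reason it can be pessimistic.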
By contrast, an old yet popular alternative for building network models is simulation, specifically Discrete Event Simulation (DES). It allows for accurate, fine-grained results. Popular DES projects like OMNeT [36] and ns [28] are open source, allowing researchers to share the implementations for newer protocols and devices and facilitating their use. The most significant limitation of DES is its computational cost, which is compounded by the fact that its sequential nature makes it extremely hard to parallelize. As a result, the development of parallelizable DES (PDES) became a prominent research area, driven both by the technical challenges involved and the potential benefits: an accurate, detailed modeling approach that could scale to large networks if implemented effectively.

[Figure 2: Taxonomy of network performance models. Solid line arrows indicate direct evolution, while dashed line arrows indicate inspiration or a “response to”. Year indicates the year of the earliest publication within that category. Categories shown, by group. Simulation: Discrete Event Simulation (1988), Early Parallelizable DES (2001), New Parallelizable DES (2016). Analytical Models: Queuing Theory (1963), Network Calculus (1991), Fluid Queuing Models (1999), Other Analytical Models. Machine Learning Models: Machine Learning Models (2001), Deep Learning Models (2018). Hybrid Approaches: Accelerated DES (2004), Model-tuned Emulation (2018), ML + Analytical Hybrid Models (2019), DL-Accelerated DES (2021), Simulation with DL-Enhanced Accuracy (2023).]

QT and other analytical models also influenced the introduction of the first ML models. Originally, the researchers referred to the former as “formula-based” and the latter as “history-based” [37]. The objective was for these models to offer better accuracy-cost trade-offs compared to the analytical models. However, ML models would not start to thrive until 2018, with the introduction of DL.
Newer architectures, such as Graph Neural Networks [24], were expressive enough to extract and process as much information as was available and produce more accurate results. Since their introduction, DL models have dominated, completely overtaking analytical models as they offered more accurate performance predictions at similar or lower computational costs.

Furthermore, beyond building ML or DL models directly, there is also an interest in integrating them into previously researched approaches. We refer to these as hybrid approaches; by their nature, they vary widely in how they achieve this integration. For example, ML and analytical models can be combined by using the output predictions of one approach as input to the other, obtaining an overall more accurate prediction than either would separately [38, 39]. Another approach is to augment network emulators, useful for network verification but not for performance prediction, with queuing models and Bayesian optimization to enable accurate performance prediction [40, 41]. One of the most promising directions involves enhancing DES by integrating fluid models and, more recently, DL. This approach aims to support parallelization and lower simulation costs by replacing selected components with alternative models, while preserving the high accuracy and expressiveness of traditional DES [42, 15]. Finally, recent work has explored using DL models to enhance simulation accuracy in situations where simulators are unable to predict network behavior faithfully [43, 44].

3. Network Simulation

This section focuses on network models that are implemented through simulation. These are summarized in Table 2.

3.1. Discrete Event Simulation

Discrete Event Simulation describes the scenario through a global state and a sequence of “discrete events” over time [45].
These events are defined as points in time at which the state of the simulation (i.e., the network and the traffic within it) changes. They include a packet being generated and introduced into a link, reaching the other side and being buffered, and then being queued and processed by the device. The simulator models the state of the network devices and the rules to process the different events by studying, and sometimes even re-implementing, their protocols. Protocols can be studied through their specification (e.g., RFC publications).

Model | Type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation
--- | --- | --- | --- | --- | --- | ---
REAL [46] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
ns [28], -2, -3 [29] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
OMNeT, OMNeT++ [36] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
OPNET [47] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
CaSiNo [45] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
WNS [48] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
IKR Simulation Library [49] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
Java modeling tools [50] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
gem5 [51] | DES | Full network scenarios | Packet-level | Any | Full packet path information | -
pd-gem5 [52] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Testbed data
SimBricks [53] | Modular DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data
SplitSim [54] | Modular DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data
DONS [18] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data
Unison [55] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data
NSX [56] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Only evaluates cost, not accuracy
Parsimon [57] | Link-level DES | Full network scenarios | Flow-level | TCP (focuses on DCTCP, but generalizable) | Tail FCT (90th percentile or higher) | Simulated data

Table 2: Summary of simulated network performance models

The earliest dedicated network DES simulator is REAL [46], originally built to evaluate the performance of queuing algorithms in gateways. It then acted as a basis for the more general-purpose “The Network Simulator”, or ns [28]. ns has been updated over the years, reaching v2.0 (known as ns-2) in 1997, and later v3 (ns-3) in 2010 [29]. Along with OMNeT (later to become OMNeT++) [36], they have been the most widely used network simulators in academia since their creation in the late 1990s.

Note that, during their development, many alternative DES software packages were released. These include OPNET [47], CaSiNo [45], OpenWNS [48], IKR Simulation Library [49], Java modeling tools [50], and gem5 [51]. Ultimately, ns and OMNeT remained the most popular choices due to two factors. First, they are open-source projects with continuous development over the years, allowing them to stay updated and support new technologies. Second, both have permissive licenses that allow them to be used freely for research purposes: ns uses the GNU GPL, and OMNeT uses the Academic Public License.

The main advantage of DES is its completeness: by fully simulating the individual interactions in the network, the simulator can return accurate and complete descriptions of the traffic behavior. It can also be used to understand cross-traffic interactions, the impact of the network topology and routing, and congestion control protocols.
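The event-driven loop described in this subsection can be sketched in a few lines. The example below is a deliberately minimal illustration, not the design of any simulator listed in Table 2: it assumes a single link with a FIFO queue and a constant per-packet service time, and all names (`simulate_fifo_link`, `arrivals`, `service_time`) are hypothetical. The global state is the waiting queue plus a busy flag, and events are kept in a time-ordered heap.

```python
import heapq
from collections import deque
from itertools import count


def simulate_fifo_link(arrivals, service_time):
    """Simulate one FIFO link; return {packet_id: departure_time}.

    arrivals: iterable of (arrival_time, packet_id) pairs.
    service_time: constant time needed to transmit one packet (assumed).
    """
    tie = count()  # tie-breaker so same-time events keep insertion order
    events = [(t, next(tie), "arrival", pid) for t, pid in arrivals]
    heapq.heapify(events)  # pending events, ordered by timestamp

    waiting = deque()  # global state: packets queued at the link
    busy = False       # global state: is the link currently transmitting?
    departures = {}

    while events:
        now, _, kind, pid = heapq.heappop(events)  # next event in time order
        if kind == "arrival":
            if busy:
                waiting.append(pid)
            else:
                busy = True
                heapq.heappush(
                    events, (now + service_time, next(tie), "departure", pid))
        else:  # departure: record it and start serving the next packet, if any
            departures[pid] = now
            if waiting:
                heapq.heappush(
                    events,
                    (now + service_time, next(tie), "departure",
                     waiting.popleft()))
            else:
                busy = False
    return departures


print(simulate_fifo_link([(0.0, "a"), (0.5, "b"), (3.0, "c")], service_time=1.0))
# {'a': 1.0, 'b': 2.0, 'c': 4.0}
```

Even in this toy setting every packet produces two events (an arrival and a departure), and each event is processed strictly after all earlier ones, which hints at why event counts drive simulation cost and why the loop is inherently sequential.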
Because of its accuracy and level of detail, DES is the preferred choice of network operators whenever applicable. Furthermore, as we explore other alternatives of network models, one quickly realizes how widely DES is used as a baseline (discussed later in Section 7.5). In addition, in the case of models that undergo some "training" process, DES is commonly used as a source of the training scenarios due to its ability to generate large quantities of scenarios, specifically those too hard to capture in real-life networks.

However, its completeness also results in its main drawback: computational cost. The cost of DES is proportional to the number of events in the network and, in turn, to the amount of traffic in the network, which becomes unmanageable in larger, high-capacity networks [58, 15, 16]. This is exacerbated by the fact that DES simulators, due to the sequential nature of the events, are usually implemented as single-threaded programs (we cover PDES and its challenges in the next subsection).

Furthermore, simulation software sometimes fails to accurately describe certain protocols or network devices. This may be because the implementation of certain devices is unknown or protected by intellectual property laws, so their behavior cannot be accurately replicated by simulators [58, 40]. Similarly, as we discuss later in Section 7, the evolution of internet traffic and of the algorithms used may result in different implementations of protocols such as TCP, and in the release of new versions that may not yet be supported by the simulation software [59, 60]. These factors result in gaps in the simulation software, which may even become outdated if not continuously maintained. Conversely, to aid future updates, simulators end up with a modular architecture, allowing new components to interact with existing ones and expanding the simulator's applicability over time.

3.2. Parallel Discrete Event Simulation

Because of the high computational cost, there has always been an incentive to develop Parallel Discrete Event Simulation (PDES) [61, 62]. This consists of dividing and processing the sequence of events in parallel while maintaining the global state between events. While this does not reduce the computational effort (if anything, the required synchronization mechanisms add to it), it does reduce simulation time by splitting the load between multiple processes.

Most PDES work by splitting the simulation into Logical Processes (LPs). That is, the simulation itself is split across these LPs, a process done manually, where each one runs as if it were its own simulation. Ideally, the LPs are selected to balance the number of events while minimizing inter-process communication, which is much slower than events handled fully locally. To maintain the global order of events, a synchronization algorithm is required. While many approaches exist, these can be categorized as either conservative or optimistic [63]. Conservative synchronization algorithms attempt to minimize causality errors, i.e., cases where the LPs process events out of order, necessitating a rollback. Optimistic synchronization algorithms, in contrast, try to maximize the degree of parallelization at a higher risk of such errors. Ultimately, many PDES simulators offer both options, as performance may vary depending on the scenario.

While the benefits of PDES are evident, its complexities have prevented it from completely replacing DES. The inter-process messaging and synchronization costs introduce significant overhead while reducing the effective parallelization. Correctly defining the LPs is cumbersome, making their effective use more difficult [18].
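As an illustration of the conservative family, the sketch below shows one common lookahead-based rule: an LP only processes events whose timestamps precede the earliest time at which any neighboring LP could still send it a new event. The names `safe_horizon` and `process_safe_events`, and the single shared `lookahead` value, are assumptions made for this example and are not taken from any of the simulators discussed.

```python
import heapq


def safe_horizon(neighbor_clocks, lookahead):
    """Earliest timestamp at which a neighbor could still affect this LP.

    neighbor_clocks: current simulation time of each neighboring LP.
    lookahead: assumed minimum delay (e.g., link propagation delay) before
               an event at a neighbor can produce an event at this LP.
    """
    return min(neighbor_clocks) + lookahead


def process_safe_events(pending, neighbor_clocks, lookahead):
    """Pop and return every local event that is provably safe to process.

    pending: heap of (timestamp, payload) tuples owned by this LP.
    Events at or beyond the horizon stay queued until neighbors advance.
    """
    horizon = safe_horizon(neighbor_clocks, lookahead)
    processed = []
    while pending and pending[0][0] < horizon:
        processed.append(heapq.heappop(pending))
    return processed, horizon


if __name__ == "__main__":
    queue = [(1.0, "a"), (2.5, "b"), (4.0, "c")]
    heapq.heapify(queue)
    done, horizon = process_safe_events(queue, neighbor_clocks=[2.0, 3.0], lookahead=1.0)
    print(done, horizon, queue)
```

The LP blocks on the event at time 4.0 until its slowest neighbor advances, which is one way to see why a small lookahead or an unbalanced partition limits the effective parallelization.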
Furthermore, most common networks are hard to partition to begin with: even if the topology is easily partitioned (e.g., a fat-tree topology split into its branches), there will still be large volumes of traffic that cross these boundaries. Overall, networks have proven hard to parallelize, with the gains obtained from parallelization outweighed by the overheads present in PDES. Still, many simulators, including ns-3 and OMNeT++, have added support for PDES. There have also been developments of PDES inspired by existing DES, like pd-gem5 [52], based on gem5 [51].

Recently, novel approaches have been developed to further reduce the overhead of PDES. SimBricks [53] introduces the concept of modular simulation: it integrates other simulators to model different aspects of the network, for example, OMNeT++ or ns-3 for the network itself and gem5 for the hosts. This both exploits the features offered by each integrated simulator and increases speed by running each simulator in parallel. However, the authors themselves identify that their simulator cannot accelerate individual components. This leads to the simulation being bottlenecked by the slowest components [54]. Furthermore, requiring interoperability between simulators results in some of their features being lost (e.g., gem5's atomic memory protocol) and limits the scenarios SimBricks can cover, such as only simulating single-core hosts.

Some of these issues are addressed in its expansion, SplitSim [54]. Extra features include support for mixed-fidelity simulation, which reduces the accuracy of certain components to lower costs; decomposition of the slowest components; and an improved synchronization algorithm. It also simplifies user configuration through an orchestration framework.

Another novel solution is the implementation of DONS [18].
Unlike other simulators, DONS follows a Data-Oriented design in its implementation, restructuring how memory is organized and accessed to improve parallelization. However, these benefits only apply to thread-based parallelization in multi-core processors, not to multiple processes across different machines. Hence, to facilitate distributed execution, it utilizes an automatic LP partitioning algorithm. DONS prioritizes correctness, both by mathematically proving the robustness of its Data-Oriented design and by employing a conservative synchronization algorithm. However, DONS's biggest limitation is its Data-Oriented design itself, which makes it incompatible with other PDES implementations. This requires all the simulation logic to be re-implemented to fit the new paradigm [55].

Conversely, Unison [55] follows a more standard approach. Its novelty arises from its automatic fine-grained LP partition, easing configuration and improving the efficacy of the selected LPs, and from dynamic scheduling to avoid bottlenecks. It can also reuse frameworks meant for other DES simulators like ns-3. While effective, it also comes with important limitations: its automatic LP partition does not support stateful links (e.g., wireless channels), and its load balancing assumes that all the processors it runs on are identical.

A similar, recent approach is that of NSX [56]. Unlike other PDES, NSX is meant to run on Graphics Processing Units (GPUs). The primary motivation behind this choice is to capitalize on the current popularity of GPU-heavy data centers designed for LLMs. NSX is designed for GPUs' extreme levels of parallelization: for example, it uses local event queues to maintain a "local" event order. The main drawback of this approach is that it presumes the availability of plentiful, powerful GPU hardware, which is monetarily expensive.
An alternative approach to accelerating DES, different from those presented earlier, is Parsimon [57]. Parsimon decreases computational cost while enabling efficient parallelization by decomposing a network topology into its links. That is, for each link in a flow's path, Parsimon studies the delay experienced on it through a set of independent simulations. Each link simulation predicts link behavior by approximating, in a smaller scenario, the load the link is expected to experience in the original topology. These link-level simulations are designed to be lightweight and, as they are independent, they can be executed in parallel. Parsimon also offers the option of using a greedy clustering algorithm to share simulation results between similar links, thus reducing the number of simulations to be executed, albeit at the cost of accuracy. However, the link simulations introduce error (usually overestimating delays) and are incompatible with packet-level visibility. Instead, Parsimon can only predict performance metrics that can be computed as the aggregation of the performance at each link, such as the FCT. Because it tends to overestimate, Parsimon is better suited for tail prediction, i.e., upper-bound approximations.

Overall, building an efficient PDES is non-trivial. Due to the sequential nature of DES, authors ultimately need to make some concessions to maximize the efficiency of the simulation's parallelization. Yet, when successfully applied, PDES achieves accurate and complete results with low inference times, assuming the users have the hardware to run it on.

3.3. Summary

Overall, simulation, and in particular DES, has been the most complete network model available. It remains the most accurate option, and continuous support by both the research community and industry allows DES simulators such as OMNeT++ [36] and ns-3 [29] to keep up with the advances in modern networks. Its biggest threat, however, is the computational cost.
Traditional, single-threaded DES cannot simulate large networks or modern data centers, and even when it can, it does so at a prohibitively high computational cost. While interest in PDES has existed ever since the inception of DES itself, its technical challenges were not addressed properly until fairly recently, with examples like SplitSim [54], DONS [18], and Unison [55]. Even then, it is important to note that PDES (except for Parsimon [57]) does not reduce the computational cost of simulation (if anything, it increases it due to the synchronization overhead), but rather offloads it to multiple machines. Hence, without the hardware available to run them, simulating large network scenarios is still out of reach.

4. Analytical Models

This section focuses on analytical network models, summarized in Table 3. These are generally based on some formal theory and are implemented through systems of equations.

4.1. Queuing Theory

Queuing Theory (QT) is a field of mathematics that studies the behavior of queue-like systems. In them, a system receives requests and services them according to a specified distribution (e.g., Poisson). If the system is occupied, requests are queued until they can be processed. The original model was proposed in the 1960s in [6], and was later refined for computer networks in [64]. Devices and protocols can be approximated through these models to extract relevant performance metrics such as the mean queuing delay.

Queue definitions vary in complexity [45]. Simpler queuing models can be solved instantly, but are limited to simple behaviors and therefore have more stringent assumptions. For example, the simplest queue is the M/M/1 queue, which assumes (1) a Poisson arrival process of packets, (2) exponentially distributed service times, and (3) a single First-In, First-Out (FIFO) queue with infinite buffer.
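Under these three assumptions, the steady-state metrics of the M/M/1 queue have well-known closed forms. A minimal sketch follows, where `arrival_rate` (lambda) and `service_rate` (mu) are in packets per second and the function name `mm1_metrics` is a hypothetical choice for this example; the formulas only hold when lambda is strictly below mu.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state metrics of an M/M/1 queue.

    arrival_rate: mean packet arrival rate, lambda (packets/s).
    service_rate: mean service rate, mu (packets/s).
    """
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("M/M/1 is only stable for 0 <= lambda < mu")

    rho = arrival_rate / service_rate  # utilization
    return {
        "utilization": rho,
        # Mean number of packets in the system (queue plus server).
        "mean_packets_in_system": rho / (1.0 - rho),
        # Mean time a packet waits in the queue before service starts.
        "mean_queuing_delay": rho / (service_rate - arrival_rate),
        # Mean total time in the system (waiting plus service).
        "mean_sojourn_time": 1.0 / (service_rate - arrival_rate),
    }


if __name__ == "__main__":
    print(mm1_metrics(arrival_rate=8.0, service_rate=10.0))
```

Evaluating these expressions takes constant time regardless of how much traffic is modeled, in contrast with simulating every packet.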
These properties make calculating factors like average packet queuing delay or service time trivial, but are also unrealistic. While the infinite buffer is the clearest example, another important assumption is that of a Poisson arrival process. In practice, internet traffic distributions have been shown to be self-correlated and to best fit heavy-tailed distributions like the Weibull [4] and log-normal [5] distributions.

Model | Type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation
[64] | M/M/1, M/G/∞, M/G/1 queues | Single-flow scenarios | Flow-level | Non-specific | Average packet delay, buffer overflow probability | Analytical evaluation
[65] | M/G/∞ queue | Traffic-matrix scenarios | Flow-level | Non-specific | Loss rate | Simulated data evaluation
[66] | GI/G/1 queue | Single-flow scenarios | Flow-level | Non-specific | Upper and lower bounds, average packet delay | Simulated data evaluation
[67] | System of M/M/∞ queues | Traffic-matrix scenarios | Flow-level | TCP (Tahoe) | Average packet [loss probability, RTT] | Simulated data evaluation
[68] | System of M/G/∞ queues | Traffic-matrix scenarios | Flow-level | TCP (Tahoe) | Average packet [loss probability, RTT] | Simulated data evaluation
[69] | Modified M/M/1 queues | Traffic-matrix scenarios | Flow-level | Non-specific | Flow throughput, average packet delay | Simulated data evaluation
[70] | ODE fluid queuing model | Single-flow scenarios | Flow-level (temporal) | TCP (Reno and Vegas) | Flow throughput | No baseline (see note 1)
AIMD [71] | ODE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Generic) | Flow throughput and mean RTT per session | Simulated data evaluation
[72] | ODE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Generic) | Average RTT and loss rate | Simulated data evaluation
[73] | SDE fluid queuing model | Full network scenarios | Flow-level (temporal) | TCP (Generic w/ RED) | Flow throughput, average RTT (but evaluation only evaluates queue length) | Simulated data evaluation
[74] | SDE fluid queuing model | Full network scenarios | Flow-level (temporal) | TCP (SACK, Reno, and NewReno) | Flow throughput, average RTT (but evaluation only evaluates queue length and window size) | Simulated data evaluation
multi-AIMD [75] | Fluid/discrete hybrid queuing model | Full network scenarios | Flow-level (temporal) | TCP and UDP | Flow throughput, average RTT | Simulated data evaluation
[76], [77] | Fluid/discrete hybrid queuing model | Full network scenarios | Flow-level (temporal) | TCP and UDP | Average flow throughput and RTT (only drop rate and window size retain a temporal component) | Simulated data evaluation
[78] | PDE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Generic) | Average flow throughput, FCT and loss rate | Simulated data evaluation
[79] | PDE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Scalable TCP) | Flow throughput | Simulated data evaluation
[80] | ODE fluid queuing model | Full network scenarios | Flow-level (temporal) | UDP (see note 2) and TCP (Reno) | Flow RTT, throughput and loss rate | Simulated data evaluation
[81] | Discrete and fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP | Flow packet delay (evaluation focuses on average queue lengths) | Simulated data evaluation
TFA [7, 8] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Analytical evaluation
D-BIND [82] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Real data evaluation
[83] | ADNC | Full network scenarios | Flow-level (temporal) | UDP and traffic with throughput guarantees | Worst-bound packet delay | Analytical evaluation
SFA [30] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Analytical evaluation
PMOO [31] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Simulated data evaluation
[84] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Analytical evaluation
TMA [35] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Simulated data evaluation
[85] | ODNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Analytical and simulated data evaluation
[86] | ODNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Simulated data evaluation
[87] | ODNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Simulated data evaluation
[88] | SNC | Traffic-matrix scenarios | Flow-level (temporal) | Non-specific | Stochastic worst-bound packet delay | Analytical evaluation
[89] | SNC | Traffic-matrix scenarios | Flow-level (temporal) | Non-specific | Stochastic worst-bound packet delay | Analytical evaluation
[90] | SNC | Traffic-matrix scenarios | Flow-level (temporal) | Non-specific | Stochastic worst-bound packet delay | Analytical evaluation
[91] | Analytical model from RFC definitions | Single-flow scenarios | Flow-level (temporal) | TCP (generic with phase effects and RED) | Flow throughput | Simulated data evaluation
[92] | Stochastic analytical model from observations | Single-flow scenarios | Flow-level (temporal) | TCP (Reno) | Flow packet rate and throughput | Real data evaluation
[93] | Utility maximization model | Traffic-matrix scenarios | Flow-level | TCP | Flow throughput | Analytical evaluation
[94] | Utility maximization model | Traffic-matrix scenarios | Flow-level | TCP | Flow throughput | Analytical evaluation

Note 1: Analytical comparison between TCP versions, no comparison with baseline.
Note 2: UDP is only supported as background traffic.

Table 3: Summary of analytical network performance models

Alternatively, relaxing these assumptions results in more complex models, like the G/G/1 queue, which assumes general distributions for both packet interarrival and service times, or the modified M/G/∞ queue in [65] for modeling video traffic. Because it is not bound by such assumptions, the resulting model is harder to solve, and may not even have an exact solution [95]. Usually, approximate solutions are extracted from solvers such as Buzen's convolution algorithm [96], mean value analysis [97], the Local Balance Algorithm for Normalizing Constants, and the Coalesce Computation of Normalizing Constants [98]. A recent proposal is that of [66], where the authors use nonstandard analysis to analyze GI/G/1 queues.

Hence, it is also common to build more complete models using an ensemble of simpler queue models. An example can be found in [67, 68], where a model of TCP Tahoe was built out of a graph of M/M/∞ queues (and later M/G/∞ queues, assuming a generalized service time distribution), each representing a state of the protocol that the packets traverse. Their model also covers link interfaces through an M/M/1/B queue, but only considers line topologies in its evaluation. Another example can be found in [69], where the authors propose a model capturing bursty and phase-type traffic in a multistage switch network.
It explains how these traffic profiles are aggregated and uses adapted M/M/1 queues to approximate the more complex arrival distributions.

QT models are further limited by only working through distributions. First, this only allows models to predict aggregate (e.g., average) performance measures, such as the average packet delay. Second, no matter how expressive the distribution is, it will always be less descriptive than working with individual traffic packets, as is done in DES. Third, they tend to model only traffic flows, independently of the underlying network hardware. The latter is abstracted into "servers" that delay packets by a given amount, according to a constant or exponential distribution, depending on the queue model.

We also note that early queuing models focused on highly specific scenarios, typically modeling single flows without accounting for cross-traffic from other TCP connections or background traffic. This limitation has since been addressed in the SotA [67, 68, 69], incrementally making them viable in a wider range of scenarios. Even so, queuing models for TCP implementations are susceptible to becoming outdated as networks evolve and the TCP flavor they support becomes deprecated. In spite of this, QT models remain a useful tool for network operators to quickly obtain a general and approximate description of the expected behavior.

4.2. Fluid Models

Fluid models (a.k.a. liquid models) are a subtype of QT models. Unlike standard (discrete) QT models, which account for data transmission in indivisible packets, fluid models treat data as a continuous, infinitely divisible flow. This allows for more expressive models to be developed based on systems of Ordinary, Stochastic, and, later on, Partial Differential Equations (ODE, SDE, and PDE, respectively).
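As a sketch of what such a formulation can look like, the code below integrates a textbook-style fluid approximation of an AIMD congestion window, dW/dt = 1/RTT - p * W^2 / (2 * RTT), using forward Euler. This is an assumption made purely for illustration and is not the model of any work cited here: the loss probability `loss_prob` (p) and the round-trip time `rtt` are held constant, and the function name `aimd_fluid_window` is hypothetical.

```python
def aimd_fluid_window(loss_prob, rtt, w0=1.0, dt=0.01, t_end=20.0):
    """Integrate a simple AIMD fluid model with forward Euler.

    loss_prob: assumed constant per-packet loss probability p.
    rtt: assumed constant round-trip time (seconds).
    w0: initial congestion window (packets).
    Returns a list of (time, window) samples.
    """
    steps = int(round(t_end / dt))
    w = w0
    trajectory = [(0.0, w)]
    for k in range(1, steps + 1):
        # Additive increase: one packet per RTT.
        increase = 1.0 / rtt
        # Multiplicative decrease: halve the window at a loss rate of p * W / RTT.
        decrease = loss_prob * w * w / (2.0 * rtt)
        w = max(w + dt * (increase - decrease), 0.0)
        trajectory.append((k * dt, w))
    return trajectory


if __name__ == "__main__":
    traj = aimd_fluid_window(loss_prob=0.02, rtt=0.1)
    t, w = traj[-1]
    print(f"t={t:.1f}s window={w:.3f} packets throughput={w / 0.1:.1f} packets/s")
```

Setting the derivative to zero gives a steady-state window of sqrt(2/p), while the intermediate samples describe how the window evolves over time before reaching it.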
Since plenty of ODE and PDE solvers exist, due to their prevalence in other fields, fluid models tend to be computationally inexpensive to solve. Furthermore, they are more expressive, as their description by sets of differential equations also allows authors to include a temporal component in their metric predictions, unlike discrete models that focus on aggregated (e.g., average) results. Being able to formulate the network state through a series of equations, akin to other fields like thermodynamics and electrical systems, is also appealing.

However, ODEs also introduce important limitations. By themselves, these systems introduce a degree of error through approximation: solvers based on numerical methods, for example, inevitably introduce some error. These inaccuracies are much more prevalent in computer networks due to their discrete nature: digital information is measured in bits and sent through indivisible packets, and decisions taken by transport protocol algorithms, like TCP, are made at the resolution of these individual packets. Hence, any formulation that deals with the network as a continuous system will be limited due to this fundamental difference.

Nonetheless, their advantages prevailed, and fluid models proliferated in the SotA between 2000 and 2010. For example, in [70], the authors implemented a fluid model of both TCP Reno and TCP Vegas to compare the two. The comparison, however, is limited to the flow control section of both flavors. In AIMD [71], the authors propose a generic TCP model, including a steady-state solution to extract the flow's throughput and mean stationary inter-congestion time. However, like other generic TCP models, it does not cover advances and differences between TCP implementations.
In [72], the authors propose a generic TCP model to predict performance metrics relevant to video transmission (packet RTTs and loss rates). The model assumes that packet loss can be described as a Poisson process.

Next, in [73], the authors define a system of SDEs, later converted to a system of ODEs, to model Random Early Detection (RED), an Active Queue Management (AQM) mechanism. The model can be expanded from modeling a single router to a network with multiple TCP flows, and scales to larger instances through flow aggregation. However, the traffic matrix it uses ignores the order of visited routers and assumes a generic version of TCP. Shortcomings of this model are later addressed in [74], which expands the RED model to include other types of AQM policies and specific versions of TCP (SACK, Reno, and NewReno), and expands the routing information it considers from the network topology. The model also introduces pruning of irrelevant elements (e.g., non-congested queues) to reduce the number of variables and improve computational performance.

There are also examples of models that attempt to combine both discrete and fluid queuing models. In multi-AIMD [75], the authors expand the AIMD model to support the presence of both TCP and UDP flows. It considers a fluid model that feeds into an M/M/1/B (discrete) queue model for the network devices according to the network topology. In [76, 77], the authors follow a different approach, where flows are described through a discrete state (i.e., empty queues, non-congested flow, and congested flow), each with its own system of SDEs. This allows modeling a more complex set of behaviors than a single SDE. The model supports different TCP flavors and UDP, albeit with the latter following a specific On/Off interarrival traffic distribution.
Importantly, the evaluation only covers predictions of the average flow's throughput and RTTs over the scenario, and the papers do not specify how to compute the time-aware values.

One of the first examples of a model that uses a system of PDEs can be found in [78]. The model exploits the increased expressiveness of fluid models when expressing flow behavior over time to better deal with "mice" flows. These are short flows whose completion time is more dependent on propagation delays than transmission delays. The model considers a generic TCP version. It still relies on assumptions (e.g., mice flow lengths are exponentially distributed) and does not address the underlying topology. In the evaluation, despite including the temporal component in its reasoning, the authors only consider average values for the throughput and FCT. Later, in [79], the authors define a model using PDEs to model Scalable TCP. While this model supports multiple flows, it assumes constant RTT and continues to disregard the network topology.

Later on, we have models that cover more complex network scenarios. For example, in [80] the authors develop an ODE model that considers queuing policies other than FIFO. These include fair queuing, longest queue first, and shortest queue first. The model also supports multiple TCP flows and background UDP traffic. The model can predict the TCP flow's throughput and buffer occupancy, and the UDP's loss rate. However, the model is adjusted for TCP Reno, specifically during the congestion avoidance phase. Another example is in [81], where the authors modify both discrete and fluid queuing models to capture transient states of TCP flows. While the obtained fluid model is less accurate, its lower computational cost allows it to be applied to larger networks, unlike the discrete model.

4.3. Network Calculus

Network Calculus (NC) is a set of techniques used to study network performance, introduced in [7, 8]. Unlike QT, which describes network traffic through traffic distributions, NC describes traffic through arrival and service curves. These are then used to generate worst-case bounds for the flow's performance metrics over time. In [7], the authors defined network elements, like delay lines, buffers, and regulators. Then, in [8], the authors show how these elements can be combined to define a network model. In brief, NC became popular as a middle point between QT and DES, offering inexpensive predictions, relying on fewer traffic assumptions, and integrating a temporal component.

NC models, however, are not flawless. While NC ensured a correct worst-case bound, originally these bounds were too permissive for the prediction to be useful. Conversely, tighter bounds may result in increased model complexity and computational cost. Due to this, SotA NC models attempt to push the Pareto front by pursuing a better balance between tight bounds and the cost of obtaining them [35]. Ultimately, NC models can be subdivided into three categories: Algebraic-based Discrete Network Calculus (ADNC), Optimization-based Discrete Network Calculus (ODNC), and Stochastic Network Calculus (SNC).

Finally, a significant limitation shared across all NC models is that they can only work in feedforward networks, i.e., where routing paths do not result in cycles. This aspect was studied in [99], where the authors tried to define a delay bound for generalized topologies, but only found it possible when link utilization is extremely low. Otherwise, the packet delay for a given flow may be influenced by flows that do not share a queue within their routing paths or that have completed before the given flow has even started. As the authors state, a delay bound may not even exist beyond a large enough link utilization.
This can be addressed by turning general network topologies into functionally feedforward ones. For example, networks can be shaped into spanning trees by removing links from consideration. Alternatively, [100] proposes modifying the flow's routing paths through turn-prohibition. While the latter is less drastic, both decrease flow throughput. Empirical evaluation suggests that this effect worsens in larger topologies.

4.3.1. Algebraic-based Discrete Network Calculus (ADNC)

ADNC is the approach followed by the original proposal [7, 8], where network elements are defined and can be chained. The delay bounds are solved algebraically through the use of (min, +) and (max, +) algebra [30]. Later on, the original proposal was referred to as Total Flow Analysis (TFA). One of the first improvements was using piecewise functions to replace the linear arrival and service curves, as done in [82], resulting in more expressive and tighter definitions. [83] also extends the original proposal by supporting flows with flow control, i.e., flows that reserve network resources for performance guarantees. Later, in [30], the authors proved that TFA's delay bound overestimates the impact of flow bursts at each node in a flow's path. This can be addressed by defining and exploiting the concatenation property (known as the Pay Bursts Only Once principle, or PBOO), resulting in Separate Flow Analysis (SFA).

The PBOO principle was extended in [31] to make NC applicable to flows multiplexed in the same node, and renamed Pay Multiplexing Only Once (PMOO). In theory, PMOO also allowed for considering arbitrary multiplexing of flows, unlike previous approaches, which assumed FIFO multiplexing. However, [85] shows that arbitrary multiplexing may result in looser bounds than those obtained with SFA.
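The effect of paying bursts only once can be checked on a small worked example. The sketch below assumes a token-bucket arrival curve, alpha(t) = b + r*t, and rate-latency service curves, beta(t) = R * max(t - T, 0), which are standard textbook choices rather than the exact setting of the cited works; the function names are hypothetical. Analyzing each node in turn pays a burst term at every hop, whereas concatenating the servers first pays it once.

```python
def per_node_delay_bound(burst, rate, servers):
    """Sum of per-hop delay bounds for a single flow.

    burst, rate: token-bucket parameters (b, r) of the flow at the first hop.
    servers: list of (R, T) rate-latency parameters, one per hop.
    At each hop the bound is T + b/R, and the burst seen by the next hop
    grows to b + r*T.
    """
    total = 0.0
    for service_rate, latency in servers:
        if rate > service_rate:
            raise ValueError("flow rate exceeds service rate; no finite bound")
        total += latency + burst / service_rate
        burst += rate * latency  # output burstiness after this hop
    return total


def concatenated_delay_bound(burst, rate, servers):
    """End-to-end delay bound after concatenating the rate-latency servers.

    The concatenation of rate-latency curves is again rate-latency, with the
    minimum rate and the sum of latencies, so the burst is paid only once.
    """
    min_rate = min(service_rate for service_rate, _ in servers)
    if rate > min_rate:
        raise ValueError("flow rate exceeds service rate; no finite bound")
    total_latency = sum(latency for _, latency in servers)
    return total_latency + burst / min_rate


if __name__ == "__main__":
    hops = [(5.0, 2.0), (5.0, 2.0)]
    print(per_node_delay_bound(10.0, 1.0, hops))      # node-by-node analysis
    print(concatenated_delay_bound(10.0, 1.0, hops))  # burst paid once
```

For two identical hops with R = 5 and T = 2, and a flow with b = 10 and r = 1, the node-by-node sum is 8.4 time units while the concatenated bound is 6, so the gap widens with every additional hop.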
Next, the authors of [84] propose extending the PMOO model using bounded arrival curves, making the model more efficient with the aim of unlocking the use of more accurate yet complex function definitions for the arrival and service curves. Finally, in [35] the authors compare both ADNC and ODNC alternatives in the SotA, and offer an improved version of ADNC, named Tandem Matching Analysis (TMA). It considers all possible (min, +) operations during the network decomposition to obtain the least pessimistic delay bound. While this approach results in a number of decompositions that is exponential in the network size, it is optimized to keep computational costs akin to similar alternatives, as shown by their evaluation.

4.3.2. Optimization-based Discrete Network Calculus (ODNC)

These models find the delay bound for a given flow in the network by solving a Linear Programming Problem (LPP). The LPP is formulated to identify the worst possible delay according to a set of temporal, spatial, and service constraints. An early example is introduced in [85] as an alternative to the PMOO model. While more accurate, especially when facing complex scenarios like non-FIFO flow multiplexing, this approach suffers from an extremely high computational cost. This is because solving the delay bound for a given flow is an NP-hard problem, as both the number of LPPs to solve and the constraints in each of them increase exponentially with the network size [86].

Hence, authors have looked at heuristics or scenarios where the computational cost is more bounded. In the same paper [86], the authors find that when considering tandem networks (graphs that can be described as a directed path with no shortcuts), the computational cost decreases from exponential to polynomial. They also prove a heuristic where a universal service curve can be found for all flows in the network, further reducing the computational cost.
Another heuristic is proposed in [87], where a combination of Monte Carlo and Direct Search is used to reduce the cost of solving the LPP. However, an analysis in [35] shows that these heuristics are insufficient to make solving the problems feasible for large networks.

4.3.3. Stochastic Network Calculus (SNC)

While traditional (deterministic) NC offers a strict delay bound, SNC instead defines delay bounds that may be violated with a small probability [101]. The earliest paper proposing this approach is [88], which uses bounds for the moment generating functions of random variables. While there have been several attempts to define a complete SNC [13], the authors of [89] are the first to aggregate formulations from previous research to build one that satisfies a set of listed properties (e.g., service guarantees, per-flow service). However, later work in [102] showed that such a model is not purely stochastic and struggles in aspects such as capturing statistical flow multiplexing. In [90], the author proposes expanding existing SNC by estimating the effective network bandwidth to obtain better delay bounds. Ultimately, the biggest drawback of SNC is its relatively recent development, making it less explored and developed than its deterministic counterparts.

4.4. Other Analytical Models

Analytical models that are not based on either QT or NC are generally rare, as this implies losing the accumulated knowledge from previous research. Nonetheless, sometimes models may be built on other principles, or the authors wish to avoid the weaknesses present in these fields.

A common source of alternative analytical models is building them from empirical observations or, in the case of well-defined protocols like TCP, from the RFC definitions. In [91], the authors implement an analytical model for TCP with forward acknowledgments using Rate-Halving.
This model was later evaluated considering phase effects and RED. However, the evaluated topologies were small, generally single-link topologies, and the Internet's topology was simplified to a single link with losses between the routers.

Next, in [92], the authors build a stochastic model for TCP Reno. It processes flows in rounds according to the congestion window. Unfortunately, it relies on several assumptions so that the model can be solved in closed form: losses in one round are independent of those in other rounds, and the RTT is assumed to be independent of the window size. The model also does not cover Reno's fast-recovery algorithm or the small differences across implementations.

Alternatively, there has been research applying game theory to model congestion control algorithms. This idea was introduced in [93], where the congestion algorithm is tasked with finding a Nash equilibrium in which performance across all present flows is maximized. The authors propose using the Nash arbitration scheme to predict the throughput allocation for each flow while proving the existence of an optimal allocation. Inspired by it, the authors of [94] propose two models. The first, an ideal model, relies on knowing the "utility" obtained by each flow at a given throughput, which is unknown beforehand. The second model simplifies the first by decomposing it into a solvable dual problem. A strength of this approach is that the models consider link capacities in their reasoning, which was rare for analytical models at the time. Conversely, this approach has two main drawbacks. First, in practice, congestion algorithms like TCP may not necessarily allocate throughput solely by maximizing bandwidth allocation. Second, the proposed models are generally too complex to be applied to real networks, while their simplified forms introduce inaccuracies.

4.5. Summary

In summary, analytical models aim to be a cost-effective alternative to DES. The first models are based on Queuing Theory, which uses queues or systems of queues to describe both network devices and protocols [6, 64]. However, simpler queue models lack accuracy due to oversimplistic assumptions, while more complex models are no longer cost-effective to solve. Fluid models are also based on QT, but allow flow payloads to be infinitely divisible. While this contradicts how traffic packets behave in reality, it allows models to be formulated as a system of differential equations, reinforcing their cost-effectiveness while incorporating the temporal component in their prediction [70].

In response to QT's limitations, Network Calculus was developed as an alternative for network modeling [7, 8]. Unlike queuing models, NC models include the temporal component and predict performance bounds rather than direct estimations. However, like QT models, NC models are also split between inaccurate predictions (i.e., loose bounds) and prohibitive computational costs. Finally, while there are analytical models that do not fall under any of these fields, their use tends to be more sporadic and more specialized (e.g., using Game Theory to understand bandwidth allocation in congestion-controlled traffic [93, 94]).

5. Machine Learning Models

This section focuses on network models that predict network performance behavior through the use of ML models. Discussed models are summarized in Table 4.

5.1. "Shallow" ML Models

The earliest attempt to use ML to predict network performance that we identified is [37]. In it, the authors considered both "formula-based" (analytical) and "history-based" (temporal regression) models when predicting TCP throughput.
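
A history-based predictor of the kind considered in [37] can be very small. The Python sketch below shows two such predictors, a moving average and an exponentially weighted moving average, applied to past throughput samples. The function names, the window length, the smoothing factor `alpha`, and the sample values are illustrative assumptions and are not taken from [37].

```python
def moving_average(history, window):
    """Predict the next value as the mean of the last `window` samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def ewma(history, alpha):
    """Predict the next value with an exponentially weighted moving average.

    alpha lies in (0, 1]; larger values weight recent samples more heavily.
    """
    estimate = history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


throughput_mbps = [10.0, 12.0, 8.0, 10.0]  # made-up past measurements
ma_prediction = moving_average(throughput_mbps, window=2)
ewma_prediction = ewma(throughput_mbps, alpha=0.5)
```

For these samples the moving average predicts 9.0 and the exponentially weighted variant predicts 9.75. Neither predictor knows anything about the network itself, which is what makes this family cheap but also limited to the flow being observed.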
The authors of [37] showed the limitations of analytical models when dealing with saturated traffic, while even simple regression models like moving average (MA) or exponentially weighted MA (EWMA) worked fairly well. Still, the purpose of these models was not to build a tool, but to evaluate the feasibility of ML models.

Model | Type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation
--- | --- | --- | --- | --- | --- | ---
[37] | MA, EWMA, Holt-Winters | Single-flow scenarios | Flow-level (temporal) | TCP (generic) | Flow throughput | Testbed data
PathPerf [103] | SVR | Single-flow scenarios | Flow-level (temporal) | TCP (generic) | Flow throughput | Testbed data and real data
WISE [104] | Causal graph | Single-flow scenarios | Flow-level | HTTP connections | Network response time | Real data
[105] | EVT | Multi-flow scenarios | Flow-level | UDP | 50th, 75th, 90th, 99th, 99.9th, 99.99th, 99.999th and 100th percentiles for packet delay and jitter | Testbed data
CLAAP [106] | RBFNN | Single-flow scenarios | Flow-level (temporal) | UDP | Flow latency | Real data
[107] | DNN | Multi-flow scenarios | Flow-level | UDP | Average packet delay | Simulated data
[108] | DNN, RF | Multi-flow scenarios | Flow-level | UDP and TCP | Average packet delay | Simulated data
RouteNet [109, 110, 111] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter | Simulated data
[112] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter | Simulated data
RouteNet-Erlang [113] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter, loss rate | Simulated data
RouteNet-Fermi [17] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter, loss rate | Simulated data and testbed data
RouteNet-Gauss [58] | Custom-designed MPNN with temporal component | Full network scenarios | Flow-level (temporal) | UDP | [Average, median, and 90th, 95th, and 99th percentile] packet [delay, jitter] | Testbed data
[114] | Custom-designed MPNN with graph attention | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data
[115] | Custom-designed MPNN with graph attention | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data
[116] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay | Testbed data
[117] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay | Testbed data
DeepComNet [118] | GGNN [23] | Full network scenarios | Flow-level | UDP and TCP | TCP flow throughput, UDP average packet delay | Simulated data
[119] | GGNN [23] | Full network scenarios | Flow-level | TCP (Reno, Cubic, Bic, Illinois, Veno, Vegas and Ledbat) | Flow RTT and throughput | Simulated data
[120] | GCN [22] | Full network scenarios | Flow-level | UDP and TCP | Average packet delay | Simulated data
EAGLE [121] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP and TCP | Average packet delay and loss rate | Simulated data
xNet [122] | Custom-designed MPNN with temporal component | Full network scenarios | Flow-level (temporal) | UDP and TCP (DTCP) | Average packet delay, flow throughput and FCT | Simulated data
GLANCE [123] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter, flow throughput, loss rate | Simulated data
FlowSeer [124] | Ensemble of GCN [22], DNN with attention and DNN | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data
m4 [125] | GraphSage [126] + GRU [127] | Full network scenarios | Flow-level (temporal) | TCP (DCTCP, TIMELY, DCQCN) | [Average, 90th percentile] flow FCT and throughput | Simulated data
xWeaver [128] | CNN | Full network scenarios | Flow-level | UDP | Flow FCT and throughput | Simulated data and testbed data
Deep-Q [129] | CVAE [130] + LSTM [25] | Multi-flow scenarios | Flow-level (temporal) | UDP | Packet delay and loss rates | Testbed data
[131] | Transformer [132] | Single-flow scenarios | Packet-level | UDP | Packet delay | Simulated data
DeepQueueNet [16] | Transformer [132] + BiLSTM | Full network scenarios | Packet-level | UDP | Full packet path information | Simulated data
[133, 134] | NP [27] | Multi-flow scenarios | Flow-level | Non-specific | Average packet delay, flow throughput and loss rates | Testbed data

Table 4: Summary of ML network performance models

Later, in PathPerf [103], the authors used a Support Vector Regression (SVR) model to predict TCP throughput. It was trained in a testbed using real network data and diverse routing paths, showing competitive performance relative to analytical models. SVRs, a popular architecture at the time, are able to learn from limited amounts of data, which is extremely useful considering the cost and difficulties of capturing real-world data for training. However, this architecture still had several limitations: the model could only consider one TCP flow at a time, could only predict average metrics, and had to be retrained for every change to the flow's path.

Another example of an early performance prediction tool is WISE [104]. WISE sits on the edge between performance prediction and network validation: it builds a causal graph from captured traffic traces that focus on HTTP connections and the identified variables, such as the number of packets transmitted, their size, and the estimated network bandwidth. A causal graph is a statistical model that defines how variables influence each other in an acyclic graph. WISE focused on predicting the HTTP connection's response time, both from the network and from browsers. The causal graph allows the user to query different values for the specified variables, hence allowing the evaluation of "what-if" scenarios to understand their impact on the response time.
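
The kind of what-if query that a causal graph enables can be illustrated with a toy model. The Python sketch below is not WISE itself, which derives its graph and the dependencies between variables from captured traces. Here the graph, the variable names, and the deterministic equations are all hand-written assumptions, chosen only to show how forcing one variable to a new value changes the predicted response time.

```python
# Structural equations listed in topological order.
# Every variable name and formula here is hypothetical.
EQUATIONS = [
    ("bytes", lambda v: v["packets"] * v["packet_size"]),
    ("transfer_time", lambda v: v["bytes"] / v["bandwidth"]),
    ("response_time", lambda v: v["rtt"] + v["transfer_time"]),
]


def what_if(observed, **interventions):
    """Recompute downstream variables after forcing some variables to new values."""
    values = {**observed, **interventions}
    for name, equation in EQUATIONS:
        if name not in interventions:  # a forced variable ignores its parents
            values[name] = equation(values)
    return values


observed = {"packets": 100, "packet_size": 1000, "bandwidth": 50000.0, "rtt": 0.5}
baseline = what_if(observed)
faster_link = what_if(observed, bandwidth=100000.0)
```

In this toy setting the baseline response time is 2.5 s, and doubling the bandwidth lowers it to 1.5 s. A learned causal model would replace the fixed equations with dependencies estimated from data.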
In its evaluation, which used data captured from Google's network, WISE proved accurate in predicting both the network and the browser response time. Unlike other ML models, it does not need training. Instead, it learns and builds the causal graph during inference, making the method versatile but slow.

Nowadays, "shallow" ML models have fallen out of favor against their DL counterparts, as the latter tend to offer more accurate results due to their more complete architectures. However, "shallow" ML models are sometimes desired due to their efficiency, as they can be trained with few samples and at a low computational cost. A recent example is [105], based on Extreme Value Theory (EVT), a statistical model designed to predict extreme events or worst-case bounds. In this paper, EVT is used to predict upper-bound approximations of metrics like flow latency and jitter. The EVT model is highly efficient: it only needed to be trained on 5% of the available data. It was also able to predict scenarios with different topologies and make predictions for the entire network (with multiple interacting flows) and not only for individual flows. However, the results show that its accuracy for network predictions is significantly lower. Finally, the EVT model also depends on assumptions like stationary flow latencies.

Another recent example is CLAAP [106]. CLAAP uses a Radial Basis Function Neural Network (RBFNN) to predict flow latency in online gaming sessions. It bases its predictions on previous latency measurements, akin to regression models such as MA and EWMA in [37]. While the RBFNN is a shallow neural network, it still outperforms other online regression models in its evaluation. Furthermore, it is designed for quick inference, as it is a model meant for online management of online gaming sessions.
Hence, response time is more critical than creating an accurate yet expensive internal representation of the flow state.

5.2. Deep Learning and Graph Neural Networks

The initial attempts to build DL network performance models were tested using Dense Neural Networks (DNNs). In [107], the authors evaluate the use of DNNs when predicting the average flow delay. While their results show the potential of DL methods by being computationally inexpensive and capable of modeling non-linear relationships, they noted their inability to generalize to unseen topologies and routings. Also, the only traffic profile of packet arrivals considered was an exponential distribution, a simpler distribution than those faced in real-life scenarios. Later, in [108], the authors compared a DNN model against a Random Forest (RF) model using a mixture of UDP and TCP traffic of varying complexity. They ultimately showed that the RF models generally outperformed the DNN models. Hence, researchers quickly moved away from DNNs and looked into alternative, better-suited DL architectures.

5.2.1. RouteNet and Successors

Among all the possible alternatives, the family of architectures that predominated was Graph Neural Networks (GNNs) [24]. GNNs are designed to take a graph as input and exploit the topological information present. Unlike previous ML and analytical models, which struggled to represent the impact of the network topology and the flows' routing paths in their reasoning, GNNs can properly exploit them, leading to their success.

This is exemplified by the RouteNet family of models. The original one was introduced in [109], and was later more thoroughly examined in [110, 111]. RouteNet's architecture is based on the Message Passing Neural Network (MPNN) architecture [26].
It w orks b y building a heterogeneous graph that captures the dep endencies b et w een links and ﬂo ws within a netw ork, based on its topology and routing. In principle, using this description allows RouteNet to accurately predict ﬂo w 38 dela y and jitter in top ologies and routing paths unseen during training while main taining a low computational cost (i.e., milliseconds per net w ork sce- nario). The RouteNet mo dels are trained and ev aluated on samples using DES. While eﬀectiv e, RouteNet do es hav e imp ortan t dra wbacks. First, further ev aluation has sho wn limitations in its generalization [135]. This includes larger top ologies and link capacities than those seen during training, but also to diﬀerent traﬃc proﬁles, pac ket sizes, and inter-pac ket gap (IPG) time distributions. F urthermore, RouteNet can only predict a v erage measures across the en tire scenario, and cannot mo del transitory b eha vior. Finally , it do es not supp ort key net work features, suc h as mo deling TCP traﬃc, Quality of Service (QoS) requiremen ts, or alternative routing paths. T o address these limitations and impro ve ov erall accuracy , RouteNet has since b een extended through successive iterations to cov er a broader range of scenarios. The ﬁrst c hange was in [112], in whic h additional features were added to add supp ort for v ariable queue sizes. The ﬁrst ma jor expansion, ho wev er, came with RouteNet-Erlang [113]. It includes queues in its graph represen tation of the net work alongside links and ﬂo ws, and its feature extrac- tion is mo diﬁed to b etter supp ort inferring o ver larger top ologies than seen during training. It also supp orts ﬂows parametrized through auto-correlated traﬃc distributions, based on observ ations done to in ternet traﬃc [4]. 
Ov er- all, this allows for a more complete deﬁnition of the netw ork and its traﬃc, b etter accuracy when dealing with unseen scenarios during training, and in tro duces supp ort for m ultiple queues p er p ort, diﬀeren t queuing p olicies, supp orting QoS requirements, and pack et loss predictions. The next large iteration is that of RouteNet-F ermi [17]. Rather than predicting the a verage ﬂow dela y directly , RouteNet-F ermi instead learns to predict the mean queu e o ccupancy at each step of the ﬂo w’s routing path. This change in the model’s reasoning results in more accurate predictions and b etter scaling when facing larger netw orks, in num ber of elements, traf- ﬁc intensit y , and link capacities. RouteNet-F ermi also expanded the n umber of supp orted queuing p olicies relative to RouteNet-Erlang. Finally , this it- eration of RouteNet was also ev aluated with real (testbed) net work data, pro ving its feasibility in more complex traﬃc scenarios. The most recen t iteration of RouteNet is RouteNet-Gauss [58]. It in- tro duces a temp oral comp onen t, splitting traﬃc scenarios in to ﬁxed-time windo ws, enabling it to mo del non-stationary traﬃc and b etter address the complexities present in real net work samples. The window size can b e ad- 39 justed to balance mo del expressivity to computational cost. It also remo ves the need to specify ﬂo w distribution parameters as input, allo wing it to accu- rately mo del non-parametric traﬃc. Still, RouteNet-Gauss do es share some limitations with the original version: sp eciﬁcally , a lac k of supp ort for con- gestion con trol algorithms and generalization to unseen netw ork proto cols and traﬃc distributions from those seen during training. Bey ond the main line of RouteNet architectures, there hav e b een improv e- men ts made b y authors other than the original ones. The most common is including an atten tion mec hanism at some p oin t in the message passing phase. 
In [114], an attention mechanism is added to the original RouteNet, while in [115] it is added to RouteNet-Fermi. Graph attention mechanisms improve the expressivity of the message passing phase by allowing nodes to weight the relative importance of their neighbors. This mechanism was introduced in the Graph Attention Network [136], which became the SotA general-purpose GNN architecture. We also note that, in [115], other improvements were introduced, such as using feature selection to refine the feature extraction process.

In [116], the authors also expand RouteNet-Fermi by increasing the features used to encode flow information. Specifically, they extract the IPG mean, variance, and several percentiles for each flow. They also propose using the packet loss rate as an input for predicting the flow delay, which, while effective, is impractical in real-life scenarios, as either the packet loss rate is not known during inference or, if it is, the average packet delay is already known as well. Finally, in [117], the authors modify RouteNet-Fermi, replacing the use of flow distribution parameters with raw packet traces and a wavelet decomposition process during flow encoding. This allows for cost-efficient, rich descriptions of non-parametric traffic distributions. The results also suggest that the model can perform inference on unseen traffic distributions at a minimal penalty to the prediction error. However, this method requires the packet traces of the flows, which are costly to obtain and not always available.

5.2.2. Other GNN Models

While RouteNet has inspired many of the GNN models, there are other alternatives present in the SotA.

Published around the same time as the original RouteNet paper, [118] also proposes a GNN-based network model.
Like RouteNet, the authors also proposed building a graph that captures the dependencies between links and flows. Unlike RouteNet, its architecture is based on Gated GNNs (GGNNs) [23] rather than MPNNs. In addition, it can model congestion-controlled traffic by representing both the original stream and the ACK messages as separate flows. However, while RouteNet's hypergraph is a heterogeneous graph, with the different network elements being processed differently, the hypergraph in [118] is a homogeneous graph, and its nodes are only differentiated through a one-hot feature. Furthermore, its simpler feature extraction results in lower accuracy and weaker generalization to unseen network protocols and traffic intensities.

This approach was later expanded in [119]. First, the hypergraph it builds also includes device ports (referred to as interfaces) and path nodes that encode the flows' routings. The encoding of the network elements is also extended to support multiple TCP versions. Overall, it shows lower error rates than its baseline analytical models, but performance varies depending on the TCP version and the routing path length. Furthermore, the TCP version is encoded through one-hot encoding, meaning that it can only be applied to versions covered during training.

Next, in [120], the authors propose using a Graph Convolutional Network (GCN) [22] architecture, rather than an MPNN, to predict traffic delays. Unlike previous networks, they apply the graph to the original network topology instead of building a hypergraph of relationships. Unfortunately, their methodology is not described, and we lack details such as the extracted features. However, we know they evaluated on synthetic 500-node networks and included TCP traffic flows.

In EAGLE [121], the authors propose a more complex architecture to predict packet delays and loss rates simultaneously.
Similar to RouteNet, EAGLE represents the network as a hypergraph that describes links, routers, and routing paths. Information between different elements is propagated through random walks across the hypergraph. According to the authors, this technique is parallelizable and effective at extracting neighborhood information, albeit it also introduces several hyperparameters to consider. Once the embeddings are obtained, a multi-head graph attention mechanism is used. While it compares well against their reference queuing model and the original RouteNet, it lacks comparison with other SotA models at the time.

Then, in xNet [122], the authors propose another MPNN-based architecture that represents the network through a hypergraph. It considers five types of network elements, more than any other approach: nodes, queues, links, paths, and flows. Paths and flows are later used in the readout to extract the performance metrics, such as path packet delay or FCT. A temporal domain is also introduced by splitting the network scenario into fixed-size windows, achieving this earlier than RouteNet-Gauss. Overall, their evaluation shows better results than models such as Deep-Q [129] and the original RouteNet, but lacks an evaluation of different traffic distributions or different-sized network topologies.

In GLANCE [123], a similar hypergraph definition of the network is proposed, considering path, node, and link embeddings, and an MPNN-like architecture. Unlike other MPNNs, GLANCE does not reuse parameters between iterations in the message passing phase. This is uncommon: while it increases the potential model expressivity, in practice it increases training costs and usually does not result in improved performance. Also, GLANCE defines an edge graph convolutional layer when updating the node embeddings, combining graph and spectral features.
While spectral features can capture expressive relationships, they also risk lower generalization to unseen topologies. Moreover, GLANCE is the first model to propose the use of transfer learning, specifically to retrain the model for new tasks quickly. It consists of fixing (a.k.a. freezing) the weights of the embedding and message passing phases, and retraining the readout function from scratch. In the evaluation, GLANCE does improve on RouteNet-Fermi, although the difference is smaller when generalizing to unseen topologies.

Unlike previous models, FlowSeer [124] processes topological and flow information separately. The topological information is sent through a GCN, while flow information is processed by a DNN with attention, trained through an encoder-decoder schema. Extracted features are then concatenated and fed to a DNN for the final prediction. While separating both types of information allows processing each with an appropriate architecture, the model cannot learn how they interact until the final DNN. In its evaluation, it does show similar or better prediction accuracy than RouteNet-Erlang, but it does not include comparisons of generalization to unseen network topologies.

Next, m4 [125] proposes the use of a "flow-level simulator" based on GNNs. Specifically, their solution includes a flow generation module that generates TCP flows according to a user-specified workload. The model then uses a temporal and a spatial component to update the state and predict the flows' performance metrics, specifically FCT and throughput. The temporal component is based on a GRU [127] model, and the spatial component on a GraphSage [126] model. The spatial component also uses a hypergraph definition, which considers flows and links only. Unlike RouteNet-Gauss or xNet, its temporal component is based on events (i.e., a flow starting or ending in the network) rather than fixed-size windows. This minimizes how many times the flow states need to be updated, although it makes it unsuitable for predicting metrics such as packet delay over time. m4 also proposes learning secondary performance metrics during training, such as the flow's remaining data to retransmit, to provide the model with further information and improve its overall accuracy, which is validated in their evaluation. Finally, m4 cannot model QoS queues.

5.2.3. Other DL Models

Due to their advantages, GNNs are the preferred architecture when building network models. However, authors have also experimented with other architectures and approaches.

An early example is xWeaver [128], where the authors propose working with a Convolutional Neural Network (CNN). Specifically, its model uses two separate CNNs to process the adjacency and traffic matrices, whose results are then concatenated and joined in a DNN to predict any relevant performance metric (in the evaluation, they focus on FCT and throughput). While using a CNN allows for a more effective architecture than a DNN, as corroborated in its evaluation, it still raises some questions. For example, it is not clear if the method is permutation invariant, that is, whether the nodes' order in the traffic matrix impacts the results. Also, similar to FlowSeer, processing the topological and traffic information separately makes it harder for the model to learn cross-interactions.

Another early approach is proposed in Deep-Q [129], consisting of training a Conditional Variational Auto-Encoder (CVAE) model [130]. Both the encoder and decoder segments of the CVAE, implemented through DNNs, use the extracted traffic features as bias. These features are extracted from the traffic matrices using a Long Short-Term Memory (LSTM) [25] network. A specialized loss definition is used to train both the LSTM and the CVAE.
At inference time, only the LSTM and the decoder segment are used to predict the desired performance metrics. While this approach is accurate and has low inference time, it ignores the underlying network topology.

Later, in [131], the authors propose the adoption of the transformer architecture [132], popular at the time due to its successful use in LLMs. They argue that a pre-trained network performance model can be built and later fine-tuned to specific networks. They also argue that transformers, as a powerful sequential model, can be used to generate and use packet embeddings to make packet-level predictions. While they examine some transformer-based network model prototypes, they lack comparisons against SotA models.

A more robust implementation of a network transformer model can be found in DeepQueueNet [16]. It follows a modular approach: using small, cheap-to-simulate scenarios, it builds a library of DL models, each representing a network device. These are implemented through a Bidirectional LSTM (BiLSTM) and a transformer model. They take as input the sequence of packet arrival times to the device and output the updated sequence after exiting the device. Ultimately, most, if not all, of the devices in the DES are replaced by their DL model counterparts. DeepQueueNet was directly inspired by MimicNet [15], a hybrid model that replaces aspects of DES simulation with DL, but, unlike it, it is not constrained to a Fat Tree topology.

However, DeepQueueNet's methodology does come with limitations. For example, as noted by its authors, it cannot model the transient state of networks and, because of the batch processing of packets, it cannot support stateful protocols like TCP.
Furthermore, because of the batching, packets can be processed out of order; the authors do propose a solution, but it requires repeated inferences by the DL models, increasing computational cost. In addition, both the BiLSTM and transformer architectures are notoriously computationally expensive. As a result, DeepQueueNet does not appear to reduce the amount of computational effort relative to DES. Instead, due to its implementation through libraries like TensorFlow, it is better suited to run in a distributed fashion and even on specialized hardware like GPUs. Still, less computationally expensive methods, such as MimicNet, perform inference faster on the same hardware, with the difference scaling with the size of the network topology.

Finally, in [133, 134], the authors propose a two-step process to build a network model by first training it using simulated network data and then efficiently adjusting it using a small dataset of real-world network data. The objective was to utilize transfer learning to allow the model to learn from simulated samples while being effective in real-world scenarios. The model itself was to follow the Neural Processes (NPs) [27] architecture, a type of DNN that focuses on learning the input's latent features and eases learning. However, it also shares limitations present in DNNs, like the lack of adaptability to topologies, routing configurations, and traffic profiles unseen during training.

5.3. Summary

While ML-based network models have existed since the early 2000s, it was not until the proliferation of deep neural network architectures, and specifically GNNs, that they became so dominant. The flexibility behind these architectures allows for specialized designs for network performance modeling, allowing them to be more accurate than analytical models while performing inference at a low cost.
This makes them especially appealing for active network management, where quick, accurate predictions are required (e.g., xWeaver [128], RouteNet [109], GLANCE [123]), overtaking analytical models (discussed in more detail in Section 7). Traditional ML models also remain a viable option, as they are characterized by even cheaper inferences (e.g., CLAAP [106]).

ML models are not flawless, however. Unlike simulation or analytical models, ML models are constrained to scenarios similar to those seen during training (e.g., network topologies, traffic profiles, network protocols). While this can be mitigated through clever design (e.g., model architecture, how input data is encoded), none of the covered approaches achieves the same degree of generalization as DES. Also, while faster than DES simulators, ML models usually cannot offer the same level of granularity in their predictions. Those that do, like DeepQueueNet [16], do so at the expense of their low inference cost.

6. Hybrid Approaches

In this section, we discuss network models that combine two or more of the previously discussed approaches, summarized in Table 5. The aim is to combine their advantages while minimizing their pitfalls. By nature, these models are the most heterogeneous in the ways they approach the modeling problem.

6.1. Model-tuned Emulation for Performance Modeling

Network emulators are powerful tools to validate and understand network dynamics. Unlike simulation, which focuses on defining the network state and understanding how it is updated, emulation replicates the behavior of the network and its devices through software. However, emulation also replicates through software certain behaviors that were originally executed by hardware logic. This distorts the time taken to perform each of these operations, presenting serious limitations when predicting the performance of the networks.
| Model | Main Type | Secondary Type | Input Scope | Output Scope | Traffic Type | Performance Metrics | Evaluation |
|---|---|---|---|---|---|---|---|
| Pantheon [40] | Network emulator (Mahimahi [137]) | Bayesian optimization | Full network scenarios | Packet-level | Any | Full packet path information | Real data |
| iBox [41] | Network emulator with a queue model | Bayesian optimization | Full network scenarios | Packet-level | Any | Full packet path information | Real data |
| Prophet [138] | NUM analytical model [94] | Gradient descent | Full network scenarios | Flow-level | TCP¹ | Flow throughput | Simulated data |
| DeepTMA [139, 140] | NC (TMA [35]) | GGNN [23] | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Simulated data |
| [39] | GraphSage [126] | NC (TFA [7, 8], SFA [30]) | Full network scenarios | Flow-level | UDP | Mean, min, max, 90th and 99th percentile packet delays | Testbed data |
| QT-RouteNet [38] | RouteNet [109] | M/M/1/B queue model | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data |
| GNNetSlice [141] | RGCN [32] | Unspecified QT model | Full network scenarios | Flow-level | UDP | Average packet delay, jitter, and loss | Simulated data |
| QINN [143] | DNN | M/G/1 queue model | Traffic-matrix scenarios | Flow-level | UDP and TCP (non-specific) | Avg. packet delay, throughput | Simulated data |
| [42] | ns-2 [28] | Flow queueing model [74] | Full network scenarios | Packet-level | UDP and TCP² | Full packet path information | Simulated data |
| MimicNet [15] | ns-3 [29] | LSTM [25] | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data |
| m3 [144] | flowSim [145] | Transformer (Llama 2 [146]) | Full network scenarios | Flow-level | UDP and TCP (DCTCP, TIMELY, DCQCN, HPCC) | 99th percentile FCT | Simulated + Real data |
| CausalSim [43] | Trace simulation | Causal DNN | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data |
| Sim2HW [44] | OMNeT++ [36] | GraphSage [126] | Full network scenarios | Flow-level | UDP | Min, 25th, 50th, 75th, 90th, 99th, 99.9th, 99.99th, 99.999th percentile packet delays | Testbed data |

¹ TCP flavors supported by Srikant's model [142].
² Secondary model accounts only for TCP.

Table 5: Summary of hybrid network performance models

The authors of Pantheon [40] identified this issue and proposed a model to fine-tune emulation software to track traffic performance accurately. It consists in fine-tuning emulation parameters, like propagation delay, using Bayesian optimization. Specifically, this is a process where the parameters were adjusted, evaluated using captured internet traces, and updated according to their error. This was done using the Mahimahi [137] emulation software. Their evaluation showed a tenfold decrease in prediction error. However, this approach remains limited, due to the small number of adjustable parameters, resulting in the error rates averaging at 17% for the tested traces, and the cost of emulating the different scenarios repeatedly.

A similar approach is that of iBox [41]. iBox models the network as a single bottleneck link with a FIFO, drop-tail queue. iBox considers two models, discriminating between "reactive" and "non-reactive" cross traffic.
They define non-reactive traffic as those flows whose performance is not influenced by changing the specific congestion control protocol, and reactive traffic otherwise. While the non-reactive cross traffic can be modeled directly from parameters extracted from the network traces, the reactive cross traffic parameters are learned through Bayesian optimization and real traffic traces, as in Pantheon. The queue model is executed within a network simulator or emulator to be evaluated; in their evaluation, the authors specifically used ns-2. iBox ultimately shares similar benefits and limitations with its predecessor: by being built with real traffic traces, it attempts to minimize the impact of training with simulated traces [147, 148, 59]. However, the combination of Bayesian optimization's high computational cost and the model's simplicity may limit its practicality in real-world deployments.

6.2. ML + Analytical Hybrid Models

Generally, both ML and analytical models offer similar strengths and weaknesses, i.e., quick inference times but lower accuracy than DES. While it may be counterintuitive to combine the two, successful applications can still be found in the SotA.

An early example is Prophet [138], which uses ML to solve the NUM model [94]. In the original NUM paper, the authors aimed to maximize TCP throughput, but their initial ideal model required prior knowledge of each flow's utility. Prophet instead approximates the expected utility using Srikant's unifying model [142]. Although this too relies on another unknown, the scaling factor, Prophet approximates it through sampling and gradient descent. Sampling is done efficiently by approximating the flow parameters and grouping flows to reduce the number of them to consider. However, these modifications limit the model's applicability. For example, the flow grouping is designed assuming a Clos topology.
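To make the NUM formulation more concrete, the following is a minimal sketch of a small NUM (network utility maximization) instance solved with a textbook dual gradient method. It is not Prophet's algorithm: it assumes logarithmic utilities (proportional fairness), a fixed step size that must be small relative to the link capacities, and a hypothetical input format in which each flow is a list of link indices. The function name `solve_num_log_utility` is ours.

```python
def solve_num_log_utility(routes, capacities, step=0.05, iterations=5000):
    """Maximize sum(log(x_i)) subject to per-link capacity constraints.

    routes[i] is the list of link indices used by flow i (hypothetical input
    format). Returns the list of flow rates. Textbook dual gradient method,
    used here for illustration only.
    """
    prices = [1.0] * len(capacities)  # one dual variable ("price") per link
    rates = [0.0] * len(routes)
    for _ in range(iterations):
        # Each flow picks the rate maximizing log(x) - x * path_price,
        # which for a log utility is x = 1 / path_price.
        for i, links in enumerate(routes):
            path_price = sum(prices[l] for l in links)
            rates[i] = 1.0 / path_price
        # Each link raises its price when overloaded and lowers it otherwise.
        for l, cap in enumerate(capacities):
            load = sum(rates[i] for i, links in enumerate(routes) if l in links)
            prices[l] = max(1e-9, prices[l] + step * (load - cap))
    return rates


if __name__ == "__main__":
    # Two unit-capacity links in a line: flow 0 crosses both links,
    # flows 1 and 2 each use a single link.
    print(solve_num_log_utility([[0, 1], [0], [1]], [1.0, 1.0]))
```

In this toy line network, the flow crossing both links converges to roughly one third of the link capacity and each single-link flow to roughly two thirds, which is the proportionally fair allocation.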
Another approach is to use ML to improve the reasoning of an analytical model. In DeepTMA [139], a GNN model is used to predict the best tandem decompositions to be used by their NC model. By acting as a heuristic, the GNN can improve the results of the NC model without incurring a significant penalty in inference time. Furthermore, the GNN model does not need to be perfectly accurate for the NC model to benefit from it. The authors expanded DeepTMA in [140] to allow the generation of decompositions and to perform feature analysis to understand which features of the GNN model are the most significant. While this approach reliably increases the NC model's accuracy, it ultimately cannot address the inherent limitations of NC models (e.g., assuming feed-forward networks).

In contrast, in [39], the authors invert the dynamic, using the upper bounds gathered from an NC model as an additional input to a GNN model tasked with predicting the performance metrics. Their analysis leveraged a GraphSage [126] model, a generic GNN architecture, but its findings may generalize to more specialized GNN architectures. They also confirmed, as expected, that tighter NC bounds increase the accuracy of the resulting GNN model.

A similar approach was followed by the authors of QT-RouteNet [38]. In this case, the authors use a QT model of the network (an M/M/1/B model) to extract flow and link features, which are later used as input by a RouteNet [109] model. Doing so allowed it to generalize to topologies much larger than those seen in training: it was trained on topologies of up to 50 nodes and evaluated on topologies ranging from 51 to 300 nodes. Similarly, GNNetSlice [141] also introduces network slicing information and approximate QT predictions, such as the expected maximum queuing delay and packet loss rate, as inputs to its model.
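As an illustration of the kind of queuing-theory features mentioned above, the sketch below computes the loss probability and mean delay of a single M/M/1/B queue. It assumes that B counts the total number of packets the system can hold (including the one in service) and that the arrival rate is positive. The function name and the returned feature set are ours and are not taken from QT-RouteNet or GNNetSlice.

```python
def mm1b_features(arrival_rate, service_rate, capacity):
    """Steady-state features of an M/M/1/B queue.

    capacity (B) is assumed to be the maximum number of packets in the
    system, including the one being served. Assumes arrival_rate > 0.
    """
    rho = arrival_rate / service_rate
    # Unnormalized stationary weights: P(n) is proportional to rho**n.
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]

    loss_prob = probs[capacity]  # arrivals that find the system full
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    accepted_rate = arrival_rate * (1.0 - loss_prob)
    mean_delay = mean_in_system / accepted_rate  # Little's law on accepted traffic
    return {
        "loss_prob": loss_prob,
        "mean_delay": mean_delay,
        "utilization": 1.0 - probs[0],
    }
```

Values like these are cheap to compute per link, which is what makes them attractive as extra inputs to a learned model.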
GNNetSlice's model is a Relational GCN (RGCN) [32] designed to measure the impact of network slicing on performance. In its evaluation, it outperformed RouteNet-Fermi [17] when predicting the average packet loss rate, delay, and jitter.

Later, in the Queue-Informed Neural Network (QINN) [143], the integration is expanded to include a queue model in the model's loss function. In summary, a DNN model predicts both the average queuing delay and the throughput; the latter is then used by an M/G/1 queue model to obtain a second queuing delay prediction. The loss function computes the loss over both predictions. While the authors used a DNN, this approach is compatible with other NN architectures.

Ultimately, all of these approaches are a reliable way of improving the model's accuracy, but they cannot resolve ML's inherent limitations.

6.3. Accelerated DES

An alternative hybrid approach for network modeling, and arguably one of the most popular nowadays in the SotA, consists of reducing the computational cost of DES by replacing some elements with a faster alternative. The main objective is to maintain DES's benefits (mainly its accuracy) while minimizing its computational cost.

The first attempt at this was made in [42], where the authors combined a DES simulator (ns-2) with the fluid model in [74]. The fluid model only covered TCP flows in the network's core. The paper introduces rules on how packets are updated when they cross the core, according to the values of the resolved fluid model, along with synchronization rules to prevent the states of the DES and the fluid model from diverging. Limitations include the fluid model's increased error rate and its restriction to TCP flows. Furthermore, the translation between fluid and simulated traffic can result in predictions that generally hold but do not offer sufficient granularity.
For example, the fluid model may accurately predict that a percentage of packets will be dropped at a given time, but it cannot predict exactly which packets are dropped; instead, these are selected at random.

It took nearly two decades for a new iteration of this idea to be proposed, in the form of MimicNet [15]. Rather than an analytical model, MimicNet uses an LSTM model to replace parts of the network topology in the DES. The approach exploits the symmetry present in fat tree topologies, commonly used within data centers: first, it simulates a two-branch fat tree topology to train its LSTM model. During inference, it only simulates a single branch, while the rest are replaced with LSTM replicas. The LSTM models predict whether packets in the replicated branches are dropped or forwarded according to their expected behavior. By leveraging the symmetry of the topology, MimicNet remains accurate and computationally efficient. However, this results in some rigid assumptions. First, the network is assumed to be a failure-free fat tree topology, with congestion only present in the fan-in towards the flow's destination. Second, traffic patterns are expected to "scale proportionally to the size of the network", as otherwise faithful models cannot be trained using the smaller, two-branch topology.

Inspired by Parsimon [57], the authors of m3 [144] propose splitting the network simulation into path-level simulations. Specifically, paths are simulated separately, considering in each of them the set of foreground and background flows. Foreground flows are simulated in parallel and quickly using flowSim [145], while the impact of background flows is approximated using a Llama 2 transformer model [146]. These results, as well as additional scenario context (e.g., congestion control algorithm), are concatenated and fed to a DNN to obtain the corrected approximations of the flow's FCT.
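The path-level split described above can be sketched as follows. This is only an illustration under our own assumptions: each flow is given as a tuple of link identifiers, a flow is treated as foreground for a path when its route is exactly that path, and as background when it follows a different route that shares at least one link with it. The actual definitions and data structures used by m3 may differ, and the function name is hypothetical.

```python
from collections import defaultdict


def split_by_path(flow_routes):
    """Group flows into per-path foreground and background sets.

    flow_routes maps a flow id to its route, given as a tuple of link ids
    (hypothetical input format).
    """
    # Flows with identical routes form the foreground set of that path.
    foreground = defaultdict(list)
    for flow, route in flow_routes.items():
        foreground[tuple(route)].append(flow)

    scenarios = {}
    for path, fg_flows in foreground.items():
        path_links = set(path)
        # Background flows take another route but share a link with the path.
        background = [
            flow for flow, route in flow_routes.items()
            if tuple(route) != path and path_links & set(route)
        ]
        scenarios[path] = {"foreground": fg_flows, "background": background}
    return scenarios
```

Each entry of the returned dictionary describes one independent scenario, which is what would allow the per-path work to be run in parallel.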
The main advantage of m3 relative to Parsimon is that path-level independence is a weaker assumption than link-level independence. m3's evaluation showed it to be more accurate than Parsimon and to have quicker inference times, despite including the relatively large Llama 2 model. However, unlike most simulators, it does not maintain packet-level visibility. Instead, like Parsimon, m3 is designed for aggregated tail performance metrics, such as worst-case TCP throughput. Finally, it supports congestion control by parametrizing it and adding it to the input features of the final DNN in its architecture. Consequently, it only supports those protocols seen during training.

6.4. Simulation with DL-Enhanced Accuracy

As we discussed, while network simulation is perceived as the most accurate approach for performance modeling, it is still subject to some inaccuracies [58, 40]. Consequently, we have seen some approaches that attempt to use DL to enhance the simulation's accuracy to better match reality. Unlike the models in Section 6.1, these are applied to simulators, not emulators, and they are complemented with DL models.

One example of this is CausalSim [43]. It uses trace simulation, a faster yet less accurate alternative to DES, and uses ML to improve its accuracy. Trace simulation is a variant of DES where only a subset of the system is simulated, while the other segments are replaced with traffic traces. However, it assumes that the traces' contents are independent of the rest of the simulation, which rarely holds. Consequently, CausalSim proposes the use of a causal DNN model to adapt the traces according to the system behavior, improving accuracy. However, trace simulation is meant to simulate the impact of specific small changes for "what-if" scenarios.
Larger changes result in fewer elements being replaced by traces, making it increasingly similar to standard DES.

Recently, in Sim2HW [44], the authors use an expanded GraphSage [126] model to correct the network performance predictions given by OMNeT++ [36]. To train the model, simulated network scenarios were replicated in a testbed to obtain their ground truth. While this approach may enhance the simulator's accuracy in replicating real-world traffic, it does not address DES's main issue: its high computational cost.

6.5. Summary

Ultimately, hybrid models are characterized by their diversity and practical nature. By combining existing, proven approaches, their authors obtain stronger network models. This includes expanding emulation with network models to support performance modeling (e.g., Pantheon [40] and iBox [41]), applying ML to accelerate DES (e.g., MimicNet [15] and m3 [144]) or correct its outputs (e.g., Sim2HW [44]), using ML-based heuristics to improve NC models (e.g., DeepTMA [139, 140]), and conversely using NC and QT to improve ML model training (e.g., QINN [143]) or to provide additional input information (e.g., [39], QT-RouteNet [38], GNNetSlice [141]).

The biggest benefit of these approaches is the ability to combine the strengths of their constituent approaches. For example, when using ML to accelerate DES, ideally, the resulting model can retain DES's accuracy and granularity while benefiting from lower computational costs. However, this comes at the risk of inheriting their weaknesses as well; for example, any hybrid approach with an ML model requires that model to be trained.

7. Discussion on Identified Trends and Challenges within Network Performance Modeling

In this section, we discuss the trends and challenges identified in current network performance models. This section is also meant to expand on earlier discussions [147, 148, 59].

7.1. Balance Between Accuracy, Resolution, Applicability, and Inference Cost

An ideal network performance model should be accurate, expressive (i.e., granular predictions, ideally packet-level), applicable in general scenarios, and computationally inexpensive. In practice, however, current network performance models cannot guarantee all of these properties.

| Family | Approach | Accuracy | Expressiveness | Applicability | Computational Cost |
|---|---|---|---|---|---|
| Simulation | DES | High | Packet-level | General | High (and sequential) |
| Simulation | PDES | High | Packet-level | General | High |
| QT | Discrete | Low | Flow-level | Specific TCP version | Low |
| QT | Fluid | Low | Flow-level (temporal) | Specific TCP version | Low |
| NC | ADNC | Medium | Flow-level (temporal) | Feed-forward networks | Medium |
| NC | ODNC | High | Flow-level (temporal) | Feed-forward networks | High |
| ML | Shallow ML | Medium | Flow-level (temporal and non-temporal) | On trained topologies and traffic profiles | Low |
| ML | DL | High | Flow-level (temporal and non-temporal) | On trained traffic profiles | Medium |
| Hybrid approaches | ML+Analytical | High | Flow-level | On trained traffic profiles | Medium |
| Hybrid approaches | Accelerated DES | High | Flow-level / Packet-level | On trained traffic profiles; topology support varies | High |

Table 6: Summary of current network performance modeling approaches.

This is reflected in Table 6, where the different approaches are summarized and qualitatively compared. On the one hand, DES tends to be the preferred option for network modeling, but its cost makes it infeasible in many scenarios. PDES addresses this not by reducing the cost, but by allowing the simulation to be spread across more cores. This does allow it to simulate larger scenarios, but it requires larger amounts of computing power, which remains impractical. As a result, research in DES and PDES focuses on reducing the cost while maintaining its other benefits. On the other hand, both analytical and ML models tend to be computationally inexpensive. Research on these models instead focuses on improving their accuracy, expressiveness, and the scenarios in which they are applicable while retaining the low computational cost.
Hybrid approaches, in turn, try to split the difference by synthesizing different approaches. For instance, accelerated DES methods seek to reduce computational cost while preserving the strengths of traditional DES. DL techniques have also proven flexible, as surveyed examples include quicker models with reasonable accuracy (e.g., RouteNet-Fermi [17]) and more complex, complete models akin to DES (e.g., DeepQueueNet [16]).

However, current approaches still fall short of achieving all four characteristics. For example, MimicNet [15] is only applicable to fat tree topologies, while m3 [144] loses packet-level visibility and can only predict high-percentile FCT. Furthermore, approaches such as ML+Analytical hybrid models tend to offer small yet significant improvements in accuracy but do not address other fundamental constraints present in either approach.

As a result, the optimal approach will depend on its expected application. If time and computational resources are not a constraint, DES remains the best option. However, if the network performance prediction is expected to be integrated into a more complex, time-critical application (or simply a quicker, approximate prediction is desired), an analytical or ML model is better suited. Alternatively, a model that is meant to be applied only to a specific network with a static topology may not need the same degree of applicability as a general-purpose approach like DES. Ultimately, as also argued in [148], researchers must consider their use case when deciding which measurements to use as input, which aspects of the network should be modeled, and which properties are most relevant.

The differences in approaches are also reflected in how the field tries to develop better network models.
While some researchers focus on reducing the costs of the heavier, more accurate models, others try to improve the accuracy of the more efficient ones. Furthermore, it is not unreasonable to believe that such an "ideal" model is impossible to build to begin with, as the different features can impede each other. For example:

• An expressive model should predict the behavior of individual packets, yet their number in modern networks is measured in the billions [58]. Hence, obtaining packet predictions, without aggregation or summarization, will be inherently expensive by sheer scale.

• Similarly, aggregation also implies losing information as the sequences of packets get shortened to a fixed set of values, resulting in a potential loss of accuracy.

• Analytical and statistical models rely on assumptions to be accurate. However, these may constrain the range of scenarios they can be applied to. Nonetheless, without such assumptions, the models fail to characterize network traffic and its behavior (no-free-lunch theorem [149]).

7.2. The Dominance of DES and the Surge of GNNs

Traditionally, DES has been regarded by network operators as the gold standard for network performance modeling. This is exemplified by the success of major DES simulators like ns and OMNeT++, as well as the fact that most non-DES network performance models are compared using or against simulated data. Referring back to Figure 1 in the introduction, simulation is the only type of model that has had constant attention over the last three decades. This can be explained by the fact that simulation is both one of the most accurate options and capable of offering packet-level predictions.

However, since 2018, we have seen a surge in DL network performance models. This is in part due to the success of GNN architectures.
Their design exploits the relational information in computer networks to their advantage, proving to be accurate and computationally inexpensive. While they do not offer packet-level granularity, they can be augmented to include a temporal resolution [58, 122, 125]. This has resulted in their dominance as the most used DL architecture when building network models: 17 out of the 29 (≈ 58%) pure ML models identified in the survey are based on GNNs. They are also quite relevant in hybrid approaches: 5 out of the 12 hybrid models identified included a GNN model.

Note that this prevalence is not universally shared across the entire community. Let us consider models published since 2018 in ACM SIGCOMM and IEEE INFOCOM, the two CORE A* conferences on computer networking. On the one hand, IEEE INFOCOM does reflect the prevalence of GNNs, with two out of the three models accounted for being GNN-based [17, 122] and the remaining one being a hybrid method involving ML [138]. On the other hand, ACM SIGCOMM instead prefers PDES and similar methods: out of the seven models published since 2018, three are PDES proposals [18, 53, 56], and another two are DL-accelerated DES hybrid approaches [15, 144]. Of the remaining two models, only one is a GNN [110]. The other, DeepQueueNet [16], is a transformer-based model whose design mimics DES reasoning, down to reasoning over individual packets. These discrepancies show that different voices in the community may prioritize different aspects of network performance modeling, and hence prefer approaches whose strengths align with them.

7.3. Reduced Interest in Analytical Models

While there has been a rise in DL-based models, specifically the GNN architectures, it has come at the cost of slowing the development of newer analytical models.
Out of the models we have surveyed, only one purely analytical model was published in the last 5 years [66], with the previous one being published in 2017 [35]. We have identified some potential reasons why.

First, the most straightforward explanation is that ML (and DL) models offer similar benefits while outperforming analytical models. At first, the main advantage that analytical models offered over DES was their reduced computational cost, in exchange for less expressive and accurate results. Nowadays, ML models also offer accurate and computationally inexpensive predictions, hence placing themselves in direct competition with analytical models.

Second, unlike ML, analytical models try to explicitly define complex network behavior through sets or systems of equations. ML models either treat networks as black boxes and try to predict their performance through regression (e.g., [37] and CLAAP [106]), or explicitly represent some existing dependencies within the network but still rely on the training process for them to be completely developed (e.g., RouteNet [111]). By contrast, analytical models must be completely formulated by their creators, which involves defining their assumptions and mathematically proving their validity or error bounds. In turn, this makes analytical models more rigid, less future-proof, and arguably harder to build overall.

An example of this is how different approaches support multiple versions of TCP simultaneously. This is an important aspect for network models, as TCP implementations evolve, and it is not uncommon for several versions to co-exist in the same network [147]. In analytical models, authors either assume a "generic" TCP version, which results in inaccuracies as it fails to capture the differences between implementations [78, 71, 72], or are forced to re-formulate segments of the model for each version they support [70, 74].
In contrast, DL models that support multiple TCP versions usually differentiate between versions through a one-hot encoded vector [119, 125]. This means that DL models can be expanded to fit more TCP versions easily, as long as the authors have the recorded scenarios to train them with.

There is also the fact that, as new DL architectures keep being developed, researchers will continue to explore their potential as network performance models, as happened recently with the transformer architecture [131, 16]. Altogether, this can explain why the research into new analytical models has slowed down, with the effort moving instead into the development of ML models.

7.4. Adapting to Changing Networks

One of the biggest challenges of network performance modeling identified early on was that of describing the internet as an "immense moving target" [147, 59]. These works identified the dynamic nature of networks, how they change over time in terms of traffic patterns, usage, and implemented protocols, and how these changes could challenge the network models of the time. This remains an ongoing challenge: since the 2000s, internet usage has continued to increase, data centers rely on newer TCP variants like DCTCP [34] and DCQCN [21], and wireless networks are more common and complex.

At the time, the solutions proposed to address this challenge were based on the modeling approaches prevalent then. Analytical models were seen as the most vulnerable, as they tend to rely more on network assumptions or, in the case of queuing and fluid models, were designed with a given TCP version in mind. One proposed solution was to search for invariant properties: aspects that remain constant over different network topologies, sizes, and usages [147, 59].
Examples include the self-correlation in packet inter-arrival distributions, or the heavy-tail distributions present in metrics such as packet delay, RTT, or FCT.

Another proposed solution was the push for modular, interoperable models [148]. The idea was to avoid building monolithic network models capable of individually addressing any scenario. Instead, network models would be built to be interoperable, that is, to be combined and fed to each other. This allows models to be gradually updated to newer developments, like newer TCP versions. This approach is better suited for DES simulators, as these can be expanded to cover new protocols and devices as they are released. Open-source simulators like ns [28, 29] and OMNeT++ [36] have remained updated thanks to the community. Furthermore, modular PDES like SimBricks [53] and SplitSim [54] rely on their modularity to cover network features as well as to parallelize. Finally, DL models offer new ways to address this issue, which we cover in Section 8.3.

7.5. Simulation-Dominated Evaluation

Another trend in network modeling is the prevalence of simulated data in the evaluation. This is reflected in Figure 3, which shows how models across the different approaches are evaluated. Overall, it shows that the use of simulated data dominates: it is used in over half of the surveyed models and is the most used across all the approaches except simulation itself; even among simulators, comparing against other simulators remains the preferred evaluation choice. The exception for simulation arises because most simulators surveyed are published through white papers, which describe the simulator's implementation but do not measure its accuracy. Otherwise, we see a minority of models being evaluated with testbed or captured data among ML and hybrid approaches, while another significant minority of analytical models are just evaluated analytically.

[Figure 3. Groups: Simulation, Analytical Models, ML Models, Hybrid Approaches, Total. Legend: Analytical, Simulated-data, Testbed-data, Real-data, No evaluation / Other.]

Figure 3: Evaluation categories across different model types. If a model is evaluated with multiple categories, they are categorized in the following priority: real data, testbed data, simulated data, and analytical.

Overall, there are strong reasons for this. First, captured data is scarce and not always available. In contrast, simulated data can be generated to cover any desired scenario, allowing for more diverse datasets. Second, captured data may also be limited by the features that the data's authors were able to capture, while in simulation, the entire state of the network and traffic is available. Furthermore, testbed networks have a monetary cost to set up and later upgrade (e.g., changing devices). Meanwhile, simulation software can be run on generic computing devices without additional hardware. However, simulation has several disadvantages, as discussed in Section 3. First, its high computational cost limits the size of the networks or traffic intensities used in the evaluation. Hence, whenever the model is applied to these challenging scenarios, it may perform worse than expected. There is also the issue that simulation may not cover specific protocols and network devices, or may not do so with the required accuracy [59, 60, 40, 58].

7.6. Heterogeneous Approaches, Heterogeneous Evaluations

Another challenge present in network performance modeling is a lack of common evaluation procedures. We believe that this is due to the heterogeneity of modeling approaches compounded by the scarcity of public datasets.

The differences in network performance modeling approaches allow for the creation of models with different strengths and goals. Such versatility permits network operators to choose the modeling approach that best fits their needs. First, models may differ in the performance metric they measure, which depends on the traffic type they are modeling (e.g., traffic from a specific protocol like TCP). This also influences what data they take as input, as some use solely traffic traces, while others can include the entire network in their reasoning. Moreover, some models are only applicable under certain constraints (e.g., NC is only applicable to feed-forward networks, and MimicNet [15] only to fat tree topologies), which also limits the common scenarios where they can be compared to other models.

Even when considering the same metric, the granularity of their outputs may also condition how models are evaluated. For example, models that predict at the packet level, or even at the flow level with a temporal component, must see their predictions aggregated to be compared against models offering flow-level predictions. Furthermore, even comparing packet-level outputs against each other may be complicated. For example, one model may predict a given delay for a packet that, according to the ground truth, was lost. This, in turn, makes us rely on error metrics like the Wasserstein metric to compare the distribution of the predictions. While useful, the results of such metrics are not intuitive to interpret.
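To illustrate the difference in interpretability, the sketch below computes both kinds of metric in plain Python. It assumes one-dimensional delay samples of equal size, for which the 1-Wasserstein distance reduces to the mean absolute difference between the sorted samples. The function names and the delay values are illustrative only and do not come from any surveyed model.

```python
def wasserstein_1d(pred, true):
    """1-Wasserstein distance between two equally sized empirical samples."""
    if len(pred) != len(true):
        raise ValueError("this sketch only handles samples of equal size")
    return sum(abs(p - t) for p, t in zip(sorted(pred), sorted(true))) / len(pred)


def mape(pred, true):
    """Mean Absolute Percentage Error between paired predictions and labels."""
    return 100.0 * sum(abs(p - t) / abs(t) for p, t in zip(pred, true)) / len(true)


if __name__ == "__main__":
    true_delays = [1.0, 2.0, 3.0, 4.0]  # per-packet delays in ms (made-up values)
    pred_delays = [4.4, 3.3, 2.2, 1.1]  # similar distribution, wrong packets
    print(wasserstein_1d(pred_delays, true_delays))  # compares distributions only
    print(mape(pred_delays, true_delays))            # compares packet by packet
```

The Wasserstein value is expressed in the units of the metric (here, milliseconds) and ignores which packet each delay belongs to, whereas the percentage error is relative and computed per packet. In this toy case the two metrics disagree sharply: the distributions are close, but the per-packet predictions are poor.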
Unlike metrics such as the Mean Absolute Percentage Error, which can be easily understood, there is no clear way of interpreting whether a given Wasserstein metric value can be regarded as a "good" or "bad" result. At most, it can be used to compare models, knowing that lower values mean predictions closer to the ground truth.

Note that these difficulties also apply to models that offer flow-level predictions with a temporal component. This may be exacerbated by the fact that these models may consider their temporal component under different scales (e.g., predictions every second versus every millisecond). Even worse, models may define their temporal components differently. For example, while RouteNet-Gauss [58] considers fixed-length windows, the temporal component in m4 [125] is event-driven.

Finally, all of these difficulties are worsened by the lack of publicly available network data. This is the case for several reasons. In the case of real-world captured data, only a few groups and companies have access to it and the capability to capture it. Furthermore, publishing real data from real users may pose privacy risks and requires anonymization procedures before it can safely be made public. Alternatively, testbed networks are rare due to their costs; hence, by extension, published datasets will also be rare.

8. Future Directions and Opportunities

In this section, we follow up on the previous discussion with our predictions on future research directions for network performance modeling. These are based on our expected evolution of current trends and how current challenges may be addressed.

8.1. Consolidation of PDES and GNN Models

First, from the conclusions of Sections 7.2 and 7.5, we can derive that DES is one of the most desirable modeling approaches, if not the most desirable, available to network operators nowadays.
Unsurprisingly, there has lately been a push by researchers to address the challenge of PDES design. Hence, we believe that PDES research will continue to consolidate its position as the new "baseline" method, replacing traditional (single-process) DES simulators. This means reducing the synchronization overhead while retaining correctness. We believe that the increased availability of computing power [3] means that computational efficiency will not be as much of a priority as the ability of the PDES to exploit the resources available. The most recent PDES surveyed, NSX [56], exemplifies this, proposing a highly distributed simulation that can run on the same specialized hardware used to train AI models.

By analyzing the SotA, we can also conclude that DL models, and specifically GNNs, are well-positioned to become the main alternative to simulation. They can be applied to scenarios where PDES cannot, such as those with restrictions on the computing resources available. Then, for the remaining scenarios, DL models are orders of magnitude faster than PDES while also benefiting from parallelization and specialized hardware thanks to libraries like TensorFlow and PyTorch. Among DL architectures, specialized GNNs for networking have proved to be the most effective, being cost-effective, accurate, and more robust than the alternatives.

However, there is still progress to be made for GNNs. For example, support for congestion control algorithms [119, 118] or the temporal component [58] is still fairly recent. Also, current models can only faithfully predict traffic patterns seen during their training. Ultimately, addressing these issues will increase the number of scenarios GNNs can be applied to and establish them as a reliable alternative to DES.

8.2.
Analytical Models Enhancing DL Models

In Section 7.3, we discussed how research into analytical models is receiving less attention, as ML and DL network models are proving to be more cost-effective. However, we have seen how analytical models are still being used in conjunction with other techniques. An example of this is how both QT and NC models have been used as heuristic inputs fed to a DL model to improve its accuracy [39, 38]. Recently, studies have also shown that they can be used to define more informed loss functions, further improving the training process [143].

Ultimately, analytical models still offer inexpensive, good approximations of the expected performance that more complex network models can later refine. The proposed approaches also benefit from the fact that they can be applied independently of the underlying ML architecture, easing their implementation. Further research in this area can lead to improved ML models that retain their cost-effectiveness. This includes studying other ways in which the analytical model can be integrated into the DL architecture beyond the input features and loss functions, as well as how to incorporate more complex analytical models without incurring excessive computational costs.

8.3. ML as a New Tool for Evolving Networks

Section 7.4 discussed the challenge of the ever-changing nature of networks, and how to address models becoming outdated because of it. Past proposals include models focusing on true invariant properties of networks or designing systems of interoperable models.

In addition to these proposals, we believe ML, and specifically DL, has unlocked a new way to handle changing networks. Unlike analytical models or simulations, an ML model does not explicitly encode the network dynamics. Instead, it learns these dynamics from its training data.
Hence, assuming an ML architecture appropriate for learning such network dynamics, such as a GNN, it is reasonable to believe that the architecture itself does not need to be modified to adapt to changes in the network. Instead, the ML model would have to be re-trained, a process that, while requiring some computational effort, no longer requires the expert knowledge needed to adapt an analytical model, implement a simulator, or design an ML architecture.

We refer to this new approach as design once, train as necessary, and it has already been proposed in papers such as [131]. Besides adapting to evolving network conditions, it would also allow for designing a general architecture valid for many specific use cases, depending on how it is trained. Furthermore, re-training may be less costly than training from scratch by exploiting transfer learning, as done in existing works [123, 133, 134]. At its best, this approach offers the advantages of a universal design for building network performance models, while making their implementation specific to each use case and adapted to its necessities.

Nonetheless, this approach still has some drawbacks that future research must address. First, with the current formulation, the model would still require new traffic measurements for its readjustment. While transfer learning would reduce the amount of data needed, it can still be a costly process. Also, until the model is adjusted, it cannot be expected to work accurately. Hence, DL models should still be designed to be as generalizable as possible, to reduce the amount of retraining to be done.

Another risk is the gradual degradation of the model's accuracy introduced by minor changes, rather than a single significant change.
The issue is that gradual degradation may be harder to identify, and hence operators may be working with an inaccurate model without realizing it. This may be mitigated with continuous model evaluation, but doing so requires periodic network measurements.

8.4. Data Center-Centric Designs

The rise in data center demand, and its focus on ML-related workloads, is an opportunity to develop more specific, data center-centric network models. Specifically, data centers already share common, well-researched topologies (e.g., Fat Trees [150]), and it is currently expected that future demand will be mainly due to the training and usage of Large Language Models or similarly large DL models [3]. Altogether, most wired network scenarios in the future will likely represent networks with similar topologies, hardware, and even traffic patterns, as they will be dedicated to the same ML-related use cases.

Consequently, this homogenization introduces common properties across data center networks. These can be leveraged, in the same way as other invariant properties, to simplify model design without compromising model accuracy. While such models are highly specialized, given the rising demand and importance of data centers, it is a sensible trade-off.

Among the surveyed models, we have already found models that do benefit from the increased demand in data centers. MimicNet [15] is a powerful network model, being accurate, cost-effective, and granular, but only applicable to Fat Tree topologies. NSX [56] is a network simulator purposefully built to take advantage of modern data centers, that is, designed to run concurrently on multiple GPUs.

8.5. Better Usage of Simulation Data for Training Real-World Models

Back in Section 7.5, we discussed the prevalence of DES in the evaluation of network models and the issues arising from doing so.
It is worth noting that evaluating models (and training them, in the case of ML-based ones) on simulated data may not lead to accurate results when addressing real-world traffic. Hence, future work must find ways to reduce this discrepancy. Currently, one way this may be addressed is by improving DES itself. For example, more efficient PDES simulators can be used to simulate scenarios too large for traditional DES to handle. There is also work like [44], where the DES's output is corrected using a surrogate ML model.

Another popular approach, specifically when building ML models, is framing the discrepancies between simulation and reality as a transfer learning problem. Transfer learning is a family of techniques meant to exploit models trained for a given task to assist in the training of models for a second, related task. In this case, transfer learning would consist of using ML models trained with simulated data to assist models to be applied to real-world traffic. This allows the latter to require fewer samples to build and to benefit from additional accuracy. Among the surveyed works, [133, 134] use simulated samples effectively through transfer learning, but rely on a specific architecture that benefits from it. In the recently published [151], transfer learning was successfully applied to a RouteNet-Fermi [17] model.

9. Conclusions

In conclusion, this survey analyzes the evolution of network performance modeling over the last decades. We have identified 95 unique network performance models spread across multiple conferences and journals. By identifying the taxonomy of modeling approaches, we can gain a deeper understanding of the evolution of priorities within the research and professional community. For example, we observed an evolution in preferred methodologies, as models have transitioned from analytical models to ML and hybrid approaches.
From the surveyed models, we have recognized the properties sought after by network operators: accuracy, expressiveness (the level of detail in the results), applicability in any plausible scenario, and low computational cost. While different approaches focus on one or several of these properties, no surveyed network performance model achieves all of them simultaneously. Admittedly, it may be impossible to reach all of these properties simultaneously, as they impede each other. For example, accurate and expressive models tend to be more complex, and hence more costly.

Consequently, this leads to the heterogeneity and diversity of the available approaches in the SotA of network performance modeling. On the one hand, such heterogeneity allows the design of specialized models, letting researchers optimize the properties that may be most relevant to the scenario at hand. On the other hand, it also makes the comparison between approaches harder, which is then compounded by the limited availability of public networking datasets.

We have also discussed other open problems in network performance modeling. First, we have the issue of ever-evolving networks, a problem posed over 20 years ago, which still challenges the ability to build models applicable to future networks. Second, the majority of the identified network performance models rely on simulated data for their evaluation, which may compromise their expected effectiveness when applied in real-world scenarios.

Finally, we have identified potentially fruitful research directions that are just starting to be explored. ML-based models and transfer learning are a promising approach to address changing networks. Advances in PDES may result in feasible simulations of large network topologies.
Furthermore, we expect that other trends, like the increased demand for data centers, will shape future research.

Acknowledgments

This publication is part of the I+D+i project titled BLOSSOMS, grant PID2024-158530OB-I00, funded by MICIU/AEI/10.13039/501100011033/ and by ERDF/EU. This work is also partially funded by the Catalan Institution for Research and Advanced Studies (ICREA). Carlos Güemes is funded by the AGAUR-FI ajuts (Grant Ref. 2023 F-1 00083) Joan Oró of the Secretariat of Universities and Research of the Department of Research and Universities of the Generalitat of Catalonia and the European Social Plus Fund.

CRediT authorship contribution statement

Carlos Güemes-Palau: Conceptualization, Investigation, Visualization, Writing - Original Draft, Writing - Review and Editing.
Miquel Ferriol-Galmés: Writing - Original Draft, Writing - Review and Editing.
Jordi Paillisse-Vilanova: Writing - Original Draft, Writing - Review and Editing.
Pere Barlet-Ros: Writing - Review and Editing, Supervision.
Albert Cabellos-Aparicio: Supervision, Writing - Review and Editing, Funding acquisition.

Declaration of competing interest

Carlos Güemes-Palau reports financial support was provided by Spain Ministry of Science and Innovation. Miquel Ferriol-Galmés reports financial support was provided by Spain Ministry of Science and Innovation. Jordi Paillisse-Vilanova reports financial support was provided by Spain Ministry of Science and Innovation. Pere Barlet-Ros reports financial support was provided by Spain Ministry of Science and Innovation. Albert Cabellos-Aparicio reports financial support was provided by Spain Ministry of Science and Innovation. Carlos Güemes-Palau reports financial support was provided by Generalitat de Catalunya Ministry of Research and Universities. Carlos Güemes-Palau reports financial support was provided by European Social Plus Fund.
Pere Barlet-Ros reports financial support was provided by Catalan Institution for Research and Advanced Studies. Albert Cabellos-Aparicio reports financial support was provided by Catalan Institution for Research and Advanced Studies. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

[1] J. Pedro, J. Santos, J. Pires, Performance evaluation of integrated otn/dwdm networks with single-stage multiplexing of optical channel data units, in: 2011 13th International Conference on Transparent Optical Networks, 2011, pp. 1–4. doi:10.1109/ICTON.2011.5970940.
[2] Abilene network [archived in wayback machine]. URL https://web.archive.org/web/20120324103518/http://www.internet2.edu/pubs/200502-IS-AN.pdf
[3] IEA, Energy and AI (Jul 2025). URL https://www.iea.org/reports/energy-and-ai
[4] J. Popoola, R. A. Ipinyomi, Empirical Performance of Weibull Self-Similar Tele-traffic Model, International Journal of Engineering and Applied Sciences 4 (8) (3 2017).
[5] M. Alasmar, R. Clegg, N. Zakhleniuk, G. Parisis, Internet Traffic Volumes are Not Gaussian - They are Log-Normal: An 18-Year Longitudinal Study With Implications for Modelling and Prediction, IEEE/ACM Transactions on Networking 29 (3) (2021) 1266–1279. doi:10.1109/TNET.2021.3059542. URL https://ieeexplore.ieee.org/document/9361437/
[6] J. R. Jackson, Jobshop-like Queueing Systems, Management Science 10 (1) (1963) 131–142. URL http://www.jstor.org/stable/2627213
[7] R. Cruz, A calculus for network delay. I. Network elements in isolation, IEEE Transactions on Information Theory 37 (1) (1991) 114–131. doi:10.1109/18.61109. URL http://ieeexplore.ieee.org/document/61109/
[8] R. Cruz, A calculus for network delay. II.
Network analysis, IEEE Transactions on Information Theory 37 (1) (1991) 132–141. doi:10.1109/18.61110. URL http://ieeexplore.ieee.org/document/61110/
[9] Q. Mao, F. Hu, Q. Hao, Deep learning for intelligent wireless networks: A comprehensive survey, IEEE Communications Surveys & Tutorials 20 (4) (2018) 2595–2621. doi:10.1109/COMST.2018.2846401.
[10] Y. Shi, L. Lian, Y. Shi, Z. Wang, Y. Zhou, L. Fu, L. Bai, J. Zhang, W. Zhang, Machine learning for large-scale optimization in 6g wireless networks, IEEE Communications Surveys & Tutorials 25 (4) (2023) 2088–2132. doi:10.1109/COMST.2023.3300664.
[11] R. Verdecchia, L. Scommegna, B. Picano, M. Becattini, E. Vicario, Network Digital Twins: A Systematic Review, IEEE Access 12 (2024) 145400–145416. doi:10.1109/ACCESS.2024.3453034.
[12] M. Fidler, Survey of deterministic and stochastic service curve models in the network calculus, IEEE Communications Surveys & Tutorials 12 (1) (2010) 59–86. doi:10.1109/SURV.2010.020110.00019.
[13] Y. Jiang, Y. Liu, Stochastic Network Calculus, 1st Edition, Springer Publishing Company, Incorporated, 2008.
[14] W. Jiang, Graph-based deep learning for communication networks: A survey, Computer Communications 185 (2022) 40–54. doi:10.1016/j.comcom.2021.12.015. URL https://www.sciencedirect.com/science/article/pii/S0140366421004874
[15] Q. Zhang, K. K. W. Ng, C. Kazer, S. Yan, J. Sedoc, V. Liu, MimicNet: fast performance estimates for data center networks with machine learning, in: Proceedings of the 2021 ACM SIGCOMM 2021 Conference, SIGCOMM '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 287–304. doi:10.1145/3452296.3472926. URL https://doi.org/10.1145/3452296.3472926
[16] Q. Yang, X. Peng, L. Chen, L. Liu, J. Zhang, H. Xu, B. Li, G.
Zhang, DeepQueueNet: towards scalable and generalized network performance estimation with packet-level visibility, in: Proceedings of the ACM SIGCOMM 2022 Conference, SIGCOMM '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 441–457. doi:10.1145/3544216.3544248. URL https://doi.org/10.1145/3544216.3544248
[17] M. Ferriol-Galmés, J. Paillisse, J. Suárez-Varela, K. Rusek, S. Xiao, X. Shi, X. Cheng, P. Barlet-Ros, A. Cabellos-Aparicio, RouteNet-Fermi: Network Modeling With Graph Neural Networks, IEEE/ACM Transactions on Networking 31 (6) (2023) 3080–3095. doi:10.1109/TNET.2023.3269983.
[18] K. Gao, L. Chen, D. Li, V. Liu, X. Wang, R. Zhang, L. Lu, DONS: Fast and Affordable Discrete Event Network Simulation with Automatic Parallelization, in: Proceedings of the ACM SIGCOMM 2023 Conference, ACM SIGCOMM '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 167–181. doi:10.1145/3603269.3604844. URL https://doi.org/10.1145/3603269.3604844
[19] V. Arun, M. T. Arashloo, A. Saeed, M. Alizadeh, H. Balakrishnan, Toward formally verifying congestion control behavior, in: Proceedings of the 2021 ACM SIGCOMM 2021 Conference, ACM, New York, NY, USA, 2021, pp. 1–16. doi:10.1145/3452296.3472912.
[20] M. T. Arashloo, R. Beckett, R. Agarwal, Formal Methods for Network Performance Analysis, in: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), USENIX Association, Boston, MA, 2023, pp. 645–661. URL https://www.usenix.org/conference/nsdi23/presentation/tahmasbi
[21] Y. Zhu, H. Eran, D. Firestone, C. Guo, M. Lipshteyn, Y. Liron, J. Padhye, S. Raindel, M. H. Yahia, M. Zhang, Congestion control for large-scale rdma deployments, SIGCOMM Comput. Commun. Rev. 45 (4) (2015) 523–536. doi:10.1145/2829988.2787484. URL https://doi.org/10.1145/2829988.2787484
[22] T. N. Kipf, M.
Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[23] Y. Li, R. Zemel, M. Brockschmidt, D. Tarlow, Gated graph sequence neural networks, in: Proceedings of ICLR'16, 2016. URL https://www.microsoft.com/en-us/research/publication/gated-graph-sequence-neural-networks/
[24] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (1) (2009) 61–80. doi:10.1109/TNN.2008.2005605.
[25] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735. URL https://doi.org/10.1162/neco.1997.9.8.1735
[26] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural message passing for quantum chemistry, in: D. Precup, Y. W. Teh (Eds.), Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 1263–1272. URL https://proceedings.mlr.press/v70/gilmer17a.html
[27] M. Garnelo, J. Schwarz, D. Rosenbaum, F. Viola, D. J. Rezende, S. M. A. Eslami, Y. W. Teh, Neural processes (2018).
[28] The Network Simulator - ns-2 (7 1995). URL https://www.isi.edu/websites/nsnam/ns/
[29] G. F. Riley, T. R. Henderson, The ns-3 Network Simulator, in: Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 15–34. doi:10.1007/978-3-642-12331-3_2. URL http://link.springer.com/10.1007/978-3-642-12331-3_2
[30] J.-Y. Le Boudec, P. Thiran (Eds.), Network Calculus, Vol. 2050 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001. doi:10.1007/3-540-45318-0. URL http://link.springer.com/10.1007/3-540-45318-0
[31] M.
Fidler, Extending the Network Calculus Pay Bursts Only Once Principle to Aggregate Scheduling, 2003, pp. 19–34. doi:10.1007/3-540-36480-3_2. URL http://link.springer.com/10.1007/3-540-36480-3_2
[32] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, M. Welling, Modeling relational data with graph convolutional networks, in: The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15, Springer, 2018, pp. 593–607.
[33] Resilient overlay networks (2001). URL http://nms.lcs.mit.edu/ron/
[34] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, M. Sridharan, Data center tcp (dctcp), in: Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM '10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 63–74. doi:10.1145/1851182.1851192. URL https://doi.org/10.1145/1851182.1851192
[35] S. Bondorf, P. Nikolaus, J. B. Schmitt, Quality and Cost of Deterministic Network Calculus: Design and Evaluation of an Accurate and Fast Analysis, Proceedings of the ACM on Measurement and Analysis of Computing Systems 1 (1) (2017) 1–34. doi:10.1145/3084453. URL https://dl.acm.org/doi/10.1145/3084453
[36] A. Varga, OMNeT++, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 35–59. doi:10.1007/978-3-642-12331-3_3. URL https://doi.org/10.1007/978-3-642-12331-3_3
[37] Q. He, C. Dovrolis, M. Ammar, On the predictability of large transfer TCP throughput, in: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, ACM, New York, NY, USA, 2005, pp. 145–156. doi:10.1145/1080091.1080110. URL https://dl.acm.org/doi/10.1145/1080091.1080110
[38] B. K. de Aquino Afonso, L.
Berton, QT-Routenet: Improved GNN generalization to larger 5G networks by fine-tuning predictions from queueing theory, ITU Journal on Future and Evolving Technologies 3 (2) (2022) 134–141. doi:10.52953/FBRB3688. URL https://www.itu.int/pub/S-JNL-VOL3.ISSUE2-2022-A12
[39] M. Helm, G. Carle, Predicting Latency Quantiles using Network Calculus-assisted GNNs, in: Proceedings of the 2nd on Graph Neural Networking Workshop 2023, GNNet '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 13–18. doi:10.1145/3630049.3630173. URL https://doi.org/10.1145/3630049.3630173
[40] F. Y. Yan, J. Ma, G. D. Hill, D. Raghavan, R. S. Wahby, P. Levis, K. Winstein, Pantheon: the training ground for Internet congestion-control research, in: 2018 USENIX Annual Technical Conference (USENIX ATC 18), USENIX Association, Boston, MA, 2018, pp. 731–743. URL https://www.usenix.org/conference/atc18/presentation/yan-francis
[41] S. Ashok, S. Tiwari, N. Natarajan, V. N. Padmanabhan, S. Sellamanickam, Data-Driven Network Path Simulation with iBox, Proceedings of the ACM on Measurement and Analysis of Computing Systems 6 (1) (2022) 1–26. doi:10.1145/3508026. URL https://dl.acm.org/doi/10.1145/3508026
[42] Yu Gu, Yong Liu, D. Towsley, On integrating fluid models with packet simulation, in: IEEE INFOCOM 2004, Vol. 4, IEEE, 2004, pp. 2856–2866. doi:10.1109/INFCOM.2004.1354702. URL http://ieeexplore.ieee.org/document/1354702/
[43] A. Alomar, P. Hamadanian, A. Nasr-Esfahany, A. Agarwal, M. Alizadeh, D. Shah, CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation, in: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), USENIX Association, Boston, MA, 2023, pp. 1115–1147. URL https://www.usenix.org/conference/nsdi23/presentation/alomar
[44] J. Späth, M. Helm, B. Jaeger, G.
Carle, Sim2HW: Modeling Latency Offset Between Network Simulations and Hardware Measurements, in: Proceedings of the 3rd GNNet Workshop on Graph Neural Networking Workshop, ACM, New York, NY, USA, 2024, pp. 20–26. doi:10.1145/3694811.3697820. URL https://dl.acm.org/doi/10.1145/3694811.3697820
[45] M. Guizani, A. Rayes, B. Khan, A. Al-Fuqaha, Network Modeling and Simulation, Wiley, 2010. doi:10.1002/9780470515211. URL https://onlinelibrary.wiley.com/doi/book/10.1002/9780470515211
[46] S. Keshav, REAL: A Network Simulator, Tech. Rep. UCB/CSD-88-472 (12 1988). URL http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5316.html
[47] X. Chang, Network simulations with OPNET, in: Proceedings of the 31st conference on Winter simulation Simulation - a bridge to the future - WSC '99, ACM Press, New York, New York, USA, 1999, pp. 307–314. doi:10.1145/324138.324232.
[48] D. Bültmann, M. Mühleisen, S. Max, OpenWNS, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 69–81. doi:10.1007/978-3-642-12331-3_5. URL https://doi.org/10.1007/978-3-642-12331-3_5
[49] J. Sommer, J. Scharf, IKR Simulation Library, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 61–68. doi:10.1007/978-3-642-12331-3_4. URL https://doi.org/10.1007/978-3-642-12331-3_4
[50] G. Casale, G. Serazzi, Quantitative system evaluation with Java modeling tools, in: Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering, ACM, New York, NY, USA, 2011, pp. 449–454. doi:10.1145/1958746.1958813. URL https://dl.acm.org/doi/10.1145/1958746.1958813
[51] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N.
Vaish, M. D. Hill, D. A. Wood, The gem5 simulator, ACM SIGARCH Computer Architecture News 39 (2) (2011) 1–7. doi:10.1145/2024716.2024718.
[52] M. Alian, D. Kim, N. Sung Kim, pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems, IEEE Computer Architecture Letters 15 (1) (2016) 41–44. doi:10.1109/LCA.2015.2438295. URL http://ieeexplore.ieee.org/document/7114236/
[53] H. Li, J. Li, A. Kaufmann, SimBricks: end-to-end network system evaluation with modular simulation, in: Proceedings of the ACM SIGCOMM 2022 Conference, ACM, New York, NY, USA, 2022, pp. 380–396. doi:10.1145/3544216.3544253. URL https://dl.acm.org/doi/10.1145/3544216.3544253
[54] H. Li, P. Balasubramanian, M. Meiers, J. Li, A. Kaufmann, SplitSim: Large-Scale Simulations for Evaluating Network Systems Research (2024).
[55] S. Bai, H. Zheng, C. Tian, X. Wang, C. Liu, X. Jin, F. Xiao, Q. Xiang, W. Dou, G. Chen, Unison: A Parallel-Efficient and User-Transparent Network Simulation Kernel, in: Proceedings of the Nineteenth European Conference on Computer Systems, ACM, New York, NY, USA, 2024, pp. 115–131. doi:10.1145/3627703.3629574. URL https://dl.acm.org/doi/10.1145/3627703.3629574
[56] S. Khashab, H. Sezhiyan, R. Abboud, A. Normatov, S. Kaestle, E. Bar-Ilan, M. Nassar, O. Shabtai, W. Bai, M. Kadosh, J. Xing, M. Silberstein, T. E. Ng, A. Chen, Nsx: Large-scale network simulation on an ai server, in: Proceedings of the 2nd Workshop on Networks for AI Computing, NAIC '25, Association for Computing Machinery, New York, NY, USA, 2025, pp. 19–25. doi:10.1145/3748273.3749199. URL https://doi.org/10.1145/3748273.3749199
[57] K. Zhao, P. Goyal, M. Alizadeh, T. E. Anderson, Scalable Tail Latency Estimation for Data Center Networks (2022).
[58] C. Güemes-Palau, M. Ferriol-Galmés, J. Paillisse-Vilanova, A. López-Brescó, P. Barlet-Ros, A.
Cabellos-Aparicio, RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning (1 2025).
[59] S. Floyd, V. Paxson, Difficulties in simulating the Internet, IEEE/ACM Transactions on Networking 9 (4) (2001) 392–403. doi:10.1109/90.944338.
[60] V. Paxson, S. Floyd, Why we don't know how to simulate the internet, in: Proceedings of the 29th Conference on Winter Simulation, WSC '97, IEEE Computer Society, USA, 1997, pp. 1037–1044. doi:10.1145/268437.268737. URL https://doi.org/10.1145/268437.268737
[61] R. Fujimoto, Parallel and distributed simulation systems, in: Proceeding of the 2001 Winter Simulation Conference (Cat. No.01CH37304), IEEE, 2001, pp. 147–157. doi:10.1109/WSC.2001.977259. URL http://ieeexplore.ieee.org/document/977259/
[62] G. Kunz, Parallel Discrete Event Simulation, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 121–131. doi:10.1007/978-3-642-12331-3_8. URL https://doi.org/10.1007/978-3-642-12331-3_8
[63] S. Jafer, Q. Liu, G. Wainer, Synchronization methods in parallel and distributed discrete-event simulation, Simulation Modelling Practice and Theory 30 (2013) 54–73. doi:10.1016/j.simpat.2012.08.003. URL https://www.sciencedirect.com/science/article/pii/S1569190X12001244
[64] H. Kobayashi, A. Konheim, Queueing Models for Computer Communications System Analysis, IEEE Transactions on Communications 25 (1) (1977) 2–29. doi:10.1109/TCOM.1977.1093702. URL http://ieeexplore.ieee.org/document/1093702/
[65] W. chung Poon, K. tung Lo, A refined version of m/g/∞ processes for modelling vbr video traffic, Computer Communications 24 (2001) 1105–1114. doi:10.1016/S0140-3664(00)00325-X.
[66] F. Fiorini, M. Cococcioni, M.
Pagano, Quantitative delay analysis of gi/g/1 queues with heavy-tailed traffic by means of alpha theory, Computer Networks 269 (2025) 111394. doi:10.1016/j.comnet.2025.111394.
[67] M. Garetto, R. Lo Cigno, M. Meo, M. Ajmone Marsan, A detailed and accurate closed queueing network model of many interacting TCP flows, in: Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213), Vol. 3, IEEE, 2001, pp. 1706–1715. doi:10.1109/INFCOM.2001.916668. URL http://ieeexplore.ieee.org/document/916668/
[68] M. Garetto, Renato Lo Cigno, M. Meo, M. Marsan, Closed queueing network models of interacting long-lived TCP flows, IEEE/ACM Transactions on Networking 12 (2) (2004) 300–311. doi:10.1109/TNET.2004.826297. URL https://ieeexplore.ieee.org/document/1288134/
[69] M. Yu, M. Zhou, A Performance Modeling Scheme for Multistage Switch Networks With Phase-Type and Bursty Traffic, IEEE/ACM Transactions on Networking 18 (4) (2010) 1091–1104. doi:10.1109/TNET.2009.2036437. URL http://ieeexplore.ieee.org/document/5352328/
[70] T. Bonald, Comparison of TCP Reno and TCP Vegas: efficiency and fairness, Performance Evaluation 36-37 (1999) 307–332. doi:10.1016/S0166-5316(99)00037-1.
[71] F. Baccelli, D. Hong, AIMD, fairness and fractal scaling of TCP traffic, in: Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 1, IEEE, New York, 2002, pp. 229–238. doi:10.1109/INFCOM.2002.1019264. URL http://ieeexplore.ieee.org/document/1019264/
[72] S. Bohacek, A stochastic model of TCP and fair video transmission, in: IEEE INFOCOM 2003.
T w ent y-second Annual Join t Conference of the IEEE Computer and Communications So ci- eties (IEEE Cat. No.03CH37428), IEEE, 2003, pp. 1134–1144. doi:10.1109/INF COM.2003.1208950. URL https://ieeexplore.ieee.org/document/1208950/ 75 [73] V. Misra, W.-B. Gong, D. T owsley , Fluid-based analysis of a net work of A QM routers supp orting TCP ﬂo ws with an application to RED, in: Pro ceedings of the conference on Applications, T ec hnologies, Architec- tures, and Proto cols for Computer Comm unication, ACM, New Y ork, NY, USA, 2000, pp. 151–160. doi:10.1145/347059.347421. URL https://dl.acm.org/doi/10.1145/347059.347421 [74] Y. Liu, F. Lo Presti, V. Misra, D. T owsley , Y. Gu, Fluid mo dels and solutions for large-scale IP netw orks, in: Pro ceedings of the 2003 ACM SIGMETRICS in ternational conference on Measuremen t and mo deling of computer systems, ACM, New Y ork, NY, USA, 2003, pp. 91–101. doi:10.1145/781027.781039. URL https://dl.acm.org/doi/10.1145/781027.781039 [75] F. Baccelli, D. Hong, Flow lev el sim ulation of large IP net- w orks, in: IEEE INFOCOM 2003. T wen t y-second Annual Joint Conference of the IEEE Computer and Communications So cieties (IEEE Cat. No.03CH37428), V ol. 3, IEEE, 2003, pp. 1911–1921. doi:10.1109/INF COM.2003.1209213. URL http://ieeexplore.ieee.org/document/1209213/ [76] S. Bohacek, J. P . Hespanha, J. Lee, K. Obraczk a, A h ybrid sys- tems mo deling framework for fast and accurate simulation of data comm unication net works, in: Pro ceedings of the 2003 ACM SIG- METRICS international conference on Measurement and mo deling of computer systems, A CM, New Y ork, NY, USA, 2003, pp. 58–69. doi:10.1145/781027.781036. URL https://dl.acm.org/doi/10.1145/781027.781036 [77] J. Lee, S. Bohacek, J. P . Hespanha, K. Obraczk a, Modeling Comm u- nication Netw orks With Hybrid Systems, IEEE/ACM T ransactions on Net working 15 (3) (2007) 630–643. doi:10.1109/TNET.2007.893090. URL http://ieeexplore.ieee.org/document/4237147/ [78] M. Marsan, M. 
Garetto, P . Giaccone, E. Leonardi, E. Sc hiattarella, A. T arello, Using partial diﬀerential equations to mo del TCP mice and elephan ts in large IP net w orks, in: IEEE INF OCOM 2004, V ol. 4, IEEE, 2004, pp. 2821–2832. doi:10.1109/INF COM.2004.1354699. URL http://ieeexplore.ieee.org/document/1354699/ 76 [79] F. Baccelli, G. Caroﬁglio, M. Piancino, Stochastic Analysis of Scal- able TCP, in: IEEE INFOCOM 2009, IEEE, 2009, pp. 19–27. doi:10.1109/INF COM.2009.5061902. URL https://ieeexplore.ieee.org/document/5061902/ [80] G. Caroﬁglio, L. Muscariello, On the Impact of TCP and P er-Flo w Sc heduling on In ternet P erformance, in: 2010 Proceedings IEEE IN- F OCOM, IEEE, 2010, pp. 1–9. doi:10.1109/INF COM.2010.5461973. URL http://ieeexplore.ieee.org/document/5461973/ [81] T. Czac hórski, Queueing Mo dels for Performance Ev aluation of Computer Net w orks—T ransien t State Analysis, 2015, pp. 51–80. doi:10.1007/978-3-319-12148-2_4. URL https://link.springer.com/10.1007/978- 3- 319- 12148- 2_4 [82] E. Knigh tly , Hui Zhang, D-BIND: an accurate traﬃc mo del for pro- viding QoS guaran tees to VBR traﬃc, IEEE/A CM T ransactions on Net working 5 (2) (1997) 219–231. doi:10.1109/90.588085. URL http://ieeexplore.ieee.org/document/588085/ [83] R. Agra wal, R. Cruz, C. Okino, R. Ra jan, P erformance b ounds for ﬂo w con trol proto cols, IEEE/A CM T ransactions on Net working 7 (3) (1999) 310–323. doi:10.1109/90.779197. [84] K. Lampk a, S. Bondorf, J. Schmitt, Ac hieving Eﬃciency without Sac- riﬁcing Mo del A ccuracy: Net work Calculus on Compact Domains, in: 2016 IEEE 24th In ternational Symp osium on Mo deling, Analysis and Simulation of Computer and T elecomm unication Systems (MAS- COTS), IEEE, 2016, pp. 313–318. doi:10.1109/MASCOTS.2016.9. URL http://ieeexplore.ieee.org/document/7774596/ [85] J. B. Sc hmitt, F. A. Zdarsky , M. 
Fidler, Delay Bounds under Arbitrary Multiplexing: When Net work Calculus Leav es Y ou in the Lurch..., in: IEEE INF OCOM 2008 - The 27th Conference on Computer Comm uni- cations, IEEE, 2008, pp. 1669–1677. doi:10.1109/INF OCOM.2008.228. URL http://ieeexplore.ieee.org/document/4509823/ [86] A. Bouillard, E. Thierry , Tight p erformance b ounds in the worst-case analysis of feed-forward net works, Discrete Ev en t Dynamic Systems 77 26 (3) (2016) 383–411. doi:10.1007/s10626-015-0213-2. URL http://link.springer.com/10.1007/s10626- 015- 0213- 2 [87] A. Kiefer, N. Gollan, J. B. Sc hmitt, Searching for Tigh t Performance Bounds in F eed-F orward Netw orks, in: B. Müller-Clostermann, K. Ech- tle, Rathgeb Erwin P (Eds.), Measuremen t, Modelling, and Ev alua- tion of Computing Systems and Dep endability and F ault T olerance, Springer Berlin Heidelb erg, Berlin, Heidelb erg, 2010, pp. 227–241. doi:10.1007/978-3-642-12104-3_18. URL http://link.springer.com/10.1007/978- 3- 642- 12104- 3_18 [88] Cheng-Shang Chang, Stability , queue length, and delay of deterministic and sto chastic queueing net w orks, IEEE T ransactions on Automatic Con trol 39 (5) (1994) 913–931. doi:10.1109/9.284868. [89] Y. Jiang, A basic sto chastic net work calculus, A CM SIG- COMM Computer Communication Review 36 (4) (2006) 123–134. doi:10.1145/1151659.1159929. [90] K. Angrishi, An end-to-end sto c hastic net w ork calculus with ef- fectiv e bandwidth and eﬀective capacity , Computer Net works 57 (2013) 78–84, sto chastic NC, mak es b ound tigh ter based on es- timating the eﬀective bandwidth. Ho wev er, no ev aluation added. doi:10.1016/j.comnet.2012.09.003. [91] T. Lakshman, U. Madhow, The p erformance of TCP/IP for net works with high bandwidth-dela y pro ducts and random loss, IEEE/A CM T ransactions on Netw orking 5 (3) (1997) 336–350. doi:10.1109/90.611099. URL http://ieeexplore.ieee.org/document/611099/ [92] J. Padh y e, V. Firoiu, D. T o wsley , J. 
Kurose, Mo deling TCP Reno p erformance: a simple mo del and its empirical v alidation, IEEE/A CM T ransactions on Netw orking 8 (2) (2000) 133–145. doi:10.1109/90.842137. URL http://ieeexplore.ieee.org/document/842137/ [93] R. Mazumdar, L. Mason, C. Douligeris, F airness in net w ork optimal ﬂo w con trol: optimality of pro duct forms, IEEE T ransactions on Com- m unications 39 (5) (1991) 775–782. doi:10.1109/26.87140. 78 [94] F. P . Kelly , A. K. Maullo o, D. K. H. T an, Rate con trol for commu- nication netw orks: shado w prices, proportional fairness and stabilit y , Journal of the Operational Research Society 49 (3) (1998) 237–252. doi:10.1057/palgra ve.jors.2600523. URL https://www.tandfonline.com/doi/full/10.1057/ palgrave.jors.2600523 [95] W. WHITT, Approximations for the gi/g/m queue, Pro- duction and Op erations Management 2 (2) (1993) 114– 161. arXiv:h ttps://doi.org/10.1111/j.1937-5956.1993.tb00094.x, doi:10.1111/j.1937-5956.1993.tb00094.x. URL https://doi.org/10.1111/j.1937- 5956.1993.tb00094.x [96] K. Chandy , The analysis and solutions for general queueing netw orks, in: Proceedings of the Sixth An ual Princeton Conference on Informa- tion Sciences and Systems, 1972, pp. 224–228. [97] M. Reiser, S. S. Lav en b erg, Mean-v alue analysis of closed mul- tic hain queuing net works, J. A CM 27 (2) (1980) 313–322. doi:10.1145/322186.322195. URL https://doi.org/10.1145/322186.322195 [98] K. M. Chandy , C. H. Sauer, Computational algorithms for pro duct form queueing netw orks, Communications of the ACM 23 (10) (1980) 573–583. doi:10.1145/359015.359020. URL https://dl.acm.org/doi/10.1145/359015.359020 [99] A. Charny , J.-Y. L. Boudec, Dela y Bounds in a Net work with Aggregate Sc heduling, in: Cro wcroft Jon, J. Rob erts, Smirno v Mikhail I (Eds.), Qualit y of F uture In ternet Services, Springer Berlin Heidelberg, Berlin, Heidelb erg, 2000, pp. 1–13. [100] D. Starobinski, M. Karp o vsky , L. 
Zakrevski, Application of net work calculus to general topologies using turn-prohibition, IEEE/A CM T ransactions on Netw orking 11 (3) (2003) 411–421. doi:10.1109/TNET.2003.813040. URL http://ieeexplore.ieee.org/document/1208302/ [101] M. Fidler, A. Rizk, A Guide to the Sto c hastic Net work Calculus, IEEE Communications Surv eys & T utorials 17 (1) (2015) 92–105. 79 doi:10.1109/COMST.2014.2337060. URL https://ieeexplore.ieee.org/document/6868978/ [102] F. Ciucu, J. Schmitt, P ersp ectiv es on net work calculus, ACM SIG- COMM Computer Communication Review 42 (4) (2012) 311–322. doi:10.1145/2377677.2377747. URL https://dl.acm.org/doi/10.1145/2377677.2377747 [103] M. Mirza, J. Sommers, P . Barford, X. Zh u, A Machine Learning Ap- proac h to TCP Throughput Prediction, IEEE/A CM T ransactions on Net working 18 (4) (2010) 1026–1039. doi:10.1109/TNET.2009.2037812. URL https://ieeexplore.ieee.org/document/5378489 [104] M. B. T ariq, K. Bhandank ar, V. V alancius, A. Zeitoun, N. F eam- ster, M. Ammar, Answ ering “What-If ” Deplo yment and Conﬁgu- ration Questions With WISE: T ec hniques and Deplo ymen t Exp eri- ence, IEEE/A CM T ransactions on Netw orking 21 (1) (2013) 1–13. doi:10.1109/TNET.2012.2230448. [105] M. Helm, F. Wiedner, G. Carle, Flo w-lev el T ail Latency Estimation and V eriﬁcation based on Extreme V alue Theory , in: 2022 18th In- ternational Conference on Net work and Service Management (CNSM), IEEE, 2022, pp. 359–363. doi:10.23919/CNSM55787.2022.9964525. URL https://ieeexplore.ieee.org/document/9964525/ [106] D. Monaco, A. Sacco, D. Spina, F. Strada, A. Bottino, T. Cerquitelli, G. Marchetto, Real-time latency prediction for cloud gaming applications, Computer Netw orks 264 (2025) 111235. doi:10.1016/j.comnet.2025.111235. URL https://linkinghub.elsevier.com/retrieve/pii/ S1389128625002038 [107] A. Mestres, E. Alarcón, Y. Ji, A. 
Cab ellos-Aparicio, Understanding the Mo deling of Computer Netw ork Delays using Neural Net w orks, in: Pro ceedings of the 2018 W orkshop on Big Data Analytics and Mac hine Learning for Data Comm unication Netw orks, A CM, New Y ork, NY, USA, 2018, pp. 46–52. doi:10.1145/3229607.3229613. URL https://dl.acm.org/doi/10.1145/3229607.3229613 80 [108] F. Krasniqi, J. Elias, J. Leguay , A. E. C. Redondi, End-to-end Dela y Prediction Based on T raﬃc Matrix Sampling, in: IEEE INF OCOM 2020 - IEEE Conference on Computer Comm unica- tions W orkshops (INF OCOM WKSHPS), IEEE, 2020, pp. 774–779. doi:10.1109/INF OCOMWKSHPS50562.2020.9162765. URL https://ieeexplore.ieee.org/document/9162765/ [109] K. Rusek, J. Suárez-V arela, A. Mestres, P . Barlet-Ros, A. Cab ellos- Aparicio, Unv eiling the p otential of Graph Neural Net works for net work mo deling and optimization in SDN, in: Pro ceedings of the 2019 A CM Symp osium on SDN Researc h, A CM, New Y ork, NY, USA, 2019, pp. 140–151. doi:10.1145/3314148.3314357. URL https://dl.acm.org/doi/10.1145/3314148.3314357 [110] J. Suárez-V arela, S. Carol-Bosc h, K. Rusek, P . Almasan, M. Arias, P . Barlet-Ros, A. Cab ellos-Aparicio, Challenging the generaliza- tion capabilities of Graph Neural Net works for netw ork mo del- ing, in: Pro ceedings of the ACM SIGCOMM 2019 Conference P osters and Demos, ACM, New Y ork, NY, USA, 2019, pp. 114–115. doi:10.1145/3342280.3342327. URL https://dl.acm.org/doi/10.1145/3342280.3342327 [111] K. Rusek, J. Suárez-V arela, P . Almasan, P . Barlet-Ros, A. Cabellos- Aparicio, RouteNet: Lev eraging Graph Neural Net w orks for Net- w ork Mo deling and Optimization in SDN, IEEE Journal on Selected Areas in Comm unications 38 (10) (2020) 2260–2270. doi:10.1109/JSA C.2020.3000405. [112] A. Badia-Samp era, J. Suárez-V arela, P . Almasan, K. Rusek, P . Barlet- Ros, A. 
Cab ellos-Aparicio, T ow ards more realistic net work mod- els based on Graph Neural Netw orks, in: Pro ceedings of the 15th In ternational Conference on emerging Netw orking EXp erimen ts and T echnologies, A CM, New Y ork, NY, USA, 2019, pp. 14–16. doi:10.1145/3360468.3366773. URL https://dl.acm.org/doi/10.1145/3360468.3366773 [113] M. F erriol-Galmés, K. Rusek, J. Suárez-V arela, S. Xiao, X. Shi, X. Cheng, B. W u, P . Barlet-Ros, A. Cab ellos-Aparicio, 81 RouteNet-Erlang: A Graph Neural Net w ork for Net w ork P er- formance Ev aluation, in: IEEE INFOCOM 2022 - IEEE Con- ference on Computer Communications, 2022, pp. 2018–2027. doi:10.1109/INF OCOM48880.2022.9796944. [114] B. K. Dhamala, B. R. Daw adi, P . Manzoni, B. K. A chary a, P erfor- mance Ev aluation of Graph Neural Netw ork-Based RouteNet Mo del with Atten tion Mec hanism, F uture Internet 16 (4) (2024) 116. doi:10.3390/ﬁ16040116. [115] Cláudio Mo desto, Reb ecca Ab en-A thar, Andrey Silv a, Silvia Lins, Glauco Gon alv es, Aldebaro Klautau, Dela y estimation based on mul- tiple stage message passing with attention mec hanism using a real net- w ork comm unication dataset, ITU Journal on F uture and Evolving T echnologies 5 (4) (2024) 465–477. doi:10.52953/RBNE4256. URL https://www.itu.int/pub/S- JNL- VOL5.ISSUE4- 2024- A35 [116] Kaan A ykurt, Maximilian Stephan, Serkut A yv asik, Johannes Zerwas, W olfgang Kellerer, Digital t win opportunities with lev eraging graph neural net w orks on real net work data, ITU Journal on F uture and Ev olving T ec hnologies 5 (4) (2024) 458–464. doi:10.52953/ZOEM2142. URL https://www.itu.int/pub/S- JNL- VOL5.ISSUE4- 2024- A34 [117] C. Güemes-P alau, M. F erriol-Galmés, J. Paillisse-Vilano v a, A. Lóp ez- Brescó, P . Barlet-Ros, A. Cab ellos-Aparicio, W av elet-Enhanced Graph Neural Net works: T o wards Non-P arametric Netw ork T raﬃc Mo del- ing, in: Pro ceedings of the 3rd GNNet W orkshop on Graph Neural Net working W orkshop, A CM, New Y ork, NY, USA, 2024, pp. 14–19. 
doi:10.1145/3694811.3697823. [118] F. Geyer, DeepComNet: P erformance ev aluation of net work top ologies using graph-based deep learning, P erformance Ev aluation 130 (2019) 1–16. doi:10.1016/j.pev a.2018.12.003. URL https://www.sciencedirect.com/science/article/abs/ pii/S0166531618300944 [119] B. Jaeger, M. Helm, L. Sch w egmann, G. Carle, Mo deling TCP p erfor- mance using graph neural net works, in: Pro ceedings of the 1st Interna- tional W orkshop on Graph Neural Netw orking, ACM, New Y ork, NY, USA, 2022, pp. 18–23. doi:10.1145/3565473.3569190. 82 [120] T. Suzuki, Y. Y asuda, R. Nak amura, H. Ohsaki, On Estimat- ing Communication Delays using Graph Conv olutional Net works with Semi-Sup ervised Learning, in: 2020 International Conference on Information Net working (ICOIN), IEEE, 2020, pp. 481–486. doi:10.1109/ICOIN48656.2020.9016603. URL https://ieeexplore.ieee.org/document/9016603 [121] J. Liu, F. T ang, L. Chen, X. Li, J. Y u, Y. Zh u, Y. Y u, Y. Y ang, EA GLE: Heterogeneous GNN-based Net work P erfor- mance Analysis, in: 2023 IEEE/A CM 31st In ternational Sym- p osium on Qualit y of Service (IWQoS), IEEE, 2023, pp. 1–10. doi:10.1109/IW QoS57198.2023.10188804. URL https://ieeexplore.ieee.org/abstract/document/ 10188804 [122] S. Huang, Y. W ei, L. Peng, M. W ang, L. Hui, P . Liu, Z. Du, Z. Liu, Y. Cui, xNet: Mo deling Net w ork Performance With Graph Neural Net works, IEEE/A CM T ransactions on Net working 32 (2) (2024) 1753– 1767. doi:10.1109/TNET.2023.3329357. [123] B. Li, G. V erma, T. Eﬁmo v, A. Kumar, S. Segarra, GLANCE: Graph- based Learnable Digital T win for Communication Netw orks (2024). URL [124] H. Du, M. Li, FlowSeer: A Nov el F ramework for Gen- eralized Netw ork P erformance Estimation at Flo w Lev el, in: 2024 27th International Conference on Computer Supp orted Co- op erativ e W ork in Design (CSCWD), 2024, pp. 2834–2839. doi:10.1109/CSCWD61410.2024.10580262. [125] C. Li, A. A. Zabreyk o, A. Nasr-Esfahany , K. Zhao, P . Go y al, M. 
Al- izadeh, T. Anderson, m4: A Learned Flo w-level Net work Simulator (3 2025). URL [126] W. L. Hamilton, R. Ying, J. Lesko v ec, Inductive representation learn- ing on large graphs, in: Proceedings of the 31st In ternational Con- ference on Neural Information Processing Systems, NIPS’17, Curran Asso ciates Inc., Red Ho ok, NY, USA, 2017, p. 1025–1035. 83 [127] K. Cho, B. v an Merrienbo er, C. Gulcehre, D. Bahdanau, F. Bougares, H. Sc h wenk, Y. Bengio, Learning phrase represen tations using rnn enco der-deco der for statistical machine translation (2014). URL [128] M. W ang, Y. Cui, S. Xiao, X. W ang, D. Y ang, K. Chen, J. Zh u, Neural Net work Meets DCN: T raﬃc-driven T op ology A daptation with Deep Learning, Pro ceedings of the A CM on Measuremen t and Analysis of Computing Systems 2 (2) (2018) 1–25. doi:10.1145/3224421. URL https://dl.acm.org/doi/10.1145/3224421 [129] S. Xiao, D. He, Z. Gong, Deep-Q: T raﬃc-driven QoS Inference using Deep Generativ e Netw ork, in: Pro ceedings of the 2018 W orkshop on Net work Meets AI & ML - NetAI’18, A CM Press, New Y ork, New Y ork, USA, 2018, pp. 67–73. doi:10.1145/3229543.3229549. URL http://dl.acm.org/citation.cfm?doid=3229543.3229549 [130] D. P . Kingma, M. W elling, Auto-enco ding v ariational ba yes (2022). URL [131] A. Dietmüller, S. Ray , R. Jacob, L. V an b ev er, A new hop e for net work mo del generalization, in: Pro ceedings of the 21st A CM W orkshop on Hot T opics in Net works, HotNets ’22, Asso ciation for Computing Mac hinery , New Y ork, NY, USA, 2022, pp. 152–159. doi:10.1145/3563766.3564104. URL https://doi.org/10.1145/3563766.3564104 [132] A. V aswani, N. Shazeer, N. P armar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. P olosukhin, A ttention is all y ou need (2023). URL [133] K. Hattori, T. K orik a w a, C. 
T ak asaki, Meta Learner-Based T rans- fer Learning: Bridging Simulation and Actual Router Metrics, in: 2024 IEEE 25th In ternational Conference on High Perfor- mance Switching and Routing (HPSR), IEEE, 2024, pp. 203–208. doi:10.1109/HPSR62440.2024.10635943. URL https://ieeexplore.ieee.org/document/10635943/ 84 [134] K. Hattori, T. K orik aw a, C. T ak asaki, Meta Learner-Based T ransfer Learning: Bridging Simulation and Actual Router Metrics, IEEE Ac- cess 13 (2025) 76085–76099. doi:10.1109/A CCESS.2025.3564954. [135] M. Happ, J. L. Du, M. Herlich, C. Maier, P . Dorﬁnger, J. Suárez- V arela, Exploring the Limitations of Current Graph Neural Net- w orks for Netw ork Mo deling, in: NOMS 2022-2022 IEEE/IFIP Net work Op erations and Managemen t Symp osium, 2022, pp. 1–8. doi:10.1109/NOMS54207.2022.9789708. URL https://ieeexplore.ieee.org/document/9789708 [136] P . V eličk o vić, G. Cucurull, A. Casanov a, A. Romero, P . Liò, Y. Bengio, Graph atten tion netw orks (2018). URL [137] R. Netra v ali, A. Siv araman, S. Das, A. Go y al, K. Winstein, J. Mic k ens, H. Balakrishnan, Mahimahi: Accurate Record-and-Replay for HTTP, in: 2015 USENIX Ann ual T echnical Conference (USENIX A TC 15), USENIX Asso ciation, Santa Clara, CA, 2015, pp. 417–429. URL https://www.usenix.org/conference/atc15/technical- session/presentation/netravali [138] J. Zhang, K. Gao, Y. R. Y ang, J. Bi, Prophet: T ow ard F ast, Error- T olerant Model-Based Throughput Prediction for Reactiv e Flo ws in DC Netw orks, IEEE/ACM T ransactions on Netw orking 28 (6) (2020) 2475–2488. doi:10.1109/TNET.2020.3016838. URL https://ieeexplore.ieee.org/document/9178502/ [139] F. Gey er, S. Bondorf, DeepTMA: Predicting Eﬀectiv e Conten tion Mo d- els for Netw ork Calculus using Graph Neural Netw orks, in: IEEE INF OCOM 2019 - IEEE Conference on Computer Communications, IEEE, 2019, pp. 1009–1017. doi:10.1109/INF OCOM.2019.8737496. URL https://ieeexplore.ieee.org/document/8737496/ [140] F. Geyer, S. 
Bondorf, On the Robustness of Deep Learning-predicted Con tention Mo dels for Netw ork Calculus, in: 2020 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2020, pp. 1–7. doi:10.1109/ISCC50000.2020.9219693. URL https://ieeexplore.ieee.org/document/9219693/ 85 [141] M. F arreras, J. P aillissé, L. Fàbrega, P . Vilà, GNNetSlice: A GNN-based p erformance mo del to supp ort netw ork slicing in B5G netw orks, Computer Comm unications 232 (2025) 108044. doi:10.1016/j.comcom.2025.108044. [142] R. Srik an t, The Mathematics of In ternet Congestion Con trol, 2004. doi:10.1007/978-0-8176-8216-3. [143] K. Hattori, T. Korik aw a, C. T ak asaki, Queue-informed neural net work mo del for estimating queuing dela y in pon-based aggre- gation netw orks, in: 2025 IEEE 11th International Conference on Net work Softw arization (NetSoft), IEEE, 2025, pp. 199–203. doi:10.1109/NetSoft64993.2025.11080627. URL https://ieeexplore.ieee.org/document/11080627/ [144] C. Li, A. Nasr-Esfahany , K. Zhao, K. Noorbakhsh, P . Go y al, M. Al- izadeh, T. E. Anderson, m3: A ccurate Flo w-Lev el P erformance Es- timation using Mac hine Learning, in: Pro ceedings of the A CM SIGCOMM 2024 Conference, A CM SIGCOMM ’24, Association for Computing Mac hinery , New Y ork, NY, USA, 2024, pp. 813–827. doi:10.1145/3651890.3672243. URL https://doi.org/10.1145/3651890.3672243 [145] P . Namy ar, B. Arzani, S. Kandula, S. Segarra, D. Crankshaw, U. Kr- ishnasw amy , R. Go vindan, H. Ra j, Solving { Max-Min } fair resource allo cations quic kly on large graphs, in: 21st USENIX Symposium on Net work ed Systems Design and Implementation (NSDI 24), 2024, pp. 1937–1958. [146] Inference co de for llama mo dels. URL https://github.com/facebookresearch/llama/blob/main/ llama/model.py [147] V. P axson, S. Floyd, Wh y we don’t kno w how to sim ulate the In ternet, in: Pro ceedings of the 29th conference on Win ter simulation - WSC ’97, A CM Press, New Y ork, New Y ork, USA, 1997, pp. 1037–1044. 
doi:10.1145/268437.268737. [148] S. Flo yd, E. Kohler, Internet researc h needs b etter mo dels, A CM SIGCOMM Computer Comm unication Review 33 (1) (2003) 29–34. doi:10.1145/774763.774767. 86 [149] Simple explanation of the no-free-lunch theorem and its implications, Journal of optimization theory and applications 115 (2002) 549–570. [150] C. E. Leiserson, F at-trees: Universal netw orks for hardware-eﬃcien t sup ercomputing, IEEE T ransactions on Computers C-34 (10) (1985) 892–901. doi:10.1109/TC.1985.6312192. [151] C. Güemes-P alau, M. F erriol-Galmés, J. Paillisse-Vilano v a, A. Lóp ez- Brescó, P . Barlet-Ros, A. Cab ellos-Aparicio, Bridging the gap b et w een sim ulated and real netw ork data using transfer learning, arXiv preprin t arXiv:2510.00956 (2025). 87

