From Simulation to Deep Learning: Survey on Network Performance Modeling Approaches


Authors: Carlos Güemes-Palau, Miquel Ferriol-Galmés, Jordi Paillisse-Vilanova

Carlos Güemes-Palau, Miquel Ferriol-Galmés, Jordi Paillisse-Vilanova, Pere Barlet-Ros, Albert Cabellos-Aparicio
Universitat Politècnica de Catalunya (UPC), Barcelona, Catalonia, Spain

Abstract

Network performance modeling is a field that predates early computer networks and the beginning of the Internet. It aims to predict the traffic performance of packet flows in a given network. Its applications range from network planning and troubleshooting to feeding information to network controllers for configuration optimization. Traditional network performance modeling has relied heavily on Discrete Event Simulation (DES) and analytical methods grounded in mathematical theories such as Queuing Theory and Network Calculus. However, as of late, we have observed a paradigm shift, with attempts to obtain efficient Parallel DES, the surge of Machine Learning models, and their integration with other methodologies in hybrid approaches. This has resulted in a great variety of modeling approaches, each with its strengths and often tailored to specific scenarios or requirements. In this paper, we comprehensively survey the relevant network performance modeling approaches for wired networks over the last decades. With this understanding, we also define a taxonomy of approaches, summarizing our understanding of the state-of-the-art and how both technology and the concerns of the research community evolve over time. Finally, we also consider how these models are evaluated, how their different nature results in different evaluation requirements and goals, and how this may complicate their comparison.
Keywords: network modeling, network performance, network simulation, analytical models, machine learning, deep learning

Email address: carlos.guemes@upc.edu (Carlos Güemes-Palau)

Contents
1 Introduction
1.1 Methodology and scope
1.2 Describing Network Models
1.3 Survey Outcomes
1.4 Structure of Survey
2 Taxonomy of Network Performance Models
3 Network Simulation
3.1 Discrete Event Simulation
3.2 Parallel Discrete Event Simulation
3.3 Summary
4 Analytical Models
4.1 Queuing Theory
4.2 Fluid Models
4.3 Network Calculus
4.3.1 Algebraic-based Discrete Network Calculus (ADNC)
4.3.2 Optimization-based Discrete Network Calculus (ODNC)
4.3.3 Stochastic Network Calculus (SNC)
4.4 Other Analytical Models
4.5 Summary
5 Machine Learning Models
5.1 "Shallow" ML Models
5.2 Deep Learning and Graph Neural Networks
5.2.1 RouteNet and Successors
5.2.2 Other GNN Models
5.2.3 Other DL Models
5.3 Summary
6 Hybrid Approaches
6.1 Model-tuned Emulation for Performance Modeling
6.2 ML + Analytical Hybrid Models
6.3 Accelerated DES
6.4 Simulation with DL-Enhanced Accuracy
6.5 Summary
7 Discussion on Identified Trends and Challenges within Network Performance Modeling
7.1 Balance Between Accuracy, Resolution, Applicability, and Inference Cost
7.2 The Dominance of DES and the Surge of GNNs
7.3 Reduced Interest in Analytical Models
7.4 Adapting to Changing Networks
7.5 Simulation-Dominated Evaluation
7.6 Heterogeneous Approaches, Heterogeneous Evaluations
8 Future Directions and Opportunities
8.1 Consolidation of PDES and GNNs Models
8.2 Analytical Models Enhancing DL Models
8.3 ML as a New Tool for Evolving Networks
8.4 Data Center-Centric Designs
8.5 Better Usage of Simulation Data for Training Real-World Models
9 Conclusions

1. Introduction

Performance modeling has long been a fundamental tool in computer networking. By enabling the prediction and evaluation of network behavior without requiring disruptive changes in live systems, models support network design, planning, and protocol development. From early backbone networks to modern cloud infrastructures, performance models have helped operators make informed, data-driven decisions about scalability, reliability, and efficiency.

In the research community, much of the focus over the past two decades has been on wireless networks, driven by the rapid rise of mobile communication, 5G/6G technologies, and the Internet of Things.
Wired networks, by contrast, seemed comparatively static: their role was often limited to large national backbones such as GBN [1] or Abilene [2], and for a time, modeling research in this area stagnated.

However, this perception changed dramatically with the rise of data centers. Once motivated mainly by cloud computing and large-scale web services, today's data centers are increasingly driven by artificial intelligence (AI) and large language model (LLM) training workloads. According to the International Energy Agency [3], the energy demand of conventional servers increased from 145 TWh in 2020 to 195 TWh in 2024, with projections exceeding 300 TWh by 2030. Accelerated servers designed for AI workloads show even steeper growth, increasing from 10 TWh in 2020 to 60 TWh in 2024, and are similarly expected to surpass 300 TWh by 2030. These figures illustrate the unprecedented demands on wired networks, particularly data center networks, and why modeling them has become an urgent research priority.

Unlike domains such as electromagnetism or fluid dynamics, where governing equations capture system behavior, computer networks lack a single set of exact mathematical laws. Their discrete nature, combined with internet traffic's bursty and self-correlated behavior [4, 5] and the complexity of congestion control, makes accurate modeling a continuous challenge. As networks evolve, so do modeling approaches: from formula-based analytical models and packet-level discrete event simulation to, more recently, machine learning–driven prediction.

From the 1990s to the 2010s, most network performance models were based on analytical modeling or simulation. Analytical models, such as those built on Queuing Theory (QT) [6], described networks as systems of equations. While computationally efficient, they often rely on simplified assumptions about traffic and service distributions.
Network Calculus (NC) [7, 8], alternatively, offered deterministic worst-case performance bounds without assuming specific traffic or service distributions. However, its bounds were often overly conservative and limited to feedforward topologies. Conversely, Discrete Event Simulation (DES) became popular by offering high-fidelity, packet-level modeling of networks. Yet, its computational cost, proportional to the number of network events, made it impractical for large-scale and high-speed networks. While each approach had its strengths, none fully met the growing demands for scalability, expressiveness, and accuracy.

After a period of stagnation in the 2010s, research in network performance modeling was revitalized by the rise of Machine Learning (ML). Although the first applications of ML to network modeling date back to 2001, it was the surge of Deep Learning (DL) models in 2018 that brought a significant breakthrough. These models proved capable of accurately predicting network performance while being relatively inexpensive to train and run. This marked a major shift away from traditional analytical models. Although analytical approaches provide theoretical guarantees, DL models have increasingly outperformed them in practice, providing higher accuracy at similar or even lower computational costs. As a result, DL-based models quickly became the dominant approach in the literature.

[Figure 1: Identified models per year of publication and type (x-axis: year, grouped as 1990, 1991-2000, 2001-2005, 2006-2010, 2011-2015, 2016-2020, 2021-2025; y-axis: number of models; series: Simulation, Analytical Models, ML Models, Hybrid Approaches). Does not consider other types of publications, such as surveys. If the same model is covered in multiple papers, it is counted once in the year of its earliest publication.]
Beyond replacing earlier techniques, DL also inspired a second paradigm shift: the emergence of hybrid approaches. Researchers began integrating ML into existing approaches to leverage their complementary strengths. For instance, ML models have been incorporated into DES frameworks to reduce simulation time without sacrificing detail or expressiveness.

This trend is reflected in Figure 1, which shows the number of network performance models published over the past decades. Research activity peaked in the late 1990s and early 2000s, followed by a period of stagnation around 2010. However, since 2018, the field has experienced a strong resurgence, with ML-based models becoming the dominant approach. The figure also shows the steady rise of hybrid methods and a decline in the use of traditional analytical models. In contrast, the number of simulation-based models has remained relatively stable. Although simulation remains a reliable and expressive modeling tool, recent research has shifted toward improving its scalability and execution speed.

In this survey, we study the evolution of network performance models over time. We analyze the motivations behind each shift, the limitations that shaped subsequent approaches, and the emerging trends in the latest generation of models. Additionally, we highlight the challenges the field currently faces and the directions research is taking to address them. While surveys exist in related areas, such as wireless networks [9, 10, 11] or specific modeling approaches like NC [12, 13] and Graph Neural Networks [14], to the best of our knowledge, there has not been a dedicated survey that revisits performance modeling with a focus on wired networks.

1.1. Methodology and scope

To find all relevant models published in the state-of-the-art (from now on referred to as SotA), we searched within IEEE Xplore, ACM's Digital Library, and Elsevier's ScienceDirect for papers containing either the term "network modeling" or "network performance". We focused on papers published within the IEEE INFOCOM, IEEE NOMS, IEEE/ACM Transactions on Networking, ACM SIGCOMM, ACM CoNEXT, Elsevier Computer Networks, and Elsevier Computer Communications conferences and journals. We then excluded papers that fell outside the scope of the survey and papers that lacked a network model capable of predicting performance metrics. For certain high-impact papers, such as [15, 16, 17, 18], we also reviewed articles that cite them. For older methods, such as the original QT models, we started by searching in the references of the papers we had already found, and then repeated this process recursively until finding their original precursor.

This survey limits itself to models capable of replicating or predicting the performance of traffic flows within wired networks. While there are related research fields, the solutions they encompass address problems of a different nature, which renders a direct comparison unproductive. This applies to the following fields:
• Wireless and mobile networks: The methodologies for modeling wired and wireless networks vary significantly due to their differences. Hence, in this survey, we focus only on wired networks, and we refer the interested reader to existing surveys on wireless networks [9, 10, 11].
• Network verification: Rather than predicting network behavior under specific conditions, these models focus on ensuring the viability of potential network configurations (e.g., whether certain QoS guarantees are maintained [19, 20]).
• Traffic prediction models: Unlike network modeling, whose task is to predict the performance of traffic flows within the network, traffic prediction aims to predict and describe the incoming traffic flows.
• Anomaly detection: These models are tasked with detecting anomalous patterns that can be indicative of a network security threat. Rather than making predictions, they discriminate between the "normal" expected behaviors and the potentially dangerous unexpected ones.

1.2. Describing Network Models

Throughout the survey, we summarize the identified models in tables. In this section, we describe the fields used to characterize each model:

Model Type
Specific type of model (e.g., if simulation, the type of simulator; if ML model, the ML architecture).

Input Scope
Refers to the scope of the input data of the model:
• Single-flow scenarios: models can only model single flows. They do not consider cross-traffic interactions. They also ignore the underlying network topology, or only consider individual links or devices.
• Traffic-matrix scenarios: models consider cross-traffic interactions. All flows are explicitly represented, even if the model produces predictions for just a subset of them. However, they still heavily simplify or ignore the underlying network topology.
• Full network scenarios: models consider cross-traffic interactions, and also consider the network's topology, routing, and characteristics in their calculations.

Output Scope
Refers to the scope of the predicted performance metrics:
• Flow-level: average performance metrics for each flow.
• Flow-level with temporal component: like before, but with a temporal component. Methods may vary in the coarseness of the temporal component or how time is aggregated.
• Packet-level: can predict the performance of individual packets.

Supported Traffic Types
• UDP: Traffic without congestion control.
• TCP: Traffic regulated by congestion control. Some models may consider a generic congestion control algorithm; others consider one or multiple versions of TCP.
• Non-specific: The model does not specify which traffic distributions it supports. However, because of how such models represent traffic, their error may vary depending on the packet arrival time distribution (e.g., queuing models that represent traffic as a Poisson process).
• Any: can faithfully support any traffic type.

Supported Performance Metrics
These usually depend on the output scope and traffic type supported by the network model:
• Models with packet-level predictions usually predict the full packet information (i.e., the timestamp of the packet at each point of its routing path).
• Models for UDP traffic focus on packet delay, jitter, and loss rates.
• Models for TCP traffic may also focus on flow completion time (FCT), throughput, and packet round-trip time (RTT).

Evaluation
Most network models are either evaluated or offer theoretical guarantees. We categorize their evaluation as follows:
• Analytical evaluation: properties are analyzed from an analytical viewpoint only (i.e., verified through formal proofs). Common in analytical models. Restricted to the assumptions made by the analysis itself.
• Simulated data evaluation: The model was evaluated using simulated data against the simulator's ground truth.
• Testbed data evaluation: The model was evaluated using data generated and captured from a testbed network.
• Real data evaluation: The model was evaluated using captured real internet data.

1.3. Survey Outcomes

The main contributions of this survey are as follows:
• We introduce a taxonomy for describing the different kinds of network performance models, allowing us to better understand the SotA and how the various approaches influence and complement each other.
• We perform a comprehensive study of the relevant network performance models present in the SotA.
• We discuss the trends and limitations of the current SotA of network performance modeling, comparing them to those identified by researchers at the start of the millennium.

1.4. Structure of Survey

For ease of reading, we summarize the structure of the survey. We also include a summary of the acronyms used throughout the survey in Table 1.
• In Section 2, we introduce and describe our taxonomy for network performance models.
• In the main body of the paper, we survey the current or previously relevant network performance models. Specifically, Section 3 covers network simulators, Section 4 analytical models, Section 5 ML models, and Section 6 hybrid approaches.
• Section 7 includes our discussion of the SotA of network performance models.
• Finally, in Section 8, we discuss future directions of research into network performance models.

2. Taxonomy of Network Performance Models

The evolution of network performance models is summarized in Figure 2. This taxonomy shows the four main types of network models (simulation, analytical models, ML models, and hybrid approaches) and how they interact.

The earliest network performance models were analytical, based on Queuing Theory (QT) [6].
These models describe the behavior of packets in devices through systems of queues, based on given assumptions (e.g., packet arrival distributions). Simpler QT models, constrained by stricter assumptions, are often inaccurate. More advanced models relax these assumptions and improve accuracy, but become significantly harder to solve. Additionally, QT models typically provide only aggregate predictions, such as average packet delay, rather than detailed, per-packet insights. This makes them better suited for general approximations of network performance than for thoroughly examining specific, complex scenarios.

| Acronym | Description |
|---|---|
| ADNC | Algebraic-based Deterministic Network Calculus |
| AI | Artificial Intelligence |
| AQM | Active Queue Management |
| CVAE | Conditional Variational Auto-Encoder |
| [P]DES | [Parallel] Discrete Event Simulation |
| DCQCN | Data Center Quantized Congestion Notification [21] |
| DL | Deep Learning |
| DNN | Dense Neural Network |
| EVT | Extreme Value Theory |
| FCT | Flow Completion Time |
| FIFO | First-In First-Out |
| GCN | Graph Convolutional Network [22] |
| GGNN | Gated Graph Neural Network [23] |
| GNN | Graph Neural Network [24] |
| IPG | Inter-Packet Gap |
| LLM | Large Language Model |
| LP | Logical Process |
| LPP | Linear Programming Problem |
| [Bi]LSTM | [Bidirectional] Long Short-Term Memory [25] |
| [EW]MA | [Exponentially Weighted] Moving Average |
| ML | Machine Learning |
| MPNN | Message-Passing Neural Network [26] |
| NC | Network Calculus |
| NP | Neural Processes [27] |
| ns[-2/-3] | The Network Simulator [28, 29] |
| ODE | (System of) Ordinary Differential Equations |
| ODNC | Optimization-based Deterministic Network Calculus |
| PBOO | Pay Bursts Only Once [30] |
| PDE | (System of) Partial Differential Equations |
| PMOO | Pay Multiplexing Only Once [31] |
| QoS | Quality of Service |
| QT | Queuing Theory |
| RBFNN | Radial Basis Function Neural Network |
| RED | Random Early Detection |
| RF | Random Forest |
| RGCN | Relational Graph Convolutional Network [32] |
| RNN | Recurrent Neural Network |
| RON | Resilient Overlay Networks [33] |
| RTT | Round Trip Time |
| SDE | (System of) Stochastic Differential Equations |
| SFA | Separate Flow Analysis [30] |
| SNC | Stochastic Network Calculus |
| SVR | Support Vector Regression |
| [DC]TCP | [Data Center [34]] Transmission Control Protocol |
| TFA | Total Flow Analysis [7, 8] |
| TMA | Tandem Matching Analysis [35] |
| UDP | User Datagram Protocol |

Table 1: List of acronyms

These limitations inspired other types of analytical models, intended to improve accuracy, cost, and expressiveness compared to their QT counterparts. On the one hand, fluid queuing models build on traditional discrete QT models while assuming that transmitted data can be represented as a continuous value. While this assumption does not hold in reality, it allows for network performance models built through systems of differential equations, which can be solved efficiently. Hence, these models are inexpensive, while the inclusion of the temporal component in the differential equations also makes them more expressive.

Alternatively, other researchers turned to Network Calculus (NC) [7, 8]. Unlike QT, initial NC models aimed to reduce the number of assumptions, include the temporal component, and provide predictable, provably correct performance predictions at a low computational cost. This is achieved by offering worst-case bounds for the packet delay as a function of time. However, although these bounds are provably correct, early NC models tend to be too pessimistic, while more complex NC models are too computationally expensive to be practical.
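To make the contrast between these three analytical families concrete, the following minimal Python sketch evaluates each one on a single link. It assumes an M/M/1 queue for QT (Poisson arrivals at rate `lam`, exponential service at rate `mu`), a single fluid queue integrated with forward Euler for the fluid model, and, for NC, a token-bucket flow (burst `b`, rate `r`) crossing a rate-latency server (rate `R`, latency `T`), whose classic delay bound is T + b/R. All function names and parameter values are illustrative assumptions, not models taken from the surveyed papers.

```python
def mm1_mean_delay(lam: float, mu: float) -> float:
    """QT: mean time in system (waiting + service) of an M/M/1 queue, 1 / (mu - lam)."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below the service rate")
    return 1.0 / (mu - lam)


def fluid_queue(arrival_rate, capacity: float, t_end: float, dt: float = 1e-3) -> float:
    """Fluid model: integrate dq/dt = a(t) - C (q never negative) with forward Euler.

    arrival_rate is a function of time returning the input rate a(t);
    the return value is the backlog q(t_end) in the same data units.
    """
    q = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        q = max(0.0, q + (arrival_rate(t) - capacity) * dt)
    return q


def nc_delay_bound(b: float, r: float, R: float, T: float) -> float:
    """NC: worst-case delay of a token-bucket flow (burst b, rate r) crossing a
    rate-latency server (rate R, latency T); the classic bound is T + b / R."""
    if r > R:
        raise ValueError("the bound requires the sustained rate r to be at most R")
    return T + b / R


if __name__ == "__main__":
    # QT: 800 packets/s offered to a link serving 1000 packets/s.
    print(mm1_mean_delay(800.0, 1000.0))  # 0.005 (seconds, on average)
    # Fluid: 12 units/s into a 10 units/s queue for 1 s builds about 2 units of backlog.
    print(fluid_queue(lambda t: 12.0, 10.0, 1.0))
    # NC: 1500-byte burst at 100 kB/s over a 1 MB/s server with 0.1 ms latency.
    print(nc_delay_bound(b=1500.0, r=1e5, R=1e6, T=1e-4))  # bound in seconds
```

The comparison mirrors the trade-off described above: the M/M/1 formula yields only an average under strong distributional assumptions, the fluid model adds a time dimension at low cost, and the NC value is a guaranteed upper bound for every packet of a conforming flow, which may lie far above typical delays.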
By contrast, an old yet popular alternative for building network models is simulation, specifically Discrete Event Simulation (DES). It allows for accurate, fine-grained results. Popular DES projects like OMNeT [36] and ns [28] are open source, allowing researchers to share the implementations for newer protocols and devices and facilitating their use. The most significant limitation of DES is its computational cost, which is compounded by the fact that its sequential nature makes it extremely hard to parallelize. As a result, the development of parallelizable DES (PDES) became a prominent research area, driven both by the technical challenges involved and the potential benefits: an accurate, detailed modeling approach that could scale to large networks if implemented effectively.

[Figure 2: Taxonomy of network performance models, grouped into Simulation (Discrete Event Simulation, 1988; Early Parallelizable DES, 2001; New Parallelizable DES, 2016), Analytical Models (Queuing Theory, 1963; Network Calculus, 1991; Fluid Queuing Models, 1999; Other Analytical Models), Machine Learning Models (Machine Learning Models, 2001; Deep Learning Models, 2018), and Hybrid Approaches (Accelerated DES, 2004; Model-tuned Emulation, 2018; ML + Analytical Hybrid Models, 2019; DL-Accelerated DES, 2021; Simulation with DL-Enhanced Accuracy, 2023). Solid line arrows indicate direct evolution, while dashed line arrows indicate inspiration or a "response to". Year indicates the year of the earliest publication within that category.]

QT and other analytical models also influenced the introduction of the first ML models. Originally, researchers referred to the former as "formula-based" and the latter as "history-based" [37]. The objective was for these models to offer better accuracy-cost trade-offs than the analytical models. However, ML models would not start to thrive until 2018, with the introduction of DL.
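To illustrate the difference between the two styles, the following Python sketch predicts the throughput of a TCP transfer both ways. The formula-based predictor uses the well-known square-root approximation of steady-state TCP Reno throughput, MSS / RTT * sqrt(3 / (2p)), while the history-based predictor is an exponentially weighted moving average (EWMA) of previously observed throughputs. Neither is claimed to be the exact method of [37]; the function names, the smoothing factor `alpha`, and the sample values are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def formula_based_throughput(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Square-root approximation of steady-state TCP throughput (bytes/s):
    MSS / RTT * sqrt(3 / (2 p)). Assumes a Reno-like sender and periodic losses."""
    if loss_rate <= 0.0:
        raise ValueError("the square-root formula needs a positive loss rate")
    return (mss_bytes / rtt_s) * math.sqrt(3.0 / (2.0 * loss_rate))


def history_based_throughput(samples: Sequence[float], alpha: float = 0.3) -> float:
    """EWMA forecast of the next transfer's throughput from past measurements."""
    if not samples:
        raise ValueError("history-based prediction needs at least one past sample")
    estimate = samples[0]
    for x in samples[1:]:
        # Weight the newest observation by alpha and the running estimate by 1 - alpha.
        estimate = alpha * x + (1.0 - alpha) * estimate
    return estimate


if __name__ == "__main__":
    # Formula-based: uses path properties (MSS, RTT, loss rate), no past transfers.
    print(formula_based_throughput(1460.0, 0.1, 0.01))
    # History-based: uses only previously observed throughputs (bytes/s).
    print(history_based_throughput([150_000.0, 170_000.0, 160_000.0]))
```

The formula-based predictor needs an explicit model of the protocol but no history, whereas the history-based one needs no protocol model but depends on past observations resembling future conditions, which is the gap that later ML models aimed to close.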
Newer architectures, such as Graph Neural Networks [24], were expressive enough to extract and process as much information as was available and produce more accurate results. Since their introduction, DL models have dominated, completely overtaking analytical models as they offered more accurate performance predictions at similar or lower computational costs.

Furthermore, beyond building ML or DL models directly, there is also an interest in integrating them into previously researched approaches. We refer to these as hybrid approaches; by their nature, they are extremely varied in how they achieve this integration. For example, ML and analytical models can be combined by using the output predictions of one approach as input to the other, obtaining an overall more accurate prediction than if either were used separately [38, 39]. Another approach is to augment network emulators, which are useful for network verification but not for performance prediction, with queuing models and Bayesian optimization to enable accurate performance prediction [40, 41]. One of the most promising directions involves enhancing DES by integrating fluid models and, more recently, DL. This approach aims to support parallelization and lower simulation costs by replacing selected components with alternative models, while preserving the high accuracy and expressiveness of traditional DES [42, 15]. Finally, recent work has explored using DL models to enhance simulation accuracy in situations where simulators are unable to predict network behavior faithfully [43, 44].

3. Network Simulation

This section focuses on network models that are implemented through simulation. These are summarized in Table 2.

3.1. Discrete Event Simulation

Discrete Event Simulation describes the scenario through a global state and a sequence of "discrete events" over time [45].
These events are defined as points in time at which the state of the simulation (i.e., the network and the traffic within it) changes. Examples include a packet being introduced into a link, reaching the other side and being buffered, and then being queued and processed by the device. The simulator models the state of the network devices and the rules for processing the different events, which requires studying, and sometimes even re-implementing, their protocols. Protocols can be studied through their specification (e.g., RFC publications).

| Model | Type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation |
|---|---|---|---|---|---|---|
| REAL [46] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| ns [28], ns-2, ns-3 [29] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| OMNeT, OMNeT++ [36] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| OPNET [47] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| CaSiNo [45] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| WNS [48] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| IKR Simulation Library [49] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| Java modeling tools [50] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| gem5 [51] | DES | Full network scenarios | Packet-level | Any | Full packet path information | - |
| pd-gem5 [52] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Testbed data |
| SimBricks [53] | Modular DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data |
| SplitSim [54] | Modular DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data |
| DONS [18] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data |
| Unison [55] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data |
| NSX [56] | DES | Full network scenarios | Packet-level | Any | Full packet path information | Only evaluates cost, not accuracy |
| Parsimon [57] | Link-level DES | Full network scenarios | Flow-level | TCP (focuses on DCTCP, but generalizable) | Tail FCT (90th percentile or higher) | Simulated data |

Table 2: Summary of simulated network performance models

The earliest dedicated network DES simulator is REAL [46], originally built to evaluate the performance of queuing algorithms in gateways. It then acted as a basis for the more general-purpose "The Network Simulator", or ns [28]. ns has been updated over the years, reaching v2.0 (known as ns-2) in 1997, and later v3 (ns-3) in 2010 [29]. Along with OMNeT (later to become OMNeT++) [36], they have been the most widely used network simulators in academia since their creation in the late 1990s.

Note that, during their development, many alternative DES tools were released. These include OPNET [47], CaSiNo [45], Open WNS [48], IKR Simulation Library [49], Java modeling tools [50], and gem5 [51]. Ultimately, ns and OMNeT remained the most popular choices due to two factors. First, they are open-source projects with continuous development over the years, allowing them to stay updated and support new technologies. Second, both have licenses that allow them to be used freely for research purposes: ns uses the GNU GPL, and OMNeT uses the Academic Public License.

The main advantage of DES is its completeness: by fully simulating the individual interactions in the network, the simulator can return accurate and complete descriptions of the traffic behavior. It can also be used to understand cross-traffic interactions, the impact of the network topology and routing, and congestion control protocols.
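The event-driven mechanism described in this subsection can be sketched in a few lines of Python. This is a minimal illustration, not how ns-3 or OMNeT++ are implemented: it assumes a single FIFO output link with a fixed transmission time per packet and an unbounded buffer, and it uses a priority queue (`heapq`) as the future event list. Each packet generates exactly two events (arrival and departure), so the work grows with the amount of traffic, which is the root of the computational cost of DES.

```python
import heapq
import itertools
from collections import deque


def simulate_fifo_link(arrival_times, tx_time):
    """Discrete event simulation of a single FIFO link with an unbounded buffer.

    Returns (delays, n_events): the per-packet delay (queuing + transmission)
    and the total number of events processed.
    """
    seq = itertools.count()   # tie-breaker so events with equal timestamps stay ordered
    future_events = []        # heap of (time, seq, kind, packet_id)
    for pkt, t in enumerate(arrival_times):
        heapq.heappush(future_events, (t, next(seq), "arrival", pkt))

    waiting = deque()         # packets buffered behind the one being transmitted
    link_busy = False
    delays = [0.0] * len(arrival_times)
    n_events = 0

    while future_events:
        now, _, kind, pkt = heapq.heappop(future_events)  # advance the simulation clock
        n_events += 1
        if kind == "arrival":
            if link_busy:
                waiting.append(pkt)
            else:
                link_busy = True
                heapq.heappush(future_events, (now + tx_time, next(seq), "departure", pkt))
        else:  # departure: the packet has been fully transmitted
            delays[pkt] = now - arrival_times[pkt]
            if waiting:
                nxt = waiting.popleft()
                heapq.heappush(future_events, (now + tx_time, next(seq), "departure", nxt))
            else:
                link_busy = False
    return delays, n_events


if __name__ == "__main__":
    # Three packets arrive 0.5 ms apart on a link that needs 1 ms per packet.
    delays, n_events = simulate_fifo_link([0.0, 0.0005, 0.001], tx_time=0.001)
    print(delays, n_events)
```

A real simulator replaces the single link with many device models and protocol state machines, but the core loop (pop the earliest event, update the state, schedule new events) is the same, and it is inherently sequential.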
Because of its accuracy and lev el of detail, DES is the preferred c hoice of net w ork op erators whenev er applicable. F urthermore, as w e explore other alternativ es of net w ork mo dels, one quic kly realizes ho w DES is widely used as a baseline (discussed later in Section 7.5). In addition, in the case of models that undergo some “training" process, DES is commonly used as a source of the training scenarios due to its ability to generate large quan tities of scenarios, sp ecifically those to o hard to capture in real-life net works. Ho wev er, its completeness also results in its main drawbac k: computa- tional cost. The cost of DES is prop ortional to the n umber of ev ents in the net work and, in turn, to the amount of traffic in the netw ork, which becomes unmanageable in larger, high-capacit y net w orks [58, 15, 16]. F urthermore, this is exacerbated b y the fact that DES simulators, due to the sequen tial nature of the even ts, are usually implemented as single-thread programs (we co ver PDES and its challenges in the next subsection). F urthermore, sometimes simulation softw are fails to accurately describ e certain proto cols or netw ork devices. This may b e b ecause the implementa- tion of certain devices is unknown, protected b y intellectual protection la ws, so their b ehavior cannot b e accurately replicated b y sim ulators [58, 40]. Sim- 16 ilarly , as we discuss later in Section 7, the evolution of in ternet traffic and algorithms used may result in different implementations of proto cols such as TCP , and the release of new v ersions that ma y y et to b e supported b y the sim ulation soft ware [59, 60]. These factors result in gaps in the sim- ulation soft w are, ev en b ecoming outdated if not contin uously maintained. Con versely , to aid future up dates, the simulators end up with a mo dular arc hitecture, allo wing new comp onen ts to interact with existing ones, ex- panding the sim ulator’s applicability ov er time. 3.2. 
Parallel Discrete Event Simulation

Because of the high computational cost, there has always been an incentive to develop Parallel Discrete Event Simulation (PDES) [61, 62]. This consists of dividing the sequence of events and processing it in parallel while maintaining the global state between events. While this does not reduce the computational effort (if anything, the required synchronization mechanisms add overhead), it does reduce simulation time by splitting the load between multiple processes.

Most PDES work by splitting the simulation into Logical Processes (LPs), each of which runs as if it were its own simulation. This split is typically done manually. Ideally, the LPs are selected to balance the number of events while minimizing inter-process communication, which is much slower than handling events fully locally. To maintain the global order of events, a synchronization algorithm is required. While many approaches exist, they can be categorized as either conservative or optimistic [63]. Conservative synchronization algorithms avoid causality errors, i.e., situations where an LP processes events out of timestamp order, which would require a rollback. Optimistic synchronization algorithms, in contrast, allow such errors and roll back when they are detected, maximizing the degree of parallelization at a higher risk. Ultimately, many PDES simulators offer both options, as performance may vary depending on the scenario.

While the benefits of PDES are evident, its complexities have prevented it from completely replacing DES. The inter-process messaging and synchronization costs introduce significant overhead while reducing the effective parallelization. Correctly defining the LPs is cumbersome, making their effective use more difficult [18].
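As an illustration of conservative synchronization, the sketch below implements one round of a simple window-based scheme, assuming a known lookahead: the minimum delay (e.g., the smallest propagation delay on a link crossing LP boundaries) before an event in one LP can affect another. This is a generic textbook-style scheme, not the algorithm of any specific simulator, and the function name is hypothetical; handling the events, and the new events they spawn, is omitted.

```python
import heapq


def conservative_round(lp_event_queues, lookahead):
    """One synchronous round of a window-based conservative PDES scheme.

    lp_event_queues: one min-heap of pending event timestamps per LP.
    lookahead: minimum delay before an event in one LP can reach another LP.
    Returns the window end and, per LP, the events it may process in parallel.
    """
    pending = [q[0] for q in lp_event_queues if q]
    if not pending:
        return None, [[] for _ in lp_event_queues]
    # Every event processed in this round has a timestamp >= min(pending), so
    # any message it sends to another LP arrives no earlier than window_end.
    window_end = min(pending) + lookahead
    safe = []
    for q in lp_event_queues:
        batch = []
        while q and q[0] < window_end:
            batch.append(heapq.heappop(q))  # safe: no earlier message can arrive
        safe.append(batch)
    return window_end, safe


lps = [[1.0, 2.5, 4.0], [1.8, 3.0]]
for q in lps:
    heapq.heapify(q)
print(conservative_round(lps, lookahead=1.0))
```

Here the window ends at 2.0, so the first LP processes the event at 1.0 while the second processes the event at 1.8 in parallel. With a lookahead of zero, no event is ever safe and the round makes no progress, which is one reason why LP boundaries are best placed on links with non-negligible delay.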
Furthermore, most common networks are hard to partition to begin with: even if the topology is easily partitioned (e.g., a fat-tree topology split into its branches), there will still be large volumes of traffic that cross these boundaries. Overall, networks have proven hard to parallelize, with the gains obtained from parallelization outweighed by the overheads present in PDES. Still, many simulators, including ns-3 and OMNeT++, have added support for PDES. There have also been PDES developments inspired by existing DES, like pd-gem5 [52], based on gem5 [51].

Recently, novel approaches have been developed to further reduce PDES overhead. SimBricks [53] introduces the concept of modular simulation: it integrates other simulators to model different aspects of the network, for example, OMNeT++ and ns-3 for the network itself and gem5 for the hosts. This both exploits the features offered by each integrated simulation software and increases speed by running each simulator in parallel. However, the authors themselves identify that their simulator cannot accelerate individual components. This leads to the simulation being bottlenecked by the slowest components [54]. Furthermore, requiring interoperability between simulators results in some of their features being lost (e.g., gem5's atomic memory protocol) and limits the scenarios SimBricks can cover, such as only simulating single-core hosts.

Some of these issues are addressed in its extension, SplitSim [54]. Extra features include support for mixed-fidelity simulation, which reduces the accuracy of certain components to lower costs; decomposition of the slowest components; and an improved synchronization algorithm. It also simplifies user configuration through an orchestration framework.

Another novel solution is the implementation of DONS [18].
Unlike other simulators, DONS follows a Data-Oriented design, restructuring how memory is organized and accessed to improve parallelization. However, these benefits only apply to thread-based parallelization on multi-core processors, not to multiple processes across different machines. Hence, to facilitate distributed execution, it uses an automatic LP partitioning algorithm. DONS prioritizes correctness, both by mathematically proving the robustness of its Data-Oriented design and by employing a conservative synchronization algorithm. However, DONS's biggest limitation is its Data-Oriented design itself, which makes it incompatible with other PDES implementations and requires all the simulation logic to be re-implemented to fit the new paradigm [55].

Conversely, Unison [55] follows a more standard approach. Its novelty arises from its automatic fine-grained LP partitioning, which eases configuration and improves the efficacy of the selected LPs, and from dynamic scheduling to avoid bottlenecks. It can also reuse frameworks meant for other DES simulators like ns-3. While effective, it also comes with important limitations: its automatic LP partitioning does not support stateful links (e.g., wireless channels), and its load balancing assumes that all the processors it runs on are identical.

A similar, recent approach is NSX [56]. Unlike other PDES, NSX is designed to run on Graphics Processing Units (GPUs). The primary motivation behind this choice is to capitalize on the current popularity of GPU-heavy data centers designed for LLMs. NSX is designed for the extreme levels of parallelism of GPUs: for example, it uses local event queues to maintain a "local" event order. The main drawback of this approach is that it presumes the availability of plentiful, powerful GPU hardware, which is monetarily expensive.
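The partitioning difficulty discussed in this subsection (even a cleanly split topology can carry large volumes of cross-partition traffic) can be quantified with a simple metric, which an automatic partitioner would try to keep low while balancing load. The sketch below is a hypothetical helper, not the partitioning criterion used by DONS or Unison: it counts the fraction of packet-hops whose endpoints are assigned to different LPs, since each such hop requires an inter-process message.

```python
def cross_lp_fraction(flows, lp_of):
    """Fraction of packet-hops whose two endpoints live in different LPs.

    flows: list of (path, packets), where path is a list of node names.
    lp_of: mapping from node name to LP index (a manual or automatic partition).
    Each cross-LP hop costs one inter-process message per packet.
    """
    total = cross = 0
    for path, packets in flows:
        for u, v in zip(path, path[1:]):  # consecutive hops along the path
            total += packets
            if lp_of[u] != lp_of[v]:
                cross += packets
    return cross / total if total else 0.0


# Hypothetical example: two switches split into two LPs.
lp_of = {"h1": 0, "s1": 0, "h2": 0, "s2": 1, "h3": 1}
flows = [(["h1", "s1", "h2"], 100), (["h1", "s1", "s2", "h3"], 50)]
print(round(cross_lp_fraction(flows, lp_of), 3))
```

In this toy example, 50 of the 350 packet-hops cross the LP boundary (about 14%). In data center topologies with many inter-pod flows, this fraction can be much higher even when the topology itself splits cleanly.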
An alternative to the approaches presented earlier for accelerating DES is Parsimon [57]. Parsimon decreases computational cost while enabling efficient parallelization by decomposing a network topology into its links. That is, for each link in a flow's path, Parsimon studies the delay experienced through a set of independent simulations. Each link simulation predicts link behavior by approximating the load expected in the original topology, but in a smaller scenario. These link-level simulations are designed to be lightweight and, as they are independent, they can be executed in parallel. Parsimon also offers the option of using a greedy clustering algorithm to share simulation results between similar links, thus reducing the number of simulations to execute, albeit at the cost of accuracy. However, the link simulations introduce error (usually overestimating delays) and are incompatible with packet-level visibility. Instead, Parsimon can only predict performance metrics that can be computed by aggregating the performance at each link, such as the FCT. Because it tends to overestimate, Parsimon is better suited for tail prediction, i.e., upper-bound approximations.

Overall, building an efficient PDES is non-trivial. Due to the sequential nature of DES, the authors ultimately need to make some concessions to maximize the efficiency of the simulation's parallelization. Yet, when successfully applied, PDES achieves accurate and complete results with low inference times, assuming the users have the hardware to run it on.

3.3. Summary

Overall, simulation, and in particular DES, has been the most complete network model available. It remains the most accurate option, and continuous support by both the research community and industry allows DES simulators such as OMNeT++ [36] and ns-3 [29] to keep up with the advances in modern networks.
Its biggest threat, however, is the computational cost. Traditional, single-threaded DES cannot simulate large networks or modern data centers or, when it can, only at a prohibitively high computational cost. While interest in PDES has existed ever since the inception of DES itself, its technical challenges were not properly addressed until fairly recently, with examples like SplitSim [54], DONS [18], and Unison [55]. Even then, it is important to note that PDES (except for Parsimon [57]) does not reduce the computational cost of simulation (if anything, it increases it due to the synchronization overhead), but rather distributes it across multiple processors or machines. Hence, without the hardware available to run them, large network scenarios remain out of reach.

4. Analytical Models

This section focuses on analytical network models, summarized in Table 3. These are generally based on some formal theory and are implemented through systems of equations.

4.1. Queuing Theory

Queuing Theory (QT) is a field of mathematics that studies the behavior of queue-like systems. In them, a system receives requests and serves them according to a specified distribution (e.g., Poisson). If the system is occupied, requests are queued until they can be processed. The original model was proposed in the 1960s in [6] and was later refined for computer networks in [64]. With these models, devices and protocols can be approximated to extract relevant performance metrics such as the mean queuing delay.

Queues' definitions vary in complexity [45]. Simpler queuing models can be solved instantly, but are limited to simple behaviors and therefore have more stringent assumptions. For example, the simplest queue is the M/M/1 queue, which assumes (1) a Poisson arrival process of packets, (2) exponentially distributed service times, and (3) a single First-In, First-Out (FIFO) queue with an infinite buffer.
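To illustrate how tractable the M/M/1 queue is, its steady-state metrics reduce to a few standard closed-form expressions in the arrival rate λ and service rate μ, valid only when λ < μ. The sketch below evaluates them; the function name and the example rates are hypothetical, chosen only for illustration.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Closed-form steady-state metrics of an M/M/1 queue (infinite buffer)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    rho = arrival_rate / service_rate                  # link utilization
    mean_in_system = rho / (1 - rho)                   # L: packets queued + in service
    mean_sojourn = 1 / (service_rate - arrival_rate)   # W: queuing + service time
    mean_wait = rho / (service_rate - arrival_rate)    # Wq: queuing time only
    return {"rho": rho, "L": mean_in_system, "W": mean_sojourn, "Wq": mean_wait}


m = mm1_metrics(arrival_rate=800.0, service_rate=1000.0)  # packets per second
print({k: round(v, 4) for k, v in m.items()})
```

At 80% utilization this yields about 4 packets in the system and a mean delay of 5 ms, consistent with Little's law (L = λW). The formulas hold only under the Poisson and exponential assumptions listed above.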
These assumptions make calculating metrics like the average packet queuing delay or service time trivial, but they are also unrealistic. While the infinite buffer is the clearest example, an important assumption is that of a Poisson arrival process. In practice, Internet traffic has been shown to be self-correlated and best fit by heavy-tailed distributions like the Weibull [4] and log-normal [5] distributions.

Model | Type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation
[64] | M/M/1, M/G/∞, M/G/1 queues | Single-flow scenarios | Flow-level | Non-specific | Average packet delay, buffer overflow probability | Analytical evaluation
[65] | M/G/∞ queue | Traffic-matrix scenarios | Flow-level | Non-specific | Loss rate | Simulated data evaluation
[66] | GI/G/1 queue | Single-flow scenarios | Flow-level | Non-specific | Upper and lower bounds, average packet delay | Simulated data evaluation
[67] | System of M/M/∞ queues | Traffic-matrix scenarios | Flow-level | TCP (Tahoe) | Average packet [loss probability, RTT] | Simulated data evaluation
[68] | System of M/G/∞ queues | Traffic-matrix scenarios | Flow-level | TCP (Tahoe) | Average packet [loss probability, RTT] | Simulated data evaluation
[69] | Modified M/M/1 queues | Traffic-matrix scenarios | Flow-level | Non-specific | Flow throughput, average packet delay | Simulated data evaluation
[70] | ODE fluid queuing model | Single-flow scenarios | Flow-level (temporal) | TCP (Reno and Vegas) | Flow throughput | No baseline¹
AIMD [71] | ODE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Generic) | Flow throughput and mean RTT per session | Simulated data evaluation
[72] | ODE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Generic) | Average RTT and loss rate | Simulated data evaluation
¹ Analytical comparison between TCP versions; no comparison with a baseline.
[73] | SDE fluid queuing model | Full network scenarios | Flow-level (temporal) | TCP (Generic w/ RED) | Flow throughput, average RTT (the evaluation only assesses queue length) | Simulated data evaluation
[74] | SDE fluid queuing model | Full network scenarios | Flow-level (temporal) | TCP (SACK, Reno, and NewReno) | Flow throughput, average RTT (the evaluation only assesses queue length and window size) | Simulated data evaluation
multi-AIMD [75] | Fluid/discrete hybrid queuing model | Full network scenarios | Flow-level (temporal) | TCP and UDP | Flow throughput, average RTT | Simulated data evaluation
[76], [77] | Fluid/discrete hybrid queuing model | Full network scenarios | Flow-level (temporal) | TCP and UDP | Average flow throughput and RTT (only drop rate and window size retain a temporal component) | Simulated data evaluation
[78] | PDE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Generic) | Average flow throughput, FCT, and loss rate | Simulated data evaluation
[79] | PDE fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP (Scalable TCP) | Flow throughput | Simulated data evaluation
[80] | ODE fluid queuing model | Full network scenarios | Flow-level (temporal) | UDP² and TCP (Reno) | Flow RTT, throughput, and loss rate | Simulated data evaluation
² UDP is only supported as background traffic.
[81] | Discrete and fluid queuing model | Traffic-matrix scenarios | Flow-level (temporal) | TCP | Flow packet delay (evaluation focuses on average queue lengths) | Simulated data evaluation
TFA [7, 8] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Analytical evaluation
D-BIND [82] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Real data evaluation
[83] | ADNC | Full network scenarios | Flow-level (temporal) | UDP and traffic with throughput guarantees | Worst-case packet delay bound | Analytical evaluation
SFA [30] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Analytical evaluation
PMOO [31] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Simulated data evaluation
[84] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Analytical evaluation
TMA [35] | ADNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Simulated data evaluation
[85] | ODNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Analytical and simulated data evaluation
[86] | ODNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Simulated data evaluation
[87] | ODNC | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-case packet delay bound | Simulated data evaluation
[88] | SNC | Traffic-matrix scenarios | Flow-level (temporal) | Non-specific | Stochastic worst-case packet delay bound | Analytical evaluation
[89] | SNC | Traffic-matrix scenarios | Flow-level (temporal) | Non-specific | Stochastic worst-case packet delay bound | Analytical evaluation
[90] | SNC | Traffic-matrix scenarios | Flow-level (temporal) | Non-specific | Stochastic worst-case packet delay bound | Analytical evaluation
[91] | Analytical model from RFC definitions | Single-flow scenarios | Flow-level (temporal) | TCP (generic with phase effects and RED) | Flow throughput | Simulated data evaluation
[92] | Stochastic analytical model from observations | Single-flow scenarios | Flow-level (temporal) | TCP (Reno) | Flow packet rate and throughput | Real data evaluation
[93] | Utility maximization model | Traffic-matrix scenarios | Flow-level | TCP | Flow throughput | Analytical evaluation
[94] | Utility maximization model | Traffic-matrix scenarios | Flow-level | TCP | Flow throughput | Analytical evaluation
Table 3: Summary of analytical network performance models

Alternatively, relaxing these assumptions results in more complex models, like the G/G/1 queue, which assumes general distributions for both packet interarrival and service times, or the modified M/G/∞ queue in [65] for modeling video traffic. Without such assumptions, the resulting model is harder to solve and may not even have an exact solution [95]. Usually, approximate solutions are extracted from solvers such as Buzen's convolution algorithm [96], mean value analysis [97], the Local Balance Algorithm for Normalizing Constants, and the Coalesce Computation of Normalizing Constants [98]. A recent proposal is that of [66], where the authors use nonstandard analysis to analyze GI/G/1 queues.

It is also common to build more complete models using an ensemble of simpler queue models. An example can be found in [67, 68], where a model of TCP Tahoe was built from a graph of M/M/∞ queues (and later M/G/∞ queues, assuming a generalized service-time distribution), each representing a state of the protocol through which packets transition. Their model also covers link interfaces through an M/M/1/B queue, but only considers line topologies in their evaluation. Another example can be found in [69], where the authors propose a model capturing bursty and phase-type traffic in a multistage switch network.
It explains how these traffic profiles are aggregated and uses adapted M/M/1 queues to approximate the more complex arrival distributions.

QT models are further limited by working only with distributions. First, they can only predict aggregate (e.g., average) performance measures, such as the average packet delay. Second, no matter how expressive the distribution is, it will always be less descriptive than working with individual traffic packets, as is done in DES. Third, they tend to model only traffic flows, independently of the underlying network hardware. The latter is abstracted into "servers" that delay packets by a given amount, according to a constant or exponential distribution, depending on the queue model.

We also note that early queuing models focused on highly specific scenarios, typically modeling single flows without accounting for cross-traffic from other TCP connections or background traffic. This limitation has been addressed in the SotA [67, 68, 69], incrementally making these models viable in a wider range of scenarios. However, queuing models for TCP implementations are susceptible to becoming outdated as networks evolve and the TCP flavor they support becomes deprecated. In spite of this, QT models remain a useful tool for network operators to quickly obtain a general and approximate description of the expected behavior.

4.2. Fluid Models

Fluid models (a.k.a. liquid models) are a subtype of QT models. Unlike standard (discrete) QT models, which account for data transmission in indivisible packets, fluid models treat data as a continuous, infinitely divisible flow. This allows for more expressive models to be developed based on systems of Ordinary, Stochastic, and, later on, Partial Differential Equations (ODE, SDE, and PDE, respectively).
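As a minimal sketch of what such a differential-equation formulation looks like, assume a single long-lived TCP flow with a fixed RTT R and a constant loss probability p. The congestion window W grows by one packet per RTT and is halved on each loss, with losses occurring at rate p·W/R, which gives dW/dt = 1/R − p·W²/(2R). This is a textbook-style simplification (no feedback delay, no timeouts), not the exact formulation of any surveyed paper, and the function name is hypothetical. Forward-Euler integration, a crude stand-in for a real ODE solver, shows W settling at the fixed point √(2/p).

```python
import math


def fluid_aimd_window(rtt, loss_prob, w0=1.0, t_end=60.0, dt=1e-3):
    """Forward-Euler integration of a simplified AIMD fluid model.

    dW/dt = 1/rtt - loss_prob * (W / rtt) * (W / 2)
    (additive increase of 1 packet per RTT; losses arrive at rate
    loss_prob * W / rtt and each one halves the window).
    Returns the trajectory as a list of (time, window) pairs.
    """
    w, t, trace = w0, 0.0, []
    while t < t_end:
        dw = 1.0 / rtt - loss_prob * (w / rtt) * (w / 2.0)
        w += dw * dt
        t += dt
        trace.append((t, w))
    return trace


trace = fluid_aimd_window(rtt=0.1, loss_prob=0.01)
w_final = trace[-1][1]
print(f"W(t_end) = {w_final:.2f} pkts, fixed point sqrt(2/p) = {math.sqrt(2 / 0.01):.2f}")
print(f"throughput ~ W/R = {w_final / 0.1:.1f} pkts/s")
```

The trajectory gives a temporal view of the window (and thus of throughput W/R), which a plain M/M/1 formula cannot provide. At the same time, it treats the window as a continuous quantity even though TCP reacts to individual packets, which is the source of error discussed below.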
Since plenty of ODE and PDE solvers exist, due to their prevalence in other fields, fluid models tend to be computationally inexpensive to solve. Furthermore, they are more expressive: describing them through sets of differential equations allows authors to include a temporal component in their metric predictions, unlike discrete models, which focus on aggregated (e.g., average) results. Being able to formulate the network state through a series of equations, akin to other fields like thermodynamics and electrical systems, is also appealing.

However, ODEs also introduce important limitations. By themselves, these systems introduce a degree of error through approximation: solvers based on numerical methods, for example, inevitably introduce some error due to their approximate approach. These inaccuracies are much more prevalent in computer networks due to their discrete nature: digital information is measured in bits and sent in indivisible packets, and decisions taken by transport protocol algorithms, like TCP, are made at the resolution of these individual packets. Hence, any formulation that treats the network as a continuous system will be limited by this fundamental difference.

Nonetheless, their advantages prevailed, and fluid models proliferated in the SotA between 2000 and 2010. For example, in [70], the authors implemented a fluid model of both TCP Reno and TCP Vegas to compare the two. The comparison, however, is limited to the flow control aspects of both flavors. In AIMD [71], the authors propose a generic TCP model, including a steady-state solution to extract the flow's throughput and mean stationary inter-congestion time. However, like other generic TCP models, it does not cover advances and differences between TCP implementations.
In [72], the authors propose a generic TCP model to predict performance metrics relevant to video transmission (packet RTTs and loss rates). The model assumes that packet loss can be described as a Poisson process. Next, in [73], the authors define a system of SDEs, later converted to a system of ODEs, to model Random Early Detection (RED), an Active Queue Management (AQM) mechanism. The model can be expanded from a single router to a network with multiple TCP flows, and scales to larger instances through flow aggregation. However, the traffic matrix it uses ignores the order of visited routers and assumes a generic version of TCP. Shortcomings of this model are later addressed in [74]: it expands the RED model to include other types of AQM policies and specific versions of TCP (SACK, Reno, and NewReno), and it incorporates routing information from the network topology. The model also introduces pruning of irrelevant elements (e.g., non-congested queues) to reduce the number of variables and improve computational performance.

There are also examples of models that attempt to combine both discrete and fluid queuing models. In multi-AIMD [75], the authors expand the AIMD model to support the presence of both TCP and UDP flows. It considers a fluid model that feeds into an M/M/1/B (discrete) queue model for the network devices according to the network topology. In [76, 77], the authors follow a different approach, where flows are described through a discrete state (i.e., empty queues, non-congested flow, and congested flow), each with its own system of SDEs. This allows modeling a more complex set of behaviors than a single SDE. The model supports different TCP flavors and UDP, albeit the latter must follow a specific On/Off interarrival traffic distribution.
Importantly, the evaluation only covers predictions of the flows' average throughput and RTTs over the scenario, and the papers do not specify how to compute the time-aware values.

One of the first examples of a model that uses a system of PDEs can be found in [78]. The model exploits the increased expressiveness of fluid models when describing flow behavior over time to better deal with "mice" flows. These are short flows whose completion time depends more on propagation delays than on transmission delays. The model considers a generic TCP version. It still relies on assumptions (e.g., mice flow lengths are exponentially distributed) and does not address the underlying topology. In the evaluation, despite including the temporal component in its reasoning, the authors only consider average values for the throughput and FCT. Later, in [79], the authors use PDEs to model Scalable TCP. While this model supports multiple flows, it assumes a constant RTT and continues to disregard the network topology.

Later on, we find models that cover more complex network scenarios. For example, in [80] the authors develop an ODE model that considers queuing policies other than FIFO, including fair queuing, longest queue first, and shortest queue first. The model also supports multiple TCP flows and background UDP traffic. It can predict the TCP flows' throughput and buffer occupancy, and the UDP traffic's loss rate. However, the model is tailored to TCP Reno, specifically during the congestion avoidance phase. Another example is [81], where the authors modify both discrete and fluid queuing models to capture transient states of TCP flows. While the obtained fluid model is less accurate, its lower computational cost makes it applicable to larger networks, unlike the discrete model.

4.3.
Network Calculus

Network Calculus (NC) is a set of techniques used to study network performance, introduced in [7, 8]. Unlike QT, which describes network traffic through traffic distributions, NC describes traffic through arrival and service curves. These are then used to derive worst-case bounds for the flow's performance metrics over time. In [7], the authors defined network elements, like delay lines, buffers, and regulators. Then, in [8], the authors showed how these elements can be combined to define a network model. In brief, NC became popular as a middle point between QT and DES, offering inexpensive predictions, relying on fewer traffic assumptions, and with an integrated temporal component.

NC models, however, are not flawless. While NC guarantees a correct worst-case bound, the original bounds were too loose for the prediction to be useful. Conversely, tighter bounds may result in increased model complexity and computational cost. Due to this, SotA NC models attempt to push the Pareto front by pursuing a better balance between tight bounds and the cost of obtaining them [35]. Ultimately, NC models can be subdivided into three categories: Algebraic-based Discrete Network Calculus (ADNC), Optimization-based Discrete Network Calculus (ODNC), and Stochastic Network Calculus (SNC).

Finally, a significant limitation shared across all NC models is that they only work in feedforward networks, i.e., networks where routing paths do not result in cycles. This aspect was studied in [99], where the authors tried to define a delay bound for general topologies, but only found it possible when link utilization is extremely low. Otherwise, the packet delay for a given flow may be influenced by flows that do not share a queue within its routing path or that have completed before the given flow has even started.
As the authors state, a delay bound may not even exist beyond a sufficiently high link utilization. This can be addressed by turning general network topologies into functionally feedforward ones. For example, networks can be shaped into spanning trees by removing links from consideration. Alternatively, in [100] the authors propose modifying the flows' routing paths through turn prohibition. While the latter is less drastic, both decrease flow throughput. Empirical evaluation suggests that this effect worsens in larger topologies.

4.3.1. Algebraic-based Discrete Network Calculus (ADNC)

This is the approach followed by the original proposal [7, 8], where network elements are defined and can be chained. The delay bounds are solved algebraically through (min, +) and (max, +) algebra [30]. Later on, the original proposal became known as Total Flow Analysis (TFA). One of the first improvements was replacing the linear arrival and service curves with piecewise functions, as done in [82], resulting in more expressive and tighter definitions. The work in [83] also extends the original proposal by supporting flows with flow control, i.e., flows that reserve network resources for performance guarantees. Later, in [30], the authors proved that TFA's delay bound overestimates the impact of flow bursts at each node in a flow's path. This can be addressed by defining and exploiting the concatenation property (known as the Pay Bursts Only Once principle, or PBOO), resulting in Separate Flow Analysis (SFA).

The PBOO principle was extended in [31] to make NC applicable to flows multiplexed in the same node, under the name Pay Multiplexing Only Once (PMOO). In theory, PMOO also allowed for considering arbitrary multiplexing of flows, unlike previous approaches, which assumed FIFO multiplexing.
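The gain from PBOO can be illustrated with the standard textbook case of a flow constrained by a token bucket (burst b, rate r) crossing a tandem of rate-latency servers (R_i, T_i), assuming r ≤ R_i at every hop. Summing per-node bounds charges the burst at every hop (and the burst grows by r·T_i after each node), whereas concatenating the servers first yields a single rate-latency curve with rate min R_i and latency ΣT_i. The node-by-node function below is a simplified, single-flow stand-in for per-node analysis (TFA itself reasons on aggregate traffic), and both function names are hypothetical.

```python
def node_by_node_bound(burst, rate, servers):
    """Sum of per-node delay bounds for a token-bucket flow (burst b, rate r).

    servers: list of rate-latency servers (R_i, T_i). At each node the bound
    is T_i + b_i / R_i, and the output burst grows to b_i + r * T_i, so the
    burst is effectively paid again at every hop.
    """
    total, b = 0.0, burst
    for R, T in servers:
        assert rate <= R, "stability requires r <= R_i at every node"
        total += T + b / R
        b += rate * T  # burst of the output arrival curve after this node
    return total


def pboo_bound(burst, rate, servers):
    """Pay Bursts Only Once: concatenate the servers first.

    The end-to-end service curve is rate-latency with R = min R_i and
    T = sum T_i, giving the single bound T + b / R.
    """
    R = min(R_i for R_i, _ in servers)
    assert rate <= R, "stability requires r <= min R_i"
    T = sum(T_i for _, T_i in servers)
    return T + burst / R


path = [(5.0, 1.0), (5.0, 1.0)]  # (rate, latency) per hop, arbitrary units
print(node_by_node_bound(10.0, 1.0, path), pboo_bound(10.0, 1.0, path))
```

For this two-hop example, the node-by-node sum is about 6.2 time units while the PBOO bound is 4.0. The two coincide for a single hop, and the gap widens as the path grows.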
However, [85] later showed that arbitrary multiplexing may result in looser bounds than those obtained with SFA. Next, the authors of [84] propose extending the PMOO model using bounded arrival curves, making the model more efficient with the aim of enabling more accurate yet complex function definitions for the arrival and service curves. Finally, in [35] the authors compare both ADNC and ODNC alternatives in the SotA and offer an improved version of ADNC, named Tandem Matching Analysis (TMA). It considers all possible (min, +) operations during the network decomposition to obtain the least pessimistic delay bound. While the number of decompositions grows exponentially with the network size, the approach is optimized to keep computational costs comparable to similar alternatives, as shown by their evaluation.

4.3.2. Optimization-based Discrete Network Calculus (ODNC)

These models find the delay bound for a given flow in the network by solving a Linear Programming Problem (LPP). The LPP is formulated to identify the worst possible delay according to a set of temporal, spatial, and service constraints. An early example is introduced in [85] as an alternative to the PMOO model. While more accurate, especially in complex scenarios like non-FIFO flow multiplexing, this approach suffers from an extremely high computational cost. This is because computing the delay bound for a given flow is an NP-hard problem, as both the number of LPPs to solve and the number of constraints in each of them increase exponentially with the network size [86].

Hence, authors have looked at heuristics or scenarios where the computational cost is more bounded. In the same paper [86], the authors find that when considering tandem networks (graphs that can be described as a directed path with no shortcuts), the computational cost decreases from exponential to polynomial.
They also present a heuristic in which a universal service curve can be found for all flows in the network, further reducing the computational cost. Another heuristic is proposed in [87], where a combination of Monte Carlo and Direct Search was used to reduce the cost of solving the LPP. However, an analysis by [35] shows that these heuristics are insufficient to make solving the problems feasible for large networks.

4.3.3. Stochastic Network Calculus (SNC)

While traditional (deterministic) NC offers a strict delay bound, SNC instead defines delay bounds that may be violated with a small probability [101]. The earliest paper proposing this approach can be found in [88], where the authors use bounds on the moment generating functions of random variables. While there have been several attempts to define a complete SNC [13], the authors of [89] were the first to aggregate formulations from previous research to build one that satisfied a set of listed properties (e.g., service guarantees, per-flow service). However, later work by [102] showed that such a model is not purely stochastic and struggles in aspects such as capturing statistical flow multiplexing. In [90], the author proposes expanding existing SNC by estimating the effective network bandwidth to obtain better delay bounds. Ultimately, the biggest drawback of SNC is its relatively recent development, which leaves it less explored and developed than its deterministic counterparts.

4.4. Other Analytical Models

Analytical models that are not based on either QT or NC are generally rare, as this implies losing the accumulated knowledge from previous research. Nonetheless, models are sometimes built on other principles, or their authors wish to avoid the weaknesses present in these fields.
A common source of alternative analytical models is empirical observation or, in the case of well-defined protocols like TCP, the RFC definitions. In [91], the authors implement an analytical model for TCP with forward acknowledgments using Rate-Halving. This model was later evaluated considering phase effects and RED. However, the evaluated topologies were small, generally single-link topologies, and the Internet's topology was simplified to a single link with losses between the routers.

Next, in [92], the authors build a stochastic model for TCP Reno. It models each flow's transmission as a sequence of rounds determined by the congestion window. Unfortunately, it relies on several assumptions so that the model can be solved in closed form: losses in one round are independent of those in other rounds, and the RTT is assumed to be independent of the window size. The model also does not cover Reno's fast-recovery algorithm or the small differences across implementations.

Alternatively, there has been research applying game theory to model congestion control algorithms. This idea was introduced in [93], where the congestion control algorithm is tasked with finding a Nash equilibrium in which performance across all present flows is maximized. The authors propose using the Nash arbitration scheme to predict the throughput allocation for each flow, while proving the existence of an optimal allocation. Inspired by it, [94] proposes two models. The first, an ideal model, relies on knowing the "utility" obtained by each flow at a given throughput, which is unknown beforehand. The second model simplifies the first by decomposing it into a solvable dual problem. A strength of this approach is that the models consider link capacities in their reasoning, which was rare for analytical models at the time. Conversely, this approach has two main drawbacks.
First, in practice, congestion control algorithms like TCP do not necessarily allocate throughput solely by maximizing bandwidth allocation. Second, the proposed models are generally too complex to be applied to real networks, while their simplified forms introduce inaccuracies.

4.5. Summary

In summary, analytical models aim to be a cost-effective alternative to DES. The first models are based on Queueing Theory, which uses queues or systems of queues to describe both network devices and protocols [6, 64]. However, simpler queue models lack accuracy due to oversimplified assumptions, while more complex models are no longer cost-effective to solve. Fluid models are also based on QT, but allow flow payloads to be infinitely divisible. While this contradicts how packets behave in reality, it allows models to be formulated as a system of differential equations, reinforcing their cost-effectiveness while incorporating the temporal component in their predictions [70]. In response to QT's limitations, Network Calculus was developed as an alternative for network modeling [7, 8]. Unlike queuing models, NC models include the temporal component and predict performance bounds rather than direct estimates. However, like QT models, NC models face a tradeoff between inaccurate predictions (i.e., loose bounds) and prohibitive computational costs. Finally, while some analytical models do not fall under any of these fields, their use tends to be sporadic and specialized (e.g., using Game Theory to understand bandwidth allocation in congestion-controlled traffic [93, 94]).

5. Machine Learning Models

This section focuses on network models that predict network performance through the use of ML models. The discussed models are summarized in Table 4.

5.1. "Shallow" ML Models

The earliest attempt to use ML to predict network performance that we identified is [37]. In it, the authors considered both "formula-based" (analytical) and "history-based" (temporal regression) models when predicting TCP throughput. They showed the limitations of analytical models when dealing with saturated traffic, while even simple regression models like the moving average (MA) or the exponentially weighted MA (EWMA) worked fairly well. Still, the purpose of these models was not to build a tool, but to evaluate the feasibility of ML models.

Table 4: Summary of ML network performance models

| Model | Type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation |
|---|---|---|---|---|---|---|
| [37] | MA, EWMA, Holt-Winters | Single-flow scenarios | Flow-level (temporal) | TCP (generic) | Flow throughput | Testbed data |
| PathPerf [103] | SVR | Single-flow scenarios | Flow-level (temporal) | TCP (generic) | Flow throughput | Testbed data and real data |
| WISE [104] | Causal graph | Single-flow scenarios | Flow-level | HTTP connections | Network response time | Real data |
| [105] | EVT | Multi-flow scenarios | Flow-level | UDP | 50th, 75th, 90th, 99th, 99.9th, 99.99th, 99.999th and 100th percentiles of packet delay and jitter | Testbed data |
| CLAAP [106] | RBFNN | Single-flow scenarios | Flow-level (temporal) | UDP | Flow latency | Real data |
| [107] | DNN | Multi-flow scenarios | Flow-level | UDP | Average packet delay | Simulated data |
| [108] | DNN, RF | Multi-flow scenarios | Flow-level | UDP and TCP | Average packet delay | Simulated data |
| RouteNet [109, 110, 111] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter | Simulated data |
| [112] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter | Simulated data |
| RouteNet-Erlang [113] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter, loss rate | Simulated data |
| RouteNet-Fermi [17] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter, loss rate | Simulated data and testbed data |
| RouteNet-Gauss [58] | Custom-designed MPNN with temporal component | Full network scenarios | Flow-level (temporal) | UDP | Average, median, and 90th, 95th, and 99th percentile packet delay and jitter | Testbed data |
| [114] | Custom-designed MPNN with graph attention | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data |
| [115] | Custom-designed MPNN with graph attention | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data |
| [116] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay | Testbed data |
| [117] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay | Testbed data |
| DeepComNet [118] | GGNN [23] | Full network scenarios | Flow-level | UDP and TCP | TCP flow throughput, UDP average packet delay | Simulated data |
| [119] | GGNN [23] | Full network scenarios | Flow-level | TCP (Reno, Cubic, Bic, Illinois, Veno, Vegas and Ledbat) | Flow RTT and throughput | Simulated data |
| [120] | GCN [22] | Full network scenarios | Flow-level | UDP and TCP | Average packet delay | Simulated data |
| EAGLE [121] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP and TCP | Average packet delay and loss rate | Simulated data |
| xNet [122] | Custom-designed MPNN with temporal component | Full network scenarios | Flow-level (temporal) | UDP and TCP (DTCP) | Average packet delay, flow throughput and FCT | Simulated data |
| GLANCE [123] | Custom-designed MPNN | Full network scenarios | Flow-level | UDP | Average packet delay and jitter, flow throughput, loss rate | Simulated data |
| FlowSeer [124] | Ensemble of GCN [22], DNN with attention and DNN | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data |
| m4 [125] | GraphSage [126] + GRU [127] | Full network scenarios | Flow-level (temporal) | TCP (DCTCP, TIMELY, DCQCN) | Average and 90th percentile flow FCT and throughput | Simulated data |
| xWeaver [128] | CNN | Full network scenarios | Flow-level | UDP | Flow FCT and throughput | Simulated data and testbed data |
| Deep-Q [129] | CVAE [130] + LSTM [25] | Multi-flow scenarios | Flow-level (temporal) | UDP | Packet delay and loss rates | Testbed data |
| [131] | Transformer [132] | Single-flow scenarios | Packet-level | UDP | Packet delay | Simulated data |
| DeepQueueNet [16] | Transformer [132] + BiLSTM | Full network scenarios | Packet-level | UDP | Full packet path information | Simulated data |
| [133, 134] | NP [27] | Multi-flow scenarios | Flow-level | Non-specific | Average packet delay, flow throughput and loss rates | Testbed data |

Later, in PathPerf [103], the authors used a Support Vector Regression (SVR) model to predict TCP throughput. It was trained in a testbed using real network data and diverse routing paths, showing competitive performance relative to analytical models. SVRs, a popular architecture at the time, are able to learn from limited amounts of data, which is extremely useful considering the cost and difficulty of capturing real-world data for training. However, this architecture still had several limitations: the model could only consider one TCP flow at a time, could only predict average metrics, and had to be retrained for every change to the flow's path.

Another example of an early performance prediction tool is WISE [104]. WISE sits on the boundary between performance prediction and network validation: it builds a causal graph from captured traffic traces of HTTP connections, relating identified variables such as the number of packets transmitted, their size, and the estimated network bandwidth. A causal graph is a statistical model that describes how variables influence each other through a directed acyclic graph.
WISE focused on predicting the HTTP connection's response time, covering both its network and browser components. The causal graph allows the user to query different values for the specified variables, enabling the evaluation of "what-if" scenarios to understand their impact on the response time. In the evaluation, which used data captured from Google's network, it proved accurate in predicting both components of the response time. Unlike other ML models, it does not need training. Instead, it learns and builds the causal graph during inference, which makes the method versatile but slow.

Nowadays, "shallow" ML models have fallen out of favor relative to their DL counterparts, as the latter tend to offer more accurate results thanks to their more expressive architectures. However, "shallow" ML models are sometimes preferred for their efficiency, as they can be trained with few samples and at a low computational cost. A recent example is [105], based on Extreme Value Theory (EVT), a statistical framework designed to predict extreme events or worst-case bounds. In this paper, EVT is used to predict upper-bound approximations of metrics like flow latency and jitter. The EVT model is highly efficient: it only needed to be trained on 5% of the available data. It was also able to handle scenarios with different topologies and to make predictions for the entire network (with multiple interacting flows), not only for individual flows. However, the results show that its accuracy for network-wide predictions is significantly lower than for individual flows. Finally, the EVT model also depends on assumptions such as stationary flow latencies.

Another recent example is CLAAP [106]. CLAAP uses a Radial Basis Function Neural Network (RBFNN) to predict flow latency in online gaming sessions. It bases its predictions on previous latency measurements, akin to regression models such as the MA and EWMA used in [37].
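The "history-based" predictors mentioned above are simple enough to sketch in a few lines. The following is a minimal illustration of one-step-ahead MA and EWMA predictors over a series of past throughput (or latency) samples; the window size, the smoothing factor alpha, and the sample values are arbitrary choices for illustration, not values taken from [37] or [106].

```python
from typing import List, Optional, Sequence


def moving_average_forecast(history: Sequence[float], window: int = 3) -> float:
    """Predict the next sample as the mean of the last `window` samples."""
    if not history:
        raise ValueError("need at least one past sample")
    recent = history[-window:]
    return sum(recent) / len(recent)


def ewma_forecasts(history: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Return the EWMA prediction available after each sample:
    s_k = alpha * x_k + (1 - alpha) * s_{k-1}, with s_0 = x_0."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    preds: List[float] = []
    state: Optional[float] = None
    for x in history:
        state = x if state is None else alpha * x + (1.0 - alpha) * state
        preds.append(state)
    return preds


# Hypothetical throughput samples (Mbit/s) of a single TCP transfer
samples = [10.0, 12.0, 11.0, 30.0, 12.0]
print(moving_average_forecast(samples))  # mean of [11, 30, 12]
print(ewma_forecasts(samples)[-1])       # prediction for the next sample
```

A larger alpha makes the EWMA track recent changes faster at the cost of passing more noise (such as the transient 30 Mbit/s sample) into the prediction.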
While the RBFNN is a shallow neural network, it still outperforms other online regression models in its evaluation. Furthermore, it is designed for fast inference, as it is meant for the online management of gaming sessions. Hence, response time is more critical than building an accurate yet expensive internal representation of the flow state.

5.2. Deep Learning and Graph Neural Networks

The initial attempts to build DL network performance models used Dense Neural Networks (DNNs). In [107], the authors evaluate the use of DNNs to predict the average flow delay. While their results show the potential of DL methods, which are computationally inexpensive and capable of modeling non-linear relationships, they note the models' inability to generalize to unseen topologies and routings. Also, the only packet arrival profile considered was an exponential distribution, which is simpler than the distributions faced in real-life scenarios. Later, in [108], the authors compared a DNN model against a Random Forest (RF) model using a mixture of UDP and TCP traffic of varying complexity. They ultimately showed that the RF models generally outperformed the DNN models. Hence, researchers quickly moved away from DNNs and looked into alternative, better-suited DL architectures.

5.2.1. RouteNet and Successors

Among all the possible alternatives, the family of architectures that predominated was Graph Neural Networks (GNNs) [24]. GNNs are designed to take a graph as input and exploit the topological information it contains. Unlike previous ML and analytical models, which struggled to represent the impact of the network topology and the flows' routing paths in their reasoning, GNNs can properly exploit this information, which explains their success. This is exemplified by the RouteNet family of models.
The original RouteNet was introduced in [109] and later examined more thoroughly in [110, 111]. RouteNet's architecture is based on the Message Passing Neural Network (MPNN) architecture [26]. It works by building a heterogeneous graph that captures the dependencies between links and flows within a network, based on its topology and routing. In principle, this representation allows RouteNet to accurately predict flow delay and jitter in topologies and routing paths unseen during training while maintaining a low computational cost (i.e., milliseconds per network scenario). The RouteNet models are trained and evaluated on samples generated with DES.

While effective, RouteNet does have important drawbacks. First, further evaluation has shown limitations in its generalization [135]. These include topologies and link capacities larger than those seen during training, but also different traffic profiles, packet sizes, and inter-packet gap (IPG) time distributions. Furthermore, RouteNet can only predict average measures across the entire scenario, and cannot model transient behavior. Finally, it does not support key network features, such as TCP traffic, Quality of Service (QoS) requirements, or alternative routing paths.

To address these limitations and improve overall accuracy, RouteNet has since been extended through successive iterations to cover a broader range of scenarios. The first change was in [112], in which additional features were added to support variable queue sizes. The first major expansion, however, came with RouteNet-Erlang [113]. It includes queues in its graph representation of the network alongside links and flows, and its feature extraction is modified to better support inference on topologies larger than those seen during training.
It also supports flows parametrized through auto-correlated traffic distributions, based on observations of Internet traffic [4]. Overall, this allows for a more complete definition of the network and its traffic and better accuracy in scenarios unseen during training, and it introduces support for multiple queues per port, different queuing policies, QoS requirements, and packet loss predictions.

The next major iteration is RouteNet-Fermi [17]. Rather than predicting the average flow delay directly, RouteNet-Fermi learns to predict the mean queue occupancy at each step of the flow's routing path. This change in the model's reasoning results in more accurate predictions and better scaling to larger networks, in terms of number of elements, traffic intensity, and link capacities. RouteNet-Fermi also expanded the number of supported queuing policies relative to RouteNet-Erlang. Finally, this iteration of RouteNet was also evaluated with real (testbed) network data, proving its feasibility in more complex traffic scenarios.

The most recent iteration of RouteNet is RouteNet-Gauss [58]. It introduces a temporal component, splitting traffic scenarios into fixed-time windows, which enables it to model non-stationary traffic and better address the complexities present in real network samples. The window size can be adjusted to balance model expressivity against computational cost. It also removes the need to specify flow distribution parameters as input, allowing it to accurately model non-parametric traffic. Still, RouteNet-Gauss shares some limitations with the original version: specifically, it lacks support for congestion control algorithms and does not generalize to network protocols and traffic distributions different from those seen during training.
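To make the flow-link message passing of this family more concrete, the following toy sketch may help. It is not RouteNet: the learned recurrent and MLP update functions are replaced by fixed tanh updates with hypothetical weights (w_flow, w_link), and nothing is trained. It only shows the structure: each flow walks its path and sends a message to every link it crosses, each link aggregates the messages of the flows crossing it, and a readout in the spirit of RouteNet-Fermi turns a per-link queue-occupancy estimate into a path delay. The occupancy-over-capacity conversion used in the readout is an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

Links = Dict[str, float]                    # link id -> capacity
Flows = Dict[str, Tuple[float, List[str]]]  # flow id -> (traffic rate, path of link ids)


def softplus(x: float) -> float:
    """Smooth, strictly positive activation used for the occupancy readout."""
    return math.log1p(math.exp(x))


def toy_routenet_delays(links: Links, flows: Flows, iterations: int = 4,
                        w_flow: float = 0.5, w_link: float = 0.8) -> Dict[str, float]:
    """Illustrative flow/link message passing with fixed, hypothetical weights."""
    # Initial link states: offered load divided by capacity (utilization)
    load = {l: 0.0 for l in links}
    for rate, path in flows.values():
        for l in path:
            load[l] += rate
    h_link = {l: load[l] / cap for l, cap in links.items()}
    # Initial flow states: the flow's own traffic rate
    h_flow = {f: rate for f, (rate, _) in flows.items()}

    for _ in range(iterations):
        # Flow update: walk each path in order, like an RNN over the crossed links
        messages: Dict[str, List[float]] = {l: [] for l in links}
        for f, (_, path) in flows.items():
            state = h_flow[f]
            for l in path:
                state = math.tanh(w_flow * state + h_link[l])
                messages[l].append(state)  # message from flow f to link l
            h_flow[f] = state
        # Link update: aggregate (sum) the messages of all flows crossing the link
        h_link = {l: math.tanh(w_link * h_link[l] + sum(messages[l])) for l in links}

    # Readout: per-link occupancy estimate, divided by capacity and summed along the path
    occupancy = {l: softplus(h_link[l]) for l in links}
    return {f: sum(occupancy[l] / links[l] for l in path)
            for f, (_, path) in flows.items()}


links = {"a": 10.0, "b": 10.0}
flows = {"f1": (3.0, ["a"]), "f2": (4.0, ["a", "b"])}
print(toy_routenet_delays(links, flows))
```

Because the readout works per link and sums along the path, a flow whose path is a superset of another's always receives a larger estimate here; in the real models, the learned update functions are what let link and flow states capture how flows interfere with each other.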
Beyond the main line of RouteNet architectures, authors other than the original ones have proposed improvements. The most common is including an attention mechanism at some point in the message passing phase. In [114], this is introduced to the original RouteNet, while in [115] it is introduced to RouteNet-Fermi. Graph attention mechanisms improve the expressivity of the message passing phase by allowing nodes to weigh the relative importance of their neighbors. This mechanism was introduced in the Graph Attention Network [136], which became the SotA general-purpose GNN architecture. We also note that [115] introduced other improvements, such as using feature selection to refine the feature extraction process.

In [116], the authors also expand RouteNet-Fermi by increasing the features used to encode flow information. Specifically, they extract the IPG mean, variance, and several percentiles for each flow. They also propose using the packet loss rate as an input for predicting the flow delay which, while effective, is impractical in real-life scenarios: either the packet loss rate is not known during inference or, if it is, the average packet delay is already known as well. Finally, in [117], the authors modify RouteNet-Fermi, replacing the flow distribution parameters with the raw packet traces and a wavelet decomposition process during flow encoding. This allows for cost-efficient, rich descriptions of non-parametric traffic distributions. The results also suggest that the model can perform inference on unseen traffic distributions with a minimal penalty in prediction error. However, this method requires the packet traces of the flows, which are costly to obtain and not always available.

5.2.2. Other GNN Models

While RouteNet has inspired many of the GNN models, there are other alternatives in the SotA.

Published around the same time as the original RouteNet paper, [118] also proposes a GNN-based network model. Like RouteNet, the authors build a graph that captures the dependencies between links and flows. Unlike RouteNet, its architecture is based on Gated GNNs (GGNNs) [23] rather than MPNNs. Also, it can model congestion-controlled traffic by representing both the original stream and the ACK messages as separate flows. However, while RouteNet's hypergraph is heterogeneous, with the different network elements processed differently, the hypergraph in [118] is homogeneous, and its nodes are only differentiated through a one-hot feature. Furthermore, its simpler feature extraction results in lower accuracy and poorer generalization to unseen network protocols and traffic intensities.

This approach was later expanded in [119]. First, the hypergraph it builds also includes device ports (referred to as interfaces) and path nodes that encode the flows' routings. The encoding of the network elements is also extended to support multiple TCP versions. Overall, it shows lower error rates than the baseline analytical models, but its performance varies depending on the TCP version and the routing path length. Furthermore, the TCP version is one-hot encoded, meaning that the model can only be applied to versions covered during training.

Next, in [120], the authors propose using a Graph Convolutional Network (GCN) [22] architecture, rather than an MPNN, to predict traffic delays. Unlike previous works, they apply the GNN directly to the original network topology instead of building a hypergraph of relationships. Unfortunately, their methodology is not described, and details such as the extracted features are missing.
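Since [120] applies a GCN directly to the physical topology, it may help to recall the widely used GCN propagation rule, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where A is the adjacency matrix, D the degree matrix of A + I, H the node features, and W a learned weight matrix. The NumPy sketch below implements one such layer; the ring topology, the node features, and the random (untrained) weights are invented for illustration, since the features used in [120] are not reported.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # node degrees, self-loop included
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(a_norm @ features @ weights, 0.0)  # ReLU


# Hypothetical 4-node ring topology; the single node feature could be, e.g.,
# the traffic injected at each node (values made up for illustration)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = np.array([[1.0], [0.0], [2.0], [0.5]])
rng = np.random.default_rng(0)
w = rng.normal(size=(1, 3))                   # untrained, random weights
h = gcn_layer(adj, x, w)
print(h.shape)  # (4, 3)
```

Because the layer only depends on the adjacency matrix and the node features, relabeling the nodes simply permutes the rows of the output (permutation equivariance), which is one reason graph-based architectures are attractive for topology-dependent prediction.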
The authors of [120] do report evaluating their model on synthetic 500-node networks that include TCP traffic flows.

In EAGLE [121], the authors propose a more complex architecture to predict packet delays and loss rates simultaneously. Similar to RouteNet, the network is represented as a hypergraph that describes links, routers, and routing paths. Information between different elements is propagated through random walks across the hypergraph. According to the authors, this technique is parallelizable and effective at extracting neighborhood information, although it also introduces several hyperparameters to tune. Once the embeddings are obtained, a multi-head graph attention mechanism is applied. While EAGLE compares well against the authors' reference queuing model and the original RouteNet, it lacks comparisons with other SotA models of the time.

Then, in xNet [122], the authors propose another MPNN-based architecture that represents the network through a hypergraph. It considers five types of network elements, more than any other approach: nodes, queues, links, paths, and flows. Paths and flows are later used in the readout to extract performance metrics, such as path packet delay or FCT. A temporal domain is also introduced by splitting the simulated time into fixed-size windows, which xNet achieved earlier than RouteNet-Gauss. Overall, their evaluation shows better results than models such as Deep-Q [129] and the original RouteNet, but it does not evaluate different traffic distributions or network topologies of different sizes.

In GLANCE [123], a similar hypergraph definition of the network is proposed, with path, node, and link embeddings and an MPNN-like architecture. Unlike other MPNNs, GLANCE does not reuse parameters between iterations of the message passing phase.
This is uncommon: while it increases the potential model expressivity, in practice it increases training costs and usually does not improve performance. Also, GLANCE defines an edge graph convolutional layer when updating the node embeddings, combining graph and spectral features. While spectral features can capture expressive relationships, they also risk lower generalization to unseen topologies. Moreover, GLANCE is the first model to propose the use of transfer learning, specifically to quickly retrain the model for new tasks. This consists of fixing (a.k.a. freezing) the weights of the embedding and message passing phases, and retraining the readout function from scratch. In the evaluation, GLANCE does improve over RouteNet-Fermi, although the difference is smaller when generalizing to unseen topologies.

Unlike previous models, FlowSeer [124] processes topological and flow information separately. The topological information is fed to a GCN, while flow information is processed by a DNN with attention, trained through an encoder-decoder scheme. The extracted features are then concatenated and fed to a DNN for the final prediction. While separating both types of information allows processing each with an appropriate architecture, the model cannot learn how they interact until the final DNN. In its evaluation, FlowSeer shows similar or better prediction accuracy than RouteNet-Erlang, but it does not include comparisons of generalization to unseen network topologies.

Next, m4 [125] proposes a "flow-level simulator" based on GNNs. Specifically, their solution includes a flow generation module that generates TCP flows according to a user-specified workload. The model then uses a temporal and a spatial component to update the state and predict the flow's performance metrics, specifically FCT and throughput. The temporal component is based on a GRU [127] model, and the spatial component on a GraphSage [126] model. The spatial component also uses a hypergraph definition that considers flows and links only. Unlike RouteNet-Gauss or xNet, its temporal component is based on events (i.e., a flow starting or ending in the network) rather than fixed-size windows. This minimizes how many times the flow states need to be updated, although it makes the model unsuitable for predicting metrics such as packet delay over time. m4 also proposes learning secondary performance metrics during training, such as the flow's remaining data to retransmit, to provide the model with further information and improve its overall accuracy, which their evaluation confirms. Finally, m4 cannot model QoS queues.

5.2.3. Other DL Models

Due to their advantages, GNNs are the preferred architecture when building network models. However, authors have also experimented with other architectures and approaches.

An early example is xWeaver [128], where the authors propose using Convolutional Neural Networks (CNNs). Specifically, the model uses two separate CNNs to process the adjacency and traffic matrices, whose outputs are then concatenated and fed to a DNN to predict any relevant performance metric (in the evaluation, they focus on FCT and throughput). While using a CNN yields a more effective architecture than a DNN, as corroborated in its evaluation, it still raises some questions. For example, it is not clear whether the method is permutation invariant, that is, whether the order of the nodes in the traffic matrix impacts the results. Also, similar to FlowSeer, processing the topological and traffic information separately makes it harder for the model to learn their cross-interactions.

Another early approach is proposed in Deep-Q [129], which consists of training a Conditional Variational Auto-Encoder (CVAE) model [130].
Both the encoder and decoder of the CVAE, implemented through DNNs, are conditioned on (i.e., use as bias) the extracted traffic features. These features are extracted from the traffic matrices using a Long Short-Term Memory (LSTM) [25] network. A specialized loss definition is used to train both the LSTM and the CVAE. At inference time, only the LSTM and the decoder are used to predict the desired performance metrics. While this approach is accurate and has low inference time, it ignores the underlying network topology.

Later, in [131], the authors propose adopting the transformer architecture [132], popular at the time due to its successful use in LLMs. They argue that a pre-trained network performance model could later be fine-tuned to specific networks. They also argue that transformers, as powerful sequential models, can generate and use packet embeddings to make packet-level predictions. While they examine some transformer-based network model prototypes, they lack comparisons against SotA models.

A more robust implementation of a network transformer model can be found in DeepQueueNet [16]. It follows a modular approach: using small, cheap-to-simulate scenarios, it builds a library of DL models, each representing a network device. These are implemented through a Bidirectional LSTM (BiLSTM) and a transformer model. They take as input the sequence of packet arrival times at the device and output the updated sequence after the packets exit the device. Ultimately, most, if not all, of the devices in the DES are replaced by their DL counterparts. DeepQueueNet was directly inspired by MimicNet [15], a hybrid model that replaced parts of the DES simulation with DL but, unlike MimicNet, it is not constrained to a Fat Tree topology. However, DeepQueueNet's methodology does come with limitations.
For example, as noted by its authors, it cannot model the transient state of networks and, because packets are processed in batches, it cannot support stateful protocols like TCP. Furthermore, because of the batching, packets can be processed out of order; the authors do propose a solution, but it requires repeated inferences by the DL models, increasing the computational cost. In addition, both the BiLSTM and transformer architectures are notoriously computationally expensive. As a result, DeepQueueNet does not appear to reduce the computational effort relative to DES. Instead, because it is implemented with libraries like TensorFlow, it is better suited to run in a distributed fashion and even on specialized hardware like GPUs. Still, less computationally expensive methods, such as MimicNet, perform inference faster on the same hardware, with the difference growing with the size of the network topology.

Finally, in [133, 134], the authors propose a two-step process to build a network model: first training it using simulated network data, then efficiently adjusting it using a small dataset of real-world network data. The objective is to use transfer learning so that the model learns from simulated samples while remaining effective in real-world scenarios. The model itself follows the Neural Processes (NPs) [27] architecture, a type of DNN that focuses on learning the input's latent features, which eases learning. However, it also shares limitations present in DNNs, such as the lack of adaptability to topologies, routing configurations, and traffic profiles unseen during training.

5.3. Summary

While ML-based network models have existed since the early 2000s, it was not until the proliferation of deep neural network architectures, and specifically GNNs, that they became so dominant.
The flexibility of these architectures allows for designs specialized for network performance modeling, making them more accurate than analytical models while keeping inference costs low. This makes them especially appealing for active network management, where quick, accurate predictions are required (e.g., xWeaver [128], RouteNet [109], GLANCE [123]), where they are overtaking analytical models (discussed in more detail in Section 7). Traditional ML models also remain a viable option, as they offer even cheaper inference (e.g., CLAAP [106]).

ML models are not flawless, however. Unlike simulation or analytical models, ML models are constrained to scenarios similar to those seen during training (e.g., network topologies, traffic profiles, network protocols). While this can be mitigated through clever design (e.g., the model architecture, or how input data is encoded), no covered approach achieves the same degree of generalization as DES. Also, while faster than DES simulators, ML models usually cannot offer the same level of granularity in their predictions. Those that do, like DeepQueueNet [16], sacrifice their low inference cost to do so.

6. Hybrid Approaches

In this section, we discuss network models that combine two or more of the previously discussed approaches, summarized in Table 5. The aim is to combine their advantages and minimize their pitfalls. By nature, these models are the most heterogeneous in how they approach the modeling problem.

6.1. Model-tuned Emulation for Performance Modeling

Network emulators are powerful tools to validate and understand network dynamics. Unlike simulation, which focuses on defining the network state and understanding how it is updated, emulation replicates the behavior of the network and its devices through software.
However, emulation also replicates in software certain behaviors that were originally executed by hardware logic. This distorts the time taken to perform each of these operations, which severely limits its ability to predict network performance.

Model | Main type | Secondary type | Input scope | Output scope | Traffic type | Performance metrics | Evaluation
Pantheon [40] | Network emulator (Mahimahi [137]) | Bayesian optimization | Full network scenarios | Packet-level | Any | Full packet path information | Real data
iBox [41] | Network emulator with a queue model | Bayesian optimization | Full network scenarios | Packet-level | Any | Full packet path information | Real data
Prophet [138] | NUM analytical model [94] | Gradient descent | Full network scenarios | Flow-level | TCP¹ | Flow throughput | Simulated data
DeepTMA [139, 140] | NC (TMA [35]) | GGNN [23] | Full network scenarios | Flow-level (temporal) | Non-specific | Worst-bound packet delay | Simulated data
[39] | GraphSage [126] | NC (TFA [7, 8], SFA [30]) | Full network scenarios | Flow-level | UDP | Mean, min, max, 90th and 99th percentile packet delays | Testbed data
QT-RouteNet [38] | RouteNet [109] | M/M/1/B queue model | Full network scenarios | Flow-level | UDP | Average packet delay | Simulated data
GNNetSlice [141] | RGCN [32] | Unspecified QT model | Full network scenarios | Flow-level | UDP | Average packet delay, jitter, and loss | Simulated data
QINN [143] | DNN | M/G/1 queue model | Traffic-matrix scenarios | Flow-level | UDP and TCP (non-specific) | Avg. packet delay, throughput | Simulated data
[42] | ns-2 [28] | Flow queueing model [74] | Full network scenarios | Packet-level | UDP and TCP² | Full packet path information | Simulated data
MimicNet [15] | ns-3 [29] | LSTM [25] | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data
m3 [144] | flowSim [145] | Transformer (LLama2 [146]) | Full network scenarios | Flow-level | UDP and TCP (DCTCP, TIMELY, DCQCN, HPCC) | 99th percentile FCT | Simulated + Real data
CausalSim [43] | Trace simulation | Causal DNN | Full network scenarios | Packet-level | Any | Full packet path information | Simulated data
Sim2HW [44] | OMNET++ [36] | GraphSage [126] | Full network scenarios | Flow-level | UDP | Min, 25th, 50th, 75th, 90th, 99th, 99.9th, 99.99th, 99.999th percentile packet delays | Testbed data

¹ TCP flavors supported by Srikant's model [142].
² Secondary model accounts only for TCP.

Table 5: Summary of hybrid network performance models

The authors of Pantheon [40] identified this issue and proposed a model to fine-tune emulation software so that it accurately tracks traffic performance. It consists of fine-tuning emulation parameters, like propagation delay, using Bayesian optimization. Specifically, the parameters are adjusted, evaluated against captured Internet traces, and updated according to their error. This was done using the Mahimahi [137] emulation software. Their evaluation showed a tenfold decrease in prediction error. However, this approach remains limited by the small number of adjustable parameters, which results in error rates averaging 17% for the tested traces, and by the cost of repeatedly emulating the different scenarios.

A similar approach is that of iBox [41]. iBox models the network as a single bottleneck link with a FIFO, drop-tail queue. iBox considers two models, distinguishing between "reactive" and "non-reactive" cross traffic.
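The single-bottleneck abstraction used by iBox can be illustrated with a minimal sketch of a FIFO, drop-tail link. This is not iBox's implementation: the function name `simulate_droptail`, the buffer limit expressed in packets, and the omission of cross traffic are assumptions made for illustration only; in iBox, the cross-traffic parameters are derived from real traffic traces instead.

```python
from collections import deque


def simulate_droptail(arrivals, sizes, rate_bps, buffer_pkts):
    """Single bottleneck link with a FIFO, drop-tail queue (illustrative sketch).

    arrivals: packet arrival times in seconds, sorted in ascending order
    sizes: packet sizes in bytes
    rate_bps: link capacity in bits per second
    buffer_pkts: maximum number of packets in the system (queued + in service)
    Returns the per-packet delay (queueing + transmission), or None if dropped.
    """
    delays = []
    in_system = deque()  # departure times of packets still queued or in service
    last_departure = 0.0
    for t, size in zip(arrivals, sizes):
        # Packets that finished transmission by time t have left the system
        while in_system and in_system[0] <= t:
            in_system.popleft()
        if len(in_system) >= buffer_pkts:
            delays.append(None)  # buffer full: the arriving packet is tail-dropped
            continue
        start = max(t, last_departure)  # FIFO: wait until the link is free
        last_departure = start + size * 8 / rate_bps
        in_system.append(last_departure)
        delays.append(last_departure - t)
    return delays
```

For example, three 1000-byte packets arriving simultaneously at an 8 kbit/s link with a two-packet buffer see delays of 1 s and 2 s, and the third one is tail-dropped.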
iBox's authors define non-reactive traffic as those flows whose performance is not influenced by changing the specific congestion control protocol, and reactive traffic otherwise. While the non-reactive cross traffic can be modeled directly from parameters extracted from the network traces, the reactive cross-traffic parameters are learned through Bayesian optimization on real traffic traces, as in Pantheon. The queue model is executed within a network simulator or emulator to be evaluated; in their evaluation, the authors specifically used ns-2. iBox ultimately shares the benefits and limitations of its predecessor: by being built with real traffic traces, it attempts to minimize the impact of training with simulated traces [147, 148, 59]. However, the combination of Bayesian optimization's high computational cost and the model's simplicity may limit its practicality in real-world deployments.

6.2. ML + Analytical Hybrid Models

Generally, ML and analytical models offer similar strengths and weaknesses, i.e., quick inference times but lower accuracy than DES. While it may be counterintuitive to combine the two, successful applications can still be found in the SotA.

An early example is Prophet [138], which uses ML to solve the NUM model [94]. In the original NUM paper, the authors aimed to maximize TCP throughput, but their initial ideal model required prior knowledge of each flow's utility. Prophet instead approximates the expected utility using Srikant's unifying model [142]. Although this model also relies on an unknown, the scaling factor, Prophet approximates it through sampling and gradient descent. Sampling is done efficiently by approximating the flow parameters and grouping flows to reduce the number of flows to consider. However, these modifications limit the model's applicability. For example, the flow grouping is designed assuming a Clos topology.
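To make the NUM formulation concrete, the sketch below solves a small NUM instance with the textbook dual (price-based) gradient method, assuming weighted logarithmic utilities, i.e., maximizing Σ w_i log x_i subject to the load on each link not exceeding its capacity. It is not Prophet's algorithm, which relies on Srikant's model and an approximated scaling factor; the function name, step size, and iteration count are illustrative assumptions.

```python
def num_dual_gradient(routes, capacity, weights, step=0.01, iters=5000):
    """Network Utility Maximization with log utilities via dual gradient descent.

    Maximizes sum_i weights[i] * log(x_i) subject to, for every link l,
    the sum of the rates of flows crossing l being at most capacity[l].
    routes: one list of link indices per flow.
    Returns the per-flow rates after `iters` price updates.
    """
    prices = [1.0] * len(capacity)  # one Lagrange multiplier (price) per link
    rates = [0.0] * len(routes)
    for _ in range(iters):
        # Flow update: x_i maximizes w_i*log(x) - x * (sum of prices on its path)
        for i, path in enumerate(routes):
            path_price = sum(prices[l] for l in path)
            rates[i] = weights[i] / max(path_price, 1e-9)
        # Link update: raise the price of overloaded links, lower it otherwise
        for l, cap in enumerate(capacity):
            load = sum(rates[i] for i, path in enumerate(routes) if l in path)
            prices[l] = max(0.0, prices[l] + step * (load - cap))
    return rates
```

For two unit-capacity links, where flow 0 crosses both links and flows 1 and 2 cross one link each, the rates converge to approximately 1/3, 2/3, and 2/3, the proportionally fair allocation. Note how the solver needs the utility of each flow up front, which is precisely the prior knowledge that Prophet tries to approximate.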
Another approach is to use ML to improve the reasoning of an analytical model. In DeepTMA [139], a GNN model is used to predict the best tandem decompositions for the NC model. By acting as a heuristic, the GNN can improve the results of the NC model without incurring a significant penalty in inference time. Furthermore, the GNN model does not need to be perfectly accurate for the NC model to benefit from it. The authors expanded DeepTMA in [140] to allow the generation of decompositions, and performed a feature analysis to understand which features are the most significant for the GNN model. While this approach reliably increases the NC model's accuracy, it ultimately cannot address the inherent limitations of NC models (e.g., assuming feed-forward networks).

In contrast, in [39], the authors invert the dynamic, using the upper bounds obtained from an NC model as an additional input to a GNN model tasked with predicting the performance metrics. Their analysis leveraged a GraphSage [126] model, a generic GNN architecture, but its findings may generalize to more specialized GNN architectures. They also confirmed, as expected, that tighter NC bounds increase the accuracy of the resulting GNN model.

A similar approach was followed by the authors of QT-RouteNet [38]. In this case, the authors use a QT model of the network (an M/M/1/B model) to extract flow and link features, which are then used as input by a RouteNet [109] model. Doing so allowed it to generalize to topologies much larger than those seen during training: it was trained on topologies of up to 50 nodes and evaluated on topologies ranging from 51 to 300 nodes. Similarly, GNNetSlice [141] introduces network slicing information and approximate QT predictions, such as the expected maximum queuing delay and packet loss rate, as inputs to its model.
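As an illustration of the queueing-theory features mentioned above, the sketch below computes standard steady-state metrics of an M/M/1/B queue (Poisson arrivals at rate λ, exponential service at rate μ, room for B packets including the one in service). The stationary distribution is the truncated geometric π_n = ρ^n / Σ_{k=0..B} ρ^k with ρ = λ/μ, the loss probability is π_B, and the mean delay follows from Little's law using the admitted rate λ(1 − π_B). The exact features that QT-RouteNet or GNNetSlice derive are not detailed here; treating the loss probability and mean delay as per-link features is an assumption, and the function name is hypothetical.

```python
def mm1b_metrics(arrival_rate, service_rate, capacity):
    """Steady-state metrics of an M/M/1/B queue (arrival_rate must be > 0).

    capacity (B) counts the packet in service plus those waiting.
    Returns (loss probability, mean number in system, mean sojourn time).
    """
    rho = arrival_rate / service_rate
    # Truncated geometric stationary distribution pi_n, n = 0..B
    # (normalizing explicitly also covers the rho == 1 case)
    weights = [rho ** n for n in range(capacity + 1)]
    total = sum(weights)
    pi = [w / total for w in weights]
    loss = pi[capacity]  # arrivals that find the system full are dropped
    mean_in_system = sum(n * p for n, p in enumerate(pi))
    admitted_rate = arrival_rate * (1.0 - loss)  # only non-dropped packets
    mean_delay = mean_in_system / admitted_rate  # Little's law
    return loss, mean_in_system, mean_delay
```

For example, a link loaded at ρ = 0.5 with B = 2 drops 1/7 of its arrivals, and admitted packets spend on average 4/3 service times in the system. Such closed-form values are cheap to compute for every link before inference, which is what makes them attractive as extra GNN inputs.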
GNNetSlice's model is a Relational GCN (RGCN) [32] designed to measure the impact of network slicing on performance. In its evaluation, it outperformed RouteNet-Fermi [17] when predicting the average packet loss rate, delay, and jitter.

Later, in the Queue-Informed Neural Network (QINN) [143], the integration is expanded to include a queue model in the model's loss function. Specifically, a DNN model predicts both the average queuing delay and the throughput; the latter is then used by an M/G/1 queue model to obtain a second queuing delay prediction. The loss function computes the loss over both predictions. While the authors used a DNN, this approach is compatible with other NN architectures.

Ultimately, all of these approaches are a reliable way of improving a model's accuracy, but they cannot resolve ML's inherent limitations.

6.3. Accelerated DES

An alternative hybrid approach for network modeling, and arguably one of the most popular in the SotA nowadays, consists of reducing the computational cost of DES by replacing some elements with a faster alternative. The main objective is to maintain DES's benefits (mainly its accuracy) while minimizing its computational cost.

The first model to attempt this was presented in [42], where the authors combined a DES simulator (ns-2) with the fluid model in [74]. The fluid model only covered TCP flows in the network's core. The paper introduces rules for how packets are updated when they cross the core, according to the values of the resolved fluid model, along with synchronization rules that prevent the states of the DES and the fluid model from diverging. Limitations include the fluid model's higher error rate and its restriction to TCP flows. Furthermore, the translation between fluid and simulated traffic can result in predictions that generally hold but do not offer sufficient granularity.
For example, the fluid model may accurately predict that a percentage of packets will be dropped at a given time, but it cannot predict exactly which packets are dropped; instead, these are selected at random.

It took nearly two decades for a new iteration of this idea to be proposed, in the form of MimicNet [15]. Rather than an analytical model, MimicNet uses an LSTM model to replace parts of the network topology in the DES. The approach exploits the symmetry present in fat tree topologies, commonly used within data centers: first, it simulates a two-branch fat tree topology to train its LSTM model. During inference, it only simulates a single branch, while the rest are replaced with LSTM replicas. The LSTM models predict whether packets in the replicated branches are dropped or forwarded according to their expected behavior. By leveraging the symmetry of the topology, MimicNet remains accurate and computationally efficient. However, this results in some rigid assumptions. First, the network is assumed to be a failure-free fat tree topology, with congestion only present in the fan-in towards the flow's destination. Second, traffic patterns are expected to "scale proportionally to the size of the network", as otherwise faithful models cannot be trained using the smaller, two-branch topology.

Inspired by Parsimon [57], the authors of m3 [144] propose splitting the network simulation into path-level simulations. Specifically, paths are simulated separately, each considering its own set of foreground and background flows. Foreground flows are simulated quickly and in parallel using flowSim [145], while the impact of background flows is approximated using a LLama2 transformer model [146]. These results, along with additional scenario context (e.g., the congestion control algorithm), are concatenated and fed to a DNN to obtain corrected approximations of each flow's FCT.
The main advantage of m3 relative to Parsimon is that path-level independence is a weaker assumption than link-level independence. In m3's evaluation, it proved more accurate than its inspiration and faster at inference, despite including the relatively large LLama2 model. However, unlike most simulators, it does not maintain packet-level visibility. Instead, like Parsimon, m3 is designed for aggregated tail performance metrics, such as worst-case TCP throughput. Finally, it supports congestion control protocols by parametrizing them and adding them to the input features of its final DNN. Consequently, it only supports those protocols seen during training.

6.4. Simulation with DL-Enhanced Accuracy

As we discussed, while network simulation is perceived as the most accurate approach for performance modeling, it is still subject to some inaccuracies [58, 40]. Consequently, some approaches attempt to use DL to enhance the simulation's accuracy so that it better matches reality. Unlike the models in Section 6.1, these are applied to simulators, not emulators, and they are complemented with DL models.

One example of this is CausalSim [43]. It relies on trace simulation, a faster yet less accurate alternative to DES, and uses ML to improve its accuracy. Trace simulation is a variant of DES where only a subset of the system is simulated, while the other segments are replaced with traffic traces. However, it assumes that the traces' contents are independent of the rest of the simulation, which rarely holds. Consequently, CausalSim proposes the use of a causal DNN model to adapt the traces according to the system behavior, improving accuracy. However, trace simulation is meant to simulate the impact of specific small changes in "what-if" scenarios.
Larger changes result in fewer elements being replaced by traces, so the simulation becomes increasingly similar to standard DES.

Recently, in Sim2HW [44], the authors use an expanded GraphSage [126] model to correct the network performance predictions given by OMNET++ [36]. To train the model, simulated network scenarios were replicated in a testbed to obtain their ground truth. While this approach may enhance the simulator's accuracy in replicating real-world traffic, it does not address DES's main issue: its high computational cost.

6.5. Summary

Ultimately, hybrid models are characterized by their diversity and practical nature. By combining existing, proven approaches, their authors obtain stronger network models. This includes expanding emulation with network models to support performance modeling (e.g., Pantheon [40] and iBox [41]), applying ML to accelerate DES (e.g., MimicNet [15] and m3 [144]) or to correct its outputs (e.g., Sim2HW [44]), using ML-based heuristics to improve NC models (e.g., DeepTMA [139, 140]), and, conversely, using NC and QT to improve ML model training (e.g., QINN [143]) or to provide additional input information (e.g., [39], QT-RouteNet [38], GNNetSlice [141]).

The biggest benefit of these approaches is the ability to combine the strengths of their constituent approaches. For example, when using ML to accelerate DES, the resulting model can ideally retain DES's accuracy and granularity while benefiting from lower computational costs. However, this comes at the risk of inheriting their weaknesses as well; for example, any hybrid approach that includes an ML model requires that model to be trained.

7. Discussion on Identified Trends and Challenges within Network Performance Modeling

In this section, we discuss the trends and challenges identified in current network performance models. This section is also meant to expand on earlier discussions [147, 148, 59].

7.1. Balance Between Accuracy, Resolution, Applicability, and Inference Cost

An ideal network performance model should be accurate, expressive (i.e., offering granular predictions, ideally packet-level), applicable in general scenarios, and with a low computational cost. In practice, however, current network performance models cannot guarantee all of these properties.

Family | Approach | Accuracy | Expressiveness | Applicability | Computational cost
Simulation | DES | High | Packet-level | General | High (and sequential)
Simulation | PDES | High | Packet-level | General | High
QT | Discrete | Low | Flow-level | Specific TCP version | Low
QT | Fluid | Low | Flow-level (temporal) | Specific TCP version | Low
NC | ADNC | Medium | Flow-level (temporal) | Feed-forward networks | Medium
NC | ODNC | High | Flow-level (temporal) | Feed-forward networks | High
ML | Shallow ML | Medium | Flow-level (temporal and non-temporal) | On trained topologies and traffic profiles | Low
ML | DL | High | Flow-level (temporal and non-temporal) | On trained traffic profiles | Medium
Hybrid approaches | ML+Analytical | High | Flow-level | On trained traffic profiles | Medium
Hybrid approaches | Accelerated DES | High | Flow-level / Packet-level | On trained traffic profiles; topology support varies | High

Table 6: Summary of current network performance modeling approaches.

This is reflected in Table 6, where the different approaches are summarized and qualitatively compared. On the one hand, DES tends to be the preferred option for network modeling, but its cost makes it unfeasible in many scenarios. PDES addresses this, not by reducing its cost, but by allowing the simulation to be spread across more cores. This does allow it to simulate larger scenarios, but it requires larger amounts of computing power, which remains impractical. As a result, research in DES and PDES focuses on reducing the cost while maintaining their other benefits. On the other hand, both analytical and ML models tend to be computationally inexpensive.
Instead, research on these models focuses on improving their accuracy, their expressiveness, and the range of scenarios in which they are applicable, while retaining their low computational cost.

Hybrid approaches, in turn, try to split the difference by synthesizing different approaches. For instance, accelerated DES methods seek to reduce computational cost while preserving the strengths of traditional DES. DL techniques have also proven flexible, as the surveyed examples include quicker models with reasonable accuracy (e.g., RouteNet-Fermi [17]) as well as more complex, complete models akin to DES (e.g., DeepQueueNet [16]).

However, current approaches still fall short of achieving all four characteristics. For example, MimicNet [15] is only applicable to fat tree topologies, while m3 [144] loses packet-level visibility and can only predict high-percentile FCT. Furthermore, approaches such as ML+Analytical Hybrid Models tend to offer small yet significant improvements in accuracy but do not address the other fundamental constraints present in either approach.

As a result, the optimal approach will depend on its expected application. If time and computational resources are not a constraint, DES remains the best option. However, if the network performance prediction is expected to be integrated into a more complex, time-critical application (or a quicker, approximate prediction is simply desired), an analytical or ML model is better suited. Alternatively, a model that is meant to be applied only to a specific network with a static topology may not need the same degree of applicability as a general-purpose approach like DES. Ultimately, as also argued in [148], researchers must consider their use case when deciding which measurements to use as input, which aspects of the network should be modeled, and which properties are most relevant.
The differences between approaches are also reflected in how the field tries to develop better network models. While some researchers focus on reducing the costs of the heavier, more accurate models, others try to improve the accuracy of the more efficient ones. Furthermore, it is not unreasonable to believe that such an "ideal" model is impossible to build in the first place, as some of these properties conflict with others. For example:

• An expressive model should predict the behavior of individual packets, yet their number in modern networks is measured in the billions [58]. Hence, obtaining packet-level predictions, without aggregation or summarization, will be inherently expensive by sheer scale.
• Conversely, aggregation implies losing information, as sequences of packets are reduced to a fixed set of values, resulting in a potential loss of accuracy.
• Analytical and statistical models rely on assumptions to be accurate. However, these assumptions may constrain the range of scenarios to which the models can be applied. Nonetheless, without such assumptions, the models fail to characterize network traffic and its behavior (no-free-lunch theorem [149]).

7.2. The Dominance of DES and the Surge of GNNs

Traditionally, DES has been regarded by network operators as the gold standard for network performance modeling. This is exemplified by the success of major DES simulators like ns and OMNET++, as well as by the fact that most non-DES network performance models are evaluated using, or compared against, simulated data. Referring back to Figure 1 in the introduction, simulation is the only type of model that has received constant attention over the last three decades. This can be explained by the fact that simulation is both one of the most accurate options and capable of offering packet-level predictions. However, since 2018, we have seen a surge in DL network performance models.
This is in part due to the success of GNN architectures. Their design exploits the relational information in computer networks to their advantage, proving to be accurate and computationally inexpensive. While they do not offer packet-level granularity, they can be augmented to include a temporal resolution [58, 122, 125]. This has resulted in their dominance as the most used DL architecture for building network models: 17 out of the 29 (≈ 59%) pure ML models identified in the survey are based on GNNs. They are also quite relevant in hybrid approaches: 5 out of the 12 hybrid models identified included a GNN model.

Note that this prevalence is not universally shared across the entire community. Let us consider models published since 2018 in ACM SIGCOMM and IEEE INFOCOM, the two CORE A* conferences in computer networking. On the one hand, IEEE INFOCOM does reflect the prevalence of GNNs, with two out of the three models considered being GNN-based [17, 122] and the remaining one being a hybrid method involving ML [138]. On the other hand, ACM SIGCOMM instead favors PDES and similar methods: out of the seven models published since 2018, three are PDES proposals [18, 53, 56], and another two are DL-accelerated DES hybrid approaches [15, 144]. Of the remaining two models, only one is a GNN [110]. The other, DeepQueueNet [16], is a transformer-based model whose design mimics DES reasoning, down to reasoning over individual packets. These discrepancies suggest that different parts of the community prioritize different aspects of network performance modeling, and hence prefer approaches whose strengths align with those priorities.

7.3. Reduced Interest in Analytical Models

While there has been a rise in DL-based models, specifically GNN architectures, it has come at the cost of slowing the development of newer analytical models.
Out of the models we have surveyed, only one purely analytical model was published in the last 5 years [66], with the previous one being published in 2017 [35]. We have identified some potential reasons why.

First, the most straightforward explanation is that ML (and DL) models offer similar benefits while outperforming analytical models. Initially, the main advantage that analytical models offered over DES was their reduced computational cost, at the expense of less expressive and less accurate results. Nowadays, ML models also offer accurate and computationally inexpensive predictions, placing them in direct competition with analytical models.

Second, unlike ML, analytical models try to explicitly define complex network behavior through sets or systems of equations. ML models either treat networks as black boxes and try to predict their performance through regression (e.g., [37] and CLAAP [106]), or explicitly represent some existing dependencies within the network but still rely on the training process to fully develop them (e.g., RouteNet [111]). By contrast, analytical models must be completely formulated by their creators, which involves defining their assumptions and mathematically proving their validity or error bounds. In turn, this makes analytical models more rigid, less future-proof, and arguably harder to build overall.

An example of this is how different approaches support multiple versions of TCP simultaneously. This is an important aspect for network models, as TCP implementations evolve, and it is not uncommon for several versions to coexist in the same network [147]. In analytical models, authors either assume a "generic" TCP version, which results in inaccuracies as it fails to capture the differences between implementations [78, 71, 72], or are forced to reformulate segments of the model for each version they support [70, 74].
In contrast, DL models that support multiple TCP versions usually differentiate between versions through a one-hot encoded vector [119, 125]. This means that DL models can easily be extended to more TCP versions, as long as the authors have recorded scenarios to train them with. Moreover, as new DL architectures keep being developed, researchers will continue to explore their potential as network performance models, as happened recently with the transformer architecture [131, 16]. Altogether, this may explain why research into new analytical models has slowed down and why this effort has instead moved into the development of ML models.

7.4. Adapting to Changing Networks

One of the biggest challenges of network performance modeling, identified early on, was that of describing the Internet as an "immense moving target" [147, 59]. These works identified the dynamic nature of networks, how they change over time in terms of traffic patterns, usage, and implemented protocols, and how this could challenge the network models of the time. This remains an ongoing challenge: since the 2000s, Internet usage has continued to increase, data centers rely on newer congestion control protocols like DCTCP [34] and DCQCN [21], and wireless networks are more common and complex.

At the time, the solutions proposed to address this challenge were based on the modeling approaches prevalent then. Analytical models were seen as the most vulnerable, as they tend to rely more on network assumptions or, in the case of queuing and fluid models, were designed with a given TCP version in mind. One solution proposed back then was searching for invariant properties, i.e., aspects or properties that remain constant across different network topologies, sizes, and usages [147, 59].
Examples include the self-correlation in packet inter-arrival distributions, or the heavy-tailed distributions present in metrics such as packet delay, RTT, or FCT.

Another solution proposed was the push for modular, interoperable models [148]. The idea was to avoid building monolithic network models capable of individually addressing any scenario. Instead, it proposed designing network models to be interoperable, that is, so that they can be combined and feed into each other. This allows models to be gradually updated to newer developments, like newer TCP versions. This approach is better suited for DES simulators, as these can be extended to cover new protocols and devices as they are released. Open-source simulators like ns [28, 29] and OMNeT++ [36] have remained up to date thanks to their communities. Furthermore, modular PDES like SimBricks [53] and SplitSim [54] use their modularity both to extend their coverage of network features and to enable parallelization. Finally, DL models offer new ways to address this issue, which we cover in Section 8.3.

7.5. Simulation-Dominated Evaluation

Another trend in network modeling is the prevalence of simulated data in evaluations. This is reflected in Figure 3, which shows how models across the different approaches are evaluated. Overall, simulated data dominates: it is used in over half of the surveyed models and is the most used evaluation type across all approaches except simulation itself; even among simulators, comparing against other simulators remains the preferred evaluation choice.
This is because most sim ulators surv ey ed are published through white pap ers, whic h describ e the sim ulation implemen tation but 57 Simulation Analytical Models ML Models Hybrid Appr oaches T otal 10 10 5 21 17 8 51 1 9 2 12 2 3 3 8 13 1 14 Analytical Simulated-data T estbed-data R eal-data No evaluation / Other Figure 3: Ev aluation categories across different model t yp es. If a model is ev aluated with m ultiple categories, they are categorized in the follo wing priorit y: real data, testb ed data, sim ulated data, and analytical. otherwise do not measure their accuracy . Otherwise, w e see a minority of mo dels being ev aluated with testbed or captured data in ML and h ybrid approac hes, while another significan t minority of samples are just ev aluated analytically in the case of the analytical mo dels. Ov erall, there are strong reasons for this. First, captured data is scarce and not alw a ys a v ailable. Unlik e it, sim ulated data can b e generated to co v er an y desired scenario, allo wing for more div erse datasets. Second, captured data ma y also b e limited by the features that the data’s authors were able to capture, while in sim ulation, the entire state of the net work and traffic is presen t. F urthermore, testb ed net w orks require a monetary cost to set up and later upgrade (e.g., changing devices). Mean while, sim ulation softw are can b e run on generic computing devices without additional hardware. Ho w- ev er, sim ulation has sev eral disadv an tages, as commen ted bac k in Section 3. First, its high computational cost limits the size of the net w orks or traffic in tensities used in the ev aluation. Hence, whenever the mo del is applied to these challenging scenarios, it ma y p erform w orse than expected. There is also the issue where simulation ma y not cov er sp ecific proto cols and netw ork devices, or without the required accuracy [59, 60, 40, 58]. 58 7.6. 
Heter o gene ous Appr o aches, Heter o gene ous Evaluations Another challenge present in net work p erformance mo deling is a lac k of common ev aluation pro cedures. W e b eliev e that this is due to the heterogene- it y of mo deling approaches comp ounded by the scarcity of public datasets. The differences in net work p erformance modeling approac hes allo w for the creation of mo dels with different strengths and goals. Suc h versatilit y p er- mits net work op erators to c ho ose the mo deling approac h that b est fits their needs. First, mo dels ma y differ on the performance metric they measure, whic h dep ends on the traffic type they are mo deling (e.g., traffic from a spe- cific proto col like TCP). This also influences what data they tak e as input, as some use solely traffic traces, while others can include the entire netw ork in their reasoning. Later, some mo dels can b e only applicable under certain constrain ts (e.g., NC only applicable to feed-forw ard net w orks, or Mimic- Net [15] only to fat tree top ologies), which also limits common scenarios where they can b e compared to other mo dels. Even when considering the same metric, the granularit y of their outputs ma y also condition ho w mo dels are ev aluated. F or example, mo dels that predict on a pack et-lev el, or even a flo w-level with a temp oral comp onen t, must see their predictions aggregated to b e compared against mo dels offering flow-lev el predictions. F urthermore, ev en comparing pack et-lev el outputs against each other may b e complicated. F or example, one mo del ma y predict a giv en delay for a pac ket that, according to the ground truth, was lost. This, in turn, makes us rely on error metrics lik e the W asserstein metric to compare the distribution of the predictions. While useful, such metrics lac k the in tuitiveness b ehind the obtained result. 
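The comparison issues above can be made concrete with a small sketch: packet-level predictions are first aggregated into per-flow mean delays (skipping lost packets), which can then be scored with MAPE, while the raw delay distributions are compared with the 1-D Wasserstein distance. For two equally sized samples, this distance reduces to the mean absolute difference between their sorted values. The function names and the (flow_id, delay) input format, with `None` marking a lost packet, are illustrative assumptions rather than any surveyed model's evaluation code.

```python
def flow_mean_delays(packets):
    """Aggregate packet-level predictions into per-flow mean delays.

    packets: iterable of (flow_id, delay) pairs; delay is None for lost packets.
    """
    sums, counts = {}, {}
    for flow, delay in packets:
        if delay is None:  # lost packets have no delay to average
            continue
        sums[flow] = sums.get(flow, 0.0) + delay
        counts[flow] = counts.get(flow, 0) + 1
    return {flow: sums[flow] / counts[flow] for flow in sums}


def mape(true_values, predictions):
    """Mean Absolute Percentage Error, in percent (true values must be nonzero)."""
    errors = [abs(t - p) / abs(t) for t, p in zip(true_values, predictions)]
    return 100.0 * sum(errors) / len(errors)


def wasserstein_1d(sample_a, sample_b):
    """1-D Wasserstein (earth mover's) distance between two equally sized samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch only handles equally sized samples")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)
```

For unequal sample sizes or weighted samples, SciPy's `scipy.stats.wasserstein_distance` computes the same metric. Note that the resulting distance is expressed in the units of the delays themselves and only becomes meaningful when compared with other models evaluated on the same data, whereas the MAPE is a relative percentage.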
Unlike metrics such as the Mean Absolute Percentage Error, which are easily understood, there is no clear way of interpreting whether a given Wasserstein metric value can be regarded as a “good” or “bad” result. At most, it can be used to compare models, knowing that lower values mean predictions closer to the ground truth.

Note that these difficulties also apply to models that offer flow-level predictions with a temporal component. This may be exacerbated by the fact that these models may consider their temporal component at different scales (e.g., predictions every second versus every millisecond). Even worse, models may define their temporal components differently. For example, while RouteNet-Gauss [58] may consider fixed-length windows, the temporal component in m4 [125] is event-driven.

Finally, all of these difficulties are worsened by the lack of publicly available network data. This is the case for several reasons. In the case of real-world captured data, only a few groups and companies have access to such data, or the capability to capture it. Furthermore, publishing real data from real users may pose privacy risks and requires anonymization procedures before it can be safely made public. Alternatively, testbed networks are rare due to their costs; hence, by extension, datasets published from them are also rare.

8. Future Directions and Opportunities

In this section, we follow up on the previous discussion with our predictions on future research directions for network performance modeling. These are based on our expected evolution of current trends and how current challenges may be addressed.

8.1. Consolidation of PDES and GNN Models

First, from the conclusions obtained in Sections 7.2 and 7.5, we can derive that DES is one of the most desirable modeling approaches available to network operators nowadays, if not the most desirable.
Unsurprisingly, there has lately been a push by researchers to address the challenge of PDES design. Hence, we believe that PDES research will continue to consolidate its position as the new “baseline” method, replacing traditional (single-process) DES simulators. This means reducing the synchronization overhead while retaining correctness. We believe that the increased availability of computing power [3] means that computational efficiency will not be as much of a priority as the ability of the PDES to exploit the available resources. The most recent PDES surveyed, NSX [56], exemplifies this, proposing a highly distributed simulation that can run on the same specialized hardware used to train AI models.

By analyzing the SotA, we can also conclude that DL models, and specifically GNNs, are well-positioned to become the main alternative to simulation. They can be applied to scenarios where PDES cannot, such as those with restrictions on the computing resources available. For the remaining scenarios, DL models are orders of magnitude faster than PDES while also benefiting from parallelization and specialized hardware, thanks to libraries like TensorFlow and PyTorch. Among DL architectures, specialized GNNs for networking have proved to be the most effective, being cost-effective, accurate, and more robust than the alternatives.

However, there is still progress to be made for GNNs. For example, support for congestion control algorithms [119, 118] or for the temporal component [58] is still fairly recent. Also, current models can only faithfully predict traffic patterns seen during their training. Ultimately, addressing these issues will increase the number of scenarios GNNs can be applied to and establish them as a reliable alternative to DES.

8.2. Analytical Models Enhancing DL Models

In Section 7.3, we discussed how research into analytical models is receiving less attention, as ML and DL network models are proving to be more cost-effective. However, we have seen how analytical models are still being used in conjunction with other techniques. An example of this is how both QT and NC models have been used to produce heuristic inputs that are fed to a DL model to improve its accuracy [39, 38]. Recently, studies have also shown that they can be used to define more informed loss functions, further improving the training process [143].

Ultimately, analytical models still offer inexpensive, good approximations of the expected performance that more complex network models can later refine. The proposed approaches also benefit from the fact that they can be applied independently of the underlying ML architecture, easing their implementation. Further research in this area can lead to improved ML models that retain their cost-effectiveness. This includes studying other ways in which the analytical model can be integrated into the DL architecture beyond the input features and loss functions, as well as how to incorporate more complex analytical models without incurring excessive computational costs.

8.3. ML as a New Tool for Evolving Networks

Section 7.4 discussed the challenge of the ever-changing nature of networks, and how to address models becoming outdated because of it. Proposals in the past include models focusing on truly invariant properties of networks or designing systems of interoperable models.

In addition to these proposals, we believe ML, and specifically DL, has unlocked a new way to handle changing networks. Unlike analytical models or simulations, an ML model does not explicitly encode the network dynamics. Instead, it learns these dynamics from its training data.
Hence, assuming an appropriate ML architecture to learn such network dynamics, such as a GNN, it is reasonable to believe that the architecture itself does not need to be modified to adapt to changes in the network. Instead, the ML model would have to be re-trained, a process that, while requiring some computational effort, no longer requires the expert knowledge needed to adapt an analytical model, implement a simulator, or design an ML architecture.

We refer to this new approach as “design once, train as necessary”, and it has already been proposed in papers such as [131]. Besides adapting to evolving network conditions, it would also allow for designing a general architecture valid for many specific use cases, depending on how it is trained. Furthermore, re-training may be less costly than training from scratch by exploiting transfer learning, as done in existing works [123, 133, 134]. At its best, this approach combines the advantages of a universal design for building network performance models with an implementation specific to each use case, adapted to its needs.

Nonetheless, this approach still has some drawbacks that future research must address. First, with the current formulation, the model would still require new traffic measurements for its readjustment. While transfer learning would reduce the amount of data needed, it can still be a costly process. Also, until the model is adjusted, it cannot be expected to work accurately. Hence, DL models should still be designed to be as generalizable as possible, to reduce the amount of retraining needed.

Another risk is the gradual degradation of the model's accuracy introduced by minor changes, rather than a single significant change.
The issue with this is that gradual degradation may be harder to identify, and hence, operators may be working with an inaccurate model without realizing it. This may be mitigated with continuous model evaluation, although that requires periodic network measurements.

8.4. Data Center-Centric Designs

The rise in data center demand, and its focus on ML-related workloads, is an opportunity to develop more specific, data center-centric network models. Specifically, data centers already share common, well-researched topologies (e.g., Fat Trees [150]), and future demand is currently expected to be driven mainly by the training and usage of Large Language Models or similarly large DL models [3]. Altogether, most future wired network scenarios will likely involve networks with similar topologies, hardware, and even traffic patterns, as they will be dedicated to the same ML-related use cases.

Consequently, this homogenization introduces common properties across data center networks. These can be leveraged, in the same way as other invariant properties, to simplify model design without compromising model accuracy. While such models are highly specialized, given the rising demand and importance of data centers, this is a sensible trade-off.

Among the surveyed models, we have already found models that benefit from the increased demand in data centers. MimicNet [15] is a powerful network model, being accurate, cost-effective, and granular, but it is only applicable to Fat Tree topologies. NSX [56] is a network simulator purposefully built to take advantage of modern data centers, being designed to run concurrently on multiple GPUs.

8.5. Better Usage of Simulation Data for Training Real-World Models

Back in Section 7.5, we discussed the prevalence of DES in the evaluation of network models and the issues arising from doing so.
It is worth noting that evaluating models (and training them, in the case of ML-based ones) on simulated data may not lead to accurate results when addressing real-world traffic. Hence, future work must find ways to reduce this discrepancy.

Currently, one way this may be addressed is by improving DES itself. For example, more efficient PDES simulators can be used to simulate scenarios too large for traditional DES to handle. There is also work like [44], where the DES's output is corrected using a surrogate ML model.

Another popular approach, specifically when building ML models, is framing the discrepancies between simulation and reality as a transfer learning problem. Transfer learning is a set of techniques meant to exploit models trained for a given task to assist in the training of models for a second, related task. In this case, transfer learning would consist of using ML models trained with simulated data to assist in training models meant for real-world traffic. This allows the latter to require fewer samples to build and to benefit from additional accuracy. Among the surveyed works, [133, 134] effectively use simulated samples through transfer learning, but this relies on a specific architecture that benefits from it. In the recently published [151], transfer learning was successfully applied to a RouteNet-Fermi [17] model.

9. Conclusions

In conclusion, this survey analyzes the evolution of network performance modeling over the last decades. We have identified 95 unique network performance models spread across multiple conferences and journals. By identifying the taxonomy of modeling approaches, we can gain a deeper understanding of the evolution of priorities within the research and professional community. For example, we observed an evolution in preferred methodologies, as models have transitioned from analytical models to ML and hybrid approaches.
From the surveyed models, we have recognized the properties sought after by network operators: accuracy, expressiveness (the level of detail in the results), applicability in any plausible scenario, and low computational cost. While different approaches focus on one or several of these properties, no surveyed network performance model can achieve all of them simultaneously. Admittedly, it may be impossible for all of these properties to be reached simultaneously, as they impede each other. For example, accurate and expressive models tend to be more complex, and thus more costly.

Consequently, this leads to the heterogeneity and diversity of the available approaches in the network performance modeling SotA. On the one hand, such heterogeneity allows the design of specialized models, allowing researchers to optimize the properties that may be most relevant to the scenario at hand. On the other hand, it also makes the comparison between approaches harder, which is then compounded by the limited availability of public networking datasets.

We have also discussed other open problems in network performance modeling. First, we have the issue of ever-evolving networks, a problem posed over 20 years ago, which still challenges the ability to build models applicable to future networks. Second, the majority of the identified network performance models rely on simulated data for their evaluation, which may compromise their expected effectiveness when applied in real-world scenarios.

Finally, we have identified potentially fruitful research directions that are just starting to be explored. ML-based models and transfer learning are a promising approach to address changing networks. Advances in PDES may result in feasible simulations of large network topologies.
Furthermore, we expect that other trends, like the increased demand for data centers, will shape future research.

Acknowledgments

This publication is part of the I+D+i project titled BLOSSOMS, grant PID2024-158530OB-I00, funded by MICIU/AEI/10.13039/501100011033/ and by ERDF/EU. This work is also partially funded by the Catalan Institution for Research and Advanced Studies (ICREA). Carlos Güemes is funded by the AGAUR-FI ajuts (Grant Ref. 2023 F-1 00083) Joan Oró of the Secretariat of Universities and Research of the Department of Research and Universities of the Generalitat of Catalonia and the European Social Plus Fund.

CRediT authorship contribution statement

Carlos Güemes-Palau: Conceptualization, Investigation, Visualization, Writing - Original Draft, Writing - Review and Editing
Miquel Ferriol-Galmés: Writing - Original Draft, Writing - Review and Editing
Jordi Paillisse-Vilanova: Writing - Original Draft, Writing - Review and Editing
Pere Barlet-Ros: Writing - Review and Editing, Supervision
Albert Cabellos-Aparicio: Supervision, Writing - Review and Editing, Funding acquisition

Declaration of competing interest

Carlos Güemes-Palau reports financial support was provided by Spain Ministry of Science and Innovation. Miquel Ferriol-Galmés reports financial support was provided by Spain Ministry of Science and Innovation. Jordi Paillisse-Vilanova reports financial support was provided by Spain Ministry of Science and Innovation. Pere Barlet-Ros reports financial support was provided by Spain Ministry of Science and Innovation. Albert Cabellos-Aparicio reports financial support was provided by Spain Ministry of Science and Innovation. Carlos Güemes-Palau reports financial support was provided by Generalitat de Catalunya Ministry of Research and Universities. Carlos Güemes-Palau reports financial support was provided by European Social Plus Fund.
Pere Barlet-Ros reports financial support was provided by Catalan Institution for Research and Advanced Studies. Albert Cabellos-Aparicio reports financial support was provided by Catalan Institution for Research and Advanced Studies. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

[1] J. Pedro, J. Santos, J. Pires, Performance evaluation of integrated OTN/DWDM networks with single-stage multiplexing of optical channel data units, in: 2011 13th International Conference on Transparent Optical Networks, 2011, pp. 1–4. doi:10.1109/ICTON.2011.5970940.
[2] Abilene network [archived in Wayback Machine]. URL https://web.archive.org/web/20120324103518/http://www.internet2.edu/pubs/200502-IS-AN.pdf
[3] IEA, Energy and AI (Jul 2025). URL https://www.iea.org/reports/energy-and-ai
[4] J. Popoola, R. A. Ipinyomi, Empirical Performance of Weibull Self-Similar Tele-traffic Model, International Journal of Engineering and Applied Sciences 4 (8) (3 2017).
[5] M. Alasmar, R. Clegg, N. Zakhleniuk, G. Parisis, Internet Traffic Volumes are Not Gaussian—They are Log-Normal: An 18-Year Longitudinal Study With Implications for Modelling and Prediction, IEEE/ACM Transactions on Networking 29 (3) (2021) 1266–1279. doi:10.1109/TNET.2021.3059542. URL https://ieeexplore.ieee.org/document/9361437/
[6] J. R. Jackson, Jobshop-like Queueing Systems, Management Science 10 (1) (1963) 131–142. URL http://www.jstor.org/stable/2627213
[7] R. Cruz, A calculus for network delay. I. Network elements in isolation, IEEE Transactions on Information Theory 37 (1) (1991) 114–131. doi:10.1109/18.61109. URL http://ieeexplore.ieee.org/document/61109/
[8] R. Cruz, A calculus for network delay. II. Network analysis, IEEE Transactions on Information Theory 37 (1) (1991) 132–141. doi:10.1109/18.61110. URL http://ieeexplore.ieee.org/document/61110/
[9] Q. Mao, F. Hu, Q. Hao, Deep learning for intelligent wireless networks: A comprehensive survey, IEEE Communications Surveys & Tutorials 20 (4) (2018) 2595–2621. doi:10.1109/COMST.2018.2846401.
[10] Y. Shi, L. Lian, Y. Shi, Z. Wang, Y. Zhou, L. Fu, L. Bai, J. Zhang, W. Zhang, Machine learning for large-scale optimization in 6G wireless networks, IEEE Communications Surveys & Tutorials 25 (4) (2023) 2088–2132. doi:10.1109/COMST.2023.3300664.
[11] R. Verdecchia, L. Scommegna, B. Picano, M. Becattini, E. Vicario, Network Digital Twins: A Systematic Review, IEEE Access 12 (2024) 145400–145416. doi:10.1109/ACCESS.2024.3453034.
[12] M. Fidler, Survey of deterministic and stochastic service curve models in the network calculus, IEEE Communications Surveys & Tutorials 12 (1) (2010) 59–86. doi:10.1109/SURV.2010.020110.00019.
[13] Y. Jiang, Y. Liu, Stochastic Network Calculus, 1st Edition, Springer Publishing Company, Incorporated, 2008.
[14] W. Jiang, Graph-based deep learning for communication networks: A survey, Computer Communications 185 (2022) 40–54. doi:10.1016/j.comcom.2021.12.015. URL https://www.sciencedirect.com/science/article/pii/S0140366421004874
[15] Q. Zhang, K. K. W. Ng, C. Kazer, S. Yan, J. Sedoc, V. Liu, MimicNet: fast performance estimates for data center networks with machine learning, in: Proceedings of the 2021 ACM SIGCOMM 2021 Conference, SIGCOMM '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 287–304. doi:10.1145/3452296.3472926. URL https://doi.org/10.1145/3452296.3472926
[16] Q. Yang, X. Peng, L. Chen, L. Liu, J. Zhang, H. Xu, B. Li, G. Zhang, DeepQueueNet: towards scalable and generalized network performance estimation with packet-level visibility, in: Proceedings of the ACM SIGCOMM 2022 Conference, SIGCOMM '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 441–457. doi:10.1145/3544216.3544248. URL https://doi.org/10.1145/3544216.3544248
[17] M. Ferriol-Galmés, J. Paillisse, J. Suárez-Varela, K. Rusek, S. Xiao, X. Shi, X. Cheng, P. Barlet-Ros, A. Cabellos-Aparicio, RouteNet-Fermi: Network Modeling With Graph Neural Networks, IEEE/ACM Transactions on Networking 31 (6) (2023) 3080–3095. doi:10.1109/TNET.2023.3269983.
[18] K. Gao, L. Chen, D. Li, V. Liu, X. Wang, R. Zhang, L. Lu, DONS: Fast and Affordable Discrete Event Network Simulation with Automatic Parallelization, in: Proceedings of the ACM SIGCOMM 2023 Conference, ACM SIGCOMM '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 167–181. doi:10.1145/3603269.3604844. URL https://doi.org/10.1145/3603269.3604844
[19] V. Arun, M. T. Arashloo, A. Saeed, M. Alizadeh, H. Balakrishnan, Toward formally verifying congestion control behavior, in: Proceedings of the 2021 ACM SIGCOMM 2021 Conference, ACM, New York, NY, USA, 2021, pp. 1–16. doi:10.1145/3452296.3472912.
[20] M. T. Arashloo, R. Beckett, R. Agarwal, Formal Methods for Network Performance Analysis, in: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), USENIX Association, Boston, MA, 2023, pp. 645–661. URL https://www.usenix.org/conference/nsdi23/presentation/tahmasbi
[21] Y. Zhu, H. Eran, D. Firestone, C. Guo, M. Lipshteyn, Y. Liron, J. Padhye, S. Raindel, M. H. Yahia, M. Zhang, Congestion control for large-scale RDMA deployments, SIGCOMM Comput. Commun. Rev. 45 (4) (2015) 523–536. doi:10.1145/2829988.2787484. URL https://doi.org/10.1145/2829988.2787484
[22] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[23] Y. Li, R. Zemel, M. Brockschmidt, D. Tarlow, Gated graph sequence neural networks, in: Proceedings of ICLR'16, 2016. URL https://www.microsoft.com/en-us/research/publication/gated-graph-sequence-neural-networks/
[24] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (1) (2009) 61–80. doi:10.1109/TNN.2008.2005605.
[25] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735. URL https://doi.org/10.1162/neco.1997.9.8.1735
[26] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural message passing for quantum chemistry, in: D. Precup, Y. W. Teh (Eds.), Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 1263–1272. URL https://proceedings.mlr.press/v70/gilmer17a.html
[27] M. Garnelo, J. Schwarz, D. Rosenbaum, F. Viola, D. J. Rezende, S. M. A. Eslami, Y. W. Teh, Neural processes (2018).
[28] The Network Simulator - ns-2 (7 1995). URL https://www.isi.edu/websites/nsnam/ns/
[29] G. F. Riley, T. R. Henderson, The ns-3 Network Simulator, in: Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 15–34. doi:10.1007/978-3-642-12331-3_2. URL http://link.springer.com/10.1007/978-3-642-12331-3_2
[30] J.-Y. Le Boudec, P. Thiran (Eds.), Network Calculus, Vol. 2050 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001. doi:10.1007/3-540-45318-0. URL http://link.springer.com/10.1007/3-540-45318-0
[31] M. Fidler, Extending the Network Calculus Pay Bursts Only Once Principle to Aggregate Scheduling, 2003, pp. 19–34. doi:10.1007/3-540-36480-3_2. URL http://link.springer.com/10.1007/3-540-36480-3_2
[32] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, M. Welling, Modeling relational data with graph convolutional networks, in: The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15, Springer, 2018, pp. 593–607.
[33] Resilient overlay networks (2001). URL http://nms.lcs.mit.edu/ron/
[34] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, M. Sridharan, Data center TCP (DCTCP), in: Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM '10, Association for Computing Machinery, New York, NY, USA, 2010, pp. 63–74. doi:10.1145/1851182.1851192. URL https://doi.org/10.1145/1851182.1851192
[35] S. Bondorf, P. Nikolaus, J. B. Schmitt, Quality and Cost of Deterministic Network Calculus: Design and Evaluation of an Accurate and Fast Analysis, Proceedings of the ACM on Measurement and Analysis of Computing Systems 1 (1) (2017) 1–34. doi:10.1145/3084453. URL https://dl.acm.org/doi/10.1145/3084453
[36] A. Varga, OMNeT++, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 35–59. doi:10.1007/978-3-642-12331-3_3. URL https://doi.org/10.1007/978-3-642-12331-3_3
[37] Q. He, C. Dovrolis, M. Ammar, On the predictability of large transfer TCP throughput, in: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, ACM, New York, NY, USA, 2005, pp. 145–156. doi:10.1145/1080091.1080110. URL https://dl.acm.org/doi/10.1145/1080091.1080110
[38] B. K. de Aquino Afonso, L. Berton, QT-Routenet: Improved GNN generalization to larger 5G networks by fine-tuning predictions from queueing theory, ITU Journal on Future and Evolving Technologies 3 (2) (2022) 134–141. doi:10.52953/FBRB3688. URL https://www.itu.int/pub/S-JNL-VOL3.ISSUE2-2022-A12
[39] M. Helm, G. Carle, Predicting Latency Quantiles using Network Calculus-assisted GNNs, in: Proceedings of the 2nd on Graph Neural Networking Workshop 2023, GNNet '23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 13–18. doi:10.1145/3630049.3630173. URL https://doi.org/10.1145/3630049.3630173
[40] F. Y. Yan, J. Ma, G. D. Hill, D. Raghavan, R. S. Wahby, P. Levis, K. Winstein, Pantheon: the training ground for Internet congestion-control research, in: 2018 USENIX Annual Technical Conference (USENIX ATC 18), USENIX Association, Boston, MA, 2018, pp. 731–743. URL https://www.usenix.org/conference/atc18/presentation/yan-francis
[41] S. Ashok, S. Tiwari, N. Natarajan, V. N. Padmanabhan, S. Sellamanickam, Data-Driven Network Path Simulation with iBox, Proceedings of the ACM on Measurement and Analysis of Computing Systems 6 (1) (2022) 1–26. doi:10.1145/3508026. URL https://dl.acm.org/doi/10.1145/3508026
[42] Y. Gu, Y. Liu, D. Towsley, On integrating fluid models with packet simulation, in: IEEE INFOCOM 2004, Vol. 4, IEEE, 2004, pp. 2856–2866. doi:10.1109/INFCOM.2004.1354702. URL http://ieeexplore.ieee.org/document/1354702/
[43] A. Alomar, P. Hamadanian, A. Nasr-Esfahany, A. Agarwal, M. Alizadeh, D. Shah, CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation, in: 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), USENIX Association, Boston, MA, 2023, pp. 1115–1147. URL https://www.usenix.org/conference/nsdi23/presentation/alomar
[44] J. Späth, M. Helm, B. Jaeger, G. Carle, Sim2HW: Modeling Latency Offset Between Network Simulations and Hardware Measurements, in: Proceedings of the 3rd GNNet Workshop on Graph Neural Networking Workshop, ACM, New York, NY, USA, 2024, pp. 20–26. doi:10.1145/3694811.3697820. URL https://dl.acm.org/doi/10.1145/3694811.3697820
[45] M. Guizani, A. Rayes, B. Khan, A. Al-Fuqaha, Network Modeling and Simulation, Wiley, 2010. doi:10.1002/9780470515211. URL https://onlinelibrary.wiley.com/doi/book/10.1002/9780470515211
[46] S. Keshav, REAL: A Network Simulator, Tech. Rep. UCB/CSD-88-472 (12 1988). URL http://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5316.html
[47] X. Chang, Network simulations with OPNET, in: Proceedings of the 31st conference on Winter simulation Simulation—a bridge to the future - WSC '99, ACM Press, New York, New York, USA, 1999, pp. 307–314. doi:10.1145/324138.324232.
[48] D. Bültmann, M. Mühleisen, S. Max, Open WNS, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 69–81. doi:10.1007/978-3-642-12331-3_5. URL https://doi.org/10.1007/978-3-642-12331-3_5
[49] J. Sommer, J. Scharf, IKR Simulation Library, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 61–68. doi:10.1007/978-3-642-12331-3_4. URL https://doi.org/10.1007/978-3-642-12331-3_4
[50] G. Casale, G. Serazzi, Quantitative system evaluation with Java modeling tools, in: Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering, ACM, New York, NY, USA, 2011, pp. 449–454. doi:10.1145/1958746.1958813. URL https://dl.acm.org/doi/10.1145/1958746.1958813
[51] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, D. A. Wood, The gem5 simulator, ACM SIGARCH Computer Architecture News 39 (2) (2011) 1–7. doi:10.1145/2024716.2024718.
[52] M. Alian, D. Kim, N. Sung Kim, pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems, IEEE Computer Architecture Letters 15 (1) (2016) 41–44. doi:10.1109/LCA.2015.2438295. URL http://ieeexplore.ieee.org/document/7114236/
[53] H. Li, J. Li, A. Kaufmann, SimBricks: end-to-end network system evaluation with modular simulation, in: Proceedings of the ACM SIGCOMM 2022 Conference, ACM, New York, NY, USA, 2022, pp. 380–396. doi:10.1145/3544216.3544253. URL https://dl.acm.org/doi/10.1145/3544216.3544253
[54] H. Li, P. Balasubramanian, M. Meiers, J. Li, A. Kaufmann, SplitSim: Large-Scale Simulations for Evaluating Network Systems Research (2024).
[55] S. Bai, H. Zheng, C. Tian, X. Wang, C. Liu, X. Jin, F. Xiao, Q. Xiang, W. Dou, G. Chen, Unison: A Parallel-Efficient and User-Transparent Network Simulation Kernel, in: Proceedings of the Nineteenth European Conference on Computer Systems, ACM, New York, NY, USA, 2024, pp. 115–131. doi:10.1145/3627703.3629574. URL https://dl.acm.org/doi/10.1145/3627703.3629574
[56] S. Khashab, H. Sezhiyan, R. Abboud, A. Normatov, S. Kaestle, E. Bar-Ilan, M. Nassar, O. Shabtai, W. Bai, M. Kadosh, J. Xing, M. Silberstein, T. E. Ng, A. Chen, NSX: Large-scale network simulation on an AI server, in: Proceedings of the 2nd Workshop on Networks for AI Computing, NAIC '25, Association for Computing Machinery, New York, NY, USA, 2025, pp. 19–25. doi:10.1145/3748273.3749199. URL https://doi.org/10.1145/3748273.3749199
[57] K. Zhao, P. Goyal, M. Alizadeh, T. E. Anderson, Scalable Tail Latency Estimation for Data Center Networks (2022).
[58] C. Güemes-Palau, M. Ferriol-Galmés, J. Paillisse-Vilanova, A. López-Brescó, P. Barlet-Ros, A. Cabellos-Aparicio, RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning (1 2025).
[59] S. Floyd, V. Paxson, Difficulties in simulating the Internet, IEEE/ACM Transactions on Networking 9 (4) (2001) 392–403. doi:10.1109/90.944338.
[60] V. Paxson, S. Floyd, Why we don't know how to simulate the internet, in: Proceedings of the 29th Conference on Winter Simulation, WSC '97, IEEE Computer Society, USA, 1997, pp. 1037–1044. doi:10.1145/268437.268737. URL https://doi.org/10.1145/268437.268737
[61] R. Fujimoto, Parallel and distributed simulation systems, in: Proceeding of the 2001 Winter Simulation Conference (Cat. No.01CH37304), IEEE, 2001, pp. 147–157. doi:10.1109/WSC.2001.977259. URL http://ieeexplore.ieee.org/document/977259/
[62] G. Kunz, Parallel Discrete Event Simulation, in: K. Wehrle, M. Güneş, J. Gross (Eds.), Modeling and Tools for Network Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 121–131. doi:10.1007/978-3-642-12331-3_8. URL https://doi.org/10.1007/978-3-642-12331-3_8
[63] S. Jafer, Q. Liu, G. Wainer, Synchronization methods in parallel and distributed discrete-event simulation, Simulation Modelling Practice and Theory 30 (2013) 54–73. doi:10.1016/j.simpat.2012.08.003. URL https://www.sciencedirect.com/science/article/pii/S1569190X12001244
[64] H. Kobayashi, A. Konheim, Queueing Models for Computer Communications System Analysis, IEEE Transactions on Communications 25 (1) (1977) 2–29. doi:10.1109/TCOM.1977.1093702. URL http://ieeexplore.ieee.org/document/1093702/
[65] W.-C. Poon, K.-T. Lo, A refined version of M/G/∞ processes for modelling VBR video traffic, Computer Communications 24 (2001) 1105–1114. doi:10.1016/S0140-3664(00)00325-X.
[66] F. Fiorini, M. Cococcioni, M. Pagano, Quantitative delay analysis of GI/G/1 queues with heavy-tailed traffic by means of alpha theory, Computer Networks 269 (2025) 111394. doi:10.1016/j.comnet.2025.111394.
[67] M. Garetto, R. Lo Cigno, M. Meo, M. Ajmone Marsan, A detailed and accurate closed queueing network model of many interacting TCP flows, in: Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213), Vol. 3, IEEE, 2001, pp. 1706–1715. doi:10.1109/INFCOM.2001.916668. URL http://ieeexplore.ieee.org/document/916668/
[68] M. Garetto, R. Lo Cigno, M. Meo, M. Marsan, Closed queueing network models of interacting long-lived TCP flows, IEEE/ACM Transactions on Networking 12 (2) (2004) 300–311. doi:10.1109/TNET.2004.826297. URL https://ieeexplore.ieee.org/document/1288134/
[69] M. Yu, M. Zhou, A Performance Modeling Scheme for Multistage Switch Networks With Phase-Type and Bursty Traffic, IEEE/ACM Transactions on Networking 18 (4) (2010) 1091–1104. doi:10.1109/TNET.2009.2036437. URL http://ieeexplore.ieee.org/document/5352328/
[70] T. Bonald, Comparison of TCP Reno and TCP Vegas: efficiency and fairness, Performance Evaluation 36-37 (1999) 307–332. doi:10.1016/S0166-5316(99)00037-1.
[71] F. Baccelli, D. Hong, AIMD, fairness and fractal scaling of TCP traffic, in: Proceedings. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 1, IEEE, New York, 2002, pp. 229–238. doi:10.1109/INFCOM.2002.1019264. URL http://ieeexplore.ieee.org/document/1019264/
[72] S. Bohacek, A stochastic model of TCP and fair video transmission, in: IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428), IEEE, 2003, pp. 1134–1144. doi:10.1109/INFCOM.2003.1208950. URL https://ieeexplore.ieee.org/document/1208950/
[73] V. Misra, W.-B. Gong, D. Towsley, Fluid-based analysis of a network of AQM routers supporting TCP flows with an application to RED, in: Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM, New York, NY, USA, 2000, pp. 151–160. doi:10.1145/347059.347421. URL https://dl.acm.org/doi/10.1145/347059.347421
[74] Y. Liu, F. Lo Presti, V. Misra, D. Towsley, Y. Gu, Fluid models and solutions for large-scale IP networks, in: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, ACM, New York, NY, USA, 2003, pp. 91–101. doi:10.1145/781027.781039. URL https://dl.acm.org/doi/10.1145/781027.781039
[75] F. Baccelli, D. Hong, Flow level simulation of large IP networks, in: IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428), Vol. 3, IEEE, 2003, pp. 1911–1921. doi:10.1109/INFCOM.2003.1209213. URL http://ieeexplore.ieee.org/document/1209213/
[76] S. Bohacek, J. P. Hespanha, J. Lee, K. Obraczka, A hybrid systems modeling framework for fast and accurate simulation of data communication networks, in: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, ACM, New York, NY, USA, 2003, pp. 58–69. doi:10.1145/781027.781036. URL https://dl.acm.org/doi/10.1145/781027.781036
[77] J. Lee, S. Bohacek, J. P. Hespanha, K. Obraczka, Modeling Communication Networks With Hybrid Systems, IEEE/ACM Transactions on Networking 15 (3) (2007) 630–643. doi:10.1109/TNET.2007.893090. URL http://ieeexplore.ieee.org/document/4237147/
[78] M. Marsan, M.
Garetto, P . Giaccone, E. Leonardi, E. Sc hiattarella, A. T arello, Using partial differential equations to mo del TCP mice and elephan ts in large IP net w orks, in: IEEE INF OCOM 2004, V ol. 4, IEEE, 2004, pp. 2821–2832. doi:10.1109/INF COM.2004.1354699. URL http://ieeexplore.ieee.org/document/1354699/ 76 [79] F. Baccelli, G. Carofiglio, M. Piancino, Stochastic Analysis of Scal- able TCP, in: IEEE INFOCOM 2009, IEEE, 2009, pp. 19–27. doi:10.1109/INF COM.2009.5061902. URL https://ieeexplore.ieee.org/document/5061902/ [80] G. Carofiglio, L. Muscariello, On the Impact of TCP and P er-Flo w Sc heduling on In ternet P erformance, in: 2010 Proceedings IEEE IN- F OCOM, IEEE, 2010, pp. 1–9. doi:10.1109/INF COM.2010.5461973. URL http://ieeexplore.ieee.org/document/5461973/ [81] T. Czac hórski, Queueing Mo dels for Performance Ev aluation of Computer Net w orks—T ransien t State Analysis, 2015, pp. 51–80. doi:10.1007/978-3-319-12148-2_4. URL https://link.springer.com/10.1007/978- 3- 319- 12148- 2_4 [82] E. Knigh tly , Hui Zhang, D-BIND: an accurate traffic mo del for pro- viding QoS guaran tees to VBR traffic, IEEE/A CM T ransactions on Net working 5 (2) (1997) 219–231. doi:10.1109/90.588085. URL http://ieeexplore.ieee.org/document/588085/ [83] R. Agra wal, R. Cruz, C. Okino, R. Ra jan, P erformance b ounds for flo w con trol proto cols, IEEE/A CM T ransactions on Net working 7 (3) (1999) 310–323. doi:10.1109/90.779197. [84] K. Lampk a, S. Bondorf, J. Schmitt, Ac hieving Efficiency without Sac- rificing Mo del A ccuracy: Net work Calculus on Compact Domains, in: 2016 IEEE 24th In ternational Symp osium on Mo deling, Analysis and Simulation of Computer and T elecomm unication Systems (MAS- COTS), IEEE, 2016, pp. 313–318. doi:10.1109/MASCOTS.2016.9. URL http://ieeexplore.ieee.org/document/7774596/ [85] J. B. Sc hmitt, F. A. Zdarsky , M. 
Fidler, Delay Bounds under Arbitrary Multiplexing: When Net work Calculus Leav es Y ou in the Lurch..., in: IEEE INF OCOM 2008 - The 27th Conference on Computer Comm uni- cations, IEEE, 2008, pp. 1669–1677. doi:10.1109/INF OCOM.2008.228. URL http://ieeexplore.ieee.org/document/4509823/ [86] A. Bouillard, E. Thierry , Tight p erformance b ounds in the worst-case analysis of feed-forward net works, Discrete Ev en t Dynamic Systems 77 26 (3) (2016) 383–411. doi:10.1007/s10626-015-0213-2. URL http://link.springer.com/10.1007/s10626- 015- 0213- 2 [87] A. Kiefer, N. Gollan, J. B. Sc hmitt, Searching for Tigh t Performance Bounds in F eed-F orward Netw orks, in: B. Müller-Clostermann, K. Ech- tle, Rathgeb Erwin P (Eds.), Measuremen t, Modelling, and Ev alua- tion of Computing Systems and Dep endability and F ault T olerance, Springer Berlin Heidelb erg, Berlin, Heidelb erg, 2010, pp. 227–241. doi:10.1007/978-3-642-12104-3_18. URL http://link.springer.com/10.1007/978- 3- 642- 12104- 3_18 [88] Cheng-Shang Chang, Stability , queue length, and delay of deterministic and sto chastic queueing net w orks, IEEE T ransactions on Automatic Con trol 39 (5) (1994) 913–931. doi:10.1109/9.284868. [89] Y. Jiang, A basic sto chastic net work calculus, A CM SIG- COMM Computer Communication Review 36 (4) (2006) 123–134. doi:10.1145/1151659.1159929. [90] K. Angrishi, An end-to-end sto c hastic net w ork calculus with ef- fectiv e bandwidth and effective capacity , Computer Net works 57 (2013) 78–84, sto chastic NC, mak es b ound tigh ter based on es- timating the effective bandwidth. Ho wev er, no ev aluation added. doi:10.1016/j.comnet.2012.09.003. [91] T. Lakshman, U. Madhow, The p erformance of TCP/IP for net works with high bandwidth-dela y pro ducts and random loss, IEEE/A CM T ransactions on Netw orking 5 (3) (1997) 336–350. doi:10.1109/90.611099. URL http://ieeexplore.ieee.org/document/611099/ [92] J. Padh y e, V. Firoiu, D. T o wsley , J. 
Kurose, Mo deling TCP Reno p erformance: a simple mo del and its empirical v alidation, IEEE/A CM T ransactions on Netw orking 8 (2) (2000) 133–145. doi:10.1109/90.842137. URL http://ieeexplore.ieee.org/document/842137/ [93] R. Mazumdar, L. Mason, C. Douligeris, F airness in net w ork optimal flo w con trol: optimality of pro duct forms, IEEE T ransactions on Com- m unications 39 (5) (1991) 775–782. doi:10.1109/26.87140. 78 [94] F. P . Kelly , A. K. Maullo o, D. K. H. T an, Rate con trol for commu- nication netw orks: shado w prices, proportional fairness and stabilit y , Journal of the Operational Research Society 49 (3) (1998) 237–252. doi:10.1057/palgra ve.jors.2600523. URL https://www.tandfonline.com/doi/full/10.1057/ palgrave.jors.2600523 [95] W. WHITT, Approximations for the gi/g/m queue, Pro- duction and Op erations Management 2 (2) (1993) 114– 161. arXiv:h ttps://doi.org/10.1111/j.1937-5956.1993.tb00094.x, doi:10.1111/j.1937-5956.1993.tb00094.x. URL https://doi.org/10.1111/j.1937- 5956.1993.tb00094.x [96] K. Chandy , The analysis and solutions for general queueing netw orks, in: Proceedings of the Sixth An ual Princeton Conference on Informa- tion Sciences and Systems, 1972, pp. 224–228. [97] M. Reiser, S. S. Lav en b erg, Mean-v alue analysis of closed mul- tic hain queuing net works, J. A CM 27 (2) (1980) 313–322. doi:10.1145/322186.322195. URL https://doi.org/10.1145/322186.322195 [98] K. M. Chandy , C. H. Sauer, Computational algorithms for pro duct form queueing netw orks, Communications of the ACM 23 (10) (1980) 573–583. doi:10.1145/359015.359020. URL https://dl.acm.org/doi/10.1145/359015.359020 [99] A. Charny , J.-Y. L. Boudec, Dela y Bounds in a Net work with Aggregate Sc heduling, in: Cro wcroft Jon, J. Rob erts, Smirno v Mikhail I (Eds.), Qualit y of F uture In ternet Services, Springer Berlin Heidelberg, Berlin, Heidelb erg, 2000, pp. 1–13. [100] D. Starobinski, M. Karp o vsky , L. 
Zakrevski, Application of net work calculus to general topologies using turn-prohibition, IEEE/A CM T ransactions on Netw orking 11 (3) (2003) 411–421. doi:10.1109/TNET.2003.813040. URL http://ieeexplore.ieee.org/document/1208302/ [101] M. Fidler, A. Rizk, A Guide to the Sto c hastic Net work Calculus, IEEE Communications Surv eys & T utorials 17 (1) (2015) 92–105. 79 doi:10.1109/COMST.2014.2337060. URL https://ieeexplore.ieee.org/document/6868978/ [102] F. Ciucu, J. Schmitt, P ersp ectiv es on net work calculus, ACM SIG- COMM Computer Communication Review 42 (4) (2012) 311–322. doi:10.1145/2377677.2377747. URL https://dl.acm.org/doi/10.1145/2377677.2377747 [103] M. Mirza, J. Sommers, P . Barford, X. Zh u, A Machine Learning Ap- proac h to TCP Throughput Prediction, IEEE/A CM T ransactions on Net working 18 (4) (2010) 1026–1039. doi:10.1109/TNET.2009.2037812. URL https://ieeexplore.ieee.org/document/5378489 [104] M. B. T ariq, K. Bhandank ar, V. V alancius, A. Zeitoun, N. F eam- ster, M. Ammar, Answ ering “What-If ” Deplo yment and Configu- ration Questions With WISE: T ec hniques and Deplo ymen t Exp eri- ence, IEEE/A CM T ransactions on Netw orking 21 (1) (2013) 1–13. doi:10.1109/TNET.2012.2230448. [105] M. Helm, F. Wiedner, G. Carle, Flo w-lev el T ail Latency Estimation and V erification based on Extreme V alue Theory , in: 2022 18th In- ternational Conference on Net work and Service Management (CNSM), IEEE, 2022, pp. 359–363. doi:10.23919/CNSM55787.2022.9964525. URL https://ieeexplore.ieee.org/document/9964525/ [106] D. Monaco, A. Sacco, D. Spina, F. Strada, A. Bottino, T. Cerquitelli, G. Marchetto, Real-time latency prediction for cloud gaming applications, Computer Netw orks 264 (2025) 111235. doi:10.1016/j.comnet.2025.111235. URL https://linkinghub.elsevier.com/retrieve/pii/ S1389128625002038 [107] A. Mestres, E. Alarcón, Y. Ji, A. 
Cab ellos-Aparicio, Understanding the Mo deling of Computer Netw ork Delays using Neural Net w orks, in: Pro ceedings of the 2018 W orkshop on Big Data Analytics and Mac hine Learning for Data Comm unication Netw orks, A CM, New Y ork, NY, USA, 2018, pp. 46–52. doi:10.1145/3229607.3229613. URL https://dl.acm.org/doi/10.1145/3229607.3229613 80 [108] F. Krasniqi, J. Elias, J. Leguay , A. E. C. Redondi, End-to-end Dela y Prediction Based on T raffic Matrix Sampling, in: IEEE INF OCOM 2020 - IEEE Conference on Computer Comm unica- tions W orkshops (INF OCOM WKSHPS), IEEE, 2020, pp. 774–779. doi:10.1109/INF OCOMWKSHPS50562.2020.9162765. URL https://ieeexplore.ieee.org/document/9162765/ [109] K. Rusek, J. Suárez-V arela, A. Mestres, P . Barlet-Ros, A. Cab ellos- Aparicio, Unv eiling the p otential of Graph Neural Net works for net work mo deling and optimization in SDN, in: Pro ceedings of the 2019 A CM Symp osium on SDN Researc h, A CM, New Y ork, NY, USA, 2019, pp. 140–151. doi:10.1145/3314148.3314357. URL https://dl.acm.org/doi/10.1145/3314148.3314357 [110] J. Suárez-V arela, S. Carol-Bosc h, K. Rusek, P . Almasan, M. Arias, P . Barlet-Ros, A. Cab ellos-Aparicio, Challenging the generaliza- tion capabilities of Graph Neural Net works for netw ork mo del- ing, in: Pro ceedings of the ACM SIGCOMM 2019 Conference P osters and Demos, ACM, New Y ork, NY, USA, 2019, pp. 114–115. doi:10.1145/3342280.3342327. URL https://dl.acm.org/doi/10.1145/3342280.3342327 [111] K. Rusek, J. Suárez-V arela, P . Almasan, P . Barlet-Ros, A. Cabellos- Aparicio, RouteNet: Lev eraging Graph Neural Net w orks for Net- w ork Mo deling and Optimization in SDN, IEEE Journal on Selected Areas in Comm unications 38 (10) (2020) 2260–2270. doi:10.1109/JSA C.2020.3000405. [112] A. Badia-Samp era, J. Suárez-V arela, P . Almasan, K. Rusek, P . Barlet- Ros, A. 
Cab ellos-Aparicio, T ow ards more realistic net work mod- els based on Graph Neural Netw orks, in: Pro ceedings of the 15th In ternational Conference on emerging Netw orking EXp erimen ts and T echnologies, A CM, New Y ork, NY, USA, 2019, pp. 14–16. doi:10.1145/3360468.3366773. URL https://dl.acm.org/doi/10.1145/3360468.3366773 [113] M. F erriol-Galmés, K. Rusek, J. Suárez-V arela, S. Xiao, X. Shi, X. Cheng, B. W u, P . Barlet-Ros, A. Cab ellos-Aparicio, 81 RouteNet-Erlang: A Graph Neural Net w ork for Net w ork P er- formance Ev aluation, in: IEEE INFOCOM 2022 - IEEE Con- ference on Computer Communications, 2022, pp. 2018–2027. doi:10.1109/INF OCOM48880.2022.9796944. [114] B. K. Dhamala, B. R. Daw adi, P . Manzoni, B. K. A chary a, P erfor- mance Ev aluation of Graph Neural Netw ork-Based RouteNet Mo del with Atten tion Mec hanism, F uture Internet 16 (4) (2024) 116. doi:10.3390/fi16040116. [115] Cláudio Mo desto, Reb ecca Ab en-A thar, Andrey Silv a, Silvia Lins, Glauco Gon alv es, Aldebaro Klautau, Dela y estimation based on mul- tiple stage message passing with attention mec hanism using a real net- w ork comm unication dataset, ITU Journal on F uture and Evolving T echnologies 5 (4) (2024) 465–477. doi:10.52953/RBNE4256. URL https://www.itu.int/pub/S- JNL- VOL5.ISSUE4- 2024- A35 [116] Kaan A ykurt, Maximilian Stephan, Serkut A yv asik, Johannes Zerwas, W olfgang Kellerer, Digital t win opportunities with lev eraging graph neural net w orks on real net work data, ITU Journal on F uture and Ev olving T ec hnologies 5 (4) (2024) 458–464. doi:10.52953/ZOEM2142. URL https://www.itu.int/pub/S- JNL- VOL5.ISSUE4- 2024- A34 [117] C. Güemes-P alau, M. F erriol-Galmés, J. Paillisse-Vilano v a, A. Lóp ez- Brescó, P . Barlet-Ros, A. Cab ellos-Aparicio, W av elet-Enhanced Graph Neural Net works: T o wards Non-P arametric Netw ork T raffic Mo del- ing, in: Pro ceedings of the 3rd GNNet W orkshop on Graph Neural Net working W orkshop, A CM, New Y ork, NY, USA, 2024, pp. 
14–19. doi:10.1145/3694811.3697823. [118] F. Geyer, DeepComNet: P erformance ev aluation of net work top ologies using graph-based deep learning, P erformance Ev aluation 130 (2019) 1–16. doi:10.1016/j.pev a.2018.12.003. URL https://www.sciencedirect.com/science/article/abs/ pii/S0166531618300944 [119] B. Jaeger, M. Helm, L. Sch w egmann, G. Carle, Mo deling TCP p erfor- mance using graph neural net works, in: Pro ceedings of the 1st Interna- tional W orkshop on Graph Neural Netw orking, ACM, New Y ork, NY, USA, 2022, pp. 18–23. doi:10.1145/3565473.3569190. 82 [120] T. Suzuki, Y. Y asuda, R. Nak amura, H. Ohsaki, On Estimat- ing Communication Delays using Graph Conv olutional Net works with Semi-Sup ervised Learning, in: 2020 International Conference on Information Net working (ICOIN), IEEE, 2020, pp. 481–486. doi:10.1109/ICOIN48656.2020.9016603. URL https://ieeexplore.ieee.org/document/9016603 [121] J. Liu, F. T ang, L. Chen, X. Li, J. Y u, Y. Zh u, Y. Y u, Y. Y ang, EA GLE: Heterogeneous GNN-based Net work P erfor- mance Analysis, in: 2023 IEEE/A CM 31st In ternational Sym- p osium on Qualit y of Service (IWQoS), IEEE, 2023, pp. 1–10. doi:10.1109/IW QoS57198.2023.10188804. URL https://ieeexplore.ieee.org/abstract/document/ 10188804 [122] S. Huang, Y. W ei, L. Peng, M. W ang, L. Hui, P . Liu, Z. Du, Z. Liu, Y. Cui, xNet: Mo deling Net w ork Performance With Graph Neural Net works, IEEE/A CM T ransactions on Net working 32 (2) (2024) 1753– 1767. doi:10.1109/TNET.2023.3329357. [123] B. Li, G. V erma, T. Efimo v, A. Kumar, S. Segarra, GLANCE: Graph- based Learnable Digital T win for Communication Netw orks (2024). URL [124] H. Du, M. Li, FlowSeer: A Nov el F ramework for Gen- eralized Netw ork P erformance Estimation at Flo w Lev el, in: 2024 27th International Conference on Computer Supp orted Co- op erativ e W ork in Design (CSCWD), 2024, pp. 2834–2839. doi:10.1109/CSCWD61410.2024.10580262. [125] C. Li, A. A. Zabreyk o, A. Nasr-Esfahany , K. Zhao, P . Go y al, M. 
Al- izadeh, T. Anderson, m4: A Learned Flo w-level Net work Simulator (3 2025). URL [126] W. L. Hamilton, R. Ying, J. Lesko v ec, Inductive representation learn- ing on large graphs, in: Proceedings of the 31st In ternational Con- ference on Neural Information Processing Systems, NIPS’17, Curran Asso ciates Inc., Red Ho ok, NY, USA, 2017, p. 1025–1035. 83 [127] K. Cho, B. v an Merrienbo er, C. Gulcehre, D. Bahdanau, F. Bougares, H. Sc h wenk, Y. Bengio, Learning phrase represen tations using rnn enco der-deco der for statistical machine translation (2014). URL [128] M. W ang, Y. Cui, S. Xiao, X. W ang, D. Y ang, K. Chen, J. Zh u, Neural Net work Meets DCN: T raffic-driven T op ology A daptation with Deep Learning, Pro ceedings of the A CM on Measuremen t and Analysis of Computing Systems 2 (2) (2018) 1–25. doi:10.1145/3224421. URL https://dl.acm.org/doi/10.1145/3224421 [129] S. Xiao, D. He, Z. Gong, Deep-Q: T raffic-driven QoS Inference using Deep Generativ e Netw ork, in: Pro ceedings of the 2018 W orkshop on Net work Meets AI & ML - NetAI’18, A CM Press, New Y ork, New Y ork, USA, 2018, pp. 67–73. doi:10.1145/3229543.3229549. URL http://dl.acm.org/citation.cfm?doid=3229543.3229549 [130] D. P . Kingma, M. W elling, Auto-enco ding v ariational ba yes (2022). URL [131] A. Dietmüller, S. Ray , R. Jacob, L. V an b ev er, A new hop e for net work mo del generalization, in: Pro ceedings of the 21st A CM W orkshop on Hot T opics in Net works, HotNets ’22, Asso ciation for Computing Mac hinery , New Y ork, NY, USA, 2022, pp. 152–159. doi:10.1145/3563766.3564104. URL https://doi.org/10.1145/3563766.3564104 [132] A. V aswani, N. Shazeer, N. P armar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. P olosukhin, A ttention is all y ou need (2023). URL [133] K. Hattori, T. K orik a w a, C. 
T ak asaki, Meta Learner-Based T rans- fer Learning: Bridging Simulation and Actual Router Metrics, in: 2024 IEEE 25th In ternational Conference on High Perfor- mance Switching and Routing (HPSR), IEEE, 2024, pp. 203–208. doi:10.1109/HPSR62440.2024.10635943. URL https://ieeexplore.ieee.org/document/10635943/ 84 [134] K. Hattori, T. K orik aw a, C. T ak asaki, Meta Learner-Based T ransfer Learning: Bridging Simulation and Actual Router Metrics, IEEE Ac- cess 13 (2025) 76085–76099. doi:10.1109/A CCESS.2025.3564954. [135] M. Happ, J. L. Du, M. Herlich, C. Maier, P . Dorfinger, J. Suárez- V arela, Exploring the Limitations of Current Graph Neural Net- w orks for Netw ork Mo deling, in: NOMS 2022-2022 IEEE/IFIP Net work Op erations and Managemen t Symp osium, 2022, pp. 1–8. doi:10.1109/NOMS54207.2022.9789708. URL https://ieeexplore.ieee.org/document/9789708 [136] P . V eličk o vić, G. Cucurull, A. Casanov a, A. Romero, P . Liò, Y. Bengio, Graph atten tion netw orks (2018). URL [137] R. Netra v ali, A. Siv araman, S. Das, A. Go y al, K. Winstein, J. Mic k ens, H. Balakrishnan, Mahimahi: Accurate Record-and-Replay for HTTP, in: 2015 USENIX Ann ual T echnical Conference (USENIX A TC 15), USENIX Asso ciation, Santa Clara, CA, 2015, pp. 417–429. URL https://www.usenix.org/conference/atc15/technical- session/presentation/netravali [138] J. Zhang, K. Gao, Y. R. Y ang, J. Bi, Prophet: T ow ard F ast, Error- T olerant Model-Based Throughput Prediction for Reactiv e Flo ws in DC Netw orks, IEEE/ACM T ransactions on Netw orking 28 (6) (2020) 2475–2488. doi:10.1109/TNET.2020.3016838. URL https://ieeexplore.ieee.org/document/9178502/ [139] F. Gey er, S. Bondorf, DeepTMA: Predicting Effectiv e Conten tion Mo d- els for Netw ork Calculus using Graph Neural Netw orks, in: IEEE INF OCOM 2019 - IEEE Conference on Computer Communications, IEEE, 2019, pp. 1009–1017. doi:10.1109/INF OCOM.2019.8737496. URL https://ieeexplore.ieee.org/document/8737496/ [140] F. Geyer, S. 
Bondorf, On the Robustness of Deep Learning-predicted Con tention Mo dels for Netw ork Calculus, in: 2020 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2020, pp. 1–7. doi:10.1109/ISCC50000.2020.9219693. URL https://ieeexplore.ieee.org/document/9219693/ 85 [141] M. F arreras, J. P aillissé, L. Fàbrega, P . Vilà, GNNetSlice: A GNN-based p erformance mo del to supp ort netw ork slicing in B5G netw orks, Computer Comm unications 232 (2025) 108044. doi:10.1016/j.comcom.2025.108044. [142] R. Srik an t, The Mathematics of In ternet Congestion Con trol, 2004. doi:10.1007/978-0-8176-8216-3. [143] K. Hattori, T. Korik aw a, C. T ak asaki, Queue-informed neural net work mo del for estimating queuing dela y in pon-based aggre- gation netw orks, in: 2025 IEEE 11th International Conference on Net work Softw arization (NetSoft), IEEE, 2025, pp. 199–203. doi:10.1109/NetSoft64993.2025.11080627. URL https://ieeexplore.ieee.org/document/11080627/ [144] C. Li, A. Nasr-Esfahany , K. Zhao, K. Noorbakhsh, P . Go y al, M. Al- izadeh, T. E. Anderson, m3: A ccurate Flo w-Lev el P erformance Es- timation using Mac hine Learning, in: Pro ceedings of the A CM SIGCOMM 2024 Conference, A CM SIGCOMM ’24, Association for Computing Mac hinery , New Y ork, NY, USA, 2024, pp. 813–827. doi:10.1145/3651890.3672243. URL https://doi.org/10.1145/3651890.3672243 [145] P . Namy ar, B. Arzani, S. Kandula, S. Segarra, D. Crankshaw, U. Kr- ishnasw amy , R. Go vindan, H. Ra j, Solving { Max-Min } fair resource allo cations quic kly on large graphs, in: 21st USENIX Symposium on Net work ed Systems Design and Implementation (NSDI 24), 2024, pp. 1937–1958. [146] Inference co de for llama mo dels. URL https://github.com/facebookresearch/llama/blob/main/ llama/model.py [147] V. P axson, S. Floyd, Wh y we don’t kno w how to sim ulate the In ternet, in: Pro ceedings of the 29th conference on Win ter simulation - WSC ’97, A CM Press, New Y ork, New Y ork, USA, 1997, pp. 1037–1044. 
doi:10.1145/268437.268737. [148] S. Flo yd, E. Kohler, Internet researc h needs b etter mo dels, A CM SIGCOMM Computer Comm unication Review 33 (1) (2003) 29–34. doi:10.1145/774763.774767. 86 [149] Simple explanation of the no-free-lunch theorem and its implications, Journal of optimization theory and applications 115 (2002) 549–570. [150] C. E. Leiserson, F at-trees: Universal netw orks for hardware-efficien t sup ercomputing, IEEE T ransactions on Computers C-34 (10) (1985) 892–901. doi:10.1109/TC.1985.6312192. [151] C. Güemes-P alau, M. F erriol-Galmés, J. Paillisse-Vilano v a, A. Lóp ez- Brescó, P . Barlet-Ros, A. Cab ellos-Aparicio, Bridging the gap b et w een sim ulated and real netw ork data using transfer learning, arXiv preprin t arXiv:2510.00956 (2025). 87
