We develop a probabilistic framework for global modeling of the traffic over a computer network. This model integrates existing single-link (-flow) traffic models with the routing over the network to capture the global traffic behavior. It arises from a limit approximation of the traffic fluctuations as the time scale and the number of users sharing the network grow. The resulting probability model comprises Gaussian and/or stable, infinite-variance components. These can be succinctly described and handled by certain 'space-time' random fields. The model is validated against simulated and real data. It is then applied to predict traffic fluctuations over unobserved links from a limited set of observed links. Further applications to anomaly detection and network management are briefly discussed.
Deep Dive into Global Modeling and Prediction of Computer Network Traffic.
Understanding the statistical behavior of computer network traffic has been an important and challenging problem for the past 15 years, because of its impact on network performance and provisioning [21,29,15,26] and on the potential for development of more suitable protocols [25,26]. Since the early 1990s it has been well established that the traffic over a single link exhibits intricate temporal dependence, known as burstiness, which could not be explained by traffic models developed for telephone networks [20]. This phenomenon can be understood and described using the notions of long-range dependence and self-similarity [12], which in turn are affected by the presence of heavy tails in the distribution of file sizes [7,25]. A bottom-up mechanistic model for single-link network traffic that agrees with the empirical features observed in real network traces was presented in [38]. A competing model based on queuing ideas was studied in [22]. These works led to many further developments (see, e.g., [26]).
Technological advances in acquiring direct (through sampling [10,42]) and indirect [19] measurements have allowed researchers to examine the characteristics of traffic in entire networks [18,15,31,41], based on statistical modeling and analysis. On the other hand, no analogue of the mechanistic models for single-link network traffic is available at the network-wide level. Such a model would allow better understanding of network performance [13,21] and detection of anomalous behavior [27]. Further, it would capture and explain statistical relationships between flows traversing the network at all time scales (time) and across all links (space). The latter is a fairly tall requirement, which may also prove rather impractical given the underlying complexity (protocols, applications) and heterogeneity (physical infrastructure, diverse users) of modern networks.
Our objective in this paper is to propose a mechanistic model that captures several fundamental characteristics of network-wide traffic and thus constitutes a partial solution to this challenging problem. The model is based on modeling user behavior on source-destination paths across the network and then aggregating over users and over time, thus developing a joint ‘space-time’ probability model for the traffic fluctuations over all links in the network. This model reflects the statistical dependence of the traffic across different links, observed at the same or different points in time. We demonstrate the success of our modeling strategy in the context of network traffic prediction, a problem with important implications for network performance, provisioning, and management.
The remainder of the paper is structured as follows. In Section 2.1, we briefly review the existing and relatively well-understood theory of single-flow (link) models for the temporal dependence in network traffic. Long-range dependence and heavy tails play a central role. In Section 3, we postulate our network-wide model, which combines single-flow models through the routing equation. We show that the scaling limit of such a model is a combination of fractional Brownian motions and infinite variance stable Lévy motions. A succinct representation of these processes is given in Section 3.2 via the functional fractional Brownian motion and functional Lévy stable motion. The resulting model is then used in Section 4 to solve the network kriging problem, i.e., to predict the traffic fluctuations on an unobserved link from a limited set of measurements of observed links. In Section 5, we use extensive NetFlow data of sampled network traffic to obtain approximations of the flow-level traffic $X_j(t)$. These data are then used to validate our model and demonstrate the success of the network kriging methodology. We conclude in Section 6 with some remarks on future applications, statistical problems on networks, and further extensions of the network-wide probabilistic model.
Consider a computer network of $L$ links and $N$ nodes. The network typically carries traffic flows (via groups of packets) from any node (source) to any other node (destination) over a predetermined set of links (route). This can be formally described by the routing matrix $A = (a_{\ell j})_{L \times J}$, where

$$a_{\ell j} = \begin{cases} 1, & \text{route } j \text{ involves link } \ell, \\ 0, & \text{otherwise,} \end{cases}$$

and where $J$ is the total number of routes (typically, $J = N(N-1)$).
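To make the definition of the routing matrix concrete, the following is a minimal sketch in Python with NumPy. The helper `routing_matrix`, the toy topology, and the 0-based link indices are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def routing_matrix(routes, num_links):
    """Build the L x J 0/1 matrix with A[l, j] = 1 iff route j involves link l.

    `routes` is a list of tuples of 0-based link indices (hypothetical input format).
    """
    A = np.zeros((num_links, len(routes)), dtype=int)
    for j, links in enumerate(routes):
        A[list(links), j] = 1  # mark every link that route j traverses
    return A

# Hypothetical toy network: L = 3 links and J = 4 routes.
routes = [(0,), (0, 1), (1, 2), (2,)]
A = routing_matrix(routes, num_links=3)
print(A)
```

Each column of `A` describes one route and each row one link, so the row sums count how many routes share a given link.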
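We describe next the physical premises of our modeling framework. We assume, for simplicity, that the traffic is fluid. That is, the amount of data (bytes) transmitted over link $\ell$ during the time interval $(a, b)$ is $\int_a^b Y_\ell(t)\,dt$, where $Y_\ell(t)$ is the traffic intensity (bytes per unit time) over link $\ell$. Let also $X_j(t)$ denote the traffic intensity at time $t$ over route $j$, $1 \le j \le J$. Then, assuming that traffic propagates instantaneously over the network, we obtain the following routing equation:

$$Y(t) = A X(t), \quad \text{that is,} \quad Y_\ell(t) = \sum_{j=1}^{J} a_{\ell j} X_j(t), \quad 1 \le \ell \le L,$$

where $Y(t) = (Y_\ell(t))_{1 \le \ell \le L}$ and $X(t) = (X_j(t))_{1 \le j \le J}$ are the vectors of link-level and route-level traffic intensities.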
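As an illustration of how route-level intensities aggregate into link-level intensities under the fluid, instantaneous-propagation assumptions, here is a small Python sketch. The routing matrix, the constant route intensities, the time grid, and the helper `bytes_on_link` are all hypothetical choices made for the example, and the integral is approximated with a plain trapezoid rule.

```python
import numpy as np

# Hypothetical routing matrix: L = 3 links (rows), J = 4 routes (columns).
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])

# Hypothetical time grid and constant route intensities (bytes per unit time).
t = np.linspace(0.0, 1.0, 101)
rates = np.array([10.0, 20.0, 30.0, 40.0])
X = rates[:, None] * np.ones_like(t)[None, :]  # shape (J, T)

# Link intensities: each link carries the sum of the routes that use it.
Y = A @ X  # shape (L, T)

def bytes_on_link(Y, t, link, a, b):
    """Approximate the integral of Y[link] over [a, b] with the trapezoid rule."""
    mask = (t >= a) & (t <= b)
    y, s = Y[link, mask], t[mask]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s)))

print(Y[:, 0])
print(bytes_on_link(Y, t, link=0, a=0.0, b=1.0))
```

With these inputs, link 0 carries routes 0 and 1, so its intensity is the sum of their rates at every time point.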
This relationship is valid only to the extent that traffic propagates instantaneously along
…(Full text truncated)…