Analysis of Inter-Domain Traffic Correlations: Random Matrix Theory Approach

📝 Original Info

  • Title: Analysis of Inter-Domain Traffic Correlations: Random Matrix Theory Approach
  • ArXiv ID: 0706.2520
  • Date: 2007-06-19
  • Authors: Author information is not provided in the original text.

📝 Abstract

The traffic behavior of the University of Louisville network, with its interconnected backbone routers and a number of Virtual Local Area Network (VLAN) subnets, is investigated using the Random Matrix Theory (RMT) approach. We employ the system of equal-interval time series of traffic counts at all router-to-router and router-to-subnet connections as a representation of the inter-VLAN traffic. The cross-correlation matrix C of the traffic rate changes between different traffic time series is calculated and tested against the null hypothesis of random interactions. The majority of the eigenvalues \lambda_{i} of matrix C fall within the bounds predicted by the RMT for the eigenvalues of random correlation matrices. The distribution of eigenvalues and eigenvectors outside of the RMT bounds displays prominent and systematic deviations from the RMT predictions. Moreover, these deviations are stable in time. The method we use offers a unique possibility to accomplish three concurrent tasks of traffic analysis. First, it verifies the uncongested state of the network by establishing the profile of random interactions. Second, it recognizes the system-specific large-scale interactions by establishing the profile of time-stable non-random interactions. Finally, by looking into the eigenstatistics we are able to detect and locate anomalies of network traffic interactions.

📄 Full Content

The infrastructure, applications and protocols of the system of communicating computers and networks are constantly evolving. Traffic, the essence of this communication, is at present a voluminous body of data generated on a minute-by-minute basis within a multi-layered structure by different applications and according to different protocols. As a consequence, there are two general approaches to analyzing the traffic and modeling its healthy behavior. In the first approach, the traffic analysis considers the protocols, applications, traffic matrix and routing matrix estimates, independence of ingress and egress points and much more. The second approach treats the infrastructure between the points from which the traffic is obtained as a "black box" [33], [34].

Measuring interactions between logically and architecturally equivalent substructures of the system is a natural extension of the “black box” approach. A certain amount of work in this direction has already been done. Studies of the statistical properties of traffic flow revealed “congested”, “fluid” and “transitional” regimes of the flow at a large scale [1], [2]. The observed collective behavior suggests the existence of large-scale, network-wide correlations between the network subparts. Indeed, the work in [3] showed large-scale cross-correlations between different connections of the Renater scientific network. Moreover, the analysis of correlations across all simultaneous network-wide traffic has been used in the detection of distributed network attacks [4].

The distributions and stability of the statistics of established interactions represent characteristic features of the system and may be exploited to create a profile of healthy network traffic, which is an essential part of network anomaly detection. As successfully demonstrated in [5], all tested traffic anomalies change the distribution of the traffic features.

Among the numerous types of traffic monitoring variables, time series of traffic counts are free of application “semantics” and are thus preferable for “black box” analysis. The empirical correlation matrix is the usual tool for extracting meaningful information about the underlying interactions contained in the time series. In addition, there are various classes of statistical tools, such as principal component analysis, singular value decomposition, and factor analysis, which rely strongly on the validity of the correlation matrix to obtain the meaningful part of the time series. Thus, it is important to understand quantitatively the effect of noise, i.e. to separate the noisy, random interactions from the meaningful ones. It is also crucial to take the finiteness of the time series into account when determining the empirical correlation, since the finite length of the time series available to estimate cross-correlations introduces “measurement noise” [19]. Statistically, it is also advisable to develop null-hypothesis tests in order to check the degree of statistical validity of the results against the case of purely random interactions.
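
The effect of finite series length can be illustrated with a minimal sketch. It assumes NumPy, synthetic Gaussian series in place of real traffic data, and a hypothetical helper name `cross_correlation_matrix`; none of these come from the original study. Even for series that are independent by construction, the estimated off-diagonal correlations are not zero, and their spread is roughly 1/sqrt(L) for series of length L.

```python
import numpy as np

def cross_correlation_matrix(series):
    """Equal-time cross-correlation matrix of the rows of `series` (shape N x L)."""
    x = np.asarray(series, dtype=float)
    # Standardize every series to zero mean and unit variance.
    z = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    # Average the products of standardized values over the L time points.
    return z @ z.T / x.shape[1]

# Synthetic stand-in for traffic data: N independent series of length L (assumption).
rng = np.random.default_rng(0)
n_series, n_samples = 50, 500
independent = rng.standard_normal((n_series, n_samples))

C = cross_correlation_matrix(independent)
off_diagonal = C[~np.eye(n_series, dtype=bool)]

# The true correlations are zero, yet the estimates scatter around zero.
print("largest spurious |C_ij|:", np.abs(off_diagonal).max())
print("spread of C_ij:", off_diagonal.std(), "compared with 1/sqrt(L):", n_samples ** -0.5)
```

Longer series shrink this spurious spread, which is why the ratio of series length to the number of series matters when judging which correlations are meaningful.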

The methodology of random matrix theory (RMT) was developed for studying the complex energy levels of heavy nuclei, and a detailed account is given in [6], [7], [8], [9], [10], [11]. For our purposes this methodology takes the form of a series of statistical tests run on the eigenvalues and eigenvectors of the “system matrix”, which in our case is the traffic time series cross-correlation matrix C (and is the Hamiltonian matrix in the case of nuclei and other RMT systems [6], [7], [8], [9], [10], [11]).
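
One such eigenvalue test can be sketched as follows. The sketch assumes that the relevant null model is the standard Marchenko-Pastur result for correlation matrices built from N uncorrelated series of length L, whose eigenvalues lie (for large N and L) between 1 + 1/Q - 2 sqrt(1/Q) and 1 + 1/Q + 2 sqrt(1/Q), with Q = L/N. The function names and the synthetic data are hypothetical and are not taken from the original study.

```python
import numpy as np

def rmt_bounds(n_series, n_samples):
    """Marchenko-Pastur eigenvalue bounds for a random correlation matrix."""
    q = n_samples / n_series
    half_width = 2.0 * np.sqrt(1.0 / q)
    return 1.0 + 1.0 / q - half_width, 1.0 + 1.0 / q + half_width

def eigenvalues_outside_bounds(series):
    """Return all eigenvalues of C, those outside the RMT bounds, and the bounds."""
    x = np.asarray(series, dtype=float)
    n_series, n_samples = x.shape
    C = np.corrcoef(x)                   # rows are treated as variables
    eigenvalues = np.linalg.eigvalsh(C)  # C is symmetric; result is ascending
    lam_min, lam_max = rmt_bounds(n_series, n_samples)
    outside = eigenvalues[(eigenvalues < lam_min) | (eigenvalues > lam_max)]
    return eigenvalues, outside, (lam_min, lam_max)

# Synthetic data (assumption): pure noise, and noise plus a shared component
# added to the first 10 series to mimic a genuine large-scale interaction.
rng = np.random.default_rng(1)
noise = rng.standard_normal((50, 2000))
coupled = noise.copy()
coupled[:10] += rng.standard_normal(2000)

for label, data in (("noise", noise), ("coupled", coupled)):
    eigenvalues, outside, (lam_min, lam_max) = eigenvalues_outside_bounds(data)
    print(label, "bounds:", (lam_min, lam_max),
          "largest eigenvalue:", eigenvalues.max(),
          "count outside bounds:", outside.size)
```

Because the bounds are exact only in the limit of large N and L, a few eigenvalues of a finite random matrix may sit marginally outside them; the informative deviations are the ones that lie far from the edges.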

In our study, we propose to investigate the network traffic as a complex system with a certain degree of mutual interaction among its constituents, i.e. single-link traffic time series, using the RMT approach. We concentrate on the large-scale correlations between the time series generated by Simple Network Management Protocol (SNMP) traffic counters at every router-router and router-VLAN subnet connection of the University of Louisville backbone router system.
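
As a hedged illustration of how such counter readings could be turned into a time series, the sketch below converts cumulative counter readings into per-interval traffic counts and then into rate changes. Several things here are assumptions rather than details from the text: the readings are taken at equal intervals, the counter is 32 bits wide and wraps at most once between readings, every interval carries a nonzero count, and the rate change is defined as the logarithmic difference of successive counts. The function names are hypothetical.

```python
import math

def counts_per_interval(readings, counter_bits=32):
    """Traffic count in each interval from cumulative counter readings.

    SNMP counters only increase and wrap around at 2**counter_bits, so
    differences are taken modulo that value (assumes at most one wrap per interval).
    """
    modulus = 1 << counter_bits
    return [(later - earlier) % modulus
            for earlier, later in zip(readings, readings[1:])]

def rate_changes(counts):
    """Logarithmic change between successive counts (assumed definition).

    Requires strictly positive counts; idle intervals need separate handling.
    """
    return [math.log(later / earlier)
            for earlier, later in zip(counts, counts[1:])]

# Hypothetical readings of a 32-bit counter that wraps between samples 2 and 3.
readings = [4294967000, 4294967200, 104, 504]
counts = counts_per_interval(readings)
changes = rate_changes(counts)
print(counts)
print(changes)
```

Taking differences modulo the counter range keeps a wraparound from appearing as a large negative count, which would otherwise distort the correlations computed from the series.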

The contributions of this study are as follows:

• We propose a methodology for analyzing network-wide traffic time series interactions that is free of application constraints. In this particular study we know in advance that VLANs represent separate broadcast domains, that VLAN-router incoming traffic is traffic intended for other VLANs, and that VLAN-router outgoing traffic is routed traffic from other VLANs. Nevertheless, this information is irrelevant for our analysis and is used only in the interpretation of the analysis results.

• Using the RMT, we are able to separate the random interactions from system-specific interactions. The vast majority of traffic time series interact in a random fashion. The time-stable random interactions signify healthy, congestion-free traffic. The proposed analysis of the eigenvector distribution makes it possible to verify the time series content of uncongested traffic.

• The time-stable non-random interactions provide us with information

Reference

This content is AI-processed based on open access ArXiv data.
