Analysis of Inter-Domain Traffic Correlations: Random Matrix Theory Approach
The traffic behavior of University of Louisville network with the interconnected backbone routers and the number of Virtual Local Area Network (VLAN) subnets is investigated using the Random Matrix Theory (RMT) approach. We employ the system of equal…
Authors: Viktoria Rojkova, Mehmed Kantardzic
Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292
email: {vbrozh01, mmkant01}@gwise.louisville.edu

Abstract: The traffic behavior of the University of Louisville network, with its interconnected backbone routers and a number of Virtual Local Area Network (VLAN) subnets, is investigated using the Random Matrix Theory (RMT) approach. We employ the system of equal-interval time series of traffic counts at all router-to-router and router-to-subnet connections as a representation of the inter-VLAN traffic. The cross-correlation matrix $C$ of the traffic rate changes between different traffic time series is calculated and tested against the null hypothesis of random interactions.

The majority of the eigenvalues $\lambda_i$ of matrix $C$ fall within the bounds predicted by the RMT for the eigenvalues of random correlation matrices. The distribution of eigenvalues and eigenvectors outside of the RMT bounds displays prominent and systematic deviations from the RMT predictions. Moreover, these deviations are stable in time.

The method we use makes it possible to accomplish three concurrent tasks of traffic analysis. It verifies the uncongested state of the network by establishing the profile of random interactions. It recognizes the system-specific large-scale interactions by establishing the profile of time-stable non-random interactions. Finally, by looking into the eigenstatistics we are able to detect and locate anomalies of network traffic interactions.

CATEGORIES AND SUBJECT DESCRIPTORS
C.2.3 [Computer-Communication Networks]: Network Operations

GENERAL TERMS
Measurement, Experimentation

Index Terms: Network-Wide Traffic Analysis, Random Matrix Theory, Large-Scale Correlations

I. INTRODUCTION

The infrastructure, applications and protocols of the system of communicating computers and networks are constantly evolving. The traffic, which is the essence of the communication, is presently a voluminous body of data generated on a minute-by-minute basis within a multi-layered structure by different applications and according to different protocols. As a consequence, there are two general approaches to the analysis of the traffic and to the modeling of its healthy behavior. In the first approach, the traffic analysis considers the protocols, applications, traffic matrix and routing matrix estimates, independence of ingress and egress points and much more. The second approach treats the infrastructure between the points from which the traffic is obtained as a “black box” [33], [34].

Measuring interactions between logically and architecturally equivalent substructures of the system is a natural extension of the “black box” approach. A certain amount of work in this direction has already been done. Studies on statistical traffic flow properties revealed the “congested”, “fluid” and “transitional” regimes of the flow at a large scale [1], [2]. The observed collective behavior suggests the existence of large-scale network-wide correlations between the network subparts. Indeed, the work in [3] showed large-scale cross-correlations between different connections of the Renater scientific network. Moreover, the analysis of correlations across all simultaneous network-wide traffic has been used in the detection of distributed network attacks [4].

The distributions and stability of established interaction statistics represent the characteristic features of the system and may be exploited in healthy network traffic profile creation, which is an essential part of network anomaly detection.
As successfully demonstrated in [5], all tested traffic anomalies change the distribution of the traffic features.

Among the numerous types of traffic monitoring variables, time series of traffic counts are free of application “semantics” and thus preferable for “black box” analysis. To extract meaningful information about the underlying interactions contained in time series, the empirical correlation matrix is the usual tool at hand. In addition, there are various classes of statistical tools, such as principal component analysis, singular value decomposition, and factor analysis, which in turn rely strongly on the validity of the correlation matrix to obtain the meaningful part of the time series. Thus, it is important to understand quantitatively the effect of noise, i.e. to separate the noisy, random interactions from the meaningful ones. It is also crucial to consider the finiteness of the time series in the determination of the empirical correlation, since the finite length of the time series available to estimate cross-correlations introduces “measurement noise” [19]. Statistically, it is also advisable to develop null-hypothesis tests in order to check the degree of statistical validity of the results obtained against cases of purely random interactions.

The methodology of random matrix theory (RMT) was developed for studying the complex energy levels of heavy nuclei and is described in detail in [6], [7], [8], [9], [10], [11]. For our purposes this methodology comes in as a series of statistical tests run on the eigenvalues and eigenvectors of the “system matrix”, which in our case is the traffic time series cross-correlation matrix $C$ (and is the Hamiltonian matrix in the case of nuclei and other RMT systems [6], [7], [8], [9], [10], [11]).
In our study, we propose to investigate the network traffic as a complex system with a certain degree of mutual interactions of its constituents, i.e. single-link traffic time series, using the RMT approach. We concentrate on the large-scale correlations between the time series generated by Simple Network Management Protocol (SNMP) traffic counters at every router-router and router-VLAN subnet connection of the University of Louisville backbone router system.

The contributions of this study are as follows:
• We propose a methodology for network-wide analysis of traffic time series interactions that is free of application constraints. In this particular study we know in advance that VLANs represent separate broadcast domains, that VLAN-router incoming traffic is traffic intended for other VLANs, and that VLAN-router outgoing traffic is routed traffic from other VLANs. Nevertheless, this information is irrelevant for our analysis and is used only in the interpretation of the analysis results.
• Using the RMT, we are able to separate the random interactions from system-specific interactions. The vast majority of traffic time series interact in a random fashion. Time-stable random interactions signify healthy, congestion-free traffic. The proposed analysis of the eigenvector distribution allows us to verify the time series content of uncongested traffic.
• The time-stable non-random interactions provide us with information about large-scale system-specific interactions.
• Finally, the temporal changes in random and non-random interactions can be detected and located with the eigenvalue and eigenvector statistics of the interactions.

The organization of this paper is as follows. Section II presents the survey of related work. We describe the RMT methodology in Section III. Section IV contains the explanation of the data analyzed.
In Section V we test the eigenvalue distribution of the inter-VLAN traffic time series cross-correlation matrix $C$ against the RMT predictions. In Section VI we analyze the content of inter-VLAN traffic interactions by means of the eigenvalues and eigenvectors that deviate from the RMT. Section VII discusses the characteristic traffic interaction parameters of the system, such as the time stability of the deviating eigenvalues and eigenvectors, the inverse participation ratio (IPR) of the eigenvalue spectra, localization points in the IPR plot, and overlap matrices of the deviating eigenvectors. With a series of different experiments, we demonstrate in Section VIII how traffic interaction anomalies can be detected and located in time and space using various visualization techniques on eigenvalue and eigenvector statistics. We present our conclusions and prospective research steps in Section IX.

II. RELATED WORK

Few works investigate the interactions of traffic time series regardless of the underlying architecture of the traffic system. As stated in the Introduction, the study in [3] showed large-scale cross-correlations between different connections of the French scientific network Renater, with 26 interconnected routers and 650 connection links. Random interactions between traffic time series of a complex traffic system, without routing protocol information, were established by Krbalek and Seba in [12] for the transportation system in Cuernavaca (Mexico).

The urgent need for a network-wide, scalable approach to the problem of healthy network traffic profile creation is expressed in [5], [14], [15], [13], [16], [17]. Several studies with promising results demonstrate that anomalous traffic events cause temporal changes in the statistical properties of traffic features.
Lakhina, Crovella and Diot presented a characterization of the network-wide anomalies of traffic flows. The authors studied three different types of traffic flows and fused the information from flow measurements taken throughout the entire network. They obtained and classified a different set of anomalies for different traffic types using the subspace method [14]. The same group of researchers extended their work in [5]. Under the new assumption that any network anomaly induces changes in the distributional aspects of packet header fields, they detected and identified a large set of anomalies using an entropy measurement tool.

A hidden Markov model was proposed to model the distribution of network-wide traffic in [15]. The observation window is used to distinguish denial of service (DoS) flooding attacks mixed with the normal background traffic. Roughan et al. combined the entire network routing and traffic data to detect IP forwarding anomalies [16].

Huang et al. [17] used a distributed version of the Principal Component Analysis (PCA) method for centralized network-wide volume anomaly detection. A key ingredient of their framework is an analytical method based on stochastic matrix perturbation theory that balances the accuracy of the approximate network anomaly detection against the amount of data communication over the network.

The authors of [13] found a high temporal correlation (frequently > 0.99) between flow counts on quiescent ports (TCP/IP ports which are not in regular use) during vertical scans, one of the known pre-attack, so-called reconnaissance, anomalous behaviors.

III. RMT METHODOLOGY

The RMT has been employed in financial studies of stock correlations [18], [19], communication theory of wireless systems [20], array signal processing [21], and bioinformatics studies of protein folding [22].
We are not aware of any work, except for [3], where RMT techniques were applied to the Internet traffic system.

We adopt the methodology used in works on financial time series correlations (see [18], [19] and references therein) and later in [3], which discusses cross-correlations in Internet traffic. In particular, we quantify correlations between $N$ traffic count time series of $L$ time points by calculating the traffic rate change of every time series $T_i$, $i = 1, \dots, N$, over a time scale $\Delta t$,

$$G_i(t) \equiv \ln T_i(t + \Delta t) - \ln T_i(t), \tag{1}$$

where $T_i(t)$ denotes the traffic rate of time series $i$. This measure is independent of the volume of the traffic exchange and allows capturing subtle changes in the traffic rate [3]. The normalized traffic rate change is

$$g_i(t) \equiv \frac{G_i(t) - \langle G_i(t) \rangle}{\sigma_i}, \tag{2}$$

where $\sigma_i \equiv \sqrt{\langle G_i^2 \rangle - \langle G_i \rangle^2}$ is the standard deviation of $G_i$. The equal-time cross-correlation matrix $C$ can be computed as follows:

$$C_{ij} \equiv \langle g_i(t)\, g_j(t) \rangle. \tag{3}$$

The properties of the traffic interactions matrix $C$ have to be compared with those of a random cross-correlation matrix [23]. In matrix notation, the interaction matrix $C$ can be expressed as

$$C = \frac{1}{L} G G^T, \tag{4}$$

where $G$ is an $N \times L$ matrix with elements $\{ g_{im} \equiv g_i(m \Delta t);\ i = 1, \dots, N;\ m = 0, \dots, L-1 \}$, and $G^T$ denotes the transpose of $G$. Just as was done in [19], we consider a random correlation matrix

$$R = \frac{1}{L} A A^T, \tag{5}$$

where $A$ is an $N \times L$ matrix containing $N$ time series of $L$ random elements $a_{im}$ with zero mean and unit variance, which are mutually uncorrelated as a null hypothesis.

Statistical properties of the random matrices $R$ have been known for years in the physics literature [6], [10], [7], [8], [9], [11].
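The construction of Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `cross_correlation_matrix`, the choice of $\Delta t$ equal to one sampling step, and the synthetic input `T_demo` are assumptions made for the example.

```python
import numpy as np

def cross_correlation_matrix(T):
    """Equal-time cross-correlation matrix C from traffic rates.

    T: array of shape (N, L + 1) with strictly positive traffic rates,
       one row per time series (assumed layout, Delta t = one sample).
    """
    T = np.asarray(T, dtype=float)
    G = np.diff(np.log(T), axis=1)           # Eq. (1): log-rate changes, shape (N, L)
    mean = G.mean(axis=1, keepdims=True)
    sigma = G.std(axis=1, keepdims=True)     # sqrt(<G^2> - <G>^2), population form
    g = (G - mean) / sigma                   # Eq. (2): zero mean, unit variance
    L = g.shape[1]
    return g @ g.T / L                       # Eq. (4): C = (1/L) G G^T

# Synthetic, strictly positive data used only to exercise the function.
rng = np.random.default_rng(0)
T_demo = np.exp(rng.normal(size=(5, 201)))
C_demo = cross_correlation_matrix(T_demo)
```

Because each row of `g` has unit variance, the diagonal of the resulting matrix is one and the matrix is symmetric, as expected for a correlation matrix.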
In particular, it was shown analytically [24] that, in the limit $N \to \infty$, $L \to \infty$ with $Q \equiv L/N\ (> 1)$ fixed, the probability density function $P_{rm}(\lambda)$ of the eigenvalues $\lambda$ of the random matrix $R$ is given by

$$P_{rm}(\lambda) = \frac{Q}{2\pi} \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda}, \tag{6}$$

where $\lambda_+$ and $\lambda_-$ are the maximum and minimum eigenvalues of $R$, respectively, and $\lambda_- \le \lambda_i \le \lambda_+$. $\lambda_+$ and $\lambda_-$ are given analytically by

$$\lambda_\pm = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}. \tag{7}$$

Random matrices display universal functional forms for eigenvalue correlations which depend only on the general symmetries of the matrix. The first step in testing the data for such universal properties is to find a transformation called “unfolding”, which maps the eigenvalues $\lambda_i$ to new variables, the “unfolded eigenvalues” $\xi_i$, whose distribution is uniform [9], [10], [11]. Unfolding ensures that the distances between eigenvalues are expressed in units of the local mean eigenvalue spacing [9], and thus facilitates the comparison with analytical results.

We define the cumulative distribution function of eigenvalues, which counts the number of eigenvalues in the interval $\lambda_i \le \lambda$,

$$F(\lambda) = N \int_{-\infty}^{\lambda} P(x)\, dx, \tag{8}$$

where $P(x)$ denotes the probability density of eigenvalues and $N$ is the total number of eigenvalues. The function $F(\lambda)$ can be decomposed into an average and a fluctuating part,

$$F(\lambda) = F_{av}(\lambda) + F_{fluc}(\lambda). \tag{9}$$

Since $P_{fluc} \equiv dF_{fluc}(\lambda)/d\lambda = 0$ on average,

$$P_{rm}(\lambda) \equiv \frac{dF_{av}(\lambda)}{d\lambda} \tag{10}$$

is the averaged eigenvalue density. The dimensionless, unfolded eigenvalues are then given by

$$\xi_i \equiv F_{av}(\lambda_i). \tag{11}$$

Three known universal properties of GOE matrices (matrices whose elements are distributed according to a Gaussian probability measure) are:

(i) the distribution of nearest-neighbor eigenvalue spacings $P_{GOE}(s)$,

$$P_{GOE}(s) = \frac{\pi s}{2} \exp\left(-\frac{\pi}{4} s^2\right), \tag{12}$$

(ii) the distribution of next-nearest-neighbor eigenvalue spacings, which according to the theorem due to [8] is identical to the distribution of nearest-neighbor spacings of the Gaussian symplectic ensemble (GSE),

$$P_{GSE}(s) = \frac{2^{18}}{3^6 \pi^3} s^4 \exp\left(-\frac{64}{9\pi} s^2\right), \tag{13}$$

and finally (iii) the “number variance” statistic $\Sigma^2$, defined as the variance of the number of unfolded eigenvalues in intervals of length $l$ around each $\xi_i$ [9], [11], [10],

$$\Sigma^2(l) = \left\langle [n(\xi, l) - l]^2 \right\rangle_\xi, \tag{14}$$

where $n(\xi, l)$ is the number of unfolded eigenvalues in the interval $\left[\xi - \frac{l}{2},\ \xi + \frac{l}{2}\right]$. The number variance is expressed as follows:

$$\Sigma^2(l) = l - 2 \int_0^l (l - x)\, Y(x)\, dx, \tag{15}$$

where $Y(x)$ for the GOE case is given by [9]

$$Y(x) = s^2(x) + \frac{ds}{dx} \int_x^\infty s(x')\, dx', \tag{16}$$

and

$$s(x) = \frac{\sin(\pi x)}{\pi x}. \tag{17}$$

Just as was stressed in [19], [18], [25], the overall time of observation is crucial for explaining the empirical cross-correlation coefficients. On one hand, the longer we observe the traffic, the more information about the correlations we obtain and the less “noise” we introduce. On the other hand, the correlations are not stationary, i.e. they can change with time. To differentiate the “random” contribution to the empirical correlation coefficients from the “genuine” contribution, the eigenvalue statistics of $C$ are contrasted with the eigenvalue statistics of a correlation matrix taken from the so-called “chiral” Gaussian Orthogonal Ensemble [19]. Such an ensemble is one of the ensembles of RMT [25], [26], briefly discussed in Appendix A.
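As a quick numerical sanity check on the spacing distributions of Eqs. (12) and (13), the sketch below evaluates both on a fine grid and confirms that each is normalized and has unit mean spacing, which is what unfolding presupposes. The function names `p_goe` and `p_gse` and the grid limits are choices made for this illustration only.

```python
import numpy as np

def p_goe(s):
    """Nearest-neighbor spacing distribution for the GOE, Eq. (12)."""
    return (np.pi * s / 2.0) * np.exp(-np.pi * s**2 / 4.0)

def p_gse(s):
    """Nearest-neighbor spacing distribution for the GSE, Eq. (13)."""
    prefactor = 2.0**18 / (3.0**6 * np.pi**3)
    return prefactor * s**4 * np.exp(-64.0 * s**2 / (9.0 * np.pi))

# Simple Riemann sums on a fine grid; both densities are negligible beyond s = 10.
s = np.linspace(0.0, 10.0, 200001)
ds = s[1] - s[0]

norm_goe = np.sum(p_goe(s)) * ds        # total probability
mean_goe = np.sum(s * p_goe(s)) * ds    # mean spacing
norm_gse = np.sum(p_gse(s)) * ds
mean_gse = np.sum(s * p_gse(s)) * ds
```

An empirical spacing histogram built from `np.diff` of the sorted unfolded eigenvalues can then be overlaid on `p_goe(s)` or `p_gse(s)` for comparison.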
A random cross-correlation matrix, which is a matrix filled with uncorrelated Gaussian random numbers, is supposed to represent transient network activity that is uncorrelated in time, that is, a completely noisy environment. If the cross-correlation matrix $C$ obeys the same eigenstatistical properties as the RMT matrix, the network traffic is equilibrated and deemed universal in the sense that every single connection interacts with the rest in a completely chaotic manner. It also means a complete absence of congestion and anomalies. Meanwhile, any time-stable deviations from the universal predictions of RMT signify system-specific, non-random properties of the system, providing clues about the nature of the underlying interactions. That allows us to establish the profile of system-specific correlations.

IV. DATA

In this paper, we study the averaged traffic count data collected from all router-router and router-VLAN subnet connections of the University of Louisville backbone router system. The system consists of nine interconnected multi-gigabit backbone routers, over 200 Ethernet segments and over 300 VLAN subnets. We collected the traffic count data for 3 months, from September 21, 2006 to December 20, 2006, from 7 routers, since two routers are reserved for server farms. The overall data amounted to approximately 18 GB.

The traffic count data is provided by the Multi Router Traffic Grapher (MRTG) tool, which reads the SNMP traffic counters. The MRTG log file never grows in size due to its data consolidation algorithm: it contains records of the average incoming, outgoing, max and min transfer rates in bytes per second with time intervals of 300 seconds, 30 minutes, 1 day and 1 month. We extracted the 300-second interval data for seven days. Then, we separated the incoming and outgoing traffic count time series and considered them as independent.
For 352 connections we formed $L = 2015$ records of $N = 704$ time series with a 300-second interval. Since we are interested in the changes in the traffic rate, we excluded from consideration the connections where the channel is open but traffic is not established, or where there is only constant-rate, equally low-volume test traffic. Another reason for excluding the “empty” traffic time series is that they make the time series cross-correlation matrix unnecessarily sparse. The exclusion does not influence the analysis and results. After the exclusions the number of traffic time series became $N = 497$.

To calculate the traffic rate change $G_i(t)$ we used the logarithm of the ratio of two successive counts. As stated earlier, the log-transformation makes the ratio independent of the traffic volume and allows capturing subtle changes in the traffic rate. We added 1 byte to all data points to avoid manipulations with $\log(0)$ in cases where the traffic count is equal to zero bytes. This measure did not affect the changes in the traffic rate.

V. EIGENVALUE DISTRIBUTION OF CROSS-CORRELATION MATRIX, COMPARISON WITH RMT

We constructed the inter-VLAN traffic cross-correlation matrix $C$ with the number of time series $N = 497$ and the number of observations per series $L = 2015$ ($Q = 4.0625$), so that $\lambda_+ = 2.23843$ and $\lambda_- = 0.253876$. Our first goal is to compare the eigenvalue distribution $P(\lambda)$ of $C$ with $P_{rm}(\lambda)$ [23]. To compute the eigenvalues of $C$ we used a standard MATLAB function. The empirical probability distribution $P(\lambda)$ is then given by the corresponding histogram. We display the resulting distribution $P(\lambda)$ in Figure 1 and compare it to the probability distribution $P_{rm}(\lambda)$ taken from Eq. (6), calculated for the same value of the traffic time series parameter ($Q = 4.0625$). The solid curve shows $P_{rm}(\lambda)$ of Eq. (6).
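The RMT bounds quoted above follow directly from Eq. (7). The short sketch below reproduces them and evaluates the density of Eq. (6); the value of $Q$ is taken as stated in the text rather than recomputed from $L$ and $N$, and the function names `mp_bounds` and `mp_density` are illustrative choices, not part of the original analysis (which used MATLAB).

```python
import numpy as np

def mp_bounds(Q):
    """Lower and upper eigenvalue bounds of Eq. (7) for Q = L / N > 1."""
    root = 2.0 * np.sqrt(1.0 / Q)
    return 1.0 + 1.0 / Q - root, 1.0 + 1.0 / Q + root

def mp_density(lam, Q):
    """Eigenvalue density P_rm(lambda) of Eq. (6); zero outside the bounds."""
    lam = np.asarray(lam, dtype=float)
    lam_minus, lam_plus = mp_bounds(Q)
    # The product is negative outside [lam_minus, lam_plus]; clip it to zero there.
    inside = np.clip((lam_plus - lam) * (lam - lam_minus), 0.0, None)
    return Q / (2.0 * np.pi) * np.sqrt(inside) / lam

Q = 4.0625                      # value quoted in the text
lam_minus, lam_plus = mp_bounds(Q)

# Numerical check that the density integrates to one over the bulk.
grid = np.linspace(lam_minus, lam_plus, 200001)
total = np.sum(mp_density(grid, Q)) * (grid[1] - grid[0])
```

Eigenvalues of the empirical matrix that fall above `lam_plus` are the candidates for the non-random, system-specific interactions discussed in the following sections.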
The largest eigenvalue, shown in the inset, has the value $\lambda_{497} = 8.99$. The inset to Figure 1 zooms in on the deviations from the RMT predictions.

Fig. 1. Empirical probability distribution function $P(\lambda)$ for the inter-VLAN traffic cross-correlations matrix $C$ (histogram). Axes: eigenvalue $\lambda$ (horizontal), probability density $P(\lambda)$ (vertical); the inset shows the deviations from the RMT curve $P_{rm}(\lambda)$.

We note the presence of “bulk” (RMT-like) eigenvalues, which fall within the bounds $[\lambda_-, \lambda_+]$ for $P_{rm}(\lambda)$, and the presence of eigenvalues which lie outside of the “bulk”, representing deviations from the RMT predictions. In particular, the largest eigenvalue $\lambda_{497} = 8.99$ for the seven-day period is approximately four times larger than the RMT upper bound $\lambda_+$.

The histogram for the well-defined bulk agrees with $P_{rm}(\lambda)$, suggesting that the cross-correlations of matrix $C$ are mostly random. We observe that inter-VLAN traffic time series interact mostly in a random fashion.

Nevertheless, the agreement of the empirical probability distribution $P(\lambda)$ of the bulk with $P_{rm}(\lambda)$ is not sufficient to claim that the bulk of the eigenvalue spectrum is random. Therefore, further RMT tests are needed [19].

To do that, we obtained the unfolded eigenvalues $\xi_i$ by following the phenomenological procedure referred to as Gaussian broadening (see [27], [35], [19], [18]). The empirical cumulative distribution function of eigenvalues $F(\lambda)$ agrees well with $F_{av}(\lambda)$ (see Figure 2), where $\xi_i$ is obtained with the Gaussian broadening procedure with the broadening parameter $a = 8$.

The first independent RMT test is the comparison of the distribution of the nearest-neighbor unfolded eigenvalue spacing $P_{nn}(s)$, where $s \equiv \xi_{k+1} - \xi_k$, with $P_{GOE}(s)$ [9], [10], [11].
The empirical probability distribution of nearest-neighbor unfolded eigenvalue spacings $P_{nn}(s)$ and $P_{GOE}(s)$ are presented in Figure 3. The Gaussian decay of $P_{GOE}(s)$ for large $s$ suggests that $P_{GOE}(s)$ “probes” scales only of the order of one eigenvalue spacing.

Fig. 2. The empirical cumulative distribution of $\lambda_i$ and unfolded eigenvalues $\xi_i \equiv F_{av}(\lambda)$. Axes: eigenvalue $\lambda$ (horizontal), cumulative density CDF($\lambda$) and $F_{av}(\lambda)$ (vertical).

Fig. 3. Nearest-neighbor spacing distribution $P_{nn}(s)$ of unfolded eigenvalues $\xi_i$ of cross-correlation matrix $C$. The solid line represents $P_{GOE}(s)$. Axes: nearest-neighbor spacing $s$ (horizontal), probability density $P_{nn}(s)$ (vertical).

The agreement between the empirical probability distribution $P_{nn}(s)$ and the distribution of nearest-neighbor eigenvalue spacings of the GOE matrices $P_{GOE}(s)$ testifies that the positions of two adjacent empirical unfolded eigenvalues at the distance $s$ are correlated just as the eigenvalues of the GOE matrices.

Next, we considered the distribution $P_{nnn}(s')$ of next-nearest-neighbor spacings $s' \equiv \xi_{k+2} - \xi_k$ between the unfolded eigenvalues. According to [8], this distribution should fit the distribution of nearest-neighbor spacings of the GSE. We demonstrate this correspondence in Figure 4. The solid line shows $P_{GSE}(s)$.

Finally, the long-range two-point eigenvalue correlations were tested. It is known [9], [10], [11] that if eigenvalues are uncorrelated we expect the number variance to scale with $l$, $\Sigma^2 \sim l$. Meanwhile, when the unfolded eigenvalues of $C$ are correlated, $\Sigma^2$ approaches a constant value, revealing “spectral rigidity” [9], [10], [11]. In Figure 5, we contrasted the Poissonian number variance with the one we observed, and came to the conclusion that eigenvalues belonging to the “bulk” clearly exhibit universal RMT properties.
The broadening parameter $a = 8$ was used in the Gaussian broadening procedure to unfold the eigenvalues $\lambda_i$ [27], [35], [19], [18]. The dashed line corresponds to the case of uncorrelated eigenvalues.

Fig. 4. Next-nearest-neighbor eigenvalue spacing distribution $P_{nnn}(s')$. Axes: next-nearest-neighbor spacing $s$ (horizontal), probability density $P_{nnn}(s)$ (vertical).

Fig. 5. Number variance $\Sigma^2(l)$ calculated from the unfolded eigenvalues $\xi_i$ of $C$. Axes: $l$ (horizontal), number variance $\Sigma^2(l)$ (vertical); curves labeled Poisson and GOE.

These findings show that the system of inter-VLAN traffic has a universal part of eigenvalue spectral correlations, shared by a broad class of systems, including chaotic and disordered systems, nuclei, atoms and molecules. Thus it can be concluded that the bulk eigenvalue statistics of the inter-VLAN traffic cross-correlation matrix $C$ are consistent with those of the real symmetric random matrix $R$ given by Eq. (5) [24]. Meanwhile, the deviations from the RMT contain the information about the system-specific correlations. The next section is entirely devoted to the analysis of the eigenvalues and eigenvectors deviating from the RMT, which signify the meaningful inter-VLAN traffic interactions.

VI. INTER-VLAN TRAFFIC INTERACTIONS ANALYSIS

We overview the points of interest in the eigenvectors of the inter-VLAN traffic cross-correlation matrix $C$, which are determined according to $C u^k = \lambda_k u^k$, where $\lambda_k$ is the $k$-th eigenvalue. A particularly important characteristic of eigenvectors, proven to be useful in the physics of disordered conductors, is the inverse participation ratio (IPR) (see, for example, Ref. [11]). In such systems, the IPR, being a function of an eigenstate (eigenvector), allows one to judge whether the corresponding eigenstate, and therefore the electron, is extended or localized.

A. Inverse participation ratio of eigenvector components

For our purposes, it is sufficient to know that the IPR quantifies the reciprocal of the number of significant components of the eigenvector. For the eigenvector $u^k$ it is defined as

$$I^k \equiv \sum_{l=1}^{N} \left[ u^k_l \right]^4, \tag{18}$$

where $u^k_l$, $l = 1, \dots, 497$, are the components of the eigenvector $u^k$. In particular, a vector with one significant component has $I^k = 1$, while a vector with identical components $u^k_l = 1/\sqrt{N}$ has $I^k = 1/N$. Consequently, the inverse of the IPR gives us the number of significant participants of the eigenvector.

Fig. 6. Inverse participation ratio as a function of eigenvalue $\lambda$. Axes: eigenvalues $\lambda$ (horizontal), log(IPR) (vertical); the curve labeled “control” is the random-matrix case.

In Figure 6 we plot the IPR of the cross-correlation matrix $C$ as a function of eigenvalue $\lambda$. The control plot is the IPR of the eigenvectors of the random cross-correlation matrix $R$ of Eq. (5). As we can see, eigenvectors corresponding to eigenvalues from 0.25 to 3.5, which is within the RMT boundaries, have IPR close to 0. This means that almost all components of eigenvectors in the bulk interact in a random fashion. The number of significant components of eigenvectors deviating from the RMT is typically around twenty, which is twenty times smaller than that of the eigenvectors within the RMT boundaries. For instance, the IPR of eigenvector $u^{492}$, which corresponds to the eigenvalue 5.9 in Figure 6, is 0.05, i.e. twenty time series contribute significantly to $u^{492}$. Another observation which we derive from Figure 6 is that the number of significant participants of the eigenvectors is considerably smaller at both edges of the eigenvalue spectrum. These findings resemble the results of [19], where the eigenvectors with a few participating components were referred to as localized vectors.
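The two limiting cases of Eq. (18) mentioned above, a fully localized and a fully extended eigenvector, can be checked with a minimal sketch. The function name `inverse_participation_ratio` and the column-wise layout of the eigenvectors (as returned by `numpy.linalg.eigh`) are assumptions of this example.

```python
import numpy as np

def inverse_participation_ratio(U):
    """IPR of Eq. (18) for unit-norm eigenvectors stored as columns of U."""
    U = np.asarray(U, dtype=float)
    return np.sum(U**4, axis=0)

N = 497
localized = np.zeros(N)
localized[0] = 1.0                       # one significant component
extended = np.full(N, 1.0 / np.sqrt(N))  # identical components 1/sqrt(N)

ipr = inverse_participation_ratio(np.column_stack([localized, extended]))
participants = 1.0 / ipr                 # approximate number of significant components
```

Read the same way, an IPR of 0.05 corresponds to roughly twenty significant participants.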
The theory of localization is explained in the context of random band matrices, whose elements are independently drawn from different probability distributions [19]. These matrices, despite their randomness, still contain probabilistic information. The localization in inter-VLAN traffic is explained as follows. The separated broadcast domains, i.e. VLANs, forward traffic from one to another only through the router, reducing the routing for broadcast containment. Although the optimal VLAN deployment is to keep as much traffic as possible from traversing the router, a bottleneck at a large number of VLANs is unavoidable.

B. Distribution of eigenvector components

Another target of interest is the distribution of the components $u^k_l$, $l = 1, \dots, N$, of the eigenvector $u^k$ of the interactions matrix $C$. To calculate the vectors $u$ we again used the MATLAB routine and obtained the distribution $p(u)$ of the eigenvector components. Then, we contrasted it with the RMT predictions for the eigenvector distribution $p_{rm}(u)$ of the random correlation matrix $R$. According to [11], $p_{rm}(u)$ is a Gaussian distribution with zero mean and unit variance, i.e.

$$p_{rm}(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right). \tag{19}$$

The weights of randomly interacting traffic count time series, which are represented by the eigenvector components, have to be distributed normally. The results are presented in Figure 7. One can see (from Figures 7a and 7b) that $p(u)$ for two $u^k$ taken from the bulk is in accord with $p_{rm}(u)$. The distribution $p(u)$ corresponding to an eigenvalue $\lambda_i$ which exceeds the RMT upper bound ($\lambda_i > \lambda_+$) is shown in Figure 7c. In Figure 7 the solid line shows $p_{rm}(u)$ from Eq. (19); panel (c) shows $p(u)$ for $u^{496}$, corresponding to an eigenvalue outside of the RMT bulk, and panel (d) shows $p(u)$ for $u^{497}$, corresponding to the largest eigenvalue.
Fig. 7. Distribution of components $p(u)$ of eigenvectors corresponding to eigenvalues (a) from the middle of the bulk, i.e. $\lambda_- < \lambda < \lambda_+$, (b) from the bulk close to $\lambda_+$, (c) $\lambda_{496}$, (d) $\lambda_{497}$. Axes: eigenvector components (horizontal), probability density $p(u)$ (vertical).

C. Deviating eigenvalues and significant inter-VLAN traffic series contributing to the deviating eigenvectors

The distribution of $u^{497}$, the eigenvector corresponding to the largest eigenvalue $\lambda_{497}$, deviates significantly from the Gaussian (as follows from Figure 7d). While the Gaussian kurtosis has the value 3, the kurtosis of $p(u^{497})$ comes out to 23.22. The smaller number of significant components of the eigenvector also influences the difference between the Gaussian distribution and the empirical distribution of eigenvector components. More than half of the $u^{497}$ components have the same sign, thus slightly shifting $p(u)$ to one side. This result suggests the existence of common VLAN traffic intended for inter-VLAN communication that affects all of the significant participants of the eigenvector $u^{497}$ with the same bias. We know that the number of significant components of $u^{497}$ is twenty-two, since the IPR of $u^{497}$ is 0.045. Hence, the content of the largest eigenvector reveals 22 traffic time series which are affected by the same event.

We obtain the time series which affects these 22 traffic time series by the following procedure. First of all, we calculate the projection $G^{497}(t)$ of the time series $G_i(t)$ on the eigenvector $u^{497}$,

$$G^{497}(t) \equiv \sum_{i=1}^{497} u^{497}_i\, G_i(t). \tag{20}$$

Next, we compare $G^{497}(t)$ with $G_i(t)$ by finding the correlation coefficient $\left\langle \frac{G^{497}(t)}{\sigma^{497}} \frac{G_i(t)}{\sigma_i} \right\rangle$.
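The projection of Eq. (20) and the subsequent comparison with each individual series can be sketched as follows. This is an illustration on synthetic data, not the authors' procedure verbatim: the function name `projection_correlations` is hypothetical, and `numpy.corrcoef` computes the Pearson coefficient, which subtracts the means and therefore coincides with the expression in the text only for zero-mean series.

```python
import numpy as np

def projection_correlations(G, u):
    """Project the series G_i(t) on eigenvector u and correlate with each G_i.

    G: array of shape (N, L) of traffic rate changes; u: eigenvector of length N.
    Returns the projected series (length L) and N correlation coefficients.
    """
    G = np.asarray(G, dtype=float)
    u = np.asarray(u, dtype=float)
    projection = u @ G                                   # Eq. (20)
    corr = np.array([np.corrcoef(projection, G_i)[0, 1] for G_i in G])
    return projection, corr

# Synthetic example: three of ten series share a strong common component.
rng = np.random.default_rng(1)
N, L = 10, 500
common = rng.normal(size=L)
G_demo = rng.normal(size=(N, L))
G_demo[:3] += 3.0 * common

C_demo = np.corrcoef(G_demo)                  # correlation matrix of the rows
eigvals, eigvecs = np.linalg.eigh(C_demo)     # eigenvalues in ascending order
u_top = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue

projection, corr = projection_correlations(G_demo, u_top)
top_three = set(np.argsort(-np.abs(corr))[:3].tolist())
```

The absolute value is used when ranking because the overall sign of an eigenvector is arbitrary; in this synthetic case the three series carrying the common component are the ones most strongly correlated with the projection.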
The Fiber Distributed Data Interface (FDDI)-VLAN internet switch at one of the routers demonstrates the largest correlation coefficient of 0.89 (see Figure 8).

Fig. 8. (a) FDDI-VLAN internet switch time series regressed against the projection $G^{497}(t)$ from Eq. 20. (b) Time series defined by the eigenvector corresponding to an eigenvalue within the RMT bounds shows no linear dependence on $G^{497}(t)$.

The eigenvector $u^{497}$ has the following content: the seven most significant participants are seven FDDI-VLAN switches at the seven routers. The presence of the FDDI-VLAN switch provides us with information about the VLAN membership definition. FDDI is a layer 2 protocol, which means that at least one of the two layer 2 membership types is used: port group and/or MAC address membership. The next group of significant participants comprises VLAN traffic intended for routing and already routed traffic from different VLANs. The final group of significant participants consists of open switches, which pick up any “leaking” traffic on the router. Usually, the “leaking” traffic is network management traffic, a very low-level traffic that spikes when queried by the management systems.

If every deviating eigenvalue indicates a particular sub-model of non-random interactions of the network, then every corresponding eigenvector presents the number of significant dimensions of that sub-model. Thus, we can think of every deviating eigenvector as a representative network-wide “snapshot” of interactions within certain dimensions. The analysis of the significant participants of the deviating eigenvectors revealed three types of inter-VLAN traffic time series groupings. One group contains time series that are interlinked on the router. We recognize them as router1-VLAN_1000 traffic, router1-firewall traffic and VLAN_1000-router1 traffic.
The time series listed as router1-VLAN_2000, router2-VLAN_2000, router3-VLAN_2000, etc., are reserved for the same service VLAN on every router and comprise another group. The content of these groups suggests how the VLANs are implemented: it is a mixture of the infrastructural approach, where functional groups (departments, schools, etc.) are considered, and the service approach, where a VLAN provides a particular service (network management, firewall, etc.).

VII. STABILITY OF INTER-VLAN TRAFFIC INTERACTIONS IN TIME

We expect to observe the stability of inter-VLAN traffic interactions over the period of time used to compute the traffic cross-correlation matrix $C$. The eigenvalue distribution at different time periods provides information about the system stabilization, i.e. about the time after which the fluctuations of the eigenvalues are no longer significant. Time periods of 1 hour, 3 hours and 6 hours are not sufficient to gain knowledge about the system, as demonstrated in Figure 9a. In Figure 9b the system stabilizes after a 1 day period.

To observe the time stability of meaningful inter-VLAN interactions we computed the “overlap matrix” of the deviating eigenvectors for the time period $t$ and the deviating eigenvectors for the time period $t + \tau$, where $t = 60$ h and $\tau = \{0\,\mathrm{h}, 3\,\mathrm{h}, 12\,\mathrm{h}, 24\,\mathrm{h}, 36\,\mathrm{h}, 48\,\mathrm{h}\}$. First, we obtained the matrix $D$ from the $p = 57$ eigenvectors that correspond to the $p$ eigenvalues outside of the RMT upper bound $\lambda_+$. Then we computed the “overlap matrix” $O(t, \tau)$ from $D_A D_B^T$, where $O_{ij}$ is the scalar product of the eigenvector $u^i$ of period $A$ (starting at time $t$) with $u^j$ of period $B$ (starting at time $t + \tau$),

$$O_{ij}(t, \tau) \equiv \sum_{k=1}^{N} D_{ik}(t) \, D_{jk}(t + \tau). \qquad (21)$$

The diagonal elements of $O$, i.e. the values $O_{ij}(t, \tau)$ at $i = j$, will be 1 if the matrix $D(t + \tau)$ is identical to the matrix $D(t)$.
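The overlap computation can be illustrated on a toy case. The sketch below assumes that the rows of $D$ are unit-norm eigenvectors and forms $D_A D_B^T$ directly; the function names and the two small matrices are hypothetical. Because the overall sign of an eigenvector is arbitrary, the sketch reads the diagonal through its absolute value, which is a choice made here and not one stated in the text.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def overlap_matrix(d_a, d_b):
    """O[i][j] = sum_k d_a[i][k] * d_b[j][k], i.e. D_A times D_B transposed."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in d_b]
            for row_a in d_a]

# Hypothetical example: p = 2 deviating eigenvectors with N = 3 components.
d_t = [normalize([1.0, 0.0, 0.0]), normalize([0.0, 1.0, 0.0])]
# The same pair rotated within its plane, standing in for period t + tau.
d_later = [normalize([3.0, 4.0, 0.0]), normalize([-4.0, 3.0, 0.0])]

same = overlap_matrix(d_t, d_t)          # identical periods
drifted = overlap_matrix(d_t, d_later)   # eigenvectors have changed
diagonal = [abs(drifted[i][i]) for i in range(len(drifted))]
print(same, diagonal)
```

For identical periods the result is the identity matrix, while a drift of the eigenvectors pulls the diagonal below 1.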
Clearly, the diagonal of the “overlap matrix” $O$ can serve as an indicator of the time stability of the $p$ eigenvectors outside of the RMT upper bound $\lambda_+$. The grayscale color map of the “overlap matrices” $O(t = 60\,\mathrm{h}, \tau = \{0\,\mathrm{h}, 3\,\mathrm{h}, 12\,\mathrm{h}, 24\,\mathrm{h}, 36\,\mathrm{h}, 48\,\mathrm{h}\})$ is presented in Figure 10. Black represents $O_{ij} = 1$ and white represents $O_{ij} = 0$. The most stable eigenvalue is $\lambda_{492}$. At lag $\tau = 3$ hours the inter-VLAN interactions show the highest degree of stability. For further lags the overall stability decays. As the analysis of the content of the deviating eigenvectors showed, the highly interacting traffic time series are time series of service-based VLANs intended for routing. Particular network services are invoked at the same time and are active for the same period of time, which explains the stability and subsequent decay of the deviating eigenvectors of traffic interactions.

Fig. 9. (a) Eigenvalue distributions of the traffic streams correlation matrix $C$ for 1 hour, 3 hours and 6 hours time intervals. (b) Eigenvalue distributions for 24 hours, 48 hours and 72 hours. [Axes: range (horizontal), eigenvalue $\lambda$ (vertical).]

VIII. DETECTING ANOMALIES OF TRAFFIC INTERACTIONS

We assume that the health of inter-VLAN traffic is expressed by the stability of its interactions in time. Meanwhile, temporary critical events or anomalies will cause temporary instabilities. The “deviating” eigenvalues and eigenvectors provide us with time-stable snapshots of interactions representative of the entire network. Therefore, these eigenvectors, judged on the basis of their IPR, can serve as monitoring parameters of the system stability.
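Since the IPR is proposed as a monitoring parameter, a short sketch of how it is evaluated may help. It assumes the standard definition $I^k = \sum_l (u^k_l)^4$ for a unit-norm eigenvector, with $1/I^k$ read as the number of significant participants; the function names and the two test vectors are hypothetical.

```python
def inverse_participation_ratio(u):
    """I = sum_l u_l**4 for a unit-norm vector; the norm is enforced here."""
    norm_sq = sum(c * c for c in u)
    return sum((c * c / norm_sq) ** 2 for c in u)

def significant_participants(u):
    """Reciprocal of the IPR, read as the number of significant components."""
    return 1.0 / inverse_participation_ratio(u)

extended = [0.5, 0.5, 0.5, 0.5]     # weight spread evenly over 4 components
localized = [1.0, 0.0, 0.0, 0.0]    # all weight on a single component

print(inverse_participation_ratio(extended), significant_participants(extended))
print(inverse_participation_ratio(localized))
# The IPR of 0.045 quoted in the text for u^497 gives about 22 participants.
print(round(1.0 / 0.045))
```

An evenly spread vector of $N$ components has IPR $1/N$, and a vector concentrated on one component has IPR 1, so a growing tail of large IPR values signals additional localized eigenvectors.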
Among the essential anomalous events of the VLAN infrastructure we can list violations in VLAN membership assignment, in the address resolution protocol, in the VLAN trunking protocol, and router misconfiguration. A violation of membership assignment or a router misconfiguration will change the picture of random and non-random interactions of inter-VLAN traffic. To shed more light on the possibilities of anomaly detection, we conducted experiments to establish spatial-temporal traces of instabilities caused by an artificial and temporary increase of the correlation in normal non-congested inter-VLAN traffic. We explored the possibility of distinguishing different types of increased temporary correlations. Finally, we observed the consequences of breaking the interactions between time series by injecting traffic counts sampled from a random distribution.

Fig. 10. The grayscale of the overlap matrix $O(t, \tau)$ at $t = 60$ h and $\tau = \{0\,\mathrm{h}, 3\,\mathrm{h}, 12\,\mathrm{h}, 24\,\mathrm{h}, 36\,\mathrm{h}, 48\,\mathrm{h}\}$.

Experiment 1

We selected the traffic count time series representing the components of an eigenvector that lies within the RMT bounds and temporarily increased the correlation between these series for a three hour period. The proposed monitoring parameters show the dependence of the system stability on the number of temporarily correlated time series (see Figure 11). Presented in Figure 11, left to right, are (a) the eigenvalue distribution of interactions with two temporarily correlated time series, (b) the IPR of eigenvectors of interactions with two temporarily correlated time series, and (c) the overlap matrix of deviating eigenvectors with two temporarily correlated time series.
Top to bottom, the layout shows these monitoring parameters when the correlation is temporarily increased between 10 connections (d, e and f) and between 20 connections (g, h and i). One can conclude that an increased temporary correlation between two time series and between ten time series does not affect the system stability. Meanwhile, when the number of temporarily correlated time series reaches the number of significant participants of $u^{497}$, which is calculated as the inverse of $I^{497}$ and is equal to twenty-two, the system becomes visibly unstable. The largest eigenvalue changes from 10 in the stable condition to 12, the tail of the inverse participation ratio plot is extended, and the diagonal of the “overlap matrix” disappears at twenty temporarily correlated time series.

Fig. 11. Eigenvalue distribution, IPR and overlap matrix of deviating eigenvectors.

In Figures 12 (a, b, c and d), the temporary correlation between ten time series is traced with the matrix of the eigenvectors deviating from the RMT, with their components sorted in decreasing order. The deviating eigenvectors of 60 h of uninterrupted traffic, sorted in decreasing order, are presented in Figure 12a. Then, after three more hours of uninterrupted traffic, the weights of the eigenvector components that had zero value start changing. This is captured in Figure 12b.

Fig. 12. Sorted deviating eigenvectors with injected correlation among ten traffic time series. [Axes: eigenvectors, components.]
The same process for traffic with an induced three hour correlation is captured in Figure 12c. The difference between the results in Figures 12b and 12c is presented in Figure 12d. The procedure used for this visualization produces a high rate of false positive alarms.

Fig. 13. (a) The weights of components of $u^{497}$ plotted for the time period from 36 to 84 hours of uninterrupted traffic with a 6 hour interval. (b) The weights of components of $u^{497}$ plotted with respect to the same time period, with an induced three hour correlation. (c) The weights of components of $u^{496}$ plotted with respect to the same time period, with an induced three hour correlation.

In addition, we visualize in Figure 13 the system instability during a temporary increase of correlation between twenty time series, using a spatial-temporal representation of the eigenvector $u^{497}$. We used the weights of the components of eigenvector $u^{497}$, defined for the IPR computation, and plotted them with respect to time $t + \tau$, where $t = 36$ hours and $\tau = 6n$ hours with $n \in \{0, 1, \ldots, 7\}$. In Figure 13a the spatial-temporal pattern of $u^{497}$ captures the precise locations of system-specific interactions of uninterrupted traffic for 84 hours of observation. The abrupt change of this pattern in Figure 13b indicates the starting point of the induced correlation between twenty traffic time series that usually interact in a random fashion. It turns out that the “normal” stable pattern of eigenvector $u^{497}$ moves to eigenvector $u^{496}$ when the interruption ends. Thus, we are able to observe the end point of the induced correlations in Figure 13c, which shows the weights of the components of eigenvector $u^{496}$ plotted with respect to the same time intervals. With this setup we are able to locate the anomaly in time and space.
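The manipulation underlying Experiment 1 can be illustrated in isolation. The text does not specify how the correlation was raised, so the sketch below makes an assumption: a shared sine component is superimposed on two otherwise independent series over a window of 36 samples (three hours at an assumed 5-minute sampling interval). The amplitude, period, seed and all names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / norm

rng = random.Random(0)   # fixed seed so the sketch is repeatable
window = 36              # assumed: 3 hours of 5-minute samples

# Two independent stand-ins for randomly interacting traffic rate changes.
series_a = [rng.gauss(0.0, 1.0) for _ in range(window)]
series_b = [rng.gauss(0.0, 1.0) for _ in range(window)]

# Shared component (assumed form): a sine wave with a one hour period.
shared = [5.0 * math.sin(2.0 * math.pi * t / 12.0) for t in range(window)]
injected_a = [a + s for a, s in zip(series_a, shared)]
injected_b = [b + s for b, s in zip(series_b, shared)]

before = pearson(series_a, series_b)
after = pearson(injected_a, injected_b)
print(round(before, 3), round(after, 3))
```

Within the window the shared component dominates both series, so their correlation coefficient rises close to 1, which is the kind of temporary non-random interaction the monitoring parameters are meant to expose.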
Translated to a network topological representation, the behavior of eigenvectors $u^{497}$ and $u^{496}$ during our manipulations with inter-VLAN traffic may be monitored with the graphs shown in Figure 14.

Fig. 14. Left column (eigenvector 497): behavior of $u^{497}$ during the time period from 48 h to 60 h with a 6 h time window; the induced correlation starts at 54 h and lasts for 3 h. Right column (eigenvector 496): behavior of $u^{496}$ in the same conditions.

Experiment 2

In the previous experiment we injected just one type of increased correlation among time series. Now we show that two and three different types of induced correlations produce different spatial-temporal patterns on the components of eigenvector $u^{497}$ (see Figure 15). The time series for the temporary increase of correlation are obtained in the same way as in Experiment 1. We temporarily increased the correlation between series for three hours by injecting elements drawn from a sine function and a quadratic function, respectively. In Figure 15a, one type of three hour correlation is induced among ten traffic time series and another type of correlation among ten other time series. Three different types of three hour correlations are induced among twenty traffic time series in Figure 15b. The content of the significant components, sorted in decreasing order, shows that time series tend to group according to the type of correlation they are involved in.

Experiment 3

Next we turn our attention to the disruption of the normal picture of inter-VLAN traffic interactions. This can be done by injecting traffic from a random distribution into non-randomly interacting time series for three hours. We demonstrate it by examining the eigenvalue distribution, the IPR and the overlap matrix of deviating eigenvectors plotted in Figure 16. After 60 hours of uninterrupted traffic, we injected elements from a random distribution into the significant participants of $u^{497}$ for three hours.
The largest eigenvalue increases from 10 to 12. The extended IPR tail shows a larger number of localized eigenvectors, and we observe a dramatic break in the stability of the deviating eigenvectors.

Fig. 15. (a) The weights of components of $u^{497}$ plotted for the time period from 36 to 84 hours with a 6 hour interval, two different types of induced correlations. (b) The weights of components of $u^{497}$ plotted with respect to the same time period, three different types of induced correlations.

Fig. 16. Eigenvalue distribution, IPR and overlap matrix of deviating eigenvectors of the inter-VLAN traffic cross-correlation matrix $C$.

IX. CONCLUSION AND FUTURE WORK

The RMT methodology we used in this paper enables us to analyze the complex system behavior without consideration of the system constraints, type and structure. Our goal was to investigate the characteristics of the day-to-day temporal dynamics of the system of interconnected routers with VLAN subnets of the University of Louisville. The type and structure of the system at hand suggest a natural interpretation of the RMT-like behavior and of the results deviating from the RMT. The time-stable random interactions signify healthy, congestion-free traffic. The time-stable non-random interactions provide us with information about large-scale network-wide traffic interactions. Changes in the stable picture of random and non-random interactions signify temporary traffic anomalies.

In general, the fact that the bulk of the eigenvalue spectrum of inter-VLAN traffic interactions shares the universal properties of random matrices opens a new avenue in network-wide traffic modeling. As stated in [19], in physical systems it is common to start with a model of the dynamics of the system.
This way, one would model the traffic time series interactions with the family of stochastic differential equations [28], [29], which describe the “instantaneous” traffic counts

$$g_i(t) = \frac{d}{dt} \ln T_i(t), \qquad (22)$$

as a random walk with couplings. Then one would relate the revealed interactions to the correlated “modes” of the system.

An additional question that the RMT findings raise in network-wide traffic analysis is whether the eigenvalue spectrum correlations that were found, and the localized eigenvectors outside of the RMT bulk, can add to the explanation of a fundamental property of network traffic such as self-similarity [30].

To summarize, we have tested the eigenvalue statistics of the inter-VLAN traffic cross-correlation matrix $C$ against the null hypothesis of a random correlation matrix. By separating out the eigenvalue spectrum correlations of random matrices that are present in this system, the uncongested state of the network traffic is verified. We analyzed the time-stable system-specific correlations. The analyzed eigenvalues and eigenvectors deviating from the RMT showed the principal groups of VLAN-router switches, groups of traffic time series interlinked through the firewalls, and groups of same-service VLANs at every router. With straightforward experiments on the traffic time series, we demonstrated that the eigenvalue distribution, the IPR of eigenvectors, the overlap matrix and the spatial-temporal patterns of deviating eigenvectors can monitor the stability of inter-VLAN traffic interactions, and can detect and locate in time and space any network-wide changes in normal traffic time series interactions.

As future work, we would like to investigate the behavior of the delayed traffic time series cross-correlation matrix $C_d$ in RMT terms. The importance of delay in measurement-based analysis of the Internet is emphasized in [31].
To understand and quantify the effect of one time series on another at a later time, one can calculate the delay correlation matrix, whose entries are the cross-correlations of one time series with another at a time delay $\tau$ [32]. In addition, we are interested in testing the fruitfulness of the RMT approach on a larger system of inter-domain interactions, for instance, on the 5-minute averaged traffic count time series of the underlying backbone circuits of the Abilene backbone network.

ACKNOWLEDGMENT

This research was partially supported by a grant from the US Department of Treasury through a subcontract from the University of Kentucky. The authors thank Igor Rozhkov for consulting on the RMT methodology. We thank Hans Fiedler, University of Louisville network manager, for the MRTG data of the UofL router system used in this study and for helpful suggestions on the network interpretation of our results. We are grateful to Nathan Johnson, University of Louisville supercomputing administrator, for providing the computing environment and space.

REFERENCES

[1] K. Fukuda, PhD Thesis: A study on phase transition phenomena in internet traffic, Keio University, 1999.
[2] T. Ohira, R. Sawatari, Phase transition in a computer network traffic model, Phys. Rev. E 58, July 1998, 193-195.
[3] M. Barthelemy, B. Gondran and E. Guichard, Large scale cross-correlations in internet traffic, arXiv:cond-mat/0206185v2, 3 Dec 2002.
[4] A. Lakhina, M. Crovella, and C. Diot, Detecting distributed attacks using network-wide flow traffic, Proceedings of FloCon 2005 Analysis Workshop, 2005.
[5] A. Lakhina, M. Crovella, and C. Diot, Mining Anomalies Using Traffic Feature Distributions, Technical Report BUCS-TR-2005-002, Boston University, 2005.
[6] E.P. Wigner, On a class of analytic functions from the quantum theory of collisions, Ann. Math. 53, 36 (1951); Proc. Cambridge Philos. Soc. 47, 790 (1951).
[7] F. Dyson, Statistical theory of the energy levels of complex systems, J. Math. Phys. 3, 140 (1962).
[8] F. Dyson and M.L. Mehta, Statistical theory of the energy levels of complex systems, J. Math. Phys. 4, 701, 713 (1963).
[9] M.L. Mehta, Random matrices (Academic Press, Boston, 1991).
[10] T.A. Brody, J. Flores, J.B. French, P.A. Mello, A. Pandey, and S.S.M. Wong, Random-matrix physics: spectrum and strength fluctuations, Rev. Mod. Phys. 53, 385-479, issue 3, July 1981.
[11] T. Guhr, A. Muller-Groeling, and H.A. Weidenmuller, Random matrix theories in quantum physics: common concepts, Phys. Rep. 299, 190 (1998).
[12] M. Krbalek and P. Seba, Statistical properties of the city transport in Cuernavaca (Mexico) and random matrix theory, J. Phys. 214 (2000), 1, 91-100.
[13] J. McNutt and M. De Shon, Correlation between quiescent ports in network flows, CERT network situational awareness group report, Carnegie Mellon University, September 2005.
[14] A. Lakhina, M. Crovella, and C. Diot, Characterization of network-wide anomalies in traffic flows, Proceedings of the ACM/SIGCOMM Internet Measurement conference, 2004, 201-206.
[15] L. Min, Y. Shun-Zheng, A network-wide traffic anomaly detection method based on HSMM, Int. conf. on communications, circuits and system proceedings, vol 6, June 2006, 1636-1640.
[16] M. Roughan, T. Griffin, M. Mao, A. Greenberg, and B. Freeman, Combining routing and traffic data for detection of IP forwarding anomalies, Proceedings of the joint int. conf. on Measurement and modeling of computer systems, 2004, 416-417.
[17] L. Huang, X. Nguyen, M. Garofalakis, M. Jordan, A. Joseph and N. Taft, Distributed PCA and network anomaly detection, Technical report No. UCB/EECS-2006-99.
[18] S. Sharifi, M. Crane, A. Shamaie and H. Ruskin, Random matrix portfolio optimization: a stability approach, Physica A 335 (2004) 629-643.
[19] V. Plerou, P. Gopikrishnan, B. Rosenow, L.A. Nunes Amaral, T. Guhr, and H.E. Stanley, Random matrix theory approach to cross correlations in financial data, Phys. Rev. E, vol 65, 066126, 27 June 2002.
[20] A. Tulino and S. Verdu, Random matrix theory and wireless communications, Communications and Information theory, vol 1, issue 1, June 2004, 1-182.
[21] D. Tse, Multiuser receivers, random matrices and free probability, Proceedings of 37th Ann. Allerton Conf., Monticello, IL, September 1999.
[22] A. Zee, Random matrix theory and RNA folding, Acta Physica Polonica B, vol 36, No 9, June 2005.
[23] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, Noise dressing of financial correlation matrices, Phys. Rev. Lett. 83, August 1999, 1467-1470.
[24] A.M. Sengupta and P.P. Mitra, Distributions of singular values for some random matrices, arXiv:cond-mat/9709283v1, 25 September 1997.
[25] H.-J. Stockmann, Quantum Chaos: an introduction, 1999.
[26] J.-P. Bouchaud, Theory of financial risk and derivative pricing: from statistical physics to risk management, 1962.
[27] H. Bruus and J.-C. Angles d'Auriac, Energy level statistics of two-dimensional Hubbard model at low filling, arXiv:cond-mat/9610142v1, 18 October 1996.
[28] J.D. Farmer, Market Force, ecology and evolution, e-print adap-org/9812005, Int. J. Theo. Appl. Fin. 3, 425, 2000.
[29] J.-P. Bouchaud, R. Cont, A Langevin approach to stock market fluctuations and crashes, European Journal of Physics, B 6, 543, 1998.
[30] W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson, On the self-similar nature of Ethernet traffic, ACM SIGCOMM, 1993, 183-193.
[31] B. Zhang, T.S. Eugene Ng, and A. Nandi, Measurement-based analysis, modeling, and synthesis of the Internet delay space, Proceedings of the 6th ACM SIGCOMM on Internet Measurement, 2006, 85-98.
[32] K.B.K. Mayya and R.E. Amritkar, Analysis of delay correlation matrices, oai:arXiv.org:cond-mat/0601279 (2006-12-20).
[33] W.-C. Lau, S.-Q. Li, Traffic analysis in large-scale high-speed integrated networks: validation of nodal decomposition approach, INFOCOM, 1993, Proceedings of twelfth annual joint conference of the IEEE Computer and Communications Societies, vol 3, 1320-1329.
[34] W.H. Allen, G.A. Marin, L.A. Rivera, Automated detection of malicious reconnaissance to enhance network security, SoutheastCon, 2005, Proceedings of IEEE, issue 8-10, April 2005, 450-454.
[35] H. Bruus and J.-C. Angles d'Auriac, The spectrum of two-dimensional Hubbard model at low filling, Europhysics letters, 35 (5), 321-326, 1999.

APPENDIX

In this Appendix, we provide a short (and non-rigorous) explanation of the main concepts and a glossary of terms used in RMT studies. The RMT approaches, which originated in nuclear and condensed matter physics and later became common in many branches of mathematical physics [25], have recently penetrated into econophysics, finance [26] and network traffic analysis [3].

For the statistical description of complex physical systems, such as an atomic nucleus or an acoustical reverberant structure, the RMT serves as a guiding light when one is interested in the degree of mutual interaction of the constituents. As it turns out, uncorrelated energy levels or acoustic eigenfrequencies produce a qualitatively different result from those obeying RMT-like correlations [25]. Therefore, real (experimentally measured) spectra can help to decide on the nature of interactions in the underlying system. To be specific, an ideally symmetric system is expected to exhibit spectral properties drastically different from those of a generic one, and if the spectral properties are those of RMT systems, other ideas of the RMT can be brought to the researcher's aid.
To describe the “awareness” of the structural constituents about each other, scientists in different fields use similar constructs. Physicists use the Hamiltonian matrix, engineers the stiffness matrix, and finance and network analysts the equal-time cross-correlation matrix. Although the physical meaning of these operators can be different, eigenvalue/eigenvector analysis seems to be a universally accepted tool. The eigenvalues have a direct connection to the spectrum of physical systems, while the eigenvectors can be used for the description of excitation/signal/information propagation inside the system.

In physics, the RMT approaches come about whenever the system of interest demonstrates certain qualitative features in its spectral behavior. For example, if one looks at the nearest neighbor spacing distribution of eigenvalues and, instead of the Poisson law

$$P(s) = \exp(-s),$$

discovers the “Wigner surmise”

$$P(s) = \frac{\pi}{2}\, s \exp\left(-\frac{\pi}{4} s^2\right),$$

one concludes (upon running several additional statistical tests) that the apparatus of the RMT can be used for the system at hand, and that the system matrix can be replaced by a matrix with random entries. For mathematical convenience, these entries are given a Gaussian weight. The only other ingredient of this rather succinct phenomenological model is recognizing the physical situation. For example, systems with and without magnetic field and/or central symmetry are described by different matrix ensembles (that is, sets of matrices), with elements distributed according to the distribution corresponding to the same $\beta$,

$$P^{(\beta)}(H) \propto \exp\left(-\frac{\beta}{4v^2}\, \mathrm{tr}\, H^2\right),$$

where the constant $v$ sets the length scale of the resulting eigenvalue spectrum.
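Both ingredients named above, a random matrix with Gaussian-weighted entries and the Wigner surmise for nearest neighbor spacings, can be seen together in the smallest possible case. The sketch below draws real symmetric $2 \times 2$ matrices with the $\beta = 1$ Gaussian weight (diagonal variance $2v^2$, off-diagonal variance $v^2$), computes the eigenvalue spacing in closed form, and compares the fraction of small spacings with the Wigner surmise $P(s) = \frac{\pi}{2} s \exp(-\frac{\pi}{4} s^2)$ and with the Poisson law. The sample size, seed, threshold of 0.5 and function names are illustrative assumptions, and dividing by the mean spacing is a simple stand-in for unfolding.

```python
import math
import random

def goe_2x2_spacing(rng, v=1.0):
    """Eigenvalue spacing of a random symmetric matrix [[a, b], [b, d]]
    drawn with weight exp(-tr(H^2) / (4 v^2))."""
    a = rng.gauss(0.0, math.sqrt(2.0) * v)
    d = rng.gauss(0.0, math.sqrt(2.0) * v)
    b = rng.gauss(0.0, v)
    # Closed-form difference of the two eigenvalues of a 2x2 symmetric matrix.
    return math.sqrt((a - d) ** 2 + 4.0 * b * b)

def wigner_cdf(s):
    """Cumulative distribution of the Wigner surmise."""
    return 1.0 - math.exp(-math.pi * s * s / 4.0)

def poisson_cdf(s):
    """Cumulative distribution of the Poisson law P(s) = exp(-s)."""
    return 1.0 - math.exp(-s)

rng = random.Random(1)   # fixed seed so the sketch is repeatable
spacings = [goe_2x2_spacing(rng) for _ in range(20000)]
mean_spacing = sum(spacings) / len(spacings)
normalized = [s / mean_spacing for s in spacings]   # mean spacing set to 1

# Fraction of small spacings: level repulsion makes this much rarer
# than for uncorrelated (Poisson) levels.
frac_small = sum(1 for s in normalized if s < 0.5) / len(normalized)
print(round(frac_small, 3), round(wigner_cdf(0.5), 3), round(poisson_cdf(0.5), 3))
```

For $2 \times 2$ matrices of this ensemble the Wigner surmise is exact, which is why the sampled fraction follows it and not the Poisson value; for large matrices the surmise is only a close approximation.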
The very fact that the RMT can be helpful in the statistical description of a broad range of systems suggests that these systems are analyzed in a certain special universal regime, in which physical or other laws are undermined by equilibrated and ergodic evolution. In most physical applications, the Hamiltonian matrix is rather sparse, indicating a lack of interaction between different subparts of the corresponding object. However, if the universal regime is inferred from the above-mentioned statistical tests, it is very beneficial to replace this single matrix with an ensemble of random matrices. Then one can proceed with the statistical analysis, using the matrix ensemble to calculate statistical averages that are more relevant for the physical problem at hand than the statistics of eigenvalues. Such averages can be, for example, the mean or variance of the response to an external or internal excitation.