Detrended fluctuation analysis of traffic data


Authors: Xiaoyan Zhu, Zonghua Liu, Ming Tang

Xiaoyan Zhu, Zonghua Liu, and Ming Tang
Institute of Theoretical Physics and Department of Physics, East China Normal University, Shanghai, 200062, China
(Dated: March 27, 2007)

Different routing strategies may result in different behaviors of traffic on the internet. We analyze the correlation of traffic data for three typical routing strategies by detrended fluctuation analysis (DFA) and find that the degree of correlation of the data can be divided into three regions, i.e., weak, medium, and strong correlation. The DFA scalings are constant in both the weak- and strong-correlation regions but increase monotonically in the medium-correlation region. We suggest that it is better to consider traffic on complex networks as having three phases, i.e., the free, buffer, and congestion phases, rather than just the two phases believed before, i.e., the free and congestion phases.

PACS numbers: 89.75.Hc, 89.40.-a, 89.20.Hh

Tel: 021-62233216 (Office); 021-52705019 (Home); 13585691004 (Cell). Fax: 021-62232413. Email: zhliu@phy.ecnu.edu.cn

Undoubtedly, the internet has become a very important tool in our daily life. Operations on the internet, such as browsing World Wide Web (WWW) pages, sending messages by email, transferring files by ftp, searching for information on a range of topics, and shopping, have benefited us a lot. Therefore, sustaining its normal and efficient functioning is a basic requirement. However, communication in the internet does not always proceed freely. Similar to traffic jams on the highway, intermittent congestion in the internet has been observed [1]. This phenomenon can also be observed in other communication networks, such as the airline transportation network or the postal service network.
To reduce or control traffic congestion in complex networks, a number of approaches have been presented [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. Their routing strategies can be classified into two classes according to whether or not the packets are delivered along the shortest path. The delivery time of a packet from its creation to its destination depends on the state of the network and on the routing strategy. It is believed that there are two phases in communication, i.e., the free and the congestion phase. In the shortest-path routing strategy, the delivery time equals the path length in the free phase and becomes longer and longer with time in the congestion phase. In the former, the delivery times of different packets are uncorrelated, as each individual packet can travel freely to its destination; in the latter, they become correlated, as the waiting times are determined by the packets accumulated along their paths. In non-shortest-path routing strategies, the delivery times may differ between strategies even in the free phase. As the internet has a power-law degree distribution, nodes with many links easily become intermediate stations for packets and hence are easily congested. To reduce congestion on these heavily linked nodes, a packet may be delivered along a path that avoids them, so that the path is slightly longer than the shortest path [9, 10, 11, 12]. Of course, packets still take the shortest path if packets are not accumulated in the network. Therefore, the delivery times may be either correlated or uncorrelated in the free phase. As the delivery times are closely related to the degree of packet accumulation in the network, the correlation of the delivery times is also reflected in the time series of the number of packets in the network.
A typical shortest-path routing strategy is given by Liu et al. [9], and two typical non-shortest-path routing strategies are given by Echenique et al. [10, 11] and Zhang et al. [12]. Here we study the correlation of the packet time series produced by these three typical strategies. As the traffic data are produced by all the nodes with some randomness, the data contain erratic fluctuations, heterogeneity, and nonstationarity, which make the correlation difficult to quantify. A conventional approach to measuring the correlation in this situation is detrended fluctuation analysis (DFA), which can reliably quantify scaling features in the fluctuations by filtering out polynomial trends. The DFA method is based on the idea that a correlated time series can be mapped to a self-similar process by integration [13, 14, 15, 16, 17]. Therefore, measuring the self-similar features indirectly gives information about the correlation properties. The DFA method has been successfully applied to detect long-range correlations in highly complex heartbeat time series [14], stock indices [15], and other physiological signals [17]. In this paper, we use the DFA method to measure the correlation of traffic data.

Most previous studies assume that the creation and delivery rates of packets do not change from node to node. Considering that different nodes in the internet have different capacities, a more realistic assumption is that the packet creation and delivery rates at a node are degree-dependent. This feature has been addressed recently by Zhao et al. [6] and Liu et al. [9, 12].
They assume that the creation and delivery rates of packets are $\lambda k_i$ and $1 + \beta k_i$, respectively, where $k_i$ is the degree of node $i$, $\lambda$ represents the packet-creation ability of a node with degree one, the 1 in $1 + \beta k_i$ reflects the fact that a node can deliver at least one packet per time step, and $\beta$ denotes the delivery ability of a link. For a fixed $\lambda$, there is a threshold $\beta_c$: the network is in the free phase when $\beta > \beta_c$ and in the congestion phase when $\beta < \beta_c$. Here we study how the scaling of the correlation changes with the parameter $\beta$. We find that there is always a scaling in the DFA of the traffic data and that the scaling can be divided into three regions, which implies the existence of three phases of traffic on complex networks, i.e., the free, buffer, and congestion phases.

We now construct a scale-free network with $N = 1000$ nodes and average degree $\langle k\rangle = 6$ according to the algorithm given in Ref. [18], and let every node create $\lambda k_i$ packets and send out at most $1 + \beta k_i$ packets at each time step. The destinations of the created packets are chosen randomly, and packets are sent out following the first-in-first-out rule. In the delivery process, newly created and newly arrived packets are placed at the end of the queue of each node. For Liu's shortest-path approach, we follow Ref. [9] to collect the time series of traffic data. Figure 1 shows the evolution of the average number of packets per node, $\langle n(t)\rangle$, in the network, where the three lines from top to bottom represent the cases $\beta = 0.06 < \beta_c$, $\beta = 0.061 \approx \beta_c$, and $\beta = 0.1 > \beta_c$, respectively. Clearly, the number of packets in the congestion phase ($\beta = 0.06$) increases linearly with time $t$, while in the free phase ($\beta = 0.061$ and $0.1$) it fluctuates around different constants.
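The model just described can be sketched as a small, stdlib-only Python simulation. This is a minimal sketch under stated assumptions, not the authors' code: a Barabasi-Albert-style preferential-attachment graph (3 links per new node, so $\langle k\rangle \approx 6$) stands in for the construction of Ref. [18]; the non-integer creation rate $\lambda k_i$ is realized by stochastic rounding; the sending capacity is taken as $\lfloor 1 + \beta k_i \rfloor$; within a time step packets are created first, then all nodes send, then all arrivals are queued; and plain shortest-path forwarding with random tie-breaking stands in for the details of Ref. [9]. All function names are hypothetical.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m_links, seed=None):
    """Grow a scale-free graph: each new node links to m_links distinct existing
    nodes chosen with probability proportional to degree (Barabasi-Albert style;
    an ASSUMED stand-in for the algorithm of Ref. [18])."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    # Start from a fully connected core of m_links + 1 nodes.
    for i in range(m_links + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `ends` once per link end, so uniform sampling
    # from `ends` is degree-proportional sampling.
    ends = [v for v in range(m_links + 1) for _ in range(m_links)]
    for new in range(m_links + 1, n):
        targets = set()
        while len(targets) < m_links:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj


def bfs_distances(adj, source):
    """Hop distance from `source` to every node (graph assumed connected)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if dist[u] < 0:
                dist[u] = dist[v] + 1
                frontier.append(u)
    return dist


def simulate_shortest_path(adj, lam, beta, steps, seed=None):
    """Return the time series <n(t)> of average queued packets per node."""
    rng = random.Random(seed)
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    dist = [bfs_distances(adj, j) for j in range(n)]  # dist[j][v]: hops from v to j
    queues = [deque() for _ in range(n)]  # FIFO queue of packet destinations
    series = []
    for _ in range(steps):
        # Creation: on average lam * k_i packets at node i (ASSUMED stochastic rounding).
        for v in range(n):
            rate = lam * degree[v]
            count = int(rate) + (rng.random() < rate - int(rate))
            for _packet in range(count):
                dest = rng.randrange(n - 1)
                queues[v].append(dest if dest < v else dest + 1)  # uniform dest != v
        # Sending: at most floor(1 + beta * k_i) packets leave node i, head of queue first.
        moves = []
        for v in range(n):
            capacity = int(1 + beta * degree[v])
            for _packet in range(min(capacity, len(queues[v]))):
                dest = queues[v].popleft()
                d = dist[dest]
                hop = rng.choice([u for u in adj[v] if d[u] == d[v] - 1])
                moves.append((hop, dest))
        # Arrival: delivered packets leave; the rest join the tail of the next queue.
        for hop, dest in moves:
            if hop != dest:
                queues[hop].append(dest)
        series.append(sum(len(q) for q in queues) / n)
    return series
```

For instance, `simulate_shortest_path(preferential_attachment_graph(1000, 3, seed=1), lam=0.01, beta=0.1, steps=8000)` mirrors the setting of Fig. 1, although under these assumptions the resulting $\beta_c$ need not coincide with the paper's value.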
For Echenique's non-shortest-path approach, we follow Refs. [10, 11]: a packet at node $i$ chooses one of its neighboring nodes, $\ell$, as its next station according to the minimum value of $\delta_\ell = h\, d_{\ell,j} + (1-h)\, n_\ell$, where $d_{\ell,j}$ is the shortest-path length from the neighboring node $\ell$ to the destination $j$ and $n_\ell$ is the number of packets accumulated at node $\ell$. The parameter $h$ is a weighting factor, which can be taken as a variational parameter; $h \approx 0.8$ is found to give the best performance. Echenique's approach thus accounts for the waiting time only at the neighboring nodes. It was presented for the case of equal creation and delivery rates at every node. For the delivery rate $1 + \beta k_i$, a modified Echenique approach [12] chooses the neighboring node with the minimum value of

$$\delta_\ell = h\, d_{\ell,j} + (1-h)\,\frac{n_\ell}{1 + \beta k_\ell}. \qquad (1)$$

Here we choose $h = 0.85$ in Eq. (1) and find that the traffic data show behaviors for different $\beta$ similar to those in Fig. 1.

FIG. 1: The average number of packets per node, $\langle n(t)\rangle$, versus $t$ for $\lambda = 0.01$, where the three lines from top to bottom represent the cases $\beta = 0.06$, $0.061$, and $0.1$, respectively.

For Zhang's non-shortest-path approach, we follow Ref. [12] to collect the traffic data. For a packet at node $i$ with destination $j$, we take a node $\ell$ from the neighbors of node $i$ and label the shortest path from node $\ell$ to the destination $j$ by $\{SP:\ell,j\}$. Along this path, we evaluate the following quantity for node $\ell$:

$$d(\ell) = \sum_{s \in \{SP:\ell,j\}} \frac{n_s}{1 + \beta k_s}, \qquad (2)$$

where the sum is over the nodes along the shortest path $\{SP:\ell,j\}$, excluding the destination. Thus, $d(\ell)$ is an estimate of the time that a packet would take to go from node $\ell$ to the destination $j$ along the shortest path.
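The two next-station rules, the modified Echenique rule minimizing $\delta_\ell$ of Eq. (1) and the Zhang rule picking the neighbor with the smallest $d(\ell)$ of Eq. (2), can be sketched as follows. This is a hedged, stdlib-only sketch with hypothetical function names: ties are broken by neighbor iteration order, and when several shortest paths from $\ell$ to $j$ exist it assumes the one obtained by always stepping to the lowest-numbered closer neighbor, a choice the text does not specify.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node of a connected graph."""
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if dist[u] < 0:
                dist[u] = dist[v] + 1
                frontier.append(u)
    return dist


def echenique_next_hop(node, dest, adj, dist, queue_len, degree, h=0.85, beta=0.1):
    """Modified Echenique rule, Eq. (1): minimise
    h * d_{l,j} + (1 - h) * n_l / (1 + beta * k_l) over the neighbours l."""
    def delta(l):
        return h * dist[dest][l] + (1 - h) * queue_len[l] / (1 + beta * degree[l])
    return min(adj[node], key=delta)


def shortest_path(start, dest, adj, dist):
    """One shortest path start -> dest (ASSUMED tie-break: lowest node index)."""
    path = [start]
    while path[-1] != dest:
        v = path[-1]
        path.append(min(u for u in adj[v] if dist[dest][u] == dist[dest][v] - 1))
    return path


def zhang_next_hop(node, dest, adj, dist, queue_len, degree, beta=0.1):
    """Zhang rule: choose the neighbour l minimising d(l) of Eq. (2), the sum of
    n_s / (1 + beta * k_s) along {SP: l, j} with the destination excluded."""
    def d(l):
        path = shortest_path(l, dest, adj, dist)
        return sum(queue_len[s] / (1 + beta * degree[s]) for s in path[:-1])
    return min(adj[node], key=d)


# A 4-cycle 0-1-3-2-0: both routes from 0 to 3 are shortest, but node 1 is busy,
# so both rules route the packet through node 2.
adj = [{1, 2}, {0, 3}, {0, 3}, {1, 2}]
dist = [bfs_distances(adj, j) for j in range(len(adj))]
queue_len, degree = [0, 5, 0, 0], [2, 2, 2, 2]
print(echenique_next_hop(0, 3, adj, dist, queue_len, degree))  # 2
print(zhang_next_hop(0, 3, adj, dist, queue_len, degree))  # 2
```

The Echenique rule looks only one hop ahead at queue lengths, while the Zhang rule sums the weighted queues along a whole shortest path, which is why the latter needs the extra path-reconstruction helper.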
The node $\ell$ with the minimum $d(\ell)$ is chosen as the next station of the packet at node $i$. We find that the traffic data also show behaviors for different $\beta$ similar to those in Fig. 1.

All three typical approaches share a common feature: there are two kinds of data. The data in the congestion phase increase linearly with $t$, and the data in the free phase fluctuate around a constant. To quantify the correlations in the congestion phase, it is important to remove the global trend. Therefore, we remove the linear increase with $t$ by subtracting the best-fitting straight line from the time series. This procedure makes the data in the congestion phase behave similarly to those in the free phase. Figure 2 shows an example of removing the global trend, where the upper line denotes the original data with $\beta = 0.06 < \beta_c$ and the lower line the data after removing the global trend.

FIG. 2: Removing the global trend of the congestion data for the shortest-path approach, where the upper line denotes the original data with $\beta = 0.06 < \beta_c$ and the lower line the data after removing the global trend.

The DFA method is a modified root-mean-square (rms) analysis of a random walk, and its algorithm consists of the following steps [13, 14, 15, 16, 17]:

(1) Start with a signal $s(j)$, where $j = 1, \dots, N$ and $N$ is the length of the signal, and integrate $s(j)$ to obtain
$$y(i) = \sum_{j=1}^{i} \left[ s(j) - \langle s \rangle \right], \qquad (3)$$
where $\langle s \rangle = \frac{1}{N} \sum_{j=1}^{N} s(j)$.

(2) Divide the integrated profile $y(i)$ into boxes of equal length $m$. In each box, fit $y(i)$ by least squares to obtain its local trend $y_{fit}$.

(3) Detrend the integrated profile $y(i)$ by subtracting the local trend $y_{fit}$ in each box:
$$Y_m(i) \equiv y(i) - y_{fit}(i). \qquad (4)$$

(4) For a given box size $m$, calculate the rms fluctuation of the integrated and detrended signal:
$$F(m) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ Y_m(i) \right]^2}. \qquad (5)$$

(5) Repeat this procedure for different box sizes $m$.

For scale-invariant signals with power-law correlations, there is a power-law relationship between the rms fluctuation function $F(m)$ and the box size $m$:
$$F(m) \sim m^{\alpha}. \qquad (6)$$
The scaling exponent $\alpha$ represents the degree of correlation in the signal: the signal is uncorrelated for $\alpha = 0.5$ and correlated for $\alpha > 0.5$ [13, 14, 15, 16, 17].

We now use the DFA method to quantify the correlation and scaling properties of the fluctuating data after the global trend has been removed. For the data collected in the three typical approaches, Fig. 3 shows how the rms fluctuation function $F(m)$ changes with the box size $m$, where the lines from top to bottom in each panel follow the direction of increasing $\beta$; panel (a) represents Liu's approach, (b) Echenique's approach, and (c) Zhang's approach. All the lines are straight when $m$ is smaller than the crossover point (shown by the arrows), indicating that each line has a scaling exponent $\alpha$. Comparing the lines with different $\beta$, we see that $\alpha$ changes with $\beta$. The relationship between $\alpha$ and $\beta$ is shown in Fig. 4, where the lines with "squares", "circles", and "stars" denote Liu's, Echenique's, and Zhang's approaches, respectively.

FIG. 3: $F(m)$ versus $m$ for different $\beta$, where the arrows show the crossover points, the dashed lines show the slopes (scalings) of $F(m)$ as a guide to the eye, and panel (a) represents Liu's approach, (b) Echenique's approach, and (c) Zhang's approach.

FIG. 4: How the scaling $\alpha$ changes with $\beta$, where the lines with "squares", "circles", and "stars" denote Liu's, Echenique's, and Zhang's approaches, respectively, and the three dotted lines show the locations of $\beta_c$ for the three approaches.

From Fig. 4, it is easy to see that $\alpha$ is approximately constant in the congestion phase ($\beta < \beta_c$) for all three cases, where the values of $\beta_c$ are marked by the three dotted lines, but $\alpha$ behaves differently in the free phase for the shortest-path method and for the non-shortest-path methods. In Liu's approach, $\alpha$ in the free phase is constant, while in Echenique's and Zhang's approaches, $\alpha$ is constant for $\beta > 0.061$ but increases monotonically as $\beta$ decreases toward $\beta_c$. Let us call the value of $\beta$ that separates the constant and increasing behaviors $\beta_1$, namely $\beta_1 = 0.061$. Then we have $\beta_1 = \beta_c$ in Liu's approach and $\beta_1 > \beta_c$ in both Echenique's and Zhang's approaches. For $\beta > \beta_1$, all the values of $\alpha$ are close to 0.5; hence the corresponding traffic data are approximately uncorrelated. For $\beta_c < \beta < \beta_1$, the correlation in Echenique's and Zhang's approaches increases gradually from short-range to long-range correlation, and it becomes global for $\beta < \beta_c$. To distinguish the three different behaviors, we call the corresponding traffic states the free ($\beta > \beta_1$), buffer ($\beta_c < \beta < \beta_1$), and congestion ($\beta < \beta_c$) phases.

The relationship between the scaling $\alpha$ and the corresponding traffic behaviors can be understood as follows. In the shortest-path strategy, as the accumulation of packets first occurs at the hub nodes, the network can be considered congested once the hub nodes are congested [9]. At that time, there are no accumulated packets at the non-hub nodes.
Hence, the critical value $\beta_c$ can be estimated from the condition that the average number of packets on the hub equals $1 + \beta_c k_{hub}$. Once congestion occurs ($\beta < \beta_c$), the accumulation at the hub nodes increases linearly and hence influences all the packets that use the hub nodes as intermediate stations. As most shortest paths pass through the hub nodes in scale-free networks, the influence of their congestion is global and produces a significant change in the correlation when the traffic changes from the free to the congestion phase. This is why we observe the jump of $\alpha$ in the line with "squares" in Fig. 4.

In the non-shortest-path strategies, by contrast, the number of packets accumulated at the hub nodes for $\beta_c < \beta < \beta_1$ does not increase linearly with time even when its average is slightly larger than $1 + \beta k_{hub}$, because incoming packets choose other paths to reduce the delivery time. Therefore, for a fixed creation rate, packets travel along longer and longer paths to avoid congestion as the delivery parameter $\beta$ decreases. As $\beta$ decreases further, more and more nodes have an average number of packets larger than $1 + \beta k_i$, i.e., packets accumulate at these nodes. When the nodes with the fewest links begin to accumulate packets, congestion occurs. Therefore, the correlation among the packets becomes stronger and stronger as the delivery rate $\beta$ decreases. That is why we observe the gradual increase of $\alpha$ in the lines with "circles" and "stars" in Fig. 4. On the other hand, when $\beta < \beta_c$ a packet chooses its path among all the nodes, so the correlation becomes global. And for $\beta > \beta_1$, the average number of packets at the hub nodes does not exceed $1 + \beta k_{hub}$, but the fluctuating number of packets may sometimes exceed it.
When this happens, incoming packets choose other paths to avoid the accumulation at the hub nodes, resulting in a short-range correlation even in the free phase. This is why the "circles" and "stars" lie slightly above the "squares" for $\beta > \beta_1$ in Fig. 4. In summary, one difference between the buffer phase and the free phase is that the average number of packets on a node exceeds $1 + \beta k_i$ in the former but not in the latter, and one difference between the buffer phase and the congestion phase is that the average number of packets on a node increases linearly with time in the congestion phase but not in the buffer phase.

In conclusion, we have investigated the correlation of traffic data for three typical routing strategies using the DFA method. We find that there are two phases in the shortest-path strategy but three phases in the non-shortest-path strategies. The buffer phase arises because incoming packets take slightly longer paths through small nodes to avoid the heavy accumulation at the hub nodes. The average number of packets in the buffer phase is larger than that in the free phase but does not increase linearly with time. The finding of the buffer phase may shed light on further studies of communication in the internet.

This work was supported by the NNSF of China under Grant No. 10475027 and No. 10635040, by the PPS under Grant No. 05PJ14036, by SPS under Grant No. 05SG27, and by NCET-05-0424.

[1] Huberman B A and Lukose R M 1997 Science 277 535
[2] Guimerà R, Díaz-Guilera A, Vega-Redondo F, Cabrales A and Arenas A 2002 Phys. Rev. Lett. 89 248701
[3] Moreno Y, Pastor-Satorras R, Vazquez A and Vespignani A 2003 Europhys. Lett. 62 292
[4] Echenique P, Gomez-Gardenes J and Moreno Y 2004 cond-mat/0412053
[5] Echenique P, Gomez-Gardenes J and Moreno Y 2004 Phys. Rev. E 70 056105
[6] Zhao L, Lai Y, Park K and Ye N 2005 Phys. Rev. E 71 026125
[7] Yin C, Wang B, Wang W, Zhou T and Yang H 2006 Phys. Lett. A 351 220
[8] Wang W, Wang B, Yin C, Xie Y and Zhou T 2006 Phys. Rev. E 73 026111
[9] Liu Z, Ma W, Zhang H, Sun Y and Hui P 2006 Physica A 370 843
[10] Echenique P, Gomez-Gardenes J and Moreno Y 2004 Phys. Rev. E 70 056105
[11] Echenique P, Gomez-Gardenes J and Moreno Y 2005 Europhys. Lett. 71 325
[12] Zhang H, Liu Z, Tang M and Hui P 2007 Phys. Lett. A 364 177
[13] Peng C, Buldyrev S V, Havlin S, Simons M, Stanley H E and Goldberger A L 1994 Phys. Rev. E 49 1685
[14] Peng C, Havlin S, Stanley H E and Goldberger A L 1995 Chaos 5 82
[15] Liu Y, Gopikrishnan P, Cizeau P, Meyer M, Peng C and Stanley H E 1999 Phys. Rev. E 60 1390
[16] Yang H, Zhao F, Qi L and Hu B 2004 Phys. Rev. E 69 066104
[17] Chen Z, Hu K, Carpena P, Bernaola-Galvan P, Stanley H E and Ivanov P 2005 Phys. Rev. E 71 011104
[18] Liu Z, Lai Y, Ye N and Dasgupta P 2002 Phys. Lett. A 303 337
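The global detrending step and the DFA procedure of Eqs. (3)-(6) can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation: it assumes a linear least-squares fit in each box (first-order DFA, since the text does not state the polynomial order), discards the leftover points at the end of the profile when $N$ is not a multiple of $m$, and needs box sizes $m \ge 3$ so that the linear fit leaves nonzero residuals. The function names are hypothetical. As in Fig. 3, $\alpha$ should be fitted only over box sizes below the crossover point.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares line y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def remove_global_trend(series):
    """Subtract the best-fitting straight line in t (the step shown in Fig. 2)."""
    t = range(len(series))
    a, b = linear_fit(list(t), series)
    return [x - (a + b * ti) for ti, x in zip(t, series)]


def dfa(signal, box_sizes):
    """Return F(m) of Eq. (5) for each box size m (ASSUMED first-order DFA)."""
    mean = sum(signal) / len(signal)
    profile, total = [], 0.0
    for s in signal:  # Eq. (3): integrated profile y(i)
        total += s - mean
        profile.append(total)
    fluctuations = []
    for m in box_sizes:
        xs = list(range(m))
        n_boxes = len(profile) // m  # leftover points at the end are dropped
        squares = 0.0
        for start in range(0, n_boxes * m, m):
            box = profile[start:start + m]
            a, slope = linear_fit(xs, box)  # local trend y_fit in this box
            # Eq. (4): residual Y_m(i) = y(i) - y_fit(i), accumulated squared
            squares += sum((y - (a + slope * x)) ** 2 for x, y in zip(xs, box))
        fluctuations.append(math.sqrt(squares / (n_boxes * m)))  # Eq. (5)
    return fluctuations


def dfa_exponent(signal, box_sizes):
    """Slope alpha of log F(m) versus log m, Eq. (6)."""
    logs_f = [math.log(f) for f in dfa(signal, box_sizes)]
    return linear_fit([math.log(m) for m in box_sizes], logs_f)[1]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
sizes = [8, 16, 32, 64, 128, 256]
print(dfa_exponent(noise, sizes))  # expected near 0.5 (uncorrelated)
walk, x = [], 0.0
for e in noise:
    x += e
    walk.append(x)
print(dfa_exponent(walk, sizes))  # expected near 1.5 (integrated white noise)
```

For traffic data in the congestion phase, the series would first be passed through `remove_global_trend` and then through `dfa_exponent`, mirroring the procedure described for Figs. 2 and 3.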
