An integrated model of traffic, geography and economy in the Internet

Modeling Internet growth is important both for understanding the current network and to predict and improve its future. To date, Internet models have typically attempted to explain a subset of the following characteristics: network structure, traffic…

Authors: Petter Holme, Josh Karlin, Stephanie Forrest

Petter Holme
School of Computer Science and Communication, Royal Institute of Technology, 100 44 Stockholm, Sweden, and Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, U.S.A.

Josh Karlin
Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, U.S.A.

Stephanie Forrest
Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, U.S.A., and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, U.S.A.

Modeling Internet growth is important both for understanding the current network and for predicting and improving its future. To date, Internet models have typically attempted to explain a subset of the following characteristics: network structure, traffic flow, geography, and economy. In this paper we present a discrete, agent-based model that integrates all of them. We show that the model generates networks with topologies, dynamics, and (more speculatively) spatial distributions that are similar to the Internet.

I. INTRODUCTION

As one of the most complex human constructions, the Internet is a challenging system to model. Dynamic processes of different time-scales operate simultaneously, from slow processes, like the development of new hardware, to the transport of data, which occurs at the speed of light. These phenomena are to some extent interdependent. Traffic provides income to the service providers, which is then invested in infrastructure, which can lead to changes in traffic patterns. This paper describes an agent-based model (ABM) that attempts to reproduce large-scale features of the Autonomous System (AS) level of the Internet by modeling localized and well-understood network interactions. The ASes of the Internet lend themselves naturally to discrete ABMs (4).
Each AS is an economic agent, comprised of a spatially discrete network. Over time, ASes create new links to other ASes, upgrade their carrying capacity, and compete for customer traffic. The agents in the model described here behave similarly, although we have simplified as much as possible. Specifically, the model is designed to be both simple and general enough to simulate any spatially extended communication network built by subnetworks of economically driven agents.

In previous work, Chang et al. showed that incorporating economics and geography into the Highly-Optimized Tolerance (HOT) (6) model increases the model's accuracy (7). A related ABM of the AS graph produces degree distributions similar to empirical observations (8). Bar et al. proposed a similar model (2) that incorporates another aspect of the real Internet: the agents are spatially extended objects. Our model is similar in scope to this earlier work but differs in the details, most importantly by adding explicit economics (cost) to the model. Other differences include accounting for population density, simplifying the treatment of traffic flow, and not assuming a HOT framework. The previous work in this area, like much research on network models, focuses almost exclusively on degree distributions of the graphs. In this paper, we compare our results to Internet data using several topological measures (19), including degree distributions, as well as geography and traffic dynamics.

The remainder of the paper is organized as follows. First, we describe and motivate the model. Then, we characterize the time evolution, network topology, correlation between network structure and traffic flow, packet routing statistics, and geographical aspects of the networks produced by the model.
Where possible, we compare the properties of these synthetic networks to observed data from the Internet.

II. AS SIMULATION MODEL

We begin with the fundamental unit responsible for network growth, an agent with economic interests (15). These agents manage traffic over a geographically extended network (which we refer to as a sub-network to distinguish it from the network of ASes) and profit from the traffic that flows through their network.

We compare the agents to the ASes that comprise the Internet. This is not an exact mapping: some of the Internet Service Providers (ISPs) have many AS numbers (e.g., AT&T), while other ASes are shared by several organizations. We make the common simplifying assumption that once an agent is introduced, it does not merge with another agent or go bankrupt (8; 22; 24). This is partially justified by the fact that the Internet, from its inception, has grown monotonically, and we seek to capture this dynamic in our model. Most other models of the AS graph enforce strict growth (22) as well and are, like ours, justified by their a posteriori ability to reproduce measured features.

We assume a network user population distributed over a two-dimensional area. Traffic is simulated by a packet-exchange model, where a packet's source and destination are generated with a probability that is a function of the population profile. The model is initialized with one agent comprised of a network (a sub-network in our terminology) that spans one grid location (referred to as a pixel of the landscape). As time progresses, the agent may extend its sub-network to other pixels, so that the sub-networks reach a larger fraction of the population. This creates more traffic, which generates profit, which is then reinvested into further network expansion. Through positive feedback, the network grows until it covers the entire population. In this section we describe the assumptions and most of the details of the model; the source code is publicly available from www.csc.kth.se/~pholme/asim/.

FIG. 1. Illustration of the network growth algorithm. (a) shows the locations of four agents on the geographic grid. These are assumed to be connected by a physical network administered by the agent, but this network is not explicit in the model. (b) is an example graph resulting from (a). That two agents are present in the same pixel is a necessary, but not sufficient, condition for a link to form between the agents. (c) illustrates the area that each hypothetical agent can afford to expand to (the shaded region).

An agent $i$ is associated with a set of locations $\Lambda_i$ (representing sources or end-points of traffic, and peering points), a capacity $K_i$ (limiting the rate of packets that can pass through the agent), a packet queue $Q_i$, and a set of neighbor agents $\Gamma_i$. A necessary, but not sufficient, condition for two agents to be connected is that their locations overlap in at least one pixel. The locations exist on an $L_x \times L_y$ square grid. A pixel of the grid is characterized by its population $p(x, y)$ and the set of agents with a presence there, $\mathcal{A}(x, y)$. The total number of agents is denoted by $n$, and the number of links between agents by $m$. These quantities, except $L_x$ and $L_y$, depend on the simulation time. The outer loop of the model then iterates over the following steps:

1. Network growth. The number of agents is increased. Existing agents expand geographically, and their capacities are adjusted.
2. Network traffic. Packets are created, propagated toward their targets, and delivered. This process is repeated $N_\mathrm{traffic}$ times before the next network-growth step.
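The outer loop above can be sketched in a few lines of Python. This is only an illustrative skeleton, not the authors' published code: the `Agent` class, the `simulate` function, and the two callbacks `growth_step` and `traffic_step` are hypothetical names introduced here, and the budget field anticipates the quantity $B_i$ used in the growth step.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Agent:
    """State of one agent i (field names are illustrative assumptions)."""
    locations: set = field(default_factory=set)   # Lambda_i: pixels (x, y) where i is present
    capacity: int = 5                             # K_i: packets relayed per traffic step
    queue: deque = field(default_factory=deque)   # Q_i: packets waiting to be relayed
    neighbors: set = field(default_factory=set)   # Gamma_i: identifiers of connected agents
    budget: float = 0.0                           # B_i: money available for investment


def simulate(agents, growth_step, traffic_step, n_traffic, n_growth_steps):
    """Run the outer loop and return the simulation time tau.

    growth_step and traffic_step are caller-supplied functions (hypothetical
    here) that update the list of agents in place.
    """
    tau = 0
    for _ in range(n_growth_steps):
        growth_step(agents)          # step 1: network growth
        tau += 1                     # tau counts executions of step 1
        for _ in range(n_traffic):   # step 2: repeated N_traffic times
            traffic_step(agents)
    return tau
```

Passing the two steps as functions keeps the skeleton independent of how growth and traffic are actually implemented, which the following subsections describe.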
We measure simulation time $\tau$ as the number of times Step 1 is executed (the time unit between packet movements is $1/N_\mathrm{traffic}$). In the remainder of this section we describe the growth and traffic steps in greater detail.

A. Network growth

The income of an agent during a time step is proportional to the traffic propagated by the agent during the period. This is a simplification: in a more detailed simulation one could let the income depend both on the amount of traffic and on the prices for forwarding the packets set by business agreements. Assume an agent $i$ has a budget $B_i$ that it tries to invest so that it can increase its traffic, and thus its profit. Since there is a possibility of congestion in the model, agent $i$ tries first to remove bottlenecks by increasing its capacity $K_i$ (the number of packets that the agent can transit during one time step). When the capacity is sufficient, the agent spends the rest of its budget on increasing its traffic by expanding geographically.

There are three prices associated with network growth. First, the capacity price $C_\mathrm{capacity}$: the price of increasing $K_i$ by one unit. For simplicity we let $C_\mathrm{capacity}$ be independent of the size of the agent's subnetwork. Second, the wire price $C_\mathrm{wire}$. This is the price per pixel between a new location and the agent's closest existing location. Last, the cost $C_\mathrm{connect}$ to connect two agents with locations at the same pixel.

It has been observed that the average degree (number of neighbors of an AS) in the AS graph is relatively constant over time (11; 22). We take this as a constraint in our model and let the desired average agent degree $k_D$ be a control parameter. We also assume that each agent tries to spend all of its budget, but not more than that, whenever it is updated. The network growth step iterates over the following steps:

1. Increase of the number of agents. As long as the network is too dense (i.e., if $2m > k_D n$), new agents are added. New agents are situated in the pixel $(x, y)$ that has the highest available population $p(x, y) / (A(x, y) + 1)$, where $A(x, y)$ is the cardinality of $\mathcal{A}(x, y)$ and $A(x, y) \ge 1$. The budget and capacity of the new agents are initialized to $B_\mathrm{init}$ and $K_\mathrm{init}$, respectively. If the network is small, $n < k_D + 1$, it is not dense enough for new agents to be added in step 1. Thus, we do not apply this condition when $n$ is less than a threshold $n_0$, and call the time when $n = n_0$ is reached $\tau_0$.
2. Capacity increase. Each agent synchronously increases its subnetwork's capacity based upon traffic from the last time step (but not more than the agent can afford). Agent $i$ invests the minimum of $B_i$ and $C_\mathrm{capacity}\,\Delta T_i$ (bounded below by 0) to increase capacity ($\Delta T_i$ is the change in traffic propagated by $i$ since the last update).
3. Link addition. While $2m \le n k_D$ (which usually means $k_D - 1$ times), choose two agents randomly that are not already connected and share a common pixel. If the budgets of both agents are larger than $C_\mathrm{connect}$, then connect them.
4. Spatial extension. Let the agents with remaining budget to spend extend their networks. Iterate through all agents $i$ and add a location at the pixel, not in $\Lambda_i$, that has the highest available population $p(x, y) / (L(x, y) + 1)$, and is not further than $(B_i - C_\mathrm{connect}) / C_\mathrm{wire}$ from a location in $\Lambda_i$ (i.e., not further from $i$ than $i$ can afford; see Figure 1(c)). An alternative location selector might select the point which has the lowest cost per unit of population. Unfortunately, such an algorithm is computationally prohibitive for modeled networks of the Internet's scale.

FIG. 2. Illustration of traffic simulation.
(a) A packet is created with source pixel $s$ and target pixel $t$ with probability proportional to the product of populations at $s$ and $t$. One of the agents at the target pixel is randomly chosen as the target agent. The propagation of the packet is shown in the graph. Each agent $i$ is associated with a queue $Q_i$ and a capacity $K_i$. When a packet reaches an agent, it is appended to $Q_i$. $K_i$ packets in the queue are relayed to neighboring agents and $i$'s budget is credited one unit. The arrows in (b) symbolize the packet's route from source to destination agent. The packet is routed to a neighboring agent $j$ with probability proportional to $\exp(\lambda (d(i, t) - d(j, t)))$ (where $t$ is the packet's target, $d(\cdot, \cdot)$ gives the graph distance, and $\lambda$ is a parameter).

The cost of each agent modification mentioned above is immediately deducted from the budget of the agent.

B. Network traffic

We model traffic with a discrete, packet-exchange model (12; 18). The packets are generated with specific source and target pixels, but the routing takes place on the network of agents. We neglect intradomain routing between the agent's locations, assuming the time it takes for a packet to pass through an agent is independent of the specific locations it visits. The dynamics are defined as follows:

1. Packet generation. We assume that most traffic originates from direct communication between individuals and does not depend on the distance between them. So, for each pair of points $[(x, y), (x', y')]$ on the grid, we create a packet with source $(x, y)$ and destination $(x', y')$ with probability $P_\mathrm{pkg}\, p(x, y)\, p(x', y')$. Then one agent, selected at random from the agents with a location at the source pixel, is made the source node for the packet. The destination agent is randomly chosen from the agents at the destination pixel. Finally, one unit of credit is added to the sender's budget.
2. Packet propagation. Each agent $i$ propagates the first $K_i$ packets from its queue (of length $l_i$) each time step and receives one unit of credit for each propagated packet. A packet can propagate only one hop (inter-AS transmission) per time step. A packet at agent $i$ is propagated to a neighbor $j$ with probability proportional to $\exp(\lambda (d(i, t) - d(j, t)))$ (where $t$ is the recipient AS, $d(\cdot, \cdot)$ is the graph distance, and $\lambda$ is a parameter controlling the deviation from shortest-path routing (25) observed in Ref. (16)).
3. Packet delivery. For all agents, delete all packets that have reached their target.

TABLE I. Default parameter values for simulation experiments.

Parameter             Interpretation                                   Value
$L_x = L_y$           Number of pixels in the x (and y) direction      50
$N_\mathrm{traffic}$  Number of packets sent per simulation step       $1 \times 10^4$
$P_\mathrm{pkg}$      Constant to determine packet source and dest.    0.001
$n_0$                 Agent growth threshold                           35
$K_\mathrm{init}$     Initial capacity of an agent                     5
$C_\mathrm{wire}$     Price per pixel for new wire                     500
$B_\mathrm{init}$     Initial budget for a new agent                   $3 \times 10^5$
$\lambda$             Parameter in exponential distribution            75

The assumption, in step 1, that the probability that two agents communicate is independent of their spatial separation is in line with the (somewhat debated) "death of distance" in the Internet age (5). We also tested communication rates that decay with the square of the distance, as observed in conventional trade firms (20), with qualitatively the same results.

Business agreements between ASes are an important factor in the Border Gateway Protocol (BGP) (23) (the Internet's largest-scale routing protocol). Next hops are often selected by cost, rather than path length. We do not explicitly include inter-AS contractual agreements, but our probabilistic propagation method (step 2) has a similar effect on average path length (16).

III. NUMERICAL SIMULATIONS

A. Parameter values

Before presenting the simulation results, we describe the experimental design and choice of parameters. First, we specify a population profile $p(x, y)$. We primarily model population distributions, but we also model specific geographic populations (e.g., U.S.A. census data). To simplify the generation of population distributions, we neglect spatial correlations and simply model the frequency of population densities. This frequency has two important features: it is skewed (pixels with low population densities are more frequent than highly populated pixels) and fat-tailed (there are pixels with a population density many orders of magnitude larger than the average). One probability distribution with such features is the power-law distribution $\mathrm{Prob}(p) \sim p^{-\chi}$. To reduce the fluctuations between different realizations of $\{p(x, y)\}$, and prevent unrealistically high populations within a pixel, we sample the power-law distribution in the bounded interval $[1, (L_x L_y)^{1/(\chi - 1)}]$ (10) with $\chi = 3$. Our results do not depend strongly on the distribution $p(x, y)$. We obtain qualitatively similar results with normally distributed $p$-values and real population-density maps (data not shown).

FIG. 3. Time evolution of an example run. Panel (a) shows the number of agents and the number of inter-agent links as a function of simulation time. In (b) the fraction of the landscape with network coverage, and the fraction of the population reached by the network, are plotted against time.
Panel (c) shows the average travel time $\langle \tau_p \rangle$ for packets and the average distance (number of inter-agent hops) in the network $\langle d \rangle$, as functions of the number of agents.

In multiparameter, agent-based models, such as ours, a systematic investigation of the full parameter space is infeasible. Parameters are, if possible, obtained from real systems. We set the desired degree $k_D = 5.52$ as observed in Ref. (19). Unless otherwise stated, the desired size of the network is $n_D = 16{,}000$, which is the same order of magnitude as the real AS graph. Other parameters are balanced to keep runtime low (less than one day) while still engaging all aspects of the algorithm. This means, for example, that between every network update, a significant number of packets are routed through even the smallest agents, and enough packets to cause congestion pass through larger agents. Unless otherwise stated, we use the parameter set given in Table I. Many of the results we show are from a single run; we have confirmed that the results are representative by comparing them with 20 other runs.

B. Network Growth

We begin by studying the growth of the network over time. In Fig. 3(a) we plot the number of agents and links as a function of simulation time for one representative run. At $\tau = \tau_0 \sim 4 \times 10^5$ the graph is sparser than $k_D$. Initially, the agents spend the budget they accumulate on new links (and increasing capacity). Around $\tau \sim 1.5 \times 10^6$, the budget of the wealthier agents is sufficient to invest in wires to new locations (see Fig. 3(b)). This creates new traffic, which causes positive feedback accelerating the traffic flow, coverage, budget, and also more congestion. Around $\tau \sim 1.9 \times 10^6$, $n(\tau)$ and $m(\tau)$ change from exponential to sub-exponential growth. As we see below, this is also the time when a significant level of congestion appears in the system.
At about the same time, the entire population is serviced by the network. With the current model, the network would grow indefinitely but with decreasing returns for the agents. Alternatively, one could introduce maintenance costs proportional to network size, in which case the network would reach a steady state where the budgets of the agents are balanced and no further investments can be made.

For $\tau \gtrsim 1.9 \times 10^6$ the increase of $n(\tau)$ is slower than exponential. This is explained by the increasing level of congestion in the system. In Fig. 3(c) we plot the average time $\langle \tau_p \rangle$ for a packet to travel from source to destination. $\langle \tau_p \rangle$ is bounded from below by the average distance (number of links in the shortest path, averaged over pairs of nodes) $\langle d \rangle$. The two curves diverge, i.e. a significant level of congestion appears, around $N = 1000$. The growth of $n(\tau)$ and $m(\tau)$ slows down at the same point. We conclude that the growth slowdown comes from a congestion-driven negative feedback.

The most striking feature of network growth over time is the transition from a small network, almost constant in size, to a rapidly increasing system (around $\tau \sim 1.8 \times 10^6$). This effect is typical for technologies emerging from the interactions of a large number of agents: they need a critical mass of users to reach a significant fraction of the total population. One can argue that the Internet reached this critical mass in the early 1980s when it started to span the globe. Another important point in the Internet's history was the advent of the World Wide Web (WWW) in the early 1990s, and with it commercial applications and access to the general public. Our model does not include applications, such as the WWW, that undeniably affect network growth.
Such effects could be included by adopting a different traffic model, but for this paper we aim at simplicity and generality.

In the Internet the growth of the number of ASes is slower than the exponential increase of agents predicted by the model (bgp.potaroo.net/cidr/; read January 7, 2008). One reason for the faster growth is that we do not assume that maintenance costs are proportional to income; if such costs grow super-linearly, negative feedback could dampen growth. Other external factors, such as the fact that AS numbers are allocated and assigned by a central authority (Internet Assigned Numbers Authority, www.iana.org), might also influence the actual rate of growth experienced by the Internet.

C. Degree distribution

One of the most conspicuous network structures of AS graphs is their skewed degree distribution (first observed in Ref. (14)), compatible with a power-law functional form (9). In Fig. 4(a) we compare the cumulative degree distribution of our model with that of the Internet. We use the model network from the example run described earlier (taking data from the simulation when $N = 16{,}000$), and the "AS06" network of Ref. (19) (an AS graph constructed from www.routeviews.org and www.ripe.net, with $N = 22{,}688$). The match between the model and the real networks is striking. Preliminary studies indicate that the slope of the curve is largely insensitive to changes in parameter values. We compare this result with a generic network model that produces power-law degree distributions (the Barabási–Albert (BA) model (3)) and a simple, geographic model of the AS graph designed by Fabrikant, Koutsoupias, and Papadimitriou (FKP) (13).
The BA model is a growth model in which one node (and $m$ links to attach it to the rest of the network) is added every time step. Preferential attachment is used to determine the endpoints of the new links: the probability of attaching to a node of degree $k$ is proportional to $k$.

The FKP model is also a simple growth model. Each time step, one node, and a link attached to it, is added to the graph. A new node $i$ is assigned random coordinates in the unit square and attached to the old node $j$ that minimizes $d_0(j) + \alpha |r_i - r_j|$ (where $d_0(j)$ is the graph distance between $j$ and the node added first, $|r_i - r_j|$ is the Euclidean distance between $i$ and $j$, and $\alpha$ is a parameter setting the cost balance between making new physical connections and using the existing network).

FIG. 4. The degree distribution (cumulative mass function) of a real AS graph (AS06) together with the degree distribution of a network generated with the model (a), the BA model (b), and the FKP model (c). Panel (d) is a density plot that illustrates the correlation between traffic and degree in our model runs.

In Figs. 4(b) and (c) we plot the cumulative mass function of degree for one BA and one FKP network. The model parameter values were chosen to give networks as close as possible to the real AS graph ($m = 5$ for the BA model, $\alpha = 4$ for the FKP model, and $N = 22{,}688$ for both). The slope of the BA model is steeper than the real network, and the curve for the FKP model is flatter than the real data.
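To make the two reference models concrete, the attachment rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Fig. 4: the function names `ba_graph` and `fkp_tree`, the choice of a complete seed graph on $m + 1$ nodes, and the requirement that a new BA node picks $m$ distinct targets are all assumptions made here.

```python
import math
import random


def ba_graph(n, m, seed=0):
    """Grow a BA-style graph with n nodes; returns a list of edges.

    Each new node attaches m links to distinct existing nodes, chosen with
    probability proportional to their degree.
    """
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Every node appears in `stubs` once per incident link, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def fkp_tree(n, alpha, seed=0):
    """Grow an FKP-style tree with n nodes; returns the parent of each node.

    A new node i gets random coordinates in the unit square and attaches to
    the old node j that minimizes d0(j) + alpha * |r_i - r_j|.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random())]  # node 0 is the node added first
    hops = [0]                            # d0(j): graph distance from j to node 0
    parent = [None]
    for i in range(1, n):
        r_i = (rng.random(), rng.random())
        best = min(range(i), key=lambda j: hops[j] + alpha * math.dist(r_i, pos[j]))
        pos.append(r_i)
        hops.append(hops[best] + 1)       # the graph is a tree, so d0 grows by one hop
        parent.append(best)
    return parent
```

With `alpha = 0` the spatial term vanishes and every node attaches directly to node 0, while a large `alpha` makes each node attach to its nearest neighbor in space; the trade-off between these extremes is what shapes the FKP degree distribution.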
To compare the goodness-of-fit, since the curves have a similar range in $\log p_k$, we measure the ratio $\theta$ of the area between the curves and the area (in the $\log p_k$, $\log k$ space) spanned by the extreme values of $\log k$ and $\log p_k$. We find $\theta = 0.95\%$ for our model, $4.0\%$ for the BA model, and $11\%$ for the FKP model. Although both the BA and FKP models have been extended to yield better data fits (1; 28), the original forms of the models illustrate two important components of Internet growth, namely the rich-gets-richer effect driving the growth of the BA model and the spatial trade-off effect of the FKP model. A combination of these effects may explain why our model's degree distribution, and the curve of the real network, lie between those of the original BA and FKP models.

FIG. 5. Radial statistics for real and model networks. Panels (a)–(c) show the radial densities of nodes for the real AS graph and our algorithm (a), the BA (b) and FKP (c) model. Panels (d)–(f) show the average degree vs. average distance $\bar{d}$ for our algorithm, the BA, and the FKP model respectively. The data of panels (b), (c), (e), and (f) are plotted in Ref. (19) as well.

In our model, the degrees of nodes do not directly affect the creation of new links. However, preferential attachment occurs indirectly via positive feedback: nodes with large degree acquire more traffic, and thus more budget, which they can reinvest in more connections, thus increasing their degree. The effect of preferential attachment in the model is shown in Fig. 4(d), which is a plot of the probability density of a node's traffic load given its degree.
Because an agent's income is correlated with the traffic that it propagates, and a larger budget will increase the possibility of creating new links, there is positive feedback between the degree and the rate of degree increase, i.e. a form of preferential attachment. Note that the correlation in Fig. 4(d) is not linear (the slope is different from the solid line's). It is known that nonlinear preferential attachment does not give a power-law degree distribution (21) (which we seem to have), so preferential attachment is not the only factor affecting our network's growth. (If we had linear preferential attachment, the slope of $P(k)$ would, furthermore, be the same as that of the BA model.)

D. Radial structure

Structurally, the AS graph is hierarchically ordered (27): engineers and network operators speak of the first, second and third tier. For the model networks, we measure a node's position in the hierarchy by its network centrality (19). In Fig. 5 we diagram the average fraction of nodes and the average degree as functions of the average distance $\bar{d}$ to other nodes in the network ($\bar{d}$ is the inverse of a centrality measure, known as closeness centrality, so more central nodes are to the left in the diagrams). By this method we can get a radial picture of the AS graph structure from the center to the periphery. In Fig. 5(a)–(c) we plot the fraction of vertices at different $\bar{d}$-values. We note that our model resembles the real AS graph more closely than the BA and FKP models, having peaks (roughly corresponding to the tiers of the Internet) like the observed AS graph. The shift to the left of the model curve in Fig. 5(a) can, to some extent, be explained by its smaller size (larger networks have larger average distances, leading to a curve displaced to the right). In brief, the BA model lacks the complex periphery of the real AS graph (the density is more balanced, compared with the left-skewed curve of the real-world network). The average degree as a function of $\bar{d}$ is less right-skewed in the BA model compared with the empirical network. Just like the degree distribution, the FKP model deviates from the real network in the opposite way compared to the BA model: the high-degree nodes of the FKP model are extremely concentrated at the center of the network.

FIG. 6. Traffic patterns of the model. (a) displays the number of extra steps $d_+$ in packet navigation in the real Internet compared to our model. Panel (b) shows the probability density of agents having betweenness $C_B$ and traffic density $\rho$. The data is collected from twenty independent runs.

E. Traffic flow and congestion patterns

In Section III.B we investigated network topology and its growth. In this section we study traffic flow and how network topology affects it. In the Internet, packets do not necessarily travel the shortest distances between source and destination. Most importantly, business agreements between agents arrange agents into a hierarchy (15). The business contracts put constraints on how packets are routed. For example, usually a packet cannot first be routed downwards (to customers), then upwards (to providers), in the hierarchy, even if that is a shorter path (known as the valley-free rule). Gao and Wang (16) investigated the extra distance $d_+$ packets need to travel due to such reasons. They found a decaying probability distribution of $d_+$, meaning that most of the traffic actually travels via shortest paths.
In our model we do not have explicit business agreements that cause hierarchical routing into the core of the network, and out again. It is, however, true for most graphs that a vast majority of shortest paths pass a restricted core of the graph (17), and our traffic model routes most traffic via short (if not the shortest) paths. The $d_+$ distribution of our model (shown in Fig. 6(a)) matches the observation of Gao and Wang (16).

We proceed to investigate the relationship between graph centrality and traffic density. This can tell us something about how congestion and fluctuations affect routing (18). If all agents have sufficient capacity for packets to always route along shortest paths, then traffic density along a link $l$ will be proportional to its betweenness centrality

$$C_B(l) = \sum_{i,j} \sigma_l(i, j) \Big/ \sum_{i,j} \sigma(i, j) \tag{1}$$

where $\sigma_l(i, j)$ is the number of shortest paths between nodes $i$ and $j$ passing through the link $l$, and $\sigma(i, j)$ is the total number of shortest paths between $i$ and $j$. If an AS is congested, the traffic through its links will be lower than anticipated by the betweenness of the edge. Thus, congestion patterns can be illustrated by studying betweenness and traffic load. Fig. 6(b) is a density plot of the actual traffic density as a function of betweenness of the links of the model network. For more central nodes (higher betweenness), there is a strong correlation between betweenness and traffic density: the vertices with $C_B \approx 4 \times 10^5$ span half a decade of $\rho$. For the more peripheral nodes the correlation is less clear (vertices with $C_B \approx 5 \times 10^4$ can have $\rho$-values spanning almost three orders of magnitude). Indeed, there seems to be a separation of agents into two classes, one with capacity to keep the traffic flowing, another with too low capacity. For links of low betweenness the traffic–betweenness correlation is weak.
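For readers who want to evaluate Eq. (1) on a small graph, the following sketch computes the link betweenness exactly as the ratio of sums written above. It is an illustrative brute-force version for unweighted, undirected graphs given as adjacency dictionaries; the function names `bfs_counts` and `link_betweenness` are assumptions made here, and a graph of the Internet's size would need a more efficient algorithm.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:             # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:    # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def link_betweenness(adj):
    """Evaluate Eq. (1) for every link; returns {(u, v): C_B} with u < v."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    through = {e: 0 for e in edges}       # numerator: sum of sigma_l(i, j)
    total = 0                             # denominator: sum of sigma(i, j)
    for i, j in combinations(nodes, 2):
        dist_i, sigma_i = info[i]
        dist_j, sigma_j = info[j]
        if j not in dist_i:               # i and j are disconnected
            continue
        total += sigma_i[j]
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                # The link a-b lies on a shortest i-j path when the pieces add up.
                if a in dist_i and b in dist_j and dist_i[a] + 1 + dist_j[b] == dist_i[j]:
                    through[(u, v)] += sigma_i[a] * sigma_j[b]
    return {e: count / total for e, count in through.items()}
```

As a quick check, on the three-node path 0–1–2 each link carries two of the three shortest paths, so `link_betweenness({0: [1], 1: [0, 2], 2: [1]})` assigns $2/3$ to both links.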
To summarize, congestion does affect the system, and it is most pronounced for nodes carrying little, or intermediate, traffic levels.

F. Geographic structure

We briefly discuss the spatial network structure, another feature that emerges from our model. As an example, we ran the simulation on the population density profile of the United States. In Fig. 7(a)–(d) we show the growth of the largest agent for a run with $n_D = 20$, $L_x = 513$ and $L_y = 323$. Lines are drawn between each node (pixel) and the agent's nearest node at the time of the node's addition. In this representation the lengths of the lines are proportional to the wire cost. Fig. 7(e) and (f) plot the locations of Tier 1 exchange points of two major Internet providers, Sprint and AT&T (adapted from Ref. (26)). There are some similarities between these real networks and the model network of Fig. 7(d): all networks span the whole continent and have locations concentrated in urban areas. In future work we intend to make a statistical characterization of the spatial aspects of the networks produced by our model.

FIG. 7 The spatial expansion of a single agent with the US population density as model input. The simulation parameters are the same as in the rest of the paper, except $n_D = 20$, $L_x = 513$ and $L_y = 323$. Panels (e) and (f) represent the points of presence of AT&T and Sprint within the United States. This data was adapted from Ref. (26).

IV. DISCUSSION

We have presented a model of communication networks that, like the AS-level Internet, is built of spatially extended subnetworks that have an interest in increasing the traffic running through them. Our model networks grow slowly until they reach a critical mass where an approximately exponential growth begins; they match the degree distribution of real networks and the radial statistics closely.
The degree distribution of the model, and of the real world, lies between the distributions of the pure BA and FKP models. Since the model incorporates aspects of both the BA and FKP models, we hypothesize that the degree distribution of the model, and of the real world, is a combined result of preferential attachment (of the BA model) and geographically constrained optimization (of the FKP model). We are able to recreate the traffic characteristics observed in real Internet traffic. If we run the model on the US population density map, many features of the backbone of large, real agents are recreated.

The different aspects of the model (traffic, geography, and agents trying to increase the traffic they relay) all affect the output. In this paper we do not scrutinize the model's parameter dependence, although preliminary studies indicate that the speed of growth (quantified by, e.g., the time to reach the critical density) is strongly dependent on both the wire and attachment prices, the population density profile (a more clumped population distribution produces faster growth), and the population's desire to communicate. On the other hand, the network topology is rather insensitive to the population distribution, and also not very dependent on how sources and destinations are generated (e.g., introducing a distance dependence does not matter much). The specific layout of the network is, however, dependent on the population profile.

Many interesting extensions of the basic model are possible. One would be to include business agreements between the different agents (similar to Refs. (8; 24)), or to change the traffic patterns from the person-to-person communication of the present model to a situation with more traffic coming from central servers. It might also be interesting to model intra-AS routing.
Many of today's ASes employ “hot-potato” routing and transfer packets to the next AS as quickly as possible, to reduce cost. Alternative intra-AS routing strategies, such as routing the packet as close to the destination as possible, could be tested within the model's framework.

Acknowledgements

The authors would like to thank Allen Downey for his helpful comments. PH acknowledges financial support from the Swedish Foundation for Strategic Research. SF acknowledges the support of the National Science Foundation (grants CCF 0621900 and CCR–0331580), and the Santa Fe Institute.

References

[1] J. I. Alvarez-Hamelin and N. Schabanel. An Internet graph model based on trade-off optimization. Eur. Phys. J. B, 38:231–237, 2004.
[2] S. Bar, M. Gonen, and A. Wool. A geographic directed preferential Internet topology model. Computer Networks, 51:4174–4188, 2007.
[3] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[4] E. Bonabeau. Agent-based modeling: Methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. USA, 99:7280–7287, 2002.
[5] F. Cairncross. The death of distance. Harvard Business School Press, Boston, MA, 1997.
[6] J. M. Carlson and J. Doyle. Highly optimized tolerance: a mechanism for power laws in designed systems. Phys. Rev. E, 60:1412–1427, August 1999.
[7] H. Chang, S. Jamin, and W. Willinger. Internet connectivity at the AS-level: an optimization-driven modeling approach. In MoMeTools '03: Proceedings of the ACM SIGCOMM workshop on Models, methods and tools for reproducible network research, pages 33–46, New York, NY, USA, 2003. ACM.
[8] H. Chang, S. Jamin, and W. Willinger. To peer or not to peer: Modeling the evolution of the Internet's AS-level topology. In Proc. IEEE INFOCOM, 2006.
[9] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. e-print arXiv:0706.1062, 2007.
[10] R. Cohen, K. Erez, D. ben-Avraham, and S. Havlin. Resilience of the Internet to random breakdowns. Phys. Rev. Lett., 85:4626–4628, 2000.
[11] I. Daubechies, K. Drakakis, and T. Khovanova. A detailed study of the attachment strategies of new autonomous systems in the AS connectivity graph. Internet Mathematics, 2:185–246, 2006.
[12] P. Echenique, J. Gómez-Gardeñes, and Y. Moreno. Dynamics of jamming transitions in complex networks. Europhys. Lett., 71:325–331, 2005.
[13] A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou. Heuristically optimized trade-offs: A new paradigm for power laws in the Internet. In Proceedings of the 29th International Conference on Automata, Languages, and Programming, volume 2380 of Lecture Notes in Computer Science, pages 110–122, Heidelberg, 2002. Springer.
[14] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. Comput. Commun. Rev., 29:251–262, 1999.
[15] L. Gao. On inferring autonomous system relationships in the Internet. IEEE/ACM Transactions on Networking, 9:733–745, 2001.
[16] L. Gao and F. Wang. The extent of AS path inflation by routing policies. In Proceedings of GLOBECOM '02, volume 3, pages 2180–2184, 2002.
[17] K.-I. Goh, E. Oh, H. Jeong, B. Kahng, and D. Kim. Classification of scale-free networks. Proc. Natl. Acad. Sci. USA, 99:12583–12588, 2002.
[18] P. Holme. Congestion and centrality in traffic flow on complex networks. Advances in Complex Systems, 6:163–176, 2003.
[19] P. Holme, J. Karlin, and S. Forrest. Radial structure of the Internet. Proc. R. Soc. A, 463:1231–1246, 2007.
[20] W. Isard. Location and space economy. MIT Press, Cambridge MA, 1956.
[21] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Phys. Rev. Lett., 85:4629–4632, 2000.
[22] R. Pastor-Satorras and A. Vespignani. Evolution and structure of the Internet: a statistical physics approach. Cambridge University Press, Cambridge, 2004.
[23] Y. Rekhter and T. Li. A Border Gateway Protocol 4 (BGP-4). RFC 1771 (Draft Standard), Mar. 1995. Obsoleted by RFC 4271.
[24] S. Shakkottai, T. Vest, D. Krioukov, and K. C. Claffy. Economic evolution of the Internet AS-level ecosystem. e-print arXiv:cs.NI/0608058, 2006.
[25] V. Sood and P. Grassberger. Localization transition of biased random walks on random networks. Phys. Rev. Lett., 99:098701, 2007.
[26] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson. Measuring ISP topologies with Rocketfuel. IEEE/ACM Transactions on Networking, 12:2–16, 2004.
[27] L. Subramanian, S. Agarwal, J. Rexford, and R. H. Katz. Characterizing the Internet hierarchy from multiple vantage points. In INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, volume 2, pages 618–627, 2002.
[28] S.-H. Yook, H. Jeong, and A.-L. Barabási. Modeling the Internet's large-scale topology. Proc. Natl. Acad. Sci. USA, 99:13382–13386, 2002.
