Combined Intra- and Inter-domain Traffic Engineering using Hot-Potato Aware Link Weights Optimization

A well-known approach to intradomain traffic engineering consists in finding the set of link weights that minimizes a network-wide objective function for a given intradomain traffic matrix. This approach is inadequate because it ignores a potential i…

Authors: Simon Balon, Guy Leduc

Simon Balon* and Guy Leduc
Research Unit in Networking, Université de Liège (ULg), Belgium
{balon, leduc}@run.montefiore.ulg.ac.be

ABSTRACT

A well-known approach to intradomain traffic engineering consists in finding the set of link weights that minimizes a network-wide objective function for a given intradomain traffic matrix. This approach is inadequate because it ignores a potential impact on interdomain routing. Indeed, the resulting set of link weights may trigger BGP to change the BGP next hop for some destination prefixes, to enforce hot-potato routing policies. In turn, this results in changes in the intradomain traffic matrix that have not been anticipated by the link weights optimizer, possibly leading to degraded network performance.

We propose a BGP-aware link weights optimization method that takes these effects into account, and even turns them into an advantage. This method uses the interdomain traffic matrix and other available BGP data, to extend the intradomain topology with external virtual nodes and links, on which all the well-tuned heuristics of a classical link weights optimizer can be applied. A key innovative asset of our method is its ability to also optimize the traffic on the interdomain peering links. We show, using an operational network as a case study, that our approach does so efficiently at almost no extra computational cost.

1. INTRODUCTION & MOTIVATION

Intradomain traffic engineering consists in routing traffic in an optimal way from ingress nodes to egress nodes in a given domain. If shortest path IP routing is used, the only way to optimize the traffic is by finding an appropriate set of link weights that minimizes a given domain-wide objective function.
For example, if this objective is to minimize the total (equivalently the average) link load, a solution consists in assigning unitary weights to all links in the network. On the other hand, to minimize the average link utilization, the solution consists in choosing link weights that are inversely proportional to the link capacities (see footnote 1). Note that these two examples are representative of traffic-independent link weights settings, i.e. they minimize their respective objectives for every possible traffic matrix.

[Footnote *: S. Balon is Research Fellow of the Belgian National Fund for the Scientific Research (FNRS) and also partially funded by the EU under the ANA FET project (FP6-IST-27489).]

For other objective functions (e.g. minimizing the maximum link load or utilization), the optimal choice of link weights usually depends on the traffic matrix. Therefore in its simplest form the resolution of this optimization problem needs to take as inputs (1) the network topology with unknown link weights, (2) the chosen network-wide objective function, and (3) an intradomain traffic matrix, which specifies the amount of traffic between every pair of ingress/egress nodes [10]. This optimization problem is NP-hard and good local-search heuristics are thus needed to find a set of link weights that reasonably minimizes the objective function in a reasonable time.

However this approach is unaware of the interdependence between intradomain and interdomain routings. Actually the real traffic demand is an interdomain traffic matrix (from prefix to prefix), while the intradomain traffic matrix (from ingress to egress nodes) is only the result of applying BGP routing decisions on the interdomain traffic matrix (TM). Even if we consider that the interdomain TM and the interdomain (BGP) routes are invariant, the intradomain TM may still vary if some link weights are changed inside the domain.
This is due to the so-called hot-potato (or early exit) decision rule implemented by BGP.

The toy example depicted in figure 1 suffices to illustrate the problem (see footnote 2). This figure shows a domain with three nodes: an ingress node R1 possibly sending traffic to egress nodes R2 and R3, and three intradomain links of weights w1, w2 and w3. Suppose this domain (also called an AS) has two peering links (respectively R2-N1 and R3-N2) with a neighboring AS providing connectivity to the IP prefix P1. Further suppose that no BGP rule of higher precedence than the hot-potato rule has been able to make a selection between R2 and R3. If the link weights are inversely proportional to the link capacities shown on the figure, then ingress node R1 will choose to reach this prefix through egress node R2 according to the hot-potato rule (because w1 = 1/10 < w2 = 1/8). If R1 has 5 units of traffic to send to P1, then the intradomain TM is just 5 units from R1 to R2 and no traffic elsewhere.

[Footnote 1: In [3] we demonstrate that these link weights settings minimize these objective functions.]
[Footnote 2: This example network is similar to the one used in [6].]

[Figure 1: Toy Example]

Now suppose that we run a link weights optimizer (denoted LWO in the sequel) that tries to minimize the maximum link utilization, while allowing equal cost multipath (ECMP) [16]. A possible optimal link weights setting is w1 = 2, w2 = w3 = 1, leading to two IGP equal cost paths from R1 to R2 and to a maximum link utilization of 2.5/8 on link R1-R3. However, if the weights are set as proposed, the hot-potato rule will now select R3 as egress node to reach P1 (because w2 < w1), and the resulting intradomain TM is actually 5 units of traffic from R1 to R3, with a maximum utilization of 5/8 on link R1-R3.
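The toy example can be replayed in a few lines. The Python sketch below is only an illustration, not the authors' tool. It assumes that w1, w2 and w3 are the weights of links R1-R2, R1-R3 and R2-R3 respectively (as the example implies), and that link R2-R3 has capacity 10, a value picked here because figure 1 is not reproduced. The helper names are hypothetical.

```python
import heapq
from fractions import Fraction


def igp_distances(weights, source):
    """Dijkstra over an undirected graph given as {(u, v): weight}."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def hot_potato_egress(weights, ingress, egresses):
    """Return the egress node(s) closest to the ingress in IGP distance."""
    dist = igp_distances(weights, ingress)
    best = min(dist[e] for e in egresses)
    return sorted(e for e in egresses if dist[e] == best)


# Initial setting: weights inversely proportional to capacities.
# The capacity of R2-R3 (10) is an assumption, not taken from the paper.
initial = {("R1", "R2"): Fraction(1, 10),
           ("R1", "R3"): Fraction(1, 8),
           ("R2", "R3"): Fraction(1, 10)}
# Setting proposed by a hot-potato-unaware optimizer: w1 = 2, w2 = w3 = 1.
optimized = {("R1", "R2"): 2, ("R1", "R3"): 1, ("R2", "R3"): 1}

egress_before = hot_potato_egress(initial, "R1", ["R2", "R3"])
egress_after = hot_potato_egress(optimized, "R1", ["R2", "R3"])

demand = 5  # units of traffic from R1 to prefix P1
# After the change all traffic leaves through R3 over link R1-R3 (capacity 8).
actual_utilization = Fraction(demand, 8)
# Utilization the optimizer planned for, with the demand split by ECMP.
expected_utilization = Fraction(demand, 2) / 8
```

Under these assumptions the selected egress moves from R2 to R3, and link R1-R3 carries twice the load the optimizer had planned for.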
Clearly, the outcome is much worse than expected, and even worse than keeping the initial weights setting!

This toy example illustrates that we cannot rely on the intradomain TM to solve the optimization problem, because it is not invariant under link weights changes, possibly leading to degraded network performance.

Even though this toy example is not representative of real networks with real traffic, we will show in section 5, by using an operational network as a case study, that this phenomenon can really happen with bad consequences, because a substantial amount of prefixes/traffic may be subject to hot-potato (re)routing. For the case study in section 5 we show that 97.2% of the prefixes have multiple possible egress points, which amounts to 35.6% of the traffic on average. Without taking hot-potato effects into account, we will show that the link weights proposed by a classical LWO may result in link utilizations close to and even above 100%, while the tool expected maximal link utilizations of only about 35%.

We propose a link weights optimization method that takes these hot-potato effects into account, and even turns them into an advantage. To this end we use as inputs the (hot-potato invariant) interdomain TM and some BGP data, both collected inside the domain, to infer the set of IP prefixes that can be reached by at least two egress nodes and for which no BGP rule of higher precedence than the hot-potato rule has been able to make a selection, i.e. for which the hot-potato rule can potentially be the tie-breaker. We call this subset the hot-potato prefixes, and from now on in this introduction we will only consider these prefixes.

[Figure 2: Toy Example - Simplified Version]

Our method is based on an extension of the intradomain topology with external virtual nodes and links.
A first naive and unscalable way to solve the problem would consist in adding a virtual node per hot-potato prefix and attaching this node to all possible BGP next-hops for this prefix. This is depicted on figure 2 for the toy example of figure 1. If we now run LWO on this virtual topology, while still allowing equal cost multipaths, including multiple BGP next-hops, an optimal weights setting is w1 = w2 = w3 = 1, which will split the 5 units of traffic evenly on the two paths R1-R2-N1 and R1-R3-N2.

However, the number of hot-potato prefixes can be very large and we would like to keep the number of virtual nodes roughly similar to the number of ordinary nodes. To this end we propose to aggregate all virtual nodes attached to exactly the same sets of BGP next-hops. They are indeed indistinguishable with respect to intradomain routing. On the operational network that we have considered, the number of such nodes boils down from 160,000 to only 26. We further show, by considering the amount of traffic sent to these aggregates, that we can reduce this set to only 5 virtual nodes without losing more than 0.06% of the total "hot-potato" traffic.

An asset of our method lies in reusing the well-tuned LWO heuristics on this extended topology. Moreover, we have also extended this intradomain traffic engineering problem to the peering links, by taking these links into account in the objective function. In our simulations this method allowed us to reduce the maximal interdomain link utilization from 70.1% to 36.5%.

The paper is structured as follows. In section 2 we review related works. In section 3 we present the necessary knowledge on intra- and interdomain routings. In section 4 we formulate the problem and propose our BGP-aware LWO. In section 5 we show an application of the method using an operational network.
In section 6, we discuss future work concerning potential oscillations. Finally, section 7 concludes the paper.

2. RELATED WORK

A first LWO algorithm for a given intradomain traffic matrix has been proposed by Fortz et al. in [10]. It is based on a tabu-search metaheuristic and finds a nearly-optimal set of link weights that minimizes a particular objective function, namely the sum over all links of a convex function of the link loads and/or utilizations. This problem has later been generalized to take several traffic matrices [11] and some link failures [12] into account. A heuristic that takes into account possible link failure scenarios when choosing weights is also proposed in [17] by Nucci et al. In our LWO we reuse the heuristic detailed in [10], but we have adapted this algorithm to consider the effect of hot-potato routing. All the later improvements to this algorithm (i.e. multiple traffic matrices, link failures) could be integrated in our new LWO in a similar way.

The fact that the intradomain TM is not the correct input for many Traffic Engineering problems had already been pointed out in [9, 8] by Feldmann et al., who suggested to consider the set of possible egress links in the traffic matrix. In [18] several extensions to the classical LWO problem are briefly described by Rexford, including a sketch of a method that resembles ours. Our work is in line with this recommendation, as we connect several equivalent egress nodes to a single virtual node representing the destination, but our paper proposes a complete method to solve the link weights optimization problem, applicable to intradomain and peering links, and we demonstrate its efficiency on an operational network. In [9] some methods to compute traffic matrices from netflow traces are also presented, which are reused in this paper.

In [2] Agarwal et al.
study how hot-potato routing influences the selection of IGP link metrics and how traffic to neighboring ASes shifts due to changes in the local AS's link metrics. In their measurement study they find that metrics resulting from ignoring hot-potato interaction can be sub-optimal by as much as 20% of link utilization. We show in this paper that the sub-optimality can be much larger. They also find that as much as 25% of traffic to a neighboring AS can shift the exit point due to a local AS IGP link metric optimization. They have developed a patch to their link weight optimizer which recomputes the intradomain traffic matrix from the interdomain one at each step of the optimization. Their optimizer does not consider directly the interdomain traffic matrix, so nothing prevents it from iterating indefinitely, as it would probably do in the toy example of figure 1 if the heuristic tries to tune the link weights to enable intradomain equal cost multipaths. Also, their link weights optimizer does not engineer interdomain links and they have tested their algorithm on only 80% of the total traffic of their private ISP while we have tested it on 99% of traffic of an operational network. Finally their source code is not available, while our algorithm will be available in open-source in the TOTEM toolbox ([1]).

Cerav-Erbas et al. have already shown in [6] that the link weights found by a LWO may change the intradomain TM considered as input. In that paper they also show that applying LWO recursively on the resulting intradomain TM may not converge. They propose a method that keeps track of the series of resulting TMs and at each iteration they optimize the weights for all the previous resulting intradomain TMs simultaneously. However, they do not consider the general problem with multiple exit points for each destination prefix, let alone taking advantage of it.
In [25] a class of traffic engineering algorithms is proposed by Wang et al. to optimize for the expected scenarios while providing a worst-case guarantee for unexpected scenarios. They propose to take the interdomain routing into account by splitting the problem into two subproblems. The first one consists in optimizing the mapping of every (hot-potato) destination prefix to a single egress point. This can then be implemented in BGP by assigning a higher local preference to the route received by the chosen egress node. The second subproblem is then the classical link weights optimization for the resulting (and now invariant) intradomain TM. In our approach we solve both subproblems in one step with the usual LWO and we do not necessarily need to assign local preference values to pin down every destination prefix to a unique BGP next-hop. By keeping all the potential next-hops we have more flexibility to engineer the network.

Several studies have shown that the proportion of prefixes whose next hop is selected by the hot-potato criterion can be very large in ISP networks. Based on measurements of one ISP network (AT&T's tier-1 backbone network) Teixeira et al. show in [23] that hot-potato routing changes are responsible for a big part of BGP routing changes. While this is not the main goal of that paper they have measured that more than 60% of the prefixes can be affected by the hot-potato routing changes and that these hot-potato prefixes account for 5-35% of the traffic in the network. It is also explained in [22] that "Since large ISPs typically peer with each other in multiple locations, the hot-potato tie-breaking step almost always drives the final routing decision for destinations learned from peers, although this is much less common for destinations advertised by customers."
The authors show that although most routing changes do not cause important traffic shifts, routing is a major contributor to large traffic variations. This demonstrates that it is very important to take BGP routing considerations into account when running traffic engineering algorithms.

In [19] Roughan et al. analyse the effects of imprecision in the traffic matrix due to estimation techniques on traffic engineering algorithms. While the effects of these imprecisions seem to be quite limited, we show in this paper that the effects due to hot-potato routing can be very large. This is an important result as this highlights that not taking hot-potato effects into account cannot be simply seen as resulting in little (harmless) imprecision in the traffic matrix. Hot-potato errors in the TM can really be a big problem for intradomain TM-based TE algorithms optimizing the link weights.

An important point in the whole traffic engineering process is the selection of a (set of) traffic matrix(ces) to use as input of the traffic engineering algorithm. This problem is addressed in [27] by Zhang and Ge, who try and find such a subset of critical traffic matrices from the whole set of measured traffic matrices. This work is complementary to ours.

To the best of our knowledge this paper proposes the first algorithm to find the best possible set of link weights to engineer intra- and inter-domain links while taking hot-potato effects into account.

3. ROUTING PRINCIPLES

[Figure 3: More Complex Topology]

Each packet sent on the Internet follows a path which is defined by routing protocols. The exterior gateway protocol (EGP) defines the path at the network-level. This path is called the AS path (see footnote 3). The EGP used in the Internet is BGP (Border Gateway Protocol).
In each AS the path from each ingress router to each egress router is defined by the interior gateway protocol (IGP). The IGPs most commonly used in transit networks are OSPF and ISIS. In an AS the paths between ingress and egress routers are computed by a Shortest-Path algorithm based on the link weights. If ECMP (Equal Cost Multi-Path) is enabled, several equal shortest-paths can be used simultaneously to evenly split the traffic among them, by using a hash table that maps a hash of multiple fields in the packet header to one of these paths, so that all packets of a flow will follow the same path with limited packet reordering (see [5] for a performance analysis of hashing based schemes for Internet load balancing). Figure 4 shows an example of ECMP inside an AS. This figure assumes that there are two equal cost paths from R0 to R1.

[Figure 4: Intradomain Equal Cost Multipath (ECMP)]

BGP allows routers to exchange reachability information between neighboring ASes ([21]). Each AS is connected to several neighboring ASes by interdomain links. Depending on the connectivity of the network and on the destination of the packet, one or several neighboring ASes can be chosen to forward the packet to the destination. The choice of the BGP next-hop (i.e. the egress router in this AS or the border router in the next AS, that will relay the packet toward the destination) is based on the information exchanged with neighbors and on a local configuration implementing its routing policy.

[Footnote 3: AS stands for Autonomous System. In the paper we use domain and AS interchangeably.]

There are two types of BGP sessions that are used to exchange routes between routers. eBGP sessions are used between routers in different ASes, while iBGP sessions are used between routers in the same AS.
When a router receives a route on an iBGP or eBGP session, this route has to pass the input filter to be eligible in the BGP decision process that selects the best route(s) toward each destination prefix. The best route(s) selected by this process is (are) then forwarded on other BGP sessions after passing through an output filter.

The BGP route selection process, implementing routing policies, is made of several criteria ([4, 13]):

1) Prefer routes with the highest local preference which reflects the routing policies of the domain;
2) Prefer routes with the shortest AS-level Path;
3) Prefer routes with the lowest origin number, e.g., the routes originating from IGP are most reliable;
4) Prefer routes with the lowest MED (multiple-exit discriminator) type which is an attribute used to compare routes with the same next AS-hop;
5) Prefer eBGP-learned routes over iBGP-learned ones (referred to as the eBGP > iBGP criterion in the sequel);
6) Prefer the route with the lowest IGP distance to the egress point (i.e. the so-called hot-potato, or early exit, criterion);
7) If supported, apply load sharing between paths. Otherwise, apply a domain-dependent tie-breaking rule, e.g., select the one with the lowest egress ID.

In this paper we will be particularly interested in routes that are selected using the 6th criterion, which refers to the link weights of the domain to select the best route toward a destination.

Consider the network of figure 3. Suppose that routes to P1 are announced by N1 to R1 and N2 to R2 on eBGP sessions. Suppose that the routes announced by these two routers have the same attributes (i.e. local-preference, AS-path, origin number and MED) after passing the input filters of routers R1 and R2 (this is very frequent in practice for routes that are received from the same neighboring AS).
Suppose also that these two routes are forwarded by R1 and R2 to R0 on iBGP sessions. Usually the attributes are not changed when forwarding routes on iBGP sessions. So R0 has two routes to reach P1 and these two routes are equivalent w.r.t. criteria 1 to 4. Both are received on iBGP sessions so are also equivalent w.r.t. the 5th criterion. In this case R0 will use its IGP distance to R1 and R2 to select the best route toward P1. We say that this route is chosen using the hot-potato criterion by router R0. Note that R1 and R2 will directly forward traffic toward this prefix on their interdomain link using the eBGP > iBGP criterion. So we see that prefixes that are routed via the hot-potato criterion by some routers will be routed according to the eBGP > iBGP criterion by some others and vice-versa.

[Figure 5: iBGP multipath]

Now if R1 and R2 are at the same distance from R0, the 7th criterion will be used. By default only one next hop can be chosen and a tie-break selects the best route. But it is also possible to enable iBGP multipath load sharing ([4, 13]) to balance the load on both paths. As for intradomain ECMP, a hash table is used to select the particular route of a packet. Figure 5 supposes that iBGP multipath is activated and that R1 and R2 are at the same distance from R0. In this case the traffic going from R0 to P1 will be split evenly on both paths.

If both ECMP and iBGP multipath are activated, we have to clarify how the traffic is split between multiple paths. Consider figure 6. Suppose that R1 and R2 are at equal distance from R0. Two equal cost paths are available from R0 to R1 and only one from R0 to R2.
The load sharing implementations in routers we are aware of will send 1/3 of the traffic on each of the 3 available paths at router R0.

[Figure 6: ECMP + iBGP multipath]

4. A BGP-AWARE LINK WEIGHTS OPTIMIZER

In this section we present our model of the general traffic engineering problem. We will use the network of figure 3 to illustrate all the presented concepts.

4.1 Formulation of the traffic engineering problem

A network is modeled as a directed graph, G = (N, L), whose vertices and edges represent nodes and links. The basic intradomain topology is composed of all the nodes and links that belong to the AS.

We consider two disjoint categories of destination prefixes. The single-egress prefixes are those prefixes for which the BGP next-hop is chosen by one of the first 4 BGP criteria. The hot-potato prefixes are all the other prefixes. For each of them there is at least one router in the domain that has used the hot-potato criterion, or a following one, to select the next-hop. For each of these hot-potato prefixes however, there are also at least two other routers that forward traffic according to the 5th BGP criterion (eBGP > iBGP), that has precedence over the hot-potato criterion (as shown in the example of section 3).

The traffic forwarded to the single-egress prefixes constitutes a (hot-potato invariant) intradomain TM, called TM_invar. We also include in that TM_invar the traffic forwarded to the hot-potato prefixes originated from the particular nodes that use the 5th BGP criterion (eBGP > iBGP) to choose their best route. The remaining traffic forwarded to hot-potato prefixes constitutes TM_hp.

For every hot-potato prefix we conceptually add a virtual node representing it.
Then for every peering link on which equivalent BGP routes (up to criterion 4) have been announced for that prefix, we extend the intradomain topology with a link+node pair representing this peering link and the neighboring router on the other side of this link. Finally we attach all these neighboring routers to the virtual node (representing the hot-potato prefix) by adding virtual links.

Therefore we have three disjoint sets of edges in the topology: L_intra is the set of intradomain links, L_inter is the set of interdomain links, and L_virtual is the set of virtual links. Similarly we split the nodes in the topology into three disjoint sets: N_intra is the set of routers from the local AS, N_neigh is the set of border routers in neighboring ASes, and N_virtual is the set of virtual nodes.

Figure 7 shows such a topology. It is the same as figure 3 where prefixes are replaced by virtual nodes and possible paths to prefixes are replaced by virtual links. P1, P2, P3 and P4 are HP prefixes that compose N_virtual. The BGP-equivalent routes (up to rule 4) are announced by N1 and N2 for P1 and P2, by N1, N2 and N4 for P3, and by N2, N3 and N4 for P4. L_inter = {R1-N1, R2-N2, R3-N3, R3-N4} and L_virtual = {N1-P1, N1-P2, ...}. N_intra = {R*}, N_neigh = {N*}, and N_virtual = {P*}.

[Figure 7: More Complex Topology with virtual nodes]

Each virtual link (l ∈ L_virtual) has infinite capacity c_l = ∞ and a fixed weight w_l = 0. Every other link (l ∈ L_intra ∪ L_inter) has a capacity c_l and a weight w_l. Let us note that interdomain and virtual links are directed (toward the destination prefix) as no transit via a virtual node is allowed.
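The construction of L_inter and L_virtual for figure 7 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function and variable names are hypothetical, the peering links and announcements are the ones listed in the text, and intradomain links are left out because their weights and capacities are not given here.

```python
import math

# Peering links of figure 7: neighboring router -> local egress router.
peering = {"N1": "R1", "N2": "R2", "N3": "R3", "N4": "R3"}

# Neighboring routers announcing BGP-equivalent routes (up to rule 4).
announcers = {
    "P1": {"N1", "N2"},
    "P2": {"N1", "N2"},
    "P3": {"N1", "N2", "N4"},
    "P4": {"N2", "N3", "N4"},
}


def extend_topology(peering, announcers):
    """Return (l_inter, l_virtual) as sorted lists of directed (src, dst) edges."""
    used = set().union(*announcers.values())
    # One interdomain link per neighboring router that announces a route.
    l_inter = sorted((peering[n], n) for n in used)
    # One virtual link from each announcing neighbor to the virtual node.
    l_virtual = sorted((n, p) for p, ns in announcers.items() for n in ns)
    return l_inter, l_virtual


def virtual_link_attributes(l_virtual):
    """Virtual links get infinite capacity and a fixed weight of 0."""
    return {edge: {"capacity": math.inf, "weight": 0} for edge in l_virtual}


l_inter, l_virtual = extend_topology(peering, announcers)
attrs = virtual_link_attributes(l_virtual)
```

Edges are stored as (source, destination) pairs, which reflects the rule that interdomain and virtual links are directed toward the destination prefix.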
The traffic will follow the shortest path(s) based on the link weights. If there are multiple equal cost paths, traffic is considered to be evenly split among them, as shown on figures 4, 5 and 6. Once the paths are chosen, we can associate with each link l a load l_l, which is the proportion of traffic that traverses link l summed over all pairs of source/destination nodes. The utilization of a link l is u_l = l_l / c_l. The goal of the LWO is then to find the set of link weights that minimizes our network-wide objective function based on the loads and/or utilizations of intradomain and interdomain links.

4.2 Aggregating prefixes

The problem as formulated in the preceding section is not solvable in practice. Indeed the number of prefixes in the BGP routing table of an internet router is about 160,000 and so in the worst case all the prefixes are hot-potato prefixes and about 160,000 nodes would be added to the intradomain topology (see section 5 for the actual number of hot-potato prefixes in the operational network we have studied). However all prefixes that are reachable through exactly the same set of possible nodes ∈ N_neigh can be aggregated (e.g., nodes P1 and P2 in figure 7 can be merged) as they are indistinguishable from an intradomain routing perspective. This will drastically reduce the number of virtual nodes. Note that if n is the number of peering links of the AS, there can still be 2^n virtual nodes in the worst case. In practice however it is much lower, as explained in [7]. Indeed routes are often announced with the same parameters on peering links with the same neighbor AS. For the operational network we have used as a case study, the number of peering links traversed by hot-potato traffic is 18. Out of 2^18 possible different combinations of peering links, only 26 are actually observed!
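The aggregation rule amounts to grouping prefixes by their set of possible neighboring routers. The following Python sketch illustrates this on the figure 7 announcements. The function name and data layout are assumptions made for the example.

```python
from collections import defaultdict

# Neighboring routers announcing BGP-equivalent routes, as listed for figure 7.
announcers = {
    "P1": {"N1", "N2"},
    "P2": {"N1", "N2"},
    "P3": {"N1", "N2", "N4"},
    "P4": {"N2", "N3", "N4"},
}


def aggregate_prefixes(announcers):
    """Group prefixes that share exactly the same set of neighboring routers."""
    groups = defaultdict(list)
    for prefix, neighbors in announcers.items():
        # frozenset makes the next-hop set usable as a dictionary key.
        groups[frozenset(neighbors)].append(prefix)
    return {key: sorted(members) for key, members in groups.items()}


aggregates = aggregate_prefixes(announcers)
```

Here the four prefixes collapse into three virtual nodes, with P1 and P2 merged, which matches the example given in the text.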
We can still go one step further by taking the traffic destined for each aggregated virtual node into account. For example, in our case study we have noticed that no traffic is sent to 8 of them, and only a very small volume of traffic is sent to 13 others, thus leading to 5 nodes receiving 99.94% of the hot-potato traffic (TM_hp). So we can basically extend the intradomain topology with these 5 virtual nodes without really losing accuracy. This is really significant for the practical efficiency of the LWO. More precisely, using 5 nodes instead of 18 reduced the average computation time of the algorithm from 582 to 140 seconds (see footnote 4) without decreasing the quality of the provided solutions. Stated otherwise, the same computational budget would allow us to find a better solution (using more iterations) on the smaller topology.

[Footnote 4: This is the average computation time over 14 runs on different TMs with 50 iterations per run. We have used 50 iterations because we have noticed that increasing this number did not significantly improve the quality of the solution found on this data. These simulation times are measured on an IBM eServer 325 computer with 2 AMD Opteron 2GHz 64-bit processors and 2GB of memory.]

Figure 8 depicts the structure of the aggregated interdomain traffic matrix, with one row per edge node in N_intra and one column per edge node in N_intra or in N_virtual.

[Figure 8: The Aggregated Interdomain Traffic Matrix]

To build this aggregated interdomain traffic matrix we proceed as follows. Let (s, p) be the traffic volume from an ingress node (s ∈ N_intra) to a destination prefix (p). If p is not a hot-potato prefix (i.e., there is only one possible egress node t ∈ N_intra), we add this traffic volume to the pair (s, t) in TM_invar.
If the prefix p is a hot-potato prefix, we distinguish two subcases. If node s is a possible egress node for this prefix, we add this traffic volume to the pair (s, s) in TM_invar (indeed this traffic will be routed using the eBGP > iBGP criterion). On the other hand, if s is not one of the possible egress nodes for p, we add this amount of traffic to the pair (s, P_i) in TM_hp, where P_i ∈ N_virtual is the virtual node associated with the prefix aggregate comprising p. Now we have our aggregated interdomain traffic matrix, which is composed of TM_invar and TM_hp.

4.3 Engineering intra- and interdomain links

LWOs usually try and find a nearly optimal set of intradomain link weights. An optimal set of weights is defined as a set of weights that associates the minimal value to a predefined objective function. The objective function is generally the sum over all the links of a convex function of the link load and/or utilization. In [10] they use a piecewise linear convex function of the link utilization and capacity ($\phi_l$ for link $l$): $\phi = \sum_{l \in L_{intra}} \phi_l$, where $L_{intra}$ is the set of intradomain links. We reuse this network-wide objective function, while others could be used in the optimizer.

As interdomain links are now part of the topology, we can include these links in the objective function. We are flexible with respect to the inclusion of these interdomain links in the objective function by adding a parameter α which determines the relative importance of interdomain links with respect to intradomain ones. The new function is $\phi = \sum_{l \in L_{intra}} \phi_l + \alpha \sum_{l \in L_{inter}} \phi_l$. In section 5 we will compare cases where α = 0 and α = 1. Values of α in between have not been tested as α = 1 seemed to be a good compromise in our case. Indeed as shown in section 5.2 it was possible to engineer interdomain links without decreasing the efficiency of the intradomain load balance.
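The α-weighted objective can be sketched in Python as below. The text only says that φ_l is a piecewise linear convex function of utilization and capacity. The breakpoints and slopes used here are the values commonly associated with the cost function of [10] and are an assumption of this sketch, as are the function names and the example loads.

```python
# (utilization breakpoint, slope from that point on); assumed values, see above.
SEGMENTS = [(0.0, 1), (1 / 3, 3), (2 / 3, 10), (0.9, 70), (1.0, 500), (1.1, 5000)]


def link_cost(load, capacity):
    """phi_l: accumulate slope * utilization span * capacity over the segments."""
    u = load / capacity
    cost = 0.0
    for i, (start, slope) in enumerate(SEGMENTS):
        end = SEGMENTS[i + 1][0] if i + 1 < len(SEGMENTS) else float("inf")
        if u <= start:
            break  # utilization does not reach this segment
        cost += slope * (min(u, end) - start) * capacity
    return cost


def objective(loads, capacities, l_intra, l_inter, alpha):
    """phi = sum over L_intra of phi_l + alpha * sum over L_inter of phi_l."""
    intra = sum(link_cost(loads[l], capacities[l]) for l in l_intra)
    inter = sum(link_cost(loads[l], capacities[l]) for l in l_inter)
    return intra + alpha * inter


# Hypothetical two-link example: "a" is intradomain, "b" is interdomain.
loads = {"a": 1.0, "b": 2.0}
capacities = {"a": 3.0, "b": 3.0}
phi_intra_only = objective(loads, capacities, ["a"], ["b"], alpha=0)
phi_combined = objective(loads, capacities, ["a"], ["b"], alpha=1)
```

With α = 0 the interdomain link "b" is ignored entirely, while α = 1 gives it the same importance as an intradomain link.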
Note that it could be different in other networks, and in this case it would be interesting to test other values of $\alpha$.

The inclusion of interdomain links in the objective function is a key advantage of our method as it allows the LWO to engineer these interdomain links in addition to intradomain ones. With a classical LWO there is no point in including interdomain links in the topology and engineering them, because the intradomain TM used as input pins down the egress node anyway, thus assigning the same load on the interdomain links irrespective of the link weights. As we relax the constraints on the egress nodes, it seems natural to take advantage of it to also engineer the traffic on interdomain links. With our method it suffices to include these links in the objective function.

4.4 Collecting input data for the optimizer

Our LWO needs as input some information about the traffic and also some BGP data. The needed traffic information is the traffic volume from every ingress router to every destination prefix. For the BGP information we have to discriminate the hot-potato prefixes from the other ones. For hot-potato prefixes, we need the set of possible BGP next-hops. For other prefixes, we just need the unique BGP next-hop.

We will mainly describe the method we have used in our case study. A monitoring station has been installed inside the network to collect BGP traces (see footnote 5). It is part of the iBGP full mesh and records all the exchanged BGP messages to build BGP traces, i.e. daily dumps containing all the routes received by the monitoring station. In other words, the traces contain for each day all the best routes used by all the routers of the network toward every possible destination prefix.
We distinguish two categories of prefixes:

• The prefixes for which the same route is selected by all the routers as the best route (they correspond to our earlier definition of single-egress prefixes);
• The prefixes for which at least two routers in the AS have selected different best routes (they correspond to our earlier definition of hot-potato prefixes).

The first category of prefixes contains all the prefixes for which the best route is selected by one of the first 4 criteria of the BGP process (local preference, AS path, origin number and MED), and the second category contains all the prefixes for which the best route is selected at a later stage (i.e. by the eBGP > iBGP, hot-potato, or tie-break or load-balancing criteria). Indeed suppose that several routes for the same prefix are received on different eBGP sessions. If one router selects its best route by one of the first four criteria, all the other routers will select exactly the same route by the same criterion, because eBGP data are exchanged "as is" on all iBGP sessions and all the routers are part of the iBGP full mesh. On the other hand, if there are at least two equivalent routes after the 4th criterion, then each of these routes will be chosen by at least one router according to the 5th criterion (eBGP > iBGP), namely the border router that has received that route on its eBGP session.

So we can deduce that if we see only one route for one prefix in the BGP trace, this prefix is not a hot-potato prefix. If this prefix appears at least twice, it is routed by the 5th, 6th or 7th criterion depending on the router. This prefix is anyway a hot-potato prefix, because even though some routers have chosen their best route by the 5th criterion, other routers must have used the 6th or 7th criterion in this case.

Footnote 5: We reuse the BGP traces collected for [24].
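The classification rule of this section, combined with the matrix construction of section 4.2, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The input formats are hypothetical simplifications: the BGP dump is reduced to (prefix, egress node) pairs, the traffic data to a dictionary keyed by (ingress node, prefix), and a virtual node is identified by its set of possible egress nodes, which is an assumption made here for illustration.

```python
from collections import defaultdict


def classify_prefixes(bgp_routes):
    """Split prefixes into single-egress and hot-potato prefixes.

    bgp_routes: iterable of (prefix, egress_node) pairs, one per distinct
    best route seen in a daily BGP dump (hypothetical, simplified format).
    """
    egresses = defaultdict(set)
    for prefix, egress in bgp_routes:
        egresses[prefix].add(egress)
    # One egress in the trace: single-egress prefix; several: hot-potato prefix.
    single = {p: next(iter(e)) for p, e in egresses.items() if len(e) == 1}
    hot = {p: frozenset(e) for p, e in egresses.items() if len(e) > 1}
    return single, hot


def build_aggregated_tm(traffic, single, hot):
    """Build (tm_invar, tm_hp) from traffic volumes per (ingress, prefix).

    Assumption: the virtual node of a hot-potato prefix is represented by
    the frozenset of its possible egress nodes.
    """
    tm_invar = defaultdict(float)
    tm_hp = defaultdict(float)
    for (s, p), volume in traffic.items():
        if p in single:
            # Only one possible egress node t: invariant traffic (s, t).
            tm_invar[(s, single[p])] += volume
        elif p in hot:
            if s in hot[p]:
                # The ingress is itself an egress: eBGP > iBGP applies.
                tm_invar[(s, s)] += volume
            else:
                # Egress depends on the link weights: hot-potato traffic.
                tm_hp[(s, hot[p])] += volume
    return dict(tm_invar), dict(tm_hp)
```

For instance, a prefix announced at both R1 and R2 and received from ingress R3 contributes to `tm_hp`, while the same prefix received at ingress R1 contributes to the pair (R1, R1) of `tm_invar`.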
4.5 Incorporating changes in a classical LWO

We have modified the classical LWO to include BGP considerations. Three types of links (intradomain, interdomain and virtual) are now present in the model. Intradomain links are unchanged. Interdomain links have a finite capacity and a weight. These are considered in the objective function, weighted by the $\alpha$ parameter. Finally, virtual links have infinite capacities, are not considered in the objective function, and have a null weight. After these modifications a classical LWO, equipped with all its heuristics, can be applied on our extended model.

Notice that the classical LWO considers implicitly that it is possible to split the traffic evenly along several equal-cost paths. Therefore it will be necessary to enable ECMP in the network to really get the expected performance. This is anyway a very reasonable choice. Moreover, it was shown (in [14] for the Sprint network) that ECMP improves robustness. In [20] the authors claim that having multiple shortest paths between pairs of routers provides the ability to switch over to another path in case of link failure without overlapping with the previous path of another node, which could have led to a transient forwarding loop. It is also said that this is useful to reduce the latency of forwarding-plane convergence for IGP routing changes.

Similarly to ECMP, we have considered that it is possible to split the traffic evenly along multiple equal shortest paths up to the virtual node. So to get the expected performance the network administrator will have to enable iBGP multipath load sharing. Enabling iBGP multipath load sharing is again a natural choice for traffic engineering and is easily enabled on routers of the main equipment vendors.
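The treatment of the three link types in the objective function can be sketched as follows. This is an illustrative Python sketch, not the optimizer's code. The `Link` class and function names are hypothetical, and the breakpoints and slopes are those commonly attributed to the piecewise linear cost of [10]; they should be treated as an assumption here.

```python
import math
from dataclasses import dataclass

# Utilization breakpoints and slopes of the piecewise linear cost commonly
# attributed to [10]; treat these values as an assumption of this sketch.
BREAKS = [0.0, 1 / 3, 2 / 3, 9 / 10, 1.0, 11 / 10]
SLOPES = [1, 3, 10, 70, 500, 5000]


@dataclass
class Link:
    kind: str        # "intra", "inter" or "virtual"
    capacity: float  # math.inf for virtual links
    load: float


def link_cost(load, capacity):
    """Convex piecewise linear cost phi_l of a link carrying `load`."""
    cost = 0.0
    for i, slope in enumerate(SLOPES):
        lo = BREAKS[i] * capacity
        hi = BREAKS[i + 1] * capacity if i + 1 < len(BREAKS) else math.inf
        if load > lo:
            # Charge the part of the load that falls in this segment.
            cost += slope * (min(load, hi) - lo)
    return cost


def objective(links, alpha):
    """phi = sum over intradomain links + alpha * sum over interdomain links.

    Virtual links are skipped: they are not part of the objective function.
    """
    intra = sum(link_cost(l.load, l.capacity) for l in links if l.kind == "intra")
    inter = sum(link_cost(l.load, l.capacity) for l in links if l.kind == "inter")
    return intra + alpha * inter
```

With `alpha = 0` the function reduces to the classical intradomain objective, and with `alpha = 1` interdomain links count as much as intradomain ones, which are the two settings compared in section 5.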
4.6 Respecting the eBGP > iBGP criterion

If next-hop-self is not activated in the network, it is possible to let the optimizer choose weights on interdomain links. This gives the LWO more knobs to tune, in addition to the intradomain link weights. The advantage is that the LWO may potentially find a better solution; the drawback is the larger search space, which increases the computation time to performance ratio. In large networks it may become too costly to assign link weights to interdomain links.

Moreover, assigning weights to interdomain links may contradict the eBGP > iBGP criterion. We explain this point on the simplified network of figure 9. Suppose that the LWO has found the link weights indicated on the figure. We can easily compute that the shortest path from $R_1$ toward destination prefix $P_1$ is $R_1$ - $R_3$ - $R_2$ - $P_1$. And that is exactly what the LWO has considered during its optimization. However, traffic sent by $R_1$ to $P_1$ will actually follow another path, namely $R_1$ - $R_3$ - $P_1$, because according to the eBGP > iBGP rule, which has precedence over the hot-potato rule, $R_3$ prefers to forward this traffic directly on its peering link, although the path via $R_2$ has a lower cost (in terms of weights).

[Figure 9: Toy Example - with link weights]

In our simulations we force interdomain link weights to 0, while all intradomain links are constrained to have integer weights ≥ 1, so that this problem is avoided. Indeed, for example in the simplified network of figure 9, the shortest path from $R_3$ to $P_1$ will always be $R_3$ - $P_1$ (weight = 0) and never $R_3$ - $R_2$ - $P_1$ (weight ≥ 1). Note that setting all the weights of interdomain links to 0 still allows us to engineer interdomain links by including these in the objective function as explained in section 4.3. So this is not a shortcoming, and this is confirmed by the good results of the simulation study.
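The effect of the zero-weight rule can be checked on a small example with a plain Dijkstra computation. The sketch below is only illustrative: the topology mimics the node names of figure 9, but the link weights are hypothetical and are not those of the figure.

```python
import heapq


def shortest_path(graph, src, dst):
    """Plain Dijkstra. graph: node -> {neighbor: weight}. Returns (cost, path)."""
    queue = [(0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in done:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []


def toy_network(w_r3_p1, w_r2_p1):
    """Hypothetical toy topology: intradomain weights are 1, and the two
    interdomain links (R3-P1 and R2-P1) take the given weights."""
    return {
        "R1": {"R3": 1},
        "R3": {"R1": 1, "R2": 1, "P1": w_r3_p1},
        "R2": {"R3": 1, "P1": w_r2_p1},
    }


# Free interdomain weights: the computed path crosses R3 and exits at R2,
# which contradicts the eBGP > iBGP rule applied by R3.
print(shortest_path(toy_network(5, 1), "R1", "P1"))  # (3, ['R1', 'R3', 'R2', 'P1'])

# Interdomain weights forced to 0, intradomain weights >= 1: the computed
# path exits directly at R3, consistent with eBGP > iBGP.
print(shortest_path(toy_network(0, 0), "R1", "P1"))  # (1, ['R1', 'R3', 'P1'])
```

In the second case, any detour from a border router through another border router costs at least 1, whereas its own peering link costs 0, so the path computed by the optimizer matches the path actually used.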
4.7 Simplifying the model

When using the LWO without optimizing interdomain links (i.e. only intradomain links are in the objective function), a simplification of the model is possible. Indeed we can remove all the interdomain links ($L_{inter}$) and all the neighbor nodes ($N_{neigh}$) from our model. Figure 7 would result in this case in figure 10 (where $P_1$ and $P_2$ have already been aggregated). Indeed in this case the model has just to include all the possible egress nodes for each traffic flow. This simplification decreases the number of links and nodes of the model and so improves the efficiency of the optimizer.

[Figure 10: Simplified Model]

5. SIMULATIONS ON AN OPERATIONAL NETWORK

We have tested our algorithm on real data of a multi-gigabit operational network that spreads over the European continent and is composed of about 25 nodes and 40 bidirectional intradomain links. Link capacities range from 155 Mbps to 10 Gbps. It is a transit network that has two providers connected with about 10 interdomain links, has other peer ASes connected with about 15 shared-cost links, and has more than 25 customer ASes, which are mainly single-homed. The total traffic exchanged is about 10 Gbps on average.

In this network there is an iBGP full mesh, MEDs are currently not used, and there are three different local preference values: the lowest value is used for routes learned from provider links, the intermediate value is used for routes learned from shared-cost peering links, and the highest value is used for routes learned from customer links. Route parameters are exchanged unmodified on all iBGP sessions.

We have used the technique exposed in section 4.4 to build our model.
We have used NetFlow data dumped every 15 minutes on every ingress router with a sampling rate of 1/1000, aggregated per ingress node and destination prefix. We had access to about one month of traces, one BGP dump per day and one sampled NetFlow file for each ingress router. With these data we have generated 2,512 aggregated interdomain traffic matrices (each matrix is an average over 15 minutes). This whole set of traffic matrices is representative of the traffic on the studied network. Some of these induce a low load on the network while some induce a high load (see footnote 6).

The average number of prefixes is 160,973, of which 97.2% (156,407) are hot-potato prefixes. If we now take traffic into account, we have measured that these 97.2% amount to 35.6% of the traffic on average. This is still enough to have a significant impact on the link loads of the network. Over all recorded TMs, the peak value is 51.7% of the traffic and the minimal value is 24.6%.

Another interesting fact is that on average 99.94% of hot-potato traffic is destined for the 5 biggest clusters of prefixes. The sets of interdomain links giving access to each of these 5 clusters of prefixes are either all peering links to a neighboring AS (for 3 clusters), or a mix of peering links from two such ASes (for 2 clusters).

We have run different versions of the LWO on a large number of traffic matrices. Section 5.1 presents some simulation results demonstrating the intradomain traffic engineering capabilities of our algorithm, while section 5.2 demonstrates that interdomain traffic engineering is also possible. All the simulations consider that ECMP and iBGP multipath are enabled.

5.1 Intradomain TE

We first compare a classical LWO (denoted IntraLWO) with our BGP-aware optimizer (denoted BGP-awareLWO).
To execute IntraLWO we had to generate for each interdomain TM the corresponding intradomain TM where the hot-potato traffic is routed considering the present (i.e., non-engineered) link weights. So these intradomain TMs are those that would be measured in the network. For the comparison we have run both optimizers on all the 2,512 aggregated interdomain TMs. Optimizers consider weights in a range from 1 to 150. Figure 11 shows the maximal intradomain link utilization ($U_{max}$) for some worst-case TMs.

We have run IntraLWO on every intradomain TM, and computed the resulting maximal link utilization, assuming that the intradomain TM remains invariant (thus ignoring hot-potato effects). In the sequel these values are denoted IntraLWO-optimistic. For this link weights setting, if hot-potato effects are taken into account, we get the resulting maximal intradomain link utilization denoted IntraLWO-resulting. These are the real values that would be observed if the optimized link weights were installed in the network.

Footnote 6: The set of intradomain traffic matrices built from the same BGP data and NetFlow traces is described in [24].

[Figure 11: $U_{max}$ values for some worst-case TMs. x-axis: traffic matrix ID (TM1 to TM14); y-axis: maximal link utilization (%); series: IntraLWO-optimistic, IntraLWO-resulting, BGP-awareLWO.]

The optimistic and resulting values are very different, and sometimes the resulting maximal utilization is even worse than the routing without link weight optimization (not present in the figure). Finally we have run our BGP-awareLWO and we can see that the maximal link utilizations are very good. Figure 11 shows a selection of TMs providing the worst-case values for IntraLWO-resulting (see footnote 7).
The average reduction of $U_{max}$ from IntraLWO-resulting to BGP-awareLWO over all TMs is 4.5%, but let us outline that the worst-case TMs do matter much more, because the main goal of our LWO is to filter out the unexpectedly bad link weights settings proposed by a classical LWO. In all cases the real minimal values of $U_{max}$ achievable in practice are the values of BGP-awareLWO, since the IntraLWO-optimistic values are disqualified in the comparison.

Footnote 7: In this case we define worst-case values as values of traffic matrices providing the highest intradomain maximal link utilizations.

[Figure 12: CDFs of $U_{max}$ over all TMs for BGP-awareLWO and IntraLWO-resulting. x-axis: maximal link utilization; y-axis: percentage of TMs.]

Figure 12 shows the CDFs (cumulative distribution functions) of the maximal link utilization over the 2,512 TMs for BGP-awareLWO and IntraLWO-resulting. IntraLWO-optimistic is not depicted on the figure because it would almost coincide with BGP-awareLWO. We can clearly see that BGP-awareLWO is better than IntraLWO-resulting.

Figure 13 gives the proportions of TMs per range of maximal link utilizations. In this figure we can see that BGP-awareLWO takes advantage of the freedom of choice of the egress point(s) for hot-potato traffic. Indeed BGP-awareLWO is slightly better than IntraLWO-optimistic. For example there are 3.4% fewer TMs in the [30, 40) range. This indicates that our optimizer can change the egress point of some hot-potato traffic to better engineer the network.

Concerning the computational efficiency of the LWO, adding the virtual links and nodes has roughly doubled the computation time. We consider that this is not a high cost given the improved quality of the solutions found.
One may wonder why BGP-awareLWO does not always find a better solution than IntraLWO-optimistic (figure 11). It is because the objective function does not strictly minimize the maximal link utilization (i.e., it minimizes the sum over all links of a convex function of the link utilization). Therefore even when the solution is slightly better with respect to the objective function, it can still be a little bit worse with respect to the maximal link utilization.

[Figure 13: Proportions of TMs in each $U_{max}$ interval. x-axis: maximal link utilization interval (%), from [0,10) to [90,inf); y-axis: percentage of TMs in the interval (%); series: IntraLWO-optimistic, IntraLWO-resulting, BGP-awareLWO.]

5.1.1 In-depth analysis of the worst case scenario

In this section we analyse the worst-case scenario concerning the maximal link utilization of IntraLWO-resulting.

With the worst-case traffic matrix, the maximal link utilization is 160% with the metrics optimized with IntraLWO. The traffic shifts that happen in this case are depicted on figure 14. If $P_2 < P_1$ (see footnote 8), traffic on the flow from $S$ to $D_1$ will be routed on link $L$, and this will be expected by IntraLWO. But if $P_4 < P_3$ while before optimization $P_4 > P_3$, the hot-potato traffic from $S$ to $VirtualD_4$ will be routed on $L$ and this will NOT be expected by IntraLWO. This situation happens four times on the same low-capacity link (see footnote 9) for the worst-case scenario, and for quite big hot-potato traffic flows compared to the link capacity.

Footnote 8: By $P_2 < P_1$ we mean that the sum of the metrics of the links of $P_2$ is smaller than the sum of the metrics of the links of $P_1$.

Footnote 9: This link has a capacity of 155 Mbps.
[Figure 14: Traffic shifts from one shortest path to another. Labels: source $S$; paths $P_1$ to $P_4$; link $L$; destinations $D_1$, $D_2$, $D_3$ and $VirtualD_4$.]

Before optimization the maximal link utilization is 34.8%. The utilization of the problematic link is only 0.4%. There are only four intradomain shortest paths that use this link and the total traffic on these four flows is 0.6 Mbps.

After optimization the problematic link is used in 20 shortest paths instead of 4. This is expected by IntraLWO, which thinks that these 20 flows will amount to 29 Mbps, leading to a utilization of only 18.9% (< 34.1%, so IntraLWO thinks that this link is not the most utilized link). What is not expected by IntraLWO is that 4 of these shortest paths will also attract hot-potato traffic. The hot-potato traffic which is shifted on one of these shortest paths comes from one of the 19 remaining shortest paths using this link. So this shift has no effect on the load of this link. But the hot-potato traffic attracted on the three remaining shortest paths comes from other flows whose shortest path does not include the problematic link. These three flows attract a total amount of 220 Mbps of hot-potato traffic, which is more than the capacity of the link.

5.1.2 Increasing the bottleneck links capacities and the traffic matrices

To analyse whether the presence of low-capacity links has any impact on our results, we also ran our algorithm on a modified version of the topology, where all the 155 Mbps links have been replaced by 622 Mbps links. We have also doubled all the elements of the traffic matrices in order to reflect a possible increase in the traffic demand in the future.

With this version of the topology and traffic matrices, we have noticed that the impact of hot-potato reroutings on $U_{max}$ after a LWO optimization is larger than with the initial topology and load.
Indeed the mean reduction of $U_{max}$ over all TMs from IntraLWO-resulting to BGP-awareLWO is now 21.8% instead of 4.5%. This can be observed on the CDF of figure 15 for the updated topology and traffic matrices, which should be compared to figure 12 for the initial data. We can also observe that for more than 45% of the traffic matrices, IntraLWO-resulting leads to a $U_{max}$ greater than 67.8%, which is the $U_{max}$ reached on the worst-case TM by BGP-awareLWO.

Over all TMs, $U_{max}$ has been observed on at least 10 different links. There are 7.5% of the traffic matrices for which $U_{max}$ is greater than 100% for IntraLWO-resulting, and these high $U_{max}$ values can be observed on 6 different links, out of which only 2 are 622 Mbps links. The worst-case traffic matrix concerning $U_{max}$ for IntraLWO-resulting induces a utilization of 189.1% on a link whose capacity is 2.5 Gbps. These results demonstrate that it is not always the same lowest-capacity link that induces the highest utilization in the network.

We have also analysed CDF curves for the second, third, fourth and fifth most utilized links. For the second most utilized link, results are similar to those shown on figure 15, with a peak maximal utilization for IntraLWO-resulting reaching 175.9%, and a maximal utilization being above 100% for 2% of the traffic matrices. Concerning the third most utilized links, hot-potato reroutings have less disastrous consequences, while still significant in the worst case as the maximal utilization peaks at 95.3% for IntraLWO-resulting while it peaks at 62.3% for BGP-awareLWO.
[Figure 15: CDFs of $U_{max}$ over all TMs for BGP-awareLWO and IntraLWO-resulting for the updated topology. x-axis: maximal link utilization; y-axis: percentage of TMs.]

5.2 Interdomain TE

One of the most innovative features of our LWO is its ability to engineer traffic on the interdomain links. We first analyse the maximal link utilizations of the interdomain links with the present link weights. The average value of interdomain $U_{max}$ over all TMs is 36.8%. This value can peak at 73.7%. We have selected the worst TMs in this respect (see footnote 10) and run BGP-aware LWO on them with interdomain links in the objective function. The results are shown in figure 16 for the peak TM. The maximal interdomain link utilization is reduced from 73.7% to 36.8% when using BGP-aware LWO. It shows that the LWO can take advantage of hot-potato routing to also engineer traffic on interdomain links.

We now show that the optimization of interdomain links is not done at the expense of intradomain links. To this end we have run BGP-awareLWO with and without interdomain links in the objective function ($\alpha = 1$ or $\alpha = 0$, see section 4.3) on the 50 TMs leading currently to the maximal interdomain link utilization. Figure 17 presents the average intradomain and interdomain $U_{max}$ values for these matrices. It shows that BGP-awareLWO with all links in its objective function can optimize interdomain links almost without impacting intradomain links. The average intradomain $U_{max}$ value is indeed almost equivalent in both cases.

Footnote 10: Here worst-case TMs means TMs providing the highest interdomain link utilization with present link metrics.
[Figure 16: Interdomain link utilizations. x-axis: link ID (L1 to L9); y-axis: utilization of the link (%); series: non optimized, optimized by BGP-aware LWO.]

[Figure 17: Combined Intra- and Interdomain Traffic Engineering. y-axis: maximal link utilization (%); groups: intradomain links, interdomain links; series: present routing; BGP-aware LWO, no interdomain TE; BGP-aware LWO, interdomain TE.]

6. FUTURE WORK

A known potential issue with LWOs is route instability. As there is no mutual agreement on the egress/ingress points between ASes, it is not guaranteed that two neighboring ASes (say $AS_x$ and $AS_y$) running their LWO will not oscillate, one reoptimizing its link weights after the other. Indeed each link weights optimization in $AS_x$ can lead to a change of some egress points, changing the traffic matrix in $AS_y$, which may trigger the reoptimization of the link weights in this AS, and so on, leading to route oscillations.

Such instability may already happen with classical BGP-blind LWOs and, as our BGP-aware LWO does not address this issue, some instability may also potentially exist. In [15] the authors propose a method to negotiate the BGP egress point between neighboring ASes. This technique should remove oscillations provided that it is possible to fix the egress point, which is not easy in OSPF/ISIS networks. In [15] the authors consider MPLS networks instead.

The related problem of BGP route oscillations when interdomain traffic engineering techniques are used is considered in [26], where sufficient conditions are elaborated to guarantee BGP route stability. Unfortunately, these conditions are not fulfilled in presence of LWOs (be they BGP-aware or not), because all LWOs take input traffic into account to choose link weights, which in turn determine egress points for hot-potato prefixes, and thus the corresponding BGP routes.
This problem of potential oscillations is still an open research topic, and was not the primary goal of this paper.

7. CONCLUSION

We proposed a BGP-aware Link Weight Optimizer (LWO) that extends the classical (intradomain) LWO to take into account BGP's hot-potato routing principle. The optimized link weights, if deployed, will actually give rise to the link loads expected by the optimizer, contrary to a classical (intradomain) LWO that may lead to unexpectedly high loads on some links when changing weights impacts the intradomain traffic matrix. In practice the method only requires extending the intradomain topology with a limited number of virtual nodes and links, which preserves scalability, as shown on an operational network used as a case study. The aggregated interdomain traffic matrix associated with this extended topology advantageously replaces the classical intradomain traffic matrix as input to the LWO. On this basis, a classical LWO requires only small modifications to be reused on the extended topology, and this allows us to reuse all its well-tuned heuristics.

The most innovative asset of the method is its ability to optimize traffic on interdomain peering links as well. We have shown on a case study that it does so very efficiently at almost no extra computational cost, while preserving the 5th BGP routing criterion stating that eBGP-learned routes should be preferred to iBGP-learned ones.

As for a classical LWO, our method can be extended to more general scenarios including several traffic matrices as input and/or possible link failures. Note however that an interdomain traffic matrix used as input is likely to be already more stable (and thus representative) than intradomain matrices. Indeed the interdomain matrix is invariant under all local hot-potato fluctuations, e.g. due to failures.
This better stability of the interdomain matrix would allow us to use a smaller set of representative matrices as input, which in turn would give unique link weights settings that are better optimized for each of them.

Even though our method requires additional inputs to build the interdomain traffic matrix and some more computation power, this pays off, because our BGP-aware LWO clearly outperforms the classical (intradomain) LWO.

8. REFERENCES

[1] http://totem.run.montefiore.ulg.ac.be/.
[2] S. Agarwal, A. Nucci, and S. Bhattacharyya. Measuring the Shared Fate of IGP Engineering and Interdomain Traffic. In Proc. IEEE ICNP, November 2005.
[3] S. Balon and G. Leduc. Dividing the Traffic Matrix to Approach Optimal Traffic Engineering. In Proceedings of 14th IEEE International Conference on Networks (ICON 2006), Singapore, 13-15 Sep. 2006.
[4] BGP Best path selection algorithm. http://www.cisco.com/warp/public/459/25.shtml.
[5] Z. Cao, Z. Wang, and E. Zegura. Performance of Hashing-Based Schemes for Internet Load Balancing. In Proceedings of INFOCOM, 2000.
[6] S. Cerav-Erbas, O. Delcourt, B. Fortz, and B. Quoitin. The Interaction of IGP Weight Optimization with BGP. In Proceedings of ICISP, Cap Esterel, France, August 2006.
[7] N. Feamster, J. Borkenhagen, and J. Rexford. Guidelines for interdomain traffic engineering. ACM SIGCOMM Computer Communications Review, October 2003.
[8] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, and J. Rexford. NetScope: Traffic engineering for IP networks. IEEE Network Magazine, pages 11–19, March/April 2000.
[9] A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True. Deriving traffic demands for operational IP networks: Methodology and experience. IEEE/ACM Transactions on Networking, pages 265–279, June 2001.
[10] B. Fortz and M. Thorup. Internet Traffic Engineering by Optimizing OSPF Weights. In Proceedings of INFOCOM, pages 519–528, 2000.
[11] B. Fortz and M. Thorup. Optimizing OSPF/IS-IS Weights in a Changing World. IEEE Journal on Selected Areas in Communications, 20(4):756–767, 2002.
[12] B. Fortz and M. Thorup. Robust optimization of OSPF/IS-IS weights. In Proceedings of INOC, pages 225–230, October 2003.
[13] Foundry enterprise configuration and management guide. http://www.foundrynet.com/services/documentation/ecmg/BGP4.html#17143.
[14] G. Iannaccone, C.-N. Chuah, S. Bhattacharyya, and C. Diot. Feasibility of IP restoration in a tier 1 backbone. IEEE Network, 18(2), 2004.
[15] R. Mahajan, D. Wetherall, and T. Anderson. Negotiation-Based Routing Between Neighboring ISPs. In Proc. NSDI, 2005.
[16] J. Moy. OSPF Version 2. RFC 2328, April 1998.
[17] A. Nucci, B. Schroeder, S. Bhattacharyya, N. Taft, and C. Diot. IGP Link Weight Assignment for Transient Link Failures. In Proceedings of 18th International Teletraffic Congress (ITC), September 2003.
[18] J. Rexford. Handbook of Optimization in Telecommunications, chapter Route optimization in IP networks. Springer Science + Business Media, February 2006.
[19] M. Roughan, M. Thorup, and Y. Zhang. Traffic Engineering with Estimated Traffic Matrices. In Proc. IMC, 2003.
[20] A. Sridharan, S. B. Moon, and C. Diot. On the correlation between route dynamics and routing loops. In IMC '03: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 285–294, New York, NY, USA, 2003. ACM Press.
[21] J. Stewart. BGP4: Inter-domain routing in the Internet. Addison Wesley, 1999.
[22] R. Teixeira, N. Duffield, J. Rexford, and M. Roughan. Traffic matrix reloaded: Impact of routing changes. In Proceedings of Passive and Active Measurement, March/April 2005.
[23] R. Teixeira, A. Shaikh, T. Griffin, and J. Rexford. Dynamics of hot-potato routing in IP networks. In Proceedings of ACM SIGMETRICS, June 2004.
[24] S. Uhlig, B. Quoitin, S. Balon, and J. Lepropre. Providing public intradomain traffic matrices to the research community. ACM SIGCOMM Computer Communication Review, 36(1):83–86, January 2006.
[25] H. Wang, H. Xie, L. Qiu, Y. R. Yang, Y. Zhang, and A. Greenberg. COPE: Traffic Engineering in Dynamic Networks. In Proceedings of ACM SIGCOMM, September 2006.
[26] Y. Yang, H. Xie, H. Wang, A. Silberschatz, A. Krishnamurthy, L. Yanbin, and L. E. Li. On route selection for interdomain traffic engineering. IEEE Network, 19(6):20–27, Nov.-Dec. 2005.
[27] Y. Zhang and Z. Ge. Finding Critical Traffic Matrices. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), June 2005.
