Betweenness Centrality: Algorithms and Lower Bounds

Shiva Kintali∗
∗ College of Computing, Georgia Institute of Technology, Atlanta, GA-30332. Email: kintali@cc.gatech.edu

Abstract
One of the most fundamental problems in large-scale network analysis is to determine the importance of a particular node in a network. Betweenness centrality is the most widely used metric to measure the importance of a node in a network. In this paper, we present a randomized parallel algorithm and an algebraic method for computing the betweenness centrality of all nodes in a network. We prove that any path-comparison based algorithm requires $\Omega(nm)$ time to compute betweenness.

Keywords: all-pairs shortest paths, betweenness centrality, lower bounds, parallel graph algorithms, social networks.

1 Introduction
One of the most fundamental problems in large-scale network analysis is to determine the importance of a particular node (or an edge) in a network. For example, in social networks we wish to identify the agents that have very short connections to large portions of the population. In communication networks we wish to identify the links that carry a lot of traffic, the ISPs that attract a lot of business, the links that, if disconnected, decrease network performance dramatically, and so on. A particular way to measure the importance of network elements (nodes or edges) is to use centrality metrics such as closeness centrality [29], graph centrality [19], stress centrality [31] and betweenness centrality ([16], [2]). An important application of centrality arises in the study of epidemic phenomena in networks, when an infectious disease or a computer virus is disseminated. The power of a node to spread the epidemic is related to its centrality [28].
Centrality metrics also find applications in natural language processing [14], to compute the relative importance of textual units.

Betweenness centrality (introduced by Freeman [16] and Anthonisse [2]) is the most popular (and most computationally expensive) centrality metric. Some recent applications of betweenness include the study of biological networks [20, 26, 12], the study of sexual networks and AIDS [24], identifying key actors in terrorist networks [22, 10], organizational behavior [6], supply chain management [9], and transportation networks [18]. Betweenness can also be used as a heuristic to solve NP-hard problems like graph clustering. For example, Newman and Girvan [25] developed a heuristic to find community structure in large networks, based on the betweenness of the edges of the network.

Since the networks of interest are huge, it is important to develop algorithms that compute these metrics efficiently. Brandes [4] showed that betweenness centrality can be computed within the same asymptotic time bounds as $n$ Single Source Shortest Path (SSSP) computations. Brandes and Pich [5] presented experimental results on estimating different centrality measures under various node-selection strategies. Eppstein and Wang [13] presented a randomized approximation algorithm for closeness centrality.

1.1 Betweenness Centrality
We denote a network by an undirected graph $G(V, E)$ with vertex set $\{v_1, v_2, \ldots, v_n\}$ (or $\{1, 2, \ldots, n\}$), with $|V| = n$ vertices and $|E| = m$ edges representing the relationships between the vertices. In this paper, we refer to connected undirected graphs unless otherwise stated. Each edge $e \in E$ has a positive integer weight $w(e)$. Unweighted graphs have $w(e) = 1$ for all edges. A path from $s$ to $t$ is defined as a sequence of edges $(v_i, v_{i+1})$, $0 \leq i < l$, where $v_0 = s$ and $v_l = t$.
The length of a path is the sum of the weights of the edges in this sequence. We use $d(s, t)$ to denote the distance (the minimum length of any path connecting $s$ and $t$ in $G$) between vertices $s$ and $t$. We set $d(i, i) = 0$ by convention. We denote the total number of shortest paths between vertices $s$ and $t$ by $\lambda_{st} = \lambda_{ts}$, and we set $\lambda_{ss} = 1$ by convention. The number of shortest paths between $s$ and $t$ passing through a vertex $v$ is denoted by $\lambda_{st}(v)$. Let $Diam(G)$ be the diameter (the length of the longest shortest path) of the graph $G$. Let $A = (a_{ij})$ be the adjacency matrix of the graph, i.e., $A$ is a 0-1 matrix with $a_{ij} = 1$ iff $(i, j) \in E$. Let $\delta_{st}(v)$ denote the fraction of shortest paths between $s$ and $t$ that pass through a particular vertex $v$, i.e., $\delta_{st}(v) = \lambda_{st}(v) / \lambda_{st}$. We call $\delta_{st}(v)$ the pair-dependency of $s, t$ on $v$.

Betweenness centrality of a vertex $v$ is defined as
$$BC(v) = \sum_{s,t:\ s \neq v \neq t} \delta_{st}(v).$$
The dependency of a source vertex $s \in V$ on a vertex $v \in V$ is defined as
$$\delta_{s*}(v) = \sum_{t:\ t \neq s,\ t \neq v} \delta_{st}(v).$$
The betweenness centrality of a vertex $v$ can then be expressed as
$$BC(v) = \sum_{s:\ s \neq v} \delta_{s*}(v).$$
Define the set of predecessors of a vertex $v$ on shortest paths from $s$ as $P_s(v) = \{u \in V : (u, v) \in E,\ d(s, v) = d(s, u) + w(u, v)\}$. The following theorem states that the dependencies of the closer vertices can be computed from the dependencies of the farther vertices.

Theorem 1.1. [4] The dependency of $s \in V$ on any $v \in V$ obeys
$$\delta_{s*}(v) = \sum_{w:\ v \in P_s(w)} \frac{\lambda_{sv}}{\lambda_{sw}} \bigl(1 + \delta_{s*}(w)\bigr).$$

Brandes's Algorithm [4] is based on the above theorem. First, $n$ single-source shortest paths (SSSP) computations are done, one for each $s \in V$. The predecessor sets $P_s(v)$ are maintained during these computations.
Next, for every $s \in V$, the dependencies $\delta_{s*}(v)$ for all other $v \in V$ are computed using the shortest-paths tree and the predecessor sets. Finally, the betweenness value of a vertex $v$ is the sum of all its dependency values. The $O(n^2)$ space requirement can be reduced to $O(n + m)$ by maintaining a running centrality score. Note that the centrality scores need to be divided by two if the graph is undirected, since all shortest paths are considered twice. Brandes's Algorithm runs in $O(nm)$ time for unweighted graphs and $O(nm + n^2 \log n)$ time for weighted graphs.

1.2 Our Results
Brandes's algorithm is a path-comparison based algorithm. We prove that any path-comparison based algorithm requires $\Omega(nm)$ time to compute betweenness. Betweenness centrality is closely related to the All Pairs Shortest Paths problem (APSP), and algebraic methods have been very successful in obtaining better running times for APSP ([30], [1], [32], [15], [8], [34]). We present an algebraic method for computing the betweenness centrality of all nodes in a network. For unweighted graphs, our algorithm runs in time $O(n^\omega Diam(G))$, where $\omega < 2.376$ is the exponent of matrix multiplication and $Diam(G)$ is the diameter of the graph. For weighted graphs with integer weights taken from the range $\{1, 2, \ldots, M\}$, we present an algorithm that runs in time $O(M n^\omega Diam(G))$. As in [4], our time bounds hold in the model where all arithmetic operations (independent of the size of the numbers) take unit time and numbers use unit space. Recent observations on real-world graph evolution, such as densification and shrinking diameters [23], make our algorithms very relevant to real-world graphs.
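For reference, the sequential algorithm of Brandes [4] recalled in Section 1.1 can be sketched as follows for unweighted graphs. This is a minimal Python sketch of ours, not code from [4]. It assumes a hypothetical input format `adj` (a dict mapping each vertex to its neighbours, every undirected edge listed in both directions), uses BFS for the $n$ SSSP computations, accumulates dependencies with Theorem 1.1 from the farthest vertex inward, and keeps a running centrality score.

```python
from collections import deque


def brandes_betweenness(adj):
    """Betweenness of every vertex of an unweighted, undirected graph.

    Illustrative sketch. `adj` is an assumed input format: a dict mapping each
    vertex to its neighbours, with every edge listed in both directions.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # SSSP from s by BFS: distances, path counts lambda_sv, predecessors P_s(v).
        dist = {s: 0}
        count = {v: 0 for v in adj}
        count[s] = 1
        preds = {v: [] for v in adj}
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    count[w] += count[v]
                    preds[w].append(v)
        # Dependencies via Theorem 1.1, farthest vertices first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += count[v] / count[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]  # running centrality score
    # Undirected graph: every pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}
```

For example, on the path 0-1-2 the call `brandes_betweenness({0: [1], 1: [0, 2], 2: [1]})` gives 1.0 for the middle vertex and 0.0 for the endpoints.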
We present a randomized parallel algorithm for computing the betweenness centrality of all nodes in a network. Our approach is based on the randomized parallel SSSP algorithm for unweighted graphs given by Ullman and Yannakakis [33]. We compute the betweenness in two stages, which we call the forward pass and the backward pass. Our algorithm for the forward pass runs in $O(n)$ time using $O(m \log n)$ processors for unweighted graphs, and in $O(n \log^2 n \log M)$ time using $O(m)$ processors for weighted graphs with integer weights taken from the range $\{1, 2, \ldots, M\}$. Our backward pass algorithm runs in $O(n^2)$ time using $O(n)$ processors for both weighted and unweighted graphs. For bounded-degree graphs, we present an optimal backward pass algorithm that runs in $O(n \log m)$ time using $O(m)$ processors for unweighted graphs and in $O(M n \log m)$ time using $O(m)$ processors for weighted graphs.

2 Lower Bounds
Definition 2.1. Path-comparison based Algorithm [11]: A path-comparison based algorithm $\mathcal{A}$ accepts as input a graph $G$ and a weight function. The algorithm $\mathcal{A}$ can perform all standard operations. However, the only way it can access the edge weights is to compare the weights of two different paths.

Karger, Koller and Phillips [11] established that $\Omega(n^3)$ is a lower bound on the complexity of any path-comparison based algorithm for the all-pairs shortest path problem on a directed graph with $\Theta(n^2)$ edges. They conjectured that similar lower bounds also hold for undirected graphs. We use their construction to derive lower bounds on computing betweenness in directed graphs. For the details of the construction we refer the reader to [11]. The graph $G$ they constructed is a directed tripartite graph on vertices $u_i$, $v_j$ and $w_k$, where $i$, $j$ and $k$ range from $0$ to $n - 1$.
The edge set of $G$ is $\{(u_i, v_j)\} \cup \{(v_j, w_k)\}$. Therefore, the only paths are individual edges and paths $(u_i, v_j, w_k)$ of length two. A weight function $W$ is chosen so that, for every $i$ and $k$, the unique shortest path from $u_i$ to $w_k$ goes through $v_0$. Hence the betweenness of the node $v_0$ is $n^2$, one for each pair $(u_i, w_k)$.

Let $\mathcal{A}$ be any path-comparison based algorithm. Consider giving $(G, W)$ as input to $\mathcal{A}$, and suppose that $\mathcal{A}$ runs correctly. It must therefore output $n^2$ as the betweenness of $v_0$, based on the set of optimal paths $L$. Suppose further that a particular path $p^* = (u_{i^*}, v_{j^*}, w_{k^*})$ was never one of the operands in any comparison operation which $\mathcal{A}$ performed. The weight function can be suitably modified (as in [11]) to $W'$, in which $p^*$ is the unique shortest path from $u_{i^*}$ to $w_{k^*}$, but the ordering by weight of all the other paths remains the same. Note that the centrality of $v_0$ decreases under the new weight function $W'$. If we run $\mathcal{A}$ on $(G, W')$, all path comparisons not involving $p^*$ give the same result as they did using $W$. Therefore, since $\mathcal{A}$ never performed a comparison involving $p^*$ while running on $W$, we deduce that it still outputs $n^2$, which is now incorrect. So every path must be an operand of some comparison, and since each comparison involves two paths, the following theorem is immediate.

Theorem 2.2. There exists a directed graph of $3n$ vertices on which any path-comparison based algorithm for betweenness must perform at least $n^3/2$ path-weight comparisons.

A similar argument can be used to show an $\Omega(nm)$ lower bound on graphs with $m$ edges. Assume without loss of generality that $m \geq 4n$ and that $2n$ divides $m$. We perform the same construction, but of the middle vertices we use only $v_1, \ldots, v_{m/(2n)}$, connecting each of them to all the vertices $u_i$ and $w_k$. This requires $m$ edges and creates $mn/2$ paths.

Theorem 2.3. There exists a directed graph with $2n + m/(2n)$ vertices and $m$ edges on which any path-comparison based algorithm for betweenness must perform at least $mn/2$ path-weight comparisons.

Conjecture: Computing the betweenness of a single vertex is at least as hard as computing the betweenness of all vertices.

We make the following conjecture for computing betweenness centrality in general graphs. If our conjecture is true, then the existing techniques for APSP provide lower bounds for computing betweenness.

Conjecture: Computing the betweenness of all vertices is at least as hard as computing all-pairs shortest distances.

3 An Algebraic Method
We denote matrices by upper case letters and the elements of a matrix by the corresponding lower case letter. Recall that $A$ is the adjacency matrix of the graph. Let $0_{n \times n}$ be the $n \times n$ zero matrix and $I_{n \times n}$ the $n \times n$ identity matrix. Let $D$ be the $n \times n$ matrix of distances, i.e., $d_{ij} = d(i, j)$. Let $D_l$ be the 0-1 matrix with $(d_l)_{ij} = 1$ iff $d(i, j) = l$. Let $\Lambda$ be the $n \times n$ matrix where $\lambda_{ij}$ is the number of shortest paths between $i$ and $j$. Let $\Delta$ be the $n \times n$ matrix of dependencies, i.e., $\delta_{ij} = \delta_{i*}(j)$. Let $\Delta_l$ be the matrix with $(\delta_l)_{ij} = \delta_{i*}(j)$ if $d(i, j) = l$ and $(\delta_l)_{ij} = 0$ otherwise. If $X$ and $Y$ are two matrices, we let $X$ mult $Y$ ($X$ div $Y$) be the matrix obtained by element-wise multiplication (division) of $X$ and $Y$. We let $X \cdot Y$ denote the product of the two matrices, i.e., $(X \cdot Y)_{ij} = \sum_k x_{ik} y_{kj}$. We call the computation of the distances and the numbers of shortest paths (between all pairs) the forward pass, since shortest paths are computed using BFS/Dijkstra's algorithm. The computation of the dependencies is called the backward pass, since dependencies are computed in a bottom-up fashion.
In other words, the matrices $D$ and $\Lambda$ are computed in the forward pass and the matrix $\Delta$ is computed in the backward pass.

3.1 Unweighted Graphs
3.1.1 Forward Pass
The lengths of all shortest paths can be computed using the following theorem of Seidel [30].

Theorem 3.1. [30] All-pairs shortest distances for undirected unweighted graphs can be computed in time $O(n^\omega \log(Diam(G)))$.

We compute the number of shortest paths ($\lambda_{ij}$ for all $i, j$) using the following algorithm:

ComputePathCount(A)
    Initialize Z to I_{n×n}
    Initialize Λ to I_{n×n}
    Initialize Λ_curr to 0_{n×n}
    for l ← 1 to Diam(G)
        Z ← Z · A
        for i, j ← 1 to n
            if λ_ij > 0          // count already stored at a smaller distance, or i = j
                (λ_curr)_ij ← 0
            else
                (λ_curr)_ij ← z_ij
        Λ ← Λ + Λ_curr
    return Λ

Correctness: Note that $Z = A^l$ after the $l$-th iteration of the main for loop. Let $A^l = (a^l_{ij})$. It is easy to see that $a^l_{ij}$ equals the number of walks (not necessarily shortest, and possibly repeating vertices) from $i$ to $j$ of length exactly $l$. The least $l$ for which $a^l_{ij}$ is non-zero is $d(i, j)$, and for that $l$ the value $a^l_{ij}$ is the number of shortest paths from $i$ to $j$. The first time we encounter a non-zero value of $a^l_{ij}$, we store it in $\Lambda_{curr}$ and then add it to $\Lambda$. From then on $\lambda_{ij} > 0$, so the value is never overwritten in later iterations; the test must use the accumulated $\Lambda$ and not only the counts of the previous iteration, because walks of length $d(i,j) + 2, d(i,j) + 4, \ldots$ from $i$ to $j$ also exist. The diagonal keeps $\lambda_{ii} = 1$ from the initialization, as required by convention. Hence the above algorithm correctly computes the number of shortest paths, for all pairs, in an undirected unweighted graph. As a consequence we get the following lemma:

Lemma 3.2. All-pairs shortest path counts for undirected unweighted graphs can be computed in time $O(n^\omega Diam(G))$.

3.1.2 Backward Pass
Lemma 3.3. If $d(i, j) = Diam(G)$, then $\delta_{i*}(j) = \delta_{j*}(i) = 0$. Hence $\Delta_{Diam(G)} = 0_{n \times n}$.

Lemma 3.4. For unweighted graphs, if $l = Diam(G)$, then for every pair $(i, j)$ with $d(i, j) = l - 1$ we have $(\delta_{l-1})_{ij} = \lambda_{ij} \cdot \bigl((D_l \text{ div } \Lambda) \cdot A\bigr)_{ij}$.

Proof. We consider the following cases.

Case I: $d(i, j) = l - 1$.
$$\sum_{k=1}^{n} \frac{(d_l)_{ik}}{\lambda_{ik}}\, a_{kj} = \sum_{k:\ d(i,k) = l,\ a_{kj} = 1} \frac{1}{\lambda_{ik}} = \sum_{k:\ j \in P_i(k)} \frac{1}{\lambda_{ik}} = \sum_{k:\ j \in P_i(k)} \frac{1}{\lambda_{ik}} \bigl(1 + \delta_{i*}(k)\bigr) = \frac{\delta_{i*}(j)}{\lambda_{ij}}.$$
Here we have used the fact that if $d(i, k) = l = Diam(G)$ then $\delta_{i*}(k) = 0$, and Theorem 1.1 in the last step. Multiplying by $\lambda_{ij}$ gives $(\delta_{l-1})_{ij} = \delta_{i*}(j)$.

Case II: $d(i, j) < l - 1$.
$$\sum_{k=1}^{n} \frac{(d_l)_{ik}}{\lambda_{ik}}\, a_{kj} = \sum_{k:\ a_{kj} = 1,\ d(i,k) = l} \frac{1}{\lambda_{ik}} = 0,$$
since if $d(i, j) < l - 1$, there is no $k$ with $d(i, k) = l$ and $a_{kj} = 1$.

Case III: $d(i, j) = l$. Here the sum need not vanish: $j$ may be adjacent to another vertex $k$ with $d(i, k) = l$ (for example, in an odd cycle). This is why the algorithm below masks out every entry with $d(i, j) \neq l - 1$.

Lemma 3.5. For unweighted graphs, if $l < Diam(G)$, then for every pair $(i, j)$ with $d(i, j) = l - 1$ we have $(\delta_{l-1})_{ij} = \lambda_{ij} \cdot \bigl(((D_l + \Delta_l) \text{ div } \Lambda) \cdot A\bigr)_{ij}$.

Proof. This can be proved by induction, using the previous lemma as the base case; the argument is the same as in Case I, except that $\delta_{i*}(k)$ is now supplied by $\Delta_l$ instead of being zero. In addition we use the fact that every edge of the graph joins vertices whose BFS levels differ by at most one. Hence the dependencies at distance $l - 1$ depend only on the dependencies at distance $l$.

ComputeDependency(A, D, Λ)
    Initialize Δ to 0_{n×n}
    Initialize Δ_{Diam(G)} to 0_{n×n}
    for l ← Diam(G) downto 1
        Construct the 0-1 matrix D_l such that (d_l)_ij = 1 iff d(i, j) = l
        Δ_{l−1} ← ((D_l + Δ_l) div Λ) · A
        Δ_{l−1} ← Mask(Δ_{l−1}, l − 1)
        Δ_{l−1} ← Δ_{l−1} mult Λ
        Δ ← Δ + Δ_{l−1}
    return Δ

Mask(X, l)
    for all 1 ≤ i, j ≤ n
        if d(i, j) ≠ l
            x_ij ← 0
    return X

By Lemmas 3.3 to 3.5, the above algorithm computes $\Delta$ correctly; it performs $Diam(G)$ matrix products, so it runs in time $O(n^\omega Diam(G))$ using $O(n^2)$ space. Once the dependencies are computed, the centrality of each node, $BC(v) = \sum_{s \neq v} \delta_{sv}$ (divided by two for undirected graphs, as noted in Section 1.1), can be computed by adding the corresponding dependencies in $O(n^2)$ time.
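The unweighted algebraic method of Section 3.1 can be traced on small graphs with the following pure-Python sketch (ours, for illustration only). It uses naive cubic matrix products instead of fast matrix multiplication and, as a simplification, reads the distance matrix $D$ off the same loop that counts shortest paths (the first $l$ with a non-zero walk count is $d(i,j)$) instead of calling Seidel's algorithm. The backward loop follows the steps of ComputeDependency: form $(D_l + \Delta_l)$ div $\Lambda$, multiply by $A$, keep only the entries at distance $l-1$ (Mask), and multiply by $\Lambda$.

```python
def mat_mul(X, Y):
    """Naive O(n^3) product of square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def algebraic_betweenness(A):
    """Betweenness of all vertices from the 0-1 adjacency matrix A of a
    connected, unweighted, undirected graph (illustrative sketch)."""
    n = len(A)
    # Forward pass: Z = A^l counts walks of length l. The first l with
    # z_ij > 0 is d(i, j), and z_ij is then lambda_ij (ComputePathCount).
    D = [[0 if i == j else None for j in range(n)] for i in range(n)]
    Lam = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    Z = [row[:] for row in Lam]  # identity matrix
    l = 0
    while any(D[i][j] is None for i in range(n) for j in range(n)):
        if l >= n - 1:
            raise ValueError("graph must be connected")
        l += 1
        Z = mat_mul(Z, A)
        for i in range(n):
            for j in range(n):
                if D[i][j] is None and Z[i][j] > 0:
                    D[i][j], Lam[i][j] = l, Z[i][j]
    diam = l
    # Backward pass (ComputeDependency): Delta[i][j] = delta_{i*}(j).
    # Entries at distance Diam(G) stay 0 (Lemma 3.3).
    Delta = [[0.0] * n for _ in range(n)]
    for l in range(diam, 1, -1):  # Delta_0 (the diagonal) is never needed
        # (D_l + Delta_l) div Lambda, nonzero only where d(i, j) = l.
        M = [[(1 + Delta[i][j]) / Lam[i][j] if D[i][j] == l else 0.0
              for j in range(n)] for i in range(n)]
        P = mat_mul(M, A)
        for i in range(n):
            for j in range(n):
                if D[i][j] == l - 1:  # Mask(., l - 1), then mult Lambda
                    Delta[i][j] = Lam[i][j] * P[i][j]
    # BC(v) = sum over s != v of delta_{s*}(v), halved for undirected graphs.
    return [sum(Delta[s][v] for s in range(n) if s != v) / 2 for v in range(n)]
```

On the 4-cycle, where each opposite pair has two shortest paths, every vertex gets betweenness 0.5; masking matters on odd cycles, where vertices at distance $l$ can be adjacent to each other.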
Theorem 3.6. The betweenness of all vertices of an undirected unweighted graph $G$ can be computed in time $O(n^\omega Diam(G))$.

3.2 Weighted Graphs
3.2.1 Forward Pass
We make use of a well-known reduction from APSP to the computation of the distance product (also known as the min-plus product) of two $n \times n$ matrices.

Definition 3.7. Distance Products: Let $X, Y$ be $n \times n$ matrices. The distance product of $X$ and $Y$, denoted $X \star Y$, is the $n \times n$ matrix $Z$ such that $z_{ij} = \min_{k=1}^{n} \{x_{ik} + y_{kj}\}$ for $1 \leq i, j \leq n$.

It is well known that the distance product of two $n \times n$ matrices whose elements are taken from the set $\{-M, \ldots, 0, \ldots, M\} \cup \{+\infty\}$ can be computed in time $O(M n^\omega)$. Combining distance products with our observations for unweighted graphs, we get the following theorem.

Theorem 3.8. All-pairs shortest distances and numbers of shortest paths for undirected weighted graphs with integer weights taken from $\{1, 2, \ldots, M\}$ can be computed in time $O(M n^\omega Diam(G))$.

The lengths of all shortest paths can also be computed using the following theorem of Alon, Galil and Margalit [1].

Theorem 3.9. [1] All-pairs shortest distances for undirected weighted graphs with integer weights taken from $\{1, 2, \ldots, M\}$ can be computed in time $\tilde{O}(M n^\omega)$.

3.2.2 Backward Pass
Let $D$, $D_l$, $\Delta$, $\Delta_l$ and $\Lambda$ be the matrices defined earlier. Let $A^*$ be the 0-1 matrix with $a^*_{ij} = 1$ iff $w(i, j) = d(i, j)$. In other words, $a^*_{ij} = 1$ iff the edge $(i, j)$ participates in shortest paths.

Theorem 3.10. ComputeDependency correctly computes the dependencies in a weighted graph with integer weights taken from $\{1, 2, \ldots, M\}$ in time $O(M n^\omega Diam(G))$.

Proof. Follows from the correctness of the algorithm for unweighted graphs.
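To make Definition 3.7 and the reduction from APSP concrete, here is a naive Python sketch of ours. It computes the distance product directly in $O(n^3)$ time, not with the $O(M n^\omega)$ method cited above, and obtains all-pairs distances by repeatedly squaring a weight matrix $W$ (zero diagonal, $w(i,j)$ on edges, $+\infty$ elsewhere); each squaring doubles the number of edges a path may use.

```python
INF = float("inf")


def distance_product(X, Y):
    """Naive O(n^3) distance (min-plus) product: z_ij = min_k (x_ik + y_kj)."""
    n = len(X)
    return [[min(X[i][k] + Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apsp_by_squaring(W):
    """All-pairs distances from a weight matrix W with W[i][i] = 0, edge
    weights on edges and INF elsewhere, by repeated distance squaring."""
    n = len(W)
    D = [row[:] for row in W]
    hops = 1  # D currently accounts for all paths with at most `hops` edges
    while hops < n - 1:
        D = distance_product(D, D)
        hops *= 2
    return D
```

With $O(\log n)$ squarings this gives APSP in $O(n^3 \log n)$ time naively; substituting a faster distance-product routine changes only `distance_product`.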
4 A Randomized Parallel Algorithm
We assume a model of parallel computation called the OR CRCW PRAM [3], in which multiple processors can simultaneously read from and write to a shared memory. If multiple processors attempt to write multiple values to a single location, the value written is the bitwise OR of the values.

The most elementary parallel SSSP algorithm is parallel breadth-first search, in which the nodes are visited level by level as the search progresses. Level 0 consists of the source. The problem with this approach is that the time required grows linearly with the number of levels traversed. To keep the time small, Ullman and Yannakakis [33] use k-limited search. The size of a path is the number of nodes in the path, and the minimum path-size is the shortest distance measured in number of nodes traversed. A k-limited shortest path from $s$ to $t$ is a path from $s$ to $t$ of size at most $k$ that is no longer than any other $s$-to-$t$ path of size at most $k$. To find k-limited shortest paths in unweighted graphs we can run $k$ iterations of parallel BFS. We call this k-limited breadth-first search.

The work required by a parallel algorithm is defined to be the product of the time and the number of processors required; this corresponds to the time that would be required if the parallel processors were all simulated by a single processor. If the work of a parallel algorithm is equal to the time required by a sequential algorithm for the same problem, then the parallel algorithm is said to be optimal.

In the following sections, we present parallel algorithms for the forward and backward passes. The forward pass consists of computing the distances and the numbers of shortest paths (between all pairs). The backward pass involves computing the dependencies.
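The k-limited breadth-first search defined earlier in this section can be simulated sequentially as below. This Python sketch is ours and assumes a hypothetical adjacency-dictionary input `adj`. Each iteration of the outer loop stands for one synchronous parallel step in which all edges leaving the current frontier are examined, so $k$ iterations reach exactly the vertices within $k$ edges of the source. It also accumulates the number of shortest paths level by level, the second quantity the forward pass needs.

```python
def k_limited_bfs(adj, source, k):
    """Simulate k synchronous rounds of parallel BFS from `source`.

    `adj` is an assumed format: dict vertex -> list of neighbours.
    Returns (dist, count) for every vertex within k edges of the source,
    where count[v] is the number of shortest source-v paths.
    """
    dist = {source: 0}
    count = {source: 1}
    frontier = [source]
    for level in range(1, k + 1):
        next_frontier = []
        # One parallel step: every edge leaving the frontier is examined.
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = level
                    count[v] = 0
                    next_frontier.append(v)
                if dist[v] == level:
                    count[v] += count[u]  # u is a predecessor of v
        if not next_frontier:
            break
        frontier = next_frontier
    return dist, count
```

On the 4-cycle 0-1-2-3-0, two rounds from vertex 0 reach vertex 2 at distance 2 with two shortest paths, while a single round does not reach it at all.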
Once the dependencies are known, the betweenness value of a vertex $v$ is simply the sum of all the dependencies on $v$. This can be done in $O(n \log n)$ time using $O(n)$ processors.

4.1 Forward Pass
4.1.1 Unweighted Graphs
Ullman and Yannakakis's algorithm [33] for parallel BFS uses k-limited search together with random sampling of distinguished vertices, based on the following well-known observation (see, e.g., Greene and Knuth [17]). Their algorithm uses about $\sqrt{n} \log n$ distinguished nodes, and therefore needs to search forward only for a distance of about $\sqrt{n}$ from each distinguished node. Our algorithm for parallelizing the forward pass is based on their technique.

Theorem 4.1. [17] Given a path of length $k$ in a graph, a random sample of $(n \log n)/k$ vertices will contain at least one vertex of the path with probability $1 - 1/n^c$.

Theorem 4.2. With high probability, Algorithm 1 correctly computes the shortest paths from the source $s$ to all the other nodes in $V$. It runs in $O(\sqrt{n})$ parallel time using $m \log n$ processors.

Proof. Given any $v \in V$, let $P_v$ be an arbitrary shortest path from $s$ to $v$. By Theorem 4.1, with high probability each subpath of $P_v$ of size $\sqrt{n}$ contains at least one node $x \in S$. Hence $P_v$ can be seen as a sequence of subpaths of size at most $\sqrt{n}$ whose endpoints belong to $S$ (except for the last node $v$). Such subpaths are computed by the $\sqrt{n}$-limited search in Step 2. Thus, the shortest path from $s$ to the last $S$-vertex $x$ in $P_v$ is correctly computed in Step 4, and the shortest path from the latter to node $v$ is correctly computed in Step 2. The $\sqrt{n}$-limited search in Step 2 can be performed in $O(\sqrt{n})$ time using $m \log n$ processors. The total work of Step 4 is $O((\sqrt{n})^3 \log n)$, and it can be done in $O(\sqrt{n})$ time using $m \log n$ processors.
Correctness of the number of shortest paths follows.

Since we need the distances and numbers of shortest paths between all pairs of vertices, we could simply run the above algorithm $n$ times, once for each source vertex. This approach duplicates many computations. Since we choose $\Theta(\sqrt{n} \log n)$ distinguished nodes, we can compute the shortest-path distances from each of these distinguished nodes (treating them as source nodes) with a single run of Algorithm 1. The following theorem states that we need to run the algorithm only $O(\sqrt{n})$ times. This results in an optimal parallel algorithm (modulo log factors) for the forward pass.

Theorem 4.3. With high probability, Algorithm 1 needs to be run only $O(\sqrt{n})$ times to compute all-pairs shortest distances and numbers of shortest paths.

Proof. Suppose we run Algorithm 1 independently $k$ times. Each time the algorithm picks $\sqrt{n} \log n$ vertices. Then the probability that a vertex $v \in V$ is not picked in any of these iterations is
$$\Pr[v \text{ not picked}] = \left(1 - \frac{\sqrt{n} \log n}{n}\right)^{k} < e^{-k \sqrt{n} \log n / n}.$$
Choosing $k = c\sqrt{n}$ for some constant $c > 0$, we get
$$\Pr[v \text{ not picked}] < e^{-c \sqrt{n} \cdot \sqrt{n} \log n / n} = e^{-c \log n} = e^{-c' \ln n} = \frac{1}{n^{c'}}.$$
Hence the probability that a given vertex $v \in V$ is not picked in any of the $O(\sqrt{n})$ iterations is inverse polynomial in $n$. By a union bound over the $n$ vertices, choosing $c$ large enough that $c' > 1$ ensures that every vertex is picked with probability at least $1 - n^{1 - c'}$.

Theorem 4.4. With high probability, we can compute the $D$ and $\Lambda$ matrices for an unweighted graph in $O(n)$ time using $O(m \log n)$ processors.

Algorithm 1:
Input: An undirected graph $G(V, E)$ and a source $s \in V$.
Output: $d(s, v)$ and $\lambda_{sv}$ for all $v \in V$.
1. Choose uniformly at random a subset $S$ of $V$, together with $s$; the size of $S$ must be $\Theta(\sqrt{n} \log n)$.
2. From every $x \in S$ perform, in parallel, a $\sqrt{n}$-limited search, generating the shortest path $P'_{x,v}$ from $x$ to every node $v \in V$.
3. An auxiliary weighted graph $H$ is computed on the vertex set $S$, where the weight of an edge is defined to be the length computed by the previous $\sqrt{n}$-limited search.
4. Compute the all-pairs shortest paths $P_{x,y}$ in $H$, with no limit on the search.
5. The shortest distance $d(s, v) = |P_v|$ from $s$ to a node $v \in V$ is computed in the following way: $P_v \equiv P_{s,\min} \cup P'_{\min,v}$, where $\min$ is a vertex in $H$ for which
$$|P_{s,\min}| + |P'_{\min,v}| = \min_{x \in H} \{|P_{s,x}| + |P'_{x,v}|\}.$$
6. The number of shortest paths $\lambda_{sv}$ can be computed by counting the number of such $\min$ nodes.

4.1.2 Weighted Graphs
Ullman and Yannakakis's approach cannot be directly applied to weighted graphs; indeed, there is no apparent way to perform the $\sqrt{n}$-limited search efficiently, especially when the weights are large. On the other hand, it is easy to verify that the remaining steps of Algorithm 1 also work for weighted graphs; thus the crucial problem is to find a weighted version of the $\sqrt{n}$-limited search. A useful method for solving optimization problems that involve numerical inputs is to uniformly shrink all weights; but this, in itself, is not sufficient, since the search relies strongly on the weights being integers. Klein and Subramanian [21] proposed a $\sqrt{n}$-limited search for weighted graphs which uses integer shrinking together with the well-established technique, due to Raghavan and Thompson [27], for rounding weights without changing their sums "too much". The key idea is that a non-integral value is rounded up or down according to a probability function which reflects how close the value is to the next higher integer and to the next lower one. By applying this approach to the basic techniques of Ullman and Yannakakis, Klein and Subramanian provided a randomized parallel algorithm for SSSP in weighted graphs.
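The rounding rule described above can be illustrated as follows. This is a generic Python sketch of randomized rounding in the spirit of Raghavan and Thompson [27], not Klein and Subramanian's search itself: a value is rounded up with probability equal to its fractional part and down otherwise, so its expected value is unchanged and, by linearity of expectation, so is the expected total weight of any path. The helper `shrink_and_round` and its `factor` parameter are hypothetical stand-ins for the uniform shrinking step.

```python
import math
import random


def randomized_round(x, rng=random):
    """Round x up with probability frac(x) and down otherwise, so E[result] == x."""
    lower = math.floor(x)
    frac = x - lower
    return lower + 1 if rng.random() < frac else lower


def shrink_and_round(weights, factor, rng=random):
    """Hypothetical helper: divide each weight by `factor`, then round each
    shrunk value independently with randomized_round."""
    return [randomized_round(w / factor, rng) for w in weights]
```

For instance, `shrink_and_round([7, 10, 13], 4, random.Random(1))` returns integers with each entry equal to the floor or ceiling of 1.75, 2.5 and 3.25 respectively; integer inputs such as 3.0 are never changed.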
Klein and Subramanian's algorithm runs in $O(\sqrt{n} \log^2 n \log M)$ time using $O(m)$ processors to compute an SSSP tree. We enhance their algorithm to compute all-pairs shortest paths (and numbers of shortest paths). The modifications needed are similar to those presented in the previous section. We state our main theorem here.

Theorem 4.5. With high probability, we can compute the $D$ and $\Lambda$ matrices for a weighted graph with integer weights taken from the range $\{1, 2, \ldots, M\}$ in $O(n \log^2 n \log M)$ time using $O(m)$ processors.

4.2 Backward Pass
4.2.1 General Graphs
After the forward pass is performed, we may assume that the matrices $D$ and $\Lambda$ are available in the shared memory. The following algorithm computes the betweenness centralities (without explicitly computing the dependencies) in $O(n^2)$ time using $O(n)$ processors. It relies on the fact that $v$ lies on a shortest $s$-$t$ path iff $d(s, t) = d(s, v) + d(v, t)$, in which case $\lambda_{st}(v) = \lambda_{sv} \lambda_{vt}$, so the added term is exactly $\delta_{st}(v)$.

Algorithm 2:
Input: $D$ and $\Lambda$ matrices.
Output: Betweenness centrality $BC(v)$ of all vertices.
Let $n$ processors represent the vertices. Each processor maintains a running centrality score $BC(v)$, initialized to zero.
For each pair of vertices $s, t \in V$, processor $v$ ($v \neq s$, $v \neq t$) does the following:
    if d(s, t) = d(s, v) + d(v, t)
        BC(v) ← BC(v) + λ_sv · λ_vt / λ_st

4.2.2 Bounded Degree Graphs
In this section we present a faster parallel algorithm for the backward pass in bounded-degree graphs. The backward pass involves computing the dependencies (i.e., computing the matrix $\Delta$). Recall the following lemma.

Lemma 3.3: If $d(i, j) = Diam(G)$, then $\delta_{i*}(j) = \delta_{j*}(i) = 0$.

Brandes's theorem (Theorem 1.1) states that the dependencies of the closer vertices can be computed from the dependencies of the farther vertices. The following algorithm (ComputeDependency) uses this fact (and a small trick) to compute the dependencies in parallel.
The main idea behind the algorithm is to compute the dependencies of pairs of vertices at distance $d$ from each other, at most $n/2$ pairs at a time, decreasing $d$ from $n$ to 1.

ComputeDependency(A, D, Λ)
    for d ← n downto 1
        Let V_d = {(u, v) : u, v ∈ V, d(u, v) = d}
        while |V_d| ≠ 0
            Select at most n/2 pairs from V_d, no two pairs having a common vertex.
            Let V′_d be the set of selected pairs.
            V_d ← V_d \ V′_d
            Δ ← ParallelCompute(A, D, Λ, V′_d)
    return Δ

Correctness: ParallelCompute computes the dependencies of at most $n/2$ pairs of vertices (each pair at distance $d$ from each other) in parallel. For each pair this can be done in $O(\log k)$ time, since it involves computing the sum of $k$ values. When there are multiple vertices at distance $d$ from a vertex $v$, the algorithm is repeated until the dependencies of all the pairs are calculated. Note that there can be at most $O(maxdeg(G))$ such nodes, where $maxdeg(G)$ is the maximum degree of any vertex in the graph. Hence ParallelCompute takes $O(maxdeg(G) \log k) = O(\log m)$ time (since we are interested in bounded-degree graphs). Since there are at most $n$ different distances and $O(n^2)$ pairs of dependencies to be computed, ParallelCompute is called at most $O(n)$ times. The constant in $O(n)$ depends on the distribution of the ($n$ possible) distances among the $O(n^2)$ pairs of vertices. Note that this gives an optimal algorithm (modulo log factors).

ParallelCompute(A, D, Λ, V′_d)
    Let m processors represent the edges.
    For each pair (u, v) ∈ V′_d (so that d(u, v) = d), do the following in parallel:
    ◦ Let w_1, w_2, ..., w_k be the vertices such that v ∈ P_u(w_i).
    ◦ The processor representing edge (v, w_i) calculates (1/λ_{u w_i}) · (1 + δ_{u∗}(w_i)).
    ◦ The k processors (representing the edges (v, w_i)) compute the sum Σ_{i=1..k} (1/λ_{u w_i}) · (1 + δ_{u∗}(w_i)).
    ◦ This sum is multiplied by λ_uv and stored in the shared memory as δ_uv.
    ◦ Compute δ_vu similarly.
    ◦ If there are multiple vertices at distance d from v, then repeat ParallelCompute for the remaining pairs of vertices.
    return Δ

Theorem 4.6. The dependencies in a bounded-degree unweighted graph can be computed in $O(n \log m)$ time using $O(m)$ processors.

For weighted graphs with integer weights taken from the range $\{1, 2, \ldots, M\}$, the distances vary from $nM$ to 1.

Theorem 4.7. The dependencies in a bounded-degree weighted graph with integer weights taken from the range $\{1, 2, \ldots, M\}$ can be computed in $O(M n \log m)$ time using $O(m)$ processors.

5 Open Problems
1. Is there an algorithm to compute (exactly or approximately) the betweenness of all (or even the top $k$) vertices in sub-cubic (or $o(mn)$) time?
2. Since the networks of interest are huge and dynamic, it is expensive to recompute betweenness after every addition or deletion of an edge. Is there a fully dynamic algorithm to maintain betweenness in $O(n^2)$ amortized time per update (edge insertion or deletion), using only $O(n^2)$ space? Here, it is crucial to observe that the betweenness centrality of all vertices can be changed by deleting (hence also by adding) a single edge. For example, let $C_{4k+1}$ be a cycle on $4k + 1$ vertices. The centrality of any vertex in $C_{4k+1}$ is $k(2k - 1)$. Removing an edge from $C_{4k+1}$ results in a path $P_{4k+1}$ on $4k + 1$ vertices. The betweenness values of the vertices of $P_{4k+1}$ are $0, 4k - 1, 2(4k - 2), \ldots, 4k^2, \ldots, 2(4k - 2), 4k - 1, 0$.
3. Betweenness centrality implicitly assumes that communications in the network use shortest paths. Shortest paths are sensitive to local changes (addition/deletion of edges).
One possible way to address this issue is to consider $\delta$-stretch paths instead of shortest paths [7]. A $\delta$-stretch path is a path from $s$ to $t$ of length at most $(1 + \delta)\, d(s, t)$. What is the complexity of computing betweenness based on $\delta$-stretch paths?

4. Our conjectures mentioned in Section 2 are open.

Acknowledgements

This project is funded by ARC (Algorithms and Randomness Center) of the College of Computing at Georgia Institute of Technology.

References

[1] N. Alon, Z. Galil, and O. Margalit. On the exponent of the all-pairs shortest path problem. J. Comput. Syst. Sci., 54:255–262, 1997.
[2] J. M. Anthonisse. The rush in a directed graph. Technical Report BN 9/71, Stichting Mathematisch Centrum, Amsterdam, 1971.
[3] H. Bast, M. Dietzfelbinger, and T. Hagerup. A perfect parallel dictionary. In 17th Symposium on Mathematical Foundations of Computer Science, 1992.
[4] U. Brandes. A faster algorithm for betweenness centrality. J. Mathematical Sociology, 25(2):163–177, 2001.
[5] U. Brandes and C. Pich. Centrality estimation in large networks. To appear in Intl. Journal of Bifurcation and Chaos, Special Issue on Complex Networks' Structure and Dynamics, 2007.
[6] N. Buckley and M. van Alstyne. Does email make white collar workers more productive? Technical report, University of Michigan, 2004.
[7] T. Carpenter, G. Karakostas, and D. Shallcross. Practical issues and algorithms for analyzing terrorist networks. Invited paper at WMC, 2002.
[8] T. M. Chan. More algorithms for all-pairs shortest paths in weighted graphs. In Proc. STOC, 2007.
[9] D. Cisic, B. Kesic, and L. Jakomin. Research of the power in the supply chain. International Trade, Economics Working Paper Archive EconWPA, April 2000.
[10] T. Coffman, S. Greenblatt, and S. Marcus. Graph-based technologies for intelligence analysis. Communications of the ACM, 47(3):45–47, 2004.
[11] D. R. Karger, D. Koller, and S. J. Phillips. Finding the hidden path: time bounds for all-pairs shortest paths. SIAM Journal on Computing, 22:1199–1217, 1993.
[12] A. del Sol, H. Fujihashi, and P. O'Meara. Topology of small-world networks of protein-protein complex structures. Bioinformatics, 21(8):1311–1315, 2005.
[13] D. Eppstein and J. Wang. Fast approximation of centrality. Journal of Graph Algorithms and Applications, 8(1):39–45, 2004.
[14] G. Erkan and D. R. Radev. LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research (JAIR), 22:457–479, 2004.
[15] M. L. Fredman. New bounds on the complexity of the shortest path problem. SIAM J. Comput., 5:49–60, 1976.
[16] L. C. Freeman. A set of measures of centrality based on betweenness. Sociometry, 40(1):35–41, 1977.
[17] D. H. Greene and D. E. Knuth. Mathematics for the Analysis of Algorithms. Birkhauser, Boston, 1982.
[18] R. Guimerà, S. Mossa, A. Turtschi, and L. A. N. Amaral. The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. 102(22):7794–7799, 2005.
[19] P. Hage and F. Harary. Eccentricity and centrality in networks. Social Networks, 17:57–63, 1995.
[20] H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411:41–42, 2001.
[21] P. N. Klein and S. Subramanian. A randomized parallel algorithm for single-source shortest-paths. In Proc. of the 24th Annual ACM-STOC, pages 750–758, 1992.
[22] V. E. Krebs. Mapping networks of terrorist cells. Connections, 24(3):43–52, 2002.
[23] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 1(1), 2007.
[24] F. Liljeros, C. R. Edling, L. A. N. Amaral, H. E. Stanley, and Y. Åberg. The web of human sexual contacts. Nature, 411:907–908, 2001.
[25] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69, 026113, 2004.
[26] J. W. Pinney, G. A. McConkey, and D. R. Westhead. Decomposition of biological networks using betweenness centrality. In Proc. 9th Ann. Int'l Conf. on Research in Computational Molecular Biology (RECOMB 2005), Cambridge, MA, May 2005. Poster session.
[27] P. Raghavan and C. D. Thompson. Provably good routing in graphs: regular arrays. In Proc. of the 17th Annual ACM-STOC, pages 79–87, 1985.
[28] A. H. Rustam. Epidemic network and centrality. Master's thesis, University of Oslo, May 2006.
[29] G. Sabidussi. The centrality index of a graph. Psychometrika, 31:581–603, 1966.
[30] R. Seidel. On the all-pairs-shortest-path problem. In Proc. of STOC, 1992.
[31] A. Shimbel. Structural parameters of communication networks. Bulletin of Mathematical Biophysics, 15:501–507, 1953.
[32] A. Shoshan and U. Zwick. All-pairs shortest paths in undirected graphs with integer weights. In Proc. of 40th FOCS, pages 605–614, 1999.
[33] J. D. Ullman and M. Yannakakis. High probability parallel transitive closure algorithms. SIAM J. of Computing, 20:100–125, 1991.
[34] U. Zwick. Exact and approximate distances in graphs - a survey. In Proc. of 9th ESA, pages 33–48, 2001.
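The dependency recurrence that ParallelCompute evaluates in Section 4 (each edge processor contributes $\frac{1}{\lambda_{uw_i}}(1 + \delta_{u*}(w_i))$, and the sum is scaled by $\lambda_{uv}$) can be illustrated with a short sequential Python sketch. It assumes an unweighted, connected, undirected graph given as an adjacency dictionary, obtains distances, shortest-path counts and predecessor sets $P_u(w)$ by breadth-first search (our stand-ins for the inputs $D$ and $\Lambda$), and then sweeps the distance $d$ from its largest value down to 1, in the same order as ComputeDependency. The function names are hypothetical; the sketch does not model the $m$ processors or the pairing of vertices into $V'_d$, and it uses exact rational arithmetic so that results can be compared with hand calculations.

```python
from collections import deque
from fractions import Fraction


def dependencies(adj):
    """Return delta[u][v], the dependency of source u on vertex v (v != u).

    adj: dict mapping each vertex of an unweighted, undirected, connected
    graph to an iterable of its neighbours (hypothetical input format).
    """
    delta = {}
    for u in adj:
        # BFS from u: dist[w] = d(u, w), lam[w] = number of shortest u-w paths,
        # preds[w] = P_u(w), the predecessors of w on shortest paths from u.
        dist, lam, preds = {u: 0}, {u: 1}, {u: []}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y], lam[y], preds[y] = dist[x] + 1, 0, []
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    lam[y] += lam[x]
                    preds[y].append(x)

        # Group the vertices by their distance from u.
        levels = {}
        for w, dw in dist.items():
            levels.setdefault(dw, []).append(w)

        # acc[v] accumulates the sum over {w : v in P_u(w)} of (1 + delta_u*(w)) / lam[w].
        acc = {w: Fraction(0) for w in dist}
        delta_u = {}
        # Sweep d from the largest distance down to 1: every w with v in P_u(w)
        # is one step farther from u than v, so it is finished before v.
        for d in sorted(levels, reverse=True):
            if d == 0:
                continue
            for w in levels[d]:
                delta_u[w] = lam[w] * acc[w]
                for v in preds[w]:
                    acc[v] += (1 + delta_u[w]) / lam[w]
        delta[u] = delta_u
    return delta


def betweenness(adj):
    """Sum delta_u*(v) over all sources u != v, then halve it so that each
    unordered pair of endpoints is counted once (undirected graph)."""
    delta = dependencies(adj)
    return {v: Fraction(sum(delta[u][v] for u in adj if u != v), 2) for v in adj}


# Usage: the path 0-1-2-3-4.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
print([int(b) for b in betweenness(path).values()])  # [0, 3, 4, 3, 0]
```

On a path the values agree with $i(n-1-i)$ for vertex $i$, and on a 4-cycle every vertex receives $1/2$, because each of the two antipodal pairs has two shortest paths.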
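The $C_{4k+1}$ versus $P_{4k+1}$ example in Open Problem 2 is easy to check by brute force for small $k$. The following Python sketch (hypothetical function names) counts betweenness over unordered pairs of endpoints: because $4k+1$ is odd, every pair of vertices on the cycle has a unique shortest path, and each interior vertex of that path receives 1 from the pair. Under this convention every vertex of $C_{4k+1}$ receives $k(2k-1)$, while the path reproduces the sequence $0, 4k-1, 2(4k-2), \ldots, 4k^2, \ldots$ stated there.

```python
def cycle_betweenness(n):
    """Betweenness of each vertex of the odd cycle C_n (vertices 0..n-1),
    counting each unordered pair {s, t} once."""
    assert n % 2 == 1, "odd n guarantees a unique shortest path per pair"
    bc = [0] * n
    for s in range(n):
        for t in range(s + 1, n):
            gap = t - s
            if gap <= n // 2:
                # Shorter arc runs s, s+1, ..., t.
                inner = range(s + 1, t)
            else:
                # Shorter arc wraps around: t, t+1, ..., n-1, 0, ..., s.
                inner = [(t + i) % n for i in range(1, n - gap)]
            for v in inner:
                bc[v] += 1
    return bc


def path_betweenness(n):
    """Betweenness of each vertex of the path P_n (vertices 0..n-1):
    vertex v is an interior vertex of the s-t path exactly when s < v < t."""
    bc = [0] * n
    for s in range(n):
        for t in range(s + 2, n):
            for v in range(s + 1, t):
                bc[v] += 1
    return bc


k = 2
n = 4 * k + 1
print(cycle_betweenness(n))  # [6, 6, 6, 6, 6, 6, 6, 6, 6]
print(path_betweenness(n))   # [0, 7, 12, 15, 16, 15, 12, 7, 0]
```

For $k = 2$ the path values 7, 12, 15 and 16 equal $4k-1$, $2(4k-2)$, $3(4k-3)$ and $4k^2$, and none of them equals the cycle value 6, so deleting one edge changes the betweenness of every vertex.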
