Graph Sparsification by Effective Resistances


Authors: Daniel A. Spielman, Nikhil Srivastava

Daniel A. Spielman, Program in Applied Mathematics and Department of Computer Science, Yale University
Nikhil Srivastava, Department of Computer Science, Yale University

November 26, 2024

Abstract

We present a nearly-linear time algorithm that produces high-quality spectral sparsifiers of weighted graphs. Given as input a weighted graph $G = (V, E, w)$ and a parameter $\epsilon > 0$, we produce a weighted subgraph $H = (V, \tilde{E}, \tilde{w})$ of $G$ such that $|\tilde{E}| = O(n \log n / \epsilon^2)$ and for all vectors $x \in \mathbb{R}^V$,

$$(1-\epsilon) \sum_{uv \in E} (x(u)-x(v))^2 w_{uv} \;\le\; \sum_{uv \in \tilde{E}} (x(u)-x(v))^2 \tilde{w}_{uv} \;\le\; (1+\epsilon) \sum_{uv \in E} (x(u)-x(v))^2 w_{uv}. \quad (1)$$

This improves upon the spectral sparsifiers constructed by Spielman and Teng, which had $O(n \log^c n)$ edges for some large constant $c$, and upon the cut sparsifiers of Benczúr and Karger, which only satisfied (1) for $x \in \{0,1\}^V$. A key ingredient in our algorithm is a subroutine of independent interest: a nearly-linear time algorithm that builds a data structure from which we can query the approximate effective resistance between any two vertices in a graph in $O(\log n)$ time.

1 Introduction

The goal of sparsification is to approximate a given graph $G$ by a sparse graph $H$ on the same set of vertices. If $H$ is close to $G$ in some appropriate metric, then $H$ can be used as a proxy for $G$ in computations without introducing too much error. At the same time, since $H$ has very few edges, computation with and storage of $H$ should be cheaper.

We study the notion of spectral sparsification introduced by Spielman and Teng [25]. Spectral sparsification was inspired by the notion of cut sparsification introduced by Benczúr and Karger [5] to accelerate cut algorithms whose running time depends on the number of edges.
They gave a nearly-linear time procedure which takes a graph $G$ on $n$ vertices with $m$ edges and a parameter $\epsilon > 0$, and outputs a weighted subgraph $H$ with $O(n \log n/\epsilon^2)$ edges such that the weight of every cut in $H$ is within a factor of $(1 \pm \epsilon)$ of its weight in $G$. This was used to turn Goldberg and Tarjan's $\tilde{O}(mn)$ max-flow algorithm [16] into an $\tilde{O}(n^2)$ algorithm for approximate st-min cut, and appeared more recently as the first step of an $\tilde{O}(n^{3/2} + m)$-time $O(\log^2 n)$ approximation algorithm for sparsest cut [19].

The cut-preserving guarantee of [5] is equivalent to satisfying (1) for all $x \in \{0,1\}^n$, which are the characteristic vectors of cuts. Spielman and Teng [23, 25] devised stronger sparsifiers which extend (1) to all $x \in \mathbb{R}^n$, but have $O(n \log^c n)$ edges for some large constant $c$. They used these sparsifiers to construct preconditioners for symmetric diagonally-dominant matrices, which led to the first nearly-linear time solvers for such systems of equations. In this work, we construct sparsifiers that achieve the same guarantee as Spielman and Teng's but with $O(n \log n/\epsilon^2)$ edges, thus improving on both [5] and [23]. Our sparsifiers are subgraphs of the original graph and can be computed in $\tilde{O}(m)$ time by random sampling, where the sampling probabilities are given by the effective resistances of the edges.

∗ This material is based upon work supported by the National Science Foundation under Grants No. CCF-0707522 and CCF-0634957. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
While this is conceptually much simpler than the recursive partitioning approach of [23], we need to solve $O(\log n)$ linear systems to compute the effective resistances quickly, and we do this using Spielman and Teng's linear equation solver.

1.1 Our Results

Our main idea is to include each edge of $G$ in the sparsifier $H$ with probability proportional to its effective resistance. The effective resistance of an edge is known to be equal to the probability that the edge appears in a random spanning tree of $G$ (see, e.g., [9] or [6]), and was proven in [7] to be proportional to the commute time between the endpoints of the edge. We show how to approximate the effective resistances of edges in $G$ quickly and prove that sampling according to these approximate values yields a good sparsifier.

To define effective resistance, identify $G = (V, E, w)$ with an electrical network on $n$ nodes in which each edge $e$ corresponds to a link of conductance $w_e$ (i.e., a resistor of resistance $1/w_e$). Then the effective resistance $R_e$ across an edge $e$ is the potential difference induced across it when a unit current is injected at one end of $e$ and extracted at the other end. Our algorithm can now be stated as follows.

$H = \textsf{Sparsify}(G, q)$:
Choose a random edge $e$ of $G$ with probability $p_e$ proportional to $w_e R_e$, and add $e$ to $H$ with weight $w_e/(q p_e)$. Take $q$ samples independently with replacement, summing weights if an edge is chosen more than once.

Recall that the Laplacian of a weighted graph is given by $L = D - A$, where $A$ is the weighted adjacency matrix $(a_{ij}) = w_{ij}$ and $D$ is the diagonal matrix $(d_{ii}) = \sum_{j \ne i} w_{ij}$ of weighted degrees. Notice that the quadratic form associated with $L$ is just $x^T L x = \sum_{uv \in E} (x(u)-x(v))^2 w_{uv}$. Let $L$ be the Laplacian of $G$ and let $\tilde{L}$ be the Laplacian of $H$.
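The sampling procedure above can be sketched in a few lines. The following is a minimal illustration and not the paper's nearly-linear time implementation: it computes exact effective resistances with a dense pseudoinverse rather than the solver-based approximation of Section 4, and all function and variable names are our own.

```python
import numpy as np

def laplacian(n, edges, weights):
    """Build L = D - A for an undirected weighted graph."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

def sparsify(n, edges, weights, q, rng):
    """Sample q edges with replacement, p_e proportional to w_e * R_e."""
    L = laplacian(n, edges, weights)
    Lplus = np.linalg.pinv(L)
    # Effective resistance of (u, v) is (chi_u - chi_v)^T L^+ (chi_u - chi_v).
    R = np.array([Lplus[u, u] + Lplus[v, v] - 2 * Lplus[u, v] for u, v in edges])
    p = weights * R / np.sum(weights * R)
    counts = rng.multinomial(q, p)
    # Edge e receives weight (count_e / (q p_e)) * w_e; unsampled edges drop out.
    new_w = counts * weights / (q * p)
    kept = counts > 0
    return [e for e, k in zip(edges, kept) if k], new_w[kept]

# Toy example: a 5-cycle plus a chord, unit weights.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
weights = np.ones(len(edges))
rng = np.random.default_rng(0)
H_edges, H_weights = sparsify(5, edges, weights, q=200, rng=rng)
```

Note that $H$ is always a reweighted subgraph of $G$: the sampled edge set is a subset of $E$, and the output weights are the original weights rescaled by the sampling counts.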
Our main theorem is that if $q$ is sufficiently large, then the quadratic forms of $L$ and $\tilde{L}$ are close.

Theorem 1. Suppose $G$ and $H = \textsf{Sparsify}(G, q)$ have Laplacians $L$ and $\tilde{L}$ respectively, and $1/\sqrt{n} < \epsilon \le 1$. If $q = 9C^2 n \log n / \epsilon^2$, where $C$ is the constant in Lemma 5, and if $n$ is sufficiently large, then with probability at least $1/2$,

$$\forall x \in \mathbb{R}^n \quad (1-\epsilon)\, x^T L x \;\le\; x^T \tilde{L} x \;\le\; (1+\epsilon)\, x^T L x. \quad (2)$$

Sparsifiers that satisfy this condition preserve many properties of the graph. The Courant-Fischer Theorem tells us that

$$\lambda_k = \max_{S : \dim(S) = k}\; \min_{x \in S} \frac{x^T L x}{x^T x}.$$

Thus, if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $L$ and $\tilde{\lambda}_1, \ldots, \tilde{\lambda}_n$ are the eigenvalues of $\tilde{L}$, then we have $(1-\epsilon)\lambda_i \le \tilde{\lambda}_i \le (1+\epsilon)\lambda_i$, and the corresponding eigenspaces are related. As the eigenvalues of the normalized Laplacian are given by

$$\lambda_k = \max_{S : \dim(S) = k}\; \min_{x \in S} \frac{x^T D^{-1/2} L D^{-1/2} x}{x^T x},$$

and are the same as the eigenvalues of the walk matrix $D^{-1}L$, we obtain the same relationship between the eigenvalues of the walk matrix of the original graph and its sparsifier. Many properties of graphs and random walks are known to be revealed by their spectra (see for example [6, 8, 15]). The existence of sparse subgraphs which retain these properties is interesting in its own right; indeed, expander graphs can be viewed as constant-degree sparsifiers for the complete graph.

We remark that the condition (2) also implies

$$\forall x \in \mathbb{R}^n \quad \frac{1}{1+\epsilon}\, x^T L^+ x \;\le\; x^T \tilde{L}^+ x \;\le\; \frac{1}{1-\epsilon}\, x^T L^+ x,$$

where $L^+$ is the pseudoinverse of $L$. Thus sparsifiers also approximately preserve the effective resistances between vertices, since for vertices $u$ and $v$, the effective resistance between them is given by the formula $(\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v)$, where $\chi_u$ is the elementary unit vector with a coordinate 1 in position $u$. We prove Theorem 1 in Section 3.
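As a small numerical illustration of this consequence (our own toy check, not part of the paper's argument): if each edge weight of $G$ is rescaled by a factor in $[1-\epsilon, 1+\epsilon]$, then (2) holds term by term in the quadratic form, and by Courant-Fischer the sorted eigenvalues of the two Laplacians must agree to within the same factors.

```python
import numpy as np

def laplacian_from_weights(n, edges, weights):
    # L = D - A for an undirected weighted graph.
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
w = np.ones(len(edges))
eps = 0.3
rng = np.random.default_rng(1)
# Perturbing each edge weight by a factor in [1 - eps, 1 + eps] guarantees
# (1 - eps) x^T L x <= x^T L~ x <= (1 + eps) x^T L x for every x.
w_tilde = w * rng.uniform(1 - eps, 1 + eps, size=len(edges))
L = laplacian_from_weights(n, edges, w)
L_tilde = laplacian_from_weights(n, edges, w_tilde)
lam = np.linalg.eigvalsh(L)            # ascending order
lam_tilde = np.linalg.eigvalsh(L_tilde)
```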
At the end of Section 3, we prove that the spectral guarantee (2) of Theorem 1 is not harmed too much if we use approximate effective resistances for sampling instead of exact ones (Corollary 6). In Section 4, we show how to compute approximate effective resistances in nearly-linear time, which is essentially optimal. The tools we use to do this are Spielman and Teng's nearly-linear time solver [23, 24] and the Johnson-Lindenstrauss Lemma [18, 1]. Specifically, we prove the following theorem, in which $R_{uv}$ denotes the effective resistance between vertices $u$ and $v$.

Theorem 2. There is an $\tilde{O}(m (\log r)/\epsilon^2)$ time algorithm which on input $\epsilon > 0$ and $G = (V, E, w)$ with $r = w_{\max}/w_{\min}$ computes a $(24 \log n/\epsilon^2) \times n$ matrix $\tilde{Z}$ such that with probability at least $1 - 1/n$,

$$(1-\epsilon) R_{uv} \;\le\; \|\tilde{Z}(\chi_u - \chi_v)\|^2 \;\le\; (1+\epsilon) R_{uv}$$

for every pair of vertices $u, v \in V$.

Since $\tilde{Z}(\chi_u - \chi_v)$ is simply the difference of the corresponding two columns of $\tilde{Z}$, we can query the approximate effective resistance between any pair of vertices $(u, v)$ in time $O(\log n/\epsilon^2)$, and for all the edges in time $O(m \log n/\epsilon^2)$. By Corollary 6, this yields an $\tilde{O}(m(\log r)/\epsilon^2)$ time algorithm for sparsifying graphs, as advertised. In Section 5, we show that $H$ can be made close to $G$ in some additional ways which make it more useful for preconditioning systems of linear equations.

1.2 Related Work

Batson, Spielman, and Srivastava [4] have given a deterministic algorithm that constructs sparsifiers of size $O(n/\epsilon^2)$ in $O(mn^3/\epsilon^2)$ time. While this is too slow to be useful in applications, it is optimal in terms of the tradeoff between sparsity and quality of approximation and can be viewed as generalizing expander graphs.
Their construction parallels ours in that it reduces the task of spectral sparsification to approximating the matrix $\Pi$ defined in Section 3; however, their method for selecting edges is iterative and more delicate than the random sampling described in this paper.

In addition to the graph sparsifiers of [5, 4, 23], there is a large body of work on sparse [3, 2] and low-rank [14, 2, 22, 10, 11] approximations for general matrices. The algorithms in this literature provide guarantees of the form $\|A - \tilde{A}\|_2 \le \epsilon$, where $A$ is the original matrix and $\tilde{A}$ is obtained by entrywise or columnwise sampling of $A$. This is analogous to satisfying (1) only for vectors $x$ in the span of the dominant eigenvectors of $A$; thus, if we were to use these sparsifiers on graphs, they would only preserve the large cuts. Interestingly, our proof uses some of the same machinery as the low-rank approximation result of Rudelson and Vershynin [22]: the sampling of edges in our algorithm corresponds to picking $q = O(n \log n)$ columns at random from a certain rank-$(n-1)$ matrix of dimension $m \times m$ (this is the matrix $\Pi$ introduced in Section 3).

The use of effective resistance as a distance in graphs has recently gained attention, as it is often more useful than the ordinary geodesic distance in a graph. For example, in small-world graphs, all vertices will be close to one another, but those with a smaller effective resistance distance are connected by more short paths. See, for instance, [13, 12], which use effective resistance/commute time as a distance measure in social network graphs.

2 Preliminaries

2.1 The Incidence Matrix and the Laplacian

Let $G = (V, E, w)$ be a connected weighted undirected graph with $n$ vertices, $m$ edges, and edge weights $w_e > 0$.
If we orient the edges of $G$ arbitrarily, we can write its Laplacian as $L = B^T W B$, where $B_{m \times n}$ is the signed edge-vertex incidence matrix, given by

$$B(e, v) = \begin{cases} 1 & \text{if } v \text{ is } e\text{'s head} \\ -1 & \text{if } v \text{ is } e\text{'s tail} \\ 0 & \text{otherwise,} \end{cases}$$

and $W_{m \times m}$ is the diagonal matrix with $W(e,e) = w_e$. Denote the row vectors of $B$ by $\{b_e\}_{e \in E}$ and the span of its columns by $\mathcal{B} = \mathrm{im}(B) \subseteq \mathbb{R}^m$ (also called the cut space of $G$ [15]). Note that $b_{(u,v)}^T = (\chi_v - \chi_u)$.

It is immediate that $L$ is positive semidefinite, since

$$x^T L x = x^T B^T W B x = \|W^{1/2} B x\|_2^2 \ge 0$$

for every $x \in \mathbb{R}^n$. We also have $\ker(L) = \ker(W^{1/2} B) = \mathrm{span}(\mathbf{1})$, since

$$x^T L x = 0 \iff \|W^{1/2} B x\|_2^2 = 0 \iff \sum_{uv \in E} w_{uv}(x(u) - x(v))^2 = 0 \iff x(u) - x(v) = 0 \text{ for all edges } (u,v) \iff x \text{ is constant},$$

since $G$ is connected.

2.2 The Pseudoinverse

Since $L$ is symmetric, we can diagonalize it and write

$$L = \sum_{i=1}^{n-1} \lambda_i u_i u_i^T,$$

where $\lambda_1, \ldots, \lambda_{n-1}$ are the nonzero eigenvalues of $L$ and $u_1, \ldots, u_{n-1}$ are a corresponding set of orthonormal eigenvectors. The Moore-Penrose pseudoinverse of $L$ is then defined as

$$L^+ = \sum_{i=1}^{n-1} \frac{1}{\lambda_i} u_i u_i^T.$$

Notice that $\ker(L) = \ker(L^+)$ and that

$$L L^+ = L^+ L = \sum_{i=1}^{n-1} u_i u_i^T,$$

which is simply the projection onto the span of the nonzero eigenvectors of $L$ (which are also the eigenvectors of $L^+$). Thus, $LL^+ = L^+L$ is the identity on $\mathrm{im}(L) = \ker(L)^\perp = \mathrm{span}(\mathbf{1})^\perp$. We will rely on this fact heavily in the proof of Theorem 1.

2.3 Electrical Flows

Begin by arbitrarily orienting the edges of $G$ as in Section 2.1. We will use the same notation as [17] to describe electrical flows on graphs: for a vector $i_{\mathrm{ext}}(u)$ of currents injected at the vertices, let $i(e)$ be the currents induced in the edges (in the direction of orientation) and $v(u)$ the potentials induced at the vertices.
By Kirchhoff's current law, the sum of the currents entering a vertex is equal to the amount injected at the vertex:

$$B^T i = i_{\mathrm{ext}}.$$

By Ohm's law, the current flow in an edge is equal to the potential difference across its ends times its conductance:

$$i = W B v.$$

Combining these two facts, we obtain $i_{\mathrm{ext}} = B^T(WBv) = Lv$. If $i_{\mathrm{ext}} \perp \mathrm{span}(\mathbf{1}) = \ker(L)$, that is, if the total amount of current injected is equal to the total amount extracted, then we can write

$$v = L^+ i_{\mathrm{ext}}$$

by the definition of $L^+$ in Section 2.2.

Recall that the effective resistance between two vertices $u$ and $v$ is defined as the potential difference induced between them when a unit current is injected at one and extracted at the other. We will derive an algebraic expression for the effective resistance in terms of $L^+$. To inject and extract a unit current across the endpoints of an edge $e = (u,v)$, we set $i_{\mathrm{ext}} = b_e^T = (\chi_v - \chi_u)$, which is clearly orthogonal to $\mathbf{1}$. The potentials induced by $i_{\mathrm{ext}}$ at the vertices are given by $v = L^+ b_e^T$; to measure the potential difference across $e = (u,v)$, we simply multiply by $b_e$ on the left:

$$v(v) - v(u) = (\chi_v - \chi_u)^T v = b_e L^+ b_e^T.$$

It follows that the effective resistance across $e$ is given by $b_e L^+ b_e^T$, and that the matrix $B L^+ B^T$ has as its diagonal entries $B L^+ B^T(e,e) = R_e$.

3 The Main Result

We will prove Theorem 1. Consider the matrix

$$\Pi = W^{1/2} B L^+ B^T W^{1/2}.$$

Since we know $BL^+B^T(e,e) = R_e$, the diagonal entries of $\Pi$ are

$$\Pi(e,e) = \sqrt{W(e,e)}\; R_e\; \sqrt{W(e,e)} = w_e R_e.$$

$\Pi$ has some notable properties.

Lemma 3 (Projection Matrix).
(i) $\Pi$ is a projection matrix.
(ii) $\mathrm{im}(\Pi) = \mathrm{im}(W^{1/2} B) = W^{1/2}\mathcal{B}$.
(iii) The eigenvalues of $\Pi$ are 1 with multiplicity $n-1$ and 0 with multiplicity $m-n+1$.
(iv) $\Pi(e,e) = \|\Pi(\cdot,e)\|^2$.

Proof.
To see (i), observe that

$$\begin{aligned}
\Pi^2 &= (W^{1/2} B L^+ B^T W^{1/2})(W^{1/2} B L^+ B^T W^{1/2}) \\
&= W^{1/2} B L^+ (B^T W B) L^+ B^T W^{1/2} \\
&= W^{1/2} B L^+ L L^+ B^T W^{1/2} && \text{since } L = B^T W B \\
&= W^{1/2} B L^+ B^T W^{1/2} && \text{since } L^+ L \text{ is the identity on } \mathrm{im}(L^+) \\
&= \Pi.
\end{aligned}$$

For (ii), we have $\mathrm{im}(\Pi) = \mathrm{im}(W^{1/2} B L^+ B^T W^{1/2}) \subseteq \mathrm{im}(W^{1/2} B)$. To see the other inclusion, assume $y \in \mathrm{im}(W^{1/2} B)$. Then we can choose $x \perp \ker(W^{1/2} B) = \ker(L)$ such that $W^{1/2} B x = y$. But now

$$\begin{aligned}
\Pi y &= W^{1/2} B L^+ B^T W^{1/2} W^{1/2} B x \\
&= W^{1/2} B L^+ L x && \text{since } B^T W B = L \\
&= W^{1/2} B x && \text{since } L^+ L x = x \text{ for } x \perp \ker(L) \\
&= y.
\end{aligned}$$

Thus $y \in \mathrm{im}(\Pi)$, as desired.

For (iii), recall from Section 2.1 that $\dim(\ker(W^{1/2} B)) = 1$. Consequently, $\dim(\mathrm{im}(\Pi)) = \dim(\mathrm{im}(W^{1/2} B)) = n - 1$. But since $\Pi^2 = \Pi$, the eigenvalues of $\Pi$ are all 0 or 1, and as $\Pi$ projects onto a space of dimension $n-1$, it must have exactly $n-1$ nonzero eigenvalues.

(iv) follows from $\Pi^2(e,e) = \Pi(\cdot,e)^T \Pi(\cdot,e)$, since $\Pi$ is symmetric.

To show that $H = (V, \tilde{E}, \tilde{w})$ is a good sparsifier for $G$, we need to show that the quadratic forms $x^T L x$ and $x^T \tilde{L} x$ are close. We start by reducing the problem of preserving $x^T L x$ to that of preserving $y^T \Pi y$. This will be much nicer since the eigenvalues of $\Pi$ are all 0 or 1, so that any matrix $\tilde{\Pi}$ which approximates $\Pi$ in the spectral norm (i.e., makes $\|\tilde{\Pi} - \Pi\|_2$ small) also preserves its quadratic form.

We may describe the outcome of $H = \textsf{Sparsify}(G, q)$ by the following random matrix:

$$S(e,e) = \frac{\tilde{w}_e}{w_e} = \frac{(\#\text{ of times } e \text{ is sampled})}{q\, p_e}. \quad (3)$$

$S_{m \times m}$ is a nonnegative diagonal matrix, and the random entry $S(e,e)$ specifies the 'amount' of edge $e$ included in $H$ by Sparsify. For example, $S(e,e) = 1/(q p_e)$ if $e$ is sampled once, $2/(q p_e)$ if it is sampled twice, and zero if it is not sampled at all.
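The random matrix $S$ is easy to simulate. The following toy check (our own construction, with a uniform sampling distribution for concreteness) draws the $q$ samples, forms the diagonal of $S$ as in (3), and confirms both analytically and empirically that $S$ has expectation equal to the identity.

```python
import numpy as np

rng = np.random.default_rng(2)
m, q = 8, 500
p = np.full(m, 1.0 / m)                # any fixed sampling distribution works
counts = rng.multinomial(q, p)         # q independent samples with replacement
S_diag = counts / (q * p)              # S(e,e) = (# of times e sampled) / (q p_e)

# Analytic unbiasedness: E[counts_e] = q p_e, hence E[S(e,e)] = 1 for every edge.
expected_S_diag = (q * p) / (q * p)

# Empirical check: averaging many independent draws of S approaches the identity.
mean_S_diag = rng.multinomial(q, p, size=2000).mean(axis=0) / (q * p)
```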
The weight of $e$ in $H$ is now given by $\tilde{w}_e = S(e,e) w_e$, and we can write the Laplacian of $H$ as

$$\tilde{L} = B^T \tilde{W} B = B^T W^{1/2} S W^{1/2} B,$$

since $\tilde{W} = W S = W^{1/2} S W^{1/2}$. The scaling of weights by $1/(q p_e)$ in Sparsify implies that $\mathbb{E}\,\tilde{w}_e = w_e$ (since $q$ independent samples are taken, each choosing $e$ with probability $p_e$), and thus $\mathbb{E}\,S = I$ and $\mathbb{E}\,\tilde{L} = L$.

We can now prove the following lemma, which says that if $S$ does not distort $y^T \Pi y$ too much, then $x^T L x$ and $x^T \tilde{L} x$ are close.

Lemma 4. Suppose $S$ is a nonnegative diagonal matrix such that $\|\Pi S \Pi - \Pi\Pi\|_2 \le \epsilon$. Then

$$\forall x \in \mathbb{R}^n \quad (1-\epsilon)\, x^T L x \;\le\; x^T \tilde{L} x \;\le\; (1+\epsilon)\, x^T L x,$$

where $L = B^T W B$ and $\tilde{L} = B^T W^{1/2} S W^{1/2} B$.

Proof. The assumption is equivalent to

$$\sup_{y \in \mathbb{R}^m,\, y \ne 0} \frac{|y^T \Pi (S - I) \Pi y|}{y^T y} \le \epsilon,$$

since $\|A\|_2 = \sup_{y \ne 0} |y^T A y| / y^T y$ for symmetric $A$. Restricting our attention to vectors in $\mathrm{im}(W^{1/2} B)$, we have

$$\sup_{y \in \mathrm{im}(W^{1/2} B),\, y \ne 0} \frac{|y^T \Pi (S - I) \Pi y|}{y^T y} \le \epsilon.$$

But by Lemma 3.(ii), $\Pi$ is the identity on $\mathrm{im}(W^{1/2} B)$, so $\Pi y = y$ for all $y \in \mathrm{im}(W^{1/2} B)$. Also, every such $y$ can be written as $y = W^{1/2} B x$ for $x \in \mathbb{R}^n$. Substituting this into the above expression, we obtain:

$$\begin{aligned}
\sup_{y \in \mathrm{im}(W^{1/2}B),\, y \ne 0} \frac{|y^T \Pi(S-I)\Pi y|}{y^T y}
&= \sup_{y \in \mathrm{im}(W^{1/2}B),\, y \ne 0} \frac{|y^T (S-I) y|}{y^T y} \\
&= \sup_{x \in \mathbb{R}^n,\, W^{1/2}Bx \ne 0} \frac{|x^T B^T W^{1/2} S W^{1/2} B x - x^T B^T W B x|}{x^T B^T W B x} \\
&= \sup_{x \in \mathbb{R}^n,\, W^{1/2}Bx \ne 0} \frac{|x^T \tilde{L} x - x^T L x|}{x^T L x} \;\le\; \epsilon.
\end{aligned}$$

Rearranging yields the desired conclusion for all $x \notin \ker(W^{1/2}B)$. When $x \in \ker(W^{1/2}B)$, then $x^T L x = x^T \tilde{L} x = 0$ and the claim holds trivially.

To show that $\|\Pi S \Pi - \Pi\Pi\|_2$ is likely to be small, we use the following concentration result, which is a sort of law of large numbers for symmetric rank-1 matrices.
It was first proven by Rudelson in [21], but the version we state here appears in the more recent paper [22] by Rudelson and Vershynin.

Lemma 5 (Rudelson & Vershynin, [22] Thm. 3.1). Let $p$ be a probability distribution over $\Omega \subseteq \mathbb{R}^d$ such that $\sup_{y \in \Omega} \|y\|_2 \le M$ and $\|\mathbb{E}_p\, y y^T\|_2 \le 1$. Let $y_1, \ldots, y_q$ be independent samples drawn from $p$. Then

$$\mathbb{E}\, \left\| \frac{1}{q} \sum_{i=1}^q y_i y_i^T - \mathbb{E}\, y y^T \right\|_2 \;\le\; \min\left( C M \sqrt{\frac{\log q}{q}},\; 1 \right),$$

where $C$ is an absolute constant.

We can now finish the proof of Theorem 1.

Proof of Theorem 1. Sparsify samples edges from $G$ independently with replacement, with probabilities $p_e$ proportional to $w_e R_e$. Since $\sum_e w_e R_e = \mathrm{tr}(\Pi) = n - 1$ by Lemma 3.(iii), the actual probability distribution over $E$ is given by

$$p_e = \frac{w_e R_e}{n-1}.$$

Sampling $q$ edges from $G$ corresponds to sampling $q$ columns from $\Pi$, so we can write

$$\begin{aligned}
\Pi S \Pi &= \sum_e S(e,e)\, \Pi(\cdot,e)\, \Pi(\cdot,e)^T \\
&= \sum_e \frac{(\#\text{ of times } e \text{ is sampled})}{q\, p_e}\, \Pi(\cdot,e)\, \Pi(\cdot,e)^T && \text{by (3)} \\
&= \frac{1}{q} \sum_e (\#\text{ of times } e \text{ is sampled})\, \frac{\Pi(\cdot,e)}{\sqrt{p_e}} \frac{\Pi(\cdot,e)^T}{\sqrt{p_e}} \\
&= \frac{1}{q} \sum_{i=1}^q y_i y_i^T
\end{aligned}$$

for vectors $y_1, \ldots, y_q$ drawn independently with replacement from the distribution

$$y = \frac{1}{\sqrt{p_e}}\, \Pi(\cdot,e) \quad \text{with probability } p_e.$$

We can now apply Lemma 5. The expectation of $y y^T$ is given by

$$\mathbb{E}\, y y^T = \sum_e p_e \frac{1}{p_e} \Pi(\cdot,e) \Pi(\cdot,e)^T = \Pi\Pi = \Pi,$$

so $\|\mathbb{E}\, y y^T\|_2 = \|\Pi\|_2 = 1$. We also have a bound on the norm of $y$:

$$\frac{1}{\sqrt{p_e}} \|\Pi(\cdot,e)\|_2 = \frac{1}{\sqrt{p_e}} \sqrt{\Pi(e,e)} = \sqrt{\frac{n-1}{R_e w_e}} \sqrt{R_e w_e} = \sqrt{n-1}.$$

Taking $q = 9 C^2 n \log n / \epsilon^2$ gives:

$$\mathbb{E}\, \|\Pi S \Pi - \Pi\Pi\|_2 = \mathbb{E}\, \left\| \frac{1}{q} \sum_{i=1}^q y_i y_i^T - \mathbb{E}\, y y^T \right\|_2 \;\le\; C \sqrt{\frac{\epsilon^2 \log(9 C^2 n \log n / \epsilon^2)\,(n-1)}{9 C^2 n \log n}} \;\le\; \epsilon/2,$$

for $n$ sufficiently large, as $\epsilon$ is assumed to be at least $1/\sqrt{n}$. By Markov's inequality, we have $\|\Pi S \Pi - \Pi\|_2 \le \epsilon$ with probability at least $1/2$. By Lemma 4, this completes the proof of the theorem.
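The properties of $\Pi$ used in this proof (idempotence, the 0/1 spectrum, $\mathrm{tr}(\Pi) = n-1$, and $\Pi(e,e) = w_e R_e$) are easy to verify numerically on a small graph. The sketch below is our own, using a dense pseudoinverse purely for illustration.

```python
import numpy as np

# Small weighted graph: a 4-cycle with a chord.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
w = np.array([1.0, 2.0, 1.0, 3.0, 0.5])
m = len(edges)

# Signed edge-vertex incidence matrix B (rows indexed by edges).
B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0

W = np.diag(w)
L = B.T @ W @ B
Lplus = np.linalg.pinv(L)
Pi = np.sqrt(W) @ B @ Lplus @ B.T @ np.sqrt(W)

# Effective resistances from the pseudoinverse.
R = np.array([Lplus[u, u] + Lplus[v, v] - 2 * Lplus[u, v] for u, v in edges])
eigs = np.sort(np.linalg.eigvalsh(Pi))
```

Here $n = 4$ and $m = 5$, so the spectrum of $\Pi$ should be three 1's and two 0's, and its trace should equal $n - 1 = 3$ (the fact $\sum_e w_e R_e = n-1$ used to normalize $p_e$ above).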
We now show that using approximate resistances for sampling does not damage the sparsifier very much.

Corollary 6. Suppose $Z_e$ are numbers satisfying $Z_e \ge R_e / \alpha$ and $\sum_e w_e Z_e \le \alpha \sum_e w_e R_e$ for some $\alpha \ge 1$. If we sample as in Sparsify but take each edge with probability

$$p'_e = \frac{w_e Z_e}{\sum_e w_e Z_e} \quad \text{instead of} \quad p_e = \frac{w_e R_e}{\sum_e w_e R_e},$$

then $H$ satisfies

$$(1 - \epsilon\alpha)\, x^T \tilde{L} x \;\le\; x^T L x \;\le\; (1 + \epsilon\alpha)\, x^T \tilde{L} x \quad \forall x \in \mathbb{R}^n,$$

with probability at least $1/2$.

Proof. We note that

$$p'_e = \frac{w_e Z_e}{\sum_e w_e Z_e} \ge \frac{w_e (R_e/\alpha)}{\alpha \sum_e w_e R_e} = \frac{p_e}{\alpha^2}$$

and proceed as in the proof of Theorem 1. The norm of the random vector $y$ is now bounded by

$$\frac{1}{\sqrt{p'_e}} \|\Pi(\cdot,e)\|_2 \le \frac{\alpha}{\sqrt{p_e}} \sqrt{\Pi(e,e)} = \alpha \sqrt{n-1},$$

which introduces a factor of $\alpha$ into the final bound on the expectation, but changes nothing else.

4 Computing Approximate Resistances Quickly

It is not clear how to compute all the effective resistances $\{R_e\}$ exactly and efficiently. In this section, we show that one can compute constant factor approximations to all the $R_e$ in time $\tilde{O}(m \log r)$. In fact, we do something stronger: we build an $O(\log n) \times n$ matrix $\tilde{Z}$ from which the effective resistance between any two vertices (including vertices not connected by an edge) can be computed in $O(\log n)$ time.

Proof of Theorem 2. If $u$ and $v$ are vertices in $G$, then the effective resistance between $u$ and $v$ can be written as:

$$\begin{aligned}
R_{uv} &= (\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v) \\
&= (\chi_u - \chi_v)^T L^+ L L^+ (\chi_u - \chi_v) \\
&= ((\chi_u - \chi_v)^T L^+ B^T W^{1/2})(W^{1/2} B L^+ (\chi_u - \chi_v)) \\
&= \|W^{1/2} B L^+ (\chi_u - \chi_v)\|_2^2.
\end{aligned}$$

Thus effective resistances are just pairwise distances between the vectors $\{W^{1/2} B L^+ \chi_v\}_{v \in V}$. By the Johnson-Lindenstrauss Lemma, these distances are preserved if we project the vectors onto a subspace spanned by $O(\log n)$ random vectors.
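This identity, and the random projection that follows, can be sketched with dense linear algebra. The illustration below is our own: the paper replaces the exact $L^+$ applications with STSolve calls, and we use a far larger $k$ than the $24\log n/\epsilon^2$ of Lemma 7 so that the toy check is stable.

```python
import numpy as np

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
w = np.ones(len(edges))
m = len(edges)

B = np.zeros((m, n))
for i, (u, v) in enumerate(edges):
    B[i, u], B[i, v] = 1.0, -1.0
W = np.diag(w)
L = B.T @ W @ B
Lplus = np.linalg.pinv(L)

# Exact embedding: column v is W^{1/2} B L^+ chi_v; squared distances are R_uv.
X = np.sqrt(W) @ B @ Lplus                      # m x n

def R_exact(u, v):
    return Lplus[u, u] + Lplus[v, v] - 2 * Lplus[u, v]

# Johnson-Lindenstrauss projection: random +-1/sqrt(k) matrix with k rows.
k = 400
rng = np.random.default_rng(3)
Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
Z = Q @ X                                       # k x n sketch of the embedding

pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
exact = np.array([R_exact(u, v) for u, v in pairs])
approx = np.array([np.sum((Z[:, u] - Z[:, v]) ** 2) for u, v in pairs])
```

Querying a pair $(u, v)$ only touches two columns of $Z$, which is what makes the $O(\log n/\epsilon^2)$ per-query time in Theorem 2 possible once $\tilde{Z}$ has been built.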
For concreteness, we use the following version of the Johnson-Lindenstrauss Lemma due to Achlioptas [1].

Lemma 7. Given fixed vectors $v_1, \ldots, v_n \in \mathbb{R}^d$ and $\epsilon > 0$, let $Q_{k \times d}$ be a random $\pm 1/\sqrt{k}$ matrix (i.e., independent Bernoulli entries) with $k \ge 24 \log n / \epsilon^2$. Then with probability at least $1 - 1/n$,

$$(1-\epsilon)\|v_i - v_j\|_2^2 \;\le\; \|Q v_i - Q v_j\|_2^2 \;\le\; (1+\epsilon)\|v_i - v_j\|_2^2$$

for all pairs $i, j \le n$.

Our goal is now to compute the projections $\{Q W^{1/2} B L^+ \chi_v\}$. We will exploit the linear system solver of Spielman and Teng [23, 24], which we recall satisfies:

Theorem 8 (Spielman-Teng). There is an algorithm $x = \textsf{STSolve}(L, y, \delta)$ which takes a Laplacian matrix $L$, a column vector $y$, and an error parameter $\delta > 0$, and returns a column vector $x$ satisfying

$$\|x - L^+ y\|_L \le \delta \|L^+ y\|_L,$$

where $\|y\|_L = \sqrt{y^T L y}$. The algorithm runs in expected time $\tilde{O}(m \log(1/\delta))$, where $m$ is the number of non-zero entries in $L$.

Let $Z = Q W^{1/2} B L^+$. We will compute an approximation $\tilde{Z}$ by using STSolve to approximately compute the rows of $Z$. Let the column vectors $z_i$ and $\tilde{z}_i$ denote the $i$th rows of $Z$ and $\tilde{Z}$, respectively (so that $z_i$ is the $i$th column of $Z^T$). Now we can construct the matrix $\tilde{Z}$ in the following three steps.

1. Let $Q$ be a random $\pm 1/\sqrt{k}$ matrix of dimension $k \times m$, where $k = 24 \log n/\epsilon^2$.

2. Compute $Y = Q W^{1/2} B$. Note that this takes $2m \times 24 \log n/\epsilon^2 + m = \tilde{O}(m/\epsilon^2)$ time, since $B$ has $2m$ entries and $W^{1/2}$ is diagonal.

3. Let $y_i$, for $1 \le i \le k$, denote the rows of $Y$, and compute $\tilde{z}_i = \textsf{STSolve}(L, y_i, \delta)$ for each $i$.

We now prove that, for our purposes, it suffices to call STSolve with

$$\delta \le \frac{\epsilon}{3} \sqrt{\frac{2(1-\epsilon)\, w_{\min}}{(1+\epsilon)\, n^3\, w_{\max}}}.$$

Lemma 9. Suppose

$$(1-\epsilon) R_{uv} \;\le\; \|Z(\chi_u - \chi_v)\|^2 \;\le\; (1+\epsilon) R_{uv}$$

for every pair $u, v \in V$.
If for all $i$,

$$\|z_i - \tilde{z}_i\|_L \le \delta \|z_i\|_L, \quad (4)$$

where

$$\delta \le \frac{\epsilon}{3} \sqrt{\frac{2(1-\epsilon)\, w_{\min}}{(1+\epsilon)\, n^3\, w_{\max}}}, \quad (5)$$

then

$$(1-\epsilon)^2 R_{uv} \;\le\; \|\tilde{Z}(\chi_u - \chi_v)\|^2 \;\le\; (1+\epsilon)^2 R_{uv}$$

for every $u, v$.

Proof. Consider an arbitrary pair of vertices $u, v$. It suffices to show that

$$\left|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\right| \le \frac{\epsilon}{3} \|Z(\chi_u - \chi_v)\|, \quad (6)$$

since this will imply

$$\left|\, \|Z(\chi_u - \chi_v)\|^2 - \|\tilde{Z}(\chi_u - \chi_v)\|^2 \,\right| = \left|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\right| \cdot \left|\, \|Z(\chi_u - \chi_v)\| + \|\tilde{Z}(\chi_u - \chi_v)\| \,\right| \le \frac{\epsilon}{3}\left(2 + \frac{\epsilon}{3}\right) \|Z(\chi_u - \chi_v)\|^2.$$

As $G$ is connected, there is a simple path $P$ connecting $u$ to $v$. Applying the triangle inequality twice, we obtain

$$\left|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\right| \le \|(Z - \tilde{Z})(\chi_u - \chi_v)\| \le \sum_{ab \in P} \|(Z - \tilde{Z})(\chi_a - \chi_b)\|.$$

We will upper bound this latter term by considering its square:

$$\begin{aligned}
\left( \sum_{ab \in P} \|(Z - \tilde{Z})(\chi_a - \chi_b)\| \right)^2
&\le n \sum_{ab \in P} \|(Z - \tilde{Z})(\chi_a - \chi_b)\|^2 && \text{by Cauchy-Schwarz} \\
&\le n \sum_{ab \in E} \|(Z - \tilde{Z})(\chi_a - \chi_b)\|^2 \\
&= n\, \|(Z - \tilde{Z}) B^T\|_F^2 && \text{writing this as a Frobenius norm} \\
&= n\, \|B (Z - \tilde{Z})^T\|_F^2 \\
&\le \frac{n}{w_{\min}}\, \|W^{1/2} B (Z - \tilde{Z})^T\|_F^2 && \text{since } \|W^{-1/2}\|_2 \le 1/\sqrt{w_{\min}} \\
&\le \frac{\delta^2 n}{w_{\min}}\, \|W^{1/2} B Z^T\|_F^2 && \text{since } \|W^{1/2}B(z_i - \tilde{z}_i)\|^2 \le \delta^2 \|W^{1/2} B z_i\|^2 \text{ by (4)} \\
&= \frac{\delta^2 n}{w_{\min}} \sum_{ab \in E} w_{ab}\, \|Z(\chi_a - \chi_b)\|^2 \\
&\le \frac{\delta^2 n}{w_{\min}} \sum_{ab \in E} w_{ab}\, (1+\epsilon) R_{ab} \\
&\le \frac{\delta^2 n (1+\epsilon)}{w_{\min}}\, (n-1) && \text{by Lemma 3.(iii).}
\end{aligned}$$

On the other hand,

$$\|Z(\chi_u - \chi_v)\|^2 \ge (1-\epsilon) R_{uv} \ge \frac{2(1-\epsilon)}{n\, w_{\max}},$$

by Proposition 10. Combining these bounds, we have

$$\frac{\left|\, \|Z(\chi_u - \chi_v)\| - \|\tilde{Z}(\chi_u - \chi_v)\| \,\right|}{\|Z(\chi_u - \chi_v)\|} \le \delta \left( \frac{n(1+\epsilon)(n-1)}{w_{\min}} \right)^{1/2} \cdot \left( \frac{n\, w_{\max}}{2(1-\epsilon)} \right)^{1/2} \le \frac{\epsilon}{3},$$

by (5), as desired.

Proposition 10. If $G = (V, E, w)$ is a connected graph, then for all $u, v \in V$,

$$R_{uv} \ge \frac{2}{n\, w_{\max}}.$$
Proof. By Rayleigh's monotonicity law (see [6]), each resistance $R_{uv}$ in $G$ is at least the corresponding resistance $R'_{uv}$ in $G' = w_{\max} \times K_n$ (the complete graph with all edge weights $w_{\max}$), since $G'$ is obtained by increasing weights (i.e., conductances) of edges in $G$. But by symmetry, each resistance $R'_{uv}$ in $G'$ is exactly

$$\frac{\sum_{uv} R'_{uv}}{\binom{n}{2}} = \frac{(n-1)/w_{\max}}{n(n-1)/2} = \frac{2}{n\, w_{\max}}.$$

Thus $R_{uv} \ge 2/(n\, w_{\max})$ for all $u, v \in V$.

Thus the construction of $\tilde{Z}$ takes $\tilde{O}(m \log(1/\delta)/\epsilon^2) = \tilde{O}(m \log r/\epsilon^2)$ time. We can then find the approximate resistance $\|\tilde{Z}(\chi_u - \chi_v)\|^2 \approx R_{uv}$ for any $u, v \in V$ in $O(\log n/\epsilon^2)$ time, simply by subtracting two columns of $\tilde{Z}$ and computing the norm of their difference.

Using the above procedure, we can compute arbitrarily good approximations to the effective resistances $\{R_e\}$ which we need for sampling in nearly-linear time. By Corollary 6, any constant factor approximation yields a sparsifier, so we are done.

5 An Additional Property

Corollary 6 suggests that Sparsify is quite robust with respect to changes in the sampling probabilities $p_e$, and that we may be able to prove additional guarantees on $H$ by tweaking them. In this section, we prove one such claim. The following property is desirable for using $H$ to solve linear systems (specifically, for the construction of ultrasparsifiers [23, 24], which we will not define here):

$$\text{For every vertex } v \in V, \quad \sum_{e \ni v} \frac{\tilde{w}_e}{w_e} \le 2 \deg(v). \quad (7)$$

This says, roughly, that not too many of the edges incident to any given vertex get blown up too much by sampling and rescaling. We show how to incorporate this property into our sparsifiers.

Lemma 11. Suppose we sample $q > 4 n \log n / \beta$ edges of $G$ as in Sparsify with probabilities that satisfy

$$p_{(u,v)} \ge \frac{\beta}{n \min(\deg(u), \deg(v))}$$

for some constant $0 < \beta < 1$.
Then with probability at least $1 - 1/n$,

$$\sum_{e \ni v} \frac{\tilde{w}_e}{w_e} \le 2 \deg(v) \quad \text{for all } v \in V.$$

Proof. For a vertex $v$, define i.i.d. random variables $X_1, \ldots, X_q$ by

$$X_i = \begin{cases} \frac{1}{p_e} & \text{if } e \ni v \text{ is the } i\text{th edge chosen} \\ 0 & \text{otherwise,} \end{cases}$$

so that $X_i$ is set to $1/p_e$ with probability $p_e$ for each edge $e$ attached to $v$. Let

$$D_v = \sum_{e \ni v} \frac{\tilde{w}_e}{w_e} = \sum_{e \ni v} \frac{(\#\text{ of times } e \text{ is sampled})}{q\, p_e} = \frac{1}{q} \sum_{i=1}^q X_i.$$

We want to show that with high probability, $D_v \le 2\deg(v)$ for all vertices $v$. We begin by bounding the expectation and variance of each $X_i$:

$$\mathbb{E}\, X_i = \sum_{e \ni v} p_e \frac{1}{p_e} = \deg(v),$$

$$\mathrm{Var}(X_i) \le \mathbb{E}\, X_i^2 = \sum_{e \ni v} p_e \frac{1}{p_e^2} = \sum_{e \ni v} \frac{1}{p_e} \le \sum_{(u,v) \ni v} \frac{n \min(\deg(u), \deg(v))}{\beta} \le \sum_{(u,v) \ni v} \frac{n \deg(v)}{\beta} = \frac{n \deg(v)^2}{\beta},$$

where the second inequality holds by assumption. Since the $X_i$ are independent, the variance of $D_v$ is just

$$\mathrm{Var}(D_v) = \frac{1}{q^2} \sum_{i=1}^q \mathrm{Var}(X_i) \le \frac{n \deg(v)^2}{\beta q}.$$

We now apply Bennett's inequality for sums of i.i.d. variables (see, e.g., [20]), which says

$$\mathbb{P}\left[\, |D_v - \mathbb{E}\, D_v| > \mathbb{E}\, D_v \,\right] \le \exp\left( - \frac{(\mathbb{E}\, D_v)^2}{\mathrm{Var}(D_v)\left(1 + \frac{\mathbb{E}\, D_v}{q}\right)} \right).$$

We know that $\mathbb{E}\, D_v = \mathbb{E}\, X_i = \deg(v)$. Substituting our estimate for $\mathrm{Var}(D_v)$ and setting $q \ge 4 n \log n / \beta$ gives:

$$\mathbb{P}[D_v > 2\deg(v)] \le \exp\left( - \frac{\deg(v)^2}{\frac{n \deg(v)^2}{\beta q}\left(1 + \frac{\deg(v)}{q}\right)} \right) \le \exp\left( - \frac{\beta q}{2n} \right) \quad \text{since } 1 + \frac{\deg(v)}{q} \le 2$$
$$\le \exp(-2 \log n) = 1/n^2.$$

Taking a union bound over all $v$ gives the desired result.

Sampling with probabilities

$$p'_e = p'_{(u,v)} = \frac{1}{2} \left( \frac{\|Z b_e^T\|^2 w_e}{\sum_e \|Z b_e^T\|^2 w_e} + \frac{1}{n \min(\deg(u), \deg(v))} \right)$$

satisfies the requirements of both Corollary 6 (with $\alpha = 2$) and Lemma 11 (with $\beta = 1/2$), and yields a sparsifier with the desired property.

Theorem 12. There is an $\tilde{O}(m/\epsilon^2)$ time algorithm which on input $G = (V, E, w)$, $\epsilon > 0$ produces a weighted subgraph $H = (V, \tilde{E}, \tilde{w})$ of $G$ with $O(n \log n/\epsilon^2)$ edges which, with probability at least $1/2$, satisfies both (2) and (7).

References

[1] D.
Achlioptas. Database-friendly random projections. In PODS '01, pages 274–281, 2001.

[2] D. Achlioptas and F. McSherry. Fast computation of low rank matrix approximations. In STOC '01, pages 611–618, 2001.

[3] S. Arora, E. Hazan, and S. Kale. A fast random sampling algorithm for sparsifying matrices. In APPROX-RANDOM '06, volume 4110 of Lecture Notes in Computer Science, pages 272–279. Springer, 2006.

[4] Joshua D. Batson, Daniel A. Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. In STOC '09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 255–262, New York, NY, USA, 2009. ACM.

[5] A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in $\tilde{O}(n^2)$ time. In STOC '96, pages 47–55, 1996.

[6] B. Bollobás. Modern Graph Theory. Springer, July 1998.

[7] A. K. Chandra, P. Raghavan, W. L. Ruzzo, and R. Smolensky. The electrical resistance of a graph captures its commute and cover times. In STOC '89, pages 574–586, 1989.

[8] F. R. K. Chung. Spectral Graph Theory. CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.

[9] P. Doyle and J. Snell. Random Walks and Electric Networks. Math. Assoc. America, Washington, 1984.

[10] P. Drineas and R. Kannan. Fast Monte-Carlo algorithms for approximate matrix multiplication. In FOCS '01, pages 452–459, 2001.

[11] P. Drineas and R. Kannan. Pass efficient algorithms for approximating large matrices. In SODA '03, pages 223–232, 2003.

[12] A. Firat, S. Chatterjee, and M. Yilmaz. Genetic clustering of social networks using random walks. Computational Statistics & Data Analysis, 51(12):6285–6294, August 2007.

[13] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation.
Knowledge and Data Engineering, IEEE Transactions on, 19(3):355–369, 2007.

[14] A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM, 51(6):1025–1041, 2004.

[15] Chris Godsil and Gordon Royle. Algebraic Graph Theory. Graduate Texts in Mathematics. Springer, 2001.

[16] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum flow problem. In STOC '86, pages 136–146, 1986.

[17] S. Guattery and G. L. Miller. Graph embeddings and Laplacian eigenvalues. SIAM J. Matrix Anal. Appl., 21(3):703–723, 2000.

[18] W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math., 26:189–206, 1984.

[19] R. Khandekar, S. Rao, and U. Vazirani. Graph partitioning using single commodity flows. In STOC '06, pages 385–390, 2006.

[20] G. Lugosi. Concentration-of-measure inequalities, 2003. Available at http://www.econ.upf.edu/~lugosi/anu.ps.

[21] M. Rudelson. Random vectors in the isotropic position. J. of Functional Analysis, 163(1):60–72, 1999.

[22] M. Rudelson and R. Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4):21, 2007.

[23] D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81–90, 2004. Full version available at http://arxiv.org/abs/cs.DS/0310051.

[24] D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems. Available at http://www.arxiv.org/abs/cs.NA/0607105, 2006.

[25] D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs. Available at http://arxiv.org/abs/0808.4134, 2008.
