Distributed Averaging in the presence of a Sparse Cut

Hariharan Narayanan
Department of Computer Science, University of Chicago
September 2, 2021

Abstract
We consider the question of averaging on a graph that has one sparse cut separating two subgraphs that are internally well connected. While there has been a large body of work devoted to algorithms for distributed averaging, nearly all algorithms involve only convex updates. In this paper, we suggest that non-convex updates can lead to significant improvements. We do so by exhibiting a decentralized algorithm for graphs with one sparse cut that uses non-convex averages and has an averaging time that can be significantly smaller than the averaging time of known distributed algorithms, such as those of [3, 2]. We use stochastic dominance to prove this result in a way that may be of independent interest.

1 Introduction
Consider a graph $G = (V, E)$ in which i.i.d. Poisson clocks with rate 1 are associated with each edge¹. We represent the "true" real-valued time by $T$. Each node $v_i$ holds a value $x_i(T)$ at time $T$. Let the average value held by the nodes be $x_{av}$. Every time an edge $e = (v, w)$ ticks, it updates the values of the vertices incident to it on the basis of present and past values of $v$, $w$ and their immediate neighbors, according to some algorithm $\mathcal{A}$.

There is an extensive body of work on gossip algorithms in various contexts. Non-convex updates have been used in the context of a second-order diffusion for load balancing [5], in a slightly different setting.
The idea there was to take into account the values of the nodes during the previous two time steps rather than just the previous one (in a synchronous setting), and to set the future value of a node to a non-convex linear combination of the past values of some of its neighbors. There is also a line of research on averaging algorithms having two time scales [1, 4], which is closely related to the present paper. In a previous paper [6], we considered the use of non-convex combinations for gossip on a geographic random graph on $n$ nodes. There we showed that one can achieve averaging using $n^{1+o(1)}$ updates if one is willing to allow a certain amount of centralized control. The main technical difficulty in using non-convex updates is that they can skew the values held by nodes in the short term. We show that, nonetheless, in the long term this leads to faster averaging.

¹ This model can be simulated using previous models such as [2] by allocating edges to nodes and equipping nodes with multiple i.i.d. Poisson clocks.

Let the values held by the nodes be $X(T) = (x_1(T), \dots, x_{|V|}(T))^T$. We study distributed averaging algorithms $\mathcal{A}$ which result in $\lim_{T\to\infty} X(T) = x_{av}\mathbf{1}$, where $x_{av}$ is invariant under the passage of time, and show that in some cases there is an exponential speed-up in $n$ if one allows the use of non-convex updates, as opposed to only convex ones.

Definition 1. Let
$$\operatorname{var} X(t) := \frac{1}{|V|}\sum_{i=1}^{|V|}\big(x_i(t) - x_{av}\big)^2 .$$
Let
$$T_{av} = \sup_{x\in\mathbb{R}^{|V|}} \inf\Big\{ t : \mathbb{P}\Big[\exists\, T > t,\ \frac{\operatorname{var} X(T)}{\operatorname{var} X(0)} > \frac{1}{e^2} \,\Big|\, X(0) = x\Big] < \frac{1}{e}\Big\}.$$

Notation 1. Let a connected graph $G = (V, E)$ have a partition into connected graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$.
Specifically, every vertex in $V$ is either in $V_1$ or in $V_2$, and every edge in $E$ belongs either to $E_1$, to $E_2$, or to the set $E_{12}$ of edges that have one endpoint in $V_1$ and one in $V_2$. Let $|V_1| = n_1$ and $|V_2| = n_2$, where without loss of generality $n_1 \le n_2$, and let $|V| = n$. Let $T_{van}(G_1)$ and $T_{van}(G_2)$ be the averaging times of the "vanilla" algorithm, which at the clock tick of an edge $e$ replaces the values of the endpoints of $e$ by the arithmetic mean of the two, applied to $G_1$ and $G_2$ respectively.

Definition 2. Let $\mathcal{C}$ denote the set of algorithms that use only convex updates of the form
  1. $x_i(t^+) = \alpha x_i(t^-) + \beta x_j(t^-)$,
  2. $x_j(t^+) = \alpha x_j(t^-) + \beta x_i(t^-)$,
where $\alpha \in [0, 1]$ and $\alpha + \beta = 1$. These updates have been extensively studied; see for example [3, 2].

Theorem 1. The averaging time of any distributed algorithm in $\mathcal{C}$ is $\Omega\big(\min(|V_1|, |V_2|)/|E_{12}|\big)$.

Theorem 2. The averaging time of $\mathcal{A}$ is $O\big(\log n\,(T_{van}(G_1) + T_{van}(G_2))\big)$.

Note that when $G_1$ and $G_2$ are sufficiently well connected internally but poorly connected to each other, $\mathcal{A}$ outperforms any algorithm in $\mathcal{C}$. In fact, for the graph $G'$ obtained by joining two complete graphs $G'_1$, $G'_2$, each having $n/2$ vertices, by a single edge, $\Omega\big(\min(|V'_1|, |V'_2|)/|E'_{12}|\big) = \Omega(n)$, while $O\big(\log n\,(T_{van}(G'_1) + T_{van}(G'_2))\big) = O(\log n)$.

1.0.1 Algorithm $\mathcal{A}$
Let the vertices of $G_1$ be labeled by $[n_1]$ and those of $G_2$ by $[n] \setminus [n_1]$, where $[n] := \{1, \dots, n\}$. Let $e_c = (v_{n_1}, v_{n_1+1})$ be a fixed edge belonging to $E_{12}$. Let $t$ be the time of the $k$-th clock tick of an edge $e$. Let $C \gg 1$ be a sufficiently large absolute constant (independent of $n$).
• If the edge $e$ is $e_c = (v_{n_1}, v_{n_1+1})$:
  1. If $k \equiv -1 \pmod{\lceil C\,(T_{van}(G_1) + T_{van}(G_2)) \ln n \rceil}$:
     (a) $x_{n_1}(t^+) = x_{n_1}(t^-) + \frac{n_1 n_2}{n}\{x_{n_1+1}(t^-) - x_{n_1}(t^-)\}$,
     (b) $x_{n_1+1}(t^+) = x_{n_1+1}(t^-) - \frac{n_1 n_2}{n}\{x_{n_1+1}(t^-) - x_{n_1}(t^-)\}$.
  2. If $k \not\equiv -1 \pmod{\lceil C\,(T_{van}(G_1) + T_{van}(G_2)) \ln n \rceil}$, make no update.
• If the edge $e$ is $(v_i, v_j) \notin E_{12}$:
  1. $x_i(t^+) = \frac{x_i(t^-) + x_j(t^-)}{2}$,
  2. $x_j(t^+) = \frac{x_i(t^-) + x_j(t^-)}{2}$.
• If $e \in E_{12} \setminus \{e_c\}$, make no update.
The factor $\frac{n_1 n_2}{n}$ is the one for which, if the values are constant on $V_1$ and on $V_2$ just before the non-convex update, the averages over $V_1$ and over $V_2$ both equal $x_{av}$ just after it.

2 Limitations of convex combinations
Given a function $a(t)$, let its right limit at $t$ be denoted by $a(t^+)$ and its left limit at $t$ by $a(t^-)$. Consider any algorithm in $\mathcal{C}$, and the initial condition where $X(0)$ is the vector that is $1$ on the vertices $v_1, \dots, v_{n_1}$ of $G_1$ and $-n_1/n_2$ on the vertices $v_{n_1+1}, \dots, v_n$ of $G_2$, so that $x_{av} = 0$. Let us denote $\frac{1}{n_1}\sum_{i=1}^{n_1} x_i(t)$ by $y(t)$ and $\frac{1}{n_2}\sum_{i=n_1+1}^{n} x_i(t)$ by $z(t)$. In the model we have considered, with probability 1, no two clocks ever tick at the same time.

During the execution of any algorithm in $\mathcal{C}$, $y(t)$ can change only at clock ticks of edges in $E_{12}$, and the same holds for $z(t)$. This is because at a clock tick of any other edge, both of whose endpoints lie in $G_1$ or both in $G_2$, the update preserves $x_i + x_j$ (since $\alpha + \beta = 1$), so $y(t)$ and $z(t)$ do not change. Further, since every update is convex, all values remain in the interval $[\min_{i\in V} x_i(0), \max_{i\in V} x_i(0)] \subseteq [-1, 1]$, so the value at an endpoint of an edge of $E_{12}$ can change by at most 2 across such an instant. If the clock of an edge of $E_{12}$ ticks at time $t$, we therefore find that
$$|y(t^+) - y(t^-)| \le \frac{2}{n_1}. \qquad (1)$$
The number of ticks of the clock of a fixed edge until time $t$ is a Poisson random variable with mean $t$. A direct calculation (using $x_{av} = 0$ and the Cauchy-Schwarz inequality on $V_1$) tells us that
$$\operatorname{var} X(t) \ge \frac{n_1\, y(t)^2}{n}. \qquad (2)$$
To obtain a lower bound for $y(t)^2$, we note that the total number of times the clocks of edges belonging to $E_{12}$ tick up to time $t$ is a Poisson random variable $\nu_t$ with mean $t|E_{12}|$. It follows from Inequality (1) that $y(t) \ge 1 - 2\nu_t/n_1$. By Markov's inequality,
$$|E_{12}|\,T_{av} = \mathbb{E}[\nu_{T_{av}}] \ge \mathbb{P}\Big[\nu_{T_{av}} \ge \big(1 - \tfrac1e\big)\tfrac{n_1}{4}\Big]\,\big(1 - \tfrac1e\big)\tfrac{n_1}{4}.$$
However, $\mathbb{P}\big[\nu_{T_{av}} \ge (1 - \frac1e)\frac{n_1}{4}\big]$ must be large, because otherwise $y(T_{av})$, and hence $\operatorname{var} X(T_{av})$, would probably be large. More precisely, if $\nu_{T_{av}} < (1 - \frac1e)\frac{n_1}{4}$, then $y(T_{av}) > \frac{1 + 1/e}{2}$, and by (2), together with $\operatorname{var} X(0) = n_1/n_2$ and $n_2 \ge n/2$, we get $\frac{\operatorname{var} X(T_{av})}{\operatorname{var} X(0)} > \frac12\big(\frac{1 + 1/e}{2}\big)^2 > \frac{1}{e^2}$. Hence
$$\mathbb{P}\Big[\nu_{T_{av}} \ge \big(1 - \tfrac1e\big)\tfrac{n_1}{4}\Big] \ge 1 - \mathbb{P}\Big[\exists\, T > T_{av},\ \frac{\operatorname{var} X(T)}{\operatorname{var} X(0)} > \frac{1}{e^2}\Big] \ge 1 - \frac1e.$$
Therefore,
$$T_{av} \ge \mathbb{P}\Big[\nu_{T_{av}} \ge \big(1 - \tfrac1e\big)\tfrac{n_1}{4}\Big]\,\frac{(1 - \frac1e)\,n_1}{4|E_{12}|} \ge \Big(1 - \frac1e\Big)^2\frac{n_1}{4|E_{12}|} = \Omega\Big(\frac{n_1}{|E_{12}|}\Big),$$
which proves Theorem 1.

3 Using non-convex combinations
3.0.2 Analysis
Since $T_{av}$ is defined in terms of variance and algorithm $\mathcal{A}$ uses only linear updates that preserve $\sum_i x_i$, we may subtract the mean from each $x_i(0)$, and it suffices to analyze the case $x_{av} = 0$. Let $V_1 = [n_1]$ and $V_2 = [n] \setminus [n_1]$. Let
$$\mu_1(t) = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i(t), \qquad \mu_2(t) = \frac{1}{n_2}\sum_{i=n_1+1}^{n} x_i(t), \qquad \mu(t) = |\mu_1(t)| + |\mu_2(t)|,$$
and
$$\sigma(t) = \sqrt{\frac{\sum_{i=1}^{n_1}\big(x_i(t) - \mu_1(t)\big)^2 + \sum_{i=n_1+1}^{n}\big(x_i(t) - \mu_2(t)\big)^2}{n}}.$$
We consider time instants $T_1, T_2, \dots$, where $T_i$ is the instant at which the non-convex update of $\mathcal{A}$ is applied for the $i$-th time, that is, the instant of the $\big(i\lceil C(T_{van}(G_1) + T_{van}(G_2))\ln n\rceil - 1\big)$-th tick of the clock of $e_c$. Observe that the value of $\mu(t)$ changes only across the time instants $T_k$, $k = 1, 2, \dots$. Since each squared deviation from a block average is at most $n\sigma(t)^2$, the amounts by which $x_{n_1}(t)$ and $x_{n_1+1}(t)$ deviate from $\mu_1(t)$ and $\mu_2(t)$ respectively are bounded above by $\sqrt{n}\,\sigma(t)$:
$$\max\big\{|x_{n_1}(t) - \mu_1(t)|,\ |x_{n_1+1}(t) - \mu_2(t)|\big\} \le \sqrt{n}\,\sigma(t). \qquad (3)$$
We now examine the evolution of $\sigma(T_k^+)$ and $\mu(T_k^+)$ as $k \to \infty$. In what follows, $C$ is assumed to be a sufficiently large universal constant (independent of $n$).
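The update rules of Algorithm $\mathcal{A}$ and of the vanilla convex rule can be compared numerically on the two-clique graph $G'$ from Section 1. The Python sketch below is illustrative only and rests on several assumptions: vertices are labeled from 0, so $e_c$ becomes `(m - 1, m)`; the rate-1 edge clocks are simulated by drawing the next ticking edge uniformly at random, which produces the same sequence of updates as i.i.d. Poisson clocks; the non-convex step uses the factor $\frac{n_1 n_2}{n}$, which sends both block averages to $x_{av}$ when each block is internally constant; and a hypothetical period of 20 ticks of $e_c$ stands in for $\lceil C(T_{van}(G_1)+T_{van}(G_2))\ln n\rceil$. The helper names are likewise hypothetical.

```python
import random


def two_cliques(m):
    """Graph G' with 0-based labels: cliques on 0..m-1 and m..2m-1,
    joined by the single cut edge e_c = (m - 1, m)."""
    internal = [(i, j) for i in range(m) for j in range(i + 1, m)]
    internal += [(i, j) for i in range(m, 2 * m) for j in range(i + 1, 2 * m)]
    return internal, (m - 1, m)


def variance(x):
    """var X = (1/|V|) * sum_i (x_i - x_av)^2, as in Definition 1."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def cut_update(x, n1):
    """Non-convex step across e_c = (n1 - 1, n1) with factor n1*n2/n.
    If both blocks are internally constant, both block means become x_av."""
    n = len(x)
    d = (n1 * (n - n1) / n) * (x[n1] - x[n1 - 1])
    x[n1 - 1] += d
    x[n1] -= d


def simulate(m, ec_ticks, period=None, seed=0):
    """Run until e_c has ticked ec_ticks times, starting from +1 on the first
    clique and -1 on the second (mean 0, variance 1).

    With i.i.d. rate-1 clocks on all edges, the next edge to tick is uniform
    over the edge set, so only the order of the ticks is simulated.
    period=None: vanilla averaging on every edge, including e_c (a rule in C).
    period=P (hypothetical choice): algorithm A, which applies cut_update on
    the k-th tick of e_c when k = -1 (mod P) and ignores its other ticks.
    """
    rng = random.Random(seed)
    internal, cut = two_cliques(m)
    edges = internal + [cut]
    x = [1.0] * m + [-1.0] * m
    k = 0
    while k < ec_ticks:
        i, j = rng.choice(edges)
        if (i, j) == cut:
            k += 1
            if period is None:
                x[i] = x[j] = (x[i] + x[j]) / 2
            elif (k + 1) % period == 0:
                cut_update(x, m)
        else:
            x[i] = x[j] = (x[i] + x[j]) / 2
    return x


if __name__ == "__main__":
    for label, period in [("convex", None), ("algorithm A", 20)]:
        print(label, variance(simulate(20, 50, period=period)))
```

In the convex run each tick of $e_c$ shrinks the gap between the two clique averages only by a factor of roughly $1 - 1/m$, so after 50 ticks a clear gap remains, whereas in the run of Algorithm $\mathcal{A}$ the first non-convex step essentially closes the gap and the subsequent internal averaging removes the disturbance it creates.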
From $T_k^+$ to $T_{k+1}^-$, independently of $x$,
$$\mathbb{P}\Big[\sigma(T_{k+1}^-) \ge \frac{\sigma(T_k^+)}{n^6} \,\Big|\, X(T_k^+) = x\Big] \le \frac{1}{4n}, \qquad (4)$$
$$\mu(T_{k+1}^-) = \mu(T_k^+). \qquad (5)$$
Because of Inequality (3), across the update at $T_{k+1}$ (from $T_{k+1}^-$ to $T_{k+1}^+$),
$$\sigma(T_{k+1}^+) \le n\big(\sigma(T_{k+1}^-) + |\mu(T_{k+1}^-)|\big), \qquad (6)$$
$$|\mu(T_{k+1}^+)| \le n^{3/2}\,\sigma(T_{k+1}^-). \qquad (7)$$
Moreover, since $x_{av} = 0$, we have $\operatorname{var} X(t) = \sigma(t)^2 + \frac{n_1\mu_1(t)^2 + n_2\mu_2(t)^2}{n}$, and hence $\sigma(t)^2 + \frac{\mu(t)^2}{2n} \le \operatorname{var} X(t) \le \sigma(t)^2 + \mu(t)^2$. We deduce from the above that
$$\mathbb{P}\Big[\operatorname{var} X(T_{k+1}^+) \ge \frac{\operatorname{var} X(T_k^+)}{n^4}\Big] \le \frac{1}{4n}. \qquad (8)$$
Let $A_k$ be the (random) operator obtained by composing the linear updates from time $T_k^+$ to $T_{k+1}^+$, and let $\|A\|$ denote the norm of an operator acting from $\ell_2$ to $\ell_2$,
$$\|A\| = \sup_{x\in\mathbb{R}^n\setminus\{0\}} \frac{\|Ax\|_2}{\|x\|_2}.$$

Lemma 1.
$$\mathbb{P}\Big[\|A_k\|^2 \ge \frac{1}{n^3}\Big] \le \frac12. \qquad (9)$$
To see this, let $v_1, \dots, v_n$ be the canonical basis of $\mathbb{R}^n$, and let $x = \sum_{i=1}^n \lambda_i v_i$ be any unit vector. Then
$$\|A_k x\| \le \sum_{i=1}^n |\lambda_i|\,\|A_k v_i\| \qquad \text{(triangle inequality)} \qquad (10)$$
$$\le \sqrt{\sum_{i=1}^n \|A_k v_i\|^2} \qquad \text{(Cauchy-Schwarz inequality, since } \textstyle\sum_i \lambda_i^2 = 1\text{)}. \qquad (11)$$
The lemma now follows from Inequality (8) by an application of the union bound over $v_1, \dots, v_n$. □

Moreover, we observe by construction that the norm of $A_k$ is at most $n$:
$$\|A_k\| \le n. \qquad (12)$$
Note that $\log(\operatorname{var} X(T_k^+))$ defines a random process (that is not Markov). The updates $A_k$ from time $T_k^+$ to $T_{k+1}^+$ for successive $k$ are i.i.d. random operators acting on $\mathbb{R}^n$. Since $\operatorname{var} X = \|X\|_2^2/n$ when $x_{av} = 0$, the supremum in the definition of the operator norm gives
$$\log(\operatorname{var} X(T_k^+)) - \log(\operatorname{var} X(0)) \le 2\sum_{i=1}^k \log\|A_i\|.$$
The process $W_k := \sum_{i=1}^k \log\|A_i\|$, $k = 1, 2, \dots$, is a random walk on the real line. The last and perhaps most important ingredient is that of stochastic dominance. Lemma 1 shows that $\log\|A_k\| < -\frac32\log n$ with probability at least $\frac12$, and Inequality (12) shows that $\log\|A_k\| \le \log n$ always. It follows that the random walk $\{W_k\}$ can be coupled with a random walk $\{\tilde W_k\}$ that always lies to the right of it on the real line, i.e., $W_k \le \tilde W_k$ for all $k$, where the increments satisfy
$$\tilde W_{k+1} - \tilde W_k = \log n \quad \text{(with probability } \tfrac12\text{)}, \qquad (13)$$
$$\tilde W_{k+1} - \tilde W_k = -\tfrac32\log n \quad \text{(with probability } \tfrac12\text{)}. \qquad (14)$$
Noting that, by construction,
$$\log(\operatorname{var} X(T_k^+)) - \log(\operatorname{var} X(0)) \le 2\tilde W_k, \qquad (15)$$
it follows that $T_{av}$ is upper bounded by the duration of any number $t_0$ of periods which satisfies
$$\mathbb{P}\big[\forall\, T > t_0,\ \tilde W_T \le -2\big] > 1 - \frac1e.$$
Note that $\mathbb{E}[\tilde W_k] = -\frac{k}{4}\log n$ and $\operatorname{var}\tilde W_k = \frac{25k}{16}\log^2 n$. In order to proceed, we shall need the following inequality about the simple unbiased random walk $\{S_k\}_{k\ge0}$ on $\mathbb{Z}$ starting at 0.

Theorem 3. There exist constants $c, \beta > 0$ such that for any positive integer $n$ and any $s > 0$,
$$\mathbb{P}\big[S_n \ge s\sqrt{n}\big] \le c\,e^{-\beta s^2}.$$

Using this fact, and writing $\tilde W_T = \frac54(\log n)\big(S_T - \frac{T}{5}\big)$, where $S_T$ is the number of up-steps minus the number of down-steps of $\tilde W$ among the first $T$ steps,
$$\mathbb{P}\big[\forall\, T > t_0,\ \tilde W_T \le -2\big] = \mathbb{P}\Big[\forall\, T > t_0,\ \tfrac54(\log n)\big(S_T - \tfrac{T}{5}\big) \le -2\Big]. \qquad (16)$$
Since $S_T - T/5$ is a multiple of $\frac15$, for $\log n \ge 8$ this is at least
$$\mathbb{P}\Big[\forall\, T > t_0,\ S_T < \tfrac{T}{5}\Big] \ge 1 - \sum_{T > t_0} c\,e^{-\beta T/25},$$
where the last step applies Theorem 3 with $s = \sqrt{T}/5$ together with the union bound. Clearly, there is a constant $t_0$ independent of $n$ such that $1 - \sum_{T > t_0} c\,e^{-\beta T/25} > 1 - \frac1e$. Since each period consists of $\lceil C(T_{van}(G_1) + T_{van}(G_2))\ln n\rceil$ ticks of the rate-1 clock of $e_c$, $t_0$ periods take time $O\big(\log n\,(T_{van}(G_1) + T_{van}(G_2))\big)$ with high probability, which gives Theorem 2. This completes the proof. □

4 Acknowledgement
I am grateful to Dimitris Achlioptas, Vivek Borkar, Stephen Boyd and Steven Lalley for many helpful discussions.

References
[1] V. S. Borkar. Stochastic approximation with two time-scales. Systems and Control Letters, 1997.
[2] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Gossip algorithms: Design, analysis and applications. In Proceedings of the 24th Conference of the IEEE Communications Society (INFOCOM 2005), 2005.
[3] D. Bertsekas and J. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.
[4] V. Konda and J. Tsitsiklis. Convergence rate of a linear two time scale stochastic approximation. Annals of Applied Probability, 2004.
[5] S. Muthukrishnan, B. Ghosh, and M. H. Schultz. First and second-order diffusive methods for rapid, coarse, distributed load balancing. Theory of Computing Systems, 31(4), December 1998.
[6] H. Narayanan. Geographic gossip on geometric random graphs via affine combinations. In Principles of Distributed Computing (PODC), 2007.
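The last step of the analysis in Section 3 concerns only the dominating walk $\{\tilde W_k\}$ with increments (13) and (14). The Python sketch below, again illustrative only, computes the exact mean and variance of one increment and estimates $\mathbb{P}[\forall T > t_0,\ \tilde W_T \le -2]$ by Monte Carlo. It checks only a finite horizon, and the values $n = 10^6$, $t_0 = 100$, a horizon of 600 steps and 1000 trials are arbitrary assumptions, as are the function names.

```python
import math
import random


def increment_moments(log_n):
    """Exact mean and variance of one increment of the dominating walk:
    +log n or -(3/2) log n, each with probability 1/2, as in (13)-(14)."""
    up, down = log_n, -1.5 * log_n
    mean = (up + down) / 2
    var = ((up - mean) ** 2 + (down - mean) ** 2) / 2
    return mean, var


def stays_below(log_n, t0, horizon, rng, level=-2.0):
    """Sample one path of the dominating walk and report whether it is
    <= level at every step T with t0 < T <= horizon (a finite-horizon
    stand-in for 'for all T > t0')."""
    w = 0.0
    for step in range(1, horizon + 1):
        w += log_n if rng.random() < 0.5 else -1.5 * log_n
        if step > t0 and w > level:
            return False
    return True


def estimate(log_n, t0, horizon, trials, seed=0):
    """Monte Carlo estimate of P[W~_T <= -2 for all t0 < T <= horizon]."""
    rng = random.Random(seed)
    hits = sum(stays_below(log_n, t0, horizon, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    log_n = math.log(1e6)  # hypothetical n = 10^6
    print(increment_moments(log_n))  # about (-log_n / 4, 25 / 16 * log_n ** 2)
    print(estimate(log_n, t0=100, horizon=600, trials=1000), 1 - 1 / math.e)
```

For these parameters the estimate should come out well above $1 - 1/e$; decreasing $t_0$ adds constraints and therefore lowers the underlying probability, which is why the proof needs $t_0$ to be a sufficiently large constant.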
