Network-based consensus averaging with general noisy channels

Ram Rajagopal∗   Martin J. Wainwright∗,†
ramr@eecs.berkeley.edu   wainwrig@stat.berkeley.edu
Department of Statistics†, and Department of Electrical Engineering and Computer Sciences∗
University of California, Berkeley
Berkeley, CA 94720

Technical Report, Department of Statistics, UC Berkeley

Abstract

This paper focuses on the consensus averaging problem on graphs under general noisy channels. We study a particular class of distributed consensus algorithms based on damped updates, and using the ordinary differential equation method, we prove that the updates converge almost surely to exact consensus for finite variance noise. Our analysis applies to various types of stochastic disturbances, including errors in parameters, transmission noise, and quantization noise. Under a suitable stability condition, we prove that the error is asymptotically Gaussian, and we show how the asymptotic covariance is specified by the graph Laplacian. For additive parameter noise, we show how the scaling of the asymptotic MSE is controlled by the spectral gap of the Laplacian.

Keywords: Distributed averaging; sensor networks; message-passing; consensus protocols; gossip algorithms; stochastic approximation; graph Laplacian.

1 Introduction

Consensus problems, in which a group of nodes want to arrive at a common decision in a distributed manner, have a lengthy history, dating back to seminal work from over twenty years ago [8, 5, 18]. A particular type of consensus estimation is the distributed averaging problem, in which a group of nodes want to compute the average (or more generally, a linear function) of a set of values. Due to its applications in sensor and wireless networking, this distributed averaging problem has been the focus of substantial recent research.
The distributed averaging problem can be studied either in continuous time [16], or in the discrete-time setting (e.g., [13, 19, 6, 3, 9]). In both cases, there is now a fairly good understanding of the conditions under which various distributed averaging algorithms converge, as well as the rates of convergence for different graph structures.

The bulk of early work on consensus has focused on the case of perfect communication between nodes. Given that noiseless communication may be an unrealistic assumption for sensor networks, a more recent line of work has addressed the issue of noisy communication links. With imperfect observations, many of the standard consensus protocols might fail to reach an agreement. Xiao et al. [20] observed this phenomenon, and opted to instead redefine the notion of agreement, obtaining a protocol that allows nodes to obtain a steady-state agreement, whereby all nodes are able to track but need not obtain consensus agreement. Schizas et al. [17] study distributed algorithms for optimization, including the consensus averaging problem, and establish stability under noisy updates, in that the iterates are guaranteed to remain within a ball of the correct consensus, but do not necessarily achieve exact consensus. Kashyap et al. [12] study consensus updates with the additional constraint that the value stored at each node must be integral, and establish convergence to quantized consensus. Fagnani and Zampieri [10] study the case of packet-dropping channels, and propose various updates that are guaranteed to achieve consensus. Yildiz and Scaglione [21] suggest coding strategies to deal with quantization noise, but do not establish convergence. In related work, Aysal et al. [3] used probabilistic forms of quantization to develop algorithms that achieve consensus in expectation, but not in an almost sure sense.
In the current paper, we address the discrete-time average consensus problem for general stochastic channels. Our main contribution is to propose and analyze simple distributed protocols that are guaranteed to achieve exact consensus in an almost sure (sample-path) sense. These exactness guarantees are obtained using protocols with decreasing step sizes, which smooth out the noise factors. The framework described here is based on the classic ordinary differential equation method [15], and allows for the analysis of several different and important scenarios, namely:

• Noisy storage: stored values at each node are corrupted by noise, with known covariance structure.
• Noisy transmission: messages across each edge are corrupted by noise, with known covariance structure.
• Bit constrained channels: dithered quantization is applied to messages prior to transmission.

To the best of our knowledge, this is the first paper to analyze protocols that can achieve arbitrarily small mean-squared error (MSE) for distributed averaging with noise. By using stochastic approximation theory [4, 14], we establish almost sure convergence of the updates, as well as asymptotic normality of the error under appropriate stability conditions. The resulting expressions for the asymptotic variance reveal how different graph structures (ranging from ring graphs at one extreme, to expander graphs at the other) lead to different variance scaling behaviors, as determined by the eigenspectrum of the graph Laplacian [7].

The remainder of this paper is organized as follows. We begin in Section 2 by describing the distributed averaging problem in detail, and defining the class of stochastic algorithms studied in this paper.
In Section 3, we state our main results on the almost-sure convergence and asymptotic normality of our protocols, and illustrate some of their consequences for particular classes of graphs. In particular, we illustrate the sharpness of our theoretical predictions by comparing them to simulation results, on various classes of graphs. Section 4 is devoted to the proofs of our main results, and we conclude the paper with discussion in Section 5. (This work was presented in part at the Allerton Conference on Control, Computing and Communication in September 2007.)

Comment on notation: Throughout this paper, we use the following standard asymptotic notation: for functions $f$ and $g$, the notation $f(n) = O(g(n))$ means that $f(n) \le C g(n)$ for some constant $C < \infty$; the notation $f(n) = \Omega(g(n))$ means that $f(n) \ge C' g(n)$ for some constant $C' > 0$, and $f(n) = \Theta(g(n))$ means that $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.

2 Problem set-up

In this section, we describe the distributed averaging problem, and specify the class of stochastic algorithms studied in this paper.

2.1 Consensus matrices and stochastic updates

Consider a set of $m = |V|$ nodes, each representing a particular sensing and processing device. We model this system as an undirected graph $G = (V, E)$, with processors associated with nodes of the graph, and the edge set $E \subset V \times V$ representing pairs of processors that can communicate directly. For each node $v$, we let $N(v) := \{u \in V \mid (v, u) \in E\}$ be its neighborhood set.

Figure 1. Illustration of the distributed protocol. Each node $t \in V$ maintains an estimate $\theta(t)$. At each round, for a fixed reference node $r \in V$, each neighbor $t \in N(r)$ sends the message $F(\theta_t, \xi(t, r))$ along the edge $t \to r$.
Suppose that each vertex $v$ makes a real-valued measurement $x(v)$, and consider the goal of computing the average $\bar{x} = \frac{1}{m} \sum_{v \in V} x(v)$. We assume that $|x(v)| \le x_{\max}$ for all $v \in V$, as dictated by physical constraints of sensing. For iterations $n = 0, 1, 2, \ldots$, let $\theta^n = \{\theta^n(v), v \in V\}$ represent an $m$-dimensional vector of estimates. Solving the distributed averaging problem amounts to having $\theta^n$ converge to $\theta^* := \bar{x}\,\vec{1}$, where $\vec{1} \in \mathbb{R}^m$ is the vector of all ones.

Various algorithms for distributed averaging [16, 6] are based on symmetric consensus matrices $L \in \mathbb{R}^{m \times m}$ with the properties:

$$L(v, v') \neq 0 \quad \text{only if } (v, v') \in E \tag{1a}$$
$$L \vec{1} = \vec{0}, \quad \text{and} \tag{1b}$$
$$L \succeq 0. \tag{1c}$$

The simplest example of such a matrix is the graph Laplacian, defined as follows. Let $A \in \mathbb{R}^{m \times m}$ be the adjacency matrix of the graph $G$, i.e. the symmetric matrix with entries

$$A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise,} \end{cases} \tag{2}$$

and let $D = \mathrm{diag}\{d_1, d_2, \ldots, d_m\}$ where $d_i := |N(i)|$ is the degree of node $i$. Assuming that the graph is connected (so that $d_i \ge 1$ for all $i$), the graph Laplacian is given by

$$L(G) = I - D^{-1/2} A D^{-1/2}. \tag{3}$$

Our analysis applies to the (rescaled) graph Laplacian, as well as to various weighted forms of graph Laplacian matrices [7].

Given a fixed choice of consensus matrix $L$, we consider the following family of updates, generating the sequence $\{\theta^n, n = 0, 1, 2, \ldots\}$ of $m$-dimensional vectors. The updates are designed to respect the neighborhood structure of the graph $G$, in the sense that at each iteration, the estimate $\theta^{n+1}(r)$ at a receiving node $r \in V$ is a function of only¹ the estimates $\{\theta^n(t), t \in N(r)\}$ associated with transmitting nodes $t$ in the neighborhood of node $r$.
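As a small illustration of definitions (2) and (3), the sketch below builds $L(G)$ for a ring graph and checks properties (1a) and (1b) numerically, with a spot check of (1c). The ring is chosen here as an assumption because it is regular: for a regular graph the rows of $I - D^{-1/2} A D^{-1/2}$ sum to zero. The helper names (`normalized_laplacian`, `ring`, `quadratic_form`) are illustrative and do not come from the paper.

```python
import math
import random


def normalized_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} as a dense list of lists.

    adj maps each node 0..m-1 to the set of its neighbours (undirected graph).
    """
    m = len(adj)
    deg = [len(adj[v]) for v in range(m)]
    L = [[0.0] * m for _ in range(m)]
    for v in range(m):
        L[v][v] = 1.0
        for u in adj[v]:
            L[v][u] = -1.0 / math.sqrt(deg[v] * deg[u])
    return L


def ring(m):
    """Adjacency sets of the ring (cycle) graph on m >= 3 nodes."""
    return {v: {(v - 1) % m, (v + 1) % m} for v in range(m)}


def quadratic_form(L, x):
    """Compute x^T L x."""
    m = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(m) for j in range(m))


m = 8
adj = ring(m)
L = normalized_laplacian(adj)

# Symmetry of L.
symmetric = all(L[i][j] == L[j][i] for i in range(m) for j in range(m))
# (1a): off-diagonal entries are zero away from the edge set.
off_edge_zero = all(
    L[v][u] == 0.0 for v in range(m) for u in range(m) if u != v and u not in adj[v]
)
# (1b): L * 1 = 0, i.e. every row sums to zero (holds because the ring is regular).
row_sums = [sum(row) for row in L]
# (1c): spot check only, not a proof: x^T L x >= 0 on random test vectors.
rng = random.Random(0)
spot_checks = [
    quadratic_form(L, [rng.gauss(0.0, 1.0) for _ in range(m)]) for _ in range(100)
]
```

The random quadratic forms only probe positive semidefiniteness on sample vectors; an eigenvalue computation would be needed for a full check.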
In order to model noise and uncertainty in the storage and communication process, we introduce random variables $\xi(t, r)$ associated with the transmission link from $t$ to $r$; we allow for the possibility that $\xi(t, r) \neq \xi(r, t)$, since the noise structure might be asymmetric. With this set-up, we consider algorithms that generate a stochastic sequence $\{\theta^n, n = 0, 1, 2, \ldots\}$ in the following manner:

1. At time step $n = 0$, initialize $\theta^0(v) = x(v)$ for all $v \in V$.

2. For time steps $n = 0, 1, 2, \ldots$, each node $t \in V$ computes the random variables

$$Y^{n+1}(r, t) = \begin{cases} \theta^n(t) & \text{if } t = r \\ F(\theta^n(t), \xi^{n+1}(t, r)) & \text{if } (t, r) \in E \\ 0 & \text{otherwise,} \end{cases} \tag{4}$$

where $F$ is the communication-noise function defining the model.

3. Generate estimate $\theta^{n+1} \in \mathbb{R}^m$ as

$$\theta^{n+1} = \theta^n + \epsilon_n \left[ -\left( L \odot Y^{n+1} \right) \vec{1} \right], \tag{5}$$

where $\odot$ denotes the Hadamard (elementwise) product between matrices, and $\epsilon_n > 0$ is a decaying step size parameter.

See Figure 1 for an illustration of the message-passing update of this protocol. In this paper, we focus on step size parameters $\epsilon_n$ that scale as $\epsilon_n = \Theta(1/n)$. On an elementwise basis, the update (5) takes the form

$$\theta^{n+1}(r) = \theta^n(r) - \epsilon_n \Big[ L(r, r)\,\theta^n(r) + \sum_{t \in N(r)} L(r, t)\, F(\theta^n(t), \xi^{n+1}(t, r)) \Big].$$

Footnote 1: In fact, our analysis is easily generalized to the case where $\theta^{n+1}(r)$ depends only on vertices $t \in N'(r)$, where $N'(r)$ is a (possibly random) subset of the full neighborhood set $N(r)$. However, to bring our results into sharp focus, we restrict attention to the case $N'(r) = N(r)$.

2.2 Communication and noise models

It remains to specify the form of the function $F$ that controls the communication and noise model in the local computation step in equation (4).
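With $F$ left as a parameter, steps 1 to 3 of the protocol can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ring graph, the step size $\epsilon_n = 1/(n+1)$ (one choice with $\epsilon_n = \Theta(1/n)$), the two stand-in functions `noiseless` and `additive_noise`, and the convention that $F$ receives a random generator in place of $\xi$ are all assumptions made for the example.

```python
import random


def consensus_run(L, neighbors, x, F, steps, rng):
    """Iterate the elementwise form of update (5).

    F(value, rng) plays the role of the communication-noise function:
    it returns the (possibly corrupted) message sent along one edge.
    """
    theta = list(x)  # step 1: theta^0(v) = x(v)
    for n in range(steps):
        eps = 1.0 / (n + 1)  # assumed step size, eps_n = Theta(1/n)
        new_theta = []
        for r in range(len(theta)):
            # step 2: messages F(theta^n(t), xi^{n+1}(t, r)) from neighbours t of r
            incoming = sum(L[r][t] * F(theta[t], rng) for t in neighbors[r])
            # step 3: damped update at the receiving node r
            new_theta.append(theta[r] - eps * (L[r][r] * theta[r] + incoming))
        theta = new_theta
    return theta


def noiseless(value, rng):
    """Stand-in F with no corruption."""
    return value


def additive_noise(sigma):
    """Stand-in F adding independent zero-mean Gaussian noise per message."""
    def F(value, rng):
        return value + rng.gauss(0.0, sigma)
    return F


m = 6
neighbors = {r: [(r - 1) % m, (r + 1) % m] for r in range(m)}
# Normalized Laplacian (3) of the ring: every degree equals 2.
L = [[0.0] * m for _ in range(m)]
for r in range(m):
    L[r][r] = 1.0
    for t in neighbors[r]:
        L[r][t] = -0.5

x = [1.0, -2.0, 0.5, 3.0, -1.0, 4.5]
x_bar = sum(x) / m

theta_clean = consensus_run(L, neighbors, x, noiseless, 20000, random.Random(0))
theta_noisy = consensus_run(L, neighbors, x, additive_noise(0.3), 20000, random.Random(1))
```

In the noiseless run every entry of `theta_clean` ends up close to `x_bar`; `theta_noisy` runs the same loop with corrupted messages, and no particular numerical outcome is claimed for it here.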
Noiseless real number model: The simplest model, as considered by the bulk of past work on distributed averaging, assumes noiseless communication of real numbers. This model is a special case of the update (4) with $\xi^n(t, r) = 0$, and

$$F(\theta^n(t), \xi^{n+1}(t, r)) = \theta^n(t). \tag{6}$$

Additive edge-based noise model (AEN): In this model, the term $\xi^n(t, r)$ is a zero-mean additive random noise variable that is associated with the transmission $t \to r$, and the communication function takes the form

$$F(\theta^n(t), \xi^{n+1}(t, r)) = \theta^n(t) + \xi^{n+1}(t, r). \tag{7}$$

We assume that the random variables $\xi^{n+1}(t, r)$ and $\xi^{n+1}(t', r)$ are independent for distinct edges $(t', r)$ and $(t, r)$, and identically distributed with zero mean and variance $\sigma^2 = \mathrm{Var}(\xi^{n+1}(t, r))$.

Additive node-based noise model (ANN): In this model, the function $F$ takes the same form (7) as the edge-based noise model. However, the key distinction is that for each $t \in V$, we assume that

$$\xi^{n+1}(t, r) = \xi^{n+1}(t) \quad \text{for all } r \in N(t), \tag{8}$$

where $\xi^{n+1}(t)$ is a single noise variable associated with node $t$, with zero mean and variance $\sigma^2 = \mathrm{Var}(\xi^n(t))$. Thus, the random variables $\xi^{n+1}(t, r)$ and $\xi^{n+1}(t, r')$ are identical for all edges out-going from the transmitting node $t$.

Bit-constrained communication (BC): Suppose that the channel from node $v'$ to $v$ is bit-constrained, so that one can transmit at most $B$ bits, which is then subjected to random dithering. Under these assumptions, the communication function $F$ takes the form

$$F(\theta(v'), \xi(v', v)) = Q_B\big( \theta(v') + \xi(v', v) \big), \tag{9}$$

where $Q_B(\cdot)$ represents the $B$-bit quantization function with maximum value $M$ and $\xi(v', v)$ is random dithering.
We assume that the random dithering is applied prior to transmission across the channel out-going from vertex $v'$, so that $\xi(v', v) = \xi(v')$ is the same random variable across all neighbors $v \in N(v')$.

3 Main result and consequences

In this section, we first state our main result, concerning the stochastic behavior of the sequence $\{\theta^n\}$ generated by the updates (5). We then illustrate its consequences for the specific communication and noise models described in Section 2.2, and conclude with a discussion of behavior for specific graph structures.

3.1 Statement of main result

Consider the factor $L \odot Y$ that drives the updates (5). An important element of our analysis is the conditional covariance of this update factor, denoted by $\Sigma = \Sigma_\theta$ and given by

$$\Sigma_\theta := \mathbb{E}\left[ (L \odot Y(\theta, Z))\, \vec{1}\, \vec{1}^T (L \odot Y(\theta, Z))^T \mid \theta \right] - L\theta (L\theta)^T. \tag{10}$$

A little calculation shows that the $(i, j)$-th element of this matrix is given by

$$\Sigma_\theta(i, j) = \sum_{k, \ell = 1}^{m} L(i, k) L(j, \ell)\, \mathbb{E}\left[ Y(i, k) Y(j, \ell) - \theta(k)\theta(\ell) \mid \theta \right]. \tag{11}$$

Moreover, the eigenstructure of the consensus matrix $L$ plays an important role in our analysis. Since it is symmetric and positive semidefinite, we can write

$$L = U J U^T, \tag{12}$$

where $U$ is an $m \times m$ orthogonal matrix with columns defined by unit-norm eigenvectors of $L$, and $J := \mathrm{diag}\{\lambda_1(L), \ldots, \lambda_m(L)\}$ is a diagonal matrix of eigenvalues, with

$$0 = \lambda_1(L) < \lambda_2(L) \le \ldots \le \lambda_m(L). \tag{13}$$

It is convenient to let $\widetilde{U}$ denote the $m \times (m-1)$ matrix with columns defined by eigenvectors associated with positive eigenvalues of $L$, that is, excluding column $U_1 = \vec{1} / \|\vec{1}\|_2$, associated with the zero eigenvalue $\lambda_1(L) = 0$. With this notation, we have

$$\widetilde{J} = \mathrm{diag}\{\lambda_2(L), \ldots, \lambda_m(L)\} = \widetilde{U}^T L\, \widetilde{U}. \tag{14}$$

Theorem 1.
Consider the random sequence $\{\theta^n\}$ generated by the update (5) for some communication function $F$, consensus matrix $L$, and step size parameter $\epsilon_n = \Theta(1/n)$.

(a) In all cases, the sequence $\{\theta^n\}$ is a strongly consistent estimator of $\theta^* = \bar{x}\,\vec{1}$, meaning that $\theta^n \to \theta^*$ almost surely (a.s.).

(b) Furthermore, if the second smallest eigenvalue of the consensus matrix $L$ satisfies $\lambda_2(L) > 1/2$, then

$$\sqrt{n}\,(\theta^n - \theta^*) \xrightarrow{d} N\left( 0,\; U^T \begin{bmatrix} 0 & 0 \\ 0 & \widetilde{P} \end{bmatrix} U \right), \tag{15}$$

where the $(m-1) \times (m-1)$ matrix $\widetilde{P}$ is the solution of the continuous time Lyapunov equation

$$\Big( \widetilde{J} - \frac{I}{2} \Big) \widetilde{P} + \widetilde{P} \Big( \widetilde{J} - \frac{I}{2} \Big)^T = \widetilde{\Sigma}_{\theta^*}, \tag{16}$$

where $\widetilde{J}$ is the diagonal matrix (14), and $\widetilde{\Sigma}_{\theta^*} = \widetilde{U}^T \Sigma_{\theta^*} \widetilde{U}$ is the transformed version of the conditional covariance (10).

Theorem 1(a) asserts that the sequence $\{\theta^n\}$ is a strongly consistent estimator of the average. As opposed to weak consistency, this result guarantees that for almost any realization of the algorithm, the associated sample path converges to the exact consensus solution. Theorem 1(b) establishes that for appropriate choices of consensus matrices, the rate of MSE convergence is of order $1/n$, since the $\sqrt{n}$-rescaled error converges to a non-degenerate Gaussian limit. Such a rate is to be expected in the presence of sufficient noise, since the number of observations received by any given node (and hence the inverse variance of the estimate) scales as $n$. The solution of the Lyapunov equation (16) specifies the precise form of this asymptotic covariance, which (as we will see) depends on the graph structure.

3.2 Some consequences

Theorem 1 can be specialized to particular noise and communication models. Here we derive some of its consequences for the AEN, ANN and BC models.
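Because $\widetilde{J}$ in (16) is diagonal, the Lyapunov equation decouples entry by entry. The short worked step below uses only the symbols defined above, with the indexing of (14), in which the $i$-th diagonal entry of $\widetilde{J}$ is $\lambda_{i+1}(L)$; it is offered as a reading aid rather than as part of the original argument.

```latex
\Big[ \big(\widetilde{J} - \tfrac{I}{2}\big)\widetilde{P}
      + \widetilde{P}\big(\widetilde{J} - \tfrac{I}{2}\big)^T \Big]_{ij}
  = \big(\lambda_{i+1}(L) - \tfrac{1}{2}\big)\widetilde{P}_{ij}
    + \widetilde{P}_{ij}\big(\lambda_{j+1}(L) - \tfrac{1}{2}\big)
  = \big(\lambda_{i+1}(L) + \lambda_{j+1}(L) - 1\big)\,\widetilde{P}_{ij},
\qquad\text{so}\qquad
\widetilde{P}_{ij}
  = \frac{\widetilde{\Sigma}_{\theta^*}(i,j)}{\lambda_{i+1}(L) + \lambda_{j+1}(L) - 1},
\quad i, j = 1, \ldots, m-1 .
```

Every denominator is at least $2\lambda_2(L) - 1$, which is strictly positive under the condition $\lambda_2(L) > 1/2$ of Theorem 1(b), so each entry of $\widetilde{P}$ is well defined.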
For any model for which Theorem 1(b) holds, we define the average mean-squared error as

$$\mathrm{AMSE}(L; \theta^*) := \frac{1}{m}\, \mathrm{trace}(\widetilde{P}(\theta^*)), \tag{17}$$

corresponding to asymptotic error variance, averaged over nodes of the graph.

Corollary 1 (Asymptotic MSE for specific models). Given a consensus matrix $L$ with second-smallest eigenvalue $\lambda_2(L) > \frac{1}{2}$, the sequence $\{\theta^n\}$ is a strongly consistent estimator of the average $\theta^*$, with asymptotic MSE characterized as follows:

(a) For the additive edge-based noise (AEN) model (7):

$$\mathrm{AMSE}(L; \theta^*) \le \frac{\sigma^2}{m} \sum_{i=2}^{m} \left[ \frac{\max_{j = 1, \ldots, m} \sum_{k \neq j} L^2(j, k)}{2\lambda_i(L) - 1} \right]. \tag{18}$$

(b) For the additive node-based noise (ANN) model (8) and the bit-constrained (BC) model (9):

$$\mathrm{AMSE}(L; \theta^*) = \frac{\sigma^2}{m} \sum_{i=2}^{m} \left[ \frac{[\lambda_i(L)]^2}{2\lambda_i(L) - 1} \right], \tag{19}$$

where the variance term $\sigma^2$ is given by the quantization noise $\mathbb{E}\left[ Q_B(\theta + \xi)^2 - \theta^2 \mid \theta \right]$ for the BC model, and the noise variance $\mathrm{Var}(\xi(v'))$ for the ANN model.

Proof. The essential ingredient controlling the asymptotic MSE is the conditional covariance matrix $\Sigma_{\theta^*}$, which specifies $\widetilde{P}$ via the Lyapunov equation (16). For analyzing model AEN, it is useful to establish first the following auxiliary result. For each $i = 1, \ldots, m-1$, we have

$$\widetilde{P}_{ii} \le \frac{|||\Sigma_{\theta^*}|||_2}{2\lambda_{i+1}(L) - 1}, \tag{20}$$

where $|||\Sigma_{\theta^*}|||_2 = |||\Sigma|||_2$ is the spectral norm (maximum eigenvalue for a positive semidefinite symmetric matrix). To see this fact, note that

$$\widetilde{U}^T \Sigma\, \widetilde{U} \preceq \widetilde{U}^T \left[\, |||\Sigma|||_2\, I \,\right] \widetilde{U} = |||\Sigma|||_2\, I.$$

Since $\widetilde{P}$ satisfies the Lyapunov equation, we have

$$\Big( \widetilde{J} - \frac{I}{2} \Big) \widetilde{P} + \widetilde{P} \Big( \widetilde{J} - \frac{I}{2} \Big)^T \preceq |||\Sigma|||_2\, I.$$

Note that the diagonal entries of the matrix $\big( \widetilde{J} - \frac{I}{2} \big) \widetilde{P} + \widetilde{P} \big( \widetilde{J} - \frac{I}{2} \big)^T$ are of the form $(2\lambda_{i+1} - 1)\, \widetilde{P}_{ii}$.
The difference between the RHS and LHS matrices constitutes a positive semidefinite matrix, which must have a non-negative diagonal, implying the claimed inequality (20).

In order to use the bound (20), it remains to compute or upper bound the spectral norm $|||\Sigma|||_2$, which is most easily done using the elementwise representation (11).

(a) For the AEN model (7), we have

$$\mathbb{E}\left[ Y(i, k) Y(j, \ell) - \theta(k)\theta(\ell) \mid \theta \right] = \mathbb{E}\left[ \xi(i, k)\, \xi(j, \ell) \right]. \tag{21}$$

Since we have assumed that the random variables $\xi(i, k)$ on each edge $(i, k)$ are i.i.d., with zero mean and variance $\sigma^2$, we have

$$\mathbb{E}\left[ Y(i, k) Y(j, \ell) - \theta(k)\theta(\ell) \mid \theta \right] = \begin{cases} \sigma^2 & \text{if } (i, k) = (j, \ell) \text{ and } i \neq k \\ 0 & \text{otherwise.} \end{cases}$$

Consequently, from the elementwise expression (11), we conclude that $\Sigma$ is diagonal, with entries

$$\Sigma(j, j) = \sigma^2 \sum_{k \neq j} L^2(k, j),$$

so that $|||\Sigma|||_2 = \sigma^2 \max_{j = 1, \ldots, m} \sum_{k \neq j} L^2(j, k)$, which establishes the claim (18).

(b) For the BC model (9), we have

$$\mathbb{E}\left[ Y(i, k) Y(j, \ell) - \theta(k)\theta(\ell) \mid \theta \right] = \begin{cases} \sigma^2_{\mathrm{qnt}} & \text{if } i = j \text{ and } k = \ell \\ 0 & \text{otherwise,} \end{cases} \tag{22}$$

where $\sigma^2_{\mathrm{qnt}} := \mathbb{E}\left[ Q_B(\theta + \xi)^2 - \theta^2 \mid \theta \right]$ is the quantization noise. Therefore, we have $\Sigma(\theta^*) = \sigma^2_{\mathrm{qnt}} L^2$, and using the fact that $\widetilde{U}$ consists of eigenvectors of $L$ (and hence also of $L^2$), the Lyapunov equation (16) takes the form

$$\Big( \widetilde{J} - \frac{I}{2} \Big) \widetilde{P} + \widetilde{P} \Big( \widetilde{J} - \frac{I}{2} \Big)^T = \sigma^2_{\mathrm{qnt}}\, (\widetilde{J})^2,$$

which has the explicit diagonal solution $\widetilde{P}$ with entries $\widetilde{P}_{ii} = \frac{\sigma^2_{\mathrm{qnt}}\, \lambda^2_{i+1}(L)}{2\lambda_{i+1}(L) - 1}$. Computing the asymptotic MSE $\frac{1}{m} \sum_{i=1}^{m-1} \widetilde{P}_{ii}$ yields the claim (19). The proof of the same claim for the ANN model is analogous.

3.3 Scaling behavior for specific graph classes

We can obtain further insight by considering Corollary 1 for specific graphs, and particular choices of consensus matrices $L$. For a fixed graph $G$, consider the graph Laplacian $L(G)$ defined in equation (3).
It is easy to see that $L(G)$ is always positive semi-definite, with minimal eigenvalue $\lambda_1(L(G)) = 0$, corresponding to the constant vector. For a connected graph, the second smallest eigenvalue of $L(G)$ is strictly positive [7]. Therefore, given an undirected graph $G$ that is connected, the most straightforward manner in which to obtain a consensus matrix $L$ satisfying the conditions of Corollary 1 is to rescale the graph Laplacian $L(G)$, as defined in equation (3), by its second smallest eigenvalue $\lambda_2(L(G))$, thereby forming the rescaled consensus matrix

$$R(G) := \frac{1}{\lambda_2(L(G))}\, L(G), \tag{23}$$

with $\lambda_2(R(G)) = 1 > \frac{1}{2}$.

With this choice of consensus matrix, let us consider the implications of Corollary 1(b), in application to the additive node-based noise (ANN) model, for various graphs. We begin with a simple lemma, proved in Appendix A, showing that, up to constants, the scaling behavior of the asymptotic MSE is controlled by the second smallest eigenvalue $\lambda_2(L(G))$.

Lemma 1. For any connected graph $G$, using the rescaled Laplacian consensus matrix (23), the asymptotic MSE for the ANN model (8) satisfies the bounds

$$\frac{\sigma^2}{2\lambda_2(L(G))} \le \mathrm{AMSE}(R(G); \theta^*) \le \frac{\sigma^2}{\lambda_2(L(G))}, \tag{24}$$

where $\lambda_2(L(G))$ is the second smallest eigenvalue of the graph Laplacian.

Combined with known results from spectral graph theory [7], Lemma 1 allows us to make specific predictions about the number of iterations required, for a given graph topology of a given size $m$, to reduce the asymptotic MSE to any $\delta > 0$: in particular, the required number of iterations scales as

$$n = \Theta\left( \frac{\sigma^2}{\lambda_2(L(G))}\, \frac{1}{\delta} \right). \tag{25}$$

Note that this scaling is similar to, but different from, the scaling of noiseless updates [6, 9], where the MSE is (with high probability) upper bounded by $\delta$ for $n = \Theta\left( \frac{\log(1/\delta)}{-\log(1 - \lambda_2(L(G)))} \right)$, which scales as

$$n = \Theta\left( \frac{\log(1/\delta)}{\lambda_2(L(G))} \right), \tag{26}$$

for a decaying spectral gap $\lambda_2(L(G)) \to 0$.

3.4 Illustrative simulations

We illustrate the predicted scaling (25) by some simulations on different classes of graphs. For all experiments reported here, we set the step size parameter $\epsilon_n = \frac{1}{n + 100}$. The additive offset serves to ensure stability of the updates in very early rounds, due to the possibly large gain specified by the rescaled Laplacian (23). We performed experiments for a range of graph sizes, for the additive node noise (ANN) model (8), with noise variance $\sigma^2 = 0.1$ in all cases. For each graph size $m$, we measured the number of iterations $n$ required to reach a fixed level $\delta$ of mean-squared error.

3.4.1 Cycle graph

Consider the ring graph $C_m$ on $m$ vertices, as illustrated in Figure 2(a). Panel (b) provides a log-log plot of the MSE versus the iteration number $n$; each trace corresponds to a particular sample path. Notice how the MSE over each sample converges to zero. Moreover, since Theorem 1 predicts that the MSE should drop off as $1/n$, the linear rate shown in this log-log plot is consistent. Figure 2(c) plots the number of iterations (vertical axis) required to achieve a given constant MSE versus the size of the ring graph (horizontal axis). For the ring graph, it can be shown (see Chung [7]) that the second smallest eigenvalue scales as $\lambda_2(L(C_m)) = \Theta(1/m^2)$, which implies that the number of iterations to achieve a fixed MSE for a ring graph with $m$ vertices should scale as $n = \Theta(m^2)$.
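The $\Theta(1/m^2)$ scaling and the bounds (24) can be checked numerically for the ring. The sketch below relies on the standard closed form $1 - \cos(2\pi k/m)$, $k = 0, \ldots, m-1$, for the eigenvalues of the ring's normalized Laplacian; that formula is a known spectral graph theory fact and is an assumption here in the sense that this paper does not derive it. The function names are illustrative, and $\sigma^2 = 0.1$ matches the value quoted for the simulations.

```python
import math


def ring_laplacian_eigenvalues(m):
    """Sorted eigenvalues 1 - cos(2*pi*k/m) of the ring's normalized Laplacian."""
    return sorted(1.0 - math.cos(2.0 * math.pi * k / m) for k in range(m))


def amse_ann_rescaled(m, sigma2):
    """Evaluate formula (19) for R(G) = L(G) / lambda_2 on the ring with m nodes.

    The eigenvalues of R(G) are lambda_i / lambda_2, so the smallest positive
    one equals 1 and the condition lambda_2(R) > 1/2 holds.
    """
    lam = ring_laplacian_eigenvalues(m)
    lam2 = lam[1]
    mu = [value / lam2 for value in lam[1:]]  # positive eigenvalues of R(G)
    amse = sigma2 / m * sum(u * u / (2.0 * u - 1.0) for u in mu)
    return amse, lam2


sigma2 = 0.1
rows = []
for m in (10, 20, 40, 80):
    amse, lam2 = amse_ann_rescaled(m, sigma2)
    lower = sigma2 / (2.0 * lam2)  # left-hand side of (24)
    upper = sigma2 / lam2          # right-hand side of (24)
    rows.append((m, lam2, amse, lower, upper))

for m, lam2, amse, lower, upper in rows:
    print(m, lam2 * m * m, lower, amse, upper)
```

The second printed column, $\lambda_2 m^2$, stays nearly constant across sizes, which is the quadratic scaling used in the text; the last three columns show the AMSE sitting between the two bounds of Lemma 1.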
Consistent with this prediction, the plot in Figure 2(c) shows a quadratic scaling; in particular, note the excellent agreement between the theoretical prediction and the data.

[Figure 2, panels (a)-(c). Panel (b) axes: Steps (horizontal) versus Mean Squared Error (vertical), log-log. Panel (c) axes: Nodes (horizontal) versus Steps to ε = 1 (vertical).]

Figure 2. Comparison of empirical simulations to theoretical predictions for the ring graph in panel (a). (b) Sample path plots of log MSE versus log iteration number: as predicted by the theory, the log MSE scales linearly with log iterations. (c) Plot of number of iterations (vertical axis) required to reach a fixed level of MSE versus the graph size (horizontal axis). For the ring graph, this quantity scales quadratically in the graph size, consistent with Corollary 1.

3.4.2 Lattice model

Figure 3(a) shows the two-dimensional four nearest-neighbor lattice graph with $m$ vertices, denoted $F_m$. Again, panel (b) corresponds to a log-log plot of the MSE versus the iteration number $n$, with each trace corresponding to a particular sample path, again showing a linear rate of convergence to zero. Panel (c) shows the number of iterations required to achieve a constant MSE as a function of the graph size. For the lattice, it is known [7] that $\lambda_2(L(F_m)) = \Theta(1/m)$, which implies that the critical number of iterations should scale as $n = \Theta(m)$. Note that panel (c) shows linear scaling, again consistent with the theory.

3.4.3 Expander graphs

Consider a bipartite graph $G = (V_1, V_2, E)$, with $m = |V_1| + |V_2|$ vertices and edges joining only vertices in $V_1$ to those in $V_2$, and constant degree $d$; see Figure 4(a) for an illustration with $d = 3$.
[Figure 3, panels (a)-(c). Panel (b) axes: Steps (horizontal) versus Mean Squared Error (vertical), log-log. Panel (c) axes: Nodes (horizontal) versus Steps to ε = 1 (vertical).]

Figure 3. Comparison of empirical simulations to theoretical predictions for the four nearest-neighbor lattice (panel (a)). (b) Sample path plots of log MSE versus log iteration number: as predicted by the theory, the log MSE scales linearly with log iterations. (c) Plot of number of iterations (vertical axis) required to reach a fixed level of MSE versus the graph size (horizontal axis). For the lattice graph, this quantity scales linearly in the graph size, consistent with Corollary 1.

A bipartite graph of this form is an expander [1, 2, 7] with parameters $\alpha, \delta \in (0, 1)$, if for all subsets $S \subset V_1$ of size $|S| \le \alpha |V_1|$, the neighborhood set of $S$, namely the subset

$$N(S) := \{ t \in V_2 \mid (s, t) \in E \text{ for some } s \in S \},$$

has cardinality $|N(S)| \ge \delta d |S|$. Intuitively, this property guarantees that each subset of $V_1$, up to some critical size, "expands" to a relatively large number of neighbors in $V_2$. (Note that the maximum size of $|N(S)|$ is $d|S|$, so that $\delta$ close to 1 guarantees that the neighborhood size is close to its maximum, for all possible subsets $S$.) Expander graphs have a number of interesting theoretical properties, including the property that $\lambda_2(L(K_m)) = \Theta(1)$, that is, a bounded spectral gap [1, 7].

In order to investigate the behavior of our algorithm for expanders, we construct a random bipartite graph as follows: for an even number of nodes $m$, we split them into two subsets $V_i$, $i = 1, 2$, each of size $m/2$. We then fix a degree $d$, construct a random matching on $d\,\frac{m}{2}$ nodes, and use it to connect the vertices in $V_1$ to those in $V_2$.
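One possible reading of this construction, offered as a sketch under stated assumptions rather than as the authors' code: give every vertex $d$ edge "stubs", and pair the $d\,m/2$ stubs on the $V_1$ side with the $d\,m/2$ stubs on the $V_2$ side through a uniformly random permutation. The function name and the vertex numbering are hypothetical, and this version may produce repeated edges between the same pair of vertices, which the text does not discuss.

```python
import random


def random_bipartite_regular(m, d, rng):
    """Return a list of edges (u, v) with u in V1 = {0, ..., m/2 - 1}
    and v in V2 = {m/2, ..., m - 1}; every vertex has exactly d edge ends.

    Repeated (u, v) pairs are possible, so the result is in general a multigraph.
    """
    if m % 2 != 0:
        raise ValueError("m must be even")
    half = m // 2
    left_stubs = [u for u in range(half) for _ in range(d)]
    right_stubs = [v for v in range(half, m) for _ in range(d)]
    rng.shuffle(right_stubs)  # random perfect matching between the two stub lists
    return list(zip(left_stubs, right_stubs))


m, d = 20, 3
edges = random_bipartite_regular(m, d, random.Random(0))

# Degree of every vertex, counting repeated edges with multiplicity.
degree = [0] * m
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
```

A simple graph could be obtained by redrawing the permutation until no pair repeats; whether the original experiments did so is not stated in the text.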
This procedure forms a random bipartite $d$-regular graph; using the probabilistic method, it can be shown to be an edge-expander with probability $1 - o(1)$, as the graph size tends to infinity [1, 11]. Given the constant spectral gap $\lambda_2(L(K_m)) = \Theta(1)$, the scaling in number of iterations to achieve constant MSE is $n = \Theta(1)$. This theoretical prediction is compared to simulation results in Figure 4; note how the number of iterations soon settles down to a constant, as predicted by the theory.

[Figure 4, panels (a)-(c). Panel (b) axes: Steps (horizontal) versus Mean Squared Error (vertical), log-log. Panel (c) axes: Nodes (horizontal) versus Steps to ε = 1 (vertical).]

Figure 4. Comparison of empirical simulations to theoretical predictions for the bipartite expander graph in panel (a). (b) Sample path plots of log MSE versus log iteration number: as predicted by the theory, the log MSE scales linearly with log iterations. (c) Plot of number of iterations (vertical axis) required to reach a fixed level of MSE versus the graph size (horizontal axis). For an expander, this quantity remains essentially constant with the graph size, consistent with Corollary 1.

4 Proof of Theorem 1

We now turn to the proof of Theorem 1. The basic idea is to relate the behavior of the stochastic recursion (5) to an ordinary differential equation (ODE), and then use the ODE method [15] to analyze its properties. The ODE involves a function $t \mapsto \theta_t \in \mathbb{R}^m$, with its specific structure depending on the communication and noise model under consideration. For the AEN and ANN models, the relevant ODE is given by

$$\frac{d\theta_t}{dt} = -L\theta_t. \tag{27}$$

For the BC model, the approximating ODE is given by

$$\frac{d\theta_t}{dt} = -L\, C_M(\theta_t) \quad \text{with} \quad C_M(u) := \begin{cases} u & \text{if } |u| < M \\ -M & \text{if } u \le -M \\ +M & \text{if } u \ge +M. \end{cases} \tag{28}$$

In both cases, the ODE must satisfy the initial condition $\theta_0(v) = x(v)$.
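To see what the ODEs (27) and (28) do, they can be integrated numerically. The sketch below uses a forward Euler scheme on a six-node ring; the integration scheme, the step `dt`, the graph, the data `x` and the saturation level `M = 10` (chosen larger than every $|x(v)|$) are assumptions made for illustration only.

```python
def saturate(u, M):
    """Saturation function C_M from (28), applied to a scalar."""
    return max(-M, min(M, u))


def euler_consensus_ode(L, x, dt, steps, M=None):
    """Forward-Euler integration of d(theta)/dt = -L C_M(theta).

    With M=None the saturation is skipped, which gives the linear ODE (27).
    """
    theta = list(x)  # initial condition theta_0(v) = x(v)
    m = len(theta)
    for _ in range(steps):
        inner = theta if M is None else [saturate(u, M) for u in theta]
        drift = [-sum(L[r][t] * inner[t] for t in range(m)) for r in range(m)]
        theta = [theta[r] + dt * drift[r] for r in range(m)]
    return theta


m = 6
# Normalized Laplacian (3) of the ring on 6 nodes (every degree equals 2).
L = [[0.0] * m for _ in range(m)]
for r in range(m):
    L[r][r] = 1.0
    L[r][(r - 1) % m] = -0.5
    L[r][(r + 1) % m] = -0.5

x = [1.0, -2.0, 0.5, 3.0, -1.0, 4.5]
x_bar = sum(x) / m

theta_linear = euler_consensus_ode(L, x, dt=0.1, steps=400)
theta_saturated = euler_consensus_ode(L, x, dt=0.1, steps=400, M=10.0)
```

Both trajectories settle at the constant vector with entries `x_bar`. When `M` exceeds every $|x(v)|$ the saturation is never active in this run, so the two trajectories coincide.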
4.1 Proof of Theorem 1(a)

The following result connects the discrete-time stochastic process $\{\theta^n\}$ to the deterministic ODE solution, and establishes Theorem 1(a):

Lemma 2. The ODEs (27) and (28) each have $\theta^* = \bar{x}\,\vec{1}$ as their unique stable fixed point. Moreover, for all $\delta > 0$, we have

$$\mathbb{P}\left[ \limsup_{n \to \infty} \| \theta^n - \theta_{t_n} \| > \delta \right] = 0, \quad \text{for } t_n = \sum_{k=1}^{n} \frac{1}{k}, \tag{29}$$

which implies that $\theta^n \to \theta^*$ almost surely.

Proof. We prove this lemma by using the ODE method and stochastic approximation, in particular Theorem 1 from Kushner and Yin [14], which connects stochastic recursions of the form (5) to the ordinary differential equation $d\theta_t/dt = \mathbb{E}_\xi\left[ -\left( L \odot Y(\theta_t, \xi) \right) \vec{1} \mid \theta_t \right]$. Using the definition of $Y$ in terms of $F$, for the AEN and ANN models, we have

$$\mathbb{E}_\xi\left[ F(\theta(v), \xi(v, r)) \mid \theta(v) \right] = \theta(v),$$

from which we conclude that with the stepsize choice $\epsilon_n = \Theta(1/n)$, we have $\mathbb{E}_\xi\left[ -\left( L \odot Y(\theta_t, \xi) \right) \vec{1} \mid \theta_t \right] = -L\theta_t$. By our assumptions on the eigenstructure of $L$, the system $d\theta_t/dt = -L\theta_t$ is globally asymptotically stable, with a line of fixed points $\{\theta \in \mathbb{R}^m \mid L\theta = 0\}$. Given the initial condition $\theta_0(v) = x(v)$, we conclude that $\theta^* = \bar{x}\,\vec{1}$ is the unique asymptotically stable fixed point of the ODE, so that the claim (29) follows from Kushner and Yin [14].

For the BC model, the analysis is somewhat more involved, since the quantization function saturates the output at $\pm M$. For the dithered quantization model (9), we have

$$\mathbb{E}_\xi\left[ -\left( L \odot Y(\theta_t, \xi) \right) \vec{1} \mid \theta_t \right] = -L\, C_M(\theta_t),$$

where $C_M(\cdot)$ is the saturation function (28). We now claim that $\theta^*$ is also the unique asymptotically stable fixed point of the ODE $d\theta_t/dt = -L\, C_M(\theta_t)$ subject to the initial condition $\theta_0(v) = x(v)$. Consider the eigendecomposition $L = U J U^T$, where $J = \mathrm{diag}\{0, \lambda_2(L), \ldots, \lambda_m(L)\}$.
Define the rotated variable $\gamma_t := U^T \theta_t$, so that the ODE (28) can be rewritten as

$$\frac{d\gamma_t(1)}{dt} = 0, \tag{30a}$$

$$\frac{d\gamma_t(k)}{dt} = -\lambda_k(L)\, U_k^T C_M(U\gamma_t), \quad \text{for } k = 2, \ldots, m, \tag{30b}$$

where $U_k$ denotes the $k$-th column of $U$. Note that $U_1 = \vec{1}/\|\vec{1}\|_2$, since it is associated with the eigenvalue $\lambda_1(L) = 0$. Consequently, the solution to equation (30a) takes the form

$$\gamma_t(1) = U_1^T \theta_0 = \sqrt{m}\,\bar{x}, \tag{31}$$

with unique fixed point $\gamma^*(1) = \sqrt{m}\,\bar{x}$, where $\bar{x} := \frac{1}{m}\sum_{i=1}^{m} x(i)$ is the average value.

A fixed point $\gamma^* \in \mathbb{R}^m$ for equations (30b) requires that $U_k^T C_M(U\gamma^*) = 0$ for $k = 2, \ldots, m$. Given that the columns of $U$ form an orthogonal basis, this implies that $C_M(U\gamma^*) = \alpha\,\vec{1}$ for some constant $\alpha \in \mathbb{R}$, or equivalently (given the connection $U\gamma^* = \theta^*$)

$$C_M(\theta^*) = \alpha\,\vec{1}. \tag{32}$$

Given the piecewise linear nature of the saturation function, this equality leaves three options: the fixed point satisfies the elementwise inequality $\theta^* \ge M$ (if $\alpha = M$); or it satisfies the elementwise inequality $\theta^* \le -M$ (if $\alpha = -M$); or, as the final option, $\theta^* = \alpha\,\vec{1}$ with $\alpha \in (-M, +M)$. From equation (31), we know that $\gamma^*(1) = \sqrt{m}\,\bar{x} \in [-M\sqrt{m}, +M\sqrt{m}]$. We also have $\gamma^*(1) = \frac{\vec{1}^T}{\sqrt{m}}\theta^*$ by definition, so that putting together the pieces yields

$$-M < \frac{\vec{1}^T \theta^*}{m} < M. \tag{33}$$

Thus the only possibility is that $\theta^* = \alpha\,\vec{1}$ for some constant $\alpha \in (-M, +M)$, and the relation $U\gamma^* = \alpha\,\vec{1}$ implies that $\alpha = \gamma^*(1)/\sqrt{m} = \bar{x}$, which establishes the claim.

4.2 Proof of Theorem 1(b)

We analyze the update (5) using results from Benveniste et al. [4]. In particular, given the stochastic iteration $\theta_{n+1} = \theta_n + \epsilon_n H(\theta_n, Y_{n+1})$, define the expectation $h(\theta) = \mathbb{E}[H(\theta, X)]$, its Jacobian matrix $\nabla h(\theta)$, and the covariance matrix $\Sigma(\theta) = \mathbb{E}\big[(H(\theta, X) - h(\theta))(H(\theta, X) - h(\theta))^T\big]$. Then Theorem 3 (p.
110) of Benveniste et al. [4] asserts that as long as the eigenvalues $\lambda(\nabla h(\theta))$ are strictly below $-1/2$, then

$$\sqrt{n}\,(\theta_n - \theta^*) \xrightarrow{d} N(0, Q), \tag{34}$$

where the covariance matrix $Q$ is the unique solution to the Lyapunov equation

$$\Big(\frac{I}{2} + \nabla h(\theta^*)\Big) Q + Q \Big(\frac{I}{2} + \nabla h(\theta^*)\Big)^T + \Sigma_{\theta^*} = 0. \tag{35}$$

We begin by computing the conditional expectation $h(\theta)$; for the models AEN and ANN it takes the form

$$h(\theta) = -L\theta, \tag{36}$$

since the conditional expectation of the random matrix $Y$ is given by $\mathbb{E}[Y \mid \theta] = \theta\,\vec{1}$. For the BC model, since the quantization is finite with maximum value $M$, the expectation is given by

$$h(\theta) = -L\, C_M(\theta), \tag{37}$$

where the saturation function was defined previously in (28). In addition, we computed the form of the covariance matrix $\Sigma_{\theta^*}$ previously in (10). Finally, we note that

$$\nabla h(\theta^*) = -L \tag{38}$$

for all three models. (This fact is immediate for models AEN and ANN; for the BC model, note that Theorem 1(a) guarantees that $\theta^*$ falls in the middle linear portion of the saturation function.)

We cannot immediately conclude that asymptotic normality (34) holds, because the matrix $L$ has a zero eigenvalue ($\lambda_1(L) = 0$). However, let us decompose $L = U J U^T$, where $U$ is the matrix with unit-norm eigenvectors as columns, and $J = \operatorname{diag}\{0, \lambda_2(L), \ldots, \lambda_m(L)\}$. Let $\tilde{U}$ denote the $m \times (m-1)$ matrix obtained by deleting the first column of $U$. Defining the $(m-1)$-vector $\beta_n = \tilde{U}^T \theta_n$, we can rewrite the update in $(m-1)$-dimensional space as

$$\beta_{n+1} = \beta_n + \frac{1}{n}\Big[-\tilde{U}^T \big(L \odot Y_{n+1}(\theta_n)\big)\vec{1}\Big], \tag{39}$$

for which the new effective $h$ function is given by $\tilde{h}(\beta) = -\tilde{J}\beta$, with $\tilde{J} = \operatorname{diag}\{\lambda_2(L), \ldots, \lambda_m(L)\}$.
Since $\lambda_2(L) > \frac{1}{2}$ by assumption, the asymptotic normality (34) applies to this reduced iteration, so that we can conclude that

$$\sqrt{n}\,(\beta_n - \beta^*) \xrightarrow{d} N(0, \tilde{P}),$$

where $\tilde{P}$ solves the Lyapunov equation

$$\Big(\tilde{J} - \frac{I}{2}\Big)\tilde{P} + \tilde{P}\Big(\tilde{J} - \frac{I}{2}\Big)^T = \tilde{U}^T \Sigma_{\theta^*} \tilde{U}.$$

We conclude by noting that, since $\theta_n = U (U^T \theta_n)$ and the first coordinate of $U^T \theta_n$ is constant by (31), the asymptotic covariance of $\theta_n$ is related to that of $\beta_n$ by the relation

$$P = U \begin{bmatrix} 0 & 0 \\ 0 & \tilde{P} \end{bmatrix} U^T, \tag{40}$$

from which Theorem 1(b) follows.

5 Discussion

This paper analyzed the convergence and asymptotic behavior of distributed averaging algorithms on graphs with general noise models. Using suitably damped updates, we showed that it is possible to obtain exact consensus, as opposed to approximate or near consensus, even in the presence of noise. We guaranteed almost sure convergence of our algorithms under fairly general conditions, and moreover, under suitable stability conditions, we showed that the error is asymptotically normal, with a covariance matrix that can be predicted from the structure of the consensus operator. We provided a number of simulations that illustrate the sharpness of these theoretical predictions. Although the current paper has focused exclusively on the averaging problem, the methods of analysis in this paper are applicable to other types of distributed inference problems, such as computing quantiles or order statistics, as well as computing various types of M-estimators. Obtaining analogous results for more general problems of distributed statistical inference is an interesting direction for future research.

Acknowledgements

This work was presented in part at the Allerton Conference on Control, Computing and Communication, September 2007. Work funded by NSF grants DMS-0605165 and CCF-0545862 CAREER to MJW. The authors thank Pravin Varaiya and Alan Willsky for helpful comments.
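As a numerical companion to the proof of Theorem 1(b) in Section 4.2: because $\tilde{J}$ is diagonal, the reduced Lyapunov equation decouples entrywise. Writing $S = \tilde{U}^T \Sigma_{\theta^*} \tilde{U}$, entry $(i, j)$ of the left-hand side equals $(\lambda_i + \lambda_j - 1)\tilde{P}_{ij}$, so $\tilde{P}_{ij} = S_{ij}/(\lambda_i + \lambda_j - 1)$. The sketch below implements this observation; the function names, the example eigenvalues, and the matrix `S` are hypothetical and serve only as an illustration.

```python
def solve_reduced_lyapunov(eigs, S):
    """Solve (J - I/2) P + P (J - I/2)^T = S for diagonal J = diag(eigs).

    Entry (i, j) of the left-hand side equals (eigs[i] + eigs[j] - 1) * P[i][j],
    so each entry of P can be solved for independently.
    """
    if min(eigs) <= 0.5:
        raise ValueError("stability condition requires every eigenvalue > 1/2")
    k = len(eigs)
    return [[S[i][j] / (eigs[i] + eigs[j] - 1.0) for j in range(k)]
            for i in range(k)]


def lyapunov_residual(eigs, P, S):
    """Largest absolute entry of (J - I/2) P + P (J - I/2)^T - S."""
    k = len(eigs)
    return max(
        abs((eigs[i] - 0.5) * P[i][j] + P[i][j] * (eigs[j] - 0.5) - S[i][j])
        for i in range(k) for j in range(k)
    )


# Illustrative values standing in for lambda_2(L), ..., lambda_m(L).
eigs = [1.0, 1.0, 2.0]
# Hypothetical symmetric right-hand side standing in for U~^T Sigma U~.
S = [[1.0, 0.2, 0.0],
     [0.2, 1.0, 0.1],
     [0.0, 0.1, 1.0]]
P = solve_reduced_lyapunov(eigs, S)
```

The guard on `min(eigs)` mirrors the stability assumption $\lambda_2(L) > \frac{1}{2}$: without it, a denominator $\lambda_i + \lambda_j - 1$ could be zero or negative and the formula would no longer describe a valid covariance.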
A Proof of Lemma 1

We begin by noting that for the normalized graph Laplacian $L(G)$ of any graph, the second smallest eigenvalue satisfies the upper bound $\lambda_2(L(G)) \le m/(m-1)$. Moreover, we have $\operatorname{trace}(L(G)) = m$. See Lemma 1.7 in Chung [7] for proofs of these claims.

Using these facts, we establish Lemma 1 as follows. Recall that by construction, we have $R(G) = \frac{L(G)}{\lambda_2(L(G))}$, so that the second smallest eigenvalue of $R(G)$ is $\lambda_2(R(G)) = 1$, and the remaining eigenvalues are greater than or equal to one. Applying Corollary 1 to the ANN model, we have

$$\begin{aligned}
\mathrm{AMSE}(L; \theta^*) &= \frac{\sigma^2}{m} \sum_{i=2}^{m} \frac{[\lambda_i(R(G))]^2}{2\lambda_i(R(G)) - 1} \\
&= \frac{\sigma^2}{m\,\lambda_2(L(G))} \sum_{i=2}^{m} \frac{[\lambda_i(L(G))]^2}{2\lambda_i(L(G)) - \lambda_2(L(G))} \\
&\ge \frac{\sigma^2}{2\,\lambda_2(L(G))\,m} \operatorname{trace}(L(G)) = \frac{\sigma^2}{2\,\lambda_2(L(G))},
\end{aligned}$$

where the inequality holds because $0 < 2\lambda_i(L(G)) - \lambda_2(L(G)) \le 2\lambda_i(L(G))$ for $i \ge 2$, and the final step uses the fact that $\operatorname{trace}(L(G)) = m$. In the other direction, using the fact that $\lambda_i(R(G)) \ge 1$ for $i \ge 2$ and the bound $\frac{x^2}{2x - 1} \le x$ for $x \ge 1$, we have

$$\begin{aligned}
\mathrm{AMSE}(L; \theta^*) &= \frac{\sigma^2}{m} \sum_{i=2}^{m} \frac{[\lambda_i(R(G))]^2}{2\lambda_i(R(G)) - 1} \\
&\le \frac{\sigma^2}{m} \operatorname{trace}(R(G)) = \frac{\sigma^2}{\lambda_2(L(G))\,m} \operatorname{trace}(L(G)) = \frac{\sigma^2}{\lambda_2(L(G))}.
\end{aligned}$$

References

[1] N. Alon. Eigenvalues and expanders. Combinatorica, 6(2):83–96, 1986.
[2] N. Alon and J. Spencer. The Probabilistic Method. Wiley Interscience, New York, 2000.
[3] T. C. Aysal, M. Coates, and M. Rabbat. Distributed average consensus using probabilistic quantization. In IEEE Workshop on Stat. Sig. Proc., Madison, WI, August 2007.
[4] A. Benveniste, M. Metivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer-Verlag, New York, NY, 1990.
[5] V. Borkar and P. Varaiya. Asymptotic agreement in distributed estimation. IEEE Trans. Auto. Control, 27(3):650–655, 1982.
[6] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6):2508–2530, 2006.
[7] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, Providence, RI, 1991.
[8] M. H. deGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, March 1974.
[9] A. G. Dimakis, A. Sarwate, and M. J. Wainwright. Geographic gossip: Efficient averaging for sensor networks. IEEE Trans. Signal Processing, 53:1205–1216, March 2008.
[10] F. Fagnani and S. Zampieri. Average consensus with packet drop communication. SIAM J. on Control and Optimization, 2007. To appear.
[11] J. Feldman, T. Malkin, R. A. Servedio, C. Stein, and M. J. Wainwright. LP decoding corrects a constant fraction of errors. IEEE Trans. Information Theory, 53(1):82–89, January 2007.
[12] A. Kashyap, T. Basar, and R. Srikant. Quantized consensus. Automatica, 43:1192–1203, 2007.
[13] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. Proc. 44th Ann. IEEE FOCS, pages 482–491, 2003.
[14] H. J. Kushner and G. G. Yin. Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York, NY, 1997.
[15] L. Ljung. Analysis of recursive stochastic algorithms. IEEE Transactions in Automatic Control, 22:551–575, 1977.
[16] R. Olfati-Saber, J. A. Fax, and R. M. Murray. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1):215–233, 2007.
[17] I. D. Schizas, A. Ribeiro, and G. B. Giannakis. Consensus in ad hoc WSNs with noisy links: Part I distributed estimation of deterministic signals. IEEE Transactions on Signal Processing, 56(1):350–364, 2008.
[18] J. Tsitsiklis. Problems in decentralized decision-making and computation. PhD thesis, Department of EECS, MIT, 1984.
[19] L. Xiao and S. Boyd. Fast linear iterations for distributed averaging. Systems & Control Letters, 52:65–78, 2004.
[20] L. Xiao, S. Boyd, and S.-J. Kim. Distributed average consensus with least-mean-square deviation. Journal of Parallel and Distributed Computing, 67(1):33–46, 2007.
[21] M. E. Yildiz and A. Scaglione. Differential nested lattice encoding for consensus problems. In Info. Proc. Sensor Networks (IPSN), Cambridge, MA, April 2007.
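As a supplementary check on the bounds derived in Appendix A, the sketch below evaluates the AMSE expression from that appendix on two graphs whose normalized-Laplacian spectra are known in closed form (the cycle and the complete graph) and compares the result with $\sigma^2/(2\lambda_2)$ and $\sigma^2/\lambda_2$. The choice of graphs, the graph sizes, the value $\sigma^2 = 1$, and all function names are assumptions made for illustration.

```python
import math


def cycle_eigs(m):
    """Normalized-Laplacian eigenvalues 1 - cos(2*pi*k/m) of the m-cycle, ascending."""
    return sorted(1.0 - math.cos(2.0 * math.pi * k / m) for k in range(m))


def complete_eigs(m):
    """Normalized-Laplacian eigenvalues of the complete graph on m nodes, ascending."""
    return [0.0] + [m / (m - 1.0)] * (m - 1)


def amse(lap_eigs, sigma2=1.0):
    """AMSE expression from Appendix A for the ANN model.

    lap_eigs holds the eigenvalues of L(G) in ascending order with
    lap_eigs[0] == 0; dividing by lambda_2 gives the eigenvalues of R(G).
    """
    m = len(lap_eigs)
    lam2 = lap_eigs[1]
    rescaled = [lam / lam2 for lam in lap_eigs[1:]]
    return sigma2 / m * sum(x * x / (2.0 * x - 1.0) for x in rescaled)


def lemma1_bounds(lap_eigs, sigma2=1.0):
    """Lower and upper bounds sigma^2 / (2 lambda_2) and sigma^2 / lambda_2."""
    lam2 = lap_eigs[1]
    return sigma2 / (2.0 * lam2), sigma2 / lam2


cycle = cycle_eigs(10)       # small spectral gap
complete = complete_eigs(6)  # all nonzero eigenvalues equal
```

For the complete graph every rescaled eigenvalue equals one, so the upper bound is attained exactly; for the cycle the spread of eigenvalues places the AMSE strictly between the two bounds.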
