Distributed Kalman Filter via Gaussian Belief Propagation
Danny Bickson, IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel. Email: dannybi@il.ibm.com
Ori Shental, Center for Magnetic Recording Research, UCSD, 9500 Gilman Drive, La Jolla, CA 92093, USA. Email: oshental@ucsd.edu
Danny Dolev, School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem 91904, Israel. Email: dolev@cs.huji.ac.il

Abstract—A recent result shows how to compute the linear MMSE estimator for the multiuser detection problem distributively and efficiently, using the Gaussian BP algorithm. In the current work, we extend this construction and show that operating this algorithm twice on the matching inputs has several interesting interpretations. First, we show equivalence to computing one iteration of the Kalman filter. Second, we show that the Kalman filter is a special case of the Gaussian information bottleneck algorithm, when the weight parameter β = 1. Third, we discuss the relation to the Affine-scaling interior-point method and show it is a special case of the Kalman filter. Beyond the theoretical interest of linking estimation, compression/clustering and optimization, we enable a single distributed implementation of these algorithms, which is a highly practical and important task in sensor and mobile ad-hoc networks. Applications to numerous problem domains include collaborative signal processing and distributed allocation of resources in a communication network.

I. INTRODUCTION

Recent work [1] shows how to compute the MMSE prediction for the multiuser detection problem efficiently and distributively, using the Gaussian Belief Propagation (GaBP) algorithm. The basic idea is to shift the problem from the linear-algebra domain into a probabilistic graphical model, solving an equivalent inference problem using the efficient belief propagation inference engine.
[2] compares the empirical performance of the GaBP algorithm relative to other linear iterative algorithms, demonstrating faster convergence. [3] elaborates on the relation to solving systems of linear equations.

In the present work, we propose to extend the previous construction and show that, by performing the MMSE computation twice on the matching inputs, we are able to compute several algorithms. First, we reduce the discrete Kalman filter computation [4] to a matrix inversion problem and show how to solve it using the GaBP algorithm. We show that a Kalman filter iteration, which is composed of a prediction and a measurement step, can be computed by two consecutive MMSE predictions. Second, we explore the relation to the Gaussian information bottleneck (GIB) [5] and show that the Kalman filter is a special instance of the GIB algorithm when the weight parameter β = 1. To the best of the authors' knowledge, this is the first algorithmic link between the information bottleneck framework and linear dynamical systems. Third, we discuss the connection to the Affine-scaling interior-point method and show it is an instance of the Kalman filter.

Beyond the theoretical interest of linking compression, estimation and optimization together, our work is highly practical, since it proposes a general framework for computing all of the above tasks distributively in a computer network. This result can have many applications in the fields of estimation, collaborative signal processing, distributed resource allocation, etc.

A closely related work is [6]. In that work, Frey et al. focus on the belief propagation algorithm (a.k.a. the sum-product algorithm) using factor graph topologies. They show that the Kalman filter algorithm can be computed using belief propagation over a factor graph. In this contribution we extend their work in several directions.
First, we extend the computation to vector variables (relative to scalar variables in Frey's work). Second, we use a different graphical model: an undirected graphical model, which results in simpler update rules, whereas Frey uses a factor graph with two types of messages: factor-to-variable and variable-to-factor. Third, and most important, we allow an efficient distributed calculation of the Kalman filter steps, whereas Frey's algorithm is centralized.

Another related work is [7], in which the link between the Kalman filter and linear programming is established. In this work we propose a new and different construction which ties the two algorithms.

The structure of this paper is as follows. In Section II we describe the discrete Kalman filter. In Section III we outline the GIB algorithm and discuss its relation to the Kalman filter. Section IV presents the Affine-scaling interior-point method and compares it to the Kalman filter algorithm. Section V presents our novel construction for performing an efficient distributed computation of the three methods.

II. KALMAN FILTER

A. An Overview

The Kalman filter is an efficient iterative algorithm to estimate the state of a discrete-time controlled process $x \in \mathbb{R}^n$ that is governed by the linear stochastic difference equation^1

$x_k = A x_{k-1} + B u_{k-1} + w_{k-1},$  (1)

with a measurement $z \in \mathbb{R}^m$ given by $z_k = H x_k + v_k$. The random variables $w_k$ and $v_k$ represent the process and measurement AWGN noise, respectively: $p(w) \sim \mathcal{N}(0, Q)$, $p(v) \sim \mathcal{N}(0, R)$. We further assume that the matrices A, H, B, Q, R are given^2. The discrete Kalman filter update equations are given by [4]:

^1 In this paper we assume there is no external input, namely $x_k = A x_{k-1} + w_{k-1}$. However, our approach can easily be extended to support external inputs.
^2 Another possible extension is that the matrices A, H, B, Q, R change in time; in this paper we assume they are fixed. However, our approach can be generalized to this case as well.

The prediction step:

$\hat{x}_k^- = A \hat{x}_{k-1} + B u_{k-1},$  (2a)
$P_k^- = A P_{k-1} A^T + Q.$  (2b)

The measurement step:

$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1},$  (3a)
$\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-),$  (3b)
$P_k = (I - K_k H) P_k^-,$  (3c)

where I is the identity matrix. The algorithm operates in rounds. In round k the estimates $K_k, \hat{x}_k, P_k$ are computed, incorporating the (noisy) measurement $z_k$ obtained in this round. The outputs of the algorithm are the mean vector $\hat{x}_k$ and the covariance matrix $P_k$.

B. New construction

Our novel contribution is a new efficient distributed algorithm for computing the Kalman filter. We begin by showing that the Kalman filter can be computed by inverting the following covariance matrix:

$E = \begin{pmatrix} -P_{k-1} & A & 0 \\ A^T & Q & H \\ 0 & H^T & R \end{pmatrix},$  (4)

and taking the lower right 1 × 1 block (of the block matrix $E^{-1}$) to be $P_k$. The computation of $E^{-1}$ can be done efficiently using recent advances in the field of Gaussian belief propagation [1], [3].

The intuition for our approach is that the Kalman filter is composed of two steps. In the prediction step, given $x_k$, we compute the MMSE prediction of $x_k^-$ [6]. In the measurement step, we compute the MMSE prediction of $x_{k+1}$ given $x_k^-$, the output of the prediction step. Each MMSE computation can be done using the GaBP algorithm [1]. The basic idea is that, given the joint Gaussian distribution $p(x, y)$ with the covariance matrix

$C = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix},$

we can compute the MMSE prediction

$\hat{y} = \arg\max_y p(y|x) \propto \mathcal{N}(\mu_{y|x}, \Sigma_{y|x}^{-1}),$

where

$\mu_{y|x} = (\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})^{-1}\,\Sigma_{yx}\Sigma_{xx}^{-1}\,x,$
$\Sigma_{y|x} = (\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})^{-1}.$

This in turn is equivalent to computing the Schur complement of the lower right block of the matrix C.
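To make the two-phase structure concrete, here is a minimal single-round sketch of (2a)-(3c) in NumPy. This is a centralized reference implementation, not the distributed GaBP version developed later in the paper, and it drops the external input ($Bu = 0$, cf. footnote 1):

```python
import numpy as np

def kalman_iteration(x_prev, P_prev, z, A, H, Q, R):
    """One round of the discrete Kalman filter, eqs. (2a)-(3c),
    assuming no external input (B u = 0, cf. footnote 1)."""
    # Prediction step
    x_pred = A @ x_prev                                       # (2a)
    P_pred = A @ P_prev @ A.T + Q                             # (2b)
    # Measurement step
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # (3a) Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)                     # (3b)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred        # (3c)
    return x_new, P_new
```

In the scalar case with A = H = Q = R = 1, P_{k-1} = 1, x̂_{k-1} = 0 and z_k = 1, one round gives P_k^- = 2 and P_k = x̂_k = 2/3.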
In total, computing the MMSE prediction in a Gaussian graphical model boils down to the computation of a matrix inverse. In [3] we have shown that GaBP is an efficient iterative algorithm for solving a system of linear equations (or, equivalently, computing a matrix inverse). In [1] we have shown that, for the specific case of linear detection, we can compute the MMSE estimator using the GaBP algorithm. Next, we show that performing two consecutive computations of the MMSE is equivalent to one iteration of the Kalman filter.

Theorem 1: The lower right 1 × 1 block of the matrix inverse $E^{-1}$ (eq. 4), computed by two MMSE iterations, is equivalent to the computation of $P_k$ done by one iteration of the Kalman filter algorithm.

The proof of Theorem 1 is given in Appendix A. In Section V we explain how to utilize the above observation in an efficient distributed iterative algorithm for computing the Kalman filter.

III. GAUSSIAN INFORMATION BOTTLENECK

Given the joint distribution of a source variable X and another relevance variable Y, the information bottleneck (IB) operates to compress X while preserving information about Y [8], [9], using the following variational problem:

$\min_{p(t|x)} \mathcal{L} : \quad \mathcal{L} \equiv I(X;T) - \beta I(T;Y).$

T represents the compressed representation of X via the conditional distributions $p(t|x)$, while the information that T maintains on Y is captured by the distribution $p(y|t)$. $\beta > 0$ is a Lagrange multiplier which weights the tradeoff between minimizing the compression information and maximizing the relevant information. As $\beta \to 0$ we are interested solely in compression, and all relevant information about Y is lost: $I(Y;T) = 0$. As $\beta \to \infty$ we are focused on preservation of relevant information; in this case T is simply the distribution of X and we obtain $I(T;Y) = I(X;Y)$.
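Two standard identities underpin the matrix-inverse view behind Theorem 1, and both are easy to check numerically: the measurement update (3c) agrees with the information-form expression $((P_k^-)^{-1} + H^T R^{-1} H)^{-1}$, and the lower-right block of the inverse of a block matrix is the inverse of the corresponding Schur complement. A small sketch (the random matrices are illustrative only, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
inv = np.linalg.inv

def spd(n):
    """Random symmetric positive-definite matrix (illustrative only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

n, m = 3, 2
P_pred, R = spd(n), spd(m)
H = rng.standard_normal((m, n))

# Measurement update (3c) ...
K = P_pred @ H.T @ inv(H @ P_pred @ H.T + R)
P_k = (np.eye(n) - K @ H) @ P_pred
# ... equals the information-form / Schur-complement expression.
P_k_info = inv(inv(P_pred) + H.T @ inv(R) @ H)
assert np.allclose(P_k, P_k_info)

# Block-inverse fact used in the text: for C = [[Sxx, Sxy], [Syx, Syy]],
# the lower-right block of inv(C) is inv(Syy - Syx inv(Sxx) Sxy).
C = spd(n + m)
Sxx, Sxy, Syx, Syy = C[:n, :n], C[:n, n:], C[n:, :n], C[n:, n:]
assert np.allclose(inv(C)[n:, n:], inv(Syy - Syx @ inv(Sxx) @ Sxy))
```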
The interesting cases are in between, when for finite values of β we are able to extract a rather compressed representation of X while still maintaining a significant fraction of the original information about Y.

An iterative algorithm for solving the IB problem is given in [9]:

$P^{k+1}(t|x) = \frac{P^k(t)}{Z^{k+1}(x,\beta)} \exp\left(-\beta D_{KL}\left[p(y|x)\,\|\,p^k(y|t)\right]\right),$  (5a)
$P^k(t) = \int_x p(x)\,P^k(t|x)\,dx,$  (5b)
$P^k(y|t) = \frac{1}{P^k(t)} \int_x P^k(t|x)\,p(x,y)\,dx,$  (5c)

where $Z^{k+1}$ is a normalization factor computed in round k + 1.

The Gaussian information bottleneck (GIB) [5] deals with the special case where the underlying distributions are Gaussian. In this case the computed distribution $p(t)$ is Gaussian as well, represented by a linear transformation $T_k = A_k X + \xi_k$, where $A_k$ is a joint covariance matrix of X and T, and $\xi_k \sim \mathcal{N}(0, \Sigma_{\xi_k})$ is a multivariate Gaussian independent of X. The outputs of the algorithm are the covariance matrices representing the linear transformation T: $A_k, \Sigma_{\xi_k}$.

An iterative algorithm is derived by substituting Gaussian distributions into (5), resulting in the following update rules:

$\Sigma_{\xi_{k+1}} = \left(\beta\,\Sigma_{t_k|y}^{-1} - (\beta - 1)\,\Sigma_{t_k}^{-1}\right)^{-1},$  (6a)
$A_{k+1} = \beta\,\Sigma_{\xi_{k+1}}\,\Sigma_{t_k|y}^{-1}\,A_k\,(I - \Sigma_{x|y}\Sigma_x^{-1}).$  (6b)

TABLE I: Summary of notations in the GIB [5] paper vs. the Kalman filter [4].

GIB [5]                  | Kalman [4] | Kalman meaning
Σ_x                      | P_0        | a-priori estimate error covariance
Σ_y                      | Q          | process AWGN noise
Σ_{t_k}                  | R          | measurement AWGN noise
Σ_{xy}, Σ_{yx}           | A, A^T     | process state transformation matrix
Σ_{yx}, Σ_{xy}           | H^T, H     | measurement transformation matrix
Σ_{ξ_k}                  | P_k        | posterior error covariance in round k
Σ_{x|y_k}                | P_k^-      | a-priori error covariance in round k

Since the underlying graphical model of both algorithms (GIB and Kalman filter) is Markovian with Gaussian probabilities, it is interesting to ask what the relation between them is.
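The covariance half of the GIB update is straightforward to transcribe. The sketch below assumes the standard form of update rule (6a) from [5], with $\Sigma_{t_k|y}$ and $\Sigma_{t_k}$ supplied as inputs; note that for β = 1 the update collapses to $\Sigma_{t_k|y}$, which is the link to the Kalman filter explored next:

```python
import numpy as np

def gib_noise_update(Sigma_t_given_y, Sigma_t, beta):
    """GIB update rule (6a):
    Sigma_xi_{k+1} = (beta * inv(Sigma_{t_k|y}) - (beta - 1) * inv(Sigma_{t_k}))^{-1}.
    For beta = 1 this reduces to Sigma_{t_k|y}."""
    inv = np.linalg.inv
    return inv(beta * inv(Sigma_t_given_y) - (beta - 1) * inv(Sigma_t))
```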
In this work we show that the Kalman filter posterior error covariance computation is a special case of the GIB algorithm when β = 1. Furthermore, we show how to compute GIB using the Kalman filter when β > 1 (the case where 0 < β < 1 is not interesting, since it gives a degenerate solution where $A_k \equiv 0$ [5]). Table I outlines the different notations used by both algorithms.

Fig. 1. Comparison of the different graphical models used. (a) Gaussian Information Bottleneck [5] (b) Kalman filter (c) Frey's sum-product factor graph [6] (d) Our new construction.

Theorem 2: The GIB algorithm when β = 1 is equivalent to the Kalman filter algorithm.

The proof is given in Appendix B.

Theorem 3: The GIB algorithm when β > 1 can be computed by a modified Kalman filter iteration.

The proof is given in Appendix C.

There are some differences between the GIB algorithm and the Kalman filter computation. First, the Kalman filter has input observations $z_k$ in each round. Note that the observations do not affect the posterior error covariance computation $P_k$ (eq. 3c), but do affect the posterior mean $\hat{x}_k$ (eq. 3b). Second, the Kalman filter computes both the posterior mean $\hat{x}_k$ and the error covariance $P_k$. The covariance $\Sigma_{\xi_k}$ computed by the GIB algorithm was shown to be identical to $P_k$ when β = 1. The GIB algorithm does not compute the posterior mean, but computes an additional covariance $A_k$ (eq. 6b), which is assumed known in the Kalman filter.

Fig. 2. Comparison of the schematic operation of the different algorithms. (a) iterative information bottleneck operation (b) Kalman filter operation (c) Affine-scaling operation.

From the information-theoretic perspective, our work extends the ideas presented in [10].
Predictive information is defined to be the mutual information between the past and the future of a time series. In that sense, by using Theorem 2, the Kalman filter can be thought of as a prediction of the future which, on the one hand, compresses the information about the past and, on the other hand, maintains information about the present.

The origins of the similarity between the GIB algorithm and the Kalman filter are rooted in the IB iterative algorithm: for computing (5a), we need to compute (5b, 5c) in recursion, and vice versa.

IV. RELATION TO THE AFFINE-SCALING ALGORITHM

One of the most efficient interior-point methods used for linear programming is the Affine-scaling algorithm [11]. It is known that the Kalman filter is linked to the Affine-scaling algorithm [7]. In this work we give an alternate proof, based on a different construction, which shows that Affine-scaling is an instance of the Kalman filter, which is in turn an instance of GIB. This link between estimation and optimization allows for numerous applications. Furthermore, by providing a single distributed efficient implementation of the GIB algorithm, we are able to solve numerous problems in communication networks.

The linear programming problem in its canonical form is given by:

minimize $c^T x$,  (7a)
subject to $Ax = b, \; x \ge 0$,  (7b)

where $A \in \mathbb{R}^{n \times p}$ with rank{A} = p < n. We assume the problem is solvable with an optimal $x^*$. We also assume that the problem is strictly feasible; in other words, there exists $x \in \mathbb{R}^n$ that satisfies $Ax = b$ and $x > 0$.

The Affine-scaling algorithm [11] is summarized below. Assume $x_0$ is an interior feasible point to (7b). Let $D = \mathrm{diag}(x_0)$. Affine-scaling is an iterative algorithm which computes a new feasible point that minimizes the cost function (7a):

$x_1 = x_0 - \frac{\alpha}{\gamma} D^2 r,$  (8)

where $0 < \alpha < 1$ is the step size and r is the step direction:
$r = c - A^T w,$  (9a)
$w = (A D^2 A^T)^{-1} A D^2 c,$  (9b)
$\gamma = \max_i (e_i P D c),$  (9c)

where $e_i$ is the i-th unit vector and P is a projection matrix given by:

$P = I - D A^T (A D^2 A^T)^{-1} A D.$  (10)

The algorithm continues in rounds and is guaranteed to find an optimal solution in at most n rounds. In a nutshell, in each iteration the Affine-scaling algorithm first performs an affine scaling with respect to the current solution point $x_i$ and obtains the direction of descent by projecting the gradient of the transformed cost function on the null space of the constraint set. The new solution is obtained by translating the current solution along the direction found and then mapping the result back into the original space [7]. This has an interesting analogy to the two phases of the Kalman filter.

Theorem 4: The Affine-scaling algorithm iteration is an instance of the Kalman filter algorithm iteration.

The proof is given in Appendix D.

V. EFFICIENT DISTRIBUTED COMPUTATION

We have shown how to express the Kalman filter, Gaussian information bottleneck and Affine-scaling algorithms as a two-step MMSE computation. Each step involves inverting a 2 × 2 block matrix. A recent result by Bickson, Shental et al. [1] shows that the MMSE computation can be done efficiently and distributively using the Gaussian belief propagation algorithm.

Because of space limitations the full algorithm is not reproduced here. The interested reader is referred to [1], [3] for a complete derivation of the GaBP update rules and convergence analysis. The GaBP algorithm is summarized in Table II.

Regarding convergence, if it converges, GaBP is known to result in exact inference [12]. Determining the exact region of convergence and the convergence rate remain open research problems.
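The scalar GaBP update rules summarized in Table II can be sketched as a centralized simulation; in a real deployment each node would exchange the message precisions $P_{ij}$ and means $\mu_{ij}$ over the network, while this dense double loop is illustrative only:

```python
import numpy as np

def gabp_solve(A, b, max_iter=50, tol=1e-9):
    """Solve A x = b with scalar Gaussian BP (update rules of Table II).
    A is assumed symmetric; diagonal dominance guarantees convergence."""
    n = len(b)
    P = np.zeros((n, n))    # message precisions P_{i->j}
    mu = np.zeros((n, n))   # message means      mu_{i->j}
    Pii, mii = np.diag(A).copy(), b / np.diag(A)
    for _ in range(max_iter):
        P_new, mu_new = np.zeros_like(P), np.zeros_like(mu)
        for i in range(n):
            for j in range(n):
                if i == j or A[i, j] == 0:
                    continue
                # aggregate all messages into i, excluding the one from j
                nbrs = [k for k in range(n) if k != i and k != j and A[k, i] != 0]
                P_iexj = Pii[i] + sum(P[k, i] for k in nbrs)
                mu_iexj = (Pii[i] * mii[i] + sum(P[k, i] * mu[k, i] for k in nbrs)) / P_iexj
                P_new[i, j] = -A[i, j] * A[j, i] / P_iexj        # P_{i->j}
                mu_new[i, j] = -A[i, j] * mu_iexj / P_new[i, j]  # mu_{i->j}
        converged = (np.max(np.abs(P_new - P)) < tol and
                     np.max(np.abs(mu_new - mu)) < tol)
        P, mu = P_new, mu_new
        if converged:
            break
    # inference step (stage 4 of Table II)
    Pi = Pii + P.sum(axis=0)
    x = (Pii * mii + (P * mu).sum(axis=0)) / Pi
    return x
```

On a diagonally dominant system the result matches a direct solver; for tree-structured (e.g. tridiagonal) matrices GaBP is exact after a number of rounds equal to the graph diameter.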
All that is known is a sufficient (but not necessary) condition [13], [14] stating that GaBP converges when the spectral radius satisfies $\rho(|I - A|) < 1$, where A is first normalized so that its main diagonal contains ones. A stricter sufficient condition [12] determines that the matrix A must be diagonally dominant (i.e., $|A_{ii}| > \sum_{j \neq i} |A_{ij}|, \; \forall i$) in order for GaBP to converge. Regarding convergence speed, [15] shows that, when converging, the algorithm converges in $O(\log(\epsilon)/\log(\gamma))$ iterations, where $\epsilon$ is the desired accuracy and $1/2 < \gamma < 1$ is a parameter related to the inverted matrix. The computation overhead in each iteration is determined by the number of non-zero elements of the inverted matrix A. In practice, [16] demonstrates convergence in 5-10 rounds on sparse matrices with several millions of variables, and [17] shows convergence for dense constraint matrices of size up to 150,000 × 150,000 in 6 rounds, where the algorithm is run in parallel using 1,024 CPUs. An empirical comparison with other iterative algorithms is given in [2].

TABLE II: Computing $x = A^{-1}b$ via GaBP [3].

#  Stage       Operation
1  Initialize  Compute P_ii = A_ii and μ_ii = b_i / A_ii.
               Set P_ki = 0 and μ_ki = 0, ∀k ≠ i.
2  Iterate     Propagate P_ki and μ_ki, ∀k ≠ i such that A_ki ≠ 0.
               Compute P_{i\j} = P_ii + Σ_{k∈N(i)\j} P_ki and
               μ_{i\j} = P_{i\j}^{-1} (P_ii μ_ii + Σ_{k∈N(i)\j} P_ki μ_ki).
               Compute P_ij = −A_ij P_{i\j}^{-1} A_ji and μ_ij = −P_ij^{-1} A_ij μ_{i\j}.
3  Check       If P_ij and μ_ij did not converge, return to #2. Else, continue to #4.
4  Infer       P_i = P_ii + Σ_{k∈N(i)} P_ki,
               μ_i = P_i^{-1} (P_ii μ_ii + Σ_{k∈N(i)} P_ki μ_ki).
5  Output      x_i = μ_i.

VI. EXAMPLE APPLICATION

The TransFab software package is distributed middleware developed at IBM Haifa Labs, which supports real-time forwarding of message streams while providing quality-of-service guarantees. We plan to use our distributed Kalman filter algorithm for online monitoring of software resources and performance. Every second, each node records a vector of performance parameters such as memory usage, CPU usage, current bandwidth, queue sizes, etc. The nodes execute the distributed Kalman filter algorithm in the background. Figure 3 plots a covariance matrix from an experiment using two TransFab nodes propagating data. The covariance matrix is used as an input to the Kalman filter algorithm. Yellow sections show high correlation between measured parameters. Initial results are encouraging; we plan to report them in a future contribution.

Fig. 3. Covariance matrix which represents vector data captured from two TransFab nodes.

VII. CONCLUSION

In this work we have linked together several different algorithms from the fields of estimation (Kalman filter), clustering/compression (Gaussian information bottleneck) and optimization (Affine-scaling interior-point method). Besides the theoretical interest in linking those different domains, we are motivated by practical problems in communication networks. To this end, we propose an efficient distributed iterative algorithm, the Gaussian belief propagation algorithm, to be used for efficiently solving these problems.

ACKNOWLEDGMENT

O. Shental acknowledges the partial support of the NSF (Grant CCF-0514859). The authors are grateful to Noam Slonim and Naftali Tishby from the Hebrew University of Jerusalem for useful discussions.

REFERENCES

[1] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D.
Dolev, "Gaussian belief propagation based multiuser detection," in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
[2] ——, "Linear detection via belief propagation," in Proc. 45th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, Sept. 2007.
[3] O. Shental, D. Bickson, P. H. Siegel, J. K. Wolf, and D. Dolev, "Gaussian belief propagation solver for systems of linear equations," in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
[4] G. Welch and G. Bishop, "An introduction to the Kalman filter," Tech. Rep., 2006. [Online]. Available: http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html
[5] G. Chechik, A. Globerson, N. Tishby, and Y. Weiss, "Information bottleneck for Gaussian variables," in Journal of Machine Learning Research, vol. 6. Cambridge, MA, USA: MIT Press, 2005, pp. 165–188.
[6] F. Kschischang, B. Frey, and H. A. Loeliger, "Factor graphs and the sum-product algorithm," in IEEE Transactions on Information Theory, vol. 47, Feb. 2001, pp. 498–519.
[7] S. Puthenpura, L. Sinha, S.-C. Fang, and R. Saigal, "Solving stochastic programming problems via Kalman filter and affine scaling," in European Journal of Operational Research, vol. 83, no. 3, 1995, pp. 503–513. [Online]. Available: http://ideas.repec.org/a/eee/ejores/v83y1995i3p503-513.html
[8] N. Tishby, F. Pereira, and W. Bialek, "The information bottleneck method," in The 37th Annual Allerton Conference on Communication, Control, and Computing, invited paper, September 1999.
[9] N. Slonim, "The information bottleneck: Theory and application," Ph.D. Thesis, School of Computer Science and Engineering, The Hebrew University of Jerusalem, 2003.
[10] W. Bialek, I. Nemenman, and N. Tishby, "Predictability, complexity, and learning," in Neural Comput., vol. 13, no. 11. Cambridge, MA, USA: MIT Press, November 2001, pp. 2409–2463.
[11] R. J.
Vanderbei, M. S. Meketon, and B. A. Freedman, "A modification of Karmarkar's linear programming algorithm," in Algorithmica, vol. 1, no. 1, March 1986, pp. 395–407.
[12] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," in Neural Computation, vol. 13, no. 10, 2001, pp. 2173–2200.
[13] J. Johnson, D. Malioutov, and A. Willsky, "Walk-sum interpretation and analysis of Gaussian belief propagation," in Nineteenth Annual Conference on Neural Information Processing Systems (NIPS '05), 2005.
[14] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," in Journal of Machine Learning Research, vol. 7, Oct. 2006.
[15] D. Bickson, Y. Tock, D. Dolev, and O. Shental, "Polynomial linear programming with Gaussian belief propagation," in the 46th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, 2008.
[16] D. Bickson and D. Malkhi, "A unifying framework for rating users and data items in peer-to-peer and social networks," in Peer-to-Peer Networking and Applications (PPNA) Journal, Springer-Verlag, 2008.
[17] D. Bickson, D. Dolev, and E. Yom-Tov, "A Gaussian belief propagation solver for large scale support vector machines," in 5th European Conference on Complex Systems, Jerusalem, Sept. 2008.

APPENDIX A

Proof: We prove that inverting the matrix E (eq. 4) is equivalent to one iteration of the Kalman filter for computing $P_k$. We start from the matrix E and show that $P_k^-$ can be computed in recursion using the Schur complement formula

$D - C A^{-1} B$  (11)

applied to the upper left 2 × 2 submatrix of E, with $D \triangleq Q$, $C \triangleq A^T$, $B \triangleq A$, $A \triangleq -P_{k-1}$. We get:

$P_k^- = Q + A^T P_{k-1} A.$
Now we compute recursively the Schur complement of the lower right 2 × 2 submatrix of the matrix E, using the matrix inversion lemma

$A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1},$

where $A^{-1} \triangleq P_k^-$, $B \triangleq H^T$, $C \triangleq H$, $D \triangleq R$. In total we get:

$P_k^- - P_k^- H^T (R + H P_k^- H^T)^{-1} H P_k^- = \big(I - \underbrace{P_k^- H^T (H P_k^- H^T + R)^{-1}}_{(3a)} H\big) P_k^- = \underbrace{(I - K_k H) P_k^-}_{(3c)} = P_k.$  (12)

APPENDIX B

Proof: Looking at [5, §39], when β = 1 we get

$\Sigma_{\xi_{k+1}} = (\Sigma_{t_k|y}^{-1})^{-1} = \Sigma_{t_k|y} = \Sigma_{t_k} - \Sigma_{t_k y}\,\Sigma_y^{-1}\,\Sigma_{y t_k},$

where the last equality is the Gaussian MMSE (conditional covariance) formula. Expanding $\Sigma_{t_k}$, $\Sigma_{t_k y}$ and $\Sigma_{y|t_k}$ using the GIB relations [5, §§33, 34, 38b], in particular $\Sigma_{t_k} = A^T \Sigma_x A + \Sigma_\xi$, the right-hand side becomes

$(A^T \Sigma_x A + \Sigma_\xi) - (A^T \Sigma_x A + \Sigma_\xi)\,A^T \Sigma_{xy}\,\big(\Sigma_y + \Sigma_{yx} A\,(A^T \Sigma_x A + \Sigma_\xi)\,A^T \Sigma_{xy}\big)^{-1}\,\Sigma_{yx} A\,(A^T \Sigma_x A + \Sigma_\xi).$

This formulation is equivalent to the Kalman filter with the following notations:

$P_k^- \triangleq A^T \Sigma_x A + \Sigma_\xi, \quad H \triangleq \Sigma_{yx} A, \quad R \triangleq \Sigma_y, \quad P_{k-1} \triangleq \Sigma_x, \quad Q \triangleq \Sigma_\xi,$

under which the expression above is exactly (12). Now we can apply Theorem 1 and get the desired result.

APPENDIX C

Proof: In the case where β > 1, the MAP covariance matrix as computed by the GIB algorithm is

$\Sigma_{\xi_{k+1}} = \beta\,\Sigma_{t_k|y} + (1 - \beta)\,\Sigma_{t_k}.$  (13)

This is a weighted average of two covariance matrices.
$\Sigma_{t_k}$ is computed in the first phase of the algorithm (equivalent to the prediction phase in the Kalman literature), and $\Sigma_{t_k|y}$ is computed in the second phase (the measurement phase). At the end of the Kalman iteration, we simply compute the weighted average of the two matrices to get (13). Finally, we compute $A_{k+1}$ using (6b) by substituting the modified $\Sigma_{\xi_{k+1}}$.

APPENDIX D

Proof: We start by expanding the Affine-scaling update rule:

$x_1 \overset{(8)}{=} x_0 - \frac{\alpha}{\gamma} D^2 r \overset{(9c)}{=} x_0 - \frac{\alpha\,D^2 r}{\max_i e_i P D c} \overset{(10)}{=} x_0 - \frac{\alpha\,D^2 r}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c}$
$\overset{(9a)}{=} x_0 - \frac{\alpha\,D^2 (c - A^T w)}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c} \overset{(9b)}{=} x_0 - \frac{\alpha\,D^2 \big(c - A^T (A D^2 A^T)^{-1} A D^2 c\big)}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c}$
$= x_0 - \frac{\alpha\,D\,(I - D A^T (A D^2 A^T)^{-1} A D)\,D c}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c}.$

Looking at the numerator and using the Schur complement formula (11) with the notations $A \triangleq A D^2 A^T$, $B \triangleq A D$, $C \triangleq D A^T$, $D \triangleq I$, we see that $I - D A^T (A D^2 A^T)^{-1} A D$ is the Schur complement of the upper left block of the matrix

$\begin{pmatrix} A D^2 A^T & A D \\ D A^T & I \end{pmatrix}.$

Again, the upper left block is a Schur complement, with $A \triangleq 0$, $B \triangleq A D$, $C \triangleq D A^T$, $D \triangleq I$, of the matrix

$\begin{pmatrix} 0 & A D \\ D A^T & I \end{pmatrix}.$

In total we get a 3 × 3 block matrix of the form

$\begin{pmatrix} 0 & A D & 0 \\ D A^T & I & A D \\ 0 & D A^T & I \end{pmatrix}.$

Note that the divisor is a scalar which affects only the scaling of the step size. Using Theorem 1, we get a computation of the Kalman filter with the following parameters: $A, H \triangleq A D$, $Q \triangleq I$, $R \triangleq I$, $P_0 \triangleq 0$. This has an interesting interpretation in the context of the Kalman filter: both the prediction and measurement transformations are identical and equal to AD, and the noise of both transformations is Gaussian with prior $\propto \mathcal{N}(0, I)$.
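As a numerical sanity check of the Affine-scaling update (8)-(10) expanded above, the sketch below performs one iteration on a tiny two-variable LP; the specific LP is an illustrative assumption, not an example from the paper:

```python
import numpy as np

def affine_scaling_step(x0, A, c, alpha=0.5):
    """One Affine-scaling iteration, eqs. (8)-(10), starting from a
    strictly feasible interior point x0 of {A x = b, x > 0}."""
    D = np.diag(x0)
    AD2AT = A @ D @ D @ A.T
    w = np.linalg.solve(AD2AT, A @ D @ D @ c)                        # (9b)
    r = c - A.T @ w                                                  # (9a)
    P = np.eye(len(x0)) - D @ A.T @ np.linalg.solve(AD2AT, A @ D)    # (10)
    gamma = np.max(P @ D @ c)                                        # (9c)
    return x0 - (alpha / gamma) * (D @ D @ r)                        # (8)

# Illustrative LP: minimize x_1 subject to x_1 + x_2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 0.0])
x1 = affine_scaling_step(np.array([0.5, 0.5]), A, c)
# x1 = [0.25, 0.75]: the step stays feasible (x_1 + x_2 = 1) and lowers c^T x
```

Note that (9c) assumes the maximal component of PDc is positive at the current iterate, as it is here; by hand, this step moves [0.5, 0.5] to [0.25, 0.75], halving the cost while preserving feasibility.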