Distributed Kalman Filter via Gaussian Belief Propagation
Danny Bickson, IBM Haifa Research Lab, Mount Carmel, Haifa 31905, Israel. Email: dannybi@il.ibm.com
Ori Shental, Center for Magnetic Recording Research, UCSD, 9500 Gilman Drive, La Jolla, CA 92093, USA. Email: oshental@ucsd.edu
Danny Dolev, School of Computer Science and Engineering, Hebrew University of Jerusalem, Jerusalem 91904, Israel. Email: dolev@cs.huji.ac.il

Abstract—A recent result shows how to compute the linear MMSE estimator for the multiuser detection problem distributively and efficiently, using the Gaussian BP algorithm. In the current work, we extend this construction and show that operating this algorithm twice on the matching inputs has several interesting interpretations. First, we show equivalence to computing one iteration of the Kalman filter. Second, we show that the Kalman filter is a special case of the Gaussian information bottleneck algorithm, when the weight parameter β = 1. Third, we discuss the relation to the Affine-scaling interior-point method and show it is a special case of the Kalman filter. Beyond the theoretical interest of linking estimation, compression/clustering and optimization, we enable a single distributed implementation of these algorithms, which is a highly practical and important task in sensor and mobile ad-hoc networks. Applications to numerous problem domains include collaborative signal processing and distributed allocation of resources in a communication network.

I. INTRODUCTION

Recent work [1] shows how to compute the MMSE prediction for the multiuser detection problem efficiently and distributively, using the Gaussian Belief Propagation (GaBP) algorithm. The basic idea is to shift the problem from the linear-algebra domain into a probabilistic graphical model, solving an equivalent inference problem using the efficient belief propagation inference engine.
[2] compares the empirical performance of the GaBP algorithm relative to other linear iterative algorithms, demonstrating faster convergence. [3] elaborates on the relation to solving systems of linear equations.

In the present work, we propose to extend the previous construction and show that, by performing the MMSE computation twice on the matching inputs, we are able to compute several algorithms. First, we reduce the discrete Kalman filter computation [4] to a matrix inversion problem and show how to solve it using the GaBP algorithm. We show that a Kalman filter iteration, which is composed of a prediction and a measurement step, can be computed by two consecutive MMSE predictions. Second, we explore the relation to the Gaussian information bottleneck (GIB) [5] and show that the Kalman filter is a special instance of the GIB algorithm when the weight parameter β = 1. To the best of the authors' knowledge, this is the first algorithmic link between the information bottleneck framework and linear dynamical systems. Third, we discuss the connection to the Affine-scaling interior-point method and show it is an instance of the Kalman filter.

Beyond the theoretical interest of linking compression, estimation and optimization together, our work is highly practical, since it proposes a general framework for computing all of the above tasks distributively in a computer network. This result can have many applications in the fields of estimation, collaborative signal processing, distributed resource allocation, etc.

A closely related work is [6]. In that work, Frey et al. focus on the belief propagation algorithm (a.k.a. the sum-product algorithm) using factor graph topologies. They show that the Kalman filter algorithm can be computed using belief propagation over a factor graph. In this contribution we extend their work in several directions.
First, we extend the computation to vector variables (relative to scalar variables in Frey's work). Second, we use a different graphical model: an undirected graphical model, which results in simpler update rules, whereas Frey uses a factor graph with two types of messages: factor-to-variable and variable-to-factor. Third, and most important, we allow an efficient distributed calculation of the Kalman filter steps, whereas Frey's algorithm is centralized.

Another related work is [7], in which the link between the Kalman filter and linear programming is established. In this work we propose a new and different construction which ties the two algorithms.

The structure of this paper is as follows. In Section II we describe the discrete Kalman filter. In Section III we outline the GIB algorithm and discuss its relation to the Kalman filter. Section IV presents the Affine-scaling interior-point method and compares it to the Kalman filter algorithm. Section V presents our novel construction for performing an efficient distributed computation of the three methods.

II. KALMAN FILTER

A. An Overview

The Kalman filter is an efficient iterative algorithm to estimate the state of a discrete-time controlled process $x \in \mathbb{R}^n$ that is governed by the linear stochastic difference equation^1

$x_k = A x_{k-1} + B u_{k-1} + w_{k-1},$  (1)

with a measurement $z \in \mathbb{R}^m$ given by $z_k = H x_k + v_k$. The random variables $w_k$ and $v_k$ represent the process and measurement AWGN noise, respectively: $p(w) \sim \mathcal{N}(0, Q)$, $p(v) \sim \mathcal{N}(0, R)$. We further assume that the matrices A, H, B, Q, R are given^2. The discrete Kalman filter update equations are given by [4]:

^1 In this paper we assume there is no external input, namely $x_k = A x_{k-1} + w_{k-1}$. However, our approach can easily be extended to support external inputs.
^2 Another possible extension is that the matrices A, H, B, Q, R change in time; in this paper we assume they are fixed. However, our approach can be generalized to this case as well.

The prediction step:

$\hat{x}_k^- = A \hat{x}_{k-1} + B u_{k-1},$  (2a)
$P_k^- = A P_{k-1} A^T + Q.$  (2b)

The measurement step:

$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1},$  (3a)
$\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-),$  (3b)
$P_k = (I - K_k H) P_k^-,$  (3c)

where I is the identity matrix. The algorithm operates in rounds. In round k the estimates $K_k, \hat{x}_k, P_k$ are computed, incorporating the (noisy) measurement $z_k$ obtained in this round. The outputs of the algorithm are the mean vector $\hat{x}_k$ and the covariance matrix $P_k$.

B. New construction

Our novel contribution is a new efficient distributed algorithm for computing the Kalman filter. We begin by showing that the Kalman filter can be computed by inverting the following covariance matrix:

$E = \begin{pmatrix} -P_{k-1} & A & 0 \\ A^T & Q & H \\ 0 & H^T & R \end{pmatrix},$  (4)

and taking the lower right 1 × 1 block (of the block matrix $E^{-1}$) to be $P_k$. The computation of $E^{-1}$ can be done efficiently using recent advances in the field of Gaussian belief propagation [1], [3].

The intuition for our approach is that the Kalman filter is composed of two steps. In the prediction step, given $x_k$, we compute the MMSE prediction of $x_k^-$ [6]. In the measurement step, we compute the MMSE prediction of $x_{k+1}$ given $x_k^-$, the output of the prediction step. Each MMSE computation can be done using the GaBP algorithm [1]. The basic idea is that, given the joint Gaussian distribution $p(x, y)$ with the covariance matrix

$C = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix},$

we can compute the MMSE prediction

$\hat{y} = \arg\max_y p(y|x) \propto \mathcal{N}(\mu_{y|x}, \Sigma_{y|x}^{-1}),$

where

$\mu_{y|x} = (\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})^{-1}\,\Sigma_{yx}\Sigma_{xx}^{-1}\,x,$
$\Sigma_{y|x} = (\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})^{-1}.$

This in turn is equivalent to computing the Schur complement of the lower right block of the matrix C.
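To make the two-phase structure concrete, here is a minimal single-round sketch of (2a)-(3c) in NumPy. This is a centralized reference implementation, not the distributed GaBP version developed later in the paper, and it drops the external input ($Bu = 0$, cf. footnote 1):

```python
import numpy as np

def kalman_iteration(x_prev, P_prev, z, A, H, Q, R):
    """One round of the discrete Kalman filter, eqs. (2a)-(3c),
    assuming no external input (B u = 0, cf. footnote 1)."""
    # Prediction step
    x_pred = A @ x_prev                                       # (2a)
    P_pred = A @ P_prev @ A.T + Q                             # (2b)
    # Measurement step
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # (3a) Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)                     # (3b)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred        # (3c)
    return x_new, P_new
```

In the scalar case with A = H = Q = R = 1, P_{k-1} = 1, x̂_{k-1} = 0 and z_k = 1, one round gives P_k^- = 2 and P_k = x̂_k = 2/3.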
In total, computing the MMSE prediction in a Gaussian graphical model boils down to the computation of a matrix inverse. In [3] we have shown that GaBP is an efficient iterative algorithm for solving a system of linear equations (or, equivalently, computing a matrix inverse). In [1] we have shown that, for the specific case of linear detection, we can compute the MMSE estimator using the GaBP algorithm. Next, we show that performing two consecutive computations of the MMSE is equivalent to one iteration of the Kalman filter.

Theorem 1: The lower right 1 × 1 block of the matrix inverse $E^{-1}$ (eq. 4), computed by two MMSE iterations, is equivalent to the computation of $P_k$ done by one iteration of the Kalman filter algorithm.

The proof of Theorem 1 is given in Appendix A. In Section V we explain how to utilize the above observation in an efficient distributed iterative algorithm for computing the Kalman filter.

III. GAUSSIAN INFORMATION BOTTLENECK

Given the joint distribution of a source variable X and another relevance variable Y, the information bottleneck (IB) operates to compress X while preserving information about Y [8], [9], using the following variational problem:

$\min_{p(t|x)} \mathcal{L} : \quad \mathcal{L} \equiv I(X;T) - \beta I(T;Y).$

T represents the compressed representation of X via the conditional distributions $p(t|x)$, while the information that T maintains on Y is captured by the distribution $p(y|t)$. $\beta > 0$ is a Lagrange multiplier which weights the tradeoff between minimizing the compression information and maximizing the relevant information. As $\beta \to 0$ we are interested solely in compression, and all relevant information about Y is lost: $I(Y;T) = 0$. As $\beta \to \infty$ we are focused on preservation of relevant information; in this case T is simply the distribution of X and we obtain $I(T;Y) = I(X;Y)$.
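Two standard identities underpin the matrix-inverse view behind Theorem 1, and both are easy to check numerically: the measurement update (3c) agrees with the information-form expression $((P_k^-)^{-1} + H^T R^{-1} H)^{-1}$, and the lower-right block of the inverse of a block matrix is the inverse of the corresponding Schur complement. A small sketch (the random matrices are illustrative only, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
inv = np.linalg.inv

def spd(n):
    """Random symmetric positive-definite matrix (illustrative only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

n, m = 3, 2
P_pred, R = spd(n), spd(m)
H = rng.standard_normal((m, n))

# Measurement update (3c) ...
K = P_pred @ H.T @ inv(H @ P_pred @ H.T + R)
P_k = (np.eye(n) - K @ H) @ P_pred
# ... equals the information-form / Schur-complement expression.
P_k_info = inv(inv(P_pred) + H.T @ inv(R) @ H)
assert np.allclose(P_k, P_k_info)

# Block-inverse fact used in the text: for C = [[Sxx, Sxy], [Syx, Syy]],
# the lower-right block of inv(C) is inv(Syy - Syx inv(Sxx) Sxy).
C = spd(n + m)
Sxx, Sxy, Syx, Syy = C[:n, :n], C[:n, n:], C[n:, :n], C[n:, n:]
assert np.allclose(inv(C)[n:, n:], inv(Syy - Syx @ inv(Sxx) @ Sxy))
```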
The interesting cases are in between, when for finite values of β we are able to extract a rather compressed representation of X while still maintaining a significant fraction of the original information about Y.

An iterative algorithm for solving the IB problem is given in [9]:

$P^{k+1}(t|x) = \frac{P^k(t)}{Z^{k+1}(x,\beta)} \exp\left(-\beta D_{KL}\left[p(y|x)\,\|\,p^k(y|t)\right]\right),$  (5a)
$P^k(t) = \int_x p(x)\,P^k(t|x)\,dx,$  (5b)
$P^k(y|t) = \frac{1}{P^k(t)} \int_x P^k(t|x)\,p(x,y)\,dx,$  (5c)

where $Z^{k+1}$ is a normalization factor computed in round k + 1.

The Gaussian information bottleneck (GIB) [5] deals with the special case where the underlying distributions are Gaussian. In this case the computed distribution $p(t)$ is Gaussian as well, represented by a linear transformation $T_k = A_k X + \xi_k$, where $A_k$ is a joint covariance matrix of X and T, and $\xi_k \sim \mathcal{N}(0, \Sigma_{\xi_k})$ is a multivariate Gaussian independent of X. The outputs of the algorithm are the covariance matrices representing the linear transformation T: $A_k, \Sigma_{\xi_k}$.

An iterative algorithm is derived by substituting Gaussian distributions into (5), resulting in the following update rules:

$\Sigma_{\xi_{k+1}} = \left(\beta\,\Sigma_{t_k|y}^{-1} - (\beta - 1)\,\Sigma_{t_k}^{-1}\right)^{-1},$  (6a)
$A_{k+1} = \beta\,\Sigma_{\xi_{k+1}}\,\Sigma_{t_k|y}^{-1}\,A_k\,(I - \Sigma_{x|y}\Sigma_x^{-1}).$  (6b)

TABLE I: Summary of notations in the GIB [5] paper vs. the Kalman filter [4].

GIB [5]                  | Kalman [4] | Kalman meaning
Σ_x                      | P_0        | a-priori estimate error covariance
Σ_y                      | Q          | process AWGN noise
Σ_{t_k}                  | R          | measurement AWGN noise
Σ_{xy}, Σ_{yx}           | A, A^T     | process state transformation matrix
Σ_{yx}, Σ_{xy}           | H^T, H     | measurement transformation matrix
Σ_{ξ_k}                  | P_k        | posterior error covariance in round k
Σ_{x|y_k}                | P_k^-      | a-priori error covariance in round k

Since the underlying graphical model of both algorithms (GIB and Kalman filter) is Markovian with Gaussian probabilities, it is interesting to ask what the relation between them is.
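The covariance half of the GIB update is straightforward to transcribe. The sketch below assumes the standard form of update rule (6a) from [5], with $\Sigma_{t_k|y}$ and $\Sigma_{t_k}$ supplied as inputs; note that for β = 1 the update collapses to $\Sigma_{t_k|y}$, which is the link to the Kalman filter explored next:

```python
import numpy as np

def gib_noise_update(Sigma_t_given_y, Sigma_t, beta):
    """GIB update rule (6a):
    Sigma_xi_{k+1} = (beta * inv(Sigma_{t_k|y}) - (beta - 1) * inv(Sigma_{t_k}))^{-1}.
    For beta = 1 this reduces to Sigma_{t_k|y}."""
    inv = np.linalg.inv
    return inv(beta * inv(Sigma_t_given_y) - (beta - 1) * inv(Sigma_t))
```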
In this work we show that the Kalman filter posterior error covariance computation is a special case of the GIB algorithm when β = 1. Furthermore, we show how to compute GIB using the Kalman filter when β > 1 (the case where 0 < β < 1 is not interesting, since it gives a degenerate solution where $A_k \equiv 0$ [5]). Table I outlines the different notations used by both algorithms.

Fig. 1. Comparison of the different graphical models used. (a) Gaussian Information Bottleneck [5] (b) Kalman filter (c) Frey's sum-product factor graph [6] (d) Our new construction.

Theorem 2: The GIB algorithm when β = 1 is equivalent to the Kalman filter algorithm.

The proof is given in Appendix B.

Theorem 3: The GIB algorithm when β > 1 can be computed by a modified Kalman filter iteration.

The proof is given in Appendix C.

There are some differences between the GIB algorithm and the Kalman filter computation. First, the Kalman filter has input observations $z_k$ in each round. Note that the observations do not affect the posterior error covariance computation $P_k$ (eq. 3c), but do affect the posterior mean $\hat{x}_k$ (eq. 3b). Second, the Kalman filter computes both the posterior mean $\hat{x}_k$ and the error covariance $P_k$. The covariance $\Sigma_{\xi_k}$ computed by the GIB algorithm was shown to be identical to $P_k$ when β = 1. The GIB algorithm does not compute the posterior mean, but computes an additional covariance $A_k$ (eq. 6b), which is assumed known in the Kalman filter.

Fig. 2. Comparison of the schematic operation of the different algorithms. (a) iterative information bottleneck operation (b) Kalman filter operation (c) Affine-scaling operation.

From the information-theoretic perspective, our work extends the ideas presented in [10].
Predictive information is defined to be the mutual information between the past and the future of a time series. In that sense, by using Theorem 2, the Kalman filter can be thought of as a prediction of the future which, on the one hand, compresses the information about the past and, on the other hand, maintains information about the present.

The origins of the similarity between the GIB algorithm and the Kalman filter are rooted in the IB iterative algorithm: for computing (5a), we need to compute (5b, 5c) in recursion, and vice versa.

IV. RELATION TO THE AFFINE-SCALING ALGORITHM

One of the most efficient interior-point methods used for linear programming is the Affine-scaling algorithm [11]. It is known that the Kalman filter is linked to the Affine-scaling algorithm [7]. In this work we give an alternate proof, based on a different construction, which shows that Affine-scaling is an instance of the Kalman filter, which is in turn an instance of GIB. This link between estimation and optimization allows for numerous applications. Furthermore, by providing a single distributed efficient implementation of the GIB algorithm, we are able to solve numerous problems in communication networks.

The linear programming problem in its canonical form is given by:

minimize $c^T x$,  (7a)
subject to $Ax = b, \; x \ge 0$,  (7b)

where $A \in \mathbb{R}^{n \times p}$ with rank{A} = p < n. We assume the problem is solvable with an optimal $x^*$. We also assume that the problem is strictly feasible; in other words, there exists $x \in \mathbb{R}^n$ that satisfies $Ax = b$ and $x > 0$.

The Affine-scaling algorithm [11] is summarized below. Assume $x_0$ is an interior feasible point to (7b). Let $D = \mathrm{diag}(x_0)$. Affine-scaling is an iterative algorithm which computes a new feasible point that minimizes the cost function (7a):

$x_1 = x_0 - \frac{\alpha}{\gamma} D^2 r,$  (8)

where $0 < \alpha < 1$ is the step size and r is the step direction:
$r = c - A^T w,$  (9a)
$w = (A D^2 A^T)^{-1} A D^2 c,$  (9b)
$\gamma = \max_i (e_i P D c),$  (9c)

where $e_i$ is the i-th unit vector and P is a projection matrix given by:

$P = I - D A^T (A D^2 A^T)^{-1} A D.$  (10)

The algorithm continues in rounds and is guaranteed to find an optimal solution in at most n rounds. In a nutshell, in each iteration the Affine-scaling algorithm first performs an affine scaling with respect to the current solution point $x_i$ and obtains the direction of descent by projecting the gradient of the transformed cost function on the null space of the constraint set. The new solution is obtained by translating the current solution along the direction found and then mapping the result back into the original space [7]. This has an interesting analogy to the two phases of the Kalman filter.

Theorem 4: The Affine-scaling algorithm iteration is an instance of the Kalman filter algorithm iteration.

The proof is given in Appendix D.

V. EFFICIENT DISTRIBUTED COMPUTATION

We have shown how to express the Kalman filter, Gaussian information bottleneck and Affine-scaling algorithms as a two-step MMSE computation. Each step involves inverting a 2 × 2 block matrix. A recent result by Bickson, Shental et al. [1] shows that the MMSE computation can be done efficiently and distributively using the Gaussian belief propagation algorithm.

Because of space limitations the full algorithm is not reproduced here. The interested reader is referred to [1], [3] for a complete derivation of the GaBP update rules and convergence analysis. The GaBP algorithm is summarized in Table II.

Regarding convergence, if it converges, GaBP is known to result in exact inference [12]. Determining the exact region of convergence and the convergence rate remain open research problems.
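The scalar GaBP update rules summarized in Table II can be sketched as a centralized simulation; in a real deployment each node would exchange the message precisions $P_{ij}$ and means $\mu_{ij}$ over the network, while this dense double loop is illustrative only:

```python
import numpy as np

def gabp_solve(A, b, max_iter=50, tol=1e-9):
    """Solve A x = b with scalar Gaussian BP (update rules of Table II).
    A is assumed symmetric; diagonal dominance guarantees convergence."""
    n = len(b)
    P = np.zeros((n, n))    # message precisions P_{i->j}
    mu = np.zeros((n, n))   # message means      mu_{i->j}
    Pii, mii = np.diag(A).copy(), b / np.diag(A)
    for _ in range(max_iter):
        P_new, mu_new = np.zeros_like(P), np.zeros_like(mu)
        for i in range(n):
            for j in range(n):
                if i == j or A[i, j] == 0:
                    continue
                # aggregate all messages into i, excluding the one from j
                nbrs = [k for k in range(n) if k != i and k != j and A[k, i] != 0]
                P_iexj = Pii[i] + sum(P[k, i] for k in nbrs)
                mu_iexj = (Pii[i] * mii[i] + sum(P[k, i] * mu[k, i] for k in nbrs)) / P_iexj
                P_new[i, j] = -A[i, j] * A[j, i] / P_iexj        # P_{i->j}
                mu_new[i, j] = -A[i, j] * mu_iexj / P_new[i, j]  # mu_{i->j}
        converged = (np.max(np.abs(P_new - P)) < tol and
                     np.max(np.abs(mu_new - mu)) < tol)
        P, mu = P_new, mu_new
        if converged:
            break
    # inference step (stage 4 of Table II)
    Pi = Pii + P.sum(axis=0)
    x = (Pii * mii + (P * mu).sum(axis=0)) / Pi
    return x
```

On a diagonally dominant system the result matches a direct solver; for tree-structured (e.g. tridiagonal) matrices GaBP is exact after a number of rounds equal to the graph diameter.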
All that is known is a sufficient (but not necessary) condition [13], [14] stating that GaBP converges when the spectral radius satisfies $\rho(|I - A|) < 1$, where A is first normalized so that its main diagonal contains ones. A stricter sufficient condition [12] determines that the matrix A must be diagonally dominant (i.e., $|A_{ii}| > \sum_{j \neq i} |A_{ij}|, \; \forall i$) in order for GaBP to converge. Regarding convergence speed, [15] shows that, when converging, the algorithm converges in $O(\log(\epsilon)/\log(\gamma))$ iterations, where $\epsilon$ is the desired accuracy and $1/2 < \gamma < 1$ is a parameter related to the inverted matrix. The computation overhead in each iteration is determined by the number of non-zero elements of the inverted matrix A. In practice, [16] demonstrates convergence in 5-10 rounds on sparse matrices with several millions of variables, and [17] shows convergence for dense constraint matrices of size up to 150,000 × 150,000 in 6 rounds, where the algorithm is run in parallel using 1,024 CPUs. An empirical comparison with other iterative algorithms is given in [2].

TABLE II: Computing $x = A^{-1}b$ via GaBP [3].

#  Stage       Operation
1  Initialize  Compute P_ii = A_ii and μ_ii = b_i / A_ii.
               Set P_ki = 0 and μ_ki = 0, ∀k ≠ i.
2  Iterate     Propagate P_ki and μ_ki, ∀k ≠ i such that A_ki ≠ 0.
               Compute P_{i\j} = P_ii + Σ_{k∈N(i)\j} P_ki and
               μ_{i\j} = P_{i\j}^{-1} (P_ii μ_ii + Σ_{k∈N(i)\j} P_ki μ_ki).
               Compute P_ij = −A_ij P_{i\j}^{-1} A_ji and μ_ij = −P_ij^{-1} A_ij μ_{i\j}.
3  Check       If P_ij and μ_ij did not converge, return to #2. Else, continue to #4.
4  Infer       P_i = P_ii + Σ_{k∈N(i)} P_ki,
               μ_i = P_i^{-1} (P_ii μ_ii + Σ_{k∈N(i)} P_ki μ_ki).
5  Output      x_i = μ_i.

VI. EXAMPLE APPLICATION

The TransFab software package is distributed middleware developed at IBM Haifa Labs, which supports real-time forwarding of message streams while providing quality-of-service guarantees. We plan to use our distributed Kalman filter algorithm for online monitoring of software resources and performance. Every second, each node records a vector of performance parameters such as memory usage, CPU usage, current bandwidth, queue sizes, etc. The nodes execute the distributed Kalman filter algorithm in the background. Figure 3 plots a covariance matrix from an experiment using two TransFab nodes propagating data. The covariance matrix is used as an input to the Kalman filter algorithm. Yellow sections show high correlation between measured parameters. Initial results are encouraging; we plan to report them in a future contribution.

Fig. 3. Covariance matrix which represents vector data captured from two TransFab nodes.

VII. CONCLUSION

In this work we have linked together several different algorithms from the fields of estimation (Kalman filter), clustering/compression (Gaussian information bottleneck) and optimization (Affine-scaling interior-point method). Besides the theoretical interest in linking those different domains, we are motivated by practical problems in communication networks. To this end, we propose an efficient distributed iterative algorithm, the Gaussian belief propagation algorithm, to be used for efficiently solving these problems.

ACKNOWLEDGMENT

O. Shental acknowledges the partial support of the NSF (Grant CCF-0514859). The authors are grateful to Noam Slonim and Naftali Tishby from the Hebrew University of Jerusalem for useful discussions.

REFERENCES

[1] D. Bickson, O. Shental, P. H. Siegel, J. K. Wolf, and D.
Dolev, "Gaussian belief propagation based multiuser detection," in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
[2] ——, "Linear detection via belief propagation," in Proc. 45th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, Sept. 2007.
[3] O. Shental, D. Bickson, P. H. Siegel, J. K. Wolf, and D. Dolev, "Gaussian belief propagation solver for systems of linear equations," in IEEE Int. Symp. on Inform. Theory (ISIT), Toronto, Canada, July 2008.
[4] G. Welch and G. Bishop, "An introduction to the Kalman filter," Tech. Rep., 2006. [Online]. Available: http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html
[5] G. Chechik, A. Globerson, N. Tishby, and Y. Weiss, "Information bottleneck for Gaussian variables," in Journal of Machine Learning Research, vol. 6. Cambridge, MA, USA: MIT Press, 2005, pp. 165–188.
[6] F. Kschischang, B. Frey, and H. A. Loeliger, "Factor graphs and the sum-product algorithm," in IEEE Transactions on Information Theory, vol. 47, Feb. 2001, pp. 498–519.
[7] S. Puthenpura, L. Sinha, S.-C. Fang, and R. Saigal, "Solving stochastic programming problems via Kalman filter and affine scaling," in European Journal of Operational Research, vol. 83, no. 3, 1995, pp. 503–513. [Online]. Available: http://ideas.repec.org/a/eee/ejores/v83y1995i3p503-513.html
[8] N. Tishby, F. Pereira, and W. Bialek, "The information bottleneck method," in The 37th Annual Allerton Conference on Communication, Control, and Computing, invited paper, September 1999.
[9] N. Slonim, "The information bottleneck: Theory and application," Ph.D. Thesis, School of Computer Science and Engineering, The Hebrew University of Jerusalem, 2003.
[10] W. Bialek, I. Nemenman, and N. Tishby, "Predictability, complexity, and learning," in Neural Comput., vol. 13, no. 11. Cambridge, MA, USA: MIT Press, November 2001, pp. 2409–2463.
[11] R. J.
Vanderbei, M. S. Meketon, and B. A. Freedman, "A modification of Karmarkar's linear programming algorithm," in Algorithmica, vol. 1, no. 1, March 1986, pp. 395–407.
[12] Y. Weiss and W. T. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," in Neural Computation, vol. 13, no. 10, 2001, pp. 2173–2200.
[13] J. Johnson, D. Malioutov, and A. Willsky, "Walk-sum interpretation and analysis of Gaussian belief propagation," in Nineteenth Annual Conference on Neural Information Processing Systems (NIPS '05), 2005.
[14] D. M. Malioutov, J. K. Johnson, and A. S. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," in Journal of Machine Learning Research, vol. 7, Oct. 2006.
[15] D. Bickson, Y. Tock, D. Dolev, and O. Shental, "Polynomial linear programming with Gaussian belief propagation," in the 46th Allerton Conf. on Communications, Control and Computing, Monticello, IL, USA, 2008.
[16] D. Bickson and D. Malkhi, "A unifying framework for rating users and data items in peer-to-peer and social networks," in Peer-to-Peer Networking and Applications (PPNA) Journal, Springer-Verlag, 2008.
[17] D. Bickson, D. Dolev, and E. Yom-Tov, "A Gaussian belief propagation solver for large scale support vector machines," in 5th European Conference on Complex Systems, Jerusalem, Sept. 2008.

APPENDIX A

Proof: We prove that inverting the matrix E (eq. 4) is equivalent to one iteration of the Kalman filter for computing $P_k$. We start from the matrix E and show that $P_k^-$ can be computed in recursion using the Schur complement formula

$D - C A^{-1} B$  (11)

applied to the upper left 2 × 2 submatrix of E, with $D \triangleq Q$, $C \triangleq A^T$, $B \triangleq A$, $A \triangleq -P_{k-1}$. We get:

$P_k^- = Q + A^T P_{k-1} A.$
Now we compute recursively the Schur complement of the lower right 2 × 2 submatrix of the matrix E, using the matrix inversion lemma

$A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1},$

where $A^{-1} \triangleq P_k^-$, $B \triangleq H^T$, $C \triangleq H$, $D \triangleq R$. In total we get:

$P_k^- - P_k^- H^T (R + H P_k^- H^T)^{-1} H P_k^- = \big(I - \underbrace{P_k^- H^T (H P_k^- H^T + R)^{-1}}_{(3a)} H\big) P_k^- = \underbrace{(I - K_k H) P_k^-}_{(3c)} = P_k.$  (12)

APPENDIX B

Proof: Looking at [5, §39], when β = 1 we get

$\Sigma_{\xi_{k+1}} = (\Sigma_{t_k|y}^{-1})^{-1} = \Sigma_{t_k|y} = \Sigma_{t_k} - \Sigma_{t_k y}\,\Sigma_y^{-1}\,\Sigma_{y t_k},$

where the last equality is the Gaussian MMSE (conditional covariance) formula. Expanding $\Sigma_{t_k}$, $\Sigma_{t_k y}$ and $\Sigma_{y|t_k}$ using the GIB relations [5, §§33, 34, 38b], in particular $\Sigma_{t_k} = A^T \Sigma_x A + \Sigma_\xi$, the right-hand side becomes

$(A^T \Sigma_x A + \Sigma_\xi) - (A^T \Sigma_x A + \Sigma_\xi)\,A^T \Sigma_{xy}\,\big(\Sigma_y + \Sigma_{yx} A\,(A^T \Sigma_x A + \Sigma_\xi)\,A^T \Sigma_{xy}\big)^{-1}\,\Sigma_{yx} A\,(A^T \Sigma_x A + \Sigma_\xi).$

This formulation is equivalent to the Kalman filter with the following notations:

$P_k^- \triangleq A^T \Sigma_x A + \Sigma_\xi, \quad H \triangleq \Sigma_{yx} A, \quad R \triangleq \Sigma_y, \quad P_{k-1} \triangleq \Sigma_x, \quad Q \triangleq \Sigma_\xi,$

under which the expression above is exactly (12). Now we can apply Theorem 1 and get the desired result.

APPENDIX C

Proof: In the case where β > 1, the MAP covariance matrix as computed by the GIB algorithm is

$\Sigma_{\xi_{k+1}} = \beta\,\Sigma_{t_k|y} + (1 - \beta)\,\Sigma_{t_k}.$  (13)

This is a weighted average of two covariance matrices.
$\Sigma_{t_k}$ is computed in the first phase of the algorithm (equivalent to the prediction phase in the Kalman literature), and $\Sigma_{t_k|y}$ is computed in the second phase (the measurement phase). At the end of the Kalman iteration, we simply compute the weighted average of the two matrices to get (13). Finally, we compute $A_{k+1}$ using (6b) by substituting the modified $\Sigma_{\xi_{k+1}}$.

APPENDIX D

Proof: We start by expanding the Affine-scaling update rule:

$x_1 \overset{(8)}{=} x_0 - \frac{\alpha}{\gamma} D^2 r \overset{(9c)}{=} x_0 - \frac{\alpha\,D^2 r}{\max_i e_i P D c} \overset{(10)}{=} x_0 - \frac{\alpha\,D^2 r}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c}$
$\overset{(9a)}{=} x_0 - \frac{\alpha\,D^2 (c - A^T w)}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c} \overset{(9b)}{=} x_0 - \frac{\alpha\,D^2 \big(c - A^T (A D^2 A^T)^{-1} A D^2 c\big)}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c}$
$= x_0 - \frac{\alpha\,D\,(I - D A^T (A D^2 A^T)^{-1} A D)\,D c}{\max_i e_i (I - D A^T (A D^2 A^T)^{-1} A D) D c}.$

Looking at the numerator and using the Schur complement formula (11) with the notations $A \triangleq A D^2 A^T$, $B \triangleq A D$, $C \triangleq D A^T$, $D \triangleq I$, we see that $I - D A^T (A D^2 A^T)^{-1} A D$ is the Schur complement of the upper left block of the matrix

$\begin{pmatrix} A D^2 A^T & A D \\ D A^T & I \end{pmatrix}.$

Again, the upper left block is a Schur complement, with $A \triangleq 0$, $B \triangleq A D$, $C \triangleq D A^T$, $D \triangleq I$, of the matrix

$\begin{pmatrix} 0 & A D \\ D A^T & I \end{pmatrix}.$

In total we get a 3 × 3 block matrix of the form

$\begin{pmatrix} 0 & A D & 0 \\ D A^T & I & A D \\ 0 & D A^T & I \end{pmatrix}.$

Note that the divisor is a scalar which affects only the scaling of the step size. Using Theorem 1, we get a computation of the Kalman filter with the following parameters: $A, H \triangleq A D$, $Q \triangleq I$, $R \triangleq I$, $P_0 \triangleq 0$. This has an interesting interpretation in the context of the Kalman filter: both the prediction and measurement transformations are identical and equal to AD, and the noise of both transformations is Gaussian with prior $\propto \mathcal{N}(0, I)$.
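As a numerical sanity check of the Affine-scaling update (8)-(10) expanded above, the sketch below performs one iteration on a tiny two-variable LP; the specific LP is an illustrative assumption, not an example from the paper:

```python
import numpy as np

def affine_scaling_step(x0, A, c, alpha=0.5):
    """One Affine-scaling iteration, eqs. (8)-(10), starting from a
    strictly feasible interior point x0 of {A x = b, x > 0}."""
    D = np.diag(x0)
    AD2AT = A @ D @ D @ A.T
    w = np.linalg.solve(AD2AT, A @ D @ D @ c)                        # (9b)
    r = c - A.T @ w                                                  # (9a)
    P = np.eye(len(x0)) - D @ A.T @ np.linalg.solve(AD2AT, A @ D)    # (10)
    gamma = np.max(P @ D @ c)                                        # (9c)
    return x0 - (alpha / gamma) * (D @ D @ r)                        # (8)

# Illustrative LP: minimize x_1 subject to x_1 + x_2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 0.0])
x1 = affine_scaling_step(np.array([0.5, 0.5]), A, c)
# x1 = [0.25, 0.75]: the step stays feasible (x_1 + x_2 = 1) and lowers c^T x
```

Note that (9c) assumes the maximal component of PDc is positive at the current iterate, as it is here; by hand, this step moves [0.5, 0.5] to [0.25, 0.75], halving the cost while preserving feasibility.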