On Computational Complexity Reduction Methods for Kalman Filter Extensions



Matti Raitoharju†‡ and Robert Piché†
† Faculty of Information Technology and Communication Sciences, Tampere University, Finland
‡ Department of Electrical Engineering and Automation, Aalto University, Finland

Abstract—The Kalman filter and its extensions are used in a vast number of aerospace and navigation applications for nonlinear state estimation of time series. In the literature, different approaches have been proposed to exploit the structure of the state and measurement models to reduce the computational demand of the algorithms. In this tutorial, we survey existing code optimization methods and present them in a unified notation that allows them to be used with various Kalman filter extensions. We develop the optimization methods to cover a wider range of models, show how different structural optimizations can be combined, and present new applications for the existing optimizations. Furthermore, we present an example showing that exploiting the structure of the problem can improve the estimation accuracy while reducing the computational load. This tutorial is intended for persons who are familiar with Kalman filtering and want insights for reducing the computational demand of different Kalman filter extensions.

I. INTRODUCTION

Since its pioneering application to trajectory estimation in the Apollo program in the 1960s, the Kalman Filter (KF) and its nonlinear extensions have been used in a multitude of aerospace and navigation applications, including inertial navigation, radar systems, and global navigation satellite systems [18]. KFs are also used in many other application areas, for example state estimation of a lithium polymer battery [11] or brain imaging [21]. The KF is the optimal Bayesian filter under certain conditions, which include linearity of the models and Gaussianity and whiteness of the noise [22].
Kalman Filter Extensions (KFEs) are based on approximate linear-Gaussian models that extend the use of the KF to nonlinear models. In the literature, there are several different types of KFEs, with different demands on computational resources. The computational complexity of a KFE increases when the number of estimated state variables or the dimensionality of the measurement vector increases; in some KFEs even exponentially [25]. The number of state variables varies depending on the application; for example, in some positioning applications only 2 position variables are estimated, while in other positioning applications the state may also contain variables for the locations of thousands of landmarks. In many applications the computational resources are limited, for example in miniaturized satellites [3].

In this tutorial, we study various methods to reduce the computational load of state estimation with KFEs by exploiting the structure of the state transition and measurement models. This tutorial is intended for persons who are familiar with the basics of KFEs and want to know how to reduce their computational demand. The presented algorithms are such that the result is exact when applied to a situation where the KF produces the exact result. This leaves algorithms that are not optimal in the linear-Gaussian case, such as the Ensemble Kalman Filter (EnKF), out of the scope of this tutorial. However, some of the given optimizations can still be applied with such algorithms.

This is an author accepted version of the article. Final published article: doi:10.1109/MAES.2019.2927898. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
We present the algorithms in a general form so that they can be applied to as wide a range of problems as possible, but still in a form that is easy to implement. Some of the algorithms in the original sources are given only for a certain KFE; the general notation used in this tutorial allows the optimizations to be implemented for different KFEs. In addition to surveying the algorithms in the literature, we give some generalizations of the algorithms and new ways of applying the optimizations with the KFEs. To our knowledge, there is no similar unified presentation of optimization methods in the literature.

A drawback of KFEs is that, because of the Gaussian approximation on which the algorithm is based, the estimate is inaccurate when the true posterior is far from normal. Gaussian Mixture Filters (GMFs) (a.k.a. Gaussian sum filters) use a sum of normal densities to estimate the probability distributions and can approximate any probability density function [39]. Because GMFs use several KFEs, their computational load is larger than that of algorithms that use only one Gaussian. The optimizations in this tutorial can also be applied in GMFs for the state propagation and update of the individual components.

For implementations, we assume that a library for basic linear algebra is available. In many practical applications, the algorithms can be optimized further by taking the specific structure of the matrices into account. If the matrices are sparse, the sparsity can be exploited for optimization either by hand or by implementing the algorithms with sparsity-optimized subroutines. These can be found, for example, in Sparse Basic Linear Algebra Subprograms (http://math.nist.gov/spblas/) or in Matlab (http://www.mathworks.com/products/matlab/). The optimizations in matrix algebra libraries also make it impossible to provide accurate complexity estimates of the given algorithms.
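As a small illustration of exploiting matrix structure by hand (a sketch with illustrative values, not an example from the tutorial): if the state transition matrix is diagonal, the covariance prediction F P Fᵀ + Q reduces from two dense matrix products to an elementwise scaling.

```python
import numpy as np

# If F is diagonal (e.g. independent random-walk components), the
# covariance prediction F P F^T + Q reduces from O(n^3) matrix products
# to an O(n^2) elementwise scaling of P.
rng = np.random.default_rng(1)
n = 4
f = rng.standard_normal(n)                      # diagonal of F
F = np.diag(f)
M = rng.standard_normal((n, n))
P = M @ M.T                                     # SPD covariance
Q = 0.1 * np.eye(n)

P_pred_dense = F @ P @ F.T + Q                  # generic route
P_pred_fast = np.outer(f, f) * P + Q            # structure-aware route
print(np.allclose(P_pred_dense, P_pred_fast))   # True
```

The same idea applies to sparse measurement matrices, where sparsity-optimized subroutines do this bookkeeping automatically.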
For example, naive matrix multiplication for square matrices with n columns has complexity O(n^3), while Strassen's algorithm has complexity O(n^2.807) [17], [43], and algorithms with smaller asymptotic complexity exist, although they are faster only for very large n. The computation time of algorithms can also be reduced using parallel computing. The matrix operations can be effectively parallelized in the linear algebra library that handles them. Thus, we do not consider parallelization in this tutorial.

The remainder of this tutorial is organized as follows. In the next section we present the common notations that are used throughout the tutorial. In Section III the background of KFEs is presented. Section IV contains linearity-based optimizations of the KFEs. In Section V the solution of the set of linear equations containing the inverse of the innovation covariance is optimized. Section VI presents optimizations based on division of the state into multiple subprocesses. Section VII gives example applications of the optimization methods. Section VIII concludes the article.

II. NOTATIONS

In the following list are the variables that are used throughout the tutorial. In different sections, there are algorithm-specific notations that are explained as they occur.
– Scalars:
  j   number of iterations
  m   dimension of measurement
  n   dimension of state (or augmented state)
– Random variables:
  x    state
  z    augmented state
  ε    noise
  ε^x  state transition noise
  ε^y  measurement noise
– Subscripts:
  i   iteration index
  t   time index
– Functions:
  f(·)     state transition function
  g(·)     non-specific function
  G(·)     matrix-valued function
  h(·)     measurement function
  diag(·)  diagonal matrix with function arguments on its diagonal
– Expected values:
  E[x]      expected value of x
  P_{g(x)x} covariance of g(x) and x
  μ_x       mean of x
– Other variables:
  I   identity matrix
  J   matrix that defines the statistically linearized relationship between state and measurement; the Jacobian matrix in linear systems
  K   Kalman gain
  S   innovation covariance
  0   zero matrix
  1   matrix containing ones
– Acronyms:
  CKF    Cubature Kalman Filter
  DD     Divided Difference
  EKF    Extended Kalman Filter
  EKF2   Second Order Extended Kalman Filter
  EnKF   Ensemble Kalman Filter
  GF     Gaussian Filter
  GMF    Gaussian Mixture Filter
  KF     Kalman Filter
  KFE    Kalman Filter Extension
  PDR    Pedestrian Dead Reckoning
  QKF    Gauss-Hermite Quadrature Kalman Filter
  RBPF   Rao-Blackwellized Particle Filter
  RUF    Recursive Update Filter
  SLAM   Simultaneous Localization and Mapping
  SOKF2  Second Order Polynomial Kalman Filter
  S2KF   Smart Sampling Kalman Filter
  TDoA   Time Difference of Arrival
  UKF    Unscented Kalman Filter

III. BACKGROUND

In discrete-time Bayesian filtering, the state of a dynamical system is estimated based on noisy measurements. The state model describes how the n-dimensional state x is propagated in time. A state model can be expressed as

  x_t = f_t(x_{t-1}, ε^x_t),   (1)

where f(·) is the state transition function, x_{t-1} is the state at the previous time step, and ε^x_t is the state transition noise.
The state's distribution is updated using measurements of the form

  y_t = h_t(x_t, ε^y_t),   (2)

where y_t is the realized measurement, h(·) is the measurement function, and ε^y is the measurement noise. To shorten notations we use the augmented state z = [x; ε], where applicable. In the augmented state, ε is either the state transition noise or the measurement noise, depending on context. The state and measurement noises are assumed independent and white. We also assume that all the variables of interest are in the state. If the state or measurement noises are not white, they can be modeled using more variables in the state. In the Schmidt-Kalman filter, the additional noise bias states are not estimated; instead, their effect on the covariance is approximated. However, the Schmidt-Kalman filter is suboptimal and we do not treat it in this paper. The interested reader may refer to [26, p. 282]. We omit the time index t when its absence should not cause confusion.

In general, a KFE can be divided into two parts:

1) Prediction:

  μ⁻_{x_t} = μ_{f(z_{t-1})}   (3)
  P⁻_{x_t x_t} = P_{f(z_{t-1}) f(z_{t-1})},   (4)

where μ⁻_{x_t} is the predicted mean computed using the state transition function and the posterior of the previous time step, and P⁻_{x_t x_t} is the predicted covariance.

2) Update:

  y⁻_t = μ_{h(z⁻_t)}   (5)
  S_t = P_{h(z⁻_t) h(z⁻_t)}   (6)
  K_t = P_{x⁻_t h(z⁻_t)} S_t^{-1}   (7)
  μ_{x_t} = μ⁻_{x_t} + K_t (y_t − y⁻_t)   (8)
  P_{x_t x_t} = P⁻_{x_t x_t} − K_t S_t K_t^T,   (9)

where z⁻_t is the augmented state of the predicted state and measurement noise, y⁻_t is the predicted mean of the measurement, S_t is the innovation covariance, K_t is the Kalman gain, μ_{x_t} is the updated mean of the state, and P_{x_t x_t} is the updated state covariance. In (7), P_{x⁻_t h(z⁻_t)} refers to the rows of P_{z⁻_t h(z⁻_t)} that correspond to the state variables.
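A minimal linear-Gaussian instance of the prediction (3)–(4) and update (5)–(9) steps can be sketched as follows. The matrices F, H, Q, R and the constant-velocity example are illustrative, not taken from the tutorial.

```python
import numpy as np

# One KF cycle for a linear model x_t = F x_{t-1} + q, y_t = H x_t + r.
def kf_step(mu, P, y, F, Q, H, R):
    mu_p = F @ mu                       # predicted mean (3)
    P_p = F @ P @ F.T + Q               # predicted covariance (4)
    y_pred = H @ mu_p                   # predicted measurement (5)
    S = H @ P_p @ H.T + R               # innovation covariance (6)
    K = P_p @ H.T @ np.linalg.inv(S)    # Kalman gain (7)
    mu_post = mu_p + K @ (y - y_pred)   # posterior mean (8)
    P_post = P_p - K @ S @ K.T          # posterior covariance (9)
    return mu_post, P_post

# Constant-velocity toy model with a noisy position measurement
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
mu, P = kf_step(np.zeros(2), np.eye(2), np.array([1.2]), F, Q, H, R)
print(P[0, 0] < 1.0)  # True: the measurement shrinks the position variance
```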
The means and covariance matrices associated with the measurement and state transition functions can be written using expectations:

  μ_{g(z)} = E[g(z)]   (10)
  P_{g(z)g(z)} = E[(g(z) − μ_{g(z)})(g(z) − μ_{g(z)})^T]   (11)
  P_{zg(z)} = E[(z − μ_z)(g(z) − μ_{g(z)})^T].   (12)

When a function is linear, that is, of the form

  g(z) = Jz,   (13)

the expectations have analytic form:

  μ_{g(z)} = Jμ_z   (14)
  P_{g(z)g(z)} = JP_{zz}J^T   (15)
  P_{zg(z)} = P_{zz}J^T   (16)

and the algorithm is the KF. There are extensions that approximate the expectations (10)–(12) using analytic differentiation (or integration) of the functions. For example, the Extended Kalman Filter (EKF) uses the first order Taylor expansion and the Second Order Extended Kalman Filter (EKF2) does the linearization based on the second order Taylor expansion [26]. There are also algorithms that use numerical differentiation to approximate (10)–(12). In Divided Difference (DD) filters [31] the computation is based on numerical differentiation of the functions to obtain a linear model. DD filters use 2n + 1 function evaluations. The Second Order Polynomial Kalman Filter (SOKF2) [25] uses (1/2)n² + (3/2)n + 1 points to fit a second order polynomial to the function and then computes the analytic covariance matrices for the polynomial.

One very commonly used and large group of KFE algorithms approximates the expectations (10)–(12) as

  μ_{g(z)} ≈ Σ_i w^s_i g(χ_i)   (17)
  P_{g(z)g(z)} ≈ Σ_i w^c_i (g(χ_i) − μ_{g(z)})(g(χ_i) − μ_{g(z)})^T   (18)
  P_{zg(z)} ≈ Σ_i w^c_i (χ_i − μ_z)(g(χ_i) − μ_{g(z)})^T,   (19)

where the χ_i are so-called sigma-points that are chosen according to the prior distribution and w^s_i and w^c_i are associated weights. Examples of this kind of filter are the Unscented Kalman Filter (UKF) [46], different Cubature Kalman Filters (CKFs) [2], and the Gauss-Hermite Quadrature Kalman filter (QKF) [25].
The selection and number of sigma-points and weights depend on the algorithm used. The UKF is usually used with 2n + 1 sigma-points. The Gaussian Filter (GF) uses kn + 1 sigma-points, where k > 1 is an integer parameter [24]. CKFs are developed for different orders, and they use O(n^o) sigma-points, where o is the order of the cubature rule. The number of sigma-points in QKFs increases exponentially, as the number of sigma-points is α^n, where α > 1 [25]. There are also algorithms that allow an arbitrary number of points, for example the Smart Sampling Kalman Filter (S2KF) [41], which uses at least n + 1 sigma-points.

Algorithms that use (17)–(19) do not compute J explicitly. In some optimizations J is required; it can be computed using statistical linearization [37]:

  J = P_{g(z)z} P_{zz}^{-1}.   (20)

For nonlinear systems, this cannot be substituted into (15) to obtain P_{g(z)g(z)}. Because the computation of (20) requires solving a set of linear equations, it should be avoided when the dimension of z is large.

The computational requirements of the algorithms can be reduced in several ways. One way is to compute the expected values numerically only for the state variables that are transformed by a nonlinear function and compute the expectations for the linear parts analytically [10]. Improving implementation efficiency by using the linear KF update for the linear state variables is more familiar in particle filtering: the Rao-Blackwellized Particle Filter (RBPF) solves the estimates of the conditionally linear variables using a KF and the rest of the variables using particles [14].

The dimension of the output of the nonlinear part can also be reduced to lower the computational burden. In (18) an m × m covariance matrix is updated for every sigma-point. Thus, halving the dimension m of g(·) reduces the operations applied to the covariance matrix by a factor of four. Such update optimizations are considered in Section IV.
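The weighted-point approximation (17)–(19) and the recovery of J by statistical linearization (20) can be sketched as follows. The symmetric 2n-point set below is the UKF set with κ = 0 (the centre point then has zero weight); the numeric values are illustrative. For a linear g the recovered J must equal the true matrix exactly.

```python
import numpy as np

# Sigma-point moments (17)-(19) with a symmetric 2n-point set.
def sigma_point_moments(g, mu, P):
    n = len(mu)
    L = np.linalg.cholesky(n * P)
    chis = [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
    w = np.full(2 * n, 0.5 / n)
    Y = np.array([g(c) for c in chis])
    mu_g = w @ Y                                                               # (17)
    P_gg = sum(wi * np.outer(y - mu_g, y - mu_g) for wi, y in zip(w, Y))       # (18)
    P_zg = sum(wi * np.outer(c - mu, y - mu_g) for wi, c, y in zip(w, chis, Y))  # (19)
    return mu_g, P_gg, P_zg

J_true = np.array([[1.0, 2.0], [0.5, -1.0]])
mu = np.array([0.3, -0.2])
P = np.array([[1.0, 0.2], [0.2, 0.5]])
mu_g, P_gg, P_zg = sigma_point_moments(lambda z: J_true @ z, mu, P)
# Statistical linearization (20): J = P_{g(z)z} P_zz^{-1} = (P_zz^{-1} P_{zg(z)})^T
J_hat = np.linalg.solve(P, P_zg).T
print(np.allclose(J_hat, J_true))  # True: exact for a linear g
```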
It is also possible to compute the Kalman gain (7) faster by exploiting the structure of the innovation covariance (6), as shown in Section V. When the state can be divided into decoupled blocks, the blocks can be updated independently and the global estimate can be computed only when needed. The updates of the blocks can be made at different rates. These optimizations are presented in Section VI.

IV. OPTIMIZATIONS BASED ON THE LINEARITY IN NONLINEAR FUNCTIONS

A. Partially Linear Functions

In [7], algorithms for treating different setups of the UKF were presented. The algorithms considered different implementations when the state transition model or measurement model has one of the following three forms:

  g(x, ε) = Gx + ε   (21)
  g(x, ε) = g_1(x) + ε   (22)
  g(x, ε) = g_2(x, ε),   (23)

where the noise ε is assumed independent of the state. The algorithms improve in computational efficiency when either or both of the state and measurement models belong to the first or second group. If both are in the first group, the updates can be computed exactly using the KF. If the function belongs to the second group, then (10)–(12) become

  μ_{g(z)} = E[g(x)] + μ_ε   (24)
  P_{g(z)g(z)} = E[(g(x) − μ_{g(x)})(g(x) − μ_{g(x)})^T] + P_{εε}   (25)
  P_{xg(z)} = E[(x − μ_x)(g(x) − μ_{g(x)})^T]   (26)

and the approximation can be computed for the dimension of the state instead of the dimension of the augmented state. This optimization is widely used among different filters and is often considered to be the standard model.

In [30] the function is split into a nonlinear part g_1(·) that depends only on a part z_n of the state and a linear part:

  h(z) = [ g_1(z_n) ; H_1 z ].   (27)

The UKF is used for computing the expected values of the nonlinear part and the correlation between the nonlinear and linear parts.
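The second-group reduction (24)–(26), computing sigma-points in the state dimension only and adding the noise moments afterwards, can be sketched as follows. The linear test function and the numeric values are illustrative; for a linear g the reduced computation must match the exact moments.

```python
import numpy as np

# Moments of g(x, eps) = g1(x) + eps via (24)-(26): sigma-points are
# generated for x only; mu_eps and P_ee are added analytically.
def moments_additive(g1, mu_x, P_xx, mu_e, P_ee):
    n = len(mu_x)
    L = np.linalg.cholesky(n * P_xx)
    chis = [mu_x + L[:, i] for i in range(n)] + [mu_x - L[:, i] for i in range(n)]
    w = np.full(2 * n, 0.5 / n)
    Y = np.array([g1(c) for c in chis])
    mu_g1 = w @ Y
    mu_g = mu_g1 + mu_e                                               # (24)
    P_gg = sum(wi * np.outer(y - mu_g1, y - mu_g1)
               for wi, y in zip(w, Y)) + P_ee                         # (25)
    P_xg = sum(wi * np.outer(c - mu_x, y - mu_g1)
               for wi, c, y in zip(w, chis, Y))                       # (26)
    return mu_g, P_gg, P_xg

B = np.array([[1.0, -0.5]])
mu_x = np.array([0.2, 0.7]); P_xx = np.array([[1.0, 0.3], [0.3, 0.8]])
mu_e = np.array([0.1]); P_ee = np.array([[0.04]])
mu_g, P_gg, P_xg = moments_additive(lambda x: B @ x, mu_x, P_xx, mu_e, P_ee)
print(np.allclose(mu_g, B @ mu_x + mu_e),
      np.allclose(P_gg, B @ P_xx @ B.T + P_ee),
      np.allclose(P_xg, P_xx @ B.T))  # True True True for linear g1
```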
We generalize the above models to functions of the form

  g(z) = A g_n(Tz) + Hz,   (28)

where T has full row rank. To reduce the required computational resources, T is chosen to have the minimal number of rows and g_n(·) to have the minimal number of elements. The expectations (10)–(12) are computed for Tz, which should have a smaller dimension than z. For computing the update, an algorithm that approximates the expectations (10)–(12) is required. These are usually computed in the update stage of a KFE. The cross-correlation matrix P_{zg(z)} (19) is not needed in the normal state propagation, but the algorithm from the update stage can be used for it. The transformed augmented state is denoted

  z̃ = Tz ∼ N(μ_z̃ = Tμ_z, P_z̃z̃ = TP_{zz}T^T).   (29)

When the expectations μ_{g_n(z̃)} (10), P_{g_n(z̃)g_n(z̃)} (11), and P_{z̃g_n(z̃)} (12) are known, the expectations for (28) are

  μ_{g(z)} = Aμ_{g_n(z̃)} + Hμ_z   (30)
  P_{g(z)g(z)} = [A H] [ P_{g_n(z̃)g_n(z̃)}  P^T_{zg_n(z̃)} ; P_{zg_n(z̃)}  P_{zz} ] [A^T ; H^T]   (31)
  P_{zg(z)} = [ P_{zg_n(z̃)}  P_{zz} ] [A^T ; H^T],   (32)

where

  P_{zg_n(z̃)} = P_{zz}T^T (TP_{zz}T^T)^{-1} P_{z̃g_n(z̃)}.   (33)

This is based on the fact that the cross term P_{z̃g_n(z̃)} describes the linear dependence between the function g_n(z̃) and z̃ as in (20):

  P_{z̃g_n(z̃)} = P_{z̃z̃}J^T = TP_{zz}T^T J^T,   (34)
and the term P_{zg_n(z̃)} is

  P_{zg_n(z̃)} = P_{zz}T^T J^T.   (35)

Solving J from (34) and substituting it into (35), we get (33). In the update, the matrix with cross terms is required for the state variables only. It can be extracted by

  P_{xh(z)} = [I 0] P_{zh(z)}   (36)

when the state variables appear first in the augmented state. Naturally, in computationally efficient code (36) is done with indexing, not matrix multiplication.

Algorithm 1: State transition using a partially linear model
  Input:
    x_{t-1} ∼ N(μ_{x_{t-1}}, P_{x_{t-1}x_{t-1}})  // State estimate from previous time step
    ε^q ∼ N(μ_{ε^q}, P_{ε^q ε^q})  // State transition noise
    P_{x_{t-1}ε^q}  // Cross covariance between state and state transition noise
    f(x, ε^q) = f(z) = A g(Tz) + Hz  // State transition function of the given form
  Output: x ∼ N(μ_{x⁻}, P_{x⁻x⁻})  // Propagated state
  μ_z = [μ_{x_{t-1}}; μ_{ε^q}]  // Augmented mean
  P_{zz} = [P_{x_{t-1}x_{t-1}}  P_{x_{t-1}ε^q}; P^T_{x_{t-1}ε^q}  P_{ε^q ε^q}]  // Augmented covariance
  μ_z̃ = Tμ_z  // Transformed mean
  P_z̃z̃ = TP_{zz}T^T  // Transformed covariance
  Compute μ_{g(z̃)}, P_{g(z̃)g(z̃)}, and P_{z̃g(z̃)} using a KFE
  P_{zg(z̃)} = P_{zz}T^T (TP_{zz}T^T)^{-1} P_{z̃g(z̃)}
  μ_{x⁻} = Aμ_{g(z̃)} + Hμ_z  // Predicted mean of state
  P_{x⁻x⁻} = [A H][P_{g(z̃)g(z̃)}  P^T_{zg(z̃)}; P_{zg(z̃)}  P_{zz}][A^T; H^T]  // Predicted covariance

The algorithm for the state transition is given in Algorithm 1 and an algorithm for the state update is given in Algorithm 2. Use of these algorithms is beneficial when the dimension of z̃ is smaller than that of z, so that the matrix inverse in (33) is applied in a small dimension. Algorithms 1 and 2 are given in a general form; for applications, they can be optimized further by considering the structures of the matrices. Examples of exploiting the partially linear state are given in Sections VII-A, VII-B, and VII-D.
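A sketch of Algorithm 1 for a model f(z) = A g(Tz) + Hz follows: the KFE moments are computed only for the low-dimensional transformed state z̃ = Tz and then lifted back with (30)–(33). The symmetric sigma-point set plays the role of the KFE, and the 3-dimensional toy model is illustrative, not from the paper. With a linear g, the result must equal exact linear propagation.

```python
import numpy as np

def unscented_moments(g, mu, P):
    # Symmetric 2k-point set for the k-dimensional transformed state.
    k = len(mu)
    L = np.linalg.cholesky(k * P)
    chis = [mu + L[:, i] for i in range(k)] + [mu - L[:, i] for i in range(k)]
    w = np.full(2 * k, 0.5 / k)
    Y = np.array([np.atleast_1d(g(c)) for c in chis])
    mu_g = w @ Y
    P_gg = sum(wi * np.outer(y - mu_g, y - mu_g) for wi, y in zip(w, Y))
    P_tg = sum(wi * np.outer(c - mu, y - mu_g) for wi, c, y in zip(w, chis, Y))
    return mu_g, P_gg, P_tg

def propagate_partially_linear(mu_z, P_zz, A, T, H, g):
    mu_t, P_tt = T @ mu_z, T @ P_zz @ T.T              # transformed state (29)
    mu_g, P_gg, P_tg = unscented_moments(g, mu_t, P_tt)
    P_zg = P_zz @ T.T @ np.linalg.solve(P_tt, P_tg)    # back-projection (33)
    mu_x = A @ mu_g + H @ mu_z                         # predicted mean (30)
    AH = np.hstack([A, H])
    big = np.block([[P_gg, P_zg.T], [P_zg, P_zz]])
    return mu_x, AH @ big @ AH.T                       # predicted covariance (31)

# Consistency check with a linear g(t) = 2t: result equals F mu, F P F^T
# with F = 2 A T + H.
T = np.array([[1.0, 0.0, 0.0]])
A = np.array([[1.0], [0.0], [0.0]])
H = np.diag([0.0, 1.0, 1.0])
rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
P_zz = M @ M.T + 0.5 * np.eye(3)
mu_z = np.array([0.5, -1.0, 2.0])
mu_x, P_xx = propagate_partially_linear(mu_z, P_zz, A, T, H, lambda t: 2.0 * t)
F_true = 2.0 * A @ T + H
print(np.allclose(mu_x, F_true @ mu_z), np.allclose(P_xx, F_true @ P_zz @ F_true.T))
```

Only the one-dimensional z̃ is sampled here; the sigma-point work is independent of the full state dimension.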
B. Conditionally Linear Measurements

In [29], a situation is considered where a function can be divided into a nonlinear part z_n and a conditionally linear part z_l:

  g(z) = g_n(z_n) + G_n(z_n) z_l.   (37)

In the original article, the distribution of this nonlinear function is approximated using a modification of the UKF. The number of sigma-points depends on the dimension of z_n instead of the dimension of the full state. This algorithm is converted for use with the GF in [5]. Here we present the algorithm in a general form that allows it to be used with any KFE that uses weighted points to compute the expectations, as in (24)–(26), although it cannot be used with other types of filters, e.g. DD or SOKF2.

Algorithm 2: Update using a partially linear measurement
  Input:
    x⁻ ∼ N(μ_{x⁻}, P_{x⁻x⁻})  // Prior state
    ε^y ∼ N(μ_{ε^y}, P_{ε^y ε^y})  // Measurement noise
    P_{x⁻ε^y}  // Cross covariance between state and measurement noise
    h(x, ε^y) = h(z) = A g(Tz) + Hz  // Measurement function of the given form
  Output: x ∼ N(μ_x, P_{xx})  // Posterior estimate
  μ_z = [μ_{x⁻}; μ_{ε^y}]  // Mean of the augmented state
  P_{zz} = [P_{x⁻x⁻}  P_{x⁻ε^y}; P^T_{x⁻ε^y}  P_{ε^y ε^y}]  // Covariance matrix of the augmented state
  μ_z̃ = Tμ_z  // Transformed mean
  P_z̃z̃ = TP_{zz}T^T  // Transformed covariance
  Compute μ_{g(z̃)}, P_{g(z̃)g(z̃)}, and P_{z̃g(z̃)} using a KFE
  P_{zg(z̃)} = P_{zz}T^T (TP_{zz}T^T)^{-1} P_{z̃g(z̃)}
  μ_{h(z)} = Aμ_{g(z̃)} + Hμ_z  // Predicted mean of measurement
  P_{h(z)h(z)} = [A H][P_{g(z̃)g(z̃)}  P^T_{zg(z̃)}; P_{zg(z̃)}  P_{zz}][A^T; H^T]  // Innovation covariance
  P_{xh(z)} = ([P_{zg(z̃)}  P_{zz}][A^T; H^T])_{[1:n, :]}  // State-measurement cross correlation
  K = P_{xh(z)} P^{-1}_{h(z)h(z)}  // Kalman gain
  μ_x = μ_{x⁻} + K(y − μ_{h(z)})  // Posterior mean
  P_{xx} = P_{x⁻x⁻} − KP_{h(z)h(z)}K^T  // Posterior covariance
In the algorithm, the sigma-points are computed for the nonlinear part only and then used for computing the conditional moments of the conditionally linear part:

  μ_{z_l|χ_i} = μ_{z_l} + P_{z_l z_n} P^{-1}_{z_n z_n} (χ_i − μ_{z_n})   (38)
  P_{z_l z_l|χ_i} = P_{z_l z_l} − P_{z_l z_n} P^{-1}_{z_n z_n} P_{z_n z_l}.   (39)

The matrices in (38)–(39) are independent of the sigma-point χ_i. Thus, P_{z_l z_n} P^{-1}_{z_n z_n} and P_{z_l z_l|χ_i} need to be computed only once. According to [29], the expectations for a function of the form (37) can be approximated as

  Y_i = g_n(χ_i) + G_n(χ_i) μ_{z_l|χ_i}   (40)
  μ_{g(z)} = Σ_i w_i Y_i   (41)
  P_{g(z)g(z)} = Σ_i w_i [ (Y_i − μ_{g(z)})(Y_i − μ_{g(z)})^T + G_n(χ_i) P_{z_l z_l|χ_i} G_n(χ_i)^T ]   (42)
  P_{zg(z)} = Σ_i w_i [ ([χ_i; μ_{z_l|χ_i}] − μ_z)(Y_i − μ_{g(z)})^T + [0; P_{z_l z_l|χ_i} G_n(χ_i)^T] ].   (43)

These formulas can be used to compute the expectations of the nonlinear part g(z̃) in Algorithms 1 and 2 if there are conditionally linear state variables. This is done in the examples in Sections VII-B and VII-C.

V. OPTIMIZATIONS RELATED TO THE INVERSE OF THE INNOVATION COVARIANCE

A. Block Diagonal Measurement Covariance

The formula for the Kalman gain (7) contains the inverse of the innovation covariance S. When the dimension of the innovation covariance (i.e. the number of measurements) is large, computing the inverse or solving the corresponding set of linear equations is a computationally expensive operation.

When the measurements are linear and the measurement covariance is diagonal, the Kalman update can be applied one measurement element at a time, with the partially updated state used as a prior for the next measurement [42]. This kind of update can also be generalized to block diagonal measurement covariances, with the update applied one block at a time.
This reduces the computation time when the state dimension is small, the measurement dimension is large, and the measurements are independent. When some measurements are independent and the different measurements involve different state terms, updating the state using only a part of the measurements at a time can be effectively combined with the algorithms presented in Section IV. The block update does not change the estimate when the measurement model is linear, but when the model is nonlinear the linearization is redone in the partially updated state. This may alter the output of the algorithm. In [34], [35], applying the measurements one element at a time is used to improve the estimation accuracy by applying the most linear measurements first. The algorithm for updating by blocks is given in Algorithm 3.

Algorithm 3: Block Update KF
  Input:
    x_0 ∼ N(μ_{x,0}, P_{xx,0})  // Prior state
    ε^{y_i} ∼ N(μ_{ε^{y_i}}, P_{ε^{y_i}ε^{y_i}}), 1 ≤ i ≤ n  // Measurement noises
    h_i(x, ε^{y_i}), 1 ≤ i ≤ n  // Measurement functions
  Output: x_n ∼ N(μ_{x,n}, P_{xx,n})  // Posterior estimate
  for i = 1 to n do
    y⁻_i = μ_{h_i(z⁻_{t,i})}
    S_i = P_{h_i(z⁻_{t,i}) h_i(z⁻_{t,i})}
    K_i = P_{x h_i(z⁻_{t,i})} S_i^{-1}
    μ_{x,i} = μ_{x,i-1} + K_i (y_i − y⁻_i)
    P_{xx,i} = P_{xx,i-1} − K_i S_i K_i^T
  end

B. Applying the Matrix Inversion Lemma to the Innovation Covariance

When the measurements are not independent, the block update formula cannot be used. In some situations the innovation covariance can be written in the form

  S = P_s + U P_v U^T,   (44)

where P_s is easy to invert, P_v has a small dimension, and U is a transformation matrix. Using the matrix inversion lemma [42, p. 62], the inverse of (44) is

  S^{-1} = P_s^{-1} − P_s^{-1} U (P_v^{-1} + U^T P_s^{-1} U)^{-1} U^T P_s^{-1}.   (45)

This formula is worth using if the inverse of P_s is easy to compute or can be computed offline and the dimension of P_v is smaller than the dimension of P_s. An example of its use is given in Section VII-D.
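A numerical check of the matrix inversion lemma (45): S = P_s + U P_v Uᵀ is inverted via the small k × k inner matrix instead of the large m × m matrix S. The sizes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k = 50, 3                          # measurement dim, low-rank dim
d = rng.uniform(1.0, 2.0, m)
Ps_inv = np.diag(1.0 / d)             # P_s diagonal, so its inverse is trivial
Pv = np.eye(k)
U = rng.standard_normal((m, k))
S = np.diag(d) + U @ Pv @ U.T

# (45): only a k x k matrix is inverted (here k = 3 instead of m = 50)
inner = np.linalg.inv(np.linalg.inv(Pv) + U.T @ Ps_inv @ U)
S_inv = Ps_inv - Ps_inv @ U @ inner @ U.T @ Ps_inv
print(np.allclose(S_inv @ S, np.eye(m)))  # True
```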
VI. OPTIMIZATION BASED ON DIVIDING THE STATE INTO INDIVIDUAL ESTIMATION PROCESSES

In this section, we study optimizations that can be applied when the state can be divided into separate subprocesses. The situation where only a part of the state variables occurs in the measurements, and the part of the state that is not measured does not change in time, has been considered in [12], [20]. In that situation the estimation can be done at each time step only for the part of the state that has been observed, with the whole state updated only occasionally.

Such algorithms are developed further in [19]. The algorithm given in [19] assumes that the state can be divided into substates that can be updated individually. The division can be done when the measurement and state transition models in a time interval from t_0 to t_1 are decoupled between the substates. State blocks x_1, x_2, ... are considered decoupled if the state transition and measurement models can be expressed as

  f_t(x, ε) = [ f_{1,t}(x_{1,t-1}, ε^x_{1,t}) ; f_{2,t}(x_{2,t-1}, ε^x_{2,t}) ; ⋮ ]   (46)
  h_t(x, ε) = [ h_{1,t}(x_{1,t}, ε^y_{1,t}) ; h_{2,t}(x_{2,t}, ε^y_{2,t}) ; ⋮ ],   (47)

where the noise terms ε^x and ε^y are independent. The purpose of this optimization is to track state variables that are decoupled from time index t_0 to time index t_1. Note that there is no requirement for the blocks x_1, x_2, ... to be independent at t_0. If the blocks were independent at t_0 and the models could be expressed using (46) and (47) at all time instances, then the system could be solved using a set of independent KFEs.

A similar idea for splitting the state into multiple blocks is presented in [8], [45] under the name of multiple quadrature Kalman filtering.
In multiple quadrature Kalman filtering, there is no strict requirement for the blocks to be decoupled, which makes the algorithm applicable to a larger set of problems but leads to additional approximations, so we follow [19].

To update block x_a from time t_0 to t_1, a new Gaussian variable x̄, partitioned into two parts x and x̂, is introduced:

  x̄_{a,t_0} = [x_{a,t_0}; x̂_{a,t_0}]
    ∼ N( [μ_{x_a,t_0}; μ_{x̂_a,t_0}], [P_{x_a x_a,t_0}  P_{x_a x̂_a,t_0}; P_{x̂_a x_a,t_0}  P_{x̂_a x̂_a,t_0}] )
    = N( [μ_{x_a,t_0}; μ_{x_a,t_0}], [P_{x_a x_a,t_0}  P_{x_a x_a,t_0}; P_{x_a x_a,t_0}  P_{x_a x_a,t_0}] ).   (48)

In the estimation of this initially degenerate variable from time t_0 to t_1, x_a is updated and propagated normally, while x̂_a changes only through its dependence on the part x_a. In the prediction, the state elements belonging to x_a are transformed and x̂_a remains static:

  [x_{a,t}; x̂_{a,t}] = [f_{a,t}(x_{a,t-1}, ε^x_{a,t}); x̂_{a,t-1}].   (49)

The measurement function is applied only to the first elements:

  h_{a,t}([x_{a,t}; x̂_{a,t}], ε^y_{a,t}) = h(x_{a,t}, ε^y_{a,t}).   (50)

These updates can be done with any suitable KF or other estimation method. The division of the state into blocks also allows updating the blocks at different rates. The number of elements in the covariance matrix of each block is (2n_block)²; thus, if a 100-dimensional state is divided into 100 decoupled blocks, the updates of the blocks change 400 values in covariance matrices, while an update of the full state would change 10000 values.

The estimates for each block contain only information about the measurements and state transition of that block, and not information that comes from other blocks through the dependencies of the blocks at time t_0. At time t_1 the information of the different blocks is merged in a global update that is done in two phases: first, a virtual update is constructed from all blocks and applied to the full state estimate at time index t_0.
Then a virtual state prediction is applied to the virtually updated full state. This order is the opposite of the usual Kalman filter cycle, where the prediction is usually done first and the update after it. The virtual update fuses the individual blocks so that the correlations that existed between the blocks at time t_0 are taken into account. To denote state parameters that have been updated with the virtual update we use superscript v; parameters to which both the virtual update and the prediction have been applied are denoted with superscript +.

A. Virtual update

The virtual update updates the state with a Kalman update that uses virtual observations y^v, measurement matrix J^v, and measurement covariance R^v [19]. The parameters for the virtual update are

  y^v_{t_1} = [y^v_{1,t_1}; y^v_{2,t_1}; y^v_{3,t_1}; ⋮]   (51)
  J^v_{t_1} = I   (52)
  R^v_{t_1} = diag(R^v_{1,t_1}, R^v_{2,t_1}, R^v_{3,t_1}, ...),   (53)

where the components have the form

  y^v_{a,t_1} = μ_{x_a,t_0} + P_{x_a x_a,t_0} (P_{x_a x_a,t_0} − P_{x̂_a x̂_a,t_1})^{-1} (μ̂_{a,t_1} − μ_{x_a,t_0})   (54)
  R^v_{a,t_1} = P_{x_a x_a,t_0} (P_{x_a x_a,t_0} − P_{x̂_a x̂_a,t_1})^{-1} P_{x_a x_a,t_0} − P_{x_a x_a,t_0}.   (55)

Both require the inversion of (P_{x_a x_a,t_0} − P_{x̂_a x̂_a,t_1}), which may be singular. One example of it being singular is the situation of not having any measurements concerning the current block. In [19], the singular situation is handled using the singular value decomposition. In the following, we give a new formulation that does not require the singular value decomposition and is equivalent to the update presented in [19]. The formulation is derived in the Appendix.
In the new formulation, the "posterior" parameters after the virtual update are

  μ^v_{x,t_1} = μ_{x,t_0} + [ (P_{x̂x̂,t_1})^{-1} − (P^D_{xx,t_0})^{-1} + (P_{xx,t_0})^{-1} ]^{-1} (P_{x̂x̂,t_1})^{-1} (μ̂_{x,t_1} − μ_{x,t_0})   (56)
  P^v_{xx,t_1} = [ (P_{x̂x̂,t_1})^{-1} − (P^D_{xx,t_0})^{-1} + (P_{xx,t_0})^{-1} ]^{-1},   (57)

where P_{x̂x̂,t_1} and P^D_{xx,t_0} are the block diagonal matrices

  P_{x̂x̂,t_1} = diag(P_{x̂_1 x̂_1,t_1}, P_{x̂_2 x̂_2,t_1}, P_{x̂_3 x̂_3,t_1}, ...)   (58)
  P^D_{xx,t_0} = diag(P_{x_1 x_1,t_0}, P_{x_2 x_2,t_0}, P_{x_3 x_3,t_0}, ...).   (59)

Because these matrices are block diagonal, their inverses can be computed blockwise. This formulation does not require inversion of singular matrices, provided that
  • the prior P_{xx,t_0} has full rank, and
  • the posterior P_{xx,t_1} has full rank.

B. Virtual state propagation

After all blocks are updated as presented in the previous subsection, the state is propagated with a virtual state propagation to obtain the global posterior. The state propagation model is linear:

  x = F^v(x^v − μ̂) + μ + ε^v,   (60)

where ε^v has zero mean and covariance Q^v. Thus the posterior mean and covariance are

  μ^+_{x,t_1} = F^v μ^v − F^v μ̂ + μ   (61)
  P^+_{xx,t_1} = F^v P^v F^{vT} + Q^v.   (62)

The state propagation parameters have the following form:

  μ = [μ^T_{x_1,t_1}  μ^T_{x_2,t_1}  ...]^T   (63)
  μ̂ = [μ̂^T_{x_1,t_1}  μ̂^T_{x_2,t_1}  ...]^T   (64)
  F^v = diag(F^v_1, F^v_2, F^v_3, ...)   (65)
  Q^v = diag(Q^v_1, Q^v_2, Q^v_3, ...),   (66)

where the matrix blocks are

  F^v_a = P_{x_a x̂_a,t_1} P^{-1}_{x̂_a x̂_a,t_1}   (67)
  Q^v_a = P_{x_a x_a,t_1} − P_{x_a x̂_a,t_1} P^{-1}_{x̂_a x̂_a,t_1} P_{x̂_a x_a,t_1}.   (68)

The full algorithm for filtering a model whose state has decoupled blocks is given in Algorithm 4.
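The equivalence of the information-form virtual update (56)–(57) and the explicit virtual Kalman update built from (51)–(55) with J^v = I can be cross-checked numerically. The two-block scalar example below uses illustrative values, and both blocks are observed so that the (P_a − P̂_a) inverses in (54)–(55) exist.

```python
import numpy as np

mu0 = np.array([0.0, 1.0])
P0 = np.array([[2.0, 0.5], [0.5, 1.0]])   # full correlated prior at t0
Phat = np.diag([1.0, 0.8])                # block posteriors P_xhat,t1 (block diag)
muhat = np.array([0.3, 0.9])              # block posterior means at t1
PD = np.diag(np.diag(P0))                 # P^D: block diagonal of the prior

# Route 1: virtual observation (54), covariance (55), Kalman update with J = I
D = PD - Phat
yv = mu0 + PD @ np.linalg.inv(D) @ (muhat - mu0)     # (54)
Rv = PD @ np.linalg.inv(D) @ PD - PD                 # (55)
G = P0 @ np.linalg.inv(P0 + Rv)
mu_v1 = mu0 + G @ (yv - mu0)
P_v1 = P0 - G @ P0

# Route 2: information form (56)-(57); no inversion of D is needed
P_v2 = np.linalg.inv(np.linalg.inv(Phat) - np.linalg.inv(PD) + np.linalg.inv(P0))  # (57)
mu_v2 = mu0 + P_v2 @ np.linalg.inv(Phat) @ (muhat - mu0)                           # (56)

print(np.allclose(mu_v1, mu_v2), np.allclose(P_v1, P_v2))  # True True
```

Route 2 only inverts the block diagonal matrices and the prior, which is what makes the static-block special case of the next subsection cheap.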
C. Static blocks

In this section, we consider a common special case that allows the formulas of the previous sections to be optimized further. This corresponds to the algorithm presented in [27]. Let x_s be a block that does not change during the estimation from time t_0 to t_1. For this block

\mu_{x_s,t_0} = \mu_{x_s,t_1} = \mu_{\hat{x}_s,t_0} = \mu_{\hat{x}_s,t_1}   (69)

P_{x_s x_s,t_0} = P_{\hat{x}_s \hat{x}_s,t_0} = P_{x_s x_s,t_1} = P_{\hat{x}_s \hat{x}_s,t_1} = P_{\hat{x}_s x_s,t_1}   (70)

and there is no need to compute their values before the final update, where the whole state is updated. Looking at (54) and (55), we see that the formulation presented in [19] would require the inversion of zero matrices. In (57) the inverses can be computed blockwise; because the corresponding blocks of (P_{\hat{x}\hat{x},t_1})^{-1} and (P^D_{xx,t_0})^{-1} are identical, they cancel each other and so do not need to be computed. In the virtual state propagation, F^v_s is an identity matrix and Q^v_s is a zero matrix. An example of applying the block algorithm is presented in Section VII-E.

VII. EXAMPLE APPLICATIONS

A. Fourier series

In this example, we show how the general form of Algorithm 2 is simplified for a specific application, and we compare the computational complexity of the optimized version when using the UKF as the estimation algorithm.
Algorithm 4: KF with decoupled blocks

Input: x_{t_0} ∼ N(\mu_{x,0}, P_{xx,0})  // Prior state that can be divided into decoupled blocks
  f_{t,a}(x_a, \varepsilon^x_{t,a})  // State transition functions for the blocks for each time step
  h_{t,a}(x_a, \varepsilon^y_{t,a})  // Measurement functions for the blocks for each time step
  // Note that each block may have a different number of state transition and measurement functions between t_0 and t_1
Output: x ∼ N(\mu^+_{x,t_1}, P^+_{xx,t_1})  // Posterior estimate
for a = 1 to n do
  \mu_{x_a} = \begin{bmatrix} \mu_{x_a,t_0} \\ \mu_{x_a,t_0} \end{bmatrix}  // Initialize block mean
  P_{x_a x_a} = \begin{bmatrix} P_{x_a,t_0 x_a,t_0} & P_{x_a,t_0 x_a,t_0} \\ P_{x_a,t_0 x_a,t_0} & P_{x_a,t_0 x_a,t_0} \end{bmatrix}  // Initialize block covariance
end
for a = 1 to n do
  for t_a = t_0 to t_1 do
    Update \mu_{x_a} and P_{x_a x_a} with the state propagation and measurement models at time t_a using a suitable filter and (49)–(50).
  end
end
// Update the global state after each block has been updated to time t_1
Compute the virtual update mean \mu^v_{x,t_1} with (56) and covariance P^v_{xx,t_1} with (57).
Compute the virtual prediction mean \mu^+_{x,t_1} with (61) and covariance P^+_{xx,t_1} with (62).

In this example, an n-dimensional state is inferred given a scalar measurement of the form

y = a_0 + \sum_{j=1}^{k} ( a_j \sin(j x_1) + b_j \cos(j x_1) ) + \varepsilon_y,   (71)

where \varepsilon_y ∼ N(0, R) is independent of the state. The augmented state is [ x^T  \varepsilon_y ]^T. The matrices for Algorithm 2 are T = [ 1  0  \cdots ] and H = [ 0  \cdots  0  1 ]. Using these, the transformed mean is \mu_{\tilde{z}} = \mu_{x_1} and the transformed covariance is P_{\tilde{z}\tilde{z}} = P_{x_1 x_1}. Sigma points are generated using the one-dimensional mean \mu_{x_1} and covariance P_{x_1 x_1}, and the moments are

\mu_{g(\tilde{z})} = \sum_i w_i g(\chi_i) = \sum_i w_i ( a_0 + \sum_{j=1}^{k} ( a_j \sin(j \chi_i) + b_j \cos(j \chi_i) ) )

P_{g(\tilde{z}) g(\tilde{z})} = \sum_i w_i ( g(\chi_i) - \mu_{g(\tilde{z})} )( g(\chi_i) - \mu_{g(\tilde{z})} )^T

P_{\tilde{z} g(\tilde{z})} = \sum_i w_i ( \chi_i - \mu_{\tilde{z}} )( g(\chi_i) - \mu_{g(\tilde{z})} )^T.   (72)
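The one-dimensional moment computation (72) can be sketched as follows; three unscented sigma points suffice regardless of the full state dimension n. The weight convention (parameter κ) is one common UKF choice, and the function name is ours.

```python
import numpy as np

def fourier_moments(mu_x1, P_x1x1, a, b, kappa=2.0):
    """Moments (72) from sigma points on the scalar x1 only.
    a = [a0, a1, ..., ak] and b = [b1, ..., bk] are the Fourier coefficients."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    # one-dimensional unscented sigma points: 3 points, independent of n
    s = np.sqrt((1.0 + kappa) * P_x1x1)
    chis = np.array([mu_x1, mu_x1 + s, mu_x1 - s])
    w = np.array([kappa, 0.5, 0.5]) / (1.0 + kappa)
    j = np.arange(1, len(a))
    g = np.array([a[0] + np.sum(a[1:] * np.sin(j * c) + b * np.cos(j * c))
                  for c in chis])
    mu_g = np.sum(w * g)                            # predicted measurement
    P_gg = np.sum(w * (g - mu_g) ** 2)              # variance of g (add R for (75))
    P_zg = np.sum(w * (chis - mu_x1) * (g - mu_g))  # cross term with x1
    return mu_g, P_gg, P_zg
```

The cost per sigma point is O(k), giving the O(k) and O(n + k) entries of Table I below once the linear parts are added back.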
TABLE I
COMPUTATIONAL COMPLEXITY OF PARTS OF THE UPDATE OF THE STATE WITH DIFFERENT OPTIMIZATIONS

Computation of | Optimized version | Basic UKF
Sigma points | O(1) | O(n^3)
\mu_{g(z)} | O(k) | O(nk)
P_{h(z)h(z)} | O(k) | O(nk)
P_{z g(z)} | O(n + k) | O(n^2 + nk)
K | O(n) | O(n)
\mu | O(n) | O(n)
P | O(n^2) | O(n^2)

Equation (33) becomes

P_{z g(\tilde{z})} = P_{zz}[:,1] P_{zz}[1,1]^{-1} P_{\tilde{z} g(\tilde{z})}.   (73)

Because the measurement error has zero mean, we have

\mu_{h(z)} = \mu_{g(\tilde{z})}   (74)

and, as it is independent of the state, the last element of (73) is 0. Thus, we can simplify (31) to

P_{h(z)h(z)} = P_{g(\tilde{z})g(\tilde{z})} + R   (75)

and (36) to

P_{x h(z)} = P_{zz}[:,1] P_{zz}[1,1]^{-1} P_{\tilde{z} g(\tilde{z})}.   (76)

Using the above equations, we evaluate the complexity improvements when they are applied with a UKF that uses 2n + 1 sigma points. Table I shows the asymptotic complexities of the basic UKF and of the optimized version. The complexities are given as functions of k, to account for the complexity of the measurement function, and of n, the number of state variables. For the optimized version, the most complex parts of the update are the computation of the cross-correlation, with complexity O(n + k), and the update of the covariance matrix, with complexity O(n^2). The most demanding parts for the basic UKF are the computation of the cross-correlation, which has complexity O(n^2 + nk), and the computation of the sigma points, which has complexity O(n^3) due to the matrix square root of the covariance matrix. This implies that the given optimization reduces the computational load both when n dominates and when k dominates.

B. Pedestrian Dead Reckoning

In Pedestrian Dead Reckoning (PDR), a pedestrian's position is estimated based on steps and heading changes that are measured with accelerometers and gyroscopes.
In [32], the state consists of a two-dimensional position r, heading θ, step length l, and possibly a floor index. The state transition is computed with a particle filter. Here we show how the state model, without the floor index and with normal noise, can be optimized for use with KFEs. In this example we use the algorithms presented in Sections IV-A and IV-B.

The nonlinear state transition model is

x_{t+1} = \begin{bmatrix} r_{1,t+1} \\ r_{2,t+1} \\ \theta_{t+1} \\ l_{t+1} \end{bmatrix} = \begin{bmatrix} r_{1,t} + (l_t + \varepsilon^x_l) \cos(\theta_t + \varepsilon^x_\theta) + \varepsilon^x_{r,1} \\ r_{2,t} + (l_t + \varepsilon^x_l) \sin(\theta_t + \varepsilon^x_\theta) + \varepsilon^x_{r,2} \\ \theta_t + \varepsilon^x_\theta \\ l_t + \varepsilon^x_l \end{bmatrix},   (77)

where \varepsilon^x_\theta ∼ N(\Delta_t, \sigma^2_\theta) is the noisy heading change, with \Delta_t measured from the gyroscopes, l is the footstep length, and r is the position of the pedestrian. The augmented state is

z = [ r_1  r_2  \theta  l  \varepsilon^x_{r,1}  \varepsilon^x_{r,2}  \varepsilon^x_\theta  \varepsilon^x_l ]^T.

The state transition model can be written in the form of (28):

\tilde{z} = \begin{bmatrix} \theta_t + \varepsilon^x_\theta \\ l_t + \varepsilon^x_l \end{bmatrix},   T = [ 0_{2×2}  I_{2×2}  0_{2×2}  I_{2×2} ]

g(\tilde{z}) = \begin{bmatrix} \tilde{z}_2 \cos \tilde{z}_1 \\ \tilde{z}_2 \sin \tilde{z}_1 \end{bmatrix},   A = \begin{bmatrix} I_{2×2} \\ 0_{2×2} \end{bmatrix},   H = [ I_{4×4}  I_{4×4} ].   (78)

In this formulation the reduced state \tilde{z} is two-dimensional, and the nonlinear part of the state transition function is also two-dimensional.

If this problem is to be solved with a KFE that uses sigma points, we can optimize the implementation further. This is possible because \tilde{z}_2 in (78) is conditionally linear given \tilde{z}_1. The parts of function (37) are

z_n = \tilde{z}_1,   z_l = \tilde{z}_2,   g_n(\tilde{z}_1) = \begin{bmatrix} 0 \\ 0 \end{bmatrix},   G_n(\tilde{z}_1) = \begin{bmatrix} \cos \tilde{z}_1 \\ \sin \tilde{z}_1 \end{bmatrix}.   (79)

Using this, the dimension of the nonlinear part of the state is reduced to 1, and the sigma points are generated for only one dimension. The reduction of the computational complexity obtained with these optimizations depends on the KFE used. Using QKF with parameter α = 3, the number of sigma points for computing (17)–(19) is reduced from 6561 to 3.
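The conditionally linear decomposition (79) can be checked numerically: with g_n = 0, the 2×1 factor G_n reproduces the full nonlinear step displacement of (78). The function names are ours.

```python
import numpy as np

def G_n(theta):
    """Linear-in-step-length factor of the PDR step model, eq. (79)."""
    return np.array([np.cos(theta), np.sin(theta)])

def step_displacement(theta, length):
    """Full nonlinear g(z~) of (78): planar displacement of one step."""
    return np.array([length * np.cos(theta), length * np.sin(theta)])

# The conditionally linear form g_n + G_n(z~_1) z~_2 with g_n = 0 reproduces g:
theta, length = 50 * np.pi / 180, 1.0
assert np.allclose(G_n(theta) * length, step_displacement(theta, length))
```

A sigma-point filter then only needs points for the scalar heading dimension; the step length is handled exactly through G_n.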
With the UKF the reduction is not as significant: the number of sigma points is reduced from 17 to 3.

Figure 1 shows the mean estimates of the first position variable r_1 after one propagation step in a simulation. The initial state has step length 1 and step direction 50π/180. The initial covariance is strongly correlated, being diag(1, 1, 1, 0.01, 0.01, 0.01, π^2/180^2, 0.00001) + 1. The estimates are computed using S2KF with a varying number of sigma points; an implementation of the sigma-point generator and of S2KF can be found in [40]. S2KF is applied to the original state (77) and to the reduced states presented above, with 2 and 1 nonlinear variables.

Fig. 1. Estimated mean of r_1 with different numbers of function evaluation points using S2KF and nonlinear state dimensions 8, 2, and 1.

The propagation of the state with the full state model cannot be computed with fewer than 9 sigma points. The state computed using the model that exploits conditional linearity in addition to linearity has the smoothest convergence to an estimate, and the estimate with the full state has the largest variation. This shows that the use of the optimizations can also improve the estimation accuracy, as the sigma points will be better aligned along the dimensions with nonlinearities. However, if one uses a KFE whose number of sigma points cannot be altered, such as the UKF with its 2n + 1 sigma points, one may end up with better (or sometimes worse) accuracy without the optimizations simply because there are more sigma points. We do not compute the accuracy of the estimates in the following examples, as it depends on the selected KFE and its parameters; one can expect the accuracy to be approximately the same when the optimizations are applied.
C. Remaining useful life prediction of a lithium-ion battery

In this example we apply the optimizations to the prediction of the remaining useful life of a lithium-ion battery. The original model was proposed in [28]. The state consists of four estimated variables x = [ x_1  x_2  x_3  x_4 ]^T, the state transition model is linear, x_{t+1} = x_t + \varepsilon^x with \varepsilon^x ∼ N(0, Q), and the measurement model is

y_t = x_{3,t} e^{t x_{1,t}} + x_{4,t} e^{t x_{2,t}} + \varepsilon_y,   (80)

where y_t is the measured capacity of the lithium-ion battery at cycle t, and \varepsilon_y has variance R.

Because the state transition model is completely linear, it is evident that the propagation should be done using the linear KF, which becomes

\mu^-_{x_t} = I \mu_{x_{t-1}} = \mu_{x_{t-1}}

P^-_{x_t x_t} = I P_{x_{t-1} x_{t-1}} I^T + Q_t = P_{x_{t-1} x_{t-1}} + Q_t.   (81)

This prediction requires 16 additions when Q is added to the state covariance, or 4 if Q is diagonal. If the prediction were computed with a UKF and an augmented state, there would be 11 sigma points with 5 elements each, so the computation of the predicted covariance (18) would require 550 products and 275 summations. In addition, the generation of the sigma points, which involves a matrix square root computation, and the computation of the mean (17) would have to be performed. The absolute numbers of computations are small for modern computers either way, but reducing the number of computations by a factor greater than 200 is a significant reduction of operations.

The measurement model has one additive component, \varepsilon_y, and we can use (24)–(26) for it. Variables x_3 and x_4 are conditionally linear, and we can apply the equations from Section IV-B. In this case

\mu_{x_l} = \mu_{[3,4]},   \mu_{x_n} = \mu_{[1,2]}

P_{x_l x_l} = P_{[3,4],[3,4]},   P_{x_l x_n} = P_{[3,4],[1,2]},   P_{x_n x_n} = P_{[1,2],[1,2]}

g_n(\chi_i) = 0,   G_n(\chi_i) = [ e^{t \chi_{i,1}}  e^{t \chi_{i,2}} ].   (82)

These can be used to compute \mu_{x_l|\chi_i} (38) and P_{x_l x_l|\chi_i} (39).
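The conditional moments (38)–(39) are the usual conditional Gaussian formulas; a sketch with illustrative names, using the index sets of (82):

```python
import numpy as np

def conditional_moments(mu, P, lin, non, chi):
    """Conditional Gaussian moments of the linear block given a sigma point chi
    on the nonlinear block (cf. (38)-(39)); lin/non are index lists."""
    Pnn = P[np.ix_(non, non)]
    Pln = P[np.ix_(lin, non)]
    W = Pln @ np.linalg.inv(Pnn)
    mu_l = mu[lin] + W @ (chi - mu[non])   # conditional mean, (38)
    P_l = P[np.ix_(lin, lin)] - W @ Pln.T  # conditional covariance, (39)
    return mu_l, P_l
```

For the battery model, lin = [2, 3] (x_3, x_4), non = [0, 1], and the regressor for sigma point χ_i is G_n(χ_i) of (82).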
Merging (24)–(26) and (40)–(43) gives us

Y_i = G_n(\chi_i) \mu_{x_l|\chi_i}

\mu_{g(z)} = \sum_i w_i Y_i

P_{g(z)g(z)} = \sum_i w_i ( (Y_i - \mu_{g(z)})(Y_i - \mu_{g(z)})^T + G_n(\chi_i) P_{x_l x_l|\chi_i} G_n(\chi_i)^T ) + R

P_{z g(z)} = \sum_i w_i ( ( \begin{bmatrix} \chi_i \\ \mu_{x_l|\chi_i} \end{bmatrix} - \mu_x ) (Y_i - \mu_{g(z)})^T + \begin{bmatrix} 0 \\ P_{x_l x_l|\chi_i} G_n(\chi_i)^T \end{bmatrix} ).   (83)

In this situation, the computational gains come mostly from the reduction of the number of sigma points, as the dimension is reduced from 5 to 2. When using the UKF, the number of sigma points is reduced from 11 to 5. However, the above computations are more complex for each summand than if the moments had been computed using (17)–(19), so the optimizations based on conditional linearity do not reduce the computational complexity here. When using QKF with parameter α = 4, on the other hand, the number of sigma points is reduced from 4^5 = 1024 to 4^2 = 16, and the optimizations are useful.

D. Source Tracking Using Microphone Array Time Difference of Arrival (TDoA)

In tracking of a sound source using a microphone array, the received acoustic signals of two microphones are compared so that the TDoA of the acoustic signal is measured. In the ideal situation the TDoA is

h_{i,j}(r_0) = ( ||r_i - r_0|| - ||r_j - r_0|| ) / v,   (84)

where r_0 is the location of the sound source, r_i is the location of the i-th microphone, and v is the speed of sound. When there are m microphones in the array, there are at most \frac{1}{2} m(m-1) TDoA measurements available [33]. In practice, measurements contain noise. Here we consider the model with noises

h_{i,j}(x) = ||r_i - x|| - ||r_j - x|| + \varepsilon^y_i - \varepsilon^y_j + \varepsilon^y_{i,j},   (85)

where \varepsilon^y_i is the noise corresponding to the i-th microphone and \varepsilon^y_{i,j} is the error corresponding to the correlation of the measured signals from the microphone pair. The speed of sound has been multiplied out from the equation.
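The noise-free part of the pairwise model (85) can be sketched as follows (the function name and pair ordering are ours):

```python
import numpy as np

def tdoa_pairs(x, mics):
    """Noise-free pairwise range differences of (85): ||r_i - x|| - ||r_j - x||
    for all i < j, in the order (1,2), (1,3), ..., (m-1,m)."""
    d = np.linalg.norm(mics - x, axis=1)  # ranges from the source to each microphone
    m = len(mics)
    return np.array([d[i] - d[j] for i in range(m) for j in range(i + 1, m)])

mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
x = np.array([0.0, 0.0])
# at the first microphone: d = [0, 4, 3] -> differences [-4, -3, 1]
assert np.allclose(tdoa_pairs(x, mics), [-4.0, -3.0, 1.0])
```

As the text states, m microphones give m(m-1)/2 such measurements; here m = 3 gives 3.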
When \varepsilon^y_{i,j} = 0 for all microphone pairs, the measurement noise covariance is not of full rank, and the measurement equations can be modeled with m − 1 measurements of the form

h_i(x) = ||r_i - x|| - ||r_m - x|| + \varepsilon^y_i - \varepsilon^y_m,   (86)

which has a full-rank noise covariance matrix. This kind of assumption is made, for example, in [4]. In practice, the assumption does not hold, and the selection of microphone pairs is a tradeoff: pairs close to each other have a smaller correlation error \varepsilon^y_{i,j}, but worse geometry than pairs far away from each other [38]. Here we consider the situation where all possible microphone pairs are used and the errors \varepsilon^y_i and \varepsilon^y_{i,j} are modeled as Gaussians. The augmented state is

z = [ x^T  \varepsilon^y_1  \cdots  \varepsilon^y_m  \varepsilon^y_{1,2}  \cdots  \varepsilon^y_{m-1,m} ]^T,   (87)

where x contains 3 position and 3 velocity variables. Using (24)–(26), the nonlinear part of the estimation can be done only for the state variables, so sigma points are required only for the 6-dimensional state instead of also for the \frac{1}{2}m(m+1) noise terms. The measurement model can be written in compact form using (28):

\tilde{z} = x_{1:3},   T = [ I_{3×3}  0 ]

g(\tilde{z}) = [ ||r_1 - \tilde{z}||  \cdots  ||r_m - \tilde{z}|| ]^T

A = \begin{bmatrix} 1_{(m-1)×1} & -I_{m-1} & & \\ 0_{(m-2)×1} & 1_{(m-2)×1} & -I_{m-2} & \\ \vdots & & & \ddots \\ 0_{1×(m-2)} & & 1 & -1 \end{bmatrix}

H = [ 0_{\frac{m(m-1)}{2}×6}  A  I_{\frac{m(m-1)}{2}×\frac{m(m-1)}{2}} ].   (88)

Using these, the dimension of the nonlinear function is reduced from \frac{1}{2}m(m-1) to m. This reduces the number of updated elements for each sigma point in (18) from ( \frac{m(m-1)}{2} )^2 to m^2. The A and H matrices are sparse, and an application of sparse linear algebra routines would further enhance the performance of the algorithm. The dimension of the innovation covariance S is \frac{m(m-1)}{2} × \frac{m(m-1)}{2}.
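The structured inverse of S exploited below can be verified numerically; the covariances used here are random stand-ins, the pair ordering matches (88), and this is a sketch rather than the paper's code. Note the Woodbury identity has a minus sign and an A^T D_2^{-1} A inner term.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                 # microphones
p = m * (m - 1) // 2                  # TDoA pairs
# pair-difference matrix A: +1 at microphone i, -1 at microphone j for pair (i, j)
A = np.zeros((p, m))
for k, (i, j) in enumerate((i, j) for i in range(m) for j in range(i + 1, m)):
    A[k, i], A[k, j] = 1.0, -1.0
D1 = np.diag(rng.uniform(0.5, 1.0, m))  # microphone noise variances
D2 = np.diag(rng.uniform(0.5, 1.0, p))  # pair correlation noise variances
B = rng.standard_normal((m, m))
Pg = B @ B.T + m * np.eye(m)            # stand-in for P_{g(z~)g(z~)}, SPD

S = D2 + A @ (Pg + D1) @ A.T            # innovation covariance, cf. (89)
D2i = np.diag(1.0 / np.diag(D2))        # diagonal inverse is elementwise
Q = np.linalg.inv(Pg + D1) + A.T @ D2i @ A   # only m x m inverses needed
S_inv = D2i - D2i @ A @ np.linalg.inv(Q) @ A.T @ D2i
assert np.allclose(S_inv, np.linalg.inv(S))
```

Only m × m matrices are inverted explicitly, instead of the p × p innovation covariance.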
The computation of the inverse of the innovation covariance for the Kalman gain (7) can be optimized using the inversion formula (45). The noise terms are assumed independent, so the noise covariance matrix R is diagonal. We partition the noises into two parts and denote the corresponding covariance matrices D_1 and D_2; D_1 corresponds to the microphone-specific terms \varepsilon^y_i and D_2 to the microphone-pair-specific terms \varepsilon^y_{i,j}. Because \tilde{z} contains only state variables and they do not contribute to the linear part ( A P_{\tilde{z} g(\tilde{z})}^T H^T = 0 ), the innovation covariance can be written as

S = D_2 + A ( P_{g(\tilde{z})g(\tilde{z})} + D_1 ) A^T   (89)

and its inverse is

S^{-1} = D_2^{-1} - D_2^{-1} A Q^{-1} A^T D_2^{-1},   (90)

where Q = ( P_{g(\tilde{z})g(\tilde{z})} + D_1 )^{-1} + A^T D_2^{-1} A. Because D_2 is diagonal, its inverse is the diagonal matrix whose diagonal elements are the reciprocals of the diagonal elements of D_2. The other inverses that appear in this formula are applied to m × m matrices instead of \frac{m(m-1)}{2} × \frac{m(m-1)}{2} matrices. If the matrix inversion algorithm used has complexity O(k^3), the complexity of the inversion operation is reduced from O(m^6) to O(m^3).

Fig. 2. Posterior means computed with RUF with varying numbers of iterations (T = 1, 2, 5, 10), together with the prior mean, the true location, the base station (BS) locations, and the measurements.

E. Optimization of Iterative KFEs

The optimizations based on the division of the state into multiple sub-blocks in Section VI are most widely used in the field of Simultaneous Localization and Mapping (SLAM) [15], [19], [20]. In SLAM, the state contains a dynamic part that represents the agent (robot), which is mapping static landmarks in the environment while localizing itself; the landmark coordinates are the static part of the state. We propose that these optimizations can be applied also to KFEs that do the update iteratively, e.g.
the Recursive Update Filter (RUF) [47] and its extension for sigma-point filters [23], or posterior linearization filters [16], [36]. In RUF, the state is updated by applying the update in parts that have a smaller impact than the original update; after each partial update the linearization is recomputed. These iterative filters were developed assuming additive Gaussian noise. Instead of making the updates for the full state, the update could be done iteratively only for the observed variables. Because the iterative update concerns a single observation, the unobserved state variables remain static during the partial updates.

In this example we consider a state model with 3D position, velocity, and acceleration, i.e. 9 state variables. The measurements used are range measurements from 3 base stations. The measurement model for the range from the i-th base station is

y_i = || x_{[1:3]} - r_i || + \varepsilon^y_i,   (91)

where r_i is the location of the i-th base station. In our example, the prior has a large variance, and its mean is chosen so that the linearization about the mean is not accurate.

TABLE II
SUMMARY OF PRESENTED OPTIMIZATIONS

Section | Exploited structure
IV-A | Partially linear functions f(z) = A g(Tz) + Hz
IV-B | Conditionally linear functions f(z) = g_n(z_n) + G(z_n) z_l
V-A | Measurement covariance is block diagonal
V-B | Innovation covariance of the form S = D_2 + A ( P_{g(\tilde{z})g(\tilde{z})} + D_1 ) A^T
VI | State can be divided into multiple subprocesses

Figure 2 shows the example situation and the posterior means computed with different numbers of iterations of RUF. The estimate with one iteration (T = 1) is identical to the EKF estimate and is not close to the true location, whereas the estimate with 10 iterations is close to the true location. The computation of 10 iterations with RUF involves updating the 9 × 9 state covariance matrix 10 times.
Because the measurement model (91) depends only on the first three state variables, this block can be updated 10 times using the methods presented in Section VI, and the remaining variables can be left in a block that does not need any update until the full covariance update, which is computed after all iterations have been applied. In this update scheme, the 3 × 3 covariance matrix is updated T times and the 9 × 9 matrix is updated only once, using (56)–(68). Thus, in each iteration the covariance update is 9 times faster. Moreover, if the KFE computes sigma points using a Cholesky decomposition, as the UKF does, there is a further gain in speed, because the complexity of the Cholesky decomposition is O(n^3).

VIII. CONCLUSIONS

In this tutorial, we presented different optimizations for KFEs that exploit the structure of the state and measurement models to reduce the computational load. Table II summarizes the structures of the models that are exploited in the different optimizations; these structures are present in various nonlinear estimation problems. Table III shows some properties of selected KFEs. Different optimizations give different amounts of speedup with different KFEs. The optimizations presented in Sections IV-A and IV-B are most useful with KFEs whose number of function evaluations grows quickly with the state dimension. The exploitation of the conditionally linear part of functions in Section IV-B requires that the expectations are approximated with equations of the form (17)–(19).

In addition to surveying existing code optimization methods and using a general unified notation that allows them to be used with various KFEs, we make the following contributions:

1) We point out the possibility of using a linear transformation to find the minimal nonlinear subspace (Section IV-A).
2) We introduce an algorithm for systems with conditionally linear states (Section IV-B); it can be used to compute the moments for the nonlinear part and to solve the rest using linearity (Section IV-A), as is done in the example in Section VII-B.

TABLE III
SUMMARY OF PROPERTIES OF SELECTED KFES

Algorithm | Order of measurement function evaluations | Computation of J | Expectations of form (17)–(19) | Iterative algorithm
EKF [26] | 1 | A | |
EKF2 [26] | 1 | A | |
UKF [46] | n | I | X |
CKF [2] | n | I | X |
S2KF [41] | n + α | I | X |
GF [24] | αn | I | X |
QKF [25] | α^n | I | X |
DD [31] | n | N | |
SOKF2 [25] | n^2 | N | |
RUF [47] | j | A | | X

(α = algorithm-specific parameter, α > 0; j = number of iterations. J is computed A = analytically, I = using (20), or N = numerically inside the algorithm.)

3) We present a new formulation for estimation with a state that can be divided into separate estimation processes (Section VI). The new formulation avoids dealing with inverses of singular matrices.
4) We present an example showing that, in addition to a gain in computation speed, the optimization algorithms may also lead to better estimation accuracy (Section VII-B).
5) We show how to use the matrix inversion lemma to reduce the computational complexity in TDoA-based positioning (Section VII-D).
6) We present an example showing that the optimization that exploits a static state and partly unobserved variables can be applied with iterative KFEs (Section VII-E).

The optimizations in this tutorial were applied to KFEs that use a mean vector and a covariance matrix to represent the state. However, there is another flavour of KFs, called square root filters, that propagate a square root of the covariance matrix instead of the full matrix for better numerical properties [1], [6], [13], [44]. In [9], square root filtering is optimized for the situation where some of the state variables are linear, and applying independent measurements sequentially as in Section V-A is straightforward.
However, most of the optimizations presented in this tutorial have not been considered in square-root form in the literature, and this remains an open topic. Another interesting topic is whether optimizations such as the ones in this tutorial could be applied with the EnKF. The EnKF is mainly used for problems whose state space dimension is so large that it is infeasible to work with the covariance matrix. In such a context, one can expect that the effort invested in finding and implementing computation optimizations would be especially worthwhile.

REFERENCES

[1] Arasaratnam, I., and Haykin, S. Square-root quadrature Kalman filtering. IEEE Transactions on Signal Processing 56, 6 (June 2008), 2589–2593.
[2] Arasaratnam, I., and Haykin, S. Cubature Kalman filters. IEEE Transactions on Automatic Control 54, 6 (June 2009), 1254–1269.
[3] Arnold, S. S., Nuzzaci, R., and Gordon-Ross, A. Energy budgeting for CubeSats with an integrated FPGA. In 2012 IEEE Aerospace Conference (March 2012), pp. 1–14.
[4] Bechler, D., Schlosser, M. S., and Kroschel, K. System for robust 3D speaker tracking using microphone array measurements. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Sendai, Japan, 2004), vol. 3, IEEE, pp. 2117–2122.
[5] Beutler, F., Huber, M. F., and Hanebeck, U. D. Gaussian filtering using state decomposition methods. In Proceedings of the 12th International Conference on Information Fusion (FUSION) (Seattle, WA, USA, July 2009), IEEE, pp. 579–586.
[6] Bierman, G. J. Factorization Methods for Discrete Sequential Estimation. Academic Press, Inc., New York, 1977.
[7] Briers, M., Maskell, S. R., and Wright, R. A Rao-Blackwellised unscented Kalman filter.
In Proceedings of the 6th International Conference on Information Fusion (FUSION) (Cairns, Australia, July 2003), vol. 1, IEEE, pp. 55–61.
[8] Closas, P., Fernandez-Prades, C., and Vila-Valls, J. Multiple quadrature Kalman filtering. IEEE Transactions on Signal Processing 60, 12 (Dec 2012), 6125–6137.
[9] Closas, P., and Fernández-Prades, C. The marginalized square-root quadrature Kalman filter. In 2010 IEEE 11th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (June 2010), pp. 1–5.
[10] Closas, P., Vilà-Valls, J., and Fernández-Prades, C. Computational complexity reduction techniques for quadrature Kalman filters. In 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) (Dec 2015), pp. 485–488.
[11] Da Costa, S. C. L., Araujo, A. S., and D. S. Carvalho, A. Battery state of charge estimation using extended Kalman filter. In 2016 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM) (June 2016), pp. 1085–1092.
[12] Davison, A. J. Mobile Robot Navigation Using Active Vision. PhD thesis, Department of Engineering Science, University of Oxford, 1998.
[13] Van der Merwe, R., and Wan, E. A. The square-root unscented Kalman filter for state and parameter estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01) (2001), vol. 6, pp. 3461–3464.
[14] Doucet, A., de Freitas, N., Murphy, K. P., and Russell, S. J. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (San Francisco, CA, USA, 2000), UAI '00, Morgan Kaufmann Publishers Inc., pp. 176–183.
[15] Estrada, C.
, Neira, J., and Tardós, J. D. Hierarchical SLAM: real-time accurate mapping of large environments. IEEE Transactions on Robotics 21, 4 (Aug 2005), 588–596.
[16] García-Fernández, Á. F., Svensson, L., Morelande, M. R., and Särkkä, S. Posterior linearisation filter: principles and implementation using sigma points. IEEE Transactions on Signal Processing (2015).
[17] Golub, G. H., and Van Loan, C. F. Matrix Computations, vol. 3. JHU Press, 2012.
[18] Grewal, M. S., and Andrews, A. P. Applications of Kalman filtering in aerospace 1960 to the present [historical perspectives]. IEEE Control Systems Magazine 30, 3 (June 2010), 69–78.
[19] Guivant, J. E. The generalized compressed Kalman filter. Robotica 35, 8 (2017), 1639–1669.
[20] Guivant, J. E., and Nebot, E. M. Optimization of the simultaneous localization and map-building algorithm for real-time implementation. IEEE Transactions on Robotics and Automation 17, 3 (Jun 2001), 242–257.
[21] Hiltunen, P., Särkkä, S., Nissilä, I., Lajunen, A., and Lampinen, J. State space regularization in the nonstationary inverse problem for diffuse optical tomography. Inverse Problems 27, 2 (2011), 025009.
[22] Ho, Y.-C., and Lee, R. A Bayesian approach to problems in stochastic estimation and control. IEEE Transactions on Automatic Control 9, 4 (October 1964), 333–339.
[23] Huang, Y., Zhang, Y., Li, N., and Zhao, L. Design of sigma-point Kalman filter with recursive updated measurement. Circuits, Systems, and Signal Processing (2015), 1–16.
[24] Huber, M. F., and Hanebeck, U. D. Gaussian filter based on deterministic sampling for high quality nonlinear estimation. In Proceedings of the 17th IFAC World Congress (2008), vol. 41, pp. 13527–13532.
[25] Ito, K., and Xiong, K.
Gaussian filters for nonlinear filtering problems. IEEE Transactions on Automatic Control 45, 5 (May 2000), 910–927.
[26] Jazwinski, A. H. Stochastic Processes and Filtering Theory. Academic Press, New York, NY, USA, 1970.
[27] Lefebvre, T., Bruyninckx, H., and De Schutter, J. Nonlinear Kalman Filtering for Force-Controlled Robot Tasks, vol. 19 of Springer Tracts in Advanced Robotics. Springer-Verlag, Berlin, Germany, 2005.
[28] Miao, Q., Xie, L., Cui, H., Liang, W., and Pecht, M. Remaining useful life prediction of lithium-ion battery with unscented particle filter technique. Microelectronics Reliability 53, 6 (2013), 805–810.
[29] Morelande, M. R., and Moran, B. An unscented transformation for conditionally linear models. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Honolulu, HI, USA, April 2007), vol. 3, IEEE, pp. III-1417–III-1420.
[30] Morelande, M. R., and Ristic, B. Reduced sigma point filtering for partially linear models. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Toulouse, France, May 2006), vol. 3, IEEE.
[31] Nørgaard, M., Poulsen, N. K., and Ravn, O. New developments in state estimation for nonlinear systems. Automatica 36, 11 (Nov. 2000), 1627–1638.
[32] Nurminen, H., Ristimäki, A., Ali-Löytty, S., and Piché, R. Particle filter and smoother for indoor localization. In Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN) (Montbéliard, France, Oct 2013), IEEE, pp. 1–10.
[33] Oualil, Y., Faubel, F., Doss, M. M., and Klakow, D. A TDOA Gaussian mixture model for improving acoustic source tracking.
In Proceedings of the 20th European Signal Processing Conference (EUSIPCO) (Bucharest, Romania, 2012), IEEE, pp. 1339–1343.
[34] Raitoharju, M., García-Fernández, Á. F., and Piché, R. Kullback–Leibler divergence approach to partitioned update Kalman filter. Signal Processing 130 (2017), 289–298.
[35] Raitoharju, M., Piché, R., Ala-Luhtala, J., and Ali-Löytty, S. Partitioned update Kalman filter. Journal of Advances in Information Fusion (June 2016), 3–14.
[36] Raitoharju, M., Svensson, L., García-Fernández, A. F., and Piché, R. Damped posterior linearization filter. IEEE Signal Processing Letters (2018). In press.
[37] Särkkä, S. Bayesian Filtering and Smoothing, vol. 3. Cambridge University Press, Cambridge, UK, 2013.
[38] Silverman, H. F., Patterson, W. R., and Flanagan, J. L. The huge microphone array. IEEE Concurrency 6, 4 (1998), 36–46.
[39] Sorenson, H. W., and Alspach, D. L. Recursive Bayesian estimation using Gaussian sums. Automatica 7, 4 (1971), 465–479.
[40] Steinbring, J. Nonlinear estimation toolbox. Accessed: Dec. 8, 2015.
[41] Steinbring, J., and Hanebeck, U. D. LRKF revisited: the smart sampling Kalman filter (S2KF). Journal of Advances in Information Fusion 9, 2 (Dec. 2014), 106–123.
[42] Stengel, R. F. Stochastic Optimal Control. Wiley-Interscience, New York, 1986.
[43] Strassen, V. Gaussian elimination is not optimal. Numerische Mathematik 13, 4 (1969), 354–356.
[44] Thornton, C. L. Triangular Covariance Factorizations for Kalman Filtering. PhD thesis, NASA, 1993. NASA Technical Memorandum 33-798.
[45] Vilà-Valls, J., Closas, P., and García-Fernández, Á. F. Uncertainty exchange through multiple quadrature Kalman filtering.
IEEE Signal Processing Letters 23, 12 (Dec. 2016), 1825–1829.
[46] Wan, E. A., and van der Merwe, R. The unscented Kalman filter for nonlinear estimation. In Proceedings of the Adaptive Systems for Signal Processing, Communications, and Control Symposium (AS-SPCC) (Lake Louise, AB, Canada, 2000), IEEE, pp. 153–158.
[47] Zanetti, R. Recursive update filtering for nonlinear estimation. IEEE Transactions on Automatic Control 57, 6 (June 2012), 1481–1490.

APPENDIX

This appendix derives a form of the virtual update that does not require the inversion of a possibly singular matrix. First, we define the block-diagonal matrices $P_{\hat{x}\hat{x},t_1}$ and $P^D_{xx,t_0}$:

$$P_{\hat{x}\hat{x},t_1} = \begin{bmatrix} P_{\hat{x}_1\hat{x}_1,t_1} & 0 & \cdots \\ 0 & P_{\hat{x}_2\hat{x}_2,t_1} & \ddots \\ \vdots & \ddots & \ddots \end{bmatrix} \tag{92}$$

$$P^D_{xx,t_0} = \begin{bmatrix} P_{x_1x_1,t_0} & 0 & \cdots \\ 0 & P_{x_2x_2,t_0} & \ddots \\ \vdots & \ddots & \ddots \end{bmatrix} \tag{93}$$

The virtual update covariance $R^v_{t_1}$ can be written using these and the matrix inversion lemma:

$$R^v_{t_1} = P^D_{xx,t_0}\left(P^D_{xx,t_0} - P_{\hat{x}\hat{x},t_1}\right)^{-1}P^D_{xx,t_0} - P^D_{xx,t_0} = \left(\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1}\right)^{-1} \tag{94}$$

The virtual measurement is

$$y^v_{t_1} = \mu_{x,t_0} + P^D_{xx,t_0}\left(P^D_{xx,t_0} - P_{\hat{x}\hat{x},t_1}\right)^{-1}\left(\hat{\mu}_{t_1} - \mu_{x,t_0}\right) = \hat{\mu}_{t_1} - \left(\left(P^D_{xx,t_0}\right)^{-1} - \left(P_{\hat{x}\hat{x},t_1}\right)^{-1}\right)^{-1}\left(P^D_{xx,t_0}\right)^{-1}\left(\hat{\mu}_{t_1} - \mu_{x,t_0}\right) \tag{95}$$

The Kalman gain (with measurement matrix $H = I$) is

$$K^v_{t_1} = P_{xx,t_0}\left(P_{xx,t_0} + R^v_{t_1}\right)^{-1} = \left(\left(R^v_{t_1}\right)^{-1} + \left(P_{xx,t_0}\right)^{-1}\right)^{-1}\left(R^v_{t_1}\right)^{-1} = \left(\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1} + \left(P_{xx,t_0}\right)^{-1}\right)^{-1}\left(\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1}\right) \tag{96}$$

Substituting (95) and (96) into the update and using $\left(R^v_{t_1}\right)^{-1}\left(y^v_{t_1} - \mu_{x,t_0}\right) = \left(P_{\hat{x}\hat{x},t_1}\right)^{-1}\left(\hat{\mu}_{t_1} - \mu_{x,t_0}\right)$, the posterior mean is

$$\mu^v_{x,t_1} = \mu_{x,t_0} + K^v_{t_1}\left(y^v_{t_1} - \mu_{x,t_0}\right) = \mu_{x,t_0} + \left(\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1} + \left(P_{xx,t_0}\right)^{-1}\right)^{-1}\left(P_{\hat{x}\hat{x},t_1}\right)^{-1}\left(\hat{\mu}_{t_1} - \mu_{x,t_0}\right) \tag{97}$$

The posterior covariance is

$$P^v_{xx,t_1} = P_{xx,t_0} - K^v_{t_1}\left(P_{xx,t_0} + R^v_{t_1}\right)\left(K^v_{t_1}\right)^T = \left(\left(P_{xx,t_0}\right)^{-1} + \left(R^v_{t_1}\right)^{-1}\right)^{-1} = \left(\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1} + \left(P_{xx,t_0}\right)^{-1}\right)^{-1} \tag{98}$$

The posterior mean and covariance involve only the inverses $\left(P_{\hat{x}\hat{x},t_1}\right)^{-1}$, $\left(P^D_{xx,t_0}\right)^{-1}$, $\left(P_{xx,t_0}\right)^{-1}$, and $\left(\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1} + \left(P_{xx,t_0}\right)^{-1}\right)^{-1}$. If the measurements have non-degenerate noise, the posterior covariance $P_{\hat{x}\hat{x},t_1}$ is positive definite, and so is its inverse. The matrices $\left(P^D_{xx,t_0}\right)^{-1}$ and $\left(P_{xx,t_0}\right)^{-1}$ depend only on the prior, which is assumed positive definite.
Because the posterior covariance is smaller than or equal to the prior covariance, $\left(P_{\hat{x}\hat{x},t_1}\right)^{-1} - \left(P^D_{xx,t_0}\right)^{-1}$ is positive semidefinite, and its sum with the positive definite matrix $\left(P_{xx,t_0}\right)^{-1}$ is positive definite. Thus, assuming that the prior and posterior covariances are positive definite, all matrix inverses in (98) are applied only to positive definite matrices.

Matti Raitoharju received the M.Sc. and Ph.D. degrees in mathematics from Tampere University of Technology, Finland, in 2009 and 2014, respectively. He works as a signal processing specialist at the aerospace and defence company Patria and as a postdoctoral researcher at Aalto University, and is a visiting scholar at Tampere University. His scientific interests include mathematical modeling and the development and application of Kalman-type filters.

Robert Piché received the Ph.D. degree in civil engineering from the University of Waterloo, Canada, in 1986. Since 2004 he has held a professorship in mathematics at Tampere University of Technology (now Tampere University), Finland. His scientific interests include mathematical and statistical modelling, systems theory, and applications in positioning, computational finance, and mechanics.
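The identities (94)–(98) can also be checked numerically. The following sketch is not part of the paper; the state dimension, block sizes, measurement model, and all variable names are illustrative. It builds a prior covariance and a posterior from an ordinary Kalman update (so the posterior is guaranteed no larger than the prior), forms the block-diagonal matrices $P^D_{xx,t_0}$ and $P_{\hat{x}\hat{x},t_1}$, and verifies both expressions for $R^v_{t_1}$ and the virtual-update posterior moments:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # illustrative state dimension: two 2x2 blocks

def random_spd(k):
    # Random symmetric positive definite matrix.
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

def blockdiag_of(M):
    # Keep only the 2x2 diagonal blocks of M.
    B = np.zeros_like(M)
    B[:2, :2] = M[:2, :2]
    B[2:, 2:] = M[2:, 2:]
    return B

inv = np.linalg.inv

# Prior N(mu, P) and a posterior covariance from an ordinary Kalman
# update, so that the posterior is smaller than the prior.
mu = rng.standard_normal(n)
P = random_spd(n)
H = rng.standard_normal((2, n))
R = random_spd(2)
K = P @ H.T @ inv(H @ P @ H.T + R)
P_post = P - K @ H @ P
mu_hat = mu + 0.1 * rng.standard_normal(n)  # stands in for the updated mean

PD = blockdiag_of(P)         # P^D_{xx,t0}
Phat = blockdiag_of(P_post)  # P_{x^x^,t1}

# (94): both forms of the virtual measurement noise R^v agree.
Rv = PD @ inv(PD - Phat) @ PD - PD
assert np.allclose(Rv, inv(inv(Phat) - inv(PD)))

# (95): both forms of the virtual measurement y^v agree.
yv = mu + PD @ inv(PD - Phat) @ (mu_hat - mu)
assert np.allclose(
    yv, mu_hat - inv(inv(PD) - inv(Phat)) @ inv(PD) @ (mu_hat - mu))

# (96)-(98): the virtual update with H = I matches the inverse forms.
Kv = P @ inv(P + Rv)
S = inv(inv(Phat) - inv(PD) + inv(P))
assert np.allclose(Kv, S @ (inv(Phat) - inv(PD)))             # (96)
mu_v = mu + Kv @ (yv - mu)
assert np.allclose(mu_v, mu + S @ inv(Phat) @ (mu_hat - mu))  # (97)
Pv = P - Kv @ (P + Rv) @ Kv.T
assert np.allclose(Pv, S)                                     # (98)
```

Note that the first form of $R^v_{t_1}$ inverts $P^D_{xx,t_0} - P_{\hat{x}\hat{x},t_1}$, which may be singular, while the final forms invert only positive definite matrices; the numerical check above uses a generic example where both forms are well defined.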
