An Optimal Transport Formulation of the Ensemble Kalman Filter


Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract—Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N \to \infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature.
I. INTRODUCTION

The subject of this paper concerns Monte-Carlo methods for simulating a nonlinear filter (conditional distribution) in continuous-time settings. The mathematical abstraction of any filtering problem involves two processes: a hidden Markov process $\{X_t\}_{t\ge 0}$ and the observation process $\{Z_t\}_{t\ge 0}$. The numerical problem is to compute the posterior distribution $P(X_t \in \cdot \mid \mathcal{Z}_t)$, where $\mathcal{Z}_t := \sigma\{Z_s;\ 0 \le s \le t\}$ is the filtration generated by the observations. A standard solution approach is the particle filter, which relies on importance sampling to implement the effect of conditioning [3], [4]. In numerical implementations, this often leads to the particle degeneracy issue (or weight collapse) whereby only a few particles have large weights. To combat this issue, various types of resampling schemes have been proposed in the literature [5], [6].

In the past decade, an alternate class of algorithms has attracted growing attention. These algorithms can be regarded as a controlled interacting particle system where the central idea is to implement the effect of conditioning using feedback control. Mathematically, this involves construction of a controlled stochastic process, denoted by $\{\bar X_t\}_{t\ge 0}$.

[Footnote: A. Taghvaei is with the Department of Mechanical and Aerospace Engineering at University of California, Irvine. The research reported in this paper was performed while he was a graduate student at the University of Illinois at Urbana-Champaign; ataghvae@uci.edu. P. G. Mehta is with the Coordinated Science Laboratory and the Department of Mechanical Science and Engineering at the University of Illinois at Urbana-Champaign (UIUC); mehtapg@illinois.edu. Financial support from the NSF CMMI grants 1462773 and 1761622 is gratefully acknowledged. The conference version of this paper appears in [1] and [2].]
In continuous-time settings, the model for $\bar X_t$ is a stochastic differential equation (sde):

$$d\bar X_t = u_t(\bar X_t)\,dt + K_t(\bar X_t)\,dZ_t + \text{[additional terms]}, \quad \bar X_0 \overset{d}{=} X_0 \qquad (1)$$

where the [additional terms] are pre-specified (these terms may be zero). The control problem is to design mean-field type control laws $\{u_t(\cdot)\}_{t\ge 0}$ and $\{K_t(\cdot)\}_{t\ge 0}$ such that the conditional distribution of $\bar X_t$ (given $\mathcal{Z}_t$) is equal to the posterior distribution of $X_t$. If this property holds, the filter is said to be exact. In a numerical implementation, the mean-field terms in the control law are approximated empirically by simulating $N$ copies of (1). The resulting system is a controlled interacting particle system with a finite number $N$ of interacting particles. The particles have uniform importance weights by construction. Therefore, the particle degeneracy issue does not arise. Resampling is no longer necessary, and steps such as rules for reproduction, death, or birth of particles are altogether avoided.

The focus of this paper is on (i) formal methods for design of control laws ($u_t(\cdot)$ and $K_t(\cdot)$) for (1); (ii) algorithms for empirical approximation of the control laws using $N$ particles; and (iii) error analysis of the finite-$N$ interacting particle models as $N \to \infty$. The main problem highlighted and addressed in this paper is the issue of uniqueness: one can interpret the controlled system (1) as transporting the initial distribution at time $t = 0$ (prior) to the conditional distribution at time $t$ (posterior). Clearly, there are infinitely many maps that transport one distribution into another. This suggests that there are infinitely many choices of control laws that all lead to exact filters.
This is not surprising: The exactness condition specifies only the marginal distribution of the stochastic process $\{\bar X_t\}_{t\ge 0}$ at times $t \ge 0$, which is not enough to uniquely identify a stochastic process; e.g., the joint distributions at two time instants are not specified.

Although these issues are relevant more generally, the scope of this paper is limited to the linear Gaussian problem. A motivation comes from the widespread use of the ensemble Kalman filter (EnKF) algorithm in applications. It is noted that the mean-field limit of the EnKF algorithm is exact only in linear Gaussian settings. The issue of non-uniqueness is manifested in the different types of EnKF algorithms reported in the literature. Some of these EnKF types are discussed as part of the literature survey (in Sec. I-B) and in the main body of the paper (in Sec. II-B).

A. Contributions of this paper

The following contributions are made in this paper:

1) Non-uniqueness issue: For the linear Gaussian problem, an error process is introduced to help explain the non-uniqueness issue in the selection of the control law in (1). The error process helps clarify the relationship between the different types of control laws leading to the different types of EnKF algorithms that have appeared over the years in the literature.

2) Optimal transport FPF: To select a unique control law, an optimization problem is proposed in the form of a time-stepping procedure. The optimality concept is motivated by the optimal transportation theory [7], [8]. The solution of the time-stepping procedure yields a unique optimal control law. The resulting filter is referred to as the optimal transport FPF. The procedure is suitably adapted to handle the case, important in Monte-Carlo implementations with finitely many $N$ particles, where the covariance is singular. In this case, the optimal (deterministic) transport maps are replaced by optimal (stochastic) couplings.
The general form of the optimal FPF includes stochastic terms which are zero when the covariance is non-singular.

3) Error analysis: A detailed error analysis is carried out for the deterministic form of the optimal FPF in the finite but large $N$ limit. For the purposes of error analysis, it is assumed that the linear system is controllable and observable, and the initial empirical covariance matrix is non-singular. The main results are as follows: (i) Empirical mean and covariance of the particles are shown to converge almost surely to the exact mean and covariance as $t \to \infty$, even for finite $N$ (Prop. 2-(i)); (ii) The mean-squared error is shown to be bounded by $C e^{-\lambda t}/\sqrt{N}$, where the constant $C$ has polynomial dependence on the problem dimension (Prop. 2-(ii)); (iii) A propagation of chaos analysis is carried out to show that the empirical distribution of the particles converges in a weak sense to the exact filter posterior distribution (Cor. 1).

4) Comparison to importance sampling: For a certain simplified filtering problem, a comparison of the m.s.e. between the importance sampling and control-based filters is described. The main result is to show that, using an importance sampling approach, the number of particles $N$ must grow exponentially with the dimension $d$. In contrast, with a control-based approach, $N$ scales at most as order $d^2$ in order to maintain the same error (Prop. 4). The conclusions are also verified numerically (Fig. 1).

This paper extends and completes the preliminary results reported in our prior conference papers [1], [2]. The optimal transport formulation of the FPF and the time-stepping procedure was originally introduced in [1]. However, its extension to the singular covariance matrix case, in Sec. III-D, is original and has not appeared before. Preliminary error analysis of the deterministic form of the optimal FPF appeared in our prior work [2].
The current paper extends the results in [2] in two key aspects: (i) The error bounds for the convergence of the empirical mean and covariance, in Prop. 2-(ii), reveal the scaling with the problem dimension (see Remark 3-(ii)), whereas previous results did not; (ii) The propagation of chaos analysis is carried out for the vector case (Cor. 1), whereas the previous result was only valid for the scalar case. These improvements became possible with a proof approach that is entirely different than the one used in [2]. Finally, the analytical comparison with the importance sampling particle filter, in Sec. V, is new.

B. Literature survey

Two examples of the controlled interacting particle systems are the classical ensemble Kalman filter (EnKF) [9]–[12] and the more recently developed feedback particle filter (FPF) [13], [14]. The EnKF algorithm is the workhorse in applications (such as weather prediction) where the state dimension $d$ is very high; cf., [12], [15]. The high dimension of the state space presents a significant computational challenge even in linear Gaussian settings. For such problems, an EnKF implementation may require less computational resources (memory and FLOPS) than a Kalman filter [15], [16]. This is because the particle-based algorithm avoids the need to store and propagate the error covariance matrix (whose size scales as $d^2$).

An expository review of the continuous-time filters, including the progression from the Kalman filter (1960s) to the ensemble Kalman filter (1990s) to the feedback particle filter (2010s), appears in [17]. In continuous-time settings, the first interacting particle representation of the nonlinear filter appears in the work of Crisan and Xiong [18]. Also in continuous-time settings, Reich and collaborators have derived deterministic forms of the EnKF [12], [19].
In discrete-time settings, Daum and collaborators have pioneered the development of closely related particle flow algorithms [20], [21].

The technical approach of this paper has its roots in the optimal transportation theory. These methods have been widely applied for uncertainty propagation and Bayesian inference: The ensemble transform particle filter is based upon computing an optimal coupling by solving a linear optimization problem [22]; polynomial approximations of the Rosenblatt transport maps for Bayesian inference appear in [23], [24]; solution of such problems using the Gibbs flow is the subject of [25]. The time-stepping procedure of this paper is inspired by the J-K-O construction in [26]. Its extension to the filtering problem appears in [27]–[30].

Closely related to the error analysis of this paper is the recent literature on stability and convergence of the EnKF algorithm. For the discrete-time EnKF algorithm, these results appear in [31]–[35]. The analysis for continuous-time EnKF is more recent. For continuous-time EnKF with perturbed observation, under additional assumptions (stable and fully observable), it has been shown that the empirical distribution of the ensemble converges to the mean-field distribution uniformly for all time with the rate $O(1/\sqrt{N})$ [36]. This result has been extended to the nonlinear setting for the case with Langevin-type dynamics with a strongly convex potential and full linear observation [37]. The stability assumption is recently relaxed in [38]. Under certain conditions, convergence and long-term stability results appear in [39].

In independent numerical evaluations and comparisons, it has been observed that EnKF and FPF exhibit smaller simulation variance and better scaling properties – with respect to the problem dimension – when compared to the traditional methods [40]–[44]. The error analysis (in Sec.
IV) together with the analytical bounds on comparison with importance sampling (in Sec. V) provide the first such rigorous justification for the performance improvement reported in the literature. The analysis of this paper is likely to spur wider adoption of the control-based algorithms for the purposes of sampling and simulation.

C. Paper outline

Sec. II includes the preliminaries along with a discussion of the non-uniqueness issue. Its resolution is provided in Sec. III where the optimal FPF is derived. The error analysis appears in Sec. IV and the comparison with the importance sampling particle filter is given in Sec. V. The proofs appear in the Appendix.

Notation: For a vector $m$, $\|m\|_2$ denotes the Euclidean norm. For a square matrix $\Sigma$, $\|\Sigma\|_F$ denotes the Frobenius norm, $\|\Sigma\|_2$ is the spectral norm, $\mathrm{Tr}(\Sigma)$ is the matrix trace, $\mathrm{Ker}(\Sigma)$ denotes the null space, $\mathrm{Range}(\Sigma)$ denotes the range space, and $\mathrm{Spec}(\Sigma)$ denotes the spectrum. For a symmetric matrix $\Sigma$, $\lambda_{\max}(\Sigma)$ and $\lambda_{\min}(\Sigma)$ denote the maximum and minimum eigenvalues of $\Sigma$, respectively. The partial order of positive definite matrices is denoted by $\succ$, such that $A \succ B$ means $A - B$ is positive definite. $N(m, \Sigma)$ is a Gaussian probability distribution with mean $m$ and covariance $\Sigma$.

II. THE NON-UNIQUENESS ISSUE

A. Preliminaries

The linear Gaussian filtering problem is described by the linear stochastic differential equations (sde-s):

$$dX_t = A X_t\,dt + \sigma_B\,dB_t, \qquad (2a)$$
$$dZ_t = H X_t\,dt + dW_t, \qquad (2b)$$

where $X_t \in \mathbb{R}^d$ and $Z_t \in \mathbb{R}^m$ are the state and observation at time $t$; $\{B_t\}_{t\ge 0}$, $\{W_t\}_{t\ge 0}$ are mutually independent standard Wiener processes taking values in $\mathbb{R}^q$ and $\mathbb{R}^m$, respectively; and $A$, $H$, $\sigma_B$ are matrices of appropriate dimension. The initial condition $X_0$ is assumed to have a Gaussian distribution $N(m_0, \Sigma_0)$.
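As a concrete illustration, the signal and observation model (2a)-(2b) can be simulated with a simple Euler-Maruyama discretization. This is a minimal sketch, not part of the paper; the example matrices and step size are arbitrary choices for illustration.

```python
import numpy as np

def simulate_linear_gaussian(A, H, sigma_B, m0, Sigma0, T=1.0, dt=1e-3, rng=None):
    """Euler-Maruyama simulation of dX = A X dt + sigma_B dB (2a) and
    dZ = H X dt + dW (2b).  Returns the sample path X and the
    observation increments dZ over each time step."""
    rng = rng or np.random.default_rng(0)
    d, q = sigma_B.shape
    m = H.shape[0]
    n = int(T / dt)
    X = np.empty((n + 1, d))
    dZ = np.empty((n, m))
    X[0] = rng.multivariate_normal(m0, Sigma0)
    for k in range(n):
        dB = rng.standard_normal(q) * np.sqrt(dt)   # Wiener increment for the state
        dW = rng.standard_normal(m) * np.sqrt(dt)   # Wiener increment for the observation
        X[k + 1] = X[k] + A @ X[k] * dt + sigma_B @ dB
        dZ[k] = H @ X[k] * dt + dW
    return X, dZ

# Example: a stable 2-d system with a scalar observation
A = np.array([[-1.0, 0.5], [0.0, -1.0]])
H = np.array([[1.0, 0.0]])
sigma_B = np.eye(2)
X, dZ = simulate_linear_gaussian(A, H, sigma_B, np.zeros(2), np.eye(2))
```

The observation increments `dZ` generated here are the inputs consumed by all the filters discussed below.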
The filtering problem is to compute the posterior distribution,

$$\pi_t(\cdot) := P(X_t \in \cdot \mid \mathcal{Z}_t), \qquad (3)$$

where $\mathcal{Z}_t := \sigma(Z_s;\ 0 \le s \le t)$.

Kalman-Bucy filter: In this linear Gaussian case, the posterior distribution $\pi_t$ is Gaussian $N(m_t, \Sigma_t)$, whose mean $m_t$ and variance $\Sigma_t$ evolve according to the Kalman-Bucy filter [45]:

$$dm_t = A m_t\,dt + K_t (dZ_t - H m_t\,dt), \qquad (4a)$$
$$\frac{d}{dt}\Sigma_t = \mathrm{Ricc}(\Sigma_t) := A \Sigma_t + \Sigma_t A^\top + \Sigma_B - \Sigma_t H^\top H \Sigma_t, \qquad (4b)$$

where $K_t := \Sigma_t H^\top$ is the Kalman gain and $\Sigma_B := \sigma_B \sigma_B^\top$.

Variable                                          Notation                            Equation
State of the hidden process                       $X_t$                               Eq. (2a)
State of the $i$-th particle in finite-$N$ sys.   $X^i_t$                             Eq. (8), (21)
State of the mean-field model                     $\bar X_t$                          Eq. (5), (19)
Kalman filter mean and covariance                 $m_t$, $\Sigma_t$                   Eq. (4a)-(4b)
Empirical mean and covariance                     $m^{(N)}_t$, $\Sigma^{(N)}_t$       Eq. (9)
Mean-field mean and covariance                    $\bar m_t$, $\bar\Sigma_t$          Eq. (5)-(19)
Conditional distribution of $X_t$                 $\pi_t$                             Eq. (3)
Conditional distribution of $\bar X_t$            $\bar\pi_t$                         Eq. (6)
Empirical distribution of particles $\{X^i_t\}$   $\pi^{(N)}_t$                       Eq. (10)

TABLE I: Nomenclature.

Feedback particle filter: The stochastic linear FPF [14, Eq. (26)] (and also the square-root form of the EnKBF [12, Eq. (3.3)]) is described by the McKean-Vlasov sde:

$$d\bar X_t = A \bar X_t\,dt + \sigma_B\,d\bar B_t + \underbrace{\bar K_t \Big(dZ_t - \frac{H \bar X_t + H \bar m_t}{2}\,dt\Big)}_{\text{FPF control law}}, \qquad (5)$$

where $\bar K_t := \bar\Sigma_t H^\top$ is the Kalman gain, $\bar B_t$ is a standard Wiener process, $\bar m_t := \mathbb{E}[\bar X_t \mid \mathcal{Z}_t]$ and $\bar\Sigma_t := \mathbb{E}[(\bar X_t - \bar m_t)(\bar X_t - \bar m_t)^\top \mid \mathcal{Z}_t]$ are the mean-field terms, and $\bar X_0 \sim N(m_0, \Sigma_0)$. We use

$$\bar\pi_t(\cdot) := P(\bar X_t \in \cdot \mid \mathcal{Z}_t) \qquad (6)$$

to denote the conditional distribution of the mean-field process $\bar X_t$.

The FPF control law is exact. The exactness result appears in the following theorem, which is a special case of [14, Thm. 1] that describes the exactness result for the general nonlinear non-Gaussian case. A proof is included in Appendix A.
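The Kalman-Bucy recursion (4a)-(4b) translates directly into a discrete-time update. The following is an illustrative Euler sketch (the step size and the particular model matrices in the example are arbitrary, not from the paper):

```python
import numpy as np

def kalman_bucy_step(m, S, dZ, A, H, Sigma_B, dt):
    """One Euler step of the Kalman-Bucy filter (4a)-(4b).
    m: mean, S: covariance, dZ: observation increment over [t, t+dt]."""
    K = S @ H.T                                           # Kalman gain K_t = Sigma_t H^T
    Ricc = A @ S + S @ A.T + Sigma_B - S @ H.T @ H @ S    # right-hand side of (4b)
    m_next = m + A @ m * dt + K @ (dZ - H @ m * dt)       # (4a)
    S_next = S + Ricc * dt
    return m_next, S_next

# one step from m = 0, S = I with A = -I, H = I, Sigma_B = I
dt = 0.01
m, S = kalman_bucy_step(np.zeros(2), np.eye(2), np.zeros(2),
                        -np.eye(2), np.eye(2), np.eye(2), dt)
```

Note that the covariance update is deterministic: only the mean (4a) is driven by the observations.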
The proof is useful for studying the non-uniqueness issue described in Sec. II-B.

Theorem 1: (Exactness of linear FPF) Consider the linear Gaussian filtering problem (2a)-(2b) and the linear FPF (5). If $\pi_0 = \bar\pi_0$ then $\pi_t = \bar\pi_t$, $\forall t \ge 0$.

The notation nomenclature is tabulated in Table I.

B. The non-uniqueness issue

In the proof of Thm. 1 (given in Appendix A), it is shown that (i) the conditional mean process $\{\bar m_t\}_{t\ge 0}$ evolves according to (4a); and (ii) the conditional variance process $\{\bar\Sigma_t\}_{t\ge 0}$ evolves according to (4b). Define an error process $\xi_t := \bar X_t - \bar m_t$ for $t \ge 0$. The equation for $\xi_t$ is obtained by subtracting the equation for the mean, (38) in Appendix A, from (5):

$$d\xi_t = \big(A - \tfrac{1}{2}\bar\Sigma_t H^\top H\big)\xi_t\,dt + \sigma_B\,d\bar B_t$$

This is a linear system and therefore the variance of $\xi_t$, easily seen to be given by $\bar\Sigma_t$, evolves according to the Lyapunov equation

$$\frac{d}{dt}\bar\Sigma_t = \big(A - \tfrac{1}{2}\bar\Sigma_t H^\top H\big)\bar\Sigma_t + \bar\Sigma_t \big(A - \tfrac{1}{2}\bar\Sigma_t H^\top H\big)^\top + \Sigma_B = \mathrm{Ricc}(\bar\Sigma_t)$$

which is identical to (4b).

These arguments suggest the following general procedure to construct an exact $\bar X_t$ process: Express $\bar X_t$ as a sum of two terms, $\bar X_t = \bar m_t + \xi_t$, where $\bar m_t$ evolves according to the Kalman-Bucy equation (4a) and the evolution of $\xi_t$ is defined by the sde:

$$d\xi_t = G_t \xi_t\,dt + \sigma_t\,d\bar B_t + \sigma^0_t\,d\bar W_t$$

where $\{\bar W_t\}_{t\ge 0}$ and $\{\bar B_t\}_{t\ge 0}$ are independent Brownian motions, and $G_t$, $\sigma_t$, and $\sigma^0_t$ are solutions to the matrix equation

$$G_t \bar\Sigma_t + \bar\Sigma_t G_t^\top + \sigma_t \sigma_t^\top + \sigma^0_t (\sigma^0_t)^\top = \mathrm{Ricc}(\bar\Sigma_t) \qquad (7)$$

By construction, the equation for the variance is given by the Riccati equation (4b). In general, there are infinitely many solutions of (7). Below, we describe three solutions that lead to three established forms of the EnKF and linear FPF:

1) EnKF with perturbed observation [19, Eq. (27)]: $G_t = A - \bar\Sigma_t H^\top H$, $\sigma_t = \bar\Sigma_t H^\top$, $\sigma^0_t = \sigma_B$

2) Stochastic linear FPF [14, Eq.
(26)] or the square-root form of the EnKF [12, Eq. (3.3)]: $G_t = A - \tfrac{1}{2}\bar\Sigma_t H^\top H$, $\sigma_t = \sigma_B$, $\sigma^0_t = 0$

3) Deterministic linear FPF [2]: $G_t = A - \tfrac{1}{2}\bar\Sigma_t H^\top H + \tfrac{1}{2}\Sigma_B \bar\Sigma_t^{-1}$, $\sigma_t = 0$, $\sigma^0_t = 0$

Given a particular solution $G_t$, one can construct a family of solutions $G_t + \Omega_t \bar\Sigma_t^{-1}$, where $\Omega_t$ is any skew-symmetric matrix.

For the linear Gaussian problem, the non-uniqueness issue has been discussed in the literature: At least two forms of the EnKF, the perturbed observation form [19] and the square-root form [12], are well known. A homotopy of exact deterministic and stochastic filters is given in [46]. An explanation for the non-uniqueness in terms of the gauge transformation appears in [47].

C. Finite-N implementation

In a numerical implementation, one simulates $N$ stochastic processes (particles) $\{X^i_t : 1 \le i \le N\}_{t\ge 0}$, where $X^i_t$ is the state of the $i$-th particle at time $t$. The evolution of $X^i_t$ is obtained upon empirically approximating the mean-field terms. The finite-$N$ filter for the linear FPF (5) is an interacting particle system:

$$dX^i_t = A X^i_t\,dt + \sigma_B\,dB^i_t + K^{(N)}_t \Big(dZ_t - \frac{H X^i_t + H m^{(N)}_t}{2}\,dt\Big) \qquad (8)$$

where $K^{(N)}_t := \Sigma^{(N)}_t H^\top$; $\{B^i_t\}_{i=1}^N$ are independent copies of $B_t$; $X^i_0 \overset{\text{i.i.d.}}{\sim} N(m_0, \Sigma_0)$ for $i = 1, 2, \ldots, N$; and the empirical approximations of the two mean-field terms are as follows:

$$m^{(N)}_t := \frac{1}{N}\sum_{i=1}^N X^i_t, \qquad \Sigma^{(N)}_t := \frac{1}{N-1}\sum_{i=1}^N (X^i_t - m^{(N)}_t)(X^i_t - m^{(N)}_t)^\top \qquad (9)$$

We use the notation

$$\pi^{(N)}_t := \frac{1}{N}\sum_{i=1}^N \delta_{X^i_t} \qquad (10)$$

to denote the empirical distribution of the particles. Here, $\delta_x$ denotes the Dirac delta distribution at $x$.

III. OPTIMAL TRANSPORT FPF

The problem is to uniquely identify an exact stochastic process $\bar X_t$. The solution is based on an optimality concept motivated by the optimal transportation theory [7], [8]. The background on this theory is briefly reviewed next.

A.
Background on optimal transportation

Let $\mu_X$ and $\mu_Y$ be two probability measures on $\mathbb{R}^d$ with finite second moments. The (Monge) optimal transportation problem with quadratic cost is to minimize

$$\min_T\ \mathbb{E}[\|T(X) - X\|^2] \qquad (11)$$

over all measurable maps $T : \mathbb{R}^d \to \mathbb{R}^d$ such that

$$X \sim \mu_X, \quad T(X) \sim \mu_Y \qquad (12)$$

If it exists, the minimizer $T$ is called the optimal transport map between $\mu_X$ and $\mu_Y$. The optimal cost is referred to as the $L^2$-Wasserstein distance between $\mu_X$ and $\mu_Y$ [8].

Theorem 2: (Optimal map between Gaussians [48, Prop. 7]) Consider the optimization problem (11)-(12). Suppose $\mu_X$ and $\mu_Y$ are Gaussian distributions, $N(m_X, \Sigma_X)$ and $N(m_Y, \Sigma_Y)$, with $\Sigma_X, \Sigma_Y \succ 0$. Then the optimal transport map between $\mu_X$ and $\mu_Y$ is given by

$$T(x) = m_Y + F(x - m_X) \qquad (13)$$

where $F = \Sigma_Y^{\frac{1}{2}}\big(\Sigma_Y^{\frac{1}{2}} \Sigma_X \Sigma_Y^{\frac{1}{2}}\big)^{-\frac{1}{2}} \Sigma_Y^{\frac{1}{2}}$.

B. The time-stepping optimization procedure

To uniquely identify an exact stochastic process $\bar X_t$, the following time-stepping optimization procedure is proposed:

1) Divide the time interval $[0, T]$ into $n \in \mathbb{N}$ equal time steps with the time instants $t_0 = 0 < t_1 < \ldots < t_n = T$.

2) Initialize a discrete-time random process $\{\bar X_{t_k}\}_{k=1}^n$ according to the initial distribution (prior) of $X_0$: $\bar X_{t_0} \sim \pi_0$.

3) For each time step $[t_k, t_{k+1}]$, evolve the process $\bar X_{t_k}$ according to

$$\bar X_{t_{k+1}} = T_k(\bar X_{t_k}), \quad \text{for } k = 0, \ldots, n-1 \qquad (14)$$

where the map $T_k$ is the optimal transport map between the probability measures $\pi_{t_k}$ and $\pi_{t_{k+1}}$.

4) Take the limit as $n \to \infty$ to obtain the continuous-time process $\bar X_t$ and the sde:

$$d\bar X_t = u_t(\bar X_t)\,dt + K_t(\bar X_t)\,dZ_t \qquad (15)$$

The procedure leads to control laws $u_t$ and $K_t$ that depend upon $\pi_t$. Since $\pi_t$ is unknown, one simply replaces it with $\bar\pi_t$ – the two are meant to be identical by construction. The resulting sde (15) is referred to as the optimal transport FPF or simply the optimal FPF.
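The closed-form map of Theorem 2 is straightforward to evaluate numerically. The sketch below (an illustration with arbitrarily chosen covariances, not from the paper) computes $F$ via symmetric eigendecompositions and checks the defining push-forward property $F \Sigma_X F^\top = \Sigma_Y$:

```python
import numpy as np

def psd_power(M, p):
    """M^p for a symmetric positive-definite matrix M, via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def gaussian_ot_map(m_X, S_X, m_Y, S_Y):
    """Optimal transport map T(x) = m_Y + F (x - m_X) between
    N(m_X, S_X) and N(m_Y, S_Y), per the formula in Theorem 2."""
    S_Y_half = psd_power(S_Y, 0.5)
    inner = psd_power(S_Y_half @ S_X @ S_Y_half, -0.5)
    F = S_Y_half @ inner @ S_Y_half
    return lambda x: m_Y + F @ (x - m_X), F

# Sanity check: F pushes N(m_X, S_X) forward to N(m_Y, S_Y)
rng = np.random.default_rng(1)
G1, G2 = rng.standard_normal((2, 3, 3))
S_X = G1 @ G1.T + np.eye(3)
S_Y = G2 @ G2.T + np.eye(3)
T, F = gaussian_ot_map(np.zeros(3), S_X, np.ones(3), S_Y)
assert np.allclose(F @ S_X @ F.T, S_Y)   # push-forward covariance property
assert np.allclose(F, F.T)               # F is symmetric
```

The symmetry of $F$ is exactly the property that distinguishes the optimal map from the infinitely many other linear maps transporting one Gaussian to the other.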
A definition is needed to state the main result.

Definition 1: For a given positive-definite matrix $Q \succ 0$, define $\sqrt{\mathrm{Ricc}}(Q)$ as the (unique such) symmetric solution to the matrix equation:

$$\sqrt{\mathrm{Ricc}}(Q)\,Q + Q\,\sqrt{\mathrm{Ricc}}(Q) = \mathrm{Ricc}(Q) \qquad (16)$$

Remark 1: The symmetric solution to the matrix equation (16) is explicitly given by:

$$\sqrt{\mathrm{Ricc}}(Q) = \int_0^\infty e^{-sQ}\,\mathrm{Ricc}(Q)\,e^{-sQ}\,ds$$

The solution can also be expressed as:

$$\sqrt{\mathrm{Ricc}}(Q) = A - \tfrac{1}{2} Q H^\top H + \tfrac{1}{2}\Sigma_B Q^{-1} + \Omega Q^{-1} \qquad (17)$$

where $\Omega$ is the (unique such) skew-symmetric matrix that solves the matrix equation

$$\Omega Q^{-1} + Q^{-1}\Omega = (A^\top - A) + \tfrac{1}{2}(Q H^\top H - H^\top H Q) - \tfrac{1}{2}(\Sigma_B Q^{-1} - Q^{-1}\Sigma_B) \qquad (18)$$

The main result is as follows. Its proof appears in Appendix B.

Proposition 1: Consider the linear Gaussian filtering problem (2a)-(2b). Assume the initial covariance $\Sigma_0 \succ 0$. Then the optimal transport FPF is given by

$$d\bar X_t = A\bar m_t\,dt + \bar K_t(dZ_t - H\bar m_t\,dt) + G_t(\bar X_t - \bar m_t)\,dt \qquad (19)$$

where $\bar K_t := \bar\Sigma_t H^\top$, $\bar m_t = \mathbb{E}[\bar X_t \mid \mathcal{Z}_t]$, $\bar\Sigma_t = \mathbb{E}[(\bar X_t - \bar m_t)(\bar X_t - \bar m_t)^\top \mid \mathcal{Z}_t]$, $\bar X_0 \sim N(m_0, \Sigma_0)$, and

$$G_t = \sqrt{\mathrm{Ricc}}(\bar\Sigma_t)$$

The filter is exact: That is, the conditional distribution of $\bar X_t$ is Gaussian $N(\bar m_t, \bar\Sigma_t)$ with $\bar m_t = m_t$ and $\bar\Sigma_t = \Sigma_t$.

Using the form of the solution (17) for $G_t = \sqrt{\mathrm{Ricc}}(\bar\Sigma_t)$, the optimal transport sde (19) is expressed as

$$d\bar X_t = A\bar X_t\,dt + \tfrac{1}{2}\Sigma_B \bar\Sigma_t^{-1}(\bar X_t - \bar m_t)\,dt + \bar K_t\Big(dZ_t - \frac{H\bar X_t + H\bar m_t}{2}\,dt\Big) + \Omega_t \bar\Sigma_t^{-1}(\bar X_t - \bar m_t)\,dt \qquad (20)$$

Compared to the original (linear Gaussian) FPF (5), the optimal transport FPF (20) has two differences:

1) The stochastic term $\sigma_B\,d\bar B_t$ is replaced with the deterministic term $\tfrac{1}{2}\Sigma_B \bar\Sigma_t^{-1}(\bar X_t - \bar m_t)\,dt$. Given a Gaussian prior, the two terms yield the same posterior. However, in a finite-$N$ implementation, the difference becomes significant.
The stochastic term serves to introduce an additional error of order $O(1/\sqrt{N})$.

2) The sde (20) has an extra term involving the skew-symmetric matrix $\Omega_t$. The extra term does not affect the posterior distribution. This term is viewed as a correction term that serves to make the dynamics symmetric and hence optimal in the optimal transportation sense. It is noted that for the scalar ($d = 1$) case, the skew-symmetric term is zero. Therefore, in the scalar case, the update formula in the original FPF (5) is optimal. In the vector case, it is optimal iff $\Omega_t \equiv 0$.

C. Finite-N implementation in the non-singular covariance case

The finite-$N$ implementation of the optimal transport sde (19) is given by the following system of $N$ equations:

$$dX^i_t = A m^{(N)}_t\,dt + K^{(N)}_t(dZ_t - H m^{(N)}_t\,dt) + \sqrt{\mathrm{Ricc}}(\Sigma^{(N)}_t)(X^i_t - m^{(N)}_t)\,dt \qquad (21)$$

for $i = 1, \ldots, N$, where $K^{(N)}_t := \Sigma^{(N)}_t H^\top$; $X^i_0 \overset{\text{i.i.d.}}{\sim} N(m_0, \Sigma_0)$; and the empirical approximations of mean and variance are defined in (9).

The matrix $\sqrt{\mathrm{Ricc}}(\Sigma^{(N)}_t)$ is not well-defined if $\Sigma^{(N)}_t$ is a singular matrix. This is a problem because, in a finite-$N$ implementation, it is possible that $\Sigma^{(N)}_t$ is singular even though $\Sigma_t \succ 0$. In particular, when $N < d$, the empirical covariance matrix is of rank at most $N$ and hence singular. Note that this problem only arises for the optimal and deterministic forms of the FPF. In particular, the stochastic FPF (8) does not suffer from this issue. It can be simulated for any choice of $N$. In Sec. III-D, we extend the optimal transportation formulation to handle also the singular forms of the covariance matrix.

D. The singular covariance case

The derivation of the optimal FPF (19) crucially relies on the assumption that $\bar\Sigma_0 \succ 0$, which in turn implies that, in the time-stepping procedure, $\bar\Sigma_{t_k} \succ 0$ for $k = 0, 1, \ldots, n-1$. In the proof of Prop. 1, the assumption is used to derive the optimal transport map $T_k$.
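When $\Sigma^{(N)}_t \succ 0$, the gain $\sqrt{\mathrm{Ricc}}(\Sigma^{(N)}_t)$ appearing in (21) can be obtained by solving the Lyapunov-type equation (16) in the eigenbasis of the covariance: if $Q = V \Lambda V^\top$, the transformed solution has entries $R'_{ij}/(\lambda_i + \lambda_j)$. The following is an illustrative numerical sketch (the model matrices are arbitrary choices):

```python
import numpy as np

def sqrt_ricc(Q, A, H, Sigma_B):
    """Symmetric solution G of G Q + Q G = Ricc(Q) (Definition 1),
    assuming Q is symmetric positive definite."""
    R = A @ Q + Q @ A.T + Sigma_B - Q @ H.T @ H @ Q   # Ricc(Q), per (4b)
    lam, V = np.linalg.eigh(Q)                        # Q = V diag(lam) V^T
    Rp = V.T @ R @ V
    # In the eigenbasis, (G')_{ij} = R'_{ij} / (lam_i + lam_j)
    Gp = Rp / (lam[:, None] + lam[None, :])
    return V @ Gp @ V.T

# Check on a random positive-definite Q
rng = np.random.default_rng(2)
d = 4
A = -np.eye(d) + 0.1 * rng.standard_normal((d, d))
H = rng.standard_normal((2, d))
Sigma_B = np.eye(d)
M = rng.standard_normal((d, d))
Q = M @ M.T + np.eye(d)
G = sqrt_ricc(Q, A, H, Sigma_B)
R = A @ Q + Q @ A.T + Sigma_B - Q @ H.T @ H @ Q
assert np.allclose(G @ Q + Q @ G, R)   # G satisfies (16)
assert np.allclose(G, G.T)             # G is symmetric
```

The division by $\lambda_i + \lambda_j$ makes explicit why the construction breaks down for singular $Q$: zero eigenvalues produce vanishing denominators, which is the case handled in Sec. III-D.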
In general, when the covariance of the Gaussian random variables $\bar X_{t_k}$ or $\bar X_{t_{k+1}}$ is singular, the optimal transport map $T_k$ may not exist. In the singular case, a relaxed form of the optimal transportation problem, first introduced by Kantorovich, is used to search for optimal (stochastic) couplings instead of (deterministic) transport maps [8]. The following example helps illuminate the issue:

Example 1: Consider Gaussian random variables $X$ and $Y$ with distributions $N(m_X, \Sigma_X)$ and $N(m_Y, \Sigma_Y)$, respectively. Suppose

$$m_X = m_Y = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma_X = \begin{bmatrix} 1 & 0 \\ 0 & \varepsilon \end{bmatrix}, \quad \Sigma_Y = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

where $\varepsilon \ge 0$ is small. If $\varepsilon > 0$, the optimal transport map exists and is given by

$$Y = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{\sqrt{\varepsilon}} \end{bmatrix} X$$

If $\varepsilon = 0$, then there is no transport map that satisfies the constraints of the optimal transportation problem.

The Kantorovich relaxation of the optimal transportation problem (11) is the following optimization problem:

$$\min_\mu\ \mathbb{E}_{(X,Y)\sim\mu}[|X - Y|^2] \qquad (22)$$

where $\mu$ is a joint distribution on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals fixed to $\mu_X$ and $\mu_Y$. Although a deterministic map does not exist for the $\varepsilon = 0$ problem, a (stochastic) coupling that solves the Kantorovich problem (22) exists and is given by

$$Y = X + \begin{bmatrix} 0 \\ 1 \end{bmatrix} B$$

where $B \sim N(0, 1)$ is independent of $X$.

In Appendix C, the Kantorovich relaxation is used to motivate an optimization problem whose solution yields the following extension of the optimal FPF:

$$d\bar X_t = A\bar m_t\,dt + \bar K_t(dZ_t - H\bar m_t\,dt) + G_t(\bar X_t - \bar m_t)\,dt + \sigma_t\,d\bar B_t \qquad (23)$$

with $\sigma_t := P_K \sigma_B$, where $P_K$ is the projection matrix onto the kernel of the matrix $\bar\Sigma_t$, and $G_t$ is any symmetric solution of the matrix equation

$$G_t \bar\Sigma_t + \bar\Sigma_t G_t = \mathrm{Ricc}(\bar\Sigma_t) - \sigma_t \sigma_t^\top \qquad (24)$$

Remark 2: When $\bar\Sigma_t$ is singular, the solution to the matrix equation (24) is not unique.
It is shown in Appendix C that the solution is of the following general form:

$$G_t = A - \tfrac{1}{2}\bar\Sigma_t H^\top H + \tfrac{1}{2} P_R \Sigma_B \bar\Sigma_t^+ + P_K \Sigma_B \bar\Sigma_t^+ + \bar\Sigma_t^+ \Sigma_B P_K + P_R \Omega^{(1)} \bar\Sigma_t^+ + P_K (\Omega^{(0)} - A) P_K \qquad (25)$$

where $\bar\Sigma_t^+$ is the pseudo-inverse¹ of $\bar\Sigma_t$, $P_R$ is the projection matrix onto the range of $\bar\Sigma_t$, $\Omega^{(1)} \in \mathbb{R}^{d\times d}$ is a skew-symmetric matrix that solves a certain matrix equation (46), and $\Omega^{(0)}$ is an arbitrary symmetric matrix.

Using the formula (25), the optimal FPF (23) is expressed as follows:

$$d\bar X_t = A\bar X_t\,dt + \tfrac{1}{2}(\sigma_B + \sigma_t)\sigma_B^\top \bar\Sigma_t^+ (\bar X_t - \bar m_t)\,dt + \sigma_t\,d\bar B_t + \bar K_t\Big(dZ_t - \frac{H\bar X_t + H\bar m_t}{2}\,dt\Big) + \text{[additional terms]} \qquad (26)$$

The formula (26) allows one to clearly see the relationship between the deterministic and stochastic forms of the optimal FPF. In particular, when the covariance matrix is non-singular, $\bar\Sigma_t^+ = \bar\Sigma_t^{-1}$ and $\sigma_t = P_K \sigma_B = 0$. This results in the deterministic form of the optimal transport FPF (20). When the covariance matrix is singular, the effect of the linear term $\tfrac{1}{2}\Sigma_B \bar\Sigma_t^{-1}(\bar X_t - \bar m_t)\,dt$ in (20) is now simulated with the two terms $\tfrac{1}{2}(\sigma_B + \sigma_t)\sigma_B^\top \bar\Sigma_t^+(\bar X_t - \bar m_t)\,dt + \sigma_t\,d\bar B_t$ in (26). This is conceptually similar to Example 1, where the deterministic optimal transport map is replaced with a stochastic coupling. The [additional terms] in (26) do not affect the distribution.

E. Finite-N implementation in the singular covariance case

The finite-$N$ implementation of the optimal transport sde (23) is given by the following system of $N$ equations:

$$dX^i_t = A m^{(N)}_t\,dt + K^{(N)}_t(dZ_t - H m^{(N)}_t\,dt) + G^{(N)}_t(X^i_t - m^{(N)}_t)\,dt + \sigma^{(N)}_t\,dB^i_t, \quad X^i_0 \sim N(m_0, \Sigma_0) \qquad (27)$$

¹The pseudo-inverse of a matrix $Q$ is the unique matrix $Q^+$ that satisfies $Q^+ Q Q^+ = Q^+$, $Q Q^+ Q = Q$, $Q^+ Q$ is symmetric, and $Q Q^+$ is symmetric [49].
where $K^{(N)}_t := \Sigma^{(N)}_t H^\top$; $\{B^i_t\}_{i=1}^N$ are independent copies of $B_t$; $\sigma^{(N)}_t = \sigma_B - \Sigma^{(N)}_t(\Sigma^{(N)}_t)^+\sigma_B$; $G^{(N)}_t$ is a symmetric matrix solution to the matrix equation

$$G^{(N)}_t \Sigma^{(N)}_t + \Sigma^{(N)}_t G^{(N)}_t = \mathrm{Ricc}(\Sigma^{(N)}_t) - \sigma^{(N)}_t (\sigma^{(N)}_t)^\top$$

and $m^{(N)}_t$ and $\Sigma^{(N)}_t$ are the empirical mean and covariance defined in (9). Note that the stochastic term is zero when $\sigma_B \in \mathrm{Range}(\Sigma^{(N)}_t)$, which is true, e.g., when $\sigma_B \in \mathrm{span}\{X^1_t, \ldots, X^N_t\}$.

IV. ERROR ANALYSIS

This section is concerned with error bounds in the large but finite $N$ regime. Given that $N$ is large, we restrict ourselves to the non-singular case. For the purposes of the error analysis, the following assumptions are made:

Assumption (I): The system $(A, H)$ is detectable and $(A, \sigma_B)$ is stabilizable.

Assumption (II): $N > d$ and the initial empirical covariance matrix $\Sigma^{(N)}_0 \succ 0$ almost surely.

The main result for the finite-$N$ deterministic FPF (21) is as follows, with the proof given in Appendix D.

Proposition 2: Consider the Kalman filter (4a)-(4b) initialized with the prior $N(m_0, \Sigma_0)$ and the finite-$N$ deterministic form of the optimal FPF (21) initialized with $X^i_0 \overset{\text{i.i.d.}}{\sim} N(m_0, \Sigma_0)$ for $i = 1, 2, \ldots, N$. Under Assumptions (I) and (II), the following characterizes the convergence and error properties of the empirical mean and covariance $(m^{(N)}_t, \Sigma^{(N)}_t)$ obtained from the finite-$N$ filter relative to the mean and covariance $(m_t, \Sigma_t)$ obtained from the Kalman filter:

(i) Convergence: For any finite $N > 1$, as $t \to \infty$:

$$\lim_{t\to\infty} e^{\lambda t}\,\|m^{(N)}_t - m_t\|_2 = 0 \quad \text{a.s.}$$
$$\lim_{t\to\infty} e^{2\lambda t}\,\|\Sigma^{(N)}_t - \Sigma_t\|_F = 0 \quad \text{a.s.}$$

for all $\lambda \in (0, \lambda_0)$, where $\lambda_0$ is a fixed positive constant (see (48) in Appendix D).

(ii) Mean-squared error: For any $t > 0$, as $N \to \infty$:

$$\mathbb{E}[\|m^{(N)}_t - m_t\|_2^2] \le \text{(const.)}\,\frac{e^{-2\lambda t}\big(\mathrm{Tr}(\Sigma_0) + \mathrm{Tr}(\Sigma_0)^2\big)}{N} \qquad (28a)$$
$$\mathbb{E}[\|\Sigma^{(N)}_t - \Sigma_t\|_F^2] \le \text{(const.)}\,\frac{e^{-4\lambda t}\,\mathrm{Tr}(\Sigma_0)^2}{N} \qquad (28b)$$

for all $\lambda \in (0, \lambda_0)$. The constant depends on $\lambda$ and the spectral norms $\|\Sigma_0\|_2$, $\|\Sigma_\infty\|_2$, and $\|H\|_2$, where $\Sigma_\infty$ is the solution to the algebraic Riccati equation (see Lemma 2).

Remark 3: Two remarks are in order:

1) Asymptotically (as $t \to \infty$) the empirical mean and variance of the finite-$N$ filter become exact. This is because of the stability of the Kalman filter, whereby the filter forgets the initial condition. In fact, the result holds for any (not necessarily i.i.d. Gaussian) $\{X^i_0\}_{i=1}^N$ that satisfy the assumption $\Sigma^{(N)}_0 \succ 0$.

2) (Scaling with dimension) If the parameters of the linear Gaussian filtering problem (2a)-(2b) scale with the dimension in a way that the spectral norms $\|\Sigma_0\|_2$, $\|\Sigma_\infty\|_2$, $\|H\|_2$, and $\lambda_0$ do not change, then the constant in the error bounds (28a)-(28b) does not change. The only term that scales with the dimension is $\mathrm{Tr}(\Sigma_0)$. For example, with $\Sigma_0 = \sigma_0^2 I_{d\times d}$, $\mathrm{Tr}(\Sigma_0) = d\sigma_0^2$. Therefore, the bound on the error typically scales as $d^2$ in the problem dimension.

A. Propagation of chaos

In this section, we study the convergence of the empirical distribution of the particles $\pi^{(N)}_t$ for the finite-$N$ system (21) to the exact posterior distribution $\pi_t$. Derivation of the error estimates involves construction of $N$ independent copies of the exact process (19). Consistent with the convention to denote mean-field variables with a bar, the stochastic processes are denoted as $\{\bar X^i_t : 1 \le i \le N\}$, where $\bar X^i_t$ denotes the state of the $i$-th particle at time $t$. The particle evolves according to the mean-field equation (19) as

$$d\bar X^i_t = A\bar m_t\,dt + \bar K_t(dZ_t - H\bar m_t\,dt) + \sqrt{\mathrm{Ricc}}(\bar\Sigma_t)(\bar X^i_t - \bar m_t)\,dt \qquad (29)$$

where $\bar K_t = \bar\Sigma_t H^\top$ is the Kalman gain, and the initial condition $\bar X^i_0 = X^i_0$, the right-hand side being the initial condition of the $i$-th particle in the finite-$N$ FPF (21).
The mean-field process $\bar{X}_t^i$ is thus coupled to $X_t^i$ through the initial condition. In order to carry out the error analysis, the following event is defined for an arbitrary fixed matrix $\Lambda_0 \succ 0$:
$$S_{\Lambda_0} := \{\Sigma_0^{(N)} \succeq \Lambda_0\} \tag{30}$$
The following proposition characterizes the error between $X_t^i$ and $\bar{X}_t^i$ on the event $S_{\Lambda_0}$ (the estimate is key to the propagation of chaos analysis). The proof appears in Appendix E.

Proposition 3: Consider the stochastic processes $X_t^i$ and $\bar{X}_t^i$, whose evolution is defined according to the optimal transport FPF (21) and its mean-field model (29), respectively, with initial condition $X_0^i \overset{\text{i.i.d.}}{\sim} N(m_0, \Sigma_0)$ for $i = 1, 2, \ldots, N$. Then, under Assumptions (I) and (II):
$$\mathbb{E}[\|X_t^i - \bar{X}_t^i\|_2^2\,1_{S_{\Lambda_0}}]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}} \tag{31}$$
The estimate (31) is used to prove the following important result: the empirical distribution of the particles in the linear FPF converges weakly to the true posterior distribution. Its proof appears in Appendix E.

Corollary 1: Consider the linear filtering problem (2a)-(2b) and the finite-$N$ deterministic FPF (21), with initial condition $X_0^i \overset{\text{i.i.d.}}{\sim} N(m_0, \Sigma_0)$ for $i = 1, 2, \ldots, N$. Under Assumptions (I) and (II), for any bounded Lipschitz function $f : \mathbb{R}^d \to \mathbb{R}$,
$$\mathbb{E}\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X_t^i) - \mathbb{E}[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\Big]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}}$$

V. COMPARISON TO IMPORTANCE SAMPLING

For the purposes of comparing the optimal FPF with the importance sampling-based particle filter, we restrict attention to the static filtering example with a fully observed observation model:
$$\mathrm{d}X_t = 0, \qquad X_0 \sim N(0, \sigma_0^2 I_d)$$
$$\mathrm{d}Z_t = X_t\,\mathrm{d}t + \sigma_w\,\mathrm{d}W_t \tag{32}$$
for $t \in [0, 1]$, where $\sigma_w, \sigma_0 > 0$. The posterior distribution at time $t = 1$, denoted $\pi_{\text{EXACT}}$, is a Gaussian $N(m_1, \Sigma_1)$ with
$$m_1 = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_w^2}\,Z_1, \qquad \Sigma_1 = \frac{\sigma_0^2\sigma_w^2}{\sigma_0^2 + \sigma_w^2}\,I_d.$$
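The closed-form posterior above follows from a standard Gaussian conjugacy argument. As a sanity check, the following self-contained scalar ($d = 1$) sketch recomputes the posterior moments by brute-force Bayes rule on a grid and compares them with the formula; the numerical values of $\sigma_0$, $\sigma_w$, and $Z_1$ below are hypothetical.

```python
import math

# Sanity check (scalar, d = 1) of the closed-form posterior for the static
# model (32): prior X0 ~ N(0, sigma0^2), observation Z1 | X0 = x ~ N(x, sigmaw^2).
# All numerical values below are hypothetical.

def posterior_moments_grid(z1, sigma0, sigmaw, xlim=10.0, n=20001):
    """Brute-force Bayes rule on a uniform grid; returns posterior mean/variance."""
    xs = [-xlim + 2 * xlim * k / (n - 1) for k in range(n)]
    # Unnormalized posterior density = prior * likelihood.
    w = [math.exp(-x * x / (2 * sigma0**2) - (z1 - x)**2 / (2 * sigmaw**2)) for x in xs]
    z = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, xs)) / z
    var = sum(wi * (xi - mean)**2 for wi, xi in zip(w, xs)) / z
    return mean, var

sigma0, sigmaw, z1 = 1.0, 0.5, 0.7
m_grid, v_grid = posterior_moments_grid(z1, sigma0, sigmaw)
# Closed form: m1 = sigma0^2/(sigma0^2+sigmaw^2) z1, Sigma1 = sigma0^2 sigmaw^2/(sigma0^2+sigmaw^2)
m_exact = sigma0**2 / (sigma0**2 + sigmaw**2) * z1
v_exact = sigma0**2 * sigmaw**2 / (sigma0**2 + sigmaw**2)
print(abs(m_grid - m_exact), abs(v_grid - v_exact))
```

The grid errors are negligible for any smooth, rapidly decaying density, so the check isolates the algebra of the conjugate update rather than numerical accuracy.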
Let $\{X_0^i\}_{i=1}^N$ be $N$ i.i.d. samples of $X_0$. The importance sampling-based particle filter yields an empirical approximation of the posterior distribution $\pi_{\text{EXACT}}$ as follows:
$$\pi_{\text{PF}}^{(N)} = \sum_{i=1}^N w^i\,\delta_{X_0^i}, \qquad w^i = \frac{e^{-\|Z_1 - X_0^i\|_2^2/(2\sigma_w^2)}}{\sum_{j=1}^N e^{-\|Z_1 - X_0^j\|_2^2/(2\sigma_w^2)}} \tag{33}$$
In contrast, given the same initial samples $\{X_0^i\}_{i=1}^N$, the FPF approximates the posterior by implementing a feedback control law:
$$\pi_{\text{FPF}}^{(N)} = \frac{1}{N}\sum_{i=1}^N \delta_{X_1^i}, \qquad \mathrm{d}X_t^i = \frac{1}{\sigma_w^2}\,\Sigma_t^{(N)}\Big(\mathrm{d}Z_t - \frac{X_t^i + m_t^{(N)}}{2}\,\mathrm{d}t\Big) \tag{34}$$
where the empirical mean $m_t^{(N)}$ and covariance $\Sigma_t^{(N)}$ are approximated as in (9). The m.s.e. in estimating the conditional expectation of a given function $f$ is defined as
$$\text{m.s.e.}_*(f) = \mathbb{E}[\|\pi_*^{(N)}(f) - \pi_{\text{EXACT}}(f)\|_2^2]$$
where the subscript $*$ is either PF or FPF. For $f(x) = \frac{1}{\sqrt{d}}\,1^\top x$, a numerically computed plot of the level sets of the m.s.e., as a function of $N$ and $d$, is depicted in Figure 1(a)-(b). The expectation is approximated by averaging over $M = 1000$ independent simulations. It is observed that, in order to achieve the same error, the importance sampling-based approach requires the number of samples $N$ to grow exponentially with the dimension $d$, whereas the growth for the FPF in this numerical example is $O(d^{1/2})$. This conclusion is consistent with other numerical studies reported in the literature [43]. For the purposes of the analysis, a modified form of the particle filter is considered, whereby the denominator is replaced by its exact form:
$$\pi_{\text{PF}}^{(N)} = \frac{1}{N}\sum_{i=1}^N \bar{w}^i\,\delta_{X_0^i}, \qquad \bar{w}^i = \frac{e^{-\|Z_1 - X_0^i\|_2^2/(2\sigma_w^2)}}{\mathbb{E}\big[e^{-\|Z_1 - X_0^i\|_2^2/(2\sigma_w^2)}\,\big|\,Z_1\big]} \tag{35}$$
The proof of the following result on the scaling of the m.s.e. with the state dimension $d$ appears in Appendix F.

Proposition 4: Consider the filtering problem (32) with state dimension $d$. Suppose $\sigma_0 = \sigma_w = \sigma > 0$ and $f(x) = a^\top x$, where $a \in \mathbb{R}^d$ with $\|a\|_2 = 1$. Then:
(i) The m.s.e. for the modified form of the importance sampling estimator (35) is given by
$$\text{m.s.e.}_{\text{PF}}(f) = \frac{\sigma^2}{N}\Big(3(2^d) - \frac{1}{2}\Big) \ge \frac{\sigma^2}{N}\,2^{d+1} \tag{36}$$

[Fig. 1: Level sets of the m.s.e. for the filtering problem (32), as a function of the dimension $d$ and $\log(N)$: (a) the importance sampling algorithm (33), with the curve $N = 2^d$ overlaid; (b) the FPF algorithm (34), with the curve $N = d^{1/2}$ overlaid.]

(ii) The m.s.e. for the FPF estimator (34) is bounded as
$$\text{m.s.e.}_{\text{FPF}}(f) \le \frac{\sigma^2}{N}(3d^2 + 2d) \tag{37}$$

Remark 4: The performance of importance sampling-based particle filters in the limit as $d \to \infty$ has been studied in the literature [50]-[53]. The main result in these papers concerns the particle degeneracy (or weight collapse) issue: it is shown that if $\log N / d \to 0$, then the largest weight $\max_{1\le i\le N} w^i \to 1$ in probability. Consequently, in order to prevent weight collapse, the number of particles should grow exponentially with the dimension. This phenomenon is referred to as the curse of dimensionality for particle filters.

Remark 5: The scaling with dimension depicted in Figure 1(b) suggests that the $O(d^2)$ bound for the m.s.e. of the linear FPF is loose. This is because, in deriving the bound, the inequality $\|\cdot\|_2 \le \|\cdot\|_F$ was used, and this inequality becomes increasingly loose as the dimension grows. It is also observed that the m.s.e. for the particle filter grows slightly more slowly than the lower bound $2^d$. This is because the lower bound is obtained for the modified particle filter (35), while the m.s.e. is numerically evaluated for the standard particle filter (33); the correlation between the numerator and denominator in (33) reduces the m.s.e.
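To make the FPF control law (34) concrete, here is a minimal scalar ($d = 1$) Euler discretization of it for the static example (32). The observation path, step size, and particle count are hypothetical choices; the point illustrated is the exactness property discussed around Prop. 2: the empirical variance of the controlled particles tracks the Riccati flow $\dot{S} = -S^2/\sigma_w^2$ started from the initial sample variance.

```python
import random

# Minimal scalar (d = 1) Euler discretization of the FPF control law (34) for
# the static example (32). The observation path, step size, and particle count
# are hypothetical. The empirical variance of the controlled particles tracks
# the Riccati flow dS/dt = -S^2/sigma_w^2 started from the initial sample variance.
random.seed(0)
sigma0, sigmaw, N, dt = 1.0, 0.5, 400, 1e-3
X = [random.gauss(0.0, sigma0) for _ in range(N)]

def moments(particles):
    m = sum(particles) / len(particles)
    s = sum((x - m)**2 for x in particles) / (len(particles) - 1)  # as in (9)
    return m, s

_, s0 = moments(X)
t = 0.0
while t < 1.0 - 1e-9:
    m, S = moments(X)
    dZ = 0.8 * dt                      # synthetic observation increment, Z_t = 0.8 t
    gain = S / sigmaw**2
    X = [x + gain * (dZ - (x + m) / 2 * dt) for x in X]
    t += dt

_, s1 = moments(X)
S_exact = s0 / (1 + s0 / sigmaw**2)    # Riccati closed form at t = 1, started at s0
print(abs(s1 - S_exact))               # O(dt) discretization error
```

Because the control is affine in the particle state, the empirical mean and variance obey closed recursions, so the agreement holds path-by-path for any observation record, not just on average.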
VI. CONCLUSIONS & DIRECTIONS FOR FUTURE WORK

In this paper, a principled approach is presented for the design of the EnKF algorithm. The approach is based upon a reformulation of the filtering problem as an optimal transportation problem, whose solution is referred to as the optimal transport FPF. Empirical approximation of the mean-field terms in the control law yields a finite-$N$ form of the algorithm as a controlled interacting particle system. A detailed error analysis is presented for the finite-$N$ algorithm, including a comparison with the importance sampling-based approach. Taken together with numerical comparisons in the recent literature, the analysis of this paper is likely to spur research on, and application of, controlled interacting particle algorithms for filtering and data assimilation. There are many directions for future work:
1) Apart from the optimal transportation formulation stressed in this paper, one may consider alternative approaches to control design. One possible direction is based on the Schrödinger bridge problem [54], [55]. It is an interesting question whether such an approach can lead to stochastic forms of the FPF, in contrast to the deterministic form obtained using the optimal transport formulation.
2) An important research direction is to extend the stability and error analysis to the class of finite-$N$ stochastic EnKF and FPF algorithms. The results in this paper serve as a baseline, in terms of assumptions and order scalings, for the analysis of the stochastic algorithms.
3) It will be of interest to construct optimization formulations that directly yield a finite-$N$ algorithm, without the need for empirical approximation as an intermediate step. Such constructions may lead to better error properties by design.
4) Finally, it is extremely important to understand the curse of dimensionality (CoD) for general types of controlled interacting particle systems. The result in Prop. 4 is very exciting because it suggests that the CoD is avoided in the linear Gaussian case. Whether such a property also holds in the non-Gaussian case remains an open question.

APPENDIX A
PROOF OF THM. 1

Proof: It is first shown that the conditional mean and covariance of $\bar{X}_t$ evolve according to the Kalman filtering equations. Express the sde (5) in its integral form,
$$\bar{X}_t = \bar{X}_0 + \int_0^t A\bar{X}_s\,\mathrm{d}s + \int_0^t \sigma_B\,\mathrm{d}\bar{B}_s + \int_0^t \bar{K}_s\Big(\mathrm{d}Z_s - \frac{H\bar{X}_s + H\bar{m}_s}{2}\,\mathrm{d}s\Big)$$
Upon taking the conditional expectation of both sides,
$$\bar{m}_t = \mathbb{E}[\bar{X}_0\,|\,\mathcal{Z}_t] + \mathbb{E}\Big[\int_0^t A\bar{X}_s\,\mathrm{d}s\,\Big|\,\mathcal{Z}_t\Big] + \mathbb{E}\Big[\int_0^t \sigma_B\,\mathrm{d}\bar{B}_s\,\Big|\,\mathcal{Z}_t\Big] + \mathbb{E}\Big[\int_0^t \bar{K}_s\Big(\mathrm{d}Z_s - \frac{H\bar{X}_s + H\bar{m}_s}{2}\,\mathrm{d}s\Big)\,\Big|\,\mathcal{Z}_t\Big]$$
$$= \mathbb{E}[\bar{X}_0\,|\,\mathcal{Z}_0] + \int_0^t \mathbb{E}[\bar{K}_s\,|\,\mathcal{Z}_s]\,\mathrm{d}Z_s + \int_0^t \mathbb{E}\Big[A\bar{X}_s - \bar{K}_s\,\frac{H\bar{X}_s + H\bar{m}_s}{2}\,\Big|\,\mathcal{Z}_s\Big]\,\mathrm{d}s = \bar{m}_0 + \int_0^t A\bar{m}_s\,\mathrm{d}s + \int_0^t \bar{K}_s(\mathrm{d}Z_s - H\bar{m}_s\,\mathrm{d}s)$$
where the fact that $\bar{X}_t$ is adapted to the filtration $\mathcal{Z}_t$ is used to obtain the second identity (see [56, Lemma 5.4]). As a result, the sde for the conditional mean is
$$\mathrm{d}\bar{m}_t = A\bar{m}_t\,\mathrm{d}t + \bar{K}_t(\mathrm{d}Z_t - H\bar{m}_t\,\mathrm{d}t) \tag{38}$$
Define the error $\xi_t := \bar{X}_t - \bar{m}_t$. The equation for $\xi_t$ is obtained by subtracting (38) from (5):
$$\mathrm{d}\xi_t = \Big(A - \frac{\bar{\Sigma}_t H^\top H}{2}\Big)\xi_t\,\mathrm{d}t + \sigma_B\,\mathrm{d}\bar{B}_t$$
By an application of Itô's rule,
$$\mathrm{d}(\xi_t\xi_t^\top) = \Big(A - \frac{1}{2}\bar{\Sigma}_t H^\top H\Big)\xi_t\xi_t^\top\,\mathrm{d}t + \sigma_B\,\mathrm{d}\bar{B}_t\,\xi_t^\top + \xi_t\xi_t^\top\Big(A^\top - \frac{1}{2}H^\top H\bar{\Sigma}_t\Big)\,\mathrm{d}t + \xi_t(\sigma_B\,\mathrm{d}\bar{B}_t)^\top + \Sigma_B\,\mathrm{d}t$$
which, following the same procedure as for the conditional mean, leads to the sde for the conditional covariance $\bar{\Sigma}_t = \mathbb{E}[\xi_t\xi_t^\top\,|\,\mathcal{Z}_t]$:
$$\frac{\mathrm{d}}{\mathrm{d}t}\bar{\Sigma}_t = A\bar{\Sigma}_t + \bar{\Sigma}_t A^\top + \Sigma_B - \bar{\Sigma}_t H^\top H\bar{\Sigma}_t$$
identical to the Riccati equation (4b). Hence, because $\bar{\Sigma}_0 = \Sigma_0$, $\bar{\Sigma}_t = \Sigma_t$ for all $t \ge 0$. This also implies $\bar{K}_t = K_t$, which in turn implies that the sde (38) for the conditional mean is identical to the Kalman filter equation (4a).
Therefore, again because $\bar{m}_0 = m_0$, $\bar{m}_t = m_t$ for all $t \ge 0$. Given $\bar{\Sigma}_t = \Sigma_t$ and $\bar{m}_t = m_t$, the mean-field terms in the McKean-Vlasov sde (5) can be treated as exogenous processes. Therefore, the McKean-Vlasov sde (5) simplifies to an Ornstein-Uhlenbeck sde. Because the distribution of the initial condition $\bar{X}_0$ is Gaussian, the distribution of $\bar{X}_t$ is also Gaussian, given by $N(m_t, \Sigma_t)$, which is equal to the posterior distribution given by the Kalman filter. This concludes the proof.

APPENDIX B
PROOF OF PROP. 1

The key step in the proof is the following lemma:

Lemma 1: Consider the ode (4b) and let $\Sigma_t$ be its solution for $t \in [0, T]$. Then
$$\Sigma_{t+\Delta t}^{1/2}\big(\Sigma_{t+\Delta t}^{1/2}\Sigma_t\Sigma_{t+\Delta t}^{1/2}\big)^{-1/2}\Sigma_{t+\Delta t}^{1/2} = I + G_t\Delta t + O(\Delta t^2) \tag{39}$$
where $G_t$ is the solution of the matrix equation
$$G_t\Sigma_t + \Sigma_t G_t = A\Sigma_t + \Sigma_t A^\top + \Sigma_B - \Sigma_t H^\top H\Sigma_t \tag{40}$$
and the $O(\Delta t^2)$ term in (39) is uniformly bounded for all $t \in [0, T]$.

Proof: From the theory of dynamic Riccati equations, the solution is bounded over any finite time horizon [57]. Moreover, because $\Sigma_0 \succ 0$, $\Sigma_t \succ 0$. Fix $t \in [0, T]$ and define
$$F(s) := \Sigma_{t+s}^{1/2}\big(\Sigma_{t+s}^{1/2}\Sigma_t\Sigma_{t+s}^{1/2}\big)^{-1/2}\Sigma_{t+s}^{1/2}$$
Equation (39) is obtained by considering the Taylor expansion of $F(s)$ at $s = 0$,
$$F(\Delta t) = I + \dot{F}(0)\Delta t + \frac{1}{2}\ddot{F}(\tau)\Delta t^2$$
and showing that $\dot{F}(0) = G_t$; here $\tau \in [0, \Delta t]$. The second-order term is uniformly bounded for all $t \in [0, T]$ because $\Sigma_t$ is positive definite and bounded. In order to verify $\dot{F}(0) = G_t$, express
$$F(s)\Sigma_t F(s) = \Sigma_{t+s}$$
Evaluating the derivative with respect to $s$ at $s = 0$,
$$\dot{F}(0)\Sigma_t + \Sigma_t\dot{F}(0) = A\Sigma_t + \Sigma_t A^\top + \Sigma_B - \Sigma_t H^\top H\Sigma_t$$
By the uniqueness property of the solution of the Lyapunov equation (40), $\dot{F}(0) = G_t$.

Proof: (Prop. 1) The proof of exactness is similar to the proof of Thm. 1 and is omitted.
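The Gaussian optimal transport map used in the time-stepping construction can be sanity-checked numerically. The following self-contained sketch (hypothetical $2\times 2$ covariance values; the symmetric square root is computed with the closed-form $2\times 2$ formula) verifies that $F = \Sigma_2^{1/2}(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2})^{-1/2}\Sigma_2^{1/2}$ satisfies $F\Sigma_1 F^\top = \Sigma_2$, so the affine map $x \mapsto m_2 + F(x - m_1)$ pushes $N(m_1, \Sigma_1)$ onto $N(m_2, \Sigma_2)$.

```python
import math

# Numerical check of the Gaussian optimal transport map: with
#   F = S2^{1/2} (S2^{1/2} S1 S2^{1/2})^{-1/2} S2^{1/2},
# the affine map x -> m2 + F (x - m1) pushes N(m1, S1) onto N(m2, S2),
# i.e. F S1 F^T = S2. The 2x2 covariance values are hypothetical; the
# symmetric square root uses sqrt(M) = (M + sqrt(det M) I)/sqrt(tr M + 2 sqrt(det M)).

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sqrtm(M):
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t], [M[1][0] / t, (M[1][1] + s) / t]]

def inv(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

S1 = [[2.0, 0.3], [0.3, 1.0]]
S2 = [[1.5, -0.4], [-0.4, 2.5]]

R = sqrtm(S2)                          # S2^{1/2}
F = mul(mul(R, inv(sqrtm(mul(mul(R, S1), R)))), R)
Ft = [[F[0][0], F[1][0]], [F[0][1], F[1][1]]]
push = mul(mul(F, S1), Ft)             # F S1 F^T; should equal S2
err = max(abs(push[i][j] - S2[i][j]) for i in range(2) for j in range(2))
print(err)
```

Since $F$ is a symmetric product of commuting functions of the same matrix, the identity $F\Sigma_1 F = \Sigma_2$ holds exactly; the printed residual is pure floating-point roundoff.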
In order to obtain the optimal transport sde, the time-stepping procedure is used. The key step in the procedure is to obtain the optimal transport map $T_k$ between the two Gaussians $N(m_{t_k}, \Sigma_{t_k})$ and $N(m_{t_{k+1}}, \Sigma_{t_{k+1}})$. By Thm. 2, the optimal map is
$$\bar{X}_{t_{k+1}} = m_{t_{k+1}} + F_k(\bar{X}_{t_k} - m_{t_k})$$
where $F_k = \Sigma_{t_{k+1}}^{1/2}\big(\Sigma_{t_{k+1}}^{1/2}\Sigma_{t_k}\Sigma_{t_{k+1}}^{1/2}\big)^{-1/2}\Sigma_{t_{k+1}}^{1/2}$. Using Lemma 1,
$$\bar{X}_{t_{k+1}} = m_{t_{k+1}} + (\bar{X}_{t_k} - m_{t_k}) + G_k(\bar{X}_{t_k} - m_{t_k})\Delta t + O(\Delta t^2)$$
To obtain the sde, take a sum over $k = 0, 1, \ldots, n-1$:
$$\bar{X}_{t_n} = \bar{X}_{t_0} + m_{t_n} - m_{t_0} + \sum_{k=0}^{n-1}\big[G_k(\bar{X}_{t_k} - m_{t_k})\Delta t + O(\Delta t^2)\big]$$
In the limit as $\Delta t \to 0$,
$$\bar{X}_{t_n} = \bar{X}_{t_0} + m_{t_n} - m_{t_0} + \int_0^{t_n} G_s(\bar{X}_s - m_s)\,\mathrm{d}s$$
where the uniform boundedness of the second-order term is used. The associated sde is
$$\mathrm{d}\bar{X}_t = \mathrm{d}m_t + G_t(\bar{X}_t - m_t)\,\mathrm{d}t$$
where $\mathrm{d}m_t$ is given by (4a). Finally, one obtains (19) by replacing $m_t$ and $\Sigma_t$ with $\bar{m}_t$ and $\bar{\Sigma}_t$, respectively.

APPENDIX C
DERIVATION OF THE OPTIMAL FPF IN THE SINGULAR COVARIANCE CASE

Consider the following general form of the controlled process:
$$\mathrm{d}\bar{X}_t = G_t(\bar{X}_t - \bar{m}_t)\,\mathrm{d}t + \mathrm{d}v_t + \sigma_t\,\mathrm{d}\bar{B}_t \tag{41}$$
The problem is to choose $G_t$, $v_t$, and $\sigma_t$ such that the stochastic map $\bar{X}_t \to \bar{X}_{t+\Delta t}$ is optimal in the limit as $\Delta t \to 0$. The optimality criterion is the Kantorovich form (22) of the optimal transportation problem. The particular choice (41) of the sde for $\{\bar{X}_t\}_{t\ge 0}$ is motivated by the optimal transport sde (19) derived in Prop. 1. We expect to recover the deterministic form of the filter ($\sigma_t = 0$) in the special case where the covariance is non-singular.
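Before the general (possibly singular) analysis, the non-singular scalar case gives a quick consistency check: with $\sigma_t = 0$, the choice $G_t = \mathrm{Ricc}(\Sigma_t)/(2\Sigma_t)$ satisfies the scalar Lyapunov equation (40), and the induced map $1 + G_t\Delta t$ matches the transport factor $\sqrt{\Sigma_{t+\Delta t}/\Sigma_t}$ to first order, as asserted by Lemma 1. The sketch below uses hypothetical scalar coefficients.

```python
import math

# Scalar consistency check of Lemma 1 in the non-singular case: with
# sigma_t = 0, G = Ricc(S)/(2S) solves G*S + S*G = Ricc(S), and the transport
# factor sqrt(S_{t+dt}/S_t) equals 1 + G*dt up to O(dt^2). The scalar
# coefficients a, h, sigma_b below are hypothetical.
a, h, sigma_b = 0.5, 1.0, 0.8

def ricc(S):
    return 2 * a * S + sigma_b**2 - h**2 * S**2

S0, dt = 1.3, 1e-4
# One 4th-order Runge-Kutta step of dS/dt = Ricc(S) gives S_{t+dt} accurately.
k1 = ricc(S0); k2 = ricc(S0 + dt * k1 / 2); k3 = ricc(S0 + dt * k2 / 2); k4 = ricc(S0 + dt * k3)
S1 = S0 + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

F = math.sqrt(S1 / S0)        # scalar version of the map in (39)
G = ricc(S0) / (2 * S0)       # scalar solution of the Lyapunov equation (40)
residual = abs(F - (1 + G * dt))
print(residual)               # O(dt^2)
```

Halving `dt` should cut the printed residual by roughly a factor of four, which is a convenient way to see the $O(\Delta t^2)$ remainder of (39) directly.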
The stochastic map $\bar{X}_t \to \bar{X}_{t+\Delta t}$ is given by
$$\bar{X}_{t+\Delta t} = \bar{X}_t + \int_t^{t+\Delta t} G_s(\bar{X}_s - \bar{m}_s)\,\mathrm{d}s + (v_{t+\Delta t} - v_t) + \int_t^{t+\Delta t}\sigma_s\,\mathrm{d}\bar{B}_s = \bar{X}_t + \Delta t\,G_t(\bar{X}_t - \bar{m}_t) + v_{t+\Delta t} - v_t + \sqrt{\Delta t}\,\sigma_t\zeta + o(\Delta t)$$
where $\zeta \sim N(0, I_d)$. The stochastic map is optimal if (i) the marginals satisfy $\bar{X}_t \sim N(m_t, \Sigma_t)$ and $\bar{X}_{t+\Delta t} \sim N(m_{t+\Delta t}, \Sigma_{t+\Delta t})$, and (ii) the transport cost $\mathbb{E}[|\bar{X}_{t+\Delta t} - \bar{X}_t|^2]$ is minimized. Now, given $\bar{X}_t \sim N(m_t, \Sigma_t)$, the marginal constraint is satisfied by the following choice:
$$m_t + v_{t+\Delta t} - v_t = m_{t+\Delta t}$$
$$(I + \Delta t\,G_t)\Sigma_t(I + \Delta t\,G_t)^\top + \Delta t\,\sigma_t\sigma_t^\top + o(\Delta t) = \Sigma_{t+\Delta t}$$
The first constraint simply means that the increment of $v_t$ must be chosen according to
$$\mathrm{d}v_t = \mathrm{d}m_t = Am_t\,\mathrm{d}t + K_t(\mathrm{d}Z_t - Hm_t\,\mathrm{d}t)$$
Dividing the second constraint by $\Delta t$ and taking the limit as $\Delta t \to 0$ gives
$$G_t\Sigma_t + \Sigma_t G_t^\top + \sigma_t\sigma_t^\top = \mathrm{Ricc}(\Sigma_t) \tag{42}$$
which means that, in the limit as $\Delta t \to 0$, $G_t$ and $\sigma_t$ must satisfy the constraint (42). Clearly, there are infinitely many possible choices of $G_t$ and $\sigma_t$, which accounts for the non-uniqueness of the control law discussed in Sec. II. A unique choice is obtained by minimizing the optimal transportation cost
$$\mathbb{E}[|\bar{X}_{t+\Delta t} - \bar{X}_t|^2] = |m_{t+\Delta t} - m_t|^2 + \underbrace{\Delta t\,\mathrm{Tr}(\sigma_t\sigma_t^\top)}_{f_1(\sigma_t)} + \underbrace{(\Delta t)^2\,\mathrm{Tr}(G_t\Sigma_t G_t^\top)}_{f_2(G_t)} + o(\Delta t^2)$$
Taking the limit as $\Delta t \to 0$ suggests the following sequence of problems: (i) choose $\sigma_t$ to first minimize $f_1(\sigma_t)$; and (ii) choose $G_t$ to next minimize $f_2(G_t)$.
These formal considerations lead to the following optimization problem:

Optimization problem: Define $f_1(\sigma) := \mathrm{Tr}(\sigma\sigma^\top)$ and $f_2(G) := \mathrm{Tr}(G\Sigma G^\top)$, together with the sets
$$D_\Sigma := \{(\sigma, G) \in \mathbb{R}^{d\times d_B}\times\mathbb{R}^{d\times d} : G\Sigma + \Sigma G^\top + \sigma\sigma^\top = \mathrm{Ricc}(\Sigma)\}$$
$$D_{\Sigma|\sigma^*} := \{G \in \mathbb{R}^{d\times d} : G\Sigma + \Sigma G^\top + \sigma^*(\sigma^*)^\top = \mathrm{Ricc}(\Sigma)\}$$
The pair $(\sigma^*, G^*) \in D_\Sigma$ is said to be optimal if
$$\mathrm{Tr}(\sigma^*(\sigma^*)^\top) = \min_{(\sigma, G)\in D_\Sigma} f_1(\sigma), \qquad \mathrm{Tr}(G^*\Sigma(G^*)^\top) = \min_{G\in D_{\Sigma|\sigma^*}} f_2(G) \tag{43}$$
Let $P_R$ and $P_K$ be the orthogonal projection matrices onto the range and the kernel of $\Sigma$, respectively.

Proposition 5: Consider the optimization problem (43). Its optimal solution $(\sigma^*, G^*)$ is as follows: $\sigma^* = P_K\sigma_B$, and $G^*$ is the (unique such) symmetric solution of the matrix equation
$$G^*\Sigma + \Sigma G^* = \mathrm{Ricc}(\Sigma) - \sigma^*(\sigma^*)^\top \tag{44}$$
These choices yield the formula (23) for the optimal FPF described in Sec. III-D. It remains to prove the proposition.

Proof: (of Prop. 5) For any $(\sigma, G) \in D_\Sigma$, multiply both sides of the constraint (42) from the left and the right by $P_K$ to obtain
$$P_K\sigma\sigma^\top P_K = P_K\sigma_B\sigma_B^\top P_K$$
where $P_K\Sigma = \Sigma P_K = 0$ is used. Therefore,
$$f_1(\sigma) = \mathrm{Tr}(\sigma\sigma^\top) = \mathrm{Tr}(P_K\sigma\sigma^\top P_K) + \mathrm{Tr}(P_R\sigma\sigma^\top P_R) = \mathrm{Tr}(P_K\sigma_B\sigma_B^\top P_K) + \mathrm{Tr}(P_R\sigma\sigma^\top P_R) = \mathrm{Tr}(\sigma^*(\sigma^*)^\top) + \mathrm{Tr}(P_R\sigma\sigma^\top P_R)$$
The second term is non-negative, and zero iff $\sigma = \sigma^*$. Therefore, $\sigma^*$ minimizes $f_1(\sigma)$.

It remains to show that $G^*$ minimizes $f_2(G)$ over all $G \in D_{\Sigma|\sigma^*}$. We begin by showing that a symmetric solution of the Lyapunov equation (44) exists and is well-defined. The formula for the solution is given by
$$G^* = \int_0^\infty e^{-t\Sigma}\big(\mathrm{Ricc}(\Sigma) - P_K\sigma_B\sigma_B^\top P_K\big)e^{-t\Sigma}\,\mathrm{d}t + P_K\Omega^{(0)}P_K$$
where $\Omega^{(0)}$ is any symmetric matrix.
The integral is well-defined because
$$G^* - P_K\Omega^{(0)}P_K = (P_K + P_R)(G^* - P_K\Omega^{(0)}P_K)(P_K + P_R) = \int_0^\infty (P_K + P_R)\,e^{-t\Sigma}\big(\mathrm{Ricc}(\Sigma) - P_K\sigma_B\sigma_B^\top P_K\big)e^{-t\Sigma}(P_K + P_R)\,\mathrm{d}t$$
$$= \int_0^\infty e^{-t\Sigma}P_R\big(\mathrm{Ricc}(\Sigma) - P_K\sigma_B\sigma_B^\top P_K\big)e^{-t\Sigma}\,\mathrm{d}t + \int_0^\infty e^{-t\Sigma}P_K\big(\mathrm{Ricc}(\Sigma) - P_K\sigma_B\sigma_B^\top P_K\big)P_R\,e^{-t\Sigma}\,\mathrm{d}t$$
where $P_K(\mathrm{Ricc}(\Sigma) - P_K\sigma_B\sigma_B^\top P_K)P_K = 0$ is used. The integral is bounded because $\|e^{-t\Sigma}P_R\|_2 = \|P_R e^{-t\Sigma}\|_2 = e^{-t\mu}$, where $\mu > 0$ is the smallest non-zero eigenvalue of $\Sigma$.

It remains to show that $G^*$ thus defined is optimal. Express an arbitrary $G \in D_{\Sigma|\sigma^*}$ as $G = G^* + V$. Since $G, G^* \in D_{\Sigma|\sigma^*}$, an easy calculation shows that
$$V\Sigma + \Sigma V^\top = 0$$
Now,
$$f_2(G) = \mathrm{Tr}(G\Sigma G^\top) = \mathrm{Tr}\big((G^* + V)\Sigma(G^* + V)^\top\big) = \mathrm{Tr}(G^*\Sigma(G^*)^\top) + \mathrm{Tr}(G^*\Sigma V^\top) + \mathrm{Tr}(V\Sigma(G^*)^\top) + \mathrm{Tr}(V\Sigma V^\top)$$
$$= \mathrm{Tr}(G^*\Sigma(G^*)^\top) + \mathrm{Tr}\big(G^*(V\Sigma + \Sigma V^\top)\big) + \mathrm{Tr}(V\Sigma V^\top) = \mathrm{Tr}(G^*\Sigma(G^*)^\top) + \mathrm{Tr}(V\Sigma V^\top)$$
Therefore, the choice $G = G^*$ minimizes $f_2(G)$.

Justification of formula (25): The goal is to show that any symmetric solution of the matrix equation (24) is of the form (25). Without loss of generality, express the solution as
$$G_t = A - \frac{1}{2}\bar{\Sigma}_t H^\top H + \frac{1}{2}P_R\Sigma_B\bar{\Sigma}_t^+ + P_K\Sigma_B\bar{\Sigma}_t^+ + \bar{\Sigma}_t^+\Sigma_B P_K + \tilde{G}_t$$
where $\tilde{G}_t$ is the new variable.
Because $G_t$ is symmetric, $\tilde{G}_t$ should satisfy
$$\tilde{G}_t - \tilde{G}_t^\top = A^\top - A + \frac{1}{2}\big(\bar{\Sigma}_t H^\top H - H^\top H\bar{\Sigma}_t\big) - \frac{1}{2}\big(P_R\Sigma_B\bar{\Sigma}_t^+ - \bar{\Sigma}_t^+\Sigma_B P_R\big) \tag{45}$$
Inserting this form of the solution for $G_t$ into the matrix equation (24) yields
$$A\bar{\Sigma}_t + \bar{\Sigma}_t A^\top - \bar{\Sigma}_t H^\top H\bar{\Sigma}_t + \frac{1}{2}P_R\Sigma_B\bar{\Sigma}_t^+\bar{\Sigma}_t + \frac{1}{2}\bar{\Sigma}_t\bar{\Sigma}_t^+\Sigma_B P_R + P_K\Sigma_B\bar{\Sigma}_t^+\bar{\Sigma}_t + \bar{\Sigma}_t\bar{\Sigma}_t^+\Sigma_B P_K + \tilde{G}_t\bar{\Sigma}_t + \bar{\Sigma}_t\tilde{G}_t^\top = \mathrm{Ricc}(\bar{\Sigma}_t) - \sigma_t\sigma_t^\top$$
Using $\bar{\Sigma}_t^+\bar{\Sigma}_t = \bar{\Sigma}_t\bar{\Sigma}_t^+ = P_R$, $\sigma_t = P_K\sigma_B$, and $P_K + P_R = I$, one concludes
$$\tilde{G}_t\bar{\Sigma}_t + \bar{\Sigma}_t\tilde{G}_t^\top = 0$$
All solutions of this linear matrix equation can be expressed as
$$\tilde{G}_t = P_K\Omega^{(0)}P_K + P_R\Omega^{(1)}\bar{\Sigma}_t^+$$
where $\Omega^{(0)} \in \mathbb{R}^{d\times d}$ is arbitrary and $\Omega^{(1)} \in \mathbb{R}^{d\times d}$ is skew-symmetric. Next, conditions on $\Omega^{(0)}$ and $\Omega^{(1)}$ are obtained so that the constraint (45) holds. For $\Omega^{(0)}$, multiply (45) by $P_K$ from the left and the right to obtain
$$P_K\Omega^{(0)}P_K - P_K(\Omega^{(0)})^\top P_K = P_K(A^\top - A)P_K$$
This condition is satisfied when $\Omega^{(0)} + A$ is symmetric. For $\Omega^{(1)}$, multiply (45) by $P_R$ from the left and the right to obtain
$$P_R\Omega^{(1)}\bar{\Sigma}_t^+ + \bar{\Sigma}_t^+\Omega^{(1)}P_R = P_R(A^\top - A)P_R + \frac{1}{2}P_R\big(\bar{\Sigma}_t H^\top H - H^\top H\bar{\Sigma}_t\big)P_R - \frac{1}{2}P_R\big(\Sigma_B\bar{\Sigma}_t^+ - \bar{\Sigma}_t^+\Sigma_B\big)P_R \tag{46}$$
This is the condition that must be satisfied by $\Omega^{(1)}$.

APPENDIX D
PROOF OF PROP. 2

The proof of Prop. 2 relies on the stability theory of the Kalman filter. The following results are quoted without proof:

Lemma 2: Consider the Kalman filter (4a)-(4b) with initial condition $(m_0, \Sigma_0)$. Then, under Assumption (I):
(i) ([58, Thm. 4.11]) There exists a solution $\Sigma_\infty \succ 0$ of the algebraic Riccati equation (ARE)
$$A\Sigma_\infty + \Sigma_\infty A^\top + \sigma_B\sigma_B^\top - \Sigma_\infty H^\top H\Sigma_\infty = 0 \tag{47}$$
such that $A - \Sigma_\infty H^\top H$ is Hurwitz. Let
$$0 < \lambda_0 = \min\{-\mathrm{Re}\,\lambda : \lambda \in \mathrm{Spec}(A - \Sigma_\infty H^\top H)\} \tag{48}$$
(ii) ([57, Thm. 1.1]) If the initial covariance matrix $\Sigma_0 \succ 0$, then there exist matrices $\Lambda_{\min}, \Lambda_{\max} \succ 0$ such that the solution $\Sigma_t$ satisfies $\Lambda_{\min} \preceq \Sigma_t \preceq \Lambda_{\max}$.
(iii) ([59, Lem. 2.2]) The error covariance $\Sigma_t \to \Sigma_\infty$ exponentially fast for any initial condition $\Sigma_0$ (not necessarily the prior): for all $\lambda \in (0, \lambda_0)$, there exists a constant $c_\lambda$ such that
$$\|\Sigma_t - \Sigma_\infty\|_F \le c_\lambda e^{-2\lambda t} \to 0 \quad \text{as } t \to \infty$$
(iv) ([59, Thm. 2.3]) Starting from two initial conditions $(m_0, \Sigma_0)$ and $(\tilde{m}_0, \tilde{\Sigma}_0)$, the means converge in the following senses:
$$\mathbb{E}[\|m_t - \tilde{m}_t\|_2^2] \le \text{(const.)}\,e^{-2\lambda t} \to 0, \qquad \lim_{t\to\infty} e^{\lambda t}\|m_t - \tilde{m}_t\|_2 = 0 \quad \text{a.s.}$$
for all $\lambda \in (0, \lambda_0)$.

In the remainder of this paper, the notation $\Sigma_\infty$ is used to denote the positive-definite solution of the ARE (47), and $\lambda_0$ is used to denote the spectral bound defined in (48).

Proof of Prop. 2: Consider the finite-$N$ filter (21) for the deterministic FPF, with the empirical mean and covariance defined in Eq. (9). The error is defined as
$$\xi_t^i := X_t^i - m_t^{(N)}, \qquad i = 1, 2, \ldots, N$$
The evolution equations for the mean, the covariance, and the error are as follows:
$$\mathrm{d}m_t^{(N)} = Am_t^{(N)}\,\mathrm{d}t + K_t^{(N)}(\mathrm{d}Z_t - Hm_t^{(N)}\,\mathrm{d}t) \tag{49a}$$
$$\frac{\mathrm{d}\Sigma_t^{(N)}}{\mathrm{d}t} = \mathrm{Ricc}(\Sigma_t^{(N)}) \tag{49b}$$
$$\frac{\mathrm{d}\xi_t^i}{\mathrm{d}t} = \sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)})\,\xi_t^i \tag{49c}$$
Eq. (49a) is obtained by averaging Eq. (21) over the particles $i = 1, \ldots, N$. Eq. (49b) is obtained by an application of the Itô rule,
$$\mathrm{d}\big(\xi_t^i(\xi_t^i)^\top\big) = \sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)})\,\xi_t^i(\xi_t^i)^\top\,\mathrm{d}t + \xi_t^i(\xi_t^i)^\top\sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)})^\top\,\mathrm{d}t$$
followed by summing over $i = 1, \ldots, N$ and dividing by $(N - 1)$. Eq. (49c) is obtained by subtracting (49a) for $m_t^{(N)}$ from (21) for $X_t^i$. Because the equations for the empirical mean (49a) and the empirical covariance (49b) are identical to the Kalman filter (4a)-(4b), the a.s. convergence of the mean and covariance follows from Lemma 2 on filter stability theory.
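Lemma 2 can be illustrated in the scalar case, where the ARE and the spectral bound (48) have closed forms. The sketch below (hypothetical scalar coefficients $a$, $h$, $\sigma_B$) integrates the Riccati equation and checks that the error $|\Sigma_t - \Sigma_\infty|$ contracts at rate $e^{-2\lambda_0 t}$, as in part (iii).

```python
import math

# Scalar illustration of Lemma 2: the Riccati solution converges to the ARE
# solution at rate e^{-2 lambda_0 t}. The coefficients a, h, sigma_b are
# hypothetical; detectability/stabilizability hold trivially here.
a, h, sigma_b = -0.2, 1.0, 1.0

def ricc(S):
    return 2 * a * S + sigma_b**2 - h**2 * S**2

# Scalar ARE 2 a S + sigma_b^2 - h^2 S^2 = 0, positive root:
S_inf = (a + math.sqrt(a**2 + h**2 * sigma_b**2)) / h**2
# Spectral bound (48): lam0 = -(a - S_inf h^2)
lam0 = -(a - S_inf * h**2)

def solve_riccati(S, T, dt=1e-4):
    for _ in range(round(T / dt)):     # forward Euler on dS/dt = Ricc(S)
        S += dt * ricc(S)
    return S

S0 = 3.0
e1 = abs(solve_riccati(S0, 2.0) - S_inf)
e2 = abs(solve_riccati(S0, 3.0) - S_inf)
print(e2 / e1, math.exp(-2 * lam0))    # one extra time unit shrinks the error by ~e^{-2 lam0}
```

The factor $2\lambda_0$ (rather than $\lambda_0$) appears because the covariance error is quadratic in the state error, which is the same mechanism behind the $e^{-4\lambda t}$ in (28b).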
It remains to derive the mean-squared estimates. This is done in the following steps:
1) Denote $F_\infty := A - \Sigma_\infty H^\top H$. In step 1, an estimate for the spectral norm of the transition matrix $e^{tF_\infty}$ is obtained. From Lemma 2, the eigenvalues of $F_\infty$ have negative real parts smaller than $-\lambda_0$. Consider the Jordan decomposition $J = P^{-1}F_\infty P$ to bound
$$\|e^{tF_\infty}\|_2 \le \|P\|_2\|P^{-1}\|_2\Big(\max_{0\le k\le n}\frac{t^k}{k!}\Big)e^{-\lambda_0 t}, \qquad \forall t > 0$$
where $n$ is the largest multiplicity of the eigenvalues of $F_\infty$. As a result, for all $\lambda < \lambda_0$, there exists a constant
$$c'_\lambda := \|P\|_2\|P^{-1}\|_2\,\sup_{t\ge 0}\,e^{-(\lambda_0-\lambda)t}\Big(\max_{0\le k\le n}\frac{t^k}{k!}\Big)$$
such that $\|e^{tF_\infty}\|_2 \le c'_\lambda e^{-\lambda t}$.
2) Denote $F_t := A - \Sigma_t H^\top H$ and consider the linear system
$$\frac{\mathrm{d}}{\mathrm{d}t}x_t = F_t x_t = F_\infty x_t + (\Sigma_\infty - \Sigma_t)H^\top H x_t \tag{50}$$
Therefore,
$$x_t = e^{(t-s)F_\infty}x_s + \int_s^t e^{(t-\tau)F_\infty}(\Sigma_\infty - \Sigma_\tau)H^\top H x_\tau\,\mathrm{d}\tau$$
Upon taking the norm and using the triangle inequality,
$$\|x_t\|_2 \le c'_\lambda e^{-\lambda(t-s)}\|x_s\|_2 + \int_s^t c'_\lambda e^{-\lambda(t-\tau)}\|\Sigma_\tau - \Sigma_\infty\|_2\|H^\top H\|_2\|x_\tau\|_2\,\mathrm{d}\tau$$
The Grönwall inequality is then used to conclude that
$$\|x_t\|_2 \le c'_\lambda e^{-\lambda(t-s)}\|x_s\|_2\,e^{c'_\lambda\|H^\top H\|_2\int_s^t\|\Sigma_\tau - \Sigma_\infty\|_2\,\mathrm{d}\tau}$$
This shows that the transition matrix $\Phi_{t,s}$ of the linear system (50) is bounded as follows:
$$\|\Phi_{t,s}\|_2 \le c'_\lambda e^{-\lambda(t-s)}\,e^{c'_\lambda\|H^\top H\|_2\int_s^t\|\Sigma_\tau - \Sigma_\infty\|_2\,\mathrm{d}\tau}$$
Now, because of the exponential convergence $\|\Sigma_t - \Sigma_\infty\|_2 \le c_\lambda e^{-2\lambda t}$ from Lemma 2,
$$\|\Phi_{t,s}\|_2 \le c'_\lambda e^{-\lambda(t-s)}\,e^{c'_\lambda\|H^\top H\|_2\,c_\lambda\|\Sigma_0 - \Sigma_\infty\|_2/(2\lambda)}$$
and therefore $\|\Phi_{t,s}\|_2 \le c_\lambda e^{-\lambda(t-s)}$ for some (enlarged) constant $c_\lambda$.
3) Consider the empirical counterpart of the linear system (50), defined using $F_t^{(N)} := A - \Sigma_t^{(N)}H^\top H$, and denote the associated transition matrix by $\Phi_{t,s}^{(N)}$.
Then, because $\Sigma_t^{(N)}$ also evolves according to the Riccati equation and converges exponentially to $\Sigma_\infty$, by repeating the steps above we also obtain $\|\Phi_{t,s}^{(N)}\|_2 \le c_\lambda e^{-\lambda(t-s)}$.
4) We are now ready to derive an estimate for the error $\Sigma_t^{(N)} - \Sigma_t$. From the Riccati equation,
$$\frac{\mathrm{d}}{\mathrm{d}t}\big(\Sigma_t^{(N)} - \Sigma_t\big) = (A - \Sigma_t H^\top H)\big(\Sigma_t^{(N)} - \Sigma_t\big) + \big(\Sigma_t^{(N)} - \Sigma_t\big)(A - \Sigma_t^{(N)}H^\top H)^\top$$
whose solution is given by
$$\Sigma_t^{(N)} - \Sigma_t = \Phi_{t,0}\big(\Sigma_0^{(N)} - \Sigma_0\big)\big(\Phi_{t,0}^{(N)}\big)^\top$$
Therefore,
$$\|\Sigma_t^{(N)} - \Sigma_t\|_F \le \|\Phi_{t,0}\|_2\|\Phi_{t,0}^{(N)}\|_2\|\Sigma_0^{(N)} - \Sigma_0\|_F \le c_\lambda^2 e^{-2\lambda t}\|\Sigma_0^{(N)} - \Sigma_0\|_F$$
Upon squaring and taking the expectation of both sides,
$$\mathbb{E}[\|\Sigma_t^{(N)} - \Sigma_t\|_F^2] \le c_\lambda^4 e^{-4\lambda t}\,\mathbb{E}[\|\Sigma_0^{(N)} - \Sigma_0\|_F^2] = c_\lambda^4 e^{-4\lambda t}\,\frac{1}{N}\mathbb{E}\big[\mathrm{Tr}\big((\xi_0^i(\xi_0^i)^\top - \Sigma_0)^2\big)\big] \le c_\lambda^4 e^{-4\lambda t}\,\frac{\mathbb{E}[\|\xi_0^i\|_2^4]}{N} \le c_\lambda^4 e^{-4\lambda t}\,\frac{3\,\mathrm{Tr}(\Sigma_0)^2}{N}$$
5) Finally, a bound for the mean-squared error in estimating the mean is derived. Subtracting (4a) for the conditional mean from (49a) for the empirical mean yields
$$\mathrm{d}m_t^{(N)} - \mathrm{d}m_t = (A - \Sigma_t^{(N)}H^\top H)\big(m_t^{(N)} - m_t\big)\,\mathrm{d}t + \big(\Sigma_t^{(N)} - \Sigma_t\big)H^\top\,\mathrm{d}I_t$$
where $\mathrm{d}I_t = \mathrm{d}Z_t - Hm_t\,\mathrm{d}t$ is the increment of the innovation process.
Its solution is given by
$$m_t^{(N)} - m_t = \Phi_{t,0}^{(N)}\big(m_0^{(N)} - m_0\big) + \int_0^t \Phi_{t,s}^{(N)}\big(\Sigma_s^{(N)} - \Sigma_s\big)H^\top\,\mathrm{d}I_s$$
The norm of the first term is bounded by
$$\mathbb{E}[\|\Phi_{t,0}^{(N)}(m_0^{(N)} - m_0)\|_2^2] \le c_\lambda^2 e^{-2\lambda t}\,\mathbb{E}[\|m_0^{(N)} - m_0\|_2^2] \le c_\lambda^2 e^{-2\lambda t}\,\frac{\mathrm{Tr}(\Sigma_0)}{N}$$
The norm of the second term is bounded by
$$\mathbb{E}\Big[\Big\|\int_0^t \Phi_{t,s}^{(N)}\big(\Sigma_s^{(N)} - \Sigma_s\big)H^\top\,\mathrm{d}I_s\Big\|_2^2\Big] = \int_0^t \mathbb{E}\Big[\mathrm{Tr}\Big(\Phi_{t,s}^{(N)}\big(\Sigma_s^{(N)} - \Sigma_s\big)H^\top H\big(\Sigma_s^{(N)} - \Sigma_s\big)\big(\Phi_{t,s}^{(N)}\big)^\top\Big)\Big]\,\mathrm{d}s$$
$$\le \int_0^t \mathbb{E}\big[\|\Phi_{t,s}^{(N)}(\Sigma_s^{(N)} - \Sigma_s)\|_F^2\big]\,\|H\|_2^2\,\mathrm{d}s \le \|H\|_2^2\int_0^t c_\lambda^2 e^{-2\lambda(t-s)}\,c_\lambda^4 e^{-4\lambda s}\,\mathbb{E}[\|\Sigma_0^{(N)} - \Sigma_0\|_F^2]\,\mathrm{d}s \le c_\lambda^6\|H\|_2^2\,\frac{3\,\mathrm{Tr}(\Sigma_0)^2}{N}\,\frac{e^{-2\lambda t}}{2\lambda}$$
where the fact that the innovation process is a Brownian motion [56, Lemma 5.6] and the Itô isometry are used in the first step. Adding the two bounds,
$$\mathbb{E}[\|m_t - m_t^{(N)}\|_2^2] \le e^{-2\lambda t}\,\frac{2c_\lambda^2\,\mathrm{Tr}(\Sigma_0)}{N} + e^{-2\lambda t}\,\frac{6c_\lambda^6\|H\|_2^2\,\mathrm{Tr}(\Sigma_0)^2}{2\lambda N}$$

APPENDIX E
PROOFS OF PROP. 3 AND COR. 1

Proof: In the proof, $S$ is used to denote $S_{\Lambda_0}$. Use the decompositions $X_t^i = m_t^{(N)} + \xi_t^i$ and $\bar{X}_t^i = \bar{m}_t + \bar{\xi}_t^i$ to bound the error as
$$\mathbb{E}[\|X_t^i - \bar{X}_t^i\|_2^2\,1_S]^{1/2} \le \mathbb{E}[\|m_t^{(N)} - \bar{m}_t\|_2^2]^{1/2} + \mathbb{E}[\|\xi_t^i - \bar{\xi}_t^i\|_2^2\,1_S]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}} + \mathbb{E}[\|\xi_t^i - \bar{\xi}_t^i\|_2^2\,1_S]^{1/2}$$
where we have used the exactness property $\bar{m}_t = m_t$ and the bound (28a) derived in Prop. 2-(ii). The hard part of the proof is to establish the following bound:
$$\mathbb{E}[\|\xi_t^i - \bar{\xi}_t^i\|_2^2\,1_S]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}} \tag{51}$$
This is done next.
The two processes evolve as follows:
$$\mathrm{d}\xi_t^i = \sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)})\,\xi_t^i\,\mathrm{d}t, \qquad \xi_0^i = X_0^i - m_0^{(N)}$$
$$\mathrm{d}\bar{\xi}_t^i = \sqrt{\mathrm{Ricc}}(\bar{\Sigma}_t)\,\bar{\xi}_t^i\,\mathrm{d}t, \qquad \bar{\xi}_0^i = X_0^i - \bar{m}_0$$
To express the solutions, define the state transition matrix according to
$$\frac{\mathrm{d}}{\mathrm{d}t}\Psi_{t,s}^{(Q_t)} = \sqrt{\mathrm{Ricc}}(Q_t)\,\Psi_{t,s}^{(Q_t)}, \qquad \Psi_{s,s}^{(Q_t)} = I$$
Using this definition,
$$\xi_t^i - \bar{\xi}_t^i = \Psi_{t,0}^{(\Sigma_t^{(N)})}\big(\xi_0^i - \bar{\xi}_0^i\big) + \int_0^t \Psi_{t,s}^{(\Sigma_s^{(N)})}\big(\sqrt{\mathrm{Ricc}}(\bar{\Sigma}_s) - \sqrt{\mathrm{Ricc}}(\Sigma_s^{(N)})\big)\bar{\xi}_s^i\,\mathrm{d}s \tag{52}$$
We claim that there exist $c_1, c_2 > 0$ and a matrix $\Lambda_{\min} \succ 0$ such that, for all $t \ge s \ge 0$, almost surely on the event $S$:
$$\Sigma_t^{(N)} \succeq \Lambda_{\min} \tag{53a}$$
$$\|\Psi_{t,s}^{(\Sigma_t^{(N)})}\|_2 \le c_1 \tag{53b}$$
$$\mathbb{E}\big[\|\sqrt{\mathrm{Ricc}}(\bar{\Sigma}_t) - \sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)})\|_2^2\,1_S\big]^{1/2} \le \frac{c_2}{\sqrt{N}}\,e^{-2\lambda t} \tag{53c}$$
Assuming the claim is true, the bound (51) is obtained as follows:
$$\mathbb{E}[\|\xi_t^i - \bar{\xi}_t^i\|_2^2\,1_S]^{1/2} \le c_1\,\mathbb{E}[\|\xi_0^i - \bar{\xi}_0^i\|_2^2\,1_S]^{1/2} + \frac{c_1 c_2}{\sqrt{N}}\int_0^t e^{-2\lambda s}\,\mathbb{E}[\|\bar{\xi}_s^i\|_2^2]^{1/2}\,\mathrm{d}s$$
$$\le c_1\,\mathbb{E}[\|m_0^{(N)} - \bar{m}_0\|_2^2]^{1/2} + \frac{c_1 c_2}{\sqrt{N}}\int_0^t e^{-2\lambda s}\,\mathrm{Tr}(\bar{\Sigma}_s)^{1/2}\,\mathrm{d}s \le \frac{c_1\,\mathrm{Tr}(\Sigma_0)^{1/2}}{\sqrt{N}} + \frac{c_1 c_2}{\sqrt{N}}\,\frac{\mathrm{Tr}(\Lambda_{\max})^{1/2}}{2\lambda}$$
where we used the identity $\xi_0^i - \bar{\xi}_0^i = \bar{m}_0 - m_0^{(N)}$ and $\bar{\Sigma}_t = \Sigma_t \preceq \Lambda_{\max}$ from Lemma 2-(ii).

It remains to prove the claim. The three bounds (53a)-(53c) are obtained in the following three steps:
1) (Bound (53a)): On the event $S$, $\Sigma_0^{(N)} \succeq \Lambda_0$. Let $\Lambda_t$ be the solution of the Riccati equation initialized at $\Lambda_0$. By Lemma 2-(ii), there exists $\Lambda_{\min} \succ 0$ such that $\Lambda_t \succeq \Lambda_{\min}$. Because $\Sigma_0^{(N)} \succeq \Lambda_0$, the monotonicity property of the solution of the Riccati equation (see [57, Prop. 4.1]) gives $\Sigma_t^{(N)} \succeq \Lambda_t$. Therefore, $\Sigma_t^{(N)} \succeq \Lambda_{\min}$.
2) (Bound (53b)): We begin by deriving a bound for $\|\Psi_{t,s}^{(\Sigma_\infty)}\|_2$. Consider the linear system
$$\frac{\mathrm{d}}{\mathrm{d}t}x_t = \sqrt{\mathrm{Ricc}}(\Sigma_\infty)^\top x_t$$
and the function $V(x) := x^\top\Sigma_\infty x$.
Then,
$$\frac{\mathrm{d}}{\mathrm{d}t}V(x_t) = x_t^\top\sqrt{\mathrm{Ricc}}(\Sigma_\infty)\Sigma_\infty x_t + x_t^\top\Sigma_\infty\sqrt{\mathrm{Ricc}}(\Sigma_\infty)^\top x_t = 0$$
where we used the definition (16) and $\mathrm{Ricc}(\Sigma_\infty) = 0$. As a result, for all $t \ge s \ge 0$,
$$x_t^\top\Sigma_\infty x_t = x_s^\top\Sigma_\infty x_s \;\Rightarrow\; \|x_t\|_2^2 \le \frac{\lambda_{\max}(\Sigma_\infty)}{\lambda_{\min}(\Sigma_\infty)}\|x_s\|_2^2 \;\Rightarrow\; \|(\Psi_{t,s}^{(\Sigma_\infty)})^\top x_s\|_2 \le c_3\|x_s\|_2 \;\Rightarrow\; \|\Psi_{t,s}^{(\Sigma_\infty)}\|_2 \le c_3$$
where the constant $c_3 := \sqrt{\lambda_{\max}(\Sigma_\infty)/\lambda_{\min}(\Sigma_\infty)}$. To obtain a bound for $\|\Psi_{t,s}^{(\Sigma_t^{(N)})}\|_2$, consider the linear system
$$\frac{\mathrm{d}}{\mathrm{d}t}x_t = \sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)})\,x_t = \sqrt{\mathrm{Ricc}}(\Sigma_\infty)\,x_t + \big(\sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)}) - \sqrt{\mathrm{Ricc}}(\Sigma_\infty)\big)x_t$$
whose solution is given by
$$x_t = \Psi_{t,0}^{(\Sigma_\infty)}x_0 + \int_0^t \Psi_{t,s}^{(\Sigma_\infty)}\big(\sqrt{\mathrm{Ricc}}(\Sigma_s^{(N)}) - \sqrt{\mathrm{Ricc}}(\Sigma_\infty)\big)x_s\,\mathrm{d}s$$
Therefore, using the bound $\|\Psi_{t,s}^{(\Sigma_\infty)}\|_2 \le c_3$,
$$\|x_t\|_2 \le c_3\|x_0\|_2 + c_3\int_0^t\|\sqrt{\mathrm{Ricc}}(\Sigma_s^{(N)}) - \sqrt{\mathrm{Ricc}}(\Sigma_\infty)\|_2\,\|x_s\|_2\,\mathrm{d}s$$
Now,
$$\|\sqrt{\mathrm{Ricc}}(\Sigma_t^{(N)}) - \sqrt{\mathrm{Ricc}}(\Sigma_\infty)\|_2 = \Big\|\frac{1}{2}\Sigma_B\big((\Sigma_t^{(N)})^{-1} - \Sigma_\infty^{-1}\big) - \frac{1}{2}\big(\Sigma_t^{(N)} - \Sigma_\infty\big)H^\top H\Big\|_2$$
$$\le \frac{1}{2}\big(\|\Sigma_B\|_2\|\Sigma_\infty^{-1}\|_2\|(\Sigma_t^{(N)})^{-1}\|_2 + \|H^\top H\|_2\big)\|\Sigma_t^{(N)} - \Sigma_\infty\|_2 \le \underbrace{\big(\|\Sigma_B\|_2\|\Sigma_\infty^{-1}\|_2\|\Lambda_{\min}^{-1}\|_2 + \|H^\top H\|_2\big)c_\lambda}_{c_4}\,e^{-2\lambda t}$$
where we used Lemma 2-(iii) (because $\Sigma_t^{(N)}$ is a solution of the Riccati equation (49b)) and $\|(\Sigma_t^{(N)})^{-1}\|_2 \le \|\Lambda_{\min}^{-1}\|_2$ from (53a). Therefore,
$$\|x_t\|_2 \le c_3\|x_0\|_2 + c_3 c_4\int_0^t e^{-2\lambda s}\|x_s\|_2\,\mathrm{d}s$$
By an application of the Grönwall inequality,
$$\|x_t\|_2 \le c_3\,e^{c_3 c_4\int_0^t e^{-2\lambda s}\,\mathrm{d}s}\,\|x_0\|_2 \le \underbrace{c_3\,e^{c_3 c_4/(2\lambda)}}_{=:\,c_1}\,\|x_0\|_2$$
which implies $\|\Psi_{t,s}^{(\Sigma_t^{(N)})}\|_2 \le c_1$ for all $t \ge s \ge 0$. This is the bound (53b).
3) (Bound (53c)): We have
\[
\|\sqrt{\text{Ricc}}(\Sigma^{(N)}_t) - \sqrt{\text{Ricc}}(\bar\Sigma_t)\|_2 = \|\sqrt{\text{Ricc}}(\Sigma^{(N)}_t) - \sqrt{\text{Ricc}}(\Sigma_t)\|_2
\]
\[
\le \frac{1}{2}\big(\|\Sigma_B\|_2\|\Sigma_t^{-1}\|_2\|(\Sigma^{(N)}_t)^{-1}\|_2 + \|H^\top H\|_2\big)\|\Sigma^{(N)}_t - \Sigma_t\|_2 \le \frac{1}{2}\big(\|\Sigma_B\|_2\|\Lambda_{\min}^{-1}\|_2^2 + \|H^\top H\|_2\big)\|\Sigma^{(N)}_t - \Sigma_t\|_F
\]
where we used $\bar\Sigma_t = \Sigma_t$ (by the exactness property), $\|\Sigma^{(N)}_t - \Sigma_t\|_2 \le \|\Sigma^{(N)}_t - \Sigma_t\|_F$, and $\|(\Sigma^{(N)}_t)^{-1}\|_2 \le \|\Lambda_{\min}^{-1}\|_2$ from (53a). Taking the squared expectation and using (28b) in Prop. 2-(ii) gives (53c).

Proof of Corollary 1: Consider the event $S = S_{\frac{1}{2}\Sigma_0}$. We derive the following bounds:
\[
E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X^i_t) - E[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\,\mathbb{1}_S\Big]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}} \quad (54)
\]
\[
E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X^i_t) - E[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\,\mathbb{1}_{S^c}\Big]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}} \quad (55)
\]
1) (Bound (54)): Using the triangle inequality,
\[
E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X^i_t) - E[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\,\mathbb{1}_S\Big]^{1/2} \le E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X^i_t) - \frac{1}{N}\sum_{i=1}^N f(\bar X^i_t)\Big|^2\,\mathbb{1}_S\Big]^{1/2} + E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(\bar X^i_t) - E[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\Big]^{1/2}
\]
Now, because the $\bar X^i_t$ are i.i.d. with distribution equal to the conditional distribution, the second term on the right-hand side is
\[
E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(\bar X^i_t) - E[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\Big]^{1/2} = \frac{\text{Var}(f(X_t)\,|\,\mathcal{Z}_t)^{1/2}}{\sqrt{N}}
\]
The first term on the right-hand side is bounded as follows:
\[
E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X^i_t) - \frac{1}{N}\sum_{i=1}^N f(\bar X^i_t)\Big|^2\,\mathbb{1}_S\Big]^{1/2} \le \frac{1}{N}\sum_{i=1}^N E\big[|f(X^i_t) - f(\bar X^i_t)|^2\,\mathbb{1}_S\big]^{1/2} \le \frac{\text{(const.)}}{N}\sum_{i=1}^N E\big[\|X^i_t - \bar X^i_t\|_2^2\,\mathbb{1}_S\big]^{1/2} \le \frac{\text{(const.)}}{\sqrt{N}}
\]
where we used the triangle inequality in the first step, the Lipschitz property of $f$ in the second step, and the estimate (31) from Prop. 3 in the final step.

2) (Bound (55)): The function $f$ is assumed bounded.
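The $1/\sqrt{N}$ rate for the i.i.d. average in bound (54) is the ordinary Monte Carlo rate. A generic illustration (standard normal samples and $f(x) = x$, not the filtering setup itself):

```python
import numpy as np

rng = np.random.default_rng(1)

def mse_of_mean(N, trials=2000):
    # Empirical m.s.e. of the mean of N i.i.d. standard normal samples;
    # theory gives Var(f)/N = 1/N for f(x) = x
    X = rng.standard_normal((trials, N))
    return float(np.mean(X.mean(axis=1) ** 2))

m_small = mse_of_mean(100)
m_large = mse_of_mean(2500)
print(m_small, m_large)  # m.s.e. drops by roughly the sample-size ratio (25x)
```

Increasing $N$ by a factor of 25 reduces the mean squared error by roughly the same factor, i.e., the root-m.s.e. decays as $1/\sqrt{N}$.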
So,
\[
E\Big[\Big|\frac{1}{N}\sum_{i=1}^N f(X^i_t) - E[f(X_t)\,|\,\mathcal{Z}_t]\Big|^2\,\mathbb{1}_{S^c}\Big]^{1/2} \le \text{(const.)}\,P(S^c)^{\frac{1}{2}}
\]
The probability of the event $S^c$ is bounded as follows:
\[
P(S^c) = P\big(\Sigma_0 - \Sigma^{(N)}_0 \npreceq \tfrac{1}{2}\Sigma_0\big) \le P\big(\|\Sigma^{(N)}_0 - \Sigma_0\|_F^2 \ge \tfrac{1}{4}\|\Sigma_0\|_F^2\big) \le \frac{E[\|\Sigma^{(N)}_0 - \Sigma_0\|_F^2]}{\frac{1}{4}\|\Sigma_0\|_F^2} \le \frac{12\,\text{tr}(\Sigma_0^2)}{N\|\Sigma_0\|_F^2} = \frac{12}{N}
\]

APPENDIX F
PROOF OF PROP. 4

Proof: Part (i): Express the m.s.e. as:
\[
\text{m.s.e.}_{\text{PF}}(f) = E\Big[\Big|\frac{1}{N}\sum_{i=1}^N \bar w^i f(X^i_0) - E[\bar w^i f(X^i_0)\,|\,Z_1]\Big|^2\Big]
\]
where we used $\pi(f) = E[\bar w^i f(X^i_0)\,|\,Z_1]$. The expectation simplifies to:
\[
\text{m.s.e.}_{\text{PF}} = \frac{1}{N}\Big(\underbrace{E[|\bar w^i f(X^i_0)|^2]}_{\text{1st term}} - \underbrace{E\big[|E[\bar w^i f(X^i_0)\,|\,Z_1]|^2\big]}_{\text{2nd term}}\Big)
\]
The two terms are simplified separately:
1) (2nd term) Note that
\[
E[\bar w^i f(X^i_0)\,|\,Z_1] = E[a^\top X_0\,|\,Z_1] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_w^2}\,a^\top Z_1 = \frac{1}{2}\,a^\top Z_1
\]
where we used $\sigma_x^2 = \sigma_w^2 = \sigma^2$ in the last step. Therefore, the 2nd term is evaluated as
\[
E\big[|\tfrac{1}{2}a^\top Z_1|^2\big] = \tfrac{1}{4}\,a^\top E[Z_1 Z_1^\top]a = \frac{\sigma^2}{2}
\]
where we used $E[Z_1 Z_1^\top] = E[X_0 X_0^\top] + \sigma_w^2 I_d = 2\sigma^2 I_d$.
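The identity $E[\bar w^i f(X^i_0)\,|\,Z_1] = \frac{1}{2}a^\top Z_1$ reflects that, with $\sigma_x = \sigma_w$, the posterior mean is $Z_1/2$. A self-normalized importance sampling check (the observation $Z_1$ below is a fixed hypothetical value, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, N = 3, 1.0, 200_000

a = np.zeros(d)
a[0] = 1.0                                # f(x) = a^T x with ||a|| = 1
X0 = sigma * rng.standard_normal((N, d))  # prior samples, X0 ~ N(0, sigma^2 I)
Z1 = np.array([0.7, -0.3, 0.2])           # fixed hypothetical observation

# Importance weights for the Gaussian likelihood, sigma_w = sigma_x = sigma
logw = -np.sum((Z1 - X0) ** 2, axis=1) / (2 * sigma**2)
w = np.exp(logw - logw.max())
w /= w.sum()

est = float(w @ (X0 @ a))    # self-normalized estimate of E[a^T X0 | Z1]
exact = float(0.5 * a @ Z1)  # posterior mean is Z1/2 when sigma_x = sigma_w
print(est, exact)
```

For large $N$ the weighted estimate agrees with $\frac{1}{2}a^\top Z_1$ to within Monte Carlo error.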
2) (1st term) We have
\[
E[|\bar w^i f(X^i_0)|^2] = E\Bigg[\frac{|f(X^i_0)|^2\, e^{-\frac{\|Z_1 - X^i_0\|^2}{\sigma_w^2}}}{\big(E[e^{-\frac{\|Z_1 - X^i_0\|^2}{2\sigma_w^2}}\,|\,Z_1]\big)^2}\Bigg]
\]
The denominator is
\[
\big(E[e^{-\frac{\|Z_1 - X^i_0\|^2}{2\sigma_w^2}}\,|\,Z_1]\big)^2 = \Big(\frac{(2\pi\sigma_w^2)^{d/2}}{(2\pi(\sigma_w^2+\sigma_x^2))^{d/2}}\, e^{-\frac{\|Z_1\|^2}{2(\sigma_w^2+\sigma_x^2)}}\Big)^2 = \frac{1}{2^d}\, e^{-\frac{\|Z_1\|^2}{2\sigma^2}}
\]
The conditional expectation of the numerator is
\[
E\big[|f(X^i_0)|^2\, e^{-\frac{\|Z_1 - X^i_0\|^2}{\sigma_w^2}}\,|\,Z_1\big] = \frac{(\pi\sigma_w^2)^{d/2}}{(\pi(\sigma_w^2+2\sigma_x^2))^{d/2}}\, e^{-\frac{\|Z_1\|^2}{2\sigma_x^2+\sigma_w^2}}\, E_{\zeta\sim N(\frac{2}{3}Z_1,\,\frac{\sigma^2}{3}I_d)}\big[|f(\zeta)|^2\big] = \frac{1}{3^{d/2}}\, e^{-\frac{\|Z_1\|^2}{3\sigma^2}}\Big(\frac{4}{9}|a^\top Z_1|^2 + \frac{\sigma^2}{3}\Big)
\]
Using the tower property of the conditional expectation,
\[
E[|\bar w^i f(X^i_0)|^2] = \frac{2^d}{3^{d/2}}\, E\Big[e^{\frac{\|Z_1\|^2}{6\sigma^2}}\Big(\frac{4}{9}|a^\top Z_1|^2 + \frac{\sigma^2}{3}\Big)\Big] = \frac{2^d}{3^{d/2}}\,\frac{(12\pi\sigma^2)^{d/2}}{(4\pi\sigma^2)^{d/2}}\, E_{\zeta\sim N(0,\,6\sigma^2 I_d)}\Big[\frac{4}{9}|a^\top\zeta|^2 + \frac{\sigma^2}{3}\Big] = 2^d(3\sigma^2)
\]
The two terms are combined to obtain the formula (36).

Part (ii): The FPF estimator is $\pi_{\text{FPF}}(f) = a^\top m^{(N)}_1$ where
\[
dm^{(N)}_t = K^{(N)}_t\big(dZ_t - m^{(N)}_t\,dt\big), \qquad K^{(N)}_t = \frac{1}{\sigma_w^2}\Sigma^{(N)}_t
\]
The exact mean evolves according to:
\[
dm_t = K_t\big(dZ_t - m_t\,dt\big), \qquad K_t = \frac{1}{\sigma_w^2}\Sigma_t
\]
Therefore, the difference $m^{(N)}_t - m_t$ solves the SDE:
\[
d(m^{(N)}_t - m_t) = -K^{(N)}_t(m^{(N)}_t - m_t)\,dt + (K^{(N)}_t - K_t)\,dI_t
\]
where $dI_t = dZ_t - m_t\,dt$ is the increment of the innovation process. Let $\Phi_{t,s}$ be the state transition matrix for the linear system $\frac{d}{dt}x_t = -K^{(N)}_t x_t$.
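The denominator above is a standard Gaussian convolution identity, $E[e^{-\|Z_1-X_0\|^2/(2\sigma_w^2)}\,|\,Z_1] = \big(\frac{\sigma_w^2}{\sigma_w^2+\sigma_x^2}\big)^{d/2} e^{-\|Z_1\|^2/(2(\sigma_w^2+\sigma_x^2))}$. A quick Monte Carlo sanity check with $\sigma_x = \sigma_w = 1$ and a fixed hypothetical $Z_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma, N = 2, 1.0, 1_000_000
Z1 = np.array([0.5, -1.0])  # fixed hypothetical observation

# Monte Carlo estimate of E[exp(-||Z1 - X0||^2 / (2 sigma^2))], X0 ~ N(0, sigma^2 I)
X0 = sigma * rng.standard_normal((N, d))
mc = float(np.mean(np.exp(-np.sum((Z1 - X0) ** 2, axis=1) / (2 * sigma**2))))

# Closed form with sigma_x = sigma_w = sigma:
# (1/2)^{d/2} * exp(-||Z1||^2 / (4 sigma^2))
exact = float(0.5 ** (d / 2) * np.exp(-np.sum(Z1**2) / (4 * sigma**2)))
print(mc, exact)
```

The same convolution argument, applied with kernel variance $\sigma_w^2/2$ in place of $\sigma_w^2$, produces the numerator formula.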
In terms of this matrix,
\[
m^{(N)}_1 - m_1 = \Phi_{1,0}(m^{(N)}_0 - m_0) + \int_0^1 \Phi_{1,t}(K^{(N)}_t - K_t)\,dI_t
\]
Taking an inner product of both sides with $a$ yields
\[
a^\top m^{(N)}_1 - a^\top m_1 = a^\top\Phi_{1,0}(m^{(N)}_0 - m_0) + \int_0^1 a^\top\Phi_{1,t}(K^{(N)}_t - K_t)\,dI_t
\]
Therefore,
\[
E[|a^\top m^{(N)}_1 - a^\top m_1|^2] \le 2E[|a^\top\Phi_{1,0}(m^{(N)}_0 - m_0)|^2] + 2E\Big[\Big(\int_0^1 a^\top\Phi_{1,t}(K^{(N)}_t - K_t)\,dI_t\Big)^2\Big]
\]
The formula (37) follows by showing the following bounds for the two terms:
\[
E[|a^\top\Phi_{1,0}(m^{(N)}_0 - m_0)|^2] \le \frac{2d\sigma^2}{N} \quad (56a)
\]
\[
E\Big[\Big(\int_0^1 a^\top\Phi_{1,t}(K^{(N)}_t - K_t)\,dI_t\Big)^2\Big] \le \frac{3d^2\sigma^2}{N} \quad (56b)
\]
1) (Bound (56a)): The spectral norm $\|\Phi_{t,s}\|_2 \le 1$ because $K^{(N)}_t = \frac{1}{\sigma^2}\Sigma^{(N)}_t \succeq 0$. Therefore,
\[
|a^\top\Phi_{1,0}(m^{(N)}_0 - m_0)| \le \|a\|_2\|\Phi_{1,0}\|_2\|m^{(N)}_0 - m_0\|_2 \le \|m^{(N)}_0 - m_0\|_2
\]
and
\[
E[|a^\top\Phi_{1,0}(m^{(N)}_0 - m_0)|^2] \le E[\|m^{(N)}_0 - m_0\|_2^2] = \frac{\sigma_0^2 d}{N}
\]
2) (Bound (56b)): By the Itô isometry, because the innovation process is a Brownian motion [56, Lemma 5.6],
\[
E\Big[\Big(\int_0^1 a^\top\Phi_{1,t}(K^{(N)}_t - K_t)\,dI_t\Big)^2\Big] = \sigma_w^2\, E\Big[\int_0^1 a^\top\Phi_{1,t}(K^{(N)}_t - K_t)^2\Phi_{1,t}^\top a\,dt\Big] \le \frac{1}{\sigma_w^2}\int_0^1 E[\|\Sigma^{(N)}_t - \Sigma_t\|_2^2]\,dt
\]
where we used $\|\Phi_{1,t}\|_2 \le 1$ and $\|a\|_2 = 1$ to derive the inequality. Next, we bound the spectral norm $\|\Sigma^{(N)}_t - \Sigma_t\|_2$. We have
\[
\frac{d}{dt}\Sigma^{(N)}_t = -\frac{1}{\sigma_w^2}(\Sigma^{(N)}_t)^2, \qquad \frac{d}{dt}\Sigma_t = -\frac{1}{\sigma_w^2}\Sigma_t^2
\]
and thus
\[
\frac{d}{dt}(\Sigma^{(N)}_t - \Sigma_t) = -\frac{1}{2\sigma_w^2}(\Sigma^{(N)}_t + \Sigma_t)(\Sigma^{(N)}_t - \Sigma_t) - \frac{1}{2\sigma_w^2}(\Sigma^{(N)}_t - \Sigma_t)(\Sigma^{(N)}_t + \Sigma_t)
\]
Its solution is obtained as
\[
\Sigma^{(N)}_t - \Sigma_t = \Phi_t(\Sigma^{(N)}_0 - \Sigma_0)\Phi_t^\top
\]
where $\Phi_t$ solves $\frac{d}{dt}\Phi_t = -\frac{1}{2\sigma^2}(\Sigma_t + \Sigma^{(N)}_t)\Phi_t$ with $\Phi_0 = I$. Because $\Sigma_t + \Sigma^{(N)}_t \succeq 0$, the spectral norm $\|\Phi_t\|_2 \le 1$.
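The non-expansiveness that follows from $\|\Phi_t\|_2 \le 1$, namely $\|\Sigma^{(N)}_t - \Sigma_t\|_2 \le \|\Sigma^{(N)}_0 - \Sigma_0\|_2$, can be checked by integrating $\dot\Sigma = -\Sigma^2/\sigma_w^2$ from two PSD initializations (the matrices below are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy.integrate import solve_ivp

sigma_w = 1.0
d = 2

def rhs(t, s_flat):
    # dSigma/dt = -(1/sigma_w^2) Sigma^2, flattened for the ODE solver
    S = s_flat.reshape(d, d)
    return (-(S @ S) / sigma_w**2).ravel()

# Two PSD initializations: the exact Sigma_0 and a perturbed "empirical" one
S0 = np.array([[1.0, 0.2], [0.2, 0.8]])
S0_N = S0 + np.array([[0.3, -0.1], [-0.1, 0.2]])

t_eval = np.linspace(0, 1, 21)
sol = solve_ivp(rhs, (0, 1), S0.ravel(), t_eval=t_eval, rtol=1e-9, atol=1e-11)
solN = solve_ivp(rhs, (0, 1), S0_N.ravel(), t_eval=t_eval, rtol=1e-9, atol=1e-11)

# Spectral norm of the difference along the trajectory; it never exceeds
# its initial value, consistent with ||Phi_t||_2 <= 1
diffs = [float(np.linalg.norm((solN.y[:, k] - sol.y[:, k]).reshape(d, d), 2))
         for k in range(len(t_eval))]
print(diffs[0], diffs[-1])
```

With strictly positive definite initializations the gap actually shrinks, since $\Phi_t$ is then a strict contraction.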
Therefore, $\|\Sigma^{(N)}_t - \Sigma_t\|_2 \le \|\Sigma^{(N)}_0 - \Sigma_0\|_2$ and
\[
\frac{1}{\sigma_w^2}\int_0^1 E[\|\Sigma^{(N)}_t - \Sigma_t\|_2^2]\,dt \le \frac{1}{\sigma_w^2}E[\|\Sigma^{(N)}_0 - \Sigma_0\|_2^2] \le \frac{1}{\sigma_w^2}E[\|\Sigma^{(N)}_0 - \Sigma_0\|_F^2] \le \frac{1}{\sigma_w^2}\,\frac{1}{N}\,E[\|X^i_0\|^4] \le \frac{1}{\sigma_w^2}\,\frac{3\sigma_0^4 d^2}{N}
\]

REFERENCES

[1] A. Taghvaei and P. G. Mehta, "An optimal transport formulation of the linear feedback particle filter," in American Control Conference (ACC), 2016. IEEE, 2016, pp. 3614-3619.
[2] ——, "Error analysis for the linear feedback particle filter," in 2018 Annual American Control Conference (ACC). IEEE, 2018, pp. 4261-4266.
[3] N. J. Gordon, D. J. Salmond, and A. F. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," in IEE Proceedings F (Radar and Signal Processing), vol. 140, 1993, pp. 107-113.
[4] A. Doucet and A. M. Johansen, "A tutorial on particle filtering and smoothing: Fifteen years later," Handbook of Nonlinear Filtering, vol. 12, pp. 656-704, 2009. [Online]. Available: https://www.cs.ubc.ca/~arnaud/doucet_johansen_tutorialPF.pdf
[5] A. Bain and D. Crisan, Fundamentals of Stochastic Filtering. Springer, 2009, vol. 3.
[6] P. Del Moral, "Feynman-Kac formulae," in Feynman-Kac Formulae. Springer, 2004, pp. 47-93.
[7] L. C. Evans, "Partial differential equations and Monge-Kantorovich mass transfer," Current Developments in Mathematics, pp. 65-126, 1997.
[8] C. Villani, Topics in Optimal Transportation. American Mathematical Soc., 2003, vol. 58.
[9] G. Evensen, "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics," Journal of Geophysical Research: Oceans, vol. 99, no. C5, pp. 10143-10162, 1994.
[10] ——, "The ensemble Kalman filter: Theoretical formulation and practical implementation," Ocean Dynamics, vol. 53, no. 4, pp. 343-367, 2003.
[11] J. Whitaker and T. M. Hamill, "Ensemble data assimilation without perturbed observations," Monthly Weather Review, vol. 130, no. 7, pp. 1913-1924, 2002.
[12] K. Bergemann and S. Reich, "An ensemble Kalman-Bucy filter for continuous data assimilation," Meteorologische Zeitschrift, vol. 21, no. 3, pp. 213-219, 2012.
[13] T. Yang, P. G. Mehta, and S. P. Meyn, "Feedback particle filter," IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2465-2480, October 2013.
[14] T. Yang, R. S. Laugesen, P. G. Mehta, and S. P. Meyn, "Multivariable feedback particle filter," Automatica, vol. 71, pp. 10-23, 2016.
[15] P. Houtekamer and H. Mitchell, "A sequential ensemble Kalman filter for atmospheric data assimilation," Mon. Wea. Rev., vol. 129, pp. 123-136, 2001.
[16] G. Evensen, Data Assimilation. The Ensemble Kalman Filter. New York: Springer-Verlag, 2006.
[17] A. Taghvaei, J. de Wiljes, P. G. Mehta, and S. Reich, "Kalman filter and its modern extensions for the continuous-time nonlinear filtering problem," Journal of Dynamic Systems, Measurement, and Control, vol. 140, no. 3, p. 030904, 2018.
[18] D. Crisan and J. Xiong, "Approximate McKean-Vlasov representations for a class of SPDEs," Stochastics, vol. 82, no. 1, pp. 53-68, 2010.
[19] S. Reich, "A dynamical systems framework for intermittent data assimilation," BIT Numerical Analysis, vol. 51, pp. 235-249, 2011.
[20] F. Daum and J. Huang, "Particle flow for nonlinear filters with log-homotopy," in Proc. SPIE, vol. 6969, 2008, pp. 696918-696918.
[21] F. Daum, J. Huang, and A. Noushin, "Generalized Gromov method for stochastic particle flow filters," in SPIE Defense+ Security. International Society for Optics and Photonics, 2017, pp. 102000I-102000I.
[22] S. Reich, "A nonparametric ensemble transform method for Bayesian inference," SIAM Journal on Scientific Computing, vol. 35, no. 4, pp. A2013-A2024, 2013.
[23] T. A. El Moselhy and Y. M. Marzouk, "Bayesian inference with optimal maps," Journal of Computational Physics, vol. 231, no. 23, pp. 7815-7850, 2012.
[24] D. A. Mesa, J. Tantiongloc, M. Mendoza, S. Kim, and T. P. Coleman, "A distributed framework for the construction of transport maps," Neural Computation, vol. 31, no. 4, pp. 613-652, 2019.
[25] J. Heng, A. Doucet, and Y. Pokern, "Gibbs flow for approximate transport with applications to Bayesian computation," arXiv preprint arXiv:1509.08787, 2015. [Online]. Available: https://arxiv.org/abs/1509.08787
[26] R. Jordan, D. Kinderlehrer, and F. Otto, "The variational formulation of the Fokker-Planck equation," SIAM Journal on Mathematical Analysis, vol. 29, no. 1, pp. 1-17, 1998.
[27] R. S. Laugesen, P. G. Mehta, S. P. Meyn, and M. Raginsky, "Poisson's equation in nonlinear filtering," SIAM Journal on Control and Optimization, vol. 53, no. 1, pp. 501-525, 2015. [Online]. Available: 10.1137/13094743X
[28] A. Halder and T. T. Georgiou, "Gradient flows in uncertainty propagation and filtering of linear Gaussian systems," in 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 3081-3088.
[29] ——, "Gradient flows in filtering and Fisher-Rao geometry," in 2018 Annual American Control Conference (ACC). IEEE, 2018, pp. 4281-4286.
[30] ——, "Proximal recursion for the Wonham filter," 2019. [Online]. Available: http://georgiou.eng.uci.edu/papers/CDC2019_Wonham-5.pdf
[31] F. Le Gland, V. Monbet, and V. Tran, "Large sample asymptotics for the ensemble Kalman filter," Ph.D. dissertation, INRIA, 2009.
[32] J. Mandel, L. Cobb, and J. D. Beezley, "On the convergence of the ensemble Kalman filter," Applications of Mathematics, vol. 56, no. 6, pp. 533-541, 2011.
[33] X. T. Tong, A. J. Majda, and D. Kelly, "Nonlinear stability and ergodicity of ensemble based Kalman filters," Nonlinearity, vol. 29, no. 2, p. 657, 2016.
[34] D. Kelly, K. J. Law, and A. M. Stuart, "Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time," Nonlinearity, vol. 27, no. 10, p. 2579, 2014.
[35] E. Kwiatkowski and J. Mandel, "Convergence of the square root ensemble Kalman filter in the large ensemble limit," SIAM/ASA Journal on Uncertainty Quantification, vol. 3, no. 1, pp. 1-17, 2015.
[36] P. Del Moral and J. Tugaut, "On the stability and the uniform propagation of chaos properties of ensemble Kalman-Bucy filters," Ann. Appl. Probab., vol. 28, no. 2, pp. 790-850, 2018. [Online]. Available: https://doi.org/10.1214/17-AAP1317
[37] P. Del Moral, A. Kurtzmann, and J. Tugaut, "On the stability and the uniform propagation of chaos of a class of extended ensemble Kalman-Bucy filters," SIAM Journal on Control and Optimization, vol. 55, no. 1, pp. 119-155, 2017.
[38] A. N. Bishop and P. Del Moral, "On the stability of matrix-valued Riccati diffusions," arXiv preprint, 2018. [Online]. Available: https://arxiv.org/abs/1808.00235
[39] J. de Wiljes, S. Reich, and W. Stannat, "Long-time stability and accuracy of the ensemble Kalman-Bucy filter for fully observed processes and small measurement noise," SIAM Journal on Applied Dynamical Systems, vol. 17, no. 2, pp. 1152-1181, 2018.
[40] P. M. Stano, A. K. Tilton, and R. Babuska, "Estimation of the soil-dependent time-varying parameters of the hopper sedimentation model: The FPF versus the BPF," Control Engineering Practice, vol. 24, pp. 67-78, 2014.
[41] P. M. Stano, "Nonlinear state and parameter estimation for hopper dredgers," Ph.D. dissertation, Delft University of Technology, 2013.
[42] K. Berntorp, "Feedback particle filter: Application and evaluation," in 18th Int. Conf. Information Fusion, Washington, DC, 2015. [Online]. Available: https://ieeexplore.ieee.org/document/7266752
[43] S. C. Surace, A. Kutschireiter, and J.-P. Pfister, "How to avoid the curse of dimensionality: scalability of particle filters with and without importance weights," SIAM Review, vol. 61, no. 1, pp. 79-91, 2019.
[44] A. K. Tilton, S. Ghiotto, and P. G. Mehta, "A comparative study of nonlinear filtering techniques," in Proc. 16th Int. Conf. on Inf. Fusion, Istanbul, Turkey, July 2013, pp. 1827-1834.
[45] R. E. Kalman and R. S. Bucy, "New results in linear filtering and prediction theory," Journal of Basic Engineering, vol. 83, no. 1, pp. 95-108, 1961.
[46] J. W. Kim, A. Taghvaei, and P. G. Mehta, "Derivation and extensions of the linear feedback particle filter based on duality formalisms," in 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018, pp. 7188-7193.
[47] E. Abedi and S. C. Surace, "Gauge freedom within the class of linear feedback particle filters," arXiv preprint arXiv:1903.06689, 2019.
[48] C. R. Givens, R. M. Shortt, et al., "A class of Wasserstein metrics for probability distributions," Michigan Math. J., vol. 31, no. 2, 1984.
[49] A. Ben-Israel and T. N. Greville, Generalized Inverses: Theory and Applications. Springer Science & Business Media, 2003, vol. 15.
[50] P. Bickel, B. Li, T. Bengtsson, et al., "Sharp failure rates for the bootstrap particle filter in high dimensions," in Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Institute of Mathematical Statistics, 2008, pp. 318-329.
[51] T. Bengtsson, P. Bickel, and B. Li, "Curse of dimensionality revisited: Collapse of the particle filter in very large scale systems," in IMS Lecture Notes - Monograph Series in Probability and Statistics: Essays in Honor of David A. Freedman. Institute of Mathematical Statistics, 2008, vol. 2, pp. 316-334.
[52] C. Snyder, T. Bengtsson, P. Bickel, and J. Anderson, "Obstacles to high-dimensional particle filtering," Monthly Weather Review, vol. 136, no. 12, pp. 4629-4640, 2008.
[53] P. Rebeschini, R. van Handel, et al., "Can local particle filters beat the curse of dimensionality?" The Annals of Applied Probability, vol. 25, no. 5, pp. 2809-2866, 2015.
[54] Y. Chen, T. T. Georgiou, and M. Pavon, "On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint," Journal of Optimization Theory and Applications, vol. 169, no. 2, pp. 671-691, 2016.
[55] S. Reich, "Data assimilation: The Schrödinger perspective," Acta Numerica, vol. 28, pp. 635-711, 2019.
[56] J. Xiong, An Introduction to Stochastic Filtering Theory, ser. Oxford Graduate Texts in Mathematics. Oxford University Press, 2008, vol. 18.
[57] A. N. Bishop and P. Del Moral, "On the stability of Kalman-Bucy diffusion processes," SIAM Journal on Control and Optimization, vol. 55, no. 6, pp. 4015-4047, 2017.
[58] H. Kwakernaak and R. Sivan, Linear Optimal Control Systems. Wiley-Interscience, New York, 1972, vol. 1.
[59] D. Ocone and E. Pardoux, "Asymptotic stability of the optimal filter with respect to its initial condition," SIAM Journal on Control and Optimization, vol. 34, no. 1, pp. 226-243, 1996.

Amirhossein Taghvaei is a Postdoctoral Researcher in the Department of Mechanical and Aerospace Engineering at the University of California, Irvine. He obtained his Ph.D. in Mechanical Science and Engineering and M.S. in Mathematics from the University of Illinois at Urbana-Champaign. He is currently working in the areas of control theory and machine learning.

Prashant G. Mehta (M'06) received the Ph.D. degree in Applied Mathematics from Cornell University, Ithaca, NY, in 2004. He is an Associate Professor in the Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign. Prior to joining Illinois, he was a Research Engineer at the United Technologies Research Center (UTRC). His current research interests are in nonlinear filtering and mean-field games. He received the Outstanding Achievement Award at UTRC for his contributions to the modeling and control of combustion instabilities in jet engines. His students received the Best Student Paper Awards at the IEEE Conference on Decision and Control in 2007 and 2009 and were finalists for these awards in 2010 and 2012. He has served on the editorial boards of the ASME Journal of Dynamic Systems, Measurement, and Control and Systems and Control Letters.
