An optimization framework for resilient batch estimation in Cyber-Physical Systems


Authors: Alexandre Kircher, Laurent Bako, Eric Blanco, Mohamed Benallouch

An optimization framework for resilient batch estimation in Cyber-Physical Systems

Alexandre Kircher¹, Laurent Bako¹, Eric Blanco¹, Mohamed Benallouch²

Abstract—This paper proposes a class of resilient state estimators for LTV discrete-time systems. The dynamic equation of the system is assumed to be affected by a bounded process noise. As to the available measurements, they are potentially corrupted by a noise of both dense and impulsive natures. The latter, in addition to being arbitrary in its form, need not be strictly bounded. In this setting, we construct the estimator as the set-valued map which associates to the measurements the minimizing set of some appropriate performance functions. We consider a family of such performance functions, each of which yields a specific instance of the general estimator. It is then shown that the proposed class of estimators enjoys the property of resilience, that is, it induces an estimation error which, under certain conditions, is independent of the extreme values of the (impulsive) measurement noise. Hence, the estimation error may be bounded while the measurement noise is virtually unbounded. Moreover, we provide several error bounds (in different configurations) whose expressions depend explicitly on the degree of observability of the system being observed and on the considered performance function. Finally, a few simulation results are provided to illustrate the resilience property.

Index Terms—Secure state estimation, resilient estimators, optimal estimation, cyber-physical systems.

I. INTRODUCTION

Context. We consider in this work the problem of designing state estimators which would be resilient against an (unknown) sparse noise sequence affecting the measurements.
By sparse noise we refer here to a signal sequence which is of impulsive nature, that is, a sequence which is most of the time equal to zero, except at a few instants where it can take on arbitrarily large values. The problem is relevant, for example, in the supervision of Cyber-Physical Systems (CPS) [8]. In this application, the supervisory data may be collected by spatially distributed sensors and then sent to a distant processing unit through some communication network. During the transmission, the data may incur intermittent packet losses or adversarial attacks consisting in, e.g., the injection of arbitrary signals. Beyond CPS, there are many other applications where the sparse noise model of uncertainty is relevant: robust statistics [14], hybrid system identification [1], intermittent sensor fault detection, etc.

Related works. The problem of estimating the state of CPS under attacks has been investigated recently through many different approaches. Since the measurements are assumed to be affected by a sequence of outliers which is sparse in time, a natural scheme of solution to the state estimation problem may be to first process the data so as to detect the occurrences of the nonzero instances of that sparse noise, remove the corrupted data and then proceed with classical estimation methods such as the Kalman filter or the Luenberger type of observer [17], [20]. While this approach sounds a priori reasonable, the main challenge remains to achieve an efficient detection and isolation of the outliers.

¹ A. Kircher, L. Bako and E. Blanco are with Université de Lyon, Laboratoire Ampère (Ecole Centrale Lyon, CNRS UMR 5005), F-69134. E-mails: alexandre.kircher, laurent.bako, eric.blanco@ec-lyon.fr
² M. Benallouch is with Université de Lyon, ECAM Lyon, LabECAM, F-69321 Lyon, France. E-mail: mohamed.benallouch@ecam.fr
Regarding the scenarios where the sporadic noise is modeled in a probabilistic setting, there exists a body of interesting results providing performance limits of estimation schemes [25], [18], [21]. Another category of approaches, inspired by some recent results in compressive sampling [7], [11], relies on sparsity-inducing optimization techniques. A striking feature of these methods is that they do not treat separately the tasks of detection, data cleaning and estimation. Instead, an implicit discrimination of the wrong data is induced by some specific properties of the to-be-minimized cost function. One of the first works that puts forward this approach for the resilient state estimation problem is the one reported in [10]. There, it is assumed that only a fixed number of sensors are subject to attacks (sparse over time but otherwise arbitrary disturbances). The challenge then resides in the fact that at each time instant, one does not know which sensor is compromised. Note however that the assumptions in [10] were quite restrictive, as no dense process noise or measurement noise (other than the sparse attack signal) was considered. These limitations opened the way for later extensions in many directions. For example, [24] suggests a reformulation which is argued to reduce computational cost by using the concept of event-triggered update; [19] considers an observation model which includes dense noise along with the sparse attack signal. In [9], the assumption of a fixed number of attacked sensors is relaxed. Finally, the recent paper [13] proposes a unified framework for analyzing the resilience capabilities of most of these (convex) optimization-based estimators. Although a bound on the estimation error was derived in that paper, it is not quantitatively related to the properties (e.g., observability) of the dynamic system being observed.
The state estimation problem treated there is rather viewed as a linear regression problem, similarly to [2], [5].

Contributions. The contributions of the current paper consist in the design and the analysis of a class of optimization-based resilient estimators for Linear Time-Varying (LTV) discrete-time systems. The available model of the system assumes bounded noise in both the dynamics and the observation equation, with the latter being possibly affected, additionally, by an unknown but sparse attack signal. Contrary to the settings considered in some existing works, we do not impose here any restriction on the number of sensors which are subject to attacks, that is, any sensor can be compromised at any time. Note also that no statistical significance is attached to the uncertainties modeled by noise. In this setting, by generalizing our previous work reported in [16], the current paper proposes a general (robust) estimation framework for the state of LTV systems. We propose a class of state estimators constructed as the set-valued maps which associate to the output measurements the minimizing set of some appropriate performance functions. A variety of performance functions are considered for the design of the estimator and handled in a unified analysis framework: convex nonsmooth/smooth loss functions and nonconvex saturated ones. Our main theoretical results concern the resilience analysis of the proposed class of estimators. We show that the estimation error associated with the new class of estimators can be made, under certain conditions, insensitive to the (possibly very large) amplitude of the sparse attack signal. The proposed analysis relies on new quantitative characterizations of the observability property of the system whose state is being observed.
Although the derived error bounds may be conservative, they have the important advantage of being explicitly expressible as functions of the properties of the considered dynamic system and those of the optimized loss functions. This makes them valuable qualitative tools for assessing the impact of the estimator's design parameters and of the system matrices on the quality of the estimation. For example, the proposed error bounds reflect the intuition that the more observable the system is with respect to the new criteria, the larger the number of instances of gross values (of the output noise) it can handle and the smaller the values of the bounds. Finally, the paper shows that for some choices of the design (loss) functions, some instances of the proposed family of estimators enjoy the exact recoverability property in the particular situation where the measurements are corrupted only by sparse noise. We present a condition for this property that can be numerically verified by means of convex optimization.

Overall, in comparison with [13], which also considers resilient estimation though in a linear regression setting, we (i) introduce here an alternative definition of resilience, (ii) characterize quantitatively the impact of intrinsic properties (observability) of the system being observed on the quality of the estimation, and (iii) derive an explicit expression of a bound on the estimation error.

Outline. The rest of the paper is structured as follows. We start by introducing in Section II the settings for the resilient state estimation problem. We then define in Section III the new class of optimization-based estimators proposed here to address this problem. The analysis of this new framework is presented in Section IV. In Section V, we further discuss the properties of a special constrained version of the initial class of estimators. In Section VI, we comment on the numerical verification of the conditions derived in the analysis part.
Some numerical results are described in Section VII and, finally, concluding remarks are given in Section VIII.

Notation. R_{≥0} (respectively R_{>0}) is the set of nonnegative (respectively positive) reals. R* designates the set of real numbers excluding zero. We denote by R^a the set of (column) vectors with a real elements and by R^{a×b} the set of real matrices with a rows and b columns. If M ∈ R^{a×b}, then M^⊤ will designate the transpose of M. I will refer to the (square) identity matrix of appropriate dimension. The notation ‖·‖ will denote a norm over a given set (to be specified when necessary). ‖·‖_p denotes the ℓ_p norm (for p ≥ 1) or the ℓ_p quasi-norm (for 0 < p < 1) defined for z = [z_1 ··· z_a]^⊤ in R^a by ‖z‖_p = (|z_1|^p + ··· + |z_a|^p)^{1/p}. The limit of this as p ↓ 0 gives the so-called ℓ_0-norm ‖·‖_0 of z, i.e., the number of nonzero entries of z. Its limit as p ↑ +∞ gives the infinity norm, denoted ‖z‖_∞, which returns the maximum value of the |z_i|. For x ∈ R, e^x refers to the exponential function applied to x. If S is a set, then P(S) is the power set of S. If S is a finite set, the notation |S| refers to the cardinality of S.

K_∞ functions [15]. We denote by K_∞ the set of functions f : R_{≥0} → R_{≥0} which are continuous, zero at zero, strictly increasing and satisfy lim_{λ→+∞} f(λ) = +∞. If f ∈ K_∞, then it admits an inverse, denoted here f^{−1}, which also lies in K_∞. Similarly, we use the notation K_{sat,a} to denote the set of saturated functions f : R_{≥0} → [0, a] which are continuous, zero at zero, strictly increasing on [0, a] and such that f(λ) = f(a) for all λ ≥ a.

Supremum. Given a function f over R^a and a subset S of R^a, the notation sup_{z∈S} f(z) < b, with b ∈ R, will mean that for all z in S, f(z) < b. This notation includes the case where the supremum is b but is not attained by any element of S.

II.
THE RESILIENT ESTIMATION PROBLEM

Consider a discrete-time Linear Time-Varying (LTV) system described by

Σ:  x_{t+1} = A_t x_t + w_t,   y_t = C_t x_t + f_t    (1)

where x_t ∈ R^n is the state vector of the system at time t and y_t ∈ R^{n_y} is the output vector at time t; {A_t} and {C_t} are families of matrices with appropriate dimensions; {w_t} is an unobserved bounded noise sequence. As to {f_t}, it is regarded here as an (unobserved) arbitrary noise sequence affecting the measurements. For clarity of the exposition, it may be convenient to view f_t as a combination of two types of sequences: a bounded sequence {v_t} and a sparse sequence {s_t} (this decomposition is indeed always possible for an arbitrary noise signal). Hence, we may write

f_t = v_t + s_t,    (2)

where v_t is a sensor noise of moderate amplitude and s_t is a sparse noise sequence in the sense that its (entrywise and/or timewise) components are mostly equal to zero but its nonzero elements can take on (possibly) arbitrarily large values. Such a sparse sequence {s_t} may account for adversarial attacks in the same spirit as in [10], [13], intermittent sensor faults, or data losses, in particular when a communication network is involved in the data acquisition–transmission chain. In the sequel, we may also refer to {w_t} and {v_t} in (1) and (2) as dense noises and to the largest elements of {s_t} as outliers.

For the sake of simplicity, the sparse (and potentially arbitrarily large) noise is assumed here to affect only the measurement equation. Note however that the proposed analysis method can be extended to the more general scenario where the sparse noises may affect both the dynamics and the measurements.

Problem. The problem considered in this paper is that of estimating the states x_0, ..., x_{T−1} of the system (1) on a time period T = {0, ..., T−1}, given T measurements y_0, ..., y_{T−1} of the system output.
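As a toy illustration of the noise model (1)–(2), the following pure-Python sketch simulates an LTV system whose output is corrupted by a bounded dense noise and a timewise-sparse attack. All dimensions, matrices, noise levels and attack instants below are illustrative assumptions, not values taken from the paper.

```python
import random

def simulate_ltv(A_seq, C_seq, x0, T, w_bound=0.01, v_bound=0.05,
                 attack_times=(), attack_size=50.0, seed=0):
    """Simulate (1)-(2): x_{t+1} = A_t x_t + w_t, y_t = C_t x_t + v_t + s_t,
    where {w_t}, {v_t} are bounded dense noises and {s_t} is sparse in time.
    Vectors are plain lists; each A_t, C_t is a list of rows."""
    rng = random.Random(seed)
    x, X, Y = list(x0), [], []
    mat = lambda M, v: [sum(m * vi for m, vi in zip(row, v)) for row in M]
    for t in range(T):
        X.append(list(x))
        # measurement: dense noise v_t plus a sparse, possibly huge s_t
        s = attack_size if t in attack_times else 0.0
        Y.append([ci + rng.uniform(-v_bound, v_bound) + s
                  for ci in mat(C_seq[t], x)])
        # state update with bounded process noise w_t
        x = [ai + rng.uniform(-w_bound, w_bound) for ai in mat(A_seq[t], x)]
    return X, Y

# Example: a 2-state system observed by one sensor, attacked at t = 3 and t = 7.
T = 10
A = [[[0.9, 0.2], [-0.2, 0.9]]] * T
C = [[[1.0, 0.0]]] * T
X, Y = simulate_ltv(A, C, [1.0, 0.0], T, attack_times={3, 7})
```

At the attacked instants the measurement is dominated by the gross term s_t, while elsewhere y_t stays close to the noise-free output, which is exactly the regime the resilient estimator is meant to handle.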
We shall seek an accurate estimate of the state despite the uncertainties in the system equations (1) modeled by w_t and f_t, the characteristics of which are informally described above. In particular, we would like the to-be-designed estimator to produce an estimate such that the estimation error is, when possible, independent of the maximum amplitude of {f_t}. Such an estimator will be called resilient; see Definition 2 for a formal characterization of this property.

Denote with

X_{T−1} = [x_0  x_1  ···  x_{T−1}]    (3)

the actual state trajectory of the system Σ on a finite time horizon of length T. Similarly, we use the notation

Y_{T−1} = [y_0  y_1  ···  y_{T−1}]    (4)

to refer to the collection of measurements on the same time horizon. The state estimation problem is approached here from an offline perspective, therefore T is fixed. For the sake of simplicity, the T index will be dropped from the variable names and it will be assumed that signal matrices without an index represent values on the period T = {0, ..., T−1}. To simplify the formulas further, we also pose T_0 = {0, ..., T−2}, while S = {1, ..., n_y} will be a set indexing the sensors (or the rows of the matrices C_t in (1)).

III. OPTIMIZATION-BASED APPROACH TO RESILIENT STATE ESTIMATION

A. The state estimator

In this section we present an optimization-based framework for solving the state estimation problem defined above. To define formally the proposed state estimator, let us first introduce the to-be-minimized objective function.
Given the matrices {(A_t, C_t)} of the system (1) and T output measurements Y = [y_0 ··· y_{T−1}], we consider a performance function V_Σ : R^{n_y×T} × R^{n×T} → R_{≥0} defined by

V_Σ(Y, Z) = λ Σ_{t∈T_0} φ_t(z_{t+1} − A_t z_t) + Σ_{t∈T} ψ_t(y_t − C_t z_t)    (5)

where Z = [z_0 ··· z_{T−1}] ∈ R^{n×T} is a hypothetical trajectory matrix with z_i denoting the i-th column of Z; λ > 0 is a user-defined parameter which aims at balancing the contributions of the two terms involved in the expression of the performance index V_Σ(Y, Z); {φ_t} and {ψ_t} are two families of positive functions (called here loss functions) defined on R^n and R^{n_y} respectively. For the sake of simplicity, we will assume throughout the paper that for all t in T, φ_t and ψ_t can be expressed as

φ_t(z) = φ(W_t z)  ∀ z ∈ R^n,    (6)
ψ_t(e) = ψ(V_t e)  ∀ e ∈ R^{n_y},    (7)

where φ : R^n → R_{≥0} and ψ : R^{n_y} → R_{≥0} are two fixed loss functions and {W_t} and {V_t} are two families of nonsingular weighting matrices with appropriate dimensions.

Definition 1. Given a system Σ such as the one in (1) and given an output measurement matrix Y ∈ R^{n_y×T}, we define a state estimator to be a set-valued map E : R^{n_y×T} → P(R^{n×T}) which maps Y to a subset of the space of possible trajectories of the system.

We consider a class of state estimators defined by

E(Y) = arg min_{Z∈R^{n×T}} V_Σ(Y, Z).    (8)

As such, the estimator E is well-defined if for any fixed Y, V_Σ(Y, ·) admits a nonempty minimizing set, that is, if there exists at least one Z* such that V_Σ(Y, Z) ≥ V_Σ(Y, Z*) for all Z ∈ R^{n×T}. To ensure this property we will need to put an observability assumption on the system whose state is being estimated and require some further properties of the loss functions φ and ψ entering in the definition of the objective function V_Σ.

B.
Well-definedness of the estimator

Let us start by stating the properties required of the loss functions involved in the definition of V_Σ. Due to the multiple usages that will be made of these properties, it is convenient to state them for a generic loss function defined on a set of matrices (of which vectors constitute a special case). Throughout this paper, a loss function is a positive function ξ : R^{a×b} → R_{≥0} which will be required to satisfy a subset (depending on the specific usage) of the following properties:

(P1) Positive definiteness: ξ(0) = 0 and ξ(Z) > 0 for all Z ≠ 0.
(P2) Continuity: ξ is continuous.
(P3) Symmetry: ξ(−Z) = ξ(Z) for all Z ∈ R^{a×b}.
(P4) Generalized Homogeneity (GH): There exists a K_∞ function q : R_{≥0} → R_{≥0} such that for all λ ∈ R* and for all Z ∈ R^{a×b},

ξ(Z) ≥ q(1/|λ|) ξ(λZ).    (9)

(P5) Generalized Triangle Inequality (GTI): There exists a positive real number γ_ξ such that for all Z_1, Z_2 in R^{a×b},

ξ(Z_1 − Z_2) ≥ γ_ξ ξ(Z_1) − ξ(Z_2).    (10)

It can be usefully observed, for the future developments, that (10) can be equivalently written as ξ(Z_1 + Z_2) ≤ γ_ξ^{−1} ξ(Z_1) + γ_ξ^{−1} ξ(Z_2).

Examples of loss functions. Note that norms on R^{a×b} naturally satisfy the properties (P1)–(P5) with q : λ ↦ λ and γ_ξ = 1, hence yielding the classic homogeneity property and triangle inequality. It can also be checked that functions ξ of the form ξ(Z) = ‖Z‖_p with p > 0 fully qualify as loss functions in the sense that they fulfill all the properties (P1)–(P5). In this case, γ_ξ in (10) can be taken equal to 2^{1−1/p} if 0 < p ≤ 1 and 2^{1−p} otherwise. Lastly, we note that if ℓ : R^{a×b} → R_{≥0} satisfies (P1)–(P3) and (P5), then so does the function ξ defined by ξ(Z) = 1 − e^{−ℓ(Z)} (see Lemma 9 in the appendix). Similarly, saturated functions of the form ξ(Z) = min(ℓ(Z), R_0) for some R_0 > 0 satisfy (P1)–(P3) and (P5).
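The GTI (10) with the constants given above can be sanity-checked numerically. The following sketch (an illustration, not part of the paper) tests ξ = ‖·‖_p on random vector pairs: γ_ξ = 1 for a true norm (p ≥ 1, the classical reverse triangle inequality) and γ_ξ = 2^{1−1/p} for the quasi-norm case 0 < p < 1.

```python
import random

def lp(z, p):
    """l_p norm (p >= 1) or quasi-norm (0 < p < 1) of a vector z."""
    return sum(abs(zi) ** p for zi in z) ** (1.0 / p)

def gti_holds(p, gamma, trials=1000, dim=4, seed=1):
    """Check xi(Z1 - Z2) >= gamma*xi(Z1) - xi(Z2) for xi = lp on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        z1 = [rng.uniform(-5, 5) for _ in range(dim)]
        z2 = [rng.uniform(-5, 5) for _ in range(dim)]
        lhs = lp([a - b for a, b in zip(z1, z2)], p)
        if lhs < gamma * lp(z1, p) - lp(z2, p) - 1e-9:
            return False
    return True

print(gti_holds(2.0, 1.0))               # True: triangle inequality for l_2
print(gti_holds(0.5, 2 ** (1 - 1/0.5)))  # True: p = 0.5 with gamma = 1/2
```

The p = 0.5 case follows from the quasi-norm inequality ‖a + b‖_p ≤ 2^{1/p−1}(‖a‖_p + ‖b‖_p), i.e., the equivalent form of (10) noted above.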
In the case of convex functions, a link can be established between (P4) and (P5).

Lemma 1 ([16]). If ξ : R^{a×b} → R_{≥0} is convex and satisfies property (P4) with a K_∞ function q, then it also satisfies (P5) with γ_ξ = 2q(1/2).

Observe that quadratic functions ξ : R^{a×b} → R_{≥0} of the form ξ(Z) = Tr(Z^⊤ Q Z), with Q ∈ R^{a×a} a positive definite matrix and Tr referring to the trace of a matrix, satisfy properties (P1)–(P4) with the K_∞ function q : λ ↦ λ². Since such functions are convex, it follows from Lemma 1 above that they also verify (P5) with γ_ξ = 2q(1/2) = 1/2.

Remark 1. In virtue of (6)–(7), the families of functions {φ_t} and {ψ_t} satisfy (P1)–(P5) whenever φ and ψ satisfy (P1)–(P5).

We now recall from [16] a technical lemma which will play a fundamental role in analyzing the properties of the estimator (8). In particular, our proof of well-definedness relies on this lemma.

Lemma 2 (Lower bound on a loss function). Let ξ : R^{a×b} → R_{≥0} be a function which has properties (P1)–(P2) and (P4) with a K_∞ function q. Then, for any norm ‖·‖ on R^{a×b},

ξ(Z) ≥ D q(‖Z‖)  ∀ Z ∈ R^{a×b}    (11)

where

D = min_{‖Z‖=1} ξ(Z) > 0.    (12)

Proof: We start by observing that the unit hypersphere S = {Z ∈ R^{a×b} : ‖Z‖ = 1} is a compact set in the topology induced by the norm ‖·‖. By the extreme value theorem, ξ, being continuous, necessarily admits a minimum value on S, i.e., there is Z* ∈ S such that ξ(Z) ≥ D ≜ ξ(Z*) > 0 for all Z ∈ S. For any nonzero Z ∈ R^{a×b}, Z/‖Z‖ ∈ S, so that ξ(Z/‖Z‖) ≥ D. On the other hand, by the relaxed homogeneity of ξ, ξ(Z) ≥ q(‖Z‖) ξ(Z/‖Z‖) ≥ D q(‖Z‖). Moreover, this inequality holds for Z = 0. It therefore holds true for any Z ∈ R^{a×b}.

Proposition 1 (Well-definedness of the estimator).
Let the loss functions φ and ψ in (6)–(7) satisfy properties (P1)–(P5) and assume that the LTV system (1) is observable on [0, T−1] in the sense that the observability matrix

O_{0,T−1} ≜ [(C_0)^⊤  (C_1 A_0)^⊤  ···  (C_{T−1} A_{T−2} ··· A_1 A_0)^⊤]^⊤    (13)

has full column rank. Then the estimator (8) is well-defined, i.e., the objective function V_Σ(Y, ·) attains its minimum for any fixed Y.

Hence, the condition of the proposition guarantees that E(Y) is nonempty for all Y ∈ R^{n_y×T}. Before proving this result, we first make the following observation.

Lemma 3 (Equivalent condition of observability). Consider the objective function V_Σ defined in (5), where {(φ_t, ψ_t)} are defined as in (6)–(7) with φ and ψ satisfying (P1)–(P4). Then the following two statements are equivalent:
(i) The system is observable on the time interval [0, T−1].
(ii) There exists a K_∞ function q such that for all Z = [z_0  z_1  ···  z_{T−1}] in R^{n×T},

V_Σ(0, Z) ≥ q(‖z_0‖).    (14)

A proof of this lemma is reported in Appendix B. The function q can be interpreted here as a gain function which measures how much the system is observable with regard to the two families {φ_t} and {ψ_t}: the more observable the system, the more q amplifies the magnitude of its argument, making different trajectories more discernible.

Proof of Proposition 1: The idea of the proof is to show that V_Σ(Y, ·) is coercive (i.e., continuous and radially unbounded) for any given Y and then apply a result¹ in [22, Thm 1.9] to conclude on the attainability of the infimum (which certainly exists since V_Σ(Y, ·) is a positive function). Clearly, V_Σ(Y, ·) is continuous as a consequence of φ and ψ being continuous by assumption (see property (P2)). We then just need to prove the radial unboundedness of V_Σ(Y, ·), i.e., lim_{‖Z‖→+∞} V_Σ(Y, Z) = +∞ for an arbitrary norm ‖·‖ on the Z-space and for all fixed Y.
Since ψ satisfies property (P5), there exists a constant γ_ψ > 0 such that ψ_t(y_t − C_t z_t) ≥ γ_ψ ψ_t(C_t z_t) − ψ_t(y_t). Applying this property leads naturally to

V_Σ(Y, Z) ≥ F(Z) − Σ_{t∈T} ψ_t(y_t),

where

F(Z) = λ Σ_{t∈T_0} φ_t(z_{t+1} − A_t z_t) + γ_ψ Σ_{t∈T} ψ_t(C_t z_t).    (15)

It can then be shown (following a similar reasoning as in Appendix B), under the observability assumption, that F satisfies the conditions of Lemma 2. It follows that for any norm ‖·‖ on R^{n×T}, there exists a K_∞ function q such that F(Z) ≥ q(‖Z‖). Combining this with the inequality above, we obtain that

V_Σ(Y, Z) ≥ q(‖Z‖) − Σ_{t∈T} ψ_t(y_t),

which implies the radial unboundedness of V_Σ(Y, ·) for any fixed Y. Hence the estimator (8) is well-defined as stated.

¹ Note that radial unboundedness is equivalent to level-boundedness in the terminology of [22].

As it turns out from Proposition 1, observability of system (1) and properties (P1)–(P4) imposed on φ and ψ ensure that E(Y) is a nonempty set for any given Y. We then call any member X̂ = [x̂_0  x̂_1  ···  x̂_{T−1}] of E(Y) an estimate of the state trajectory X of system (1) on the time interval T. In particular, x̂_t is called an estimate of the state x_t at time t ∈ T.

To conclude this section, note that the definition of the estimator in (8) does not require any convexity assumption on the objective function V_Σ. Hence the theoretical analysis to be presented in the next sections does not make use of convexity either. However, we may prefer in practice to select convex loss functions φ and ψ. In effect, the elements of E(Y) are not necessarily expressible through an explicit formula, so in practice one would instead resort to numerical solvers to approach the solution of the underlying optimization problem. And the numerical search process is known to be more efficient when V_Σ(Y, Z) is a convex function of Z [4], [12].
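To make the objective (5) concrete, the sketch below evaluates V_Σ for the particular (convex) choice φ = ψ = ‖·‖_1 with identity weighting matrices, an assumption made here for illustration only. On noise-free data the true trajectory attains the minimum value V_Σ = 0, illustrating the nominal-correctness property discussed later.

```python
def V_sigma(Y, Z, A_seq, C_seq, lam=1.0):
    """Performance index (5) with phi = psi = l1 losses (a valid choice
    satisfying (P1)-(P5)) and identity weights W_t = V_t = I."""
    l1 = lambda v: sum(abs(vi) for vi in v)
    mat = lambda M, v: [sum(m * vi for m, vi in zip(row, v)) for row in M]
    T = len(Y)
    cost = 0.0
    for t in range(T - 1):  # dynamics residuals over T_0 = {0,...,T-2}
        Az = mat(A_seq[t], Z[t])
        cost += lam * l1([zi - ai for zi, ai in zip(Z[t + 1], Az)])
    for t in range(T):      # measurement residuals over T
        Cz = mat(C_seq[t], Z[t])
        cost += l1([yi - ci for yi, ci in zip(Y[t], Cz)])
    return cost

# Noise-free scalar example: x_{t+1} = 0.5 x_t, y_t = x_t, x_0 = 2.
T = 5
A = [[[0.5]]] * T
C = [[[1.0]]] * T
X_true = [[2.0 * 0.5 ** t] for t in range(T)]
Y_star = [[x[0]] for x in X_true]
print(V_sigma(Y_star, X_true, A, C))  # 0.0
```

In practice one would hand this objective to a convex solver over the trajectory variable Z rather than evaluate it pointwise; the ℓ_1 measurement loss is precisely the kind of sparsity-inducing choice discussed above.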
Nevertheless, it is fair to recognize that nonconvex optimization methods can be successfully implemented as well, though with fewer theoretical guarantees of reaching global optimality with general-purpose solvers.

IV. THE RESILIENCE PROPERTY OF THE PROPOSED CLASS OF ESTIMATORS

In this section, we prove that the state estimator proposed in (8) possesses the resilience property under some conditions. More specifically, our main result states that the estimation error X̂ − X, i.e., the difference between the real state trajectory and the estimated one, admits an upper bound which does not depend on the amplitude of the outliers contained in {f_t}, provided that the number of such outliers is below some threshold.

A. Definition of the resilience of an estimator

Let us start with a formal definition of the resilience property for a state estimator of the form (8). For this purpose, let Y* = [y*_0 ··· y*_{T−1}] denote the noise-free output matrix of (1), i.e., the output defined by y*_t = C_t x*_t, t = 0, ..., T−1, with x*_{t+1} = A_t x*_t and x*_0 = x_0 (x_0 being the true initial state of (1)). Let F_r, a subset of R^{n_y×T} containing 0, denote a set of measurement noise matrices.

Definition 2 (Resilience of an estimator). The set-valued estimator E defined in (8) is called resilient against the set F_r of measurement noise if there exists a K_∞ function g such that, when the process noise {w_t} is zero, it holds for any measurement noise matrix F ∈ R^{n_y×T} that

‖X̂ − X‖ ≤ g( inf_{Ω∈F_r} d(F − Ω) )  ∀ X̂ ∈ E(Y* + F)    (16)

with ‖·‖ denoting some norm and d : R^{n_y×T} → R_{≥0} being a function subject to (P1)–(P5). Hence inf_{Ω∈F_r} d(F − Ω) denotes some pseudo-distance from F to the set F_r.

Since 0 ∈ F_r, a consequence of property (16) is that E(Y*) = {X}, which follows from (16) with F = 0.
This fact expresses correctness of the estimator in a nominal situation, i.e., its ability to recover the true state matrix X in the absence of any uncertainty in the a priori known model. Indeed, this condition is guaranteed to hold if the system Σ is observable over the considered observation time horizon T. Another key implication of condition (16) is that the estimation error associated with a resilient estimator is totally insensitive to any measurement noise matrix F which lies in F_r, that is, E(Y* + F) = {X} for all measurement noise F ∈ F_r.

Throughout this paper, we consider a set F_r defined as follows. For F = [f_0 ··· f_{T−1}] ∈ R^{n_y×T}, let T_0^c(F) = {t ∈ T : ψ_t(f_t) > 0} and T_0(F) = {t ∈ T : ψ_t(f_t) = 0}. For r a positive integer, define F_r to be the set of matrices in R^{n_y×T} having at most r nonzero columns, i.e.,

F_r = {F : |T_0^c(F)| ≤ r}.    (17)

In order to make explicit the resilience property (in the results to be presented) with respect to the set F_r, we will need the following lemma.

Lemma 4. Consider the set F_r of measurement noise matrices defined in (17) and select a (pseudo-distance) function d : R^{n_y×T} → R_{≥0} defined by d(F) = Σ_{t∈T} ψ_t(f_t), with ψ_t a function defined as in (7) and having the properties (P1)–(P5). Then inf_{Ω∈F_r} d(F − Ω) is equal to the sum of the T − r smallest terms in {ψ_t(f_t) : t ∈ T}.

Proof: Let I_r^c(F) denote the index set of the r largest entries of the vector [ψ_0(f_0) ··· ψ_{T−1}(f_{T−1})] and I_r(F) denote the index set of its T − r smallest entries.
Then, with the notation Ω = [ω_0 ··· ω_{T−1}],

inf_{Ω∈F_r} d(F − Ω)
= inf_{Ω∈F_r} Σ_{t∈T} ψ_t(ω_t − f_t)
= inf_{Ω∈F_r} [ Σ_{t∈I_r(F)} ψ_t(ω_t − f_t) + Σ_{t∈I_r^c(F)} ψ_t(ω_t − f_t) ]
= inf_{Ω∈F_r, T_0(Ω)=I_r(F)} [ Σ_{t∈I_r(F)} ψ_t(f_t) + Σ_{t∈I_r^c(F)} ψ_t(ω_t − f_t) ]
= Σ_{t∈I_r(F)} ψ_t(f_t),

where the notation T_0(Ω) is defined in the lines preceding Eq. (17). The infimum is reached here for Ω ∈ F_r such that ω_t = 0 for all t ∈ I_r(F) and ω_t = f_t for all t ∈ I_r^c(F). Hence inf_{Ω∈F_r} d(F − Ω) is, as claimed, the sum of the T − r smallest values among {ψ_t(f_t) : t ∈ T}.

We will also introduce in the sequel a notion of approximate resilience of E. This terminology refers to Definition 2 when the right-hand side of (16) is modified to g( inf_{Ω∈F_r} d(F − Ω) + δ ), with δ some nonnegative real number.

B. Some notational conventions for the analysis

For convenience, let us introduce a few more notations. Let Φ : R^{n×T} → R_{≥0} and Ψ_T : R^{n×T} → R_{≥0} be defined by

Φ(Z) = Σ_{t∈T_0} φ_t(z_{t+1} − A_t z_t),    (18)
Ψ_T(Z) = Σ_{t∈T} ψ_t(C_t z_t).    (19)

We also introduce the partial cost function Ψ_I defined for any I ⊂ T by Ψ_I(Z) = Σ_{t∈I} ψ_t(C_t z_t). We will assume throughout the paper that the loss functions φ and ψ satisfy a subset of the properties (P1)–(P5); in particular, when they are required to satisfy the GTI (P5), we will denote the associated positive constants by γ_φ and γ_ψ respectively. Finally, let us pose

H_Σ(Z) = λ γ_φ Φ(Z) + γ_ψ Ψ_T(Z).    (20)

We will organize the resilience analysis for the estimator (8) along two cases: first, the scenario where the gross error vector sequence {s_t} in (2) is block-sparse in time, and then the situation where it is both componentwise and temporally sparse.
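Lemma 4 reduces the pseudo-distance to F_r to a simple order statistic, which the following sketch computes directly (the numeric residual values are made-up for illustration).

```python
def resid_distance(psi_values, r):
    """Lemma 4: with d(F) = sum_t psi_t(f_t), the pseudo-distance from F to
    F_r (matrices with at most r nonzero columns) equals the sum of the
    T - r smallest values among {psi_t(f_t)}: the r largest residuals can
    be cancelled by an admissible Omega, the remaining ones cannot."""
    return sum(sorted(psi_values)[:len(psi_values) - r])

# psi_t(f_t) over T = 6 instants; two gross outliers (40.0 and 55.0).
psi = [0.5, 40.0, 0.0, 0.25, 55.0, 0.75]
print(resid_distance(psi, 2))  # 1.5: outliers cancelled, dense part remains
print(resid_distance(psi, 0))  # 96.5: F_0 = {0}, so this is just d(F)
```

This is exactly the quantity that appears inside g(·) in (16): with r large enough to cover all outliers, the bound depends only on the moderate residuals.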
To be more precise, if we denote with S ∈ R^{n_y×T} the matrix formed from the sequence {s_t : t ∈ T}, then the first case refers to columnwise block-sparsity of S while the second is related to entrywise sparsity. Note that the two cases coincide when the system of interest is single-input single-output (SISO).

C. Resilience to intermittent timewise block-sparse errors

We start by introducing the concept of r-resilience index of an estimator such as the one in (8), a measure which depends on the system matrices, the structure of the performance function V_Σ and the loss functions φ and ψ.

Definition 3. Let r be a nonnegative integer. Assume that the system Σ in (1) is observable on [0, T−1]. We then define the r-resilience index of the estimator E in (8) (when applied to Σ) to be the real number p_r given by

p_r = sup_{Z∈R^{n×T}, Z≠0}  sup_{I⊂T, |I|=r}  Ψ_I(Z) / H_Σ(Z)    (21)

where H_Σ is as defined in (20). The supremum is taken here over all nonzero Z in R^{n×T} and over all subsets I of T with cardinality equal to r.

The index p_r can be interpreted as a quantitative measure of the observability of the system Σ. The observability is needed here to ensure that the denominator H_Σ(Z) of (21) is different from zero whenever Z ≠ 0. Furthermore, it should be remarked that Ψ_I(Z) ≤ H_Σ(Z) for any I ⊂ T, which implies that the defining suprema of p_r are well-defined. Note that p_r is an increasing function of r and satisfies p_0 = 0 and p_T = 1. More discussion on the numerical evaluation of p_r is deferred to Section VI.

In order to state the resilience result for the estimator (8) when applied to system Σ, let us introduce a last notation to be used in the analysis. Let ε ≥ 0 be a given number.
For any admissible sequence $\{f_t\}_{t\in\mathbb{T}}$ in (1), we can split the time index set $\mathbb{T}$ into two disjoint label sets,
$$\mathbb{T}_\varepsilon = \{t\in\mathbb{T} : \psi_t(f_t) \le \varepsilon\}, \qquad (22)$$
indexing those $f_t$ which are upper bounded by $\varepsilon$, and $\mathbb{T}_\varepsilon^c = \{t\in\mathbb{T} : \psi_t(f_t) > \varepsilon\}$ indexing those $f_t$ which are possibly unbounded. It is important to keep in mind that $\varepsilon$ is just a parameter for decomposing the noise sequence into two parts in view of the analysis (and not necessarily a bound on elements of the sequence $\{\psi_t(f_t)\}$). For example, taking $\varepsilon = 0$ would be appropriate for analyzing the properties of the estimator when $f_t$ is strictly sparse and each of its nonzero elements is treated as an outlier.

Theorem 1 (Upper bound on the estimation error). Consider the system $\Sigma$ defined by (1) with output $Y$ together with the state estimator (8) in which the loss functions $\phi$ and $\psi$ are assumed to obey (P1)–(P5). Denote by $\gamma_\phi$ and $\gamma_\psi$ the constants associated with the GTI (P5) and by $q_\phi$ and $q_\psi$ the $\mathcal{K}_\infty$ functions associated with the GH (P4) for $\phi$ and $\psi$ respectively. Let $\varepsilon \ge 0$ and set $r = |\mathbb{T}_\varepsilon^c|$. If the system is observable on $[0, T-1]$ and $p_r < 1/(1+\gamma_\psi)$, then for any norm $\|\cdot\|$ on $\mathbb{R}^{n\times T}$,
$$\|\hat{X} - X\| \le h\Big( \frac{2\beta_\Sigma(\varepsilon)}{D\,[1-(1+\gamma_\psi)p_r]} + \delta(\varepsilon) \Big) \quad \forall \hat{X}\in E(Y) \qquad (23)$$
with $X$ denoting the true state matrix from (1), $D = \min_{\|Z\|=1} H_\Sigma(Z) > 0$, and $\beta_\Sigma(\varepsilon)$, $\delta(\varepsilon)$ and $h$ defined by
$$\beta_\Sigma(\varepsilon) = \lambda \sum_{t\in\mathbb{T}_0} \phi_t(w_t) + \sum_{t\in\mathbb{T}_\varepsilon} \psi_t(f_t), \qquad (24)$$
$$\delta(\varepsilon) = \frac{1-\gamma_\psi}{D\,[1-(1+\gamma_\psi)p_r]} \sum_{t\in\mathbb{T}_\varepsilon^c} \psi_t(f_t), \qquad (25)$$
$$h(\alpha) = \max\big\{ q_\phi^{-1}(\alpha),\ q_\psi^{-1}(\alpha) \big\}, \quad \alpha\in\mathbb{R}_{\ge 0}. \qquad (26)$$

Proof: Let $\hat{X}\in E(Y)$.
By definition of $E$ in (8), we have $V_\Sigma(Y, \hat{X}) \le V_\Sigma(Y, X)$, which gives explicitly
$$\lambda \sum_{t\in\mathbb{T}_0} \phi_t(\hat{x}_{t+1} - A_t \hat{x}_t) + \sum_{t\in\mathbb{T}} \psi_t(y_t - C_t \hat{x}_t) \le \lambda \sum_{t\in\mathbb{T}_0} \phi_t(w_t) + \sum_{t\in\mathbb{T}} \psi_t(f_t). \qquad (27)$$
Using the fact that $x_{t+1} = A_t x_t + w_t$ from (1) and applying the GTI and the symmetry properties of $\phi_t$, we can write
$$\phi_t(\hat{x}_{t+1} - A_t \hat{x}_t) = \phi_t\big(\hat{x}_{t+1} - x_{t+1} - A_t(\hat{x}_t - x_t) + w_t\big) \ge \gamma_\phi\, \phi_t(e_{t+1} - A_t e_t) - \phi_t(w_t)$$
with $e_t = \hat{x}_t - x_t$. It follows that the first term on the left-hand side of (27) is lower bounded as follows:
$$\lambda \sum_{t\in\mathbb{T}_0} \big[\gamma_\phi\, \phi_t(e_{t+1} - A_t e_t) - \phi_t(w_t)\big] \le \lambda \sum_{t\in\mathbb{T}_0} \phi_t(\hat{x}_{t+1} - A_t \hat{x}_t). \qquad (28)$$
Similarly, by making use of (1), observe that $\psi_t(y_t - C_t \hat{x}_t) = \psi_t(f_t - C_t e_t)$. We now apply the GTI and symmetry of $\psi_t$ in two different ways depending on whether $t$ belongs to $\mathbb{T}_\varepsilon$ or $\mathbb{T}_\varepsilon^c$:
$$\forall t\in\mathbb{T}_\varepsilon,\quad \psi_t(y_t - C_t \hat{x}_t) \ge \gamma_\psi\, \psi_t(C_t e_t) - \psi_t(f_t)$$
$$\forall t\in\mathbb{T}_\varepsilon^c,\quad \psi_t(y_t - C_t \hat{x}_t) \ge \gamma_\psi\, \psi_t(f_t) - \psi_t(C_t e_t)$$
These inequalities imply that the second term on the left-hand side of (27) is lower bounded as follows:
$$\sum_{t\in\mathbb{T}_\varepsilon} \big[\gamma_\psi\, \psi_t(C_t e_t) - \psi_t(f_t)\big] + \sum_{t\in\mathbb{T}_\varepsilon^c} \big[\gamma_\psi\, \psi_t(f_t) - \psi_t(C_t e_t)\big] \le \sum_{t\in\mathbb{T}} \psi_t(y_t - C_t \hat{x}_t). \qquad (29)$$
Combining (27), (28) and (29) gives
$$\lambda\gamma_\phi \sum_{t\in\mathbb{T}_0} \phi_t(e_{t+1} - A_t e_t) + \gamma_\psi \sum_{t\in\mathbb{T}} \psi_t(C_t e_t) - (1+\gamma_\psi) \sum_{t\in\mathbb{T}_\varepsilon^c} \psi_t(C_t e_t) \le 2\Big[\lambda \sum_{t\in\mathbb{T}_0} \phi_t(w_t) + \sum_{t\in\mathbb{T}_\varepsilon} \psi_t(f_t)\Big] + (1-\gamma_\psi)\sum_{t\in\mathbb{T}_\varepsilon^c} \psi_t(f_t)$$
which, by using (19), (20), (24), can be written as
$$H_\Sigma(E) - (1+\gamma_\psi)\,\Psi_{\mathbb{T}_\varepsilon^c}(E) \le 2\beta_\Sigma(\varepsilon) + (1-\gamma_\psi)\sum_{t\in\mathbb{T}_\varepsilon^c} \psi_t(f_t)$$
with $E = [\,e_0\ e_1\ \cdots\ e_{T-1}\,]$.
As $\mathbb{T}_\varepsilon^c$ has $r$ elements, applying the definition of $p_r$ gives
$$\Psi_{\mathbb{T}_\varepsilon^c}(E) \le p_r\, H_\Sigma(E). \qquad (30)$$
By the assumption that $p_r < 1/(1+\gamma_\psi)$ we have $1-(1+\gamma_\psi)p_r > 0$ and, consequently,
$$H_\Sigma(E) \le \frac{1}{1-(1+\gamma_\psi)p_r} \Big[ 2\beta_\Sigma(\varepsilon) + (1-\gamma_\psi)\sum_{t\in\mathbb{T}_\varepsilon^c} \psi_t(f_t) \Big]. \qquad (31)$$
Given that the system is observable on $[0, T-1]$, it can be shown, thanks to Lemma 7 in the Appendix, that $H_\Sigma$ satisfies properties (P1)–(P4) (the proof of this is quite similar to that of Lemma 3 in Appendix B). We can therefore apply Lemma 2 to conclude that for any norm $\|\cdot\|$,
$$H_\Sigma(E) \ge D\, q_0(\|E\|) \qquad (32)$$
with $D = \min_{\|Z\|=1} H_\Sigma(Z)$ and $q_0(\alpha) = \min\{q_\phi(\alpha), q_\psi(\alpha)\}$. Finally, the result follows by selecting $h = q_0^{-1}$, with $q_0^{-1}$ denoting the inverse of $q_0$, which can be simplified to match its definition in (26).

Strict resilience. Now we state our (strict) resilience result as a consequence of Theorem 1 when the output error-measuring loss function $\psi$ satisfies the triangle inequality.

Corollary 1 (Resilience property). Let the conditions of Theorem 1 hold with the additional requirement that $\gamma_\psi = 1$. Then
$$\|\hat{X} - X\| \le h\Big( \frac{2\beta_\Sigma(\varepsilon)}{D\,(1 - 2p_r)} \Big) \quad \forall \hat{X}\in E(Y). \qquad (33)$$

Proof: The proof is immediate by considering the bound in (23) and observing that $\delta(\varepsilon)$ expressed in (25) vanishes when $\gamma_\psi = 1$, hence eliminating completely the contribution of the extreme values of $\{f_t\}$ to the error bound. This gives immediately (33). It remains now to make clear that (33) is consistent with the requirement (16) of Definition 2. For this purpose, note that the bound in (33) can be written as $g(\beta_\Sigma(\varepsilon))$ with $g\in\mathcal{K}_\infty$ defined by $g(\alpha) = h\big(2\alpha/(D(1-2p_r))\big)$. Moreover, $\beta_\Sigma(\varepsilon)$ reduces to $\sum_{t\in\mathbb{T}_\varepsilon} \psi_t(f_t) = \inf_{\Omega\in\mathcal{F}_r} d(F-\Omega)$ when the process noise is zero (see (24) and Lemma 4).
Hence $E$ qualifies, in the sense of Definition 2, as an estimator which is resilient to the set $\mathcal{F}_r$ of measurement noise defined in (17).

The resilience property of the estimator (8) lies here in the fact that, under the conditions of Theorem 1 and Corollary 1, the bound (33) on the estimation error does not depend on the magnitudes of the extreme values of the noise sequence $\{f_t\}_{t\in\mathbb{T}}$. Considering in particular the function $\beta_\Sigma(\varepsilon)$, we remark that it can be overestimated as follows:
$$\beta_\Sigma(\varepsilon) \le \lambda \sum_{t\in\mathbb{T}_0} \phi_t(w_t) + |\mathbb{T}_\varepsilon|\,\varepsilon. \qquad (34)$$
The first term on the right-hand side of (34) represents the uncertainty brought by the dense noise $\{w_t\}$ over the whole state trajectory. It is bounded since $\{w_t\}$ is bounded by assumption (see the description of the system in Section II). The second term is a bound on the sum of those instances of $f_t$ whose magnitude is smaller than $\varepsilon$.

Because $\beta_\Sigma$ is a function of $\varepsilon$, the bound (33) in fact represents a family of bounds parameterized by $\varepsilon$. Since $\varepsilon$ is a mere analysis device, a natural question is how to select it so that the analysis achieves the smallest bound. Such favorable values, say $\varepsilon^\star$, satisfy
$$\varepsilon^\star \in \arg\min_{\varepsilon\ge 0} \Big\{ h\Big( \frac{2\beta_\Sigma(\varepsilon)}{D(1-2p_r)} \Big) : r = |\mathbb{T}_\varepsilon^c|,\ p_r < 1/2 \Big\}.$$
Another interesting point is that the inequality stated by Theorem 1 holds for any norm $\|\cdot\|$ on $\mathbb{R}^{n\times T}$. Note though that the value of the bound depends (through the parameter $D$) on the specific norm used to measure the estimation error. Moreover, different choices of the performance-measuring norm will result in different geometric forms for the uncertain set, that is, the ball (in the chosen norm) centered at the true state with radius equal to the upper bound displayed in (33).

We also observe that the smaller the parameter $p_r$ is, the tighter the error bound will be, which suggests that the estimator is more resilient when $p_r$ is lower. A similar reasoning applies to the number $D$, which is desired to be large here.
These two parameters (i.e., $p_r$ and $D$) reflect properties of the system whose state is being estimated. They can be interpreted, to some extent, as measures of the degree of observability of the system. In conclusion, the estimator partially inherits its resilience property from characteristics of the system being observed. This is consistent with the well-known fact that the more observable a system is, the more robustly its state can be estimated from output measurements.

Approximate resilience. As discussed above, the triangle inequality property of the loss function $\psi$ is fundamental for achieving strict resilience. When $\psi$ does not satisfy this property (i.e., when $\gamma_\psi \ne 1$), the term $\delta(\varepsilon)$ in (23) is unlikely to vanish completely. However, we can prevent it from growing excessively by an appropriate choice of $\psi$ in (7). To see this, assume for example that $\psi$ is defined by $\psi(y) = 1 - e^{-\ell(y)}$. Then, since $\psi(y) \le 1$ for all $y$, $\delta(\varepsilon)$ saturates to a constant value regardless of how large the $f_t$ are for $t\in\mathbb{T}_\varepsilon^c$. On the other hand, this choice introduces a new technical challenge related to the fact that the function $q_0$ in (32) is no longer a $\mathcal{K}_\infty$ function but a bounded (saturated) function. Handling this situation will require an additional condition on the upper bound in (31). To sum up, by selecting a saturated loss function for $\psi$, we obtain the following approximate resilience result.

Corollary 2 (Case where $\gamma_\psi \ne 1$). Let the conditions of Theorem 1 hold. Assume further that the loss function $\psi$ in (7) is defined by $\psi(y) = 1 - e^{-\ell(y)}$ where $\ell : \mathbb{R}^{n_y} \to \mathbb{R}_{\ge 0}$ satisfies (P1)–(P5). In particular, assume that property (P4) is satisfied by $\ell$ with a $\mathcal{K}_\infty$ function $q$ such that (9) is an equality relation. Also, let $\varepsilon \ge 0$ be such that
$$b(\varepsilon) \triangleq \frac{2\beta_\Sigma(\varepsilon) + r(1-\gamma_\psi)\, r_o(\varepsilon)}{D\,[1-(1+\gamma_\psi)p_r]} < 1, \qquad (35)$$
where $r = r(\varepsilon) = |\mathbb{T}_\varepsilon^c|$ and $r_o(\varepsilon) = \max_{t\in\mathbb{T}_\varepsilon^c} \psi_t(f_t) \le 1$.
Then there exists a continuous and strictly increasing function $h_{\mathrm{sat}} : [0,1] \to [0,1]$ (obeying $h_{\mathrm{sat}}(0) = 0$ and $h_{\mathrm{sat}}(1) = 1$) such that for any norm $\|\cdot\|$ on $\mathbb{R}^{n\times T}$,
$$\|\hat{X} - X\| \le h_{\mathrm{sat}}^{-1}\big(b(\varepsilon)\big) \quad \forall \hat{X}\in E(Y), \qquad (36)$$
with $D$ in (35) defined as in the proof of Theorem 1 using the norm $\|\cdot\|$.

Proof: That the particular function $\psi$ specified in the statement of the corollary satisfies the properties (P1)–(P3) and (P5) is a question which is fully answered by Lemma 9 in Section C of the appendix. Consequently, let us observe that the inequality (31) arising in the proof of Theorem 1 still holds true here. As to (32), it holds as well, but with the slight difference that $q_0$ is now a saturated function in $\mathcal{K}_{\mathrm{sat},1}$ (as defined in the notation section) with bounded range $[0,1]$. This results in fact from Lemma 9 and the proof of Lemma 2. We can therefore write
$$q_0(\|E\|) \le \frac{1}{D\,[1-(1+\gamma_\psi)p_r]} \Big[ 2\beta_\Sigma(\varepsilon) + (1-\gamma_\psi)\sum_{t\in\mathbb{T}_\varepsilon^c} \psi_t(f_t) \Big] \le b(\varepsilon) < 1$$
with $q_0 \in \mathcal{K}_{\mathrm{sat},1}$. Note, from the definition of the class $\mathcal{K}_{\mathrm{sat},1}$, that $q_0(\|E\|) < 1$ implies $\|E\| < 1$ (since otherwise we would have $q_0(\|E\|) = 1$). Letting $h_{\mathrm{sat}}$ be the restriction of such a function $q_0$ to $[0,1]$, we have $q_0(\|E\|) = h_{\mathrm{sat}}(\|E\|) \le b(\varepsilon)$ with $h_{\mathrm{sat}}$ being invertible. We can now apply $h_{\mathrm{sat}}^{-1}$ to each member of this inequality to reach the desired result, since $b(\varepsilon)$ lies in the range of $h_{\mathrm{sat}}$.

D. Resilience to attacks on the individual sensors

We now consider the situation where the matrix $S\in\mathbb{R}^{n_y\times T}$ formed from $\{s_t\}$ in (2) may be sparse entrywise, i.e., a relatively important fraction of the entries of $S$ are equal to zero. This case is relevant when any individual sensor may be faulty (or compromised by an attacker) at any time.
(In this case, the sparsity is expressed in terms of the fraction of nonzero entries of the matrix $S$, whereas in the timewise block-sparsity case, the sparsity level is measured in terms of the fraction of nonzero columns of $S$.) To address the resilient state estimation problem in this scenario, we select the loss functions $\psi_t$ to have a separable structure. To be more specific, let $\psi_t$ be such that for any $e = [e_1 \cdots e_{n_y}]^\top \in \mathbb{R}^{n_y}$,
$$\psi_t(e) = \sum_{i=1}^{n_y} \psi_{ti}(e_i) \qquad (37)$$
where, consistently with (7), $\psi_{ti}(e_i) = \psi_i^\circ(V_{ti} e_i)$ with $V_{ti} \in \mathbb{R}_{>0}$ and $\psi_i^\circ : \mathbb{R} \to \mathbb{R}_{\ge 0}$, $i = 1, \ldots, n_y$, being loss functions on $\mathbb{R}$ enjoying the properties (P1)–(P5). As in the statement of Corollary 1, we shall require that $\gamma_{\psi_i^\circ} = 1$. It follows that one can set $\psi_i^\circ$ to be the absolute value without loss of generality. Let us therefore set $\psi_i^\circ(e_i) = |e_i|$ so that
$$\psi_{ti}(e_i) = |V_{ti} e_i| \quad\text{and}\quad \psi_t(e) = \|V_t e\|_1 \qquad (38)$$
with $V_t$ being a diagonal matrix having the $V_{ti}$, $i = 1, \ldots, n_y$, on its diagonal.

To state the resilience property in this particular setting, we partition the index set $\mathbb{T}\times\mathbb{S}$ of the entries of $S$ as
$$\Lambda_\varepsilon = \{(t,i)\in\mathbb{T}\times\mathbb{S} : \psi_{ti}(f_{ti}) \le \varepsilon\}, \qquad \Lambda_\varepsilon^c = \{(t,i)\in\mathbb{T}\times\mathbb{S} : \psi_{ti}(f_{ti}) > \varepsilon\} \qquad (39)$$
with $f_{ti}$ denoting the $i$-th entry of the vector $f_t\in\mathbb{R}^{n_y}$. Also, in order to account for the specificity of the new scenario, let us refine slightly the $r$-resilience index (21) to be
$$\tilde{p}_r = \sup_{\substack{Z\in\mathbb{R}^{n\times T}\\ Z\ne 0}}\ \sup_{\substack{I\subset\mathbb{T}\times\mathbb{S}\\ |I|=r}} \frac{\sum_{(t,i)\in I} \psi_{ti}(c_{ti}^\top z_t)}{H_\Sigma(Z)} \qquad (40)$$
where $H_\Sigma$ is defined as in (20) from $\psi_t$ in (38) and $c_{ti}^\top$ is the $i$-th row of the observation matrix $C_t$. The difference between $p_r$ in (21) and $\tilde{p}_r$ in (40) resides in the index sets for counting possible error occurrences, which are $\mathbb{T}$ and $\mathbb{T}\times\mathbb{S}$, respectively. With these notations, we can provide the following theorem, which is the analog of Corollary 1 in the case where the disturbance matrix $S$ (see Eq. (2)) is entrywise sparse.
Theorem 2 (Upper bound on the estimation error: separable case). Consider the system $\Sigma$ defined by (1) with output $Y$ together with the state estimator (8) in which $\phi$ is assumed to obey (P1)–(P5) and $\psi$ is defined as in (38). Let $\varepsilon \ge 0$ and set $r = |\Lambda_\varepsilon^c|$ with $\Lambda_\varepsilon^c$ defined in (39). If the system is observable on $[0, T-1]$ and if $\tilde{p}_r < 1/2$, then there exists a $\mathcal{K}_\infty$ function $\tilde{h}$ such that for all norms $\|\cdot\|$ on $\mathbb{R}^{n\times T}$,
$$\|\hat{X} - X\| \le \tilde{h}\Big( \frac{2\tilde{\beta}_\Sigma(\varepsilon)}{\tilde{D}\,(1-2\tilde{p}_r)} \Big) \quad \forall \hat{X}\in E(Y) \qquad (41)$$
with $X$ denoting the true state matrix from (1) and $\tilde{\beta}_\Sigma(\varepsilon)$ defined by
$$\tilde{\beta}_\Sigma(\varepsilon) = \lambda \sum_{t\in\mathbb{T}_0} \phi_t(w_t) + \sum_{(t,i)\in\Lambda_\varepsilon} \psi_{ti}(f_{ti});$$
$\tilde{D}$ and $\tilde{h}$ are defined as in the statement of Theorem 1 with $H_\Sigma$ being constructed from $\psi$ in (38).

To some extent, Theorem 2 can be viewed as a special case of Theorem 1 in which the function $\psi$ is taken to be the $\ell_1$ norm and the index set is modified to be $\mathbb{T}\times\mathbb{S}$. Hence the proof follows a similar line of argument as that of Theorem 1. Again, it is not hard to see that the result of Theorem 2 implies the property of resilience with respect to the set $\mathcal{F}_r$ in (17) of measurement noise in the sense of Definition 2 (see the proof of Corollary 1). An interesting property of the estimator can be stated in the absence of dense noise, i.e., when only the sparse noise is active:

Corollary 3. Consider the system $\Sigma$ defined by (1) and let $r = |\Lambda_0^c|$ (which means that we consider every nonzero occurrence of $f_{ti}$ as an outlier by taking $\varepsilon = 0$ in (39)). If the conditions of Theorem 2 hold, $\tilde{p}_r < 1/2$, and if $w_t = 0$ in (1) for all $t$, then the estimator defined by (8) retrieves exactly the state trajectory of the system, i.e., $E(Y) = \{X\}$.

Proof: This follows directly from the fact that $\tilde{\beta}_\Sigma(0) = 0$ in the case where there is no dense noise $w_t$ and $\varepsilon = 0$.
Therefore, the estimator (8) has the exact recoverability property, that is, it is able to recover exactly the true state of the system (1) when only the sparse noise is active in the measurement equation, provided that the number $r = |\Lambda_0^c|$ of nonzero entries in the sequence $\{f_{ti}\}_{(t,i)\in\mathbb{T}\times\mathbb{S}}$ is small enough for $\tilde{p}_r$ to be less than $1/2$. According to our analysis, the number of outliers that can be handled by the estimator in this case can be underestimated by
$$\max\{r : \tilde{p}_r < 1/2\}. \qquad (42)$$

V. A SPECIAL VARIANT OF THE ESTIMATOR E

In this section, we consider a constrained reformulation of the estimator $E$ defined in (8). As will be shown shortly, this reformulation also enjoys the resilience property, but under a condition which is more easily verifiable from a numerical perspective. We start by considering the simple scenario where the process noise $w_t$ in (1) is identically equal to zero and the sequence $\{f_t\}$ is sparse in the sense that its dense component $v_t$ displayed in (2) does not exist. In this setting, we can obtain an estimator that is more resilient (to sparse noise in the measurements) than (8) by making it aware of the absence of dense process noise. This can be achieved by constraining the searched state matrix to lie in the set $\mathcal{Z}_\Sigma \subset \mathbb{R}^{n\times T}$ defined by
$$\mathcal{Z}_\Sigma = \big\{ Z = [\,z_0\ \ A_0 z_0\ \ \cdots\ \ A_{T-2}\cdots A_1 A_0 z_0\,] : z_0\in\mathbb{R}^n \big\}$$
of possible trajectories starting from any initial state $z_0\in\mathbb{R}^n$. Following this idea, we consider the estimator $E^\circ$ defined by $E^\circ(Y) = \arg\min_{Z\in\mathcal{Z}_\Sigma} V_\Sigma(Y, Z)$. Then $E^\circ(Y)$ can be rewritten more simply in the form
$$E^\circ(Y) = \Big\{ Z = [\,z_0\ \ A_0 z_0\ \ \cdots\ \ A_{T-2}\cdots A_1 A_0 z_0\,] : z_0 \in \arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z) \Big\} \qquad (43)$$
where
$$V_\Sigma^\circ(Y, z) = \sum_{t\in\mathbb{T}} \psi_t(y_t - M_t z) \qquad (44)$$
with
$$M_t = C_t A_{t-1}\cdots A_1 A_0 \qquad (45)$$
for all $t\ge 1$ and $M_0 = C_0$. Hence the estimation of the state trajectory reduces to estimating the initial state $x_0$.
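The matrices (45) can be accumulated in a single forward pass; the following sketch (with our hypothetical helper `build_M`) does this for given $A_t$, $C_t$, and reduces to $M_t = C A^t$ in the LTI case.

```python
import numpy as np

def build_M(A_list, C_list):
    """Return [M_0, M_1, ...] with M_0 = C_0 and M_t = C_t A_{t-1} ... A_0, Eq. (45)."""
    M, P = [], np.eye(A_list[0].shape[0])  # P accumulates the product A_{t-1}...A_0
    for t, C in enumerate(C_list):
        M.append(C @ P)
        if t < len(A_list):
            P = A_list[t] @ P
    return M
```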
This can be viewed as a robust regression problem, like the ones discussed in [13], [2]. Generalizing a result in [2], we next derive a necessary and sufficient condition for exact recovery of the true state, which holds if and only if $\arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z) = \{x_0\}$ with $x_0$ being the exact initial state of the system $\Sigma$. To this end, we first introduce the concept of concentration ratio of a collection of matrices with respect to a loss function. A notational convention will be necessary for the statement of this property: for any subset $I$ of $\mathbb{T}$, let
$$\Psi_I^\circ(z) = \sum_{t\in I} \psi_t(M_t z). \qquad (46)$$

Definition 4 ($r$-th concentration ratio). Let $\{\psi_t\}$ be a family of loss functions defined by (7) in which $\psi$ is assumed to satisfy (P1), (P3) and (P5) with constant $\gamma_\psi = 1$. Let $M = \{M_t\}_{t\in\mathbb{T}}$ be a sequence of matrices such that the function $\Psi_{\mathbb{T}}^\circ$ defined in (46) is positive definite. We call the $r$-th concentration ratio of $M$ the number defined by
$$\nu_r(M) = \sup_{\substack{z\in\mathbb{R}^n\\ z\ne 0}}\ \sup_{\substack{I\subset\mathbb{T}\\ |I|=r}} \frac{\Psi_I^\circ(z)}{\Psi_{\mathbb{T}}^\circ(z)}. \qquad (47)$$

At fixed $r$, $\nu_r(M)$ quantifies a genericity property of the sequence $M = \{M_t\}_{t\in\mathbb{T}}$. In view of the particular structure of the collection $M$ in (45), note that $\Psi_{\mathbb{T}}^\circ$ is positive definite whenever the system $\Sigma$ is observable on $\mathbb{T}$. Furthermore, $\nu_r(M)$ can be interpreted, to some extent, as a quantitative measure of observability: it is all the smaller as the system is more strongly observable. To see this, recall from Lemma 3 that if the system is observable on $[0, T-1]$, then for all $Z\in\mathcal{Z}_\Sigma$ initiated from $z$ in $\mathbb{R}^n$, we have $V_\Sigma(0, Z) = \Psi_{\mathbb{T}}^\circ(z) \ge q(\|z\|)$ for some $\mathcal{K}_\infty$ function $q$. It follows that
$$\nu_r(M) \le \sup_{\substack{z\in\mathbb{R}^n\\ z\ne 0}}\ \sup_{\substack{I\subset\mathbb{T}\\ |I|=r}} \frac{\Psi_I^\circ(z)}{q(\|z\|)}. \qquad (48)$$
Hence the more observable the system (i.e., the larger the gain function $q$), the smaller $\nu_r(M)$.
For all $(Y, z_0)\in\mathbb{R}^{n_y\times T}\times\mathbb{R}^n$ with $Y$ expressed columnwise in the form $Y = [\,y_0\ \cdots\ y_{T-1}\,]$, consider now the following notations:
$$\mathcal{I}_0(Y, z_0) = \{t\in\mathbb{T} : y_t - M_t z_0 = 0\}, \qquad \mathcal{I}^c(Y, z_0) = \{t\in\mathbb{T} : y_t - M_t z_0 \ne 0\}.$$

Theorem 3 (Exact recoverability condition). Consider the cost function (44) where $M = \{M_t\}$ is assumed to have been constructed as in (45) from the matrices of system (1). Assume that the loss functions $\{\psi_t\}$ involved in (44) are defined by (7) in which $\psi$ is assumed to satisfy (P1), (P3) and (P5) with constant $\gamma_\psi = 1$. Let $r$ be a positive integer. If the system (1) is observable on $[0, T-1]$, then the two following propositions are equivalent:
(i) For all $Y$ in $\mathbb{R}^{n_y\times T}$ and all $z_0$ in $\mathbb{R}^n$,
$$|\mathcal{I}^c(Y, z_0)| \le r \ \Rightarrow\ \arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z) = \{z_0\}. \qquad (49)$$
(ii) The index $\nu_r(M)$ satisfies
$$\nu_r(M) < 1/2. \qquad (50)$$

Proof: (i) ⇒ (ii): Assume that (i) holds. Consider an arbitrary subset $I$ of $\mathbb{T}$ such that $|I| \le r$. Let $z_0 \ne 0$ be a vector in $\mathbb{R}^n$. Construct a sequence $Y$ in $\mathbb{R}^{n_y\times T}$ such that $y_t = 0$ if $t\in I$ and $y_t = M_t z_0$ otherwise. Then $\mathcal{I}^c(Y, z_0) \subset I$, so that $|\mathcal{I}^c(Y, z_0)| \le r$. It then follows from (i) that $\arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z) = \{z_0\}$, which means that $V_\Sigma^\circ(Y, z_0) < V_\Sigma^\circ(Y, z)$ for all $z\in\mathbb{R}^n$, $z\ne z_0$. In particular, $V_\Sigma^\circ(Y, z_0) < V_\Sigma^\circ(Y, 0)$ which, by taking into account the definition of $Y$, gives $\Psi_I^\circ(z_0) < \Psi_{I^c}^\circ(z_0)$, where $I^c = \mathbb{T}\setminus I$. Since $\Psi_{\mathbb{T}}^\circ(z_0) = \Psi_I^\circ(z_0) + \Psi_{I^c}^\circ(z_0)$, we see that
$$\frac{\Psi_I^\circ(z_0)}{\Psi_{\mathbb{T}}^\circ(z_0)} < \frac{1}{2}.$$
This reasoning works for every nonzero $z_0$ and for every subset $I$ of $\mathbb{T}$. We can hence conclude that $\nu_r(M) < 1/2$.
(ii) ⇒ (i): Assume that (ii) holds. Let $(Y, z_0)\in\mathbb{R}^{n_y\times T}\times\mathbb{R}^n$ be such that $|\mathcal{I}^c(Y, z_0)| \le r$. We then need to prove that $\arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z) = \{z_0\}$.
Since assertion (ii) is assumed true, it follows from (47) and (50) that
$$2\,\Psi_{\mathcal{I}^c}^\circ(z_0') < \Psi_{\mathbb{T}}^\circ(z_0') \quad \forall z_0'\in\mathbb{R}^n,\ z_0'\ne 0, \qquad (51)$$
where, for simplicity, we have posed $\mathcal{I}^c = \mathcal{I}^c(Y, z_0)$. In the derivation of (51), we have used the obvious fact that $r_1 \le r_2 \Rightarrow \nu_{r_1}(M) \le \nu_{r_2}(M)$. If we pose $\mathcal{I} = \mathcal{I}_0(Y, z_0) = \mathbb{T}\setminus\mathcal{I}^c$, then the inequality (51) is equivalent to
$$\sum_{t\in\mathcal{I}^c} \psi_t(M_t z_0') < \sum_{t\in\mathcal{I}} \psi_t(M_t z_0'). \qquad (52)$$
Now we observe that for all $t$ in $\mathcal{I} = \mathcal{I}_0(Y, z_0)$, $y_t = M_t z_0$, so that $\psi_t(M_t z_0') = \psi_t\big(y_t - M_t(z_0 + z_0')\big)$. On the other hand, for $t\in\mathcal{I}^c = \mathcal{I}^c(Y, z_0)$, if we apply the GTI (10) with $\gamma_\psi = 1$, we obtain
$$\psi_t(M_t z_0') = \psi_t\big(y_t - M_t z_0 - (y_t - M_t(z_0 + z_0'))\big) \ge \psi_t(y_t - M_t z_0) - \psi_t\big(y_t - M_t(z_0 + z_0')\big).$$
Combining all these remarks with (52) yields
$$\sum_{t\in\mathcal{I}^c} \big[\psi_t(y_t - M_t z_0) - \psi_t\big(y_t - M_t(z_0 + z_0')\big)\big] < \sum_{t\in\mathcal{I}} \psi_t\big(y_t - M_t(z_0 + z_0')\big).$$
Rearranging this gives $V_\Sigma^\circ(Y, z_0) < V_\Sigma^\circ(Y, z_0 + z_0')$ for all $z_0'\in\mathbb{R}^n$ with $z_0'\ne 0$, which is equivalent to $\arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z) = \{z_0\}$. Hence (i) holds as claimed.

From the statement of Theorem 3, we infer that, under the assumption that only the sparse noise $\{s_t\}$ is active (i.e., there is no dense noise $(w_t, v_t)$) in the system equations (1), $E^\circ(Y) = \{X\}$ whenever $\nu_r(M) < 1/2$; i.e., the estimator $E^\circ$ returns exactly the true state. For a given system, if one can evaluate numerically the index $\nu_r(M)$, then it becomes possible to assess the number $r_{\max} \triangleq \max\{r : \nu_r(M) < 1/2\}$ of gross errors that can be corrected by the estimator $E^\circ$ when applied to that system. We will get back to the computational matter in Section VI.
A. Special case of the $\ell_0$-norm-based estimator

Consider the special case where the loss functions $\{\psi_t\}$ are defined, for all $t\in\mathbb{T}$, by
$$\forall e\in\mathbb{R}^{n_y},\quad \psi_t(e) = \begin{cases} 1 & \text{if } e\ne 0\\ 0 & \text{otherwise.} \end{cases} \qquad (53)$$
This corresponds to the block $\ell_0$-norm. Note that such functions satisfy the assumptions (P1), (P3) and (P5) requested in Definition 4 of $\nu_r(M)$ and in the statement of Theorem 3. Hence $\nu_r(M)$ is well-defined in this case.

Corollary 4. Consider system (1) under the assumption that $w_t = 0$ for all $t$. Assume observability of the system on $[0, T-1]$. Consider the estimator (43) in which the cost $V_\Sigma^\circ$ is defined from the family of loss functions $\{\psi_t\}$ expressed in (53). Then for all $(Y, z_0)\in\mathbb{R}^{n_y\times T}\times\mathbb{R}^n$,
$$|\mathcal{I}^c(Y, z_0)| < \frac{T - \mu(M) + 1}{2} \ \Rightarrow\ E^\circ(Y) = \{X\},$$
where $\mu(M)$, defined by
$$\mu(M) = \min\big\{ k : \forall I\subset\mathbb{T},\ \big(|I| = k \Rightarrow \operatorname{rank}(M_I) = n\big) \big\}, \qquad (54)$$
is the minimum number $k$ such that any matrix $M_I\in\mathbb{R}^{|I| n_y\times n}$, formed by stacking vertically the matrices of the collection $\{M_t : t\in I\}$ indexed by $I\subset\mathbb{T}$ with $|I| = k$, has full column rank.

Proof: Let us start by observing that, with the particular loss functions invoked in the statement of the corollary, $\Psi_{\mathbb{T}}^\circ(z)$ counts the number of $t\in\mathbb{T}$ for which $\psi_t(M_t z) \ne 0$. It follows from the definition of $\mu(M)$ that $\Psi_{\mathbb{T}}^\circ(z) \ge T - \mu(M) + 1$ for all $z\ne 0$. The reason for this is that if $\psi_t(M_t z)$ were equal to zero more than $\mu(M) - 1$ times, then $z$ would necessarily be equal to zero. As a result, we get
$$\nu_r(M) \le \frac{r}{T - \mu(M) + 1}.$$
Hence, by applying Theorem 3, $|\mathcal{I}^c(Y, z_0)| / (T - \mu(M) + 1) < 1/2$ is a sufficient condition for exact recovery by the $\ell_0$-norm-based estimator.

Remark 2. Assume that $\psi_t$ is defined to be the counting norm, i.e.,
$$\psi_t(e) = \|e\|_0. \qquad (55)$$
Then $\psi_t$ has a separable structure as illustrated in (37).
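The index $\mu(M)$ in (54) can be checked by brute force for small horizons (the search is combinatorial in $T$, so this is only an illustrative sketch; the helper name `mu` is ours):

```python
import numpy as np
from itertools import combinations

def mu(M_list, n):
    """Smallest k such that every k-subset of {M_t} stacks to full column rank n, Eq. (54)."""
    T = len(M_list)
    for k in range(1, T + 1):
        if all(np.linalg.matrix_rank(np.vstack([M_list[t] for t in I])) == n
               for I in combinations(range(T), k)):
            return k
    return None  # the system is not observable on the horizon
```

Per Corollary 4, any pattern of fewer than $(T - \mu(M) + 1)/2$ gross errors is then correctable by the block $\ell_0$-norm estimator.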
Consider then defining, still under the observability assumption, an entrywise version of the concentration ratio by
$$\tilde{\nu}_r(M) = \sup_{\substack{z\in\mathbb{R}^n\\ z\ne 0}}\ \sup_{\substack{I\subset\mathbb{T}\times\mathbb{S}\\ |I|=r}} \frac{\sum_{(t,i)\in I} \|M_{ti} z\|_0}{\Psi_{\mathbb{T}}^\circ(z)} \qquad (56)$$
where $M_{ti}$ refers to the $i$-th row of $M_t$. Further, let
$$\tilde{\mu}(M) = \min\big\{ k : \forall I\subset\mathbb{T}\times\mathbb{S},\ \big(|I| = k \Rightarrow \operatorname{rank}(\tilde{M}_I) = n\big) \big\}$$
with $\tilde{M}_I\in\mathbb{R}^{|I|\times n}$ denoting the matrix obtained by stacking the row vectors $\{M_{ti} : (t,i)\in I\}$. Then a result similar to Corollary 4 is obtainable: if the number of measurements corrupted by a nonzero error (among the $n_y T$ available) is strictly less than $(n_y T - \tilde{\mu}(M) + 1)/2$, then the estimator $E^\circ$ expressed in (43) (with $\psi_t$ being the $\ell_0$ norm as in (55)) recovers exactly the true state.

Remark 3. Under the conditions of Remark 2, if we consider the scenario where only a set of $k < n_y$ sensors may be compromised by attackers, then exact recovery is achieved if
$$k < \frac{n_y}{2} - \frac{\tilde{\mu}(M) - 1}{2T}. \qquad (57)$$
Taking into consideration the fact that $\tilde{\mu}(M) - 1 < T$, it can then be seen that (57) is equivalent to $k \le \lceil n_y/2 - 1 \rceil$, where the notation $\lceil r \rceil$, for $r\in\mathbb{R}$, refers to the smallest integer larger than or equal to $r$. To sum up, when the $\psi_t$ are defined as in (55), the estimator (43) is able to return the true state matrix even when $\lceil n_y/2 - 1 \rceil$ sensors get faulty over the entire observation horizon. This is reminiscent of a result stated in [10], which therefore appears to be a consequence of Theorem 3.

B. Stability of the class of estimators $E^\circ$ with respect to dense noise

We have argued that the class of estimators $E^\circ$ in (43) is able to recover exactly the true state matrix when there are no dense noises $(w_t, v_t)$ in the system equations and only the sparse noise $\{s_t\}$ is active. The question we ask now is whether this set of estimators can, in addition to sparse noise, handle dense process and output noises, and to what extent this is possible.
The starting point of our reflection is the observation that the dynamical system defined by
$$\tilde{x}_{t+1} = A_t \tilde{x}_t, \quad \tilde{x}_0 = x_0, \qquad y_t = C_t \tilde{x}_t + s_t + (v_t + \tilde{v}_t), \qquad (58)$$
produces the same output as system (1). Here, $\tilde{v}_t = C_t \tilde{w}_t$, with $\tilde{w}_t = \sum_{k=0}^{t-1} A_{t-1}\cdots A_{k+1} w_k$, a definition which uses the convention that the product $A_{t-1}\cdots A_{k+1} = I$ if $k = t-1$. The idea is then to apply the estimator $E^\circ$ to (58) by neglecting the dense component $(v_t + \tilde{v}_t)$ of the output equation. To state the resilience result for $E^\circ$, consider, for a given $\varepsilon \ge 0$, a partition $(\tilde{\mathbb{T}}_\varepsilon, \tilde{\mathbb{T}}_\varepsilon^c)$ of $\mathbb{T}$ defined as in (22) with $f_t$ replaced by $\tilde{f}_t \triangleq s_t + (v_t + \tilde{v}_t) = f_t + \tilde{v}_t$.

Theorem 4. Consider the estimator (43) for the system (1). Assume that the loss functions $\{\psi_t\}$ involved in (44) are defined by (7) in which $\psi$ is assumed to satisfy (P1)–(P5) with constant $\gamma_\psi = 1$. Let $\varepsilon \ge 0$ and set $r = |\tilde{\mathbb{T}}_\varepsilon^c|$. Denote by $N$ the norm on $\mathbb{R}^{n\times T}$ defined by $N(Z) = \max_{t\in\mathbb{T}} \|z_t\|$ with $z_t$ being the $t$-th column of $Z$ and $\|\cdot\|$ being a norm on $\mathbb{R}^n$. If the system (1) is observable on $[0, T-1]$ and $\nu_r(M) < 1/2$, then there exists a $\mathcal{K}_\infty$ function $\alpha$ such that
$$N(\hat{X} - X) \le R_\Sigma\, \alpha^{-1}(\rho) + \max_{t\in\mathbb{T}} \|\tilde{w}_t\| \quad \forall \hat{X}\in E^\circ(Y), \qquad (59)$$
where $R_\Sigma$ is some constant depending on the system $\Sigma$ and
$$\rho = \frac{2}{D_1\big(1 - 2\nu_r(M)\big)} \sum_{t\in\tilde{\mathbb{T}}_\varepsilon} \psi_t(\tilde{f}_t)$$
with $\tilde{f}_t = f_t + \tilde{v}_t$ and $D_1 = \min_{\|z\|=1} \Psi_{\mathbb{T}}^\circ(z)$.

Proof: Let $\hat{x}_0 \in \arg\min_{z\in\mathbb{R}^n} V_\Sigma^\circ(Y, z)$. We first provide a bound on the error $e_0 = \hat{x}_0 - x_0$ with $x_0$ denoting the true initial state of system (1). By exploiting the fact that $V_\Sigma^\circ(Y, \hat{x}_0) \le V_\Sigma^\circ(Y, x_0)$ and noting that $y_t = M_t x_0 + \tilde{f}_t$, we reach the inequality
$$\sum_{t\in\mathbb{T}} \psi_t(\tilde{f}_t - M_t e_0) \le \sum_{t\in\mathbb{T}} \psi_t(\tilde{f}_t).$$
By then reasoning quite similarly as in the proof of Theorem 1, we get
$$\Psi_{\mathbb{T}}^\circ(e_0) - 2\,\Psi_{\tilde{\mathbb{T}}_\varepsilon^c}^\circ(e_0) \le 2\sum_{t\in\tilde{\mathbb{T}}_\varepsilon} \psi_t(\tilde{f}_t)$$
which, by exploiting (47) and the assumption that $\nu_r(M) < 1/2$, leads to
$$\Psi_{\mathbb{T}}^\circ(e_0) \le \frac{2}{1 - 2\nu_r(M)} \sum_{t\in\tilde{\mathbb{T}}_\varepsilon} \psi_t(\tilde{f}_t).$$
Applying now Lemma 2, we conclude that for any norm $\|\cdot\|$ on $\mathbb{R}^n$ there exists a $\mathcal{K}_\infty$ function $\alpha$ such that $\|e_0\| \le \alpha^{-1}(\rho)$. Now, by observing that for any $\hat{X}\in E^\circ(Y)$,
$$\hat{X} - X = [\,e_0\ \ A_0 e_0\ \ \cdots\ \ A_{T-2}\cdots A_0 e_0\,] - [\,0\ \ \tilde{w}_1\ \ \cdots\ \ \tilde{w}_{T-1}\,],$$
the result follows by posing $R_\Sigma = \max_{t\in\mathbb{T}} \|A_{t-1}\cdots A_0\|_{\mathrm{ind}}$, with $\|\cdot\|_{\mathrm{ind}}$ being the matrix norm induced by the vector norm $\|\cdot\|$ on $\mathbb{R}^n$ (we use here the convention that $A_{t-1}\cdots A_0 = I$ if $t = 0$).

The interest of Theorem 4 is that it provides a condition of resilience for the estimator $E^\circ$ which can be checked numerically, as will be discussed in the next section.

VI. ON THE NUMERICAL EVALUATION OF THE RESILIENCE CONDITIONS

The analysis results presented in Sections IV and V rely on some functions (resilience index, concentration ratio, etc.) which characterize quantitatively some properties of the system being observed. A question we ask now is whether it is possible to evaluate these measures numerically. In effect, computing the $r$-resilience index in (21) would help testing, for example, the resilience condition in Theorem 1. Similarly, evaluating the concentration ratio $\nu_r(M)$ introduced in (47) is the way to assess whether a given estimator is able to return the true state of a given system under a hypothesis on the number of potential nonzero errors in the measurements. Unfortunately, obtaining numerically the numbers $p_r$, $\tilde{p}_r$ or $\nu_r$ requires solving hard nonconvex and combinatorial optimization problems.
This is indeed a common characteristic of the concepts which are usually used to assess resilience; for example, the popular Restricted Isometry Property (RIP) constant [6] is comparably hard to evaluate. We note, however, that when the dimension of the state is small enough, $\nu_r(M)$ can be computed exactly by taking inspiration from a method presented in [23], even though at the price of a huge (but affordable) computational cost. Alternatively, a cheaper overestimate can be obtained by means of convex optimization, as suggested in [2]. The next lemma provides such an overestimate for $\nu_r(M)$.

Lemma 5 (An estimate of $\nu_r$). Assuming all quantities are well-defined (see the conditions in Definitions 3 and 4), the following statements hold:
(a) $\nu_r \le p_r$.
(b) If $\mu(M) \le T - 1$, then
$$\nu_r(M) \le \frac{r\,\nu_o}{1 + \nu_o}, \qquad (60)$$
where
$$\nu_o = \max_{t\in\mathbb{T}}\ \min_{\lambda_t\in\mathbb{R}^T} \Big\{ \|\lambda_t\|_\infty : V_t M_t = \sum_{k\in\mathbb{T}} \lambda_{tk} V_k M_k,\ \lambda_{tt} = 0 \Big\}. \qquad (61)$$
In (61), the $\lambda_{tk}$ denote the entries of the vector $\lambda_t\in\mathbb{R}^T$ and $\{V_t\}$ refers to the sequence of nonsingular weighting matrices involved in (7).

The proof of statement (a) is straightforward by noticing that (47) follows from (21) by constraining the variable $Z$ to be in $\mathcal{Z}_\Sigma$. As to the proof of statement (b), it follows a similar reasoning as the proof of Theorem 2 in [2]. The interest of this lemma is twofold. First, it suggests that the resilience condition of $E^\circ$ is weaker (in the sense that it is easier to achieve) than that of $E$. Second, it provides an upper bound on $\nu_r(M)$ which can be computed by solving a convex optimization problem (see Eqs. (60)–(61)). More specifically, given $\nu_o$ in (61), checking numerically whether $|\mathbb{T}_\varepsilon^c| < \frac{1}{2}(1 + 1/\nu_o)$ provides a sufficient condition for $\nu_r(M) < 1/2$ and hence for the resilience of the estimator (43).
In a similar spirit as Lemma 5, we now show that the parameter $\tilde{p}_r$ defined in (40) can also be overestimated via convex optimization if the loss functions $\phi_t$ and $\psi_t$ in (6)–(7) are both taken to be norms.

Lemma 6 (An estimate of $\tilde{p}_r$). Consider the resilience parameter $\tilde{p}_r$ defined in (40), where we assume that $\psi_{ti}(e) = |e|$ for all $(t, i, e)\in\mathbb{T}\times\mathbb{S}\times\mathbb{R}$ and $\phi_t$ is an arbitrary norm. Then
$$\tilde{p}_r \le \frac{r}{b_1} \qquad (62)$$
where
$$b_1 = \inf_{(t,i)\in\mathbb{T}\times\mathbb{S}}\ \inf_{Z\in\mathbb{R}^{n\times T}} \big\{ H_\Sigma(Z) : c_{ti}^\top z_t = 1 \big\} = \frac{1}{\tilde{p}_1}. \qquad (63)$$

Proof: (See Appendix D.)

Note, under the assumptions of Lemma 6, that $\inf_{Z\in\mathbb{R}^{n\times T}} \{H_\Sigma(Z) : c_{ti}^\top z_t = 1\}$ is a convex optimization problem for any given $(t, i)$. Hence, solving for $b_1$ in (63) requires solving $T n_y$ convex problems and picking the smallest value among them. The interest of the lemma is that it provides an overestimate of $\tilde{p}_r$ which is numerically computable. Based on the so-obtained overestimate of $\tilde{p}_r$, we see from Theorem 1 that the estimator (8) is resilient to $r$ outliers if $r < b_1/2$. Moreover, we can deduce an underestimate of the number of outliers that the estimator (8) is able to handle as $r_{\max} = \max\{r : r < b_1/2\}$. As a last remark in this paragraph, let us observe that Lemma 6 is also applicable to overestimate $p_r$ defined in (21) in the case of a single-output system, i.e., when $n_y = 1$.

VII. SIMULATION RESULTS

In this part, we illustrate numerically the resilience properties of the proposed class of estimators. For this purpose, we consider, for simplicity, an example of a linear time-invariant system in the form (1). We select a single-input single-output example where the pair $(A, C)$ is given by
$$A = \begin{bmatrix} 0.7 & 0.45 \\ -0.5 & 1 \end{bmatrix}, \qquad C = \begin{bmatrix} 1 & 2 \end{bmatrix}. \qquad (64)$$
We instantiate the loss functions in (5) as follows: for all t ∈ T and all (z, e) ∈ R^n × R^{n_y}, φ_t(z) = φ(W_t z) and ψ_t(e) = ψ(V_t e), where the weighting matrices W_t and V_t and the functions φ and ψ will be specified below for each experiment.

A. Numerical certificate of exact recoverability

Suppose in this section that the process noise w_t and the dense component v_t of f_t (see Eq. (2)) are both identically zero. We then focus on testing the exact recoverability property of the estimator (43) in the presence of the sparse noise {s_t} alone. The times of occurrence of the nonzero values in the sequence {s_t} are picked at random. As to the values themselves, they are also randomly generated, from a zero-mean normal distribution with variance 100². Given T = 100 output measurements and the system matrices in (64), the estimator E° is implemented by directly solving the optimization problem defined in (43) through the CVX interface [12]. Note that the implementation of the estimator (43)–(44) requires computing the matrices M_t expressed in (45), which take the form C A^t in the LTI case. A problem that may occur, however, is that if A is Schur stable, as is the case here (or unstable), taking successive powers of A produces matrices M_t of widely different orders of magnitude. To preserve the contribution of each term of (44), we introduce special weighting matrices {V_t} (in the loss function ψ, selected as the ℓ_1 norm) which normalize the rows of these matrices so that they all have unit 2-norm. V_t is therefore selected to be a diagonal matrix of the form V_t = diag(V_t1, ..., V_tn_y), where

    V_ti = 1 / ‖c_i^T A^t‖_2 if c_i^T A^t ≠ 0, and V_ti = 1 otherwise.   (65)

Here, c_i^T, i = 1, ..., n_y, denotes the i-th row of the matrix C.
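The row normalization (65) is straightforward to compute. A minimal numpy sketch, assuming the LTI pair (A, C) of (64); the helper name row_weights is ours:

```python
import numpy as np

A = np.array([[0.7, 0.45], [-0.5, 1.0]])   # system matrices of (64)
C = np.array([[1.0, 2.0]])                 # one row per output: here n_y = 1
T = 100

def row_weights(A, C, T):
    """Diagonal V_t normalizing each row of M_t = C A^t to unit 2-norm, as in (65)."""
    V, At = [], np.eye(A.shape[0])
    for _ in range(T):
        norms = np.linalg.norm(C @ At, axis=1)   # ||c_i^T A^t||_2 for each row i
        w = np.ones_like(norms)                  # V_ti = 1 when the row is zero
        nz = norms > 0
        w[nz] = 1.0 / norms[nz]
        V.append(np.diag(w))
        At = A @ At
    return V

V = row_weights(A, C, T)
```

Each V_t @ (C A^t) then has unit-norm rows, so late-time measurements are not drowned out by the decay of A^t.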
Indeed, the effect of the weighting in (44) is equivalent to changing y_t and M_t respectively into ỹ_t = V_t y_t and M̃_t = V_t C A^t. Posing M = [M̃_t], it can be checked using the methods discussed in Section VI (see Eq. (60)) that at least r_max = 30 erroneous data points (out of T = 100 measurements) can be accommodated by the estimator while it still returns exactly the true state. To investigate the empirical performance, we consider different ratios |Λ_0^c| / T of nonzero values in the sequence {s_t}. For each fixed proportion of nonzero values, we run the estimator over 100 different realizations of the output measurements. The results, depicted in Figure 1, tend to show that the estimator can still find the true state even for proportions of gross errors as large as 60%.

Figure 1: Probability of exact recovery (expressed in percentage) by the estimator (43) in the presence of only sparse measurement noise {s_t}. The sparsity level of the noise is expressed as the fraction of nonzero values in the sequence {s_t : t ∈ T}, with |T| = T = 100.

B. Performance in the presence of dense noise

We now consider the more realistic scenario where the process noise {w_t} and the measurement noise {v_t} are nonzero. We further assume them to be bounded, white and uniformly distributed. For the numerical experiments, these signals are sampled from an interval of the form [−a, a].
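In the LTI, known-dynamics case, the estimator (43) with ℓ_1 loss amounts to an ℓ_1 regression for the initial state. The sketch below is an assumption-laden miniature of this setting, not the paper's implementation: it replaces the CVX solve used in the paper by a plain iteratively reweighted least-squares (IRLS) approximation of the ℓ_1 estimate, uses the pair (A, C) of (64) with row-normalized regressors as in (65), and picks illustrative values (20% gross errors, dense-noise amplitude a = 0.1); the helper l1_irls and all numbers are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.7, 0.45], [-0.5, 1.0]])
C = np.array([1.0, 2.0])
T = 100

# weighted regressor rows V_t C A^t, normalized to unit 2-norm as in (65)
M = np.array([C @ np.linalg.matrix_power(A, t) for t in range(T)])
M /= np.linalg.norm(M, axis=1, keepdims=True)

x_true = np.array([1.0, -2.0])
a = 0.1
y = M @ x_true + rng.uniform(-a, a, T)       # bounded dense noise
bad = rng.choice(T, size=20, replace=False)  # 20% gross errors ...
y[bad] += 100.0 * rng.standard_normal(20)    # ... of essentially arbitrary magnitude

def l1_irls(M, y, n_iter=100, eps=1e-8):
    """Iteratively reweighted least squares approximating argmin_x sum_t |y_t - m_t^T x|."""
    x = np.linalg.lstsq(M, y, rcond=None)[0]
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - M @ x), eps)
        Mw = M * w[:, None]
        x = np.linalg.solve(M.T @ Mw, Mw.T @ y)   # weighted normal equations
    return x

x_hat = l1_irls(M, y)
err = np.linalg.norm(x_hat - x_true)
```

Despite the huge corruptions, the estimation error stays at the dense-noise level, whereas ordinary least squares on the same data is thrown far off; this is the resilience behavior the section illustrates.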
For comparison purposes, we conduct the estimation with several estimators:
• an instance of the estimator (8), denoted E_{ℓ2²,ℓ1} in the sequel, in which the loss function φ_t is quadratic, ψ_t is the ℓ_1-norm and λ = 1000 (see (5) and (6)–(7));
• an instance of the estimator (8), denoted E_{ℓ1,ℓ1}, in which both loss functions φ_t and ψ_t are the ℓ_1-norm, with λ = 10;
• the estimator E° defined in (43).
In addition, we implement oracle versions of E_{ℓ2²,ℓ1} and of E_{ℓ2²,ℓ2²} (the latter corresponding to an instance of E in which both φ_t and ψ_t are instantiated as quadratic functions). By the oracle version of an estimator, we refer to an implementation of this estimator which is aware of the true values of the sparse noise sequence {s_t}.

Experiment 1: Resilience test. Keeping the level of both dense noises (i.e., w_t and v_t) fixed, with amplitude a = 0.03 for the entries of the former and a = 0.1 for the latter (yielding a signal-to-noise ratio (SNR) of about 30 dB in each case), we apply the estimators E_{ℓ2²,ℓ1}, E_{ℓ1,ℓ1} and E° (as defined above) to 100 different realizations of the output data and compute the average of the corresponding relative estimation errors. This process is repeated for different fractions of nonzeros in the sparse noise {s_t}, ranging from 0 to 0.8. The estimates obtained by these estimators are displayed in Figure 2 in log scale. For the sake of comparison, we also display the oracle estimates given by E_{ℓ2²,ℓ1} and those obtained by a standard least-squares estimator E_{ℓ2²,ℓ2²} (i.e., with φ_t and ψ_t both taken to be quadratic in (5)). The results tend to show that the estimator (8) remains stable until the (empirical) resilience condition is violated (an event that happens when the sparsity level of the sparse noise is around 60%).
This is consistent with the resilience property characterized in Theorem 1 and with the empirical observations made in Section VII-A, according to which the estimator is insensitive to the sparse noise sequence {s_t} as long as the number of its nonzero values (whose magnitudes are possibly arbitrarily large) remains below a certain threshold determined by the properties of the system. While Lemma 6 provides an underestimate of the number of correctable outliers equal to r_max = 8 (out of 100), we observe that the empirical breakpoint in the current example seems to be around 40%. The discrepancy between the two values is partly explained by the pessimism of the upper bound on p_r proposed in Lemma 6.

Figure 2: Average relative estimation error (in logarithmic scale) induced by different estimators versus the sparsity level of the sparse noise {s_t}. The relative error is expressed here as ‖X̂ − X‖_2 / ‖X‖_2, where X and X̂ denote the true and estimated state matrices respectively. Parameters of the estimator E in (8): λ = 1000, W_t = I_2 and V_t = 1 for all t.

Experiment 2: Stability with respect to dense noise. We now fix the sparsity level of the sequence {s_t} to 0.2 and let the powers of the dense noises {(w_t, v_t)} vary jointly from 5 dB to 100 dB in terms of SNR. The estimates obtained by the estimators (8) and (43), with the choices of φ_t and ψ_t agreed upon at the beginning of Section VII, are displayed in Figure 3 in terms of log₁₀ of the estimation errors. What this illustrates is that whenever the number of faulty data points is reasonable (here 20% of the available measurements), the estimator discussed in this section behaves almost in the same way as when there is no faulty data at all.
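The qualitative content of Experiment 2 (error scaling with the dense-noise level while a fixed fraction of gross errors is present) can be imitated in miniature. As a hedged, numpy-only sketch under our own assumptions: an IRLS approximation of the ℓ_1 estimate of the initial state stands in for the paper's CVX-based solve, the pair (A, C) of (64) and row normalization (65) are used, 20% of the samples carry gross errors, and three decreasing dense-noise amplitudes are tried.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.7, 0.45], [-0.5, 1.0]])
C = np.array([1.0, 2.0])
T = 100

M = np.array([C @ np.linalg.matrix_power(A, t) for t in range(T)])
M /= np.linalg.norm(M, axis=1, keepdims=True)
x_true = np.array([1.0, 1.0])

bad = rng.choice(T, size=20, replace=False)   # sparsity level fixed at 0.2
s = np.zeros(T)
s[bad] = 100.0 * rng.standard_normal(20)      # fixed sparse gross errors

def l1_irls(M, y, n_iter=100, eps=1e-8):
    """Iteratively reweighted least squares approximating the l1 regression estimate."""
    x = np.linalg.lstsq(M, y, rcond=None)[0]
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - M @ x), eps)
        Mw = M * w[:, None]
        x = np.linalg.solve(M.T @ Mw, Mw.T @ y)
    return x

errs = []
for a in (0.1, 0.01, 0.001):                  # decreasing dense-noise amplitude
    y = M @ x_true + rng.uniform(-a, a, T) + s
    errs.append(np.linalg.norm(l1_irls(M, y) - x_true))
```

With the gross errors held fixed, the error tracks the dense-noise amplitude, mirroring the behavior reported for Figure 3.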
Figure 3: Average relative estimation error (in log scale) induced by the different estimators for different levels of both dense noises w_t and v_t. Parameters of the estimator E in (8): λ = 1000, W_t = I_2 and V_t = 1 for all t.

Experiment 3: Impact of the regularization parameter λ in E. To assess the influence of the regularization parameter λ on the performance of the estimators E_{ℓ2²,ℓ1} and E_{ℓ1,ℓ1}, we fix the amplitudes of both dense noises {w_t} and {v_t} at the same level as in Experiment 1 (i.e., SNR equal to 30 dB and a ratio of nonzero entries in {s_t} equal to 30%). In this setting, we consider a set of values of λ ranging from 10⁻³ to 10⁶. For each of these values, we perform the estimation over a hundred realizations of the output data and compute the average of the corresponding relative estimation errors. The outcome of this test, depicted in Figure 4, tends to suggest that low values of λ yield quite poor results. Conversely, as λ tends towards infinity, both estimators' performance measures saturate at the same value, namely 0.1. It turns out that this limit value corresponds to the relative error obtained for E° in Experiment 1 in the same configuration, suggesting that E indeed tends to E° in behavior as λ becomes large. Finally, it is interesting to observe that both performance curves exhibit minima, located around λ = 750 for E_{ℓ2²,ℓ1} and around λ = 10 for E_{ℓ1,ℓ1}.

VIII. CONCLUSION

In this paper, we have considered the problem of estimating the state of linear time-varying systems in the face of uncertainties modeled as process and measurement noises in the system equations.
The measurement noise sequence assumes values of possibly arbitrarily large amplitude which occur intermittently in time and across the available sensors. For this problem we have proposed a class of estimators based on the resolution of a family of parameterizable optimization problems. The discussed family is rich enough to include optimization-based estimators relying on various loss functions, which may be convex (e.g., ℓ_p-norms) or nonconvex (e.g., ℓ_p quasi-norms or saturated functions), smooth or nonsmooth. In particular, we have proved a resilience property for the proposed class of state estimators, that is, the resulting estimation error admits a bound which is independent of the extreme values of the measurement noise, provided that the number of occurrences (over time and over the whole set of sensors) of such extreme values is limited. Note however that the estimators studied here operate in batch mode, that is, they apply to a finite collection of measurements. In future work we intend to investigate efficient and low-cost adaptive versions of the proposed optimization framework. Another interesting research avenue would be to study the level of performance achievable when the discussed framework is used to detect bad data prior to a refinement by standard least-squares estimation.

Figure 4: Average relative estimation error induced by E_{ℓ2²,ℓ1} and E_{ℓ1,ℓ1} for the system (64), with SNR = 30 dB and 30% of nonzero entries in {s_t}, for different values of the regularization parameter λ.

APPENDIX

In this appendix, we provide some technical results used in the paper and the associated proofs.

A. A useful technical lemma

Lemma 7.
Let ξ_1, ξ_2 : R^{a×b} → R_{≥0} be two functions which satisfy properties (P1)–(P3), and let ℓ : R^{c×d} → R^{a×b} be an injective linear mapping. Then ξ_1 + ξ_2 and ξ_1 ∘ ℓ verify (P1)–(P3). In addition, the following holds:
(j) If ξ_1, ξ_2 verify (P4), then ξ_1 + ξ_2 and ξ_1 ∘ ℓ verify (P4).
(jj) If ξ_1, ξ_2 verify (P5), then ξ_1 + ξ_2 and ξ_1 ∘ ℓ verify (P5).
The main point of interest of this lemma is that even if two functions satisfy properties (P4) and (P5) with different values of q and γ, their sum still verifies those properties. To prove Lemma 7, we will need the following result.

Lemma 8 (Minimum of two K_∞ functions). If q_1 and q_2 are two K_∞ functions, then so is the function q defined by

    q(λ) = min_{i ∈ {1,2}} q_i(λ) for all λ ∈ R_{≥0}.   (66)

Proof: We have to prove that q is continuous, strictly increasing and satisfies q(0) = 0 and lim_{λ→+∞} q(λ) = +∞. First of all, it is clear that q(0) = 0. Also, continuity of q is immediate from that of q_1 and q_2 by noting that q = (q_1 + q_2 − |q_1 − q_2|) / 2. To see the strict increasingness of q, consider λ_1 and λ_2 in R_{≥0} such that λ_1 < λ_2. Then q(λ_1) ≤ q_1(λ_1) < q_1(λ_2) and q(λ_1) ≤ q_2(λ_1) < q_2(λ_2). It follows that q(λ_1) < min_{i ∈ {1,2}} q_i(λ_2) = q(λ_2), hence q is strictly increasing. We now show that q(λ) tends to infinity as λ → +∞. Let M > 0 be an arbitrary positive number. Since q_1 and q_2 tend to infinity, there exist η_1 and η_2 such that λ ≥ η_1 ⇒ q_1(λ) ≥ M and λ ≥ η_2 ⇒ q_2(λ) ≥ M. Taking η = max_{i ∈ {1,2}} η_i, it holds that q(λ) ≥ M whenever λ ≥ η, or equivalently, lim_{λ→+∞} q(λ) = +∞.

Proof of Lemma 7: The sum ξ_1 + ξ_2 clearly has the properties (P1)–(P3) as a sum of continuous, even, positive definite functions.
Moreover, the composition of a continuous, even, positive definite function with an injective linear mapping yields a continuous, even, positive definite function, so ξ_1 ∘ ℓ satisfies properties (P1)–(P3) too.

Proof of (j): Assume that ξ_1 and ξ_2 satisfy (P4) with K_∞ functions q_1 and q_2 respectively. For all λ ≠ 0 and all Z ∈ R^{a×b}, (9) yields

    ξ_i(Z) ≥ min_{j ∈ {1,2}} q_j(1/|λ|) ξ_i(λZ).   (67)

If we define q by q(λ) = min_{i ∈ {1,2}} q_i(λ) for all λ ∈ R_{≥0}, then q is a K_∞ function (see Lemma 8 above) such that for all λ ≠ 0 and Z ∈ R^{a×b},

    ξ_1(Z) + ξ_2(Z) ≥ q(1/|λ|) (ξ_1(λZ) + ξ_2(λZ)),   (68)

therefore ξ_1 + ξ_2 verifies property (P4). Besides, for all λ ≠ 0 and Z in R^{c×d},

    ξ_1(ℓ(Z)) ≥ q_1(1/|λ|) ξ_1(λ ℓ(Z)) = q_1(1/|λ|) ξ_1(ℓ(λZ))   (69)

by linearity of ℓ. We can then conclude that ξ_1 ∘ ℓ also verifies property (P4).

Proof of (jj): Assume that ξ_1 and ξ_2 satisfy (P5) with constants γ_1 and γ_2 respectively. Let γ = min_{i ∈ {1,2}} γ_i. Similarly to the first case, for all Z_1, Z_2 in R^{a×b} and i in {1,2}, (10) yields

    ξ_i(Z_1 − Z_2) ≥ γ ξ_i(Z_1) − ξ_i(Z_2),   (70)

which gives

    ξ_1(Z_1 − Z_2) + ξ_2(Z_1 − Z_2) ≥ γ (ξ_1(Z_1) + ξ_2(Z_1)) − (ξ_1(Z_2) + ξ_2(Z_2)),   (71)

therefore ξ_1 + ξ_2 satisfies property (P5). Moreover, for all Z_1 and Z_2 in R^{c×d},

    ξ_1(ℓ(Z_1 − Z_2)) = ξ_1(ℓ(Z_1) − ℓ(Z_2)) ≥ γ ξ_1(ℓ(Z_1)) − ξ_1(ℓ(Z_2)),   (72)

so ξ_1 ∘ ℓ satisfies (P5) too.

B. Proof of Lemma 3

(i) ⇒ (ii): Assuming that the system is observable on the interval [0, T−1], we need to prove that there exists a K_∞ function q which verifies (14). The idea of the proof is to apply Lemma 7 to the function F on R^{n×T} defined by F(Z) = V_Σ(0, Z), with V_Σ defined as in (5).
To begin with, we note that F can be decomposed as F = ξ ∘ ℓ, where ξ : R^{n×(T−1)} × R^{n_y×T} → R_{≥0} is a loss function such that for Z = [z_0 ··· z_{T−2}] in R^{n×(T−1)} and Y = [y_0 ··· y_{T−1}] in R^{n_y×T},

    ξ(Z, Y) = Σ_{t=0}^{T−2} φ_t(z_t) + Σ_{t=0}^{T−1} ψ_t(y_t),

and ℓ : R^{n×T} → R^{n×(T−1)} × R^{n_y×T} is a linear mapping such that for all Z = [z_0 ··· z_{T−1}] in R^{n×T},

    ℓ(Z) = ( [z_1 − A_0 z_0 ··· z_{T−1} − A_{T−2} z_{T−2}], [C_0 z_0 ··· C_{T−1} z_{T−1}] ).

To apply Lemma 7 to F, we need to check that F fulfills the properties (P1)–(P3). In virtue of the assumptions on φ_t and ψ_t agreed upon in the statement of the lemma, the first two properties are obviously satisfied. The third will be satisfied if ℓ is injective, a property which we now check. Let Z be such that ℓ(Z) = 0. Then

    z_{t+1} − A_t z_t = 0 for all t ∈ {0, ..., T−2},   (73)
    C_t z_t = 0 for all t ∈ {0, ..., T−1}.   (74)

An immediate consequence of (73)–(74) is that O_{0,T−1} z_0 = 0, which yields z_0 = 0 because the system is observable on [0, T−1]. Therefore, thanks to the recursive relation (73), we can conclude that Z = 0, and so the linear mapping ℓ is injective. We can therefore apply Lemma 7 to conclude that F indeed satisfies (P1)–(P4).

Now, consider a matrix norm ‖·‖_ind on R^{n×T} induced by two vector norms ‖·‖_T and ‖·‖ defined respectively on R^T and R^n, in the sense that

    ‖Z‖_ind = sup_{η ∈ R^T, η ≠ 0} ‖Zη‖ / ‖η‖_T.

Applying Lemma 2 to F with the so-defined induced norm, we infer that there exist D > 0 defined as in (12) and a K_∞ function q_0 such that for all Z in R^{n×T},

    F(Z) ≥ D q_0(‖Z‖_ind).   (75)

If we denote by e_1 the canonical vector of R^T with all entries equal to zero except the first one, which is equal to 1, then Z e_1 = z_0. Moreover, by definition of the induced norm, we know that ‖Z e_1‖ / ‖e_1‖_T ≤ ‖Z‖_ind.
Therefore, since q_0 is an increasing function, we get q_0(‖z_0‖ / ‖e_1‖_T) ≤ q_0(‖Z‖_ind). Posing q : λ ↦ D q_0(λ / ‖e_1‖_T), it is easy to see that q is a K_∞ function such that for all Z in R^{n×T}, V(0, Z) = F(Z) ≥ q(‖z_0‖).

(ii) ⇒ (i): Assume that there exists q in K_∞ such that (14) holds for all Z = [z_0 z_1 ··· z_{T−1}] in R^{n×T}. We want to prove that the matrix O_{0,T−1} defined in (13) has full column rank, which is equivalent to showing that, for z in R^n, O_{0,T−1} z = 0 implies z = 0. For any z ∈ R^n, construct a sequence Z* = [z*_0 ··· z*_{T−1}] as follows: z*_0 = z and z*_{t+1} = A_t z*_t for all t ∈ {0, ..., T−2}. Since the inequality (14) holds for any sequence, it holds in particular for the sequence {z*_t} defined above. Applying it to Z* yields

    V(0, Z*) = Σ_{t=0}^{T−1} ψ_t(C_t z*_t) ≥ q(‖z*_0‖).   (76)

Now, observe that if O_{0,T−1} z = 0, then it follows from the recursive relation z*_{t+1} = A_t z*_t that C_t z*_t = 0 for all t in {0, ..., T−1}. Injecting this into (76) imposes q(‖z*_0‖) ≤ 0, which necessarily implies z = 0 since q is a K_∞ function. Therefore, the matrix O_{0,T−1} is injective and the system is observable on the interval [0, T−1].

C. Technical results for proving Corollary 2

This section contains some technical steps of the proof of Corollary 2.

Lemma 9. If ℓ : R^{n_y} → R_{≥0} satisfies (P1)–(P3) and (P5), then so does the function ψ defined by ψ(y) = 1 − e^{−ℓ(y)}. Moreover, if ℓ fulfills (P4), then ψ satisfies the same property but with a function q in K_{sat,a} for a = 1.

Proof: It is straightforward to check that ψ obeys (P1)–(P3). By assumption, ℓ obeys (P5); denote the associated constant by γ_ℓ (which, by (10), is necessarily less than or equal to 1).
To see that (P5) is also satisfied by ψ, we just need to check that

    ψ(a + b) − γ̄_ℓ ψ(a) − γ̄_ℓ ψ(b) ≤ 0 for all (a, b) ∈ R^{n_y} × R^{n_y},   (77)

with γ̄_ℓ = γ_ℓ^{−1} ≥ 1, which is equivalent to

    1 − 2γ̄_ℓ + γ̄_ℓ e^{−ℓ(a)} + γ̄_ℓ e^{−ℓ(b)} − e^{−ℓ(a+b)} ≤ 0.

Noting that ℓ(a + b) ≤ γ̄_ℓ ℓ(a) + γ̄_ℓ ℓ(b), we have −e^{−ℓ(a+b)} ≤ −e^{−γ̄_ℓ ℓ(a) − γ̄_ℓ ℓ(b)}. From this it follows that, for (77) to hold, it is enough that

    1 − 2γ̄_ℓ + γ̄_ℓ e^{−ℓ(a)} + γ̄_ℓ e^{−ℓ(b)} − e^{−γ̄_ℓ ℓ(a) − γ̄_ℓ ℓ(b)} ≤ 0.

Posing α = e^{−ℓ(a)} and β = e^{−ℓ(b)}, it suffices that

    1 − 2γ̄_ℓ + γ̄_ℓ α + γ̄_ℓ β − (αβ)^{γ̄_ℓ} ≤ 0 for all (α, β) ∈ ]0, 1],

which can indeed be checked to be true by applying the inequality 1 + γ̄_ℓ α − γ̄_ℓ ≤ α^{γ̄_ℓ}, see e.g. [3, Fact 1.9.2]. In effect, it follows from this inequality that

    1 − 2γ̄_ℓ + γ̄_ℓ α + γ̄_ℓ β − (αβ)^{γ̄_ℓ} = (1 + γ̄_ℓ α − γ̄_ℓ) + (1 + γ̄_ℓ β − γ̄_ℓ) − 1 − (αβ)^{γ̄_ℓ}
        ≤ α^{γ̄_ℓ} + β^{γ̄_ℓ} − 1 − (αβ)^{γ̄_ℓ} = −(1 − α^{γ̄_ℓ})(1 − β^{γ̄_ℓ}) ≤ 0.

In conclusion, (77) holds and therefore ψ satisfies (P5). It now remains to check (P4). This follows directly from Lemma 10 below, from which we know that ψ(y) ≥ q_⋆(1/λ) ψ(λy), where q_⋆ is a saturated function in K_{sat,1}.

Lemma 10. Let ℓ : R^{n_y} → R_{≥0} be a function satisfying properties (P1)–(P2) and (P4). In particular, assume that property (P4) is satisfied by ℓ with a K_∞ function q such that (9) holds with equality. Let g(y, λ) = (1 − e^{−ℓ(y)}) / (1 − e^{−ℓ(y/λ)}) for λ ≠ 0 and y ≠ 0. Then the function q_⋆ : R_{≥0} → [0, 1] defined by q_⋆(λ) = inf_{y ≠ 0} g(y, λ) for λ > 0 and q_⋆(0) = 0 is well-defined, continuous, and strictly increasing on [0, 1]. Moreover, we have

    1 − e^{−ℓ(y)} ≥ q_⋆(1/λ) (1 − e^{−ℓ(λy)}) for all (λ, y) ∈ R_{>0} × R^{n_y}.

Proof: Since g is positive on its domain (hence lower bounded), the infimum defining q_⋆ is well-defined. Pose a = e^{−ℓ(y)}.
Then, using the continuity of ℓ and its radial unboundedness (see Lemma 2), we see that the range of a as y runs over R^{n_y} \ {0} is ]0, 1[. From the assumptions of the lemma, ℓ(y/λ) = q(1/λ) ℓ(y) for all y and all λ > 0, so q(1) = 1 and e^{−ℓ(y/λ)} = a^{q(1/λ)}. For all λ > 0 we can therefore write

    q_⋆(λ) = inf_{y ≠ 0} g(y, λ) = inf_{a ∈ ]0,1[} (1 − a) / (1 − a^{q(1/λ)}),

with q(1/λ) ≥ 1 for 0 < λ ≤ 1 and q(1/λ) < 1 for λ > 1. We therefore obtain

    q_⋆(λ) = 1 / q(1/λ) if 0 < λ ≤ 1, and q_⋆(λ) = 1 otherwise.

The so-obtained q_⋆ is clearly continuous wherever it is well-defined. Moreover, since lim_{λ→0} q_⋆(λ) = q_⋆(0) = 0, we conclude that q_⋆ is continuous on its entire domain. From the properties of q, we deduce that q_⋆ is strictly increasing on [0, 1]. Lastly, we observe that the inequality in the statement of the lemma is a direct consequence of the definition of q_⋆.

D. Proof of Lemma 6

The starting point of the proof is the observation that, for every integer r in {1, ..., T}, p̃_r ≤ r p̃_1. Hence it suffices to show that p̃_1 = 1/b_1 with b_1 as expressed in (63). Recall that, by definition,

    p̃_1 = sup_{(t,i) ∈ T×S} sup_{Z ∈ R^{n×T}, Z ≠ 0} |c_ti^T z_t| / H_Σ(Z).   (78)

Without loss of generality, assume that c_ti ≠ 0 for all (t, i) ∈ T × S. Then for any (t, i),

    sup_{Z ≠ 0} |c_ti^T z_t| / H_Σ(Z) = sup_{Z ≠ 0} { |c_ti^T z_t| / H_Σ(Z) : c_ti^T z_t ≠ 0 } =: 1/β_ti,

where

    β_ti = inf_{Z ≠ 0} { H_Σ(Z) / |c_ti^T z_t| : c_ti^T z_t ≠ 0 }
         = inf_{Z ≠ 0} { H_Σ(Z) : |c_ti^T z_t| = 1 }
         = inf_{Z ≠ 0} { H_Σ(Z) : c_ti^T z_t = 1 }.

Recalling that H_Σ(Z) is a norm under the conditions of the lemma, the second equality in the expression of β_ti above follows from the homogeneity property of norms.
As to the last equality, it follows from the fact that c_ti^T z_t is a scalar, which makes it possible to replace the constraint |c_ti^T z_t| = 1 indifferently by either c_ti^T z_t = 1 or c_ti^T z_t = −1. Now, invoking the definition of p̃_1, it can be seen that

    p̃_1 = sup_{(t,i) ∈ T×S} 1/β_ti = 1 / inf_{(t,i) ∈ T×S} β_ti = 1/b_1.

REFERENCES

[1] L. Bako. Identification of switched linear systems via sparse optimization. Automatica, 47:668–677, 2011.
[2] L. Bako. On a class of optimization-based robust estimators. IEEE Transactions on Automatic Control, 62:5990–5997, 2017.
[3] D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2009.
[4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[5] E. Candès and P. A. Randall. Highly robust error correction by convex programming. IEEE Transactions on Information Theory, 54:2829–2840, 2008.
[6] E. J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathématique, 346(9-10):589–592, 2008.
[7] E. J. Candès and M. B. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25:21–30, 2008.
[8] A. Cardenas, S. Amin, and S. Sastry. Secure control: Towards survivable cyber-physical systems. In The 28th International Conference on Distributed Computing Systems Workshops, 2008.
[9] Y. H. Chang, Q. Hu, and C. J. Tomlin. Secure estimation based Kalman filter for cyber-physical systems against sensor attacks. Automatica, 95:399–412, 2018.
[10] H. Fawzi, P. Tabuada, and S. Diggavi. Secure estimation and control for cyber-physical systems under adversarial attacks. IEEE Transactions on Automatic Control, 59(6):1454–1467, 2014.
[11] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Birkhäuser, 2013.
[12] M. C. Grant and S. P. Boyd. CVX: Matlab software for disciplined convex programming, version 2.1. 2017.
[13] D. Han, Y. Mo, and L. Xie. Convex optimization based state estimation against sparse integrity attacks. IEEE Transactions on Automatic Control, 64:2383–2395, 2019.
[14] P. J. Huber and E. M. Ronchetti. Robust Statistics. John Wiley & Sons (2nd ed.), 2009.
[15] C. M. Kellett. A compendium of comparison function results. Mathematics of Control, Signals, and Systems, 26:339–374, 2014.
[16] A. Kircher, L. Bako, E. Blanco, M. Benallouch, and A. Korniienko. Analysis of resilience for a state estimator for time-discrete linear systems. In 2020 American Control Conference.
[17] S. Mishra, Y. Shoukry, N. Karamchandani, S. N. Diggavi, and P. Tabuada. Secure state estimation against sensor attacks in the presence of noise. IEEE Transactions on Control of Network Systems, 4:49–59, 2017.
[18] Y. Mo and B. Sinopoli. Secure estimation in the presence of integrity attacks. IEEE Transactions on Automatic Control, 60:1145–1151, 2015.
[19] M. Pajic, I. Lee, and G. J. Pappas. Attack-resilient state estimation for noisy dynamical systems. IEEE Transactions on Control of Network Systems, 4:82–92, 2016.
[20] F. Pasqualetti, F. Dörfler, and F. Bullo. Attack detection and identification in cyber-physical systems. IEEE Transactions on Automatic Control, 58:2715–2729, 2013.
[21] X. Ren, Y. Mo, J. Chen, and K. H. Johansson. Secure state estimation with byzantine sensors: A probabilistic approach. Manuscript, https://arxiv.org/abs/1903.05698, 2019.
[22] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer Science & Business Media, 2009.
[23] Y. Sharon, J. Wright, and Y. Ma. Minimum sum of distances estimator: Robustness and stability. In American Control Conference, St. Louis, MO, USA, pages 524–530, 2009.
[24] Y. Shoukry and P. Tabuada. Event-triggered state observers for sparse sensor noise/attacks. IEEE Transactions on Automatic Control, 61:2079–2091, 2016.
[25] B. Sinopoli, L. Schenato, M.
Franceschetti, K. Poolla, M. Jordan, and S. Sastry. Kalman filtering with intermittent observations. IEEE Transactions on Automatic Control, 49:1453–1464, 2004.
