Decentralized Cooperative Online Estimation With Random Observation Matrices, Communication Graphs and Time Delays
Authors: Jiexiang Wang, Tao Li, Xiwei Zhang
Jiexiang Wang, Tao Li, Senior Member, IEEE, Xiwei Zhang

Abstract—We analyze convergence of decentralized cooperative online estimation algorithms by a network of multiple nodes via information exchange in an uncertain environment. Each node has a linear observation of an unknown parameter with randomly time-varying observation matrices. The underlying communication network is modeled by a sequence of random digraphs and is subject to nonuniform random time-varying delays in channels. Each node runs an online estimation algorithm consisting of a consensus term, which takes a weighted sum of its own estimate and its neighbours' delayed estimates, and an innovation term, which processes its own new measurement at each time step. Using stochastic time-varying system and martingale convergence theories, together with the binomial expansion of random matrix products, we transform the convergence analysis of the algorithm into that of the mathematical expectation of random matrix products. Firstly, for the delay-free case, we show that the algorithm gains can be designed properly such that all nodes' estimates converge to the true parameter in mean square and almost surely if the observation matrices and communication graphs satisfy the stochastic spatio-temporal persistence of excitation condition. Secondly, for the case with time delays, we introduce delay matrices to model the random time-varying communication delays between nodes. It is shown that under the stochastic spatio-temporal persistence of excitation condition, for any given bounded delays, proper algorithm gains can be designed to guarantee mean square convergence for the case with conditionally balanced digraphs.

*Corresponding Author: Tao Li. This work is supported by the National Natural Science Foundation of China under Grant 61977024.

December 8, 2020 DRAFT
Please address all correspondence to Tao Li: Phone: +86-21-54342646-318, Fax: +86-21-54342609, Email: tli@math.ecnu.edu.cn. Jiexiang Wang is with the School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200072, China. Tao Li and Xiwei Zhang are with the Key Laboratory of Pure Mathematics and Mathematical Practice, School of Mathematical Sciences, East China Normal University, Shanghai 200241, China.

Index Terms—Decentralized online estimation, cooperative estimation, random graph, random time delay, persistence of excitation.

I. INTRODUCTION

Estimation algorithms have important applications in many fields, e.g. navigation systems, space exploration, machine learning and power systems ([1]-[4]). In a power system, measurement devices such as remote terminal units and phasor measurement units send the measured active and reactive power flows, bus injection powers and voltage amplitudes to the Supervisory Control and Data Acquisition (SCADA) system; the voltage amplitudes and phase angles at all buses are then estimated for secure and stable operation of the system ([5]-[6]). Generally speaking, estimation algorithms fall into two main categories in terms of information structure: centralized and decentralized algorithms. In a centralized algorithm, a fusion center collects all nodes' measurements and gives the global estimate. This structure relies heavily on the fusion center and lacks robustness and security. In a decentralized algorithm, a network of multiple nodes is employed to cooperatively estimate the unknown parameter via information exchange, where each node is an entity with integrated capacities of sensing, computing and communication, and occasional node/link failures may not destroy the entire estimation task. Hence, decentralized cooperative estimation algorithms are more robust than centralized ones ([7]-[8]).
There exist various kinds of uncertainties in real networks. For example, sensors are usually powered by chemical or solar cells, and the unpredictability of cell power leads to random node/link failures, which can be modeled by a sequence of random communication graphs. Besides, node sensing failures or measurement losses ([9]) can be modeled by a sequence of random observation matrices. There is a large literature on decentralized online estimation problems with random graphs. Ugrinovskii [10] studied decentralized estimation with Markovian switching graphs. Kar & Moura [11] and Sahu et al. [12] considered decentralized estimation with i.i.d. graph sequences, where Kar & Moura [11] showed that the algorithm achieves weak consensus under a weak distributed detectability condition, and Sahu et al. [12] proved that the algorithm converges almost surely if the mean graph is balanced and strongly connected. Simões & Xavier [13] proposed a decentralized estimation algorithm with i.i.d. undirected graphs and proved that the convergence rate of the mean square estimation error is asymptotically equal to that of the centralized algorithm. Decentralized cooperative online estimation based on diffusion strategies was addressed in [14]-[18] with spatio-temporally independent observation matrices, i.e. the sequence of observation matrices of each node is an independent random process and those of different nodes are mutually independent. Piggott & Solo [19]-[20] studied decentralized estimation with temporally correlated observation matrices and a fixed communication graph. Ishihara & Alghunaim [21] studied decentralized estimation with spatially independent observation matrices. Kar et al. [22] and Kar & Moura [23] proposed consensus+innovations decentralized estimation algorithms with random graphs and observation matrices, where the sequences of communication graphs and observation matrices are both i.i.d.
They proved that the algorithm converges almost surely if the mean graph is balanced and strongly connected. Zhang & Zhang [24] considered decentralized estimation with finite Markovian switching graphs and i.i.d. observation matrices, and proved that the algorithm converges in mean square and almost surely if all graphs are balanced and jointly contain a spanning tree. Zhang et al. [25] proposed a robust decentralized estimation algorithm where the communication graphs and observation matrices are mutually independent and each forms an uncorrelated sequence. In summary, most existing literature on decentralized cooperative estimation algorithms required balanced mean graphs and special statistical properties of the communication graphs and observation matrices, such as i.i.d. or Markovian switching graph sequences, or spatially or temporally independent observation matrices with fixed mathematical expectations, which are also independent of the communication graphs. Besides random communication graphs and observation matrices, random communication delays are also common in real systems ([26]-[28]). Due to congestion of communication links and external interference, time delays are usually random and time-varying, and their probability distributions can be approximately estimated by statistical methods. However, to the best of our knowledge, there has been no literature on decentralized online estimation with general random time-varying communication delays. Zhang et al. [29] and Millán et al. [30] considered decentralized estimation with uniform deterministic time-invariant and time-varying communication delays, respectively, where Millán et al. [30] established an LMI-type convergence condition by the Lyapunov-Krasovskii functional method. In this paper, we analyze convergence of decentralized cooperative online parameter estimation algorithms with random observation matrices, communication graphs and time delays.
Each node's algorithm consists of a consensus term taking a weighted sum of its own estimate and the delayed estimates of its neighbouring nodes, and an innovation term processing its own new measurement at each time step. The sequences of observation matrices, communication graphs and time delays are not required to satisfy special statistical properties, such as mutual independence or spatio-temporal independence. Furthermore, neither the sample paths of the random graphs nor the mean graphs are necessarily balanced and connected at each time step. These relaxations, together with the existence of random time-varying delays, bring essential difficulties to the convergence analysis, and most existing methods are not applicable. For example, the frequency domain approach ([29],[31]) is only suitable for deterministic uniform time-invariant delays, and the Lyapunov-Krasovskii functional method leads to a non-explicit LMI-type convergence condition ([30]). Liu et al. [32] and Liu et al. [33] addressed distributed consensus with deterministic time-varying communication delays and i.i.d. communication graphs. The analysis method therein required the mean graph to be time-invariant and connected at each time step, and is not applicable to time-varying mean graphs. We introduce delay matrices to model the random time-varying communication delays between each pair of nodes. Using stochastic time-varying system and martingale convergence theories, together with the binomial expansion of random matrix products, we transform the convergence analysis of the algorithm into that of the mathematical expectation of random matrix products.
Firstly, for the delay-free case, we show that the algorithm gains can be designed properly such that all nodes' estimates converge to the true parameter in mean square and almost surely if the observation matrices and communication graphs satisfy the stochastic spatio-temporal persistence of excitation condition. Especially, it is shown that for Markovian switching communication graphs and observation matrices, this condition holds if the stationary graph is balanced with a spanning tree and the measurement model is spatio-temporally jointly observable. Secondly, for the case with time delays, we propose several conditions for mean square convergence, which explicitly rely on the conditional expectations of the delay matrices, observation matrices and weighted adjacency matrices of the communication graphs over a sequence of fixed-length time intervals. Furthermore, we show that if the communication graphs are conditionally balanced, then under the stochastic spatio-temporal persistence of excitation condition, for any given bounded delays, proper algorithm gains can be designed to guarantee mean square convergence of the algorithm. Compared with the existing literature, our contributions are summarized as below.

• The delay-free case

– We show that it is not necessary that the sequences of observation matrices and communication graphs be mutually independent or spatio-temporally independent. Also, the mean graphs are not necessarily time-invariant and balanced. We establish the stochastic spatio-temporal persistence of excitation condition, under which the algorithm with random graphs and observation matrices converges in mean square and almost surely. For a network consisting of completely isolated nodes, the stochastic spatio-temporal persistence of excitation condition degenerates to a set of independent stochastic persistence of excitation conditions for centralized algorithms ([38]).
– Especially, for the case with Markovian switching communication graphs and observation matrices, we prove that the stochastic spatio-temporal persistence of excitation condition holds if the stationary graph is balanced with a spanning tree and the measurement model is spatio-temporally jointly observable, implying that neither local observability of each node nor instantaneous global observability of the entire measurement model is necessary.

• The case with time delays

– We introduce delay matrices to model the random time-varying delays between each pair of nodes. By the method of binomial expansion of random matrix products, we obtain several conditions for mean square convergence, which explicitly rely on the conditional expectations of the delay matrices, observation matrices and weighted adjacency matrices of the communication graphs over a sequence of fixed-length time intervals. These conditions show that for given algorithm gains, the communication graphs and observation matrices need to be persistently excited with enough intensity to mitigate the random time delays. We further show that if the stochastic spatio-temporal persistence of excitation condition holds, then for any given bounded delays, proper algorithm gains can be designed to guarantee mean square convergence of the algorithm for the case with conditionally balanced digraphs.

– The nonuniform random time-varying communication delays considered in this paper are more general, and we allow correlated communication delays, graphs and observation matrices.

The rest of the paper is arranged as follows. In Section II, we formulate the problem. In Section III, we describe the decentralized cooperative online parameter estimation algorithm with random observation matrices, communication graphs and time delays. We carry out the convergence analysis for the delay-free case and the case with time delays in Sections IV and V, respectively.
In Section VI, we give a numerical example to demonstrate the theoretical results. Finally, we conclude the paper and give some future topics in Section VII.

Notation and symbols: ◦: the Hadamard product; ⊗: the Kronecker product; Tr(A): the trace of matrix A; ‖A‖: the 2-norm of matrix A; A^T: the transpose of matrix A; P{A}: the probability of event A; I_n: the n-dimensional identity matrix; ρ(A): the spectral radius of matrix A; |a|: the absolute value of real number a; R^n: the n-dimensional real vector space; A ≥ B: the matrix A − B is positive semidefinite; ⌊x⌋: the largest integer less than or equal to x; ⌈x⌉: the smallest integer greater than or equal to x; E[ξ]: the mathematical expectation of random variable ξ; λ_min(A): the minimum eigenvalue of real symmetric matrix A; 1_n: the n-dimensional column vector with all entries being one; 0_{n×m}: the n × m dimensional matrix with all entries being zero; b_n = O(r_n): lim sup_{n→∞} |b_n|/r_n < ∞, where {b_n, n ≥ 0} is a sequence of real numbers and {r_n, n ≥ 0} is a sequence of positive real numbers; b_n = o(r_n): lim_{n→∞} b_n/r_n = 0. For a sequence of n × n dimensional matrices {Z(k), k ≥ 0} and a sequence of scalars {c(k), k ≥ 0}, denote Φ_Z(j, i) = Z(j)···Z(i) if j ≥ i and Φ_Z(j, i) = I_n if j < i, and ∏_{k=i}^{j} c(k) = c(j)···c(i) if j ≥ i and 1 if j < i. For any nonnegative integers i and j, denote the Kronecker function by I_{i,j}, satisfying I_{i,j} = 1 if i = j and I_{i,j} = 0 otherwise.

II. PROBLEM FORMULATION

A. Measurement model

Consider a network of N nodes. Each node is an estimator with integrated capacities of sensing, computing, storage and communication. The estimators/nodes cooperatively estimate an unknown parameter vector x_0 ∈ R^n via information exchange.
The relation between the measurement vector z_i(k) ∈ R^{n_i} of estimator i and the unknown parameter x_0 is represented by

z_i(k) = H_i(k) x_0 + v_i(k), i = 1, ..., N, k ≥ 0. (1)

Here, H_i(k) ∈ R^{n_i × n} is the random observation (regression) matrix at time instant k with n_i ≤ n, and v_i(k) ∈ R^{n_i} is the additive measurement noise. Denote z(k) = [z_1^T(k), ..., z_N^T(k)]^T, H(k) = [H_1^T(k), ..., H_N^T(k)]^T and v(k) = [v_1^T(k), ..., v_N^T(k)]^T. Rewrite (1) in the compact form

z(k) = H(k) x_0 + v(k), k ≥ 0. (2)

Remark 1. In many real applications, the relations between the unknown parameter and the measurements can be represented by (1). For example, in decentralized multi-area state estimation in power systems, the grid is partitioned into multiple geographically non-overlapping areas, and each area is regarded as a node. The grid state x_0 to be estimated consists of the voltage amplitudes and phase angles at all buses. The measurement z_i(k) of each area/node consists of the active and reactive power flows, bus injection powers and voltage amplitude information measured by remote terminal units and phasor measurement units in the i-th area. By the DC power flow approximation ([34]), the grid state degenerates to the voltage phase angles at all buses, and the relation between the measurement of each area and the grid state can be represented by (1). In decentralized parameter identification, each node's measurement equation is given by

z_i(k) = Σ_{j=1}^{n} c_j z_i(k − j) + v_i(k) = [z_i(k − 1), ..., z_i(k − n)][c_1, ..., c_n]^T + v_i(k).

For this case, the unknown parameter is x_0 = [c_1, ..., c_n]^T and the observation matrix (generally called the regressor) H_i(k) = [z_i(k − 1), ..., z_i(k − n)] is an n-dimensional row vector.
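As a toy illustration of the measurement model (1) and its stacked form (2), the following sketch draws one noisy linear observation per node. The network size, parameter dimension, noise level and the random choice of H_i(k) are hypothetical values for illustration only, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration (not from the paper):
# N = 3 nodes, parameter dimension n = 2, scalar measurements (n_i = 1).
N, n = 3, 2
x0 = np.array([1.0, -1.0])  # the unknown parameter to be estimated

def observe(k):
    """One noisy linear observation z_i(k) = H_i(k) x0 + v_i(k), as in (1).

    H_i(k) is drawn at random here to mimic a randomly time-varying
    observation matrix; a real application would use measured regressors."""
    H_ik = rng.standard_normal((1, n))    # random 1 x n observation matrix
    v_ik = 0.1 * rng.standard_normal(1)   # additive measurement noise
    return H_ik, H_ik @ x0 + v_ik

# Stack one observation per node, i.e. the compact form (2).
H_blocks, z_blocks = zip(*[observe(0) for _ in range(N)])
H_stack = np.vstack(H_blocks)   # H(k), here of size N x n
z_stack = np.hstack(z_blocks)   # z(k)

# The residual z(k) - H(k) x0 equals the stacked noise v(k), which is small.
assert np.all(np.abs(z_stack - H_stack @ x0) < 1.0)
```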
In addition, sensing failures in real networks can be modeled by a Markov chain or an i.i.d. sequence of Bernoulli variables {δ_i(k), k ≥ 0}. Then H_i(k) = δ_i(k) H_i^0(k), where {H_i^0(k), k ≥ 0} is the sequence of observation matrices without sensing failures.

B. Communication models

Assume that there exist nonuniform random time-varying communication delays on the links between each pair of nodes. We use a sequence of random variables {λ_ij(k) ∈ {0, ..., d}, k ≥ 0} to represent the time delays associated with the link from node j to node i, where the positive integer d represents the maximum time delay. This sequence is subject to the discrete probability distribution

P{λ_ij(k) = q} = p_{ij,q}(k) with Σ_{q=0}^{d} p_{ij,q}(k) = 1. (3)

We stipulate that P{λ_ii(k) = 0} = 1, i = 1, ..., N, k ≥ 0. Denote the N-dimensional matrices I(k, q) = [I_{λ_ij(k),q}]_{1≤j,i≤N}, 0 ≤ q ≤ d, k ≥ 0, called delay matrices. By the definition of the Kronecker function, we know that for each q = 0, 1, ..., d, {I(k, q), k ≥ 0} is a sequence of random matrices and its sample paths are sequences of 0-1 matrices. By (3), we know that E[I_{λ_ij(k),q}] = p_{ij,q}(k) and

Σ_{q=0}^{d} I(k, q) = 1_N 1_N^T a.s. (4)

We use a sequence of random communication graphs {G(k) = <V, A_G(k)>, k ≥ 0} to describe the possible link failures among nodes, where V = {1, ..., N} is the node set and A_G(k) = [a_ij(k)]_{1≤i,j≤N} is the weighted adjacency matrix of the communication graph, in which a_ii(k) = 0 a.s. for all i ∈ V and k ≥ 0, and a_ij(k) ≠ 0 if and only if the link from node j to node i exists at time instant k, for all i ≠ j. The neighbourhood of node i is N_i(k) = {j | a_ij(k) ≠ 0}.
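The delay-matrix construction above can be checked numerically with a quick sketch; the network size and maximum delay below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 4, 2  # illustrative network size and maximum delay (not from the paper)

# Sample one realization of the delays lambda_ij(k) in {0, ..., d};
# self-delays are zero, as stipulated in the text.
lam = rng.integers(0, d + 1, size=(N, N))
np.fill_diagonal(lam, 0)

# Delay matrices I(k, q): each entry is the Kronecker function
# I_{lambda_ij(k), q}, i.e. 1 iff the delay on the corresponding link equals q.
I_q = [(lam == q).astype(float) for q in range(d + 1)]

# Each sample path is a 0-1 matrix, and property (4) holds: the delay
# matrices sum to the all-ones matrix 1_N 1_N^T, since each delay takes
# exactly one value in {0, ..., d}.
assert all(set(np.unique(M)) <= {0.0, 1.0} for M in I_q)
assert np.array_equal(sum(I_q), np.ones((N, N)))
```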
The degree matrix of the graph is D_G(k) = diag(Σ_{j=1}^{N} a_{1j}(k), ..., Σ_{j=1}^{N} a_{Nj}(k)) and the Laplacian matrix of the graph is L_G(k) = D_G(k) − A_G(k) ([36][37]). Denote L̂_G(k) = (L_G(k) + L_G^T(k))/2. Specifically, if G(k) is balanced, then L̂_G(k) is the Laplacian matrix of the symmetrized graph of G(k), k ≥ 0 ([37]). Let

A(k, q) = (A_G(k) ◦ I(k, q)) ⊗ I_n. (5)

Then, by (4) and the above, we have

Σ_{q=0}^{d} A(k, q) = A_G(k) ⊗ I_n. (6)

III. DECENTRALIZED COOPERATIVE ONLINE ESTIMATION ALGORITHM

Let x_i(k) ∈ R^n be the estimate by node i of the unknown parameter x_0 at time instant k, k ≥ −d, with the initial estimates x_i(k), −d ≤ k ≤ 0, being any given real vectors. Starting from the initial estimate, at any time instant k ≥ 0, node i takes a weighted sum of its own estimate and the delayed estimates received from its neighbours, and then adds a correction term based on the local measurement information (innovation) to update the estimate x_i(k + 1). Specifically, the decentralized cooperative online parameter estimation algorithm with random observation matrices, communication graphs and time delays, motivated by a baseline version without time delays in [23], is given by

x_i(k + 1) = x_i(k) + a(k) H_i^T(k)(z_i(k) − H_i(k) x_i(k)) + b(k) Σ_{j∈N_i(k)} a_ij(k)(x_j(k − λ_ij(k)) − x_i(k)), i ∈ V, k ≥ 0, (7)

where a(k) and b(k) are the innovation and consensus algorithm gains, respectively. Denote the σ-fields F(k) = σ(A_G(s), v(s), H_i(s), λ_ij(s), j, i ∈ V, 0 ≤ s ≤ k), k ≥ 0, with F(−1) = {Ω, ∅}. For the algorithm (7), we have the following assumptions.

A1.a The sequence {v(k), k ≥ 0} is independent of {H(k), k ≥ 0}, {A_G(k), k ≥ 0} and {λ_ij(k), j, i ∈ V, k ≥ 0}.
A1.b The sequence {v(k), F(k), k ≥ 0} is a martingale difference sequence and there exists a constant β_v > 0 such that sup_{k≥0} E[‖v(k)‖^2 | F(k − 1)] ≤ β_v a.s.

A2.a sup_{k≥0} ‖H(k)‖ < ∞ a.s. and sup_{k≥0} ‖A_G(k)‖ < ∞ a.s.

A2.b There exist positive constants β_a and β_H such that max_{i,j∈V} sup_{k≥0} |a_ij(k)| ≤ β_a a.s. and max_{i∈V} sup_{k≥0} ‖H_i(k)‖ ≤ β_H a.s.

For the algorithm gains, we impose the following conditions.

C1.a The sequences {a(k), k ≥ 0} and {b(k), k ≥ 0} are positive real sequences monotonically decreasing to zero, satisfying a(k) = O(b(k)).

C1.b b^2(k) = o(a(k)), a(k) = O(a(k + 1)) and Σ_{k=0}^{∞} a(k) = ∞.

C1.c Σ_{k=0}^{∞} b^2(k) < ∞.

Remark 2. Note that, in Assumption A1.a, neither mutual independence nor spatio-temporal independence is assumed on the observation matrices, communication graphs and time delays.

Remark 3. It is easy to find {a(k), k ≥ 0} and {b(k), k ≥ 0} satisfying Conditions C1.a-C1.c. If a(k) = 1/(k + 1)^{τ_1} and b(k) = 1/(k + 1)^{τ_2}, k ≥ 0, with 0.5 < τ_2 ≤ τ_1 ≤ 1, then these conditions hold.

By the definition of I_{λ_ij(k),q}, we know that x_j(k − λ_ij(k)) = Σ_{q=0}^{d} x_j(k − q) I_{λ_ij(k),q}. Then by (7), we have

x_i(k + 1) = x_i(k) + a(k) H_i^T(k)[z_i(k) − H_i(k) x_i(k)] + b(k) Σ_{j∈N_i(k)} a_ij(k)[Σ_{q=0}^{d} x_j(k − q) I_{λ_ij(k),q} − x_i(k)], i ∈ V. (8)

Denote ℋ(k) = diag{H_1(k), ..., H_N(k)} and x(k) = [x_1^T(k), ..., x_N^T(k)]^T. By (5), rewrite (8) as

x(k + 1) = [I_{Nn} − b(k) D_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)] x(k) + b(k) Σ_{q=0}^{d} A(k, q) x(k − q) + a(k) ℋ^T(k) z(k). (9)

Denote the overall estimation error vector e(k) = x(k) − 1_N ⊗ x_0. Note that (L_G(k) ⊗ I_n)(1_N ⊗ x_0) = 0.
By (2) and (6), subtracting 1_N ⊗ x_0 from both sides of (9) leads to

e(k + 1) = [I_{Nn} − b(k) D_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)] x(k) + b(k) Σ_{q=0}^{d} A(k, q) x(k − q) + a(k) ℋ^T(k) z(k) − 1_N ⊗ x_0
= [I_{Nn} − b(k) D_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)](e(k) + 1_N ⊗ x_0) + b(k) Σ_{q=0}^{d} A(k, q)(e(k − q) + 1_N ⊗ x_0) + a(k) ℋ^T(k) z(k) − 1_N ⊗ x_0
= [I_{Nn} − b(k) D_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)] e(k) − a(k) ℋ^T(k) ℋ(k)(1_N ⊗ x_0) + b(k) Σ_{q=0}^{d} A(k, q) e(k − q) + a(k) ℋ^T(k) H(k) x_0 + a(k) ℋ^T(k) v(k),

which together with ℋ(k)(1_N ⊗ x_0) = H(k) x_0 gives the overall estimation error equation

e(k + 1) = [I_{Nn} − b(k) D_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)] e(k) + b(k) Σ_{q=0}^{d} A(k, q) e(k − q) + a(k) ℋ^T(k) v(k), k ≥ 0. (10)

For the delay-free case, d = 0. Then the algorithm (9) becomes

x(k + 1) = [I_{Nn} − b(k) L_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)] x(k) + a(k) ℋ^T(k) z(k), (11)

and the estimation error equation (10) becomes

e(k + 1) = [I_{Nn} − b(k) L_G(k) ⊗ I_n − a(k) ℋ^T(k) ℋ(k)] e(k) + a(k) ℋ^T(k) v(k). (12)

Remark 4. In this paper, we use the concept of Laplacians of digraphs defined in [37], which is widely used in the literature on decentralized estimation ([11]-[24]). For the delay-free case, the consensus term [L_G(k) ⊗ I_n] x(k) naturally appears in the algorithm (11). Note that another concept, the symmetric Laplacian of a digraph, is proposed in [35]. This symmetric Laplacian involves the Perron vector of the weighted adjacency matrix.
It has been pointed out in [35] that for a general digraph, there is no closed-form solution for the Perron vector. Generally, the i-th element of the Perron vector, which is not local information of the i-th node, depends on the weights associated with all nodes. Therefore, though the Laplacian proposed in [35] is symmetric, it is generally incompatible with the decentralized nature of the estimation algorithm.

IV. THE DELAY-FREE CASE

In this section, we give the convergence conditions of the algorithm (7) for the delay-free case, i.e. λ_ij(k) = 0 a.s. for all j, i ∈ V and k ≥ 0. All proofs of the results are given in Appendix B. For any given positive integers h and m, denote

Λ_m^h = λ_min[Σ_{k=mh}^{(m+1)h−1} (E[L̂_G(k) | F(mh − 1)] ⊗ I_n + E[ℋ^T(k) ℋ(k) | F(mh − 1)])],

Λ̄_m^h = λ_min[Σ_{k=mh}^{(m+1)h−1} (b(k) E[L̂_G(k) | F(mh − 1)] ⊗ I_n + a(k) E[ℋ^T(k) ℋ(k) | F(mh − 1)])].

We first give a result for the case with general processes of random graphs and observation matrices.

Theorem IV.1. Suppose that Assumptions A1.a-A1.b hold. If Condition C1.a holds, and there exist an integer h > 0, a constant ρ_0 > 0 and a positive real sequence {c(m), m ≥ 0} with

b^2(mh) = o(c(m)), Σ_{m=0}^{∞} c(m) = ∞, (13)

such that

(b.1) Λ̄_m^h ≥ c(m) a.s., m ≥ 0, and

(b.2) sup_{k≥0} [E[(‖L_G(k)‖ + ‖ℋ^T(k) ℋ(k)‖)^{2 max{h,2}} | F(k − 1)]]^{1/(2 max{h,2})} ≤ ρ_0 a.s.,

then the algorithm (7) converges in mean square, that is, lim_{k→∞} E‖x_i(k) − x_0‖^2 = 0, i ∈ V. In addition, if Assumption A2.a and Condition C1.c hold, then the algorithm (7) converges almost surely, i.e. lim_{k→∞} x_i(k) = x_0, i ∈ V, a.s.

Remark 5. Most existing works on decentralized estimation suppose that the mean graphs are balanced ([22],[24]). Here, the condition (b.1) in Theorem IV.1 may still hold even if the mean graphs are unbalanced.
For example, consider a fixed weighted graph G = <V = {1, 2}, A_G = [a_ij]_{2×2}> with a_12 = 1 and a_21 = 0.3. Obviously, G is unbalanced. Suppose H_1 = 0, H_2 = 1. Choose a(k) = b(k) = 1/(k + 1). We have λ_min(b(m) L̂_G + a(m) ℋ^T ℋ) = (1/(m + 1)) λ_min(L̂_G + ℋ^T ℋ) = 0.5821/(m + 1). Then, the condition (b.1) holds with h = 1 and c(m) = 0.5821/(m + 1) satisfying (13). A more complex example with unbalanced mean graphs is given in Section VI.

Next, we give Theorem IV.2 for the case with conditionally balanced digraphs:

Γ_1 = {{G(k), k ≥ 0} | the random matrix E[A_G(k) | F(k − 1)] is nonnegative and its associated random graph is balanced a.s., k ≥ 0}.

Theorem IV.2. Suppose that {G(k), k ≥ 0} ∈ Γ_1 and Assumptions A1.a-A1.b hold. If Conditions C1.a-C1.b hold, and there exist an integer h > 0 and positive constants θ and ρ_0 such that

(c.1) inf_{m≥0} Λ_m^h ≥ θ > 0 a.s., and

(c.2) sup_{k≥0} [E[(‖L_G(k)‖ + ‖ℋ^T(k) ℋ(k)‖)^{2 max{h,2}} | F(k − 1)]]^{1/(2 max{h,2})} ≤ ρ_0 a.s.,

then the algorithm (7) converges in mean square. In addition, if Assumption A2.a and Condition C1.c hold, then the algorithm (7) converges almost surely.

Remark 6. The condition (b.1) in Theorem IV.1 and the condition (c.1) in Theorem IV.2 are the key convergence conditions. We call them the stochastic spatio-temporal persistence of excitation conditions. In detail, "spatio" emphasizes the reliance of the conditions on the communication graphs and observation matrices over all nodes rather than a single node, while "temporal" refers to summing matrices over a sequence of fixed-length time intervals rather than a single time step, and "persistence of excitation" means that the minimum eigenvalues of matrices consisting of spatio-temporal observation matrices and Laplacian matrices are uniformly bounded away from zero with respect to the sample paths in some sense.
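The two-node example of Remark 5 can be checked numerically. The sketch below verifies that λ_min(L̂_G + ℋ^T ℋ) is strictly positive despite the unbalanced digraph (the exact eigenvalue depends on the weighting convention used), and then runs the delay-free update (7) with the gains a(k) = b(k) = 1/(k + 1) from Remark 5. The true parameter value, noise level and horizon are illustrative choices, not taken from the paper; note that node 1 has no measurement at all (H_1 = 0), yet both estimates are driven toward x_0 through the consensus term.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-node example of Remark 5: unbalanced fixed digraph with
# a_12 = 1, a_21 = 0.3, scalar observations H_1 = 0, H_2 = 1.
a12, a21 = 1.0, 0.3
H1, H2 = 0.0, 1.0
x0 = 2.0                      # true parameter (illustrative value)

# Persistence of excitation: lambda_min of the symmetrized Laplacian
# plus H^T H is strictly positive even though the digraph is unbalanced.
L = np.array([[a12, -a12], [-a21, a21]])
L_hat = (L + L.T) / 2
M = L_hat + np.diag([H1 ** 2, H2 ** 2])
assert np.linalg.eigvalsh(M)[0] > 0

# Delay-free consensus + innovations update (7) with a(k) = b(k) = 1/(k+1);
# node 1 has no usable measurement and relies on the consensus term.
x1, x2 = 0.0, 0.0
for k in range(5000):
    a_k = b_k = 1.0 / (k + 1)
    z2 = H2 * x0 + 0.1 * rng.standard_normal()   # node 2's noisy measurement
    x1_new = x1 + b_k * a12 * (x2 - x1)
    x2_new = x2 + a_k * H2 * (z2 - H2 * x2) + b_k * a21 * (x1 - x2)
    x1, x2 = x1_new, x2_new

assert abs(x1 - x0) < 0.2 and abs(x2 - x0) < 0.2
```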
Guo [38] considered centralized estimation algorithms with random observation matrices and proposed the "stochastic persistence of excitation" condition to ensure convergence. The condition (c.1) can be regarded as the generalization of the "stochastic persistence of excitation" condition in [38] to decentralized algorithms. For a network with N isolated nodes, L_G(k) ≡ 0_{N×N} a.s., and the condition (c.1) degenerates to N independent "stochastic persistence of excitation" conditions. Most existing literature also required that the sequence of observation matrices be i.i.d. and independent of the sequence of communication graphs, neither of which is necessary in Theorems IV.1 and IV.2.

Subsequently, we give more intuitive convergence conditions for the case with Markovian switching communication graphs and observation matrices. We first make the following assumption.

A3 {<ℋ(k), A_G(k)>, k ≥ 0} ⊆ S is a homogeneous and uniformly ergodic Markov chain with a unique stationary distribution π. Here, S = {<ℋ_l, A_l>, l = 1, 2, ...} with ℋ_l = diag(H_{1,l}, ..., H_{N,l}), where {H_{i,l} ∈ R^{n_i × n}, l = 1, 2, ...} is the state space of observation matrices of node i and {A_l, l = 1, 2, ...} is the state space of the weighted adjacency matrices; π = [π_1, π_2, ...]^T, π_l ≥ 0, l = 1, 2, ..., and Σ_{l=1}^{∞} π_l = 1, with π_l representing π(<ℋ_l, A_l>).

Corollary IV.1. Suppose that Assumptions A1.a-A1.b and A3 hold, and sup_{l≥1} ‖A_l‖ < ∞, sup_{l≥1} ‖ℋ_l‖ < ∞. If Conditions C1.a-C1.c hold, and

(d.1) the stationary weighted adjacency matrix Σ_{l=1}^{∞} π_l A_l is nonnegative and its associated graph is balanced with a spanning tree;

(d.2) the measurement model (1) is spatio-temporally jointly observable, i.e.

λ_min(Σ_{i=1}^{N} Σ_{l=1}^{∞} π_l H_{i,l}^T H_{i,l}) > 0, (14)

then the algorithm (7) converges in mean square and almost surely.

Remark 7.
Most of the existing decentralized estimation algorithms used the mathematical expectation of the observation matrices, which is restricted to be time-invariant and is difficult to obtain ([22],[24]). They required instantaneous global observability in the statistical sense for the measurement model, i.e. Σ_{i=1}^{N} H_i^T H_i is positive definite, where H_i is a fixed matrix with E[H_i(k)] ≡ H_i for all k ≥ 0, i = 1, 2, ..., N. In contrast, we only use the sample paths of the observation matrices in the algorithm (7). The mathematical expectations of the observation matrices are allowed to be time-varying. We prove that for homogeneous and uniformly ergodic Markovian switching observation matrices and communication graphs, the stochastic spatio-temporal persistence of excitation condition given in Theorem IV.2 holds if the stationary graph is balanced with a spanning tree and the measurement model is spatio-temporally jointly observable, that is, (14) holds, implying that neither local observability of each node, i.e. λ_min(Σ_{l=1}^{∞} π_l H_{i,l}^T H_{i,l}) > 0, i ∈ V, nor instantaneous global observability of the entire measurement model, i.e. λ_min(Σ_{i=1}^{N} H_{i,l}^T H_{i,l}) > 0, l = 1, 2, ..., is needed.

V. THE CASE WITH RANDOM TIME-VARYING COMMUNICATION DELAYS

In this section, we analyze the convergence of the algorithm (7) with random observation matrices, communication graphs and time delays simultaneously. All proofs of the results are given in Appendix C. In the presence of random time-varying communication delays, the mean square convergence analysis of the algorithm becomes very difficult. To address this, we transform the estimation error equation (10) into the following equivalent system ([32]-[33]).
$$ r(k+1) = F(k)r(k) + g(k), \qquad g(k) = \sum_{q=1}^{d} C_q(k)g(k-q) + a(k)\mathcal{H}^T(k)v(k), \quad k \ge 0, \tag{15} $$

where $F(k)$, $C_q(k)$, $1 \le q \le d$, $k \ge 0$, satisfy

$$
\begin{aligned}
F(k) + C_1(k) &= I_{Nn} - b(k)D_{\mathcal{G}(k)} \otimes I_n - a(k)\mathcal{H}^T(k)\mathcal{H}(k) + b(k)A(k,0), \\
C_1(k)F(k-1) - C_2(k) &= -b(k)A(k,1), \\
C_2(k)F(k-2) - C_3(k) &= -b(k)A(k,2), \\
&\;\;\vdots \\
C_{d-1}(k)F(k-d+1) - C_d(k) &= -b(k)A(k,d-1), \\
C_d(k)F(k-d) &= -b(k)A(k,d).
\end{aligned} \tag{16}
$$

Here, $F(k) = I_{Nn}$, $-d \le k \le -1$. It can be verified that if $r(k) = e(k)$, $-d \le k \le -1$, then $r(k) = e(k)$, $\forall k \ge 0$, i.e. the system (10) and the system (15)-(16) are equivalent.

We need the following condition on the consensus gain.

C1.d The initial consensus gain $b(0) \le \max_{0<\psi<1} f_{C_1,\beta_a,\beta_H,N,d}(\psi)$, where
$$ f_{C_1,\beta_a,\beta_H,N,d}(\psi) \triangleq \frac{\psi}{N\beta_a + C_1\beta_H^2 + N\beta_a\,\dfrac{(1-\psi)^{-(d+1)}-1}{(1-\psi)^{-1}-1}}, \quad d \ge 1,\ \psi \in (0,1), $$
with $C_1 \triangleq \sup_{k\ge0} \frac{a(k)}{b(k)}$.

It can be verified that, given Assumption A2.b and Condition C1.a, $\max_{0<\psi<1} f_{C_1,\beta_a,\beta_H,N,d}(\psi)$ is well-defined. Examples of $f_{1,1,1,N,d}(\cdot)$ with different $d$ and $N$ are shown in Figure 1.

We first establish a lemma as the basis of the convergence analysis.

Lemma V.1. If Assumption A2.b and Conditions C1.a and C1.d hold, then $F(k)$ is invertible and $\|F^{-1}(k)\| \le (1-\psi_1)^{-1}$ a.s., $\forall k \ge 0$, where $\psi_1 = \min\{\psi \in (0,1) \mid f_{C_1,\beta_a,\beta_H,N,d}(\psi) \ge b(0)\}$.

Note that, by the continuity of $f_{C_1,\beta_a,\beta_H,N,d}(\cdot)$ and Condition C1.d, the set $\{\psi \in (0,1) \mid f_{C_1,\beta_a,\beta_H,N,d}(\psi) \ge b(0)\}$ is a nonempty, bounded and closed set. Thus, $\psi_1$ is well-defined.

Fig. 1: The curves of $f_{1,1,1,N,d}(\cdot)$ for $(d,N) = (1,5)$, $(2,6)$ and $(3,10)$.
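Since $f_{C_1,\beta_a,\beta_H,N,d}(\cdot)$ involves only elementary operations, the gain bound in Condition C1.d can be evaluated numerically. The following minimal sketch (the grid search and tolerances are ours, not part of the paper) evaluates $f$ and locates its maximum for the parameter choices shown in Figure 1:

```python
import numpy as np

def f(psi, C1, beta_a, beta_H, N, d):
    """f_{C1, beta_a, beta_H, N, d}(psi) from Condition C1.d, for psi in (0, 1)."""
    # geometric-type ratio [(1-psi)^{-(d+1)} - 1] / [(1-psi)^{-1} - 1]
    tail = ((1 - psi) ** (-(d + 1)) - 1) / ((1 - psi) ** (-1) - 1)
    return psi / (N * beta_a + C1 * beta_H ** 2 + N * beta_a * tail)

def max_f(C1, beta_a, beta_H, N, d, grid=100000):
    """Approximate max_{0 < psi < 1} f(psi) by a dense grid search."""
    psis = np.linspace(1e-6, 1 - 1e-6, grid)
    vals = f(psis, C1, beta_a, beta_H, N, d)
    i = int(np.argmax(vals))
    return psis[i], vals[i]

# Parameter choices from Figure 1 (C1 = beta_a = beta_H = 1)
for N, d in [(5, 1), (6, 2), (10, 3)]:
    psi_star, f_star = max_f(1.0, 1.0, 1.0, N, d)
    print(f"N={N}, d={d}: argmax psi = {psi_star:.3f}, max f = {f_star:.4f}")
```

The function vanishes at both ends of $(0,1)$, since the numerator vanishes at $\psi = 0$ and the geometric tail in the denominator blows up as $\psi \to 1$, so the maximum is attained in the interior.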
If the conditions of Lemma V.1 hold, then $F(k)$ is invertible a.s. Thus, by (16), we have

$$
\begin{aligned}
F(k) &= I_{Nn} - b(k)D_{\mathcal{G}(k)} \otimes I_n - a(k)\mathcal{H}^T(k)\mathcal{H}(k) + b(k)A(k,0) - C_1(k) \\
&= I_{Nn} - b(k)D_{\mathcal{G}(k)} \otimes I_n - a(k)\mathcal{H}^T(k)\mathcal{H}(k) + b(k)A(k,0) - (C_2(k) - b(k)A(k,1))F^{-1}(k-1) \\
&\;\;\vdots \\
&= I_{Nn} - G(k), \quad k \ge 0,
\end{aligned} \tag{17}
$$

where
$$ G(k) \triangleq b(k)D_{\mathcal{G}(k)} \otimes I_n + a(k)\mathcal{H}^T(k)\mathcal{H}(k) - b(k)\sum_{q=0}^{d} A(k,q)[\Phi_F(k-1, k-q)]^{-1}. \tag{18} $$

For any given positive integers $h$ and $m$, denote

$$
\begin{aligned}
\widetilde{\Lambda}^h_m = \lambda_{\min}\Bigg[ \sum_{k=mh}^{(m+1)h-1} \Big( & b(k)E[\widehat{\mathcal{L}}_{\mathcal{G}(k)} \mid \mathcal{F}(mh-1)] \otimes I_n + a(k)E[\mathcal{H}^T(k)\mathcal{H}(k) \mid \mathcal{F}(mh-1)] \\
& - \frac{b(k)}{2}\sum_{q=0}^{d} E\big[ A(k,q)[[\Phi_F(k-1,k-q)]^{-1} - I_{Nn}] \\
&\qquad + [[\Phi_F(k-1,k-q)]^{-1} - I_{Nn}]^T A^T(k,q) \mid \mathcal{F}(mh-1)\big] \Big) \Bigg]. 
\end{aligned} \tag{19}
$$

Theorem V.1. Suppose that Assumptions A1.a-A1.b and A2.b hold. If Conditions C1.a and C1.d hold, and there exist an integer $h > 0$ and a positive real sequence $\{c(m), m \ge 0\}$ with $b^2(mh) = o(c(m))$ and $\sum_{m=0}^{\infty} c(m) = \infty$, such that

$$ \widetilde{\Lambda}^h_m \ge c(m) \;\; \text{a.s.}, \quad m \ge 0, \tag{20} $$

then the algorithm (7) converges in mean square.

If $\{\langle \mathcal{H}(k), \mathcal{A}_{\mathcal{G}(k)}, \lambda_{ji}(k), j, i \in \mathcal{V}\rangle, k \ge 0\}$ is an independent random process, then Corollary V.1 below gives a sufficient condition for the condition (20) in Theorem V.1 to hold, which is more intuitive and computable.

Corollary V.1. Suppose that Assumptions A1.a-A1.b and A2.b hold, and $\{\langle \mathcal{H}(k), \mathcal{A}_{\mathcal{G}(k)}, \lambda_{ji}(k), j, i \in \mathcal{V}\rangle, k \ge 0\}$ is an independent process.
If Condition C1.a holds, $b(0) \le f_{C_1,\beta_a,\beta_H,N,d}(\psi_2)$ with $\psi_2 \in (0, 2^{\frac{1}{d}} - 1)$, and there exist an integer $h > 0$ and a positive real sequence $\{c(m), m \ge 0\}$ with $b^2(mh) = o(c(m))$ and $\sum_{m=0}^{\infty} c(m) = \infty$, such that

$$ \Lambda^h_m - \sum_{k=mh}^{(m+1)h-1} b(k)\sum_{q=0}^{d} \|E[A(k,q)]\| \frac{(1+\psi_2)^q - 1}{2 - (1+\psi_2)^q} \ge c(m), \quad m \ge 0, \tag{21} $$

then the algorithm (7) converges in mean square.

Next, for the case with conditionally balanced digraphs, the following corollary presents a more intuitive convergence condition.

Corollary V.2. Suppose that Assumptions A1.a-A1.b and A2.b hold and $\{\mathcal{G}(k), k \ge 0\} \in \Gamma_1$. If Conditions C1.a-C1.b and C1.d hold, $b(k) = O(a(k))$, and there exist an integer $h > 0$ and a constant $\theta > 0$ such that

$$ \inf_{m\ge0} (\Lambda^h_m - \Sigma^h_m) \ge \theta \;\; \text{a.s.}, \tag{22} $$

where
$$ \Sigma^h_m = C_2(C_3)^h \max\{1, C_1\} \sum_{k=mh}^{(m+1)h-1} \sum_{q=0}^{d} \|E[A(k,q)([\Phi_F(k-1,k-q)]^{-1} - I_{Nn}) \mid \mathcal{F}(mh-1)]\| $$

with $C_2 \triangleq \sup_{k\ge0} \frac{b(k)}{a(k)}$ and $C_3 \triangleq \sup_{k\ge0} \frac{a(k)}{a(k+1)}$, then the algorithm (7) converges in mean square. Furthermore, if $\{\langle \mathcal{H}(k), \mathcal{A}_{\mathcal{G}(k)}, \lambda_{ji}(k), j, i \in \mathcal{V}\rangle, k \ge 0\}$ is an independent process, then (22) holds if there exists an integer $h > 0$ such that

$$ \inf_{m\ge0} \Lambda^h_m > C_2(C_3)^h \max\{1, C_1\} \sup_{m\ge0} \sum_{k=mh}^{(m+1)h-1} \sum_{q=0}^{d} \|E[A(k,q)]\| \frac{(1+\psi_2)^q - 1}{2 - (1+\psi_2)^q}, \tag{23} $$

and $b(0) \le f_{C_1,\beta_a,\beta_H,N,d}(\psi_2)$ with $\psi_2 \in (0, 2^{\frac{1}{d}} - 1)$, where $C_1$ is defined in Condition C1.d.

Remark 8. Theorem V.1 and Corollaries V.1-V.2 give explicit convergence conditions under which all nodes' estimates converge to the true parameter in mean square. Existing literature used the Lyapunov-Krasovskii functional method to deal with time delays and obtained non-explicit LMI-type convergence conditions ([30]).
In contrast, here we transform the system with random time-varying communication delays into an equivalent delay-free system by introducing an auxiliary system, and then adopt the method of binomial expansion of random matrix products to transform the mean square convergence analysis of the delay-free system into that of the mathematical expectation of random matrix products. We thereby obtain the key convergence conditions (20)-(22), which explicitly rely on the conditional expectations of the delay matrices, observation matrices and weighted adjacency matrices of the communication graphs over a sequence of fixed-length time intervals. In the absence of time delays, the condition (20) degenerates to the condition (b.1) in Theorem IV.1.

Remark 9. The conditions (21) and (23) can be further simplified for special delay processes. If the delays are independent of the graphs, then $E[A(k,q)] = E[\mathcal{A}_{\mathcal{G}(k)}] \circ E[I(k,q)]$. Here, the element in the $i$th row and the $j$th column of $E[I(k,q)]$ is $E[I(k,q)]_{ij} = P\{\lambda_{ji}(k) = q\} = p_{ji,q}(k)$. In addition,

- if $\lambda_{ji}(k)$ are identically distributed w.r.t. $k$, then $E[I(k,q)]_{ij} = p_{ji,q}(0)$, $\forall k \ge 0$;

- if $\lambda_{ji}(k)$ are identically distributed w.r.t. both $k$ and $(j,i)$, then $E[I(k,q)]_{ij} = p_q$, $i \ne j$, where $p_q$ denotes the probability that a packet is delayed by $q$ steps, for all $k$ and $(j,i)$, $j \ne i$. Therefore, $\|E[A(k,q)]\| = p_q\|E[\mathcal{A}_{\mathcal{G}(k)}]\|$. Furthermore, if the graph sequence is an i.i.d. process, then the condition (23) becomes

$$ \inf_{m\ge0} \Lambda^h_m > C_2(C_3)^h \max\{1, C_1\}\, h\, \|E[\mathcal{A}_{\mathcal{G}(0)}]\| \sum_{q=0}^{d} p_q \frac{(1+\psi_2)^q - 1}{2 - (1+\psi_2)^q}. $$

Corollaries V.1-V.2 show that for given algorithm gains $\{a(k), k \ge 0\}$ and $\{b(k), k \ge 0\}$, if the communication graphs and observation matrices are persistently excited with enough intensity, then the additional effects of time delays can be mitigated.
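The entrywise (Hadamard) factorization in Remark 9 can be sanity-checked numerically. The sketch below uses an illustrative 3-node mean adjacency matrix and a binomial delay law, both hypothetical and chosen by us; it verifies that, with a zero diagonal and delays identically distributed over $k$ and $(j,i)$, the factorization reduces to $\|E[A(k,q)]\| = p_q\|E[\mathcal{A}_{\mathcal{G}(k)}]\|$:

```python
import numpy as np
from math import comb

# Hypothetical mean adjacency matrix of a 3-node digraph (zero diagonal).
EA = np.array([[0.0, 0.5, 0.2],
               [0.3, 0.0, 0.4],
               [0.1, 0.6, 0.0]])

# Delays identically distributed over k and (j, i): P{delay = q} = p_q.
# A binomial B(d, p) law is used here, as in the paper's numerical example.
d, p = 4, 0.4
p_q = np.array([comb(d, q) * p**q * (1 - p)**(d - q) for q in range(d + 1)])
assert abs(p_q.sum() - 1.0) < 1e-12  # a valid probability distribution

for q in range(d + 1):
    EI = np.full_like(EA, p_q[q])  # E[I(k,q)]: off-diagonal entries equal p_q
    EAq = EA * EI                  # Hadamard product E[A_G] o E[I(k,q)]
    # With zero diagonal this reduces to p_q * E[A_G], so the norms factor:
    lhs = np.linalg.norm(EAq, 2)
    rhs = p_q[q] * np.linalg.norm(EA, 2)
    assert abs(lhs - rhs) < 1e-12
print("||E[A(k,q)]|| = p_q ||E[A_G]|| verified for q = 0..d")
```

The factorization itself requires the delays to be independent of the graphs, as stated in Remark 9; the code only checks the resulting norm identity, not that independence assumption.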
The maximum allowable delay bound $d$ is related to the weighted adjacency matrix of the mean graphs $E[\mathcal{A}_{\mathcal{G}(k)}]$, the probability distribution of the time delays $E[I(k,q)]$ and the algorithm gains. In the absence of time delays, (22) degenerates to the condition (c.1) in Theorem IV.2. The following corollary shows that, for the case with conditionally balanced graphs, if the stochastic spatio-temporal persistence of excitation condition $\inf_{m\ge0} \Lambda^h_m \ge \theta$ a.s. holds, then for any given bounded delays, mean square convergence of the algorithm can be guaranteed if the algorithm gains are properly designed and sufficiently small.

Corollary V.3. Suppose that Assumptions A1.a-A1.b and A2.b hold, $\{\mathcal{G}(k), k \ge 0\} \in \Gamma_1$, and there exist an integer $h > 0$ and a constant $\theta > 0$ such that $\inf_{m\ge0} \Lambda^h_m \ge \theta$ a.s. If Conditions C1.a-C1.b hold, $b(k) = O(a(k))$, and $b(0) \le f_{C_1,\beta_a,\beta_H,N,d}(\psi_3)$ with $\psi_3 \in \big(0, (1 + \theta/[\theta + NC_2(C_3)^h\max\{1,C_1\}\beta_a dh])^{\frac{1}{d}} - 1\big)$, then the algorithm (7) converges in mean square.

VI. NUMERICAL EXAMPLE

We apply our algorithm to decentralized multi-area online state estimation in power systems to illustrate the effectiveness of the obtained theoretical results. An IEEE 14-bus system is used for the test, which has 14 buses and is partitioned into 4 areas $A_1, A_2, A_3, A_4$, as shown in Figure 2. After a DC power flow approximation ([34]), the grid state to be estimated reduces to a vector of voltage phase angles at all buses. Bus 1's voltage phase angle is set to zero, as the reference bus. The grid state to be estimated is given by $x_0 = [-4.98, -12.72, -11.33, -8.78, -14.22, -13.37, -13.36, -14.94, -15.10, -14.79, -15.05, -15.12, -16.03]^T$.
The measurements $z_i(k)$ are linearly related to $x_0$ by $z_i(k) = s_i(k)H_i^0 x_0 + v_i(k)$, $i = 1, 2, 3, 4$, where the noise $\{v_i(k), k \ge 0\}$ is assumed to be an i.i.d. process with the standard normal distribution, $\{s_i(k), k \ge 0\}$ is an i.i.d. sequence modelling sensing failures with $P\{s_i(k) = 1\} = P\{s_i(k) = 0\} = 0.5$, and $H_i^0$, $i = 1, \ldots, 4$, are the observation matrices, which are deterministic and given in Appendix D. There are 4 random communication links with 0-1 weights, represented by the red dotted lines in Figure 2. At odd time instants, the link from $A_2$ to $A_3$ awakes with probability 0.5 and the others sleep; at even time instants, the link from $A_2$ to $A_3$ sleeps and the others awake with probability 0.5. Both $\{\mathcal{G}(k), k \ge 0\}$ and $\{\mathcal{H}(k), k \ge 0\}$ are independent processes. We use the averaged relative error, $\frac{\sum_{i=1}^{4}\|x_i(k) - x_0\|}{4\|x_0\|}$, to evaluate the performance of the algorithm.

Fig. 2: IEEE 14-bus multi-area system and the communication graphs (injection measurements, flow measurements and communication links are marked).

For the delay-free case, set $a(k) = b(k) = \frac{0.5}{(k+1)^{0.52}}$ and let $c(m) = \frac{0.0112}{(2m+2)^{0.52}}$. When $h = 2$, the curves of $\Lambda^2_m$ and $c(m)$ w.r.t. $m$ in Figure 3 show that $\Lambda^2_m \ge c(m)$, $m \ge 0$, so the conditions of Theorem IV.1 hold. Figure 4 depicts the curves of the averaged relative errors, where the red line represents the error curve of the algorithm without random link failures and sensing failures, as the base case. It shows that, in spite of the unbalance of the mean graphs and the sensing failures, the four areas' estimates converge to $x_0$.
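To make the structure of this experiment concrete, here is a schematic toy re-implementation of a consensus-plus-innovation update of the kind described above. Everything in it is a stand-in: the dimensions, the observation matrices and the 4-node ring graph are hypothetical (the actual $H_i^0$ of the IEEE 14-bus example are in Appendix D and are not reproduced here), so this is a sketch of the mechanism, not the paper's experiment:

```python
import numpy as np

rng = np.random.default_rng(1)

n, N = 3, 4                      # toy dimensions (the paper uses n = 13, N = 4)
x0 = rng.normal(size=n)          # unknown parameter (hypothetical)
# Hypothetical fixed observation matrices, scaled to keep a(0)*||H^T H|| moderate.
H = [rng.normal(size=(2, n)) / np.sqrt(n) for _ in range(N)]

def sample_graph():
    """0-1 weighted directed ring: each link awakes independently w.p. 0.5."""
    A = np.zeros((N, N))
    for i in range(N):
        if rng.random() < 0.5:
            A[i, (i + 1) % N] = 1.0
    return A

x = [np.zeros(n) for _ in range(N)]  # zero initial estimates
T = 20000
for k in range(T):
    a = 0.5 / (k + 1) ** 0.52        # innovation gain, as in the delay-free example
    b = a                            # consensus gain, a(k) = b(k)
    s = rng.integers(0, 2, size=N)   # Bernoulli(0.5) sensing failures
    z = [s[i] * (H[i] @ x0) + rng.normal(size=2) for i in range(N)]
    A = sample_graph()
    x = [x[i]
         + b * sum(A[i, j] * (x[j] - x[i]) for j in range(N))    # consensus term
         + a * s[i] * H[i].T @ (z[i] - s[i] * (H[i] @ x[i]))     # innovation term
         for i in range(N)]

err = sum(np.linalg.norm(xi - x0) for xi in x) / (N * np.linalg.norm(x0))
print(f"averaged relative error after {T} steps: {err:.3f}")
```

No single node here is locally observable (each $H_i$ has only 2 rows for a 3-dimensional parameter), so the decrease of the averaged relative error from its initial value of 1 illustrates how the consensus term propagates the missing directions, in the spirit of the spatio-temporal joint observability discussion above.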
For the case with time delays, assume that the delays are independent of the communication graphs, observation matrices and measurement noises, and follow a binomial distribution, i.e. $\lambda_{ji}(k) \sim B(d, p)$ for all $k$ and $(j,i)$. Then

$$ P\{\lambda_{ji}(k) = q\} = C_d^q\, p^q (1-p)^{d-q}, \quad q = 0, \cdots, d. \tag{24} $$

Set $d = 4$, $p = 0.4$. We now verify the convergence conditions in Corollary V.1. Let $a(k) = b(k)$; then $C_1 = 1$. By the above settings of communication graphs and measurement matrices, we know that $\beta_a = 1$ and $\beta_H = 4.07$. Let $\psi_2 = 0.01$; then $f_{C_1,\beta_a,\beta_H,N,d}(\psi_2) = 0.0005$. Accordingly, let $b(k) = \frac{0.0005}{(k+1)^{0.1}}$. By the definition of $p_q$ in Remark 9, it follows from (24) that $p_q = C_4^q\, 0.4^q\, 0.6^{4-q}$, $q = 0, \ldots, 4$. Note that $\|E[\mathcal{A}_{\mathcal{G}(k)}]\| \equiv 0.5$. As discussed in Remark 9, $\|E[A(k,q)]\| = p_q\|E[\mathcal{A}_{\mathcal{G}(k)}]\|$. Hence, it can be calculated that

$$ \sum_{k=mh}^{(m+1)h-1} b(k)\sum_{q=0}^{d} \|E[A(k,q)]\| \frac{(1+\psi_2)^q - 1}{2 - (1+\psi_2)^q} = \sum_{k=2m}^{2m+1} b(k)\sum_{q=0}^{4} 0.5\, p_q \frac{(1+\psi_2)^q - 1}{2 - (1+\psi_2)^q} \le 0.01\sum_{k=2m}^{2m+1} b(k). $$

Note that $\Lambda^2_m = \lambda_{\min}\big[\sum_{k=2m}^{2m+1} b(k)(E[\widehat{\mathcal{L}}_{\mathcal{G}(k)}] \otimes I_{13} + E[\mathcal{H}^T(k)\mathcal{H}(k)])\big]$. Let $c(m) = \frac{0.00001}{(4m+4)^{0.1}}$. The curves of $\Lambda^2_m - 0.01\sum_{k=2m}^{2m+1} b(k)$ and $c(m)$ w.r.t. $m$ in Figure 5 show that the condition (21) in Corollary V.1 holds. Figure 6 depicts the curve of the averaged relative error, which confirms Corollary V.1.

Fig. 3: Curves of $\Lambda^2_m$ and $c(m) = \frac{0.0112}{(2m+2)^{0.52}}$ w.r.t. $m$.
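The delay-correction coefficient used in this verification is pure arithmetic and can be checked directly. The short sketch below (our own check, not from the paper) evaluates the binomial weights $p_q$ and the per-step coefficient $\sum_q 0.5\,p_q[(1+\psi_2)^q-1]/[2-(1+\psi_2)^q]$, confirming that it stays below the rounded value $0.01$ used in the text:

```python
from math import comb

d, p, psi2 = 4, 0.4, 0.01
norm_EA = 0.5                       # ||E[A_G(k)]|| in this example

# binomial delay weights p_q = C(d, q) p^q (1-p)^{d-q}
p_q = [comb(d, q) * p**q * (1 - p)**(d - q) for q in range(d + 1)]
assert abs(sum(p_q) - 1.0) < 1e-12

# per-step delay correction in condition (21), using ||E[A(k,q)]|| = p_q ||E[A_G]||
corr = sum(norm_EA * p_q[q] * ((1 + psi2)**q - 1) / (2 - (1 + psi2)**q)
           for q in range(d + 1))
print(f"delay correction per unit b(k): {corr:.5f}")  # about 0.0082, below 0.01
```

Since the coefficient is bounded by 0.01, using $0.01\sum_k b(k)$ in the verification of (21) is conservative: subtracting a slightly larger correction from $\Lambda^2_m$ can only make the check in Figure 5 stricter.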
Fig. 4: Curves of the averaged relative error $\frac{\sum_{i=1}^{4}\|x_i(k)-x_0\|}{4\|x_0\|}$ for the delay-free case.

Fig. 5: Curves of $\Lambda^2_m - 0.01\sum_{k=2m}^{2m+1} b(k)$ and $c(m)$ w.r.t. $m$.

Fig. 6: Curve of the averaged relative error for the case with random time delays.

VII. CONCLUSION

In this paper, we analyzed the convergence of a decentralized cooperative online parameter estimation algorithm in an uncertain communication environment. Each node has a partial linear observation of the unknown parameter with random time-varying observation matrices. The underlying communication network is modeled by a sequence of random digraphs and is subject to nonuniform random time-varying delays in channels. For the delay-free case, we proved that if the observation matrices and the graph sequence satisfy the stochastic spatio-temporal persistence of excitation condition, then the algorithm gains can be designed properly such that all nodes' estimates converge to the true parameter in mean square and almost surely. In particular, for Markovian switching communication graphs and observation matrices, this condition holds if the stationary graph is balanced with a spanning tree and the measurement model is spatio-temporally jointly observable.
For the case with communication delays, we introduced delay matrices to model the random time-varying communication delays, adopted the method of binomial expansion of random matrix products to transform the mean square convergence analysis of the algorithm into that of the mathematical expectation of random matrix products, and obtained mean square convergence conditions explicitly relying on the conditional expectations of the delay matrices, observation matrices and weighted adjacency matrices of the communication graphs over a sequence of fixed-length intervals. In the absence of time delays, these mean square convergence conditions degenerate to the stochastic spatio-temporal persistence of excitation conditions. In particular, given that the digraphs are conditionally balanced, we showed that if the stochastic spatio-temporal persistence of excitation condition holds, then for any given bounded delay, proper algorithm gains can be designed to guarantee mean square convergence of the algorithm.

There are many interesting open issues for future research. Theorem V.1 is established for a very general type of delays, namely random and unordered. This means that, in a practical implementation, the packets exchanged by pairs of nodes are placed in a processing queue without any regard to their transmission time stamps. In some cases, all received packets are ordered by the time stamp of their transmission, and the communication delays would be random and monotone ([32],[44],[45]). How to exploit monotonicity constraints in the random delay process to relax the conditions or strengthen the results of Theorem V.1 would be an interesting and challenging issue. The main obstacle is how to deal with the delay-induced products of matrix inverses, which is difficult and may require more advanced techniques. Another important issue is the convergence rate of the algorithm.
In particular, Corollary V.3 shows that, for the case with conditionally balanced graphs, if the stochastic spatio-temporal persistence of excitation condition holds, then for any given bounded delays, mean square convergence of the algorithm can be guaranteed by choosing sufficiently small algorithm gains. However, smaller algorithm gains generally lead to slower convergence. Thus, how to choose the algorithm gains to optimize the convergence rate is an interesting topic for future investigation.

APPENDIX A: SEVERAL USEFUL LEMMAS

Definition A.1. ([39]) A Markov chain on a countable state space $S$ with a stationary distribution $\pi$ and transition function $P(x,\cdot)$ is called uniformly ergodic if there exist positive constants $r > 1$ and $R$ such that for all $x \in S$, $\|P^n(x,\cdot) - \pi\| \le Rr^{-n}$. Here, $\|P^n(x,\cdot) - \pi\| = \sum_y |P^n(x,y) - \pi_y|$.

Lemma A.1. ([40]) For any given matrix $P$, denote $W = I - P$. If there exists a constant $\psi \in (0,1)$ such that $\|P\| \le \psi$, then $W$ is invertible and $\|W^{-1}\| \le (1 - \|P\|)^{-1} \le (1-\psi)^{-1}$.

Lemma A.2. ([41]) Assume that $\{s_1(k), k \ge 0\}$ and $\{s_2(k), k \ge 0\}$ are real sequences satisfying $0 \le s_2(k) < 1$, $\sum_{k=0}^{\infty} s_2(k) = \infty$, and that $\lim_{k\to\infty} \frac{s_1(k)}{s_2(k)}$ exists. Then
$$ \lim_{k\to\infty} \sum_{i=1}^{k} s_1(i) \prod_{l=i+1}^{k} (1 - s_2(l)) = \lim_{k\to\infty} \frac{s_1(k)}{s_2(k)}. $$

Lemma A.3. ([42]) Assume that $\{x(k), \mathcal{F}(k)\}$, $\{\alpha(k), \mathcal{F}(k)\}$, $\{\beta(k), \mathcal{F}(k)\}$ and $\{\gamma(k), \mathcal{F}(k)\}$ are all nonnegative adapted sequences satisfying
$$ E[x(k+1) \mid \mathcal{F}(k)] \le (1 + \alpha(k))x(k) - \beta(k) + \gamma(k), \quad k \ge 0 \;\; \text{a.s.} $$
If $\sum_{k=0}^{\infty} (\alpha(k) + \gamma(k)) < \infty$ a.s., then $x(k)$ converges to a finite random variable a.s. and $\sum_{k=0}^{\infty} \beta(k) < \infty$ a.s.

For the subsequent Lemmas A.4 and A.5, the reader may be referred to Theorem 6.4 and the paragraph following it in Ch. 6 of [43].

Lemma A.4.
(Conditional Lyapunov inequality) Denote the probability space by $(\Psi, \mathcal{F}, P)$. Let $\mathcal{F}_1$ be a sub-$\sigma$-algebra of $\mathcal{F}$ and $\xi$ be a random variable on $(\Psi, \mathcal{F}, P)$. Then
$$ (E[|\xi|^s \mid \mathcal{F}_1])^{\frac{1}{s}} \le (E[|\xi|^t \mid \mathcal{F}_1])^{\frac{1}{t}} \;\; \text{a.s.}, \quad 0 < s < t. $$

Lemma A.5. (Conditional Hölder inequality) Denote the probability space by $(\Psi, \mathcal{F}, P)$. Let $\mathcal{F}_1$ be a sub-$\sigma$-algebra of $\mathcal{F}$, and let $\xi$ and $\eta$ be two random variables on $(\Psi, \mathcal{F}, P)$. Let constants $p \in (1,\infty)$ and $q \in (1,\infty)$ satisfy $1/p + 1/q = 1$. If $E[|\xi|^p] < \infty$ and $E[|\eta|^q] < \infty$, then
$$ E[|\xi\eta| \mid \mathcal{F}_1] \le (E[|\xi|^p \mid \mathcal{F}_1])^{\frac{1}{p}} (E[|\eta|^q \mid \mathcal{F}_1])^{\frac{1}{q}} \;\; \text{a.s.} $$

Lemma A.6. For any random matrix $A \in \mathbb{R}^{m\times n}$, $\|E[AA^T]\| \le n\|E[A^TA]\|$.

Proof. By the properties of the matrix trace, we have
$$ \|E[AA^T]\| = \lambda_{\max}(E[AA^T]) \le \mathrm{Tr}(E[AA^T]) = \mathrm{Tr}(E[A^TA]) \le n\lambda_{\max}(E[A^TA]) = n\|E[A^TA]\|. $$

Lemma A.7. Let $A = [a_{ij}]_{N\times N}$ be the weighted adjacency matrix of an undirected graph with $N$ nodes and $L$ be the associated Laplacian matrix. Let $x = [x_1^T, \ldots, x_N^T]^T \in \mathbb{R}^{Nn}$ be any given nonzero $Nn$-dimensional vector, where $x_i \in \mathbb{R}^n$, $i = 1, 2, \ldots, N$, and there exist $i \ne j$ such that $x_i \ne x_j$. If $a_{ij} \ge 0$, $i, j = 1, 2, \ldots, N$, and the graph is connected, then $x^T(L \otimes I_n)x > 0$.

Proof. By the definition of the Laplacian matrix, we have $x^T(L \otimes I_n)x = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} a_{ij}\|x_i - x_j\|^2$. Noting that there exist $i \ne j$ such that $x_i \ne x_j$, that the graph is connected, and that $a_{ij} \ge 0$, $i, j = 1, 2, \ldots, N$, we get $x^T(L \otimes I_n)x > 0$.

APPENDIX B: PROOFS IN SECTION IV

Let
$$ P(k) = I_{Nn} - D(k), \tag{25} $$
where
$$ D(k) = b(k)\mathcal{L}_{\mathcal{G}(k)} \otimes I_n + a(k)\mathcal{H}^T(k)\mathcal{H}(k). \tag{26} $$

The proof of Theorem IV.1 needs the following lemma.

Lemma B.1. For the algorithm (7), if Condition C1.a and the conditions (b.1) and (b.2) in Theorem IV.1 hold, then
$$ \lim_{k\to\infty} \|E[\Phi_P(k,0)\Phi_P^T(k,0)]\| = 0. \tag{27} $$

Proof.
By (25), we have

$$
\begin{aligned}
&\Phi_P^T((m+1)h-1, mh)\Phi_P((m+1)h-1, mh) \\
&\quad = (I_{Nn} - D^T(mh))\cdots(I_{Nn} - D^T((m+1)h-1)) \times (I_{Nn} - D((m+1)h-1))\cdots(I_{Nn} - D(mh)).
\end{aligned} \tag{28}
$$

Taking the conditional expectation w.r.t. $\mathcal{F}(mh-1)$ on both sides of the above, by the binomial expansion, we have

$$
\begin{aligned}
&\|E[\Phi_P^T((m+1)h-1, mh)\Phi_P((m+1)h-1, mh) \mid \mathcal{F}(mh-1)]\| \\
&= \|E[(I_{Nn} - D^T(mh))\cdots(I_{Nn} - D^T((m+1)h-1))(I_{Nn} - D((m+1)h-1))\cdots(I_{Nn} - D(mh)) \mid \mathcal{F}(mh-1)]\| \\
&= \Big\|I_{Nn} - \sum_{k=mh}^{(m+1)h-1} E[D^T(k) + D(k) \mid \mathcal{F}(mh-1)] + E[M_2(m) + \cdots + M_{2h}(m) \mid \mathcal{F}(mh-1)]\Big\| \\
&\le \Big\|I_{Nn} - \sum_{k=mh}^{(m+1)h-1} E[D^T(k) + D(k) \mid \mathcal{F}(mh-1)]\Big\| + \|E[M_2(m) + \cdots + M_{2h}(m) \mid \mathcal{F}(mh-1)]\|.
\end{aligned} \tag{29}
$$

Here, $M_i(m)$, $i = 2, \cdots, 2h$, represent the $i$-th order terms in the binomial expansion of $\Phi_P((m+1)h-1, mh)\Phi_P^T((m+1)h-1, mh)$.

Since the 2-norm of a symmetric matrix is equal to its spectral radius, by the definition of the spectral radius, we have

$$
\begin{aligned}
\Big\|I_{Nn} - \sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big\|
&= \rho\Big(I_{Nn} - \sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big) \\
&= \max_{1\le i\le Nn} \Big|\lambda_i\Big(I_{Nn} - \sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big)\Big| \\
&= \max_{1\le i\le Nn} \Big|1 - \lambda_i\Big(\sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big)\Big|.
\end{aligned} \tag{30}
$$

Since both $a(k)$ and $b(k)$ tend to zero, by the condition (b.2), we know that there exists a positive integer $m_1$, independent of the sample paths, such that
$$ \lambda_i\Big(\sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big) \le 1, \quad i = 1, \cdots, Nn, \;\; \forall m \ge m_1 \;\; \text{a.s.} $$
This, together with (29) and (30), leads to

$$
\begin{aligned}
&\|E[\Phi_P^T((m+1)h-1, mh)\Phi_P((m+1)h-1, mh) \mid \mathcal{F}(mh-1)]\| \\
&\le 1 - \lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big) + \|E[M_2(m) + \cdots + M_{2h}(m) \mid \mathcal{F}(mh-1)]\|, \quad \forall m \ge m_1 \;\; \text{a.s.}
\end{aligned} \tag{31}
$$
We next bound the two terms on the right-hand side of (31). For the first term, by the definitions of $D(k)$ and $\Lambda^h_m$ and the condition (b.1), we have

$$
\begin{aligned}
1 - \lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1} E[D(k) + D^T(k) \mid \mathcal{F}(mh-1)]\Big)
&= 1 - \lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1} E[2b(k)\widehat{\mathcal{L}}_{\mathcal{G}(k)} \otimes I_n + 2a(k)\mathcal{H}^T(k)\mathcal{H}(k) \mid \mathcal{F}(mh-1)]\Big) \\
&= 1 - 2\Lambda^h_m \le 1 - c(m), \quad \forall m \ge m_1 \;\; \text{a.s.}
\end{aligned} \tag{32}
$$

By Lemma A.4 and the condition (b.2), it follows that
$$ \sup_{k\ge0} E[\|\widetilde{D}(k)\|^i \mid \mathcal{F}(k-1)] \le \sup_{k\ge0} \big[E[\|\widetilde{D}(k)\|^{2h} \mid \mathcal{F}(k-1)]\big]^{\frac{i}{2h}} \le \rho_0^i \;\; \text{a.s.}, \quad 2 \le i \le 2h, $$
where $\widetilde{D}(k) = \mathcal{L}_{\mathcal{G}(k)} \otimes I_n + \mathcal{H}^T(k)\mathcal{H}(k)$. Note that for any given random variable $\xi$ and $\sigma$-algebras $\mathcal{F}_1 \subseteq \mathcal{F}_2$, it is true that
$$ E[\xi \mid \mathcal{F}_1] = E[E[\xi \mid \mathcal{F}_2] \mid \mathcal{F}_1]. \tag{33} $$
We then have
$$ E[\|\widetilde{D}(k)\|^l \mid \mathcal{F}(mh-1)] = E[E[\|\widetilde{D}(k)\|^l \mid \mathcal{F}(k-1)] \mid \mathcal{F}(mh-1)], \quad 2 \le l \le 2h, \;\; k \ge mh. $$

From the definitions of $M_i(m)$, $i = 2, \cdots, 2h$, and the above, by termwise multiplication and using Lemma A.5 repeatedly, for the second term on the right-hand side of (31) we have

$$ \|E[M_2(m) + \cdots + M_{2h}(m) \mid \mathcal{F}(mh-1)]\| \le b^2(mh)\Big(\sum_{i=2}^{2h} C_{2h}^i (\max\{1,\phi\}\rho_0)^i\Big) = b^2(mh)\alpha, \tag{34} $$

where $\phi$ satisfies $a(k) \le \phi b(k)$, $\alpha = (1 + \max\{1,\phi\}\rho_0)^{2h} - 1 - 2h\max\{1,\phi\}\rho_0$, and $C_m^p$ denotes the number of combinations of $p$ elements chosen from $m$ elements.

By (31)-(34), we have
$$ \|E[\Phi_P^T((m+1)h-1, mh)\Phi_P((m+1)h-1, mh) \mid \mathcal{F}(mh-1)]\| \le 1 - c(m) + b^2(mh)\alpha, \quad m \ge m_1 \;\; \text{a.s.} \tag{35} $$

Denote $m_k = \lfloor \frac{k}{h} \rfloor$.
By the properties of the conditional expectation, Lemma A.6 and (35), we have

$$
\begin{aligned}
\|E[\Phi_P(k,0)\Phi_P^T(k,0)]\|
&\le Nn\|E[\Phi_P^T(k,0)\Phi_P(k,0)]\| \\
&= Nn\|E[\Phi_P^T(m_kh-1,0)\Phi_P^T(k,m_kh)\Phi_P(k,m_kh)\Phi_P(m_kh-1,0)]\| \\
&\le Nn\|E[\Phi_P^T(m_kh-1,0)\|\Phi_P^T(k,m_kh)\Phi_P(k,m_kh)\|\Phi_P(m_kh-1,0)]\| \\
&= Nn\|E[E[\Phi_P^T(m_kh-1,0)\|\Phi_P^T(k,m_kh)\Phi_P(k,m_kh)\|\Phi_P(m_kh-1,0) \mid \mathcal{F}(m_kh-1)]]\| \\
&= Nn\|E[\Phi_P^T(m_kh-1,0)E[\|\Phi_P^T(k,m_kh)\Phi_P(k,m_kh)\| \mid \mathcal{F}(m_kh-1)]\Phi_P(m_kh-1,0)]\|.
\end{aligned} \tag{36}
$$

For any positive integers $m, n$ satisfying $0 \le m - n \le h - 1$, it follows from the condition (b.2) that there exists a constant $\rho_h^* > 0$ such that

$$ \|E[\Phi_P^T(m,n)\Phi_P(m,n) \mid \mathcal{F}(n-1)]\| < \rho_h^* \;\; \text{a.s.} \tag{37} $$

By the above and (36), noting that $k - m_kh \le h - 1$, we have

$$
\begin{aligned}
\|E[\Phi_P(k,0)\Phi_P^T(k,0)]\|
&\le \rho_h^* Nn\|E[\Phi_P^T(m_kh-1,0)\Phi_P(m_kh-1,0)]\| \\
&= \rho_h^* Nn\|E[\Phi_P^T(m_1h-1,0)\Phi_P^T(m_kh-1,m_1h)\Phi_P(m_kh-1,m_1h)\Phi_P(m_1h-1,0)]\| \\
&= \rho_h^* Nn\|E[E[\Phi_P^T(m_1h-1,0)\Phi_P^T(m_kh-1,m_1h)\Phi_P(m_kh-1,m_1h)\Phi_P(m_1h-1,0) \mid \mathcal{F}(m_1h-1)]]\| \\
&\le \rho_h^* Nn\|E[\Phi_P^T(m_1h-1,0)\|E[\Phi_P^T(m_kh-1,m_1h)\Phi_P(m_kh-1,m_1h) \mid \mathcal{F}(m_1h-1)]\|\Phi_P(m_1h-1,0)]\|.
\end{aligned} \tag{38}
$$
By (33) and (35), we have

$$
\begin{aligned}
&\|E[\Phi_P^T(m_kh-1,m_1h)\Phi_P(m_kh-1,m_1h) \mid \mathcal{F}(m_1h-1)]\| \\
&= \|E[\Phi_P^T((m_k-1)h-1,m_1h)\Phi_P^T(m_kh-1,(m_k-1)h)\Phi_P(m_kh-1,(m_k-1)h)\Phi_P((m_k-1)h-1,m_1h) \mid \mathcal{F}(m_1h-1)]\| \\
&= \|E[\Phi_P^T((m_k-1)h-1,m_1h)E[\Phi_P^T(m_kh-1,(m_k-1)h)\Phi_P(m_kh-1,(m_k-1)h) \mid \mathcal{F}((m_k-1)h-1)]\Phi_P((m_k-1)h-1,m_1h) \mid \mathcal{F}(m_1h-1)]\| \\
&\le [1 - c(m_k-1) + b^2((m_k-1)h)\alpha]\,\|E[\Phi_P^T((m_k-1)h-1,m_1h)\Phi_P((m_k-1)h-1,m_1h) \mid \mathcal{F}(m_1h-1)]\| \\
&\le \prod_{s=m_1}^{m_k-1} [1 - c(s) + b^2(sh)\alpha] \;\; \text{a.s.},
\end{aligned} \tag{39}
$$

which together with (38) leads to

$$ \|E[\Phi_P(k,0)\Phi_P^T(k,0)]\| \le \rho_h^* Nn\|E[\Phi_P^T(m_1h-1,0)\Phi_P(m_1h-1,0)]\| \prod_{s=m_1}^{m_k-1} [1 - c(s) + b^2(sh)\alpha]. \tag{40} $$

By (13), we know that there exists a positive integer $m_2$ such that
$$ b^2(mh)\alpha \le \tfrac{1}{2}c(m), \quad \forall m \ge m_2. \tag{41} $$

Let $m_3 = \max\{m_2, m_1\}$ and $r_1 = \prod_{s=m_1}^{m_3-1} [1 - c(s) + b^2(sh)\alpha]$. By (13) and (41), we have

$$ \lim_{k\to\infty} \prod_{s=m_1}^{m_k-1} [1 - c(s) + b^2(sh)\alpha] \le \lim_{k\to\infty} r_1 \prod_{s=m_3}^{m_k-1} \Big[1 - \tfrac{1}{2}c(s)\Big] \le \lim_{k\to\infty} r_1 \exp\Big(-\tfrac{1}{2}\sum_{s=m_3}^{m_k-1} c(s)\Big) = r_1 \exp\Big(-\tfrac{1}{2}\sum_{s=m_3}^{\infty} c(s)\Big) = 0. \tag{42} $$

Since $\|E[\Phi_P^T(m_1h-1,0)\Phi_P(m_1h-1,0)]\| < \infty$ by the condition (b.2), from (40) and (42) we obtain (27). The lemma is proved.

Proof of Theorem IV.1.
If $\lambda_{ji}(k) = 0$ a.s., $\forall j, i \in \mathcal{V}$, $\forall k \ge 0$, then by (12) we have

$$ e(k+1) = P(k)e(k) + a(k)\mathcal{H}^T(k)v(k) = \Phi_P(k,0)e(0) + \sum_{i=0}^{k} a(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i), \quad k \ge 0. \tag{43} $$

By the above, we have

$$
\begin{aligned}
E[e(k+1)e^T(k+1)] &= E[\Phi_P(k,0)e(0)e^T(0)\Phi_P^T(k,0)] + E\Big[\Phi_P(k,0)e(0)\sum_{i=0}^{k} a(i)[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)]^T\Big] \\
&\quad + E\Big[\sum_{i=0}^{k} a(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)[\Phi_P(k,0)e(0)]^T\Big] \\
&\quad + E\Big[\Big(\sum_{i=0}^{k} a(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)\Big)\Big(\sum_{i=0}^{k} a(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)\Big)^T\Big].
\end{aligned} \tag{44}
$$

By Assumptions A1.a and A1.b, we know that the second and third terms on the right-hand side of (44) are both equal to zero. Moreover, from

$$ E[v(i)v^T(j)] = E[E[v(i)v^T(j) \mid \mathcal{F}(i-1)]] = E[E[v(i) \mid \mathcal{F}(i-1)]v^T(j)] = 0, \quad \forall i > j, \tag{45} $$

we have

$$ E\Big[\Big(\sum_{i=0}^{k} a(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)\Big)\Big(\sum_{i=0}^{k} a(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)\Big)^T\Big] = E\Big[\sum_{i=0}^{k} a^2(i)\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1)\Big]. $$

Substituting the above into (44) and taking the 2-norm leads to

$$
\begin{aligned}
\|E[e(k+1)e^T(k+1)]\| &\le \|E[\Phi_P(k,0)\Phi_P^T(k,0)]\|\,\|e(0)\|^2 + \sum_{i=0}^{k} a^2(i)\|E[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1)]\| \\
&= \|E[\Phi_P(k,0)\Phi_P^T(k,0)]\|\,\|e(0)\|^2 + \sum_{i=k-3h}^{k} a^2(i)\|E[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1)]\| \\
&\quad + \sum_{i=0}^{k-3h-1} a^2(i)\|E[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1)]\|.
\end{aligned} \tag{46}
$$

By Lemma B.1, we know that the first term in the above converges to zero.
For the second term in the above, when $k - h \le i < k$, we have by (37) that $\|E[\Phi_P^T(k,i+1)\Phi_P(k,i+1) \mid \mathcal{F}(i)]\| \le \rho_h^*$ a.s.; when $k - 2h \le i < k - h$, it follows from Lemma A.6 and (37) that

$$
\begin{aligned}
\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]\| &\le Nn\|E[\Phi_P^T(k,i+1)\Phi_P(k,i+1) \mid \mathcal{F}(i)]\| \\
&= Nn\|E[\Phi_P^T(k-h,i+1)\Phi_P^T(k,k-h+1)\Phi_P(k,k-h+1)\Phi_P(k-h,i+1) \mid \mathcal{F}(i)]\| \\
&= Nn\|E[E[\Phi_P^T(k-h,i+1)\Phi_P^T(k,k-h+1)\Phi_P(k,k-h+1)\Phi_P(k-h,i+1) \mid \mathcal{F}(k-h)] \mid \mathcal{F}(i)]\| \\
&= Nn\|E[\Phi_P^T(k-h,i+1)E[\Phi_P^T(k,k-h+1)\Phi_P(k,k-h+1) \mid \mathcal{F}(k-h)]\Phi_P(k-h,i+1) \mid \mathcal{F}(i)]\| \\
&\le Nn\|E[\Phi_P^T(k-h,i+1)\|E[\Phi_P^T(k,k-h+1)\Phi_P(k,k-h+1) \mid \mathcal{F}(k-h)]\|\Phi_P(k-h,i+1) \mid \mathcal{F}(i)]\| \\
&\le Nn\rho_h^*\|E[\Phi_P^T(k-h,i+1)\Phi_P(k-h,i+1) \mid \mathcal{F}(i)]\| \le Nn(\rho_h^*)^2 \;\; \text{a.s.};
\end{aligned}
$$

when $k - 3h \le i < k - 2h$, similarly to the above, we have $\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]\| \le Nn(\rho_h^*)^3$ a.s. Hence, by Assumptions A1.a and A1.b, we have
$$ \sup_{k\ge0}\|E[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1)]\| < \infty, \quad k - 3h \le i \le k, \;\; \text{a.s.} $$
Then, noting that $a(k)$ decays to zero, the second term on the right-hand side of (46) tends to zero.

We next prove that the third term on the right-hand side of (46) tends to zero. Let $\widetilde{m}_i = \lceil \frac{i}{h} \rceil$.
We have

$$
\begin{aligned}
\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]\| &\le Nn\|E[\Phi_P^T(k,i+1)\Phi_P(k,i+1) \mid \mathcal{F}(i)]\| \\
&= Nn\|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P^T(k,m_kh)\Phi_P(k,m_kh)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \\
&= Nn\|E[E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P^T(k,m_kh)\Phi_P(k,m_kh)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(m_kh-1)] \mid \mathcal{F}(i)]\| \\
&= Nn\|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)E[\Phi_P^T(k,m_kh)\Phi_P(k,m_kh) \mid \mathcal{F}(m_kh-1)]\Phi_P(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \\
&\le Nn\rho_h^*\|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \;\; \text{a.s.},
\end{aligned} \tag{47}
$$

where the first inequality follows from Lemma A.6, the second equality follows from (33) and the last inequality follows from (37).
Similarly to (39) in the proof of Lemma B.1, we have
$$ \|E[\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h) \mid \mathcal{F}(\widetilde{m}_{i+1}h-1)]\| \le \prod_{s=\widetilde{m}_{i+1}}^{m_k-1} [1 - c(s) + b^2(sh)\alpha]. $$

From the above, (37) and (47), we have

$$
\begin{aligned}
\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]\| &\le Nn\rho_h^*\|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \\
&= Nn\rho_h^*\|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)E[\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h) \mid \mathcal{F}(\widetilde{m}_{i+1}h-1)]\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \\
&\le Nn\rho_h^*\|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\|E[\Phi_P^T(m_kh-1,\widetilde{m}_{i+1}h)\Phi_P(m_kh-1,\widetilde{m}_{i+1}h) \mid \mathcal{F}(\widetilde{m}_{i+1}h-1)]\|\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \\
&\le Nn\rho_h^* \prod_{s=\widetilde{m}_{i+1}}^{m_k-1} [1 - c(s) + b^2(sh)\alpha]\, \|E[\Phi_P^T(\widetilde{m}_{i+1}h-1,i+1)\Phi_P(\widetilde{m}_{i+1}h-1,i+1) \mid \mathcal{F}(i)]\| \\
&\le Nn(\rho_h^*)^2 \prod_{s=\widetilde{m}_{i+1}}^{m_k-1} [1 - c(s) + b^2(sh)\alpha], \quad 0 \le i \le k - 3h - 1, \;\; \text{a.s.}
\end{aligned} \tag{48}
$$

By (48), the condition (b.2), and Assumptions A1.a and A1.b, it follows that

$$
\begin{aligned}
\|E[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1)]\|
&= \|E[E[\Phi_P(k,i+1)\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]]\| \\
&\le \|E[\|\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]]\| \\
&\le E[\|\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\|\,\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1) \mid \mathcal{F}(i)]\|] \\
&\le Nn(\rho_h^*)^2 E[\|\mathcal{H}^T(i)v(i)v^T(i)\mathcal{H}(i)\|] \prod_{s=\widetilde{m}_{i+1}}^{m_k-1} [1 - c(s) + b^2(sh)\alpha] \\
&\le Nn\beta_v\rho_0(\rho_h^*)^2 \prod_{s=\widetilde{m}_{i+1}}^{m_k-1} [1 - c(s) + b^2(sh)\alpha] \\
&\le Nn\beta_v\rho_0(\rho_h^*)^2 \prod_{s=\widetilde{m}_{i+1}}^{m_k-1} \Big[1 - \tfrac{1}{2}c(s)\Big], \quad m_3h - 1 \le i \le k - 3h - 1.
\end{aligned}
$$
By the above, we have
\[
\begin{aligned}
&\sum_{i=0}^{k-3h-1}a^2(i)\|E[\Phi_P(k,i+1)H^T(i)v(i)v^T(i)H(i)\Phi_P^T(k,i+1)]\|\\
&=\sum_{i=0}^{m_3h-2}a^2(i)\|E[\Phi_P(k,i+1)H^T(i)v(i)v^T(i)H(i)\Phi_P^T(k,i+1)]\|\\
&\quad+\sum_{i=m_3h-1}^{k-3h-1}a^2(i)\|E[\Phi_P(k,i+1)H^T(i)v(i)v^T(i)H(i)\Phi_P^T(k,i+1)]\|\\
&\le\sum_{i=0}^{m_3h-2}a^2(i)\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1)E[\|H(i)\|^2\|v(i)\|^2\mid\mathcal F(i)]]\|\\
&\quad+Nn\beta_v\rho_0(\rho_h^{*})^2\sum_{i=m_3h-1}^{k-3h-1}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]\\
&\le\beta_v\rho_0\sum_{i=0}^{m_3h-2}a^2(i)\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1)]\|\\
&\quad+Nn\beta_v\rho_0(\rho_h^{*})^2\sum_{i=m_3h-1}^{k-3h-1}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big].\qquad(49)
\end{aligned}
\]
By Lemma B.1, we know that $\lim_{k\to\infty}\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1)]\|=0$, $0\le i\le m_3h-2$. Then,
\[
\lim_{k\to\infty}\beta_v\rho_0\sum_{i=0}^{m_3h-2}a^2(i)\|E[\Phi_P(k,i+1)\Phi_P^T(k,i+1)]\|=0.\qquad(50)
\]
By direct calculation, it follows that
\[
\begin{aligned}
\sum_{i=m_3h-1}^{k-3h-1}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]
&\le\sum_{i=0}^{k}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]\\
&=\sum_{i=0}^{m_kh-1}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]+\sum_{i=m_kh}^{k}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]\\
&=\sum_{i=0}^{m_k-1}\Big[\sum_{j=ih}^{(i+1)h-1}a^2(j)\Big]\prod_{s=i+1}^{m_k-1}\Big[1-\frac12c(s)\Big]+\sum_{i=m_kh}^{k}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big].\qquad(51)
\end{aligned}
\]
Since $a(k)$ decays to zero, it follows that
\[
\lim_{k\to\infty}\sum_{i=m_kh}^{k}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]=0.\qquad(52)
\]
By (13) and Condition C1.a, we have $\frac{\sum_{j=(m_k-1)h}^{m_kh-1}a^2(j)}{c(m_k-1)}\le\frac{ha^2((m_k-1)h)}{c(m_k-1)}$ and
\[
\lim_{k\to\infty}\frac{ha^2((m_k-1)h)}{c(m_k-1)}=\lim_{k\to\infty}\frac{ha^2((m_k-1)h)}{b^2((m_k-1)h)}\cdot\frac{b^2((m_k-1)h)}{c(m_k-1)}=0.
\]
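The mechanism behind (51)-(53) — a square-summable gain weighted by a product that vanishes because $\sum_s c(s)=\infty$ — can be illustrated numerically. The sketch below uses assumed gains $a(i)=(i+1)^{-0.8}$ and $c(s)=(s+1)^{-0.6}$ purely for illustration (they are not the paper's gain sequences, only examples with $\sum a^2<\infty$ and $\sum c=\infty$) and evaluates $S(k)=\sum_{i=0}^{k-1}a^2(i)\prod_{s=i+1}^{k-1}[1-\frac12c(s)]$:

```python
def damped_noise_sum(k, a, c):
    """S(k) = sum_{i=0}^{k-1} a(i)^2 * prod_{s=i+1}^{k-1} (1 - c(s)/2),
    computed backwards so the running product is reused."""
    total, prod = 0.0, 1.0
    for i in range(k - 1, -1, -1):
        total += a(i) ** 2 * prod   # a(i)^2 times prod_{s=i+1}^{k-1}(1 - c(s)/2)
        prod *= 1.0 - 0.5 * c(i)    # extend the product down to s = i
    return total

a = lambda i: (i + 1) ** -0.8  # square-summable gain (illustrative assumption)
c = lambda s: (s + 1) ** -0.6  # non-summable sequence, so the product vanishes

print(damped_noise_sum(100, a, c), damped_noise_sum(10000, a, c))
```

Increasing $k$ drives $S(k)$ toward zero, matching the limit asserted in (53).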
Then, from (13) and Lemma A.2, we have
\[
\lim_{k\to\infty}\sum_{i=0}^{m_k-1}\Big[\sum_{j=ih}^{(i+1)h-1}a^2(j)\Big]\prod_{s=i+1}^{m_k-1}\Big[1-\frac12c(s)\Big]=\lim_{k\to\infty}\frac{2\sum_{j=(m_k-1)h}^{m_kh-1}a^2(j)}{c(m_k-1)}=0.
\]
By the above, (51) and (52), it follows that
\[
\lim_{k\to\infty}\sum_{i=m_3h-1}^{k-3h-1}a^2(i)\prod_{s=\tilde m_{i+1}}^{m_k-1}\Big[1-\frac12c(s)\Big]=0.\qquad(53)
\]
Then, by (49), (50) and the above, we have
\[
\lim_{k\to\infty}\sum_{i=0}^{k-3h-1}a^2(i)\|E[\Phi_P(k,i+1)H^T(i)v(i)v^T(i)H(i)\Phi_P^T(k,i+1)]\|=0.
\]
Thus, the third term on the right side of (46) tends to zero. We have $\lim_{k\to\infty}\|E[e(k)e^T(k)]\|=0$. Since $E\|e(k)\|^2\le Nn\|E[e(k)e^T(k)]\|$, it follows that $\lim_{k\to\infty}E\|e(k)\|^2=0$. The algorithm (7) converges in mean square.

We next prove that the algorithm (7) converges almost surely. By (43), it follows that
\[
e((m+1)h)=\Phi_P((m+1)h-1,mh)e(mh)+\sum_{k=mh}^{(m+1)h-1}a(k)\Phi_P((m+1)h-1,k+1)H^T(k)v(k),\quad m\ge0.
\]
Taking the squared 2-norm and then the conditional expectation w.r.t. $\mathcal F(mh-1)$ on both sides of the above, we have
\[
\begin{aligned}
&E[\|e((m+1)h)\|^2\mid\mathcal F(mh-1)]\\
&=e^T(mh)E[\Phi_P^T((m+1)h-1,mh)\Phi_P((m+1)h-1,mh)\mid\mathcal F(mh-1)]e(mh)\\
&\quad+E\Big[\Big(\sum_{k=mh}^{(m+1)h-1}a(k)\Phi_P((m+1)h-1,k+1)H^T(k)v(k)\Big)^T\\
&\qquad\times\Big(\sum_{k=mh}^{(m+1)h-1}a(k)\Phi_P((m+1)h-1,k+1)H^T(k)v(k)\Big)\,\Big|\,\mathcal F(mh-1)\Big]\\
&\quad+2e^T(mh)E\Big[\Phi_P^T((m+1)h-1,mh)\Big(\sum_{k=mh}^{(m+1)h-1}a(k)\Phi_P((m+1)h-1,k+1)H^T(k)v(k)\Big)\,\Big|\,\mathcal F(mh-1)\Big].
\end{aligned}
\]
By Lemma A.1 in [36] and Assumptions A1.a and A1.b, the above can be written as
\[
\begin{aligned}
E[\|e((m+1)h)\|^2\mid\mathcal F(mh-1)]
&=e^T(mh)E[\Phi_P^T((m+1)h-1,mh)\Phi_P((m+1)h-1,mh)\mid\mathcal F(mh-1)]e(mh)\\
&\quad+\sum_{k=mh}^{(m+1)h-1}a^2(k)E[\|\Phi_P((m+1)h-1,k+1)H^T(k)v(k)\|^2\mid\mathcal F(mh-1)].\qquad(54)
\end{aligned}
\]
In the light of the condition (b.2), Assumptions A1.a and A1.b, we know that there exists a constant $\rho_4$ such that
\[
\sum_{k=mh}^{(m+1)h-1}E[\|\Phi_P((m+1)h-1,k+1)H^T(k)v(k)\|^2\mid\mathcal F(mh-1)]\le\rho_4\quad a.s.,\ \forall m\ge0,
\]
which together with (35) and (54) gives
\[
\begin{aligned}
E[\|e((m+1)h)\|^2\mid\mathcal F(mh-1)]
&\le\|E[\Phi_P^T((m+1)h-1,mh)\Phi_P((m+1)h-1,mh)\mid\mathcal F(mh-1)]\|\,\|e(mh)\|^2\\
&\quad+a^2(mh)\sum_{k=mh}^{(m+1)h-1}E[\|\Phi_P((m+1)h-1,k+1)H^T(k)v(k)\|^2\mid\mathcal F(mh-1)]\\
&\le(1+b^2(mh)\alpha)\|e(mh)\|^2+a^2(mh)\rho_4\quad a.s.
\end{aligned}
\]
By Lemma A.3 and Condition C1.c, we know that $\{e(mh),m\ge0\}$ converges almost surely, which, along with $\lim_{m\to\infty}E\|e(mh)\|^2=0$ by Theorem IV.1, gives
\[
\lim_{m\to\infty}e(mh)=0_{Nn\times1}\quad a.s.\qquad(55)
\]
For arbitrarily small $\epsilon>0$, by the Markov inequality, we have
\[
P\{a(k)\|v(k)\|\ge\epsilon\}\le\frac{a^2(k)E\|v(k)\|^2}{\epsilon^2},\quad k\ge0,
\]
which together with Assumption A1.b, Conditions C1.a and C1.c gives
\[
\sum_{k=0}^{\infty}P\{a(k)\|v(k)\|\ge\epsilon\}\le\frac{\sum_{k=0}^{\infty}a^2(k)E\|v(k)\|^2}{\epsilon^2}\le\frac{\beta_v\sum_{k=0}^{\infty}a^2(k)}{\epsilon^2}<\infty.
\]
Then, by the Borel-Cantelli lemma, we have $P\{a(k)\|v(k)\|\ge\epsilon\ \text{i.o.}\}=0$, which means
\[
a(k)\|v(k)\|\to0,\quad k\to\infty\quad a.s.\qquad(56)
\]
By (43), we have
\[
\|e(k)\|\le\|\Phi_P(k-1,m_kh)\|\,\|e(m_kh)\|+\sum_{i=m_kh}^{k-1}a(i)\|v(i)\|\,\|\Phi_P(k-1,i+1)\|\,\|H^T(i)\|.\qquad(57)
\]
By Assumption A2.a and noting $0\le k-m_kh<h$, we know that $\sup_{k\ge0}\|\Phi_P(k-1,m_kh)\|<\infty$ a.s. and $\sup_{k\ge0}\|\Phi_P(k-1,i+1)\|\,\|H^T(i)\|<\infty$ a.s., $m_kh\le i\le k-1$. Then, by (55)-(57), we have $\lim_{k\to\infty}e(k)=0_{Nn\times1}$ a.s. The proof is completed.

Proof of Theorem IV.2.
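The Borel-Cantelli step behind (56) can be illustrated with a small Monte-Carlo sketch. Here $v(k)$ is taken as i.i.d. standard Gaussian noise and $a(k)=(k+1)^{-0.8}$; both are illustrative assumptions rather than the paper's model, but they satisfy $\sum_k a^2(k)<\infty$, so the peaks of $a(k)\|v(k)\|$ along the tail should shrink:

```python
import random

def scaled_noise_peaks(n, a, seed=1):
    """Max of a(k)*|v(k)| over the first and last thirds of k = 0..n-1,
    for i.i.d. standard Gaussian v(k) (illustrative assumption)."""
    rng = random.Random(seed)
    vals = [a(k) * abs(rng.gauss(0.0, 1.0)) for k in range(n)]
    third = n // 3
    return max(vals[:third]), max(vals[-third:])

a = lambda k: (k + 1) ** -0.8  # square-summable gain, so Borel-Cantelli applies
early_peak, late_peak = scaled_noise_peaks(9000, a)
print(early_peak, late_peak)
```

The late peaks are orders of magnitude below the early ones, consistent with $a(k)\|v(k)\|\to0$ a.s.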
Since $\{\mathcal G(k),k\ge0\}\in\Gamma_1$, $E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(k-1)]$ is positive semi-definite, which together with $E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(mh-1)]=E[E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(k-1)]\mid\mathcal F(mh-1)]$ leads to $E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(mh-1)]$ being positive semi-definite, $k\ge mh$. Let $c(m)=\min\{a((m+1)h),b((m+1)h)\}$. Then, by Condition C1.a and the condition (c.1), we have
\[
\begin{aligned}
\Lambda_m^h&=\lambda_{\min}\Big[\sum_{k=mh}^{(m+1)h-1}\Big(b(k)E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(mh-1)]\otimes I_n+a(k)E[H^T(k)H(k)\mid\mathcal F(mh-1)]\Big)\Big]\\
&\ge\lambda_{\min}\Big[\sum_{k=mh}^{(m+1)h-1}\Big(b((m+1)h)E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(mh-1)]\otimes I_n+a((m+1)h)E[H^T(k)H(k)\mid\mathcal F(mh-1)]\Big)\Big]\\
&\ge c(m)\bar\Lambda_m^h\ge c(m)\theta.
\end{aligned}
\]
Note that $\sum_{m=0}^{\infty}a((m+1)h)\ge\frac1h\sum_{m=0}^{\infty}\sum_{i=(m+1)h}^{(m+2)h-1}a(i)=\frac1h\sum_{k=h}^{\infty}a(k)$. This together with Conditions C1.a and C1.b, and $c(m)\ge\min\{a((m+1)h),a((m+1)h)/C_1\}=\min\{1,1/C_1\}a((m+1)h)$, where $C_1\triangleq\sup_{k\ge0}\frac{a(k)}{b(k)}$, gives
\[
\sum_{m=0}^{\infty}c(m)\ge\min\{1,1/C_1\}\sum_{m=0}^{\infty}a((m+1)h)\ge\frac{\min\{1,1/C_1\}}{h}\sum_{k=h}^{\infty}a(k)=\infty.\qquad(58)
\]
By Conditions C1.a and C1.b, we get
\[
\sup_{m\ge0}\frac{a(mh)}{c(m)}=\sup_{m\ge0}\frac{a(mh)}{a(mh+h)}\cdot\frac{a(mh+h)}{c(m)}\le\sup_{m\ge0}\frac{a(mh)}{a(mh+h)}\cdot\frac{a(mh+h)}{\min\{a(mh+h),\frac{1}{C_1}a(mh+h)\}}<\infty,
\]
which together with Condition C1.b gives
\[
\lim_{m\to\infty}\frac{b^2(mh)}{c(m)}=\lim_{m\to\infty}\frac{b^2(mh)}{a(mh)}\cdot\frac{a(mh)}{c(m)}=0.\qquad(59)
\]
Then, $c(m)$ satisfies $b^2(mh)=o(c(m))$ and $\sum_{m=0}^{\infty}c(m)=\infty$. The proof is completed by Theorem IV.1.

Proof of Corollary IV.1. By Assumption A3 and the one-to-one correspondence between $\mathcal A_{\mathcal G(k)}$ and $\mathcal L_{\mathcal G(k)}$, we know that $\{\mathcal L_{\mathcal G(k)},k\ge0\}$ is a homogeneous and uniformly ergodic Markov chain (see Definition A.1) with the unique stationary distribution $\pi$. Denote the Laplacian matrix associated with $A_l$ by $\mathcal L_l$ and $\hat{\mathcal L}_l=\frac{\mathcal L_l+\mathcal L_l^T}{2}$, $l=1,2,\dots$
By the definition of $\bar\Lambda_m^h$, we have
\[
\begin{aligned}
\bar\Lambda_m^h&=\lambda_{\min}\Big[\sum_{k=mh}^{(m+1)h-1}E[\hat{\mathcal L}_{\mathcal G(k)}\otimes I_n+H^T(k)H(k)\mid\mathcal F(mh-1)]\Big]\\
&=\lambda_{\min}\Big[\sum_{k=mh}^{(m+1)h-1}E[\hat{\mathcal L}_{\mathcal G(k)}\otimes I_n+H^T(k)H(k)\mid\langle\hat{\mathcal L}_{\mathcal G(mh-1)},H(mh-1)\rangle=S_0]\Big]\\
&=\lambda_{\min}\Big[\sum_{k=1}^{h}\sum_{l=1}^{\infty}(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)P^k(S_0,\langle\hat{\mathcal L}_l,H_l\rangle)\Big],\quad\forall S_0\in\mathcal S,\ \forall m\ge0,\ h\ge1.\qquad(60)
\end{aligned}
\]
Noting the uniform ergodicity of $\{\hat{\mathcal L}_{\mathcal G(k)},k\ge0\}$ and $\{H(k),k\ge0\}$ and the uniqueness of the stationary distribution $\pi$, since $\sup_{l\ge1}\|\mathcal L_l\|<\infty$ and $\sup_{l\ge1}\|H_l\|<\infty$, we have
\[
\begin{aligned}
&\Big\|\frac{\sum_{k=1}^{h}\sum_{l=1}^{\infty}(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)P^k(S_0,\langle\hat{\mathcal L}_l,H_l\rangle)}{h}-\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)\Big\|\\
&=\Big\|\frac{\sum_{k=1}^{h}\sum_{l=1}^{\infty}[(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)P^k(S_0,\langle\hat{\mathcal L}_l,H_l\rangle)-\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)]}{h}\Big\|\\
&=\Big\|\frac{\sum_{k=1}^{h}\sum_{l=1}^{\infty}(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)(P^k(S_0,\langle\hat{\mathcal L}_l,H_l\rangle)-\pi_l)}{h}\Big\|\\
&\le\sup_{l\ge1}\|\hat{\mathcal L}_l\otimes I_n+H_l^TH_l\|\cdot\frac{\sum_{k=1}^{h}Rr^{-k}}{h}\to0,\quad h\to\infty,
\end{aligned}
\]
where the constants $R$ and $r$ are positive with $r>1$. By the definition of uniform convergence, we know that
\[
\frac1h\sum_{k=mh}^{(m+1)h-1}E[\hat{\mathcal L}_{\mathcal G(k)}\otimes I_n+H^T(k)H(k)\mid\mathcal F(mh-1)]
\]
converges to $\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)$ uniformly w.r.t. $m$ and the sample paths a.s., as $h\to\infty$. By the conditions (d.1) and (d.2), it follows that $\lambda_{\min}(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l))>0$. To see this, for any given $x\in\mathbb R^{Nn}$, $x\ne0_{Nn\times1}$, let $x=[x_1^T,\cdots,x_N^T]^T$, $x_i\in\mathbb R^n$: (i) if $x=\mathbf 1_N\otimes a$ for some $a\in\mathbb R^n$, $a\ne0_{n\times1}$, i.e. $x_1=x_2=\cdots=x_N=a$, then by the condition (d.2), we have $x^T(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l))x=a^T[\sum_{i=1}^{N}\sum_{l=1}^{\infty}(\pi_lH_{i,l}^TH_{i,l})]a>0$; (ii) otherwise, there must exist $i\ne j$ with $x_i\ne x_j$. By the condition (d.1), we know that $\sum_{l=1}^{\infty}\pi_l\hat{\mathcal L}_l$ is the Laplacian matrix of a connected graph.
Then, by Lemma A.7, we have $x^T(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l))x\ge x^T(\sum_{l=1}^{\infty}\pi_l\hat{\mathcal L}_l\otimes I_n)x>0$. Combining (i) and (ii), we get $\lambda_{\min}(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l))>0$.

Since $\lambda_{\min}(\cdot)$, as a function of the matrix argument, is continuous, we know that for a given constant $\mu\in(0,2\lambda_{\min}(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)))$, there exists a constant $\delta>0$ such that for any given matrix $L$, $|\lambda_{\min}(L)-\lambda_{\min}(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l))|\le\frac{\mu}{2}$ provided $\|L-\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)\|\le\delta$. Since the convergence is uniform, we know that there exists an integer $h_0>0$ such that
\[
\sup_{m\ge0}\Big\|\frac1h\sum_{k=mh}^{(m+1)h-1}E[\hat{\mathcal L}_{\mathcal G(k)}\otimes I_n+H^T(k)H(k)\mid\mathcal F(mh-1)]-\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)\Big\|\le\delta,\quad h\ge h_0\ a.s.,
\]
which gives
\[
\sup_{m\ge0}\Big|\frac1h\bar\Lambda_m^h-\lambda_{\min}\Big(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)\Big)\Big|\le\frac{\mu}{2},\quad h\ge h_0\ a.s.
\]
Thus, we arrive at
\[
\inf_{m\ge0}\bar\Lambda_m^h\ge\Big[\lambda_{\min}\Big(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)\Big)-\frac{\mu}{2}\Big]h\ge\Big[\lambda_{\min}\Big(\sum_{l=1}^{\infty}\pi_l(\hat{\mathcal L}_l\otimes I_n+H_l^TH_l)\Big)-\frac{\mu}{2}\Big]h_0>0\quad a.s.
\]
By Theorem IV.2, the proof is completed.

APPENDIX C: PROOFS IN SECTION V

Proof of Lemma V.1. We adopt mathematical induction to prove the lemma. By (6) and (17), noting that $F(k)=I_{Nn}$, $-d\le k\le-1$, we have
\[
\begin{aligned}
F(0)&=I_{Nn}-\Big[b(0)\mathcal D_{\mathcal G(0)}\otimes I_n+a(0)H^T(0)H(0)-b(0)\sum_{q=0}^{d}A(0,q)\Big]\\
&=I_{Nn}-[b(0)\mathcal D_{\mathcal G(0)}\otimes I_n+a(0)H^T(0)H(0)-b(0)\mathcal A_{\mathcal G(0)}\otimes I_n].
\end{aligned}
\]
Note that, under Condition C1.d, the set $\{\psi\in(0,1)\mid b(0)\le f_{C_1,\beta_a,\beta_H,N,d}(\psi)\}$ is a nonempty and bounded closed set by the continuity of $f_{C_1,\beta_a,\beta_H,N,d}(\psi)$. Hence, $\psi_1$ exists. Then, by the definition of $\psi_1$, we have
\[
b(0)\big[N\beta_a+C_1\beta_H^2+N\beta_a[(1-\psi_1)^{-(d+1)}-1]/[(1-\psi_1)^{-1}-1]\big]\le\psi_1.\qquad(61)
\]
By the above, Assumption A2.b and Condition C1.a, we have
\[
\begin{aligned}
\|G(0)\|&=\|b(0)\mathcal D_{\mathcal G(0)}\otimes I_n+a(0)H^T(0)H(0)-b(0)\mathcal A_{\mathcal G(0)}\otimes I_n\|\\
&\le b(0)\sup_{k\ge0}\|\mathcal D_{\mathcal G(k)}\|+a(0)\sup_{k\ge0}\|H^T(k)H(k)\|+b(0)\sup_{k\ge0}\|\mathcal A_{\mathcal G(k)}\|\\
&\le b(0)[2N\beta_a+C_1\beta_H^2]\\
&\le b(0)\big[N\beta_a+C_1\beta_H^2+N\beta_a[(1-\psi_1)^{-(d+1)}-1]/[(1-\psi_1)^{-1}-1]\big]\le\psi_1\quad a.s.
\end{aligned}
\]
By the above and Lemma A.1, noting $\psi_1\in(0,1)$, it follows that $F(0)$ is invertible a.s. and $\|F^{-1}(0)\|\le(1-\psi_1)^{-1}$ a.s. Assume that $F(k)$ is invertible a.s. and $\|F^{-1}(k)\|<(1-\psi_1)^{-1}$ a.s. for $k=0,1,2,\cdots$. By (61), Assumption A2.b and Condition C1.a, we have
\[
\begin{aligned}
\|G(k+1)\|&=\Big\|b(k+1)\mathcal D_{\mathcal G(k+1)}\otimes I_n+a(k+1)H^T(k+1)H(k+1)\\
&\qquad-b(k+1)\sum_{q=0}^{d}A(k+1,q)[\Phi_F(k,k-q+1)]^{-1}\Big\|\\
&\le b(k+1)[N\beta_a+C_1\beta_H^2]+b(k+1)N\beta_a\sum_{q=0}^{d}(1-\psi_1)^{-q}\\
&\le b(0)\big[N\beta_a+C_1\beta_H^2+N\beta_a[(1-\psi_1)^{-(d+1)}-1]/[(1-\psi_1)^{-1}-1]\big]\le\psi_1\quad a.s.
\end{aligned}
\]
Then, by Lemma A.1, we know that $F(k+1)$ is invertible a.s. and $\|F^{-1}(k+1)\|\le(1-\psi_1)^{-1}$ a.s. By mathematical induction, the proof is completed.

Before proving Theorem V.1, we need the following lemma.

Lemma C.1. If Assumption A2.b, Conditions C1.a and C1.d hold, and there exist a positive integer $h$ and a positive sequence $\{c(m),m\ge0\}$ such that $\tilde\Lambda_m^h\ge c(m)$ a.s. with $c(m)$ satisfying
\[
b^2(mh)=o(c(m))\quad\text{and}\quad\sum_{m=0}^{\infty}c(m)=\infty,\qquad(62)
\]
then $\lim_{k\to\infty}E[\Phi_F(k,0)\Phi_F^T(k,0)]=0$.

Proof. Since Assumption A2.b, Conditions C1.a and C1.d hold, Lemma V.1 holds. Hence, $F(k)$ is invertible a.s., and (17) follows. Similarly to (28)-(31) in the proof of Lemma B.1, there exists a positive integer $m_1'$ such that
\[
\begin{aligned}
&\|E[\Phi_F((m+1)h-1,mh)\Phi_F^T((m+1)h-1,mh)\mid\mathcal F(mh-1)]\|\\
&=1-\lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1}E[G(k)+G^T(k)\mid\mathcal F(mh-1)]\Big)\\
&\quad+E[M_2(m)+\cdots+M_{2h}(m)\mid\mathcal F(mh-1)],\quad\forall m\ge m_1'\ a.s.\qquad(63)
\end{aligned}
\]
Here, the definitions of $M_i(m)$, $i=2,\cdots,2h$, are similar to (29). By (18), (19) and $\tilde\Lambda_m^h\ge c(m)$ a.s., we have
\[
\begin{aligned}
&1-\lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1}E[G(k)+G^T(k)\mid\mathcal F(mh-1)]\Big)\\
&=1-\lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1}E\Big[2b(k)\mathcal D_{\mathcal G(k)}\otimes I_n+2a(k)H^T(k)H(k)\\
&\qquad-b(k)\sum_{q=0}^{d}\big[A(k,q)[\Phi_F(k-1,k-q)]^{-1}+(A(k,q)[\Phi_F(k-1,k-q)]^{-1})^T\big]\,\Big|\,\mathcal F(mh-1)\Big]\Big)\\
&=1-\lambda_{\min}\Big(\sum_{k=mh}^{(m+1)h-1}E\Big[2b(k)\hat{\mathcal L}_{\mathcal G(k)}\otimes I_n+2a(k)H^T(k)H(k)\\
&\qquad-b(k)\sum_{q=0}^{d}\big[A(k,q)[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]+(A(k,q)[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}])^T\big]\,\Big|\,\mathcal F(mh-1)\Big]\Big)\\
&=1-2\tilde\Lambda_m^h\le1-c(m)\quad a.s.\qquad(64)
\end{aligned}
\]
From (18), Assumption A2.b, Condition C1.a and Lemma V.1, we have
\[
\begin{aligned}
\|G(k)\|&\le b(k)\|\mathcal D_{\mathcal G(k)}\otimes I_n\|+a(k)\|H^T(k)H(k)\|+b(k)\Big\|\sum_{q=0}^{d}A(k,q)[\Phi_F(k-1,k-q)]^{-1}\Big\|\\
&\le b(k)\Big(N\beta_a+C_1\beta_H^2+N\beta_a\frac{1-(1-\psi_1)^{-(d+1)}}{1-(1-\psi_1)^{-1}}\Big)\quad a.s.,\ k\ge0.
\end{aligned}
\]
By the above and the definition of $M_i(m)$, $i=2,\cdots,2h$, we have
\[
\|M_i(m)\|\le b^2(mh)C_{2h}^i\Big(N\beta_a+C_1\beta_H^2+N\beta_a\frac{1-(1-\psi_1)^{-(d+1)}}{1-(1-\psi_1)^{-1}}\Big)^i\quad a.s.,
\]
where $C_m^p$ represents the number of combinations of $p$ elements chosen from $m$ elements. Hence,
\[
\|E[M_2(m)+\cdots+M_{2h}(m)\mid\mathcal F(mh-1)]\|\le b^2(mh)\sum_{i=2}^{2h}C_{2h}^i\Big(N\beta_a+C_1\beta_H^2+N\beta_a\frac{1-(1-\psi_1)^{-(d+1)}}{1-(1-\psi_1)^{-1}}\Big)^i=b^2(mh)\gamma\quad a.s.,\qquad(65)
\]
where
\[
\gamma=\Big(N\beta_a+C_1\beta_H^2+N\beta_a\frac{1-(1-\psi_1)^{-(d+1)}}{1-(1-\psi_1)^{-1}}+1\Big)^{2h}-1-2h\Big(N\beta_a+C_1\beta_H^2+N\beta_a\frac{1-(1-\psi_1)^{-(d+1)}}{1-(1-\psi_1)^{-1}}\Big).
\]
By (63), (64) and (65), we have
\[
\|E[\Phi_F((m+1)h-1,mh)\Phi_F^T((m+1)h-1,mh)\mid\mathcal F(mh-1)]\|\le1-c(m)+b^2(mh)\gamma\quad a.s.,\ m\ge m_1'.\qquad(66)
\]
By (17) and Assumption A2.b, we know that there exists a positive constant $\eta$ such that
\[
\|F(k)\|\le\eta\quad a.s.,\ k\ge0.\qquad(67)
\]
Denote $m_k=\lfloor\frac{k}{h}\rfloor$. By (67) and Lemma A.6, we have
\[
\begin{aligned}
\|E[\Phi_F(k,0)\Phi_F^T(k,0)]\|&\le Nn\|E[\Phi_F^T(k,0)\Phi_F(k,0)]\|\\
&=Nn\|E[\Phi_F^T(m_kh-1,0)\Phi_F^T(k,m_kh)\Phi_F(k,m_kh)\Phi_F(m_kh-1,0)]\|\\
&\le Nn\|E[\Phi_F^T(m_kh-1,0)\|\Phi_F(k,m_kh)\|^2\Phi_F(m_kh-1,0)]\|\\
&\le\eta^{2h}Nn\|E[\Phi_F^T(m_kh-1,0)\Phi_F(m_kh-1,0)]\|\\
&=\eta^{2h}Nn\|E[\Phi_F^T(m_1'h-1,0)\Phi_F^T(m_kh-1,m_1'h)\Phi_F(m_kh-1,m_1'h)\Phi_F(m_1'h-1,0)]\|\\
&\le\eta^{2h}Nn\|E[\|\Phi_F(m_1'h-1,0)\|^2\Phi_F^T(m_kh-1,m_1'h)\Phi_F(m_kh-1,m_1'h)]\|\\
&\le\eta^{2(h+m_1'h)}Nn\|E[\Phi_F^T(m_kh-1,m_1'h)\Phi_F(m_kh-1,m_1'h)]\|\quad a.s.\qquad(68)
\end{aligned}
\]
From the properties of the conditional expectation and (66), it follows that
\[
\begin{aligned}
&\|E[\Phi_F^T(m_kh-1,m_1'h)\Phi_F(m_kh-1,m_1'h)]\|\\
&=\|E[\Phi_F^T((m_k-1)h-1,m_1'h)\Phi_F^T(m_kh-1,(m_k-1)h)\Phi_F(m_kh-1,(m_k-1)h)\Phi_F((m_k-1)h-1,m_1'h)]\|\\
&=\|E[E[\Phi_F^T((m_k-1)h-1,m_1'h)\Phi_F^T(m_kh-1,(m_k-1)h)\Phi_F(m_kh-1,(m_k-1)h)\\
&\qquad\times\Phi_F((m_k-1)h-1,m_1'h)\mid\mathcal F((m_k-1)h-1)]]\|\\
&\le\|E[\Phi_F^T((m_k-1)h-1,m_1'h)\\
&\qquad\times\|E[\Phi_F^T(m_kh-1,(m_k-1)h)\Phi_F(m_kh-1,(m_k-1)h)\mid\mathcal F((m_k-1)h-1)]\|\\
&\qquad\times\Phi_F((m_k-1)h-1,m_1'h)]\|\\
&\le[1-c(m_k-1)+b^2((m_k-1)h)\gamma]\,\|E[\Phi_F^T((m_k-1)h-1,m_1'h)\Phi_F((m_k-1)h-1,m_1'h)]\|\\
&\le\prod_{s=m_1'}^{m_k-1}[1-c(s)+b^2(sh)\gamma]\quad a.s.\qquad(69)
\end{aligned}
\]
Combining (68) and (69) implies
\[
\|E[\Phi_F(k,0)\Phi_F^T(k,0)]\|\le Nn\eta^{2(h+m_1'h)}\prod_{s=m_1'}^{m_k-1}[1-c(s)+b^2(sh)\gamma]\quad a.s.
\]
Similarly to (40)-(42) in the proof of Lemma B.1, by Condition C1.a, (62) and the above, we have $\lim_{k\to\infty}\|E[\Phi_F(k,0)\Phi_F^T(k,0)]\|=0$.
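The closed form of $\gamma$ in (65) is just the binomial identity $\sum_{i=2}^{2h}\binom{2h}{i}x^i=(x+1)^{2h}-1-2hx$ evaluated at $x=N\beta_a+C_1\beta_H^2+N\beta_a\frac{1-(1-\psi_1)^{-(d+1)}}{1-(1-\psi_1)^{-1}}$. A quick numerical check of the identity (the values of $x$ and $h$ below are arbitrary test inputs):

```python
from math import comb

def gamma_closed_form(x, h):
    # (x + 1)^(2h) - 1 - 2h*x: the closed form of gamma in (65)
    return (x + 1) ** (2 * h) - 1 - 2 * h * x

def gamma_binomial_sum(x, h):
    # sum_{i=2}^{2h} C(2h, i) x^i: the raw bound on ||E[M_2 + ... + M_{2h}]|| / b^2(mh)
    return sum(comb(2 * h, i) * x ** i for i in range(2, 2 * h + 1))
```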
The proof is completed.

Proof of Theorem V.1. By the conditions of the theorem, it follows that Lemmas V.1 and C.1 hold. Denote the following block matrices: $\mathbf r(k)=[r^T(k),g^T(k),\cdots,g^T(k-d+1)]^T$, $\hat I=[0_{Nn\times Nn},\tilde I]^T$ and $\tilde I=[I_{Nn},0_{Nn\times Nn},\cdots,0_{Nn\times Nn}]$, where $\hat I$ and $\tilde I$ are the $Nn(d+1)$-dimensional column block matrix and the $Nnd$-dimensional row block matrix, respectively, with each block being $Nn$-dimensional. Denote
\[
T(k)=\begin{pmatrix}F(k)&\tilde I\\0_{Nnd\times Nn}&C(k)\end{pmatrix},
\]
which gives
\[
\Phi_T(k,0)=\begin{pmatrix}\Phi_F(k,0)&\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\\0_{Nnd\times Nn}&\Phi_C(k,0)\end{pmatrix}.
\]
Denote
\[
C(k)=\begin{pmatrix}
C_1(k+1)&C_2(k+1)&\cdots&C_{d-1}(k+1)&C_d(k+1)\\
I_{Nn}&0&\cdots&0&0\\
0&I_{Nn}&\cdots&0&0\\
\vdots&&\ddots&&\vdots\\
0&0&\cdots&I_{Nn}&0_{Nn\times Nn}
\end{pmatrix}.\qquad(70)
\]
By the state augmentation approach and (15), we have
\[
\mathbf r(k+1)=T(k)\mathbf r(k)+a(k+1)\hat IH^T(k+1)v(k+1)=\Phi_T(k,0)\mathbf r(0)+\sum_{i=1}^{k+1}a(i)\Phi_T(k,i)\hat IH^T(i)v(i),\quad k\ge0.
\]
Premultiplying both sides of the above by the $Nn(d+1)$-dimensional row block matrix $\mathcal I\triangleq[I_{Nn},0_{Nn\times Nn},\cdots,0_{Nn\times Nn}]$ gives
\[
r(k+1)=\mathcal I\Phi_T(k,0)\mathbf r(0)+\sum_{i=1}^{k+1}a(i)\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i),
\]
which leads to
\[
\begin{aligned}
E[r(k+1)r^T(k+1)]&=E[\mathcal I\Phi_T(k,0)\mathbf r(0)\mathbf r^T(0)\Phi_T^T(k,0)\mathcal I^T]\\
&\quad+E\Big[\mathcal I\Phi_T(k,0)\mathbf r(0)\sum_{i=1}^{k+1}a(i)v^T(i)H(i)\hat I^T\Phi_T^T(k,i)\mathcal I^T\Big]\\
&\quad+E\Big[\sum_{i=1}^{k+1}a(i)\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)\mathbf r^T(0)\Phi_T^T(k,0)\mathcal I^T\Big]\\
&\quad+E\Big[\Big(\sum_{i=1}^{k+1}a(i)\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)\Big)\Big(\sum_{i=1}^{k+1}a(i)\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)\Big)^T\Big].\qquad(71)
\end{aligned}
\]
By Assumptions A1.a and A1.b, we know that the second and third terms on the right side of the above are both equal to zero.
By (45), we have
\[
E\Big[\Big(\sum_{i=1}^{k+1}a(i)\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)\Big)\Big(\sum_{i=1}^{k+1}a(i)\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)\Big)^T\Big]=\sum_{i=1}^{k+1}a^2(i)E[\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)v^T(i)H(i)\hat I^T\Phi_T^T(k,i)\mathcal I^T].
\]
Substituting the above into (71) and taking the 2-norm on both sides of (71), from Assumptions A1.a, A1.b and A2.b, it follows that
\[
\begin{aligned}
\|E[r(k+1)r^T(k+1)]\|&\le r_0\|E[\mathcal I\Phi_T(k,0)\Phi_T^T(k,0)\mathcal I^T]\|\\
&\quad+\Big\|\sum_{i=1}^{k+1}a^2(i)E[\mathcal I\Phi_T(k,i)\hat IH^T(i)v(i)v^T(i)H(i)\hat I^T\Phi_T^T(k,i)\mathcal I^T]\Big\|\\
&=r_0\|E[\mathcal I\Phi_T(k,0)\Phi_T^T(k,0)\mathcal I^T]\|\\
&\quad+\Big\|\sum_{i=1}^{k+1}a^2(i)E[\mathcal I\Phi_T(k,i)\hat IH^T(i)E(v(i)v^T(i))H(i)\hat I^T\Phi_T^T(k,i)\mathcal I^T]\Big\|\\
&\le r_0\|E[\mathcal I\Phi_T(k,0)\Phi_T^T(k,0)\mathcal I^T]\|\\
&\quad+\sup_{k\ge0}\|E[v(k)v^T(k)]\|\Big\|\sum_{i=1}^{k+1}a^2(i)E[\mathcal I\Phi_T(k,i)\hat IH^T(i)H(i)\hat I^T\Phi_T^T(k,i)\mathcal I^T]\Big\|\\
&\le r_0\|E[\mathcal I\Phi_T(k,0)\Phi_T^T(k,0)\mathcal I^T]\|\\
&\quad+\beta_H^2\sup_{k\ge0}\|E[v(k)v^T(k)]\|\Big\|\sum_{i=1}^{k+1}a^2(i)E[\mathcal I\Phi_T(k,i)\hat I\hat I^T\Phi_T^T(k,i)\mathcal I^T]\Big\|\\
&\le r_0\|E[\mathcal I\Phi_T(k,0)\Phi_T^T(k,0)\mathcal I^T]\|+\beta_H^2\beta_v\sum_{i=1}^{k+1}a^2(i)\|E[\mathcal I\Phi_T(k,i)\Phi_T^T(k,i)\mathcal I^T]\|,\qquad(72)
\end{aligned}
\]
where $r_0\triangleq\|\mathbf r(0)\mathbf r^T(0)\|$. By the definitions of $\Phi_T(k,0)$ and $\mathcal I$, we have
\[
\mathcal I\Phi_T(k,0)=\Big(\Phi_F(k,0)\quad\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big).
\]
Substituting the above into (72) gives
\[
\begin{aligned}
\|E[r(k+1)r^T(k+1)]\|&\le r_0\|E[\Phi_F(k,0)\Phi_F^T(k,0)]\|+\beta_H^2\beta_v\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|\\
&\quad+r_0\Big\|E\Big[\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}\Big\{\sum_{i=0}^{k}\Phi_C^T(i-1,0)\tilde I^T\Phi_F^T(k,i+1)\Big\}\Big]\Big\|\\
&\quad+\beta_H^2\beta_v\sum_{i=1}^{k+1}a^2(i)\Big\|E\Big[\Big\{\sum_{j=i}^{k}\Phi_F(k,j+1)\tilde I\Phi_C(j-1,i)\Big\}\Big\{\sum_{j=i}^{k}\Phi_F(k,j+1)\tilde I\Phi_C(j-1,i)\Big\}^T\Big]\Big\|.\qquad(73)
\end{aligned}
\]
By Lemma C.1, we know that the first term on the right side of the above converges to zero.
Denote $\tilde m_i=\lceil\frac{i}{h}\rceil$. By (67) and noting the definition of $m_k$ given in the proof of Lemma C.1, we have
\[
\begin{aligned}
\sum_{i=1}^{k-3h}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|&=\sum_{i=0}^{k-3h-1}a^2(i+1)\|E[\Phi_F(k,i+1)\Phi_F^T(k,i+1)]\|\\
&=\sum_{i=0}^{k-3h-1}a^2(i+1)\|E[\Phi_F(k,m_kh)\Phi_F(m_kh-1,\tilde m_{i+1}h)\Phi_F(\tilde m_{i+1}h-1,i+1)\\
&\qquad\times\Phi_F^T(\tilde m_{i+1}h-1,i+1)\Phi_F^T(m_kh-1,\tilde m_{i+1}h)\Phi_F^T(k,m_kh)]\|\\
&\le\eta^{2h}\sum_{i=0}^{k-3h-1}a^2(i+1)\|E[\Phi_F(k,m_kh)\Phi_F(m_kh-1,\tilde m_{i+1}h)\Phi_F^T(m_kh-1,\tilde m_{i+1}h)\Phi_F^T(k,m_kh)]\|\\
&\le\eta^{4h}\sum_{i=0}^{k-3h-1}a^2(i+1)\|E[\Phi_F(m_kh-1,\tilde m_{i+1}h)\Phi_F^T(m_kh-1,\tilde m_{i+1}h)]\|,
\end{aligned}
\]
which together with Lemma A.6 and (69) leads to
\[
\begin{aligned}
\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|&\le\eta^{4h}\sum_{i=0}^{k-3h-1}a^2(i+1)\|E[\Phi_F(m_kh-1,\tilde m_{i+1}h)\Phi_F^T(m_kh-1,\tilde m_{i+1}h)]\|\\
&\quad+\sum_{i=k-3h}^{k}a^2(i+1)\|E[\Phi_F(k,i+1)\Phi_F^T(k,i+1)]\|\\
&\le Nn\eta^{4h}\sum_{i=0}^{k-3h-1}a^2(i+1)\|E[\Phi_F^T(m_kh-1,\tilde m_{i+1}h)\Phi_F(m_kh-1,\tilde m_{i+1}h)]\|\\
&\quad+\sum_{i=k-3h}^{k}a^2(i+1)\|E[\Phi_F(k,i+1)\Phi_F^T(k,i+1)]\|\\
&\le Nn\eta^{4h}\sum_{i=0}^{k-3h-1}a^2(i+1)\prod_{s=\tilde m_{i+1}}^{m_k-1}[1-c(s)+b^2(sh)\gamma]\\
&\quad+\sum_{i=k-3h}^{k}a^2(i+1)\|E[\Phi_F(k,i+1)\Phi_F^T(k,i+1)]\|.
\end{aligned}
\]
Similarly to (51)-(53) in the proof of Theorem IV.1, we have
\[
\lim_{k\to\infty}\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|=0.\qquad(74)
\]
Hence, the second term on the right side of (73) converges to zero. From (16) and (17), we have
\[
C_i(k)=-b(k)\sum_{q=i}^{d}A(k,q)[\Phi_F(k-1,k-q)]^{-1},\quad1\le i\le d.
\]
By Assumption A2.b and Condition C1.a, there exist $\epsilon\in(0,\frac{1-\psi_1}{\sqrt{Nnd}})$, where $\psi_1$ is defined in Lemma V.1, and a positive integer $k(\epsilon)$, such that for all $k\ge k(\epsilon)$,
\[
\|C_i(k)\|_\infty\le\frac{\epsilon(\epsilon^{-1}-1)}{\epsilon^{-d}-1}\quad a.s.,\ 1\le i\le d,
\]
where $\|\cdot\|_\infty$ represents the infinity norm of a matrix. If $d>1$, denote $Y=\mathrm{diag}\{I_{Nn},\epsilon I_{Nn},\epsilon^2I_{Nn},\cdots,\epsilon^{d-1}I_{Nn}\}$; if $d=1$, denote $Y=I_{Nn}$, which together with (70) leads to
\[
YC(k)Y^{-1}=\begin{pmatrix}
C_1(k+1)&\epsilon^{-1}C_2(k+1)&\cdots&\epsilon^{1-d}C_d(k+1)\\
\epsilon I_{Nn}&0&\cdots&0\\
\vdots&\ddots&&\vdots\\
0&\cdots&\epsilon I_{Nn}&0_{Nn\times Nn}
\end{pmatrix}.
\]
Then, it follows that
\[
\|YC(k)Y^{-1}\|_\infty\le\max\Big\{\sum_{i=1}^{d}\epsilon^{1-i}\|C_i(k+1)\|_\infty,\ \epsilon\Big\}\le\max\Big\{\frac{\epsilon(\epsilon^{-1}-1)}{\epsilon^{-d}-1}\cdot\frac{\epsilon^{-d}-1}{\epsilon^{-1}-1},\ \epsilon\Big\}=\epsilon\quad a.s.
\]
From the relation between the infinity norm and the 2-norm of a matrix, we have
\[
\|YC(k)Y^{-1}\|\le\sqrt{Nnd}\,\|YC(k)Y^{-1}\|_\infty\le\sqrt{Nnd}\,\epsilon<1-\psi_1\quad a.s.\qquad(75)
\]
Noting that $F(k)$ is invertible a.s., we have
\[
\begin{aligned}
&\Big\|E\Big[\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}^T\Big]\Big\|\\
&\le\sum_{0\le i,j\le k}\|E[\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Phi_C^T(j-1,0)\tilde I^T\Phi_F^T(k,j+1)]\|\\
&\le\sum_{0\le i,j\le k}\|E[\Phi_F(k,0)[\Phi_F(i,0)]^{-1}\tilde I\Phi_C(i-1,0)\Phi_C^T(j-1,0)\tilde I^T[\Phi_F(j,0)]^{-T}\Phi_F^T(k,0)]\|\\
&\le\sum_{0\le i,j\le k}\|E[\Phi_F(k,0)\|[\Phi_F(i,0)]^{-1}\|\,\|\tilde I\Phi_C(i-1,0)\Phi_C^T(j-1,0)\tilde I^T\|\,\|[\Phi_F(j,0)]^{-T}\|\Phi_F^T(k,0)]\|.\qquad(76)
\end{aligned}
\]
By Lemma V.1, it follows that
\[
\|[\Phi_F(i,0)]^{-1}\|\le(1-\psi_1)^{-(i+1)}\quad\text{and}\quad\|[\Phi_F(j,0)]^{-T}\|\le(1-\psi_1)^{-(j+1)}\quad a.s.\qquad(77)
\]
From (75), we obtain
\[
\|\tilde I\Phi_C(i-1,0)\Phi_C^T(j-1,0)\tilde I^T\|\le\|\Phi_C(i-1,0)\|\,\|\Phi_C(j-1,0)\|=\|Y^{-1}\Phi_{YCY^{-1}}(i-1,0)Y\|\,\|Y^{-1}\Phi_{YCY^{-1}}(j-1,0)Y\|\le(\epsilon\sqrt{Nnd})^{i+j-2}\quad a.s.,\qquad(78)
\]
which combined with (76) and (77) gives
\[
\Big\|E\Big[\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}^T\Big]\Big\|\le(1-\psi_1)^{-2}\|E[\Phi_F(k,0)\Phi_F^T(k,0)]\|\sum_{0\le i,j\le k}\big((1-\psi_1)^{-1}\epsilon\sqrt{Nnd}\big)^{i+j}\quad a.s.
\]
Noting that $(1-\psi_1)^{-1}\epsilon\sqrt{Nnd}<1$, we have $\sum_{0\le i,j<\infty}((1-\psi_1)^{-1}\epsilon\sqrt{Nnd})^{i+j}<\infty$.
Hence, by Lemma C.1, it follows that
\[
\lim_{k\to\infty}E\Big[\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}\Big\{\sum_{i=0}^{k}\Phi_F(k,i+1)\tilde I\Phi_C(i-1,0)\Big\}^T\Big]=0.
\]
Thus, the third term on the right side of (73) converges to zero. By (77)-(78) and similarly to (76), it follows that
\[
\begin{aligned}
&\sum_{i=1}^{k+1}a^2(i)\Big\|E\Big[\Big\{\sum_{j=i}^{k}\Phi_F(k,j+1)\tilde I\Phi_C(j-1,i)\Big\}\Big\{\sum_{j=i}^{k}\Phi_F(k,j+1)\tilde I\Phi_C(j-1,i)\Big\}^T\Big]\Big\|\\
&=\sum_{i=1}^{k+1}a^2(i)\Big\|\sum_{i\le j_1,j_2\le k}E[\Phi_F(k,j_1+1)\tilde I\Phi_C(j_1-1,i)\Phi_C^T(j_2-1,i)\tilde I^T\Phi_F^T(k,j_2+1)]\Big\|\\
&=\sum_{i=1}^{k+1}a^2(i)\Big\|\sum_{i\le j_1,j_2\le k}E[\Phi_F(k,i)(\Phi_F(j_1,i))^{-1}\tilde I\Phi_C(j_1-1,i)\Phi_C^T(j_2-1,i)\tilde I^T(\Phi_F^T(j_2,i))^{-1}\Phi_F^T(k,i)]\Big\|\\
&\le\sum_{i=1}^{k+1}a^2(i)\Big\|\sum_{i\le j_1,j_2\le k}E[\Phi_F(k,i)\|(\Phi_F(j_1,i))^{-1}\tilde I\Phi_C(j_1-1,i)\Phi_C^T(j_2-1,i)\tilde I^T(\Phi_F^T(j_2,i))^{-1}\|\Phi_F^T(k,i)]\Big\|\\
&\le\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|\sum_{i\le j_1,j_2\le k}(1-\psi_1)^{-(j_1+j_2-2i+6)}(\epsilon\sqrt{Nnd})^{j_1+j_2-2i}\quad a.s.\\
&\le(1-\psi_1)^{-6}\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|\sum_{i\le j_1,j_2\le k}\big((1-\psi_1)^{-1}\epsilon\sqrt{Nnd}\big)^{j_1+j_2-2i}\\
&=(1-\psi_1)^{-6}\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|\Big(\frac{1-((1-\psi_1)^{-1}\epsilon\sqrt{Nnd})^{k-i+1}}{1-(1-\psi_1)^{-1}\epsilon\sqrt{Nnd}}\Big)^2\\
&\le\frac{(1-\psi_1)^{-6}}{(1-(1-\psi_1)^{-1}\epsilon\sqrt{Nnd})^2}\sum_{i=1}^{k+1}a^2(i)\|E[\Phi_F(k,i)\Phi_F^T(k,i)]\|\quad a.s.
\end{aligned}
\]
In the light of (74), the above converges to zero. So far, we have proved that all four terms on the right side of (73) converge to zero. Thus, we have $\lim_{k\to\infty}\|E(r(k+1)r^T(k+1))\|=0$, which, along with the facts that $E\|r(k)\|^2=E[\mathrm{Tr}(r(k)r^T(k))]=\mathrm{Tr}[E(r(k)r^T(k))]$ and $r(k)$ is equivalent to $e(k)$, gives $\lim_{k\to\infty}E\|e(k)\|^2=0$. The proof is completed.

Proof of Corollary V.1.
Following the lines of the proof of Lemma V.1, it can be verified that under $b(0)\le f_{C_1,\beta_a,\beta_H,N,d}(\psi_2)$, Assumption A2.b and Condition C1.a, $F(k)$ is invertible and $\|G(k)\|\le\psi_2$ a.s., $\forall k\ge0$. Noting that $\mathcal F(mh-1)\subseteq\mathcal F(k-1)$, $k\ge mh$, by the properties of the conditional expectation, we have
\[
\begin{aligned}
E[A(k,q)[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]&=E[E[A(k,q)[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(k-1)]\mid\mathcal F(mh-1)]\\
&=E[E[A(k,q)\mid\mathcal F(k-1)][[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)].\qquad(79)
\end{aligned}
\]
Since $\{\langle H(k),\mathcal A_{\mathcal G(k)},\lambda_{ji}(k),j,i\in\mathcal V\rangle,k\ge0\}$ is an independent process, by Assumption A1.a, we know that $A(k,q)$ is independent of $\mathcal F(k-1)$, $q=0,\dots,d$. Then, by (79), we have
\[
\begin{aligned}
E[A(k,q)[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]&=E[E[A(k,q)][[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]\\
&=E[A(k,q)]E[[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)],\\
&\qquad k=mh,\dots,(m+1)h-1,\ q=0,\dots,d.\qquad(80)
\end{aligned}
\]
Let $G_q(k)=I_{Nn}-\Phi_F(k-1,k-q)$, $q=0,\dots,d$. Then, $\Phi_F(k-1,k-q)=I_{Nn}-G_q(k)$. Noting that $\|G(k)\|\le\psi_2<2^{\frac1d}-1$, by the binomial expansion, we have
\[
\|G_q(k)\|=\|I_{Nn}-(I_{Nn}-G(k-1))\cdots(I_{Nn}-G(k-q))\|\le(1+\psi_2)^q-1<1.
\]
Hence, $[\Phi_F(k-1,k-q)]^{-1}=(I_{Nn}-G_q(k))^{-1}=\sum_{i=0}^{\infty}G_q^i(k)$. It follows that $[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}=\sum_{i=1}^{\infty}G_q^i(k)$. Therefore,
\[
\|[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}\|\le\sum_{i=1}^{\infty}\|G_q(k)\|^i\le\sum_{i=1}^{\infty}[(1+\psi_2)^q-1]^i=\frac{(1+\psi_2)^q-1}{2-(1+\psi_2)^q},\quad q=0,\dots,d\quad a.s.\qquad(81)
\]
Noting that for any symmetric matrix $B\in\mathbb R^{n\times n}$, $B\ge\lambda_{\min}(B)I_n$ and $B\le\|B\|I_n$, and for any matrix $B\in\mathbb R^{n\times n}$, $\|B\|=\|B^T\|$, by the definition of $\Lambda_m^h$, we have
\[
\begin{aligned}
&\sum_{k=mh}^{(m+1)h-1}\Big(b(k)E[\hat{\mathcal L}_{\mathcal G(k)}]\otimes I_n+a(k)E[H^T(k)H(k)]\\
&\quad-\frac{b(k)}{2}\sum_{q=0}^{d}E[A(k,q)]E[[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]\\
&\quad-\frac{b(k)}{2}\sum_{q=0}^{d}E[[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]^TE[A^T(k,q)]\Big)\\
&\ge\Lambda_m^hI_{Nn}-\Big(\sum_{k=mh}^{(m+1)h-1}b(k)\sum_{q=0}^{d}\|E[A(k,q)]E[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}\mid\mathcal F(mh-1)]\|\Big)I_{Nn}.\qquad(82)
\end{aligned}
\]
By the above, (80), (81) and the definition of $\tilde\Lambda_m^h$, we have
\[
\begin{aligned}
\tilde\Lambda_m^h&\ge\Lambda_m^h-\sum_{k=mh}^{(m+1)h-1}b(k)\sum_{q=0}^{d}\|E[A(k,q)]E[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}\mid\mathcal F(mh-1)]\|\\
&\ge\Lambda_m^h-\sum_{k=mh}^{(m+1)h-1}b(k)\sum_{q=0}^{d}\|E[A(k,q)]\|\frac{(1+\psi_2)^q-1}{2-(1+\psi_2)^q}\ge c(m),
\end{aligned}
\]
where the last inequality follows by the condition (21). Hence, $\tilde\Lambda_m^h\ge c(m)$. By Theorem V.1 and the conditions of the corollary, the proof is completed.

Proof of Corollary V.2. We first prove the first part of the corollary. Let $c(m)=\min\{a((m+1)h),b((m+1)h)\}$. Since $\{\mathcal G(k),k\ge0\}\in\Gamma_1$, we know that $E[\hat{\mathcal L}_{\mathcal G(k)}\mid\mathcal F(mh-1)]$ is positive semi-definite, $k\ge mh$. Then, by the definitions of $\Lambda_m^h$ and $\bar\Lambda_m^h$, we have
\[
\Lambda_m^h\ge c(m)\bar\Lambda_m^h.\qquad(83)
\]
Then, noting that $c(m)\ge\min\{1,1/C_1\}a((m+1)h)$, by the definitions of $C_2$ and $C_3$, we have
\[
b(mh)\le C_2a(mh)\le C_2(C_3)^ha((m+1)h)\le C_2(C_3)^h\max\{1,C_1\}c(m).\qquad(84)
\]
By the definitions of $\tilde\Lambda_m^h$ and $\Sigma_m^h$, (83) and (84), similarly to (82), we have
\[
\begin{aligned}
\tilde\Lambda_m^h&\ge\Lambda_m^h-\Big(\sum_{k=mh}^{(m+1)h-1}b(k)\sum_{q=0}^{d}\|E[A(k,q)([\Phi_F(k-1,k-q)]^{-1}-I_{Nn})\mid\mathcal F(mh-1)]\|\Big)\\
&\ge\Lambda_m^h-b(mh)\Big(\sum_{k=mh}^{(m+1)h-1}\sum_{q=0}^{d}\|E[A(k,q)([\Phi_F(k-1,k-q)]^{-1}-I_{Nn})\mid\mathcal F(mh-1)]\|\Big)\\
&\ge c(m)\bar\Lambda_m^h-c(m)\Sigma_m^h\ge c(m)\theta\quad a.s.,
\end{aligned}
\]
where $\theta>0$ by the condition (22). By Conditions C1.a and C1.b, similarly to (58)-(59), it follows that $\sum_{m=0}^{\infty}c(m)=\infty$ and $b^2(mh)=o(c(m))$. Then, the algorithm (7) converges in mean square by Theorem V.1.

We next prove the second part of the corollary. Since $\{\langle H(k),\mathcal A_{\mathcal G(k)},\lambda_{ji}(k),j,i\in\mathcal V\rangle,k\ge0\}$ is an independent process, by (80) and (81), we have
\[
\|E[A(k,q)[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]\|=\|E[A(k,q)]E[[[\Phi_F(k-1,k-q)]^{-1}-I_{Nn}]\mid\mathcal F(mh-1)]\|\le\|E[A(k,q)]\|\frac{(1+\psi_2)^q-1}{2-(1+\psi_2)^q},\quad q=0,\dots,d.
\]
Noting the definition of $\Sigma_m^h$, we then have
\[
\Sigma_m^h\le C_2(C_3)^h\max\{1,C_1\}\sup_{m\ge0}\Big(\sum_{k=mh}^{(m+1)h-1}\sum_{q=0}^{d}\|E[A(k,q)]\|\frac{(1+\psi_2)^q-1}{2-(1+\psi_2)^q}\Big).
\]
By the above and the condition (23), we know that $\inf_{m\ge0}(\bar\Lambda_m^h-\Sigma_m^h)\ge\theta$, where
\[
\theta\triangleq\inf_{m\ge0}\bar\Lambda_m^h-C_2(C_3)^h\max\{1,C_1\}\sup_{m\ge0}\Big(\sum_{k=mh}^{(m+1)h-1}\sum_{q=0}^{d}\|E[A(k,q)]\|\frac{(1+\psi_2)^q-1}{2-(1+\psi_2)^q}\Big)>0.
\]
Then, the proof is completed.

Proof of Corollary V.3. Following the lines of the proof of Lemma V.1, it can be verified that under $b(0)\le f_{C_1,\beta_a,\beta_H,N,d}(\psi_3)$, Assumption A2.b and Condition C1.a, $F(k)$ is invertible a.s. and $\|G(k)\|\le\psi_3$ a.s., $\forall k\ge0$. Let $c(m)=\min\{a((m+1)h),b((m+1)h)\}$. Recalling the definition of $\Sigma_m^h$ in Corollary V.2, by (83) and (84), we have
\[
\begin{aligned}
\tilde\Lambda_m^h&\ge\Lambda_m^h-\Big(\sum_{k=mh}^{(m+1)h-1}b(k)\sum_{q=0}^{d}\|E[A(k,q)([\Phi_F(k-1,k-q)]^{-1}-I_{Nn})\mid\mathcal F(mh-1)]\|\Big)\\
&\ge c(m)(\bar\Lambda_m^h-\Sigma_m^h)\ge c(m)(\theta-\Sigma_m^h),\qquad(85)
\end{aligned}
\]
where the last inequality follows by $\inf_{m\ge0}\bar\Lambda_m^h\ge\theta$ a.s. We next prove that $\theta-\Sigma_m^h$ has a positive lower bound under the conditions of the corollary.
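The bound (81) used throughout these corollaries comes from summing the Neumann series $\sum_{i\ge1}r^i=\frac{r}{1-r}$ at $r=(1+\psi)^q-1$, and $1-r=2-(1+\psi)^q$. The sketch below checks that a truncated series and the closed form of (81) agree (the values of $\psi$ and $q$ are arbitrary test inputs, not quantities from the paper):

```python
def neumann_tail(psi, q, terms=200):
    """Truncated sum_{i=1}^{terms} r^i with r = (1+psi)^q - 1,
    the series bounding ||inverse - identity|| in (81)."""
    r = (1 + psi) ** q - 1
    assert r < 1, "need (1 + psi)^q < 2 for the series to converge"
    return sum(r ** i for i in range(1, terms + 1))

def closed_form_81(psi, q):
    # ((1+psi)^q - 1) / (2 - (1+psi)^q): the right-hand side of (81)
    return ((1 + psi) ** q - 1) / (2 - (1 + psi) ** q)
```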
By the definition of $\psi_3$, similar to (81), we have
$$
\big\| [\Phi_F(k-1,k-q)]^{-1} - I_{Nn} \big\| \leq \frac{(1+\psi_3)^q - 1}{2 - (1+\psi_3)^q}, \quad q = 0, \ldots, d \quad \text{a.s.}
$$
By the above, we have
$$
\frac{\Sigma_m^h}{C_2 (C_3)^h \max\{1, C_1\}} = \sum_{k=mh}^{(m+1)h-1} \sum_{q=0}^{d} \big\| E\big[A(k,q)([\Phi_F(k-1,k-q)]^{-1} - I_{Nn}) \,\big|\, \mathcal{F}(mh-1)\big] \big\|
\leq \sum_{k=mh}^{(m+1)h-1} \sum_{q=0}^{d} E\big[ \|A(k,q)\|\, \big\| [\Phi_F(k-1,k-q)]^{-1} - I_{Nn} \big\| \,\big|\, \mathcal{F}(mh-1)\big]
$$
$$
\leq \sum_{k=mh}^{(m+1)h-1} \sum_{q=0}^{d} E[\|A(k,q)\| \,|\, \mathcal{F}(mh-1)]\, \frac{(1+\psi_3)^q - 1}{2 - (1+\psi_3)^q}
\leq \frac{(1+\psi_3)^d - 1}{2 - (1+\psi_3)^d} \Bigg( \sum_{k=mh}^{(m+1)h-1} \sum_{q=0}^{d} E[\|A(k,q)\| \,|\, \mathcal{F}(mh-1)] \Bigg)
\leq N \beta_a dh\, \frac{(1+\psi_3)^d - 1}{2 - (1+\psi_3)^d}.
$$
This together with
$$
\psi_3 < \bigg( 1 + \frac{\theta}{\theta + N C_2 (C_3)^h \max\{1, C_1\}\, \beta_a dh} \bigg)^{\frac{1}{d}} - 1
$$
gives
$$
\theta - \Sigma_m^h \geq \theta - N C_2 (C_3)^h \max\{1, C_1\}\, \beta_a dh\, \frac{(1+\psi_3)^d - 1}{2 - (1+\psi_3)^d} > 0.
$$
Then, by (85), we have $\widetilde{\Lambda}_m^h \geq c_0(m)$, $m \geq 0$, where
$$
c_0(m) = c(m) \bigg[ \theta - N C_2 (C_3)^h \max\{1, C_1\}\, \beta_a dh\, \frac{(1+\psi_3)^d - 1}{2 - (1+\psi_3)^d} \bigg].
$$
Similarly to (58)-(59), by Conditions C1.a and C1.b, it is known that $\sum_{m=0}^{\infty} c_0(m) = \infty$ and $b^2(mh) = o(c_0(m))$. By Theorem V.1, we get the conclusion of the corollary.

APPENDIX D: THE DETERMINISTIC OBSERVATION MATRICES IN THE SIMULATION

$$
H_1^0 = [\widetilde{H}_1,\ 0_{5 \times 9}], \quad H_2^0 = [\widetilde{H}_2,\ 0_{7 \times 5}], \quad H_3^0 = [0_{6 \times 4},\ \widetilde{H}_3], \quad H_4^0 = [0_{4 \times 7},\ \widetilde{H}_4],
$$
where
$$
\widetilde{H}_1 = \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & -1 \\ -1 & 0 & -1 & 3 \end{bmatrix}, \quad
\widetilde{H}_2 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \end{bmatrix},
$$
$$
\widetilde{H}_3 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 2 & 1 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & -1 & 3 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix}, \quad
\widetilde{H}_4 = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & -1 \\ -1 & 0 & 0 & 0 & -1 & 2 \\ 0 & 1 & -1 & 0 & 0 & 0 \end{bmatrix}.
$$
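The block structure of Appendix D can be assembled programmatically as a sanity check on dimensions. The sketch below (Python/NumPy) rebuilds the four matrices $H_i^0$ by zero-padding the blocks $\widetilde{H}_i$ exactly as specified and confirms that all four act on the same 13-dimensional parameter; the entries are transcribed from the appendix above, with the row layout assumed to be row-major as printed.

```python
import numpy as np

# The four blocks H~_i, transcribed from Appendix D (row-major layout assumed).
H1t = np.array([[-1, 0, 0, 0],
                [ 0, 0, 0,-1],
                [ 1, 0, 0,-1],
                [-1, 0, 0,-1],
                [-1, 0,-1, 3]])
H2t = np.array([[ 0, 0, 0, 0, 0,-1, 1, 0],
                [ 0, 0,-1, 0, 0, 1, 0, 0],
                [ 0, 1,-1, 0, 0, 0, 0, 0],
                [ 0, 1,-1, 0, 0, 0, 0, 0],
                [ 0, 0, 1, 0, 0, 0, 0,-1],
                [ 0, 0, 1, 0, 0, 1, 0,-1],
                [ 0, 0, 1,-1, 0, 0, 0, 0]])
H3t = np.array([[ 1, 0, 0, 0, 0, 0,-1, 0, 0],
                [ 1, 0, 0, 0, 0, 0, 0,-1, 0],
                [ 0, 0, 0, 0, 0, 0, 1,-1, 0],
                [-1, 0, 0, 0, 0, 0, 2, 1, 0],
                [-1, 0, 0, 0, 0, 0,-1, 3,-1],
                [ 0, 0, 0, 0, 0, 0, 0, 1,-1]])
H4t = np.array([[ 1,-1, 0, 0, 0, 0],
                [ 1, 0, 0, 0, 0,-1],
                [-1, 0, 0, 0,-1, 2],
                [ 0, 1,-1, 0, 0, 0]])

# Zero-pad each block exactly as in Appendix D: H0_i = [H~_i, 0] or [0, H~_i].
H01 = np.hstack([H1t, np.zeros((5, 9))])   # H0_1 = [H~_1, 0_{5x9}]
H02 = np.hstack([H2t, np.zeros((7, 5))])   # H0_2 = [H~_2, 0_{7x5}]
H03 = np.hstack([np.zeros((6, 4)), H3t])   # H0_3 = [0_{6x4}, H~_3]
H04 = np.hstack([np.zeros((4, 7)), H4t])   # H0_4 = [0_{4x7}, H~_4]

# All four local observation matrices act on the same 13-dimensional parameter.
assert all(H.shape[1] == 13 for H in (H01, H02, H03, H04))
```

Stacking the four matrices with `np.vstack` then gives the 22-row observation map of the whole network, which is the quantity entering the cooperative (spatio-temporal) excitation condition.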