Identification of Switched Autoregressive Systems from Large Noisy Data Sets

Iden tiﬁcation of Switc hed Autoregressiv e Systems from Large Noisy Data Sets Sarah Ho jjati nia 1 , Constantino M. Lago a 2 , a nd F abrizi o Dabb ene 3 ∗ † ‡ Abstract The pap er in tro duces a no vel metho dology for the iden tiﬁ cation of co eﬃcien ts of switc hed autoregressive linear m o dels. W e consider the case when the system’s outputs are con taminated b y p ossibly large v al- ues of measuremen t noise. It is assu med that only partial information on the probabilit y distrib ution of the noise is a v ailable. Given inp ut- output data, we aim at identifying switc hed system co eﬃcien ts and parameters of the distr ib ution of the n oise w h ic h are compatible with the collected data. System d y n amics are estimated thr ough exp ected v alues computation and by exploiting the strong la w of large n um- b ers. W e demonstrate the eﬃciency of the pr op osed approac h with sev eral academic examples. The metho d is shown to b e extremely eﬀectiv e in the situations where a large n umber of measurements is a v ailable; cases in w hic h previous appr oac hes based on p olynomial or mixed-in teger op timization cannot b e applied due to v ery large com- putational bu rden. ∗ 1 Sarah Ho jjatinia is with the Scho ol of E lectrical Engineering and Co mputer Science, The Pennsylv ania State University , University P a rk, P A, USA, szh199@ps u.edu † 2 Constantino M. Lagoa is with the School of Electric a l Engineering and Computer Science, The Pennsylv ania Sta te Universit y , Univ ers it y Park, P A, USA, lagoa@psu .edu ‡ 3 F abrizio Dabbene is with CNR-IEI IT, Politecnico di T o r ino, 101 29 T orino, Italy , fabrizio .dabbene @ieiit.cnr.it This work was pa r tially suppo rted by Na tio nal Institutes of Health (NIH) Grant R01 HL14273 2, Natio nal Science F oundation (NSF) Grant #1 8082 6 6 and the International Bilateral Joint CNR-JST L ab COOP S. 1 In tro duc tion The in terest in t he study of h ybrid systems has b een p ersisten tly gro wing in the la st y ears, due to their capabilit y o f describing real-world pro cesses in whic h contin uous and discrete time dynamics co exist and in teract. Besides classical automotive and chem ical pro cesses, emerging applications include computer vision, bio lo gical systems, and comm unication netw o r ks. Moreo ve r, h ybrid sys tems can b e used to eﬃcien tly a ppro ximate no nlinear dynamics, with broad application, ranging f rom civil structures to rob o t ics and systems bio lo gy , that en tail extracting informat io n from high volume data streams [13], [18]. In the case of high dimen tional data, nonlinear or der reduction or low dimensional sparse represen tations tec hniques [9], [5], [1 6], are v ery eﬀectiv e in handling static data, but most do not exploit dynamical information of the data. In the lit era t ur e, sev eral results ha ve b een obtained for the analysis and con trol of h ybrid system s, fo rmally c haracterizing imp ortant prop erties suc h as stabilit y or r eachabilit y , and pro p osing diﬀeren t con trol designs [1 0]. In parallel, researc hers rapidly realized that ﬁrst-principle mo dels may b e hard to deriv e esp ecially with the increase of div erse application ﬁelds. This spark ed in terest on the problem of iden t if ying hybrid (switc hed) mo dels start- ing from experimental data; see for instance the tutorial pap er [14] a nd the surv ey [3]. It should b e immediately p o in ted out that this iden tiﬁcation problem is not a simple one, since the sim ultaneous presence o f contin uous and discrete state v ariables giv es it a com binatorial nature. The situation b ecomes further complicated in the presence of unkno wn-but-b ounded noise. In this case the problem is in general NP-har d. Sev eral approaches hav e b een prop osed to address this diﬃculty , see e.g. [8]. The pap er [15 ] reform ulates the problem as a mixed-in teger program. These tec hniques prov ed to b e v ery eﬀectiv e in situations in v olving relativ ely small noise lev els or mo derate dimensions, but they do not app ear to scale we ll, and their p erformance deteriorates as the noise lev el or pro blem size increase. Of particular interes t are recen t approa c hes based on conv ex optimization: in [1] some r elaxatio n based on sparsit y ar e prop osed, while [12] dev elops a momen t ba sed a pproac h to identify the switc hed autoregressiv e exogenous system, and [6 ] adapts it to ward Marko vian jump systems iden tiﬁcation. These metho ds are surely more robust, and represen t the c hoice of reference for medium-size problems and medium v alues of noise, a nd ha ve found ap- plications in sev eral con texts, ranging from segmen tatio n problems arising in computer vision to biomedical systems. Ho wev er, the metho ds still rely on the solution of rather large optimiza- tion problems. Eve n if the conv ex nature of these problems a llo ws to limit the complexit y grow th, there are sev eral situations f or which their applica- tion b ecomes critical. F or instance, iden tiﬁcation problems cases that in volv e quite high noise lev els and/or large n um b er of measuremen ts. An enlighte ning example, which serv es as a practical mot iv at io n for our dev elopmen ts, arises in healthcare applications: the a v ailability of activity tr ack ing devic es allows to gather a large amoun t o f informat io n of the ph ysi- cal activit y of an individual. Phys ical activit y is a dynamic b eha vior, whic h in principle can b e mo deled as a dynamical system [7]. Moreo v er, its c harac- teristics ma y signiﬁcan tly c hange dep ending on the time of the da y , p osition, etc. This motiv ated the approac h of mo deling it as a switc hing system [2]. In this pap er, w e fo cus on cases in volving a v ery large n um b er of sample p oin ts, p ossibly aﬀected by la r g e levels of noise. In this situation, p olyno- mial/momen ts based approac hes b ecome ineﬀectiv e, and diﬀeren t metho d- ologies need to b e devised. The approa c h w e pro p ose builds up on the same premises a s [12]: the starting p oint is the algebraic pro cedure due to Ma and Vidal [11 ], where it has b een shown for noiseless pro cesses, it is p ossi- ble to iden tify the diﬀeren t subsystems in a switc hing system by recurring to a Generalized Principal Comp onent Analysis (G PCA). In par t icular, w e infer the parameters of eac h subsystem from the null space of a matrix V n ( r ) constructed fr o m the input-output data r via a nonlinear embedding (the V eronese map). The approac h was extended t o the cases in presence of pro cess noise in [12], showing ho w the en tries of this matrix dep end p olynomially on the unkno wn noise t erms. Then, the problem w as form ulated in an unkno wn- but-b ounded setting, lo oking for an a dmissible noise sequence rendering the matrix V n ( r ) rank deﬁcie n t. This problem w as then relaxed using p olynomial optimization metho ds. In this work, w e follo w the same line of reasoning, but t hen take a some- what diﬀeren t route. First, w e consider random noise, and w e a ssume t hat some information on the noise is a v a ilable. Then, instead o f relaxing the problem, w e exploit the a v ailability of a large n umber o f measuremen ts to mak e recurse to la w-of -large-num b ers type of reasoning. This allow s us t o devise an alg o rithm characterize d b y an extremely low complexit y in terms of required o p erations. The ensuing optimization problem in v olve s o nly the computation of the singular v ector asso ciated with the minim um singular v alue of a matrix that can b e eﬃcien tly computed and whose size do es not dep end on t he n umber of measuremen ts. 1.1 P ap er Organization The pap er is structured as follows : after this introduction, there is a brief notation section. Section 2 includes the problem statemen t. In Section 3, algebraic refor mulation of switc hed autoregressiv e (SAR) linear system iden- tiﬁcation problem for noiseless data is review ed. The problem of iden tifying SAR system in the presence of noise is surv ey ed in Section 4. In Section 5, the algorithm fo r estimating unkno wn noise parameters is desc rib ed. Numer- ical results are show n in Section 6. Finally , Section 7 concludes the pap er highligh ting some p ossible future researc h directions. 1.2 Notation Giv en a scalar ra ndo m v ar iable X , w e denote by m d its d th momen t, whic h ma y b e computed according to the follo wing in tegra l m d = E [ x d ] = Z ∞ −∞ x d f ( x ) dx (1) where E [ · ] refers to exp ectation, and f ( x ) is the proba bilit y densit y function of X . Additionally , the v ariance of X is indicated by s 2 . F or instance, if X has a normal distribution with zero mean and v ariance s 2 , i.e. f ( x ) = 1 s √ 2 π e − x 2 / 2 s 2 , its moments are give n b y m d = E [ x d ] = ( 0 if d is o dd s d ( d − 1)!! if d is ev en (2) where !! denotes double factorial ( n !! is the pro duct of all num b ers fro m n to 1 that hav e the same parity as n ). 2 Problem S tatemen t In this section, a complete description of the problem is addressed. In addi- tion, the required assumptions are deﬁned to solve the problem. 2.1 System Mo del W e consider SAR systems o f the form x k = n a X j =1 a j σ ( k ) x k − j + n c X j =1 c j σ ( k ) u k − j (3) where x k ∈ R is the output at time k and u k ∈ R is input at time k . The v ar ia ble σ ( k ) ∈ { 1 , ..., n } denotes the sub-system active at time k , where n is the total n um b er of sub-systems. F urthermore, a j σ ( k ) and c j σ ( k ) denote unkno wn co eﬃcien ts corresp onding to mo de σ ( k ). Time k takes v alues o ver the non-negative integers. In practice, output is alw ays contaminated b y noise; i.e. w e assume that w e o bserv e y k = x k + η k (4) where η k , denotes measuremen t noise. The following assumptions are made on the system mo del and noise. Assumption 1 Thr oughout this p ap er it is a s s ume d that: • Upp er b ounds on n a and n c ar e available. • Upp er b ound on the numb er of subsystems n is available. • Noise η k at time k is indep enden t f r om η l for k 6 = l , and identic al ly di s - tribute d with pr ob a bility density f ( η | θ ) ; wher e θ is a (low dimensio n al) ve ctor of unknown p ar ameters. • Input se quenc e u k applie d to the system is known and b ounde d. • Ther e exists a ﬁnite c ons tant L so that | x k | ≤ L for al l p ositive i n te g e rs k . 2.2 Problem Deﬁnition The main ob jectiv e of this pap er is to dev elop algorithms that are able to iden tify the co eﬃcien ts o f a SAR mo del from no isy observ ations. More pre- cisely , w e aim at solving the follow ing problem: Problem 1 Given Assumption 1, an input se quenc e u k , k = − n c +1 , . . . , N − 1 and noisy output me asur ements y k , k = − n a + 1 , . . . , N , determin e c o ef- ﬁcients of the SAR mo del a i,j , i = 1 , 2 , . . . , n a , j = 1 , 2 , . . . , n , c i,j , i = 1 , 2 , . . . , n c , j = 1 , 2 , . . . , n , and the noise distribution p ar ameters θ . 3 Noiseles s Case: A Review As a mo t iv ation for the a pproac h presen t ed in this pap er, w e r eview and sligh tly reform ulate earlier r esults on an algebraic reformulation of the SAR iden tiﬁcation problem for the case where no noise is presen t. W e refer the reader to [19] for details on this algebraic approach to switc hed system iden- tiﬁcation. 3.1 Hybrid Decoup ling Constrain t W e start b y noting that equation (3) is equiv alen t to b T σ ( k ) r k = 0 (5) where w e in tro duced t he (kno wn) regressor at time k r k = [ x k , x k − 1 , · · · , x k − n a , u k − 1 , · · · , u k − n c ] T and the v ector of unknown co eﬃcien t s at time k b σ ( k ) = [ − 1 , a 1 σ ( k ) , · · · , a n a σ ( k ) , c 1 σ ( k ) , · · · , c n c σ ( k ) ] T . Hence, indep enden tly of whic h of the n submo dels is activ e at time k , w e ha ve P n ( r k ) = n Y i =1 b T i r k = c T n ν n ( r k ) = 0 , (6) where the v ector o f parameters corresponding to the i -th submo del is denoted b y b i ∈ R n a + n c +1 , and ν n () is V eronese map o f degree n [4] ν n ([ x 1 , · · · , x s ] T ) = [ · · · , x n 1 1 x n 2 2 · · · x n s s , · · · ] T whic h con tains all monomials of order n in lexicographical o rder, and c n is a ve ctor whose en tries a r e p olynomial functions of unkno wn parameters b i (see [20] for explicit deﬁnition). The V eronese map ab o v e is also know n as p olynomial embedding in machin e learning [20]. Equation (6) holds for all k , and these equalities can b e expresse d in matrix form V n ( r ) c n =  ν n ( r 1 ) T , · · · , ν n ( r N ) T  T c n = 0 (7) where r , without the subscript, denotes the set of all regressor v ectors. Clearly , w e are able to iden tify c n (and hence the system’s parameters) if and only if V n ( r ) is rank deﬁcien t. In tha t case, the v ector c n can b e found b y computing the n ullspace of V n . 3.2 A Reform u lation of the Hybrid Decoupling Con- strain t Note that the n umber o f rows of the V eronese matrix V n is equal to the n umber of measuremen ts av ailable for the regressor; i.e., in the not a tion of our pap er, the n umber of rows is N . Therefore, a reform ulation of the results in t he previous section is needed to b e able to address the problem of iden tiﬁcation from v ery lar g e data sets. As men tioned in the previous section, in the absence of noise, the SAR system iden tiﬁcation is equiv alen t to ﬁnding a ve ctor c n satisfying c T n ν n ( r k ) = 0 for all k = 1 , 2 , . . . N . This is in turn equiv alen t to ﬁnding c n so that 1 N N X k =1 c T n ν n ( r k ) ν T n ( r k ) c n = 0 As a result, for the noiseless case, identify ing the co eﬃcien t s of the sub- mo dels is equiv alen t to ﬁnding the singular v ector c n asso ciated with the minim um singular v alue of the matrix M N = 1 N N X k =1 ν n ( r k ) ν T n ( r k ) . = 1 N N X k =1 M k (8) Note t ha t, b y using this equiv alent condition, w e only need to consider ma- trices of size  n + n a + n c n  . In other words, the size of this matrix do e s not dep end on the nu m b er of measuremen ts. This is esp ecially imp ortant when considering ve ry lar g e data sets. 4 SAR syst em Iden t i ﬁcation in the Presence of Noise No w, w e address the case where the measuremen ts of output of the switc hed autoregressiv e system are corrupted by noise. As a ﬁrst step, we consider the case where the distribution of the noise is kno wn, so it s momen ts m d are a v ailable. As seen in the previous section, iden tifying the parameters of the SAR mo del is equiv alen t to ﬁnding a vec tor in the n ull space of the matrix M N . Under mild conditions, the null space of the matrix ab ov e ha s dimension o ne if and only if the data is compatible with the assumed mo del. Ho w eve r, if noise is presen t, x k is not kno wn; therefore, this matrix cannot b e computed. In this section, w e use av ailable information on the stat istics o f the noise to compute a pproximations of the matrix M N , consequen tly appro ximations of v ectors in its null space. 4.1 On the P o w ers of x k Since w e do not ha v e access to the v alues of the o utput x k to estimate the v alues of the quan tities in equation (8), w e need to relate the p ow ers of x k to the measuremen ts and a v ailable inf o rmation of the noise. Note that x k is a (unkno wn) deterministic quan tity . Therefore x h k = E [ x h k ] (9) Since x k = y k − η k w e ha ve x h k = E [ x h k ] = E [( y k − η k ) h ] ∀ k = 1 , 2 , · · · , N (10) Assume, fo r simplicit y the distribution of the noise is symmetric with resp ect to the origin. As a result, all o dd mo ments a re zero (in part icular, the noise is zero mean, i.e. m 1 = 0). This assumption is made to simplify the calculations b elo w and the approach can b e immediately extende d to the non-symmetric case. W e concen trate o n computing the exp ected v alue of p ow ers of x k recur- siv ely and in a closed fo rm. First, we give an example of ho w to compute the exp ected v alue of p o w ers of x k for p ow ers h = 1 , 2. F or h = 1, we ha v e x k = E [ x k ] = E [ y k − η k ] = E [ y k ] − m 1 = E [ y k ] (11) while, for h = 2, we can write x 2 k = E [ x 2 k ] = E [( y k − η k ) 2 ] = E [ y 2 k ] − 2 E [ y k η k ] + E [ η 2 k ] . (12) Note that E [ y 2 k ] can b e estimated from collected data, and E [ η 2 k ] is equal to second momen t of noise ( m 2 ), whic h is a ssumed to b e kno wn. T o estimate the v alue of E [ y k η k ], consider the follow ing E [ y k η k ] = E [( x k + η k ) η k ] = E [ x k η k ] + E [ η 2 k ] . (13) The quan tities x k and η k are m utually indep enden t and, therefore, E [ x k η k ] = E [ x k ] E [ η k ], with E [ η k ] = m 1 = 0. As a conseque nce, w e ha ve E [ y k η k ] = E [ η 2 k ] (14) and ﬁnally the v alue of equation (12 ) is E [ x 2 k ] = E [ y 2 k ] − 2 E [ η 2 k ] + E [ η 2 k ] = E [ y 2 k ] − E [ η 2 k ] (15) = E [ y 2 k ] − m 2 The reasoning ab o v e can b e generalized t o any p o w er of x k . More pre- cisely , w e ha v e the follow ing result whose pro of is an immediate consequenc e of the deriv ations so far. Lemma 1 The exp e cte d value of the p owers of x k satisﬁes E [ x h k ] = E [( y k − η k ) h ] = E [ y h k ] − h X d =1  h d  E [ x h − d k ] E [ η d k ] = E [ y h k ] − h X d =1  h d  E [ x h − d k ] m d ∀ k = 1 , 2 , · · · , N . (16) M k = ν n ( r k ) ν T n ( r k ) =         x 4 k x 3 k x k − 1 x 3 k u k − 1 x 2 k x 2 k − 1 x 2 k x k − 1 u k − 1 x 2 k u 2 k − 1 ∗ x 2 k x 2 k − 1 x 2 k x k − 1 u k − 1 x k x 3 k − 1 x k x 2 k − 1 u k − 1 x k x k − 1 u 2 k − 1 ∗ ∗ x 2 k u 2 k − 1 x k x 2 k − 1 u k − 1 x k x k − 1 u 2 k − 1 x k u 3 k − 1 ∗ ∗ ∗ x 4 k − 1 x 3 k − 1 u k − 1 x 2 k − 1 u 2 k − 1 ∗ ∗ ∗ ∗ x 2 k − 1 u 2 k − 1 x k − 1 u 3 k − 1 ∗ ∗ ∗ ∗ ∗ u 4 k − 1         M k =    E [ y 4 k ] − 6 m 2 ( E [ y 2 k ] − m 2 ) − m 4 ( E [ y 3 k ] − 3 m 2 E [ y k ]) E [ y k − 1 ] ( E [ y 3 k ] − 3 m 2 E [ y k ]) u k − 1 ∗ ( E [ y 2 k ] − m 2 ) ( E [ y 2 k − 1 ] − m 2 ) ( E [ y 2 k ] − m 2 ) E [ y k − 1 ] u k − 1 ∗ ∗ ( E [ y 2 k ] − m 2 ) u 2 k − 1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ · · · · · · ( E [ y 2 k ] − m 2 ) ( E [ y 2 k − 1 ] − m 2 ) ( E [ y 2 k ] − m 2 ) E [ y k − 1 ] u k − 1 ( E [ y 2 k ] − m 2 ) u 2 k − 1 ( E [ y 3 k − 1 ] − 3 m 2 E [ y k − 1 ]) E [ y k ] ( E [ y 2 k − 1 ] − m 2 ) E [ y k ] u k − 1 E [ y k ] E [ y k − 1 ] u 2 k − 1 ( E [ y 2 k − 1 ] − m 2 ) E [ y k ] u k − 1 E [ y k ] E [ y k − 1 ] u 2 k − 1 E [ y k ] u 3 k − 1 E [ y 4 k − 1 ] − 6 m 2 ( y 2 k − 1 − m 2 ) − m 4 ( E [ y 3 k − 1 ] − 3 m 2 E [ y k − 1 ]) u k − 1 ( E [ y 2 k − 1 ] − m 2 ) u 2 k − 1 ∗ ( E [ y 2 k − 1 ] − m 2) u 2 k − 1 E [ y k − 1 ] u 3 k − 1 ∗ ∗ u 4 k − 1       Figure 1: Example of construction of M k 4.2 On the Structure of M k W e deriv e some of the prop erties of the ma t rices M k no w. An immediate consequenc e of the results in the previous section is t he followin g. Lemma 2 Assume that the noise distribution, some p ar ameters of the noise, and the input sig n al ar e giv e n and ﬁxe d. L et mon n ( · ) den ote a function that r eturns a ve ctor with al l monomials up to or der n of its ar gument. T hen ther e exists an aﬃne function M ( · ) so that M k = E { M [ mon n ( y k , . . . , y k − n a )] } = M { E [ mon n ( y k , . . . , y k − n a )] } . It should b e noted that the ra ndo m v ar ia bles y k and y l are m utua lly indep enden t for k 6 = l , so the function ab o ve can further b e represen ted as a m ultilinear function o f the momen t s of y k . 4.3 An Example of Construction of M k T o b etter illustrate the approach used in this pap er, w e provide an example of how to construct the matr ix M k required for iden tiﬁcation. T o t his end, consider the problem of iden tifying a SAR system with n = 2 subsys tems of the form subsyste m 1 : x k = a 1 x k − 1 + b 1 u k − 1 subsyste m 2 : x k = a 2 x k − 1 + b 2 u k − 1 (17) from noisy measuremen ts y k = x k + η k (18) where η k has a symmetric distribution. W e can rewrite the sys tem as in equation (6). In particular, the v ector c 2 as a function of the parameters of the subsystems, a ssumes the form c 2 = [1 , − ( a 1 + a 2 ) , − ( b 1 + b 2 ) , a 1 a 2 , a 1 b 2 + b 1 a 2 , b 1 b 2 ] T . The regressor ve ctor r k at time k r k =  x k x k − 1 u k − 1  T giv es rise to the fo llo wing V eronese v ector ν n ( r k ) = r k ⊗ r k =         x 2 k x k x k − 1 x k u k − 1 x 2 k − 1 x k − 1 u k − 1 u 2 k − 1         (19) whose size is l × 1, with l =  n + n a + n c n  =  2+1+1 2  = 6. F rom r k and ν n ( r k ), w e can compute matrix M k , whic h is giv en in Fig ur e 1. Then, as w e hav e the v alues o f noisy output y k , w e compute expected v alue of p ow ers o f x k in terms of exp ected v alue o f p o wers of y k and momen ts of measureme n t noise. F ollo wing the results of Lemma 1, w e obtain the second matrix in Fig ure 1. F or system of equation (17), M k is giv en by the t wo expression in Fig ure 1. 4.4 Iden tiﬁcation Algorithm As mentioned b efore, to iden tify the par ameters of the SAR system, we need to b e able to estimate the matrix M N in equation (8 ) . I t turns o ut that it can b e done by using the av ailable noisy measuremen ts. M ore precisely , we ha ve the following result. Theorem 1 L et M ( · ) and mon n ( · ) b e the functions deﬁne d in L emma 2. Deﬁne c M N . = 1 N N X k =1 M [ mon n ( y k , . . . , y k − n a )] Then, as N → ∞ , c M N − M N → 0 a.s. Sk etch of pro of: See App endix. As a result, the empirical av erage computed using the noisy measuremen ts (where exp ected v a lues of monomials ar e replaced by the measured monomial v alues) con v erges to the desired matrix in equation (8). Therefore w e prop o se the following algorithm for iden tiﬁcatio n o f a SAR system. A lgorithm 1: Let n a , n c , n and some parameters of t he noise b e g iven. Step 1. Compute matrix c M N = 1 N N X k =1 M [ mon n ( y k , . . . , y k − n a )] Step 2. Let c n b e the singular vec tor asso ciated with the minimum singular v alue of c M N . Step 3. Determine the co eﬃcien ts of the subsys tems from the v ector c n . In order to p erform Step 3 in Algorithm 1, w e a dopt p olynomial dif- feren tion algorithm for mixtures of h yp erplanes, introduced b y Vidal [21, pp. 69–70 ]. 5 Estimating Unkno wn Noise P arameters W e now address the case where the distribution of the noise is not completely kno wn. As men tioned in Assumption 1, the distribition of the noise is kno wn except fo r a few parameters θ . F or simplicit y o f exp osition, lets consider the case where the noise has a normal distribution with zero mean and unkno wn v ar ia nce s 2 . The reasoning extends to an y noise distribution with a small n umber of unkno wn parameters. In suc h a case, the ob jectiv e is to simu ltaneously estimate system parame- ters and the v ariance o f noise. W e start b y noting t ha t computing M N using the true v a lue o f the v ariance results in a rank deﬁcien t matrix. Moreo ver, giv en collected data y k and u k , the matrix c M N is a con tin uous function of the momen ts of noise and, hence, a kn o wn contin uous function of the standard deviation s . Giv en previous conv ergence results, the true v alue of s will mak e c M N to hav e a very small minim um singular v alue (esp ecially fo r large v alues of N ). F or this reason, estimation of s can b e p erformed b y minimizing the minim um singular v alue o f matrix ab ov e ov er the a llo wable v alues of s . More precisely , w e prop ose the follo wing algorithm A lgorithm 2: Let n a , n c , n , some parameters of the noise and s max b e giv en. Step 1. Compute matrix c M N = 1 N N X k =1 M [ mon ( y k , . . . , y k − n a )] as a function of the noise parameter s . Step 2. Find the v alue s ∗ ∈ [0 , s max ] that minimizes the minim um singular v alue of c M N . Step 3. Let c n b e asso ciated singular v ector. Step 4. Determine the co eﬃcien ts of the subsys tems from the v ector c n . Note that the nonconv ex optimization of Step 2 can b e solv ed via an easily implemen ta ble line-searc h. Ho w ev er, t he solution s ∗ migh t not b e unique; i.e., there might exist sev eral v alues of s that lead to a minim um singular v alue v ery close t o zero. In practice, o ur exp erience has b een that, for suﬃcien tly large N , the ab ov e a lgorithm provid es b oth a go o d estimate of the systems co eﬃcien ts, and noise parameters; esp ecially if w e ta ke s ∗ to b e the smallest v alue o f s f o r whic h the minim um singular v alue of c M N is b elo w a giv en threshold ǫ. T able 1: Iden tifying p olynomial co eﬃcien ts for diﬀeren t v alues o f noise v ari- ance and diﬀerent syste m run. system V alue 1 V alue 2 V alue 3 V alue 4 V alue 5 # 1 − ( a 1 + a 2 ) -( b 1 + b 2 ) a 1 a 2 a 1 b 2 + b 1 a 2 true parameters 1 0.2 0 -0.15 -0.8 iden tiﬁcation 1 1 0.2002 0.0001 -0.1503 -0.7989 iden tiﬁcation 2 1 0.2002 0.0011 -0.1510 -0.7974 iden tiﬁcation 3 1 0.1977 0.0046 -0.1548 -0.7997 iden tiﬁcation 4 1 0.2120 0.0003 -0.1485 -0.8006 system V alue 6 V alue 7 V alue 8 V alue 9 # b 1 b 2 γ s 2 estimation of s 2 true parameters -1 - - - iden tiﬁcation 1 -0.9996 0.2410 0.1 0.1000 iden tiﬁcation 2 -1.0004 0.5187 0.5 0.4980 iden tiﬁcation 3 -0.9966 0.6494 1 1.0010 iden tiﬁcation 4 -1.0017 0.8516 2 1.9950 6 Numerical Res ults In the following examples, w e address the problem of iden tifying a tw o- mo des switc hed system of the form of equation (17), whose true co eﬃcien ts are a 1 = 0 . 3 , b 1 = 1 , a 2 = − 0 . 5 , a nd b 2 = − 1. Measuremen t noise is assumed to b e zero-mean with Normal distribution. In the nume rical examples presen ted, N = 1 0 6 input-output data is give n. T rue and iden tiﬁed co eﬃcien ts for diﬀeren t v a riances of noise, are presen ted in T able 1. V ariance of noise and noise to o utput ratio for each experiment are also sho wn in this table. The pro vided noise to output ratio ( γ ) is deﬁned as γ = max | η | max | y | (20) Results are as exp ected ev en for high v alues of noise in comparison to output. As it is illustrated in T able 1, the identiﬁed para meters a r e ve ry close to true v a lues whic h demonstrates the conv ergence of prop o sed algorithm ev en for small signal to noise ratio. Moreov er, the algorithm requires a v ery small computational eﬀort. F or the case of 10 6 measuremen ts and using an oﬀ-the-shelf core i5 laptop with 8 Gigs of RAM, the running time is b et we en 7 to 8 seconds , whic h shows the eﬀectiv eness of appro ac h for v ery large data sets. The second no rm of error b et w een true co eﬃcien ts of system and esti- mated co eﬃcien ts, k c n − ˆ c n k 2 , as a function of num b er of measuremen ts, N , is depicted in Figure 2 for diﬀeren t v alues of noise v ariance. As it can b e seen from Fig ur e 2, the error decreases as the n um b er of measuremen ts increases. Rate of conv ergence is fast, despite the fact that, in some of the exp erimen t s, a larg e amount of noise is used. It should b e noted that these results a re for one experimen t, and giv en t hat this is a realization of a random pro cess, error is not alw a ys decreasing. F or a ll v alues of noise v ariance, error will ev en tua lly decrease and the estimated v alues of co eﬃcien ts con ve rge to the true v alues. No w that w e hav e iden tiﬁed the co eﬃcien ts of p olynomial, it is time to iden tify eac h subsystems’ co eﬃcien ts. F or the ab o v e men tioned example, T able 2 sho ws t he v alues of subsystems co eﬃcien t s for diﬀeren t exp eriments related to diﬀeren t v alues of noise v ariance. As w e see in this table the v alue of co eﬃcien ts are v ery close to the true v a lues, ev en when the noise v ariance is high with noise magnitude in a v erage around 85% of the signal magnitude. The estimation of noise v ariance based on the structure of matrix M k is sho wn in T able 1 as w ell. The estimates of noise v ariance are v ery close to the true v alues of v ariance. By kno wing the structure of matrix M k , the dep endence of ev ery en try on the moments of noise, and the relation in b et w een these momen ts and the unkno wn v ariance (see Section 1 .2), w e are able to estimate the noise parameter (in this case, noise v ariance). This illustrates the capabilit y o f the prop o sed algorithm to estimate b oth system and noise para meters even for larg e v alues of noise. Tw o examples of the pro cess of estimating the unkno wn v ariance of noise are sho wn in Fig. 3; where Fig. 3(a) is for the case of giv en data contaminated 10 4 10 5 10 6 10 -3 10 -2 10 -1 s 2 = 0.1 s 2 = 0.5 s 2 = 1 s 2 = 2 Figure 2: Estimation erro r of system co eﬃcien ts with noise of v a r iance 1, and Fig. 3(b) is for data with measuremen t noise of v ariance 2. By taking s ∗ as the smallest lo cal minim um, the estimated v ar ia nce for b o th cases in Fig. 3( a ) and Fig. 3(b) is ve ry close to the true v alues. 7 conclus ion and futu re work In this pa p er w e hav e prop osed a metho dology to iden tify the co eﬃcien ts of switc hed a utoregressiv e pro cesses and unkno wn noise parameters, starting from partial information of the noise and give n input-output data o f switc hed system. The approac h is shown t o b e particularly eﬃcien t in the case of large amount o f data, situation that makes it p ossible t o exploit law-of-large- n umbers t yp e of results. The approac h requires the computation of singular v alue decomposition of a sp ecially constructed input-output V eronese matrix. T able 2: Iden tif ying submo dels’ co eﬃcien ts for diﬀeren t v alues of noise v a r i- ance. co eﬀ- true v ar ia nce v ariance v ariance v ariance icien ts v alues s 2 = 0 . 1 s 2 = 0 . 5 s 2 = 1 s 2 = 2 a 1 0.3 0.3002 0.2981 0.3006 0.2938 b 1 1 0.9988 1.0007 0.9412 1.0031 a 2 -0.5 -0.4996 -0.5000 -0.5006 -0.5 059 b 2 -1 -0.9999 -0.9 991 -1.0 004 -1.0011 The ensuing singular v ector is then related t o the switc hed system parameters to b e iden tiﬁed. W e pro ve that the estimated parameters conv erge to the true ones a s t he num b er o f measuremen ts gro ws. Numerical sim ulat io ns sho w a lo w estimation error, eve n in t he case of large measuremen t noise. Also, in cases that noise distribution is not completely kno wn, simulation results sho w v ery close estimation of unknown parameters of noise to the true v alues. In fut ure w o r k, we will consider the problem of iden tifying switc hed systems with pro cess noise from larg e amo unt of noisy data. Moreo ve r, w e will a ddress the problem of iden tifying switc hing dynamics in switc hed pro cesses form large noisy dat a sets. References [1] La ur ent Bak o. Iden tiﬁcation of switc hed linear systems via sparse opti- mization. Aut omatic a , 47(4 ):668–677 , April 2011 . [2] D a vid E Conro y , Sarah Ho jjatinia, Constan tino M La goa, Chih-Hsiang Y ang , Stephanie T Lanza, and Joshua M Smy th. P ersonalized mo d- els o f ph ysical a ctivit y resp onses to text message micro-interv en tions: A pro o f-of-concept application of con tr o l systems engineering metho ds. Psycholo gy of Sp ort and Exer cise , 41:17 2 –180, 2019. [3] Andrea G arulli, Simone Paoletti, and Antonio Vicino. A surv ey on switc hed and piecewise aﬃne system iden tiﬁcation. IF A C Pr o c e e dings V olumes , 45(16):3 4 4–355, 2012. [4] Jo e Harris. Algebr aic ge ometry: a ﬁrst c ourse , v olume 133. Springer Science & Business Media, 2013. [5] Sar a h Ho jjat inia , Korkut Bekiroglu, and Constan tino M Lagoa. P arsi- monious v olterra system iden tiﬁcatio n. In 2018 A n nual Americ an Con- tr ol Confer enc e (ACC) , pages 1933–19 38. IEEE, 2018. [6] Sar a h Ho jjatinia, Constan tino M. Lag o a, and F abrizio Dabb ene. A metho d for iden tiﬁcation of m ark ovian j ump arx pro cesses. IF A C- Pap ersOnLine , 50(1) :14088 – 14093 , 2017 . 20th IF A C W orld Congress. [7] Constantino M. Lagoa, Da vid E. Conro y , Sarah Ho jjatinia, and Chih- Hsiang Y a ng . Mo deling sub ject resp onse to in terv en tions aimed at in- creasing ph ysical activit y: A control systems a ppro ac h. Poster Pr esente d at 5th In terna tion a l Co n fer enc e on A mbulatory Monitoring of Physic al A ctivity and Movement , 2017. [8] F abien La uer, G ´ erard Blo c h, and Ren ´ e Vidal. A contin uous optimizatio n framew ork for h ybrid system iden tiﬁcation. Automatic a , 47(3):608 –613, 2011. [9] Ruei- Sung Lin, Che-Bin Liu, Ming-Hsuan Y ang, Narendra Ahuja, and Stephen Levinson. L ear ning nonlinear manifolds from time series. In Eu- r op e an Confer enc e on Co mputer Vision , pages 245–256. Springer, 2006. [10] Ja n Lunze and F ran¸ coise Lamnabhi-Laga r r igue. Handb o ok of hybrid systems c ontr ol: the ory, to ols, a p plic ations . Cambridge Univ ersit y Press, 2009. [11] Yi Ma and Ren´ e Vidal. Identiﬁc ation of Determini s tic Switche d ARX Systems via Identiﬁc ation o f Algebr aic V arieties , pages 449 – 465. Springer Berlin Heidelb erg , Berlin, Heidelberg, 200 5 . [12] Necmiy e Oza y , Constan tino La g oa, and Mario Sznaier. Set mem b ership iden tiﬁcation of switc hed linear systems with kno wn n umber o f subsys- tems. Aut omatic a , 51:180 – 191, 2015. [13] Necmiy e Ozay , Mario Sznaier, and Constantino Lagoa. Conv ex certiﬁ- cates for mo del (in) v alidation o f switc hed aﬃne sys tems with unkno wn switc hes. IEEE T r ansactions on Aut omatic C ontr ol , 5 9(11):2921 –2932, 2014. [14] Simone P aoletti, Aleksandar Lj Juloski, Giancarlo F errari-T recate, and Ren ´ e Vidal. Identiﬁc ation of h ybrid systems a tuto rial. Eur op e a n journal of c ontr ol , 13(2-3 ) :242–260, 2007. [15] Ja cob Roll, Alb erto Bemp o r a d, and Lennart Ljung. Iden tiﬁcation of piecewise aﬃne systems via mixed-in teger programming. A utomatic a , 40(1):37–5 0, 2004. [16] L awrence K Saul a nd Sa m T Row eis. Think globally , ﬁt lo cally: un- sup ervised learning of low dimensional manifolds. Journal of machine le arning r ese ar c h , 4(Jun):119–155 , 2003. [17] Pra na b K. Sen and Julio d. M. Singer. L ar ge sa mple metho ds in statis- tics: an in tr o duction with applic ations . Chapman & Hall, New Y ork, 1993. [18] Mar io Sznaier, Octa via Camps, Necmiy e Oza y , and Constan tino Lagoa. Surviving the up coming data deluge: A systems and control p ersp ectiv e. In 53r d IEEE C onfer enc e on De cision and C o ntr o l , pages 1488 – 1498. IEEE, 2014. [19] R . Vidal, S. Soatto, Yi Ma, and S. Sastry . An algebraic geometric approac h to the identiﬁcation of a class of linear h ybrid system s. In Pr o c. 42nd IEEE Conf . De cision and Contr ol , v olume 1, pag es 167–1 7 2, Decem b er 2003. [20] R ene Vidal, Yi Ma, and Shank ar Sastry . Generalized principal comp o- nen t a nalysis ( gp ca). IEEE tr ansactions on p attern analysis and ma - chine intel ligenc e , 2 7 (12):1945– 1959, 200 5. [21] R ene Esteban Vidal and Shank a r Sastry . Gener alize d p ri n cip al c omp o- nent an a lysis (gp c a): an algebr aic ge ometric appr o ach to subsp ac e clus- tering and motion se gmentation . Electronics Researc h Lab oratory , Col- lege of Engineering, Univ ersit y of California , 2 003. Sk etch of Pro of of Theorem 1: F or simplicit y of presen tation, let c M k . = M [ mon n ( y k , . . . , y k − n a )] W e ﬁrst note that, given the assumptions made on the noise, u k and x k , the en tries of c M k ha ve a v a riance uniformly b ounded fo r all k . Moreo v er k > l + n a ⇒ c M k and c M l are indep enden t. Hence, b y Kolmogorov’s Strong Law of Large Num b ers [17 ] w e ha v e 1 L L X l =1 c M k + l ( n a +1) − 1 L L X l =1 E [ c M k + l ( n a +1) ] → 0 a.s. as L → ∞ . Since E [ c M k ) ] = M k for all p ositiv e in teger k and applying t he results ab o ve for k = 1 , 2 , . . . , n a + 1, w e conclude that 1 N N X j =1 c M j − 1 N N X j =1 M j → 0 a.s. as N → ∞ . 0 0.5 1 1.5 2 0 0.5 1 1.5 (a) Case 1: true noise v ariance s 2 = 1. 0 1 2 3 4 0 0.5 1 1.5 2 (b) Case 2 : true noise v ariance = s 2 = 2. Figure 3: Estimation o f noise v ariance

Identification of Switched Autoregressive Systems from Large Noisy Data Sets

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment