Learning Koopman Eigenfunctions and Invariant Subspaces from Data: Symmetric Subspace Decomposition

Masih Haseli and Jorge Cortés

Abstract—This paper develops data-driven methods to identify eigenfunctions of the Koopman operator associated to a dynamical system and subspaces that are invariant under the operator. We build on Extended Dynamic Mode Decomposition (EDMD), a data-driven method that finds a finite-dimensional approximation of the Koopman operator on the span of a predefined dictionary of functions. We propose a necessary and sufficient condition to identify Koopman eigenfunctions based on the application of EDMD forward and backward in time. Moreover, we propose the Symmetric Subspace Decomposition (SSD) algorithm, an iterative method which provably identifies the maximal Koopman-invariant subspace and the Koopman eigenfunctions in the span of the dictionary. We also introduce the Streaming Symmetric Subspace Decomposition (SSSD) algorithm, an online extension of SSD that only requires a small, fixed memory and incorporates new data as it is received. Finally, we propose an extension of SSD that approximates Koopman eigenfunctions and invariant subspaces when the dictionary does not contain sufficiently many informative eigenfunctions.

(This work was supported by ONR Award N00014-18-1-2828. A preliminary version of this work appeared as [1] at the IEEE Conference on Decision and Control. Masih Haseli and Jorge Cortés are with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, CA 92093, USA, {mhaseli,cortes}@ucsd.edu.)

I. INTRODUCTION

Driven by advances in processing, data storage, cloud services, and algorithms, the world has witnessed in recent years a revolution in data-driven learning, analysis, and control of dynamical phenomena. State-space, probabilistic, and neural network models are among the most popular methods to model dynamical systems. With sufficient a priori information about the dynamics, state-space methods can provide closed-form analytic models that accurately describe the dynamical behavior. Such models, however, are generally nonlinear and their analytical study becomes arduous for moderate to high-dimensional systems. Probabilistic approaches, on the other hand, provide an alternative description that is conducive to dealing with incomplete information about the underlying dynamics. However, under such approaches, deriving mathematical guarantees may be hard, if not impossible. Neural networks can describe the dynamics with high accuracy given enough data, but the resulting models are highly nonlinear and difficult to study analytically. Hence, even though they can be successful in predicting the behavior of the system, they often do not provide a deeper understanding of the dynamics. These reasons have motivated researchers to seek alternative strategies that capture the dynamics from data with minimal a priori information, in a computationally efficient way, and that result in simple yet accurate models. Approximating the Koopman operator associated with a dynamical system is one such strategy. The Koopman operator is a linear but generally infinite-dimensional operator that fully describes the behavior of the underlying dynamical system.
Even though the linearity of the Koopman operator makes its spectral properties a powerful tool for analysis, its infinite-dimensional nature prevents the use of conventional linear algebraic tools developed to work with digital computers. One way to circumvent this issue is to identify finite-dimensional subspaces that are invariant under the Koopman operator. This paper develops data-driven methods to identify such subspaces.

Literature Review: The Koopman operator [2], [3] is a linear but generally infinite-dimensional operator that provides an alternative view of dynamical systems by describing the effect of the dynamics on a functional space. Its linearity enables one to use its spectral properties to capture and predict the behavior of nonlinear dynamical systems [4]–[6]. This leads to a wide variety of applications, including state estimation [7], [8], system identification [9]–[11], sensor and actuator placement [12], model reduction [13], [14], control [15]–[21], and robotics [22], [23]. Moreover, the eigenfunctions of the Koopman operator play an important role in the stability analysis of nonlinear systems [24]. Due to the infinite-dimensional nature of the Koopman operator, the digital implementation of the aforementioned applications is not possible unless one can represent the effect of the operator on finite-dimensional subspaces. The literature has explored several data-driven methods to find such finite-dimensional approximations, which can be divided into two main categories: projection methods and invariant-subspace methods. Projection methods fit a linear model to the data acquired from the system. The most popular approach in this category is Dynamic Mode Decomposition (DMD), first proposed to capture dynamical information from fluid flows [25]. DMD uses linear algebraic methods to form a linear model from time series data. The work [26] explores the properties of DMD and its connection with the Koopman operator, and [27] generalizes it to work with non-sequential data snapshots. Several extensions perform online computations to work with streaming datasets [28]–[30], account for the effect of measurement noise on data [31], [32], promote sparsity [33], and consider time-lagged data snapshots [34]. Extended Dynamic Mode Decomposition (EDMD) [35] is an important variation of DMD that lifts the states of the system to a (generally higher-dimensional) functional space using a predefined dictionary of functions and finds the projection of the Koopman operator on that subspace. The work [36] studies the convergence properties of EDMD to the Koopman operator as the number of data snapshots and dictionary elements go to infinity. EDMD is specifically designed to work with exact data, and experiments and simulations show that it may not work well with noisy data. Our previous work [37] presented a noise-resilient extension of EDMD able to work with data corrupted by measurement noise. The basic lifting idea of EDMD can also be combined with known information about the dynamics to increase the accuracy of the model [38]. The aforementioned methods provide linear higher-dimensional approximations of the underlying dynamics that are, however, not suitable for long-term predictions, since they are generally not exact. This issue can be tackled by finding subspaces that are invariant under the Koopman operator, since the acquired linear models are exact over them. This is the subject of the second group of approaches.
The works [39]–[42] provide approaches to find functions that span Koopman-invariant subspaces using neural networks. Moreover, since Koopman eigenfunctions span Koopman-invariant subspaces, one can use the empirical methods provided in [19], [43] to approximate the Koopman eigenfunctions and consequently the invariant subspaces. In addition, the work [44] provides theoretical and empirical results based on multi-step predictions to find Koopman eigenfunctions. Interestingly, none of the aforementioned methods provide mathematical guarantees that the identified functions are Koopman eigenfunctions.

Statement of Contributions: We present data-driven methods to identify Koopman eigenfunctions and Koopman-invariant subspaces associated with a potentially nonlinear dynamical system. First, we study the properties of the standard EDMD method regarding the identification of Koopman eigenfunctions. We prove that EDMD correctly identifies all the Koopman eigenfunctions in the span of the predefined dictionary. This necessary condition, however, is not sufficient, i.e., the functions identified by the EDMD method are not necessarily Koopman eigenfunctions. This motivates our next contribution, which is a necessary and sufficient condition characterizing the functions that evolve linearly according to the available data snapshots. This condition is based on the application of EDMD forward and backward in time. The identified functions are not necessarily Koopman eigenfunctions, since one can only guarantee that they evolve linearly on the available data (but not necessarily starting anywhere in the state space). However, we prove that under reasonable assumptions on the density of the sampling, the identified functions are Koopman eigenfunctions almost surely. Our next contribution seeks to provide computationally efficient ways of identifying Koopman eigenfunctions and Koopman-invariant subspaces. In fact, checking the aforementioned necessary and sufficient condition requires one to calculate and compare the eigendecompositions of two potentially large matrices, which can be computationally cumbersome. Moreover, even though the subspace spanned by all the eigenfunctions in the span of the original dictionary is Koopman-invariant, it might not be maximal. To address these limitations, we propose the Symmetric Subspace Decomposition (SSD) strategy, an iterative method to find the maximal subspace that remains invariant under the application of the dynamics (and its associated Koopman operator) according to the available data. We prove that SSD also finds all the functions that evolve linearly in time according to the available data. Moreover, we prove that under the same conditions on the sampling density, the SSD strategy identifies the maximal Koopman-invariant subspace in the span of the original dictionary almost surely. Our next contribution is motivated by applications where the data becomes available in an online fashion. In such scenarios, at any given time step, one would need to perform SSD on all the data received up to that time. Performing SSD requires the calculation of several singular value decompositions for matrices that scale with the size of the data, in turn requiring significant memory capabilities.
To address these shortcomings, we propose the Streaming Symmetric Subspace Decomposition (SSSD) strategy, which refines the calculated Koopman-invariant subspaces each time it receives new data and deals with matrices of fixed and relatively small size (independent of the size of the data). We prove that the SSSD and SSD methods are equivalent, in the sense that for a given dataset they both identify the same maximal Koopman-invariant subspace. Our last contribution is motivated by the fact that, in some cases, the predefined dictionary does not contain sufficient eigenfunctions to capture important information about the dynamics. To address this issue, we provide an extension of SSD, termed Approximated-SSD, enabling us to approximate Koopman eigenfunctions and invariant subspaces. We show how the accuracy of the approximation can be tuned using a design parameter.

II. PRELIMINARIES

In this section, we review basic concepts on the Koopman operator and Extended Dynamic Mode Decomposition.

Notation: We denote by $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{R}$, $\mathbb{R}_{\ge 0}$, and $\mathbb{C}$ the sets of natural, nonnegative integer, real, nonnegative real, and complex numbers, respectively. For a matrix $A \in \mathbb{C}^{m \times n}$, we denote the set comprised of its rows by $\mathrm{rows}(A)$, its columns by $\mathrm{cols}(A)$, the number of its rows by $\sharp\mathrm{rows}(A)$, and the number of its columns by $\sharp\mathrm{cols}(A)$. In addition, we denote its pseudoinverse, transpose, complex conjugate, conjugate transpose, Frobenius norm, and range space by $A^\dagger$, $A^T$, $\bar{A}$, $A^H$, $\|A\|_F$, and $\mathcal{R}(A)$, respectively. For $1 \le i < k \le m$, we denote by $A_{i:k}$ the matrix formed with the $i$th to $k$th rows of $A$. Moreover, $A_{i,j}$ denotes the $ij$th element of $A$. For a square nonsingular matrix $B$, we denote its inverse by $B^{-1}$. Given matrices $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{m \times d}$, we denote by $[A, B] \in \mathbb{C}^{m \times (n+d)}$ the matrix created by concatenating $A$ and $B$. The angle between vectors $v, w \in \mathbb{R}^n$ is $\angle(v, w)$. Given $v_1, \ldots, v_k \in \mathbb{C}^n$, $\mathrm{span}\{v_1, \ldots, v_k\}$ is the set comprised of all linear combinations $c_1 v_1 + \cdots + c_k v_k$ with $c_1, \ldots, c_k \in \mathbb{C}$. We use $j$ to denote the imaginary unit (the solution of $x^2 + 1 = 0$). For $v \in \mathbb{C}^n$, we denote its real and imaginary parts by $\mathrm{Re}(v)$ and $\mathrm{Im}(v)$, and its 2-norm by $\|v\|_2 := \sqrt{v^H v}$. Given a set $A$, we denote its complement by $A^c$. Given sets $A$ and $B$, $A \subseteq B$ means that $A$ is a subset of $B$. We denote by $A \cap B$ and $A \cup B$ the intersection and union of $A$ and $B$, and set $A \setminus B := A \cap B^c$. Given a sequence of sets $\{A_i\}_{i=1}^\infty$, we denote its superior and inferior limits by $\limsup_{i \to \infty} A_i$ and $\liminf_{i \to \infty} A_i$, respectively. We refer to the set consisting of all continuous strictly increasing functions $\alpha : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ with $\alpha(0) = 0$ as class-$\mathcal{K}$. Given $f : B \to A$ and $g : C \to B$, $f \circ g : C \to A$ denotes their composition.

A. Koopman Operator

Here, we introduce the (discrete-time) Koopman operator and its spectral properties following [6]. Consider a nonlinear, time-invariant, continuous map $T : \mathcal{M} \to \mathcal{M}$ on $\mathcal{M} \subseteq \mathbb{R}^n$, defining the dynamical system

$$x^+ = T(x). \tag{1}$$

The dynamics (1) acts on the points in the state space $\mathcal{M}$ and generates trajectories of the system. The Koopman operator, on the other hand, provides an alternative approach to analyze (1) based on the evolution of functions (also known as observables) defined on $\mathcal{M}$ and taking values in $\mathbb{C}$.
Formally, let $\mathcal{F}$ be a linear space of functions from $\mathcal{M}$ to $\mathbb{C}$ which is closed under composition with $T$, i.e.,

$$f \circ T \in \mathcal{F}, \quad \forall f \in \mathcal{F}. \tag{2}$$

The Koopman operator $\mathcal{K} : \mathcal{F} \to \mathcal{F}$ associated with (1) is $\mathcal{K}(f) = f \circ T$. A closer look at this definition shows that the Koopman operator advances the observables in time, i.e., for $g = \mathcal{K}(f)$,

$$g(x) = f \circ T(x) = f(x^+), \quad \forall x \in \mathcal{M}. \tag{3}$$

This equation shows how the Koopman operator encodes the dynamics on the functional space $\mathcal{F}$. The operator is linear as a direct consequence of the linearity of $\mathcal{F}$, i.e., for every $f_1, f_2 \in \mathcal{F}$ and $c_1, c_2 \in \mathbb{C}$,

$$\mathcal{K}(c_1 f_1 + c_2 f_2) = c_1 \mathcal{K}(f_1) + c_2 \mathcal{K}(f_2). \tag{4}$$

Assuming $\mathcal{F}$ contains the functions describing the states of the system, $g_i(x) = x_i$ with $i \in \{1, \ldots, n\}$, the Koopman operator fully characterizes the global features of the dynamics in a linear fashion. Moreover, the operator might be (and generally is) infinite dimensional, either by choice of $\mathcal{F}$ or due to the closedness requirement in (2).

Being linear, the operator naturally admits an eigendecomposition. A function $\phi \in \mathcal{F}$ is an eigenfunction of $\mathcal{K}$ associated with eigenvalue $\lambda \in \mathbb{C}$ if

$$\mathcal{K}(\phi) = \lambda \phi. \tag{5}$$

The combination of (3) and (5) leads to a significant property of the Koopman operator: the linear evolution of its eigenfunctions in time. Formally, given an eigenfunction $\phi$,

$$\phi(x^+) = (\phi \circ T)(x) = \mathcal{K}(\phi)(x) = \lambda \phi(x). \tag{6}$$

The linear evolution of eigenfunctions, together with linearity (4), enables us to use spectral properties to analyze the nonlinear system (1). Given a set of eigenpairs $\{(\lambda_i, \phi_i)\}_{i=1}^{N_k}$ such that $\mathcal{K}(\phi_i) = \lambda_i \phi_i$, $i \in \{1, \ldots, N_k\}$, one can describe the evolution of every function $f \in \mathrm{span}(\{\phi_i\}_{i=1}^{N_k})$, i.e., $f = \sum_{i=1}^{N_k} c_i \phi_i$ for some $\{c_i\}_{i=1}^{N_k} \subset \mathbb{C}$, as

$$f(x(k)) = \sum_{i=1}^{N_k} c_i \lambda_i^k \phi_i(x(0)), \quad \forall k \in \mathbb{N}_0. \tag{7}$$

The constants $\{c_i\}_{i=1}^{N_k}$ are called Koopman modes. It is important to note that one might need $N_k = \infty$ to fully describe the behavior of the dynamical system.

Another important notion in the analysis of the Koopman operator is the invariance of subspaces under its application. Formally, a subspace $\mathcal{S} \subseteq \mathcal{F}$ is Koopman-invariant if for every $f \in \mathcal{S}$ we have $\mathcal{K}(f) \in \mathcal{S}$. Furthermore, $\mathcal{S}$ is maximal Koopman-invariant in $\mathcal{L} \subseteq \mathcal{F}$ if it contains every Koopman-invariant subspace in $\mathcal{L}$. Naturally, a set comprised of Koopman eigenfunctions spans a Koopman-invariant subspace.
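To make the linear evolution property (6)–(7) concrete, here is a minimal numerical check; this is our illustration, not part of the original exposition. For a linear map $x^+ = Ax$ (with $A$ an arbitrary matrix chosen for the example), each left eigenvector $w$ of $A$ yields a Koopman eigenfunction $\phi(x) = w^T x$, since $\phi(Ax) = w^T A x = \lambda w^T x$.

```python
import numpy as np

# Linear test system x+ = A x. Each left eigenvector w of A
# (i.e., w^T A = lambda w^T) gives a Koopman eigenfunction phi(x) = w^T x.
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
lams, W = np.linalg.eig(A.T)   # eigenvectors of A^T = left eigenvectors of A
lam, w = lams[0], W[:, 0]

phi = lambda x: w @ x          # phi(x) = w^T x

x0 = np.array([1.0, 1.0])
x = x0.copy()
for k in range(5):
    # linear temporal evolution (7): phi(x(k)) = lam^k * phi(x(0))
    assert np.isclose(phi(x), lam**k * phi(x0))
    x = A @ x                  # advance the trajectory one step
```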
B. Extended Dynamic Mode Decomposition

Our exposition here mainly follows [35]. As mentioned earlier, despite its linearity, the infinite-dimensional nature of the Koopman operator obstructs the use of efficient linear algebraic methods. One natural way to overcome this problem is to find finite-dimensional approximations of the operator. Extended Dynamic Mode Decomposition (EDMD) is a popular data-driven method that performs this task by lifting data snapshots acquired from the dynamical system to a higher-dimensional space using a predefined dictionary of functions. The projection of the action of the operator onto the span of the dictionary can then be found by solving a least-squares problem.

Formally, let $D : \mathbb{R}^n \to \mathbb{R}^{1 \times N_d}$ be a dictionary of $N_d$ functions in $\mathcal{F}$ with $D(x) = [d_1(x), \ldots, d_{N_d}(x)]$. Moreover, let $X, Y \in \mathbb{R}^{N \times n}$ be matrices comprised of $N$ data snapshots such that $y_i = T(x_i)$ for $i \in \{1, \ldots, N\}$, where $x_i^T$ and $y_i^T$ are the $i$th rows of $X$ and $Y$, respectively. For convenience, we define the action of the dictionary on a matrix as $D(X) := [D(x_1)^T, \ldots, D(x_N)^T]^T$. The EDMD method approximates the projection of the Koopman operator by finding the matrix that best explains the data over the dictionary, i.e.,

$$\operatorname*{minimize}_{K} \; \|D(Y) - D(X)K\|_F^2,$$

which yields the closed-form solution

$$K^* = \mathrm{EDMD}(D, X, Y) := D(X)^\dagger D(Y). \tag{8}$$

Note that the solution depends on the choice of dictionary. If the dictionary spans a Koopman-invariant subspace, then $\|D(Y) - D(X)K^*\|_F^2 = 0$ and $K^*$ fully captures the evolution of functions in $\mathrm{span}(D(x))$. Otherwise, EDMD loses some information about the dynamics.
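For reference, the closed-form solution (8) is a single least-squares solve. The following NumPy sketch computes the forward and backward EDMD matrices used throughout the paper; the monomial dictionary and the scalar system $x^+ = 0.8x$ are arbitrary choices for illustration, not prescribed by the paper.

```python
import numpy as np

def edmd(DX, DY):
    """Closed-form EDMD solution (8): K = D(X)^dagger D(Y)."""
    return np.linalg.pinv(DX) @ DY

def apply_dictionary(X):
    """Illustrative dictionary D(x) = [1, x, x^2], applied snapshot-wise."""
    return np.column_stack([np.ones(len(X)), X, X**2])

# Snapshot matrices for the scalar system x+ = 0.8 x (so y_i = T(x_i)).
X = np.linspace(-1.0, 1.0, 20)
Y = 0.8 * X

DX, DY = apply_dictionary(X), apply_dictionary(Y)
K_f = edmd(DX, DY)   # forward EDMD
K_b = edmd(DY, DX)   # backward EDMD
```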
III. PROBLEM STATEMENT

As described in Section II-B, the EDMD method loses information about the dynamical system when the employed dictionary does not span a Koopman-invariant subspace. As a result, in such cases, the EDMD approximation is not appropriate for long-term prediction of the state evolution. Motivated by this observation, our goal is to find the maximal Koopman-invariant subspace and the Koopman eigenfunctions in the span of a given dictionary. Formally, given the dynamical system (1) defined by $T : \mathcal{M} \to \mathcal{M}$, data matrices $X$ and $Y$ comprised of $N$ data snapshots, and an arbitrary dictionary of functions $D$, our main goal is two-fold:

(a) find all the Koopman eigenfunctions in $\mathrm{span}(D(x))$;
(b) find a basis for the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$.

Note that (a) and (b) are closely related. The eigenfunctions found by solving (a) span Koopman-invariant subspaces. Those invariant subspaces, however, might not be maximal. This mild difference between (a) and (b) requires the use of different solution approaches. Since we are dealing with finite-dimensional linear subspaces, we aim to use linear algebra instead of the optimization-based methods widely used for these types of problems. This enables us to directly use the computationally efficient linear algebraic packages that optimization methods rely on. Throughout the paper, we use the following assumption regarding the dictionary snapshots.

Assumption 3.1 (Full Column Rank Dictionary Matrices): The matrices $D(X)$ and $D(Y)$ have full column rank.

Assumption 3.1 is reasonable: for it to hold, the dictionary functions must be linearly independent, i.e., the functions must form a basis for $\mathrm{span}(D(x))$. Moreover, the assumption requires the set of initial conditions $\mathrm{rows}(X)$ to be diverse enough to capture important characteristics of the dynamics. Our treatment here relies on EDMD, which is not specifically designed to work with data corrupted by measurement noise. Hence, we assume access to data with high signal-to-noise ratio. In practice, one might need to pre-process the data to use the algorithms proposed here.

IV. EDMD AND KOOPMAN EIGENFUNCTIONS

Here we investigate the capabilities and limitations of the EDMD method regarding the identification of Koopman eigenfunctions. Throughout the paper, we use the following notation for the EDMD matrices computed from the data matrices $X$ and $Y$ forward and backward in time:

$$K_f = \mathrm{EDMD}(D, X, Y), \quad K_b = \mathrm{EDMD}(D, Y, X).$$

The next result shows that EDMD is able to capture not only Koopman eigenfunctions but also all the functions that evolve linearly according to the available data.

Lemma 4.1 (EDMD Captures the Koopman Eigenfunctions in the Span of the Dictionary): Suppose Assumption 3.1 holds. Let $f(x) = D(x)v$ for some $v \in \mathbb{C}^{N_d} \setminus \{0\}$ and all $x \in \mathcal{M}$.

(a) Let $f$ evolve linearly according to the available data, i.e., there exists $\lambda \in \mathbb{C}$ such that $f(y_i) = \lambda f(x_i)$ for every $i \in \{1, \ldots, \sharp\mathrm{rows}(X)\}$. Then, the vector $v$ is an eigenvector of $K_f$ with eigenvalue $\lambda$;
(b) Let $f$ be an eigenfunction of the Koopman operator with eigenvalue $\lambda$. Then, the vector $v$ is an eigenvector of $K_f$ with eigenvalue $\lambda$.

Proof: (a) Based on the linear evolution of $f$, we have $D(Y)v = \lambda D(X)v$. Moreover, using the closed-form solution of EDMD, we have

$$K_f v = D(X)^\dagger D(Y) v = \lambda D(X)^\dagger D(X) v = \lambda v,$$

where the last equality follows from Assumption 3.1.

(b) Based on the definition of Koopman eigenfunction, we have $f(x^+) = \lambda f(x)$. Since this linear evolution is reflected in the data snapshots, we have $f(y_i) = \lambda f(x_i)$ for every $i \in \{1, \ldots, \sharp\mathrm{rows}(X)\}$, where $x_i^T$ and $y_i^T$ are the $i$th rows of $X$ and $Y$, respectively. The rest follows from (a).

Despite its simplicity, this result provides significant insight into the EDMD method. Lemma 4.1 shows that EDMD can capture eigenfunctions in the span of the dictionary even if the underlying subspace is not Koopman-invariant. It is well known in the literature that the (E)DMD method can capture physical constraints, conservation laws, and other properties of the underlying system, which actually correspond to Koopman eigenfunctions, see e.g. [35], [45]. We note that Lemma 4.1 is a generalization of [27, Theorem 1] to EDMD when the underlying system is not necessarily linear (or cannot be accurately approximated by a linear system) and the underlying subspace is not Koopman-invariant.

The next result shows that EDMD accurately predicts the evolution of functions in the span of Koopman eigenfunctions evaluated on the system's trajectories.

Proposition 4.2 (EDMD Accurately Predicts the Evolution of any Linear Combination of Eigenfunctions on the System's Trajectories): Let $f(x) = D(x)v$ for some $v \in \mathbb{C}^{N_d} \setminus \{0\}$ and all $x \in \mathcal{M}$. Assume $f$ is in the span of eigenfunctions $\{\phi_i\}_{i=1}^m \subset \mathrm{span}(D)$ with corresponding eigenvalues $\{\lambda_i\}_{i=1}^m \subset \mathbb{C}$. Then, given any trajectory $\{x(j)\}_{j=0}^\infty$ of (1),

$$f(x(j)) = D(x(0)) K_f^j v, \quad \forall j \in \mathbb{N}_0. \tag{9}$$

Proof: Since $f \in \mathrm{span}(\{\phi_i\}_{i=1}^m)$, there exist scalars $\{c_i\}_{i=1}^m \subset \mathbb{C}$ such that

$$f(x) = \sum_{i=1}^m c_i \phi_i(x). \tag{10}$$

Since $\{\phi_i\}_{i=1}^m \subset \mathrm{span}(D)$, there exist vectors $\{w_i\}_{i=1}^m \subset \mathbb{C}^{N_d}$ such that

$$\phi_i(x) = D(x) w_i, \quad \forall i \in \{1, \ldots, m\}. \tag{11}$$

Combining (10) and (11) with the definition of $f$, we deduce that $\sum_{i=1}^m c_i w_i = v$. Now,

$$D(x(0)) K_f^j v = D(x(0)) K_f^j \sum_{i=1}^m c_i w_i = D(x(0)) \sum_{i=1}^m c_i \lambda_i^j w_i,$$

where in the last equality we have used Lemma 4.1(b) for the eigenfunctions $\phi_i$. Using (11),

$$D(x(0)) K_f^j v = \sum_{i=1}^m c_i \lambda_i^j \phi_i(x(0)) = \sum_{i=1}^m c_i \phi_i(x(j)),$$

where in the second equality we have used the linear temporal evolution of Koopman eigenfunctions (6). The proof is complete by noting that the previous equation holds for all $j \in \mathbb{N}_0$ and that its right-hand side equals $f(x(j))$ by (10).

Lemma 4.1 provides a necessary condition for the identification of Koopman eigenfunctions. This condition, however, is not sufficient; see e.g. [1, Example IV.3] for a counterexample.
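As a numerical illustration of (9), the following continues the EDMD sketch from Section II-B (same hypothetical dictionary and system): $f(x) = x$ corresponds to $v = [0, 1, 0]^T$ and is itself an eigenfunction for $x^+ = 0.8x$, so $D(x(0)) K_f^j v$ reproduces the trajectory.

```python
# Continues the EDMD sketch above: f(x) = x, i.e., v = [0, 1, 0]^T.
v = np.array([0.0, 1.0, 0.0])
x0 = 0.3                                   # arbitrary initial condition
D0 = apply_dictionary(np.array([x0]))[0]   # row vector D(x(0))
x = x0
for j in range(6):
    pred = D0 @ np.linalg.matrix_power(K_f, j) @ v   # D(x(0)) K_f^j v, cf. (9)
    assert np.isclose(pred, x)             # matches f(x(j)) = x(j)
    x = 0.8 * x                            # advance the trajectory
```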
Interestingly, if a function evolves linearly forward in time, it also evolves linearly backward in time. The next result shows that checking this observation provides a necessary and sufficient condition for the identification of functions that evolve linearly in time according to the available data.

Theorem 4.3 (Identification of Linear Evolutions by Forward and Backward EDMD): Suppose Assumption 3.1 holds. Let $f(x) = D(x)v$ for some $v \in \mathbb{C}^{N_d} \setminus \{0\}$. Then $f(y_i) = \lambda f(x_i)$ for some $\lambda \in \mathbb{C} \setminus \{0\}$ and for all $i \in \{1, \ldots, \sharp\mathrm{rows}(X)\}$ if and only if $v$ is an eigenvector of $K_f$ with eigenvalue $\lambda$ and an eigenvector of $K_b$ with eigenvalue $\lambda^{-1}$.

Proof: $(\Leftarrow)$: Using the closed-form solutions of the EDMD problem and Assumption 3.1, one can write

$$K_f = (D(X)^T D(X))^{-1} D(X)^T D(Y), \quad K_b = (D(Y)^T D(Y))^{-1} D(Y)^T D(X).$$

Using these along with the definition of the eigenpair,

$$\lambda D(X)^T D(X) v = D(X)^T D(Y) v, \tag{13a}$$
$$\lambda^{-1} D(Y)^T D(Y) v = D(Y)^T D(X) v. \tag{13b}$$

Multiplying (13a) from the left by $v^H$ and using (13b),

$$\lambda \|D(X)v\|_2^2 = v^H D(X)^T D(Y) v = \bar{\lambda}^{-1} \|D(Y)v\|_2^2,$$

which implies

$$|\lambda|^2 \|D(X)v\|_2^2 = \|D(Y)v\|_2^2. \tag{14}$$

Now, we decompose $D(Y)v$ orthogonally as

$$D(Y)v = c\,D(X)v + w, \tag{15}$$

with $v^H D(X)^T w = 0$. Substituting (15) into (13a) and multiplying both sides from the left by $v^H$ yields $\lambda v^H D(X)^T D(X) v = c\, v^H D(X)^T D(X) v$. Since $v \neq 0$ and under Assumption 3.1, we deduce that $c = \lambda$. Substituting this value of $c$ in (15), taking the 2-norm, and using the fact that $v^H D(X)^T w = 0$, one can write $\|D(Y)v\|_2^2 = |\lambda|^2 \|D(X)v\|_2^2 + \|w\|_2^2$. Comparing this with (14), one deduces that $w = 0$ and $D(Y)v = \lambda D(X)v$. The result follows by reading this equality row by row and noting that $f(x) = D(x)v$.

$(\Rightarrow)$: Based on Lemma 4.1(a), $v$ must be an eigenvector of $K_f$ with eigenvalue $\lambda$. Moreover, since $\lambda \neq 0$, one can write $f(x_i) = \lambda^{-1} f(y_i)$ for every $i \in \{1, \ldots, \sharp\mathrm{rows}(X)\}$ and, consequently, using Lemma 4.1(a) once again, we have $K_b v = \lambda^{-1} v$, concluding the proof.
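Theorem 4.3 suggests a direct numerical test: compute the eigenpairs of $K_f$ and keep those whose eigenvectors $K_b$ maps back with the reciprocal eigenvalue. The sketch below is our reading of that condition; the tolerance tol is an implementation detail we introduce for floating-point comparisons.

```python
import numpy as np

def forward_backward_pairs(K_f, K_b, tol=1e-8):
    """Candidate linear evolutions per Theorem 4.3: eigenpairs (lam, v) of
    K_f such that v is also an eigenvector of K_b with eigenvalue 1/lam."""
    lams, V = np.linalg.eig(K_f)
    pairs = []
    for lam, v in zip(lams, V.T):
        if abs(lam) < tol:
            continue  # the theorem requires lam != 0
        # check K_b v = (1/lam) v up to the tolerance
        if np.linalg.norm(K_b @ v - v / lam) < tol * np.linalg.norm(v):
            pairs.append((lam, v))
    return pairs
```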
If the function $f$ satisfies the conditions of Theorem 4.3, then $f(x^+) = \lambda f(x)$ for all $x \in \mathrm{rows}(X)$. However, Theorem 4.3 does not guarantee that $f$ is an eigenfunction, i.e., there is no guarantee that $f(x^+) = \lambda f(x)$ for all $x \in \mathcal{M}$. To circumvent this issue, we next introduce infinite sampling and make an assumption about its density.

Assumption 4.4 (Almost Sure Dense Sampling from a Compact State Space): Assume the state space $\mathcal{M}$ is compact. Suppose we gather infinitely (countably) many data snapshots. For $N \in \mathbb{N}$, the first $N$ data snapshots are represented by matrices $X_{1:N}$ and $Y_{1:N}$ such that $y_i = T(x_i)$ for all $i \in \{1, \ldots, N\}$, where $x_i$ and $y_i$ are the $i$th rows of $X_{1:N}$ and $Y_{1:N}$, respectively (we refer to the columns of $X_{1:N}^T$ as the set $S_N$ of initial conditions). Assume there exist a class-$\mathcal{K}$ function $\alpha$ and a sequence $\{p_N\}_{N=1}^\infty \subset [0, 1]$ such that, for every $N \in \mathbb{N}$,

$$\forall m \in \mathcal{M}, \; \exists x \in S_N \text{ such that } \|m - x\|_2 \le \alpha\!\left(\tfrac{1}{N}\right)$$

holds with probability $p_N$, and $\lim_{N \to \infty} p_N = 1$.

Assumption 4.4 is not restrictive since, in most practical cases, the state space is compact or the analysis is limited to a specific bounded region. Moreover, the data is usually available on a bounded region due to the limited range of sensors. Regarding the sampling density, Assumption 4.4 holds for most standard random sampling schemes.

Noting that the methods presented later require Assumption 3.1 to hold, we provide a definition for dictionary matrices acquired from infinite sampling.

Definition 4.5 ($R$-rich Sequence of Dictionary Snapshots): Let $\{X_{1:N}\}_{N=1}^\infty$ and $\{Y_{1:N}\}_{N=1}^\infty$ be the sequences of data snapshot matrices acquired from system (1). Given the dictionary $D : \mathcal{M} \to \mathbb{R}^{1 \times N_d}$, we say the sequence of dictionary snapshot matrices is $R$-rich if

$$R = \min\{M \in \mathbb{N} \mid \mathrm{rank}(D(X_{1:M})) = \mathrm{rank}(D(Y_{1:M})) = N_d\}$$

exists ($R$ is called the richness constant).

In Definition 4.5, if $\{M \in \mathbb{N} \mid \mathrm{rank}(D(X_{1:M})) = \mathrm{rank}(D(Y_{1:M})) = N_d\} \neq \emptyset$, then based on the well-ordering principle, see e.g. [46, Chapter 0], the minimum of the set exists and the sequence of dictionary snapshot matrices is $R$-rich. Moreover, given an $R$-rich sequence of dictionary snapshot matrices $D(X_{1:N})$ and $D(Y_{1:N})$, Assumption 3.1 holds for every $N \ge R$.

We are now ready to identify the Koopman eigenfunctions in the span of the dictionary using forward-backward EDMD.

Theorem 4.6 (Identification of Koopman Eigenfunctions by Forward and Backward EDMD): Given an infinite sampling, suppose that the sequence of dictionary snapshot matrices is $R$-rich. Let

$$K_f^N = \mathrm{EDMD}(D, X_{1:N}, Y_{1:N}), \quad K_b^N = \mathrm{EDMD}(D, Y_{1:N}, X_{1:N}).$$

Given $v \in \mathbb{C}^{N_d} \setminus \{0\}$ and $\lambda \in \mathbb{C} \setminus \{0\}$, let $f(x) = D(x)v$. Then,

(a) If $f$ is an eigenfunction of the Koopman operator with eigenvalue $\lambda$, then $K_f^N v = \lambda v$ and $K_b^N v = \lambda^{-1} v$ for every $N \ge R$;
(b) Conversely, and assuming the dictionary functions are continuous and Assumption 4.4 holds, if $K_f^N v = \lambda v$ and $K_b^N v = \lambda^{-1} v$ for every $N \ge R$, then $f$ is an eigenfunction of the Koopman operator with probability 1.

Proof: (a) Since $f$ is a Koopman eigenfunction, for every $i \in \mathbb{N}$ we have $f(y_i) = \lambda f(x_i)$. Moreover, for every $N \ge R$, $D(X_{1:N})$ and $D(Y_{1:N})$ have full column rank. Therefore, the result follows from Theorem 4.3.

(b) Based on Theorem 4.3, we deduce that, for every $N \ge R$,

$$f(y_i) = \lambda f(x_i), \quad \forall i \in \{1, \ldots, N\}, \tag{16}$$

where $x_i^T$ and $y_i^T$ are the $i$th rows of $X_{1:N}$ and $Y_{1:N}$, respectively. Now, define $h(x) := f \circ T(x) - \lambda f(x)$. The function $h$ is continuous since $f$ is a linear combination of continuous functions and $T$ is also continuous. By inspecting $h$ on the data points and using (16) and the fact that $y_i = T(x_i)$ for all $i \in \{1, \ldots, N\}$, one can show that $h(x_i) = f \circ T(x_i) - \lambda f(x_i) = f(y_i) - \lambda f(x_i) = 0$ for every $i \in \{1, \ldots, N\}$. Moreover, note that based on Assumption 4.4, the set $S_\infty = \bigcup_{i=1}^\infty S_i$ is dense in $\mathcal{M}$ with probability 1 and $h(x) = 0$ for every $x \in S_\infty$. As a result, $h(x) = 0$ on $\mathcal{M}$ with probability 1. This implies that $f \circ T(x) = \lambda f(x)$ for every $x \in \mathcal{M}$ almost surely. Consequently, we have $\mathcal{K}(f) = \lambda f$ almost surely, and the result follows.

We note that the technique of considering the evolution forward and backward in time has also been used in the literature for other purposes, e.g., to alleviate the effect of measurement noise on the data when performing DMD [31], [32]. To our knowledge, the use of this technique for the identification of Koopman eigenfunctions and invariant subspaces is novel. Moreover, unlike [47, Algorithm 1], the methods proposed here do not require access to the system's multi-step trajectories.
Theorems 4.3 and 4.6 provide conditions to identify Koopman eigenfunctions, and the identified eigenfunctions in turn span Koopman-invariant subspaces. However, one still needs to compare $N_d$ potentially complex eigenvectors and their corresponding eigenvalues. This procedure can be impractical for large $N_d$. Moreover, since $\mathcal{M} \subseteq \mathbb{R}^n$, the eigenfunctions of the Koopman operator form complex-conjugate pairs. Such pairs can be fully characterized using their real and imaginary parts, which allows one to work instead with real-valued functions. This motivates the development of algorithms to directly identify Koopman-invariant subspaces.

V. IDENTIFICATION OF KOOPMAN-INVARIANT SUBSPACES VIA SYMMETRIC SUBSPACE DECOMPOSITION (SSD)

Here we provide an algorithmic method to identify Koopman-invariant subspaces in the span of a predefined dictionary and later show how it can be used to find Koopman eigenfunctions. With the setup of Section III, given the original dictionary $D : \mathcal{M} \to \mathbb{R}^{1 \times N_d}$ comprised of $N_d$ linearly independent functions, we aim to find a dictionary $\tilde{D} : \mathcal{M} \to \mathbb{R}^{1 \times \tilde{N}_d}$ with $\tilde{N}_d$ linearly independent functions such that the elements of $\tilde{D}$ span the maximal Koopman-invariant subspace in $\mathrm{span}(D)$. Since $\mathrm{span}(\tilde{D})$ is invariant, we have $\mathcal{R}(\tilde{D} \circ T) = \mathcal{R}(\tilde{D})$. This equality gets reflected in the data, i.e., given snapshot matrices $X$ and $Y$,

$$\mathcal{R}(\tilde{D}(Y)) = \mathcal{R}(\tilde{D}(X)). \tag{17}$$

Moreover, since the elements of $\tilde{D}$ are in the span of $D$, there exists a full column rank matrix $C$ such that $\tilde{D}(x) = D(x)C$ for all $x \in \mathcal{M}$. Thus, from (17),

$$\mathcal{R}(D(Y)C) = \mathcal{R}(D(X)C). \tag{18}$$

Hence, we can reformulate the problem as a purely linear-algebraic one: find the full column rank matrix $C$ with the maximum number of columns such that (18) holds. To solve this problem, we propose the Symmetric Subspace Decomposition (SSD) method. The SSD algorithm relies on the fact, from (17), that $\mathcal{R}(\tilde{D}(Y)) = \mathcal{R}(\tilde{D}(X)) \subseteq \mathcal{R}(D(X)) \cap \mathcal{R}(D(Y))$. This fact can alternatively be expressed using the null space of the concatenation $[D(X), D(Y)]$. SSD uses this null space to prune the dictionary, removing functions that do not evolve linearly in time according to the available data, and thereby identifies a potentially smaller dictionary. At each iteration, SSD repeats the procedure of (i) concatenating the current dictionary matrices, (ii) identifying the null space, and (iii) reducing the dictionary, until the desired dictionary is identified. Algorithm 1 presents the pseudocode².

Algorithm 1 Symmetric Subspace Decomposition
1: Initialization
2: $i \leftarrow 1$, $A_1 \leftarrow D(X)$, $B_1 \leftarrow D(Y)$, $C_{\mathrm{SSD}} \leftarrow I_{N_d}$
3: while 1 do
4:   $\begin{bmatrix} Z_{A_i} \\ Z_{B_i} \end{bmatrix} \leftarrow \mathrm{null}([A_i, B_i])$  ▷ Basis for the null space
5:   if $\mathrm{null}([A_i, B_i]) = \emptyset$ then
6:     return 0  ▷ The basis does not exist
7:     break
8:   end if
9:   if $\sharp\mathrm{rows}(Z_{A_i}) \le \sharp\mathrm{cols}(Z_{A_i})$ then
10:    return $C_{\mathrm{SSD}}$  ▷ The procedure is complete
11:    break
12:  end if
13:  $C_{\mathrm{SSD}} \leftarrow C_{\mathrm{SSD}} Z_{A_i}$  ▷ Reducing the subspace
14:  $A_{i+1} \leftarrow A_i Z_{A_i}$, $B_{i+1} \leftarrow B_i Z_{A_i}$, $i \leftarrow i + 1$
15: end while

²The function $\mathrm{null}([A_i, B_i])$ returns a basis for the null space of $[A_i, B_i]$, and $Z_{A_i}$ and $Z_{B_i}$ in Step 4 have the same size.
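A compact NumPy transcription of Algorithm 1 might read as follows. This is our sketch, not the authors' implementation: the null function of Step 4 is realized through an SVD, and the tolerance rtol, which stands in for exact rank decisions on finite-precision machines (cf. Remark 5.4 below), is our addition.

```python
import numpy as np

def null_space(M, rtol=1e-10):
    """Orthonormal basis for the null space of M via a full SVD.
    (For very large N one could instead work with [A, B]^T [A, B].)"""
    _, s, Vh = np.linalg.svd(M, full_matrices=True)
    rank = int(np.sum(s > rtol * s[0])) if s.size else 0
    return Vh[rank:].conj().T           # columns span null(M)

def ssd(DX, DY, rtol=1e-10):
    """Symmetric Subspace Decomposition (Algorithm 1)."""
    Nd = DX.shape[1]
    A, B, C = DX, DY, np.eye(Nd)
    while True:
        Z = null_space(np.hstack([A, B]), rtol)    # Step 4
        if Z.shape[1] == 0:
            return np.zeros((Nd, 0))               # Step 6: C_SSD = 0
        ZA = Z[:A.shape[1]]                        # split [Z_A; Z_B]
        if ZA.shape[0] <= ZA.shape[1]:
            return C                               # Step 10: complete
        C = C @ ZA                                 # Step 13: reduce subspace
        A, B = A @ ZA, B @ ZA                      # Step 14
```

For instance, running ssd(DX, DY) on the snapshots of the earlier EDMD sketch returns a square invertible matrix, reflecting that the monomial dictionary there already spans a Koopman-invariant subspace for $x^+ = 0.8x$.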
A. Convergence Analysis of the SSD Algorithm

Here we characterize the convergence properties of the SSD algorithm. The next result characterizes the dimension, maximality, and symmetry of the subspace defined by its output.

Theorem 5.1 (Properties of SSD Output): Suppose Assumption 3.1 holds. For matrices $D(X)$, $D(Y)$, let $C_{\mathrm{SSD}} = \mathrm{SSD}(D(X), D(Y))$. The SSD algorithm has the following properties:

(a) it stops after at most $N_d$ iterations;
(b) the matrix $C_{\mathrm{SSD}}$ is either $0$ or has full column rank, and satisfies $\mathcal{R}(D(X)C_{\mathrm{SSD}}) = \mathcal{R}(D(Y)C_{\mathrm{SSD}})$;
(c) the subspace $\mathcal{R}(D(X)C_{\mathrm{SSD}})$ is maximal, in the sense that, for any matrix $E$ with $\mathcal{R}(D(X)E) = \mathcal{R}(D(Y)E)$, we have $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(D(X)C_{\mathrm{SSD}})$ and $\mathcal{R}(E) \subseteq \mathcal{R}(C_{\mathrm{SSD}})$;
(d) $\mathcal{R}(\mathrm{SSD}(D(X), D(Y))) = \mathcal{R}(\mathrm{SSD}(D(Y), D(X)))$.

Proof: (a) First, we use (strong) induction to prove that, at each iteration, $Z_{A_i}$ and $Z_{B_i}$ are matrices with full column rank, upon existence. By Assumption 3.1, $A_1$ and $B_1$ have full column rank. By Lemma A.1, one can then derive that $Z_{A_1}$ and $Z_{B_1}$ have full column rank. Now, suppose that the matrices $Z_{A_1}, \ldots, Z_{A_k}$ and $Z_{B_1}, \ldots, Z_{B_k}$ have full column rank. Using Assumption 3.1, one can deduce that $A_{k+1} = A_1 Z_{A_1} \cdots Z_{A_k}$ and $B_{k+1} = B_1 Z_{A_1} \cdots Z_{A_k}$ have full column rank, since they are products of matrices with full column rank. Using Lemma A.1, one can conclude that $Z_{A_{k+1}}$ and $Z_{B_{k+1}}$ have full column rank. Consequently, we have $\sharp\mathrm{rows}(Z_{A_i}) \ge \sharp\mathrm{cols}(Z_{A_i})$. Hence, Step 9 of the SSD algorithm implies that the algorithm can only move to the next iteration if $\sharp\mathrm{rows}(Z_{A_i}) > \sharp\mathrm{cols}(Z_{A_i})$, which means the number of columns in $A_{i+1}$ and $B_{i+1}$ decreases with respect to $A_i$ and $B_i$. Hence, the algorithm terminates after at most $N_d$ iterations, since $A_1$ and $B_1$ have $N_d$ columns.

(b) The case $C_{\mathrm{SSD}} = 0$ is trivial. Suppose that the algorithm stops after $k$ iterations with nonzero $C_{\mathrm{SSD}}$. This means that $Z_{A_k}$ and $Z_{B_k}$ are square full rank matrices. Also, by definition we have $A_k Z_{A_k} = -B_k Z_{B_k}$, which means that $A_k = -B_k Z_{B_k} (Z_{A_k})^{-1}$. Noting that $Z_{B_k}(Z_{A_k})^{-1}$ is a full rank square matrix, one can derive $\mathcal{R}(A_k) = \mathcal{R}(B_k)$. A closer look at the definitions shows that $A_k = D(X)C_{\mathrm{SSD}}$ and $B_k = D(Y)C_{\mathrm{SSD}}$. Hence, $\mathcal{R}(D(X)C_{\mathrm{SSD}}) = \mathcal{R}(D(Y)C_{\mathrm{SSD}})$. Moreover, $C_{\mathrm{SSD}} = Z_{A_1} \cdots Z_{A_{k-1}}$, and since $Z_{A_1}, \ldots, Z_{A_{k-1}}$ have full column rank, one can deduce that $C_{\mathrm{SSD}}$ has full column rank.

(c) Suppose that the matrix $E$ satisfies $\mathcal{R}(D(X)E) = \mathcal{R}(D(Y)E)$. First, we use induction to prove that $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_i) \cap \mathcal{R}(B_i)$ for each iteration $i$ that the algorithm goes through. For $i = 1$, $A_1 = D(X)$ and $B_1 = D(Y)$. Consequently, $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_1)$ and $\mathcal{R}(D(X)E) = \mathcal{R}(D(Y)E) \subseteq \mathcal{R}(B_1)$ based on the definition of $E$. Hence, $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_1) \cap \mathcal{R}(B_1)$. Now, suppose

$$\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_i) \cap \mathcal{R}(B_i). \tag{19}$$

Using Lemma A.1, one can derive $\mathcal{R}(A_i Z_{A_i}) = \mathcal{R}(A_i) \cap \mathcal{R}(B_i)$. Combining this with (19), we get

$$\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_i Z_{A_i}) = \mathcal{R}(D(X) Z_{A_1} \cdots Z_{A_i}). \tag{20}$$

Using (20) with Lemma A.2, one can derive $\mathcal{R}(E) \subseteq \mathcal{R}(Z_{A_1} \cdots Z_{A_i})$. Using Lemma A.2 once again, we get

$$\mathcal{R}(D(X)E) = \mathcal{R}(D(Y)E) \subseteq \mathcal{R}(D(Y) Z_{A_1} \cdots Z_{A_i}). \tag{21}$$

The definitions of $A_{i+1}$ and $B_{i+1}$, along with (20) and (21), lead to the conclusion that $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_{i+1}) \cap \mathcal{R}(B_{i+1})$, and the induction is complete. Now, suppose that the algorithm terminates at iteration $k$. In the case that $C_{\mathrm{SSD}} = 0$, we have $\mathcal{R}(A_k) \cap \mathcal{R}(B_k) = \{0\}$, which means that $E = 0$ and $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(D(X)C_{\mathrm{SSD}})$.
In the case that $C_{\mathrm{SSD}} \neq 0$, using the facts that $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(A_k) \cap \mathcal{R}(B_k)$, $C_{\mathrm{SSD}} = Z_{A_1} \cdots Z_{A_{k-1}}$, and $\mathcal{R}(D(X)C_{\mathrm{SSD}}) = \mathcal{R}(D(Y)C_{\mathrm{SSD}})$, one can deduce that $\mathcal{R}(D(X)E) \subseteq \mathcal{R}(D(X)C_{\mathrm{SSD}})$. Moreover, using Assumption 3.1 and Lemma A.2, one can write $\mathcal{R}(E) \subseteq \mathcal{R}(C_{\mathrm{SSD}})$.

(d) For convenience, let $E_{\mathrm{SSD}} = \mathrm{SSD}(D(Y), D(X))$. Based on the definitions of $C_{\mathrm{SSD}}$ and $E_{\mathrm{SSD}}$, one can write

$$\mathcal{R}(D(X)C_{\mathrm{SSD}}) = \mathcal{R}(D(Y)C_{\mathrm{SSD}}), \quad \mathcal{R}(D(X)E_{\mathrm{SSD}}) = \mathcal{R}(D(Y)E_{\mathrm{SSD}}).$$

These equations, in conjunction with the maximality of $\mathcal{R}(C_{\mathrm{SSD}})$ from part (c), imply $\mathcal{R}(E_{\mathrm{SSD}}) \subseteq \mathcal{R}(C_{\mathrm{SSD}})$. Using a similar argument, invoking the maximality of $\mathcal{R}(E_{\mathrm{SSD}})$, we have $\mathcal{R}(C_{\mathrm{SSD}}) \subseteq \mathcal{R}(E_{\mathrm{SSD}})$, concluding the proof.

Remark 5.2 (Time and Space Complexity of the SSD Algorithm): Given $N$ data snapshots and a dictionary with $N_d$ elements, where usually $N \gg N_d$, and assuming that operations on scalar elements require time and memory of order $O(1)$, the most time- and memory-consuming operation in the SSD algorithm is Step 4. This step can be carried out by a truncated Singular Value Decomposition (SVD) and finding the orthogonal complement of the span of the right singular vectors, with time complexity $O(N N_d^2)$ and memory complexity $O(N N_d)$, see e.g. [48]. Since, based on Theorem 5.1(a), the SSD algorithm terminates in at most $N_d$ iterations, the total time complexity is $O(N N_d^3)$. However, since at each iteration we can reuse the memory for Step 4, the space complexity of SSD is $O(N N_d)$.

Note that SSD removes the functions that do not evolve linearly in time according to the available data snapshots. Therefore, as we gather more data, the identified subspace either remains the same or gets smaller, as stated next.

Lemma 5.3 (Monotonicity of SSD Output with Respect to Data Addition): Let $D(X), D(Y)$ and $D(\hat{X}), D(\hat{Y})$ be two pairs of data snapshot matrices such that

$$\mathrm{rows}([D(X), D(Y)]) \subseteq \mathrm{rows}([D(\hat{X}), D(\hat{Y})]), \tag{22}$$

and for which Assumption 3.1 holds. Then $\mathcal{R}(\mathrm{SSD}(D(\hat{X}), D(\hat{Y}))) \subseteq \mathcal{R}(\mathrm{SSD}(D(X), D(Y)))$.

Proof: We use the shorthand notation $\hat{C} = \mathrm{SSD}(D(\hat{X}), D(\hat{Y}))$ and $C = \mathrm{SSD}(D(X), D(Y))$. From (22), we deduce that there exists a matrix $E$ with $\mathrm{rows}(E) \subseteq \mathrm{rows}(I_{\sharp\mathrm{rows}(\hat{X})})$ such that

$$E D(\hat{X}) = D(X), \quad E D(\hat{Y}) = D(Y). \tag{23}$$

Moreover, based on the definition of $\hat{C}$ and Theorem 5.1(b), we have $\mathcal{R}(D(\hat{X})\hat{C}) = \mathcal{R}(D(\hat{Y})\hat{C})$. Hence, there exists a full rank square matrix $\hat{K}$ such that $D(\hat{Y})\hat{C} = D(\hat{X})\hat{C}\hat{K}$. Multiplying both sides from the left by $E$ and using (23) gives $D(Y)\hat{C} = D(X)\hat{C}\hat{K}$. Consequently, we have $\mathcal{R}(D(Y)\hat{C}) = \mathcal{R}(D(X)\hat{C})$. Now, the maximality of $C$ (Theorem 5.1(c)) implies $\mathcal{R}(\hat{C}) \subseteq \mathcal{R}(C)$.

Remark 5.4 (Implementing SSD on Finite-Precision Machines): Since SSD is iterative, its implementation in finite precision leads to small errors that can affect the rank and null space of $[A_i, B_i]$ in Step 4. To circumvent this issue, one can approximate $[A_i, B_i]$ at each iteration by a close (in the Frobenius norm) low-rank matrix. Let $\sigma_1 \ge \cdots \ge \sigma_{l_i}$ be the singular values of $[A_i, B_i] \in \mathbb{R}^{N \times l_i}$. Given a design parameter $\epsilon > 0$, let $k_i$ be the minimum integer such that

$$\sum_{j=k_i}^{l_i} \sigma_j^2 \le \epsilon \sum_{j=1}^{l_i} \sigma_j^2. \tag{24}$$

One can then construct the matrix $[\hat{A}_i, \hat{B}_i]$ by setting $\sigma_{k_i} = \cdots = \sigma_{l_i} = 0$ in the singular value decomposition of $[A_i, B_i]$. The resulting matrix has lower rank and

$$\|[A_i, B_i] - [\hat{A}_i, \hat{B}_i]\|_F^2 \le \epsilon \, \|[A_i, B_i]\|_F^2. \tag{25}$$

Hence, $\epsilon$ tunes the accuracy of the approximation. It is important to note that similar error bounds can be found for other unitarily invariant norms, see e.g. [49].
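The truncation rule (24) of Remark 5.4 is straightforward to implement. A sketch under the same NumPy assumptions, with eps playing the role of the design parameter $\epsilon$:

```python
import numpy as np

def truncate_rank(M, eps):
    """Low-rank approximation per (24)-(25): zero the trailing singular
    values whose total squared energy is at most eps * ||M||_F^2."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    total = np.sum(s**2)
    tail = np.cumsum((s**2)[::-1])[::-1]   # tail[k] = sum_{j >= k} s_j^2
    below = tail <= eps * total
    k = int(np.argmax(below)) if below.any() else len(s)
    s_hat = s.copy()
    s_hat[k:] = 0.0                        # sigma_k = ... = sigma_l = 0
    return U @ np.diag(s_hat) @ Vh         # satisfies the bound (25)
```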
B. Identification of Linear Evolutions and Koopman Eigenfunctions with the SSD Algorithm

Here we study the properties of the output of the SSD algorithm concerning the identification of the maximal Koopman-invariant subspace and the Koopman eigenfunctions. If $C_{\mathrm{SSD}} \neq 0$, we define the invariant dictionary as

$$\tilde{D}(x) := D(x) C_{\mathrm{SSD}}. \tag{26}$$

To find the action of the Koopman operator on the subspace spanned by $\tilde{D}$, we apply EDMD on $\tilde{D}(X)$ and $\tilde{D}(Y)$:

$$K_{\mathrm{SSD}} = \mathrm{EDMD}(\tilde{D}, X, Y) = \tilde{D}(X)^\dagger \tilde{D}(Y) = (D(X)C_{\mathrm{SSD}})^\dagger D(Y)C_{\mathrm{SSD}}. \tag{27}$$

Based on Theorem 5.1(b), we have

$$\mathcal{R}(\tilde{D}(X)) = \mathcal{R}(\tilde{D}(Y)). \tag{28}$$

Moreover, $\tilde{D}(X)$ and $\tilde{D}(Y)$ have full column rank as a result of Assumption 3.1 and Theorem 5.1(b). Consequently, $K_{\mathrm{SSD}}$ is the (unique) nonsingular matrix satisfying

$$\tilde{D}(Y) = \tilde{D}(X) K_{\mathrm{SSD}}. \tag{29}$$

Interestingly, equation (29) implies that the residual error of EDMD, $\|\tilde{D}(Y) - \tilde{D}(X)K_{\mathrm{SSD}}\|_F$, is equal to zero. Based on (28), one can find $K_{\mathrm{SSD}}$ more efficiently, using only partial data instead of calculating the pseudoinverse of $D(X)C_{\mathrm{SSD}}$. Formally, consider full column rank data matrices $D(\hat{X}), D(\hat{Y})$ such that $\mathrm{rows}([D(\hat{X}), D(\hat{Y})]) \subseteq \mathrm{rows}([D(X), D(Y)])$. Then $K_{\mathrm{SSD}} = \mathrm{EDMD}(\tilde{D}, \hat{X}, \hat{Y})$.

Next, we show that the eigenvectors of $K_{\mathrm{SSD}}$ fully characterize the functions that evolve linearly in time according to the available data.

Theorem 5.5 (Identification of Linear Evolutions using the SSD Algorithm): Suppose that Assumption 3.1 holds. Let $C_{\mathrm{SSD}} = \mathrm{SSD}(D(X), D(Y)) \neq 0$, $K_{\mathrm{SSD}} = (D(X)C_{\mathrm{SSD}})^\dagger D(Y)C_{\mathrm{SSD}}$, and let $f(x) \in \mathrm{span}(D(x))$ be written as $f(x) = D(x)v$ with $v \in \mathbb{C}^{N_d} \setminus \{0\}$. Then $f(y_i) = \lambda f(x_i)$ for some $\lambda \in \mathbb{C} \setminus \{0\}$ and for all $i \in \{1, \ldots, \sharp\mathrm{rows}(X)\}$ if and only if $v = C_{\mathrm{SSD}} w$ with $K_{\mathrm{SSD}} w = \lambda w$.

Proof: $(\Leftarrow)$: Based on the definition of $K_{\mathrm{SSD}}$, Assumption 3.1, and the fact that $C_{\mathrm{SSD}}$ has full column rank (Theorem 5.1(b)), one can use (26)–(29) and the fact that $K_{\mathrm{SSD}} w = \lambda w$ to write $D(Y)C_{\mathrm{SSD}} w = \lambda D(X)C_{\mathrm{SSD}} w$. Consequently, using $v = C_{\mathrm{SSD}} w$, we have $D(Y)v = \lambda D(X)v$. Reading this equation row by row, one deduces that $f(y_i) = \lambda f(x_i)$ for all $i \in \{1, \ldots, \sharp\mathrm{rows}(X)\}$, as claimed.

$(\Rightarrow)$: Based on the hypotheses, we have

$$D(Y)v = \lambda D(X)v. \tag{30}$$

Consider first the case $v \in \mathbb{R}^{N_d}$. Then, using (30), we deduce $\mathcal{R}(D(X)v) = \mathcal{R}(D(Y)v)$. The maximality of $C_{\mathrm{SSD}}$ (Theorem 5.1(c)) implies that $\mathcal{R}(v) \subseteq \mathcal{R}(C_{\mathrm{SSD}})$ and, consequently, $v = C_{\mathrm{SSD}} w$ for some $w$. Replacing $v$ by $C_{\mathrm{SSD}} w$ in (30) and using the definition of $K_{\mathrm{SSD}}$, one deduces $K_{\mathrm{SSD}} w = \lambda w$. Now, suppose that $v = v_R + j v_I$ with $v_I \neq 0$. Since $D(X)$ and $D(Y)$ are real matrices, one can use (30) and write $D(Y)\bar{v} = \bar{\lambda} D(X)\bar{v}$. This, together with (30), implies

$$D(Y)E = D(X)E\Lambda, \tag{31}$$

where $E = [v_R, v_I]$ and

$$\Lambda = \begin{bmatrix} \mathrm{Re}(\lambda) & \mathrm{Im}(\lambda) \\ -\mathrm{Im}(\lambda) & \mathrm{Re}(\lambda) \end{bmatrix}.$$

Since $\Lambda$ is full rank, we have $\mathcal{R}(D(X)E) = \mathcal{R}(D(Y)E)$ and, using Theorem 5.1(c), one can conclude $\mathcal{R}(E) \subseteq \mathcal{R}(C_{\mathrm{SSD}})$. Consequently, there exists a real matrix $z$ such that $E = C_{\mathrm{SSD}} z$. By replacing this in (31), multiplying both sides from the right by $r = [1, j]^T$, and defining $w = z r$, one can conclude that $v = E r = C_{\mathrm{SSD}} w$ and $D(Y)C_{\mathrm{SSD}} w = \lambda D(X)C_{\mathrm{SSD}} w$. This, in conjunction with the definition of $K_{\mathrm{SSD}}$, implies that $K_{\mathrm{SSD}} w = \lambda w$, concluding the proof.
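In code, Theorem 5.5 amounts to one more least-squares solve on the pruned dictionary followed by an eigendecomposition. A sketch reusing the hypothetical edmd and ssd helpers defined in the earlier snippets:

```python
# Reuses edmd(...) and ssd(...) from the earlier sketches.
C_ssd = ssd(DX, DY)
if C_ssd.shape[1] > 0:
    # K_SSD = (D(X) C_SSD)^dagger (D(Y) C_SSD), cf. (27)
    K_ssd = edmd(DX @ C_ssd, DY @ C_ssd)
    lams, W = np.linalg.eig(K_ssd)
    # Each eigenpair (lam, w) yields f(x) = D(x) C_SSD w, which evolves
    # linearly on the data (Theorem 5.5); v = C_SSD w are its coefficients.
    coeffs = [C_ssd @ w for w in W.T]
```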
Using Theorem 5.5, one can identify all the linear evolutions in the span of the original dictionary, thereby establishing an equivalence with the forward-backward EDMD characterization of Section IV.

Corollary 5.6 (Equivalence of Forward-Backward EDMD and SSD in the Identification of Linear Evolutions): Suppose that Assumption 3.1 holds. Let $K_f = \mathrm{EDMD}(D, X, Y)$, $K_b = \mathrm{EDMD}(D, Y, X)$, $C_{\mathrm{SSD}} = \mathrm{SSD}(D(X), D(Y)) \neq 0$, and $K_{\mathrm{SSD}} = (D(X)C_{\mathrm{SSD}})^\dagger D(Y)C_{\mathrm{SSD}}$. Then $K_f v = \lambda v$ and $K_b v = \lambda^{-1} v$ for some $v \in \mathbb{C}^{N_d} \setminus \{0\}$ and $\lambda \in \mathbb{C} \setminus \{0\}$ if and only if there exists a vector $w$ such that $v = C_{\mathrm{SSD}} w$ and $K_{\mathrm{SSD}} w = \lambda w$.

The proof of this result is a consequence of Theorems 4.3 and 5.5. Note that the linear evolutions identified by SSD might not be Koopman eigenfunctions, since we can only guarantee that they evolve linearly according to the available data snapshots, not starting everywhere in the state space $\mathcal{M}$. The following result uses the equivalence between SSD and the forward-backward EDMD method to provide a guarantee for the identification of Koopman eigenfunctions.

Theorem 5.7 (Identification of Koopman Eigenfunctions by the SSD Algorithm): Given an infinite sampling, suppose that the sequence of dictionary snapshot matrices is $R$-rich. For $N \ge R$, let $C_{\mathrm{SSD}}^N = \mathrm{SSD}(D(X_{1:N}), D(Y_{1:N})) \neq 0$ and $K_{\mathrm{SSD}}^N = (D(X_{1:N})C_{\mathrm{SSD}}^N)^\dagger D(Y_{1:N})C_{\mathrm{SSD}}^N$. Given $v \in \mathbb{C}^{N_d} \setminus \{0\}$ and $\lambda \in \mathbb{C} \setminus \{0\}$, let $f(x) = D(x)v$. Then,

(a) If $f$ is an eigenfunction of the Koopman operator with eigenvalue $\lambda$, then for every $N \ge R$ there exists $w_N$ such that $v = C_{\mathrm{SSD}}^N w_N$ and $K_{\mathrm{SSD}}^N w_N = \lambda w_N$;
(b) Conversely, and assuming the dictionary functions are continuous and Assumption 4.4 holds, if for every $N \ge R$ we have $v \in \mathcal{R}(C_{\mathrm{SSD}}^N)$ and there exists $w_N$ such that $v = C_{\mathrm{SSD}}^N w_N$ and $K_{\mathrm{SSD}}^N w_N = \lambda w_N$, then $f$ is an eigenfunction of the Koopman operator with probability 1.

This result is a consequence of Theorem 4.6 and Corollary 5.6. Theorem 5.7 shows that the SSD algorithm finds all the eigenfunctions in the span of the original dictionary almost surely. The identified eigenfunctions span a Koopman-invariant subspace. This subspace, however, is not necessarily the maximal Koopman-invariant subspace in the span of the original dictionary. Next, we show that the SSD method actually identifies the maximal Koopman-invariant subspace in the span of the dictionary.

Theorem 5.8 (SSD Finds the Maximal Koopman-Invariant Subspace as $N \to \infty$): Given an infinite sampling and a dictionary composed of continuous functions, suppose that the sequence of dictionary snapshot matrices is $R$-rich and Assumption 4.4 holds. Let the columns of $C_{\mathrm{SSD}}^\infty$ form a basis for $\lim_{N\to\infty} \mathcal{R}(C_{\mathrm{SSD}}^N)$, i.e.,

$$\mathcal{R}(C_{\mathrm{SSD}}^\infty) = \lim_{N\to\infty} \mathcal{R}(C_{\mathrm{SSD}}^N) = \bigcap_{N=R}^\infty \mathcal{R}(C_{\mathrm{SSD}}^N) \tag{32}$$

(note that the sequence $\{\mathcal{R}(C_{\mathrm{SSD}}^N)\}_{N=R}^\infty$ is monotonic and hence convergent). Then $\mathrm{span}(D(x)C_{\mathrm{SSD}}^\infty)$ is the maximal Koopman-invariant subspace in the span of the dictionary $D$ with probability 1.

Proof: If $C_{\mathrm{SSD}}^\infty = 0$, considering the fact that $\mathcal{R}(C_{\mathrm{SSD}}^{N+1}) \subseteq \mathcal{R}(C_{\mathrm{SSD}}^N)$ for all $N \ge R$ (Lemma 5.3), one deduces that there exists $m \in \mathbb{N}$ such that $C_{\mathrm{SSD}}^i = 0$ for all $i \ge m$.
Hence, based on Theorem 5.1(c), the maximal Koopman-invariant subspace acquired from the data is $\{0\}$. Noting that the subspace identified by SSD contains the maximal Koopman-invariant subspace, we deduce that the latter is the zero subspace, which is indeed spanned by $D(x)C_{\mathrm{SSD}}^\infty$.

Now, suppose that $C_{\mathrm{SSD}}^\infty \neq 0$ and has full column rank. First, we show that

$$\mathcal{R}(D(X_{1:N})C_{\mathrm{SSD}}^\infty) = \mathcal{R}(D(Y_{1:N})C_{\mathrm{SSD}}^\infty), \quad \forall N \ge R. \tag{33}$$

Considering (32) and the fact that $\mathcal{R}(C_{\mathrm{SSD}}^{N+1}) \subseteq \mathcal{R}(C_{\mathrm{SSD}}^N)$ for all $N \ge R$, we can write, for all $N \ge R$,

$$\mathcal{R}(C_{\mathrm{SSD}}^\infty) = \bigcap_{i=N}^\infty \mathcal{R}(C_{\mathrm{SSD}}^i).$$

Invoking Lemma A.4, we have, for all $N \ge R$,

$$\mathcal{R}(D(X_{1:N})C_{\mathrm{SSD}}^\infty) = \bigcap_{i=N}^\infty \mathcal{R}(D(X_{1:N})C_{\mathrm{SSD}}^i), \tag{34a}$$
$$\mathcal{R}(D(Y_{1:N})C_{\mathrm{SSD}}^\infty) = \bigcap_{i=N}^\infty \mathcal{R}(D(Y_{1:N})C_{\mathrm{SSD}}^i). \tag{34b}$$

Moreover, for all $i \ge N$ we have $\mathcal{R}(D(X_{1:i})C_{\mathrm{SSD}}^i) = \mathcal{R}(D(Y_{1:i})C_{\mathrm{SSD}}^i)$ and hence, reading this equality row by row, one can write

$$\mathcal{R}(D(X_{1:N})C_{\mathrm{SSD}}^i) = \mathcal{R}(D(Y_{1:N})C_{\mathrm{SSD}}^i), \quad \forall i \ge N. \tag{35}$$

The combination of (34) and (35) yields (33). Based on the latter, the fact that $D(X_{1:N})$ and $D(Y_{1:N})$ have full column rank for every $N \ge R$, and the fact that $C_{\mathrm{SSD}}^\infty$ has full column rank, there exists a unique nonsingular square matrix $K_{\mathrm{SSD}}^\infty \in \mathbb{R}^{\sharp\mathrm{cols}(C_{\mathrm{SSD}}^\infty) \times \sharp\mathrm{cols}(C_{\mathrm{SSD}}^\infty)}$ such that

$$D(X_{1:N}) C_{\mathrm{SSD}}^\infty K_{\mathrm{SSD}}^\infty = D(Y_{1:N}) C_{\mathrm{SSD}}^\infty, \quad \forall N \ge R. \tag{36}$$

Note that $K_{\mathrm{SSD}}^\infty$ does not depend on $N$. Next, we aim to prove that for every function $f \in \mathrm{span}(D(x)C_{\mathrm{SSD}}^\infty)$, $\mathcal{K}(f)$ is also in $\mathrm{span}(D(x)C_{\mathrm{SSD}}^\infty)$ almost surely. Let $v \in \mathbb{R}^{\sharp\mathrm{cols}(C_{\mathrm{SSD}}^\infty)}$ be such that $f(x) = D(x)C_{\mathrm{SSD}}^\infty v$ and define

$$g(x) := D(x) C_{\mathrm{SSD}}^\infty K_{\mathrm{SSD}}^\infty v. \tag{37}$$

We show that $g = f \circ T = \mathcal{K}(f)$ almost surely. Define the function $h := g - f \circ T$. Also, let $S_\infty = \bigcup_{N=R}^\infty S_N$ be the set of initial conditions. Based on (36), (37), and the definition of $h$, we have $h(x) = 0$ for all $x \in S_\infty$. Moreover, $h$ is continuous since $D$ and $T$ are continuous. This, together with the fact that $S_\infty$ is dense in $\mathcal{M}$ almost surely (Assumption 4.4), implies $h \equiv 0$ on $\mathcal{M}$ almost surely. Therefore, $g = \mathcal{K}(f) = f \circ T$ with probability 1. Noting that $g(x) \in \mathrm{span}(D(x)C_{\mathrm{SSD}}^\infty)$, we have proven that $\mathrm{span}(D(x)C_{\mathrm{SSD}}^\infty)$ is Koopman-invariant almost surely.

Finally, we prove the maximality of $\mathrm{span}(D(x)C_{\mathrm{SSD}}^\infty)$. Let $\mathcal{L}$ be a Koopman-invariant subspace in $\mathrm{span}(D(x))$. Then there exists a full column rank matrix $E$ such that $\mathcal{L} = \mathrm{span}(D(x)E)$. Moreover, since the invariance of $\mathcal{L}$ is reflected in the data, $\mathcal{R}(D(X_{1:N})E) = \mathcal{R}(D(Y_{1:N})E)$ for all $N \ge R$. As a result, based on Theorem 5.1(c), we have $\mathcal{R}(E) \subseteq \mathcal{R}(C_{\mathrm{SSD}}^N)$ for all $N \ge R$, and hence $\mathcal{R}(E) \subseteq \mathcal{R}(C_{\mathrm{SSD}}^\infty)$. Therefore, by Lemma A.2, we have $\mathcal{L} = \mathcal{R}(D(x)E) \subseteq \mathcal{R}(D(x)C_{\mathrm{SSD}}^\infty)$, which completes the proof.

Remark 5.9 (Generalized Koopman Eigenfunctions): One can also extend the above discussion to generalized Koopman eigenfunctions (see e.g. [6, Remark 11]). Given a generalized eigenvector $w$ of $K_{\mathrm{SSD}}$, the corresponding generalized Koopman eigenfunction is $\phi(x) = D(x)C_{\mathrm{SSD}} w$.

VI. STREAMING SYMMETRIC SUBSPACE DECOMPOSITION

In this section, we consider the setup where data becomes available in a streaming fashion. A straightforward algorithmic solution for this scenario would be to re-run, at each timestep, the SSD algorithm with all the data available up to then.
However, this approach does not take advantage of the computations performed at previous timesteps, and may become inefficient when the size of the data is large. Instead, here we pursue the design of an online algorithm, termed Streaming Symmetric Subspace Decomposition (SSSD), cf. Algorithm 2, that updates the identified subspaces using the previously computed ones. Note that the SSSD algorithm is useful not only for streaming datasets but also for non-streaming large datasets for which the execution of SSD requires a significant amount of memory.

Given the signature snapshot matrices $X_{1:S}$ and $Y_{1:S}$, for some $S \in \mathbb{N}$, and a dictionary of functions $D$, the SSSD algorithm proceeds as follows: at each iteration, the algorithm receives a new pair of data snapshots, combines them with the signature data matrices, and applies the latest available dictionary to them. Then, it uses SSD on those dictionary matrices and further prunes the dictionary. The basic idea of the SSSD algorithm stems from the monotonicity of SSD's output dictionary with respect to the data (cf. Lemma 5.3), i.e., adding more data does not increase the dimension of the dictionary.

Algorithm 2 Streaming Symmetric Subspace Decomposition
1: Initialization
2: $D_X^S(1) \leftarrow \begin{bmatrix} D(X_{1:S}) \\ D(x_{S+1}) \end{bmatrix}$, $D_Y^S(1) \leftarrow \begin{bmatrix} D(Y_{1:S}) \\ D(y_{S+1}) \end{bmatrix}$
3: $i \leftarrow 1$, $A_1 \leftarrow D_X^S(1)$, $B_1 \leftarrow D_Y^S(1)$, $C_0 \leftarrow I_{N_d}$
4: while 1 do
5:   if $C_{i-1} = 0$ then
6:     $C_i \leftarrow 0$  ▷ The basis does not exist
7:     return $C_i$
8:     break
9:   end if
10:  $F_i \leftarrow \mathrm{SSD}(A_i, B_i)$
11:  if $F_i = 0$ then
12:    $C_i \leftarrow 0$  ▷ The basis does not exist
13:    return $C_i$
14:    break
15:  end if
16:  if $\sharp\mathrm{rows}(F_i) > \sharp\mathrm{cols}(F_i)$ then
17:    $C_i \leftarrow \mathrm{basis}(\mathcal{R}(C_{i-1} F_i))$  ▷ Subspace reduction
18:  else
19:    $C_i \leftarrow C_{i-1}$  ▷ No change
20:  end if
21:  return $C_i$
22:  $i \leftarrow i + 1$
   ▷ Replacing the last data snapshot with the new one
23:  $D_X^S(i) \leftarrow \begin{bmatrix} D(X_{1:S}) \\ D(x_{S+i}) \end{bmatrix}$, $D_Y^S(i) \leftarrow \begin{bmatrix} D(Y_{1:S}) \\ D(y_{S+i}) \end{bmatrix}$
   ▷ Calculating the reduced dictionary snapshots
24:  $A_i \leftarrow D_X^S(i) C_{i-1}$, $B_i \leftarrow D_Y^S(i) C_{i-1}$
25: end while
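A literal NumPy transcription of Algorithm 2 might read as follows, reusing the hypothetical ssd helper sketched earlier. This is our sketch: realizing the basis(·) function via a reduced QR factorization, and modeling the stream as an iterable of snapshot pairs (x, y), are interface assumptions on our part.

```python
import numpy as np

def sssd(D, X_sig, Y_sig, stream, rtol=1e-10):
    """Streaming SSD (Algorithm 2). D maps a data matrix to its dictionary
    matrix; (X_sig, Y_sig) are the signature snapshots X_{1:S}, Y_{1:S};
    stream yields one snapshot pair (x, y) at a time."""
    DX_sig, DY_sig = D(X_sig), D(Y_sig)
    Nd = DX_sig.shape[1]
    C = np.eye(Nd)
    for x, y in stream:
        if C.shape[1] == 0:
            return C                              # Step 6: basis does not exist
        DXS = np.vstack([DX_sig, D(x[None, :])])  # [D(X_{1:S}); D(x_{S+i})]
        DYS = np.vstack([DY_sig, D(y[None, :])])
        F = ssd(DXS @ C, DYS @ C, rtol)           # Step 10
        if F.shape[1] == 0:
            return np.zeros((Nd, 0))              # Step 12: basis does not exist
        if F.shape[0] > F.shape[1]:
            C, _ = np.linalg.qr(C @ F)            # Step 17: basis(R(C F))
        # else Step 19: C unchanged
    return C                                      # current output C_i
```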
Since the SSD algorithm relies on Assumption 3.1, we make the following assumption on the signature snapshots and the original dictionary.

Assumption 6.1 (Full Rank Signature Dictionary Matrices): We assume that there exists $S \in \mathbb{N}$ such that the matrices $D(X_{1:S})$ and $D(Y_{1:S})$ have full column rank.

For a finite number of data snapshots, Assumption 6.1 is equivalent to Assumption 3.1. For an infinite sampling, Assumption 6.1 holds for an $R$-rich sequence of snapshot matrices. The next result establishes the basic properties of the SSSD output at each iteration.

Proposition 6.2 (Properties of SSSD Output): Suppose Assumption 6.1 holds. For $i \in \mathbb{N}$, let $C_i$ denote the output of the SSSD algorithm at the $i$th iteration. Then, for all $i \in \mathbb{N}$,

(a) $C_i$ has full column rank or is equal to zero;
(b) $\mathcal{R}(C_i) \subseteq \mathcal{R}(C_{i-1})$;
(c) $\mathcal{R}(D_X^S(i) C_i) = \mathcal{R}(D_Y^S(i) C_i)$.

Proof: (a) We prove the claim by induction. $C_0 = I_{N_d}$ has full column rank. Now, suppose that $C_k$ has full column rank or is zero; we show the same for $C_{k+1}$. If $C_k = 0$, then SSSD executes Step 6 and we have $C_{k+1} = 0$. Now, suppose that $C_k$ has full column rank. Considering the fact that $D_X^S(k+1)$ and $D_Y^S(k+1)$ have full column rank, one can deduce that $A_{k+1}$ and $B_{k+1}$ have full column rank. Consequently, based on Theorem 5.1(b), $F_{k+1}$ has full column rank or is equal to zero. In the former case, the algorithm executes Step 17 or Step 19 and, based on the definition of the basis function and the fact that $C_k$ has full column rank, one deduces that $C_{k+1}$ has full column rank. In the latter case, the algorithm executes Step 12 and $C_{k+1} = 0$, as claimed.

(b) Note that at iteration $i$, $C_i$ is determined by either Step 6, 12, 19, or 17. The proof for the first three cases is trivial; we only need to prove the result when the SSSD algorithm executes Step 17. Based on Theorem 5.1(b), one can deduce that $F_i$ has full column rank. Also, we have $\mathcal{R}(F_i) \subseteq \mathcal{R}(I_{\sharp\mathrm{cols}(C_{i-1})})$. Hence, using Step 17 and Lemma A.2, one can write $\mathcal{R}(C_i) = \mathcal{R}(C_{i-1}F_i) \subseteq \mathcal{R}(C_{i-1}I_{\sharp\mathrm{cols}(C_{i-1})}) = \mathcal{R}(C_{i-1})$, as claimed.

(c) If the SSSD algorithm executes Step 6 or Step 12, then the result follows directly. Now, suppose that the algorithm executes Step 17 or Step 19. Note that if the algorithm executes one of these two steps, then $F_i \neq 0$, $C_{i-1} \neq 0$, and both have full column rank (Theorem 5.1(b)). Hence, $\sharp\mathrm{rows}(F_i) \ge \sharp\mathrm{cols}(F_i)$. As a result, if the algorithm executes Step 19, we have $\sharp\mathrm{rows}(F_i) = \sharp\mathrm{cols}(F_i)$ and consequently $\mathcal{R}(F_i) = \mathcal{R}(I_{\sharp\mathrm{cols}(C_{i-1})})$. Therefore,

$$\mathcal{R}(C_i) = \mathcal{R}(C_{i-1}) = \mathcal{R}(C_{i-1} I_{\sharp\mathrm{cols}(C_{i-1})}) = \mathcal{R}(C_{i-1}F_i). \tag{38}$$

Moreover, if the SSSD algorithm executes Step 17, then, using the definition of the basis function, we have

$$\mathcal{R}(C_i) = \mathcal{R}(C_{i-1}F_i). \tag{39}$$

Also, based on the definition of $F_i$ in Step 10, Theorem 5.1(b), and the facts that $A_i = D_X^S(i)C_{i-1}$ and $B_i = D_Y^S(i)C_{i-1}$, we have $\mathcal{R}(D_X^S(i)C_{i-1}F_i) = \mathcal{R}(D_Y^S(i)C_{i-1}F_i)$. Using this together with (38) upon execution of Step 19 and (39) upon execution of Step 17, one deduces $\mathcal{R}(D_X^S(i)C_i) = \mathcal{R}(D_Y^S(i)C_i)$, concluding the proof.

Next, we show that the SSSD algorithm at each iteration identifies exactly the same subspace as the SSD algorithm given all the data up to that iteration.

Theorem 6.3 (Equivalence of SSD and SSSD): Suppose Assumption 6.1 holds. For $i \in \mathbb{N}$, let $C_i$ denote the output of the SSSD algorithm at the $i$th iteration and let $C_{\mathrm{SSD}}^i = \mathrm{SSD}(D(X_{1:S+i}), D(Y_{1:S+i}))$. Then $\mathcal{R}(C_i) = \mathcal{R}(C_{\mathrm{SSD}}^i)$ for all $i \in \mathbb{N}$.

Proof: Inclusion $\mathcal{R}(C_{\mathrm{SSD}}^i) \subseteq \mathcal{R}(C_i)$ for all $i \in \mathbb{N}$: We reason by induction. Note that in the SSSD algorithm, for $i = 1$ we have $F_1 = C_{\mathrm{SSD}}^1$. As a result, if $F_1 = 0$, then based on Step 12, $C_1 = C_{\mathrm{SSD}}^1 = 0$. If the SSSD algorithm executes Step 17, then, using the fact that $C_0 = I_{N_d}$, one can write $\mathcal{R}(C_1) = \mathcal{R}(C_{\mathrm{SSD}}^1)$. Moreover, if the SSSD algorithm executes Step 19, based on Step 16 and Theorem 5.1(b), one can deduce that $\mathcal{R}(C_{\mathrm{SSD}}^1) = \mathcal{R}(C_1) = \mathcal{R}(F_1) = \mathcal{R}(I_{N_d})$. Consequently, in all cases,

$$\mathcal{R}(C_{\mathrm{SSD}}^1) = \mathcal{R}(C_1). \tag{40}$$

Hence, $\mathcal{R}(C_{\mathrm{SSD}}^1) \subseteq \mathcal{R}(C_1)$. Now, suppose that

$$\mathcal{R}(C_{\mathrm{SSD}}^k) \subseteq \mathcal{R}(C_k). \tag{41}$$

We need to show that $\mathcal{R}(C_{\mathrm{SSD}}^{k+1}) \subseteq \mathcal{R}(C_{k+1})$. If $C_{\mathrm{SSD}}^{k+1} = 0$, the claim follows. Now assume that $C_{\mathrm{SSD}}^{k+1} \neq 0$; it has full column rank based on Theorem 5.1(b). By Lemma 5.3, we have

$$\mathcal{R}(C_{\mathrm{SSD}}^{k+1}) \subseteq \mathcal{R}(C_{\mathrm{SSD}}^k). \tag{42}$$

Using (41) and (42), one can deduce $\mathcal{R}(C_{\mathrm{SSD}}^{k+1}) \subseteq \mathcal{R}(C_k)$. Consequently, based on the fact that $C_{\mathrm{SSD}}^{k+1} \neq 0$, we have $C_k \neq 0$, and hence $C_k$ has full column rank based on Proposition 6.2(a). Moreover, there exists a full column rank matrix $E_k$ such that

$$C_{\mathrm{SSD}}^{k+1} = C_k E_k. \tag{43}$$

Two cases are possible.
In case 1, the SSSD algorithm executes Step 19; in case 2, it executes Step 12 or Step 17. For case 1, we have $C_{k+1} = C_k$. Consequently, using (43), the fact that $\mathcal{R}(E_k) \subseteq \mathcal{R}(I_{\mathrm{cols}(C_k)})$, and the fact that $C_k$ has full column rank, one can use Lemma A.2 to conclude
$$\mathcal{R}(C_{k+1}^{\rm SSD}) = \mathcal{R}(C_k E_k) \subseteq \mathcal{R}(C_k) = \mathcal{R}(C_{k+1}). \tag{44}$$
Now consider case 2. In this case, we have
$$\mathcal{R}(C_{k+1}) = \mathcal{R}(C_k F_{k+1}). \tag{45}$$
Also, from the definition of $C_{k+1}^{\rm SSD}$ and Theorem 5.1(b), one can write
$$\mathcal{R}\big(D(X_{1:S+k+1})\, C_{k+1}^{\rm SSD}\big) = \mathcal{R}\big(D(Y_{1:S+k+1})\, C_{k+1}^{\rm SSD}\big).$$
Looking at this equation row-wise and noting that $\mathrm{rows}\big([D_X^S(k+1), D_Y^S(k+1)]\big) \subseteq \mathrm{rows}\big([D(X_{1:S+k+1}), D(Y_{1:S+k+1})]\big)$, one can write
$$\mathcal{R}\big(D_X^S(k+1)\, C_{k+1}^{\rm SSD}\big) = \mathcal{R}\big(D_Y^S(k+1)\, C_{k+1}^{\rm SSD}\big).$$
Now, using (43), we have $\mathcal{R}(D_X^S(k+1)\, C_k E_k) = \mathcal{R}(D_Y^S(k+1)\, C_k E_k)$. Also, noting the definition of $F_{k+1}$ and the facts $A_{k+1} = D_X^S(k+1)\, C_k$ and $B_{k+1} = D_Y^S(k+1)\, C_k$, one can use Theorem 5.1(c) to write $\mathcal{R}(E_k) \subseteq \mathcal{R}(F_{k+1})$. Since $C_k$ has full column rank, we use (43), (45), and Lemma A.2 to write
$$\mathcal{R}(C_{k+1}^{\rm SSD}) = \mathcal{R}(C_k E_k) \subseteq \mathcal{R}(C_k F_{k+1}) = \mathcal{R}(C_{k+1}). \tag{46}$$
In both cases, (44) and (46) conclude the induction.

Inclusion $\mathcal{R}(C_i) \subseteq \mathcal{R}(C_i^{\rm SSD})$ for all $i \in \mathbb{N}$: We also reason by induction. Using (40), we have $\mathcal{R}(C_1) \subseteq \mathcal{R}(C_1^{\rm SSD})$. Now suppose that
$$\mathcal{R}(C_k) \subseteq \mathcal{R}(C_k^{\rm SSD}). \tag{47}$$
We prove the same for $k+1$. If $C_{k+1} = 0$, the result follows directly. Now assume $C_{k+1} \neq 0$. Consequently, based on (47), Proposition 6.2(a), and Theorem 5.1(b), we deduce that $C_{k+1}$ and $C_{k+1}^{\rm SSD}$ have full column rank. The first part of the proof and (47) imply that $\mathcal{R}(C_k) = \mathcal{R}(C_k^{\rm SSD})$. Consequently, noting that $C_k^{\rm SSD}$ is the output of the SSD algorithm applied to $D(X_{1:S+k})$ and $D(Y_{1:S+k})$, one can use Theorem 5.1(b) to write
$$\mathcal{R}\big(D(X_{1:S+k})\, C_k\big) = \mathcal{R}\big(D(Y_{1:S+k})\, C_k\big). \tag{48}$$
Moreover, by Proposition 6.2(b), we have $\mathcal{R}(C_{k+1}) \subseteq \mathcal{R}(C_k)$. Hence, since $C_k$ and $C_{k+1}$ have full column rank, there exists a full column-rank matrix $G_k$ such that
$$C_{k+1} = C_k G_k. \tag{49}$$
Also, by Proposition 6.2(c) at iteration $k+1$ of the SSSD algorithm,
$$\mathcal{R}\big(D_X^S(k+1)\, C_{k+1}\big) = \mathcal{R}\big(D_Y^S(k+1)\, C_{k+1}\big). \tag{50}$$
Consequently, based on (49) and (50), we have $\mathcal{R}(D(X_{1:S})\, C_k G_k) = \mathcal{R}(D(Y_{1:S})\, C_k G_k)$. Using this equation together with (48) and Lemma A.3,
$$\mathcal{R}\big(D(X_{1:S+k})\, C_k G_k\big) = \mathcal{R}\big(D(Y_{1:S+k})\, C_k G_k\big).$$
Moreover, using (49), one can write $\mathcal{R}(D(X_{1:S+k})\, C_{k+1}) = \mathcal{R}(D(Y_{1:S+k})\, C_{k+1})$. Hence, there exists a nonsingular square matrix $K^*$ such that
$$D(X_{1:S+k})\, C_{k+1} K^* = D(Y_{1:S+k})\, C_{k+1}. \tag{51}$$
Also, based on (50) and noting that $D_X^S(k+1)$, $D_Y^S(k+1)$, and $C_{k+1}$ have full column rank, there exists a nonsingular square matrix $K$ such that
$$D_X^S(k+1)\, C_{k+1} K = D_Y^S(k+1)\, C_{k+1}. \tag{52}$$
Using the first $S$ rows of (51) and (52), one can write $D(X_{1:S})\, C_{k+1} K^* = D(Y_{1:S})\, C_{k+1}$ and $D(X_{1:S})\, C_{k+1} K = D(Y_{1:S})\, C_{k+1}$. Subtracting the second equation from the first yields $D(X_{1:S})\, C_{k+1}(K^* - K) = 0$. Moreover, since $D(X_{1:S})\, C_{k+1}$ has full column rank, we deduce $K^* = K$. Using this together with (51) and (52) yields
$$\mathcal{R}\big(D(X_{1:S+k+1})\, C_{k+1}\big) = \mathcal{R}\big(D(Y_{1:S+k+1})\, C_{k+1}\big). \tag{53}$$
From (53), the definition of $C_{k+1}^{\rm SSD}$, and Theorem 5.1(c), we deduce $\mathcal{R}(C_{k+1}) \subseteq \mathcal{R}(C_{k+1}^{\rm SSD})$, concluding the proof. ∎

Theorem 6.3 establishes the equivalence between the SSSD and SSD algorithms. As a consequence, all the results regarding the identification of Koopman-invariant subspaces and eigenfunctions presented in Section V are also valid for the output of the SSSD algorithm.

Remark 6.4 (Time and Space Complexity of the SSSD Algorithm): Given the first $N$ data snapshots and a dictionary with $N_d$ elements, with $N > S \ge N_d$, and assuming that operations on scalar elements require time and space of order $O(1)$, the most time- and memory-consuming operation in the SSSD algorithm is Step 10, which invokes SSD. Within this step, the most time-consuming operation is the SVD, with time complexity $O(S N_d^2)$ and space complexity $O(S N_d)$; see, e.g., [48]. After the first SVD has been performed, each ensuing one either leaves the subspace unchanged or reduces its dimension. Therefore, the SSSD algorithm performs at most $N - S$ SVDs with no subspace reduction, with overall time complexity $O(N S N_d^2)$, and at most $N_d$ SVDs with subspace reduction, with overall time complexity $O(S N_d^3)$. Since $N \ge N_d$, the complexity of the SSSD algorithm is $O(N S N_d^2)$. Moreover, in many real-world applications $S = O(N_d)$ (in fact, usually $S = N_d$), which reduces the time complexity of SSSD to $O(N N_d^3)$, the same complexity as SSD. Finally, since the space used in Step 10 can be reused at each iteration, and since the space complexity of this step is $O(S N_d)$, the space complexity of SSSD is $O(S N_d)$, which usually reduces to $O(N_d^2)$ since $S = O(N_d)$ in many real-world applications.

Remark 6.5 (SSSD is More Stable and Runs Faster than SSD): The SSSD algorithm is more numerically stable than SSD, since it always works with matrices of size at most $(S+1) \times N_d$, while SSD works with matrices of size $N \times N_d$. Moreover, even though SSD and SSSD have the same time complexity, the SSSD algorithm runs faster for two reasons. First, at each iteration of SSSD the dictionary gets smaller, which reduces the cost of computation for the remaining data snapshots. Second, the characterizations in Remarks 5.2 and 6.4 only account for the number of floating-point operations and ignore the time spent loading the data. SSSD needs to load significantly smaller data matrices, which leads to a considerable reduction in run time compared to SSD.

VII. APPROXIMATING KOOPMAN-INVARIANT SUBSPACES

We note that, if the span of the original dictionary $D$ does not contain any Koopman-invariant subspace, then the SSD algorithm returns the trivial solution, which does not provide any information about the behavior of the dynamical system. To circumvent this issue, here we propose a method to approximate Koopman-invariant subspaces. Noting that the existence of a Koopman-invariant subspace translates into rank deficiency of the concatenated matrix $[A_i, B_i]$ in Step 4 of the SSD algorithm, we propose to replace the null function in SSD with the approx-null routine presented in Algorithm 3 below. This routine constructs an approximate null space by selecting a set of small singular values. The parameter $\epsilon > 0$ in Algorithm 3 is a design choice that tunes the accuracy of the approximation.³

³ In Algorithm 3, $A$ and $B$ have equal size and both have full column rank.
Algorithm 3 approx-null($A$, $B$, $\epsilon$)
    ▷ Singular value decomposition of $[A, B]$
 1: $\{U, S, V\} \leftarrow \mathrm{svd}([A, B])$            ▷ $U S V^T = [A, B]$
 2: $m \leftarrow \mathrm{cols}(V)$                          ▷ number of columns of $V$
 3: $k_{\min} \leftarrow \min\big\{ k \;:\; \sum_{i=k}^{m} S_{i,i}^2 / \|S\|_F^2 \le \epsilon^2 \,\wedge\, k > \mathrm{cols}(A) \big\}$
    ▷ Choose the right singular vectors corresponding to small singular values as the approximate null space
 4: if $k_{\min} = \emptyset$ then
 5:   return $\emptyset$
 6:   break
 7: else
 8:   $Z \leftarrow (V^T_{k_{\min}:m})^T$
 9: end if
    ▷ Make sure Assumption 3.1 holds for the output
10: while 1 do
11:   $[Z_A^T, Z_B^T]^T \leftarrow Z$                        ▷ $\mathrm{rows}(Z_A) = \mathrm{rows}(Z_B)$
12:   if $\mathrm{rank}(Z_A) = \mathrm{rank}(Z_B) = \mathrm{cols}(Z)$ then
13:     return $Z$                                           ▷ Basis for the approximate null space
14:     break
15:   end if
      ▷ Reducing the space
16:   if $\mathrm{cols}(Z) = 1$ then
17:     return $\emptyset$
18:     break
19:   else
20:     $Z \leftarrow (Z^T_{2:\mathrm{cols}(Z)})^T$          ▷ Remove the first column
21:   end if
22: end while

The next result studies the basic properties of Algorithm 3.

Proposition 7.1 (Properties of Algorithm 3): Let $A$ and $B$ be matrices of equal size, $\epsilon > 0$, and $Z = \text{approx-null}(A, B, \epsilon)$. Then,
(a) Algorithm 3 terminates in a finite number of iterations;
(b) $Z$ is either $\emptyset$ or has full column rank;
(c) if $Z \neq \emptyset$, let $Z = [(Z_A)^T, (Z_B)^T]^T$ with $Z_A$, $Z_B$ of equal size. Then $Z_A$ and $Z_B$ have full column rank.

Proof: (a) We prove the claim by contradiction: suppose the algorithm does not terminate in a finite number of iterations. Let $Z_i$ denote the internal matrix $Z$, set at Step 8, at iteration $i$ of the while loop. Since by construction $k_{\min} > \mathrm{cols}(A)$ and $m = \mathrm{cols}(V) = \mathrm{cols}(A) + \mathrm{cols}(B)$ (cf. Step 2), we deduce
$$\mathrm{cols}(Z_1) = m - k_{\min} + 1 \le \mathrm{cols}(B). \tag{54}$$
Moreover, since we assumed the algorithm never terminates, it executes Step 20 at each iteration and, consequently, $\mathrm{cols}(Z_{i+1}) = \mathrm{cols}(Z_i) - 1$ for $i \in \mathbb{N}$. As a result, one can use (54) to write $\mathrm{cols}(Z_j) = \mathrm{cols}(Z_1) - j + 1 \le \mathrm{cols}(B) - j + 1$, which leads to $\mathrm{cols}(Z_j) < 0$ for $j > \mathrm{cols}(B) + 1$, contradicting $\mathrm{cols}(Z_j) \ge 0$.

(b) There are three ways for Algorithm 3 to terminate: Steps 13–14, Steps 5–6, or Steps 17–18. The latter two cases imply $Z = \emptyset$. In the remaining case, since the columns of $Z$ are selected from the right singular vectors of $[A, B]$, they are nonzero and mutually orthogonal. Consequently, $Z$ has full column rank.

(c) Since $Z \neq \emptyset$, the algorithm executes Steps 13–14 upon termination. Hence, the condition in Step 12 holds and, consequently, $Z_A$ and $Z_B$ have full column rank. ∎
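For concreteness, the following NumPy sketch renders the selection rule and the rank check of Algorithm 3. It assumes, per footnote 3, that $A$ and $B$ have equal size and full column rank, and additionally that $[A, B]$ has at least as many rows as columns; it is an illustrative rendering, not the authors' reference implementation.

```python
import numpy as np

def approx_null(A, B, eps):
    """Approximate null space of [A, B] (sketch of Algorithm 3).

    A, B: matrices of equal size with full column rank (footnote 3).
    Returns Z with [A, B] Z ~ 0, or None when no basis exists.
    Assumes rows([A, B]) >= cols([A, B]).
    """
    W = np.hstack([A, B])
    _, s, Vt = np.linalg.svd(W)                  # Step 1: W = U S V^T
    m = Vt.shape[0]                              # Step 2: m = cols(V)
    nA = A.shape[1]
    tail = np.cumsum((s ** 2)[::-1])[::-1]       # tail[j] = sum_{i>=j} S_ii^2
    total = (s ** 2).sum()
    # Step 3: smallest k with relative tail energy <= eps^2 and k > cols(A)
    # (1-based k > cols(A) corresponds to 0-based index >= cols(A))
    ks = [j for j in range(nA, m) if tail[j] <= (eps ** 2) * total]
    if not ks:                                   # Steps 4-6: k_min is empty
        return None
    Z = Vt[min(ks):, :].T                        # Step 8: small right sing. vecs
    # Steps 10-22: enforce Assumption 3.1 on Z_A, Z_B
    while Z.shape[1] >= 1:
        ZA, ZB = Z[:nA, :], Z[nA:, :]            # Step 11
        r = Z.shape[1]
        if np.linalg.matrix_rank(ZA) == r and np.linalg.matrix_rank(ZB) == r:
            return Z                             # Step 13
        if r == 1:
            return None                          # Step 17
        Z = Z[:, 1:]                             # Step 20: drop first column
    return None
```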
We next characterize the quality of Algorithm 3's output.

Proposition 7.2 (Quality of the Low-Rank Approximation of $[A, B]$ Constructed with the Output of Algorithm 3): Let $\epsilon > 0$, let $A$ and $B$ be full column rank matrices of equal size, and assume $Z = \text{approx-null}(A, B, \epsilon) \neq \emptyset$. Denote $W = [A, B]$ and let $W = U S V^T$ be its singular value decomposition. Let $\bar{S}$ be defined by setting in $S$ the entries $\bar{S}_{i,i} = 0$ for $i \in \{\mathrm{cols}(V) - \mathrm{cols}(Z) + 1, \ldots, \mathrm{cols}(V)\}$. Define $\bar{W} = U \bar{S} V^T$ and express it as the concatenation $\bar{W} = [\bar{A}, \bar{B}]$, where $\bar{A}$ and $\bar{B}$ have the same size. Then,
(a) $\|W - \bar{W}\|_F \le \epsilon \|W\|_F$;
(b) the columns of $Z$ form a basis for the null space of $\bar{W}$;
(c) $\bar{A} Z_A = -\bar{B} Z_B$, where $Z = [(Z_A)^T, (Z_B)^T]^T$ and $Z_A$, $Z_B$ have the same size.

Proof: (a) By construction, $Z = V_{(\mathrm{cols}(V) - \mathrm{cols}(Z) + 1):\mathrm{cols}(V)}$, i.e., the columns of $Z$ are the last $\mathrm{cols}(Z)$ columns of $V$, corresponding to the smallest singular values of $W$. Moreover, based on Step 3 of the algorithm, the fact that the singular values are ordered decreasingly in $S$, and noting that $k_{\min} \le \mathrm{cols}(V) - \mathrm{cols}(Z) + 1$, one can write
$$\sum_{i = \mathrm{cols}(V) - \mathrm{cols}(Z) + 1}^{\mathrm{cols}(V)} S_{i,i}^2 \;\le\; \epsilon^2 \sum_{i=1}^{\mathrm{cols}(V)} S_{i,i}^2 \;=\; \epsilon^2 \|W\|_F^2.$$
The proof concludes by noting that the left-hand side of this inequality equals $\|W - \bar{W}\|_F^2$.

(b) The claim follows directly from the fact that $\bar{W} = U \bar{S} V^T$ is the singular value decomposition of $\bar{W}$ and the columns of $Z$ are the right singular vectors corresponding to zero singular values of $\bar{W}$.

(c) By (b), $\bar{W} Z = 0$. Hence, $\bar{A} Z_A = -\bar{B} Z_B$. ∎

We formally define the Approximated-SSD algorithm as the modification of SSD that replaces Step 4 of Algorithm 1 by
$$[ (Z_{A_i})^T, (Z_{B_i})^T ]^T \leftarrow \text{approx-null}(A_i, B_i, \epsilon).$$
Since all other steps of Approximated-SSD are identical to SSD, we omit its formal presentation for space reasons. For convenience, we denote the output of the Approximated-SSD algorithm by $C^{\rm SSD}_{\rm aprx}$. Proposition 7.1 completely preserves the logical structure of the proof of Theorem 5.1(a) and, as a result, the Approximated-SSD algorithm terminates in at most $N_d$ iterations. Moreover, the matrix $C^{\rm SSD}_{\rm aprx}$ is zero or has full column rank, since the second part of the proof of Theorem 5.1(b) also holds for Approximated-SSD. If $C^{\rm SSD}_{\rm aprx} \neq 0$, one can define the reduced dictionary with $\tilde{N}_d = \mathrm{cols}(C^{\rm SSD}_{\rm aprx})$ elements as
$$\tilde{D}_{\rm aprx}(x) = D(x)\, C^{\rm SSD}_{\rm aprx}, \quad \forall x \in \mathcal{M}. \tag{55}$$
We propose calculating the linear prediction matrix $K^{\rm SSD}_{\rm aprx}$ by solving the following total least squares (TLS) problem (see, e.g., [50] for more information on TLS):
$$\underset{K,\, \Delta_1,\, \Delta_2}{\text{minimize}} \quad \|[\Delta_1, \Delta_2]\|_F \tag{56a}$$
$$\text{subject to} \quad \tilde{D}_{\rm aprx}(Y) + \Delta_2 = \big(\tilde{D}_{\rm aprx}(X) + \Delta_1\big) K. \tag{56b}$$
Even though TLS problems do not always have a solution, the next result shows that (56) does. We also provide its closed-form solution and a bound, in terms of the parameter $\epsilon$, on the prediction accuracy over the available data.
Theorem 7.3 (Solution and Prediction Accuracy of (56)): Let $[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)] = U S V^T$ be the singular value decomposition of $[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]$. Let $\bar{S}$ be defined by setting in $S$ the entries $\bar{S}_{i,i} = 0$ for $i \in \{\tilde{N}_d + 1, \ldots, 2\tilde{N}_d\}$. Let $U \bar{S} V^T = [\bar{A}, \bar{B}]$, with $\bar{A}$, $\bar{B}$ of the same size. Define
$$K^{\rm SSD}_{\rm aprx} = \bar{A}^{\dagger} \bar{B}, \tag{57a}$$
$$[\Delta_1^*, \Delta_2^*] = [\bar{A}, \bar{B}] - [\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]. \tag{57b}$$
Then $K^{\rm SSD}_{\rm aprx}$, $\Delta_1^*$, $\Delta_2^*$ form a global solution of (56) and
$$\|[\Delta_1^*, \Delta_2^*]\|_F \le \epsilon\, \|[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]\|_F. \tag{58}$$

Proof: One can rewrite (56b) as
$$\big([\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)] + [\Delta_1, \Delta_2]\big) \begin{bmatrix} K \\ -I_{\tilde{N}_d} \end{bmatrix} = 0,$$
which implies that $\mathrm{rank}\big([\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)] + [\Delta_1, \Delta_2]\big) \le \tilde{N}_d$. Using the Eckart–Young theorem [51], one deduces that $[\bar{A}, \bar{B}]$ is the closest matrix (in Frobenius norm) to $[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]$ of rank smaller than or equal to $\tilde{N}_d$. In other words, $\Delta_1^*$ and $\Delta_2^*$ in (57b) minimize the cost function in (56a). Next, we need to show that they also satisfy (56b) with $K^{\rm SSD}_{\rm aprx}$ defined in (57a). Let $t$ be the termination iteration of the Approximated-SSD algorithm. Since $C^{\rm SSD}_{\rm aprx} \neq 0$, the algorithm executes Step 10. Therefore, the condition in Step 9 holds and $\mathrm{rows}(Z_{A_t}) \le \mathrm{cols}(Z_{A_t})$, where $[(Z_{A_t})^T, (Z_{B_t})^T]^T = \text{approx-null}(A_t, B_t, \epsilon)$. In addition, based on Proposition 7.1(c), $Z_{A_t}$ and $Z_{B_t}$ are nonsingular square matrices. Noting that, by definition, in the Approximated-SSD algorithm $A_t = D(X)\, C^{\rm SSD}_{\rm aprx} = \tilde{D}_{\rm aprx}(X)$ and $B_t = D(Y)\, C^{\rm SSD}_{\rm aprx} = \tilde{D}_{\rm aprx}(Y)$, one can use Proposition 7.2(c) with $W = [\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]$ and $\bar{W} = [\bar{A}, \bar{B}]$ to write $\bar{A} Z_{A_t} = -\bar{B} Z_{B_t}$. Since $Z_{A_t}$ and $Z_{B_t}$ are nonsingular square matrices, this leads to $\mathcal{R}(\bar{A}) = \mathcal{R}(\bar{B})$ and $\bar{A} K^{\rm SSD}_{\rm aprx} = \bar{B}$, with $K^{\rm SSD}_{\rm aprx}$ defined in (57a). As a result, $\Delta_1^*$, $\Delta_2^*$, $K^{\rm SSD}_{\rm aprx}$ satisfy the constraint (56b). Finally, the accuracy bound (58) follows from Proposition 7.2(a) with $W = [\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]$ and $\bar{W} = [\bar{A}, \bar{B}]$. ∎

Note that, unlike in the exact case (cf. Theorem 5.8), Theorem 7.3 does not provide an out-of-sample bound on prediction accuracy. According to this result, a small perturbation $[\Delta_1^*, \Delta_2^*]$ of the matrix $[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]$ allows us to describe the evolution of the dictionary matrices linearly through $K^{\rm SSD}_{\rm aprx}$. Moreover, the Frobenius norm of the perturbation is upper bounded by $\epsilon\, \|[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]\|_F$, which implies that a smaller $\epsilon$ leads to better accuracy on the observed samples.
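The construction in Theorem 7.3 is straightforward to realize numerically: truncate the SVD of $[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]$ to rank $\tilde{N}_d$ and read off $K^{\rm SSD}_{\rm aprx}$ through the pseudoinverse, as in (57). A minimal NumPy sketch follows (function name hypothetical; it assumes at least $2\tilde{N}_d$ data snapshots):

```python
import numpy as np

def tls_prediction_matrix(DX, DY):
    """Closed-form TLS solution of (56), following (57a)-(57b).

    DX, DY: reduced dictionary matrices ~D_aprx(X), ~D_aprx(Y),
    each with Nd_tilde columns and at least 2 * Nd_tilde rows.
    Returns (K, Delta1, Delta2).
    """
    nd = DX.shape[1]
    W = np.hstack([DX, DY])
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_bar = s.copy()
    s_bar[nd:] = 0.0                      # zero the trailing Nd_tilde singular values
    W_bar = (U * s_bar) @ Vt              # Eckart-Young rank-Nd_tilde approximation
    A_bar, B_bar = W_bar[:, :nd], W_bar[:, nd:]
    K = np.linalg.pinv(A_bar) @ B_bar     # (57a): K = A_bar^dagger @ B_bar
    Delta = W_bar - W                     # (57b)
    return K, Delta[:, :nd], Delta[:, nd:]
```

The bound (58) can then be checked directly by comparing the Frobenius norm of the returned perturbation against $\epsilon$ times the norm of the unperturbed matrix.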
VIII. SIMULATION RESULTS

We illustrate the efficacy of the proposed methods in two examples.⁴

⁴ We have chosen on purpose low-dimensional examples to be able to fully detail the identified Koopman eigenvalues and associated subspaces. However, it is worth pointing out that the results presented here are applicable without any restriction on the type of dynamical system, its dimension, or the sparsity of the model in the dictionary.

Example 8.1 (Unstable Discrete-Time Polynomial System): Consider the nonlinear system
$$x_1^+ = 1.1\, x_1, \qquad x_2^+ = 1.2\, x_2 + 0.1\, x_1^2 + 0.1, \tag{59}$$
with state $x = [x_1, x_2]^T$. System (59) is actually an unstable polyflow [52], and therefore has a finite-dimensional Koopman-invariant subspace comprised of polynomials. We use the dictionary $D(x) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1 x_2^2, x_1^2 x_2, x_2^3]$ with $N_d = 10$. Moreover, we gather $2 \times 10^4$ data snapshots uniformly sampled from $[-2, 2] \times [-2, 2]$. We use the SSD and SSSD strategies to identify the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. In the SSSD method, we use the first 10 data snapshots as signature snapshots and feed the rest of the data to the algorithm in the order it appears in the data set. As in the previous example, we use the strategy explained in Remark 5.4 with $\epsilon = 10^{-12}$ to overcome errors due to finite machine precision. Both methods find bases for the 6-dimensional subspace spanned by $\{1, x_1, x_2, x_1 x_2, x_1^2, x_1^3\}$, which is the maximal Koopman-invariant subspace in $\mathrm{span}(D(x))$. The SSSD method, however, performs the calculations 96% faster than SSD. One can find $K^{\rm SSD}$ by applying EDMD to either of the identified dictionaries according to (27). Moreover, based on Theorems 5.5 and 5.7, we use the eigendecomposition of $K^{\rm SSD}$ to find all the Koopman eigenfunctions associated with system (59) in $\mathrm{span}(D(x))$. Table I shows the identified eigenfunctions; direct calculation verifies that they are indeed Koopman eigenfunctions of system (59). Note that since $x_1$ and $x_2$ both belong to the identified Koopman-invariant subspace, one can fully characterize the behavior of the system linearly, either using the eigenfunctions and (7), or directly using the identified dictionary and $K^{\rm SSD}$.

TABLE I: Identified eigenfunctions and eigenvalues of the Koopman operator associated with system (59).

  Eigenfunction                              Eigenvalue
  $\phi_1(x) = 1$                            $\lambda_1 = 1$
  $\phi_2(x) = x_1$                          $\lambda_2 = 1.1$
  $\phi_3(x) = x_1^2$                        $\lambda_3 = 1.21$
  $\phi_4(x) = 20 x_1^2 - 2 x_2 - 1$         $\lambda_4 = 1.2$
  $\phi_5(x) = x_1^3$                        $\lambda_5 = 1.331$
  $\phi_6(x) = 20 x_1^3 - 2 x_1 x_2 - x_1$   $\lambda_6 = 1.32$

Next, we evaluate the effectiveness for long-term prediction of the original dictionary $D$ and the dictionary $\tilde{D}$ identified by SSD (equivalently, by SSSD). To do this, we consider the following error functions. Given an arbitrary dictionary $D$, consider its associated matrix $K = \mathrm{EDMD}(D, X, Y)$. For a trajectory $\{x(k)\}_{k=0}^{M}$ of (59) with length $M$ and initial condition $x_0$, let
$$E_{\rm relative}(k) = \frac{\big\| D(x(k)) - D(x_0) K^k \big\|_2}{\| D(x(k)) \|_2} \times 100, \tag{60a}$$
$$E_{\rm angle}(k) = \angle\big( D(x(k)),\, D(x_0) K^k \big), \tag{60b}$$
where $D(x_0) K^k$ is the predicted dictionary vector at time $k$ calculated using the dictionary $D$. $E_{\rm relative}$ corresponds to the relative error in magnitude between the predicted and exact dictionary vectors, and $E_{\rm angle}$ corresponds to the error in the angle between these vectors.

We compute the errors associated with the original dictionary $D$, denoted $E^{\rm Orig}_{\rm relative}$ and $E^{\rm Orig}_{\rm angle}$, and the errors associated with the SSD dictionary $\tilde{D}$, denoted $E^{\rm SSD}_{\rm relative}$ and $E^{\rm SSD}_{\rm angle}$. Figure 1 illustrates these errors along a trajectory starting from a random initial condition in $[-2, 2] \times [-2, 2]$ for 20 time steps. The plot shows the importance of dictionary selection when performing EDMD. Unlike the prediction on $\mathrm{span}(D(x))$, the prediction on the SSD subspace $\mathrm{span}(\tilde{D}(x))$ matches the behavior of the system exactly. This is a direct consequence of the fact that $\mathrm{span}(\tilde{D}(x))$ is a Koopman-invariant subspace, on which EDMD fully captures the behavior of the operator through $K^{\rm SSD}$. It is worth mentioning that, based on Proposition 4.2, EDMD with dictionary $D$ also predicts the functions in $\mathrm{span}(\tilde{D})$ exactly. However, its prediction for functions outside of $\mathrm{span}(\tilde{D})$ leads to large errors.

[Fig. 1: Relative (left) and angle (right) prediction errors on the original and SSD subspaces for system (59) on a trajectory of length M = 20.]
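The error measures (60) translate directly into code. The sketch below assumes a callable `D` returning the dictionary evaluated at a state as a row vector (a hypothetical interface), and advances the prediction $D(x_0) K^k$ one multiplication at a time.

```python
import numpy as np

def prediction_errors(D, traj, K):
    """Relative and angle errors (60a)-(60b) along a trajectory.

    D: function mapping a state to the 1 x N_d dictionary row D(x).
    traj: sequence of states x(0), ..., x(M).
    K: EDMD (or SSD / Approximated-SSD) prediction matrix.
    """
    e_rel, e_ang = [], []
    pred = D(traj[0]).copy()                 # D(x_0) K^0
    for x in traj:
        actual = D(x)
        # relative magnitude error, in percent (60a)
        e_rel.append(np.linalg.norm(actual - pred)
                     / np.linalg.norm(actual) * 100)
        # angle between the predicted and exact dictionary vectors (60b)
        c = np.dot(actual.ravel(), pred.ravel()) / (
            np.linalg.norm(actual) * np.linalg.norm(pred))
        e_ang.append(np.arccos(np.clip(c, -1.0, 1.0)))
        pred = pred @ K                      # advance to D(x_0) K^{k+1}
    return np.array(e_rel), np.array(e_ang)
```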
Example 8.2 (Duffing Equation): Here, we investigate the efficacy of the proposed methods in approximating Koopman eigenfunctions and invariant subspaces. Consider the Duffing equation [35, Section 4.2]
$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -0.5\, x_2 + x_1 - x_1^3, \tag{61}$$
with state $x = [x_1, x_2]^T$. The system has one unstable equilibrium at the origin and two locally stable equilibria at $[\pm 1, 0]^T$. We consider the discretized version of (61) with timestep $\Delta t = 0.01\,$s and gather $N = 5000$ data snapshots uniformly sampled from $\mathcal{M} = [-2, 2] \times [-2, 2]$. Moreover, we use the dictionary $D$ comprised of all $N_d = 36$ monomials up to degree 7, i.e., of the form $\prod_{i=1}^{7} y_i$ with $y_i \in \{1, x_1, x_2\}$. The maximal Koopman-invariant subspace in the span of the dictionary is one-dimensional, spanned by the trivial eigenfunction $\phi(x) \equiv 1$. Hence, applying the SSD and SSSD algorithms would result in a trivial solution. Instead, we apply the Approximated-SSD algorithm to the available dictionary snapshots with accuracy parameter $\epsilon = 10^{-3}$. The outcome is the dictionary $\tilde{D}_{\rm aprx}$ with $\tilde{N}_d = 15$ elements. We calculate the linear prediction matrix $K^{\rm SSD}_{\rm aprx}$ using Theorem 7.3. The norm of the perturbation satisfies $\|[\Delta_1^*, \Delta_2^*]\|_F \approx 9.6 \times 10^{-4}\, \|[\tilde{D}_{\rm aprx}(X), \tilde{D}_{\rm aprx}(Y)]\|_F$, in agreement with the upper bound provided in Theorem 7.3. We approximate the eigenfunctions of the Koopman operator using the eigendecomposition of $K^{\rm SSD}_{\rm aprx}$. For space reasons, we only illustrate the leading nontrivial approximated Koopman eigenfunctions, i.e., those with eigenvalue closest to the unit circle. Figure 2 (left) shows the real-valued approximated eigenfunction corresponding to the eigenvalue $\lambda = 0.9919$. Despite being an approximation, the eigenfunction captures the behavior of the vector field accurately and correctly identifies the attractiveness of the two locally stable equilibria. Given that $|\lambda| < 1$, Figure 2 (left) predicts that the trajectories eventually converge to one of the stable equilibria. Figure 2 (right) shows the absolute value of the approximated Koopman eigenfunctions corresponding to the eigenvalues $\lambda = 0.9989 \pm 0.0037j$. Similarly to the other plot, it captures information about the shape of the vector field, such as the attractive equilibria and their regions of attraction.

[Fig. 2: The approximated eigenfunction corresponding to the eigenvalue λ = 0.9919 (left) and the absolute value of the approximated eigenfunctions corresponding to the eigenvalues λ = 0.9989 ± 0.0037j (right) for the Koopman operator associated with (61), as identified by the Approximated-SSD algorithm.]

We use the relative and angle errors defined in (60) to compare the prediction accuracy of the original dictionary $D$ and the Approximated-SSD dictionary $\tilde{D}_{\rm aprx}$ (for which we use $K^{\rm SSD}_{\rm aprx}$). Figure 3 illustrates the relative and angle errors along a trajectory starting from a random initial condition in $[-2, 2] \times [-2, 2]$ for 30 timesteps. The plot shows the superiority, for long-term prediction of the dynamical behavior, of EDMD performed over the subspace identified by Approximated-SSD.

[Fig. 3: Relative (left) and angle (right) prediction errors on the Approximated-SSD and original subspaces for system (61) on a trajectory of length M = 30.]
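For completeness, the following sketch indicates how the snapshot pairs of this example can be generated. The paper does not specify the one-step discretization of (61), so a standard fourth-order Runge–Kutta step is used here as a stand-in assumption, together with the monomial dictionary of degree up to 7.

```python
import numpy as np

def duffing(x):
    """Vector field of the Duffing equation (61)."""
    return np.array([x[1], -0.5 * x[1] + x[0] - x[0] ** 3])

def rk4_step(f, x, dt):
    """One fourth-order Runge-Kutta step (assumed discretization scheme)."""
    k1 = f(x); k2 = f(x + dt / 2 * k1)
    k3 = f(x + dt / 2 * k2); k4 = f(x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def dictionary(x, deg=7):
    """All monomials x1^a x2^b with a + b <= deg (N_d = 36 for deg = 7)."""
    return np.array([x[0] ** a * x[1] ** b
                     for a in range(deg + 1) for b in range(deg + 1 - a)])

rng = np.random.default_rng(0)
N, dt = 5000, 0.01
X = rng.uniform(-2, 2, size=(N, 2))                    # snapshots on M
Y = np.array([rk4_step(duffing, x, dt) for x in X])    # one-step images
DX = np.apply_along_axis(dictionary, 1, X)             # N x N_d dictionary matrix
DY = np.apply_along_axis(dictionary, 1, Y)
```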
IX. CONCLUSIONS

We have studied the characterization of Koopman-invariant subspaces and Koopman eigenfunctions associated to a dynamical system by means of data-driven methods. We have shown that the application of EDMD over a given dictionary forward and backward in time fully characterizes whether a function evolves linearly in time according to the available data. Building on this result, and under dense sampling, we have established that functions satisfying this condition are Koopman eigenfunctions almost surely. We have developed the SSD algorithm to identify the maximal Koopman-invariant subspace in the span of the given dictionary and formally characterized its correctness. Finally, we have developed extensions to scenarios with large and streaming data sets, where the algorithm refines its output as new data becomes available, and to scenarios where the original dictionary does not contain sufficiently many informative eigenfunctions, in which case the algorithm instead obtains approximations of the Koopman eigenfunctions and invariant subspaces. Future work will develop parallel and distributed counterparts of the algorithms proposed here over network systems, obtain out-of-sample bounds on the prediction accuracy of the output of the Approximated-SSD algorithm, investigate the design of noise-resilient methods to identify Koopman eigenfunctions and invariant subspaces, and explore methods to expand the original dictionary to ensure the existence of nontrivial invariant subspaces.

REFERENCES

[1] M. Haseli and J. Cortés, "Efficient identification of linear evolutions in nonlinear vector fields: Koopman invariant subspaces," in IEEE Conf. on Decision and Control, Nice, France, Dec. 2019, pp. 1746–1751.
[2] B. O. Koopman, "Hamiltonian systems and transformation in Hilbert space," Proceedings of the National Academy of Sciences, vol. 17, no. 5, pp. 315–318, 1931.
[3] B. O. Koopman and J. v. Neumann, "Dynamical systems of continuous spectra," Proceedings of the National Academy of Sciences, vol. 18, no. 3, pp. 255–263, 1932.
[4] I. Mezić, "Spectral properties of dynamical systems, model reduction and decompositions," Nonlinear Dynamics, vol. 41, no. 1-3, pp. 309–325, 2005.
[5] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, "Spectral analysis of nonlinear flows," Journal of Fluid Mechanics, vol. 641, pp. 115–127, 2009.
[6] M. Budišić, R. Mohr, and I. Mezić, "Applied Koopmanism," Chaos, vol. 22, no. 4, p. 047510, 2012.
[7] A. Surana, M. O. Williams, M. Morari, and A. Banaszuk, "Koopman operator framework for constrained state estimation," in IEEE Conf. on Decision and Control, Melbourne, Australia, 2017, pp. 94–101.
[8] M. Netto and L. Mili, "A robust data-driven Koopman Kalman filter for power systems dynamic state estimation," IEEE Transactions on Power Systems, vol. 33, no. 6, pp. 7228–7237, 2018.
[9] A. Mauroy and J. Goncalves, "Koopman-based lifting techniques for nonlinear systems identification," IEEE Transactions on Automatic Control, vol. 65, no. 6, pp. 2550–2565, 2020.
[10] D. Bruder, C. D. Remy, and R. Vasudevan, "Nonlinear system identification of soft robot dynamics using Koopman operator theory," in IEEE Int. Conf. on Robotics and Automation, Montreal, Canada, May 2019, pp. 6244–6250.
[11] B. Kramer, P. Grover, P. Boufounos, S. Nabi, and M. Benosman, "Sparse sensing and DMD-based identification of flow regimes and bifurcations in complex flows," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 2, pp. 1164–1196, 2017.
[12] S. Sinha, U. Vaidya, and R. Rajaram, "Operator theoretic framework for optimal placement of sensors and actuators for control of nonequilibrium dynamics," Journal of Mathematical Analysis and Applications, vol. 440, no. 2, pp. 750–772, 2016.
[13] S. Klus, F. Nüske, P. Koltai, H. Wu, I. Kevrekidis, C. Schütte, and F. Noé, "Data-driven model reduction and transfer operator approximation," Journal of Nonlinear Science, vol. 28, no. 3, pp. 985–1010, 2018.
[14] A. Alla and J. N. Kutz, "Nonlinear model order reduction via dynamic mode decomposition," SIAM Journal on Scientific Computing, vol. 39, no. 5, pp. B778–B796, 2017.
[15] M. Korda and I. Mezić, "Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control," Automatica, vol. 93, pp. 149–160, 2018.
[16] H. Arbabi, M. Korda, and I. Mezić, "A data-driven Koopman model predictive control framework for nonlinear flows," arXiv preprint arXiv:1804.05291, 2018.
[17] S. Peitz and S. Klus, "Koopman operator-based model reduction for switched-system control of PDEs," Automatica, vol. 106, pp. 184–191, 2019.
[18] B. Huang, X. Ma, and U. Vaidya, "Feedback stabilization using Koopman operator," in IEEE Conf. on Decision and Control, Miami Beach, FL, Dec. 2018, pp. 6434–6439.
[19] E. Kaiser, J. N. Kutz, and S. L. Brunton, "Data-driven discovery of Koopman eigenfunctions for control," arXiv preprint, 2017.
[20] A. Sootla and D. Ernst, "Pulse-based control using Koopman operator under parametric uncertainty," IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 791–796, 2017.
[21] A. Narasingam and J. S. Kwon, "Data-driven feedback stabilization of nonlinear systems: Koopman-based model predictive control," arXiv preprint arXiv:2005.09741, 2020.
[22] G. Mamakoukas, M. Castano, X. Tan, and T. Murphey, "Local Koopman operators for data-driven control of robotic systems," in Robotics: Science and Systems, Freiburg, Germany, Jun. 2019.
[23] M. L. Castaño, A. Hess, G. Mamakoukas, T. Gao, T. Murphey, and X. Tan, "Control-oriented modeling of soft robotic swimmer with Koopman operators," in IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2020, pp. 1679–1685.
[24] A. Mauroy and I. Mezić, "Global stability analysis using the eigenfunctions of the Koopman operator," IEEE Transactions on Automatic Control, vol. 61, no. 11, pp. 3356–3369, 2016.
[25] P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[26] K. K. Chen, J. H. Tu, and C. W. Rowley, "Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses," Journal of Nonlinear Science, vol. 22, no. 6, pp. 887–915, 2012.
[27] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, "On dynamic mode decomposition: theory and applications," Journal of Computational Dynamics, vol. 1, no. 2, pp. 391–421, 2014.
[28] M. S. Hemati, M. O. Williams, and C. W. Rowley, "Dynamic mode decomposition for large and streaming datasets," Physics of Fluids, vol. 26, no. 11, p. 111701, 2014.
[29] H. Zhang, C. W. Rowley, E. A. Deem, and L. N. Cattafesta, "Online dynamic mode decomposition for time-varying systems," SIAM Journal on Applied Dynamical Systems, vol. 18, no. 3, pp. 1586–1609, 2019.
[30] S. Anantharamu and K. Mahesh, "A parallel and streaming dynamic mode decomposition algorithm with finite precision error analysis for large data," Journal of Computational Physics, vol. 380, pp. 355–377, 2019.
[31] S. T. M. Dawson, M. S. Hemati, M. O. Williams, and C. W. Rowley, "Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition," Experiments in Fluids, vol. 57, no. 3, p. 42, 2016.
[32] M. S. Hemati, C. W. Rowley, E. A. Deem, and L. N. Cattafesta, "De-biasing the dynamic mode decomposition for applied Koopman spectral analysis of noisy datasets," Theoretical and Computational Fluid Dynamics, vol. 31, no. 4, pp. 349–368, 2017.
[33] M. R. Jovanović, P. J. Schmid, and J. W. Nichols, "Sparsity-promoting dynamic mode decomposition," Physics of Fluids, vol. 26, no. 2, p. 024103, 2014.
[34] S. L. Clainche and J. M. Vega, "Higher-order dynamic mode decomposition," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 2, pp. 882–925, 2017.
[35] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.
[36] M. Korda and I. Mezić, "On convergence of extended dynamic mode decomposition to the Koopman operator," Journal of Nonlinear Science, vol. 28, no. 2, pp. 687–710, 2018.
[37] M. Haseli and J. Cortés, "Approximating the Koopman operator using noisy data: noise-resilient extended dynamic mode decomposition," in American Control Conference, Philadelphia, PA, Jul. 2019, pp. 5499–5504.
[38] E. Qian, B. Kramer, B. Peherstorfer, and K. Willcox, "Lift & learn: Physics-informed machine learning for large-scale nonlinear dynamical systems," Physica D: Nonlinear Phenomena, vol. 406, p. 132401, 2020.
[39] Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, "Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator," Chaos, vol. 27, no. 10, p. 103111, 2017.
[40] N. Takeishi, Y. Kawahara, and T. Yairi, "Learning Koopman invariant subspaces for dynamic mode decomposition," in Conference on Neural Information Processing Systems, 2017, pp. 1130–1140.
[41] E. Yeung, S. Kundu, and N. Hodas, "Learning deep neural network representations for Koopman operators of nonlinear dynamical systems," in American Control Conference, Philadelphia, PA, Jul. 2019, pp. 4832–4839.
[42] S. E. Otto and C. W. Rowley, "Linearly recurrent autoencoder networks for learning dynamics," SIAM Journal on Applied Dynamical Systems, vol. 18, no. 1, pp. 558–593, 2019.
[43] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz, "Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control," PLOS One, vol. 11, no. 2, pp. 1–19, 2016.
[44] M. Korda and I. Mezić, "Optimal construction of Koopman eigenfunctions for prediction and control," IEEE Transactions on Automatic Control, 2020, to appear.
[45] S. Klus, F. Nüske, S. Peitz, J. H. Niemann, C. Clementi, and C. Schütte, "Data-driven approximation of the Koopman generator: Model reduction, system identification, and control," arXiv preprint arXiv:1909.10638, 2019.
[46] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed. New York: Wiley, 1999.
[47] M. Korda and I. Mezić, "Optimal construction of Koopman eigenfunctions for prediction and control," 2019. [Online]. Available: https://hal.archives-ouvertes.fr/hal-02278835
[48] X. Li, S. Wang, and Y. Cai, "Tutorial: Complexity analysis of singular value decomposition and its variants," arXiv preprint, 2019.
[49] L. Mirsky, "Symmetric gauge functions and unitarily invariant norms," The Quarterly Journal of Mathematics, vol. 11, no. 1, pp. 50–59, 1960.
[50] I. Markovsky and S. V. Huffel, "Overview of total least-squares methods," Signal Processing, vol. 87, no. 10, pp. 2283–2302, 2007.
[51] C. Eckart and G. Young, "The approximation of one matrix by another of lower rank," Psychometrika, vol. 1, no. 3, pp. 211–218, 1936.
[52] R. M. Jungers and P. Tabuada, "Non-local linearization of nonlinear differential equations via polyflows," in American Control Conference, Philadelphia, PA, 2019, pp. 1906–1911.
APPENDIX

Here we gather some linear algebraic results.

Lemma A.1 (Intersection of Linear Spaces): Let $A, B \in \mathbb{R}^{m \times n}$ be matrices with full column rank. Suppose that the columns of $Z = [(Z_A)^T, (Z_B)^T]^T \in \mathbb{R}^{2n \times l}$ form a basis for the null space of $[A, B]$, where $Z_A, Z_B \in \mathbb{R}^{n \times l}$. Then,
(a) $\mathcal{R}(A Z_A) = \mathcal{R}(A) \cap \mathcal{R}(B)$;
(b) $Z_A$ and $Z_B$ have full column rank.

Proof: (a) First, note that $\mathcal{R}(A Z_A) \subseteq \mathcal{R}(A)$. Moreover, by hypothesis, $[A, B] Z = 0$, which leads to $\mathcal{R}(A Z_A) = \mathcal{R}(B Z_B) \subseteq \mathcal{R}(B)$. Consequently, $\mathcal{R}(A Z_A) \subseteq \mathcal{R}(A) \cap \mathcal{R}(B)$. Now, suppose that $v \in \mathcal{R}(A) \cap \mathcal{R}(B)$. By definition, there exist vectors $w_1, w_2 \in \mathbb{R}^n$ such that $v = A w_1 = B w_2$. Then $[A, B]\, [w_1^T, -w_2^T]^T = 0$, which means that $[w_1^T, -w_2^T]^T \in \mathcal{R}(Z)$, and there exists $r \in \mathbb{R}^l$ such that $w_1 = Z_A r$ and $w_2 = -Z_B r$. Therefore, $v = A w_1 = A Z_A r \in \mathcal{R}(A Z_A)$. Consequently, $\mathcal{R}(A) \cap \mathcal{R}(B) \subseteq \mathcal{R}(A Z_A)$ and the claim follows.

(b) We prove this part by contradiction. Suppose there exists $v \neq 0$ such that $Z_A v = 0$. Since $[A, B] Z = 0$, one concludes that $A Z_A v = -B Z_B v = 0$. Since $B$ has full column rank, we have $Z_B v = 0$. Hence $Z v = 0$, which is a contradiction, since the columns of $Z$ are linearly independent. A similar reasoning shows that $Z_B$ has full column rank. ∎
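As a numerical companion to Lemma A.1, the following NumPy sketch computes a basis of $\mathcal{R}(A) \cap \mathcal{R}(B)$ from the null space of $[A, B]$; the tolerance used to detect zero singular values is an implementation choice, not part of the lemma.

```python
import numpy as np

def range_intersection(A, B, tol=1e-10):
    """Basis of R(A) ∩ R(B) via Lemma A.1.

    A, B: m x n matrices with full column rank.
    """
    W = np.hstack([A, B])
    _, s, Vt = np.linalg.svd(W)
    # right singular vectors with (numerically) zero singular value
    rank = int((s > tol * s[0]).sum())
    Z = Vt[rank:, :].T              # basis of the null space of [A, B]
    ZA = Z[: A.shape[1], :]         # top block Z_A
    return A @ ZA                   # columns span R(A) ∩ R(B), by Lemma A.1(a)
```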
Lemma A.2: Let $A$, $C$, $D$ be matrices of appropriate sizes, with $A$ having full column rank. Then $\mathcal{R}(AC) \subseteq \mathcal{R}(AD)$ if and only if $\mathcal{R}(C) \subseteq \mathcal{R}(D)$.

Proof: ($\Rightarrow$) Suppose that $v \in \mathcal{R}(C)$; then there exists $w$ such that $v = C w$. Therefore, $A v = A C w \in \mathcal{R}(AC)$. Since $\mathcal{R}(AC) \subseteq \mathcal{R}(AD)$, one deduces that there exists $r$ such that $A C w = A D r$, and we get $A(v - D r) = 0$. Since $A$ has full column rank, this leads to $v = D r \in \mathcal{R}(D)$, and hence $\mathcal{R}(C) \subseteq \mathcal{R}(D)$.
($\Leftarrow$) Suppose that $v \in \mathcal{R}(AC)$; then $v = A C w$ for some $w$. Since $\mathcal{R}(C) \subseteq \mathcal{R}(D)$, there exists $r$ such that $C w = D r$. As a result, $v = A C w = A D r \in \mathcal{R}(AD)$, and the claim follows. ∎

Lemma A.3: Let $A_1, B_1 \in \mathbb{R}^{m \times n}$, $A_2, B_2 \in \mathbb{R}^{l \times n}$, and $C \in \mathbb{R}^{n \times k}$, with $A_1$, $B_1$, $C$ having full column rank. If
$$\mathcal{R}\!\left(\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}\right) = \mathcal{R}\!\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}\right), \tag{62a}$$
$$\mathcal{R}(A_1 C) = \mathcal{R}(B_1 C), \tag{62b}$$
then
$$\mathcal{R}\!\left(\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} C\right) = \mathcal{R}\!\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} C\right).$$

Proof: Based on (62a), one deduces that there exists a nonsingular square matrix $K$ such that
$$A_1 = B_1 K, \tag{63a}$$
$$A_2 = B_2 K. \tag{63b}$$
Multiplying both sides of (63a) by $C$ gives
$$A_1 C = B_1 K C. \tag{64}$$
Moreover, based on (62b), one deduces that there exists a nonsingular square matrix $K^*$ such that
$$A_1 C = B_1 C K^*. \tag{65}$$
Subtracting (65) from (64) and using the fact that $B_1$ has full column rank, one deduces
$$C K^* = K C. \tag{66}$$
Now, multiplying both sides of (63b) from the right by $C$ and using (66), one can write $A_2 C = B_2 C K^*$, which, in conjunction with (65), leads to
$$\begin{bmatrix} A_1 \\ A_2 \end{bmatrix} C = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} C K^*,$$
completing the proof. ∎

Lemma A.4: Let $A$, $\{C_i\}_{i=1}^{\infty}$, and $\hat{C}$ be matrices of appropriate sizes. Assume that $A$ has full column rank and $\mathcal{R}(\hat{C}) = \bigcap_{i=1}^{\infty} \mathcal{R}(C_i)$. Then $\mathcal{R}(A \hat{C}) = \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$.

Proof: First, we prove that $\mathcal{R}(A \hat{C}) \subseteq \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$. Let $v \in \mathcal{R}(A \hat{C})$; then there exists a vector $w$ such that $v = A \hat{C} w = A r$, with $r = \hat{C} w$. Note that $r \in \mathcal{R}(\hat{C})$ and consequently $r \in \mathcal{R}(C_i)$ for all $i \in \mathbb{N}$. Hence, for every $i \in \mathbb{N}$ there exists $z_i$ such that $r = C_i z_i$. By the definition of $v$, we have $v = A r = A C_i z_i$ for every $i \in \mathbb{N}$. As a result, $v \in \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$, which concludes the proof of this inclusion.

Now, we prove that $\bigcap_{i=1}^{\infty} \mathcal{R}(A C_i) \subseteq \mathcal{R}(A \hat{C})$. Let $v \in \bigcap_{i=1}^{\infty} \mathcal{R}(A C_i)$. Then, for every $i \in \mathbb{N}$, there exists $w_i$ such that $v = A C_i w_i$. Moreover, $v \in \mathcal{R}(A)$ and, since $A$ has full column rank, there exists a unique $r$ such that $v = A r$. Therefore, for all $i \in \mathbb{N}$, we have $r = C_i w_i$. Thus, $r \in \bigcap_{i=1}^{\infty} \mathcal{R}(C_i)$ and consequently $r \in \mathcal{R}(\hat{C})$. Since $v = A r$, we have $v \in \mathcal{R}(A \hat{C})$, concluding the proof. ∎

Masih Haseli was born in Kermanshah, Iran, in 1991. He received the B.Sc. degree, in 2013, and the M.Sc. degree, in 2015, both in Electrical Engineering, from Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran. In 2017, he joined the University of California, San Diego, to pursue the Ph.D. degree in Mechanical and Aerospace Engineering. His research interests include system identification, nonlinear systems, network systems, data-driven modeling and control, and distributed and parallel computing. Mr. Haseli was the recipient of the bronze medal in Iran's national mathematics competition in 2014.

Jorge Cortés (M'02, SM'06, F'14) received the Licenciatura degree in mathematics from Universidad de Zaragoza, Zaragoza, Spain, in 1997, and the Ph.D. degree in engineering mathematics from Universidad Carlos III de Madrid, Madrid, Spain, in 2001. He held postdoctoral positions with the University of Twente, Twente, The Netherlands, and the University of Illinois at Urbana-Champaign, Urbana, IL, USA. He was an Assistant Professor with the Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA, USA, from 2004 to 2007. He is currently a Professor in the Department of Mechanical and Aerospace Engineering, University of California, San Diego, CA, USA. He is the author of Geometric, Control and Numerical Aspects of Nonholonomic Systems (Springer-Verlag, 2002) and co-author (together with F. Bullo and S. Martínez) of Distributed Control of Robotic Networks (Princeton University Press, 2009). He is a Fellow of IEEE and SIAM. His current research interests include distributed control and optimization, network science, resource-aware control, nonsmooth analysis, reasoning and decision making under uncertainty, network neuroscience, and multi-agent coordination in robotic, power, and transportation networks.