Low-Rank Dynamic Mode Decomposition: An Exact and Tractable Solution


Authors: Patrick Héas, Cédric Herzet

Affiliation: INRIA Centre Rennes - Bretagne Atlantique & IRMAR - UMR CNRS 6625, campus universitaire de Beaulieu, 35042 Rennes, France. E-mail: patrick.heas@inria.fr

Abstract. This work studies the linear approximation of high-dimensional dynamical systems using low-rank dynamic mode decomposition (DMD). Searching this approximation in a data-driven approach is formalized as attempting to solve a low-rank constrained optimization problem. This problem is non-convex, and state-of-the-art algorithms are all sub-optimal. This paper shows that there exists a closed-form solution, which is computed in polynomial time, and characterizes the $\ell_2$-norm of the optimal approximation error. The paper also proposes low-complexity algorithms building reduced models from this optimal solution, based on singular value decomposition or eigenvalue decomposition. The algorithms are evaluated by numerical simulations using synthetic and physical data benchmarks.

Keywords: Reduced models · low-rank approximation · constrained optimization · dynamic mode decomposition

1 Introduction

1.1 Context

The numerical discretization of a partial differential equation parametrized by its initial condition often leads to a very high-dimensional system of the form
$$
\begin{cases}
x_t(\theta) = f_t(x_{t-1}(\theta)), & t = 2, \dots, T,\\
x_1(\theta) = \theta,
\end{cases}
\tag{1}
$$
where $x_t(\theta) \in \mathbb{R}^n$ is the state variable, $f_t : \mathbb{R}^n \to \mathbb{R}^n$, and $\theta \in \mathbb{R}^n$ denotes an initial condition. In some contexts, e.g., for uncertainty quantification purposes, one is interested in computing a set of trajectories corresponding to different initial conditions $\theta \in \Theta \subset \mathbb{R}^n$. This may constitute an intractable task due to the high dimensionality of the space embedding the trajectories.
For instance, in the case where $f_t$ is linear, the complexity required to compute a trajectory of model (1) scales as $\mathcal{O}(Tn^2)$, which is prohibitive for large values of $n$ or $T$.

To deal with these large values, reduced models approximate the trajectories of the system for a range of regimes determined by a set of initial conditions [6]. A common assumption is that the trajectories of interest are well approximated in a low-dimensional subspace of $\mathbb{R}^n$. In this spirit, many tractable approximations of model (1) have been proposed, in particular the well-known Petrov-Galerkin projection [30]. However, these methods require the knowledge of the equations ruling the high-dimensional system.

Alternatively, there exist data-driven approaches. In particular, linear inverse modeling [29], principal oscillating patterns [13], and more recently dynamic mode decomposition (DMD) [5, 8, 17, 20, 22, 32, 34] propose to approximate the unknown function $f_t$ by a linear and low-rank operator. This linear framework has been extended to quadratic approximations of $f_t$ in [7]. Although linear approximations are in appearance restrictive, they have recently sparked a new surge of interest because they are at the core of the so-called extended DMD or kernel-based DMD [3, 25, 35, 36, 38]. The latter decompositions characterize non-linear behaviours accurately under certain conditions [21].

Reduced models based on low-rank linear approximations substitute function $f_t$ by a matrix $\hat{A}_k \in \mathbb{R}^{n \times n}$ with $r = \operatorname{rank}(\hat{A}_k) \le n$ as
$$
\begin{cases}
\tilde{x}_t(\theta) = \hat{A}_k \tilde{x}_{t-1}(\theta), & t = 2, \dots, T,\\
\tilde{x}_1(\theta) = \theta,
\end{cases}
\tag{2}
$$
where $\{\tilde{x}_t(\theta)\}_{t=1}^{T}$ denotes an approximation of the trajectory $\{x_t(\theta)\}_{t=1}^{T}$ of system (1). The complexity of evaluating a trajectory approximation with (2) will be referred to as the on-line complexity. A low on-line complexity is obtained by exploiting the low rank of matrix $\hat{A}_k$.
A scaling in $\mathcal{O}(Tr^2 + rn)$ is reached if the reduced model is parametrized by matrices $R, L \in \mathbb{C}^{n \times r}$ and $S \in \mathbb{C}^{r \times r}$ such that trajectories of (2) correspond to the recursion
$$
\begin{cases}
\tilde{x}_t(\theta) = R z_t, & t = 2, \dots, T,\\
z_t = S z_{t-1}, & t = 3, \dots, T,\\
z_2 = L^\intercal \theta.
\end{cases}
\tag{3}
$$
The equivalence of systems (2) and (3) is obtained for $T \ge 2$ by setting $\hat{A}_k^{T-1} = R S^{T-2} L^\intercal$. In particular, consider a factorization of the form
$$
\hat{A}_k = P Q^\intercal \quad \text{with } P, Q \in \mathbb{R}^{n \times r}. \tag{4}
$$
This factorization is always possible by computing the singular value decomposition (SVD) $\hat{A}_k = U_{\hat{A}_k} \Sigma_{\hat{A}_k} V_{\hat{A}_k}^\intercal$ and identifying $P = U_{\hat{A}_k}$ and $Q^\intercal = \Sigma_{\hat{A}_k} V_{\hat{A}_k}^\intercal$. Factorization (4) implies that trajectories of (2) are obtained with system (3) by setting $R = P$, $L = Q$ and $S = Q^\intercal P$. Another factorization of interest relies on the eigenvalue decomposition (EVD)
$$
\hat{A}_k = D \Lambda D^{-1}, \quad \text{with } D, \Lambda \in \mathbb{C}^{n \times n}, \tag{5}
$$
where $\Lambda$ is a Jordan-block matrix [11] of rank $r$. Using the "economy-size" EVD yields a system of the form of (3). Indeed, it is obtained by making the identifications $L = (\xi_1 \cdots \xi_r)$ and $R = (\zeta_1 \cdots \zeta_r)$, where $\xi_i \in \mathbb{C}^n$ and $\zeta_i \in \mathbb{C}^n$ are the $i$-th left and right eigenvectors of $\hat{A}_k$ (equivalently, the $i$-th columns of $(D^{-1})^\intercal$ and $D$), and identifying $S$ with the first $r \times r$ block of $\Lambda$, the initialization of the recursion being multiplied by this block, i.e., $z_2 = S L^\intercal \theta$. The on-line complexity to compute this recursion is still $\mathcal{O}(Tr^2 + rn)$. But assuming that $\hat{A}_k$ is diagonalizable¹, we have $S = \operatorname{diag}(\lambda_1, \dots, \lambda_r)$ and system (3) becomes
$$
\begin{cases}
\tilde{x}_t(\theta) = \sum_{i=1}^{r} \zeta_i \nu_{i,t},\\
\nu_{i,t} = \lambda_i^{t-1} \xi_i^\intercal \theta,
\end{cases}
\quad \text{for } i = 1, \dots, \operatorname{rank}(\hat{A}_k), \; t = 2, \dots, T, \tag{6}
$$
where $\lambda_i \in \mathbb{C}$ is the $i$-th (non-zero) eigenvalue of $\hat{A}_k$. This reduced model possesses a very desirable on-line complexity of $\mathcal{O}(rn)$, i.e., linear in the ambient dimension $n$, linear in the reduced-model intrinsic dimension $r$, and independent of the trajectory length $T$.
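The equivalence between the direct recursion (2) and the factorized recursion (3) can be checked numerically. The following is a minimal sketch (assuming numpy and a randomly generated rank-$r$ operator in factored form (4); the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, T = 50, 4, 10

# Hypothetical rank-r operator in factored form (4): A_k = P Q^T.
P = rng.standard_normal((n, r))
Q = rng.standard_normal((n, r))
A = P @ Q.T
theta = rng.standard_normal(n)

# Direct recursion (2): each step costs O(n^2) with a dense A.
x, direct = theta, [theta]
for _ in range(T - 1):
    x = A @ x
    direct.append(x)

# Factorized recursion (3) with R = P, L = Q, S = Q^T P: O(r^2 + r n) per step.
R, L, S = P, Q, Q.T @ P
z = L.T @ theta                      # z_2 = L^T theta
reduced = [theta, R @ z]             # x~_1 = theta, x~_2 = R z_2
for _ in range(T - 2):
    z = S @ z                        # z_t = S z_{t-1}
    reduced.append(R @ z)

print(np.allclose(direct, reduced))  # True: the two recursions coincide
```

The dense matrix $A$ is formed here only to validate the factorized recursion; in practice only $R$, $L$, $S$ are stored.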
The key of reduced modeling is to find a "good" tradeoff between the on-line complexity and the accuracy of the approximation. As shown previously, the low on-line computational effort is obtained by a proper factorization of the low-rank matrix $\hat{A}_k$. Thus, in an off-line stage, it remains to i) search for the $\hat{A}_k$ within the family of low-rank matrices which yields the "best" approximation (2), and ii) compute the SVD- or EVD-based factorization of $\hat{A}_k$. We will refer to the computational cost associated with these two steps as the off-line complexity.

A standard choice is to select the $\hat{A}_k$ inducing the best trajectory approximation in the $\ell_2$-norm sense, for initial conditions in the set $\Theta \subset \mathbb{R}^n$: matrix $\hat{A}_k$ in (2) targets the solution of the following minimization problem for some given $k \le n$:
$$
\arg\min_{A :\, \operatorname{rank}(A) \le k} \int_{\theta \in \Theta} \sum_{t=2}^{T} \| x_t(\theta) - A^{t-1} \theta \|_2^2, \tag{7}
$$
where $\|\cdot\|_2$ denotes the $\ell_2$-norm. Since we focus on data-driven approaches, we assume that we do not know the exact form of $f_t$ in (1) and that we only have access to a set of representative trajectories $\{x_t(\theta_i)\}_{t=1}^{T}$, $i = 1, \dots, N$, so-called snapshots, obtained by running the high-dimensional system for $N$ different initial conditions $\{\theta_i\}_{i=1}^{N}$ in the set $\Theta$. Using these snapshots, we consider a discretized version of (7), which corresponds to the constrained optimization problem studied in [5, 20, 37]: matrix $\hat{A}_k$ now targets the solution
$$
A^\star_k \in \arg\min_{A :\, \operatorname{rank}(A) \le k} \sum_{i=1}^{N} \sum_{t=2}^{T} \| x_t(\theta_i) - A x_{t-1}(\theta_i) \|_2^2, \tag{8}
$$
where we have substituted $A^{t-1} \theta_i$ in (7) by $A x_{t-1}(\theta_i)$ and where we have approximated the integral by an empirical average over the snapshots.

¹ Diagonalizability is guaranteed if all the non-zero eigenvalues are distinct. However, this condition is only sufficient, and the class of diagonalizable matrices is larger [18].
Problem (8) is non-convex due to the presence of the rank constraint $\operatorname{rank}(A) \le k$. As a consequence, it has been considered intractable in several contributions of the literature, and numerous procedures have been proposed to approximate its solution (see next section). In this paper, we show that problem (8) is in fact tractable and admits a closed-form solution which can be evaluated in polynomial time.

1.2 Problem Statement and Contributions

This work deals with the off-line construction of reduced models of the form of (3). It focuses on the following questions:

1. Can we compute a solution of problem (8) in polynomial time?
2. How can we efficiently compute a factorization of this solution, and in particular its EVD?

Let us make some correspondences with the terminology used in the DMD literature [5, 8, 17, 20, 22, 32, 34] in order to reformulate these two questions in the jargon used in this community. The "low-rank DMD" of system (1) refers to the EVD of the solution $A^\star_k$ of problem (8), or equivalently to the parameters of reduced model (6) in the case where $\hat{A}_k = A^\star_k$ is diagonalizable.² Using this terminology, the two above questions can be summarized as follows: can we compute exactly and with a polynomial complexity the low-rank DMD of system (1)? In this paper, we show that the answer to this question is positive and provide numerical procedures to attain this goal.

Solver for problem (8). In the last decade, there has been a surge of interest in low-rank solutions of linear matrix equations, see e.g. [10, 19, 23, 24, 27, 31]. This class of problems includes (8) as an important particular case. Problems in this class are always non-convex due to the rank constraint, and computing their solutions in polynomial time is often out of reach. Nevertheless, certain instances of these problems with very special structures admit closed-form solutions [9, 26, 28].
In this work, we show that (8) belongs to this class of problems and provide a closed-form solution which can be computed in polynomial time. Prior to this work, many authors have proposed tractable procedures to compute approximations of the solution to problem (8) [5, 20, 25, 34, 37, 38] or to related problems [17]. We review these contributions in Section 3.1 and discuss their complexity.

Factorization of the solution. The second problem concerns the computation of a factorization of the form (4) or (5) of the solution $A^\star_k \in \mathbb{R}^{n \times n}$. A brute-force computation of a factorization of a matrix in $\mathbb{R}^{n \times n}$, in particular an EVD, is prohibitive for large values of $n$. In this work, we propose low-complexity algorithms computing such factorizations of $A^\star_k$. This follows the line of, and extends, previous works [20, 34, 36], as detailed in Section 3.2.

In summary, the contribution of this paper is twofold. First, we provide a closed-form solution to (8). We also design an algorithm computing a factorized form of this solution with a linear complexity in the ambient dimension. Second, we provide an algorithm computing the EVD of this optimal solution, which does not imply an increase in complexity.

The paper is organized as follows. In Section 3, we provide a review of techniques approximating and factorizing the solution of problem (8). In Section 4, we present the proposed approach. Finally, in Section 5, we study the performance obtained with the proposed algorithms in synthetic and physical setups and compare with the state of the art.

² The "DMD" of system (1) refers to the EVD of the solution of problem (8) without the low-rank constraint.
2 Notations

Throughout the paper, we make extensive use of the economy-size SVD of a matrix $M \in \mathbb{R}^{p \times q}$ with $p \ge q$: $M = U_M \Sigma_M V_M^\intercal$ with $U_M \in \mathbb{R}^{p \times q}$, $V_M \in \mathbb{R}^{q \times q}$ and $\Sigma_M \in \mathbb{R}^{q \times q}$, so that $U_M^\intercal U_M = V_M^\intercal V_M = I_q$ and $\Sigma_M$ is diagonal; the superscript $\cdot^\intercal$ refers to the transpose and $I_q$ denotes the $q$-dimensional identity matrix. The columns of matrices $U_M$ and $V_M$ are denoted $U_M = (u_M^1 \cdots u_M^q)$ and $V_M = (v_M^1 \cdots v_M^q)$, while $\Sigma_M = \operatorname{diag}(\sigma_{M,1}, \dots, \sigma_{M,q})$ with $\sigma_{M,i} \ge \sigma_{M,i+1}$ for $i = 1, \dots, q-1$. The Moore-Penrose pseudo-inverse of matrix $M$ is then defined as
$$
M^\dagger = V_M \Sigma_M^\dagger U_M^\intercal, \quad \text{where } \Sigma_M^\dagger = \operatorname{diag}(\sigma^\dagger_{M,1}, \dots, \sigma^\dagger_{M,q}) \text{ with } \sigma^\dagger_{M,i} = \begin{cases} \sigma_{M,i}^{-1} & \text{if } \sigma_{M,i} > 0,\\ 0 & \text{otherwise.} \end{cases}
$$
The orthogonal projector onto the span of the columns (resp. of the rows) of matrix $M$ is denoted by $P_M = M M^\dagger = U_M \Sigma_M \Sigma_M^\dagger U_M^\intercal$ (resp. $P_{M^\intercal} = M^\dagger M = V_M \Sigma_M^\dagger \Sigma_M V_M^\intercal$) [11].

We also introduce additional notations to derive a matrix formulation of the low-rank estimation problem (8). We gather consecutive elements of the $i$-th snapshot trajectory between times $t_1$ and $t_2$ in the matrix $X^{(i)}_{t_1:t_2} = (x_{t_1}(\theta_i) \cdots x_{t_2}(\theta_i))$ and form large matrices $X, Y \in \mathbb{R}^{n \times m}$ with $m = N(T-1)$ as
$$
X = (X^{(1)}_{1:T-1} \cdots X^{(N)}_{1:T-1}) \quad \text{and} \quad Y = (X^{(1)}_{2:T} \cdots X^{(N)}_{2:T}).
$$
In order to be consistent with the SVD definition and to keep the presentation as simple as possible, this work assumes that $m \le n$. However, all the results presented in this work can be extended without any difficulty to the case where $m > n$ by using an alternative definition of the SVD.

3 State-of-the-Art Approximations

We begin by presenting state-of-the-art methods solving the low-rank minimization problem (8) approximately.
In a second part, we give an overview of state-of-the-art algorithms computing factorizations of these approximated solutions of the form of (4) or (5).

3.1 Tractable Approximations to Problem (8)

Using the notations introduced in Section 2, problem (8) can be rewritten as
$$
A^\star_k \in \arg\min_{A :\, \operatorname{rank}(A) \le k} \| Y - A X \|_F^2, \tag{9}
$$
where $\|\cdot\|_F$ refers to the Frobenius norm. A detailed review of the following state-of-the-art approximations of $A^\star_k$ is provided in our technical note [15].

Truncation of the unconstrained solution. A first approximation consists in removing the low-rank constraint in problem (9). As pointed out by Tu et al. in [34], the problem then boils down to a least-squares problem
$$
\arg\min_{A} \| Y - A X \|_F^2, \tag{10}
$$
admitting the closed-form solution $Y X^\dagger$. Matrix $Y X^\dagger$ also solves the constrained problem (9) in the case where $k \ge m$, and in particular for $k = m$, i.e.,
$$
A^\star_m = Y X^\dagger. \tag{11}
$$
This solution relies on the SVD of $X$: $A^\star_m = Y V_X \Sigma_X^\dagger U_X^\intercal$, which is computed with a complexity of $\mathcal{O}(m^2(m+n))$ [11]. An approximation of the solution of (9) satisfying the low-rank constraint $\operatorname{rank}(A) \le k$ with $k < m$ is then obtained by a truncation of the SVD or the EVD of $A^\star_m$ using $k$ terms.

Approximation by low-rank projected DMD. The "projected DMD" proposed by Schmid [32] is an approximation of $A^\star_m$ which assumes that the columns of $A^\star_m X$ are in the span of $X$. This assumption is used by Jovanovic et al. [20] to approximate $A^\star_k$ for $k < m$. Their approach yields the so-called "low-rank projected DMD" approximation of (9), which takes the following form:
$$
A^\star_k \approx U_X \tilde{Y}_k \Sigma_X^\dagger U_X^\intercal, \tag{12}
$$
where $\tilde{Y}_k$ denotes the SVD representation of matrix $\tilde{Y} = U_X^\intercal Y V_X$ truncated to $k$ terms. Similar low-dimensional parametrizations of the optimal solution $A^\star_k$ are used to compute the so-called "optimized DMD" in [5] or "optimal mode decomposition" in [37].
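The two approximations above, truncation of (11) and low-rank projected DMD (12), can be sketched in a few lines of numpy. This is an illustrative sketch on random data, assuming $X$ has full column rank so that $\Sigma_X^\dagger = \operatorname{diag}(1/\sigma_{X,i})$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 40, 15, 5
X = rng.standard_normal((n, m))      # random snapshot matrices (full rank here)
Y = rng.standard_normal((n, m))

# Truncated DMD: k-term SVD truncation of the unconstrained solution (11).
A_m = Y @ np.linalg.pinv(X)                           # A*_m = Y X^+
U, s, Vt = np.linalg.svd(A_m, full_matrices=False)
A_trunc = (U[:, :k] * s[:k]) @ Vt[:k]

# Low-rank projected DMD (12): truncate Y~ = U_X^T Y V_X instead.
U_X, s_X, Vt_X = np.linalg.svd(X, full_matrices=False)
Yt = U_X.T @ Y @ Vt_X.T                               # m x m
Uy, sy, Vyt = np.linalg.svd(Yt)
Yt_k = (Uy[:, :k] * sy[:k]) @ Vyt[:k]                 # k-term truncation of Y~
A_proj = U_X @ Yt_k @ np.diag(1.0 / s_X) @ U_X.T      # Sigma_X^+ = diag(1/s_X)

# Both approximations satisfy the rank constraint of (9).
print(np.linalg.matrix_rank(A_trunc), np.linalg.matrix_rank(A_proj))
```

Both matrices are feasible for (9) but, as shown in Section 5, neither is optimal in general.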
The computation of low-rank projected DMD relies on the SVDs of $X \in \mathbb{R}^{n \times m}$ and $\tilde{Y} \in \mathbb{R}^{m \times m}$ and thus involves a complexity of $\mathcal{O}(m^2(m+n))$ [11].

Approximation by sparse DMD. Jovanovic et al. also propose in [20] a two-stage approach which consists in searching a $k$-term approximation of the EVD of an unconstrained projected DMD, i.e., the EVD of $U_X \tilde{Y} \Sigma_X^\dagger U_X^\intercal$. The optimal truncated EVD is efficiently computed by solving a relaxed convex optimization problem, taking advantage of the EVD of $\tilde{Y} \Sigma_X^\dagger \in \mathbb{R}^{m \times m}$. The overall complexity of this procedure scales as $\mathcal{O}(m^2(m+n))$.

Approximation by total-least-squares DMD. This approximation is proposed by Hemati et al. [17]. Using the projector $P_{K^\intercal,k} = V_{K,k} (V_{K,k})^\intercal$, where the columns of $V_{K,k} \in \mathbb{R}^{m \times k}$ are the right singular vectors associated with the $k$ largest singular values of the matrix $K = (X^\intercal \; Y^\intercal)^\intercal \in \mathbb{R}^{2n \times m}$, the total-least-squares DMD (TLS DMD) approximation takes the form
$$
A^\star_k \approx Y P_{K^\intercal,k} X^\dagger. \tag{13}
$$
This method relies on the SVDs of $K \in \mathbb{R}^{2n \times m}$ and $X \in \mathbb{R}^{n \times m}$ and thus has a complexity of $\mathcal{O}(m^2(m+n))$.

Approximation by convex relaxation. Some works propose to approximate (9) by a regularized version of the unconstrained problem (10), using Tikhonov penalization [25] or a penalization enforcing structured sparsity [38]. However, these choices of regularizers do not guarantee in general that the solution is low-rank. In contrast, the solution of (9) may, under certain theoretical conditions [24, 19], be recovered by the following quadratic program:
$$
A^\star_k \approx \arg\min_{A \in \mathbb{R}^{n \times n}} \| Y - A X \|_F^2 + \alpha_k \| A \|_*
= \arg\min_{A \in \mathbb{R}^{n \times n}} \min_{B \in \mathbb{R}^{n \times n}} \| Y - A X \|_F^2 + \alpha_k \| B \|_* \quad \text{s.t. } A = B, \tag{14}
$$
where $\|\cdot\|_*$ refers to the nuclear norm (or trace norm) of the matrix, i.e., the sum of its singular values. In optimization problem (14), $\alpha_k \in \mathbb{R}^+$ represents an appropriate regularization parameter determining the rank $k$ of the solution.
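To make the relaxation concrete, here is a small numerical sketch of one way to approach (14): a proximal-gradient iteration whose proximal operator for the nuclear norm is singular-value soft-thresholding. This is not the paper's procedure (the experiments below use ADMM); the value of `alpha` and the iteration count are arbitrary choices:

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(1)
n, m, alpha = 30, 12, 5.0
X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))

A = np.zeros((n, n))
step = 1.0 / (2 * np.linalg.norm(X, 2)**2)     # 1 / Lipschitz constant of the gradient
for _ in range(500):
    grad = 2 * (A @ X - Y) @ X.T               # gradient of ||Y - A X||_F^2
    A = svt(A - step * grad, step * alpha)      # proximal step on the nuclear norm

print(np.linalg.matrix_rank(A, tol=1e-8))      # rank induced by the choice of alpha
```

Larger `alpha` shrinks more singular values to zero, hence a lower rank; tuning it to hit a prescribed rank $k$ requires a search, which is one practical drawback of the relaxation.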
Program (14) is a convex optimization problem [27] which can be efficiently solved using modern optimization techniques, such as the alternating direction method of multipliers (ADMM) [2]. The algorithms solving (14) typically involve a per-iteration complexity of $\mathcal{O}(m(m^2+n^2))$.

3.2 Factorizations of Approximations of $A^\star_k$

In this section, we provide an overview of some state-of-the-art methods to compute factorizations of the form of (4) or (5) for the approximations of $A^\star_k$ presented above.

We first note that a brute-force computation of the SVD or EVD of a matrix in $\mathbb{R}^{n \times n}$ leads in general to a prohibitive computational cost, since it requires a complexity of $\mathcal{O}(n^3)$. Fortunately, the factorizations (4) or (5) are computable with a complexity of $\mathcal{O}(m^2(m+n))$ in most of the cases mentioned above. In particular, in the case of low-rank projected DMD, a straightforward factorization of the form of (4) is $P = U_X$ and $Q^\intercal = \tilde{Y}_k \Sigma_X^\dagger U_X^\intercal$. In the case of sparse DMD, the latter factorization holds by substituting $\tilde{Y}_k$ by the "sparse" approximation of $\tilde{Y}$. Another straightforward factorization of the form of (4) is intrinsic to the ADMM procedure, which uses an SVD to compute the regularized solution.

Concerning EVD factorization, in the case of the truncated approach, Tu et al. propose an algorithm scaling in $\mathcal{O}(m^2(m+n))$ [34]. In the context of low-rank projected DMD or sparse DMD, Jovanovic et al. propose a procedure of analogous complexity, which approximates the first $m$ eigenvectors and then estimates the related eigenvalues by solving a convex optimization problem [20]. In the case of TLS DMD, the diagonalization of a certain matrix in $\mathbb{R}^{m \times m}$ suffices to obtain the sought EVD factorization.

In summary, on the one hand, we saw in Section 3.1 that all existing algorithms compute in general sub-optimal solutions of problem (9).
On the other hand, the literature does not provide a "turnkey" algorithm for factorizing the optimal solution $A^\star_k$ in the form of (4) or (5) with a reduced complexity of $\mathcal{O}(m^2(m+n))$. In the next section, we show how to compute an optimal solution $A^\star_k$ and how to compute its factorization efficiently.

4 The Proposed Approach

In this section, we provide a closed-form solution to problem (9). Algorithms are then proposed to compute and factorize this solution in the form of (4) or (5).

4.1 Closed-Form Solution to (9)

Let the columns of matrix $U_{Z,k} = (u_Z^1 \cdots u_Z^k) \in \mathbb{R}^{n \times k}$ be the left singular vectors $\{u_Z^i\}_{i=1}^{k}$ associated with the $k$ largest singular values of the matrix
$$
Z = Y P_{X^\intercal} \in \mathbb{R}^{n \times m}, \tag{15}
$$
where we recall that $P_{X^\intercal} = V_X V_X^\intercal$, and consider the projector
$$
P_{Z,k} = U_{Z,k} U_{Z,k}^\intercal. \tag{16}
$$
Matrix (16) appears in the closed-form solution of (9), as shown in the following theorem. We detail the proof in Appendix A.

Theorem 1. Problem (9) admits the following solution:
$$
A^\star_k = P_{Z,k} Y X^\dagger. \tag{17}
$$
Moreover, the optimal approximation error can be expressed as
$$
\| Y - A^\star_k X \|_F^2 = \sum_{i=k+1}^{m} \sigma_{Z,i}^2 + \| Y (I_m - P_{X^\intercal}) \|_F^2. \tag{18}
$$

In words, Theorem 1 shows that problem (9) is simply solved by computing the orthogonal projection of the solution of the unconstrained problem (10) onto the subspace spanned by the first $k$ left singular vectors of $Z$. The $\ell_2$-norm of the error is simply expressed in terms of the singular values of $Z$ and the squared norm of the projection of the rows of $Y$ onto the orthogonal complement of the image of $X^\intercal$. If $X$ has full rank $m$, we obtain the simplifications $P_{X^\intercal} = I_m$ and $Z = Y$. In this case, the second term on the right-hand side of (18) vanishes and the approximation error reduces to $\| Y - A^\star_k X \|_F^2 = \sum_{i=k+1}^{m} \sigma_{Y,i}^2$.
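Theorem 1 lends itself to a direct numerical sanity check. The sketch below (assuming numpy; the random linear system, noise level and dimensions are arbitrary choices) builds the snapshot matrices $X$, $Y$ of Section 2, evaluates the closed form (17), and compares the achieved error with the right-hand side of (18):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, T, k = 40, 8, 3, 5                 # ambient dim, trajectories, length, target rank
m = N * (T - 1)

# Hypothetical snapshots from a random linear system, stacked as in Section 2.
F = rng.standard_normal((n, n)) / np.sqrt(n)
X_blocks, Y_blocks = [], []
for _ in range(N):
    x = [rng.standard_normal(n)]
    for _ in range(T - 1):
        x.append(F @ x[-1] + 0.1 * rng.standard_normal(n))  # mild model mismatch
    X_blocks.append(np.column_stack(x[:-1]))   # times 1..T-1
    Y_blocks.append(np.column_stack(x[1:]))    # times 2..T
X, Y = np.column_stack(X_blocks), np.column_stack(Y_blocks)

# Closed-form solution (17): A*_k = P_{Z,k} Y X^+, with Z = Y P_{X^T}.
U_X, s_X, Vt_X = np.linalg.svd(X, full_matrices=False)
Z = Y @ (Vt_X.T @ Vt_X)                  # Y projected onto the row space of X
U_Z, s_Z, _ = np.linalg.svd(Z, full_matrices=False)
P_Zk = U_Z[:, :k] @ U_Z[:, :k].T
A_k = P_Zk @ Y @ np.linalg.pinv(X)

# The achieved error matches the optimal value (18).
err = np.linalg.norm(Y - A_k @ X, 'fro')**2
opt = np.sum(s_Z[k:]**2) + np.linalg.norm(Y - Z, 'fro')**2
print(np.isclose(err, opt))  # True
```

Note that $\| Y (I_m - P_{X^\intercal}) \|_F^2 = \| Y - Z \|_F^2$, so the second term of (18) needs no extra computation once $Z$ is formed.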
The latter error is independent of the matrix $X$ and is simply the sum of the squares of the $m-k$ smallest singular values of $Y$. This error also corresponds to the optimal error for the approximation of $Y$ by a matrix of rank at most $k$ in the Frobenius norm [9].

It is worth mentioning that we propose in [14] a generalization of Theorem 1 to separable infinite-dimensional Hilbert spaces. This generalization characterizes the solution of low-rank approximations in reproducing kernel Hilbert spaces (where $n = \infty$) at the core of kernel-based DMD [16, 36], and characterizes the solution of the DMD counterpart (where $m = \infty$) of the continuous POD problem presented in [30, Theorem 6.2].

4.2 Algorithm Evaluating $A^\star_k$

Algorithm 1: Computation of $A^\star_k$, a solution of (9)
inputs: $(X, Y)$.
1) Compute the SVD of $X$: $X = U_X \Sigma_X V_X^\intercal$.
2) Compute $Z = Y V_X \Sigma_X \Sigma_X^\dagger V_X^\intercal$.
3) Compute the SVD of $Z$ to obtain the projector $P_{Z,k}$.
4) Compute $A^\star_k = P_{Z,k} Y V_X \Sigma_X^\dagger U_X^\intercal$.
output: $A^\star_k$.

The design of an algorithm computing the solution (17) is straightforward: evaluating $A^\star_k$ consists in forming a product of easily-computable matrices. The proposed procedure is summarized in Algorithm 1. Steps 1) to 3) of Algorithm 1 imply the computation of the SVDs of matrices $X, Z \in \mathbb{R}^{n \times m}$ and matrix multiplications involving $m^2$ vector products in $\mathbb{R}^n$ or $\mathbb{R}^m$. The complexity of these first three steps is therefore $\mathcal{O}(m^2(m+n))$.

Algorithm 2: EVD of $A^\star_k$, or low-rank DMD
inputs: $(X, Y)$.
1) Compute steps 1 to 3 of Algorithm 1 and use (19) to obtain $W$.
2) Let $r = \operatorname{rank}(A^\star_k)$ and solve for $i = 1, \dots, r$ the eigen-equations
$$
(W^\intercal U_{Z,k}) w_i^r = \lambda_i w_i^r \quad \text{and} \quad (U_{Z,k}^\intercal W) w_i^\ell = \lambda_i w_i^\ell,
$$
where $w_i^r, w_i^\ell \in \mathbb{C}^k$ and $\lambda_i \in \mathbb{C}$ are such that $|\lambda_i| \ge |\lambda_{i+1}|$.
3) Compute for $i = 1, \dots, r$ the right and left eigenvectors
$$
\zeta_i = U_{Z,k} w_i^r \quad \text{and} \quad \xi_i = W w_i^\ell. \tag{20}
$$
4) Rescale the $\xi_i$'s so that $\xi_i^\intercal \zeta_i = 1$.
outputs: $L = (\xi_1 \cdots \xi_r)$, $R = (\zeta_1 \cdots \zeta_r)$, $S = \operatorname{diag}(\lambda_1, \dots, \lambda_r)$.

Computing explicitly each entry of $A^\star_k \in \mathbb{R}^{n \times n}$ in step 4) of Algorithm 1 then requires a complexity of $\mathcal{O}(n^2 k)$, which is prohibitive for large $n$. However, as detailed in the next section, this last step is not necessary to factorize the optimal solution $A^\star_k$ in the form of (4) or (5).

4.3 Algorithms Factorizing $A^\star_k$

Given the closed-form solution (17), we present in what follows how to compute from $X$ and $Y$ a factorization of the optimal solution $A^\star_k$ in the form of (4) or (5). We will need the matrix
$$
W = (U_{Z,k}^\intercal Y X^\dagger)^\intercal \in \mathbb{R}^{n \times k}. \tag{19}
$$

Factorization of the form of (4). By performing the first three steps of Algorithm 1 and then making the identifications $P = U_{Z,k}$ and $Q = W$, we obtain a factorization of $A^\star_k$ of the form of (4). As mentioned in the introduction, trajectories of (2) can then be computed with system (3) by setting $R = U_{Z,k}$, $L = W$ and $S = W^\intercal U_{Z,k}$. The method relies on the first three steps of Algorithm 1 and on the computation of the matrix $W$. The three steps of Algorithm 1 imply a complexity of $\mathcal{O}(m^2(m+n))$, while the computation of $W$ requires a complexity of $\mathcal{O}(nk^2)$. Since $k \le m$, the off-line complexity to build the factorization (4) from $X$ and $Y$ scales as $\mathcal{O}(m^2(m+n))$, which is the same order of complexity as the procedures described in Section 3.

Factorization of the form of (5). According to the previous factorization of the form of (4), $A^\star_k$ is the product of the matrix $U_{Z,k} \in \mathbb{R}^{n \times k}$ with the matrix $W^\intercal \in \mathbb{R}^{k \times n}$. Therefore, using standard matrix analysis, we expect the eigenvectors of $A^\star_k$ to belong to a $k$-dimensional subspace [11]. As shown in the next proposition, the non-zero eigenvalues of $A^\star_k$ are obtained by EVD of certain matrices in $\mathbb{R}^{k \times k}$. The proof of this proposition is given in Appendix C.
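The EVD factorization can be sketched as follows (assuming numpy; instead of solving the second eigen-equation of step 2 separately, this sketch pairs the left eigenvectors with the right ones via the inverse of the right-eigenvector matrix, which is equivalent when $A^\star_k$ is diagonalizable with distinct non-zero eigenvalues):

```python
import numpy as np

def low_rank_dmd_evd(X, Y, k):
    """Sketch of Algorithm 2: economy-size EVD of A*_k from k x k eigenproblems."""
    # Steps 1-3 of Algorithm 1: U_{Z,k}, then W = (U_{Z,k}^T Y X^+)^T of (19).
    Vt_X = np.linalg.svd(X, full_matrices=False)[2]
    U_Zk = np.linalg.svd(Y @ (Vt_X.T @ Vt_X), full_matrices=False)[0][:, :k]
    W = (U_Zk.T @ Y @ np.linalg.pinv(X)).T                  # n x k
    # Eigenpairs of the k x k matrix W^T U_{Z,k}; the matching left
    # eigenvectors are read off the inverse of the right-eigenvector matrix.
    lam, w_r = np.linalg.eig(W.T @ U_Zk)
    w_l = np.linalg.inv(w_r).T
    zeta = U_Zk @ w_r                  # right eigenvectors of A*_k
    xi = (W @ w_l) / lam               # left eigenvectors, rescaled: xi_i^T zeta_i = 1
    return xi, zeta, lam               # L, R, and the diagonal of S in (3)

rng = np.random.default_rng(0)
n, m, k = 30, 12, 4
X = rng.standard_normal((n, m))
Y = rng.standard_normal((n, m))
L, R, lam = low_rank_dmd_evd(X, Y, k)

# Check: R diag(lam) L^T reproduces the closed-form solution (17).
Vt_X = np.linalg.svd(X, full_matrices=False)[2]
U_Zk = np.linalg.svd(Y @ (Vt_X.T @ Vt_X), full_matrices=False)[0][:, :k]
A_k = U_Zk @ U_Zk.T @ Y @ np.linalg.pinv(X)
print(np.allclose(R @ np.diag(lam) @ L.T, A_k))  # True
```

Only $k \times k$ matrices are diagonalized, so the cost is dominated by the SVDs of $X$ and $Z$, in line with the stated $\mathcal{O}(m^2(m+n))$ off-line complexity.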
Proposition 1. Assume $A^\star_k$ is diagonalizable. The elements of $\{\zeta_i, \xi_i, \lambda_i\}_{i=1}^{\operatorname{rank}(A^\star_k)}$ generated by Algorithm 2 are the right eigenvectors, the left eigenvectors and the eigenvalues of the economy-size EVD of $A^\star_k$.

In words, Proposition 1 shows that Algorithm 2 computes the EVD of $A^\star_k$ by diagonalizing two matrices in $\mathbb{R}^{k \times k}$. The complexity to build this EVD from the snapshots $X$ and $Y$ is $\mathcal{O}(m^2(m+n))$. More precisely, as mentioned previously, performing the first three steps of Algorithm 1 (i.e., step 1) of Algorithm 2) requires a number of operations scaling as $\mathcal{O}(m^2(m+n))$; the complexity of step 2) is $\mathcal{O}(k^3)$ since it performs the EVDs of $k \times k$ matrices; step 3) involves $r \times n$ vector products in $\mathbb{R}^k$, while step 4) involves $r$ vector products in $\mathbb{R}^n$, with $r \le k \le m$. Overall, the complexity of Algorithm 2 is dominated by step 1), and the EVD of $A^\star_k$ can be evaluated with a computational cost of the order of $\mathcal{O}(m^2(m+n))$.

5 Numerical Evaluation

In what follows, we evaluate five different trajectory approximations $\tilde{x}_t(\theta)$ obtained by reduced models of the form of (6), themselves obtained by EVD factorizations of different low-rank matrix approximations $\hat{A}_k$. The assessed low-rank matrix factorizations are listed below.

– Optimal approximation: the EVD of the optimal solution $A^\star_k$, provided by Algorithm 2.
– Approximation by truncated DMD [34]: the $k$-th order truncation of the EVD of the unconstrained problem solution $A^\star_m$ given in (11).
– Approximation by low-rank projected DMD [20]: the EVD of the $k$-th order approximation (12).³
– Approximation by TLS DMD [17]: the EVD of (13).
– Approximation by convex relaxation: the EVD of the solution of (14), the latter being computed by an ADMM procedure, where the regularization parameter $\alpha_k$ is adjusted to obtain a rank equal to $k$.

Rather than evaluating the error norm of the approximation, i.e.,
the cost of the target problem (7), we are interested in the ability of the different algorithms to minimize the cost of the proxy (9) for this problem. Therefore, the performance is measured in terms of the normalized reconstruction error norm $\| Y - \hat{A}_k X \|_F \, \| Y \|_F^{-1}$ as a function of $k$. Besides, in the analysis perspective adopted most often in the DMD literature [17, 20, 32, 34], we are interested in evaluating the ability of the algorithms to compute accurately the EVD of $A^\star_k$. In particular, we quantify, for a given $k$, the deviation of the set of the $k$ largest estimated eigenvalues $\lambda(\hat{A}_k) \in \mathbb{C}^k$ from the non-zero optimal ones $\lambda(A^\star_k)$ in terms of the normalized error norm $\| \lambda(\hat{A}_k) - \lambda(A^\star_k) \|_2 \, \| \lambda(A^\star_k) \|_2^{-1}$.

³ We do not evaluate the sparse DMD approach since the error norm induced by this method will always be greater than the one induced by low-rank projected DMD; see details in [15].

We begin by evaluating the performance of the low-rank approximations on different toy models in Section 5.1. We then assess their performance for the reduction of a Rayleigh-Bénard convective system [4] in Section 5.2. We finally evaluate the influence of noise on the estimation accuracy in Section 5.3.

5.1 Synthetic Experiments with Toy Models

We set $n = 50$, $N = 30$, and set the trajectory length in (1) to $T = 2$. The entries of matrix $X$, i.e., the initial conditions $\theta$, are such that $X = \sum_{i=1}^{\operatorname{rank}(X)} \varphi_i \varphi_i^\intercal$ with $\operatorname{rank}(X) \le m$, where the $\varphi_i$'s are $n$-dimensional independent samples of the standard normal distribution. Matrix $Y$ is then generated using model (1) and three different choices for $f_t$:

– setting i): $f_t(x_{t-1}) = G x_{t-1}$, where $G$ is chosen such that $G X \in \operatorname{span}(X)$,
– setting ii): $f_t(x_{t-1}) = F x_{t-1}$,
– setting iii): $f_t(x_{t-1}) = F(x_{t-1} + x_{t-1}^3)$.
Matrix $F$ introduced above is a random matrix of rank $m$ defined as $F = \sum_{i=1}^{m} \varphi_i \varphi_i^\intercal$, where the $\varphi_i$'s represent a set of $n$-dimensional independent samples of the standard normal distribution. We define implicitly the matrix $G$ of rank $m$ by drawing the parameters of the so-called companion matrix [32] with independent samples of the standard normal distribution. The notation $x_{t-1}^3$ denotes that each entry of vector $x_{t-1}$ has been raised to the power 3.

Fig. 1: Reconstruction error norm as a function of $k$ for settings i), ii) and iii) (for $\operatorname{rank}(X) = m$ and $\operatorname{rank}(X) = m - 6$) using our optimal approximation or state-of-the-art approximations (truncated DMD [Tu14], projected DMD [Jovanovic12], TLS DMD [Hemati17], convex relaxation by ADMM). See details in Section 5.1.

Fig. 2: Eigenvalue error norm as a function of $k$ for settings i), ii) and iii) (for $\operatorname{rank}(X) = m$ and $\operatorname{rank}(X) = m - 6$) using our optimal approximation or state-of-the-art approximations. See details in Section 5.1.

The first setting is a linear system satisfying the subspace assumption $G X \in \operatorname{span}(X)$, on which low-rank projected DMD relies. The two other settings do not make this assumption and simulate respectively linear and cubic dynamics. The performance of the five methods in terms of reconstruction and eigenvalue errors is displayed in Figures 1 and 2. In accordance with our theoretical results, the proposed algorithm yields the smallest error norms in all the scenarios.
Moreover, in accordance with Theorem 1, as long as we have a full-rank matrix $X$, the optimal solution reaches a zero reconstruction error for $k \ge m$ with $m = N(T-1) = 30$ (we have considered $N = 30$ snapshots and $T = 2$ successive states). The deterioration of the reconstruction error norm for the approximation by truncated DMD shows that a two-stage approach is sub-optimal; however, this deterioration is moderate in these toy experiments. Moreover, the experiments show that the approximation by low-rank projected DMD achieves the optimal performance as long as the subspace assumption holds, i.e., for setting i). If this assumption is not satisfied, i.e., in settings ii) and iii), we observe a poor performance of this projected approach for $k > 10$. Contrary to low-rank projected DMD, TLS DMD performs poorly in the case where $G X \in \operatorname{span}(X)$, i.e., for setting i), while it yields a quasi-optimal error norm in the other settings. Furthermore, we observe that the approximation provided by the convex relaxation approach differs significantly from $A^\star_k$ in all the considered settings, as indicated by the error norm for $k < m$. This suggests that the theoretical conditions necessary to recover $A^\star_k$ by convex relaxation do not hold here. Finally, as expected in the case where $\operatorname{rank}(X) = m$, the linear operator used to generate the snapshots is accurately recovered by our optimal approximation and by truncated DMD, TLS DMD and ADMM for $k \ge m$. In the case where $\operatorname{rank}(X) = m - 6$, we observe (in good agreement with Theorem 1) that the optimal approximation error is lower-bounded by $\| Y (I_m - P_{X^\intercal}) \|_F^2$ for $k \ge m$.

The eigenvalue error plots show that low-rank projected DMD and TLS DMD are optimal respectively in setting i) and settings ii)-iii), as long as $X$ is full rank. However, in other circumstances these methods are sub-optimal for $k < m$.
Interestingly, we observe that eigenvalues are well estimated by all the methods in all settings when k ≥ m, although low-rank projected DMD exhibits a high reconstruction error in settings ii)-iii). These toy experiments show that even if eigenvalues are well estimated, the related eigenvectors can be flawed.

5.2 Physical Experiments

The Rayleigh-Bénard model [4] constitutes a benchmark for convective systems in geophysics. Convection is driven by two coupled partial differential equations ruling the evolution of temperature and vorticity. After discretization of these equations on the cell [0, 1] × [0, 1/2], we obtain a discrete system of the form of (1) with x_t ∈ R^n, n = 1024. The regime of the convective system is parametrized by two quantities: the Rayleigh number Ra ∈ R_+ and the Prandtl number Pr ∈ R_+. The initial condition θ = h(ϑ) is parametrized by a vector ϑ ∈ R^4, through the non-linear function h : R^4 → R^1024, see details in [15]. Entries of the vector ϑ are sampled uniformly on (a bounded sub-domain of) R^4, yielding the samples {ϑ_i}_i. Then the first eigenvectors of the proper orthogonal decomposition of the set {h(ϑ_i)}_i are used to form a hyper-cube of dimension d = 10. Finally, the set of initial conditions {θ_i}_i is obtained by uniform sampling on this hyper-cube. For a particular parametrization of the initial condition, Ra and Pr, the non-linear Rayleigh-Bénard system can simplify into a linear Taylor vortex evolution [33]. A precise description of the Rayleigh-Bénard model, its parametrization and discretization is provided in [15].
Three datasets of m = 50 snapshots of the discretized system trajectories are computed by numerical simulation:
– setting iv): N = 50 trajectories of the linear Taylor vortex with T = 2,
– setting v): N = 5 trajectories of the linear Taylor vortex with T = 11,
– setting vi): N = 5 trajectories of the non-linear Rayleigh-Bénard system with T = 11.
The performance of the different algorithms in terms of reconstruction error is plotted as a function of k in Figure 3 in the case of these three settings.

Fig. 3 Reconstruction error norm as a function of k for settings iv), v) and vi) using our optimal approximation or state-of-the-art approximations. See details in Section 5.2.

We first comment on results obtained in setting iv). The error obtained by the optimal approximation in this linear setting with T = 2 appears to vanish for k ≥ d, i.e., for a dimensionality greater than the initial condition dimensionality. This is consistent with Theorem 1. Indeed, we know by definition (15) that rank(Z) ≤ rank(Y); then, dealing with a linear model implies that rank(Y) ≤ rank(X); as T = 2 we have rank(X) = d and therefore rank(Z) ≤ d, which yields Σ_{i=d+1}^m σ²_{Z,i} = 0; in addition, we observe that the term ‖Y(I_m − P_{X^⊤})‖²_F ≈ 2 × 10⁻⁸ can be neglected in our experiments. It follows from the theorem that the optimal error vanishes for k ≥ d. The approximation by truncated DMD is associated with an important error which vanishes only for k = m, i.e., for a dimensionality equal to the number of snapshots.
Concerning the approximation by low-rank projected DMD, it produces a fairly good solution up to k ≤ 8. However, the approximation becomes clearly sub-optimal for greater dimensions and produces a non-negligible error, even for large values of k. TLS DMD yields for k < 10 an approximation slightly less accurate than the one we propose, while the performances of the two methods are indistinguishable for k ≥ 10. The approach by convex relaxation produces fairly good results, however with a performance significantly lower than the other state-of-the-art methods. In setting v), we have longer sequences (T > 2) so that we have rank(X) ≥ d. Although the dynamic is linear, no conclusion can be drawn anymore from Theorem 1, except that the optimal approximation error vanishes for k ≥ m. However, our optimal approximation yields an error which nearly vanishes for k ≥ d. This shows that, for this linear model, trajectories concentrate near the subspace spanned by the initial conditions. This explains the quasi-optimality of the approximation by low-rank projected DMD, which relies on a strong assumption of linear dependence of snapshots. The approximation by truncated DMD is again clearly sub-optimal and behaves analogously to setting iv). The performance of TLS DMD is in this setting nearly optimal, while convex relaxation is disappointing up to some extent. In the more realistic geophysical setting vi), we see that the optimal performance achieved by our approximation is far from being reached by the approximations by truncated or low-rank projected DMD. Nevertheless, the approximation obtained by TLS DMD is again nearly optimal. As in the linear settings, we observe that the optimal error is small for k ≥ d.
On the other hand, we clearly notice that the assumption used in the approximation by low-rank projected DMD does not hold for this non-linear model and produces an important error, even for large values of k. We also observe the poor performance of the approximations by truncated DMD and by convex relaxation.

5.3 Robustness to Noise

Fig. 4 Error norm ‖Y − Â_k X‖_F / ‖Y‖_F as a function of k for settings vii) and viii) using our optimal approximation or state-of-the-art approximations. See details in Section 5.3.

In the following, we intend to evaluate the ability of the different methods to extract the eigenvectors in (6) in the presence of noise. To this aim, we build a new dataset of N = 5 long trajectories with T = 11 (so that we get m = 50 snapshots) satisfying (6) with r = 3. The eigenvectors and eigenvalues in the set {(ξ_i, ζ_i, λ_i)}_{i=1}^3 are computed using Algorithm 2 in the context of setting vi). In other words, matrices X and Y are generated using (1) and the model f_t(x_{t−1}) = F x_{t−1} where

F = (ζ_1 ζ_2 ζ_3) diag(λ_1, λ_2, λ_3) (ξ_1 ξ_2 ξ_3)^⊤.   (21)

We then consider the two following snapshot configurations:
– setting vii): the original matrices X and Y,
– setting viii): a noisy version of matrices X and Y, where we have corrupted the snapshots with a zero-mean Gaussian noise so that the peak signal-to-noise ratio⁴ is 20 dB.

⁴ The peak signal-to-noise ratio is defined as 20 log_10 (max_{t,i} ‖x_t(θ_i)‖_∞ / σ), where σ denotes the standard deviation of the Gaussian noise.
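The noise corruption of setting viii) can be sketched as follows. This is an illustrative reading of the peak signal-to-noise ratio definition in the footnote (σ chosen so that 20 log₁₀(peak/σ) equals the target PSNR); the snapshot matrix here is a random stand-in, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(S, psnr_db):
    """Add zero-mean Gaussian noise to snapshot matrix S so that the peak
    signal-to-noise ratio 20*log10(peak / sigma) equals psnr_db, where
    peak is the largest entry magnitude over all snapshots."""
    peak = np.max(np.abs(S))
    sigma = peak / 10.0**(psnr_db / 20.0)
    return S + sigma * rng.standard_normal(S.shape), sigma

# Stand-in snapshot matrix (n = 1024 states, m = 50 snapshots) corrupted at 20 dB.
S = rng.standard_normal((1024, 50))
S_noisy, sigma = corrupt(S, psnr_db=20.0)
```

At 20 dB the noise standard deviation is one tenth of the peak amplitude, which is strong enough to blur fine spatial structures such as the eigenvector maps in Figure 5.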
Results are displayed in Figures 4 and 5. As expected, the optimal approximation error vanishes in the noiseless setting in the case where k ≥ r. We observe only a slight increase of the error in the presence of noise. This shows the robustness of the proposed method. Besides, we remark that the error norms obtained with TLS DMD and with the proposed optimal approach are indistinguishable in both the noiseless and noisy settings. The approximation by low-rank projected DMD in the noiseless case reproduces almost exactly the optimal behaviour, while its performance slightly deteriorates for k ≥ 2 in the noisy case. The quasi-optimal performance of this method in the noiseless setting shows that the assumption of linear dependence of snapshots is nearly valid. This assumption no longer holds when snapshots are corrupted by noise. The approximation by truncated DMD is clearly sub-optimal in the noiseless setting. More importantly, the performance of this method becomes dramatic in the presence of noise: the error difference with the optimal one is of the order of more than a decade. The deterioration is clearly visible in Figure 5. This figure displays the different approximations of eigenvector ζ_3 re-arranged in the form of a spatial map. In the presence of noise, the spatial structure of ζ_3 is completely rubbed out using the approximation by truncated DMD. Eigenvector ζ_3 is fairly recovered using our optimal approximation and roughly estimated using the approximation by low-rank projected DMD. Surprisingly, although TLS DMD yields a quasi-optimal approximation error norm, the structure of eigenvector ζ_3 is completely rubbed out, in a very similar manner to low-rank projected DMD.

6 Conclusion

This work shows that we can compute in polynomial time an optimal solution of the non-convex problem related to low-rank linear approximation.
As shown in Theorem 1, a closed-form solution is in fact the orthogonal projection of the unconstrained problem solution onto a specific low-dimensional subspace. The theorem also provides a closed-form characterization of the optimal approximation error. Based on these results, we show in Proposition 1 that the EVD of this optimal solution can be obtained directly from the snapshots with a complexity of O(m²(m + n)). This off-line complexity is the same as for state-of-the-art sub-optimal methods. Finally, we illustrate through numerical simulations in synthetic and physical setups the gain brought by using this optimal approximation.

A Proof of Theorem 1

We begin by showing the first part of the theorem, namely that A⋆_k = U_{Z,k} U_{Z,k}^⊤ Y X† is a solution of (9). We first prove in this paragraph the existence of a minimizer of (9). Let us show that we can restrict our attention to a minimization problem over the set

A = { Ã ∈ R^{n×n} : rank(Ã) ≤ k, Im(Ã^⊤) ⊆ Im(X) }.

Indeed, any matrix A ∈ { Ã ∈ R^{n×n} : rank(Ã) ≤ k } can be decomposed into two components A = A_∥ + A_⊥, where A_∥ belongs to the set A and the columns of A_∥ are orthogonal to those of A_⊥, i.e., A_⊥ (A_∥)^⊤ = 0. From this construction, we have that the rows of A_⊥ are orthogonal to the rows of X. Using this decomposition, we thus have that ‖Y − AX‖²_F = ‖Y − A_∥ X‖²_F. Moreover, because of this orthogonality property, we have that rank(A) = rank(A_∥) + rank(A_⊥) so that rank(A_∥) ≤ rank(A). In consequence, if A is a minimizer of (9), then A_∥ is also a minimizer since it leads to the same value of the cost function and since it is admissible: rank(A_∥) ≤ rank(A) ≤ k. Therefore, it is sufficient to find a minimizer over the set A.
Now, according to Weierstrass' theorem [2, Proposition A.8], the existence is guaranteed if the admissible set A is closed and the objective function ‖Y − AX‖²_F is coercive. Let us prove these two properties. We first show that A is closed. According to [12, Lemma 2.4], the set of low-rank matrices is closed. Moreover, it is well known that a linear subspace of a normed finite-dimensional vector space is closed [1, Chapter 7.2], so that the set of matrices { Ã ∈ R^{n×n} : Im(Ã^⊤) ⊆ Im(X) } is closed. Since A is the intersection of two closed sets, we deduce that A is closed. Next, we show coercivity. Let us consider the SVD of any A ∈ A: A = U_A Σ_A V_A^⊤, where Σ_A = diag(σ_{A,1}, ..., σ_{A,k}). From the definition of the Frobenius norm, we have for any A ∈ A, ‖A‖_F = (Σ_{i=1}^k σ²_{A,i})^{1/2}. We have that ‖A‖_F → ∞ if a non-empty subset of singular values, say {σ_{A,j}}_{j∈J}, tends to infinity. Therefore, we have

lim_{‖A‖_F→∞ : A∈A} ‖Y − AX‖²_F = lim_{‖A‖_F→∞ : A∈A} ( ‖Y‖²_F − 2 trace(Y^⊤ A X) + ‖AX‖²_F )
  = lim_{‖A‖_F→∞ : A∈A} ‖AX‖²_F = lim_{‖A‖_F→∞ : A∈A} ‖Σ_A V_A^⊤ X‖²_F
  = lim_{σ_{A,j}→∞ : A∈A, j∈J} Σ_{j=1}^n σ²_{A,j} ‖X^⊤ v_A^j‖²_2 = ∞.

The second equality is obtained because the dominant term when ‖A‖_F → ∞ is the quadratic one ‖AX‖²_F. The third equality follows from the invariance of the Frobenius norm to unitary transforms, while the last equality is obtained by noticing that ‖X^⊤ v_A^j‖_2 ≠ 0 because v_A^j ∈ Im(X) since A ∈ A. This shows that the objective function is coercive over the closed set A. Thus, using Weierstrass' theorem, this shows the existence of a minimizer of (9) in A and thus in { Ã ∈ R^{n×n} : rank(Ã) ≤ k }. We will no longer restrict our attention to the domain A in the following and come back to the original problem (9) over the set of low-rank matrices.
Next, problem (9) can be rewritten as the unconstrained minimization

A⋆_k ∈ arg min_{A = PQ^⊤ : P, Q ∈ R^{n×k}} ‖Y − AX‖²_F.   (22)

In the following we will use the first-order optimality condition of problem (22) to characterize its minimizers. A closed-form expression for a minimizer will then be obtained by introducing an additional orthonormality property. The first-order optimality condition and the additional orthonormality property are presented in the following lemma, which is proven in Appendix B.

Lemma 1 Problem (22) admits a solution such that

P^⊤ P = I_k   (23)
XY^⊤ P = XX^⊤ Q.   (24)

To find a closed-form expression of a minimizer of (22), we need to rewrite condition (24). We prove that this condition is equivalent to

P_{X^⊤} Y^⊤ P = X^⊤ Q.   (25)

Indeed, we show by contradiction that (24) implies that, for any solution of the form PQ^⊤, there exists Z ∈ R^{m×k} such that

P_{X^⊤} Y^⊤ P + Z = X^⊤ Q,   (26)

with columns of Z in ker(X). Indeed, if P_{X^⊤} Y^⊤ P + Z ≠ X^⊤ Q, then by multiplying both sides on the left by X we obtain P_X X Y^⊤ P + XZ = P_X X Y^⊤ P ≠ XX^⊤ Q. Since P_X is the orthogonal projector onto the subspace spanned by the columns of X, the latter relation implies that XY^⊤ P ≠ XX^⊤ Q, which contradicts (24). This proves that (24) implies (26). Now, since the columns of the two terms in the left-hand side of (26) are orthogonal and since the columns of the matrix in the right-hand side are in the image of X^⊤, we deduce that the only admissible choice is Z with columns belonging both to ker(X) and Im(X^⊤), i.e., Z is a matrix full of zeros. Therefore, we obtain the necessary condition (25). We have shown on the one hand that (24) implies (25). On the other hand, by multiplying on the left both sides of (25) by X, we obtain (24) (X P_{X^⊤} = X because XX† is the orthogonal projector onto the space spanned by the columns of X).
Therefore the necessary conditions (24) and (25) are equivalent. We are now ready to characterize a minimizer of (9). According to Lemma 1, we have

min_{A ∈ R^{n×n} : rank(A) ≤ k} ‖Y − AX‖²_F
  = min_{(P̃,Q̃) ∈ R^{n×k} × R^{n×k}} ‖Y − P̃Q̃^⊤X‖²_F  s.t.  P̃^⊤P̃ = I_k,  XY^⊤P̃ = XX^⊤Q̃,   (27)
  = min_{(P̃,Q̃) ∈ R^{n×k} × R^{n×k}} ‖Y − P̃Q̃^⊤X‖²_F  s.t.  P̃^⊤P̃ = I_k,  P_{X^⊤}Y^⊤P̃ = X^⊤Q̃,
  = min_{P̃ ∈ R^{n×k}} ‖Y − P̃P̃^⊤Y P_{X^⊤}‖²_F  s.t.  P̃^⊤P̃ = I_k,   (28)
  = min_{P̃ ∈ R^{n×k}} ‖(Y − P̃P̃^⊤Y) P_{X^⊤} + Y(I_m − P_{X^⊤})‖²_F  s.t.  P̃^⊤P̃ = I_k,
  = min_{P̃ ∈ R^{n×k}} ‖Z − P̃P̃^⊤Z‖²_F + ‖Y(I_m − P_{X^⊤})‖²_F  s.t.  P̃^⊤P̃ = I_k.   (29)

The second equality is obtained from the equivalence between (24) and (25). The third equality is obtained by introducing the second constraint in the cost function and noticing that projection operators are always symmetric, i.e., (P_{X^⊤})^⊤ = P_{X^⊤}, while the last equality follows from the definition of Z given in (15) and the orthogonality of the columns of the two terms. Problem (29) is a proper orthogonal decomposition problem with the snapshot matrix Z. The solution of this proper orthogonal decomposition problem is the matrix U_{Z,k} (with orthonormal columns) defined in Section 4.1, see e.g. [30, Proposition 6.1]. We thus obtain from (28) that

min_{A ∈ R^{n×n} : rank(A) ≤ k} ‖Y − AX‖²_F = ‖Y − U_{Z,k}U_{Z,k}^⊤ Y P_{X^⊤}‖²_F = ‖Y − P_{Z,k} Y P_{X^⊤}‖²_F.   (30)

Furthermore, we verify that A⋆_k = U_{Z,k} W^⊤ with W = (X^⊤)† Y^⊤ U_{Z,k} is a minimizer of (22). Indeed, since XX^⊤W = XX^⊤(X^⊤)† Y^⊤ U_{Z,k} = XY^⊤ U_{Z,k}, we check that (U_{Z,k}, W) is admissible for problem (27). We also check using (25) that ‖Y − U_{Z,k}W^⊤X‖²_F = ‖Y − P_{Z,k} Y P_{X^⊤}‖²_F, i.e., that (U_{Z,k}, W) reaches the minimum given in (30). In consequence, we have shown that problem (22), and equivalently problem (9), admits the minimizer A⋆_k = U_{Z,k}W^⊤ = P_{Z,k}YX†.

It remains to prove the second part of the theorem, namely the characterization of the approximation error. The sought result follows from standard proper orthogonal decomposition analysis. Indeed, according to [30, Proposition 6.1] the first term of the cost function in (29) evaluated at A⋆_k is ‖Z − P_{Z,k}Z‖²_F = Σ_{i=k+1}^m σ²_{Z,i}.

B Proof of Lemma 1

We begin by proving that any minimizer of (22) can be rewritten as PQ^⊤ where P^⊤P = I_k. Indeed, the existence of the SVD of Ã for any minimizer Ã ∈ R^{n×n} guarantees that ‖Y − ÃX‖²_F = ‖Y − U_Ã Σ_Ã V_Ã^⊤ X‖²_F, where U_Ã ∈ R^{n×k} possesses orthonormal columns. Making the identification P = U_Ã and Q = V_Ã Σ_Ã, we verify that ‖Y − ÃX‖²_F = ‖Y − PQ^⊤X‖²_F and that P possesses orthonormal columns. Next, any solution PQ^⊤ of (22) should satisfy the first-order optimality condition with respect to the j-th column q_j of matrix Q, that is

2[ −XY^⊤ p_j + Σ_{i=1}^k (p_i^⊤ p_j) XX^⊤ q_i ] = 0,

where the j-th column of matrix P is denoted p_j. In particular, a solution with P possessing orthonormal columns should satisfy XY^⊤ p_j = XX^⊤ q_j, or in matrix form XY^⊤ P = XX^⊤ Q. □

C Proof of Proposition 1

We have A⋆_k = P_{Z,k} Y X† = U_{Z,k} W^⊤, which implies that W^⊤ U_{Z,k} = U_{Z,k}^⊤ Y X† U_{Z,k} = U_{Z,k}^⊤ P_{Z,k} Y X† U_{Z,k} = U_{Z,k}^⊤ U_{Z,k} W^⊤ U_{Z,k}. Using the definition of the ζ_i's and ξ_i's in (20), since the w_i^r's and w_i^ℓ's are the right and left eigenvectors of W^⊤ U_{Z,k}, we verify that A⋆_k ζ_i = U_{Z,k} W^⊤ U_{Z,k} w_i^r = U_{Z,k} λ_i w_i^r = λ_i ζ_i, and that (A⋆_k)^⊤ ξ_i = Q̂ U_{Z,k}^⊤ W w_i^ℓ = W λ_i w_i^ℓ = λ_i ξ_i. Finally, ξ_i^⊤ ζ_i = 1 is a sufficient condition so that ξ_i^⊤ A⋆_k ζ_i = λ_i.
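Theorem 1 and Proposition 1 can be checked numerically. The sketch below, under assumed small dimensions and with a deliberately rank-deficient X so that the second error term is non-zero, computes A⋆_k = P_{Z,k} Y X† and verifies the closed-form error and the right eigenpairs; it is an illustration of the result, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, k = 40, 25, 20, 5            # hypothetical sizes; rank(X) = r < m

X = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
Y = rng.standard_normal((n, m))

Xp = np.linalg.pinv(X)                # X^dagger
P_row = Xp @ X                        # P_{X^T}: orthogonal projector onto Im(X^T)
Z = Y @ P_row                         # snapshot matrix Z of definition (15)

U, s, _ = np.linalg.svd(Z, full_matrices=False)
Uk = U[:, :k]                         # k leading left singular vectors of Z
A_k = Uk @ Uk.T @ Y @ Xp              # A*_k = P_{Z,k} Y X^dagger (Theorem 1)

# Closed-form error of Theorem 1: tail singular values of Z plus residual of Y
# outside the row space of X.
err2 = np.linalg.norm(Y - A_k @ X, 'fro')**2
theory = np.sum(s[k:]**2) + np.linalg.norm(Y @ (np.eye(m) - P_row), 'fro')**2

# Proposition 1 (right eigenpairs): eigenvectors of A*_k from the small k x k
# matrix W^T U_{Z,k}, with W = (X^T)^dagger Y^T U_{Z,k}.
W = Xp.T @ Y.T @ Uk
lam, wr = np.linalg.eig(W.T @ Uk)
Zeta = Uk @ wr                        # zeta_i = U_{Z,k} w_i^r satisfies A*_k zeta_i = lam_i zeta_i
```

The eigendecomposition acts on a k × k matrix only, which is the source of the O(m²(m + n)) complexity claimed for the EVD of the optimal solution.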
□

Acknowledgements The authors thank the "Agence Nationale de la Recherche" (ANR), which partially funded this research through the GERONIMO project (ANR-13-JS03-0002).

References

1. Auliac, G., Caby, J.: Mathématiques 3e Année: Topologie et analyse. Objectif Licence. EdiScience (2005)
2. Bertsekas, D.: Nonlinear Programming. Athena Scientific (1995)
3. Budišić, M., Mohr, R., Mezić, I.: Applied Koopmanism. Chaos: An Interdisciplinary Journal of Nonlinear Science 22(4), 047510 (2012)
4. Chandrasekhar, S.: Hydrodynamic and hydromagnetic stability. Courier Corporation (2013)
5. Chen, K.K., Tu, J.H., Rowley, C.W.: Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses. Journal of Nonlinear Science 22(6), 887–915 (2012)
6. Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numerica 24, 1–159 (2015). DOI 10.1017/S0962492915000033
7. Cui, T., Marzouk, Y.M., Willcox, K.E.: Data-driven model reduction for the Bayesian solution of inverse problems. International Journal for Numerical Methods in Engineering 102, 966–990 (2015)
8. Dawson, S.T.M., Hemati, M.S., Williams, M.O., Rowley, C.W.: Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Experiments in Fluids 57, 42 (2016). DOI 10.1007/s00348-016-2127-7
9. Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika 1(3), 211–218 (1936)
10. Fazel, M.: Matrix rank minimization with applications. Ph.D. thesis, Stanford University (2002)
11. Golub, G., Van Loan, C.: Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press (2013)
12. Hackbusch, W.: Tensor spaces and numerical tensor calculus, vol. 42. Springer Science & Business Media (2012)
13.
Hasselmann, K.: PIPs and POPs: The reduction of complex dynamical systems using principal interaction and oscillation patterns. Journal of Geophysical Research: Atmospheres 93(D9), 11015–11021 (1988)
14. Héas, P., Herzet, C.: Low-rank approximation of linear maps. arXiv e-prints (2018)
15. Héas, P., Herzet, C.: State-of-the-art algorithms for low rank dynamic mode decomposition. arXiv e-prints (2021)
16. Héas, P., Herzet, C., Combès, B.: Generalized kernel-based dynamic mode decomposition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020)
17. Hemati, M.S., Rowley, C.W., Deem, E.A., Cattafesta, L.N.: De-biasing the dynamic mode decomposition for applied Koopman spectral analysis of noisy datasets. Theoretical and Computational Fluid Dynamics 31(4), 349–368 (2017)
18. Horn, R.A., Johnson, C.R.: Matrix analysis. Cambridge University Press (2012)
19. Jain, P., Meka, R., Dhillon, I.S.: Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems, pp. 937–945 (2010)
20. Jovanović, M., Schmid, P., Nichols, J.: Low-rank and sparse dynamic mode decomposition. Center for Turbulence Research Annual Research Briefs, pp. 139–152 (2012)
21. Klus, S., Koltai, P., Schütte, C.: On the numerical approximation of the Perron-Frobenius and Koopman operator. arXiv preprint arXiv:1512.05997 (2015)
22. Kutz, J.N., Brunton, S.L., Brunton, B.W., Proctor, J.L.: Dynamic mode decomposition: Data-driven modeling of complex systems (2016)
23. Lee, K., Bresler, Y.: Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint. arXiv preprint (2009)
24. Lee, K., Bresler, Y.: ADMiRA: Atomic decomposition for minimum rank approximation. IEEE Transactions on Information Theory 56(9), 4402–4416 (2010)
25.
Li, Q., Dietrich, F., Bollt, E.M., Kevrekidis, I.G.: Extended dynamic mode decomposition with dictionary learning: a data-driven adaptive spectral decomposition of the Koopman operator. arXiv preprint arXiv:1707.00225 (2017)
26. Mesbahi, M., Papavassilopoulos, G.P.: On the rank minimization problem over a positive semidefinite linear matrix inequality. IEEE Transactions on Automatic Control 42(2), 239–243 (1997)
27. Mishra, B., Meyer, G., Bach, F., Sepulchre, R.: Low-rank optimization with trace norm penalty. SIAM Journal on Optimization 23(4), 2124–2149 (2013)
28. Parrilo, P.A., Khatri, S.: On cone-invariant linear matrix inequalities. IEEE Transactions on Automatic Control 45(8), 1558–1563 (2000)
29. Penland, C., Magorian, T.: Prediction of Niño 3 sea surface temperatures using linear inverse modeling. Journal of Climate 6(6), 1067–1076 (1993)
30. Quarteroni, A., Manzoni, A., Negri, F.: Reduced basis methods for partial differential equations: an introduction, vol. 92. Springer (2015)
31. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52(3), 471–501 (2010)
32. Schmid, P.J.: Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 656, 5–28 (2010)
33. Taylor, G., Green, A.: Mechanism of the production of small eddies from large ones. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 158(895), 499–521 (1937)
34. Tu, J.H., Rowley, C.W., Luchtenburg, D.M., Brunton, S.L., Kutz, J.N.: On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics 1(2), 391–421 (2014)
35. Williams, M.O., Kevrekidis, I., Rowley, C.: A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science 25(6), 1307–1346 (2015)
36.
Williams, M.O., Rowley, C.W., Kevrekidis, I.G.: A kernel-based approach to data-driven Koopman spectral analysis. arXiv preprint arXiv:1411.2260 (2014)
37. Wynn, A., Pearson, D., Ganapathisubramani, B., Goulart, P.J.: Optimal mode decomposition for unsteady flows. Journal of Fluid Mechanics 733, 473–503 (2013)
38. Yeung, E., Kundu, S., Hodas, N.: Learning Deep Neural Network Representations for Koopman Operators of Nonlinear Dynamical Systems. ArXiv e-prints (2017)

Fig. 5 Amplitudes related to temperature (left columns) and vorticity (right columns) of the right eigenvectors ζ_1, ζ_2, ζ_3 of the rank-3 matrix F defined in (21): ground truth and estimations obtained in the noisy setting viii) with our optimal approximation, with an approximation by truncated DMD, with an approximation by projected DMD, or with an approximation by total-least-squares DMD. See details in Section 5.3.
