Stability and instability in saddle point dynamics -- Part I


Authors: Thomas Holding, Ioannis Lestas

Abstract—We consider the problem of convergence to a saddle point of a concave-convex function via gradient dynamics. Since first introduced by Arrow, Hurwicz and Uzawa in [1], such dynamics have been used extensively in diverse areas; there are, however, features that render their analysis non-trivial. These include the lack of convergence guarantees when the function considered is not strictly concave-convex, and also the non-smoothness of subgradient dynamics. Our aim in this two-part paper is to provide an explicit characterization of the asymptotic behaviour of general gradient and subgradient dynamics applied to a general concave-convex function. We show that despite the nonlinearity and non-smoothness of these dynamics, their ω-limit set is comprised of trajectories that solve only explicit linear ODEs that are characterized within the paper. More precisely, in Part I an exact characterization is provided of the asymptotic behaviour of unconstrained gradient dynamics. We also show that when convergence to a saddle point is not guaranteed, the system behaviour can be problematic, with arbitrarily small noise leading to an unbounded variance. In Part II we consider a general class of subgradient dynamics that restrict trajectories to an arbitrary convex domain, and show that when an equilibrium point exists the limiting trajectories are solutions of subgradient dynamics on only affine subspaces. The latter is a smooth class of dynamics with an asymptotic behaviour exactly characterized in Part I, as solutions to explicit linear ODEs. These results are used to formulate corresponding convergence criteria and are demonstrated with several examples and applications presented in Part II.

Index Terms—Nonlinear systems, saddle points, gradient dynamics, large-scale systems.
I. INTRODUCTION

Finding the saddle point of a concave-convex function is a problem that is relevant in many applications in engineering and economics and has been addressed by various communities. It includes, for example, optimization problems that are reduced to finding the saddle point of a Lagrangian. The gradient method, first introduced by Arrow, Hurwicz and Uzawa [1], has been widely used in this context as it leads to decentralized update rules for network optimization problems. It has therefore been used extensively in areas such as resource allocation in communication and economic networks (e.g. [20], [22], [30], [12], [26], [24]), game theory [13], distributed optimization [14], [36], [31] and power networks [37], [10], [21], [8], [9], [34], [25], [28].

(Thomas Holding is with the Department of Mathematics, Imperial College London, United Kingdom; t.holding@imperial.ac.uk. Ioannis Lestas is with the Department of Engineering, University of Cambridge, Cambridge, CB2 1PZ, United Kingdom; icl20@cam.ac.uk. Paper [17] is a conference paper that includes preliminary results in this manuscript (Part I). This manuscript includes additional results of independent interest, detailed derivations and extensive discussion.)

Nevertheless, in broad classes of problems there are features that render the analysis of the asymptotic behaviour of gradient dynamics non-trivial. In particular, even though for a strictly concave-convex function convergence to a saddle point via gradient dynamics is ensured, when this strictness is lacking, convergence is not guaranteed and oscillatory solutions can occur.
The existence of such oscillations has been reported in various applications [1], [12], [18], [29]; however, an exact characterization of their explicit form for a general concave-convex function, which leads also to a necessary and sufficient condition for their existence, has not been provided in the literature and is one of the aims of Part I of this work.

Furthermore, when subgradient methods are used to restrict the dynamics to a convex domain (needed, e.g., in optimization problems), the dynamics become non-smooth in continuous time. This significantly increases the complexity of the analysis, as classical Lyapunov and LaSalle type techniques (e.g. [23]) cannot be applied. This is also reflected in the alternative approach taken for the convergence proof in [1] for subgradient dynamics applied to a strictly concave-convex Lagrangian with positivity constraints.

From an early stage it has been noted in the literature that the right-hand side of (sub)gradient dynamics is monotone [33]. This has been exploited to derive convergence results under appropriate strictness in the concavity/convexity properties [35], [15]. In a more recent study [5] it was pointed out that the invariance principle for hybrid automata in [27] cannot be applied in this context, and an alternative proof was given, by means of Carathéodory's invariance principle, of the convergence result in [1] mentioned above. Convergence criteria for unconstrained gradient dynamics were also derived in [4], and under positivity constraints in [6]. In general, rigorously proving convergence for the subgradient method, even in what would naively appear to be simple cases, is a non-trivial problem, and requires much machinery from non-smooth analysis [7], [16].
Our aim in this two-part paper is to provide an explicit characterization of the asymptotic behaviour of continuous-time gradient and subgradient dynamics applied to a general concave-convex function. Our analysis is carried out in a general setting, where the function with respect to which these dynamics are applied is not necessarily strictly concave-convex. Furthermore, a general class of subgradient dynamics is considered, where trajectories are restricted to an arbitrary convex domain. One of our main results is to show that despite the nonlinear and non-smooth character of these dynamics, their ω-limit set is comprised of trajectories that solve explicit linear ODEs.

Our main contributions can be summarized as follows:

• In Part I, we consider the gradient method applied to a general concave-convex function in an unconstrained domain, and provide an exact characterization of the limiting solutions, which can in general be oscillatory. In particular, we show that despite the nonlinearity of the dynamics, the trajectories converge to solutions that satisfy a linear ODE that is explicitly characterized. Furthermore, we show that when such oscillations occur, the dynamic behaviour can be problematic, in the sense that arbitrarily small stochastic perturbations lead to an unbounded variance when the set of saddle points includes a bi-infinite line.

• In Part II, we consider the subgradient method applied to a general concave-convex function with the trajectories restricted to an arbitrary convex domain, such that an equilibrium point exists. We show that despite the non-smooth character of these dynamics, their limiting behaviour is given by the solutions of one of an explicit family of linear ODEs.
In particular, these ODEs are shown to be solutions of subgradient dynamics on affine subspaces, which is a class of dynamics the asymptotic properties of which are exactly determined in Part I. These results are used to formulate corresponding convergence criteria, and various examples and applications are discussed.

It should be noted that there is a direct link between the results in Part I and Part II, as the dynamics that are proved to be associated with the asymptotic behaviour of the subgradient method are a class of dynamics that can be analysed with the framework introduced in Part I. Applications of the results in Part I will therefore be discussed in Part II, as in many cases (e.g. optimization problems with inequality constraints) a restricted domain for the concave-convex function needs to be considered.

Finally, we would also like to comment that the methodology used for the derivations in the two papers is of independent technical interest. In Part I the analysis is based on various geometric properties established for the saddle points of a concave-convex function. In Part II the non-smooth analysis is carried out by means of some more abstract results on corresponding semiflows that are applicable in this context, while also making use of the notion of a face of a convex set to characterize the asymptotic behaviour of the dynamics.

The Part I paper is structured as follows. In Section II we introduce various definitions and preliminaries that will be used throughout the paper. In Section III the problem formulation is given, and the main results are presented in Section IV, i.e. the characterization of the limiting behaviour of gradient dynamics. This section also includes an extension to a class of subgradient dynamics that restrict the trajectories to affine subspaces.
This is a technical result that will be used in Part II to characterize the limiting behaviour of general subgradient dynamics. The proofs of the results are given in the appendices.

II. PRELIMINARIES

A. Notation

Real numbers are denoted by R and non-negative real numbers by R_+. For vectors x, y ∈ R^n the inequality x < y denotes the corresponding elementwise inequality, i.e. x_i < y_i for all i; d(x, y) denotes the Euclidean metric and |x| denotes the Euclidean norm. The space of k-times continuously differentiable functions is denoted by C^k. For a sufficiently differentiable function f(x, y) : R^n × R^m → R we denote the vector of partial derivatives of f with respect to x as f_x, and respectively f_y. The Hessian matrices with respect to x and y are denoted f_xx and f_yy, while f_xy denotes the matrix of partial derivatives defined as [f_xy]_{ij} := ∂²f/(∂x_i ∂y_j). For a vector-valued function g : R^n → R^m we let g_x denote the matrix formed by the partial derivatives of the elements of g, i.e. [g_x]_{ij} = ∂g_i/∂x_j. For a matrix A ∈ R^{n×m} we denote its kernel and transpose by ker(A) and A^T respectively. If A is in addition symmetric, we write A < 0 if A is negative definite.

1) Geometry: For subspaces E ⊆ R^n we denote the orthogonal complement by E^⊥, and for a set of vectors E ⊆ R^n we denote their span by span(E), their affine hull by aff(E) and their convex hull by Conv(E). A set K ⊂ R^n is a bi-infinite line if it is the affine hull of two distinct points in R^n. The addition of a vector v ∈ R^n and a set E ⊆ R^n is defined as v + E = {v + u : u ∈ E}.
For a set K ⊂ R^n, we denote the interior, relative interior, boundary and closure of K by int K, relint K, ∂K and K̄ respectively, and we say that K and M are orthogonal, and write K ⊥ M, if for any two pairs of points k, k′ ∈ K and m, m′ ∈ M we have (k′ − k)^T (m − m′) = 0. Given a set E ⊆ R^n and a function φ : E → E, we say that φ is an isometry of (E, d), or simply an isometry, if for all x, y ∈ E we have d(φ(x), φ(y)) = d(x, y). For x ∈ R, y ∈ R_+ we define [x]^+_y = x if y > 0 and max(0, x) if y = 0.

2) Convex geometry: For a closed convex set K ⊆ R^n and z ∈ R^n, we define the maximal orthogonal linear manifold to K through z as

M_K(z) = z + span({u − u′ : u, u′ ∈ K})^⊥   (1)

and the normal cone to K through z as

N_K(z) = {w ∈ R^n : w^T (z′ − z) ≤ 0 for all z′ ∈ K}.   (2)

When K is an affine space, N_K(z) is independent of z ∈ K and is denoted N_K. If K is in addition non-empty, then we define the projection of z onto K as P_K(z) = argmin_{w ∈ K} d(z, w).

B. Concave-convex functions and saddle points

Definition 1 (Concave-convex function). Let K ⊆ R^{n+m} be non-empty, closed and convex. We say that a function ϕ(x, y) : K → R is concave-convex on K if, for any (x′, y′) ∈ K, ϕ(x, y′) is a concave function of x and ϕ(x′, y) is a convex function of y. If either the concavity or the convexity is always strict, we say that ϕ is strictly concave-convex on K.

Definition 2 (Saddle point). For a concave-convex function ϕ : R^n × R^m → R we say that (x̄, ȳ) ∈ R^{n+m} is a saddle point of ϕ if for all x ∈ R^n and y ∈ R^m we have the inequality

ϕ(x, ȳ) ≤ ϕ(x̄, ȳ) ≤ ϕ(x̄, y).

If ϕ is in addition C¹, then (x̄, ȳ) is a saddle point if and only if ϕ_x(x̄, ȳ) = 0 and ϕ_y(x̄, ȳ) = 0.
When we consider a concave-convex function ϕ(x, y) : R^n × R^m → R we shall denote the pair z = (x, y) ∈ R^{n+m} in bold, and write ϕ(z) = ϕ(x, y). The full Hessian matrix will then be denoted ϕ_zz. Vectors in R^{n+m} and matrices acting on them will be denoted in bold font (e.g. A). Saddle points of ϕ will be denoted z̄ = (x̄, ȳ) ∈ R^{n+m}.

C. Dynamical systems

Definition 3 (Flows and semi-flows). A triple (φ, X, ρ) is a flow (resp. semi-flow) if (X, ρ) is a metric space and φ is a continuous map from R × X (resp. R_+ × X) to X which satisfies the two properties:

(i) For all x ∈ X, φ(0, x) = x.
(ii) For all x ∈ X and t, s ∈ R (resp. R_+),

φ(t + s, x) = φ(t, φ(s, x)).   (3)

When there is no confusion over which (semi-)flow is meant, we shall denote φ(t, x(0)) as x(t). For sets A ⊆ R (resp. R_+) and B ⊆ X we define φ(A, B) = {φ(t, x) : t ∈ A, x ∈ B}. We say that a trajectory x(t) of a semi-flow converges to a trajectory y(t) of the semi-flow if

ρ(x(t), y(t)) → 0 as t → ∞.   (4)

Definition 4 (Global convergence). We say that a (semi-)flow (φ, X, ρ) is globally convergent if, for all initial conditions x ∈ X, the trajectory φ(t, x) converges to an equilibrium point.

A specific form of incremental stability, which we will refer to as pathwise stability, will be needed in the analysis that follows.

Definition 5 (Pathwise stability). We say that a semi-flow (φ, X, ρ) is pathwise stable if for any two trajectories x(t), x′(t) the distance ρ(x(t), x′(t)) is non-increasing in time.

Note that in a pathwise stable semi-flow (φ, X, ρ), for each t ∈ R_+ the map from x ∈ X to φ(t, x) is non-expansive. As the subgradient method has a discontinuous vector field, we need the notion of Carathéodory solutions of differential equations.

Definition 6 (Carathéodory solution).
We say that a trajectory z(t) is a Carathéodory solution to a differential equation ż = f(z) if z is an absolutely continuous function of t and, for almost all times t, the derivative ż(t) exists and is equal to f(z(t)).

III. PROBLEM FORMULATION

The main object of study in Part I is the gradient method on an arbitrary concave-convex function in C².

Definition 7 (Gradient method). Given a C² concave-convex function ϕ on R^{n+m}, we define the gradient method as the flow on (R^{n+m}, d), where d denotes the Euclidean metric, generated by the differential equation

ẋ = ϕ_x,  ẏ = −ϕ_y.   (5)

It is clear that the saddle points of ϕ are exactly the equilibrium points of (5). In our companion paper [19] we study instead the subgradient method, where the gradient method (Definition 7) is restricted to a convex set K by the addition of a projection term to the differential equation (5).

Definition 8 (Subgradient method). Given a non-empty closed convex set K ⊆ R^{n+m} and a C² function ϕ that is concave-convex on K, we define the subgradient method on K as the semi-flow on (K, d) consisting of Carathéodory solutions of

ż = f(z) − P_{N_K(z)}(f(z)),  f(z) = (ϕ_x, −ϕ_y)^T.   (6)

Note that the gradient method is the subgradient method on R^{n+m}. It should also be noted that the subgradient method (6) can be seen as a projected dynamical system with the ODE given by ż = P_{T_K(z)}(f(z)), where T_K(z) is the tangent cone to K at z (see [3] for various equivalent representations). In Appendix C we also consider the addition of constant gains to the gradient and subgradient methods.

We briefly summarise below the main contributions of this paper (Part I).
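The two behaviours of the gradient method (5) discussed above, oscillation without strictness and convergence with it, can be sketched numerically. The sketch below uses two hypothetical test functions of our own choosing (they do not appear in the paper): the bilinear ϕ(x, y) = xy, whose dynamics are a pure rotation around the saddle point 0, and the strictly concave-convex ϕ(x, y) = −x² + y², whose dynamics contract to 0.

```python
import numpy as np

def flow(grad, z0, h=1e-3, T=10.0):
    """Forward-Euler integration of z' = grad(z); returns the trajectory."""
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for _ in range(int(T / h)):
        z = z + h * grad(z)
        traj.append(z.copy())
    return np.array(traj)

# phi(x, y) = x*y  ->  x' = y, y' = -x: a rotation, so solutions oscillate
osc = flow(lambda z: np.array([z[1], -z[0]]), [1.0, 0.0])
r = np.linalg.norm(osc, axis=1)        # distance to the saddle point 0

# phi(x, y) = -x**2 + y**2  ->  x' = -2x, y' = -2y: strictly concave-convex
conv = flow(lambda z: np.array([-2 * z[0], -2 * z[1]]), [1.0, 1.0])

print(r.min(), r.max())                # stays ~1: a periodic orbit
print(np.linalg.norm(conv[-1]))        # ~0: converges to the saddle point
```

The constant distance to the saddle point along the bilinear orbit is exactly the behaviour that Corollary 10 and the set S formalize later in the paper.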
• We provide an exact characterization of the limiting solutions of the gradient method (5) applied to an arbitrary concave-convex function, which is not assumed to be strictly concave-convex. Despite the nonlinearity of the gradient dynamics, we show that these limiting solutions solve an explicit linear ODE given by derivatives of the concave-convex function at a saddle point.

• We show that the lack of convergence in gradient dynamics can lead to problematic behaviour, in the sense that arbitrarily small stochastic perturbations lead to an unbounded variance when the set of saddle points includes a bi-infinite line.

• We provide an exact classification of the limiting solutions of the subgradient method on affine subspaces by extending the result described in the first bullet point. This will be important for the analysis of the general subgradient dynamics considered in Part II [19]. In particular, we show in Part II that the limiting behaviour of the subgradient method on arbitrary convex domains reduces to the limiting behaviour on affine subspaces.

IV. MAIN RESULTS

This section includes the main results of the paper. Before presenting those, some known results from the literature are stated. A known property of the gradient method (5) is the fact that it is pathwise stable, which is stated below as Lemma 9. This follows from the fact that the negative of the right-hand side in (5) is maximal monotone [33], [15].

Lemma 9. Let ϕ be C² and concave-convex on R^{n+m}; then the gradient method (5) is pathwise stable.

Since saddle points are equilibrium points of the gradient method, the result below immediately follows¹.

¹ See also Figure 1 in Appendix A for a graphical illustration.

Corollary 10. Let ϕ be C² and concave-convex on R^{n+m}; then the distance of a solution of (5) to any saddle point is non-increasing in time.
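Pathwise stability (Lemma 9) can be observed directly in simulation: the distance between any two trajectories of (5) never increases. A minimal sketch, assuming the hypothetical choice ϕ(x, y) = −x² + xy (our own example, for which (5) reads ẋ = −2x + y, ẏ = −x):

```python
import numpy as np

def grad(z):
    # Gradient-method vector field (phi_x, -phi_y) for phi = -x**2 + x*y
    x, y = z
    return np.array([-2 * x + y, -x])

h, steps = 1e-3, 10_000
z1, z2 = np.array([2.0, -1.0]), np.array([-1.0, 3.0])  # two trajectories
dists = []
for _ in range(steps):
    dists.append(np.linalg.norm(z1 - z2))
    z1 = z1 + h * grad(z1)
    z2 = z2 + h * grad(z2)
dists = np.array(dists)

# The inter-trajectory distance is non-increasing (up to the O(h^2)
# discretization error of forward Euler), as Lemma 9 asserts.
print(dists[0], dists[-1])
```

Taking one of the two trajectories to start at a saddle point recovers Corollary 10 as a special case.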
Using LaSalle's theorem and Lemma 9 we obtain Corollary 11, which is proved in Appendix A. Note that the notion of convergence to a solution in Corollary 11 (see its definition in (4)) is stronger than that of convergence to a set.

Corollary 11. Let ϕ be C², concave-convex on R^{n+m} and have at least one saddle point. Then each solution of the gradient method (5) converges to a solution of (5) which has constant distance from any saddle point.

Thus classifying the limiting behaviour of the gradient method reduces to the problem of finding all solutions that lie a constant distance from any saddle point. In order to facilitate the presentation of the results, for a given concave-convex function ϕ we define the following sets:

• S̄ will denote the set of saddle points of ϕ.
• S will denote the set of solutions to (5) that are a constant distance from any saddle point of ϕ.

Note that if S̄ = S ≠ ∅ then Corollary 11 gives the convergence of the gradient method to a saddle point. Our first main result is that solutions of the gradient method converge to solutions that satisfy an explicit linear ODE. To present our results we define the following matrices of partial derivatives of ϕ:

A(z) = [ 0, ϕ_xy(z); −ϕ_yx(z), 0 ],   B(z) = [ ϕ_xx(z), 0; 0, −ϕ_yy(z) ].   (7)

For simplicity of notation we shall state the result for 0 ∈ S̄; the general case may be obtained by a translation of coordinates.

Theorem 12. Let ϕ be C² and concave-convex on R^{n+m}, and let 0 ∈ S̄. Then all solutions in S solve the linear ODE

ż(t) = A(0) z(t).   (8)

Furthermore, a solution z(t) to (8) is in S if and only if for all t ∈ R and r ∈ [0, 1],

z(t) ∈ ker(B(r z(t))) ∩ ker(A(r z(t)) − A(0))   (9)

where A(z) and B(z) are defined by (7).

The proof of Theorem 12 is provided in Appendix A and Appendix B. The significance of this result is discussed in the remarks below.
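Theorem 12 can be illustrated numerically on a hypothetical example of our own choosing: ϕ(x₁, x₂, y) = −x₁² + x₂y on R² × R, which has saddle point 0 and gradient dynamics ẋ₁ = −2x₁, ẋ₂ = y, ẏ = −x₂. Here B(0) forces x₁ = 0 along limiting solutions via (9), and the nonlinear vector field agrees with A(0)z in the limit:

```python
import numpy as np

# Matrices (7) at the saddle point 0 for phi = -x1**2 + x2*y
A0 = np.array([[0.0, 0.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, -1.0, 0.0]])    # A(0): coefficients of the limiting ODE (8)
B0 = np.diag([-2.0, 0.0, 0.0])       # B(0): ker(B(0)) is the plane x1 = 0

def grad(z):
    x1, x2, y = z
    return np.array([-2 * x1, y, -x2])   # gradient-method vector field (5)

z = np.array([1.0, 1.0, 0.0])
h = 1e-3
for _ in range(10_000):              # integrate (5) up to t = 10
    z = z + h * grad(z)

# In the limit, the trajectory enters ker(B(0)) and the nonlinear
# dynamics reduce to the linear ODE (8): grad(z) ~ A(0) z.
residual = np.linalg.norm(grad(z) - A0 @ z)
print(abs(z[0]), residual)
```

The x₁ component decays exponentially while (x₂, y) keeps rotating, so the ω-limit is a non-trivial oscillation lying in ker(B(0)) that solves (8), exactly as the theorem describes.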
Remark 13. It should be noted that despite the nonlinearity of the gradient dynamics (5), the limiting solutions solve a linear ODE with explicit coefficients depending only on the derivatives of ϕ at the saddle point.

Remark 14. An important consequence of this exact characterisation of the limiting behaviour is the fact that the problem of proving global convergence to a saddle point is reduced to that of showing that there are no non-trivial limiting solutions.

Remark 15. Condition (9) appears to be very hard to check, as it requires knowledge of the trajectory for all times t ∈ R. However, when the aim is to prove convergence to an equilibrium point, the form of condition (9) makes the stability condition more powerful, as it makes it easier to prove that non-trivial trajectories do not satisfy the condition. In particular, for various classes of gradient dynamics, the structure of the matrices A(z) and B(z) is often sufficient to deduce that (8), (9) are only satisfied by saddle points, without explicitly knowing those a priori, thus proving global convergence to a saddle point. Such examples will be discussed in Part II of this manuscript, and include modifications that ensure global convergence to a saddle point without having to resort to an assumption of a strictly concave-convex function. An application of such a modification to the problem of multipath routing will also be discussed in Part II.

Remark 16 (Localisation). The conditions in the theorem use only local information about the concave-convex function ϕ, in the sense that if ϕ is only concave-convex on a convex subset K ⊆ R^{n+m} which contains 0, then any trajectory z(t) of the gradient method (5) that lies a constant distance from any saddle point in K and does not leave K at any time t will obey the conditions of the theorem.

Remark 17.
A main significance of the exact characterization of the limiting behaviour of the gradient method in Theorem 12 is the fact that it has a natural generalization to subgradient dynamics on affine subspaces (presented in Section IV-A), which can be used to characterize the limiting solutions of the subgradient method. The latter is a more involved problem, due to the non-smoothness of the dynamics, and is addressed in Part II of this manuscript.

Remark 18. One of the results proved in Appendix B is the fact that the set S is convex (Proposition 32)². From this it follows that global convergence of the gradient method can be deduced from only local convergence properties about one of the saddle points. This is stated below as Lemma 19, which is proved in Appendix B.

² This result is also generalized in Part II to pathwise stable semiflows.

Lemma 19. Let ϕ be C², concave-convex on R^{n+m}, and let z̄ be a saddle point. Then the gradient method (5) is globally convergent if and only if there exists a neighbourhood N of z̄ such that trajectories with initial condition in N converge to a saddle point.

As a simple illustration of the use of Theorem 12, we show how to recover the well-known result that the gradient method is globally convergent under the assumption that ϕ is strictly concave-convex (first part of Example 20). It is also known that it is sufficient to have the strictness only locally [6], which can be recovered from Lemma 19 (second part of Example 20). Note that in general strictness is not needed to deduce convergence; such examples will be discussed in Part II (see also Remark 15).

Example 20.
1) Suppose ϕ is strictly concave (the strictly convex case is similar); then ϕ_xx is of full rank except at isolated points, and condition (9) can only hold if x(t) = 0.
Then the ODE (8) implies that y(t) is constant, and hence (x(t), y(t)) is a saddle point. Thus the only limiting solutions of the gradient method (5) are the saddle points, which establishes that it is globally convergent.

2) Suppose ϕ is concave-convex, but it is strictly concave-convex only in an open region B ⊂ R^{n+m} that includes a saddle point z̄, rather than in the whole of R^{n+m}. From the stability of z̄ there exists a neighbourhood N ⊂ B of z̄ such that trajectories that start in N will remain in B for all times. Furthermore, using the arguments in 1), one can deduce that the region N does not include a trajectory z ∈ S that is not a saddle point. Therefore global convergence can be deduced from Lemma 19.

From Theorem 12 we deduce some further results that give a more easily understandable classification of the limiting solutions of the gradient method for simpler forms of ϕ. In particular, the 'linear' case occurs when ϕ is a quadratic function, as then the gradient method (5) is a linear system of ODEs. In this case S has a simple explicit form in terms of the Hessian matrix of ϕ at 0 ∈ S̄, and in general this provides an inclusion as described below, which can be used to prove global convergence of the gradient method using only local analysis at a saddle point.

Theorem 21. Let ϕ be C² and concave-convex on R^{n+m}, and let 0 ∈ S̄. Define

S_linear = span{v ∈ ker(B) : v is an eigenvector of A}   (10)

where A = A(0) and B = B(0) in (7). Then S ⊆ S_linear, with equality if ϕ is a quadratic function.

The proof of Theorem 21 is provided in Appendix B. Here we draw an analogy with the recent study [2] on the discrete-time gradient method in the quadratic case. There the gradient method is proved to be semi-convergent if and only if ker(B) = ker(A + B), i.e. if S_linear ⊆ S̄. Theorem 21 includes a continuous-time version of this statement.
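The subspace S_linear in (10) is directly computable. A sketch for the hypothetical quadratic ϕ(x₁, x₂, y) = −x₁² + x₂y (our own example; since ϕ is quadratic, Theorem 21 gives S = S_linear here). One interpretation we adopt: a complex eigenvector of A contributes its real and imaginary parts to the real span.

```python
import numpy as np

# A = A(0) and B = B(0) from (7) for phi = -x1**2 + x2*y
A = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, -1.0, 0.0]])
B = np.diag([-2.0, 0.0, 0.0])

# Collect eigenvectors of A that lie in ker(B); complex eigenvectors
# contribute their real and imaginary parts to the (real) span.
vecs = []
eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs are eigenvectors
for v in eigvecs.T:
    if np.linalg.norm(B @ v) < 1e-12:    # keep only eigenvectors in ker(B)
        vecs.extend([v.real, v.imag])

basis = np.array(vecs)
dim = np.linalg.matrix_rank(basis)       # dimension of S_linear
print(dim)                               # the (x2, y) oscillation plane
```

The eigenvector e₁ of A (eigenvalue 0) is rejected because Be₁ ≠ 0, while the complex pair for eigenvalues ±i spans the plane x₁ = 0, matching the oscillatory limiting solutions found for this example above.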
We next consider the effect of noise when oscillatory solutions occur, and show that arbitrarily small stochastic perturbations lead to an unbounded variance when the set of saddle points includes a bi-infinite line. In particular, we consider the addition of white noise to the dynamics (5). This leads to the following stochastic differential equations

dx(t) = ϕ_x dt + Σ_x dB_x(t)
dy(t) = −ϕ_y dt + Σ_y dB_y(t)   (11)

where B_x(t), B_y(t) are independent standard Brownian motions in R^n, R^m respectively, and Σ_x, Σ_y are positive definite symmetric matrices in R^{n×n}, R^{m×m} respectively.

Theorem 22. Let ϕ ∈ C² be concave-convex on R^{n+m}. Let 0 ∈ S̄ and let S contain a bi-infinite line. Consider the noisy dynamics (11). Then, for any initial condition, the variance of the solution tends to infinity as t → ∞, in that

E|z(t)|² → ∞ as t → ∞,   (12)

where E denotes the expectation operator.

The proof of Theorem 22 is provided in Appendix B. The condition that S contains a bi-infinite line is satisfied, for example, if the set S is not just a single point and ϕ is a quadratic function, and can occur in applications, e.g. in the multi-path routing example given in our companion paper [19].

One of the main applications of the gradient method is to provide convergence to a saddle point of a Lagrangian following from a concave optimization problem where some of the constraints are relaxed by Lagrange multipliers. When all the relaxed constraints are linear, the Lagrangian ϕ has the form

ϕ(x, y) = U(x) + y^T (Dx + e)   (13)

where U(x) is a concave cost function, y are the Lagrange multipliers, and D, e are a constant matrix and vector respectively, associated with the equality constraints. Under the assumption that U is analytic, we obtain a simple exact characterisation of S.
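The variance growth in Theorem 22 can be sketched with an Euler–Maruyama discretization of (11), again for the hypothetical bilinear ϕ(x, y) = xy (our own choice; its quadratic form makes S all of R², which contains bi-infinite lines). For this example the drift is the rotation ż = Az, so |z|² picks up no drift from A and E|z(t)|² = 2σ²t grows without bound:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, 0.0]])          # A(0) for phi = x*y
sigma, h, steps, paths = 0.1, 0.005, 2000, 4000  # integrate to t = 10

z = np.zeros((paths, 2))        # all sample paths start at the saddle point
msq = []                        # empirical E|z(t)|^2 over the ensemble
for _ in range(steps):
    dB = rng.normal(scale=np.sqrt(h), size=z.shape)   # Brownian increments
    z = z + h * z @ A.T + sigma * dB                  # Euler-Maruyama step
    msq.append(np.mean(np.sum(z**2, axis=1)))

# For this example E|z(t)|^2 = 2*sigma**2*t, i.e. ~0.2 at t = 10,
# growing linearly and without bound, consistent with Theorem 22.
print(msq[-1])
```

Even with σ = 0.1 the second moment grows linearly in t; shrinking σ only slows the rate, it does not bound the variance, which is the sense in which arbitrarily small noise is problematic.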
One specific case of this was studied by the authors previously in [18], but without the analyticity condition.

Theorem 23. Let ϕ be defined by (13) with U analytic and D ∈ R^{m×n}, e ∈ R^m constant. Assume that (x̄, ȳ) = z̄ is a saddle point of ϕ. Then S is given by

S = z̄ + span{(x, y) ∈ W × R^m : (x, y) is an eigenvector of [ 0, D^T; −D, 0 ]}   (14)

where

W = {x ∈ R^n : s ↦ U(sx + x̄) is linear for s ∈ R}.

Furthermore, W is an affine subspace.

The proof of Theorem 23 is provided in Appendix B.

Remark 24. It should be noted that a simple characterization of S as in Theorem 21 and Theorem 23 is not always necessary in order to prove global convergence to a saddle point by means of Theorem 12. For example, in the modification methods discussed in Part II the structure of the matrices A(z) and B(z) is sufficient to deduce global convergence.

A. The subgradient method on affine subspaces

We now extend the exact classification (Theorem 12) to the subgradient method on affine subspaces. The significance of this result is that it allows us to provide a characterization of the limiting behaviour of the subgradient method in any convex domain. In particular, one of the main results that will be proved in Part II of this work is the fact that the limiting behaviours of the subgradient method on a general convex domain, when an equilibrium point exists, are solutions to subgradient dynamics on only affine subspaces.

In order to consider subgradient dynamics on an affine subspace, we let V be an affine subspace of R^{n+m} and let Π ∈ R^{(n+m)×(n+m)} be the orthogonal projection matrix onto the orthogonal complement of the normal cone N_V. Then the subgradient method (6) on V is given by

ż = f(z) − P_{N_V}(f(z)) = Π f(z)   (15)

where f(z) = (ϕ_x, −ϕ_y)^T. We generalise Theorem 12 for this projected form of the gradient method.
As with the statement of Theorem 12, we state the result for 0 being an equilibrium point; the general case may be obtained by a translation of coordinates.

Theorem 25. Let Π ∈ R^{(n+m)×(n+m)} be an orthogonal projection matrix, let ϕ be C² and concave-convex on R^{n+m}, and let 0 be an equilibrium point of (15). Then the trajectories z(t) of (15) that lie a constant distance from any equilibrium point of (15) are exactly the solutions to the linear ODE

ż(t) = Π A(0) Π z(t)   (16)

that satisfy, for all t ∈ R and r ∈ [0, 1], the condition

z(t) ∈ ker(Π B(r z(t)) Π) ∩ ker(Π (A(r z(t)) − A(0)) Π)   (17)

where A(z) and B(z) are defined by (7).

The proof of Theorem 25 is provided in Appendix B.

V. APPLICATIONS

In many applications associated with saddle point problems, the variables need to be constrained in prescribed domains. These include, for example, positivity constraints on dual variables in optimization problems where some of the inequality constraints are relaxed with Lagrange multipliers, or more general convex constraints on primal variables. Therefore applications will be studied in Part II of this work, where subgradient dynamics will be analyzed³.

It should be noted that, apart from their significance for saddle point problems without constraints⁴, a main significance of the results in Part I is that they also lead to a characterization of the asymptotic behaviour of subgradient dynamics. In particular, as mentioned in Section IV-A, it will be proved in Part II of this work that the asymptotic behaviour of subgradient dynamics on a general convex domain is given by solutions to subgradient dynamics on only affine subspaces, which is a class of dynamics the asymptotic behaviour of which can be exactly determined using the results in Part I.
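The projected dynamics (15) are straightforward to simulate once Π is in hand. A minimal sketch, assuming our own hypothetical choices: ϕ(x, y) = −x² + xy with n = m = 1, and V the line through 0 spanned by (1, 1), so that Π is the rank-one orthogonal projector onto V:

```python
import numpy as np

# Orthogonal projection onto V = span{(1, 1)}; since V passes through 0,
# projecting onto the orthogonal complement of N_V is projecting onto V.
v = np.array([1.0, 1.0])
Pi = np.outer(v, v) / (v @ v)

def f(z):
    # f(z) = (phi_x, -phi_y) for the hypothetical phi = -x**2 + x*y
    x, y = z
    return np.array([-2 * x + y, -x])

z = Pi @ np.array([3.0, -1.0])       # initial condition on V
h = 1e-3
for _ in range(10_000):              # Euler steps of z' = Pi f(z), eq. (15)
    z = z + h * Pi @ f(z)

off_V = np.linalg.norm(z - Pi @ z)   # distance from the subspace V
print(off_V, np.linalg.norm(z))
```

Because Πf(z) always lies in V, the trajectory stays on V exactly; on V this example reduces to ż = −z, so the flow converges to the equilibrium 0 and the only constant-distance solution in Theorem 25 is the trivial one.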
CONCLUSION

We have considered in Part I the problem of convergence to a saddle point of a general concave-convex function, which is not necessarily strictly concave-convex, via gradient dynamics. We have provided an exact characterization of the asymptotic behaviour of such dynamics, and have shown that despite their nonlinearity, convergence is guaranteed to trajectories that satisfy an explicit linear ODE. We have also shown that when convergence to a saddle point is not ensured then the behaviour of such dynamics can be problematic, with arbitrarily small noise leading to an unbounded variance when the set of saddle points includes a bi-infinite line. These results have also been extended to subgradient dynamics on affine subspaces, where an exact characterization of their asymptotic behaviour as linear ODEs has also been derived. This class of dynamics will be used as a basis for the results in Part II. In particular, it will be shown in Part II that subgradient dynamics on a general convex domain that have an equilibrium point have an $\omega$-limit set that consists of trajectories that are solutions to subgradient dynamics on only affine subspaces. Various examples and applications will also be presented in Part II.

³It should be noted that there are also classes of constrained optimization problems that can be solved by means of smooth saddle point dynamics, such as the dynamics proposed in [11].
⁴Note that these include also dual versions of optimization problems with equality constraints.

REFERENCES

[1] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Non-linear Programming. Stanford University Press, 1958.
[2] Zhong-Zhi Bai. On semi-convergence of Hermitian and skew-Hermitian splitting methods for singular linear systems. Computing, 89(3-4):171–197, 2010.
[3] B. Brogliato, A. Daniilidis, C. Lemarechal, and V. Acary.
On the equivalence between complementarity systems, projected systems and differential inclusions. Systems & Control Letters, 55(1):45–51, 2006.
[4] A. Cherukuri, B. Gharesifard, and J. Cortés. Saddle-point dynamics: conditions for asymptotic stability of saddle points. SIAM Journal on Control and Optimization, 55(1):486–511, 2017.
[5] A. Cherukuri, E. Mallada, and J. Cortés. Asymptotic convergence of constrained primal-dual dynamics. Systems & Control Letters, 87:10–15, 2016.
[6] A. Cherukuri, E. Mallada, S. Low, and J. Cortés. The role of convexity on saddle-point dynamics: Lyapunov function and robustness. arXiv preprint arXiv:1608.08586, 2016.
[7] J. Cortés. Discontinuous dynamical systems: a tutorial on solutions, nonsmooth analysis, and stability. IEEE Control Systems Magazine, 28(3):36–73, 2008.
[8] E. Devane, A. Kasis, M. Antoniou, and I. Lestas. Primary frequency regulation with load-side participation, Part II: Beyond passivity approaches. IEEE Transactions on Power Systems, 32(5):3519–3528, 2017.
[9] E. Devane, A. Kasis, C. Spanias, M. Antoniou, and I. Lestas. Distributed frequency control and demand-side management. Smarter Energy: From Smart Metering to the Smart Grid, pages 245–268, 2016.
[10] F. Dörfler, J. Simpson-Porco, and F. Bullo. Breaking the hierarchy: Distributed control and economic optimality in microgrids. IEEE Transactions on Control of Network Systems, 3(3):241–253, 2016.
[11] H.-B. Dürr, E. Saka, and C. Ebenbauer. A smooth vector field for quadratic programming. In IEEE Conference on Decision and Control, pages 2515–2520, 2012.
[12] D. Feijer and F. Paganini. Stability of primal-dual gradient dynamics and applications to network optimization. Automatica, 46(12):1974–1981, 2010.
[13] B. Gharesifard and J. Cortés. Distributed convergence to Nash equilibria in two-network zero-sum games.
Automatica, 49(6):1683–1692, 2013.
[14] B. Gharesifard and J. Cortés. Distributed continuous-time convex optimization on weight-balanced digraphs. IEEE Transactions on Automatic Control, 59(3):781–786, 2014.
[15] R. Goebel. Stability and robustness for saddle-point dynamics through monotone mappings. Systems & Control Letters, 108:16–22, 2017.
[16] R. Goebel, R. G. Sanfelice, and A. R. Teel. Hybrid dynamical systems. IEEE Control Systems Magazine, 29(2):28–93, 2009.
[17] T. Holding and I. Lestas. On the convergence to saddle points of concave-convex functions, the gradient method and emergence of oscillations. In 53rd IEEE Conference on Decision and Control, 2014.
[18] T. Holding and I. Lestas. On the emergence of oscillations in distributed resource allocation. Automatica, 85:22–33, 2017.
[19] T. Holding and I. Lestas. Stability and instability in saddle point dynamics, Part II: The subgradient method. arXiv preprint arXiv:1707.07351, 2017.
[20] L. Hurwicz. The design of mechanisms for resource allocation. The American Economic Review, 63(2):1–30, May 1973.
[21] A. Kasis, E. Devane, C. Spanias, and I. Lestas. Primary frequency regulation with load-side participation, Part I: Stability and optimality. IEEE Transactions on Power Systems, 32(5):3505–3518, 2017.
[22] F. Kelly, A. Maulloo, and D. Tan. Rate control in communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49(3):237–252, March 1998.
[23] H. K. Khalil. Nonlinear Systems. Prentice Hall, 2002.
[24] I. Lestas and G. Vinnicombe. Combined control of routing and flow: a multipath routing approach. In 43rd IEEE Conference on Decision and Control, December 2004.
[25] N. Li, C. Zhao, and L. Chen. Connecting automatic generation control and economic dispatch from an optimization view. IEEE Transactions on Control of Network Systems, 3(3):254–264, 2016.
[26] S. H. Low and D. E. Lapsley. Optimization flow control, I: basic algorithm and convergence. IEEE/ACM Transactions on Networking, 7(6):861–874, 1999.
[27] J. Lygeros, K. H. Johansson, S. N. Simić, J. Zhang, and S. S. Sastry. Dynamical properties of hybrid automata. IEEE Transactions on Automatic Control, 48(1):2–17, 2003.
[28] E. Mallada, C. Zhao, and S. Low. Optimal load-side control for frequency regulation in smart grids. IEEE Transactions on Automatic Control, 62(3):6294–6309, 2017.
[29] D. Papadaskalopoulos and G. Strbac. Decentralized participation of flexible demand in electricity markets, Part I: Market mechanism. IEEE Transactions on Power Systems, 28:3658–3666, 2013.
[30] R. Srikant. The Mathematics of Internet Congestion Control. Birkhäuser, 2004.
[31] D. Richert and J. Cortés. Robust distributed linear programming. IEEE Transactions on Automatic Control, 60(10):2567–2582, 2015.
[32] R. T. Rockafellar. Convex Analysis. Princeton University Press, 2nd edition, 1972.
[33] R. T. Rockafellar. Saddle-points and convex analysis. Differential Games and Related Topics, 109, 1971.
[34] T. Stegink, C. De Persis, and A. van der Schaft. A unifying energy-based approach to stability of power grids with market dynamics. IEEE Transactions on Automatic Control, 62(6):2612–2622, 2017.
[35] V. I. Venets. Continuous algorithms for solution of convex optimization problems and finding saddle points of convex-concave functions with the use of projection operations. Optimization, 16(4):519–533, 1985.
[36] J. Wang and N. Elia. A control perspective for centralized and distributed convex optimization. In 50th IEEE Conference on Decision and Control and European Control Conference, pages 3800–3805, 2011.
[37] C. Zhao, U. Topcu, N. Li, and S. H. Low. Design and stability of load-side primary frequency control in power systems.
IEEE Transactions on Automatic Control, 59(5):1177–1189, 2014.

APPENDIX

In Appendices A and B we prove the main results of the paper, which are stated in Section IV. We first give a brief outline of the derivations of the results to improve their readability. Before we give this summary we define some additional notation. Given $\bar z \in \bar S$, we denote the set of solutions to the gradient method (5) that are a constant distance from $\bar z$ (but not necessarily from other saddle points) as $S_{\bar z}$. It is later proved that $S_{\bar z} = S$, but until then the distinction is important.

Outline of proofs:
• First, in Appendix A we use the pathwise stability of the gradient method (Lemma 9) and geometric arguments to establish convexity properties of $S$. Lemma 27 states that $\bar S$ can only contain bi-infinite lines in degenerate cases. Lemma 28 gives an orthogonality condition between $S$ and $\bar S$ which roughly says that the larger $\bar S$ is, the smaller $S$ is. These allow us to prove the key result of the section, Lemma 30, which states that any convex combination of $\bar z \in \bar S$ and $z(t) \in S_{\bar z}$ lies in $S_{\bar z}$.
• In Appendix B we use the geometric results of Appendix A to prove Theorems 12 and 21. To prove Theorem 22 we first prove Lemma 33 (analogous to Lemma 27), which tells us that $S$ containing a bi-infinite line implies the presence of a quantity conserved by all solutions of the gradient dynamics (5). In the presence of noise, the variance of this quantity converges to infinity, which allows us to prove Theorem 22. To prove Theorem 23 we construct a quantity $V(z)$ that is conserved by solutions in $S$. In the case considered this has a natural interpretation in terms of the utility function $U(x)$ and the constraints $g(x)$. Finally, Theorem 25 is proved by modifying the proof of Theorem 12 to take into account the addition of the projection matrix.
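To make the outline above concrete, the sketch below integrates the gradient method (5) for the bilinear function $\varphi(x, y) = xy$, an illustrative choice of ours rather than an example from the paper. This $\varphi$ is concave-convex but not strictly so, and the trajectory orbits the saddle point at a constant distance instead of converging; this is exactly the kind of limiting solution of the linear ODE $\dot z = A(0)z$ that the classification results describe.

```python
import numpy as np

# Gradient method (5) for the bilinear saddle function phi(x, y) = x*y:
#   xdot = phi_x = y,   ydot = -phi_y = -x.
# The unique saddle point is the origin.  phi is not strictly
# concave-convex, so convergence is not guaranteed: trajectories rotate
# about the saddle point, matching zdot = A(0) z with the skew-symmetric
# matrix A(0) = [[0, 1], [-1, 0]].
def f(z):
    x, y = z
    return np.array([y, -x])

dt, steps = 1e-4, 100000          # forward Euler up to t = 10
z = np.array([1.0, 0.0])
radii = []
for _ in range(steps):
    z = z + dt * f(z)
    radii.append(np.linalg.norm(z))

# Distance from the saddle point stays (numerically) constant ...
constant_distance = max(radii) - min(radii) < 1e-2
# ... and the trajectory does not converge to the saddle point.
no_convergence = np.linalg.norm(z) > 0.9
```

The small drift in the radius is purely a forward-Euler discretisation artifact; the continuous-time orbit has exactly constant distance from the saddle point.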
APPENDIX A
GEOMETRY OF $\bar S$ AND $S$

In this section we will use the gradient method to derive geometric properties of concave-convex functions. We will start with some simple results which are then used as a basis to derive Lemma 30, the main result of this section. We first provide a proof for Corollary 11, associated with the convergence of all solutions of the gradient method to solutions in the set $S$.

Proof of Corollary 11. Using LaSalle's invariance principle with $|z(t) - \bar z|^2$ as the Lyapunov-like function, where $\bar z$ is a saddle point, we get convergence of all solutions of the gradient method to the set of solutions that are a constant distance from all saddle points (denoted as set $S$). In the remainder of the proof we strengthen the convergence to the set $S$ to convergence to a solution⁵ in $S$ (see the definition of convergence to a solution in (4)).

Let $z(t)$ be a solution of (5). From the convergence to the set $S$ and Lemma 9 there exist points $z^{(n)} \in S$ and times $t_n$ such that

$$|z(t_n) - z^{(n)}| \le 1/n. \tag{18}$$

We now consider the solutions $z^{(n)}(t) \in S$ with $z^{(n)}(t_n) = z^{(n)}$. By an application of Lemma 9, we have for all $t \ge t_n$,

$$|z(t) - z^{(n)}(t)| \le 1/n. \tag{19}$$

From the boundedness of $z(t)$ and (19) the set $\{z^{(n)} : n \in \mathbb{N}\}$ is relatively compact, and by the constant distance of each trajectory in $S$ from all saddle points, the set of initial conditions $\{z^{(n)}(0) : n \in \mathbb{N}\}$ is also relatively compact. There is hence a subsequence $n_k$ for which $z^{(n_k)}(0) \to z'(0) \in S$ as $k \to \infty$. We claim that $|z(t) - z'(t)| \to 0$ as $t \to \infty$. Indeed, for any $\varepsilon > 0$ there exists a $k \in \mathbb{N}$ such that for all $t \ge t_{n_k}$ we have

$$|z(t) - z^{(n_k)}(t)| \le \varepsilon/2 \tag{20}$$

and also for all $t \ge 0$,

$$|z'(t) - z^{(n_k)}(t)| \le \varepsilon/2 \tag{21}$$

where in each case we have used Lemma 9.
The claim now follows from the triangle inequality, which completes the proof of Corollary 11.

Lemma 26 ([32]). Let $\varphi \in C^2$ be concave-convex on $\mathbb{R}^{n+m}$. Then $\bar S$, the set of saddle points of $\varphi$, is closed and convex.

Fig. 1. $\bar a$ and $\bar b$ are two saddle points of $\varphi$, which is $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. Solutions of (5) are constrained to lie in the shaded region for all positive time by Lemma 9.

Fig. 2. $L$ is a line of saddle points of $\varphi$, which is $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. Solutions of (5) starting on hyperplanes normal to $L$ are constrained to lie on these planes for all time. $z$ lies on one normal hyperplane, and $z + sb$ lies on another. Considering the solutions of (5) starting from each, we see by Lemma 9 that the distance between these two solutions must be constant and equal to $|sb|$.

Lemma 27. Let $\varphi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. Let the set of saddle points of $\varphi$ contain the infinite line $L = \{a + sb : s \in \mathbb{R}\}$ for some $a, b \in \mathbb{R}^{n+m}$. Then $\varphi$ is translation invariant in the direction of $L$, i.e. $\varphi(z) = \varphi(z + sb)$ for any $s \in \mathbb{R}$.

Proof. We do this in two steps. First we will prove that the motion of the gradient method is restricted to linear manifolds normal to $L$. Let $z$ be a point and consider the motion of the gradient method starting from $z$. As illustrated in Figure 1, we pick two saddle points $\bar a, \bar b$ on $L$; then by Lemma 9 the motion starting from $z$ is constrained to lie in the (shaded) region, which is the intersection of the two closed balls about $\bar a$ and $\bar b$ which have $z$ on their boundaries. The intersection of all such regions, generated by a sequence of pairs of saddle points on $L$ each tending to infinity in opposite directions along $L$, is contained in the linear manifold normal to $L$ that contains $z$.
Next we claim that for $s \in \mathbb{R}$ the motion starting from $z + sb$ is exactly the motion starting from $z$ shifted by $sb$. As illustrated in Figure 2, by Lemma 9 the motion from $z + sb$ must stay a constant distance $s|b|$ from the motion from $z$. This uniquely identifies the motion from $z + sb$ and proves the claim. Finally, we deduce the full result by noting that the second claim implies that $\varphi$ is defined up to an additive constant on each linear manifold, as the motion of the gradient method contains all the information about the derivatives of $\varphi$. As $\varphi$ is constant on $L$, the proof is complete.

⁵The proof is based on arguments analogous to those in [18, Proposition 42].

We now use these techniques to prove orthogonality results about solutions in $S$.

Lemma 28. Let $\varphi \in C^2$ be concave-convex on $\mathbb{R}^{n+m}$, and $z$ be a trajectory in $S$. Then $z(t) \in M_{\bar S}(z(0))$ for all $t \in \mathbb{R}$, where $M_{\bar S}(z(0))$ denotes the manifold defined in (1).

Proof. If $\bar S = \{\bar z\}$ or $\emptyset$ the claim is trivial. Otherwise we let $\bar a \ne \bar b \in \bar S$ be arbitrary, and consider the spheres centred at $\bar a$ and $\bar b$, respectively, that each have the point $z(t)$ on their boundary. By Lemma 9, $z(t)$ is constrained to lie on the intersection of these two spheres, which lies inside $M_L(z(0))$, where $L$ is the line segment between $\bar a$ and $\bar b$. As $\bar a$ and $\bar b$ were arbitrary, this proves the lemma.

Lemma 29. Let $\varphi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$, $\bar z \in \bar S$ and $z(t) \in S_{\bar z}$ lie in $M_{\bar S}(z(0))$ for all $t$. Then $z(t) \in S$.

Proof. If $\bar S = \{\bar z\}$ the claim is trivial. Let $\bar a \in \bar S \setminus \{\bar z\}$ be arbitrary. Then by Lemma 26 the line segment $L$ between $\bar a$ and $\bar z$ lies in $\bar S$. Let $b$ be the point of intersection between the line that includes the line segment $L$ and $M_{\bar S}(z(0))$. Then the definition of $M_{\bar S}(z(0))$ tells us that this line meets $M_{\bar S}(z(0))$ at right angles.
Since $d(b, \bar z)$ is constant and $d(z(t), \bar z)$ is constant as $z(t) \in S_{\bar z}$, it follows that $d(z(t), \bar a)$ is also constant (as illustrated in Figure 3). Indeed, we have

$$d(z(t), \bar a)^2 = d(z(t), b)^2 + d(b, \bar a)^2 = d(z(t), \bar z)^2 - d(b, \bar z)^2 + d(b, \bar a)^2 \tag{22}$$

and all the terms on the right hand side are constant.

Fig. 3. $\bar a$ and $\bar z$ are saddle points of $\varphi$, which is $C^2$ and concave-convex on $\mathbb{R}^{n+m}$, and $L$ is the line segment between them. $z$ is a point on a solution in $S_{\bar z}$ which lies on $M_{\bar S}(z)$, which is orthogonal to $L$ by definition. $b$ is the point of intersection between $M_{\bar S}(z)$ and the extension of $L$.

Using these orthogonality results we prove the key result of the section, a convexity result between $S_{\bar z}$ and $\bar z$.

Lemma 30. Let $\varphi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$, $\bar z \in \bar S$ and $z(t) \in S_{\bar z}$. Then for any $s \in [0, 1]$, the convex combination $z'(t) = (1-s)\bar z + sz(t)$ lies in $S_{\bar z}$. If in addition $z \in S$, then $z'(t) \in S$.

Proof. Clearly $z'$ is a constant distance from $\bar z$. We must show that $z'(t)$ is also a solution to (5). We argue in a similar way to Figure 2 but with spheres instead of planes. Let the solution to (5) starting at $z'(0)$ be denoted $z''(t)$. We must show this is equal to $z'(t)$. As $z(t) \in S_{\bar z}$ it lies on a sphere about $\bar z$, say of radius $r$, and by construction $z'(0)$ lies on a smaller sphere about $\bar z$ of radius $rs$. By Lemma 9, $d(z(t), z''(t))$ and $d(z''(t), \bar z)$ are non-increasing, so that $z''(t)$ must be within $rs$ of $\bar z$ and within $r(1-s)$ of $z(t)$. The only such point is $z'(t) = (1-s)\bar z + sz(t)$, which proves the claim.

For the additional statement, we consider another saddle point $\bar a \in \bar S$ and let $L$ be the line segment connecting $\bar a$ and $\bar z$.
By Lemma 28, $z(t)$ lies in $M_{\bar S}(z(0))$, so by construction $z'(t) \in M_{\bar S}(z'(0))$ (as illustrated by Figure 4). Hence, by Lemma 29, $z'(t) \in S$.

Fig. 4. $\bar z$ is a saddle point of $\varphi$, which is $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. $z$ is a point on a solution in $S$ and $z'$ is a convex combination of $z$ and $\bar z$. $M_{\bar S}(z)$ and $M_{\bar S}(z')$ are parallel to each other by definition.

In Appendix B we also prove that the set $S$ is convex (stated as Proposition 32).

APPENDIX B
CLASSIFICATION OF $S$

We will now proceed with a full classification of $S$ and prove Theorems 12-23. For notational convenience we will make the assumption (without loss of generality) that $0 \in \bar S$. Then we compute $\varphi_x(z), \varphi_y(z)$ from line integrals from $0$ to $z$. Indeed, letting $\hat z$ be a unit vector parallel to $z$, we have

$$\begin{pmatrix} \varphi_x(z) \\ -\varphi_y(z) \end{pmatrix} = \left( \int_0^{|z|} \begin{pmatrix} \varphi_{xx}(s\hat z) & \varphi_{xy}(s\hat z) \\ -\varphi_{yx}(s\hat z) & -\varphi_{yy}(s\hat z) \end{pmatrix} ds \right) \hat z. \tag{23}$$

Together with the definition of the matrices $A(z)$ and $B(z)$ given by (7) we obtain

$$\begin{pmatrix} \varphi_x(z) \\ -\varphi_y(z) \end{pmatrix} = \int_0^{|z|} (A(s\hat z) + B(s\hat z))\, \hat z\, ds. \tag{24}$$

We are now ready to prove the first main result.

Proof of Theorem 12. Define the set $X$ as the solutions of the ODE (8) which obey the condition (9) for all $t \in \mathbb{R}$ and $r \in [0, 1]$. Then Theorem 12 is the statement that $X = S$. For brevity we define the matrix $B'(z)$ by

$$B'(z) = B(z) + (A(z) - A(0)). \tag{25}$$

As $A(z)$ is skew-symmetric and $B(z)$ is symmetric we have $\ker(B'(z)) = \ker(B(z)) \cap \ker(A(z) - A(0))$, so that condition (9) is equivalent to

$$z(t) \in \ker(B'(rz(t))) \text{ for all } t \in \mathbb{R},\ r \in [0, 1]. \tag{26}$$

We will prove that $X \subseteq S_0$, $X \subseteq S$ and $S_0 \subseteq X$. As the other inclusion $S \subseteq S_0$ is clear, this will prove the theorem.

Step 1: $X \subseteq S_0$.
For any non-zero point $z$ we can compute the partial derivatives of $\varphi$ at $z$ using the line integral formula (24) and (25):

$$\begin{pmatrix} \varphi_x(z) \\ -\varphi_y(z) \end{pmatrix} = A(0)z + \int_0^{|z|} B'(s\hat z)\hat z\, ds \tag{27}$$

where $z = |z|\hat z$. If $z(t) \in X$, then $\dot z(t) = A(0)z(t)$, and by skew-symmetry of $A(0)$, $|z(t)|$ is constant, which means that $z(t)$ is a constant distance from $0$. Furthermore, the assumption that $z(t) \in \ker(B'(rz(t)))$ for $r \in [0, 1]$ implies that the integrand in (27) vanishes, and $z(t)$ is a solution of the gradient method.

Step 2: $X \subseteq S$. Let $\bar z \in \bar S$ be arbitrary. Consider the function $t \mapsto d(z(t), \bar z)^2$. By expanding in the orthonormal basis of eigenvectors of $A(0)$ we observe that this function is a linear combination of continuous periodic functions. As, by Lemma 9, this function is also non-increasing, it must be constant.

Step 3: $S_0 \subseteq X$. Let $z(t) \in S_0$ and $R = |z(t)|$, which is constant. For $r \in [0, R]$, define $z(t; r) = (r/R)z(t)$, so that $z(t; 0) = 0$ and $z(t; R) = z(t)$. Note that the corresponding unit vector $\hat z(t; r) = \hat z(t)$ does not depend on $r$. The convexity result Lemma 30 implies that $z(t; r) \in S_0$, and is a solution of the gradient method. We shall compute the time derivative of this in two ways. First, we use (5) and (27) to obtain

$$\dot z(t; r) = A(0)z(t; r) + \int_0^r B'(s\hat z(t))\hat z(t)\, ds. \tag{28}$$

Second, we use the explicit definition of $z(t; r)$ in terms of $z(t)$ to obtain

$$\dot z(t; r) = \frac{r}{R} A(0)z(t) + \frac{r}{R} \int_0^R B'(s\hat z(t))\hat z(t)\, ds. \tag{29}$$

Equating (28) and (29) we deduce that

$$\int_0^r B'(s\hat z(t))\hat z(t)\, ds = \frac{r}{R} \int_0^R B'(s\hat z(t))\hat z(t)\, ds. \tag{30}$$

Differentiating with respect to $r$ we have

$$B'(r\hat z(t))\hat z(t) = \frac{1}{R} \int_0^R B'(s\hat z(t))\hat z(t)\, ds. \tag{31}$$

The right hand side of (31) is independent of $r$, which implies that the left hand side is also independent of $r$, and is thus equal to its value at $r = 0$, so that

$$B'(r\hat z(t))\hat z(t) = B'(0)\hat z(t) = B(0)\hat z(t). \tag{32}$$

Putting this back into our expression for $\dot z$ we find that

$$\dot z(t) = A(0)z(t) + B(0)z(t), \tag{33}$$

but as $|z(t)|$ is constant, $A(0)$ skew-symmetric, and $B(0)$ symmetric, $B(0)z(t)$ must vanish, which, together with (32), shows that $z(t) \in X$.

The following lemma follows directly from Theorem 12 and will be used to prove Proposition 32, which states that the set $S$ is convex.

Lemma 31. Let $\varphi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. Let $z(t), z'(t) \in S$. Then $d(z(t), z'(t))$ is constant.

Proof. Using Theorem 12 we have that $z(t) - z'(t) = e^{tA(0)}(z(0) - z'(0))$, which has constant magnitude as $A(0)$ is skew-symmetric.

Proposition 32. Let $\varphi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. Then $S$ is convex.

Proof. The proof is very similar to that of Lemma 30. Let $z(t), z'(t) \in S$, and $s \in (0, 1)$. Set $w(t) = sz(t) + (1-s)z'(t)$. By Lemma 31 we know that $d = d(z(t), z'(t))$ is constant. Denote the solution of the gradient method starting from $w(0)$ as $w'(t)$. We must prove that $w'(t) = w(t)$ and that $w(t) \in S$. First we imagine two closed balls centered on $z(t)$ and $z'(t)$, of radii $(1-s)d$ and $sd$ respectively. By Lemma 9, $w'(t)$ is constrained to lie within both of these balls. For each $t$ there is only one such point and it is exactly $w(t)$. Next we let $\bar a \in \bar S$ be arbitrary; then $d(\bar a, w(t))$ is determined by $d(z(t), z'(t))$, $d(\bar a, z(t))$ and $d(\bar a, z'(t))$ (as illustrated by Figure 5).
Indeed, we may assume by translation that $\bar a = 0$, and then

$$d(\bar a, w(t))^2 = d(0, sz(t) + (1-s)z'(t))^2 = s^2 d(0, z(t))^2 + (1-s)^2 d(0, z'(t))^2 + 2s(1-s)\, z^T(t) z'(t). \tag{34}$$

The first two terms in (34) are constant by Lemma 31, and the third can be computed as

$$2z^T(t)z'(t) = d(0, z(t))^2 + d(0, z'(t))^2 - d(z(t), z'(t))^2 \tag{35}$$

which is constant for the same reason.

Fig. 5. $z$ and $z'$ are two elements of $S$ and $w$ is a convex combination of them. $\bar a$ is a saddle point in $\bar S$. We know all the distances are constant except possibly $d(w, \bar a)$, but this is uniquely determined by the other four distances.

Proof of Lemma 19. The only if part is trivial. To prove the if part, assume there exists a trajectory in $S$ that is not a saddle point, i.e. it is at a constant non-zero distance from each saddle point. Since saddle points are in $S$, the convexity of $S$ (Proposition 32) implies the existence of trajectories in $S$ that are not saddle points and can be chosen to be arbitrarily close to the saddle point $\bar z$. This contradicts the assumption in Lemma 19.

To prove Theorem 22 we require the following lemma, which shows the existence of a conserved quantity of the gradient dynamics.

Lemma 33. Let $\varphi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$. Suppose that $S$ contains a bi-infinite line $L = \{a + sv : s \in \mathbb{R}\}$. Assume that $0 \in \bar S$. Then $W(t; z) = |(e^{tA(0)}v)^T z|^2$ is a conserved quantity for any solution $z$ of (5).

Proof. As $S$ is closed and convex (Proposition 32) we may assume that the line passes through the origin and take $a = 0$. Let $v(t) = e^{tA(0)}v$ and note that $\lambda v(t)$ is a solution to the gradient method (5) by Theorem 12 for any $\lambda \in \mathbb{R}$. We follow the strategy of the first part of the proof of Lemma 27 with $-\lambda v(t), \lambda v(t)$ replacing the saddle points $\bar a, \bar b$.
Indeed, let $z(t)$ be any solution to (5) and let $\lambda' = v^T z(0)$. Then for any $t \ge 0$, Lemma 9 implies that $z(t)$ must satisfy

$$d(\pm\lambda v(t), z(t)) \le d(\pm\lambda v(0), z(0)), \tag{36}$$

where by $\pm$ we mean that the equation holds for each of $+$ and $-$. In the same way as in the proof of Lemma 27, taking the intersection of these balls for a sequence $\lambda \to \infty$ we deduce that $z(t)$ is contained in the linear manifold normal to the line through the origin and $v(t)$, and passing through $\lambda' v(t)$. Indeed, by squaring (36) and expanding we obtain

$$|z(t)|^2 \mp 2\lambda v(t)^T z(t) \le |z(0)|^2 \mp 2\lambda v(0)^T z(0).$$

By dividing through by $\lambda$ and taking the limit $\lambda \to \infty$ we deduce that $v(t)^T z(t)$ is equal to $v(0)^T z(0)$, which implies that $W(t; z)$ is conserved.

Proof of Theorem 22. Consider the conserved quantity $W(t; z)$ given by Lemma 33. Applying Itô's lemma and taking expectations, we have

$$\frac{d}{dt}\mathbb{E}\, W(t; z(t)) = \mathbb{E}\, \dot W(t; z(t)) + \frac{1}{2}\mathbb{E}\, \mathrm{Tr}(\Sigma^T W_{zz} \Sigma)$$

where $\Sigma = \mathrm{diag}(\Sigma_x, \Sigma_y)$, $\dot W$ is the total derivative along the deterministic flow (5) and $\mathrm{Tr}$ is the trace operator. As $W$ is conserved along the deterministic flow, $\dot W = 0$, and a simple computation shows that the second term is independent of $z$ and bounded below by a strictly positive constant. Therefore $\mathbb{E}\, W(t; z(t))$ grows at least linearly in time. It remains to note that $W(t; z) \le |e^{tA(0)}v|^2 |z|^2 \le |v|^2 |z|^2$, so that $|z(t)|^2 \ge c\, W(t; z(t))$ for a constant $c > 0$. This implies that also $\mathbb{E}|z(t)|^2 \to \infty$ and completes the proof of the theorem.

To prove Theorem 21 and Theorem 23 we make use of the following result, which follows easily from linear algebra arguments.

Lemma 34. Let $X$ be a linear subspace of $\mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$ a normal matrix. Let

$$Y = \mathrm{span}\{v \in X : v \text{ is an eigenvector of } A\}. \tag{37}$$

Then $Y$ is the largest subset of $X$ that is invariant under $A$.
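The conserved quantity of Lemma 33 and the resulting variance growth in Theorem 22 can be illustrated numerically. The sketch below uses an illustrative function of our own choosing, $\varphi(x_1, x_2, y) = y(x_1 + x_2)$, whose saddle points $\{(x_1, -x_1, 0)\}$ form a bi-infinite line in the direction $v = (1, -1, 0)$; the noise level and Euler-Maruyama scheme are also our assumptions, not taken from the paper.

```python
import numpy as np

# Stochastic gradient dynamics for phi(x1, x2, y) = y*(x1 + x2).
# Here A(0) v = 0 for v = (1, -1, 0), so the conserved quantity of
# Lemma 33 reduces to W = (x1 - x2)^2.  With additive noise, x1 - x2
# is driven purely by the noise, so its variance grows linearly and
# E|z|^2 is unbounded, as in Theorem 22.
rng = np.random.default_rng(0)
sigma, dt = 0.5, 1e-2
paths, steps = 4000, 400          # 4000 Euler-Maruyama paths up to t = 4

z = np.zeros((paths, 3))          # columns: x1, x2, y; start at a saddle point
var_early = var_late = 0.0
for k in range(1, steps + 1):
    x1, x2, y = z[:, 0], z[:, 1], z[:, 2]
    drift = np.stack([y, y, -(x1 + x2)], axis=1)       # (phi_x, -phi_y)
    z = z + dt * drift + sigma * np.sqrt(dt) * rng.standard_normal(z.shape)
    if k == steps // 4:
        var_early = np.var(z[:, 0] - z[:, 1])          # at t = 1
    if k == steps:
        var_late = np.var(z[:, 0] - z[:, 1])           # at t = 4

variance_grows = var_late > 2.0 * var_early
```

Since the drift terms of $x_1$ and $x_2$ are identical, $x_1 - x_2$ is a pure diffusion with variance $2\sigma^2 t$, so the late-time variance is roughly four times the early-time one; no amount of reducing $\sigma$ prevents the growth, only slows it.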
We note that invariance of a subspace under $A$ is equivalent to invariance of the subspace under the group $e^{tA}$.

Proof of Theorem 21. Step 1: $S_{\mathrm{linear}} \subseteq S$ when $\varphi$ is a quadratic function. We will use the characterisation of $S$ given by Theorem 12. By Lemma 34, $S_{\mathrm{linear}}$ is invariant under $e^{tA(0)}$, so that $z(0) \in S_{\mathrm{linear}}$ implies $z(t) = e^{tA(0)}z(0) \in S_{\mathrm{linear}}$. Hence if $z(0) \in S_{\mathrm{linear}}$ then $z(t) \in \ker(B'(0))$ for all time $t$, and as $\varphi$ is a quadratic function, $B'(z)$ is constant, so this is enough to show $S_{\mathrm{linear}} \subseteq S$.

Step 2: $S \subseteq S_{\mathrm{linear}}$. Let $z(t) \in S$; then by Theorem 12, taking $r = 0$, we have $z(t) = e^{tA(0)}z(0) \in \ker(B'(0))$ for all $t \in \mathbb{R}$. Thus $S$ lies inside the largest subset of $\ker(B'(0))$ that is invariant under the action of the group $e^{tA(0)}$, which by Lemma 34 is exactly $S_{\mathrm{linear}}$.

In order to prove Theorem 23 we give a different interpretation of the condition in Theorem 12. The condition $z \in \ker(B(sz))$ for all $s \in [0, 1]$ looks like a line integral condition. Indeed, if we define a function $V(z)$ by

$$V(z) = z^T \left( \int_0^1 \int_0^1 B(ss'z)\, s\, ds'\, ds \right) z \tag{38}$$

then as $B(z)$ is symmetric negative semi-definite we have that $V(z) = 0$ if and only if $z \in \ker(B(sz))$ for every $s \in [0, 1]$. This still leaves the condition $z \in \ker(A(sz) - A(0))$ for all $s \in [0, 1]$, and the function $V$ has no natural interpretation in general. However, in the specific case where $\varphi$ is the Lagrangian of a concave optimization problem where the relaxed constraints are linear, we do have an interpretation. In this case the assumption that $0$ is a saddle point is no longer generic and we must translate coordinates explicitly. Let the Lagrangian of the optimization problem be given by

$$\varphi(x', y') = U'(x') + y'^T g'(x'), \quad U' \in C^2 \text{ and concave}, \quad g' \text{ linear with } g'_x = D. \tag{39}$$

We pick a saddle point $(\bar x', \bar y')$, and shift to new coordinates $(x, y) = (x' - \bar x', y' - \bar y')$ so that $(0, 0)$ is a saddle point in the new coordinates. After expanding we obtain

$$\varphi(x, y) = \left(U'(x + \bar x') + \bar y'^T g'(x + \bar x')\right) + y^T g'(x + \bar x') \tag{40}$$

which is a Lagrangian originating from the utility function

$$U(x) = U'(x + \bar x') + \bar y'^T g'(x + \bar x') \tag{41}$$

and constraints $g(x) = g'(x + \bar x')$. Without loss of generality we assume that $U(0) = 0$. As $g(x)$ is a linear function we have

$$B(z) = \begin{pmatrix} U_{xx}(x) & 0 \\ 0 & 0 \end{pmatrix} \tag{42}$$

so that $V(z)$ is independent of $y$, and in fact by direct computation we have $V(z) = U(x)$. This leads us to the following lemma.

Lemma 35. Let (39) hold. Then $S$ is the largest subset of $U^{-1}(\{0\}) \times \mathbb{R}^m = \{(x, y) \in \mathbb{R}^{n+m} : U(x) = 0\}$ that is invariant under evolution by the group $e^{tA(0)}$, where $U$ is given by (41).

Proof. Denote the set defined in the lemma as $Y$.

Step 1: $S \subseteq Y$. By the computation above we know that $z \in U^{-1}(\{0\}) \times \mathbb{R}^m$ if and only if $z \in \ker(B(sz))$ for all $s \in [0, 1]$. Thus by Theorem 12 we have $S \subseteq U^{-1}(\{0\}) \times \mathbb{R}^m$, as $S$ is invariant under the action of $e^{tA(0)}$.

Step 2: $Y \subseteq S$. If $z(0)$ is in the largest subset of $U^{-1}(\{0\}) \times \mathbb{R}^m$ invariant under the action of $e^{tA(0)}$, then, defining $z(t) = e^{tA(0)}z(0)$, $z(t)$ is in this set for all $t \in \mathbb{R}$. We then have $z(t) \in \ker(B(sz(t)))$ for all $s \in [0, 1]$, so $z(t) \in S$ by Theorem 12.

To obtain a more exact expression for $S$, we make use of the assumption that $U$ is analytic.

Lemma 36. Let (39) hold and in addition $U$ given by (41) be analytic. Then
(i) $U^{-1}(\{0\}) = \mathrm{span}(U^{-1}(\{0\}))$.
(ii) $S = \{e^{tA(0)}z(0) : z(0) \in \mathcal{Q}\}$ where

$$\mathcal{Q} = \mathrm{span}\left\{ (x, y) \in U^{-1}(\{0\}) \times \mathbb{R}^m : (x, y) \text{ is an eigenvector of } \begin{pmatrix} 0 & D^T \\ -D & 0 \end{pmatrix} \right\}. \tag{43}$$

Proof. We begin with (i).
Recall we have assumed without loss of generality that $U(0) = 0$. As $U^{-1}(\{0\})$ is the set of maxima of a concave function, it is convex. If $U^{-1}(\{0\})$ is the single point $0$, then (i) is trivial. Otherwise let $L$ be a line segment (of strictly positive length) in $U^{-1}(\{0\})$, and let $\hat L$ be the bi-infinite extension of $L$. Let $f$ be a linear bijection from $\mathbb{R}$ to $\hat L$, and let $I \subset \mathbb{R}$ be the interval in $\mathbb{R}$ given by $f^{-1}(L)$. Then $U(f(t)) : \mathbb{R} \to \mathbb{R}$ is an analytic function whose restriction to $I$ vanishes. Hence $U(f(t))$ vanishes everywhere on $\mathbb{R}$, which is equivalent to $U$ vanishing on $\hat L$. By varying the choice of $L$, we deduce that $U^{-1}(\{0\})$ contains infinite lines in every direction in $\mathrm{span}(U^{-1}(\{0\}))$ and by convexity is equal to $\mathrm{span}(U^{-1}(\{0\}))$.

(ii) is a consequence of Lemma 35 and Lemma 34.

Lastly, we translate back into the original coordinates.

Lemma 37. Let (39) hold and $U'$ be analytic. Then $U^{-1}(\{0\}) = \{x \in \mathbb{R}^n : \mathbb{R} \ni s \mapsto U'(sx + \bar x') \text{ is linear}\}$, where $U$ is given by (41).

Proof. Suppose that $x \in U^{-1}(\{0\})$; then by Lemma 36, $U(sx) = 0$ for all $s \in \mathbb{R}$. Recall that $U(x) - U'(x + \bar x')$ is a linear function of $x$. Hence $U'(sx + \bar x')$ is linear as a function of $s \in \mathbb{R}$. Now suppose that $U'(sx + \bar x')$ is linear as a function of $s \in \mathbb{R}$ for some $x \in \mathbb{R}^n$; then $U(sx)$ is also linear. But $U(0) = 0$ and $U_x(0) = 0$, as $0$ is a saddle point of $\varphi$, so by linearity $U(sx) = 0$ for all $s \in \mathbb{R}$.

Proof of Theorem 23. This is just a simple combination of Lemma 37 and Lemma 36.

We now consider the case of the projected gradient method.

Proof of Theorem 25. We show how to adapt the proof of the results on the gradient method. We denote the set of equilibrium points of the projected gradient method as $\bar S^\Pi$ and similarly $S^\Pi$, $S^\Pi_{\bar z}$, in analogy with $S$, $S_{\bar z}$.
We first note that the projected gradient method is pathwise stable (Lemma 22 in Part II). Together with the assumption that $0 \in \bar S_\Pi$, this means that the reasoning in Appendix A applies, and in particular a version of Lemma 30 holds:

Lemma 38. Let $\phi$ be $C^2$ and concave-convex on $\mathbb{R}^{n+m}$, let $\Pi \in \mathbb{R}^{(n+m) \times (n+m)}$ be an orthogonal projection matrix, $\bar z \in \bar S_\Pi$ and $z(t) \in S_{\Pi,\bar z}$. Then for any $s \in [0, 1]$, the convex combination $z'(t) = (1 - s)\bar z + s z(t)$ lies in $S_{\Pi,\bar z}$. If in addition $z \in S_\Pi$, then $z'(t) \in S_\Pi$.

Equation (23) becomes

\[
\Pi \begin{pmatrix} \phi_x(z) \\ -\phi_y(z) \end{pmatrix}
= \left( \int_0^{|z|} \Pi \begin{pmatrix} \phi_{xx}(s\hat z) & \phi_{xy}(s\hat z) \\ -\phi_{yx}(s\hat z) & -\phi_{yy}(s\hat z) \end{pmatrix} \Pi \, ds \right) \hat z
\]

and we replace (7) with

\[
\tilde A(z) = \Pi \begin{pmatrix} 0 & \phi_{xy}(z) \\ -\phi_{yx}(z) & 0 \end{pmatrix} \Pi, \qquad
\tilde B(z) = \Pi \begin{pmatrix} \phi_{xx}(z) & 0 \\ 0 & -\phi_{yy}(z) \end{pmatrix} \Pi.
\]

The remainder of the proof carries through unaltered, in analogy with that of Theorem 12. ∎

APPENDIX C
THE ADDITION OF CONSTANT GAINS

It is common in applications to consider the gradient method with constant gains, i.e.

\[
\dot x_i = \gamma_{x_i} \phi_{x_i} \quad \text{for } i = 1, \dots, n, \qquad
\dot y_j = -\gamma_{y_j} \phi_{y_j} \quad \text{for } j = 1, \dots, m, \tag{44}
\]

for $\phi \in C^2$ a concave-convex function on $\mathbb{R}^{n+m}$ and $\gamma_{x_i}, \gamma_{y_j}$ positive constants. However, in the setting of an arbitrary concave-convex function this is not a generalisation, and it is sufficient to study the gradient method (5) without gains, via a coordinate transformation that we now describe. Let $\Lambda$ be the diagonal matrix defined from the gains by

\[
\Lambda = \mathrm{diag}\big(\sqrt{\gamma_{x_1}}, \dots, \sqrt{\gamma_{x_n}}, \sqrt{\gamma_{y_1}}, \dots, \sqrt{\gamma_{y_m}}\big). \tag{45}
\]

Given a concave-convex function $\phi$ we define a new concave-convex function $\phi'$ by

\[
\phi'(z') = \phi(\Lambda z'). \tag{46}
\]

Let $z'(t)$ be a solution to the gradient method (5) without gains applied to $\phi'$; then

\[
z(t) := \Lambda z'(t) \tag{47}
\]

is a solution to the gradient method (44) applied to $\phi$ with gains.
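As a quick numerical sanity check of this coordinate transformation, the following minimal sketch (all function and variable names are ours, and the test function $\phi(x,y) = -x^2 + xy$ is an illustrative choice, not one from the paper) integrates the gain-free dynamics applied to $\phi'(z') = \phi(\Lambda z')$ and confirms that $\Lambda z'(t)$ tracks the dynamics (44) with gains:

```python
import numpy as np

# Sketch: verify that z(t) = Lam @ zp(t) maps the gain-free gradient
# method applied to phi'(z') = phi(Lam z') onto the gradient method
# with gains (44). Names and the example phi are illustrative choices.

gx, gy = 2.0, 0.5                            # gains gamma_x, gamma_y
Lam = np.diag([np.sqrt(gx), np.sqrt(gy)])    # Lambda as in (45)

def grad_phi(x, y):
    # phi(x, y) = -x^2 + x*y: concave in x, (trivially) convex in y
    return np.array([-2.0 * x + y, x])       # (phi_x, phi_y)

def rhs_gains(z):
    fx, fy = grad_phi(z[0], z[1])
    return np.array([gx * fx, -gy * fy])     # dynamics (44), with gains

def rhs_plain(zp):
    x, y = Lam @ zp                          # evaluate phi' at Lam z'
    fx, fy = grad_phi(x, y)
    g = Lam @ np.array([fx, fy])             # chain rule: grad phi' = Lam grad phi
    return np.array([g[0], -g[1]])           # gain-free dynamics (5) for phi'

dt, steps = 1e-4, 10_000
z = np.array([1.0, -0.5])                    # matched initial conditions
zp = np.linalg.solve(Lam, z)
for _ in range(steps):                       # forward Euler on both systems
    z = z + dt * rhs_gains(z)
    zp = zp + dt * rhs_plain(zp)

# The two trajectories agree up to floating-point error
print(np.max(np.abs(z - Lam @ zp)))
```

Because $\Lambda$ is diagonal, the chain rule gives $\nabla\phi'(z') = \Lambda\,\nabla\phi(\Lambda z')$, so one application of $\Lambda$ appears in the transformed field and a second in the change of variables, producing the $\Lambda^2$ (i.e. the gains) in (44).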
Indeed, we have

\[
\dot z(t) = \Lambda \dot z'(t)
= \Lambda^2 \begin{pmatrix} \phi_x(\Lambda z'(t)) \\ -\phi_y(\Lambda z'(t)) \end{pmatrix}
= \Lambda^2 \begin{pmatrix} \phi_x(z(t)) \\ -\phi_y(z(t)) \end{pmatrix}
\]

and the $\Lambda^2$ term gives the gains. Thus any properties of the gradient method with gains can be obtained from the gradient method without gains applied to a suitably modified function.

However, applying this transformation to the subgradient method has the effect of altering the metric in the convex projection. We therefore use the following definition of subgradient dynamics with gains.

Definition 39 (Subgradient method with gains). Given a non-empty closed convex set $K \subseteq \mathbb{R}^{n+m}$, $\phi \in C^2$ a concave-convex function on $K$, and a set of positive gains $\gamma_{x_i}, \gamma_{y_j}$ as in (44), we define the subgradient method on $K$ with gains as a semi-flow on $(K, d)$ consisting of Carathéodory solutions of

\[
\dot z = f(z) - P_{N_K(z),\, d_{\Lambda^{-1}}}(f(z)) \tag{48}
\]

where $f(z)$ is the vector field of the gradient method with gains (44) and $P_{M,\, d_{\Lambda^{-1}}}$ is a weighted convex projection given by

\[
P_{M,\, d_{\Lambda^{-1}}}(z) = \arg\min_{w \in M}\, d(\Lambda^{-1} w, \Lambda^{-1} z) \tag{49}
\]

where $\Lambda$ is defined in terms of the gains by (45).

It should be noted that the weighted metric used in the projection arises from the stretching of the domain $K$ when the coordinate transformation (47) is applied.

Remark 40. When non-negativity constraints are present, the subgradient dynamics are not affected by this change to the metric in the convex projection, i.e. the dynamics in (48) are identical to those where an unweighted metric is used in the projection. For example, if the $y$ coordinates are restricted to be non-negative and the $x$ coordinates are unconstrained, then the subgradient method with gains (48) is given by

\[
\dot x_i = \gamma_{x_i} \phi_{x_i} \quad \text{for } i = 1, \dots, n, \qquad
\dot y_j = [-\gamma_{y_j} \phi_{y_j}]^+_{y_j} \quad \text{for } j = 1, \dots, m. \tag{50}
\]

This holds more generally for any convex set $K$ with boundaries aligned to the coordinate axes.
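The reason the weights drop out for axis-aligned constraints is that the normal cone of the positive orthant is itself axis-aligned, so the projection in (48) decouples componentwise and a diagonal metric cannot change it. A minimal sketch (names and test values are our own, assuming the normal-cone form of the positive orthant) illustrating this:

```python
import numpy as np

# Sketch of Remark 40: for K = {y >= 0}, projecting f onto the normal
# cone N_K(y) = {w : w_j <= 0 if y_j = 0, w_j = 0 otherwise} in the
# metric |Lam^{-1} . | is a componentwise problem, so any positive
# diagonal weights give the same projected vector field as the
# unweighted metric. Names and test values are illustrative.

def normal_cone_proj(y, f, lam):
    # Componentwise projection onto N_K(y); separability makes the
    # result independent of the (positive diagonal) weights `lam`.
    active = (y == 0)
    w = np.zeros_like(f)
    w[active] = np.minimum(f[active], 0.0)
    return w

def projected_field(y, f, lam):
    # Right-hand side of (48): f minus its normal-cone component.
    return f - normal_cone_proj(y, f, lam)

y = np.array([0.0, 0.0, 1.5])          # first two constraints active
f = np.array([-3.0, 2.0, -1.0])        # e.g. f_j = -gamma_{y_j} phi_{y_j}
lam_unweighted = np.ones(3)
lam_weighted = np.array([0.1, 5.0, 2.0])

print(projected_field(y, f, lam_unweighted))   # inward push at y_1=0 is blocked
print(projected_field(y, f, lam_weighted))     # identical: weights drop out
```

The first component is zeroed (the field points out of $K$ at an active constraint), the second passes through unchanged (the field points into $K$), and the third is unconstrained, which is exactly the positive projection $[\,\cdot\,]^+_{y_j}$ in (50).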
