Reversible Markov chain estimation using convex-concave programming
We present a convex-concave reformulation of the reversible Markov chain estimation problem and outline an efficient numerical scheme for the solution of the resulting problem based on a primal-dual interior point method for monotone variational ineq…
Authors: Benjamin Trendelkamp-Schroer, Hao Wu, Frank Noe
REVERSIBLE MARK O V CHAIN ESTI MA TION USING CONVEX-CONCA VE PR OGRAMMING BENJAMIN TRENDELKAMP-SCHROER † ‡ , HAO WU † § , AND FRANK NOE † ¶ Abstract. W e presen t a con v ex-co nca v e r eformulation of the r eversible Mar k o v chain estimation problem and outline an efficient numerical scheme for the solution of the r esulting problem based on a primal-dual interior point method for m onotone v ariational inequalities. Extensions to situations in whic h information about the stationary vect or is a v ai l able can also be solv ed via the conv ex- conca ve reformulation. The method can be generalized and applied to the discrete transition m atrix rewe i gh ting analysis method to perf orm inference from indep enden t c hains with specified couplings betw een the stationary probabilities. The proposed approach offers a significant sp eed-up compared to a fixed-p oint iteration for a num b er of relev an t applications. Key words. Mark ov chain estimation, Reversible Marko v chain, Conv ex-conca ve program AMS sub ject classificatio ns. 62M05, 65K15, 62F30, 62P10 1. In tro duction. The study of reversible Markov chains is a recurrent theme in proba bilit y theo r y with ma n y imp or tant applica tio ns, [1, 13, 20]. Surprisingly , sta- tistical inference for reversible Marko v c hains has b een studied only r e cent ly . The reversible maximum likelihoo d es timation (MLE) problem w as pr eviously discussed in [3, 18, 22]. [2, 9, 1 4, 16, 2 2] study the the p oster io r ensemble of reversible sto chas- tic matrices and dis cuss algorithms for Bayesian p oster ior inference. F or a given sto chastic matrix the b est approximation which is reversible with r esp ect to a given stationary vector was found in [1 5]. Maximum likelihoo d estimation and p osterior inference of reversible sto chas- tic matrices have imp ortant applica tio ns in the co n text of Marko v state mode ls [4]. Marko v state models are simplified kinetic mode ls for the complex dynamics of biomolecules. T ransition probabilities betw een relev ant molecula r c onformations are estimated from simulation da ta. The estimated transition matrix is then used to compute quantities of interest and to extract a s implified picture of the kinetic pathw ays present in the dynamics. In [2 1] it is shown that a s ignificant sp eed-up in the estimation of rare events is possible if additio na l infor mation ab out the stationary vector is inco rp orated via a detailed balance constr aint. The reversible MLE problem was previously s olved using a self co ns istent itera tion metho d which can req uir e a large num b er of itera tions to co n verge [3, 18, 22]. Here we o utline an efficien t n umer ical alg o rithm for so lving the reversible MLE problem via a co n vex-concave r eformulation o f the pr oblem based o n a duality argument from [23]. Conv ex-c oncav e pro g rams cannot be solved by standard nonlinear programming approaches which aim to minimize some ob jective sub ject to cons trains. They can be treated as finite dimensional mono tone v ar iational inequalities and they can be solved using the prima l- dual int e rior-p oint outlined in [19]. The reversible MLE problem is a nonlinear pro gramming pr oblem with a conv ex ob jective and non-conv ex constraints. The n umber of unknowns in the problem is quadratic in the num b er of sta tes of the chain. The dual problem has only linea r constraints and the num b er of unknowns gr ows linearly with the num b er of states of † Institut f¨ ur Mathe m atik und Inform atik, F reie Univ ersi t¨ at Berlin, Arnimall ee 6, 14195 B er lin ‡ B. T.-S. was supp orted by Deutsc he F orsc hungsgemeinsc haft (DFG) Gran t No. SFB 740 § H. W. was supp orted by DFG Grant No. SFB 1114 ¶ F. N. was s upp orted by Europ ean Researc h Council (ER C) starting gran t pcCell 1 the c hain. The reformulation ca n als o b e applied in order to so lve a num be r of related MLE problems arising if a dditional infor mation ab out the chain is av ailable a priori . A broader class of interesting MLE pr oblems for reversible Mar ko v chains ca n th us be solved. In [2 3, 24] the r eversible MLE problem has been extended to the discrete tr an- sition ma trix reweight ing analysis metho d (dTRAM). F or dTRAM, simulation data at multiple biasing co nditions, also called thermo dynamic sta tes, is co llected in o rder to efficiently estimate the sta tio nary vector at the unbiased condition. A p ositive reweigh ting transformatio n r e lates eac h stationa r y v ector at a biased condition to the stationary vector at the unbiased co ndition. This coupling b etw een unbiased and bi- ased condition makes it p oss ible to combine the infor mation from all ensembles into the desired estima te for the unbiased situation. The dTRAM pro ble m w a s previously solved through an applicatio n of a self co n- sistent itera tion pro cedure to the dual r eformulation [23]. This appr oach can require a large num b er of iteratio ns to conv erg e. W e show that the conv ex- concav e refor- m ula tion of the reversible MLE pr oblem can b e extended to als o c over the dTRAM problem. The resulting conv ex-concave pro gram can b e solved using the algo rithm outlined in [19]. The large linear s y stems aris ing during the computation of the search direction ca n b e efficient ly so lved using a Sc hur complement appr oach similar to the one outlined in [11, 2 5]. The resulting a lgorithm achiev es a significant sp eed-up c om- pared to the s e lf consistent itera tion. 2. Mark o v c hain e stimation. A Markov chain o n a finite state space is com- pletely c hara cterized by a square matrix of co nditional pr obabilities, P = ( p ij ) ∈ R n × n . The en try p ij is the pr obability for the chain to make a transition to state j given tha t it currently resides in state i . The matrix P is sto chastic, i.e. P j p ij = 1 for all i . If P is irreducible then there exists a unique v ector, π = ( π i ) ∈ R n , of po sitive probabilities such that π is inv ariant under the action of P , π T P = π T . The vector π is called the s ta tionary vector of the chain. If there is a vector, π , of probabilities for which P fulfills the following detaile d balance condition, (2.1) π i p ij = π j p j i then the chain is a re versible Markov chain with stationa ry vector π , [12]. In Markov chain estimation one is interested in finding an optimal transition ma- trix e stimate P from a given finite o bserv ation X = { X 0 , X 1 , . . . , X N } o f a Markov chain with unknown transition matrix. The matr ix of tra nsition co unts C = ( c ij ) to- gether with the initial s tate X 0 = x 0 is a minimal sufficien t s tatistics for the transition matrix [8]. The element c ij denotes the observed n umber of trans itions betw een state i a nd sta te j in X . The matr ix P is optimal if it maximizes the following log-likeliho o d (2.2) L ( C | P ) = X i,j c ij log p ij . F or finite ensemb le s consisting of finite length observ ations one ca n simply add the matrices of transition co un ts for ea c h o bserv ation. The accumulated co unts together with the empirical measur e of the initial states is then a sufficient statistics for the finite ensemble of observ ations. F or reversible Marko v chain estimation one c onstrains the genera l Marko v chain MLE problem to the set of all stochastic matrices for which detailed balance with 2 resp ect to s ome vector of probabilities holds. Thus w e can find the reversible MLE transition matrix fr om the following nonlinear progra m, (2.3) min π ,P − X i,j c ij log p ij sub ject to p ij ≥ 0 , X j p ij = 1 , π i > 0 , X i π i = 1 , π i p ij = π j p j i . In [23 , 24] pro blem (2.3) has b een extended to the dis crete tr ansition matrix reweigh ting analysis method (dTRAM). F or dTRAM, simu la tion data a t m ultiple thermo dynamic sta tes α = 0 , . . . , M is co llected in order to efficiently estimate the stationary vector a t the unbiased co ndition, α = 0. A p ositive reweighting transfor - mation r elates the sta tionary vector at the biased condition, α > 0 , to the stationa ry vector at the unb ia sed condition, (2.4) π ( α ) i = U ( α ) i π (0) i = exp( u ( α ) i ) π (0) i . This coupling allows us to combine the information from all ensembles in to the esti- mate for π (0) . The dTRAM problem consists of r eversible MLE problems for each thermo dy- namic sta te coupled via the reweigh ting trans fo rmation (2.4). The desired stationary vector can b e obtained as the optimal po int of the following nonlinear progra m, (2.5) min π ( α ) ,P ( α ) − X α X i,j c ( α ) ij log p ( α ) ij sub ject to p ( α ) ij ≥ 0 , X j p ( α ) ij = 1 , π ( α ) i > 0 , X i π ( α ) i = 1 , π ( α ) i p ( α ) ij = π ( α ) j p ( α ) j i , π ( α ) i = U ( α ) i π (0) i . W e show that the conv ex-co ncav e reformulation of the r eversible MLE prob- lem can be extended to deriv e a n efficient numerical algor ithm for the solution of the dTRAM pr o blem. Additional s tr ucture in the linear systems arising dur ing the primal-dual iteration can b e used so that the problem can be s o lved efficiently for many coupled chains. 3. Dual of the rev ersi ble MLE problem. In [23] a duality argument was used to show that finding the MLE of (2.3) for g iven positive w e ight s π i is equiv alent to the following concav e maximization pr oblem, (3.1) max x X i,j c ij log( π i x j + π j x i ) − X i,j c ij log π j − X i x i sub ject to x i ≥ 0 . The x i corres p ond to the Lagr ange multipliers for the row normaliz ation constra int in the primal problem (2 .3) . The optimal tr a nsition pro babilities can be re cov ered according to (3.2) p ∗ ij = ( c ij + c j i ) π j π i x ∗ j + π j x ∗ i , j 6 = i. The vector x ∗ denotes the optimal p oint of (3.1 ) a nd the diagonal entries p ∗ ii are deter- mined b y the r ow normalization c o ndition. It is clea r that p ∗ ij is a pr op er pr obability 3 irresp ective o f the no rmalization of the weigh ts since any scaling of π i cancels o ut in (3.2). In [23] the ineq uality co nstraints on x i were not made e xplicit. The non-nega tivit y requirement ca n b e seen from the following splitting o f the Lagrang ian L π in [23], (3.3) L π ( P, λ, ν ) = − X i,j ∈ I c ij log p ij + X i,j ∈ I ( π i ( λ ij − λ j i ) + x i ) p ij + X i,j / ∈ I ( π i ( λ ij − λ j i ) + x i ) p ij − X i x i with index s et I = { ( i, j ) | c ij > 0 } and the constraint p ij ≥ 0. The v alue min x L π is not b ounded fro m below if π i ( λ ij − λ j i ) + x i < 0 for some ( i, j ) / ∈ I . Therefor e x i ≥ 0 for a ll ( i, i ) / ∈ I . It is a lso not b ounded from below if π i ( λ ij − λ j i ) + x i ≤ 0 for some ( i, j ) ∈ I , so that x i > 0 for all ( i, i ) ∈ I , Using the dual function from [23] the reformulation o f the reversible MLE prob- lem, (2.3), as a saddle-p oint problem with constr aint s is (3.4) min π max x X i,j c ij log( π i x j + π j x i ) − X i,j c ij log π j − X i x i sub ject to x i ≥ 0 , π i > 0 , X i π i = 1 . is concave in x but non-convex in π . The problem can how e ver be easily ca s t into a conv ex-co ncav e form by the fo llowing change of v ariables, (3.5) π i ∝ e y i , and by r eplacing the normaliza tion conditio n with the s impler constraint (3.6) y 1 = 0 . The constraint in (3.6) removes the in v ar iance of the ob jective in (3) with res p ect to a consta n t shift o f y . P rop er stationar y probabilities π i can b e obtained fr om the new v ariables y i according to (3.5) follow ed by straig h tfor w ard normalization. The v ariable y i is the negative free energy of the state i . The final form o f the dual reversible MLE pr oblem is (3.7) max y min x − X i,j c ij log ( x i e y j + x j e y i ) + X i x i + X i,j c ij y j sub ject to x i ≥ 0 , y 1 = 0 . The ob jective in (3.7) is conv e x in x and concave in y . The feasible set is co n vex so that (3.7) is a conv ex- concav e prog ram. F or a given state space with n states the o riginal reversible MLE problem (2.3 ), a non- c o nv ex constra ined minimization problem in O ( n 2 ) unknowns, is reduced to a conv ex-co ncav e progr amming problem in O ( n ) unknowns with simple constraints. 3.1. Scaling. W e observe that the num b er of iterations needed for the solution of (3.7) using the a lgorithm from [1 9] can b e drastica lly reduced by sca ling the count- matrix by a constant factor γ chosen as (3.8) γ = max i,j c ij − 1 . 4 With scaled entries ˜ c ij = γ c ij and scaled v aria bles ˜ x = γ x , ˜ y = y we hav e (3.9) ˜ f 0 ( ˜ x, ˜ y ) = γ f 0 ( x, y ) + const. The cons traints in (3.7) a re inv a riant under the sca ling so that the optimal po in t for (3.7) can be obtained from the optimal so lutio n to the scaled pro blem. The r esulting statio na ry probabilities as well as the transition pr obabilities ar e inv aria n t under the scaling, (3.10) ˜ p ij = (˜ c ij + ˜ c j i ) e ˜ y j ˜ x i e ˜ y j + ˜ x j e ˜ y i = ( c ij + c j i ) e y j x i e y j + x j e y i = p ij . 3.2. Sp ecial cases and extensions. The reversible estimation pr oblem with fixed stationa ry vector π (3.11) min P − X i,j c ij log p ij sub ject to p ij ≥ 0 , X j p ij = 1 , π i p ij = π j p j i is a conv ex problem and can efficiently b e solved in its dual formulation (3.1) us ing an interior-po int method for convex progra mming problems. The reversible estimation problem with pa rtial information a bo ut the stationa ry vector (3.12) min π ,P − X i,j c ij log p ij sub ject to p ij ≥ 0 , X j p ij = 1 , π i > 0 , X i π i = 1 , π i p ij = π j p j i , π i = ν i i ∈ I , with I ( { 1 , . . . , n } and given po s itive weigh ts ( ν i ) i ∈ I can b e solved via its dual (3.13) max y min x − X i,j c ij log ( x i e y j + x j e y i ) + X i x i + X i,j c ij y j sub ject to x i ≥ 0 , y i = log ν i i ∈ I . The r eversible estima tion pr oblem with bo und-constrained information a bo ut the stationary vector (3.14) min π ,P − X i,j c ij log p ij sub ject to p ij ≥ 0 , X j p ij = 1 , π i > 0 , X i π i = 1 , π i p ij = π j p j i , η i ≤ π i ≤ ξ i i ∈ I . with I ⊆ { 1 , . . . , n } a nd given p ositive b ounds ( η i ) i ∈ I , ( ξ i ) i ∈ I can b e solved via the dual (3.15) max y min x − X i,j c ij log ( x i e y j + x j e y i ) + X i x i + X i,j c ij y j sub ject to x i ≥ 0 , log η i ≤ y i ≤ log ξ i i ∈ I . 5 The t wo pr o blems (3.13), (3.15) are conv ex-co ncav e pro gramming pr oblems. Non- linear, con vex ineq ua lit y and linear e q uality co ns traints p oss ibly co upling x and y can also b e treated within the algor ithmic framew o r k of [19]. A specia l case with p ossible int er est for applica tions are b ound constraints on the in teg rated stationar y weights on subsets S ⊆ { 1 , . . . , n } , (3.16) X i ∈ S π i ≤ ν. Equation (3.16) can b e express ed in ter ms of v aria bles y i as (3.17) log X i ∈ S e y i ≤ log ν k , The loga rithm of a sum of exp onentials is a co n vex function, [5]. 3.3. dTRAM. W e ca n apply the duality ar gument to each thermo dynamic state in (2.5) a nd int r o duce the coupling b etw een different ensembles, (2.4), thro ugh linear equality constr aints. The r esulting conv ex- concav e prog ramming pro blem is (3.18) max y ( α ) min x ( α ) − X α X i,j c ij log x ( α ) i e y ( α ) j + x ( α ) j e y ( α ) i + X i x ( α ) i + X i,j c ij y ( α ) j sub ject to x ( α ) i ≥ 0 , y ( α ) i − y (0) i = u ( α ) i , y (0) 1 = 0 . The n umber of iterations required to solve the dTRAM pr oblem is also greatly reduced by s caling each count-matrix ac cording to (3.19) ˜ c ( α ) ij = γ c ( α ) ij with (3.20) γ = max α,i,j c ( α ) ij As for the reversible MLE pr o blem a larger class of related dTRAM problems can b e solved by augmenting the dual problem (3.1 8) with con vex constraints, e.g. dTRAM with pa rtial or b ound constrained informa tion ab out the unbiased stationa ry vector. It mu st be ensured that the additiona l cons traints on the biased stationar y probabilities do not result in an infeasible problem, i.e. the r e w e ig ht ing co ndition (2.4) and the constraints cannot b e fulfilled simultaneously . 4. Con vex-conc av e programs and v ariational inequali ties. A conv ex-concave progra m is the fo llowing saddle p oint pro blem, (4.1) max y min x f ( x, y ) sub ject to ( x, y ) ∈ K with f co nvex in x , concav e in y , a nd K ⊆ R n a conv ex set. Conv ex-co ncav e programs can be treated as sp ecial cases of finite-dimensional v ariational ine q uality (VI) problems, [10]: F or a given feasible set K ⊆ R n and a mapping Φ : K → R n find a p oint z ∗ ∈ K such that (4.2) ( z − z ∗ ) T Φ( z ∗ ) ≥ 0 ∀ z ∈ K . 6 An y p oint z ∗ satisfying (4.2) is a solutio n or optimal p oint for the VI. The conv ex- concav e prog r am is ca st int o the VI-form by defining (4.3) Φ( z ) = ∇ x f ( x, y ) −∇ y f ( x, y ) , z = ( x, y ) . A mapping Φ is said to b e monotone if (4.4) ( z ′ − z ) T (Φ( z ′ ) − Φ( z )) ≥ 0 ∀ z ′ , z ∈ K . Monotonicity of (4.3) follows fro m the conv ex- concav e prop erty of f . If K is a conv ex p olyhedr al set, i.e. solely defined in terms of linea r equalities and inequalities, (4.5) K = { z ∈ R n | Az − b = 0 , Gz − h ≤ 0 } , then z so lves the VI (4.2) if a nd o nly if there are vectors λ , ν , s , s uc h that the following KKT-conditio ns ar e fulfilled [10], (4.6) Φ( z ) + A T ν + G T λ = 0 Az − b = 0 Gz − h + s = 0 λ T s = 0 λ, s ≥ 0 The vectors λ and ν are dual v ar ia bles ass o ciated with the inequality and equalit y constraints. The vector of slack v ariables, s = ( h − Gz ), transforms the linear in- equality constraints for z into simple non-negativity constr aints fo r s . Optimality conditions for conv ex K in s tandard for m, i.e. defined by a finite num b er of linear equalities and conv ex inequalities, are a lso av ailable, cf. [10]. A direct a pplication of a Newton type method to (4.6) ensuring p ositivity of λ and s is usually unsuccessful since the solution progress r apidly stagnates once the iterates appro ach the b oundary of the feasible set. A pos sible str a tegy to circumv ent this pr oblem is numerical path-following. In- stead of attempting a direct solutio n of (4 .6) path-following pr o ceeds b y so lving a sequence of pr oblems with p erturb ed complementarit y condition, (4.7) Φ( z ) + A T ν + G T λ = 0 Az − b = 0 Gz − h + s = 0 λ T s = µ λ, s ≥ 0 tracing the central path of s olutions z ∗ ( µ ) tow ards z ∗ (0) with µ → 0 + . Perturbing the complemen ta rity co ndition ensur es that the bo undary o f the feasible set is not reached pr ematurely and the iteration makes go o d pro gress along the computed search direction. Int er ior-p oint metho ds ensure the po sitivity of λ a nd s at each step of the itera- tion. If in addition a strictly feasible starting po in t Az (0) − b = 0, Gz (0) − h + s (0) = 0 7 is use d then all iterates pro duced by the algo r ithm lie in the int er ior of the feasible region. Progr ess to wards a so lution of the p erturb ed KKT- conditions (4.7) is usually made by taking steps a lo ng the Newton direction computed from the following linear system, (4.8) D Φ( z ) A T G T 0 A 0 0 0 G 0 0 I 0 0 S Λ ∆ z ∆ ν ∆ λ ∆ s = − Φ( z ) + A T ν + G T λ Az − b Gz − h + s S Λ e − µ e , with diag o nal matrices S = dia g( s 1 , s 2 , . . . ), Λ = diag( λ 1 , λ 2 , . . . ), the vector e = (1 , 1 , . . . ), and the per turbation para meter µ > 0 . W e use the following sho rt-hand no tation for the dual residuum, (4.9) r d = Φ( z ) + A T ν + G T λ, the primal re s iduals, (4.10) r p, 1 = Az − b , r p, 2 = Gz − h + s, and the p erturb ed co mplemen ta ry slackness, (4.11) r c ( µ ) = S Λ e − µ e . Solving the linear system (4.8) is the most exp ensive part of the a lg orithm. The sparse blo ck s tructure of (4.8) ca n be us e d to significantly sp eed up the solution pro cess. E limination of ∆ s and ∆ λ reduces (4.8) to the augmente d system (4.12) H A T A 0 ∆ z ∆ ν = − r d + G T Σ r p, 2 − G T S − 1 r c ( µ ) r p, 1 , with diagona l matrix Σ = S − 1 Λ and augmented Ja cobian H = D Φ + G T Σ G . The increments ∆ λ and ∆ s can b e c o mputed from ∆ z , (4.13) ∆ s = − r p, 2 − G ∆ z ∆ λ = − Σ∆ s − S − 1 r c ( µ ) . F or nons ing ular H further elimination of ∆ z fr om (4.12) is p ossible. The resulting normal e quations for ∆ ν are, (4.14) S ∆ ν = r 2 − AH − 1 r 1 . The vectors r i are the tw o co mpo nen ts of the RHS of (4.12) and the ma tr ix S = AH − 1 A T is the Sch ur complement of H . The increment ∆ z can then b e computed according to (4.15) ∆ z = − H − 1 ( r 1 + A T ∆ ν ) . A singular matrix H can for example o ccur for an equa lit y - constrained convex progra mming pro blem for whic h the ob jective is not strictly conv ex. Even if the 8 constraints ensure that the problem ha s a unique solution, H will b e singular so tha t the normal e q uations c a n not b e for med. F or co n vex prog r amming problems a non- s ingular H ca n b e efficiently factorized using a symmetric po sitive-definite Cho lesky fa ctorization. In the conv ex-co ncav e case the Jaco bian of the mapping Φ is not symmetric, (4.16) D Φ( z ) = ∇ x ∇ x f ( x, y ) ∇ y ∇ x f ( x, y ) T −∇ y ∇ x f ( x, y ) −∇ y ∇ y f ( x, y ) . In that case the a ug men ted sy stem is not symmetric and the Cho lesky facto rization can not b e us ed. A further sp eed-up in the co mputation of the Newto n direction can b e a chiev ed through the ex ploitation of spa rse o r blo ck-sparse s tructure po ssibly pres e n t in D Φ , G , A . In this situatio n solution via an itera tive method can b e particular ly efficien t if a go o d preco nditioner is av ailable. 5. Implementation detail s. In order to apply the a lgorithm in [19] to the reversible MLE pro blem (3.7) w e transfor m the convex-concav e prog ram into the VI form using the mapping Φ = ( ∇ x f , −∇ y f ) in (4 .3). The g r adient of the ob jective in (3.7) is g iven by (5.1) ∂ x k f = − X j ( c kj + c j k ) e y j x k e y j + x j e y k + 1 ∂ y k f = − X j ( c kj + c j k ) x j e y k x k e y j + x j e y k + X i c ik . F or the computation of the Newton direction w e also need the Jac o bian D Φ. The diagonal blo cks ar e given by (5.2) ∂ x k ∂ x l f = X j ( c kj + c j k ) e y j e y j ( x k e y j + x j e y k ) 2 δ k,l + ( c kl + c lk ) e y k e y l ( x k e y l + x l e y k ) 2 , ∂ y k ∂ y l f = − X j ( c kj + c j k ) x k e y j x j e y k ( x k e y j + x j e y k ) 2 δ k,l + ( c kl + c lk ) x k e y l x l e y k ( x k e y l + x l e y k ) 2 , and off-diagona l blo cks are given by (5.3) ∂ y k ∂ x l f = X j ( c kj + c j k ) e y k x j e y j ( x k e y j + x j e y k ) 2 δ k,l − ( c kl + c lk ) x k e y k e y l ( x k e y l + x l e y k ) 2 , ∂ x k ∂ y l f = ∂ y l ∂ x k f . It is s traightforw a rd to enco de the equality and inequalit y co nstraints in (3 .7 ) int o matrices A , G and vectors b , h . (5.4) A = (0 , . . . , 0 | {z } n , 1 , 0 , . . . , 0 | {z } n ) , (5.5) b = 0 , (5.6) G = ( − I n , 0 n ) , 9 (5.7) h = (0 , . . . , 0) T with I n the identit y a nd 0 n the zero matrix in R n × n . The Jacobia n D Φ is singular b ecaus e of the inv ariance of the ob jective f under a constant shift of y ; this is also true for the a ugmented Jacobian H s ince the inequalities act only o n x . Therefor e the normal equations (4.1 4) cannot b e formed and the search direction has to b e computed fro m the augmented sys tem (4.1 2). The blo cks of D Φ hav e the same spar sit y pattern a s the matrix C s = C + C T . This matrix is usually sparse. The augmented J acobian differs from the orig inal Jacobia n only on the dia g onal so that it is also spar se in a situation in which C s is spars e . The equality constra in ts for the r eversible MLE problem do o nly affect the y v ariables, i.e. A = (0 , A y ). The augmented system, (4 .12), can be c a st in to the following symmetric form, (5.8) H xx H y x 0 H T y x − H y y − A T y 0 − A y 0 ∆ x ∆ y ∆ ν = b x − b y − b ν . The a ug men ted sys tem matrix, W , on the left-hand side o f (5.8) is indefinite so that a symmetric indefinite factoriz a tion, [6], or the minim um re s idual (MINRES) metho d, [17], can be used to solve (5.8) . If an iterative metho d is used, a s uita ble preconditioner needs to remov e the ill-conditioning due to the Σ = S − 1 Λ term in H . MINRES r equires a p ositive definite pr econditioner. W e use a po sitive definite diagonal preconditioning matrix, T , with diagonal entries, (5.9) t ii = ( | w ii | if | w ii | > 0 1 else . 5.1. dTRAM. W e ca n a lso a pply the primal-dua l interior-p oint method to the conv ex-co ncav e reformulation of the dTRAM problem, (3.18). The dTRAM problem consists of a reversible MLE problem for eac h thermo dynamic state co upled via an equality constr aint. The r esulting VI-mapping fo r dTRAM is given by the vector Φ = (Φ 0 , . . . , Φ m ) . The en tr y Φ α is the mapping for the reversible MLE pro blem at thermodyna mic state α . Since Φ α depe nds o nly o n v ariables ( x ( α ) , y ( α ) ) the Jacobian of Φ α has a blo ck-diagonal structure D Φ = D Φ 0 . . . D Φ m The ma trix D Φ α is the mapping for the reversible MLE problem at thermo dyna mic state α . T he linear inequality co nstraints at different α are deco upled so that G is also blo ck dia gonal, G = G 0 . . . G m . 10 The blo ck G ( α ) is the matrix o f inequality cons traints at thermo dynamic sta te α , G α = ( − I n , 0 n ) , and h = 0 is the corr e s po nding RHS. The matrix for the equality constraints has the following for m, A = A 0 0 . . . 0 A 1 , 0 A 1 . . . 0 . . . . . . . . . . . . A m, 0 0 . . . A m with A 0 = (0 , . . . , 0 , 1 , . . . , 0) the co nstraint matrix for the unbiased ens e m ble, α = 0, and A α = (0 n , I n ) the constra int matrix at condition α 6 = 0. The matrix A α, 0 = (0 n , − I n ) is the co upling matrix b etw een biased a nd un bia s ed ensemble. The co rre- sp onding RHS is b = b 0 . . . b m with b 0 = 0 , and b α = ( u ( α ) i ) the v ector of energy differences with respect to the un bia sed condition. The blo ck-diagonal fo rm of D Φ and G can b e exploited for the solution of the aug- men ted sy stem. The blo ck diagona l structure o f D Φ and G implies a blo ck diag onal structure for H , (5.10) H = H 1 . . . H m , . The block H α = D Φ α + G T α Σ α G α is the augmented Ja c obian at thermodyna mic state α . Using the blo ck structure of H a nd A , the aug men ted system (4.12) can b e reorder ed res ulting in the following linea r system, (5.11) W 0 B T 1 , 0 . . . B T m, 0 B 1 , 0 W 1 . . . 0 . . . . . . . . . . . . B m, 0 0 . . . W m ∆ ξ 0 ∆ ξ 1 . . . ∆ ξ m = − ˜ b 0 ˜ b 1 . . . ˜ b m The augmented s ystem matrix at c o ndition α is (5.12) W α = H α A T α A α 0 . The coupling b etw een the bia sed condition a nd the unbiased c o ndition is enco ded in the matrix (5.13) B α, 0 = 0 0 A α, 0 0 α 6 = 0 . 11 The vector ∆ ξ α = (∆ z α , ∆ ν α ) is the resulting increment for the augmented s ys- tem at condition α . The vector ˜ b α in (5.11) is given b y the RHS of the augmented system at co ndition α , (5.14) ˜ b α = r ( α ) d + G T α Σ α r ( α ) p, 2 − G T α S − 1 α r ( α ) c ( µ ) r ( α ) p, 1 ! . The arr ow-shaped s tructure of the linear system in (5.11) allows us to apply the Sch ur co mplement metho d, [1 1, 25], to eliminate ∆ ξ 1 , . . . , ∆ ξ m and solve the fo llowing condensed system for ∆ ξ 0 , (5.15) S ∆ ξ 0 = − ˜ b 0 − m X α =1 B T α, 0 W − 1 α ˜ b α ! The Sch ur co mplemen t matr ix is (5.16) S = W 0 − m X α =1 B T α, 0 W − 1 α B α, 0 ! . All other incre ments can b e computed fr om ∆ ξ 0 via (5.17) ∆ ξ α = − W − 1 α ˜ b α + B α, 0 ∆ ξ 0 F or a system with n states at m ther mo dynamic conditions the complexity for a direct factoriz a tion o f the Newton system (4.8) is O ( m 3 n 3 ). The Sch ur comple- men t approach r e duces co mplexity to O ( mn 3 ). In addition, assem bly of the Sch ur complement in (5.1 6) and solution o f (5.17) can b e easily pa r alellized. As for the rev ersible MLE case, the blo cks of D Φ α hav e the same sparsity pattern as the matrix C ( α ) s = C ( α ) + C ( α ) T . The same is tr ue for the augmented Ja cobian H α except for the diagonal. Since C ( α ) s is usually sparse we use a sparse LU metho d to factor the augmented sys tem matrices W α for α > 0. The direct assembly o f the Sch ur complement in (5.1 6) is expe nsive since the computatio n of W − 1 α B α, 0 requires O ( n ) solves. If an itera tiv e metho d is used to solve the condensed system (5.15) one would like to avoid as sembly of the Sch ur c o mplemen t S in (5.16) all to g ether. Instead o nly few matrix-vector pro ducts in volving S should b e computed. As for the r eversible MLE case, w e can transform the condensed system in to a symmetric indefinite form and use MINRES to o btain a solution. Obtaining a g o o d preconditioner without ex plicit assembly of S is difficult. W e use the probing method outlined in [7] to obtain an approximation of the diagonal of S using o nly few matrix -vector pr o ducts. W e then construct a p ositive definite diag onal pr econditioning matrix T with entries t ii = ( | ˜ s ii | if | ˜ s ii | > 0 1 else . The entry ˜ s ii denotes the diagonal entry estimated by the pr o bing a ppr oach. The Sch ur co mplemen t bas ed solution ca n also b e applied to the dTRAM pro ble m with additional constraints whenever those constraints do not couple differen t biasing conditions. 12 T ab le 1 R eversible MLE pr oblem. Newton-IP algorithm v s. SC- i ter ation. We r ep ort t he numb er of states N , the gr owth factor for states N/n ( n is the numb er of states in the pr evious r ow), the total algorithm run time T (in se c onds), the gr owth factor for run time T /t ( t is the run time in t he pr evious r ow), the sc aling exp onent for run time with incr e asing numb er of state s p , ( T ∝ N p ), and the sp e e dup of t he Newton-IP metho d over the SC-iter ation SC/IP. The sc aling is sub quadr atic for b oth metho ds. The Ne wton-IP algorithm achieves a signific ant sp e e d-up ove r the SC-iter ation for al l e xamples exc e pt the p entap eptide. System N N /n Newton-IP SC-iteration SC/IP T T /t p T T /t p Three-well 361 1.1 4.6 4.0 2134 5.9 7.3 6.4 1.0 75.1 16.2 1 .6 10.2 8190 3.8 5 6 .8 7.7 1.5 40 0.3 5.3 1.2 7.0 29618 3.6 286.8 5.0 1.3 10 76.9 2.7 0.8 3.8 Alanine 292 0.7 4.2 6.3 1059 3.6 4.2 6.4 1.4 32.3 7.8 1.6 7.6 3835 3.6 3 2 .2 7.6 1.6 21 4.0 6.6 1.5 6.6 5826 1.5 6 1 .8 1.9 1.6 34 7.7 1.6 1.2 5.6 Pen tap eptide 250 0.6 0.2 0.4 500 2.0 1.2 1.9 0.9 0.6 2 .4 1 .3 0.5 1000 2.0 3.6 3.0 1.6 1.0 1.8 0.9 0.3 2000 2.0 5.4 1.5 0.6 1.3 1.3 0.4 0.2 Birth death 100 1.0 10.4 10.6 200 2.0 2.1 2.1 1.1 34.1 3.3 1.7 16 .3 500 2.5 5.8 2.8 1.1 18 5.3 5.4 1.8 31.7 1000 2.0 1 3 .9 2.4 1.3 33 8.7 1.8 0.9 24.3 6. Results. Below we rep ort results for the primal-dua l interior-p oint (Newton- IP) and the self cons is ten t itera tion (SC-iteration) approach to so lving the reversible MLE and dTRAM problem. W e compar e the efficiency of both alg orithms for a nu mber of examples. Using iterative metho ds for the so lution of the linear sys tems arising in the Newton-IP approa ch we a c hieve a similar scaling b ehavior as for the SC- iteration. W e demonstrate that the Newton-IP approa ch offers a significant sp eedup for nearly a ll examples. 6.1. Rev ersi ble MLE. In T a ble 1 we compar e the p erfo r mance of the algor ithm for differen t exa mple data- sets. The count matrix was estimated from the full data set using the sliding-w indow metho d [18]. The tolerance indica ting co n vergence was tol = 10 − 12 for b oth algor ithms. Both metho ds exhibit a sub quadratic scaling in the nu mber of s ta tes. The Newton-IP metho d is able to achieve a s ig nificant sp eed-up ov er the SC-itera tion for all examples except for the p entapeptide data. In Figure 1 we show the per formance o f b oth metho ds fo r the alanine dipeptide system with 36 1 states. F or the SC-iteration the num be r of itera tions required to conv erg e to a given tolera nce is very v ariable across differen t da ta sets. The total nu mber of itera tions required to conv erg e deterio rates with increa sing amount of input data. F or the Newton-IP metho d the requir ed num be r of iterations is consis ten t acro ss all data sets. Both metho ds exhibit sub quadratic s c aling in the num b er of o bs erved states. 13 0 2 4 6 8 10 12 14 16 Number of iterations, N 10 − 16 10 − 14 10 − 12 10 − 10 10 − 8 10 − 6 10 − 4 10 − 2 10 0 Error, k π ( k ) − π ∗ k T = 200 ns T = 1 µs T = 2 µs T = 5 µs T = 10 µs (a) 0 1000 2000 3000 4000 5000 6000 7000 Number of iterations, N 10 − 10 10 − 9 10 − 8 10 − 7 10 − 6 10 − 5 10 − 4 10 − 3 Error, k π ( k ) − π ∗ k T = 200 ns T = 1 µs T = 2 µs T = 5 µs T = 10 µs (b) 10 2 10 3 10 4 Number of states, n 10 − 1 10 0 10 1 10 2 10 3 Time, t in s Newton-IP sc-iteration t ∝ n 2 (c) Figure 1 . Comp arison of Newton i nt erior-p oint metho d, a), and self-c onsistent i t er ation, b) for the alanine dip eptide example. Conver g e nc e is plo t te d for differ ent data sets co rr e sp onding to differ- ent amounts of total simulation time. The ve ctor π ∗ is a r efer enc e st ationary distribution obtaine d fr om the c onver ged Newton interior-p oint metho d. The N ewton interior-p oint metho d c onver g es sup erline arly, the self- c onsistent iter ation c onver ges linea rly. The numb er of re q uir e d iter ations is very sensitiv e to the input data se t for the SC-iter ation while the Newton-IP metho d is only mild ly affe cte d. c) Both metho ds e xhibit a sub quadr atic sc aling in the numb er of states. The Newton-IP metho d achieves a signific ant sp e e d-up over the SC-itera t i on. 6.2. dTRAM . In T able 2 we c o mpare the p e r formance of the Newton-IP and the SC-iteration for different exa mples. The count matrix was estimated from the full data set using the sliding-window metho d [18]. The tolera nce indicating conv ergence was tol = 1 0 − 10 for b oth alg o rithms. The Newto n- IP metho d is more e fficie n t for all three examples and achieves a dramatic sp eed-up (orders of magnitude). The Sch ur complement probing appr o ach is succes s ful fo r the a lanine and the doublewell umbrella sampling example. F or the m ulti-temp era tur e example the Sch ur complement was assembled and the condensed s ystem was solved using a direct metho d. F or the SC-iteration metho d the req uired time to so lve the multi-temperature exa mple was very lar ge s o that computations were only car ried out for t wo examples with a small nu mber of states. Both metho ds scale linearly in the num b er o f thermo dynamic s tates. The Newton- IP metho d with Sch ur complement probing scales at most quadratic in the num b er of states. If the Sch ur co mplemen t is assembled and facto red by a direct metho d the scaling is b etw een quadratic and cubic. The SC-iteration exhibits quadratic sca ling in the num b er of s ta tes. The Newton-IP metho d achiev es order s of magnitude sp eed-up compared to the SC-iteration for a ll examples. In Figure 2 we s how p erforma nce of the Newton-IP and SC-iteration for the dou- 14 T ab le 2 Newton-IP algorithm vs. SC-iter ation for the dTRAM pr oblem. We r e p ort the numb er of states N , the numb er of thermo dynamic state M , the gr owth factor for states N/n ( n is t he numb er of states in the pr evious ro w), the total algorithm run time T (in se co nds), the gr owth factor for run time T /t ( t is the run ti me in the pr evious r ow), the sca ling e xp onent for run ti me with i ncr e asing numb er of states p , ( T ∝ N p ), and the sp e e dup of the Newton IP metho d over the SC metho d SC/IP. In one c ase we r ep ort inste ad the gr owth factor of the numb e r of thermo dynamic st ate s M /m ( m is the numb er of states in the pr evious r ow) and the sc aling exp onent for run time with incr e asing numb er of thermo dynamic states ( T ∝ M p ). Both metho d sca le line arly in the numb er of thermo dynamic states. The Newton-IP metho d with Schur c omplement pr obing (alanine, doublewel l with umbr el la sampling) sca les at most quadr atic in the numb er of states. If the Schur c omplement is assemble d and factor ed by a dir e ct metho d (doublewel l wit h indep endent temp er atur e sampling) the sc aling is b etwe en q uadr atic and cubic . The SC-iter ation exhibits quad ra ti c sc aling in the numb er of states. The Newton-IP metho d achieves or ders of magnitude sp e e d-up c omp ar e d to t he SC-iter ation for al l examples. System N M N /n Newton-IP SC-iteration SC/IP T T /t p T T /t p Alanine 292 40 34.0 1263.9 37.2 1521 4 0 5.2 202 .4 6.0 1.1 660 1 8.4 52.2 2.4 326 .2 Doublewell, um br ella 100 20 5.1 115 .5 22.7 199 20 2.0 6.4 1.3 0.3 492.9 4.3 2.1 77 .1 497 20 2.5 17.3 2.7 1.1 3258.4 6.6 2.1 18 8.7 990 20 2.0 48.3 2.8 1.5 13 729.7 4.2 2.1 28 4 .4 1978 2 0 2.0 193 .1 4.0 2.0 598 9 0.5 4.4 2.1 310.1 Doublewell, um br ella 100 20 5.1 115 .5 22.7 100 40 2.0 8.3 1.6 0.7 244.5 2.1 1.1 29 .3 100 80 2.0 16.5 2.0 1.0 72 1.1 2.9 1.6 43.8 100 100 1.2 2 0.9 1.3 1 .1 111 0.6 1.5 1.9 53.1 Doublewell, m ulti- temper ature 100 16 3.7 12223 .2 3285.8 200 16 2.0 10.7 2.9 1.5 50 446.2 4.1 2.0 4705.8 500 16 2.5 79.8 7.4 2.2 1000 1 6 2.0 544 .5 6.8 2.8 blewell um br e lla -sampling ex ample. The Newton-IP metho d ach ie ves a significant sp eed-up (up to tw o orders of mag nitude) ov er the SC-iteratio n. 7. Conclusion. W e s how that the pro ble m of finding the maximum likelihoo d reversible tr ansition matrix on a finite state space is equiv alent to a co nvex-concav e progra mming problem with a muc h smaller num b er o f unknowns and co nstraints. The prima l-dual interior-p oint metho d for monotone v aria tional inequalities out- lined in [19] can b e used to efficiently solve the arising conv e x -concav e prog ram. F o r a nu mber of examples the pro po sed algorithm significantly s peeds up the computation of the r eversible MLE compar ed to a previously pro p os ed fixed-p oint iteration. The conv ex-co ncav e reformulation makes it p oss ible to efficiently solve a num b er of rela ted pr oblems a rising in the co n text of reversible Mar ko v chain estimation. One application of sp ecial interest is statistical r eweigh ting of data fro m mu ltiple ensembles via the dTRAM metho d [23]. W e extend the convex-concav e r eformulation to the dTRAM problem so that it ca n a ls o b e so lved by a primal-dual interior-p oint metho d. W e show that the a rising linea r systems can b e efficien tly solved using a 15 10 2 10 3 Number of states, n 10 − 1 10 0 10 1 10 2 10 3 10 4 10 5 Time, t in s Newton-IP sc-iteration t ∝ n 2 (a) 10 1 10 2 Number of thermo dynamic states, m 10 0 10 1 10 2 10 3 10 4 Time, t in s Newton-IP sc-iteration t ∝ m (b) Figure 2 . Comp arison of the Newton-IP metho d and the SC-iter ati on for the dTRAM pr oblem. We show r esults for the doublewel l p otential with harmonic umbr el la for cing. a) Both metho ds exhibit quadr atic sca ling in the numb er of states, but the Newton met ho d is up t o two or ders of magnitude faster then the sc itera t i on. b) Sc aling is line ar in the numb er of thermo dynamic states for b oth metho ds. Sch ur co mplemen t a pproach. The o utlined alg orithm is shown to sig nificantly sp eed up the so lution pro cess compared to a previous ly pro po s ed fixed-p oint iteration. Similar to the reversible MLE problem a num b er of related dTRAM problems can b e solved using o ur metho d. The efficient linear solution of the arising Newto n systems using the Sch ur-complement metho d ca n b e retained no additio nal coupling betw een the differe n t thermo dyna mic ensembles is intro duced. The inv es tigation of efficient preconditioning techniques for the presented prob- lems remains a topic fo r future resear ch . Obtaining a g o o d prec o nditioner for the Sch ur co mplement without direct ass em bly is of special in teres t for the dTRAM pro b- lem. Ac kno wl edgments. The authors w ould like to thank C. W ehmeyer and F. Paul for stimulating disc us sions. B. T.- S. thanks E. P ipping and C. Gr¨ aser for v a luable comments and s uggestions. REFERENCES [1] D. Aldous and J. A. Fill , R eversible markov chains and r andom walks on gr aphs , 200 2. Unfinished m onograph, recompiled 2014, a v ailable at http://w ww.stat.berkeley.edu/ ~ aldous/R WG/book.html . [2] J . Besag and D. Mondal , Exact go o dness-of-fit t ests for markov c hains , Bi ometrics, 69 (2013), pp. 488–49 6. [3] G . R. Bowman, K. A. Beauchamp, G . Bo x er, and V. S . P an de , Pr o gr ess and chal lenges in the automate d c onstruction of markov state mo dels for ful l pr otein systems , The Journal of Chemical Ph ysics, 131 (2009), pp. –. [4] G . R. Bowman, V. S . P ande, a nd F. No ´ e , A n intr o duction to markov state mo dels and their applic ation t o long timesc ale mole cular simulation , vol. 797, Springer Science & Business Media, 2013. [5] S . Boyd and L. V andenberghe , Convex optimization , Cam bridge univ er sit y press, 2004. [6] J . R. Bunch and L. Kaufman , Some stable metho ds for c alculating inertia and solving sym- metric line ar systems , Mathematics of computation, (1977), pp. 163–179. [7] T. F. C. Chan and T. P. M a thew , The interfac e pr obing te chnique in domain de co mp osition , SIAM Journal on Matrix Analysis and Applications, 13 (19 92), pp. 212–2 38. [8] J . Denny and A. Wright , On tests for markov dep endenc e , Pr obabilit y Theory and Related Fields, 43 (1978), pp. 331–33 8. 16 [9] P. Diaco n is a n d S. W. W. Rolles , Bayesian analysis for r eversible markov chains , Ann. Statist., 34 (20 06), pp. 1270–129 2. [10] F. F acchinei an d J .-S. P ang , Finite -dimensional variational ine q ualities and c omplementarity pr oblems , Springer Science & Business Media, 200 7. [11] J. Kan g, Y. Cao, D. P. Wo rd, and C. Laird , A n interior-p oint metho d for efficient solution of blo ck-structur ed { NLP } pr oblems using an implicit schur-c omplement de c omp osition , Computers & Chemical Engineering, 71 (201 4), pp. 563 – 573. [12] D. A. Levin, Y. Peres, and E. L. Wilmer , Markov chains and mixing times , American Mathematical Society , 2009. [13] N. M etropolis , A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, an d E. Teller , Equation of state c alculations by fast c omputing machines , The Journal of Chemical Ph ysi cs, 21 (1953), pp. 1087–109 2. [14] P. Metzner, F. No ´ e, a nd C. Sch ¨ utte , Estimation of t r ansition matrix distrib utions by monte c arlo sampling , Phys. Rev. E, 80 (2009), p. 02110 6. [15] A. J. N. Nielsen and M. Weber , Computing the nea r e st r eve rsible markov chain , N umer ical Linear Algebra with Applications, 22 (2015), pp. 483–499. [16] F. No ´ e , Pr ob ability distri b utions of mole cular observables c ompute d fr om markov mo dels , J. Chem. Ph ys., 128 (200 8), p. 24410 3. [17] C. C. P aige and M. A. Saunders , Solution of sp arse indefinite sy stems of line ar e quations , SIAM journal on numerical analysis, 12 (1975 ), pp. 617–6 29. [18] J. Prinz, H. Wu, M. Sarich, B. Keller, M. Senn e, M. Held, J. Chodera, C. Sch ¨ utte, and F. No ´ e , Markov mo dels of mole cular kinet i cs: Gener ation and validation , J. Chem. Ph ys. , 134 (2011), p. 1741 05. [19] D. Ralph and S. J . Wright , Sup erline ar c onver genc e of an inte rior-p oint metho d despite dep endent c onstr aints , Mathematics of Op erations Researc h, 25 (2000), pp. pp. 179–194. [20] C. Rober t and G . Casella , Monte Carlo statistic al metho ds , Springer Science & Business Media, 2013. [21] B. Trendelkamp-Schroer and F. No ´ e , Efficient est i mation of r ar e-event kinetics , Phys. Rev. X, 6 (2016), p. 011009. [22] B. Trendelkamp-Schroer, H. Wu, F. P aul, and F. No ´ e , Estimation and unc ertainty of r eversible markov mo dels , J. Chem. Ph ys., 143 (2015). [23] H. Wu, A. S. J. S. Mey, E. Rost a, and F. No ´ e , Statistica l ly optimal analysis of state- discr etize d t r aje ctory data fr om multiple t hermo dynamic states , J. Chem. Phys., 141 (2014), p. 214106 . [24] H. Wu and F. No ´ e , Opti mal e stimation of fr e e e ner gies and stationary densities fr om multiple biase d simulations , M ultiscale Modeling & Sim ulation, 12 (2014), pp. 25–54. [25] V. M. Za v a la , C. D. Laird, and L. T. Biegler , Inte ri or-p oint de c omp osition appr o aches for p ar al lel solution of lar ge-sca le nonline ar p ar ameter e st imation pr oblems , Chemical Engi- neering Science, 63 (2008), pp. 483 4 – 4845. 17
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment