Pymanopt: A Python Toolbox for Optimization on Manifolds using Automatic Differentiation


Authors: James Townsend, Niklas Koep, Sebastian Weichwald

Journal of Machine Learning Research 17 (2016) 1-5. Submitted 4/16; Revised 7/16; Published 8/16.

James Townsend (james.townsend.14@ucl.ac.uk), University College London, London, UK
Niklas Koep (niklas.koep@rwth-aachen.de), RWTH Aachen University, Germany
Sebastian Weichwald (sweichwald@tue.mpg.de), Max Planck Institute for Intelligent Systems, Tübingen, Germany

Editor: Antti Honkela

Abstract

Optimization on manifolds is a class of methods for optimization of an objective function, subject to constraints which are smooth, in the sense that the set of points which satisfy the constraints admits the structure of a differentiable manifold. While many optimization problems are of the described form, technicalities of differential geometry and the laborious calculation of derivatives pose a significant barrier for experimenting with these methods. We introduce Pymanopt (available at pymanopt.github.io), a toolbox for optimization on manifolds, implemented in Python, that, similarly to the Manopt [1] Matlab toolbox, implements several manifold geometries and optimization algorithms. Moreover, we lower the barriers to users further by using automated differentiation [2] for calculating derivative information, saving users time and saving them from potential calculation and implementation errors.

Keywords: Riemannian optimization, non-convex optimization, manifold optimization, projection matrices, symmetric matrices, rotation matrices, positive definite matrices

1. Introduction

Optimization on manifolds, or Riemannian optimization, is a method for solving problems of the form

    min_{x ∈ M} f(x)

where f : M → R is a (cost) function and the search space M is smooth, in the sense that it admits the structure of a differentiable manifold.
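As a toy instance of this setting (an illustration added here, not an example from the paper, and independent of Pymanopt itself): maximizing the Rayleigh quotient x^T A x over the unit sphere yields a dominant eigenvector of a symmetric matrix A. A minimal NumPy sketch of Riemannian gradient ascent on the sphere, using tangent-space projection and renormalization as the retraction, with illustrative values throughout:

```python
import numpy as np

# Hypothetical symmetric matrix A for illustration
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A + A.T

# Random starting point on the unit sphere S^{n-1}
x = rng.standard_normal(5)
x /= np.linalg.norm(x)

step = 0.05
for _ in range(1000):
    egrad = 2 * A @ x                # Euclidean gradient of f(x) = x^T A x
    rgrad = egrad - (x @ egrad) * x  # project onto the tangent space at x
    x = x + step * rgrad             # ascent step in the tangent direction
    x /= np.linalg.norm(x)           # retract back onto the sphere

# The Rayleigh quotient approaches the largest eigenvalue of A
print(x @ A @ x, np.max(np.linalg.eigvalsh(A)))
```

This hand-rolled loop is only meant to make the projection/retraction idea concrete; Pymanopt's solvers implement far more sophisticated versions of this adaptation, and Section 2 of the paper notes that genuinely Riemannian methods outperform naive project-after-each-step schemes.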
Although the definition of differentiable manifold is technical and abstract, many familiar sets satisfy this definition and are therefore compatible with the methods of optimization on manifolds. Examples include the sphere (the set of points with unit Euclidean norm) in R^n, the set of positive definite matrices, the set of orthogonal matrices, as well as the set of p-dimensional subspaces of R^n with p < n, also known as the Grassmann manifold.

To perform optimization, the function f needs to be defined for points on the manifold M. Elements of M are often represented by elements of R^n or R^{m×n}, and f is often well defined on some or all of this "ambient" Euclidean space. If f is also differentiable, it makes sense for an optimization algorithm to use the derivatives of f and adapt them to the manifold setting in order to iteratively refine solutions based on curvature information. This is one of the key aspects of Manopt (Boumal et al., 2014), which allows the user to pass a function's gradient and Hessian to state-of-the-art solvers which exploit this information to optimize over the manifold M. However, working out and implementing gradients and higher-order derivatives is a laborious and error-prone task, particularly when the objective function acts on matrices or higher-rank tensors. Manopt's state-of-the-art Riemannian Trust Regions solver, described in Absil et al. (2007), requires second-order directional derivatives (or a numerical approximation thereof), which are particularly challenging to work out for the average user, and more error-prone and tedious even for an experienced mathematician.

It is these difficulties which we seek to address with this toolbox. Pymanopt supports a variety of modern Python libraries for automated differentiation of cost functions acting on vectors, matrices or higher-rank tensors. Combining optimization on manifolds and automated differentiation enables a convenient workflow for rapid prototyping that was previously unavailable to practitioners. All that is required of the user is to instantiate a manifold, define a cost function, and choose one of Pymanopt's solvers. This means that the Riemannian Trust Regions solver in Pymanopt is just as easy to use as one of the derivative-free or first-order methods.

[1] Manopt is available at manopt.org and was introduced in Boumal et al. (2014).
[2] We use the term automated differentiation to refer to the automatic calculation of derivatives, whether using the method commonly known as automatic differentiation, as implemented by Autograd (Maclaurin et al., 2015) and TensorFlow (Abadi et al., 2015), or symbolic differentiation as implemented by Theano (Al-Rfou et al., 2016).

© 2016 James Townsend, Niklas Koep, Sebastian Weichwald.

2. The Potential of Optimization on Manifolds and Pymanopt Use Cases

Much of the theory of how to adapt Euclidean optimization algorithms to (matrix) manifolds can be found in Smith (1994); Edelman et al. (1998); Absil et al. (2008). The approach of optimization on manifolds is superior to performing free (Euclidean) optimization and projecting the parameters back onto the search space after each iteration (as in the projected gradient descent method), and has been shown to outperform standard algorithms for a number of problems.

Hosseini and Sra (2015) demonstrate this advantage for a well-known problem in machine learning, namely inferring the maximum likelihood parameters of a mixture of Gaussians (MoG) model. Their alternative to the traditional expectation maximization (EM) algorithm uses optimization over a product manifold of positive definite (covariance) matrices. Rather than optimizing the likelihood function directly, they optimize a reparameterized version which shares the same local optima.
The proposed method, which is on par with EM and shows less variability in running times, is a striking example of why we think a toolbox like Pymanopt, which allows the user to readily experiment with and solve problems involving optimization on manifolds, can accelerate and pave the way for improved machine learning algorithms. [3]

[3] A quick example implementation for inferring MoG parameters is available at pymanopt.github.io/MoG.html.

Further successful applications of optimization on manifolds include matrix completion tasks (Vandereycken, 2013; Boumal and Absil, 2015), robust PCA (Podosinnikova et al., 2014), dimension reduction for independent component analysis (ICA) (Theis et al., 2009), kernel ICA (Shen et al., 2007) and similarity learning (Shalit et al., 2012). Many more applications to machine learning and other fields exist. While a full survey on the usefulness of these methods is well beyond the scope of this manuscript, we highlight that at the time of writing, a search for the term "manifold optimization" on the IEEE Xplore Digital Library lists 1065 results; the Manopt toolbox itself is referenced in 90 papers indexed by Google Scholar.

3. Implementation

Our toolbox is written in Python and uses NumPy and SciPy for computation and linear algebra operations. Currently Pymanopt is compatible with cost functions defined using Autograd (Maclaurin et al., 2015), Theano (Al-Rfou et al., 2016) or TensorFlow (Abadi et al., 2015). Pymanopt itself and all the required software is open source, with no dependence on proprietary software.

To calculate derivatives, Theano uses symbolic differentiation, combined with rule-based optimizations, while both Autograd and TensorFlow use reverse-mode automatic differentiation. For a discussion of the distinctions between the two approaches and an overview of automatic differentiation in the context of machine learning, we refer the reader to Baydin et al. (2015).

Much of the structure of Pymanopt is based on that of the Manopt Matlab toolbox. For this early release, we have implemented all of the solvers and a number of the manifolds found in Manopt, and plan to implement more, based on the needs of users. The codebase is structured in a modular way and thoroughly commented to make extension to further solvers, manifolds, or backends for automated differentiation as straightforward as possible. Both user and developer documentation are available. The GitHub repository at github.com/pymanopt/pymanopt offers a convenient way to ask for help or request features by raising an issue, and contains guidelines for those wishing to contribute to the project.

4. Usage: A Simple Instructive Example

All automated differentiation in Pymanopt is performed behind the scenes so that the amount of setup code required by the user is minimal. Usually only the following steps are required:

(a) Instantiation of a manifold M
(b) Definition of a cost function f : M → R
(c) Instantiation of a Pymanopt solver

We briefly demonstrate the ease of use with a simple example. Consider the problem of finding an n × n positive semi-definite (PSD) matrix S of rank k < n that best approximates a given n × n (symmetric) matrix A, where closeness between A and its low-rank PSD approximation S is measured by the following loss function:

    L_δ(S, A) := Σ_{i=1}^{n} Σ_{j=1}^{n} H_δ(s_ij − a_ij)

for some δ > 0, where

    H_δ(x) := √(x² + δ²) − δ

is the pseudo-Huber loss function. This loss function is robust against outliers as H_δ(x) approximates |x| − δ for large values of x while being approximately quadratic for small values of x (Huber, 1964).
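The two regimes of the pseudo-Huber loss are easy to check numerically. The following sketch (illustrative values only; δ = 1 is an arbitrary choice) verifies that H_δ(x) ≈ x²/(2δ) near zero and H_δ(x) ≈ |x| − δ for large |x|:

```python
import numpy as np

def pseudo_huber(x, delta):
    # H_delta(x) = sqrt(x^2 + delta^2) - delta
    return np.sqrt(x**2 + delta**2) - delta

delta = 1.0

# Near zero the loss is approximately quadratic: H_delta(x) ~ x^2 / (2 * delta)
x_small = 0.01
print(pseudo_huber(x_small, delta), x_small**2 / (2 * delta))

# For large |x| it grows linearly: H_delta(x) ~ |x| - delta
x_large = 1000.0
print(pseudo_huber(x_large, delta), abs(x_large) - delta)
```

The linear growth in the tails is what caps the influence of any single outlying entry of S − A, while the quadratic bowl near zero keeps the loss smooth, which matters here because the Trust Regions solver needs second-order derivative information.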
This can be formulated as an optimization problem on the manifold of fixed-rank PSD matrices:

    min_{S ∈ PSD_k^n} L_δ(S, A), where PSD_k^n := { M ∈ R^{n×n} : M ⪰ 0, rank(M) = k }.

This task is easily solved using Pymanopt:

    from pymanopt.manifolds import PSDFixedRank
    import autograd.numpy as np
    from pymanopt import Problem
    from pymanopt.solvers import TrustRegions

    # Let A be a (n x n) matrix to be approximated

    # (a) Instantiation of a manifold
    # points on the manifold are parameterized as YY^T
    # where Y is a matrix of size n x k
    manifold = PSDFixedRank(A.shape[0], k)

    # (b) Definition of a cost function (here using autograd.numpy)
    def cost(Y):
        S = np.dot(Y, Y.T)
        delta = .5
        return np.sum(np.sqrt((S - A)**2 + delta**2) - delta)

    # define the Pymanopt problem
    problem = Problem(manifold=manifold, cost=cost)

    # (c) Instantiation of a Pymanopt solver
    solver = TrustRegions()

    # let Pymanopt do the rest
    Y = solver.solve(problem)
    S = np.dot(Y, Y.T)

The examples folder within the Pymanopt toolbox holds further instructive examples, such as performing inference in mixture of Gaussians models using optimization on manifolds instead of the expectation maximization algorithm. Also see the examples section on pymanopt.github.io.

5. Conclusion

Pymanopt enables the user to experiment with different state-of-the-art solvers for optimization problems on manifolds, like the Riemannian Trust Regions solver, without any extra effort.
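As an illustration of this flexibility, swapping in an alternative cost amounts to editing the body of the cost function. A minimal sketch in plain NumPy (with a small hypothetical A and a fixed factor Y, so it runs without Pymanopt or Autograd) contrasting the pseudo-Huber cost above with a Frobenius-norm cost:

```python
import numpy as np

# Hypothetical 2x2 symmetric matrix A and rank-1 factor Y, for illustration only
A = np.array([[2.0, 1.0], [1.0, 2.0]])
delta = 0.5

def cost_pseudo_huber(Y):
    # Elementwise pseudo-Huber loss on the residual S - A
    S = Y @ Y.T
    return np.sum(np.sqrt((S - A)**2 + delta**2) - delta)

def cost_frobenius(Y):
    # The only change needed: Frobenius norm of the residual instead
    S = Y @ Y.T
    return np.linalg.norm(S - A, "fro")

Y = np.array([[1.0], [1.0]])  # S = YY^T is the rank-1 all-ones matrix
print(cost_pseudo_huber(Y), cost_frobenius(Y))
```

With a Pymanopt backend such as Autograd, either function can be passed as the `cost` argument of `Problem` unchanged; derivative information is generated automatically in both cases.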
Experimenting with different cost functions, for example by changing the pseudo-Huber loss L_δ(S, A) in the code above to the Frobenius norm ||S − A||_F, a p-norm ||S − A||_p, or some more complex function, requires just a small change in the definition of the cost function. For problems of greater complexity, Pymanopt offers a significant advantage over toolboxes that require manual differentiation by enabling users to run a series of related experiments without returning to pen and paper each time to work out derivatives. Gradients and Hessians only need to be derived if they are required for other analysis of a problem. We believe that these advantages, coupled with the potential for extending Pymanopt to large-scale applications using TensorFlow, could lead to significant progress in applications of optimization on manifolds.

Acknowledgments

We would like to thank the developers of the Manopt Matlab toolbox, in particular Nicolas Boumal and Pierre-Antoine Absil, for developing Manopt, and for the generous help and advice they have given. We would also like to thank Heiko Strathmann for his thoughtful advice as well as the anonymous reviewers for their constructive feedback and the idea for a more suitable application example.

References

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015. URL http://tensorflow.org.
P.-A. Absil, C. G. Baker, and K. A. Gallivan. Trust-Region Methods on Riemannian Manifolds. Foundations of Computational Mathematics, 7(3):303–330, 2007.

P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008. ISBN 978-0-691-13298-3.

R. Al-Rfou, G. Alain, A. Almahairi, C. Angermueller, D. Bahdanau, N. Ballas, F. Bastien, J. Bayer, A. Belikov, A. Belopolsky, Y. Bengio, A. Bergeron, J. Bergstra, V. Bisson, J. Bleecher Snyder, N. Bouchard, N. Boulanger-Lewandowski, X. Bouthillier, A. de Brébisson, O. Breuleux, P.-L. Carrier, K. Cho, J. Chorowski, P. Christiano, T. Cooijmans, M.-A. Côté, M. Côté, A. Courville, Y. N. Dauphin, O. Delalleau, J. Demouth, G. Desjardins, S. Dieleman, L. Dinh, M. Ducoffe, V. Dumoulin, S. Ebrahimi Kahou, D. Erhan, Z. Fan, O. Firat, M. Germain, X. Glorot, I. Goodfellow, M. Graham, C. Gulcehre, P. Hamel, I. Harlouchet, J.-P. Heng, B. Hidasi, S. Honari, A. Jain, S. Jean, K. Jia, M. Korobov, V. Kulkarni, A. Lamb, P. Lamblin, E. Larsen, C. Laurent, S. Lee, S. Lefrancois, S. Lemieux, N. Léonard, Z. Lin, J. A. Livezey, C. Lorenz, J. Lowin, Q. Ma, P.-A. Manzagol, O. Mastropietro, R. T. McGibbon, R. Memisevic, B. van Merriënboer, V. Michalski, M. Mirza, A. Orlandi, C. Pal, R. Pascanu, M. Pezeshki, C. Raffel, D. Renshaw, M. Rocklin, A. Romero, M. Roth, P. Sadowski, J. Salvatier, F. Savard, J. Schlüter, J. Schulman, G. Schwartz, I. V. Serban, D. Serdyuk, S. Shabanian, É. Simon, S. Spieckermann, S. R. Subramanyam, J. Sygnowski, J. Tanguay, G. van Tulder, J. Turian, S. Urban, P. Vincent, F. Visin, H. de Vries, D. Warde-Farley, D. J. Webb, M. Willson, K. Xu, L. Xue, L. Yao, S. Zhang, and Y. Zhang. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688, 2016. URL http://deeplearning.net/software/theano.

A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. arXiv preprint, 2015.

N. Boumal and P.-A. Absil. Low-rank matrix completion via preconditioned optimization on the Grassmann manifold. Linear Algebra and its Applications, 475:200–239, 2015. doi: 10.1016/j.laa.2015.02.027.

N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab Toolbox for Optimization on Manifolds. Journal of Machine Learning Research, 15:1455–1459, 2014. URL http://manopt.org.

A. Edelman, T. A. Arias, and S. T. Smith. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.

R. Hosseini and S. Sra. Matrix Manifold Optimization for Gaussian Mixtures. In Advances in Neural Information Processing Systems, pages 910–918, 2015.

P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.

D. Maclaurin, D. Duvenaud, M. Johnson, and R. P. Adams. Autograd: Reverse-mode differentiation of native Python, 2015. URL http://github.com/HIPS/autograd.

A. Podosinnikova, S. Setzer, and M. Hein. Robust PCA: Optimization of the Robust Reconstruction Error over the Stiefel Manifold. In 36th German Conference on Pattern Recognition (GCPR), 2014.

U. Shalit, D. Weinshall, and G. Chechik. Online Learning in the Embedded Manifold of Low-rank Matrices. Journal of Machine Learning Research, 13(1):429–458, 2012.

H. Shen, S. Jegelka, and A. Gretton. Fast Kernel ICA using an Approximate Newton Method. In International Conference on Artificial Intelligence and Statistics, pages 476–483, 2007.

S. T. Smith. Optimization techniques on Riemannian manifolds. Fields Institute Communications, 3(3):113–135, 1994.

F. J. Theis, T. P. Cason, and P.-A. Absil. Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold. In Independent Component Analysis and Signal Separation, pages 354–361. Springer, 2009.
B. Vandereycken. Low-Rank Matrix Completion by Riemannian Optimization. SIAM Journal on Optimization, 23(2):1214–1236, 2013.
