A Robust Asynchronous Newton Method for Massive Scale Computing Systems


📝 Abstract

Volunteer computing grids offer supercomputing levels of computing power at the relatively low cost of operating a server. In previous work, the authors have shown that it is possible to take traditionally iterative evolutionary algorithms and execute them on volunteer computing grids by performing them asynchronously. The asynchronous implementations dramatically increase scalability and decrease the time taken to converge to a solution. Iterative and asynchronous optimization algorithms, implemented using MPI on clusters and supercomputers and BOINC on volunteer computing grids, have been packaged together in a framework for generic distributed optimization (FGDO). This paper presents a new extension to FGDO for an asynchronous Newton method (ANM) for local optimization. ANM is resilient to heterogeneous, faulty, and unreliable computing nodes and is extremely scalable. Preliminary results show that it can converge to a local optimum significantly faster than conjugate gradient descent does.

📄 Content

A Robust Asynchronous Newton Method for Massive Scale Computing Systems

Travis Desell
Department of Computer Science, University of North Dakota, Grand Forks, ND 58202, USA
travis.desell@gmail.com

Malik Magdon-Ismail, Heidi Newberg, Lee A. Newberg, Boleslaw K. Szymanski, Carlos A. Varela
Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, NY 12180, USA

Keywords—volunteer computing; asynchronous Newton method; distributed optimization; BOINC

I. INTRODUCTION

Volunteer computing grids can offer significant levels of computing power at very low cost. As an added benefit, many volunteers continually upgrade their hardware, so the computing power of a volunteer computing project increases over time, while at best a supercomputer stays the same. However, utilizing these extremely large scale systems involves significant challenges in overcoming heterogeneous, faulty, and even malicious hosts. The computations performed are also usually limited to embarrassingly parallel, bag-of-tasks work. In many cases, effectively utilizing a volunteer computing system requires rethinking the algorithms involved. In previous work, the authors have shown that asynchronous versions of evolutionary algorithms can be effectively run on volunteer computing systems, such as MilkyWay@Home [1].
While evolutionary algorithms can effectively find global (or near-global) solutions to difficult computational problems with many local optima, they are not nearly as efficient as local optimization methods in more well-behaved search spaces with a single optimum. Additionally, after finding the general area of the global optimum, they may take a very long time to converge to the solution.

This work explores an asynchronous version of the Newton method, which has traditionally been avoided in smaller scale computing systems. By using regression to calculate the search direction and then performing a randomized line search, it is possible to run an efficient local optimization method on a large scale computing system. The asynchronous Newton method (ANM) presented is extremely scalable and tolerant of heterogeneous, faulty hosts. As part of FGDO, it also uses BOINC [2] to validate results from volunteers, providing protection from malicious hosts. Preliminary results show that it converges to a solution in significantly fewer iterations than conjugate gradient descent and can scale to a massive computing system like MilkyWay@Home, which currently consists of around 35,000 volunteered hosts (see http://boincstats.com ).
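The paper only sketches the regression step at a high level here. As an illustrative sketch (not the authors' implementation; the function names and quadratic parameterization below are our own assumptions), the idea of computing a Newton search direction by regression can be pictured as a least-squares fit of a quadratic model to a set of asynchronously reported (point, fitness) samples, from which the Newton step is derived:

```python
import numpy as np

def fit_quadratic(points, values):
    # Least-squares fit of f(x) ~= c + g.x + 0.5 * x' H x to the samples.
    # Feature columns: constant, each x_i, then each product x_i * x_j (i <= j).
    m, n = points.shape
    cols = [np.ones(m)]
    cols += [points[:, i] for i in range(n)]
    for i in range(n):
        for j in range(i, n):
            cols.append(points[:, i] * points[:, j])
    theta, *_ = np.linalg.lstsq(np.column_stack(cols), values, rcond=None)
    g = theta[1:n + 1]                     # fitted gradient at the origin
    H = np.zeros((n, n))                   # fitted (symmetric) Hessian
    k = n + 1
    for i in range(n):
        for j in range(i, n):
            if i == j:
                H[i, i] = 2.0 * theta[k]   # coefficient of x_i^2 is 0.5 * H_ii
            else:
                H[i, j] = H[j, i] = theta[k]
            k += 1
    return g, H

def newton_direction(points, values, center):
    # Center the samples at the current best point so the fitted gradient and
    # Hessian are taken there, then return the Newton step -H^{-1} g.
    g, H = fit_quadratic(points - center, values)
    return -np.linalg.solve(H, g)
```

Because the fit uses whatever samples have arrived so far, workers can report results in any order and at any rate, which is what makes a regression-based direction attractive for a volunteer computing setting; the randomized line search along the returned direction would then be farmed out as independent function evaluations.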
II. ITERATIVE LOCAL OPTIMIZATION

In traditional local optimization scenarios, conjugate gradient descent (CGD) [3] or quasi-Newton (QN) methods [4] are typically favored over a standard Newton method, as they require fewer function evaluations to converge to the local optimum. Both types of methods start at a point $\vec{x}$ in the parameter space. At each iteration, both the CGD and QN methods calculate a precise approximation of the gradient, $\nabla f(\vec{x})$, of the function $f$ at the point $\vec{x}$. The $i$th value of the gradient vector, $\nabla f(\vec{x})_i$, is:

$$\nabla f(\vec{x})_i \approx \frac{f(\vec{x} + \vec{s}^{\,0}_i) - f(\vec{x} - \vec{s}^{\,0}_i)}{2 s^0_i} \qquad (1)$$

where $\vec{s}^{\,0}_i$ is a vector of all zeros, except with a user-defined step size $s^0_i$ as the $i$th element. For example, given a uniform step vector of length $n$ with each element equal to $0.1$, $\vec{s}^{\,0}_3$ would be $[0, 0, 0.1, 0, \dots, 0]$. CGD will use this gradient to update a stored conjugate gradient, while QN methods use the gradient to refine their approximation of a Hessian matrix (the second-order partial derivatives of the function). Following the calculation of the gradient, a direction, $\vec{d}$, for a line search is chosen (starting at $\vec{x}$). CGD uses the conjugate gradient as the direction, while QN methods use the inverse of the approximate Hessian multiplied by the gradient.
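Equation (1) is a standard central finite-difference approximation of the gradient; a minimal sketch in Python (the function name and NumPy usage are our own, not from the paper):

```python
import numpy as np

def central_difference_gradient(f, x, step=0.1):
    # Equation (1): estimate each gradient component with a central
    # difference, perturbing only the i-th element of x by the user-defined
    # step size (the step vector is zero everywhere else).
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(len(x)):
        s = np.zeros_like(x)
        s[i] = step
        grad[i] = (f(x + s) - f(x - s)) / (2.0 * step)
    return grad
```

Note that each iteration of CGD or a QN method therefore costs $2n$ function evaluations for an $n$-dimensional problem before the line search even begins, and the two evaluations per component are independent, so they could in principle be distributed.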
