Increasing GP Computing Power via Volunteer Computing
Daniel Lombraña González¹, Francisco Fernández de Vega¹, L. Trujillo², G. Olague², F. Chávez de la O¹, M. Cárdenas³, L. Araujo⁴, P. Castillo⁵, and K. Sharman⁶

¹ University of Extremadura — daniellg@unex.es, fcofdez@unex.es, fchavez@unex.es
² CICESE — trujillo@cicese.mx, olague@cicese.mx
³ Ceta-Ciemat — miguel.cardenas@ciemat.es
⁴ UNED — lurdes@lsi.uned.es
⁵ University of Granada — pedro@atc.ugr.es
⁶ Technical University of Valencia — ken@iti.upv.es

Abstract. This paper describes how to increase GP computing power via Volunteer Computing (VC) using the BOINC framework. Two experiments using well-known GP tools, Lil-gp and ECJ, demonstrate the benefit of using VC in terms of computing power and speed-up. Finally, we present an extension of the model in which any GP tool or framework can be used inside BOINC regardless of its programming language, complexity or required operating system.

1 Introduction

Real-world optimization problems are usually complex, and solving them with Evolutionary Algorithms (EAs) is CPU-intensive because of the large number of individuals that must be evaluated and the number of iterations required to find a solution. To alleviate this problem, EAs and Genetic Programming (GP) have benefited from parallel models. Two main parallelization approaches have been described (see [1]): fine-grain, which uses a master-slave architecture, and coarse-grain, also known as the island model. Given the stochastic nature of EAs and GP, and the large number of runs frequently required to obtain results, parameter-sweep models have recently been applied in combination with high-throughput computing systems.
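Parameter sweeps are embarrassingly parallel: each point in the cross-product of parameter values, and each random seed used for statistical analysis, is an independent job that can be shipped to a different machine. A minimal sketch of this decomposition (the function and field names are illustrative, not taken from any of the tools discussed in this paper):

```python
from itertools import product

def sweep_jobs(param_grid, runs_per_point):
    """Expand a parameter grid into independent job descriptions.

    Each job is a dict of GP parameters plus a random seed, so every
    job can be evaluated on a different volunteer machine.
    """
    names = sorted(param_grid)
    jobs = []
    for values in product(*(param_grid[n] for n in names)):
        point = dict(zip(names, values))
        for seed in range(runs_per_point):
            jobs.append({**point, "seed": seed})
    return jobs

# Example: the population/generation settings used later in Section 4.1.
grid = {"population": [1000, 2000], "generations": [1000, 2000]}
jobs = sweep_jobs(grid, runs_per_point=25)
# 2 x 2 parameter points, 25 statistical runs each -> 100 independent jobs
```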
GRID computing is nowadays one of the most prominent emerging technologies and has recently been demonstrated to be a powerful tool for time-consuming applications (see [2]). The GRID allows supercomputers, clusters and desktop PCs distributed over networks to be harnessed by means of special software called middleware. The middleware exports and manages all the computing resources, providing a standard layer on which scientists can run their experiments.

Middleware can focus on handling two types of resources: desktop PCs, or supercomputers and clusters. An example of supercomputer/cluster middleware is gLite (see [3]). A successful attempt at combining EAs and GRID computing is presented by N. Melab et al. (see [4]). However, this kind of middleware is usually associated with expensive hardware, which is a drawback for scientists from developing regions or countries.

Other middleware systems focus on cheap desktop PCs. This type of middleware is aimed at building Desktop Grid Computing (DGC) systems, whose complexity and deployment cost are much smaller than those of alternative GRID middleware such as gLite. Several DGC systems exist: XtremWeb [5], a research project that provides a Global Computing platform using a large base of volunteer PCs; Condor [6], a middleware that implements a scheduling system over desktop PCs; and BOINC [7], a multi-platform middleware that uses workstation CPU idle periods to run jobs.

When dealing with DGC, users are essential: the idea behind DGC is to let users collaborate with a scientific research project by donating desktop computing power. To the best of our knowledge, the pioneer in engaging users to collaborate with scientific research was the SETI@HOME project [8].
This project has been able to create a virtual supercomputer of 259.729 TeraFLOPS thanks to the collaboration of 706,586 users⁷. Thus, DGC is also known as Volunteer Grid Computing (VGC), after the users who unselfishly collaborate with scientific research. Therefore, VGC computing power can be used for free if users see the interest of the project.

Of all the VGC middleware technologies presented above, BOINC is the most widely used. Furthermore, BOINC is already employed in different research fields such as high-energy physics [9], astrophysics [10], climate prediction [11] and epidemiology [12]. A novel technology has also recently been described which harnesses computing resources when users merely browse a special web page [13,14,15]. We do not consider it here because it hides the background work of the web browser from users, so computing power is "stolen" from their PCs; users would be annoyed to discover that someone has been using their resources without permission. Our idea is just the opposite: not only to inform users about the project we are running, but also to invite them to join and collaborate.

VGC is also a good computing platform for running parameter-sweep experiments in genetic and evolutionary computation. M.E. Samples describes Commander, a generic parameter-sweep framework (see [16]). However, Commander is not aimed at harnessing volunteer resources, it has not been as widely deployed and tested on real scientific research problems, and it has not attracted the huge number of clients of BOINC (2,364,170 client computers), a consolidated VGC platform.

Any of the parallel models described above, fine- and coarse-grain, can run within this approach; we simply have to bear in mind that each parallel model will run whole on a single machine.
The power comes from the multiple simultaneous runs of the same experiment with different parameters, or from identical runs for statistical analysis.

⁷ Data obtained from http://boincstats.com

Therefore, what we propose is to use a VGC BOINC model together with GP in order to:

– Harness a large number of BOINC resources (nowadays BOINC has 2,364,170 computers in total, providing on average 668,541.2 GigaFLOPS⁸).
– Improve the speed-up of sequential GP executions thanks to the parallel environment that VGC provides.

We have chosen BOINC as our VGC middleware because it is the most widely used one, and therefore offers great computing power. The remainder of the paper includes a description of the BOINC model in Section 2; we present the new VGC and GP model in Section 3; Section 4 shows the experiments, the results and an extension of the model; we conclude in Section 5.

2 The BOINC model

As described above, BOINC is a middleware that harnesses commodity computing resources for a given project. BOINC has two key features: it is multi-platform and open source. BOINC uses a master-slave model in which the server is in charge of:

– Hosting the scientific project experiments. A project is composed of a binary (the algorithm) and some input files. The binary is classified according to the target platform (MS Windows, GNU/Linux, MacOSX) and architecture (x86 32/64 bits and SPARC).
– Creation and distribution of jobs. In BOINC's terminology a job is called a "work unit" (WU). A WU describes how the experiment must be run by the clients: the name of the binary, the input/output files and the command-line arguments.
– Assimilation and validation of results. When clients finish their computations, they upload the results to the server. At this point the server runs two processes: a validator and an assimilator program.
The validator's goal is to verify whether the results are corrupted. If everything is correct, the results are validated and prepared for the next program, the assimilator, which is in charge of parsing the project's output files to perform tasks such as computing statistics or storing results in another database.

The client side is quite simple. The BOINC client connects to the server and asks for work (a WU). The client downloads the necessary files and starts the computation. Once the results are obtained, the client uploads them to the server. Additionally, during the whole process (downloading the WU, processing it, uploading the results) the client contacts the server through a "heartbeat" function. The heartbeat is used to take some scheduling decisions and to generate statistical data.

As BOINC relies on users, BOINC resources are not reliable, because:

– Users turn their machines on and off without knowing whether they are interrupting a BOINC execution. Therefore, the research application must provide a checkpoint facility.
– Users could try to cheat. BOINC has a redundancy feature that circumvents this problem: the BOINC administrator can define the minimum quorum required to validate a solution.
– Users could try to hack the BOINC server. A user who hacked the server would be able to launch new WUs, which could carry viruses. To avoid this problem, BOINC digitally signs binary applications, so only signed applications can be distributed to the clients.

⁸ Data obtained from http://boincstats.com under "BOINC Combined stats".

As we can observe, BOINC's structure is simple and provides the main features that any VGC system requires. The next subsection explains how to use BOINC with a scientific research project.
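The redundancy mechanism described above can be illustrated with a minimal sketch (BOINC's actual validators are C++ server-side plugins; this is just the quorum logic): replicas of the same WU are compared, and a result becomes canonical only when at least `min_quorum` replicas agree on it.

```python
from collections import Counter

def validate(replica_results, min_quorum):
    """Return the canonical result if at least `min_quorum` replicas
    agree on it, else None (meaning more replicas are still needed)."""
    counts = Counter(replica_results)
    result, votes = counts.most_common(1)[0]
    return result if votes >= min_quorum else None

# Two of three volunteers agree, so a quorum of 2 is met:
canonical = validate(["fitness=0.93", "fitness=0.93", "fitness=0.41"], 2)
# canonical -> "fitness=0.93"
```

A cheating or faulty host then only wastes its own replica: its deviant output never reaches the assimilator.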
2.1 How to use BOINC with a Scientific Project

A scientific project that wants to use BOINC has to set up a GNU/Linux server (Apache, MySQL, PHP) and then take the following key points into account:

– Programming language. BOINC uses C++ and Fortran. Thus, if the scientific project is not coded in C++ or Fortran, it has to be ported in order to link its source code with the BOINC framework.
– Operating system. BOINC has clients for GNU/Linux, MS Windows and MacOSX. The scientific project has to be adapted to all of them if we want to harness as many of the available resources as possible. BOINC uses a generic framework that builds OS-dependent binaries.

Therefore, there are two ways to support BOINC:

– Method 1: port the code. This is the most common method. A researcher adapts the application source code to support BOINC. The changes can be easy if the tool is coded in C/C++ or Fortran; in other cases the researcher will have to rewrite the whole code.
– Method 2: the wrapper. The BOINC team provides a tool called the wrapper which runs statically linked software inside BOINC without modifying or porting the application source code. The wrapper embodies the application in such a way that, for the BOINC client, only one application exists: the wrapper itself.

In conclusion, an application which is not coded in C/C++/Fortran will use the second method; if the application is coded in C/C++/Fortran, some minor changes will be needed to support BOINC (Method 1).

3 VGC and GP problems

Our proposal is to use VGC for running GP experiments via the BOINC technology. We present two examples that show how any preferred GP tool can be used effectively within the BOINC framework.
The examples include adapting Lil-gp to BOINC (Method 1) and using the wrapper for ECJ (Method 2).

3.1 Lil-gp and BOINC

Lil-gp is a well-known GP system written in C (see [17]). As Lil-gp is coded in C, porting the framework to BOINC is not difficult (Method 1). The main porting changes were made in the input/output routines that access files: under BOINC, file I/O must go through specific BOINC I/O functions because of the parallel nature of the model. Once these changes were done, Lil-gp was ready to be compiled and linked against the BOINC libraries. In summary, a Lil-gp-BOINC project is composed of the following items:

– Binary. The compiled Lil-gp problem (symbolic regression, even-parity 5, etc.) built with the adapted Lil-gp-BOINC framework.
– Input files. Lil-gp takes the GP parameter file as input (probability of crossover, mutation, etc.).
– WU. The WU specifies the input files needed by the binary, the output files that the binary will generate and, if necessary, the command-line arguments to be passed to it. In this project, the input file must be specified through a command-line argument.

The results described below show the effectiveness of the approach. Nevertheless, researchers frequently do not have the time to manage a port, or the tool is written in a language other than C or Fortran. In these cases the goal is to use the tool as it is; the next subsection deals with this case.

3.2 ECJ and BOINC

ECJ is a modern Java framework for Evolutionary Computation (EC) [18]. The framework can run different kinds of EC techniques: genetic algorithms, evolution strategies, genetic programming, etc. As ECJ is not coded in C++ or Fortran, there are two ways of supporting BOINC:

1. Port the framework.
This solution is quite difficult because the framework is written in Java; porting it would imply rewriting the whole framework in C++ or Fortran.
2. Use the wrapper. This simple solution requires no changes to any line of source code in order to run the framework inside BOINC.

We therefore employ the wrapper solution. However, this implies that every client must have a Java virtual machine installed, because without Java it is impossible to run any ECJ experiment. The adopted solution for supporting Java on all clients was to package the Java virtual machine together with ECJ. In summary, the ECJ-BOINC project is composed of the following items:

– Binary. The binary is the wrapper. The wrapper uses a file called job.xml which specifies the unmodified binary that must be launched by BOINC, its input/output files and its command-line arguments.
– Input files.
  • ECJ and Java. ECJ and Java are packaged in separate compressed files, which are unpacked once the clients have downloaded them.
  • Unmodified binary. The unmodified binary is a script file which is in charge of unpacking all the input files (ECJ and the Java virtual machine) and starting the execution of ECJ. Additionally, it handles ECJ's internal checkpointing so that, when necessary, execution can restart from the last saved stable state.
– WU. The WU specifies the input files needed by the binary and the output files that will be generated (in this case, by the unmodified binary).

In summary, the workflow of ECJ-BOINC, once the clients have downloaded all the necessary files, is the following:

1. The wrapper launches the starter script.
2. The script:
   (a) Unpacks the ECJ and Java compressed files.
   (b) Checks whether an ECJ checkpoint file has already been created; if so, it launches ECJ with the checkpoint file, otherwise it launches ECJ in the normal way.
   (c) Copies the ECJ output file to the solution file and exits.
3. The wrapper waits until the solution file is created, then notifies the BOINC core client that the execution has ended and that the result files can be uploaded to the BOINC server.

Any other statically linked tool could follow this scheme to run on BOINC clients. The next section explains how Lil-gp-BOINC and ECJ-BOINC were employed to run different experiments using a geographically distributed pool of clients.

4 Experiments & Results

The goal of all the experiments presented here is to show that VGC is a useful computing platform for running GP problems. We are not interested in checking the quality of the obtained results, and we have therefore employed the standard implementation of benchmark problems provided by the tools Lil-gp and ECJ. We focus on measuring the performance improvement that can be achieved when a BOINC model is used, compared with the traditional sequential mode of running on a single machine. Speed-up is usually measured by the standard equation:

A = T_seq / T_B    (1)

where:
– A is the acceleration;
– T_seq is the time consumed in sequential mode;
– T_B is the time consumed in BOINC mode.

Nevertheless, given the special features of VGC, we also measure the available computing power (CP) using the method described by Anderson and Fedak in [19]:

CP = X_arrival · X_life · X_ncpus · X_flops · X_eff · X_onfrac · X_active · X_redundancy · X_share    (2)

In all the experiments, X_redundancy is equal to 1 because we did not use the redundancy facility provided by BOINC. X_share is also equal to 1 because none of the clients shared its resources with other BOINC projects.
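Equation (2) is a plain product of the measured factors; a minimal sketch (the factor values below are made up for illustration, not this paper's measurements):

```python
from math import prod

def computing_power(factors):
    """Available computing power, equation (2): the product of the
    host-churn and hardware factors from Anderson and Fedak [19]."""
    return prod(factors.values())

# Illustrative (invented) factor values. In this paper
# X_redundancy = X_share = 1, since neither feature was used.
cp = computing_power({
    "arrival": 0.9, "life": 0.8, "ncpus": 1.5, "flops": 2.0,
    "eff": 0.9, "onfrac": 0.7, "active": 0.8,
    "redundancy": 1.0, "share": 1.0,
})
```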
X_arrival and X_life are important variables because they measure host churn (a volunteer computing project's pool of hosts is dynamic). The remaining variables measure specific hardware features [19].

4.1 Lil-gp-BOINC

This first experiment is a proof of concept and was set up in a controlled environment, a laboratory, using Lil-gp (Method 1, see Section 2.1). In order to measure the performance improvement we chose the Artificial Ant on the Santa Fe Trail problem (see [20]). We ran 25 executions of the experiment with different population sizes (1000 and 2000 individuals) and numbers of generations (1000 and 2000). Two pools of clients, one with 5 and another with 10 machines, were used for running the Lil-gp-BOINC model. As said above, given the aim of this research we do not report the quality of the obtained results (which is the same as in the sequential execution); we focus on performance, computing power and speed-up.

Table 1 shows the time consumed by Lil-gp-BOINC and by standard Lil-gp, and the acceleration obtained using 5 and 10 client machines with the BOINC model. From these results we can conclude that the more clients are used, the better the performance: for instance, with 10 clients we achieved an acceleration of 5, while with 5 clients we only obtained an acceleration of 3 for the same number of generations and individuals. It is important to point out that this performance grows as more clients collaborate with the project.

Table 1. Execution time for Lil-gp and Lil-gp-BOINC

(a) Using 5 clients
                      T_seq   T_B     Acc.
1000 Gen, 1000 Ind.   4250s   1548s   2.7455
1000 Gen, 2000 Ind.   650s    395s    1.6456
2000 Gen, 1000 Ind.   9200s   2356s   3.9049

(b) Using 10 clients
                      T_seq   T_B     Acc.
1000 Gen, 1000 Ind.   4250s   1033s   4.1142
2000 Gen, 1000 Ind.   9200s   1623s   5.6685

As this experiment is a proof of concept, the measured variables (speed-up and computing power) did not involve real volunteer users; therefore we do not report the available computing power of a volunteer computing scenario (see equation 2). In order to continue the evaluation of the model, we decided to tackle a more complex GP tool as well as a more computationally demanding GP problem.

4.2 ECJ-BOINC

This second experiment employs a modern and complex Java GP framework (ECJ). For this tool we used the wrapper (Method 2) in order to support BOINC. We chose the Boolean multiplexer GP benchmark (see [20]). In general, the input to the Boolean multiplexer function consists of k "address" bits a_i and 2^k "data" bits d_i, and has the form a_{k-1} ... a_1 a_0 d_{2^k - 1} ... d_1 d_0, with length k + 2^k. The search space for this problem is of size 2^(2^(k + 2^k)). The problem was run on geographically distributed laboratory clients belonging to the University of Extremadura (Cáceres, Badajoz and Mérida); see Fig. 1(a). Fig. 1(b) shows the number of clients per city taking part in the experiment. It is important to point out that, in the following tables, T_B measures all the time employed (client connection, WU download, CPU time, result upload, etc.): from the moment the first client registers and collaborates with the project until the last connection to the server from any client.

828 iterations of the 11-multiplexer function (k = 3) were initially performed using 45 computers. The experiment used the same Koza parameters (4000 individuals and 50 generations); for more details see [21]. Of the 828 iterations, 449 found the perfect solution to the 11-multiplexer problem (although this is not the goal of our research).
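The accelerations reported in Table 1 (equation 1) and the multiplexer search-space sizes quoted in this section can be reproduced in a few lines (a sketch; the timing values are simply those reported above):

```python
def acceleration(t_seq, t_b):
    """Speed-up as defined in equation (1)."""
    return t_seq / t_b

# Table 1(b): 10 clients, 1000 generations, 1000 individuals.
print(round(acceleration(4250, 1033), 4))   # 4.1142

def multiplexer_size(k):
    """Input length and search-space exponent for the Boolean
    multiplexer with k address bits and 2**k data bits."""
    length = k + 2 ** k          # address bits + data bits
    return length, 2 ** length   # search space is 2**(2**length)

print(multiplexer_size(3))  # (11, 2048)    -> 2**2048 candidate functions
print(multiplexer_size(4))  # (20, 1048576) -> 2**1048576 candidate functions
```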
Some iterations failed because of the initial, default limit on the Java heap size⁹. On average, 119.18 seconds were needed to find the perfect solution, while 134.75 seconds on average were needed to run one execution. Of the 45 available computers, only 27 produced the 828 results. The achieved speed-up was 0.29, which means a deterioration in performance. The reason is the easiness of the problem (only 134.75 seconds per run on average), bearing in mind that T_B also measures host churn (see equation 2). Taking into account that the project ran for 5.35 days, X_life is measured only from the first connection to the last communication of hosts that had not communicated in at least one day. The achieved CP is 80 GigaFLOPS; this value was obtained because the project ran for only a few days and all the hosts were active during the whole computation time. Table 2(a) shows a summary.

⁹ The heap size was later increased to avoid this problem.

Fig. 1. Distributed infrastructure: (a) interconnected cities; (b) clients per city.

We then increased the complexity of the problem to k = 4 (compare the new search space, 2^1048576, with the previous one, 2^2048). Our interest is not in solving the problem, but in setting up a time-consuming experiment to test the VGC model. We first tried to deploy 42 runs of an experiment using 50 generations and a population of 1000 individuals; the rest of the GP parameters are the same Koza parameters as for the 11-multiplexer (see [21]). Volunteer computers from other universities and institutions collaborated with the project: CICA in Sevilla, the University of Extremadura (Cáceres, Badajoz, Mérida), Granada, Valencia, UNED in Madrid, and Ceta-Ciemat in Trujillo. The computing resources are thus more heterogeneous and realistic.
Using this infrastructure we performed the 42 runs of the experiment. A performance improvement was achieved, as this problem needs on average 31,079.28 seconds per execution. 41 machines were used to solve the problem; of these, 7 produced the 42 runs, because some machines were turned off for hours, others were still computing, etc. (typical VGC behavior). The obtained acceleration was 1.95 (see Table 2). This speed-up is respectable, although not impressive, but it was obtained for free with quite a small number of volunteers involved. On average, one iteration takes 30,944.53 seconds more than for the 11-multiplexer. X_life was measured as in the 11-multiplexer problem, because this project also ran for only a few days. The achieved computing power is 23 GigaFLOPS; it is smaller because we employed only 41 hosts and the project ran for 7.75 days. However, bear in mind that BOINC has a pool of 2,364,170 available computers which could collaborate with a project in the future, instead of the mere 42 employed here, not all of them simultaneously available. Table 2(b) shows a summary of the relevant data. The best fitness found was Raw = 180224.0, Adjusted = 5.54862e-06, Hits = 868352.0.

Table 2. Execution time for ECJ and ECJ-BOINC

                                       T_seq      T_B       Acc.   CP
11 bits, 828 runs, 50 Gen, 4000 Ind.   134078s    462259s   0.29   80 GFLOPS
20 bits, 42 runs, 50 Gen, 1000 Ind.    1305330s   669759s   1.95   23 GFLOPS

Finally, we performed another experiment with a complex GP tool which is not statically linked, making it impossible to employ Method 1 or Method 2. Additionally, this experiment faces a real-life, time-consuming computer vision problem (instead of a benchmark problem) that had already been solved in a sequential fashion: interest point detectors (see [22]).
This GP framework uses the Matlab environment and several image toolboxes, which implies a much more complex system that is therefore more difficult to deploy over a BOINC infrastructure. Hence, our proposal is to use a virtualization layer (see [23]) inside BOINC by creating an image of a working scientific system (hardware, OS and the research tool). Thanks to this new virtualization layer (based on VMware [23]), any GP system or framework, independently of its complexity, programming language and operating system, can be run on any BOINC client (Linux, Windows or MacOSX; for further details see [24]).

For this experiment we set up 10 MS Windows volunteer computers. The virtual image was built using a GNU/Linux x86 operating system; thus, a GNU/Linux scientific environment runs directly inside MS Windows thanks to the Virtual-BOINC approach. The 10 Windows PCs produced 12 solutions during 48 hours. The time consumed per solution was 18 hours on average, and the total time needed to produce the 12 solutions in a sequential run was 215 hours. Therefore, thanks to this new model, the obtained speed-up was 4.48, with a CP of 25.67 GFLOPS (see Table 3).

Table 3. Execution time for IP and IP-Virtual-BOINC

                  T_seq   T_B   Acc.   CP
75 Gen, 75 Ind.   215h    48h   4.48   25.67 GFLOPS

In summary, the BOINC model significantly improves performance when CPU-intensive, time-consuming, real-life problems are solved by means of GP. Moreover, the more computers collaborate with a project, the more computing power and speed-up is achieved, at no cost.

Fig. 2. Host churn during September 2007.

5 Conclusions

This paper has presented a new approach to solving GP problems using Volunteer Grid Computing. Three methods have been described in order to support BOINC.
Firstly, porting a simple GP application to BOINC; secondly, using a modern and more complex GP framework, which was run inside BOINC without any modification because it is statically linked; and finally, a complex GP environment which faces a real-life problem by using a virtualization layer that allows any GP system, independently of its complexity, programming language or operating system, to run inside BOINC. Three experiments were performed: one in a controlled environment, another over a geographically distributed infrastructure involving 8 cities, and one more using the virtualization extension. The results show that VGC is a perfect springboard for running complex, time-intensive problems by means of GP on free computing resources. Moreover, BOINC is a really interesting technology if we take into account the big pool of BOINC-enabled computers (2,364,170) which could collaborate with a new GP BOINC project.

6 Acknowledgments

This work was supported by Junta de Extremadura, Cátedra CETA-CIEMAT de la Universidad de Extremadura, Regional Gridex project PRI06A223 and National NOHNES project TIN2007-68083-C02-01 of the Spanish Ministry of Science and Education.

References

1. Tomassini, M.: Spatially Structured Evolutionary Algorithms. Springer (2005)
2. Kesselman, C., Foster, I.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann (1999)
3. Laure, E., Fisher, S., Frohner, A., Grandi, C., Kunszt, P., Krenek, A., Mulmo, O., Pacini, F., Prelz, F., White, J., et al.: Programming the Grid with gLite. Computational Methods in Science and Technology 12(1) (2006) 33–45
4. Melab, N., Cahon, S., Talbi, E.G.: Grid computing for parallel bioinspired algorithms. J. Parallel Distrib. Comput. 66(8) (2006) 1052–1061
5. Fedak, G., Germain, C., Neri, V., Cappello, F.: XtremWeb: A Generic Global Computing System. Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGRID'01) (2001)
6. Litzkow, M., Tannenbaum, T., Basney, J., Livny, M.: Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical report, University of Wisconsin (1997)
7. Anderson, D.: BOINC: a system for public-resource computing and storage. In: Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on. (2004) 4–10
8. Anderson, D.P., Cobb, J., Korpela, E., Lebofsky, M., Werthimer, D.: SETI@home: an experiment in public-resource computing. Commun. ACM 45(11) (2002) 56–61
9. Robert-Démolaize, G.: Design and performance optimization of the LHC collimation system. Master's thesis, CERN (2006)
10. Sintes, A.M.: Recent results on the search for continuous sources with LIGO and GEO600
11. Allen, M.: Do it yourself climate prediction. Nature (1999)
12. Team, A.: Africa@home Link.
13. Barabasi, A., Jeong, H., Brockman, J., Freeh, V.: Parasitic computing. Nature 412(6850) (2001) 894–897
14. Klein, J., Spector, L.: Unwitting distributed genetic programming via asynchronous JavaScript and XML. In: GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation, New York, NY, USA, ACM (2007) 1628–1635
15. Merelo, J.J., García, A.M., Laredo, J.L.J., Lupión, J., Tricas, F.: Browser-based distributed evolutionary computation: performance and scaling behavior. In: GECCO '07: Proceedings of the 2007 GECCO conference companion on Genetic and evolutionary computation, New York, NY, USA, ACM (2007) 2851–2858
16. Samples, M., Daida, J., Byom, M., Pizzimenti, M.: Parameter sweeps for exploring GP parameters. Proceedings of the 2005 workshops on Genetic and evolutionary computation (2005) 212–219
17. Punch, B., Zongker, D.: Lil-gp Link.
18. Luke, S., Panait, L., et al., Chircop, A.: ECJ: a Java-based evolutionary computation research system Link.
19. Anderson, D., Fedak, G.: The Computational and Storage Potential of Volunteer Computing. Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06) (2006)
20. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA (1992)
21. Koza, J.: A hierarchical approach to learning the Boolean multiplexer function. Rawlins [1863] 171–192
22. Trujillo, L., Olague, G.: Synthesis of interest point detectors through genetic programming. In Cattolico, M., ed.: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2006, Seattle, Washington, USA, July 8–12, 2006. Volume 1., ACM (2006) 887–894
23. Nieh, J., Leonard, O.C.: Examining VMware. Dr. Dobb's Journal 25(8) (August 2000) 70, 72–74, 76
24. Lombraña González, D., Fernández de Vega, F., Trujillo, L., Olague, G., et al.: Customizable execution environments with virtual desktop grid computing. Parallel and Distributed Computing and Systems, PDCS (2007, accepted)