Increasing GP Computing Power via Volunteer Computing
Daniel Lombraña González¹, Francisco Fernández de Vega¹, L. Trujillo², G. Olague², F. Chávez de la O¹, M. Cárdenas³, L. Araujo⁴, P. Castillo⁵, and K. Sharman⁶

¹ University of Extremadura — daniellg@unex.es, fcofdez@unex.es, fchavez@unex.es
² CICESE — trujillo@cicese.mx, olague@cicese.mx
³ Ceta-Ciemat — miguel.cardenas@ciemat.es
⁴ UNED — lurdes@lsi.uned.es
⁵ University of Granada — pedro@atc.ugr.es
⁶ Technical University of Valencia — ken@iti.upv.es

Abstract. This paper describes how to increase GP computing power via Volunteer Computing (VC) using the BOINC framework. Two experiments using well-known GP tools, Lil-gp and ECJ, demonstrate the benefit of using VC in terms of computing power and speed-up. Finally, we present an extension of the model in which any GP tool or framework can be used inside BOINC regardless of its programming language, complexity or required operating system.

1 Introduction

Real-world optimization problems are usually complex, and solving them with Evolutionary Algorithms (EAs) is CPU-intensive because of the large number of individuals that must be evaluated and the number of iterations required to find a solution. To alleviate this problem, EAs and Genetic Programming (GP) have benefited from parallel models. Two main parallelization approaches have been described (see [1]): fine-grain, which uses a master-slave architecture, and coarse-grain, also known as the island model. Given the stochastic nature of EAs and GP, and the large number of runs frequently required to obtain results, parameter-sweep models have recently been applied in combination with high-throughput computing systems.
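Parameter sweeps are embarrassingly parallel: each point in the cross-product of parameter values, and each random seed used for statistical analysis, is an independent job that can be shipped to a different machine. A minimal sketch of this decomposition (the function and field names are illustrative, not taken from any of the tools discussed in this paper):

```python
from itertools import product

def sweep_jobs(param_grid, runs_per_point):
    """Expand a parameter grid into independent job descriptions.

    Each job is a dict of GP parameters plus a random seed, so every
    job can be evaluated on a different volunteer machine.
    """
    names = sorted(param_grid)
    jobs = []
    for values in product(*(param_grid[n] for n in names)):
        point = dict(zip(names, values))
        for seed in range(runs_per_point):
            jobs.append({**point, "seed": seed})
    return jobs

# Example: the population/generation settings used later in Section 4.1.
grid = {"population": [1000, 2000], "generations": [1000, 2000]}
jobs = sweep_jobs(grid, runs_per_point=25)
# 2 x 2 parameter points, 25 statistical runs each -> 100 independent jobs
```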
GRID computing is nowadays one of the most prominent emerging technologies and has recently been demonstrated to be a powerful tool for time-consuming applications (see [2]). The GRID allows supercomputers, clusters and desktop PCs distributed over networks to be harnessed by means of special software called middleware. The middleware exports and manages all the computing resources, providing a standard layer on which scientists can run their experiments.

Middleware can focus on handling two types of resources: desktop PCs, or supercomputers and clusters. An example of supercomputer/cluster middleware is gLite (see [3]). A successful attempt at combining EAs and GRID computing is presented by N. Melab et al. (see [4]). However, this kind of middleware is usually associated with expensive hardware, which is a drawback for scientists from developing regions or countries.

Other middleware systems focus on cheap desktop PCs. This type of middleware is aimed at building Desktop Grid Computing (DGC) systems, whose complexity and deployment cost are much smaller than those of alternative GRID middleware such as gLite. Several DGC systems exist: XtremWeb [5], a research project that provides a Global Computing platform using a large base of volunteer PCs; Condor [6], a middleware that implements a scheduling system over desktop PCs; and BOINC [7], a multi-platform middleware that uses workstation CPU idle periods to run jobs.

When dealing with DGC, users are essential: the idea behind DGC is to let users collaborate with a scientific research project by donating desktop computing power. To the best of our knowledge, the pioneer in engaging users to collaborate with scientific research was the SETI@HOME project [8].
This project has been able to create a virtual supercomputer of 259.729 TeraFLOPS thanks to the collaboration of 706,586 users⁷. Thus, DGC is also known as Volunteer Grid Computing (VGC), after the users who unselfishly collaborate with scientific research. Therefore, VGC computing power can be used for free if users see the interest of the project.

Of all the VGC middleware technologies presented above, BOINC is the most widely used. Furthermore, BOINC is already employed in different research fields such as high-energy physics [9], astrophysics [10], climate prediction [11] and epidemiology [12]. A novel technology has also recently been described which harnesses computing resources when users merely browse a special web page [13,14,15]. We do not consider it here because it hides the background work of the web browser from users, so computing power is "stolen" from their PCs; users would be annoyed to discover that someone has been using their resources without permission. Our idea is just the opposite: not only to inform users about the project we are running, but also to invite them to join and collaborate.

VGC is also a good computing platform for running parameter-sweep experiments in genetic and evolutionary computation. M.E. Samples describes Commander, a generic parameter-sweep framework (see [16]). However, Commander is not aimed at harnessing volunteer resources, it has not been as widely deployed and tested on real scientific research problems, and it has not attracted the huge number of clients of BOINC (2,364,170 client computers), a consolidated VGC platform.

Any of the parallel models described above, fine- and coarse-grain, can run within this approach; we simply have to bear in mind that each parallel model will run whole on a single machine.
The power comes from the multiple simultaneous runs of the same experiment with different parameters, or from identical runs for statistical analysis.

⁷ Data obtained from http://boincstats.com

Therefore, what we propose is to use a VGC BOINC model together with GP in order to:

– Harness a large number of BOINC resources (nowadays BOINC has 2,364,170 computers in total, providing on average 668,541.2 GigaFLOPS⁸).
– Improve the speed-up of sequential GP executions thanks to the parallel environment that VGC provides.

We have chosen BOINC as our VGC middleware because it is the most widely used one, and therefore offers great computing power. The remainder of the paper includes a description of the BOINC model in Section 2; we present the new VGC and GP model in Section 3; Section 4 shows the experiments, the results and an extension of the model; we conclude in Section 5.

2 The BOINC model

As described above, BOINC is a middleware that harnesses commodity computing resources for a given project. BOINC has two key features: it is multi-platform and open source. BOINC uses a master-slave model in which the server is in charge of:

– Hosting the scientific project experiments. A project is composed of a binary (the algorithm) and some input files. The binary is classified according to the target platform (MS Windows, GNU/Linux, MacOSX) and architecture (x86 32/64 bits and SPARC).
– Creation and distribution of jobs. In BOINC's terminology a job is called a "work unit" (WU). A WU describes how the experiment must be run by the clients: the name of the binary, the input/output files and the command-line arguments.
– Assimilation and validation of results. When clients finish their computations, they upload the results to the server. At this point the server runs two processes: a validator and an assimilator program.
The validator's goal is to verify whether the results are corrupted. If everything is correct, the results are validated and prepared for the next program, the assimilator, which is in charge of parsing the project's output files to perform tasks such as computing statistics or storing results in another database.

The client side is quite simple. The BOINC client connects to the server and asks for work (a WU). The client downloads the necessary files and starts the computation. Once the results are obtained, the client uploads them to the server. Additionally, during the whole process (downloading the WU, processing it, uploading the results) the client contacts the server through a "heartbeat" function. The heartbeat is used to take some scheduling decisions and to generate statistical data.

As BOINC relies on users, BOINC resources are not reliable, because:

– Users turn their machines on and off without knowing whether they are interrupting a BOINC execution. Therefore, the research application must provide a checkpoint facility.
– Users could try to cheat. BOINC has a redundancy feature that circumvents this problem: the BOINC administrator can define the minimum quorum required to validate a solution.
– Users could try to hack the BOINC server. A user who hacked the server would be able to launch new WUs, which could carry viruses. To avoid this problem, BOINC digitally signs binary applications, so only signed applications can be distributed to the clients.

⁸ Data obtained from http://boincstats.com under "BOINC Combined stats".

As we can observe, BOINC's structure is simple and provides the main features that any VGC system requires. The next subsection explains how to use BOINC with a scientific research project.
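The redundancy mechanism described above can be illustrated with a minimal sketch (BOINC's actual validators are C++ server-side plugins; this is just the quorum logic): replicas of the same WU are compared, and a result becomes canonical only when at least `min_quorum` replicas agree on it.

```python
from collections import Counter

def validate(replica_results, min_quorum):
    """Return the canonical result if at least `min_quorum` replicas
    agree on it, else None (meaning more replicas are still needed)."""
    counts = Counter(replica_results)
    result, votes = counts.most_common(1)[0]
    return result if votes >= min_quorum else None

# Two of three volunteers agree, so a quorum of 2 is met:
canonical = validate(["fitness=0.93", "fitness=0.93", "fitness=0.41"], 2)
# canonical -> "fitness=0.93"
```

A cheating or faulty host then only wastes its own replica: its deviant output never reaches the assimilator.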
2.1 How to use BOINC with a Scientific Project

A scientific project that wants to use BOINC has to set up a GNU/Linux server (Apache, MySQL, PHP) and then take the following key points into account:

– Programming language. BOINC uses C++ and Fortran. Thus, if the scientific project is not coded in C++ or Fortran, it has to be ported in order to link its source code with the BOINC framework.
– Operating system. BOINC has clients for GNU/Linux, MS Windows and MacOSX. The scientific project has to be adapted to all of them if we want to harness as many of the available resources as possible. BOINC uses a generic framework that builds OS-dependent binaries.

Therefore, there are two ways to support BOINC:

– Method 1: port the code. This is the most common method. A researcher adapts the application source code to support BOINC. The changes can be easy if the tool is coded in C/C++ or Fortran; in other cases the researcher will have to rewrite the whole code.
– Method 2: the wrapper. The BOINC team provides a tool called the wrapper which runs statically linked software inside BOINC without modifying or porting the application source code. The wrapper embodies the application in such a way that, for the BOINC client, only one application exists: the wrapper itself.

In conclusion, an application which is not coded in C/C++/Fortran will use the second method; if the application is coded in C/C++/Fortran, some minor changes will be needed to support BOINC (Method 1).

3 VGC and GP problems

Our proposal is to use VGC for running GP experiments via the BOINC technology. We present two examples that show how any preferred GP tool can be used effectively within the BOINC framework.
The examples include adapting Lil-gp to BOINC (Method 1) and using the wrapper for ECJ (Method 2).

3.1 Lil-gp and BOINC

Lil-gp is a well-known GP system written in C (see [17]). As Lil-gp is coded in C, porting the framework to BOINC is not difficult (Method 1). The main porting changes were made in the input/output routines that access files: under BOINC, file I/O must go through specific BOINC I/O functions because of the parallel nature of the model. Once these changes were done, Lil-gp was ready to be compiled and linked against the BOINC libraries. In summary, a Lil-gp-BOINC project is composed of the following items:

– Binary. The compiled Lil-gp problem (symbolic regression, even-parity 5, etc.) built with the adapted Lil-gp-BOINC framework.
– Input files. Lil-gp takes the GP parameter file as input (probability of crossover, mutation, etc.).
– WU. The WU specifies the input files needed by the binary, the output files that the binary will generate and, if necessary, the command-line arguments to be passed to it. In this project, the input file must be specified through a command-line argument.

The results described below show the effectiveness of the approach. Nevertheless, researchers frequently do not have the time to manage a port, or the tool is written in a language other than C or Fortran. In these cases the goal is to use the tool as it is; the next subsection deals with this case.

3.2 ECJ and BOINC

ECJ is a modern Java framework for Evolutionary Computation (EC) [18]. The framework can run different kinds of EC techniques: genetic algorithms, evolution strategies, genetic programming, etc. As ECJ is not coded in C++ or Fortran, there are two ways of supporting BOINC:

1. Port the framework.
This solution is quite difficult because the framework is written in Java; porting it would imply rewriting the whole framework in C++ or Fortran.
2. Use the wrapper. This simple solution requires no changes to any line of source code in order to run the framework inside BOINC.

We therefore employ the wrapper solution. However, this implies that every client must have a Java virtual machine installed, because without Java it is impossible to run any ECJ experiment. The adopted solution for supporting Java on all clients was to package the Java virtual machine together with ECJ. In summary, the ECJ-BOINC project is composed of the following items:

– Binary. The binary is the wrapper. The wrapper uses a file called job.xml which specifies the unmodified binary that must be launched by BOINC, its input/output files and its command-line arguments.
– Input files.
  • ECJ and Java. ECJ and Java are packaged in separate compressed files, which are unpacked once the clients have downloaded them.
  • Unmodified binary. The unmodified binary is a script file which is in charge of unpacking all the input files (ECJ and the Java virtual machine) and starting the execution of ECJ. Additionally, it handles ECJ's internal checkpointing so that, when necessary, execution can restart from the last saved stable state.
– WU. The WU specifies the input files needed by the binary and the output files that will be generated (in this case, by the unmodified binary).

In summary, the workflow of ECJ-BOINC, once the clients have downloaded all the necessary files, is the following:

1. The wrapper launches the starter script.
2. The script:
   (a) Unpacks the ECJ and Java compressed files.
   (b) Checks whether an ECJ checkpoint file has already been created; if so, it launches ECJ with the checkpoint file, otherwise it launches ECJ in the normal way.
   (c) Copies the ECJ output file to the solution file and exits.
3. The wrapper waits until the solution file is created, then notifies the BOINC core client that the execution has ended and that the result files can be uploaded to the BOINC server.

Any other statically linked tool could follow this scheme to run on BOINC clients. The next section explains how Lil-gp-BOINC and ECJ-BOINC were employed to run different experiments using a geographically distributed pool of clients.

4 Experiments & Results

The goal of all the experiments presented here is to show that VGC is a useful computing platform for running GP problems. We are not interested in checking the quality of the obtained results, and we have therefore employed the standard implementation of benchmark problems provided by the tools Lil-gp and ECJ. We focus on measuring the performance improvement that can be achieved when a BOINC model is used, compared with the traditional sequential mode of running on a single machine. Speed-up is usually measured by the standard equation:

A = T_seq / T_B    (1)

where:
– A is the acceleration;
– T_seq is the time consumed in sequential mode;
– T_B is the time consumed in BOINC mode.

Nevertheless, given the special features of VGC, we also measure the available computing power (CP) using the method described by Anderson and Fedak in [19]:

CP = X_arrival · X_life · X_ncpus · X_flops · X_eff · X_onfrac · X_active · X_redundancy · X_share    (2)

In all the experiments, X_redundancy is equal to 1 because we did not use the redundancy facility provided by BOINC. X_share is also equal to 1 because none of the clients shared its resources with other BOINC projects.
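Equation (2) is a plain product of the measured factors; a minimal sketch (the factor values below are made up for illustration, not this paper's measurements):

```python
from math import prod

def computing_power(factors):
    """Available computing power, equation (2): the product of the
    host-churn and hardware factors from Anderson and Fedak [19]."""
    return prod(factors.values())

# Illustrative (invented) factor values. In this paper
# X_redundancy = X_share = 1, since neither feature was used.
cp = computing_power({
    "arrival": 0.9, "life": 0.8, "ncpus": 1.5, "flops": 2.0,
    "eff": 0.9, "onfrac": 0.7, "active": 0.8,
    "redundancy": 1.0, "share": 1.0,
})
```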
X_arrival and X_life are important variables because they measure host churn (a volunteer computing project's pool of hosts is dynamic). The remaining variables measure specific hardware features [19].

4.1 Lil-gp-BOINC

This first experiment is a proof of concept and was set up in a controlled environment, a laboratory, using Lil-gp (Method 1, see Section 2.1). In order to measure the performance improvement we chose the Artificial Ant on the Santa Fe Trail problem (see [20]). We ran 25 executions of the experiment with different population sizes (1000 and 2000 individuals) and numbers of generations (1000 and 2000). Two pools of clients, one with 5 and another with 10 machines, were used for running the Lil-gp-BOINC model. As said above, given the aim of this research we do not report the quality of the obtained results (which is the same as in the sequential execution); we focus on performance, computing power and speed-up.

Table 1 shows the time consumed by Lil-gp-BOINC and by standard Lil-gp, and the acceleration obtained using 5 and 10 client machines with the BOINC model. From these results we can conclude that the more clients are used, the better the performance: for instance, with 10 clients we achieved an acceleration of 5, while with 5 clients we only obtained an acceleration of 3 for the same number of generations and individuals. It is important to point out that this performance grows as more clients collaborate with the project.

Table 1. Execution time for Lil-gp and Lil-gp-BOINC

(a) Using 5 clients
                      T_seq   T_B     Acc.
1000 Gen, 1000 Ind.   4250s   1548s   2.7455
1000 Gen, 2000 Ind.   650s    395s    1.6456
2000 Gen, 1000 Ind.   9200s   2356s   3.9049

(b) Using 10 clients
                      T_seq   T_B     Acc.
1000 Gen, 1000 Ind.   4250s   1033s   4.1142
2000 Gen, 1000 Ind.   9200s   1623s   5.6685

As this experiment is a proof of concept, the measured variables (speed-up and computing power) did not involve real volunteer users; therefore we do not report the available computing power of a volunteer computing scenario (see equation 2). In order to continue the evaluation of the model, we decided to tackle a more complex GP tool as well as a more computationally demanding GP problem.

4.2 ECJ-BOINC

This second experiment employs a modern and complex Java GP framework (ECJ). For this tool we used the wrapper (Method 2) in order to support BOINC. We chose the Boolean multiplexer GP benchmark (see [20]). In general, the input to the Boolean multiplexer function consists of k "address" bits a_i and 2^k "data" bits d_i, and has the form a_{k-1} ... a_1 a_0 d_{2^k - 1} ... d_1 d_0, with length k + 2^k. The search space for this problem is of size 2^(2^(k + 2^k)). The problem was run on geographically distributed laboratory clients belonging to the University of Extremadura (Cáceres, Badajoz and Mérida); see Fig. 1(a). Fig. 1(b) shows the number of clients per city taking part in the experiment. It is important to point out that, in the following tables, T_B measures all the time employed (client connection, WU download, CPU time, result upload, etc.): from the moment the first client registers and collaborates with the project until the last connection to the server from any client.

828 iterations of the 11-multiplexer function (k = 3) were initially performed using 45 computers. The experiment used the same Koza parameters (4000 individuals and 50 generations); for more details see [21]. Of the 828 iterations, 449 found the perfect solution to the 11-multiplexer problem (although this is not the goal of our research).
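The accelerations reported in Table 1 (equation 1) and the multiplexer search-space sizes quoted in this section can be reproduced in a few lines (a sketch; the timing values are simply those reported above):

```python
def acceleration(t_seq, t_b):
    """Speed-up as defined in equation (1)."""
    return t_seq / t_b

# Table 1(b): 10 clients, 1000 generations, 1000 individuals.
print(round(acceleration(4250, 1033), 4))   # 4.1142

def multiplexer_size(k):
    """Input length and search-space exponent for the Boolean
    multiplexer with k address bits and 2**k data bits."""
    length = k + 2 ** k          # address bits + data bits
    return length, 2 ** length   # search space is 2**(2**length)

print(multiplexer_size(3))  # (11, 2048)    -> 2**2048 candidate functions
print(multiplexer_size(4))  # (20, 1048576) -> 2**1048576 candidate functions
```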
Some iterations failed because of the initial, default limit on the Java heap size⁹. On average, 119.18 seconds were needed to find the perfect solution, while 134.75 seconds on average were needed to run one execution. Of the 45 available computers, only 27 produced the 828 results. The achieved speed-up was 0.29, which means a deterioration in performance. The reason is the easiness of the problem (only 134.75 seconds per run on average), bearing in mind that T_B also measures host churn (see equation 2). Taking into account that the project ran for 5.35 days, X_life is measured only from the first connection to the last communication of hosts that had not communicated in at least one day. The achieved CP is 80 GigaFLOPS; this value was obtained because the project ran for only a few days and all the hosts were active during the whole computation time. Table 2(a) shows a summary.

⁹ The heap size was later increased to avoid this problem.

Fig. 1. Distributed infrastructure: (a) interconnected cities; (b) clients per city.

We then increased the complexity of the problem to k = 4 (compare the new search space, 2^1048576, with the previous one, 2^2048). Our interest is not in solving the problem, but in setting up a time-consuming experiment to test the VGC model. We first tried to deploy 42 runs of an experiment using 50 generations and a population of 1000 individuals; the rest of the GP parameters are the same Koza parameters as for the 11-multiplexer (see [21]). Volunteer computers from other universities and institutions collaborated with the project: CICA in Sevilla, the University of Extremadura (Cáceres, Badajoz, Mérida), Granada, Valencia, UNED in Madrid, and Ceta-Ciemat in Trujillo. The computing resources are thus more heterogeneous and realistic.
Using this infrastructure we performed the 42 runs of the experiment. A performance improvement was achieved, as this problem needs on average 31,079.28 seconds per execution. 41 machines were used to solve the problem; of these, 7 produced the 42 runs, because some machines were turned off for hours, others were still computing, etc. (typical VGC behavior). The obtained acceleration was 1.95 (see Table 2). This speed-up is respectable, although not impressive, but it was obtained for free with quite a small number of volunteers involved. On average, one iteration takes 30,944.53 seconds more than for the 11-multiplexer. X_life was measured as in the 11-multiplexer problem, because this project also ran for only a few days. The achieved computing power is 23 GigaFLOPS; it is smaller because we employed only 41 hosts and the project ran for 7.75 days. However, bear in mind that BOINC has a pool of 2,364,170 available computers which could collaborate with a project in the future, instead of the mere 42 employed here, not all of them simultaneously available. Table 2(b) shows a summary of the relevant data. The best fitness found was Raw = 180224.0, Adjusted = 5.54862e-06, Hits = 868352.0.

Table 2. Execution time for ECJ and ECJ-BOINC

                                       T_seq      T_B       Acc.   CP
11 bits, 828 runs, 50 Gen, 4000 Ind.   134078s    462259s   0.29   80 GFLOPS
20 bits, 42 runs, 50 Gen, 1000 Ind.    1305330s   669759s   1.95   23 GFLOPS

Finally, we performed another experiment with a complex GP tool which is not statically linked, making it impossible to employ Method 1 or Method 2. Additionally, this experiment faces a real-life, time-consuming computer vision problem (instead of a benchmark problem) that had already been solved in a sequential fashion: interest point detectors (see [22]).
This GP framework uses the Matlab environment and several image toolboxes, which implies a much more complex system that is therefore more difficult to deploy over a BOINC infrastructure. Hence, our proposal is to use a virtualization layer (see [23]) inside BOINC by creating an image of a working scientific system (hardware, OS and the research tool). Thanks to this new virtualization layer (based on VMware [23]), any GP system or framework, independently of its complexity, programming language and operating system, can be run on any BOINC client (Linux, Windows or MacOSX; for further details see [24]).

For this experiment we set up 10 MS Windows volunteer computers. The virtual image was built using a GNU/Linux x86 operating system; thus, a GNU/Linux scientific environment runs directly inside MS Windows thanks to the Virtual-BOINC approach. The 10 Windows PCs produced 12 solutions during 48 hours. The time consumed per solution was 18 hours on average, and the total time needed to produce the 12 solutions in a sequential run was 215 hours. Therefore, thanks to this new model, the obtained speed-up was 4.48, with a CP of 25.67 GFLOPS (see Table 3).

Table 3. Execution time for IP and IP-Virtual-BOINC

                  T_seq   T_B   Acc.   CP
75 Gen, 75 Ind.   215h    48h   4.48   25.67 GFLOPS

In summary, the BOINC model significantly improves performance when CPU-intensive, time-consuming, real-life problems are solved by means of GP. Moreover, the more computers collaborate with a project, the more computing power and speed-up is achieved, at no cost.

Fig. 2. Host churn during September 2007.

5 Conclusions

This paper has presented a new approach to solving GP problems using Volunteer Grid Computing. Three methods have been described in order to support BOINC.
Firstly, porting a simple GP application to BOINC; secondly, using a modern and more complex GP framework, which was run inside BOINC without any modification because it is statically linked; and finally, a complex GP environment which faces a real-life problem by using a virtualization layer that allows any GP system, independently of its complexity, programming language or operating system, to run inside BOINC. Three experiments were performed: one in a controlled environment, another over a geographically distributed infrastructure involving 8 cities, and one more using the virtualization extension. The results show that VGC is a perfect springboard for running complex, time-intensive problems by means of GP on free computing resources. Moreover, BOINC is a really interesting technology if we take into account the big pool of BOINC-enabled computers (2,364,170) which could collaborate with a new GP BOINC project.

6 Acknowledgments

This work was supported by Junta de Extremadura, Cátedra CETA-CIEMAT de la Universidad de Extremadura, Regional Gridex project PRI06A223 and National NOHNES project TIN2007-68083-C02-01 of the Spanish Ministry of Science and Education.

References

1. Tomassini, M.: Spatially Structured Evolutionary Algorithms. Springer (2005)
2. Kesselman, C., Foster, I.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann (1999)
3. Laure, E., Fisher, S., Frohner, A., Grandi, C., Kunszt, P., Krenek, A., Mulmo, O., Pacini, F., Prelz, F., White, J., et al.: Programming the Grid with gLite. Computational Methods in Science and Technology 12(1) (2006) 33–45
4. Melab, N., Cahon, S., Talbi, E.G.: Grid computing for parallel bioinspired algorithms. J. Parallel Distrib. Comput. 66(8) (2006) 1052–1061
5. Fedak, G., Germain, C., Neri, V., Cappello, F.: XtremWeb: A Generic Global Computing System. Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGRID'01) (2001)
6. Litzkow, M., Tannenbaum, T., Basney, J., Livny, M.: Checkpoint and migration of UNIX processes in the Condor distributed processing system. Technical report, University of Wisconsin (1997)
7. Anderson, D.: BOINC: a system for public-resource computing and storage. In: Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on. (2004) 4–10
8. Anderson, D.P., Cobb, J., Korpela, E., Lebofsky, M., Werthimer, D.: SETI@home: an experiment in public-resource computing. Commun. ACM 45(11) (2002) 56–61
9. Robert-Démolaize, G.: Design and performance optimization of the LHC collimation system. Master's thesis, CERN (2006)
10. Sintes, A.M.: Recent results on the search for continuous sources with LIGO and GEO600
11. Allen, M.: Do it yourself climate prediction. Nature (1999)
12. Team, A.: Africa@home Link.
13. Barabasi, A., Jeong, H., Brockman, J., Freeh, V.: Parasitic computing. Nature 412(6850) (2001) 894–897
14. Klein, J., Spector, L.: Unwitting distributed genetic programming via asynchronous JavaScript and XML. In: GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation, New York, NY, USA, ACM (2007) 1628–1635
15. Merelo, J.J., García, A.M., Laredo, J.L.J., Lupión, J., Tricas, F.: Browser-based distributed evolutionary computation: performance and scaling behavior. In: GECCO '07: Proceedings of the 2007 GECCO conference companion on Genetic and evolutionary computation, New York, NY, USA, ACM (2007) 2851–2858
16. Samples, M., Daida, J., Byom, M., Pizzimenti, M.: Parameter sweeps for exploring GP parameters. Proceedings of the 2005 workshops on Genetic and evolutionary computation (2005) 212–219
17. Punch, B., Zongker, D.: Lil-gp Link.
18. Luke, S., Panait, L., et al., Chircop, A.: ECJ: a Java-based evolutionary computation research system Link.
19. Anderson, D., Fedak, G.: The Computational and Storage Potential of Volunteer Computing. Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06) (2006)
20. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA (1992)
21. Koza, J.: A hierarchical approach to learning the Boolean multiplexer function. Rawlins [1863] 171–192
22. Trujillo, L., Olague, G.: Synthesis of interest point detectors through genetic programming. In Cattolico, M., ed.: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2006, Seattle, Washington, USA, July 8–12, 2006. Volume 1., ACM (2006) 887–894
23. Nieh, J., Leonard, O.C.: Examining VMware. Dr. Dobb's Journal 25(8) (August 2000) 70, 72–74, 76
24. Lombraña González, D., Fernández de Vega, F., Trujillo, L., Olague, G., et al.: Customizable execution environments with virtual desktop grid computing. Parallel and Distributed Computing and Systems, PDCS (2007, accepted)