Adaptive Population Models for Offspring Populations and Parallel Evolutionary Algorithms

We present two adaptive schemes for dynamically choosing the number of parallel instances in parallel evolutionary algorithms. This includes the choice of the offspring population size in a (1+$\lambda$) EA as a special case. Our schemes are paramete…

Authors: J"org L"assig, Dirk Sudholt

Adaptive Population Models for Offspring Populations and Parallel Evolutionary Algorithms

Jörg Lässig, ICS, University of Lugano, 6906 Lugano, Switzerland
Dirk Sudholt, CERCIA, University of Birmingham, Birmingham B15 2TT, UK

June 20, 2018

Abstract

A central question when parallelizing evolutionary algorithms is the choice of the number of parallel instances. In practice, optimal parameter settings are often hard to find due to limited information about the optimization problem under consideration. We present two adaptive schemes for dynamically choosing the number of instances in each generation. These schemes work in a black-box setting where no knowledge of the function at hand is available. Both schemes provide near-optimal speed-ups in terms of the parallel time while not increasing the number of function evaluations in an asymptotic sense, compared to upper bounds via the fitness-level method. It turns out that the optimization of the offspring population size in a (1+$\lambda$) EA is just a special case in this context, so our schemes and results also work for the choice of the offspring population size.

1 Introduction

Parallelization is becoming a more and more important issue for solving difficult optimization problems [1]. Various implementations of parallel evolutionary algorithms (EAs) have been applied in the past decades [17]. One of the most important questions when dealing with parallel EAs is how to choose the number of processors such that a good speed-up is achieved in terms of the parallel computation time, without wasting computational effort in terms of the total sequential computation time. We consider a setting where multiple processors try to find improvements of the current best fitness in parallel.
This corresponds to an island model where subpopulations evolve in parallel and migration is used to send copies of good individuals to other islands. Our setting is greedy in the sense that we assume a complete topology on the islands; whenever one island finds an improvement of the current best individual in the system, this is immediately communicated to all other islands. We are interested in finding best-possible speed-ups in such a setting by adapting the number of islands. This should be done without increasing the asymptotic sequential running time. Choosing the offspring population size of a (1+$\lambda$) EA turns out to be a special case in our setting, where we have $\lambda$ islands, a (1+1) EA on each island, and a single best individual is sent to all islands. The offspring population size has already been investigated theoretically and empirically by Jansen, De Jong, and Wegener [9]. Our results apply to both parallel EAs and offspring populations in the (1+$\lambda$) EA.

For both, parallel EAs and the (1+$\lambda$) EA, we speak of the parallel optimization time, denoted by $T^{\mathrm{par}}$, as the number of generations until the first global optimum is evaluated. The sequential optimization time, denoted by $T^{\mathrm{seq}}$, is defined as the number of function evaluations until the first global optimum is evaluated. Note that this includes all function evaluations in the generation of the algorithm in which the improvement is found. In both measures we allow ourselves to neglect the cost of initialization as this only adds a fixed term to the running times. To unify the notation for parallel EAs and offspring populations, we simply speak of the population size in the following; this means the number of islands in the island model and the offspring population size for the (1+$\lambda$) EA, respectively.
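To make the two time measures concrete, here is a tiny illustrative sketch (ours, not from the paper) of how $T^{\mathrm{par}}$ and $T^{\mathrm{seq}}$ would be tallied from a hypothetical record of per-generation population sizes:

```python
# Illustrative sketch (not from the paper): tallying the two time measures
# from a hypothetical record of population sizes, one entry per generation,
# where the optimum is first evaluated in the last generation listed.
population_sizes = [1, 2, 4, 8, 1]

t_par = len(population_sizes)  # parallel time: number of generations
t_seq = sum(population_sizes)  # sequential time: total function evaluations,
                               # including all evaluations of the final generation

print(t_par, t_seq)  # prints "5 16"
```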
In previous work on the choice of the offspring population size [9] and on parallel spatially structured EAs with a complete topology [11] it was possible to analytically derive asymptotically optimal population sizes for three test functions: OneMax, LO, and Jump$_k$. However, it remains open whether one can derive an automatic way of choosing optimal population sizes. This is particularly important with regard to problems where it might not be possible or worthwhile to perform an analysis. In this work we present adaptive schemes for choosing the population size and accompany these schemes with a rigorous theoretical analysis of their running time. Our schemes are inspired by GPU or cloud computing, where it is possible to adjust the number of processors on the fly.

The first scheme doubles the population size if the current generation fails to produce an offspring that has larger fitness than the current best fitness value. Once an improvement is found, the population size drops to 1; only the best individual or island survives. The second scheme tries to maintain a good population size over time; it also doubles the population size in unsuccessful generations, and it halves the population size in successful generations. Both schemes are oblivious with respect to the function at hand and can therefore be applied in a black-box setting where no knowledge is available on the function at hand. We prove in the following that, compared to upper bounds via the fitness-level method, the expected sequential optimization time does not increase asymptotically. But for the parallel optimization time the waiting time for improvements on every fitness level can be replaced by its logarithm. This leads to a tremendous speed-up, in particular for problems where improvements are hard to find.
We present general upper bounds for both schemes as well as example applications to test functions: OneMax, LO, the class of unimodal functions, and Jump$_k$. In our proofs we introduce new arguments on the amortized analysis of algorithms, which may find further applications in the analysis of stochastic search algorithms and adaptive mechanisms.

The remainder of this work is structured as follows. In Section 2 we review previous work. Section 3 presents the algorithms and the considered population update schemes. In Section 4 we provide technical statements that will be used later on in our analyses and that may also help to understand the dynamics of the adaptive algorithms. Section 5 then presents general upper bounds for both schemes, while Section 6 deals with lower bounds on expected sequential times. Section 7 contains a brief discussion about tailored, that is, non-oblivious population update schemes. Our general theorems are applied to concrete example functions in Section 8. We finish with a discussion of possible extensions in Section 9 and conclusions in Section 10.

2 Previous Work

2.1 Adaptive Population Models

Considering adaptive numbers of islands in the island model of EAs, previous work is very limited. However, there are numerous results for adaptive population sizes in EAs. Eiben, Marchiori, and Valko [5] describe EAs with on-the-fly population size adjustment. They compared the performance of the different strategies in terms of success rate, speed, and solution quality, measured on a variety of fitness landscapes. The best EAs with adaptive population resizing outperformed traditional approaches.
Typical approaches are: eliminating population size as an explicit parameter by introducing aging and maximum-lifetime properties for individuals [12], the parameter-less GA (PLGA), which evolves a number of populations of different sizes simultaneously [7], random variation of the population size [3], and competition schemes [14].

Schwefel [15] first suggested $\lambda$-adaptation, which adapts the offspring population size during the optimization process. Herdy [8] proposed a mutative adaptation of $\lambda$ in a two-level ES, where on the upper level, called population level, $\lambda$ is treated as a variable to be optimized, while on the lower level, called individual level, the object parameters are optimized. In [6] a deterministic adaptation scheme for the number of offspring $\lambda$ is introduced, based on theoretical considerations on the relation between serial rates of progress for the actual number of offspring $\lambda$, for $\lambda - 1$, and for the optimal number of offspring. More specifically, the local serial progress (i.e., progress per fitness function evaluation) is optimized in a (1,$\lambda$) EA with respect to the number of offspring $\lambda$. The authors prove the following structural property: the serial progress rate as a function of $\lambda$ is either a function with exactly one (local and global) maximum or a strictly monotonically increasing function.

Jansen, De Jong, and Wegener [9] further elaborate on the offspring population size. A thorough runtime analysis of the effects of the offspring population size is presented. They also suggest a simple way to dynamically adapt this parameter and present empirical results for this scheme, but no theoretical analysis has been performed. The presented scheme doubles the offspring population size if the algorithm is unsuccessful in improving the currently best fitness value.
Otherwise, it divides the current offspring population size by $s$, where $s$ is the number of offspring with better fitness than the best fitness value so far. We will discuss in Section 9 how our schemes relate to their scheme and how far our results can be transferred.

2.2 Theoretical Work on Parallel EAs

In [10] a first rigorous runtime analysis for island models has been performed by constructing a function where alternating phases of independent evolution and communication among the islands are essential. A simple island model with migration finds a global optimum in polynomial time, while panmictic populations as well as island models without migration need exponential time, with very high probability.

New methods for the running time analysis of parallel evolutionary algorithms with spatially structured populations have been presented in [11]. The authors generalized the well-known fitness-level method, also called the method of $f$-based partitions [18], from panmictic populations to spatially structured evolutionary algorithms with various migration topologies. These methods were applied to estimate the speed-up gained by parallelization in pseudo-Boolean optimization. The parallel and sequential optimization times were compared to upper bounds for a panmictic EA derived via the fitness-level method. It was shown that the possible speed-up for the parallel optimization time increases with the density of the topology, while not increasing the total number of function evaluations, asymptotically. More precisely, the classical fitness-level method says that when $s_i$ is a lower bound on the probability that one island leaves the current fitness level towards a better one, the expected time until this happens is at most $1/s_i$ for a panmictic population.
In a parallel EA with a unidirectional ring, the expected parallel time decreases to $O(s_i^{-1/2})$; in other words, the waiting time can be replaced by its square root. For a torus graph even the third root can be used, and with a proper choice of the number $\mu$ of islands, a speed-up of order $\mu$ is possible in some settings.

Interestingly, the results from [11] can partially be interpreted in terms of adaptive population sizes. The analyses are based on the numbers of individuals on the current best fitness level. In our upper bounds we pessimistically assume that only islands on the current best fitness level have a reasonable chance of finding better fitness levels. All worse individuals are ignored when estimating the waiting time for an improvement of the best fitness level. For a unidirectional ring, when migration happens in every generation and better individuals are guaranteed to win in the selection step, the number of individuals on the current best fitness level increases by 1 in each generation, as always a new island is taken over. If an improvement is found, it is pessimistically assumed that then only one island has made it to a new, better fitness level.

This setting corresponds exactly to a parallel EA that in each unsuccessful generation acquires one new processor, and to an adaptive (1+$\lambda$) EA that increases $\lambda$ by 1 in each unsuccessful generation. Once an improvement is found, the population size drops to 1, as in the case of our first scheme presented here. The upper bounds from [11] therefore directly transfer to additive population size adjustments. In the following we show that multiplicative adjustments of the population size may admit better speed-ups than the additive approaches suggested in [11].

3 Algorithms

In Sections 5 and 7 we present general upper bounds via the fitness-level method.
These results are general in the following sense. If all islands in a parallel EA run elitist algorithms (i.e., algorithms where the best fitness in the population can never decrease) and we have a lower bound on the probability of finding a better fitness level, then this can be turned into an upper bound for the expected sequential and parallel running times of the parallel EA. We present a scheme for algorithms where this argument applies. The goal is to maximize some fitness function $f$ in an arbitrary search space. An adaptation towards minimization is trivial.

Algorithm 1 Elitist parallel EA with adaptive population
1: Let $\mu := 1$ and initialize a single island $P^1_1$ uniformly at random
2: for $t := 1$ to $\infty$ do
3:   for all $1 \le i \le \mu$ in parallel do
4:     Select parents and create offspring by variation
5:     Send a copy of a fittest offspring to all other islands
6:     Create $P^i_{t+1}$ such that it contains a best individual from the union of $P^i_t$, the new offspring, and the incoming migrants
7:   $\mu_{t+1} :=$ updatePopulationSize($P^i_t$, $P^i_{t+1}$)
8:   if $\mu_{t+1} > \mu_t$ then create $\mu_{t+1} - \mu_t$ new islands by copying existing islands
9:   if $\mu_{t+1} < \mu_t$ then delete $\mu_t - \mu_{t+1}$ islands

The selection of islands to be copied or removed, respectively, can be arbitrary, as due to the complete topology all islands always contain an offspring with the current best fitness. With other topologies this selection would be based on the fitness values of the current elitists on all islands. Note that we have neither specified a search space nor variation operators. However, in Section 6 we will discuss lower bounds that only hold in pseudo-Boolean optimization and for EAs that only use standard mutation (i.e., flipping each of $n$ bits independently with probability $1/n$) for creating new offspring.
The (1+$\lambda$) EA can be regarded as a special case where we have $\lambda$ islands and a single best individual takes over all $\lambda$ islands.

Algorithm 2 (1+$\lambda$) EA with adaptive population
1: Initialize a current search point $x_1$ uniformly at random
2: for $t := 1$ to $\infty$ do
3:   Create $\lambda$ offspring by mutation
4:   Let $x^*$ be the best offspring
5:   if $f(x^*) \ge f(x_t)$ then $x_{t+1} := x^*$ else $x_{t+1} := x_t$
6:   $\lambda :=$ updatePopulationSize($\{x_t\}$, $\{x_{t+1}\}$)

In Section 8 we will consider concrete example functions where the (1+$\lambda$) EA with adaptive population or, equivalently, an island model running (1+1) EAs with an adaptive population, is applied. The latter was called parallel (1+1) EA in [10, 11].

We now define the population update schemes considered in this work. The function updatePopulationSize takes the old and the new population as inputs and outputs a new population size. In order to help finding improvements that take a long time to be found, we double the population size in each unsuccessful generation. As we might not need that many islands after a success, we reset the population size to 1.

Algorithm 3 updatePopulationSize($P_t$, $P_{t+1}$) (Scheme A)
1: if $\max\{f(x) \mid x \in P_{t+1}\} \le \max\{f(x) \mid x \in P_t\}$ then
2:   return $2\mu_t$
3: else
4:   return 1

On problems where finding improvements takes a similar amount of time, it might not make sense to throw away all islands at once. Therefore, in the following scheme we halve the population size with every successful generation. We will see that this does not worsen the asymptotic performance compared to Scheme A. For some problems this scheme will turn out to be superior.
Algorithm 4 updatePopulationSize($P_t$, $P_{t+1}$) (Scheme B)
1: if $\max\{f(x) \mid x \in P_{t+1}\} \le \max\{f(x) \mid x \in P_t\}$ then
2:   return $2\mu_t$
3: else
4:   return $\lfloor \mu_t / 2 \rfloor$

Our schemes for parallel EAs are applicable in large clusters where the cost of allocating new processors is low compared to the computational effort spent within the evolutionary algorithm. Many of our results can be easily adapted towards algorithms that do not use migration and population size updates in every generation, but only every $\tau$ generations, for a parameter $\tau \in \mathbb{N}$ called the migration interval. This can significantly reduce the costs for allocating and deallocating new processors. Details can be found at the end of Section 5.

An algorithm using Scheme B can be implemented in a decentralized way as follows, where we assume that each island runs on a distinct processor. Assume all processors are synchronized, i.e., they share a common timer. All processors have knowledge of the current best fitness level and they inform all other processors by sending messages in case they find a better fitness level. This message contains genetic material that is taken over by the other processors so that all processors work on the current best fitness level. In the adaptive scheme, if after one generation no message has been received, i.e., no processor has found a better fitness level, each processor activates a new processor as follows. Each processor maintains a unique ID. The first processor has an ID that simply consists of an empty bit string. Each time a processor activates a new processor, it copies its current population and its current ID to the new processor. Then it appends a 0-bit to its ID while the new processor appends a 1-bit to its ID. At the end, all processors have enlarged their IDs by a single bit.
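As a minimal executable sketch of Algorithms 2-4 (ours: the function names and the choice of OneMax as test function are our own, and we floor Scheme B's halving at 1 so that at least one offspring always remains), the adaptive (1+$\lambda$) EA can be rendered in Python as follows:

```python
import random

def one_max(x):
    """OneMax: number of 1-bits; maximized by the all-ones string."""
    return sum(x)

def scheme_a(lam, success):
    """Scheme A: double on an unsuccessful generation, reset to 1 on success."""
    return 1 if success else 2 * lam

def scheme_b(lam, success):
    """Scheme B: double on failure, halve on success (floored at 1 here)."""
    return max(1, lam // 2) if success else 2 * lam

def mutate(x):
    """Standard bit mutation: flip each bit independently with probability 1/n."""
    n = len(x)
    return [b ^ (random.random() < 1.0 / n) for b in x]

def adaptive_ea(f, n, update, seed=0):
    """(1+lambda) EA with adaptive offspring population size (Algorithm 2).
    Runs until the OneMax optimum (fitness n) is found; returns the parallel
    time (generations) and sequential time (function evaluations)."""
    random.seed(seed)
    x = [random.randint(0, 1) for _ in range(n)]
    lam, t_par, t_seq = 1, 0, 0
    while f(x) < n:
        offspring = [mutate(x) for _ in range(lam)]
        best = max(offspring, key=f)
        success = f(best) > f(x)   # strict improvement of the current best
        if f(best) >= f(x):        # elitist acceptance, ties allowed
            x = best
        t_par += 1
        t_seq += lam
        lam = update(lam, success)
    return t_par, t_seq

par_a, seq_a = adaptive_ea(one_max, 16, scheme_a)
par_b, seq_b = adaptive_ea(one_max, 16, scheme_b)
```

Note how the parallel time counts generations while the sequential time sums $\lambda$ over all generations; under Scheme B the population size is retained across improvements instead of collapsing to 1.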
When an improvement has been found, all processors first take over the genetic material in the messages that are passed. Then all processors whose ID ends with a 1-bit shut down. All other processors remove the last bit from their IDs. It is easy to see that with this mechanism all processors always have pairwise distinct IDs and no central control is needed to acquire and shut down processors.

4 Tail Bounds and Expectations

In preparation for upcoming running time analyses we first prove tail bounds for the parallel optimization times in a setting where we are waiting for a specific event to happen. This, along with bounds on the expected parallel and sequential waiting times, will prove useful later on. The tail bounds also indicate that the population will not grow too large. In the remainder of this paper we abbreviate $\max\{x, 0\}$ by $(x)^+$.

Lemma 1. Assume starting with $2^k$ islands for some $k \in \mathbb{N}_0$ and doubling the number of islands in each generation. Let $T^{\mathrm{par}}_k(p)$ denote the random parallel time until the first island encounters an event that occurs in each generation with probability $p$. Then for every $\alpha \in \mathbb{N}_0$:

1. $\Pr(T^{\mathrm{par}}_k(p) > (\lceil \log(1/p) \rceil - k)^+ + \alpha + 1) \le \exp(-2^\alpha)$,

2. $\Pr(T^{\mathrm{par}}_k(p) \le \log(1/p) - k - \alpha) \le 2 \cdot 2^{-\alpha}$,

3. $\log(1/p) - k - 3 < E(T^{\mathrm{par}}_k(p)) < (\log(1/p) - k)^+ + 2$,

4. $\max\{1/p,\, 2^k\} \le E(T^{\mathrm{seq}}_k(p)) \le 2/p + 2^k - 1$.

Each inequality remains valid if $p$ is replaced by a pessimistic estimate of $p$ (i.e., either an upper bound or a lower bound).

Proof. The condition $T^{\mathrm{par}}_k(p) > (\lceil \log(1/p) \rceil - k)^+ + \alpha + 1$ requires that the event does not happen on any island in this time period. The number of trials in the last generation is at least $2^{\lceil \log(1/p) \rceil + \alpha} \ge 1/p \cdot 2^\alpha$ for all $k \in \mathbb{N}_0$. Hence

$\Pr(T^{\mathrm{par}}_k(p) > (\lceil \log(1/p) \rceil - k)^+ + \alpha + 1) \le (1-p)^{1/p \cdot 2^\alpha} \le \exp(-2^\alpha)$.
For the second statement we assume $k \le \log(1/p) - \alpha$, as otherwise the claim is trivial. A necessary condition for $T^{\mathrm{par}}_k(p) \le \log(1/p) - k - \alpha$ is that the event happens at least once within the first $\log(1/p) - k - \alpha$ generations. This corresponds to at most $\sum_{i=1}^{\log(1/p)-\alpha} 2^{i-1} \le 2^{\log(1/p)-\alpha} = 1/p \cdot 2^{-\alpha}$ trials. If $p > 1/2$ the claim is trivial, as either the probability bound on the right-hand side is at least 1 or the time bound is negative; hence we assume $p \le 1/2$. Observing that then $1/p \cdot 2^{-\alpha} \le 2(1/p - 1) \cdot 2^{-\alpha}$, the considered probability is bounded by

$1 - (1-p)^{2(1/p-1) \cdot 2^{-\alpha}} \le 1 - \exp(-2 \cdot 2^{-\alpha}) \le 1 - (1 - 2 \cdot 2^{-\alpha}) = 2 \cdot 2^{-\alpha}$.

To bound the expectation we observe that the first statement implies $\Pr(T^{\mathrm{par}}_k(p) \ge (\log(1/p) - k)^+ + \alpha + 2) \le \exp(-2^\alpha)$. Since $T^{\mathrm{par}}_k$ is non-negative, we have

$E(T^{\mathrm{par}}_k(p)) = \sum_{t=1}^{\infty} \Pr(T^{\mathrm{par}}_k(p) \ge t)$
$\le (\log(1/p) - k)^+ + 1 + \sum_{\alpha=0}^{\infty} \Pr(T^{\mathrm{par}}_k(p) \ge (\log(1/p) - k)^+ + \alpha + 2)$
$\le (\log(1/p) - k)^+ + 1 + \sum_{\alpha=0}^{\infty} \exp(-2^\alpha)$
$< (\log(1/p) - k)^+ + 2$,

as the last sum is less than 1. For the lower bound we use that the second statement implies $\Pr(T^{\mathrm{par}}_k(p) \ge \log(1/p) - k - \alpha) \ge 1 - 2 \cdot 2^{-\alpha}$. Hence

$E(T^{\mathrm{par}}_k(p)) = \sum_{t=1}^{\infty} \Pr(T^{\mathrm{par}}_k(p) \ge t)$
$\ge \sum_{\alpha=2}^{\log(1/p)-k-1} \Pr(T^{\mathrm{par}}_k(p) \ge \log(1/p) - k - \alpha)$
$\ge \sum_{\alpha=2}^{\log(1/p)-k-1} (1 - 2 \cdot 2^{-\alpha})$
$= \log(1/p) - k - 2 - \sum_{\alpha=1}^{\log(1/p)-k-2} 2^{-\alpha}$
$> \log(1/p) - k - 3$.

For the fourth statement consider the islands one by one, according to some arbitrary ordering. Let $T(p)$ be the random number of sequential trials until an event with probability $p$ happens. It is well known that $E(T(p)) = 1/p$. Obviously $T^{\mathrm{seq}}_k(p) \ge T(p)$, since the sequential time has to account for all islands that are active in one generation. This proves $E(T^{\mathrm{seq}}_k(p)) \ge E(T(p)) \ge 1/p$.
The second lower bound $2^k$ is obvious, as at least one generation is needed for a success. For the upper bound observe that $T^{\mathrm{seq}}_k(p) = 2^k$ in case $T(p) \le 2^k$, and $T^{\mathrm{seq}}_k(p) = \sum_{i=k}^{\ell} 2^i$ in case $\sum_{i=k}^{\ell-1} 2^i < T(p) \le \sum_{i=k}^{\ell} 2^i$. Together, we get that $T^{\mathrm{seq}}_k(p) \le \max\{2T(p),\, 2^k\} \le 2T(p) + 2^k - 1$, hence $E(T^{\mathrm{seq}}_k(p)) \le 2/p + 2^k - 1$.

The presented tail bounds indicate that the population typically does not grow too large. The probability that the number of generations exceeds the expectation by an additive value of $\alpha + 1$ is even an inverse doubly exponential function. The following provides a more handy statement in terms of the population size. It follows immediately from Lemma 1.

Corollary 1. Consider the setting described in Lemma 1. For every $\beta \ge 1$, $\beta$ a power of 2, the probability that, while waiting for the event to happen, the population size exceeds $\max\{2^{k+1}, 4/p\} \cdot \beta$ is at most $\exp(-\beta)$.

One conclusion from these findings is that our schemes can be applied in practice without risking an overly large blowup of the population size. We now turn to performance guarantees in terms of expected parallel and sequential running times.

5 Upper Bounds via Fitness Levels

The following results are based on the fitness-level method, or method of $f$-based partitions. This method is well known for proving upper bounds for algorithms that do not accept worsenings of the population. Consider a partition of the search space into sets $A_1, \ldots, A_m$ where for all $1 \le i \le m-1$ all search points in $A_i$ are strictly worse than all search points in $A_{i+1}$, and $A_m$ contains all global optima. If each set $A_i$ contains only a single fitness value then the partition is called a canonic partition.
If $s_i$ is a lower bound on the probability of creating a search point in $A_{i+1} \cup \cdots \cup A_m$, provided the current best search point is in $A_i$, then the expected optimization time is bounded from above by

$\sum_{i=1}^{m-1} \Pr(A_i) \cdot \sum_{j=i}^{m-1} \frac{1}{s_j}$,

where $\Pr(A_i)$ abbreviates the probability that the best search point after initialization is in $A_i$. The reason for this bound is that the expected time until $A_i$ is left towards a higher fitness-level set is at most $1/s_i$, and each fitness level, starting from the initial one, has to be left at most once. Note that we can always simplify the above bound by pessimistically assuming that the population is initialized in $A_1$. This removes the term "$\sum_{i=1}^{m-1} \Pr(A_i) \cdot$" and only leaves $\sum_{j=1}^{m-1} 1/s_j$. This way of simplifying upper bounds can be used for all results presented hereinafter.

The fitness-level method yields good upper bounds in many cases. This includes situations where an evolutionary algorithm typically moves through increasing fitness levels without skipping too many levels [16]. It only gives crude upper bounds in case the values $s_i$ are dominated by search points from which the probability of leaving $A_i$ is much lower than for other search points in $A_i$, or if there are levels with difficult local optima (i.e., large values $1/s_i$) that are only reached with a small probability.

Using the expectation bounds from Section 4 we now show the following result. The main implication is that for both schemes, A and B, in the upper bound for the expected parallel time the expected sequential waiting time is replaced by its logarithm. In addition, compared to the fully serialized algorithm, the expected sequential time does not increase asymptotically, with respect to the upper bound gained by $f$-based partitions.
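As a concrete numerical illustration (our sketch; the estimate $s_i = (n-i)/(en)$ is the standard fitness-level success-probability bound for the (1+1) EA on OneMax, not a result stated in this section), one can compare the plain fitness-level sum $\sum_j 1/s_j$ with the corresponding sum of logarithms that appears in the parallel-time bounds:

```python
import math

def sequential_bound(s):
    """Pessimistic fitness-level bound sum_j 1/s_j (start on the first level)."""
    return sum(1.0 / sj for sj in s)

def parallel_bound(s):
    """The corresponding logarithmic sum, sum_j log2(2/s_j); base 2 because
    the population doubles each unsuccessful generation."""
    return sum(math.log2(2.0 / sj) for sj in s)

# Hypothetical canonic levels for OneMax with the standard estimate
# s_i = (n - i) / (e * n) for leaving level i (i = 0, ..., n - 1).
n = 100
s = [(n - i) / (math.e * n) for i in range(n)]

seq_b = sequential_bound(s)  # Theta(n log n): roughly e * n * H_n
par_b = parallel_bound(s)    # much smaller: each waiting time replaced by its log
print(round(seq_b), round(par_b))
```

Each per-level waiting time $1/s_j$ is replaced by its logarithm, which is where the speed-up for hard-to-leave levels comes from.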
In the remainder of the paper we denote by $T^{\mathrm{par}}_x$ and $T^{\mathrm{seq}}_x$, $x \in \{A, B\}$, the parallel time and the sequential time for Schemes A and B, respectively.

Theorem 1. Given an $f$-based partition $A_1, \ldots, A_m$,

$E(T^{\mathrm{seq}}_A) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 2 \sum_{j=i}^{m-1} \frac{1}{s_j}$.

If the partition is canonic then also

$E(T^{\mathrm{par}}_A) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 2 \sum_{j=i}^{m-1} \log\left(\frac{2}{s_j}\right)$.

The reason for the constant 2 in $\log(2/s_j)$ is to ensure that the term does not become smaller than 1; with a constant 1 the value $s_j = 1$ would even lead to a summand $\log(1/s_j) = 0$.

Proof. We only need to prove asymptotic bounds on the conditional expectations when starting in $A_i$, with a common constant hidden in all $O$-terms. The law of total expectation then implies the claim. For Scheme A we apply Lemma 1 with $k = 0$. This yields that the expected sequential time for leaving the current fitness level $A_j$ towards $A_{j+1} \cup \cdots \cup A_m$ is at most $2/s_j$, and the expected parallel time is at most $\log(1/s_j) + 2 \le 2\log(2/s_j)$. The expected sequential time is hence bounded by $2\sum_{j=i}^{m-1} 1/s_j$ and the expected parallel time is at most $2\sum_{j=i}^{m-1} \log(2/s_j)$.

We prove a similar upper bound for Scheme B using arguments from the amortized analysis of algorithms [2, Chapter 17]. Amortized analysis is used to derive statements on the average running time of an operation or to estimate the total costs of a sequence of operations. It is especially useful if some operations may be far more costly than others and if expensive operations imply that many other operations will be cheap. The basic idea of the so-called accounting method is to let all operations pay for the costs of their execution. Operations are allowed to pay excess amounts of money into fictional accounts. Other operations can then tap this pool of money to pay for their costs.
As long as no account becomes overdrawn, the total cost of all operations is bounded by the total amount of money that has been paid or deposited.

Theorem 2. Given an $f$-based partition $A_1, \ldots, A_m$,

$E(T^{\mathrm{seq}}_B) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 3 \sum_{j=i}^{m-1} \frac{1}{s_j}$.

If the partition is canonic then also

$E(T^{\mathrm{par}}_B) \le \sum_{i=1}^{m-1} \Pr(A_i) \cdot 4 \sum_{j=i}^{m-1} \log\left(\frac{2}{s_j}\right)$.

Proof. We use the accounting method as follows to bound the expected sequential optimization time of B. Assume the algorithm is on level $j$ with a population size of $2^k$. If the current generation passes without leaving the current fitness level, we pay $2^k$ to cover the costs for the sequential time in this generation. In addition, we pay another $2^k$ into a fictional bank account. In case the generation is successful in leaving $A_j$ and the previous generation was unsuccessful, we just pay $2^k$ and do not make a deposit. In case the current generation is successful and the last unsuccessful generation was on fitness level $j$, we withdraw $2^k$ from the bank account to pay for the current generation. In other words, the current generation is for free. This way, if there is a sequence of successful generations after an unsuccessful one on level $j$, all but the first successful generation are for free.

Let us verify that the bank account cannot be overdrawn. The basic argument is that whenever the population size is decreased from, say, $2^{k+1}$ to $2^k$, then there must be a previous generation where the population size was increased from $2^k$ to $2^{k+1}$. It is easy to see that associating a decrease with the latest increase gives an injective mapping. In simpler terms, the latest generation that has increased the population size from $2^k$ to $2^{k+1}$ has already paid for the current decrease to $2^k$.
When in the upper bound for A fitness level $i$ takes sequential time $1 + 2 + \cdots + 2^k = 2^{k+1} - 1$, then for B the total costs paid are $2(1 + 2 + \cdots + 2^{k-1}) + 2^k$, as a successful generation does not make a deposit to the bank account. The total costs equal $2^{k+1} - 2 + 2^k \le 3/2 \cdot (2^{k+1} - 1)$. In consequence, the total costs for Scheme B are at most $3/2$ times the costs for A in A's upper bound. This proves the claimed upper bound for B.

By the very same argument an upper bound for the expected parallel time for B follows. Instead of paying $2^k$ and maybe making a deposit of $2^k$, we always pay 1 and always make a deposit of 1. When withdrawing money, we always withdraw 1. This proves that also $E(T^{\mathrm{par}}_B)$ is at most twice the corresponding upper bound for Scheme A.

The argument in the above proof can also be used for proving a general upper bound on the expected parallel optimization time for B. When paying costs 2 for each fitness level, this pays for the successful generation with a population size of, say, $2^k$ and for one future generation where the population size might have to be doubled to reach $2^k$ again. Imagine the sequence of population sizes over time and then delete all elements where the population size has decreased, including the associated generation where the population size was increased beforehand. In the remaining sequence the population size continually increases until, assuming a global optimum has not been found yet, after $n \log n$ generations a population size of at least $n^n$ is reached. In this case the probability of creating a global optimum by mutation is at least $1 - (1 - n^{-n})^{n^n} \approx 1 - 1/e$, as the probability of hitting any specific target point in one mutation is at least $n^{-n}$. The expected number of generations until this happens is clearly $O(1)$. We have thus shown the following.

Corollary 2.
F or every function with m funct ion values we have E ( T par B ) ≤ 2 m + n log n + O (1 ) . This b ound is asymptotically tig h t, for instance, for long path pro blems [4, 13]. So, the m -term is, in general, neces sary . When compar ing A and B with resp ect to the exp ected para llel time, we exp ect B to per form b etter if the fitness levels hav e a similar degr ee of difficulty . This implies that there is a certa in tar get lev el fo r the p opulation size. Note, how e ver, that such a tar get level do es no t exis t in case the s i -v alues a re dissim- ilar. In the case of similar s i -v alues A mig h t be forced to sp end time doubling the p opulation s ize for each fitness level un til the target level ha s b een reached. 12 This waiting time is reflected by the log (2 /s j )-terms in Theor em 1. The fol- lowing upper bo und on B shows that these log-ter ms can b e avoided to some extent. In the sp ecial yet r ather common situatio n that impro vemen ts b ecome harder with each fitness lev el, only the bigg est such log-ter m is needed. Theorem 3. Given a c anonic al f -b ase d p artition A 1 , . . . , A m , E ( T par B ) is b ounde d by m − 1 X i =1 Pr ( A i ) · 3( m − i − 1) + log  1 s i  + m − 1 X j = i +1  log  1 s j  − log  1 s j − 1  + ! . If addi tional ly s 1 ≥ s 2 ≥ · · · ≥ s m − 1 then t h e b oun d simplifies to m − 1 X i =1 Pr ( A i ) ·  3( m − i − 1) + log  1 s m − 1  . Pr o of. T he second claim immedia t ely fo llo ws from the fir st one a s the lo g-terms form a telescoping sum. F or the fir st b ound we ag ain use arg um ents from amo rtized analysis. By Lemma 1 if the current p opulation size is 2 k then the expected num b er of genera- tions un til an improv ement fr om level i happ ens is at most (log(1 / s i ) − k ) + + 2. This is a b ound o f 2 if k ≥ log (1 /s i ). W e p erform a so -called aggr e gate analysis to estimate the total c o st o n all fitness levels. Thes e co sts are attributed to different source s . 
Summing up the costs for all sources will yield a bound on the total costs and hence on T^par_B. In the first generation the fitness level i* the algorithm starts on pays log(1/s_{i*}) to the global bank account. Afterwards costs are assigned as follows. Consider a generation on fitness level i with a population size of 2^k.

• If the current generation is successful, we charge cost 2 to the fitness level; cost 1 pays for the effort in the generation and cost 1 is deposited on the bank account. In addition, each fitness level j that is skipped or reached during this improvement pays (log(1/s_j) − log(1/s_{j−1}))^+ as a deposit on the bank account. Note that this amount is non-negative and it may be fractional.

• If k ≥ log(1/s_i) and the current generation is unsuccessful, we charge cost 1 to the fitness level.

• If k < log(1/s_i) and the current generation is unsuccessful, we withdraw cost 1 from our bank account.

By Lemma 1 the expected cost charged to a fitness level in unsuccessful generations (i.e., not counting the last, successful generation) is at most 1. Assuming that the bank account is never overdrawn, the overall expected cost for fitness level j is at most 1 + 2 + (log(1/s_j) − log(1/s_{j−1}))^+. Adding the costs for the initial fitness level yields the claimed bound.

We use the so-called potential method to show that the bank account is never overdrawn. Our claim is that at any point of time there is enough money on the bank account to cover the costs of increasing the current population size to at least 2^{log(1/s_j)}, where j is the current fitness level. We construct a potential function indicating the excess money on the bank account and show that the potential is always non-negative. Let μ_t denote the population size in generation t and ℓ_t the (random) fitness level in generation t.
By b_t we denote the account balance of the bank account. We prove by induction that
\[ b_t \ge \left( \log\left(\frac{1}{s_{\ell_t}}\right) - \log(\mu_t) \right)^{\!+}. \]
As this bound is always non-negative, it implies that the account is never overdrawn. After the initial fitness level has made its deposit we have b_1 = log(1/s_{ℓ_1}) − 0. Assume by induction that the bound holds for b_t.

If generation t is unsuccessful and log(μ_t) ≥ log(1/s_{ℓ_t}), then the population size is doubled at no cost for the bank account. As by induction b_t ≥ 0, we have
\[ b_{t+1} = b_t \ge 0 = \left( \log\left(\frac{1}{s_{\ell_t}}\right) - \log(\mu_{t+1}) \right)^{\!+}. \]
If generation t is unsuccessful and log(μ_t) < log(1/s_{ℓ_t}), then the algorithm doubles its population size and withdraws 1 from the bank account. As b_t is positive and log(μ_{t+1}) = log(μ_t) + 1, we have
\[ b_{t+1} = b_t - 1 = \log\left(\frac{1}{s_{\ell_t}}\right) - \log(\mu_t) - 1 = \log\left(\frac{1}{s_{\ell_t}}\right) - \log(\mu_{t+1}). \]
If generation t is successful and the current fitness level increases from i to j > i, the account balance is increased by
\[ 1 + \sum_{a=i+1}^{j} \left( \log\left(\frac{1}{s_a}\right) - \log\left(\frac{1}{s_{a-1}}\right) \right)^{\!+} \ge 1 + \left( \log\left(\frac{1}{s_j}\right) - \log\left(\frac{1}{s_i}\right) \right)^{\!+}. \]
This implies
\[ b_{t+1} \ge b_t + 1 + \left( \log\left(\frac{1}{s_j}\right) - \log\left(\frac{1}{s_i}\right) \right)^{\!+} \ge \left( \log\left(\frac{1}{s_i}\right) - \log(\mu_t) \right)^{\!+} + \left( \log\left(\frac{1}{s_j}\right) - \log\left(\frac{1}{s_i}\right) \right)^{\!+} + 1 \]
\[ \ge \left( \log\left(\frac{1}{s_j}\right) - \log(\mu_t) \right)^{\!+} + 1 \ge \left( \log\left(\frac{1}{s_j}\right) - \log(\mu_{t+1}) \right)^{\!+}, \]
where the last inequality uses that in a successful generation the population size is at most halved, i.e., log(μ_{t+1}) ≥ log(μ_t) − 1.

The upper bounds in this section can easily be adapted towards parallel EAs that do not perform migration and population size adaptation in every generation, but only every τ generations, for a migration interval τ ∈ N. Instead of considering the probability of leaving a fitness level in one generation, we simply consider the probability of leaving a fitness level within τ generations. This is done by considering s'_i := 1 − (1 − s_i)^τ instead of s_i. The resulting time bounds, based on s'_1, ..., s'_{m−1}, are then with respect to the number of periods of τ generations.
To get bounds on our original measures of time, we just multiply all bounds by a factor of τ.

6 Lower Bounds

In order to prove lower bounds for the expected sequential time we make use of recent results by Sudholt [16]. He presented a new lower-bound method based on fitness-level arguments: if it is unlikely that many fitness levels are skipped when leaving the current fitness-level set, then good lower bounds can be shown.

The lower bound applies to every algorithm A in pseudo-Boolean optimization that only uses standard mutations (i.e., flipping each bit independently with probability 1/n) to create new offspring. Such an EA is called a mutation-based EA. More precisely, every mutation-based EA A works as follows. First, A creates μ search points x_1, ..., x_μ uniformly at random. Then it repeats the following loop. A counter t counts the number of function evaluations; after initialization we have t = μ. In one iteration of the loop the algorithm first selects one out of all search points x_1, ..., x_t that have been created so far. This decision is based on the fitness values f(x_1), ..., f(x_t) and, possibly, also the time index t. It then performs a standard mutation of this search point, creating an offspring x_{t+1}.

To make this work self-contained, we cite (a slightly simplified version of) the result here. The performance measure considered is the number of function evaluations, which one can assume to coincide with the number of mutations.

Theorem 4 ([16]). Consider a partition of the search space into non-empty sets A_1, ..., A_m such that only A_m contains global optima. For a mutation-based EA A we say that A is in A_i or on level i if the best individual created so far is in A_i. Let the probability of traversing from level i to level j in one mutation be at most u_i · γ_{i,j}, where Σ_{j=i+1}^m γ_{i,j} = 1. Assume that for all j > i and some 0 < χ ≤ 1 it holds that γ_{i,j} ≥ χ Σ_{k=j}^m γ_{i,k}. Then the expected number of function evaluations of A on f is at least
\[ \sum_{i=1}^{m-1} \Pr(A_i) \cdot \chi \sum_{j=i}^{m-1} \frac{1}{u_j}. \]

All population update schemes are compatible with this framework; every parallel mutation-based EA using an arbitrary population update scheme is still a mutation-based EA. Offspring creations are performed in parallel in our algorithms, but one can imagine these operations to be performed sequentially. Since the selection can be based on the time index t, it is easy to exclude that offspring created in the current generation are used as parents ahead of time. By storing knowledge on the times when each island has been active and also recording migrations, this information can also be used to mimic the population management mechanism and to ensure that only search points from the currently active island are chosen as parents. There is one caveat: the parent selection mechanism in [16] does not account for possibly randomized decisions made during migration. However, the proof of Theorem 4 goes through in case additional knowledge is used.

Definition 1. Call an f-based partition A_1, ..., A_m (asymptotically) tight for an algorithm A if there exist constants c ≥ 1 > χ > 0 and values γ_{i,j} such that for each population in A_i the following holds.

1. The probability of generating a population in A_{i+1} ∪ ··· ∪ A_m in one mutation is at least s_i.

2. The probability of generating a population in A_j in one mutation, j > i, is at most c · s_i · γ_{i,j}.

3. For the γ_{i,j}-values it holds that Σ_{j=i+1}^m γ_{i,j} = 1 and γ_{i,j} ≥ χ Σ_{k=j}^m γ_{i,k} for all i < j.
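For a concrete partition, the three conditions of Definition 1 are mechanical to verify. The sketch below is our own illustration (the helper name and the floating-point tolerances are ours, not from the paper); it checks the conditions for given values s_i, γ_{i,j} and actual jump probabilities p_{i,j}, here for a free-rider-style jump distribution γ_{i,j} = 2^{-(j-i)}, for which χ = 1/2 works.

```python
import math

def is_tight_partition(s, gamma, p, c, chi):
    """Check the conditions of Definition 1 for levels 1..m-1.
    s[i]: lower bound on the probability of leaving level i (condition 1).
    gamma[i][j]: values summing to 1 over j > i (condition 3).
    p[i][j]: probability of jumping from level i into A_j (condition 2)."""
    m = len(s) + 1
    for i in range(1, m):
        js = range(i + 1, m + 1)
        if not math.isclose(sum(gamma[i][j] for j in js), 1.0):
            return False
        if sum(p[i][j] for j in js) < s[i] - 1e-12:            # condition 1
            return False
        for j in js:
            if p[i][j] > c * s[i] * gamma[i][j] + 1e-12:       # condition 2
                return False
            tail = sum(gamma[i][k] for k in range(j, m + 1))
            if gamma[i][j] < chi * tail - 1e-12:               # condition 3
                return False
    return True

# Example: geometric jump distribution gamma[i][j] = 2^-(j-i), with the
# remaining mass on level m; the tail condition holds with chi = 1/2.
m, n = 6, 20
s = {i: 1 / (math.e * n) for i in range(1, m)}
gamma = {i: {j: 2.0 ** -(j - i) for j in range(i + 1, m)} for i in range(1, m)}
for i in range(1, m):
    gamma[i][m] = 1 - sum(gamma[i].values())
p = {i: {j: s[i] * gamma[i][j] for j in range(i + 1, m + 1)} for i in range(1, m)}
assert is_tight_partition(s, gamma, p, c=1.0, chi=0.5)
```

Condition 3 caps how much probability mass may sit on large jumps; geometric distributions like the one above satisfy it with constant χ.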
Tight f-based partitions imply that the standard upper bound by f-based partitions [18] is asymptotically tight. This holds for all elitist mutation-based algorithms, that is, mutation-based algorithms where the best fitness value in the population can never decrease.

Theorem 5. Consider an algorithm A with an arbitrary population update strategy that only uses standard mutations for creating new offspring. Given a tight f-based partition A_1, ..., A_m for a function f, we have
\[ E(T^{\mathrm{seq}}) = \Omega\left( \sum_{i=1}^{m-1} \Pr(A_i) \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \right). \]

Proof. The lower bound on E(T^seq) follows by a direct application of Theorem 4. We already discussed that this theorem applies to all algorithms considered in this work. Setting u_j := c · s_j for all 1 ≤ j ≤ m, with c and χ as in Definition 1, Theorem 4 implies
\[ E(T^{\mathrm{seq}}) \ge \sum_{i=1}^{m-1} \Pr(A_i) \cdot \frac{\chi}{c} \sum_{j=i}^{m-1} \frac{1}{s_j}. \]
As both χ and c are constants, this implies the claim.

This lower bound shows that for tight f-based partitions both our population update schemes produce asymptotically optimal results in terms of the expected sequential optimization time.

7 Non-oblivious Update Schemes

We also briefly discuss update schemes that are tailored towards particular functions, in order to judge the performance of our oblivious update schemes. Non-oblivious population update schemes may allow for smaller upper bounds on the expected parallel time than the ones seen so far. When the population update scheme has complete knowledge of the function f and the f-based partition, an upper bound can be shown where each fitness level contributes only a constant to the expected parallel time. By T^seq_no and T^par_no we denote the sequential and parallel times of the considered non-oblivious scheme.

Theorem 6. Given an arbitrary f-based partition A_1, ..., A_m, there is a tailored population update scheme for which
\[ E(T^{\mathrm{seq}}_{\mathrm{no}}) = O\left( \sum_{i=1}^{m-1} \Pr(A_i) \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \right) \quad\text{and}\quad E(T^{\mathrm{par}}_{\mathrm{no}}) = O\left( \sum_{i=1}^{m-1} \Pr(A_i) \cdot (m-i-1) \right). \]
In particular, E(T^par_no) = O(m).

Proof. The update scheme chooses to use ⌈1/s_i⌉ islands if the algorithm is in A_i. Then the probability of finding an improvement in one generation is at least 1 − (1 − s_i)^{⌈1/s_i⌉} ≥ 1 − 1/e. The expected parallel time until this happens is at most e/(e − 1), and so the expected sequential time is at most e/(e − 1) · ⌈1/s_i⌉ ≤ 2e/(e − 1) · 1/s_i. Summing up these expectations for all fitness levels from i to m − 1 proves the two bounds.

In some situations it is possible to design schemes that perform even better than the above bound suggests. For instance, for trap functions the best strategy would be to use a very large population in the first generation, so that the optimum is found with high probability before the algorithm is tricked into increasing the distance to the global optimum.

8 Bounds for Example Functions

The previous bounds all applied in a very general context, with arbitrary fitness functions. We also give results for selected example functions to estimate possible speed-ups in more concrete settings.

We consider the set of example functions and function classes that has already been investigated in [11]. The goal is the maximization of a pseudo-Boolean function f: {0,1}^n → R. For a search point x ∈ {0,1}^n write x = x_1 ... x_n; then OneMax(x) := Σ_{i=1}^n x_i counts the number of ones in x and LO(x) := Σ_{i=1}^n Π_{j=1}^i x_j counts the number of leading ones in x. A function is called unimodal if every non-optimal search point has a Hamming neighbor (i.e., a point with Hamming distance 1 to it) with strictly larger fitness.
For 1 ≤ k ≤ n we also consider
\[ \mathrm{Jump}_k(x) := \begin{cases} k + \sum_{i=1}^{n} x_i & \text{if } \sum_{i=1}^{n} x_i \le n-k \text{ or } x = 1^n, \\ \sum_{i=1}^{n} (1 - x_i) & \text{otherwise.} \end{cases} \]
This function has been introduced by Droste, Jansen, and Wegener [4] as a function with tunable difficulty, as evolutionary algorithms typically have to perform a jump to overcome a gap by flipping k specific bits.

For these functions we obtain bounds for T^seq and T^par as summarized in Table 1. The lower bounds for E(T^seq) on OneMax and LO follow directly from [16] for all schemes.

Function                      Scheme          E(T^seq)      E(T^par)
--------------------------------------------------------------------
OneMax                        A               Θ(n log n)    O(n log n)
                              B               Θ(n log n)    O(n)
                              non-oblivious   Θ(n log n)    O(n)
LO                            A               Θ(n^2)        Θ(n log n)
                              B               Θ(n^2)        O(n)
                              non-oblivious   Θ(n^2)        O(n)
unimodal f with d f-values    A               O(dn)         O(d log n)
                              B               O(dn)         O(d + log n)
                              non-oblivious   O(dn)         O(d)
Jump_k with k ≥ 2             A               O(n^k)        O(n log n)
                              B               O(n^k)        O(n + k log n)
                              non-oblivious   O(n^k)        O(n)

Table 1: Asymptotic bounds on expected parallel running times E(T^par) and expected sequential running times E(T^seq) for the parallel (1+1) EA and the (1+λ) EA with adaptive population models.

Theorem 7. For the parallel (1+1) EA and the (1+λ) EA with adaptive population models the upper bounds for E(T^seq) and E(T^par) hold as given in Table 1.

Proof. The upper bounds for Scheme A follow from Theorem 1, for Scheme B from Theorems 2 and 3, and for the non-oblivious scheme from Theorem 6. Starting pessimistically from the first fitness level, the following bounds hold.

• For OneMax we use the canonical f-based partition A_i := {x | OneMax(x) = i} and the corresponding success probabilities s_i ≥ (n−i)/n · (1 − 1/n)^{n−1} ≥ (n−i)/(en). Hence E(T^par_A) ≤ 2 Σ_{i=1}^{n−1} log(2en/(n−i)) ≤ 2n log(2en) = O(n log n),
\[ E(T^{\mathrm{seq}}_A) \le 2\sum_{i=0}^{n-1} \frac{1}{s_i} \le 2\sum_{i=0}^{n-1} \frac{en}{n-i} = 2en \sum_{i=1}^{n} \frac{1}{i} \le 2en \cdot [(\ln n) + 1], \]
E(T^par_B) ≤ 3(n−2) + log(2en) = O(n) and E(T^seq_B) ≤ 3en · [(ln n) + 1], E(T^par_no) = O(n) and E(T^seq_no) = O(n log n).

• For LO we use the canonical f-based partition A_i := {x | LO(x) = i} and the corresponding success probabilities s_i ≥ 1/n · (1 − 1/n)^{n−1} ≥ 1/(en). Hence E(T^par_A) ≤ 2 Σ_{i=0}^{n−1} log(2en) = 2n log(2en) = O(n log n),
\[ E(T^{\mathrm{seq}}_A) \le 2\sum_{i=0}^{n-1} \frac{1}{s_i} \le 2\sum_{i=0}^{n-1} en = 2en^2, \]
E(T^par_B) ≤ 3(n−2) + log(en) = O(n), E(T^seq_B) ≤ 3en^2, E(T^par_no) = O(n) and E(T^seq_no) = O(n^2).

• For unimodal functions with d function values, w.l.o.g. {1, ..., d}, we use corresponding success probabilities s_i ≥ 1/(en). Hence E(T^par_A) ≤ 2 Σ_{i=1}^{d−1} log(2en) ≤ 2d log(2en) = O(d log n),
\[ E(T^{\mathrm{seq}}_A) \le 2\sum_{i=1}^{d-1} \frac{1}{s_i} \le 2\sum_{i=1}^{d-1} en = 2edn, \]
E(T^par_B) ≤ 3(d−2) + log(en) = O(d + log n), E(T^seq_B) ≤ 3edn, E(T^par_no) = O(d) and E(T^seq_no) = O(dn).

• For Jump_k functions with k ≥ 2 and all individuals having neither n − k nor n one-bits, an improvement is found by either increasing or decreasing the number of one-bits; this corresponds to optimizing OneMax. In order to improve a solution with n − k one-bits, a specific bit string with Hamming distance k has to be created, which has probability s_{n−k} at least
\[ \left(\frac{1}{n}\right)^{k} \cdot \left(1 - \frac{1}{n}\right)^{n-k} \ge \left(\frac{1}{n}\right)^{k} \cdot \left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{1}{en^k}. \]
Hence E(T^par_A) ≤ O(n log n) + 2 log(en^k) ≤ O(n log n) + 2k log(en) = O(n log n), E(T^seq_A) ≤ O(n^k), E(T^par_B) ≤ O(n) + k log(en) = O(n + k log n), E(T^seq_B) ≤ O(n^k), E(T^par_no) = O(n) and E(T^seq_no) = O(n^k).
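The bounds above can be illustrated by running an actual (1+λ) EA with Scheme B. The following sketch is a minimal illustration with our own parameter choices (not the paper's experimental setup): λ is doubled after a generation without improvement and halved after an improving generation, and the algorithm finds the OneMax optimum.

```python
import random

def one_max(x):
    return sum(x)

def adaptive_one_plus_lambda(f, n, max_evals=200_000, seed=1):
    """(1+lambda) EA with population update Scheme B: lambda is doubled
    after a generation without improvement and halved (down to 1) after
    an improving generation. Standard bit mutation with rate 1/n.
    Returns (best fitness, generations, evaluations)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, lam, gens, evals = f(x), 1, 0, 1
    while fx < n and evals < max_evals:
        gens += 1
        best_child, best_val = None, fx
        for _ in range(lam):  # lambda offspring, created "in parallel"
            y = [b ^ (rng.random() < 1 / n) for b in x]
            fy = f(y)
            evals += 1
            if fy > best_val:
                best_child, best_val = y, fy
        if best_child is not None:           # success: halve lambda
            x, fx, lam = best_child, best_val, max(1, lam // 2)
        else:                                # failure: double lambda
            lam *= 2
    return fx, gens, evals

fx, gens, evals = adaptive_one_plus_lambda(one_max, 30)
assert fx == 30        # global optimum found
assert gens <= evals   # parallel time never exceeds sequential time
```

Counting generations gives the parallel time and counting evaluations the sequential time, the two quantities compared in Table 1.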
It can be seen from Table 1 that both our schemes lead to significant speed-ups in the considered settings. The speed-ups increase with the difficulty of the function. This becomes obvious when comparing the results on OneMax and LO, and it is even more visible for Jump_k.

The upper bounds for E(T^par_B) are always asymptotically lower than those for E(T^par_A), except for Jump_k with k = Θ(n). However, without corresponding lower bounds we cannot say whether this is due to differences in the real running times or whether we simply proved tighter guarantees for B. We therefore consider the function LO in more detail and prove a lower bound for A. This demonstrates that Scheme B can be asymptotically better than Scheme A on a concrete problem.

Theorem 8. For the parallel (1+1) EA and the (1+λ) EA with adaptive population models on LO we have E(T^par_A) = Ω(n log n).

Proof. We consider a pessimistic setting (pessimistic for proving a lower bound) where an improvement has probability exactly 1/n. This ignores that all leading ones have to be conserved in order to increase the best LO-value. We show that with probability Ω(1) at least n/30 improvements are needed in this setting. As by Lemma 1 the expected waiting time for an improvement is at least max{0, (log n) − 3}, the conditional expected parallel time is Ω(n log n). By the law of total expectation, also the unconditional expected parallel time is then Ω(n log n).

Let us bound the expected increase in the number of leading ones on one fitness level. Let T^par_i denote the random number of generations until the best fitness increases when the algorithm is on fitness level i. By the law of total expectation, the expected increase in the best fitness in this generation equals
\[ \sum_{t=1}^{\infty} \Pr(T^{\mathrm{par}}_i = t) \cdot E(\text{LO-increase} \mid T^{\mathrm{par}}_i = t). \tag{1} \]

The expected increase in the number of leading ones can be estimated as follows. With T^par_i = t, the number of mutations in the successful generation is 2^{t−1}. Let I denote the number of mutations that increase the current best LO-value. A well-known property of LO is that when the current best fitness is i, the bits at positions i + 2, ..., n are uniform. Bits that form part of the leading ones after an improvement are called free riders. The probability of having at least k free riders is thus at most 2^{−k} (unless the end of the bit string is reached) and the expected number of free riders is at most Σ_{k=1}^∞ 2^{−k} = 1.

The uniformity of "random" bits at positions i + 2, ..., n holds after any specific number of mutations, and in particular after the mutations in generation T^par_i have been performed. However, when looking at multiple improvements, the free-rider events are not necessarily independent as the "random" bits are very likely to be correlated. The following reasoning avoids these possible dependencies. We consider the improvements in generation T^par_i one by one. If F_1 denotes the random number of free riders gained in the first improvement, then when considering the second improvement the bits at positions i + 3 + F_1, ..., n are still uniform. In some sense, we give away the free riders from a fitness improvement for free for all following improvements. This leads to an estimation of 1 + F_1 for the gain in the number of leading ones.

Iterating this argument, the expected total number of leading ones gained is thus bounded by 2I, the expectation being taken over the randomness of the free riders. Also considering the expectation for the random number of improvements yields the bound 2E(I | I ≥ 1), as I has been defined with respect to the last (i.e., successful) generation. We also observe E(I | I ≥ 1) ≤ 1 + E(I) ≤ 1 + 2^t/n. Plugging this into Equation (1) yields
\[ \sum_{t=1}^{\infty} \Pr(T^{\mathrm{par}}_i = t) \cdot \left(2 + \frac{2^{t+1}}{n}\right) = 2 + 2\sum_{t=0}^{\infty} \Pr(T^{\mathrm{par}}_i = t+1) \cdot \frac{2^{t+1}}{n} \le 2 + 2\sum_{t=0}^{\infty} \Pr(T^{\mathrm{par}}_i > t) \cdot \frac{2^{t+1}}{n} \]
\[ \le 2 + 2\sum_{t=0}^{\lceil \log n \rceil} \frac{2^{t+1}}{n} + 2\sum_{t=\lceil \log n \rceil + 1}^{\infty} \Pr(T^{\mathrm{par}}_i > t) \cdot \frac{2^{t+1}}{n}. \]
The first sum is at most 16. Using Lemma 1 to estimate the second sum, we arrive at the upper bound
\[ 18 + 2\sum_{\alpha=0}^{\infty} \Pr\left(T^{\mathrm{par}}_i > \lceil \log n \rceil + \alpha + 1\right) \cdot \frac{2^{\lceil \log n \rceil + \alpha + 2}}{n} \le 18 + 2\sum_{\alpha=0}^{\infty} \exp(-2^{\alpha}) \cdot \frac{2^{\lceil \log n \rceil + \alpha + 2}}{n} \le 18 + 16\sum_{\alpha=0}^{\infty} \exp(-2^{\alpha}) \cdot 2^{\alpha} < 29.8. \]

With probability 1/2 the algorithm starts with no leading ones, independently of all following events. The expected number of leading ones after n/30 improvements is at most 29.8/30 · n. By Markov's inequality the probability of having created n leading ones is thus at most 29.8/30, and so with probability 1/2 · 0.2/30 = Ω(1), having n/30 improvements is not enough to find a global optimum.

9 Generalizations & Extensions

We finally discuss generalizations and extensions of our results. One interesting question is how our results change if the population is not doubled or halved, but instead multiplied or divided by some other value b > 1. We believe that the results would change as follows. With some potential adjustments to constant factors, the log-terms in the parallel optimization times in Theorems 1, 2 and 3 would have to be replaced by log_b-terms. For the sequential optimization times stated in these theorems one would need to multiply the bounds by b/2. This means that a larger b would further decrease the parallel optimization time at the expense of a larger sequential optimization time.

Our analyses can also be transferred towards the adaptive scheme presented by Jansen, De Jong, and Wegener [9].
Recall that in their scheme the population size is divided by the number of successes; in case of one success the population size remains unchanged. This only affects the constant factors in our upper bounds. When the number of successes is large, the population size might decrease quickly. In most cases, however, the number of successes will be rather small; for instance, the lower bound for LO, Theorem 8, has shown that the expected number of successes in a successful generation is constant. However, it might be possible that after a difficult fitness level an easier fitness level is reached, and then the number of successes might be much higher. In an extreme case their scheme can decrease the population size like Scheme A. In some sense, their scheme is somewhat "in between" A and B. With a slight adaptation of the constants, the upper bound for Scheme A from Theorem 1 can be transferred to their scheme.

Another extension of the results above is towards maximum population sizes. Although we have argued in Section 4 that the population size does not blow up too much, in practice the maximum number of processors might be limited. The following theorem about E(T^par_A) for maximum population sizes can be proven by applying arguments from [11].

Theorem 9. The expected parallel optimization time of Scheme A for a maximum population size μ_max is bounded by
\[ E(T^{\mathrm{par}}_A) \le m \cdot [\log \mu_{\max} + 2] + \frac{2}{\mu_{\max}} \sum_{i=1}^{m-1} \frac{1}{s_i}. \]

Proof. We pessimistically estimate the expected parallel time by the time until the population consists of μ_max islands plus the expected optimization time if μ_max islands are available. The time until μ_max islands are involved is log μ_max on one fitness level. Hence, summing up over all levels pessimistically gives m log μ_max.

For μ_max islands the success probability on fitness level i, with success probability s_i for one island, is given by 1 − (1 − s_i)^{μ_max}. Hence the expected time for leaving fitness level i if μ_max islands are available is at most 1/[1 − (1 − s_i)^{μ_max}]. Now we consider two cases. If s_i · μ_max ≤ 1, we have
\[ 1 - (1 - s_i)^{\mu_{\max}} \ge 1 - (1 - s_i \mu_{\max}/2) = s_i \mu_{\max}/2, \]
because for all 0 ≤ xy ≤ 1 it holds that (1 − x)^y ≤ 1 − xy/2 [11, Lemma 1]. Otherwise, if s_i · μ_max > 1, we have
\[ 1 - (1 - s_i)^{\mu_{\max}} \ge 1 - e^{-s_i \mu_{\max}} \ge 1 - \frac{1}{e}. \]
Thus,
\[ \sum_{i=1}^{m-1} \frac{1}{1 - (1 - s_i)^{\mu_{\max}}} \le \sum_{i=1}^{m-1} \max\left\{ \frac{1}{1 - 1/e}, \frac{2}{\mu_{\max} \cdot s_i} \right\} \le m \cdot \frac{e}{e-1} + \frac{2}{\mu_{\max}} \sum_{i=1}^{m-1} \frac{1}{s_i}. \]
Adding the expected waiting times until μ_max islands are involved yields the claimed bound.

In terms of our test functions OneMax, LO, unimodal functions, and Jump_k, this leads to the following result, which can be proven like Theorem 7.

Corollary 3. For the parallel (1+1) EA and the (1+λ) EA with Scheme A the following holds for a maximum population size μ_max:

• E(T^par_A) = O(n log μ_max + n log n · log(μ_max)/μ_max) for OneMax, which gives O(n log log n) for μ_max = log n,

• E(T^par_A) = O(n log μ_max + n^2 log(μ_max)/μ_max) for LO, which gives O(n log n) for μ_max = n,

• E(T^par_A) = O(d log μ_max + dn log(μ_max)/μ_max) for unimodal functions with d function values, which gives O(d log n) for μ_max = n,

• E(T^par_A) = O(n log μ_max + n^k log(μ_max)/μ_max) for Jump_k, which gives O(nk log n) for μ_max = n^{k−1}.

Note that Corollary 3 has led to an improvement of E(T^par_A) from O(n log n) to O(n log log n) on OneMax for μ_max = log n. This obviously also holds in the setting of unrestricted population sizes.
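The trade-off in Corollary 3 can be seen numerically by evaluating the shape of the OneMax bound with its constants suppressed. The helper below is our own illustration (the corollary states O-bounds only, so the absolute values are meaningless; only the comparison between choices of μ_max is of interest).

```python
import math

def onemax_parallel_bound(n, mu_max):
    """Evaluate n*log(mu_max) + n*log(n)*log(mu_max)/mu_max, the shape of
    the OneMax bound in Corollary 3, with all constants suppressed."""
    return n * math.log2(mu_max) + n * math.log2(n) * math.log2(mu_max) / mu_max

n = 2 ** 20
# Capping the population at log n yields a smaller value than mu_max = n,
# matching the improvement from O(n log n) to O(n log log n).
assert onemax_parallel_bound(n, math.log2(n)) < onemax_parallel_bound(n, n)
```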
10 Conclusions

We have presented two schemes for adapting the offspring population size in evolutionary algorithms and, more generally, the number of islands in parallel evolutionary algorithms. Both schemes double the population size in each generation that does not yield an improvement. Despite the exponential growth, the expected sequential optimization time is asymptotically optimal for tight f-based partitions. In general, we obtain bounds that are asymptotically equal to upper bounds via the fitness-level method.

In terms of the parallel computation time, expected waiting times can be replaced by their logarithms for both schemes, compared to a serial EA. This yields a tremendous speed-up, in particular for functions where finding improvements is difficult. Scheme B, doubling or halving the population size in each generation, turned out to be more effective than resets to a single island as in Scheme A.

Apart from our main results, we have introduced the notion of tight f-based partitions and brought new arguments from the amortized analysis of algorithms to the theory of evolutionary algorithms.

An open question is how our schemes perform in case the fitness-level method does not provide good upper bounds. In this case our bounds may be off from the real expected running times. In particular, there may be examples where increasing the offspring population size by too much is detrimental. One constructed function where large offspring populations perform badly was presented in [9]. Future work could characterize function classes for which our schemes are efficient in comparison to the real expected running times. The notion of tight f-based partitions is a first step in this direction.

Acknowledgments

The authors would like to thank the German Academic Exchange Service for funding their research.
Part of this work was done while both authors were visiting the International Computer Science Institute in Berkeley, CA, USA.

References

[1] E. Alba. Parallel Metaheuristics: A New Class of Algorithms. Wiley, 2005.

[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 2nd edition, 2001.

[3] J. Costa, R. Tavares, and A. Rosa. Experimental study on dynamic random variation of population size. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1999.

[4] S. Droste, T. Jansen, and I. Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276:51–81, 2002.

[5] A. Eiben, E. Marchiori, and V. Valko. Evolutionary algorithms with on-the-fly population size adjustment. In Parallel Problem Solving from Nature (PPSN VIII), pages 41–50. Springer, 2004.

[6] N. Hansen, A. Gawelczyk, and A. Ostermeier. Sizing the population with respect to the local progress in (1,λ)-evolution strategies: a theoretical analysis. In 1995 IEEE International Conference on Evolutionary Computation, pages 80–85, 1995.

[7] G. Harik and F. Lobo. A parameter-less genetic algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 258–265, 1999.

[8] M. Herdy. The number of offspring as strategy parameter in hierarchically organized evolution strategies. ACM SIGBIO Newsletter, 13(2):9, 1993.

[9] T. Jansen, K. A. De Jong, and I. Wegener. On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13:413–440, 2005.

[10] J. Lässig and D. Sudholt. The benefit of migration in parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2010), pages 1105–1112, 2010.

[11] J. Lässig and D. Sudholt. General scheme for analyzing running times of parallel evolutionary algorithms. In 11th International Conference on Parallel Problem Solving from Nature (PPSN 2010), volume 6238 of LNCS, pages 234–243. Springer, 2010.

[12] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer, 1996.

[13] G. Rudolph. How mutation and selection solve long-path problems in polynomial expected time. Evolutionary Computation, 4(2):195–205, 1997.

[14] D. Schlierkamp-Voosen and H. Mühlenbein. Strategy adaptation by competing subpopulations. In Parallel Problem Solving from Nature (PPSN III), pages 199–208, 1994.

[15] H.-P. Schwefel. Numerical Optimization of Computer Models. John Wiley & Sons, New York, NY, USA, 1981.

[16] D. Sudholt. General lower bounds for the running time of evolutionary algorithms. In 11th International Conference on Parallel Problem Solving from Nature (PPSN 2010), volume 6238 of LNCS, pages 124–133. Springer, 2010.

[17] M. Tomassini. Spatially Structured Evolutionary Algorithms: Artificial Evolution in Space and Time. Springer, 2005.

[18] I. Wegener. Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In R. Sarker, X. Yao, and M. Mohammadian, editors, Evolutionary Optimization, pages 349–369. Kluwer, 2002.

Adaptive Population Models for Offspring Populations and Parallel Evolutionary Algorithms

Jörg Lässig, ICS, University of Lugano, 6906 Lugano, Switzerland
Dirk Sudholt, CERCIA, University of Birmingham, Birmingham B15 2TT, UK

June 20, 2018

Abstract

We present two adaptive schemes for dynamically choosing the number of parallel instances in parallel evolutionary algorithms. This includes the choice of the offspring population size in a (1+λ) EA as a special case.
Our schemes are parameterless and they work in a black-box setting where no knowledge on the problem is available. Both schemes double the number of instances in case a generation ends without finding an improvement. In a successful generation, the first scheme resets the system to one instance, while the second scheme halves the number of instances. Both schemes provide near-optimal speed-ups in terms of the parallel time. We give upper bounds for the asymptotic sequential time (i.e., the total number of function evaluations) that are not larger than upper bounds for a corresponding non-parallel algorithm derived by the fitness-level method.

1 Introduction

Parallelization is becoming a more and more important issue for solving difficult optimization problems [1]. Various implementations of parallel evolutionary algorithms (EAs) have been applied in the past decades [17]. An obvious way of using parallelization is to parallelize single operations of an EA, such as executing fitness evaluations on different processors. This particularly applies to EAs using large offspring populations. So-called island models use parallelization on a higher level. The idea is to parallelize evolution itself, by having subpopulations, called islands, which evolve in parallel. Good solutions are exchanged between the islands in a migration process.

One of the most important questions when dealing with parallel EAs is how to choose the number of processors in order to decrease the parallel optimization time, defined as the number of generations until an EA has found a global optimum. Assume a setting where we can choose the number of processors to be allocated, but we have to pay costs for each processor in each generation it is being used. This situation is common in cloud computing or in large grids where processors are shared with other users.
The total cost for all processors over time is called the sequential optimization time. The task is now to choose the number of processors such that the parallel optimization time is small, but at the same time the sequential time is reasonable. Allocating too many processors would waste computational effort and hence unnecessarily increase the sequential optimization time. Allocating too few processors implies a large parallel optimization time.

During the run of an EA, the "ideal" value for the number of processors is likely to change over time. One typical situation is that at the beginning of a run improvements are easy to obtain and only few processors are needed. The better the best fitness, the tougher it gets to find further improvements, and then more processors are required. It therefore makes sense to look at adaptive mechanisms that can adjust the number of processors being used during the run of the EA. This obviously only makes sense in a setting where allocating and deallocating processors on the fly is possible and the costs for these operations and for the communication between the processors are rather small. Hence we focus on balancing the parallel and sequential optimization times.

In the following we present adaptive schemes for choosing the number of processors that apply both to offspring populations as well as to island models of EAs. We accompany our schemes by a rigorous theoretical analysis of their running time. Both schemes double the number of processors if the current generation fails to produce an offspring that has larger fitness than the current best fitness value. Otherwise, if the generation yields an improvement, the number of processors is decreased again. The difference between the two schemes lies in the way the number of processors is decreased.
The first scheme, called Scheme A, simply resets the number of processors to 1; only the best individual or island survives. This is to avoid an overly large number of processors when moving from a situation where improvements are hard to find to a situation where improvements are easy. This happens, for instance, if the EA escapes from a local optimum and then jumps to the basin of attraction of a better local optimum. The second scheme, Scheme B, tries to maintain a fair number of processors over time; it also doubles the population size in unsuccessful generations and it halves the population size in successful generations. This strategy makes more sense in case the EA encounters similar probabilities for improvements over time. Both schemes are parameterless and oblivious with respect to the objective function. They can be applied in a black-box setting where no knowledge is available about the problem.

In terms of offspring populations we consider the (1+λ) EA that maintains a single best individual and in each iteration creates λ offspring. A best offspring replaces its parent if its fitness is not worse. The λ offspring creations and function evaluations can be parallelized on λ processors. Concerning island models, we assume that migration sends copies of each island's best individual to each other island in every generation. So, whenever one island finds an improvement of the current best individual in the system, this is immediately communicated to all other islands. The island model then behaves similarly to offspring populations, but it is more general as the islands can work with populations of size larger than 1.
To unify the notation for island models and offspring populations, we simply speak of the population size in the following; this means the number of islands in the island model and the offspring population size for the (1+λ) EA, respectively.

For EAs using either Scheme A or B we show that the expected parallel optimization time can be decreased drastically. In comparison to the well-known fitness-level method, in the parallel optimization time for every fitness value the expected waiting time for an improvement can be replaced by its logarithm. This can drastically reduce the parallel optimization time, in particular for problems where improvements are hard to find. The expected sequential time remains reasonable. We prove upper bounds on the expected sequential optimization time that are asymptotically no larger than upper bounds for a single instance obtained via the fitness-level method. For problems where the fitness-level method gives tight bounds, our results show that the two schemes automatically yield decreased expected parallel optimization times, without increasing the expected sequential time. The mentioned bounds are general in the sense that they apply to islands running arbitrary elitist algorithms. Example applications are given that apply simultaneously to the (1+λ) EA and to islands of population size 1. Various functions are considered: OneMax, LO, the class of unimodal functions, and Jump_k.

Comparing the different schemes, our results indicate that Scheme B is more efficient than A, from an asymptotic perspective, as it quickly reduces the number of processors, if necessary. This adaptation automatically leads to optimal or near-optimal parallel optimization times on all considered examples. On one example Scheme B outperforms Scheme A.
We also compare these results with tailored schemes that are allowed to use knowledge on the objective function.

Besides the main results this paper is also interesting because of the methods used. We introduce new techniques from the amortized analysis of algorithms, which represent natural and effective tools for analyzing adaptive mechanisms. These techniques may find further applications in the analysis of adaptive stochastic search algorithms.

The remainder of this work is structured as follows. In Section 2 we review previous work. Section 3 presents the algorithms and the considered population update schemes. In Section 4 we provide technical statements that will be used later on in our analyses and that may also help to understand the dynamics of the adaptive algorithms. Section 5 then presents general upper bounds for both schemes, while Section 6 deals with lower bounds on expected sequential times. Section 7 contains a brief discussion about tailored, that is, non-oblivious population update schemes. Our general theorems are applied to concrete example functions in Section 8. We finish with a discussion of possible extensions in Section 9 and conclusions in Section 10.

2 Previous Work

2.1 Adaptive Population Models

Considering adaptive numbers of islands in the island model of EAs, previous work is very limited. However, there are numerous results for adaptive population sizes in EAs. Eiben, Marchiori, and Valko [5] describe EAs with on-the-fly population size adjustment. They compared the performance of the different strategies in terms of success rate, speed, and solution quality, measured on a variety of fitness landscapes. The best EAs with adaptive population resizing outperformed traditional approaches when considering the time to result, which is the parallel optimization time.
Typical approaches are eliminating population size as an explicit parameter by introducing aging and maximum lifetime properties for individuals [12], the parameter-less GA (PLGA), which evolves a number of populations of different sizes simultaneously [7], random variation of the population size [3], and competition schemes [14].

Schwefel [15] first suggested the adaptation of the offspring population size during the optimization process. Herdy [8] proposed a mutative adaptation of λ in a two-level ES where, on the upper level, called the population level, λ is treated as a variable to be optimized, while on the lower level, called the individual level, the object parameters are optimized.

In [6], a deterministic adaptation scheme for λ is introduced, based on theoretical considerations on the relation between the serial rates of progress for the actual number of offspring λ, for λ − 1, and for the optimal number of offspring. More specifically, the local serial progress (i.e., progress per fitness function evaluation) is optimized in a (1, λ) EA with respect to the number of offspring λ. The authors prove the following structural property: the serial progress rate as a function of λ is either a function with exactly one (local and global) maximum or a strictly monotonically increasing function.

Jansen, De Jong, and Wegener [9] further elaborate on the offspring population size, presenting a thorough runtime analysis of its effects. They also suggest a simple way to dynamically adapt this parameter and present empirical results for this scheme, but no theoretical analysis of their scheme has been performed. The presented scheme doubles the offspring population size if the algorithm is unsuccessful in improving the currently best fitness value.
Otherwise, it divides the current offspring population size by s, where s is the number of offspring with better fitness than the best fitness value so far. We will discuss in Section 9 how our schemes relate to their scheme and to what extent our results can be transferred.

2.2 Theoretical Work on Parallel EAs

In [10], a first rigorous runtime analysis for island models has been performed by constructing a function where alternating phases of independent evolution and communication among the islands are essential. A simple island model with migration finds a global optimum in polynomial time, while panmictic populations as well as island models without migration need exponential time, with very high probability.

New methods for the running time analysis of parallel evolutionary algorithms with spatially structured populations have been presented in [11]. The authors generalized the well-known fitness-level method, also called the method of f-based partitions [18], from panmictic populations to spatially structured evolutionary algorithms with various migration topologies. These methods were applied to estimate the speed-up gained by parallelization in pseudo-Boolean optimization. It was shown that the possible speed-up for the parallel optimization time increases with the density of the topology. The expected sequential optimization time is asymptotically not larger than an upper bound for a corresponding non-parallel EA, derived via the fitness-level method. More precisely, the classical fitness-level method says that when s_i is a lower bound on the probability that one island leaves the current fitness level towards a better one, the expected time until this happens is at most 1/s_i for a panmictic population.
In a parallel EA with a unidirectional ring, the expected parallel time decreases to O((1/s_i)^{1/2}); in other words, the waiting time can be replaced by its square root. For a torus graph even the third root can be used, and with a proper choice of the number µ of islands, a speed-up of order µ is possible in some settings.

Interestingly, the results from [11] can partially be interpreted in terms of adaptive population sizes. The analyses are based on the number of individuals on the current best fitness level. In our upper bounds, we pessimistically assume that only islands on the current best fitness level have a reasonable chance of finding better fitness levels. All worse individuals are ignored when estimating the waiting time for an improvement of the best fitness level. If a unidirectional ring topology is used, migration happens in every generation, and better individuals are guaranteed to win in the selection step, then the number of individuals on the current best fitness level increases by 1 in each generation, as always a new island is taken over. (We pessimistically ignore the fact that islands on worse fitness levels can improve their best fitness.) If any island finds an improvement, it is pessimistically assumed that then only one island has made it to a new, better fitness level. This setting corresponds exactly to a parallel EA that in each unsuccessful generation acquires one new processor, and to an adaptive (1+λ) EA that increases λ by 1 in each unsuccessful generation. Once an improvement is found, the population size drops to 1, as in the case of our first scheme presented here. The upper bounds from [11] therefore directly transfer to additive population size adjustments. In the following we show that multiplicative adjustments of the population size may admit better speed-ups than additive approaches as suggested in [11].
3 Algorithms

In Sections 5 and 7 we present general upper bounds via the fitness-level method. These results are general in the following sense. If all islands in a parallel EA run elitist algorithms (i.e., algorithms where the best fitness in the population can never decrease) and if we have a lower bound on the probability of finding a better fitness level, then this can be turned into an upper bound for the expected sequential and parallel running times of the parallel EA. We present a scheme for algorithms where this argument applies. The goal is to maximize some fitness function f in an arbitrary search space. An adaptation towards minimization is trivial.

Algorithm 1 Elitist parallel EA with adaptive population
1: Let µ_1 := 1 and initialize a single island P^1_1 uniformly at random.
2: for t := 1 to ∞ do
3:   for all 1 ≤ i ≤ µ_t in parallel do
4:     Select parents and create offspring by variation.
5:     Send a copy of a fittest offspring to all other islands.
6:     Create P^i_{t+1} such that it contains a best individual from the union of P^i_t, the new offspring, and the incoming migrants.
7:   µ_{t+1} := updatePopulationSize(P^i_t, P^i_{t+1})
8:   if µ_{t+1} > µ_t then create µ_{t+1} − µ_t new islands by copying existing islands.
9:   if µ_{t+1} < µ_t then delete µ_t − µ_{t+1} islands.

The selection of islands to be copied or removed, respectively, is left unspecified. Note that each island migrates individuals to all other islands. This corresponds to a complete migration topology. Due to this fact, all islands always contain an offspring with the current best fitness. This observation is sufficient for the upcoming analyses. With other topologies this selection would be based on the fitness values of the current elitists on all islands. The (1+λ) EA can be regarded as a special case where we have λ islands and a single best individual takes over all λ islands.
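As an illustration of the setting just described, here is a minimal, runnable sketch of the parallel (1+1) EA (Algorithm 1 with islands of size 1) with the doubling-on-failure rule. This is our own example, not the paper's code: we fix OneMax as the fitness function, use standard bit mutation, and clamp Scheme B's halving at one island.

```python
import random

def parallel_one_plus_one_ea(n, scheme="A", seed=1):
    """Parallel (1+1) EA with an adaptive number of islands on OneMax.

    Unsuccessful generations double the number of islands mu; successful
    generations reset mu to 1 (Scheme A) or halve it (Scheme B, clamped
    at 1, which is our assumption). Returns (parallel time, sequential
    time) as numbers of generations and function evaluations."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]     # current best individual
    fx = sum(x)                                   # OneMax fitness
    mu = 1                                        # number of islands
    t_par = t_seq = 0
    while fx < n:                                 # all-ones string is optimal
        t_par += 1
        t_seq += mu
        best, fbest = x, fx
        for _ in range(mu):                       # mu islands in parallel
            # Standard bit mutation: flip each bit independently w.p. 1/n.
            y = [1 - b if rng.random() < 1.0 / n else b for b in x]
            if sum(y) >= fbest:                   # elitist selection
                best, fbest = y, sum(y)
        if fbest > fx:                            # improvement found
            mu = 1 if scheme == "A" else max(mu // 2, 1)
        else:                                     # stagnation: double mu
            mu *= 2
        x, fx = best, fbest
    return t_par, t_seq
```

Running both schemes on the same instance should make the trade-off visible: the parallel time shrinks while the sequential time stays comparable to that of a plain (1+1) EA.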
Setting λ := 1 yields the well-known (1+1) EA.

Algorithm 2 (1+λ) EA with adaptive population
1: Initialize a current search point x_1 uniformly at random.
2: for t := 1 to ∞ do
3:   Create λ offspring by mutation.
4:   Let x* be an offspring with maximal fitness.
5:   if f(x*) ≥ f(x_t) then x_{t+1} := x* else x_{t+1} := x_t.
6:   λ := updatePopulationSize({x_t}, {x_{t+1}})

Note that we have neither specified a search space nor variation operators. However, in Section 6 we will discuss lower bounds that only hold in pseudo-Boolean optimization and for EAs that only use standard mutation (i.e., flipping each of n bits independently with probability 1/n) for creating new offspring. In Section 8 we will consider concrete example functions where the (1+λ) EA with adaptive populations or, equivalently, an island model running (1+1) EAs with an adaptive number of islands are applied. The latter was called parallel (1+1) EA in [10, 11].

We now define the population update schemes considered in this work. The function updatePopulationSize takes the old and the new population as inputs and outputs a new population size. In order to help finding improvements that take a long time to be found, we double the population size in each unsuccessful generation. As we might not need that many islands after a success, we reset the population size to 1.

Algorithm 3 updatePopulationSize(P_t, P_{t+1}) (Scheme A)
1: if max{f(x) | x ∈ P_{t+1}} ≤ max{f(x) | x ∈ P_t} then
2:   return 2µ_t
3: else
4:   return 1

On problems where finding improvements takes a similar amount of time, it might not make sense to throw away all islands at once. Especially if improvements have similar probabilities over time, it makes sense to stay close to the current number of islands. Therefore, in the following scheme we halve the population size with every successful generation.
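Both update rules, Algorithm 3 above and the halving variant defined next as Algorithm 4, differ only in the successful case and can be written as one function. A sketch of ours: we pass the current size µ_t explicitly, and note that the literal ⌊µ_t/2⌋ of Scheme B can return 0 when µ_t = 1, which a practical implementation may want to clamp at 1.

```python
def update_population_size(P_old, P_new, mu, f, scheme):
    """Algorithms 3 (Scheme A) and 4 (Scheme B): double the population
    size after a stagnating generation; after an improvement, reset to
    1 (A) or halve (B)."""
    if max(f(x) for x in P_new) <= max(f(x) for x in P_old):
        return 2 * mu                        # unsuccessful: double
    return 1 if scheme == "A" else mu // 2   # successful: reset or halve
```

For example, with f = sum on bit strings, an improving generation under Scheme B halves the size: `update_population_size([[0, 1, 0]], [[1, 1, 0]], 8, sum, "B")` returns 4, while a stagnating generation doubles it to 16.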
We will see that this does not worsen the asymptotic performance compared to Scheme A. For some problems this scheme will turn out to be superior.

Algorithm 4 updatePopulationSize(P_t, P_{t+1}) (Scheme B)
1: if max{f(x) | x ∈ P_{t+1}} ≤ max{f(x) | x ∈ P_t} then
2:   return 2µ_t
3: else
4:   return ⌊µ_t/2⌋

The motivation for considering Scheme A is that we can assess the effect of gradually decreasing the population size when comparing it to Scheme B. It also serves as a first step towards analyzing Scheme B, where the analysis turns out to be more involved.

Our schemes for parallel EAs are applicable in large clusters where the cost of allocating new processors is low compared to the computational effort spent within the evolutionary algorithm. Many of our results can be easily adapted towards algorithms that do not use migration and population size updates in every generation, but only every τ generations, for a parameter τ ∈ N called the migration interval. This can significantly reduce the costs for allocating and deallocating new processors. Details can be found at the end of Section 5.

An algorithm using Scheme B can be implemented in a decentralized way as follows, where we assume that each island runs on a separate processor. Assume all processors are synchronized, i.e., they share a common timer. All processors have knowledge of the current best fitness level and they inform all other processors by sending messages in case they find a better fitness level. This message contains individuals that can be taken over by other processors so that all processors work on the current best fitness level. In the adaptive scheme, if after one generation no message has been received, i.e., no processor has found a better fitness level, each processor activates a new processor as follows. Each processor maintains a unique ID.
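The ID bookkeeping described in the following paragraph (the first processor starts with an empty bit string; doubling appends a 0-bit to the parent and gives the child the old ID plus a 1-bit; on a success, every processor whose ID ends in 1 shuts down and the rest drop their last bit) can be simulated in a few lines. This is our own sketch of the mechanism:

```python
def double_ids(ids):
    # Each processor appends a 0-bit to its ID and spawns a new
    # processor whose ID is the old ID with a 1-bit appended.
    return [pid + "0" for pid in ids] + [pid + "1" for pid in ids]

def halve_ids(ids):
    # After a success: processors whose ID ends in 1 shut down,
    # the remaining ones drop their last bit.
    return [pid[:-1] for pid in ids if pid.endswith("0")]

ids = [""]                       # one processor, empty bit string
for _ in range(3):
    ids = double_ids(ids)        # 2, then 4, then 8 processors
assert len(set(ids)) == len(ids) == 8   # IDs are pairwise distinct
ids = halve_ids(ids)
assert len(set(ids)) == len(ids) == 4   # still pairwise distinct
```

The distinctness assertions mirror the point that no central control is needed to acquire and shut down processors.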
The first processor has an ID that simply consists of an empty bit string. Each time a processor activates a new processor, it copies its current population and its current ID to the new processor. Then it appends a 0-bit to its ID while the new processor appends a 1-bit to its ID. At the end, all processors have enlarged their IDs by a single bit. When an improvement has been found, all processors first take over the genetic material in the messages that are passed. Then all processors whose ID ends with a 1-bit shut down. All other processors remove the last bit from their IDs. It is easy to see that with this mechanism all processors will always have pairwise distinct IDs and no central control is needed to acquire and shut down processors.

As mentioned in the introduction, we define the parallel optimization time T^par as the number of generations until the first global optimum is evaluated. The sequential optimization time T^seq is defined as the number of function evaluations until the first global optimum is evaluated. The number of function evaluations is a common performance measure and it captures the total effort on all processors. Note that this includes all function evaluations in the generation of the algorithm in which the improvement is found. These definitions are consistent with the measures suggested in the literature [9]. In both measures we allow ourselves to neglect the cost of the initialization as this only adds a fixed term to the running times.

4 Tail Bounds and Expectations

In preparation for upcoming running time analyses we first prove tail bounds for the parallel optimization times in a setting where we are waiting for a specific event to happen. This, along with bounds on the expected parallel and sequential waiting times, will be useful to prove our main theorems.
The tail bounds also indicate that the population will not grow too large. In the remainder of this paper we abbreviate max{x, 0} by (x)^+.

Lemma 1. Assume starting with 2^k islands for some k ∈ N_0 and doubling the number of islands in each generation. Let T^par(k, p) denote the random parallel time until the first island encounters an event that occurs independently on each island and in each generation with probability p. Let T^seq(k, p) be the corresponding sequential time. Then for every α ∈ N_0

1. Pr[T^par(k, p) > (⌈log(1/p)⌉ − k)^+ + α + 1] ≤ exp(−2^α),
2. Pr[T^par(k, p) ≤ log(1/p) − k − α] ≤ 2 · 2^{−α},
3. log(1/p) − k − 3 < E(T^par(k, p)) < (log(1/p) − k)^+ + 2,
4. max{1/p, 2^k} ≤ E(T^seq(k, p)) ≤ 2/p + 2^k − 1.

Each inequality remains valid if p is replaced by a pessimistic estimation of p (i.e., either an upper bound or a lower bound).

Proof. The condition T^par(k, p) > (⌈log(1/p)⌉ − k)^+ + α + 1 requires that the event does not happen on any island in this time period. The number of trials in the last generation is at least 2^{⌈log(1/p)⌉+α} ≥ 1/p · 2^α for all k ∈ N_0. Hence

Pr[T^par(k, p) > (⌈log(1/p)⌉ − k)^+ + α + 1] ≤ (1 − p)^{1/p · 2^α} ≤ exp(−2^α).

For the second statement we assume k ≤ log(1/p) − α as otherwise the claim is trivial. A necessary condition for T^par(k, p) ≤ log(1/p) − k − α is that the event does happen at least once within the first log(1/p) − k − α generations. This corresponds to at most Σ_{i=1}^{log(1/p)−α} 2^{i−1} ≤ 2^{log(1/p)−α} = 1/p · 2^{−α} trials. If p > 1/2 the claim is trivial as either the probability bound on the right-hand side is at least 1 or the time bound is negative, hence we assume p ≤ 1/2.
Observing that then 1/p · 2^{−α} ≤ 2(1/p − 1) · 2^{−α}, the considered probability is bounded by

1 − (1 − p)^{2(1/p−1) · 2^{−α}} ≤ 1 − exp(−2 · 2^{−α}) ≤ 1 − (1 − 2 · 2^{−α}) = 2 · 2^{−α}.

To bound the expectation we observe that the first statement implies Pr[T^par(k, p) ≥ (log(1/p) − k)^+ + α + 2] ≤ exp(−2^α). Since T^par(k, p) is non-negative, we have

E(T^par(k, p)) = Σ_{t=1}^∞ Pr[T^par(k, p) ≥ t]
≤ (log(1/p) − k)^+ + 1 + Σ_{α=0}^∞ Pr[T^par(k, p) ≥ (log(1/p) − k)^+ + α + 2]
≤ (log(1/p) − k)^+ + 1 + Σ_{α=0}^∞ exp(−2^α)
< (log(1/p) − k)^+ + 2

as the last sum is less than 1. For the lower bound we use that the second statement implies Pr[T^par(k, p) ≥ log(1/p) − k − α] ≥ 1 − 2 · 2^{−α}. Hence

E(T^par(k, p)) = Σ_{t=1}^∞ Pr[T^par(k, p) ≥ t]
≥ Σ_{α=2}^{log(1/p)−k−1} Pr[T^par(k, p) ≥ log(1/p) − k − α]
≥ Σ_{α=2}^{log(1/p)−k−1} (1 − 2 · 2^{−α})
= log(1/p) − k − 2 − Σ_{α=1}^{log(1/p)−k−2} 2^{−α}
> log(1/p) − k − 3.

For the fourth statement consider the islands one by one, according to some arbitrary ordering. Let T(p) be the random number of sequential trials until an event with probability p happens. It is well known that E(T(p)) = 1/p. Obviously T^seq(k, p) ≥ T(p) since the sequential time has to account for all islands that are active in one generation. This proves E(T^seq(k, p)) ≥ E(T(p)) ≥ 1/p. The second lower bound 2^k is obvious as at least one generation is needed for a success. For the upper bound observe that T^seq(k, p) = 2^k in case T(p) ≤ 2^k and T^seq(k, p) = Σ_{i=k}^{ℓ} 2^i in case Σ_{i=k}^{ℓ−1} 2^i < T(p) ≤ Σ_{i=k}^{ℓ} 2^i. Together, we get that T^seq(k, p) ≤ max{2T(p), 2^k} ≤ 2T(p) + 2^k − 1, hence E(T^seq(k, p)) ≤ 2/p + 2^k − 1.

The presented tail bounds indicate that the population typically does not grow too large.
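The expectation bounds (statements 3 and 4 of Lemma 1) are easy to check empirically. A small Monte Carlo sketch of ours, for k = 0 and p = 1/64 (so log(1/p) = 6, with log denoting the binary logarithm):

```python
import random

def waiting_times(p, rng):
    # Start with one island (k = 0) and double after every unsuccessful
    # generation; each island succeeds independently w.p. p.
    mu, t_par, t_seq = 1, 0, 0
    while True:
        t_par += 1
        t_seq += mu
        if any(rng.random() < p for _ in range(mu)):
            return t_par, t_seq
        mu *= 2

rng = random.Random(0)
p, runs = 1.0 / 64, 5000
samples = [waiting_times(p, rng) for _ in range(runs)]
mean_par = sum(t for t, _ in samples) / runs
mean_seq = sum(s for _, s in samples) / runs
assert 6 - 3 < mean_par < 6 + 2   # statement 3: log(1/p) plus/minus constants
assert 64 <= mean_seq <= 2 * 64   # statement 4: between 1/p and 2/p + 2^0 - 1
```

These are bounds on expectations, so the assertions compare sample means, not individual runs.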
The probability that the number of generations exceeds its expectation by an additive value of α + 1 is even an inverse doubly exponential function. The following provides a more handy statement in terms of the population size. It follows immediately from Lemma 1.

Corollary 1. Consider the setting described in Lemma 1. For every β ≥ 1, β a power of 2, the probability that, while waiting for the event to happen, the population size exceeds max{2^{k+1}, 4/p} · β is at most exp(−β).

One conclusion from these findings is that our schemes can be applied in practice without risking an overly large blowup of the population size. We now turn to performance guarantees in terms of expected parallel and sequential running times.

5 Upper Bounds via Fitness Levels

The following results are based on the fitness-level method, also known as the method of f-based partitions (see, e.g., Wegener [18]). This method is well known for proving upper bounds for algorithms that do not accept worsenings of the population. Consider a partition of the search space into sets A_1, ..., A_m where, for all 1 ≤ i ≤ m − 1, all search points in A_i are strictly worse than all search points in A_{i+1}, and A_m contains all global optima. If each set A_i contains only a single fitness value then the partition is called a canonic partition. If s_i is a lower bound on the probability of creating a search point in A_{i+1} ∪ · · · ∪ A_m, provided the current best search point is in A_i, then the expected optimization time is bounded from above by

Σ_{i=1}^{m−1} Pr[A_i] · Σ_{j=i}^{m−1} 1/s_j,

where Pr[A_i] abbreviates the probability that the best search point after initialization is in A_i.
The reason for this bound is that the expected time until A_i is left towards a higher fitness level is at most 1/s_i, and each fitness level, starting from the initial one, has to be left at most once. Note that we can always simplify the above bound by pessimistically assuming that the population is initialized in A_1. This removes the term "Σ_{i=1}^{m−1} Pr[A_i] ·" and only leaves Σ_{j=1}^{m−1} 1/s_j. This way of simplifying upper bounds can be used for all results presented hereinafter.

The fitness-level method yields good upper bounds in many cases. This includes situations where an evolutionary algorithm typically moves through increasing fitness levels, without skipping too many levels [16]. It only gives crude upper bounds in case the values s_i are dominated by search points from which the probability of leaving A_i is much lower than for other search points in A_i, or if there are levels with difficult local optima (i.e., large values 1/s_i) that are only reached with a small probability.

Using the expectation bounds from Section 4 we now show in Theorem 1: for both schemes, A and B, in the upper bound for the expected parallel time the expected sequential waiting time can be replaced by its logarithm. In addition, the expected sequential time is asymptotically not larger than the upper bound for the serial algorithm, derived by f-based partitions. In the remainder of the paper we denote by T_x^par and T_x^seq, x ∈ {A, B}, the parallel time and the sequential time for Schemes A and B, respectively.

Theorem 1. Given an f-based partition A_1, ..., A_m,

E(T_A^seq) ≤ 2 Σ_{i=1}^{m−1} Pr[A_i] · Σ_{j=i}^{m−1} 1/s_j.

If the partition is canonic then also

E(T_A^par) ≤ 2 Σ_{i=1}^{m−1} Pr[A_i] · Σ_{j=i}^{m−1} log(2/s_j).
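As a quick numerical illustration (our own instantiation; the paper treats its example functions later, in Section 8), Theorem 1 can be evaluated for OneMax with the canonic levels A_j = {x : OneMax(x) = j} and the standard estimate s_j ≥ (n − j)/(en), obtained by flipping exactly one of the n − j remaining zero-bits and nothing else. Pessimistically starting in A_0, the sequential bound is O(n log n) while the parallel bound collapses to O(n):

```python
import math

def theorem1_bounds_onemax(n):
    # Level-leaving probability estimate for OneMax:
    # s_j >= (n - j) * (1/n) * (1 - 1/n)^(n-1) >= (n - j) / (e * n).
    s = [(n - j) / (math.e * n) for j in range(n)]
    seq_bound = 2 * sum(1 / sj for sj in s)               # bound on E(T_A^seq)
    par_bound = 2 * sum(math.log2(2 / sj) for sj in s)    # bound on E(T_A^par)
    return seq_bound, par_bound

seq_bound, par_bound = theorem1_bounds_onemax(100)
assert par_bound < seq_bound   # the parallel bound is much smaller
```

For n = 100 the sequential bound is roughly 2e · n · H_n ≈ 2800 function evaluations, while the parallel bound is below 800 generations.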
The reason for the constant 2 in the log(2/s_j) term is to ensure that the term does not become smaller than 1; with a constant 1, the value s_j = 1 would even lead to a summand log(1/s_j) = 0.

Proof. We only need to prove asymptotic bounds on the conditional expectations when starting in A_i, with a common constant hidden in all O-terms. The law of total expectation then implies the claim. For Scheme A we apply Lemma 1 with k = 0. This yields that the expected sequential time for leaving the current fitness level A_j towards A_{j+1} ∪ · · · ∪ A_m is at most 2/s_j, and the expected parallel time is at most log(1/s_j) + 2 ≤ 2 log(2/s_j). The expected sequential time is hence bounded by 2 Σ_{j=i}^{m−1} 1/s_j and the expected parallel time is at most 2 Σ_{j=i}^{m−1} log(2/s_j).

We prove a similar upper bound for Scheme B using arguments from the amortized analysis of algorithms [2, Chapter 17]. Amortized analysis is used to derive statements on the average running time of an operation or to estimate the total costs of a sequence of operations. It is especially useful if some operations may be far more costly than others and if expensive operations imply that many other operations will be cheap. The basic idea of the so-called accounting method is to let all operations pay for the costs of their execution. Operations are allowed to pay excess amounts of money into fictional accounts. Other operations can then tap this pool of money to pay for their costs. As long as no account becomes overdrawn, the total cost of all operations is bounded by the total amount of money that has been paid or deposited.

Theorem 2. Given an f-based partition A_1, ..., A_m,

E(T_B^seq) ≤ 3 Σ_{i=1}^{m−1} Pr[A_i] · Σ_{j=i}^{m−1} 1/s_j.

If the partition is canonic then also

E(T_B^par) ≤ 4 Σ_{i=1}^{m−1} Pr[A_i] · Σ_{j=i}^{m−1} log(2/s_j).

Proof.
We use the accounting method to bound the expected sequential optimization time of B as follows. Assume the algorithm is on level $j$ with a population size of $2^k$. If the current generation passes without leaving the current fitness level, we pay $2^k$ to cover the cost of the sequential time in this generation. In addition, we pay another $2^k$ to a fictional bank account. In case the generation is successful in leaving $A_j$ and the previous generation was unsuccessful, we just pay $2^k$ and do not make a deposit. In case the current generation is successful and the last unsuccessful generation was on fitness level $j$, we withdraw $2^k$ from the bank account to pay for the current generation. In other words, the current generation is for free. This way, if there is a sequence of successful generations after an unsuccessful one on level $j$, all but the first successful generation are for free.

Let us verify that the bank account cannot be overdrawn. The basic argument is that, whenever the population size is decreased from, say, $2^{k+1}$ to $2^k$, then there must be a previous generation where the population size was increased from $2^k$ to $2^{k+1}$. It is easy to see that associating a decrease with the latest increase gives an injective mapping. In simpler terms, the latest generation that has increased the population size from $2^k$ to $2^{k+1}$ has already paid for the current decrease to $2^k$.

When in the upper bound for A fitness level $i$ takes sequential time $1 + 2 + \dots + 2^k = 2^{k+1} - 1$, then for B the total costs paid are $2(1 + 2 + \dots + 2^{k-1}) + 2^k$, as a successful generation does not make a deposit to the bank account. The total costs equal $2^{k+1} - 2 + 2^k \le 3/2 \cdot (2^{k+1} - 1)$. In consequence, the total costs for Scheme B are at most $3/2$ times the costs for A in A's upper bound. This proves the claimed upper bound for B.
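As a quick numeric sanity check of the final step (our own check, not part of the proof), the inequality $2^{k+1} - 2 + 2^k \le 3/2 \cdot (2^{k+1} - 1)$ can be verified for a range of $k$:

```python
# Sanity check of the accounting bound: Scheme B's per-level cost
# 2^(k+1) - 2 + 2^k never exceeds 3/2 of Scheme A's cost 2^(k+1) - 1.
# Multiplying both sides by 2 avoids fractions.
for k in range(64):
    cost_b = 2 ** (k + 1) - 2 + 2 ** k
    cost_a = 2 ** (k + 1) - 1
    assert 2 * cost_b <= 3 * cost_a
print("inequality holds for k = 0..63")
```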
By the very same argument, an upper bound for the expected parallel time for B follows. Instead of paying $2^k$ and maybe making a deposit of $2^k$, we always pay 1 and always make a deposit of 1. When withdrawing money, we always withdraw 1. This proves that $E(T^{\mathrm{par}}_B)$ is also at most twice the corresponding upper bound for Scheme A.

The argument in the above proof can also be used for proving a general upper bound on the expected parallel optimization time for B. When paying cost 2 for each fitness level, this pays for the successful generation with a population size of, say, $2^k$ and for one future generation where the population size might have to be doubled to reach $2^k$ again.

Imagine the sequence of population sizes over time and then delete all elements where the population size has decreased, including the associated generation where the population size was increased beforehand. In the remaining sequence the population size continually increases until, assuming a global optimum has not been found yet, after $n \log n$ generations a population size of at least $n^n$ is reached. In this case the probability of creating a global optimum by mutation is at least $1 - (1 - n^{-n})^{n^n} \approx 1 - 1/e$, as the probability of hitting any specific target point in one mutation is at least $n^{-n}$. The expected number of generations until this happens is clearly $O(1)$. We have thus shown the following.

Corollary 2. For every function with $m$ function values we have
$$E(T^{\mathrm{par}}_B) \le 2m + n\log n + O(1).$$

This bound is asymptotically tight, for instance, for long-path problems [4, 13]. So the $m$-term, in general, cannot be avoided.

When comparing A and B with respect to the expected parallel time, we expect B to perform better if the fitness levels have a similar degree of difficulty. This implies that there is a certain target level for the population size.
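The qualitative behaviour of the two schemes can be illustrated with a small Monte Carlo sketch. This is our own illustration, not part of the analysis; it assumes that a generation with $\lambda$ instances leaves level $i$ with probability exactly $1 - (1 - s_i)^{\lambda}$ and that levels are left one at a time:

```python
import random

def scheme_a(pop, success):
    # Scheme A: reset to a single instance after a success, double otherwise.
    return 1 if success else 2 * pop

def scheme_b(pop, success):
    # Scheme B: halve after a success (down to 1), double otherwise.
    return max(1, pop // 2) if success else 2 * pop

def run(success_probs, update, rng):
    """Return (parallel_time, sequential_time) for one run over a chain
    of fitness levels with per-instance success probabilities."""
    pop, par, seq = 1, 0, 0
    for s in success_probs:
        while True:
            par += 1          # one generation of parallel time
            seq += pop        # pop function evaluations in this generation
            success = rng.random() < 1 - (1 - s) ** pop
            pop = update(pop, success)
            if success:
                break
    return par, seq

rng = random.Random(42)
levels = [0.01] * 50  # similar difficulty on every level, the case favouring B
for name, update in [("A", scheme_a), ("B", scheme_b)]:
    par, seq = run(levels, update, rng)
    print(name, "parallel:", par, "sequential:", seq)
```

With similar $s_i$-values, Scheme A repeatedly re-doubles from a single instance on every level, while Scheme B stays near the target population size.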
Note, however, that such a target level does not exist in case the $s_i$-values are dissimilar. In the case of similar $s_i$-values, A might be forced to spend time doubling the population size for each fitness level until the target level has been reached. This waiting time is reflected by the $\log(2/s_j)$-terms in Theorem 1. The following upper bound on B shows that these log-terms can be avoided to some extent. In the special yet rather common situation that improvements become harder with each fitness level, only the biggest such log-term is needed.

Theorem 3. Given a canonical $f$-based partition $A_1, \dots, A_m$, $E(T^{\mathrm{par}}_B)$ is bounded by
$$\sum_{i=1}^{m-1} \Pr[A_i] \cdot \left( 3(m - i - 1) + \log\left(\frac{1}{s_i}\right) + \sum_{j=i+1}^{m-1} \left( \log\left(\frac{1}{s_j}\right) - \log\left(\frac{1}{s_{j-1}}\right) \right)^{\!+} \right).$$
If additionally $s_1 \ge s_2 \ge \dots \ge s_{m-1}$ then the bound simplifies to
$$\sum_{i=1}^{m-1} \Pr[A_i] \cdot \left( 3(m - i - 1) + \log\left(\frac{1}{s_{m-1}}\right) \right).$$

Proof. The second claim immediately follows from the first one, as the log-terms form a telescoping sum. For the first bound we again use arguments from amortized analysis. By Lemma 1, if the current population size is $2^k$ then the expected number of generations until an improvement from level $i$ happens is at most $(\log(1/s_i) - k)^+ + 2$. This is a bound of 2 for $k \ge \log(1/s_i)$. We perform a so-called aggregate analysis to estimate the total cost on all fitness levels. These costs are attributed to different sources. Summing up the costs for all sources will yield a bound on the total cost and hence on $T^{\mathrm{par}}_B$.

In the first generation, the fitness level $i^*$ the algorithm starts on pays $\log(1/s_{i^*})$ to the global bank account. Afterwards costs are assigned as follows. Consider a generation on fitness level $i$ with a population size of $2^k$.
• If the current generation is successful, we charge cost 2 to the fitness level; cost 1 pays for the effort in the generation and cost 1 is deposited on the bank account. In addition, each fitness level $j$ that is skipped or reached during this improvement pays $(\log(1/s_j) - \log(1/s_{j-1}))^+$ as a deposit on the bank account. Note that this amount is non-negative and it may be non-integer.

• If $k \ge \log(1/s_i)$ and the current generation is unsuccessful, we charge cost 1 to the fitness level.

• If $k < \log(1/s_i)$ and the current generation is unsuccessful, we withdraw cost 1 from our bank account.

By Lemma 1, the expected cost charged to fitness level $i$ in unsuccessful generations (i.e., not counting the last, successful generation) is at most 1. Assuming for the moment that the bank account is never overdrawn, the overall expected cost for fitness level $i$ is at most $1 + 2 + (\log(1/s_j) - \log(1/s_{j-1}))^+$. Adding the costs for the initial fitness level yields the claimed bound.

We use the so-called potential method [2, Chapter 17] to show that the bank account is never overdrawn. Our claim is that at any point of time there is enough money on the bank account to cover the costs of increasing the current population size to at least $2^{\log(1/s_j)}$, where $j$ is the current fitness level. We construct a potential function indicating the excess money on the bank account and show that the potential is always non-negative. Let $\mu_t$ denote the population size in generation $t$ and $\ell_t$ the (random) fitness level in generation $t$. By $b_t$ we denote the account balance on the bank account. We prove by induction that
$$b_t \ge (\log(1/s_{\ell_t}) - \log(\mu_t))^+.$$
As this bound is always non-negative, this implies that the account is never overdrawn. After the initial fitness level has made its deposit we have $b_1 = \log(1/s_{\ell_1}) - 0$.
Assume by induction that the bound holds for $b_t$. If generation $t$ is unsuccessful and $\log(\mu_t) \ge \log(1/s_{\ell_t})$, then the population size is doubled at no cost for the bank account. As by induction $b_t \ge 0$, we have
$$b_{t+1} = b_t \ge 0 = (\log(1/s_{\ell_t}) - \log(\mu_{t+1}))^+.$$
If generation $t$ is unsuccessful and $\log(\mu_t) < \log(1/s_{\ell_t})$, then the algorithm doubles its population size and withdraws 1 from the bank account. As $b_t$ is positive and $\log(\mu_{t+1}) = \log(\mu_t) + 1$, we have
$$b_{t+1} = b_t - 1 = \log(1/s_{\ell_t}) - \log(\mu_t) - 1 = \log(1/s_{\ell_t}) - \log(\mu_{t+1}).$$
If generation $t$ is successful and the current fitness level increases from $i$ to some $j > i$, the account balance is increased by
$$1 + \sum_{a=i+1}^{j} (\log(1/s_a) - \log(1/s_{a-1}))^+ \ge 1 + (\log(1/s_j) - \log(1/s_i))^+.$$
This implies
$$b_{t+1} \ge b_t + 1 + (\log(1/s_j) - \log(1/s_i))^+ \ge (\log(1/s_i) - \log(\mu_t))^+ + 1 + (\log(1/s_j) - \log(1/s_i))^+ \ge (\log(1/s_j) - \log(\mu_t))^+ + 1 \ge (\log(1/s_j) - \log(\mu_{t+1}))^+.$$

The upper bounds in this section can be easily adapted towards parallel EAs that do not perform migration and population size adaptation in every generation, but only every $\tau$ generations, for a migration interval $\tau \in \mathbb{N}$. Instead of considering the probability of leaving a fitness level in one generation, we simply consider the probability of leaving a fitness level in $\tau$ generations. This is done by considering $s'_i := 1 - (1 - s_i)^{\tau}$ instead of $s_i$. The resulting time bounds, based on $s'_1, \dots, s'_{m-1}$, are then with respect to the number of periods of $\tau$ generations. To get bounds on our original measures of time, we just multiply all bounds by a factor of $\tau$.

6 Lower Bounds for the Sequential Time

In order to prove lower bounds for the expected sequential time we make use of recent results by Sudholt [16].
He presented a new lower-bound method based on fitness-level arguments. If it is unlikely that many fitness levels are skipped when leaving the current fitness-level set, then good lower bounds can be shown.

The lower bound applies to every algorithm $\mathcal{A}$ in pseudo-Boolean optimization that only uses standard mutations (i.e., flipping each bit independently with probability $1/n$) to create new offspring. Such an EA is called a mutation-based EA. More precisely, every mutation-based EA $\mathcal{A}$ works as follows. First, $\mathcal{A}$ creates $\mu$ search points $x_1, \dots, x_{\mu}$ uniformly at random. Then it repeats the following loop. A counter $t$ counts the number of function evaluations; after initialization we have $t = \mu$. In one iteration of the loop the algorithm first selects one out of all search points $x_1, \dots, x_t$ that have been created so far. This decision is based on the fitness values $f(x_1), \dots, f(x_t)$ and, possibly, also the time index $t$. It performs a standard mutation of this search point, creating an offspring $x_{t+1}$.

To make this work self-contained, we cite (a slightly simplified version of) the result here. The performance measure considered is the number of function evaluations. This can be assumed to coincide with the number of offspring creations, as every offspring needs to be evaluated exactly once.

Theorem 4 ([16]). Consider a partition of the search space into non-empty sets $A_1, \dots, A_m$ such that only $A_m$ contains global optima. For a mutation-based EA $\mathcal{A}$ we say that $\mathcal{A}$ is in $A_i$ or on level $i$ if the best individual created so far is in $A_i$. Let the probability of traversing from level $i$ to level $j$ in one mutation be at most $u_i \cdot \gamma_{i,j}$ and $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$. Assume that for all $j > i$ and some $0 < \chi \le 1$ it holds that $\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}$.
Then the expected number of function evaluations of $\mathcal{A}$ on $f$ is at least
$$\chi \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{u_j}.$$

All population update schemes are compatible with this framework; every parallel mutation-based EA using an arbitrary population update scheme is still a mutation-based EA. Offspring creations are performed in parallel in our algorithms, but one can imagine these operations to be performed sequentially. We can cast a parallel EA with parallel offspring creations as a sequential mutation-based EA that simulates the population management of an island model in the background. Recall that the selection in the notion of a mutation-based EA can be based on the time index $t$. Hence, a sequential mutation-based EA can keep track of the times when individuals on a specific island have been created or when individuals have immigrated from a different island. The algorithm can then simulate offspring creations for an island by allowing only individuals on the island to become parents. There is one caveat: the parent selection mechanism in [16] does not account for possibly randomized decisions made during migration. However, the proof of Theorem 4 goes through in case additional knowledge is used.

We introduce the notion of tight fitness levels, where the success probabilities $s_i$ from the classical fitness-level method are exact up to a constant factor.

Definition 1. Call an $f$-based partition $A_1, \dots, A_m$ (asymptotically) tight for an algorithm $\mathcal{A}$ if there exist constants $c \ge 1 > \chi > 0$ and values $\gamma_{i,j}$ for $1 \le i, j \le m$ such that for each population in $A_i$ the following holds.

1. The probability of generating a population in $A_{i+1} \cup \dots \cup A_m$ in one mutation is at least $s_i$.

2. The probability of generating a population in $A_j$ in one mutation, $j > i$, is at most $c \cdot s_i \cdot \gamma_{i,j}$.

3.
For the $\gamma_{i,j}$-values it holds that $\sum_{j=i+1}^{m} \gamma_{i,j} = 1$ and $\gamma_{i,j} \ge \chi \sum_{k=j}^{m} \gamma_{i,k}$ for all $i < j$.

Tight $f$-based partitions imply that the standard upper bound by $f$-based partitions [18] is asymptotically tight. This holds for all elitist mutation-based algorithms, that is, mutation-based algorithms where the best fitness value in the population can never decrease.

Theorem 5. Consider an algorithm $\mathcal{A}$ with an arbitrary population update strategy that only uses standard mutations for creating new offspring. Given a tight $f$-based partition $A_1, \dots, A_m$ for a function $f$, we have
$$E(T^{\mathrm{seq}}) = \Omega\left( \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \right).$$

Proof. The lower bound on $E(T^{\mathrm{seq}})$ follows by a direct application of Theorem 4. We already discussed that this theorem applies to all algorithms considered in this work. Setting $u_j := c s_j$ for all $1 \le j \le m$, with $c$ and $\chi$ as in Definition 1, Theorem 4 implies
$$E(T^{\mathrm{seq}}) \ge \frac{\chi}{c} \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j}.$$
As both $\chi$ and $c$ are constants, this implies the claim.

This lower bound shows that for tight $f$-based partitions both our population update schemes produce asymptotically optimal results in terms of the expected sequential optimization time, assuming no cost of communication.

7 Non-oblivious Update Schemes

We also briefly discuss update schemes that are tailored towards particular functions, in order to judge the performance of our oblivious update schemes. Non-oblivious population update schemes may allow for smaller upper bounds for the expected parallel time than the ones seen so far. When the population update scheme has complete knowledge of the function $f$ and the $f$-based partition, an upper bound can be shown where each fitness level contributes only a constant to the expected parallel time.
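The tailored choice used in the proof of Theorem 6 below can be sketched in a few lines. The function name is ours, and the sketch assumes the success probability $s_i$ of the current level is known exactly:

```python
import math

def tailored_islands(s_i):
    """Non-oblivious update: with the success probability s_i of the
    current fitness level known, run ceil(1/s_i) islands, so that an
    improvement is found in one generation with probability at least
    1 - (1 - s_i)^(1/s_i) >= 1 - 1/e, a constant."""
    return math.ceil(1.0 / s_i)
```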
By $T^{\mathrm{seq}}_{\mathrm{no}}$ and $T^{\mathrm{par}}_{\mathrm{no}}$ we denote the sequential and parallel times of the considered non-oblivious scheme.

Theorem 6. Given an arbitrary $f$-based partition $A_1, \dots, A_m$, there is a tailored population update scheme for which
$$E(T^{\mathrm{seq}}_{\mathrm{no}}) = O\left( \sum_{i=1}^{m-1} \Pr[A_i] \cdot \sum_{j=i}^{m-1} \frac{1}{s_j} \right)$$
and
$$E(T^{\mathrm{par}}_{\mathrm{no}}) = O\left( \sum_{i=1}^{m-1} \Pr[A_i] \cdot (m - i - 1) \right).$$
In particular, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(m)$.

Proof. The update scheme chooses to use $\lceil 1/s_i \rceil$ islands if the algorithm is in $A_i$. Then the probability of finding an improvement in one generation is at least $1 - (1 - s_i)^{1/s_i} \ge 1 - 1/e$. The expected parallel time until this happens is at most $e/(e-1)$, and so the expected sequential time is at most $e/(e-1) \cdot \lceil 1/s_i \rceil \le 2e/(e-1) \cdot 1/s_i$. Summing up these expectations for all fitness levels from $i$ to $m-1$ proves the two bounds.

In some situations it is possible to design schemes that perform even better than the above bound suggests. For instance, for trap functions the best strategy would be to use a very large population in the first generation, so that the optimum is found with high probability before the algorithm is tricked into increasing the distance to the global optimum.

8 Bounds for Example Functions

The previous bounds are applicable in a very general context, with arbitrary fitness functions. We also give results for selected example functions to estimate possible speed-ups in more concrete settings. We consider the same example functions and function classes that have been investigated in [11]. The goal is the maximization of a pseudo-Boolean function $f\colon \{0,1\}^n \to \mathbb{R}$. For a search point $x \in \{0,1\}^n$ write $x = x_1 \dots x_n$; then $\mathrm{OneMax}(x) := \sum_{i=1}^{n} x_i$ counts the number of ones in $x$ and $\mathrm{LO}(x) := \sum_{i=1}^{n} \prod_{j=1}^{i} x_j$ counts the number of leading ones in $x$.
A function is called unimodal if every non-optimal search point has a Hamming neighbor (i.e., a point with Hamming distance 1 to it) with strictly larger fitness. For $1 \le k \le n$ we also consider
$$\mathrm{Jump}_k(x) := \begin{cases} k + \sum_{i=1}^{n} x_i & \text{if } \sum_{i=1}^{n} x_i \le n - k \text{ or } x = 1^n, \\ \sum_{i=1}^{n} (1 - x_i) & \text{otherwise.} \end{cases}$$
This function has been introduced by Droste, Jansen, and Wegener [4] as a function with tunable difficulty. Evolutionary algorithms typically have to perform a jump to overcome a gap by flipping $k$ specific bits.

For these functions we obtain bounds for $T^{\mathrm{seq}}$ and $T^{\mathrm{par}}$ as summarized in Table 1. The lower bounds for $E(T^{\mathrm{seq}})$ on OneMax and LO follow directly from [16] for all schemes.

Function           Scheme          E(T^seq)      E(T^par)
OneMax             A               Θ(n log n)    O(n log n)
                   B               Θ(n log n)    O(n)
                   non-oblivious   Θ(n log n)    O(n)
LO                 A               Θ(n^2)        Θ(n log n)
                   B               Θ(n^2)        O(n)
                   non-oblivious   Θ(n^2)        O(n)
unimodal f         A               O(dn)         O(d log n)
with d f-values    B               O(dn)         O(d + log n)
                   non-oblivious   O(dn)         O(d)
Jump_k             A               O(n^k)        O(n log n)
with k ≥ 2         B               O(n^k)        O(n + k log n)
                   non-oblivious   O(n^k)        O(n)

Table 1: Asymptotic bounds for expected parallel running times $E(T^{\mathrm{par}})$ and expected sequential running times $E(T^{\mathrm{seq}})$ for the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive population models.

Theorem 7. For the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive population models, the upper bounds for $E(T^{\mathrm{seq}})$ and $E(T^{\mathrm{par}})$ hold as given in Table 1.

Proof. The upper bounds for Scheme A follow from Theorem 1, for Scheme B from Theorems 2 and 3, and for the non-oblivious scheme from Theorem 6. Starting pessimistically from the first fitness level, the following bounds hold:

• For OneMax we use the canonical $f$-based partition $A_i := \{x \mid \mathrm{OneMax}(x) = i\}$ and the corresponding success probabilities $s_i \ge (n-i)/n \cdot (1 - 1/n)^{n-1} \ge (n-i)/(en)$.
Hence, $E(T^{\mathrm{par}}_A) \le 2 \sum_{i=1}^{n-1} \log\left(\frac{2en}{n-i}\right) \le 2n \log(2en) = O(n \log n)$,
$$E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=0}^{n-1} \frac{1}{s_i} \le 2 \sum_{i=0}^{n-1} \frac{en}{n-i} = 2en \sum_{i=1}^{n} \frac{1}{i} \le 2en \cdot [(\ln n) + 1],$$
$E(T^{\mathrm{par}}_B) \le 3(n-2) + \log(2en) = O(n)$ and $E(T^{\mathrm{seq}}_B) \le 3en \cdot [(\ln n) + 1]$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n \log n)$.

• For LO we use the canonical $f$-based partition $A_i := \{x \mid \mathrm{LO}(x) = i\}$ and the corresponding success probabilities $s_i \ge 1/n \cdot (1 - 1/n)^{n-1} \ge 1/(en)$. Hence, $E(T^{\mathrm{par}}_A) \le 2 \sum_{i=0}^{n-1} \log(2en) = 2n \log(2en) = O(n \log n)$,
$$E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=0}^{n-1} \frac{1}{s_i} \le 2 \sum_{i=0}^{n-1} en = 2en^2,$$
$E(T^{\mathrm{par}}_B) \le 3(n-2) + \log(en) = O(n)$, $E(T^{\mathrm{seq}}_B) \le 3en^2$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n^2)$.

• For unimodal functions with $d$ function values we use corresponding success probabilities $s_i \ge 1/(en)$. Hence, $E(T^{\mathrm{par}}_A) \le 2 \sum_{i=1}^{d-1} \log(2en) \le 2d \log(2en) = O(d \log n)$,
$$E(T^{\mathrm{seq}}_A) \le 2 \sum_{i=1}^{d-1} \frac{1}{s_i} \le 2 \sum_{i=1}^{d-1} en = 2edn,$$
$E(T^{\mathrm{par}}_B) \le 3(d-2) + \log(en) = O(d + \log n)$, $E(T^{\mathrm{seq}}_B) \le 3edn$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(d)$ and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(dn)$.

• For $\mathrm{Jump}_k$ functions with $k \ge 2$, as long as all individuals have neither $n-k$ nor $n$ 1-bits, an improvement is found by either increasing or decreasing the number of 1-bits. This corresponds to optimizing OneMax. In order to improve a solution with $n-k$ 1-bits, a specific bit string with Hamming distance $k$ has to be created, which happens with probability $s_{n-k}$ at least
$$\left(\frac{1}{n}\right)^{\!k} \cdot \left(1 - \frac{1}{n}\right)^{\!n-k} \ge \left(\frac{1}{n}\right)^{\!k} \cdot \left(1 - \frac{1}{n}\right)^{\!n-1} \ge \frac{1}{en^k}.$$
Hence, $E(T^{\mathrm{par}}_A) \le O(n \log n) + 2\log(en^k) \le O(n \log n) + 2k \log(en) = O(n \log n)$, $E(T^{\mathrm{seq}}_A) \le O(n^k)$, $E(T^{\mathrm{par}}_B) \le O(n) + k\log(en) = O(n + k \log n)$, $E(T^{\mathrm{seq}}_B) \le O(n^k)$, $E(T^{\mathrm{par}}_{\mathrm{no}}) = O(n)$ and $E(T^{\mathrm{seq}}_{\mathrm{no}}) = O(n^k)$.
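For concreteness, the three example functions can be written as executable definitions. The following Python sketch is our own illustration; it represents a search point as a sequence of 0/1 values:

```python
def onemax(x):
    """OneMax(x): the number of one-bits in x."""
    return sum(x)

def leading_ones(x):
    """LO(x): the length of the longest prefix of one-bits."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def jump(x, k):
    """Jump_k(x): k + |x|_1 if |x|_1 <= n - k or x = 1^n,
    and n - |x|_1 (the number of zero-bits) otherwise."""
    n, ones = len(x), sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones
```

For instance, with $n = 5$ and $k = 2$, a point with four one-bits lies in the gap and is mapped to its number of zero-bits, which is what forces the jump of $k$ specific bits.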
It can be seen from Table 1 that both our schemes lead to significant speed-ups in terms of the parallel time. The speed-ups increase with the difficulty of the function. This becomes obvious when comparing the results on OneMax and LO, and it is even more visible for $\mathrm{Jump}_k$.

The upper bounds for $E(T^{\mathrm{par}}_B)$ are always asymptotically lower than those for $E(T^{\mathrm{par}}_A)$, except for $\mathrm{Jump}_k$ with $k = \Theta(n)$. However, without corresponding lower bounds we cannot say whether this is due to differences in the real running times or whether we simply proved tighter guarantees for B. We therefore consider the function LO in more detail and prove a lower bound for A. This demonstrates that Scheme B can be asymptotically better than Scheme A on a concrete problem.

Theorem 8. For the parallel (1+1) EA and the (1+$\lambda$) EA with adaptive population models on LO we have $E(T^{\mathrm{par}}_A) = \Omega(n \log n)$.

Proof. We consider a pessimistic setting (pessimistic for proving a lower bound) where an improvement has probability exactly $1/n$. This ignores that all leading ones have to be conserved in order to increase the best LO-value. We show that with probability $\Omega(1)$ at least $n/30$ improvements are needed in this setting. As by Lemma 1 the expected waiting time for an improvement is at least $\max\{0, (\log n) - 3\}$, the conditional expected parallel time is $\Omega(n \log n)$. By the law of total expectation, also the unconditional expected parallel time is then $\Omega(n \log n)$.

Let us bound the expected increase in the number of leading ones on one fitness level. Let $T^{\mathrm{par}}_i$ denote the random number of generations until the best fitness increases when the algorithm is on fitness level $i$. By the law of total expectation, the expected increase in the best fitness in this generation equals
$$\sum_{t=1}^{\infty} \Pr[T^{\mathrm{par}}_i = t] \cdot E(\text{LO-increase} \mid T^{\mathrm{par}}_i = t).$$
(1)

The expected increase in the number of leading ones can be estimated as follows. With $T^{\mathrm{par}}_i = t$ the number of mutations in the successful generation is $2^{t-1}$. Let $I$ denote the number of mutations that increase the current best LO-value. A well-known property of LO is that when the current best fitness is $i$, then the bits at positions $i+2, \dots, n$ are uniform. Bits that form part of the leading ones after an improvement are called free riders. The probability of having at least $k$ free riders is thus at most $2^{-k}$ (unless the end of the bit string is reached), and the expected number of free riders is at most $\sum_{k=1}^{\infty} 2^{-k} = 1$.

The uniformity of "random" bits at positions $i+2, \dots, n$ holds after any specific number of mutations, and in particular after the mutations in generation $T^{\mathrm{par}}_i$ have been performed. However, when looking at multiple improvements, the free-rider events are not necessarily independent, as the "random" bits are very likely to be correlated. The following reasoning avoids these possible dependencies. We consider the improvements in generation $T^{\mathrm{par}}_i$ one by one. If $F_1$ denotes the random number of free riders gained in the first improvement, then when considering the second improvement the bits at positions $i + 3 + F_1, \dots, n$ are still uniform. In some sense, we give away the free riders from a fitness improvement for free for all following improvements. This leads to an estimation of $1 + F_1$ for the gain in the number of leading ones.

Iterating this argument, the expected total number of leading ones gained is thus bounded by $2I$, the expectation being taken with respect to the randomness of free riders. Also considering the expectation for the random number of improvements yields the bound $2E(I \mid I \ge 1)$, as $I$ has been defined with respect to the last (i.e., successful) generation.
We also observe $E(I \mid I \ge 1) \le 1 + E(I) \le 1 + 2^t/n$. Plugging this into Equation (1) yields
$$\sum_{t=1}^{\infty} \Pr[T^{\mathrm{par}}_i = t] \cdot (2 + 2^{t+1}/n) = 2 + 2\sum_{t=0}^{\infty} \Pr[T^{\mathrm{par}}_i = t+1] \cdot 2^{t+1}/n \le 2 + 2\sum_{t=0}^{\infty} \Pr[T^{\mathrm{par}}_i > t] \cdot 2^{t+1}/n$$
$$\le 2 + 2\sum_{t=0}^{\lceil \log n \rceil} 2^{t+1}/n + 2\sum_{t=\lceil \log n \rceil + 1}^{\infty} \Pr[T^{\mathrm{par}}_i > t] \cdot 2^{t+1}/n.$$
The first sum is at most 16. Using Lemma 1 to estimate the second sum, we arrive at the upper bound
$$18 + 2\sum_{\alpha=0}^{\infty} \Pr[T^{\mathrm{par}}_i > \lceil \log n \rceil + \alpha + 1] \cdot 2^{\lceil \log n \rceil + \alpha + 2}/n \le 18 + 2\sum_{\alpha=0}^{\infty} \exp(-2^{\alpha}) \cdot 2^{\lceil \log n \rceil + \alpha + 2}/n \le 18 + 16 \sum_{\alpha=0}^{\infty} \exp(-2^{\alpha}) \cdot 2^{\alpha} < 29.8.$$

With probability $1/2$ the algorithm starts with no leading ones, independently of all following events. The expected number of leading ones after $n/30$ improvements is at most $29.8/30 \cdot n$. By Markov's inequality the probability of having created $n$ leading ones is thus at most $29.8/30$, and so with probability $1/2 \cdot 0.2/30 = \Omega(1)$ having $n/30$ improvements is not enough to find a global optimum.

9 Generalizations & Extensions

We finally discuss generalizations and extensions of our results. One interesting question is in how far our results change if the population is not doubled or halved, but instead multiplied or divided by some other value $b > 1$. Then the results would change as follows. With some potential adjustments to constant factors, the log-terms in the parallel optimization times in Theorems 1, 2 and 3 would have to be replaced by $\log_b$. For the sequential optimization times stated in these theorems, one would need to multiply these bounds by $b/2$. This means that a larger $b$ would further decrease the parallel optimization times at the expense of a larger sequential optimization time.

Our analyses can also be transferred towards the adaptive scheme presented by Jansen, De Jong, and Wegener [9].
Recall that in their scheme the population size is divided by the number of successes. In case of one success the population size remains unchanged. This only affects the constant factors in our upper bounds. When the number of successes is large, the population size might decrease quickly. In most cases, however, the number of successes will be rather small; for instance, the lower bound for LO, Theorem 8, has shown that the expected number of successes in a successful generation is constant. However, it might be possible that after a difficult fitness level an easier fitness level is reached, and then the number of successes might be much higher. In an extreme case their scheme can decrease the population size like Scheme A. In some sense, their scheme is somewhat "in between" A and B. With a slight adaptation of the constants, the upper bound for Scheme A from Theorem 1 can be transferred to their scheme.

Another extension of the results above is towards maximum population sizes. Although we have argued in Section 4 that the population size does not blow up too much, in practice the maximum number of processors might be limited. The following theorem about $E(T^{\mathrm{par}}_A)$ for maximum population sizes can be proven by applying arguments from [11].

Theorem 9. The expected parallel optimization time of Scheme A for a maximum population size $\mu := \mu_{\max} > 1$ is bounded by
$$E(T^{\mathrm{par}}_A) \le m \cdot [\log \mu_{\max} + 2] + \frac{2}{\mu_{\max}} \sum_{i=1}^{m-1} \frac{1}{s_i}.$$

Proof. We pessimistically estimate the expected parallel time by the time until the population consists of $\mu_{\max}$ islands plus the expected optimization time if $\mu_{\max}$ islands are available. The time until $\mu_{\max}$ islands are involved is $\log \mu_{\max}$ on one fitness level. Hence, summing up over all levels pessimistically gives $m \log \mu_{\max}$.
For $\mu_{\max}$ islands the success probability on fitness level $i$, with success probability $s_i$ for one island, is given by $1 - (1 - s_i)^{\mu_{\max}}$. Hence, the expected time for leaving fitness level $i$ if $\mu_{\max}$ islands are available is at most $1/[1 - (1 - s_i)^{\mu_{\max}}]$. Now we consider two cases. If $s_i \cdot \mu_{\max} \le 1$ we have
$$1 - (1 - s_i)^{\mu_{\max}} \ge 1 - (1 - s_i \mu_{\max}/2) = s_i \mu_{\max}/2,$$
because for all $0 \le xy \le 1$ it holds that $(1-x)^y \le 1 - xy/2$ [11, Lemma 1]. Otherwise, if $s_i \cdot \mu_{\max} > 1$, we have
$$1 - (1 - s_i)^{\mu_{\max}} \ge 1 - e^{-s_i \mu_{\max}} \ge 1 - \frac{1}{e}.$$
Thus,
$$\sum_{i=1}^{m-1} \frac{1}{1 - (1 - s_i)^{\mu_{\max}}} \le \sum_{i=1}^{m-1} \max\left\{ \frac{1}{1 - 1/e}, \frac{2}{\mu_{\max} \cdot s_i} \right\} \le m \cdot \frac{e}{e-1} + \frac{2}{\mu_{\max}} \sum_{i=1}^{m-1} \frac{1}{s_i}.$$
Adding the expected waiting times until $\mu_{\max}$ islands are involved yields the claimed bound.

In terms of our test functions OneMax, LO, unimodal functions, and $\mathrm{Jump}_k$, this leads to the following result, which can be proven like Theorem 7.

Corollary 3. For the parallel (1+1) EA and the (1+$\lambda$) EA with Scheme A the following holds for a maximum population size $\mu := \mu_{\max} > 1$:

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n \log(n)/\mu_{\max})$ for OneMax, which gives $O(n \log\log n)$ for $\mu_{\max} = \log n$,

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n^2/\mu_{\max})$ for LO, which gives $O(n \log n)$ for $\mu_{\max} = n$,

• $E(T^{\mathrm{par}}_A) = O(d \log \mu_{\max} + dn/\mu_{\max})$ for unimodal functions with $d$ function values, which gives $O(d \log n)$ for $\mu_{\max} = n$,

• $E(T^{\mathrm{par}}_A) = O(n \log \mu_{\max} + n^k/\mu_{\max})$ for $\mathrm{Jump}_k$, which gives $O(nk \log n)$ for $\mu_{\max} = n^{k-1}$.

Note that Corollary 3 has led to an improvement of $E(T^{\mathrm{par}}_A)$ from $O(n \log n)$ to $O(n \log\log n)$ for $\mu_{\max} = \log n$. This obviously also holds in the setting of unrestricted population sizes.

10 Conclusions

We have presented two schemes for adapting the offspring population size in evolutionary algorithms and, more generally, the number of islands in parallel evolutionary algorithms.
Both schemes double the population size in each generation that does not yield an improvement. Despite the exponential growth, the expected sequential optimization time is asymptotically optimal for tight $f$-based partitions. In general, we obtain bounds that are asymptotically equal to upper bounds via the fitness-level method. In terms of the parallel computation time, expected waiting times on a fitness level can be replaced by their logarithms for both schemes, compared to a serial EA. This yields a tremendous speed-up, in particular for functions where finding improvements is difficult.

Scheme B, doubling or halving the population size in each generation, turned out to be more effective than resets to a single island as in Scheme A. This is because B can quickly decrease the population size if necessary. The effort spent while this happens does not affect the asymptotic bounds for expected parallel and sequential times. Apart from our main results, we have introduced the notion of tight $f$-based partitions and new arguments from the amortized analysis of algorithms to the theory of evolutionary algorithms.

An open question is how our schemes perform in situations where the fitness-level method does not provide good upper bounds. In this case our bounds may be off from the real expected running times. In particular, there may be examples where increasing the offspring population size by too much might be detrimental. One constructed function where large offspring populations perform badly was presented in [9]. Future work could characterize function classes for which our schemes are efficient in comparison to the real expected running times. The notion of tight $f$-based partitions is a first step in this direction.

Acknowledgments

The authors would like to thank the German Academic Exchange Service for funding their research.
Part of this work was done while both authors were visiting the International Computer Science Institute in Berkeley, CA, USA. The second author was partially supported by EPSRC grant EP/D052785/1. The authors thank Carola Winzen for many useful suggestions that helped to improve the presentation.

References

[1] E. Alba. Parallel metaheuristics: A new class of algorithms, 2005.

[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press, 2nd edition, 2001.

[3] J. Costa, R. Tavares, and A. Rosa. Experimental study on dynamic random variation of population size. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 1999.

[4] S. Droste, T. Jansen, and I. Wegener. On the analysis of the (1+1) evolutionary algorithm. Theoretical Computer Science, 276:51–81, 2002.

[5] A. Eiben, E. Marchiori, and V. Valko. Evolutionary algorithms with on-the-fly population size adjustment. In Parallel Problem Solving from Nature (PPSN 2004), pages 41–50. Springer, 2004.

[6] N. Hansen, A. Gawelczyk, and A. Ostermeier. Sizing the population with respect to the local progress in (1,λ)-evolution strategies – a theoretical analysis. In 1995 IEEE International Conference on Evolutionary Computation, pages 80–85, 1995.

[7] G. Harik and F. Lobo. A parameter-less genetic algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 1999), pages 258–265, 1999.

[8] M. Herdy. The number of offspring as strategy parameter in hierarchically organized evolution strategies. ACM SIGBIO Newsletter, 13(2):9, 1993.

[9] T. Jansen, K. A. De Jong, and I. Wegener. On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13:413–440, 2005.

[10] J. Lässig and D. Sudholt.
The benefit of migration in parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2010), pages 1105–1112, 2010.

[11] J. Lässig and D. Sudholt. General scheme for analyzing running times of parallel evolutionary algorithms. In 11th International Conference on Parallel Problem Solving from Nature (PPSN 2010), volume 6238 of LNCS, pages 234–243. Springer, 2010.

[12] Z. Michalewicz. Genetic algorithms + data structures = evolution programs. Springer, 1996.

[13] G. Rudolph. How mutation and selection solve long-path problems in polynomial expected time. Evolutionary Computation, 4(2):195–205, 1997.

[14] D. Schlierkamp-Voosen and H. Mühlenbein. Strategy adaptation by competing subpopulations. In Parallel Problem Solving from Nature (PPSN III), pages 199–208, 1994.

[15] H.-P. Schwefel. Numerical optimization of computer models. John Wiley & Sons, Inc., New York, NY, USA, 1981.

[16] D. Sudholt. General lower bounds for the running time of evolutionary algorithms. In 11th International Conference on Parallel Problem Solving from Nature (PPSN 2010), volume 6238 of LNCS, pages 124–133. Springer, 2010.

[17] M. Tomassini. Spatially Structured Evolutionary Algorithms: Artificial Evolution in Space and Time. Springer, 2005.

[18] I. Wegener. Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In R. Sarker, X. Yao, and M. Mohammadian, editors, Evolutionary Optimization, pages 349–369. Kluwer, 2002.
