Adaptive Drift Analysis

We show that, for any c>0, the (1+1) evolutionary algorithm using an arbitrary mutation rate p_n = c/n finds the optimum of a linear objective function over bit strings of length n in expected time Theta(n log n). Previously, this was only known for …

Authors: Benjamin Doerr, Leslie Ann Goldberg

Adaptive Drift Analysis∗

Benjamin Doerr
Max Planck Institute for Computer Science, Campus E1 4, 66123 Saarbrücken, Germany

Leslie Ann Goldberg
Department of Computer Science, University of Liverpool, Ashton Bldg, Liverpool L69 3BX, UK

July 26, 2018

Abstract

We show that, for any $c > 0$, the (1+1) evolutionary algorithm using an arbitrary mutation rate $p_n = c/n$ finds the optimum of a linear objective function over bit strings of length $n$ in expected time $\Theta(n \log n)$. Previously, this was only known for $c \le 1$. Since previous work also shows that universal drift functions cannot exist for $c$ larger than a certain constant, we instead define drift functions which depend crucially on the relevant objective functions (and also on $c$ itself). Using these carefully-constructed drift functions, we prove that the expected optimisation time is $\Theta(n \log n)$. By giving an alternative proof of the multiplicative drift theorem, we also show that our optimisation-time bound holds with high probability.

∗ This work was begun while both authors were visiting the "Centre de Recerca Matemática de Catalunya". It profited greatly from this ideal environment for collaboration. A preliminary announcement of the result (without proofs) appeared in [3]. The work described in this paper was partly supported by EPSRC Research Grant (refs EP/I011528/1) "Computational Counting".

1 Introduction

Drift analysis is central to the field of evolutionary algorithms. This type of analysis was implicit in the work of Droste, Jansen and Wegener [9], who analysed the optimisation of linear functions over bit strings by the classical (1+1) evolutionary algorithm ((1+1) EA) with mutation rate $p_n = 1/n$. The method was made explicit in the work of He and Yao, who gave a simple, clean analysis. Later fundamental applications of drift analysis in the theory of evolutionary computation include [11, 12, 15, 20, 22].

Recent work by Johannsen, Winzen and the first author [6, 7] shows that drift analysis, as it is currently used, relies strongly on the fact that the mutation probabilities $p_n$ are relatively small. As He and Yao observed [17], the analysis in [16] only applies if the mutation probability $p_n$ is strictly smaller than $1/n$, where $n$ is the length of the bit strings of the search space. This restriction was improved in [18], where a family of drift functions was presented that works for the most common mutation probability $p_n = 1/n$.¹ However, as Doerr et al. have observed [7], this family of drift functions still ceases to work for $p_n \ge 4/n$. Furthermore [7], if $p_n > 4/n$, then for any universal family of drift functions (from the class of log-of-linear functions) there is a linear objective function $f$, and a search space element $x$, such that the drift from $x$ is negative (so the proof that the (1+1) EA converges quickly does not go through). Doerr et al. have also shown [6] that this problem cannot be fixed by applying the averaging approach of Jägersküpper [19]: that approach fails for $p_n \ge 7/n$.

Thus, prior to the work presented here, it was an open problem whether the (1+1) EA minimises linear objective functions over bit strings in $O(n \log n)$ time when the mutation probability is $p_n = c/n$ for $c \ge 7$. Our main result shows that this is the case. Since it is known that no universal family of drift functions exists, we instead manage to define a feasible family of drift functions in such a way that the drift function $\Phi_f$ depends crucially on the objective function $f$. Using this idea, we show (see Theorem 7) that, for any constant $c$, the (1+1) EA with mutation probability $p_n = c/n$ optimises any family of linear objective functions over bit strings in expected time $O(n \log n)$.
A corresponding lower bound follows easily from standard arguments, see Theorem 19. Thus, our result is as good as possible (up to a constant factor).

By reproving a multiplicative drift theorem (which was first used to analyse evolutionary algorithms in [7]), we also show that our bound on the optimisation time holds with high probability. The tail bounds in our drift theorem can also be used to show that many other known bounds on optimisation times also hold with high probability. This has been done for the (1+1) EA finding minimum spanning trees, computing shortest paths or Eulerian cycles in [4].

2 Drift Analysis

In this section, we give a brief description of drift analysis, which is sufficient for our purposes. For a more general background to drift analysis, we refer to the papers cited above.

2.1 The (1+1) evolutionary algorithm

Let $\mathcal{F}$ be a set of objective functions. Each $f \in \mathcal{F}$ is associated with a problem size $n(f) \in \mathbb{N}$ and is a function from the search space $\Omega_f$ to $\mathbb{R}_{\ge 0}$. Given $f$, the goal is to find an element $x \in \Omega_f$ such that $f(x)$ is minimised. Our assumption that the optimisation problem is minimisation (as opposed to maximisation) is without loss of generality, as is our assumption that the range of each objective function contains only non-negative numbers. For each objective function $f$, let $\Omega_{\mathrm{opt},f} \subseteq \Omega_f$ denote the set of optimal search points, that is, those that minimise the value of $f$.

¹ Note, though, that in that paper an EA only accepting strict improvements was analysed; this fact was exploited in the proof. We have little doubt, though, that their proof can be adapted to work also for the more common setting that also an offspring with equal fitness is accepted.

Definition 1. We say that $\mathcal{F}$ is a family of objective functions over bit strings if, for every $f \in \mathcal{F}$, $\Omega_f = \{0,1\}^{n(f)}$.
In this case, an element $x \in \Omega_f$ is a string of $n(f)$ bits, $x = x_{n(f)} \ldots x_1$.

Definition 2. Suppose that $\mathcal{F}$ is a family of objective functions over bit strings. We say that $\mathcal{F}$ is linear if each $f \in \mathcal{F}$ is of the form $f(x) = \sum_{i=1}^{n(f)} a_i x_i$, where the coefficients $a_i$ are real numbers. Without loss of generality, we assume that $a_{i+1} \ge a_i > 0$ for all $i \in \{1, \ldots, n(f)-1\}$.

Example 1. Suppose, for $n \in \mathbb{N}$, that $f_n : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ is defined by $f_n(x_n \ldots x_1) = \sum_{i=1}^{n} 2^{i-1} x_i$. Then $\mathcal{F} = \{f_n\}$ is a linear family of objective functions over bit strings. The value of $f_n(x)$ is the binary value of the bit string $x = x_n \ldots x_1$.

Example 2. Suppose, for $n \in \mathbb{N}$, that $f_n : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ is defined by $f_n(x_n \ldots x_1) = \sum_{i=1}^{n} x_i$. Then $\mathcal{F} = \{f_n\}$ is a linear family of objective functions over bit strings. The value of $f_n(x)$ is the number of ones in the bit string $x = x_n \ldots x_1$.

The randomised search heuristic that we study is the well-known (1+1) EA. To emphasize the role of the parameters, we refer to this algorithm as the (1+1) EA for minimising $\mathcal{F}$. Given an objective function $f \in \mathcal{F}$, this algorithm starts with an initial solution $x$, chosen uniformly at random from the search space $\Omega_f$. In each iteration, from its existing solution $x$, it generates a new solution $x'$ by mutation.

Definition 3. Suppose $\mathcal{F}$ is a family of objective functions over bit strings and that $p_n \in [0,1]$ for $n \in \mathbb{N}$. In independent bit mutation, each bit $x_i$ of $x$ is flipped independently with probability $p_n$. In other words, for each $i \in \{1, \ldots, n\}$ independently, we have $\Pr(x'_i = 1 - x_i) = p_n$ and $\Pr(x'_i = x_i) = 1 - p_n$. Often, $p_n = 1/n$, but we do not make this assumption.

In the subsequent selection step, if $f(x') \le f(x)$, the EA accepts the solution $x'$, meaning that the next iteration starts with $x_{\mathrm{new}} := x'$.
Otherwise, the next iteration starts with $x_{\mathrm{new}} := x$. Since we are interested in determining the number of iterations that are necessary to find an optimal solution, we do not specify a termination criterion here. A pseudo-code description of the (1+1) EA is given in Algorithm 1.

Note that the (1+1) EA is not typically used to solve difficult optimisation problems in practice. There are other, more complex, search heuristics which are better for such problems in practice. However, understanding the optimisation behaviour of the (1+1) EA often helps us to predict the optimisation behaviour of more complicated EAs (which are mostly too complex to allow rigorous theoretical analysis). As such, the (1+1) EA proved to be an important tool that attracted significant research efforts (see, e.g., [1, 8, 9] for some early works).

Algorithm 1: The (1+1) EA for minimising $\mathcal{F}$ over bit strings with independent bit mutation
  Input: an objective function $f \in \mathcal{F}$.
  Initialization: Choose $x \in \{0,1\}^{n(f)}$ uniformly at random.
  repeat forever
    Create $x' \in \{0,1\}^{n(f)}$ by copying $x$.
    Mutation: Flip each bit in $x'$ independently with probability $p_{n(f)}$.
    Selection: if $f(x') \le f(x)$ then $x := x'$.

2.2 A simple drift theorem with tail bounds

The optimisation time of the (1+1) EA for minimising $\mathcal{F}$ is defined to be the number of times that the objective function is evaluated before the optimum is found. This is (apart from an additive deviation of one) equal to the number of mutation-selection iterations. Suppose that $c$ is a positive constant and that $\mathcal{F}$ is a family of linear objective functions over bit strings. Our main result (Theorem 7) shows that the (1+1) EA for minimising $\mathcal{F}$ with independent bit-mutation rate $p_n = c/n$ has expected optimisation time $O(n(f) \log n(f))$.
It also shows that, with high probability, the optimisation time is of this order of magnitude. In order to prove the main result, we introduce the notion of piece-wise polynomial drift. This will be explained in Section 2.5. In this section, we prepare the groundwork by introducing the basic drift theorems that we will need.

We start by defining the notion of a feasible family of drift functions. When feasible families of drift functions exist, they allow an elegant analysis yielding upper bounds for the optimisation time of EAs.

Definition 4. Let $\nu : \mathbb{N} \to \mathbb{R}_{\ge 0}$ be monotonically increasing and consider a family $\mathcal{F}$ of objective functions. For each $f \in \mathcal{F}$, let $\Phi_f$ be a function from $\Omega_f$ to $\mathbb{R}_{\ge 0}$. We say that $\Phi = \{\Phi_f\}$ is a $\nu$-feasible family of drift functions for a (1+1) EA for minimising $\mathcal{F}$, if there is an $n_0 \in \mathbb{N}$ such that, for every $f \in \mathcal{F}$ with $n(f) \ge n_0$, the following conditions are satisfied.
1. $\Phi_f(x) = 0$ for all $x \in \Omega_{\mathrm{opt},f}$;
2. $\Phi_f(x) \ge 1$ for all $x \in \Omega_f \setminus \Omega_{\mathrm{opt},f}$;
3. for all $x \in \Omega_f \setminus \Omega_{\mathrm{opt},f}$,
$$E[\Phi_f(x_{\mathrm{new}})] \le \left(1 - \frac{1}{\nu(n(f))}\right) \Phi_f(x),$$
where, as above, we denote by $x_{\mathrm{new}}$ the solution resulting from executing a single iteration (consisting of mutation and selection) with initial solution $x$.

Here is a simple example.

Example 3. Fix a positive constant $c$. Let $\mathcal{F}$ be a linear family of objective functions over bit strings and consider the (1+1) EA for minimising $\mathcal{F}$ which uses independent bit mutation with $p_n = c/n$. Suppose that, for each $f \in \mathcal{F}$, the coefficient $a_1$ is at least 1. Then the trivial family $\Phi$ with $\Phi_f = f$ is an $(n/c')$-feasible family of drift functions for this EA, where $c' := c(1 - (c/n))^{n-1} \approx c e^{-c}$. However, as we shall see, this is not often a very useful family of drift functions.
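To make Algorithm 1 and the setting of Example 3 concrete, the following Python sketch simulates the (1+1) EA with independent bit mutation at rate $c/n$ on a linear objective function. It is an illustration only: the names `linear` and `one_plus_one_ea`, the safety cap `max_iters`, and the use of Python's `random` module are assumptions of this sketch and do not come from the paper.

```python
import random

def linear(coeffs):
    """Return f(x) = sum_i a_i x_i for the given positive coefficients."""
    return lambda x: sum(a for a, bit in zip(coeffs, x) if bit)

def one_plus_one_ea(f, n, c=1.0, rng=None, max_iters=10**7):
    """Run the (1+1) EA minimising f over {0,1}^n with mutation rate c/n.

    Returns the number of mutation-selection iterations until f(x) == 0,
    which is the optimum of a linear f with positive coefficients.
    max_iters is a safety cap added for this sketch only.
    """
    rng = rng or random.Random()
    p = c / n
    x = [rng.randint(0, 1) for _ in range(n)]  # uniform initial solution
    fx = f(x)
    iters = 0
    while fx > 0 and iters < max_iters:
        # Mutation: flip each bit independently with probability p.
        y = [bit ^ (rng.random() < p) for bit in x]
        fy = f(y)
        # Selection: accept the offspring if it is no worse.
        if fy <= fx:
            x, fx = y, fy
        iters += 1
    return iters

if __name__ == "__main__":
    n = 50
    one_max = linear([1] * n)                     # Example 2
    bin_val = linear([2 ** i for i in range(n)])  # Example 1
    print(one_plus_one_ea(one_max, n, c=2.0, rng=random.Random(0)))
    print(one_plus_one_ea(bin_val, n, c=2.0, rng=random.Random(0)))
```

Averaging the returned iteration counts over many seeds gives an empirical estimate of the expected optimisation time for a chosen $c$ and $n$.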
The following well-known theorem (Theorem 5, below) shows how the optimisation time can be bounded using a drift function. Similar arguments appear in the context of coupling proofs. See, for example, [10, Section 5]. Much more is known about drift analysis. See, for example, [14]. Note that Theorem 5 gives a probability tail bound in addition to an upper bound on the expected optimisation time. The tail bound is not new, but it seems to be unknown in the evolutionary algorithms literature. It can be applied to improve several previous results (see [4]).

Theorem 5. Consider a family $\mathcal{F}$ of objective functions and a $\nu$-feasible family $\Phi$ of drift functions for a (1+1) EA for minimising $\mathcal{F}$. Let $\Phi_{\max,f}$ denote $\max\{\Phi_f(x) \mid x \in \Omega_f\}$. Then there is an $n_1 \in \mathbb{N}$ such that, for every $f \in \mathcal{F}$ with $n(f) \ge n_1$, the expected optimisation time of the EA is at most
$$\nu(n(f))\,(\ln \Phi_{\max,f} + 1).$$
Also, for any $\lambda > 0$, the probability that the optimisation time exceeds
$$\lceil \nu(n(f))\,(\ln \Phi_{\max,f} + \lambda) \rceil$$
is at most $\exp(-\lambda)$.

Proof. Let $n_0$ be the value from Definition 4. Definition 4 rules out the possibility that $\max\{\nu(n) \mid n \ge n_0\} < 1$. Also, if $\max\{\nu(n) \mid n \ge n_0\} = 1$ then, from part (3) of the definition, $E[\Phi_f(x_{\mathrm{new}})] = 0$ so the optimisation time is 1. Suppose then, that there is an $n \in \mathbb{N}$ such that $\nu(n) > 1$. Let $n'_0$ be $\min\{n \in \mathbb{N} \mid \nu(n) > 1\}$ (actually, it would suffice to take $n'_0$ to be any member of this set, but, for concreteness, we take the minimum). Let $n_1 = \max(n_0, n'_0)$. Now consider any $f \in \mathcal{F}$ with $n(f) \ge n_1$ and note that the first two conditions in Definition 4 are satisfied. Let $n = n(f)$. Fix an arbitrary initial solution $x_0 \in \Omega_f$. Consider starting the EA with this initial solution $x_0$ instead of choosing a random one. Denote by $\Phi^{[t]}$ the value of $\Phi_f(x)$ after $t$ selection-mutation steps.
Denote by $T_{\mathrm{opt},x_0}$ the first time when the current solution $x$ is optimal. Thus, from Definition 4, $\Phi^{[T_{\mathrm{opt},x_0}]} = 0$, and for $t < T_{\mathrm{opt},x_0}$, we have $\Phi^{[t]} \ge 1$. From the third condition in Definition 4,
$$E[\Phi^{[t]}] \le (1 - 1/\nu(n))^t\, \Phi^{[0]} \le (1 - 1/\nu(n))^t\, \Phi_{\max,f} \le \exp(-t/\nu(n))\, \Phi_{\max,f},$$
where, in the last estimate, we used the well-known inequality $1 + z \le e^z$, which is valid for all $z \in \mathbb{R}$.

It is well known (see, for example, [13, Problem 13(a), Section 3.11]) that if $X$ is a random variable taking values in the non-negative integers, then $E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i)$. Therefore, the expected optimisation time $E[T_{\mathrm{opt},x_0}]$ can be written as
$$E[T_{\mathrm{opt},x_0}] = \sum_{i \ge 1} \Pr(T_{\mathrm{opt},x_0} \ge i) = \sum_{t \ge 0} \Pr(\Phi^{[t]} > 0).$$
So, for any non-negative integer $T$, $E[T_{\mathrm{opt},x_0}] \le T + \sum_{t \ge T} \Pr(\Phi^{[t]} > 0)$. Since, by Markov's inequality, $\Pr(\Phi^{[t]} > 0) = \Pr(\Phi^{[t]} \ge 1) \le E[\Phi^{[t]}]$,
$$E[T_{\mathrm{opt},x_0}] \le T + \sum_{t \ge T} E[\Phi^{[t]}].$$
Now let $T = \lceil \ln(\Phi_{\max,f})\, \nu(n) \rceil = \ln(\Phi_{\max,f})\, \nu(n) + \varepsilon$ for some $0 \le \varepsilon < 1$. By our upper bounds above, we obtain
$$E[T_{\mathrm{opt},x_0}] \le T + (1 - 1/\nu(n))^T\, \Phi_{\max,f} \sum_{i=0}^{\infty} (1 - 1/\nu(n))^i.$$
Since $\nu(n) > 1$, $\sum_{i=0}^{\infty} (1 - 1/\nu(n))^i = \nu(n)$. Plugging this in with the definition of $T$ and using $(1 - 1/\nu(n))^{\ln(\Phi_{\max,f})\, \nu(n)} \le \exp(-\ln(\Phi_{\max,f})) = 1/\Phi_{\max,f}$,
$$E[T_{\mathrm{opt},x_0}] \le \ln(\Phi_{\max,f})\, \nu(n) + \varepsilon + (1 - 1/\nu(n))^{\varepsilon}\, \nu(n) = \nu(n) \left( \ln(\Phi_{\max,f}) + \varepsilon/\nu(n) + (1 - 1/\nu(n))^{\varepsilon} \right).$$
We can now check, for every $\varepsilon \in [0,1]$, that $\varepsilon/\nu(n) + (1 - 1/\nu(n))^{\varepsilon} \le 1$, as required. This is most easily seen by checking it for $\varepsilon = 0$ and $\varepsilon = 1$ and noting that the term is convex in $\varepsilon$.

Finally, let $T' := \lceil \nu(n)\,(\ln(\Phi_{\max,f}) + \lambda) \rceil$ for $\lambda > 0$. We compute
$$\Pr(T_{\mathrm{opt},x_0} > T') = \Pr(\Phi^{[T']} > 0) \le E[\Phi^{[T']}] \le \exp(-T'/\nu(n))\, \Phi_{\max,f} \le \exp(-\lambda).$$
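As a numerical illustration of Theorem 5, the sketch below evaluates the expectation bound $\nu(\ln \Phi_{\max,f} + 1)$ and the tail threshold $\lceil \nu(\ln \Phi_{\max,f} + \lambda) \rceil$ for the trivial drift function $\Phi_f = f$ of Example 3, where $\nu(n) = n/c'$ and $c' = c(1 - c/n)^{n-1}$. The helper names and the particular values $n = 100$, $c = 2$, $\lambda = 5$ are assumptions chosen for illustration; both test functions (Examples 1 and 2) have $a_1 = 1$, as Example 3 requires.

```python
import math

def nu_trivial(n, c):
    """nu(n) = n / c' with c' = c (1 - c/n)^(n-1), as in Example 3."""
    c_prime = c * (1.0 - c / n) ** (n - 1)
    return n / c_prime

def expected_time_bound(nu, phi_max):
    """Theorem 5: expected optimisation time is at most nu (ln phi_max + 1)."""
    return nu * (math.log(phi_max) + 1.0)

def tail_threshold(nu, phi_max, lam):
    """Theorem 5: this step count is exceeded with probability <= exp(-lam)."""
    return math.ceil(nu * (math.log(phi_max) + lam))

n, c = 100, 2.0
nu = nu_trivial(n, c)
one_max_bound = expected_time_bound(nu, n)           # f_max = n (Example 2)
bin_val_bound = expected_time_bound(nu, 2 ** n - 1)  # f_max = 2^n - 1 (Example 1)
print(one_max_bound, bin_val_bound, tail_threshold(nu, n, lam=5.0))
```

For the function of Example 1 the term $\ln \Phi_{\max,f}$ grows linearly in $n$, so with this trivial drift function the bound is quadratic in $n$ rather than of order $n \log n$.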
The proof above uses the argument $E[\Phi^{[t]}] \le (1 - 1/\nu(n))^t\, \Phi_{\max,f}$. This had been used previously in the so-called methods of expected weight decrease [21]. There, however, it was followed up with a simple Markov inequality argument that led to a bound on the expected run-time that is weaker (by a constant factor) than what our drift theorem yields. Hence the main difference between the two approaches is that ours gives a better transformation of the drift of $E[\Phi^{[t]}]$ into a bound on $E[\min\{t \mid \Phi^{[t]} < 1\}]$. Note, just to avoid misunderstandings, that typically $E[\min\{t \mid \Phi^{[t]} < 1\}]$ and $\min\{t \mid E[\Phi^{[t]}] < 1\}$ are different quantities.

Theorem 5 indicates that a family of drift functions is better if the maximum values $\Phi_{\max,f}$ are small. In Example 3, taking $\Phi_f = f$ only yields an upper bound $O(n(f) \log f_{\max})$ for the expected optimisation time, where $f_{\max} = \max\{f(x) \mid x \in \Omega_f\}$. This can be a weak bound. For example, applying it to the family $\mathcal{F}$ from Example 1, where $f_{\max} = 2^{n(f)} - 1$, yields a bound $O(n(f)^2)$ for the expected optimisation time (which, as we shall see, is a weak bound).

2.3 Drift analysis for linear objective functions over bit strings

The main goal of this paper is to analyse the optimisation time of the (1+1) EA for minimising a linear family $\mathcal{F}$ of objective functions over bit strings, assuming independent-bit mutation with $p_n = c/n$ (for a fixed constant $c$). The reason for assuming $p_n = c/n$ is that results of Droste, Jansen and Wegener (Theorems 13 and 14 in [9]) show that this is the optimal order of magnitude. Since our objective is an $O(n(f) \log n(f))$ bound on optimisation time, we ease the language with the following definition.

Definition 6. A feasible family of drift functions is a family of drift functions which is $\nu$-feasible for a function $\nu(n) = O(n)$.
Finding feasible drift functions is typically quite tricky. Doerr, Johannsen and Winzen built on earlier ideas of Droste, Jansen and Wegener [9] and He and Yao [18] in order to show that, for any linear family $\mathcal{F}$ of objective functions over bit strings, the family $\Phi$ defined by
$$\Phi_f(x) = \sum_{i=1}^{\lfloor n(f)/2 \rfloor} x_i + \frac{5}{4} \sum_{i=\lfloor n(f)/2 \rfloor + 1}^{n(f)} x_i$$
is a feasible family of drift functions for the (1+1) EA for minimising $\mathcal{F}$ which uses independent bit mutation with $p_n = 1/n$. (Thus, this suffices for the case $c = 1$.) This family $\Phi = \{\Phi_f\}$ is said to be a universal family of feasible drift functions because $\Phi_f$ depends on $n(f)$, but not otherwise on $f$. Since $\Phi_{\max,f} = \Theta(n(f))$, this gives an expected optimisation time of $O(n(f) \log n(f))$, which is asymptotically optimal [9]. Proving that this $\Phi$ is a feasible family, while not trivial, is not overly complicated.

This discovery of a universal family of feasible drift functions gives an elegant analysis of the EA. Unfortunately, even if we allow $\Phi_{\max,f}$ to grow faster than $\Theta(n(f))$, such universal families of feasible drift functions only exist when $c$ is small (as noted in the introduction to this paper). For larger values of $c$, the function $\Phi_f$ has to depend upon $f$. Prior to this paper, no non-trivial drift functions of this form were known, so it was an open problem whether the $O(n(f) \log n(f))$ time bound also applies for $c > 1$. We show that this is the case.

2.4 Our result

Our main theorem is as follows.

Theorem 7. Let $c$ be a positive constant. Let $\mathcal{F}$ be a family of linear objective functions over bit strings. The (1+1) EA for minimising $\mathcal{F}$ with independent bit-mutation rate $p_n = c/n$ has expected optimisation time $O(n(f) \log n(f))$.
There is a constant $k$ and a function $\nu(n) = O(n)$ such that, for any $\lambda > 0$, the probability that the optimisation time exceeds this bound by $k \nu(n) \lambda$ time steps is at most $k \exp(-\lambda)$.

We prove Theorem 7 by constructing a feasible family of drift functions for the EA that is piece-wise polynomial (a notion that will be defined in Section 2.5). Lemma 9 extends Theorem 5 to piece-wise polynomial feasible families of drift functions, allowing us to prove Theorem 7.

Theorem 7 is interesting for two reasons. On the methodological side, the proof of the theorem greatly enlarges our understanding of how to choose good drift functions. This might enable better solutions for some problems where drift analysis has not yet been very successful. Examples are the minimum spanning tree problem [21] and the single-criteria formulation of the single-source shortest path problem [2]. For both problems, the known bounds on the expected optimisation time contain a $\log(f_{\max})$-factor, stemming from the fact that, at least implicitly, drift analysis with the trivial family of drift functions with $\Phi_f = f$ is conducted.

Of course, our result is also interesting because it shows for the first time that linear functions are optimised by the (1+1) EA in time $O(n(f) \log n(f))$, regardless of what mutation probability $p_n = c/n$ is used. Note that this is not obvious. In [5], the authors show that already for monotone functions, a constant factor change in the mutation probability can change the optimisation time from polynomial to exponential.

2.5 Piece-wise polynomial drift

Let $\mathcal{F}$ be a family of linear objective functions over bit strings. Let $\Phi$ be a feasible family of drift functions for a (1+1)-EA for minimising $\mathcal{F}$.
We start with an elementary observation about $\Phi$, which is that, in order to obtain an $O(n(f) \log n(f))$ bound on the expected optimisation time, we do not really need $\Phi_{\max,f}$ to be bounded from above by a polynomial in $n(f)$: we can afford to have a constant number of "huge jumps". The following arguments can be seen as a variation of the fitness level method [23].

Definition 8. Fix $k \in \mathbb{N}$. Suppose that, for every $f \in \mathcal{F}$, $\mathcal{M}^f = M^f_0, \ldots, M^f_k$ is a partition of $\Omega_f$. Let $\mathcal{M} = \{\mathcal{M}^f \mid f \in \mathcal{F}\}$. We say $\mathcal{M}$ is a family of fitness-based $k$-partitions for $\mathcal{F}$ if for all $f \in \mathcal{F}$,
1. $M^f_0 = \{\mathbf{0}\}$,
2. for all $i < j$, $x \in M^f_i$ and $y \in M^f_j$, we have $f(x) < f(y)$.

We use the notation $\min \Phi_f(M^f_j)$ to denote $\min\{\Phi_f(x) \mid x \in M^f_j\}$ and the notation $\max \Phi_f(M^f_j)$ to denote $\max\{\Phi_f(x) \mid x \in M^f_j\}$.

Lemma 9. Let $\mathcal{F}$ be a family of linear objective functions over bit strings. Let $\Phi$ be a $\nu$-feasible family of drift functions for a (1+1)-EA for minimising $\mathcal{F}$. Let $\mathcal{M}$ be a family of fitness-based $k$-partitions for $\mathcal{F}$. Then there is an $n_1 \in \mathbb{N}$ such that, for every $f \in \mathcal{F}$ with $n(f) \ge n_1$, the expected optimisation time of the EA is at most
$$\nu(n(f)) \sum_{j=1}^{k} \left( \ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) + 1 \right).$$
Also, for any $\lambda > 0$, the probability that the optimisation time exceeds
$$\sum_{j=1}^{k} \left\lceil \nu(n(f)) \left( \ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) + \lambda \right) \right\rceil$$
is at most $k \exp(-\lambda)$.

Proof. Let $n_1$ be the quantity in Theorem 5 (which is at least as large as the quantity $n_0$ in Definition 4). Let $f \in \mathcal{F}$ with $n(f) \ge n_1$. For $0 \le j \le k$, let $\Omega_{f,j} = \bigcup_{\ell=0}^{j} M^f_\ell$ and let $\mu_{f,j} = \min \Phi_f(M^f_j)$. For $1 \le j \le k$, define $\Psi_{f,j} : \Omega_{f,j} \to \mathbb{R}$ as follows. If $\Phi_f(x) \ge \mu_{f,j}$ then $\Psi_{f,j}(x) = \Phi_f(x)/\mu_{f,j}$. Otherwise, $\Psi_{f,j}(x) = 0$. Now for $j \in \{1, \ldots, k\}$, consider restricting the search space to $\Omega_{f,j}$.
Since the partition $\mathcal{M}^f$ is fitness based, we conclude that, if the EA is started with input $f$, and an initial solution in $\Omega_{f,j}$, all new solutions that are accepted by the EA are in $\Omega_{f,j}$. Considering all solutions in $\Omega_{f,j-1}$ to be equivalent to the all-zero state $\mathbf{0}$, we note that $\{\Psi_{f,j} \mid f \in \mathcal{F}\}$ satisfies the first two conditions of being a $\nu$-feasible family of drift functions for $\mathcal{F}$ on $\{\Omega_{f,j}\}$. Also, if $\Phi_f(x) \ge \mu_{f,j}$ then $E[\Phi_f(x_{\mathrm{new}})] \le (1 - 1/\nu(n(f)))\, \Phi_f(x)$ so
$$E[\Psi_{f,j}(x_{\mathrm{new}})] \le E[\Phi_f(x_{\mathrm{new}})/\mu_{f,j}] \le (1 - 1/\nu(n(f)))\, \Psi_{f,j}(x).$$
So, by Theorem 5, the expected time until a solution in $\Omega_{f,j-1}$ is reached is at most
$$\nu(n(f)) \left( 1 + \ln \max\{\Psi_{f,j}(x) \mid x \in \Omega_{f,j}\} \right),$$
which is at most
$$\nu(n(f)) \left( 1 + \ln \frac{\max \Phi_f(M^f_j)}{\min \Phi_f(M^f_j)} \right).$$
This gives the desired result, summing from $j = k$ down to $j = 1$. For the high probability statement, again from Theorem 5, we conclude that with probability at least $1 - \exp(-\lambda)$,
$$\left\lceil \nu(n(f)) \left( \ln \frac{\max \Phi_f(M^f_j)}{\min \Phi_f(M^f_j)} + \lambda \right) \right\rceil$$
iterations suffice to go from a solution in $\Omega_{f,j}$ to one in $\Omega_{f,j-1}$. A union bound over the $k$ parts then gives the failure probability $k \exp(-\lambda)$.

Definition 10. Suppose that $\Phi$ is a family of feasible drift functions for $\mathcal{F}$. We will say that $\Phi$ is piece-wise polynomial (with respect to the (1+1)-EA), if there is a constant $k$ and a family $\mathcal{M}$ of fitness-based $k$-partitions for $\mathcal{F}$ such that for every $j \in \{1, \ldots, k\}$, $\ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) = O(\log n(f))$.

If $\Phi$ is a family of feasible drift functions for a (1+1)-EA for minimising $\mathcal{F}$, and $\Phi$ is piece-wise polynomial with respect to the EA, then the optimisation time bound given by Lemma 9 is $O(n(f) \log n(f))$.

3 Construction of the Drift Function

Let $\mathcal{F}$ be a linear family of objective functions over bit strings (see Definition 2). Fix a constant $c$ and consider the (1+1) EA for minimising $\mathcal{F}$ with independent bit-mutation rate $p_n = c/n$.
We aim to construct a family $\Phi$ of feasible drift functions for the EA which is piece-wise polynomial with respect to the EA.

3.1 Notation and parameters

Recall from Definition 1 that $\Omega_f = \{0,1\}^{n(f)}$ and that an element $x \in \Omega_f$ is written as a string of $n(f)$ bits, $x = x_{n(f)} \ldots x_1$. In the proof, we shall often use the word "left" to refer to the most-significant bit (with the largest index, index $n(f)$) of $x$ and "right" to refer to the least-significant bit (with the smallest index, index 1).

The proof will use several parameters, which we discuss here. We start by fixing an arbitrarily small positive constant $\varepsilon$. This constant will be used to formulate the intermediate results precisely. To define the family $\Phi$, we will use a sufficiently large constant $K \ge 1$ (depending on $c$ and $\varepsilon$) and a sufficiently small positive constant $\gamma$ (depending on $c$, $\varepsilon$ and $K$).

3.2 Splitting into blocks

The difficulty in defining a suitable drift function $\Phi_f$ is that the optimisation of $f$ via the EA heavily depends on the coefficients $a_i$. If these are steeply increasing, as in Example 1, whether a new solution is accepted or not is determined by the value of the leftmost bit that is flipped. On the other hand, if these are of comparable size, as in Example 2, the difference between the number of "good" bit-flips (turning a 1 into a 0) and the number of "bad" bit-flips (turning a 0 into a 1) determines whether a new solution is accepted. Of course, the precise definitions of "steeply increasing" and "comparable size" depend on the constant $c$ in the mutation probability. Also, an objective function $f$ can be of a mixed type, having regions with steeply increasing coefficients and also regions where coefficients are of comparable size.

Fix an objective function $f$ with $n(f) = n$.
To analyse $f$ and define the corresponding drift function $\Phi_f$, we split the bit positions $\{1, \ldots, n\}$ into blocks. The idea is that, within a block, one of the two behaviours is dominant. The definition of blocks, naturally, has to allow us to analyse the interaction between different blocks.

We first split the bit positions $\{1, \ldots, n\}$ into miniblocks. Start with $j = 1$. A miniblock starting at bit position $j$ is constructed as follows. If $a_n/a_j < n^2$, then $\{j, \ldots, n\}$ is a single miniblock. Otherwise, let $i$ be the minimum value in $\{j+1, \ldots, n\}$ such that $a_i/a_j \ge n^2$. Then the set $\{j, \ldots, i\}$ is a miniblock. If $i = n$, we are finished. Otherwise, set $j = i$ and repeat to form the next miniblock, starting at bit position $j$. Note that consecutive miniblocks overlap by one bit position.

The next thing that we do is merge consecutive pairs of miniblocks into blocks. To start out with, we just go through the miniblocks from right to left, making a block out of each pair of miniblocks. Note that this is (intentionally) different from just defining blocks analogous to miniblocks with the $n^2$ replaced by $n^4$. Note further that again consecutive blocks overlap in one bit position.

A block is said to be long if it contains at least $\gamma n$ bit positions (recall that the parameter $\gamma$ is from Section 3.1) and short otherwise. It helps our analysis if any pair of long blocks has at least three short blocks in between. So if two long blocks are separated by at most two short blocks, then we combine the whole thing into a single long block. We repeat this (at most a constant number of times since there are fewer than $1/\gamma$ long blocks initially) until all remaining long blocks are separated by at least three short blocks.
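The splitting procedure just described can be sketched in Python as follows. This is a reading aid under stated assumptions, not the authors' implementation: blocks are represented as pairs (rightmost position, leftmost position) listed from right to left, the function names are hypothetical, and the handling of an unpaired final miniblock (it becomes a block on its own) is an assumption, since the text does not spell out that case.

```python
def miniblocks(a):
    """Split positions 1..n into miniblocks; a[i-1] holds the coefficient a_i.

    Returns (right, left) position pairs ordered from position 1 upwards.
    Consecutive miniblocks share exactly one bit position.
    """
    n = len(a)
    threshold = n ** 2
    minis, j = [], 1
    while True:
        # a_n / a_j < n^2, written without division to stay exact for integers
        if j == n or a[n - 1] < threshold * a[j - 1]:
            minis.append((j, n))
            return minis
        # smallest i > j with a_i / a_j >= n^2 (exists because a_n qualifies)
        i = next(i for i in range(j + 1, n + 1)
                 if a[i - 1] >= threshold * a[j - 1])
        minis.append((j, i))
        if i == n:
            return minis
        j = i

def pair_into_blocks(minis):
    """Merge consecutive pairs of miniblocks, going from right to left.

    Assumption of this sketch: an unpaired last miniblock forms its own block.
    """
    blocks = []
    for k in range(0, len(minis), 2):
        right = minis[k][0]
        left = minis[min(k + 1, len(minis) - 1)][1]
        blocks.append((right, left))
    return blocks

def merge_long_blocks(blocks, n, gamma):
    """Merge long blocks that are separated by at most two short blocks."""
    def is_long(block):
        return block[1] - block[0] + 1 >= gamma * n

    blocks = list(blocks)
    merged = True
    while merged:
        merged = False
        long_idx = [k for k, b in enumerate(blocks) if is_long(b)]
        for p, q in zip(long_idx, long_idx[1:]):
            if q - p - 1 <= 2:
                # Combine both long blocks and everything between them.
                blocks[p:q + 1] = [(blocks[p][0], blocks[q][1])]
                merged = True
                break
    return blocks

def build_blocks(a, gamma):
    """Full pipeline: miniblocks, pairing, then merging of nearby long blocks."""
    return merge_long_blocks(pair_into_blocks(miniblocks(a)), len(a), gamma)
```

For equal coefficients (Example 2) the whole index range forms a single miniblock and hence a single block, whereas sufficiently steep coefficients produce many short miniblocks.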
We will use $\ell_B$ to denote the leftmost bit position in block $B$ and $r_B$ to denote the rightmost bit position in block $B$. As long as $B$ is not the leftmost block, we have $a_{\ell_B}/a_{r_B} \ge n^4$.

3.3 Definition of $\Phi_f$

We will define weights $w_1, \ldots, w_n \in \mathbb{R}$ such that $\Phi_f(x) = \sum_{i=1}^{n} w_i x_i$. We call the $w_i$ weights to distinguish them from the coefficients $a_1, \ldots, a_n$ of $f$.

We define the weights $w_1, \ldots, w_n$ as follows, starting with $w_1 = 1$. Suppose that bit position $i$ is in block $B$, that $i \ne r_B$, and that $w_{r_B}$ is already defined. If block $B$ is a long block, or is immediately to the left of a long block, then we define $w_i$ by
$$w_i = w_{r_B}\, a_i/a_{r_B}.$$
We call this the copy regime since $w_i/w_{r_B} = a_i/a_{r_B}$. Otherwise, we are in the damped regime and we define $w_i$ by
$$w_i = w_{r_B} \min\{K^{(i - r_B)c/n},\ a_i/a_{r_B}\},$$
where $K$ is the parameter from Section 3.1.

It will be a major effort in the remainder of the paper to show that this $\{\Phi_f \mid f \in \mathcal{F}\}$ is a feasible family of drift functions for the EA. It is easier to see that $\{\Phi_f\}$ is piece-wise polynomial with respect to the EA, so we do this next.

Lemma 11. Let $\mathcal{F}$ be a linear family of objective functions over bit strings. Consider the (1+1) EA for minimising $\mathcal{F}$ with independent bit-mutation rate $p_n = c/n$. The family $\Phi = \{\Phi_f\}$ of drift functions constructed above is piece-wise polynomial with respect to the EA.

Proof. Let $k = 6 \lceil 1/\gamma \rceil + 1$. We now construct a family of fitness-based $k$-partitions for $\mathcal{F}$. Let $f$ be an objective function in $\mathcal{F}$ and let $n = n(f)$. We now define the partition $\mathcal{M}^f$. We call a bit position $i \in [2..n]$ a jump (for the objective function $f$) if
• $i$ is in a copy regime, and
• $w_i/w_{i-1} > n^2$.
By the construction of the blocks, bit position $i$ is the leftmost bit position of a miniblock contained in either (1) a long block, or (2) a short block immediately to the left of a long block. Since there are at most $\lceil 1/\gamma \rceil$ long blocks, there are at most $k - 1$ jumps. (The easiest way to see this is to think about the original long blocks, prior to any merges. Each block contains two miniblocks. Within a long block $B$, there may be two jumps, and there may be two in each of the two blocks to the left of $B$: the block immediately to the left of $B$ is always in the copy regime, but the block to its left may also be merged into a long block with $B$.)

Suppose there are $k'$ jumps, and let $M^f_j = \emptyset$ for $k' + 1 < j \le k$. Let $i_1, \ldots, i_{k'}$ be an increasing enumeration of the jumps. Set $i_0 = 1$ and $i_{k'+1} = n + 1$ to ease the following definition. For $j = 1, \ldots, k' + 1$, let $N_j$ be $\{i_{j-1}, \ldots, i_j - 1\}$ and define
$$M^f_j = \{x \in \{0,1\}^n \mid \exists i \in N_j : x_i = 1 \ \wedge\ \forall i \ge i_j : x_i = 0\}.$$
Let $M^f_0 = \{\mathbf{0}\}$. Informally, $N_j$ is the set of bit positions starting at the jump $i_{j-1}$ and going up to, but not including, the jump $i_j$. So $\{N_j \mid 1 \le j \le k' + 1\}$ is a partition of the bit positions. Then $M^f_j$ is the set of bit strings $x$ which have the leftmost "1"-bit in $N_j$.

In order to show that $\mathcal{M} = \{\mathcal{M}^f \mid f \in \mathcal{F}\}$ is a family of fitness-based $k$-partitions for $\mathcal{F}$, we need only show that the following condition is satisfied: for all $i < j$, $x \in M^f_i$ and $y \in M^f_j$, we have $f(x) < f(y)$. The condition follows from the fact that $a_i/a_{i-1} = w_i/w_{i-1} > n^2$ for all jumps $i$.

In order to show that $\Phi$ is piece-wise polynomial with respect to the EA, it remains to prove that, for every $j \in \{1, \ldots, k\}$, $\ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) = O(\log n(f))$. Fix any such $j$. Let $r_f = \max \Phi_f(M^f_j) / \min \Phi_f(M^f_j)$.
We show that $r_f$ is upper-bounded by a polynomial in $n$. For a set of bit positions $I \subseteq \{1, \ldots, n\}$, let $\min I$ denote the minimum element in $I$ and let $\max I$ denote the maximum element. Since $w_1 \le \ldots \le w_n$, $\min \Phi_f(M^f_j) = w_{\min N_j} = w_{i_{j-1}}$. Similarly, $\max \Phi_f(M^f_j) = \sum_{i=1}^{\max N_j} w_i \le n\, w_{\max N_j} = n\, w_{i_j - 1}$. Hence $r_f \le n\, w_{\max N_j} / w_{\min N_j}$. We rewrite
\[
\frac{w_{\max N_j}}{w_{\min N_j}} = \prod_{B \,:\, B \cap N_j \ne \emptyset} \frac{w_{\max(B \cap N_j)}}{w_{\min(B \cap N_j)}}, \tag{1}
\]
where $B$ runs over all miniblocks that have a non-empty intersection with $N_j$. Note that the above is true because adjacent miniblocks intersect in exactly one bit position.

If $B$ is a miniblock in a damped regime, then $w_{\max(B \cap N_j)} / w_{\min(B \cap N_j)} \le w_{\ell_B} / w_{r_B} = K^{(\ell_B - r_B) c / n}$. In consequence, the contribution of all weights in damped regimes to (1) is at most a factor $K^c$.

What remains is the contribution of miniblocks in long blocks and in those short blocks immediately to the left of a long block. Let $B$ be such a miniblock. If $B \cap N_j = \{\ell_B\}$ then $w_{\max(B \cap N_j)} / w_{\min(B \cap N_j)} = 1$. Otherwise, note that
\[
\frac{w_{\max(B \cap N_j)}}{w_{\min(B \cap N_j)}} \le \frac{w_{\ell_B}}{w_{r_B}} = \left( \frac{w_{\ell_B}}{w_{\ell_B - 1}} \right) \left( \frac{w_{\ell_B - 1}}{w_{r_B}} \right).
\]
The first factor is at most $n^2$, since $\ell_B$ is not a jump; the second factor is at most $w_{\ell_B - 1} / w_{r_B} = a_{\ell_B - 1} / a_{r_B} \le n^2$ by the definition of a miniblock.

3.4 Auxiliary results concerning the weights $w_i$

Fix an objective function $f \in F$ and let $n = n(f)$. We will assume that $n$ is sufficiently large with respect to the constants $c$, $\varepsilon$, $K$ and $\gamma$, since our objective is to construct a family $\Phi$ of feasible drift functions for the EA and the definition of such a family (Definition 4) is only concerned with sufficiently large $n$. The definition of $\Phi_f$ allows us to prove a number of useful facts. The first of these uses a geometric series to bound sums of weights in the damped regime.
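The geometric-series estimate behind this first fact can be checked numerically. The sketch below compares the closed form $\sum_{h \ge 0} K^{-ch/n} = 1/(1 - K^{-c/n})$ with the bound $n/(c \ln K) + 1$ that follows from $e^x \ge 1 + x$. The function names and the parameter values are illustrative assumptions, not part of the paper.

```python
import math


def geometric_sum(K: float, c: float, n: int) -> float:
    """Closed form of sum_{h >= 0} K^(-c*h/n), valid for K > 1 and c > 0."""
    return 1.0 / (1.0 - K ** (-c / n))


def series_bound(K: float, c: float, n: int) -> float:
    """Upper bound n / (c ln K) + 1, obtained from e^x >= 1 + x."""
    return n / (c * math.log(K)) + 1.0


# Illustrative parameter values (assumptions, not taken from the paper).
for K, c, n in [(2.0, 0.5, 10), (10.0, 1.0, 100), (1e6, 3.0, 1000)]:
    exact = geometric_sum(K, c, n)
    bound = series_bound(K, c, n)
    print(f"K={K:g}, c={c:g}, n={n}: sum={exact:.4f}, bound={bound:.4f}")
```

In each of these sample settings the closed form stays below the bound, as the inequality predicts; this is only a spot check, not a substitute for the proof.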
Lemma 12. Let $B_0, \ldots, B_k$ be a consecutive sequence of blocks (left to right) in the damped regime with $\ell_{B_0} = r_{B_k} + t$. Then
\[
\sum_{j \in B_0 \cup \ldots \cup B_k} w_j \le K^{tc/n}\, w_{r_{B_k}} \left( \frac{n}{c \ln K} + 1 \right).
\]

Proof. For $0 \le h \le t$ we have $w_{\ell_{B_0} - h} \le K^{tc/n}\, w_{r_{B_k}}\, K^{-hc/n}$. Now
\[
\sum_{j \in B_0 \cup \ldots \cup B_k} w_j \le K^{tc/n}\, w_{r_{B_k}} \sum_{h=0}^{\infty} K^{-ch/n} = K^{tc/n}\, w_{r_{B_k}}\, \frac{1}{1 - K^{-c/n}}.
\]
Now $K^{c/n} = e^{(\ln K) c / n} \ge 1 + (\ln K) c / n$, so
\[
\frac{1}{1 - K^{-c/n}} \le \frac{1}{1 - \frac{1}{1 + (\ln K) c / n}} = \frac{n}{c \ln K} + 1.
\]

The next lemma gives the relationship between the leftmost weight and the rightmost weight in a block in the damped regime.

Lemma 13. If $B$ is a block in the damped regime with $\ell_B = r_B + t$ and $B$ is not the leftmost block, then $w_{\ell_B} = K^{tc/n}\, w_{r_B}$.

Proof. This follows from the definition of the weights in the damped regime, since
\[
\frac{a_{\ell_B}}{a_{r_B}} \ge n^4 \ge K^c \ge K^{tc/n}.
\]
The second inequality follows from our assumption (at the beginning of Section 3.4) that $n$ is sufficiently large with respect to $K$ and $c$.

Lemmas 12 and 13 give the following corollary.

Corollary 14. Let $B_0, \ldots, B_k$ be a consecutive sequence of blocks (left to right) in the damped regime with $\ell_{B_0} = r_{B_k} + t$. If $B_0$ is not the leftmost block then
\[
\sum_{j \in B_0 \cup \ldots \cup B_k} w_j \le w_{\ell_{B_0}} \left( \frac{n}{c \ln K} + 1 \right).
\]

Corollary 14 gives the following upper bound for the sum of all weights contained in, and to the right of, a short block.

Lemma 15. Let $B$ be a short block that is not the leftmost block. Then
\[
\sum_{j \le \ell_B} w_j \le w_{\ell_B} \left( \frac{n}{c \ln K} + 1 + \gamma n + n^{-3} \right).
\]

Proof. If there is no long block to the right of $B$, then $B$ and all of the blocks to its right are in the damped regime, so the result follows immediately from Corollary 14. Assume therefore that there is a long block to the right of $B$. Let $L$ be the long block which is closest to $B$ on its right. Let $S$ be the short block immediately to the left of $L$.
Note that $S$ might be the same block as $B$. Suppose $j \in L$. Recall that for all $h, k \in L \cup S$, we have $w_h / a_h = w_k / a_k$. Thus, since $S$ is not the leftmost block,
\[
w_j = \frac{w_j}{a_j}\, a_j \le \frac{w_j}{a_j}\, n^{-4} a_{\ell_S} = n^{-4} w_{\ell_S} \le n^{-4} w_{\ell_B}.
\]
Since the $w_j$'s increase with $j$, we conclude that $w_j \le n^{-4} w_{\ell_B}$ for any $j \le \ell_L$. Thus, $\sum_{j \le \ell_L} w_j \le n^{-3} w_{\ell_B}$. Using the fact that $S$ is short and the monotonicity of $w$, we deduce $\sum_{j \in S} w_j \le \gamma n\, w_{\ell_B}$. Combining this with Corollary 14, we obtain
\[
\sum_{j \le \ell_B} w_j \le w_{\ell_B} \left( \frac{n}{c \ln K} + 1 \right) + \gamma n\, w_{\ell_B} + w_{\ell_B}\, n^{-3}.
\]

4 Feasible Drift

Our objective in this section is to prove the following lemma, which is the heart of the proof of our main result.

Lemma 16. Let $F$ be a linear family of objective functions over bit strings. Consider the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. There is a function $\nu(n) = O(n)$ such that the family $\Phi = \{\Phi_f\}$ of drift functions constructed above is $\nu$-feasible for the EA.

Consider running the EA with input $f$ with $n = n(f)$. We use the following notation. The state after $t$ steps is a binary string $x[t] = x_n[t] \ldots x_1[t]$. Recall from Section 3.1 that we write bit strings as words from most significant bit ("leftmost bit") to least significant. In the $(t+1)$'st step of the algorithm, the bits of a binary string $y[t+1] = y_n[t+1] \ldots y_1[t+1]$ encoding the mutation mask are chosen independently. The probability that $y_i[t+1] = 1$ is $p_n = c/n$. Then $x'[t+1]$ is formed from $x[t]$ by flipping the bits that are 1 in string $y[t+1]$. That is, $x'_n[t+1] \ldots x'_1[t+1] = (x_n[t] \oplus y_n[t+1]) \ldots (x_1[t] \oplus y_1[t+1])$. Let $A_{t+1}$ be the event that $\sum_i a_i x'_i[t+1] \le \sum_i a_i x_i[t]$. We say that the mutation in step $t+1$ is "accepted" in this case. If $A_{t+1}$ occurs, then $x[t+1] = x'[t+1]$.
Otherwise, $x[t+1] = x[t]$. Of course, the coefficients $a_i$, and therefore $A_{t+1}$ itself, depend implicitly on $f$.

Suppose that $x[t]$ is not the all-zero string. For a bit position $i$ with $x_i[t] = 1$, let $I_i[t+1]$ be the event
\[
y_i[t+1] = 1 \;\wedge\; \forall j \in \{i+1, \ldots, n\} : (x_j[t] = 1) \Rightarrow (y_j[t+1] = 0).
\]
$I_i[t+1]$ is the event that $i$ is the leftmost '1' to be considered for a flip in step $t+1$. Finally, let $I'_\ell[t+1]$ be the event
\[
\forall j \in \{\ell+1, \ldots, n\} : (x_j[t] = 0) \Rightarrow (y_j[t+1] = 0).
\]
$I'_\ell[t+1]$ is the event that the '0' bits to the left of $\ell$ are not considered for a flip in step $t+1$. Note that $\Pr(I'_\ell[t+1]) \ge (1 - p_n)^n$ and that, given $x[t]$, the event $I'_\ell[t+1]$ is independent of $I_i[t+1]$ for any $i$ (the event $I_i[t+1]$ constrains $y_j[t+1]$ for some $j$ with $x_j[t] = 1$, whereas the event $I'_\ell[t+1]$ constrains $y_j[t+1]$ for $j$ with $x_j[t] = 0$). However, these events are not independent if we condition on $A_{t+1}$, as the following simple observation shows.

Lemma 17. Let $i$ be a bit position contained in some block $B$. Assume that there is a block $L$ immediately to the left of $B$. Then $I_i[t+1]$ and $A_{t+1}$ together imply $I'_{\ell_L}[t+1]$.

Proof. There is nothing to show if $L$ is the leftmost block. Hence assume that it is not. Then in particular, $a_{\ell_L} \ge n^4 a_{r_L}$. Assume that $I_i[t+1]$ occurs and $I'_{\ell_L}[t+1]$ does not. Let $k > \ell_L$ be such that $y_k[t+1] = 1$ and $x_k[t] = 0$. Then
\[
\sum_{j=1}^{n} a_j \left( x'_j[t+1] - x_j[t] \right) \ge a_k - \sum_{j \le i} a_j \ge a_k - n a_i > 0,
\]
because $a_k \ge a_{\ell_L} \ge n^4 a_{r_L} \ge n^4 a_i$. Hence this mutation is not accepted, that is, $A_{t+1}$ does not occur.

Recall that $\varepsilon \in (0, 1)$, $K$, and $\gamma$ are parameters defined in Section 3.1. We take $\varepsilon$ to be "sufficiently small".
Then $K \ge 1$ is taken to be "sufficiently large" (depending on $c$ and $\varepsilon$) and then $\gamma \in (0, 1)$ is taken to be "sufficiently small" (depending on $c$, $\varepsilon$ and $K$). Finally, we take $n_0 > 1$ to be any integer which is "sufficiently large" with respect to all of these parameters. The actual constraints that we use (to determine what is "sufficiently large" and what is "sufficiently small") will be spelled out below. Note that $(1 - \frac{c}{n})^n$ approaches $\exp(-c)$ from below as $n \to \infty$. We choose $n_0$ so that $(1 - \frac{c}{n_0})^{n_0}$ is "sufficiently close" to $\exp(-c)$ (with respect to $c$, $\varepsilon$ and $K$). We can conclude from this that $(1 - \frac{c}{n})^n$ is "sufficiently close" to $\exp(-c)$ for any $n \ge n_0$. Similarly, $(1 - \frac{c}{n})^{3n}$ approaches $\exp(-3c)$ from below as $n \to \infty$. We will choose $n_0$ to ensure that, for $n \ge n_0$, this is "sufficiently close" to $\exp(-3c)$.

Proof of Lemma 16. The first two conditions in Definition 4 follow from the construction of $\Phi_f$ in Section 3.3. The third condition follows from Lemma 18 below.

The following lemma is the main ingredient in the short proof of Lemma 16 above. It establishes the third condition in Definition 4, so it allows us to conclude that $\Phi$ is $\nu$-feasible for the EA. Since by Lemma 11, $\Phi$ is also piece-wise polynomial with respect to the EA, Lemma 9 enables us to repeatedly apply Lemma 18 to bound the expected optimisation time of the EA.

Lemma 18. Let $F$ be a linear family of objective functions over bit strings. Consider the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. Let $f$ be an objective function in $F$ with $n(f) \ge n_0$. For all $x \in \{0,1\}^{n(f)} \setminus \{0\}$,
\[
E[\Phi_f(x[t+1]) \mid x[t] = x] \le \left( 1 - \frac{1}{n(f)}\, c\, e^{-3c} (1 - \varepsilon)^2 \right) \Phi_f(x).
\]

Proof. Fix $f \in F$ with $n(f) \ge n_0$. Let $n = n(f)$.
Note that, for any fixed $x[t]$,
\[
E[\Phi_f(x[t]) - \Phi_f(x[t+1])] = \sum_{i \,:\, x_i[t] = 1} \Pr(I_i[t+1])\, E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]], \tag{2}
\]
since the events $I_i[t+1]$ for $1 \le i \le n$ are disjoint and $\Phi_f(x[t]) = \Phi_f(x[t+1])$ unless one of them occurs. In each of various cases (see Subsections 4.1 to 4.5), we will show that, for all $i$ with $x_i[t] = 1$,
\[
E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1 - p_n)^{2n}\, w_i\, (1 - \varepsilon), \tag{3}
\]
which is greater than or equal to 0 since $n \ge n_0 > c$ and $\varepsilon < 1$. Using the lower bound $\Pr(I_i[t+1]) \ge p_n (1 - p_n)^n$, which applies for every $i$ with $x_i[t] = 1$, Equations (2) and (3) give
\[
E[\Phi_f(x[t]) - \Phi_f(x[t+1])] \ge p_n (1 - p_n)^n \sum_{i \,:\, x_i[t] = 1} E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge p_n (1 - p_n)^n (1 - p_n)^{2n} (1 - \varepsilon)\, \Phi_f(x[t]),
\]
so
\[
E[\Phi_f(x[t+1])] \le \left( 1 - p_n (1 - p_n)^{3n} (1 - \varepsilon) \right) \Phi_f(x[t]).
\]
Since $(1 - p_n)^{3n} \ge e^{-3c} (1 - \varepsilon)$ for $n \ge n_0$, this will complete the proof.

It remains to prove Equation (3). We do this in Subsections 4.1 to 4.5. In each case, $B$ is the block containing bit position $i$, $L$ is the block to the left of $B$ (if it exists) and $R$ is the block to the right of $B$ (if it exists). Figure 1 depicts some blocks (two short blocks followed by a long block, followed by a short block divided into two miniblocks, followed by another short block). For each possible location of the bit position $i$, it names the relevant case. Every long block is covered by Case 5. Blocks immediately to the left of a long block are covered by Case 3 and blocks immediately to the right of a long block are covered by Case 4, then Case 2. Everything else is covered by Case 1.

[Figure 1: The cases that are used to prove Equation (3). Labels from left to right: Case 1, Case 3, Case 5, Case 4, Case 2, Case 1.]

For all of the following cases, fix $f \in F$ with $n(f) \ge n_0$.
Let $n = n(f)$. Fix $x[t]$ with $x_i[t] = 1$ for a bit position $i$ in block $B$. Recall from the proof of Lemma 18 that the goal is to prove (3). That is, we must show that
\[
E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1 - p_n)^{2n}\, w_i\, (1 - \varepsilon).
\]

4.1 Case 1

For this case, assume that $B$ is not long and that blocks adjacent to $B$ are not long either. If $B$ is not the leftmost block, then let $L$ be the block to $B$'s left. The case in which $B$ is the leftmost block is actually easier, but to avoid repetition, in this case, let $L$ be the block consisting of the single bit position $\ell_B$. The following argument now applies whether $L$ is a real block or just a single bit position.

We will condition on $I_i[t+1]$. By Lemma 17, we know that if this mutation is accepted (so $A_{t+1}$ occurs), then the event $I'_{\ell_L}[t+1]$ occurs. Also, $\Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \ge (1 - p_n)^n$, as we noted earlier. Thus $E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]]$ is equal to
\[
\Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \cdot E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1], I'_{\ell_L}[t+1]]. \tag{4}
\]
Let $P = \Pr(A_{t+1} \mid I_i[t+1], I'_{\ell_L}[t+1])$. Note that $P \ge (1 - p_n)^n$ (since, for example, $A_{t+1}$ occurs if $y_j[t+1] = 0$ for $j \ne i$). Now $\Phi_f(x[t]) - \Phi_f(x[t+1]) = \sum_{j=1}^{n} w_j (x_j[t] - x_j[t+1])$. If $I_i[t+1]$ and $I'_{\ell_L}[t+1]$ occur, then this is $\sum_{j \le \ell_L} w_j (x_j[t] - x_j[t+1])$. If $A_{t+1}$ also occurs, then $x_i[t] - x_i[t+1] = 1$ so this is $w_i + \sum_{j \le \ell_L, j \ne i} w_j (x_j[t] - x_j[t+1])$. Thus, the quantity in (4) is at least
\[
(1 - p_n)^n \Bigl( w_i P - \sum_{j \le \ell_L,\, j \ne i} w_j \Pr(y_j[t+1] = 1 \mid I_i[t+1], I'_{\ell_L}[t+1]) \Bigr) \ge (1 - p_n)^n \Bigl( w_i (1 - p_n)^n - \sum_{j \le \ell_L} w_j\, p_n \Bigr).
\]
Now, by Lemma 15, we have
\[
\sum_{j \le \ell_L} w_j \le K^{2c\gamma}\, w_{r_B} \left( \frac{2n}{c \ln K} + 2 + \gamma n + n^{-3} \right).
\]
To see this, apply the lemma directly to $L$ if it is not the leftmost block (and note that $w_{\ell_L} \le K^{2c\gamma} w_{r_B}$). If $L$ is the leftmost block (and $B$ is not) then apply Lemma 15 to block $B$ (noting that $w_{\ell_B} \le K^{c\gamma} w_{r_B}$) and use Lemma 12 to sum the weights in $L$. Finally, if $B$ is the leftmost block then apply Lemma 15 to the short block to the right of $B$ and use Lemma 12 to sum the weights in $B$. Using this and $w_{r_B} \le w_i$ we have
\[
E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1 - p_n)^n\, w_i \left( (1 - p_n)^n - \frac{2 K^{2c\gamma}}{\ln K} - \frac{2c}{n} K^{2c\gamma} - \gamma c K^{2c\gamma} - \frac{c}{n^4} K^{2c\gamma} \right).
\]
By the choice of the parameters in Section 3.1, and since $n \ge n_0$, each of $\frac{2 K^{2c\gamma}}{\ln K}$, $\frac{2c}{n} K^{2c\gamma}$, $\gamma c K^{2c\gamma}$ and $\frac{c}{n^4} K^{2c\gamma}$ is at most $(1 - p_n)^n \varepsilon / 4$, so Equation (3) holds, as required.

To see this, recall (from the text just after Lemma 17) that $\varepsilon$ is taken to be "sufficiently small", then $K \ge 1$ is taken to be "sufficiently large" (depending on $c$ and $\varepsilon$) and then $\gamma \in (0, 1)$ is taken to be "sufficiently small" (depending on $c$, $\varepsilon$ and $K$). Finally, we take $n_0 > 1$ to be any integer which is "sufficiently large" with respect to all of these parameters, in particular, guaranteeing that $(1 - p_n)^n$ is "sufficiently close" to $\exp(-c)$ for any $n \ge n_0$. It is easy to see that $\frac{c}{n^4} K^{2c\gamma}$ and $\frac{2c}{n} K^{2c\gamma}$ are sufficiently small, since $n_0$ is chosen after the other parameters (so these terms can be made arbitrarily small as compared to $\exp(-c)\, \varepsilon / 4$). Similarly, $\gamma c K^{2c\gamma}$ is sufficiently small because $\gamma$ is chosen to be sufficiently small with respect to $\varepsilon$, $c$ and $K$. Finally, $\frac{2 K^{2c\gamma}}{\ln K}$ is sufficiently small because $\gamma$ can be chosen as small as we like with respect to the other parameters. (That is, first $K$ is made sufficiently large with respect to $c$ and $\varepsilon$ and then $\gamma$ is defined.) For example, setting $\gamma = \ln\!\left( \frac{\varepsilon}{16} e^{-c} \ln K \right) / (2c \ln K)$ gives $\frac{2 K^{2c\gamma}}{\ln K} = e^{-c} \varepsilon / 8$.
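The closing example can be verified with a few lines of arithmetic. The sketch below evaluates $\gamma = \ln(\frac{\varepsilon}{16} e^{-c} \ln K) / (2c \ln K)$ and checks that $2K^{2c\gamma}/\ln K$ then equals $e^{-c}\varepsilon/8$. It works with $\ln K$ directly so that $K$ itself is never formed. The function names and the sample values of $\varepsilon$, $c$ and $\ln K$ are illustrative assumptions; note that $\gamma > 0$ requires $\ln K > 16 e^{c}/\varepsilon$, which is one way to read "$K$ sufficiently large".

```python
import math


def example_gamma(eps: float, c: float, ln_K: float) -> float:
    """gamma = ln((eps/16) * e^(-c) * ln K) / (2 c ln K), as in the example above."""
    return math.log((eps / 16.0) * math.exp(-c) * ln_K) / (2.0 * c * ln_K)


def first_error_term(gamma: float, c: float, ln_K: float) -> float:
    """2 K^(2 c gamma) / ln K, using K^(2 c gamma) = exp(2 c gamma ln K)."""
    return 2.0 * math.exp(2.0 * c * gamma * ln_K) / ln_K


# Illustrative values (assumptions): ln K = 500 exceeds 16 e^c / eps (about 435).
eps, c, ln_K = 0.1, 1.0, 500.0
gamma = example_gamma(eps, c, ln_K)
term = first_error_term(gamma, c, ln_K)
target = math.exp(-c) * eps / 8.0
print(f"gamma = {gamma:.6g}, term = {term:.6g}, e^(-c) eps / 8 = {target:.6g}")
```

With these sample values $\gamma$ lies in $(0, 1)$ and the term matches the target up to floating-point rounding.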
4.2 Case 2

For this case, assume that the block $L$, immediately to the left of $B$, is long, and that $i$ is in the rightmost miniblock of block $B$ (which is therefore short). This is very similar to Case 1. As in Case 1, we will condition on $I_i[t+1]$. Where Case 1 uses Lemma 17, we use exactly the same argument to show that, if this mutation is accepted (so $A_{t+1}$ occurs), then event $I'_{\ell_B}[t+1]$ occurs. From that point the argument proceeds exactly as in Case 1, replacing "$\ell_L$" with "$\ell_B$". We use Lemma 15 to obtain the upper bound
\[
\sum_{j \le \ell_B} w_j \le w_{\ell_B} \left( \frac{n}{c \ln K} + 1 + \gamma n + n^{-3} \right) \le K^{c\gamma}\, w_{r_B} \left( \frac{n}{c \ln K} + 1 + \gamma n + n^{-3} \right).
\]
The rest of the argument is exactly the same as in Case 1.

4.3 Case 3

For this case, assume that $B$ is immediately to the left of a long block $R$. Hence both $B$ and $R$ are in the copy regime. If $B$ is not the leftmost block, then there is a block $L$ immediately to the left of $B$. Block $L$ is short, since any pair of long blocks has at least three short blocks between them. Thus, $L$ is in the damped regime. If $B$ is the leftmost block, to keep notation simple, we add an artificial block $L = \{\ell_B\} = \{n\}$. Note that $\sum_j$ […] $\ell_L$ or both $j > i$ and $x_j[t] = 1$. Thus, by the definition of $A_{t+1}$, we have
\[
\sum_{j \le \ell_L} a_j \bigl( (x_j[t] \oplus y_j) - x_j[t] \bigr) \le 0.
\]
We compute
\[
\sum_{j \in L \,:\, y_j = 1,\, x_j[t] = 0} a_j + \sum_{j \in B \cup R \,:\, y_j = 1,\, x_j[t] = 0} a_j - \sum_{j \in B \cup R \,:\, y_j = 1,\, x_j[t] = 1} a_j \le \sum_j
\]
[…] $\ell_L$ or if $j > i$ and $x_j[t] = 1$. Thus, if $y \in Y$, then, by the definition of $A_{t+1}$, we have
\[
0 \le \sum_{j \le \ell_L} a_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr). \tag{7}
\]
To derive an upper bound on the right-hand side of Equation (7) we split the summation into three easily-bounded parts. The summation over $j \in L - \{r_L\}$ is equal to $-\sum_{r_L}$ […] $i$ such that $x_j[t] = 1$. Thus, $P \ge (1 - p_n)^n$ so $(\varepsilon/3)(1 - \frac{c}{n})^n \le (\varepsilon/3) P$. We conclude that the first term is at least $-(\varepsilon/3) P w_i$.
Using Lemma 15 and $w_{\ell_B} \le K^{\gamma c} w_i$, we obtain
\[
K^{\gamma c}\, \frac{c}{n} \sum_{j \le \ell_B} w_j \le w_i \left( \frac{K^{2\gamma c}}{\ln K} + \frac{c K^{2\gamma c}}{n} + K^{2\gamma c} \gamma c + \frac{c K^{2\gamma c}}{n^4} \right).
\]
Given the constraints on our parameters (see the discussion at the end of Case 1), each of the four summands, $\frac{K^{2\gamma c}}{\ln K}$, $\frac{c K^{2\gamma c}}{n}$, $K^{2\gamma c} \gamma c$ and $\frac{c K^{2\gamma c}}{n^4}$, is at most $(\varepsilon/12) P$. Thus, the second term, $-K^{\gamma c} \sum_{j \le \ell_B,\, j \ne i} \frac{c}{n} w_j$, is also at least $-(\varepsilon/3) P w_i$. In a similar way, we see that the third term, $P\bigl( (1 + \frac{1}{n} - K^{-\gamma c}) K^{\gamma c} w_i + w_i \bigr)$, is at least $P w_i (1 - \varepsilon/3)$. We conclude that
\[
E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge P w_i (1 - \varepsilon),
\]
which establishes Equation (3), as required.

4.5 Case 5

For this case, assume that $B$ is a long block. To the right of $B$, there might be a short block $R$; otherwise $r_B = 1$ and we define $R = \{r_B\}$ to ease notation. To the left of $B$, there might be a short block $L$; otherwise $\ell_B = n$ and we define $L = \{\ell_B\}$ to ease notation.

Let $Y$ be the set of $n$-bit binary strings so that, if $y[t+1] = y$, then $I_i[t+1]$ occurs and $A_{t+1}$ occurs (so the move in step $t+1$ is accepted). As in Case 4, $A_{t+1}$ implies $I'_{\ell_L}[t+1]$. Hence for every $y \in Y$ we have $y_j = 0$ for $j > \ell_L$ and for all $j > i$ satisfying $x_j[t] = 1$. Thus, if $y \in Y$, then, by the definition of $A_{t+1}$, we have
\begin{align*}
0 &\le \sum_{j \le \ell_L} a_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr) \\
&\le \sum_{r_R \le j \le \ell_L} a_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr) + a_i n^{-3} \\
&\le \sum_{r_B \le j \le \ell_L} a_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr) + \sum_{j \in R;\; y_j = 1;\; x_j[t] = 1} a_j + a_i n^{-3}. \tag{10}
\end{align*}
We will use the fact that for $j \in L \cup B$, we have $w_j = \frac{w_{r_B}}{a_{r_B}} a_j$ since we are in the copy regime, whereas for $j \in R$, we are in the damped regime, so we have
\[
a_j \le a_{r_B} = \frac{a_{r_B}}{w_{r_B}}\, w_{r_B} \le \frac{a_{r_B}}{w_{r_B}}\, w_{r_R}\, K^{(r_B - r_R) c / n} \le \frac{a_{r_B}}{w_{r_B}}\, w_j\, K^{\gamma c}.
\]
Plugging this into (10), we obtain
\begin{align*}
\sum_{r_B \le j \le \ell_L} w_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr) &= \frac{w_{r_B}}{a_{r_B}} \sum_{r_B \le j \le \ell_L} a_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr) \\
&\ge -\frac{w_{r_B}}{a_{r_B}} \Bigl( \sum_{j \in R;\; y_j = 1;\; x_j[t] = 1} a_j + a_i n^{-3} \Bigr) \\
&\ge -K^{\gamma c} \sum_{j \in R;\; y_j = 1;\; x_j[t] = 1} w_j - w_i n^{-3}.
\end{align*}
Let $\Psi(y) = -K^{\gamma c} \sum_{j \le \ell_R;\; y_j = 1} w_j - w_i n^{-3}$. From the above,
\[
\sum_{j \le \ell_L} w_j \bigl( x_j[t] - (x_j[t] \oplus y_j) \bigr) \ge (1 - K^{\gamma c}) \sum_{j \in R;\; y_j = 1;\; x_j[t] = 1} w_j - \sum_{j \in R;\; y_j = 1;\; x_j[t] = 0} w_j - w_i n^{-3} - \sum_{j} \cdots
\]
