Adaptive Drift Analysis

Benjamin Doerr
Max Planck Institute for Computer Science, Campus E1 4, 66123 Saarbrücken, Germany

Leslie Ann Goldberg
Department of Computer Science, University of Liverpool, Ashton Bldg, Liverpool L69 3BX, UK

July 26, 2018

∗ This work was begun while both authors were visiting the "Centre de Recerca Matemàtica de Catalunya". It profited greatly from this ideal environment for collaboration. A preliminary announcement of the result (without proofs) appeared in [3]. The work described in this paper was partly supported by EPSRC Research Grant (ref. EP/I011528/1) "Computational Counting".

Abstract

We show that, for any $c > 0$, the (1+1) evolutionary algorithm using an arbitrary mutation rate $p_n = c/n$ finds the optimum of a linear objective function over bit strings of length $n$ in expected time $\Theta(n \log n)$. Previously, this was only known for $c \le 1$. Since previous work also shows that universal drift functions cannot exist for $c$ larger than a certain constant, we instead define drift functions which depend crucially on the relevant objective functions (and also on $c$ itself). Using these carefully constructed drift functions, we prove that the expected optimisation time is $\Theta(n \log n)$. By giving an alternative proof of the multiplicative drift theorem, we also show that our optimisation-time bound holds with high probability.

1 Introduction

Drift analysis is central to the field of evolutionary algorithms. This type of analysis was implicit in the work of Droste, Jansen and Wegener [9], who analysed the optimisation of linear functions over bit strings by the classical (1+1) evolutionary algorithm ((1+1) EA) with mutation rate $p_n = 1/n$. The method was made explicit in the work of He and Yao, who gave a simple, clean analysis. Later fundamental applications of drift analysis in the theory of evolutionary computation include [11, 12, 15, 20, 22].

Recent work by Johannsen, Winzen and the first author [6, 7] shows that drift analysis, as it is currently used, relies strongly on the fact that the mutation probabilities $p_n$ are relatively small. As He and Yao observed [17], the analysis in [16] only applies if the mutation probability $p_n$ is strictly smaller than $1/n$, where $n$ is the length of the bit strings of the search space. This restriction was relaxed in [18], where a family of drift functions was presented that works for the most common mutation probability $p_n = 1/n$.¹ However, as Doerr et al. have observed [7], this family of drift functions still ceases to work for $p_n \ge 4/n$. Furthermore [7], if $p_n > 4/n$, then for any universal family of drift functions (from the class of log-of-linear functions) there is a linear objective function $f$ and a search-space element $x$ such that the drift from $x$ is negative (so the proof that the (1+1) EA converges quickly does not go through). Doerr et al. have also shown [6] that this problem cannot be fixed by applying the averaging approach of Jägersküpper [19]: that approach fails for $p_n \ge 7/n$.

Thus, prior to the work presented here, it was an open problem whether the (1+1) EA minimises linear objective functions over bit strings in $O(n \log n)$ time when the mutation probability is $p_n = c/n$ for $c \ge 7$. Our main result shows that this is the case. Since it is known that no universal family of drift functions exists for large $c$, we instead define a feasible family of drift functions in such a way that the drift function $\Phi_f$ depends crucially on the objective function $f$. Using this idea, we show (see Theorem 7) that, for any constant $c$, the (1+1) EA with mutation probability $p_n = c/n$ optimises any family of linear objective functions over bit strings in expected time $O(n \log n)$.
A corresponding lower bound follows easily from standard arguments; see Theorem 19. Thus, our result is as good as possible (up to a constant factor). By reproving a multiplicative drift theorem (which was first used to analyse evolutionary algorithms in [7]), we also show that our bound on the optimisation time holds with high probability. The tail bounds in our drift theorem can also be used to show that many other known bounds on optimisation times hold with high probability. This has been done in [4] for the (1+1) EA finding minimum spanning trees and computing shortest paths or Eulerian cycles.

¹ Note, though, that in that paper an EA accepting only strict improvements was analysed; this fact was exploited in the proof. We have little doubt, though, that their proof can be adapted to the more common setting in which an offspring of equal fitness is also accepted.

2 Drift Analysis

In this section, we give a brief description of drift analysis, which is sufficient for our purposes. For a more general background on drift analysis, we refer to the papers cited above.

2.1 The (1+1) evolutionary algorithm

Let $F$ be a set of objective functions. Each $f \in F$ is associated with a problem size $n(f) \in \mathbb{N}$ and is a function from the search space $\Omega_f$ to $\mathbb{R}_{\ge 0}$. Given $f$, the goal is to find an element $x \in \Omega_f$ such that $f(x)$ is minimised. Our assumption that the optimisation problem is minimisation (as opposed to maximisation) is without loss of generality, as is our assumption that the range of each objective function contains only non-negative numbers. For each objective function $f$, let $\Omega_{\mathrm{opt},f} \subseteq \Omega_f$ denote the set of optimal search points, that is, those that minimise the value of $f$.

Definition 1. We say that $F$ is a family of objective functions over bit strings if, for every $f \in F$, $\Omega_f = \{0,1\}^{n(f)}$.
In this case, an element $x \in \Omega_f$ is a string of $n(f)$ bits, $x = x_{n(f)} \ldots x_1$.

Definition 2. Suppose that $F$ is a family of objective functions over bit strings. We say that $F$ is linear if each $f \in F$ is of the form $f(x) = \sum_{i=1}^{n(f)} a_i x_i$, where the coefficients $a_i$ are real numbers. Without loss of generality, we assume that $a_{i+1} \ge a_i > 0$ for all $i \in \{1, \ldots, n(f) - 1\}$.

Example 1. Suppose, for $n \in \mathbb{N}$, that $f_n : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ is defined by $f_n(x_n \ldots x_1) = \sum_{i=1}^{n} 2^{i-1} x_i$. Then $F = \{f_n\}$ is a linear family of objective functions over bit strings. The value of $f_n(x)$ is the binary value of the bit string $x = x_n \ldots x_1$.

Example 2. Suppose, for $n \in \mathbb{N}$, that $f_n : \{0,1\}^n \to \mathbb{R}_{\ge 0}$ is defined by $f_n(x_n \ldots x_1) = \sum_{i=1}^{n} x_i$. Then $F = \{f_n\}$ is a linear family of objective functions over bit strings. The value of $f_n(x)$ is the number of ones in the bit string $x = x_n \ldots x_1$.

The randomised search heuristic that we study is the well-known (1+1) EA. To emphasise the role of the parameters, we refer to this algorithm as the (1+1) EA for minimising $F$. Given an objective function $f \in F$, this algorithm starts with an initial solution $x$, chosen uniformly at random from the search space $\Omega_f$. In each iteration, from its existing solution $x$, it generates a new solution $x'$ by mutation.

Definition 3. Suppose $F$ is a family of objective functions over bit strings and that $p_n \in [0,1]$ for $n \in \mathbb{N}$. In independent bit mutation, each bit $x_i$ of $x$ is flipped independently with probability $p_n$. In other words, for each $i \in \{1, \ldots, n\}$ independently, we have $\Pr(x'_i = 1 - x_i) = p_n$ and $\Pr(x'_i = x_i) = 1 - p_n$. Often, $p_n = 1/n$, but we do not make this assumption.

In the subsequent selection step, if $f(x') \le f(x)$, the EA accepts the solution $x'$, meaning that the next iteration starts with $x_{\mathrm{new}} := x'$.
Otherwise, the next iteration starts with $x_{\mathrm{new}} := x$. Since we are interested in determining the number of iterations that are necessary to find an optimal solution, we do not specify a termination criterion here. A pseudo-code description of the (1+1) EA is given in Algorithm 1.

Note that the (1+1) EA is not typically used to solve difficult optimisation problems in practice; other, more complex search heuristics are better suited to such problems. However, understanding the optimisation behaviour of the (1+1) EA often helps us to predict the optimisation behaviour of more complicated EAs (which are mostly too complex to allow rigorous theoretical analysis). As such, the (1+1) EA has proved to be an important tool that has attracted significant research effort (see, e.g., [1, 8, 9] for some early works).

Algorithm 1 The (1+1) EA for minimising $F$ over bit strings with independent bit mutation
1: Input an objective function $f \in F$.
2: Initialization: Choose $x \in \{0,1\}^{n(f)}$ uniformly at random.
3: repeat forever
4:   Create $x' \in \{0,1\}^{n(f)}$ by copying $x$.
5:   Mutation: Flip each bit in $x'$ independently with probability $p_{n(f)}$.
6:   Selection: if $f(x') \le f(x)$ then $x := x'$.

2.2 A simple drift theorem with tail bounds

The optimisation time of the (1+1) EA for minimising $F$ is defined to be the number of times that the objective function is evaluated before the optimum is found. This is (apart from an additive deviation of one) equal to the number of mutation-selection iterations. Suppose that $c$ is a positive constant and that $F$ is a family of linear objective functions over bit strings. Our main result (Theorem 7) shows that the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$ has expected optimisation time $O(n(f) \log n(f))$.
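Algorithm 1 translates almost line by line into a short simulation, which can be used to observe such optimisation times empirically. The Python sketch below is an illustration only: the names `one_plus_one_ea` and `linear_objective` are hypothetical, and, unlike Algorithm 1 (which repeats forever), it stops as soon as a user-supplied test `is_optimal` succeeds or after `max_iters` iterations, so that the returned count is the number of mutation-selection iterations.

```python
import random
from typing import Callable, List, Optional, Sequence

Bits = List[int]  # x[i - 1] stores the bit x_i, so x[0] is the rightmost bit x_1


def linear_objective(a: Sequence[float]) -> Callable[[Bits], float]:
    """Return f(x) = sum_i a_i x_i, where a[i - 1] holds the coefficient a_i."""
    return lambda x: sum(ai * xi for ai, xi in zip(a, x))


def one_plus_one_ea(f: Callable[[Bits], float], n: int, p: float,
                    is_optimal: Callable[[Bits], bool], rng: random.Random,
                    max_iters: int = 10**6) -> Optional[int]:
    """Run Algorithm 1; return the number of mutation-selection iterations
    performed before is_optimal(x) first holds, or None after max_iters."""
    x = [rng.randint(0, 1) for _ in range(n)]  # uniform initial solution
    fx = f(x)
    for t in range(max_iters):
        if is_optimal(x):
            return t
        # Mutation: copy x and flip each bit independently with probability p.
        y = [1 - b if rng.random() < p else b for b in x]
        fy = f(y)
        # Selection: accept the offspring if it is not worse (minimisation).
        if fy <= fx:
            x, fx = y, fy
    return None
```

For instance, `one_plus_one_ea(linear_objective([1] * 30), 30, 3 / 30, lambda x: not any(x), random.Random(0))` runs the EA on the function of Example 2 with $c = 3$. Storing $x_i$ at index $i - 1$ keeps the code aligned with the convention $x = x_n \ldots x_1$.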
Theorem 7 also shows that, with high probability, the optimisation time is of this order of magnitude. In order to prove the main result, we introduce the notion of piece-wise polynomial drift, which will be explained in Section 2.5. In this section, we prepare the groundwork by introducing the basic drift theorems that we will need. We start by defining the notion of a feasible family of drift functions. When feasible families of drift functions exist, they allow an elegant analysis yielding upper bounds for the optimisation time of EAs.

Definition 4. Let $\nu : \mathbb{N} \to \mathbb{R}_{\ge 0}$ be monotonically increasing and consider a family $F$ of objective functions. For each $f \in F$, let $\Phi_f$ be a function from $\Omega_f$ to $\mathbb{R}_{\ge 0}$. We say that $\Phi = \{\Phi_f\}$ is a $\nu$-feasible family of drift functions for a (1+1) EA for minimising $F$ if there is an $n_0 \in \mathbb{N}$ such that, for every $f \in F$ with $n(f) \ge n_0$, the following conditions are satisfied.
1. $\Phi_f(x) = 0$ for all $x \in \Omega_{\mathrm{opt},f}$;
2. $\Phi_f(x) \ge 1$ for all $x \in \Omega_f \setminus \Omega_{\mathrm{opt},f}$;
3. for all $x \in \Omega_f \setminus \Omega_{\mathrm{opt},f}$, $E[\Phi_f(x_{\mathrm{new}})] \le \left(1 - \frac{1}{\nu(n(f))}\right) \Phi_f(x)$,
where, as above, we denote by $x_{\mathrm{new}}$ the solution resulting from executing a single iteration (consisting of mutation and selection) with initial solution $x$.

Here is a simple example.

Example 3. Fix a positive constant $c$. Let $F$ be a linear family of objective functions over bit strings and consider the (1+1) EA for minimising $F$ which uses independent bit mutation with $p_n = c/n$. Suppose that, for each $f \in F$, the coefficient $a_1$ is at least 1. Then the trivial family $\Phi$ with $\Phi_f = f$ is an $(n/c')$-feasible family of drift functions for this EA, where $c' := c(1 - c/n)^{n-1} \approx c e^{-c}$. (Indeed, for each $i$ with $x_i = 1$, the mutation flips bit $i$ and no other bit with probability $(c/n)(1 - c/n)^{n-1} = c'/n$, which decreases $f$ by $a_i$, and no accepted step increases $f$.) However, as we shall see, this is often not a very useful family of drift functions.
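For very small $n$, condition 3 of Definition 4 can be checked exactly for the trivial family of Example 3 by enumerating all $2^n$ flip patterns of one mutation step. The Python sketch below does this; the helper names are hypothetical, and the brute force is only meant as a sanity check (it is exponential in $n$ and plays no role in the analysis).

```python
from itertools import product
from typing import Sequence


def f_value(a: Sequence[float], x: Sequence[int]) -> float:
    """Linear objective f(x) = sum_i a_i x_i (a[i - 1] = a_i, x[i - 1] = x_i)."""
    return sum(ai * xi for ai, xi in zip(a, x))


def expected_next_value(a: Sequence[float], x: Sequence[int], p: float) -> float:
    """Exact E[f(x_new)] after one mutation-selection step, obtained by
    enumerating all 2^n flip patterns (feasible for small n only)."""
    n = len(a)
    fx = f_value(a, x)
    total = 0.0
    for mask in product((0, 1), repeat=n):
        k = sum(mask)  # number of flipped bits
        prob = p ** k * (1 - p) ** (n - k)
        fy = f_value(a, [xi ^ mi for xi, mi in zip(x, mask)])
        total += prob * min(fy, fx)  # selection keeps x' iff f(x') <= f(x)
    return total


def example3_holds(a: Sequence[float], c: float, tol: float = 1e-9) -> bool:
    """Check condition 3 of Definition 4 for Phi_f = f and nu = n / c',
    with c' = c (1 - c/n)^(n - 1), at every non-optimal x."""
    n = len(a)
    p = c / n
    c_prime = c * (1 - p) ** (n - 1)
    for x in product((0, 1), repeat=n):
        fx = f_value(a, x)  # fx > 0 exactly when x is not optimal (a_i > 0)
        if fx > 0 and expected_next_value(a, x, p) > (1 - c_prime / n) * fx + tol:
            return False
    return True
```

The small tolerance absorbs floating-point rounding; equality can genuinely occur, for example for Example 1 when $x$ has a single 1-bit at position 1.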
The following well-known theorem (Theorem 5) shows how the optimisation time can be bounded using a drift function. Similar arguments appear in the context of coupling proofs; see, for example, [10, Section 5]. Much more is known about drift analysis; see, for example, [14]. Note that Theorem 5 gives a probability tail bound in addition to an upper bound on the expected optimisation time. The tail bound is not new, but it seems to be unknown in the evolutionary algorithms literature. It can be applied to improve several previous results (see [4]).

Theorem 5. Consider a family $F$ of objective functions and a $\nu$-feasible family $\Phi$ of drift functions for a (1+1) EA for minimising $F$. Let $\Phi_{\max,f}$ denote $\max\{\Phi_f(x) \mid x \in \Omega_f\}$. Then there is an $n_1 \in \mathbb{N}$ such that, for every $f \in F$ with $n(f) \ge n_1$, the expected optimisation time of the EA is at most $\nu(n(f))(\ln \Phi_{\max,f} + 1)$. Also, for any $\lambda > 0$, the probability that the optimisation time exceeds $\lceil \nu(n(f))(\ln \Phi_{\max,f} + \lambda) \rceil$ is at most $\exp(-\lambda)$.

Proof. Let $n_0$ be the value from Definition 4. Definition 4 rules out the possibility that $\max\{\nu(n) \mid n \ge n_0\} < 1$. Also, if $\max\{\nu(n) \mid n \ge n_0\} = 1$ then, from part (3) of the definition, $E[\Phi_f(x_{\mathrm{new}})] = 0$, so the optimisation time is 1. Suppose, then, that there is an $n \in \mathbb{N}$ such that $\nu(n) > 1$. Let $n'_0 = \min\{n \in \mathbb{N} \mid \nu(n) > 1\}$ (actually, it would suffice to take $n'_0$ to be any member of this set but, for concreteness, we take the minimum). Let $n_1 = \max(n_0, n'_0)$. Now consider any $f \in F$ with $n(f) \ge n_1$ and note that the first two conditions in Definition 4 are satisfied. Let $n = n(f)$. Fix an arbitrary initial solution $x_0 \in \Omega_f$ and consider starting the EA with this initial solution instead of choosing a random one. Denote by $\Phi^{[t]}$ the value of $\Phi_f(x)$ after $t$ mutation-selection steps.
Denote by $T_{\mathrm{opt},x_0}$ the first time at which the current solution $x$ is optimal. Thus, from Definition 4, $\Phi^{[T_{\mathrm{opt},x_0}]} = 0$, and for $t < T_{\mathrm{opt},x_0}$ we have $\Phi^{[t]} \ge 1$. From the third condition in Definition 4,
$$E[\Phi^{[t]}] \le (1 - 1/\nu(n))^t \, \Phi^{[0]} \le (1 - 1/\nu(n))^t \, \Phi_{\max,f} \le \exp(-t/\nu(n)) \, \Phi_{\max,f},$$
where, in the last estimate, we used the well-known inequality $1 + z \le e^z$, which is valid for all $z \in \mathbb{R}$.

It is well known (see, for example, [13, Problem 13(a), Section 3.11]) that if $X$ is a random variable taking values in the non-negative integers, then $E[X] = \sum_{i=1}^{\infty} \Pr(X \ge i)$. Therefore, the expected optimisation time $E[T_{\mathrm{opt},x_0}]$ can be written as
$$E[T_{\mathrm{opt},x_0}] = \sum_{i \ge 1} \Pr(T_{\mathrm{opt},x_0} \ge i) = \sum_{t \ge 0} \Pr(\Phi^{[t]} > 0).$$
So, for any non-negative integer $T$, $E[T_{\mathrm{opt},x_0}] \le T + \sum_{t \ge T} \Pr(\Phi^{[t]} > 0)$. Since, by Markov's inequality, $\Pr(\Phi^{[t]} > 0) = \Pr(\Phi^{[t]} \ge 1) \le E[\Phi^{[t]}]$,
$$E[T_{\mathrm{opt},x_0}] \le T + \sum_{t \ge T} E[\Phi^{[t]}].$$
Now let $T = \lceil \ln(\Phi_{\max,f})\,\nu(n) \rceil = \ln(\Phi_{\max,f})\,\nu(n) + \varepsilon$ for some $0 \le \varepsilon < 1$. By our upper bounds above, we obtain
$$E[T_{\mathrm{opt},x_0}] \le T + (1 - 1/\nu(n))^T \, \Phi_{\max,f} \sum_{i=0}^{\infty} (1 - 1/\nu(n))^i.$$
Since $\nu(n) > 1$, $\sum_{i=0}^{\infty} (1 - 1/\nu(n))^i = \nu(n)$. Plugging this in with the definition of $T$ and using $(1 - 1/\nu(n))^{\ln(\Phi_{\max,f})\,\nu(n)} \le \exp(-\ln \Phi_{\max,f}) = 1/\Phi_{\max,f}$, we obtain
$$E[T_{\mathrm{opt},x_0}] \le \ln(\Phi_{\max,f})\,\nu(n) + \varepsilon + (1 - 1/\nu(n))^{\varepsilon}\,\nu(n) = \nu(n)\left(\ln(\Phi_{\max,f}) + \varepsilon/\nu(n) + (1 - 1/\nu(n))^{\varepsilon}\right).$$
We can now check, for every $\varepsilon \in [0,1]$, that $\varepsilon/\nu(n) + (1 - 1/\nu(n))^{\varepsilon} \le 1$, as required. This is most easily seen by checking it for $\varepsilon = 0$ and $\varepsilon = 1$ and noting that the term is convex in $\varepsilon$.

Finally, let $T' := \lceil \nu(n)(\ln(\Phi_{\max,f}) + \lambda) \rceil$ for $\lambda > 0$. We compute
$$\Pr(T_{\mathrm{opt},x_0} > T') = \Pr(\Phi^{[T']} > 0) \le E[\Phi^{[T']}] \le \exp(-T'/\nu(n))\,\Phi_{\max,f} \le \exp(-\lambda).$$
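The two quantities of Theorem 5, and the term $\varepsilon/\nu + (1 - 1/\nu)^{\varepsilon}$ from its proof, are easy to evaluate numerically. The Python sketch below (function names hypothetical) also includes a toy process, assumed here purely for illustration, in which $\Phi$ takes values in $\{0, 1\}$ and drops to 0 with probability $1/\nu$ in each step. This process satisfies condition 3 of Definition 4 with equality, its expected hitting time is exactly $\nu = \nu(\ln 1 + 1)$, so the expectation bound is tight there, and its exact tail $(1 - 1/\nu)^{T'}$ can be compared with $\exp(-\lambda)$.

```python
import math


def expected_time_bound(nu: float, phi_max: float) -> float:
    """Theorem 5: the expected optimisation time is at most nu * (ln Phi_max + 1)."""
    return nu * (math.log(phi_max) + 1)


def tail_time(nu: float, phi_max: float, lam: float) -> int:
    """Theorem 5: Pr(optimisation time > T') <= exp(-lam) for this T'."""
    return math.ceil(nu * (math.log(phi_max) + lam))


def rounding_term(nu: float, eps: float) -> float:
    """eps/nu + (1 - 1/nu)^eps from the proof; it is at most 1 for eps in [0, 1]."""
    return eps / nu + (1 - 1 / nu) ** eps


def geometric_tail(nu: float, lam: float) -> float:
    """Toy process (an illustrative assumption): Phi in {0, 1}, Phi_max = 1, and
    Phi drops to 0 with probability 1/nu per step. Returns the exact
    Pr(T_opt > T') = (1 - 1/nu)^T' for T' = tail_time(nu, 1, lam)."""
    return (1 - 1 / nu) ** tail_time(nu, 1.0, lam)
```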
The proof above uses the argument $E[\Phi^{[t]}] \le (1 - 1/\nu(n))^t \, \Phi_{\max,f}$. This had been used previously in the so-called method of expected weight decrease [21]. There, however, it was followed up with a simple Markov inequality argument that led to a bound on the expected run-time that is weaker (by a constant factor) than what our drift theorem yields. Hence the main difference between the two approaches is that ours gives a better transformation of the drift of $E[\Phi^{[t]}]$ into a bound on $E[\min\{t \mid \Phi^{[t]} < 1\}]$. Note, just to avoid misunderstandings, that $E[\min\{t \mid \Phi^{[t]} < 1\}]$ and $\min\{t \mid E[\Phi^{[t]}] < 1\}$ are typically different quantities.

Theorem 5 indicates that a family of drift functions is better if the maximum values $\Phi_{\max,f}$ are small. In Example 3, taking $\Phi_f = f$ only yields an upper bound of $O(n(f) \log f_{\max})$ for the expected optimisation time, where $f_{\max} = \max\{f(x) \mid x \in \Omega_f\}$. This can be a weak bound. For example, applying it to the family $F$ from Example 1, where $f_{\max} = 2^{n(f)} - 1$, yields a bound of $O(n(f)^2)$ for the expected optimisation time (which, as we shall see, is weak).

2.3 Drift analysis for linear objective functions over bit strings

The main goal of this paper is to analyse the optimisation time of the (1+1) EA for minimising a linear family $F$ of objective functions over bit strings, assuming independent-bit mutation with $p_n = c/n$ (for a fixed constant $c$). The reason for assuming $p_n = c/n$ is that results of Droste, Jansen and Wegener (Theorems 13 and 14 in [9]) show that this is the optimal order of magnitude. Since our objective is an $O(n(f) \log n(f))$ bound on the optimisation time, we ease the language with the following definition.

Definition 6. A feasible family of drift functions is a family of drift functions which is $\nu$-feasible for a function $\nu(n) = O(n)$.
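The weakness of the trivial choice $\Phi_f = f$ discussed above is visible numerically. The Python sketch below (function names hypothetical) evaluates the bound $\nu(\ln \Phi_{\max,f} + 1)$ of Theorem 5 with $\nu = n/c'$ and $c' = c(1 - c/n)^{n-1}$ from Example 3, where $\Phi_{\max,f} = f(1 \ldots 1) = \sum_i a_i$. Doubling $n$ roughly quadruples the bound for Example 1 but only slightly more than doubles it for Example 2, in line with $O(n(f)^2)$ versus $O(n(f) \log n(f))$ behaviour.

```python
import math
from typing import List, Sequence


def trivial_drift_bound(a: Sequence[int], c: float) -> float:
    """Theorem 5 bound nu * (ln Phi_max + 1) for Phi_f = f (Example 3), with
    nu = n / c', c' = c (1 - c/n)^(n - 1) and Phi_max = f(1...1) = sum(a)."""
    n = len(a)
    c_prime = c * (1 - c / n) ** (n - 1)
    nu = n / c_prime
    return nu * (math.log(sum(a)) + 1)  # math.log also accepts large ints


def binary_value(n: int) -> List[int]:
    """Coefficients of Example 1: a_i = 2^(i - 1)."""
    return [2 ** i for i in range(n)]


def one_max(n: int) -> List[int]:
    """Coefficients of Example 2: a_i = 1."""
    return [1] * n


for name, family in (("Example 1", binary_value), ("Example 2", one_max)):
    ratio = trivial_drift_bound(family(200), 1.0) / trivial_drift_bound(family(100), 1.0)
    print(f"{name}: bound(n=200) / bound(n=100) = {ratio:.2f}")
```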
Finding feasible drift functions is typically quite tricky. Doerr, Johannsen and Winzen built on earlier ideas of Droste, Jansen and Wegener [9] and He and Yao [18] in order to show that, for any linear family $F$ of objective functions over bit strings, the family $\Phi$ defined by
$$\Phi_f(x) = \sum_{i=1}^{\lfloor n(f)/2 \rfloor} x_i + \frac{5}{4} \sum_{i=\lfloor n(f)/2 \rfloor + 1}^{n(f)} x_i$$
is a feasible family of drift functions for the (1+1) EA for minimising $F$ which uses independent bit mutation with $p_n = 1/n$. (Thus, this suffices for the case $c = 1$.) This family $\Phi = \{\Phi_f\}$ is said to be a universal family of feasible drift functions because $\Phi_f$ depends on $n(f)$, but not otherwise on $f$. Since $\Phi_{\max,f} = \Theta(n(f))$, this gives an expected optimisation time of $O(n(f) \log n(f))$, which is asymptotically optimal [9]. Proving that this $\Phi$ is a feasible family, while not trivial, is not overly complicated. This discovery of a universal family of feasible drift functions gives an elegant analysis of the EA.

Unfortunately, even if we allow $\Phi_{\max,f}$ to grow faster than $\Theta(n(f))$, such universal families of feasible drift functions only exist when $c$ is small (as noted in the introduction to this paper). For larger values of $c$, the function $\Phi_f$ has to depend upon $f$. Prior to this paper, no non-trivial drift functions of this form were known, so it was an open problem whether the $O(n(f) \log n(f))$ time bound also applies for $c > 1$. We show that this is the case.

2.4 Our result

Our main theorem is as follows.

Theorem 7. Let $c$ be a positive constant. Let $F$ be a family of linear objective functions over bit strings. The (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$ has expected optimisation time $O(n(f) \log n(f))$.
There is a constant $k$ and a function $\nu(n) = O(n)$ such that, for any $\lambda > 0$, the probability that the optimisation time exceeds this bound by $k\nu(n)\lambda$ time steps is at most $k\exp(-\lambda)$.

We prove Theorem 7 by constructing a feasible family of drift functions for the EA that is piece-wise polynomial (a notion that will be defined in Section 2.5). Lemma 9 extends Theorem 5 to piece-wise polynomial feasible families of drift functions, allowing us to prove Theorem 7.

Theorem 7 is interesting for two reasons. On the methodological side, the proof of the theorem greatly enlarges our understanding of how to choose good drift functions. This might enable better solutions for some problems where drift analysis has not yet been very successful. Examples are the minimum spanning tree problem [21] and the single-criterion formulation of the single-source shortest path problem [2]. For both problems, the known bounds on the expected optimisation time contain a $\log(f_{\max})$ factor, stemming from the fact that, at least implicitly, drift analysis is conducted with the trivial family of drift functions $\Phi_f = f$. Of course, our result is also interesting because it shows, for the first time, that linear functions are optimised by the (1+1) EA in time $O(n(f) \log n(f))$, regardless of which mutation probability $p_n = c/n$ is used. Note that this is not obvious: in [5], the authors show that already for monotone functions, a constant-factor change in the mutation probability can change the optimisation time from polynomial to exponential.

2.5 Piece-wise polynomial drift

Let $F$ be a family of linear objective functions over bit strings. Let $\Phi$ be a feasible family of drift functions for a (1+1) EA for minimising $F$.
We start with an elementary observation about $\Phi$: in order to obtain an $O(n(f) \log n(f))$ bound on the expected optimisation time, we do not really need $\Phi_{\max,f}$ to be bounded from above by a polynomial in $n(f)$; we can afford to have a constant number of "huge jumps". The following arguments can be seen as a variation of the fitness level method [23].

Definition 8. Fix $k \in \mathbb{N}$. Suppose that, for every $f \in F$, $\mathcal{M}^f = (M^f_0, \ldots, M^f_k)$ is a partition of $\Omega_f$. Let $\mathcal{M} = \{\mathcal{M}^f \mid f \in F\}$. We say $\mathcal{M}$ is a family of fitness-based $k$-partitions for $F$ if, for all $f \in F$,
1. $M^f_0 = \{\mathbf{0}\}$,
2. for all $i < j$, $x \in M^f_i$ and $y \in M^f_j$, we have $f(x) < f(y)$.

We use the notation $\min \Phi_f(M^f_j)$ to denote $\min\{\Phi_f(x) \mid x \in M^f_j\}$ and the notation $\max \Phi_f(M^f_j)$ to denote $\max\{\Phi_f(x) \mid x \in M^f_j\}$.

Lemma 9. Let $F$ be a family of linear objective functions over bit strings. Let $\Phi$ be a $\nu$-feasible family of drift functions for a (1+1) EA for minimising $F$. Let $\mathcal{M}$ be a family of fitness-based $k$-partitions for $F$. Then there is an $n_1 \in \mathbb{N}$ such that, for every $f \in F$ with $n(f) \ge n_1$, the expected optimisation time of the EA is at most
$$\nu(n(f)) \sum_{j=1}^{k} \left( \ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) + 1 \right).$$
Also, for any $\lambda > 0$, the probability that the optimisation time exceeds
$$\sum_{j=1}^{k} \left\lceil \nu(n(f)) \left( \ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) + \lambda \right) \right\rceil$$
is at most $k\exp(-\lambda)$.

Proof. Let $n_1$ be the quantity in Theorem 5 (which is at least as large as the quantity $n_0$ in Definition 4). Let $f \in F$ with $n(f) \ge n_1$. For $0 \le j \le k$, let $\Omega_{f,j} = \bigcup_{\ell=0}^{j} M^f_\ell$ and let $\mu_{f,j} = \min \Phi_f(M^f_j)$. For $1 \le j \le k$, define $\Psi_{f,j} : \Omega_{f,j} \to \mathbb{R}$ as follows. If $\Phi_f(x) \ge \mu_{f,j}$ then $\Psi_{f,j}(x) = \Phi_f(x)/\mu_{f,j}$; otherwise, $\Psi_{f,j}(x) = 0$. Now, for $j \in \{1, \ldots, k\}$, consider restricting the search space to $\Omega_{f,j}$.
Since the partition $\mathcal{M}^f$ is fitness-based, we conclude that, if the EA is started with input $f$ and an initial solution in $\Omega_{f,j}$, all new solutions that are accepted by the EA are in $\Omega_{f,j}$. Considering all solutions in $\Omega_{f,j-1}$ to be equivalent to the all-zero state $\mathbf{0}$, we note that $\{\Psi_{f,j} \mid f \in F\}$ satisfies the first two conditions of being a $\nu$-feasible family of drift functions for $F$ on $\{\Omega_{f,j}\}$. Also, if $\Phi_f(x) \ge \mu_{f,j}$ then $E[\Phi_f(x_{\mathrm{new}})] \le (1 - 1/\nu(n(f)))\Phi_f(x)$, so
$$E[\Psi_{f,j}(x_{\mathrm{new}})] \le E[\Phi_f(x_{\mathrm{new}})/\mu_{f,j}] \le (1 - 1/\nu(n(f)))\Psi_{f,j}(x).$$
So, by Theorem 5, the expected time until a solution in $\Omega_{f,j-1}$ is reached is at most
$$\nu(n(f))\left(1 + \ln \max\{\Psi_{f,j}(x) \mid x \in \Omega_{f,j}\}\right),$$
which is at most
$$\nu(n(f))\left(1 + \ln \frac{\max \Phi_f(M^f_j)}{\min \Phi_f(M^f_j)}\right).$$
This gives the desired result, summing from $j = k$ down to $j = 1$. For the high-probability statement, again from Theorem 5, we conclude that with probability at least $1 - \exp(-\lambda)$,
$$\left\lceil \nu(n(f))\left(\ln \frac{\max \Phi_f(M^f_j)}{\min \Phi_f(M^f_j)} + \lambda\right) \right\rceil$$
iterations suffice to go from a solution in $\Omega_{f,j}$ to one in $\Omega_{f,j-1}$. A union bound over the $k$ levels gives the failure probability $k\exp(-\lambda)$.

Definition 10. Suppose that $\Phi$ is a family of feasible drift functions for $F$. We will say that $\Phi$ is piece-wise polynomial (with respect to the (1+1) EA) if there is a constant $k$ and a family $\mathcal{M}$ of fitness-based $k$-partitions for $F$ such that, for every $j \in \{1, \ldots, k\}$,
$$\ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) = O(\log n(f)).$$

If $\Phi$ is a family of feasible drift functions for a (1+1) EA for minimising $F$, and $\Phi$ is piece-wise polynomial with respect to the EA, then the optimisation-time bound given by Lemma 9 is $O(n(f) \log n(f))$, since $\nu(n) = O(n)$ and $k$ is a constant.

3 Construction of the Drift Function

Let $F$ be a linear family of objective functions over bit strings (see Definition 2). Fix a constant $c$ and consider the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$.
We aim to construct a family $\Phi$ of feasible drift functions for the EA which is piece-wise polynomial with respect to the EA.

3.1 Notation and parameters

Recall from Definition 1 that $\Omega_f = \{0,1\}^{n(f)}$ and that an element $x \in \Omega_f$ is written as a string of $n(f)$ bits, $x = x_{n(f)} \ldots x_1$. In the proof, we shall often use the word "left" to refer to the most-significant bit of $x$ (the one with the largest index, $n(f)$) and "right" to refer to the least-significant bit (the one with the smallest index, 1).

The proof will use several parameters, which we discuss here. We start by fixing an arbitrarily small positive constant $\varepsilon$. This constant will be used to formulate the intermediate results precisely. To define the family $\Phi$, we will use a sufficiently large constant $K \ge 1$ (depending on $c$ and $\varepsilon$) and a sufficiently small positive constant $\gamma$ (depending on $c$, $\varepsilon$ and $K$).

3.2 Splitting into blocks

The difficulty in defining a suitable drift function $\Phi_f$ is that the optimisation of $f$ via the EA heavily depends on the coefficients $a_i$. If these are steeply increasing, as in Example 1, whether a new solution is accepted or not is determined by the value of the leftmost bit that is flipped. On the other hand, if these are of comparable size, as in Example 2, the difference between the number of "good" bit-flips (turning a 1 into a 0) and the number of "bad" bit-flips (turning a 0 into a 1) determines whether a new solution is accepted. Of course, the precise definitions of "steeply increasing" and "comparable size" depend on the constant $c$ in the mutation probability. Also, an objective function $f$ can be of a mixed type, having regions with steeply increasing coefficients and also regions where coefficients are of comparable size.

Fix an objective function $f$ with $n(f) = n$. To analyse $f$ and define the corresponding drift function $\Phi_f$, we split the bit positions $\{1, \ldots, n\}$ into blocks. The idea is that, within a block, one of the two behaviours is dominant. The definition of blocks, naturally, has to allow us to analyse the interaction between different blocks.

We first split the bit positions $\{1, \ldots, n\}$ into miniblocks. Start with $j = 1$. A miniblock starting at bit position $j$ is constructed as follows. If $a_n/a_j < n^2$, then $\{j, \ldots, n\}$ is a single (final) miniblock. Otherwise, let $i$ be the minimum value in $\{j+1, \ldots, n\}$ such that $a_i/a_j \ge n^2$. Then the set $\{j, \ldots, i\}$ is a miniblock. If $i = n$, we are finished. Otherwise, set $j = i$ and repeat to form the next miniblock, starting at bit position $j$. Note that consecutive miniblocks overlap by one bit position.

The next thing that we do is merge consecutive pairs of miniblocks into blocks. To start with, we just go through the miniblocks from right to left, making a block out of each pair of miniblocks. Note that this is (intentionally) different from just defining blocks analogously to miniblocks with $n^2$ replaced by $n^4$. Note further that, again, consecutive blocks overlap in one bit position.

A block is said to be long if it contains at least $\gamma n$ bit positions (recall that the parameter $\gamma$ is from Section 3.1) and short otherwise. It helps our analysis if any pair of long blocks has at least three short blocks in between. So if two long blocks are separated by at most two short blocks, then we combine the whole thing into a single long block. We repeat this (at most a constant number of times, since there are fewer than $1/\gamma$ long blocks initially) until all remaining long blocks are separated by at least three short blocks.

We will use $\ell_B$ to denote the leftmost bit position in block $B$ and $r_B$ to denote the rightmost bit position in block $B$.
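The miniblock and block construction above is mechanical enough to sketch in code. The Python below is a hypothetical illustration, not the authors' implementation: blocks are pairs $(r_B, \ell_B)$ listed from right to left, coefficients are passed as `a[i - 1]` $= a_i$ (integers avoid rounding issues), and ratios are compared by multiplication ($a_i \ge n^2 a_j$) rather than division. The text does not say what happens when the number of miniblocks is odd; we assume that the unpaired leftmost miniblock forms a block on its own.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # (r_B, l_B): rightmost and leftmost position, 1-indexed


def miniblocks(a: Sequence[int]) -> List[Block]:
    """Split positions 1..n into miniblocks; a[i - 1] holds a_i."""
    n = len(a)
    out: List[Block] = []
    j = 1
    while True:
        # Final miniblock: no coefficient to the left reaches n^2 * a_j.
        if j == n or a[n - 1] < n * n * a[j - 1]:
            out.append((j, n))
            return out
        i = next(i for i in range(j + 1, n + 1) if a[i - 1] >= n * n * a[j - 1])
        out.append((j, i))
        if i == n:
            return out
        j = i  # consecutive miniblocks share bit position i


def blocks(a: Sequence[int], gamma: float) -> List[Block]:
    """Pair miniblocks into blocks (right to left), then merge any two long
    blocks separated by at most two short blocks, until none remain."""
    n = len(a)
    mb = miniblocks(a)
    # Assumption: with an odd count, the leftmost miniblock is a block by itself.
    bl = [(mb[k][0], mb[min(k + 1, len(mb) - 1)][1]) for k in range(0, len(mb), 2)]

    def is_long(b: Block) -> bool:
        return b[1] - b[0] + 1 >= gamma * n

    merged = True
    while merged:
        merged = False
        longs = [k for k, b in enumerate(bl) if is_long(b)]
        for p, q in zip(longs, longs[1:]):
            if q - p - 1 <= 2:  # at most two short blocks in between
                bl[p:q + 1] = [(bl[p][0], bl[q][1])]
                merged = True
                break
    return bl
```

With $n = 20$, $\gamma = 0.25$ and coefficients that are 1 on positions 1 to 6, grow by a factor 401 per position on positions 7 to 14 and stay constant afterwards, the sketch produces a long block, three short blocks and a long block, so no merge takes place. Shortening the steep region to positions 7 to 12 leaves only two short blocks in between, and everything is merged into the single long block $(1, 20)$.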
As long as $B$ is not the leftmost block, we have $a_{\ell_B}/a_{r_B} \ge n^4$.

3.3 Definition of $\Phi_f$

We will define weights $w_1, \ldots, w_n \in \mathbb{R}$ such that $\Phi_f(x) = \sum_{i=1}^{n} w_i x_i$. We call the $w_i$ weights to distinguish them from the coefficients $a_1, \ldots, a_n$ of $f$. We define the weights as follows, starting with $w_1 = 1$. Suppose that bit position $i$ is in block $B$, that $i \ne r_B$, and that $w_{r_B}$ is already defined. If block $B$ is a long block, or is immediately to the left of a long block, then we define
$$w_i = w_{r_B}\, a_i / a_{r_B}.$$
We call this the copy regime, since $w_i/w_{r_B} = a_i/a_{r_B}$. Otherwise, we are in the damped regime and we define
$$w_i = w_{r_B} \min\{K^{(i - r_B)c/n},\ a_i/a_{r_B}\},$$
where $K$ is the parameter from Section 3.1.

It will be a major effort in the remainder of the paper to show that this $\{\Phi_f \mid f \in F\}$ is a feasible family of drift functions for the EA. It is easier to see that $\{\Phi_f\}$ is piece-wise polynomial with respect to the EA, so we do this next.

Lemma 11. Let $F$ be a linear family of objective functions over bit strings. Consider the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. The family $\Phi = \{\Phi_f\}$ of drift functions constructed above is piece-wise polynomial with respect to the EA.

Proof. Let $k = 6\lceil 1/\gamma \rceil + 1$. We now construct a family of fitness-based $k$-partitions for $F$. Let $f$ be an objective function in $F$ and let $n = n(f)$. We now define the partition $\mathcal{M}^f$. We call a bit position $i \in [2..n]$ a jump (for the objective function $f$) if
• $i$ is in a copy regime, and
• $w_i/w_{i-1} > n^2$.
By the construction of the blocks, a jump $i$ is the leftmost bit position of a miniblock contained in either (1) a long block, or (2) a short block immediately to the left of a long block. Since there are at most $\lceil 1/\gamma \rceil$ long blocks, there are at most $k - 1$ jumps.
(The easiest way to see this is to think about the original long blocks, prior to any merges. Each block contains two miniblocks. Within a long block $B$, there may be two jumps, and there may be two in each of the two blocks to the left of $B$: the block immediately to the left of $B$ is always in the copy regime, but the block to its left may also be merged into a long block with $B$.) Suppose there are $k'$ jumps, and let $M^f_j = \emptyset$ for $k' + 1 < j \le k$. Let $i_1, \ldots, i_{k'}$ be an increasing enumeration of the jumps. Set $i_0 = 1$ and $i_{k'+1} = n + 1$ to ease the following definition. For $j = 1, \ldots, k'+1$, let $N_j = \{i_{j-1}, \ldots, i_j - 1\}$ and define
$$M^f_j = \{x \in \{0,1\}^n \mid \exists i \in N_j : x_i = 1 \ \wedge\ \forall i \ge i_j : x_i = 0\}.$$
Let $M^f_0 = \{\mathbf{0}\}$. Informally, $N_j$ is the set of bit positions starting at the jump $i_{j-1}$ and going up to, but not including, the jump $i_j$. So $\{N_j \mid 1 \le j \le k'+1\}$ is a partition of the bit positions. Then $M^f_j$ is the set of bit strings $x$ whose leftmost "1"-bit lies in $N_j$.

In order to show that $\mathcal{M} = \{\mathcal{M}^f \mid f \in F\}$ is a family of fitness-based $k$-partitions for $F$, we need only show that the following condition is satisfied: for all $i < j$, $x \in M^f_i$ and $y \in M^f_j$, we have $f(x) < f(y)$. The condition follows from the fact that $a_i/a_{i-1} = w_i/w_{i-1} > n^2$ for all jumps $i$: any $y \in M^f_j$ has $f(y) \ge a_{i_{j-1}}$, while any $x \in M^f_i$ with $i < j$ has all its 1-bits at positions below $i_{j-1}$, so $f(x) \le (n-1)\,a_{i_{j-1}-1} < a_{i_{j-1}}$.

In order to show that $\Phi$ is piece-wise polynomial with respect to the EA, it remains to prove that, for every $j \in \{1, \ldots, k\}$, $\ln(\max \Phi_f(M^f_j)) - \ln(\min \Phi_f(M^f_j)) = O(\log n(f))$. Fix any such $j$. Let $r_f = \max \Phi_f(M^f_j) / \min \Phi_f(M^f_j)$. We show that $r_f$ is upper-bounded by a polynomial in $n$. For a set of bit positions $I \subseteq \{1, \ldots, n\}$, let $\min I$ denote the minimum element in $I$ and let $\max I$ denote the maximum element. Since $w_1 \le \ldots \le w_n$, $\min \Phi_f(M^f_j) = w_{\min N_j} = w_{i_{j-1}}$.
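The weights of Section 3.3 and the jump-based partition used in this proof can be explored with a small Python sketch. It is a hypothetical illustration under stated assumptions: blocks are given as pairs $(r_B, \ell_B)$ ordered from right to left (starting with $r_B = 1$), "immediately to the left of a long block" is read as "the next block in that order after a long block", and `part_index` returns the $j$ with $x \in M^f_j$ by locating the leftmost 1-bit of $x$ among the sets $N_j$. The parameter values used when trying it out are arbitrary and only illustrative.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # (r_B, l_B), 1-indexed, blocks listed right to left


def weights(a: Sequence[float], blocks: Sequence[Block], gamma: float,
            K: float, c: float) -> Tuple[List[float], List[bool]]:
    """Weights w_1..w_n of Section 3.3, plus a flag per position telling
    whether it lies in the copy regime (index i - 1 refers to position i)."""
    n = len(a)
    w = [0.0] * (n + 1)  # 1-indexed; w[0] is unused
    copy_pos = [False] * (n + 1)
    w[1] = 1.0
    is_long = [left - r + 1 >= gamma * n for r, left in blocks]
    for k, (r, left) in enumerate(blocks):
        # Copy regime: the block is long or lies immediately left of a long block.
        copy = is_long[k] or (k > 0 and is_long[k - 1])
        for i in range(r + 1, left + 1):
            ratio = a[i - 1] / a[r - 1]
            w[i] = w[r] * (ratio if copy else min(K ** ((i - r) * c / n), ratio))
            copy_pos[i] = copy
    return w[1:], copy_pos[1:]


def jumps(w: Sequence[float], in_copy: Sequence[bool]) -> List[int]:
    """Positions i in [2..n] in the copy regime with w_i / w_(i-1) > n^2."""
    n = len(w)
    return [i for i in range(2, n + 1)
            if in_copy[i - 1] and w[i - 1] > n * n * w[i - 2]]


def part_index(x: Sequence[int], jump_list: Sequence[int]) -> int:
    """The j with x in M^f_j: 0 for the all-zero string, otherwise the j such
    that N_j = {i_(j-1), ..., i_j - 1} contains the leftmost 1-bit of x."""
    ones = [i for i in range(1, len(x) + 1) if x[i - 1]]
    if not ones:
        return 0
    return 1 + sum(1 for i in jump_list if i <= max(ones))
```

In an all-damped example the weights grow like $K^{(i-1)c/n}$ and therefore stay below $K^c$, whereas jumps can only appear at copy-regime positions where the coefficient ratio exceeds $n^2$.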
Similarly, $\max \Phi_f(M^f_j) = \sum_{i=1}^{\max N_j} w_i \le n\, w_{\max N_j} = n\, w_{i_j - 1}$. Hence $r_f \le n\, w_{\max N_j}/w_{\min N_j}$. We rewrite
$$\frac{w_{\max N_j}}{w_{\min N_j}} = \prod_{B :\, B \cap N_j \ne \emptyset} \frac{w_{\max(B \cap N_j)}}{w_{\min(B \cap N_j)}}, \qquad (1)$$
where $B$ runs over all miniblocks that have a non-empty intersection with $N_j$. This identity holds because adjacent miniblocks intersect in exactly one bit position, so the product telescopes.

If $B$ is a miniblock in a damped regime, then $w_{\max(B\cap N_j)}/w_{\min(B\cap N_j)} \le w_{\ell_B}/w_{r_B} = K^{(\ell_B - r_B)c/n}$. Consequently, the contribution of all weights in damped regimes to (1) is at most a factor $K^c$. What remains is the contribution of miniblocks in long blocks and in those short blocks immediately to the left of a long block. Let $B$ be such a miniblock. If $B \cap N_j = \{\ell_B\}$ then $w_{\max(B\cap N_j)}/w_{\min(B\cap N_j)} = 1$. Otherwise, note that
$$\frac{w_{\max(B\cap N_j)}}{w_{\min(B\cap N_j)}} \le \frac{w_{\ell_B}}{w_{r_B}} = \left(\frac{w_{\ell_B}}{w_{\ell_B - 1}}\right)\left(\frac{w_{\ell_B - 1}}{w_{r_B}}\right).$$
The first factor is at most $n^2$, since $\ell_B$ is not a jump. The second factor equals $a_{\ell_B - 1}/a_{r_B}$ (we are in the copy regime), which is at most $n^2$ by the definition of a miniblock.

3.4 Auxiliary results concerning the weights $w_i$

Fix an objective function $f \in F$ and let $n = n(f)$. We will assume that $n$ is sufficiently large with respect to the constants $c$, $\varepsilon$, $K$ and $\gamma$, since our objective is to construct a family $\Phi$ of feasible drift functions for the EA and the definition of such a family (Definition 4) is only concerned with sufficiently large $n$. The definition of $\Phi_f$ allows us to prove a number of useful facts. The first of these uses a geometric series to bound sums of weights in the damped regime.

Lemma 12. Let $B_0, \dots, B_k$ be a consecutive sequence of blocks (left to right) in the damped regime with $\ell_{B_0} = r_{B_k} + t$. Then
$$\sum_{j \in B_0 \cup \dots \cup B_k} w_j \le K^{tc/n} w_{r_{B_k}} \left(\frac{n}{c \ln K} + 1\right).$$

Proof. For $0 \le h \le t$ we have $w_{\ell_{B_0} - h} \le K^{tc/n} w_{r_{B_k}} K^{-hc/n}$.
Now
$$\sum_{j \in B_0 \cup \dots \cup B_k} w_j \le K^{tc/n} w_{r_{B_k}} \sum_{h=0}^{\infty} K^{-ch/n} = K^{tc/n} w_{r_{B_k}} \frac{1}{1 - K^{-c/n}}.$$
Moreover, $K^{c/n} = e^{(\ln K)c/n} \ge 1 + (\ln K)c/n$, so
$$\frac{1}{1 - K^{-c/n}} \le \frac{1}{1 - \frac{1}{1 + (\ln K)c/n}} = \frac{n}{c \ln K} + 1.$$

The next lemma gives the relationship between the leftmost weight and the rightmost weight in a block in the damped regime.

Lemma 13. If $B$ is a block in the damped regime with $\ell_B = r_B + t$ and $B$ is not the leftmost block, then $w_{\ell_B} = K^{tc/n} w_{r_B}$.

Proof. This follows from the definition of the weights in the damped regime, since
$$\frac{a_{\ell_B}}{a_{r_B}} \ge n^4 \ge K^c \ge K^{tc/n}.$$
The second inequality follows from our assumption (at the beginning of Section 3.4) that $n$ is sufficiently large with respect to $K$ and $c$; the third holds because $t < n$ and $K \ge 1$.

Lemmas 12 and 13 give the following corollary.

Corollary 14. Let $B_0, \dots, B_k$ be a consecutive sequence of blocks (left to right) in the damped regime with $\ell_{B_0} = r_{B_k} + t$. If $B_0$ is not the leftmost block then
$$\sum_{j \in B_0 \cup \dots \cup B_k} w_j \le w_{\ell_{B_0}} \left(\frac{n}{c \ln K} + 1\right).$$

Corollary 14 gives the following upper bound for the sum of all weights contained in, and to the right of, a short block.

Lemma 15. Let $B$ be a short block that is not the leftmost block. Then
$$\sum_{j \le \ell_B} w_j \le w_{\ell_B} \left(\frac{n}{c \ln K} + 1 + \gamma n + n^{-3}\right).$$

Proof. If there is no long block to the right of $B$, then $B$ and all of the blocks to its right are in the damped regime, so the result follows immediately from Corollary 14. Assume therefore that there is a long block to the right of $B$. Let $L$ be the long block which is closest to $B$ on its right, and let $S$ be the short block immediately to the left of $L$. Note that $S$ might be the same block as $B$. Suppose $j \in L$. Recall that for all $h, k \in L \cup S$, we have $w_h/a_h = w_k/a_k$. Thus, since $S$ is not the leftmost block,
$$w_j = \frac{w_j}{a_j} a_j \le \frac{w_j}{a_j} n^{-4} a_{\ell_S} = n^{-4} w_{\ell_S} \le n^{-4} w_{\ell_B}.$$
Since the $w_j$ increase with $j$, we conclude that $w_j \le n^{-4} w_{\ell_B}$ for any $j \le \ell_L$. Thus, $\sum_{j \le \ell_L} w_j \le n^{-3} w_{\ell_B}$. Using the fact that $S$ is short and the monotonicity of $w$, we deduce $\sum_{j \in S} w_j \le \gamma n\, w_{\ell_B}$. Combining this with Corollary 14, we obtain
$$\sum_{j \le \ell_B} w_j \le w_{\ell_B}\left(\frac{n}{c \ln K} + 1\right) + \gamma n\, w_{\ell_B} + w_{\ell_B}\, n^{-3}.$$

4 Feasible Drift

Our objective in this section is to prove the following lemma, which is the heart of the proof of our main result.

Lemma 16. Let $F$ be a linear family of objective functions over bit strings. Consider the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. There is a function $\nu(n) = O(n)$ such that the family $\Phi = \{\Phi_f\}$ of drift functions constructed above is $\nu$-feasible for the EA.

Consider running the EA with input $f$ with $n = n(f)$. We use the following notation. The state after $t$ steps is a binary string $x[t] = x_n[t] \dots x_1[t]$. Recall from Section 3.1 that we write bit strings as words from most significant bit ("leftmost bit") to least significant. In the $(t+1)$st step of the algorithm, the bits of a binary string $y[t+1] = y_n[t+1] \dots y_1[t+1]$ encoding the mutation mask are chosen independently. The probability that $y_i[t+1] = 1$ is $p_n = c/n$. Then $x'[t+1]$ is formed from $x[t]$ by flipping the bits that are 1 in the string $y[t+1]$, that is,
$$x'_n[t+1] \dots x'_1[t+1] = (x_n[t] \oplus y_n[t+1]) \dots (x_1[t] \oplus y_1[t+1]).$$
Let $A_{t+1}$ be the event that $\sum_i a_i x'_i[t+1] \le \sum_i a_i x_i[t]$. We say that the mutation in step $t+1$ is "accepted" in this case. If $A_{t+1}$ occurs, then $x[t+1] = x'[t+1]$. Otherwise, $x[t+1] = x[t]$. Of course, the coefficients $a_i$, and therefore $A_{t+1}$ itself, depend implicitly on $f$. Suppose that $x[t]$ is not the all-zero string.
For a bit position $i$ with $x_i[t] = 1$, let $I_i[t+1]$ be the event
$$y_i[t+1] = 1 \ \wedge\ \forall j \in \{i+1, \dots, n\} : (x_j[t] = 1) \Rightarrow (y_j[t+1] = 0).$$
$I_i[t+1]$ is the event that $i$ is the leftmost 1-bit to be considered for a flip in step $t+1$. Finally, let $I'_\ell[t+1]$ be the event
$$\forall j \in \{\ell+1, \dots, n\} : (x_j[t] = 0) \Rightarrow (y_j[t+1] = 0).$$
$I'_\ell[t+1]$ is the event that the 0-bits to the left of $\ell$ are not considered for a flip in step $t+1$. Note that $\Pr(I'_\ell[t+1]) \ge (1-p_n)^n$ and that, given $x[t]$, the event $I'_\ell[t+1]$ is independent of $I_i[t+1]$ for any $i$ (the event $I_i[t+1]$ constrains $y_j[t+1]$ only for positions $j$ with $x_j[t] = 1$, whereas the event $I'_\ell[t+1]$ constrains $y_j[t+1]$ only for $j$ with $x_j[t] = 0$). However, these events are not independent if we condition on $A_{t+1}$, as the following simple observation shows.

Lemma 17. Let $i$ be a bit position contained in some block $B$. Assume that there is a block $L$ immediately to the left of $B$. Then $I_i[t+1]$ and $A_{t+1}$ imply $I'_{\ell_L}[t+1]$.

Proof. There is nothing to show if $L$ is the leftmost block. Hence assume that it is not. Then in particular, $a_{\ell_L} \ge n^4 a_{r_L}$. Assume that $I_i[t+1]$ occurs and $I'_{\ell_L}[t+1]$ does not. Let $k > \ell_L$ be such that $y_k[t+1] = 1$ and $x_k[t] = 0$. Then
$$\sum_{j=1}^{n} a_j \left(x'_j[t+1] - x_j[t]\right) \ge a_k - \sum_{j \le i} a_j \ge a_k - n a_i > 0,$$
because $a_k \ge a_{\ell_L} \ge n^4 a_{r_L} \ge n^4 a_i$. Hence this mutation is not accepted, that is, $A_{t+1}$ does not occur.

Recall that $\varepsilon \in (0,1)$, $K$, and $\gamma$ are parameters defined in Section 3.1. We take $\varepsilon$ to be "sufficiently small". Then $K \ge 1$ is taken to be "sufficiently large" (depending on $c$ and $\varepsilon$) and then $\gamma \in (0,1)$ is taken to be "sufficiently small" (depending on $c$, $\varepsilon$ and $K$). Finally, we take $n_0 > 1$ to be any integer which is "sufficiently large" with respect to all of these parameters.
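To make this order of choices concrete, the Python sketch below (an illustration, not part of the proof) fixes hypothetical values of $c$ and $\varepsilon$, picks $\ln K$, sets $\gamma$ by the example formula given at the end of Case 1, and then searches for the least $n_0$ for which two of the requirements used later hold: $(1 - c/n)^{3n} \ge e^{-3c}(1-\varepsilon)$ from the end of the proof of Lemma 18, and the bound of $(1-c/n)^n \varepsilon/4$ on each of the four error terms in Case 1. All function names and numeric values are assumptions made for illustration; $K$ is handled only through $\ln K$ because the required $K$ is far too large for floating point.

```python
import math


def case1_error_terms(c, eps, log_k, n):
    """Return (gamma, terms) for the four error terms at the end of Case 1.

    gamma is the example choice gamma = ln((eps/16) e^{-c} ln K) / (2 c ln K).
    K itself is astronomically large, so only log_k = ln K is ever used.
    """
    z = eps / 16 * math.exp(-c) * log_k
    if z <= 1:
        raise ValueError("need (eps/16) e^{-c} ln K > 1 so that gamma > 0")
    gamma = math.log(z) / (2 * c * log_k)
    k_pow = math.exp(2 * c * gamma * log_k)    # K^{2 c gamma}, evaluated in log space
    terms = (
        2 * k_pow / log_k,                     # 2 K^{2c gamma} / ln K
        2 * c / n * k_pow,                     # (2c/n) K^{2c gamma}
        gamma * c * k_pow,                     # gamma c K^{2c gamma}
        c / n**4 * k_pow,                      # (c/n^4) K^{2c gamma}
    )
    return gamma, terms


def constraints_hold(c, eps, log_k, n):
    """Check both 'n is sufficiently large' requirements for a single n > c."""
    p = c / n
    _, terms = case1_error_terms(c, eps, log_k, n)
    budget = (1 - p) ** n * eps / 4            # each Case 1 term must fit in this
    lemma18_ok = (1 - p) ** (3 * n) >= math.exp(-3 * c) * (1 - eps)
    return lemma18_ok and all(term <= budget for term in terms)


def smallest_n0(c, eps, log_k, n_max=10**6):
    """Least integer n0 > c passing both checks, or None if none exists up to n_max.

    (1 - c/n)^n and (1 - c/n)^{3n} increase with n and no error term grows,
    so n0 then also works for every n >= n0.
    """
    for n in range(math.floor(c) + 1, n_max + 1):
        if constraints_hold(c, eps, log_k, n):
            return n
    return None


c, eps = 1.0, 0.1
log_k = 16 * math.exp(c + 1) / eps     # chosen so that (eps/16) e^{-c} ln K = e
gamma, _ = case1_error_terms(c, eps, log_k, n=1000)
print("gamma =", gamma, "n0 =", smallest_n0(c, eps, log_k))
```

One design note: with this $\gamma$ one has $K^{2c\gamma} = z$ for $z = (\varepsilon/16)e^{-c}\ln K$, so $\gamma c K^{2c\gamma} = \ln(z)\, \varepsilon e^{-c}/32$. That term stays below the budget only while $z$ is moderate, which is why the sketch picks $\ln K$ so that $z = e$ rather than letting $K$ grow without bound.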
The actual constraints that we use (to determine what is "sufficiently large" and what is "sufficiently small") will be spelled out below. Note that $(1 - c/n)^n$ approaches $\exp(-c)$ from below as $n \to \infty$. We choose $n_0$ so that $(1 - c/n_0)^{n_0}$ is "sufficiently close" to $\exp(-c)$ (with respect to $c$, $\varepsilon$ and $K$). We can conclude from this that $(1 - c/n)^n$ is "sufficiently close" to $\exp(-c)$ for any $n \ge n_0$. Similarly, $(1 - c/n)^{3n}$ approaches $\exp(-3c)$ from below as $n \to \infty$. We will choose $n_0$ to ensure that, for $n \ge n_0$, this is "sufficiently close" to $\exp(-3c)$.

Proof of Lemma 16. The first two conditions in Definition 4 follow from the construction of $\Phi_f$ in Section 3.3. The third condition follows from Lemma 18 below.

The following lemma is the main ingredient in the short proof of Lemma 16 above. It establishes the third condition in Definition 4, so it allows us to conclude that $\Phi$ is $\nu$-feasible for the EA. Since, by Lemma 11, $\Phi$ is also piece-wise polynomial with respect to the EA, Lemma 9 enables us to repeatedly apply Lemma 18 to bound the expected optimisation time of the EA.

Lemma 18. Let $F$ be a linear family of objective functions over bit strings. Consider the (1+1) EA for minimising $F$ with independent bit-mutation rate $p_n = c/n$. Let $f$ be an objective function in $F$ with $n(f) \ge n_0$. For all $x \in \{0,1\}^{n(f)} \setminus \{0\}$,
$$E[\Phi_f(x[t+1]) \mid x[t] = x] \le \left(1 - \frac{1}{n(f)}\, c\, e^{-3c} (1-\varepsilon)^2\right) \Phi_f(x).$$

Proof. Fix $f \in F$ with $n(f) \ge n_0$. Let $n = n(f)$. Note that, for any fixed $x[t]$,
$$E[\Phi_f(x[t]) - \Phi_f(x[t+1])] = \sum_{i :\, x_i[t] = 1} \Pr(I_i[t+1])\, E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]], \qquad (2)$$
since the events $I_i[t+1]$ for $1 \le i \le n$ are disjoint and $\Phi_f(x[t]) = \Phi_f(x[t+1])$ unless one of them occurs.
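Equation (2) rests on two facts: the events $I_i[t+1]$ are pairwise disjoint, and if none of them occurs then the mask flips no 1-bit of $x[t]$, so the offspring is either $x[t]$ itself or strictly worse (assuming positive coefficients) and the state does not change. For a tiny instance both facts, and the bound $\Pr(I_i[t+1]) \ge p_n(1-p_n)^n$ used below, can be checked exhaustively. The Python sketch below enumerates all $2^n$ mutation masks with exact rational probabilities; the coefficients, weights, bit string and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def linear(coeffs, z):
    """sum_i coeffs_i * z_i; bit lists are stored least significant first (z[0] is bit 1)."""
    return sum(ci * zi for ci, zi in zip(coeffs, z))


def leftmost_flipped_one(x, y):
    """Return i (1-based) if the event I_i occurs, i.e. y flips the 1-bit x_i and no
    1-bit of x to the left of i; return None if y flips no 1-bit of x at all."""
    for i in range(len(x), 0, -1):             # scan from position n down to 1
        if x[i - 1] == 1 and y[i - 1] == 1:
            return i
    return None


def ea_step(a, x, y):
    """One (1+1) EA step on f(z) = sum_i a_i z_i: flip the bits set in y, keep if not worse."""
    x_new = [xi ^ yi for xi, yi in zip(x, y)]
    return x_new if linear(a, x_new) <= linear(a, x) else list(x)


def drift_by_event(a, w, x, p):
    """Enumerate all 2^n masks with exact probabilities and split the expected drift
    E[Phi(x[t]) - Phi(x[t+1])] of Phi(z) = sum_i w_i z_i according to the events I_i.

    Returns (total, by_event, unchanged), where by_event[i] holds
    (Pr(I_i), Pr(I_i) * E[drift | I_i]) and unchanged says that x[t+1] == x[t]
    on every mask for which no event I_i occurs."""
    total, by_event, unchanged = Fraction(0), {}, True
    for y in product((0, 1), repeat=len(x)):
        prob = Fraction(1)
        for yi in y:
            prob *= p if yi else 1 - p
        x_next = ea_step(a, x, y)
        drift = linear(w, x) - linear(w, x_next)
        total += prob * drift
        i = leftmost_flipped_one(x, y)
        if i is None:
            unchanged = unchanged and x_next == list(x)
        else:
            pr, mass = by_event.get(i, (Fraction(0), Fraction(0)))
            by_event[i] = (pr + prob, mass + prob * drift)
    return total, by_event, unchanged


a = [1, 3, 9, 27, 81, 243]   # hypothetical coefficients a_1..a_6
w = [1, 2, 4, 8, 16, 32]     # hypothetical weights w_1..w_6
x = [1, 0, 1, 1, 0, 1]       # x_1 = 1, x_2 = 0, ..., x_6 = 1
p = Fraction(2, 6)           # p_n = c/n with c = 2, n = 6
total, by_event, unchanged = drift_by_event(a, w, x, p)
print(total, sorted(by_event), unchanged)
```

Exact `Fraction` arithmetic is used so that the identity behind (2) can be compared with `==` rather than up to a floating-point tolerance.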
In each of various cases (see Subsections 4.1 to 4.5), we will show that, for all $i$ with $x_i[t] = 1$,
$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1-p_n)^{2n} w_i (1-\varepsilon), \qquad (3)$$
which is greater than or equal to 0 since $n \ge n_0 > c$ and $\varepsilon < 1$. Using the lower bound $\Pr(I_i[t+1]) \ge p_n (1-p_n)^n$, which applies for every $i$ with $x_i[t] = 1$, Equations (2) and (3) give
$$E[\Phi_f(x[t]) - \Phi_f(x[t+1])] \ge p_n (1-p_n)^n \sum_{i :\, x_i[t]=1} E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge p_n (1-p_n)^n (1-p_n)^{2n} (1-\varepsilon)\, \Phi_f(x[t]),$$
so
$$E[\Phi_f(x[t+1])] \le \left(1 - p_n (1-p_n)^{3n} (1-\varepsilon)\right) \Phi_f(x[t]).$$
Since $(1-p_n)^{3n} \ge e^{-3c}(1-\varepsilon)$ for $n \ge n_0$, this will complete the proof.

It remains to prove Equation (3). We do this in Subsections 4.1 to 4.5. In each case, $B$ is the block containing bit position $i$, $L$ is the block to the left of $B$ (if it exists) and $R$ is the block to the right of $B$ (if it exists). Figure 1 depicts some blocks (two short blocks followed by a long block, followed by a short block divided into two miniblocks, followed by another short block). For each possible location of the bit position $i$, it names the relevant case. Every long block is covered by Case 5. The block immediately to the left of a long block is covered by Case 3, and the block immediately to the right of a long block is covered by Case 4 (its left miniblock) and then Case 2 (its right miniblock). Everything else is covered by Case 1.

Figure 1: The cases that are used to prove Equation (3). From left to right, the regions are labelled Case 1, Case 3, Case 5, Case 4, Case 2, Case 1.

For all of the following cases, fix $f \in F$ with $n(f) \ge n_0$. Let $n = n(f)$. Fix $x[t]$ with $x_i[t] = 1$ for a bit position $i$ in block $B$. Recall from the proof of Lemma 18 that the goal is to prove (3). That is, we must show that
$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1-p_n)^{2n} w_i (1-\varepsilon).$$
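Lemma 18, applied repeatedly via Lemma 9, is what yields the $O(n \log n)$ bound. Purely as an informal illustration (a simulation proves nothing), the following Python sketch runs the (1+1) EA as described at the start of this section on a linear function and reports the number of steps until the all-zero optimum is reached. The uniformly random initial string, the function name, the seed and the random coefficients are assumptions, not taken from the paper.

```python
import random


def run_one_plus_one_ea(a, c, rng, max_steps=10**6):
    """(1+1) EA minimising f(x) = sum_i a[i] * x[i] with independent bit-mutation rate c/n.

    Assumes every a[i] > 0, so the unique optimum is the all-zero string.
    Returns the number of steps until the optimum is reached, or None if it is
    not reached within max_steps steps.
    """
    n = len(a)
    p = c / n
    x = [rng.randint(0, 1) for _ in range(n)]   # uniformly random start (an assumption)
    fx = sum(ai * xi for ai, xi in zip(a, x))
    t = 0
    while fx > 0:
        if t == max_steps:
            return None
        # Mutation mask y: each bit is 1 independently with probability p; x' = x XOR y.
        x_new = [xi ^ (rng.random() < p) for xi in x]
        f_new = sum(ai * xi for ai, xi in zip(a, x_new))
        if f_new <= fx:                          # event A_{t+1}: ties are accepted too
            x, fx = x_new, f_new
        t += 1
    return t


rng = random.Random(2018)
for n in (20, 40, 80):
    a = [rng.randint(1, n) for _ in range(n)]   # hypothetical random linear function
    runs = [run_one_plus_one_ea(a, c=3.0, rng=rng) for _ in range(5)]
    print(n, sum(runs) / len(runs))
```

Averaging over more runs and larger $n$ gives a rough empirical picture of how the running time scales for a fixed $c$; it is not a substitute for the analysis.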
4.1 Case 1

For this case, assume that $B$ is not long and that the blocks adjacent to $B$ are not long either. If $B$ is not the leftmost block, then let $L$ be the block to $B$'s left. The case in which $B$ is the leftmost block is actually easier, but to avoid repetition, in this case, let $L$ be the block consisting of the single bit position $\ell_B$. The following argument now applies whether $L$ is a real block or just a single bit position.

We will condition on $I_i[t+1]$. By Lemma 17, we know that if this mutation is accepted (so $A_{t+1}$ occurs), then the event $I'_{\ell_L}[t+1]$ occurs. Also, $\Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \ge (1-p_n)^n$, as we noted earlier. Thus $E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]]$ is equal to
$$\Pr(I'_{\ell_L}[t+1] \mid I_i[t+1]) \cdot E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1], I'_{\ell_L}[t+1]]. \qquad (4)$$
Let $P = \Pr(A_{t+1} \mid I_i[t+1], I'_{\ell_L}[t+1])$. Note that $P \ge (1-p_n)^n$ (since, for example, $A_{t+1}$ occurs if $y_j[t+1] = 0$ for all $j \ne i$). Now $\Phi_f(x[t]) - \Phi_f(x[t+1]) = \sum_{j=1}^{n} w_j (x_j[t] - x_j[t+1])$. If $I_i[t+1]$ and $I'_{\ell_L}[t+1]$ occur, then this is $\sum_{j \le \ell_L} w_j (x_j[t] - x_j[t+1])$. If $A_{t+1}$ also occurs, then $x_i[t] - x_i[t+1] = 1$, so this is $w_i + \sum_{j \le \ell_L,\, j \ne i} w_j (x_j[t] - x_j[t+1])$. Thus, the quantity in (4) is at least
$$(1-p_n)^n \left( w_i P - \sum_{j \le \ell_L,\, j \ne i} w_j \Pr(y_j[t+1] = 1 \mid I_i[t+1], I'_{\ell_L}[t+1]) \right) \ge (1-p_n)^n \left( w_i (1-p_n)^n - \sum_{j \le \ell_L} w_j\, p_n \right).$$
Now, by Lemma 15, we have
$$\sum_{j \le \ell_L} w_j \le K^{2c\gamma} w_{r_B} \left( \frac{2n}{c \ln K} + 2 + \gamma n + n^{-3} \right).$$
To see this, apply the lemma directly to $L$ if it is not the leftmost block (and note that $w_{\ell_L} \le K^{2c\gamma} w_{r_B}$). If $L$ is the leftmost block (and $B$ is not) then apply Lemma 15 to block $B$ (noting that $w_{\ell_B} \le K^{c\gamma} w_{r_B}$) and use Lemma 12 to sum the weights in $L$.
Finally, if $B$ is the leftmost block then apply Lemma 15 to the short block to the right of $B$ and use Lemma 12 to sum the weights in $B$. Using this and $w_{r_B} \le w_i$ we have
$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge (1-p_n)^n w_i \left( (1-p_n)^n - \frac{2K^{2c\gamma}}{\ln K} - \frac{2c}{n} K^{2c\gamma} - \gamma c K^{2c\gamma} - \frac{c}{n^4} K^{2c\gamma} \right).$$
By the choice of the parameters in Section 3.1, and since $n \ge n_0$, each of $2K^{2c\gamma}/\ln K$, $(2c/n)K^{2c\gamma}$, $\gamma c K^{2c\gamma}$ and $(c/n^4)K^{2c\gamma}$ is at most $(1-p_n)^n \varepsilon/4$, so Equation (3) holds, as required.

To see this, recall (from the text just after Lemma 17) that $\varepsilon$ is taken to be "sufficiently small", then $K \ge 1$ is taken to be "sufficiently large" (depending on $c$ and $\varepsilon$) and then $\gamma \in (0,1)$ is taken to be "sufficiently small" (depending on $c$, $\varepsilon$ and $K$). Finally, we take $n_0 > 1$ to be any integer which is "sufficiently large" with respect to all of these parameters, in particular, guaranteeing that $(1-p_n)^n$ is "sufficiently close" to $\exp(-c)$ for any $n \ge n_0$. It is easy to see that $(c/n^4)K^{2c\gamma}$ and $(2c/n)K^{2c\gamma}$ are sufficiently small, since $n_0$ is chosen after the other parameters (so these terms can be made arbitrarily small as compared to $\exp(-c)\varepsilon/4$). Similarly, $\gamma c K^{2c\gamma}$ is sufficiently small because $\gamma$ is chosen to be sufficiently small with respect to $\varepsilon$, $c$ and $K$. Finally, $2K^{2c\gamma}/\ln K$ is sufficiently small because $\gamma$ can be chosen as small as we like with respect to the other parameters. (That is, first $K$ is made sufficiently large with respect to $c$ and $\varepsilon$ and then $\gamma$ is defined.) For example, setting
$$\gamma = \frac{\ln\left(\frac{\varepsilon}{16} e^{-c} \ln K\right)}{2c \ln K}$$
gives $2K^{2c\gamma}/\ln K = e^{-c}\varepsilon/8$.

4.2 Case 2

For this case, assume that the block $L$, immediately to the left of $B$, is long, and that $i$ is in the rightmost miniblock of block $B$ (which is therefore short). This is very similar to Case 1. As in Case 1, we will condition on $I_i[t+1]$.
Where Case 1 uses Lemma 17, we use exactly the same argument to show that, if this mutation is accepted (so $A_{t+1}$ occurs), then the event $I'_{\ell_B}[t+1]$ occurs. From that point the argument proceeds exactly as in Case 1, replacing $\ell_L$ with $\ell_B$. We use Lemma 15 to obtain the upper bound
$$\sum_{j \le \ell_B} w_j \le w_{\ell_B} \left( \frac{n}{c \ln K} + 1 + \gamma n + n^{-3} \right) \le K^{c\gamma} w_{r_B} \left( \frac{n}{c \ln K} + 1 + \gamma n + n^{-3} \right).$$
The rest of the argument is exactly the same as in Case 1.

4.3 Case 3

For this case, assume that $B$ is immediately to the left of a long block $R$. Hence both $B$ and $R$ are in the copy regime. If $B$ is not the leftmost block, then there is a block $L$ immediately to the left of $B$. Block $L$ is short, since any pair of long blocks has at least three short blocks between them. Thus, $L$ is in the damped regime. If $B$ is the leftmost block, to keep notation simple, we add an artificial block $L = \{\ell_B\} = \{n\}$. Note that […] $\ell_L$ or both $j > i$ and $x_j[t] = 1$. Thus, by the definition of $A_{t+1}$, we have
$$\sum_{j \le \ell_L} a_j \left( (x_j[t] \oplus y_j) - x_j[t] \right) \le 0.$$
We compute $\sum_{j \in L :\, y_j = 1,\, x_j[t] = 0} a_j + \sum_{j \in B \cup R :\, y_j = 1,\, x_j[t] = 0} a_j - \sum_{j \in B \cup R :\, y_j = 1,\, x_j[t] = 1} a_j \le$ […] $\ell_L$ or if $j > i$ and $x_j[t] = 1$. Thus, if $y \in Y$, then, by the definition of $A_{t+1}$, we have
$$0 \le \sum_{j \le \ell_L} a_j \left( x_j[t] - (x_j[t] \oplus y_j) \right). \qquad (7)$$
To derive an upper bound on the right-hand side of Equation (7) we split the summation into three easily-bounded parts. The summation over $j \in L - \{r_L\}$ is equal to $-\sum$ […] $i$ such that $x_j[t] = 1$. Thus, $P \ge (1-p_n)^n$, so $(\varepsilon/3)(1 - c/n)^n \le (\varepsilon/3) P$. We conclude that the first term is at least $-(\varepsilon/3) P w_i$. Using Lemma 15 and $w_{\ell_B} \le K^{\gamma c} w_i$, we obtain
$$K^{\gamma c} \frac{c}{n} \sum_{j \le \ell_B} w_j \le w_i \left( \frac{K^{2\gamma c}}{\ln K} + \frac{c K^{2\gamma c}}{n} + K^{2\gamma c} \gamma c + \frac{c K^{2\gamma c}}{n^4} \right).$$
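The four summands above are $w_i (c/n) K^{2\gamma c}$ times the bracket of Lemma 15, whose term $n/(c \ln K) + 1$ is the geometric-series bound from the proof of Lemma 12. That bound, $1/(1 - K^{-c/n}) \le n/(c \ln K) + 1$, is equivalent to $e^x \ge 1 + x$ with $x = (c \ln K)/n$. A quick numeric sanity check over a small, arbitrarily chosen grid of parameter values (a check, not a proof) could look like this in Python; the function names are illustrative.

```python
import math


def geometric_factor(c, log_k, n):
    """1 / (1 - K^{-c/n}) = sum_{h >= 0} K^{-ch/n}, with K = e^{log_k} (Lemma 12).

    math.expm1 keeps 1 - K^{-c/n} accurate when c ln K / n is tiny."""
    return 1.0 / -math.expm1(-c * log_k / n)


def lemma12_bound(c, log_k, n):
    """The closed-form bound n / (c ln K) + 1 from the proof of Lemma 12."""
    return n / (c * log_k) + 1.0


violations = [
    (c, log_k, n)
    for c in (0.5, 1.0, 4.0, 10.0)
    for log_k in (0.1, 1.0, 10.0, 100.0)
    for n in (10, 100, 1000, 10**5)
    if geometric_factor(c, log_k, n) > lemma12_bound(c, log_k, n)
]
print("violations:", violations)   # the inequality e^x >= 1 + x predicts an empty list
```

Using `expm1` instead of `1 - math.exp(...)` matters for the small-$x$ corner of the grid, where the bound is tight up to an additive $1/2$ on a value of order $n/(c \ln K)$.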
Given the constraints on our parameters (see the discussion at the end of Case 1), each of the four summands, $K^{2\gamma c}/\ln K$, $cK^{2\gamma c}/n$, $K^{2\gamma c}\gamma c$ and $cK^{2\gamma c}/n^4$, is at most $(\varepsilon/12) P$. Thus, the second term, $-K^{\gamma c} \sum_{j \le \ell_B,\, j \ne i} \frac{c}{n} w_j$, is also at least $-(\varepsilon/3) P w_i$. In a similar way, we see that the third term, $P\left( (1 + \frac{1}{n} - K^{-\gamma c}) K^{\gamma c} w_i + w_i \right)$, is at least $P w_i (1 - \varepsilon/3)$. We conclude that
$$E[\Phi_f(x[t]) - \Phi_f(x[t+1]) \mid I_i[t+1]] \ge P w_i (1-\varepsilon),$$
which establishes Equation (3), as required.

4.5 Case 5

For this case, assume that $B$ is a long block. To the right of $B$, there might be a short block $R$; otherwise $r_B = 1$ and we define $R = \{r_B\}$ to ease notation. To the left of $B$, there might be a short block $L$; otherwise $\ell_B = n$ and we define $L = \{\ell_B\}$ to ease notation. Let $Y$ be the set of $n$-bit binary strings such that, if $y[t+1] = y$, then $I_i[t+1]$ occurs and $A_{t+1}$ occurs (so the move in step $t+1$ is accepted). As in Case 4, $A_{t+1}$ implies $I'_{\ell_L}[t+1]$. Hence for every $y \in Y$ we have $y_j = 0$ for $j > \ell_L$ and for all $j > i$ satisfying $x_j[t] = 1$. Thus, if $y \in Y$, then, by the definition of $A_{t+1}$, we have
$$\begin{aligned} 0 &\le \sum_{j \le \ell_L} a_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) \\ &\le \sum_{r_R \le j \le \ell_L} a_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) + a_i n^{-3} \\ &\le \sum_{r_B \le j \le \ell_L} a_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) + \sum_{j \in R;\, y_j = 1;\, x_j[t] = 1} a_j + a_i n^{-3}. \end{aligned} \qquad (10)$$
We will use the fact that for $j \in L \cup B$, we have $w_j = \frac{w_{r_B}}{a_{r_B}} a_j$ since we are in the copy regime, whereas for $j \in R$, we are in the damped regime, so we have
$$a_j \le a_{r_B} = \frac{a_{r_B}}{w_{r_B}} w_{r_B} \le \frac{a_{r_B}}{w_{r_B}} w_{r_R} K^{(r_B - r_R)c/n} \le \frac{a_{r_B}}{w_{r_B}} w_j K^{\gamma c}.$$
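Both regimes used in this step come from the weight definition of Section 3.3. The following Python sketch computes the weights for a given block decomposition and checks, on a toy instance, that $w_j/a_j$ is constant on $L \cup B$ and that $a_j \le \frac{a_{r_B}}{w_{r_B}} w_j K^{(r_B - r_R)c/n}$ for $j \in R$ (the displayed inequality before replacing $K^{(r_B - r_R)c/n}$ by $K^{\gamma c}$). The block decomposition itself (Sections 3.1 and 3.2) is taken as input in a hypothetical format; the function name and the toy numbers are illustrative only and do not attempt to satisfy the paper's block-length constraints.

```python
def compute_weights(a, blocks, is_long, K, c):
    """Weights w_1..w_n of Section 3.3, returned as w[1..n] (w[0] is unused).

    a       -- coefficients with a[0] unused: a[1..n], positive and non-decreasing
    blocks  -- hypothetical input format: (r_B, l_B) pairs ordered from right to left,
               with blocks[0][0] == 1 and each block's l_B equal to the next block's
               r_B (adjacent blocks share one bit position)
    is_long -- is_long[k] tells whether blocks[k] is a long block
    """
    n = len(a) - 1
    w = [0.0] * (n + 1)
    w[1] = 1.0
    for k, (r, l) in enumerate(blocks):
        # Copy regime: B is long, or B is immediately to the left of a long block
        # (i.e. its right-hand neighbour blocks[k - 1] is long).
        copy = is_long[k] or (k > 0 and is_long[k - 1])
        for i in range(r + 1, l + 1):      # w[r] is already fixed by the block to the right
            ratio = a[i] / a[r]
            if copy:
                w[i] = w[r] * ratio
            else:                          # damped regime
                w[i] = w[r] * min(K ** ((i - r) * c / n), ratio)
    return w


# Hypothetical toy instance (n = 6), right to left: a short block R = [1..3],
# a long block B = [3..5] and a short block L = [5..6] immediately to its left.
a = [0, 1, 1, 2, 40, 50, 60]
w = compute_weights(a, [(1, 3), (3, 5), (5, 6)], [False, True, False], K=2.0, c=1.0)
print([round(wi, 4) for wi in w[1:]])
```

Processing blocks from right to left mirrors the definition, which fixes $w_1 = 1$ and then defines each $w_i$ relative to $w_{r_B}$ of its own block.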
Plugging this into (10), we obtain
$$\begin{aligned} \sum_{r_B \le j \le \ell_L} w_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) &= \frac{w_{r_B}}{a_{r_B}} \sum_{r_B \le j \le \ell_L} a_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) \\ &\ge -\frac{w_{r_B}}{a_{r_B}} \left( \sum_{j \in R;\, y_j = 1;\, x_j[t] = 1} a_j + a_i n^{-3} \right) \\ &\ge -K^{\gamma c} \sum_{j \in R;\, y_j = 1;\, x_j[t] = 1} w_j - w_i n^{-3}. \end{aligned}$$
Let $\Psi(y) = -K^{\gamma c} \sum_{j \le \ell_R;\, y_j = 1} w_j - w_i n^{-3}$. From the above,
$$\sum_{j \le \ell_L} w_j \left( x_j[t] - (x_j[t] \oplus y_j) \right) \ge (1 - K^{\gamma c}) \sum_{j \in R;\, y_j = 1;\, x_j[t] = 1} w_j - \sum_{j \in R;\, y_j = 1;\, x_j[t] = 0} w_j - w_i n^{-3} - \sum_{j}$$
[…]
