Branch and Bound Algorithms for Maximizing Expected Improvement Functions

Mark Franey, Pritam Ranjan and Hugh Chipman
Department of Mathematics and Statistics, Acadia University, NS, Canada B4P 2R6

Abstract

Deterministic computer simulations are often used as a replacement for complex physical experiments. Although less expensive than physical experimentation, computer codes can still be time-consuming to run. An effective strategy for exploring the response surface of the deterministic simulator is the use of an approximation to the computer code, such as a Gaussian process (GP) model, coupled with a sequential sampling strategy for choosing design points that can be used to build the GP model. The ultimate goal of such studies is often the estimation of specific features of interest of the simulator output, such as the maximum, minimum, or a level set (contour). Before approximating such features with the GP model, sufficient runs of the computer simulator must be completed. Sequential designs with an expected improvement (EI) function can yield good estimates of the features with a minimal number of runs. The challenge is that the expected improvement function itself is often multimodal and difficult to maximize. We develop branch and bound algorithms for efficiently maximizing the EI function in specific problems, including the simultaneous estimation of a minimum and a maximum, and the estimation of a contour. These branch and bound algorithms outperform other optimization strategies such as genetic algorithms, and over a number of sequential design steps can lead to dramatically superior accuracy in estimation of features of interest.
Keywords: Computer experiments, Feature estimation, Sequential design

Email addresses: mark.franey@gmail.com, pritam.ranjan@acadiau.ca, hugh.chipman@acadiau.ca (Mark Franey, Pritam Ranjan and Hugh Chipman)

Preprint submitted to Journal of Statistical Planning and Inference, October 25, 2018

1. Introduction

Computer experiments are often employed to simulate complex systems, but realistic computer simulators of physical phenomena can be time-consuming to evaluate. In such cases it is desirable to perform the experiment as efficiently as possible. One popular approach designed for this purpose is sequential experimental design (Jones, 2001), whereby an initial design is simulated and a surrogate model is fit to the output, often using a Gaussian spatial process (Sacks et al., 1989). The surrogate provides estimates at all input combinations (sampled and unsampled) with accompanying variance estimates, and is used, along with a suitable 'infill criterion', to choose subsequent trials. Scientists are often interested in specific features of the simulator, such as the maximum, the minimum, level set(s), etc. In each case, the follow-up trial in the sequential design is selected so as to give the greatest improvement to the estimate(s) of these feature(s) of interest. A popular approach is to choose this new trial by maximizing a merit-based infill criterion. Many criteria have been proposed (Jones et al., 1998; Villemonteix et al., 2006; Forrester and Jones, 2008), and one popular choice is the 'expected improvement'. This criterion has been shown to be efficient provided the initial design is not too sparse or deceptive (Forrester and Jones, 2008).
The form of the improvement function varies with the features of interest, but in any case facilitates a balance between choosing points that exhibit local optimality (greedy search, new trials close to the predicted feature of interest) and points with high uncertainty (global search, new trials away from previously sampled points) (Papalambros et al., 2002). Finding the next design point (the point that maximizes the EI function) is necessary in order to ensure the estimation of features of interest in the smallest number of simulator runs. Finding this maximum is a difficult global optimization problem, and the addition of each point in the sequential design changes the EI function, requiring another search for the global optimum of EI in order to find the next point.

The main focus of this paper is to develop branch and bound algorithms for maximizing expected improvement functions for two particular features of interest: contours, and the simultaneous estimation of the maximum and minimum. The most challenging aspect of this algorithm is constructing the bounds on the objective function (expected improvement). Branch and bound algorithms have been implemented for finding the minimum of a function (Jones et al., 1998), and expected improvement criteria have been developed for the minimum (Jones et al., 1998) and for contours (Ranjan et al., 2008). Our main contributions are in (a) generalizing the expected improvement criterion for simultaneous estimation of the maximum and minimum, and (b) developing branch and bound algorithms for maximizing expected improvement for contours, and for simultaneous maximum/minimum estimation. Ranjan et al. (2008) maximized their expected improvement function for estimating contours using a genetic algorithm. The branch and bound algorithm developed here outperforms their optimization by locating level sets using fewer simulator runs.
This paper is organized as follows: Section 2 gives an overview of the Gaussian process model, the expected improvement criterion and a generic branch and bound algorithm which will be used to maximize the expected improvement over candidate input points. In Section 3 we propose bounds on the expected improvement for estimating contours and for simultaneous estimation of the global maximum and minimum. Section 4 presents simulation results comparing our branch and bound algorithm and a genetic algorithm similar to that of Ranjan et al. (2008) for different test functions. We present our conclusions and outline possible future work in Section 5.

2. Review - surrogate model, global optimization

2.1. Gaussian Process Model

Let y_i represent the univariate response from the computer simulator, and X = (x_1, x_2, ..., x_n)′ be the matrix of input vectors x_i′ = (x_{i1}, x_{i2}, ..., x_{id}) used to generate the respective outputs, with y(X) = (y_1, y_2, ..., y_n)′. Without loss of generality, let the input space be the unit hypercube χ = [0, 1]^d. Following Sacks et al. (1989) we model the output y(x_i) as

    y(x_i) = μ + z(x_i),  i = 1, ..., n,    (1)

where μ is the overall mean, and {z(x), x ∈ χ} is a Gaussian spatial process with mean 0, variance σ_z² and corr(z(x_i), z(x_j)) = R_ij. The correlation matrix R is a function of the hyper-parameters θ = (θ_1, ..., θ_d). In general, y(X) has a multivariate normal distribution, y(X) ~ N_n(1_n μ, σ_z² R), where the parameters Ω = (θ_1, ..., θ_d; μ, σ_z²) are estimated by maximizing the likelihood. The mean and variance parameters have closed-form estimates given by

    μ̂ = (1_n′ R⁻¹ 1_n)⁻¹ 1_n′ R⁻¹ y   and   σ̂_z² = (y − 1_n μ̂)′ R⁻¹ (y − 1_n μ̂) / n,    (2)

whereas the hyper-parameters θ = (θ_1, ..., θ_d) are estimated by maximizing the profile likelihood.
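As a concrete illustration, the closed-form estimates in (2) take only a few lines to compute. This is a sketch under the assumption of a power exponential correlation with power 2 (the Gaussian correlation) and a fixed θ; it is not the code used in the paper, and the function names are our own:

```python
import numpy as np

def corr_matrix(X, theta):
    """Power exponential correlation with power p = 2 (Gaussian):
    R_ij = exp(-sum_k theta_k * (x_ik - x_jk)^2)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2 * theta).sum(axis=2)
    return np.exp(-d2)

def gp_mle(X, y, theta):
    """Closed-form estimates (2) of mu and sigma_z^2 for a fixed theta."""
    n = len(y)
    R_inv = np.linalg.inv(corr_matrix(X, theta))
    ones = np.ones(n)
    # mu-hat = (1'R^-1 1)^-1 1'R^-1 y
    mu_hat = (ones @ R_inv @ y) / (ones @ R_inv @ ones)
    resid = y - mu_hat
    # sigma-hat^2 = (y - 1 mu)' R^-1 (y - 1 mu) / n
    sigma2_hat = (resid @ R_inv @ resid) / n
    return mu_hat, sigma2_hat
```

In practice these two estimates would be recomputed inside the profile-likelihood search over θ.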
The best linear unbiased predictor (BLUP) at x*, ŷ(x*), and the associated mean squared error (MSE) s²(x*) are given by

    ŷ(x*) = μ̂ + r′ R⁻¹ (y − 1_n μ̂),    (3)

    s²(y(x*)) = σ_z² [ 1 − r′ R⁻¹ r + (1 − 1_n′ R⁻¹ r)² / (1_n′ R⁻¹ 1_n) ],    (4)

where r = (r_1(x*), ..., r_n(x*))′ and r_i(x*) = corr(z(x*), z(x_i)). The estimates ŷ(x*) and ŝ(x*) are obtained by replacing R, r and σ_z² with R̂(θ̂), r̂(θ̂) and σ̂_z². There are innumerable choices for the correlation structure, for instance power exponential correlation and Matérn correlation (see Stein (1999); Santner et al. (2003) for details). The choice of correlation function is not crucial to the algorithms and methodologies developed here, and thus we omit the details. In all illustrations, the power exponential correlation function is used.

It turns out that if the design points are close together in the input space, the correlation matrix R can sometimes be near-singular, which results in unstable computation. A popular approach to overcome this problem is to introduce a small nugget 0 < δ < 1 and approximate the ill-conditioned R⁻¹ with a well-conditioned (R + δI)⁻¹, where I is the n × n identity matrix. In later sections, a nugget is included in the GP model.

2.2. Sequential design using EI criterion

For expensive computer models the total number of simulator runs is limited, and thus the choice of design points is crucial for estimating certain pre-specified features or approximating the underlying process. If we are interested in only certain features (e.g., global maxima, contours, and so on) of the process, a popular approach is to use a sequential sampling strategy. The key steps of such a sampling strategy are summarized as follows:

1. Choose an initial design {x_1, ..., x_{n_0}}.
2.
Run the simulator, f, for these design settings to get {y_1, ..., y_{n_0}}, where y_i = f(x_i). Set n = n_0.
3. Fit a surrogate to the data {(x_i, y_i), i = 1, ..., n} (we use the Gaussian process model).
4. Choose a new trial x_new that leads to improvement in the estimate of the feature of interest.
5. Run the simulator for the new trial location and update the data: x_{n+1} = x_new, y_{n+1} = f(x_new) and n = n + 1.
6. Repeat Steps 3-5 until n exceeds a predetermined limit.

Developing feature-specific criteria for selecting new trials has gained considerable attention in the computer experiment community. For instance, Jones et al. (1998) proposed an expected improvement (EI) criterion for estimating the global minimum of an expensive computer simulator. Let I(x) denote the improvement function for estimating a particular feature of interest. Then, the proposed criterion in Jones et al. (1998) is E[I_1(x)], where I_1(x) = max{f_min − y(x), 0}, f_min is the current best estimate of the function minimum, and the expectation is taken over the distribution of y(x) ~ N(ŷ(x), s²(x)). That is, the new trial in Step 4 is the maximizer of

    E[I_1(x)] = s(x) φ(u) + (f_min − ŷ(x)) Φ(u),    (5)

where u = (f_min − ŷ(x))/s(x), and φ(·) and Φ(·) denote the standard normal pdf and cdf respectively. One of the most attractive characteristics of the EI criterion is that it exhibits a balance between 'global' and 'local' search: the first term in (5) supports global search, and the second term, local search. Since the main objective of performing a good design is to attain (or at least reach a good approximation of) the global minimum in as few simulator runs as possible, efficient optimization of the EI function becomes very important.
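The predictor (3), its MSE (4) and the criterion (5) can be sketched together as follows. This is an illustration with our own helper names and a Gaussian correlation, not the paper's implementation:

```python
import math
import numpy as np

def _phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def _Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def blup(x_star, X, y, theta):
    """BLUP (3) and MSE (4) at x_star, using the plug-in estimates (2)."""
    n = len(y)
    R = np.exp(-(((X[:, None, :] - X[None, :, :]) ** 2) * theta).sum(axis=2))
    r = np.exp(-(((X - x_star) ** 2) * theta).sum(axis=1))
    R_inv = np.linalg.inv(R)
    ones = np.ones(n)
    mu = (ones @ R_inv @ y) / (ones @ R_inv @ ones)
    sigma2 = ((y - mu) @ R_inv @ (y - mu)) / n
    y_hat = mu + r @ R_inv @ (y - mu)
    s2 = sigma2 * (1.0 - r @ R_inv @ r
                   + (1.0 - ones @ R_inv @ r) ** 2 / (ones @ R_inv @ ones))
    return y_hat, max(s2, 0.0)   # clip tiny negative values from round-off

def ei_min(y_hat, s, f_min):
    """E[I_1(x)] from (5); reduces to max(f_min - y_hat, 0) as s -> 0."""
    if s <= 0.0:
        return max(f_min - y_hat, 0.0)
    u = (f_min - y_hat) / s
    return s * _phi(u) + (f_min - y_hat) * _Phi(u)
```

At a design point the BLUP interpolates the observed output and the MSE collapses to zero, so the EI there is (essentially) zero; the criterion is largest where the prediction is low, the uncertainty is high, or both.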
The EI functions are often multimodal, and the location, number and heights of these peaks change after adding every new trial. Jones et al. (1998) developed a branch and bound (BNB) algorithm for optimizing the EI function (5), which we review briefly in the next section.

2.3. Branch and bound algorithm: review for finding a global minimum

BNB algorithms are often used for global optimization of non-convex functions (Lawler and Wood, 1966; Moore, 1991). The BNB algorithm presented here finds the global minimum of a real-valued function g. In our case, g(x) = −E[I(x)] is defined on the d-dimensional hypercube Q_init = χ = [0, 1]^d for a specific feature of interest.

There are three key components of a BNB algorithm: (i) branching, (ii) bounding and (iii) pruning. The branching component uses a splitting procedure that, given a rectangle Q ⊂ Q_init, returns Q_I and Q_II such that Q_I ∩ Q_II = ∅ and Q_I ∪ Q_II = Q. The splitting occurs along the longest edge of Q (if there is a tie, one is chosen randomly). Bounding is the second component. Bounding the objective function g requires finding lower and upper bounds, Ψ_lb and Ψ_ub, of the minimum value of g. Let Ψ_min(Q) = min_{x ∈ Q} g(x) for every Q ⊂ Q_init; then Ψ_lb and Ψ_ub must satisfy

    R1: Ψ_lb(Q) ≤ Ψ_min(Q) ≤ Ψ_ub(Q)
    R2: for all ε > 0 there exists δ > 0 such that for all Q ⊆ Q_init,    (6)
        |Q| ≤ δ implies Ψ_ub(Q) − Ψ_lb(Q) ≤ ε.

As in Balakrishnan et al. (1991), |Q| is the length of the longest edge of rectangle Q. Although BNB algorithms guarantee the global minimum of g within a pre-specified tolerance ε, the bounding functions Ψ_lb and Ψ_ub are objective-function specific and often nontrivial to derive. This is the most challenging part of a BNB algorithm.
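The three components can be illustrated end-to-end on a toy objective for which exact rectangle bounds are available. This generic loop is our own illustration, not the paper's implementation; `g`, `psi_lb` and `psi_ub` are hypothetical names:

```python
def branch_and_bound(psi_lb, psi_ub, q_init, eps=1e-4, max_iter=10000):
    """Minimize over the box q_init = tuple of (lo, hi) pairs, one per
    dimension, given rectangle-wise bounds psi_lb, psi_ub satisfying R1-R2."""
    rects = [(psi_lb(q_init), psi_ub(q_init), q_init)]
    for _ in range(max_iter):
        L_k = min(lb for lb, _, _ in rects)   # smallest lower bound
        U_k = min(ub for _, ub, _ in rects)   # smallest upper bound
        if U_k - L_k <= eps:
            return U_k                        # within eps of the global minimum
        # branching: split the rectangle attaining L_k along its longest edge
        t = min(rects, key=lambda r: r[0])
        rects.remove(t)
        q = t[2]
        j = max(range(len(q)), key=lambda i: q[i][1] - q[i][0])
        mid = 0.5 * (q[j][0] + q[j][1])
        for half in ((q[j][0], mid), (mid, q[j][1])):
            qi = q[:j] + (half,) + q[j + 1:]
            rects.append((psi_lb(qi), psi_ub(qi), qi))
        # pruning: drop rectangles whose lower bound exceeds the best upper bound
        U_k = min(ub for _, ub, _ in rects)
        rects = [r for r in rects if r[0] <= U_k]
    return min(ub for _, ub, _ in rects)

# Toy objective g(x) = (x - 0.3)^2 on [0, 1], where exact interval bounds
# are easy: g is monotone on any interval not containing 0.3.
def g(x):
    return (x - 0.3) ** 2

def psi_lb(q):
    lo, hi = q[0]
    return 0.0 if lo <= 0.3 <= hi else min(g(lo), g(hi))

def psi_ub(q):
    lo, hi = q[0]
    return min(g(lo), g(hi))  # an attained value upper-bounds the minimum
```

Here `branch_and_bound(psi_lb, psi_ub, ((0.0, 1.0),))` converges to the true minimum 0 within the tolerance. For the EI criteria that follow, the hard part is supplying `psi_lb` and `psi_ub`.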
Appropriate lower and upper bounds can be constructed for maximizing the EI criteria for a few specific process features of interest, and this is the main contribution of the paper. The third component of the BNB algorithm is pruning, in which the algorithm removes rectangles Q, from the list of all rectangles 𝓛, that have a lower bound greater than the upper bound of some other rectangle in 𝓛. In the BNB algorithm outlined below, individual rectangles are represented by Q's, lists of rectangles by 𝓛's, and scalar bounds by L's and U's:

1.  k = 0                                        (initialize counter)
2.  𝓛_0 = {Q_init}                               (initialize rectangle list)
3.  L_0 = Ψ_lb(Q_init)
4.  U_0 = Ψ_ub(Q_init)
5.  while U_k − L_k > ε
6.      pick Q ∈ 𝓛_k such that Ψ_lb(Q) = L_k
7.      split Q along one of its longest edges into Q_I and Q_II
8.      𝓛_{k+1} := (𝓛_k \ {Q}) ∪ {Q_I, Q_II}     (remove Q; replace with Q_I and Q_II)
9.      L_{k+1} := min_{Q ∈ 𝓛_{k+1}} Ψ_lb(Q)
10.     U_{k+1} := min_{Q ∈ 𝓛_{k+1}} Ψ_ub(Q)
11.     𝓛_{k+1} := 𝓛_{k+1} \ {Q ∈ 𝓛_{k+1} : Ψ_lb(Q) > U_{k+1}}
12.     k = k + 1
13. end

In this algorithm, k is the iteration index, 𝓛_{k+1} is the list of hyper-rectangles, (L_{k+1}, U_{k+1}) are the smallest lower and upper bounds for Ψ_min(Q_init) (in our case min_{x ∈ χ} −E[I(x)]) after k + 1 iterations, and ε is the desired precision, fixed beforehand. Steps 6-8 correspond to branching, Steps 9-10 to bounding, and Step 11 represents pruning of the hyper-rectangles.

The EI function E[I_1(x)], for finding the global minimum of the simulator, is a function of the input location x via the predicted response ŷ(x), the associated uncertainty s²(x) at x, and the current estimate of the global minimum f_min (in general, the estimate of the feature of interest). Jones et al. (1998) note that E[I_1(x)] is monotonic with respect to (w.r.t.) ŷ(x) and s(x). Let EI_1(s(x), ŷ(x)) denote E[I_1(x)] for all x ∈ χ. Taking partial derivatives of (5) w.r.t.
ŷ(x) and s(x) gives

    ∂EI_1/∂s(x) = φ(u)   and   ∂EI_1/∂ŷ(x) = −Φ(u).

The partial derivatives satisfy ∂EI_1/∂s(x) ≥ 0 and ∂EI_1/∂ŷ(x) ≤ 0 for all x ∈ χ. Hence if ŷ(x) and s(x) can be bounded by (ŷ_lb(Q), ŷ_ub(Q)) and (s_lb(Q), s_ub(Q)) respectively over a hyper-rectangle Q ∈ 𝓛, then for every x ∈ Q the lower and upper bounds on E[I_1(x)], which for g(x) = −E[I_1(x)] correspond (up to a change of sign) to Ψ_ub(Q) and Ψ_lb(Q), are

    E[I_1(x)]_lb = EI_1(s_lb(Q), ŷ_ub(Q)),   E[I_1(x)]_ub = EI_1(s_ub(Q), ŷ_lb(Q)),

where ŷ_lb(Q), ŷ_ub(Q), s_lb(Q) and s_ub(Q) are the lower and upper bounds on ŷ(x) and s(x) over Q. These bounds are needed in Steps 6-11. In practice s(x) is replaced by its predicted value ŝ(x). This approach of bounding the EI function is efficient, as bounding s(x) and ŷ(x) is relatively easy (see Jones et al. (1998) for details).

3. BNB Algorithms for New EI Criteria

In this section we first propose a generalization of the expected improvement criterion developed by Jones et al. (1998) for simultaneously estimating the maximum and minimum of an expensive deterministic computer simulator. Next we develop a BNB algorithm for maximizing this new EI criterion. We also propose a modification of the EI criterion developed by Ranjan et al. (2008) for estimating a pre-specified contour. This modification facilitates the development of a BNB algorithm for maximizing the modified EI criterion, and still maintains the global versus local search trade-off.

3.1. EI criterion for maximum and minimum

We propose an improvement function for simultaneous estimation of the global maximum and minimum.
This feature could be of specific interest, for instance, if one wishes to identify best-case and worst-case scenarios in a global climate change model, or maximal and minimal projected unemployment rates in a complex model of the economy. The improvement function can be written as

    I_2(x) = max{(y(x) − f_max), (f_min − y(x)), 0},    (7)

where f_max and f_min are the current best estimates of the global maximum and minimum respectively. The corresponding expected improvement criterion is obtained by taking the expectation of I_2(x) with respect to the distribution of y(x) ~ N(ŷ(x), s²(x)):

    E[I_2(x)] = s φ(u_1) + (ŷ − f_max) Φ(u_1) + s φ(u_2) + (f_min − ŷ) Φ(u_2),    (8)

where u_1 = (ŷ − f_max)/s and u_2 = (f_min − ŷ)/s. The EI criterion (8) turns out to be the sum of the two EI criteria for individually estimating the global maximum and the global minimum. Optimization of (8) favors subsequent sampling in the neighbourhood of the global maximum if (ŷ − f_max)Φ(u_1) is the dominating term in the sum, the global minimum if (f_min − ŷ)Φ(u_2) is the dominant term, and otherwise in the less explored regions, to minimize overall variability.

As in optimization of the EI criterion (5) for estimating the global minimum, a BNB algorithm can be developed for optimizing (8). The most difficult part, computation of the lower and upper bounds of (8) that satisfy R1 and R2 in (6) for minimizing g(x) = −E[I_2(x)], is accomplished via the monotonicity of E[I_2(x)] w.r.t. ŷ(x) and s(x). Let EI_2(s(x), ŷ(x)) denote E[I_2(x)] for all x ∈ χ. Then the two partial derivatives of E[I_2(x)] are

    ∂EI_2/∂s(x) = φ(u_1) + φ(u_2)   and   ∂EI_2/∂ŷ(x) = Φ(u_1) − Φ(u_2).

The partial derivative w.r.t. s(x) is positive for all x ∈ χ, and the partial derivative w.r.t.
ŷ(x) is equal to zero when ŷ(x) = (f_max + f_min)/2. Moreover, it is straightforward to show that

    ∂EI_2/∂ŷ(x) ≥ 0 if ŷ(x) > (f_max + f_min)/2,   and   ∂EI_2/∂ŷ(x) ≤ 0 if ŷ(x) < (f_max + f_min)/2.

Hence E[I_2(x)] is monotonic for all x ∈ χ w.r.t. s(x) and piecewise monotonic w.r.t. ŷ(x). Given the upper and lower bounds on ŷ(x) and s(x) in a hyper-rectangle Q ⊂ Q_init, somewhat conservative bounds on E[I_2(x)], for every x ∈ Q, are

    E[I_2(x)]_lb = min{EI_2(s_lb(Q), ŷ_lb(Q)), EI_2(s_lb(Q), ŷ_ub(Q))},
    E[I_2(x)]_ub = max{EI_2(s_ub(Q), ŷ_lb(Q)), EI_2(s_ub(Q), ŷ_ub(Q))}.

That is, the upper bound of E[I_2(x)] in Q is calculated at s_ub(Q) and, due to the piecewise monotonicity of E[I_2(x)], corresponds to the ŷ(x) in Q that is closest to f_min or f_max. Similarly, the lower bound of E[I_2(x)] is calculated using s_lb(Q) and the ŷ(x) in Q that is closest to (f_max + f_min)/2. These bounds are conservative because the lower and upper bounds of s(x) and the minimizer of |ŷ(x) − (f_max + f_min)/2| may not correspond to the same point in the input space. The lower bound Ψ_lb(Q) of min_{x ∈ Q}(−E[I_2(x)]) will be equal to the true minimum of −E[I_2(x)] only if the minimizers of s(x) and |ŷ(x) − (f_max + f_min)/2| are identical. As a result, the hyper-rectangles in 𝓛 are pruned at a slower rate.

3.2. BNB algorithm for contour estimation problem

The use of an EI criterion for estimating a contour (level set) of an expensive computer simulator was proposed by Ranjan et al. (2008). The motivating application involved distinguishing "good" from "bad" performance of a server in a one-server-two-queue network queuing simulator.
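Returning briefly to Section 3.1, the criterion (8) and its conservative rectangle bounds can be sketched in a few lines. This is our illustration, not the paper's code; for the lower bound it follows the prose rule that the minimizing ŷ in the rectangle is the value closest to (f_max + f_min)/2:

```python
import math

def _phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def _Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def ei2(s, y_hat, f_min, f_max):
    """E[I_2(x)] from (8): the sum of the max- and min-estimation EI criteria."""
    u1 = (y_hat - f_max) / s
    u2 = (f_min - y_hat) / s
    return (s * _phi(u1) + (y_hat - f_max) * _Phi(u1)
            + s * _phi(u2) + (f_min - y_hat) * _Phi(u2))

def ei2_bounds(s_lb, s_ub, y_lb, y_ub, f_min, f_max):
    """Conservative bounds on E[I_2] over a rectangle Q, given bounds on s(x)
    and y_hat(x) over Q: monotone increasing in s, piecewise monotone in y_hat."""
    mid = 0.5 * (f_min + f_max)
    # lower bound: smallest s, and the y_hat in [y_lb, y_ub] closest to the midpoint
    y_nearest_mid = min(max(mid, y_lb), y_ub)
    lb = ei2(s_lb, y_nearest_mid, f_min, f_max)
    # upper bound: largest s, and the endpoint closest to f_min or f_max
    ub = max(ei2(s_ub, y_lb, f_min, f_max), ei2(s_ub, y_ub, f_min, f_max))
    return lb, ub
```

Because the extreme s and the extreme ŷ generally occur at different input points, these bounds sandwich, but rarely attain, the true extremes of E[I_2(x)] over the rectangle.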
The proposed improvement function for estimating the contour S(a) = {x : y(x) = a} was

    I_3(x) = ε²(x) − min{(y(x) − a)², ε²(x)},

where ε(x) = α s(x), for a positive constant α, defines a neighbourhood around the contour. The expectation of I_3(x) w.r.t. y(x) ~ N(ŷ(x), s²(x)) is given by

    E[I_3(x)] = [ε²(x) − (a − ŷ(x))²][Φ(u_1) − Φ(u_2)]
                − 2(a − ŷ(x)) s(x) [φ(u_1) − φ(u_2)]
                − s²(x) ∫_{u_2}^{u_1} w² φ(w) dw,    (9)

where u_1 = (a − ŷ(x) + ε(x))/s(x), u_2 = (a − ŷ(x) − ε(x))/s(x), and φ(·) and Φ(·) are the standard normal pdf and cdf respectively. Ranjan et al. (2008) use E[I_3(x)] as the EI criterion for selecting additional new trials in the sequential sampling strategy. However, controlling the trade-off between local and global search has gained attention in the area of computer experiments (e.g., Schonlau, Welch and Jones (1998); Sóbester, Leary and Keane (2005)). This motivated us to modify the EI criterion in (9) in a way that facilitates the construction of lower and upper bounds satisfying R1 and R2 in (6), while still entertaining the global versus local search trade-off. Note that the third term in (9) represents the total uncertainty in the ε-neighbourhood of the contour at x, and can be dropped without altering the important features of the criterion. We propose a modified EI criterion, obtained by dropping the third term in (9) and using a change of variables {ŷ(x), s(x)} → {t(x), s(x)}, given by

    E[I_3*(x)] = s²(x)(α² − t²)[Φ(t + α) − Φ(t − α)] − 2 t s²(x)[φ(t + α) − φ(t − α)],    (10)

where t = (a − ŷ(x))/s(x).
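A minimal sketch of the modified criterion (10), written directly in terms of t and s (with α = 2 as in the paper's illustrations; our own function names, not the authors' code):

```python
import math

def _phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def _Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def ei3_star(s, t, alpha=2.0):
    """Modified contour criterion (10), with t = (a - y_hat)/s.
    Symmetric in t, largest on the estimated contour (t = 0),
    and scaled by s^2, so high-uncertainty regions are also attractive."""
    return (s * s * (alpha * alpha - t * t) * (_Phi(t + alpha) - _Phi(t - alpha))
            - 2.0 * t * s * s * (_phi(t + alpha) - _phi(t - alpha)))
```

For example, at t = 0 the criterion reduces to s² α² [Φ(α) − Φ(−α)], so for α = 2 it equals roughly 3.82 s², and it decays as |t| grows.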
Although the proposed modification slightly alters the tradeoff between global and local search (see Section 4.1 for more details), it makes the partial derivatives of E[I_3*(x)] piecewise monotone, which in turn facilitates easy construction of the lower and upper bounds in the BNB algorithm for efficient optimization of E[I_3*(x)]. Let EI_3*(s(x), t(x)) denote E[I_3*(x)] for all x ∈ χ. Then, the partial derivative of E[I_3*(x)] w.r.t. t = t(x) is

    ∂EI_3*/∂t = −2 s²(x) t [Φ(t + α) − Φ(t − α)] + 2α s²(x) t [φ(t + α) + φ(t − α)]
                + s²(x)(α² + t² − 2)[φ(t + α) − φ(t − α)].    (11)

It turns out that the sign of the partial derivative ∂EI_3*/∂t depends on the sign of t = (a − ŷ(x))/s(x). Since the normal pdf is symmetric around the mean (i.e., φ(u) = φ(−u) for all u), the partial derivative ∂EI_3*/∂t = 0 at t = 0. In general,

    ∂EI_3*/∂t ≤ 0 if t > 0,   = 0 if t = 0,   ≥ 0 if t < 0.

Due to the presence of the normal pdf and cdf in (11), it may not be straightforward to analytically prove this monotonicity result; however, it can easily be plotted (see Figure 1). There are a few points worth noting. For instance, the height of the peaks in Figure 1 increases with s. The magnitude and sign of the derivative depend on the choice of α (we used α = 2). Interestingly, the nature of the curves remains the same for α > 1 (approximately).

Figure 1: Partial derivative of EI_3*(t(x), s(x)) w.r.t. t(x), evaluated at α = 2 (curves for s = 1, ..., 5; horizontal axis t(x) = (a − ŷ(x))/s(x)).

The partial derivative of EI_3* w.r.t. s(x) is given by

    ∂EI_3*/∂s(x) = 2 s(x)(α² − t²)[Φ(t + α) − Φ(t − α)] − 4 t s(x)[φ(t + α) − φ(t − α)].    (12)

The partial derivative ∂EI_3*/∂s(x) is always non-negative.
However, as with the monotonicity of EI_3*(s(x), t(x)) with respect to t(x), an analytic proof may be somewhat complicated. Figure 2 displays the partial derivative ∂EI_3*/∂s(x) as a function of s(x).

As in the case of simultaneous estimation of the global maximum and minimum, bounds on E[I_3*(x)] for every x ∈ Q are obtained using the bounds on t(x) and s(x) in the hyper-rectangle Q ∈ 𝓛. The bounds are

    E[I_3*(x)]_lb = min{EI_3*(s_lb(Q), t_lb(Q)), EI_3*(s_lb(Q), t_ub(Q))},
    E[I_3*(x)]_ub = max{EI_3*(s_ub(Q), t_lb(Q)), EI_3*(s_ub(Q), t_ub(Q))}.

Because EI_3*(s(x), t(x)) is an increasing function of s(x), the lower bound of E[I_3*(x)] in Q is obtained using s_lb(Q) and the value of ŷ(x) farthest from the estimated contour.

Figure 2: Partial derivative of EI_3*(t(x), s(x)) w.r.t. s(x), evaluated at α = 2 (curves for t = 0, 1, ..., 4).

Ranjan et al. (2008) used a genetic algorithm for maximizing the expected improvement criterion (9). We will show in Sections 4.2 and 4.3 that the proposed BNB algorithm outperforms the implementation of Ranjan et al. (2008) and can save simulator evaluations.

4. Simulations

This section presents comparisons between the proposed branch and bound algorithm, a genetic algorithm (GA) and a non-sequential (static) design for maximizing the EI criteria for various test functions. Two approaches are taken to compare BNB, GA and the static design approach. The first, which we call the "direct comparison", was designed to reduce all noise external to the two optimization algorithms (BNB and GA) so that the EI criterion values chosen by each method can be directly compared.
The second, the "long run comparison", demonstrates the average performance of the two algorithms by comparing the estimated and true features of interest as additional new trials are augmented. In both cases, complex test functions are used for generating computer simulator outputs.

The GA we implemented to maximize the expected improvement function can be described as follows (see Ranjan et al. (2008) for details):

for i = 1..number_multistarts
    Generate an initial population X_init (of size n_init)
    for j = 1..number_generations
        Mutation:     randomly perturb X_init to get X_pert
        Augmentation: X_aug = [X_init; X_pert]
        Crossover:    randomly swap elements of X_aug to get X_cross
        Augmentation: X_aug = [X_aug; X_cross]
        Selection:    evaluate EI for every candidate solution in X_aug and
                      retain the n_init members with the highest EI value
    end
end

GAs are well-known algorithms that can produce competitive results. In the GA we used, the magnitude of the mutation was variable, perturbing each dimension by a random amount between 0 and 5% of the original value. The initial population was generated with the random Latin hypercube function lhsdesign in Matlab 7.5.0. Crossover was between randomly selected pairs, in random dimensions. See Holland (1975) and Mitchell (1996) for more information.

Before we compare the optimization power of the proposed BNB algorithms and the GA, we present results from an empirical study on the trade-off between local and global search in EI_3 and EI_3*, for estimating a pre-specified contour.

4.1. Comparison between EI_3 and EI_3*

Both EI_3 and EI_3* consist of terms that encourage local and global search. The most important difference is the balance between the local and the global searches. The entries in Table 1 denote the proportion of additional trials that were chosen in the neighborhood of a pre-specified contour, i.e., the proportion of new trials that favored local search.
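For reference, the GA outline given earlier in this section can be turned into a minimal runnable sketch. This is our Python simplification; the original was implemented in Matlab, and the function and parameter names here are hypothetical:

```python
import random

def ga_maximize(criterion, d, n_init=20, n_gen=30, n_starts=3):
    """Maximize `criterion` over [0, 1]^d by mutation, crossover and selection,
    restarted n_starts times (a simplified sketch of the GA outlined above)."""
    best = None
    for _ in range(n_starts):
        # initial population: plain uniform sampling stands in for lhsdesign
        pop = [[random.random() for _ in range(d)] for _ in range(n_init)]
        for _ in range(n_gen):
            # mutation: perturb each coordinate by 0-5% of its value, clipped to [0, 1]
            pert = [[min(1.0, max(0.0, v * (1.0 + random.uniform(-0.05, 0.05))))
                     for v in ind] for ind in pop]
            aug = pop + pert
            # crossover: swap one random coordinate between randomly chosen pairs
            for _ in range(n_init):
                a, b = random.sample(aug, 2)
                child = list(a)
                j = random.randrange(d)
                child[j] = b[j]
                aug.append(child)
            # selection: keep the n_init members with the highest criterion value
            aug.sort(key=criterion, reverse=True)
            pop = aug[:n_init]
        if best is None or criterion(pop[0]) > criterion(best):
            best = pop[0]
    return best
```

With the EI criterion plugged in as `criterion`, each generation evaluates EI for every candidate, so the evaluation budget grows with both the population size and the number of generations.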
Table 1 presents the results when the computer simulator outputs were generated from the Branin and two-dimensional Levy functions (see Section 4.3 for details on these test functions). The results presented here are averaged over 100 realizations of random maximin Latin hypercube designs of size n_0 = 20 used as the initial design. For each realization, the new trials were chosen by maximizing the two EI criteria using the GA. The contours of interest were y = 45 and y = 70 for the Branin and Levy functions respectively. This simulation will display whether there are practical differences between the two criteria in terms of their balance between local and global search.

Table 1: Proportion of additional trials chosen in the neighbourhood of a pre-specified contour, by the EI_3 and EI_3* criteria

number of        Branin y = 45 contour    Levy 2D y = 70 contour
points added     EI_3      EI_3*          EI_3      EI_3*
k = 5            0.55      0.66           0.49      0.55
k = 10           0.72      0.80           0.44      0.53
k = 20           0.85      0.89           0.39      0.54
k = 30           0.89      0.93           0.37      0.55

A new point was considered to be in the local neighbourhood if f(x_new) was within (40, 50) for the Branin function, and within (60, 80) for the Levy function. For both functions more points are added in the local neighbourhood with the new criterion, most notably in estimating the Levy contour when k = 30 new points were added. The practical implication of this simulation is that the new criterion allocates slightly more points to local search. This is expected, as the term with the integral, which contributes to the global search, was removed from (9). This is a desirable characteristic of an expected improvement criterion if the underlying surface is relatively smooth. If it is known that the simulator response surface is relatively bumpy, the local and global search terms in (10) could be reweighted as in Sóbester et al. (2005).
This reweighted EI would still enable easy construction of the lower and upper bounds for the branch and bound algorithm.

4.2. Direct comparison

The objective of this simulation is to compare BNB and GA optimization of the EI criterion for a single step of the sequential design algorithm, starting from the same initial design.

To begin, we follow Steps 1-3 of the sequential design strategy outlined in Section 2.2 to obtain a fitted surface. The fitted GP model serves as input for the BNB and GA optimizers. These two methods search for the maximum of the EI criterion over Q_init = [0, 1]^d, and then the estimates of the maximum EI are compared. The two optimizers use exactly the same inputs (MLEs of the parameters in the GP model, initial trials X_init, y(X_init), and the current best estimate of the feature of interest) in this comparison, and so their outputs are directly comparable, though subject to a small approximation. The approximation in BNB comes from estimating the bounds of ŷ(x) and ŝ(x) based on points in each rectangle Q ⊂ Q_init = [0, 1]^d. For example, the upper bound of ŷ(x) in some rectangle Q is estimated by the maximum observed ŷ(x) for x ∈ Q from the GP model fit. Thus, the comparison will be between a 'stochastic version' of the branch and bound algorithm developed in Section 3 and the genetic algorithm outlined earlier. For all the comparisons presented here, the number of evaluations of the EI criterion is fixed for both BNB and GA: 500 for the 2-dimensional and 3000 for the 4-dimensional test functions. In actual applications, the use of a tolerance ε for BNB might be preferred to the evaluation budget adopted here; the evaluation budget facilitates direct comparison.
The entries in Tables 2 and 3 summarize the EI values found by BNB and GA for two features of interest (a contour, and the simultaneous maximum and minimum) and three test functions (Branin, two-dimensional and four-dimensional Levy). The numbers are the maximum EI values averaged over approximately 500 different initial random maximin Latin hypercube designs. A few simulations resulted in a bad GP fit (especially when fitting the complex Levy functions, see Figure 6), and both the BNB and GA results for those simulations were excluded from the average calculation. The numbers in parentheses show the standard errors of these EI values (s/√n = s/√500). For example, in the first row of Table 2, 9.85 and 7.05 are the average maximum EI values obtained by BNB and GA, and the second row contains the corresponding sample standard errors of 0.18 and 0.20 respectively.

Table 2 shows that for every case (choice of n0, feature of interest and test function) BNB maximizes E[I_2(x)] more effectively than GA. In some cases the two methods are closer; for instance, with 40 initial points in simultaneous estimation of the maximum and minimum for the 2D Levy function, the two methods are practically indistinguishable. But in most of these simulations BNB locates significantly higher EI than GA. Analogous to the results presented in Table 2, Table 3 presents the results for the four-dimensional Levy function.
The features of interest are the contour at y = 180 and the simultaneous determination of the maximum and minimum.

Table 2: EI optimization using BNB and GA for simultaneous estimation of the global maximum and minimum, and contours for the Branin and 2D Levy functions

    Maximum & minimum       Branin                Levy 2D
                         BNB      GA           BNB      GA
    n0 = 10             9.85     7.05         3.58     2.74
                       (0.18)   (0.20)       (0.23)   (0.18)
    n0 = 20             7.46     4.45         1.33     1.13
                       (0.17)   (0.18)       (0.11)   (0.10)
    n0 = 30             6.51     3.48         0.76     0.66
                       (0.17)   (0.16)       (0.06)   (0.05)
    n0 = 40             5.61     2.75         0.68     0.66
                       (0.18)   (0.14)       (0.05)   (0.04)

    Contour at y = a     Branin (a = 45)       Levy 2D (a = 70)
                         BNB      GA           BNB      GA
    n0 = 10            29.79    18.12        62.70    41.88
                       (0.67)   (0.49)       (6.19)   (4.81)
    n0 = 20             8.02     4.04        55.64    40.23
                       (0.24)   (0.15)       (5.14)   (4.27)
    n0 = 30             3.54     1.90        37.98    24.15
                       (0.15)   (0.09)       (3.58)   (2.57)
    n0 = 40             1.76     0.93        29.13    15.65
                       (0.08)   (0.05)       (3.19)   (1.99)

Results are presented when BNB and GA are allocated 500 and 3000 evaluations of the EI criterion (Tables 2 and 3 respectively). For the 4D Levy function, BNB achieves slightly better results than GA for both of these EI criteria.

Table 3: EI optimization using BNB and GA for estimation of the y = 180 contour, and simultaneous estimation of the maximum and minimum for the 4D Levy function

    4-Dimensional Levy Function
                     Contour y = 180       Maximum and minimum
                      BNB       GA            BNB       GA
    n0 = 30         516.1     509.8         11.79     10.63
                   (38.39)   (38.53)        (0.54)    (0.47)
    n0 = 40         325.5     315.4         12.46     10.80
                   (27.14)   (25.27)        (0.56)    (0.50)
    n0 = 50         230.5     227.2          8.99      8.13
                   (16.62)   (15.52)        (0.38)    (0.33)
    n0 = 60         172.1     170.4          9.12      8.35
                   (14.90)   (13.61)        (0.39)    (0.35)

It should be noted that one of the principal advantages of the branch and bound algorithm is that it is designed to give results within ε of the true maximum of E[I(x)].
Fixing the number of EI evaluations (and consequently the number of iterations of the BNB/GA optimization algorithm), though necessary for comparison, results in an approximation of the maximum of E[I(x)] which is not necessarily within the tolerance ε of the true maximum of E[I(x)].

4.3. Long run comparison

The maximum EI values found by BNB and GA cannot be compared after the first new point is added, because the augmented designs {X_init ∪ x_new^ga} and {X_init ∪ x_new^bnb} differ, as will the fitted GP surfaces and the EI surfaces. As a result, instead of comparing the maximum EI, the running best estimates of the features of interest (e.g., the maximum) after adding the k-th new design point are compared. We also use a non-sequential (static) approach, fitting a GP model to a random maximin Latin hypercube design of size n0 + k, to serve as a benchmark for comparison. Ideally, both BNB and GA should outperform this non-sequential approach. Next, we present three examples to illustrate the comparisons.

Example 1 - Branin. Suppose the computer model output y = f(x1, x2) is generated using the Branin function given by

    f(x1, x2) = (x2 - 5.1 x1^2/(4π^2) + 5 x1/π - 6)^2 + 10 (1 - 1/(8π)) cos(x1) + 10,    (13)

where x1, x2 ∈ [0, 5]; however, we rescale the inputs to [0, 1]^2. The true global maximum, y_max = 55.6, and minimum, y_min = 0.398, of the Branin function are attained at (x1, x2) = (0, 0) and (x1, x2) = (0.62, 0.42) respectively. Figure 3 presents the long run comparison of the BNB and GA optimizers for maximizing the EI criterion (8) developed for simultaneous estimation of the global maximum and minimum. For each realization, a random maximin Latin hypercube design of size n0 = 20 was chosen as the initial design, and then n_new = 30 additional trials were found sequentially (one at a time) by maximizing the EI criterion and augmented to the design.
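The Branin function of (13) is straightforward to code; a minimal sketch (not the authors' code):

```python
import math

def branin(x1, x2):
    """Branin function of Eq. (13); the paper evaluates it on [0, 5]^2
    and rescales the inputs to the unit square before modelling."""
    return ((x2 - 5.1 * x1**2 / (4 * math.pi**2) + 5 * x1 / math.pi - 6)**2
            + 10 * (1 - 1 / (8 * math.pi)) * math.cos(x1) + 10)
```

Evaluating at (0, 0) recovers the stated global maximum of about 55.6, which is a quick sanity check on the reconstruction.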
The results are averaged over 100 such realizations, and the error bars denote the simulation standard error.

[Figure 3: Estimated optima for the Branin function (true maximum = 55.6, minimum = 0.398); BNB (blue, solid), GA (red, dashed), static (black, dash-dotted). Panels: (a) Minimum, (b) Maximum.]

It is clear from Figure 3 that the optimization of the EI criterion using BNB (blue) leads to the true maximum in a significantly smaller number of computer simulator runs compared to the (red) GA optimizer or the baseline (black) search of a random maximin Latin hypercube. However, the estimated global minimum does not seem to be significantly different for any of the optimization techniques, though on average BNB attains a closer estimate of the true minimum than GA, which is in turn closer on average than the static method. This is somewhat intuitive, as the Branin function is quite smooth and almost flat near the global minimum (see Figure 4).

[Figure 4: Contours of the Branin function on [0, 5]^2 rescaled to [0, 1]^2.]

A comparison of a static design of size n0 + k and the sequential design with EI (10) optimized using BNB and GA when estimating a contour of height y = 45 is presented in Figure 5.
Their performance is measured in terms of the divergence between the true and estimated contour, given by

    d_k = sqrt( (1/m) Σ_{i=1}^{m} ( ŷ_k(x_{c,i}) - a )^2 ),    (14)

where {x_{c,i}, i = 1, ..., m} denotes the discretized true contour at y = 45, and ŷ_k(·) denotes the estimated process value after adding k new points, or from a static design of size n0 + k. The results are averaged over 100 random maximin Latin hypercube initial designs of size n0 = 20 each, and for each realization we add n_new = 30 additional trials.

[Figure 5: Estimated BNB (blue), GA (red), and static (black) contour divergence versus number of new points added to the initial design for the Branin test function.]

Figure 5 displays the divergence d_k for the Branin function with the contour at y = 45. In this simulation the BNB and GA methods perform similarly, both significantly better than the static design. The contour discrepancies of the sequential methods approach 0 quickly, an indication that this contour is relatively simple to locate. Since the contour at y = 45 is in the very bottom left corner of the design region (see Figure 4), the maximin Latin hypercube design is less likely to place points very near the contour y = 45, which results in poor performance for the static method.

Example 2 - Levy 2D. Suppose the computer model output y = f(x1, x2) is generated using the Levy function given by

    f(x1, ..., xd) = sin^2(π w1) + Σ_{k=1}^{d-1} (wk - 1)^2 [1 + 10 sin^2(π wk + 1)] + (wd - 1)^2 [1 + sin^2(2π wd)],

where wk = 1 + (xk - 1)/4 and xk ∈ [-10, 10] for k = 1, ..., d. We rescale the inputs x = (x1, ..., xd) to [0, 1]^d.
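The divergence in (14) can be computed directly; in this sketch ŷ_k is any callable predictor, and the averaging is over the m discretized contour points:

```python
import math

def contour_divergence(y_hat_k, contour_points, a):
    """RMS divergence d_k of Eq. (14): root mean squared difference between
    the estimated surface y_hat_k and the target contour level a, evaluated
    over the m discretized points of the true contour."""
    m = len(contour_points)
    return math.sqrt(sum((y_hat_k(x) - a) ** 2 for x in contour_points) / m)
```

For example, a predictor that is uniformly 1 unit above the target level over the contour gives d_k = 1.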
For the two-dimensional Levy function (see Figure 6), the global maximum, y_max = 95.4, and minimum, y_min = 0, are attained at (x1, x2) = (0, 0) and (x1, x2) = (0.55, 0.55) respectively.

[Figure 6: Contour plot of the 2D Levy function rescaled to [0, 1]^2.]

Figure 7 presents the long run comparison of the BNB and GA optimizers for maximizing the EI criterion (8) developed for simultaneous estimation of the global maximum and minimum. As in Example 1, the results are averaged over 100 realizations with initial designs of size n0 = 20 and n_new = 30 additional trials. For this example, BNB is a clear winner and leads to better estimates of the global optima in much fewer simulator runs for both the maximum and the minimum. As expected, GA performs significantly better than the static design.

Figure 8 presents the divergence comparison of the sequential design with the two optimizers, as well as a static design, when estimating the contour at y = 70. Here also, each initial design was of size n0 = 20, n_new = 30 new trials were added sequentially, and the results are averaged over 100 initial random maximin Latin hypercube designs. As in the maximum and minimum case, the sequence of points added by the BNB (blue) optimization of EI (10) leads to quicker convergence of the contour divergence compared to GA (red). On average the static (black) designs provide relatively minimal improvement to the contour estimate.
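For completeness, a sketch of the rescaled Levy function used in Examples 2 and 3 (not the authors' code; the final term follows the standard Levy form, and with it the function reproduces the stated optima: maximum ≈ 95.4 at (0, 0) and minimum 0 at (0.55, 0.55) on the unit square):

```python
import math

def levy(x, lower=-10.0, upper=10.0):
    """d-dimensional Levy function; x lies in the rescaled unit cube [0, 1]^d
    and is mapped back to [lower, upper]^d as in the paper."""
    z = [lower + xi * (upper - lower) for xi in x]   # undo the rescaling
    w = [1 + (zi - 1) / 4 for zi in z]
    total = math.sin(math.pi * w[0]) ** 2
    total += sum((wk - 1) ** 2 * (1 + 10 * math.sin(math.pi * wk + 1) ** 2)
                 for wk in w[:-1])
    total += (w[-1] - 1) ** 2 * (1 + math.sin(2 * math.pi * w[-1]) ** 2)
    return total
```

The minimum at 0.55 on the unit square corresponds to all inputs equal to 1 on the original [-10, 10] scale, where every term of the sum vanishes.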
[Figure 7: Estimated optima for the Levy 2D function (true minimum = 0, maximum = 95.4); BNB (blue, solid), GA (red, dashed), static (black, dash-dotted). Panels: (a) Minimum, (b) Maximum.]

[Figure 8: Estimated contour divergence comparison for the 2D Levy function (y = 70); BNB (blue, solid), GA (red, dashed) and static (black, dash-dotted).]

Example 3 - Levy 4D. We now use an example of a simulator with four-dimensional inputs (the Levy function) to demonstrate that careful optimization of the EI criterion can save a significant number of simulator runs. The simulation results use 30 initial design points chosen from a random maximin Latin hypercube, and 20 additional points added sequentially. Comparisons using this higher dimensional function may highlight the differences between the two methods of optimization. BNB and GA were each given a budget of 3000 evaluations of the EI criterion. The true global maximum is approximately 255 at (0, 0, 0, 0), and the minimum is 0 at (0.55, 0.55, 0.55, 0.55).

Figure 9 displays the comparison of our BNB algorithm with the GA and static designs in simultaneously estimating the maximum and minimum of the 4D Levy function. The results are averaged over 100 random maximin Latin hypercube initial designs of size n0 = 30. BNB begins to locate both the minimum and the maximum much more quickly than GA, which is in turn better than the static design. BNB is the only method that gets close to the maximum in 20 additional trials. All three methods are slow to locate the global maximum, a reflection of the complexity of the Levy function.
[Figure 9: Estimated optima for the Levy 4D function (true minimum = 0, maximum = 255); BNB (blue, solid), GA (red, dashed) and static (black, dash-dotted). Panels: (a) Estimated Minimum, (b) Estimated Maximum.]

As before, we simulate the estimation of the y = 180 contour of the 4D Levy function for the BNB, GA, and static designs. The number of initial points n0 was 30, and 20 new points were added. The results are averaged over 100 random maximin Latin hypercube designs. Figure 10 displays the contour divergence d_k for the three designs: sequential with EI (10) optimized using BNB or GA, and the static design. The new points selected by the BNB algorithm allow estimation of the 4D Levy y = 180 contour more accurately than the other two methods, decreasing the contour divergence much more rapidly after only one point was added.

[Figure 10: Estimated contour divergence comparison for the 4D Levy function (y = 180); BNB (blue, solid), GA (red, dashed) and static (black, dash-dotted).]

As illustrated in these examples, the branch and bound algorithm is quite competitive. For the 2D and 4D Levy functions, BNB is superior to GA and static designs in the estimation of contours as well as in the simultaneous estimation of the maximum and minimum. In estimating the maximum and minimum of the Branin function, BNB locates the maximum much more quickly than the other methods, and all three methods are similar in estimating the minimum. For contours of the Branin function, GA performs on par with BNB, both of which are preferable to a static design.
The Branin function is relatively smooth, and hence optimization of the EI criteria is relatively simple, which may explain the absence of differences between these methods in some cases.

5. Discussion

We have generalized the expected improvement criterion for finding the maximum and minimum (simultaneously) as the features of interest, and demonstrated the use of this infill criterion on the Levy and Branin test functions. We have also developed branch and bound algorithms for contour and maximum/minimum estimation, compared them with genetic algorithms and static designs, and shown that BNB outperforms the other two optimization approaches. This study demonstrates that careful optimization of the EI criterion can save a significant number of simulator runs. Note that we have used a 'stochastic version' of the BNB algorithm, where the approximation comes from estimating the bounds of ŷ(x) and ŝ(x) based on points in each rectangle Q ∈ Q_init = [0, 1]^d. Further work should be done to investigate maximization of the proposed EI criteria using a non-stochastic branch and bound algorithm.

References

Balakrishnan, V., Boyd, S. and Balemi, S. (1991), 'Branch and bound algorithm for computing the minimum stability degree of parameter-dependent linear systems', Int. J. of Robust and Nonlinear Control 1(4), 295-317.

Forrester, A. and Jones, D. (2008), Global Optimization of Deceptive Functions With Sparse Sampling, in '12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference', American Institute of Aeronautics and Astronautics.

Holland, J. (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press.

Jones, D. (2001), 'A Taxonomy of Global Optimization Methods Based on Response Surfaces', Journal of Global Optimization 21(4), 345-383.

Jones, D. R., Schonlau, M. and Welch, W. J.
(1998), 'Efficient Global Optimization of Expensive Black-Box Functions', Journal of Global Optimization 13(4), 455-492.

Lawler, E. and Wood, D. (1966), 'Branch-and-bound methods: A survey', Operations Research, pp. 699-719.

Mitchell, M. (1996), An Introduction to Genetic Algorithms, Bradford Books.

Moore, R. (1991), 'Global optimization to prescribed accuracy', Computers & Mathematics with Applications 21(6-7), 25-39.

Papalambros, P., Goovaerts, P. and Sasena, M. (2002), 'Exploration of Metamodeling Sampling Criteria for Constrained Global Optimization', Engineering Optimization 34(3), 263-278.

Ranjan, P., Bingham, D. and Michailidis, G. (2008), 'Sequential Experiment Design for Contour Estimation from Complex Computer Codes', Technometrics 50(4), 527-541.

Sacks, J., Welch, W., Mitchell, T. and Wynn, H. (1989), 'Design and analysis of computer experiments', Statistical Science 4(4), 409-435.

Santner, T. J., Williams, B. and Notz, W. (2003), The Design and Analysis of Computer Experiments, Springer.

Sóbester, A., Leary, S. and Keane, A. (2005), 'On the design of optimization strategies based on global response surface approximation models', Journal of Global Optimization 33(1), 31-59.

Stein, M. L. (1999), Interpolation of Spatial Data: Some Theory for Kriging, Springer.

Villemonteix, J., Vazquez, E. and Walter, E. (2006), 'An informational approach to the global optimization of expensive-to-evaluate functions', arXiv preprint cs.NA/0611143.