Modeling and Optimization with Gaussian Processes in Reduced Eigenbases - Extended Version

David Gaudrie¹,², Rodolphe Le Riche², Victor Picheny³, Benoît Enaux¹, and Vincent Herbert¹

¹ Groupe PSA
² LIMOS: CNRS, Mines Saint-Etienne and UCA
³ Prowler.io

Abstract

Parametric shape optimization aims at minimizing an objective function f(x) where x are CAD parameters. This task is difficult when f(·) is the output of an expensive-to-evaluate numerical simulator and the number of CAD parameters is large. Most often, the set of all considered CAD shapes resides in a manifold of lower effective dimension in which it is preferable to build the surrogate model and perform the optimization. In this work, we uncover the manifold through a high-dimensional shape mapping and build a new coordinate system made of eigenshapes. The surrogate model is learned in the space of eigenshapes: a regularized likelihood maximization provides the most relevant dimensions for the output. The final surrogate model is detailed (anisotropic) with respect to the most sensitive eigenshapes and rough (isotropic) in the remaining dimensions. Last, the optimization is carried out with a focus on the critical dimensions, the remaining ones being coarsely optimized through a random embedding and the manifold being accounted for through a replication strategy. At low budgets, the methodology leads to a more accurate model and a faster optimization than the classical approach of directly working with the CAD parameters.

Keywords: Dimension Reduction, Principal Component Analysis, Parametric Shape Optimization, Gaussian Processes, Bayesian Optimization

1 Introduction

The most frequent approach to shape optimization is to describe the shape by a vector of d Computer Aided Design (CAD) parameters, x ∈ X ⊂ R^d, and to search for the parameters that minimize an objective function, x* = arg min_{x ∈ X} f(x). In the CAD modeling process, the set of all possible shapes has been reduced to a space of parameterized shapes, Ω := {Ω_x, x ∈ X}.

(This is an extended version of the article "David Gaudrie, Rodolphe Le Riche, Victor Picheny, Benoit Enaux, Vincent Herbert. Modeling and Optimization with Gaussian Processes in Reduced Eigenbases. Structural and Multidisciplinary Optimization, Springer Verlag (Germany), 2020, 61, pp.2343-2361", https://doi.org/10.1007/s00158-019-02458-6. Please cite the journal version.)

It is common for d to be large, d ≳ 50. Optimization in such a high-dimensional design space is difficult, especially when f(·) is the output of a high-fidelity numerical simulator that can only be run a restricted number of times [44]. In computational fluid dynamics for example, simulations easily take 12 to 24 hours and evaluation budgets range between 100 and 200 calls. Surrogate-based approaches [39, 18] have proven their effectiveness in tackling optimization problems in a few calls to f(·). They rely on a surrogate model (or metamodel, e.g., Gaussian Processes [46, 13, 37]) built upon n past observations y_i = f(x^(i)). For a Gaussian Process (GP, [46, 13, 37]), given D_n = {(x^(1), y_1), ..., (x^(n), y_n)} = {x^(1:n), y_{1:n}}, f(·) can be predicted in closed form at any untested point x_new ∈ X via the kriging mean predictor, m(x_new). The probabilistic framework of GPs additionally provides the uncertainty associated to the prediction, known as the kriging variance, s²(x_new), also computable in closed form [37].
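For concreteness, the following minimal numpy sketch (not the authors' implementation; the squared-exponential kernel, the hyper-parameter values and the nugget are illustrative assumptions) shows these closed-form formulas with a constant mean β estimated by generalized least squares. The variance is the simple-kriging one, which omits the correction term for estimating β.

```python
import numpy as np

def corr(A, B, theta):
    """Anisotropic squared-exponential correlation between rows of A and B."""
    d2 = (((A[:, None, :] - B[None, :, :]) / theta) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

def kriging(X, y, Xnew, theta, sigma2, nugget=1e-8):
    """Closed-form kriging mean m(x_new) and variance s^2(x_new)."""
    R = corr(X, X, theta) + nugget * np.eye(len(X))
    Ri = np.linalg.inv(R)
    one = np.ones(len(X))
    beta = one @ Ri @ y / (one @ Ri @ one)                  # GLS estimate of the constant mean
    r = corr(Xnew, X, theta)                                # correlations with the design points
    m = beta + r @ Ri @ (y - beta * one)                    # kriging mean predictor
    s2 = sigma2 * (1 - np.einsum('ij,jk,ik->i', r, Ri, r))  # simple-kriging variance
    return m, np.maximum(s2, 0.0)
```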
For the optimization, the metamodel's prediction and uncertainty are mixed by an acquisition function such as the Expected Improvement [31] to decide which design x^(n+1) should be evaluated next. However, such techniques suffer from the curse of dimensionality [3] when d is large. The budget is also typically too narrow to perform sensitivity analysis [40] and select variables prior to optimizing. A further issue is that the CAD parameters x commonly have heterogeneous impacts on the shapes Ω_x: many of them are intended to refine the shape locally whereas others have a global influence, so that shapes of practical interest involve interactions between all the parameters.

Most often, the set of all CAD-generated shapes, Ω, can be approximated in a δ-dimensional manifold, δ < d. In [35, 36] this manifold is accessed through an auxiliary description of the shape, φ(Ω), φ being either its characteristic function or the signed distance to its contour. The authors aim at minimizing an objective function using diffuse approximation and gradient-based techniques, while staying on the manifold of admissible shapes. Active Shape Models [12] provide another way to handle shapes in which the contour is discretized [45, 50].

Building a surrogate model in reduced dimension can be performed in different ways. The simplest is to restrict the metamodel to the most influential variables. But typical evaluation budgets are too narrow to find these variables before the optimization. Moreover, correlations might exist among the original dimensions (here CAD parameters), so that a selection of a few variables may not constitute a valid reduced-order description, and meta-variables may be more appropriate. In [52], the high-dimensional input space is circumvented by decomposing the model into a series of low-dimensional models after an ANOVA procedure. In [8], a kriging model is built in the space of the first Partial Least Squares axes to emphasize the most relevant directions. Related approaches for dimensionality reduction inside GPs consist in a projection of the input x on a lower-dimensional hyperplane spanned by orthogonal vectors. These vectors are determined in different manners, e.g., by searching the active subspace in [11, 26], or during the hyper-parameters estimation in [47]. A more detailed bibliography of dimension reduction in GPs is given in Section 3.

For optimization purposes, the modes of discretized shapes [45] are integrated in a surrogate model in [25]. In [9], the optimization is carried out on the most relevant modes using evolutionary algorithms combined with an adaptive adjustment of the bounds of the design space, also employed in [43]. Following the same route, in Section 2 we retrieve a shape manifold with dimension δ < d. Our approach is based on a Principal Component Analysis (PCA, [49]) of shapes described in an ad hoc manner, in the same vein as [9, 25], but it provides a new investigation of the best way to characterize shapes. Section 3 is devoted to the construction of a kriging surrogate model in reduced dimension. Contrarily to [25, 26], the least important dimensions are still accounted for.
A regularized likelihood approach is employed for dimension selection, instead of the linear PLS method [8]. In Section 4, we employ the metamodel to perform global optimization [24] via the maximization of the Expected Improvement [31]. A reduction of the space dimension is achieved through a random embedding technique [51], and a pre-image problem is solved to keep the correspondence between the eigenshapes and the CAD parameters. The proposed method is summarized in Figure 1.

1. Sample inputs x and apply φ(x).
2. PCA: x mapped to α in the {v_1, ..., v_D} basis.
3. Determine the active and inactive eigendimensions for the output: α = [α_a, α_ā].
4. Build an additive GP with two different resolutions: Y(α) = β + Y_a(α_a) + Y_ā(α_ā).
5. Optimization in the α_a space combined with a random embedding in the α_ā space ⇒ α^(n+1)*.
6. Solve the pre-image problem: α^(n+1)* ⇒ x^(n+1); update the GP with (α(x^(n+1)), f(x^(n+1))), and with another point if replication is used.

Figure 1: Summary of the proposed method. Steps 3-6 are iterated during the optimization process.

Main notations

A           Manifold of α's for which ∃ x ∈ X : Vᵀ(φ(x) − φ̄) = α.
A_N         Empirical manifold of the α's which are the coordinates of the φ(x^(i))'s in the eigenbasis.
α           Coordinates of a design in the eigenshape basis.
α_a         Active components of α.
α_ā         Inactive components of α.
d           Number of (CAD) parameters.
d′          True effective dimension.
δ           Number of chosen/selected components for dimension reduction.
D           Dimension of the high-dimensional shape representation.
n           Number of evaluated designs.
N           Number of shapes in the Φ database.
Ω_x         Shape induced by the x parameterization.
φ(·)        High-dimensional shape mapping, φ : X → Φ.
Φ           Space of shape discretizations, Φ ⊂ R^D.
φ           High-dimensional shape representation of one design (φ ∈ R^D).
φ̄           Mean shape in the Φ database.
Φ (matrix)  Shape database (N × D matrix whose i-th row is φ(x^(i))).
V           D × D matrix whose columns (v_1, ..., v_D) are the eigenvectors of the covariance matrix of the Φ database. They are the vectors of the orthonormal V basis.
x           Design vector in the space of CAD parameters, x ∈ X.
X           Original search space (of CAD parameters), X ⊂ R^d.

2 From CAD description to shape eigenbasis

CAD parameters are usually set up by engineers to automate shape generation. These parameters may be Bézier or spline control points which locally readjust the shape. Other CAD parameters, such as the overall width or the length of a component, have a more global impact on the shape. While these parameters are intuitive to a designer, they are not chosen to achieve any specific mathematical property and, in particular, do not lend themselves to dimensionality reduction.

In order to define a better behaved description of the shapes that will help in reducing dimensionality, we exploit the fact that the time to generate a shape Ω_x is negligible in comparison with the evaluation time of f(x). In the spirit of kernel methods [48, 41], we analyze the designs x in a high-dimensional feature space Φ ⊂ R^D, D ≫ d (potentially infinite-dimensional), that is defined via a mapping φ(x), φ : X → Φ. With an appropriate φ(·), it is possible to distinguish a lower-dimensional manifold embedded in Φ.
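As an illustration of such a mapping (step 1 of Figure 1), here is a hypothetical φ(·) in the spirit of the contour discretizations introduced in Section 2.1: circles parameterized by x = (c_x, c_y, r) are discretized at D/2 boundary points. The value D = 200, the bounds of X and the sample size N = 5000 are assumptions made for the sketch.

```python
import numpy as np

def phi_contour(x, D=200):
    """Contour-discretization mapping for a circle x = (cx, cy, r):
    D/2 boundary points stacked as (s1x, s1y, s2x, s2y, ...) in R^D.
    The same angles t are used for every design so that discretizations
    of different shapes remain consistent."""
    cx, cy, r = x
    t = np.linspace(0.0, 2 * np.pi, D // 2, endpoint=False)
    pts = np.stack([cx + r * np.cos(t), cy + r * np.sin(t)], axis=1)
    return pts.ravel()

# N plausible designs sampled uniformly in X, mapped to the database Phi
rng = np.random.default_rng(0)
X_samples = rng.uniform([0.0, 0.0, 0.2], [3.0, 3.0, 1.0], size=(5000, 3))
Phi = np.array([phi_contour(x) for x in X_samples])   # N x D matrix
```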
As we deal with shapes, natural candidates for φ(·) are shape representations. This paper is motivated by parametric shape optimization problems. However, the approaches developed for metamodeling and optimization are generic and extend to any situation where a pre-existing collection of designs {x^(1), ..., x^(N)} and a fast auxiliary mapping φ(x) exist. φ(x) = x is a possible case. If x are parameters that generate a signal, another example would be φ(x), the discretized time series.

2.1 Shape representations

In the literature, shapes have been described in different ways. First, the characteristic function of a shape Ω_x [35] is

χ_{Ω_x}(s) = 1 if s ∈ Ω_x, 0 if s ∉ Ω_x   (1)

where s ∈ R² or R³ is the spatial coordinate. χ is computed at some relevant locations (e.g., on a grid) S = {s^(1), ..., s^(D)} and is cast as a D-dimensional vector of 0's and 1's depending on whether the s^(i)'s are inside or outside the shape. Second, the signed distance to the contour ∂Ω_x [36] is

D_{Ω_x}(s) = ε(s) min_{y ∈ ∂Ω_x} ‖s − y‖₂,  where ε(s) = 1 if s ∈ Ω_x, −1 if s ∉ Ω_x   (2)

and is also computed at some relevant locations (e.g., on a grid) S, then transformed into a vector with D components. Finally, the Point Distribution Model [12, 45], where ∂Ω_x is discretized at D/k locations s^(i) ∈ ∂Ω_x ⊂ R^k (k = 2 or 3), also leads to a D-dimensional representation of Ω_x, written 𝒟_{Ω_x} = (s^(1)ᵀ, ..., s^(D/k)ᵀ)ᵀ ∈ R^D. For different shapes Ω and Ω′, S has to be the same for χ and D, and the discretizations {s^(1)ᵀ, ..., s^(D/k)ᵀ} of Ω and Ω′ need to be consistent for 𝒟.

Figure 2 illustrates these shape representations for two different designs. The first one consists of three circles parameterized by their centers and radii. The second design is a NACA airfoil which depends on three parameters. These shapes are described by the mappings φ(x) ∈ R^D with φ(x) = χ_{Ω_x}(S), D_{Ω_x}(S) and 𝒟_{Ω_x}, respectively. Specifying another design with parameters x′ generally leads to φ(x) ≠ φ(x′).

Figure 2: Shape representations for a design consisting of three circles (top) and for a NACA airfoil (bottom). The representations are the characteristic function (left), the signed distance to the contour (center), and the contour discretization (right).

2.2 PCA to retrieve the effective shape dimension

During step 1 of our method (see Figure 1), a large number (N) of plausible designs x^(i) ∈ X is mapped to Φ ⊂ R^D to build the matrix Φ ∈ R^{N×D}, which contains the φ(x^(i)) ∈ R^D in rows and whose column-wise mean is φ̄ ∈ R^D. In the absence of a set of relevant x^(i)'s, these designs can be sampled from an a priori distribution, typically a uniform distribution.

Next (step 2 in Figure 1), we perform a Principal Component Analysis (PCA) on Φ: correlations are sought between the φ(x)_j's, j = 1, ..., D. The eigenvectors of the empirical covariance matrix C_Φ := (1/N)(Φ − 1_N φ̄ᵀ)ᵀ(Φ − 1_N φ̄ᵀ), written v_j ∈ R^D, form an ordered orthonormal basis of Φ with decreasing importance as measured by the PCA eigenvalues λ_j, j = 1, ..., D. They correspond to orthonormal directions in Φ that explain most of the dispersion of the high-dimensional representations of the shapes, φ(x^(i)). Any design x can now be expressed in the eigenbasis V := {v_1, ..., v_D} since
φ(x) = φ̄ + Σ_{j=1}^{D} α_j v_j   (3)

where (α_1, ..., α_D)ᵀ =: α = Vᵀ(φ(x) − φ̄) are the coordinates in V (principal components), and V := (v_1, ..., v_D) ∈ R^{D×D} is the matrix of eigenvectors (principal axes). α_j is the deviation from the mean shape φ̄ in the direction of the eigenvector v_j. The α^(i)'s form a manifold A_N := {α^(1), ..., α^(N)} which approximates the true α manifold, A := {α ∈ R^D : ∃ x ∈ X, α = Vᵀ(φ(x) − φ̄)}. Even though A_N ⊂ R^D, it is often a manifold of lower dimension, δ ≪ D, as we will soon see (Section 2.3).

Link with kernel PCA. N designs x^(i) ∈ R^d have been mapped to a high-dimensional feature space Φ ⊂ R^D in which PCA was carried out. This is precisely the task performed in Kernel PCA [41], a nonlinear dimension reduction technique (contrarily to PCA, which seeks linear directions in R^d). KPCA aims at finding a linear description of the data in a feature space Φ by applying a PCA to the nonlinearly mapped φ(x^(i)) ∈ Φ. The difference with our approach is that the mapping φ(·) as well as the feature space Φ are usually unknown in KPCA, since φ(x) may live in a very high-dimensional or even infinite-dimensional space in which dot products cannot be computed efficiently. Instead, dot products are computed using designs in the original space X via a kernel (which should not be mistaken for the kernel of GPs), k_φ : X × X → R, k_φ(x, x′) = ⟨φ(x), φ(x′)⟩_Φ; this is called the "kernel trick" [48, 41]. The eigencomponents of the points after mapping, α_j^(i) = v_jᵀ(φ(x^(i)) − φ̄), can be recovered from the eigenanalysis of the N × N Gram matrix K with K_ij = k_φ(x^(i), x^(j)) (see [41, 50] for algebraic details). Finding which original variables in x correspond to a given v_j is not straightforward and requires the resolution of a pre-image problem [30, 50]. Having a shape-related and computable φ(·) avoids these ruses and makes the principal axes v_j directly meaningful.

It is further possible to give the expression of the equivalent kernel in our approach, in terms of the mapping φ(·), from the polarization identity. By definition of the (centered) high-dimensional mapping to Φ, x ↦ φ(x) − φ̄,

‖(φ(x) − φ̄) − (φ(x′) − φ̄)‖²_{R^D} = ⟨(φ(x) − φ̄) − (φ(x′) − φ̄), (φ(x) − φ̄) − (φ(x′) − φ̄)⟩_{R^D}
                                   = ‖φ(x) − φ̄‖²_{R^D} + ‖φ(x′) − φ̄‖²_{R^D} − 2⟨φ(x) − φ̄, φ(x′) − φ̄⟩_{R^D}

where the last dot product is the centered kernel k_φ; hence,

k_φ(x, x′) = (1/2)(‖φ(x) − φ̄‖²_{R^D} + ‖φ(x′) − φ̄‖²_{R^D} − ‖φ(x) − φ(x′)‖²_{R^D})   (4)

Logically, k_φ(·,·), a similarity measure between designs, is negatively proportional to the distance between the shape representations. Because of the size of the eigenanalyses to be performed, kernel PCA is advantageous over a mapping followed by a PCA when D > N, i.e., when the shapes have a very high resolution, and vice versa. In the current work, where φ(·) is known and D is smaller than 1000, we follow the mapping-plus-PCA approach.
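Step 2 of Figure 1 then amounts to a standard eigenanalysis. A minimal sketch, reusing the database Phi of the previous sketch, could read:

```python
import numpy as np

def shape_pca(Phi):
    """PCA of the shape database Phi (N x D): returns the mean shape phibar,
    the eigenvalues lambda_j in decreasing order, and the eigenbasis V
    whose columns are the v_j's."""
    phibar = Phi.mean(axis=0)
    C = (Phi - phibar).T @ (Phi - phibar) / len(Phi)   # empirical covariance C_Phi
    lam, V = np.linalg.eigh(C)                          # eigh returns ascending order
    return phibar, lam[::-1], V[:, ::-1]                # reorder: lambda_1 >= lambda_2 >= ...

phibar, lam, V = shape_pca(Phi)       # Phi built in the previous sketch
A = (Phi - phibar) @ V                # rows: alpha^(i) = V^T (phi(x^(i)) - phibar), Eq. (3)
```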
2.3 Experiments

In this section, all the parametric design problems used in the experiments throughout this paper are introduced and discussed in terms of significant dimensions. Unless stated otherwise, the database Φ is made of N = 5000 designs sampled uniformly in X. We start with 3 test cases of known intrinsic dimension, which will be complemented by 4 other test cases. The metamodeling and the optimization will be addressed later, in Sections 3 and 4.

2.3.1 Retrieval of true dimensionality

In this part, we generate shapes of known low intrinsic dimension. In Example 1 (cf. Figure 3), the shapes are circular holes of varying centers and radii, therefore described by 1, 2 or 3 parameters. In Example 2 (cf. Figure 12), they are also circular holes, but whose center positions and radii are described by sums¹ of parts of the 39 parameters. Last, in Example 3 (cf. Figure 17), the shapes are made of three non-overlapping circles with parameterized centers and radii. PCAs were then carried out on the Φ's associated to the three mappings (characteristic function, signed contour distance and contour discretization). In each example, the 10 first PCA eigenvalues λ_j are reported. The α manifolds, A_N ⊂ R^D, are plotted in the first three dimensions, as well as the first eigenvectors in the Φ space.

Example 1. A hole in R² parameterized by its radius (d = 1), its radius and the x-coordinate of its center (d = 2), or its radius and the x and y coordinates of its center (d = 3).

¹ Other algebraic operations such as multiplications have also led to the same conclusions.

Figure 3: Example 1: three first eigencomponents of the α^(i)'s for three parametric test cases (columns) with low effective dimension equal to 1 (left), 2 (center) and 3 (right). The rows correspond to different φ(·)'s, which are the characteristic function (top), the signed distance to the contour (middle) and the discretization of the contour (bottom).

      Characteristic function    Signed distance         Discretization
 j    Eigenvalue   Cum. %        Eigenvalue   Cum. %     Eigenvalue   Cum. %
 1    324.63       63.09         840.14       100        25.20        100
 2    75.98        77.86         0            100        0            100
 3    32.69        84.21         0            100        0            100
 4    18.20        87.75         0            100        0            100
 5    11.48        89.98         0            100        0            100
 6    8.12         91.56         0            100        0            100
 7    5.92         92.71         0            100        0            100
 8    4.45         93.57         0            100        0            100
 9    3.50         94.25         0            100        0            100
 10   2.79         94.80         0            100        0            100

Table 1: 10 first PCA eigenvalues for the different φ(·)'s, circle with d = 1 parameter.

      Characteristic function    Signed distance         Discretization
 j    Eigenvalue   Cum. %        Eigenvalue   Cum. %     Eigenvalue   Cum. %
 1    60.90        26.50         1332.17      80.41      100.82       94.14
 2    44.63        45.93         294.07       98.15      6.27         100
 3    26.70        57.55         25.48        99.69      0            100
 4    20.62        66.52         3.88         99.93      0            100
 5    9.48         70.65         0.81         99.97      0            100
 6    4.87         72.77         0.24         99.99      0            100
 7    3.97         74.49         0.09         99.99      0            100
 8    3.74         76.12         0.04         100        0            100
 9    3.25         77.54         0.02         100        0            100
 10   3.11         78.89         0.01         100        0            100

Table 2: 10 first PCA eigenvalues for the different φ(·)'s, circle with d = 2 parameters.
      Characteristic function    Signed distance         Discretization
 j    Eigenvalue   Cum. %        Eigenvalue   Cum. %     Eigenvalue   Cum. %
 1    26.48        10.12         1045.26      42.42      82.13        48.51
 2    25.82        19.98         1037.44      84.53      80.82        96.26
 3    20.58        27.84         300.14       96.71      6.34         100
 4    19.38        35.24         33.83        98.08      0            100
 5    15.65        41.22         18.49        98.83      0            100
 6    11.36        45.56         14.40        99.42      0            100
 7    11.20        49.84         3.78         99.57      0            100
 8    11.05        54.06         3.64         99.72      0            100
 9    7.52         56.93         1.58         99.78      0            100
 10   7.21         59.69         1.55         99.84      0            100

Table 3: 10 first PCA eigenvalues for the different φ(·)'s, circle with d = 3 parameters.

Figures 4-11 show the 9 first eigenvectors (if they have a strictly positive eigenvalue) in the 3 cases of Example 1 with the three φ(·)'s.

Figure 4: Example 1, circle with d = 1 parameter, 9 first eigenvectors (left to right and top to bottom) when φ(·) = characteristic function.

Figure 5: Example 1, circle with d = 1 parameter, first eigenvector when φ(·) = signed distance (left) and when φ(·) = contour discretization (right).

Figure 6: Example 1, circle with d = 2 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = characteristic function.

Figure 7: Example 1, circle with d = 2 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = signed distance.

Figure 8: Example 1, circle with d = 2 parameters, 2 first eigenvectors (black and red) when φ(·) = contour discretization.

Figure 9: Example 1, circle with d = 3 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = characteristic function.

Figure 10: Example 1, circle with d = 3 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = signed distance.

Figure 11: Example 1, circle with d = 3 parameters, 3 first eigenvectors (black, red, green) when φ(·) = contour discretization.

A property of PCA is that a linear combination of the eigenvectors, as given in Equation (3), enables the retrieval of any φ(x^(i)). Some of the eigenvectors are easy to interpret: in Figure 5, left (signed distance), the eigenvector is constant because the average shape is a map (an image) whose level lines are perfect circles, so that adding a constant to it changes the radius of the null contour line; in Figure 8, where the mapping is a contour discretization, the first eigenvector (as well as the second in Figure 11) is a non-centered point that allows horizontal (and vertical) translations. The second (third in Figure 11) eigenvector is a circle which dilates or compresses the hole. As is seen in Tables 2 and 3, more eigenvectors are necessary for the characteristic function and for the signed distance than for the contour discretization.

Contrarily to the characteristic function and the signed contour distance, when the mapping φ(·) is the contour discretization, the first eigenvectors look like shapes of their own, and we will therefore call them eigenshapes. This does not mean, however, that all of them are valid shapes, as was seen in Figures 8 and 11 with the point vectors. In fact, most v_j's are "non-physical" in the sense that there may not exist one design x such that φ(x) = v_j; see for instance Figure 29, where the eigenshapes do not correspond to a valid x from v_3 on. In the case of the characteristic function, even though φ(x) ∈ {0, 1}^D, the eigenvectors are real-valued (see Figure 4 for instance).
Example 2. An over-parameterized hole in R²: the horizontal position of its center is s := Σ_{j=1}^{13} x_j, the vertical position of its center is t := Σ_{j=14}^{26} x_j, and its radius is r := Σ_{j=27}^{39} x_j, as shown in Figure 12. To increase the complexity of the problem, x_1, x_14 and x_27 are of a magnitude larger than the other x_j's: the circle mainly depends on these 3 parameters.

Figure 12: Second example: an over-parameterized circle.

Figure 13: Four first eigencomponents of the α^(i)'s in Example 2, for the three different shape representations φ(·). Left: characteristic function, middle: signed distance to the contour, right: discretization of the contour. The manifolds are shown in the {v_1, v_2, v_3} (top) and {v_1, v_2, v_4} (bottom) bases. As can be seen from the two-dimensional surface in the {v_1, v_2, v_4} space when φ(·) = 𝒟 (bottom right), the true dimension (3) is retrieved with the contour discretization. Note also that the associated manifold is convex.

      Characteristic function    Signed distance         Discretization
 j    Eigenvalue   Cum. %        Eigenvalue   Cum. %     Eigenvalue   Cum. %
 1    9.24         9.48          1238.53      40.24      109.04       49.23
 2    8.97         18.69         1210.72      79.57      104.69       96.50
 3    8.76         27.68         516.05       96.33      7.75         100
 4    5.95         33.79         39.70        97.62      0            100
 5    5.28         39.21         24.47        98.42      0            100
 6    3.93         43.25         21.83        99.13      0            100
 7    3.59         46.93         6.10         99.33      0            100
 8    3.36         50.38         6.03         99.52      0            100
 9    2.90         53.35         3.27         99.63      0            100
 10   2.80         56.23         3.12         99.73      0            100

Table 4: 10 first PCA eigenvalues for the different φ(·)'s, over-parameterized circle with d = 39 parameters and real dimension 3.

The PCA eigenvalues for this example are given in Table 4 and are nearly the same as those in Table 3. Apart from the little modification in the uniform distribution for sampling the x^(i)'s, which might lead to a slightly different Φ, the over-parameterization is not a concern for retrieving the correct dimension. Figures 14-16 show the 9 first eigenvectors (if they have a strictly positive eigenvalue) for the three φ(·)'s.

Figure 14: Example 2, over-parameterized circle with d = 39 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = characteristic function.

Figure 15: Example 2, over-parameterized circle with d = 39 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = signed distance.

Figure 16: Example 2, over-parameterized circle with d = 39 parameters, 3 first eigenvectors when φ(·) = contour discretization.

Example 3. Three (non-overlapping) holes in R², whose centers and radii are determined by x_1, x_2, x_3 (first circle), x_4, x_5, x_6 (second circle), and x_7, x_8, x_9 (third circle). This problem is more complex since it consists of three elements, and has d = 9 dimensions. For φ(·) = 𝒟, the discretization vector φ(x) ∈ R^D is split into 3 parts of size D/3 which correspond to the discretization of each circle.
Figure 17: Third example: three circles with varying centers (x_1, x_2), (x_4, x_5), (x_7, x_8) and radii x_3, x_6, x_9.

      Characteristic function    Signed distance         Discretization
 j    Eigenvalue   Cum. %        Eigenvalue   Cum. %     Eigenvalue   Cum. %
 1    96.67        9.52          1785.93      31.51      154.26       19.06
 2    81.57        17.56         1267.81      53.88      151.80       37.82
 3    80.07        25.45         912.40       69.98      149.81       56.33
 4    66.03        31.96         588.30       80.36      148.09       74.63
 5    48.28        36.71         402.56       87.46      91.34        85.91
 6    40.66        40.72         159.38       90.27      90.53        97.10
 7    39.37        44.60         144.75       92.83      8.65         98.17
 8    38.75        48.42         121.80       94.97      8.54         99.22
 9    25.07        50.89         54.63        95.94      6.29         100
 10   24.45        53.30         47.36        96.77      0            100

Table 5: 10 first PCA eigenvalues for the different φ(·)'s, three circles with d = 9 parameters.

The 9 first eigenvectors are illustrated for the three φ(·)'s in Figures 18 to 20.

Figure 18: Example 3, three circles with d = 9 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = characteristic function.

Figure 19: Example 3, three circles with d = 9 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = signed distance.

Figure 20: Example 3, three circles with d = 9 parameters, 9 first eigenvectors (left to right and top to bottom) when φ(·) = discretization. The blue part of each eigenvector acts on the first circle, the red part modifies the second circle, and the green part applies to the third circle.

In each example, for all φ(·)'s, any shape φ(x^(i)) can be reconstructed via Equation (3). α^(i) is nonetheless D-dimensional, hence no dimension reduction is obtained. We are therefore interested in low-rank approximations φ_{1:δ} := φ̄ + Σ_{j=1}^{δ} α_j v_j, which solely consider the δ first eigenvectors while guaranteeing a sufficient precision. It is known [22] that ‖Φ − Φ_{1:δ}‖²_F = N Σ_{j=δ+1}^{D} λ_j, where Φ_{1:δ} is the reconstruction matrix using the δ first principal axes v_j only, and whose i-th row is φ̄ + Σ_{j=1}^{δ} α_j^(i) v_j. Φ_{1:δ} is also known to be the closest (in terms of Frobenius norm) matrix to Φ with rank lower than or equal to δ. The λ_j's with j > δ inform us about the reconstruction loss. Hence, we look for a mapping φ(·) for which the λ_j's quickly go to zero. In Tables 1 to 5, the vanishing of λ_j beyond the intrinsic dimension only happens when φ(·) = 𝒟. With the other mappings, alternative techniques relying on local PCAs [20] of the α^(i)'s are required to estimate the dimensionality of manifolds such as the ones in the top row of Figure 3. The d first principal components, α^(i)_{1:d}, suffice to reconstruct φ(x^(i)) exactly when using 𝒟 as the φ(·) mapping, while more than d components are required for φ(x^(i)) to be recovered using χ or D.

With 𝒟, the eigenvectors v_j (right plot of Figure 5, Figures 8, 11, 16 and 20) are physically meaningful: they can be interpreted as shape discretizations which, being multiplied by coefficients α_j and added to the mean shape φ̄, act on the hole's size (Eigenvector 1 in the right plot of Figure 5, Eigenvector 2 in Figure 8, Eigenvector 3 in Figure 11, Eigenvector 3 in Figure 16, Eigenvectors 7-9 in Figure 20), or on the hole's position (Eigenvector 1 in Figure 8, Eigenvectors 1-2 in Figure 11, Eigenvectors 1-2 in Figure 16, Eigenvectors 1-6 in Figure 20).
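A short sketch of the rank-δ reconstruction and of the Frobenius-norm identity above, continuing the previous sketches (δ = 3 is an arbitrary choice made for illustration):

```python
import numpy as np

def reconstruct(alpha_x, delta):
    """Rank-delta reconstruction: phibar + sum_{j<=delta} alpha_j v_j."""
    return phibar + V[:, :delta] @ alpha_x[:delta]

delta = 3
Phi_trunc = phibar + A[:, :delta] @ V[:, :delta].T      # rows of Phi_{1:delta}
err = np.linalg.norm(Phi - Phi_trunc, 'fro') ** 2
# err should equal N times the sum of the discarded eigenvalues
print(err, len(Phi) * lam[delta:].sum())
```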
For example, very small eigenvectors such as the first one in Figure 8 displace the shape in the direction specified by the eigenvector's position. In Figure 20, the first eigenvectors tend to move the circles with respect to each other, while the sizes of the holes are affected by the last eigenvectors. Whereas the characteristic function χ and the signed distance D are images, the mapping 𝒟 is a discretization of the final object we represent, a contour shape. Without formal proof, we think that this is related to the observed property that the d (the number of intrinsic dimensions) first eigencomponents α^(i)_{1:d}, i = 1, ..., N, make a convex set, as can be seen in Figures 3 and 13. In a solid mechanics analogy, the φ̄ + Σ_j α_j v_j reconstruction can be thought of as a sum of pressure fields v_j applied on each node of the Point Distribution Model, which deform the initial mean shape φ̄ by a magnitude α_j to obtain φ(x). Such an interpretation cannot be conducted with the eigenvectors obtained via the χ or D mappings, shown in the other figures. Because of its clear pre-eminence, in the following we will only consider the α's obtained using the contour discretization as the φ(·) mapping.

2.3.2 Hierarchic shape basis for the reduction of high-dimensional designs

Following these observations, we now deal with slightly more complex and realistic shapes Ω_x. Even though they are initially described with many parameters, they mainly depend on a few intrinsic dimensions.

Example 4. A rectangle ABCD with x ∈ R^40, whose parameters x_1 and x_2 are the location of A, x_3 and x_4 are the width and the height of ABCD, and x_{5:13}, x_{14:22}, x_{23:31} and x_{32:40} are small, evenly distributed perturbations on the AB, BC, CD and DA segments, respectively. x_1, ..., x_4 are of a magnitude larger than the other parameters to ensure a close-to-rectangular shape, as shown in Figure 21.

Figure 21: Example 4: a rectangle with varying position, size, and deformation of its sides.

 j    Eigenvalue   Cum. %
 1    867.65       48.73
 2    866.90       97.42
 3    21.46        98.62
 4    21.43        99.83
 5    0.13         99.83
 6    0.13         99.84
 7    0.13         99.85
 8    0.13         99.86
 9    0.12         99.86
 10   0.12         99.87
 ...  ...          ...
 39   0.04         99.99
 40   0.04         100
 41   0            100

Table 6: First PCA eigenvalues for φ(·) = discretization, rectangles with d = 40 parameters (Example 4).

In this example, where 4 parameters (position and sizes) mainly explain the differences among shapes, we see that a reconstruction quality of 99.83% is attained with the 4 first eigenvectors v_j. Figure 22 details the eigenvectors. v_1 and v_2, the most influential eigenshapes, plotted in black and blue, act as translations, while v_3 and v_4 (in red and green) correspond to a widening and a heightening of the rectangle. The fluctuations along the segments appear from the 5th eigenshape on. Any shape is retrieved with the d = 40 first eigenshapes, which corresponds to the total number of parameters.

Figure 22: 6 first eigenshapes (in the order black, blue, red, green, yellow, purple) of the rectangles in Example 4.

Example 5. A straight line joining two fixed points A and B, modified by smooth perturbations r ∈ R^29, evenly distributed along [AB] to approximate a smooth curve. The fifth example is inspired by the catenoid problem [10].
The perturbations r are generated by a Gaussian Process with squared exponential kernel and with a length-scale 6 times smaller than [AB]. Therefore, in this example, the N = 5000 r^(i)'s used for building Φ are not uniformly distributed in X.

Figure 23: Example 5: a straight line joining two points, modified by the perturbations r_j to approximate a curve. Gray: the line joining A and B. Blue, red, yellow and green curves: examples of lines with regular r_j perturbations. Red envelope: boundaries for the r_j's.

 j    Eigenvalue   Cum. %
 1    2.156        50.258
 2    1.251        79.422
 3    0.590        93.181
 4    0.206        97.973
 5    0.065        99.480
 6    0.017        99.882
 7    0.004        99.975
 8    0.001        99.995
 9    ε            99.999
 10   ε            100
 ...  ...          ...
 28   ε            100
 29   ε            100
 30   0            100

Table 7: First PCA eigenvalues for φ(·) = discretization, curve with d = 29 parameters. ε means the quantity is not exactly 0, but smaller than 10⁻³, hence less than 0.04% of the first PCA eigenvalue.

Again, the initial dimension (d = 29) is recovered by looking at the strictly positive eigenvalues. Furthermore, the manifold is found to mainly lie in a lower-dimensional space: A_N can be approximated in δ = 7 dimensions since Σ_{j=1}^{δ} λ_j / Σ_{j=1}^{D} λ_j = 99.975%. Figure 24 shows the corresponding eigenshapes. The eigenshapes are similar to the ordered modes of the harmonic series, with the associated eigenvalues ordered as the inverse of the frequencies.

Figure 24: 7 first eigenshapes for the curves of Example 5.

Example 6. A NACA airfoil parameterized by three parameters: x = (M, P, T)ᵀ ∈ R³ where M is the maximum camber, P is the position of this maximum, and T is the maximal thickness. Figure 25 describes the airfoil.

Figure 25: Description of a NACA airfoil with its M, P, T parameters.

 j    Eigenvalue   Cum. %
 1    0.2819       54.619
 2    0.2203       97.318
 3    0.0129       99.814
 4    0.0008       99.959
 5    0.0001       99.983
 6    ε            99.991
 7    ε            99.996
 8    ε            99.997
 9    ε            99.999
 10   ε            99.999

Table 8: First PCA eigenvalues of the NACA airfoil with d = 3 parameters (φ(·) is the contour discretization). ε means the quantity is smaller than 10⁻⁴, hence less than 0.04% of the first PCA eigenvalue.

In this example, a typical noise-truncation criterion such as the one discussed in Example 5 would retain 3 or 4 axes. In Example 6 too, the effective dimension can thus almost be retrieved from the λ's. Figure 26 shows the 4 first eigenshapes (left) as well as the A_N manifold (right). The eigenvectors can be interpreted as a reformulation of the CAD parameters. The first eigenshape (blue) is a symmetric airfoil. Multiplying it by a coefficient (after adding it to the black mean shape) will increase or decrease the thickness of the airfoil, hence it plays a similar role to the T parameter. The second eigenshape is a cambered airfoil, whose role is similar to M (maximum camber). Last, the third eigenshape, which has a much smaller eigenvalue λ_3, is very thin, positive in the first part of the airfoil, and negative in its second part. It balances the camber of the airfoil towards the leading edge or towards the rear and plays a role similar to P, the position of the maximum camber. v_3's effect is complemented by v_4.

The analysis of A_N (Figure 26) is physically meaningful: even though the x^(i) are sampled uniformly in X, A_N resembles a pyramid in the (v_1, v_2, v_3) basis. Designs with minimal α_2 share the same α_3 value.
Since negative α 2 ’s corresp ond to wings with little camber, the p osition of this maximum cam b er has v ery little impact, hence the almost n ull α 3 v alue. By looking at A N , it is learned that the parameter P does not matter w hen M is small, whic h is intuitiv e but is not expressed b y the ( M , P , T ) co ordinates. Distances in A N are therefore more representativ e of shap e differences. An additional adv antage of analyzing shap es is that correlations in the space of parameters (such as the one b et ween M and P in this example) are discov ered and remov ed, since V is an orthonormal basis. Here, orthogonalit y b et w een eigenshap es is measured b y the standard scalar pro duct in R D . Dep ending on the application, there may exist natural definitions of the orthogonalit y betw een discretized shap es, whic h could b e used by the PCA. Example 7 A mo difie d NACA airfoil which is p ar ameterize d by d = 22 p ar ameters: x = ( M , P , T , L 1 , . . . , L 19 ) > ∈ R 22 wher e M , P , T ar e the standar d NACA p ar ameters (Example 6 ), and wher e the L i ’s c orr esp ond to smal l bumps along the airfoil. Figur e 27 describ es a NACA 22 airfoil. 25 Figure 26: NACA airfoil with d = 3 parameters. Left: mean shap e and 4 first eigenshap es (blac k, blue, red, green, yello w). Right: three first eigencomponents ( α 1 , α 2 , α 3 ) of the A N manifold. Figure 27: Description of a NACA airfoil in 22 dimensions. It is a standard NACA airfoil whose in trados and extrados ha ve b een mo dified b y bumps of size L i . 26 j Eigen v alue Cum ulative percentage 1 0.2826 53.932 2 0.2205 96.021 3 0.0134 98.580 4 0.0011 98.798 5 0.0006 98.903 6 0.0005 99.006 7 0.0005 99.106 8 0.0005 99.202 9 0.0005 99.293 10 0.0004 99.377 . . . . . . . . . 19 0.003 99.958 20 0.002 99.992 21 ε 99.995 22 ε 99.998 23 ε 99.999 T able 9: First PCA eigenv alues for φ ( · ) = discretization, NA CA with d = 22 parameters. ε means the quantit y is not exactly 0, but smaller than 10 − 4 , hence less than 0.04% of the first PCA eigen v alue. Here, as in the Example 6 , the noise-truncation criteria will retain betw een 6 and 20 dimensions, dep ending on the reconstruction qualit y required. Indeed, when lo oking at sp ecimen of NA CA 22 airfoils as the one in the upp er left part of Figure 28 , less than 22 dimensions are exp ected to b e necessary to retrieve an appro ximation of sufficient qualit y . The analysis of eigenshap es, shown in Figure 29 , is similar to the one of Example 6 . Small details that act on the airfoil suc h as the bumps only app ear from the 4th eigenshap e on. Not taking them into account leads to a w eaker reconstruction, as shown in the bottom part of Figure 28 . 27 Figure 28: Left: examples of NACA 22 airfoils. Even though the true dimension is 22, less di- mensions ma y suffice to approximate the shap es w ell enough. Right: reconstruction scheme of any NA CA 22 shape: a weigh ted deviation from the mean shap e φ φ φ in the direction of the eigenshap es. Bottom: example of shap e reconstruction (red) using 2, 3, 6 or 20 eigenshap es. The more v j ’s, the b etter the reconstruction but the larger the dimension of α α α . 28 Mean discretization 1st eigenv ector 2nd eigenv ector 3rd eigenv ector 4th eigenv ector 5th eigenv ector 6th eigenv ector Figure 29: Mean shape (black) and 6 first eigenshapes (blue, red, green, yello w, purple, pink) for the NACA with 22 parameters. The three first eigenv ectors are similar to those observed on Figure 26 for the original NA CA 3. 
Fluctuations along the eigenshapes are found from the 4th eigenshape on. They allow the reconstruction of the local refinements (bumps) of the airfoils.

According to these experiments, the eigenvectors v_j, j ∈ {d+1, ..., D}, can already be discarded without even considering the values of the associated objective functions, since the d first shape modes explain the whole variability of the discretized shapes. In practice, to filter numerical noise and to remove non-informative modes in shapes that are truly over-parameterized, we only consider the d′ first eigenshapes, d′ := min(d, d̃), where d̃ corresponds to the smallest number of axes that explain more than a given level of diversity in Φ (e.g., 99.9, 99.95 or 99.99%), measured by 100 × Σ_{j=1}^{d̃} λ_j / Σ_{j=1}^{D} λ_j. Another alternative is to define d̃ as the number of dimensions for which λ_j/λ_1 remains larger than a prescribed threshold (e.g., 1/1000). Even though the notation D is kept, the eigenvectors v_j and the principal components α_j are considered to be null for all j > d′, so that in fact D = d′ in the following.
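The rule d′ = min(d, d̃) translates directly into code; a short sketch, with the 99.9% level as an assumed default (the λ_j/λ_1 ratio criterion would be an alternative stopping rule):

```python
import numpy as np

def effective_dim(lam, d, level=0.999):
    """d~ is the smallest number of axes explaining `level` of the diversity
    in Phi; the retained dimension is d' = min(d, d~). Assumes lam sorted
    in decreasing order."""
    cum = np.cumsum(lam) / np.sum(lam)
    d_tilde = int(np.searchsorted(cum, level)) + 1
    return min(d, d_tilde)
```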
3 GP models for reduced eigenspaces

Building a surrogate model in the space of principal components has already been investigated in the context of reduced-order models [5]. In most applications, the dimension reduction is carried out in the output space, which has a large dimension when it corresponds to values on a finite element mesh. The response is approximated by a linear combination of a small number of modes, and the metamodel is a function of the mode coefficients.

The construction of surrogates with inherent dimensionality reduction has also been considered. In the active subspace method [11], the dimension reduction comes from a linear combination of the inputs, carried out by projecting x onto the hyperplane spanned by the directions of largest variation of ∇f(x). The reduced-dimension GP is then Y(Wᵀx), with W ∈ R^{d×δ} containing these directions in columns. In [34], cross-validation is employed for choosing the number of such axes. An application to airfoils is given in [26], where the authors take the directions of largest drag and lift gradients as columns of W, even though this basis is no longer orthogonal. Another related technique with a Y(Wᵀx) GP which does not require the knowledge of ∇f(x) is the Kriging and Partial Least Squares (KPLS) method [8], where x is projected onto the hyperplane spanned by the first δ axes of a PLS regression [19]. The dimension reduction is output-driven but W is no longer orthogonal, and information may be lost when n < d′ because a shape (of effective dimension d′) cannot be exactly reconstructed (Equation 3) with these n vectors. Coordinates in the PLS space are therefore incomplete, and metamodeling loses precision when n is too small. In the same spirit, a double maximum-likelihood procedure is developed in [47] to build an output-related and orthogonal matrix W for the construction of a Gaussian Process with built-in dimensionality reduction. Rotating the design space through hyperparameters determined by maximum likelihood is also performed in [33]. Table 10 summarizes the existing literature for building such GPs, as well as the approach introduced in Section 3.2.2 (last column).

Table 10: An overview of GP models with built-in dimensionality reduction.
  Model Y(Wᵀx): dimension reduction linear in x; W constructed by active subspaces [11, 34, 26], PLS [8], GP hyperparameters [47, 33], or sensitivity analysis [4].
  Model Y(Wᵀφ(x)): dimension reduction nonlinear in x; W constructed by PLS [25].
  Model Y_a(W_aᵀφ(x)) + Y_ā(W_āᵀφ(x)): dimension reduction nonlinear in x, group-additive model; W constructed by selection of mapped variables through penalized likelihood (Section 3.2.1).

3.1 Unsupervised dimension reduction

Instead of the space of CAD parameters x, we reduce the dimension of the input space by building the surrogate with information from the space of shape representations, Φ, as in [25]. To circumvent the high dimensionality of Φ ⊂ R^D, a linear dimension reduction of φ(x) is achieved by building the model in the space spanned by Wᵀφ(x). A natural candidate for W is a restriction to a few columns (eigenshapes) of the matrix V. Notice that, contrarily to the other dimension reduction techniques, which operate a linear dimension reduction of x, this approach is nonlinear in x since it operates linearly on the nonlinear transformation φ(x). Also, it operates on a better-suited representation of the designs, their shapes, instead of their parameters.

A first idea to reduce the dimension of the problem is to conserve the δ first eigenvectors v_j according to some reconstruction quality criterion measured by the eigenvalues. Given a threshold T (e.g., 0.95 or 0.99), only the first δ modes such that Σ_{j=1}^{δ} λ_j / Σ_{j=1}^{D} λ_j > T are retained in V_{1:δ} ∈ R^{D×δ}, because they contribute 100 × T % of the variance in Φ. The surrogate model is implemented in the space of the δ first principal components as

Y(α_{1:δ}) = Y(V_{1:δ}ᵀ(φ(x) − φ̄)).   (5)

Using a stationary kernel for the Y(α_{1:δ}) GP, i.e., k(α_{1:δ}, α′_{1:δ}) = k̃(‖α_{1:δ} − α′_{1:δ}‖_{R^δ}), the correlation between designs is k(α_{1:δ}, α′_{1:δ}) = k̃(‖V_{1:δ}ᵀ(φ(x) − φ(x′))‖_{R^δ}) = k̃(r) with r² = (φ(x) − φ(x′))ᵀ M (φ(x) − φ(x′)), where M = V_{1:δ} V_{1:δ}ᵀ is a D × D matrix with low rank (δ). Hence, this model implements a Gaussian Process in the Φ space with an integrated linear dimensionality reduction step [37]. Note that the kernel is non-stationary in the original X space.
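A sketch of this kernel, assuming the eigendecomposition of the previous sketches, a squared-exponential choice for k̃ and T = 0.99:

```python
import numpy as np

cum = np.cumsum(lam) / np.sum(lam)
delta = int(np.searchsorted(cum, 0.99)) + 1   # smallest delta with cum. variance > T
V_d = V[:, :delta]                            # D x delta restriction of the eigenbasis
M = V_d @ V_d.T                               # rank-delta metric M = V_{1:delta} V_{1:delta}^T

def k_reduced(phi_x, phi_xp, theta, sigma2):
    """Stationary kernel k~ evaluated on r^2 = (phi - phi')^T M (phi - phi'),
    i.e. on the distance between the delta first principal components."""
    diff = phi_x - phi_xp
    r2 = diff @ M @ diff
    return sigma2 * np.exp(-0.5 * r2 / theta ** 2)
```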
Both are associated to parameters x and x 0 and their discretizations φ ( x ) and φ ( x 0 ) are quite different. Dep ending on the ob jective function, f ( x ) and f ( x 0 ) might differ widely . How ever, when considering the φ φ φ + P δ j =1 α j v j reconstruction with δ = 3, they look v ery similar b ecause α α α 1:3 ≈ α α α 0 1:3 . Even though V 1:3 := { v 1 , v 2 , v 3 } is a tempting basis b ecause it explains 98.5% of the discretizations v ariance, it is not a goo d c hoice if f ( x ) and f ( x 0 ) are differen t: b ecause of contin uity assumptions a surrogate mo del w ould typically suffer from inputs α α α ≈ α α α 0 with y 6 = y 0 . F or this reason, instead of building the surrogate in the space spanned by the most relev an t shap e mo des, w e would prefer to build it in the V a ⊂ V basis of the most output-influencing eigenshap es 31 Figure 30: Example of tw o different shap es (black and red) whose reconstruction in the space of the three first eigenshap es is very similar. α α α a . Additionally , since the remaining “inactive” comp onen ts α α α a refine the shap e and might explain small fluctuations of y , instead of omitting them (whic h is equiv alent to stating α α α a = 0 ), w e wou ld lik e to keep them in the surrogate model while prioritizing α α α a : a GP Y a ( W a φ ( x )) + Y a ( W a φ ( x )) is detailed in Sec. 3.2.2 . 3.2 Sup ervised dimension reduction 3.2.1 Selection of activ e eigenshap es T o select the eigencomponents that impact y the most, the penalized log-likelihoo d [ 53 ] of a regular, anisotropic GP in the high dimensional space of α α α ’s is considered, max ϑ pl λ ( α α α (1: n ) , y 1: n ; ϑ ) where pl λ ( α α α (1: n ) , y 1: n ; ϑ ) := l ( α α α (1: n ) , y 1: n ; ϑ ) − λ k θ θ θ − 1 k 1 (6) The ϑ are the GP’s h yp er-parameters made of the length-scales θ j , a constant mean term β , and the v ariance of the GP σ 2 . α α α (1: n ) are the eigencomp onen ts of the ev aluated designs x (1) , . . . , x ( n ) , and y 1: n the asso ciated outputs, y 1: n = ( y 1 , . . . , y n ) > = ( f ( x (1) ) , . . . , f ( x ( n ) )) > . The mean and the v ariance terms can be solved for analytically b y setting the deriv ativ e of the penalized log-likelihoo d ( 6 ) equal to 0 which yields b β := 1 > R − 1 θ θ θ y 1: n 1 > R − 1 θ θ θ 1 and b σ 2 := 1 n ( y 1: n − 1 b β ) > R − 1 θ θ θ ( y 1: n − 1 b β ) (7) where K ϑ is the co v ariance matrix with entries K ϑ ij = c σ 2 k θ θ θ ( x ( i ) , x ( j ) ), with determinan t | K ϑ | and R θ θ θ is the correlation matrix, R ij = k θ θ θ ( x ( i ) , x ( j ) ). The (concen trated) p enalized log-lik eliho od of this GP is pl λ ( α α α (1: n ) , y 1: n ; ϑ ) = − n 2 log(2 π ) − 1 2 log( | K ϑ | ) − 1 2 ( y 1: n − 1 b β ) > K − 1 ϑ ( y 1: n − 1 b β ) − λ k θ θ θ − 1 k 1 (8) The p enalization is applied to θ θ θ − 1 := (1 /θ 1 , . . . , 1 /θ D ) > , the v ector con taining the inv erse length-scales of the GP . It is indeed clear [ 4 ] that if θ j → + ∞ , the direction v j has no influence on y as all the points are perfectly correlated together, making the GP flat in this dimension. The L 1 p enalt y term applied to the θ j ’s p erforms v ariable selection: this Lasso-like pro cedure promotes zeros in the vector of inv erse length-scales, hence sets many θ j ’s to + ∞ . F ew directions with small θ j are selected and make the active dimensions, α α α a (step 3 in Figure 1 ). 
Even if the maximization of pl_λ is carried out in a D-dimensional space, the problem is tractable since the gradients of pl_λ are analytically known [38], and because the L1 penalty convexifies the problem. We solve it using standard gradient-based techniques such as BFGS [27] with multistart. Numerical experiments, not reported here for reasons of brevity, have shown that most local optima of this problem solely differ in θ_j's that are already too large to be relevant, and consistently yield the same set of active variables α_a.

Notice that in [53] a similar approach is undertaken, but the penalization was applied to the reciprocal variables w = (w_1, ..., w_D)ᵀ with w_j = 1/θ_j. In our work, the inverse length-scales are penalized, and the gradient of the penalty is proportional to 1/θ_j². This might help the optimizer, since directions with θ_j's that are not large yet are given more emphasis. In comparison, the w penalty function's gradient is isotropic. Since we can restrict the number of variables to d′ ≪ D with no loss of information (cf. the discussion at the end of Section 2.3), the dimension of Problem (6) is substantially reduced, which leads to a more efficient resolution.

Because the α_j's have zero mean and variance λ_j, they have magnitudes that decrease with j. When m < n, 1/θ_n is typically larger than 1/θ_m, meaning that the optimizer is better rewarded by diminishing 1/θ_n than 1/θ_m. Starting from reasonable θ_j values², the first θ_j's are therefore less likely to be increased in comparison with the last ones, i.e., they are less likely to be found inactive. This can be seen as a bias which could be removed by scaling all α_j's to the same interval. However, we do not normalize the α variables, for two reasons. First, since the α_j's correspond to reconstruction coefficients associated to normalized eigenshapes (‖v_j‖_{R^D} = 1), they share the same physical dimension and can be interpreted in the same manner. Second, this bias is equivalent to assuming that the most significant shape variations are responsible for the largest output variations, which is a reasonable prior. In experiments that are not reported here for the sake of brevity, we have noticed that a BFGS algorithm optimizing Problem (6) got trapped by weak local optima more frequently when the α_j's were normalized.

² Typically of the order of range(α_j).

Definition 1 (Selection of active dimensions). Let a GP be indexed by α_1, ..., α_D ∈ [α_min, α_max] ⊂ R^D and {α^(1:n), y_{1:n}} be the data to model. The length-scales θ of the GP are set by maximizing the L1-penalized concentrated log-likelihood of Equation (8). A dimension j is declared active if

θ_j / range(α_j) ≤ 10 × min_{i=1,...,D} θ_i / range(α_i).

The δ such active dimensions are denoted α_a = (α_{a_1}, ..., α_{a_δ}) ∈ R^δ.

Since the α_j's have different (decreasing) ranges, the length-scales have to be normalized by the range of α^(1:n)_j to be meaningful during this θ_j comparison. Our implementation extends the likelihood maximization of the kergp package [14] to include the penalization term. After a dimensional analysis of pl_λ, we have chosen to take λ = n/D to balance both terms. Other techniques, such as cross-validation or the use of different λ's for obtaining a pre-defined number of active components, can also be considered.
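Definition 1 can be sketched as follows (A_n denotes the n × D array of eigencomponents and theta the estimated length-scales, as in the previous sketch):

```python
import numpy as np

def active_dims(theta, A_n, factor=10.0):
    """Definition 1: dimension j is active when theta_j, normalized by the
    empirical range of alpha_j, is within `factor` of the smallest
    normalized length-scale."""
    norm_theta = theta / (A_n.max(axis=0) - A_n.min(axis=0))
    return np.flatnonzero(norm_theta <= factor * norm_theta.min())
```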
On the NACA 22 benchmark with few observations of f(·) (n = 15 here), Figure 31 gives the few active components that are selected by the penalized maximum likelihood procedure. The three first principal axes, v_1, v_2 and v_3, are retained when considering the drag (top). Indeed, these are the eigenshapes that impact the shape globally the most and change its drag. When the output y is the lift (bottom), only the second principal axis is selected. This eigenshape modifies the camber of the shape, which is known to highly impact the lift. The other eigenvectors are detected to be less critical for y's variations. When n grows, more eigenshapes get selected because they also slightly impact the output. For instance, when n = 50, some eigenshapes that contain bumps (the 4th, the 5th, the 8th, etc.) are selected for modeling the lift. They also contribute to changing the camber of the airfoil, hence its lift.

Figure 31: Variable selection on the NACA 22 benchmark by penalized maximum likelihood. For the drag (top), the three first eigenshapes, which act on the shape and hence on its drag, are selected (red coefficients). For the lift (bottom), only the second eigencomponent (v_2) is selected. Indeed, v_2 modifies the camber of the airfoil, hence it plays a major role on the lift. The other eigenbasis vectors (green coefficients) are estimated to be less influential on y.

3.2.2 Additive GP between active and inactive eigenshapes

Completely omitting the non-active dimensions, α_ā ∈ R^{D−δ}, and building the surrogate model Y(·) in the sole α_a space may amount to erasing some geometric patterns of the shapes which contribute to small variations of y. For this reason, an additive GP [16, 17] with zonal anisotropy [1] between the active and inactive eigenshapes is considered (step 4 in Figure 1):

Y(α) = β + Y_a(α_a) + Y_ā(α_ā).   (9)

Y_a(α_a) is the anisotropic main-effect GP which works in the reduced space of active variables. It requires the estimation of δ + 1 hyper-parameters (the length-scales θ_j and a GP variance σ²_a) and aims at capturing most of y's variation, related to α_a's effect. Y_ā(α_ā) is a GP over the large space of inactive components, which only takes residual effects into account. To keep Y_ā(α_ā) tractable, it is considered isotropic, i.e., it only has 2 hyper-parameters, a unique length-scale θ_ā and a variance σ²_ā. In the end, even though Y(α) operates with α's ∈ R^D and there are fewer observations than dimensions³, n ≪ D, it remains tractable since only a total of δ + 3 hyper-parameters have to be learned, which guarantees identifiability, i.e., the unicity of the hyper-parameter solution, even when the number of observations is small. Although the α_j's have different ranges, they are homogeneous in that they all multiply normalized eigenshapes. Thus, the distances inside the shape manifold, A, should be relevant, and an isotropic model is a possible assumption which, again, tends to emphasize eigenshapes that appear the most within the designs.

³ Even if pruning the α_j components for j > d′ (see comments at the end of Section 2.3), n < d′ may hold.
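A sketch of the corresponding additive kernel; the squared-exponential parts are an assumption made for illustration, and `act` holds the indices of the active dimensions (e.g., from `active_dims` above):

```python
import numpy as np

def k_additive(A1, A2, act, theta_a, sig2_a, theta_i, sig2_i):
    """Zonally anisotropic additive kernel of Equation (9): an anisotropic
    part on the active dimensions (one theta per dimension) plus an
    isotropic part (a single theta) on the inactive dimensions."""
    inact = np.setdiff1d(np.arange(A1.shape[1]), act)
    d2a = (((A1[:, None, act] - A2[None, :, act]) / theta_a) ** 2).sum(-1)
    d2i = ((A1[:, None, inact] - A2[None, :, inact]) ** 2).sum(-1) / theta_i ** 2
    return sig2_a * np.exp(-0.5 * d2a) + sig2_i * np.exp(-0.5 * d2i)
```

Evaluating this kernel on the design points gives K = K_a + K_ā, which can be plugged into the kriging formulas of the first sketch, with σ²_a + σ²_ā playing the role of the process variance as in Equation (10) below.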
The additive GP aims at modeling a function that varies primarily along the active dimensions and fluctuates only marginally along the inactive ones, as illustrated in Figure 32.

Figure 32: Example of a function that primarily varies along the α^a direction, and secondarily along α^ā. If α^ā is omitted, one implicitly considers the restriction of f(·) to the gray plane where α^ā = 0.

Denoting k^a and k^ā the kernels of the two GPs, the hyper-parameters ϑ_a = (θ_{a_1}, …, θ_{a_δ}, σ²_a) and ϑ_ā = (θ_ā, σ²_ā) are estimated by maximizing the log-likelihood of (9) given the observed data y_{1:n},

    l_Y(α^(1:n), y_{1:n}; ϑ_a, ϑ_ā) = −(n/2) log 2π − (1/2) log|K| − (1/2) (y_{1:n} − 1β̂)^⊤ K^{−1} (y_{1:n} − 1β̂),

using the kergp package [14]. Here K = K^a + K^ā, with K^a_{ij} = σ²_a k^a(α^a(i), α^a(j)) and K^ā_{ij} = σ²_ā k^ā(α^ā(i), α^ā(j)), and β̂ is given by Equation (7). The correlation between α and α′ being k(α, α′) = σ²_a k^a(α^a, α^a′) + σ²_ā k^ā(α^ā, α^ā′), the kriging predictor and variance of this additive GP are [37]

    m(α) = β̂ + k(α, α^(1:n))^⊤ K^{−1} (y_{1:n} − 1_n β̂),
    s²(α) = σ²_a + σ²_ā − k(α, α^(1:n))^⊤ K^{−1} k(α, α^(1:n)).    (10)
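A minimal NumPy sketch of this additive model may help fix ideas. The squared-exponential kernels and the jitter term are assumptions (the paper relies on kergp in R), and the hyper-parameters (theta_a, s2a, theta_i, s2i) are supposed to have already been estimated by maximizing l_Y.

```python
import numpy as np
from scipy.spatial.distance import cdist

def sq_exp(X1, X2, theta):
    # Squared-exponential correlation (assumption; any kergp kernel would do).
    return np.exp(-0.5 * cdist(X1 / theta, X2 / theta, "sqeuclidean"))

def additive_gp_predict(A, y, act, theta_a, s2a, theta_i, s2i, Anew):
    """Kriging mean and variance of Equation (10) for the additive GP (9)."""
    D = A.shape[1]
    ina = np.setdiff1d(np.arange(D), act)   # inactive components
    # k(a, a') = s2a * k^a(a_act, a'_act) + s2i * k^abar(a_ina, a'_ina)
    k = lambda X1, X2: (s2a * sq_exp(X1[:, act], X2[:, act], theta_a)
                        + s2i * sq_exp(X1[:, ina], X2[:, ina], theta_i))
    n = len(y)
    K = k(A, A) + 1e-8 * np.eye(n)          # K = K^a + K^abar, jittered
    Ki = np.linalg.inv(K)
    one = np.ones(n)
    beta = one @ Ki @ y / (one @ Ki @ one)  # beta-hat of Equation (7)
    kn = k(Anew, A)                         # cross-covariances, shape (m, n)
    mean = beta + kn @ Ki @ (y - beta * one)
    var = s2a + s2i - np.einsum("ij,jk,ik->i", kn, Ki, kn)
    return mean, var
```

Note that theta_i is a single scalar, consistent with the isotropy of Y^ā, while theta_a holds one length-scale per active dimension.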
3.3 Experiments: Metamodeling in the eigenshape basis

We now study the performance of the variable selection and of the additive GP described in the previous section. The different versions of GPs that are compared are the following:

• GP(X) is the GP in the original space of parameters X;
• GP(α) indicates that the GP is built in the space of (to be specified) principal components;
• GP(α^a) means that the GP works with the active α's only;
• AddGP(α^a + α^ā) refers to the additive GP (Section 3.2.2).

We equip the example designs 2, 4, 5 and 7 (Section 2.3) with objective functions f(x) that are to be modeled by the fitted GPs. For each function, the predictive capability of the different models is compared on a distinct test set using the R2 coefficient of determination. Later, in Section 4.3.2, these objective functions will be optimized.

• Example 2: f_2(x) = r − πr² − ‖(x, y)^⊤ − (3, 2)^⊤‖², where x, y and r correspond to the position of the center and to the radius of the over-parameterized circle (accessible through x), respectively.

• Example 4: f_4(x) = ‖Ω_t − Ω_x̃‖²_2, where x̃ := x − (x_1 + 2.5, x_2 + 2.5, 0, …, 0)^⊤ corresponds to the centered design, and Ω_t, Ω_x̃ are the nodal coordinates of the shapes, see Figure 21. The goal is to retrieve a target shape t = (t_1, …, t_40)^⊤, whose lower left point (A) is set at t_1 = t_2 = 2.5, with the flexible rectangle defined by x. The A point of any shape x is first moved towards (2.5, 2.5) too, and f_4 measures the discrepancy. Here, the target t is the rectangular heart shown in Figure 33.

Figure 33: Rectangular heart target shape of Example 4.

• Example 5: f_5(r) = 2π ∫_{y_A}^{y_B} r(y) √(1 + r′(y)²) dy. Inspired by the catenoid problem [10], we aim at finding a regular curve joining two points A = (0, y_A) and B = (1, y_B) with the smallest axisymmetric surface. The curve r(y) is the straight line between A and B, modified by r = (r_1, …, r_29)^⊤, see Figure 23.

• Example 7: the objective functions are the lift and drag coefficients of the airfoil, f_7L and f_7D, computed with a commercial Computational Fluid Dynamics (CFD) code.

Over-parameterized circle (Example 2)

For the over-parameterized circle, the objective function f_2 given above explicitly depends on the parameters (x, y, r) that truly define the circle. Three models are compared:

• a model using the CAD parameters x ∈ R^39;
• a model using the 3 first eigencomponents, (α_1, α_2, α_3);
• a model built over the true circle parameters (x, y, r).

Table 11 gives the average R2 over 10 runs with different space-filling DoEs of size n = 20, 50, 100, 200. Since d = 39 > 20, no GP was fitted in the CAD parameter space when n = 20. f_2 is easily learned by the surrogate models, as shown by the large R2 values. Obviously, the quality of prediction increases with n, and the eigenshape GP (GP(α_{1:3})), built in a 3-dimensional space, outperforms the GP in the CAD parameter space (GP(X), d = 39). Yet GP(α_{1:3}) performs as well as (and, for small n's, even better than) GP(True).

  n     GP(X)     GP(α_{1:3})   GP(True)
  20    -         0.99741       0.99701
  50    0.78193   0.99954       0.99951
  100   0.86254   0.99984       0.99985
  200   0.93383   0.99992       0.99997

Table 11: Average R2 over 10 runs for the prediction of f_2. GP(X) is the GP in the 39-dimensional CAD parameter space, GP(α_{1:3}) corresponds to a GP fitted to the 3 first principal components α_1, α_2, α_3, and GP(True) to the GP in the space of minimal circle coordinates.

Heart target (Example 4)

We turn to the metamodeling of f_4. It is a 40-dimensional function that explicitly depends on the CAD parameters. Unlike the previous test problem, the shapes do not have superfluous parameters, since all x_j's are necessary to retrieve t. Seven different models, detailed in Sections 3.1 and 3.2, are investigated: GP(X), the standard GP carried out in the space of CAD parameters; GP(α_{1:40}), the metamodel built in the space of the 40 first principal components (indeed, Table 6 informed us that any shape is retrieved via its 40 first eigenshape coefficients); GP(α_{1:2}), GP(α_{1:4}) and GP(α_{1:16}), surrogates in reduced dimension that consider the 2, 4 and 16 first principal components only, chosen from the cumulative eigenvalue sums in Table 6; and, finally, GP(α^a) and AddGP(α^a + α^ā).

Table 12 reports the average R2 indicator over 10 runs starting with space-filling DoEs of size n = 20, 50, 100, 200, and Figure 34 shows a boxplot of the results (for the sake of clarity, only runs with R2 ≥ 0.8 are shown). The input dimension for GP(X) and for GP(α_{1:40}) is too large for coping with n = 20 observations. GP(α_{1:40}) is given beside GP(X) because both GPs have the same input space dimension.

  n     GP(X)   GP(α_{1:40})   GP(α_{1:2})   GP(α_{1:4})   GP(α_{1:16})   GP(α^a)   AddGP(α^a + α^ā)
  20    -       -              -0.063        0.979         0.844          0.935     0.967
  50    0.455   0.542          -0.009        0.984         0.968          0.983     0.991
  100   0.662   0.868          0             0.986         0.986          0.986     0.997
  200   0.873   0.988          0             0.987         0.991          0.987     0.999

Table 12: Average R2 over 10 runs when metamodeling f_4.
The benefits of the additive GP appear to be threefold.

First, it ensures sparsity by selecting a small number of eigenshapes for the anisotropic part of the kernel. A high-dimensional input space hinders the predictive capabilities when n is small, as confirmed by the weak performance of GP(X), GP(α_{1:40}) and even GP(α_{1:16}) for n = 20. When n increases, higher-dimensional models become more accurate. For n = 100 and n = 200, the model with 16 principal components outperforms the one with 4 principal components, even though the latter was more precise with n = 20 or n = 50 observations. In the case n = 200, even GP(α_{1:40}) outperforms the 4-dimensional GP(α_{1:4}). This is due to the fact that more principal components mean a more realistic shape, hence fewer "input space errors". When few observations are available, these models suffer from the curse of dimensionality, but they become accurate as soon as their design space gets infilled enough. With more observations, GP(α_{1:40}) may become the best model.

Figure 34: Boxplots of the R2 coefficient for the different models, rectangle test case (Example 4).

Second, besides the dimension reduction, the selection of eigenshapes that truly influence the output is also critical. According to Table 6, a tempting decision to reduce the dimension would be to retain the two first principal components, i.e., GP(α_{1:2}). But since the 2 first eigenshapes act on the shape's position (see Figure 22), to which f_4 is insensitive, this is a weak option, as pointed out by R2 scores close to 0 for this model. Here, the selected variables are usually the 3rd and the 4th eigenshapes, which act on the size of the rectangle and hence are of first-order importance for f_4. In about 30% of the runs, they are accompanied by the first and second ones, and more rarely by other eigenshapes.

Third, AddGP(α^a + α^ā) outperforms GP(α^a). Indeed, the geometrically less important eigenshapes v_5, …, v_40 locally modify the rectangle and allow the final small improvements in f_4. This highlights the benefit of taking into account the remaining eigenshapes, which act as local shape refinements.

Last, even though their input spaces have the same dimension, GP(α_{1:40}) consistently outperforms GP(X). This confirms our comments about the NACA manifold of Figure 26: the eigenshapes are a better representation than the CAD parameters for statistical prediction.

Catenoid shape (Example 5)

In relation with the catenoid, we study the objective function f_5(r) = 2π ∫_{y_A}^{y_B} r(y) √(1 + r′(y)²) dy, an integral related to the area of the axisymmetric surface generated by the rotation of a curve r(y). In our example, r(y) is the line between two points A and B, modified by the regularly spaced deviations r = (r_1, …, r_29)^⊤. Only r's generated by a GP that lead to a curve inside a prescribed envelope (see Figure 23) are kept, in the same spirit as [26], where a smoothing operator is applied to consider realistic airfoils. With this, it is expected that fewer than 29 dimensions suffice to accurately describe all designs. This is confirmed by the eigenvalues in Table 7, the true dimensionality being detected to be 7.
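As an illustration of how f_5 can be evaluated, the sketch below uses trapezoidal quadrature and finite differences; the unit interval, the endpoint radii rA and rB, and the way the 29 deviations enter the profile are assumptions made for this sketch, not the paper's exact setup.

```python
import numpy as np

def f5(dev, rA=1.0, rB=1.0):
    """Axisymmetric area 2*pi * int r(y)*sqrt(1 + r'(y)^2) dy, trapezoidal rule."""
    # Profile: straight line between the endpoint radii rA, rB (assumed values),
    # perturbed at 29 regularly spaced interior stations by `dev`.
    y = np.linspace(0.0, 1.0, len(dev) + 2)
    r = (1.0 - y) * rA + y * rB
    r[1:-1] += dev                           # interior deviations r_1..r_29
    dr = np.gradient(r, y)                   # finite-difference r'(y)
    g = r * np.sqrt(1.0 + dr ** 2)
    return 2.0 * np.pi * np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(y))

area = f5(np.zeros(29))   # unperturbed straight profile: a cylinder, area 2*pi
```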
In this experiment, we compare the predictive capabilities of six models. The first one is the classical GP(X): the objective function explicitly depends on r, but its high dimensionality may be a drawback for metamodeling. Even though fewer dimensions are necessary and many eigenshapes correspond to noise, a GP fitted to all 29 eigenshapes, GP(α_{1:29}), is considered. Along with it, GP(α_{1:4}) and GP(α_{1:7}) are considered: the former is an unsupervised dimension reduction based on the λ_j's, while the latter is the full-dimensional eigenshape GP, since eigenshapes 8 to 29 are non-informative. Finally, the GPs with variable selection, GP(α^a) and AddGP(α^a + α^ā), are also compared.

Table 13 reports the average R2 indicator over 10 runs starting with space-filling DoEs of size n = 20, 50, 100, 200, and Figure 35 shows a boxplot of the results (for the sake of clarity, only runs with R2 ≥ 0.95 are shown). The input dimension for GP(X) and for GP(α_{1:29}) is too large for coping with n = 20 observations. GP(α_{1:29}) is given beside GP(X) because these GPs have the same input space dimension.

  n     GP(X)   GP(α_{1:29})   GP(α_{1:4})   GP(α_{1:7})   GP(α^a)   AddGP(α^a + α^ā)
  20    -       -              0.966         0.958         0.914     0.992
  50    0.976   0.925          0.954         0.987         0.938     0.997
  100   0.992   0.968          0.958         0.997         0.957     0.999
  200   0.997   0.981          0.952         0.998         0.951     0.999

Table 13: Average R2 over 10 runs for the metamodeling of f_5.

These results indicate a better performance of AddGP(α^a + α^ā), which benefits from the prioritization of the most influential eigenshapes in the additive model while, at the same time, accounting for all 7 informative eigenshapes. Modeling in the space of the full α's (GP(α_{1:7})) performs fairly well too, because of the low true dimensionality (7). Despite its lower dimensionality, GP(α_{1:4}) does not work well: the refinements induced by v_5, v_6 and v_7 are disregarded although they act on f_5. This explanation also holds for the moderate performance of GP(α^a), in which mainly the 4 first principal components are selected. Including the remaining components in a coarse GP, as is done inside AddGP(α^a + α^ā), increases the performance. Even though there are d = 29 CAD parameters, GP(X) exhibits correct performance: since only smooth curves are considered, they are favorable to GP modeling and the curse of dimensionality is damped. In this example, considering all 29 eigenshapes (GP(α_{1:29})), even though only 7 of them are necessary, leads to the worst results, since the non-informative eigenshapes augment the dimension without bringing additional information.

Figure 35: Boxplots of the R2 coefficient for the different models, catenoid test case (Example 5).

NACA 22 airfoil (Example 7)

The last example brings us closer to real-world engineering problems. The objective functions associated with the NACA airfoil with 22 parameters (Example 7), f_7L and f_7D, are the lift and drag coefficients of this airfoil. f_7L and f_7D depend implicitly and nonlinearly on x through Ω_x. Table 9 shows that only the 20 first eigenvectors are informative. Seven metamodeling strategies are compared: GP(X); GP(α_{1:20}), the surrogate in the space of all 20 meaningful eigenshapes; GP(α_{1:2}), GP(α_{1:3}) and GP(α_{1:6}), where fewer eigenshapes are considered; GP(α^a); and AddGP(α^a + α^ā).
GP(α_{1:20}) is given beside GP(X) because these GPs have almost the same input space dimension. Table 14 reports the average R2 indicator over 10 runs starting with space-filling DoEs of n = 20, 50, 100, 200 observations, and Figure 36 shows a boxplot of the results (for the sake of clarity, only runs with R2 ≥ 0.8 for f_7L and ≥ 0.6 for f_7D are shown). The input dimension for GP(X) (d = 22) and for GP(α_{1:20}) is too large for coping with n = 20 observations.

  f_7L
  n     GP(X)   GP(α_{1:20})   GP(α_{1:2})   GP(α_{1:3})   GP(α_{1:6})   GP(α^a)   AddGP(α^a + α^ā)
  20    -       -              0.857         0.907         0.930         0.935     0.957
  50    0.956   0.973          0.714         0.935         0.950         0.970     0.984
  100   0.975   0.989          0.708         0.938         0.962         0.981     0.992
  200   0.987   0.995          0.515         0.954         0.968         0.993     0.996

  f_7D
  n     GP(X)   GP(α_{1:20})   GP(α_{1:2})   GP(α_{1:3})   GP(α_{1:6})   GP(α^a)   AddGP(α^a + α^ā)
  20    -       -              0.443         0.806         0.720         0.800     0.796
  50    0.771   0.847          0.259         0.866         0.882         0.878     0.896
  100   0.861   0.921          0.192         0.915         0.928         0.925     0.945
  200   0.915   0.958          -0.008        0.920         0.950         0.946     0.969

Table 14: Average R2 over 10 runs for the metamodeling of f_7L (top) and f_7D (bottom).

Figure 36: Boxplots of the R2 coefficient for the different models, NACA 22 airfoil example. Left: lift, f_7L. Right: drag, f_7D.

In this example too, AddGP(α^a + α^ā) exhibits the best predictive capabilities: even though they are coarsely taken into account, the non-active eigenshapes, which mostly represent bumps, are included in the surrogate model. For the lift, GP(α^a) performs quite well too, since the dimensions relevant to f_7L have been selected. The variable selection method provides contrasted results between f_7L and f_7D. For the lift, the first eigenshape is not always selected; the second and third ones, as well as some higher-order eigenshapes, get selected, which confirms the effect of the bumps on the lift (see Figures 28 and 29). For the drag (f_7D), however, only the 2 or 3 first eigenshapes are usually selected. We have also noticed that the number of selected components tends to grow with n. This is a desirable property, since with larger samples an accurate surrogate can be built in a higher-dimensional space.

As already remarked in the previous examples (e.g., Table 12), it is seen here in Figure 36 that models with more eigenshapes become more accurate when the number of observations grows. For f_7D (bottom table), for example, when n is small, GP(α_{1:3}) is better than GP(α_{1:6}) and GP(α_{1:20}), but this changes as n grows, GP(α_{1:6}) and GP(α_{1:20}) becoming in turn the best truncation-based eigenshape models. For f_7L (top table), in spite of the dimension reduction, very poor results are achieved when retaining only 2 or 3 components, even with small n's. When considering only the two first eigenshapes (GP(α_{1:2})), the R2 is weak because the third eigenshape significantly modifies the camber. For this GP, the performance decreases with n because of situations like the one shown in Figure 30, where shapes that falsely look similar when considering α_{1:2} only actually differ in lift. Such situations are more likely to occur during the training of the GP as n grows, which degrades the performance.
The example of f_7L is informative in the sense that GP(α_{1:20}) always outperforms GP(α_{1:6}), which outperforms GP(α_{1:3}), for any n (including very small n's), despite the higher dimension. By ignoring second-order eigenshapes, GP(α_{1:3}) and GP(α_{1:6}) provide fewer reconstruction details. These details are nonetheless important, since they change the camber of the airfoil, and this is why GP(α_{1:20}), a more precise reconstruction, performs better. Indeed, the remaining α's mainly reconstruct the bumps of this airfoil, as can be seen in Figure 29, which does influence the lift. This is also the reason why GP(X) is better at predicting the lift than GP(α_{1:3}) and GP(α_{1:6}), which could seem counter-intuitive at first glance since the dimension is reduced. Last, let us point out that even though the dimension is almost the same, GP(α_{1:20}) consistently outperforms GP(X) for both the lift and the drag: it confirms that the eigenshape basis V is more relevant than the CAD parameter basis for GP surrogate modeling.

GP in reduced dimension: summary of results

These four examples have proven the worth of the additive GPs: they are the models that perform best, because of the selection and prioritization of active variables. Models in reduced dimension that exclusively rely on the active eigenshapes provide accurate predictions too, but are slightly outperformed as they disregard smaller effects. GPs built in the space of all (informative) eigenshapes always outperform the ones built in the space of CAD parameters, even when both models have the same dimension. Among the GPs over the reduced space of the δ first principal axes, further removing dimensions generally produces better predictions when the number of data points n is small; as n increases, more eigenshapes lead to better metamodeling. Models whose dimensions have been chosen from a geometric criterion only (the PCA) have a prediction quality that depends on the output: if the first modes do not impact y, as with the 2 first eigenshapes of the rectangle problem, predictions are poor. Ignoring reconstruction details that affect the output, such as the second-order eigenshapes in f_7L, also degrades the performance. This highlights the importance of finding the active variables that affect the output.

4 Optimization in reduced dimension

We now turn to the problem of finding the shape that minimizes an expensive objective function f(·). To this aim, we employ the previous additive GP, which works in the space of eigencomponents α, within an Efficient Global Optimization procedure [24]: at each iteration, a new shape is determined, given the previous observations {(α^(1), y_1), …, (α^(n), y_n)}, by maximizing the Expected Improvement (EI, [31, 24]) as calculated with the GP Y(α),

    α^(n+1)* = arg max_{α ∈ R^D} EI(α; Y(α)),    (11)

where the EI is defined as

    EI(x; Y(x)) = (a − m(x)) Φ_N((a − m(x)) / s(x)) + s(x) ϕ_N((a − m(x)) / s(x)).    (12)

m(·) and s(·) are the conditional mean and standard deviation of Y(·) (Equation (10)), respectively, while Φ_N and ϕ_N stand for the normal cumulative distribution function and probability density function. The threshold a is usually set to the current minimum, f_min := min_{i=1,…,n} y_i, although other values have also been investigated [23, 21].
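For reference, Equation (12) is a one-liner; the following sketch (illustrative Python, not the paper's R implementation) guards against a vanishing kriging standard deviation:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(m, s, a):
    """EI of Equation (12); a is the threshold, usually the current minimum f_min."""
    s = np.maximum(s, 1e-12)      # guard against a vanishing kriging deviation
    z = (a - m) / s
    return (a - m) * norm.cdf(z) + s * norm.pdf(z)
```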
4.1 Alternative Expected Improvement Maximizations

Maximization in the entire α space. The most straightforward way to maximize the EI is to consider its maximization in R^D, as in Equation (11). However, this optimization is typically difficult, as the EI is a multi-modal and high-dimensional (D) function. (As explained at the end of Section 2, all calculations can be restricted to the d′ first coordinates of α; but even though d′ ≪ D, d′ is approximately of the same order as d, hence the optimization is still carried out in a high-dimensional space.)

Maximization in the α^a space. We can however take advantage of the dimension reduction beyond the construction of Y(·): α^a ∈ R^δ are the variables that affect y the most and should be prioritized in the optimization of f(·). A second option is therefore to maximize the EI solely with respect to Y^a(α^a), in dimension δ. This option is nonetheless incomplete, as the full GP Y(·) requires the knowledge of α = [α^a, α^ā]. A first simple idea to augment α^a is to set α^ā equal to its mean, 0. The inactive part of the covariance matrix, K^ā, would then be filled with the same scalar, and the full covariance matrix K = K^a + K^ā would have a degraded conditioning. A second simple idea is to sample α^ā ∼ N(0, diag(λ^ā)). However, the α^ā act as local refinements to the shape that contribute a little to y, and should also be optimized. In [26], the authors observed that, despite the gain in accuracy of surrogate models in a reduced basis (directions of largest variation of the gradient of the lift and drag in their application), a restriction to too few directions led to poorer optimizations, since small effects could not be accounted for.

Optimization in the α^a space complemented with a random embedding in α^ā. This leads to the third proposed EI maximization, which makes step 5 in Figure 1: a maximization of the EI with respect to α^a, together with a random embedding [51] to coarsely optimize the components α^ā. EI([α^a, αa]) is maximized, where α ∈ R is the coordinate along a random line a = (a_1, …, a_{D−δ})^⊤ in the α^ā space. Since the α^ā have been classified as inactive, it is not necessary to make a large effort for their optimization.

This approach can be viewed as an extension of REMBO [51]. In REMBO, a lower-dimensional vector y ∈ R^δ is embedded in X through a linear random embedding, y ↦ A_R y, where A_R ∈ R^{D×δ} is a random matrix. Instead of choosing a completely random and linear embedding with a user-chosen effective dimension δ (investigated in [7]), our embedding is nonlinear (effect of the mapping φ(·)), supervised and semi-random (choice of the active/inactive directions). The dimension is no longer arbitrarily chosen, since it is determined by the number of selected active components (Section 3.2.1), and the random part of the embedding is only associated with the inactive parts of α. Denoting α^a = (α_{a_1}, …, α_{a_δ})^⊤ the selected components (which are not necessarily the δ first axes) and α^ā = (α_{ā_1}, …, α_{ā_{D−δ}})^⊤ the inactive ones, the embedding matrix A_emb ∈ R^{D×(δ+1)} transforms [α^a, α] into the α space, to which the x's are nonlinearly mapped. The δ first columns of A_emb correspond to α^a and contain canonical basis vectors of R^D: the i-th column (i = 1, …, δ) is e^(a_i)_D, i.e., (A_emb)_{j,i} = δ_{j a_i}, where δ_{ij} stands here for the Kronecker symbol (δ_{ij} = 1 if i = j, 0 otherwise). The (δ+1)-th column of A_emb contains a in the rows corresponding to α^ā, i.e., (A_emb)_{ā_i, δ+1} = a_i, i = 1, …, D−δ, while its rows corresponding to active α's equal 0.
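The construction of A_emb is straightforward. In the sketch below (illustrative Python, toy sizes, 0-based indices), `act` holds the indices of the active components:

```python
import numpy as np

def embedding_matrix(D, act, rng):
    """A_emb in R^{D x (delta+1)} for the supervised, semi-random embedding."""
    delta = len(act)
    ina = np.setdiff1d(np.arange(D), act)
    A_emb = np.zeros((D, delta + 1))
    A_emb[act, np.arange(delta)] = 1.0                 # canonical columns, active alpha's
    A_emb[ina, delta] = rng.standard_normal(len(ina))  # random line a; 0 on active rows
    return A_emb

rng = np.random.default_rng(0)
A_emb = embedding_matrix(D=20, act=np.array([0, 2]), rng=rng)   # delta = 2
alpha_full = A_emb @ np.array([0.8, -0.3, 0.05])   # [alpha_a, alpha] -> point of R^D
```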
Assuming φ^{−1} exists, the proposed approach is thus the embedding of a lower-dimensional design [α^a, α], whose dimension δ+1 has been carefully chosen, in X, via the nonlinear and problem-related mapping [α^a, α] ↦ φ^{−1}(V A_emb [α^a, α] + φ̄). The approach can alternatively be considered as an affine mapping of [α^a, α] to the complete space spanned by the eigenshapes V,

    [α^a, α] ↦ V_emb [α^a, α] + φ̄,  with V_emb := V A_emb.    (13)

The shapes generated by the map of Equation (13) are embedded in the space of all discretized shapes. The columns of V_emb ∈ R^{D×(δ+1)} associated with active components are the corresponding eigenshapes, while its last column is the sum of the remaining eigenshapes weighted by the random coefficients a; hence a supervised and semi-random embedding. Another difference with [51] is that only the EI maximization is carried out in the REMBO framework; the surrogate model is not built in terms of [α^a, α], but rather with the full α's, via the additive GP (Section 3.2.2). In this variant, the EI maximization is carried out in a much more tractable (δ+1)-dimensional space and still has analytical gradients (see below). From its optimum α* = [α^a*, α*] ∈ R^{δ+1} arises a D-dimensional vector, α^(n+1)* = A_emb [α^a*, α*], to be evaluated by the true function (this is the pre-image problem discussed in Section 4.2).

Figure 37: EI maximization in α^a complemented by the maximization along a, a random line in the α^ā space.

EI gradient in the α space. The Expected Improvement (12) is differentiable and its derivative is known in closed form [38]:

    ∇EI(x) = −∇m(x) Φ_N(z(x)) + ∇s(x) ϕ_N(z(x)),    (14)

where z(x) = (f_min − m(x)) / s(x). ∇m(x) and ∇s(x) require the gradient of the kernel k of Y(·) at x with the past observations x^(1:n), i.e., ∇k(x, x^(1:n)), which is analytically computable; the identity ∇s²(x) = 2 s(x) ∇s(x) helps computing the gradient of s(x). In the case of the additive GP (9), the mean and variance m(α) and s²(α) are given by (10). Using the notations of Section 3.2.2 and exploiting the symmetry of K, a few calculations lead to

    ∇m(α) = ∇k(α, α^(1:n))^⊤ K^{−1} (y_{1:n} − 1_n β̂),
    ∇s(α) = −∇k(α, α^(1:n))^⊤ K^{−1} k(α, α^(1:n)) / s(α),    (15)

where ∇k(α, α^(1:n)) = σ²_a ∇k^a(α^a, α^a(1:n)) + σ²_ā ∇k^ā(α^ā, α^ā(1:n)). These expressions are plugged into (15) and (14), together with the expression of z(α), to obtain ∇EI(α; Y(α)). In the alternatives proposed before, given an α ∈ R^D, the gradient of the EI can therefore be computed efficiently, accelerating its maximization, which is carried out by genoud [29], a genetic algorithm that uses derivatives. In the random embedding case, the EI of [α^a, α] ∈ R^{δ+1} is given by EI(A_emb [α^a, α]; Y(α)), and its gradient by A_emb^⊤ ∇EI(A_emb [α^a, α]; Y(α)).
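The following sketch assembles Equations (12), (14) and (15). The functions k_fun and dk_fun, which return the covariances of a point with the data and their gradients, are assumed provided (for the additive kernel they follow Section 3.2.2), and sigma2 stands for σ²_a + σ²_ā:

```python
import numpy as np
from scipy.stats import norm

def ei_and_grad(alpha, A, y, K_inv, beta, sigma2, k_fun, dk_fun, f_min):
    """EI and its gradient, Equations (12), (14) and (15).

    k_fun(alpha, A) -> (n,) covariances between alpha and the n data points;
    dk_fun(alpha, A) -> (D, n) their gradients w.r.t. alpha (both assumed given)."""
    kn = k_fun(alpha, A)
    dk = dk_fun(alpha, A)
    w = K_inv @ (y - beta)                 # K^{-1}(y - 1 beta-hat)
    Kk = K_inv @ kn
    m = beta + kn @ w                      # kriging mean of Eq. (10)
    s = np.sqrt(max(sigma2 - kn @ Kk, 1e-12))
    dm = dk @ w                            # Eq. (15), mean gradient
    ds = -(dk @ Kk) / s                    # Eq. (15), sd gradient
    z = (f_min - m) / s
    ei = (f_min - m) * norm.cdf(z) + s * norm.pdf(z)
    grad = -dm * norm.cdf(z) + ds * norm.pdf(z)   # Eq. (14)
    return ei, grad
```

In the embedding variant, the chain rule gives the gradient over [α^a, α] as A_emb.T @ grad, as stated above.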
Setting bounds on α for the EI maximization. As seen in the examples of Section 2.3, neither the manifold of α's, A, nor its restriction to α^a need be hyper-rectangular domains, which is a common assumption made by most optimizers, such as genoud [29], the algorithm used in our implementation. Two strategies were devised to control the space in which the EI is maximized (11). The first is to restrict the EI maximization to A by setting the EI to zero for α's that lie outside the manifold. The benefit of this approach is that only realistic α's are proposed, but it may suffer from an incomplete description of the entire manifold A, which is approximated by A_N. Additionally, given A_N, the statement "being inside/outside the manifold" has to be clarified. We rely on a nearest-neighbor strategy: the 95th quantile of the distances to the nearest neighbor within A_N, d_0.95, is computed and used as a membership threshold. A new α is considered to belong to A if and only if the distance to its nearest neighbor within A_N is smaller than d_0.95. In the light of these limitations, a second strategy, in which the EI is maximized in A_N's covering hyper-rectangle, is also investigated.

The variant of EI maximization with embedding (random line in α^ā) introduces an α coordinate which has to be bounded too. The α_min and α_max boundaries are computed as the smallest and largest projections of A_N on a. Depending on A_N and on a, this may however lead to a too large domain, since the embedded αa might leave the covering hyper-rectangle of α^ā. In the spirit of [6], to avoid this phenomenon, the largest α_min and the smallest α_max such that αa belongs to the covering hyper-rectangle for all α ∈ [α_min, α_max] are chosen.

EI maximization via the CAD parameters. A last option consists in carrying out the maximization in the X space through the mapping φ(·), by max_{x∈X} EI(x; Y(V^⊤(φ(x) − φ̄))) = max_{x∈X} EI(x; Y(α(x))). This avoids both the aforementioned optimization-domain handling and the pre-image search described in the following section. However, this optimization might be less efficient, since it is a maximization in d > δ dimensions and, ∇φ(x) being unknown, the EI loses the closed-form expression of its gradient.

4.2 From the eigencomponents to the original parameters: the pre-image problem

The (often expensive) numerical simulator underlying the objective function can only take the original (e.g., CAD) parameters as inputs. When the EI maximization is carried out in the eigencomponents space, the α's need to be translated into x's. To this aim, the pre-image problem consists in finding the CAD parameter vector x whose description in the shape representation space Φ equals V α^(n+1)* + φ̄. Because there are more α's than x's, D ≫ d, a strict equality may not hold, and the pre-image problem is relaxed into

    x^(n+1) = arg min_{x∈X} ‖(φ(x) − φ̄) − V α^(n+1)*‖_{R^D}.    (16)

To complete an iteration, the pre-image problem (16) is solved and its solution x^(n+1), the parametric shape that resembles α^(n+1)* the most, is evaluated by the simulator, which returns y_{n+1} = f(x^(n+1)). Solving the pre-image problem does not involve calls to the simulator, so it is relatively inexpensive.
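A minimal sketch of the relaxed pre-image resolution (16) follows; the CAD-to-discretization mapping `phi` is problem-specific and assumed given, and L-BFGS-B is an illustrative choice of optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def pre_image(alpha_star, V, phi_bar, phi, x0, bounds):
    """Equation (16): the x in X whose shape best matches V alpha* + phi_bar."""
    target = V @ alpha_star + phi_bar
    obj = lambda x: np.sum((phi(x) - target) ** 2)   # squared shape discrepancy
    out = minimize(obj, x0, method="L-BFGS-B", bounds=bounds)
    return out.x
```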
The surrogate model is then updated with y_{n+1} and α^(n+1) := V^⊤(φ(x^(n+1)) − φ̄), the description of x^(n+1) in the V basis (step 6 in Figure 1).

Depending on the α^(n+1)* yielded by the EI maximization (remember that it may not lie on the manifold A), φ^(n+1)* := V α^(n+1)* + φ̄ and φ^(n+1) := V α^(n+1) + φ̄, the shape representation of the α promoted by the EI and the shape representation of x^(n+1), respectively, may substantially differ. While it is mandatory to update the GP (9) with the pair (α^(n+1), y_{n+1}), it may at first seem unclear what should be done with α^(n+1)*. When α^(n+1)* does not belong to A and does not have a pre-image, it might seem natural to simply ignore it. However, since α^(n+1)* was yielded by the EI, it is very likely to be promoted again in the following iterations, because its uncertainty, s²(α^(n+1)*), has not vanished. Therefore, if φ^(n+1)* and φ^(n+1) are substantially different, the virtual pair (α^(n+1)*, y_{n+1}) is included in the GP (9) too, in a strategy called replication. We define replication in general terms.

Definition 2 (Replication). In Bayesian optimization, when the GP is built over coordinates α that are a mapping of the original coordinates x, α = T(x) (in this article, T(·) is the composition of φ(·) with the projection onto a subspace of (v_1, …, v_D)), at the end of each iteration a pre-image problem such as (16) must be solved to translate the new acquisition criterion maximizer α^(n+1)* into the next point to evaluate, x^(n+1), and the associated iterate α^(n+1) = T(x^(n+1)). The replication strategy consists in updating the GP with both (α^(n+1), f(x^(n+1))) and (α^(n+1)*, f(x^(n+1))), provided α^(n+1)* and α^(n+1) are sufficiently different.

Here, the difference between α^(n+1)* and α^(n+1) is calculated as the distance between the associated shapes φ^(n+1)* and φ^(n+1). Since the database Φ contains the shape representations of N distinct designs, d_0 := min_{i,j=1,…,N, i≠j} ‖Φ_i − Φ_j‖_{R^D}, the minimal distance between two different designs in Φ, is used as the threshold beyond which φ^(n+1) and φ^(n+1)* are considered to be different.

The replication strategy is further motivated by the fact that, since

    x^(n+1) = arg min_{x∈X} ‖(φ(x) − φ̄) − V α^(n+1)*‖_{R^D} = arg min_{x∈X} ‖α(x) − α^(n+1)*‖_{R^D},

where α(x) = V^⊤(φ(x) − φ̄) and the last equality simply expresses a change of basis (V being orthogonal), α^(n+1) is an orthogonal projection of α^(n+1)* onto A (since the convexity of A is unknown, this projection might not be unique); see Figure 38. This is somewhat similar to [35, 36], where the authors project non-realistic shapes onto a smooth surface built via diffuse approximation or local polynomial fitting, using the points of A_N, to retrieve a realistic design. In our approach, unrealistic shape representations are directly projected onto A through the resolution of (16). Incorporating the non-physical (α^(n+1)*, y_{n+1}) in the surrogate model can be viewed as an extension of the surrogate model outside its domain [42] (outside the manifold A in our case) by constant prolongation.
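Definition 2 translates into a small update rule. The sketch below (illustrative Python) exploits the fact that ‖φ^(n+1)* − φ^(n+1)‖ = ‖V(α^(n+1)* − α^(n+1))‖, since φ̄ cancels:

```python
import numpy as np

def update_with_replication(data, alpha_new, y_new, alpha_star, V, d0):
    """Definition 2: also add the EI maximizer alpha* with the same observed
    value when its shape differs from the evaluated one by more than d0."""
    data.append((alpha_new, y_new))                     # always add the true iterate
    gap = np.linalg.norm(V @ (alpha_star - alpha_new))  # shape-space distance
    if gap > d0:
        data.append((alpha_star, y_new))                # virtual, replicated pair
    return data
```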
Figure 38: When α^(n+1)* ∉ A, the solution of the pre-image problem (in the α space), α^(n+1), is its projection onto A.

4.3 Experiments

The ideas developed in Sections 2, 3 and 4, put together, make up the method already sketched in Figure 1 and detailed in the following pseudo-code:

    1  Sample N designs x^(i) and discretize them to form the matrix Φ;            /* see Section 2.2 */
    2  Eigendecompose C_Φ := (1/N)(Φ − 1_N φ̄^⊤)^⊤(Φ − 1_N φ̄^⊤) ⇒ eigenvector basis V = {v_1, …, v_D}
       and principal components α = V^⊤(φ(x) − φ̄);
       Evaluate n designs x^(1:n) ⇒ y_{1:n}, and compute their eigencomponents α^(1:n);
       while n < computational budget do
    3      Maximize pl_λ(α^(1:n), y_{1:n}, ϑ) ⇒ active and inactive eigencomponents, α = [α^a, α^ā];   /* see Section 3.2.1 */
    4      Build the additive GP Y(α) = β̂ + Y^a(α^a) + Y^ā(α^ā);                   /* see Section 3.2.2 */
           Randomly draw a vector of D − δ components, a;
    5      Maximize the EI with respect to α^a and the scalar α along the random line a
           ⇒ α^(n+1)*, next shape to be evaluated;   /* sharp maximization w.r.t. α^a, coarse w.r.t. α^ā, see Section 4.1 */
    6      Solve the pre-image problem ⇒ x^(n+1) to be evaluated ⇒ y_{n+1} = f(x^(n+1)),
           and α^(n+1) = V^⊤(φ(x^(n+1)) − φ̄), the associated eigencomponents;       /* see Section 4.2 */
           if α^(n+1) and α^(n+1)* too different then
               update the GP with (α^(n+1), y_{n+1}) and (α^(n+1)*, y_{n+1});        /* replication, see Definition 2 */
           else
               update the GP with (α^(n+1), y_{n+1});
           end
           n ← n + 1;
       end
    Algorithm 1: Pseudo-code of the Bayesian optimization in reduced eigencomponents, AddGP(α^a + α^ā)-EI embed with replication.

Many algorithms result from the combination of versions of the GP metamodel and of the EI maximization. They are characterized by the space in which these operations are performed (the initial X, or the eigencomponents A with the retained number of dimensions), the classical or additive GP, and the use of an embedding or not. Before further explaining and testing them, we introduce a shorthand notation. The algorithm names are made of two parts separated by a dash, "GP version-EI version". The GP part may either be an anisotropic GP with Matérn kernel, in which case it is noted GP, or an additive GP made of an anisotropic plus an isotropic kernel, noted AddGP. The spaces on which they operate are specified in parentheses: for example, GP(α_{1:3}) is an anisotropic GP in the space spanned by (α_1, α_2, α_3), and AddGP(X_{1:3} + X_{4:40}) is an additive GP whose kernel is the sum of an anisotropic kernel in (x_1, x_2, x_3) and an isotropic kernel in (x_4, …, x_40). The space over which the EI maximization is carried out is specified in the same way. Unspecified dimensions in the EI have their value set to the middle of their defining interval; e.g., GP(X)-EI(X_{1:2}) means that the EI maximization is done on the 2 first components of x, the other ones being fixed to 0 if the interval is centered. The EI descriptor can also be a keyword characterizing the EI alternative employed (see Section 4.1): for example, AddGP(α_{1:2} + α_{3:20})-EI embed means that the EI is maximized in a 3-dimensional space made of α_1, α_2 and the embedding coordinate α.

4.3.1 Optimization of a function with low effective dimension

A set of experiments is now carried out, aiming at comparing the three optimization alternatives involving GPs that were introduced in Section 4.1 when a subset of active variables has been identified: EI maximization in the space of active variables, in the space of active variables with an embedding in the inactive space, and in the entire space. In order to test the EI maximization separately from the space reduction method (the mapping, PCA and regularized likelihood), we start by assuming that the effective variables are known. Complete experiments will be given later.
We minimize a function depending on a small number of parameters, the following modified version of the Griewank function [32]:

    f_MG(x) = f_Griewank(x) + f_Sph(x),   x ∈ [−600, 600]^d,    (17)

where f_Griewank is the classical Griewank function in dimension 2,

    f_Griewank(x) = (1/4000) Σ_{j=1}^{2} x_j² − Π_{j=1}^{2} cos(x_j / √j) + 1,

defined on [−600, 600]² and whose optimum, located at (0, 0)^⊤, is 0. To create a high-dimensional function where only a few variables act on the output, the function f_Sph is added to f_Griewank, where f_Sph is a sphere function centered at c, with a smaller magnitude than f_Griewank, and which only depends on the variables x_3, …, x_10:

    f_Sph(x) = (1/400,000) Σ_{j=3}^{10} (x_j − c_{j−2})².

f_Sph(x) is the squared Euclidean distance between (x_3, …, x_10)^⊤ and c, which is set to c = (−140, −100, −60, −20, 20, 60, 100, 140)^⊤ in our experiments. Completely ignoring (x_3, …, x_10)^⊤ therefore does not lead to the optimum of f_MG. We define f_MG on [−600, 600]^d, d ≥ 10: the variables x_11, …, x_d do not have any influence on f_MG but augment the dimension. In the following experiments, we take d = 40.
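For reproducibility, a direct transcription of Equation (17) (illustrative Python):

```python
import numpy as np

C = np.array([-140., -100., -60., -20., 20., 60., 100., 140.])

def f_mg(x):
    """Modified Griewank of Equation (17); minimum 0 at (0, 0, c, *, ..., *)."""
    x = np.asarray(x, dtype=float)
    griewank = (np.sum(x[:2] ** 2) / 4000.0
                - np.prod(np.cos(x[:2] / np.sqrt(np.array([1.0, 2.0])))) + 1.0)
    sphere = np.sum((x[2:10] - C) ** 2) / 400_000.0   # only x_3..x_10 matter here
    return griewank + sphere                           # x_11..x_d are inert

x_star = np.concatenate([[0.0, 0.0], C, np.zeros(30)])  # d = 40
assert abs(f_mg(x_star)) < 1e-12
```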
The additive GP described in Section 3.2.2 operates between the active space composed of x_1 and x_2, and the inactive space X_{3:d}. With the additive GP, three ways to optimize the EI are investigated: AddGP(X_{1:2} + X_{3:40})-EI(X_{1:2}), where the EI is optimized along the active space only and x_3, …, x_d are set to the middle of their intervals (0); AddGP(X_{1:2} + X_{3:40})-EI embed, where the EI is optimized in the active space completed by the embedding in the inactive space; and AddGP(X_{1:2} + X_{3:40})-EI(X), where the EI is optimized in the entire X. These Bayesian optimization algorithms with additive GPs are compared to three classical optimizers: one based on the GP built in the entire space (GP(X)-EI(X)), another where the GP is built in the X_{1:2} := (x_1, x_2) space (GP(X_{1:2})-EI(X_{1:2})), and one working in the X_{1:10} := (x_1, …, x_10) space (GP(X_{1:10})-EI(X_{1:10})).

We start the experiments with an initial DoE of n = 20 points, which is space-filling in X (or in X_{1:2} or X_{1:10} for the variants where the metamodel is built in these spaces). We then try to find the minimum of f_MG, x* := (0, 0, c, ∗, …, ∗)^⊤, within p = 80 iterations (that is to say, EI maximizations whose optima are evaluated by f_MG). For the instance where the metamodel is built in X ⊂ R^40, we cannot start with an initial DoE of n = 20 points, so these experiments are initialized with n = 50 designs, only p = 50 iterations being allowed. The EI being maximized by the genetic algorithm genoud [29], we use the same population size and number of generations in each variant for fair comparison.

The lowest objective function values obtained by the algorithms are reported in Table 15. They are averaged over 10 runs with different initial designs, and standard deviations are given in brackets. The three left-hand columns correspond to standard GPs carried out in different spaces, and the three right-hand columns to runs using the additive GP of Section 3.2.2 together with different EI maximization strategies.

  Metamodel        GP(X_{1:2})-    GP(X_{1:10})-   GP(X)-          AddGP(X_{1:2} + X_{3:40})-
  EI maximization  EI(X_{1:2})     EI(X_{1:10})    EI(X)           EI(X_{1:2})     EI embed        EI(X)
  Optimum (sd)     0.776 (0.221)   1.127 (0.214)   0.669 (0.280)   0.545 (0.210)   0.481 (0.185)   0.986 (0.366)

Table 15: Objective function values obtained within 100 (20+80, or 50+50 for the third column) evaluations of the 40-dimensional f_MG, with different metamodels and varying EI maximization strategies. The last three columns share the AddGP(X_{1:2} + X_{3:40}) metamodel.

The results in Table 15 show that the methods using the additive GP usually outperform those where the GP is built in a more or less truncated X space. The results of GP(X_{1:10})-EI(X_{1:10}) are surprisingly bad; additional experiments have shown that they seem to be linked to a too small initial DoE. Notice that with another version of f_MG (where c is closer to the boundaries of X_{3:10}, not reported here), GP(X_{1:10})-EI(X_{1:10}) outperforms GP(X_{1:2})-EI(X_{1:2}) and the classical GP(X)-EI(X), which is expected since in this situation x_3, …, x_10 become active. However, the AddGP-EI embed and AddGP-EI(X) versions with the additive GP remain better.

The maximization of the EI for the additive GP between the active and inactive components performs best when the maximization strategy combines the advantage of a low-dimensional active space with a rough maximization in the larger inactive subspace, i.e., the AddGP-EI embed strategy. It is also worth mentioning that it is the variant with the lowest standard deviation. AddGP-EI(X), searching in a 40-dimensional space, is not able to attain the optimum as well. Even though it is carried out in a very small dimension, AddGP-EI(X_{1:2}) is also slightly outperformed by AddGP-EI embed, because it cannot optimize the x^ā's. In this instance of f_MG, where c is relatively close to 0, AddGP-EI(X_{1:2}) does not suffer too much from disregarding the x^ā's. However, in the additional experiment where c is close to the boundaries of X_{3:10}, AddGP-EI(X_{1:2}) exhibits poor results, while AddGP-EI embed still performs well.
In this case, AddGP-EI(X) performs slightly better than AddGP-EI embed, because it benefits from the maximization over the complete X, while the restriction to the random line hinders AddGP-EI embed from getting as close to the solution; but AddGP-EI embed still performs reasonably well and has a smaller standard deviation than AddGP-EI(X). For all these reasons, the additive GP with random embedding (AddGP-EI embed) strategy is assessed as the safest one.

4.3.2 Experiments with shape optimization

We now turn to the shape optimization of the designs introduced in Section 2.3, whose objective functions were defined in Section 3.3. We compare the standard approach, where the designs are optimized in the CAD parameter space, with the methodologies where the surrogate model is built in the eigenshape basis (all variants described in Section 4.1). For fair comparison, the same computational effort is put into the internal EI maximization.

Catenoid shape

We want to find a curve r(y) which minimizes the associated axisymmetric surface, as expressed by the integral defining f_5 in the catenoid problem (Example 5). The versions of Bayesian optimizers that are now tested are the following:

• the standard GP(X)-EI(X), where both the GP and the EI work with the original x's, i.e., the CAD parameters;
• GP(α)-EI(α), where the GP is built in the space of (to be specified) principal components, over which the EI is maximized; the retained components are taken equal to 1:4 and 1:7 because, as seen in Table 7, 4 and 7 eigencomponents account for 98% and 100% of the shape variance, respectively;
• GP(α)-EI(X), where the GP is built in the space of principal components but the EI is maximized in the X space;
• AddGP(α^a + α^ā), the additive GP, for which three EI maximizations have been described (Section 4.1): EI embed, where the EI is maximized with respect to α^a and an embedding in the α^ā space; EI(α^a), where only the active α's are maximized (the remaining ones being set to their mean value in A_N, 0); and EI(α), where all α's are maximized;
• GP(α^a)-EI(α^a), where the GP is built over the space of active α's, over which the EI maximization is carried out.

Regarding the EI maximization in A, "on manifold" states that the search is restricted to α's close to A_N; if not, the maximization is carried out in A_N's covering hyper-rectangle. "With replication" indicates that both α^(n+1) and α^(n+1)* ∉ A are used for the metamodel update, while "no replication" indicates that only the α^(n+1)'s are considered by the surrogate.

The best objective function values obtained by the algorithms are reported in Table 16. They are averaged over 10 runs with different initial DoEs, and standard deviations are given in brackets. The algorithms start with a space-filling DoE of 20 individuals and are run for 60 additional iterations. In the case of the CAD parameters, since d = 29 > 20, the initial DoE contains 40 designs and the algorithm is run for 40 iterations. The number of function evaluations needed to reach given target levels is also reported, to compare the ability of the algorithms to quickly attain near-optimal values. When at least one run has not reached the target, a rough estimator of the empirical run time [2], T_s/p_s, is provided in red, the number of runs achieving the target value being reported in brackets; T_s and p_s correspond to the average number of function evaluations of the runs that reach the target and to the proportion of runs attaining it, respectively.
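A sketch of this success-weighted estimator (illustrative Python), where None marks a run that did not attain the target:

```python
import numpy as np

def empirical_runtime(evals_to_target):
    """Rough ERT estimator T_s / p_s of [2]."""
    hits = np.array([e for e in evals_to_target if e is not None], dtype=float)
    if hits.size == 0:
        return np.inf                       # the 'x' cells of Table 16
    p_s = hits.size / len(evals_to_target)  # proportion of successful runs
    return hits.mean() / p_s                # T_s / p_s

ert = empirical_runtime([25, 31, None, 28])   # e.g., 3 of 4 runs reached the target
```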
  Method                                        Best value      Time to 27    Time to 30    Time to 35
  GP(X)-EI(X)                                   31.83 (2.10)    ×             570.0 [1]     68.5 (9.9)
  GP(α_{1:7})-EI(α_{1:7}) on manifold           26.93 (0.18)    86.9 [7]      40.2 (10.5)   40.2 (10.5)
  GP(α_{1:7})-EI(α_{1:7}) with replication      26.16 (0.10)    30.5 (2.8)    24.3 (0.8)    23.4 (0.5)
  GP(α_{1:7})-EI(α_{1:7}) no replication        27.62 (0.72)    147.5 [2]     25.4 (2.5)    23.5 (0.5)
  GP(α_{1:7})-EI(X)                             40.57 (11.61)   370.0 [1]     163.3 [3]     120.0 [4]
  AddGP(α^a + α^ā)-EI embed on manifold         50.67 (0.05)    ×             ×             ×
  AddGP(α^a + α^ā)-EI embed no replication      27.58 (0.53)    172.5 [2]     23.6 (1.4)    22.3 (0.7)
  AddGP(α^a + α^ā)-EI embed with replication    26.19 (0.16)    28.4 (4.1)    24.2 (3.1)    22.8 (1.9)
  GP(α_{1:4})-EI(α_{1:4}) with replication      27.12 (0.13)    550.0 [1]     27.0 (3.9)    25.4 (3.8)

Table 16: Best objective function values found and number of iterations required to attain a fixed target (average over 10 runs, standard deviations in brackets) for different metamodels and optimization strategies on the catenoid problem (Example 5). Red figures correspond to the empirical run time, with the number of runs which attained the target in brackets, and '×' signifies that no run was able to attain the target within the limited budget.

Comparing, in Table 16, the algorithms that stay on the manifold with the others indicates that restricting the search for EI maximizers to the vicinity of A_N worsens the convergence: promising α's are difficult to attain, or are even falsely considered as lying outside A. This observation gets even worse with the additive GP; staying in the neighborhood of A_N has even stronger consequences because of the restriction to the random line a. The EI should therefore be optimized in the covering hyper-rectangle of A_N.

For tackling the issue of EI maximizers α^(n+1)* ∉ A, the replication strategy exhibits better performance than the strategy where only the projection, α^(n+1), is used for updating the GP. Figure 39 shows the typical effect of the replication strategy. On the left, the inner EI maximization is carried out in the covering hyper-rectangle of A_N, but only the α ∈ A obtained through the pre-image problem are used to construct the surrogate model. On the right, all EI maximizers have been used for the GP, including α ∉ A. Without replication, since the variance of the GP at previous EI maximizers has not vanished, the EI continues promoting the same α's, which have approximately the same pre-image. The same part of the α space is sampled, which not only leads to premature convergence (the best observed value has already been attained after 6 iterations), but also increases the risk of getting a singular covariance matrix. With replication, the GP variance vanishes at all EI maximizers, even those outside A, removing any further EI contribution from these α's. The α space is better explored, with benefits on the objective value (26.26 against 27.13 here).

Figure 39: Optimization with EI maximization in the covering hyper-rectangle of A_N, without (left) or with (right) the replication strategy.

The EI strategy which consists in maximizing via the X space of CAD parameters avoids the α manifold issues.
However, it does not perform well, because of the higher-dimensional space in which the criterion is maximized. An additional drawback for efficient maximization is that ∇EI is not known analytically in this case.

In this catenoid example, the additive GP and the GP in the space of (all) 7 principal components achieve comparable results, both in terms of best value and of number of function evaluations needed to attain the targets. Indeed, the true dimension (7) is relatively low, and we have noticed that the 5, 6 or even 7 first eigenshapes often got classified as active for the additive GP.

Heart rectangle

We now consider Example 4 and the minimization of f_4(x), which expresses the distance from a shape to a rectangle deformed as a heart. As before, different metamodeling and EI maximization options are benchmarked. They include: the standard approach of carrying out the process in the space of CAD parameters (in dimension d = 40); the optimization in the space of the 2, 4, 16 or 40 first principal components, where 100% of the shape variability is recovered with 40 eigencomponents, as seen in Table 6. Supervised eigenshape selection methods (Section 3.2) are also used: the GP built over α^a only, and the additive model over α^a and α^ā. For the latter, the 4 EI maximization options of Section 4.1 are compared. In light of the above optimization results on the catenoid, the three EI maximization strategies in the eigencomponents are carried out in the covering hyper-rectangle of A_N (as opposed to being restricted to the neighborhood of A_N), and EI maximizers which do not belong to A are nonetheless used for the GP update. Henceforth, the with replication strategy becomes the default in all algorithms carrying out EI maximizations in α's, and it will no longer be specified in the algorithm names.

The statistics on the solutions proposed by the algorithms are reported in Table 17. They consist of the best objective function values averaged over 10 runs with different initial designs, with standard deviations given in brackets. The average and standard deviation of the number of function evaluations needed to reach certain levels are also given, to compare the ability of the algorithms to quickly attain near-optimal values. When at least one run failed to attain the target, the value is replaced by a rough estimator of the empirical runtime. The algorithms start with a space-filling DoE of 20 individuals and are run for 80 supplementary iterations. In the case of the CAD parameters GP(X)-EI(X) and of GP(α_{1:40})-EI(α_{1:40}), since d = 40 > 20, the initial DoE contains 50 designs and the algorithm is run for 50 iterations.
  Method                          Best value    Time to 0.5   Time to 1     Time to 3
  GP(X)-EI(X)                     1.18 (0.45)   ×             166.9 [4]     42.1 (26.5)
  GP(α_{1:2})-EI(α_{1:2})         9.21 (0.80)   ×             ×             ×
  GP(α_{1:4})-EI(α_{1:4})         0.33 (0.07)   48.8 (21.8)   21.8 (2.2)    21.0 (0.0)
  GP(α_{1:16})-EI(α_{1:16})       0.59 (0.15)   197.8 [3]     50 (15.4)     35.0 (9.7)
  GP(α_{1:40})-EI(α_{1:40})       2.95 (0.97)   ×             ×             194.4 [5]
  GP(α^a)-EI(α^a)                 0.32 (0.09)   33.7 (9.4)    24.5 (3.7)    21.8 (1.3)
  AddGP(α^a + α^ā)-EI(X)          0.54 (0.19)   199.4 [4]     40.2 (12.3)   30.2 (10.5)
  AddGP(α^a + α^ā)-EI embed       0.37 (0.08)   49.0 (21.4)   26.1 (5.6)    22.2 (1.9)
  AddGP(α^a + α^ā)-EI(α^a)        0.37 (0.09)   33.3 (14.6)   22.7 (2.6)    21.4 (0.7)
  AddGP(α^a + α^ā)-EI(α)          0.60 (0.26)   106.7 [6]     41.2 [9]      21.5 (0.5)

Table 17: Minimum objective function values found and number of function evaluations required to attain a fixed target (average over 10 runs, standard deviations in brackets) for different metamodels and optimization strategies, rectangular heart problem (Example 4). Red figures correspond to the empirical runtime, with the number of runs which attained the target in brackets, and '×' signifies that no run was able to attain the target within the limited budget. All algorithms performing an EI search in α's do so with replication, the henceforth default.

In this test case, as shown in Figure 22, the 2 first eigenshapes modify the shape's position, to which f_4 is insensitive. Poor results are therefore obtained by GP(α_{1:2})-EI(α_{1:2}), even though v_1 and v_2 account for 80% of the shape reconstruction, highlighting the benefits of the determination of active eigenshapes. In a first-order approximation, v_3 and v_4 are the most influential eigenshapes with regard to f_4, which measures the nodal difference between Ω_x and the target Ω_t. GP(α_{1:4})-EI(α_{1:4}) exhibits very good results, as does GP(α^a)-EI(α^a), which mainly selects v_3 and v_4 (v_1, v_2 and other eigenshapes are sometimes selected too). Even though the shape reconstruction is enhanced, GP(α_{1:16})-EI(α_{1:16}) and GP(α_{1:40})-EI(α_{1:40}) have poor results because of the increase in dimension, which is not accompanied by additional information, as already pointed out when comparing the predictive capability of these GPs for small budgets, see Table 12.

GP(α_{1:40})-EI(α_{1:40}) performed better than GP(X)-EI(X) in Table 12, yet its optimization performance is lower. This is certainly due to the initial DoE: both DoEs are space-filling in their respective input spaces (X, or the hyper-rectangle of α ∈ A containing A_N). However, there is a significant difference between the minima found in these DoEs: the average minimum over the 10 runs was 2.57 for GP(X)-EI(X) (hence better than the eventual average best value of GP(α_{1:40})-EI(α_{1:40})), and 9.22 for GP(α_{1:40})-EI(α_{1:40}). While GPs built over the entire α space (e.g., the additive one) suffer from the same drawback, the selection of variables identifies the dimensions to focus on in order to rapidly decrease the objective function. This remark applies only to the rectangular heart test case, and one may wonder what level of generality it contains.
Contrary to the previous example, where building the GP in the space of all (informative) eigenshapes led to the best results, this strategy (GP(α_1:40)-EI(α_1:40)) performs weakly here because of the higher dimension. The variants of the additive GP perform well too, but they are slightly outperformed by GP(α_1:4)-EI(α_1:4). As the objective function mainly depends on v_3 and v_4, which are always classified as active, strategies that put little or no emphasis on ᾱ (namely, AddGP(α_a + ᾱ)-EI embed and AddGP(α_a + ᾱ)-EI(α_a)) perform best. This also explains the good performance of GP(α_1:4)-EI(α_1:4), which disregards α_5, ..., α_40. The maximization of the EI with respect to the full α is hindered by the high dimension. Again, the performance decreases when the EI is maximized via the X space. AddGP(α_a + ᾱ)-EI embed and GP(α_1:4)-EI(α_1:4) need more iterations to attain good values (smaller than 0.5) than GP(α_a)-EI(α_a) and AddGP(α_a + ᾱ)-EI(α_a), which are early starters. This might be due to the additional, less critical components (ᾱ or α_1, α_2, respectively) considered by these methods.

NACA 22 optimization

In this last test case, we compare two of the aforementioned algorithms by optimizing the lift coefficient and the drag coefficient of a NACA 22 airfoil (f_7L and f_7D). The simulation is performed with a computational fluid dynamics code that solves the Reynolds-Averaged Navier-Stokes (RANS) equations with a k-ε turbulence model. Since a single call to the simulator (one calculation of f_7) takes about 20 minutes on a standard personal computer, only two runs are compared for each objective. The first algorithm is the classical Bayesian optimizer where the GP is built in CAD parameter space, GP(X)-EI(X). In the second algorithm, AddGP(α_a + ᾱ)-EI embed, the GP is built in the V basis of eigenshapes, while prioritizing the active dimensions α_a via the additive GP and the EI random embedding method with the replication option, see Section 4.2. The optimization in the eigenshape basis starts with a DoE of n = 10 designs and is run for p = 90 additional iterations while, because there are 22 x_i's, the optimization in the CAD parameter space starts with n = 50 designs and is run for p = 50 iterations.

Figure 40 shows the optimization runs of both algorithms for the drag minimization (top) and the lift maximization (bottom) of the NACA 22, and Figure 41 the resulting airfoils. In this application, the main advantage of AddGP(α_a + ᾱ)-EI embed (Figure 40, top left and bottom left) over the standard Bayesian optimizer (top center and bottom center) is that it enables an early search for low-drag, respectively high-lift, airfoils, at a time when the standard approach is still computing its initial DoE. Indeed, the classical method needs many more function evaluations to build its initial surrogate model (black dots) because the inputs live in a space of higher dimension. The approach introduced in this paper would gain further relevance in problems with more than d = 22 CAD parameters, where it would be nearly impossible to build a large enough initial design of experiments (whose size is typically of the order of 10 × dimension [28]).
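The additive surrogate used above combines a detailed anisotropic kernel on the active eigenshapes with a rough isotropic kernel on the remaining ones. A minimal sketch of such an additive covariance is given below; the squared-exponential form and the variance parameters are illustrative assumptions rather than the exact kernel of the paper.

```python
import numpy as np

def additive_kernel(a1, a2, active, ls_active, ls_iso,
                    var_active=1.0, var_iso=1.0):
    """Additive covariance between eigencoefficient vectors a1 and a2:
    an anisotropic squared-exponential term over the active components
    (one lengthscale per dimension, ls_active) plus an isotropic term
    over the inactive components (a single shared lengthscale ls_iso)."""
    d = len(a1)
    inactive = np.setdiff1d(np.arange(d), active)
    # Detailed part: each active eigenshape gets its own lengthscale.
    da = (a1[active] - a2[active]) / ls_active
    k_active = var_active * np.exp(-0.5 * np.dot(da, da))
    # Rough part: all inactive eigenshapes share one lengthscale.
    di = (a1[inactive] - a2[inactive]) / ls_iso
    k_inactive = var_iso * np.exp(-0.5 * np.dot(di, di))
    return k_active + k_inactive
```

In a complete implementation, the lengthscales and variances would be fitted by (penalized) likelihood maximization, in the spirit of the selection sketch shown earlier.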
It is observed in Figure 41 that smoother airfoils are obtained with AddGP(α_a + ᾱ)-EI embed (right column), because it uses a shape coordinate system instead of treating the L_i's (i.e., x_i's with local influences on the airfoil, see Figure 27) separately, as is done by GP(X)-EI(X) (left column). When the optimization aims at minimizing the drag, the AddGP(α_a + ᾱ)-EI embed airfoil (top right) is smoother than the GP(X)-EI(X) one (top left). And when the objective is to maximize the lift, the camber of the AddGP(α_a + ᾱ)-EI embed airfoil (bottom right) is increased in comparison with the design yielded by GP(X)-EI(X) (bottom left).

Figure 40: Top row: drag optimization of the NACA 22 airfoil in the reduced eigenbasis with AddGP(α_a + ᾱ)-EI embed (left) or carried out in the CAD parameter space with GP(X)-EI(X) (center). Low-drag airfoils are found with AddGP(α_a + ᾱ)-EI embed while the classical method is still evaluating the airfoils of the initial design of experiments (right). Bottom row: lift optimization of the NACA 22 airfoil in the reduced eigenbasis with AddGP(α_a + ᾱ)-EI embed (left) or carried out in the CAD parameter space with GP(X)-EI(X) (center). High-lift airfoils are found while the classical method is still evaluating the airfoils of the initial design of experiments (right), i.e., better objective function values are obtained faster.

Figure 41: Airfoils found by the compared optimization algorithms. Top: drag minimization; bottom: lift maximization. Left: optimization with the GP(X)-EI(X) algorithm; right: optimization with the AddGP(α_a + ᾱ)-EI embed algorithm.

5 Conclusions

In this paper we have proposed a new methodology for applying Bayesian optimization techniques to parametric shapes and to other problems where a pre-existing set of relevant points and a fast auxiliary mapping exist. Instead of working directly with the CAD parameters, which are too numerous for an efficient optimization and may not be the best representation of the underlying shape, we unveil the lower dimensional manifold of shapes through the auxiliary mapping and PCA. The dimensions of this manifold that contribute the most to the variation of the output are identified through an L1-penalized likelihood and then used to build an additive Gaussian Process with a zonal anisotropy on the selected variables and isotropy on the others. This GP is then utilized for Bayesian optimization. The construction of the reduced space of variables opens the way to several strategies for the maximization of the acquisition criterion, in particular the restriction (or not) to the manifold and the replication. The different variants for the construction of the surrogate model and for the EI maximization have been compared on 7 examples, 6 of them analytical and easily reproducible, the last one a realistic airfoil design. Even though specific variants are more or less adapted to the features of specific test problems, the
supervised dimension reduction approach and the construction of an additive GP between active and inactive components have given the most reliable results.

Regarding the EI maximization, our experiments highlight the efficiency of the random embedding in the space of inactive variables, in addition to the detailed optimization of the active variables; it is a trade-off between optimizing the active variables only and optimizing all variables. Benefits have been observed both from not restricting this inner maximization to the current approximation of A and from the virtual replication of points outside A when an α ∉ A is promoted by the EI.

Further research should consider shapes made of multiple elements such as the one in Example 3. This is of practical importance and brings a new theoretical feature, the presence of symmetries in Φ. The knowledge about these symmetries has to be propagated to the eigenshape space to enhance the surrogate model.

Acknowledgments

This research was partly funded by a CIFRE grant (convention #2016/0690) established between the ANRT and the Groupe PSA for the doctoral work of David Gaudrie.

References

[1] Denis Allard, Rachid Senoussi, and Emilio Porcu. Anisotropy models for spatial data. Mathematical Geosciences, 48(3):305–328, 2016.
[2] Anne Auger and Nikolaus Hansen. Performance evaluation of an advanced local search evolutionary algorithm. In Congress on Evolutionary Computation, volume 2, pages 1777–1784. IEEE, 2005.
[3] Richard E. Bellman. Adaptive Control Processes: A Guided Tour. Princeton University Press, 1961.
[4] Malek Ben Salem, François Bachoc, Olivier Roustant, Fabrice Gamboa, and Lionel Tomaso. Sequential dimension reduction for learning features of expensive black-box functions. 2018.
[5] Gal Berkooz, Philip Holmes, and John L. Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25(1):539–575, 1993.
[6] Mickaël Binois, David Ginsbourger, and Olivier Roustant. A warped kernel improving robustness in Bayesian optimization via random embeddings. In International Conference on Learning and Intelligent Optimization, pages 281–286. Springer, 2015.
[7] Mickaël Binois, David Ginsbourger, and Olivier Roustant. On the choice of the low-dimensional domain for global optimization via random embeddings. arXiv preprint arXiv:1704.05318, 2017.
[8] Mohamed Amine Bouhlel, Nathalie Bartoli, Abdelkader Otsmane, and Joseph Morlier. Improving kriging surrogates of high-dimensional design models by partial least squares dimension reduction. Structural and Multidisciplinary Optimization, 53(5):935–952, 2016.
[9] Davide Cinquegrana and Emiliano Iuliano. Investigation of adaptive design variables bounds in dimensionality reduction for aerodynamic shape optimization. Computers & Fluids, 174:89–109, 2018.
[10] Tobias H. Colding and William P. Minicozzi. Shapes of embedded minimal surfaces. Proceedings of the National Academy of Sciences, 103(30):11106–11111, 2006.
[11] Paul Constantine, Eric Dow, and Qiqi Wang. Active subspace methods in theory and practice: applications to kriging surfaces. SIAM Journal on Scientific Computing, 36(4):A1500–A1524, 2014.
[12] Timothy F. Cootes, Christopher J. Taylor, David H. Cooper, and Jim Graham. Active shape models: their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.
[13] Noel Cressie. Statistics for spatial data. Terra Nova, 4(5):613–617, 1992.
[14] Yves Deville, David Ginsbourger, Nicolas Durrande, and Olivier Roustant. Package 'kergp'. 2015.
[15] Nicolas Durrande. Étude de classes de noyaux adaptées à la simplification et à l'interprétation des modèles d'approximation. Une approche fonctionnelle et probabiliste. PhD thesis, École Nationale Supérieure des Mines de Saint-Étienne, 2011.
[16] Nicolas Durrande, David Ginsbourger, and Olivier Roustant. Additive covariance kernels for high-dimensional Gaussian process modeling. In Annales de la Faculté des sciences de Toulouse: Mathématiques, volume 21, pages 481–499, 2012.
[17] David Duvenaud, Hannes Nickisch, and Carl Edward Rasmussen. Additive Gaussian processes. In Advances in Neural Information Processing Systems, pages 226–234, 2011.
[18] Alexander I. J. Forrester and Andy J. Keane. Recent advances in surrogate-based optimization. Progress in Aerospace Sciences, 45(1-3):50–79, 2009.
[19] Ildiko E. Frank and Jerome Friedman. A statistical view of some chemometrics regression tools. Technometrics, 35(2):109–135, 1993.
[20] Keinosuke Fukunaga and David R. Olsen. An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers, 100(2):176–183, 1971.
[21] David Gaudrie, Rodolphe Le Riche, Victor Picheny, Benoit Enaux, and Vincent Herbert. Budgeted multi-objective optimization with a focus on the central part of the Pareto front - extended version. arXiv preprint arXiv:1809.10482, 2018.
[22] Ian Jolliffe. Principal Component Analysis. Springer, 2011.
[23] Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345–383, 2001.
[24] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient Global Optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
[25] Jichao Li, Mohamed Amine Bouhlel, and Joaquim Martins. A data-based approach for fast airfoil analysis and optimization. In 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, page 1383, 2018.
[26] Jichao Li, Jinsheng Cai, and Kun Qu. Surrogate-based aerodynamic shape optimization with the active subspace method. Structural and Multidisciplinary Optimization, 59(2):403–419, 2019.
[27] Dong Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1-3):503–528, 1989.
[28] Jason L. Loeppky, Jerome Sacks, and William J. Welch. Choosing the sample size of a computer experiment: A practical guide. Technometrics, 51(4):366–376, 2009.
[29] Walter R. Mebane Jr., Jasjeet S. Sekhon, et al. Genetic optimization using derivatives: the rgenoud package for R. Journal of Statistical Software, 42(11):1–26, 2011.
[30] Sebastian Mika, Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller, Matthias Scholz, and Gunnar Rätsch. Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems, pages 536–542, 1999.
[31] Jonas Mockus. On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pages 400–404. Springer, 1975.
[32] Marcin Molga and Czesław Smutnicki. Test functions for optimization needs. Test functions for optimization needs, 101, 2005.
[33] Nobuo Namura, Koji Shimoyama, and Shigeru Obayashi. Kriging surrogate model with coordinate transformation based on likelihood and gradient. Journal of Global Optimization, 68(4):827–849, 2017.
[34] Pramudita S. Palar and Koji Shimoyama. On the accuracy of kriging model in active subspaces. In 2018 AIAA/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, page 0913, 2018.
[35] Balaji Raghavan, Piotr Breitkopf, Yves Tourbier, and Pierre Villon. Towards a space reduction approach for efficient structural shape optimization. Structural and Multidisciplinary Optimization, 48(5):987–1000, 2013.
[36] Balaji Raghavan, Guenhael Le Quilliec, Piotr Breitkopf, Alain Rassineux, Jean-Marc Roelandt, and Pierre Villon. Numerical assessment of springback for the deep drawing process by level set interpolation using shape manifolds. International Journal of Material Forming, 7(4):487–501, 2014.
[37] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[38] Olivier Roustant, David Ginsbourger, and Yves Deville. DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. 2012.
[39] Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. Design and analysis of computer experiments. Statistical Science, pages 409–423, 1989.
[40] Andrea Saltelli, Stefano Tarantola, Francesca Campolongo, and Marco Ratto. Sensitivity analysis in practice: a guide to assessing scientific models. Chichester, England, 2004.
[41] Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Kernel principal component analysis. In International Conference on Artificial Neural Networks, pages 583–588. Springer, 1997.
[42] Bobak Shahriari, Alexandre Bouchard-Côté, and Nando Freitas. Unbounded Bayesian optimization via regularization. In Artificial Intelligence and Statistics, pages 1168–1176, 2016.
[43] Songqing Shan and G. Gary Wang. Space exploration and global optimization for computationally intensive design problems: a rough set based approach. Structural and Multidisciplinary Optimization, 28(6):427–441, 2004.
[44] Songqing Shan and G. Gary Wang. Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions. Structural and Multidisciplinary Optimization, 41(2):219–241, 2010.
[45] Mikkel B. Stegmann and David Delgado Gomez. A brief introduction to statistical shape analysis. Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 15(11), 2002.
[46] Michael Stein. Interpolation of Spatial Data: Some Theory for Kriging. Springer Science & Business Media, 1999.
[47] Rohit Tripathy, Ilias Bilionis, and Marcial Gonzalez. Gaussian processes with built-in dimensionality reduction: Applications to high-dimensional uncertainty propagation. Journal of Computational Physics, 321:191–223, 2016.
[48] Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media, 1995.
[49] Michael E. Wall, Andreas Rechtsteiner, and Luis M. Rocha. Singular value decomposition and principal component analysis. In A Practical Approach to Microarray Data Analysis, pages 91–109. Springer, 2003.
[50] Quan Wang. Kernel principal component analysis and its applications in face recognition and active shape models. arXiv preprint arXiv:1207.3538, 2012.
[51] Ziyu Wang, Masrour Zoghi, Frank Hutter, David Matheson, and Nando De Freitas. Bayesian optimization in high dimensions via random embeddings. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
[52] Xiaojing Wu, Xuhao Peng, Weisheng Chen, and Weiwei Zhang. A developed surrogate-based optimization framework combining HDMR-based modeling technique and TLBO algorithm for high-dimensional engineering problems. Structural and Multidisciplinary Optimization, pages 1–18, 2019.
[53] G. Yi, J. Q. Shi, and T. Choi. Penalized Gaussian process regression and classification for high-dimensional nonlinear data. Biometrics, 67(4):1285–1294, 2011.