Computer experiments with functional inputs and scalar outputs by a norm-based approach
Authors: Thomas Muehlenstaedt, Jana Fruth, Olivier Roustant
Thomas Muehlenstaedt (W. L. Gore & Associates), Jana Fruth (Faculty of Statistics, TU Dortmund), and Olivier Roustant (Ecole Nationale Superieure des Mines de Saint-Etienne, FAYOL-EMSE, LSTI, F-42023 Saint-Etienne, France)

October 3, 2014

Abstract

A framework for designing and analyzing computer experiments is presented, which is constructed for dealing with functional and real-number inputs and real-number outputs. For designing experiments with both functional and real-number inputs, a two-stage approach is suggested. The first stage consists of constructing a candidate set for each functional input; during the second stage, an optimal combination of the candidate sets found and a Latin hypercube for the real-number inputs is searched for. The resulting designs can be considered generalizations of Latin hypercubes. GP models are explored as metamodels. The functional inputs are incorporated into the kriging model by applying norms in order to define distances between two functional inputs. In order to make the calculation of these norms computationally feasible, the use of B-splines is promoted.

Keywords: space-filling design, Gaussian process, maximin design

1 Introduction

A lot of physical phenomena are now studied virtually by means of computer codes. For complex phenomena it often happens that the code is too time-consuming for direct usage. This issue is usually addressed by creating "metamodels", also called "surrogates" or "emulators", which are quick-to-evaluate mathematical models of the computer codes. In particular, the (meta)model based on a Gaussian process (GP) proposed by Sacks et al. (1989a,b) and Currin et al. (1991) at the end of the eighties has gained in popularity, and is now described in several books (see e.g. Santner et al. (2003), Fang et al.
(2006), Rasmussen and Williams (2006)). In the sequel we will use the term "GP model", though other equivalent expressions can be found, such as GP regression, GaSP, GP emulator, or kriging model. One main reason for its success is that the GP model provides both an interpolation of the data and an uncertainty quantification in the unexplored regions. Furthermore, it depends on a positive definite function, or kernel, that is adaptable to specific priors or frameworks. A large amount of research has addressed the case of scalar-valued inputs and outputs, though these are often summaries of functional inputs and outputs given as functions of time or space. Nevertheless there has been recent literature focusing on this more complex functional framework. Bayarri et al. (2007) investigated model validation with functional outputs by using a wavelet decomposition. Shi et al. (2007) had batches of time-varying data, and modeled separately the mean structure with a functional regression model (Ramsay and Silverman (1997)) and the covariance structure with a GP model. Developments are given in Shi and Choi (2011). Morris (2012) introduced a new kernel for the GP model that allows modelling time-varying inputs and outputs. He also considered the design problem, and extended the maximin distance criterion to the time-varying case. Some theoretical results on designs for computer experiments with time-varying inputs can be found in Morris (2014). In this article we consider the situation where the inputs are either scalar-valued or functional, and where the output is scalar-valued. This corresponds, for instance, to practical situations where engineers study a summary of the output, but consider the whole complexity of the inputs, which may be scalar-valued but also time-varying functions or more general multivariate functions. We investigate a GP model approach using a customized kernel based on norms and B-splines.
We also propose an original design strategy aiming at providing an initial space-filling design. Although the methods are related to the work presented by Morris (2012), they cover different cases; e.g. our method is not restricted to time-varying inputs combined with time-varying outputs. Furthermore, we allow for a combination of functional and scalar inputs. Section 2 provides some notation and presents the functional framework, including some basics about B-splines and functional norms. In Section 3, designs for computer experiments with both functional and scalar-valued inputs are described. In Section 4, GP models are derived, including a weighting procedure for extracting which part of a functional input has high influence on the output. In Section 5, the methodology is applied to a theoretical example and to a sheet metal forming problem. A concluding discussion is given in Section 6.

2 Background and notation

In this paper we consider a scalar-valued function $g$ depending on functional inputs $f(t) = (f_1(t), \ldots, f_{d_f}(t))$, as well as, possibly, on scalar-valued inputs $x = (x_1, \ldots, x_{d_s})$:

$$y = g(x, f(t)) \quad (1)$$

In the notation above, $g$ represents a time-consuming simulator and $d_s$, $d_f$ are two integers, with $d_f > 0$. For the sake of simplicity we consider that $t \in [0,1]$ is scalar-valued, but the methodology presented here could be generalized to a vector-valued input $t \in [0,1]^{d_t}$. We assume that the inputs are bounded and have been rescaled to $[0,1]$: $x \in [0,1]^{d_s}$ and $f_j(t) \in [0,1]$ for all $t \in [0,1]$ and $j \in \{1, \ldots, d_f\}$. In this framework, a design of experiments $\mathcal{D}$ with $n$ runs consists of two sets of scalar and functional inputs, denoted by $x^{(i)} = (x^{(i)}_1, \ldots, x^{(i)}_{d_s})$ and $f^{(i)}(t) = (f^{(i)}_1(t), \ldots, f^{(i)}_{d_f}(t))$, $i = 1, \ldots, n$. We denote by $y^{(1)}, \ldots, y^{(n)}$ the corresponding scalar-valued outputs.
The design is denoted by $\mathcal{D}$, in contrast to a distance, later denoted by $D$.

2.1 Some basics on B-splines

B-splines are an attractive tool for the modeling of functional inputs (see de Boor (2001), Ramsay and Silverman (1997)). They cover various types of functions, reduce the infinite space of functions considerably, and provide a practical mathematical framework for further computations. B-spline functions are always bounded, which is an important feature for input functions, which usually are only allowed to vary between given values. A B-spline is defined as a linear combination of B-spline basis functions $B_{i,m}$, $i = 1, \ldots, K$, of order $m$:

$$f(t) = \sum_{i=1}^{K} \beta_i B_{i,m}(t),$$

where order $m = 1$ corresponds to (piecewise) constant functions. $K$ and $m$ have to be fixed, with $K \geq m$, and $\beta = (\beta_1, \ldots, \beta_K)$ is the vector of basis coefficients. The B-spline basis functions are defined over a sequence of increasing knots (time points) of length $K - m + 2$, with $m - 1$ additional replicates of the first and the last knot, which are necessary for the basis functions at the bounds:

$$\tau_1 = \cdots = \tau_{m-1} = \tau_m < \tau_{m+1} < \cdots < \tau_K < \tau_{K+1} = \tau_{K+2} = \cdots = \tau_{K+m}.$$

They are recursively given by $B_{i,1}(t) = 1_{[\tau_i, \tau_{i+1}]}(t)$ for $i = 1, \ldots, K + m - 1$ and

$$B_{i,m}(t) = \frac{t - \tau_i}{\tau_{i+m-1} - \tau_i} B_{i,m-1}(t) + \frac{\tau_{i+m} - t}{\tau_{i+m} - \tau_{i+1}} B_{i+1,m-1}(t)$$

for $i \in \{1, \ldots, K\}$, with $B_{i,m} = 0$ if $\tau_i = \cdots = \tau_{i+m}$, to avoid division by zero. Figure 1 shows basis functions for B-splines of order 1, 2, 3, and 4. For order 1 the basis functions are disjoint piecewise constant functions; for order 4 the functions form the popular cubic spline basis. In the figure the number of basis functions $K$ is set to 5 for each order. It follows that the number of knots decreases with the order.
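The Cox-de Boor recursion above translates directly into code. The following is a minimal pure-Python sketch (not the authors' implementation): it builds a uniform knot sequence on [0, 1] with replicated boundary knots and evaluates the basis functions; the uniform knot layout and the function names are illustrative assumptions.

```python
def make_knots(K, m):
    """K - m + 2 distinct uniform knots on [0, 1], with the first and last
    knot replicated m - 1 additional times (K + m knots in total)."""
    inner = [j / (K - m + 1) for j in range(K - m + 2)]
    return [0.0] * (m - 1) + inner + [1.0] * (m - 1)

def bspline(i, m, tau, t):
    """Order-m B-spline basis function B_{i,m} via the Cox-de Boor recursion
    (0-based index i). Intervals are half-open, closed at the right domain end."""
    if m == 1:
        hit = tau[i] < tau[i + 1] and (tau[i] <= t < tau[i + 1]
                                       or t == tau[i + 1] == tau[-1])
        return 1.0 if hit else 0.0
    left = right = 0.0
    if tau[i + m - 1] > tau[i]:   # guard the degenerate knots (division by zero)
        left = (t - tau[i]) / (tau[i + m - 1] - tau[i]) * bspline(i, m - 1, tau, t)
    if tau[i + m] > tau[i + 1]:
        right = (tau[i + m] - t) / (tau[i + m] - tau[i + 1]) * bspline(i + 1, m - 1, tau, t)
    return left + right

# As in Figure 1: K = 5 basis functions for orders 1 to 4.
K = 5
for m in (1, 2, 3, 4):
    tau = make_knots(K, m)
    # partition of unity: the K basis functions sum to 1 everywhere on [0, 1]
    for t in (0.0, 0.3, 0.5, 0.99, 1.0):
        assert abs(sum(bspline(i, m, tau, t) for i in range(K)) - 1.0) < 1e-12
```

The partition-of-unity check in the last loop is exactly the property used in the next paragraph to map bounded functions to a hypercubic coefficient domain.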
It can further be seen that at each time point $t$ the sum of all 5 basis functions is 1. This implies that if $\beta \in [0,1]^K$, then $0 \le f(t) \le 1$ for all $t$, with $f \equiv 1 \Leftrightarrow \beta_1 = \cdots = \beta_K = 1$ and $f \equiv 0 \Leftrightarrow \beta_1 = \cdots = \beta_K = 0$. Therefore a bounded function $f$ corresponds to a hypercubic domain for the basis coefficients $\beta$. One result which justifies the use of B-splines is found in de Boor (2001), page 55, equation 12. In our notation the result states that, given an unknown but four times differentiable function $g(t)$ defined on $[\tau_m, \tau_K]$, the (pointwise) interpolation error of a cubic B-spline is bounded from above, and the bound depends on the maximum step width of the knots and the maximum absolute value of the fourth derivative of $g$. Hence, while B-splines themselves are a somewhat restricted class of functions, they can be used to approximate a very broad class of functions, namely all sufficiently smooth functions.

Figure 1: B-spline bases of increasing orders (1 to 4 from top to bottom) for a fixed number of K = 5 functions. The knots are shown as ticks on the x-axis.

2.2 Distance-based approach

To discriminate between functions, a distance-based approach is chosen. Analogous to the Euclidean norm, the well-known $L_2$ norm for functions is defined as

$$D_f(f, \tilde{f}) = \| f - \tilde{f} \|_{L_2} = \sqrt{\int_0^1 \big(f(t) - \tilde{f}(t)\big)^2 \, dt}. \quad (2)$$

Other choices of norm would be possible, e.g. weighted norms, general $L_p$ norms, or norms working on derivatives such as Sobolev norms. The choice of a suitable norm depends on the case at hand.
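As a quick illustration (not from the paper), the $L_2$ distance (2) between two simple functions can be approximated by a midpoint-rule quadrature and compared with the closed form; the grid size and test functions below are arbitrary choices.

```python
import math

def l2_distance(f, g, n=20000):
    """Midpoint-rule approximation of the L2 distance (2) on [0, 1]."""
    s = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        s += (f(t) - g(t)) ** 2
    return math.sqrt(s / n)

# Example: f(t) = t, g(t) = t^2, where on [0, 1]
# ∫ (t - t^2)^2 dt = 1/3 - 1/2 + 1/5 = 1/30.
d = l2_distance(lambda t: t, lambda t: t * t)
assert abs(d - math.sqrt(1.0 / 30.0)) < 1e-6
```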
In the case that the functions are represented by B-splines with the same basis (same number of basis functions, basis order and knot points), the $L_2$ norm of $h(t) = f(t) - \tilde{f}(t) = \sum_{i=1}^{K} (\beta_i - \tilde{\beta}_i) B_{i,m}(t) = \sum_{i=1}^{K} \delta_i B_{i,m}(t)$ reduces to a norm in $\mathbb{R}^K$:

$$\|h\|_{L_2}^2 = \int_0^1 h(t)^2 \, dt \quad (3)$$
$$= \int_0^1 \sum_{i,j} \delta_i \delta_j B_{i,m}(t) B_{j,m}(t) \, dt \quad (4)$$
$$= \delta' J \delta \quad (5)$$
$$= \|\delta\|_J^2, \quad (6)$$

where $J$ is the $K \times K$ matrix $\big( \int_0^1 B_{i,m}(t) B_{j,m}(t) \, dt \big)_{1 \le i,j \le K}$. As the matrix $J$ does not depend on the coefficients of a B-spline function but only on the order and number of basis functions, it can be computed once, stored, and reused. In order to include the scalar-valued inputs into the framework, a further distance function has to be defined:

$$D\big((x^{(i)}, f^{(i)}), (x^{(j)}, f^{(j)})\big) = \sqrt{ \|x^{(i)} - x^{(j)}\|_2^2 + \sum_{k=1}^{d_f} D_f\big(f^{(i)}_k, f^{(j)}_k\big)^2 }. \quad (7)$$

In the case that there are no scalar inputs, this distance simplifies to $\sqrt{\sum_{k=1}^{d_f} D_f(f^{(i)}_k, f^{(j)}_k)^2}$.

3 Designs for functional inputs

3.1 Theory

There are many approaches to designing a simulation experiment; a good summary can be found in Fang et al. (2006). Uniform design criteria like the wrap-around discrepancy or the centered discrepancy are popular approaches, as are distance-based design criteria like maximin and minimax designs. In contrast to uniform and distance-based designs, which are not directly linked to a statistical model, maximum entropy and IMSE-optimal designs rest on optimality criteria directly linked to a GP model and an assumed covariance kernel. A popular class of designs are Latin hypercube designs (LHD), invented by McKay et al. (1979). Our aim is to generalize the concept of LHDs to situations with functional inputs. One approach would be to design the coefficients of a basis, such as a B-spline basis, a polynomial basis, etc. (see Ramsay and Silverman (1997)). Here another approach is taken.
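The Gram matrix $J$ and the identity (5) can be checked numerically. The sketch below is an illustration, not the authors' code: the B-spline evaluation and the midpoint-rule grid are my own choices. On the same grid, $\delta' J \delta$ and the directly discretized integral of $h^2$ must agree up to rounding, and by the partition of unity the entries of $J$ sum to $\int_0^1 1 \, dt = 1$.

```python
def make_knots(K, m):
    """Uniform knots on [0, 1] with boundary knots replicated m - 1 times."""
    inner = [j / (K - m + 1) for j in range(K - m + 2)]
    return [0.0] * (m - 1) + inner + [1.0] * (m - 1)

def bspline(i, m, tau, t):
    """Cox-de Boor recursion for B_{i,m} (0-based i)."""
    if m == 1:
        hit = tau[i] < tau[i + 1] and (tau[i] <= t < tau[i + 1]
                                       or t == tau[i + 1] == tau[-1])
        return 1.0 if hit else 0.0
    left = right = 0.0
    if tau[i + m - 1] > tau[i]:
        left = (t - tau[i]) / (tau[i + m - 1] - tau[i]) * bspline(i, m - 1, tau, t)
    if tau[i + m] > tau[i + 1]:
        right = (tau[i + m] - t) / (tau[i + m] - tau[i + 1]) * bspline(i + 1, m - 1, tau, t)
    return left + right

K, m, N = 5, 4, 4000
tau = make_knots(K, m)
grid = [(k + 0.5) / N for k in range(N)]
Bv = [[bspline(i, m, tau, t) for t in grid] for i in range(K)]

# J_ij = ∫ B_i B_j dt, approximated on the grid; depends only on K and m.
J = [[sum(Bv[i][k] * Bv[j][k] for k in range(N)) / N for j in range(K)]
     for i in range(K)]

# Squared L2 norm of h = Σ δ_i B_i computed two ways: δ'Jδ vs direct quadrature.
delta = [0.3, -0.5, 0.2, 0.7, -0.1]
quad_form = sum(delta[i] * J[i][j] * delta[j]
                for i in range(K) for j in range(K))
direct = sum(sum(delta[i] * Bv[i][k] for i in range(K)) ** 2
             for k in range(N)) / N
assert abs(quad_form - direct) < 1e-9

# Partition of unity implies the entries of J sum to 1.
assert abs(sum(sum(row) for row in J) - 1.0) < 1e-9
```

In practice $J$ would be computed exactly (piecewise polynomial integrals) and cached, as the text suggests; the grid version here only serves to verify the reduction.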
For a conventional LHD with only scalar inputs, the values of each input variable $x_k$ are equally spread between 0 and 1, i.e. $x^{(i)}_k = \frac{\pi(i) - 1}{n - 1}$, $i = 1, \ldots, n$, where $\pi(\cdot)$ is a permutation of $1, \ldots, n$. While it is not obvious which combination of the input variables $x_1$ to $x_{d_s}$ to use, it is ensured that the one-dimensional projections are uniformly distributed. The combination is then chosen according to a fitness criterion. This idea is carried over to the functional inputs: in a first step, a candidate set of functions $f^{(1)}_k, \ldots, f^{(n)}_k$ is constructed for each $k \in \{1, \ldots, d_f\}$. Once these sets are constructed, the best combination of the sets and the scalar inputs is determined. In the following, the strategy for finding a candidate set based on B-splines is described. This corresponds to finding equally spread points in one dimension for a conventional LHD with scalar inputs. Depending on the restrictions on the candidate set, different strategies for finding a good candidate set can be applied. If no prior knowledge is available, our strategy is to apply distance-based approaches here as well. After finding a candidate set, in a second step a space-filling combination of the scalar LHD and the candidate set is determined.

3.1.1 Constructing the candidate set

For B-splines, the choice of the candidate set reduces to the choice of the coefficients of the basis functions, i.e. for a candidate set with $n$ functions with $K$ basis functions each, $n \cdot K$ coefficients have to be chosen. In order to have a space-filling candidate set, the coefficients have to be chosen with care. Ideally a big variety of functions is covered, e.g. increasing/decreasing functions or functions which are on average very high or low. Here the basis coefficients are sampled from a LHD, i.e. each basis function is considered as an input factor in a LHD.
The coefficients could also be sampled and optimized without any restriction to a LHD, but in that case the coefficients tend to lie near the extremes for designs with a larger number of basis functions and a higher number of runs. As a fitness criterion, not the minimum distance among all pairs of functions is used directly, but a variant proposed in Morris and Mitchell (1995):

$$\Phi_q(D_f(\beta)) = \left\{ \sum_{i=1}^{n} \sum_{j=1}^{i-1} D_f\big(f^{(i)}(\beta_i), f^{(j)}(\beta_j)\big)^{-q} \right\}^{1/q}. \quad (8)$$

Here, $q = 5$ is applied. The criterion $\Phi_q(D_f(\beta))$ is written as depending on $\beta$, as for the construction of the candidate set the optimization takes place over the coefficient vectors of the B-spline representation. This fitness criterion does not only use the minimum distance for the comparison of different designs, but all possible pairs of functions. In order to optimize the $\Phi_q$-criterion, any existing algorithm for optimizing LHDs can be used, e.g. simulated annealing or genetic algorithms. Here, simulated annealing has been used.

3.1.2 Constructing a generalized Latin hypercube

Given that for each functional input $f_k$ a set of functions has been created, the best combination of the LHD for the real inputs and the sets for the functional inputs has to be searched for. Here the same set is used for all functional inputs; however, it would be possible to use a different set for each functional input. In order to rank full designs, again a maximin strategy is chosen. Therefore the distance (7) is used in the following criterion:

$$\Phi^c_q(D(\pi)) = \left( \sum_{i=1}^{n} \sum_{j=1}^{i-1} D\big((x^{(i)}(\pi_i), f^{(i)}(\pi_i)), (x^{(j)}(\pi_j), f^{(j)}(\pi_j))\big)^{-q} \right)^{1/q} \quad (9)$$

The criterion $\Phi^c_q(D(\pi))$ is written as depending on a permutation $\pi$ in order to indicate that in this step of the design construction, the optimization only takes place over switching indices of the scalar and functional inputs.
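To make the candidate-set construction concrete, here is a small self-contained sketch under simplifying assumptions (it is not the authors' R code): order-1 B-splines are used, so the functional distance reduces to $\|\delta\|_2/\sqrt{K}$ (the Gram matrix is then $J = \frac{1}{K} I$); the coefficients are drawn as an LHD; and the $\Phi_q$ criterion (8) with $q = 5$ is reduced by a greedy swap-based descent instead of full simulated annealing.

```python
import math, random

random.seed(0)
n, K, q = 10, 6, 5          # runs, basis functions, Phi_q exponent

def lhd(n, K):
    """Random LHD: every column is a permutation of n equally spread levels."""
    cols = []
    for _ in range(K):
        levels = [i / (n - 1) for i in range(n)]
        random.shuffle(levels)
        cols.append(levels)
    return [[cols[k][i] for k in range(K)] for i in range(n)]

def d_f(a, b):
    """L2 distance of two order-1 B-splines given by coefficient vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / K)

def phi_q(design):
    """Morris-Mitchell criterion (8): small values mean well-spread designs."""
    return sum(d_f(design[i], design[j]) ** (-q)
               for i in range(1, len(design)) for j in range(i)) ** (1.0 / q)

cand = lhd(n, K)
start = best = phi_q(cand)
for _ in range(3000):        # greedy descent over within-column swaps
    k = random.randrange(K)
    i, j = random.sample(range(n), 2)
    cand[i][k], cand[j][k] = cand[j][k], cand[i][k]
    val = phi_q(cand)
    if val <= best:
        best = val
    else:                    # revert the swap
        cand[i][k], cand[j][k] = cand[j][k], cand[i][k]

assert best <= start
# swaps preserve the LHD property: each column is still a permutation
for k in range(K):
    assert sorted(row[k] for row in cand) == [i / (n - 1) for i in range(n)]
```

Because any two rows of an LHD differ in every column, no pairwise distance is zero and the $d^{-q}$ terms in (8) are always finite.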
In order to optimize this fitness criterion, multiple algorithms are possible. In principle, all algorithms used for optimizing LHDs can be applied here, as the candidate sets themselves are not changed, just the combination of the candidate sets and the scalar inputs. Here a variant of simulated annealing as described in Morris and Mitchell (1995) is used.

Remark 1. Another alternative for finding designs, which seems promising at first, is to apply a search algorithm directly to the fitness criterion by optimizing over the class of functions; the result, however, is that only extremes of the class are chosen. This is similar to maximin designs with just real inputs: the optimal maximin design in $d$ dimensions with $2^d$ runs, without restricting it to be a LHD, is a traditional full factorial design with two levels, which is definitely not a space-filling design. In order to illustrate this behaviour, a design with 2 functional and 2 scalar inputs, 15 runs and 8 basis functions has been set up. This design has been optimized unconditionally over the coefficients of the functional inputs and the permutation of the scalar inputs. In Figure 2 a plot of the B-splines for the first functional input is given. Clearly the distinct functions cluster at the minimum and maximum of the allowed range.

Figure 2: Plot of one functional input of an unconditionally optimized design with 15 runs, 2 functional and 2 scalar inputs and 8 basis functions. Clearly, the functions cluster at the minimum and maximum of the allowed output range.

4 Surrogate models

As the evaluation of many simulations is costly, a big part of the literature about computer experiments has focused on constructing statistical models for simulation output. Different types of models are applied, e.g. response surface models, artificial neural networks or radial basis functions.
However, the most popular model is most likely the GP model (Santner et al. (2003)), also called kriging. There are several reasons for using a GP model. It is capable of exactly reproducing the observations, gives an uncertainty estimate, and is very flexible through the incorporation of different covariance kernels. Furthermore, the GP model often has a very high prediction power compared to other approaches, and there is an easy way to switch from interpolation to smoothing by incorporating a nugget effect. In Morris (2012), a GP model is extended to incorporate time-varying inputs, which are modeled as functional inputs. The ideas presented in the following are in some ways extensions of the modeling ideas developed by Morris. Especially due to this flexibility, a GP model is chosen here as well. For a standard GP model it is assumed that the output of the simulation follows a Gaussian process:

$$Y(x, f) = \mu + Z(x, f),$$

where the zero-centered GP $Z(x, f)$ is characterized by its covariance function. A typical GP model approach is to use an anisotropic, tensor-product kernel, which can easily be extended here:

$$\mathrm{cov}\big(Z(x^{(1)}, f^{(1)}), Z(x^{(2)}, f^{(2)})\big) = \sigma^2 \, g\big(D_s(x^{(1)}, x^{(2)}; \theta_s)\big) \, g\big(D_f(f^{(1)}, f^{(2)}; \theta_f)\big). \quad (10)$$

Here $D_f(\cdot, \cdot\,; \theta_f)$ and $D_s(\cdot, \cdot\,; \theta_s)$ are distances for the functional and scalar inputs respectively, scaled by covariance parameters $\theta_s$, $\theta_f$. Standard choices for the kernel $g(\cdot\,; \theta)$ are the Gaussian kernel, $g(h, \theta) = \exp\big(-\frac{h^2}{2\theta^2}\big)$, and the Matern 5/2 kernel, $g(h, \theta) = \big(1 + \frac{\sqrt{5}\,|h|}{\theta} + \frac{5h^2}{3\theta^2}\big) \exp\big(-\frac{\sqrt{5}\,|h|}{\theta}\big)$. In this statistical model, the parameters $\mu, \sigma, \theta_s$ and $\theta_f$ have to be estimated. There are several approaches for parameter estimation in GP models (maximum likelihood, restricted maximum likelihood, cross validation), of which the most common is maximum likelihood. As the likelihood cannot be optimized analytically here, algorithmic optimization is chosen.
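The two kernels just mentioned are easy to state in code. The sketch below is an illustration of the formulas above, not tied to any particular package; for simplicity the range parameter $\theta$ is applied inside $g$ rather than inside the distance as in the notation of (10).

```python
import math

def gauss(h, theta):
    """Gaussian kernel g(h, theta) = exp(-h^2 / (2 theta^2))."""
    return math.exp(-h * h / (2.0 * theta * theta))

def matern52(h, theta):
    """Matern 5/2 kernel g(h, theta)."""
    a = math.sqrt(5.0) * abs(h) / theta
    return (1.0 + a + 5.0 * h * h / (3.0 * theta * theta)) * math.exp(-a)

def cov10(d_s, d_f, sigma2=1.0, g=matern52, theta=1.0):
    """Tensor-product covariance in the spirit of (10), built from the
    scalar-input distance d_s and the functional-input distance d_f."""
    return sigma2 * g(d_s, theta) * g(d_f, theta)

# basic sanity checks: unit correlation at distance 0, decay with distance
for g in (gauss, matern52):
    assert g(0.0, 1.0) == 1.0
    assert g(0.5, 1.0) > g(1.0, 1.0) > g(2.0, 1.0)
assert cov10(0.0, 0.0) == 1.0
```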
While in the general case $\|f - \tilde{f}\|$ requires the evaluation of an integral, the use of a B-spline basis simplifies the computation. The kernel reduces to a kernel defined on $D \times D$, where $D$ is the hypercube $[0,1]^{d_s + d_f \cdot K}$; e.g. a Gaussian covariance kernel reduces to

$$\exp\left( - \sum_{\ell=1}^{d_s} \frac{\big(x^{(1)}_\ell - x^{(2)}_\ell\big)^2}{2\,\theta_{x\ell}^2} \right) \exp\left( - \sum_{\ell=1}^{d_f} \frac{\big(\beta^{(1)}_\ell - \beta^{(2)}_\ell\big)' J \big(\beta^{(1)}_\ell - \beta^{(2)}_\ell\big)}{2\,\theta_{f\ell}^2} \right),$$

with $f_\ell = \sum_{k=1}^{K} \beta_{\ell,k} B_{k,m}$ for the functional inputs $\ell = 1, \ldots, d_f$. Furthermore, the domain here is hypercubic, both for the scalar inputs and the functional inputs, due to the properties of B-splines (see Section 2.1). The estimation of the parameters is done in a similar way to the estimation methods used in the R package DiceKriging (R Core Team (2013), Roustant et al. (2012)): in a first step, a number of random points in the parameter space are checked for their log-likelihood value, and the best one is chosen as the starting point for the optimization by the R command optim. Many useful concepts which are known for scalar-valued inputs also work in this context of functional inputs. A leave-one-out cross validation, where the unknown parameters are estimated based on the full data set, but a prediction is made for data point $(x^{(i)}, f^{(i)})$ based on the full data set omitting data point $i$, can be useful to check model adequacy. Although this kind of leave-one-out prediction is optimistic, it can still help to identify problems with the model. As for other GP models, an uncertainty estimate is available and hence EGO-type optimization techniques (Jones et al. (1998)) can be applied for sequential optimization.

4.1 Weighting

The surrogate model strategy explained above is attractive, as it shrinks the infinite-dimensional functional input down to a problem where for each functional input one covariance parameter is estimated.
The disadvantage of this approach is that it only tells, via the covariance parameter, whether a functional input as a whole is important or not; it does not give any indication of which part of the input space of a functional input is important. In order to construct a more informative parameter estimation process, a weighting step in the GP model is suggested. So far the distance between two different functions of one functional input is determined by the $L_2$ distance of the two functions, and this distance is used as the basis for constructing the covariance between two outputs $Y^{(1)}$ and $Y^{(2)}$. For the special case of B-splines, the squared $L_2$ distance reduces to $\delta' J \delta$. The general idea of the weighting process is to use instead

$$\tilde{D}_w(f, \tilde{f}) = \sqrt{\int_0^1 \big( w(t; \omega) \cdot (f(t) - \tilde{f}(t)) \big)^2 \, dt}, \quad (11)$$

with $\int_0^1 w(t; \omega) \, dt = 1$. One of the advantages of using B-splines is that the integration is easily done numerically. As this advantage should not be lost, the weighting process has to be chosen carefully. Writing equation (11) with $f$ and $\tilde{f}$ being B-splines gives

$$\tilde{D}_w(f, \tilde{f}) = \sqrt{\int_0^1 \Big( \sum_{i=1}^{K} \delta_i B_{i,m}(t)\, w(t; \omega) \Big)^2 \, dt}, \quad (12)$$

with $\delta_i = \beta_i - \tilde{\beta}_i$. Although this would in general be a possible way of weighting, the numerical computation would be much more complex than before, as the matrix $J$ would now also depend on (weighting) parameters to be estimated. In order to avoid this drawback, the weighting process is discretized such that each basis function is weighted separately:

$$D_w(f, \tilde{f}) = \sqrt{\int_0^1 \Big( \sum_{i=1}^{K} \delta_i B_{i,m}(t)\, w_i(\omega) \Big)^2 \, dt}, \quad (13)$$

with weighting coefficients $w_1(\omega), \ldots, w_K(\omega) \ge 0$, $\sum_{k=1}^{K} w_k = 1$. As the weights can be taken out of the integration, the integral can again be calculated efficiently using

$$D_w^2(f, \tilde{f}) = \delta' W(\omega)\, J\, W(\omega)\, \delta. \quad (14)$$

Before, the same formula was derived without the $W(\omega)$ matrix in equations (3)-(6). $W(\omega)$ is a diagonal matrix of size $K$.
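A small numerical illustration of (13) and (14) follows; it is my sketch with deliberately simple assumptions: order-1 B-splines, so that $J = \frac{1}{K} I$ and each basis function attains its maximum on its own interval (the interval midpoint is taken as $t_i^{\max}$), and the weights are parameterized by a beta density as in equation (15) below.

```python
import math

K = 5
delta = [0.4, -0.2, 0.1, 0.5, -0.3]        # coefficient differences beta - beta~
t_max = [(i + 0.5) / K for i in range(K)]  # argmax of each order-1 basis function

def dbeta(t, a, b):
    """Density of the beta distribution with parameters a, b > 0."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * t ** (a - 1) * (1.0 - t) ** (b - 1)

# Diagonal of W(omega): beta density at each basis maximum, normalized to trace 1.
omega = (2.0, 5.0)                         # example weighting parameters
w_raw = [dbeta(t, *omega) for t in t_max]
w = [v / sum(w_raw) for v in w_raw]
assert abs(sum(w) - 1.0) < 1e-12           # tr(W) = 1

# Weighted squared distance (14): delta' W J W delta, with J = I/K for order 1.
d2_form = sum((delta[i] * w[i]) ** 2 for i in range(K)) / K

# Direct quadrature of (13): the weighted difference is piecewise constant here.
N = 1000
d2_quad = sum((delta[int(t * K)] * w[int(t * K)]) ** 2
              for t in [(k + 0.5) / N for k in range(N)]) / N
assert abs(d2_form - d2_quad) < 1e-12
```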
The parameter $\omega$ is a (potentially multidimensional) parameter describing the weighting, and it is estimated during the maximum likelihood optimization. One possibility would be to include each diagonal entry of $W$ in $\omega$ and just restrict it to be positive. This is unfortunate for two reasons. First, it potentially increases the number of parameters to be estimated by ML dramatically. Secondly, the GP model would no longer be uniquely identifiable: the covariance parameter $\theta$ and the weighting parameter $\omega$ could be exchanged without changing the model. To overcome the identification problem, the entries of $W$ are restricted to be nonnegative and to sum up to 1: $W_{ii} \ge 0$, $\mathrm{tr}(W) = 1$. In order to reduce the number of parameters, a parametric description of the weighting by a beta distribution is used here. The beta distribution is a very flexible distribution with support $[0,1]$, described by two parameters, both of which need to be greater than 0. Let $\mathrm{dbeta}(t, \omega)$ be the density of a beta distribution with parameters $\omega$. Then the weighting matrix is defined as

$$\tilde{W}_{ii} = \mathrm{dbeta}(t_i^{\max}, \omega), \qquad W(\omega) := \tilde{W}(\omega) / \mathrm{tr}\big(\tilde{W}(\omega)\big). \quad (15)$$

The value $t_i^{\max}$ is the argument value where the $i$th basis spline has its maximum, i.e. the place where the $i$th basis function has the highest influence. As the two parameters in $\omega$ are just restricted to be $\ge 0$, the numerical optimization of these two parameters can easily be incorporated into the ML estimation procedure for the covariance parameters.

5 Application

5.0.1 Theoretical example 1

In order to check whether the estimation of the covariance parameters works comparably for real-number inputs and functional inputs, the following example is used:

$$g_1(x, f) = x_1 + 2x_2 + 4\int_0^1 t f_1(t)\, dt + \int_0^1 f_2(t)\, dt$$

The first real-number input and the second functional input are of the same importance, i.e. the terms $x_1$ and $\int_0^1 f_2(t)\, dt$ have the same range.
At the same time, the second real-number input and the first functional input are comparably influential, and both are more important than the first real-number and functional inputs. A DoE with 20 runs and 3 real-number and 3 functional inputs is set up (hence there are inactive input parameters). The corresponding B-splines have 7 basis functions and are of order 4. Afterwards a GP model with the Matern 5/2 covariance kernel including weighting is fitted to the data. The resulting covariance plot is shown in Figure 3 and the weighting plots are shown in Figure 4.

Figure 3: Sensitivity plot for the first theoretical example using a GP model including weighting.

5.0.2 Theoretical example 2

The second example again has 3 scalar inputs and 3 functional inputs $f_k(t) \in C^0([0,1])$, $k = 1, 2, 3$, satisfying the boundary constraints $0 \le f_k \le 1$, but this time the function is chosen to be more complex. The real-number part is the well-known Branin function (Dixon and Szego (1978)) plus one inactive input, and the functional part of the example has 2 active inputs and one inactive one, including interactions between the real-number and functional inputs:

$$g_2(x, f) = \Big(x_2 - \frac{5.1}{4\pi^2} x_1^2 + \frac{5}{\pi} x_1 - 6\Big)^2 + 10\Big(1 - \frac{1}{8\pi}\Big)\cos(x_1) + 10 + \frac{4}{3\pi}\left( 42 \int_0^1 f_1(t)(1 - t)\, dt + \pi\big((x_1 + 5)/5 + 15\big) \int_0^1 t f_2(t)\, dt \right).$$

In order to construct a design, the strategy described above is applied with $n = 40$, $K = 7$, $m = 4$. The candidate set is shown in Figure 5. A generalized LHD according to the methodology described above has been constructed, and two GP models have been estimated: one without any weighting for the functional inputs, and a second one including weighting for the functional inputs as described in the last section. As covariance kernel, the Matern 5/2 kernel has been used.
Both GP models have been used to make predictions for 300 randomly selected sets of input points and input functions, in order to validate the prediction quality of the two models.

Figure 4: Weighting plot for the first theoretical example. As the third functional input is (correctly) rated as unimportant, only the first two weighting plots are shown.

Figure 5: Candidate set with n = 40, K = 5, m = 4.

For both models, the weighted and the unweighted one, the covariance parameters are summarized in a bar plot in order to illustrate which inputs are important. In this bar plot, $1 - g_k(1; \theta_k)$ is plotted, where $g_k$ is the kernel of the chosen covariance function (see Figure 7). For the weighted model, the result of the weighting procedure is plotted in Figure 8. Here again $1 - g_k(1; \theta_k)$ is plotted in a bar plot. Furthermore, a B-spline is plotted, where the weights obtained from the maximum likelihood procedure are used as $\beta$-coefficients. This plot indicates which parts of the support have high importance and which parts have low importance. Both models deliver good predictions based on prediction plots (see Figure 6), where the weighted version has a slightly better RMSE (0.055) than the non-weighted version (0.081), based on 300 independent observations of the theoretical example.

5.1 Springback analysis

In deep drawing sheet metal forming, the final shape of a part depends on the elastic energy stored during the forming process. The energy is influenced by a number of process parameters like blankholder force and friction. Springback, one of the main sources of geometrical inaccuracy, can be predicted from these parameters in simulation models. Usually, the analysis is limited to constant input parameters.
The goal here is to achieve better predictions and deeper insight into the springback development by varying the process parameters blankholder force and friction over time, using the norm-based functional analysis approach. An explicit Finite Element Method (FEM) model via LS-DYNA is used, which takes two parameters as input, the friction coefficient ($f_F$) and the blankholder force ($f_B$), both of which can be varied externally during the punch travel. A generalized LHD with 40 runs, using 6 B-spline basis functions of order 4, is constructed and run through the FEM model.

Figure 6: Prediction plots for the two models including weighting (left hand side) and without weighting (right hand side).

On these data, a functional kriging model including weighting with a Gaussian covariance kernel is fitted. The value $1 - g_k(1; \theta_k)$ is around 0.78 for $f_F$ and 0.28 for $f_B$, indicating a much larger influence of friction than of blankholder force on the springback. A weighting plot as in Figure 8 can be seen in Figure 9. It can be seen that in the FEM model the effect of the inputs on the springback reduction increases towards the end of the punch travel. This shows the importance of careful settings at the end of the process, where the flange of the part is formed.

5.2 Exploration of B-splines

We revisit theoretical example 2 of Section 5.0.2 for a small study in which we examine the effect of the B-spline order $m$ on the presented functional design and modelling approach. We compare five different orders: 1, 2, 3, 4 and 5. For each order we set up a design of size $n = 20$ with $K = 7$ basis functions and construct a surrogate model. The constant number $K$ ensures a comparable number of model parameters between the different orders.

Figure 7: Sensitivity plots based on the covariance parameters for the two models including weighting (left hand side) and without weighting (right hand side).

The procedure
is repeated 100 times for each order, and 5 test data sets of size 600, one for each order, are set up for comparison. Table 1 shows the resulting root mean square errors (RMSE), averaged over the 100 models. The spline order $m = 4$ performs best here. We conclude that, at least in our experience, the B-spline order 4 can be recommended. Figure 10 shows boxplots of the values $1 - g_k(1; \theta_k)$, comparable to Figure 7. The box sizes give an impression of the accuracy of the covariance parameter estimates.

m            | 1     | 2     | 3     | 4     | 5
Average RMSE | 56.29 | 53.10 | 44.81 | 37.07 | 40.065

Table 1: B-spline order comparison: average RMSE values between the predicted and true values of the 5 test data sets.

Figure 8: Weighting plot for the GP model including weighting. Areas on the x-scale with a high value have a higher importance on the output than areas with smaller values.

6 Conclusion

In this article a methodology for incorporating functional and scalar inputs into simulation experiments via the use of B-splines is presented; for this purpose, designs and metamodels are developed. For constructing space-filling designs, a distance-based approach is presented, which works in two steps: in a first step a candidate set for the functional input parameters is constructed, and in the second step a design for the functional as well as the scalar inputs is constructed in a Latin hypercube manner. Given scalar outputs from a simulation, the data can be modelled by a GP model, and the covariance parameters are used to rank the inputs by importance. In order to learn more about the behaviour of the functional inputs, a weighting process can be introduced, which can analyze where a functional input is of high importance. This gives an attractive possibility to learn more about the behaviour of the functional inputs, but this benefit comes at the cost of introducing additional parameters and therefore a more demanding optimization process.
The weighting process is not so beneficial for improving prediction; rather, it aims at learning more about the functional inputs. Although, when the data set is large enough, the prediction of the weighted GP model has often been slightly better than that of the unweighted version, the estimation of its parameters does not work as reliably as for the unweighted GP model, especially for small sample sizes. All in all, the methodology developed incorporates functional inputs in a way that preserves their functional character while keeping computations feasible. Fundamental to this has been the use of functional norms to incorporate functional inputs and the use of B-splines as a representation of functional inputs.

Figure 9: Weighting plots for the springback FEM model (weights for f_F and f_B over the punch travel).

Figure 10: B-spline order comparison: Boxplots of the values 1 − g_k(1; θ_k) for orders m = 1 to m = 5.

Acknowledgements: Andon Iyassu, for his help on editing.

References

Bayarri, M. J., Berger, J. O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R. J., Paulo, R., Sacks, J., and Walsh, D. (2007). Computer Model Validation with Functional Output.
Annals of Statistics, 35:1874–1906.

Currin, C., Mitchell, T., Morris, M., and Ylvisaker, D. (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86(416):953–963.

de Boor, C. (2001). A Practical Guide to Splines. Springer, New York.

Dixon, L. C. W. and Szego, G. P. (1978). The global optimization problem: An introduction. Towards Global Optimization, 2:1–15.

Fang, K.-T., Li, R., and Sudjianto, A. (2006). Design and Modeling for Computer Experiments. Computer Science and Data Analysis Series. Chapman & Hall/CRC, New York.

Jones, D., Schonlau, M., and Welch, W. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492.

McKay, M., Beckman, R., and Conover, W. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245.

Morris, M. (2012). Gaussian surrogates for computer models with time-varying inputs and outputs. Technometrics, 54:42–50.

Morris, M. (2014). Maximin distance optimal designs for computer experiments with time-varying inputs and outputs. Journal of Statistical Planning and Inference, 144:63–68.

Morris, M. and Mitchell, T. (1995). Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43:381–402.

R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Ramsay, J. and Silverman, B. (1997). Functional Data Analysis. Springer, New York.

Rasmussen, C. and Williams, C. (2006). Gaussian Processes for Machine Learning. The MIT Press.

Roustant, O., Ginsbourger, D., and Deville, Y. (2012).
DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. Journal of Statistical Software, 51(1):1–55.

Sacks, J., Schiller, S., and Welch, W. (1989a). Design for computer experiments. Technometrics, 31:41–47.

Sacks, J., Welch, W., Mitchell, T., and Wynn, H. (1989b). Design and analysis of computer experiments. Statistical Science, 4:409–435.

Santner, T., Williams, B., and Notz, W. (2003). The Design and Analysis of Computer Experiments. Springer Series in Statistics. Springer Verlag, New York.

Shi, J. Q. and Choi, T. (2011). Gaussian Process Regression Analysis for Functional Data. Chapman & Hall.

Shi, J. Q., Wang, B., Murray-Smith, R., and Titterington, D. M. (2007). Gaussian process functional regression modeling for batch data. Biometrics, 63(3):714–723.
