Different numerical estimators for main effect global sensitivity indices

Sergei Kucherenko a,*, Shufang Song a
a Centre for Process Systems Engineering, Imperial College London, London, SW7 2AZ, UK
* e-mail address: s.kucherenko@imperial.ac.uk

Abstract. The variance-based method of global sensitivity analysis based on Sobol' sensitivity indices has become very popular among practitioners because it is easy to interpret. For complex practical problems, computation of Sobol' indices generally requires a large number of function evaluations to achieve reasonable convergence. Four different direct formulas for computing Sobol' main effect sensitivity indices are compared on a set of test problems for which there are analytical results. These formulas are based on high-dimensional integrals which are evaluated using MC and QMC techniques. Direct formulas are also compared with a different approach based on the so-called "double loop reordering" formula. It is found that the "double loop reordering" (DLR) approach shows superior performance among all methods, both for models with independent and for models with dependent variables.

Keywords: Global Sensitivity Analysis, Sobol' Sensitivity Indices, Quasi Monte Carlo, Double Loop Reordering

1 Introduction

Global sensitivity analysis (GSA) complements Uncertainty Quantification in that it offers a comprehensive approach to model analysis by quantifying how the uncertainty in model output is apportioned to the uncertainty in model inputs [1,2]. Unlike local sensitivity analysis, GSA estimates the effect of varying a given input (or set of inputs) while all other inputs are varied as well, thus providing a measure of interactions among variables. GSA is used to identify key parameters whose uncertainty most affects the output. This information can then be used to rank variables, fix unessential variables and decrease problem dimensionality.
The variance-based method of global sensitivity analysis based on Sobol' sensitivity indices has become very popular among practitioners due to its efficiency and ease of interpretation [3,4]. There are two types of Sobol' sensitivity indices: the main effect indices, which estimate the individual contribution of each input parameter or group of inputs to the output variance, and the total sensitivity indices, which measure the total contribution of a single input factor or a group of inputs, including interactions with all other inputs [5].

For models with independent variables there are efficient direct formulas which allow Sobol' indices to be computed directly from function values. These formulas are based on high-dimensional integrals which can be evaluated via MC/QMC techniques [1,4,5]. For complex practical problems, computation of Sobol' indices generally requires a large number of function evaluations to achieve reasonable convergence. More efficient direct integral formulas for the evaluation of Sobol' main effect indices were suggested by Saltelli [6]. Kucherenko et al [7,8] developed Saltelli's approach further by suggesting a new formula which significantly improves the computational accuracy of Sobol' main effect indices with small values. Sobol' and Myshetskaya [9] and Owen [10] suggested their own versions of improved direct formulas. In this work we compare the original and the existing improved direct formulas. For models with dependent inputs we consider a novel approach for the estimation of Sobol' indices developed in [11]. We also compare direct formulas, using estimators based on MC and QMC sampling, with the so-called double loop approach on a set of test problems for which there are analytical results for the values of Sobol' indices.
The double loop approach was discarded in the past as inefficient in comparison with direct formulas, but improvements to the algorithm suggested by Plischke [12] have made it an interesting alternative to the direct formulas. In what follows we call this approach "double loop reordering" (DLR). Evaluation of Sobol' main effect indices remains an active area of research: examples include the application of RBD [13], various metamodelling methods [14,15,16] and other attempts to improve direct formulas [17]. We also note that a new method for improving the efficiency of the Monte Carlo estimates of the Sobol' total sensitivity indices was developed in [18].

This paper is organized as follows. The next section introduces the ANOVA decomposition and Sobol' sensitivity indices. In Section 3 we present different estimators of the main effect sensitivity indices. The efficiency of the different estimators is compared in Section 4. Finally, conclusions are given in Section 5.

2 Sobol' sensitivity indices

Consider a square integrable function $f(x)$ defined in the unit hypercube $H^d = [0,1]^d$. The decomposition of $f(x)$

$$f(x) = f_0 + \sum_{i=1}^{d} f_i(x_i) + \sum_{i=1}^{d} \sum_{j>i}^{d} f_{ij}(x_i, x_j) + \dots + f_{12 \dots d}(x_1, \dots, x_d), \tag{1}$$

where $f_0 = \int_{H^d} f(x)\,dx$, is called ANOVA if the conditions

$$\int_0^1 f_{i_1 \dots i_s}\,dx_{i_k} = 0, \quad 1 \le k \le s,$$

are satisfied for all different groups of indices $i_1, \dots, i_s$ such that $1 \le i_1 < i_2 < \dots < i_s \le d$. These conditions guarantee that all terms in (1) are mutually orthogonal with respect to integration [4].

The variances of the terms in the ANOVA decomposition add up to the total variance:

$$D = \int_{H^d} f^2(x)\,dx - f_0^2 = \sum_{s=1}^{d} \sum_{i_1 < \dots < i_s}^{d} D_{i_1 \dots i_s},$$

where the components

$$D_{i_1 \dots i_s} = \int_{H^s} f_{i_1 \dots i_s}^2(x_{i_1}, \dots, x_{i_s})\,dx_{i_1} \dots dx_{i_s}$$

are called partial variances. Main effect global sensitivity indices are defined as the ratios $S_{i_1 \dots i_s} = D_{i_1 \dots i_s} / D$.
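As a small worked illustration of the definitions above (the test function $f(x) = x_1 x_2$ on $H^2$ is chosen here purely for illustration and is not one of the test cases of this paper), the ANOVA terms and partial variances can be written out in closed form:

```latex
\begin{align*}
f_0 &= \int_{H^2} x_1 x_2 \, dx = \tfrac{1}{4}, \\
f_1(x_1) &= \int_0^1 x_1 x_2 \, dx_2 - f_0 = \tfrac{1}{2}\left(x_1 - \tfrac{1}{2}\right), \qquad
f_2(x_2) = \tfrac{1}{2}\left(x_2 - \tfrac{1}{2}\right), \\
f_{12}(x_1, x_2) &= x_1 x_2 - f_0 - f_1(x_1) - f_2(x_2) = \left(x_1 - \tfrac{1}{2}\right)\left(x_2 - \tfrac{1}{2}\right), \\
D &= \int_{H^2} x_1^2 x_2^2 \, dx - f_0^2 = \tfrac{1}{9} - \tfrac{1}{16} = \tfrac{7}{144}, \\
D_1 &= D_2 = \tfrac{1}{4} \cdot \tfrac{1}{12} = \tfrac{3}{144}, \qquad
D_{12} = \tfrac{1}{12} \cdot \tfrac{1}{12} = \tfrac{1}{144}, \\
S_1 &= S_2 = \tfrac{3}{7}, \qquad S_{12} = \tfrac{1}{7}.
\end{align*}
```

The partial variances add up to $D$ ($3/144 + 3/144 + 1/144 = 7/144$), as required by the orthogonality of the ANOVA terms.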
In what follows we consider sensitivity indices for a single index: $S_i = D_i / D$. Total partial variances account for the total influence of the factor $x_i$:

$$D_i^{tot} = \sum_{\langle i \rangle} D_{i_1 \dots i_s},$$

where the sum $\sum_{\langle i \rangle}$ is extended over all different groups of indices $i_1, \dots, i_s$ satisfying the condition $1 \le i_1 < i_2 < \dots < i_s \le d$, $1 \le s \le d$, where one of the indices is equal to $i$ [1,4]. The corresponding total sensitivity index is defined as $S_i^{tot} = D_i^{tot} / D$.

Sobol' also introduced sensitivity indices for subsets of variables [3,4]. Consider two complementary subsets of variables $y$ and $z$: $x = (y, z)$. Let $y = (x_{i_1}, \dots, x_{i_m})$, $1 \le i_1 < \dots < i_m \le d$, $K = (i_1, \dots, i_m)$. Here $m$ is the cardinality of the subset $y$. The variance corresponding to $y$ is defined as

$$D_y = \sum_{s=1}^{m} \sum_{(i_1 < \dots < i_s) \in K} D_{i_1 \dots i_s}. \tag{2}$$

$D_y$ includes all partial variances $D_{i_1}$, $D_{i_2}$, ..., $D_{i_1 \dots i_s}$ such that their subsets of indices $(i_1, \dots, i_s) \in K$. The total variance $D_y^{tot}$ is defined as

$$D_y^{tot} = D - D_z.$$

$D_y^{tot}$ consists of all $D_{i_1 \dots i_s}$ such that at least one index $i_p \in K$, while the remaining indices can belong to the set $\bar{K}$ complementary to $K$. The corresponding Sobol' sensitivity indices are defined as

$$S_y = D_y / D, \qquad S_y^{tot} = D_y^{tot} / D.$$

Denote by $x_{\sim i} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$ the vector of all variables but $x_i$; then $x = (x_i, x_{\sim i})$ and $f(x) = f(x_i, x_{\sim i})$. The first order component in the ANOVA decomposition (1) can be found as

$$f_i(x_i) = \int_{H^{d-1}} f(x)\,dx_{\sim i} - f_0.$$

Then

$$D_i = \int_{H^1} f_i^2(x_i)\,dx_i = \int_{H^1} \left[ \int_{H^{d-1}} f(x)\,dx_{\sim i} - f_0 \right]^2 dx_i,$$

from which it follows that

$$D_i = \int_{H^1} \left[ \int_{H^{d-1}} f(x)\,dx_{\sim i} \right]^2 dx_i - f_0^2. \tag{3}$$

This formula is used to derive an MC estimator known as the brute force estimator or the double loop method.

There is another approach to deriving Sobol' sensitivity indices. If we consider $x$ as a random variable uniformly distributed in $H^d$, then $D_i$ can be expressed as [1]:

$$D_i = Var_{x_i}\left[ E_{x_{\sim i}}\left( f(x_i, x_{\sim i}) \mid x_i \right) \right]. \tag{4}$$

This representation can be used to derive an extension of Sobol' sensitivity indices for the case of models with dependent variables [11].

3 Different formulas and estimators of the main effect sensitivity indices

3.1 Original Sobol' formula

Sobol' suggested the following Monte Carlo algorithm for the estimation of $S_y = D_y / D$ [3,4]. Given two independent sample points $x = (y, z)$ and $x' = (y', z')$, $D_y$ defined in (2) is calculated using the following formula:

$$D_y = \int f(x)\, f(y, z')\,dx\,dz' - f_0^2. \tag{5}$$

In this case, the Monte Carlo estimator for (5) has the form:

$$D_y \approx \frac{1}{N} \sum_{k=1}^{N} f(y_k, z_k)\, f(y_k, z'_k) - \left( \frac{1}{N} \sum_{k=1}^{N} f(y_k, z_k) \right)^2, \tag{6}$$

where $N$ is the number of sampled points.

3.2 Improved formula of Kucherenko

Kucherenko et al [7,8] proposed a new formula for sensitivity indices which is especially efficient in the case of indices with small values. Kucherenko and, independently, Saltelli [6] noticed that $f_0^2$ in (5) can be computed as

$$f_0^2 = \int f(x)\, f(x')\,dx\,dx'. \tag{7}$$

Substituting (7) into (5) and taking out the common factor $f(x)$, one obtains a new integral representation for $D_y$:

$$D_y = \int f(x) \left[ f(y, z') - f(x') \right] dx\,dx' \tag{8}$$

and the corresponding Monte Carlo estimator:

$$D_y \approx \frac{1}{N} \sum_{k=1}^{N} f(y_k, z_k) \left[ f(y_k, z'_k) - f(y'_k, z'_k) \right]. \tag{9}$$

In what follows we refer to this formula as "S-K".

3.3 Improved formula of Owen

Owen extended the idea of Kucherenko by using three independent sample points $x = (y, z)$, $x' = (y', z')$ and $x'' = (y'', z'')$ and replacing $f(x)$ by $f(x) - f(y'', z)$ in (8) [10]. As a result, $D_y$ is calculated using the following formula:

$$D_y = \int \left[ f(x) - f(y'', z) \right] \left[ f(y, z') - f(x') \right] dx\,dx'\,dx''. \tag{10}$$

The Monte Carlo estimator for (10) has the form:

$$D_y \approx \frac{1}{N} \sum_{k=1}^{N} \left[ f(y_k, z_k) - f(y''_k, z_k) \right] \left[ f(y_k, z'_k) - f(y'_k, z'_k) \right]. \tag{11}$$

In what follows we refer to this formula as "Owen".
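The estimators (6), (9) and (11) can be sketched for a single input $y = x_i$ with plain MC sampling in the unit hypercube as follows. This is a minimal sketch under stated assumptions, not the implementation used for the numerical tests: the function name `main_effect_estimates`, the use of NumPy's pseudo-random generator and the normalisation by the sample variance of $f(x)$ are all illustrative choices.

```python
import numpy as np

def main_effect_estimates(f, d, i, n, rng):
    """Estimate S_i with the Sobol' (6), S-K (9) and Owen (11) estimators.

    f   : vectorised model, maps an (n, d) array to an (n,) array
    d   : number of inputs; i : index of the input y = x_i (0-based)
    n   : number of sampled points N; rng : numpy random Generator
    """
    x = rng.random((n, d))      # x   = (y, z)
    x1 = rng.random((n, d))     # x'  = (y', z')
    x2 = rng.random((n, d))     # x'' = (y'', z''); only y'' is used

    yz1 = x1.copy()
    yz1[:, i] = x[:, i]         # the point (y, z')
    y2z = x.copy()
    y2z[:, i] = x2[:, i]        # the point (y'', z)

    fx, fx1, fyz1, fy2z = f(x), f(x1), f(yz1), f(y2z)
    total_var = fx.var()        # sample estimate of D (assumed normalisation)

    d_sobol = np.mean(fx * fyz1) - np.mean(fx) ** 2      # estimator (6)
    d_sk = np.mean(fx * (fyz1 - fx1))                    # estimator (9)
    d_owen = np.mean((fx - fy2z) * (fyz1 - fx1))         # estimator (11)
    return {"Sobol": d_sobol / total_var,
            "S-K": d_sk / total_var,
            "Owen": d_owen / total_var}
```

For the illustrative linear model $f(x) = x_1 + 2x_2$ with uniform inputs, for which $S_1 = 0.2$ analytically, all three estimates should approach 0.2 as $N$ grows; a call such as `main_effect_estimates(lambda x: x[:, 0] + 2.0 * x[:, 1], d=2, i=0, n=2**17, rng=np.random.default_rng(0))` can be used to check this.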
3.4 Improved formula of Sobol' and Myshetskaya

Sobol' and Myshetskaya [9] argued that formula (8) can be further improved by replacing $f(x)$ in the formula by $f(x) - f_0$. Thus, $D_y$ is calculated using the following formula:

$$D_y = \int \left[ f(x) - f_0 \right] \left[ f(y, z') - f(x') \right] dx\,dx'. \tag{12}$$

The corresponding Monte Carlo estimator has the form:

$$D_y \approx \frac{1}{N} \sum_{k=1}^{N} \left[ f(y_k, z_k) - f_0 \right] \left[ f(y_k, z'_k) - f(y'_k, z'_k) \right]. \tag{13}$$

Following Owen's classification, we call this formula "Oracle".

3.5 Double loop reordering approach

Formula (3) can be used to derive the double loop (brute force) MC estimator. In this case $N$ points $x^{(j)}$, $j = 1, 2, \dots, N$ are generated from the joint probability distribution function (PDF). We consider the cases of models with independent and dependent inputs. For each random variable $y = x_i$, the sample set $x^{(j)}$, $j = 1, 2, \dots, N$ is sorted and subdivided into $M$ equally populated partitions (bins), each containing $N_m = N / M$ points ($M < N$). Within each bin we calculate the local mean value

$$E_z\left( f(y, z) \mid y \right) \approx \frac{1}{N_m} \sum_{k=1}^{N_m} f(y_k, z_k).$$

Finally, the variance of all conditional averages is computed as

$$D_y \approx \frac{1}{M} \sum_{j=1}^{M} \left[ \frac{1}{N_m} \sum_{k=1}^{N_m} f\left( y_k^{(j)}, z_k^{(j)} \right) \right]^2 - f_0^2. \tag{14}$$

The subdivision into bins is done in the same way for all inputs using the same set of sampled points. We call this approach double loop reordering (DLR).

A critical issue is the link between $N$ and $M$. It was suggested in [12] to use $M \approx \sqrt{N}$ as a "rule of thumb". In this work we used Sobol' sequences for QMC sampling [19,20]. To preserve their uniformity properties, $N$ should always be equal to $N = 2^p$, where $p$ is an integer. This makes observing the "rule of thumb" more challenging. We used the dependence of $M$ and $N_m$ on $N$ shown in Fig. 1.

Fig. 1.
Dependence of the number of partitions (bins) $M$ and of the number of sampled points in each partition $N_m$ on $N$.

We note that although it is possible to extend the application of DLR from a single index ($m = 1$) to the case of two indices ($m = 2$), its extension to $m$ higher than 2 is not practical. Another limitation of DLR is that there is no similar "brute force" formula which allows the total Sobol' sensitivity indices to be computed.

3.6 Number of function evaluations

The five considered MC estimators converge to the same values of the main effect sensitivity indices, but the number of function evaluations per $i$-th input differs between methods. Table 1 shows the number of function evaluations $N_{CPU}$ required to compute the whole set of sensitivity indices $\{S_i, S_i^{tot}\}$ for a $d$-dimensional function $f(x)$ with independent inputs. Here $N$ is the number of sampled points. We also include $N_{CPU}$ for metamodel based computation of sensitivity indices [15,16]. Here HDMR stands for high dimensional model representation [16].

Table 1: Number of required function evaluations $N_{CPU}$

| Method | Sobol' | S-K | Owen | Oracle | DLR | HDMR |
|---|---|---|---|---|---|---|
| Number of function evaluations $N_{CPU}$ | $N(2d+1)$ | $N(d+2)$ | $N(2d+2)$ | $N(d+2)$ | $N$ | $N$ |

For models with dependent inputs the number of function evaluations required to compute the whole set of sensitivity indices $\{S_i, S_i^{tot}\}$ is $N_{CPU} = N(2d+2)$ [11].

4 Numerical tests

The objective of this section is to compare the performance of MC and QMC estimators of the considered formulas for the main effect Sobol' sensitivity indices $S_i$ for models with independent inputs, i.e. the direct Sobol' formula, the Sobol'-Kucherenko (S-K) formula, Owen's formula, the Oracle formula and DLR, on a set of test cases for which analytical values of Sobol' sensitivity indices are known. For models with dependent inputs a formula from [11] was also compared with DLR.
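The DLR estimator (14) of Section 3.5 can be sketched in a few lines. This is an illustrative sketch only: the function name `dlr_main_effect` is hypothetical, plain pseudo-random sampling is used instead of Sobol' sequences, and the result is normalised by the sample variance of the outputs, which is an assumption rather than a detail taken from the text.

```python
import numpy as np

def dlr_main_effect(xi, fx, m):
    """DLR estimate of S_i from a single sample of size N.

    xi : (N,) sampled values of the input x_i
    fx : (N,) model outputs at the same sample points
    m  : number of equally populated bins M (rule of thumb: about sqrt(N))
    """
    n = len(fx)
    nm = n // m                               # points per bin, N_m = N / M
    order = np.argsort(xi)                    # reorder outputs by x_i
    binned = fx[order][: nm * m].reshape(m, nm)
    bin_means = binned.mean(axis=1)           # local conditional means
    f0 = fx.mean()
    d_y = np.mean(bin_means ** 2) - f0 ** 2   # estimator (14)
    return d_y / fx.var()

# Usage on an illustrative linear model f(x) = x_1 + 2 x_2 with uniform
# inputs (analytically S_1 = 0.2, S_2 = 0.8). The same N model evaluations
# serve every input, which is why N_CPU = N for DLR in Table 1.
rng = np.random.default_rng(1)
x = rng.random((2**14, 2))
fx = x[:, 0] + 2.0 * x[:, 1]
s1 = dlr_main_effect(x[:, 0], fx, m=2**7)
s2 = dlr_main_effect(x[:, 1], fx, m=2**7)
```

Because each bin mean is computed from a finite number of points $N_m$, the estimate carries a small positive bias that shrinks as $N_m$ grows, which is one reason the choice of $M$ relative to $N$ matters.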
To study the accuracy, the root mean square error (RMSE) is determined using $K$ independent runs:

$$\varepsilon_i(N) = \left( \frac{1}{K} \sum_{k=1}^{K} \left( S_i^{(n),k} - S_i^{(a)} \right)^2 \right)^{1/2}, \tag{15}$$

where $S_i^{(n)}$ and $S_i^{(a)}$ are the numerical and analytical values of $S_i$. Numerical values $S_i^{(n)}$ are computed at $N$ sampled points, which is reflected in the dependence $\varepsilon_i(N)$. For the MC method all runs are statistically independent. For QMC integration a different part of the Sobol' sequence was used for each run. For all tests we took $K = 10$.

The QMC convergence rate is known to be $\varepsilon_{QMC} = O\left( (\ln N)^d / N \right)$ [20]. In practice, the rate of convergence for QMC methods appears to be approximately equal to $O(N^{-\alpha})$, with $\alpha$ between 0.5 and 1. For the MC method $\alpha = 0.5$. The QMC method in most cases outperforms MC in terms of convergence [8]. In practical tests the RMSE $\varepsilon_i(N)$ is approximated by the formula $cN^{-\alpha}$, with $\alpha$ between 0 and 1, and the convergence rate $\alpha$ is extracted from fitted trend lines. We consider convergence rates for various estimators versus $N$ and $N_{CPU}$.

4.1 Models with independent inputs

Test 1: Consider a model

$$f(x) = a_1 x_1 + a_2 x_2 + \dots + a_d x_d,$$

where $x_1, x_2, \dots, x_d$ are independent normal variables, $x_i \sim N(\mu_i, \sigma_i^2)$, and $a_1, a_2, \dots, a_d$ are constant coefficients. The output $Y$ is normally distributed, i.e.

$$Y \sim N\left( \sum_{i=1}^{d} a_i \mu_i, \; \sum_{i=1}^{d} a_i^2 \sigma_i^2 \right),$$

while the distribution of the conditional output is

$$Y \mid X_i \sim N\left( a_i x_i + \sum_{j=1, j \ne i}^{d} a_j \mu_j, \; \sum_{j=1, j \ne i}^{d} a_j^2 \sigma_j^2 \right).$$

It is easy to see that the analytical values of Sobol' indices are

$$S_i = S_i^{tot} = \frac{a_i^2 \sigma_i^2}{\sum_{j=1}^{d} a_j^2 \sigma_j^2}.$$

We consider the case $d = 4$ with mean values $\mu = (1, 3, 5, 7)$ and standard deviations $\sigma = (1, 1.5, 2, 2.5)$, with all coefficients $a_i = 1$ ($i = 1, 2, 3, 4$). The analytical values of $S_i$ are $S_i = \{0.0741, 0.167, 0.296, 0.463\}$. It is clear from the convergence and RMSE plots presented in Figs.
2-4 that

1) For the MC method for input $i = 1$, which has a small value $S_1$, all three improved formulas have a much higher convergence rate than the original Sobol' formula, with Owen's formula outperforming all other methods (Fig. 3). DLR performs similarly to the Oracle formula. The situation is different for input $i = 4$, which has a rather high value $S_4$: although all three improved formulas have a higher convergence rate than the original Sobol' formula, the difference between the original Sobol' and S-K formulas is smaller. The Owen and Oracle formulas are the most efficient among all direct formulas, but the clear winner is DLR. We also note that the extracted convergence rate is close to 0.5, as expected for the MC method.

2) For the QMC method for input $i = 1$ (a small value $S_1$), all three improved formulas have a much higher convergence rate than the original Sobol' formula, with the Owen and Oracle formulas outperforming the other direct formulas (Figs. 2, 4). DLR shows the highest performance, superior to the direct formulas. For input $i = 4$ the three improved formulas have a higher convergence rate than the original Sobol' formula, but the differences between the original Sobol' formula and the S-K and Owen formulas are smaller, similarly to the previous case with MC sampling. The Oracle formula is the most efficient among all direct formulas, but DLR exhibits even faster convergence. The extracted convergence rate is close to 1.0, as expected for the QMC method in the case of low effective dimension [8].

Fig. 2. Test case 1: Convergence plots of $S_i$, $i = 1$ (a), (b), $i = 4$ (c), (d). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR. On the left: (a), (c) the values of $S_i$ obtained at the same number of $N$.
On the right: (b), (d) the values of $S_i$ obtained at the same number of $N_{CPU}$.

Fig. 3. Test case 1: The RMSE $\varepsilon_i$, $i = 1$ (a), (b), $i = 4$ (c), (d) versus the number of $N$ (on the left: (a), (c)) and the number of $N_{CPU}$ (on the right: (b), (d)). MC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Fig. 4. Test case 1: The RMSE $\varepsilon_i$, $i = 1$ (a), (b), $i = 4$ (c), (d) versus the number of $N$ (on the left: (a), (c)) and the number of $N_{CPU}$ (on the right: (b), (d)). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Test 2: Consider a model

$$f(x) = x_1 x_3 x_5 + x_1 x_3 x_6 + x_1 x_4 x_5 + x_1 x_4 x_6 + x_2 x_3 x_4 + x_2 x_3 x_5 + x_2 x_4 x_5 + x_2 x_5 x_6 + x_2 x_4 x_7 + x_2 x_6 x_7,$$

in which all seven variables are independent lognormal with mean values 2, 3, 0.001, 0.002, 0.004, 0.005 and 0.003, respectively. All the standard deviations are equal to 0.4214. This model was taken from [21]. The values of $S_i$ are $S_i = \{0.0350, 0.330, 0.0157, 0.0857, 0.174, 0.221, 0.0477\}$. From the convergence and RMSE plots presented in Figs. 5-7 we can conclude that

1) For the MC method for input $i = 2$, which has a large value $S_2$, all three improved formulas have a higher convergence rate than the original Sobol' formula, with the Owen and Oracle formulas outperforming the other direct formulas (Fig. 6). DLR shows superior performance over the other methods. The situation is different for input $i = 4$, which has a small $S_4$: all three improved formulas have a much higher convergence rate than the original Sobol' formula. The Owen and Oracle formulas are the most efficient among all direct formulas and they show a performance similar to DLR.
2) For the QMC method for inputs $i = 2, 6$ (large values $S_i$), all three improved formulas show a slightly higher convergence rate than the original Sobol' formula (Figs. 5, 7). For input $i = 4$ (small $S_4$) the three improved formulas have a much higher convergence rate than the original Sobol' formula. DLR shows superior performance over the other methods for all inputs.

Fig. 5. Test case 2: Convergence plots of $S_i$, $i = 2$ (a), (b), $i = 4$ (c), (d), $i = 6$ (e), (f). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR. On the left: (a), (c), (e) the values of $S_i$ obtained at the same number of $N$. On the right: (b), (d), (f) the values of $S_i$ obtained at the same number of $N_{CPU}$.

Fig. 6. Test case 2: The RMSE $\varepsilon_i$, $i = 2$ (a), (b), $i = 4$ (c), (d) versus the number of $N$ (on the left: (a), (c)) and the number of $N_{CPU}$ (on the right: (b), (d)). MC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Fig. 7. Test case 2: The RMSE $\varepsilon_i$, $i = 2$ (a), (b), $i = 4$ (c), (d), $i = 6$ (e), (f) versus the number of $N$ (on the left: (a), (c), (e)) and the number of $N_{CPU}$ (on the right: (b), (d), (f)). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Test 3: The Ishigami function

$$f(x) = \sin(x_1) + 7 \sin^2(x_2) + 0.1\, x_3^4 \sin(x_1)$$

is often used as a benchmark for sensitivity analysis studies [1]. Here $x_i$, $i = 1, 2, 3$ are uniformly distributed on the interval $[-\pi, \pi]$.
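For reference, the main effect indices of the Ishigami function are available in closed form, and the short sketch below evaluates them. The variance expressions used are the standard ones for this benchmark with general coefficients $a$ and $b$ (here $a = 7$, $b = 0.1$); they are quoted from common knowledge of the benchmark rather than derived in this paper, and the function name `ishigami_main_effects` is illustrative.

```python
import math

def ishigami_main_effects(a=7.0, b=0.1):
    """Closed-form main effect indices S_1, S_2, S_3 of
    f(x) = sin(x1) + a sin^2(x2) + b x3^4 sin(x1), x_i ~ U[-pi, pi]."""
    pi4 = math.pi ** 4
    pi8 = math.pi ** 8
    d1 = 0.5 * (1.0 + b * pi4 / 5.0) ** 2   # partial variance D_1
    d2 = a ** 2 / 8.0                       # partial variance D_2
    d3 = 0.0                                # x3 has no main effect
    d = 0.5 + a ** 2 / 8.0 + b * pi4 / 5.0 + b ** 2 * pi8 / 18.0  # total D
    return [d1 / d, d2 / d, d3 / d]

s1, s2, s3 = ishigami_main_effects()
```

Although $S_3 = 0$, the input $x_3$ still influences the output through its interaction with $x_1$, which makes this function a useful test of estimators for indices with very small values.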
The Sobol' sensitivity indices have the following values: $S_i = \{0.314, 0.442, 0.00\}$. From the convergence and RMSE plots presented in Figs. 8-10 we can conclude that

1) For the MC method for inputs $i = 1, 2$, which have large values of $S_i$, the Oracle formula slightly outperforms the other direct formulas (Fig. 9). For input $i = 3$, for which $S_3 = 0.0$, all three improved formulas have a much higher convergence rate than the original Sobol' formula. DLR shows superior performance over the other methods for all three inputs.

2) For the QMC method for inputs $i = 1, 2$ the direct formulas show a similar convergence rate (Figs. 8, 10). DLR shows higher performance only for input $i = 1$. Fig. 10(d) also presents the results for $S_2$ obtained using the QMC-HDMR method from [16]. Clearly, the QMC-HDMR method shows somewhat better convergence than the other methods.

Fig. 8. Test case 3: Convergence plots of $S_i$, $i = 1$ (a), (b), $i = 2$ (c), (d), $i = 3$ (e), (f). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR. On the left: (a), (c), (e) the values of $S_i$ obtained at the same number of $N$. On the right: (b), (d), (f) the values of $S_i$ obtained at the same number of $N_{CPU}$.

Fig. 9. Test case 3: The RMSE $\varepsilon_i$, $i = 1$ (a), (b), $i = 2$ (c), (d), $i = 3$ (e), (f) versus the number of $N$ (on the left: (a), (c), (e)) and the number of $N_{CPU}$ (on the right: (b), (d), (f)). MC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Fig. 10.
Test case 3: The RMSE $\varepsilon_i$, $i = 1$ (a), (b), $i = 2$ (c), (d), $i = 3$ (e), (f) versus the number of $N$ (on the left: (a), (c), (e)) and the number of $N_{CPU}$ (on the right: (b), (d), (f)). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR, the magenta line refers to metamodel based computation of $S_i$.

Test 4: The g-function

$$f(x) = \prod_{i=1}^{d} g_i, \qquad g_i = \frac{|4x_i - 2| + a_i}{1 + a_i},$$

is also often used as a benchmark [4]. Here $d$ is the number of independent input factors ($0 \le x_i \le 1$). The parameter $a_i$ determines the importance of the input factor $x_i$.

Test 4.1: We consider the 10-dimensional g-function with parameters $a_1 = a_2 = 0$, $a_3 = \dots = a_{10} = 3$. The analytical values of $S_i$ are $S_1 = S_2 = 0.304$, $S_3 = \dots = S_{10} = 0.019$. For $a_i = 0$ the variable $x_i$ is important and for $a_i = 3$ the variable $x_i$ is unimportant, hence only the first two variables are important. From the convergence and RMSE plots presented in Figs. 11-13 we can conclude that

1) For the MC method for input $i = 1$, which has a large value of $S_i$, the Oracle formula slightly outperforms the other direct formulas (Fig. 12). DLR shows superior performance over the other methods. For input $i = 3$, which has a small value of $S_i$, all three improved formulas have a much higher convergence rate than the original Sobol' formula, with Owen's formula outperforming all other methods. DLR shows performance similar to Owen's formula when $N_{CPU} > 2^{11}$.

2) The results for the QMC method are qualitatively similar to those of the MC method (Figs. 11, 13), with the only difference that DLR shows superior performance among all methods for input $i = 3$. Quantitatively, however, the rate of convergence for the QMC method is much higher than that for the MC method.

Fig. 11. Test case 4.1: Convergence plots of $S_i$, $i = 1$ (a), (b), $i = 3$ (c), (d). QMC sampling.
The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR. On the left: (a), (c) the values of $S_i$ obtained at the same number of $N$. On the right: (b), (d) the values of $S_i$ obtained at the same number of $N_{CPU}$.

Fig. 12. Test case 4.1: The RMSE $\varepsilon_i$, $i = 1$ (a), (b), $i = 3$ (c), (d) versus the number of $N$ (on the left: (a), (c)) and the number of $N_{CPU}$ (on the right: (b), (d)). MC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Fig. 13. Test case 4.1: The RMSE $\varepsilon_i$, $i = 1$ (a), (b), $i = 3$ (c), (d) versus the number of $N$ (on the left: (a), (c)) and the number of $N_{CPU}$ (on the right: (b), (d)). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Test 4.2: All parameters $a_i = 0$, $i = 1, 2, \dots, 10$; the analytical value of $S_i$ is 0.0199. All inputs are equally important and there are strong interactions between inputs. This is a type C function [8], for which the QMC method is not more efficient than MC. From the convergence and RMSE plots presented in Figs. 14-16 we can conclude that

1) For the MC method, all three improved formulas show a slightly higher convergence rate than the original Sobol' formula (Fig. 15). DLR outperforms the other methods.

2) The results for the QMC method are similar to those of the MC method (Figs. 14, 16), with the only difference that DLR has a higher convergence rate than with the MC method.

Fig. 14. Test case 4.2: Convergence plots of $S_i$, $i = 1$. QMC sampling.
The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR. On the left: (a) the values of $S_i$ obtained at the same number of $N$. On the right: (b) the values of $S_i$ obtained at the same number of $N_{CPU}$.

Fig. 15. Test case 4.2: The RMSE $\varepsilon_i$ versus the number of $N$ (on the left: (a)) and the number of $N_{CPU}$ (on the right: (b)). MC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

Fig. 16. Test case 4.2: The RMSE $\varepsilon_i$ versus the number of $N$ (on the left: (a)) and the number of $N_{CPU}$ (on the right: (b)). QMC sampling. The red line refers to the S-K formula; the blue line refers to the Sobol' formula, the green line refers to Owen's formula, the cyan line refers to the Oracle formula, the black line refers to DLR.

4.2 Models with dependent inputs

Test case 5: Consider a model

$$f(x) = x_1 x_3 + x_2 x_4,$$

where $(x_1, x_2, x_3, x_4) \sim N(\mu, C_x)$ with $\mu = (0, 0, \mu_3, \mu_4)$ and the covariance matrix

$$C_x = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & 0 & 0 \\ \sigma_{12} & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & \sigma_{34} \\ 0 & 0 & \sigma_{34} & \sigma_4^2 \end{pmatrix}.$$

This test case was considered in [11], where the analytical values of the main effect and total indices were presented (Table 2).

Table 2: Test case 5. Analytical values of the main effect and total indices.

| | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|---|---|---|---|---|
| $S_i$ | $\sigma_1^2 \left( \mu_3 + \mu_4 \rho_{12} \frac{\sigma_2}{\sigma_1} \right)^2 / D$ | $\sigma_2^2 \left( \mu_4 + \mu_3 \rho_{12} \frac{\sigma_1}{\sigma_2} \right)^2 / D$ | 0 | 0 |
| $S_i^{tot}$ | $\sigma_1^2 (1 - \rho_{12}^2)(\sigma_3^2 + \mu_3^2) / D$ | $\sigma_2^2 (1 - \rho_{12}^2)(\sigma_4^2 + \mu_4^2) / D$ | $\sigma_1^2 \sigma_3^2 (1 - \rho_{34}^2) / D$ | $\sigma_2^2 \sigma_4^2 (1 - \rho_{34}^2) / D$ |

Here $\sigma_{ij} = \rho_{ij} \sigma_i \sigma_j$ and

$$D = \sigma_1^2 (\sigma_3^2 + \mu_3^2) + \sigma_2^2 (\sigma_4^2 + \mu_4^2) + 2 \sigma_{12} (\sigma_{34} + \mu_3 \mu_4).$$

For the numerical test we used the following parameters: $\mu = (0, 0, 250, 400)$,

$$C_x = \begin{pmatrix} 16 & 2.4 & 0 & 0 \\ 2.4 & 4 & 0 & 0 \\ 0 & 0 & 4 \cdot 10^4 & -1.8 \cdot 10^4 \\ 0 & 0 & -1.8 \cdot 10^4 & 9 \cdot 10^4 \end{pmatrix}.$$

The numerical values of Sobol' sensitivity indices are $S_i = \{0.507, 0.399, 0, 0\}$. From the convergence plots presented in Fig.
17 we can conclude that DLR outperforms the extended version of the Sobol' formula [11] for high values of $S_i$ ($i = 1, 2$); however, for zero values of $S_i$ ($i = 3, 4$) it is slightly less efficient than the extended version of the Sobol' formula. We note that DLR is much easier to implement algorithmically, as it does not require the rather complex procedure of sampling from a conditional distribution. The CPU time required for the extended version of the Sobol' formula is 19.6 s versus only 1.78 s required for DLR.

Fig. 17. Test case 5: Convergence plots of $S_i$, $i = 1, \dots, 4$. QMC sampling. The red line refers to the S-K formula; the black line refers to DLR. The values of $S_i$ obtained at the same number of $N_{CPU}$. $S_1$ - upper lines with crosses, $S_2$ - lines with circles, $S_3$ - lines with triangles, $S_4$ - lower lines with crosses.

Test case 6: Consider the linear model

$$f(x) = x_1 + x_2 + x_3,$$

where all input variables are normally distributed with zero mean and the covariance matrix

$$C_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & \rho\sigma \\ 0 & \rho\sigma & \sigma^2 \end{pmatrix}.$$

Analytical values of both the main effect and total Sobol' sensitivity indices were given in [11] (Table 3).

Table 3: Test case 6. Analytical values of the main effect and total indices.

| | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|
| $S_i$ | $\frac{1}{2 + \sigma^2 + 2\rho\sigma}$ | $\frac{(1 + \rho\sigma)^2}{2 + \sigma^2 + 2\rho\sigma}$ | $\frac{(\sigma + \rho)^2}{2 + \sigma^2 + 2\rho\sigma}$ |
| $S_i^{tot}$ | $\frac{1}{2 + \sigma^2 + 2\rho\sigma}$ | $\frac{1 - \rho^2}{2 + \sigma^2 + 2\rho\sigma}$ | $\frac{\sigma^2 (1 - \rho^2)}{2 + \sigma^2 + 2\rho\sigma}$ |

For the numerical test we used the following parameters: $\mu = (0, 0, 0)$, $\sigma = 2$, $\rho = 0.8$. Similarly to the previous test case, from the convergence plots presented in Fig. 18 we can conclude that DLR outperforms the extended version of the Sobol' formula for high values of $S_i$ ($i = 1, 2$); however, for small values of $S_i$ ($i = 3$) both methods show a similar performance.

Fig. 18. Test case 6: Convergence plots of $S_i$, $i = 1, 2, 3$. QMC sampling. The red line refers to the S-K formula; the black line refers to DLR. The values of $S_i$ obtained at the same number of $N_{CPU}$. $S_1$ - upper lines with crosses, $S_2$ - lines with circles, $S_3$ - medium lines with crosses.
5 Conclusions

In this paper we compared the best known direct formulas and the so-called double loop reordering approach for the estimation of Sobol' main effect indices on a set of test functions, for models with independent and dependent inputs. Both MC and QMC sampling were considered. From the convergence results for models with independent inputs it follows that in the majority of test cases the improved direct formulas show much higher efficiency than the original Sobol' formula, especially for small values of Sobol' indices, with the Owen and Oracle formulas outperforming the other formulas. DLR outperforms the direct formulas on average, and by a wide margin when the values of Sobol' indices are not very small. For models with dependent inputs DLR is much easier to implement algorithmically than the direct extended Sobol' formula, and hence it is much faster to run. However, in practice the DLR method is limited to computing Sobol' main effect indices for a single index only. Convergence of all methods is much faster when QMC sampling is used, apart from the case of the type C function.

Acknowledgements

The authors would like to thank B. Delpuech for his help in the preparation of this work. The financial support by the EPSRC grant EP/H03126X/1 is gratefully acknowledged.

References

[1] Saltelli A., Tarantola S., Campolongo F., Ratto M. Sensitivity analysis in practice. London: Wiley; 2004.
[2] Sobol' I., Kucherenko S. Global sensitivity indices for nonlinear mathematical models. Review. Wilmott Mag 2005;1:56-61.
[3] Sobol' I. Sensitivity estimates for nonlinear mathematical models. Matem Mod 1990;2(1):112-118 [in Russian. Translated in I.M. Sobol', Sensitivity estimates for nonlinear mathematical models, Math Mod and Comp Exp 1993;26:407-414].
[4] Sobol' I. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math Comput Simul 2001;55(1-3):271-280.
[5] Saltelli A., Annoni P., Azzini I., Campolongo F., Ratto M., Tarantola S. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications 2010;181(2):259–270.
[6] Saltelli A. Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications 2002;145:280–297.
[7] Sobol' I., Tarantola S., Gatelli D., Kucherenko S., Mauntz W. Estimating the approximation error when fixing unessential factors in global sensitivity analysis. Reliability Engineering and System Safety 2007;92:957–960.
[8] Kucherenko S., Feil B., Shah N., Mauntz W. The identification of model effective dimensions using global sensitivity analysis. Reliability Engineering and System Safety 2011;96:440–449.
[9] Sobol' I., Myshetskaya E. Monte Carlo estimators for small sensitivity indices. Monte Carlo Meth and their Application 2007;13(5–6):455–465.
[10] Owen A. Better estimation of small Sobol' sensitivity indices. ACM Trans on Mod and Comp Simul 2013;23(11):1–17.
[11] Kucherenko S., Tarantola S., Annoni P. Estimation of global sensitivity indices for models with dependent variables. Computer Physics Communications 2012;183:937–946.
[12] Plischke E. An adaptive correlation ratio method using the cumulative sum of the reordered output. Reliability Engineering and System Safety 2012;107:149–156.
[13] Tarantola S., Gatelli D., Mara T. Random balance designs for the estimation of first order global sensitivity indices. Reliability Engineering and System Safety 2006;91(6):717–727.
[14] Marrel A., Iooss B., Laurent B., Roustant O. Calculations of Sobol indices for the Gaussian process metamodel. Reliability Engineering and System Safety 2009;94:742–751.
[15] Blatman G., Sudret B. Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 2011;230:2345–2367.
[16] Zuniga M., Kucherenko S., Shah N. Metamodelling with independent and dependent inputs. Computer Physics Communications 2013;184(6):1570–1580.
[17] Janon A., Klein T., Lagnoux A., Nodet M., Prieur C. Asymptotic normality and efficiency of two Sobol index estimators. ESAIM: Probability and Statistics 2014;18:342–364.
[18] Kucherenko S., Delpuech B., Iooss B., Tarantola S. Application of the control variate technique to estimation of total sensitivity indices. Reliability Engineering and System Safety 2015;134:251–259.
[19] Sobol' I. Uniformly distributed sequences with additional uniformity properties. USSR Comput. Math. and Math. Phys. 1976;16(5):236–242.
[20] Sobol' I., Asotsky D., Kreinin A., Kucherenko S. Construction and comparison of high-dimensional Sobol' generators. Wilmott Mag 2011;Nov:64–79.
[21] Park C., Ahn K. A new approach for measuring uncertainty importance and distributional sensitivity in probabilistic safety assessment. Reliability Engineering and System Safety 1994;46(3):253–261.
