Confidence intervals for sensitivity indices using reduced-basis metamodels

A. Janon*, M. Nodet, C. Prieur
Joseph Fourier University, Laboratoire Jean Kuntzmann, MOISE team, BP 53, 38041 Grenoble Cedex, France

Abstract. Global sensitivity analysis is often impracticable for complex and time-demanding numerical models, as it requires a large number of runs. The reduced-basis approach provides a way to replace the original model by a much faster-to-run code. In this paper, we are interested in the information loss induced by this approximation on the estimation of sensitivity indices. We present a method that provides a robust error assessment, hence enabling significant time savings without sacrificing precision or rigor. We illustrate our method with an experiment where computation time is divided by a factor of nearly 6. We also give directions on tuning some of the parameters used in our estimation algorithms.

Keywords: sensitivity analysis, reduced basis method, Sobol indices, bootstrap method, Monte Carlo method.

Introduction

Many mathematical models use a large number of poorly-known parameters as inputs. When such models are encountered, it is important for the practitioner to quantify whether this uncertainty on the inputs has a large repercussion on the model output. This problem can be tackled by turning the uncertain input parameters into random variables, whose probability distribution reflects the practitioner's belief about how likely an input parameter is to take a given value. In turn, the model output, as a function of the model inputs, is a random variable; its probability distribution, uniquely determined by the inputs' distribution and the model itself, gives detailed and valuable information about the behavior of the output when the input parameters vary: range of attained values, mean value and dispersion about the mean (through expectation and standard deviation), most probable values (modes), etc.

Sensitivity analysis aims to identify the sensitive parameters, that is, the parameters for which a small variation implies a large variation of the model output. In stochastic sensitivity analysis, one makes use of the output's probability distribution to define (amongst other measures of sensitivity) sensitivity indices (also known as Sobol indices). The sensitivity index of an output with respect to an input variable is the fraction of the variance of the output which can be "explained" by the variation of that input variable, either alone (one then speaks of the main effect), or in conjunction with other input variables (the total effect). This way, input variables can be sorted by the order of importance they have on the output. One can also consider the part of the variance caused by the variation of groups of two or more inputs, although main effects and total effects are generally sufficient to produce a satisfying sensitivity analysis. The reader is referred to Helton et al. (2006) and Saltelli et al. (2000) for more information about uncertainty and sensitivity analysis.

* Corresponding author. Email addresses: alexandre.janon@imag.fr (A. Janon), maelle.nodet@inria.fr (M. Nodet), clementine.prieur@imag.fr (C. Prieur).
Preprint submitted to Computational Statistics & Data Analysis, October 30, 2018.
Once these indices have been defined, the question of their effective calculation remains open. For most models, an exact, analytic computation is not attainable (even expressing an output as an analytic function of the inputs is infeasible) and one has to use numerical approximations. A robust, popular way to obtain such approximations is Monte Carlo estimation. This method simulates the randomness in the inputs by sampling a large number of parameter values (from the selected inputs' distribution). The model output is then computed for each sampled value of the parameters. This way, one obtains a sample of outputs, under the conjugate action of the model and the input distribution. This sample of outputs is fed into a suitable statistical estimator of the desired sensitivity index to produce a numerical estimate. The Monte Carlo approach to the computation of Sobol indices is described in Sobol (2001), together with improvements in Homma and Saltelli (1996); Saltelli (2002).

A major drawback of Monte Carlo estimation is that a large number of model outputs have to be evaluated for the resulting approximation of the sensitivity index to be accurate enough. In complex models, where a simulation for one single value of the parameters may take several minutes, the use of these methods "as-is" is impracticable. In those cases, one generally makes use of a surrogate model (also known as a reduced model, metamodel or response surface). The surrogate model has to approximate the original model (called the full model) well, while being many times faster to evaluate. The sensitivity index is then calculated on a sample of outputs generated by calls to the surrogate model, thus reducing the overall computation time. The reduced-basis (RB) method (Nguyen et al., 2005; Grepl and Patera, 2005; Veroy and Patera, 2005; Grepl et al., 2007) is a way of defining surrogate models when the original model is a discretization of a partial differential equation (PDE) depending on the input parameters. It comes with an error bound, that is, an upper bound on the error between the original output and the surrogate output.

The sensitivity index produced by Monte Carlo estimation on a surrogate model is tainted with a twofold error. Firstly, the Monte Carlo sampling procedure identifies the whole (generally infinite) population of possible inputs with a finite, randomly chosen sample; this produces sampling, or Monte Carlo, error. Secondly, using a surrogate model biases the estimation of the Sobol index, as what is actually estimated is the sensitivity of the surrogate output, not that of the full one; we call this bias the metamodel error. In order to make a rigorous sensitivity analysis, it is important to assess the magnitude of these two combined errors on the estimated sensitivity indices. Such an assessment can also be used to help in the choice of suitable approximation parameters (Monte Carlo sample size and metamodel fidelity) to achieve a desired precision in the estimated indices. Sampling error can be classically estimated for a moderate cost by using bootstrap resampling (Efron and Tibshirani, 1993; Archer et al., 1997).
Based on statistical estimation theory, the bootstrap technique involves the generation of a sample of replications of the sensitivity index estimator, whose empirical distribution serves as an approximation of the true (unknown) estimator distribution, in order to produce asymptotic confidence intervals which give good results in many practical cases. A variation on the bootstrap, which addresses sampling error as well as metamodel error, has been proposed in Storlie et al. (2009); Marrel et al. (2009) also develops a methodology for Kriging metamodels. In this paper, we present another confidence interval-based approach for assessing sampling error, together with the error caused by a reduced-basis metamodel, which makes use of the certified, a posteriori error bound that comes with the reduced-basis method. Boyaval et al. (2009) also makes use of the reduced-basis output error bound, to certify the computation of the expectation and the variance of a model output with neglected sampling error.

The advantages of our approach are: its rigor (the impact of the use of a surrogate model is provably bounded), its efficiency (our bounds are rather sharp, and go to zero when metamodel errors decrease), its clear separation between estimation (sampling) error and metamodel error, and its moderate computational requirements (time should rather be spent on making a precise computation than on measuring precision). In other words, our method allows one to estimate sensitivity indices by using a reduced-basis metamodel which largely speeds up computation, while rigorously keeping track of the precision of the estimation.

This paper is organized as follows: in the first part, we go through the prerequisites for our approach: we give the definition and the standard Monte Carlo estimator of the sensitivity indices we are interested in, and give an overview of the reduced basis method; in the second and third parts, we present our confidence interval estimation technique for the sensitivity index, which accounts for the two sources of error described above. In the fourth part, we present the numerical results obtained on an example of a reduced-basis metamodel.

1. Model output and sensitivity analysis methodology

1.1. Sensitivity indices

1.1.1. General setting

In order to define sensitivity indices, we choose a probability distribution for the input variables, turning each input variable $X_i$ ($i = 1, \dots, p$) into a random variable with known distribution; the model output $Y = f(X_1, \dots, X_p)$ (assumed to be a scalar; multiple outputs can be treated separately) is thus, for $X = (X_1, \dots, X_p)$, a $\sigma(X)$-measurable random variable. We further assume that the $X_i$'s are mutually independent. We also fix throughout this paper an input variable of interest $1 \le i \le p$. We define the first-order main effect of $X_i$ on $Y$ by:

$$S_i = \frac{\operatorname{Var} E(Y \mid X_i)}{\operatorname{Var} Y} \qquad (1)$$

$S_i$ is the sensitivity index in which we are interested in this paper, but other indices (total effect, higher-order effects) exist, and our methodology can readily be extended to them.

1.1.2. Monte Carlo estimator

We are interested in the following Monte Carlo estimator for $S_i$ (Homma and Saltelli, 1996; Saltelli, 2002): a sample size $N \in \mathbb{N}$ being given, let $\{X^k\}_{k=1,\dots,N}$ and $\{X'^k\}_{k=1,\dots,N}$ be two i.i.d. random samples of size $N$ each, drawn from the distribution of the input vector $X$. For $k = 1, \dots, N$, we note:

$$y_k = f(X^k) \quad \text{and} \quad y'_k = f(X'^k_1, \dots, X'^k_{i-1}, X^k_i, X'^k_{i+1}, \dots, X'^k_p)$$

The Monte Carlo estimator of $S_i$ is then given by:

$$\widehat{S}_i = \frac{\frac{1}{N}\sum_{k=1}^N y_k y'_k - \left(\frac{1}{N}\sum_{k=1}^N y_k\right)\left(\frac{1}{N}\sum_{k=1}^N y'_k\right)}{\frac{1}{N}\sum_{k=1}^N y_k^2 - \left(\frac{1}{N}\sum_{k=1}^N y_k\right)^2} \qquad (2)$$

It can be shown that $\widehat{S}_i$ is a strongly consistent estimator of $S_i$.
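To make the estimator concrete, here is a minimal numpy sketch of the pick-freeze computation (2). This is an illustration, not the authors' code: the function names, the vectorized model f and the input sampler are our own assumptions.

```python
import numpy as np

def sobol_pick_freeze(f, sample_inputs, i, N, rng):
    """Estimate the first-order index S_i of input i via estimator (2).

    f             -- vectorized model, maps an (N, p) array to N outputs
    sample_inputs -- draws an (N, p) i.i.d. sample of the input vector X
    """
    X, Xp = sample_inputs(N, rng), sample_inputs(N, rng)  # two independent samples
    Xp[:, i] = X[:, i]                  # "freeze" coordinate i: X'^k_i <- X^k_i
    y, yp = f(X), f(Xp)                 # y_k = f(X^k), y'_k = f(frozen sample)
    # Estimator (2): empirical covariance of (y, y') over empirical variance of y
    return (np.mean(y * yp) - np.mean(y) * np.mean(yp)) / \
           (np.mean(y**2) - np.mean(y)**2)
```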
1.2. Metamodel construction: overview of the reduced basis method

We briefly present here the reduced basis method for affinely parametrized elliptic partial differential equations. For more details, see Nguyen et al. (2005).

1.2.1. Offline-online decomposition

The reduced basis method deals with variational problems of the form: given an input parameter vector $X$ belonging to a parameter set $\mathcal{D} \subset \mathbb{R}^p$, find $u(X) \in \mathcal{F}$ so that

$$a(u, v; X) = \psi(v) \quad \forall v \in \mathcal{F} \qquad (3)$$

where $\mathcal{F}$ is an appropriate finite-dimensional function space (usually a discretization of a continuous function space such as $H^1$ or $H^1_0$), $\psi$ is a linear form on $\mathcal{F}$, and $a(\cdot, \cdot; X)$ is a parameter-dependent bilinear form on $\mathcal{F}$ that can be written in affine form:

$$a(v, w; X) = \sum_{q=1}^Q \Theta_q(X)\, a_q(v, w) \quad \forall v, w \in \mathcal{F}$$

where the $\Theta_q$ are arbitrary real functions defined on $\mathcal{D}$, and the $a_q$ are parameter-independent bilinear forms on $\mathcal{F}$.

Traditional computation of $u(X)$ for a prescribed $X$ involves looking for $u(X)$ as a linear combination $u(X) = \sum_{i=1}^{\dim \mathcal{F}} u_i(X)\, \phi_i$, where $(\phi_i)_{i=1,\dots,\dim\mathcal{F}}$ is a suitable basis of $\mathcal{F}$ and the unknowns are the $(u_i(X))_{i=1,\dots,\dim\mathcal{F}}$. This way, one obtains the following linear system of $\dim\mathcal{F}$ equations:

$$\sum_{i=1}^{\dim\mathcal{F}} \left( \sum_{q=1}^Q \Theta_q(X)\, a_q(\phi_i, \phi_j) \right) u_i(X) = \psi(\phi_j), \quad j = 1, \dots, \dim\mathcal{F} \qquad (4)$$

In many cases, the space $\mathcal{F}$ has a large dimension, so as to represent many functions of the continuous function space with great precision, and the system (4) is very large (although one can generally choose $\mathcal{F}$ and $(\phi_i)$ so as to produce a very sparse system). When (3) has to be solved for many values of $X$ (the so-called many-query context), the reduced basis method can be used to speed up the overall computation, which is split into two parts. In the offline phase, we choose a linearly independent family $B = \{\zeta_1, \dots, \zeta_n\}$ of $n$ vectors in $\mathcal{F}$, and compute and store the $Q$ $n$-by-$n$ matrices of each $a_q$ form (i.e. the matrices $A_q$ whose $(i,j)$ coefficient is $a_q(\zeta_i, \zeta_j)$) and the $n$-vector representing $\psi$ (i.e. the vector whose $i$-th coefficient is $\psi(\zeta_i)$) in the basis $B$ (called the reduced basis). The offline phase does not depend on a particular value of $X$ and can be done once and for all. Then, for each value of $X$ for which $u(X)$ has to be computed, one proceeds to the online phase: using the information stored during the offline phase, the following $n$-by-$n$ linear system is assembled and solved for $(\tilde{u}_i(X))_{i=1,\dots,n}$:

$$\sum_{i=1}^n \left( \sum_{q=1}^Q \Theta_q(X)\, a_q(\zeta_i, \zeta_j) \right) \tilde{u}_i(X) = \psi(\zeta_j), \quad j = 1, \dots, n \qquad (5)$$

Then $\tilde{u}(X) \approx u(X)$ is recovered using $\tilde{u}(X) = \sum_{i=1}^n \tilde{u}_i(X)\, \zeta_i$. In many cases, $\{u(X);\, X \in \mathcal{D}\}$ lies in a manifold of dimension much smaller than $\dim\mathcal{F}$, and it is possible to choose a linear approximation space of dimension $n \ll \dim\mathcal{F}$ and thus solve (5) much faster than (4) while keeping $\tilde{u}$ sufficiently close to $u$.
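To illustrate the online phase, the following numpy sketch assembles and solves the $n$-by-$n$ system (5) from the quantities stored offline. It is a schematic reading of the method under the notation above; the function names and data layout are our own assumptions.

```python
import numpy as np

def rb_online(theta, Aq_list, psi_rb, X):
    """Online RB solve of system (5): cost independent of dim F.

    theta   -- callable: theta(X) returns the Q values Theta_q(X)
    Aq_list -- list of Q precomputed n-by-n matrices A_q[i, j] = a_q(zeta_i, zeta_j)
    psi_rb  -- precomputed n-vector psi_rb[j] = psi(zeta_j)
    Returns the reduced coordinates (u_tilde_i(X))_{i=1..n}.
    """
    thetas = theta(X)
    # Assemble sum_q Theta_q(X) A_q -- an n-by-n matrix, n << dim F
    A = sum(t * Aq for t, Aq in zip(thetas, Aq_list))
    # Row j of (5) tests against zeta_j, hence the transpose
    # (it is immaterial when the a_q are symmetric, as assumed below)
    return np.linalg.solve(A.T, psi_rb)
```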
At the end of this section, we will see a method to automatically choose an "effective" reduced basis, that allows an accurate representation of $u(X)$ for $X \in \mathcal{D}$.

When the model output $f(X)$ is a linear functional $f(u(X))$ of $u(X)$, the surrogate output can be defined as:

$$\tilde{f}(X) := f(\tilde{u}(X)) = \sum_{i=1}^n \tilde{u}_i(X)\, f(\zeta_i) \qquad (6)$$

where the parameter-independent reals $f(\zeta_i)$ ($i = 1, \dots, n$) can be calculated and stored during the offline phase, allowing evaluation of $\tilde{f}(X)$ without explicitly forming $\tilde{u}(X)$, and leading to a metamodel whose complexity of evaluation depends only on its dimension $n$ (and on $Q$), and no longer on the dimension $\dim\mathcal{F}$ of the original model.

1.2.2. Error bound

An interesting feature of the reduced basis approach is that it comes with a provable error bound $\varepsilon_u(X)$, fully computable with a complexity independent of $\dim\mathcal{F}$ (Nguyen et al., 2005). This error bound satisfies

$$\| u(X) - \tilde{u}(X) \| \le \varepsilon_u(X) \quad \forall X \in \mathcal{D}$$

for a chosen Hilbert space norm $\|\cdot\|$ on $\mathcal{F}$. To present the error bound, we assume, for simplicity, that the $a_q$'s are symmetric bilinear forms, and that $a(\cdot, \cdot; X)$ is uniformly coercive, that is, $\alpha(X)$ defined by

$$\alpha(X) = \inf_{v \in \mathcal{F},\ \|v\|=1} a(v, v; X)$$

satisfies $\alpha(X) > 0$ for all $X \in \mathcal{D}$. We claim that:

$$\| u(X) - \tilde{u}(X) \| \le \frac{\| r(X) \|_{\mathcal{F}'}}{\alpha(X)} \quad \forall X \in \mathcal{D}$$

where $r(X)$ is the residual linear form, defined by $r(X)(v) = \psi(v) - a(\tilde{u}(X), v; X)$, and $\|\cdot\|_{\mathcal{F}'}$ is the dual norm on $\mathcal{F}$: $\|\ell\|_{\mathcal{F}'} = \sup_{v \in \mathcal{F},\ \|v\|=1} \ell(v)$.

Efficient procedures have been developed for offline-online computation of $\|r(X)\|_{\mathcal{F}'}$ and of a computable lower bound $\tilde{\alpha}(X) \le \alpha(X)$, leading to a computable error bound on $u$:

$$\varepsilon_u(X) = \frac{\| r(X) \|_{\mathcal{F}'}}{\tilde{\alpha}(X)}$$

This error bound on $u$ can be used to derive an error bound on the output. For example, if $f(X) = f(u(X))$ and $\tilde{f}(X)$ is the surrogate output defined in (6), one can use

$$\varepsilon(X) = \| f \|_{\mathcal{F}'}\, \varepsilon_u(X) \qquad (7)$$

which satisfies

$$| f(X) - \tilde{f}(X) | \le \varepsilon(X) \quad \forall X \in \mathcal{D} \qquad (8)$$

as an error bound on the output.

1.2.3. POD-based procedure for reduced basis choice

We now describe a way of selecting a reduced basis $\{\zeta_1, \dots, \zeta_n\}$. We randomly choose a finite subset $\Xi = \{X_1, \dots, X_m\} \subset \mathcal{D}$, and compute $u(X)$ for each $X \in \Xi$. We put the coordinates of $u(X)$ with respect to the basis $\{\phi_1, \dots, \phi_{\dim\mathcal{F}}\}$ of $\mathcal{F}$ as the columns of a snapshot matrix $S$:

$$S = \begin{pmatrix} u(X_1)_1 & u(X_2)_1 & \dots & u(X_m)_1 \\ u(X_1)_2 & u(X_2)_2 & \dots & u(X_m)_2 \\ \vdots & \vdots & & \vdots \\ u(X_1)_{\dim\mathcal{F}} & u(X_2)_{\dim\mathcal{F}} & \dots & u(X_m)_{\dim\mathcal{F}} \end{pmatrix} \qquad (9)$$

We now proceed with the Proper Orthogonal Decomposition (POD) of the matrix $S$: we compute $\{z_1, \dots, z_n\}$, where $z_i$ is an eigenvector associated with the $i$-th largest eigenvalue of the $m$-by-$m$ symmetric matrix $S^T \Omega S$ (where $\Omega$ is the matrix, in the $\{\phi_1, \dots, \phi_{\dim\mathcal{F}}\}$ basis, of the scalar product $\langle\cdot,\cdot\rangle$ associated with $\|\cdot\|$), and define the vectors of the reduced basis as:

$$\zeta_i = \frac{S z_i}{\| S z_i \|}$$

One can show that $\{\zeta_1, \dots, \zeta_n\}$ solves the following optimization program:

$$\text{Minimize } \sum_{X \in \Xi} \| u(X) - \pi[u(X)] \|^2 \quad \text{under the constraints } \langle \zeta_i, \zeta_j \rangle = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else} \end{cases}$$

where $\pi$ is the orthogonal projector onto $\operatorname{Span}(\zeta_1, \dots, \zeta_n)$.
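A minimal numpy sketch of this POD construction follows, under the simplifying assumption that $\Omega$ is the identity (Euclidean inner product); the comment indicates the change needed for a general $\Omega$. Names are illustrative.

```python
import numpy as np

def pod_reduced_basis(S, n):
    """Build n POD basis vectors from a snapshot matrix S (dim F x m).

    Assumes the Euclidean inner product (Omega = identity); for a general
    Omega, replace S.T @ S by S.T @ Omega @ S and normalize in the Omega-norm.
    """
    C = S.T @ S                                   # m-by-m correlation matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n]         # keep the n largest
    Z = eigvecs[:, order]                         # columns z_1, ..., z_n
    zetas = S @ Z                                 # lift to F: S z_i
    return zetas / np.linalg.norm(zetas, axis=0)  # normalize each zeta_i
```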
Proper orthogonal decomposition (also known as principal component analysis (PCA), or singular value decomposition (SVD); see Chatterjee (2000)), and variants of POD, are widely used in model reduction without error bounds (Bui-Thanh et al., 2007; Bergmann and Iollo, 2008). We also showed in Janon et al. (2010, submitted) that POD reduced bases are efficient with respect to the obtained error bounds.

1.3. Estimator on the reduced model

Using the reduced model to perform the sensitivity analysis is straightforward: replace every call to the full model $f$ by a call to the reduced one $\tilde{f}$. This gives rise to an estimator $\widehat{\tilde{S}}_i$ converging, as $N \to +\infty$, to the true value of the sensitivity index of $\tilde{f}$:

$$\tilde{S}_i = \frac{\operatorname{Var} E(\tilde{Y} \mid X_i)}{\operatorname{Var} \tilde{Y}}$$

The sampling error of this estimator can be assessed using the bootstrap (Efron and Tibshirani, 1993); see Algorithm 1. However, doing so does not take into account the gap between $S_i$ and $\tilde{S}_i$. If the metamodel is "too far" from the original model, the $(1-\alpha)$-confidence interval estimated using it will not contain the true value of $S_i$ with probability close to $1-\alpha$. On the other hand, a moderate-fidelity metamodel might be well-suited to give a "rough" estimate of the sensitivity indices (in some cases such a rough estimate is sufficient), but the user would like to be informed that the metamodel gives a limited-precision estimator; such limited precision should be reflected in an increase in the width of the output confidence interval.

2. Quantification of the two types of error in index estimation

We now present our method for estimating the two types of error that occur in Monte Carlo sensitivity index estimation on a reduced-basis metamodel. In the first part (Section 2.1), we review the bootstrap, which we will use for the treatment of sampling error. In the second part (Section 2.2), we show how to use reduced-basis bounds to assess metamodel error.

2.1. Sampling error: bootstrap confidence intervals

Sampling error, due to the Monte Carlo evaluation of the variances in (1), can be quantified through an approximate confidence interval calculated using the bootstrap (Archer et al., 1997). We use the bias-corrected (BC) percentile method presented in Efron (1981); Efron and Tibshirani (1986). The principle of this method can be summed up in the following way: let $\widehat{\theta}(x_1, \dots, x_n)$ be an estimator for an unknown parameter $\theta$ in a reference population $P$. To get a point estimate of $\theta$, one takes a random i.i.d. $n$-sample $\{x_1, \dots, x_n\}$ from $P$, and computes $\widehat{\theta}(x_1, \dots, x_n)$. In the (nonparametric) bootstrap we repeatedly, for $b = 1, \dots, B$, sample $\{x_1[b], \dots, x_n[b]\}$ with replacement from the original sample $\{x_1, \dots, x_n\}$ and get a replication of $\widehat{\theta}$ by computing $\widehat{\theta}[b] = \widehat{\theta}(x_1[b], \dots, x_n[b])$. This way we obtain a sample $R = \{\widehat{\theta}[1], \dots, \widehat{\theta}[B]\}$ of replications of $\widehat{\theta}$.

We now see how this sample can be used to estimate a confidence interval for $\theta$. We denote by $\Phi$ the standard normal cdf:

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^z \exp\left(-\frac{t^2}{2}\right) dt$$

and by $\Phi^{-1}$ its inverse. Using $R$ and the point estimate $\widehat{\theta} = \widehat{\theta}(x_1, \dots, x_n)$, a "bias correction constant" $z_0$ can be estimated:

$$\widehat{z_0} = \Phi^{-1}\left( \frac{\#\{\widehat{\theta}[b] \in R \text{ s.t. } \widehat{\theta}[b] \le \widehat{\theta}\}}{B} \right)$$

Then, for $\beta \in\ ]0;1[$, we define the "corrected quantile estimate" $\widehat{q}(\beta)$:

$$\widehat{q}(\beta) = \Phi(2\widehat{z_0} + z_\beta)$$

where $z_\beta$ satisfies $\Phi(z_\beta) = \beta$. The central BC bootstrap confidence interval of level $1-\alpha$ is then estimated by the interval whose endpoints are the $\widehat{q}(\alpha/2)$ and $\widehat{q}(1-\alpha/2)$ quantiles of $R$.

This confidence interval has been justified in Efron (1981) when there exist an increasing transformation $g$, $z_0 \in \mathbb{R}$ and $\sigma > 0$ such that $g(\widehat{\theta}) \sim \mathcal{N}(\theta - z_0\sigma, \sigma)$ and $g(\widehat{\theta}^*) \sim \mathcal{N}(\widehat{\theta} - z_0\sigma, \sigma)$, where $\widehat{\theta}^*$ is the bootstrapped $\widehat{\theta}$, for fixed sample $\{x_1, \dots, x_n\}$ and (hence) fixed $\widehat{\theta} = \widehat{\theta}(x_1, \dots, x_n)$. In practice, due to the complex analytic expressions (20) of the estimators we are going to bootstrap, it seems hard to prove that such a $g$ exists. However, we give in Section 4.4 empirical evidence that, for the two estimators defined in (20), $g$ can approximately be chosen as the identity. The full computation method of a BC bootstrap confidence interval for $S_i$ is given in Algorithm 1.

The key advantage of bootstrapping our sensitivity estimators is that we do not require supplementary model evaluations to estimate a confidence interval; hence the computational overhead of getting a confidence interval (versus pointwise estimation only) remains quite modest.
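The BC percentile method can be condensed as follows for the estimator (2); the resampling of the pairs $(y_k, y'_k)$ mirrors Algorithm 1, but the function and its interface are our own sketch.

```python
import numpy as np
from scipy.stats import norm

def bc_bootstrap_ci(y, yp, B=2000, alpha=0.05, rng=None):
    """Bias-corrected bootstrap CI for the pick-freeze estimator (2).

    y, yp -- arrays of outputs y_k and y'_k (same length N)
    """
    rng = rng or np.random.default_rng()

    def s_hat(y, yp):
        return (np.mean(y * yp) - np.mean(y) * np.mean(yp)) / \
               (np.mean(y**2) - np.mean(y)**2)

    N, theta = len(y), s_hat(y, yp)
    # B replications: resample the pairs (y_k, y'_k) with replacement
    reps = np.array([s_hat(y[L], yp[L])
                     for L in (rng.integers(0, N, N) for _ in range(B))])
    z0 = norm.ppf(np.mean(reps <= theta))           # bias correction constant
    q_lo = norm.cdf(2 * z0 + norm.ppf(alpha / 2))   # corrected quantile levels
    q_hi = norm.cdf(2 * z0 + norm.ppf(1 - alpha / 2))
    return np.quantile(reps, q_lo), np.quantile(reps, q_hi)
```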
2.2. Metamodel error

For a pair of samples $\{X^k\}_{k=1,\dots,N}$, $\{X'^k\}_{k=1,\dots,N}$, we can use our metamodel output $\tilde{f}$ and our metamodel error bound $\varepsilon$ to compute, for $k = 1, \dots, N$:

$$\tilde{y}_k = \tilde{f}(X^k), \quad \tilde{y}'_k = \tilde{f}(X'^k_1, \dots, X'^k_{i-1}, X^k_i, X'^k_{i+1}, \dots, X'^k_p)$$

and:

$$\varepsilon_k = \varepsilon(X^k), \quad \varepsilon'_k = \varepsilon(X'^k_1, \dots, X'^k_{i-1}, X^k_i, X'^k_{i+1}, \dots, X'^k_p)$$

In this section, we find accurate, explicitly and efficiently computable bounds $\widehat{S}^m_i$ and $\widehat{S}^M_i$, depending only on $\tilde{y}_k$, $\tilde{y}'_k$, $\varepsilon_k$ and $\varepsilon'_k$, so that:

$$\widehat{S}^m_i \le \widehat{S}_i \le \widehat{S}^M_i \qquad (10)$$

In other words, we want lower and upper bounds on the full-model-based sensitivity index estimator $\widehat{S}_i$, computable from surrogate model calls alone. Let:

$$R(a; y, \mu, \mu') = \sum_{k=1}^N \left( y'_k - (a(y_k - \mu) + \mu') \right)^2$$

where $y = (y_1, \dots, y_N, y'_1, \dots, y'_N)$ and $\mu, \mu' \in \mathbb{R}$. By setting the first derivative of $R$ with respect to $a$ to zero, making use of the convexity of $R(\cdot\,; y, \bar{y}, \bar{y}')$ and using (2), one easily shows that:

$$\widehat{S}_i = \operatorname*{argmin}_{a \in \mathbb{R}} R(a; y, \bar{y}, \bar{y}')$$

where $\bar{y} = \frac{1}{N}\sum_{k=1}^N y_k$ and $\bar{y}' = \frac{1}{N}\sum_{k=1}^N y'_k$. In other words, $\widehat{S}_i$ is the slope of the linear least squares regression of the $\{y'_k\}_k$ on the $\{y_k\}_k$. Define:

$$R_{\inf}(a; \tilde{y}, \varepsilon, \mu, \mu') = \sum_{k=1}^N \inf_{z \in [\tilde{y}_k - \varepsilon_k;\, \tilde{y}_k + \varepsilon_k],\ z' \in [\tilde{y}'_k - \varepsilon'_k;\, \tilde{y}'_k + \varepsilon'_k]} \left( z' - (a(z - \mu) + \mu') \right)^2 \qquad (11)$$

and:

$$R_{\sup}(a; \tilde{y}, \varepsilon, \mu, \mu') = \sum_{k=1}^N \sup_{z \in [\tilde{y}_k - \varepsilon_k;\, \tilde{y}_k + \varepsilon_k],\ z' \in [\tilde{y}'_k - \varepsilon'_k;\, \tilde{y}'_k + \varepsilon'_k]} \left( z' - (a(z - \mu) + \mu') \right)^2 \qquad (12)$$

where $\tilde{y} = (\tilde{y}_1, \dots, \tilde{y}_N, \tilde{y}'_1, \dots, \tilde{y}'_N)$ and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_N, \varepsilon'_1, \dots, \varepsilon'_N)$. It is clear that:

$$R_{\inf}(a; \tilde{y}, \varepsilon, \mu, \mu') \le R(a; y, \mu, \mu') \le R_{\sup}(a; \tilde{y}, \varepsilon, \mu, \mu') \quad \forall a, \mu, \mu' \in \mathbb{R} \qquad (13)$$

Note that $R$, $R_{\inf}$ and $R_{\sup}$ are quadratic polynomials in $a$. We name $\alpha$, $\beta$, $\gamma$, $\alpha_{\inf}$, $\beta_{\inf}$, $\gamma_{\inf}$, $\alpha_{\sup}$, $\beta_{\sup}$ and $\gamma_{\sup}$ their respective coefficients.
In other words, we have:

$$R(a; y, \mu, \mu') = \alpha a^2 + \beta a + \gamma$$
$$R_{\inf}(a; \tilde{y}, \varepsilon, \mu, \mu') = \alpha_{\inf} a^2 + \beta_{\inf} a + \gamma_{\inf} \qquad (14)$$
$$R_{\sup}(a; \tilde{y}, \varepsilon, \mu, \mu') = \alpha_{\sup} a^2 + \beta_{\sup} a + \gamma_{\sup} \qquad (15)$$

These coefficients depend on $\mu$ and $\mu'$ (as well as on $y$ for $\alpha$, $\beta$, $\gamma$, and on $\tilde{y}$ and $\varepsilon$ for the other coefficients). We do not write this dependence explicitly until the last part of the discussion. Using (13), we see that the quadratic function of $a$:

$$(\alpha_{\inf} - \alpha) a^2 + (\beta_{\inf} - \beta) a + \gamma_{\inf} - \gamma$$

is everywhere negative or zero; hence it takes a non-positive value at $a = 0$, and has a non-positive discriminant:

$$\gamma_{\inf} - \gamma \le 0 \qquad (16)$$
$$(\beta_{\inf} - \beta)^2 \le 4(\alpha_{\inf} - \alpha)(\gamma_{\inf} - \gamma) \qquad (17)$$

As $(\beta_{\inf} - \beta)^2 \ge 0$, equations (16) and (17) above imply that $\alpha_{\inf} - \alpha \le 0$, and that:

$$\beta_{\inf} - \delta_{\inf} \le \beta \le \beta_{\inf} + \delta_{\inf}$$

for $\delta_{\inf} = 2\sqrt{(\alpha_{\inf} - \alpha)(\gamma_{\inf} - \gamma)}$.

We now suppose that $\alpha_{\inf} > 0$. As $\alpha_{\inf}$ is computable from $\tilde{y}_k$, $\tilde{y}'_k$, $\varepsilon_k$ and $\varepsilon'_k$, one can check in practice whether this condition is met. If it is not, our bound cannot be used. We expect that if the metamodel error is not too large, we have $\alpha_{\inf} \approx \alpha$ and, as $\alpha > 0$, the hypothesis $\alpha_{\inf} > 0$ is realistic. Under this supplementary assumption, we have:

$$\operatorname*{argmin}_a R(a; y, \mu, \mu') = -\frac{\beta}{2\alpha} \ge -\frac{\beta_{\inf} + \delta_{\inf}}{2\alpha_{\inf}}$$

Now using the second part of (13), and the same reasoning on the non-positive quadratic function of $a$: $R(a; y, \mu, \mu') - R_{\sup}(a; \tilde{y}, \varepsilon, \mu, \mu')$, we get that $\alpha \le \alpha_{\sup}$ and $\beta_{\sup} - \delta_{\sup} \le \beta \le \beta_{\sup} + \delta_{\sup}$. Hence:

$$\operatorname*{argmin}_a R(a; y, \mu, \mu') \le -\frac{\beta_{\sup} - \delta_{\sup}}{2\alpha_{\sup}}$$

where $\delta_{\sup} = 2\sqrt{(\alpha - \alpha_{\sup})(\gamma - \gamma_{\sup})}$. This comes without any supplementary assumption, because $\alpha_{\sup} \ge \alpha$ and $\alpha > 0$, as the minimum of $R(\cdot\,; y, \mu, \mu')$ exists. As we clearly have $\delta_{\inf}$ and $\delta_{\sup}$ less than (or equal to) $\widehat{\delta} := 2\sqrt{(\alpha_{\inf} - \alpha_{\sup})(\gamma_{\inf} - \gamma_{\sup})}$, we deduce that:

$$-\frac{\beta_{\inf}(\mu, \mu') + \widehat{\delta}(\mu, \mu')}{2\alpha_{\inf}(\mu, \mu')} \le \operatorname*{argmin}_a R(a; y, \mu, \mu') \le -\frac{\beta_{\sup}(\mu, \mu') - \widehat{\delta}(\mu, \mu')}{2\alpha_{\sup}(\mu, \mu')}$$

where we have made the dependencies on $\mu$ and $\mu'$ explicit. To finish, it is easy to see that we have:

$$\mathcal{P} := [\bar{\tilde{y}} - \bar{\varepsilon};\ \bar{\tilde{y}} + \bar{\varepsilon}] \ni \bar{y} \qquad (18)$$

and:

$$\mathcal{P}' := [\bar{\tilde{y}}' - \bar{\varepsilon}';\ \bar{\tilde{y}}' + \bar{\varepsilon}'] \ni \bar{y}' \qquad (19)$$

(where $\bar{\tilde{y}}$, $\bar{\tilde{y}}'$, $\bar{\varepsilon}$ and $\bar{\varepsilon}'$ denote, respectively, the means of $(\tilde{y}_k)_k$, $(\tilde{y}'_k)_k$, $(\varepsilon_k)_k$ and $(\varepsilon'_k)_k$), so that:

$$\min_{\mu \in \mathcal{P},\ \mu' \in \mathcal{P}'} \left( -\frac{\beta_{\inf}(\mu, \mu') + \widehat{\delta}(\mu, \mu')}{2\alpha_{\inf}(\mu, \mu')} \right) \le \widehat{S}_i \le \max_{\mu \in \mathcal{P},\ \mu' \in \mathcal{P}'} \left( -\frac{\beta_{\sup}(\mu, \mu') - \widehat{\delta}(\mu, \mu')}{2\alpha_{\sup}(\mu, \mu')} \right)$$

Hence, (10) is verified with:

$$\widehat{S}^m_i = \min_{\mu \in \mathcal{P},\ \mu' \in \mathcal{P}'} \left( -\frac{\beta_{\inf}(\mu, \mu') + \widehat{\delta}(\mu, \mu')}{2\alpha_{\inf}(\mu, \mu')} \right), \quad \widehat{S}^M_i = \max_{\mu \in \mathcal{P},\ \mu' \in \mathcal{P}'} \left( -\frac{\beta_{\sup}(\mu, \mu') - \widehat{\delta}(\mu, \mu')}{2\alpha_{\sup}(\mu, \mu')} \right) \qquad (20)$$

It is clear that $\widehat{S}^m_i$ and $\widehat{S}^M_i$ are computable without knowing the $y_k$'s and $y'_k$'s. In practice, we compute approximate values of $\widehat{S}^m_i$ and $\widehat{S}^M_i$ by replacing the min and max over $\mathcal{P} \times \mathcal{P}'$ by the min and max over a finite sample $\Xi \subset \mathcal{P} \times \mathcal{P}'$. See Algorithm 2 for a summary of the entire computation procedure.
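The following sketch mirrors the core of Algorithm 2: it recovers the quadratic coefficients of $R_{\inf}$ and $R_{\sup}$ from three evaluations and scans a finite grid of $(\mu, \mu')$. The closed-form interval extrema of the squared residual used below are our own elementary derivation, and all names are illustrative.

```python
import itertools
import numpy as np

def quad_coeffs(r_of_a):
    """Recover (alpha, beta, gamma) of a quadratic a -> r(a) from 3 evaluations."""
    r_m, r_0, r_p = r_of_a(-1.0), r_of_a(0.0), r_of_a(1.0)
    return (r_p + r_m) / 2 - r_0, (r_p - r_m) / 2, r_0

def s_bounds(yt, ypt, eps, epsp, n_grid=20):
    """Bounds (20) on the full-model pick-freeze estimator, from surrogate data.

    yt, ypt   -- surrogate outputs tilde-y_k, tilde-y'_k
    eps, epsp -- certified error bounds eps_k, eps'_k
    """
    def R_inf(a, mu, mup):
        c = ypt - a * (yt - mu) - mup        # center of the residual interval
        r = epsp + abs(a) * eps              # its radius
        return np.sum(np.maximum(np.abs(c) - r, 0.0) ** 2)

    def R_sup(a, mu, mup):
        c = ypt - a * (yt - mu) - mup
        r = epsp + abs(a) * eps
        return np.sum((np.abs(c) + r) ** 2)

    # Grids over P and P' as defined in (18) and (19)
    P = np.linspace(yt.mean() - eps.mean(), yt.mean() + eps.mean(), n_grid)
    Pp = np.linspace(ypt.mean() - epsp.mean(), ypt.mean() + epsp.mean(), n_grid)
    lo, hi = np.inf, -np.inf
    for mu, mup in itertools.product(P, Pp):
        a_i, b_i, g_i = quad_coeffs(lambda a: R_inf(a, mu, mup))
        a_s, b_s, g_s = quad_coeffs(lambda a: R_sup(a, mu, mup))
        if a_i <= 0:                         # hypothesis alpha_inf > 0 fails
            return None
        d = 2 * np.sqrt(max((a_i - a_s) * (g_i - g_s), 0.0))
        lo = min(lo, -(b_i + d) / (2 * a_i))
        hi = max(hi, -(b_s - d) / (2 * a_s))
    return lo, hi
```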
3. Combined confidence intervals and parameter choice

3.1. Combined confidence intervals

In the last section, we have seen how to separately assess sampling error and metamodel error. To take both errors into account simultaneously, we propose using bootstrap confidence intervals (see Section 2.1), calculating $B$ bootstrap replications of $\widehat{S}^m_i$ and $\widehat{S}^M_i$, where, for $b = 1, \dots, B$, each bootstrap pair $(\widehat{S}^m_i[b], \widehat{S}^M_i[b])$ is computed using $(\tilde{y}_k)_{k \in L_b}$, $(\tilde{y}'_k)_{k \in L_b}$ as surrogate output samples, with associated error bounds $(\varepsilon_k)_{k \in L_b}$, $(\varepsilon'_k)_{k \in L_b}$, where $L_b$ is a list of $N$ integers sampled with replacement from $\{1, \dots, N\}$.

The BC bootstrap confidence interval procedure (see Section 2.1) can then be used to produce a $(1-\alpha)$-level confidence interval $[\widehat{S}^m_{i,\alpha/2}; \widehat{S}^m_{i,1-\alpha/2}]$ for $S^m_i$, and a confidence interval $[\widehat{S}^M_{i,\alpha/2}; \widehat{S}^M_{i,1-\alpha/2}]$ for $S^M_i$. We then take as the combined confidence interval of level $1-\alpha$ for $S_i$ the range $[\widehat{S}^m_{i,\alpha/2}; \widehat{S}^M_{i,1-\alpha/2}]$. This interval accounts for sampling and metamodel errors simultaneously.

ε-sampling. Optionally, we can introduce a postulated uncertainty on the error bounds through what we call ε-sampling. In ε-sampling, the $b$-th bootstrap replicates of $\widehat{S}^m_i$ and $\widehat{S}^M_i$ are computed using $(\varepsilon^*_k)_{k \in L_b}$, $(\varepsilon'^*_k)_{k \in L_b}$ as error bounds, where $\varepsilon^*_k$ and $\varepsilon'^*_k$ are sampled independently from uniform distributions on $[\eta_k \varepsilon_k; \varepsilon_k]$ and $[\eta'_k \varepsilon'_k; \varepsilon'_k]$, and $\eta_k, \eta'_k \in [0;1]$ are alleged effectivities of our error bound, that is, indicators of the ratio between the true errors $|\tilde{y}_k - y_k|$ and $\varepsilon_k$ (and between $|\tilde{y}'_k - y'_k|$ and $\varepsilon'_k$). Setting the effectivities close to zero narrows the confidence intervals, putting more trust in the reduced model than in the error bound, which is considered too pessimistic; on the contrary, effectivities close to one mean that the error bound does not overestimate the true error too much, and that the error cannot be considered much smaller than it. The procedure for obtaining the confidence intervals is summed up in Algorithm 3.

3.2. Choice of reduced basis size and Monte Carlo sample size

When doing a Monte Carlo estimation of sensitivity indices using a reduced basis metamodel, by means of confidence intervals computed with the strategy described above, one has to choose two important parameters: the sample size ($N$) and the number of elements in the reduced basis ($n$). Increasing $N$ and/or $n$ increases the overall computation time (because of a larger number of surrogate simulations to perform if $N$ is increased, or, if $n$ is increased, because each surrogate simulation takes more time to complete due to a larger linear system to solve). However, an increase in these parameters also improves the precision of the calculation (thanks to a reduction in sampling error for increased $N$, or a reduction in metamodel error for increased $n$). In practice, one wants to estimate sensitivity indices with a given precision (i.e. to produce $(1-\alpha)$-level confidence intervals of prescribed length), and has no a priori indication on how to choose $N$ and $n$ to do so. Moreover, for one given precision, there may be multiple choices of suitable couples $(N, n)$, balancing between sampling and metamodel error. We wish to choose the best one, that is, the one which gives the smallest computation time.
The aim of this section is to describe a simple computational model that helps in making a good choice of sample size and reduced basis size to produce a confidence interval of a desired precision.

Formulation as a constrained optimization problem. On the one hand, we evaluate computation time: an analysis of the reduced basis method shows that the most costly operation performed during an online evaluation (see Section 1.2) is the resolution of a linear system of $n$ equations; this resolution can be done (e.g., by Gaussian elimination) in $O(n^3)$ operations. This has to be multiplied by the required number of online evaluations, i.e. the sample size $N$. Hence, we may assume that computation time is proportional to $N \times n^3$.

On the other hand, the mean length of the $(1-\alpha)$-level confidence intervals for $S_1, \dots, S_p$ can be written as the sum of two terms. The first, depending on $N$, accounts for sampling error and can be modelled as $Z_\alpha/\sqrt{N}$ for a constant $Z_\alpha > 0$. The assumption of $1/\sqrt{N}$ decay is heuristically deduced from the central limit theorem. The second term, which accounts for metamodel error, is assumed to decay exponentially as $n$ increases: $C/a^n$, where $C > 0$ and $a > 1$ are constants. This assumption is backed up by numerical experiments as well as theoretical work (Buffa et al., 2009).

Once this analysis has been done, we translate our problem into the following constrained minimization program:

$$\text{Find } (N^*, n^*) = \operatorname*{argmin}_{(n,N) \in \mathbb{R}^+ \times \mathbb{R}^+} n^3 \times N \quad \text{so that} \quad \frac{Z_\alpha}{\sqrt{N}} + \frac{C}{a^n} = P \qquad (21)$$

where $P$ is the desired average precision of the confidence intervals. Note that we converted the discrete design variables $N$ and $n$ into continuous positive variables, so as to use the standard tools of continuous optimization; once the optimum of the continuous problem has been found, we round it to the nearest integer couple to recover a near-optimal integer solution.

Resolution of the optimization problem. The constraint in (21) is equivalent to the conjunction of the following two equations:

$$N = \left( \frac{Z_\alpha}{P - \frac{C}{a^n}} \right)^2 \qquad (22)$$

$$P - \frac{C}{a^n} \ge 0 \qquad (23)$$

so that the function to minimize over $I = \,]n_c; +\infty[$ (where $n_c = \ln(C/P)/\ln(a)$ has been chosen so as to satisfy (23)) is:

$$\phi(n) = \frac{(Z_\alpha)^2\, n^3}{\left( P - \frac{C}{a^n} \right)^2}$$

This function is differentiable on $I$ and tends to $+\infty$ as $n \to n_c$ and as $n \to +\infty$; hence it has a minimizer $n^* \in I$ satisfying $\phi'(n^*) = 0$, which is equivalent to:

$$3\left( P a^{n^*} - C \right) - 2 n^* C \ln a = 0 \qquad (24)$$

On $I$, (24) is equivalent to:

$$n^* - \frac{3P}{2C \ln a}\, a^{n^*} = -\frac{3}{2 \ln a}$$

Now let $\psi$ be the function defined on $\mathbb{R}$ by $\psi(x) = x - \frac{3P}{2C \ln a}\, a^x$. By remarking that $\psi'(x) = 1 - \frac{3P}{2C}\, a^x$ is continuous and nonzero on $\left] \ln\!\left(\frac{2C}{3P}\right)/\ln a;\ +\infty \right[ \supset I$, one sees that $\psi$ is injective on $I$, and so (24) has at most one solution in $I$. Thus (24) has exactly one solution in $I$, an approximate value of which can be found by using the bisection method (Press et al., 1992) on $[n_c + \epsilon; L]$, where $\epsilon > 0$ is small enough and $L$ is sufficiently large. Once $n^*$ has been found, we can find the optimal $N^*$ by setting $n = n^*$ in (22).
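Assuming the constants $Z_\alpha$, $C$ and $a$ have already been estimated, the resolution just described can be sketched in a few lines; we use SciPy's Brent root-finder in place of plain bisection, and the names are ours.

```python
import numpy as np
from scipy.optimize import brentq

def optimal_sizes(Z_alpha, C, a, P):
    """Solve program (21): minimal-cost (n*, N*) achieving precision P."""
    # psi(n) = n - (3P / (2 C ln a)) a^n is strictly monotone on I = ]n_c, +inf[
    psi = lambda n: n - (3 * P / (2 * C * np.log(a))) * a**n
    target = -3 / (2 * np.log(a))
    n_c = np.log(C / P) / np.log(a)                  # constraint (23) boundary
    # Root of psi(n) = target on [n_c + eps, L], cf. the bisection argument above
    n_star = brentq(lambda n: psi(n) - target, n_c + 1e-9, n_c + 200.0)
    N_star = (Z_alpha / (P - C / a**n_star)) ** 2    # equation (22)
    return int(round(n_star)), int(round(N_star))

# With the estimates of Section 4.3 (C-hat, a-hat, Z-hat):
# optimal_sizes(2.6407, 197.69, 2.789, 0.02) gives roughly (11, 22000)
```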
Estimation of the parameters. The last remaining question is the estimation of the constants $Z_\alpha$, $a$ and $C$.

The $Z_\alpha$ parameter is estimated by running the estimation procedure on the metamodel, for fixed $N$ and $n$, estimating combined BC bootstrap confidence intervals (Section 2.1), and taking:

$$\widehat{Z}_\alpha = \sqrt{N} \left( \widehat{S}^M_{i,1-\alpha/2} - \widehat{S}^M_i + \widehat{S}^m_i - \widehat{S}^m_{i,\alpha/2} \right)$$

where the factor multiplying $\sqrt{N}$ is the estimated "Monte Carlo part" of the error.

The $a$ and $C$ parameters are estimated by running the estimation procedure, for a single fixed $N$ and different reduced basis sizes $n_1, \dots, n_K$, and measuring, for each reduced basis size, the average metamodel error $e(n_k) = \widehat{S}^M_i - \widehat{S}^m_i$, for $k = 1, \dots, K$. The pairs $\{(n_1, e(n_1)); \dots; (n_K, e(n_K))\}$ are then used to fit the exponential regression model $e(n) = C/a^n$.

If one wants to estimate the sensitivity indices with respect to all variables $i = 1, \dots, p$ for a single value of $N$ and $n$, one can use the bootstrap procedure to estimate the Monte Carlo errors $\widehat{E}_1, \dots, \widehat{E}_p$ of each of the $p$ sensitivity index estimators:

$$\widehat{E}_i = \widehat{S}^M_{i,1-\alpha/2} - \widehat{S}^M_i + \widehat{S}^m_i - \widehat{S}^m_{i,\alpha/2}$$

and then take:

$$\widehat{Z}_\alpha = \frac{\sqrt{N}}{p} \sum_{i=1}^p \widehat{E}_i$$
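The exponential model $e(n) = C/a^n$ can be fitted by ordinary least squares on the log scale; a minimal sketch, with hypothetical data names:

```python
import numpy as np

def fit_exponential_decay(ns, errors):
    """Fit e(n) = C / a**n, i.e. log e(n) = log C - n log a, by least squares."""
    slope, intercept = np.polyfit(ns, np.log(errors), 1)
    return np.exp(intercept), np.exp(-slope)   # returns (C, a)

# e.g. with the pre-run data of Section 4.3: ns = [7, ..., 12] and
# errors = the measured average bound widths S^M - S^m at each size
```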
4. Numerical results and discussion

In this section, we test our combined confidence interval procedure described earlier, and compare it with Monte Carlo estimation on the full model (with bootstrap to assess sampling error), and with the procedure described in Storlie et al. (2009) and implemented in the CompModSA R package. Our criteria of comparison are the CPU time needed to compute the intervals and the lengths of these intervals (the smaller, the better). In all our tests we take $\alpha = .05$ and $B = 2000$ bootstrap replications.

4.1. Model set-up

Let $u$, a function of space $x \in [0;1]$ (note that the space variable $x$ is unrelated to the input parameter vector $X$) and time $t \in [0,T]$ ($T > 0$ is a fixed, i.e. known, parameter), satisfy the viscous Burgers' equation:

$$\frac{\partial u}{\partial t} + \frac{1}{2} \frac{\partial}{\partial x}(u^2) - \nu \frac{\partial^2 u}{\partial x^2} = \psi \qquad (25)$$

where $\nu \in \mathbb{R}^+_*$ is the viscosity, and $\psi \in C^0\left([0,T], L^2(]0,1[)\right)$ is the source term. For $u$ to be well-defined, we also prescribe an initial value $u_0 \in H^1(]0,1[)$:

$$u(t=0, x) = u_0(x) \quad \forall x \in [0;1] \qquad (26)$$

and boundary values $b_0, b_1 \in C^0([0,T])$:

$$u(t, x=0) = b_0(t), \quad u(t, x=1) = b_1(t) \quad \forall t \in [0;T] \qquad (27)$$

where $b_0$, $b_1$ and $u_0$ are given functions, supposed to satisfy the compatibility conditions:

$$u_0(0) = b_0(0) \quad \text{and} \quad u_0(1) = b_1(0) \qquad (28)$$

The initial value $u_0$ and the boundary values $b_0$ and $b_1$ are parametrized in the following way:

$$b_0(t) = b_{0m} + \sum_{l=1}^{n(b_0)} A^{b_0}_l \sin(\omega^{b_0}_l t), \qquad b_1(t) = b_{1m} + \sum_{l=1}^{n(b_1)} A^{b_1}_l \sin(\omega^{b_1}_l t)$$

$$\psi(t, x) = f_m + \sum_{l=1}^{n_T(f)} \sum_{p=1}^{n_S(f)} A^f_{lp} \sin(\omega^{fT}_l t) \sin(\omega^{fS}_p x)$$

$$u_0(x) = (u_{0m})^2 + \sum_{l=1}^{n(u_0)} A^{u_0}_l \sin(\omega^{u_0}_l x)$$

The values of the angular frequencies $\omega^{b_0}_l$, $\omega^{b_1}_l$, $\omega^{fT}_l$, $\omega^{fS}_p$ and $\omega^{u_0}_l$, as well as their cardinalities $n(b_0)$, $n(b_1)$, $n_T(f)$, $n_S(f)$ and $n(u_0)$, are fixed (known), while our uncertain parameters, namely: the viscosity $\nu$, the coefficients $b_{0m}$, $b_{1m}$, $f_m$ and $u_{0m}$, and the amplitudes $(A^{b_0}_l)_{l=1,\dots,n(b_0)}$, $(A^{b_1}_l)_{l=1,\dots,n(b_1)}$, $(A^f_{lp})_{l=1,\dots,n_T(f);\, p=1,\dots,n_S(f)}$ and $(A^{u_0}_l)_{l=1,\dots,n(u_0)}$, live in some Cartesian product of intervals $\mathcal{P}'$, a subset of $\mathbb{R}^{1+4+n(b_0)+n(b_1)+n_T(f)\,n_S(f)+n(u_0)}$.

However, the compatibility condition (28) constrains $b_{0m}$ and $b_{1m}$ as functions of the other parameters:

$$b_{0m} = (u_{0m})^2 \quad \text{and} \quad b_{1m} = (u_{0m})^2 + \sum_{l=1}^{n(u_0)} A^{u_0}_l \sin(\omega^{u_0}_l) \qquad (29)$$

so that the "compliant" uncertain parameters actually belong to $\mathcal{P}$ defined by:

$$\mathcal{P} = \left\{ X = \left( \nu, b_{0m}, A^{b_0}_1, \dots, A^{b_0}_{n(b_0)}, b_{1m}, A^{b_1}_1, \dots, A^{b_1}_{n(b_1)}, f_m, A^f_{11}, \dots, A^f_{n_T(f),n_S(f)}, u_{0m}, A^{u_0}_1, \dots, A^{u_0}_{n(u_0)} \right) \in \mathcal{P}' \text{ satisfying (29)} \right\} \qquad (30)$$

In Janon et al. (2010, submitted), we gave an example with many more parameters. To illustrate our sensitivity analysis methodology without overloading the text, we choose here an example with a reduced number of parameters.

The solution $u = u(X)$ depends on the parameter vector $X$ above. The "full" model is obtained by discretizing the initial-boundary value problem (25), (26), (27), using a discrete time grid $\{t_k = k\Delta t\}_{k=0,\dots,T/\Delta t}$, where $\Delta t > 0$ is the time step, and, space-wise, P1 Lagrange finite elements built upon a uniform subdivision $\{x_i = i/\mathcal{N}\}_{i=0,\dots,\mathcal{N}}$ of $[0;1]$ (we write $\mathcal{N}$ for the number of grid intervals, to avoid confusion with the Monte Carlo sample size $N$). Our full output is:

$$f(X) = f(u(X)) = \frac{1}{\mathcal{N}} \sum_{i=0}^{\mathcal{N}} u(t = T, x = x_i)$$

The reduced basis method is then applied to yield a surrogate solution $\tilde{u}$ of (25), (26), (27), as well as an error bound $\varepsilon_u$ on $u$. Due to the non-linearity and time-dependence of (25), as well as the parametrization of the boundary values, the reduced basis methodology is not as simple as the one presented in Section 1.2 of the present paper. The reader can refer to Janon et al. (2010, submitted) for full details on the discretization and reduction of this model. The error bound $\varepsilon$ on the output is obtained by following (7).

In our numerical experiments, we take $\mathcal{N} = 60$, $\Delta t = .01$, $T = .05$, $n_S(f) = n_T(f) = n(b_0) = n(b_1) = 0$, $n(u_0) = 1$, $\omega^{u_0}_1 = 0.5$, $A^{u_0}_1 = 5$ and $f_m = 1$. Reduced bases are found using the POD-based procedure with $\#\Xi = 30$. The two input parameters are assumed independent and uniformly distributed. The table below contains their ranges, together with the "true" values of the sensitivity indices, which have been computed (in more than 14 h of CPU time) using a Monte Carlo simulation with a large sample size $N = 4 \times 10^6$ (so as to get BC bootstrap confidence intervals of length $< 10^{-2}$) on the full model:

Parameter   Range          95% confidence interval for sensitivity index
ν           [1; 20]        [0.0815; 0.0832]
u0m         [-0.3; 0.3]    [0.9175; 0.9182]

[Figure 1: Output f of the Burgers' model, plotted as a function of ν and u0m.]

The output, as a function of the two uncertain parameters $\nu$ and $u_{0m}$, is plotted in Figure 1; as one can see, it is nonlinear with respect to the input parameters.

4.2. Convergence benchmark

Figure 2 shows the lower bound $\widehat{S}^m$ and upper bound $\widehat{S}^M$ (defined in Section 2.2) for different reduced basis sizes (hence different metamodel precisions) but a fixed sample of size $N = 300$, as well as the bootstrap confidence intervals computed using the procedure presented in Section 3.1. This figure exhibits the fast convergence of our bounds to the true value of the sensitivity index as the reduced basis size increases.
We also see that the part of the error due to sampling (the gap between the confidence interval upper bound and the upper bound, and between the confidence interval lower bound and the lower bound) remains constant, as the sample size stays the same.

[Figure 2: Convergence benchmark for sensitivity index estimation in the Burgers' model; top: variable ν, bottom: variable u0m. For a fixed sample size N = 300, we plot the estimator bounds Ŝ^m and Ŝ^M defined in Section 2.2, and the endpoints Ŝ^m_{i,.025} and Ŝ^M_{i,1-.025} of the 95% confidence interval, for different reduced basis sizes.]

4.3. Choice of n and N

We now discuss the numerical results obtained when using the parameter tuning procedure (Section 3.2). We have done "pre-runs" of the combined confidence interval procedure, for $N = 300$ and the different reduced basis sizes $\{7, 8, \dots, 12\}$, yielding the following estimates:

$$\widehat{C} = 197.69, \quad \widehat{a} = 2.789, \quad \widehat{Z}_{.05} = 2.6407$$

To assess the validity of this estimation, and to check our modelling of the bound precision and of the execution time, we plot the cube root of the CPU time (Figure 3) and the precision of the bound defined in Section 2.2 (Figure 4). We can check that these hypotheses are reasonably satisfied.

[Figure 3: Cube root of the CPU time necessary to do the estimations, as a function of the reduced basis size n. Section 3.2 assumes this function is linear.]

[Figure 4: Line: reduced basis "precision", i.e. mean Ŝ^M_i - Ŝ^m_i, as a function of the reduced basis size; dashes: result of the fit of the exponential regression model Ĉ/â^n.]

Precision P   Reduced basis size n*   Sample size N*
0.005         12.4437                 354491
0.02          11.1095                 22057.6
0.05          10.0501                 3698.95
0.08          9.59689                 1442.7
0.09          9.48332                 1139.47

[Figure 5: Optimal reduced basis and sample sizes calculated using the strategy described in Section 3.2.]

One can find in Figure 5 the optimal reduced basis sizes $n^*$ and sample sizes $N^*$ computed by solving the optimization problem (21), for various precisions $P$. All these values have been computed in 5.77 s of CPU time, including the time necessary to estimate $C$, $a$ and $Z_\alpha$.

To check the efficiency of this parameter tuning strategy, we choose a target precision of $P = .02$. In Figure 5, we read that we should take $n \approx 11$ and $N \approx 22000$. Conducting the combined confidence interval estimation with these parameters gives the interval [0.0659997; 0.0937285] for the sensitivity index of $\nu$, and [0.914266; 0.926452] for the sensitivity index of $u_{0m}$.
These confidence intervals have mean length:

$$\frac{1}{2}\left( (0.0937285 - 0.0659997) + (0.926452 - 0.914266) \right) = 0.0199575 \approx 0.02$$

as desired. This computation took 52 s of CPU time to complete (including a metamodel offline phase of 1 s).

4.4. Normality of the bootstrap distributions

We give in Figure 6 the empirical normal quantile-quantile plots of the bootstrap replicates $\{\widehat{S}^m_i[1], \dots, \widehat{S}^m_i[B]\}$ and $\{\widehat{S}^M_i[1], \dots, \widehat{S}^M_i[B]\}$. As these plots are close to straight lines, the bootstrap distributions are approximately normal.

[Figure 6: Normal empirical quantile-quantile plots of the distributions of Ŝ^m_ν (left) and Ŝ^M_ν (right).]

4.5. Optimality of our metamodel error bound

We checked the near-optimality of the metamodel error bound of Section 2.2 by comparing it with the values of the optimization problems:

$$\min_{y \in Y} \psi(y) \quad \text{and} \quad \max_{y \in Y} \psi(y)$$

where:

$$\psi(y) = \frac{\frac{1}{N}\sum_{k=1}^N y_k y'_k - \left(\frac{1}{N}\sum_{k=1}^N y_k\right)\left(\frac{1}{N}\sum_{k=1}^N y'_k\right)}{\frac{1}{N}\sum_{k=1}^N (y_k)^2 - \left(\frac{1}{N}\sum_{k=1}^N y_k\right)^2}$$
and:

$$Y = \prod_{k=1}^N [\tilde{y}_k - \varepsilon_k;\ \tilde{y}_k + \varepsilon_k] \times \prod_{k=1}^N [\tilde{y}'_k - \varepsilon'_k;\ \tilde{y}'_k + \varepsilon'_k]$$

These problems, of large dimension $2N$, give the optimal values of $\widehat{S}^m_i$ and $\widehat{S}^M_i$ satisfying (10). They can be solved with simulated annealing (Kirkpatrick et al., 1983; Pardalos and Romeijn, 2002). Our bound gave results very close to the optimal ones, for a smaller computational cost than using simulated annealing.

4.6. Comparison with estimation on the full model

To obtain a result of the same precision, we carried out a simulation with $N = 21000$ (the sample size can be chosen smaller than before, as there is no metamodel error) on the full model; we get a bootstrap confidence interval with mean length $\approx 0.0193$. This computation takes 308 s of CPU time to complete. Hence, on this example, using a reduced-basis surrogate model roughly divides the overall computation time by a factor of 5.9, without any sacrifice on the precision or the rigor of the confidence interval (as our metamodel error quantification procedure is fully proven and certified). We expect higher time savings with more complex (for example, two- or three-dimensional in space) models.

4.7. Comparison with CompModSA

We compared our results with the ones obtained using the R CompModSA package, version 1.2, downloaded from the CompModSA web page (see references). This package implements the method described in Storlie et al. (2009) for assessing sampling error as well as metamodel error. It does not make use of the reduced basis output error bound, but uses a non-intrusive method to fit a metamodel using a reasonable number of full model evaluations.

We fed into the CompModSA procedure 50 such full model outputs (which took 0.22 s of CPU time to compute). We then tried various non-intrusive metamodels (surface responses), and reported the results in Figure 7. We used as parameters: n.mc.T=0 (we do not want any total index computation), n.mc.S=23000 (sample size), n.samples=1 (one run), n.CI=300 (generate confidence intervals using 300 bootstrap replications). We contributed the option CI.S, which is set to TRUE to compute bootstrap confidence intervals for the main effect index (patch available at http://ljk.imag.fr/membres/Alexandre.Janon/compmodsa.php).

Surface response (metamodel)                      Mean CI length   R²          CPU time
qreg: quadratic regression                        0.081            0.996799    143.59 s
mars: multivariate adaptive regression splines    0.075            0.9998506   218.716 s
our approach                                      0.019            N/A         52 s

[Figure 7: Results of CompModSA's sensitivity function on our model, for two fitted response surfaces. R² is an indicator of the metamodel fit (values close to unity suggest a good fit). The last line recalls the results of the experiment in Section 4.3.]

By looking at the results in Figure 7, we can see that, in this case, our approach is clearly superior, both in terms of precision and of computation time. To achieve this result, we took advantage of the particular formulation of the original model, which allows the reduced basis methodology to be efficiently applied; CompModSA, due to its non-intrusive nature, is easier to use on a generic "black box" model.

Conclusion and perspectives

We presented a methodology for rigorously quantifying the impact of the sampling error and the metamodel error on the computation of sensitivity indices, when a reduced-basis metamodel is used. Sampling error is handled by a classic bootstrap procedure, while metamodel error is managed using a new bound on the sensitivity index estimator. The quantification of these two types of error permits not only a certification of the performed estimation, but also gives a way to tune the parameters (reduced basis size and sample size) optimally, for a given desired precision on the indices. We have shown on a concrete example the superiority of this method when compared to the use of the full model, or of non-intrusive (quadratic regression, MARS) metamodels. Our method can be applied to other Monte Carlo (or quasi-Monte Carlo) estimators, and to other metamodels which provide an error bound similar to the one provided by the reduced basis framework.

Acknowledgements. We wish to thank Jean-Claude Fort for a suggestion which we exploited to perform our computation of the metamodel-induced error. We also thank Anestis Antoniadis and Ingrid Van Keilegom for advice on bootstrap methodology. This work has been partially supported by the French National Research Agency (ANR) through the COSINUS program (project COSTA-BRAVA no. ANR-09-COSI-015).

A. Algorithms

Algorithm 1:
1. Draw from the distribution of $X$ two independent samples of size $N$: $\{X^k\}$ and $\{X'^k\}$.
2. Tabulate the necessary model evaluations: for $k = 1, \dots, N$:
   (a) set $X \leftarrow X^k$;
   (b) compute $y_k = f(X)$;
   (c) swap $X_i$ and $X'^k_i$;
   (d) compute $y'_k = f(X)$;
   (e) swap $X_i$ and $X'^k_i$ back.
3. Compute $\widehat{S}_i$ using (2).
4. Repeat, for $b = 1, \dots, B$:
   (a) Draw at random a list $L$ of length $N$, with replacement, from $\{1, \dots, N\}$.
   (b) Compute the replication $\widehat{S}_i[b]$ by applying (2) with the sums over $k = 1, \dots, N$ replaced by sums over $k \in L$.
5. Compute $\widehat{z_0}$:
   $$\widehat{z_0} = \Phi^{-1}\left( \frac{\#\{b \in \{1, \dots, B\} \text{ s.t. } \widehat{S}_i[b] \le \widehat{S}_i\}}{B} \right)$$
   where $\Phi$ is the standard normal cdf.
6. Look up $z_{\alpha/2}$ so that $\Phi(z_{\alpha/2}) = \alpha/2$, and take $z_{1-\alpha/2} = -z_{\alpha/2}$, satisfying $\Phi(z_{1-\alpha/2}) = 1 - \alpha/2$.
7. Compute $\widehat{q}(\alpha/2) = \Phi(2\widehat{z_0} + z_{\alpha/2})$ and $\widehat{q}(1-\alpha/2) = \Phi(2\widehat{z_0} + z_{1-\alpha/2})$.
8. Compute $\widehat{S}_{i,\alpha/2}$ and $\widehat{S}_{i,1-\alpha/2}$, the $\widehat{q}(\alpha/2)$ and $\widehat{q}(1-\alpha/2)$ quantiles of $\{\widehat{S}_i[1], \dots, \widehat{S}_i[B]\}$.
9. Output $[\widehat{S}_{i,\alpha/2};\ \widehat{S}_{i,1-\alpha/2}]$ as a confidence interval for $S_i$ of level $1-\alpha$.

Algorithm 2:
1. Draw from the distribution of $X$ two independent samples of size $N$: $\{X^k\}$ and $\{X'^k\}$.
2. Tabulate the necessary metamodel evaluations: for $k = 1, \dots, N$:
   (a) set $X \leftarrow X^k$;
   (b) compute $\tilde{y}_k = \tilde{f}(X)$ and $\varepsilon_k = \varepsilon(X)$;
   (c) swap $X_i$ and $X'^k_i$;
   (d) compute $\tilde{y}'_k = \tilde{f}(X)$ and $\varepsilon'_k = \varepsilon(X)$;
   (e) swap $X_i$ and $X'^k_i$ back.
3. Compute $\bar{\tilde{y}}$, $\bar{\tilde{y}}'$, $\bar{\varepsilon}$ and $\bar{\varepsilon}'$, the respective means of $\{\tilde{y}_k\}$, $\{\tilde{y}'_k\}$, $\{\varepsilon_k\}$ and $\{\varepsilon'_k\}$.
4. Choose a finite subset $\Xi$ of the set $\mathcal{P} \times \mathcal{P}'$, where $\mathcal{P}$ and $\mathcal{P}'$ are defined by (18) and (19).
5. Repeat, for $(\mu, \mu') \in \Xi$:
   (a) using (11) and (12), compute $R_{\inf}(a; \tilde{y}, \varepsilon, \mu, \mu')$ and $R_{\sup}(a; \tilde{y}, \varepsilon, \mu, \mu')$ for three different values of $a$;
   (b) deduce $\alpha_{\inf}$, $\beta_{\inf}$, $\gamma_{\inf}$, $\alpha_{\sup}$, $\beta_{\sup}$ and $\gamma_{\sup}$ satisfying (14) and (15);
   (c) if $\alpha_{\inf} \le 0$, exit with failure;
   (d) compute $\widehat{\delta} = 2\sqrt{(\alpha_{\inf} - \alpha_{\sup})(\gamma_{\inf} - \gamma_{\sup})}$;
   (e) compute:
   $$\widehat{S}^m_i(\mu, \mu') = -\frac{\beta_{\inf} + \widehat{\delta}}{2\alpha_{\inf}}, \quad \widehat{S}^M_i(\mu, \mu') = -\frac{\beta_{\sup} - \widehat{\delta}}{2\alpha_{\sup}}$$
6. Output:
   $$\widehat{S}^m_i = \min_{(\mu,\mu') \in \Xi} \widehat{S}^m_i(\mu, \mu'), \quad \widehat{S}^M_i = \max_{(\mu,\mu') \in \Xi} \widehat{S}^M_i(\mu, \mu')$$

Algorithm 3:
1. Follow steps 1 and 2 of Algorithm 2.
2. Compute the bounds $\widehat{S}^m_i$ and $\widehat{S}^M_i$ using steps 3-6 of Algorithm 2.
3. Repeat, for $b = 1, \dots, B$:
   (a) Draw at random a list $L_b$ of length $N$, with replacement, from $\{1, \dots, N\}$.
   (b) If using ε-sampling: for $k \in L_b$, sample $\varepsilon^*_k$ uniformly in $[\eta_k \varepsilon_k; \varepsilon_k]$ and $\varepsilon'^*_k$ uniformly in $[\eta'_k \varepsilon'_k; \varepsilon'_k]$.
   (c) Else: take, for $k \in L_b$, $\varepsilon^*_k = \varepsilon_k$ and $\varepsilon'^*_k = \varepsilon'_k$.
   (d) Compute the bounds $\widehat{S}^m_i[b]$ and $\widehat{S}^M_i[b]$ using steps 3-6 of Algorithm 2, with $(\tilde{y}_k)_{k \in L_b}$, $(\tilde{y}'_k)_{k \in L_b}$ as sample data, and $(\varepsilon^*_k)_{k \in L_b}$, $(\varepsilon'^*_k)_{k \in L_b}$ as error bounds.
4. Compute, for $w \in \{m, M\}$, the two bias correction constants:
   $$\widehat{z^w_0} = \Phi^{-1}\left( \frac{\#\{b \in \{1, \dots, B\} \text{ s.t. } \widehat{S}^w_i[b] \le \widehat{S}^w_i\}}{B} \right)$$
   where $\Phi$ is the standard normal cdf.
5. Look up $z_{\alpha/2}$ so that $\Phi(z_{\alpha/2}) = \alpha/2$, and take $z_{1-\alpha/2} = -z_{\alpha/2}$.
6. Compute $\widehat{q}^m(\alpha/2) = \Phi(2\widehat{z^m_0} + z_{\alpha/2})$ and $\widehat{q}^M(1-\alpha/2) = \Phi(2\widehat{z^M_0} + z_{1-\alpha/2})$.
7. Compute $\widehat{S}^m_{i,\alpha/2}$ and $\widehat{S}^M_{i,1-\alpha/2}$, the $\widehat{q}^m(\alpha/2)$ and $\widehat{q}^M(1-\alpha/2)$ quantiles of $\{\widehat{S}^m_i[1], \dots, \widehat{S}^m_i[B]\}$ and $\{\widehat{S}^M_i[1], \dots, \widehat{S}^M_i[B]\}$, respectively.
8. Output $[\widehat{S}^m_{i,\alpha/2};\ \widehat{S}^M_{i,1-\alpha/2}]$ as the combined confidence interval for $S_i$ of level $1-\alpha$.

References

Archer, G., Saltelli, A., Sobol, I., 1997. Sensitivity measures, ANOVA-like techniques and the use of bootstrap. Journal of Statistical Computation and Simulation 58, 99-120.

Bergmann, J., Iollo, A., 2008. Numerical methods for low-order modeling of fluid flows based on POD.

Boyaval, S., Le Bris, C., Maday, Y., Nguyen, N., Patera, A., 2009. A reduced basis approach for variational problems with stochastic parameters: Application to heat conduction with variable Robin coefficient. Computer Methods in Applied Mechanics and Engineering 198, 3187-3206.

Buffa, A., Maday, Y., Patera, A., Prud'homme, C., Turinici, G., 2009. A priori convergence of the greedy algorithm for the parametrized reduced basis. Mathematical Modelling and Numerical Analysis.
Bui-Thanh, T., Willcox, K., Ghattas, O., van Bloemen Waanders, B., 2007. Goal-oriented, model-constrained optimization for reduction of large-scale systems. Journal of Computational Physics 224, 880-896.

Chatterjee, A., 2000. An introduction to the proper orthogonal decomposition. Current Science 78, 808-817.

CompModSA. http://www.stat.unm.edu/~storlie/CompModSA/.

Efron, B., 1981. Nonparametric standard errors and confidence intervals. Canadian Journal of Statistics 9, 139-158.

Efron, B., Tibshirani, R., 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1, 54-75.

Efron, B., Tibshirani, R., 1993. An introduction to the bootstrap. Chapman & Hall/CRC.

Grepl, M., Maday, Y., Nguyen, N., Patera, A., 2007. Efficient reduced-basis treatment of nonaffine and nonlinear partial differential equations. Mathematical Modelling and Numerical Analysis 41, 575-605.

Grepl, M., Patera, A., 2005. A posteriori error bounds for reduced-basis approximations of parametrized parabolic partial differential equations. Mathematical Modelling and Numerical Analysis 39, 157-181.

Helton, J., Johnson, J., Sallaberry, C., Storlie, C., 2006. Survey of sampling-based methods for uncertainty and sensitivity analysis. Reliability Engineering & System Safety 91, 1175-1209.

Homma, T., Saltelli, A., 1996. Importance measures in global sensitivity analysis of nonlinear models. Reliability Engineering & System Safety 52, 1-17.

Janon, A., Nodet, M., Prieur, C., 2010, submitted. Certified reduced-basis solutions of viscous Burgers equations parametrized by initial and boundary values. Preprint available at http://hal.inria.fr/inria-00524727/en.

Kirkpatrick, S., Gelatt Jr, C., Vecchi, M., 1983. Optimization by simulated annealing. Science 220, 671.

Marrel, A., Iooss, B., Laurent, B., Roustant, O., 2009. Calculations of Sobol indices for the Gaussian process metamodel. Reliability Engineering & System Safety 94, 742-751.

Nguyen, N., Veroy, K., Patera, A., 2005. Certified real-time solution of parametrized partial differential equations. Handbook of Materials Modeling, 1523-1558.

Pardalos, P., Romeijn, H., 2002. Handbook of Global Optimization, Volume 2. Kluwer.

Press, W., Flannery, B., Teukolsky, S., Vetterling, W., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, England.

Saltelli, A., 2002. Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications 145, 280-297.

Saltelli, A., Chan, K., Scott, E., 2000. Sensitivity Analysis.

Sobol, I., 2001. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation 55, 271-280.

Storlie, C., Swiler, L., Helton, J., Sallaberry, C., 2009. Implementation and evaluation of non-parametric regression procedures for sensitivity analysis of computationally demanding models. Reliability Engineering & System Safety 94, 1735-1763.

Veroy, K., Patera, A., 2005. Certified real-time solution of the parametrized steady incompressible Navier-Stokes equations: Rigorous reduced-basis a posteriori error bounds. International Journal for Numerical Methods in Fluids 47, 773-788.