A Control Variate Approach for Improving Efficiency of Ensemble Monte Carlo

A Con trol V ariate App roac h for Improving Eﬃciency o f Ense m ble Mon te Carlo ⋆ T arik Borog o v ac a , b , ∗ F rancis J. Alexander b Piro oz V a k ili c a Ele ctric al and Computer E ngine ering Dep artment, Boston Unive rsity, 8 Saint Mary’s St. Boston, MA 02215, USA b CCS-3, L os Ala mos National L ab or atory, MS-B256, L os A lamos, NM 87545, USA c Division of Systems E ngine ering & Me chanic al Engine e ring Dep artment, Boston University, 15 Saint Mary’s St. Br o okline, M A 02446, U SA Abstract In this pap er we presen t a new approac h to con trol v ariat es for improving co mpu- tational eﬃciency of Ensem ble Mon te Carlo. W e present the approac h using sim- ulation of paths of a time-dep endent nonlinear sto c hastic equation. The core idea is to extract information at one or more nominal mo d el parameters and use this information to gain estimation eﬃciency at n eigh b oring parameters. This idea is the basis of a general strategy , called DataBase Mon te Carlo (DBMC), for impro ving eﬃciency of Mon te Carlo. In this pap er we d escrib e h o w this strategy can b e imple- men ted using t he v ariance reduction tec hn ique of Con trol V ariates (CV). W e sho w that, once an initial setup cost for extracting information is incurred, this approac h can lead to signiﬁcant gains in compu tational eﬃciency . The in itial setup cost is justiﬁed in pr o jects that requ ire a large n um b er o f estimations or in those that are to b e p erformed und er r eal-time constrain ts. Key wor ds: Mon te Carlo, V ariance Reduction, Con trol V ariate s P ACS: S05.10.Ln, 02.70.Uu, 02.70. Tt ⋆ P ortions of this w ork, L A-UR-08-0 5399, were carr ied out at Los Alamo s National Lab oratory un der the auspices of the US National Nuclear Securit y Administra- tion of the US Departmen t of Energy . T arik Borogo v ac and Piro oz V akili were supp orted in part b y the National Science F oundation gran ts CMMI-0620965 and DGE-022 1680. ∗ Corresp ond ing auth or Email addr ess: ta rikb@bu.e du (T arik Borogo v ac). Preprint sub mitted to Elsevier 17 No v em b er 2018 1 In tro duction The purpo se of this pap er is to presen t a no v el approac h for eﬃcien t estimation via the Monte Carlo (MC) metho d. The approa c h is very broadly a pplicable but here, to presen t the main ideas, we narrow the fo cus to Ensem ble Mon te Carlo where estimation is based on stochastically indep enden t tra jectories of a system. T o illustrate, w e use sim ulation of time-dep enden t no nlinear pro cesses for whic h Mon te Carlo is a par t icularly general and p ow erful nu merical metho d compared to av ailable alternativ es. Time-dependent nonlinear pr o cesses are v ery general mo dels used, among o thers, in statistical me c hanics [1], data assimilation in climate, w eather and o cean mo deling [2], ﬁnancial mo deling [3], and quantitativ e biology [4]. Hence dev eloping eﬃcien t MC metho ds ma y signiﬁcan tly impact a wide range of applications. A kno wn w eakness of MC is its slo w rate of con v ergence. Assume Y is a random quan tit y deﬁned on paths of a pro cess and let σ Y denote its standard deviation. The conv ergence r a te of MC for estimating the exp ected v alue of Y is ≈ σ Y / √ n where n is the n um b er of indep enden t paths o f the pro cess. In g eneral the canonical n − 1 / 2 rate of con v ergence cannot b e impro v ed up on, hence, since the inception of the MC metho d, a n um ber o f v aria nce reduction (VR) tec hniques hav e b een devised to reduce σ Y (see, [5] for a n early account and [3] and [6] for mor e recen t discussions). Most VR t echniq ues lead to estimators of the form w 1 Y 1 + · · · + w n Y n , i.e., a w eigh ted a v erage of the samples. These tec hniques prescrib e (i) a recip e for selecting samples Y 1 , · · · , Y n and (ii) a set o f w eigh ts w 1 , · · · , w n . T o arriv e at these prescriptions, one m ust rely on the existence of sp eciﬁc problem features and the abilit y of the user of the metho d to disco v er and eﬀectiv ely exploit suc h features. This lack of generalit y has signiﬁcan tly limited the applicabilit y of VR techniq ues. The p oint of departur e of a new strategy , called Da t aBase Mon te Carlo (DBMC), is to address this shortcoming and to devise generic VR tec hniques that can b e generically applied [7]. All VR tec hniques bring additional information to b ear on the estimation problem, ho w ev er, as men tioned ab o v e, this informa- tion is problem sp eciﬁc and relies o n exploiting sp ecial features of the prob- lem at hand. By con trast, as will b e clariﬁed in this pap er, DBMC adds a generic computational exploratio n phase to the estimation problem that re- lies on gathering information at one (o r more) nominal mo del parameter(s) to achiev e estimation eﬃciency at neigh b oring par ameters. The adv an tage of this a pproac h is its generalit y and wide applicability: it is quite easy t o im- 2 plemen t and it can wrap existing ensem ble MC co des. On t he other hand, the computat ional exploration phase of the D BMC a ppro ac h may r equire ex- tensiv e sim ulations and can b e computationally costly . Therefore, the initial setup cost needs justiﬁcation. The setup cost may b e justiﬁed in pro jects t ha t in v olv e estimations at many mo del parameters and/or in pro jects where there is a real-time computationa l constrain t. In the ﬁrst t yp e of pro ject, the setup cost ma y lead to eﬃciency gain for each subsequen t estimation, and for a large enough n um ber of subse quen t estimations it can b e easily justiﬁed. In pro jects with a real-t ime constrain t the setup cost is an oﬀ- line “passiv e” cost that can lead to estimates of signiﬁcan tly higher quality (low er statistical error); the higher quality in man y suc h pro jects more than justiﬁes the setup cost. In this pap er w e limit ourselv es to pr esen ting the implemen tation of the VR tec hnique of Con trol V aria t es (CV) in the D BMC setting (see [7] for discus- sion of other VR techniq ues). The CV tec hnique, whic h compared to the VR tec hnique of Imp ortance Sampling is less utilized in computationa l physic s, requires iden tifying a num b er of ra ndom v ariables called c ontr ol va ri a tes , sa y X 1 , · · · , X k , t ha t are correlated with Y and ha v e know n means. The correlation with Y implies that X i ’s carry info r mation ab out Y . The CV tec hnique is a w a y of utilizing the informatio n included in the controls (t heir known means) to help with the estimation of the mean o f v ariable Y . In the D BMC setting w e assume that Y = Y ( θ ) dep ends on a mo del para meter θ and use X i = Y ( θ i ) where θ i ’s are in a neigh b orho o d of θ ( i = 1 , · · · , k ). In a departure from the classical CV tec hnique, w e use “high quality ” estimates of E [ X i ] rather than precise v alues of E [ X i ] to arriv e at the controlled estimator of E [ Y ]. As we argue in this pap er (and elsewhere [8]) this departure allows for substan tially broader ch oices of control v ariates and make s the CV tec hnique signiﬁcan tly more ﬂexible a nd eﬀectiv e. The DBMC metho d shares a similar inte n t as the we ll-kno wn histog ram rew eighing metho d [9] f rom the Mark o v c hain Mon te Carlo literature (e.g. [1]), but with a v ery diﬀeren t setting a nd implemen tation, and with broader applicabilit y . F or example, it do es not rely on having a Boltzmann distribution or exp ( − H /k T ) structure. Giv en its generalit y , it has p otential applications, among others, in ensem ble w eather prediction, h ydrological source lo cation, climate and o cean, optimal con trol, a nd sto c hastic sim ulations of biological systems . The remainder of the pap er is o r ganized as follows . In section 2 w e discuss preliminaries, including the details o f the example n umerical study – t he time- dep enden t Ginzburg- Landau (TDGL) equation – as w ell as the metho d of con trol v a riates. Estimation of mean outcomes of the TDGL equation ov er a range of temp eratures is of intere st, esp ecially considering the larg e diﬀerence in b eha vior b elow and ab ov e the co existence curv e. In section 3 w e describ e the DBMC metho dology and motiv ation in a general con text. Section 4 discusse s 3 the implemen tation a nd results of DBMC as applied to estimation of quan tities generated b y the TDG L equation, and the results of that n umerical study . W e conclude in section 5. 2 Preliminaries W e presen t asp ects of o ur approac h and num erical results in the contex t of the time-dep enden t G inzburg Landau (TDGL) mo del. It is w orth noting that this mo del is c hosen fo r illustrative purp oses only and w e do no t mak e use of an y of its sp eciﬁc features. 2.1 Time-Dep endent Ginzbur g L anda u W e use a canonical equation of phase-ordering kinetics [1 0 ,11] the sto c hastic TDGL equation in t w o spatial dimensions. This is written as ∂ φ ( x , t ) ∂ t = D ∆ φ ( x , t ) − V ′ ( φ ( x , t )) + η ( x , t ) (1) where φ ( x , t ) represen ts a lo cal order para meter, e.g. a mag netization at p oin t x = ( x 1 , x 2 ) ⊤ and time t ( ⊤ denotes transp ose). The noise has mean zero and cov ariance h η ( x , t ) η ( x ′ , t ′ ) i = 2 δ ( x − x ′ ) δ ( t − t ′ ). W e c ho ose a double-w ell p oten tial V ( φ ) = − θ 2 φ 2 + χ 4 φ 4 . As in [11] χ is a constan t, and θ is a function of temp erature suc h that a high θ corresp onds to a low temp erat ur e. W e use a discrete form of (1) using a forw ard Euler-Maruyama sto chas tic in tegrator and a 5- p oin t stencil for the Laplacian (denoted ∆ L ) f or sim ulat io n: φ ( x , t + δ t ) = φ ( x , t ) + D δ t ∆ L φ ( x , t ) − δ t [ − θφ ( x , t ) + χφ 3 ( x , t )] + q 2( δ t/ δ x ) N ( x , t ) with time step δ t and lattice spacing δ x , and where N ( x , t ) are independent and iden tically distributed standard normal random v ariables for eac h space- time p o in t ( x , t ). What follo ws applies to o ther discretization sc hemes as well. 2.2 Estimation pr oblem T o cov er a broa d range of estimation problems, we consider the estimation of quan tities related to a speciﬁc space-time p oint, quan tities that are global (en- 4 tire lat t ice at a par ticular time) and quantities that dep end on the en tire time ev olution of the system. Sp eciﬁcally , we consider the follow ing represen tativ e quan tities: (P1) Point magne tiza tion : φ ( x , t ), (P2) T otal m a gnetization at a sp e ciﬁc time t : P x φ ( x , t ), and (P3) T otal sp ac e-time m agnetization : P t P x φ ( x , t ). The problem of estimating t he exp ected v a lue of an y one of the ab ov e quan- tities can b e represen ted by: J ( θ ) = E [ Y ( ω ; θ ))] where ω is a vector of random n um b ers represen ting all the noise/uncertain t y in a single complete path φ of the dynamics; θ is the temp erature related parameter; Y ( ω ; θ ) is the r a ndom sample of a quan tit y of interes t (e.g., the magnetization f rom a single sample path), and E [ · ] denotes exp ectation. Note that knowin g the noise ω and par a meter θ completely determines the path φ and the sample quan tit y of in terest Y ( ω ; θ ). 2.3 The c o ntr ol variate te chnique Here we give a brief review of the classical con trol v ariate (CV) tec hnique fo r v ariance reduction (see [3] [12]). Let Y = Y ( ω ; θ ), J = E [ Y ]. Assume X 1 , · · · , X k are random v aria bles (called con trol v aria t es) that are correlated with Y and assume their means E [ X i ] are known. Let X = ( X 1 , · · · , X k ) ⊤ , E [ X ] = ( E [ X 1 ] , · · · , E [ X k ]) ⊤ , a nd β = ( β 1 , · · · , β k ) ⊤ . Then Z , deﬁned b elow, is a controlled estimator o f E [ Y ] Z = Y + k X i =1 β i ( X i − E [ X i ]) = Y + β ⊤ ( X − E [ X ]) The estimator Z uses information included in samples of the con trols (the degree of their deviation from their known means) to “ correct/adjust” the estimator Y and bring it closer to its unkno wn mean. This is the k ey idea of CV. (Alternative ly , Z can b e view ed as t he ﬁtted v alue of Y when Y is linearly regressed on v ariables X 1 , · · · , X k . In o t her words, Z includes the part of the v ariation in Y that cannot b e “explained” by X i ’s.) Z is a n unbias ed estimator of E [ Y ] for all v ectors β . The co eﬃcien t v ector 5 that minimizes the v ar ia nce of Z is: β o = Σ − 1 X Σ X Y where Σ X is t he k × k co v ariance matrix o f X and Σ X Y is t he k × 1 v ector of co v ariances of Y and X i ’s. When β o is used, the v ariance of Z is giv en by (1 − R 2 ) σ 2 Y where R 2 = Σ ⊤ X Y Σ − 1 X Σ X Y /σ 2 Y . Therefore, V ar ( Y ) V ar ( Z ) = (1 − R 2 ) − 1 (2) and hence (1 − R 2 ) − 1 is precisely the theoretical degree of v ar ia nce reduction if the con trolled estimator Z = Y + β o ⊤ ( X − E [ X ]) is used to estimate J as opp osed to the crude MC estimator Y , and it is called the V ariance Reduction Ratio (VRR) statistic for control v ariates. Note that there is no upp er limit to the degree o f ac hiev able v ariance reduction since R 2 can p o ten tially b e v ery close to 1 when the controls are hig hly correlated with the estimation v ariable Y . In ot her w ords, the CV tec hnique can p otentially b e ve ry eﬀectiv e leading to orders of magnitude o f v ariance reduction. In practice and in general, Σ X and Σ X Y (i.e. β o ) are not kno wn exactly and need to b e estimated from samples o f X i ’s and Y . T ypically , β o is estimated from the same samples used to construct the controlled estimator Z . While this practice adds some bias for small sample sizes, and th us mak es the eﬀectiv e decrease in estimator mean squared erro r not precisely equal to the v ariance reduction ratio (1 − R 2 ) − 1 , this bias con v erges to zero faster than the standard error of Z . Th us, exp ending computatio nal resources in to generating separate pilot samples for estimating β o is not considered to b e justiﬁable. F or an insigh tful and detailed discussion of the CV tec hnique, see [3]. 2.4 Chal lenges in using the CV te chnique The critical task f o r using the CV tec hnique is in ﬁnding eﬀectiv e controls. Once the con trols are selected, the rest of the pro cedure is fairly routine. An eﬀectiv e con trol, sa y X , needs to satisfy t w o requiremen ts ( t o simplify the discussion we consider a scalar con trol): (R1) X needs to b e correlated with Y , and (R2) E [ X ] needs to b e a v ailable to the user, i.e., known. The main barrier to ﬁnding eﬀectiv e con trols is the second requiremen t, na mely the requiremen t of a kno wn mean E [ X ]. A mo diﬁcation of the CV techniq ue called Bia sed Control V ariat e (BCV) reduces the burden of requiremen t (R2) 6 b y allo wing for a go o d appro ximation of E [ X ] when E [ X ] cannot b e ev a lu- ated analytically [13]. While BCV low ers the requiremen t barrier a nd expands the range of a v ailable c hoices f o r eﬀectiv e controls, it nonetheless limits its p oten tial scop e by implicitly assuming an analytic path to a rriving at the ap- pro ximate v alue. As w e describ e in the next section, in the DBMC approach w e turn the second requiremen t in to a computatio nal ta sk; in other w ords, w e use statistical estimation to obta in a g o o d estimate of E [ X ]. Therefore, barrier (R2) is completely remo v ed and t he range of c hoices of controls is dra matically expanded. The relev ant question now b ecomes whether the computational in- v estmen t in estimating E [ X ] pa ys enough dividends to mak e the in v estmen t w orth while. 3 DBMC & Co ntrol V ariate The starting p oin t of the DBMC approa ch is the observ ation t hat in man y parametric estimation settings, including in the example considered in this pap er, quantities Y ( θ ) and Y ( θ ′ ) are highly correlated when the same random input ω is used to generate them and when θ and θ ′ are close 1 . This suggests using con trol v ariates X i = Y ( θ i ), i = 1 , · · · , k , when estimating Y ( θ ) where θ i ’s are “close” to θ . While w e ha v e iden tiﬁed p ot entially eﬀectiv e controls, w e do no t hav e suﬃcien t information ab out them, i.e., J ( θ i ) = E [ X i ] is not kno wn and needs to b e ev aluated. This brings us to the second feature of the DBMC metho d that corresp onds to its initial computational information gat hering/ setup stage. This stage corresp onds to statistical estimation of J ( θ i ). D etails ar e g iven b elo w. 3.1 DBMC + CV algorithm The DBMC approach consists of a setup stage and an estimation stage. 3.1.1 Setup stage The D BMC setup phase in v olv es generating a “ la rge” n um b er of input ran- dom ve ctors ω and obtaining “high quality ” estimates of J ( θ i ). Let D B = 1 A similar obser v ation is the basis for the histogram re-w eigh ting metho ds: “from a sim ulation at a single state p oin t (charac terized in an Ising mo del by choic e of temp erature T and magnetic ﬁeld H) one d o es not gain information on pr op erties at that p oin t only , bu t also in the n eigh b oring region,” ([1], page 116) 7 { ω 1 , ω 2 , · · · , ω N } ( N “la r g e”) denote a large set of random inputs. This set represen ts the datab ase . Give n the database, the av erages of the controls are precisely calculated. A sc hematic of this stage is give n in Figure 1 . (1) F or j = 1 , · · · , N (a) Generate ω j according to the distribution of the inputs; (b) F o r i = 1 , · · · , k (i) Sim ulate the path φ ( ω j ; θ i ) (ii) Ev aluate the v alue of the control X i ( ω j ) = Y ( ω j ; θ i ) (2) F or i = 1 , · · · , k (a) Find J D B ( θ i ), the a v erage of the i th control on the datebase, as J D B ( θ i ) = 1 N N X j =1 Y ( ω j ; θ i ) Fig. 1. DBMC setup stage 3.1.2 Estimation stage T o estimate J ( θ ), at a θ close to θ i ’s ( i = 1 , · · · , k ) select a “small” sample (sa y of size n ≪ N ) uniformly from the database. F or eac h sample ω j re- sim ulate the equation using ω j and θ to obtain Y ( ω j ; θ ) . F or these samples the v alues of the controls X i ( ω j ) are a v aila ble in the database. Using these ev aluate a con trolled estimate of J ( θ ). A sc hematic v ersion of these steps is giv en in Figure 2. (1) F or j = 1 , · · · , n (a) Select ω j uniformly f r om the database; (b) Sim ulate the path φ ( ω j ; θ ) ; (c) Ev aluate the estimation v ariable Y ( ω j ; θ ) . (2) Find the controlled estimator of J ( θ ): b J cv ( θ ) = 1 n n X j =1 [ Y ( ω j ; θ ) + k X i =1 β o i ( X i ( ω j ) − J D B ( θ i ))] (3) Fig. 2. DBMC estimation stage 8 3.2 Implementation choic es There a re tw o general sche mes f o r implemen tat io n of our CV approa c h: (I1) corresp onding to what is described ab ov e, requires storing sim ulation inputs { ω j } and outputs { X i ( ω j ) } in a databa se fo r later resampling; (I2) do es not utilize resampling, so there is no stora ge o f data b ey ond recording the calcu- lated con trol means. Both implemen tations are feasible, the ﬁrst is preferable in most cases; the sec ond ma y b e preferred in some cases. W e elabo rate b elo w. Implemen ta tion (I1). • The databa se of r a ndom inputs, i.e., ω j ’s, a r e either directly stored or enough information ab o ut them (e.g. input seeds of a pseudo-random num b er gen- erator) is stored to b e able to regenerate ω j ’s precisely . • The k paths corresp onding to θ i , i = 1 , · · · , k , i.e., φ ( ω j ; θ i ) are generally sim ulated “in parallel” a s elemen ts of a random ve ctor, ω i , are progressiv ely generated. • F o r eac h random input, say ω j , the v alue of the controls, X i ( ω j ), j = 1 , . . . , N , i = 1 , . . . , k are stored. Implemen ta tion (I2). • Once the setup stage is completed, the only v alues stored are the “high qualit y” estimates of the means of J ( θ i )’s, i.e., the k v alues J D B ( θ i ), i = 1 , · · · , k . • A t t he estimation phase, n random input v ectors ω j , j = 1 , . . . , n , are generated anew ; paths at θ and θ i , i = 1 , · · · , k , a re sim ulated using new random inputs and for each pa t h X i ( ω j ) and Y ( ω j ; θ ) are calculated; ﬁnally , using these v alues, the con trolled estimator is ev aluated. 3.3 Statistic al pr op erties & c om putational eﬃciency The pro mise of the approac h is the following: b y anc horing estimation via CV at a few high qualit y estimates (at θ 1 , · · · , θ k ), it is p ossible t o obtain high qualit y estimates at other lo cations in the par a meter space (at other θ ) with far few er samples. The actual statistical pro p erties of the resulting estimators, and the computational eﬃciency of generating them, reﬂect c hoices made in implemen t ing eac h giv en problem. F or example, how muc h computation should b e “inv ested” in the exploration phase, and which p o ints θ i in the par ameter space should b e explored are tw o imp o rtan t questions that need further in v es- tigation. Suc h c hoices generally in v olv e problem dep enden t tradeoﬀs, and w e lea v e them to future studies. 9 Instead, the analysis that follo ws is mean t to pro vide a general and qualitative understanding of the stat istical prop erties, computational eﬃciency and t he tradeoﬀs inv olv ed. The discussion is a s general as p o ssible, but consisten t with the n umerical study described in section 4, where suc h implemen tation c hoices w ere ma de utilizing o nly a basic familiarity with the pro blem. F or further discussion, see [8]. 3.3.1 Statistic al pr op erties W e g iv e the ana lysis for implemen tation ( I1 ) . In ot her w ords, assume we are re-sampling from the database. Analysis of implemen ta t ion (I2) show s similar estimator statistical prop erties. T o simplify the discussion consider a single con trol, say X 1 . Let J ( θ ) = E [ Y ], J ( θ 1 ) = E [ X 1 ], σ 2 Y = V ar ( Y ), σ 2 X 1 = V ar ( X 1 ). Assume a data ba se of input v ariables are g enerated and let Y ∗ and X ∗ 1 b e r andom v ar ia bles corr esp o nding to Y and X 1 that are generated b y re-sampling (uniformly , with replacemen t) from the da t abase. Let J ∗ ( θ ), σ 2 Y ∗ , J ∗ ( θ 1 ), σ 2 X ∗ 1 denote the means and v ariances of the re-sampled v ar ia bles Y ∗ and X ∗ . Conditioned on t he database, the controlled estimator is exactly the classical CV estimator and all results from classical CV apply . F or example, for an y scalar β , Z ∗ = Y ∗ + β ( X ∗ − J ∗ ( θ 1 )) is an unbiase d estimator of J ∗ ( θ ), J ∗ ( θ 1 ) is kno wn, a nd the optimal β o is what is prescrib ed b y classical CV if w e ta k e all random v ar ia bles as those deﬁned o n the database. A measure of v aria nce reduction due to using a con trolled estimator is V RR = σ 2 Y ∗ σ 2 Z ∗ (4) W e use the con trolled estimator Z ∗ as an estimator for J ( θ ). Assume optima l β o is used to deﬁne Z ∗ and assume E [ Z ∗ ] = J ∗ ( θ ) 2 . In general J ∗ ( θ ) 6 = J ( θ ). Therefore, Z ∗ is a biased estimator of J ( θ ) where the bias is in tro duced b y sampling fro m the database, i.e., from Y ∗ , as o pp osed to from Y . W e ha v e some probabilistic assessmen t of t his bias and w e can reduce it by increasing the size of the database. Sp eciﬁcally , for this bias we can obtain an appro ximate 1 − α pr o babilit y conﬁdence in terv al: P ( | J ∗ ( θ ) − J ( θ ) | ≤ z α/ 2 σ Y √ N ) ≈ 1 − α 2 i.e. we ignore th e low order bias that results from the t ypical CV pro cedur e of estimating the optimal β o e.g. [3], not to b e confused with the resamplin g bias discussed in this section 10 where z α/ 2 is the 1 − α/ 2 quan tile from the standard normal distribution. In other w ords, with high probability the bias is of the order o f O ( N − 1 / 2 ). W e assume that for lar ge N the bias is suﬃcien tly small to b e disregarded and that w e can fo cus on V RR in (4) as the ke y measure of computational gain in using the con trolled estimator to estimate J ( θ ). 3.3.2 Computational eﬃcienc y Generating the ab ov e large database, as w e p ointed o ut earlier, corresp onds to an initial “setup” cost. Let C b e the computational cost of generating a sample of Y ( θ ). This cost inv olve s generating an ω , sim ulating the pa t h, a nd ev aluating Y ( ω , θ ). A reasonable assumption for many problems is that this cost is ab out the same for all θ and ω . Then the set-up cost of generating t he database and obtaining av erages of the con trols is approximately N × k × C . Let V RR ( θ ) denote the v ariance reduction ra tio at θ , i.e., the ratio of the v ariance of a n uncon trolled sample a nd that o f a con trolled sample at θ . Then, the statistical error of a con trolled estimator ba sed on n samples is a ppro ximately the same as that o f n × V RR ( θ ) samples of an uncontrolled estimator. Th us, the ratios o f the computational costs of the tw o estimators (to arriv e at the same statistical accuracy) is ( n × V RR ( θ ) × C ) / ( n × C ) = V R R ( θ ). Therefore, V RR ( · ) can serv e a s a measure of b eneﬁt of the D BMC approach. The setup cost of the DBMC approa c h can b e justiﬁed in tw o t yp es of ap- plications. The ﬁrst t yp e are those a pplications t ha t require solving man y instances o f the estimation problem, at man y θ ’s. If the total num b er of in- stances is suﬃcien tly large, and some v aria nce reduction is achie v ed on the a v erage on those instances, then the large ﬁxed set-up cost can b e dw arfed by the total computational sa vings from the many estimations. The second type are real-time applications where the setup cost can b e view ed as an oﬀ-line cost enabling signiﬁcan t eﬃciency ga ins in the critical task of real-time esti- mation. Typ ically , the “cost” of dela y in suc h real-time estimation is hig her and not merely computational, justifying ev en a m uc h la rger computatio na l eﬀort oﬀ-line. 4 Numerical results The numerical results in this sec tion are in tended to give a qualitativ e illustra- tion of the eﬃciency gains that can b e ac hiev ed using the DBMC appro a c h. Sp eciﬁcally , w e estimate the v ariance reduction that can b e achiev ed ov er regular (crude) sampling, when estimating the three quan tities of in terest (a p oin t magnetization, to tal mag netization at a sp eciﬁc time t and the t o tal time-space magnetization) a t a range of the parameter θ . Our c hoices of the 11 size of the data ba se, num b er of samples used for estimation, rang e of parame- ter v alues, and the controls are simply f or illustration purp oses. How eve r, w e exp ect that the n umerical r esults are, qualitativ ely , quite represen tativ e. W e simulate the TDGL dynamics on a 40 × 40 lattice (lattice spacing δ x k = 1, k = 1 , 2) with ﬁxed χ = 1. On eac h path, w e ev olv e the system for a to t a l of 5000 time steps ( δ t = 0 . 01) whic h is suﬃcien t f or the system to exhibit b eha vior that is sp eciﬁc to its temp erature region. The critical p oint for this system is θ c = 1 . 265 [1 1], and our parameter range of in terest (1 . 0 to 1 . 5) extends to b oth sides o f that critical p oint. T o build a data base, w e sim ulate N = 2 14 = 16384 paths and ev aluate p oint magnetization, total mag netization at a sp eciﬁc time, and total space-time magnetization at tw o nominal v a lues of θ , 1 . 2 and 1 . 35. F o r eac h quan tit y of in terest, w e consider three con trol v ariate estimators. The ﬁrst t w o estimators, CV1.2 and CV1.35 , use single con trols corresp onding to θ = 1 . 2 and θ = 1 . 3 5, respective ly . W e chose to anc hor our estimators at those tw o nominal v alues for θ b ecause they ar e lo cated o n opp o site sides of the phase t r a nsition line θ c . The third estimator, CV2C, uses b oth con trols sim ultaneously . W e use n = 2 8 = 256 samples for crude and CV estimators. T o estimate the v ariance of these estimators, follow ing the micro-macro sim ulation appro a c h (see, e.g., [14]), w e use 40 indep enden t macro sim ulations consisting of 25 6 in- dep enden t micro sim ulations. W e obtain v a riance estimates from eac h macro sim ulation and a v erage the resulting 4 0 v alues to obtain an ov erall v a riance estimate. W e rep o rt the ratio s o f the v ariance estimates (crude/con trolled, as in (4)) as V R R . A sampling of VRR results for the tot a l space-time mag neti- zation (problem P3) is giv en in T able 1 and the corresp o nding graph is giv en in Figure 3. The graph for p o in t magnetizatio n (problem P1) are give n in Fig. 4, and the results for the t otal magnetization at a t ime t (pro blem P2) are quite similar and are excluded. T able 1 V ariance r eduction ratios of the estimators applied to space-time in tegral of the magnetizati on, P x P t φ ( x , t ), at sev eral v alues of θ . Estimator \ θ 1.150 1.175 1.225 1.250 1.265 1.300 1.325 1.375 1.40 0 CV1.2 63 236 219 55 33 15 10 6 5 CV1.35 5 6 11 16 21 59 231 245 67 CV2C 170 709 947 332 259 300 761 5 13 129 Based on these results, we draw the following conclusions: • Con trolled estimators pro duce dramatic v aria nce reduction for parameter 12 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5 10 1 10 2 θ VRR Variance Reduction Ratios for the time integral of the magnetization CV12 CV135 CV2C Fig. 3. V ariance reduction ratios of the estimators of the space-time integral of the magnetizati on, P x P t φ ( x , t ), ov er a range of v alues of θ (log scale). v alues v ery close to the nominal parameters and substan tial v a r iance reduc- tion at v a lues mo derat ely close to the nominal. • F o r all the estimation pro blems, adding the second con trol consisten tly im- pro v es p erformance, in some cases leading to substan tial reduction in v ari- ance (compared to single controls). Of course, b y incorp orating information from p oin ts on b oth sides of the critical temp erature, CV2C is exp ected to giv e b etter co v erage than either of the single control estimators. Ho w ev er, CV2C do es b etter than either of the single con trol estimators ev en in their o wn regions, which suggests that eac h con trol pro vides relev an t info rmation to the estimation problem in t he opp osite region. • VRR v alues for the t o tal space-time magnetization are somewhat larger than those for the p oint and total magnetization at a sp eciﬁc time t – we exp ect this to b e true more generally fo r path in tegrals when compared with v alues at sp eciﬁc time instances. 13 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5 10 1 10 2 θ VRR Variance Reduction Ratios for the point magnetization CV12 CV135 CV2C Fig. 4. V ariance redu ction ratios of the estimators of the p oint magnetization o v er a range of v alues of θ , in log scale. 5 Conclusions In this pap er w e describ ed a new strategy , DataBase Monte Carlo (D BMC), for improving computational eﬃciency o f Ensem ble Mon te Carlo. F or a sp eciﬁc time-dep enden t nonlinear dynamics we sho w ed t hat the approach can lead to signiﬁcan t eﬃciency gains for a rang e o f estimation problems. Our selection of the controls has b een ad-ho c and for illustratio n purp o ses. F urther w ork is required to b etter understand the options av ailable a nd t he computational tradeoﬀs in v olv ed. T o this end, our curren t researc h is fo cused on (i) deriv ation of more sp eciﬁc guidelines for the selection of eﬀectiv e con trol v ariates, (ii) implemen t a tion of the DBMC strategy in conjunction with other v a riance reduction techniq ues, for example, strat iﬁcatio n and imp ortance sampling, and (iii) application o f the metho d in some sp eciﬁc domains, for example, estimation problems in geophy sical ﬂuids and bio che mical systems. 14 References [1] K . K. Binder, D. Heermann, Mon te Carlo S im ulation in Statistical Physic s: An In tro du ction, Sprin ger, 2002. [2] G. Ev ensen, Data Assimilation, The Ensemble Kalman Filter, Sp r inger, 2006. [3] P . Glasserman, Monte Carlo Metho d s in Financial En gineering, Sp ringer-V erlag New Y ork, Inc, 2004. [4] D. J. Wilkinson, Sto chastic Mo delling for S ystems Biology, CRC Press, 2006. [5] J . M. Hammersley , D. C. Handscom b, Monte Carlo Methods, John Wiley , 1964. [6] S . Asmussen, P . W. Glynn, S to c hastic Simulatio n: Algorithms and Analysis, Springer, 2007. [7] P . V akili, G. Zhao, T. Borogo v ac, Database mon te carlo: A new strategy for eﬃcien t simulati on, T ec h. r ep., Boston Univ ersit y College of Engineering (2008) . [8] T . Borogo v ac, P . V akili, Database mon te carlo approac h to eﬀectiv e control v ariates, T ec h. r ep., Boston Unive rsit y College of Engineering (2008). [9] A. M. F err en b erg, R. H. Swendsen, New monte carlo tec hnique for studying phase transitions, P h ys. Rev. Lett. 61 (23) (1988) 2635–263 8. [10] N. Gulbahce, F. J. Alexander, G. Johnson, Statistical mec hanics of histories: A cluster monte carlo algorithm, Phys. Rev. E 73 (2006) 026701. [11] A. L. F erreira, R. T oral, Hybrid mon te carlo metho d for conserve d-order- parameter systems, Phys. Rev. E 47 (6) (1993) R3848–R38 51. [12] C. P . Rob er t, G. Casella, Mon te C arlo Statistical Metho ds, S pringer S cience & Business Media Inc., 2004. [13] B. Sc hmeiser, M. R. T aaﬀe, J. W ang, Biased cont rol-v ariate estimatio n, I IE T r ansactions 33 (2001 ) 219–22 8. [14] B. Schmeiser, C hapter 7: Sim ulation exp er im ents, in: D. P . Heyman, M. J. Sob el (Eds.), Handb o oks in OR and MS 2: Sto c hastic Mo dels, Elsevier B. V., 1990, Ch. 7, p p. 295–330 . 15

A Control Variate Approach for Improving Efficiency of Ensemble Monte Carlo

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment