FiEstAS sampling – a Monte Carlo algorithm for multidimensional numerical integration

Yago Ascasibar
Astrophysikalisches Institut Potsdam, An der Sternwarte 16, Potsdam D-14482, Germany
Universidad Autónoma de Madrid, Dpto. Física Teórica, Campus de Cantoblanco, Madrid E-28049, Spain

Abstract

This paper describes a new algorithm for Monte Carlo integration, based on the Field Estimator for Arbitrary Spaces (FiEstAS). The algorithm is discussed in detail, and its performance is evaluated in the context of Bayesian analysis, with emphasis on multimodal distributions with strong parameter degeneracies. Source code is available upon request.

Key words: numerical integration, Bayesian inference, Monte Carlo importance sampling
PACS: 02.60.Jh, 02.50.Tt, 02.70.Uu

Email address: yago.ascasibar@uam.es (Yago Ascasibar).
Preprint submitted to Elsevier, 12 October 2018

1 Introduction

Many problems in Physics, as well as in other branches of science, involve the computation of an integral

I = \int_V f(\mathbf{x}) \, d\mathbf{x}    (1)

over a given multidimensional domain V. In most circumstances, the value of I has to be estimated numerically, evaluating the function f at a series of N sample points x_i. Numerical quadrature algorithms (see e.g. [1]) sample f on a regular grid or at a carefully selected set of unequally-spaced points (Gaussian quadrature). This approach is very efficient for smooth functions, but it is certainly far from optimal when the integrand is strongly peaked. Adaptive techniques provide a solution to this problem by refining those sub-intervals of the integration domain where f is found to vary on smaller scales. In multiple dimensions, Monte Carlo methods usually yield better accuracy for a given N than repeated integrations using one-dimensional deterministic algorithms.
The essence of the Monte Carlo method is that the sample points x_i are chosen (to some extent) at random. The most basic version uses a uniform probability distribution over the integration region, and the estimator for the integral I is given by

\hat{I}_{\rm MC} = \frac{V}{N} \sum_{i=1}^{N} f_i    (2)

where V denotes the integration volume and f_i ≡ f(x_i). The error associated to Î_MC can be estimated as

\hat{\Delta}^2_{\rm MC} = \frac{V^2}{N^2} \sum_{i=1}^{N} \left( f_i - \frac{\hat{I}_{\rm MC}}{V} \right)^2    (3)

Although the error decreases as N^{-1/2} (this is, in fact, why Monte Carlo methods are preferable in multidimensional problems), basic Monte Carlo is still computationally inefficient when the function f has a very narrow peak compared to the total integration volume. On the one hand, a large number of sample points would be required to obtain good estimates Î_MC and Δ̂_MC. On the other hand, and more importantly, it is quite possible that the algorithm achieves convergence, reporting a wrong value of the integral with an extremely small estimate of the error, before finding the maximum (or maxima) of f.

A refinement of this method, known as importance sampling, consists in drawing the points from a distribution similar in form to the integrand, so that they are more likely to come from the regions that contribute most to I. Rather than sampling the whole integration volume uniformly, one chooses a probability distribution g(x) that adapts to the shape of the integrand. For any g(x), the integral (1) can be re-written as

I = \int_V \frac{f(\mathbf{x})}{g(\mathbf{x})} \, g(\mathbf{x}) \, d\mathbf{x}    (4)

and the estimators of I and its error become

\hat{I}_{\rm IS} = \frac{1}{N} \sum_{i=1}^{N} \frac{f_i}{g_i}    (5)

and

\hat{\Delta}^2_{\rm IS} = \frac{1}{N^2} \sum_{i=1}^{N} \left( \frac{f_i}{g_i} - \hat{I}_{\rm IS} \right)^2    (6)

respectively. The key issue is, of course, the choice of the distribution g(x). These two expressions reduce to (2) and (3) for the particular case of uniform sampling, i.e. all g_i ≡ g(x_i) = 1/V.
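The estimators (2)-(3) and (5)-(6) can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation; the function names and the narrow-Gaussian test integrand are chosen here purely for the example.

```python
import numpy as np

def mc_uniform(f, lo, hi, n, rng):
    """Plain Monte Carlo, eqs. (2)-(3): uniform sampling over a box."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    vol = np.prod(hi - lo)                       # integration volume V
    x = rng.uniform(lo, hi, size=(n, lo.size))
    fx = np.array([f(xi) for xi in x])
    i_hat = vol / n * fx.sum()                   # eq. (2)
    err2 = vol**2 / n**2 * ((fx - i_hat / vol) ** 2).sum()  # eq. (3)
    return i_hat, np.sqrt(err2)

def mc_importance(f, sampler, pdf, n, rng):
    """Importance sampling, eqs. (5)-(6): points drawn from a pdf g."""
    x = sampler(n, rng)
    w = np.array([f(xi) / pdf(xi) for xi in x])  # f_i / g_i
    i_hat = w.mean()                             # eq. (5)
    err2 = ((w - i_hat) ** 2).sum() / n**2       # eq. (6)
    return i_hat, np.sqrt(err2)

# Example: integrate a narrow 1-D Gaussian over [-1, 1];
# the exact value is 0.05 * sqrt(2*pi) ≈ 0.1253.
rng = np.random.default_rng(0)
f = lambda x: np.exp(-0.5 * (x[0] / 0.05) ** 2)
est, err = mc_uniform(f, [-1.0], [1.0], 20000, rng)
```

With a uniform g (pdf equal to 1/V everywhere), `mc_importance` reproduces `mc_uniform`, which is the reduction noted after equation (6).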
In the ideal case, when g(x) = |f(x)|/I, all points contribute exactly the same amount to the sum in (5) and zero in (6), and the algorithm takes only a few N to achieve the desired accuracy. Unfortunately, choosing g(x) is not straightforward when the shape of f is unknown a priori, and evaluating it is actually part of our goal. A perfect example of such a situation is Bayesian inference, where one tries to select the model M that best describes the data D and estimate the values of the model parameters Θ. For a given model, Bayes' Theorem states that

p(\Theta | M, D) = \frac{ p(\Theta | M) \, p(D | M, \Theta) }{ p(D | M) }    (7)

where p(Θ|M) is the prior probability distribution of the model parameters, p(D|M, Θ) is the likelihood of the data, p(Θ|M, D) is the posterior probability distribution of the parameters, and p(D|M) is the evidence for model M,

p(D | M) = \int p(\Theta | M) \, p(D | M, \Theta) \, d\Theta    (8)

The posterior probability distribution (7) contains all the information needed for parameter estimation, and the evidence plays in this case the role of a mere normalization constant. Nevertheless, the value of the integral (8) becomes critical in model selection. Applying again Bayes' Theorem, two models M_1 and M_2 can be compared by evaluating the Bayes factor,

\frac{ p(M_1 | D) }{ p(M_2 | D) } = \frac{ p(M_1) \, p(D | M_1) }{ p(M_2) \, p(D | M_2) }    (9)

where p(M_i) denotes the prior probability of each model, p(D|M_i) are their evidences, and p(M_i|D) are the posterior probabilities. Even if one is only interested in parameter estimation, it is good practice to quote the evidence (as well as the assumed priors) in order to make possible the comparison with future work.
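To make equations (8)-(9) concrete, the evidence of a toy model can be estimated by averaging the likelihood over draws from the prior. The setup below (a one-dimensional Gaussian likelihood with a flat prior, and a rival model with the parameter fixed at zero) is entirely illustrative and not taken from the paper:

```python
import numpy as np

def log_likelihood(theta, data, sigma=1.0):
    """Gaussian log-likelihood p(D | theta) for iid data with known sigma."""
    return (-0.5 * np.sum((data - theta) ** 2) / sigma**2
            - 0.5 * len(data) * np.log(2 * np.pi * sigma**2))

def evidence(data, n=100_000, lo=-5.0, hi=5.0, seed=0):
    """Eq. (8) by brute force: the evidence is the prior-weighted average
    of the likelihood, so averaging exp(logL) over flat-prior draws works."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(lo, hi, n)               # samples from the prior
    logL = np.array([log_likelihood(t, data) for t in theta])
    return np.exp(logL).mean()

data = np.array([0.1, -0.2, 0.3])
z1 = evidence(data)                  # model 1: free location theta
z2 = np.exp(log_likelihood(0.0, data))   # model 2: theta fixed at 0
bayes_factor = z1 / z2               # eq. (9), assuming equal model priors
```

As the text stresses, this brute-force average is exactly the regime where plain Monte Carlo struggles: a narrow likelihood peak inside a broad prior wastes almost all evaluations, which motivates the adaptive g(x) developed below.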
The computational part of a Bayesian analysis consists thus in the evaluation of the integral I = p(D|M) = ∫ f(Θ) dΘ, where f(Θ) = p(Θ|M) p(D|M, Θ), over the region of the parameter space where p(Θ|M) > 0. Since each evaluation of the likelihood may involve a significant computational cost, the algorithm should locate the maxima of f(Θ) as efficiently as possible, even when this function has a complicated structure. In particular, it is not uncommon that f(Θ) presents several maxima, corresponding to different sets of parameter values that provide a good fit to the data. Which solution is to be preferred depends, within the Bayesian framework, on the value of the integral (the evidence) over the region associated to each peak. Even more often, there are strong degeneracies in parameter space. The maxima of the function f can be extremely elongated, generally with a curved shape arising from a non-linear relationship between two or more parameters.

These two problems (multimodality and curved degeneracies) have proven difficult to overcome for many standard algorithms, especially as the number of dimensions increases. Markov Chain Monte Carlo methods (see e.g. [2]) typically require a relatively large number of evaluations. Nested sampling [3] provides a more efficient alternative, transforming the multidimensional integral (1) into a single dimension by means of a re-parameterization in terms of the 'prior mass' p(Θ|M) dΘ. This formalism has recently been extended to cope with the problems of multimodality and curved degeneracies by resorting to clustering algorithms [4,5]. On the other hand, traditional importance sampling methods (e.g. Vegas [6]) assume a separable weight function, i.e.

g(\mathbf{x}) = \prod_{d=1}^{D} g_d(x_d)

which can give a good approximation to f(x) only as long as the characteristic features are well aligned with the coordinate axes. An adaptive refinement technique has been implemented in the algorithm Suave [7] to improve the performance in the general case.

Here I present a new method, based on the Field Estimator for Arbitrary Spaces (FiEstAS) developed by [8] to estimate the continuous density field underlying a given point distribution. The algorithm is fully described in Section 2, results are presented in Section 3, and the main conclusions are summarized in Section 4.

2 Description of the algorithm

FiEstAS sampling performs multidimensional numerical integration by means of Monte Carlo importance sampling. Here I discuss in detail the most relevant algorithmic issues; actual source code is available upon request. The method has three free parameters: the desired relative accuracy ε, the fraction of points that are uniformly distributed η_u, and the fractional increase in the number of points per step η_n. Default values are ε = 0.01, η_u = 0.1, and η_n = 0.1. For the first step, the number of new sample points is set to n = 1. The main loop of the program can be summarized as

(1) generate η_u n new points uniformly
(2) compute g(x) for this step using FiEstAS
(3) sample n_F = (1 − η_u) n new points from g(x)
(4) evaluate the integral and its error
(5) increase n by a fraction η_n
(6) repeat until the desired accuracy is achieved

2.1 Uniform sampling

FiEstAS sampling is intended to adapt the probability distribution g(x) to the shape of the integrand within the fewest possible number of iterations.
As will be shown below, the presence of pronounced curved degeneracies does not pose a significant problem for the algorithm, but the discovery of narrow, isolated maxima is, at best, a very hard problem, and a large number of sample points may be required in order to achieve the correct result. Even worse, there can be no guarantee that all the maxima of f have been located, and therefore it is important to reach a compromise between the detection of potential multimodality and sampling efficiency. Sampling from a uniform distribution is obviously the most unbiased way to explore the integration domain in search of previously unknown maxima. The fraction of points (i.e. computational effort) devoted to such exploration is controlled by the parameter 0 ≤ η_u ≤ 1. A low value would be adequate for functions where multimodality is not a concern, either because the positions of the maxima are already known by other means, because there can only be one, or because the different peaks are not too isolated (i.e. the contrast between the maximum and the saddle point towards the nearest detected peak is not large). A high value of η_u would help the algorithm locate the peaks, but it would render it slow. What constitutes a reasonable compromise between accuracy and computational resources depends, of course, on the details of the application being considered, and the adopted default value should not preclude the user from applying his/her own judgment.

2.2 Choice of g(x)

The prescription adopted for g(x) is arguably the heart of the method. The idea is to use the set of sample points computed so far to estimate the shape of the integrand. First, each point is assigned a hypercubical cell by the FiEstAS algorithm (see [8] and Appendix A). Let the volumes of these cells be denoted as v_i.
Then,

g(\mathbf{x}) = \frac{ \sum_{i=1}^{N} w_i \Pi_i(\mathbf{x}) }{ \sum_{i=1}^{N} w_i v_i }    (10)

with Π_i(x) = 1 if x belongs to the i-th cell, and 0 otherwise.

The choice of weights w_i = 1 corresponds to uniform sampling, and w_i = |f_i| corresponds to g(x) ≈ |f(x)|/I. This is the optimal probability distribution when the integrand shape is already well described by the sample points, i.e. f(x) ≈ Σ_{i=1}^{N} f_i Π_i(x). In practice, convergence is faster if one uses instead

w_i^2 = \left\langle f^2 \right\rangle_i = \frac{1}{n_{\rm nei}} \sum_{j=1}^{n_{\rm nei}} f_j^2    (11)

where the sum is performed over the n_nei neighbouring cells in the FiEstAS tessellation, including the cell i itself. This prescription tends to sample regions where |f| is large and/or varies on small scales (comparable to the sampling density), and therefore it combines in a simple way the advantages of importance and stratified sampling. Considering the values of f in adjacent cells is important in order to efficiently explore those regions where large gradients are found, such as newly discovered maxima or pronounced degeneracies.

2.3 Sampling from g(x)

Each step, n_F = (1 − η_u) n new points are generated from the distribution (10). The probability that each of these points belongs to cell i is given by

p_i = g_i v_i = \frac{ w_i v_i }{ \sum_{j=1}^{N} w_j v_j }    (12)

and therefore the cumulative distribution can be computed as

P_i = \sum_{j=1}^{i} p_j = P_{i-1} + \frac{ w_i v_i }{ \sum_{j=1}^{N} w_j v_j }    (13)

with P_0 = 0. Sampling from P_i is then trivial. A uniform random number between 0 and 1 is generated, and the value of i is obtained from a binary search. The new point is created at a random location, uniformly distributed within cell i.

2.4 Evaluation of the result

From the n_F = (1 − η_u) n new points generated during the current step,

\hat{I}_s = \frac{1}{n_F} \sum_{i=1}^{n_F} \frac{f_i}{g_i}    (14)

and

\hat{\Delta}_s^2 = \frac{1}{n_F^2} \sum_{i=1}^{n_F} \left( \frac{f_i}{g_i} - \hat{I}_s \right)^2    (15)

To estimate the value of I, the estimates of the last S steps are combined according to

\hat{I} = \frac{1}{S} \sum_{i=0}^{S-1} \hat{I}_{s-i}    (16)

with an associated error

\hat{\Delta}^2 = \frac{1}{S(S-1)} \sum_{i=0}^{S-1} \left( \hat{I}_{s-i} - \hat{I} \right)^2    (17)

The number of steps S is set by the condition

\left| \hat{I}_{s-S} - \hat{I} \right| > 3 \min\{ \hat{\Delta}_{s-S}, \hat{\Delta} \}    (18)

if S ≥ 4. Else, the procedure is assumed not to have converged yet, and only the information from the last step is considered (i.e. Î = Î_s and Δ̂ = Δ̂_s).

2.5 Iteration

At the end of each step, the total number of new points n is increased by a fixed fraction η_n. Since the FiEstAS tree has to be updated once per step in order to estimate g(x), a small value of η_n increases the overhead of the algorithm. However, a frequent update results in a more accurate sampling, and therefore a smaller number of function evaluations. A compromise should be sought for each particular problem, taking into account the computational cost of evaluating the integrand and setting η_n so as to minimize the total CPU time. Otherwise, the precise value of this parameter does not significantly affect the results.

2.6 Convergence

The main loop continues generating new points until the estimated relative error drops below the value of the tolerance parameter ε,

\hat{\Delta} < \epsilon \, | \hat{I} |    (19)

Again, the default value should be interpreted as a guideline, subject to trial and error, common sense, and the specific requirements of the problem at hand. Note that, as mentioned above, the condition S ≥ 4 is also imposed in order to ensure that the algorithm has actually converged. This criterion is only important in those few cases where a new maximum has just been discovered at the time Δ̂ decreased below the requested tolerance.
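The cell-sampling step of Section 2.3 reduces to a weighted discrete choice (via the cumulative distribution and a binary search) followed by a uniform draw inside the selected box. A minimal sketch, assuming cells are stored as arrays of lower/upper corners with precomputed weights; the data layout and names are illustrative, not the author's implementation:

```python
import numpy as np

def sample_from_cells(lo, hi, w, n, rng):
    """Draw n points from the piecewise-constant g(x) of eq. (10).

    lo, hi : (N_cells, D) arrays with the corners of each hypercubical cell
    w      : (N_cells,) importance weights w_i
    """
    v = np.prod(hi - lo, axis=1)        # cell volumes v_i
    p = w * v / np.sum(w * v)           # cell probabilities, eq. (12)
    P = np.cumsum(p)                    # cumulative distribution, eq. (13)
    u = rng.uniform(size=n)
    idx = np.searchsorted(P, u)         # binary search for the cell index
    # uniform position inside each selected cell
    return lo[idx] + rng.uniform(size=(n, lo.shape[1])) * (hi[idx] - lo[idx])

# Two unit cells in 1-D with weights 1 and 3: ~75% of draws land in [1, 2).
rng = np.random.default_rng(0)
pts = sample_from_cells(np.array([[0.0], [1.0]]), np.array([[1.0], [2.0]]),
                        np.array([1.0, 3.0]), 10000, rng)
```

`np.searchsorted` performs exactly the binary search described in the text, so the cost per sample is O(log N) in the number of cells.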
3 Results

The performance of FiEstAS sampling has been tested by applying the algorithm to a series of toy multidimensional problems with known analytical solution. The integrands have been chosen to emphasize the issues of multimodality and the presence of non-linear degeneracies. I also consider the test suite proposed by Genz [9] and compare the results with those of other algorithms. Two different aspects have been considered: the accuracy of the solution (the actual value of the integral Î, the quality of the error estimate Δ̂, and the ability to find all maxima) and the computational cost (in terms of the number of evaluations of the function f as well as the overhead in CPU time incurred by the FiEstAS tree). All the tests have been carried out on a Pentium 4 processor with a clock rate of 3.2 GHz. The parameters of the algorithm are always set to their default values.

3.1 Toy problem I

As a first example, the method is used to compute the integral of a linear combination of five multivariate Gaussians with different values of σ, located at arbitrary points in the xy-plane. The locations and dispersions are the same as in [5], i.e. x_i = {−0.4, −0.35, −0.2, 0.1, 0.45}, y_i = {−0.4, 0.2, 0.15, −0.15, 0.1}, and σ_i = {0.01, 0.01, 0.02, 0.03, 0.05}, but each of the Gaussian components has been normalized in order to facilitate the counting of the number of peaks detected,

f(\mathbf{x}) = \sum_{i=1}^{5} \frac{1}{(2\pi\sigma_i^2)^{D/2}} \exp\left[ -\frac{ |\mathbf{x} - \mathbf{c}_i|^2 }{ 2\sigma_i^2 } \right]    (20)

where D is the dimensionality of the problem and c_i is the centre of each Gaussian, given by x_i and y_i. The correct value of the integral (performed from −1 to 1 along all axes) is thus I ≃ 5. In the Bayesian framework, each of the maxima would correspond to a different solution.
Since all the individual evidences are equal, these five solutions would be equally valid, and none of them would be preferred over the others. From a frequentist point of view, the narrowest peaks should be favoured because they yield the highest values of the likelihood f.

The results of ten independent runs with different random seeds are listed in Table 1 for the two-dimensional case. In a larger set of one hundred runs, the number of peaks identified by the algorithm was three (4 times), four (56 times), and five (40 times). The narrow Gaussian at (−0.4, −0.4) was the most difficult to detect. Although it has the same dispersion as the second component at (−0.35, 0.2), the latter is relatively close to another broader maximum, and thus it was much easier to identify. Neglecting the contribution from missed peaks, the estimated error always corresponded fairly well to the actual deviation with respect to the analytical solution. The number of sample points required to achieve convergence ranged from N = 1990 to 11557, and the run time was in all cases of the order of a few tenths of a second. The sample points used by the last run are plotted in Figure 1.

   Î      Δ̂      N    t_CPU
 3.982  0.037  3974   0.31
 3.979  0.026  3603   0.27
 5.095  0.047  3603   0.27
 4.026  0.024  1990   0.14
 3.958  0.032  3603   0.27
 4.035  0.040  3603   0.27
 4.032  0.039  2682   0.19
 5.052  0.044  5325   0.42
 4.144  0.039  3603   0.27
 4.998  0.048  7127   0.57

Table 1: Results obtained by 10 independent runs of FiEstAS sampling for the toy problem I in two dimensions. First and second columns quote the estimates given by equations (16) and (17), respectively, followed by the total number of evaluations and the CPU time (in seconds) required by the algorithm.

Fig. 1. Sample points used by the last run in Table 1.

The dependence on the number of dimensions D is shown in Table 2. The five Gaussians are kept at the same locations in the xy-plane, with the same dispersions and normalized to unit evidence, but they refer now to D independent variables ranging from −1 to 1. Each entry on the table corresponds to the average and standard deviation of ten independent runs with different random seeds. FiEstAS sampling typically discovers four or five peaks for D = 2, which leads to the average value ⟨Î⟩ = 4.5 ± 0.5. As the number of dimensions increases, detecting each of the isolated maxima becomes more difficult, and the value of ⟨Î⟩ decreases. Note, however, that large dispersions σ_I indicate that the number of peaks detected may vary considerably between different runs. For instance, in D = 6 dimensions, the algorithm stopped after finding only the broad maximum at (0.45, 0.1) in eight out of the ten runs, whereas all peaks were correctly identified in the other two.

 D   ⟨Î⟩ ± σ_I        ⟨Δ̂/Δ_true⟩ ± σ_Δ    ⟨N⟩ ± σ_N            ⟨t_CPU⟩ ± σ_t
 2   4.490 ± 0.485    0.767 ± 1.836       4400 ± 1397          0.34 ± 0.12
 3   3.910 ± 0.309    2.128 ± 3.254       18896 ± 6003         2.87 ± 1.00
 4   4.161 ± 0.325    0.722 ± 3.254       86084 ± 40075        24.64 ± 12.07
 5   3.525 ± 1.362    1.441 ± 2.796       213956 ± 115194      104.41 ± 59.64
 6   1.790 ± 1.577    0.796 ± 2.186       482739 ± 545036      380.49 ± 487.6
 7   2.001 ± 1.615    1.343 ± 2.383       2937754 ± 3908586    4207.78 ± 6332
 8   1.501 ± 1.094    0.421 ± 4.539       6578407 ± 5835573    13554.28 ± 15750

Table 2: Results for toy problem I in D dimensions, averaged over ten independent runs. Columns show D, the value of Î, the deviation of the estimated error Δ̂ with respect to Δ_true (see text), the number of evaluations, and the CPU time in seconds.
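For reference, the integrand of equation (20) is straightforward to code. The sketch below uses the centres and dispersions quoted in the text; the handling of dimensions beyond the xy-plane (centred at the origin here) is an assumption of this illustration, and the helper name is invented:

```python
import numpy as np

# Centres (first two coordinates) and dispersions from Section 3.1.
CENTRES = np.array([[-0.40, -0.40], [-0.35, 0.20], [-0.20, 0.15],
                    [0.10, -0.15], [0.45, 0.10]])
SIGMAS = np.array([0.01, 0.01, 0.02, 0.03, 0.05])

def toy_problem_1(x):
    """Sum of five normalized Gaussians, eq. (20); the integral over
    [-1, 1]^D is ~5, since each component carries unit 'evidence'."""
    x = np.asarray(x, float)
    D = x.size
    c = np.zeros((5, D))
    k = min(D, 2)
    c[:, :k] = CENTRES[:, :k]   # extra dimensions assumed centred at 0
    r2 = np.sum((x - c) ** 2, axis=1)
    return float(np.sum(np.exp(-0.5 * r2 / SIGMAS**2)
                        / (2 * np.pi * SIGMAS**2) ** (D / 2)))
```

At the centre of the narrowest component the integrand reaches roughly 1/(2πσ²) ≈ 1.6 × 10³ in two dimensions, while it is essentially zero over most of the [-1, 1]² square, which is precisely why uniform exploration struggles to find all five peaks.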
The values of ⟨Δ̂/Δ_true⟩ correspond to the geometric average of the estimated error Δ̂ divided by the 'true' error Δ_true = |Î − I′|, where I′ is the nearest integer to Î. More precisely,

\left\langle \hat{\Delta}/\Delta_{\rm true} \right\rangle \equiv \exp\left[ \left\langle \ln\!\left( \frac{\hat{\Delta}}{|\hat{I} - I'|} \right) \right\rangle \right]    (21)

and

\sigma_\Delta \equiv \exp\left[ \left\langle \ln^2\!\left( \frac{\hat{\Delta}}{|\hat{I} - I'|} \right) \right\rangle - \left\langle \ln\!\left( \frac{\hat{\Delta}}{|\hat{I} - I'|} \right) \right\rangle^2 \right]    (22)

Since all of the Gaussians have unit evidence, I′ corresponds to both the number of peaks detected and the analytical value of the integral after subtracting the contribution from the undetected components. The results obtained, ⟨Δ̂/Δ_true⟩ ∼ 1 ± 3, suggest that Δ̂ is typically within a factor of three (above or below) from Δ_true (see e.g. Table 1).

Finally, the exact number of evaluations required depends on the integrand being considered, the requested tolerance ε, and, to a lesser extent, the values of the parameters η_u and η_n. What can be inferred from the data in Table 2 is that N grows exponentially with the number D of dimensions. Since the complexity of the FiEstAS tree is O(N log N) [8], the overhead in CPU time also grows exponentially.

3.2 Toy problem II

In order to assess the effect of non-linear degeneracies, let us also investigate the toy problem proposed by [10],

f(\mathbf{x}) = R(\mathbf{x}; \mathbf{c}_1, r_1, w_1) + R(\mathbf{x}; \mathbf{c}_2, r_2, w_2)    (23)

with

R(\mathbf{x}; \mathbf{c}, r, w) = \frac{1}{\kappa} \exp\left[ -\frac{ (|\mathbf{x} - \mathbf{c}| - r)^2 }{ 2w^2 } \right]    (24)

The function R(x) represents a ring of radius r and width w, centred at c. The normalization κ is chosen so that ∫ R(x) dx = 1, and it can be expressed as

\kappa = \frac{ D \, \pi^{D/2} }{ \Gamma(D/2 + 1) } \sqrt{2\pi w^2} \; K_{D-1}(r, w)    (25)

where Γ denotes the Gamma function and K_D is given by the recursive formula

K_D(r, w) = r K_{D-1} + (D - 1) w^2 K_{D-2}    (26)

with K_0 = 1 and K_1 = r.
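The normalization of equations (25)-(26) translates directly into code. A small illustrative sketch (function names are mine; `math.gamma` supplies Γ):

```python
import math

def K(D, r, w):
    """Recursion of eq. (26), with the base cases K_0 = 1 and K_1 = r."""
    if D == 0:
        return 1.0
    if D == 1:
        return r
    return r * K(D - 1, r, w) + (D - 1) * w**2 * K(D - 2, r, w)

def kappa(D, r, w):
    """Ring normalization of eq. (25):
    kappa = D pi^(D/2) / Gamma(D/2 + 1) * sqrt(2 pi w^2) * K_{D-1}(r, w)."""
    return (D * math.pi ** (D / 2) / math.gamma(D / 2 + 1)
            * math.sqrt(2 * math.pi * w**2) * K(D - 1, r, w))
```

As a sanity check, in D = 2 the formula reduces to κ = 2π r √(2π w²), which is the familiar area of a thin Gaussian annulus of radius r ≫ w.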
The two rings are separated by 7 units along the x-axis, and their radii and widths are set to the values r_1 = 1, r_2 = 2, and w_1 = w_2 = 0.1. The domain of integration ranges from −6 to 6 in all dimensions.

Fig. 2. Sample points for toy problem II in two dimensions.

 D   ⟨Î⟩ ± σ_I        ⟨Δ̂/Δ_true⟩ ± σ_Δ    ⟨N⟩ ± σ_N            ⟨t_CPU⟩ ± σ_t
 2   1.990 ± 0.023    1.792 ± 2.991       6127 ± 1012          0.47 ± 0.09
 3   2.001 ± 0.017    1.411 ± 2.137       29660 ± 8082         4.60 ± 1.35
 4   1.984 ± 0.027    0.938 ± 2.700       72619 ± 13452        20.03 ± 4.04
 5   1.897 ± 0.304    1.199 ± 2.473       179333 ± 56814       83.86 ± 29.15
 6   1.491 ± 0.501    0.933 ± 3.181       327824 ± 77328       249.65 ± 64.26
 7   1.196 ± 0.397    0.782 ± 2.524       735387 ± 203382      552.04 ± 169.9
 8   1.381 ± 0.480    0.675 ± 2.206       4517358 ± 3616613    11001.25 ± 9788

Table 3: Results for toy problem II.

The values of the integral computed by FiEstAS sampling are given in Table 3, and Figure 2 shows the sample points used by the algorithm in the two-dimensional case. The two rings have been correctly identified by the algorithm ten times out of ten up to D = 4. As the number of dimensions increases, the small ring occupies an exponentially smaller fraction of the integration domain with respect to the bigger structure, and the detection rate decreases. For D = 8, it is found in four of the ten runs. The shape of the rings is always accurately recovered, which proves that the presence of strong degeneracies does not significantly affect the accuracy of the method. As in the previous examples, the estimated error is within a factor of three from the true value, and both the number N of evaluations and the overhead t_CPU grow exponentially with D.
 Model  D   N_Vegas   N_Suave   N_Divonne   N_Cuhre     N_FiEstAS
 I      2   52000     90000     18569       9165        3974
        3   45000     160000    193910      82677       14012
        4   175000    320010    285312      215271      168641
        5   314500    1270259   86567       1150695     204091
        6   -         120036    171526      3971451     224517
        7   -         280084    236217      26245785    11186912
        8   -         1840397   6619086     116242685   7640740
 II     2   45000     60000     6321        19175       5325
        3   67500     90000     24700       213233      33236
        4   85000     230006    99300       8675253     58999
        5   137500    320015    189236      96301023    328797
        6   -         510046    219956      -           397883
        7   -         820128    364261      -           640912
        8   -         1120153   122915      -           1828970

Table 4: Number of evaluations required by the algorithms Vegas [6], Suave [7], Divonne [11], and Cuhre [12] for toy problems I and II in D dimensions. Empty entries indicate failure to converge within N = 10^9 function calls. Results of a single run of FiEstAS sampling are quoted in the last column.

3.3 Comparison with other algorithms

The method presented here significantly outperforms standard Monte Carlo with uniform probability, and its results are comparable to those of the bank sampling technique described in [10], at least for problems of low dimensionality. An important advantage of FiEstAS sampling is that prior information on the shape of f is not required, although it could easily be taken into account by simply adding the 'bank' points at any stage. On the other hand, comparison with the results of [5] suggests that clustered nested sampling may be more efficient for large D.

Tables 4 and 5 show the performance of the algorithms Vegas [6], Suave [7], Divonne [11], and Cuhre [12], as implemented in the Cuba library¹ [7], when applied to toy problems I and II. Table 4 quotes the number of integrand evaluations, and Table 5 shows the estimated value of the integral.
¹ http://www.feynarts.de/cuba

 Model  D   Î_Vegas   Î_Suave   Î_Divonne   Î_Cuhre   Î_FiEstAS
 I      2   5.01      5.00      4.95        5.00      4.06
        3   2.29      3.99      5.00        5.00      4.09
        4   1.03      3.99      3.64        2.00      4.56
        5   2.11      3.95      2.61        2.00      3.96
        6   -         0.99      2.01        2.00      1.00
        7   -         0.37      2.47        2.00      4.99
        8   -         1.80      0.58        1.99      3.01
 II     2   2.00      2.00      2.02        2.00      2.00
        3   2.00      2.00      1.96        1.63      2.01
        4   1.95      2.00      2.01        1.54      1.98
        5   1.37      2.00      1.38        1.60      2.00
        6   -         1.26      1.20        -         2.01
        7   -         0.50      1.00        -         0.96
        8   -         0.30      0.51        -         0.99

Table 5: Estimates Î returned by the different algorithms (same runs as in Table 4).

FiEstAS sampling is quite competitive in terms of efficiency; the values of N are always comparable to or lower than those required by the other methods. Its main drawback, however, is the computational overhead; the CPU times in Tables 2 and 3 are much larger than those required by the other algorithms (a few minutes in the worst cases). In terms of accuracy, the number of detected maxima compares very favourably with the other methods. Moreover, the estimate Î is almost always close to the true value of the integral, ignoring the contribution from undetected peaks; among all the examples quoted, only the run for toy model I in four dimensions reported a wrong value.

In order to probe a wider variety of integrands, the different methods are also benchmarked with the Genz test suite [9], based on the following six function families:

(1) Oscillatory: f_1(\mathbf{x}) = \cos( \mathbf{c} \cdot \mathbf{x} + 2\pi w_1 )
(2) Product peak: f_2(\mathbf{x}) = \prod_{d=1}^{D} \frac{1}{ (x_d - w_d)^2 + c_d^{-2} }
(3) Corner peak: f_3(\mathbf{x}) = (1 + \mathbf{c} \cdot \mathbf{x})^{-(D+1)}
(4) Gaussian: f_4(\mathbf{x}) = \exp( -|\mathbf{c} \cdot (\mathbf{x} - \mathbf{w})|^2 )
(5) C^0-continuous: f_5(\mathbf{x}) = \exp( -\mathbf{c} \cdot |\mathbf{x} - \mathbf{w}| )
(6) Discontinuous: f_6(\mathbf{x}) = \exp( \mathbf{c} \cdot \mathbf{x} ) if x_1 < w_1 and x_2 < w_2, and 0 otherwise

The parameter vector c sets the overall difficulty of the problem.
Its components are chosen as uniform random numbers from 0 to 1, and then they are normalized according to

\| \mathbf{c} \|_1 \equiv \sum_{d=1}^{D} c_d = 50    (27)

except for the oscillatory integrand, for which I set ||c||_1 = 5. The components of w are uniform random numbers between 0 and 1. In general terms, they do not affect the difficulty of the problem, since they only control the location of the features (e.g. maxima) of f. They play an important role, though, for the oscillatory family, because w_1 sets the precise value of the integral, which can be arbitrarily close to zero. This poses a problem for FiEstAS sampling (and, more generally, for pure Monte Carlo algorithms) because a large number of evaluations is required to achieve the cancellation of positive and negative terms with the desired accuracy.

Results of the Genz test in 2, 5, and 8 dimensions are quoted in Table 6 for the different algorithms. Each entry shows the average number of evaluations over 20 random realizations of each family. In order to speed up the test, we impose a maximum number of evaluations N_max = 10^6. It is clear from the comparison that every algorithm has its own strengths and weaknesses. Some methods are better suited to solve a particular class of problems, and some others perform better in many dimensions. In general terms, the results presented here suggest that FiEstAS sampling could be a good choice for complicated integrands of low to moderate dimensionality.

4 Conclusions

This paper presents a new algorithm to carry out numerical integration in multiple dimensions or, in a Bayesian context, sample from the posterior distribution and compute the evidence for the assumed model.
The method, dubbed 'FiEstAS sampling', is a variant of importance sampling where the weight of each point is computed with the help of the Field Estimator for Arbitrary Spaces [8].

Its performance has been tested for several toy problems with known analytical solution, specifically designed to contain multimodal distributions and significant degeneracies. The results suggest that FiEstAS sampling provides an interesting alternative to other methods for problems of low dimensionality. In particular, it is able to discover most isolated maxima of the integrand, and it can perfectly sample from distributions with pronounced degeneracies. As the number of dimensions increases, the ability of the algorithm to identify peaks decreases and the required computational resources (in terms of both time and memory) grow exponentially.

 Genz  D   ⟨N_Vegas⟩    ⟨N_Suave⟩   ⟨N_Divonne⟩   ⟨N_Cuhre⟩     ⟨N_FiEstAS⟩
 1     2   (2) 97056    66500       2954          195           (3) 165913
       5   18350        42000       2917          819           (1) 221108
       8   68500        51500       3697          3315          (1) 574112
 2     2   25625        20500       3186          2190          10965
       5   9100         31500       10624         477422        165598
       8   12300        40000       23326         (19) 469625   573013
 3     2   4500         13500       1690          195           395
       5   4500         20000       3970          819           3231
       8   7000         20000       5495          3315          11505
 4     2   5750         23000       7137          1768          2037
       5   16950        42000       30582         57630         32559
       8   30200        64000       (3) 52943     (10) 549185   183580
 5     2   5375         22500       4542          4407          1595
       5   10025        33500       19747         (17) 671853   21697
       8   16650        42500       (1) 40161     (20) -        92247
 6     2   7825         32500       6994          50453         3056
       5   18025        87008       (1) 64257     84766         25940
       8   (1) 36079    103006      (4) 152156    (12) 604435   115577

Table 6: Average number of evaluations required by each algorithm for the test suite proposed by Genz [9]. The small numbers in parentheses indicate the number of runs where the desired accuracy was not reached for N = 10^6.

Fig. A.1. FiEstAS algorithm applied to a random uniform distribution of 100 points in two dimensions.
A FiEstAS

The Field Estimator for Arbitrary Spaces (FiEstAS) is a technique to estimate the continuous (probability) density field underlying a given distribution of data points. Particular attention is paid to avoid imposing a metric on the data space. Indeed, the problem may actually be regarded as computing the appropriate metric, given the data.

FiEstAS assigns each point a volume v_i by means of a k-d tree. The space is recursively divided, one dimension at a time, until there is only one single data point in each node. The original implementation, described in [8], is heavily oriented towards a particular problem, namely the estimation of densities in phase space (a non-Euclidean, six-dimensional space composed of three-dimensional positions and velocities). Here I use a more general version of the algorithm, where the dimension least likely to arise from a uniform distribution is selected for splitting at each step (very similar to the Shannon entropy approach followed by [13]). More precisely, a histogram with B = 1 + √N_node bins is built for each dimension, and the log-likelihood

L_d = ln(N_node!) − N_node ln(B) − Σ_{b=1}^{B} ln(n_bd!)   (A.1)

is computed, where the indices 1 ≤ d ≤ D and 1 ≤ b ≤ B denote the dimension and the bin number, respectively, n_bd is the number of points in each bin, and N_node is the total number of points in the node. In order to encourage a similar number of divisions along all dimensions, I add to L_d the number of times s_d that the d-th axis has already been split,

L'_d = L_d + s_d.   (A.2)

The dimension with smaller L' is divided at the point x_split = (x_l + x_r)/2, where x_l is the maximum x of all points lying on the 'left' side (b ≤ b_split) and x_r is the minimum x of the points lying on the 'right' side (b > b_split).
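The dimension-selection criterion of equations (A.1) and (A.2) can be sketched in Python. This is an illustrative reimplementation under the assumption of equal-width bins spanning the extent of the node's points; the function names are mine, not the paper's:

```python
import numpy as np
from math import lgamma

def log_like_uniform(x, B):
    """Log-likelihood L_d that the 1-D sample x is drawn from a
    uniform distribution, using B equal-width bins (eq. A.1):
    L = ln(N!) - N ln(B) - sum_b ln(n_b!)."""
    n, _ = np.histogram(x, bins=B)
    N = len(x)
    return lgamma(N + 1) - N * np.log(B) - sum(lgamma(k + 1) for k in n)

def choose_split_dimension(points, n_splits):
    """Return the index of the dimension to split next: the one with
    the smallest L'_d = L_d + s_d (eq. A.2), i.e. the dimension whose
    marginal distribution looks least uniform, penalized by the number
    s_d of previous splits along that axis."""
    N, D = points.shape
    B = 1 + int(np.sqrt(N))          # number of histogram bins per dimension
    L_prime = [log_like_uniform(points[:, d], B) + n_splits[d]
               for d in range(D)]
    return int(np.argmin(L_prime))
```

In the full algorithm this choice is made at every node of the k-d tree, and s_d is incremented for the chosen axis before recursing into the two children.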
The bin 1 ≤ b_split < B is chosen so that the number of points on each side is as close as possible to N_node/2.

The procedure is illustrated in Figure A.1, where successive steps of the algorithm are plotted for a random realization of 100 uniformly-distributed data points in two dimensions. For this particular realization, the dimension with smaller L' turned out to be the x-axis (first panel in the figure). Then, for the points on the left side, the y-axis was selected for splitting (second panel). The same axis was chosen again for the bottom region, and the sequence continued by dividing further along the x-, y-, x-, and x-axes. At this moment (third panel in the figure) there is only one point on the left side, and control returns to the parent node in order to split the right branch (containing, in this case, two points). The final tessellation obtained by this method is shown on the last panel.

References

[1] Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., Numerical Recipes in FORTRAN: The Art of Scientific Computing, Cambridge University Press, 1992.
[2] MacKay, D. J. C., Information Theory, Inference and Learning Algorithms, Cambridge University Press, 2003.
[3] Skilling, J., Nested Sampling, in AIP Conference Series, edited by Fischer, R., Preuss, R., and Toussaint, U. V., volume 735, 2004.
[4] Shaw, J. R., Bridges, M., and Hobson, M. P., MNRAS 378 (2007) 1365.
[5] Feroz, F. and Hobson, M. P., astro-ph/0704.3704 (2007).
[6] Lepage, G. P., Journal of Computational Physics 27 (1978) 192.
[7] Hahn, T., Computer Physics Communications 168 (2005) 78.
[8] Ascasibar, Y. and Binney, J., MNRAS 356 (2005) 872.
[9] Genz, A., A package for testing multiple integration subroutines, in Numerical Integration, edited by Keast, P. and Fairweather, G., 1986.
[10] Allanach, B. C. and Lester, C.
G., hep-ph/0705.0486 (2007).
[11] Friedman, J. H. and Wright, M. H., ACM Trans. Math. Softw. 7 (1981) 76.
[12] Berntsen, J., Espelid, T. O., and Genz, A., ACM Trans. Math. Softw. 17 (1991) 437.
[13] Sharma, S. and Steinmetz, M., MNRAS 373 (2006) 1293.