A fast vectorised implementation of Wallaces normal random number generator

A F AST VECTO RISED IMPLEMENT A TION OF W ALLA CE’S NORMAL RANDOM NUMBER GENERA TOR RICHARD P . BRENT Abstra ct. W allace has proposed a new class of pseudo-random gen- erators for normal vari ates. These generators do not require a stream of uniform pseudo-random numbers, except for initialisation. The inner loops are essen tially matrix-vector multiplications and are very suitable for implementati on on vector pro cessors or vector/parallel pro cessors such as the F ujitsu VPP300. In this rep ort we outline W allace’s idea, consider some vari ations on it, and d escribe a vectorised implementa- tion RANN4 whic h is more than three times faster than its b est competi- tors (the P olar and Box-Muller meth ods) on th e F ujitsu VP2200 and VPP300. 1. Introduction Sev eral r ecen t p ap ers [3, 5, 18, 19] hav e considered the generation of uniformly distributed pseudo-random n um b ers on vec tor and parallel com- puters. In many applications, random n um b ers fr om sp eciﬁed non-u n iform distributions are requir ed. A common r equiremen t is for the normal dis- tribution, whic h is what we consider here. In principle it is suﬃcien t to consider metho ds for generating normally distribu ted num b ers with mean 0 and v ariance 1, s in ce translation and scaling can easily b e p erformed to giv e n um b ers with mean µ and v ariance σ 2 (usually r eferred to as num b ers w ith the N ( µ, σ 2 ) distribution). The most eﬃcien t metho d s for generating n ormally distribu ted random n um b ers on sequential machines [2, 4, 9, 10, 11, 12, 14, 20] inv olv e the use of diﬀeren t appro ximations on diﬀeren t in terv als, and/or the use of “rejection” metho ds, so they do not v ectorise we ll. Simple, “old-fashioned” metho ds ma y b e preferable on v ector pr o cessors. In [6] w e d escrib ed t w o suc h meth- o ds, the Box- Muller [16] and P olar metho ds [12]. The Po lar metho d was implemen ted as RAN N3 and wa s the fastest vec torised metho d for normally distributed n u m b ers kno wn at the time [17 , 19], although muc h slo w er than the b est u niform rand om num b er generators. F or example, on the F ujitsu VP2200/ 10 a normal r andom num b er using RANN3 requir es an a ve rage of 21.9 cycles, but a go o d generalised Fib onacci uniform random num b er gen- erator requires only 2.21 cycles. (A cycle on the VP2200/10 is 3.2 nsec. Date : 14 April 1997 . 1991 Mathematics Subje ct Classiﬁc ation. Primary 65C10, Secondary 54C70, 60G15 , 65Y10, 68U20. Key wor ds and phr ases. Gaussian random numbers, maxim um en tropy , normal distri- bution, normal random n umbers, pseudo-rand om num b ers, random number generators, random num bers, sim ulation, vector pro cessors, W allace’s metho d. Cop yright c  1997, R. P . Bren t rpb170tr t yp eset using A M S -L A T E X. 1 2 R. P . BRENT Since four ﬂoating-p oin t op er ations can b e p erformed p er cycle, th e theo- retical p eak p erform ance of the VP2200/1 0 is 1250 Mﬂop. The cycle time of the VPP 300 is 7 n sec but the pip elines are wider, so th e theoretic al p eak p erformance is 2285 Mﬂop.) Recen tly W allace [21] prop osed a new class of pseudo-random generato rs for normal v ariates. These generators do not require a s tream of uniform pseudo-random n um b ers (except for initialisatio n) or the ev aluation of ele- men tary functions such as log, sqrt, sin or cos (needed by the Bo x-Muller and P olar metho d s). T he crucial observ atio n is that, if x is an n -vect or of normally d istributed r an d om num b ers, and A is an n × n orthogonal matrix, then y = Ax is another n -vec tor of normally distrib u ted num b ers. Th us, giv en a p o ol of nN normally distribu ted n um b ers, w e can generate another p o ol of nN normally d istributed num b ers by p erformin g N matrix-v ector m ultiplications. The inner lo ops are v ery suitable for imp lemen tation on v ector pro cessors suc h as the VP2200 or vect or/parallel pro cessors suc h as the VPP300. Th e v ector lengths are prop ortional to N , and the n um b er of arithmetic op erations p er normally distr ibuted n um b er is prop ortional to n . T ypically we c ho ose n to b e small, say 2 ≤ n ≤ 4, and N to b e large. W allace implemen ted v ariants of his n ew metho d on a scalar RISC work- station, and found that its sp eed was comparable to that of a fast un iform generator. The same p erformance relativ e to a fast u niform generato r is ac hiev able on a v ector pr o cessor, although some care has to b e tak en with the implemen tation (see § 7). In § 2 w e describ e W allace’s new metho ds in more d etail. Some statis- tical questions are considered in §§ 3–6. Asp ects of implementat ion on a v ector pro cessor are d iscussed in § 7, and details of an implementa tion on the VP2200 and VPP300 are giv en in § 8. Some conclusions are d ra wn in § 9. 2. W al lace’s No rmal Genera tors The idea of W allace’s new generators is to kee p a p o ol of n N norm ally distributed p seudo-random v ariates. As num b ers in the p o ol are used, new normally distribu ted v ariates are generated by forming appropriate com bi- nations of the num b ers whic h ha ve b een used. On a v ector pro cessor N can b e large and th e whole p o ol can b e regenerated with only a small n umber of v ector op erations 1 . As just outlined, the idea is the same as that of the generalised Fib onacci generators f or u n iformly distributed num b ers – a p o ol of random n umbers is transformed in an appropriate w ay to generate a new p o ol. As W alla ce [21] observ es, we can regard the uniform, n ormal and exp onenti al distributions as maximum-en tropy distributions sub ject to the constraints: 0 ≤ x ≤ 1 (uniform) E ( x 2 ) = 1 (norm al) E ( x ) = 1, x ≥ 0 (exp onential) . W e w an t to com b ine n ≥ 2 num b er s in the p o ol so as to satisfy the r elev ant constrain t, but to conserve no other statistically r elev an t information. T o simplify notation, supp ose that n = 2 (there is n o pr oblem in generalising to 1 The process of regenerating the p ool will b e called a “pass”. A F AST N ORMAL RANDOM NUMBER GENERA TOR 3 n > 2). Giv en tw o n um b ers x , y in the p o ol, we could satisfy th e “uniform ” constrain t by forming x ′ ← ( x + y ) mod 1 , and this give s the family of generalised Fib onacci generators [6]. W e could satisfy the “normal” constraint by forming  x ′ y ′  ← A  x y  , where A is an orthogonal matrix, for example A = 1 √ 2  1 1 − 1 1  or A = 1 5  4 3 − 3 4  . Note that this generates tw o new pseudo-rand om normal v ariates x ′ and y ′ from x and y , and th e constrain t x ′ 2 + y ′ 2 = x 2 + y 2 is satisﬁed b ecause A is orthogonal. Supp ose the p o ol of pr eviously generated pseudo-rand om num b ers con- tains x 0 , . . . , x N − 1 and y 0 , . . . , y N − 1 . Let α, . . . , δ b e int eger constant s. These constants might b e ﬁ xed throughout, or they migh t b e v aried (us- ing a subsidiary uniform random n um b er generator) eac h time the p o ol is regenerated. One v arian t of W all ace’s metho d generates 2 N new pseu d o-random num- b ers x ′ 0 , . . . , x ′ N − 1 and y ′ 0 , . . . , y ′ N − 1 using the recur r ence x ′ j y ′ j ! = A  x αj + γ mod N y β j + δ mo d N  (1) for j = 0 , 1 , . . . , N − 1. T he v ectors x ′ and y ′ can then o v erwrite x and y , and b e used as the next p o ol of 2 N pseudo-rand om n u m b ers. T o av oid the cop y in g o v erhead, a double-buﬀering sc heme can b e used. 3. Desirabl e Constraints In order that all n um b ers in the old p o ol ( x, y ) are u sed to generate the new p o ol ( x ′ , y ′ ), it is essen tial that the indices αj + γ mo d N and β j + δ m o d N giv e p ermutations of { 0 , 1 , . . . , N − 1 } as j ru ns through { 0 , 1 , . . . , N − 1 } . A necessary and suﬃcien t condition for this is th at GCD( α, N ) = GCD ( β , N ) = 1 . (2) F or example, if N is a p o w er of 2, then an y o dd α and β ma y b e chosen. The orthogonal matrix A must b e c h osen so eac h of its r o w s has at least t wo nonzero elements, to av oid rep etition of th e s ame p seudo-random num- b ers. Also, these nonzeros should not b e to o small. 4 R. P . BRENT F or implemen tation on a v ector pro cessor it wo uld b e eﬃcien t to tak e α = β = 1 so vect or op erations h a ve unit strides. Ho wev er, statistical considerations ind icate that unit strides should b e a vo ided. T o see wh y , supp ose α = 1. Thus, f rom (1), x ′ j = a 0 , 0 x j + γ mod N + a 0 , 1 y β j + δ mod N , where | a 0 , 0 | is not ve ry sm all. Th e sequence ( z j ) of rand om n um b ers returned to the u ser is x 0 , . . . , x N − 1 , y 0 , . . . , y N − 1 , x ′ 0 , . . . , x ′ N − 1 , y ′ 0 , . . . , y ′ N − 1 , . . . so we s ee that z n is strongly correlated with z n + λ for λ = 2 N − γ . W allace [21] suggests a “v ector” sc heme where α = β = 1 but γ and δ v ary at eac h pass. This is certainly an improv ement o ve r ke eping γ and δ ﬁxed . Ho wev er, there will still b e correlations o v er segmen ts of length O ( N ) in the output, and these correlations can b e detected b y suitable statistica l tests. Th us, we do n ot recommend the sc h eme for a library routine, although it w ould b e satisfactory in many ap p licatio ns. W e recommend that α and β should b e diﬀerent, greater than 1, and that γ and δ should b e selected randomly at eac h pass to reduce an y r esid ual correlations. F or similar reasons, it is desirable to use a diﬀeren t orthogonal matrix A at eac h pass. W allace suggests randomly selecting from tw o pr edeﬁned 4 × 4 matrices, but there is no reason to limit the choic e to t wo 2 . W e prefer to c ho ose “random” 2 × 2 orthogonal matrices with r otatio n angles not to o close to a multiple of π / 2. 4. The Su m of Squa res As W allac e p oin ts out, an ob vious defect of the sc h emes describ ed in §§ 2– 3 is that the sum of squares of the n u m b ers in the p o ol is ﬁxed (apart from the eﬀect of r ounding errors). F or truly random normal v ariates the s um of squares s hould ha ve the chi-squared distribution χ 2 ν , where ν = n N is the p o ol size. T o o verco me this defect, W allace suggests that one pseu d o-random num- b er from eac h p o ol sh ould not b e return ed to the user, but should b e used to appro ximate a ran d om sample S from th e χ 2 ν distribution. A scaling factor can b e in tro duced to ensure that the sum of squares of the ν v alues in the p o ol (of whic h ν − 1 are r eturned to the user) is S . This only inv olve s scaling the matrix A , so the inn er lo ops are n ot signiﬁcan tly c hanged. There are several go o d appro ximations to the χ 2 ν distribution for large ν . F or example, 2 χ 2 ν ≃  x + √ 2 ν − 1  2 , (3) where x is N (0 , 1). More accurate appr o ximations are known [1], b ut (3) should b e adequate if ν is large. 2 Caution: if a ﬁnite set of predeﬁned matrices is u sed, the matrices should b e m ul- tiplicativ ely ind epen dent ov er GL ( n, R ). (If n = 2, this means that the rotation angles should b e indep endent ov er the the integers. ) In particular, n o matrix should b e the inv erse of any other matrix in the set. A F AST N ORMAL RANDOM NUMBER GENERA TOR 5 5. Rest ar ting Unlik e the case of generalised Fib onacci uniform random num b er gener- ators [7], there is no w ell-dev elop ed theory to tell us what the p erio d of th e output sequence of pseudo-random norm al num b ers is. Since the size of the state-space is at least 2 2 w N , where w is the n umber of bits in a ﬂ oating-point fraction and 2 N is the p o ol size (assuming the w orst case n = 2), we would exp ect the p erio d to b e at least of ord er 2 w N (see Kn uth [12]), but it is dif- ﬁcult to guaran tee this. One solution is to restart after say 100 0N num b ers ha v e b een generated, using a go o d uniform rand om n um b er generator with guaran teed long p erio d com bined with the Bo x-Muller metho d to reﬁll the p o ol. 6. Discarding S ome Number s Because eac h p o ol of pseudo-rand om num b ers is, strictly sp eaking, deter- mined b y the previous p o ol, it is desirable n ot to retur n all the generated n um b ers to the us er 3 . If f ≥ 1 is a constant parameter 4 , we can return a fraction 1 /f of the generated num b ers to the user and “discard” the remain- ing fraction (1 − 1 /f ). T he discarded num b ers are retained inte rnally and used to generate the next p o ol. There is a tradeoﬀ b et w een indep end ence of the num b ers generated and the time required to generate eac h n um b er whic h is return ed to th e user. Our tests (describ ed in § 8) indicate th at f ≥ 3 is satisfactory . 7. Vectorised Impleme nt a tion If the recurrence (1) is imp lemen ted in the obvious wa y , th e inn er lo op will in v olv e in dex computations mo d ulo N . It is p ossible to a v oid these computations. Thus 2 N pseu do-random num b ers can b e generated by α + β − 1 iteratio ns of a lo op of the form DO J = LOW, HIGH XP(J) = A00*X(A LPHA*J + JX ) + A01*Y(BETA*J + JY) YP(J) = A10*X(A LPHA*J + JX ) + A11*Y(BETA*J + JY) ENDDO where ALPHA = α , BETA = β , and LOW, HIGH, JX, an d JY are in tegers w hic h are constan t within the loop but v ary b etw een iterations of the lo op. Th us, the lo op v ectorises. T o generate eac h pseudo-random num b er requir es one load (non-unit stride), one ﬂoating-p oint add, t wo ﬂ oating-point m ultiplies, one store, and of order α + β N startup costs. Th e av erage cost should is only a few mac h ine cycles p er random num b er if N is large and α + β is small. On a vect or p ro cessor w ith in terlea ved memory banks, it is d esirable for the strides α and β to b e o dd so that the maxim um p ossible memory 3 Similar remarks apply t o some uniform pseudo-rand om num b er generators [13, 15]. 4 W e shall call f t he “throw-a w a y” factor. 6 R. P . BRENT bandwidth can b e ac hieved. F or statistica l reasons w e w an t α and β to b e distinct and greater than 1 (see § 3). F or example, w e could c ho ose α = 3 , β = 5 , pro vided GCD( αβ , N ) = 1 (true if N is a p o w er of 2). Since α + β − 1 = 7, the a v erage ve ctor length in ve ctor op erations is ab out N / 7. Count ing op erations in the inner lo op ab o v e, w e see that generation of eac h pseud o-random N (0 , 1) n um b er r equires ab out tw o ﬂoating-p oin t m ul- tiplications and one ﬂoating-point addition, plus one (non-un it stride) load and one (unit-stride) store. T o transform the N (0 , 1) num b ers to N ( µ, σ 2 ) n um b ers with giv en mean and v ariance requires an additional multiply and add (plus a unit-stride load and store) 5 . Th us, if f is the thro w-a w a y factor (see § 6), eac h p seudo-random N ( µ, σ 2 ) num b er returned to the user requires ab out 2 f + 1 m ultiplies and f + 1 additions, plus f + 1 loads and f + 1 s tores. If p erf ormance is limited b y the m u ltiply p ip elines, it migh t b e d esir ab le to r educe the num b er of multiplica tions in the in ner lo op by usin g fast Giv en s transformations (i.e. diagonal scaling). The scaling could b e und one when the resu lts w ere copied to the caller’s bu ﬀer. T o av oid problems of o ver/underﬂo w, explicit s caling could b e p erformed o ccasionally (e.g. once ev er y 50-th pass th rough the p o ol should b e su ﬃcien t). The imp lemen tatio n d escrib ed in § 8 do es not in clude fast Giv ens trans- formations or any particular optimisations for the case µ = 0, σ = 1. 8. RANN4 W e ha v e implemen ted the metho d describ ed in §§ 6–7 in F ortran on the VP2200 and VPP300. Th e current implemen tation is called RANN4 . Th e im- plemen tation uses RANU 4 [5] (or equiv alen tly the SS L 2/VPP DP VRANU4 ) to generate un iform ps eudo-random n um b ers for initialization and generation of the p arameters α, . . . , δ (see (1)) and pseudo-random orthogonal matri- ces (see b elow). Some desirab le prop erties of the un iform random num b er generator are inherited by RANN4 . F or example, DP VRANU 4 app end s the pro cessor num b er to the seed, so it is certain that diﬀeren t pseudo-random sequences will b e generated on d iﬀeren t pro cessors, eve n if the u ser calls the generator with the same seed on sev er al pro cessors of the VPP300. The user pro vides RAN N4 with a w ork area w hic h must b e p reserv ed b e- t ween calls. RANN4 chooses a p o ol size of 2 N , where N ≥ 25 6 is the largest p o w er of 2 p ossible so that the p o ol ﬁ ts within p art (ab out half ) of the work area. The remainder of th e work area is used for the uniform generator and to preserve essen tial information b et w een calls. RANN 4 returns an arr a y of normally distribu ted ps eu do-random n um b ers on eac h call. The size of this arra y , and the mean and v ariance of the normal d istribution, can v ary from call to call. The parameters α, . . . , δ (see (1)) are c h osen in a pseu d o-random m anner, once for eac h p o ol, w ith α ∈ { 3 , 5 } and β ∈ { 7 , 11 } . The parameters γ and δ are c hosen uniformly from { 0 , 1 , . . . , N − 1 } . The orthogonal m atrix A is 5 Obviously some optimisations are p ossible if it is know n that µ = 0 and σ = 1. A F AST N ORMAL RANDOM NUMBER GENERA TOR 7 c h osen in a pseudo-random manner as A =  cos θ sin θ − sin θ cos θ  , where π / 6 ≤ | θ | ≤ π / 3 or 2 π / 3 ≤ θ ≤ 5 π / 6. The constrain ts on θ ensure that m in( | sin θ | , | cos θ | ) ≥ 1 / 2. W e do not need to compute trigonometric functions: a un iform generator is used to select t = tan( θ/ 2) in the appro- priate range, and then sin θ and cos θ are obtained usin g a f ew arithmetic op erations. The matrix A is ﬁxed in eac h inner lo op (though not in eac h complete pass) so m ultiplications by cos θ and sin θ are fast. F or safet y w e adopt the conserv ativ e c hoice of throw-a w a y factor f = 3 (see § 6), although in most applications the c h oice f = 2 (or ev en f = 1) is satisfactory and signiﬁcan tly faster. Because of our use of R ANU4 to generate the parameters α, . . . , δ etc, it is most unlikel y that the p erio d of the sequence return ed by RA NN4 w ill b e shorter than the p erio d of the uniformly distr ib uted sequence generated b y RANU4 . Th us, it was not considered n ecessary to r estart the generato r as describ ed in § 5. Ho wev er, our implemen tation monitors the sum of squares and corrects for an y “drift” caused b y accum u lation of rounding errors 6 . On the VP2200/10 , the time p er normally distributed num b er is appr o x- imately (6 . 8 f + 3 . 2) nsec, i.e. (1 . 8 f + 1 . 0) cycles. With our c hoice of f = 3 this is 23.6 nsec or 6.4 cycles. T he fastest version, w ith f = 1, tak es 10 nsec or 2.8 cycles. F or comparison, the fastest metho d of those considered in [6] (the Pola r metho d) tak es 21.9 cycles. Thus, we ha v e obtained a sp eedup by a factor of ab out 3.2 in th e case f = 3. Times on the VPP300 are t ypically faster by a facto r of ab out t w o. F or example, the time p er n ormally distributed n um b er is 11.4 nsec if f = 3 and 5.4 nsec if f = 1. V arious statistica l tests w ere p erformed on RANN 4 with sev eral v alues of the thro w-a wa y factor f . F or example: • If ( x, y ) is a p air of pseudo-random num b ers w ith (supp osed) normal N (0 , 1) d istributions, then u = exp( − ( x 2 + y 2 ) / 2) shou ld b e uniform in [0 , 1], and v = artan( x/y ) should b e uniform in [ − π / 2 , + π / 2]. Th us, standard tests for uniform ps eudo-random n umbers can b e applied. F or example, w e generated batc hes of (up to) 10 7 pairs of n um b ers, transformed them to ( u, v ) pairs, and tested u n iformit y of u (and similarly for v ) b y coun ting th e num b er of v alues o ccurring in 1 , 000 equal size bins and computing the χ 2 999 statistic. This test w as rep eated sev eral times with d iﬀerent initial seeds etc. Th e χ 2 v alues we re n ot signiﬁcan tly large or small for any f ≥ 1. • W e generated a batc h of up to 10 7 pseudo-random num b ers, com- puted the sample mean, second and fourth moments, rep eated a n um b er of times, and compare the observe d and exp ected d istri- butions of sample moments. Th e observed m oments we re not sig- niﬁcan tly large or small for an y f ≥ 3. T he fourth momen t w as sometimes s igniﬁcan tly small (at th e 5% conﬁdence lev el) for f = 1. 6 This provides a useful c hec k, b ecause an y large change in the sum of squ ares must b e caused by an error – most likely the user has o verw ritten the w ork area. 8 R. P . BRENT A p ossible explanation for the b ehavi our of the fourth momen t when f = 1 is as follo ws. Let the m axim u m abs olute v alue of num b ers in the p o ol at one pass b e M , and at the follo wing p ass b e M ′ . By considering the eﬀect of the orthogonal transform ations applied to pairs of num b ers in the p o ol, w e see that (assuming n = 2), M / √ 2 ≤ M ′ ≤ √ 2 M . Th us, there is a correlati on in the s ize of outliers at su ccessiv e passes. The correlation for the sub set of v alues returned to the u ser is reduced (although not completely eliminated) b y choosing f > 1. 9. Conclus ion W e ha ve sh o w n that normal pseudo-random num b er generators based on W allace ’s id eas v ectorise well, and that their sp eed on a vec tor pro cessor is close to that of the generalised Fib onacci uniform generators, i.e. only a small num b er of mac hine cycles p er random num b er. Because W allace’s m etho ds are new, there is little knowledge of their statistica l prop erties. Ho wev er, a careful implemen tation should ha ve satis- factory statistica l prop erties provided d istinct non-unit strides α , β satisfy- ing (2) are used, the su m s of squ ares are v aried as describ ed in § 4, and the thro w-a wa y factor f is c hosen appropriately . W allace uses n × n orth ogonal transformations with n = 4, but a satisfactory (and chea p er) generator is p ossible with n = 2. The p o ol size should b e fairly large (su b ject to storage constraints), b oth for statistical r easons and to impro v e p erforman ce of th e inner lo ops. On a v ecto r-parallel machine such as the VPP300, indep end en t streams of p s eudo-random num b ers can b e generated on eac h pr o cessor, an d no comm u nication b etw een pro cessors is required after the initializatio n ph ase. Ac knowledgemen ts. T hanks to Chris W allac e for sending me a p reprint of his pap er [21] and commenting on a preliminary version [8] of this rep ort. Also, th anks to Don Knuth f or discussions regarding the prop erties of gener- alised Fib onacci metho ds and for bringing the reference [15] to m y atten tion. This w ork was supp orted in part by a F ujitsu-ANU researc h agreemen t. Referen ces [1] M. Abramow itz and I . A. Stegun, Handb o ok of Mathematic al F unctions , Dove r, N ew Y ork, 1965, Ch. 26. [2] J. H. Ahrens and U. Dieter, Computer metho ds for sampling from the exp onential and normal distributions, Communic ations of the ACM 15 (1972), 873-882. [3] S . L. And erson, R andom number generators on vector sup ercomputers and other adv anced architectures, SIAM R eview 32 (1990), 221-251. [4] R . P . Brent, Algorithm 488: A Gaussian pseudo-random num b er generator (G5), Comm. ACM 17 (1974), 704-706. [5] R . P . Brent, Uniform random num b er generators for sup ercomputers, Pr o c. Fi fth Aus tr alian Sup er c omputer Confer enc e , Melb ourne, D ecem b er 1992, 95-104. ftp://nimb us.anu.edu .au/pub/Brent/rpb132.dvi.Z . [6] R . P . Brent, F ast Normal R andom Numb er Gener ators for V e ctor Pr o c essors , T ech- nical Rep ort TR-CS-93-04, Computer Sciences Laboratory , Australian National Uni- versi ty , Marc h 199 3. ftp://nimbus .anu.edu.au /pub/Brent/rpb141tr.dvi.Z . A F AST N ORMAL RANDOM NUMBER GENERA TOR 9 [7] R . P . Brent, On the p eriods of generalized Fib onacci recurrences, Mathematics of Computation 63 (1994), 389-401. [8] R . P . Bren t, Wal lac e’s fast normal r andom numb er gener ators: pr eliminary r ep ort , F u jitsu Area 4 Pro ject R ep ort, Octob er 1994, 6 pp. [9] L. Devroy e, Non-Uniform R andom V ariate Gener ation. Springer-V erlag, New Y ork, 1986. [10] P . Griﬃths and I. D. Hill (editors), Appli e d Statistics Algorithms , Ellis Horwood, Chic hester, 1985. [11] A. J. Kinderman and J. F. Monahan, Computer generation of random v ariables using the ratio of uniform d eviates, A CM T r ansactions on Mathematic al Softwar e 3 (1977), 257-260. [12] D. E. K nuth, The Art of Computer Pr o gr amm ing, V ol ume 2: Semi numeric al A lgo- rithms (second edition). A ddison-W esley , Menlo P ark, 1981. [13] D. E. K nuth, The Art of Computer Pr o gr amm ing, V ol ume 2: Semi numeric al A lgo- rithms (third ed ition). Addison-W esley , Menlo P ark, to app ear. [14] J. L. Lev a, A fast normal rand om n umber generator, ACM T r ansactions on Mathe- matic al Softwar e 18 (1992), 449-453. [15] M. L¨ uscher, A portable h igh-quality random num b er generator for lattice ﬁeld theory sim ulations, Computer Physics Communic ations 79 (1994), 100-110. [16] M. E. Muller, A comparison of metho ds for generating normal v ariates on digital computers. J. ACM 6:376–383, 1959. [17] W. P . Peterse n, Some vectori zed random number generators for uniform, normal, and P oisson distributions for CRA Y X-MP , J. Sup er c omputing , 1:32 7–335, 1988. [18] W. P . Peterse n, Lagged Fibonacci Series Random Number Generators for the N EC SX-3, International J. of High Sp e e d Computing 6 (1994), 387-398. [19] W. P . Peters en, R andom Numb er Gener ators on V e ctor Ar chite ctur es , preprint, 1993. [20] C. S. W allace, T ransformed rejection generators for Gamma an d Normal pseudoran- dom vari ables, Aus tr alian Computer Journal 8 (1976), 103–105. [21] C. S. W allace, F ast Pseudo-R andom Generators for Normal and Exp onentia l V ariates, ACM T r ans. on Mathematic al Softwar e 22 (1996), 119–12 7. Computer Sciences Labora tor y, RSISE, Australian Na tional Universi ty, Canberra, ACT 0200, Australia E-mail addr ess : Richard.Bren t@anu.edu.a u

A fast vectorised implementation of Wallaces normal random number generator

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment