George Forsythe's last paper

We describe von Neumann's elegant idea for sampling from the exponential distribution, Forsythe's generalization for sampling from a probability distribution whose density has the form exp(−G(x)), where G(x) is easy to compute (e.g. a polynomial), and Brent's refinement of these ideas to give an efficient algorithm for generating pseudo-random numbers with a normal distribution.

Author: Richard P. Brent (the paper is narrated in the first person). Related people: George Forsythe, John von Neumann, Gene Golub.

George Forsythe's last paper*

Richard P. Brent
Mathematical Sciences Institute
Australian National University
Canberra, ACT 0200, Australia

In memory of George and Sandra Forsythe

"I have a feeling, however, that it is somehow silly to take a random number and put it elaborately into a power series ..."
(John von Neumann, Collected Works Vol. 5, pg. 769)

* Presented at the "Stanford 50" meeting, Stanford University, 29 March 2007. Copyright © 2007–2010, R. P. Brent. rpb238

Summary

I will describe von Neumann's elegant idea for sampling from the exponential distribution, Forsythe's generalization for sampling from a probability distribution whose density has the form exp(−G(x)), where G(x) is easy to compute (e.g. a polynomial), and my refinement of these ideas to give an efficient algorithm for generating pseudo-random numbers with a normal distribution. Later developments will also be mentioned.

Background

Roger Hockney and I are the only people who were lucky enough to have both George Forsythe and Gene Golub as PhD advisors (see the Mathematics Genealogy Project). In my case this came about because Gene went on sabbatical to the UK, and George took over while he was away. However, I managed to finish before Gene returned to Stanford.

That was in the days before email, and there was a mail strike in the UK, so communication with Gene was difficult. Perhaps that helped me to finish quickly, because if Gene had been at Stanford he probably would have asked me to do more work on the last chapter!

Most of you here today know Gene, but only the older ones will remember George and Sandra Forsythe, so today I will talk about George Forsythe and an interesting link back to John von Neumann and the early days of computers.
History

In summer 1949 Forsythe attended some lectures at UCLA by John von Neumann on the topic of random number generation. The lectures were part of a Symposium on the (then new) Monte Carlo method [15, p. 236]. It seems that von Neumann never wrote up the lectures, but a fascinating 3-page summary was written by Forsythe and published in 1951.

Forsythe must have continued to think about the topic because, shortly before he died, he wrote a Stanford report, Von Neumann's comparison method for random sampling from the normal and other distributions (STAN-CS-72-254, dated February 9, 1972). This expanded on a brief comment by von Neumann that his method [for the exponential distribution] "can be modified to yield a distribution satisfying any first-order differential equation" (Collected Works 5, 770).

Ahrens, Dieter and Knuth

Forsythe intended that his Stanford report would form the basis of a joint paper with J. H. Ahrens and U. Dieter, who had discovered related results independently and had presented them at Stanford in October 1971. After Forsythe died in April 1972, Don Knuth submitted the Stanford report to Mathematics of Computation, and it was published with only minor changes in the October 1972 issue. This was Forsythe's last published paper, with the possible exception of a paper by E. H. Lee and Forsythe in SIAM Review (submitted in October 1971 and published in January 1973).

Ahrens and Dieter published a follow-up paper in Math. Comp. (1973) [2], and I published an implementation GRAND of my improvement of the Forsythe–von Neumann method in Comm. ACM (1974) [4]. That was in the days before TOMS, when interesting algorithms were still published in Communications.

The problem

Suppose we want to sample a probability distribution with density f(x) = exp(−G(x)), where G(x) is some simple function, e.g. a polynomial.
Von Neumann illustrated his idea for the exponential distribution G(x) = x (x ≥ 0), but it also applies to the normal distribution G(x) = x²/2 + ln(2π)/2. The function f(x) satisfies a first-order linear differential equation f′ + G′(x)f = 0, and conversely. That is why von Neumann made the remark about first-order differential equations.

Von Neumann's insight

The obvious way to generate a sample from the exponential distribution is to generate a sample u ∈ (0, 1] from the uniform distribution and then take x = −ln(u). However, the evaluation of ln(u) is expensive (relative to the cost of generating u by an efficient uniform random number generator). Also, this method does not generalize well to the normal distribution, where we would need to evaluate the inverse of the normal distribution function

(1/√(2π)) ∫ from −∞ to x of exp(−t²/2) dt.

Von Neumann's insight was that we can generate a random sample using a small number (on average) of samples from a uniform distribution, and evaluation of G(x) at a small number of points. There is no need to compute any expensive special functions!

Probability of a run

Suppose for the moment that 0 ≤ u₁ = G ≤ 1. Generate samples u₂, u₃, ... from the uniform distribution so long as the numbers are decreasing, and then stop. In other words, find n ≥ 1 such that

G = u₁ > u₂ > u₃ > ... > uₙ ≤ uₙ₊₁.   (1)

The probability that G > u₂ > u₃ > ... > uₙ > uₙ₊₁ is

Gⁿ/n! = Prob(max(u₂, ..., uₙ₊₁) < G)/n!,

so the probability of (1) is

pₙ = G^(n−1)/(n−1)! − Gⁿ/n!.

Check: p₁ + p₂ + ... = 1 by a telescoping series, so the algorithm terminates with probability 1. Exercise: the expected value of n is exp(G).

The power series for exp(−G)

What is the probability that our final n is odd? It is just

p₁ + p₃ + ... = 1 − G + G²/2! − G³/3! + ... = exp(−G).
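The claim that the final n is odd with probability exp(−G) is easy to check by simulation. A minimal sketch (the helper names `run_length` and `prob_n_odd` are mine, not from the paper):

```python
import math
import random

def run_length(G, rng=random):
    """Return the n of condition (1): starting from u1 = G, draw uniforms
    u2, u3, ... while they strictly decrease; stop when u_n <= u_{n+1}."""
    prev, n = G, 1
    while True:
        u = rng.random()
        if prev > u:
            prev, n = u, n + 1
        else:
            return n

def prob_n_odd(G, trials=200_000, seed=1):
    """Estimate Prob(n is odd), which should be close to exp(-G)."""
    rng = random.Random(seed)
    odd = sum(run_length(G, rng) % 2 for _ in range(trials))
    return odd / trials
```

With G = 0.5, for example, the estimate should agree with exp(−0.5) ≈ 0.607 to a couple of decimal places.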
This suggests a rejection method for generating a sample from the distribution with density exp(−G(x)) on some interval [a, b]:

1. Generate uniform w ∈ [a, b] and set u₁ ← G(w).
2. Generate uniform u₂, u₃, ... ∈ [0, 1] until condition (1) is satisfied (uₙ ≤ uₙ₊₁).
3. If n is even, return to step 1 (i.e. reject w).
4. Return w (i.e. accept w).

This works because the probability that w is accepted, i.e. the probability that n is odd at step 3, is exactly exp(−G(w)).

An important condition

The algorithm only works correctly if 0 ≤ G(w) ≤ 1 on the interval w ∈ [a, b]. To apply the idea to the exponential or normal distributions we have to split the infinite interval [0, +∞) or (−∞, +∞) into a union of finite intervals I_k. Provided the intervals I_k are small enough, we can use the algorithm to generate samples from each I_k. Thus, first select k with the correct probability

∫ over I_k of exp(−G(x)) dx,

then use the Forsythe–von Neumann algorithm to get a sample from I_k.

Minor detail: the function G(x) has to be modified by addition of a constant to give the appropriate function G_k(x) on the interval I_k. For example, we could use (x² − a²)/2 for the normal distribution on [a, b], where 0 ≤ a < b and b² − a² ≤ 2.

Exponential and normal distributions

For the exponential distribution, consider the intervals I_k = [(k−1) ln 2, k ln 2). For convenience on a binary computer, our sample should lie in I_k with probability 2^(−k), k = 1, 2, ... We can select k by counting the leading zero bits in a uniform random number (giving k − 1). Then we can apply a rejection method to get a sample with the correct distribution from I_k.

For the normal distribution, it is convenient to randomly generate the sign, then consider the interval [0, ∞).
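The exponential-distribution scheme just described (intervals I_k = [(k−1) ln 2, k ln 2), k selected with probability 2^(−k), then the odd-run acceptance test) can be put together in a short runnable sketch. This is my own illustration of the steps above, not the GRAND code:

```python
import math
import random

LN2 = math.log(2.0)

def accept(g, rng):
    """Accept with probability exp(-g), 0 <= g <= 1, using only uniform
    samples: run condition (1) from u1 = g and test whether n is odd."""
    prev, n = g, 1
    while True:
        u = rng.random()
        if prev > u:
            prev, n = u, n + 1
        else:
            return n % 2 == 1

def rand_exponential(rng=random):
    """Sample from the density exp(-x) on [0, inf) by interval splitting."""
    while True:
        # Select I_k = [(k-1) ln 2, k ln 2) with probability 2^(-k)
        # (equivalent to counting leading zero bits of a uniform number).
        k = 1
        while rng.random() < 0.5:
            k += 1
        # Uniform w in I_k; the shifted G_k(w) = w - (k-1) ln 2 lies in [0, ln 2).
        w = (k - 1) * LN2 + rng.random() * LN2
        if accept(w - (k - 1) * LN2, rng):
            return w
```

On rejection this sketch restarts from the selection of k; since the shifted G_k has the same range [0, ln 2) in every interval, the acceptance rate is the same for all k, so the sampled density remains proportional to exp(−w).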
We subdivide this interval into intervals I_k = [a_(k−1), a_k) such that a₀ = 0 and

√(2/π) ∫ from a_(k−1) to a_k of exp(−x²/2) dx = 2^(−k), for k = 1, 2, ..., w.

It is easy to precompute a table of the constants a_k. The table is small, since we can neglect probabilities 2^(−k) if k is greater than the wordlength w of the computer.

Historical notes

For the exponential distribution, von Neumann took intervals I_k = [k − 1, k), so the probability of sampling from I_k is (e − 1)/e^k. He did this because he had a trick for combining the trials with the selection of intervals. However, his trick does not generalize to other distributions.

For the normal distribution, Forsythe used intervals defined by a₀ = 0 and a_k = √(2k − 1) for k ≥ 1. Presumably he did this because then a_k² − a_(k−1)² = 2 for k ≥ 2. This choice of a_k is what Sandra Forsythe used in her implementation of the algorithm:

"The correctness of this algorithm ... (has) been confirmed in unpublished experiments by A. I. Forsythe and independently by J. H. Ahrens." (George Forsythe)

Comment

It is better to use the intervals that I defined, as used in GRAND, because then we do not need to store a table of probabilities (they are just negative powers of 2). With my choice it can be shown that a_k² − a_(k−1)² < 2 ln 2 < 1.39 for k ≥ 1. As well as reducing the table size, my choice reduces the expected number of calls to the uniform random number generator.

Refinements

The algorithms proposed by Forsythe and von Neumann were inefficient in the sense that they used more uniform samples than necessary to generate one sample from the exponential or normal distribution. The algorithm implemented by Sandra Forsythe requires (on average) 4.04 uniform samples per normal sample. For von Neumann's algorithm the corresponding constant is 5.88. Ahrens and Dieter (1973) reduced the constant 4.04 to 2.54 (and even further at the expense of larger tables and more complications).

In my 1974 paper describing GRAND I showed how 4.04 could be reduced to 1.38 by using a better subdivision of the infinite interval [0, +∞) and by not wasting random bits. For example, after step 2, (u_(n+1) − u_n)/(1 − u_n) is uniformly distributed and can be used later.

Further refinements

In principle, by using larger tables, it is possible to reduce the constant to 1 + ε for any ε > 0, but this would not necessarily give a faster algorithm. In practice 1.38 is small enough.

Later developments

The idea of rejection methods was developed by many people to give efficient algorithms for sampling from a great variety of distributions; see for example the books by Devroye and Knuth (Vol. 2). Specifically for the normal distribution, Forsythe's method (as improved and implemented in GRAND) is much faster than earlier methods, such as the Box–Muller and Polar methods. There are now many different algorithms for the normal distribution, but I think it is fair to say that none are clearly better than GRAND. The differences between the best algorithms are small; often there is a tradeoff between space and time, and the relative speeds depend on the machine architecture as well as on the choice of uniform random number generator.

Wallace's method

The only method that is clearly much faster than GRAND is Wallace's method, proposed in 1994 by Chris Wallace. It does not use a uniform random number generator. Instead, a pool of normally distributed numbers is maintained and refreshed by performing orthogonal transformations. The key observation is that, if x is a vector of n independent, normally distributed numbers, then the probability density of x,

(2π)^(−n/2) exp(−(x₁² + ... + xₙ²)/2),

is a function of ||x||², i.e. the distribution has spherical symmetry. It follows that, if Q is an n × n orthogonal matrix, then y = Qx is another vector of normally distributed numbers, because ||y||² = ||x||².

Wallace's method is interesting and fast, but suffers from some statistical problems: see my paper in the Wallace memorial issue [5].

References

[1] J. H. Ahrens and U. Dieter, Computer methods for sampling from the exponential and normal distributions, Comm. ACM 15 (1972), 873–881.

[2] J. H. Ahrens and U. Dieter, Extensions of Forsythe's method for random sampling from the normal distribution, Math. Comp. 27, 124 (Oct. 1973), 927–937.

[3] G. E. Box and M. E. Muller, A note on the generation of random normal deviates, Ann. Math. Statistics 29 (1958), 610–611.

[4] R. P. Brent, Algorithm 488: A Gaussian pseudo-random number generator [G5], Comm. ACM 17, 12 (1974), 704–706. http://wwwmaths.anu.edu.au/~brent/pub/pub023.html

[5] R. P. Brent, Some comments on C. S. Wallace's random number generators, Computer Journal 51, 5 (2008), 579–584. http://wwwmaths.anu.edu.au/~brent/pub/pub213.html

[6] L. Devroye, Non-Uniform Random Variate Generation, Springer-Verlag, New York, 1986. Available from http://cg.scs.carleton.ca/~luc/rnbookindex.html

[7] G. E. Forsythe, Von Neumann's comparison method for random sampling from the normal and other distributions, Report STAN-CS-72-254, January 1972 [but dated February 9, 1972], 21 pp.

[8] G. E. Forsythe, Von Neumann's comparison method for random sampling from the normal and other distributions, Math. Comp. 26, 120 (Oct. 1972), 817–826. [Submitted posthumously April 27, 1972; Forsythe died April 9, 1972.]

[9] A. J. Kinderman and J. F. Monahan, Computer generation of random variables using the ratio of uniform deviates, ACM Trans. on Mathematical Software 3 (1977), 257–260.

[10] D. E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms (third edition), Addison-Wesley, Menlo Park, 1998.

[11] E. H. Lee and G. E. Forsythe, Variational study of nonlinear spline curves, SIAM Review 15, 1 (Jan. 1973), 120–123. [Submitted October 21, 1971; revised March 13, 1972.]

[12] J. L. Leva, A fast normal random number generator, ACM Trans. on Mathematical Software 18 (1992), 449–453.

[13] M. D. MacLaren, G. Marsaglia and T. A. Bray, A fast procedure for generating normal random variables, Comm. ACM 7 (1964), 4–10.

[14] M. E. Muller, A comparison of methods for generating normal variates on digital computers, J. ACM 6 (1959), 376–383.

[15] J. von Neumann, Various techniques used in connection with random digits, in Monte Carlo Method, Appl. Math. Series 12, US Nat. Bureau of Standards, 1951, 36–38 (summary written by G. E. Forsythe); reprinted in John von Neumann Collected Works (ed. A. H. Taub), 5, Pergamon Press, New York, 1963, 768–770. [This is a summary written by George Forsythe, not by von Neumann. Symposium held June–July 1949 at the Institute for Numerical Analysis, UCLA. The 3-page paper contains many interesting observations, e.g. how to get an unbiased sample by tossing a biased coin, and how to generate samples from various distributions by rejection methods.]

[16] C. S. Wallace, Fast pseudo-random generators for normal and exponential variates, ACM Trans. on Mathematical Software 22 (1996), 119–127. Also Report TR#94/197, Department of Computer Science, Monash University, February 1994.

[17] The Mathematics Genealogy Project, http://genealogy.math.ndsu.nodak.edu/
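The spherical-symmetry observation behind Wallace's method can be illustrated with any orthogonal transformation. A small sketch using a Householder reflection (my choice of Q for illustration; Wallace's generator uses its own family of transformations):

```python
import math
import random

def householder_apply(v, x):
    """Apply the Householder reflection Q = I - 2 v v^T / (v^T v) to x.
    Q is orthogonal, so ||Qx|| = ||x||."""
    vv = sum(vi * vi for vi in v)
    coef = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / vv
    return [xi - coef * vi for vi, xi in zip(v, x)]

rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # a small pool of normal samples
v = [rng.random() + 0.1 for _ in range(8)]   # any nonzero vector defines a Q
y = householder_apply(v, x)                  # same norm as x
```

Since ||y|| = ||x||, the transformed pool has the same spherically symmetric joint density, so y is again a vector of normally distributed numbers; Wallace's generator refreshes its pool with such transformations instead of consuming uniform deviates.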
