A fast vectorised implementation of Wallace's normal random number generator

📝 Original Info

  • Title: A fast vectorised implementation of Wallace's normal random number generator
  • ArXiv ID: 1004.3114
  • Date: 2010-04-20
  • Authors: Richard P. Brent

📝 Abstract

Wallace has proposed a new class of pseudo-random generators for normal variates. These generators do not require a stream of uniform pseudo-random numbers, except for initialisation. The inner loops are essentially matrix-vector multiplications and are very suitable for implementation on vector processors or vector/parallel processors such as the Fujitsu VPP300. In this report we outline Wallace's idea, consider some variations on it, and describe a vectorised implementation RANN4 which is more than three times faster than its best competitors (the Polar and Box-Muller methods) on the Fujitsu VP2200 and VPP300.

📄 Full Content

arXiv:1004.3114v1 [cs.DS] 19 Apr 2010

A FAST VECTORISED IMPLEMENTATION OF WALLACE’S NORMAL RANDOM NUMBER GENERATOR

RICHARD P. BRENT

Abstract. Wallace has proposed a new class of pseudo-random generators for normal variates. These generators do not require a stream of uniform pseudo-random numbers, except for initialisation. The inner loops are essentially matrix-vector multiplications and are very suitable for implementation on vector processors or vector/parallel processors such as the Fujitsu VPP300. In this report we outline Wallace’s idea, consider some variations on it, and describe a vectorised implementation RANN4 which is more than three times faster than its best competitors (the Polar and Box-Muller methods) on the Fujitsu VP2200 and VPP300.

1. Introduction

Several recent papers [3, 5, 18, 19] have considered the generation of uniformly distributed pseudo-random numbers on vector and parallel computers. In many applications, random numbers from specified non-uniform distributions are required. A common requirement is for the normal distribution, which is what we consider here. In principle it is sufficient to consider methods for generating normally distributed numbers with mean 0 and variance 1, since translation and scaling can easily be performed to give numbers with mean µ and variance σ² (usually referred to as numbers with the N(µ, σ²) distribution).

The most efficient methods for generating normally distributed random numbers on sequential machines [2, 4, 9, 10, 11, 12, 14, 20] involve the use of different approximations on different intervals, and/or the use of “rejection” methods, so they do not vectorise well. Simple, “old-fashioned” methods may be preferable on vector processors. In [6] we described two such methods, the Box-Muller [16] and Polar methods [12].
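
For orientation, the two "old-fashioned" methods named above can be sketched in scalar form. This is a minimal textbook-style illustration in Python, not the vectorised code of [6] and not RANN3. The function names `box_muller`, `polar` and `scale` are hypothetical, and the standard-library `random.Random` stands in for whatever uniform generator an implementation would actually use.

```python
import math
import random


def box_muller(rng):
    """Return two independent N(0, 1) variates from two uniforms (Box-Muller)."""
    u1 = 1.0 - rng.random()          # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def polar(rng):
    """Return two independent N(0, 1) variates by the Polar method."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if 0.0 < s < 1.0:            # keep only points inside the unit disc
            f = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * f, v2 * f


def scale(z, mu, sigma):
    """Map an N(0, 1) variate z to an N(mu, sigma**2) variate."""
    return mu + sigma * z


if __name__ == "__main__":
    rng = random.Random(1)           # arbitrary seed, for repeatability only
    z1, z2 = polar(rng)
    print(scale(z1, 10.0, 2.0), scale(z2, 10.0, 2.0))
```

Both routines call `log` and `sqrt` for every pair of outputs, and Box-Muller also calls `sin` and `cos`, which is where much of their cost per number comes from.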
The Polar method was implemented as RANN3 and was the fastest vectorised method for normally distributed numbers known at the time [17, 19], although much slower than the best uniform random number generators. For example, on the Fujitsu VP2200/10 a normal random number using RANN3 requires an average of 21.9 cycles, but a good generalised Fibonacci uniform random number generator requires only 2.21 cycles. (A cycle on the VP2200/10 is 3.2 nsec. Since four floating-point operations can be performed per cycle, the theoretical peak performance of the VP2200/10 is 1250 Mflop. The cycle time of the VPP300 is 7 nsec but the pipelines are wider, so the theoretical peak performance is 2285 Mflop.)

Date: 14 April 1997.
1991 Mathematics Subject Classification. Primary 65C10, Secondary 54C70, 60G15, 65Y10, 68U20.
Key words and phrases. Gaussian random numbers, maximum entropy, normal distribution, normal random numbers, pseudo-random numbers, random number generators, random numbers, simulation, vector processors, Wallace’s method.
Copyright © 1997, R. P. Brent. rpb170tr typeset using AMS-LaTeX.

Recently Wallace [21] proposed a new class of pseudo-random generators for normal variates. These generators do not require a stream of uniform pseudo-random numbers (except for initialisation) or the evaluation of elementary functions such as log, sqrt, sin or cos (needed by the Box-Muller and Polar methods). The crucial observation is that, if x is an n-vector of normally distributed random numbers, and A is an n × n orthogonal matrix, then y = Ax is another n-vector of normally distributed numbers. Thus, given a pool of nN normally distributed numbers, we can generate another pool of nN normally distributed numbers by performing N matrix-vector multiplications. The inner loops are very suitable for implementation on vector processors such as the VP2200 or vector/parallel processors such as the VPP300.
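
The pool update described here, N matrix-vector products y = Ax over a pool of nN numbers, can be sketched as follows. This is an illustration under stated assumptions, not RANN4. The 4 × 4 matrix `A` with entries ±1/2 is one convenient orthogonal matrix chosen for the example, `transform_pool` is a hypothetical helper name, and the initial pool is filled with the standard-library `random.gauss` purely for demonstration.

```python
import math
import random

# Assumed example matrix: 4x4 with entries +-1/2. Its rows are orthonormal,
# so A is orthogonal. The text only requires A to be orthogonal; this
# particular choice is an illustration.
A = [
    [0.5,  0.5,  0.5,  0.5],
    [0.5, -0.5,  0.5, -0.5],
    [0.5,  0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5,  0.5],
]


def transform_pool(pool, a):
    """Apply y = a x to each consecutive n-vector of pool; return the new pool."""
    n = len(a)
    if len(pool) % n != 0:
        raise ValueError("pool length must be a multiple of n")
    out = []
    for k in range(0, len(pool), n):
        x = pool[k:k + n]
        # One matrix-vector product per group of n pool entries.
        out.extend(sum(a[i][j] * x[j] for j in range(n)) for i in range(n))
    return out


if __name__ == "__main__":
    rng = random.Random(7)                    # used for initialisation only
    n, big_n = 4, 1000                        # n small, N large
    pool = [rng.gauss(0.0, 1.0) for _ in range(n * big_n)]
    new_pool = transform_pool(pool, A)
    # An orthogonal transformation preserves the sum of squares of each vector.
    print(math.isclose(sum(v * v for v in pool),
                       sum(v * v for v in new_pool), rel_tol=1e-9))
```

No uniform random numbers or elementary functions appear in `transform_pool`: the work is additions and multiplications only. On a vector machine the loop over `k` would be the long vector dimension. A practical generator would also need to vary which pool entries are combined from one pass to the next, which this sketch does not attempt.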
The vector lengths are proportional to N, and the number of arithmetic operations per normally distributed number is proportional to n. Typically we choose n to be small, say 2 ≤ n ≤ 4, and N to be large.

Wallace implemented variants of his new method on a scalar RISC workstation, and found that its speed was comparable to that of a fast uniform generator. The same performance relative to a fast uniform generator is achievable on a vector processor, although some care has to be taken with the implementation (see §7).

In §2 we describe Wallace’s new methods in more detail. Some statistical questions are considered in §§3–6. Aspects of implementation on a vector processor are discussed in §7, and details of an implementation on the VP2200 and VPP300 are given in §8. Some conclusions are drawn in §9.

2. Wallace’s Normal Generators

The idea of Wallace’s new generators is to keep a pool of nN normally distributed pseudo-random variates. As numbers in the pool are used, new normally distributed variates are generated by forming appropriate combinations of the numbers which have been used. On a vector processor N can be large and the whole pool can be regenerated with only a small number of vector operations. As just outlined,

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
