Information geometry for testing pseudorandom number generators


Authors: C.T.J. Dodson

C.T.J. Dodson, School of Mathematics, University of Manchester, Manchester M13 9PL, UK (ctdodson@manchester.ac.uk)

Abstract

The information geometry of the 2-manifold of gamma probability density functions provides a framework in which pseudorandom number generators may be evaluated using a neighbourhood of the curve of exponential density functions. The process is illustrated using the pseudorandom number generator in Mathematica. This methodology may be a useful addition to the current family of test procedures in real applications to finite sampling data.

1 Introduction

The smooth family of gamma probability density functions is given by

$$ f : [0,\infty) \to [0,\infty) : x \mapsto e^{-x\kappa/\mu}\, \frac{x^{\kappa-1}}{\Gamma(\kappa)} \left(\frac{\kappa}{\mu}\right)^{\kappa}, \qquad \mu, \kappa > 0. \tag{1} $$

Here $\mu$ is the mean, and the standard deviation $\sigma$, given by $\kappa = (\mu/\sigma)^2$, is proportional to the mean. Hence the coefficient of variation, $1/\sqrt{\kappa}$, is unity exactly when $\kappa = 1$, the case in which (1) reduces to the exponential distribution. Thus $\kappa = 1$ corresponds to an underlying Poisson random process complementary to the exponential distribution. When $\kappa < 1$ the random variable $X$ represents spacings between events that are more clustered than for a Poisson process, and when $\kappa > 1$ the spacings $X$ are more uniformly distributed than for Poisson. The case $\mu = n - 1$, $\kappa = (n-1)/2$ for a positive integer $n$ gives the chi-squared distribution with $n - 1$ degrees of freedom; this is the distribution of $(n-1)s^2/\sigma_G^2$ for variances $s^2$ of samples of size $n$ taken from a Gaussian population with variance $\sigma_G^2$.

The gamma distribution has a conveniently tractable information geometry [1, 2], and the Riemannian metric in the 2-dimensional manifold of gamma distributions (1) is

$$ [g_{ij}](\mu,\kappa) = \begin{bmatrix} \dfrac{\kappa}{\mu^{2}} & 0 \\ 0 & \dfrac{d^{2}}{d\kappa^{2}}\log\Gamma(\kappa) - \dfrac{1}{\kappa} \end{bmatrix}. \tag{2} $$

So the coordinates $(\mu, \kappa)$ yield an orthogonal basis of tangent vectors, which is useful in calculations because the arc length function is then simply

$$ ds^{2} = \frac{\kappa}{\mu^{2}}\, d\mu^{2} + \left( \left( \frac{\Gamma'(\kappa)}{\Gamma(\kappa)} \right)' - \frac{1}{\kappa} \right) d\kappa^{2}. $$

Figure 1: Maximum likelihood gamma parameter $\kappa$ fitted to separation statistics for simulations of Poisson random sequences of length 100000 for an element with expected parameters $(\mu, \kappa) = (511, 1)$. These simulations used the pseudorandom number generator in Mathematica [7].

We note the following important uniqueness property:

Theorem 1.1 (Hwang and Hu [4]) For independent positive random variables with a common probability density function $f$, independence of the sample mean and the sample coefficient of variation is equivalent to $f$ being the gamma distribution.

This property is one of the main reasons for the large number of applications of gamma distributions: many near-random natural processes have standard deviation approximately proportional to the mean [2]. Given a set of identically distributed, independent data values $X_1, X_2, \ldots, X_n$, the 'maximum likelihood' or 'maximum entropy' parameter values $\hat\mu, \hat\kappa$ for fitting the gamma distribution (1) are computed in terms of the mean and mean logarithm of the $X_i$ by maximizing the likelihood function

$$ L_f(\mu,\kappa) = \prod_{i=1}^{n} f(X_i; \mu, \kappa). $$

By taking the logarithm and setting the gradient to zero we obtain (3) from the $\mu$-component, and then (4) by substituting (3) into the $\kappa$-component:

$$ \hat\mu = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{3} $$

$$ \log\hat\kappa - \frac{\Gamma'(\hat\kappa)}{\Gamma(\hat\kappa)} = \log\bar{X} - \frac{1}{n} \sum_{i=1}^{n} \log X_i = \log\bar{X} - \overline{\log X}. \tag{4} $$

2 Neighbourhoods of randomness in the gamma manifold

In a variety of contexts in cryptology, for encoding, decoding or obscuring procedures, sequences of pseudorandom numbers are generated.
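
The ingredients of this section (a pseudorandom sequence, the separations between occurrences of one element, the maximum likelihood fit of equations (3) and (4), and the bound of equation (5)) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Mathematica procedure behind the reported simulations: it uses Python's `random` module (a Mersenne Twister generator) in place of Mathematica's generator; it models a Poisson random sequence by marking each position independently with probability $p$; it takes a separation to be the difference between successive marked positions; it evaluates $\Gamma'/\Gamma$ and its derivative with a standard recurrence plus asymptotic series; and it solves equation (4) for $\hat\kappa$ by Newton's method from a closed-form starting value. The function names `digamma`, `trigamma`, `separations`, `fit_gamma` and `distance_bound` are hypothetical.

```python
import math
import random


def digamma(x: float) -> float:
    """psi(x) = Gamma'(x)/Gamma(x) for x > 0: recurrence, then asymptotic series."""
    result = 0.0
    while x < 10.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x: float) -> float:
    """psi'(x) = d^2/dx^2 log Gamma(x) for x > 0, computed the same way."""
    result = 0.0
    while x < 10.0:  # psi'(x) = psi'(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return result + inv + 0.5 * inv2 + series


def separations(n: int, p: float, rng: random.Random) -> list[int]:
    """Gaps between successive occurrences of a marked element (hypothetical model).

    Assumption: each of the n positions independently holds the element with
    probability p, and a separation is the difference of successive positions,
    so every separation is at least 1 and its logarithm is defined.
    """
    positions = [i for i in range(n) if rng.random() < p]
    return [b - a for a, b in zip(positions, positions[1:])]


def fit_gamma(xs: list[float], tol: float = 1e-12) -> tuple[float, float]:
    """Maximum likelihood (mu, kappa): equation (3), then Newton's method on (4)."""
    mu = sum(xs) / len(xs)  # equation (3)
    s = math.log(mu) - sum(math.log(x) for x in xs) / len(xs)  # right side of (4)
    if s <= 0:
        raise ValueError("log-mean equals mean-log (e.g. all values equal): no finite root")
    # Closed-form starting value that is already close to the root of (4).
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    for _ in range(100):
        f = math.log(k) - digamma(k) - s  # residual of (4)
        step = f / (1.0 / k - trigamma(k))  # derivative 1/k - psi'(k) is negative
        k_new = k - step
        k = k_new if k_new > 0 else k / 2  # keep kappa inside the manifold
        if abs(step) < tol * k:
            break
    return mu, k


def distance_bound(mu: float, kappa: float) -> float:
    """Right-hand side of equation (5): bound on the distance from (511, 1)."""
    return abs(trigamma(kappa) - trigamma(1.0)) + abs(math.log(511.0 / mu))


if __name__ == "__main__":
    rng = random.Random(2008)  # arbitrary seed, for reproducibility only
    mu_hat, kappa_hat = fit_gamma(separations(100_000, 0.00195, rng))
    print(f"mu = {mu_hat:.1f}, kappa = {kappa_hat:.3f}, "
          f"bound = {distance_bound(mu_hat, kappa_hat):.3f}")
```

Repeating the run with different seeds gives a collection of fitted points $(\hat\mu, \hat\kappa)$ of the kind plotted in Figures 1 and 2, together with their bounds from equation (5); the numbers will not match the published ones, since the generator and the sequence model here are assumptions. Newton's method suits equation (4) because its left-hand side $\log\kappa - \Gamma'(\kappa)/\Gamma(\kappa)$ is strictly decreasing in $\kappa$, so the root is unique whenever the right-hand side is positive.
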
Tests for randomness of such sequences have been studied extensively, and the NIST suite of tests [5] for cryptological purposes is widely employed. Information-theoretic methods are also used; see, for example, Grzegorzewski and Wieczorkowski [3] and Ryabko and Monarev [6], and references therein, for recent work. Here we show how pseudorandom sequences may be tested with information geometry, by using distances in the gamma manifold to compare maximum likelihood parameters for separation statistics of sequence elements.

Mathematica [7] simulations were made of Poisson random sequences of length $n = 100000$, and spacing statistics were computed for an element with abundance probability $p = 0.00195$ in the sequence. Figure 1 shows maximum likelihood gamma parameter $\kappa$ data points from such simulations. In the data from 500 simulations, the ranges of the maximum likelihood gamma distribution parameters were $419 \le \mu \le 643$ and $0.62 \le \kappa \le 1.56$.

The surface height in Figure 2 represents upper bounds on information geometric distances from $(\mu, \kappa) = (511, 1)$ in the gamma manifold. This employs the geodesic mesh function we described in Arwini and Dodson [2]:

$$ \mathrm{Distance}[(511, 1), (\mu, \kappa)] \le \left| \frac{d^{2}\log\Gamma}{d\kappa^{2}}(\kappa) - \frac{d^{2}\log\Gamma}{d\kappa^{2}}(1) \right| + \left| \log\frac{511}{\mu} \right|. \tag{5} $$

Also shown in Figure 2 are data points from the Mathematica simulations of Poisson random sequences of length 100000 for an element with expected separation $\mu = 511$. In the limit, as the sequence length tends to infinity and the abundance of the element tends to zero, we expect the gamma parameter $\kappa$ to tend to 1. However, finite sequences must be used in real applications, and then the provision of a metric structure allows us, for example, to compare real sequence-generating procedures against an ideal Poisson random model.

Figure 2: Distances in the space of gamma models, using a geodesic mesh. The surface height represents upper bounds on distances from $(\mu, \kappa) = (511, 1)$ from Equation (5). Also shown are data points from simulations of Poisson random sequences of length 100000 for an element with expected separation $\mu = 511$. In the limit as the sequence length tends to infinity and the element abundance tends to zero, we expect the gamma parameter $\kappa$ to tend to 1.

References

[1] S-I. Amari and H. Nagaoka. Methods of Information Geometry. American Mathematical Society, Oxford University Press, 2000.
[2] K. Arwini and C.T.J. Dodson. Information Geometry Near Randomness and Near Independence. Lecture Notes in Mathematics, Springer-Verlag, New York, Berlin, 2008.
[3] P. Grzegorzewski and R. Wieczorkowski. Entropy-based goodness-of-fit test for exponentiality. Commun. Statist. Theory Meth. 28, 5 (1999) 1183-1202.
[4] T-Y. Hwang and C-Y. Hu. On a characterization of the gamma distribution: The independence of the sample mean and the sample coefficient of variation. Annals Inst. Statist. Math. 51, 4 (1999) 749-753.
[5] A. Rukhin, J. Soto et al. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. National Institute of Standards & Technology, Gaithersburg, MD, USA, 2001.
[6] B.Ya. Ryabko and V.A. Monarev. Using information theory approach to randomness testing. Preprint: arXiv:cs.IT/0504006 v1, 3 April 2005.
[7] S. Wolfram. The Mathematica Book, 3rd edition. Cambridge University Press, Cambridge, 1996.
