Uses of randomness in computation

Random number generators are widely used in practical algorithms. Examples include simulation, number theory (primality testing and integer factorization), fault tolerance, routing, cryptography, optimization by simulated annealing, and perfect hashing.

Author: Richard P. Brent

The Galileo spacecraft is somewhere near Jupiter, but its main radio antenna is not working, so communication with it is very slow. Suppose we want to check that a critical program in Galileo's memory is correct, and has not been corrupted by a passing cosmic ray. How can we do this without transmitting the whole program to or from Galileo?

Here is one way. The program we want to check (say N_1) and the correct program on Earth (say N_2) can be regarded as multiple-precision integers. Choose a random prime p in the interval (10^9, 2×10^9). Transmit p to Galileo and ask it to compute r_1 ← N_1 mod p and send the result back to Earth. Only a few bits (no more than 64 for p and r_1) need be transmitted between Earth and Galileo, so we can afford to use good error correction/detection. On Earth we compute r_2 ← N_2 mod p and check whether r_1 = r_2. There are two possibilities:

• r_1 ≠ r_2. We conclude that N_1 ≠ N_2: Galileo's program has been corrupted! If there are only a small number of errors, they can be localised by binary search, using O(log log N_1) small messages.

• r_1 = r_2. We conclude that Galileo's program is probably correct. More precisely, if Galileo's program is not correct, there is a probability of less than 10^{-9} that r_1 = r_2, i.e. that we have a "false positive". If this probability is too large for the quality-assurance team to accept, just repeat the process (say) ten times with different random primes p_1, p_2, ..., p_{10}. If N_1 ≠ N_2, there is a probability of less than 10^{-90} that we get r_1 = r_2 ten times in a row. This should be good enough.

The problem and its solution were communicated to me by Michael Rabin, who called it the "Library of Congress on Mars" problem.

Our procedure has the following form. We ask a question with a yes/no answer; the precise question depends on a random number. If the answer is "no", we can assume that it is correct. If the answer is "yes", there is a small probability of error, but we can reduce this probability to a negligible level by repeating the procedure a few times with independent random numbers. We call such a procedure a probabilistic algorithm; other common names are randomised algorithm and Monte Carlo algorithm. (Of course, it would be much better to build error-correcting hardware into Galileo, and not depend on checking from Earth.)
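To make the procedure concrete, here is a minimal Python sketch of the fingerprint check. The function names are my own, and I use sympy's randprime to draw the random prime; neither appears in the original description.

```python
# Illustrative sketch of the "Library of Congress on Mars" check.
from sympy import randprime

def fingerprint(program: bytes, p: int) -> int:
    """Treat the program as a multiple-precision integer and reduce it mod p.
    Only this small residue (< 2^31 here) needs to be transmitted."""
    return int.from_bytes(program, "big") % p

def probably_equal(n1: bytes, n2: bytes, trials: int = 10) -> bool:
    """Compare two programs via random prime moduli. A "no" answer is
    always correct; a "yes" answer is wrong with negligible probability."""
    for _ in range(trials):
        p = randprime(10**9, 2 * 10**9)  # random prime in (10^9, 2*10^9)
        if fingerprint(n1, p) != fingerprint(n2, p):
            return False                 # definitely corrupted
    return True                          # equal with high probability
```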
Here is another example with the same structure, due to M. O. Rabin [49], with improvements by G. L. Miller (see Knuth [28]). We want an algorithm to determine whether a given odd positive integer n is prime. Write n as 2^k q + 1, where q is odd and k > 0.

Algorithm P:
1. Choose a random integer x in (1, n).
2. Compute y = x^q mod n. This can be done with O(log q) operations mod n, using the binary representation of q.
3. If y = 1 then return "yes".
4. For j = 1, 2, ..., k: if y = n − 1 then return "yes"; else if y = 1 then return "no"; else y ← y² mod n. Return "no".

To understand the mathematical basis for Algorithm P, recall Fermat's little theorem: if n is prime and 0 < x < n, then x^{n−1} ≡ 1 (mod n). Thus, if x^{n−1} ≢ 1 (mod n), we can definitely say that n is composite. Unfortunately, the converse of Fermat's little theorem is false: if x^{n−1} ≡ 1 (mod n), we cannot be sure that n is prime. There are composite numbers n (called Carmichael numbers) for which x^{n−1} ≡ 1 (mod n) whenever GCD(x, n) = 1. The smallest example is 561 = 3 · 11 · 17; another is Hardy's "taxi" number [23], 1729 = 12³ + 1³ = 10³ + 9³.

A slight extension of Fermat's little theorem is useful, because its converse is usually true:

If n = 2^k q + 1 is an odd prime (q odd), then either x^q ≡ 1 (mod n), or the sequence x^q, x^{2q}, x^{4q}, ..., x^{2^k q} (mod n) ends with 1, and the value just preceding the first appearance of 1 must be n − 1.

Proof: If y² ≡ 1 (mod n), then n | (y − 1)(y + 1). Since n is prime, n | (y − 1) or n | (y + 1). Thus y ≡ ±1 (mod n). ✷

The extension gives a necessary (but not sufficient) condition for primality of n. Algorithm P just checks whether this condition is satisfied for a random choice of x, and returns "yes" if it is. Algorithm P cannot give false negatives (unless we make an arithmetic mistake), but it can give false positives (i.e. "yes" when n is composite). However, the probability of a false positive is less than 1/4, and usually much less (see Knuth [28], ex. 4.5.4.22). Thus, if we repeat the algorithm 10 times there is less than 1 chance in 10^6 of a false positive, and if we repeat 100 times the results should satisfy anyone but a pure mathematician. Algorithm P works fine even if the input is a Carmichael number. Note that in both our examples randomness was introduced into the algorithm; we did not make any assumption about the distribution of inputs.
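A direct transcription of Algorithm P into Python might look as follows. This is an illustrative sketch (the wrapper is_probable_prime is my own naming); Python's built-in three-argument pow performs the modular exponentiation of step 2.

```python
import random

def algorithm_p(n: int) -> bool:
    """One round of Algorithm P for odd n > 3.
    Returns True  ("yes": n is probably prime) or
            False ("no":  n is definitely composite)."""
    # Write n = 2^k * q + 1 with q odd, k > 0.
    q, k = n - 1, 0
    while q % 2 == 0:
        q //= 2
        k += 1
    x = random.randrange(2, n - 1)   # random x in (1, n)
    y = pow(x, q, n)                 # O(log q) multiplications mod n
    if y == 1:
        return True
    for _ in range(k):
        if y == n - 1:
            return True
        if y == 1:
            return False             # nontrivial square root of 1: composite
        y = (y * y) % n
    return False

def is_probable_prime(n: int, rounds: int = 10) -> bool:
    """Repeat with independent x; false-positive probability < 4^(-rounds)."""
    return all(algorithm_p(n) for _ in range(rounds))

print(is_probable_prime(2**89 - 1))  # True: 2^89 - 1 is a Mersenne prime
```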
Given any ε > 0, we can check primality of a number n in O((log n)³ log(1/ε)) bit-operations, provided we are willing to accept a probability of error of at most ε. (We count bit-operations rather than arithmetic operations because the latter can be misleading: we can factor n deterministically in O(log n) arithmetic operations [56], but this result is useless because the operations are on numbers as large as 2^n.) By way of comparison, the best known deterministic algorithm takes O((log n)^{c log log log n}) bit-operations, and is much more complicated. If we assume the Generalised Riemann Hypothesis, the exponent can be reduced to 5. (But who believes in GRH with as much certainty as Algorithm P gives us?)

The probabilistic algorithms considered so far (Monte Carlo algorithms) can give the wrong answer with a small probability. There is another class of probabilistic algorithms (Las Vegas algorithms) for which the answer is always correct; only the runtime is random. (In practical cases the expected runtime is finite. It is possible that such an algorithm does not terminate, but only with probability zero.)

An interesting example is H. W. Lenstra's elliptic curve method (ECM) [36] for integer factorisation. To avoid trivial cases, suppose we want to find a prime factor p > 3 of an odd composite integer N. To motivate ECM, consider an earlier algorithm, Pollard's "p − 1" method. This works if p − 1 is "smooth", i.e. has only small prime factors. The quantity p − 1 is important because it is the order of the multiplicative group G of the field F_p. The problem is that G is fixed. Lenstra had the idea of using a group G(a, b) which depends on parameters (a, b). By randomly selecting a and b, we get a large set of different groups, and some of these should have smooth order. The group G(a, b) is the group of points on the elliptic curve y² = x³ + ax + b over F_p (known as the "Mordell-Weil" group), and by a famous theorem of Hasse (1934), later generalised by A. Weil and Deligne (the "Riemann hypothesis for finite fields"; see [34]), the order of G(a, b) is an integer in the interval (p + 1 − 2√p, p + 1 + 2√p). The distribution in this interval is not uniform, but it is "close enough" to uniform for our purposes. Under plausible assumptions, ECM has expected run time T = O(exp(√(c ln p ln ln p))) multiplications mod N, where c ≃ 2. Note that T depends mainly on the size of p, the factor found, and not very strongly on N. In practice the run time is close to an exponentially distributed random variable with mean and variance about T.

ECM is the best known algorithm for finding moderately large factors of very large numbers. Consider the 617-decimal-digit Fermat number F_11 = 2^{2^{11}} + 1. Its factorisation is F_11 = 319489 · 974849 · p_21 · p_22 · p_564, where p_21 and p_22 are 21-digit and 22-digit primes and p_564 is a 564-decimal-digit prime. In 1989 I found the 21-digit and 22-digit prime factors using ECM. The factorisation required about 360 million multiplications mod N, which took less than two hours on a Fujitsu VP 100 vector processor.

Hashing is a common technique used to map words into a small set of integers (which may then be used as indices to address a table). Thus, the computation r_1 ← N_1 mod p in our "Galileo" example can be regarded as a hash function. Formally, consider a set W = {w_0, w_1, ..., w_{m−1}} of m words w_j, each of which is a finite string of symbols over a finite alphabet Σ. A hash function is a function h : W → I, where I = {0, 1, ..., k − 1} and k is a fixed integer (the table size). A collision occurs if two words w_1 and w_2 map to the same address, i.e. if h(w_1) = h(w_2). There are various techniques for handling collisions [29], but these complicate the algorithms and introduce inefficiencies. In applications where W is fixed (e.g. the reserved words in a compiler), it is worth trying to avoid collisions altogether. If there are no collisions, the hash function is called perfect. For a perfect hash function we must have k ≥ m; if k = m the hash function is minimal. Given a set W, how can we compute a minimal perfect hash function?

The CHM Algorithm

Czech, Havas and Majewski (CHM) [14] give a probabilistic algorithm which runs in expected time O(m) (ignoring the effect of finite word-length). Their algorithm uses some properties of random graphs. Take n = 3m and let V = {1, 2, ..., n}. CHM take two independent pseudo-random functions f_1, f_2 : W → V (how this can be done is a theoretical weak point of the algorithm, but in practice the solution given in [14] is satisfactory) and let E = {(f_1(w_j), f_2(w_j)) : 0 ≤ j < m}. We can think of G = (V, E) as a random graph with n vertices and (at most) m edges. If G has fewer than m edges, or if G has cycles, CHM reject the choice of f_1, f_2 and try again. Eventually they get a graph G with m edges and no cycles. Because n = 3m, the expected number of trials is a constant (about √3, or more generally √(n/(n − 2m)), for large m and n > 2m). Once an acceptable G has been found, it is easy to compute (and store in a table) a function g : V → {0, 1, ..., m − 1} such that h(w) = (g(f_1(w)) + g(f_2(w))) mod m is the desired minimal perfect hash function. We can even get h(w_j) = j for j = 0, 1, ..., m − 1; all this requires is a depth-first search of G. CHM report that on a Sun SPARCstation 2 they can generate a minimal perfect hash function for a set of m = 2^19 words in 33 seconds. Earlier algorithms required time which (at least in the worst case) was an exponentially increasing function of m, so could only handle very small m.
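The following sketch illustrates the CHM construction under some simplifying assumptions of mine: Python's built-in hash with random salts stands in for the pseudo-random functions f_1, f_2, and the acyclicity test and the labelling g are combined in one depth-first search.

```python
import random

def chm_mphf(words):
    """Construct a minimal perfect hash function in the style of CHM [14].
    Returns (s1, s2, g, n) with h(w) = (g[f1(w)] + g[f2(w)]) % m and
    h(words[j]) == j. Relies on hash() being stable within one process."""
    m = len(words)
    n = 3 * m                      # n > 2m => expected number of trials is O(1)
    while True:
        s1, s2 = random.random(), random.random()
        f1 = lambda w: hash((s1, w)) % n   # stand-ins for the pseudo-random
        f2 = lambda w: hash((s2, w)) % n   # functions f1, f2 of CHM
        adj = [[] for _ in range(n)]
        for j, w in enumerate(words):
            u, v = f1(w), f2(w)
            adj[u].append((v, j))
            adj[v].append((u, j))
        # Label vertices so that g[u] + g[v] == j (mod m) on edge j; this
        # succeeds exactly when the graph is acyclic, otherwise we retry.
        g = [None] * n
        acyclic = True
        for start in range(n):
            if g[start] is not None or not adj[start]:
                continue
            g[start] = 0
            stack = [(start, -1)]
            while stack:
                u, in_edge = stack.pop()
                for v, j in adj[u]:
                    if j == in_edge:
                        continue           # don't walk back along the tree edge
                    if g[v] is not None:
                        acyclic = False    # revisiting a vertex => cycle
                        break
                    g[v] = (j - g[u]) % m
                    stack.append((v, j))
                if not acyclic:
                    break
            if not acyclic:
                break
        if acyclic:
            return s1, s2, [x if x is not None else 0 for x in g], n

words = ["if", "else", "while", "for", "return", "def"]
s1, s2, g, n = chm_mphf(words)
m = len(words)
h = lambda w: (g[hash((s1, w)) % n] + g[hash((s2, w)) % n]) % m
assert [h(w) for w in words] == list(range(m))
```

The design point is that g[v] is assigned the moment v is first reached, so each edge j can be given its own value h(w_j) = j; a cyclic graph would over-constrain g, which is why cyclic choices of f_1, f_2 are rejected.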
A network G is a connected, undirected graph with N vertices 0, 1, ..., N − 1. The permutation routing problem on G is: given a permutation π of the vertices, and a message (called a packet) on each vertex, route packet j from vertex j to vertex π(j). It is assumed that at most one packet can traverse each edge in unit time, and that we want to minimise the time for the routing. In practice we only want to consider oblivious algorithms, where the route taken by packet j depends only on (j, π(j)). For simplicity, assume that G is a d-dimensional hypercube, so N = 2^d; similar results apply to other networks.

A simple algorithm for routing packets on a hypercube chooses which edge to send a packet along by comparing the current address with the destination address and finding the highest-order bit position in which the two addresses differ. For example, with d = 8, a packet travelling from 01001001 to 10010010 follows the route 01001001 → 11001001 → 10001001 → 10011001 → 10010001 → 10010011 → 10010010, each step correcting the highest-order differing bit and traversing one edge of the hypercube.

The following result [8] says that there are no "uniformly good" deterministic algorithms for oblivious permutation routing:

Theorem: For any deterministic, oblivious permutation routing algorithm, there is a permutation π for which the routing takes Ω(√(N/d³)) steps.

Example: For the leading-bit routing algorithm, take π to be the bit-reversal permutation, i.e. π(a_1 a_2 ... a_d) = a_d ... a_2 a_1. All 2^{d/2} packets whose source addresses have at least d/2 trailing zeros (except the all-zero address) are routed through vertex 00...0, and only d edges lead out of that vertex, so the routing takes Ω(2^{d/2}/d) = Ω(√N/d) steps.

We can do much better with a probabilistic algorithm. Valiant suggested:
1. Choose a random mapping σ (not necessarily a permutation).
2. Route packet j from vertex j to vertex σ(j) using the leading-bit algorithm (for 0 ≤ j < N).
3. Route packet j from vertex σ(j) to vertex π(j).

Routing via a random intermediate destination seems crazy (I do not know of any manufacturer who has been persuaded to implement it; probably it would be hard to sell), but it works! Valiant and Brebner [60] prove:

Theorem: With probability greater than 1 − 1/N, every packet reaches its destination in at most 14d steps. The expected number of steps to route all packets is less than 15d.
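A short sketch of the two routes involved, again illustrative Python of my own (it constructs paths only, ignoring queueing delays at the edges):

```python
import random

def leading_bit_route(src: int, dst: int):
    """Greedy hypercube route: repeatedly flip the highest-order bit
    in which the current address differs from the destination."""
    path, cur = [src], src
    while cur != dst:
        diff = cur ^ dst
        cur ^= 1 << (diff.bit_length() - 1)  # correct highest differing bit
        path.append(cur)
    return path

def valiant_route(j: int, pi_j: int, d: int):
    """Valiant's two-phase route: first to a random intermediate vertex
    sigma(j), then on to the true destination pi(j)."""
    sigma_j = random.randrange(1 << d)
    return leading_bit_route(j, sigma_j) + leading_bit_route(sigma_j, pi_j)[1:]

# Example: the bit-reversal permutation on a d-dimensional hypercube.
d = 8
def bit_reverse(x: int) -> int:
    return int(format(x, f"0{d}b")[::-1], 2)

src = 0b01001001
print([format(v, f"0{d}b") for v in leading_bit_route(src, bit_reverse(src))])
# ['01001001', '11001001', '10001001', '10011001',
#  '10010001', '10010011', '10010010']
```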
Some probabilistic algorithms use many independent random numbers, and because of the "law of large numbers" their performance is very predictable. One example is the multiple-polynomial quadratic sieve (MPQS) algorithm for integer factorisation. Suppose we want to factor a large composite number N (not a perfect power). The key idea of MPQS is to generate a sufficiently large number of congruences of the form y² ≡ p_1^{e_1} p_2^{e_2} ··· p_k^{e_k} (mod N), where p_1, ..., p_k are small primes in a precomputed "factor base", and y is close to √N. Many y are tried, and the "successful" ones are found efficiently by a sieving process. Once enough congruences have been collected, a large linear system mod 2 is solved to combine them into a congruence of the form x² ≡ y² (mod N), and GCD(x − y, N) then gives a nontrivial factor of N with probability at least 1/2. Making some plausible assumptions, the expected run time of MPQS is T = O(exp(√(c ln N ln ln N))), where c ≃ 1. In practice this estimate is good and the variance is small. MPQS is currently the best general-purpose algorithm for factoring moderately large numbers N whose factors are in the range N^{1/3} to N^{1/2}. For example, A. K. Lenstra and M. S. Manasse recently found a factorisation in which the penultimate factor p_50 is the 50-digit prime 24677078822840014266652779036768062918372697435241, and the largest factor p_67 is a 67-digit prime. The computation used a network of workstations for the "sieving", then a supercomputer for the solution of a very large linear system. A "random" 129-digit number (RSA129) has just been factored in a similar way, to win a $100 prize offered by Rivest, Shamir and Adleman in 1977.

Do probabilistic algorithms have an advantage over deterministic algorithms? If we allow a small probability of error, the answer is yes, as we saw in the Galileo example. If no error is allowed, the answer is (probably) no. A. C. Yao considered probabilistic algorithms (modelled as decision trees) for testing properties P of undirected graphs (given by their adjacency matrices) on n vertices. He also considered deterministic algorithms which assume a given distribution of inputs (i.e. a distribution over the set of graphs with n vertices). Yao defines the randomized complexity F_R(P) as an infimum (over all possible probabilistic algorithms) of a maximum (over all graphs with n vertices) of the expected runtime, and the distributional complexity F_D(P) as a supremum (over input distributions) of a minimum (over all possible deterministic algorithms) of the average runtime. Informally, F_R(P) is how long the best probabilistic algorithm takes to test P, and F_D(P) is the average runtime we can always guarantee with a good deterministic algorithm, provided the distribution of inputs is known. Yao (1977) showed that F_D(P) = F_R(P) follows from the minimax theorem of John von Neumann (1928), familiar from the theory of two-person zero-sum games. Yao's result should not discourage the use of probabilistic algorithms: we have already given several examples where they out-perform known deterministic algorithms, and there are many similar examples.

Yao's computational model is very restrictive. Because n is fixed, table lookup is permitted, and the maximum complexity of any problem is O(n²). Less restrictive models have been considered by Adleman and Gill. Without going into details of the definitions, they prove:

Theorem: If a Boolean function has a randomised, polynomial-sized circuit family, then it has a deterministic, polynomial-sized circuit family.

There are two problems with this result:
• The deterministic circuit may be larger (by a factor of about n, the number of variables) than the original circuit.
• The transformation is not "uniform": it cannot be computed in polynomial time by a Turing machine. The proof of the theorem is by a counting argument applied to a matrix with 2^n rows, so it is not constructive in a practical sense.

We can formalise the notion of a probabilistic algorithm and define a class RP of languages L such that x ∈ L is accepted by a probabilistic algorithm in polynomial time with probability p ≥ 1/2 (any fixed value in (0, 1) can be used in the definition), but x ∉ L is never accepted. Clearly P ⊆ RP ⊆ NP, where P and NP are the well-known classes of languages which are accepted in polynomial time by deterministic and nondeterministic (respectively) algorithms. It is plausible that P ⊂ RP ⊂ NP, but this would imply that P ≠ NP, so it is a difficult question.

Perfect Parties

B. McKay (ANU) and S. Radziszowski are interested in the size of the largest "perfect party". Because people at parties tend to cluster in groups of five, we consider a party to be imperfect if there are five people who are mutual acquaintances, or five who are mutual strangers. A perfect party is one which is not imperfect. McKay et al. have performed a probabilistic computation which shows that, with high probability, the largest perfect party has 42 people.

In graph-theoretic terms, the Ramsey number R(s, t) is the smallest n such that every graph on n or more vertices has a clique of size s or an independent set of size t. Examples: R(3, 3) = 6, R(4, 4) = 18, R(4, 5) = 25, and 43 ≤ R(5, 5) ≤ 49; see [38, 39]. Perfect party organisers would like to know R(5, 5) − 1. A (5, 5, n)-graph is a graph with n vertices, no clique of size 5, and no independent set of size 5 (a sketch of this check appears below). There are 328 known (5, 5, 42)-graphs, not counting complements as different. McKay et al. generated 5812 (5, 5, 42)-graphs using simulated annealing, starting from random graphs. All 5812 turned out to be already known. If there were any more (5, 5, 42)-graphs, and if the simulated annealing process is about equally likely to find any (5, 5, 42)-graph (there is no obvious way to prove this, so the probability estimate is not rigorous), then another such graph would have been found with probability greater than 0.99999998. Thus there is convincing evidence that all (5, 5, 42)-graphs are known. None of these graphs can be extended to a (5, 5, 43)-graph. Thus it is very unlikely that such a graph exists, and it is very likely that R(5, 5) − 1 = 42. A rigorous proof that R(5, 5) − 1 = 42 would take thousands of years of computer time (based on the fact that it took seven years of Sparcstation time to show that R(4, 5) = 25), so the probabilistic argument is the best that is feasible at present, unless we can get time on a computer as fast as Deep Thought [1].
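Here is the (5, 5, n) check referred to above, as a brute-force Python sketch of my own. It is practical only for spot-checking a candidate graph, since it examines all C(n, 5) subsets (about 850,000 for n = 42).

```python
from itertools import combinations

def is_55_graph(n: int, adj) -> bool:
    """Return True iff the graph on vertices 0..n-1, given by the adjacency
    matrix `adj`, has no clique of size 5 and no independent set of size 5."""
    for five in combinations(range(n), 5):
        pairs = list(combinations(five, 2))
        if all(adj[u][v] for u, v in pairs):       # 5 mutual acquaintances
            return False
        if not any(adj[u][v] for u, v in pairs):   # 5 mutual strangers
            return False
    return True
```

A simulated-annealing search like the one described above would presumably use the number of violating 5-subsets as the cost function to be minimised, accepting occasional uphill moves to escape local minima.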
We did not have time to mention applications of randomness to serial or parallel algorithms for:
• sorting and selection,
• computer security,
• cryptography,
• computational geometry,
• load-balancing,
• collision avoidance,
• online algorithms,
• numerical integration,
• graphics and virtual reality,
• avoiding degeneracy,
• approximation algorithms for NP-hard problems,
and many other problems. References to most of these applications are given in the bibliography (see for example [42, 53]). We also did not discuss algorithms for generating pseudo-random numbers; that would require another talk. ("Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." John von Neumann, 1951.)

Conclusions:
• Probabilistic algorithms are useful.
• They are often simpler and use less space than deterministic algorithms.
• They can also be faster, if we are willing to live with a minute probability of error.

Open problems:
• Give good lower bounds for the complexity of probabilistic algorithms (with and without error) for interesting problems.
• Show how to generate independent random samples from interesting structures (e.g. finite groups defined by relations, various classes of graphs, ...) to provide a foundation for probabilistic algorithms on these structures.
• Consider the effect of using pseudo-random numbers instead of genuinely random numbers.
• Extend Yao's results to a more realistic model of computation.
• Give a uniform variant of the Adleman-Gill theorem.
• Show that P ≠ RP (hard).
