Infinity in computable probability

Logical proof that William Shakespeare probably was not a dactylographic monkey

Maarten McKubre-Jordens∗ & Phillip L. Wilson†

June 4, 2020

1 Introduction

Since at least the time of Aristotle [1], the concept of combining a finite number of objects infinitely many times has been taken to imply certainty of construction of a particular object. In a frequently-encountered modern example of this argument, at least one of infinitely many monkeys, producing a character string equal in length to the collected works of Shakespeare by striking typewriter keys in a uniformly random manner, will with probability one reproduce the collected works. In the following, the term “monkey” can (naturally) refer to some (abstract) device capable of producing sequences of letters of arbitrary (fixed) length at a reasonable speed.

Recursive function theory is one possible model for computation; Russian recursive mathematics is a reasonable formalization of this theory [4].¹ Here we show that, surprisingly, within recursive mathematics it is possible to assign to an infinite number of monkeys probabilities of reproducing Shakespeare’s collected works in such a way that while it is impossible that no monkey reproduces the collected works, the probability of any finite number of monkeys reproducing the works of Shakespeare is arbitrarily small. The method of assigning probabilities depends only on the desired probability of success and not on the size of any finite subset of monkeys. Moreover, the result extends to reproducing all possible texts of any finite given length.
However, in the context of implementing an experiment or simulation computationally (such as the small-scale example in [10]; see also [7]), the fraction among all possible probability distributions of such pathological distributions is vanishingly small provided sufficiently large samples are taken.

∗ Adjunct Fellow, School of Mathematics & Statistics, University of Canterbury, New Zealand.

† Corresponding Author: phillip.wilson@canterbury.ac.nz. School of Mathematics & Statistics, University of Canterbury, New Zealand.

¹ The history of the relationship between classical logic and computation is long and complex, and beyond the scope of this paper. These results do indicate that mathematics using constructive logics, such as the Russian recursive mathematics used here, seems to be more suited to the simulation of the work of a computer than classical logic.

2 The classical experiment

The classical infinite monkey theorem [2, 6] can be stated as follows: given an infinite amount of time, a monkey hitting keys on a typewriter with uniformly random probability will almost certainly type the collected works of William Shakespeare [11]. We use a slightly altered (but equivalent) theorem involving an infinite collection of monkeys, and give an intuitive direct proof.

Let a string of characters of length $w \in \mathbb{N}^+$ over a given alphabet $A$ (of size $|A|$, including punctuation) be called a $w$-string. For example, “banana” is a 6-string over the alphabet $\{a, b, n\}$. Suppose each monkey is given a computer keyboard with $|A|$ keys, each corresponding to a different character. Suppose also that the experiment is so contrived that each monkey will type its $w$-string in finite time.

Theorem 1.
At least one of infinitely many monkeys typing $w$-strings, as described in the previous paragraph, will almost certainly produce a perfect copy of a target $w$-string in finite time.

Proof. Recall that for this theorem, the probability that any given monkey strikes any particular key is uniformly distributed. Let the target $w$-string be $T_w$. The chance of a given monkey producing $T_w$ is simply the probability of him typing each character of the target text in the correct position, or

$$\underbrace{\frac{1}{|A|} \times \frac{1}{|A|} \times \cdots \times \frac{1}{|A|}}_{w \text{ terms}} = \left(\frac{1}{|A|}\right)^{w}.$$

Therefore, the probability that a given monkey fails to produce $T_w$ is $1 - \left(\frac{1}{|A|}\right)^{w}$. Now, if we examine the output of $m$ monkeys, then the probability that none of these monkeys produces $T_w$ is

$$T_w(m) = \left(1 - \left(\frac{1}{|A|}\right)^{w}\right)^{m}.$$

Therefore, the probability that at least one monkey of $m$ produces $T_w$ is

$$P(m) = 1 - T_w(m).$$

Now

$$\lim_{m \to \infty} P(m) = 1.$$

In other words, as the number of monkeys tends to infinity, at least one will almost certainly produce the required string.

However, any real-world experiment that attempts to show this will, unless the target $w$-string and $|A|$ are quite small, be very likely to fail, since the probabilities involved are tiny. For example, taking the English alphabet (together with punctuation and capitalization) to have 64 characters, a simple computation shows that, if the monkeys are typing 6-strings, the chances of a monkey typing “banana” correctly are

$$\left(\frac{1}{64}\right)^{6} = \frac{1}{68719476736} \approx 1.5 \times 10^{-11}. \tag{1}$$

If it takes one second to check a single monkey’s output, then the number of seconds that will elapse before we have a 50% chance of finding a monkey that has typed “banana” correctly is outside the precision of typical computing software.
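As a rough numerical check of (1) and of $P(m)$ from the proof of Theorem 1, the following Python sketch evaluates both for the 64-character alphabet and the 6-string “banana”. The function names and the sample troop sizes are illustrative assumptions, not part of the paper; `log1p` and `expm1` are used only to limit floating-point rounding when the per-monkey probability is tiny.

```python
import math
from fractions import Fraction


def single_monkey_success(alphabet_size: int, w: int) -> Fraction:
    """Exact probability (1/|A|)^w that one uniform monkey types the target."""
    return Fraction(1, alphabet_size) ** w


def troop_success(m: int, alphabet_size: int, w: int) -> float:
    """P(m) = 1 - (1 - (1/|A|)^w)^m, evaluated in floating point.

    Computed as -expm1(m * log1p(-p)) to limit rounding error for tiny p.
    """
    p = float(single_monkey_success(alphabet_size, w))
    return -math.expm1(m * math.log1p(-p))


if __name__ == "__main__":
    p = single_monkey_success(64, 6)      # the "banana" example in (1)
    print(p.denominator, float(p))        # denominator equals 64**6
    for m in (1, 10**3, 10**6, 10**9):    # illustrative troop sizes (assumed)
        print(m, troop_success(m, 64, 6))
```

Using `Fraction` keeps the single-monkey probability exact, so the denominator can be compared directly with the one printed in (1).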
Of course, if some monkeys have a preference for typing a certain letter more often than others (say ‘a’) then this probability can be much larger. Indeed, it is non-uniformity among monkeys that we exploit to derive our main result in §4.

Results such as (1) have been interpreted [8, p.53] as saying that “The probability of [reproducing the collected works of Shakespeare] is therefore zero in any operational sense...”. In §4 we show that this probability can be made arbitrarily small in any sense, operational or otherwise.

3 A simple, classical non-uniform version

What if the monkeys do not necessarily strike their keys in a uniformly distributed manner? In this case, we might prescribe a certain probability for a particular monkey to type a particular $w$-string (and this probability need not be the same from one monkey to the next). Before we reach our main result, we outline a non-uniform classical probability distribution such that for any $\varepsilon > 0$ the probability of success by monkey $m$ is arbitrarily small, but with the probability of failure still zero. If we allow our probability distribution to be a function of $m$ as well as $\varepsilon$ then the following distribution will suffice:

$$1 - p_k(m, \varepsilon) = \delta(m - k)(\varepsilon - \sigma) + \delta(m + 1 - k)(1 - \varepsilon + \sigma),$$

where $p_k$ is the probability of failing at monkey $k$, the Dirac delta function $\delta(s) = 1$ for $s = 0$ and is zero otherwise, and $0 < \sigma < \varepsilon$. Here, the probability of finding the target $w$-string at or before the $m$th monkey, $P(m)$, is less than the prescribed $\varepsilon$, but success is still certain: we need merely look at $m + 1$ monkeys.

In the following section we show that, surprisingly, this can be achieved with a probability distribution dependent only on $\varepsilon$, and not on $m$.
That is, it is possible to produce a computable distribution so that, while each monkey produces Shakespeare’s works with nonzero probability, actually finding the culprit among any finite subcollection is very unlikely. To do so, we invoke a result from recursive mathematics.

4 The successful monkey is arbitrarily elusive

Within recursive mathematics, there is a theorem sometimes referred to as the singular covering theorem, originally proved by Tseitin and Zaslavsky (1956), and independently by Kreisel and Lacombe (1957) (see [9]): given a compact set $K$, for every positive $\varepsilon$, one can construct a computable open rational $\varepsilon$-bounded covering of $K$.² It can be restricted to the interval $[0, 1]$ as follows:

Theorem 2. For each $\varepsilon > 0$ there exists a sequence $(I_k)_{k=1}^{\infty}$ of bounded open rational intervals in $\mathbb{R}$ such that
(i) $[0, 1] \subset \bigcup_{k=1}^{\infty} I_k$, and
(ii) $\sum_{k=1}^{n} |I_k| < \varepsilon$ for each $n \in \mathbb{N}^+$.

Our principal result, Theorem 3, follows from this theorem, and highlights the tension between classical probability theory and its constructive counterpart as outlined in [5].

To set up our principal theorem, we first define $M$ to be an infinite, enumerable set of monkeys (the monkeyverse), and for any natural number $m$ the $m$-troop of monkeys to be the first $m$ monkeys in $M$. Note that for any given monkey it is decidable whether that monkey has produced a given finite target string.

Theorem 3. Given a finite target $w$-string $T_w$ and a positive real number $\varepsilon$, there exists a computable probability distribution on $M$ of producing $w$-strings such that:
(i) the classical probability that no monkey in $M$ produces $T_w$ is 0; and
(ii) the probability of a monkey in any $m$-troop producing $T_w$ is less than $\varepsilon$.

Proof. Suppose that the hypotheses of the theorem are satisfied.
As above, let $P(m)$ be the probability that a monkey in the $m$-troop has produced $T_w$, and let $p_k$ be the probability that the $k$th monkey has not produced $T_w$. Then

$$P(m) = 1 - \prod_{k=1}^{m} p_k.$$

Given $0 < \varepsilon < 1$, compute $\varepsilon_0 = -\log(1 - \varepsilon)$. For this $\varepsilon_0$, construct the singular cover $(I_k)_{k=1}^{\infty}$ as per Theorem 2. Then set

$$p_k = \exp(-|I_k|).$$

To prove (i), observe that $0 < p_k < 1$ for each $k$. The monotone convergence theorem now ensures that the product $\prod_{k=1}^{m} p_k$ classically tends to 0, hence it is (classically) impossible that no monkey produces $T_w$.

On the other hand, we have (computably)

$$-\log(p_k) = |I_k|,$$

whence, by the singular covering theorem,

$$\sum_{k=1}^{m} -\log(p_k) = \sum_{k=1}^{m} |I_k| < \varepsilon_0 = -\log(1 - \varepsilon)$$

for all $m \in \mathbb{N}^+$. Some rearranging shows that

$$\log\left(\prod_{k=1}^{m} p_k\right) = \sum_{k=1}^{m} \log(p_k) > \log(1 - \varepsilon)$$

and hence

$$\prod_{k=1}^{m} p_k > 1 - \varepsilon.$$

Then the probability of any member of the $m$-troop producing $T_w$ is

$$P(m) = 1 - \prod_{k=1}^{m} p_k < \varepsilon$$

for any positive natural number $m$. This proves (ii).

Thus, the chances of us actually finding the monkey that produces the collected works of Shakespeare can be made arbitrarily small, and the classical intuition that, since we have an infinite number of monkeys, Shakespeare’s works must be typed by some monkey is of no help in locating the successful monkey. We emphasize that, in contrast to the case in §3, the pathological distribution in Theorem 3 does not depend on $m$, the size of the troop we search.³

² Related theorems with detailed proofs and discussion were published by Tseitin and Zaslavsky in [12]. We hasten to add that while this may seem esoteric, the results really provide commentary on much more mainstream ideas such as computer simulations, since constructive logics are much more suited to theorizing about these.
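The arithmetic in part (ii) of the proof of Theorem 3 can be sanity-checked numerically. The Python sketch below is only an illustration under a loud assumption: it does not construct a singular cover (that needs the recursive-mathematics machinery behind Theorem 2). Instead it substitutes hypothetical stand-in lengths $|I_k| = \varepsilon_0 / 2^{k+1}$, chosen solely because their partial sums stay below $\varepsilon_0$. With $p_k = \exp(-|I_k|)$ it then evaluates $P(m)$ for a few troop sizes. The function names and the stand-in sequence are not from the paper.

```python
import math


def stand_in_lengths(eps0: float, m: int) -> list[float]:
    """Hypothetical lengths |I_k| = eps0 / 2**(k+1) for k = 1..m.

    ASSUMPTION: these are NOT the intervals of a singular cover; they are
    chosen only so that every partial sum is below eps0, as in Theorem 2(ii).
    """
    return [eps0 * 0.5 ** (k + 1) for k in range(1, m + 1)]


def troop_success(eps: float, m: int) -> float:
    """P(m) = 1 - prod_{k<=m} p_k with p_k = exp(-|I_k|)."""
    eps0 = -math.log1p(-eps)                 # eps0 = -log(1 - eps)
    total = math.fsum(stand_in_lengths(eps0, m))
    return -math.expm1(-total)               # 1 - exp(-sum of |I_k|)


if __name__ == "__main__":
    eps = 0.01
    for m in (1, 10, 100, 1000):             # illustrative troop sizes
        print(m, troop_success(eps, m))      # each value stays below eps
```

Because the stand-in lengths have a convergent sum, this sketch illustrates only the bound in part (ii); it says nothing about part (i), which relies on the singular cover itself.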
³ Contrasting the classical with the computational view in the same proof may prove counterintuitive. We are hoping to shed light on why the intuitive result (that it is, in the classical abstract world, impossible that no monkey produces Shakespeare’s works) clashes with the fact that it may be incredibly difficult (in the concrete computational world) to nail the cheeky monkey that did it. What sense to make of the product $\prod p_k$ of monkeys failing to produce Shakespeare classically tending to 0? The problem here is the rate at which it does so: this rate is computationally untractable.

One might argue that it is easy to assign probabilities in such a way that any finite search will almost certainly not yield the monkey that produced it, by letting each monkey produce the target $w$-string with probability zero. However, in this case, no monkey will produce it. Our theorem shows that, even in the case where it is (classically) impossible that no monkey produces the target, it is still possible to make the probability of finding the monkey that accomplishes the necessary task arbitrarily small.

5 Target-free writing

One criticism of the above line of reasoning is that the experimenter requires knowledge of the target. There, the output of each monkey was tested against the collected works of Shakespeare: only if every character matched would it pass the test. However, suppose now that we wish to recreate Shakespeare’s work armed only with knowledge of the total character length in some alphabet. That is, we know that we require one of the $|A|^w$ possible $w$-strings. Can we guarantee to complete the list (without repetition) and therefore recreate the collected works of Shakespeare (somewhere)? We note that the list can be shortened by checking for grammar etc.⁴; here we consider the worst case of the complete list, without repetition, of $w$-strings.
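To make the size of the complete list concrete, here is a small Python sketch that enumerates every $w$-string over a toy alphabet without repetition. The three-letter alphabet $\{a, b, n\}$ and $w = 6$ are borrowed from the “banana” example in §2; the function name is an illustrative assumption.

```python
from itertools import product
from typing import Iterator


def all_w_strings(alphabet: str, w: int) -> Iterator[str]:
    """Yield each of the |A|**w possible w-strings exactly once."""
    for letters in product(alphabet, repeat=w):
        yield "".join(letters)


if __name__ == "__main__":
    strings = list(all_w_strings("abn", 6))
    print(len(strings))            # |A|**w = 3**6
    print("banana" in strings)     # the target appears somewhere in the list
```

Exhaustive enumeration like this is practical only for toy cases: with the 64-character alphabet of §2, the list of 6-strings already has $64^6$ entries.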
Corollary 4. Any list of finite strings is completed in finite time with arbitrarily small probability.

The proof relies on applying Theorem 3 multiple times using standard calculations.

⁴ Truncating the list in this way may be desirable in order to avoid being overwhelmed by “meaningless cacophonies, verbal farragoes, and babblings” [3].

6 Pathological distributions are arbitrarily rare

At first sight, Theorem 3 might appear to destroy any hope of finding the successful monkey. However, we have the following:

Theorem 5. The probability that the probability distribution on the monkeyverse is constructed in such a way as to make the constructive probability of finding the desired monkey arbitrarily small, is arbitrarily small.

Proof. Given $0 < \varepsilon < 1$, in order for the probability distribution to be pathological, the probability of any monkey in the $m$-troop outputting $T_w$ cannot exceed $\varepsilon$. Therefore the fraction of pathological distributions over an $m$-troop is at most $\varepsilon^m$, and

$$\lim_{m \to \infty} \varepsilon^m = 0.$$

In short, we can make the fraction of pathological distributions arbitrarily small if we search sufficiently large $m$-troops. Here, then, is an a priori justification for large sample sizes in the case of computational simulations.

7 Discussion and further work

Recall that, throughout this paper, we take the term “monkey” to refer to some device capable of producing arbitrary but finite sequences of letters; computers satisfy this criterion. The theorems presented in this paper therefore have implications for computer simulations. In particular, when performing simulations of a probabilistic nature, the experimenter needs to ensure that pathological distributions do not arise, or arise rarely enough to provide a measure of confidence in the conclusion.
It should also be noted that the classical non-uniform distribution presented above suggests that a pathological situation can never be ruled out with certainty, since if the experimenter tests just one more monkey, the result may be vastly different than observed earlier in the simulation. With practical considerations in mind, there will be some point at which costs (ethical and/or material) outweigh the benefit of testing further monkeys.

The proof of Theorem 3 required a result from constructive mathematics. We conjecture that such a result is classically impossible, since the singular covering theorem is classically not true. A deeper fact here is that from the classical viewpoint, the computable reals have zero measure, and all finite texts produced by monkeys correspond to the rationals (or some other convenient subset of computable reals). The context of the results, then, would indicate that a careful constructive study of probability distributions provides a priori motivation for repetition of simulations for accuracy (to rule out accidental pathological distributions generated by computer programs), and has potentially more to say about issues involving computer simulations.

There is the further issue of what model of constructive mathematics provides a good framework for this sort of work. Philosophically there is tension between the intuitionistic free choice-sequence approach and the computable sequence approach, and within these approaches are further complications by sensitivity of the theory to the validity (or invalidity) of the various versions of König’s Lemma. It is not the aim here to go deeply into these issues, which could lead to a lengthy series of papers. In the interest of brevity, we leave such explorations for future research.
It has not escaped our attention that science and mathematics have each been considered to be “games” of recombining a finite set of characters (even if we do not yet know what they all are). Even if we consider only finite strings which are syntactically sound, and not contradicted by empirical evidence, our result shows that completing such a list is not necessarily even likely to happen within any finite time, such as a human lifespan, the duration of a civilisation, or even the age of the universe.

Acknowledgements: The authors would like to acknowledge the contributions of the anonymous referees for substantial improvements to the paper. McKubre-Jordens was partially funded by Marsden Fund Fast-Start Grant UC1205.

References

[1] Aristotle (350 BCE) Metaphysics. Translation by W.D. Ross. http://classics.mit.edu/Aristotle/metaphysics.html. Retrieved 18 June 2010.
[2] É. Borel (1913) ‘Mécanique Statistique et Irréversibilité’. J. Phys., 5e série 3, 189–196.
[3] J.L. Borges (1939) In The Total Library: Non-Fiction 1922-1986. Translated by E. Weinberger (2000). Penguin, London, 214–216.
[4] D.S. Bridges & F. Richman (1987) Varieties of Constructive Mathematics. LMS Lecture Notes Series, Cambridge University Press.
[5] Y.K. Chan (1974) Notes on Constructive Probability Theory. Ann. Prob., 2(1), 51–75. Institute of Mathematical Statistics.
[6] A. Eddington (1928) The Nature of the Physical World: The Gifford Lectures. New York: Macmillan.
[7] Elmo, Gum, Heather, Holly, Mistletoe, & Rowan Notes Towards The Complete Works of William Shakespeare (2002) Khave-Society & Liquid Press, UK.
[8] C. Kittel & H. Kroemer (1980) Thermal Physics (2nd ed.). W.H. Freeman Company.
[9] B.A. Kushner (1999) Markov’s constructive analysis; a participant’s view. Theoretical Computer Science 219, 267–285.
[10] ‘Give six monkeys a computer, and what do you get? Certainly not the Bard’, https://www.theguardian.com/uk/2003/may/09/science.arts.
[11] W. Shakespeare The Complete Works of William Shakespeare (2001) Geddes & Grosset, Scotland.
[12] I.D. Zaslavsky & G.S. Tseitin (1962) Singular coverings and properties of constructive functions connected with them, Problems of the constructive direction in mathematics. Part 2. Constructive mathematical analysis, Collection of articles, Trudy Mat. Inst. Steklov., 67, Acad. Sci. USSR, Moscow-Leningrad, 458–502. English translation: A.M.S. Translations (2) 98 (1971), 41-89, MR 27#2408.
