A Tricentenary history of the Law of Large Numbers

Bernoulli 19(4), 2013, 1088–1121
DOI: 10.3150/12-BEJSP12

EUGENE SENETA
School of Mathematics and Statistics FO7, University of Sydney, NSW 2006, Australia. E-mail: eseneta@maths.usyd.edu.au

The Weak Law of Large Numbers is traced chronologically from its inception as Jacob Bernoulli's Theorem in 1713, through De Moivre's Theorem, to ultimate forms due to Uspensky and Khinchin in the 1930s, and beyond. Both aspects of Jacob Bernoulli's Theorem: 1. As limit theorem (sample size $n \to \infty$), and: 2. Determining sufficiently large sample size for specified precision, for known and also unknown $p$ (the inversion problem), are studied, in frequentist and Bayesian settings. The Bienaymé–Chebyshev Inequality is shown to be a meeting point of the French and Russian directions in the history. Particular emphasis is given to less well-known aspects especially of the Russian direction, with the work of Chebyshev, Markov (the organizer of Bicentennial celebrations), and S.N. Bernstein as focal points.

Keywords: Bienaymé–Chebyshev Inequality; Jacob Bernoulli's Theorem; J.V. Uspensky and S.N. Bernstein; Markov's Theorem; P.A. Nekrasov and A.A. Markov; Stirling's approximation

1. Introduction

1.1. Jacob Bernoulli's Theorem

Jacob Bernoulli's Theorem was much more than the first instance of what came to be known in later times as the Weak Law of Large Numbers (WLLN). In modern notation Bernoulli showed that, for fixed $p$, any given small positive number $\varepsilon$, and any given large positive number $c$ (for example $c = 1000$), $n$ may be specified so that:

$$P\left(\left|\frac{X}{n} - p\right| > \varepsilon\right) < \frac{1}{c+1} \tag{1}$$

for $n \ge n_0(\varepsilon, c)$. The context: $X$ is the number of successes in $n$ binomial trials relating to sampling with replacement from a collection of $r + s$ items, of which $r$ were "fertile" and $s$ "sterile", so that $p = r/(r+s)$.
$\varepsilon$ was taken as $1/(r+s)$. His conclusion was that $n_0(\varepsilon, c)$ could be taken as the integer greater than or equal to:

$$t \max\left\{ \frac{\log[c(s-1)]}{\log(r+1) - \log r}\left(1 + \frac{s}{r+1}\right) - \frac{s}{r+1},\ \frac{\log[c(r-1)]}{\log(s+1) - \log s}\left(1 + \frac{r}{s+1}\right) - \frac{r}{s+1} \right\} \tag{2}$$

where $t = r + s$. The notation $c, r, s, t$ is Bernoulli's and the form of the lower bound for $n_0(\varepsilon, c)$ is largely his notation.

This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 2013, Vol. 19, No. 4, 1088–1121. This reprint differs from the original in pagination and typographic detail. 1350-7265 © 2013 ISI/BS

There is already a clear understanding of the concept of an event as a subset of outcomes, of probability of an event as the proportion of outcomes favourable to the event, and of the binomial distribution:

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n, \tag{3}$$

for the number $X$ of occurrences of the event in $n$ binomial trials.

Jacob Bernoulli's Theorem has two central features. The first is that the greater the number of observations, the less the uncertainty. That is: in a probabilistic sense later formalized as "convergence in probability", relative frequencies of occurrence of an event in independent repetitions of an experiment approach the probability of occurrence of the event as sample size increases. It is in this guise that Jacob Bernoulli's Theorem appears as the first limit theorem of probability theory, and in frequentist mathematical statistics as the notion of a consistent estimator of a parameter (in this case the parameter $p$).
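To make the bound (2) concrete, the following sketch evaluates it in Python. It is an illustration under stated assumptions, not a reconstruction of Bernoulli's own working: the values $r = 30$, $s = 20$ are inferred from the numerical example $p = 30/50$ quoted later in this paper, $c = 1000$ is the example value given with (1), the function name `bernoulli_n0` is ours, and rounding each logarithmic ratio up to an integer before use is an assumption labelled in the code (the expression (2) taken literally does not round).

```python
import math
from fractions import Fraction


def bernoulli_n0(r: int, s: int, c: int, round_ratio_up: bool = True) -> Fraction:
    """Evaluate t * max{...} from (2) for r "fertile" and s "sterile" items."""
    t = r + s

    def branch(a: int, b: int) -> Fraction:
        # With (a, b) = (r, s) this is the first term inside the max of (2);
        # with (a, b) = (s, r) it is the second term.
        ratio = math.log(c * (b - 1)) / (math.log(a + 1) - math.log(a))
        if round_ratio_up:
            m = Fraction(math.ceil(ratio))  # assumption: integer multiplier
        else:
            m = Fraction(ratio)             # (2) taken literally
        shift = Fraction(b, a + 1)
        return m * (1 + shift) - shift

    return t * max(branch(r, s), branch(s, r))


print(bernoulli_n0(30, 20, 1000))                               # -> 25550
print(float(bernoulli_n0(30, 20, 1000, round_ratio_up=False)))  # a little above 25500
```

Exact rational arithmetic is used after the logarithms so that the rounded variant returns an integer without floating-point noise. Either way the required sample size is above 25000 for a precision of $\varepsilon = 1/50$.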
The first central feature also reflects, as a mathematical theorem, the empirically observed "statistical regularity" in nature, where independent repetitions of a random experiment under uniform conditions result in observed stability of large sample relative frequency of an event.

The second central feature, less stressed, is that Jacob Bernoulli's Theorem is an exact result. It is tantamount to obtaining a sample size $n$ large enough for specified accuracy of approximation of $p$ by the proportion $X/n$. The lower bound on such $n$ may depend on $p$, as it does in Jacob Bernoulli's Theorem, but even if $p$ is known, the problem of determining the best possible lower bound for $n$ for specified precision is far from straightforward, as we shall demonstrate on one of Bernoulli's examples.

Bernoulli's underlying motivation was, however, the approximation of an unknown $p$ by $X/n$ on the basis of repeated binomial sampling (accumulation of evidence) to specific accuracy. We shall call this the inversion problem. It adds several layers of complexity to both features.

1.2. Some background notes

The present author's mathematical and historical interests have been much in the direction of A.A. Markov, and Markov chains. It is therefore a pleasure to have been asked to write this paper at the Tricentenary for a journal which bears the name Bernoulli, since A.A. Markov wrote an excellent summary of the history of the LLN for the Bicentenary celebrations in St. Petersburg, Russia, 1913.¹

J.V. Uspensky's translation into Russian in 1913 of the fourth part of Ars Conjectandi (Bernoulli (1713, 2005)), where Jacob Bernoulli's Theorem occurs was part of the St. Petersburg celebrations.
Markov's paper and Uspensky's translation are in Bernoulli (1986), a book prepared for the First World Congress of the Bernoulli Society for Mathematical Statistics and Probability held in Tashkent in 1986. It was one of the present author's sources for this Tricentenary history, and includes a long commentary by Prokhorov (1986).

The Tricentenary of the death of Jacob Bernoulli was commemorated in Paris in 2005 at a colloquium entitled L'art de conjecturer des Bernoulli. The proceedings have been published in the Journal Electronique d'Histoire des Probabilités et de la Statistique, 2, Nos. 1 and 1(b) (at www.jehps.net).

A number of celebrations of the Tricentenary of publication of the Ars Conjectandi are scheduled for 2013, which is also the 250th anniversary of the first public presentation of Thomas Bayes's work. This incorporates his famous theorem, which plays an important role in our sequel. 2013 has in fact been designated International Year of Statistics.

Many of the sources below are available for viewing online, although only few online addresses are specified in the sequel.

Titles of books and chapters are generally given in the original language. In the case of Russian this is in English transliteration, and with an English translation provided. English-language versions are cited where possible. Quotations are in English translation. In French only the first letter of titles is generally capitalized. German capitalizes the first letters of nouns wherever nouns occur. Within quotations we have generally stayed with whatever style the original author had used.

2. The Bernoullis and Montmort

In 1687 Jacob Bernoulli (1654–1705) became Professor of Mathematics at the University of Basel, and remained in this position until his death.
The title Ars Conjectandi was an emulation of Ars Cogitandi, the title of the Latin version² of La Logique ou l'Art de penser, more commonly known as the Logic of Port Royal, whose first edition was in 1662, the year of Pascal's death. Bienaymé writes in 1843 (Heyde and Seneta (1977), p. 114) of Jacob Bernoulli:

"One reads on p. 225 of the fourth part of his Ars Conjectandi that his ideas have been suggested to him, partially at least, by Chapter 12 and the chapters following it, of l'Art de penser, whose author he calls magni acuminis et ingenii vir [a man of great acumen and ingenuity] . . . The final chapters contain in fact elements of the calculus of probabilities, applied to history, to medicine, to miracles, to literary criticism, to incidents in life, etc., and are concluded by the argument of Pascal on eternal life."

¹ It is available in English as Appendix 1 of Ondar (1981).
² Latin was then the international language of scholarship. We have used "Jacob" as version of the Latin "Jacobus" used by the author of Ars Conjectandi for this reason, instead of the German "Jakob".

The implication is that it was Pascal's writings which were the influence. Jacob Bernoulli was steeped in Calvinism (although well acquainted with Catholic theology). He was thus a firm believer in predestination, as opposed to free will, and hence in determinism in respect of "random" phenomena. This coloured his view on the origins of statistical regularity in nature, and led to its mathematical formalization.

Jacob Bernoulli's Ars Conjectandi remained unfinished in its final (fourth) part, the Pars Quarta, the part which contains the theorem, at the time of his death. The unpublished version was reviewed in the Journal des sçavans in Paris, and the review accelerated the (anonymous) publication of Montmort's Essay d'analyse sur les jeux de hazard in 1708.
Nicolaus Bernoulli (1687–1759)³ was a nephew to Jacob and Johann. His doctorate in law at the University of Basel in 1709, entitled De Usu Artis Conjectandi in Jure, was clearly influenced by a direction towards applications in the draft form of the Ars Conjectandi of Jacob. Nicolaus's uncle Johann, commenting in 1710 on problems in the first edition of 1708 of Montmort, facilitated Nicolaus's contact with Montmort, and seven letters from Nicolaus to Montmort appear in Montmort (1713), the second edition of the Essay. The most important of these as regards our present topic is dated Paris, 23 January, 1713. It focuses on a lower bound approximation to binomial probabilities in the spirit of Jacob's in the Ars Conjectandi, but, in contrast, for fixed $n$.

As a specific illustration, Nicolaus asserts (Montmort (1713), pp. 392–393), using modern notation, that if $X \sim B(14000, 18/35)$, then

$$1 - P(7037 \le X \le 7363) < 1/44.58 = 0.0224316. \tag{4}$$

Laplace (1814), p. 281, without mentioning Nicolaus anywhere, to illustrate his own approximation, obtains the value $P(7037 \le X \le 7363) = 0.994505$ so that $1 - P(7037 \le X \le 7363) = 0.005495$. The statistical software package R which can calculate binomial sum probabilities gives $P(X \le 7363) - P(X \le 7036) = 0.9943058$, so that $1 - P(7037 \le X \le 7363) = 0.0056942$.

The difference in philosophical approach is clear: finding $n$ sufficiently large for specific precision (Jacob), and finding the degree of precision for given large $n$ (Nicolaus). While Nicolaus's contribution is a direct bridge to the normal approximation of binomial probabilities for large fixed $n$ as taken up by De Moivre, it does not enhance the limit theorem direction of Jacob Bernoulli's Theorem, as De Moivre's Theorem, from which the normal approximation for large $n$ to binomial probabilities emerges, was to do.
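The binomial sum in Nicolaus's example can also be evaluated without a statistical package. The sketch below is only an illustration: the function name `binomial_interval_probability` is ours, and the choice of exact integer arithmetic (rather than whatever method R uses internally) is an assumption made to avoid rounding questions.

```python
from math import comb


def binomial_interval_probability(n: int, num: int, den: int, lo: int, hi: int) -> float:
    """P(lo <= X <= hi) for X ~ B(n, num/den), summed with exact integers."""
    fail = den - num
    # Each term is C(n, x) * num^x * (den - num)^(n - x); the common
    # denominator den^n is divided out once at the end.
    total = sum(comb(n, x) * num**x * fail**(n - x) for x in range(lo, hi + 1))
    # Python's int / int true division is correctly rounded for huge operands.
    return total / den**n


inside = binomial_interval_probability(14000, 18, 35, 7037, 7363)
print(f"P(7037 <= X <= 7363) = {inside:.7f}")
print(f"complement           = {1 - inside:.7f}")
print(f"Nicolaus's bound     = {1 / 44.58:.7f}")
```

The first printed value should agree with the figure 0.9943058 quoted above from R, and the complement sits well below Nicolaus's bound in (4).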
After Paris, in early 1713 at Montmort's country estate, Nicolaus helped Montmort prepare the second edition of his book (Montmort (1713)), and returned to Basel in April, 1713, in time to write a preface to Ars Conjectandi which appeared in August 1713, a few months before Montmort's (1713).

³ For a substantial biographical account, see Csörgö (2001).

Nicolaus, Pierre Rémond de Montmort (1678–1719) and Abraham De Moivre (1667–1754) were the three leading figures in what Hald (2007) calls "the great leap forward" in stochastics, which is how Hald describes the period from 1708 to the first edition of De Moivre's (1718) Doctrine of Chances.

In his preface to Ars Conjectandi in 1713, Nicolaus says of the fourth part that Jacob intended to apply what had been exposited in the earlier parts of Ars Conjectandi, to civic, moral and economic questions, but due to prolonged illness and untimely onset of death, Jacob left it incomplete. Describing himself as too young and inexperienced to do this appropriately, Nicolaus decided to let the Ars Conjectandi be published in the form in which its author left it. As Csörgö (2001) comments:

"Jakob's programme, or dream rather, was wholly impossible to accomplish in the eighteenth century. It is impossible today, and will remain so, in the sense that it was understood then."

3. De Moivre

De Moivre's motivation was to approximate sums of individual binomial probabilities when $n$ is large, and the probability of success in a single trial is $p$, that is, when $X \sim B(n, p)$. His initial focus was on the symmetric case $p = 1/2$ and large $n$, thus avoiding the complication of approximating an asymmetric binomial distribution by a symmetric distribution, the standard normal.
In the English translation of his 1733 paper (this is the culminating paper on this topic; a facsimile of its opening pages is in Stigler (1986), p. 74), De Moivre (1738) praises the work of Jacob and Nicolaus Bernoulli on the summing of several terms of the binomial $(a + b)^n$ when $n$ is large, which De Moivre had already briefly described in his Miscellanea Analytica of 1730, but says:

". . . yet some things were further required; for what they have done is not so much an Approximation as the determining of very wide limits, within which they demonstrated that the sum of the terms was contained."

De Moivre's approach is in the spirit of Nicolaus Bernoulli's, not least in that he seeks a result for large $n$, and proceeds by approximation of individual terms.

As with Jacob Bernoulli's Theorem the limit theorem aspect of De Moivre's result, effectively the Central Limit Theorem for the standardized proportion of successes in $n$ binomial trials as $n \to \infty$, is masked, and the approximating value for sums of binomial probabilities is paramount.

Nevertheless, De Moivre's results provide a strikingly simple, good, and easy to apply approximation to binomial sums, in terms of an integral of the normal density curve. He discovered this curve though he did not attach special significance to it. Stigler (1986), pp. 70–88 elegantly sketches the progress of De Moivre's development. Citing Stigler (1986), p. 81:

". . . De Moivre had found an effective, feasible way of summing the terms of the binomial."

It came about, in our view, due to two key components: what is now known as Stirling's formula, and the practical calculation of the normal integral.
De Moivre's (1733) Theorem may be stated as follows in modern terms: the sum of the binomial terms

$$\sum \binom{n}{x} p^x q^{n-x},$$

where $0 < p = 1 - q < 1$, over the range $|x - np| \le s\sqrt{npq}$, approaches as $n \to \infty$, for any $s > 0$, the limit

$$\frac{1}{\sqrt{2\pi}} \int_{-s}^{s} e^{-z^2/2}\,dz. \tag{5}$$

Jacob Bernoulli's Theorem as expressed by (1) follows as a Corollary. This corollary is the LLN aspect of Jacob Bernoulli's Theorem, and was the focus of De Moivre's application of his result. It revolves conceptually, as does Jacob Bernoulli's Theorem, around the mathematical formalization of statistical regularity, which empirical phenomenon De Moivre attributes to:

". . . that Order which naturally results from ORIGINAL DESIGN."

(quoted by Stigler (1986), p. 85). The theological connotations of empirical statistical regularity in the context of free will and its opposite, determinism, are elaborated in Seneta (2003).

De Moivre's (1733) result also gives an answer to estimating precision of the relative frequency $X/n$ as an estimate of an unknown $p$, for given $n$; or of determining $n$ for given precision (the inverse problem), in frequentist fashion, using the inequality⁴ $p(1 - p) \le 1/4$.

De Moivre's results appeared in part in 1730 in his Miscellanea Analytica de Seriebus et Quadraturis and were completed in 1733 in his Approximatio ad Summam Terminorum Binomii $a + b|^n$ in Seriem Expansi.⁵ His Doctrine of Chances of 1738 (2nd ed.) contains his translation into English of the 1733 paper. There is a short preamble on its p. 235, reproduced in Stigler (1986), p. 74, which states:

"I shall here translate a Paper of mine which was printed November 12, 1733, and communicated to some Friends, but never yet made public, reserving to myself the right of enlarging my own thoughts, as occasion shall require."
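The convergence asserted in De Moivre's Theorem, of the central binomial sum to the limit (5), can be observed numerically. The sketch below is a plain illustration and not De Moivre's method: the function names are ours, the symmetric case $p = 1/2$ with $s = 2$ and the sample sizes 100 and 3600 are arbitrary choices, and the binomial terms are computed through log-gamma, an implementation convenience.

```python
import math


def binomial_central_sum(n: int, p: float, s: float) -> float:
    """Sum of C(n, x) p^x q^(n-x) over |x - n*p| <= s*sqrt(n*p*q)."""
    q = 1.0 - p
    half_width = s * math.sqrt(n * p * q)
    lo = max(0, math.ceil(n * p - half_width))
    hi = min(n, math.floor(n * p + half_width))
    total = 0.0
    for x in range(lo, hi + 1):
        # Logarithm of one binomial term, via log-gamma to avoid overflow.
        log_term = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                    + x * math.log(p) + (n - x) * math.log(q))
        total += math.exp(log_term)
    return total


def normal_limit(s: float) -> float:
    """The limit (5): (1/sqrt(2*pi)) * integral of exp(-z^2/2) over [-s, s]."""
    return math.erf(s / math.sqrt(2.0))


for n in (100, 3600):
    print(n, binomial_central_sum(n, 0.5, 2.0), normal_limit(2.0))
```

For these inputs the gap between the binomial sum and the limit shrinks as $n$ grows, which is the limit-theorem aspect of the result that the text describes as masked in De Moivre's own presentation.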
In his Miscellanea Analytica, Book V, De Moivre displays a detailed study of the work of the Bernoullis in 1713, and distinguishes clearly, on p. 28, between the approaches of Jacob in 1713 of finding an $n$ sufficiently large for specified precision, and of Nicolaus of assessing precision for fixed $n$ for the "futurum probabilitate", thus alluding to the fact that the work was for a general, and to be estimated, $p$.

⁴ See our Section 9.1.
⁵ $a + b|^n$ is De Moivre's notation for $(a + b)^n$.

The first edition of De Moivre's Doctrine of Chances had appeared in 1718. In this book there are a number of references to the work of both Jacob and Nicolaus but only within a games of chance setting, in particular to the work of Nicolaus as presented in Montmort (1713), and it is in this games of chance context that later French authors generally cite the Doctrine of Chances, characteristically giving no year of publication.

The life and probabilistic work of De Moivre are thoroughly described also in Schneider (1968, 2006), and Bellhouse (2011).

4. Laplace. The inversion problem. Lacroix. The centenary

In a paper of 1774 (Laplace (1986)) which Stigler (1986) regards as foundational for the problem of predictive probability, Pierre Simon de Laplace (1749–1827) sees that Bayes's Theorem provides a means to solution of Jacob Bernoulli's inversion problem. Laplace considers binomial trials with success probability $x$ in each trial, assuming $x$ has uniform prior distribution on $(0, 1)$, and calculates the posterior distribution of the success probability random variable $\Theta$ after observing $p$ successes and $q$ failures. Its density is:

$$\frac{\theta^p (1-\theta)^q}{\int_0^1 \theta^p (1-\theta)^q \,d\theta} = \frac{(p+q+1)!}{p!\,q!}\,\theta^p (1-\theta)^q \tag{6}$$

and Laplace proves that for any given $w > 0$, $\delta > 0$

$$P\left(\left|\Theta - \frac{p}{p+q}\right| < w\right) > 1 - \delta \tag{7}$$

for large $p, q$.
This is a Bayesian analogue of Jacob Bernoulli's Theorem, the beginning of Bayesian estimation of success probability of binomial trials and of Bayesian-type limit theorems of LLN and Central Limit kind. Early in the paper Laplace takes the mean

$$\frac{p+1}{p+q+2} \tag{8}$$

of the posterior distribution as his total [predictive] probability on the basis of observing $p$ and $q$, and (8) is what we now call the Bayes estimator.

There is a brief mention in Laplace's paper of De Moivre's Doctrine of Chances (no doubt the 1718 edition) at the outset, but in a context different from the Central Limit problem. There is no mention of Jacob Bernoulli, Stirling (whose formula he uses, but which he cites as sourced from the work of Euler), or Bayes. The paper of 1774 appears to be a work of still youthful exuberance.

In his preliminary Discours to his Essai, Condorcet (1785), p. viij, speaks of the relation between relative frequency and probability, and has a footnote:

"For these two demonstrations, see the third part of the Ars Conjectandi of Jacob Bernoulli, a work full of genius, and one of those of which one may regret that this great man had begun his mathematical career so late, and whose death has too soon interrupted."

Lacroix (1816), who had been a pupil of J.A.N. de Caritat de Condorcet (1743–1794), writes on p. 59 about Jacob Bernoulli's Theorem, and has a footnote:

"It is the object of the 4th Part of the Ars Conjectandi. This posthumous work, published in 1713, already contains the principal foundations of the philosophy of the probability calculus, but it remained largely obscured until Condorcet recalled, perfected and extended it."

Condorcet was indeed well-versed with the work of Jacob Bernoulli, and specifically the Ars Conjectandi, to which the numerous allusions in the book of Bru–Crepel (Condorcet (1994)) testify.
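For reference, the mean of Laplace's posterior density (6) can be obtained in one line. The derivation below is a standard Beta-integral computation supplied here as a worked step, not a transcription of Laplace's own argument; it uses only the identity $\int_0^1 \theta^a (1-\theta)^b \,d\theta = a!\,b!/(a+b+1)!$ for non-negative integers $a, b$, which is the same identity that gives the normalizing constant in (6).

```latex
\begin{align*}
E[\Theta]
  &= \int_0^1 \theta \cdot \frac{(p+q+1)!}{p!\,q!}\,\theta^{p}(1-\theta)^{q}\,d\theta
   = \frac{(p+q+1)!}{p!\,q!}\int_0^1 \theta^{p+1}(1-\theta)^{q}\,d\theta \\
  &= \frac{(p+q+1)!}{p!\,q!}\cdot\frac{(p+1)!\,q!}{(p+q+2)!}
   = \frac{p+1}{p+q+2}.
\end{align*}
```

With no observations ($p = q = 0$) this gives $1/2$, the mean of the uniform prior, and for large $p, q$ it is close to the relative frequency $p/(p+q)$ appearing in (7).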
Lacroix (1816) may well be regarded as marking the first Centenary of Jacob Bernoulli's Theorem, because it gives a direct proof and extensive discussion of that theorem. Subsequently, while the name and statement of the theorem persist, it figures in essence as a frequentist corollary to De Moivre's Theorem, or in its Bayesian version, following the Bayesian (predictive) analogue of De Moivre's Theorem, of Laplace (1814, pp. 363 ff, Chapitre VI: De la probabilité des causes et des événemens futurs, tirée des événemens observées), which is what the footnote of Lacroix (1816), p. 295, cites at its very end.

The first edition of 1812 and the second edition of 1814 of Laplace's Théorie analytique des probabilités span the Centenary year of 1813, but, as Armatte (2006) puts it, Lacroix (1816) served as an exposition of the probabilistic thinking of Condorcet and Laplace for people who would never go to the original philosophical, let alone technical, sources of these authors.⁶

Nevertheless, Laplace (1814) is an outstanding epoch in the development of probability theory. It connects well with what had gone before and with our present history of the LLN, and mightily influenced the future. Laplace's (1814) p. 275 ff, Chapitre III, Des lois de probabilité, qui resultent de la multiplication indéfinie des évenémens is frequentist in approach, contains De Moivre's Theorem, and in fact adds a continuity correction term (p. 277):

$$P(|X - np| \le t\sqrt{npq}) \approx \frac{1}{\sqrt{2\pi}} \int_{-t}^{t} e^{-u^2/2}\,du + \frac{e^{-t^2/2}}{\sqrt{2\pi npq}}. \tag{9}$$

Laplace remarks that this is an approximation to $O(n^{-1})$ providing $np$ is an integer,⁷ and then applies it to Nicolaus Bernoulli's example (see our Section 2). On p. 282 he inverts (9) to give an interval for $p$ centred on $\hat{p} = X/n$, but the ends of the interval still depend on the unknown $p$, which Laplace replaces by $\hat{p}$, since $n$ is large. This gives an interval of random length, in fact a confidence interval in modern terminology, for $p$.

⁶ The influence of Lacroix's (1816) book is particularly evident in the subsequent more statistical direction of French probability in the important book of Cournot (1843), as the copious and incisive notes of the editor, Bernard Bru, of its reprinting of 1984 make clear.
⁷ See our Section 11.2 for a precise statement.

Neither De Moivre nor Stirling nor Nicolaus Bernoulli are mentioned here. However in his Notice historique sur le Calcul des Probabilités, pp. xcix–civ, both Bernoullis, Montmort, De Moivre and Stirling receive due credit. In particular a paragraph extending over pages cij–ciij refers to a later edition (1738 or 1756, unspecified) of De Moivre's Doctrine of Chances specifically in the context of De Moivre's Theorem, in both its contexts, that is (1) as facilitating a proof of Jacob Bernoulli's Theorem; and (2) as:

". . . an elegant and simple expression that the difference between these two ratios will be contained within the given limits."

Finally, of relevance to our present theme is a subsection (pp. 67–70) entitled: Théorèmes sur le developpement en séries des fonctions de plusieurs variables. Here Laplace considers, using their generating functions, sums of independent integer-valued but not necessarily identically distributed random variables, and obtains a Central Limit Theorem. The idea of inhomogeneous sums and averages leads directly into subsequent French (Poisson) and Russian (Chebyshev) directions.

5. Poisson's Law of Large Numbers and Chebyshev

5.1. Poisson's Law

The major work in probability of Siméon Denis Poisson (1781–1840) was his book⁸ of 1837: Recherches sur la probabilité. It is largely a treatise in the tradition of, and a sequel to, that of his great predecessor Laplace's (1814) Théorie analytique in its emphasis on the large sample behaviour of averages. The theorem of Jacques Bernouilli [sic] [Jacob Bernoulli] is mentioned in 5 places, scattered over pp. iij to p. 205. Laplace receives multiple mentions on 16 pages. Bayes, as "Blayes", is mentioned twice on just one page, and in connection with Laplace. What follows is clearly in the sense of Laplace, with the prior probability values for probability of success in binomial trials being determined by occurrence of one of a range of "causes". Condorcet is mentioned twice, and Pascal at the beginning, but there is no mention of Montmort, let alone Nicolaus Bernoulli, nor of De Moivre.

The term Loi des grands nombres [Law of Large Numbers] appears for the first time in the history of probability on p. 7 of Poisson (1837), within the statement:

"Things of every kind of nature are subject to a universal law which one may well call the Law of Large Numbers. It consists in that if one observes large numbers of events of the same nature depending on causes which are constant and causes which vary irregularly, . . . , one finds that the proportions of occurrence are almost constant . . ."

⁸ The digitized version which I have examined has the label on the cover: "From the Library of J.V. Uspensky, Professor of Mathematics at Stanford, 1929–1947" and is from the Stanford University Libraries. Uspensky plays a major role in our account.

There are two versions of a LLN in Poisson's treatise.
The one most emphasized by him has, at each of $n$ binomial trials, one of a fixed number $a$ of causes operating equiprobably, that is, with probability $1/a$, occurrence of the $i$th cause resulting in observed success with probability $p_i$, $i = 1, 2, \ldots, a$. Thus in each of $n$ independent trials the probability of success is $\bar{p}(a) = \sum_{i=1}^{a} p_i / a$. So if $X$ is the number of successes in $n$ trials, for sufficiently large $n$,

$$P\left(\left|\frac{X}{n} - \bar{p}(a)\right| > \varepsilon\right) < Q$$

for any prespecified $\varepsilon, Q$. Poisson (1837) proved this directly, not realizing that it follows directly from Jacob Bernoulli's Theorem.

The LLN which Poisson (1837) considered first, and is now called Poisson's Law of Large Numbers, has probability of success in the $i$th trial fixed, at $p_i$, $i = 1, 2, \ldots, n$. He showed that

$$P\left(\left|\frac{X}{n} - \bar{p}(n)\right| > \varepsilon\right) < Q$$

for sufficiently large $n$, using Laplace's Central Limit Theorem for sums of non-identically distributed random variables. The special case where $p_i = p$, $i = 1, 2, \ldots$ gives Jacob Bernoulli's Theorem, so Poisson's LLN is a genuine generalization.

Inasmuch as $\bar{p}(n)$ itself need not even converge as $n \to \infty$, Poisson's LLN displays as a primary aspect loss of variability of proportions $X/n$ as $n \to \infty$, rather than a tendency to stability, which Jacob Bernoulli's Theorem established under the restriction $p_i = p$.

5.2. Chebyshev's thesis and paper

The magisterial thesis, Chebyshev (1845), at Moscow University of Pafnutiy Lvovich Chebyshev (1821–1894), begun in 1841 and defended in 1846, but apparently published in Russian in 1845, was entitled An Essay in Elementary Analysis of the Theory of Probabilities.⁹ It gives as its motivation, dated 17 October (o.s) 1844 (Chebyshev (1955), pp. 112–113):

"To show without using transcendental analysis the fundamental theorems of the calculus of probabilities and their main applications, to serve as a support for all branches of knowledge, based on observations and evidence . . ."

Dominant driving forces for the application of probability theory in Europe, Great Britain, and the Russian Empire in those times were retirement funds and insurance,¹⁰ and Russian institutions such as the Yaroslavl Demidov Lycée, within the Moscow Educational Region, had no textbooks. Such a textbook was to involve only "elementary methods".

⁹ I have consulted a reprinting in Chebyshev (1955), pp. 111–189.
¹⁰ Laplace (1814) had devoted an extensive part of the applications to investigations of life tables and the sex ratio, and in France De Moivre's work was largely known for his writings on annuities.

As a consequence, Chebyshev's magisterial dissertation used no calculus, only algebra, with what would have been integrals being sums throughout, but was nevertheless almost entirely theoretical, giving a rigorous analytical discussion of the then probability theory, with a few examples. Throughout, the quantity $e^{-x^2}$ figures prominently. The dissertation concludes with a table of what are in effect tail probabilities of the standard normal distribution. Much of the thesis is in fact devoted to producing these very accurate tables (correct to 7 decimal places) by summation.

Laplace's (1814) Chapitre VI, on predictive probability, is adapted by Chebyshev to the circumstances. In Laplace's writings, the prior distribution is envisaged as coming about as the result of "causes", resulting in corresponding values being attached to the possible values in $(0, 1)$ of a success probability, the attached value depending on which "cause" occurs.
If causes are deemed to be "equiprobable", the distribution of success probability is uniform in $(0, 1)$. Chebyshev stays "discrete", so, for example, he takes $i/s$, $i = 1, 2, \ldots, s-1$ as the possible values (the sample space) of the prior probability in $(0, 1)$ of $(s-1)$ equiprobable causes, the probability of each of the causes being $1/(s-1)$. Thus if $r$ occurrences of an event $E$ ("success") are observed in $n$ trials, the posterior distribution is given by:

$$\frac{\binom{n}{r}(i/s)^r (1-(i/s))^{n-r}}{\sum_{i=1}^{s-1} \binom{n}{r}(i/s)^r (1-(i/s))^{n-r}}. \tag{10}$$

Examples are also motivated by Laplace (1814), who in the same chapter begins Section 28, p. 377, with the following:

"It is principally to births that the preceding analysis is applicable."

Chebyshev's (1845) thesis concludes Section 26, which is within Chapter IV, with:

"Investigations have shown that of 215 599 newborns in France 110 312 were boys."¹¹

He then calculates that the posterior random variable $\Theta$ satisfies $P(0.50715 \le \Theta \le 0.51615) = 0.99996980$ by taking $r/n = 110312/215599 = 0.511653579$, and using (in modern notation):

$$\Theta \sim N\left(\frac{r}{n}, \frac{(r/n)(1-(r/n))}{n}\right)$$

and his tables of the standardized normal random variable. (Using the statistical software R for the standard normal variable gives 0.9999709.)

¹¹ I could not find this data in Laplace (1814), although more extensive data of this kind is treated there.

Chebyshev is clearly well acquainted with not only the work of Laplace, but also the work of De Moivre, Bayes and Stirling, although he cites none of these authors explicitly. Jacob Bernoulli's Theorem is mentioned at the end of Chebyshev's (1845) thesis, Section 20, where he proceeds to obtain as an approximation to the binomial probability:

$$P_{\mu,m} = \frac{\mu!}{m!\,(\mu-m)!}\, p^m (1-p)^{\mu-m}$$

the expression

$$\frac{1}{\sqrt{2\pi p(1-p)\mu}}\; e^{-z^2/(2p(1-p)\mu)}$$

using the (Stirling) approximation $x! = \sqrt{2\pi}\, x^{x+1/2} e^{-x}$ which he says is the "form usually used in probability theory". But he actually obtains bounds for $x!$ directly.¹² Much of Chapter III is in fact dedicated to finding such bounds, as he says at the outset to this chapter.

Such careful bounding arguments (rather than approximative asymptotic expressions) are characteristic of Chebyshev's work, and of the Russian probabilistic tradition which came after him. This is very much in the spirit of the bounds in Jacob Bernoulli's Theorem.

Poisson's (1837) Recherches sur la probabilité came to Chebyshev's attention after the publication of Chebyshev (1845), but the subtitle of Chebyshev (1846) suggests that the content of Chebyshev (1846), motivated by Poisson's LLN, was used in the defense of the dissertation in 1846 (Bernstein (1945)).

The only explicit citation in Chebyshev (1846) is to Poisson (1837), Chapitre IV, although Jacob Bernoulli's Theorem is acknowledged as a special case of Poisson's LLN. In his Section 1 Chebyshev says of Poisson's LLN:

"All the same, no matter how ingenious the method utilized by the splendid geometer, it does not provide bounds on the error in this approximative analysis, and, in consequence of this lack of degree of error, the derivation lacks appropriate rigour."

Chebyshev (1846) in effect repeats his bounds for the homogeneous case ($p_i = p$, $i = 1, 2, \ldots, n$) of binomial trials which he dealt with in Chebyshev (1845), Section 21, to deal with the present inhomogeneous case. He also uses generating functions for sums in the manner of Laplace (1814). Here is his final result, where as usual $X$ stands for the number of successes in $n$ trials, $p_i$ is the probability of success in the $i$th trial, and $p = \sum_{i=1}^{n} p_i / n$.
$$P\left(\left|\frac{X}{n} - p\right| \ge z\right) \le Q \quad\text{if}\quad n \ge \max\left(\frac{\log\left[Q\,\frac{z}{1-p}\sqrt{\frac{1-p-z}{p+z}}\right]}{\log H},\ \frac{\log\left[Q\,\frac{z}{p}\sqrt{\frac{p-z}{1-p+z}}\right]}{\log H_1}\right) \tag{11}$$

where:

$$H = \left(\frac{p}{p+z}\right)^{p+z}\left(\frac{1-p}{1-p-z}\right)^{1-p-z}, \qquad H_1 = \left(\frac{p}{p-z}\right)^{p-z}\left(\frac{1-p}{1-p+z}\right)^{1-p+z}. \tag{12}$$

$^{12}$ See our Section 9.2.

Structurally, (11), (12) are very similar to Jacob Bernoulli’s expressions in his Theorem, so it is relevant to compare what they give in his numerical example when $z = 1/50$, $p = 30/50 = 0.6$, $Q = 1/1001 = 0.000999001$. The answer appears to be $n \ge 12241.293$, i.e., $n \ge 12242$.

In spite of the eminence of the journal (Crelle’s) in which Chebyshev (1846) published, and the French language in which he wrote, the paper passed unnoticed among the French mathematicians, to whom what we now call Poisson’s LLN remained an object of controversy. In his historically important follow-up to the Laplacian analytical tradition of probability (see Bru, Bru and Eid (2012)), Laurent (1873) gives a proof of Poisson’s Law of Large Numbers. He uses characteristic functions, and gives a careful consideration of the error, and hence of convergence rate. However Sleshinsky (1892), in his historical Foreword, claims Laurent’s proof contains an error which substantially alters the conclusion on convergence rate. Laurent (1873) cites a number of Bienaymé’s papers, but does not appear to use the simple proof of Poisson’s LLN which follows from the Bienaymé–Chebyshev Inequality, which by 1873 had been known for some time.

6. Bienaymé and Chebyshev

6.1. Bienaymé’s motivation

The major early work of Irenée Jules Bienaymé (1796–1878): De la durée de la vie en France (1837), on the accuracy of life tables as used for insurance calculations, forced the abandonment of the Duvillard table in France in favour of the Deparcieux table.
He was influenced in the writing of this paper not least by the demographic content of Laplace’s Théorie analytique.

Bienaymé, well aware of Poisson (1837), vehemently disapproved of the term “Law of Large Numbers” (Heyde and Seneta (1977), Section 3.3), thinking that it did not exist as a separate entity from Jacob Bernoulli’s Theorem, not understanding the version of Poisson’s Law where a fixed probability of success, $p_i$, is associated with the $i$-th trial, $i = 1, 2, \ldots$. As a consequence of his misunderstanding, in 1839 (Heyde and Seneta (1977), Section 3.2) Bienaymé proposes a scheme of variation of probabilities (that is, of “genuine” inhomogeneity of trials, as opposed to the other version of Poisson’s Law which does not differ from Jacob Bernoulli’s) through a principle of durée des causes [persistence of causes]. Suppose there are $a$ causes, say $c_1, c_2, \ldots, c_a$, the $i$-th cause giving rise to probability $p_i$, $i = 1, 2, \ldots, a$, of success. Each cause may occur equiprobably for each one of $m$ sets of $n$ trials; but once chosen it persists for the whole set. The case $n = 1$ is of course the “other” Poisson scheme which is tantamount to Jacob Bernoulli’s sampling scheme with success probability $\bar{p}(a)$.

For his scheme of $N = mn$ trials Bienaymé writes down a Central Limit result with correction term to the normal integral in the manner of Laplace’s version of De Moivre’s Theorem for Bernoulli trials, to which Bienaymé’s result reduces when $n = 1$.

Schemes of $m$ sets of $n$ binomial trials underlie Dispersion Theory, the study of homogeneity and stability of repeated trials, which was a predecessor of the “continental direction of statistics”.
Work on Dispersion Theory proceeded through Lexis, Bortkiewicz and Chuprov; and eventually, through the correspondence between Markov and Chuprov, manifested itself in another branch of the evolutionary tree of the LLN of repeated trials founded on Jacob Bernoulli’s Theorem. (See Heyde and Seneta (1977), Chapter 3.)

6.2. The Bienaymé–Chebyshev Inequality

Bienaymé (1853) shows mathematically that for the sample mean $\bar{X}$ of independently and identically distributed random variables whose population mean is $\mu$ and variance is $\sigma^2$, so $E\bar{X} = \mu$, $\operatorname{Var}\bar{X} = \sigma^2/n$, then for any $t > 0$:

$$\Pr\left((\bar{X} - \mu)^2 \ge t^2\sigma^2\right) \le 1/(t^2 n). \tag{13}$$

The proof which Bienaymé uses is the simple one we use in the classroom today to prove the inequality by proving that for any $\varepsilon > 0$, providing $EX^2 < \infty$, and $\mu = EX$:

$$\Pr(|X - \mu| \ge \varepsilon) \le (\operatorname{Var}X)/\varepsilon^2. \tag{14}$$

This is commonly referred to in probability theory as the Chebyshev Inequality, and less commonly as the Bienaymé–Chebyshev Inequality. If the $X_i$, $i = 1, 2, \ldots$ are independently but not necessarily identically distributed, and $S_n = X_1 + X_2 + \cdots + X_n$, putting $X = S_n$ in (14), and using the Bienaymé equality $\operatorname{Var}S_n = \sum_{i=1}^{n}\operatorname{Var}X_i$, (14) reads:

$$\Pr(|S_n - ES_n| \ge \varepsilon) \le \left(\sum_{i=1}^{n}\operatorname{Var}X_i\right)\Big/\varepsilon^2. \tag{15}$$

This inequality was obtained by Chebyshev (1867) for discrete random variables and published simultaneously in French and Russian. Bienaymé (1853) was reprinted immediately preceding the French version in Liouville’s journal. In 1874 Chebyshev wrote:

The simple and rigorous demonstration of Bernoulli’s law to be found in my note entitled: Des valeurs moyennes, is only one of the results easily deduced from the method of M. Bienaymé, which led him, himself, to demonstrate a theorem on probabilities, from which Bernoulli’s law follows immediately . . .
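As a small worked illustration of how (14) yields a sample size, the sketch below applies it to the proportion $X/n$ of successes in $n$ binomial trials, for which $\operatorname{Var}(X/n) = p(1-p)/n$. The tolerances $\varepsilon = 1/50$ and $1/1001$ are those of Jacob Bernoulli's numerical example; the function name and the choice to replace $p(1-p)$ by its maximum $1/4$ when $p$ is not supplied are assumptions made here for illustration, not a procedure taken from the sources discussed.

```python
from fractions import Fraction
from math import ceil


def chebyshev_sample_size(eps, q, p=None):
    """Smallest n with p(1-p) / (n * eps^2) <= q.

    If p is None, p(1-p) is replaced by its maximum value 1/4,
    so the returned n works for every success probability.
    """
    var_one_trial = Fraction(1, 4) if p is None else p * (1 - p)
    return ceil(var_one_trial / (eps ** 2 * q))


eps = Fraction(1, 50)      # required precision of X/n as an estimate of p
q = Fraction(1, 1001)      # allowed probability of missing by eps or more

n_known = chebyshev_sample_size(eps, q, p=Fraction(3, 5))
n_unknown = chebyshev_sample_size(eps, q)
print(n_known, n_unknown)
```

Exact rational arithmetic is used so that the rounding up to an integer is not affected by floating-point error. The resulting sample sizes are far larger than those from the sharper bounds and approximations discussed elsewhere in the paper, which is the price of an inequality that assumes nothing beyond a finite variance.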
Actually, not only the limit theorem aspect of Jacob Bernoulli’s Theorem is covered by the Bienaymé–Chebyshev Inequality, but also the inversion aspect$^{13}$ even for unspecified $p$.

$^{13}$ Using $p(1-p) \le 1/4$.

Further, Chebyshev (1874) formulates as: “the method of Bienaymé” what later became known as the method of moments. Chebyshev (1887) used this method to prove the first version of the Central Limit Theorem for sums of independently but not identically distributed summands; and it was quickly taken up and generalized by Markov. Markov and Liapunov were Chebyshev’s most illustrious students, and Markov was ever a champion of Bienaymé as regards priority of discovery. See Heyde and Seneta (1977), Section 5.10, for details, and Seneta (1984)$^{14}$ for a history of the Central Limit problem in pre-Revolutionary Russia.

7. Life tables, insurance, and probability in Britain. De Morgan

From the mid 1700s, there had been a close association between games of chance and demographic and official statistics with respect to calculation of survival probabilities from life tables. Indeed games of chance and demographic statistics were carriers of the nascent discipline of probability. There was a need for, and activity towards, a reliable science of risk based on birth statistics and life tables by insurance companies and superannuation funds (Heyde and Seneta (1977), Sections 2.2–2.3). De Moivre’s (1725) Annuities upon Lives was a foremost source in England.

John William Lubbock (1803–1865) is sometimes described as “the foremost among English mathematicians in adopting Laplace’s doctrine of probability”. With John Elliott Drinkwater (later Drinkwater-Bethune) (1801–1859), he published anonymously a 64 page elementary treatise on probability (Lubbock and Drinkwater-Bethune (1830)).
Lubbock’s slightly younger colleague, Augustus De Morgan (1806–1871), was making a name for himself as mathematician, actuary and academic.

The paper of Lubbock (1830) attempts to address and correct current shortcomings of life tables used at the time in England. He praises Laplace’s Théorie analytique in respect of its Bayesian approach, and applies this approach to multinomial distributions of observations, to obtain in particular probabilities of intervals symmetric about the mean via a normal limiting distribution. Lubbock very likely used the 1820 edition of the Théorie analytique, since his colleague De Morgan was in 1837 to review this edition.

De Morgan’s chief work on probability was, consequently, a lengthy article in the Encyclopedia Metropolitana usually cited as of 1845, but published as separatum in 1837 (De Morgan (1837)). This was primarily a summary, simplification and clarification of many of Laplace’s derivations. It was the first full-length exposition of Laplacian theory and the first major work in English on probability theory.

An early footnote (p. 410) expresses De Morgan’s satisfaction that in Lubbock and Drinkwater-Bethune (1830) there is a collection in English, “and in so accessible a form” on “problems on gambling which usually fill works on our subject”, so he has no compunction in throwing most of these aside “to make room for principles” in the manner of, though not necessarily in the methodology of, Laplace. There is no mention of De Moivre or Bayes.

$^{14}$ http://www.maths.usyd.edu.au/u/eseneta/TMS9_37-77.pdf.

On p. 413, Section 48, which is on “the probability of future events from those which are past”, De Morgan addresses the same problem as Lubbock (1830).
Using the multinomial distribution for the prior probabilities, he calculates the posterior distribution by Bayes’s Theorem, and this is then used to find the joint distribution from further drawings. Stirling’s formula (with no attribution) is introduced in Section 70. A discussion of the normal approximation to the binomial follows in Section 74, pp. 431–434. We could find no mention as such of Jacob Bernoulli’s Theorem or De Moivre’s Theorem. Section 74 is concluded by Nicolaus Bernoulli’s example, which is taken directly from Laplace: with success probability $18/35$, and 14 000 trials, $P(7200 - 163 \le X \le 7200 + 163)$ is considered, for which De Morgan obtains 0.99433 (a little closer to the true value 0.99431 than Laplace). Section 77, p. 434, addresses “the inverse question” of prediction given observations and prior distribution.

De Morgan (1838) published An Essay on Probabilities, designed for the use of actuaries. The book, clearly written and much less technical than De Morgan (1837), remained widely used in the insurance industry for many years. It gave an interesting perception of the history up to that time, especially of the English contributions. On pp. v–viii De Morgan says:

At the end of the seventeenth century, the theory of probability was contained in a few isolated problems, which had been solved by Pascal, Huyghens, James Bernoulli, and others . . . . Montmort, James Bernoulli, and perhaps others, had made some slight attempts to overcome the mathematical difficulty; but De Moivre, one of the most profound analysts of his day, was the first who made decided progress in the removal of the necessity for tedious operations . . . when we look at the intricate analysis by which Laplace obtained the same [results], . . . De Moivre nevertheless did not discover the inverse method. This was first used by the Rev. T. Bayes, . . .
Laplace, armed with the mathematical aid given by De Moivre, Stirling, Euler and others, and being in possession of the inverse principle already mentioned, succeeded . . . in . . . reducing the difficulties of calculation . . . within the reach of an ordinary arithmetician . . . for the solution of all questions in the theory of chances which would otherwise require large numbers of operations. The instrument employed is a table (marked Table I in the Appendix to this work), upon the construction of which the ultimate solution of every problem may be made to depend.

Table I is basically a table of the normal distribution.

8. The British and French streams continue

8.1. Boole and Todhunter

George Boole (1815–1864), a protégé of De Morgan, in his book Boole (1854), introduced (p. 307) what became known as Boole’s Inequality, which was later instrumental in coping with statistical dependence in Cantelli’s (1917) pioneering treatment of the Strong Law of Large Numbers (Seneta (1992)). Boole’s book contains one of the first careful treatments of hypothesis testing on the foundation of Bayes’s Theorem$^{15}$ (Rice and Seneta (2005)).

Boole does not appear to pay attention to Jacob Bernoulli’s Theorem, nor does he follow Laplacian methods although he shows respect for De Morgan, as one who:

has most fully entered into the spirit of Laplace.

He objects to the uniform prior to express ignorance, and to inverse probability (Bayesian) methods in general, particularly in regard to his discussion of Laplace’s Law of Succession. Boole is kinder to Poisson (1837), whom he quotes at length at the outset of his Chapter XVI: On the Theory of Probabilities. His approach to this theory is in essence set-theoretic, in the spirit of formal logic.
The history of the theory of probability up to and including Laplace, and even some later related materials, appears in the remarkable book of Todhunter (1865) which is still in use today. A whole chapter entitled: Chapter VII. James Bernoulli (Sections 92–134, pp. 56–77) addresses the whole of the Ars Conjectandi. Sections 123–124, devoted to Jacob Bernoulli’s Theorem, begin with:

The most remarkable subject contained in the fourth part of the Ars Conjectandi is the enunciation of what we now call Bernoulli’s Theorem.

The theorem is enunciated just as Bernoulli described it; of how large $N(n)$ is to be to give the specified precision. Section 123 ends with:

James Bernoulli’s demonstration of this result is long but perfectly satisfactory . . . We shall see that James Bernoulli’s demonstration is now superseded by the use of Stirling’s Theorem.

In Section 124, Todhunter uses Jacob Bernoulli’s own examples, including the one we have cited (“for the odds to be 1000 to 1”). Section 125 is on the inversion problem: given the number of successes in $n$ trials, to determine the precision of the estimate of the probability of success. Todhunter concludes by saying that the inversion has been done in two ways,

by an inversion of James Bernoulli’s Theorem, or by the aid of another theorem called Bayes’s theorem; the results approximately agree. See Laplace Théorie Analytique . . . pages 282 and 366.

Section 135 concludes with:

The problems in the first three parts of the Ars Conjectandi cannot be considered equal in importance or difficulty to those which we find investigated by Montmort and De Moivre; but the memorable theorem in the fourth part, which justly bears its author’s name, will ensure him a permanent place in the history of the Theory of Probability.
$^{15}$ According to Todhunter (1865) there is no difference in essence between the 1814 2nd and the 1820 3rd editions.

Important here is Todhunter’s view that Jacob Bernoulli’s proof has been superseded. Only the limit theorem aspect is being perceived, and that as a corollary to De Moivre’s Theorem, although in this connection and at this point De Moivre gets no credit, despite Laplace’s (1814) full recognition for his theorem.

8.2. Crofton and Cook Wilson

Todhunter’s limited perception of Jacob Bernoulli’s Theorem as only a limit theorem, with Stirling’s Theorem as instrument of proof in the manner of De Moivre–Laplace, but without mention of De Moivre, became the standard one in subsequent British probability theory. In his Encyclopaedia Britannica article in the famous 9th edition, Crofton (1885) constructs such a proof (pp. 772–773), using his characteristically geometric approach, to emphasize the approximative use of the normal integral to calculate probabilities, and then concludes with:

Hence it is always possible to increase the number of trials till it becomes certainty that the proportion of occurrences of the event will differ from $p$ (its probability on a single trial) by a quantity less than any assignable. This is the celebrated theorem given by James Bernoulli in the Ars Conjectandi. (See Todhunter’s History, p. 71.)

Then Crofton presents the whole issue of Laplace’s predictive approach as a consequence of Bayes’s Theorem in Section 17 of Crofton (1885) (pp. 774–775), using a characteristically geometric argument, together with Stirling’s Theorem.

Crofton’s general francophilia is everywhere evident; he had spent a time in France. His concluding paragraph on p.
778, on literature, mentions De Morgan’s Encyclopaedia Metropolitana presentation, Boole’s book with some disparagement, and a number of French language sources, but he refers:

. . . the reader, . . . above all, to the great work of Laplace, of which it is sufficient to say that it is worthy of the genius of its author – the Théorie analytique des probabilités, . . .

There is a certain duality between De Moivre, a Protestant refugee from Catholic France to Protestant England, and Crofton, an Anglo-Irish convert to Roman Catholicism in the footsteps of John Henry (Cardinal) Newman, himself an author on probability, and an influence on Crofton, as is evident from its early paragraphs, in Crofton (1885).

Crofton’s (1885) article was likely brought to the attention (Seneta (2012)) of John Cook Wilson (1849–1915), who in Cook Wilson (1901) developed his own relatively simple proof of the limit aspect of “James Bernoulli’s Theorem”. He uses domination by a geometric progression. His motivation is the simplification of Laplace’s proof as presented in Todhunter (1865, Section 993). There is no mention of De Moivre, and dealings with the normal integral are avoided. An interesting feature is that Cook Wilson considers asymmetric bounds for the deviation $\frac{X}{n} - p$, but he does eventually resort to limiting arguments using Stirling’s approximation, so the possibility of an exact bounding result in the Bernoulli and Bienaymé–Chebyshev style is lost.

8.3. Bertrand

In the book of Bertrand (1907) in the French stream, Chapitre IV contains a proof of De Moivre’s Theorem, and mentions both De Moivre and Stirling’s Theorem, but there seems to be no mention of “Jacques Bernoulli” in the chapter content, nor a statement of his Theorem, let alone a proof.
Chapitre V of Bertrand (1907) has two “demonstrations” which Bertrand (1907, p. 101) describes only in its limit aspect. Bertrand first shows that if $X_i$, $i = 1, \ldots, n$ are independently and identically distributed, and $EX_1^2 < \infty$, then

$$\operatorname{Var}\bar{X} = \frac{\operatorname{Var}X_1}{n} \to 0, \qquad n \to \infty,$$

and then simply applies this to the case when $P(X_1 = 1) = p$, $P(X_1 = 0) = q = 1 - p$. There is no mention of the Bienaymé–Chebyshev Inequality or its authors. That $\operatorname{Var}\bar{X} \to 0$ is deemed sufficient for “convergence” one might charitably equate to foreshadowing convergence in mean square.

In the final section of Chapitre V, Section 80, p. 101, Bertrand (1907) asserts that he will give a demonstration to the theorem of Bernoulli even simpler than the preceding. What follows is a demonstration that for $\{0, 1\}$ random variables $X_i$, $i = 1, \ldots, n$,

$$E|\bar{X} - p| \to 0, \qquad n \to \infty,$$

without the need to calculate $E\left|\sum_{i=1}^{n} X_i - np\right|$. The reader will see that this actually follows easily from $\operatorname{Var}\bar{X} \to 0$.

The “exact aspect” of Jacob Bernoulli’s theorem has disappeared.

8.4. K. Pearson

In a perceptive paper written in that author’s somewhat abrasive style, Karl Pearson (1925) restores credit to De Moivre for his achievement, and refocuses (p. 202) on the need, surely appropriate for a mathematical statistician, to obtain a better expression for sample size, $n$, needed for specified accuracy.

In his Section 2, Pearson reproduces the main features of Jacob Bernoulli’s proof, and shows how the normal approximation to the binomial in the manner of De Moivre can be used to determine $n$ for specified precision if $p$ is known.

In Section 3, Pearson tightens up Bernoulli’s proof keeping the same expressions for $p = \frac{r}{r+s}$ and $\varepsilon = \frac{1}{r+s}$, by using a geometric series bounding procedure and then Stirling’s Theorem. There is no mention of Cook Wilson’s (1901) work.
Recall that if $p \ne \frac{1}{2}$ one problem with the normal approximation to the binomial is that asymmetry about its mean of the binomial is not reflected in the normal. Thus considering

$$\frac{c}{c+1} < P\left(\left|\frac{X}{n} - p\right| \le \varepsilon\right) = P(X \le np + n\varepsilon) - P(X < np - n\varepsilon) \tag{16}$$

involves binomial tails of differing probability size.

A commensurate aspect in Pearson (1925) is the treatment of the tails of the binomial distribution individually. The approximation is remarkably good, giving for Bernoulli’s example where $r = 30$, $s = 20$, $p = \frac{3}{5}$, $c = 1000$, $\varepsilon = \frac{1}{50}$ the result $n_0(\varepsilon, c) \ge 6502$, which is almost the same as for the normal approximation to the binomial (6498). The reason is similar: the use of the De Moivre–Stirling approximation for $x!$, and the fact that $p = 0.6$ is close to $p = 0.5$, which is the case of symmetric binomial (when it is known that a correction for continuity such as Laplace’s with the normal probability function gives very accurate results). Pearson does not attempt the inversion (that is, the determination of $n$ when $p$ is not known) in Jacob Bernoulli’s example.

9. Sample size and emerging bounds

9.1. Sample size in Bernoulli’s example

For this classical example when $p = 0.6$, referring to (16), we seek the smallest $n$ to satisfy

$$0.9990009999 = \frac{1000}{1001} < P(X \le 0.62n) - P(X < 0.58n) \tag{17}$$

where $X \sim B(n, 0.6)$. Using R, $n = 6491$ on the right hand side gives 0.9990126, while $n = 6490$ gives 0.9989679, so the minimal $n$ which will do is 6491, providing the algorithm in R is satisfactory.

Chebyshev’s (1846) inequality for inhomogeneous binomial trials, when applied to a homogeneous situation, gives, as we have seen, a much sharper “exact” result for minimal $n$ (namely, $n \ge 12242$) for $p = 0.6$ than Bernoulli’s, but, like Jacob Bernoulli’s, was incapable of explicit algebraic inversion when $p$ was unknown.
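The two computations just mentioned can be sketched in a few lines. The code below is only an illustration under stated assumptions: it uses Python rather than R, evaluates the right hand side of (17) by summing binomial probabilities computed through log-gamma, and evaluates the sample-size bound of (11) and (12) as those expressions are read here. The function names and the integer parametrization of $p$ and $\varepsilon$ are choices made for this sketch.

```python
from math import exp, lgamma, log, sqrt


def binom_pmf(n, k, p):
    """P(X = k) for X ~ B(n, p), computed on the log scale for stability."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def coverage(n, p_pct=60, eps_pct=2):
    """Right hand side of (17): P((p - eps) n <= X <= (p + eps) n).

    p and eps are given in hundredths so the summation limits are
    obtained with exact integer arithmetic.
    """
    lo = -(-(p_pct - eps_pct) * n // 100)   # ceiling of (p - eps) n
    hi = (p_pct + eps_pct) * n // 100       # floor of (p + eps) n
    p = p_pct / 100
    return sum(binom_pmf(n, k, p) for k in range(lo, hi + 1))


def chebyshev_1846_bound(p, z, q):
    """Real-valued lower bound on n from (11) with H, H1 as in (12)."""
    h = (p / (p + z)) ** (p + z) * ((1 - p) / (1 - p - z)) ** (1 - p - z)
    h1 = (p / (p - z)) ** (p - z) * ((1 - p) / (1 - p + z)) ** (1 - p + z)
    first = log(q * z / (1 - p) * sqrt((1 - p - z) / (p + z))) / log(h)
    second = log(q * z / p * sqrt((p - z) / (1 - p + z))) / log(h1)
    return max(first, second)


target = 1000 / 1001
for n in (6490, 6491):
    print(n, coverage(n), coverage(n) > target)
print(chebyshev_1846_bound(0.6, 0.02, 1 / 1001))
```

Under these assumptions the coverage values should agree, up to rounding, with those quoted from R, and the bound should land close to the figure of about 12241 quoted for (11); small differences in the last digits would reflect rounding conventions rather than a different method.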
In his monograph (in the 3rd edition, Markov (1913), this is on p. 74) Markov uses the normal approximation with known $p = 0.6$ in Bernoulli’s example to obtain that $n \ge 6498$ is required for the specified accuracy.$^{16}$ In the tradition of Chebyshev, and in the context of his controversy with Nekrasov (see our Section 10.1), Markov (1899) had developed a method using continued fractions to obtain tight bounds for binomial probabilities when $p$ is known and $n$ is also prespecified. The method is described and illustrated in Uspensky (1937), pp. 52–56.

$^{16}$ Actually, Markov uses 0.999 in place of 0.9990009999.

On p. 74, Markov (1913) argues that the upper bound 0.999 on accuracy is likely to hold also for $n$ not much greater than the approximative 6498 which he had just obtained, say $n = 6520$. On pp. 161–165 he verifies this, showing that the probability when $n = 6520$ is between 0.999028 and 0.999044. Using R the true value is 0.9990309. Thus Markov’s procedure is an exact procedure for inversion when $p$ and accuracy are prespecified, once one has an approximative lower bound for $n$. One could then proceed experimentally, as we have done using R, looking for the smallest $n$.

To effect “approximative” inversion if we did not know the value of $p$, to get the specified accuracy of the estimate of $p$ presuming $n$ would still be large, we could use De Moivre’s Theorem and the “worst case” bound $p(1-p) \le \frac{1}{4}$, to obtain

$$n \ge \frac{z_0^2}{4\varepsilon^2} = 0.25\,(3.290527)^2 (50)^2 = 6767.23 \ge 6767$$

where $P(|Z| \le z_0) = 0.999001$. Again the result 6767 is good since $p = 0.6$ is not far from the worst case value $p = 0.5$. The now commonly used substitution of the estimate $\hat{p}$ from
a preliminary performance of the binomial experiment in place of $p$ in $p(1-p)$ (and this is implicit in Laplace’s use of his add-on correction to the normal to effect inversion) would improve the inversion result.

9.2. Improving Stirling’s approximation and the normal approximation

The De Moivre–Laplace approximative methods are based on Stirling’s approximation for the factorial. They can be refined by obtaining bounds for the factorial. Such bounds were already present in an extended footnote in Chebyshev (1846):

$$T_0\, x^{x+\frac{1}{2}} e^{-x} < x! < T_0\, x^{x+\frac{1}{2}} e^{-x+\frac{1}{12x}} \tag{18}$$

where $T_0$ is a positive constant. This was later refined$^{17}$ to

$$x! = \sqrt{2\pi}\, x^{x+\frac{1}{2}} e^{-x+\frac{1}{12x+\theta}} \tag{19}$$

where $0 < \theta < 1$.

It was therefore to be expected that De Moivre’s Theorem could be made more precise by producing bounds. De La Vallée-Poussin (1907) in the second of two papers (see Seneta (2001a)) considers the sum $P = \sum_{x}\binom{n}{x}p^{x}q^{n-x}$ over the range $|x - (n+1)p + \frac{1}{2}| < (n+1)l$ for arbitrary fixed $l$, and obtains the bounds for $P$ in terms of the normal integral. A bound for minimal sample size $n$ required for specified accuracy of approximation could be determined, at least when $p$ was known. Although this work seems to have passed largely unnoticed, it presages the return of “exact methods” via bounds on the deviation of normal approximation to the binomial. These bounds imply a convergence rate of $O(n^{-1/2})$.

A cycle of related bounding procedures is initiated in Bernstein (1911). He begins by saying that he has not found a rigorous estimate of the accuracy of the normal approximation (“Laplace’s formula”) to $P\left(|X - np| < z\sqrt{np(1-p)}\right)$.
He illustrates his own investigations by showing when: $p = 1/2$, $n$ is an odd number, and $1/2 + z\sqrt{(n+1)/2}$ is an integer, that:

$$P\left(\left|X - \frac{n}{2}\right| \le z\sqrt{\frac{n+1}{2}}\right) > 2\Phi(z\sqrt{2}) - 1 \tag{20}$$

where $\Phi(z) = P(Z \le z)$ is the cumulative distribution function of a standard normal variable $Z \sim N(0, 1)$. He illustrates in the case $z = 2.25$, $n = 199$, so that the right-hand side$^{18}$ of (20) is 0.9985373. This value is thus a lower bound for $P(77 \le X \le 122)$. Bernstein (1911) then inverts, by finding that if $\Phi(z_0) = 0.9985373$, then under the normal approximation this value corresponds to $P(77.05 \le X \le 121.9) = P(78 \le X \le 121)$. This testifies not only to the accuracy of “Laplace’s formula” as approximation, but also to the sharpness of Bernstein’s bound even for moderate size of $n$, albeit in the very specific situation of $p = 1/2$.

$^{17}$ For a history see Boudin (1916), pp. 244–251: Note II. Formule de Stirling.

$^{18}$ Using R, the left-hand side is 0.9989406.

The accuracy of Laplace’s formula became a central theme in Bernstein’s subsequent probabilistic work. There is a strong thematic connection between Bernstein’s striking work on probabilistic methods in approximation theory in those early pre-war years to about 1914, in Kharkov, and De La Vallée Poussin’s approximation theory. See our Section 11.3.

10. The Russian stream. Statistical dependence and Bicentenary celebrations

10.1. Nekrasov and Markov

Nekrasov (1898a) is a summary paper containing no proofs. It is dedicated to the memory of Chebyshev, on account of Nekrasov’s continuation of Chebyshev’s work on Central Limit theory in it. The author, P.A.
Nekrasov (1853–1924), attempted to use what we now call the method of saddle points, of Laplacian peaks, and of the Lagrange inversion formula, to establish, for sums of independent non-identically distributed lattice random variables, what are now standard local and global limit theorems of Central Limit theory for large deviations. A follow-up paper, Nekrasov (1898b), dealt exclusively with binomial trials.

Markov’s (1898) first rigorization, within correspondence with A.V. Vasiliev (1853–1929), of Chebyshev’s version of the Central Limit Theorem, appeared in the Kazan-based journal edited by Vasiliev.

The three papers of 1898 mark the beginning of two bitter controversies between Nekrasov and Markov, details of whose technical and personal interaction are described in Seneta (1984).

Nekrasov’s writings from about 1898 had become less mathematically focused, partly due to administrative load, and partly due to his use of statistics as a propagandist tool of state and religious authority (Tsarist government and the Russian Orthodox Church). In a long footnote, Nekrasov (1902, pp. 29–31) states “Chebyshev’s Theorem” as follows:

If $X_1, X_2, \ldots, X_n$ are independently distributed and $\bar{X}_n = (X_1 + X_2 + \cdots + X_n)/n$ then

$$P\left(|\bar{X}_n - E\bar{X}_n| < \tau\sqrt{g_n}\right) \ge 1 - \frac{1}{n\tau^2},$$

where $\tau$ is a given positive number, and $g_n = \frac{\sum_{i=1}^{n}\operatorname{Var}X_i}{n}$.

He adds that if $\tau$ ($= \tau_n$) can be chosen so that $\tau_n\sqrt{g_n} \to 0$ while simultaneously $n\tau_n^2 \to \infty$, then $\bar{X}_n - E\bar{X}_n$ converges to 0. This comment encompasses the LLN in its general form at the time.

Nekrasov says (Seneta (1984)) that he has examined the “theoretical underpinnings of Chebyshev’s Theorem”, and has come to the [correct] conclusion that if in the above $g_n$ is defined as $g_n = n\operatorname{Var}\bar{X}_n$, the inequality continues to hold.
Now, in general, $\operatorname{Var}\bar{X}_n = \sum_{i=1}^{n}\operatorname{Var}X_i + 2\sum_{i}$ [...] $0$ and $C$. The case $\delta = 1$ came to be known in Russian-language literature as Chebyshev’s Theorem. Markov’s Theorem 2 thus dispenses with the need for finite variance of summands $X_i$, but retains their independence. It occurs in the same Section 16 of Chapter III of Markov (1913), specifically on pp. 83–88.

^21 Hitherto his attributions had been to Laplace.

^22 Bernstein (1927) (1934), p. 101 and p. 92, respectively, and then Uspensky (1937), p. 182, call it Chebyshev’s (resp. Tshebysheff’s) Lemma.

Markov’s publications of 1914 strongly reflect his apparent background reading activity in preparation for the Bicentenary. In particular, a paper entitled O zadache Yakova Bernoulli [On the problem of Jacob Bernoulli] can be found in Markov (1951), pp. 509–521. In this paper, in place of what Markov calls the approximate formula of De Moivre:

$$\frac{1}{\sqrt{\pi}} \int_{z}^{\infty} e^{-z^2}\,dz \quad \text{for} \quad P\left(X > np + z\sqrt{2npq}\right)$$

he derives the expression

$$\frac{1}{\sqrt{\pi}} \int_{z}^{\infty} e^{-z^2}\,dz + \frac{(1 - 2z^2)(p - q)e^{-z^2}}{6\sqrt{2npq\pi}}$$

which Markov calls Chebyshev’s formula. This paper of Markov’s clearly motivated Uspensky (1937) to ultimately resolve the issues through the component (27) of Uspensky’s expression.

11.2. Bernstein and Uspensky on the WLLN

Markov died in 1922 well after the Bolshevik seizure of power, and it was through the 4th (posthumous) edition of Ischislenie Veroiatnostei (Markov (1924)) that his results were publicized and extended, in the first instance in the Soviet Union due to the monograph S.N. Bernstein (1927). The third part of this book (pp. 142–199) is titled The Law of Large Numbers and consists of three chapters: Chapter 1: Chebyshev’s inequality and its consequences. Chapter 2: Refinement of Chebyshev’s Inequality. and Chapter 3: Extension of the Law of Large Numbers to dependent quantities.
Chapter 3 begins with Markov’s Theorem 1, and continues with study of the effect of specific forms of correlation between the summands forming $S_n$. In Chapter 1, on p. 155 Bernstein mentions Markov’s Theorem 2 as a result of the “deceased Academician A.A. Markov” and adds “The reader will find the proof in the textbook of A.A. Markov”. A proof is included in the second edition, Bernstein (1934), in which the three chapters in the third part are almost unchanged from Bernstein (1927).

Bernstein (1924) returned to the problem of accuracy of the normal approximation to the binomial via bounds. He showed that there exists an $\alpha$ ($|\alpha| \le 1$) such that $P = \sum_x \binom{n}{x} p^x q^{n-x}$ summed over $x$ satisfying

$$\left|x - np - \frac{t^2}{6}(q - p)\right| < t\sqrt{npq} + \alpha$$

is

$$\frac{1}{\sqrt{2\pi}} \int_{-t}^{t} e^{-u^2/2}\,du + 2\theta e^{-(2npq)^{1/3}} \tag{24}$$

where $|\theta| < 1$ for any $n$, $t$, provided $t^2/16 \le npq$ and $npq \ge 365$. The tool used, perhaps for the first time ever, was what came to be known as Bernstein’s Inequality:

$$P(V > v) \le \frac{E(e^{V\varepsilon})}{e^{v\varepsilon}}$$

for any $\varepsilon > 0$, which follows from

$$P(U > u) \le \frac{E(U)}{u} \tag{25}$$

namely Markov’s Inequality (called Chebyshev’s Lemma by Bernstein). It holds for any random variable $V$, on substituting $U = e^{V\varepsilon}$, $u = e^{v\varepsilon}$. If $E(e^{V\varepsilon}) < \infty$ the bound is particularly effective for a non-negative random variable $V$ such as the binomial, since the bound may be tightened by manipulating $\varepsilon$. In connection with a discussion of (25), Bernstein (1927), pp. 231–232 points out that, consequently, the ordinary (uncorrected) normal integral approximation gives adequate accuracy when $npq$ is of size several hundred, but in cases where great accuracy is not required, $npq \ge 30$ will do. However, our interest in (25) is in its nature as an exact result and in the suggested rate of convergence, $O(n^{-1})$, to the limit in the WLLN which the bounds provide.
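To make concrete how the exponential bound obtained from (25) can be tightened by varying $\varepsilon$, the following is a minimal Python sketch; it is not Bernstein’s own computation. It takes $V$ binomial with $n = 199$, $p = 1/2$ and $v = 122$ (values borrowed from the illustration of (20)), and compares the exact tail $P(V > v)$ with the smallest value of $E(e^{V\varepsilon})/e^{v\varepsilon}$ found over a grid of $\varepsilon$. The function names and the crude grid search are assumptions made here for illustration only.

```python
import math


def binomial_upper_tail(n, p, v):
    """Exact P(V > v) for V ~ Binomial(n, p), by summing the probability mass function."""
    first = math.floor(v) + 1
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(first, n + 1))


def log_exponential_bound(n, p, v, eps):
    """Logarithm of E(exp(eps*V)) / exp(eps*v).

    For the binomial, E(exp(eps*V)) = (q + p*exp(eps))**n; working on the log
    scale avoids floating-point overflow for larger eps.
    """
    return n * math.log(1 - p + p * math.exp(eps)) - eps * v


def tightest_bound(n, p, v, eps_max=5.0, steps=2000):
    """Smallest bound over a grid of eps in (0, eps_max] (an illustrative search)."""
    best = min(
        log_exponential_bound(n, p, v, eps_max * i / steps) for i in range(1, steps + 1)
    )
    return math.exp(best)


if __name__ == "__main__":
    n, p, v = 199, 0.5, 122
    print("exact P(V > v)        :", binomial_upper_tail(n, p, v))
    print("tightened upper bound :", tightest_bound(n, p, v))
```

As $\varepsilon \to 0$ the bound tends to the trivial value 1, so any useful bound comes from an interior choice of $\varepsilon$; the grid search simply stands in for the analytic minimization.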
The entire issue was resolved into an ultimate exact form, under the partial influence of the extensive treatment of the WLLN in Bernstein’s (1927) textbook, by Uspensky (1937, Chapter VII, p. 130) who showed that $P$ taken over the usual range $t_1\sqrt{npq} \le x - np \le t_2\sqrt{npq}$ for any real numbers $t_1 < t_2$, can be expressed (provided $npq \ge 25$) as:

$$\frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-u^2/2}\,du + \frac{(1/2 - \theta_1)e^{-t_1^2/2} + (1/2 - \theta_2)e^{-t_2^2/2}}{\sqrt{2\pi npq}} \tag{26}$$

$$+ \frac{(q - p)\{(1 - t_2^2)e^{-t_2^2/2} - (1 - t_1^2)e^{-t_1^2/2}\}}{6\sqrt{2\pi npq}} + \Omega, \tag{27}$$

where $\theta_2 = np + t_2\sqrt{npq} - [np + t_2\sqrt{npq}]$, $\theta_1 = np - t_1\sqrt{npq} - [np - t_1\sqrt{npq}]$, and

$$|\Omega| < \frac{0.20 + 0.25|p - q|}{npq} + e^{-3\sqrt{npq}/2}.$$

The symmetric case then follows by putting $t_2 = -t_1 = t$, so the “Chebyshev” term vanishes. When both $np$ and $t\sqrt{npq}$ are integers, $\theta_1 = \theta_2 = 0$, reducing the correction term in (26) to Laplace’s $e^{-t^2/2}/\sqrt{2\pi npq}$. But in any case, bounds which are within $O(n^{-1})$ of the true value are thus available.

Uspensky’s (1937) book carried Markov’s theory to the English-speaking countries. Uspensky (1937) cites Markov (1924) and Bernstein (1927) in his two-chapter discussion of the LLN. Markov’s Theorem 2 is stated and proved in Chapter X, Section 8. Presumably the second (1934) edition of Bernstein’s textbook was not available to Uspensky due to circumstances mentioned below. On the other hand in Uspensky (1937) the ideas in the proof of Markov’s Theorem 2 are used to prove the now famous “Khinchin’s Theorem”, an ultimate form of the WLLN. For independent identically distributed (i.i.d.) summands, Khinchin (Khintchine (1929)) showed that the existence of a finite mean, $\mu = EX_i$, is sufficient for (23). Finally, Uspensky (1937), pp.
101–103, proves the Strong Law of Large Numbers (SLLN) for the setting of Bernoulli’s Theorem, and calls this strengthening “Cantelli’s Theorem”, citing one of the two foundation papers (Cantelli (1917)) in the history of the SLLN.

On the other hand, Bernstein (1934), in his third part has an additional Chapter 4: Statistical probabilities, average values and the coefficient of dispersion. It begins with a Bayesian inversion of Jacob Bernoulli’s Theorem, proved under a certain condition on the prior (unconditional) distribution of the number of “successes”, $X$, in $n$ trials. The methodology uses Markov’s Inequality applied to $P((\Theta - \frac{X}{n})^4 > w^4 \mid \Theta)$ and in the classical case of a uniform prior distribution over $(0, 1)$ of the success probability $\Theta$ gives for any given $w > 0$

$$P\left(\left|\Theta - \frac{X}{n}\right| < w \,\Big|\, X = m\right) > 1 - \frac{3(n_0 + 1)}{16 n w^4 n_0}, \tag{28}$$

for $n > n_0$ and $m = 0, 1, \ldots, n$. This should be compared with (7).

Bernstein (1934) also has 4 new appendices. The 4th of these (pp. 406–409) is titled: A Theorem Inverse to Laplace’s Theorem. This is the Bayesian inverse of De Moivre’s Theorem, with an arbitrary prior density, and convergence to the standard normal integral as $m, n \to \infty$ providing $m/n$ behaves appropriately. A version of this theorem is now called the Bernstein-von Mises Theorem, although this attribution is not quite appropriate. After Laplace the multivariate extension of Laplace’s inversion is actually due to his disciple Bienaymé in 1834, and is called by von Mises in 1919 the “Second Fundamental Theorem” (the first being the CLT). Details are given in Section 5.2 of Heyde and Seneta (1977).

The books of both Bernstein and Uspensky are very much devoted to Markov’s work, and Bernstein’s also emphasizes and publicizes Markov chains.
Several sections of Bernstein’s textbook in its 1946 4th edition, such as the 4th appendix, are included in Bernstein’s (1964) collected works and have not been published separately.

11.3. Biographical notes on Bernstein and Uspensky

Bernstein and Uspensky played parallel and influential roles in publicizing and extending Markov’s work, especially on the LLN. These roles were conditioned by their background. To help understand, we sketch these backgrounds. Uspensky’s story is almost unknown.

Sergei Natanovich Bernstein (1880–1968) was born in Odessa in the then Russian Empire. Although his father was a doctor and university lecturer, the family had difficulties since it was Jewish. On completing high school, Bernstein went to Paris for his mathematical education, and defended a doctoral dissertation in 1904 at the Sorbonne. He returned in 1905 and taught at Kharkov University from 1908 to 1933. In the spirit of his French training and following a Chebyshevian theme, in the years preceding the outbreak of World War I he followed Bernstein (1911) by a number of articles on approximation theory. These included the famous paper of 1912 which presented a probabilistic proof of Weierstrass’s Theorem, and introduced what we now call Bernstein Polynomials. A prize-winning paper, which contains forms of inverse theorems and Bernstein’s Inequality, arose out of a question posed by De La Vallée Poussin.^23

After the Bolshevik Revolution, during 1919–1934 Kharkov (Kharkiv in Ukrainian) was the capital of the Ukrainian SSR. Bernstein became Professor at Kharkov University and was active in the Soviet reorganization of tertiary institutions as National Commissar for Education, when the All-Ukrainian Scientific Research Institute of Mathematical Sciences was set up in 1928.
In 1933 he was forced to move to Leningrad, where he worked at the Mathematical Institute of the Academy of Sciences. He and his wife were evacuated to Kazakhstan before Leningrad was blockaded by German armies from September, 1941 to January, 1943. From 1943 he worked at the Mathematical Institute in Moscow. Further detail may be found in Seneta (2001b).

^23 See History of Approximation Theory (HAT) at http://www.math.technion.ac.il/hat/papers.html.

Bernšteĭn (1964) is the 4th volume of the four volume collection of his mathematical papers. His continuing interest in the accuracy of the normal distribution as approximation to the binomial probabilities developed into a reexamination in a new light of the main theorems of probability, such as their extension to dependent summands. The idea of martingale differences appears in his work, which is perhaps best known for his extensions of the Central Limit Theorem to “weakly dependent random variables”. His was a continuing voice of reason in the face of Stalinist interference in mathematical and biological science. A fifth edition of his textbook never appeared. It was stopped when almost in press because of prevailing ideology.

J.V. Uspensky, the translator of the 4th part of Ars Conjectandi into Russian and the author of Uspensky (1937), brought the rigorous probabilistic Russian tradition to the English speaking world after moving to the United States. Yakov Viktorovich Uspensky (1883–1947) is described by Markov in his May, 1913, Foreword to the translation as “Privat-Docent” (roughly, Assistant Professor) of St. Petersburg University. His academic contact with Markov seems to have been through Markov’s other great field of interest, number theory. Uspensky’s magisterial degree at this university was conferred in 1910.
He wrote on quadratic forms and analytical methods in the additive theory of numbers. He was “Privat-Docent” 1912–1915, and Professor 1915–1923, and taught the to-be-famous Russian number theorist I.M. Vinogradov in the Petrograd incarnation of St. Petersburg. For his election to the Russian Academy of Sciences in 1921, he had been nominated by A.A. Markov, V.A. Steklov, and A.N. Krylov, and up to the time of elections in 1929 he was the only mathematician in the Academy (Bernoulli (1986), p. 73). After Markov’s death in 1922, it was Uspensky who wrote a precis of Markov’s academic activity in the Academy’s Izvestiia, 17 (1923) 19–34.

According to Royden (1988), p. 243, the “year 1929–1930 saw the appointment of James Victor Uspensky as an acting professor of mathematics” at Stanford University. He was professor of mathematics there from 1931 until his death. He appears to have anglicized his name and patronymic, Yakov Viktorovich, into James Victor, and it is under this name, or just as J.V. Uspensky, that he appears in his English-language writings. Royden (1988) writes that Uspensky had made a trip to the U.S. in the early 1920s. When he did decide to come permanently he came “in style on a Soviet ship with his passage paid for by the [Soviet] government”, which presumably was unaware of his intentions.

12. Extensions. Necessary and sufficient conditions

The expression (23) is the classical form of what is now called the WLLN. We have confined ourselves to sufficient conditions for (23) where $S_n = \sum_{i=1}^{n} X_i$ and the $\{X_i, i = 1, 2, \ldots\}$ are independent and not necessarily identically distributed. In particular, in the tradition of Jacob Bernoulli’s Theorem as limit theorem, we have focused on the case of “Bernoulli” summands where $P(X_i = 1) = p_i = 1 - P(X_i = 0)$.
Because of limitation of space we do not discuss the SLLN, and direct the reader to our historical account (Seneta (1992)), which begins with Borel (1909) and Cantelli (1917). Further historical aspects may be found in Fisz (1963), Chung (1968), Petrov (1995) and Krengel (1997). The SLLN under “Chebyshev’s conditions”: $\{X_k\}$, $k = 1, 2, \ldots$ pairwise independent, with variances well-defined and uniformly bounded, was already available in Rajchman (1932), but this source may have been inaccessible to both Bernstein and Uspensky, not only because of its language.

From the 1920s attention turned to necessary and sufficient conditions for the WLLN for independent summands. Kolmogorov in 1928 obtained the first such condition for “triangular arrays”, and there were generalizations by Feller in 1937 and Gnedenko in 1944 (see Gnedenko and Kolmogorov (1968), Section 22).

In Khinchin’s paper on the WLLN in Cantelli’s journal (Khintchine (1936)) attention turns to necessary and sufficient conditions for the existence of a sequence $\{d_n\}$ of positive numbers such that

$$\frac{S_n}{d_n} \xrightarrow{p} 1 \quad \text{as } n \to \infty \tag{29}$$

where the (i.i.d.) summands $X_i$ are non-negative.

Two new features in the consideration of limit theory for i.i.d. summands make their appearance in Khinchin’s several papers in Cantelli’s journal: a focus on the asymptotic structure of the tails of the distribution function, and the expression of this structure in terms of what was later realized to be regularly varying functions (Feller (1966), Seneta (1976)).

Putting $F(x) = P(X_i \le x)$ and $\nu(x) = \int_0^x (1 - F(u))\,du$, Khinchin’s necessary and sufficient condition for (29) is

$$\frac{x(1 - F(x))}{\nu(x)} \to 0 \quad \text{as } x \to \infty. \tag{30}$$

This is equivalent to $\nu(x)$ being a slowly varying function at infinity. In the event, $d_n$ can be taken as the unique solution of $n\nu(d_n) = d_n$.
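As a hedged numerical illustration of condition (30) and of the norming constants $d_n$, the Python sketch below uses a distribution chosen here purely for illustration; it does not appear in the sources cited. It takes $1 - F(x) = 1$ for $x < 1$ and $1 - F(x) = 1/x$ for $x \ge 1$, so that $\nu(x) = 1 + \log x$ for $x \ge 1$ and the mean is infinite. The ratio in (30) is then $1/(1 + \log x)$, which tends to 0, and $d_n$ is found by bisection from $n\nu(d_n) = d_n$. All function names are hypothetical.

```python
import math


def tail(x):
    """1 - F(x) for the illustrative law: 1 for x < 1, 1/x for x >= 1."""
    return 1.0 if x < 1.0 else 1.0 / x


def nu(x):
    """nu(x) = integral from 0 to x of (1 - F(u)) du, in closed form for this law."""
    return x if x < 1.0 else 1.0 + math.log(x)


def khinchin_ratio(x):
    """The quantity x (1 - F(x)) / nu(x) appearing in condition (30)."""
    return x * tail(x) / nu(x)


def norming_constant(n, rel_tol=1e-12):
    """Solve n * nu(d) = d for d > n by bisection (this sketch assumes n >= 2)."""
    if n < 2:
        raise ValueError("this sketch assumes n >= 2")
    lo, hi = float(n), 2.0 * n
    while n * nu(hi) > hi:  # enlarge the bracket until n*nu(hi) <= hi
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if n * nu(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for x in (10.0, 1e3, 1e6):
        print("x =", x, " ratio in (30):", khinchin_ratio(x))
    for n in (10, 1000, 10**6):
        d = norming_constant(n)
        print("n =", n, " d_n =", d, " d_n / (n log n) =", d / (n * math.log(n)))
```

For this particular law $d_n$ is of order $n \log n$, so the norming grows slightly faster than $n$, as one would expect when the mean is infinite.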
A detailed account is given by Csörgő and Simons (2008), Section 0. It is worth noting additionally that $\lim_{x \to \infty} \nu(x) = EX_i \le \infty$; and that if $x(1 - F(x)) = L(x)$, a slowly varying function at infinity (this includes the case of finite mean $\mu$ when $\nu(x) \to \mu$), then (30) is satisfied. (Feller (1966) Section VII.7, p. 233, Theorem 3; Seneta (1976), p. 87).

Csörgő and Simons (2008), motivated partly by the St. Petersburg problem,^24 consider, more generally than the WLLN sum structure, arbitrary linear combinations of i.i.d. nonnegative random variables. When specializing to sums $S_n$, however, they show in their Corollary 5 that

$$\frac{S_n}{n\nu(n)} \xrightarrow{p} 1 \tag{31}$$

if and only if

$$\frac{\nu(x\nu(x))}{\nu(x)} \to 1 \quad \text{as } x \to \infty. \tag{32}$$

They call (32) the Bojanić–Seneta condition.^25

^24 Associated with Nicolaus and Daniel Bernoulli.

Khinchin’s Theorem itself was generalized by Feller (see, for example, Feller (1966) Section VII.7) in the spirit of Khintchine (1936) for i.i.d. but not necessarily nonnegative summands.

Petrov (1995), Chapter 6, Theorem 4, gives necessary and sufficient conditions for the existence of a sequence of constants $\{b_n\}$ such that $S_n/a_n - b_n \to 0$ for any given sequence of positive constants $\{a_n\}$ such that $a_n \to \infty$, where the independent summands $X_i$ are not necessarily identically distributed.

To conclude this very brief sketch, we draw the reader’s attention to a little-known necessary and sufficient condition for (23) to hold, for arbitrarily dependent not necessarily identically distributed random variables (see, for example, Gnedenko (1963)).

Acknowledgements

I am indebted to Bernard Bru and Steve Stigler for information, to an anonymous Russian probabilist for suggesting emphasis on “Markov’s Theorem” and the contributions of S.N. Bernstein, and to the editors Richard Davis and Thomas Mikosch for careful reading.
^25 It originates from Bojanić and Seneta (1971).

References

Armatte, M. (2006). Les statisticiens du XIXe siècle lecteurs de Jacques Bernoulli. J. Électron. Hist. Probab. Stat. 2 21. MR2393228
Bellhouse, D.R. (2011). Abraham De Moivre: Setting the Stage for Classical Probability and its Applications. Boca Raton, FL: CRC Press. MR2850003
Bernoulli, J. (1713). Ars Conjectandi. Basileae: Thurnisiorum.
Bernoulli, J. (1986). O Zakone Bolshikh Chisel. [On the Law of Large Numbers.] (Yu. V. Prokhorov, ed.). Moskva: Nauka.
Bernoulli, J. (2005). On the law of large numbers. [Translation by O.B. Sheynin into English of the Pars Quarta of Ars Conjectandi.] Available at www.sheynin.de/download/bernoulli.pdf.
Bernstein, S.N. (1911). Sur le calcul approché par la formule de Laplace. Comm. Soc. Math. Kharkov, 2 Séries 12 106–110. Also in Bernstein (1964), pp. 5–9, in Russian translation.
Bernstein, S.N. (1924). On a variant of Chebyshev’s inequality and on the accuracy of Laplace’s formula. (In Russian.) Uchenie Zapiski Nauchno-Issledovatel’nikh Kafedr Ukrainy, Otdelenie Matematiki, No. 1, 38–48. [Also in Bernstein (1964), pp. 71–79.]
Bernstein, S.N. (1927). Teoriia Veroiatnostei. [The Theory of Probabilities.] Moskva–Leningrad: Gosudarstvennoe Izdatel’stvo.
Bernstein, S.N. (1934). Teoriia Veroiatnostei. [The Theory of Probabilities.] Moskva–Leningrad: Gosudarstvennoe Tekhniko-Teoreticheskoe Izdatel’stvo. [2nd augmented ed. The 3rd ed., of the same year, is identical. The 4th ed., augmented, appeared in 1946.]
Bernstein, S.N. (1945). On the works of P.L. Chebyshev on probability theory. (In Russian.) Nauchnoe Nasledie P.L. Chebysheva. [The Scientific Legacy of P.L. Chebyshev.] Vol. 1 Matematika. Moskva-Leningrad: Akad. Nauk SSSR. pp. 43–48. [Also in Bernstein (1964), pp. 409–433.]
Bernšteĭn, S.N. (1964). Sobranie Sochinenii.
Tom IV: Teoriya Veroyatnostei. Matematicheskaya Statistika. 1911–1946. Moskva: Nauka. MR0169758
Bertrand, J. (1907). Calcul des probabilités. Paris: Gauthier-Villars. [2nd ed. Reprinted 1972, New York: Chelsea. (1st ed.: 1889)].
Bienaymé, I.J. (1853). Considerations à l’appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés. C.R. Acad. Sci., Paris 37 309–324. Also as (1867) Liouville’s Journal de Mathématiques Pures et Appliquées, (2) 12 158–176.
Bojanić, R. and Seneta, E. (1971). Slowly varying functions and asymptotic relations. J. Math. Anal. Appl. 34 302–315. MR0274676
Boole, G. (1854). An Investigation Into the Laws of Thought on Which Are Founded the Mathematical Theories of Logic and Probabilities. London: Walton and Maberly. Reprinted. New York: Dover.
Borel, E. (1909). Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo 27 247–271.
Bortkiewicz, L. (1898). Das Gesetz der kleinen Zahlen. Leipzig: G. Teubner.
Boudin, E.J. (1916). Leçons de Calcul des Probabilités. Paris: Gauthier-Villars.
Bru, M.F., Bru, B. and Eid, S. (2012). Une approche analytique de la Théorie analytique, Hermann Laurent (1873). Journal Electronique d’Histoire des Probabilités et de la Statistique 1. Available at www.jehps.net.
Cantelli, F.P. (1917). Sulla probabilità come limite della frequenza. Rendiconti della R. Accademia dei Lincei. Classe di Scienze Fisiche Matematiche e Naturali, Série 5a 26 39–45. Also in Cantelli (1958).
Cantelli, F.P. (1958). Alcune memorie matematiche. Onoranze Francesco Paolo Cantelli. Milano: Dott. A. Giuffrè-Editore.
Chebyshev, P.L. (1845). Op’it Elementarnogo Analiza Teorii Veroiatnostei. [An Experimental Elementary Analysis of Probability Theory.] In Chebyshev (1955) 111–189.
Chebyshev, P.L. (1846). Démonstration élémentaire d’une proposition générale de la théorie des probabilités. Crelle’s. J. Reine Angew. Mathematik 33 259–267. Also in Markoff and Sonin, 1962, Tome I, pp. 17–26.
Chebyshev, P.L. (1867). Des valeurs moyennes. Liouville’s. Journal de Mathématiques Pures et Appliquées 12 177–184. [Also in Markoff and Sonin, 1962, Tome II, pp. 687–694.]
Chebyshev, P.L. (1874). Sur les valeurs limites des intégrales. Liouville’s. Journal de Mathématiques Pures et Appliquées 19 157–160. [Also in Markoff and Sonin, 1962, Tome II, pp. 183–185.]
Chebyshev, P.L. (1887). Sur deux théorèmes relatifs aux probabilités. Acta Math. 14 305–315. [Also in Chebyshev (1955) and Markoff and Sonin (1962).]
Chebyshev, P.L. (1955). Izbrannye Trudy. Moscow: Akad. Nauk SSSR. MR0067792
Chung, K.L. (1968). A Course in Probability Theory. New York: Harcourt, Brace and World. MR0229268
Chuprov, A.A. (1909). Ocherki po Teorii Statistiki. [Essays on the Theory of Statistics.] Sankt Peterburg. [2nd edition of 1910 was reprinted in 1959, by Gosstatizdat: Moskva.]
Condorcet, M.J.A.N.C.M. (1785). Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. Paris: Imprimerie Royale.
Condorcet, M.J.A.N.C.M. (1994). Arithmétique politique. Textes rares ou inédites (B. Bru and P. Crépel, eds.). Paris: L’Institut National D’Études Démographiques.
Cook Wilson, J. (1901). Probability – James Bernoulli’s Theorem. Nature 1637 465–466.
Cournot, A.A. (1843). Exposition de la théorie des chances et des probabilités. Paris: Hachette. [Reprinted 1984, ed. B. Bru. Paris: J. Vrin.]
Crofton, M.W. (1885). Probability. Encyclopaedia Britannica, 9th ed. 19 768–788.
Csörgő, S. (2001). Nicolaus Bernoulli. In C.C. Heyde and E. Seneta, eds. (2001), pp. 55–63.
Csörgő, S. and Simons, G. (2008). Weak laws of large numbers for cooperative gamblers. Period. Math. Hungar. 57 31–60. MR2448396
De La Vallée-Poussin, C.J. (1907). Démonstration nouvelle du théorème de Bernoulli. Ann. Soc. Scient. Bruxelles 31 219–236. Also in De La Vallée-Poussin (2001).
De La Vallée-Poussin, C.J. (2001). Collected Works/Oeuvres Scientifiques. Vol. II. Brussels: Académie Royale de Belgique. Edited by Paul Butzer, Jean Mawhin and Pasquale Vetro. MR1929849
De Moivre, A. (1718). The Doctrine of Chances: A Method of Calculating the Probability of Events in Play. London: W. Pearson.
De Moivre, A. (1725). Annuities Upon Lives. London: W. Pearson. [2nd ed. 1743: London: Woodfall].
De Moivre, A. (1730). Miscellanea Analytica de Seriebus et Quadraturis. London: J. Tonson and J. Watts.
De Moivre, A. (1733). Approximatio ad Summam Terminorum Binomii $\overline{a + b}|^{n}$ in Seriem expansi. [Privately printed.]
De Moivre, A. (1738). The Doctrine of Chances, 2nd ed. London: Woodfall. [Reprinted 1967, New York: Chelsea.]
De Morgan, A. (1837). Theory of probabilities. In Encyclopaedia Metropolitana 2 393–490. London: Baldwin and Cradock.
De Morgan, A. (1838). An Essay on Probabilities and on Their Application to Life Contingencies and Insurance Offices. London: Longmans.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications 2. New York: Wiley. [2nd ed. 1971, New York: Wiley].
Fisz, M. (1963). Probability Theory and Mathematical Statistics, 3rd ed., New York: Wiley. [2nd. ed. 1958, as Rachunek Prawdopodobieństwa i Statystyka Matematyczna. Warszawa: Państwowe Wydawnictwo Naukowe.] MR0115193
Gnedenko, B.V. (1963). The Theory of Probability. Transl. from 2nd Russian Edition by B.D. Seckler. New York: Chelsea.
Gnedenko, B.V. and Kolmogorov, A.N. (1968).
Limit Theorems for Sums of Independent Random Variables. (Transl. and ed. by K.L. Chung). Reading, MA: Addison-Wesley.
Hald, A. (2007). A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713–1935. Sources and Studies in the History of Mathematics and Physical Sciences. New York: Springer. MR2284212
Heyde, C.C. and Seneta, E. (1977). I. J. Bienaymé. Statistical Theory Anticipated. New York: Springer. Studies in the History of Mathematics and Physical Sciences, No. 3. MR0462888
Heyde, C.C. and Seneta, E., eds. (2001). Statisticians of the Centuries. New York: Springer. MR1863206
Khintchine, A.Y. (1929). Sur la loi des grands nombres. C.R. Acad. Sci., Paris 188 477–479.
Khintchine, A.Y. (1936). Su una legge dei grandi numeri generalizzata. Giorn. Ist. Ital. Attuari 7 365–377.
Krengel, U. (1997). History of probability and statistics at the International Congresses of Mathematicians. Bernoulli News 4 20–28.
Lacroix, S.F. (1816). Traité élémentaire du calcul des probabilités. Paris: Courcier. [2nd ed. Paris: Bachelier.]
Laplace, P.S. (1814). Théorie analytique des probabilités. 2nd édition, revue et augmentée par l’auteur. Paris: Courcier.
Laplace, P.S. (1986). Memoir on the Probability of Causes of Events. Statistical Science 1 364–378. (Translation from the French by S.M. Stigler of: Laplace, P.S. (1774) Mémoire sur la probabilité des causes par les événemens.)
Laurent, H. (1873). Traité du calcul des probabilités. Paris: Gauthier-Villars.
Lubbock, J.W. (1830). On the calculation of annuities, and on some questions in the theory of chances. Transactions of the Cambridge Philosophical Society 3 141–155.
Lubbock, J.W. and Drinkwater-Bethune, J.E. (1830). On Probability. Library of Useful Knowledge. London: Baldwin and Cradock.
Markoff, A. and Sonin, N., eds. (1962).
Oeuvres de P.L. Tchebychef, Tomes I et II. New York: Chelsea. [Reprint from St. Petersburg 2 vol. edition, 1899–1907.]
Markov, A.A. (1898). The law of large numbers and the method of least squares. (In Russian.) Izvestiia Fiz. Mat. Obschestva Kazan Univ., 2nd. Ser., 8 110–128. [Also in Markov (1951), pp. 231–251.]
Markov, A.A. (1899). Application of continued fractions to the calculation of probabilities. (In Russian.) Izvestiia Fiz. Mat. Obschestva Kazan Univ., 2nd. Ser. 9 29–34.
Markov, A.A. (1906). Generalization of the law of large numbers to dependent quantities. (In Russian.) Izvestiia Fiz. Mat. Obschestva Kazan Univ., 2nd. Ser. 15 135–156. [Also in Markov (1951), pp. 339–361.]
Markov, A.A. (1913). Ischislenie Veroiatnostei. [The Calculus of Probabilities.] 3rd ed. Sankt Peterburg: Tipografia Imperatorskoi Akademii Nauk.
Markov, A.A. (1924). Ischislenie Veroiatnostei. [The Calculus of Probabilities.] 4th (posthumous) ed. Moskva: Gosizdat. [1st ed.: 1900, 2nd ed.: 1908, 3rd ed.: 1913.]
Markov, A.A. (1951). Izbrannye Trudy. Teoriya Čisel. Teoriya Veroyatnosteĭ. Leningrad: Izdat. Akad. Nauk SSSR. MR0050525
Montmort, P.R. (1713). Essay d’analyse sur les jeux de hasard. 2nd ed. Paris: Jacque Quillau. [Reprinted 1980, New York: Chelsea.]
Nekrasov, P.A. (1898a). General properties of numerous independent events in connection with approximative calculation of functions of very large numbers. (In Russian.) Matematicheskii Sbornik 20 431–442.
Nekrasov, P.A. (1898b). Limits for the error in approximate expressions for the probability P in Jacob Bernoulli’s theorem. (In Russian.) Matematicheskii Sbornik 20 485–534, 535–548.
Nekrasov, P.A. (1902). Filosofia i Logika Nauki o Massovikh Proiavleniakh Chelovecheskoi Deiatelnosti (Peresmotr osnovanii sotsialnoi fiziki Ketle). [The Philosophy and Logic of the Study of Mass Phenomena of Human Activity.
(A review of the foundations of the social physics of Quetelet.)] Moskva: Universitetskaia tipografia. [Also in Matematicheskii Sbornik 23 436–604.]
Ondar, K.O. (1981). The Correspondence Between A.A. Markov and A.A. Chuprov on the Theory of Probability and Mathematical Statistics. New York: Springer. (Translated by C. and M. Stein.)
Pearson, K. (1925). James Bernoulli’s theorem. Biometrika 17 201–210.
Petrov, V.V. (1995). Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Studies in Probability 4. Oxford: Clarendon Press.
Poisson, S.D. (1837). Recherches sur la probabilité des jugemens en matière criminelle et en matière civile, précédés des règles générales du calcul des probabilités. Paris: Bachelier.
Prokhorov, Y.V. (1986). The Law of Large Numbers and estimates of probability of large deviations. (In Russian.) In: Bernoulli (1986), pp. 116–150.
Quine, M.P. and Seneta, E. (1987). Bortkiewicz’s data and the Law of Small Numbers. Internat. Statist. Rev. 55 173–181. MR0963338
Rajchman, A. (1932). Zaostrzone prawo wielkich liczb. [A sharpened law of large numbers.] Mathesis Polska 6 145–161.
Rice, A. and Seneta, E. (2005). De Morgan in the prehistory of statistical hypothesis testing. J. Roy. Statist. Soc. Ser. A 168 615–627. MR2146412
Royden, H. (1988). The history of the Mathematics Department at Stanford. In A Century of Mathematics in America (P.L. Duren, R. Askey and U.C. Merzbach, eds.). Providence, RI: Amer. Math. Soc.
Schneider, I. (1968). Der Mathematiker Abraham de Moivre (1667–1754). Arch. History Exact Sci. 5 177–317. MR1554118
Schneider, I. (2006). Direct and indirect influences of Jakob Bernoulli’s Ars conjectandi in 18th century Great Britain. J. Électron. Hist. Probab. Stat. 2 17. MR2393219
Seneta, E. (1976). Regularly Varying Functions.
Lecture Notes in Math. 508. Berlin: Springer. MR0453936
Seneta, E. (1984). The central limit problem and linear least squares in pre-revolutionary Russia: The background. Math. Sci. 9 37–77. MR0750248
Seneta, E. (1992). On the history of the strong law of large numbers and Boole’s inequality. Historia Math. 19 24–39. MR1150880
Seneta, E. (1996). Markov and the birth of chain dependence theory. International Statistical Review 64 255–263.
Seneta, E. (2001a). Commentary on De la Vallée Poussin’s papers on Bernoulli’s Theorem in probability theory. In De La Vallée-Poussin (2001).
Seneta, E. (2001b). Sergei Natanovich Bernstein. In Heyde and Seneta (2001) 339–342.
Seneta, E. (2003). Statistical regularity and free will: L.A.J. Quetelet and P.A. Nekrasov. International Statistical Review 71 319–334.
Seneta, E. (2012). Victorian probability and Lewis Carroll. J. Roy. Statist. Soc. Ser. A 175 435–451. MR2905047
Sleshinsky, I.V. (1892). On the theory of the method of least squares. (In Russian.) Zapiski Mat. Otdelenia Novorossiskago Obschestva Estestvoispitatelei (Odessa) 14 201–264.
Stigler, S.M. (1986). The History of Statistics. The Measurement of Uncertainty Before 1900. Cambridge, MA: The Belknap Press of Harvard Univ. Press. MR0852410
Todhunter, I. (1865). A History of the Mathematical Theory of Probability. London: Macmillan. [Reprinted 1949, 1965, New York: Chelsea.]
Uspensky, J.V. (1937). Introduction to Mathematical Probability. New York: McGraw-Hill.
Vassilief, A.V. (1914). La bicentenaire de la loi des grands nombres. Enseignement Mathématique 16 92–100.
