arXiv:0808.4032v1 [stat.ME] 29 Aug 2008
Statistical Science
2008, Vol. 23, No. 2, 261–271
DOI: 10.1214/08-STS256
© Institute of Mathematical Statistics, 2008
Karl Pearson’s Theoretical Errors and the
Advances They Inspired
Stephen M. Stigler
Abstract. Karl Pearson played an enormous role in determining the content and organization of statistical research in his day, through his research, his teaching, his establishment of laboratories, and his initiation of a vast publishing program. His technical contributions had initially and continue today to have a profound impact upon the work of both applied and theoretical statisticians, partly through their inadequately acknowledged influence upon Ronald A. Fisher. Particular attention is drawn to two of Pearson’s major errors that nonetheless have left a positive and lasting impression upon the statistical world.

Key words and phrases: Karl Pearson, R. A. Fisher, Chi-square test, degrees of freedom, parametric inference, history of statistics.
1. INTRODUCTION
Karl Pearson surely ranks among the more productive and intellectually energetic scholars in history. He cannot match the most prolific humanists, such as one of whom it has been said, “he had no unpublished thought,” but in the domain of quantitative science Pearson has no serious rival. Even the immensely prolific Leonhard Euler, whose collected works are still being published more than two centuries after his death, falls short of Pearson in sheer volume. A list of Pearson’s works fills a hardbound book; that book lists 648 works and is still incomplete (Morant, 1939). My own moderate collection of his works—itself very far from complete (it omits his contributions to Biometrika)—occupies 5 feet of shelf space. And his were not casually constructed works: when a student or a new co-worker would do the laborious calculations for some statistical analysis, Pearson would redo the work to greater accuracy, as a check. An American visiting Pearson in the early 1930s once asked him how he found the time to write so much and compute so much. Pearson replied, “You Americans would not understand, but I never answer a telephone or attend a committee meeting” (Stouffer, 1958).

Stephen M. Stigler is the Ernest DeWitt Burton Distinguished Service Professor in the Department of Statistics, University of Chicago, 5734 University Avenue, Chicago, Illinois 60637, USA (e-mail: stigler@uchicago.edu). This paper is based upon a talk presented at the Royal Statistical Society in March 2007, at a symposium celebrating the 150th anniversary of Karl Pearson’s birth.
Pearson’s accomplishments were not merely voluminous; they could be luminously enlightening as well. Today the most famous of these are Pearson’s Product Moment Correlation Coefficient and the Chi-square test, dating respectively from 1896 and 1900 (Pearson, 1896, 1900a, 1900b). He was a driving force behind the founding of Biometrika, which he edited for 36 years and made into the first important journal in mathematical statistics. He also established another journal (the Annals of Eugenics) and several additional serial publications, two research laboratories, and a school of statistical thought. Pearson pioneered in the use of machine calculation, and he supervised the calculation of a series of mathematical tables that influenced statistical practice for decades. He made other discoveries, less commonly associated with his name. He was in 1897 the first to name the phenomenon of “spurious correlation,” thus publicly identifying a powerful idea that made
him and countless descendants more aware of the pitfalls expected in any serious statistical investigation of society (Pearson, 1897). And in a series of investigations of craniometry he introduced the idea of landmarks to the statistical study of shapes.
Pearson was at one time well known for the Pearson Family of Frequency Curves. That family is seldom referred to today, but there is a small fact (really a striking discovery) he found in its early development that I would call attention to. When we think of the normal approximation to the binomial, we usually think in terms of large samples. Pearson discovered that there is a sense in which the two distributions agree exactly for even the smallest number of trials. It is well known that the normal density is characterized by the differential equation
$$\frac{d}{dx}\log f(x) = \frac{f'(x)}{f(x)} = -\frac{x-\mu}{\sigma^2}.$$
Pearson discovered that p(k), the probability function for the symmetric binomial distribution (n independent trials, p = 0.5 each trial), satisfies the analogous difference equation exactly:

$$\frac{p(k+1)-p(k)}{\bigl(p(k+1)+p(k)\bigr)/2} = -\frac{(k+\tfrac{1}{2})-n/2}{(n+1)\cdot\tfrac{1}{2}\cdot\tfrac{1}{2}}$$
or
rate of change p(k)
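Pearson’s exact agreement is easy to confirm numerically. The sketch below is a check written for this discussion, not part of the original paper (the function name is invented); it compares the two sides of the difference equation for every k in a symmetric binomial with n trials:

```python
from math import comb

def pearson_difference_check(n):
    """Check that the symmetric binomial p(k) = C(n, k) / 2^n satisfies
    Pearson's difference equation (up to floating-point rounding)."""
    # Probability function of the symmetric binomial, p = 0.5.
    p = [comb(n, k) / 2 ** n for k in range(n + 1)]
    for k in range(n):
        # Left side: rate of change of p, relative to the average of
        # the two adjacent values (a discrete logarithmic derivative).
        lhs = (p[k + 1] - p[k]) / ((p[k + 1] + p[k]) / 2)
        # Right side: the linear term analogous to -(x - mu) / sigma^2,
        # evaluated at the midpoint k + 1/2, with mean n/2.
        rhs = -((k + 0.5) - n / 2) / ((n + 1) * 0.5 * 0.5)
        if abs(lhs - rhs) > 1e-12:
            return False
    return True
```

Running `pearson_difference_check(n)` returns True even for n = 1 or n = 2, illustrating that the agreement holds for the smallest numbers of trials, not merely in a large-sample limit.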