Foundations of Descriptive and Inferential Statistics


Authors: Henk van Elst (parcIT GmbH, Köln, Germany)

FOUNDATIONS OF DESCRIPTIVE AND INFERENTIAL STATISTICS

Lecture notes for a quantitative–methodological module at the Bachelor degree (B.Sc.) level

HENK VAN ELST

August 30, 2019

parcIT GmbH, Erftstraße 15, 50672 Köln, Germany
E-Mail: Henk.van.Elst@parcIT.de
E-Print: arXiv:1302.2525v4 [stat.AP]
© 2008–2019 Henk van Elst

Abstract

These lecture notes were written with the aim to provide an accessible though technically solid introduction to the logic of systematic analyses of statistical data to both undergraduate and postgraduate students, in particular in the Social Sciences, Economics, and the Financial Services. They may also serve as a general reference for the application of quantitative–empirical research methods. In an attempt to encourage the adoption of an interdisciplinary perspective on quantitative problems arising in practice, the notes cover the four broad topics (i) descriptive statistical processing of raw data, (ii) elementary probability theory, (iii) the operationalisation of one-dimensional latent statistical variables according to Likert's widely used scaling approach, and (iv) null hypothesis significance testing within the frequentist approach to probability theory concerning (a) distributional differences of variables between subgroups of a target population, and (b) statistical associations between two variables. The relevance of effect sizes for making inferences is emphasised. These lecture notes are fully hyperlinked, thus providing a direct route to original scientific papers as well as to interesting biographical information. They also list many commands for running statistical functions and data analysis routines in the software packages R, SPSS, EXCEL and OpenOffice. The immediate involvement in actual data analysis practices is strongly recommended.

Cite as: arXiv:1302.2525v4 [stat.AP]

These lecture notes were typeset in LaTeX 2ε.

Contents

Abstract
Introductory remarks
1 Statistical variables
  1.1 Scale levels of measurement
  1.2 Raw data sets and data matrices
2 Univariate frequency distributions
  2.1 Absolute and relative frequencies
  2.2 Empirical cumulative distribution function (discrete data)
  2.3 Empirical cumulative distribution function (continuous data)
3 Measures for univariate distributions
  3.1 Measures of central tendency
    3.1.1 Mode
    3.1.2 Median
    3.1.3 α–Quantile
    3.1.4 Five number summary
    3.1.5 Sample mean
    3.1.6 Weighted mean
  3.2 Measures of variability
    3.2.1 Range
    3.2.2 Interquartile range
    3.2.3 Sample variance
    3.2.4 Sample standard deviation
    3.2.5 Sample coefficient of variation
    3.2.6 Standardisation
  3.3 Measures of relative distortion
    3.3.1 Skewness
    3.3.2 Excess kurtosis
  3.4 Measures of concentration
    3.4.1 Lorenz curve
    3.4.2 Normalised Gini coefficient
4 Measures of association for bivariate distributions
  4.1 (k × l) contingency tables
  4.2 Measures of association for the metrical scale level
    4.2.1 Sample covariance
    4.2.2 Bravais and Pearson's sample correlation coefficient
  4.3 Measures of association for the ordinal scale level
  4.4 Measures of association for the nominal scale level
5 Descriptive linear regression analysis
  5.1 Method of least squares
  5.2 Empirical regression line
  5.3 Coefficient of determination
6 Elements of probability theory
  6.1 Random events
  6.2 Kolmogorov's axioms of probability theory
  6.3 Laplacian random experiments
  6.4 Combinatorics
    6.4.1 Permutations
    6.4.2 Combinations and variations
  6.5 Conditional probabilities
    6.5.1 Law of total probability
    6.5.2 Bayes' theorem
7 Discrete and continuous random variables
  7.1 Discrete random variables
  7.2 Continuous random variables
  7.3 Skewness and excess kurtosis
  7.4 Lorenz curve for continuous random variables
  7.5 Linear transformations of random variables
    7.5.1 Effect on expectation values
    7.5.2 Effect on variances
    7.5.3 Standardisation
  7.6 Sums of random variables and reproductivity
  7.7 Two-dimensional random variables
    7.7.1 Joint probability distributions
    7.7.2 Marginal and conditional distributions
    7.7.3 Bayes' theorem for two-dimensional random variables
    7.7.4 Covariance and correlation
8 Standard univariate probability distributions
  8.1 Discrete uniform distribution
  8.2 Binomial distribution
    8.2.1 Bernoulli distribution
    8.2.2 General binomial distribution
  8.3 Hypergeometric distribution
  8.4 Poisson distribution
  8.5 Continuous uniform distribution
  8.6 Gaußian normal distribution
  8.7 χ²–distribution
  8.8 t–distribution
  8.9 F–distribution
  8.10 Pareto distribution
  8.11 Exponential distribution
  8.12 Logistic distribution
  8.13 Special hyperbolic distribution
  8.14 Cauchy distribution
  8.15 Central limit theorem
9 Likert's scaling method of summated item ratings
10 Random sampling of target populations
  10.1 Random sampling methods
    10.1.1 Simple random sampling
    10.1.2 Stratified random sampling
    10.1.3 Cluster random sampling
  10.2 Point estimator functions
11 Null hypothesis significance testing
  11.1 General procedure
  11.2 Definition of a p–value
12 Univariate methods of statistical data analysis
  12.1 Confidence intervals
    12.1.1 Confidence intervals for a mean
    12.1.2 Confidence intervals for a variance
  12.2 One-sample χ²–goodness–of–fit–test
  12.3 One-sample t– and Z–tests for a population mean
  12.4 One-sample χ²–test for a population variance
  12.5 Independent samples t–test for a mean
  12.6 Independent samples Mann–Whitney–U–test
  12.7 Independent samples F–test for a variance
  12.8 Dependent samples t–test for a mean
  12.9 Dependent samples Wilcoxon–test
  12.10 χ²–test for homogeneity
  12.11 One-way analysis of variance (ANOVA)
  12.12 Kruskal–Wallis–test
13 Bivariate methods of statistical data analysis
  13.1 Correlation analysis and linear regression
    13.1.1 t–test for a correlation
    13.1.2 F–test of a regression model
    13.1.3 t–test for the regression coefficients
  13.2 Rank correlation analysis
  13.3 χ²–test for independence
Outlook
A Simple principal component analysis
B Distance measures in Statistics
C List of online survey tools
D Glossary of technical terms (GB – D)
Bibliography

Introductory remarks

Statistical methods of data analysis form the cornerstone of quantitative–empirical research in the Social Sciences, Humanities, and Economics. Historically, the bulk of knowledge available in Statistics emerged in the context of the analysis of (nowadays large) data sets from observational and experimental measurements in the Natural Sciences. The purpose of the present lecture notes is to provide their readers with a solid and thorough, though accessible, introduction to the basic concepts of Descriptive and Inferential Statistics. When discussing methods relating to the latter subject, we will here take the perspective of the frequentist approach to Probability Theory. (See Ref. [19] for a methodologically different approach.) The concepts to be introduced and the topics to be covered have been selected in order to make available a fairly self-contained basic statistical tool kit for thorough analysis at the univariate and bivariate levels of complexity of data gained by means of opinion polls, surveys or observation.

In the Social Sciences, Humanities, and Economics there are two broad families of empirical research tools available for studying behavioural features of and mutual interactions between human individuals on the one hand, and the social systems and organisations that these form on the other. Qualitative–empirical methods focus their view on the individual with the aim to account for her/his/its particular characteristic features, thus probing the "small-scale structure" of a social system, while quantitative–empirical methods strive to recognise patterns and regularities that pertain to a large number of individuals and so hope to gain insight on the "large-scale structure" of a social system.
Both approaches are strongly committed to pursuing the principles of the scientific method. These entail the systematic observation and measurement of phenomena of interest on the basis of well-defined statistical variables, the structured analysis of data so generated, the attempt to provide compelling theoretical explanations for effects for which there exists conclusive evidence in the data, the derivation from the data of predictions which can be tested empirically, and the publication of all relevant data and the analytical and interpretational tools developed and used, so that the pivotal replicability of a researcher's findings and associated conclusions is ensured. By complying with these principles, the body of scientific knowledge available in any field of research and its practical applications undergoes a continuing process of updating and expansion.

Having thoroughly worked through these lecture notes, a reader should have obtained a good understanding of the use and efficiency of descriptive and frequentist inferential statistical methods for handling quantitative issues, as they often arise in a manager's everyday business life. Likewise, a reader should feel well-prepared for a smooth entry into any Master degree programme in the Social Sciences or Economics which puts emphasis on quantitative–empirical methods.

Following a standard pedagogical concept, these lecture notes are split into three main parts: Part I, comprising Chapters 1 to 5, covers the basic considerations of Descriptive Statistics; Part II, which consists of Chapters 6 to 8, introduces the foundations of Probability Theory. Finally, the material of Part III, provided in Chapters 9 to 13, first reviews a widespread method for operationalising latent statistical variables, and then introduces a number of standard uni- and bivariate analytical tools of Inferential Statistics within the frequentist framework that prove valuable in applications. As such, the contents of Part III are the most important ones for quantitative–empirical research work. Useful mathematical tools and further material have been gathered in appendices.

Recommended introductory textbooks, which may be used for study in parallel to these lecture notes, are Levin et al (2010) [61], Hatzinger and Nagel (2013) [37], Weinberg and Abramowitz (2008) [115], Wewel (2014) [116], Toutenburg (2005) [108], or Duller (2007) [16].

These lecture notes do not include any explicit exercises on the topics discussed; these are reserved for lectures given throughout term time.

The present lecture notes are designed to be dynamic in character. On the one hand, this means that they will be updated on a regular basis. On the other, the *.pdf version of the notes contains interactive features such as fully hyperlinked references to original publications at the websites doi.org and jstor.org, as well as many active links to biographical information on scientists that have been influential in the historical development of Probability Theory and Statistics, hosted by the websites The MacTutor History of Mathematics archive (www-history.mcs.st-and.ac.uk) and en.wikipedia.org.
Throughout these lecture notes references have been provided to respective descriptive and inferential statistical functions and routines that are available in the excellent and widespread statistical software package R, on a standard graphic display calculator (GDC), and in the statistical software packages EXCEL, OpenOffice and SPSS (Statistical Package for the Social Sciences). R and its exhaustive documentation are distributed by the R Core Team (2019) [85] via the website cran.r-project.org. R has also been employed for generating all the figures contained in these lecture notes. Useful and easily accessible textbooks on the application of R for statistical data analysis are, e.g., Dalgaard (2008) [15], or Hatzinger et al (2014) [38]. Further helpful information and assistance is available from the website www.r-tutor.com. For active statistical data analysis with R, we strongly recommend the use of the convenient custom-made work environment RStudio, provided free of charge at www.rstudio.com. Another user-friendly statistical software package is GNU PSPP, which is available as shareware from www.gnu.org/software/pspp/.

A few examples from the inbuilt R data sets package have been referred to in these lecture notes in the context of the visualisation of distributional features of statistical data. Further information on these data sets can be obtained by typing library(help = "datasets") at the R prompt.

Lastly, we hope the reader will discover something useful and/or enjoyable for her-/himself when working through these lecture notes. Constructive criticism is always welcome.

Acknowledgments: I am grateful to Kai Holschuh, Eva Kunz and Diane Wilcox for valuable comments on an earlier draft of these lecture notes, to Isabel Passin for being a critical sparring partner in evaluating pedagogical considerations concerning co-created accompanying lectures, and to Michael Rüger for compiling an initial list of online survey tools for the Social Sciences.

Chapter 1  Statistical variables

A central task of an empirical scientific discipline is the observation or measurement of a finite set of characteristic variable features of a given system of objects chosen for study. The hope is to be able to recognise, in a sea of data typically guided by randomness, meaningful patterns and regularities that provide evidence for possible associations, or, stronger still, causal relationships between these variable features. Based on a combination of inductive and deductive methods of data analysis, one aims at gaining insights of a qualitative and/or quantitative nature into the intricate and often complex interdependencies of such variable features for the purpose of (i) obtaining explanations for phenomena that have been observed, and (ii) making predictions which, subsequently, can be tested. The acceptance of the validity of a particular empirical scientific framework generally increases with the number of successful replications of its predictions.¹ It is the interplay of observation, experimentation and theoretical modelling, systematically coupled to one another by a number of feedback loops, which gives rise to progress in learning and understanding in all empirical scientific activities. This procedure, which focuses on replicable facts, is referred to as the scientific method.

¹ A particularly sceptical view on the ability of making reliable predictions in certain empirical scientific disciplines is voiced in Taleb (2007) [105, pp 135–211].
More specifically, the general intention of empirical scientific activities is to modify or strengthen the theoretical foundations of an empirical scientific discipline by means of observational and/or experimental testing of sets of hypotheses; see Ch. 11. This is generally achieved by employing the quantitative–empirical techniques that have been developed in Statistics, in particular in the course of the 20th Century. At the heart of these techniques is the concept of a statistical variable $X$ as an entity which represents a single common aspect of the system of objects selected for analysis — the target population $\Omega$ of a statistical investigation. In the ideal case, a variable entertains a one-to-one correspondence with an observable, and thus is directly amenable to measurement. In the Social Sciences, Humanities, and Economics, however, one needs to carefully distinguish between manifest variables corresponding to observables on the one hand, and latent variables representing in general unobservable "social constructs" on the other. It is the latter kind of variable which is commonplace in the fields mentioned. Hence, it becomes an unavoidable task to thoroughly address the issue of a reliable, valid and objective operationalisation of any given latent variable one has identified as providing essential information on the objects under investigation. A standard approach to dealing with the important matter of rendering latent variables measurable is reviewed in Ch. 9.

In Statistics, it has proven useful to classify variables on the basis of their intrinsic information content into one of three hierarchically ordered categories, referred to as the scale levels of measurement; cf. Stevens (1946) [98]. We provide the definition of these scale levels next.

1.1 Scale levels of measurement

Def.: Let $X$ be a one-dimensional statistical variable with $k \in \mathbb{N}$ (countably many) resp. $k \in \mathbb{R}$ (uncountably many) possible values, attributes, or categories $a_j$ ($j = 1, \dots, k$). Statistical variables are classified as belonging to one of three hierarchically ordered scale levels of measurement. This is done on the basis of three criteria for distinguishing information contained in the values of actual data for these variables. One thus defines:

• Metrically scaled variables $X$ (quantitative/numerical)
Possible values can be distinguished by (i) their names, $a_i \neq a_j$, (ii) they allow for a natural rank order, $a_i < a_j$, and (iii) distances between them, $a_i - a_j$, are uniquely determined.
  – Ratio scale: $X$ has an absolute zero point and otherwise only non-negative values; analysis of both differences $a_i - a_j$ and ratios $a_i/a_j$ is meaningful. Examples: body height, monthly net income, . . . .
  – Interval scale: $X$ has no absolute zero point; only differences $a_i - a_j$ are meaningful. Examples: year of birth, temperature in centigrades, Likert scales (cf. Ch. 9), . . . .
Note that the values obtained for a metrically scaled variable (e.g. in a survey) always constitute definite numerical multiples of a specific unit of measurement.
• Ordinally scaled variables $X$ (qualitative/categorical)
Possible values, attributes, or categories can be distinguished by (i) their names, $a_i \neq a_j$, and (ii) they allow for a natural rank order, $a_i < a_j$. Examples: Likert item rating scales (cf. Ch. 9), grading of commodities, . . . .

• Nominally scaled variables $X$ (qualitative/categorical)
Possible values, attributes, or categories can be distinguished only by (i) their names, $a_i \neq a_j$. Examples: first name, location of birth, . . . .

Remark: As we will see later in Chs. 12 and 13, the applicability of specific methods of statistical data analysis crucially depends on the scale level of measurement of the variables involved in the respective procedures. Metrically scaled data offers the largest variety of powerful methods for this purpose!

1.2 Raw data sets and data matrices

To set the stage for subsequent considerations, we here introduce some formal representations of entities which assume central roles in statistical data analyses.

Let $\Omega$ denote the target population of study objects of interest (e.g., human individuals forming a particular social system) relating to some statistical investigation. This set $\Omega$ shall comprise a total of $N \in \mathbb{N}$ statistical units, i.e., its size be $|\Omega| = N$.

Suppose one intends to determine the frequency distributional properties in $\Omega$ of a portfolio of $m \in \mathbb{N}$ statistical variables $X$, $Y$, . . . , and $Z$, with spectra of values $a_1, a_2, \dots, a_k$, $b_1, b_2, \dots, b_l$, . . . , and $c_1, c_2, \dots, c_p$, respectively ($k, l, p \in \mathbb{N}$).

A survey typically obtains from $\Omega$ a statistical sample $S_\Omega$ of size $|S_\Omega| = n$ ($n \in \mathbb{N}$, $n < N$), unless one is given the rare opportunity to conduct a proper census on $\Omega$ (in which case $n = N$). The data thus generated consists of observed values $\{x_i\}_{i=1,\dots,n}$, $\{y_i\}_{i=1,\dots,n}$, . . . , and $\{z_i\}_{i=1,\dots,n}$. It constitutes the raw data set $\{(x_i, y_i, \dots, z_i)\}_{i=1,\dots,n}$ of a statistical investigation and may be conveniently assembled in the form of an $(n \times m)$ data matrix $\mathbf{X}$ given by

sampling unit | variable X | variable Y | . . . | variable Z
            1 | x_1 = a_5  | y_1 = b_9  | . . . | z_1 = c_3
            2 | x_2 = a_2  | y_2 = b_12 | . . . | z_2 = c_8
        . . . | . . .      | . . .      | . . . | . . .
            n | x_n = a_8  | y_n = b_9  | . . . | z_n = c_15

To systematically record the information obtained from measuring the values of a portfolio of statistical variables in a statistical sample $S_\Omega$, in the $(n \times m)$ data matrix $\mathbf{X}$ every one of the $n$ sampling units investigated is assigned a particular row, while every one of the $m$ statistical variables measured is assigned a particular column. In the following, $X_{ij}$ denotes the data entry in the $i$th row ($i = 1, \dots, n$) and the $j$th column ($j = 1, \dots, m$) of $\mathbf{X}$.

To clarify standard terminology used in Statistics, a raw data set is referred to as (i) univariate, when $m = 1$, (ii) bivariate, when $m = 2$, and (iii) multivariate, when $m \geq 3$. According to Hair et al (2010) [36, pp 102, 175], a rough rule of thumb concerning an adequate sample size $|S_\Omega| = n$ for multivariate data analysis is given by

$$n \geq 10 \cdot m .  \qquad (1.1)$$

Considerations of statistical power of particular methods of data analysis lead to more refined recommendations; cf. Sec. 11.1.
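The rule of thumb (1.1) is easily checked in software once a data matrix has been assembled; a minimal sketch in R, where the data matrix and its values are hypothetical:

    # hypothetical (n x m) data matrix with n = 4 sampling units and m = 2 variables
    datMat <- data.frame(x = c(1.2, 3.4, 2.2, 5.1),
                         y = c(0.7, 1.9, 1.1, 2.3))
    n <- nrow(datMat)  # sample size n
    m <- ncol(datMat)  # number of statistical variables m
    n >= 10 * m        # rule of thumb (1.1): FALSE here, so this sample would be too small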
"Big data" scenarios apply when $n, m \gg 1$ (i.e., when $n$ is typically of the order of $10^4$, or very much larger still, and $m$ is of the order of $10^2$, or larger).

In general, an $(n \times m)$ data matrix $\mathbf{X}$ is the starting point for the application of a statistical software package such as R, SPSS, GNU PSPP, or other for the purpose of systematic data analysis. When the sample comprises exclusively metrically scaled data, the data matrix is real-valued, i.e.,

$$\mathbf{X} \in \mathbb{R}^{n \times m} ;  \qquad (1.2)$$

cf. Ref. [18, Sec. 2.1]. Then the information contained in $\mathbf{X}$ uniquely positions a collection of $n$ sampling units according to $m$ quantitative characteristic variable features in (a subset of) an $m$-dimensional Euclidean space $\mathbb{R}^m$.

R: datMat <- data.frame(x = c(x_1, ..., x_n), y = c(y_1, ..., y_n), ..., z = c(z_1, ..., z_n))

We next turn to describe phenomenologically the univariate frequency distribution of a single one-dimensional statistical variable $X$ in a specific statistical sample $S_\Omega$ of size $n$, drawn in the context of a survey from some target population of study objects $\Omega$ of size $N$.

Chapter 2  Univariate frequency distributions

The first task at hand in unravelling the intrinsic structure potentially residing in a given raw data set $\{x_i\}_{i=1,\dots,n}$ for some statistical variable $X$ corresponds to Cinderella's task of separating the "good peas" from the "bad peas," and collecting them in respective bowls (or bins). That is to say, the first question to be answered requires determination of the frequency with which a value (or attribute, or category) $a_j$ in the spectrum of possible values of $X$ was observed in a statistical sample $S_\Omega$ of size $n$.

2.1 Absolute and relative frequencies

Def.: Let $X$ be a nominally, ordinally or metrically scaled one-dimensional statistical variable, with a spectrum of $k$ different values or attributes $a_j$ resp. $k$ different categories (or bins) $K_j$ ($j = 1, \dots, k$). If, for $X$, we have a univariate raw data set comprising $n$ observed values $\{x_i\}_{i=1,\dots,n}$, we define by

$$o_j := \begin{cases} o_n(a_j) = \text{number of } x_i \text{ with } x_i = a_j \\ o_n(K_j) = \text{number of } x_i \text{ with } x_i \in K_j \end{cases}  \qquad (2.1)$$

($j = 1, \dots, k$) the absolute (observed) frequency of $a_j$ resp. $K_j$, and, upon division of the $o_j$ by the sample size $n$, we define by

$$h_j := \begin{cases} \dfrac{o_n(a_j)}{n} \\ \dfrac{o_n(K_j)}{n} \end{cases}  \qquad (2.2)$$

($j = 1, \dots, k$) the relative frequency of $a_j$ resp. $K_j$. Note that for all $j = 1, \dots, k$, we have $0 \leq o_j \leq n$ with $\sum_{j=1}^{k} o_j = n$, and $0 \leq h_j \leq 1$ with $\sum_{j=1}^{k} h_j = 1$.

The $k$ value pairs $(a_j, o_j)_{j=1,\dots,k}$ resp. $(K_j, o_j)_{j=1,\dots,k}$ represent the univariate distribution of absolute frequencies, and the $k$ value pairs $(a_j, h_j)_{j=1,\dots,k}$ resp. $(K_j, h_j)_{j=1,\dots,k}$ represent the univariate distribution of relative frequencies of the $a_j$ resp. $K_j$ in $S_\Omega$.

R: table(variable), prop.table(variable)
EXCEL, OpenOffice: FREQUENCY (dt.: HÄUFIGKEIT)
SPSS: Analyze → Descriptive Statistics → Frequencies . . .
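As a minimal illustration of Eqs. (2.1) and (2.2) in R (the raw data set below is hypothetical), absolute and relative frequencies, together with the constraints just stated, are obtained as follows:

    x <- c("a1", "a2", "a1", "a3", "a1", "a2")  # hypothetical raw data set, n = 6
    oj <- table(x)        # absolute frequencies o_j of the observed values a_j
    hj <- prop.table(oj)  # relative frequencies h_j = o_j / n
    sum(oj)               # equals the sample size n = 6
    sum(hj)               # equals 1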
Typical graphical representations of univariate relative frequency distributions, regularly employed in visualising results of descriptive statistical data analyses, are the

• histogram for metrically scaled data; cf. Fig. 2.1,¹
• bar chart for ordinally scaled data; cf. Fig. 2.2,
• pie chart for nominally scaled data; cf. Fig. 2.3.

¹ The appearance of graphs generated in R can be prettified by employing the advanced graphical package ggplot2 by Wickham (2016) [117].

R: hist(variable, freq = FALSE), barplot(table(variable)), barplot(prop.table(table(variable))), pie(table(variable)), pie(prop.table(table(variable)))

[Figure 2.1: Example of a histogram, representing the relative frequency density for the variable "magnitude" in the R data set "quakes."]
R: data("quakes"), ?quakes, hist(quakes$mag, breaks = 20, freq = FALSE)

[Figure 2.2: Example of a bar chart, representing the relative frequency distribution for the variable "age group" in the R data set "esoph."]
R: data("esoph"), ?esoph, barplot(prop.table(table(esoph$agegp)))

[Figure 2.3: Example of a pie chart, representing the relative frequency distribution for the variable "education" in the R data set "infert."]
R: data("infert"), ?infert, pie(table(infert$education))

It is standard practice in Statistics to compile from the univariate relative frequency distribution $(a_j, h_j)_{j=1,\dots,k}$ resp. $(K_j, h_j)_{j=1,\dots,k}$ of data for some ordinally or metrically scaled one-dimensional statistical variable $X$ the associated empirical cumulative distribution function. Hereby it is necessary to distinguish the case of data for a variable with a discrete spectrum of values from the case of data for a variable with a continuous spectrum of values. We will discuss this issue next.

2.2 Empirical cumulative distribution function (discrete data)

Def.: Let $X$ be an ordinally or metrically scaled one-dimensional statistical variable, the spectrum of values $a_j$ ($j = 1, \dots, k$) of which varies discretely. Suppose given for $X$ a statistical sample $S_\Omega$ of size $|S_\Omega| = n$ comprising observed values $\{x_i\}_{i=1,\dots,n}$, which we assume arranged in an ascending fashion according to the natural order $a_1 < a_2 < \dots < a_k$. The corresponding univariate relative frequency distribution is $(a_j, h_j)_{j=1,\dots,k}$. For all real numbers $x \in \mathbb{R}$, we then define by

$$F_n(x) := \begin{cases} 0 & \text{for } x < a_1 \\ \displaystyle\sum_{i=1}^{j} h_n(a_i) & \text{for } a_j \leq x < a_{j+1} \ (j = 1, \dots, k-1) \\ 1 & \text{for } x \geq a_k \end{cases}  \qquad (2.3)$$

the empirical cumulative distribution function for $X$. The value of $F_n$ at $x \in \mathbb{R}$ represents the cumulative relative frequencies of all $a_j$ which are less than or equal to $x$; cf. Fig. 2.4.

[Figure 2.4: Example of an empirical cumulative distribution function, here for the variable "magnitude" in the R data set "quakes."]
R: data("quakes"), ?quakes, plot(ecdf(quakes$mag))

$F_n(x)$ has the following properties:

• its domain is $D(F_n) = \mathbb{R}$, and its range is $W(F_n) = [0, 1]$; hence, $F_n$ is bounded from above and from below,
• it is continuous from the right and monotonically increasing,
• it is constant on all half-open intervals $[a_j, a_{j+1})$, but exhibits jump discontinuities of size $h_n(a_{j+1})$ at all $a_{j+1}$, and,
• asymptotically, it behaves as $\lim_{x \to -\infty} F_n(x) = 0$ and $\lim_{x \to +\infty} F_n(x) = 1$.

R: ecdf(variable), plot(ecdf(variable))

Computational rules for $F_n(x)$

1. $h(x \leq d) = F_n(d)$
2. $h(x < d) = F_n(d) - h_n(d)$
3. $h(x \geq c) = 1 - F_n(c) + h_n(c)$
4. $h(x > c) = 1 - F_n(c)$
5. $h(c \leq x \leq d) = F_n(d) - F_n(c) + h_n(c)$
6. $h(c < x \leq d) = F_n(d) - F_n(c)$
7. $h(c \leq x < d) = F_n(d) - F_n(c) - h_n(d) + h_n(c)$
8. $h(c < x < d) = F_n(d) - F_n(c) - h_n(d)$,

wherein $c$ denotes an arbitrary lower bound, and $d$ denotes an arbitrary upper bound, on the argument $x$ of $F_n(x)$.
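These computational rules can be verified numerically with the step function returned by R's ecdf(); a minimal sketch, with hypothetical data and hypothetical bounds c and d:

    x <- c(2, 3, 3, 5, 7, 7, 7, 8)  # hypothetical discrete raw data set, n = 8
    Fn <- ecdf(x)                   # empirical cumulative distribution function F_n
    lo <- 3; up <- 7                # hypothetical bounds c and d
    Fn(up) - Fn(lo)                 # rule 6: h(c < x <= d), here 0.5
    mean(x > lo & x <= up)          # direct relative count, for comparison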
2.3 Empirical cumulative distribution function (continuous data)

Def.: Let $X$ be a metrically scaled one-dimensional statistical variable, the spectrum of values of which varies continuously, and let observed values $\{x_i\}_{i=1,\dots,n}$ for $X$ from a statistical sample $S_\Omega$ of size $|S_\Omega| = n$ be binned into a finite set of $k$ (with $k \approx \sqrt{n}$) ascendingly ordered exclusive class intervals (or bins) $K_j$ ($j = 1, \dots, k$), of width $b_j$, and with lower boundary $u_j$ and upper boundary $o_j$. The univariate distribution of relative frequencies of the class intervals be $(K_j, h_j)_{j=1,\dots,k}$. Then, for all real numbers $x \in \mathbb{R}$,

$$\tilde{F}_n(x) := \begin{cases} 0 & \text{for } x < u_1 \\ \displaystyle\sum_{i=1}^{j-1} h_i + \frac{h_j}{b_j}(x - u_j) & \text{for } x \in K_j \\ 1 & \text{for } x > o_k \end{cases}  \qquad (2.4)$$

defines the empirical cumulative distribution function for $X$. $\tilde{F}_n(x)$ has the following properties:

• its domain is $D(\tilde{F}_n) = \mathbb{R}$, and its range is $W(\tilde{F}_n) = [0, 1]$; hence, $\tilde{F}_n$ is bounded from above and from below,
• it is continuous and monotonically increasing, and,
• asymptotically, it behaves as $\lim_{x \to -\infty} \tilde{F}_n(x) = 0$ and $\lim_{x \to +\infty} \tilde{F}_n(x) = 1$.

R: ecdf(variable), plot(ecdf(variable))

Computational rules for $\tilde{F}_n(x)$

1. $h(x < d) = h(x \leq d) = \tilde{F}_n(d)$
2. $h(x > c) = h(x \geq c) = 1 - \tilde{F}_n(c)$
3. $h(c < x < d) = h(c \leq x < d) = h(c < x \leq d) = h(c \leq x \leq d) = \tilde{F}_n(d) - \tilde{F}_n(c)$,

wherein $c$ denotes an arbitrary lower bound, and $d$ denotes an arbitrary upper bound, on the argument $x$ of $\tilde{F}_n(x)$.

Our next steps comprise the introduction of a set of scale-level-dependent standard descriptive measures which characterise specific properties of univariate and bivariate relative frequency distributions of statistical variables $X$ resp. $(X, Y)$.
Chapter 3  Descriptive measures for univariate frequency distributions

There are four families of scale-level-dependent standard measures one employs in Statistics to describe characteristic properties of univariate relative frequency distributions. On a technical level, the determination of the values of these measures from available data does not go beyond application of the four fundamental arithmetical operations: addition, subtraction, multiplication and division. We will introduce these measures in turn. In the following we suppose given from a survey for some one-dimensional statistical variable $X$ either (i) a raw data set $\{x_i\}_{i=1,\dots,n}$ of $n$ measured values, or (ii) a relative frequency distribution $(a_j, h_j)_{j=1,\dots,k}$ resp. $(K_j, h_j)_{j=1,\dots,k}$.

3.1 Measures of central tendency

Let us begin with the measures of central tendency, which intend to convey a notion of "middle" or "centre" of a univariate relative frequency distribution.

3.1.1 Mode

The mode $x_{\text{mod}}$ (nom, ord, metr) of the relative frequency distribution for any one-dimensional variable $X$ is that value $a_j$ in $X$'s spectrum which was observed with the highest relative frequency in a statistical sample $S_\Omega$. Note that the mode does not necessarily take a unique value.

Def.: $h_n(x_{\text{mod}}) \geq h_n(a_j)$ for all $j = 1, \dots, k$.

EXCEL, OpenOffice: MODE.SNGL (dt.: MODUS.EINF, MODALWERT)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Mode

3.1.2 Median

To determine the median $\tilde{x}_{0.5}$ (or $Q_2$) (ord, metr) of the relative frequency distribution for an ordinally or metrically scaled one-dimensional variable $X$, it is necessary to first arrange the $n$ observed values $\{x_i\}_{i=1,\dots,n}$ in their ascending natural rank order, i.e., $x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}$.

Def.: For the ascendingly ordered $n$ observed values $\{x_i\}_{i=1,\dots,n}$, at most 50% have a rank lower than or equal to resp. are less than or equal to the median value $\tilde{x}_{0.5}$, and at most 50% have a rank higher than or equal to resp. are greater than or equal to the median value $\tilde{x}_{0.5}$.

(i) Discrete data: $F_n(\tilde{x}_{0.5}) \geq 0.5$

$$\tilde{x}_{0.5} = \begin{cases} x_{(\frac{n+1}{2})} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left[x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}\right] & \text{if } n \text{ is even} \end{cases}  \qquad (3.1)$$

(ii) Binned data: $\tilde{F}_n(\tilde{x}_{0.5}) = 0.5$

The class interval $K_i$ contains the median value $\tilde{x}_{0.5}$ if $\sum_{j=1}^{i-1} h_j < 0.5$ and $\sum_{j=1}^{i} h_j \geq 0.5$. Then

$$\tilde{x}_{0.5} = u_i + \frac{b_i}{h_i}\left(0.5 - \sum_{j=1}^{i-1} h_j\right) .  \qquad (3.2)$$

Alternatively, the median of a statistical sample $S_\Omega$ for a continuous variable $X$ with binned data $(K_j, h_j)_{j=1,\dots,k}$ can be obtained from the associated empirical cumulative distribution function by solving the condition $\tilde{F}_n(\tilde{x}_{0.5}) \stackrel{!}{=} 0.5$ for $\tilde{x}_{0.5}$; cf. Eq. (2.4).¹

¹ From a mathematical point of view, this amounts to the following problem: consider a straight line which contains the point with coordinates $(x_0, y_0)$ and has non-zero slope $y'(x_0) \neq 0$, i.e., $y = y_0 + y'(x_0)(x - x_0)$. Re-arranging to solve for the variable $x$ then yields $x = x_0 + [y'(x_0)]^{-1}(y - y_0)$.

Remark: Note that the value of the median of a univariate relative frequency distribution is reasonably insensitive to so-called outliers in a statistical sample.

R: median(variable)
EXCEL, OpenOffice: MEDIAN (dt.: MEDIAN)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Median

3.1.3 α–Quantile

A generalisation of the median is the concept of the α–quantile $\tilde{x}_\alpha$ (ord, metr) of the relative frequency distribution for an ordinally or metrically scaled one-dimensional variable $X$. Again, it is necessary to first arrange the $n$ observed values $\{x_i\}_{i=1,\dots,n}$ in their ascending natural rank order, i.e., $x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}$.

Def.: For the ascendingly ordered $n$ observed values $\{x_i\}_{i=1,\dots,n}$, and for given $\alpha$ with $0 < \alpha < 1$, at most $\alpha \times 100\%$ have a rank lower than or equal to resp. are less than or equal to the α–quantile $\tilde{x}_\alpha$, and at most $(1-\alpha) \times 100\%$ have a rank higher than or equal to resp. are greater than or equal to the α–quantile $\tilde{x}_\alpha$.

(i) Discrete data: $F_n(\tilde{x}_\alpha) \geq \alpha$

$$\tilde{x}_\alpha = \begin{cases} x_{(k)} & \text{if } n\alpha \notin \mathbb{N},\ k > n\alpha \\ \frac{1}{2}\left[x_{(k)} + x_{(k+1)}\right] & \text{if } k = n\alpha \in \mathbb{N} \end{cases}  \qquad (3.3)$$
(ii) Binned data: $\tilde{F}_n(\tilde{x}_\alpha) = \alpha$

The class interval $K_i$ contains the α–quantile $\tilde{x}_\alpha$ if $\sum_{j=1}^{i-1} h_j < \alpha$ and $\sum_{j=1}^{i} h_j \geq \alpha$. Then

$$\tilde{x}_\alpha = u_i + \frac{b_i}{h_i}\left(\alpha - \sum_{j=1}^{i-1} h_j\right) .  \qquad (3.4)$$

Alternatively, an α–quantile of a statistical sample $S_\Omega$ for a continuous variable $X$ with binned data $(K_j, h_j)_{j=1,\dots,k}$ can be obtained from the associated empirical cumulative distribution function by solving the condition $\tilde{F}_n(\tilde{x}_\alpha) \stackrel{!}{=} \alpha$ for $\tilde{x}_\alpha$; cf. Eq. (2.4).

Remark: The quantiles $\tilde{x}_{0.25}$, $\tilde{x}_{0.5}$, $\tilde{x}_{0.75}$ (also denoted by $Q_1$, $Q_2$, $Q_3$) have special status. They are referred to as the first, second (median) and third quartile of a relative frequency distribution for an ordinally or a metrically scaled one-dimensional variable $X$, and form the core of the five number summary of this distribution. Occasionally, α–quantiles are also referred to as percentile values.

R: quantile(variable, α)
EXCEL, OpenOffice: PERCENTILE.EXC (dt.: QUANTIL.EXKL, QUANTIL)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Percentile(s)

3.1.4 Five number summary

The five number summary (ord, metr) of the relative frequency distribution for an ordinally or metrically scaled one-dimensional variable $X$ is a compact compilation of information giving the (i) lowest rank resp. smallest value, (ii) first quartile, (iii) second quartile or median, (iv) third quartile, and (v) highest rank resp. largest value that $X$ takes in a univariate raw data set $\{x_i\}_{i=1,\dots,n}$ from a statistical sample $S_\Omega$, i.e.,

$$\{x_{(1)}, \tilde{x}_{0.25}, \tilde{x}_{0.5}, \tilde{x}_{0.75}, x_{(n)}\} .  \qquad (3.5)$$

Alternative notation: $\{Q_0, Q_1, Q_2, Q_3, Q_4\}$.

R: fivenum(variable), summary(variable)
EXCEL, OpenOffice: MIN, QUARTILE.INC, MAX (dt.: MIN, QUARTILE.INKL, QUARTILE, MAX)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Quartiles, Minimum, Maximum

All measures of central tendency which we will discuss hereafter are defined exclusively for characterising relative frequency distributions for metrically scaled one-dimensional variables $X$ only.

3.1.5 Sample mean

The best known measure of central tendency is the dimensionful sample mean $\bar{x}$ (metr), also referred to as the arithmetical mean. Amongst the first to have employed the sample mean as a characteristic statistical measure in the systematic analysis of quantitative empirical data ranks the English physicist, mathematician, astronomer and philosopher Sir Isaac Newton PRS MP (1643–1727); cf. Mlodinow (2008) [73, p 127]. Given metrically scaled data, it is defined by:

(i) From a raw data set:
$$\bar{x} := \frac{1}{n}(x_1 + \dots + x_n) =: \frac{1}{n}\sum_{i=1}^{n} x_i .  \qquad (3.6)$$

(ii) From a relative frequency distribution:
$$\bar{x} := a_1 h_n(a_1) + \dots + a_k h_n(a_k) =: \sum_{j=1}^{k} a_j h_n(a_j) .  \qquad (3.7)$$

Remarks: (i) The value of the sample mean is very sensitive to outliers. (ii) For binned data one selects the midpoint of each class interval $K_i$ to represent the $a_j$ (provided the raw data set is no longer accessible).

R: mean(variable)
EXCEL, OpenOffice: AVERAGE (dt.: MITTELWERT)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Mean
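A minimal sketch in R, collecting the measures of central tendency introduced so far for a hypothetical metrically scaled raw data set:

    x <- c(1.8, 2.4, 2.9, 3.1, 3.6, 4.0, 4.7)  # hypothetical raw data set, n = 7
    median(x)                  # median (Q_2): here the middle value 3.1, cf. Eq. (3.1)
    quantile(x, probs = 0.25)  # alpha-quantile for alpha = 0.25 (first quartile)
    fivenum(x)                 # five number summary {Q_0, Q_1, Q_2, Q_3, Q_4}
    mean(x)                    # sample mean, cf. Eq. (3.6)

Note that quantile() implements several estimation conventions (argument type); its default need not coincide exactly with the convention of Eq. (3.3).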
3.1.6 Weighted mean

In practice, one also encounters the dimensionful weighted mean $\bar{x}_w$ (metr), defined by

$$\bar{x}_w := w_1 x_1 + \dots + w_n x_n =: \sum_{i=1}^{n} w_i x_i ;  \qquad (3.8)$$

the $n$ weight factors $w_1, \dots, w_n$ need to satisfy the constraints

$$0 \leq w_1, \dots, w_n \leq 1 \quad \text{and} \quad w_1 + \dots + w_n = \sum_{i=1}^{n} w_i = 1 .  \qquad (3.9)$$

3.2 Measures of variability

The idea behind the measures of variability is to convey a notion of the "spread" of data in a given statistical sample $S_\Omega$, technically referred to also as the dispersion of the data. As the realisation of this intention requires a well-defined concept of distance, the measures of variability are meaningful for data relating to metrically scaled one-dimensional variables $X$ only. One can distinguish two kinds of such measures: (i) simple 2-data-point measures, and (ii) sophisticated n-data-point measures. We begin with two examples belonging to the first category.

3.2.1 Range

For a univariate raw data set $\{x_i\}_{i=1,\dots,n}$ of $n$ observed values for $X$, the dimensionful range $R$ (metr) simply expresses the difference between the largest and the smallest value in this set, i.e.,

$$R := x_{(n)} - x_{(1)} .  \qquad (3.10)$$

The basis of this measure is the ascendingly ordered data set $x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}$. Alternatively, the range can be denoted by $R = Q_4 - Q_0$.

R: range(variable), max(variable) − min(variable)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Range

3.2.2 Interquartile range

In the same spirit as the range, the dimensionful interquartile range $d_Q$ (metr) is defined as the difference between the third quartile and the first quartile of the relative frequency distribution for some metrically scaled $X$, i.e.,

$$d_Q := \tilde{x}_{0.75} - \tilde{x}_{0.25} .  \qquad (3.11)$$

Alternatively, this is $d_Q = Q_3 - Q_1$.

R: IQR(variable)

Viewing the interquartile range $d_Q$ of a univariate metrically scaled raw data set $\{x_i\}_{i=1,\dots,n}$ as a reference length, it is commonplace to define a specific value $x_i$ to be an

• outlier, if either $x_i < \tilde{x}_{0.25} - 1.5\,d_Q$ and $x_i \geq \tilde{x}_{0.25} - 3\,d_Q$, or $x_i > \tilde{x}_{0.75} + 1.5\,d_Q$ and $x_i \leq \tilde{x}_{0.75} + 3\,d_Q$,
• extreme value, if either $x_i < \tilde{x}_{0.25} - 3\,d_Q$, or $x_i > \tilde{x}_{0.75} + 3\,d_Q$.

A very convenient graphical method for transparently displaying distributional features of metrically scaled data relating to a five number summary, also making explicit the interquartile range, outliers and extreme values, is provided by a box plot; see, e.g., Tukey (1977) [110]. An example of a single box plot is depicted in Fig. 3.1, of parallel box plots in Fig. 3.2.

R: boxplot(variable), boxplot(variable ~ group variable)

[Figure 3.1: Example of a box plot, representing elements of the five number summary for the distribution of measured values for the variable "magnitude" in the R data set "quakes." The open circles indicate the positions of outliers.]
R: data("quakes"), ?quakes, boxplot(quakes$mag)

[Figure 3.2: Example of parallel box plots, comparing elements of the five number summary for the distribution of measured values for the variable "weight" between categories of the variable "group" in the R data set "PlantGrowth." The open circle indicates the position of an outlier.]
R: data("PlantGrowth"), ?PlantGrowth, boxplot(PlantGrowth$weight ~ PlantGrowth$group)
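The fences defining outliers and extreme values are straightforward to compute; a minimal sketch in R, with hypothetical data containing one suspiciously large value:

    x <- c(3.1, 3.4, 3.6, 3.8, 4.0, 4.1, 4.4, 9.9)  # hypothetical raw data set
    max(x) - min(x)                 # range R, cf. Eq. (3.10)
    dQ <- IQR(x)                    # interquartile range d_Q, cf. Eq. (3.11)
    q <- quantile(x, c(0.25, 0.75))
    x[x < q[1] - 1.5 * dQ | x > q[2] + 1.5 * dQ]  # values beyond the 1.5 d_Q fences; here 9.9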
3.2.3 Sample variance

The most frequently employed measure of variability in Statistics is the dimensionful n-data-point sample variance $s^2$ (metr), and the related sample standard deviation to be discussed below. One of the originators of these concepts is the French mathematician Abraham de Moivre (1667–1754); cf. Bernstein (1998) [3, p 5]. Given a univariate raw data set $\{x_i\}_{i=1,\dots,n}$ for $X$, its spread is essentially quantified in terms of the sum of squared deviations of the $n$ data points $x_i$ from their common sample mean $\bar{x}$. Due to the algebraic identity

$$(x_1 - \bar{x}) + \dots + (x_n - \bar{x}) = \sum_{i=1}^{n}(x_i - \bar{x}) = \left(\sum_{i=1}^{n} x_i\right) - n\bar{x} \overset{\text{Eq. (3.6)}}{\equiv} 0 ,$$

there are only $n - 1$ degrees of freedom involved in this measure. The sample variance is thus defined by:

(i) From a raw data set:
$$s^2 := \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right] =: \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 ;  \qquad (3.12)$$

alternatively, by the shift theorem:²

$$s^2 = \frac{1}{n-1}\left[x_1^2 + \dots + x_n^2 - n\bar{x}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] .  \qquad (3.13)$$

² That is, the algebraic identity $\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}\left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \overset{\text{Eq. (3.6)}}{\equiv} \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$.

(ii) From a relative frequency distribution:
$$s^2 := \frac{n}{n-1}\left[(a_1 - \bar{x})^2 h_n(a_1) + \dots + (a_k - \bar{x})^2 h_n(a_k)\right] =: \frac{n}{n-1}\sum_{j=1}^{k}(a_j - \bar{x})^2 h_n(a_j) ;  \qquad (3.14)$$

alternatively:
$$s^2 = \frac{n}{n-1}\left[a_1^2 h_n(a_1) + \dots + a_k^2 h_n(a_k) - \bar{x}^2\right] = \frac{n}{n-1}\left[\sum_{j=1}^{k} a_j^2 h_n(a_j) - \bar{x}^2\right] .  \qquad (3.15)$$

Remarks: (i) We point out that the alternative formulae for a sample variance provided here prove computationally more efficient. (ii) For binned data, when one selects the midpoint of each class interval $K_j$ to represent the $a_j$ (given the raw data set is no longer accessible), a correction of Eqs. (3.14) and (3.15) by an additional term $\frac{1}{12}\frac{n}{n-1}\sum_{j=1}^{k} b_j^2 h_j$ becomes necessary, assuming uniformly distributed data within each of the class intervals $K_j$ of width $b_j$; cf. Eq. (8.41).

R: var(variable)
EXCEL, OpenOffice: VAR.S (dt.: VAR.S, VARIANZ)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Variance

3.2.4 Sample standard deviation

For ease of handling dimensions associated with a metrically scaled one-dimensional variable $X$, one defines the dimensionful sample standard deviation $s$ (metr) simply as the positive square root of the sample variance (3.12), i.e.,

$$s := +\sqrt{s^2} ,  \qquad (3.16)$$

such that a measure for the spread of data results which shares the dimension of $X$ and its sample mean $\bar{x}$.

R: sd(variable)
EXCEL, OpenOffice: STDEV.S (dt.: STABW.S, STABW)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Std. deviation

3.2.5 Sample coefficient of variation

For ratio scaled one-dimensional variables $X$, a dimensionless relative measure of variability is the sample coefficient of variation $v$ (metr: ratio), defined by

$$v := \frac{s}{\bar{x}} , \quad \text{if } \bar{x} > 0 .  \qquad (3.17)$$
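A minimal sketch in R for Eqs. (3.12), (3.16) and (3.17); base R offers no built-in coefficient of variation, so $v$ is formed directly from sd() and mean() (the data are hypothetical):

    x <- c(12.1, 14.3, 15.0, 15.8, 17.2)  # hypothetical ratio scaled raw data set
    var(x)           # sample variance s^2, computed with n - 1 degrees of freedom
    sd(x)            # sample standard deviation s = +sqrt(s^2)
    sd(x) / mean(x)  # sample coefficient of variation v, meaningful for mean > 0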
3.2.6 Standardisation

Data for metrically scaled one-dimensional variables $X$ is amenable to the process of standardisation. By this is meant a linear affine transformation $X \to Z$, which generates from a univariate raw data set $\{x_i\}_{i=1,\dots,n}$ of $n$ measured values for a dimensionful variable $X$, with sample mean $\bar{x}$ and sample standard deviation $s_X > 0$, data for an equivalent dimensionless variable $Z$ according to

$$x_i \mapsto z_i := \frac{x_i - \bar{x}}{s_X} \quad \text{for all } i = 1, \dots, n .  \qquad (3.18)$$

For the resultant $Z$-data, referred to as the Z scores of the original metrical $X$-data, this has the convenient practical consequences that (i) all one-dimensional metrical data is thus represented on the same dimensionless measurement scale, and (ii) the corresponding sample mean and sample standard deviation of the $Z$-data amount to $\bar{z} = 0$ and $s_Z = 1$, respectively. Employing Z scores, specific values $x_i$ of the original metrical $X$-data will be expressed in terms of sample standard deviation units, i.e., by how many sample standard deviations they fall on either side of the common sample mean. Essential information on characteristic distributional features of one-dimensional metrical data will be preserved by the process of standardisation.

R: scale(variable, center = TRUE, scale = TRUE)
EXCEL, OpenOffice: STANDARDIZE (dt.: STANDARDISIERUNG)
SPSS: Analyze → Descriptive Statistics → Descriptives . . . → Save standardized values as variables
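The two practical consequences of Eq. (3.18) stated above, $\bar{z} = 0$ and $s_Z = 1$, can be checked directly; a minimal sketch (hypothetical data):

    x <- c(2.3, 4.1, 5.0, 6.2, 7.9)  # hypothetical metrical raw data set
    z <- as.numeric(scale(x, center = TRUE, scale = TRUE))  # Z scores z_i
    mean(z)  # 0, up to floating-point rounding
    sd(z)    # 1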
3.3 Measures of relative distortion

The third family of measures characterising relative frequency distributions for univariate data $\{x_i\}_{i=1,\dots,n}$ for metrically scaled one-dimensional variables $X$, having specific sample mean $\bar{x}$ and sample standard deviation $s_X$, relates to the issue of the shape of a distribution. These measures take a Gaußian normal distribution (cf. Sec. 8.6 below) as a reference case, with the values of its two free parameters equal to the given $\bar{x}$ and $s_X$. With respect to this reference distribution, one defines two kinds of dimensionless measures of relative distortion as described in the following (cf., e.g., Joanes and Gill (1998) [45]).

3.3.1 Skewness

The skewness $g_1$ (metr) is a dimensionless measure to quantify the degree of relative distortion of a given frequency distribution in the horizontal direction. Its implementation in the software package EXCEL employs the definition

$$g_1 := \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_X}\right)^3 \quad \text{for } n > 2 ,  \qquad (3.19)$$

wherein the observed values $\{x_i\}_{i=1,\dots,n}$ enter in their standardised form according to Eq. (3.18). Note that $g_1 = 0$ for an exact Gaußian normal distribution.

R: skewness(variable, type = 2) (package: e1071, by Meyer et al (2019) [71])
EXCEL, OpenOffice: SKEW (dt.: SCHIEFE)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Skewness

3.3.2 Excess kurtosis

The excess kurtosis $g_2$ (metr) is a dimensionless measure to quantify the degree of relative distortion of a given frequency distribution in the vertical direction. Its implementation in the software package EXCEL employs the definition

$$g_2 := \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_X}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \quad \text{for } n > 3 ,  \qquad (3.20)$$

wherein the observed values $\{x_i\}_{i=1,\dots,n}$ enter in their standardised form according to Eq. (3.18). Note that $g_2 = 0$ for an exact Gaußian normal distribution.

R: kurtosis(variable, type = 2) (package: e1071, by Meyer et al (2019) [71])
EXCEL, OpenOffice: KURT (dt.: KURT)
SPSS: Analyze → Descriptive Statistics → Frequencies . . . → Statistics . . . : Kurtosis
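A minimal sketch in R, assuming the add-on package e1071 referred to above is installed; the choice type = 2 matches the EXCEL/SPSS conventions of Eqs. (3.19) and (3.20):

    library(e1071)         # provides skewness() and kurtosis()
    set.seed(123)          # for reproducibility of this illustration
    x <- rnorm(1000)       # hypothetical sample from a Gaussian normal distribution
    skewness(x, type = 2)  # g_1: close to 0 for Gaussian data
    kurtosis(x, type = 2)  # g_2 (excess kurtosis): close to 0 for Gaussian data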
(3.26) Due to no rmalisation, the range of v alues is 0 ≤ G + ≤ 1 . Thu s , nu l l concentration amoun t s to G + = 0 , while maximum concentration amounts to G + = 1 . 3 3 In September 2012 it was repo r ted (imp licitly) in the public press that the co ordinate s un derlying th e Lorenz curve describin g the distribution of p r iv ate equ ity in Germany at the time were (0 . 00 , 0 . 0 0 ) , (0 . 50 , 0 . 01) , (0 . 9 0 , 0 . 50) , and (1 . 00 , 1 . 00 ) ; cf . Ref. [101]. Given that in this case n ≫ 1 , these values amount to a Gini coefficient o f G + = 0 . 64 . Th e Oxfam Report on W ealth Ineq uality 2019 can be found a t the URL (cited on May 31, 201 9): www .oxfam.org/en /r esearch/public-good-or-pri vate-wealth . 26 CHAPTER 3. M EASURES FOR UNIV ARIA TE DISTRIB UTIONS Chapter 4 Descripti ve measur e s of association f or bi v ariate fr e qu e ncy distrib utions Now we come to describe and characterise sp ecific features of biv ariate frequenc y dist ri butions, i.e., intrinsi c structures of biv ariate raw d ata sets { ( x i , y i ) } i =1 ,...,n obtained from sam ples S Ω for a two-dimensional s t atistical variable ( X, Y ) from some target po pulation of st u dy ob jects Ω . Let us suppose t hat the spectrum o f values resp. categories of X is a 1 , a 2 , . . . , a k , and th e spectrum of values resp. categories of Y is b 1 , b 2 , . . . , b l , where k , l ∈ N . Hence, for the biv ariate joi nt dis- trib ution there exists a t otal of k × l possib le combinations { ( a i , b j ) } i =1 ,...,k ; j =1 ,... ,l of values resp. categories for ( X, Y ) . In t he following, we will denote associated biv ariate absolute (observed) frequencies by o ij := o n ( a i , b j ) , and biv ariate relative frequencies by h ij := h n ( a i , b j ) . 4.1 ( k × l ) contingency ta bles Consider a biva riate raw data set { ( x i , y i ) } i =1 ,...,n for a t wo-dimensional statistical variable ( X , Y ) , giving rise to k × l combinations of v alues resp. categories { ( a i , b j ) } i =1 ,...,k ; j =1 ,... ,l . The biv ariate joint distribution of observed a bsol ute fr equencies o ij may be con veniently represented in terms of a ( k × l ) contingency table , or cross tabulation , by o ij b 1 b 2 . . . b j . . . b l Σ j a 1 o 11 o 12 . . . o 1 j . . . o 1 l o 1+ a 2 o 21 o 22 . . . o 2 j . . . o 2 l o 2+ . . . . . . . . . . . . . . . . . . . . . . . . a i o i 1 o i 2 . . . o ij . . . o il o i + . . . . . . . . . . . . . . . . . . . . . . . . a k o k 1 o k 2 . . . o k j . . . o k l o k + Σ i o +1 o +2 . . . o + j . . . o + l n , (4.1) where it holds for all i = 1 , . . . , k and j = 1 , . . . , l that 0 ≤ o ij ≤ n and k X i =1 l X j =1 o ij = n . (4.2) 27 28 CHAPTER 4. MEASURES OF ASSOCIA TION FOR BIV ARIA TE DISTRIBUTIONS The corresponding univ ariate marginal absolute fre quencies of X and of Y are o i + := o i 1 + o i 2 + . . . + o ij + . . . + o il =: l X j =1 o ij (4.3) o + j := o 1 j + o 2 j + . . . + o ij + . . . + o k j =: k X i =1 o ij . (4.4) R : CrossTable ( row variable , column variable ) (package: gmodels , by W arnes et al (2018) [114]) SPSS: Analyze → Descriptive Stati s tics → Crosstabs . . . → Cells . . . : Obs erved One obtains the related biv ariate jo int d i stribution of obs erved r elative freq uencies h ij following the systemati cs of Eq. (2.2) to yield h ij b 1 b 2 . . . b j . . . b l Σ j a 1 h 11 h 12 . . . h 1 j . . . h 1 l h 1+ a 2 h 21 h 22 . . . h 2 j . . . h 2 l h 2+ . . . . . . . . . . . . . . . . . . . . . . . . a i h i 1 h i 2 . . . h ij . . . h il h i + . . . . 
. . . . . . . . . . . . . . . . . . . . a k h k 1 h k 2 . . . h k j . . . h k l h k + Σ i h +1 h +2 . . . h + j . . . h + l 1 . (4.5) Again, it holds for all i = 1 , . . . , k and j = 1 , . . . , l that 0 ≤ h ij ≤ 1 and k X i =1 l X j =1 h ij = 1 , (4.6) while the univ ariate marginal r elative freque ncies of X and of Y are h i + := h i 1 + h i 2 + . . . + h ij + . . . + h il =: l X j =1 h ij (4.7) h + j := h 1 j + h 2 j + . . . + h ij + . . . + h k j =: k X i =1 h ij . (4.8) On the b asis of a ( k × l ) contingency t able displaying the relative frequencies of the bi variate joint distribution for some two-dimensi o nal variable ( X , Y ) , o n e may define two kind s of related conditional relative freq uency distrib utions , namely (i) the conditional distribution of X given Y by h ( a i | b j ) := h ij h + j , (4.9) 4.2. ME ASURES OF ASSOCIA TION FOR THE METRICAL SCALE LEVEL 29 and (ii) the conditional distribution o f Y given X by h ( b j | a i ) := h ij h i + . (4.10) Then, by m eans of these condition al d i stributions, a notion of statistical independ ence of variables X and Y is defined to correspond t o the simultaneous properties h ( a i | b j ) = h ( a i ) = h i + and h ( b j | a i ) = h ( b j ) = h + j . (4.11) Giv en these properti es hold , i t follows from Eqs. (4.9) and (4.10) th at h ij = h i + h + j ; (4.12) the biv ariate relative frequencies h ij in this case are numerically equal to the product o f the corre- sponding univ ariate marginal relative frequencies h i + and h + j . 4.2 Measur es of assoc ia tion f or the metrical scale level Next, specifically con sider a biva riate raw data set { ( x i , y i ) } i =1 ,...,n from a statisti cal samp le S Ω for a metri call y scaled two-dimens i onal variable ( X , Y ) . Th e biv ariate join t distribution for ( X , Y ) in th is sample can be con veniently represented graphically in terms of a scatter plot , cf. Fig. 4.1, thus uniquely l ocating the positions of n sampling units in (a su b set of) Euclidian space R 2 . Let us no w introduce t wo kinds of measures for the description of specific characteristic features of such biv ariate joint distributions. R : plot( varia ble1 , variable2 ) 4.2.1 Sample covariance The first standard measure describing degree of association in the joint dis tribution for a metri- cally scaled two-dimensional v ariable ( X, Y ) is the dimensionful sa mple covariance s X Y (metr), defined by (i) From a raw data set: s X Y := 1 n − 1 [ ( x 1 − ¯ x )( y 1 − ¯ y ) + . . . + ( x n − ¯ x )( y n − ¯ y ) ] =: 1 n − 1 n X i =1 ( x i − ¯ x ) ( y i − ¯ y ) ; (4.13) alternativ ely: s X Y = 1 n − 1 [ x 1 y 1 + . . . + x n y n − n ¯ x ¯ y ] = 1 n − 1 " n X i =1 x i y i − n ¯ x ¯ y # . (4.14) 30 CHAPTER 4. MEASURES OF ASSOCIA TION FOR BIV ARIA TE DISTRIBUTIONS 60 70 80 90 0 50 100 150 temperature [°F] ozone [ppb] scatter plot Figure 4.1: Examp l e o f a scatter plot, representing the joint distribution of measured values for th e var iables “temperature” and “ozone” in the R da ta set “airquality . ” R : data("airq uality") ?airqualit y plot( airquali ty$Temp , airquality$Oz one ) (ii) From a relative frequency distribution: s X Y := n n − 1 [ ( a 1 − ¯ x )( b 1 − ¯ y ) h 11 + . . . + ( a k − ¯ x )( b l − ¯ y ) h k l ] =: n n − 1 k X i =1 l X j =1 ( a i − ¯ x ) ( b j − ¯ y ) h ij ; (4.15) alternativ ely: s X Y = n n − 1 [ a 1 b 1 h 11 + . . . + a k b l h k l − ¯ x ¯ y ] = n n − 1 " k X i =1 l X j =1 a i b j h ij − ¯ x ¯ y # . 
(4.16) Remark: The alternati ve formul ae provided here prove computationall y more effic ient. R : cov( variab le1 , variable2 ) EXCEL, OpenOffic e: COVARIA NCE.S (dt.: KOV ARIANZ.S, KOVAR ) In vie w of its defining equati o n (4.13 ), th e sample covar iance can be given the follo wing geo- metrical interpretation. For a tot al of n dat a points ( x i , y i ) , it quantitfies the degree of excess of 4.2. ME ASURES OF ASSOCIA TION FOR THE METRICAL SCALE LEVEL 31 signed rectangular areas ( x i − ¯ x ) ( y i − ¯ y ) with respect t o the common centr oi d r C :=  ¯ x ¯ y  of the n data points in fa vour of either pos i tiv e or negati ve signed areas, if any . 1 It is worthwhile to point out that i n the research literature it is standard to define for the joint distribution for a metrically scaled two-dimensional v ariable ( X , Y ) a dimensionful symmetric (2 × 2) sa mple covariance matrix S 2 according to S 2 :=  s 2 X s X Y s X Y s 2 Y  , (4.17) the components of which are defined by Eqs . (3.12) and (4.1 3 ). The determin ant o f S 2 , given by det( S 2 ) = s 2 X s 2 Y − s 2 X Y , is positive as long as s 2 X s 2 Y − s 2 X Y > 0 , whi ch applies in mo st practical cases. Then S 2 is regular , and thu s a correspondi ng in verse ( S 2 ) − 1 exists; cf. Ref. [18, Sec. 3.5]. The concept of a regular sample covar iance matrix S 2 and its in verse ( S 2 ) − 1 generalises in a straightforward fashion to the case of m ultiv ariate joint distributions for metrically s caled m - dimensional statistical var iables ( X , Y , . . . , Z ) , where S 2 ∈ R m × m is giv en by S 2 :=      s 2 X s X Y . . . s Z X s X Y s 2 Y . . . s Y Z . . . . . . . . . . . . s Z X s Y Z . . . s 2 Z      , (4.18) and det( S 2 ) 6 = 0 is required. 4.2.2 Brav ais and Pearson’ s sa m ple correlation co efficient The s ample covariance s X Y constitutes th e basis for the s econd s t andard measure characteris- ing the joint dist ribution for a m etrically scaled two-dimension al variable ( X , Y ) by descriptive means, which is t h e normalised and d imensionless sample correlation coefficient r (metr) de- vised by the French physicist Auguste Brav ais (181 1 –1863) and the Engli s h m athematician and statistician Karl Pearson FRS (1857–1936) for the purpose of analysing corresponding biv ariate raw data { ( x i , y i ) } i =1 ,...,n for the existence of a linear (!!!) statistical association. It i s defined in terms of t he biv ariate sampl e covar iance s X Y and the un ivariate sample standard deviations s X and s Y by (cf. Bra vais (1846) [8] and Pearson (1901, 1920 ) [7 9, 81]) r := s X Y s X s Y . (4.19) 1 The c e ntroid is the special case of equ al mass points, with masses m i = 1 n , of the centr e of gravity of a system of n discrete massive objects, defined b y r C := P n i =1 m i r i P n j =1 m j . In two Euclid ian dimensions the position vector is r i =  x i y i  . 32 CHAPTER 4. MEASURES OF ASSOCIA TION FOR BIV ARIA TE DISTRIBUTIONS W ith Eq. (4.13) for s X Y , this becomes r = 1 n − 1 n X i =1  x i − ¯ x s X   y i − ¯ y s Y  = 1 n − 1 n X i =1 z X i z Y i , (4.20) employing standardi s ation according to Eq. (3.18) in the final step. Due to its no rmalisation, the range of the sample correlation coef ficient is − 1 ≤ r ≤ +1 . The sig n of r encodes the direction of a correlation. A s t o in terpreting t h e strength o f a correlation via the magnitude | r | , in practice one typically employs the following qu alitative Rule of thumb: 0 . 0 = | r | : n o correlation 0 . 0 < | r | < 0 . 
2 : very w eak correlatio n 0 . 2 ≤ | r | < 0 . 4 : weak correlation 0 . 4 ≤ | r | < 0 . 6 : moderately strong correlation 0 . 6 ≤ | r | ≤ 0 . 8 : strong correlation 0 . 8 ≤ | r | < 1 . 0 : very strong correlation 1 . 0 = | r | : p erfect correlation. R : cor( variab le1 , variable2 ) EXCEL, OpenOffic e: CORREL (dt.: KORREL ) SPSS: Analyze → Correlate → Biv ariate . . . : Pearson In l ine with Eq. (4.17), it is con venient to define a di mensionless s y m metric (2 × 2) sample corr elation matrix R by R :=  1 r r 1  , (4.21) which is regular and pos itive definite as long as its determin ant det( R ) = 1 − r 2 > 0 . In t his case, its in verse R − 1 is giv en by R − 1 = 1 1 − r 2  1 − r − r 1  . (4.22) Note that for non-corr elati n g metrically scaled variables X and Y , i.e., when r = 0 , the sample correlation matrix degenerates to become a u n it matrix, R = 1 . Again, the concept of a regular and po sitive definite samp l e correlation mat ri x R , with in verse R − 1 , generalises to m ultiv ariate jo int distri butions for metrically scaled m -dimensional s t atistical var iables ( X , Y , . . . , Z ) , where R ∈ R m × m is given by 2 R :=      1 r X Y . . . r Z X r X Y 1 . . . r Y Z . . . . . . . . . . . . r Z X r Y Z . . . 1      , (4.23) 2 Giv en a da ta ma trix X ∈ R n × m for a metrically scale d m -dim ensional statistical variable ( X , Y , . . . , Z ) , o ne can show tha t upon standardisation of the d a ta acco rding to Eq. (3.1 8), which amoun ts to a tran sformation X 7→ Z ∈ R n × m , the sample correlation m a trix can be re p resented by R = 1 n − 1 Z T Z . The f orm of this relation is e quiv a le n t to Eq. ( 4.20). 4.3. ME ASURES OF ASSOCIA TION FOR THE ORDIN AL SCALE LEV E L 33 and det( R ) 6 = 0 . Note that R is a dimensionless q u antity which, h ence, is scale-in variant ; cf. Sec. 8.10. 4.3 Measur es of assoc ia tion f or the ordinal sc ale le vel At the ordinal scale level, b iv ariate raw data { ( x i , y i ) } i =1 ,...,n for a two-dimensional variable ( X , Y ) is not necessarily quanti tativ e in n atu re. Therefore, i n order to be in a positio n to define a sensible quantitative bivar iate measure of statistical association for ordinal variables, one needs to introduce meaningful surrogate data whi ch is numerical. This task is realised by m eans of defining so - called rank numbers , which are assigned to the original ordinal data according to the procedure described in the following. Begin by establishing amongst the observed values { x i } i =1 ,...,n resp. { y i } i =1 ,...,n their natural as- cending rank order , i.e., x (1) ≤ x (2) ≤ . . . ≤ x ( n ) and y (1) ≤ y (2) ≤ . . . ≤ y ( n ) . (4.24) Then, ev ery individual x i resp. y i is assigned a rank number which corresponds t o its pos i tion in the ordered sequences (4.24): x i 7→ R ( x i ) , y i 7→ R ( y i ) , for all i = 1 , . . . , n . (4.2 5) Should there be any “tied ranks” due to equality of so me x i or y i , one assi gns the arithmetical mean of the corresponding rank numbers to all x i resp. y i in volved in the “tie. ” Ultimately , by thi s procedure, the entire biv ariate raw data undergoes a transformation { ( x i , y i ) } i =1 ,...,n 7→ { [ R ( x i ) , R ( y i )] } i =1 ,...,n , (4.26) yielding n pairs of rank numbers to numerically represent the orig inal biv ariate o rdi nal data. Giv en surrogate rank number data, the means of rank numbers alw ays amount to ¯ R ( x ) := 1 n n X i =1 R ( x i ) = n + 1 2 (4.27) ¯ R ( y ) := 1 n n X i =1 R ( y i ) = n + 1 2 . 
(4.28) The variances of rank numbe rs are defined in accordance with Eqs. (3.13) and (3.15), i.e., s 2 R ( x ) := 1 n − 1 " n X i =1 R 2 ( x i ) − n ¯ R 2 ( x ) # = n n − 1 " k X i =1 R 2 ( a i ) h i + − ¯ R 2 ( x ) # (4.29) s 2 R ( y ) := 1 n − 1 " n X i =1 R 2 ( y i ) − n ¯ R 2 ( y ) # = n n − 1 " l X j =1 R 2 ( b j ) h + j − ¯ R 2 ( y ) # . (4.30) 34 CHAPTER 4. MEASURES OF ASSOCIA TION FOR BIV ARIA TE DISTRIBUTIONS In addition , to characterise the joint distribution of rank n umbers, a sa mple cov ariance of rank numbers is defined in line with Eqs. (4.14) and (4.16) by s R ( x ) R ( y ) := 1 n − 1 " n X i =1 R ( x i ) R ( y i ) − n ¯ R ( x ) ¯ R ( y ) # = n n − 1 " k X i =1 l X j =1 R ( a i ) R ( b j ) h ij − ¯ R ( x ) ¯ R ( y ) # . (4.31) On th is fairly elaborate technical backdrop, the En glish psy chologist and statistici an Charles Edward Spearman FRS (18 63–1945) defined a dimensionless sample rank correlation coefficient r S (ord), in analogy to Eq. (4.19), by (cf. Spearman (1904) [96]) r S := s R ( x ) R ( y ) s R ( x ) s R ( y ) . (4.32) The range of this rank correlation coef ficient is − 1 ≤ r S ≤ +1 . Again, while the sign of r S encodes th e direc tion of a rank correlation, in int erpreting the str ength of a rank correlation via the magnitud e | r S | one usually employs t he qualitative Rule of thumb: 0 . 0 = | r S | : no rank correlation 0 . 0 < | r S | < 0 . 2 : very weak rank correlation 0 . 2 ≤ | r S | < 0 . 4 : weak rank correlation 0 . 4 ≤ | r S | < 0 . 6 : mod erately strong rank correlation 0 . 6 ≤ | r S | ≤ 0 . 8 : st rong rank correlation 0 . 8 ≤ | r S | < 1 . 0 : very strong rank correlation 1 . 0 = | r S | : perfect rank correlation. R : cor( var iable1 , variable2 , method = "spearman") SPSS: Analyze → Correlate → Biv ariate . . . : Spearman When no tied ranks occur , Eq. (4.32) simp l ifies to (cf. Hartun g et al (2005) [39, p 554]) r S = 1 − 6 P n i =1 [ R ( x i ) − R ( y i )] 2 n ( n 2 − 1) . (4.33) 4.4 Measur es of assoc ia tion f or the nominal s c ale le vel Lastly , let us turn to consider the case of q uantifying the degree of st atistical association in biv ariate raw dat a { ( x i , y i ) } i =1 ,...,n for a nomi n ally scaled two-dimens ional variable ( X , Y ) , with categories { ( a i , b j ) } i =1 ,...,k ; j =1 ,... ,l . The starti n g point are the observed bivar iate absolute resp. r elative (cell) fr equencies o ij and h ij of the j o i nt distribution for ( X , Y ) , wit h univ ariate marginal fr equencies o i + resp. h i + for X and o + j resp. h + j for Y . The χ 2 –statistic d evised by the En g lish mathematical statistician Karl Pearson FRS (1857 – 1936) rests on the notio n of stat i stical independence of t wo 4.4. ME ASURES OF ASSOCIA TION FOR THE NOMIN AL SCALE LEVE L 35 one-dimensional v ariables X and Y in that it takes the correspon ding formal condition provided by Eq. (4.12) as a reference state. A sim ple algebraic manipulation of this condition obtains h ij = h i + h + j ⇒ o ij n = o i + n o + j n multipli catio n by n z}|{ ⇒ o ij = o i + o + j n . (4.34) Pearson’ s d escrip t iv e χ 2 –statistic (cf. Pearson (1900) [78]) is then defined by χ 2 := k X i =1 l X j =1  o ij − o i + o + j n  2 o i + o + j n = n k X i =1 l X j =1 ( h ij − h i + h + j ) 2 h i + h + j , (4.35) whose range of values amounts to 0 ≤ χ 2 ≤ max( χ 2 ) , with max( χ 2 ) := n [min( k , l ) − 1] . Remark: Provided o i + o + j n ≥ 5 for all i = 1 , . . . , k and j = 1 , . . . 
, l , Pearson’ s χ 2 –statistic can be employed for the analy s is of statist i cal associatio ns amon g st th e com ponents of a two-dimensio nal var iable ( X, Y ) of al m ost all combinations of scale lev els. The problem with Pearson’ s χ 2 –statistic is that , due to i ts variable spectrum of values, it is not immediately clear how to use it ef ficiently in i nterpreting the strength of stati stical associatio ns. This shortcom i ng can, howe ver , be overcome by resorting to th e meas ure of asso ciation propos ed by the Swedish mathematician, actuary , and statistician Carl Harald Cramér (1893–1985), which basically is the result of a special kind of norm alisation of Pearson’ s measure. Th u s , Cramér’ s V , as it has come to be known, is defined by (cf. Cramér (1946) [13]) V := s χ 2 max( χ 2 ) , (4.36) with range 0 ≤ V ≤ 1 . For the int erpretation of the strengt h of statistical association in the joi n t distribution for a t wo-dimensional categorical variable ( X , Y ) , one may thus employ the qualit ative Rule of thumb: 0 . 0 ≤ V < 0 . 2 : weak association 0 . 2 ≤ V < 0 . 6 : m oderately strong association 0 . 6 ≤ V ≤ 1 . 0 : strong association. R : assocstats ( contingen cy table ) (package: vcd , by Meyer et al (2017) [70]) SPSS: Analyze → Descriptiv e Statistics → Crosstabs . . . → Statistics . . . : Chi-sq uare, Phi and Cramer’ s V 36 CHAPTER 4. MEASURES OF ASSOCIA TION FOR BIV ARIA TE DISTRIBUTIONS Chapter 5 Descripti ve linear r egr ession analysis For strongly correlating bivariate sam p le data { ( x i , y i ) } i =1 ,...,n for a metrically scaled two- dimensional statist ical var iable ( X , Y ) , i.e., when 0 . 71 ≤ | r | ≤ 1 . 0 , it is m eanin gful to con- struct a mathemati cal mo del of the linear quantitative s t atistical association so diagnosed. The standard meth o d to realise th is by system atic means is due to the German mathematician and astronomer Carl Friedrich Gauß (1777–1 855) and is known by the n am e of descriptiv e linear re- gre ssion analysis ; cf. Gauß (1809) [29]. W e here restrict our attention to the case of simple linear regr ession , which aim s to explain the variability i n one depend ent variable i n terms o f the var iability in a singl e independe nt variable . T o be determined is a best-fit linear model to given biv ariate metrical data { ( x i , y i ) } i =1 ,...,n . The linear model in question can be expressed in mathematical terms by ˆ y = a + bx , (5.1) with unknown regression coeffi cients y -intercep t a and slo pe b . Gauß’ method of least squar es works as fol lows. 5.1 Method of least squares At first, one has to make a choice: assign X the statu s of an indepe ndent variable , and Y the status of a dependent variable (or v ice versa; usually this freedom of choice does exist, unl ess one is testing a specific functional or suspected causal relationship, y = f ( x ) ). Then, considering the measured v alues x i for X as fixed, to be minimised for th e Y -data is the sum of the squar ed vertical deviations of the measured values y i from the model values ˆ y i = a + bx i . T h e l atter are associated with an arbitrary straight li ne through the cloud of data points { ( x i , y i ) } i =1 ,...,n in a scatter plot . This sum, given by S ( a, b ) := n X i =1 ( y i − ˆ y i ) 2 = n X i =1 ( y i − a − bx i ) 2 , (5.2) constitutes a no n -negati ve real-valued function of two variables, a and b . 
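As an aside, the minimisation problem posed by Eq. (5.2) can also be inspected numerically. The following R sketch, with a small and purely illustrative data set, encodes S(a, b) and hands it to a general-purpose optimiser; the result can be compared with the coefficients returned by the R function lm() listed further below.

R:
# Purely illustrative bivariate data (hypothetical values)
x <- c(1.0, 2.0, 3.0, 4.0, 5.0)
y <- c(2.1, 2.9, 3.8, 5.2, 5.9)

# Sum of squared vertical deviations, Eq. (5.2)
S <- function(par) {
  a <- par[1]; b <- par[2]
  sum((y - a - b * x)^2)
}

# Numerical minimisation of S(a, b), starting from (0, 0)
fit.num <- optim(par = c(0, 0), fn = S)
fit.num$par                 # numerically optimal (a, b)

# Comparison with the least squares solution computed by lm()
coef(lm(y ~ x))

The closed-form solution of this minimisation problem is derived next.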
Hence, determining its (local) minimum values entails satisfying (i) the necessary condition of simultaneously vanishing first partial derivatives,

0 \stackrel{!}{=} \frac{\partial S(a,b)}{\partial a} , \qquad 0 \stackrel{!}{=} \frac{\partial S(a,b)}{\partial b} , \quad (5.3)

— this yields a well-determined (2 × 2) system of linear algebraic equations for the unknowns a and b, cf. Ref. [18, Sec. 3.1] —, and (ii) the sufficient condition of a positive definite Hessian matrix H(a, b) of second partial derivatives,

H(a,b) := \begin{pmatrix} \frac{\partial^2 S(a,b)}{\partial a^2} & \frac{\partial^2 S(a,b)}{\partial a \partial b} \\ \frac{\partial^2 S(a,b)}{\partial b \partial a} & \frac{\partial^2 S(a,b)}{\partial b^2} \end{pmatrix} , \quad (5.4)

at the candidate optimal values of a and b. H(a, b) is referred to as positive definite when all of its eigenvalues are positive; cf. Ref. [18, Sec. 3.6].

5.2 Empirical regression line

It is a fairly straightforward algebraic exercise (see, e.g., Toutenburg (2004) [107, p 141ff]) to show that the values of the unknowns a and b which determine a unique global minimum of S(a, b) amount to

b = \frac{s_Y}{s_X}\, r , \qquad a = \bar{y} - b\,\bar{x} . \quad (5.5)

These values are referred to as the least squares estimators for a and b. Note that they are exclusively expressible in terms of familiar univariate and bivariate measures characterising the joint distribution for (X, Y). With the solutions a and b of Eq. (5.5) inserted in Eq. (5.1), the resultant best-fit linear model is given by

\hat{y} = \bar{y} + \frac{s_Y}{s_X}\, r\, (x - \bar{x}) . \quad (5.6)

It may be employed for the purpose of generating interpolating predictions of the kind x \mapsto \hat{y}, for x-values confined to the empirical interval [x_{(1)}, x_{(n)}]. An example of a best-fit linear model obtained by the method of least squares is shown in Fig. 5.1.

R: lm( variable:y ~ variable:x )
EXCEL, OpenOffice: SLOPE, INTERCEPT (dt.: STEIGUNG, ACHSENABSCHNITT)
SPSS: Analyze → Regression → Linear . . .

Note that Eq. (5.6) may be re-expressed in terms of the corresponding Z scores of X and \hat{Y}, according to Eq. (3.18). This yields

\left( \frac{\hat{y} - \bar{y}}{s_Y} \right) = r \left( \frac{x - \bar{x}}{s_X} \right) \quad \Leftrightarrow \quad \hat{z}_Y = r\, z_X . \quad (5.7)

[Plot: “least squares simple linear regression” — temperature [°F] versus ozone [ppb], with the fitted line overlaid.]
Figure 5.1: Example of a best-fit linear model obtained by the method of least squares for the case of the bivariate joint distribution featured in Fig. 4.1. The least squares estimators for the y-intercept and the slope take values a = 69.41 ppb and b = 0.20 (ppb/°F), respectively.

R: data("airquality")
?airquality
regMod <- lm( airquality$Temp ~ airquality$Ozone )
summary(regMod)
plot( airquality$Temp , airquality$Ozone )
abline(regMod)

5.3 Coefficient of determination

The quality of any particular simple linear regression model, i.e., its goodness-of-fit, is assessed by means of the coefficient of determination B (metr). This measure is derived by starting from the algebraic identity

\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 , \quad (5.8)

which, upon conveniently re-arranging, leads to defining a quantity

B := \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} , \quad (5.9)

with range 0 ≤ B ≤ 1. A perfect fit is signified by B = 1, while no fit amounts to B = 0.
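Staying with the airquality example of Figs. 4.1 and 5.1, the following R sketch checks the least squares estimators of Eq. (5.5) against lm(), and anticipates the relation between B and the sample correlation coefficient r stated in Eq. (5.10) below. Since the Ozone column contains missing values, complete cases are selected first; this is an illustrative check, not part of the formal derivation.

R:
data("airquality")
dat <- na.omit(airquality[, c("Ozone", "Temp")])   # drop missing values
x <- dat$Ozone
y <- dat$Temp

# Least squares estimators according to Eq. (5.5)
r <- cor(x, y)
b <- (sd(y) / sd(x)) * r
a <- mean(y) - b * mean(x)
c(a, b)                                  # approx. 69.41 and 0.20, cf. Fig. 5.1

# Comparison with lm()
regMod <- lm(y ~ x)
coef(regMod)

# Coefficient of determination compared with r^2, cf. Eq. (5.10) below
c(summary(regMod)$r.squared, r^2)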
The coef ficient of determinati on provides a descriptive measure for t he propo rt i on of v ariability of Y in a biva riate data set { ( x i , y i ) } i =1 ,...,n that can be accounted for as due to the associati on with X via the simple linear regression model. Note that in simpl e linear regression it h o lds that B = r 2 ; (5.10) see, e.g., T outenbur g (2004) [107, p 150f]). R : summary( lm( variabl e:y ~ varia ble:x ) ) EXCEL, OpenOffic e: RSQ (dt.: BESTIM MTHEITSMAS S ) SPSS: Analyze → Regression → Li near . . . → Statistics . . . : Model fit This concludes Part I of these lecture notes, t he i ntroductory d i scussion on u n i- and biv ariate descriptiv e statistical methods of data analysis . W e wish to encourage the interested reader t o adhere to accepted scientific standards when acti vely getting in volve d with data analysi s her/him- self. Thi s entails, amongst ot her aspects, foremost the truthful document at i on of all data taken into account in a specific analysis condu cted. Features fac ilitatin g understanding such as visu- alisations of em pirical distributions by means of, where appropriate, histograms, b ar charts, box plots or scatter plot s , or providing th e values of fi ve number summ aries, sample means, sample standard deviations, standardis ed ske wness and excess kurtosis measures, or s am ple correlation coef ficients shoul d be comm onplace in any kind of research report. It must b e a prime objective of the researcher to empower potential readers to retrace t h e i n ferences made by her/him. T o set the stage for the application of inferential stati stical methods in P art III, we now turn to re view the element ary concepts underlyi ng P robability Theory , predominantly as in terpreted in the fre quentist approach to this topic. Chapter 6 Elements of pr obability theory All examples of i nfer ential stati s tical methods of data a nalysis to be presented in Chs. 1 2 and 13 hav e been deve loped in t h e con text of the s o -called fr equentist approach to Probability Theory . 1 The issue in Infer ential Statistics is to estimate the pl ausibility or likelihood of hypot h eses g iv en the observational evidence for th em. The fr equentist approach was pi o neered by the Italian mathematician, physician, astrologer , phi losopher and gambl er Girolamo Cardano (1501–1576), the French lawyer and amateur mathematician Pierre d e Fermat (16 01–1665), the French math- ematician, phys icist, in ventor , wri t er and Catholic phil o sopher Blaise Pascal (1623–1662), th e Swiss mathematician Jakob Bernoull i (1654–1705), and the French m ath ematician and ast ron omer Marquis Pierre Simon de Laplace (1749–1827 ). It is deeply rooted in the two fundamental as- sumption s t h at any particular random experiment can be repeated arbitrarily often (i) under t h e “same conditio n s, ” and (ii) completely “independent of one another , ” so t hat a theoretical basis is giv en for defining allegedly “ objective pr oba bilities ” for random events and hyp o theses vi a the r elative fr equencies of very long sequences of repetition of the same random experiment. 2 This is a highly i d ealised viewpoint, ho wev er , which shares only a limited degree of similarity with the actual conditio n s pertaining to an observer’ s resp. experimentor’ s reality . Renowned textbooks adopting the fr equentist viewpoint of P r obability Theory and Infer ential Statistics are, e.g. , Cramér (1946) [13] and Feller (1968) [21]. 
Not everyone in Statistics is entirely happy , though, with the philo sophy un d erlying the fre- quentist appr oach to introducing the concept of probability , as a number of it s central ideas rely on un observed data (information). A complementary viewpoint is taken by the frame- work which orig i nated from the work of the English math ematician and Presbyterian mini s ter Thomas Bayes (1702–1761), and l at er of Lapl ace, and so is common l y referred to as t h e Bayes– Laplace appr oach ; cf. Bayes (176 3 ) [2] and L aplace (1812) [58 ]. A strikin g conceptual difference to the frequ entist a ppr oach consists in i t s use of prior , allegedly “ s u b jective pr obabi lities ” for ran- dom eve nts and hypotheses, quantifyi n g a person s’ s individual reasonabl e degree -of-belief in their 1 The orig in of the term “proba b ility” is traced back to the Latin word pr ob abilis , which the Roman p hilosoph e r Cicero (10 6 BC–43 BC) used to captu r e a notio n o f plausibility or likelihood; see Mlo dinow (20 08) [73, p 32 ]. 2 A special role in the context of the frequ entist app roach to Probab ility Th eory is assumed by Jakob Bern oulli’ s law of large n u mbers, as well as the concep t of independe ntly an d identically distributed (in short: “i.i.d . ”) random variables; we will discu ss these issues in Sec. 8.15 below . 41 42 CHAPTER 6. ELEMENTS OF PR OBABI LITY THEOR Y likelihood, which are subsequently updated by analysi ng relev ant empiri cal d ata. 3 Reno wned text- books adopt ing the Bayes–Laplace v i ewpoint of Pr obability Theory and Inferent ial Statistics are, e.g., Jeffr eys (1939) [44] and Jaynes (2003) [43], while general information re garding the Bayes–Laplace appr oach is a vailable from the websi te bayes.wustl.e du . More recent t ext- books, which assist in the implementation of advanced computati onal routines, hav e been issu ed by Gelman et al (2014) [30] and by McElreath (20 1 6) [69]. A discus s ion o f the pros and cons of either of these two competing approaches to Probability Theory can be found, e.g., i n Sivia and Skilling (2006) [92, p 8ff], or in Gilbo a (200 9 ) [31, Sec. 5.3]. A com m on denominator of both frameworks, freque ntist and Bayes–Laplace , is t he attem p t to quantify a no tion of uncertainty t hat can be related to in formal treatments of decision-making . In the following we turn to discuss the g eneral p ri n ciples on which Prob ability Theory is built. 6.1 Random e vents W e begin by introducing some basic formal constructions and corresponding terminology used in the fre quentist approach to Pr o babili ty Theory : • Random experiments : Random experiments are experiments which can be repeated arbi- trarily often under identical conditions , with events — also called outcomes — th at can- not be predicted wit h certainty . W ell-kno wn si mple examples are found amongst games of chance such as tossing a coin, rolli ng dice, or p laying roulette. • Sample space Ω = { ω 1 , ω 2 , . . . } : The sampl e space asso ciated wit h a random experiment is const i tuted by the set of all possible elementary ev ents (or elementary outcomes) ω i ( i = 1 , 2 , . . . ), which are signified by t heir p roperty of mutual exclusivity . 
The sample space Ω of a random experiment may contain ei t her (i) a finite number n of elem entary e vents; then | Ω | = n , or (ii) countably many elementary e vents in the sense of a one-to-one correspondence with the set of natural numbers N , or (iii) uncountabl y may elements in the s ens e of a one-to-one correspond ence with the set o f real numbers R , or an open or clos ed s ubset thereof. 4 The essent i al concept of t he sam p l e space associated wi th a random experiment was intro- duced to Probability Theory by the Italian math em atician Girolamo Cardano (1501–1576 ); see Cardano (1564) [10], Mlodi now (2008) [73, p 42], and Bernstein (1998 ) [3, p 47ff ]. • Random ev ents A, B , . . . ⊆ Ω : Random events are formally defined as all kinds of subs ets of Ω that can be formed from the elementary ev ents ω i ∈ Ω . 3 Anscombe and Aum ann (196 3) [1] in their seminal paper ref er to “objective p robab ilities” as associated with “roulette lo tter ies, ” and to “subjective pr obabilities” as associated with “horse lotteries. ” Sa vage (195 4) [89] e m ploys the alternative termin ology of disting uishing between “objectivistic probab ilities” and “per sonalistic probabilities. ” 4 For reason s of definiten ess, we will assume in th is case that the sample space Ω associated with a rand om exper- iment is c o mpact. 6.1. RANDOM EVENTS 43 • Certain ev ent Ω : The certain event is synonymous with th e sample space itsel f. When a particular random experiment is con ducted, “something will happen for sure. ” • Impossi ble event ∅ = {} = ¯ Ω : Th e impossible ev ent is the natural complement to the certain e vent. When a particular random experiment is conducted, “it i s n o t p ossible that nothing will happen at all. ” • Event s pace P ( Ω ) := { A | A ⊆ Ω } : The ev ent space, als o referre d to as the power set of Ω , is the s et of all possible subsets (random events!) th at can be formed from element ary e vents ω i ∈ Ω . Its size (or cardinality) is gi ven by |P ( Ω ) | = 2 | Ω | . The e vent s pace P ( Ω ) constitutes a so-called σ –algebra associated with the sample sp ace Ω ; cf. Rinne (2008 ) [87 , p 177]. When | Ω | = n , i.e., wh en Ω is finite, then |P ( Ω ) | = 2 n . In the formulation o f probability theoretical laws and computational rul es, the following set oper- ations and identiti es prove useful. Set operations 1. ¯ A = Ω \ A — complementation of a set (or event) A (“not A ”) 2. A \ B = A ∩ ¯ B — formation of the differ ence of sets (or events) A and B (“ A , but not B ”) 3. A ∪ B — formation of the union of sets (or e vents) A and B , otherwis e referre d to as t he disjunction of A and B (“ A or B ”) 4. A ∩ B — form ati on of the intersection of sets (or ev ents) A and B , otherwise referred to as the conjunction of A and B (“ A and B ”) 5. A ⊆ B — inclusion of a set (or ev ent) A in a set (or ev ent) B (“ A is a s ubset of o r equal to B ”) Computational rules and identities 1. A ∪ B = B ∪ A and A ∩ B = B ∩ A ( commutativity ) 2. ( A ∪ B ) ∪ C = A ∪ ( B ∪ C ) and ( A ∩ B ) ∩ C = A ∩ ( B ∩ C ) ( associativity ) 3. ( A ∪ B ) ∩ C = ( A ∩ C ) ∪ ( B ∩ C ) and ( A ∩ B ) ∪ C = ( A ∪ C ) ∩ ( B ∪ C ) ( distrib utivity ) 4. A ∪ B = ¯ A ∩ ¯ B and A ∩ B = ¯ A ∪ ¯ B ( de Morgan’ s laws ) Before addressing the central axioms of Prob ability Theory , we first provide the following i m - portant definition. Def.: Suppose gi ven a comp a ct sam p le space Ω of some random experiment. 
Then one understands by a finite complete partition of Ω a set of n ∈ N random events {A_1, . . . , A_n} such that

(i) A_i ∩ A_j = ∅ for i ≠ j, i.e., they are pairwise disjoint (mutually exclusive), and

(ii) \bigcup_{i=1}^{n} A_i = \Omega, i.e., their union is identical to the full sample space.

6.2 Kolmogorov's axioms of probability theory

It took a fairly long time until, in 1933, a unanimously accepted basis of Probability Theory was established. In part the delay was due to problems with providing a unique definition of probability, and how it could be measured and interpreted in practice. The situation was resolved only when the Russian mathematician Andrey Nikolaevich Kolmogorov (1903–1987) proposed to discard the intention of providing a unique definition of probability altogether, and to restrict the issue instead to merely prescribing in an axiomatic fashion a minimum set of properties any probability measure needs to possess in order to be coherent and consistent. We now recapitulate the axioms that Kolmogorov put forward; cf. Kolmogoroff (1933) [50].

For a given random experiment, let Ω be its sample space and P(Ω) the associated event space. Then a mapping

P : P(\Omega) \rightarrow \mathbb{R}_{\geq 0} \quad (6.1)

defines a probability measure with the following properties:

1. for all random events A ∈ P(Ω) (non-negativity),
P(A) \geq 0 , \quad (6.2)

2. for the certain event Ω ∈ P(Ω) (normalisability),
P(\Omega) = 1 , \quad (6.3)

3. for all pairwise disjoint random events A_1, A_2, . . . ∈ P(Ω), i.e., A_i ∩ A_j = ∅ for all i ≠ j (σ-additivity),
P\left( \bigcup_{i=1}^{\infty} A_i \right) = P(A_1 \cup A_2 \cup \ldots) = P(A_1) + P(A_2) + \ldots = \sum_{i=1}^{\infty} P(A_i) . \quad (6.4)

The first two axioms imply the property

0 \leq P(A) \leq 1 , \quad \text{for all } A \in P(\Omega) ; \quad (6.5)

the expression P(A) itself is referred to as the probability of a random event A ∈ P(Ω). A less strict version of the third axiom is given by requiring only finite additivity of a probability measure. This means it shall possess the property

P(A_1 \cup A_2) = P(A_1) + P(A_2) , \quad \text{for any two } A_1, A_2 \in P(\Omega) \text{ with } A_1 \cap A_2 = \emptyset . \quad (6.6)

The triplet (Ω, P(Ω), P) constitutes a special case of a so-called probability space.

The following consequences for random events A, B, A_1, A_2, . . . ∈ P(Ω) can be derived from Kolmogorov's three axioms of probability theory; cf., e.g., Toutenburg (2005) [108, p 19ff]. Their implications can be conveniently visualised by means of Venn diagrams, named in honour of the English logician and philosopher John Venn FRS FSA (1834–1923); see Venn (1880) [113], and also, e.g., Wewel (2014) [116, Ch. 5].

Consequences
1. P(\bar{A}) = 1 - P(A)
2. P(\emptyset) = P(\bar{\Omega}) = 0
3. If A ⊆ B, then P(A) ≤ P(B).
4. P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2).
5. P(B) = \sum_{i=1}^{n} P(B \cap A_i), provided the n ∈ N random events A_i constitute a finite complete partition of the sample space Ω.
6. P(A \setminus B) = P(A) - P(A \cap B).

Employing its complementation \bar{A} and the first of the consequences stated above, one defines by the ratio

O(A) := \frac{P(A)}{P(\bar{A})} = \frac{P(A)}{1 - P(A)} \quad (6.7)

the so-called odds of a random event A ∈ P(Ω).
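The consequences listed above, as well as the odds of Eq. (6.7), can be checked numerically for any finite sample space once elementary probabilities have been assigned explicitly. The following R sketch does this for a hypothetical six-element sample space; the probability assignment and the events A and B are purely illustrative.

R:
# Hypothetical finite sample space with assigned elementary probabilities
omega <- c("w1", "w2", "w3", "w4", "w5", "w6")
p     <- c(0.10, 0.25, 0.15, 0.20, 0.05, 0.25)   # non-negative, sums to 1
sum(p)                                            # normalisability, Eq. (6.3)

# Probability measure for arbitrary events (subsets of omega)
P <- function(A) sum(p[omega %in% A])

A <- c("w1", "w2", "w3")
B <- c("w3", "w4")

# Consequence 1: P(not A) = 1 - P(A)
c(P(setdiff(omega, A)), 1 - P(A))

# Consequence 4: P(A or B) = P(A) + P(B) - P(A and B)
c(P(union(A, B)), P(A) + P(B) - P(intersect(A, B)))

# Consequence 6: P(A \ B) = P(A) - P(A and B)
c(P(setdiff(A, B)), P(A) - P(intersect(A, B)))

# Odds of the event A, Eq. (6.7)
P(A) / (1 - P(A))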
The reno wned Israeli–US-American experimental psychologist s Daniel Kahneman and Amos Tversky (the latter of which deceased in 1996, aged fifty-nine) refer to the third of the conse- quences stated above as the extension rule ; s ee Tversky and Kahneman (1983) [111, p 294]. It provides a cornerstone to their remarkable i n vestigations on th e “intuitive statistics” applied by Humans in e veryday decision-making , which focus in particular on the conjunction rule , P ( A ∩ B ) ≤ P ( A ) and P ( A ∩ B ) ≤ P ( B ) , (6.8) and the associated disjunction rule , P ( A ∪ B ) ≥ P ( A ) and P ( A ∪ B ) ≥ P ( B ) . (6.9) Both may be perceiv ed as s ubcases of the fourth law abov e, which is occasionally referred to as the con vexity property of a probabilit y m easure; cf. G i lboa (2009) [31, p 160]. By means of their famous “Linda the bank teller” example i n particular , Tversky and Kahneman (1983) [111, 46 CHAPTER 6. ELEMENTS OF PR OBABI LITY THEOR Y p 297ff] were able to demonst rate the startli n g empi ri cal fact that the conjunction rule is frequently violated in ev eryday (intui tiv e) decision-m aking; in their view , in consequence of decision-m akers often resorting to a so -called repr esentat iveness heuristic as an aid in correspond i ng situatio n s; see also Kahneman (2011) [46, Sec. 15 ]. In recognition of their as much intriguing as groun d breaking work, which sparked t he discipline of Beha vioural Economics , Daniel Kahneman was aw arded the Sveriges Riksbank Prize i n Economic Sciences in Memory of Alfred Nobel in 2002. 6.3 Laplacian random exp eriments Games of chance with a finite number n of possi ble mutually exclusi ve element ary outcomes, such as tossing a singl e coin once, rolling a single dye on ce, or selecting a sing le playing card from a deck of 3 2, belong to the simp l est kinds of random experiments. In this context, there exists a clear -cut frequentist notio n of a unique “ obj ective pr obability ” associated with any ki nd of possible random e vent (outcome) that may occur . Such probabili ties can b e com - puted according t o a straight forward prescription d ue to the French mathematician and astronomer Marquis Pierre Simon de Laplace (1749–1827 ). The prescripti o n rests on the assumptio n that the device generating the random events i s a “fair” (i.e., unbiased) one. Consider a random experiment, the n elementary events ω i ( i = 1 , . . . , n ) of which that const itute the associated sample space Ω are supposed to be “equally likely , ” m eaning they are assi g ned equal pr obabili ty : P ( ω i ) = 1 | Ω | = 1 n , for all ω i ∈ Ω ( i = 1 , . . . , n ) . (6.10) All random experiments o f this nature are referred to as Laplacian random experiments . Def.: For a Laplacian random experiment, the probabi lity of an arbitrary random e vent A ∈ P ( Ω ) can be computed according to the rule P ( A ) := | A | | Ω | = Number of cases fa vourable to event A Number of all possible cases . (6.11) Any probabi l ity m easure P which can be constructed in th i s fashion is called a Laplacian proba- bility measur e . The syst em atic counting of the num b ers o f possible out comes of random experiments i n general is the central theme of combinatorics . W e now briefly address its main consi d erations. 6.4 Combinat orics At the heart of combinatorical considerations is the well-known urn model . T h is su pposes g iven an urn containing N ∈ N ball s that are either (a) all differe nt, and thu s can be uniquely distinguished from one another , or 6.4. 
COMBINA TORICS 47 (b) there are s ∈ N ( s ≤ N ) subsets o f ind i stinguish abl e like balls, of sizes n 1 , . . . , n s resp., such that n 1 + . . . + n s = N . The first systematic d evelopments in Combinatorics date back to t he Italian ast ro n o mer , physicist, engineer , philosopher , and mathemati ci an Galileo G al i lei (1564–1642) and the French mathemati- cian Blaise Pascal (16 23–1662); cf. Mlodinow (2008) [73, p 62ff]. 6.4.1 Permutations Pe rmutations relate to the nu mber of disting u i shable possibili ties o f arranging N balls in an or- dered sequences. Altogeth er , for cases (a) resp. (b) one finds that there are a total num b er of (a) all balls differe nt (b) s subsets of like balls N ! N ! n 1 ! n 2 ! · · · n s ! diffe rent possibilit ies. Remember that t h e factorial of a natural number N ∈ N i s defined by N ! := N × ( N − 1 ) × ( N − 2) × · · · × 3 × 2 × 1 . (6.12) R : factorial( N ) 6.4.2 Combinations and variations Combinations and variations ask for the total number of distinguishabl e possi b ilities of selecting from a collection of N balls a s am ple of size n ≤ N , while di fferentiating between cases w h en (a) the order in which balls were selected i s either neglected or ins tead account ed for , and (b) a ball that was selected once eit her cannot be s elected again or in deed can be selected again as often as a ball is being drawn. These considerations result in the following cases o f different possibiliti es: no repetition with repetition combinations (order neglected)  N n   N + n − 1 n  var iations (order accounted for)  N n  n ! N n 48 CHAPTER 6. ELEMENTS OF PR OBABI LITY THEOR Y Note that, herein, the binomial coefficient for t wo natural numbers n, N ∈ N , n ≤ N , introduced by Blaise Pascal (1623– 1 662), is defined by  N n  := N ! n !( N − n )! . (6.13) For fix ed value of N and running value of n ≤ N , it generates the positive integer entries o f Pascal’ s well-known numerical triangle; s ee, e.g., Mlodinow (2008) [73, p 72ff] . The bin o mial coef ficient satisfies the identity  N n  ≡  N N − n  . (6.14) R : choose( N , n ) T o conclud e thi s chapter , w e turn to discuss t he essential concept of conditional probabilities of random e vents. 6.5 Condition a l pr o babilities Consider so me random experiment with sample space Ω , e vent space P ( Ω ) , and a well-defined, unique probability measure P over P ( Ω ) . Def.: For random events A, B ∈ P ( Ω ) , with P ( B ) > 0 , P ( A | B ) := P ( A ∩ B ) P ( B ) (6.15) defines the conditional pr obabili ty of A to occur , giv en that it i s kn own that B occurred before. Analogously , one d efines a conditi o nal probability P ( B | A ) with the roles of random ev ent s A and B swi tched, provided P ( A ) > 0 . Note that since, by Eq. (6.5), 0 ≤ P ( A | B ) , P ( B | A ) ≤ 1 , th e implication of definition (6.15) is that the conju nction rul e (6.8) must always be satisfied. Def.: Random events A, B ∈ P ( Ω ) are called mutually stochastically independen t , if, s imulta- neously , the conditi o ns P ( A | B ) ! = P ( A ) , P ( B | A ) ! = P ( B ) Eq. 6.15 ⇔ P ( A ∩ B ) = P ( A ) P ( B ) (6.16) are s atisfied, i.e., when for both random events A and B the a posteri ori pr obabilities P ( A | B ) and P ( B | A ) coincide with the respective a priori pr obabilities P ( A ) and P ( B ) . For appl i cations, t he following two prominent laws of Probability Theory prove essential. 
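Before turning to these two laws, the counting rules of Sec. 6.4 may be illustrated with a short R sketch; the values N = 10 and n = 3 are chosen arbitrarily for demonstration purposes.

R:
# Illustrative counting, cf. Secs. 6.4.1 and 6.4.2
N <- 10   # number of balls in the urn
n <- 3    # size of the sample drawn

factorial(N)                      # permutations of N distinguishable balls

choose(N, n)                      # combinations, no repetition
choose(N + n - 1, n)              # combinations, with repetition
choose(N, n) * factorial(n)       # variations, no repetition
N^n                               # variations, with repetition

# Symmetry of the binomial coefficient, Eq. (6.14)
c(choose(N, n), choose(N, N - n))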
6.5.1 Law of total pr obability For a random experiment wi t h probability space ( Ω , P , P ) , it holds by the law of total probab ility that for any random event B ∈ P ( Ω ) P ( B ) = m X i =1 P ( B | A i ) P ( A i ) , (6.17) 6.5. CONDITIONAL PR OBAB ILITIES 49 provided the random events A 1 , . . . , A m ∈ P ( Ω ) constit ute a finite complete partition of Ω into m ∈ N pairwise disjoint events . The content of this law may be conv eni ently visualised by means of a V enn diagram. 6.5.2 Bayes’ t heor em This important result is due to the English mathematici an and Presbyterian minist er Thomas Bayes (1702–1761); see t he posthu mous publi cati o n Bayes (1763) [2]. For a random experiment wi th probability space ( Ω , P , P ) , it states that, given (i) random events A 1 , . . . , A m ∈ P ( Ω ) which constitute a finite complete partition of Ω i n to m ∈ N pairwise disjoint even ts , (ii) P ( A i ) > 0 for all i = 1 , . . . , m , wit h m X i =1 P ( A i ) = 1 by Eq. (6.3), and (iii) a random e vent B ∈ P ( Ω ) with P ( B ) Eq. 6.17 = m X i =1 P ( B | A i ) P ( A i ) > 0 that is k nown to ha ve occurred, the identity P ( A i | B ) = P ( B | A i ) P ( A i ) m X j =1 P ( B | A j ) P ( A j ) (6.18) applies. This form of the theorem was give n by L aplace (1774) [56]. By Eq. (6.3), i t necessar- ily follows that m X i =1 P ( A i | B ) = 1 . Again, the content of Bayes’ theorem m ay be con veniently visualised by means of a V enn diagram. Some of the differ ent terms appearing in Eq. (6.18) have been given names in their own righ t : • P ( A i ) is referred to as the prior prob ability of random event, or hypot h esis, A i , • P ( B | A i ) is the li kelihood of random event, or emp i rical e v idence, B , given random ev ent, or hypothesis, A i , and • P ( A i | B ) is called the posterior pr obabili ty of random event, or hypo thesis, A i , given ran- dom e vent, o r empirical e vi dence, B . The most common interpretation o f Bayes’ theorem is that it essentiall y provides a m eans for computing the posterior pr obability of a random event, or hypoth esis, A i , given inform ation on the factual realisation of an ass o ciated random event, or evidence, B , in terms of the product of the likelihood of B , gi ven A i , and the prior probability of A i , P ( A i | B ) ∝ P ( B | A i ) × P ( A i ) . (6.19) 50 CHAPTER 6. ELEMENTS OF PR OBABI LITY THEOR Y This result is at th e heart of the int erpretati o n th at empirical learning amounts to updating the prior “ subjective pr o bability ” one has assi gned to a specific random event, or hypot hesis, A i , in or- der to quantify one’ s ini tial reasonable degree-of-be lief in its occurrence resp. in its truth content, by means of adequate experimental or observational data and corresponding t heoretical considera- tions; see, e.g., Sivia and Skillin g (2006) [92 , p 5ff], Gelman et al (2014) [30, p 6ff ], or McEl reath (2016) [69, p 4ff]. The Bayes–Laplace approach to tackling quanti tativ e–statistical prob lems in Econometrics was pioneered by Zellner in th e early 1970ies; see the 1996 reprint of his renowned 1971 mo no- graph [123]. A recent thoro u g h introduction into its m ain considerations is provided by the g radu- ate textbook by Greenber g (20 13) [35]. 
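Returning briefly to the computational content of Eqs. (6.17) and (6.18): once the prior probabilities P(A_i) and the likelihoods P(B|A_i) are specified numerically, both the law of total probability and Bayes' theorem can be evaluated directly. The following R sketch uses purely illustrative numbers for a complete partition into three events.

R:
# Purely illustrative prior probabilities for a complete partition A1, A2, A3
prior <- c(A1 = 0.50, A2 = 0.30, A3 = 0.20)         # sums to 1

# Purely illustrative likelihoods P(B | Ai) of some event B
lik <- c(A1 = 0.02, A2 = 0.05, A3 = 0.10)

# Law of total probability, Eq. (6.17): P(B)
pB <- sum(lik * prior)
pB

# Bayes' theorem, Eq. (6.18): posterior probabilities P(Ai | B)
posterior <- lik * prior / pB
posterior
sum(posterior)                                      # equals 1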
A particularly promi nent applicatio n of this frame work in Econometrics is gi ven by proposals to t he mat h ematical m odelling of economic agent s’ decision-making (in the sense of choice be- haviour) under conditi o ns of uncertainty , which, fundament ally , assum e rational behaviour on the part of t he agent s; see, e.g., the graduate textbook by Gilboa (2009) [31], and the brief re views by Svetlov a and van Elst (2012, 2014) [103 , 104], as well as references therein. Psychological di- mensions of decision-making , on the other hand, such as the empirically established existence of reference points, loss aver sion, and dis tortion of probabilities i nto correspond i ng decisi on weights, hav e been account ed for i n Kahneman and Tversky’ s (1979) [47 ] Pr o spect Theory . Chapter 7 Discr ete and continuous random variables Applications of inferent ial statistical methods root ed in the freque ntist appr o a ch to Prob ability Theory , some of which are t o be discussed in Chs. 12 and 13 b el ow , rest fund am entally on the concept of a probability-dependent quantity arising in the context of random experiments that is referre d to as a random variable . T h e present chapter aims to provide a basic int roduction to the g eneral properties and characteristic features of random variables. W e begin by stating the definition of this concept. Def.: A real-va lued one-dimensional random variable i s defined as a one-to-one mapping X : Ω → D ⊆ R (7.1) of th e sample space Ω of some rando m experiment with associated probability space ( Ω , P , P ) into a subset D of the real numbers R . Depending on the nature of the s pectrum o f values of X , we will distingui sh in the following between random variables of the discrete and continuous kinds. 7.1 Discrete random v ariables Discre te random variables are signified by the existence of a finite or countably infinite Spectrum of values: X 7→ x ∈ { x 1 , . . . , x n } ⊂ R , with n ∈ N . (7.2) All values x i ( i = 1 , . . . , n ) in this spectrum, referred t o as pos sible realisations of X , are assigned individual probabilities p i by a real-v al u ed Pr obability function: P ( X = x i ) = p i for i = 1 , . . . , n , (7.3) with properties (i) 0 ≤ p i ≤ 1 , and ( non-negativity ) 51 52 CHAPTER 7. D ISCRETE AND CONTINUOUS RANDOM V ARIABLES (ii) n X i =1 p i = 1 . ( normalisability ) Specific distributional features of a d iscrete random var iable X deriving from its probabil ity func- tion P ( X = x i ) are encoded in the associated theoretical Cumulative distribution function ( cdf ): F X ( x ) = cdf ( x ) := P ( X ≤ x ) = X i | x i ≤ x P ( X = x i ) . (7.4) The cdf exhibits t h e asymptotic beha viour lim x →−∞ F X ( x ) = 0 , lim x → + ∞ F X ( x ) = 1 . (7.5) Information o n the central tend ency and the variability o f a discrete random va riable X is quantified in terms of its Expectation value and variance: E( X ) := n X i =1 x i P ( X = x i ) (7.6) V ar( X ) := n X i =1 ( x i − E( X )) 2 P ( X = x i ) . (7.7) One of the first occurrences o f the noti on of th e expectation v alue of a random variable relates to the famous “wager” put forward by the French mathemati cian Blaise Pasca l (1623–166 2); cf. Gilboa (2009) [31, Sec. 5.2]. By the so-called shift theor em it holds that the v ariance may alternativ ely b e obtained from the computationall y more effic ient formula V ar( X ) = E  ( X − E( X )) 2  = E( X 2 ) − [E( X )] 2 . 
(7.8) Specific values of E( X ) and V ar( X ) will be denoted throughout by the Greek letters µ and σ 2 , respectiv el y . The standard deviation of X amounts to p V ar( X ) ; its specific values will be denoted by σ . The e va luation of event prob abilities for a discrete random variable X wi th kno wn probabili ty function P ( X = x i ) follows from the Computational rules: P ( X ≤ d ) = F X ( d ) (7.9) P ( X < d ) = F X ( d ) − P ( X = d ) (7.10) P ( X ≥ c ) = 1 − F X ( c ) + P ( X = c ) (7.11) P ( X > c ) = 1 − F X ( c ) (7.12) P ( c ≤ X ≤ d ) = F X ( d ) − F X ( c ) + P ( X = c ) (7.13) P ( c < X ≤ d ) = F X ( d ) − F X ( c ) (7.14) P ( c ≤ X < d ) = F X ( d ) − F X ( c ) − P ( X = d ) + P ( X = c ) (7.15) P ( c < X < d ) = F X ( d ) − F X ( c ) − P ( X = d ) , (7.16) 7.2. CONTINUOUS RANDOM V ARIABLES 53 where c and d denote arbitrary lower and up p er cut-of f v alues imposed on the spectrum of X . In applications it is frequently of interest to know the values of a discrete cdf ’ s α –quantiles: These are realisations x α of X specifically determined by the condition that X take values x ≤ x α at least with probability α (for 0 < α < 1 ), i.e., F X ( x α ) = P ( X ≤ x α ) ! ≥ α and F X ( x ) = P ( X ≤ x ) < α for x < x α . (7.17) Occasionally , α –quanti les of a probability distribution are also referred to as percentile values . 7.2 Continuou s random v ariables Continuous random variables possess an uncountably infinite Spectrum of values: X 7→ x ∈ D ⊆ R . (7.18) It is, therefore, no l onger meaningful to assign probabilities to i ndividual realisations x of X , but only to infinitesimally small intervals d x ∈ D ins tead, by means of a real-v alued Pr obability density function ( pdf ): f X ( x ) = pdf ( x ) . (7.19) Hence, approximately , P ( X ∈ d x ) ≈ f X ( ξ ) d x , for some representative ξ ∈ d x . The pdf of an arbitrary con tinuous random v ariable X has the defining properties: (i) f X ( x ) ≥ 0 for all x ∈ D , ( non-negativity ) (ii) Z + ∞ −∞ f X ( x ) d x = 1 , and ( normalisability ) (iii) f X ( x ) = F ′ X ( x ) . ( link to cdf ) The ev al u at i on of event pr obabilities for a continuo u s random variable X rests o n the associated theoretical Cumulative distribution function ( cdf ): F X ( x ) = cdf ( x ) := P ( X ≤ x ) = Z x −∞ f X ( t ) d t . (7.20) 54 CHAPTER 7. D ISCRETE AND CONTINUOUS RANDOM V ARIABLES Event probabi lities for X are th en to be obtained from the Computational rules: P ( X = d ) = 0 (7.21) P ( X ≤ d ) = F X ( d ) (7.22) P ( X ≥ c ) = 1 − F X ( c ) (7.23) P ( c ≤ X ≤ d ) = F X ( d ) − F X ( c ) , (7.24) where c and d denote arbitrary lower and upper cut-off v alues im posed o n the sp ectrum of X . Note that, again, the cdf exhibits the asy mptotic properties lim x →−∞ F X ( x ) = 0 , lim x → + ∞ F X ( x ) = 1 . (7.25) The central tendency and the variabilty of a continuou s random variable X are quantified by its Expectation value and variance: E( X ) := Z + ∞ −∞ xf X ( x ) d x (7.26) V ar( X ) := Z + ∞ −∞ ( x − E( X ) ) 2 f X ( x ) d x . (7.27) Again, by the shift theorem t h e variance may alternativ ely be obt ained from the com putationally more effi cient formu la V ar( X ) = E  ( X − E( X )) 2  = E( X 2 ) − [E( X )] 2 . Specific values o f E( X ) and V ar( X ) will be d enoted throu ghout by µ and σ 2 , respecti vely . The standard deviation of X amounts to p V ar( X ) ; it s specific values w i ll be denoted by σ . 
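The defining integrals of this section can be evaluated numerically in R for any explicitly given pdf. The following sketch uses a purely illustrative continuous random variable with pdf f_X(x) = 2x on the interval [0, 1] (vanishing outside it), and reproduces the expectation value, the variance via the shift theorem, the cdf, and an event probability.

R:
# Purely illustrative pdf of a continuous random variable with D = [0, 1]
f <- function(x) 2 * x

# Normalisability, cf. property (ii) of a pdf
integrate(f, lower = 0, upper = 1)$value             # equals 1

# Expectation value and variance, Eqs. (7.26), (7.27), via the shift theorem
EX   <- integrate(function(x) x   * f(x), 0, 1)$value   # 2/3
EX2  <- integrate(function(x) x^2 * f(x), 0, 1)$value   # 1/2
VarX <- EX2 - EX^2                                      # 1/18
c(EX, VarX, sqrt(VarX))

# cdf, Eq. (7.20), and an event probability, Eq. (7.24)
Fx <- function(x) integrate(f, 0, x)$value           # F_X(x) = x^2 on [0, 1]
Fx(0.5)                                              # P(X <= 0.5) = 0.25
Fx(0.8) - Fx(0.3)                                    # P(0.3 <= X <= 0.8) = 0.55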
The constructio n of interval estimates for unknown distribution parameters of continuo u s one- dimensional random variables X in give n t ar get populati ons Ω , and null hypothesi s signi ficance testing (to be discussed later in Chs. 12 and 13), both require explicit knowledge of the α – quantiles associated with the cdf s of the X s. Generally , these are defined as follows. α –quantiles: X take values x ≤ x α with probability α (for 0 < α < 1 ), i.e., P ( X ≤ x α ) = F X ( x α ) ! = α F X ( x ) is strictly monotonously increasi ng z}|{ ⇔ x α = F − 1 X ( α ) . (7.28) Hence, α –quanti les of t he probability dis t ribution for a continuous one-dimensional random vari- able X are determined by t he in verse c df , F − 1 X . For given α , the spectrum of X is thus nat u rally partitioned int o domains x ≤ x α and x ≥ x α . Occasional l y , α –quantiles o f a probability distribu- tion are also referred to as per centile values . 7.3 Skewness and excess kurtosis In analogy to the descriptive case of Sec. 3 . 3, dimensionless measur es of r ela tive distortion char- acterising t he shape of the probability dis tribution for a discrete o r a continuou s one-dimension al random va riable X are defined by the 7.4. LORENZ CUR V E FOR CONTINUOUS RANDOM V ARIABLES 55 Skewn ess and excess kurtosis: Sk ew ( X ) := E [( X − E( X )) 3 ] [V ar( X )] 3 / 2 (7.29) Kurt( X ) := E [( X − E( X )) 4 ] [V ar( X )] 2 − 3 , (7.30) giv en V ar( X ) > 0 ; cf. Rinne (2008) [87, p 196]. Specific values of Sk ew ( X ) and Kurt( X ) may be denoted by γ 1 and γ 2 , respecti vely . 7.4 Lor enz c u rve f or continuous ra ndom variab les For a contin u ous one-dimensi onal random variable X , the Loren z curve expressing q u alitative ly the degree of concentration in volved i n its associated probability distribution of is defined b y L ( x α ) = Z x α −∞ tf X ( t ) d t Z + ∞ −∞ tf X ( t ) d t , (7.31) with x α denoting a particular α –quanti le of the di stribution in questio n . 7.5 Linear transf ormat i o ns of random variables Linear transformations of real-valued one-dimensional random variables X are determined by the two-parameter relatio n Y = a + bX with a, b ∈ R , b 6 = 0 , (7.32) where Y d eno t es the resultant ne w random variable. Tr ansformations of random variables of thi s kind hav e t he fol lowing eff ects on the computati on of expectation values and variances. 7.5.1 Effect on expectation values 1. E( a ) = a 2. E( bX ) = b E( X ) 3. E( Y ) = E( a + bX ) = E( a ) + E( bX ) = a + b E( X ) . 7.5.2 Effect on variances 1. V ar( a ) = 0 2. V ar( bX ) = b 2 V ar( X ) 3. V ar( Y ) = V ar( a + bX ) = V ar( a ) + V ar( bX ) = b 2 V ar( X ) . 56 CHAPTER 7. D ISCRETE AND CONTINUOUS RANDOM V ARIABLES 7.5.3 Standardisation Standardisation of an arbit rary on e-dimensional random variable X , with p V ar( X ) > 0 , impli es the determination of a special linear transformation X 7→ Z according to Eq. (7.32) such th at the expectation value and variance of X are re-scaled to their simplest values possible, i.e., E( Z ) = 0 and V ar( Z ) = 1 . Hence, t h e two (in part non-linear) conditions 0 ! = E( Z ) = a + b E( X ) and 1 ! = V ar( Z ) = b 2 V ar( X ) , for unknowns a and b , need to be satisfied s i multaneously . These are sol ved by , respecti vely , a = − E( X ) p V ar( X ) and b = 1 p V ar( X ) , (7.33) and so X → Z = X − E( X ) p V ar( X ) , x 7→ z = x − µ σ ∈ ¯ D ⊆ R , (7.34) irrespectiv e of whether the random variable X i s of the dis crete kind (cf. Sec. 7.1) or of the continuous kind (cf. Sec. 7.2). 
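The transformation rules of Secs. 7.5.1–7.5.3 can be verified numerically for any random variable with known distribution. The following R sketch does so for a small discrete random variable with an illustrative probability function and illustrative transformation parameters a and b.

R:
# Illustrative discrete random variable: spectrum and probability function
xvals <- c(1, 2, 5, 10)
pvals <- c(0.4, 0.3, 0.2, 0.1)           # non-negative, sums to 1

EX   <- sum(xvals * pvals)               # E(X), Eq. (7.6)
VarX <- sum((xvals - EX)^2 * pvals)      # Var(X), Eq. (7.7)

# Linear transformation Y = a + bX, Eq. (7.32)
a <- 3; b <- -2
yvals <- a + b * xvals
EY    <- sum(yvals * pvals)
VarY  <- sum((yvals - EY)^2 * pvals)
c(EY, a + b * EX)                        # E(Y) = a + b E(X)
c(VarY, b^2 * VarX)                      # Var(Y) = b^2 Var(X)

# Standardisation, Eq. (7.34): Z has expectation 0 and variance 1
zvals <- (xvals - EX) / sqrt(VarX)
c(sum(zvals * pvals), sum(zvals^2 * pvals))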
It is essent i al for applications to realise that under the process of s t andardisation the values of event probabilities for a random va riable X remain in variant (unchanged), i.e., P ( X ≤ x ) = P X − E( X ) p V ar( X ) ≤ x − µ σ ! = P ( Z ≤ z ) . (7.35) 7.6 Sums of rand om variable s and r eproductivity Def.: For a set of n additive one-dimensional rando m variables X 1 , . . . , X n , one defines a total sum random var iable Y n and an associated mean random variable ¯ X n according to Y n := n X i =1 X i and ¯ X n := 1 n Y n . (7.36) By linearity of the expectation v alue operation, 1 it then holds that E( Y n ) = E n X i =1 X i ! = n X i =1 E( X i ) and E( ¯ X n ) = 1 n E( Y n ) . ( 7.37) If, i n addit i on, the X 1 , . . . , X n are mut u ally stochastically independent according to Eq. (6.16) (see also Sec. 7.7.4 below), it foll ows from Sec. 7.5 .2 that the va riances of Y n and ¯ X n are giv en by V ar( Y n ) = V ar n X i =1 X i ! = n X i =1 V ar( X i ) and V ar( ¯ X n ) =  1 n  2 V ar( Y n ) , (7.3 8 ) 1 That is: E( X 1 + X 2 ) = E( X 1 ) + E ( X 2 ) . 7.7. TWO-DIMENSION AL RANDOM V ARIABLES 57 respectiv el y . Def.: Repr o ductivity o f a probabili ty di s tribution l aw ( cd f ) F ( x ) is give n when the total sum Y n of n independent and identically distributed (in short: “i.i.d. ”) additive one-dimensional random var iables X 1 , . . . , X n , whi ch each individually satisfy distribution laws F X i ( x ) ≡ F ( x ) , inherits this very dist ri bution law F ( x ) from its underlyin g n random variables. Examples of reproductiv e distribution laws, to be discuss ed in the foll owing Ch. 8, are the binomial, the Gaußian normal, and the χ 2 –distributions. 7.7 T wo-dimens ional random varia bles The empirical tests for as sociation between two statistical variables X and Y of Ch. 13 require the noti ons of two-dimensional random variables and their biva riate joint pr obability distrib u- tions . Recommend ed introductory li terature on these matters are, e.g., T outenb urg (2005) [108, p 57ff] and Kredler (2003) [52, Ch. 2]. Def.: A real-va lued two-dimensional random variable is defined as a one-to-one mapping ( X , Y ) : Ω → D ⊆ R 2 (7.39) of th e sample space Ω of some rando m experiment with associated probability space ( Ω , P , P ) into a subset D of the two-dimensional Euclidian space R 2 . W e proceed by sketching s o me important concepts relating to two-dimensional rando m variables. 7.7.1 Joint probabili ty distribution s Discre te case: T wo-dimensional discrete random variables po s sess a Spectrum of values: ( X , Y ) 7→ ( x, y ) ∈ { x 1 , . . . , x k } × { y 1 , . . . , y l } ⊂ R 2 , with k, l ∈ N . (7.40) All pairs of values ( x i , y j ) i =1 ,...,k ; j =1 ,... ,l in this s p ectrum are assigned individual probabili ties p ij by a real-valued Joint probability function: P ( X = x i , Y = y j ) = p ij for i = 1 , . . . , k ; j = 1 , . . . , l , (7. 4 1) with properties (i) 0 ≤ p ij ≤ 1 , and ( non-negativity ) (ii) k X i =1 l X j =1 p ij = 1 . ( normalisability ) 58 CHAPTER 7. D ISCRETE AND CONTINUOUS RANDOM V ARIABLES By analogy to the case of one-dimensio nal random variables, specific ev ent pr obabili ties for ( X , Y ) are obtained from the associated Joint cumulative distrib ution function ( cdf ): F X Y ( x, y ) = cdf ( x, y ) := P ( X ≤ x, Y ≤ y ) = X i | x i ≤ x X j | y j ≤ y p ij . 
(7.42) Continuous case: For two-dimens ional continuous random variables the range can be represented by t h e Spectrum of values: ( X , Y ) 7→ ( x, y ) ∈ D = ( x min , x max ) × ( y min , y max ) ⊆ R 2 . (7.43) Probabilities are now assig ned t o i nfinitesimally small areas d x × d y ∈ D by means of a real-v alued Joint probability density function ( pdf ): f X Y ( x, y ) = pdf ( x, y ) , (7.44) with properties: (i) f X Y ( x, y ) ≥ 0 for all ( x, y ) ∈ D , and ( non-negativity ) (ii) Z + ∞ −∞ Z + ∞ −∞ f X Y ( x, y ) d x d y = 1 . ( normalisability ) Approximately , one now has P ( X ∈ d x, Y ∈ d y ) ≈ f X Y ( ξ , η ) d x d y , for representativ e ξ ∈ d x and η ∈ d y . Specific event probabilities for ( X , Y ) are obtained from the associated Joint cumulative distrib ution function ( cdf ): F X Y ( x, y ) = cdf ( x, y ) := P ( X ≤ x, Y ≤ y ) = Z x −∞ Z y −∞ f X Y ( t, u ) d t d u . (7.45) 7.7.2 Marginal and conditional probab ility distrib utions Discre te case: The u n iv ariate marginal prob ability functions for X and Y ind uced by the joint probabilit y function P ( X = x i , Y = y j ) = p ij are p i + := l X j =1 p ij = P ( X = x i ) for i = 1 , . . . , k , (7.46) 7.7. TWO-DIMENSION AL RANDOM V ARIABLES 59 and p + j := k X i =1 p ij = P ( Y = y j ) for j = 1 , . . . , l . (7.47) In addi t ion, one defines conditional pr obability functions for X given Y = y j , with p + j > 0 , and for Y given X = x i , with p i + > 0 , by p i | j := p ij p + j = P ( X = x i | Y = y j ) for i = 1 , . . . , k , (7.48) respectiv el y p j | i := p ij p i + = P ( Y = y j | X = x i ) for j = 1 , . . . , l . (7.49) Continuous case: The univ ariate marginal probability density functions for X and Y induced by the joint proba- bility density function f X Y ( x, y ) are f X ( x ) = Z + ∞ −∞ f X Y ( x, y ) d y , (7.50) and f Y ( y ) = Z + ∞ −∞ f X Y ( x, y ) d x . (7.51 ) Moreover , one defines conditional probability density functions for X giv en Y , and for Y given X , by f X | Y ( x | y ) := f X Y ( x, y ) f Y ( y ) for f Y ( y ) > 0 , (7.52) respectiv el y f Y | X ( y | x ) := f X Y ( x, y ) f X ( x ) for f X ( x ) > 0 . (7.53) 7.7.3 Bayes’ t heor em f or two-dimensional random variabl es The concept of a bivariate joi nt probabi lity dist ri bution is at the heart of the formulation of Bayes’ theorem, Eq. (6.18), for a real-valued two-dimensi o nal random variable ( X , Y ) . Discre te case: Let P ( X = x i ) = p i + > 0 be a prior probab ility function for a discrete random variable X . Then, on the grounds of a joint probability functio n P ( X = x i , Y = y j ) = p ij and Eqs. (7.48) and (7.49), the posterior probability function for X given Y = y j , wi t h P ( Y = y j ) = p + j > 0 , is determined by p i | j = p j | i p + j p i + for i = 1 , . . . , k . (7.54) 60 CHAPTER 7. D ISCRETE AND CONTINUOUS RANDOM V ARIABLES By using Eqs. (7.47) and (7.49) to re-e xpressed the denomi nator p + j , this may b e given in th e standard form p i | j = p j | i p i + k X i =1 p j | i p i + for i = 1 , . . . , k . (7.55) Continuous case: Let f X ( x ) > 0 be a prior probability density function for a continuous random variable X . Then, on the g rounds of a joint probability density fun cti on f X Y ( x, y ) and Eqs. (7.52) and (7.53), the posterior pr obability density function for X given Y , with f Y ( y ) > 0 , is determi n ed b y f X | Y ( x | y ) = f Y | X ( y | x ) f Y ( y ) f X ( x ) . (7.56) By usin g Eqs. 
(7.51) and (7.53) to re-express the denominator f_Y(y), this may be stated in the standard form

f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) \, f_X(x)}{\int_{-\infty}^{+\infty} f_{Y|X}(y|x) \, f_X(x) \, dx} .   (7.57)

In practical applications, evaluation of the, at times intricate, single and double integrals contained in this representation of Bayes' theorem is managed by employing sophisticated numerical approximation techniques; cf. Saha (2002) [88], Sivia and Skilling (2006) [92], Greenberg (2013) [35], Gelman et al (2014) [30], or McElreath (2016) [69].

7.7.4 Covariance and correlation

We conclude this section by reviewing the standard measures for characterising the degree of stochastic association between two random variables X and Y. The covariance of X and Y is defined by

Cov(X, Y) := E[(X − E(X))(Y − E(Y))] .   (7.58)

It constitutes the off-diagonal component of the symmetric (2 × 2) covariance matrix

\Sigma(X, Y) := \begin{pmatrix} Var(X) & Cov(X, Y) \\ Cov(X, Y) & Var(Y) \end{pmatrix} ,   (7.59)

which is regular and thus invertible as long as det[\Sigma(X, Y)] ≠ 0. By a suitable normalisation procedure, one defines from Eq. (7.58) the correlation coefficient of X and Y as

ρ(X, Y) := \frac{Cov(X, Y)}{\sqrt{Var(X)} \, \sqrt{Var(Y)}} .   (7.60)

This features as the off-diagonal component in the symmetric (2 × 2) correlation matrix

R(X, Y) := \begin{pmatrix} 1 & ρ(X, Y) \\ ρ(X, Y) & 1 \end{pmatrix} ,   (7.61)

which is positive definite and thus invertible for 0 < det[R(X, Y)] = 1 − ρ² ≤ 1.

Def.: Two random variables X and Y are referred to as mutually stochastically independent provided that

Cov(X, Y) = 0 ⇔ ρ(X, Y) = 0 .   (7.62)

It then follows that

P(X ≤ x, Y ≤ y) = P(X ≤ x) × P(Y ≤ y) ⇔ F_{XY}(x, y) = F_X(x) × F_Y(y)   (7.63)

for (x, y) ∈ D ⊆ R². Moreover, in this case (i) E(X × Y) = E(X) × E(Y), and (ii) Var(aX + bY) = a² Var(X) + b² Var(Y).

In the next chapter we will highlight a number of standard univariate probability distributions for discrete and continuous one-dimensional random variables.

Chapter 8

Standard univariate probability distributions for discrete and continuous random variables

In this chapter, we review (i) the univariate probability distributions for one-dimensional random variables which one typically encounters as theoretical probability distributions in the context of frequentist null hypothesis significance testing (cf. Chs. 12 and 13), but we also include (ii) cases of well-established pedagogical merit, and (iii) a few examples of rather specialised univariate probability distributions, which, nevertheless, prove to be of interest in the description and modelling of various theoretical market situations in Economics. We split our considerations into two main parts according to whether a one-dimensional random variable X underlying a particular distribution law varies discretely or continuously. For each of the cases to be presented, we list the spectrum of values of X, its probability function (for discrete X) or probability density function (pdf) (for continuous X), its cumulative distribution function (cdf), its expectation value and its variance, and, in some continuous cases, also its skewness, excess kurtosis and α–quantiles.
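Throughout, the R commands quoted follow R's general naming convention for distributions: the prefixes d, p, q and r, attached to a distribution's root name, return the probability (density) function, the cdf, the quantile function (inverse cdf) and pseudo-random draws, respectively. A minimal sketch, using the standard normal distribution as a stand-in:

dnorm(0)        # pdf evaluated at x = 0
pnorm(1.96)     # cdf: P(X <= 1.96), approximately 0.975
qnorm(0.975)    # 0.975-quantile, approximately 1.96
rnorm(3)        # three pseudo-random draws from N(0; 1)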
Additional information, e.g., commands in R, on a GDC, in EXCEL, or in OpenOffice, by which a specific distribution function may be activated for computational purposes or be plotted, is included where available.

8.1 Discrete uniform distribution

One of the simplest probability distributions for a discrete one-dimensional random variable X is given by the one-parameter discrete uniform distribution,

X ∼ L(n) ,   (8.1)

which is characterised by the number n of different values in X's

Spectrum of values:

X ↦ x ∈ {x_1, …, x_n} ⊂ R , with n ∈ N .   (8.2)

Probability function:

P(X = x_i) = \frac{1}{n}  for i = 1, …, n ;   (8.3)

its graph is shown in Fig. 8.1 below for n = 6.

Figure 8.1: Probability function of the discrete uniform distribution according to Eq. (8.3) for the case L(6). An enveloping line is also shown.

Cumulative distribution function (cdf):

F_X(x) = P(X ≤ x) = \sum_{i | x_i ≤ x} \frac{1}{n} .   (8.4)

Expectation value and variance:

E(X) = \sum_{i=1}^{n} x_i × \frac{1}{n} = µ   (8.5)

Var(X) = \left( \sum_{i=1}^{n} x_i^2 × \frac{1}{n} \right) − µ^2 .   (8.6)

For skewness and excess kurtosis, see, e.g., Rinne (2008) [87, p 372f]. The discrete uniform distribution is identical to a Laplacian probability measure; cf. Sec. 6.3. This is well known from games of chance such as tossing a fair coin once, selecting a single card from a deck of cards, rolling a fair die once, or the fair roulette lottery.

R: ddunif(x, x_1, x_n), pdunif(x, x_1, x_n), qdunif(α, x_1, x_n), rdunif(n_simulations, x_1, x_n) (package: extraDistr, by Wolodzko (2018) [121])

8.2 Binomial distribution

8.2.1 Bernoulli distribution

Another simple probability distribution, for a discrete one-dimensional random variable X with only two possible values, 0 and 1,^1 is due to the Swiss mathematician Jakob Bernoulli (1654–1705). The Bernoulli distribution,

X ∼ B(1; p) ,   (8.7)

depends on a single free parameter, the probability p ∈ [0; 1] for the event X = x = 1.

Spectrum of values:

X ↦ x ∈ {0, 1} .   (8.8)

Probability function:

P(X = x) = \binom{1}{x} p^x (1 − p)^{1−x} , with 0 ≤ p ≤ 1 ;   (8.9)

its graph is shown in Fig. 8.2 below for p = 1/3.

Figure 8.2: Probability function of the Bernoulli distribution according to Eq. (8.9) for the case B(1; 1/3).

^1 Any one-dimensional random variable of this kind is referred to as dichotomous.

Cumulative distribution function (cdf):

F_X(x) = P(X ≤ x) = \sum_{k=0}^{⌊x⌋} \binom{1}{k} p^k (1 − p)^{1−k} .   (8.10)

Expectation value and variance:

E(X) = 0 × (1 − p) + 1 × p = p   (8.11)

Var(X) = 0^2 × (1 − p) + 1^2 × p − p^2 = p(1 − p) .   (8.12)

8.2.2 General binomial distribution

A direct generalisation of the Bernoulli distribution is the case of a discrete one-dimensional random variable X which is the sum of n mutually stochastically independent, identically Bernoulli-distributed ("i.i.d.") one-dimensional random variables X_i ∼ B(1; p) (i = 1, …, n), i.e.,

X := \sum_{i=1}^{n} X_i = X_1 + …
+ X n , (8.13) which yields the reproductive two-parameter binomial distribution X ∼ B ( n ; p ) , (8.14) again with p ∈ [0; 1] the probabilit y for a sing le event X i = x i = 1 . Spectrum of values: X 7→ x ∈ { 0 , . . . , n } , with n ∈ N . (8.15) Probability function: 2 P ( X = x ) =  n x  p x (1 − p ) n − x , with 0 ≤ p ≤ 1 ; (8.16) its graph is shown in Fig. 8.3 below for n = 10 and p = 3 5 . Recall that  n x  denotes the bino mial coef ficient defined in Eq. (6.13), which generates the pos i tiv e integer entries of Pascal’ s triangle. Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) = ⌊ x ⌋ X k =0  n k  p k (1 − p ) n − k . (8.17) 2 In the context of an urn model with M black balls and N − M white balls, and the ran dom selection of n balls from a total o f N , with repetition, th is p robability fu n ction can be d erived fro m Laplace’ s p rinciple of form ing the ratio between the “n umber of fa vourable cases” and the “numb er o f a ll po ssible cases, ” cf. Eq. (6.11). Thus, P ( X = x ) =  n x  M x ( N − M ) n − x N n , wh ere x denotes the n umber of b lack balls drawn, and one substitutes accordin g ly fr om the definition p := M / N . 8.3. HYPERGEOMETRIC DISTRIB UTION 67 0 2 4 6 8 10 0.00 0.10 0.20 0.30 x Binomialprob(x) B(10; 3/5) Binomial distribution Figure 8.3: Probability function of the binomial distribution according to Eq. (8.16) for th e case B  10; 3 5  . An en veloping line is also shown. Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 260]): E( X ) = n X i =1 p = np (8.18) V ar( X ) = n X i =1 p (1 − p ) = np (1 − p ) (8.19) Sk ew ( X ) = 1 − 2 p p np (1 − p ) (8.20) Kurt( X ) = 1 − 6 p (1 − p ) np (1 − p ) . (8.21) The results for E( X ) and V a r ( X ) are based on the rules (7.37) and (7.38), the latter o f which applies to a set of mutuall y stochastically ind epend ent random variables. R : dbinom ( x, n, p ) , pbin om ( x, n, p ) , qbinom ( α, n, p ) , rbinom ( n simulations , n, p ) GDC: binompdf ( n, p, x ) , binomcdf ( n, p, x ) EXCEL, OpenOffic e: BINOM.DIST (dt .: BI NOM.VERT , BINOMVER T ), BINOM.INV (for α – quantiles) 8.3 Hypergeome t r ic distr ib u tion The hypergeomet ric distribution for a discrete one-dimensi o nal random v ariable X derives from an urn mod el with M black ball s and N − M wh ite balls , and the random selection of n balls from 68 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS a total of N ( n ≤ N ), wit h out repetit i on. If X represents the number of b l ack balls amongst the n selected balls, it is subject to the three-parameter probabili ty di s tribution X ∼ H ( n, M , N ) . (8.22) In particular , thi s mo d el forms the m athematical basis of the internation al l y popul ar Nati o nal Lot- tery “6 out of 49, ” in which case there are M = 6 winni n g numbers amongst a total of N = 49 numbers, and X ∈ { 0 , 1 , . . . , 6 } count s the total of correctly guessed winning numbers on an individual gambler’ s lottery ticket. Spectrum of values: X 7→ x ∈ { max(0 , n − ( N − M )) , . . . , min( n, M ) } . (8.23) Probability function: P ( X = x ) =  M x   N − M n − x   N n  ; (8.24) its graph is shown in Fig. 8.4 below for the National Lottery example, so n = 6 , M = 6 and N = 49 . 0 1 2 3 4 5 6 0.0 0.1 0.2 0.3 0.4 x Hypergeometricprob(x) H(6, 6, 49) Hypergeometric distr ibution Figure 8.4: Probability function of the hypergeometric distribution according to Eq. (8. 2 4) for the case H (6 , 6 , 49 ) . An en veloping li n e is also shown. 8.4. 
POISSON DISTRIB UTION 69 Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) = ⌊ x ⌋ X k =max(0 ,n − ( N − M ))  M k   N − M n − k   N n  . (8.25) Expectation value and variance: E( X ) = n M N (8.26) V ar( X ) = n M N  1 − M N   N − n N − 1  . ( 8.27) For skewness and excess kurtosis, see, e.g., Rinne (2008) [87, p 270]. R : dhyper ( x, M , N − M , n ) , ph yper ( x, M , N − M , n ) , qhyper ( α, M , N − M , n ) , rhyper ( n simulations , M , N − M , n ) EXCEL, OpenOffic e: HYPGEOM.DI ST (dt.: HYPGEO M.VERT , HYPG EOMVERT ) 8.4 P o isson dis trib ution The one-parameter Poisson distribution for a discrete one-dimensional random variable X , X ∼ P ois ( λ ) . (8.28) plays a m ajor role in analysing count data when t he maximu m numb er of pos- sible counts associated w i th a correspon d i ng data-generating process is unknown. This dist ribution is named after th e French m athematician, engineer , and physicist Baron Siméon Denis Poisson FRSFor HFRSE MIF (1781–1840) and can b e considered a special case of the binomial dis t ribution, discussed in Sec. 8.2, when n is very lar ge ( n ≫ 1 ) and p is very small ( 0 < p ≪ 1 ); cf. Sivia and Skilli ng (2006) [92, Sec. 5.4]. Spectrum of values: X 7→ x ∈ { 0 , . . . , n } , with n ∈ N . . (8.29) Probability function: P ( X = x ) = λ x x ! exp ( − λ ) , with λ ∈ R > 0 ; (8.30) λ is a dimensionl ess rate parameter . It is also referred to as the intensity parameter . The graph of the probability function is shown in Fig. 8.5 below for the case λ = 3 2 . Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) =   ⌊ x ⌋ X k =0 λ k k !   exp ( − λ ) . (8.31) 70 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS 0 2 4 6 8 10 0.00 0.10 0.20 0.30 x P oissonprob(x) P ois(3/2) P oisson distribution Figure 8.5: Probability function of the Poisson distribution according to Eq. (8.30) for t h e case P ois  3 2  . An en veloping line is also shown. Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 285f]): 3 E( X ) = λ (8.32) V ar( X ) = λ (8.33) Sk ew ( X ) = 1 √ λ (8.34) Kurt( X ) = 1 λ . (8.35) R : dpois ( x, λ ) , ppo is ( x, λ ) , qpo is ( α, λ ) , rpois ( n simulations , λ ) EXCEL, OpenOffic e: POISSON .DIST (dt.: POI SSON.VERT ), PO ISSON 8.5 Continuou s unif orm distribution The simplest example o f a probabili ty dis tribution for a continuo us one-dim ensional random vari- able X is the continuous unif orm distribu tion , X ∼ U ( a ; b ) , (8.36) also referred to as the r ectangular distribution . Its two free parameters, a and b , denote the limi ts of X ’ s 3 Note th at for a binom ial d istribution, cf. Sec. 8.2, in the limit that n ≫ 1 wh ile simultaneou sly 0 < p ≪ 1 it holds that n p ≈ np (1 − p ) , and so the co rrespond ing expectation value and variance b ecome more and more equ a l. 8.5. CONTINUOUS UNIFORM DISTRIB UTION 71 Spectrum of values: X 7→ x ∈ [ a, b ] ⊂ R . (8.37) Probability density function ( pdf ): 4 f X ( x ) =        1 b − a for x ∈ [ a, b ] 0 otherwise ; (8.38) its graph is shown in Fig. 8.6 below for fou r dif ferent combinations of the parameters a and b . 0 1 2 3 4 5 0.0 0.4 0.8 x Unif ormpdf(x) U(2; 3) U(3/2; 7/2) U(1; 4) U(0; 5) Continuous unif or m distr ibutions Figure 8.6: pdf of the con t inuous uniform d istribution according to Eq. (8.38) for the cases U (0 ; 5 ) , U (1; 4) , U (3 / 2; 7 / 2) and U (2; 3) . 
Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) =                  0 for x < a x − a b − a for x ∈ [ a, b ] 1 for x > b . (8.39) 4 It is a nice and instructive little exercise, stro ngly reco mmende d to the reader, to go throug h th e details of explicitly computin g from this simp le pdf the corr espondin g cdf , expectation value, variance, skewness and excess ku r tosis of X ∼ U ( a ; b ) . 72 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS Expectation value, variance, skewness and excess kurtosis: E( X ) = a + b 2 (8.40) V ar( X ) = ( b − a ) 2 12 (8.41) Sk ew ( X ) = 0 (8.42) Kurt( X ) = − 6 5 . (8.43) Using som e of these results, as well as Eq. (8.39), one finds t h at for all continuous uniform d i stri- butions the event probability P ( | X − E( X ) | ≤ p V ar( X )) = P √ 3( a + b ) − ( b − a ) 2 √ 3 ≤ X ≤ √ 3( a + b ) + ( b − a ) 2 √ 3 ! = 1 √ 3 ≈ 0 . 5773 , (8.44) i.e., the event probability that X falls within one standard deviation (“ 1 σ ”) o f E( X ) is 1 / √ 3 . α – quantiles of continuous uniform distributions are obtained by straightforward i n version, i.e., for 0 < α < 1 , α ! = F X ( x α ) = x α − a b − a ⇔ x α = F − 1 X ( α ) = a + α ( b − a ) . (8.45) R : dunif ( x, a, b ) , pun if ( x, a, b ) , qunif ( α, a, b ) , runif ( n simulations , a, b ) Standardisation of X ∼ U ( a ; b ) according to Eq. (7.34) yields a one-dimensional random variable Z ∼ U ( − √ 3; √ 3) by X → Z = √ 3 2 X − ( a + b ) b − a 7→ z ∈ h − √ 3 , √ 3 i , (8.46) with pdf f Z ( z ) =        1 2 √ 3 for z ∈  − √ 3 , √ 3  0 otherwise , (8.47) and cdf F Z ( z ) = P ( Z ≤ z ) =                    0 for z < − √ 3 z + √ 3 2 √ 3 for z ∈  − √ 3 , √ 3  1 for z > √ 3 . (8.48) 8.6. GA USSIAN NORMAL DISTRIB UTION 73 8.6 Gaußian normal distri bution The b es t -known probability distribution for a continuous one-dimensional random variable X , which proves ubiquit ous in Infer ential Statistics (see Chs. 12 and 13 below), is due to Carl Friedrich Gauß (1777–1855); cf. Gauß (180 9 ) [29]. This is the reproductive two-parameter normal distribu tion X ∼ N ( µ ; σ 2 ) ; (8.49) the meaning of the parameters µ and σ 2 will be explained s hortly . The extraordinary sta- tus of the normal distribution in Pr obability Theory and Statisti cs was cemented th rough the discovery of the central l imit theor em by the French mathematician and astronomer Marquis Pierre Simon de Laplace (1749–1827 ), cf. Laplace (18 09) [57]; see Sec. 8.15 below . Spectrum of values: X 7→ x ∈ D ⊆ R . (8.50) Probability density function ( pdf ): f X ( x ) = 1 √ 2 π σ exp " − 1 2  x − µ σ  2 # , with σ ∈ R > 0 . (8.51) This normal – p df defines a reflection-symmetri c characteristic bell-shaped curve, the analytical properties of which were first discus sed by the French m athematician Abraham de Moivre (1667–1754). The x –po sition o f this curve’ s (glob al) maximum i s specified by µ , while t he x –pos i tions of its t wo points o f inflection are giv en by µ − σ resp. µ + σ . The ef fects of dif ferent values of the parameters µ and σ on the bell-shaped curv e are illustrated in Figs. 8.7 and 8.8 below . Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) = Z x −∞ 1 √ 2 π σ exp " − 1 2  t − µ σ  2 # d t . (8.52) W e emphasise the fact that the normal – cdf cannot be expressed in terms of elementary mathe- matical functions. Expectation value, variance, skewness and excess kurtosis (cf. 
Rinne (2008) [87, p 301]): E( X ) = µ (8.53) V ar( X ) = σ 2 (8.54) Sk ew ( X ) = 0 (8.55) Kurt( X ) = 0 . (8.56) R : dnorm ( x, µ, σ ) , pnorm ( x, µ, σ ) , qno rm ( α, µ, σ ) , rnorm ( n simulations , µ , σ ) GDC: normalpdf ( x, µ, σ ) , n ormalcdf ( −∞ , x, µ, σ ) EXCEL, OpenOffic e: NORM.DI ST (dt.: NO RM.VERT , NORMVERT ) 74 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS −4 −2 0 2 4 0.0 0.2 0.4 0.6 0.8 x Npdf(x) N(−2; 1/4) N(0; 1/4) N(1; 1/4) N(3/2; 1/4) Gauß distributions (1) Figure 8.7: pdf of the Gaußian norm al di s tribution according t o Eq. (8.51). Cases N ( − 2; 1 / 4) , N (0 ; 1 / 4) , N (1 ; 1 / 4) and N (3 / 2; 1 / 4) , which have constant σ . −4 −2 0 2 4 0.0 0.2 0.4 0.6 0.8 x Npdf(x) N(0; 1/4) N(0; 1) N(0; 2) N(0; 4) Gauß distributions (2) Figure 8.8: pdf of the Gaußian n ormal dist ri bution according to Eq. (8.51). Cases N (0; 1 / 4) , N (0 ; 1 ) , N (0; 2) and N (0 ; 4 ) , which hav e constant µ . Upon standardis ation of a normally distri buted one-dimensi onal random variable X according to Eq. (7. 3 4), the corresponding normal di stribution N ( µ ; σ 2 ) i s transformed into the unique stan- dard normal distribut ion , N (0; 1) , wit h 8.6. GA USSIAN NORMAL DISTRIB UTION 75 Probability density function ( pdf ): ϕ ( z ) := 1 √ 2 π exp  − 1 2 z 2  for z ∈ R ; (8.57) its graph is shown in Fig. 8.9 below . −6 −4 −2 0 2 4 6 0.0 0.1 0.2 0.3 0.4 z φ ( z ) N(0; 1) Standard normal distr ibution Figure 8.9: pdf of th e st andard normal distribution according to Eq. (8.57). Cumulative dist ribution functi on ( cdf ): Φ( z ) := P ( Z ≤ z ) = Z z −∞ 1 √ 2 π exp  − 1 2 t 2  d t . (8.58) R : dnorm ( z ) , pnorm ( z ) , qn orm ( α ) , rnorm ( n simulations ) EXCEL: NORM.S.DIS T (dt.: NORM.S. VERT ) The resultant random variable Z ∼ N (0; 1) satisfies the Computational rules: P ( Z ≤ b ) = Φ( b ) (8.59) P ( Z ≥ a ) = 1 − Φ( a ) (8.60) P ( a ≤ Z ≤ b ) = Φ( b ) − Φ( a ) (8.61) Φ( − z ) = 1 − Φ( z ) (8.62) P ( − z ≤ Z ≤ z ) = 2Φ( z ) − 1 . (8.63) 76 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS The ev ent probabi l ity th at a (standard) normally distri buted one-dimension al random variable takes values insi de an interval of length k times two standard deviations, centred on its expectation value, is giv en b y the i mportant kσ –rule . This states that P ( | X − µ | ≤ k σ ) Eq. (7.34) z}|{ = P ( − k ≤ Z ≤ + k ) Eq. (8.63) z}|{ = 2Φ( k ) − 1 for k > 0 . (8.64) According to this rule, the ev ent probabilit y of a normall y distributed one-dimensional random var iable to deviate from its mean by more than six standard deviations amounts to P ( | X − µ | > 6 σ ) = 2 [1 − Φ(6) ] ≈ 1 . 97 × 10 − 9 , (8.65) i.e., about two parts in one bill ion. Thu s, in this scenario th e occurrence of extreme o utliers for X i s practically imp ossible. In t urn, the persistent occurrence of so-called 6 σ –ev ents , o r l ar ger deviations from the mean, in quantit at ive stat i stical su rve ys can be i n terpreted as e v idence agains t the assumpti on of an underlying Gaußian random process; cf. T aleb (2007 ) [105, Ch. 1 5 ]. The rapid, accelerated decli ne in the e vent p rob abilities for de viations from t he mean of a Gaußian normal distri bution can be related to the fact that the elast i city of the standard normal– p df is given by (cf. Ref. [18, Sec. 7.6]) ε ϕ ( z ) = − z 2 . (8.66) Manifestly this is negativ e for all z 6 = 0 and increases non -linearly in absolute v alue as one moves aw ay from z = 0 . 
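Both the kσ–rule (8.64) and the 6σ–event probability (8.65) can be reproduced directly with the R command pnorm listed above:

k <- 1:6
round(2 * pnorm(k) - 1, 6)    # P(|X - mu| <= k sigma) for k = 1, ..., 6, cf. Eq. (8.64)
2 * (1 - pnorm(6))            # P(|X - mu| > 6 sigma), approximately 1.97e-09, cf. Eq. (8.65)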
α –quanti l es asso ci at ed wi th Z ∼ N (0; 1) are obtained from the in verse standard normal– cdf according to α ! = P ( Z ≤ z α ) = Φ( z α ) ⇔ z α = Φ − 1 ( α ) for all 0 < α < 1 . (8.67) Due to the reflection symmetry of ϕ ( z ) with respect to the vertical axis at z = 0 , it holds that z α = − z 1 − α . (8.68) For this reason, one typically finds z α -va lues l isted in textbooks on Statistics only for α ∈ [1 / 2 , 1) . Alternative ly , a particular z α may be o b tained from R , a GDC, E X CEL, or from OpenOffice. The backward transformation from a parti cular z α of the standard normal distribution to the corre- sponding x α of a giv en normal distribution fol lows from Eq. (7.34) and amounts to x α = µ + z α σ . R : qnorm ( α ) GDC: invNorm ( α ) EXCEL, OpenOffic e: NORM.S.INV (dt.: NORM.S .INV , NORMI NV ) At this stage, a few historical remarks are in order . The Gauß i an normal distribution gained a prominent , tho ugh in parts questi onable status in t he Social Sciences through the highly influential work of the Belgian astronom er , mathemati cian, statist ician and s ociologist Lambert Adolphe Jacques Quetelet (1796–18 7 4) during the 19 th Century . In particular , his re- search programme on the generic p rop erties of l’homme moyen (engl.: the a verage man), see 8.7. χ 2 –DISTRIB UTION 77 Quetelet (1835) [84], an ambiti ous and to some extent ob s essiv e attem pt to quantify and clas- sify physiologi cal and sociological human characteristics according to the pri n ciples of a nor- mal distribution, left a last i ng impact on th e field, with repercussions t o this d ay . Quetelet, by th e way , co-founded the Royal Statistical Society ( rss.org. uk ) i n 1834 . Furth er vis- ibility was given to Quetelet’ s i d eas at the time by a contemporary , the Engli s h em p iricist Sir Francis Galton FRS (1822–19 11), whose intense studies on h eredity in Humans , see Galton (1869) [27], which he later subsumed und er the term “eugenics, ” complem ent ed Quetelet’ s in- vestigations, and profoundly shaped s u bsequent developments in so cial research; cf. Bernstein (1998) [3, Ch. 9]. Incidently , amo ngst many other contributions to th e field, Galton ’ s activities helped to pa ve the way for making questionnair es and sur veys a commonplace for collecti ng statistical data from Humans. The (standard) n ormal distribution, as well as the next three examples o f probabil ity dist ri butions for a continuous one-dimensional random variable X , are com monly referred to as th e test dis tri- b utions , due to t he central roles they play in null hypothesis s i gnificance testing (cf. Chs. 12 and 13). 8.7 χ 2 –distribution with n degrees of fr eed om The reproductive o n e-parameter χ 2 –distrib ution w i th n degr ees of free dom was devised by the English mathematical statistician Karl Pearson FRS (1857–1936); cf. Pearson (1900) [78]. Th e underlying continuous one-dimensio nal random variable X ∼ χ 2 ( n ) , (8.69) is perceived of as the s um of squares of n sto chastically independent, identically s t andard normall y distributed (“i.i.d. ”) random variables Z i ∼ N ( 0; 1) ( i = 1 , . . . , n ), i.e., X := n X i =1 Z 2 i = Z 2 1 + . . . + Z 2 n , with n ∈ N . (8.70) Spectrum of values: X 7→ x ∈ D ⊆ R ≥ 0 . (8.71) The probabi lity density functio n ( pdf ) of a χ 2 –distribution wi th d f = n degrees of freedom is a fairly complicated m athematical expression; see Rinne (2008) [87, p 319] or Ref. [19 , Eq. (3.26)] for the explicit representation of the χ 2 pdf . 
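Independently of the explicit form of the χ² pdf, the defining construction (8.70) can be verified by simulation. A short R sketch (the sample size is an arbitrary choice for illustration) sums n = 5 squared standard normal draws and compares empirical moments and a quantile with their theoretical counterparts (cf. Eqs. (8.72) and (8.73) below):

set.seed(123)                            # for reproducibility
n <- 5                                   # degrees of freedom
z <- matrix(rnorm(n * 10^5), ncol = n)   # 10^5 rows of n i.i.d. N(0; 1) draws
x <- rowSums(z^2)                        # Eq. (8.70): X = Z_1^2 + ... + Z_n^2
c(mean(x), var(x))                       # approximately n = 5 and 2n = 10
c(quantile(x, 0.95), qchisq(0.95, n))    # empirical vs. theoretical 0.95-quantile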
Plots are sho wn for four different values of th e parameter n i n Fig. 8.10. The χ 2 cdf cannot be expressed in terms of elementary mathematical functions. 78 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS 0 10 20 30 40 50 0.00 0.10 0.20 x chi2pdf(x) chi2(3) chi2(5) chi2(10) chi2(30) chi−squared−distributions Figure 8.10: pdf of the χ 2 –distribution for d f = n ∈ { 3 , 5 , 1 0 , 30 } degrees of freedom. Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 320f]): E( X ) = n (8.72 ) V ar( X ) = 2 n (8.73) Sk ew ( X ) = r 8 n (8.74) Kurt( X ) = 12 n . (8.75) α –quanti l es, χ 2 n ; α , of χ 2 –distributions are generally tabulated in textbooks on Statistics . Alterna- tiv ely , th ey may be obtain ed from R , EXCEL, or from OpenOffice. Note that for n ≥ 50 a χ 2 –distribution m ay be approximated reasonably wel l by a normal di s tri- bution, N ( n, 2 n ) . This is a reflection of the central limi t theor em , to be discussed i n Sec. 8.15 below . R : dchisq ( x, n ) , pchi sq ( x, n ) , qchi sq ( α, n ) , rchisq ( n simulations , n ) GDC: χ 2 pdf ( x, n ) , χ 2 cdf (0 , x, n ) EXCEL, OpenOffic e: CHISQ.D IST , CHISQ.I NV (dt.: C HIQU.VERT , CHIQVERT , CHIQU.INV , CHIQINV ) 8.8 t –distribution with n degrees of fr e e dom The non-reproductiv e one-parameter t –distribu tion with n degre es of free dom was discovered by the Engli sh statistician W ill iam Sealy Gosset (1876–1937). Int ending to some extent to irritate the scienti fic communit y , he publis h ed his findi ngs und er th e pseudonym of “Student; ” cf. Student 8.8. T –DISTRIB UTION 79 (1908) [10 0]. Consider two stochastically i ndependent one-dim ensional random variables, Z ∼ N (0 ; 1 ) and X ∼ χ 2 ( n ) , satisfying the i ndicated d istribution laws. Then the quo t ient random var iable defined by T := Z p X/n ∼ t ( n ) , with n ∈ N , (8.76) is t –distributed wi t h d f = n degrees of freedom. Spectrum of values: T 7→ t ∈ D ⊆ R . (8.77) The probabilit y density functi o n ( pdf ) of a t –distribution, which exhibits a reflection symmetry with respect to the vertical axis at t = 0 , is a fairly compli cated math ematical expression; see Rinne (2008) [87, p 326] or Ref. [19, Eq. (2.26)] for the explicit representation of the t pdf . Plots are shown for four different values of the parameter n in Fig. 8.11. The t cdf cannot be expressed in terms of elementary mathematical functions . −6 −4 −2 0 2 4 6 0.0 0.1 0.2 0.3 0.4 x tpdf(x) t(2) t(3) t(5) t(50) t−distributions Figure 8.11: pd f of the t –distribution for d f = n ∈ { 2 , 3 , 5 , 50 } degrees o f freedom. For t he case t (50) , the t pd f is essenti ally equivalent t o the standard normal pdf . Notice the fatter tail s of the t pdf for small values of n . Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 327]): E( X ) = 0 (8.78) V ar( X ) = n n − 2 for n > 2 ( 8.79) Sk ew ( X ) = 0 for n > 3 (8.80) Kurt( X ) = 6 n − 4 for n > 4 . (8.81) 80 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS α –quanti l es, t n ; α , of t –dis t ributions, for which, due to the reflection symm et ry of the t pdf , the identity t n ; α = − t n ;1 − α holds, are generally t abulated in textbooks on Statistics . A l ternativ ely , they may be obt ai n ed from R , some GDCs, EXCEL, or from OpenOffi ce. Note that for n ≥ 5 0 a t –dist ribution may b e approx imated reasonably well by the standard normal distribution, N (0; 1) . 
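This convergence towards N(0; 1) is readily visible from the α–quantiles returned by the R commands qt and qnorm:

n <- c(2, 5, 10, 50, 200)
round(qt(0.975, df = n), 4)   # 0.975-quantiles t_{n; 0.975} for increasing df
round(qnorm(0.975), 4)        # limiting standard normal value, approximately 1.96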
Again, this is a manifestation of th e central limit theor em , to be discus s ed in Sec. 8.15 belo w . For n = 1 , a t –distribution amoun t s t o t h e special case a = 1 , b = 0 of the Cauchy distribution; cf. Sec. 8.1 4. R : dt ( x, n ) , pt ( x, n ) , qt ( α , n ) , rt ( n simulations , n ) GDC: tpdf ( t, n ) , tcdf ( − 10 , t, n ) , in vT ( α, n ) EXCEL, OpenOffic e: T.DIST , T.INV (dt.: T.VER T , TVERT , T. INV , TINV ) 8.9 F –dis t rib ution with n 1 and n 2 degrees of fr eedo m The reproductiv e two-parameter F –distribut ion with n 1 and n 2 degr ees of freedom was made prominent in Statistics by the English statis t ician, ev olutionary biologi st, eugeni cist and geneticist Sir Ronald A ylmer Fisher FRS (1890– 1 962), and t h e US-American m athematician and stati s tician Geor ge W addel Snedecor (1881–1974); cf. Fisher (1924) [23] and Snedecor (1934) [95]. Consider two sets of s t ochastically independent, ident ically standard no rm ally distributed (“i.i.d. ”) one- dimensional random variables, X i ∼ N (0; 1) ( i = 1 , . . . , n 1 ), and Y j ∼ N (0; 1) ( j = 1 , . . . , n 2 ). Define the sums X := n 1 X i =1 X 2 i and Y := n 2 X j =1 Y 2 j , (8.82) each o f which sati s fies a χ 2 –distribution with n 1 resp. n 2 degrees of freedom. Then the quo t ient random va riable F n 1 ,n 2 := X/n 1 Y /n 2 ∼ F ( n 1 , n 2 ) , with n 1 , n 2 ∈ N , (8.83) is F –dist ri buted wi t h d f 1 = n 1 and d f 2 = n 2 degrees of freedom. Spectrum of values: F n 1 ,n 2 7→ f n 1 ,n 2 ∈ D ⊆ R ≥ 0 . (8.84) The probabi lity density functio n ( pdf ) of an F – distribution is quite a complicated mathem atical expression; s ee Rinne (2008) [87, p 330] for the explicit representati o n of the F pdf . Plots are shown for four different combination s of the parameters n 1 and n 2 in Fig. 8.12. The F cdf cannot be expressed in terms of elementary mathematical functions. 8.10. P ARET O DISTRIB UTION 81 0 1 2 3 4 5 6 0.0 0.5 1.0 1.5 x Fpdf(x) F(80, 40) F(40, 20) F(6, 10) F(3, 5) F−distributions Figure 8.12: pdf of the F –distribution for four combi n ations of degrees of freedom ( d f 1 = n 1 , d f 2 = n 2 ) . The curves correspond to the cases F (80 , 40) , F (40 , 20) , F (6 , 10) and F (3 , 5) , respectiv el y . Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 332]): E( X ) = n 2 n 2 − 2 for n 2 > 2 (8.85) V ar( X ) = 2 n 2 2 ( n 1 + n 2 − 2 ) n 1 ( n 2 − 2 ) 2 ( n 2 − 4) for n 2 > 4 (8.86) Sk ew ( X ) = (2 n 1 + n 2 − 2) p 8( n 2 − 4) ( n 2 − 6 ) p n 1 ( n 1 + n 2 − 2) for n 2 > 6 (8.87) Kurt( X ) = 12 n 1 (5 n 2 − 22)( n 1 + n 2 − 2) + ( n 2 − 2) 2 ( n 2 − 4 ) n 1 ( n 2 − 6)( n 2 − 8)( n 1 + n 2 − 2) for n 2 > 8 . (8.8 8) α –quanti l es, f n 1 ,n 2 ; α , of F –dis tributions are t abulated in advanced textbooks on Statistics . Alter- nativ ely , th ey may be obtained from R , EXCEL, or from OpenOffice. R : df ( x, n 1 , n 2 ) , pf ( x, n 1 , n 2 ) , qf ( α, n 1 , n 2 ) , rf ( n simulations , n 1 , n 2 ) GDC: F pdf ( x, n 1 , n 2 ) , F cdf (0 , x, n 1 , n 2 ) EXCEL, OpenOffic e: F.DIST , F.INV (dt.: F.VER T , FVERT , F. INV , FINV ) 8.10 Par e t o distri bution When studying the distribution of wealt h and in come of people in Italy towards the end of the 19 th Century , th e Italian engin eer , sociologist , econom ist, pol itical scienti st and philo s opher V ilfredo Federico Damaso Pareto (1848–1923) discovered a certain type of quantitative regularity 82 CHAPTER 8. 
ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS which he could model mathemati call y in terms of a simple powe r-la w fun cti on inv olvin g only two free parameters; cf. Pareto (1896) [77]. The on e-di m ensional random variable X underlying s u ch a Par eto distribution , X ∼ P ar ( γ , x min ) , (8.89) has a Spectrum of values: X 7→ x ∈ { x | x ≥ x min } ⊂ R > 0 , (8.90) and a Probability density function ( pdf ): f X ( x ) =        0 for x < x min γ x min  x min x  γ +1 , γ ∈ R > 0 for x ≥ x min ; (8.91) its graph is shown in Fig. 8.13 below for four diff erent v alues of the dimension l ess exponent γ . 1 2 3 4 5 6 7 8 0.0 1.0 2.0 x P aretopdf(x) P ar(1/3, 1) P ar(1/2, 1) P ar(ln(5)/ln(4), 1) P ar(5/2, 1) P areto distr ibutions Figure 8.13: pdf of t he P areto distribution according to Eq. (8.91) for x min = 1 and γ ∈  1 3 , 1 2 , ln(5) ln(4) , 5 2  . Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) =        0 for x < x min 1 −  x min x  γ for x ≥ x min . (8.92) 8.10. P ARET O DISTRIB UTION 83 Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 362]): E( X ) = γ γ − 1 x min for γ > 1 (8.93) V ar( X ) = γ ( γ − 1) 2 ( γ − 2) x 2 min for γ > 2 (8.94) Sk ew ( X ) = 2(1 + γ ) γ − 3 r γ − 2 γ for γ > 3 (8.95) Kurt( X ) = 6( γ 3 + γ 2 − 6 γ − 2) γ ( γ − 3)( γ − 4) for γ > 4 . (8.96) It is imp o rtant to realise that E( X ) , V ar( X ) , Sk ew ( X ) and Kurt( X ) are well-defined only for t he values of γ indicated; otherwis e th ese m easures do not exist. α –quanti l es: α ! = F X ( x α ) = 1 −  x min x α  γ ⇔ x α = F − 1 X ( α ) = γ r 1 1 − α x min for all 0 < α < 1 . (8.97) R : dpareto ( x, γ , x min ) , ppareto ( x, γ , x min ) , qpareto ( α, γ , x min ) , rpareto ( n simulations , γ , x min ) (package: extraD istr , b y W olodzko (2018) [121]) Note that it follows from Eq. (8.92) that the probabil ity of a P areto-distributed cont inuous one- dimensional random variable X t o exceed a certain threshold value x is given by the simple power - law rule P ( X > x ) = 1 − P ( X ≤ x ) =  x min x  γ . (8.98) Hence, the ratio of probabiliti es P ( X > k x ) P ( X > x ) =  x min k x  γ  x min x  γ =  1 k  γ , ( 8.99) with k ∈ R > 0 , is s cale-in variant , meaning in dependent o f a particular scale x at which one ob- serves X (cf. T aleb (2007) [105, p 2 5 6f f and p 326ff]). This behaviour is a direct consequence of a special mathemati cal property of Pareto distributions which is technically referred t o as self- similarity . It is determined by the fact that a Pareto– pdf (8. 9 1) has constant elasticity , i.e. (cf. Ref. [18, Sec. 7.6]) ε f X ( x ) = − ( γ + 1) for x ≥ x min , (8.100) which contrasts with the case of the standard normal distribution; cf. Eq. (8.66). This feature implies that in the present scenario the occurrence o f extreme outliers for X is not enti rely unus u al. Further interesting examples, in various fields of applied science, o f di stributions o f qu ant ities which also feature the s cale-in variance of s caling laws are d escrib ed in W iesenfeld (2001) [11 8 ]. Now adays , Pareto distributions play an import ant role in the quantitative modelling of financial risk; see, e.g., Bouchaud and Potters (2003) [4]. 84 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS W orking out t h e equation of the Lorenz curve associated with a Pareto dis tribution according to Eq. (7.31), using Eq. 
(8.97), yields a particularly sim ple result gi ven by L ( α ; γ ) = 1 − (1 − α ) 1 − (1 /γ ) . (8.101) This resul t forms the basis of Pare to’ s famous 80/20 rule concerning concentration i n the distribu- tion of v arious assets of general importance in a given population. According to Pareto’ s em pirical findings, typi cally 80% of such an asset are owned by just 20% of the population considered (and vice versa); cf. Pareto (1896) [77]. 5 The 80/ 20 rule app l ies exactly for a value of t h e power -l aw index of γ = ln(5) ln(4) ≈ 1 . 16 . It is a prominent example of the phenomenon of universality , fre- quently observed in t he mathematical modelling of quant itative –empirical relatio nships between var iables in a wide variety of scientific di s ciplines; cf. Gleick (1987) [34, p 157ff]. For purposes of numerical sim ulation i t i s us eful to work with a truncated Par eto distribu tion , for which the one-dimension al random v ariable X takes v alues in an interval [ x min , x cut ] ⊂ R > 0 . Samples of random values for such an X can be easily generated from a one-dimensional random var iable Y that is uniformly distributed on the interval [0 , 1] . The sampl e values of the latter are subsequently transformed according to the formula; cf. Ref. [120]: x ( y ) = x min x cut [ x γ cut − ( x γ cut − x γ min ) y ] 1 /γ . (8.102) The required uniform ly distributed random n u mbers y ∈ [0 , 1] can be obtained, e.g., from R by means o f runif ( n simulations , 0 , 1 ) , or from the random number generator RAND() (dt.: ZUFALLSZAH L() ) in EXCEL or in OpenOffice. 8.11 Exponen t i a l distribution The exponential distrib ution for a continuo us one-dim ensional random variable X , X ∼ E x ( λ ) , (8.103) depends on a single free parameter , λ ∈ R > 0 , which represents an in verse scale. Spectrum of values: X 7→ x ∈ R ≥ 0 . (8.104) Probability density function ( pdf ): f X ( x ) =      0 for x < 0 λ exp [ − λx ] , λ ∈ R > 0 for x ≥ 0 ; (8.105) its graph is shown in Fig. 8.14 below . 5 See also footn ote 2 in Sec. 3.4.2. 8.11. EXPONENT IAL DISTRIB UTION 85 0 1 2 3 4 5 0.0 0.5 1.0 1.5 2.0 x Exponentialpdf(x) Ex(1/4) Ex(1/2) Ex(1) Ex(2) Exponential distributions Figure 8 . 1 4: pdf of the exponential d i stribution according to Eq. (8.105). Displ ayed are the cases E x (1 / 4) , E x (1 / 2) , E x (1) and E x (2) . Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) =      0 for x < 0 1 − exp [ − λx ] for x ≥ 0 . (8.106) Expectation value, variance, skewness and excess kurtosis: 6 E( X ) = 1 λ (8.107) V ar( X ) = 1 λ 2 (8.108) Sk ew ( X ) = 2 (8.109) Kurt( X ) = 6 . (8.110) α –quanti l es: α ! = F X ( x α ) = 1 − exp [ − λx α ] ⇔ x α = F − 1 X ( α ) = − ln(1 − α ) λ for all 0 < α < 1 . (8.111) R : dexp ( x, λ ) , pexp ( x, λ ) , qexp ( α , λ ) , rexp ( n simulations , λ ) 6 The der ivation of these results entails integration by par ts for a nu mber of tim es; see, e.g ., Ref. [18, Sec . 8.1]. 86 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS 8.12 Logistic distrib ution The logistic distribut ion for a cont i nuous one-dimensional random v ariable X , X ∼ Lo ( µ ; s ) , (8.112) depends on two free parameters: a location parameter µ ∈ R and a scale parameter s ∈ R > 0 . Spectrum of values: X 7→ x ∈ R . (8.113 ) Probability density function ( pdf ): f X ( x ) = exp  − x − µ s  s  1 + exp  − x − µ s  2 , µ ∈ R , s ∈ R > 0 ; (8.114) its graph is shown in Fig. 8.15 below . 
−6 −4 −2 0 2 4 6 8 0.0 0.4 0.8 x Logisticpdf(x) Lo(−2; 1/4) Lo(−1; 1/2) Lo(0; 1) Lo(1; 2) Logistic distributions Figure 8.15: pdf of the logistic distribution according to Eq. (8.114). Di s played are the cases Lo ( − 2; 1 / 4) , Lo ( − 1; 1 / 2 ) , Lo (0; 1) and Lo (1; 2) . Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) = 1 1 + exp  − x − µ s  . (8.115) 8.13. SPECIAL HYPERBOLIC DISTRIB U T ION 87 Expectation value, variance, skewness and excess kurtosis (cf. Rinne (2008) [87, p 359]): E( X ) = µ (8.116) V ar( X ) = s 2 π 2 3 (8.117) Sk ew ( X ) = 0 (8.11 8) Kurt( X ) = 6 5 . (8.119) α –quanti l es: α ! = F X ( x α ) = 1 1 + exp  − x α − µ s  ⇔ x α = F − 1 X ( α ) = µ + s ln  α 1 − α  for all 0 < α < 1 . (8.120) R : dlogis ( x, µ, s ) , plogis ( x, µ, s ) , qlogis ( α, µ , s ) , rlogis ( n simulations , µ , s ) 8.13 Special h yp erbolic distr ib u tion The complex dynamics associated with th e formatio n of generic singul arities in relativistic cos- mology can be perceiv ed as a random process. In thi s context, the following s pecial hyperbolic distrib ution for a continu o us one-dim ensional random variable X , X ∼ sH y p , (8.121) which does n ot depend o n any free parameters, was introduced by Khalatnikov et al (1985) [49] to aid a simpl ified dynami cal descriptio n of singularity formati o n; see also Heinzle et al (2009) [41, Eq. (50)]. Spectrum of values: X 7→ x ∈ [0 , 1 ] ⊂ R ≥ 0 . (8.122) Probability density function ( pdf ): f X ( x ) =        1 ln(2) 1 1 + x for x ∈ [0 , 1] 0 otherwise ; (8.123) its graph is shown in Fig. 8.16 below . Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) =                  0 for x < 0 1 ln(2) ln(1 + x ) for x ∈ [0 , 1] 1 for x > 1 . (8.124) 88 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.5 1.0 1.5 x sHyppdf(x) sHyp Special h yperbolic distribution Figure 8.16: pdf of the special hyperboli c di stribution according to Eq. (8.123). Expectation value, variance, skewness and excess kurtosis: 7 E( X ) = 1 − ln(2) ln(2) (8.125) V ar( X ) = 3 ln(2) − 2 2 [ln(2)] 2 (8.126) Sk ew ( X ) = 7 [ln(2)] 2 − 27 2 ln(2) + 6 3  1 2  3 / 2 [3 ln(2) − 2] 3 / 2 (8.127) Kurt( X ) = 15 [ln(2)] 3 − 193 3 [ln(2)] 2 + 72 ln(2) − 24 [3 ln(2) − 2] 2 . (8.128) α –quanti l es: α ! = F X ( x α ) = 1 ln(2) ln(1 + x α ) ⇔ x α = F − 1 X ( α ) = e α ln(2) − 1 for all 0 < α < 1 . (8.129) 8.14 Cauchy distribution The French mathemati cian Aug ustin Louis Cauchy (1789–1857) is credited with the i nception into Statistics of the continuous two-parameter distribution law X ∼ C a ( b ; a ) , (8.13 0 ) with properties 7 Use po ly nomial division to simplify the integrands in the ensuing mom ent integrals wh en verifyin g these results. 8.14. CA UCHY DISTRIBUTION 89 Spectrum of values: X 7→ x ∈ R . (8.131 ) Probability density function ( pdf ): f X ( x ) = 1 π a a 2 + ( x − b ) 2 , with a ∈ R > 0 , b ∈ R ; (8.132) its graph is shown in Fig. 8.17 below for four particular cases. −8 −6 −4 −2 0 2 4 6 0.0 0.2 0.4 x Cauch ypdf(x) Ca(−2; 2) Ca(−1; 3/2) Ca(0; 1) Ca(1; 3/4) Cauch y distributions Figure 8.17: pd f of the Cauchy distribution according to Eq. (8.132). D i splayed are the cases C a ( − 2; 2) , C a ( − 1; 3 / 2 ) , C a (0 ; 1 ) and C a (1; 3 / 4) . The case C a (0; 1 ) correspond s to a t – distribution with d f = 1 degree of freedom; cf. Sec. 8.8. Cumulative dist ribution functi on ( cdf ): F X ( x ) = P ( X ≤ x ) = 1 2 + 1 π arctan  x − b a  . 
(8.133) Expectation value, variance, skewness and excess kurtosis: 8 E( X ) : does NO T exist due to a div er ging integral (8.134) V ar( X ) : does NO T exist due to a diver ging integral (8.135) Sk ew ( X ) : does NO T exist due to a diver ging integral (8.136) Kurt( X ) : does N OT exist due to a diver ging integral . (8.137) 8 In the case of a Cauchy distribution the fall-off in the tails of the p df is no t sufficiently fast f or th e expectation value a n d variance in tegrals, Eqs. (7.26) and ( 7.27), to c o n verge to finite values. Con sequently , this also concer ns the ske w n ess and excess kur tosis giv en in Eqs. (7 .29) and (7 .30). 90 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS See, e.g., Si via and Skilling (2006) [92, p 34]. α –quanti l es: α ! = F X ( x α ) ⇔ x α = F − 1 X ( α ) = b + a tan  π  α − 1 2  for all 0 < α < 1 . (8.138) R : dcauchy ( x, b, a ) , pca uchy ( x, b, a ) , qcau chy ( α, b, a ) , rcau chy ( n simulations , b, a ) 8.15 Central limit theorem The first systematic deri vation and presentation of the paramount central limi t the- ore m of Probability Theory i s due t o the French mathematician and astron o mer Marquis Pierre Simon de Laplace (1749–1827 ), cf. Laplace (18 09) [57]. Consider a set of n mutually stochastically independent [cf. Eqs. (7.62) and (7.63)], additive one-dimensional random var iables X 1 , . . . , X n , with (i) finit e expectation values µ 1 , . . . , µ n , (ii) fini te v ariances σ 2 1 , . . . , σ 2 n , which are not too different from one anot h er , and (iii) corresponding cdf s F 1 ( x ) , . . . , F n ( x ) . Introduce for this set a total sum Y n according to Eq. (7.36 ), and , by standardisatio n v ia Eq. (7.34), a related standardised summation random variable Z n := Y n − n X i =1 µ i v u u t n X j =1 σ 2 j . (8.139) Let F n ( z n ) denote the cdf associated with Z n . Then, subject to the con vergence condit ion lim n →∞ max 1 ≤ i ≤ n σ i v u u t n X j =1 σ 2 j = 0 , (8.140) i.e., that asym ptotically the s tandard deviation of the to t al s um dominates the standard devia- tions of any of the individual X i , and certain additional regularity requirement s (see, e.g., Rinne (2008) [87, p 427 f]), the central limit theor em i n its general form according to the Finnish mathematician Jarl W aldemar Lindeber g (1876–1932 ) and the Croatian–American math ematician 8.15. CENTRAL L IM IT THEOREM 91 W illiam Feller (1906–1 970) states that in the asymp t otic limit of infinitely many X i contributing to Y n (and so to Z n ), it holds that lim n →∞ F n ( z n ) = Φ( z ) , (8.141) i.e., the limit of the sequence of probabil ity dis tributions F n ( z n ) for the standardis ed sum - mation random v ariables Z n is constituted b y the standard normal distribution N (0; 1) , discussed in Sec. 8.6; cf. Lindeber g (1922) [63] and Feller (1951) [20]. Earlier results on the asymptotic distributional properties of a sum o f independent add i tiv e one-dim ensional random va riables were o b tained by the Russian mathemati cian, mechanician and physicis t Aleksandr Mikhailovich L yapu nov (1857–1918); cf. L yapunov (1901 ) [66]. Thus, under fa irly general conditions, the normal distribution acts as a stable attractor distri- b ution for the sum of n mutually stochastically independent, additive random v ariables X i . 
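This attractor property is easily illustrated by simulation. The following R sketch (with an arbitrarily chosen Ex(1) distribution for the X_i and sample sizes picked purely for illustration) standardises sums of n i.i.d. draws according to Eq. (8.139) and compares a few empirical quantiles with those of N(0; 1):

set.seed(1)                                   # for reproducibility
n <- 50                                        # number of summands per sum
m <- 10^4                                      # number of simulated sums
x <- matrix(rexp(n * m, rate = 1), ncol = n)   # i.i.d. Ex(1) variables: mu = 1, sigma^2 = 1
z <- (rowSums(x) - n) / sqrt(n)                # standardised sums, Eq. (8.139)
round(quantile(z, c(0.05, 0.50, 0.95)), 3)     # empirical quantiles of Z_n
round(qnorm(c(0.05, 0.50, 0.95)), 3)           # standard normal quantiles, for comparison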
9 In oversimplified terms: this result bears a certain econom ical con venience for most practical pur - poses in that, given fav ourable condi tions, wh en the size of a random sample is suffi ciently large (in practice, a typical rule of thumb is n ≥ 50 ), one essentially needs to know the characteristic features of only a s i ngle continuous univ ariate prob ability distribution t o p erform, e.g., nul l hypoth- esis sign i ficance testing wit hin the frequentist framew ork; cf. Ch. 11. As will become apparent in subsequent chapters, the central limit theorem h as p rofound ramifications for applications in all empirical scientific disciplin es. Note that for fini t e n t h e central limi t theorem m akes no st atement as to the nature of t h e tails of the probabilit y distribution for Z n (or for Y n ), where, i n principle, it can b e very diffe rent from a normal distribution; cf. Bouchaud and Pott ers (2003) [4, p 25f]. A direct consequence of the central limi t t heorem and its preconditions is the fact that for the sample mean ¯ X n , defined in Eq. (7.36) above, both lim n →∞ E( ¯ X n ) = lim n →∞ n X i =1 µ i n and lim n →∞ V ar( ¯ X n ) = lim n →∞ n X i =1 σ 2 i n 2 con ver ge to finite va lues. This property is most easil y recognised i n the special case of n mu- tually s tochastically indepen dent and identically distrib uted (in short: “i.i.d. ”) additive one- dimensional random va riables X 1 , . . . , X n , which have commo n finit e expectation va lue µ , com- mon finite variance σ 2 , and common cdf F ( x ) . 10 Then, lim n →∞ E( ¯ X n ) = lim n →∞ nµ n = µ (8.142) lim n →∞ V ar( ¯ X n ) = lim n →∞ nσ 2 n 2 = lim n →∞ σ 2 n = 0 . (8.143) 9 Put differently , f or incr easingly large n the cdf of the total sum Y n approx imates a normal distribution with expectation value n X i =1 µ i and variance n X i =1 σ 2 i to an increasingly accurate degree. In pa r ticular, all rep r oductive distri- butions may be appr oximated by a n ormal distribution as n becomes large. 10 These con ditions lead to the cen tral limit theore m in the special form accord ing to Jarl W aldemar Lind eberg (1876 –1932 ) and th e French m athematician Paul Pierre Lév y (1886 –1971 ) . 92 CHAPTER 8. ST AND ARD UNIV ARIA TE PR OBAB ILITY DISTRIBUTIONS This result is known as the law of large numbers according to the Swiss mathematician Jakob Bernoulli (1654–1705); the sample mean ¯ X n con verges s tochastical ly to its e x p ectation value µ . W e p o int out that a counter-example to the central l imit theorem is giv en by a set of n i.i. d . Pareto- distributed with exponent γ ≤ 2 one-dimensional random variables X i , since in t his case the var iance of the X i is undefined; cf. Eq . (8.94). This end s Part II of these lecture notes, and we now turn to Part III in whi ch we focus on a number of useful applications of infer ential statistical methods of data analysis within th e freque ntist framework . Data analysis t echniques wi t hin t he conceptually com p elling Bayes–Laplace frame- work ha ve been revie wed, e.g., in the online lecture n o tes by Saha (20 0 2 ) [88], in the textbooks by Si via and Skilling (2006) [92 ], Gelman et al (2014) [30] and McElreath (20 1 6) [69], and in the lecture notes of Ref. [19]. 
Chapter 9

Operationalisation of latent variables: Likert's scaling method of summated item ratings

A sound operationalisation of one's portfolio of statistical variables in quantitative–empirical research is key to a successful and effective application of statistical methods of data analysis, particularly in the Social Sciences and Humanities. The most frequently practiced method to date for operationalising latent variables (such as unobservable "social constructs") is due to the US-American psychologist Rensis Likert (1903–1981). In his 1932 paper [62], which completed his thesis work for a Ph.D., he expressed the idea that latent statistical variables X_L, when they may be perceived as one-dimensional in nature, can be rendered measurable in a quasi-metrical fashion by means of the summated ratings over an extended set of suitable and observable indicator items X_i (i = 1, 2, …), which, in order to ensure effectiveness, ought to be (i) highly interdependent and possess (ii) high discriminatory power. Such indicator items are often formulated as specific statements relating to the theoretical concept a particular one-dimensional latent variable X_L is supposed to capture, with respect to which test persons need to express their subjective level of agreement or, in different settings, indicate a specific subjective degree of intensity. A typical item rating scale for the indicator items X_i, providing the necessary item ratings, is given for instance by the 5-level ordinally ranked attributes of agreement

1: strongly disagree/strongly unfavourable
2: disagree/unfavourable
3: undecided
4: agree/favourable
5: strongly agree/strongly favourable.

In the research literature, one also encounters 7-level or 10-level item rating scales, which offer more flexibility. Note that it is assumed (!) from the outset that the items X_i, and thus their ratings, can be treated as additive, so that the conceptual principles of Sec. 7.6 relating to sums of random variables can be relied upon. When forming the sum over the ratings of all the indicator items one selected, it is essential to carefully pay attention to the polarity of the items involved. For the resultant total sum \sum_i X_i to be consistent, the polarity of all items used needs to be uniform.^1

The construction of a consistent and coherent Likert scale for a one-dimensional latent statistical variable X_L involves four basic steps (see, e.g., Trochim (2006) [109]):

(i) the compilation of an initial list of 80 to 100 potential indicator items X_i for the one-dimensional latent variable of interest,

(ii) the draw of a gauge random sample from the target population Ω,

(iii) the computation of the total sum \sum_i X_i of item ratings, and, most importantly,

(iv) the performance of an item analysis based on the sample data and the associated total sum \sum_i X_i of item ratings.

The item analysis, in particular, consists of the consequential application of two exclusion criteria, which aim at establishing the scientific quality of the final Likert scale. Items are discarded from the list when either (a) they show a weak item-to-total correlation with the total sum \sum_i X_i (a rule of thumb is to exclude items with correlations less than 0.
(b) it is possible to increase the value of Cronbach's α-coefficient (see Cronbach (1951) [14]), a measure of the scale's internal consistency reliability, by excluding a particular item from the list (the objective being to attain α-values greater than 0.8). The coefficient is named after the US-American educational psychologist Lee Joseph Cronbach (1916–2001); the range of the normalised real-valued α-coefficient is the interval [0, 1]. For a set of m ∈ N indicator items X_i, Cronbach's α-coefficient is defined by

$$\alpha := \frac{m}{m-1}\left(1 - \frac{\sum_{i=1}^{m} S_i^2}{S_{\mathrm{total}}^2}\right)\,, \tag{9.1}$$

where S_i² denotes the sample variance associated with the i-th indicator item (perceived as being metrically scaled), and S²_total is the sample variance of the total sum ∑_i X_i.

R: alpha(items) (package: psych, by Revelle (2019) [86])
SPSS: Analyze → Scale → Reliability Analysis . . . (Model: Alpha) → Statistics . . . : Scale if item deleted

The outcome of the item analysis is a drastic reduction of the initial list to a set of just k ∈ N indicator items X_i (i = 1, ..., k) of high discriminatory power, where k is typically in the range of 10 to 15 (in many research papers, however, one finds Likert scales with a minimum of just four indicator items). The associated total sum

$$X_L := \sum_{i=1}^{k} X_i \tag{9.2}$$

thus operationalises the one-dimensional latent statistical variable X_L in a quasi-metrical fashion, since it is to be measured on an interval scale with a discrete spectrum of values given (for a 5-level item rating scale) by

$$X_L \mapsto \sum_{i=1}^{k} x_i \in [1k, 5k]\,. \tag{9.3}$$

The structure of a finalised discrete k-indicator-item Likert scale for some one-dimensional latent statistical variable X_L with an equidistant graphical 5-level item rating scale is displayed in Tab. 9.1.

One-dimensional latent statistical variable X_L:
  Item X_1:  strongly disagree  ☐ ☐ ☐ ☐ ☐  strongly agree
  Item X_2:  strongly disagree  ☐ ☐ ☐ ☐ ☐  strongly agree
  ...
  Item X_k:  strongly disagree  ☐ ☐ ☐ ☐ ☐  strongly agree

Table 9.1: Structure of a discrete k-indicator-item Likert scale for some one-dimensional latent statistical variable X_L, based on a visualised equidistant 5-level item rating scale.

Likert's scaling method of aggregating information from a set of k highly interdependent ordinally scaled items to form an effectively quasi-metrical, one-dimensional total sum X_L = ∑_i X_i draws its legitimisation to a large extent from a generalised version of the central limit theorem (cf. Sec. 8.15), wherein the precondition of mutually stochastically independent variables contributing to the sum is relaxed. In practice it is found that, for many cases of interest, in the samples one has available for research the total sum X_L = ∑_i X_i is normally distributed to a very good approximation. Nevertheless, the normality property of Likert scale data needs to be established on a case-by-case basis. The main shortcoming of Likert's approach is the dependence of the scale's gauging process on the target population.
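As a concrete illustration of Eq. (9.1), the following R sketch (not from the original notes; the simulated 200-person gauge sample and the five items are invented) computes Cronbach's α directly from the item and total-sum sample variances; it can then be cross-checked against the psych package's alpha() routine cited above:

```r
# Cronbach's alpha, Eq. (9.1), computed from first principles on simulated
# 5-level ratings of m = 5 indicator items driven by one latent variable.
set.seed(1)
latent <- rnorm(200)
items <- sapply(1:5, function(i) {
  raw <- latent + rnorm(200, sd = 0.8)   # item = latent signal + noise
  cut(raw, breaks = 5, labels = FALSE)   # discretise onto a 1..5 rating scale
})

m      <- ncol(items)
S2.i   <- apply(items, 2, var)           # item sample variances S_i^2
S2.tot <- var(rowSums(items))            # sample variance of the total sum
(m / (m - 1)) * (1 - sum(S2.i) / S2.tot) # Cronbach's alpha, Eq. (9.1)
# Cross-check: psych::alpha(as.data.frame(items)) reports the same raw alpha.
```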
In the Social Sciences, a broad variety of operationalisation procedures alternative to the discrete Likert scale is available. We restrict ourselves here to mentioning but one example, namely the continuous psychometric visual analogue scale (VAS) developed by Hayes and Paterson (1921) [40] and by Freyd (1923) [26]. Further measurement scales for latent statistical variables can be obtained from the websites zis.gesis.org, German Social Sciences measurement scales (ZIS), and ssrn.com, Social Science Research Network (SSRN).

On a historical note: one of the first systematically designed questionnaires as a measurement tool for collecting socio-economic data (from workers on strike at the time in Britain) was published by the Statistical Society of London in 1838; see Ref. [97].

Chapter 10

Random sampling of target populations

Quantitative–empirical research methods may be employed for exploratory as well as for confirmatory data analysis. Here we will focus on the latter, in the context of a frequentist viewpoint of Probability Theory and statistical inference. To investigate research questions systematically by statistical means — with the objective to make inferences about the distributional properties of a set of statistical variables in a specific target population Ω of study objects on the basis of analysis of data from just a few units in a sample S_Ω — the following three issues have to be addressed in a clearcut fashion:

(i) the target population Ω of the research activity needs to be defined in an unambiguous way,

(ii) an adequate random sample S_Ω needs to be drawn from an underlying sampling frame L_Ω associated with Ω, and

(iii) a reliable mathematical procedure for estimating quantitative population parameters from random sample data needs to be employed.

We will briefly discuss these issues in turn, beginning with a review, in Tab. 10.1, of conventional notation for distinguishing specific statistical measures relating to target populations Ω on the one hand from the corresponding ones relating to random samples S_Ω on the other.

One-dimensional random variables in a target population Ω (of size N) — as which statistical variables will be understood subsequently — will be denoted by capital Latin letters such as X, Y, ..., Z, while their realisations in random samples S_Ω (of size n) will be denoted by lower case Latin letters such as x_i, y_i, ..., z_i (i = 1, ..., n). In addition, one denotes population parameters by lower case Greek letters, while for their corresponding point estimator functions relating to random samples, which are also perceived as random variables, again capital Latin letters are used for representation. The ratio n/N will be referred to as the sampling fraction. As is standard in the statistical literature, we will denote a particular random sample of size n for a one-dimensional random variable X by a set S_Ω: (X_1, ..., X_n), with X_i representing any arbitrary random variable associated with X in this sample.

In actual practice, it is often not possible to acquire access for the purpose of enquiry to every single statistical unit belonging to an identified target population Ω, not even in principle. For example, this could be due to the fact that Ω's size N is far too large to be determined accurately.
In this case, to ensure a reliable investigation, one needs to resort to using a sampling frame L_Ω for Ω. By this one understands a representative list of elements in Ω to which access can actually be obtained one way or another. Such a list will have to be compiled by some authority of scientific integrity. In an attempt to avoid a notational overflow in the following, we will continue to use N to denote both the size of the target population Ω and the size of its associated sampling frame L_Ω (even though this is not entirely accurate).

Target population Ω                    | Random sample S_Ω
---------------------------------------|------------------------------------------
population size N                      | sample size n
arithmetical mean µ                    | sample mean X̄_n
standard deviation σ                   | sample standard deviation S_n
median x̃_0.5                          | sample median X̃_0.5,n
correlation coefficient ρ              | sample correlation coefficient r
rank correlation coefficient ρ_S       | sample rank correlation coefficient r_S
regression coefficient (intercept) α   | sample regression intercept a
regression coefficient (slope) β       | sample regression slope b

Table 10.1: Notation for distinguishing between statistical measures relating to a target population Ω on the one hand, and the corresponding quantities and unbiased maximum likelihood point estimator functions obtained from a random sample S_Ω on the other.

As regards the specific sampling process, one may distinguish cross-sectional one-off sampling at a fixed instant from longitudinal multiple sampling over a finite time interval. (In a sense, cross-sectional sampling will yield a "snapshot" of a target population of interest in a particular state, while longitudinal sampling is the basis for producing a "film" featuring a particular evolutionary aspect of a target population of interest.) We now proceed to introduce the three most commonly practiced methods of drawing random samples from given fixed target populations Ω of statistical units.

10.1 Random sampling methods

10.1.1 Simple random sampling

The simple random sampling technique can be best understood in terms of the urn model of combinatorics introduced in Sec. 6.4. Given a target population Ω (or sampling frame L_Ω) of N distinguishable statistical units, there is a total of $\binom{N}{n}$ distinct possibilities of drawing samples of size n from Ω (or L_Ω), given the order of selection is not being accounted for and excluding repetitions; see Sec. 6.4.2. A simple random sample is then defined by the property that its probability of selection is equal to

$$\frac{1}{\binom{N}{n}}\,, \tag{10.1}$$

according to the Laplacian principle of Eq. (6.11). This has the immediate consequence that the a priori probability of selection of any single statistical unit is given by

$$1 - \frac{\binom{N-1}{n}}{\binom{N}{n}} = 1 - \frac{N-n}{N} = \frac{n}{N}\,. \tag{10.2}$$

(In the statistical literature this particular property of a random sample is referred to as "epsem": equal probability of selection method.) On the other hand, the probability that two statistical units i and j will be members of the same sample of size n amounts to

$$\frac{n}{N} \times \frac{n-1}{N-1}\,. \tag{10.3}$$

As such, by Eq. (6.16), this type of selection procedure of two statistical units proves not to yield two stochastically independent units (in which case the joint probability of selection would be n/N × n/N). However, for sampling fractions n/N ≤ 0.05, stochastic independence of the selection of statistical units generally holds to a reasonably good approximation.
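The selection probabilities of Eqs. (10.1)–(10.3) can be verified numerically; the short R sketch below (not part of the original notes; N and n are chosen arbitrarily) evaluates them with choose() and draws one simple random sample without repetition:

```r
# Numerical check of Eqs. (10.1)-(10.3) for a small illustrative population.
N <- 20; n <- 4

1 / choose(N, n)                     # probability of one particular sample, Eq. (10.1)
1 - choose(N - 1, n) / choose(N, n)  # a priori selection probability of a unit, Eq. (10.2): n/N
(n / N) * ((n - 1) / (N - 1))        # joint selection probability of two units, Eq. (10.3)

sample(1:N, size = n)                # one simple random sample (no repetitions by default)
```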
When, in addition, the sample size is n ≥ 50, the conditions for the central limit theorem in the variant of Lindeberg and Lévy (cf. Sec. 8.15) to apply often hold to a fairly good degree.

10.1.2 Stratified random sampling

Stratified random sampling adapts the sampling process to a known intrinsic structure of the target population Ω (and its associated sampling frame L_Ω), as provided by the k mutually exclusive and exhaustive categories of some qualitative (nominal or ordinal) variable; these thus define a set of k strata (layers) of Ω (or L_Ω). By construction, there are N_i statistical units belonging to the i-th stratum (i = 1, ..., k). Simple random samples of sizes n_i are drawn from each stratum according to the principles outlined in Sec. 10.1.1, yielding a total sample of size n = n_1 + ... + n_k. Frequently applied variants of this sampling technique are (i) proportionate allocation of statistical units, defined by the condition

$$\frac{n_i}{n} \stackrel{!}{=} \frac{N_i}{N} \quad\Rightarrow\quad \frac{n_i}{N_i} = \frac{n}{N}\,; \tag{10.4}$$

note that this, too, has the "epsem" property; in particular, it allows for a fair representation of minorities in Ω; and (ii) optimal allocation of statistical units, which aims at a minimisation of the resultant sampling errors of the variables investigated. Further details on the stratified random sampling technique can be found, e.g., in Bortz and Döring (2006) [6, p 425ff]; a small worked allocation example is sketched after the next subsection.

10.1.3 Cluster random sampling

When the target population Ω (and its associated sampling frame L_Ω) naturally subdivides into an exhaustive set of K mutually exclusive clusters of statistical units, a convenient sampling strategy is given by selecting k < K clusters from this set at random and performing complete surveys within each of the chosen clusters. The probability of selection of any particular statistical unit from Ω (or L_Ω) thus amounts to k/K. This cluster random sampling method has the practical advantage of being less contrived. However, in general it entails sampling errors that are greater than for the previous two sampling methods. Further details on the cluster random sampling technique can be found, e.g., in Bortz and Döring (2006) [6, p 435ff].

We emphasise at this point that empirical data gained from convenience samples (in contrast to random samples) is not amenable to statistical inference, in that its information content cannot be generalised to the target population Ω from which it was drawn; see, e.g., Bryson (1976) [9, p 185], or Schnell et al (2013) [91, p 289].
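The proportionate allocation rule of Eq. (10.4) from Sec. 10.1.2 is easily put into practice; the following R sketch (not from the original notes; the 1000-unit frame and the three strata are invented) allocates a total sample of n = 100 across strata in proportion to the stratum sizes N_i:

```r
# Proportionate allocation, Eq. (10.4), on a hypothetical sampling frame.
set.seed(7)
frame <- data.frame(id = 1:1000,
                    stratum = sample(c("A", "B", "C"), size = 1000,
                                     replace = TRUE, prob = c(0.5, 0.3, 0.2)))
n   <- 100
N.i <- table(frame$stratum)          # stratum sizes N_i
n.i <- round(n * N.i / nrow(frame))  # allocations n_i = n * N_i / N

# Simple random sample of size n_i within each stratum:
ids <- unlist(lapply(names(n.i), function(s)
  sample(frame$id[frame$stratum == s], size = n.i[[s]])))
table(frame$stratum[frame$id %in% ids])  # realised allocation mirrors N_i / N
```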
10.2 Point estimator functions

Many inferential statistical methods of data analysis in the frequentist framework revolve around the estimation of unknown distribution parameters θ with respect to some target population Ω by means of corresponding maximum likelihood point estimator functions θ̂_n(X_1, ..., X_n) (or: statistics), the values of which are computed from the data of random samples S_Ω: (X_1, ..., X_n). Owing to the stochastic nature of the random sampling process, any point estimator function θ̂_n(X_1, ..., X_n) is subject to a random sampling error. One can show that this estimation procedure becomes reliable provided that a point estimator function satisfies the following two important criteria of quality:

(i) Unbiasedness: E(θ̂_n) = θ, and

(ii) Consistency: lim_{n→∞} Var(θ̂_n) = 0.

For metrically scaled one-dimensional random variables X, defining for a given random sample S_Ω: (X_1, ..., X_n) of size n a sample total sum by

$$Y_n := \sum_{i=1}^{n} X_i\,, \tag{10.5}$$

the two most prominent maximum likelihood point estimator functions satisfying the unbiasedness and consistency conditions are the sample mean and sample variance, defined by

$$\bar{X}_n := \frac{1}{n}\, Y_n \tag{10.6}$$

$$S_n^2 := \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2\,. \tag{10.7}$$

These will be frequently employed in subsequent considerations in Ch. 12 for point-estimating the values of the location and scale parameters µ and σ² of the distribution for a one-dimensional random variable X in a target population Ω.

Sampling theory in the frequentist framework holds that the standard errors (SE) associated with the maximum likelihood point estimator functions X̄_n and S_n², defined in Eqs. (10.6) and (10.7), amount to the standard deviations of the underlying theoretical sampling distributions for these functions; see, e.g., Cramér (1946) [13, Chs. 27 to 29]. For a given target population Ω (or sampling frame L_Ω) of size N, imagine drawing all possible $\binom{N}{n}$ mutually independent random samples of a fixed size n (no order accounted for and repetitions excluded), from each of which individual realisations of X̄_n and S_n² are obtained. The theoretical distributions for all such realisations of X̄_n resp. S_n² for given N and n are referred to as their corresponding sampling distributions. A useful simulation illustrating the concept of a sampling distribution is available at the website onlinestatbook.com. In the limit that N → ∞ while keeping n fixed, the theoretical sampling distributions of X̄_n and S_n² become normal (cf. Sec. 8.6) resp. χ² with n − 1 degrees of freedom (cf. Sec. 8.7), with standard deviations

$$SE_{\bar{X}_n} := \frac{S_n}{\sqrt{n}} \tag{10.8}$$

$$SE_{S_n^2} := \sqrt{\frac{2}{n-1}}\, S_n^2\,; \tag{10.9}$$

cf., e.g., Lehmann and Casella (1998) [59, p 91ff], and Levin et al (2010) [61, Ch. 6]. Thus, for a finite sample standard deviation S_n, these two standard errors decrease with the sample size n in proportion to the inverse of √n resp. the inverse of √(n−1).

It is a main criticism of proponents of the Bayes–Laplace approach to Probability Theory and statistical inference that the concept of a sampling distribution for a maximum likelihood point estimator function is based on unobserved data; cf. Greenberg (2013) [35, p 31f].
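The following R sketch (not from the original notes; the sample is simulated) evaluates the point estimators of Eqs. (10.6)–(10.7) and their standard errors per Eqs. (10.8)–(10.9); note that R's built-in mean() and var() implement exactly these definitions:

```r
# Point estimates and standard errors, Eqs. (10.6)-(10.9), for one sample.
set.seed(3)
x <- rnorm(100, mean = 50, sd = 10)
n <- length(x)

x.bar <- mean(x)                   # sample mean, Eq. (10.6)
s2    <- var(x)                    # sample variance with factor 1/(n-1), Eq. (10.7)

se.mean <- sqrt(s2) / sqrt(n)      # SE of the sample mean, Eq. (10.8)
se.var  <- sqrt(2 / (n - 1)) * s2  # SE of the sample variance, Eq. (10.9)
round(c(x.bar = x.bar, s2 = s2, SE.mean = se.mean, SE.var = se.var), 3)
```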
There are likewise unbiased maximum likelihood point estimators for the shape parameters γ_1 and γ_2 of the probability distribution for a one-dimensional random variable X in a target population Ω, as given in Eqs. (7.29) and (7.30). For n > 2 resp. n > 3, the sample skewness and sample excess kurtosis in, e.g., their implementation in the software packages R (package: e1071, by Meyer et al (2019) [71]) or SPSS are defined by (see, e.g., Joanes and Gill (1998) [45, p 184])

$$G_1 := \frac{\sqrt{(n-1)\,n}}{n-2}\; \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}_n\right)^3}{\left[\frac{1}{n}\sum_{j=1}^{n}\left(X_j-\bar{X}_n\right)^2\right]^{3/2}} \tag{10.10}$$

$$G_2 := \frac{n-1}{(n-2)(n-3)}\left[(n+1)\left(\frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}_n\right)^4}{\left[\frac{1}{n}\sum_{j=1}^{n}\left(X_j-\bar{X}_n\right)^2\right]^{2}} - 3\right) + 6\right]\,, \tag{10.11}$$

with associated standard errors (cf. Joanes and Gill (1998) [45, p 185f])

$$SE_{G_1} := \sqrt{\frac{6\,n\,(n-1)}{(n-2)(n+1)(n+3)}} \tag{10.12}$$

$$SE_{G_2} := 2\,\sqrt{\frac{6\,n\,(n-1)^2}{(n-3)(n-2)(n+3)(n+5)}}\,. \tag{10.13}$$
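Eqs. (10.10)–(10.13) translate directly into R; the sketch below (not from the original notes; the exponential sample is invented) computes G_1, G_2 and their standard errors from central moments, and can be cross-checked against the e1071 package's skewness() and kurtosis() routines with type = 2, which implement these same definitions:

```r
# Sample skewness and excess kurtosis with standard errors, Eqs. (10.10)-(10.13).
set.seed(9)
x <- rexp(200); n <- length(x)

m2 <- mean((x - mean(x))^2)   # biased central moments entering Eqs. (10.10)-(10.11)
m3 <- mean((x - mean(x))^3)
m4 <- mean((x - mean(x))^4)

G1 <- sqrt((n - 1) * n) / (n - 2) * m3 / m2^(3 / 2)                   # Eq. (10.10)
G2 <- (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2^2 - 3) + 6) # Eq. (10.11)

SE.G1 <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))        # Eq. (10.12)
SE.G2 <- 2 * sqrt(6 * n * (n - 1)^2 /
                  ((n - 3) * (n - 2) * (n + 3) * (n + 5)))            # Eq. (10.13)
round(c(G1 = G1, G2 = G2, SE.G1 = SE.G1, SE.G2 = SE.G2), 4)
```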
Chapter 11

Null hypothesis significance testing

Null hypothesis significance testing by means of observable quantities is the centrepiece of the current body of inferential statistical methods in the frequentist framework. Its logic of an ongoing routine of systematic falsification of null hypotheses by empirical means is firmly rooted in the ideas of critical rationalism and logical positivism. These ideas were expressed most emphatically by the Austro–British philosopher Sir Karl Raimund Popper CH FRS FBA (1902–1994); see, e.g., Popper (2002) [83]. The systematic procedure for null hypothesis significance testing on the grounds of observational evidence, as practiced today within the frequentist framework as a standardised method of probability-based decision-making, was developed during the first half of the 20th Century, predominantly by the English statistician, evolutionary biologist, eugenicist and geneticist Sir Ronald Aylmer Fisher FRS (1890–1962), the Polish–US-American mathematician and statistician Jerzy Neyman (1894–1981), the English mathematician and statistician Karl Pearson FRS (1857–1936), and his son, the English statistician Egon Sharpe Pearson CBE FRS (1895–1980); cf. Fisher (1935) [24], Neyman and Pearson (1933) [75], and Pearson (1900) [78]. We will describe the main steps of the systematic test procedure in the following.

11.1 General procedure

The central aim of null hypothesis significance testing is to separate, as reliably as possible, true effects in a target population Ω of statistical units — concerning distributional properties of, or relations between, selected statistical variables X, Y, ..., Z — from chance effects potentially injected by the sampling approach to probing the nature of Ω. The sampling approach results in a, generally unavoidable, state of incomplete information on the part of the researcher.

In an inferential statistical context, (null and/or research) hypotheses are formulated as assumptions on (i) the probability distribution function F of one or more random variables X, Y, ..., Z in Ω, or on (ii) one or more parameters θ of this probability distribution function.

Generically, statistical hypotheses need to be viewed as probabilistic statements. As such, the researcher will always have to deal with a fair amount of uncertainty in deciding whether an observed, potentially only apparent effect is statistically significant and/or practically significant in Ω or not. Bernstein (1998) [3, p 207] summarises the circumstances relating to the test of a specific hypothesis as follows: "Under conditions of uncertainty, the choice is not between rejecting a hypothesis and accepting it, but between reject and not–reject."

The question arises as to which kinds of quantitative problems can be efficiently settled by statistical means. With respect to a given target population Ω, in the simplest kinds of applications of null hypothesis significance testing, one may (a) test for differences in the distributional properties of a single one-dimensional statistical variable X between a number of subgroups of Ω, necessitating univariate methods of data analysis, or one may (b) test for association for a two-dimensional statistical variable (X, Y), thus requiring bivariate methods of data analysis.

The standardised procedure for null hypothesis significance testing, practiced within the frequentist framework for the purpose of assessing statistical significance of an observed, potentially apparent effect, takes the following six steps on the way to making a decision:

Six-step procedure for null hypothesis significance testing

1. Formulation, with respect to the target population Ω, of a pair of mutually exclusive hypotheses:

(a) the null hypothesis H_0 conjectures that "there exists no effect in Ω of the kind envisaged by the researcher," while

(b) the research hypothesis H_1 conjectures that "there does exist a true effect in Ω of the kind envisaged by the researcher."

The starting point of the test procedure is the assumption (!) that it is the content of the H_0 conjecture which is realised in Ω. The objective is to try to refute H_0 empirically on the basis of random sample data drawn from Ω, to a level of significance which needs to be specified in advance. In this sense it is H_0 which is being subjected to a statistical test — Bernstein (1998) [3, p 209] refers to the statistical test of a (null) hypothesis as a "mathematical stress test." The striking asymmetry regarding the roles of H_0 and H_1 in the test procedure embodies the notion of a falsification of hypotheses, as advocated by critical rationalism.

2. Specification of a significance level α prior to the performance of the test, where, by convention, α ∈ [0.01, 0.05]. The parameter α is synonymous with the probability of committing a Type I error (to be defined below) in making a test decision.

3. Construction of a suitable continuous real-valued measure for quantifying deviations of the data in a random sample S_Ω: (X_1, ..., X_n) of size n from the initial "no effect in Ω" conjecture of H_0: a test statistic T_n(X_1, ..., X_n) that is perceived as a one-dimensional random variable with (under the H_0 assumption) known (!) associated theoretical probability distribution for computing related event probabilities. The latter is referred to as the test distribution. (Within the frequentist framework of null hypothesis significance testing, the test statistic and its partner test distribution form an intimate pair of decision-making devices.)

4. Determination of the rejection region B_α for H_0 within the spectrum of values of the test statistic T_n(X_1, ..., X_n) from re-arranging the conditional probability condition

$$P\left(T_n(X_1,\ldots,X_n) \in B_\alpha \,\middle|\, H_0\right) \stackrel{!}{\leq} \alpha\,, \tag{11.1}$$

where P(...) and the threshold α-quantile(s) P^{-1}(α) demarking the boundary(ies) of B_α are to be calculated from the assumed (continuous) test distribution.
5. Computation of a specific realisation t_n(x_1, ..., x_n) of the test statistic T_n(X_1, ..., X_n) from the data x_1, ..., x_n in a random sample S_Ω: (X_1, ..., X_n), the latter of which constitutes the required observational evidence.

6. Derivation of a test decision on the basis of the following alternative criteria: when, for the realisation t_n(x_1, ..., x_n) of the test statistic T_n(X_1, ..., X_n), resp. the p-value (to be defined in Sec. 11.2 below) associated with this realisation (the statistical software packages R and SPSS provide p-values as a means for making decisions in null hypothesis significance testing), it holds that

(i) t_n ∈ B_α, resp. p-value < α, then ⇒ reject H_0,

(ii) t_n ∉ B_α, resp. p-value ≥ α, then ⇒ not reject H_0.

A fitting metaphor for the six-step procedure for null hypothesis significance testing just described is that of a statistical long jump competition. The issue here is to find out whether actual empirical data deviates sufficiently strongly from the "no effect" reference state conjectured in the given null hypothesis H_0, so as to land in the corresponding rejection region B_α within the spectrum of values of the test statistic T_n(X_1, ..., X_n). Steps 1 to 4 prepare the long jump facility (the test stage), while the evaluation of the outcome of the jump attempt takes place in steps 5 and 6. Step 4 necessitates the direct application of Probability Theory within the frequentist framework, in that the determination of the rejection region B_α for H_0 entails the calculation of a conditional event probability from an assumed test distribution.

When an effect observed on the basis of random sample data proves to possess statistical significance (to a predetermined significance level), this means that most likely it has come about not by chance due to the sampling methodology. A different matter altogether is whether such an effect also possesses practical significance, so that, for instance, management decisions ought to be adapted to it. Practical significance of an observed effect can be evaluated, e.g., with the standardised and scale-invariant effect size measures proposed by Cohen (1992, 2009) [11, 12]. Addressing the practical significance of an observed effect should be commonplace in any report on inferential statistical data analysis; see also Sullivan and Feinn (2012) [102].
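As a compact illustration of steps 2 to 6, consider the following R sketch (not from the original notes; the right-sided setting, the standard normal test distribution, and the realised value t_n = 2.1 are all invented for illustration):

```r
# Steps 2-6 of the six-step procedure for a right-sided test whose test
# statistic follows a standard normal test distribution under H0.
alpha  <- 0.05              # step 2: significance level fixed in advance
z.crit <- qnorm(1 - alpha)  # step 4: boundary of the rejection region B_alpha

t.n <- 2.1                  # step 5: realisation of the test statistic from sample data
p   <- 1 - pnorm(t.n)       # right-sided p-value, cf. Eq. (11.5)

if (t.n > z.crit) "reject H0" else "do not reject H0"  # step 6: test decision
```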
When performing null hypothesis significance testing, the researcher is always at risk of making a wrong decision. Hereby, one distinguishes between the following two kinds of potential error:

• Type I error: reject an H_0 which, however, is true, with conditional probability P(H_1 | H_0 true) = α; this case is also referred to as a "false positive," and

• Type II error: not reject an H_0 which, however, is false, with conditional probability P(H_0 | H_1 true) = β; this case is also referred to as a "false negative."

By fixing the significance level α prior to running a statistical test, one controls the risk of committing a Type I error in the decision process. We condense the different possible outcomes when making a test decision in Tab. 11.1.

                       Decision for H_0: no effect               Decision for H_1: effect
Reality/Ω: H_0 true    correct decision: P(H_0|H_0 true) = 1−α   Type I error: P(H_1|H_0 true) = α
Reality/Ω: H_1 true    Type II error: P(H_0|H_1 true) = β        correct decision: P(H_1|H_1 true) = 1−β

Table 11.1: Consequences of test decisions in null hypothesis significance testing.

While the probability α is required to be specified a priori to a statistical test, the probability β is typically computed a posteriori. One refers to the probability 1 − β associated with the latter as the power of a statistical test. Its magnitude is determined in particular by the parameters sample size n, significance level α, and the effect size of the phenomenon to be investigated; see, e.g., Cohen (2009) [12] and Hair et al (2010) [36, p 9f].

As emphasised at the beginning of this chapter, null hypothesis significance testing is at the heart of quantitative–empirical research rooted in the frequentist framework. To foster scientific progress in this context, it is essential that the scientific community, in an act of self-control, aims at repeated replication of specific test results in independent investigations. An interesting article in this respect was published by the weekly magazine The Economist on Oct 19, 2013, see Ref. [17], which points out that, when subjected to such scrutiny, negative empirical results (H_0 not rejected) generally prove much more reliable than positive ones (H_0 rejected), though scientific journals tend to have a bias towards publication of the latter. A similar viewpoint is expressed in the paper by Nuzzo (2014) [76]. Rather critical accounts of the conceptual foundations of null hypothesis significance testing are given in the works by Gill (1999) [32] and by Kruschke and Liddell (2017) [53].

The complementary Bayes–Laplace approach to statistical data analysis (cf. Sec. 6.5.2) requires neither the prior specification of a significance level α, nor the introduction of a test statistic T_n(X_1, ..., X_n) with a partner test distribution for the empirical testing of a (null) hypothesis. As described in detail by Jeffreys (1939) [44], Jaynes (2003) [43], Sivia and Skilling (2006) [92], Gelman et al (2014) [30] or McElreath (2016) [69], here statistical inference is practiced entirely on the basis of a posterior probability distribution P(hypothesis | data, I) for the (research) hypothesis to be tested, conditional on the empirical data that was analysed for this purpose, and on the "relevant background information I" available to the researcher beforehand. By employing Bayes' theorem [cf. Eq. (6.18)], this posterior probability distribution is computed in particular from the product between the likelihood function P(data | hypothesis, I) of the data, given the hypothesis and I, and the prior probability distribution P(hypothesis, I) encoding the researcher's initial reasonable degree-of-belief in the truth content of the hypothesis on the backdrop of I.
That is (see Sivia and Skilling (2006) [92, p 6]),

$$P(\text{hypothesis}\,|\,\text{data}, I) \propto P(\text{data}\,|\,\text{hypothesis}, I) \times P(\text{hypothesis}, I)\,. \tag{11.2}$$

The Bayes–Laplace approach can be viewed as a proposal for the formalisation of the process of learning. Note that the posterior probability distribution of one round of data generation and analysis can serve as the prior probability distribution for a subsequent round of generation and analysis of new data. Further details on the principles within the Bayes–Laplace framework underlying the estimation of distribution parameters, the optimal curve-fitting to a given set of empirical data points, and the related selection of an adequate mathematical model are given in, e.g., Greenberg (2013) [35, Chs. 3 and 4], Saha (2002) [88, p 8ff], Lupton (1993) [65, p 50ff], and in Ref. [19].

11.2 Definition of a p-value

Def.: Let T_n(X_1, ..., X_n) be the test statistic of a particular null hypothesis significance test in the frequentist framework. The test distribution associated with T_n(X_1, ..., X_n) be known under the assumption that the null hypothesis H_0 holds true in the target population Ω. The p-value associated with a realisation t_n(x_1, ..., x_n) of the test statistic T_n(X_1, ..., X_n) is defined as the conditional probability of finding a value for T_n(X_1, ..., X_n) which is equal to or more extreme than the actual realisation t_n(x_1, ..., x_n), given that the null hypothesis H_0 applies in the target population Ω. This conditional probability is to be computed from the test distribution.

Specifically, using the computational rules (7.22)–(7.24), one obtains for a

• two-sided statistical test,

$$p := P(T_n < -|t_n|\,|\,H_0) + P(T_n > |t_n|\,|\,H_0) = P(T_n < -|t_n|\,|\,H_0) + 1 - P(T_n \leq |t_n|\,|\,H_0) = F_{T_n}(-|t_n|) + 1 - F_{T_n}(|t_n|)\,. \tag{11.3}$$

This result specialises to p = 2[1 − F_{T_n}(|t_n|)] if the respective pdf of the test distribution exhibits reflection symmetry with respect to a vertical axis at t_n = 0, i.e., when F_{T_n}(−|t_n|) = 1 − F_{T_n}(|t_n|) holds.

• left-sided statistical test,

$$p := P(T_n < t_n\,|\,H_0) = F_{T_n}(t_n)\,, \tag{11.4}$$

• right-sided statistical test,

$$p := P(T_n > t_n\,|\,H_0) = 1 - P(T_n \leq t_n\,|\,H_0) = 1 - F_{T_n}(t_n)\,. \tag{11.5}$$

With respect to the test decision criterion of rejecting an H_0 whenever p < α, one refers to (i) cases with p < 0.05 as significant test results, and to (ii) cases with p < 0.01 as highly significant test results. (Lakens (2017) [55] posted a stimulating blog entry on the potential traps associated with the interpretation of a p-value in statistical data analysis; his remarks come along with illustrative demonstrations in R, including the underlying codes.)

Remark: User-friendly routines for the computation of p-values are available in R, SPSS, EXCEL and OpenOffice, and also on some GDCs.
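Eqs. (11.3)–(11.5) are mirrored one-to-one by R's cumulative distribution functions; the sketch below (not from the original notes; the t-test distribution with df = 20 and the realisation t_n = 2.3 are invented) computes all three kinds of p-value:

```r
# p-values per Eqs. (11.3)-(11.5) for a t-test distribution with df = 20.
t.n <- 2.3
df  <- 20

p.two   <- pt(-abs(t.n), df) + 1 - pt(abs(t.n), df)  # Eq. (11.3)
# pdf symmetric about 0, so equivalently: 2 * (1 - pt(abs(t.n), df))
p.left  <- pt(t.n, df)                               # Eq. (11.4)
p.right <- 1 - pt(t.n, df)                           # Eq. (11.5)
round(c(two.sided = p.two, left.sided = p.left, right.sided = p.right), 4)
```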
In the following two chapters, we will turn to discuss a number of standard problems in Inferential Statistics within the frequentist framework, in association with the quantitative–empirical tools that have been developed in this context to tackle them. In Ch. 12 we will be concerned with problems of a univariate nature — in particular, testing for statistical differences in the distributional properties of a single one-dimensional statistical variable X between two or more subgroups of some target population Ω — while in Ch. 13 the problems at hand will be of a bivariate nature, testing for statistical association in Ω for a two-dimensional statistical variable (X, Y). An entertaining, exhaustive account of the history of statistical methods of data analysis prior to the year 1900 is given by Stigler (1986) [99].

Chapter 12

Univariate methods of statistical data analysis: confidence intervals and testing for differences

In this chapter we present a selection of standard inferential statistical techniques within the frequentist framework that, based upon the random sampling of some target population Ω, were developed for the purpose of (a) range-estimating unknown distribution parameters by means of confidence intervals, (b) testing for differences between a given empirical distribution of a one-dimensional statistical variable and its a priori assumed theoretical distribution, and (c) comparing distributional properties and parameters of a one-dimensional statistical variable between two or more subgroups of Ω. Since the methods to be introduced relate to considerations on distributions of a single one-dimensional statistical variable only, they are referred to as univariate.

12.1 Confidence intervals

Assume given a continuous one-dimensional statistical variable X which satisfies in some target population Ω a Gaußian normal distribution with unknown distribution parameters θ ∈ {µ, σ²} (cf. Sec. 8.6). The issue is to determine, using empirical data from a random sample S_Ω: (X_1, ..., X_n), a two-sided confidence interval estimate for any one of these unknown distribution parameters θ at (as one says) a confidence level 1 − α, where, by convention, α ∈ [0.01, 0.05].

Centred on a suitable unbiased and consistent maximum likelihood point estimator function θ̂_n(X_1, ..., X_n) for θ, the aim of the estimation process is to explicitly account for the sampling error δ_K arising due to the random selection process. This approach yields a two-sided confidence interval

$$K_{1-\alpha}(\theta) = \left[\hat{\theta}_n - \delta_K\,,\ \hat{\theta}_n + \delta_K\right]\,, \tag{12.1}$$

such that P(θ ∈ K_{1−α}(θ)) = 1 − α applies. The interpretation of the confidence interval K_{1−α} is that upon arbitrarily many independent repetitions of the random sampling process, in (1 − α) × 100% of all cases the unknown distribution parameter θ will fall inside the boundaries of K_{1−α}, and in α × 100% of all cases it will not. (In actual reality, for a given fixed confidence interval K_{1−α}, the unknown distribution parameter θ either takes its value inside K_{1−α}, or not, but the researcher cannot say which case applies.) In the following we will consider the two cases which result when choosing θ ∈ {µ, σ²}.

12.1.1 Confidence intervals for a population mean

When θ = µ, and θ̂_n = X̄_n by Eq. (10.6), the two-sided confidence interval for a population mean µ at confidence level 1 − α becomes

$$K_{1-\alpha}(\mu) = \left[\bar{X}_n - \delta_K\,,\ \bar{X}_n + \delta_K\right]\,, \tag{12.2}$$

with a sampling error amounting to

$$\delta_K = t_{n-1;1-\alpha/2}\, \frac{S_n}{\sqrt{n}}\,, \tag{12.3}$$

where S_n is the positive square root of the sample variance S_n² according to Eq. (10.7), and t_{n−1;1−α/2} denotes the value of the (1 − α/2)-quantile of a t-distribution with df = n − 1 degrees of freedom; cf. Sec. 8.8. The ratio S_n/√n represents the standard error SE_X̄_n associated with X̄_n; cf. Eq. (10.8).
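The following R sketch (not from the original notes; the sample is simulated) computes the confidence interval of Eqs. (12.2)–(12.3) from first principles and cross-checks it against t.test():

```r
# Two-sided 95% confidence interval for a population mean, Eqs. (12.2)-(12.3).
set.seed(5)
x <- rnorm(30, mean = 100, sd = 15)
n <- length(x); alpha <- 0.05

delta.K <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)  # sampling error, Eq. (12.3)
c(lower = mean(x) - delta.K, upper = mean(x) + delta.K)     # K_{1-alpha}(mu), Eq. (12.2)

t.test(x, conf.level = 1 - alpha)$conf.int                  # built-in cross-check
```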
GDC: mode STAT → TESTS → TInterval

Equation (12.3) may be inverted to obtain the minimum sample size necessary to construct a two-sided confidence interval for µ to a prescribed accuracy δ_max, maximal sample variance σ²_max, and fixed confidence level 1 − α. Thus,

$$n \geq \left(\frac{t_{n-1;1-\alpha/2}}{\delta_{\max}}\right)^2 \sigma^2_{\max}\,. \tag{12.4}$$

12.1.2 Confidence intervals for a population variance

When θ = σ², and θ̂_n = S_n² by Eq. (10.7), the associated point estimator function

$$\frac{(n-1)\,S_n^2}{\sigma^2} \sim \chi^2(n-1)\,, \quad\text{with } n \in \mathbb{N}\,, \tag{12.5}$$

satisfies a χ²-distribution with df = n − 1 degrees of freedom; cf. Sec. 8.7. By inverting the condition

$$P\left(\chi^2_{n-1;\alpha/2} \leq \frac{(n-1)\,S_n^2}{\sigma^2} \leq \chi^2_{n-1;1-\alpha/2}\right) \stackrel{!}{=} 1-\alpha\,, \tag{12.6}$$

one derives a two-sided confidence interval for a population variance σ² at confidence level 1 − α given by

$$\left[\frac{(n-1)\,S_n^2}{\chi^2_{n-1;1-\alpha/2}}\,,\ \frac{(n-1)\,S_n^2}{\chi^2_{n-1;\alpha/2}}\right]\,. \tag{12.7}$$

Here, χ²_{n−1;α/2} and χ²_{n−1;1−α/2} again denote the values of particular quantiles of a χ²-distribution.

12.2 One-sample χ²-goodness-of-fit-test

A standard research question in quantitative–empirical investigations deals with the issue of whether or not, with respect to some target population Ω of sample units, the distribution law for a specific one-dimensional statistical variable X may be assumed to comply with a particular theoretical reference distribution. This question can be formulated in terms of the corresponding cdfs, F_X(x) and F_0(x), presupposing that for practical reasons the spectrum of values of X is subdivided into a set of k mutually exclusive categories (or bins), with k a judiciously chosen positive integer which depends in the first place on the size n of the random sample S_Ω: (X_1, ..., X_n) to be investigated.

The non-parametric one-sample χ²-goodness-of-fit-test takes as its starting point the pair of

Hypotheses:

$$\begin{cases} H_0:\ F_X(x) = F_0(x) \ \Leftrightarrow\ O_i - E_i = 0 \\ H_1:\ F_X(x) \neq F_0(x) \ \Leftrightarrow\ O_i - E_i \neq 0 \end{cases}\,, \tag{12.8}$$

where O_i (i = 1, ..., k) denotes the actually observed frequency of category i in a random sample of size n, E_i := np_i denotes the, under H_0 (and so F_0(x)), theoretically expected frequency of category i in the same random sample, and p_i is the probability of finding a value of X in category i under F_0(x). The present procedure, devised by Pearson (1900) [78], employs the residuals O_i − E_i (i = 1, ..., k) to construct a suitable

Test statistic:

$$T_n(X_1,\ldots,X_n) = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \ \overset{H_0}{\approx}\ \chi^2(k-1-r) \tag{12.9}$$

in terms of a sum of rescaled squared residuals (O_i − E_i)²/E_i, which, under H_0, approximately follows a χ²-test distribution with df = k − 1 − r degrees of freedom (cf. Sec. 8.7); r denotes the number of free parameters of the reference distribution F_0(x) which need to be estimated from the random sample data. (As the E_i (i = 1, ..., k) amount to count data with unknown maximum counts, the probability distribution relevant to model variation is the Poisson distribution discussed in Sec. 8.4. Hence, the standard deviations are equal to √E_i, and so the variances equal to E_i; cf. Jeffreys (1939) [44, p 106].)

For this test procedure to be reliable, it is important (!) that the size n of the random sample be chosen such that the condition

$$E_i \stackrel{!}{\geq} 5 \tag{12.10}$$

holds for all categories i = 1, ..., k, due to the fact that the E_i appear in the denominator of the test statistic in Eq. (12.9) and so would artificially inflate the magnitudes of the summed ratios when the denominators become too small.
Test decision: The rejection region for H_0 at significance level α is given by (right-sided test)

$$t_n > \chi^2_{k-1-r;1-\alpha}\,. \tag{12.11}$$

By Eq. (11.5), the p-value associated with a realisation t_n of the test statistic (12.9), which is to be calculated from the χ²-test distribution, amounts to

$$p = P(T_n > t_n\,|\,H_0) = 1 - P(T_n \leq t_n\,|\,H_0) = 1 - \chi^2\mathrm{cdf}(0, t_n, k-1-r)\,. \tag{12.12}$$

R: chisq.test(table(variable))
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → Chi-square . . .

Effect size: In the present context, the practical significance of the phenomenon investigated can be estimated from the realisation t_n and the sample size n by

$$w := \sqrt{\frac{t_n}{n}}\,. \tag{12.13}$$

For the interpretation of its strength, Cohen (1992) [11, Tab. 1] recommends the

Rule of thumb:
0.10 ≤ w < 0.30: small effect
0.30 ≤ w < 0.50: medium effect
0.50 ≤ w: large effect.

Note that, in the spirit of critical rationalism, the one-sample χ²-goodness-of-fit-test provides a tool for empirically excluding possibilities of distribution laws for X.
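A worked instance of Eqs. (12.9)–(12.13) in R (not from the original notes; the die-roll counts are invented): testing observed frequencies from k = 6 categories against a uniform reference distribution, for which r = 0 parameters need to be estimated:

```r
# One-sample chi-squared goodness-of-fit test against a fair-die distribution.
O <- c(18, 24, 17, 21, 25, 15)      # observed frequencies, k = 6 categories
n <- sum(O)
E <- n * rep(1 / 6, 6)              # expected frequencies under H0; all E_i >= 5

t.n <- sum((O - E)^2 / E)           # test statistic, Eq. (12.9)
p   <- 1 - pchisq(t.n, df = 6 - 1)  # p-value, Eq. (12.12), with r = 0
w   <- sqrt(t.n / n)                # effect size, Eq. (12.13)
round(c(t.n = t.n, p = p, w = w), 4)

chisq.test(O, p = rep(1 / 6, 6))    # built-in cross-check
```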
12.3 One-sample t- and Z-tests for a population mean

The idea here is to test whether the unknown population mean µ of some continuous one-dimensional statistical variable X is equal to, less than, or greater than some reference value µ_0, to a given significance level α. To this end, it is required that X satisfy in the target population Ω a Gaußian normal distribution, i.e., X ~ N(µ; σ²); cf. Sec. 8.6. The quantitative–analytical tool to be employed in this case is the parametric one-sample t-test for a population mean developed by Student [Gosset] (1908) [100], or, when the sample size n ≥ 50, in consequence of the central limit theorem discussed in Sec. 8.15, the corresponding one-sample Z-test.

For a random sample S_Ω: (X_1, ..., X_n) of size n ≥ 50, the validity of the assumption (!) of normality for the X-distribution can be tested by a procedure due to the Russian mathematicians Andrey Nikolaevich Kolmogorov (1903–1987) and Nikolai Vasilyevich Smirnov (1900–1966). This tests the null hypothesis H_0: "There is no difference between the distribution of the sample data and the associated reference normal distribution" against the alternative H_1: "There is a difference between the distribution of the sample data and the associated reference normal distribution;" cf. Kolmogorov (1933) [51] and Smirnov (1939) [93]. This procedure is referred to as the Kolmogorov–Smirnov–test (or, for short, the KS–test). The associated test statistic evaluates the strength of the deviation of the empirical cumulative distribution function [cf. Eq. (2.4)] of given random sample data, with sample mean x̄_n and sample variance s_n², from the cdf of a reference Gaußian normal distribution with parameters µ and σ² equal to these sample values [cf. Eq. (8.52)].

R: ks.test(variable, "pnorm")
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → 1-Sample K-S . . . : Normal

For sample sizes n < 50, however, the validity of the normality assumption for the X-distribution may be estimated in terms of the magnitudes of the standardised skewness and excess kurtosis measures,

$$\left|\frac{G_1}{SE_{G_1}}\right| \qquad\text{and}\qquad \left|\frac{G_2}{SE_{G_2}}\right|\,, \tag{12.14}$$

which are constructed from the quantities defined in Eqs. (10.10)–(10.13). At a significance level α = 0.05, the normality assumption may be maintained as long as both measures are smaller than the critical value of 1.96; cf. Hair et al (2010) [36, p 72f].

Formulated in a non-directed or a directed fashion, the starting point of the t-test resp. Z-test procedures are the

Hypotheses:

$$\begin{cases} H_0:\ \mu = \mu_0 \ \text{ or }\ \mu \geq \mu_0 \ \text{ or }\ \mu \leq \mu_0 \\ H_1:\ \mu \neq \mu_0 \ \text{ or }\ \mu < \mu_0 \ \text{ or }\ \mu > \mu_0 \end{cases}\,. \tag{12.15}$$

To measure the deviation of the sample data from the state conjectured to hold in the null hypothesis H_0, the difference between the sample mean X̄_n and the hypothesised population mean µ_0, normalised in analogy to Eq. (7.34) by the standard error

$$SE_{\bar{X}_n} := \frac{S_n}{\sqrt{n}} \tag{12.16}$$

of X̄_n given in Eq. (10.8), serves as the µ_0-dependent

Test statistic:

$$T_n(X_1,\ldots,X_n) = \frac{\bar{X}_n - \mu_0}{SE_{\bar{X}_n}} \ \overset{H_0}{\sim}\ \begin{cases} t(n-1) & \text{for } n < 50 \\ N(0;1) & \text{for } n \geq 50 \end{cases}\,, \tag{12.17}$$

which, under H_0, follows a t-test distribution with df = n − 1 degrees of freedom (cf. Sec. 8.8) resp. a standard normal test distribution (cf. Sec. 8.6).

Test decision: Depending on the kind of test to be performed, the rejection region for H_0 at significance level α is given by

Kind of test       H_0        H_1        Rejection region for H_0
(a) two-sided      µ = µ_0    µ ≠ µ_0    |t_n| > t_{n−1;1−α/2} (t-test) resp. z_{1−α/2} (Z-test)
(b) left-sided     µ ≥ µ_0    µ < µ_0    t_n < t_{n−1;α} = −t_{n−1;1−α} (t-test) resp. z_α = −z_{1−α} (Z-test)
(c) right-sided    µ ≤ µ_0    µ > µ_0    t_n > t_{n−1;1−α} (t-test) resp. z_{1−α} (Z-test)

p-values associated with realisations t_n of the test statistic (12.17) can be obtained from Eqs. (11.3)–(11.5), using the relevant t-test distribution resp. the standard normal test distribution.

R: t.test(variable, mu = µ_0),
t.test(variable, mu = µ_0, alternative = "less"),
t.test(variable, mu = µ_0, alternative = "greater")
GDC: mode STAT → TESTS → T-Test... when n < 50, resp. mode STAT → TESTS → Z-Test... when n ≥ 50.
SPSS: Analyze → Compare Means → One-Sample T Test . . .

Note: Regrettably, SPSS provides no option for selecting between a "one-tailed" (left-/right-sided) and a "two-tailed" (two-sided) t-test. The default setting is for a two-sided test. For the purpose of one-sided tests, the p-value output of SPSS needs to be divided by 2.

Effect size: The practical significance of the phenomenon investigated can be estimated from the sample mean x̄_n, the sample standard deviation s_n, and the reference value µ_0 by the scale-invariant ratio

$$d := \frac{|\bar{x}_n - \mu_0|}{s_n}\,. \tag{12.18}$$

For the interpretation of its strength, Cohen (1992) [11, Tab. 1] recommends the

Rule of thumb:
0.20 ≤ d < 0.50: small effect
0.50 ≤ d < 0.80: medium effect
0.80 ≤ d: large effect.
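To make Sec. 12.3 concrete, here is a small R sketch (not from the original notes; µ_0 = 100 and the simulated data are invented) that evaluates the test statistic of Eq. (12.17), its two-sided p-value, and the effect size of Eq. (12.18), with t.test() as a cross-check:

```r
# One-sample t-test for a population mean, Eqs. (12.17)-(12.18).
set.seed(11)
x <- rnorm(40, mean = 104, sd = 12)
n <- length(x); mu0 <- 100

t.n <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # test statistic, Eq. (12.17), n < 50
p   <- 2 * (1 - pt(abs(t.n), df = n - 1))   # two-sided p-value, cf. Eq. (11.3)
d   <- abs(mean(x) - mu0) / sd(x)           # effect size, Eq. (12.18)
round(c(t.n = t.n, p = p, d = d), 4)

t.test(x, mu = mu0)                         # built-in cross-check
```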
We remark that the statistical software package R holds available a routine power.t.test(power, sig.level, delta, sd, n, alternative, type = "one.sample") for the purpose of calculating any one of the parameters power, delta or n (provided all remaining parameters have been specified) in the context of empirical investigations employing the one-sample t-test for a population mean. One-sided tests are specified via the parameter setting alternative = "one.sided".

12.4 One-sample χ²-test for a population variance

In analogy to the statistical significance test described in the previous Sec. 12.3, one may likewise test hypotheses on the value of an unknown population variance σ² with respect to a reference value σ²_0 for a continuous one-dimensional statistical variable X which satisfies in Ω a Gaußian normal distribution, i.e., X ~ N(µ; σ²); cf. Sec. 8.6. The hypotheses may also be formulated in a non-directed or directed fashion according to

Hypotheses:

$$\begin{cases} H_0:\ \sigma^2 = \sigma_0^2 \ \text{ or }\ \sigma^2 \geq \sigma_0^2 \ \text{ or }\ \sigma^2 \leq \sigma_0^2 \\ H_1:\ \sigma^2 \neq \sigma_0^2 \ \text{ or }\ \sigma^2 < \sigma_0^2 \ \text{ or }\ \sigma^2 > \sigma_0^2 \end{cases}\,. \tag{12.19}$$

In the one-sample χ²-test for a population variance, the underlying σ²_0-dependent

Test statistic:

$$T_n(X_1,\ldots,X_n) = \frac{(n-1)\,S_n^2}{\sigma_0^2} \ \overset{H_0}{\sim}\ \chi^2(n-1) \tag{12.20}$$

is chosen to be proportional to the sample variance defined by Eq. (10.7), and so, under H_0, follows a χ²-test distribution with df = n − 1 degrees of freedom; cf. Sec. 8.7.

Test decision: Depending on the kind of test to be performed, the rejection region for H_0 at significance level α is given by

Kind of test       H_0          H_1          Rejection region for H_0
(a) two-sided      σ² = σ²_0    σ² ≠ σ²_0    t_n < χ²_{n−1;α/2} or t_n > χ²_{n−1;1−α/2}
(b) left-sided     σ² ≥ σ²_0    σ² < σ²_0    t_n < χ²_{n−1;α}
(c) right-sided    σ² ≤ σ²_0    σ² > σ²_0    t_n > χ²_{n−1;1−α}

p-values associated with realisations t_n of the test statistic (12.20), which are to be calculated from the χ²-test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: varTest(variable, sigma.squared = σ²_0) (package: EnvStats, by Millard (2013) [72]),
varTest(variable, sigma.squared = σ²_0, alternative = "less"),
varTest(variable, sigma.squared = σ²_0, alternative = "greater")

Regrettably, the one-sample χ²-test for a population variance does not appear to have been implemented in the SPSS software package.
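For users without access to the EnvStats package, the test statistic of Eq. (12.20) and its two-sided rejection boundaries are easy to evaluate directly; a minimal R sketch (not from the original notes; σ²_0 = 100 and the data are invented):

```r
# One-sample chi-squared test for a population variance, Eq. (12.20).
set.seed(15)
x <- rnorm(25, mean = 0, sd = 12)
n <- length(x); sigma0.sq <- 100; alpha <- 0.05

t.n  <- (n - 1) * var(x) / sigma0.sq                      # test statistic, Eq. (12.20)
crit <- qchisq(c(alpha / 2, 1 - alpha / 2), df = n - 1)   # two-sided rejection boundaries
round(c(t.n = t.n, lower = crit[1], upper = crit[2]), 3)
# Reject H0 if t.n < crit[1] or t.n > crit[2]; EnvStats::varTest(x,
# sigma.squared = 100) performs the same test.
```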
12.5 Two independent samples t-test for a population mean

Quantitative–empirical studies are frequently interested in the question as to what extent there exist significant differences between two subgroups of some target population Ω in the distribution of a metrically scaled one-dimensional statistical variable X. Given that X is normally distributed in Ω (cf. Sec. 8.6), the parametric two independent samples t-test for a population mean, originating from work by Student [Gosset] (1908) [100], provides an efficient and powerful investigative tool.

For independent random samples of sizes n_1, n_2 ≥ 50, the issue of whether there exists empirical evidence in the samples against the assumption of a normally distributed X in Ω can again be tested for by means of the Kolmogorov–Smirnov–test; cf. Sec. 12.3.

R: ks.test(variable, "pnorm")
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → 1-Sample K-S . . . : Normal

For n_1, n_2 < 50, one may resort to a consideration of the magnitudes of the standardised skewness and excess kurtosis measures, Eqs. (12.14), to check for the validity of the normality assumption for the X-distributions. In addition, prior to the t-test procedure, one needs to establish whether or not the variances of X have to be viewed as significantly different in the two random samples selected. Levene's test provides an empirical method to test H_0: σ²_1 = σ²_2 against H_1: σ²_1 ≠ σ²_2; cf. Levene (1960) [60].

R: leveneTest(variable, group variable) (package: car, by Fox and Weisberg (2011) [25])

The hypotheses of a t-test may be formulated in a non-directed fashion or in a directed one. Hence, the different kinds of possible conjectures are

Hypotheses: (test for differences)

$$\begin{cases} H_0:\ \mu_1 - \mu_2 = 0 \ \text{ or }\ \mu_1 - \mu_2 \geq 0 \ \text{ or }\ \mu_1 - \mu_2 \leq 0 \\ H_1:\ \mu_1 - \mu_2 \neq 0 \ \text{ or }\ \mu_1 - \mu_2 < 0 \ \text{ or }\ \mu_1 - \mu_2 > 0 \end{cases}\,. \tag{12.21}$$

A test statistic is constructed from the difference of sample means, X̄_{n_1} − X̄_{n_2}, standardised by the standard error

$$SE_{(\bar{X}_{n_1}-\bar{X}_{n_2})} := \sqrt{\frac{S_{n_1}^2}{n_1} + \frac{S_{n_2}^2}{n_2}}\,, \tag{12.22}$$

which derives from the associated theoretical sampling distribution for X̄_{n_1} − X̄_{n_2}. Thus, one obtains the

Test statistic:

$$T_{n_1,n_2} := \frac{\bar{X}_{n_1} - \bar{X}_{n_2}}{SE_{(\bar{X}_{n_1}-\bar{X}_{n_2})}} \ \overset{H_0}{\sim}\ t(df)\,, \tag{12.23}$$

which, under H_0, satisfies a t-test distribution (cf. Sec. 8.8) with a number of degrees of freedom determined by the relations

$$df := \begin{cases} n_1 + n_2 - 2\,, & \text{when } \sigma_1^2 = \sigma_2^2 \\[2ex] \dfrac{\left(\frac{S_{n_1}^2}{n_1} + \frac{S_{n_2}^2}{n_2}\right)^2}{\frac{(S_{n_1}^2/n_1)^2}{n_1-1} + \frac{(S_{n_2}^2/n_2)^2}{n_2-1}}\,, & \text{when } \sigma_1^2 \neq \sigma_2^2 \end{cases}\,. \tag{12.24}$$

Test decision: Depending on the kind of test to be performed, the rejection region for H_0 at significance level α is given by

Kind of test       H_0              H_1              Rejection region for H_0
(a) two-sided      µ_1 − µ_2 = 0    µ_1 − µ_2 ≠ 0    |t_{n_1,n_2}| > t_{df;1−α/2}
(b) left-sided     µ_1 − µ_2 ≥ 0    µ_1 − µ_2 < 0    t_{n_1,n_2} < t_{df;α} = −t_{df;1−α}
(c) right-sided    µ_1 − µ_2 ≤ 0    µ_1 − µ_2 > 0    t_{n_1,n_2} > t_{df;1−α}

p-values associated with realisations t_{n_1,n_2} of the test statistic (12.23), which are to be calculated from the t-test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: t.test(variable ~ group variable),
t.test(variable ~ group variable, alternative = "less"),
t.test(variable ~ group variable, alternative = "greater")
GDC: mode STAT → TESTS → 2-SampTTest...
SPSS: Analyze → Compare Means → Independent-Samples T Test . . .

Note: Regrettably, SPSS provides no option for selecting between a one-sided and a two-sided t-test. The default setting is for a two-sided test. For the purpose of one-sided tests, the p-value output of SPSS needs to be divided by 2.

Effect size: The practical significance of the phenomenon investigated can be estimated from the sample means x̄_{n_1} and x̄_{n_2} and the pooled sample standard deviation

$$s_{\mathrm{pooled}} := \sqrt{\frac{(n_1-1)\,s_{n_1}^2 + (n_2-1)\,s_{n_2}^2}{n_1 + n_2 - 2}} \tag{12.25}$$

by the scale-invariant ratio

$$d := \frac{|\bar{x}_{n_1} - \bar{x}_{n_2}|}{s_{\mathrm{pooled}}}\,. \tag{12.26}$$

For the interpretation of its strength, Cohen (1992) [11, Tab. 1] recommends the

Rule of thumb:
0.20 ≤ d < 0.50: small effect
0.50 ≤ d < 0.80: medium effect
0.80 ≤ d: large effect.

R: cohen.d(variable, group variable, pooled = TRUE) (package: effsize, by Torchiano (2018) [106])
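A compact R illustration of Sec. 12.5 (not from the original notes; group sizes and score distributions are invented): t.test() covers both the Welch variant, with df as in the second branch of Eq. (12.24), and the pooled variant with df = n_1 + n_2 − 2, while the effect size follows Eqs. (12.25)–(12.26):

```r
# Two independent samples t-test, Eqs. (12.23)-(12.26), on simulated data.
set.seed(13)
score <- c(rnorm(60, mean = 100, sd = 15), rnorm(55, mean = 108, sd = 15))
group <- factor(rep(c("A", "B"), times = c(60, 55)))

t.test(score ~ group)                    # Welch variant, df per Eq. (12.24), sigma1^2 != sigma2^2
t.test(score ~ group, var.equal = TRUE)  # pooled variant, df = n1 + n2 - 2

# Effect size d, Eqs. (12.25)-(12.26):
s.pooled <- sqrt(((60 - 1) * var(score[group == "A"]) +
                  (55 - 1) * var(score[group == "B"])) / (60 + 55 - 2))
abs(mean(score[group == "A"]) - mean(score[group == "B"])) / s.pooled
```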
We remark that the statistical software package R holds available a routine power.t.test(power, sig.level, delta, sd, n, alternative) for the purpose of calculating any one of the parameters power, delta or n (provided all remaining parameters have been specified) in the context of empirical investigations employing the independent samples t-test for a population mean. Equal values of n are required here. One-sided tests are addressed via the parameter setting alternative = "one.sided".

When the necessary conditions for the application of the independent samples t-test are not satisfied, the following alternative test procedures (typically of a weaker test power, though) for comparing two subgroups of Ω with respect to the distribution of a metrically scaled variable X exist: (i) at the nominal scale level, provided E_ij ≥ 5 for all i, j, the χ²-test for homogeneity, cf. Sec. 12.10 below, and (ii) at the ordinal scale level, provided n_1, n_2 ≥ 8, the two independent samples Mann–Whitney–U-test for a median; cf. the following Sec. 12.6.

12.6 Two independent samples Mann–Whitney–U-test for a population median

The non-parametric two independent samples Mann–Whitney–U-test for a population median, devised by the Austrian–US-American mathematician and statistician Henry Berthold Mann (1905–2000) and the US-American statistician Donald Ransom Whitney (1915–2001) in 1947 [68], can be applied to random sample data for ordinally scaled one-dimensional statistical variables X, or for metrically scaled one-dimensional statistical variables X which may not be reasonably assumed to be normally distributed in the target population Ω. In both situations, the method employs rank number data (cf. Sec. 4.3), which faithfully represents the original random sample data, to effectively compare the medians of X (or, rather, the mean rank numbers) between two independent groups. It aims to test empirically the null hypothesis H_0 of one of the following pairs of non-directed or directed

Hypotheses: (test for differences)

$$\begin{cases} H_0:\ \tilde{x}_{0.5}(1) = \tilde{x}_{0.5}(2) \ \text{ or }\ \tilde{x}_{0.5}(1) \geq \tilde{x}_{0.5}(2) \ \text{ or }\ \tilde{x}_{0.5}(1) \leq \tilde{x}_{0.5}(2) \\ H_1:\ \tilde{x}_{0.5}(1) \neq \tilde{x}_{0.5}(2) \ \text{ or }\ \tilde{x}_{0.5}(1) < \tilde{x}_{0.5}(2) \ \text{ or }\ \tilde{x}_{0.5}(1) > \tilde{x}_{0.5}(2) \end{cases}\,. \tag{12.27}$$

Given two independent sets of random sample data for X, ranks are introduced on the basis of an ordered joint random sample of size n = n_1 + n_2 according to x_i(1) ↦ R[x_i(1)] and x_i(2) ↦ R[x_i(2)]. From the ranks thus assigned to the elements of each of the two sets of data, one computes the

U-values:

$$U_1 := n_1 n_2 + \frac{n_1(n_1+1)}{2} - \sum_{i=1}^{n_1} R[x_i(1)] \tag{12.28}$$

$$U_2 := n_1 n_2 + \frac{n_2(n_2+1)}{2} - \sum_{i=1}^{n_2} R[x_i(2)]\,, \tag{12.29}$$

for which the identity U_1 + U_2 = n_1 n_2 applies. Choose U := min(U_1, U_2). (Since the U-values are tied to each other by the identity U_1 + U_2 = n_1 n_2, it makes no difference to this method when one chooses U := max(U_1, U_2) instead.)

For independent random samples of sizes n_1, n_2 ≥ 8 (see, e.g., Bortz (2005) [5, p 151]), the standardised U-value serves as the

Test statistic:

$$T_{n_1,n_2} := \frac{U - \mu_U}{SE_U} \ \overset{H_0}{\approx}\ N(0;1)\,, \tag{12.30}$$

which, under H_0, approximately satisfies a standard normal test distribution; cf. Sec. 8.6.
12.7 Two independent samples F-test for a population variance

In analogy to the independent samples t-test for a population mean of Sec. 12.5, one may likewise investigate for a metrically scaled one-dimensional statistical variable X, which can be assumed to satisfy a Gaußian normal distribution in Ω (cf. Sec. 8.6), whether there exists a significant difference in the values of the population variance between two independent random samples. (Footnote: Run the Kolmogorov–Smirnov–test to check whether the assumption of normality of the distribution of X in the two random samples drawn needs to be rejected.) The parametric two independent samples F-test for a population variance empirically evaluates the plausibility of the null hypothesis H0 in the non-directed resp. directed pairs of

Hypotheses: (test for differences)
\[
H_0: \sigma_1^2 = \sigma_2^2 \;\text{ or }\; \sigma_1^2 \geq \sigma_2^2 \;\text{ or }\; \sigma_1^2 \leq \sigma_2^2
\qquad
H_1: \sigma_1^2 \neq \sigma_2^2 \;\text{ or }\; \sigma_1^2 < \sigma_2^2 \;\text{ or }\; \sigma_1^2 > \sigma_2^2 \ .
\tag{12.32}
\]

Dealing with independent random samples of sizes n1 and n2, the ratio of the corresponding sample variances serves as a

Test statistic:
\[
T_{n_1,n_2} := \frac{S^2_{n_1}}{S^2_{n_2}} \overset{H_0}{\sim} F(n_1 - 1, n_2 - 1) \ ,
\tag{12.33}
\]
which, under H0, satisfies an F-test distribution with df1 = n1 − 1 and df2 = n2 − 1 degrees of freedom; cf. Sec. 8.9.

Test decision: Depending on the kind of test to be performed, the rejection region for H0 at significance level α is given by

Kind of test      H0           H1           Rejection region for H0
(a) two-sided     σ1² = σ2²    σ1² ≠ σ2²    t_{n1,n2} < 1/f_{n2−1,n1−1;1−α/2}  or  t_{n1,n2} > f_{n1−1,n2−1;1−α/2}
(b) left-sided    σ1² ≥ σ2²    σ1² < σ2²    t_{n1,n2} < 1/f_{n2−1,n1−1;1−α}
(c) right-sided   σ1² ≤ σ2²    σ1² > σ2²    t_{n1,n2} > f_{n1−1,n2−1;1−α}

p-values associated with realisations t_{n1,n2} of the test statistic (12.33), which are to be calculated from the F-test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: var.test(variable ~ group variable),
   var.test(variable ~ group variable, alternative = "less"),
   var.test(variable ~ group variable, alternative = "greater")
GDC: mode STAT → TESTS → 2-SampFTest...

Regrettably, the two-sample F-test for a population variance does not appear to have been implemented in the SPSS software package. Instead, to address quantitative issues of the kind raised here, one may resort to Levene's test; cf. Sec. 12.5.
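A brief R sketch with simulated (hypothetical) data shows the realisation of the test statistic (12.33) and its evaluation via var.test():

    set.seed(1)
    x <- rnorm(30, mean = 100, sd = 15)   # hypothetical group 1
    y <- rnorm(25, mean = 100, sd = 10)   # hypothetical group 2

    var(x) / var(y)   # realisation of the test statistic (12.33)
    var.test(x, y)    # F-test with df1 = 29 and df2 = 24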
12.8 Two dependent samples t-test for a population mean

Besides investigating for significant differences in the distribution of a single one-dimensional statistical variable X in two or more independent subgroups of some target population Ω, many research projects are interested in finding out (i) how the distributional properties of a one-dimensional statistical variable X have changed within one and the same random sample of Ω in an experimental before–after situation, or (ii) how the distribution of a one-dimensional statistical variable X differs between two subgroups of Ω, the sample units of which co-exist in a natural pairwise one-to-one correspondence to one another.

When the one-dimensional statistical variable X in question is metrically scaled and can be assumed to satisfy a Gaußian normal distribution in Ω, significant differences can be tested for by means of the parametric two dependent samples t-test for a population mean. Denoting by A and B either temporal before and after instants, or partners in a set of natural pairs (A, B), define for X the metrically scaled difference variable
\[
D := X(A) - X(B) \ .
\tag{12.34}
\]
An important test prerequisite demands that D itself may be assumed normally distributed in Ω; cf. Sec. 8.6. Whether this property holds true can be checked for n ≥ 50 via the Kolmogorov–Smirnov–test (cf. Sec. 12.3). When n < 50, one may resort to a consideration of the magnitudes of the standardised skewness and excess kurtosis measures, Eqs. (12.14).

With µ_D denoting the population mean of the difference variable D, the

Hypotheses: (test for differences)
\[
H_0: \mu_D = 0 \;\text{ or }\; \mu_D \geq 0 \;\text{ or }\; \mu_D \leq 0
\qquad
H_1: \mu_D \neq 0 \;\text{ or }\; \mu_D < 0 \;\text{ or }\; \mu_D > 0
\tag{12.35}
\]
can be given in a non-directed or a directed formulation. From the sample mean \(\bar{D}\) and its associated standard error,
\[
SE_{\bar{D}} := \frac{S_D}{\sqrt{n}} \ ,
\tag{12.36}
\]
which derives from the theoretical sampling distribution for \(\bar{D}\), one obtains by means of standardisation according to Eq. (7.34) the

Test statistic:
\[
T_n := \frac{\bar{D}}{SE_{\bar{D}}} \overset{H_0}{\sim} t(n-1) \ ,
\tag{12.37}
\]
which, under H0, satisfies a t-test distribution with df = n − 1 degrees of freedom; cf. Sec. 8.8.

Test decision: Depending on the kind of test to be performed, the rejection region for H0 at significance level α is given by

Kind of test      H0        H1        Rejection region for H0
(a) two-sided     µD = 0    µD ≠ 0    |t_n| > t_{n−1;1−α/2}
(b) left-sided    µD ≥ 0    µD < 0    t_n < t_{n−1;α} = −t_{n−1;1−α}
(c) right-sided   µD ≤ 0    µD > 0    t_n > t_{n−1;1−α}

p-values associated with realisations t_n of the test statistic (12.37), which are to be calculated from the t-test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: t.test(variableA, variableB, paired = TRUE),
   t.test(variableA, variableB, paired = TRUE, alternative = "less"),
   t.test(variableA, variableB, paired = TRUE, alternative = "greater")
SPSS: Analyze → Compare Means → Paired-Samples T Test ...

Note: Regrettably, SPSS provides no option for selecting between a one-sided and a two-sided t-test. The default setting is for a two-sided test. For the purpose of one-sided tests, the p-value output of SPSS needs to be divided by 2.

Effect size: The practical significance of the phenomenon investigated can be estimated from the sample mean \(\bar{D}\) and the sample standard deviation s_D by the scale-invariant ratio
\[
d := \frac{|\bar{D}|}{s_D} \ .
\tag{12.38}
\]
For the interpretation of its strength, Cohen (1992) [11, Tab. 1] recommends the

Rule of thumb:
0.20 ≤ d < 0.50: small effect
0.50 ≤ d < 0.80: medium effect
0.80 ≤ d: large effect.

R: cohen.d(variable, group variable, paired = TRUE) (package: effsize, by Torchiano (2018) [106])

We remark that the statistical software package R holds available a routine

    power.t.test(power, sig.level, delta, sd, n, alternative, type = "paired")

for the purpose of calculating any one of the parameters power, delta or n (provided all remaining parameters have been specified) in the context of empirical investigations employing the dependent samples t-test for a population mean. One-sided tests are addressed via the parameter setting alternative = "one.sided".
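A compact R sketch with hypothetical before–after data; the numbers and the built-in mean shift are invented for illustration.

    set.seed(7)
    before <- rnorm(20, mean = 72, sd = 8)
    after  <- before - rnorm(20, mean = 3, sd = 4)   # average improvement of about 3

    d <- before - after                    # difference variable, Eq. (12.34)
    t.test(before, after, paired = TRUE)   # test statistic (12.37)
    mean(d) / (sd(d) / sqrt(20))           # the same t-value computed by hand
    abs(mean(d)) / sd(d)                   # effect size d, Eq. (12.38)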
12.9 Two dependent samples Wilcoxon–test for a population median

When the test prerequisites of the dependent samples t-test cannot be met, i.e., when a given metrically scaled one-dimensional statistical variable X cannot be assumed to satisfy a Gaußian normal distribution in Ω, or when X is an ordinally scaled one-dimensional statistical variable in the first place, the non-parametric signed ranks test published by the US-American chemist and statistician Frank Wilcoxon (1892–1965) in 1945 [119] constitutes a quantitative–empirical tool for comparing the distributional properties of X between two dependent random samples drawn from Ω. Like Mann and Whitney's U-test discussed in Sec. 12.6, it is built around the idea of rank number data faithfully representing the original random sample data; cf. Sec. 4.3. Defining again a variable
\[
D := X(A) - X(B) \ ,
\tag{12.39}
\]
with associated median \(\tilde{x}_{0.5}(D)\), the null hypothesis H0 in the non-directed or directed pairs of

Hypotheses: (test for differences)
\[
H_0: \tilde{x}_{0.5}(D) = 0 \;\text{ or }\; \tilde{x}_{0.5}(D) \geq 0 \;\text{ or }\; \tilde{x}_{0.5}(D) \leq 0
\qquad
H_1: \tilde{x}_{0.5}(D) \neq 0 \;\text{ or }\; \tilde{x}_{0.5}(D) < 0 \;\text{ or }\; \tilde{x}_{0.5}(D) > 0
\tag{12.40}
\]
needs to be subjected to a suitable significance test.

For realisations d_i (i = 1, ..., n) of D, introduce rank numbers according to d_i ↦ R[|d_i|] for the ordered absolute values |d_i|, while keeping a record of the sign of each d_i. Exclude from the data set all null differences d_i = 0, leading to a sample of reduced size n ↦ n_red. Then form the sums of rank numbers W+ for the d_i > 0 and W− for the d_i < 0, respectively, which are linked to one another by the identity W+ + W− = n_red(n_red + 1)/2. Choose W+. (Footnote: Due to the identity W+ + W− = n_red(n_red + 1)/2, choosing instead W− would make no qualitative difference to the subsequent test procedure.)

For reduced sample sizes n_red > 20 (see, e.g., Rinne (2008) [87, p 552]), one employs the

Test statistic:
\[
T_{n_{\mathrm{red}}} := \frac{W_+ - \mu_{W_+}}{SE_{W_+}} \overset{H_0}{\approx} N(0;1) \ ,
\tag{12.41}
\]
which, under H0, approximately satisfies a standard normal test distribution; cf. Sec. 8.6. Here, the mean µ_{W+} expected under H0 is defined in terms of n_red by
\[
\mu_{W_+} := \frac{n_{\mathrm{red}}(n_{\mathrm{red}} + 1)}{4} \ ,
\tag{12.42}
\]
while the standard error SE_{W+} can be computed from, e.g., Bortz (2005) [5, Eq. (5.52)].

Test decision: Depending on the kind of test to be performed, the rejection region for H0 at significance level α is given by

Kind of test      H0             H1             Rejection region for H0
(a) two-sided     x̃0.5(D) = 0    x̃0.5(D) ≠ 0    |t_{n_red}| > z_{1−α/2}
(b) left-sided    x̃0.5(D) ≥ 0    x̃0.5(D) < 0    t_{n_red} < z_α = −z_{1−α}
(c) right-sided   x̃0.5(D) ≤ 0    x̃0.5(D) > 0    t_{n_red} > z_{1−α}

p-values associated with realisations t_{n_red} of the test statistic (12.41), which are to be calculated from the standard normal test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: wilcox.test(variableA, variableB, paired = TRUE),
   wilcox.test(variableA, variableB, paired = TRUE, alternative = "less"),
   wilcox.test(variableA, variableB, paired = TRUE, alternative = "greater")
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples ... : Wilcoxon

Note: Regrettably, SPSS provides no option for selecting between a one-sided and a two-sided Wilcoxon–test. The default setting is for a two-sided test. For the purpose of one-sided tests, the p-value output of SPSS needs to be divided by 2.
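The rank sums W+ and W− and their identity can again be traced by hand in R; the paired data below are hypothetical. In R, wilcox.test(..., paired = TRUE) reports a statistic V, the sum of the ranks of the positive differences, i.e., W+.

    set.seed(11)
    before <- rnorm(25, mean = 50, sd = 10)
    after  <- before - rnorm(25, mean = 2, sd = 6)

    d  <- before - after
    d  <- d[d != 0]                                # discard null differences: n -> n_red
    r  <- rank(abs(d))
    Wp <- sum(r[d > 0]); Wm <- sum(r[d < 0])
    c(Wp, Wm, length(d) * (length(d) + 1) / 2)     # identity W+ + W- = n_red(n_red+1)/2

    wilcox.test(before, after, paired = TRUE)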
12.10 χ²-test for homogeneity

Due to its independence of the scale level of measurement, the non-parametric χ²-test for homogeneity constitutes the most generally applicable statistical test for significant differences in the distributional properties of a particular one-dimensional statistical variable X between k ∈ N different independent subgroups of some population Ω. By assumption, the one-dimensional variable X may take values in a total of l ∈ N different categories a_j (j = 1, ..., l). Begin by formulating the

Hypotheses: (test for differences)
\[
H_0: X \text{ satisfies the same distribution in all } k \text{ subgroups of } \Omega
\qquad
H_1: X \text{ satisfies a different distribution in at least one subgroup of } \Omega \ .
\tag{12.43}
\]

With O_ij denoting the observed frequency of category a_j in subgroup i (i = 1, ..., k), and E_ij the frequency of category a_j in subgroup i expected under H0, the sum of rescaled squared residuals (O_ij − E_ij)²/E_ij provides a useful

Test statistic:
\[
T_n := \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \overset{H_0}{\approx} \chi^2[(k-1)\times(l-1)] \ .
\tag{12.44}
\]
Under H0, this test statistic approximately satisfies a χ²-test distribution with df = (k − 1) × (l − 1) degrees of freedom; cf. Sec. 8.7. The E_ij are defined as projections of the observed proportions O_{+j}/n in the total sample of size n := O_{1+} + ... + O_{k+} of each of the l categories a_j of X into each of the k subgroups of size O_{i+} by [cf. Eqs. (4.3) and (4.4)]
\[
E_{ij} := \frac{O_{i+}\,O_{+j}}{n} \ .
\tag{12.45}
\]
Note the important (!) test prerequisite that the total sample size n be such that
\[
E_{ij} \overset{!}{\geq} 5
\tag{12.46}
\]
applies for all categories a_j and subgroups i.

Test decision: The rejection region for H0 at significance level α is given by (right-sided test)
\[
t_n > \chi^2_{(k-1)\times(l-1);1-\alpha} \ .
\tag{12.47}
\]
By Eq. (11.5), the p-value associated with a realisation t_n of the test statistic (12.44), which is to be calculated from the χ²-test distribution, amounts to
\[
p = P(T_n > t_n | H_0) = 1 - P(T_n \leq t_n | H_0) = 1 - \chi^2\mathrm{cdf}\,(0, t_n, (k-1)\times(l-1)) \ .
\tag{12.48}
\]

R: chisq.test(group variable, variable)
GDC: mode STAT → TESTS → χ²-Test...
SPSS: Analyze → Descriptive Statistics → Crosstabs ... → Statistics ... : Chi-square

Typically, the power of a χ²-test for homogeneity is weaker than that of the two related procedures for comparing three or more independent subgroups of Ω, which will be discussed in the subsequent Secs. 12.11 and 12.12.

Effect size: The practical significance of the phenomenon investigated can be estimated and interpreted by means of the effect size measure w defined in Eq. (12.13); cf. Cohen (1992) [11, Tab. 1].
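For a table of observed frequencies, the expected frequencies (12.45), the prerequisite (12.46), and the test itself can be obtained in R as follows; the frequencies are hypothetical.

    ## k = 2 subgroups (rows), l = 3 categories (columns)
    O <- matrix(c(20, 30, 25,
                  35, 25, 15), nrow = 2, byrow = TRUE)

    E <- rowSums(O) %o% colSums(O) / sum(O)   # expected frequencies, Eq. (12.45)
    all(E >= 5)                               # check the prerequisite (12.46)
    chisq.test(O)                             # statistic (12.44), df = (2-1)*(3-1) = 2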
12.11 One-way analysis of variance (ANOVA)

This powerful quantitative–analytical tool was developed in the context of investigations on biometrical genetics by the English statistician Sir Ronald Aylmer Fisher FRS (1890–1962) (see Fisher (1918) [22]), and later extended by the US-American statistician Henry Scheffé (1907–1977) (see Scheffé (1959) [90]). It is of a parametric nature and can be interpreted alternatively as a method for (Footnote: Only experimental designs with fixed effects are considered here.)

(i) investigating the influence of a qualitative one-dimensional statistical variable Y with k ≥ 3 categories a_i (i = 1, ..., k), generally referred to as a "factor," on a quantitative one-dimensional statistical variable X, or

(ii) testing for differences of the mean of a quantitative one-dimensional statistical variable X between k ≥ 3 different subgroups of some target population Ω.

A necessary condition for the application of the one-way analysis of variance (ANOVA) test procedure is that the quantitative one-dimensional statistical variable X to be investigated may reasonably be assumed to be (a) normally distributed (cf. Sec. 8.6) in the k ≥ 3 subgroups of the target population Ω considered, with, in addition, (b) equal variances. Both of these conditions also have to hold for each of a set of k mutually stochastically independent random variables X_1, ..., X_k representing k random samples drawn independently from the identified k subgroups of Ω, of sizes n_1, ..., n_k ∈ N, respectively. In the following, the element X_ij of the underlying (n × 2) data matrix X represents the j-th value of X in the random sample drawn from the i-th subgroup of Ω, with \(\bar{X}_i\) the corresponding subgroup sample mean.
The k independent random samples can be understood to form a total random sample of size
\[
n := n_1 + \ldots + n_k = \sum_{i=1}^{k} n_i \ ,
\]
with total sample mean \(\bar{X}_n\); cf. Eq. (10.6).

The intention of the ANOVA procedure in the variant (ii) stated above is to empirically test the null hypothesis H0 in the set of

Hypotheses: (test for differences)
\[
H_0: \mu_1 = \ldots = \mu_k = \mu_0
\qquad
H_1: \mu_i \neq \mu_0 \text{ at least for one } i = 1, \ldots, k \ .
\tag{12.49}
\]

The necessary test prerequisites can be checked by (a) the Kolmogorov–Smirnov–test for normality of the X-distribution in each of the k subgroups of Ω (cf. Sec. 12.3) when n_i ≥ 50, or, when n_i < 50, by a consideration of the magnitudes of the standardised skewness and excess kurtosis measures, Eqs. (12.14), and likewise by (b) Levene's test for H0: σ1² = ... = σk² = σ0² against H1: "σi² ≠ σ0² at least for one i = 1, ..., k" to test for equality of the variances in these k subgroups (cf. Sec. 12.5).

R: leveneTest(variable, group variable) (package: car, by Fox and Weisberg (2011) [25])

The starting point of the ANOVA procedure is a simple algebraic decomposition of the random sample values X_ij into three additive components according to
\[
X_{ij} = \bar{X}_n + (\bar{X}_i - \bar{X}_n) + (X_{ij} - \bar{X}_i) \ .
\tag{12.50}
\]
This expresses the X_ij in terms of the sum of the total sample mean, \(\bar{X}_n\), the deviation of the subgroup sample means from the total sample mean, \((\bar{X}_i - \bar{X}_n)\), and the residual deviation of the sample values from their respective subgroup sample means, \((X_{ij} - \bar{X}_i)\). The decomposition of the X_ij motivates a linear stochastic model for the target population Ω of the form
\[
\text{in } \Omega: \quad X_{ij} = \mu_0 + \alpha_i + \varepsilon_{ij}
\tag{12.51}
\]
in order to quantify, via the α_i (i = 1, ..., k), the potential influence of the qualitative one-dimensional variable Y on the quantitative one-dimensional variable X. Here µ0 is the population mean of X, it holds that \(\sum_{i=1}^{k} n_i \alpha_i = 0\), and it is assumed for the random errors ε_ij that \(\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0;\sigma_0^2)\), i.e., that they are identically normally distributed and mutually stochastically independent. (Footnote: Formulated in the context of this linear stochastic model, the null and research hypotheses are H0: α_1 = ... = α_k = 0 and H1: at least one α_i ≠ 0, respectively.)

Having established the decomposition (12.50), one next turns to consider the associated set of sums of squared deviations, defined by
\[
\mathrm{BSS} := \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(\bar{X}_i - \bar{X}_n\right)^2 = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}_n\right)^2
\tag{12.52}
\]
\[
\mathrm{RSS} := \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_i\right)^2
\tag{12.53}
\]
\[
\mathrm{TSS} := \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_n\right)^2 \ ,
\tag{12.54}
\]
where the summations are (i) over all n_i sample units within a subgroup, and (ii) over all of the k subgroups themselves. These sums are referred to as, respectively, (a) the sum of squared deviations between the subgroup samples (BSS), (b) the residual sum of squared deviations within the subgroup samples (RSS), and (c) the total sum of squared deviations (TSS) of the individual X_ij from the total sample mean \(\bar{X}_n\).
It is a fairly elaborate though straightforward algebraic exercise to show that these three squared deviation terms relate to one another according to the strikingly simple and elegant identity (cf. Bosch (1999) [7, p 220f])
\[
\mathrm{TSS} = \mathrm{BSS} + \mathrm{RSS} \ .
\tag{12.55}
\]
Now, from the sums of squared deviations (12.52)–(12.54), one defines, respectively, the total sample variance,
\[
S^2_{\text{total}} := \frac{1}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_n\right)^2 = \frac{\mathrm{TSS}}{n-1} \ ,
\tag{12.56}
\]
involving df = n − 1 degrees of freedom, the sample variance between subgroups,
\[
S^2_{\text{between}} := \frac{1}{k-1}\sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}_n\right)^2 = \frac{\mathrm{BSS}}{k-1} \ ,
\tag{12.57}
\]
with df = k − 1, and the mean sample variance within subgroups,
\[
S^2_{\text{within}} := \frac{1}{n-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_i\right)^2 = \frac{\mathrm{RSS}}{n-k} \ ,
\tag{12.58}
\]
for which df = n − k. Employing the latter two subgroup-specific dispersion measures, the set of hypotheses (12.49) may be recast into the alternative form

Hypotheses: (test for differences)
\[
H_0: \frac{S^2_{\text{between}}}{S^2_{\text{within}}} \leq 1
\qquad
H_1: \frac{S^2_{\text{between}}}{S^2_{\text{within}}} > 1 \ .
\tag{12.59}
\]

Finally, as a test statistic for the ANOVA procedure one chooses this very ratio of variances,
\[
T_{n,k} := \frac{(\text{sample variance between subgroups})}{(\text{mean sample variance within subgroups})} = \frac{\mathrm{BSS}/(k-1)}{\mathrm{RSS}/(n-k)} \ ,
\]
expressing the size of the "sample variance between subgroups" in terms of multiples of the "mean sample variance within subgroups"; it thus constitutes a relative measure. (Footnote: This ratio is sometimes given as T_{n,k} := (explained variance)/(unexplained variance), in analogy to expression (13.10) below. Occasionally, one also considers the coefficient η² := BSS/TSS, which, however, does not account for the degrees of freedom involved. In this respect, the modified coefficient η̃² := S²_between/S²_total would constitute a more sophisticated measure.) A real effect of difference between subgroups is thus given when the non-negative numerator turns out to be significantly larger than the non-negative denominator. Mathematically, this statistical measure of deviations between the data and the null hypothesis is captured by the

Test statistic:
\[
T_{n,k} := \frac{S^2_{\text{between}}}{S^2_{\text{within}}} \overset{H_0}{\sim} F(k-1, n-k) \ .
\tag{12.60}
\]
Under H0, it satisfies an F-test distribution with df1 = k − 1 and df2 = n − k degrees of freedom; cf. Sec. 8.9. (Footnote: Note the one-to-one correspondence to the test statistic (12.33) employed in the independent samples F-test for a population variance.)

It is a well-established standard in practical applications of the one-way ANOVA procedure to display the results of the data analysis in the form of a summary table, here given in Tab. 12.1:

ANOVA variability    sum of squares    df       mean square    test statistic
between groups       BSS               k − 1    S²_between     t_{n,k}
within groups        RSS               n − k    S²_within
total                TSS               n − 1

Table 12.1: ANOVA summary table.

Test decision: The rejection region for H0 at significance level α is given by (right-sided test)
\[
t_{n,k} > f_{k-1,n-k;1-\alpha} \ .
\tag{12.61}
\]
With Eq. (11.5), the p-value associated with a specific realisation t_{n,k} of the test statistic (12.60), which is to be calculated from the F-test distribution, amounts to
\[
p = P(T_{n,k} > t_{n,k} | H_0) = 1 - P(T_{n,k} \leq t_{n,k} | H_0) = 1 - F\mathrm{cdf}\,(0, t_{n,k}, k-1, n-k) \ .
\tag{12.62}
\]

R: anova(lm(variable ~ group variable)) (variances equal),
   oneway.test(variable ~ group variable) (variances not equal)
GDC: mode STAT → TESTS → ANOVA(
SPSS: Analyze → Compare Means → One-Way ANOVA ...
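The following R sketch generates hypothetical data for k = 3 subgroups and reproduces the summary table of Tab. 12.1; all group means and variances are invented for illustration.

    set.seed(3)
    x <- c(rnorm(20, mean = 10, sd = 2),
           rnorm(20, mean = 12, sd = 2),
           rnorm(20, mean = 11, sd = 2))
    g <- factor(rep(c("a1", "a2", "a3"), each = 20))

    anova(lm(x ~ g))      # ANOVA summary table; F-value is the statistic (12.60)
    oneway.test(x ~ g)    # Welch-type variant for unequal variances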
Effect size: The practical significance of the phenomenon investigated can be estimated from the sample sums of squared deviations BSS and RSS according to
\[
f := \sqrt{\frac{\mathrm{BSS}}{\mathrm{RSS}}} \ .
\tag{12.63}
\]
For the interpretation of its strength, Cohen (1992) [11, Tab. 1] recommends the

Rule of thumb:
0.10 ≤ f < 0.25: small effect
0.25 ≤ f < 0.40: medium effect
0.40 ≤ f: large effect.

We remark that the statistical software package R holds available a routine

    power.anova.test(groups, n, between.var, within.var, sig.level, power)

for the purpose of calculating any one of the parameters power or n (provided all remaining parameters have been specified) in the context of empirical investigations employing the one-way ANOVA. Equal values of n are required here.

When a one-way ANOVA yields a statistically significant result, so-called post-hoc tests need to be run subsequently in order to identify those subgroups i whose means µ_i differ most drastically from the reference value µ0. The Student–Newman–Keuls–test (Newman (1939) [74] and Keuls (1952) [48]), e.g., successively subjects the pairs of subgroups with the largest differences in sample means to independent samples t-tests; cf. Sec. 12.5. Other useful post-hoc tests are those developed by Holm–Bonferroni (Holm (1979) [42]), Tukey (Tukey (1977) [110]), or by Scheffé (Scheffé (1959) [90]).

R: pairwise.t.test(variable, group variable, p.adj = "bonferroni")
SPSS: Analyze → Compare Means → One-Way ANOVA ... → Post Hoc ...

12.12 Kruskal–Wallis–test for a population median

Finally, a feasible alternative to the one-way ANOVA, when the conditions for the latter's legitimate application cannot be met, or when one is interested in the distributional properties of a specific ordinally scaled one-dimensional statistical variable X, is given by the non-parametric significance test devised by the US-American mathematician and statistician William Henry Kruskal (1919–2005) and the US-American economist and statistician Wilson Allen Wallis (1912–1998) in 1952 [54]. The Kruskal–Wallis–test effectively serves to detect significant differences for a population median of an ordinally or metrically scaled one-dimensional statistical variable X between k ≥ 3 independent subgroups of some target population Ω. To be investigated empirically is the null hypothesis H0 in the pair of mutually exclusive

Hypotheses: (test for differences)
\[
H_0: \tilde{x}_{0.5}(1) = \ldots = \tilde{x}_{0.5}(k)
\qquad
H_1: \text{at least one } \tilde{x}_{0.5}(i) \ (i = 1, \ldots, k) \text{ is different from the other group medians} \ .
\tag{12.64}
\]

Introduce rank numbers according to x_j(1) ↦ R[x_j(1)], ..., and x_j(k) ↦ R[x_j(k)] within the random samples drawn independently from each of the k ≥ 3 subgroups of Ω, on the basis of an ordered joint random sample of size
\[
n := n_1 + \ldots + n_k = \sum_{i=1}^{k} n_i \ ;
\]
cf. Sec. 4.3. Then form the sum of rank numbers for each random sample separately, i.e.,
\[
R_{+i} := \sum_{j=1}^{n_i} R[x_j(i)] \qquad (i = 1, \ldots, k) \ .
\tag{12.65}
\]

Provided the sample sizes satisfy the condition n_i ≥ 5 for all k ≥ 3 independent random samples (hence, n ≥ 15), the test procedure can be based on the

Test statistic:
\[
T_{n,k} := \left[\frac{12}{n(n+1)}\sum_{i=1}^{k}\frac{R^2_{+i}}{n_i}\right] - 3(n+1) \overset{H_0}{\approx} \chi^2(k-1) \ ,
\tag{12.66}
\]
which, under H0, approximately satisfies a χ²-test distribution with df = k − 1 degrees of freedom (cf. Sec. 8.7); see, e.g., Rinne (2008) [87, p 553].

Test decision: The rejection region for H0 at significance level α is given by (right-sided test)
\[
t_{n,k} > \chi^2_{k-1;1-\alpha} \ .
\tag{12.67}
\]
By Eq. (11.5), the p-value associated with a realisation t_{n,k} of the test statistic (12.66), which is to be calculated from the χ²-test distribution, amounts to
\[
p = P(T_{n,k} > t_{n,k} | H_0) = 1 - P(T_{n,k} \leq t_{n,k} | H_0) = 1 - \chi^2\mathrm{cdf}\,(0, t_{n,k}, k-1) \ .
\tag{12.68}
\]

R: kruskal.test(variable ~ group variable)
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → K Independent Samples ... : Kruskal-Wallis H
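A short R sketch with hypothetical ordinal scores in k = 3 subgroups of n_i = 12 each:

    set.seed(4)
    x <- c(sample(1:7, 12, replace = TRUE),
           sample(2:7, 12, replace = TRUE),
           sample(3:7, 12, replace = TRUE))
    g <- factor(rep(c("g1", "g2", "g3"), each = 12))

    kruskal.test(x ~ g)   # statistic (12.66), df = k - 1 = 2; R applies a tie correction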
Chapter 13

Bivariate methods of statistical data analysis: testing for association

Recognising patterns of regularity in the variability of data sets for given (observable) statistical variables, and explaining them in terms of causal relationships in the context of a suitable theoretical model, is one of the main objectives of any empirical scientific discipline, and thus a motivation for corresponding research; see, e.g., Penrose (2004) [82]. Causal relationships are intimately related to interactions between objects or agents of the physical or/and of the social kind. A necessary (though not sufficient) condition on the way to theoretically fathoming causal relationships is to establish empirically the existence of significant statistical associations between the variables in question. Replication of positive observational or experimental results of this kind, when accomplished, yields strong support in favour of this idea. Regrettably, however, the existence of causal relationships between two statistical variables cannot be established with absolute certainty by empirical means; compelling theoretical arguments need to stand in. Causal relationships between statistical variables imply an unambiguous distinction between independent variables and dependent variables. In the following, we will discuss the principles of the three simplest inferential statistical methods within the frequentist framework, each associated with specific null hypothesis significance tests, that provide empirical checks of the aforementioned necessary condition in the bivariate case.

13.1 Correlation analysis and linear regression

13.1.1 t-test for a correlation

The parametric correlation analysis presupposes a metrically scaled two-dimensional statistical variable (X, Y) that can be assumed to satisfy a bivariate normal distribution in some target population Ω. Its aim is to investigate whether or not the components X and Y feature a quantitative–statistical association of a linear nature, given a data matrix X ∈ R^(n×2) obtained from a random sample of size n. Formulated in terms of the population correlation coefficient ρ according to Auguste Bravais (1811–1863) and Karl Pearson FRS (1857–1936), the method tests H0 against H1 in one of the alternative pairs of

Hypotheses: (test for association)
\[
H_0: \rho = 0 \;\text{ or }\; \rho \geq 0 \;\text{ or }\; \rho \leq 0
\qquad
H_1: \rho \neq 0 \;\text{ or }\; \rho < 0 \;\text{ or }\; \rho > 0 \ ,
\tag{13.1}
\]
with −1 ≤ ρ ≤ +1.

For sample sizes n ≥ 50, the assumption of normality of the marginal X- and Y-distributions in a given random sample S_Ω: (X_1, ..., X_n; Y_1, ..., Y_n) drawn from Ω can be tested by means of the Kolmogorov–Smirnov–test; cf. Sec. 12.3. For sample sizes n < 50, on the other hand, the magnitudes of the standardised skewness and excess kurtosis measures, Eqs. (12.14), can be considered instead. A scatter plot of the bivariate raw sample data {(x_i, y_i)}_{i=1,...,n} displays characteristic features of the joint (X, Y)-distribution.

R: ks.test(variable, "pnorm")
SPSS: Analyze → Nonparametric Tests → Legacy Dialogs → 1-Sample K-S ... : Normal

Normalising the sample correlation coefficient r of Eq. (4.19) by its standard error,
\[
SE_r := \sqrt{\frac{1 - r^2}{n - 2}} \ ,
\tag{13.2}
\]
the latter of which can be derived from the corresponding theoretical sampling distribution for r, presently yields the (see, e.g., Toutenburg (2005) [108, Eq. (7.18)])

Test statistic:
\[
T_n := \frac{r}{SE_r} \overset{H_0}{\sim} t(n-2) \ ,
\tag{13.3}
\]
which, under H0, satisfies a t-test distribution with df = n − 2 degrees of freedom; cf. Sec. 8.8.

Test decision: Depending on the kind of test to be performed, the rejection region for H0 at significance level α is given by

Kind of test      H0       H1       Rejection region for H0
(a) two-sided     ρ = 0    ρ ≠ 0    |t_n| > t_{n−2;1−α/2}
(b) left-sided    ρ ≥ 0    ρ < 0    t_n < t_{n−2;α} = −t_{n−2;1−α}
(c) right-sided   ρ ≤ 0    ρ > 0    t_n > t_{n−2;1−α}

p-values associated with realisations t_n of the test statistic (13.3), which are to be calculated from the t-test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: cor.test(variable1, variable2),
   cor.test(variable1, variable2, alternative = "less"),
   cor.test(variable1, variable2, alternative = "greater")
SPSS: Analyze → Correlate → Bivariate ... : Pearson

Effect size: The practical significance of the phenomenon investigated can be estimated directly from the absolute value of the scale-invariant sample correlation coefficient r according to Cohen's (1992) [11, Tab. 1]

Rule of thumb:
0.10 ≤ |r| < 0.30: small effect
0.30 ≤ |r| < 0.50: medium effect
0.50 ≤ |r|: large effect.

It is generally recommended to handle significant test results of correlation analyses for metrically scaled two-dimensional statistical variables (X, Y) with some care, due to the possibility of spurious correlations induced by additional control variables Z, ..., acting hidden in the background. To exclude this possibility, a correlation analysis should, e.g., be repeated for homogeneous subgroups of the sample S_Ω. Some rather curious and startling cases of spurious correlations have been collected at the website www.tylervigen.com.
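In R, the test statistic (13.3) can be reproduced by hand and compared with the output of cor.test(); the bivariate data below are simulated for illustration.

    set.seed(5)
    x <- rnorm(40)
    y <- 0.6 * x + rnorm(40, sd = 0.8)

    r <- cor(x, y)                    # sample correlation coefficient, Eq. (4.19)
    r / sqrt((1 - r^2) / (40 - 2))    # test statistic (13.3)
    cor.test(x, y)                    # the same t-value and its two-sided p-value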
13.1.2 F-test of a regression model

When a correlation in the joint distribution of a metrically scaled two-dimensional statistical variable (X, Y), significant in Ω at level α, proves to be strong, i.e., when the magnitude of ρ takes a value in the interval 0.71 ≤ |ρ| ≤ 1.0, it is meaningful to ask which linear quantitative model best represents the detected linear statistical association; cf. Pearson (1903) [80]. To this end, simple linear regression seeks to devise a linear stochastic regression model for the target population Ω of the form
\[
\text{in } \Omega: \quad Y_i = \alpha + \beta x_i + \varepsilon_i \qquad (i = 1, \ldots, n) \ ,
\tag{13.4}
\]
which, for instance, assigns X the role of an independent variable (so that its values x_i can be considered prescribed by the modeller) and Y the role of a dependent variable; such a model is essentially univariate in nature. The regression coefficients α and β denote the unknown y-intercept and slope of the model in Ω. For the random errors ε_i it is assumed that
\[
\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0;\sigma^2) \ ,
\tag{13.5}
\]
meaning they are identically normally distributed (with zero mean and constant variance σ²) and mutually stochastically independent.

With respect to the bivariate random sample S_Ω: (X_1, ..., X_n; Y_1, ..., Y_n), the supposed linear relationship between X and Y is expressed by
\[
\text{in } S_\Omega: \quad y_i = a + b x_i + e_i \qquad (i = 1, \ldots, n) \ .
\tag{13.6}
\]
So-called residuals are then defined according to
\[
e_i := y_i - \hat{y}_i = y_i - a - b x_i \qquad (i = 1, \ldots, n) \ ,
\tag{13.7}
\]
which, for given values of x_i, encode the differences between the observed realisations y_i of Y and the corresponding values ŷ_i of Y predicted by the linear regression model. Given the assumption expressed in Eq. (13.5), the residuals must satisfy the condition \(\sum_{i=1}^{n} e_i = 0\).

Next, introduce sums of squared deviations for the Y-data, in line with the ANOVA procedure of Sec. 12.11, i.e.,
\[
\mathrm{TSS} := \sum_{i=1}^{n} (y_i - \bar{y})^2
\tag{13.8}
\]
\[
\mathrm{RSS} := \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 \ .
\tag{13.9}
\]
In terms of these quantities, the coefficient of determination of Eq. (5.9) for assessing the goodness-of-fit of a regression model can be expressed by
\[
B = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = \frac{(\text{total variance of } Y) - (\text{unexplained variance of } Y)}{(\text{total variance of } Y)} \ .
\tag{13.10}
\]
This normalised measure expresses the proportion of variability in a data set of Y which can be explained by the corresponding variability of X through the best-fit regression model. The range of B is 0 ≤ B ≤ 1.

In the methodology of a regression analysis within the frequentist framework, the first issue to be addressed is to test the significance of the overall simple linear regression model (13.4), i.e., to test H0 against H1 in the set of

Hypotheses: (test for differences)
\[
H_0: \beta = 0 \qquad H_1: \beta \neq 0 \ .
\tag{13.11}
\]
Exploiting the goodness-of-fit aspect of the regression model as quantified by B in Eq. (13.10), one arrives, via division by the standard error of B,
\[
SE_B := \frac{1 - B}{n - 2} \ ,
\tag{13.12}
\]
which derives from the theoretical sampling distribution for B, at the (see, e.g., Hatzinger and Nagel (2013) [37, Eq. (7.8)])

Test statistic:
\[
T_n := \frac{B}{SE_B} \overset{H_0}{\sim} F(1, n-2) \ .
\tag{13.13}
\]
(Footnote: Note that with the identity B = r² of Eq. (5.10), which applies in simple linear regression, this is just the square of the test statistic (13.3).) Under H0, this satisfies an F-test distribution with df1 = 1 and df2 = n − 2 degrees of freedom; cf. Sec. 8.9.

Test decision: The rejection region for H0 at significance level α is given by (right-sided test)
\[
t_n > f_{1,n-2;1-\alpha} \ .
\tag{13.14}
\]
With Eq. (11.5), the p-value associated with a specific realisation t_n of the test statistic (13.13), which is to be calculated from the F-test distribution, amounts to
\[
p = P(T_n > t_n | H_0) = 1 - P(T_n \leq t_n | H_0) = 1 - F\mathrm{cdf}\,(0, t_n, 1, n-2) \ .
\tag{13.15}
\]
13.1.3 t-test for the regression coefficients

The second issue to be addressed in a systematic regression analysis within the frequentist framework is to test statistically which of the regression coefficients in the model (13.4) are significantly different from zero. In the case of simple linear regression, though, the matter is settled for the coefficient β already by the F-test of the regression model just outlined, resp. by the t-test for ρ described in Sec. 13.1.1; see, e.g., Levin et al (2010) [61, p 389f]. In this sense, a further test of statistical significance is redundant in the case of simple linear regression. However, when extending the concept of regression analysis to the more involved case of multivariate data, a quantitative approach frequently employed in the research literature of the Social Sciences and Economics, this question attains relevance in its own right. In this context, the linear stochastic regression model for the dependent variable Y to be assessed is of the general form (cf. Yule (1897) [122])
\[
\text{in } \Omega: \quad Y_i = \alpha + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i \qquad (i = 1, \ldots, n) \ ,
\tag{13.16}
\]
containing a total of k uncorrelated independent variables and k + 1 regression coefficients, as well as a random error term. A multiple linear regression model to be estimated from data of a corresponding random sample from Ω of size n thus entails n − k − 1 degrees of freedom; cf. Hair et al (2010) [36, p 176]. In view of this prospect, we continue with our methodological considerations.

First of all, unbiased maximum likelihood point estimators for the regression coefficients α and β in Eq. (13.4) are obtained by applying to the data Gauß' method of minimising the sum of squared residuals (RSS) (cf. Gauß (1809) [29] and Ch. 5),
\[
\text{minimise}\ \left( \mathrm{RSS} = \sum_{i=1}^{n} e_i^2 \right) ,
\]
yielding solutions
\[
b = \frac{S_Y}{s_X}\, r \qquad\text{and}\qquad a = \bar{Y} - b\,\bar{x} \ .
\tag{13.17}
\]
The equation of the best-fit simple linear regression model is thus given by
\[
\hat{y} = \bar{Y} + \frac{S_Y}{s_X}\, r\,(x - \bar{x}) \ ,
\tag{13.18}
\]
and can be employed for purposes of predicting values of Y from given values of X in the empirical interval [x_(1), x_(n)].

Next, the standard errors associated with the values of the maximum likelihood point estimators a and b in Eq. (13.17) are derived from the corresponding theoretical sampling distributions and amount to (cf., e.g., Hartung et al (2005) [39, p 576ff])
\[
SE_a := \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)\,s_X^2}}\; SE_e
\tag{13.19}
\]
\[
SE_b := \frac{SE_e}{\sqrt{n-1}\; s_X} \ ,
\tag{13.20}
\]
where the standard error of the residuals e_i is defined by
\[
SE_e := \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}} \ .
\tag{13.21}
\]

We now describe the test procedure for the regression coefficient β. To be tested is H0 against H1 in one of the alternative pairs of

Hypotheses: (test for differences)
\[
H_0: \beta = 0 \;\text{ or }\; \beta \geq 0 \;\text{ or }\; \beta \leq 0
\qquad
H_1: \beta \neq 0 \;\text{ or }\; \beta < 0 \;\text{ or }\; \beta > 0 \ .
\tag{13.22}
\]
Dividing the sample regression slope b by its standard error (13.20) yields the

Test statistic:
\[
T_n := \frac{b}{SE_b} \overset{H_0}{\sim} t(n-2) \ ,
\tag{13.23}
\]
which, under H0, satisfies a t-test distribution with df = n − 2 degrees of freedom; cf. Sec. 8.8.

Test decision: Depending on the kind of test to be performed, the rejection region for H0 at significance level α is given by

Kind of test      H0       H1       Rejection region for H0
(a) two-sided     β = 0    β ≠ 0    |t_n| > t_{n−2;1−α/2}
(b) left-sided    β ≥ 0    β < 0    t_n < t_{n−2;α} = −t_{n−2;1−α}
(c) right-sided   β ≤ 0    β > 0    t_n > t_{n−2;1−α}

p-values associated with realisations t_n of the test statistic (13.23), which are to be calculated from the t-test distribution, can be obtained from Eqs. (11.3)–(11.5).

We emphasise once more that for simple linear regression the test procedure just described is equivalent to the correlation analysis of Sec. 13.1.1. An analogous t-test needs to be run to check whether the regression coefficient α is non-zero, too, using the ratio a/SE_a as a test statistic. However, in particular when the origin of X is not contained in the empirical interval [x_(1), x_(n)], the null hypothesis H0: α = 0 is a meaningless statement.

R: regMod <- lm(variable:y ~ variable:x)
   summary(regMod)
GDC: mode STAT → TESTS → LinRegTTest...
SPSS: Analyze → Regression → Linear ...

Note: Regrettably, SPSS provides no option for selecting between a one-sided and a two-sided t-test. The default setting is for a two-sided test. For the purpose of one-sided tests, the p-value output of SPSS needs to be divided by 2.
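The standard errors (13.20)–(13.21), the t-statistic (13.23) and the F-statistic (13.13) can all be verified by hand against summary(lm(...)); the data below are simulated under a hypothetical linear relationship.

    set.seed(6)
    x <- runif(30, 0, 10)
    y <- 2 + 0.8 * x + rnorm(30, sd = 1.5)

    fit  <- lm(y ~ x)
    se_e <- sqrt(sum(residuals(fit)^2) / (30 - 2))   # Eq. (13.21)
    se_b <- se_e / (sqrt(30 - 1) * sd(x))            # Eq. (13.20)
    coef(fit)["x"] / se_b                            # test statistic (13.23)
    summary(fit)                                     # matching t value in the row for x

    ## F-test of the model: B / SE_B of Eqs. (13.12)-(13.13) equals the reported F
    B <- summary(fit)$r.squared
    B / ((1 - B) / (30 - 2))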
The extent to which the prerequisites of a regression analysis as stated in Eq. (13.5) are satisfied can be assessed by means of an analysis of the residuals: (i) for n ≥ 50, normality of the distribution of the residuals e_i (i = 1, ..., n) can be checked by means of a Kolmogorov–Smirnov–test (cf. Sec. 12.3); otherwise, when n < 50, resort to a consideration of the magnitudes of the standardised skewness and excess kurtosis measures, Eqs. (12.14); (ii) homoscedasticity of the e_i (i = 1, ..., n), i.e., whether or not they can be assumed to have constant variance, can be investigated qualitatively in terms of a scatter plot that marks the standardised e_i (along the vertical axis) against the corresponding predicted Y-values ŷ_i (i = 1, ..., n) (along the horizontal axis). An elliptically shaped envelope of the cloud of data points thus obtained indicates that homoscedasticity applies.

Simple linear regression analysis can easily be modified to provide a tool for testing bivariate empirical data {(x_i, y_i)}_{i=1,...,n} for positive metrically scaled statistical variables (X, Y) for an association in the form of a Pareto distribution; cf. Sec. 8.10. To begin with, the original data is subjected to logarithmic transformations in order to obtain data for the logarithmic quantities ln(y_i) resp. ln(x_i). Subsequently, a correlation analysis can be performed on the transformed data. Given there exists a functional relationship between the original Y and X of the form y = K x^(−(γ+1)), the logarithmic quantities are related by
\[
\ln(y) = \ln(K) - (\gamma + 1) \times \ln(x) \ ,
\tag{13.24}
\]
i.e., one finds a straight-line relationship between ln(y) and ln(x) with negative slope equal to −(γ + 1).
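A minimal R sketch of this log–log procedure, with data generated from a hypothetical power law with K = 100 and γ = 1.5:

    set.seed(8)
    x <- exp(runif(50, 0, 3))                             # positive X-data
    y <- 100 * x^(-(1.5 + 1)) * exp(rnorm(50, sd = 0.1))  # noisy power law

    fit <- lm(log(y) ~ log(x))
    coef(fit)                 # slope estimates -(gamma + 1), intercept estimates ln(K)
    cor.test(log(x), log(y))  # correlation analysis on the transformed data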
We would like to draw the reader's attention to a remarkable statistical phenomenon that was discovered, and emphatically publicised, by the English empiricist Sir Francis Galton FRS (1822–1911), following years of intense research during the late 19th century; see Galton (1886) [28], and also Kahneman (2011) [46, Ch. 17]. Regression toward the mean is best demonstrated on the basis of the standardised version of the best-fit simple linear regression model of Eq. (13.18), namely
\[
\hat{z}_Y = r\, z_X \ .
\tag{13.25}
\]
For bivariate metrically scaled random sample data that exhibit a non-perfect positive correlation (i.e., 0 < r < 1), one observes that, on average, large (small) z_X-values (i.e., values that are far from their mean; that are, perhaps, even outliers) pair with smaller (larger) z_Y-values (i.e., values that are closer to their mean; that are more mediocre). Since this phenomenon persists after the roles of X and Y in the regression model have been switched, it is clear evidence that regression toward the mean is a manifestation of randomness, and not of causality (which requires an unambiguous temporal order between a cause and an effect). Incidentally, regression toward the mean ensures that many physical and social processes cannot become unstable.

Ending this section, we point out that in reality many of the processes studied in the Natural Sciences and in the Social Sciences prove to be of an inherently non-linear nature; see, e.g., Gleick (1987) [34], Penrose (2004) [82], and Smith (2007) [94]. On the one hand, this increases the level of complexity involved in the analysis of data; on the other, non-linear processes offer the reward of a plethora of interesting and intriguing (dynamical) phenomena.
13.2 Rank correlation analysis

When the two-dimensional statistical variable (X, Y) is metrically scaled but may not be assumed bivariate normally distributed in the target population Ω, or when (X, Y) is ordinally scaled in the first place, the standard tool for testing for a statistical association between the components X and Y is the non-parametric rank correlation analysis developed by the English psychologist and statistician Charles Edward Spearman FRS (1863–1945) in 1904 [96]. This approach, like the univariate test procedures of Mann and Whitney, Wilcoxon, and Kruskal and Wallis discussed in Ch. 12, is again fundamentally rooted in the concept of rank numbers representing statistical data which possess a natural order, introduced in Sec. 4.3.

Following the translation of the original data pairs into corresponding rank number pairs,
\[
(x_i, y_i) \mapsto [R(x_i), R(y_i)] \qquad (i = 1, \ldots, n) \ ,
\tag{13.26}
\]
the objective is to subject H0 in the alternative sets of

Hypotheses: (test for association)
\[
H_0: \rho_S = 0 \;\text{ or }\; \rho_S \geq 0 \;\text{ or }\; \rho_S \leq 0
\qquad
H_1: \rho_S \neq 0 \;\text{ or }\; \rho_S < 0 \;\text{ or }\; \rho_S > 0 \ ,
\tag{13.27}
\]
with ρ_S (−1 ≤ ρ_S ≤ +1) the population rank correlation coefficient, to a test of statistical significance at level α.

Provided the size of the random sample is such that n ≥ 30 (see, e.g., Bortz (2005) [5, p 233]), by dividing the sample rank correlation coefficient r_S of Eq. (4.32) by its standard error,
\[
SE_{r_S} := \sqrt{\frac{1 - r_S^2}{n - 2}} \ ,
\tag{13.28}
\]
derived from the theoretical sampling distribution for r_S, one obtains a suitable

Test statistic:
\[
T_n := \frac{r_S}{SE_{r_S}} \overset{H_0}{\approx} t(n-2) \ .
\tag{13.29}
\]
Under H0, this approximately satisfies a t-test distribution with df = n − 2 degrees of freedom; cf. Sec. 8.8.

Test decision: Depending on the kind of test to be performed, the rejection region for H0 at significance level α is given by

Kind of test      H0         H1         Rejection region for H0
(a) two-sided     ρ_S = 0    ρ_S ≠ 0    |t_n| > t_{n−2;1−α/2}
(b) left-sided    ρ_S ≥ 0    ρ_S < 0    t_n < t_{n−2;α} = −t_{n−2;1−α}
(c) right-sided   ρ_S ≤ 0    ρ_S > 0    t_n > t_{n−2;1−α}

p-values associated with realisations t_n of the test statistic (13.29), which are to be calculated from the t-test distribution, can be obtained from Eqs. (11.3)–(11.5).

R: cor.test(variable1, variable2, method = "spearman"),
   cor.test(variable1, variable2, method = "spearman", alternative = "less"),
   cor.test(variable1, variable2, method = "spearman", alternative = "greater")
SPSS: Analyze → Correlate → Bivariate ... : Spearman

Effect size: The practical significance of the phenomenon investigated can be estimated directly from the absolute value of the scale-invariant sample rank correlation coefficient r_S according to (cf. Cohen (1992) [11, Tab. 1])

Rule of thumb:
0.10 ≤ |r_S| < 0.30: small effect
0.30 ≤ |r_S| < 0.50: medium effect
0.50 ≤ |r_S|: large effect.
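A small R sketch with a simulated monotone but non-linear association; note that cor.test(..., method = "spearman") may compute the p-value exactly rather than via the t-approximation (13.29).

    set.seed(9)
    x <- rnorm(35)
    y <- x^3 + rnorm(35, sd = 0.5)        # monotone, non-linear relationship

    cor(rank(x), rank(y))                 # r_S equals Pearson's r on the rank pairs
    cor.test(x, y, method = "spearman")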
13.3 χ²-test for independence

The non-parametric χ²-test for independence constitutes the most generally applicable significance test for bivariate statistical associations. Due to its formal indifference to the scale level of measurement of the two-dimensional statistical variable (X, Y) involved in an investigation, it may be used for the statistical analysis of any kind of pairwise combination between nominally, ordinally and metrically scaled components. The advantage of the generality of the method is paid for at the price of a generally weaker test power.

Given qualitative and/or quantitative statistical variables X and Y that take values in a spectrum of k mutually exclusive categories a_1, ..., a_k resp. l mutually exclusive categories b_1, ..., b_l, the intention is to subject H0 in the pair of alternative

Hypotheses: (test for association)
\[
H_0: \text{There does not exist a statistical association between } X \text{ and } Y \text{ in } \Omega
\qquad
H_1: \text{There does exist a statistical association between } X \text{ and } Y \text{ in } \Omega
\tag{13.30}
\]
to a convenient empirical significance test at level α.

A conceptual issue that requires special attention along the way is the definition of a reasonable zero point on the scale of statistical dependence of the statistical variables X and Y (which one aims to establish). This problem is solved by recognising that a common feature of sample data for statistical variables of all scale levels of measurement is the information residing in the distribution of (relative) frequencies over (all possible combinations of) categories, and by drawing an analogy to the concept of stochastic independence of two events as expressed in Probability Theory by Eq. (7.62). In this way, by definition, we refer to the variables X and Y as being mutually statistically independent provided that the bivariate relative frequencies h_ij of all combinations of categories (a_i, b_j) are numerically equal to the products of the univariate marginal relative frequencies h_{i+} of a_i and h_{+j} of b_j (cf. Sec. 4.1), i.e.,
\[
h_{ij} = h_{i+}\, h_{+j} \ .
\tag{13.31}
\]
Translated into the language of random sample variables, viz. introducing sample observed frequencies, this operational independence condition is re-expressed by O_ij = E_ij, where the O_ij denote the bivariate observed frequencies of the category combinations (a_i, b_j) in a cross tabulation underlying a specific random sample of size n, and the quantities E_ij, which are defined in terms of (i) the univariate sum O_{i+} of observed frequencies in row i, see Eq. (4.3), (ii) the univariate sum O_{+j} of observed frequencies in column j, see Eq. (4.4), and (iii) the sample size n by
\[
E_{ij} := \frac{O_{i+}\, O_{+j}}{n} \ ,
\]
are interpreted as the expected frequencies of (a_i, b_j), given that X and Y are statistically independent. Expressing differences between observed and (under independence) expected frequencies via the residuals O_ij − E_ij, the hypotheses may be reformulated as

Hypotheses: (test for association)
\[
H_0: O_{ij} - E_{ij} = 0 \text{ for all } i = 1, \ldots, k \text{ and } j = 1, \ldots, l
\qquad
H_1: O_{ij} - E_{ij} \neq 0 \text{ for at least one } i \text{ and } j \ .
\tag{13.32}
\]

For the subsequent test procedure to be reliable, it is very important (!) that the empirical prerequisite
\[
E_{ij} \overset{!}{\geq} 5
\tag{13.33}
\]
holds for all values of i = 1, ..., k and j = 1, ..., l, such that one avoids the possibility of individual rescaled squared residuals (O_ij − E_ij)²/E_ij becoming artificially magnified. The latter constitute the core of the

Test statistic:
\[
T_n := \sum_{i=1}^{k}\sum_{j=1}^{l} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \overset{H_0}{\approx} \chi^2[(k-1)\times(l-1)] \ ,
\tag{13.34}
\]
which, under H0, approximately satisfies a χ²-test distribution with df = (k − 1) × (l − 1) degrees of freedom; cf. Sec. 8.7.

Test decision: The rejection region for H0 at significance level α is given by (right-sided test)
\[
t_n > \chi^2_{(k-1)\times(l-1);1-\alpha} \ .
\tag{13.35}
\]
By Eq. (11.5), the p-value associated with a realisation t_n of the test statistic (13.34), which is to be calculated from the χ²-test distribution, amounts to
\[
p = P(T_n > t_n | H_0) = 1 - P(T_n \leq t_n | H_0) = 1 - \chi^2\mathrm{cdf}\,(0, t_n, (k-1)\times(l-1)) \ .
\tag{13.36}
\]

R: chisq.test(row variable, column variable)
GDC: mode STAT → TESTS → χ²-Test...
SPSS: Analyze → Descriptive Statistics → Crosstabs ... → Statistics ... : Chi-square

The χ²-test for independence can establish the existence of a significant association in the joint distribution of a two-dimensional statistical variable (X, Y). The strength of the association, on the other hand, may be measured in terms of Cramér's V (Cramér (1946) [13]), which has a normalised range of values given by 0 ≤ V ≤ 1; cf. Eq. (4.36) and Sec. 4.4. Low values of V in the case of significant associations between the components X and Y typically indicate the statistical influence of additional control variables.

R: assocstats(contingency table) (package: vcd, by Meyer et al (2017) [70])
SPSS: Analyze → Descriptive Statistics → Crosstabs ... → Statistics ... : Phi and Cramer's V

Effect size: The practical significance of the phenomenon investigated can be estimated and interpreted by means of the effect size measure w defined in Eq. (12.13); cf. Cohen (1992) [11, Tab. 1].
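In R, the expected frequencies, the test statistic (13.34) and Cramér's V can be obtained as follows; the cross tabulation is hypothetical.

    ct  <- matrix(c(30, 10, 20,
                    15, 25, 20), nrow = 2, byrow = TRUE)
    res <- chisq.test(ct)

    res$expected   # the E_ij; check the prerequisite (13.33)
    res            # statistic (13.34) with df = (2-1)*(3-1) = 2

    ## Cramér's V, cf. Eq. (4.36)
    sqrt(unname(res$statistic) / (sum(ct) * (min(dim(ct)) - 1)))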
Outlook

Our discussion on the foundations of statistical methods of data analysis and their application to specific quantitative problems ends here. We have focused on the description of uni- and bivariate data sets and on making inferences from corresponding random samples within the frequentist approach to Probability Theory. At this stage, the attentive reader should feel well-equipped for confronting problems concerning more complex, multivariate data sets, and for the adequate methods of tackling them by statistical means. Many modules at the Master degree level review a broad spectrum of advanced topics such as multiple linear regression, generalised linear models, principal component analysis, or cluster analysis, which in turn relate to computational techniques presently employed in the context of machine learning. The ambitious reader might even think of getting involved with proper research and working towards a Ph.D. degree in an empirical scientific discipline.

To gain additional data-analytical flexibility, and to increase the chances of obtaining transparent and satisfactory research results, it is strongly recommended to consult the conceptually compelling inductive Bayes–Laplace approach to statistical inference. In order to leave behind the methodological shortcomings uncovered by the recent replication crisis (cf., e.g., Refs. [17], [76], or [112]), strict adherence to accepted scientific standards cannot be compromised. (Footnote: With regard to the replication crisis, the interested reader might be aware of the international initiative known as the Open Science Framework. URL (cited on August 17, 2019): https://osf.io.)

Beyond activities within the scientific community, the dedicated reader may feel encouraged to use her/his solid topical qualification in statistical methods of data analysis for careers in higher education, public health, renewable energy supply chains, evaluation of climate change adaptation, development of plans for sustainable production in agriculture and the global economy, civil service, business management, marketing, logistics, or the financial services, amongst a multitude of other inspirational possibilities.

Not every single matter of human life is amenable to quantification, or, acknowledging an individual freedom of making choices, needs to be quantified in the first place. Blind faith in the powers of quantitative methods is certainly misplaced. Thorough reflection and introspection on the options available for action and their implied consequences, together with a critical evaluation of relevant tangible facts, might suggest a viable alternative approach to a given research or practical problem. Generally, there is a potential for looking behind curtains, shifting horizons, or anticipating prospects and opportunities. Finally, more often than not, there exists a dimension of non-knowledge on the part of the individual investigator that needs to be taken into account as an integral part of the boundary conditions of the overall problem in question. The adventurous mind will always excel in view of the intricate challenge of making inferences on the basis of incomplete information.
Appendix A

Principal component analysis of a (2 × 2) correlation matrix

Consider a real-valued (2 × 2) correlation matrix expressed by
\[
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix} \ , \qquad -1 \leq r \leq +1 \ ,
\tag{A.1}
\]
which, by construction, is symmetric. Its trace amounts to Tr(R) = 2, while its determinant is det(R) = 1 − r². Consequently, R is regular as long as r ≠ ±1. We seek to determine the eigenvalues and corresponding eigenvectors (or principal components) of R, i.e., real numbers λ and real-valued vectors v such that the condition
\[
R\,v \overset{!}{=} \lambda\,v \qquad\Leftrightarrow\qquad (R - \lambda\,\mathbf{1})\,v \overset{!}{=} 0
\tag{A.2}
\]
applies. The determination of non-trivial solutions of this algebraic problem leads to the characteristic equation
\[
0 \overset{!}{=} \det(R - \lambda\,\mathbf{1}) = (1 - \lambda)^2 - r^2 = (\lambda - 1)^2 - r^2 \ .
\tag{A.3}
\]
Hence, by completing squares, it is clear that R possesses the two eigenvalues
\[
\lambda_1 = 1 + r \qquad\text{and}\qquad \lambda_2 = 1 - r \ ,
\tag{A.4}
\]
showing that R is positive-definite whenever |r| < 1. The normalised eigenvectors associated with λ1 and λ2, obtained from Eq. (A.2), then are
\[
v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad\text{and}\qquad v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 1 \end{pmatrix} \ ,
\tag{A.5}
\]
and constitute a right-handedly oriented basis of the two-dimensional eigenspace of R. Note that, due to the symmetry of R, it holds that \(v_1^T \cdot v_2 = 0\), i.e., the eigenvectors are mutually orthogonal.

The normalised eigenvectors of R define a regular orthogonal transformation matrix M, and an inverse M⁻¹ = Mᵀ, given by
\[
M = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\qquad\text{resp.}\qquad
M^{-1} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} = M^T \ ,
\tag{A.6}
\]
where Tr(M) = √2 and det(M) = 1. The correlation matrix R can now be diagonalised by means of a rotation with M according to (Footnote: Alternatively, one can write \(M = \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}\), thus emphasising the character of a rotation of R by an angle ϕ = π/4.)
\[
R_{\text{diag}} = M^{-1} R M
= \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}
\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 + r & 0 \\ 0 & 1 - r \end{pmatrix} \ .
\tag{A.7}
\]
Note that Tr(R_diag) = 2 and det(R_diag) = 1 − r², i.e., the trace and determinant of R remain invariant under the diagonalising transformation.

The concepts of eigenvalues and eigenvectors (principal components), as well as of the diagonalisation of symmetric matrices, generalise in a straightforward though computationally more demanding fashion to arbitrary real-valued correlation matrices R ∈ R^(m×m), with m ∈ N.

R: prcomp(data matrix)
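The eigenstructure (A.4)–(A.5) can be verified numerically in R; the value r = 0.6 is an arbitrary illustrative choice.

    r <- 0.6
    R <- matrix(c(1, r,
                  r, 1), nrow = 2)
    eigen(R)   # eigenvalues 1 + r = 1.6 and 1 - r = 0.4, Eq. (A.4); eigenvectors
               # proportional to (1, 1) and (-1, 1), Eq. (A.5), up to sign and order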
² With regard to the replication crisis, the interested reader might be aware of the international initiative known as the Open Science Framework. URL (cited on August 17, 2019): https://osf.io.

Appendix A

Principal component analysis of a (2 × 2) correlation matrix

Consider a real-valued (2 × 2) correlation matrix expressed by

    R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix} , \quad -1 \le r \le +1 ,   (A.1)

which, by construction, is symmetric. Its trace amounts to Tr(R) = 2, while its determinant is det(R) = 1 − r². Consequently, R is regular as long as r ≠ ±1. We seek to determine the eigenvalues and corresponding eigenvectors (or principal components) of R, i.e., real numbers λ and real-valued vectors v such that the condition

    R v \overset{!}{=} \lambda v \quad \Leftrightarrow \quad (R - \lambda \mathbf{1}) v \overset{!}{=} \mathbf{0}   (A.2)

applies. The determination of non-trivial solutions of this algebraic problem leads to the characteristic equation

    0 \overset{!}{=} \det(R - \lambda \mathbf{1}) = (1 - \lambda)^2 - r^2 = (\lambda - 1)^2 - r^2 .   (A.3)

Hence, by completing squares, it is clear that R possesses the two eigenvalues

    \lambda_1 = 1 + r \quad \text{and} \quad \lambda_2 = 1 - r ,   (A.4)

showing that R is positive-definite whenever |r| < 1. The normalised eigenvectors associated with λ₁ and λ₂, obtained from Eq. (A.2), then are

    v_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad v_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix} ,   (A.5)

and constitute a right-handedly oriented basis of the two-dimensional eigenspace of R. Note that, due to the symmetry of R, it holds that v_1^T · v_2 = 0, i.e., the eigenvectors are mutually orthogonal.

The normalised eigenvectors of R define a regular orthogonal transformation matrix M, and an inverse M⁻¹ = Mᵀ, given by

    M = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \quad \text{resp.} \quad M^{-1} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} = M^T ,   (A.6)

where Tr(M) = √2 and det(M) = 1. The correlation matrix R can now be diagonalised by means of a rotation with M according to¹

    R_{\text{diag}} = M^{-1} R M = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1+r & 0 \\ 0 & 1-r \end{pmatrix} .   (A.7)

Note that Tr(R_diag) = 2 and det(R_diag) = 1 − r², i.e., the trace and determinant of R remain invariant under the diagonalising transformation. The concepts of eigenvalues and eigenvectors (principal components), as well as of diagonalisation of symmetric matrices, generalise in a straightforward though computationally more demanding fashion to arbitrary real-valued correlation matrices R ∈ ℝ^{m×m}, with m ∈ ℕ.

R: prcomp(data matrix)

¹ Alternatively one can write M = \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}, thus emphasising the character of a rotation of R by an angle φ = π/4.
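The results of Eqs. (A.4)–(A.7) may be verified numerically in R; the following minimal sketch assumes an illustrative correlation value r = 0.6 and, for the final step, a small hypothetical data matrix, mirroring the prcomp(...) command quoted above.

    # (2 x 2) correlation matrix for an illustrative value r = 0.6
    r <- 0.6
    R <- matrix(c(1, r,
                  r, 1), nrow = 2)

    # Eigenvalues (returned in decreasing order) and normalised eigenvectors;
    # cf. Eqs. (A.4) and (A.5): 1 + r and 1 - r, with eigenvectors
    # proportional to (1, 1)/sqrt(2) and (-1, 1)/sqrt(2), possibly up to sign
    eigen(R)

    # Rotation matrix M of Eq. (A.6) and the diagonalisation of Eq. (A.7)
    M <- (1 / sqrt(2)) * matrix(c(1, 1,
                                  -1, 1), nrow = 2)
    t(M) %*% R %*% M        # diag(1 + r, 1 - r), up to rounding

    # For an actual data matrix, prcomp() on standardised data performs the
    # analogous decomposition of the sample correlation matrix
    X <- matrix(rnorm(200), ncol = 2)   # hypothetical data: n = 100, m = 2
    prcomp(X, scale. = TRUE)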
Appendix B

Distance measures in Statistics

Statistics employs a number of different measures of distance d_{ij} to quantify the separation in an m–D space of metrically scaled statistical variables X, Y, ..., Z of two statistical units i and j (i, j = 1, ..., n). Note that, by construction, these measures d_{ij} exhibit the properties d_{ij} ≥ 0, d_{ij} = d_{ji} and d_{ii} = 0. In the following, X_{ik} is the entry of the data matrix X ∈ ℝ^{n×m} relating to the i-th statistical unit and the k-th statistical variable, etc. The d_{ij} define the elements of an (n × n) proximity matrix D ∈ ℝ^{n×n}.

Euclidian distance (dimensionful)

This most straightforward, dimensionful distance measure is named after the ancient Greek (?) mathematician Euclid of Alexandria (ca. 325 BC–ca. 265 BC). It is defined by

    d^{E}_{ij} := \sqrt{ \sum_{k=1}^{m} \sum_{l=1}^{m} (X_{ik} - X_{jk}) \, \delta_{kl} \, (X_{il} - X_{jl}) } ,   (B.1)

where δ_{kl} denotes the elements of the unit matrix 1 ∈ ℝ^{m×m}; cf. Ref. [18, Eq. (2.2)].

Mahalanobis distance (dimensionless)

A more sophisticated, scale-invariant distance measure in Statistics was devised by the Indian applied statistician Prasanta Chandra Mahalanobis (1893–1972); cf. Mahalanobis (1936) [67]. It is defined by

    d^{M}_{ij} := \sqrt{ \sum_{k=1}^{m} \sum_{l=1}^{m} (X_{ik} - X_{jk}) \, (S^2)^{-1}_{kl} \, (X_{il} - X_{jl}) } ,   (B.2)

where (S²)^{-1}_{kl} denotes the elements of the inverse covariance matrix (S²)⁻¹ ∈ ℝ^{m×m} relating to X, Y, ..., Z; cf. Sec. 4.2.1. The Mahalanobis distance thus accounts for inter-variable correlations and so eliminates a potential source of bias.

R: mahalanobis(data matrix)
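As a minimal sketch, both distance measures may be evaluated in R as follows; the small data matrix X, with n = 4 statistical units and m = 2 metrically scaled variables, is hypothetical and serves illustration only.

    # Hypothetical data matrix: n = 4 statistical units, m = 2 variables
    set.seed(1)
    X <- cbind(height_cm = rnorm(4, mean = 170, sd = 10),
               weight_kg = rnorm(4, mean = 70, sd = 8))

    # Euclidian distances, Eq. (B.1): the (n x n) proximity matrix D
    D_E <- as.matrix(dist(X, method = "euclidean"))
    D_E

    # Mahalanobis distances, Eq. (B.2), of each unit from the centroid;
    # mahalanobis() returns squared distances, hence the square root
    S2 <- cov(X)                       # sample covariance matrix
    sqrt(mahalanobis(X, center = colMeans(X), cov = S2))

    # Pairwise Mahalanobis distance between units i = 1 and j = 2,
    # evaluated directly from Eq. (B.2)
    d <- X[1, ] - X[2, ]
    sqrt(as.numeric(t(d) %*% solve(S2) %*% d))

The contrast between D_E and the Mahalanobis values illustrates the remark above: rescaling one of the variables changes the Euclidian distances but leaves the Mahalanobis distances invariant.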
Appendix C

List of online survey tools

A first version of the following list of online survey tools for the Social Sciences, the use of some of which is free of charge, was compiled and released courtesy of an investigation by Michael Rüger (IMC, year of entry 2010):

• easy-feedback.de/de/startseite
• www.evalandgo.de
• www.limesurvey.org
• www.netigate.de
• polldaddy.com
• q-set.de
• www.qualtrics.com
• www.soscisurvey.de
• www.surveymonkey.com
• www.umfrageonline.com

Appendix D

Glossary of technical terms (GB – D)

A
additive: additiv, summierbar
ANOVA: Varianzanalyse
arithmetical mean: arithmetischer Mittelwert
association: Zusammenhang, Assoziation
attribute: Ausprägung, Eigenschaft

B
bar chart: Balkendiagramm
Bayes' theorem: Satz von Bayes
Bayesian probability: Bayesianischer Wahrscheinlichkeitsbegriff
best-fit model: Anpassungsmodell
bin: Datenintervall
binomial coefficient: Binomialkoeffizient
bivariate: bivariat, zwei variable Größen betreffend
box plot: Kastendiagramm

C
category: Kategorie
causality: Kausalität
causal relationship: Kausalbeziehung
census: statistische Vollerhebung
central limit theorem: Zentraler Grenzwertsatz
centre of gravity: Schwerpunkt
centroid: geometrischer Schwerpunkt
certain event: sicheres Ereignis
class interval: Ausprägungsklasse
cluster analysis: Klumpenanalyse
cluster random sample: Klumpenzufallsstichprobe
coefficient of determination: Bestimmtheitsmaß
coefficient of variation: Variationskoeffizient
combination: Kombination
combinatorics: Kombinatorik
compact: geschlossen, kompakt
complementation of a set: Bilden der Komplementärmenge
concentration: Konzentration
conditional distribution: bedingte Verteilung
conditional probability: bedingte Wahrscheinlichkeit
confidence interval: Konfidenzintervall
conjunction: Konjunktion, Mengenschnitt
contingency table: Kontingenztafel
continuous data: stetige Daten
control variable: Störvariable
convenience sample: Gelegenheitsstichprobe
convexity: Konvexität
correlation matrix: Korrelationsmatrix
covariance matrix: Kovarianzmatrix
critical value: kritischer Wert
cross tabulation: Kreuztabelle
cumulative distribution function (cdf): theoretische Verteilungsfunktion

D
data: Daten
data matrix: Datenmatrix
decision: Entscheidung
deductive method: deduktive Methode
degree-of-belief: Glaubwürdigkeitsgrad, Plausibilität
degrees of freedom: Freiheitsgrade
dependent variable: abhängige Variable
descriptive statistics: Beschreibende Statistik
deviation: Abweichung
difference: Differenz
direction: Richtung
discrete data: diskrete Daten
disjoint events: disjunkte Ereignisse, einander ausschließend
disjunction: Disjunktion, Mengenvereinigung
dispersion: Streuung
distance: Abstand
distortion: Verzerrung
distribution: Verteilung
distributional properties: Verteilungseigenschaften

E
econometrics: Ökonometrie
effect size: Effektgröße
eigenvalue: Eigenwert
elementary event: Elementarereignis
empirical cumulative distribution function: empirische Verteilungsfunktion
estimator: Schätzer
Euclidian distance: Euklidischer Abstand
Euclidian space: Euklidischer (nichtgekrümmter) Raum
event: Ereignis
event space: Ereignisraum
evidence: Anzeichen, Hinweis, Anhaltspunkt, Indiz
expectation value: Erwartungswert
extreme value: extremer Wert

F
fact: Tatsache, Faktum
factorial: Fakultät
falsification: Falsifikation
five number summary: Fünfpunktzusammenfassung
frequency: Häufigkeit
frequentist probability: frequentistischer Wahrscheinlichkeitsbegriff

G
Gini coefficient: Ginikoeffizient
goodness-of-the-fit: Anpassungsgüte

H
Hessian matrix: Hesse'sche Matrix
histogram: Histogramm
homoscedasticity: Homoskedastizität, homogene Varianz
hypothesis: Hypothese, Behauptung, Vermutung

I
inclusion of a set: Mengeninklusion
independent variable: unabhängige Variable
inductive method: induktive Methode
inferential statistics: Schließende Statistik
interaction: Wechselwirkung
intercept: Achsenabschnitt
interquartile range: Quartilsabstand
interval scale: Intervallskala
impossible event: unmögliches Ereignis

J
joint distribution: gemeinsame Verteilung

K
kσ–rule: kσ–Regel
kurtosis: Wölbung

L
latent variable: latente Variable, nichtbeobachtbares Konstrukt
law of large numbers: Gesetz der großen Zahlen
law of total probability: Satz von der totalen Wahrscheinlichkeit
Likert scale: Likertskala, Verfahren zum Messen von eindimensionalen latenten Variablen
linear regression analysis: lineare Regressionsanalyse
location parameter: Lageparameter
Lorenz curve: Lorenzkurve

M
Mahalanobis distance: Mahalanobis'scher Abstand
manifest variable: manifeste Variable, Observable
marginal distribution: Randverteilung
marginal frequencies: Randhäufigkeiten
measurement: Messung, Datenaufnahme
method of least squares: Methode der kleinsten Quadrate
median: Median
metrical: metrisch
mode: Modalwert

N
nominal: nominal

O
observable: beobachtbare/messbare Variable, Observable
observation: Beobachtung
odds: Wettchancen
operationalisation: Operationalisieren, latente Variable messbar gestalten
opinion poll: Meinungsumfrage
ordinal: ordinal
outlier: Ausreißer

P
p–value: p–Wert
partition: Zerlegung, Aufteilung
percentile value: Perzentil, α–Quantil
pie chart: Kreisdiagramm
point estimator: Punktschätzer
population: Grundgesamtheit
power: Teststärke
power set: Potenzmenge
practical significance: praktische Signifikanz, Bedeutung
principal component analysis: Hauptkomponentenanalyse
probability: Wahrscheinlichkeit
probability density function (pdf): Wahrscheinlichkeitsdichte
probability function: Wahrscheinlichkeitsfunktion
probability measure: Wahrscheinlichkeitsmaß
probability space: Wahrscheinlichkeitsraum
projection: Projektion
proportion: Anteil
proximity matrix: Distanzmatrix

Q
quantile: Quantil
quartile: Quartil
questionnaire: Fragebogen

R
randomness: Zufälligkeit
random experiment: Zufallsexperiment
random sample: Zufallsstichprobe
random variable: Zufallsvariable
range: Spannweite
rank: Rang
rank number: Rangzahl
rank order: Rangordnung
ratio scale: Verhältnisskala
raw data set: Datenurliste
realisation: Realisierung, konkreter Messwert für eine Zufallsvariable
regression analysis: Regressionsanalyse
regression coefficient: Regressionskoeffizient
regression model: Regressionsmodell
regression toward the mean: Regression zur Mitte
rejection region: Ablehnungsbereich
replication: Nachahmung
research: Forschung
research question: Forschungsfrage
residual: Residuum, Restgröße
risk: Risiko (berechenbar)

S
σ–algebra: σ–Algebra
6σ–event: 6σ–Ereignis
sample: Stichprobe
sample correlation coefficient: Stichprobenkorrelationskoeffizient
sample covariance: Stichprobenkovarianz
sample mean: Stichprobenmittelwert
sample size: Stichprobenumfang
sample space: Ergebnismenge
sample variance: Stichprobenvarianz
sampling distribution: Stichprobenkenngrößenverteilung
sampling error: Stichprobenfehler
sampling frame: Auswahlgesamtheit
sampling unit: Stichprobeneinheit
scale-invariant: skaleninvariant
scale level: Skalenniveau
scale parameter: Skalenparameter
scatter plot: Streudiagramm
scientific method: Wissenschaftliche Methode
shift theorem: Verschiebungssatz
significance level: Signifikanzniveau
simple random sample: einfache Zufallsstichprobe
skewness: Schiefe
slope: Steigung
spectrum of values: Wertespektrum
spurious correlation: Scheinkorrelation
standard error: Standardfehler
standardisation: Standardisierung
statistical (in)dependence: statistische (Un)abhängigkeit
statistical unit: Erhebungseinheit
statistical significance: statistische Signifikanz
statistical variable: Merkmal, Variable
stochastic: stochastisch, wahrscheinlichkeitsbedingt
stochastic independence: stochastische Unabhängigkeit
stratified random sample: geschichtete Zufallsstichprobe
strength: Stärke
summary table: Zusammenfassungstabelle
survey: statistische Erhebung, Umfrage

T
test statistic: Teststatistik, statistische Effektmessgröße
type I error: Fehler 1. Art
type II error: Fehler 2. Art

U
unbiased: erwartungstreu, unverfälscht, unverzerrt
uncertainty: Unsicherheit (nicht berechenbar)
univariate: univariat, eine variable Größe betreffend
unit: Einheit
urn model: Urnenmodell

V
value: Wert
variance: Varianz
variation: Variation
Venn diagram: Venn–Diagramm
visual analogue scale: visuelle Analogskala

W
weighted mean: gewichteter Mittelwert

Z
Z scores: Z–Werte
zero point: Nullpunkt

Bibliography

[1] F J Anscombe and R J Aumann (1963) A definition of subjective probability The Annals of Mathematical Statistics 34 199–205
[2] T Bayes (1763) An essay towards solving a problem in the doctrine of chances Philosophical Transactions 53 370–418
[3] P L Bernstein (1998) Against the Gods — The Remarkable Story of Risk (New York: Wiley) ISBN–10: 0471295639
[4] J–P Bouchaud and M Potters (2003) Theory of Financial Risk and Derivative Pricing — From Statistical Physics to Risk Management 2nd Edition (Cambridge: Cambridge University Press) ISBN–13: 9780521741866
[5] J Bortz (2005) Statistik für Human– und Sozialwissenschaftler 6th Edition (Berlin: Springer) ISBN–13: 9783540212713
[6] J Bortz and N Döring (2006) Forschungsmethoden und Evaluation für Human– und Sozialwissenschaftler 4th Edition (Berlin: Springer) ISBN–13: 9783540333050
[7] K Bosch (1999) Grundzüge der Statistik 2nd Edition (München: Oldenbourg) ISBN–10: 3486252593
[8] A Bravais (1846) Analyse mathématique sur les probabilités des erreurs de situation d'un point Mémoires présentés par divers savants à l'Académie royale des sciences de l'Institut de France 9 255–332
[9] M C Bryson (1976) The Literary Digest poll: making of a statistical myth The American Statistician 30 184–185
[10] G Cardano (1564) Liber de Ludo Aleae (Book on Games of Chance)
[11] J Cohen (1992) A power primer Psychological Bulletin 112 155–159
[12] J Cohen (2009) Statistical Power Analysis for the Behavioral Sciences 2nd Edition (New York: Psychology Press) ISBN–13: 9780805802832
[13] H Cramér (1946) Mathematical Methods of Statistics (Princeton, NJ: Princeton University Press) ISBN–10: 0691080046
[14] L J Cronbach (1951) Coefficient alpha and the internal structure of tests Psychometrika 16 297–334
[15] P Dalgaard (2008) Introductory Statistics with R 2nd Edition (New York: Springer) ISBN–13: 9780387790534
[16] C Duller (2007) Einführung in die Statistik mit EXCEL und SPSS 2nd Edition (Heidelberg: Physica) ISBN–13: 9783790819113
[17] The Economist (2013) Trouble at the lab URL (cited on August 25, 2015): www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
[18] H van Elst (2015) An introduction to business mathematics Preprint arXiv:1509.04333v2 [q-fin.GN]
[19] H van Elst (2018) An introduction to inductive statistical inference: from parameter estimation to decision-making Preprint arXiv:1808.10137v1 [stat.AP]
[20] W Feller (1951) The asymptotic distribution of the range of sums of independent random variables The Annals of Mathematical Statistics 22 427–432
[21] W Feller (1968) An Introduction to Probability Theory and Its Applications — Volume 1 3rd Edition (New York: Wiley) ISBN–13: 9780471257080
[22] R A Fisher (1918) The correlation between relatives on the supposition of Mendelian inheritance Transactions of the Royal Society of Edinburgh 52 399–433
[23] R A Fisher (1924) On a distribution yielding the error functions of several well known statistics Proc. Int. Cong. Math. Toronto 2 805–813
[24] R A Fisher (1935) The logic of inductive inference Journal of the Royal Statistical Society 98 39–82
[25] J Fox and S Weisberg (2011) An R Companion to Applied Regression 2nd Edition (Thousand Oaks, CA: Sage) URL (cited on June 8, 2019): socserv.socsci.mcmaster.ca/jfox/Books/Companion
[26] M Freyd (1923) The graphic rating scale Journal of Educational Psychology 14 83–102
[27] F Galton (1869) Hereditary Genius: An Inquiry into its Laws and Consequences (London: Macmillan)
[28] F Galton (1886) Regression towards mediocrity in hereditary stature The Journal of the Anthropological Institute of Great Britain and Ireland 15 246–263
[29] C F Gauß (1809) Theoria motus corporum coelestium in sectionibus conicis solem ambientium
[30] A Gelman, J B Carlin, H S Stern, D B Dunson, A Vehtari and D B Rubin (2014) Bayesian Data Analysis 3rd Edition (Boca Raton, FL: Chapman & Hall) ISBN–13: 9781439840955
[31] I Gilboa (2009) Theory of Decision under Uncertainty (Cambridge: Cambridge University Press) ISBN–13: 9780521571324
[32] J Gill (1999) The insignificance of null hypothesis significance testing Political Research Quarterly 52 647–674
[33] C Gini (1921) Measurement of inequality of incomes The Economic Journal 31 124–126
[34] J Gleick (1987) Chaos — Making a New Science nth Edition 1998 (London: Vintage) ISBN–13: 9780749386061
[35] E Greenberg (2013) Introduction to Bayesian Econometrics 2nd Edition (Cambridge: Cambridge University Press) ISBN–13: 9781107015319
[36] J F Hair jr, W C Black, B J Babin and R E Anderson (2010) Multivariate Data Analysis 7th Edition (Upper Saddle River, NJ: Pearson) ISBN–13: 9780135153093
[37] R Hatzinger and H Nagel (2013) Statistik mit SPSS — Fallbeispiele und Methoden 2nd Edition (München: Pearson Studium) ISBN–13: 9783868941821
[38] R Hatzinger, K Hornik, H Nagel and M J Maier (2014) R — Einführung durch angewandte Statistik 2nd Edition (München: Pearson Studium) ISBN–13: 9783868942507
[39] J Hartung, B Elpelt and K–H Klösener (2005) Statistik: Lehr– und Handbuch der angewandten Statistik 14th Edition (München: Oldenburg) ISBN–10: 3486578901
[40] M H S Hayes and D G Paterson (1921) Experimental development of the graphic rating method Psychological Bulletin 18 98–99
[41] J M Heinzle, C Uggla and N Röhr (2009) The cosmological billiard attractor Advances in Theoretical and Mathematical Physics 13 293–407 and Preprint arXiv:gr-qc/0702141v1
[42] S Holm (1979) A simple sequentially rejective multiple test procedure Scandinavian Journal of Statistics 6 65–70
[43] E T Jaynes (2003) Probability Theory — The Logic of Science (Cambridge: Cambridge University Press) ISBN–13: 9780521592710
[44] H Jeffreys (1939) Theory of Probability (Oxford: Oxford University Press) 3rd Edition (1961) ISBN–10 (2003 Reprint): 0198503687
[45] D N Joanes and C A Gill (1998) Comparing measures of sample skewness and kurtosis Journal of the Royal Statistical Society: Series D (The Statistician) 47 183–189
[46] D Kahneman (2011) Thinking, Fast and Slow (London: Penguin) ISBN–13: 9780141033570
[47] D Kahneman and A Tversky (1979) Prospect Theory: an analysis of decision under risk Econometrica 47 263–292
[48] M Keuls (1952) The use of the "studentized range" in connection with an analysis of variance Euphytica 1 112–122
[49] I M Khalatnikov, E M Lifshitz, K M Khanin, L N Shchur and Ya G Sinai (1985) On the stochasticity in relativistic cosmology Journal of Statistical Physics 38 97–114
[50] A Kolmogoroff (1933) Grundbegriffe der Wahrscheinlichkeitsrechnung (Berlin: Springer) 2nd reprint: (1973) (Berlin: Springer) ISBN–13: 9783540061106
[51] A N Kolmogorov (1933) Sulla determinazione empirica di una legge di distribuzione Inst. Ital. Atti. Giorn. 4 83–91
[52] C Kredler (2003) Einführung in die Wahrscheinlichkeitsrechnung und Statistik Online lecture notes (München: Technische Universität München) URL (cited on August 20, 2015): www.ma.tum.de/foswiki/pub/Studium/ChristianKredler/Stoch1.pdf
[53] J K Kruschke and T M Liddell (2017) The Bayesian New Statistics: hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective Psychonomic Bulletin & Review 24 1–29 (Brief Report)
[54] W H Kruskal and W A Wallis (1952) Use of ranks in one-criterion variance analysis Journal of the American Statistical Association 47 583–621
[55] D Lakens (2017) Understanding common misconceptions about p-values (blog entry: December 5, 2017) URL (cited on June 19, 2019): http://daniellakens.blogspot.com/2017/
[56] P S Laplace (1774) Mémoire sur la probabilité des causes par les évènements Mémoires de l'Académie Royale des Sciences Presentés par Divers Savans 6 621–656
[57] P S Laplace (1809) Mémoire sur les approximations des formules qui sont fonctions de très grands nombres et sur leur application aux probabilités Mémoires de l'Académie des sciences de Paris
[58] P S Laplace (1812) Théorie Analytique des Probabilités (Paris: Courcier)
[59] E L Lehman and G Casella (1998) Theory of Point Estimation 2nd Edition (New York: Springer) ISBN–13: 9780387985022
[60] H Levene (1960) Robust tests for equality of variances Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling eds I Olkin et al (Stanford, CA: Stanford University Press) 278–292
[61] J A Levin, J A Fox and D R Forde (2010) Elementary Statistics in Social Research 11th Edition (München: Pearson Education) ISBN–13: 9780205636921
[62] R Likert (1932) A technique for the measurement of attitudes Archives of Psychology 140 1–55
[63] J W Lindeberg (1922) Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung Mathematische Zeitschrift 15 211–225
[64] M O Lorenz (1905) Methods of measuring the concentration of wealth Publications of the American Statistical Association 9 209–219
[65] R Lupton (1993) Statistics in Theory and Practice (Princeton, NJ: Princeton University Press) ISBN–13: 9780691074290
[66] A M Lyapunov (1901) Nouvelle forme du théorème sur la limite de la probabilité Mémoires de l'Académie Impériale des Sciences de St.-Pétersbourg VIIIe Série, Classe Physico–Mathématique 12 1–24 [in Russian]
[67] P C Mahalanobis (1936) On the generalized distance in statistics Proceedings of the National Institute of Sciences of India (Calcutta) 2 49–55
[68] H B Mann and D R Whitney (1947) On a test of whether one of two random variables is stochastically larger than the other The Annals of Mathematical Statistics 18 50–60
[69] R McElreath (2016) Statistical Rethinking — A Bayesian Course with Examples in R and Stan (Boca Raton, FL: Chapman & Hall) ISBN–13: 9781482253443
[70] D Meyer, A Zeileis and K Hornik (2017) vcd: Visualizing categorical data (R package version 1.4-4) URL (cited on June 7, 2019): https://CRAN.R-project.org/package=vcd
[71] D Meyer, E Dimitriadou, K Hornik, A Weingessel and F Leisch (2019) Misc functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien (R package version 1.7-1) URL (cited on May 16, 2019): https://CRAN.R-project.org/package=e1071
[72] S P Millard (2013) EnvStats: An R Package for Environmental Statistics (New York: Springer) ISBN–13: 9781461484554
[73] L Mlodinow (2008) The Drunkard's Walk — How Randomness Rules Our Lives (New York: Vintage Books) ISBN–13: 9780307275172
[74] D Newman (1939) The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation Biometrika 31 20–30
[75] J Neyman and E S Pearson (1933) On the problem of the most efficient tests of statistical hypotheses Philosophical Transactions of the Royal Society of London, Series A 231 289–337
[76] R Nuzzo (2014) Scientific method: statistical errors — P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume Nature 506 150–152
[77] V Pareto (1896) Cours d'Économie Politique (Geneva: Droz)
[78] K Pearson (1900) On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling Philosophical Magazine Series 5 50 157–175
[79] K Pearson (1901) LIII. On lines and planes of closest fit to systems of points in space Philosophical Magazine Series 6 2 559–572
[80] K Pearson (1903) The law of ancestral heredity Biometrika 2 211–228
[81] K Pearson (1920) Notes on the history of correlation Biometrika 13 25–45
[82] R Penrose (2004) The Road to Reality — A Complete Guide to the Laws of the Universe 1st Edition (London: Jonathan Cape) ISBN–10: 0224044478
[83] K R Popper (2002) Conjectures and Refutations: The Growth of Scientific Knowledge 2nd Edition (London: Routledge) ISBN–13: 9780415285940
[84] A Quetelet (1835) Sur l'Homme et le Développement de ses Facultés, ou Essai d'une Physique Sociale (Paris: Bachelier)
[85] R Core Team (2019) R: A language and environment for statistical computing (Wien: R Foundation for Statistical Computing) URL (cited on June 24, 2019): https://www.R-project.org/
[86] W Revelle (2019) psych: Procedures for psychological, psychometric, and personality research (R package version 1.8.12) URL (cited on June 2, 2019): https://CRAN.R-project.org/package=psych
[87] H Rinne (2008) Taschenbuch der Statistik 4th Edition (Frankfurt/Main: Harri Deutsch) ISBN–13: 9783817118274
[88] P Saha (2002) Principles of Data Analysis Online lecture notes URL (cited on August 15, 2013): www.physik.uzh.ch/~psaha/pda/
[89] L J Savage (1954) The Foundations of Statistics (New York: Wiley) Reprint: (1972) 2nd revised Edition (New York: Dover) ISBN–13: 9780486623498
[90] H Scheffé (1959) The Analysis of Variance (New York: Wiley) Reprint: (1999) (New York: Wiley) ISBN–13: 9780471345053
[91] R Schnell, P B Hill and E Esser (2013) Methoden der empirischen Sozialforschung 10th Edition (München: Oldenbourg) ISBN–13: 9783486728996
[92] D S Sivia and J Skilling (2006) Data Analysis — A Bayesian Tutorial 2nd Edition (Oxford: Oxford University Press) ISBN–13: 9780198568322
[93] N Smirnov (1939) On the estimation of the discrepancy between empirical curves of distribution for two independent samples Bull. Math. Univ. Moscou 2 fasc. 2
[94] L Smith (2007) Chaos — A Very Short Introduction (Oxford: Oxford University Press) ISBN–13: 9780192853783
[95] G W Snedecor (1934) Calculation and Interpretation of Analysis of Variance and Covariance (Ames, IA: Collegiate Press)
[96] C Spearman (1904) The proof and measurement of association between two things The American Journal of Psychology 15 72–101
[97] Statistical Society of London (1838) Fourth Annual Report of the Council of the Statistical Society of London Journal of the Statistical Society of London 1 5–13
[98] S S Stevens (1946) On the theory of scales of measurement Science 103 677–680
[99] S M Stigler (1986) The History of Statistics — The Measurement of Uncertainty before 1900 (Cambridge, MA: Harvard University Press) ISBN–10: 067440341X
[100] Student [W S Gosset] (1908) The probable error of a mean Biometrika 6 1–25
[101] sueddeutsche.de (2012) Reiche trotz Finanzkrise immer reicher URL (cited on September 19, 2012): www.sueddeutsche.de/wirtschaft/neuer-armuts-und-reichtumsbericht-der-bundesregierung-reiche-trotz-finanzkrise-immer-reicher-1.1470673
[102] G M Sullivan and R Feinn (2012) Using effect size — or why the p value is not enough Journal of Graduate Medical Education 4 279–282
[103] E Svetlova and H van Elst (2012) How is non-knowledge represented in economic theory? Preprint arXiv:1209.2204v1 [q-fin.GN]
[104] E Svetlova and H van Elst (2014) Decision-theoretic approaches to non-knowledge in economics Preprint arXiv:1407.0787v1 [q-fin.GN]
[105] N N Taleb (2007) The Black Swan — The Impact of the Highly Improbable (London: Penguin) ISBN–13: 9780141034591
[106] M Torchiano (2018) effsize: Efficient effect size computation (R package version 0.7.4) URL (cited on June 8, 2019): https://CRAN.R-project.org/package=effsize
[107] H Toutenburg (2004) Deskriptive Statistik 4th Edition (Berlin: Springer) ISBN–10: 3540222332
[108] H Toutenburg (2005) Induktive Statistik 3rd Edition (Berlin: Springer) ISBN–10: 3540242937
[109] W M K Trochim (2006) Web Center for Social Research Methods URL (cited on June 22, 2012): www.socialresearchmethods.net
[110] J W Tukey (1977) Exploratory Data Analysis (Reading, MA: Addison–Wesley) ISBN–10: 0201076160
[111] A Tversky and D Kahneman (1983) Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment Psychological Review 90 293–315
[112] S Vasishth (2017) The replication crisis in science (blog entry: December 29, 2017) URL (cited on July 2, 2018): https://thewire.in/science/replication-crisis-science
[113] J Venn (1880) On the employment of geometrical diagrams for the sensible representations of logical propositions Proceedings of the Cambridge Philosophical Society 4 47–59
[114] G R Warnes, B Bolker, T Lumley and R C Johnson (2018) gmodels: Various R programming tools for model fitting (R package version 2.18.1) URL (cited on June 27, 2019): https://CRAN.R-project.org/package=gmodels
[115] S L Weinberg and S K Abramowitz (2008) Statistics Using SPSS 2nd Edition (Cambridge: Cambridge University Press) ISBN–13: 9780521676373
[116] M C Wewel (2014) Statistik im Bachelor–Studium der BWL und VWL 3rd Edition (München: Pearson Studium) ISBN–13: 9783868942200
[117] H Wickham (2016) ggplot2: Elegant Graphics for Data Analysis (New York: Springer) ISBN–13: 9783319242774 URL (cited on June 14, 2019): ggplot2.tidyverse.org
[118] K Wiesenfeld (2001) Resource Letter: ScL-1: Scaling laws American Journal of Physics 69 938–942
[119] F Wilcoxon (1945) Individual comparisons by ranking methods Biometrics Bulletin 1 80–83
[120] WolframMathWorld (2015) Random number URL (cited on January 28, 2015): mathworld.wolfram.com/RandomNumber.html
[121] T Wolodzko (2018) extraDistr: Additional univariate and multivariate distributions (R package version 1.8.10) URL (cited on June 30, 2019): https://CRAN.R-project.org/package=extraDistr
[122] G U Yule (1897) On the theory of correlation Journal of the Royal Statistical Society 60 812–854
[123] A Zellner (1996) An Introduction to Bayesian Inference in Econometrics (Reprint) (New York: Wiley) ISBN–13: 9780471169376
