The Compass for Statistical Researchers
Authors: Daniele Durante, Davide Vidotto, Sabrina Vettori
The Compass for Statistical Researchers • January 2015
Daniele Durante∗, Davide Vidotto† and Sabrina Vettori‡
∗Department of Statistical Sciences, University of Padova
†Department of Methodology and Statistics, Tilburg University
‡CEMSE Division, King Abdullah University of Science and Technology
durante@stat.unipd.it∗, d.vidotto@tilburguniversity.edu†, sabrina.vettori@kaust.edu.sa‡
Abstract
We have hiked many miles alongside several professors as we traversed our statistical path: a regime-switching trail which changed direction following a class on the foundations of our discipline. As we play the game of research in that limbo between student and academic, one of Prof. Bernardi's teachings has never been clearer: to draw a route on the research map, you need not only to know your destination but also to understand where you are and how you got there.
Where are we?
Students who have been set on studying a specific discipline from the very beginning are like outliers: a few dots far away, determined and with clear ideas, likely generated by a latent process of decisions and education very different from the random mechanism that matches the ambitions of most students with their university route. We are no exception: we belong to that majority. Our decision to study Statistics resulted from the minimization of several loss functions, grossly calibrated and solved with little data. Since then it has been a shared journey, made of several hikes in which we have filled the bag of skills and knowledge we carry with us towards the future. A regime-switching path that converged towards the PhD: the common destination of one journey and the starting point of a new one.
Among all the structural breaks in our statistical journey, the one that occurred during a class on the foundations of Statistics was certainly one of the most decisive: an educational revolution leading us toward an inevitable paradigm shift or, better said, toward an ontological and epistemological enlargement of the primitive methodological paradigm we had before that class. We were already capable of playing with several, sometimes even sophisticated, statistical techniques. But had we ever asked ourselves whether these phenomena existed (ontology)? And if they existed, were we sure it was possible to learn about them (epistemology)? If so, how (methodology)? The answer to the first two questions was "no", and we had no idea how to answer the third. Before studying the foundations of our discipline we were simply playing carefree in the Sciences' backyards, without wondering whether those gardens really existed or whether statisticians were allowed to enter. We were like that conceited person who studies nature through statistical techniques without first making sure, as Galileo Galilei claimed, that it is truly written in a mathematical language whose characters are triangles, circles and other geometric figures. Where were we? Was Statistics "dull, disreputable, prosaic and misleading [...], third rate discipline [...] a jackal picking over the bones and carcasses of the game that the big cats, the biologists, the physicists and the chemists, have brought down"? Or rather "a wonderful discipline, including mathematics and philosophy, analysis and empiricism [...]. Which demands clear thinking, good judgement and flair [...], telling us how to turn data into decisions [...]. A Science dealing with the very essence of the universe: chance and contingency"? Of these two definitions given by Stephen Senn in Dicing with Death (2003), we hoped the second would prove true.
That foundations class confirmed our expectations, while adding the further awareness that "data are not statistics as well as the marble is not the sculpture". This sentence by Bernardo Colombo was commented on by Lorenzo Bernardi in his contribution Statistics and Mass Media (2001), highlighting how "both need to be modeled to gain intrinsic value and both somehow have the breath, originality and inspiration of their creator".
This is the story of our journey across the foundations of Statistics: from epistemological awareness, through methodological reasoning, to an understanding of the techniques. Later, we will go beyond the path, placing Statistics on the map of the Sciences while looking at its final goal: communication.
Back to the methodological future
Before we learned the foundations of Statistics, testing a statistical hypothesis was a simple procedure for us: subtract a hypothesized value from the sample mean, divide the difference by a measure of variability, and check whether the result falls within a certain rejection region. We certainly had no idea that what we were doing was part of a much wider methodological framework. Our immediate declaration of war was in fact only the final moment of a richer research path which required, among other things, a careful comparison with existing contributions, the formulation of specific hypotheses, the identification of the population under study, and the choice of adequate analysis instruments. But why did we need this researcher's shopping list? Our calculations would have remained the same anyway, wouldn't they? Continuing our journey back to the future, our answer to that question was "yes": the results would have been the same, but without the guarantee of the method they would have carried no weight in scientific progress.
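As a minimal sketch of the "simple procedure" described above, the Python snippet below runs a two-sided one-sample test of a mean. It assumes a large-sample normal approximation (a z-test with the sample standard deviation plugged in); the function name `one_sample_z_test` and the sample values are hypothetical, chosen only for illustration. Note that the code performs the three arithmetic steps mechanically; everything the method is supposed to guarantee (the hypothesis, the population, the design) sits outside it.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0, alpha=0.05):
    """Two-sided test of H0: population mean == mu0 (normal approximation)."""
    n = len(sample)
    # Step 1: subtract the hypothesized value from the sample mean.
    diff = mean(sample) - mu0
    # Step 2: divide by a measure of variability, here the estimated
    # standard error s / sqrt(n).
    z = diff / (stdev(sample) / math.sqrt(n))
    # Step 3: reject H0 if the statistic falls in the rejection region |z| > z_crit.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, z_crit, abs(z) > z_crit


if __name__ == "__main__":
    # Hypothetical measurements, used only to exercise the function.
    sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0]
    z, z_crit, reject = one_sample_z_test(sample, mu0=10.0)
    print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

For this hypothetical sample the statistic equals √3 ≈ 1.73, below the 5% critical value of about 1.96, so H0 is not rejected. With only eight observations a Student t critical value (about 2.36 with 7 degrees of freedom) would be the more appropriate reference; here the conclusion would not change.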
We started looking at the method as the insurance of knowledge: an "ars bene disponendi seriem plurimarum cogitationum" [the art of properly ordering a series of many thoughts], framed within specific historical coordinates and with clear epistemological roots marking that invisible and sometimes very thin border between science and non-science. Where were we then? On that ancient map representing the history of Science, our coordinates were inductive inference, experimental method, intersubjectivity of Science and cross-fertilization: a paradigmatic north-south-east-west where Statistics, understood as the Science aimed at balancing empirical evidence with the hypotheses and conjectures of reason, was predestined to become queen. We realized that our little calculations were children of a new scientific method that arose during the 16th and 17th centuries from a renewed confidence in human abilities. It was a Renaissance that gradually broke down the barrier between natural and artificial in favor of a re-evaluation of technical and practical tools, and in which artists such as Brunelleschi, Mantegna and Leonardo da Vinci were also experts in urban planning, architecture and ballistics. A new worldview abandoned the Aristotelian distinction between knowledge directed to practice and knowledge devoted to the contemplation of truth, favoring a much deeper experimental dialogue with nature. Our little problems were thus the result of a new relationship with phenomena, no longer characterized by passive observation based on laws "in libris" [in books], but by experiment. The statistical techniques we used to process our data became like the telescope Galileo pointed skyward in 1609: reliable research tools that extend the senses to favor a deeper understanding of nature.
We finally discovered that our hypothesis testing procedure was actually part of a more ambitious attempt at "secare naturam" [slicing nature]. We felt like little Galileos involved in a research process that recalls his study of falling bodies in 1608, where comparing each measured distance with what it "should have been" (on the basis of previous theories) embodied the close relationship between theory and meaningful experience. We were freeing ourselves from the concept of "auctoritas" [authority] to venture into a growing education in critical thinking and the reasonable interrogation of nature, within the guidelines of inductive inference. But, again, why was the method so important?
The answer is that it could not be otherwise, given a notion of Science that favored cross-fertilization and introduced intersubjectivity as a form of objectivity. In the words of Leonardo da Vinci, "those who love practice without theory are like the sailor who boards ship without a rudder and compass and never knows where he may cast." The method was the rudder and the compass, guaranteeing internal consistency, transparency and reproducibility: essential elements for comparing and monitoring developments of knowledge in a slow process of progressive accumulation made of corrections and substitutions. An iceberg-shaped Science, whose great discoveries and revolutions arise from a much broader layer of minor theories and findings, sometimes falsified and invisible to human eyes.
Hot data & doctrines of uncertainty
Data and probability are the statistician's daily bread.
As W. Edwards Deming put it, and as was clear to us from the beginning, "without data you are just another person with an opinion." However, we did not expect these two basic ingredients in the statistician's kitchen to be so elaborate and delicate. For us, data were initially cold numbers that someone gave us and that we in turn fed to our statistical-probabilistic instruments to answer certain questions. During that foundations class, we started instead, for the first time, to see them as hot information, and their collection process as a further expression of the artistic value of our discipline. Consider the Social Sciences, where the statistical survey became a slow, painstaking and artistic process in which the statistician played cat-and-mouse with "chunks" of information so as to obtain the maximum quality from the raw material harvested. It was necessary to present the purpose of the survey and then to ask general questions before moving slowly towards increasingly specific aspects, gradually introducing the interviewee to the "threatening" questions. A first date with data, where the statistician had to go through a carefully ordered courtship, hoping that the social phenomenon of interest would show itself in all sincerity. What about probability? We were able to play with several probabilistic tools. But from which remote planet did probability come? Or had we learned it from ancient hieroglyphics? Was the concept universal? The first signboard along the fascinating trail that traces the historical and philosophical foundations of probability indicated that it came from the planet New Scientific Method, which we had already explored.
As Costantini writes in Historical and Philosophical Foundations of Statistics and Probability (2004), "the probability was the notion with which, during the breakthrough that led to the modern era, we tried to discover the laws of phenomena characterized by uncertain behavior." Our Martians were therefore Laplace, von Mises, Jeffreys and De Finetti, and the hieroglyphs were written in books such as Laplace's Théorie analytique des probabilités (1812), von Mises' Wahrscheinlichkeit, Statistik und Wahrheit (1928), De Finetti's Sul significato soggettivo della probabilità (1931) and Jeffreys' Theory of Probability (1939). For us, probability was no longer just a number between 0 and 1 expressing the degree of certainty of an event; it took the shape of a history of beliefs about uncertainty: a branching process starting from the premise that nature is not deterministic, and later branching out into different doctrines for defining its uncertainty. Our initial view of probability as the ratio of favorable cases to the total number of equally possible cases was only one possible definition: the classical one, formalized by Laplace. What about the others? We had unconsciously been using one of them, von Mises' frequentist definition, since our early classes in inference, when we considered probability as the limit of the relative frequency with which an event occurs in a very long (infinite) sequence of trials. Having a frequentist background, we saw the other two, De Finetti's subjectivist and Jeffreys' logicist definitions, as perhaps more surprising but no less interesting. The first placed the subject in the foreground, defining probability as the degree of confidence that an individual assigns to the occurrence of an event according to his or her prior knowledge.
The second instead defined probability as a logical relationship between something known and something unknown, highlighting the objectivity (independence from the subject) of such relationships. The question at this point was: why should we continue our journey carrying this historical baggage on probability? Simply because each paradigm of statistical inference arises from one of the aforementioned definitions of probability. We realized how the Fisherian approach inherited the frequentist concept of probability, exploiting information from the observed data (seen as realizations from a true, unknown generative process) to estimate the generative mechanism appropriately and to evaluate how sensitive this reconstruction was to the fact that the observed data were only one of many possible samples. Frequentist decision theory adopted the same definition of probability but shifted the main focus of statistical inference to providing rules for action under uncertainty. The subjective Bayesian approach was instead epistemologically and methodologically different, explicitly studying how subjects' initial confidence (prior knowledge) about the uncertain generative mechanism changed in the light of data. It was therefore clear why the subjectivist definition of probability was better suited to this paradigm. The methodology remained largely similar in the objective Bayesian approach, with an epistemological shift from prior knowledge to prior ignorance, progressively abandoning the subjective idea in favor of the objective one stressed by the logicist definition.
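To make the link between definitions of probability and inferential paradigms a little more concrete, here is a minimal Python sketch under stated assumptions. The die, the data (7 successes in 10 trials) and all function names are hypothetical; the Beta(3, 7) prior stands in for an arbitrary subjective belief centered on 0.3, and Beta(1/2, 1/2), the Jeffreys prior for a binomial proportion, stands in for an "objective" choice. The conjugate Beta-binomial update is only one convenient textbook case, not a method taken from the text, and a finite simulation can only approximate von Mises' limiting frequency.

```python
import random
from fractions import Fraction

# --- Definitions of probability (hypothetical die example) ---

def classical_probability(favorable, possible):
    """Laplace: number of favorable cases over number of equally possible cases."""
    return Fraction(sum(1 for case in possible if case in favorable), len(possible))


def relative_frequency(event, draw, n_trials, seed=0):
    """Frequentist reading: relative frequency of `event` over a long run of trials.
    Von Mises defines probability as the limit of this ratio as n_trials grows;
    a simulation can only approximate that limit."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_trials) if event(draw(rng))) / n_trials


# --- Two inferential paradigms for an unknown proportion p ---

def mle_proportion(successes, trials):
    """Fisherian point estimate: the p that maximizes the binomial likelihood."""
    return Fraction(successes, trials)


def beta_posterior(a, b, successes, trials):
    """Bayesian update with a conjugate Beta(a, b) prior:
    the posterior is Beta(a + successes, b + trials - successes)."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    die = list(range(1, 7))
    print("classical P(six):", classical_probability({6}, die))
    print("relative frequency of six:",
          relative_frequency(lambda x: x == 6, lambda rng: rng.randint(1, 6), 100_000))

    k, n = 7, 10  # hypothetical data: 7 successes in 10 trials
    print("Fisherian MLE:", mle_proportion(k, n))
    # Hypothetical subjective prior centered on 0.3.
    print("subjective posterior mean:", beta_mean(*beta_posterior(3, 7, k, n)))
    # Jeffreys prior Beta(1/2, 1/2), a common 'objective' choice for a proportion.
    half = Fraction(1, 2)
    print("Jeffreys posterior mean:", beta_mean(*beta_posterior(half, half, k, n)))
```

With these hypothetical numbers the Fisherian estimate is 7/10, the subjective posterior is Beta(10, 10) with mean 1/2, and the Jeffreys posterior is Beta(15/2, 7/2) with mean 15/22 ≈ 0.68. The same data, read through different definitions of probability, yield different estimates, which is why the paradigm is worth choosing knowingly.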
Without aiming to provide a detailed comparative overview of these paradigms (see Barnett's Comparative Statistical Inference (1999) for an extensive discussion), the main lesson we learned was that choosing a paradigm of statistical inference implies a specific definition of probability, which we should be aware of to avoid confusion in choosing our techniques and drawing our conclusions.
An ancillary science
Now that we had found our coordinates on the map of Science and read the instructions for some of the toys we enjoyed playing with every day, we could fully understand a definition of Statistics often repeated by Lorenzo Bernardi: an ancillary science. A discipline that is fundamental to all other Sciences. The main actor in that slow process of donning new certainties and doffing falsified theses that we had come to know by the name of scientific progress. Finally, not only did we appreciate Tukey's claim that "the best thing about being a statistician is that you get to play in everyone's backyard," but we could also give a name to each of these backyards, understanding at the same time when we could enter and which games we could play. We even discovered that these gardens had much more to offer than we initially thought. There was the garden of Medicine, planted, for example, with growth curves, odds and survival models. In the Sports and Betting backyard we found sophisticated methods for modeling sports results; see also the contributions in the third issue of volume 27 of Chance. The garden of Finance and Economics was covered with refined micro- and macro-econometric models and roller coasters of time series. At the end of the race, the watchword was forecasting. Certainly Sharpe, Miller and Markowitz in 1990, and Engle and Granger in 2003, must have enjoyed winning the Nobel Prize in economics more than anyone else.
Prediction, especially of extreme events, was also key in the backyard of the Natural Sciences, providing tools for agricultural planning and for anticipating natural disasters. In the garden of Industry we met, among others, experimental design and control charts for evaluating whether products conform to their required characteristics, to avoid unpleasant surprises for customers. Customers, in turn, became kings in the garden of Marketing, flooded with tons of data from the web that only data mining techniques could transform into useful information. In the park of the Social Sciences, customers were replaced by the actors of social phenomena, and the park was studded with social indicators, networks and many other sociometric instruments.
This was only one wing of a huge Neverland for statisticians, embracing fields of knowledge such as physics, astronomy, chemistry, biology, psychology, archeology and many others. A Disney World that is constantly renewed to keep up with the times, new technologies and new issues, where Statistics becomes Senn's (2003) "wonderful discipline that includes mathematics and philosophy, analysis and empiricism," and the new attractions are Neuroscience, Bioinformatics, Computer Science, Web Marketing, Sentiment Analysis and Applied Criminology. We wanted to celebrate this amusement park in the Statistical Calendar (cal.stat.unipd.it/index.html) and in the video My Statistician Friend (www.youtube.com/watch?v=yU2qQywUnnU): two dissemination projects, perhaps naive but passionate, which we would never have considered had we not attended that foundations class.
The sexy job?
"I keep saying the sexy job in the next ten years will be statisticians. People think I'm joking." At this point in our journey we have no doubt that Hal Varian, the chief economist of Google, was not joking.
How can a discipline that combines art, science and technology not be sexy? However, the more we looked at the collective image of Statistics, the more it seemed to be commonly perceived as Stephen Senn's (2003) "nasty old lady, you don't know her, but she loathes you already," or, as Senn also puts it, "like Australia, everyone knows where it is but no one wants to go there. Except that people do want to go to Australia."
Every time we found ourselves discussing Statistics with students from other fields, we noticed how after a few minutes their attention started drifting far away, in a slowly decaying process converging everywhere except to a neighborhood centered on us with radius "from here we cannot hear what you are talking about." The rare brave hearts who remained started asking a series of questions randomly sampled from the usual set of conceptualizations and stereotypes, such as Trilussa's famous chicken (the poet's jibe that if one person eats two chickens and another eats none, statistics says each has eaten one), or newer versions like the one claiming that if you have your head in the oven and your feet in the freezer, your average body temperature is within the limits of healthy living.
In that complex classification tree grouping students by field of study, we found ourselves on an isolated branch representing the little-known disciplines. This struck us as extremely dissonant. If the goal of Statistics is to transform data into information, why was this information not being acknowledged? We wanted to understand why, after we had played in their backyards, the Sciences never came up to see our etchings. Our answer was that it was not enough to have etchings in our room: we also needed to know how to present them correctly and how to properly invite the Sciences.
In that slow process of reconciliation with the other Sciences, we understood the need to avoid messy floods of data and instead provide transparent information, making our conceptual choices recognizable and reporting the general framework from which our analyses and interpretations came. In parallel, it was important to create an airlock that balanced the need to inform, sometimes with sophisticated statistical methods, against the aloof and suspicious attitude of some interlocutors who had to transform information into decisions. This armistice would be possible only if statisticians abandoned their arrogance and what George Box called the "bad habit of falling in love with their models," while those interlocutors changed their attitude, which ranged between quantitative mythology and censorship.
How could we avoid arrogance? Simply by accepting that many fields of statistical research are relative. In Bernardi's (2001) words, "the spirit of statistical research can be explained by this story: to the friend who asks how's the wife, the economic statistician will reply: compared to when? while the social statistician will ask: with respect to whom?" Part of our arrogance came from the urge to always provide absolute answers, resulting from naive conclusions rather than from a proper exploitation of the analyses.
Our checklist ended with an apologia for simplicity: we had to throw away the high-tech weapons and first learn to hit little birds with a slingshot instead of a cannon. Developing sophisticated statistical models was certainly much more stimulating than slowly verifying hypotheses and results somehow already present in collective ideas.
However, as Bernardi (2001) claims, "while the first is based on a scientific approach which sometimes proves precarious and naive, the second is always a solid starting point for new curiosities and fruitful intuitions." It is an exercise in "restating the obvious so as such remains," training ourselves to communicate even the finest methods effectively, balancing sophistication and simplicity: two aspects that are not necessarily rivals, but are difficult to reconcile for fear of trivializing the contribution of Statistics. Our masterpieces, therefore, should not merely have been the work of Pop artists; they also had to reflect the careful work of copyist monks engaged in that fundamental task of "translation and simplification of teaching, easily accessible for the reader." Only by accepting this responsibility will we make peace with the other Sciences, allowing them to appreciate our etchings. Conversely, if we do not take proper care in communication, we will remain that conceited guy admiring himself in the mirror without realizing that he is alone in a corner while all the Sciences are enjoying the dance floor of scientific progress.
Acknowledgments
We are grateful to Giulio Peruzzi for his precious teachings on the philosophy and history of Science. We also thank Bruno Scarpa, Antonio Canale, Maria Terres, Jacopo Soriano and Davide Salanitri for fruitful discussions and edits on an early version of the manuscript.
References
[Barnett1999] Barnett, V. (1999). Comparative Statistical Inference. John Wiley & Sons, Inc.
[Bernardi2001] Bernardi, L. (2001). Statistica e mezzi di comunicazione di massa. In Tuzzi, A., editor, Dall'intervista alla notizia, pages 241–251. Edizioni Sapere.
[Costantini2004] Costantini, D. (2004). I fondamenti storico-filosofici delle discipline statistico-probabilistiche. Bollati Boringhieri.
[De Finetti1931] De Finetti, B. (1931). Sul significato soggettivo della probabilità. Fundamenta Mathematicae, 17:298–329.
[Jeffreys1939] Jeffreys, H. (1939). Theory of Probability. The Clarendon Press.
[Laplace1812] Laplace, P. S. (1812). Théorie analytique des probabilités. Ve Courcier.
[Senn2003] Senn, S. (2003). Dicing with Death. Cambridge University Press.
[Von Mises1928] Von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Julius Springer.