Beyond subjective and objective in statistics

We argue that the words "objectivity" and "subjectivity" in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, …

Authors: Andrew Gelman, Christian Hennig

Beyond subjective and objective in statistics∗
Andrew Gelman† Christian Hennig‡
5 August 2015
Abstract
We argue that the words “objectivity” and “subjectivity” in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes, with objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of these reformulations is that the replacement terms do not oppose each other. Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification.
1. Introduction
1.1. Motivation
We can’t do statistics without data, and as statisticians much of our efforts revolve around modeling the links between data and substantive constructs of interest. We might analyze national survey data on purchasing decisions as a way of estimating consumers’ responses to economic conditions; or gather blood samples over time on a sample of patients with the goal of estimating the metabolism of a drug, with the ultimate goal of coming up with a more effective dosing schedule; or we might be performing a more exploratory analysis, seeking clusters in a multivariate dataset with the aim of discovering patterns not apparent in simple averages of raw data.
As applied researchers we are continually reminded of the value of integrating new data into an analysis, and the balance between data quality and quantity. In some settings it is possible to answer questions of interest using a single clean dataset, but more and more we are finding that this simple textbook approach does not work.
External information can come in many forms, including (a) recommendations on what variables to adjust for non-representativeness of a survey or imbalance in an experiment or observational study; (b) the extent to which outliers should be treated as regular, erroneous, or as indicating something that is meaningful but essentially different from the main body of observations; (c) substantial information on the role of variables, including potential issues with measurement, confounding, and substantially meaningful effect sizes; (d) population distributions that are used in poststratification, age adjustment, and other procedures that attempt to align inferences to a common population of interest; (e) restrictions such as smoothness or sparsity that serve to regularize estimates in high-dimensional settings; (f) the choice of functional form in a regression model (which in economics might be chosen to work with a particular utility function, or in public health might be motivated based on success in similar studies in the literature); and (g) numerical information about particular parameters in a model.
∗ We thank Sebastian Weber, Jay Kadane, Arthur Dempster, Michael Betancourt, Michael Zyphur, E. J. Wagenmakers, Deborah Mayo, James Berger, Prasanta Bandyopadhyay, Laurie Paul, Jan-Willem Romeijn, Gianluca Baio, Keith O’Rourke, and Laurie Davies for helpful comments.
† Department of Statistics and Department of Political Science, Columbia University, New York.
‡ Department of Statistical Science, University College London.
Of all these, only the final item is traditionally given the name “prior information” in a statistical analysis, but all can be useful in serious applied work. Other relevant information concerns not the data generating process but rather how the data and results of an analysis are to be used or interpreted.
We were motivated to write the present paper because we felt that our applied work, and that of others, was impeded because of the conventional framing of certain statistical analyses as subjective. It seemed to us that, rather than being in opposition, subjectivity and objectivity both had virtues that were relevant in making decisions about statistical analyses. We have earlier noted (Gelman and O’Rourke, 2015) that statisticians typically choose their procedures based on non-statistical criteria, and philosophical traditions and even the labels attached to particular concepts can affect real-world practice.
In this paper we reassess objectivity and subjectivity, exploding each into several sub-concepts, and we demonstrate the relevance of these ideas for three of our active applied research projects: a hierarchical population model in pharmacology, a procedure for adjustment of opt-in surveys, and a cluster analysis of data on socioeconomic stratification. We hope that readers will likewise see the relevance of these ideas in their own applied work, where decisions must be made about how to combine information of varying quality from different sources.
1.2. Objectivity and subjectivity
The continuing interest in and discussion of objectivity and subjectivity in statistics is, we believe, a necessary product of a fundamental tension in science: On one hand, scientific claims should be impersonal in the sense that a scientific argument should be understandable by anyone with the necessary training, not just by the person promulgating it, and it should be possible for scientific claims to be evaluated and tested by outsiders. On the other hand, the process of scientific inference and discovery involves individual choices; indeed, scientists and the general public celebrate the brilliance and inspiration of greats such as Einstein, Darwin, and the like, recognizing the roles of their personalities and individual experiences in shaping their theories and discoveries, and philosophers of science have studied the interplay between personal attitudes and scientific theories (Kuhn, 1962). Thus it is clear that objective and subjective elements arise in the practice of science, and similar considerations hold in statistics.
Within statistics, though, discourse on objectivity and subjectivity is at an impasse. Ideally these concepts would be part of a consideration of the role of different sorts of information and assumptions in statistical analysis, but instead they often seem to be used in restrictive and misleading ways.
One problem is that the terms “objective” and “subjective” are loaded with so many associations and are often used in a mixed descriptive/normative way. Scientists whose methods are branded as subjective have the awkward choice of either saying, No, we are really objective, or else embracing the subjective label and turning it into a principle.
From the other direction, scientists who use methods labeled as objective often seem so intent on eliminating subjectivity from their analyses that they end up censoring themselves. This happens, for example, when researchers rely on p-values but refuse to recognize when their choice of analysis is contingent on data and that the theory behind the p-values is therefore invalidated (as discussed by Simmons, Nelson, and Simonsohn, 2011, and Gelman and Loken, 2014): significance testing is often used as a tool for a misguided ideology that leads researchers to hide, even from themselves, the iterative searching process by which a scientific theory is mapped into a statistical model or choice of data analysis (Box, 1983). More generally, misguided concerns about subjectivity can lead researchers to avoid incorporating relevant and available information into their analyses and adapting the analyses appropriately to their research questions and potential uses of their results.
Many users of the terms “objective” and “subjective” in discussions concerning statistics do not acknowledge that these terms are quite controversial in the philosophy of science, and that they are used with a variety of different meanings and are therefore prone to misunderstandings.
2. Our proposal
We propose when talking about statistics to replace, wherever possible, the words “objectivity” and “subjectivity” with broader collections of attributes, with objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of this reformulation is that the replacement terms do not oppose each other.
Instead of debating over whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize attributes such as transparency and acknowledgment of multiple perspectives as complementary goals.
2.1. “Transparency,” “consensus,” “impartiality,” and “correspondence to observable reality,” instead of “objectivity”
Merriam-Webster defines “objective” as “based on facts rather than feelings or opinions: not influenced by feelings” and “existing outside of the mind: existing in the real world” (actually the concept is quite controversial, see Section 4.3). Science is practiced by human beings, who only have access to the real world through interpretation of their perceptions. Taking objectivity seriously as an ideal, scientists need to make the sharing of their perceptions and interpretations possible. When applied to statistics, the implication is that the choices in the data analysis (including the prior distribution, if any, but also the model for the data, methodology, and the choice of what information to include in the first place) should be motivated based on factual, externally verifiable information and transparent criteria. This is similar to the concept of “institutional decision analysis” (Section 9.5 of Gelman, Carlin, et al., 2013), under which the mathematics of formal decision theory can be used to ensure that decisions can be justified based on clearly-stated criteria. Different stakeholders will disagree on decision criteria, and different scientists will differ on statistical modeling decisions, so, in general, there is no unique “objective” analysis, but we can aim at communicating and justifying analyses in ways that support scrutiny and eventually consensus.
Similar thoughts have motivated the slogan “transparency is the new objectivity” in journalism (Weinberger, 2009).
In the context of statistical analysis, a key aspect of objectivity is therefore a process of transparency, in which the choices involved are justified based on external, potentially verifiable sources or at least transparent considerations (ideally accompanied by sensitivity analyses if such considerations leave alternative options open), a sort of “paper trail” leading from external information, through modeling assumptions and decisions about statistical analysis, all the way to inferences and decision recommendations.
But transparency is not enough. We hold that science aims at consensus in potentially free exchange (see Section 4.4 for elaboration), which is one reason that the current crisis of non-replication is taken so seriously in psychology (Yong, 2012). Transparency contributes to this building of consensus by allowing scholars to trace the sources and information used in statistical reasoning (Gelman and Basbøll, 2013). Furthermore, scientific consensus, as far as it deserves to be called “objective,” requires rationales, clear arguments and motivation, and elucidation of how this relates to already existing knowledge. Following generally accepted rules and procedures counters the dependence of results on the personalities of the individual researchers, although there is always a danger that such generally accepted rules and procedures are inappropriate or suboptimal for the specific situation at hand. In any case, consensus can only be achieved if researchers attempt to be impartial by taking into account competing perspectives, avoiding favoring pre-chosen hypotheses, and being open to criticism.
The world outside the observer’s mind plays a key role in usual concepts of objectivity.
Finding out about the real world is seen by many as the major objective of science, and this suggests correspondence to reality as the ultimate source of scientific consensus. This idea is not without its problems and meets some philosophical opposition; see Section 4.3. We acknowledge that the “real world” is only accessible to human beings through observation, and that scientific observation and measurement cannot be independent of human preconceptions and theories. As statisticians we are concerned with making general statements based on systematized observations, and this makes correspondence to observed reality a core concern regarding objectivity. This is not meant to imply that empirical statements about observations are the only meaningful ones that can be made about reality; we think that scientific theories that cannot be verified (but potentially be falsified) by observations are meaningful thought constructs, particularly because observations are never truly independent of thought constructs.
Formal statistical methods contribute to objectivity as far as they contribute to the fulfillment of these desiderata, particularly by making procedures and their implied rationales transparent and unambiguous.
For example, Bayesian statistics is commonly characterized as “subjective” by Bayesians and non-Bayesians alike. But depending on how exactly prior distributions are interpreted and used (see Sections 5.3–5.5), they fulfill or aid some or all of the virtues listed above. Priors make the researchers’ prior point of view transparent; different approaches of interpreting them provide different rationales for consensus; “objective Bayesians” (see Section 5.4) try to make them impartial; and if suitably interpreted (see Section 5.5) they can be properly grounded in observations.
2.2. “Multiple perspectives” and “context dependence,” instead of “subjectivity”
Merriam-Webster defines “subjective” as “relating to the way a person experiences things in his or her own mind” and “based on feelings or opinions rather than facts.” Science is normally seen as striving for objectivity, and therefore acknowledging subjectivity is not popular in science. But as noted above already, reality and the facts are only accessible through individual personal experiences. Different people bring different information and different viewpoints to the table, and they will use scientific results in different ways. In order to enable clear communication and consensus, differing perspectives need to be acknowledged, which contributes to transparency and thus to objectivity. Therefore, subjectivity is important to the scientific process. Subjectivity is valuable in statistics in that it represents a way to incorporate the information coming from differing perspectives.
We propose to replace the concept of “subjectivity” with awareness of multiple perspectives and context dependence. To the extent that subjectivity in statistics is a good thing, it is because information truly is dispersed, and, for any particular problem, different stakeholders have different goals. A counterproductive implication of the idea that science should be “objective” is that there is a tendency in the communication of statistical analyses to either avoid or hide decisions that cannot be made in an automatic, seemingly “objective” fashion by the available data. Given that all observations of reality depend on the perspective of an observer, interpreting science as striving for a unique (“objective”) perspective is illusory. Multiple perspectives are a reality to be reckoned with and should not be hidden.
Furthermore, by avoiding personal decisions, researchers often waste opportunities to adapt their analyses appropriately to the context, the specific background and their specific research aims, and to communicate their perspective more clearly. Therefore we see the acknowledgment of multiple perspectives and context dependence as virtues, making clearer in which sense subjectivity can be productive and helpful.
The term “subjective” is often used to characterize aspects of certain statistical procedures that cannot be derived in an automatic manner from the data to be analyzed, such as Bayesian prior distributions and tuning parameters (for example, the proportion of trimmed observations in trimmed means, or the threshold in wavelet smoothing). Such decisions are entry points for multiple perspectives and context dependence. The first decisions of this kind are typically the choice of data to be analyzed and the family of statistical models to be fit.
To connect with the other half of our proposal, the recognition of different perspectives should be done in a transparent way. We should not say we set a tuning parameter to 2.5 (say) just because that is our belief. Rather, we should justify the choice, explaining clearly how it supports the research aims. This could be by embedding the choice in a statistical model that can ultimately be linked back to observable reality and empirical data, or by reference to desirable characteristics (or avoidance of undesirable artifacts) of the methodology given the use of the chosen parameter; actually, many tuning parameters are related to such characteristics and aims of the analysis rather than to some assumed underlying “belief” (see Section 3.3).
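As a small illustration of a tuning parameter justified by its effect on the analysis rather than by "belief", the following sketch computes a trimmed mean for several trimming proportions. The data values and the function name `trimmed_mean` are hypothetical and chosen only for illustration; the point is that the consequence of each choice can be reported openly alongside the reason for making it.

```python
def trimmed_mean(values, trim_fraction):
    """Mean after dropping the lowest and highest `trim_fraction` of the sorted values."""
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(trim_fraction * len(ordered))  # number dropped from each tail, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical measurements: nine ordinary values and one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 55.0]

# Report the estimate under each candidate tuning value, so the impact
# of the choice is visible rather than hidden.
for frac in (0.0, 0.1, 0.2):
    print(f"trim={frac:.1f}  estimate={trimmed_mean(data, frac):.2f}")
```

Whether the outlying value should count (no trimming) or be disregarded (some trimming) depends on whether it is read as an erroneous observation or as a special but relevant case, which is a context-dependent decision of the kind discussed above.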
In some cases, such a justification may be imprecise, for example because background knowledge may be only qualitative and not quantitative or not precise enough to tell possible alternative choices apart, but often it can be argued that even then conscious tuning or specification of a prior distribution comes with benefits compared to using default methods of which the main attraction often is that seemingly “subjective” decisions can be avoided.
To consider an important example, regularization requires such decisions. Default priors on regression coefficients are used to express the belief that coefficients are typically close to zero, and from a non-Bayesian perspective, lasso shrinkage can be interpreted as encoding an external assumption of sparsity. Sparsity assumptions can be connected to an implicit or explicit model in which problems are in some sense being sampled from some distribution or probability measure of possible situations; see Section 5.5. This general perspective (which can be seen as Bayesian with an implicit prior on states of nature, or classical with an implicit reference set for the evaluation of statistical procedures) provides a potential basis to connect choices to experience; at least it makes transparent what kind of view of reality is encoded in the choices.
Tibshirani (2014) writes that enforcing sparsity is not primarily motivated by beliefs about the world, but rather by benefits such as computability and interpretability, hinting at the fact that considerations other than being “close to the real world” often play an important role in statistics and more generally in science. Even in areas such as social science where no underlying truly sparse structure exists, imposing sparsity can have advantages such as supporting stability (Gelman, 2013).
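To make the contrast between shrinkage and sparsity concrete, here is a minimal sketch under a simplifying assumption that is ours, not the paper's: an orthonormal design (X'X = I). In that special case the lasso objective (1/2)||y - Xb||^2 + lam * ||b||_1 is minimized by soft-thresholding each least-squares coefficient at lam, and the ridge objective (1/2)||y - Xb||^2 + (lam/2) * ||b||^2 by dividing each coefficient by 1 + lam. The coefficient values and the penalty below are hypothetical.

```python
def ridge_shrink(b, lam):
    """Ridge estimate under an orthonormal design: proportional shrinkage toward zero."""
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    """Lasso estimate under an orthonormal design: soft-thresholding at lam."""
    magnitude = max(abs(b) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0  # coefficient is set exactly to zero (sparsity)
    return magnitude if b > 0 else -magnitude


# Hypothetical least-squares coefficients and penalty, chosen only for illustration.
least_squares = [3.0, -0.4, 0.2, -2.5]
lam = 0.5

for b in least_squares:
    print(f"ls={b:+.2f}  ridge={ridge_shrink(b, lam):+.2f}  lasso={lasso_shrink(b, lam):+.2f}")
```

Ridge keeps every coefficient but pulls each toward zero, while the lasso sets the small ones exactly to zero. Either choice encodes an aim for the output, and stating that aim along with the penalty value is what makes the choice transparent.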
In a wider sense, if one is performing a linear or logistic regression, for example, and considering options of maximum likelihood, lasso, or hierarchical Bayes with a particular structure of priors, all of these choices are “subjective” in the sense of encoding aims regarding possible outputs and assumptions, and all are “objective” as far as these aims and assumptions are made transparent and the assumptions can be justified based on past data and ultimately be checked given enough future data. So the conventional labeling of Bayesian analyses or regularized estimates as “subjective” misses the point.
As an alternative to basing it on past data, the choice of tuning parameter can be based on knowledge of the impact of the choice on results and a clear explanation of why a certain impact is desired or not. In robust statistics, for example, the breakdown point of some methods can be tuned and may be chosen lower than the optimal 50%, because if there is too large a percentage of data deviating strongly from the majority, one may rather want the method to deliver a compromise between all observations, but if the percentage of outliers is quite low, one may rather want them to be disregarded, with borderline percentages depending on the application (particularly on to what extent outliers are interpreted as erroneous observations rather than as somewhat special but still relevant cases).
2.3. A list of specific objective and subjective virtues
To summarize the above discussion, virtues that are often referred to as “objective” include:
1. Transparency:
   (a) Clear and unambiguous definitions of concepts,
   (b) Open planning and following agreed protocols,
   (c) Full communication of reasoning, procedures, and potential limitations;
2. Consensus:
   (a) Accounting for relevant knowledge and existing related work,
   (b) Following generally accepted rules where possible and reasonable,
   (c) Provision of rationales for consensus and unification;
3. Impartiality:
   (a) Thorough consideration of relevant and potentially competing theories and points of view,
   (b) Thorough consideration and if possible removal of potential biases: factors that may jeopardize consensus and the intended interpretation of results,
   (c) Openness to criticism and exchange;
4. Correspondence to observable reality:
   (a) Clear connection of concepts and models to observables,
   (b) Clear conditions for reproduction, testing, and falsification.
This last bit is a challenge in statistics, as reproduction, testing, and falsification can only be assessed probabilistically in any real, finite-sample setting.
What about subjectivity? The term “subjective” is often used as opposite to “objective” and as such often meant to be opposed to scientific virtues, or to be something that cannot fully be avoided and that therefore has to be only grudgingly accepted. But subjective perspectives are the building blocks for scientific consensus, and therefore there are also scientific virtues associated with subjectivity:
1. Awareness of multiple perspectives,
2. Awareness of context dependence:
   (a) Recognition of dependence on specific contexts and aims,
   (b) Honest acknowledgment of the researcher’s position, goals, experiences, and subjective point of view.
In the subsequent discussion we shall label the items in the above lists as O1a–O4b or O1–O4 for groups of items (“O” for “connected to objectivity”), and S1, S2 (S2a, S2b) for the items connected to subjectivity. Our intention is to sketch a system of virtues that allows a more precise and detailed discussion where issues of objectivity and subjectivity are at stake.
We are aware that in some situations some of these virtues may oppose each other; for example, “consensus” can contradict “awareness of multiple perspectives,” and indeed dissent is essential to scientific progress. This tension between impersonal consensus and creative debate is an unavoidable aspect of science. Sometimes the consensus can only be that there are different legitimate points of view. Furthermore, the listed virtues are not all fully autonomous; clear reference to observations may be both a main rationale for consensus and a key contribution to transparency; and the three subjective virtues contribute to both transparency and openness to criticism and exchange.
Not all items on the list apply to all situations. For example, in the following section we will apply the list to the foundations of statistics, but the items O1c and S2b rather apply to specific studies.
3. Applied examples
In conventional statistics, assumptions are commonly minimized. Classical statistics and econometrics is often framed in terms of robustness, with the goal being methods that work with minimal assumptions. But the decisions about what information to include and how to frame the model are typically buried, not stated formally as assumptions but just baldly stated: “Here is the analysis we did . . . ,” sometimes with the statement or implication that these have a theoretical basis but typically with little clear connection between subject-matter theory and details of measurements. From the other perspective, Bayesian analyses are often boldly assumption-based but with the implication that these assumptions, being subjective, need no justification and cannot be checked from data.
We would like statistical practice, Bayesian and otherwise, to move toward more transparency regarding the steps linking theory and data to models, and recognition of multiple perspectives in the information that is included in this paper trail and this model. In this section we show how we are trying to move in this direction in some of our recent research projects. We present these examples not as any sort of ideals but rather to demonstrate how we are grappling with these ideas and, in particular, the ways in which active awareness of the concepts of transparency, consensus, impartiality, correspondence to observable reality, multiple perspectives and context dependence is changing our applied work.
3.1. A hierarchical Bayesian model in pharmacology
Statistical inference in pharmacokinetics/pharmacodynamics involves many challenges: data are indirect and often noisy; the mathematical models are nonlinear and computationally expensive, requiring the solution of differential equations; and parameters vary by person but often with only a small amount of data on each experimental subject. Hierarchical models and Bayesian inference are often used to get a handle on the many levels of variation and uncertainty (see, for example, Sheiner, 1984, and Gelman, Bois, and Jiang, 1996).
One of us is currently working on a project in drug development involving a Bayesian model that was difficult to fit, even when using advanced statistical algorithms and software. Following the so-called folk theorem of statistical computing (Gelman, 2008), we suspected that the problems with computing could be attributed to a problem with our statistical model. In this case, the issue did not seem to be lack of fit, or a missing interaction, or unmodeled measurement error (problems we had seen in other settings of this sort).
Rather, the fit appeared to be insufficiently constrained, with the Bayesian fitting algorithm being stuck going through remote regions of parameter space that corresponded to implausible or unphysical parameter values.
In short, the model as written was only weakly identified, and the given data and priors were consistent with all sorts of parameter values that did not make scientific sense. Our iterative Bayesian computation had poor convergence (that is, the algorithm was having difficulty approximating the posterior distribution) and the simulations were going through zones of parameter space that were not consistent with the scientific understanding of our pharmacology colleagues.
To put it another way, our research team had access to prior information that had not been included in the model. So we took the time to specify a more informative prior. The initial model thus played the role of a placeholder or default which could be elaborated as needed, following the iterative prescription of falsificationist Bayesianism (Box, 1980, Gelman et al., 2013, Section 5.5).
In our experience, informative priors are not so common in applied Bayesian inference, and when they are used, they often seem to be presented without clear justification. In this instance, though, we decided to follow the principle of transparency and write a note explaining the genesis of each prior distribution. To give a sense of what we’re talking about, we present a subset of these notes here:
• $\gamma_1$: mean of population distribution of $\log(\mathrm{BVA}^{\mathrm{latent}}_j / 50)$, centered at 0 because the mean of the BVA values in the population should indeed be near 50. We set the prior sd to 0.2 which is close to $\log(60/50) = 0.18$ to indicate that we’re pretty sure the mean is between 40 and 60.
• $\gamma_2$: mean of pop dist of $\log(k^{\mathrm{in}}_j / k^{\mathrm{out}}_j)$, centered at 3.7 because we started with $-2.1$ for $k^{\mathrm{in}}$ and $-5.9$ for $k^{\mathrm{out}}$, specified from the literature about the disease. We use a sd of 0.5 to represent a certain amount of ignorance: we’re saying that our prior guess for the population mean of $k^{\mathrm{in}}/k^{\mathrm{out}}$ could easily be off by a factor of $\exp(0.5) = 1.6$.
• $\gamma_3$: mean of pop dist of $\log k^{\mathrm{out}}_j$, centered at $-5.8$ with a sd of 0.8, which is the prior that we were given before, from the time scale of the natural disease progression.
• $\gamma_4$: $\log E^{0}_{\max}$, centered at 0 with sd 2.0 because that’s what we were given earlier.
We see this sort of painfully honest justification as a template for future Bayesian data analyses. The above snippet certainly does not represent an exemplar of best practices, but we see it as a “good enough” effort that presents our modeling decisions in the context in which they were made.
To label this prior specification as “objective” or “subjective” would miss the point. Rather, we see it as having some of the virtues of objectivity and subjectivity, notably transparency (O1) and some aspects of consensus (O2) and awareness of multiple perspectives (S1), while recognizing its clear imperfections and incompleteness. Other desirable features would derive from other aspects of the statistical analysis: for example, we use external validation to approach correspondence to observable reality (O4), and our awareness of context dependence (S2) comes from the placement of our analysis within the larger goal, which is to model dosing options for a particular drug.
One concern about our analysis which we have not yet thoroughly addressed is sensitivity to model assumptions.
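To make the prior notes above concrete, and as a first step toward the kind of sensitivity check just mentioned, the following sketch translates a prior stated on the log scale back to the natural scale. It assumes a normal prior on the log scale and summarizes it by a plus-or-minus one sd interval; both are our assumptions for illustration, since the notes give only a center and an sd. The helper name `implied_range` and the doubled sd are hypothetical.

```python
import math


def implied_range(center_log, sd_log, scale=1.0, n_sd=1.0):
    """Natural-scale interval implied by a prior centered at `center_log`
    with standard deviation `sd_log` on the log scale."""
    low = scale * math.exp(center_log - n_sd * sd_log)
    high = scale * math.exp(center_log + n_sd * sd_log)
    return low, high


# gamma_1: prior on log(mean BVA / 50), centered at 0 with sd 0.2.
low, high = implied_range(0.0, 0.2, scale=50.0)
print(f"gamma_1, sd 0.2: mean BVA roughly {low:.1f} to {high:.1f}")

# Hypothetical alternative for comparison: the same prior with the sd doubled.
low_wide, high_wide = implied_range(0.0, 0.4, scale=50.0)
print(f"gamma_1, sd 0.4: mean BVA roughly {low_wide:.1f} to {high_wide:.1f}")

# gamma_2: an sd of 0.5 on the log scale corresponds to this multiplicative factor.
print(f"gamma_2, sd 0.5: off by a factor of about {math.exp(0.5):.2f}")
```

Writing the natural-scale consequences next to each prior makes it easier for a subject-matter colleague to say whether a stated sd is too tight or too loose, which is one way the paper trail can be opened to criticism.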
We have established that the prior distribution makes a difference but it is possible that different reasonable priors yield posteriors with greatly differing real-world implications, which would raise concern about consensus (O2) and impartiality (O3). Our response to such concerns, if this sensitivity is indeed a problem, would be to more carefully document our choice of prior, thus doubling down on the principle of transparency (O1), and to compare to other possible prior distributions supported by other information, thus supporting impartiality (O3) and awareness of multiple perspectives (S1).

As with “institutional decision analysis” (Gelman et al., 2003, section 22.5), the point is not that our particular choices of prior distributions are “correct” (whatever that means), or optimal, or even good, but rather that they are transparent, and in a transparent way connected to knowledge. Subsequent researchers, whether supportive, critical, or neutral regarding our methods and substantive findings, should be able to interpret our priors (and, by implication, our posterior inferences) as the result of some systematic process, a process open enough that it can be criticized and improved as appropriate.

3.2. Adjustments for pre-election polls

Wang et al. (2014) describe another of our recent applied Bayesian research projects, in this case a statistical analysis that allows highly stable estimates of public opinion by adjustment of data from non-random samples. The particular example used was an analysis of data from an opt-in survey conducted on the Microsoft Xbox video game platform, a technique that allowed the research team to, effectively, interview respondents in their living rooms, without ever needing to call or enter their houses. The Xbox survey was performed during the two months before the 2012 U.S.
presidential election. In addition to offering the potential practical benefits of performing a national survey using inexpensive data, this particular project made use of its large sample size and panel structure (repeated responses on many thousands of Americans) to learn something new about U.S. politics: we found that certain swings in the polls, which had been generally interpreted as representing large swings in public opinion, actually could be attributed to differential nonresponse, with Democrats and Republicans in turn being more or less likely to respond during periods where there was good or bad news about their candidate. This finding was consistent with some of the literature in political science (see Erikson, Panagopoulos, and Wlezien, 2004), but the Xbox study represented an important empirical confirmation.

Having established the potential importance of the work, we next consider its controversial aspects. For many decades, the gold standard in public opinion research has been probability sampling, in which the people being surveyed are selected at random from a list or lists (for example, selecting households at random from a list of addresses or telephone numbers and then selecting a person within each sampled household from a list of the adults who live there). From this standpoint, opt-in sampling of the sort employed in the Xbox survey lacks a theoretical foundation, and the estimates and standard errors thus obtained (and which we reported in our research papers) do not have a clear statistical interpretation.

This criticism, that inferences from opt-in surveys lack a theoretical foundation, is interesting to us here because it is not framed in terms of objectivity or subjectivity.
We do use Bayesian methods for our survey adjustment but the criticism from certain survey practitioners is not about adjustment but rather about the data collection: they take the position that no good adjustment is possible for data collected from a non-probability sample.

As a practical matter, our response to this criticism is that nonresponse rates in national random-digit-dialed telephone polls are currently in the range of 90%, which implies that real-world surveys of this sort are essentially opt-in samples in any case: If there is no theoretical justification for non-random samples then we are all dead, which leaves us all with the choice to either abandon statistical inference entirely when dealing with survey data, or to accept that our inferences are model-based and do our best (Gelman, 2014c).

We shall now express this discussion using the criteria from Section 2.3. Probability sampling has the clear advantage of transparency (O1) in that the population and sampling mechanism can be clearly defined and accessible to outsiders, in a way that an opt-in survey such as the Xbox is not. In addition, probability sampling has the benefits of consensus (O2), at least in the United States, where such surveys have a long history and are accepted in marketing and opinion research. Impartiality (O3) and correspondence to observable reality (O4) are less clearly present because of the concern with nonresponse, just noted.
We would argue that the large sample size and repeated measurements of the Xbox data, coupled with our sophisticated hierarchical Bayesian adjustment scheme, put us well on the road to impartiality (through the use of multiple sources of information, including past election outcomes, used to correct for biases in the form of known differences between sample and observation) and correspondence to observable reality (in that the method can be used to estimate population quantities that could be validated from other sources).

Regarding the virtues associated with subjectivity, the various adjustment schemes represent awareness of context dependence (S2) in that the choice of variables to match in the population depends on the context of political polling, both in the sense of which aspects of the population are particularly relevant for this purpose, and in respecting the awareness of survey practitioners of what variables are predictive of nonresponse. The researcher's subjective point of view is involved in the choice of exactly what information to include in weighting adjustments and exactly what statistical model to fit in regression-based adjustment. Users of probability sampling on grounds of “objectivity” may shrink from using such judgments, and may therefore ignore valuable information from the context.

3.3. Transformation of variables in cluster analysis for socioeconomic stratification

Cluster analysis aims at grouping together similar objects and separating dissimilar ones, and as such is based, explicitly or implicitly, on some measure of dissimilarity. Defining such a measure, for example using some set of variables characterizing the objects to be clustered, can involve many decisions. Here we consider an example of Hennig and Liao (2013), where we clustered data from the 2007 U.S.
Consumer Finances Survey, comprising variables on income, savings, housing, education, occupation, number of checking and savings accounts, and life insurance, with the aim of data-based exploration of socioeconomic stratification. The choice of variables and the decisions of how they are selected, transformed, standardized, and weighted have a strong impact on the results of the cluster analysis. This impact depends to some extent on the clustering technique that is afterward applied to the resulting dissimilarities, but will typically be considerable, even for cluster analysis techniques that are not directly based on dissimilarities. One of the various issues discussed by Hennig and Liao (2013) was the transformation of the variables treated as continuous (namely income and savings amount), with the view of basing a cluster analysis on a Euclidean distance after transformation, standardization, and weighting of variables.

There is some literature on choosing transformations, but the usual aims of transformation, namely achieving approximate additivity, linearity, equal variances, or normality, are often not relevant for cluster analysis, where such assumptions only apply to model-based clustering, and only within the clusters, which are not known before transformation.

The rationale for transformation when setting up a dissimilarity measure for clustering is of a different kind. The measure needs to formalize appropriately which objects are to be treated as “similar” or “dissimilar” by the clustering methods, and should therefore be put into the same or different clusters, respectively. In other words, the formal dissimilarity between objects should match what could be called the “interpretative dissimilarity” between objects. This is an issue involving subject-matter knowledge that cannot be decided by the data alone.
Hennig and Liao (2013) argue that the interpretative dissimilarity between different savings amounts is governed rather by ratios than by differences, so that $2 million of savings is seen as about as dissimilar from $1 million, as $2,000 is dissimilar from $1,000. This implies a logarithmic transformation. We do not argue that there is a precise argument that privileges the log transformation over other transformations that achieve something similar, and one might argue from intuition that even taking logs may not be strong enough. We therefore recognize that any choice of transformation is a provisional device and only an approximation to an ideal “interpretative dissimilarity,” even if such an ideal exists.

In the dataset, there are no negative savings values as there is no information on debts, but there are many people who report zero savings, and it is conventional to kluge the logarithmic transformation to become $x \mapsto \log(x + c)$ with some $c > 0$. Hennig and Liao then point out that, in this example, the choice of $c$ has a considerable impact on clustering. The number of people with very small but nonzero savings in the dataset is rather small. Setting $c = 1$, for example, the transformation creates a substantial gap between the zero savings group and people with fairly low (but not very small) amounts of savings, and of course this choice is also sensitive to scaling (for example, savings might be coded in dollars, or in thousands of dollars). The subsequent cluster analysis (done by “partitioning around medoids”; Kaufman and Rousseeuw, 1990) would therefore separate the zero savings group strictly; no person with zero savings would appear together in a cluster with a person with nonzero savings.
For larger values for $c$, the dissimilarity between the zero savings group and people with a low savings amount becomes effectively small enough that people with zero savings could appear in clusters together with other people, as long as values on other variables are similar enough.

We do not believe that there is a true value of $c$. Rather, clusterings arising from different choices of $c$ are legitimate but imply different interpretations. The clustering for $c = 1$ is based on treating the zero savings group as very special, whereas the clustering for $c = 200$, say, implies that a difference in savings between 0 and $100 is taken as not such a big deal (although it is a bigger deal in any case than the difference between $100 and $200). Similar considerations hold for issues such as selecting and weighting variables and coding ordinal variables.

It can be frustrating to the novice in cluster analysis that such decisions for which there does not seem to be an objective basis can make such a difference, and there is apparently a very strong temptation to ignore the issue and to just choose $c = 1$, which may look “natural” in the sense that it maps zero onto zero, or even to avoid transformation at all in order to avoid the discussion, so that no obvious lack of objectivity strikes the reader. Having the aim of socioeconomic stratification in mind, though, it is easy to argue that clusterings that result from ignoring the issue are less desirable and useful than a clustering obtained from making a however imprecisely grounded decision choosing a $c > 1$, therefore avoiding either separation of the zero savings group as a clustering artifact or an undue domination of the clustering by people with large savings in case of not applying any transformation at all.
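The effect of $c$ described above can be checked with a few lines of arithmetic. The sketch below is only an illustration on a single variable: it ignores the standardization, weighting, and other variables that enter the actual dissimilarity in Hennig and Liao (2013), and the helper names `shifted_log` and `gap` are hypothetical.

```python
import math


def shifted_log(x, c):
    """The transformation x -> log(x + c) discussed in the text."""
    return math.log(x + c)


def gap(a, b, c):
    """Distance between two savings amounts after the transformation."""
    return abs(shifted_log(b, c) - shifted_log(a, c))


# With c = 0 (pure log, positive amounts only) equal ratios give equal
# distances: $1 million vs $2 million is as far apart as $1,000 vs $2,000.
print(gap(1_000_000, 2_000_000, 0), gap(1_000, 2_000, 0))

# Compare the gap between $0 and $100 with the gap between $100 and $200.
for c in (1, 200):
    print(f"c = {c}: 0 vs 100 -> {gap(0, 100, c):.3f}, "
          f"100 vs 200 -> {gap(100, 200, c):.3f}")

# Sensitivity to scaling: the same $0 vs $100 comparison with savings coded
# in thousands of dollars and c = 1.
print(f"coded in thousands, c = 1: 0 vs 0.1 -> {gap(0, 0.1, 1):.3f}")
```

With $c = 1$ the step from 0 to $100 is several times larger than the step from $100 to $200, whereas with $c = 200$ the two steps are of similar size, with the first still the larger one. This matches the verbal description above, and the last line shows how strongly the $c = 1$ result depends on the unit in which savings are coded.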
We believe that this kind of tuning problem that cannot be interpreted as estimating an unknown true constant (and does therefore not lend itself naturally to an approach through a Bayesian prior) is not exclusive to cluster analysis, and is often hidden in presentations of data analyses.

In Hennig and Liao (2013), we pointed out the issue and did some sensitivity analysis about the strength of the impact of the choice of $c$ (O1, transparency). The way we picked the $c$ in that paper made clear reference to the context dependence, while being honest that the subject-matter knowledge in this case provided only weak guidelines for making this decision (S2). We were also clear that alternative choices would amount to alternative perspectives rather than being just wrong (S1, O3).

The issue of how to foster consensus and to make a connection to observable reality (O2, O4) is of interest, but not treated here. It is, however, problematic to establish rationales for consensus that are based on ignoring the context and potentially multiple perspectives. There is a tendency in the cluster analysis literature to seek formal arguments for making such decisions automatically (see, for example, Everitt et al., 2011, Section 3.7, on variable weighting; it is hard to find anything systematic in the clustering literature on transformations), for example by trying to optimize “clusterability” of the dataset, or to prefer methods that are less sensitive to such decisions. This is problematic because it amounts to making the decisions implicitly without giving the researchers access to them. In other words, the data are given the authority to determine not only which objects are similar (which is what we want them to do), but also what similarity should mean.
The latter should be left to the researcher, although we acknowledge that the data can have a certain impact: for example the idea that dissimilarity of savings amounts is governed by ratios rather than differences is connected to (but not determined by) the fact that the distribution of savings amounts is skewed, with large savings amounts sparsely distributed.

3.4. Testing for homogeneity against clustering

Another feature of the cluster analysis in Hennig and Liao (2013) was a parametric bootstrap test for homogeneity against clustering; see also Hennig and Lin (2015) for a more general elaboration. Clusterings can be computed regardless of whether the data are clustered in a sense that is relevant for the application of interest. In this example, the test involved the construction of a null model that captured the features of the dataset such as the dependence between variables and marginal distributions of the categorical variables as well as possible, without involving anything that could be interpreted as clustering structure. As opposed to the categorical variables, the marginal distributions of the “continuous” variables such as the transformed savings amount were treated as potentially indicating clustering, and therefore the null model used unimodal distributions for them. As the test statistic we used a cluster validity statistic of the clustering computed on the data, with a parametric bootstrap used to compute a clustering in the same manner on data generated from the null model.

We used a classical significance test rather than a Bayesian approach here because we were not interested in posterior probabilities for either the null model to be true or prediction of future observations.
Rather, the question of interest was whether the observed clustering structure in the data (as measured by the validity index) could be explained by a model without any feature that would be interpreted as “real clustering,” regardless of whether or to what extent we believe this model or not. However, we deviated from classical significance test logic in some ways, in particular by not using a test statistic that was optimal against any specific alternative, instead choosing a statistic pointing in a rough direction (namely “clustering”) from the null model. Furthermore, setting up the null model required decisions on which potential characteristics of the dataset would be interpretable as “clustering,” and could therefore not be incorporated in the null model that was to be interpreted as “non-clustering.” A non-significant outcome of the test can then clearly be interpreted as no evidence in the data for real clustering, whereas the interpretation of a significant outcome depends on whether we can argue convincingly that the null model is as good as it gets at trying to model the data without clustering structure. Setting up a straw man null model for homogeneity and rejecting it would have been easy and not informative.

There is no point in arguing that our significance test was more objective than for example a Bayesian analysis would have been, and actually our approach involved decisions such as the distinction between data characteristics interpreted as “clustering” or “non-clustering” and the choice of a test statistic that were made by considerations other than seemingly objective mathematical optimality or estimation from the data.
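The logic of such a test can be sketched in a much reduced setting. The code below is a toy one-dimensional illustration, not the procedure of Hennig and Liao (2013): it assumes a single fitted normal distribution as the unimodal null model, and it replaces the validity index of a medoid-based clustering with the best two-group within-cluster sum of squares ratio. The function names and the simulated data are hypothetical.

```python
import random
import statistics


def two_group_ratio(values):
    """Smallest within-group sum of squares over all splits of the sorted
    values into two contiguous groups, divided by the total sum of squares.
    Small values indicate that the data separate cleanly into two groups."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    total_ss = sum((x - mean) ** 2 for x in xs)
    prefix, prefix_sq = [0.0], [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)
    best = float("inf")
    for k in range(1, n):
        left_ss = prefix_sq[k] - prefix[k] ** 2 / k
        right_sum = prefix[n] - prefix[k]
        right_ss = (prefix_sq[n] - prefix_sq[k]) - right_sum ** 2 / (n - k)
        best = min(best, left_ss + right_ss)
    return best / total_ss


def bootstrap_p_value(values, n_boot=200, seed=0):
    """Parametric bootstrap: fit the null model, simulate from it, and count
    how often the simulated statistic looks at least as clustered."""
    rng = random.Random(seed)
    mu = statistics.fmean(values)
    sigma = statistics.stdev(values)
    observed = two_group_ratio(values)
    as_extreme = 0
    for _ in range(n_boot):
        simulated = [rng.gauss(mu, sigma) for _ in values]
        if two_group_ratio(simulated) <= observed:
            as_extreme += 1
    return (1 + as_extreme) / (n_boot + 1)


# Hypothetical data: two well-separated groups, and one homogeneous sample.
data_rng = random.Random(1)
clustered = ([data_rng.gauss(0, 1) for _ in range(50)]
             + [data_rng.gauss(8, 1) for _ in range(50)])
homogeneous = [data_rng.gauss(0, 1) for _ in range(100)]
p_clustered = bootstrap_p_value(clustered)
p_homogeneous = bootstrap_p_value(homogeneous)
print(p_clustered, p_homogeneous)
```

Even in this toy version the decisions discussed above are visible: which statistic stands in for “clustering,” and which features of the data the null model is allowed to reproduce, are choices made by the analyst rather than quantities estimated from the data.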
Still, the ultimate aim was to see whether the idea of a real clustering would be supported by the data (O4), in an impartial and transparent manner (O1, O3), trying hard to give the null model a fair chance to fit the data, but involving context-dependent judgment (S2) and the transparent choice of a specific perspective (the chosen validity index) among a potential variety (S1), because we were after more qualitative statements than degrees of belief in certain models.

4. Objectivity and subjectivity in statistics and science

4.1. Discussions within statistics

In discussions of the foundations of statistics, objectivity and subjectivity are seen as opposites. Objectivity is typically seen as a good thing; many see it as a major requirement for good science. Bayesian statistics is often presented as being subjective because of the choice of a prior distribution. Some Bayesians (notably Jaynes, 2003, and Berger, 2006) have advocated an objective approach, whereas others (notably de Finetti, 1974) have embraced subjectivity. It has been argued that the subjective/objective distinction is meaningless because all statistical methods, Bayesian or otherwise, require subjective choices, but the choice of prior distribution is sometimes held to be particularly subjective because, unlike the data model, it cannot be determined for sure even in the asymptotic limit. In practice, subjective prior distributions often have well known empirical problems such as overconfidence (Alpert and Raiffa, 1984, Erev, Wallsten, and Budescu, 1994), which motivates efforts to check and calibrate Bayesian models (Rubin, 1984, Little, 2012) and to situate Bayesian inference within an error-statistical philosophy (Mayo, 1996, Gelman and Shalizi, 2013).
De Finetti can be credited with acknowledging honestly that subjective decisions cannot be avoided in statistics, but it is misleading to think that the required subjectivity always takes the form of prior belief. The confusion arises from two directions: first, prior distributions are not necessarily any more subjective than other aspects of a statistical model; indeed, in many applications priors can be and are estimated from data frequencies (see Chapter 1 of Gelman, Carlin, et al., 2013, for several examples). Second, somewhat arbitrary choices come into many aspects of statistical models, Bayesian and otherwise, and therefore we think it is a mistake to consider the prior distribution as the exclusive gate at which subjectivity enters a statistical procedure.

The objectivity vs. subjectivity issue also arises with statistical methods that require tuning parameters; decision boundaries such as the significance level of tests; and decisions regarding inclusion, exclusion, and transformation of data in preparation for analysis.

On one hand, statistics is sometimes said to be the science of defaults: most applications of statistics are performed by non-statisticians who adapt existing general methods to their particular problems, and much of the research within the field of statistics involves devising, evaluating, and improving such generally applicable procedures (Gelman, 2014b). It is then seen as desirable that any required data-analytic decisions or tuning are performed in an objective manner, either determined somehow from the data or justified by some kind of optimality argument.

On the other hand, practitioners must apply their subjective judgment in the choice of what method to use, what assumptions to invoke, and what data to include in their analyses.
Even using “no need for tuning” as a criterion for method selection or prioritizing bias, for example, or mean squared error, is a subjective decision. Settings that appear completely mechanical involve choice: for example, if a researcher has a checklist saying to apply linear regression for continuous data, logistic regression for binary data, and Poisson regression for count data, he or she still has the option to code a response as continuous or to use a threshold to define a binary classification. And such choices can be far from trivial; for example, when modeling elections or sports outcomes, one can simply predict the winner or instead predict the numerical point differential or vote margin. Modeling the binary outcome can be simpler to explain but in general will throw away information, and subjective judgment arises in deciding what to do in this sort of problem (Gelman, 2013a). And in both classical and Bayesian statistics, subjective choices arise in defining the sample space and considering what information to condition on.

4.2. Discussions in other fields

Scholars in humanistic studies such as history and literary criticism have considered the ways in which differently-situated observers can give different interpretations to what Luc Sante calls the “factory of facts.” In political arguments, controversies often arise over “cherry picking” or selective use of data, a concern we can map directly to the statistical principle of random or representative sampling, and the more general idea that information used in data collection be included in any statistical analysis (Rubin, 1978). In a different way, the concepts of transference and counter-transference, central to psychoanalysis, live at the boundary of personal impressions and measurable facts, all subject to the constraint that, as Philip K.
Dick put it, “Reality is that which, when you stop believing in it, doesn't go away.”

The social sciences have seen endless arguments over the relative importance of objective conditions and what Keynes (1936) called “animal spirits.” In macroeconomics, for example, the debate has been between the monetarists who tend to characterize recessions as necessary consequences of underlying economic conditions (as measured, for example, by current account balances, business investment, and productivity), and the Keynesians who focus on more subjective factors such as stock market bubbles and firms' investment decisions. These disagreements also turn methodological, with much dispute, for example, over the virtues and defects of various attempts to objectively measure the supply and velocity of money, or consumer confidence, or various other inputs to economic models. The interplay between objective and subjective effects also arises in political science, for example in the question of whether to attribute the political successes of a Ronald Reagan or a Bill Clinton to their charisma and appealing personalities, to their political negotiating skills, or simply to periods of economic prosperity that would have made a success out of just about any political leader. Again, these disputes link to controversies regarding research methods: a focus on objective, measurable factors can be narrow, but with a more subjective analysis it can be difficult to attain a scientific consensus. In fields such as social work it has been argued that one must work with subjective realities in order to make objective progress (Saari, 2005), but this view is relevant to science more generally.
In the social and physical sciences alike (as well as in hybrid fields such as psychophysics), the twentieth century saw an intertwining of objectivity and subjectivity. From one direction, Heisenberg's uncertainty principle told us that, at the quantum level, measurement depends fundamentally on the observation process, an insight that is implicit in modern statistics and econometrics with likelihood functions, measurement-error models, and sampling and missing-data mechanisms being manifestations of observation models. So in that sense there is no pure objectivity. From the other direction, psychologists have continued their effort to scientifically measure personality traits and subjective states. For example, Kahneman (1999) defines “objective happiness” as “the average of utility over a period of time.” Whether or not this definition makes much sense, it illustrates a movement in the social and behavioral sciences to measure, in supposedly objective manners, what might previously have been considered unmeasurable.

Many of these discussions are relevant to statistics because of the role of quantification. There is an ideology widespread in many areas of science that sees quantification and numbers and their statistical analysis as key tools for objectivity. An important function of quantitative scientific measurement is the production of observations that are thought of as independent of individual points of view. Apart from the generally difficult issue of measurement validity, the focus on what can be quantified, however, may narrow down what can be observed, and may not necessarily do the measured entities justice; see the examples from political science and psychology above.
Another example is the use of quantitative indicators for human rights in different countries; although it has been argued that it is of major importance that such indicators should be objective to have appropriate impact on political decision making (Candler et al., 2011), many aspects of their definition and methodology are subject to controversy and reflect specific political interests and views (Merry, 2011), and we think that it will help the debate to communicate such indicators transparently together with their limitations and the involved decisions rather than to sell them as objective and unquestionable. In many places the present paper may read as if we treat the observations to be analyzed by the statisticians as given, but we acknowledge the central importance of measurement and the benefits and drawbacks of quantification. See Porter (1996), Desrosieres (2002), Douglas (2009) for more discussion of the connection between quantification and objectivity. As with choices in statistical modeling and analysis, we believe that when considering measurement the objective/subjective antagonism is less helpful than a more detailed discussion of what quantification can achieve and what its limitations are.

4.3. Concepts of objectivity

Discussions involving objectivity and subjectivity often suffer from objectivity having multiple meanings, in statistics and elsewhere (much of the following discussion will focus on the term “objectivity”; subjectivity is often considered as the opposite of objectivity and as such implicitly defined). Ambiguity in these terms is often ignored. We believe that such discussions can become clearer by referring to the meanings that are relevant in any specific situation instead of using the ambiguous terms “objectivity” and “subjectivity” without further explanation.
Lorraine Daston has explored the ways in which objectivity has been used as a way to generalize scientific inquiry and make it more persuasive. As Daston (1992) puts it, scientific objectivity “is conceptually and historically distinct from the ontological aspect of objectivity that pursues the ultimate structure of reality, and from the mechanical aspect of objectivity that forbids interpretation in reporting and picturing scientific results.” The core of the current use of the term “objectivity” is the idea of impersonality of scientific statements and procedures. According to Daston and Galison (2007), the term has only been used in this way in science from the mid-nineteenth century; before then, “objective” and “subjective” were used with meanings almost opposite from the current ones and did not play a strong role in discussions about science. Daston (1994) specifically addresses changing concepts of subjectivity and objectivity of probabilities, and Zabell (2011) traces the historical development of these concepts.

The idea of independence of the individual subject can be applied in various ways.
Megill (1994) listed four basic senses of objectivity: “absolute objectivity” in the sense of “representing the things as they really are” (independently of an observer), “disciplinary objectivity” referring to a consensus among experts within a discipline and highlighting the role of communication and negotiation, “procedural objectivity” in the sense of following rules that are independent of the individual researcher, and “dialectical objectivity.” The latter somewhat surprisingly involves subjective contributions, because it refers to active human “objectification” required to make phenomena communicable and measurable so that they can then be treated in an objective way so that different subjects can understand them in the same way. Statistics for example relies on the construction of well delimited populations and categories within which averages and probabilities can be defined; see Desrosieres (2002). Daston and Galison (2007) call the ideal of scientific images that attempt to capture reality in an unmanipulated way “mechanical objectivity” as opposed to “structural objectivity,” which emerged from the insight of scientists and philosophers such as Helmholtz and Poincaré that observation of reality cannot exclude the observer and will never be as reliable and pure as “mechanical objectivists” would hope. Instead, “structural objectivity” refers to mathematical and logical structures. Porter (1996) lists the ideal of impartiality of observers as another sense of objectivity, and highlights the important role of quantitative and formal reasoning for concepts of objectivity because of their potential for removing ambiguities.
In broad agreement with interpretations already listed (and covered by our virtues), Reiss and Sprenger (2014) group key aspects of objectivity into the categories “faithfulness to facts,” “absence of normative commitments and value-freedom,” and “absence of personal bias.” Fuchs (1997) notes that various modern meanings of objectivity rather refer to the absence of subjectivity and all kinds of biasing factors than to something positive.

To us, the most problematic aspect of the term “objectivity” is that it incorporates normative and descriptive aspects, and that these are often not clearly delimited. For example, a statistical method that does not require the specification of any tuning parameters is objective in a descriptive sense (it does not require decisions by the individual scientist). Often this is presented as an advantage of the method without further discussion, implying objectivity as a norm, but depending on the specific situation the lack of flexibility caused by the impossibility of tuning may actually be a disadvantage (and indeed can lead to subjectivity at a different point in the analysis, when the analyst must make the decision of whether to use an auto-tuned approach in a setting where its inferences do not appear to make sense). The frequentist interpretation of probability is objective in the sense that it locates probabilities in an objective world that exists independently of the observer, but the definition of these probabilities requires a subjective definition of a reference set.
Although some proponents of frequentism consider its objectivity (in the sense of impersonality, conditional on the definition of the reference set) as a virtue, this property is ultimately only descriptive; it does not imply on its own that such probabilities indeed exist in the objective world, nor that they are a worthwhile target for scientific inquiry.

The interpretation of objectivity as a scientific virtue is connected to what are seen to be the aims and values of science. Scientific realists hold that finding out the truth about the observer-independent reality is the major aim of science. This makes “absolute objectivity” as discussed above a core scientific ideal, as which it is still popular. But observer-independent reality is only accessible through human observations, and the realist ideal of objectivity has been branded as metaphysical, meaningless, and illusory by positivists including Karl Pearson (1911), and more contemporarily by empiricists such as van Fraassen (1980). In the latter groups, objectivity is seen as a virtue as well, although for them it does not refer to observer-independent reality but rather to a standardized, disciplined, and impartial application of scientific methodology enabling academic consensus about observations. Reference to observations is an element that the empiricist, positivist, and realist ideas of objectivity have in common; Mayo and Spanos (2010) see checking theories against experience by means of what they call “error statistics” as a central tool to ensure objectivity, which according to them is concerned with finding out about reality in an unbiased manner[1]. In contrast, van Fraassen (1980) takes observability and the ability of theory to account for observed facts as objective from an anti-realist perspective.
His construal of observability depends on the context, theory, and means of observation, and his concept of objectivity is conditional on these conditions of observation, assuming that at least acceptance of observations and observability given these conditions should not depend on the subject.

Daston and Galison (2007) portray the rise of “mechanical objectivity” as a scientific virtue in reaction to shortcomings of the earlier scientific ideal of “truth-to-nature,” which refers to the idea that science should discover and present an underlying ideal and universal (Platonic) truth below the observed phenomena. The move towards mechanical objectivity, inspired by the development of photographic techniques, implied a shift of perspective; instead of producing pure and ideal “true” types the focus moved to capturing nature “as it is,” with all irregularities and variations that had been suppressed by a science devoted to “truth-to-nature.” Increasing insight in the shortcomings and the theory-dependence of supposedly objective observational techniques led to the virtue of “trained judgment” as a response to mechanical objectivity. According to Daston and Galison (2007), the later virtues did not simply replace the older ones, but rather supplemented them, so that nowadays all three still exist in science.

Another typology of objectivity was set up by Douglas (2004), who distinguishes three modes of objectivity, namely human interaction with the world (connected to our “correspondence to observable reality”), individual thought processes (connected to our “impartiality”) and processes to reach an agreement (connected to our “consensus” and “transparency”).
These modes are subdivided into different “senses.” Regarding human interaction with the world, Douglas distinguishes objectivity connected to human manipulation and intervention and objectivity connected to stability of results when taking multiple approaches of observation. Regarding individual thought processes, an interesting distinction is made between prohibition against using values in place of evidence and against using any values at all. Douglas suspects that the latter is hard to achieve and will rather encourage sweeping issues under the carpet. She writes that “hiding the decisions that scientists make, and the important role values should play in those decisions, does not exclude values.” A third sense is the conscious attempt to be value-neutral. The three proposed senses of objectivity regarding processes to reach an agreement are the use of generally agreed procedures, exploration of whether and to what extent consensus exists, and an interactive discursive attempt to achieve consensus.

Further distinctions regarding objectivity appear in the philosophical literature. Reiss and Sprenger (2014) distinguish the objectivity of a process, such as inference or procedure, from the objectivity of an outcome. Some of our aspects of objectivity, such as impartiality, concern the former; while others, such as correspondence to observable reality, concern the latter; but the connection is not always clear.
Following Reichenbach (1938), there is much discussion in the philosophy of science concerning the distinction between the “context of discovery” and the “context of justification,” with some arguing that subjective impact is problematic for the latter but not the former. Some, however, challenge the idea that these two contexts can be appropriately separated and that one can avoid the impact of subjective values on justification, see Reiss and Sprenger (2014) for an overview.

Objectivity has also been criticized on the grounds that, as attractive as it may seem as an ideal, it is illusory. This criticism has to refer to a specific interpretation of objectivity, and a weaker interpretation of objectivity may still seem to critics to be a good thing: van Fraassen agrees with Kuhn (1962) and others that “absolute objectivity” is an illusion and that access to reality is dependent on the observer, but he still holds that objectivity conditional on a system of reference is a virtue. But there is even criticism of the idea that objectivity, possible or not, is desirable. From a particular feminist point of view, MacKinnon (1987) wrote: “To look at the world objectively is to objectify it.” Striving for objectivity itself is seen here as a specific and potentially harmful perspective, implying a denial of the specific conditions of an observer’s point of view. A similar point was made by Feyerabend (1978).

[1] Mayo emphasizes that her approach does not require being a realist; according to our reading, she is in any case concerned with observer-independent reality, as opposed to the positivists, without subscribing to naive and all too optimistic ideas about what we can know about it.
Maturana (1988) critically discussed the “explanatory path of objectivity-without-parenthesis” in which observers deny personal responsibility for their positions based on a supposedly privileged access to an objective reality; he appreciated a more perspective-dependent attitude, which he called “objectivity-with-parenthesis.”

Fuchs (1997) gives an overview of controversies regarding the role of objectivity in communication and discourse, looking at ideas such as that objectivity may be merely a rhetorical device, or a tool of power, or, on the other hand, a tool to defend suppressed views against power. Although he is critical of such ideas if interpreted in a reductionist manner, in his own theory he portrays objectivity as a communicative “medium” of science, which is to some extent constitutive and essential for science, particularly for fostering consensus, although it comes with its own “blind spot.” We can connect with such a sociological perspective regarding the aspect that our criticism of objectivity rather points at its role in the (statistics) discourse, which we do not see as indispensable, than at its meanings.

A recent paper titled “Let’s not talk about objectivity” (Hacking, 2015) suggests that statistics is not alone in handling the concepts of objectivity and subjectivity in a messy way, and that we are not alone in advocating that such discourse should be replaced by looking at more specific virtues of scientific work. However, despite our proposal to replace these terms in statistics, we appreciate the deep insight manifest in the philosophical controversies about objectivity and subjectivity and in the existing attempts to define them in a meaningful way.
The reader will realize that our proposal was strongly influenced by the elaboration of different senses and types of objectivity in the philosophy of science, as discussed above. We also respect the view that it is valuable to stick to a concept of objectivity that can serve to distinguish valid from (in some way) biased inference and science from pseudo-science. Deborah Mayo even suggested to us that the term “objective” could refer to an appropriate use of all our positive attributes combined, whereas what we are really (and correctly) opposed to is some kind of misleading “fake” notion of objectivity. However, facing some of the damage done by more common and much less thought through uses of “objectivity” and “subjectivity” in everyday exchanges, we still believe that our proposal will result in an improvement of the discussion culture in statistics overall.

4.4. Our attitude toward objectivity and subjectivity in science

The attitude taken in the present paper is based on Hennig (2010), which was in turn inspired by constructivist philosophy (Maturana, 1988, von Glasersfeld, 1995) and distinguishes personal reality, social reality, and observer-independent reality. According to this perspective, human inquiry starts from observations that are made by personal observers (personal reality). Through communication, people share observations and generate social realities that go beyond a personal point of view. These shared realities include for example measurement procedures that standardize observations, and mathematical models that connect observations to an abstract formal system that is meant to create a thought system cleaned from individually different points of view.
Nevertheless, human beings only have access to observer-independent reality through personal observations and how these are brought together in social reality.

According to Hennig (2010), science aims at arriving at a view of reality that is stable and reliable and can be agreed freely by general observers and is therefore as observer-independent as possible. In this sense we see objectivity as a scientific ideal. But at the same time we acknowledge what gave rise to the criticism of objectivity: the existence of different individual perspectives and also of perspectives that differ between social systems, and therefore the ultimate inaccessibility of a reality that is truly independent of observers, is a basic human condition. Objectivity can only be attributed by observers, and if observers disagree about what is objective, there is no privileged position from which this can be decided. Ideal objectivity can never be achieved.

This does not imply, however, that scientific disputes can never be resolved by scientific means. Yes, there is an element of “politics” involved in the adjudication of scholarly disagreements, but, as we shall discuss, the norm of transparency and other norms associated with both objectivity and subjectivity can advance such discussions. In general no particular observer has a privileged position but this does not mean that all positions are equal. We recognize subjectivity not to throw up our hands and give up on the possibility of scientific consensus but as a first step to exploring and, ideally, reconciling, the multiple perspectives that are inevitable in nearly any human inquiry. Denying the existence of different legitimate subjective perspectives and of their potential to contribute to scientific inquiry cannot make sense in the name of objectivity.
Heterogeneous points of view cannot be dealt with by imposing authority. Our attitude values the attempt to reach scientific agreement between different perspectives, but ideally such an agreement is reached by free exchange between the different points of view. In practice, however, agreement will not normally be universal, and in order to progress, science has to aim at a more restricted agreement between experts who have enough background knowledge to either make sure that the agreement about something new is in line with what was already established earlier, or to know that and how it requires a revision of existing knowledge. But the resulting agreement is still intended to be potentially open for everyone to join or to challenge. Therefore, in science there is always a tension between the ideal of general agreement and the reality of heterogeneous perspectives.

Furthermore our attitude to science is based on the idea that consensus is possible regarding stable and reliable statements about the observed reality (which may require elaborate measurement procedures), and that science aims at nontrivial knowledge in the sense that it makes statements about observable reality that can and should be checked and potentially falsified by observation. Although there is no objective access to observer-independent reality, we acknowledge that there is an almost universal human experience of a reality perceived as located outside the observer and as not controllable by the observer. This reality is a target of science, although it cannot be taken for granted that it is indeed independent of the observer.
We are therefore “active scientific realists” in the sense of Chang (2012), who writes: “I take reality as whatever is not subject to one’s will, and knowledge as an ability to act without being frustrated by resistance from reality. This perspective allows an optimistic rendition of the pessimistic induction, which celebrates the fact that we can be successful in science without even knowing the truth. The standard realist argument from success to truth is shown to be ill-defined and flawed.” This form of realism is not in contradiction to the criticism of realism by van Fraassen or the arguments against the desirability of certain forms of objectivity by constructivists or feminists as outlined above.

Active scientific realism implies that finding out the truth about objective reality is not the ultimate aim of science, but that science rather aims at supporting human actions. This means that scientific methodology has to be assessed relative to the specific aims and actions connected to its use. Another irreducible subjective element in science, apart from multiple perspectives on reality, is therefore the aim of scientific inquiry, which cannot be standardized in an objective way. A typical statistical instance of this is how much prediction accuracy in a restricted setting is valued compared with parsimony and interpretability.

Because science aims at agreement, communication is central to science, as are transparency and techniques for supporting the clarity of communication. Among these techniques are formal and mathematical language, standardized measurement procedures, and scientific models. Objectivity as we see it is therefore a scientific ideal that can never fully be achieved.
As much as science aims for objectivity, it has to acknowledge that it can only be built from a variety of subjective perspectives through communication.

5. Decomposing subjectivity and objectivity in the foundations of statistics

In this section, we use the above list of virtues to revisit aspects of the discussion on fundamental approaches to statistics, for which the terms “subjective” and “objective” typically play a dominant role. We discuss what we perceive to be the major streams of the foundations of statistics, but within each of these streams there exist several different approaches, which we cannot cover completely in such a paper; rather we sketch the streams somewhat roughly and refer to only a single or a few leading authors for details where needed.

Here, we distinguish between interpretations of probability, and approaches for statistical inference. Thus, we take frequentism to be an interpretation of probability, which does not necessarily imply that Fisherian or Neyman-Pearson tests are preferred to Bayesian methods, despite the fact that frequentism is more often associated with the former than with the latter.

We shall go through several philosophies of statistical inference, for each laying out the connections we see to the virtues of objectivity and subjectivity outlined in Section 2.3.

Exercising awareness of multiple perspectives, we emphasize that we do not believe that one of these philosophies is the correct or best one, nor do we claim that reducing the different approaches to a single one would be desirable. What is lacking here is not unification, but rather, often, transparency about which interpretation of probabilistic outcomes is intended when applying statistical modeling to specific problems.
Particularly, we think that, depending on the situation, both “aleatory” or “epistemic” approaches to modeling uncertainty are legitimate and worthwhile, referring to data generating processes in observer-independent reality on one hand and rational degrees of belief on the other.

5.1. Frequentism

We label “frequentism” as the identification of the probability of an event in a certain experiment with a limiting relative frequency of occurrences if the experiment were to be carried out infinitely often in some kind of independent manner. Frequentist statistics is based on evaluating procedures based on a long-term average over a “reference set” of hypothetical replicated data sets. In the wider sense, we call probabilities “frequentist” when they formalize observer-independent tendencies or propensities of experiments to yield certain outcomes (see, for example, Gillies, 2000), which are thought of as replicable and yielding a behavior under infinite replication as suggested by what is assumed to be the “true” probability model.

The frequentist mindset locates probabilities in the observer-independent world, so they are in this sense objective. This objectivity, however, is model-based, as an infinite amount of actual replicates cannot exist, and most researchers, in most settings, would be skeptical about truly identical replicates and true independence or, when it comes to observational studies, about whether observations can be interpreted as drawn in a purely random manner from an appropriate reference set.

The decision to adopt the frequentist interpretation of probability regarding a certain phenomenon therefore requires idealization.
It cannot be justified in a fully objective way, which here means, referring to our list of virtues, that it can neither be enforced by observation, nor is there general enough consensus that this interpretation applies to any specific setup, although it is well discussed and supported in some physical settings such as radioactive decay (O2, O4). Once a frequentist model is adopted, however, it makes predictions about observations that can be checked, so the reference to the observable reality (O4) is clear.

There is some disagreement about whether the frequentist definition of probability is clear and unambiguous (O1a). On one hand, the idea of a tendency of an experiment to produce certain outcomes as manifested in observed and expected relative frequencies seems clear enough, given that the circumstances of the experiment are well defined and regardless of whether frequencies indeed behave in the implied way. On the other hand, von Mises (1957) was not completely successful in his attempt to avoid involving stochastic independence and identity in the definition of frequentist probabilities through the concepts of the collective and the axiom of invariance under place selection rules (Fine, 1973), and the issue has never been completely resolved.

Frequentism implies that, in the observer-independent reality, true probabilities are unique, but there is considerable room for multiple perspectives (S1) regarding the definition of replicable experiments, collectives, or reference sets. The idea of replication is often constructed in a rather creative way. For example, frequentist time series models are used for time series data, implying an underlying true distribution for every single time point, but there is no way to repeat observations independently at the same time point.
This actually means that the effective sample size for time series data would be 1, if replication was not implicitly constructed in the statistical model, for example by assuming independent innovations in ARMA-type models. Such models, or, more precisely, certain aspects of such models, can be checked against the data, but even if such a check does not fail, it is still clear that there is no such thing in observable reality, even approximately, as a marginal “true” frequentist distribution of the value of the time series x_t at fixed t, as implied by the model, because x_t is strictly not replicable.

The issue that useful statistical models require a construction of replication (or exchangeability) on some level by the statistician, is, as we discuss below, not confined to frequentist models. In order to provide a rationale for the essential statistical task of pooling information from many observations to make inference relevant for future observations, all these observations need to be assumed to somehow represent the same process. The appropriateness of such assumptions in a specific situation can often only be tested in a quite limited way by observations. All kinds of informal arguments can apply about why it is a good or bad idea to consider a certain set of observations (or unobservable implied entities such as error terms and latent variables) as independent and identically distributed frequentist replicates.

Unfortunately, although such an openness to multiple perspectives and potential context-dependence (S2a) can be seen as positive from our perspective, these issues involved in the choices of a frequentist reference set are often not clearly communicated and discussed.
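The remark on time series above, that replication is constructed inside the model rather than observed, can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a first-order autoregressive model x_t = phi * x_(t-1) + e_t with independent standard normal innovations e_t as one simple member of the ARMA family, and the function names, the coefficient 0.7, the starting value 0, and the series length are illustrative choices that do not come from the text.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with i.i.d. standard normal e_t."""
    rng = random.Random(seed)
    innovations = [rng.gauss(0.0, 1.0) for _ in range(n)]
    series = []
    previous = 0.0  # assumed starting value
    for e in innovations:
        previous = phi * previous + e
        series.append(previous)
    return series, innovations


def implied_innovations(series, phi):
    """Innovations implied by an AR(1) model with coefficient phi."""
    previous = 0.0  # same assumed starting value as in the simulation
    residuals = []
    for x in series:
        residuals.append(x - phi * previous)
        previous = x
    return residuals


# Each x_t is observed exactly once; only the innovations play the role of replicates.
series, innovations = simulate_ar1(phi=0.7, n=200)
recovered = implied_innovations(series, phi=0.7)  # matches the simulated innovations
wrong = implied_innovations(series, phi=0.0)      # a different model, different "replicates"
```

With the coefficient used in the simulation, the implied innovations agree with the simulated ones up to rounding; with a different coefficient, the same series yields a different set of supposed replicates. What counts as a replicate is therefore a property of the chosen model, not of the data alone.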
The existence of a true model with implied reference set is typically taken for granted by frequentists, motivated at least in part by the desire for objectivity.

From the perspective taken here and in Hennig (2010), the frequentist interpretation of probability can be adopted as an idealized model, a thought construction, without having to believe that frequentist probabilities really exist in the observer-independent world (many criticisms of frequentism such as most of the issues raised in Hajek, 2009, refer to a belief in the “existence” of limits of hypothetical sequences that are impossible in the real world). This can be justified, on a case-by-case basis, if it is seen as useful for the scientific aims in the given situation, for example because a specific frequentist model communicates (more or less) clearly the scientist’s view of a certain phenomenon (O1a), and implies the means for testing this against observations (O4).

5.2. Error statistics

The term “error statistics” was coined by the philosopher Deborah Mayo (1996). We use it here to refer to an approach to statistical inference that is based on a frequentist interpretation of probability and methods that can be characterized and evaluated by error probabilities. Traditionally these would be the Type I and Type II errors of Neyman-Pearson hypothesis testing, but the error-statistical perspective could also apply to other constructs such as errors of sign and magnitude (“Type S” and “Type M” errors; Gelman and Carlin, 2014). Mayo (1996) introduced another key concept for error statistics, “severity,” which is connected with, but not identical to, the power of tests.
It serves to quantify the extent to which a test result can corroborate a hypothesis (keeping in mind that testing specific statistical hypotheses can only ever shed light on specific aspects of a scientific theory of interest; and that a specific test can only corroborate a specific aspect of a hypothesized statistical model). The severity principle states that a test result can only be evidence of the absence of a certain discrepancy from a (null) hypothesis, if the probability is high, given that such a discrepancy indeed existed, that the test result would have been less in line with the hypothesis than what was observed[2].

According to Mayo and Spanos (2010), objectivity is a core concern of error statistics, which is specifically driven by providing methodology for reproduction, testing, and falsification (O4b). Mayo (2014) defined objective scientific measurement as being “relevant,” “reliably capable,” and “able to learn from error,” which can be interpreted as the error-statistical rationale for consensus (O2c). Error statistical methodology is portrayed as “reliably capable” as far as its potential to produce inferential errors can be analyzed, and as far as the resulting error probabilities are low. The “ability to learn from error” refers to erroneous hypotheses, rejected by an error statistical procedure that optimally can pinpoint the reason for rejection and thus lead to an improvement of the hypothesis, rather than errors of the inferential method. The underlying idea, with which we agree, is that learning from error is a main driving force in science, a lifetime contract between the mode of statistical investigation and its object. This corresponds to Chang’s active scientific realism mentioned above, and it implies that for Mayo the reference to observations is central for objectivity.
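The severity principle stated above can be made concrete in one simple setting. The sketch below assumes a one-sided test about the mean of a Normal distribution with known standard deviation, a null value of 0, and an observed sample mean; the severity of the claim "the mean is at most mu1" is then computed as the probability, under a mean of exactly mu1, of a sample mean larger (less in line with the null hypothesis) than the one observed. The function name and all numbers are hypothetical, and this is only one possible formalization of the verbal principle.

```python
from math import sqrt
from statistics import NormalDist


def severity_no_discrepancy(xbar, mu1, sigma, n):
    """Severity for the claim 'mean <= mu1' after observing sample mean xbar.

    Computed as the probability, when the true mean equals mu1, that the
    sample mean would have been larger than the observed xbar.
    """
    standard_error = sigma / sqrt(n)
    return 1.0 - NormalDist().cdf((xbar - mu1) / standard_error)


# Hypothetical numbers: null value 0, sigma = 1, n = 25, observed mean 0.1.
sev_small = severity_no_discrepancy(xbar=0.1, mu1=0.2, sigma=1.0, n=25)  # about 0.69
sev_large = severity_no_discrepancy(xbar=0.1, mu1=0.6, sigma=1.0, n=25)  # about 0.99
```

Under these assumed numbers the same non-significant result is strong evidence against a discrepancy as large as 0.6 but only weak evidence against a discrepancy as large as 0.2, which is the sense in which severity is connected with, but not identical to, power.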
[2] Mayo applies the term “severity” also more generally, not confined to statistics.

Mayo’s “relevance” concerns the problem of inquiry of interest and is therefore related to virtue S2a, which we classified as related to subjectivity. As Mayo attempts to defend the objectivity of the error statistical approach against charges of subjectivity, she may not be happy about this classification, but we agree with her that this is an important virtue nonetheless, which, however, is not specifically connected to error statistics.

The error probability characteristics of error statistical methods rely, in general, on model assumptions. In principle, these model assumptions can be tested in an error statistical manner, too, and are therefore, according to Mayo, no threat to the objectivity of the account. But this comes with two problems. Firstly, derivations of statistical inference based on error probabilities typically assume the model as fixed and do not account for prior model selection based on the data. This issue has recently attracted some research (for example, Berk et al., 2013), but this still requires a transparent listing of all the possible modeling decisions that could be made (virtue O1b), which often is missing, and which may not even be desirable as long as the methods are used in an exploratory fashion (Gelman and Loken, 2014). Secondly, any dataset can be consistent with many models, which can lead to divergent inferences. Davies (2014) illustrates this with the analysis of a dataset on amounts of copper in drinking water, which can be fitted well by a Gaussian, a double exponential, and a comb distribution, but yields vastly different confidence intervals for the center of symmetry (which is assumed to be the target of inference) under these three models[3].
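The way one dataset can lead to different intervals under different models can be sketched with a toy calculation. The numbers below are made up for illustration and are not the copper data analysed by Davies (2014); only two of the three models are covered (the comb distribution is omitted), and both intervals rely on large-sample approximations that are crude for ten observations. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev

# Made-up measurements for illustration only; NOT the data used by Davies (2014).
data = [1.9, 2.0, 2.0, 2.1, 2.1, 2.2, 2.2, 2.3, 2.4, 3.5]
z = NormalDist().inv_cdf(0.975)  # two-sided 95% standard normal quantile


def gaussian_interval(values):
    """Approximate 95% interval for the center under a Gaussian model."""
    center = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return center - half_width, center + half_width


def laplace_interval(values):
    """Approximate 95% interval for the center under a double exponential model.

    The maximum likelihood estimate of the center is the sample median, the
    scale estimate is the mean absolute deviation from it, and the large-sample
    standard error of the median under this model is scale / sqrt(n).
    """
    center = median(values)
    scale = mean([abs(v - center) for v in values])
    half_width = z * scale / sqrt(len(values))
    return center - half_width, center + half_width


gauss_low, gauss_high = gaussian_interval(data)
laplace_low, laplace_high = laplace_interval(data)
```

For these invented numbers the Gaussian interval is centered at the mean (2.27) with a half-width of roughly 0.28, while the double exponential interval is centered at the median (2.15) with a half-width of roughly 0.15. Neither model is singled out by the data, yet the reported uncertainty about the center differs by almost a factor of two.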
Davies suggests that it is misleading to hypothesize models or parameters to be “true,” and that one should instead take into account all models that are “adequate” for approximating the data in the sense that they are not rejected by tests based on features of the data the statistician is interested in, which does not require reference to unobservable true frequentist probabilities, but takes into account error probabilities as well. Such an approach is tied to the observations in a more direct way without making metaphysical assumptions about unobservable features of observer-independent reality (O1a, O4). However, it is possible that such a metaphysical assumption is implicitly still needed if the researcher wants to use “data approximating models” to learn about observer-independent reality, and that the class of all adequate models is too rich for meaningful inference (as in more standard frequentist treatments, Davies focuses on models with independent and identically distributed random variables or error terms). Earlier work on robust statistics (see Huber and Ronchetti, 2009) already introduced the idea of sets of models that neighbor a nominal model, from which the models in the neighborhood could not be reliably distinguished based on the data.

Even further flexibility in error statistical analyses comes from the fact that the assumption of a single true underlying distribution does not determine the parametric or nonparametric family of distributions, within which the true distribution is embedded. Although Neyman and Pearson derived optimal tests considering specific alternatives to the null hypothesis, many kinds of alternatives and test statistics could be of potential interest.
Davies (2014) explicitly mentions the dependence of the choice of statistics for checking the adequacy of models on the context and the researcher’s aims (S2a) instead of relying on Neyman-Pearson type optimality results.

Overall, there is no shortage of entry points for multiple perspectives (S1) in the error statistical approach. This could be seen as something positive, but it runs counter to some extent to the way the approach is advertised as objective by some of its proponents. Many frequentist and error statistical analyses could in our opinion benefit from acknowledging honestly their flexibility and the researcher’s choices made, many of which cannot be determined by data alone.

5.3. Subjectivist Bayesianism

We call “subjectivist epistemic” the interpretation of probabilities as quantifications of strengths of belief of an individual, where probabilities can be interpreted as derived from, or implementable through, bets that are coherent in that no opponent can cause sure losses by setting up some combinations of bets. From this requirement of coherence, the usual probability axioms follow (O2c). Allowing conditional bets implies Bayes’s theorem, and therefore, as far as inference concerns learning from observations about not (yet) observed hypotheses, Bayesian methodology is used for subjectivist epistemic probabilities, hence the term “subjectivist Bayesianism.”

Footnote 3: Davies (2014) uses the example for a wider discussion of modeling issues including regularization and defects of the likelihood.

A major proponent of subjectivist Bayesianism was Bruno de Finetti (1974). De Finetti was not against objectivity in general. He viewed observed facts as objective, as well as mathematics and logic and certain formal conditions of random experiments such as the set of possible outcomes.
But he viewed uncertainty as something subjective and he held that objective (frequentist) probabilities do not exist. He claimed that his subjectivist Bayesianism appropriately takes into account both the objective (see above) and subjective (opinions about unknown facts based on known evidence) components for probability evaluation. Given the degree of idealization required for frequentism as discussed in Section 5.1, this is certainly a legitimate position.

In de Finetti’s work the term “prior” refers to all probability assignments using information external to the data at hand, with no fundamental distinction between the “parameter prior” assigned to parameters in a model, and the form of the “sampling distribution” given a fixed parameter, in contrast to common Bayesian practice today, in which the term “prior” is used to refer only to the parameter prior. In the following discussion we shall use the term “priors” in de Finetti’s general sense.

Regarding the list of virtues in Section 2.3, de Finetti provides a clear definition of probability (O1a) based on principles that he sought to establish as generally acceptable (O2c). As opposed to objectivist Bayesians, subjectivist Bayesians do not attempt to enforce agreement regarding prior distributions, not even given the same evidence; still, de Finetti (1974) and other subjectivist Bayesians proposed rational principles for assigning prior probabilities.
The difference between the objectivist and subjectivist Bayesian point of view is rooted in the general tension in science explained above; the subjectivist approach can be criticized for not supporting agreement enough (conclusions based on one prior may be seen as irrelevant for somebody who holds another one; O2c) but can be defended for honestly acknowledging that prior information often does not come in ways that allow a unique formalization (S2b). In any case it is vital that subjectivist Bayesians explain transparently how they arrive at their priors, so that other researchers can decide to what extent they can support the conclusions (O1c). Such transparency is desirable in any statistical approach but is particularly relevant for subjective Bayesian models, which cannot be rejected within the subjectivist paradigm in case of disagreement with observations.

In de Finetti’s conception, probability assessments, prior and posterior, can ultimately only concern observable events, because bets can only be evaluated if the experiment on which a bet is placed has an observable outcome, and so there is a clear connection to observables (O3a). However, priors in the subjectivist Bayesian conception are not open to falsification (O3b), because by definition they have to be fixed before observation. Adjusting the prior after having observed the data to be analyzed violates coherence. The Bayesian system as derived from axioms such as coherence (as well as those used by objectivist Bayesians; see Section 5.4) is designed to cover all aspects of learning from data, including model selection and rejection, but this requires that all potential later decisions are already incorporated in the prior, which itself is not interpreted as a testable statement about yet unknown observations.
In particular this means that once a subjectivist Bayesian has assessed a setup as exchangeable a priori, he or she cannot drop this assumption later, whatever the data are (think of observing twenty zeroes, then twenty ones, then ten further zeroes in a binary experiment). This is a major problem, because subjectivist Bayesians use de Finetti’s theorem to justify working with parameter priors and sampling models under the assumption of exchangeability, which is commonplace in Bayesian statistics. Dawid (1982) discussed calibration (quality of match between predictive probabilities and the frequency with which the predicted events happen) of subjectivist Bayesian inferences, and he suggested that badly calibrated Bayesians could do well to adjust their future priors if this is needed to improve calibration, even at the cost of violating coherence.

Subjectivist Bayesianism scores well on the subjective virtues S1 and S2b. But it is a limitation that the prior distribution exclusively formalizes belief; context and aims of the analysis do not enter unless they have implications about belief.
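The binary sequence mentioned above can be used for a small sketch of what commitment to exchangeability means in practice. The sketch assumes one particular exchangeable model, independent Bernoulli trials given a success probability with a uniform Beta(1, 1) prior; this model and the helper names are illustrative assumptions, not part of de Finetti's argument. Under such a model the posterior depends on the data only through the counts of zeroes and ones, so the striking run structure of the sequence has no influence on the conclusions.

```python
from fractions import Fraction

# Sequence from the text: twenty zeroes, twenty ones, then ten further zeroes.
observed = [0] * 20 + [1] * 20 + [0] * 10
# Hypothetical reordering of the same 30 zeroes and 20 ones without long runs.
reordered = [0, 1, 0, 1, 0] * 10


def beta_posterior(seq, a=1, b=1):
    """Posterior Beta parameters for a Beta(a, b) prior and Bernoulli data."""
    ones = sum(seq)
    zeros = len(seq) - ones
    return a + ones, b + zeros


def predictive_prob_of_one(seq, a=1, b=1):
    """Posterior predictive probability that the next observation is a one."""
    a_post, b_post = beta_posterior(seq, a, b)
    return Fraction(a_post, a_post + b_post)


def count_runs(seq):
    """Number of maximal blocks of identical consecutive values."""
    return 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if prev != cur)


for name, seq in (("observed", observed), ("reordered", reordered)):
    print(name, beta_posterior(seq), predictive_prob_of_one(seq), count_runs(seq))
```

The two sequences lead to identical posteriors and predictions although one consists of three long runs and the other alternates almost constantly. A researcher who would want to react to the run structure has to step outside the exchangeable model, which is what coherence forbids once the prior is fixed.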
In practice, an exhaustive elicitation of beliefs is rarely feasible, and mathematical and computational convenience often plays a role in setting up subjective priors, despite de Finetti’s having famously accused frequentists of “adhockeries for mathematical convenience.” Furthermore, the assumption of exchangeability will hardly ever precisely match an individual’s beliefs in any situation (even if there is no specific reason against exchangeability in a specific setup, the implicit commitment to stick to it whatever will be observed seems too strong), but some kind of exchangeability assumption is required by Bayesians for the same reason for which frequentists need to rely on independence assumptions: some internal replication in the model is needed to allow generalization or extrapolation to future observations; see Section 5.1.

Summarizing, we view much of de Finetti’s criticism of frequentism as legitimate, and subjectivist Bayesianism comes with a commendable honesty about the impact of subjective decisions and allows for flexibility accommodating multiple perspectives. But checking and falsification of the prior is not built into the approach, and this can get in the way of agreement between observers. Furthermore, some problems of the frequentist approach criticized by de Finetti and his disciples stem from the unavoidable fact that useful mathematical models idealize and simplify personal and social perspectives on reality (see Hennig, 2010 and above), and the subjectivist Bayesian approach incurs such issues as well.

5.4. Objectivist Bayesianism

Given the way objectivity is often advertised as a key scientific virtue (often without specifying what exactly it means), it is not surprising that de Finetti’s emphasis on subjectivity is not shared by all Bayesians, and that there have been many attempts to specify prior distributions in a more objective way. Currently the approach of E. T. Jaynes (2003) seems to be among the most popular. As with many of his predecessors such as Jeffreys and Carnap, Jaynes saw probability as a generalization of binary logic to uncertain propositions. Cox (1961) proved that given a certain list of supposedly common-sense desiderata for a “plausibility” measurement, all such measurements are equivalent, after suitable scaling, to probability measures. This theorem is the basis of Jaynes’ objectivist Bayesianism, and the claim to objectivity comes from postulating that, given the same information, everybody should come to the same conclusions regarding plausibilities: prior and posterior probabilities (O2c), a statement with which subjectivist Bayesians disagree.

In practice, this objectivist ideal seems to be hard to achieve, and Jaynes (2003) admits that setting up objective priors including all information is an unsolved problem. One may wonder whether his ideal is achievable at all.
For example, in chapter 21, he gives a full Bayesian “solution” to the problem of dealing with and identifying outliers, which assumes that prior models have to be specified for both “good” and “bad” data (between which therefore there has to be a proper distinction), including parameter priors for both models, as well as a prior probability for any number of observations to be “bad.” It is hard to see, and no information about this is provided by Jaynes himself, how it can be possible to translate the unspecific information of knowing of some outliers in many kinds of situations, some of which are more or less related, but none identical (say) to the problem at hand, into precise quantitative specifications as needed for Jaynes’ approach in an objective way, all before seeing the data.

Setting aside the difficulties of working with informally specified prior information, even the more elementary key issue of specifying an objective prior distribution formalizing the absence of information is riddled with difficulties, and there are various principles for doing this which disagree in many cases (Kass and Wasserman, 1996). Objectivity seems to be an ambition rather than a description of what indeed can be achieved by setting up objectivist Bayesian priors. More modestly, therefore, Bernardo (1979) spoke of “reference priors,” avoiding the term “objective,” and emphasizing that it would be desirable to have a convention for such cases (O2b), but admitting that it may not be possible to prove any general approach for arriving at such a convention uniquely correct or optimal in any rational sense.
Apart from the issue of the objectivity of the specification of the prior, by and large the objectivist Bayesian approach has similar advantages and disadvantages regarding our list of virtues as the subjectivist Bayesian approach. Particularly it comes with the same difficulties regarding the issue of falsifiability from observations. Prior probabilities are connected to logical analysis of the situation rather than to betting rates for future observations as in de Finetti’s subjectivist approach, which makes the connection of objectivist Bayesian prior probabilities to observations even weaker than in the subjectivist Bayesian approach (but probabilistic logic has applications other than statistical data analysis, for which this may not be a problem).

The merit of objectivist Bayesianism is that the approach comes with a much stronger drive to justify prior distributions in a transparent way using principles that are as clear and general as possible. This drive, together with some subjectivist honesty about the fact that, despite trying hard, in the vast majority of applications the resulting prior will not deserve the “objectivity” stamp and will still be subject to potential disagreement, can potentially combine the best of both of these traditional Bayesian worlds.

5.5. Falsificationist Bayesianism

For both subjectivist and objectivist Bayesians, following de Finetti (1974) and Jaynes (2003), probability models including both parameter priors and sampling models do not model the data generating process, but rather represent plausibility or belief from a certain point of view. Plausibility and belief models can be modified by data in ways that are specified a priori, but they cannot be falsified by data.
In much applied Bayesian work, on the other hand, the sampling model is interpreted, explicitly or implicitly, as representing the data-generating process in a frequentist or similar way, and parameter priors and posteriors are interpreted as giving information about what is known about the “true” parameter values. It has been argued that such work does not directly run counter to the subjectivist or objectivist philosophy, because the “true parameter values” can often be interpreted as expected large sample functions given the prior model (Bernardo and Smith, 1994), but the way in which classical subjectivist or objectivist statistical data analysis is determined by the untestable prior assignments is seen as unsatisfactory by many statisticians.

The suggestion of testing aspects of the prior distribution by observations using error statistical techniques has been around for some time (Box, 1980). Gelman and Shalizi (2013) incorporate this in an outline of what we refer to here as “falsificationist Bayesianism,” a philosophy that openly deviates from both objectivist and subjectivist Bayesianism, integrating Bayesian methodology with an interpretation of probability that can be seen as frequentist in a wide sense and with an error statistical approach to testing assumptions in a bid to improve Bayesian statistics regarding virtue O4b.

Falsificationist Bayesianism follows the frequentist interpretation of the probabilities formalized by the sampling model given a true parameter, so that these models can be tested using error statistical techniques (with the limitations that such techniques have, as discussed in Section 5.2).
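One way in which a sampling model can be tested within a Bayesian analysis is a posterior predictive check. The following is a minimal sketch under stated assumptions, not the procedure of any of the cited papers: the data are the binary sequence used as an example in Section 5.3, the model is independent Bernoulli trials with a uniform Beta(1, 1) parameter prior, and the test statistic (the number of switches between neighboring values) as well as the function names, the number of simulations, and the seed are illustrative choices.

```python
import random

# Binary sequence from Section 5.3: twenty zeroes, twenty ones, ten zeroes.
observed = [0] * 20 + [1] * 20 + [0] * 10


def count_switches(seq):
    """Number of positions at which the value differs from its predecessor."""
    return sum(1 for prev, cur in zip(seq, seq[1:]) if prev != cur)


def posterior_predictive_pvalue(seq, n_sims=2000, a=1.0, b=1.0, seed=0):
    """Share of replicated datasets with at most as many switches as observed.

    Each replication draws a success probability from the Beta posterior and
    then simulates independent Bernoulli trials of the same length as seq.
    """
    rng = random.Random(seed)
    ones = sum(seq)
    zeros = len(seq) - ones
    t_obs = count_switches(seq)
    at_most = 0
    for _ in range(n_sims):
        theta = rng.betavariate(a + ones, b + zeros)
        replicate = [1 if rng.random() < theta else 0 for _ in seq]
        if count_switches(replicate) <= t_obs:
            at_most += 1
    return at_most / n_sims


p_value = posterior_predictive_pvalue(observed)
print("observed switches:", count_switches(observed))
print("posterior predictive p-value:", p_value)
```

A p-value close to zero indicates that datasets generated by the fitted model almost never show as few switches as the observed one, which counts as evidence against the independence assumption of the sampling model. Acting on such evidence by revising the model is the step that falsificationist Bayesianism permits and that strict coherence rules out.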
Gelman and Shalizi argue, as some frequentists do, that such models are idealizations and should not be believed to be literally true, but that the scientific process proceeds from simplified models through test and potential falsification by improving the models where they are found to be deficient. This reflects certain attitudes of Jaynes (2003), with the difference that Jaynes generally considered probability models as derivable from constraints of a physical system, whereas Gelman and Shalizi focus on examples in social or network science which are not governed by simple physical laws and thus where one cannot in general derive probability distributions from first principles, so that “priors” (in the sense that we are using the term in this paper, encompassing both the data model and the parameter model) are more clearly subjective.

A central issue for falsificationist Bayesianism is the meaning and use of the parameter prior, which can have various interpretations, which gives falsificationist Bayesianism a lot of flexibility for taking into account multiple perspectives, contexts, and aims (S1, S2a) but may be seen as a problem regarding clarity and unification (O1a, O2c). Frequentists may wonder whether a parameter prior is needed at all. Here are some potential benefits of incorporating a parameter prior:

• The parameter prior may formalize relevant prior information.
• The parameter prior may be a useful device for regularization.
• The parameter prior may formalize deliberately extreme points of view to explore sensitivity of the inference.
• The parameter prior may make transparent a point of view involved in an analysis.
• The parameter prior may facilitate a certain kind of behavior of the results that is connected to the aims of analysis (such as penalizing complexity or models on which it is difficult to act by giving them low prior weight).
• The Bayesian procedure involving a certain parameter prior may have better error statistical properties (such as the mean squared error of point estimates derived from the posterior) than a straightforward frequentist method, if such a method even exists.
• Often finding a Bayesian parameter prior which emulates a frequentist/error statistical method helps understanding the implications of the method.

Here are some ways to interpret the parameter prior:

• The parameter prior may be interpreted in a frequentist way, as formalizing a more or less idealized data generating process generating parameter values. The “generated” parameter values may not be directly observable, but in some applications the idea of having, at least indirectly, a sample of several parameter values from the parameter prior makes sense (“empirical Bayes”). In many other applications the idea is that only a single parameter from the parameter prior is actually realized, which then gives rise to all the observed data. Even in these applications one could in principle postulate a data generating process behind the parameter, of which only one realization is observable, and only indirectly. This is a rather bold idealization, but frequentists are no strangers to such idealizations either; see Section 5.1.
A similarly bold idealization would be to view “all kinds of potential studies with the (statistically) same parameter” as the relevant population, even if the studies are about different topics with different variables, in which case more realizations exist, but it is hard to view a specific study of interest as a “random draw” from such a population. If parameter priors are interpreted in this sense, they can actually be tested and falsified using error statistical methods; see Gelman, Meng and Stern (1996). In situations with only one parameter realization, the power of such tests is low, though, and any kind of severe corroboration will be hard to achieve. Also, if there is only a single realization of an idealized parameter distribution, the information in the parameter posterior seems to rely strongly on idealization.
• If the quality of the inference is to be assessed by error statistical measures, the parameter prior may be seen as a purely technical device. In this case, however, the posterior distribution does not have a proper interpretation, and only well defined statistics with known error statistical properties such as the mean or mode of the parameter posterior should be interpreted.
• Assuming that frequentist probabilities from sampling models should be equal to the subjectivist or objectivist epistemic probabilities if it is known that the sampling model is true (which Lewis, 1980, called “the principal principle”), the parameter prior can still be interpreted as giving epistemic probabilities such as subjectivist betting rates, conditionally on the sampling model holding, even if the sampling model is interpreted in a frequentist way.
The possibility of rejecting the sampling model based on the data will invalidate both coherence and Cox’s axioms, so that the foundation for the resulting epistemic probabilities becomes rather shaky. This does not necessarily have to stop an individual from interpreting and using them as betting rates, though.

Given such a variety of uses and meanings, it is crucial for applications of falsificationist Bayesianism that the choice of the parameter prior is clearly explained and motivated, so transparency is central here as well as for the other varieties of Bayesian statistics.

Overall, falsificationist Bayesianism combines the virtue of error statistical falsifiability with the virtues listed above as “subjective,” doing so via a flexibility that may be seen by some as problematic regarding clarity and unification.

6. Other philosophies

There are important perspectives on statistics that lie outside the traditional frequentist-Bayesian divide.

In machine learning, the focus is on prediction rather than parameter estimation, thus the emphasis is on correspondence to observable reality (O4). Computer scientists are also interested in transparency, disclosure of data, and methods with full reproducibility (O1), but are sometimes less attuned to multiple perspectives and context dependence (S1, S2). Such attributes are necessary in practice (users have many “knobs” to tune in external validation, including the objective function being optimized, the division into training and test sets, and the choice of corpus to use in the evaluation) but are typically pushed to the background.

In robust statistics, the point is to assess stability of inferences when assumptions are violated, or to make minimal assumptions. This connects to impartiality (O3).
There is literature on classical and Bayesian robustness; in any case consideration of model violations requires awareness of multiple perspectives (S1). Striving for robustness (against disturbances of systems, observations, assumptions) can itself be seen as a scientific virtue, although it is not normally associated with either objectivity or subjectivity.

Alternative models of uncertainty such as belief functions, imprecise probabilities or fuzzy logic aim to get around some of the limitations of probability theory (most notoriously, the difficulty of distinguishing between “known unknowns” and “unknown unknowns,” or risk and uncertainty in the terminology of Knight, 1921). These approaches are typically framed not as subjective or objective but rather as a way to incorporate radically uncertain information into a statistical analysis. One could say that these generalizations of probability theory aim at virtues O1c (communication of potential limitations), O2a (accounting for relevant knowledge, here regarding distinctions that are not represented in classical probability modeling) and O4a (connection of models to observables).

Exploratory data analysis (EDA; Tukey, 1977) is all about data operations rather than models. In that sense, EDA resembles classical statistics in its positivist focus, but with the difference that the goal is exploration rather than hypothesis testing or rigorous inference. EDA is sensitive to multiple perspectives (S1) and context dependence (S2) in that discovery of the unexpected is always relative to what was previously expected by the researcher. Regarding transparency (O1), it could be argued that the refusal to use probability models with all their problems and particularly references to what cannot be observed (see above) contributes to clarity.
However, it can also be argued that some techniques of EDA can be usefully explained in terms of probability models, e.g., as predictive checking (Gelman, 2003), but in traditional EDA such models are left incomplete or implicit, and methods that come with implicit assumptions are portrayed as assumptionless, which works against transparency.

7. Discussion

7.1. Implications for statistical theory and practice

At the level of discourse, we would like to move beyond a subjective vs. objective shouting match. But our goals are larger than this. Gelman and Shalizi (2013) on the philosophy of Bayesian statistics sought not just to clear the air but also to provide philosophical and rhetorical space for Bayesians to feel free to check their models and for applied statisticians who were concerned about model fit to feel comfortable with a Bayesian approach. In the present paper, our goals are for scientists and statisticians to achieve more of the specific positive qualities into which we decompose objectivity and subjectivity in Section 2.3. At the present time, we feel that concerns about objectivity are getting in the way of researchers trying out different ideas and considering different sources of inputs to their model, while an ideology of subjectivity is limiting the degree to which researchers are justifying and understanding their model.

There is a tendency for hardcore believers in objectivity to needlessly avoid the use of valuable external information in their analyses, and for subjectivists, but also for statisticians who want to make their results seem strong and uncontroversial, to leave their assumptions unexamined.
We hope that our new framing of transparency, consensus, avoidance of bias, reference to observable reality, multiple perspectives, dependence on context and aims, and honesty about the researcher’s position and decisions will give researchers of all stripes the impetus and, indeed, permission, to integrate different sources of information into their analyses, to state their assumptions more clearly, and to trace these assumptions backward to past data that justify them and forward to future data that can be used to validate them.

Also, we believe that the pressure to appear objective has led to confusion and even dishonesty regarding data coding and analysis decisions which cannot be motivated in supposedly objective ways; see van Loo and Romeijn (2015) for a discussion of this point in the context of psychiatric diagnosis. We prefer to encourage a culture in which it is acceptable to be open about the reasons for which decisions are made, which may at times be mathematical convenience, or the aim of the study, rather than strong theory or hard data. It should be recognized openly that the aim of statistical modeling is not always to make the model as close as possible to observer-independent reality (which always requires idealization anyway), and that some decisions are made, for example, in order to make outcomes more easily interpretable for specific target audiences.

Our key points: (1) multiple perspectives correspond to multiple lines of reasoning, not merely to mindless and unjustified guesses; and (2) what is needed is not just a prior distribution or a tuning parameter, but a statistical approach in which these choices can be grounded, either empirically or by connecting them in a transparent way to the context and aim of the analysis.
For these reasons, we do not think it at all accurate to limit Bayesian inference to “the analysis of subjective beliefs.” Yes, Bayesian analysis can be expressed in terms of subjective beliefs, but it can also be applied to other settings that have nothing to do with beliefs (except to the extent that all scientific inquiries are ultimately about what is believed about the world). Similarly, we would not limit classical statistical inference to “the analysis of simple random samples.” Classical methods of hypothesis testing, estimation, and data reduction can be applied to all sorts of problems that do not involve random sampling. There is no need to limit the applications of these methods to a narrow set of sampling or randomization problems; rather, it is important to clarify the foundation for using the mathematical models for a larger class of problems.

7.2. Beyond “objective” and “subjective”

The list in Section 2.3 is the core of the paper. The list may not be complete, and such a list may also be systematized in different ways. In particular, we developed the list with applied statistics in mind, and we may have missed aspects of objectivity and subjectivity that are not connected in some sense to statistics. In any case, we believe that the given list can be helpful in practice for researchers, for justifying and explaining their choices, and for recipients of research work, for checking to what extent the listed virtues are practiced in scientific work. A key issue here is transparency, which is required for checking all the other virtues.
Another key issue is that subjectivity in science is not something to be avoided at any cost, but that multiple perspectives and context dependence are actually basic conditions of scientific inquiry, which should be explicitly acknowledged and taken into account by researchers. We think that this is much more constructive than the simple objective/subjective duality.

We do not think this advice represents empty truisms of the “mom and apple pie” variety. In fact, we repeatedly encounter publications in top scientific journals that fall foul of these virtues, which indicates to us that the underlying principles are subtle and motivates this paper. We hope that a change in names will clarify what can be done to improve statistical analyses in these two dimensions.

Instead of pointing at specific bad examples, here is a list of some issues that can regularly be encountered in scientific publications (see, for example, our discussions in Gelman, 2015, and Gelman and Zelizer, 2015), and where we believe that exercising one or more of our listed virtues would improve matters:

• Presenting analyses that are contingent on data without explaining the exploration and selection process and without even acknowledging that it took place,
• Justifying decisions by reference to specific literature without acknowledging that what was cited may be controversial, not applicable in the given situation, or without proper justification in the cited literature as well (or not justifying the decisions at all),
• Failure to reflect on whether model assumptions are reasonable in the given situation, what impact it would have if they were violated, or whether alternative models and approaches could be reasonable as well,
• Choosing methods for the main reason that they do not require tuning or make decisions automatically and therefore seem “objective” without discussing whether the chosen methods can handle the data more appropriately in the given situation than alternative methods with tuning,
• Choosing methods for the main reason that they “do not require assumptions” without realizing that every method is based on implicit assumptions about how to treat the data appropriately, regardless of whether these are stated in terms of statistical models,
• Choosing Bayesian priors without justification or explanation of what they mean and imply,
• Using nonstandard methodology without justifying the deviation from standard approaches (where they exist),
• Using standard approaches without discussion of whether they are appropriate in the specific context.

Most of these have to do with the unwillingness to admit to having made decisions, to justify them, and to take into account alternative possible views that may be equally reasonable. In some sense perhaps this can be justified based on a sociological model of the scientific process in which each paper presents just one view, and then the different perspectives battle it out. But we think that this idea ignores the importance of communication and facilitating consensus for science. Scientists normally believe that each analysis aims at the truth, and if different analyses give different results, this is not because there are different conflicting truths but rather because different analysts have different aims, perspectives and access to different information. Leaving aside the issue of whether it makes sense to talk of the existence of different truths or not, we see aiming at general agreement in free exchange as essential to science, and the more perspectives are taken into account, the more the scientific process is supported.

We see the listed virtues as ideals which in practice cannot generally be fully achieved in any real project.
For example, tracing all assumptions to observations and making them checkable by observable data is impossible because one can always ask whether and why results from the specific observations used should generalize to other times and other situations. As mentioned in Section 5.1, ultimately a rationale for treating different situations as “identical and independent” or “exchangeable” needs to be constructed by human thought (people may appeal to historical successes for justifying such idealizations, but this does not help much regarding specific applications). At some point (but, we hope, not too early) researchers have to resort to somewhat arbitrary choices that can be justified only by logic or convention, if that.

And it is likewise unrealistic to suppose that we can capture all the relevant perspectives on any scientific problem. Nonetheless, we believe it is useful to set these as goals which, in contrast to the inherently opposed concepts of “objectivity” and “subjectivity,” can be approached together.

References

Alpert, M., and Raiffa, H. (1984). A progress report on the training of probability assessors. In Judgment Under Uncertainty: Heuristics and Biases, ed. Kahneman, D., Slovic, P., and Tversky, A., 294–305. Cambridge University Press.
Berger, J. (2006). The case for objective Bayesian analysis. Bayesian Analysis 1, 385–402.
Bernardo, J. M. (1979). Reference posterior distributions for Bayesian inference. Journal of the Royal Statistical Society B 41, 113–147.
Bernardo, J. M., and Smith, A. F. M. (1994). Bayesian Theory. Chichester: Wiley.
Berk, R., Brown, L., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Annals of Statistics 41, 802–837.
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of the Royal Statistical Society A 143, 383–430.
Box, G. E. P. (1983). An apology for ecumenism in statistics. In Scientific Inference, Data Analysis, and Robustness, ed. G. E. P. Box, T. Leonard, and C. F. Wu, 51–84. New York: Academic Press.
Candler, J., Holder, H., Hosali, S., Payne, A. M., Tsang, T., and Vizard, P. (2011). Human Rights Measurement Framework: Prototype Panels, Indicator Set and Evidence Base. Research Report 81. Manchester: Equality and Human Rights Commission.
Chang, H. (2012). Is Water H2O? Evidence, Realism and Pluralism. Dordrecht: Springer.
Cox, R. T. (1961). The Algebra of Probable Inference. Baltimore: Johns Hopkins University Press.
Daston, L. (1992). Objectivity and the escape from perspective. Social Studies of Science 22, 597–618.
Daston, L. (1994). How probabilities came to be objective and subjective. Historia Mathematica 21, 330–344.
Daston, L., and Galison, P. (2007). Objectivity. New York: Zone Books.
Davies, P. L. (2014). Data Analysis and Approximate Models. Boca Raton, Fla.: CRC Press.
Dawid, A. P. (1982). The well-calibrated Bayesian. Journal of the American Statistical Association 77, 605–610.
de Finetti, B. (1974). Theory of Probability. New York: Wiley.
Desrosieres, A. (2002). The Politics of Large Numbers. Boston: Harvard University Press.
Douglas, H. (2004). The irreducible complexity of objectivity. Synthese 138, 453–473.
Douglas, H. (2009). Science, Policy and the Value-Free Ideal. University of Pittsburgh Press.
Erev, I., Wallsten, T. S., and Budescu, D. V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review 101, 519–527.
Erikson, R. S., Panagopoulos, C., and Wlezien, C. (2004). Likely (and unlikely) voters and the assessment of campaign dynamics. Public Opinion Quarterly 68, 588–601.
Everitt, B. S., Landau, S., Leese, M., and Stahl, D. (2011). Cluster Analysis, fifth edition. Chichester: Wiley.
Feyerabend, P. (1978). Science in a Free Society. London: New Left Books.
Fine, T. L. (1973). Theories of Probability. Waltham, Mass.: Academic Press.
Fuchs, S. (1997). A sociological theory of objectivity. Science Studies 11, 4–26.
Gelman, A. (2003). A Bayesian formulation of exploratory data analysis and goodness-of-fit testing. International Statistical Review 71, 369–382.
Gelman, A. (2008). The folk theorem of statistical computing. Statistical Modeling, Causal Inference, and Social Science blog, 13 May. http://andrewgelman.com/2008/05/13/the_folk_theore/
Gelman, A. (2013). Whither the “bet on sparsity principle” in a nonsparse world? Statistical Modeling, Causal Inference, and Social Science blog, 25 Feb. http://andrewgelman.com/2013/12/16/whither-the-bet-on-sparsity-principle-in-a-nonsparse-world/
Gelman, A. (2014a). Basketball stats: Don’t model the probability of win, model the expected score differential. Statistical Modeling, Causal Inference, and Social Science blog, 25 Feb. http://andrewgelman.com/2014/02/25/basketball-stats-dont-model-probability-win-model-expected-score-differential/
Gelman, A. (2014b). How do we choose our default methods? In Past, Present, and Future of Statistical Science, ed. X. Lin, C. Genest, D. L. Banks, G. Molenberghs, D. W. Scott, and J. L. Wang, 293–301. London: Chapman and Hall.
Gelman, A. (2014c). President of American Association of Buggy-Whip Manufacturers takes a strong stand against internal combustion engine, argues that the so-called “automobile” has “little grounding in theory” and that “results can vary widely based on the particular fuel that is used.” Statistical Modeling, Causal Inference, and Social Science blog, http://andrewgelman.com/2014/08/06/president-american-association-buggy-whip-manufacturers-takes-strong-stand-internal-combustio
Gelman, A. (2015). The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management 41, 632–643.
Gelman, A., and Basbøll, T. (2013). To throw away data: Plagiarism as a statistical crime. American Scientist 101, 168–171.
Gelman, A., Bois, F. Y., and Jiang, J. (1996). Physiological pharmacokinetic analysis using population modeling and informative prior distributions. Journal of the American Statistical Association 91, 1400–1412.
Gelman, A., and Carlin, J. B. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641–651.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis, third edition. London: Chapman and Hall.
Gelman, A., and Loken, E. (2014). The statistical crisis in science. American Scientist 102, 460–465.
Gelman, A., Meng, X. L., and Stern, H. S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica 6, 733–807.
Gelman, A., and O’Rourke, K. (2015). Convincing evidence. In Roles, Trust, and Reputation in Social Media Knowledge Markets, ed. Sorin Matei and Elisa Bertino. New York: Springer.
Gelman, A., and Shalizi, C. (2013). Philosophy and the practice of Bayesian statistics (with discussion). British Journal of Mathematical and Statistical Psychology 66, 8–80.
Gelman, A., and Zelizer, A. (2015). Evidence on the deleterious impact of sustained use of polynomial regression on causal inference. Research and Politics 2, 1–7.
Gillies, D. (2000). Philosophical Theories of Probability. London: Routledge.
Hacking, I. (2015). Let’s not talk about objectivity. In Objectivity in Science, ed. F. Padovani et al. Boston Studies in the Philosophy and History of Science.
Hajek, A. (2009). Fifteen arguments against hypothetical frequentism. Erkenntnis 70, 211–235.
Hennig, C. (2010). Mathematical models and reality: A constructivist perspective. Foundations of Science 15, 29–48.
Hennig, C., and Liao, T. F. (2013). How to find an appropriate clustering for mixed type variables with application to socioeconomic stratification (with discussion). Journal of the Royal Statistical Society, Series C (Applied Statistics) 62, 309–369.
Hennig, C., and Lin, C.-J. (2015). Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters. Statistics and Computing 25, 821–833.
Huber, P. J., and Ronchetti, E. M. (2009). Robust Statistics, second edition. New York: Wiley.
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
Kahneman, D. (1999). Objective happiness. In Well-being: Foundations of Hedonic Psychology, 3–25. New York: Russell Sage Foundation Press.
Kass, R. E., and Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal of the American Statistical Association 91, 1343–1370.
Keynes, J. M. (1936). The General Theory of Employment, Interest and Money. London: Macmillan.
Knight, F. H. (1921). Risk, Uncertainty, and Profit. Boston: Hart, Schaffner and Marx.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions. University of Chicago Press.
Lewis, D. (1980). A subjectivist’s guide to objective chance. In Studies in Inductive Logic and Probability, Volume II, ed. R. C. Jeffrey, 263–293. Berkeley: University of California Press.
Linstone, H. A. (1989). Multiple perspectives: Concept, applications, and user guidelines. Systems Practice 2, 307–331.
Little, R. J. (2012). Calibrated Bayes, an alternative inferential paradigm for official statistics. Journal of Official Statistics 28, 309–334.
MacKinnon, C. (1987). Feminism Unmodified. Boston: Harvard University Press.
Maturana, H. R. (1988). Reality: The search for objectivity or the quest for a compelling argument. Irish Journal of Psychology 9, 25–82.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. University of Chicago Press.
Mayo, D. G. (2014). Objective/subjective, dirty hands, and all that. Error Statistics Philosophy blog, 16 Jan. http://errorstatistics.com/2014/01/16/objectivesubjective-dirty-hands-and-all-that-gelmanwasserman-blogolog/
Mayo, D. G., and Spanos, A. (2010). Introduction and background: The error-statistical philosophy. In Error and Inference, ed. Mayo, D. G. and Spanos, A., 15–27. Cambridge University Press.
Megill, A. (1994). Introduction: Four senses of objectivity. In Rethinking Objectivity, ed. A. Megill, 1–20. Durham, N.C.: Duke University Press.
Merry, S. E. (2011). Measuring the world: Indicators, human rights, and global governance. Current Anthropology 52 (S3), S83–S95.
Pearson, K. (1911). The Grammar of Science. 2007 edition. New York: Cosimo.
Pollster.com (2004). Should pollsters weight by party identification? http://www.pollster.com/faq/should_pollsters_weight_by_par.php
Porter, T. M. (1996). Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton University Press.
Reichenbach, H. (1938). On probability and induction. Philosophy of Science 5, 21–45.
Reiss, J., and Sprenger, J. (2014). Scientific objectivity. In Stanford Encyclopedia of Philosophy (Fall 2014 Edition), ed. E. N. Zalta. http://plato.stanford.edu/archives/fall2014/entries/scientific-objectivity/
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics 6, 34–58.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 12, 1151–1172.
Saari, C. (2005). The contribution of relational theory to social work practice. Smith College Studies in Social Work 75, 3–14.
Sheiner, L. B. (1984). The population approach to pharmacokinetic data analysis: Rationale and standard data analysis methods. Drug Metabolism Reviews 15, 153–171.
Silberzahn, R., et al. (2015). Crowdsourcing data analysis: Do soccer referees give more red cards to dark skin toned players? Center for Open Science, https://osf.io/j5v8f/
Simmons, J., Nelson, L., and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allow presenting anything as significant. Psychological Science 22, 1359–1366.
Tibshirani, R. J. (2014). In praise of sparsity and convexity. In Past, Present, and Future of Statistical Science, ed. X. Lin, C. Genest, D. L. Banks, G. Molenberghs, D. W. Scott, and J. L. Wang, 505–513. London: Chapman and Hall.
van Fraassen, B. (1980). The Scientific Image. Oxford University Press.
van Loo, H. M., and Romeijn, J. W. (2015). Psychiatric comorbidity: Fact or artifact? Theoretical Medicine and Bioethics 36, 41–60.
von Glasersfeld, E. (1995). Radical Constructivism: A Way of Knowing and Learning. London: Falmer Press.
von Mises, R. (1957). Probability, Statistics and Truth, second revised English edition. New York: Dover.
Wang, W., Rothschild, D., Goel, S., and Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting 31, 980–991.
Weinberger, D. (2009). Transparency is the new objectivity. Everything is Miscellaneous blog, 19 Jul. http://www.everythingismiscellaneous.com/2009/07/19/transparency-is-the-new-objectivity/
Yong, E. (2012). Nobel laureate challenges psychologists to clean up their act. Nature News, 3 Oct. http://www.nature.com/news/nobel-laureate-challenges-psychologists-to-clean-up-their-act-1.11535
Zabell, S. L. (2011). The subjective and the objective. In Philosophy of Statistics, ed. P. S. Bandyopadhyay and M. R. Foster. Amsterdam: Elsevier.
