The Waves and the Sigmas (To Say Nothing of the 750 GeV Mirage)

This paper shows how p-values do not only create, as well known, wrong expectations in the case of flukes, but they might also dramatically diminish the `significance' of most likely genuine signals. As real life examples, the 2015 first detections o…

Authors: Giulio D'Agostini

G. D'Agostini
Università "La Sapienza" and INFN, Roma, Italia
(giulio.dagostini@roma1.infn.it, http://www.roma1.infn.it/~dagos)

Abstract

This paper shows how p-values do not only create, as is well known, wrong expectations in the case of flukes, but they might also dramatically diminish the 'significance' of most likely genuine signals. As real life examples, the 2015 first detections of gravitational waves are discussed. The March 2016 statement of the American Statistical Association, warning scientists about interpretation and misuse of p-values, is also recalled and commented on. (The paper is complemented with some remarks on past, recent and future claims of discoveries based on sigmas from Particle Physics.)

1 Introduction

On February 11 the LIGO-Virgo collaboration announced the detection of Gravitational Waves (GW). They were emitted about one billion years ago by a Binary Black Hole (BBH) merger and reached Earth on September 14, 2015. The claim, as it appears in the 'discovery paper' [1] and stressed in press releases and seminars, was based on "> 5.1 σ significance." Ironically, shortly after, on March 7 the American Statistical Association (ASA) came out (independently) with a strong statement warning scientists about interpretation and misuse of p-values [2]. As promptly reported by Nature [3], "this is the first time that the 177-year-old ASA has made explicit recommendations on such a foundational matter in statistics, says executive director Ron Wasserstein.
The society's members had become increasingly concerned that the P value was being misapplied in ways that cast doubt on statistics generally, he adds."

In June we have finally learned [4] that another 'one and a half' gravitational waves from Binary Black Hole mergers were also observed in 2015, where by the 'half' I refer to the October 12 event, highly believed by the collaboration to be a gravitational wave, although having only 1.7 σ significance and therefore classified just as LVT (LIGO-Virgo Trigger) instead of GW. However, another figure of merit has been provided by the collaboration for each event, a number based on probability theory that tells how much we must modify the relative beliefs of two alternative hypotheses in the light of the experimental information. This number, to my knowledge never even mentioned in press releases or seminars to large audiences, is the Bayes factor (BF), whose meaning is easily explained: if you considered à priori two alternative hypotheses equally likely, a BF of 100 changes your odds to 100 to 1; if instead you considered one hypothesis rather unlikely, let us say your odds were 1 to 100, a BF of $10^4$ turns them the other way around, that is 100 to 1. You will be amazed to learn that even the "1.7 sigma" LVT151012 has a BF of the order of $\approx 10^{10}$, considered a very strong evidence in favor of the hypothesis "Binary Black Hole merger" against the alternative hypothesis "Noise". (Alan Turing would have called the evidence provided by such a huge 'Bayes factor,' or what I. J. Good would have preferred to call "Bayes-Turing factor" [5],¹ 100 deciban, well above the 17 deciban threshold considered by the team at Bletchley Park during World War II to be reasonably confident of having cracked the daily Enigma key [7].)

In the past I have been writing quite a bit on how 'statistical' considerations based on p-values tend to create wrong expectations in frontier physics (see e.g. [8] and [9]). The main purpose of this paper is the opposite, i.e. to show how p-values might relegate to the role of a possible fluke what is most likely a genuine finding. In particular, the solution of the apparent paradox of how a marginal '1.7 sigma effect' could have a huge BF such as $10^{10}$ (and virtually even much more!) is explained in a didactic way.

Note: based on the invited talk "Claims of discoveries based on sigmas" at MaxEnt 2016 (Ghent, Belgium, 15 July 2016) and on seminars and courses to PhD students in the first half of 2016.

¹ Note that Eq. (1) in [5] clearly contains a typo, or it has got a problem in the scanning of the document, since $P(E \mid H)/P(E \mid H)$ makes no sense in that equation and it should have been $P(E \mid H)/P(E \mid \overline{H})$, where $H$ and $\overline{H}$ stand for 'complementary' (formally "exhaustive, mutually exclusive") hypotheses. The equation should then read
$$\frac{O(H \mid E)}{O(H)} = \frac{P(E \mid H)}{P(E \mid \overline{H})},$$
where $O(H)$ and $O(H \mid E)$ are prior and posterior odds, i.e., respectively, $O(H \mid E) = P(H \mid E)/P(\overline{H} \mid E)$ and $O(H) = P(H)/P(\overline{H})$. Eq. (1) of [5] would then result into
$$\frac{P(H \mid E)/P(\overline{H} \mid E)}{P(H)/P(\overline{H})} = \frac{P(E \mid H)}{P(E \mid \overline{H})}
\quad\text{or}\quad
\frac{P(H \mid E)}{P(\overline{H} \mid E)} = \frac{P(E \mid H)}{P(E \mid \overline{H})} \cdot \frac{P(H)}{P(\overline{H})},$$
in words
posterior odds = Bayes factor × prior odds.
(For log representation of odds and Bayes factors see section 2 and appendix E of [6] and references therein, although at that time Turing's contributions, as well as 'bans' and 'decibans', were unknown to the author, who arrived at the same conclusion of Turing's 1 deciban as rough estimate of human resolution to judgement leaning and weight of evidence – table 1 in page 13 and text just below it.)

2 Preamble

Since this paper can be seen as the sequel of Refs. [12] and [9], with the basic considerations already expounded in [8], for the convenience of the reader I briefly summarize the main points maintained there.

• The "essential problem of the experimental method" is nothing but solving "a problem in the probability of causes", i.e. ranking in credibility the hypotheses that are considered to be possibly responsible for the observations (quotes by Poincaré [13]).² [There is indeed no conceptual difference between "comparing hypotheses" or "inferring the value" of a physical quantity, the two problems only differing in the numerosity of hypotheses, virtually infinite in the latter case, when the physical quantity is assumed, for mathematical convenience,³ to take values with continuity.]
• The deep source of uncertainty in inference is due to the fact that (apparently) identical causes might produce different effects, due to internal (intrinsic) probabilistic aspects of the theory, as well as to external factors (think of measurement errors).
• Humankind is used to living – and surviving – in conditions of uncertainty and therefore the human mind has developed a mental 'category' to handle it: probability, meant as degree of belief. This is also valid when we 'make science', since "it is scientific only to say what is more likely and what is less likely" (Feynman [15]).
• Falsificationism can be recognized as an attempt to extend the classical proof by contradiction of classical logic to the experimental method, but it simply fails when stochastic (either internal or external) effects might occur.
• The further extension of falsificationism from impossible effects to improbable effects is simply deleterious.
• The invention of p-values can be seen as an attempt to overcome the evident problem occurring in the case of a large number of effects (virtually infinite when we make measurements): any observation has a very small probability in the light of whatever hypothesis is considered, and then it 'falsifies' it.
• Logically the previous extension ("observed effect" → "all possible effects equally or less probable than the observed one") does not hold water. (But it seems that for many practitioners logic is optional – the reason why "p-values often work" [8] will be discussed in section 6.)

² Instead, "making statistics", i.e. describing and summarizing data, has never been the primary interest of physicists as well as of many other scientists, although it is certainly useful for a variety of reasons.
³ "No mathematical squabbles" was John Skilling's mantra in his recent tutorial at MaxEnt 2016, in which he was stressing the importance of restarting to think, at least "initially", in terms of "finite target", "finite partitioning" and integers [14].
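The odds arithmetic used in the Introduction (posterior odds = Bayes factor × prior odds, and the deciban scale) can be checked with a few lines of Python. This is only an illustrative sketch: the function names `update_odds`, `odds_to_probability` and `decibans` are hypothetical, and the deciban is taken here as ten times the base-10 logarithm of the Bayes factor, which is consistent with the value of 100 deciban quoted for a BF of $10^{10}$.

```python
import math


def update_odds(prior_odds, bayes_factor):
    """Posterior odds = Bayes factor x prior odds."""
    return bayes_factor * prior_odds


def odds_to_probability(odds):
    """Probability of the favored hypothesis when only two hypotheses compete."""
    return odds / (1.0 + odds)


def decibans(bayes_factor):
    """Weight of evidence in decibans: 10 * log10(Bayes factor) (assumed convention)."""
    return 10.0 * math.log10(bayes_factor)


# Two hypotheses considered equally likely a priori, BF = 100.
even_case = update_odds(1.0, 100.0)

# One hypothesis considered rather unlikely (odds 1 to 100), BF = 10^4.
skeptic_case = update_odds(1.0 / 100.0, 1.0e4)

# Weight of evidence for a Bayes factor of the order of 10^10.
lvt_weight = decibans(1.0e10)

print(even_case, skeptic_case, lvt_weight)
print(odds_to_probability(skeptic_case))
```

Both scenarios end at odds of about 100 to 1, although they start from very different priors, which is the point of the two examples in the text.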
• In practice p-values are routinely misinterpreted by most practitioners and scientists, and incorrect interpretations of the data are spread around over the media⁴ (for recent examples, related to the LHC presumptive 750 GeV di-photon signal, see e.g. [16, 17, 18, 19, 20] and footnote 31 for later comments).
• The reason of the misunderstandings is that p-values (as well as other outcomes from other methods of the dominating 'standard statistics', including confidence intervals [8]) do not answer the very question human minds by nature ask, i.e. which hypothesis is more or less believable (or how likely the 'true' value of a quantity lies within a given interval). For this reason I am afraid p-values (or perhaps a new invention by statisticians) will still be misinterpreted and misused despite the 2016 ASA statement, as I will argue at the end of section 3.2.
• Given the importance of the previous point, for the convenience of the reader I report here verbatim the list of misunderstandings appearing in the Wikipedia at the end of 2011 [9],⁵ highlighting the sentences that mostly concern our discourse.

  1. "The p-value is not the probability that the null hypothesis is true. In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity (if there is no alternative hypothesis with a large enough a priori probability and which would explain the results more easily). This is the Jeffreys-Lindley paradox.
  2. The p-value is not the probability that a finding is "merely a fluke." As the calculation of a p-value is based on the assumption that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. This is different from the real meaning which is that the p-value is the chance of obtaining such results if the null hypothesis is true.
  3. The p-value is not the probability of falsely rejecting the null hypothesis. This error is a version of the so-called prosecutor's fallacy.
  4. The p-value is not the probability that a replicating experiment would not yield the same conclusion.
  5. (1 − p-value) is not the probability of the alternative hypothesis being true.
  6. The significance level of the test is not determined by the p-value. The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed. (However, reporting a p-value is more useful than simply saying that the results were or were not significant at a given level, and allows the reader to decide for himself whether to consider the results significant.)
  7. The p-value does not indicate the size or importance of the observed effect (compare with effect size). The two do vary together however – the larger the effect, the smaller sample size will be required to get a significant p-value."

• If we want to form our minds about which hypothesis is more or less probable in the light of all available information, then we need to base our reasoning on probability theory, understood as the mathematics of beliefs, that is essentially going back to the ideas of Laplace. In particular the updating rule, presently known as the Bayes rule (or Bayes theorem), should be probably better called Laplace rule, or at least Bayes-Laplace rule.
• The 'rule', expressed in terms of the alternative causes ($C_i$) which could possibly produce the effect ($E$), as originally done by Laplace,⁶ is
$$P(C_i \mid E, I) = \frac{P(E \mid C_i, I) \cdot P(C_i \mid I)}{\sum_k P(E \mid C_k, I) \cdot P(C_k \mid I)}, \qquad (1)$$
or, considering also $P(C_j \mid E, I)$ and taking the ratio of the two posterior probabilities,
$$\frac{P(C_i \mid E, I)}{P(C_j \mid E, I)} = \frac{P(E \mid C_i, I)}{P(E \mid C_j, I)} \times \frac{P(C_i \mid I)}{P(C_j \mid I)}, \qquad (2)$$
where $I$ stands for the background information, sometimes implicitly assumed.
• Important consequences of this rule – I like to call them Laplace's teachings [9], because they stem from his "fundamental principle of that branch of the analysis of chance that consists of reasoning a posteriori from events to causes" [23] – are:
  – It makes no sense to speak about how the probability of $C_i$ changes if:
    1. there is no alternative cause $C_j$;
    2. the way in which $C_j$ might produce $E$ is not properly modelled, i.e. if $P(E \mid C_j, I)$ has not been somehow assessed.⁷

⁴ Sometimes scientists say they reported "the right thing" (i.e. just the p-value), but it was the journalists' fault to misinterpret them. But, as I have documented in my writings, often it is the official statements of laboratories, of collaboration spokespersons, or of prominent physicists that confuse p-values with probabilities of hypotheses, as you can e.g. find in [9] and, more extensively, in http://www.roma1.infn.it/~dagos/badmath/index.html#added. A suggestion to laymen is that, "instead of heeding impressive-sounding statistics, we should ask what scientists actually believe" [21].
⁵ As is well known, the content of Wikipedia is variable with time. The reason I report here the list of misunderstandings as it appeared some years ago, and as it has been more or less until the beginning of 2016 – I have no documented records, but I have been checking it from time to time, on the occasion of seminars and courses, and I had not noticed major changes, like the reduction of the items from 7 to 5 – is that the present version has clearly been influenced by the ASA statement of March 2016. (I report here all seven items, although I have to admit that I get lost after the third one – but if for you seven are still not enough, see [22].)
⁶ This is "Principle VI", expounded in simple words in [23], in which he calls 'principles' the principal rules resulting from his theory. Note also that Eq. (1) requires that the hypotheses $C_i$ form a 'complete class' (exhaustive and mutually exclusive), while Eq. (2) is more general, although it might require some care in its application, as pointed out in [24] [think e.g. of the hypotheses $H_1 = C_1 \cap C_2$ and $H_2 = C_2$, implying: i) $P(H_1) \le P(H_2)\ \forall E$; ii) the calculation of $P(C_1 \mid E)$ and $P(C_2 \mid E)$ requires extra information].
⁷ It does not matter if the assessment is done analytically, numerically, by simulation, or just by pure subjective considerations – what is important to understand is that without the slightest guess on what $P(E \mid C_j, I)$ could be, and on how much $C_j$ is more or less believable, you cannot modify your 'confidence' on $C_i$, as it will be further reminded in section 6.
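Eqs. (1) and (2) can be made concrete with a small numerical sketch. The three causes, their prior probabilities and the likelihoods $P(E \mid C_k, I)$ used below are made-up numbers chosen only for illustration, and `posterior` and `posterior_odds` are hypothetical helper names, not anything defined in the paper.

```python
def posterior(priors, likelihoods):
    """Eq. (1): P(C_i | E, I) for a complete class of causes C_1..C_n."""
    joint = [like * prior for like, prior in zip(likelihoods, priors)]
    norm = sum(joint)  # denominator of Eq. (1): sum over all causes
    return [j / norm for j in joint]


def posterior_odds(prior_i, prior_j, like_i, like_j):
    """Eq. (2): posterior odds = Bayes factor x prior odds."""
    bayes_factor = like_i / like_j
    prior_odds = prior_i / prior_j
    return bayes_factor * prior_odds


# Illustrative numbers only (assumptions, not data from the paper).
priors = [0.5, 0.3, 0.2]           # P(C_k | I)
likelihoods = [0.01, 0.10, 0.40]   # P(E | C_k, I)

post = posterior(priors, likelihoods)

# Odds of C_2 versus C_1, obtained directly from Eq. (2) ...
odds_21 = posterior_odds(priors[1], priors[0], likelihoods[1], likelihoods[0])
# ... and from the ratio of the two posteriors of Eq. (1).
odds_21_from_posteriors = post[1] / post[0]

print(post)
print(odds_21, odds_21_from_posteriors)
```

The two ways of computing the odds agree because the normalization of Eq. (1) cancels in the ratio; Eq. (2) therefore needs only the two hypotheses being compared, whereas Eq. (1) needs the whole class of causes.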
  – The updating of the probability ratio depends only on the so called Bayes factor
$$\frac{P(E \mid C_i, I)}{P(E \mid C_j, I)}, \qquad (3)$$
    ratio of the probabilities of $E$ given either hypothesis,⁸ and not on the probability of other events that have not been observed and that are even less probable than $E$ (upon which p-values are instead calculated).
  – One should be careful not to confuse $P(C_i \mid E)$ with $P(E \mid C_i)$, and in general $P(A \mid B)$ with $P(B \mid A)$. Or, moving to continuous variables, $f(\mu \mid x)$ with $f(x \mid \mu)$, where '$f()$' stands here, depending on the context, for a probability function or for a probability density function (pdf); $x$ and $\mu$ are symbols for observed quantity and 'true' value, respectively, the latter being in fact just the parameter of the model we use to describe the physical world.
  – Cause $C_i$ is falsified by the observation of the event $E$ only if $C_i$ cannot produce it, and not because of the smallness of $P(E \mid C_i, I)$.
  – Extending the reasoning to continuous observables (generically called $X$) characterized by a pdf $f(x \mid H_i)$, the probability to observe a value in the small interval $\Delta x$ is $f(x \mid H_i)\,\Delta x$. What matters, for the comparison of two hypotheses in the light of the observation $X = x_m$, is therefore the ratio of pdf's $f(x_m \mid H_i)/f(x_m \mid H_j)$, and not the smallness of $f(x_m \mid H_i)\,\Delta x$, which tends to zero as $\Delta x \to 0$. Therefore, a hypothesis is, strictly speaking, falsified, in the light of the observed $X = x_m$, only if $f(x_m \mid H_i) = 0$.
• Finally, I would like to stress that falsifiability is not a strict requirement for a theory to be accepted as 'scientific'. In fact, in my opinion a weaker condition is sufficient, which I called testability in [12]: given a theory $\mathrm{Th}_i$ and possible observational data $D$, it should be possible to model $P(D \mid \mathrm{Th}_i)$ in order to compare it with an alternative theory $\mathrm{Th}_j$ characterized by $P(D \mid \mathrm{Th}_j) \ne P(D \mid \mathrm{Th}_i)$.⁹ This will allow us to rank theories in probability in the light of empirical data and of any other criteria, like simplicity or aesthetics,¹⁰ without the requirement of falsification, which cannot be achieved, logically speaking, in most cases.¹¹

⁸ Eq. (3) is also known as "likelihood ratio", but I avoid and discourage the use of the 'l-word', being a major source of misunderstanding among practitioners [8, 25], who regularly use the 'l-function' as pdf of the unknown quantity, taking then (also in virtue of an unneeded 'principle') its argmax as most believable value, sticking to it in further 'propagations' [25]. (A recent, important example comes from two reports of the same organization, each using the 'l-word' with two different meanings [26, 27].)
⁹ For example String Theory (ST) supporters should tell us in what $P(D \mid ST)$ differs from $P(D \mid SM)$ from Standard Model, with $D$ being past, present or future observational data.
¹⁰ But we have to be careful with judgments based on aesthetics, which are unavoidably anthropic (and debates on aesthetics will never end, while ancient Romans wisely used to say that "de gustibus non disputandum est" and, as someone warned, "if you are out to describe the truth, leave elegance to the tailor" [29]). This is more or less what is going on in Particle Physics in the past years, after nothing new has been found at LHC besides the highly expected observation of the Higgs boson in the final state, with many serious theorists humbly admitting that "Nature does not seem to share our ideas of naturalness."
¹¹ Think for example of the infinite number of Gaussian models $\mathcal{N}(\mu, \sigma)$ that might have produced the observation $x_m = 4$. Since, strictly speaking, any Gaussian might produce any real value, it follows that none of the $\infty^2$ models can be falsified. Nevertheless, everyone will agree that $x_m = 4$ is more likely to be attributed to model $\mathcal{N}(3, 1)$ than to $\mathcal{N}(20, 1)$. But you cannot say that the observation $x_m = 4$ falsifies model $\mathcal{N}(20, 1)$!

3 ASA statement on statistical significance and p-values

3.1 Ante factum

The statement of the American Statistical Association in March this year did not arrive completely unexpected. Many scientists were in fact aware of and worried about the "science's dirtiest secret", i.e. that "the 'scientific method' of testing hypotheses by statistical analysis stands on a flimsy foundation" [30]. Indeed, as Allen Caldwell of MPI Munich eloquently puts it (e.g. in [31]) "The real problem is not that people have difficulties in understanding Bayesian reasoning. The problem is that they do not understand the frequentist approach and what can be concluded from a frequentist analysis. What is not understood, or forgotten, is that the frequentist analysis relates only to possible data outcomes within a model context, and not probabilities of a model being correct. This misunderstanding leads to faulty conclusions."

Faulty conclusions based on p-values are countless in all fields of research, and frankly I am personally much more worried when they might affect our health¹² and security, or the future of our planet, rather than when they spread around unjustified claims of revolutionary discoveries or of possible failures of the so called Standard Model of Particle Physics [9].¹³ For instance, "A lot of what is published is incorrect" reported last year The Lancet's Editor-in-Chief Richard Horton [36]. This could be because, looking around more or less 'at random', statistical 'significant results' will sooner or later show up (as that of the last frame of an xkcd cartoon shown in Fig. 1 – see [37] for the full story); or because dishonest (or driven by wishful thinking, which in Science is more or less the same) researchers might do some p-hacking (see e.g. [38] and [39]) in order to make 'significant effects' appear – remember that "if you torture the data long enough, it will confess to anything" [40].

The February 2014 editorial of David Trafimow, Director of Basic and Applied Social Psychology (BASP), deserves a special mention: in it he takes a strong position against "null hypothesis significance testing procedure (NHSTP)" because it "has been shown to be logically invalid and to provide little information about the actual likelihood of either the null or experimental hypothesis" [41]. In fact a second editorial, signed together with his Associate Director Michael Marks and published on February 15, 2015, had a large echo last year (see e.g. [42], [43] and [44]); in it they announce that, after "a grace period allowed to authors", "from now on, BASP is banning the NHSTP" [45].

Figure 1: A 'significant' result obtained provando e riprovando [37].

¹² See e.g. [32, 33, 34, 35] (for instance Elisabeth Iorns' comment on New Scientist [33] reports that "more than half of biomedical findings cannot be reproduced" and "pharmaceutical company Bayer says it fails to replicate two-thirds of published drug studies" – !!!).
¹³ Frankly I do not think that these claims hurt fundamental physics, which I consider quite healthy and (mostly) done by honest researchers. In fact, false alarms might even have positive effects inside the community, because they stimulate discussions on completely new possibilities and encourage new researches to be undertaken, as also recognized in the bottom line of de Rujula's cartoon of Fig. 2. My worries mainly concern the negative reputation the field risks gaining and, perhaps even more, the bad education provided to young people, most of whom will leave pure research and will try to apply elsewhere the analysis methods they learned in searching for new particles and new phenomena.

3.2 Principia

Moving finally to the content of the ASA statement, after a short introduction, in which it is recognized that "the p-value [...] is commonly misused and misinterpreted," and a reminder of what a p-value "informally" is ("the probability under a specified statistical model that a statistical summary of the data [...] would be equal to or more extreme than its observed value") a list of six items, indicated as "principles", follows (the highlighting is original).

1. P-values can indicate how incompatible the data are with a specified statistical model. A p-value provides one approach to summarizing the incompatibility between a particular set of data and a proposed model for the data. The most common context is a model, constructed under a set of assumptions, together with a so-called "null hypothesis." Often the null hypothesis postulates the absence of an effect, such as no difference between two groups, or the absence of a relationship between a factor and an outcome.
The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.

2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.

3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. Practices that reduce data analysis or scientific inference to mechanical "bright-line" rules (such as "p < 0.05") for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not immediately become "true" on one side of the divide and "false" on the other. Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis. Pragmatic considerations often require binary, "yes-no" decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of "statistical significance" (generally interpreted as "p ≤ 0.05") as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.

4. Proper inference requires full reporting and transparency. P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry-picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and "p-hacking," leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise: Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.

5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. Statistical significance is not equivalent to scientific, human, or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise. Similarly, identical estimated effects will have different p-values if the precision of the estimates differs.

6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. Researchers should recognize that a p-value without context or other evidence provides limited information. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.

These words sound as an admission of failure of much of the statistics teaching and practice in the past many decades. But yet I find their courageous statement still somehow unsatisfactory, and, in particular, the first principle is in my opinion still affected by the kind of 'original sin' at the basis of p-value misinterpretations and misuse. Many practitioners consider in fact a value occurring several (but often just a few) standard deviations from the 'expected value' (in the probabilistic sense) to be a 'deviance' from the model, which is clearly absurd: no value a model can yield can be considered an exception from the model itself (see also footnote 11 – the reason why "p-values often work" will be discussed in section 6).
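The point can be checked numerically on the example of footnote 11, i.e. the observation $x_m = 4$ compared with the Gaussian models $\mathcal{N}(3, 1)$ and $\mathcal{N}(20, 1)$. The following sketch uses only the Python standard library; the helper names `gauss_pdf` and `two_sided_p_value`, and the choice of a two-sided tail probability as 'p-value', are assumptions made for illustration.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Gaussian probability density f(x | mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def two_sided_p_value(x, mu, sigma):
    """P(|X - mu| >= |x - mu|) under N(mu, sigma)."""
    z = abs(x - mu) / sigma
    return math.erfc(z / math.sqrt(2.0))


x_m = 4.0

# Densities of the observation under the two competing models.
f_near = gauss_pdf(x_m, 3.0, 1.0)
f_far = gauss_pdf(x_m, 20.0, 1.0)

# Tail probability under the 'far' model: tiny (16 sigma away), but not zero.
p_far = two_sided_p_value(x_m, 20.0, 1.0)

# What matters for comparing the models is the ratio of the densities,
# computed in log10 to keep the numbers readable.
log10_ratio = (math.log(f_near) - math.log(f_far)) / math.log(10.0)

print(f_near, f_far, p_far, log10_ratio)
```

The model $\mathcal{N}(20, 1)$ turns out to be enormously disfavored with respect to $\mathcal{N}(3, 1)$, yet its density at $x_m$ is not zero: the observation ranks the two models, but it does not, strictly speaking, falsify either of them.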
Then, moving to principle 2, it is not that "researchers often wish to turn a p-value into a statement about the truth of a null hypothesis" (italic mine), as if this were an extravagant fantasy: reasoning in terms of degree of belief of whatever is uncertain is connatural to the 'human understanding' [46]. All methods that do not tackle straight the fundamental issue of the probability of hypotheses, in the problems in which this is the crucial question, are destined to fail, and to perpetuate misunderstanding and misuse.

4 The 'Monster' blessed by the 5 sigmas

Rumors that the LIGO interferometers had most likely detected a gravitational wave (GW) were circulating in autumn last year. Personally, I got direct information quite late, at the beginning of December: "we have seen a Monster", without further detail. Therefore, when a few days before February 11 quantitative rumors talked of 5.1 sigmas, I was disappointed and highly puzzled. How could a Monster have just a bit more than five sigmas? Indeed in the past decades we have seen in Particle Physics several effects of similar statistical significance coming and going, as Alvaro de Rujula depicted already in 1985 in his famous Cemetery of Physics of Fig. 2 [48]. [fn 14] Therefore for many of us a five-sigma effect would have been something worth discussions or perhaps further investigations, but certainly not a Monster. [fn 15] This impression was very evident from the reaction many people had after seeing the wave form. "Come on, this is not a five-sigma effect", commented several colleagues, more or less using the same words, "these are hundreds of sigmas!", a colorful expression to say that just by eye the hypothesis Noise was beyond any imagination. [fn 16]

Footnote 14. Finally he humorously summarized his very long experience in the 'de Rujula paradox' [47]: If you disbelieve every result presented as having a 3 sigma, or 'equivalently' a 99.7% chance of being correct, you will turn out to be right 99.7% of the times. ('Equivalently' within quote marks is de Rujula's original, because he knows very well that there is no equivalence at all.)

Footnote 15. "And the July 2012 5-sigma Higgs boson?", you might argue. Come on! That was the Higgs boson, the highly expected missing tessera to give sense to the amazing mosaic of the Standard Model, whose mass had already been somehow inferred from other measurements, although with quite large uncertainty (see e.g. [49, 50]). For this reason the 2011 data were sufficient for many who had followed this physics for years (and were not sticking to the 5-sigma dogma) to be highly confident that the Higgs boson was finally observed in a final state diagram [9]. Instead, some of those who were casting doubt on the possibility of observing the Higgs are the same who were giving credit to the December 2015 γγ 750 GeV excess at LHC (and some even to the Opera's superluminar neutrinos!). I hope they will learn from the double/triple lesson.

Footnote 16. And indeed we have also learned that the only serious alternative hypothesis taken into account and investigated in detail was that of a sabotage!

Figure 2: Alvaro de Rujula's Cemetery of Physics [48], with graves indicating 'false alarms' in frontier physics, and not old physics ideas faded out with time, like epicycles, phlogiston or aether.

The reason for the 'monstrosity' of GW150914 was indeed in Table 1 of the accompanying paper on Properties of the binary black hole merger GW150914 [28]: a Bayes factor "BBH merger" Vs "Noise" [fn 17] of about $5 \times 10^{125}$ (yes, five times ten to one-hundred-twenty-five). This means that, no matter how small the initial odds in favor of a BBH merger were, and even casting doubt on the evaluation of the Bayes factor, [fn 18] the posterior odds would be extraordinarily large, the probability of noise being smaller than Shakespeare's drop of water identically recovered from the sea. [fn 19]

Footnote 17. To be precise, the competing hypotheses are "BBH-merger & Noise" Vs "only Noise".

5 Cinderella and her sisters

The results of the full observing run of the Advanced LIGO detectors (September 12, 2015, to January 19, 2016) were presented on June 8 [4], slightly updating some of February's digits. Figure 3 summarizes detector performances and results, with some important numbers (within this context) recalled in the caption.

Footnote 18. At this point a 'technical' remark is in order, which is indeed also conceptual and sheds some light on the difficulty of the calculation and on possible uncertainties on the resulting value. Given the hypotheses $H_0$ and $H_1$ and data $D$, the Bayes factor $H_1$ Vs $H_0$ is
$$\frac{P(D\,|\,H_1, I)}{P(D\,|\,H_0, I)}\,,$$
where for sake of simplicity we identify $H_1$ with "BBH merger" and $H_0$ with "Noise". Now the point is that there is not a single, precisely defined, hypothesis "BBH merger". And the same is true also for the 'null hypothesis' "Noise". This is because each hypothesis comes with free parameters. For example, in the case of "BBH merger", the conditional probability of $D$ depends on the masses of the two black holes ($m_1$ and $m_2$), on their distance from Earth ($d$) and so on, i.e. $P(D\,|\,H_1, m_1, m_2, d, \ldots, I)$. The same holds for the Noise, because there is no such thing as "the Noise", but rather a noise model with many parameters obtained by monitoring the detectors. So in general, for the generic hypothesis $H$ we have $P(D\,|\,H, \theta, I)$, in which $\theta$ stands for the set of parameters of the hypothesis $H$.
But what matters for the calculation of the Bayes factor is $P(D\,|\,H, I)$, and this can be evaluated from probability theory taking into account all possible values of the set of parameters $\theta$, weighting them by the pdf $f(\theta\,|\,H, I)$, i.e. 'simply' as
$$P(D\,|\,H, I) = \int_{\Theta} P(D\,|\,H, \theta, I)\, f(\theta\,|\,H, I)\, d\theta\,. \qquad (F.1)$$
But the game can be not simple at all, because i) this integral can be very difficult to calculate; ii) the result, and then the BF, depends on the prior $f(\theta\,|\,H, I)$ about the parameters, which has to be properly modeled from the physics case. A rather simple example, also related to gravitational waves, is shown in [51] and helped to damp down claims of GW detection based on p-values, resulting in fact in ineffective Bayes factors Signal Vs Noise of the order of unity, with values depending on the model considered. The calculations of the BF's published by the LIGO-Virgo Collaboration are much more complicated than those of [51] (see [28] and [4] and references therein, in particular [52]), and they have highly benefitted from Skilling's Nested Sampling algorithm [53]. And, for the little I can understand of BBH mergers, the priors on the parameters appear to have been chosen safely, so that the resulting BF's seem very reliable.

Footnote 19. William Shakespeare, The Comedy of Errors:
    For know, my love, as easy mayst thou fall
    A drop of water in the breaking gulf,
    And take unmingled thence that drop again,
    Without addition or diminishing,

Figure 3: The Monster (GW150914), Cinderella (LVT151012) and the third sister (GW151226), visiting us in 2015 (Fig. 1 of [4]; see text for the reason of the names). The published 'significance' of the three events (Table 1 of [4]) is, in the order, "$> 5.3\,\sigma$", "$1.7\,\sigma$" and "$> 5.3\,\sigma$", corresponding to the following p-values: $7.5 \times 10^{-8}$, $0.045$, $7.5 \times 10^{-8}$. The logs of the Bayes factors are instead (Table 4 of [4]) approximately 289, 23 and 60, corresponding to Bayes factors of about $3 \times 10^{125}$, $10^{10}$ and $10^{26}$.

The busy plot on the left side shows the sensitivity curves of the two interferometers (red and blue curves, with plenty of resonant peaks) and how the three signals fall inside them (bands with colors matching the wave forms of the right plot). In short, the two curves tell us that a signal of a given frequency can be distinguished from the noise if its amplitude is above them. Therefore all initial parts of the waves, when the black holes begin to spiral around each other at low frequency, are unobservable, and the bands below ≈ 20 Hz are extrapolations from the physical models. Later, when the frequency increases, the wave enters the sensitivity range, [fn 20] which extends up to a given frequency, after which we 'lose' it. The lower and upper boundary frequencies depend on the amplitude of the signal, as also happens in acoustics.

Footnote 20. In analogy, imagine someone communicating to us using an audio signal, whose frequency changes with time, from infrasounds to ultrasounds. We can hear the signal only when it is in the acoustic region, conventionally in the range between 20 and 20,000 Hz, although this varies from person to person. And, since this sensitivity window is not sharp, close to its edges loud sounds are better heard than quiet ones.

The plot on the right finally shows the 'waves' [fn 21] from the instant they enter the optimal 30 Hz sensitivity region (the acoustic analogy depicted in footnote 20 might help):

• The wave indicated by GW150914 (the 'Monster', with GW standing for gravitational wave and 150914 for the detection date, September 14, 2015) is characterized by high amplitude, but short duration in the sensitivity region, because it fades out at a few hundred hertz.
• GW151226 instead, although of smaller intensity, has a longer 'life' (about 1.7 seconds) in the 'audible' spectrum, and therefore the signature of a BBH merger is also very recognizable.
• Then there is the October 12 event, LVT151012, which has an amplitude comparable to that of GW151226, but smaller duration. It has, nevertheless, about 20 oscillations in the sensitivity region, a piece of information that, combined with the peculiar shape of the signal (remarkably the crests get closer as time passes, while the amplitude increases, until something 'catastrophic' seems to happen) and the fact that two practically 'identical' and 'simultaneous' signals have been observed by the two interferometers 3000 km apart, makes the experts highly confident that this is also a gravitational wave.

Footnote 21. To be more precise, these are not data points, but rather the 'adapted filters' that best match them, and therefore they could provide a too optimistic impression of what has really been detected. Therefore we have to use the Bayes factors provided by the collaboration, rather than intuitive judgement based on these wave forms.

However, even if at first sight it does not look dissimilar from GW151226 (but remember that the waves in Fig. 3 do not show raw data!), the October 12 event, hereafter referred to as Cinderella, is not ranked as GW, but, more modestly, as LVT, for LIGO-Virgo Trigger. The reason for the downgrading is that 'she' cannot wear a "$> 5\,\sigma$'s dress" to go together with the 'sisters' to the 'sumptuous ball of the Establishment.' In fact Chance has assigned 'her' only a poor, unpresentable $1.7\,\sigma$ ranking, usually considered in the Particle Physics community not even worth a mention in a parallel session of a minor conference by an undergraduate student. [fn 22]
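The correspondence between the 'sigmas' and the p-values quoted in the caption of Fig. 3 can be checked with a few lines of R. This is only a sketch: it assumes the usual one-sided Gaussian convention, p = P(Z > nσ) for a standard normal Z, which is taken here for illustration and is not a statement about how the collaboration evaluated its significances. The helper names sigma_to_p and p_to_sigma are hypothetical.

```r
# Convert between "number of sigmas" and a one-sided Gaussian tail probability.
# Assumption (for illustration only): p = P(Z > n_sigma), with Z standard normal.
sigma_to_p <- function(n_sigma) pnorm(n_sigma, lower.tail = FALSE)
p_to_sigma <- function(p) qnorm(p, lower.tail = FALSE)

# p-values quoted in the caption of Fig. 3
p_values <- c(GW150914 = 7.5e-8, LVT151012 = 0.045, GW151226 = 7.5e-8)

# equivalent number of sigmas for each event
print(round(p_to_sigma(p_values), 2))

# and the other way round, for the two significances quoted
print(signif(sigma_to_p(c(1.7, 5.3)), 3))
```

With this convention a p-value of 0.045 corresponds to about 1.7 sigmas, and 7.5 × 10^-8 to a bit more than 5.2 sigmas, in line with the rounded figures quoted above.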
Footnote 22. Note how the quoted p-value of 0.045 associated to it is just below the (in-)famous 0.05 "significance" threshold recalled in the xkcd cartoon of Fig. 1. I hope it is so just by chance and that no "p-value ≤ 0.05" requirement was applied to the data, thus filtering out other possible good signals.

But, despite the modest 'statistical significance', experts are highly confident, because of physics reasons [fn 23] (and of their understanding of background), that this is also a gravitational wave radiated by a BBH merger, much more than the 87% quoted in [4]. [fn 24]

Footnote 23. Detecting something that has good reason to exist, because of our understanding of the Physical World (related to a network of other experimental facts and theories connecting them!), is quite different from just observing an unexpected bump, possibly due to background, even if with small probability, as already commented in footnote 15. [And remember that whatever we observe in real life, if seen with high enough resolution in the N-dimensional phase space, had very small probability to occur! (Imagine, as a simplified example, the pixel content of any picture you take walking on the road, in which N is equal to five, i.e. two plus the RGB code of each pixel.)]

Footnote 24. To understand how much people believe in a scientific statement it is often useful, besides proposing bets [9], to ask about the complementary hypothesis. For example when I see a 90% C.L. upper limit on a quantity, I ask "do you really believe 10% that the value is above that limit?", or, even more embarrassing, "please use your method to evaluate the 50% C.L. upper limit, then, whatever number comes out, tell me if you really believe 50-50 that the value could be on either side of the limit, and be ready to accept a bet with 1 to 1 odds in the direction I will choose." (To learn more about the absurdities of 'frequentistic coverage' and also about limits derived from 'objective Bayesian methods,' see section 10.7 and chapter 13 of [8].) In the case of this 87% probability that LVT151012 is a GW from BBH merger the question to ask is "do you really believe 13%, i.e. about 1 to 7, that this event is not a gravitational wave due to a BBH merger?" (and we should not accept any answer which is, even partially, based on the smallness of the sigmas). As a matter of fact I find this 87% beyond my understanding, because such a probability has to depend on the prior probability of BBH mergers. For this reason I will focus in the sequel only on Bayes factors and how they (do not simply) relate to p-values.

Indeed the most useful number experimentalists can provide to the scientific community to quantify how the experimental data alone favor the 'Signal' hypothesis is the Bayes factor, as expounded in the preamble. And this factor is very large also for Cinderella: $\approx 10^{10}$. This means that, even if your initial odds Signal Vs Noise were one to one million, the observation of the LIGO interferometers turns them into 10,000 to 1, i.e. a probability of BBH merger of 99.99%. [fn 25]

Footnote 25. Note that this probability depends on the set of hypotheses taken into account. If another, alternative physical hypothesis $H^*$ to explain the LIGO signals is considered, then the Bayes factor of $H^*$ Vs "BBH merger" has to be evaluated, and the absolute probabilities re-calculated accordingly.

Now the question is, how can a modest $1.7\,\sigma$ effect be compatible with a Bayes factor as large as $10^{10}$? The solution to this apparent paradox will be given in the next section, but I anticipate the answer: p-values and BF's are two different things, and there is no simple, general rule, inside probability theory, that relates them.

6 P-values Vs Bayes factors

Having discussed this topic at length elsewhere (see in particular sections 1.8 and 10.8 of [8]), I sketch here the main points, with the help of some plots. This is obviously a didactic example and does not enter at all into the (very complicated and CPU-time consuming) details of the analysis of the interferometer data (see footnote 18). In particular a direct observation will be considered, while in general hypothesis tests are performed on a statistic chosen with large freedom. [fn 26] So we just consider here simple models $H_i$ that could produce the quantity $x$ according to pdf's $f(x\,|\,H_i, I)$.

Footnote 26. It is perhaps important to recall that, among other problems, p-values are affected by the arbitrariness of the test variable used (see e.g. [54]), as well as by the chosen subset of data. With some experience I have developed my golden rule: The more exotic the name of the test, the less I believe the result. The rationale is that I'm pretty sure that several more common tests have been discarded before arriving at the one which provided the desired significance.

• As recalled above, according to probability theory what matters for the update of relative beliefs is the ratio of the pdf's. For example the observation $x_m = 5$ shown in the upper plot of Fig. 4 modifies our beliefs in favor of $H_3$, with respect to $H_1$ and $H_2$, no matter the size of the area under the pdf's to the right of $x_m$.
• In particular $H_2$ is ruled out ('falsified') because, being $f(x_m\,|\,H_2) = 0$, it cannot produce the observation, despite the fact that it provides the highest probability of $X > x_m$. [fn 27]
Footnote 27. Note that, contrary to the similar probabilities for the models $H_1$ and $H_3$, this 13% is not a p-value, because $f(x\,|\,H_2) \ge f(x_m\,|\,H_2)\ \forall\, x > x_m$, while a p-value implies an integral over 'less probable' values.

Figure 4: Several models that could have produced the observed value of $x_m$ [8]. [Upper plot: pdf's $f(x)$ of $H_1$, $H_2$ and $H_3$ versus $x$, with $x_m$ marked and tail areas of 4%, 9% and 13% indicated; lower plot: pdf's of $H_3$, $H_4$, $H_5$ and $H_6$.]

• It follows that, if the values of the pdf's $f(x_m\,|\,H_i)$ are equal for all $H_i$, as in the lower plot of Fig. 4, then the experiment is irrelevant and we hold our beliefs, independently of how far $x_m$ occurs from the expected values $\mathrm{E}[X\,|\,H_i]$, or of the size of the area to the left or right of $x_m$.
• The reason why p-values 'often work' (and can then be useful alarm bells when getting experiments running, or validating freshly collected data) is quite simple.
  - Small p-values are normally associated to small values of the pdf, as shown in the upper plot of Fig. 5.
  - It is then conceivable that there is an alternative hypothesis $H_1$ such that $f(x_m\,|\,H_1) \gg f(x_m\,|\,H_0)$, as shown in the bottom plot of Fig. 5.
  - Then, if this is the case, the observed $x_m$ would push our beliefs towards $H_1$, in the sense
    $$\mathrm{BF}(H_1 : H_0) = \frac{f(x_m\,|\,H_1)}{f(x_m\,|\,H_0)} \gg 1\,.$$
  - BUT we need to take into account also the prior odds $P(H_1\,|\,I)/P(H_0\,|\,I)$.
  - In the extreme case such a conceivable $H_1$ could not exist, or it could be not believable, [fn 28] or it could be just ad hoc, as has been happening in recent years, with a plethora of 'theorists' who give credit to any fluctuation. If this is the case, as it is often the case in frontier physics, then
      ⇒ $P(H_1\,|\,I)/P(H_0\,|\,I) \to 0$
      ⇒ the smallness of the p-value is irrelevant!
    (Note that if, instead of the smallness of the value of the pdf, the rationale were really the smallness of the area below the pdf, then the absurd situation might arise in which one could choose a "rejection area" anywhere, as shown in chapter 1 of [8].)

Figure 5: Pdf's of $X$ given the null hypothesis $H_0$ and the alternative hypothesis $H_1$. [Upper plot: $f(x)$ under $H_0$, with $x_m$ marked and $P(x > x_m) = 0.01$; lower plot: the same with the pdf of $H_1$ added.]

Footnote 28. For the distinction between what is conceivable ("Nothing is more free than the imagination of man") and what is believable a reference to David Hume [46] is a must.

• Finally, in order to understand the apparent paradox of a large p-value and indeed a very large BF, think of a very predictive model $H_1$, whose pdf of the observable $x$ overlaps with that of $H_0$, like in the upper plot of Fig. 6. We clearly see that $f(x_m\,|\,H_1) \gg f(x_m\,|\,H_0)$, thus resulting in a Bayes factor highly in favor of $H_1$, although the p-value calculated from the null hypothesis $H_0$ would be absolutely insignificant. Something like that occurs in the analysis of the gravitational waves, the case of Cinderella being the most striking one. [fn 29]
• And 'paradoxically' (this is just a colloquial term, since there is no paradox at all) large deviations from the expected value of $x$ given $H_0$, corresponding to small p-values, are those which favor $H_0$, if $H_1$ and $H_0$ are the only hypotheses in hand, as shown in the bottom plot of the same figure.
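The last two bullets can be made concrete with a small numerical sketch. The two Gaussian models below are hypothetical (their parameters are chosen only for illustration and are not those of Fig. 6): a broad null H0 and a sharply predictive alternative H1 whose pdf sits well inside that of H0. The one-sided p-value is computed from H0 alone, while the log of the Bayes factor compares the two pdf's at the observed value.

```r
# Hypothetical models, for illustration only (not the ones plotted in Fig. 6)
mu0 <- 20; sd0 <- 5      # H0: broad null hypothesis
mu1 <- 22; sd1 <- 0.01   # H1: very predictive alternative, overlapping H0

compare <- function(x_m) {
  # one-sided p-value, computed from the null hypothesis only
  p_val <- pnorm(x_m, mu0, sd0, lower.tail = FALSE)
  # log Bayes factor H1 Vs H0: difference of the log pdf's at x_m
  log_bf <- dnorm(x_m, mu1, sd1, log = TRUE) - dnorm(x_m, mu0, sd0, log = TRUE)
  c(x_m = x_m, p_value = p_val, log_BF = log_bf)
}

# an observation in the bulk of H0 and one far in its tail
res <- rbind(compare(22), compare(40))
print(signif(res, 3))
```

At x_m = 22 the p-value is unremarkable (about 0.34) while the Bayes factor favors H1 by a factor of several hundred; at x_m = 40 the p-value is tiny, but the log Bayes factor is hugely negative, i.e. the observation favors H0 if these are the only two hypotheses in hand.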
Now, in the light of these examples, I simply re-propose to you the following sentence from the first principle of the ASA's statement:

"The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold." [2]

As you can now understand, it is not a matter of assumptions concerning $H_0$, but rather of whether alternative hypotheses to $H_0$ are conceivable and, more important, believable!

Footnote 29. I would like to recall that this is just an academic example to show that effects of this kind are possible and, as far as the GW analysis is concerned, I rely on the LIGO-Virgo collaboration for the evaluation of p-values and Bayes factors. I am not arguing at all that there could be mistakes in the calculation of the p-values, but rather that it is the interpretation of the latter that is troublesome. Finally, people mostly used to performing $\chi^2$ tests must have already realized that the example does not apply tout court to what they do, because in that case $H_1$ is usually 'richer' than $H_0$ and it has then a higher level of adaptability. Therefore the observed value of $\chi^2$ decreases (with a 'penalty' that frequentists quantify with a reduced number of degrees of freedom). As a consequence, the measured value of the test variable is different under the two hypotheses and, in order to distinguish them, let us indicate the first by $\chi^2_0$ and the second by $\chi^2_1$. What instead still holds, of the example sketched in the text, is that the adaptability of $H_1$ makes the p-value calculated from $f(\chi^2_1\,|\,H_1)$ larger than that calculated from $f(\chi^2_0\,|\,H_0)$,
$$\int_{\chi^2_{1m}}^{\infty} f(\chi^2_1\,|\,H_1)\, d\chi^2_1 \;>\; \int_{\chi^2_{0m}}^{\infty} f(\chi^2_0\,|\,H_0)\, d\chi^2_0\,,$$
and therefore $H_1$ 'gets preferred' to $H_0$. But, as stated in the text, the alternative hypothesis $H_1$ could be hardly believable, and therefore its 'nice' p-value will not affect the credibility of $H_0$. This almost regularly happens when suspicions against $H_0$ only arise from event counting in a particular variable, without any specific physical signature. [As a side remark, I would like to point out, or to recall, that one of the nice features of the Bayes factor calculated by integrating over the prior parameters of the model, as sketched in footnote 18, is that models which have a large number of parameters, whose possible values a priori extend over a large (hyper-)volume, are suppressed by the integral (F.1) with respect to 'simpler' models. This effect is known as Bayesian Occam's razor and is independent of other considerations which might enter in the choice of the priors. Those interested in the subject are invited to read chapter 28 of David MacKay's great book [55].]

Figure 6: Pdf's of $X$ given the null hypothesis $H_0$ and the alternative hypothesis $H_1$ (case of overlapping pdf's).

6.1 Playing with simulations

I hope it is now clear why p-values and Bayes factors have in principle nothing to do with each other, and why p-values are not only responsible for unjustified claims of discoveries, but might also relegate genuine signals to the level of fluke, or reduce their 'significance', the word now used as normally understood and not with the 'technical meaning' of statisticians. But since I know that many might not be used to the reasoning just shown, I made a little R script [56], so that those who are still sceptical can run it and get a feeling of what is going on.

    # initialization: parameters of the two Gaussian models and prior of H1
    mu.H0 <- 0; sigma.H0 <- 1
    mu.H1 <- 0; sigma.H1 <- 1e-3
    p.H1 <- 1/2
    mu <- c(mu.H0, mu.H1)
    sigma <- c(sigma.H0, sigma.H1)

    # simulation function
    simulate <- function() {
      # extract the 'true' model (0 or 1), then x from the corresponding Gaussian
      M <- rbinom(1, 1, p.H1); x <- rnorm(1, mu[M+1], sigma[M+1])
      # two-sided p-value calculated from H0
      p.val <- 2 * pnorm(mu[1] - abs(x-mu[1]), mu[1], sigma[1])
      # Bayes factor H1 Vs H0 and its log
      BF <- dnorm(x, mu[2], sigma[2]) / dnorm(x, mu[1], sigma[1])
      lBF <- dnorm(x, mu[2], sigma[2], log=TRUE) - dnorm(x, mu[1], sigma[1], log=TRUE)
      cat(sprintf("x = %.5f => p.val = %.2e, BF = %.2e [ log(BF) = %.2e ]\n",
                  x, p.val, BF, lBF))
      return(M)
    }

By default $H_0$ is simply a standard Gaussian distribution ($\mu = 0$ and $\sigma = 1$), while $H_1$ is still a Gaussian centered in 0, with a very narrow width ($\sigma = 1/1000$). The prior odds are set at 1 to 1, i.e. $P(H_1) = P(H_0) = 1/2$. Each call to the function simulate() prints the values that we would get in a real experiment ($x$, p-value, Bayes factor and its log) and returns the true model (0 or 1), stored in a vector variable for later check. In this way you can try to infer what was the real cause of $x$ before knowing the 'truth' (in simulations we can, in physics we cannot!). Here are the results of a small run, with n = 12 chosen in order to fill the page, thus postponing the solution to the next one.
    > set.seed(150914); n=12; M <- rep(NA, n); for(i in 1:n) M[i] <- simulate()
    x = -0.00079 => p.val = 9.99e-01, BF = 7.29e+02 [ log(BF) = 6.59e+00 ]
    x = -0.62293 => p.val = 5.33e-01, BF = 0.00e+00 [ log(BF) = -1.94e+05 ]
    x = -0.00029 => p.val = 1.00e+00, BF = 9.57e+02 [ log(BF) = 6.86e+00 ]
    x = -0.00162 => p.val = 9.99e-01, BF = 2.68e+02 [ log(BF) = 5.59e+00 ]
    x = -0.39258 => p.val = 6.95e-01, BF = 0.00e+00 [ log(BF) = -7.71e+04 ]
    x = -0.82578 => p.val = 4.09e-01, BF = 0.00e+00 [ log(BF) = -3.41e+05 ]
    x = 0.00073 => p.val = 9.99e-01, BF = 7.69e+02 [ log(BF) = 6.64e+00 ]
    x = -0.00012 => p.val = 1.00e+00, BF = 9.93e+02 [ log(BF) = 6.90e+00 ]
    x = 0.22295 => p.val = 8.24e-01, BF = 0.00e+00 [ log(BF) = -2.48e+04 ]
    x = -0.00022 => p.val = 1.00e+00, BF = 9.76e+02 [ log(BF) = 6.88e+00 ]
    x = 0.00117 => p.val = 9.99e-01, BF = 5.07e+02 [ log(BF) = 6.23e+00 ]
    x = -1.03815 => p.val = 2.99e-01, BF = 0.00e+00 [ log(BF) = -5.39e+05 ]

And the winners are:

    > M
    [1] 1 0 1 1 0 0 1 1 0 1 1 0

It should no longer be a surprise that the best figure to discriminate between the two models is the Bayes factor and not the p-value. [fn 30] You can now play with the simulations, varying the parameters. If you want to get a situation yielding Bayes factors of $O(10^{10})$ you can keep the standard parameters of $H_0$, fixing instead mu.H1 at 1.7 and sigma.H1 at $\approx 4 \times 10^{-10}$. Then you can choose p.H1 as you wish and run the simulation. (You also need to change the number of digits of $x$, replacing "%.5f" by "%.11f" inside sprintf().)

7 Conclusions

Uncritical or wishful use of p-values can be dangerous, not to speak of unscrupulous p-hacking. While years ago these criticisms were raised by a minority of thorny Bayesians, now the effect on the results in several fields of science and technology is felt as a primary issue. [fn 31]
The statement of the American Statistical Association is certainly commendable in addressing the issue, but it is in my opinion unsatisfactory in not admitting that the question is inherent to all statistical methods that refuse the very idea of probability of hypotheses, or of "probability of causes", i.e. what Poincaré used to call "the essential problem of the experimental method."

While I had experienced several times in the past, including this winter, claims of possible breakthrough discoveries in Particle Physics simply due to misinterpretations of p-values, for the first time I have become aware of a case in which judgements based on p-values strongly reduce the 'significance' of important results. This happens with the gravitational wave events reported this year by the LIGO-Virgo collaboration, and in particular with the October 12 event, timidly reported as a LIGO-Virgo Trigger ('Cinderella') because of its 1.7 sigmas, in spite of the huge Bayes factor of about $10^{10}$, which should instead convince any hesitating physicist about its nature of a gravitational wave radiated by a Binary Black Hole merger, especially in the light of the other two, more solid events ('the two sisters'). [fn 32] I hope then that LVT151012 will be upgraded to GW151012 and that in future searches the Bayes factor will become the principal figure of merit to rank gravitational wave candidates.

Footnote 30. If you don't like how the p-value is calculated in the script, because you might argue about one-sided or two-sided tail(s), you are welcome to recalculate it, but the substance of the conclusions will not change.

Footnote 31. In the meanwhile it seems that particle physicists are slow in learning the lesson and the number of graves in the Cemetery of physics (Fig. 2) has increased since 1985, the last funeral being recently celebrated in Chicago on August 5, with the following obituary for the dear departed: "The intriguing hint of a possible resonance at 750 GeV decaying into photon pairs, which caused considerable interest from the 2015 data, has not reappeared in the much larger 2016 data set and thus appears to be a statistical fluctuation" [57]. And de Rujula's dictum (footnote 14) gets corroborated. Someone would argue that this incident has happened because the sigmas were only about three and not five. But it is not a question of sigmas, but of Physics, as can be understood from those who in 2012 incorrectly turned the $5\,\sigma$ into a 99.99994% "discovery probability" for the Higgs [58], while in 2016 they are sceptical in front of a $6\,\sigma$ claim ("if I have to bet, my money is on the fact that the result will not survive the verifications" [59]): the famous "du sublime au ridicule, il n'y a qu'un pas" [from the sublime to the ridiculous there is but one step] seems really appropriate! (Or the less famous, outside Italy, "siamo uomini o caporali!?" [are we men or corporals!?]) Seriously, the question is indeed that, now that predictions of New Physics around what should have been a natural scale have substantially all failed, the only 'sure' scale I can see seems to be Planck's scale. I really hope that LHC will surprise us, but hoping and believing are different things. And, since I have the impression that there are too many nervous people around, both among experimentalists and theorists, and because the number of possible histograms to look at is quite large, after the easy bets of the past years (against the CDF peak and against superluminar neutrinos in 2011; in favor of the Higgs boson in 2011; against the 750 GeV di-photon in 2015, not to mention that against Supersymmetry, going on since it failed to predict new phenomenology below the $Z^0$ (or the $W$?) mass at LEP, thus inducing me more than twenty years ago to give away all SUSY Monte Carlo generators I had developed in order to optimize the performances of the HERA detectors) (*) I can serenely bet, as I keep saying since July 2012, that the first 5-sigma claim from LHC will be a fluke. (I have instead little to comment on the sociology of the Particle Physics theory community and on the validity of 'objective' criteria to rank scientific value and productivity, the situation being self evident from the hundreds of references in a review paper which even had in the front page a fake PDG entry for the particle [60] and other amenities you can find on the web, like [61].)
(*) Note added: on August 22, 2016 a supersymmetry bet among theorists was settled in Copenhagen, declaring winners those who bet against supersymmetry [62]. But I do not think all SUSY supporters will agree, because some of them seem to behave like the guy who said (reference missing) "I will not die, and nobody will be able to convince me of the opposite". Try to convince a dead man he died!

Footnote 32. The last point deserves a comment, because someone would object that the three events are "independent" and, "having nothing to do with each other, we have to prove one by one 1) first, that it is a gravitational wave, and then 2) that it comes from BBH merger." In reality it is the consistency of many things, including the fact that the values of the inferred parameters fall in the expected region, that makes us believe that they are gravitational waves and come from a BBH merger. This is because Physics, meant as a Science, i.e. an activity of our minds to understand the Physical World, can be viewed as a large network of experimental facts and models, connecting each other ("a matrix of beliefs", as historian Galison puts it [63]).
F or th is reason it is very hard, or even imp ossible, to accommo date in the ove rall pictu re a n ew observ ation that breaks dramatically the net, like the 2011 ‘sup erluminar neutrinos.’ Not by chance the title of the F ebruary 11 p ap er was Observation of Gr avitational Waves f r om a Binary Black Hole M er ger stressing b oth observations at onc e (or if you lik e ‘dis cov eries’ – but I don’t wa nt to enter into the question of what is ‘d iscov ery’ and w hat is ‘observ ation’, and I fi nd it commendable that the coll ab oration used lo w profile terminology). Therefore, after the fi rst ev ent we feel highly confiden t that ev ents of that k ind, with masses of that order of magnitud e do exist, and with this resp ect the three ev ents are not indep endent, if w e refer to probabilistic i nd ep endence. More precisely they are p ositively c orr elate d , i.e. P ( E 2 = “BBHm’s GW” | E 1 = “BBHm’s GW” , I ) > P ( E 2 = “BBHm’s GW” | I ) P ( E 1 = “BBHm’s GW” | E 2 = “BBHm’s GW” , I ) > P ( E 1 = “BBHm’s GW” | I ) , and so on. This effect, indeed rather intuitiv e, can b een shown to o ccur in a qu antita tive w ay , mo delling Galison’s matrix of b eliefs with a (simplified) pr ob abili stic network ‘Ba yesian net work’. F or this reas on our b elief that also Cinderella is a gra vitational w av e from a BBH merger increases in the ligh t that also th e sisters are ob jects o f the same kind. Note that this corroboration effect acts on the priors, while the Ba yes factor should only co ntain the experimental info rmation. But this is n ot exactly true, d ue to role t hat the priors on the mo del p arameters play in the calculation of the Bay es factor via the integral ( F . 1) of fo otnote 18. 
As soon as we start getting information about the BBH merger parameters, the prior pdf $f(\theta \mid H, I)$ used to analyze the next events becomes less ‘diffuse’ than it initially was, thus increasing the value of the integral (→ “Occam razor”) and then the resulting Bayes factor. (For a toy model showing the effect of mutually corroborating hypotheses see e.g. the Bayesian network described in Appendix J of [6].)

Note added: it is interesting to remark how, six months after the first announcement, made with much emphasis on the sigmas to prove its origin (plus Bayes factors), the Monster is finally considered ‘self evident’, or more precisely, “strong enough to be apparent, without using any waveform model, in the filtered detector strain data” [64]. So proceeds Science: the ‘matrix of belief’ has been clearly extended.

I finally conclude with some questions asked at the end of the talk on which this paper is based.

• Which Bayes factor would characterize the 750 GeV excess? The result depends on the model used to explain the excess³³ and an answer came the week after MaxEnt 2016 from Andrew Fowlie [66]. For the model considered he got a BF around 10, the exact value being irrelevant: a weak indication, but nothing striking enough to force sceptics to change their opinion substantially.³⁴

• Could CDF at Fermilab have claimed to have observed the Higgs boson if they had done a Bayesian analysis? I am quite positive they could have, also because the prior on the possible values of the Higgs mass was not so vague and matched well the value found later, and therefore the Bayes factor would have been rather high (and the prior probability of a possible manifestation of the boson in the final state was high too).
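How much a given Bayes factor should move a physicist depends on the prior, through the relation posterior odds = Bayes factor × prior odds. The following is a minimal sketch of that arithmetic, assuming two hypothetical priors (the labels and values are invented for illustration) and only the orders of magnitude of the Bayes factors quoted in the text.

```python
def posterior_probability(bayes_factor, prior_probability):
    """Return P(H | data, I) from the Bayes factor of H versus its
    alternative and the prior P(H | I), with 0 <= prior < 1."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical priors: a strong sceptic and an undecided physicist.
priors = {"sceptic": 1e-3, "undecided": 0.5}

# Orders of magnitude quoted in the text.
bayes_factors = {"750 GeV excess": 10.0, "Cinderella": 1e10}

results = {}
for case, bf in bayes_factors.items():
    for who, prior in priors.items():
        results[(case, who)] = posterior_probability(bf, prior)
        print(f"{case:15s} {who:10s} prior={prior:g} "
              f"posterior={results[(case, who)]:.6g}")
```

Under these assumed priors a Bayes factor of about 10 leaves the sceptic still very doubtful, while one of about $10^{10}$ overwhelms even a prior of one in a thousand.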
Acknowledgements

This work was partially supported by a grant from the Simons Foundation, which allowed me a stimulating working environment during my visit at the Isaac Newton Institute of Cambridge, UK. The understanding and/or presentation of several things in this paper has benefitted from interactions with Pia Astone, Ariel Caticha, Kyle Cranmer, Walter Del Pozzo, Norman Fenton, Enrico Franco, Gianluca Gemme, Stefano Giagu, Massimo Giovannini, Keith Inman, Gianluca Lamanna, Paola Leaci, Marco Nardecchia, Aleandro Nisati, and Cristiano Palomba. I am particularly indebted to Allen Caldwell, Alvaro de Rujula and John Skilling for many discussions on physics, probability, epistemology and sociology of scientific communities, as well as for valuable comments on the manuscript, which has also benefitted from an accurate reading by Christian Durante and Dino Esposito.

References

[1] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Observation of Gravitational Waves from a Binary Black Hole Merger, PRL 116, 061102 (2016), https://dcc.ligo.org/public/0122/P150914/014/LIGO-P150914_Detection_of_GW150914.pdf
[2] R. L. Wasserstein and N. A. Lazar, The ASA's Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2 (2016) 129-133, DOI: 10.1080/00031305.2016.1154108, http://dx.doi.org/10.1080/00031305.2016.1154108

³³ As an example from Particle Physics of model dependent Bayes factors see [65].

³⁴ A side question is how an experimental team can report the Bayes factor, since it depends on the alternative model.
Obviously it cannot (one of “Laplace’s teachings”), but it can provide Bayes factors using ‘popular’ models, or it could just report the integral which appears in the denominator, and provide information that allows other physicists to evaluate the numerator, depending on their model.

[3] M. Baker, Statisticians issue warning over misuse of P values, Nature, 531 (2016) 151.
[4] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Binary Black Hole Mergers in the first Advanced LIGO Observing Run, http://arxiv.org/abs/arXiv:1606.04856
[5] I. J. Good, A list of properties of Bayes-Turing factors, document declassified by NSA in 2011, https://www.nsa.gov/news-features/declassified-documents/tech-journals/assets/files/list-of-properties.pdf
[6] G. D'Agostini, A defense of Columbo (and of the use of Bayesian inference in forensics): A multilevel introduction to probabilistic reasoning, arXiv:1003.2086, http://arxiv.org/abs/1003.2086
[7] S. B. McGrayne, The theory that would not die: How Bayes' rule cracked the enigma code, hunted down russian submarines, and emerged triumphant from two centuries of controversy, Yale University Press 2012. (Video of the presentation of the book at Google available at https://www.youtube.com/watch?v=8oD6eBkjF9o)
[8] G. D'Agostini, Bayesian reasoning in data analysis – a critical introduction, World Scientific 2003.
[9] G. D'Agostini, Probably a discovery: Bad mathematics means rough scientific communication, http://arxiv.org/abs/1112.3620
[10] https://en.wikipedia.org/wiki/Misunderstandings_of_p-values
[11] http://en.wikipedia.org/wiki/P-value, https://en.wikipedia.org/wiki/Misunderstandings_of_p-values
[12] G.
D'Agostini, From Observations to Hypotheses: Probabilistic Reasoning Versus Falsificationism and its Statistical Variations, 2004 Vulcano Workshop on Frontier Objects in Astrophysics and Particle Physics, Vulcano (Italy), http://arxiv.org/abs/physics/0412148
[13] H. Poincaré, “Science et Hypothèse”, 1905.
[14] J. Skilling, Introductory tutorial at the MaxEnt 2016, July 10-15, 2016, Ghent, Belgium, http://www.maxent2016.org
[15] See e.g. https://www.youtube.com/watch?v=EYPapE-3FRw
[16] D. Overbye, Physicists in Europe Find Tantalizing Hints of a Mysterious New Particle, The New York Times, December 15, 2015, http://www.nytimes.com/2015/12/16/science/physicists-in-europe-find-tantalizing-hints-of-a-mysterious-new-particle.html?_r=0
[17] J. Parsons, CERN announces potential discovery of a new Higgs Boson particle at the Large Hadron Collider, Mirror, December 17, 2015, http://www.mirror.co.uk/news/technology-science/science/cern-announces-potential-discovery-new-7027421
[18] M. Delmastro, Qualcosa di nuovo da LHC? Solo il tempo lo dirà, Le Scienze, December 19, 2015, http://www.lescienze.it/news/2015/12/19/news/qualcosa_di_nuovo_a_lhc_solo_il_tempo_lo_dira_-2900622/
[19] F. Flam, Lies, Damned Lies and Physics, Bloomberg View, December 30, 2015, https://www.bloomberg.com/view/articles/2015-12-30/lies-damned-lies-and-physics
[20] B. Crew, Evidence of a new particle that could break the standard model of physics is mounting, Science Alert, March 21, 2016, http://www.sciencealert.com/evidence-of-a-new-particle-that-could-break-the-standard-model-of-physics-is-mounting
[21] Ph. Ball, I'd put a tenner – but not a ton – on the Higgs-Boson existing, The Guardian, December 23, 2011, https://www.theguardian.com/commentisfree/2011/dec/23/critical-scientist-higgs-boson
[22] S.
Goodman, A Dirty Dozen: Twelve P-Value Misconceptions, Seminars in Hematology 45 (2008) 135, http://www.perfendo.org/docs/BayesProbability/twelvePvaluemisconceptions.pdf
[23] P.S. Laplace, Essai philosophique sur les probabilités, 1814, http://books.google.it/books?id=JrEWAAAAQAAJ (the English quotes in this paper are taken from A.I. Dale's translation, Springer-Verlag, 1995).
[24] N. Fenton, D. Berger, D. Lagnado, M. Neil and A. Hsu, When ‘neutral’ evidence still has probative value (with implications from the Barry George Case), Science and Justice 54 (2014) 274, http://www.scienceandjusticejournal.com/article/S1355-0306(13)00059-2/abstract
[25] G. D'Agostini, Asymmetric Uncertainties: Sources, Treatment and Potential Dangers, arXiv:physics/0403086, http://arxiv.org/abs/physics/0403086
[26] European Network of Forensic Science Institutes, ENFSI Guidelines for the Evaluative Reporting in Forensic Science, March 8, 2015, http://www.enfsi.eu/sites/default/files/documents/external_publications/m1_guideline.pdf
[27] European Network of Forensic Science Institutes, Best Practice Manual for the Forensic Examination of Digital Technology, ENFSI-BPM-FIT-01, November 2015, http://www.enfsi.eu/sites/default/files/documents/1._forensic_examination_of_digital_technology_0.pdf
[28] B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Properties of the Binary Black Hole Merger GW150914, PRL 116, 241102 (2016), https://dcc.ligo.org/LIGO-P1500218/public
[29] https://en.wikiquote.org/wiki/Truth#H
[30] T. Siegfried, Odds Are, It's Wrong: Science Fails to Face the Shortcomings of Statistics, Science News 177, 26 (2010), https://www.sciencenews.org/article/odds-are-its-wrong
[31] A.
Caldwell, Lectures at the School on Bayesian analysis in Physics and Astronomy, Stellenbosch, South Africa, 23-26 November 2013.
[32] G. Naik, Scientists' Elusive Goal: Reproducing Study Results, The Wall Street Journal, December 2, 2011, http://www.wsj.com/articles/SB10001424052970203764804577059841672541590
[33] E. Iorns, Is medical science built on shaky foundations?, New Scientist, 12 September 2012, https://www.newscientist.com/article/mg21528826-000-is-medical-science-built-on-shaky-foundations/
[34] P. Jump, More than half of psychology papers are not reproducible, Times Higher Education, August 27, 2015, https://www.timeshighereducation.com/news/more-half-psychology-papers-are-not-reproducible
[35] P. Jump, Reproducing results: how big is the problem?, Times Higher Education, September 3, 2015, https://www.timeshighereducation.com/features/reproducing-results-how-big-is-the-problem
[36] R. Horton, Offline: What is medicine's 5 sigma?, The Lancet 385 (2015) 1380, http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2815%2960696-1/fulltext
[37] xkcd, Significant, http://xkcd.com/882/
[38] L. D. Nelson, False-positives, p-hacking, statistical power, and evidential value, BITSS 2014 Summer Institute, June 2014, https://bitssblog.files.wordpress.com/2014/02/nelson-presentation.pdf
[39] A. Charpentier, P-hacking, or cheating on a p-value, R-bloggers, June 2015, https://www.r-bloggers.com/p-hacking-or-cheating-on-a-p-value/
[40] https://en.wiktionary.org/wiki/If_you_torture_the_data_long_enough,_it_will_confess_to_anything
[41] D. Trafimow, Editorial, Basic and Applied Social Psychology 36 (2014) 1, http://www.tandfonline.com/doi/full/10.1080/01973533.2014.865505
[42] S.
Novella, Psychology Journal Bans Significance Testing, Science-Based Medicine, February 25, 2015, https://www.sciencebasedmedicine.org/psychology-journal-bans-significance-testing/
[43] A. Gelman, Psych journal bans significance tests; stat blogger inundated with emails, February 26, 2015, http://andrewgelman.com/2015/02/26/psych-journal-bans-significance-tests-stat-blogger-inundated-with-emails/
[44] J. Berger, Ph. Dawid, J. Kadane, T. O'Hagan, L. Pericchi, Ch. P. Robert and D. Szucs, contributions to Banning null hypothesis significance testing, ISBA Bulletin 22, March 2015, 5, https://bayesian.org/sites/default/files/fm/bulletins/1503.pdf
[45] D. Trafimow and M. Marks, Editorial, Basic and Applied Social Psychology 37 (2015) 1, http://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991
[46] D. Hume, A Treatise of Human Nature, 1739; An Enquiry Concerning Human Understanding, 1748. (Also available as audiobooks at LibriVox, with links to the online texts: https://librivox.org/treatise-of-human-nature-vol-1-by-david-hume/ ; https://librivox.org/an-enquiry-concerning-human-understanding-by-david-hume/)
[47] A. de Rujula, private communication, December 2011.
[48] A. de Rujula, “Snapshots of the 1985 high energy physics panorama”, Proc. of the Int. Europhys. Conf. on High-Energy Physics, Bari (Italy), July 1995, L. Nitti and G. Preparata eds.
[49] G. D'Agostini and G. Degrassi, Constraints on the Higgs Boson Mass from Direct Searches and Precision Measurement, Eur.Phys.J. C10 (1999) 663, http://link.springer.com/article/10.1007%2Fs100529900171 (arXiv:hep-ph/9902226, http://arxiv.org/abs/hep-ph/9902226).
[50] G. D'Agostini and G.
Degrassi, Constraining the Higgs boson mass through the combination of direct search and precision measurement results, arXiv:hep-ph/0001269, http://arxiv.org/abs/hep-ph/0001269
[51] P. Astone, G. D'Agostini and S. D'Antonio, Bayesian model comparison applied to the Explorer-Nautilus 2001 coincidence data, Class.Quant.Grav. 20 (2003) S769-S784 (arXiv:gr-qc/0304096, http://xxx.lanl.gov/abs/gr-qc/0304096).
[52] J. Veitch and A. Vecchio, Bayesian coherent analysis of in-spiral gravitational wave signals with a detector network, Phys. Rev. D 81 (2010) 062003 (arXiv:0911.3820).
[53] J. Skilling, Nested Sampling for General Bayesian Computation, Bayesian Analysis 1 (2006) 833, http://www.mrao.cam.ac.uk/~steve/maxent2009/images/skilling.pdf, https://en.wikipedia.org/wiki/Nested_sampling_algorithm
[54] G. K. Kanji, 100 statistical tests, SAGE Publications Ltd, 2006.
[55] D. J.C. MacKay, Information theory, Inference and learning algorithms, Cambridge University Press, 2003, http://www.inference.phy.cam.ac.uk/itila/book.html
[56] R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (Script at http://www.roma1.infn.it/~dagos/prob+stat.html)
[57] Email to the CERN users by the CERN DG Office, August 5, 2016.
[58] Corriere della Sera, “Trovata la particella di Dio” – Una caccia lunga mezzo secolo, July 3, 2012, http://www.corriere.it/scienze/12_luglio_03/trovata-particella-di-dio-caccia-lunga-mezzo-secolo-giovanni-caprara_b967689e-c4d0-11e1-a141-5df29481da70.shtml
[59] Repubblica, Da un laboratorio ungherese spunta la quinta forza, May 25, 2016, http://www.repubblica.it/scienze/2016/05/25/news/modello_standard_forze_fondamentali_cern_lhc_particelle_fondamentali_materia_oscura_bosone-140567449/
[60] A. Strumia, Interpreting the 750 GeV digamma excess: a review, CERN-TH-2016-131, http://arxiv.org/abs/1605.09401
[61] Game of Thrones: 750 GeV edition, Résonaances, June 18, 2016, http://resonaances.blogspot.co.uk/2016/06/game-of-thrones-750-gev-edition.html
[62] N. Wolchover, Supersymmetry bet settled with cognac, Quanta Magazine, August 22, 2016, https://www.quantamagazine.org/20160822-supersymmetry-bet-settled-cognac/
[63] P.L. Galison, How experiments end, The University of Chicago Press, 1987.
[64] B. P. Abbott et al., LIGO-Virgo Collaboration, The basic physics of the binary black hole merger GW150914, arXiv:1608.01940, http://arxiv.org/abs/1608.01940
[65] D. Ghosh, M. Nardecchia and S. A. Renner, Hint of lepton flavour non-universality in B meson decays, J. High Energ. Phys. (2014) 131, http://link.springer.com/article/10.1007/JHEP12(2014)131, http://arxiv.org/pdf/1408.4097.pdf
[66] A. Fowlie, Bayes-factor of the ATLAS diphoton excess, arXiv:1607.06608, http://arxiv.org/abs/1607.06608
