Justifying the Norms of Inductive Inference


Authors: Olav Benjamin Vassend

September 17, 2019

Abstract

Bayesian inference is limited in scope because it cannot be applied in idealized contexts where none of the hypotheses under consideration is true and because it is committed to always using the likelihood as a measure of evidential favoring, even when that is inappropriate. The purpose of this paper is to study inductive inference in a very general setting where finding the truth is not necessarily the goal and where the measure of evidential favoring is not necessarily the likelihood. I use an accuracy argument to argue for probabilism, and I develop a new kind of argument to argue for two general updating rules, both of which are reasonable in different contexts. One of the updating rules has standard Bayesian updating, Bissiri et al.'s (2016) general Bayesian updating, Douven's (2016) IBE-based updating, and Vassend's (2019a) quasi-Bayesian updating as special cases. The other updating rule is novel.

Contents

1 Introduction
2 Why credibility functions should be probabilistic
3 Deriving the updating rules
  3.1 The combination step
  3.2 The normalization step
  3.3 Characterizations of inferential and predictive updating
4 Discussion of inferential and predictive updating
  4.1 The difference between inferential updating and predictive updating
  4.2 The relationship between inferential updating and other updating procedures
5 Conclusion
A Characterization of the combination function
B Characterization of the normalization step
C Characterization of inferential updating
D Characterization of predictive updating
E General Bayesian updating is a special case of inferential updating
F An alternative characterization of the combination step

1 Introduction

Bayesians hold that inductive inference requires two ingredients. First, a prior probability function defined on the hypotheses under consideration. Second, a likelihood function, which assigns a probability to the evidence conditional on each hypothesis. Intuitively, the prior probability assigned to a hypothesis represents how plausible it is that the hypothesis is true before the evidence has been taken into account. The likelihood, on the other hand, is a measure of evidential favoring: if H1's likelihood on the evidence is greater than H2's likelihood on the same evidence, then the evidence favors H1 over H2. Given a prior and likelihood, Bayesians hold that the prior probability of each hypothesis should be updated to a posterior probability through the use of Bayes's formula, so that the posterior probability of H is proportional to the prior probability of H multiplied by its likelihood.

Bayesianism has become the most common formal framework used by philosophers of science to study scientific methodology, and it is also an influential framework for statistical inference. But it rests on an assumption that is often violated in scientific practice, namely that one of the hypotheses under consideration is true.[1] Suppose none of the hypotheses under consideration is true, so that the goal is instead to find the hypothesis that is, in some sense, best. Depending on what is meant by "best," the likelihood may not be an appropriate measure of evidential favoring.
For example, suppose the goal is to identify the hypothesis whose expected maximal prediction error on future data is as low as possible. Then, as Vassend (2019a) shows, the likelihood is not an appropriate measure of evidential favoring, because the hypothesis that has the best likelihood score on the evidence will in general not be the hypothesis that has the lowest expected maximal prediction error on future data. In this context, a more reasonable measure of evidential favoring may be one according to which the evidence favors H1 over H2 if and only if H1's maximal prediction error on the evidence is lower than H2's maximal prediction error on the evidence. The fact that Bayesianism is tied to using the likelihood as a measure of evidential favoring is therefore a limitation of the framework.

[1] This limitation is well known, but often ignored. For discussion of the problem, see, e.g., Box (1980); Bernardo and Smith (1994); Forster and Sober (1994); Forster (1995); Key et al. (1999); Shaffer (2001); Sprenger (2009); Gelman and Shalizi (2013); Vassend (2019b); Walker (2013); and Sprenger (forthcoming).

The goal of this paper is to study inductive inference in a very general setting. Suppose our goal is to identify the best hypothesis H (where "best" does not necessarily mean "true"). Let p be a function that assigns a number between 0 and 1 (inclusive) to each hypothesis, such that p(H) is interpreted as representing a prior judgment of how plausible it is that H is best (in the relevant sense) out of the hypotheses under consideration. In the rest of the paper, I will refer to any such function as a "credibility function". Suppose, moreover, that Ev[E|H] is an evidential measure that is sensible given the purpose at hand. Then the questions to consider are as follows:

(1) What norms should p obey?
(2) How should p(H) and Ev[E|H] be combined in order to produce a posterior score p_E(H) that represents how plausible it is that H is best in light of E and the prior information?

As we will see, one of the standard Bayesian arguments for probabilism generalizes, so that, given widely applicable conditions, p and p_E ought to be probability functions. The more interesting results concern updating. I will show that, depending on what the goal is, the prior probability function and evidential measure should be combined in one of the following two ways in order to produce a posterior probability:

Inferential updating. Given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following formula:

$$ p_E(H) = \frac{\mathrm{Ev}[E \mid H]\, p(H)}{\sum_i \mathrm{Ev}[E \mid H_i]\, p(H_i)} $$

Predictive updating. Given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following procedure:

Step 1. For each i, calculate q(H_i) = p(H_i) + Ev[E | H_i].

Step 2. Transform q to p_E as follows: for each i, p_E(H_i) = 0 or p_E(H_i) = q(H_i) + d, where d is the unique number such that d is minimal and, for all i, p_E(H_i) ≥ 0 and Σ_i p_E(H_i) = 1.

The justification for the names of the two updating procedures will become clearer later. Inferential updating is clearly a generalization of Bayesian updating. Indeed, Bayesian updating is just inferential updating with the likelihood used as the measure of evidential favoring.[2] What separates inferential updating from predictive updating is the former rule's commitment to Regularity: inferential updating will never assign a probability of 0 to any hypothesis, whereas predictive updating typically will. In Section 4, we'll see that a commitment to Regularity is sometimes reasonable and sometimes not.
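Concretely, the two rules can be sketched as follows (a minimal Python sketch; the prior and the evidential scores are hypothetical, and Step 2 of predictive updating is implemented on the natural reading that the surviving scores are all shifted by d while the remaining scores are clipped to 0, i.e., q is projected onto the probability simplex):

```python
def inferential_update(p, ev):
    # Multiplicative combination of prior and evidential score,
    # followed by multiplicative (Bayes-style) normalization.
    scores = [e * h for e, h in zip(ev, p)]
    total = sum(scores)
    return [s / total for s in scores]

def predictive_update(p, ev):
    # Step 1: additive combination q(Hi) = p(Hi) + Ev[E|Hi].
    q = [h + e for h, e in zip(p, ev)]
    # Step 2: find the minimal uniform shift d such that clipping the
    # shifted scores at 0 yields a probability vector.
    u = sorted(q, reverse=True)
    d, running = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        shift = (1.0 - running) / j
        if uj + shift > 0:  # the j largest scores survive under this shift
            d = shift
    return [max(qi + d, 0.0) for qi in q]

p  = [0.5, 0.3, 0.2]    # hypothetical prior probabilities
ev = [0.9, 0.15, 0.05]  # hypothetical evidential scores Ev[E|Hi]

print(inferential_update(p, ev))  # every posterior stays positive
print(predictive_update(p, ev))   # the worst hypothesis drops to 0
```

On these numbers, inferential updating keeps all three hypotheses in play, while predictive updating assigns the third hypothesis a posterior probability of 0, illustrating the contrast over Regularity described above.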
The plan for the rest of the paper is as follows. In Section 2, I sketch an argument for why any credibility function ought to be probabilistic, regardless of whether the goal is truth or something else. Since the argument is a straightforward adaptation of Pettigrew's (2016) accuracy argument for probabilism, the section is brief. In Section 3, I give characterizations of inferential and predictive updating from a set of plausible assumptions. The strategy is to divide inductive updating into two steps: in the first step, the prior plausibility of a hypothesis is combined with the hypothesis's score on the evidence according to some measure of evidential favoring in order to produce a posterior score. In the second step, the posterior scores are normalized so that they are probabilistic. As we'll see, the requirement that the combination step and normalization step commute in certain desirable ways, together with a few other plausible assumptions, results in the conclusion that the combination step and normalization step must both be either multiplicative or additive. The characterizations of inferential and predictive updating are then just a few short steps away. I end the paper with a discussion of inferential and predictive updating, including their relationship to each other and to other updating rules.

[2] Predictive updating, on the other hand, may remind the reader of the alternative to Jeffrey conditionalization derived by Leitgeb and Pettigrew (2010). The two rules do indeed share several features in common, although they are also importantly different. In fact, it is possible to derive a special case of predictive updating by using a proof strategy that resembles the one in Leitgeb and Pettigrew (2010).
2 Why credibility functions should be probabilistic

Before we can show that credibility functions ought to be probabilistic, we need to get clearer on what this claim amounts to. Let H be a set of hypotheses and suppose the goal is to identify the hypothesis in H that is best rather than true (where "best" can mean anything we like). One complication that arises when "true" is replaced by "best" is that whereas there is only one true hypothesis, there may be several that are best.[3] For example, if "best" means "having a minimal maximum expected prediction error," then there may be several hypotheses that are tied for best. Note, however, that this is more a theoretical possibility than a practical one, since it is quite unlikely that multiple hypotheses would have (say) exactly the same predictive accuracy score, especially if the number of hypotheses is large. I will henceforth assume that at most one hypothesis out of the hypotheses under consideration is best. Note that if we make this assumption, then the hypotheses will also be mutually exclusive in the sense that in any subset of hypotheses at most one hypothesis can be best.

Another theoretical possibility is that none of the hypotheses under consideration is best. This can, for example, happen if the hypothesis space is infinite and does not contain a single best hypothesis, but rather an infinite sequence of hypotheses in ascending order of goodness.[4] To preclude this possibility, we must also assume that at least one of the hypotheses under consideration is best.

Provided we make the above assumptions (i.e., that exactly one of the hypotheses in H is best), then there is nothing mathematical or philosophical that prevents us from treating H as a sample space. That is,
H consists of hypotheses that are exhaustive in the sense that one of the hypotheses is best, and mutually exclusive in the sense that at most one of the hypotheses is best in any collection of hypotheses. Note also that there is a natural σ-algebra on H. More precisely, union (or disjunction) and intersection (or conjunction) are defined in the normal way, the identity element for conjunction (i.e., the top element of the algebra) is H, and the complement (negation) of any set A formed through unions and intersections of subsets of H is defined in the following way: ¬A := H − A. The main difference from the definition given in most philosophical treatments of Bayesianism is that the top element is now H rather than the tautology. This makes a big interpretive difference, but no difference to the mathematics.

[3] I thank X for pointing this out to me.

[4] I thank a referee for pointing out this possibility.

Given the above set-up, we can now define what it means for a function on the algebra, H*, generated by H to be probabilistic in the following way:

Probability axioms. A function p defined on H* is probabilistic if and only if it satisfies the following requirements:

1. p(H) = 1.
2. p(A) ≥ 0 for every element A of H*.
3. p(A ∨ B) = p(A) + p(B) − p(A & B), for all elements A and B of H*.

Note that credibility functions automatically satisfy 2, since we have defined them to have a range between 0 and 1, so the real question is whether they ought to satisfy 1 and 3.

One of the standard arguments for why regular credence functions (or degrees of belief) ought to be probabilistic is the accuracy argument (Joyce (1998), Joyce (2009), Pettigrew (2016), Predd et al. (2009)). Briefly, the argument is as follows:[5] the ideal credence function to have is the function that assigns 1 to the hypothesis that is true and 0 to all hypotheses that are false.
Suppose now that we have a divergence measure (satisfying certain reasonable properties) that quantifies the distance between the ideal function and any other candidate credence function. It can then be shown that any credence function that is not probabilistic will be dominated by some probabilistic function, in the sense that the probabilistic function is guaranteed to have a smaller divergence from the ideal function. Since it is irrational to choose an option that is known to be dominated, it follows that it is irrational to use a non-probabilistic credence function.

An interesting fact about the accuracy argument for probabilism is that it does not depend for its validity on any specific interpretation of the credence function, nor does it depend on the assumption that the ideal credibility function is the function that assigns 1 to the hypothesis that is true and 0 to all hypotheses that are false. Indeed, nothing in the accuracy argument prevents us from designating the ideal credibility function otherwise. Hence, we can easily adapt the argument to a context where the goal is to identify the hypothesis that is best rather than true. In such a context, the ideal function would clearly be one that assigns 1 to the hypothesis that is best and 0 to all other hypotheses. We can then formulate the following version of the accuracy argument:

[5] There are several versions of the argument; here, I present a variant of Pettigrew's (2016) version.

P1: The ideal credibility function is the function that assigns 1 to the hypothesis that is best and 0 to all other hypotheses.

P2: Given any non-probabilistic function, there is a probabilistic function that is guaranteed to have a smaller divergence from the ideal function (given that the divergence measure has certain reasonable properties).
P3: Given any probabilistic function, there does not exist any function that is guaranteed to have a smaller divergence from the ideal function (given that the divergence measure has certain reasonable properties).

P4: If P1-P3, then non-probabilistic credibility functions are irrational.

C: Non-probabilistic credibility functions are irrational.

P2 and P3 are mathematical theorems (proven by Predd et al. (2009)) that hold regardless of what we choose as the ideal function. P1 and P4, on the other hand, are intuitively reasonable general rational principles. The main question that may be raised about the generalized version of the accuracy argument is whether the conditions on the divergence measure are still reasonable when truth is no longer the goal. For example, P2 and P3 require the assumption that the divergence measure belongs to the class of Bregman divergences. Is this a reasonable requirement to make? My only response to this question is that I do not see how this assumption (and other necessary mathematical assumptions) are more plausible if truth is the goal than if the goal is to identify the hypothesis that is best in some other sense. So, at least in my eyes, the generalized accuracy argument is at least as plausible as the original argument. In any case, my main goal in this paper is not to give a careful analysis of the accuracy argument. From now on, I will assume that any credibility function ought to be probabilistic. That is, I will assume that if p is a function that assigns a number between 0 and 1 to each hypothesis H that represents how plausible it is that H is best (in some sense), then p ought to be probabilistic. In the next section, I turn to the main question of the paper: given a probability function p and given a piece of evidence E, how should p be updated in light of E?
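To make P2's dominance claim concrete, here is a small numerical illustration (the numbers are hypothetical) using the squared Euclidean distance, i.e., the Brier divergence, which is a Bregman divergence. The non-probabilistic credibility function (0.6, 0.6) over two exclusive and exhaustive hypotheses is beaten by the probabilistic function (0.5, 0.5) no matter which hypothesis is best:

```python
def brier(a, b):
    # Squared Euclidean distance, a simple Bregman divergence.
    return sum((x - y) ** 2 for x, y in zip(a, b))

nonprob = (0.6, 0.6)   # credibilities sum to 1.2: not probabilistic
prob    = (0.5, 0.5)   # a probabilistic competitor

ideals = [(1.0, 0.0), (0.0, 1.0)]  # H1 is best, or H2 is best

# prob is strictly closer to the ideal function whichever hypothesis is best:
for ideal in ideals:
    assert brier(prob, ideal) < brier(nonprob, ideal)
```

Whatever the facts turn out to be, the probabilistic function has a strictly smaller divergence from the ideal (0.5 versus 0.52), so the non-probabilistic function is dominated.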
3 Deriving the updating rules

Suppose we have a credibility function defined on a hypothesis set H that is probabilistic in the sense of the preceding section. Suppose, also, that we have an evidential measure function Ev[E|H] defined on the set of evidence and the set of hypotheses under consideration. Note that we are not assuming that Ev[E|H] is probabilistic (e.g., Σ_i Ev[E|H_i] need not sum to 1).

It is widely accepted that if the goal is to find the true hypothesis in a partition of hypotheses and the evidential measure is the likelihood, i.e., Ev[E|H] = p(E|H), then any probability function over the hypotheses ought to be updated through Bayesian updating:

$$ p_E(H) = \frac{p(E \mid H)\, p(H)}{\sum_i p(E \mid H_i)\, p(H_i)} $$

The natural generalization of Bayesian updating is what I have called inferential updating in the introduction. However, it is not clear why the prior probability function and the evidential measure should always be combined in a Bayesian-like manner, regardless of what the evidential measure is and regardless of what the purpose of updating is. Unfortunately, whereas the accuracy argument for probabilism does not make any assumptions about how the credibility function is interpreted, the standard accuracy argument for Bayesian updating (Greaves and Wallace, 2006) relies on properties that are unique to the likelihood, in particular the fact that the likelihood forms a joint distribution with the prior. Thus, the standard accuracy argument does not generalize to cases where the evidential measure is not the likelihood. Other standard arguments for Bayesian updating have the same limitation (e.g., Dutch book arguments). A different kind of approach is therefore needed.

Bissiri et al. (2016) come up with a different approach.
They show that provided the evidential measure is a function of an additive loss function, L(E, H), such that Ev[E1 & E2 | H] = f(L(E1, H) + L(E2, H)), and given that a few other assumptions are met, the updating procedure must have the following form, where c is some constant:

$$ p_E(H) = \frac{e^{-c \cdot L(E, H)}\, p(H)}{\sum_i e^{-c \cdot L(E, H_i)}\, p(H_i)} \qquad (3.1) $$

Bissiri et al. (2016) call the above updating procedure "general Bayesian updating." General Bayesian updating traces back to Zhang (2006) and has been increasingly influential in statistics in recent years.[6]

[6] See Grünwald and van Ommen (2017) for a thorough discussion of general Bayesian updating and related updating rules.

Although Bissiri et al.'s (2016) argument for general Bayesian updating is interesting, it has several limitations. One problem is that, as Vassend (2019b) argues, the probabilities in (3.1) cannot be interpreted in the standard Bayesian way as plausibilities of truth. But if the probabilities are not standard credibility functions, then the decision-theoretic framework assumed by Bissiri et al. (2016) would seem to lack justification. The argument also makes certain mathematical assumptions that seem hard to justify from a philosophical point of view. In particular, the authors base their argument in part on the use of statistical divergence measures, and they assume that the divergence belongs to the class of f-divergences.[7] This assumption rules out many standard divergence measures, including all Bregman divergences aside from the Kullback-Leibler divergence (Amari, 2009).[8] A final limitation of Bissiri et al.'s (2016) derivation is that there are many reasonable evidential measures that cannot be written as a function of an additive loss function.
Indeed, even the likelihood will only have such a form if the evidence is independent conditional on H_i, for all i.[9] Thus, although their argument is interesting, a more general approach that makes less restrictive and more philosophically defensible assumptions is desirable. That is the goal of this section. Later we will see that Bissiri et al.'s (2016) updating rule may be derived as a special case.

To start, note that ordinary Bayesian updating can be decomposed into two steps:

Combination step. For each i, calculate p*(H_i) = p(E | H_i) p(H_i).

Normalization step. Transform p* to p′ as follows: for each i, p′(H_i) = p*(H_i) / p(E).

In the first step, the prior plausibility of the hypothesis is combined with the evidential score (i.e., likelihood) of the hypothesis in order to produce an overall judgment of the hypothesis's posterior plausibility. In the second step, the posterior plausibilities of all the hypotheses are rescaled in such a way that they jointly obey the probability axioms, i.e., such that all the posterior plausibility scores fall between 0 and 1, inclusive, and jointly sum to 1.

Bayesian updating is a special case of a much broader class of updating rules that decompose into a combination step and a normalization step. The purpose of the remainder of this paper will be to study this class of updating rules. The combination step requires a combination function, c, that takes as its input a prior probability, p(H), and a set of evidential scores, Ev[E1 | H], Ev[E2 | H, E1], Ev[E3 | H, E1, E2], etc., and that assigns a total score to H, taking into consideration both its prior probability and its performance on the evidence. The normalization step then transforms
those scores into probabilities. In other words, on an abstract level, our purpose will be to study updating procedures that decompose in the following way:

Combination step: For each hypothesis, H_i, a set of evidential scores and a prior probability are combined using some combination function c in order to produce an overall posterior score for H_i.

Normalization step: The posterior scores of all the H_i are transformed using some function N such that they jointly satisfy the probability axioms.

In the next two subsections, the combination step and the normalization step are analyzed in detail. The goal is to show that, given reasonable assumptions, the combination function c and the normalization function N both have a very limited set of possible functional forms.

[7] They also give an alternative derivation that does not make this assumption. However, the alternative derivation makes other suspect assumptions. In particular, it assumes that the normalization procedure is multiplicative, which we'll see later in this paper can be put into question.

[8] Recall that Bregman divergences play a crucial role in the accuracy argument for probabilism. The justification for the focus on Bregman divergences is their tight connection to strict propriety (see Predd et al. (2009)).

[9] If p(E1, E2 | H) = p(E1 | H) p(E2 | H), we can write p(E1, E2 | H) = e^{log p(E1|H) + log p(E2|H)}, i.e., the likelihood is of the form required by Bissiri et al. (2016). But if p(E1, E2 | H) ≠ p(E1 | H) p(E2 | H), then we cannot write the likelihood in this way.
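Rule (3.1) already fits this two-step mold: its combination step multiplies the prior by exp(-c · L(E, H)), and its normalization step is the usual multiplicative rescaling. Here is a minimal sketch (the losses are hypothetical; when L is the negative log-likelihood and c = 1, the rule reduces to ordinary Bayesian updating):

```python
import math

def general_bayesian_update(p, losses, c=1.0):
    # Inferential updating with evidential measure Ev[E|Hi] = exp(-c * L(E, Hi)).
    scores = [math.exp(-c * L) * h for L, h in zip(losses, p)]
    total = sum(scores)
    return [s / total for s in scores]

p = [0.5, 0.5]                        # hypothetical prior
likelihoods = [0.8, 0.2]              # hypothetical p(E|Hi)
losses = [-math.log(l) for l in likelihoods]

print(general_bayesian_update(p, losses))         # ordinary Bayes: ~[0.8, 0.2]
print(general_bayesian_update(p, losses, c=0.5))  # smaller c tempers the evidence
```

Varying the constant c interpolates between ignoring the evidence (c = 0) and weighting it ever more heavily, which is one reason the rule has attracted interest in statistics.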
3.1 The combination step

Let e1 and e2 represent the evidential scores of a hypothesis H on some evidence, and let h represent H's prior probability; then there are two candidate forms for the combination function that arguably stand out as being particularly plausible:

Additive combination: c(e1, e2, h) = e1 + e2 + h

Multiplicative combination: c(e1, e2, h) = e1 · e2 · h

Note that e1 and e2 here may represent either conditional or unconditional evidential scores. For example, e1 may represent Ev[E1 | H], i.e., the unconditional evidential score of H on E1, or it may represent Ev[E1 | H, E2], i.e., the conditional evidential score of H on E1 given that E2 has already been taken into account. Note, also, that to say that the combination function is additive or multiplicative is not the same as saying that the evidential measure is additive or multiplicative in the sense that Ev[E1, E2 | H] = Ev[E1 | H] + Ev[E2 | H] or Ev[E1, E2 | H] = Ev[E1 | H] · Ev[E2 | H]. The latter assumptions are much stronger, and amount to assuming that E1 and E2 are independent conditional on H (relative to the evidential measure Ev).

If we make a few reasonable assumptions, we can prove that the combination function must be multiplicative or additive. First of all, suppose we have evidential scores e1 and e2, and a prior probability h. Clearly, the order in which we combine the evidential scores and the prior should not matter for the final result we get. That is not to say that the order in which the evidence is received does not matter; it may. For example, if we flip a coin and the outcomes are six heads in a row and then six tails in a row, then the order of the outcomes strongly suggests that the outcomes are probabilistically dependent.
Nevertheless, the order in which we evaluate the available pieces of evidence in order to produce an overall judgment should not influence the overall judgment at which we arrive. For that reason, the combination function should be commutative: c(e1, e2) = c(e2, e1). Furthermore, it clearly should not matter whether we first combine e1 and e2 and then combine the result of that with e3, or whether we combine e2 with e3 and then combine the result with e1, or whether we combine all three pieces of evidence at the same time. In other words, c should be associative: c(e1, c(e2, e3)) = c(c(e1, e2), e3) = c(e1, e2, e3).

The final reasonable requirement is more quantitative. Clearly, the impact that e1 has on H's overall evidential score, after e2 has already been taken into account, should not depend on the impact that e2 has on H. That is not to say that a piece of evidence E2 should not influence the impact that a different piece of evidence E1 has on H's evidential score; it may well, but if it does, it should do so through Ev[E1 | H, E2]. A piece of evidence may influence the evidential impact conferred by another piece of evidence, but the evidential scores themselves should not influence each other. In other words, the requirement is that the impact that, for example, e1 = Ev[E1 | H, E2] makes on H's total evidential score should not depend on the impact that e2 = Ev[E2 | H] makes on H's total evidential score, nor vice versa.

Given that we are willing to suppose that the combination function is twice differentiable, the preceding requirement may be naturally formalized as constraints on the partial derivatives of the combination function. Let c(x, y) be the combination function as a function of variables x and y.
Then the impact that the evidential score e1 makes on H's total evidential score is plausibly the value of the partial derivative of c(x, y) with respect to x, evaluated at x = e1. If this partial derivative at x = e1 is a large number, then setting x to e1 makes a large difference to H's overall evidential score; if it is 0, then e1 makes no difference. The requirement that the impact that e1 makes should not depend on the impact that e2 makes, nor vice versa, for any e1 and e2, may then be formalized in terms of a constraint on the higher-order partial derivatives of c, namely that for some constant k the following equation be obeyed:

$$ \frac{\partial^2 c(x, y)}{\partial x \, \partial y} = k $$

The above equation formalizes the idea that the impact that x makes, i.e., ∂c/∂x, should not depend on the impact that y makes, i.e., ∂c/∂y, where x and y represent any possible evidential scores. We can now show the following (the derivation is in Appendix A):

Characterization of the combination function. Suppose the combination function, c(x, y), satisfies the following requirements:

1. c is commutative.
2. c is associative.
3. c is twice differentiable.
4. c's partial derivatives satisfy the following equation, for some number k:

$$ \frac{\partial^2 c(x, y)}{\partial x \, \partial y} = k $$

Then c must have one of the following two forms:

1. If k = 0, then c(x, y) = x + y.
2. If k ≠ 0, then c(x, y) = xy.

Hence, it follows that the combination function must be additive or multiplicative. Of course, this conclusion is only as plausible as the assumptions from which it is derived, and some people may be uncomfortable with some of the assumptions that have been made, in particular the condition on the partial derivatives of the combination function. As it happens, it is possible to derive the conclusion from quite different assumptions.
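The derivative condition lends itself to a quick numerical sanity check: a central finite-difference estimate of the mixed partial is constant at 0 for the additive form and constant at 1 for the multiplicative form (a small sketch; the sample points are arbitrary):

```python
def mixed_partial(c, x, y, h=1e-4):
    # Central finite-difference estimate of d^2 c / (dx dy).
    return (c(x + h, y + h) - c(x + h, y - h)
            - c(x - h, y + h) + c(x - h, y - h)) / (4 * h * h)

additive       = lambda x, y: x + y
multiplicative = lambda x, y: x * y

for x, y in [(0.2, 0.7), (1.5, 3.0), (10.0, 0.1)]:
    assert abs(mixed_partial(additive, x, y) - 0.0) < 1e-6
    assert abs(mixed_partial(multiplicative, x, y) - 1.0) < 1e-6
```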
Hence, in order to show the robustness of the conclusion, I provide an alternative characterization of the combination function in Appendix F.

3.2 The normalization step

After the combination function has produced a posterior plausibility score, the posterior score must be normalized to be a probability. In theory, normalizing a set of numbers means transforming the numbers in such a way that they are all between 0 and 1 and jointly sum to 1, while at the same time retaining as much of their internal structure as possible. In practice, this means that the most extreme numbers in the set may be forced to take the value 0, while the remaining numbers in the set are rescaled by some function, f. In other words, normalization in general takes the following functional form:

$$ N(x) = \begin{cases} 0 & \text{if } x \text{ is sufficiently low} \\ f(x) & \text{otherwise} \end{cases} \qquad (3.2) $$

For example, in the normalization step of standard Bayesian updating, N(x) = f(x) (i.e., no non-zero numbers are normalized to 0), and if the set to be normalized is {a_1, a_2, ..., a_n}, then f(x) = x / Σ_i a_i. Note that both N and f are relative to the set that is being normalized; hence, if we need to be precise, we should write N_S and f_S, where the subscript indicates the set that is being normalized. Nevertheless, I will typically leave off the subscripts in order to avoid clutter.

Clearly, f should be a one-to-one function. Indeed, except in the case where x and y are both normalized to 0, it should be the case that if x < y then f(x) < f(y). Furthermore, it is clear that the function f ought to commute with the combination function. Suppose we have scores e1, e2, and h.
Then we should arrive at the same posterior probability regardless of whether we do either of the following: first we combine h and e₂, normalize, then combine the normalized result with e₁ and normalize again; or we first combine h and e₁, normalize, and then combine that normalized result with e₂ before normalizing again. In symbols, we require, for all possible scores x, y, and z, that f(c(x, f(c(y, z)))) = f(c(f(c(x, y)), z)). The justification for this requirement is, again, that the order in which we evaluate our evidence, which is arbitrary, should not have an influence on our final judgment. By combining just the preceding two requirements, we can show the following:

Characterization of the normalization procedure. Suppose we have a normalization procedure as in (3.2) that satisfies the following requirements:

1. f commutes with the combination function c: for all x, y, and z, f(c(x, f(c(y, z)))) = f(c(f(c(x, y)), z)).
2. f is one-to-one: for all x and y, f(x) = f(y) if and only if x = y.

Then the normalization process must have one of the following forms, for some constant k that depends on the set, S, of numbers being normalized:

1. If the combination function is multiplicative, then, for all x in S, f(x) = k·x.
2. If the combination function is additive, then, for all x in S, f(x) = x + k.

The proof, which again is straightforward, is in Appendix B.

3.3 Characterizations of inferential and predictive updating

The results so far show that any updating procedure needs to have either (1) a multiplicative combination step and a multiplicative normalization step, or (2) an additive combination step and an additive normalization step. Call an updating procedure that satisfies either (1) or (2) a legitimate updating procedure.¹⁰
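In the multiplicative case, the commuting requirement amounts to the familiar order-invariance of sequential updating. A minimal sketch with hypothetical scores:

```python
def combine(x, y):                 # multiplicative combination step
    return x * y

def normalize(scores):             # multiplicative normalization: f(x) = k*x, k = 1/sum
    total = sum(scores)
    return [s / total for s in scores]

priors = [0.5, 0.3, 0.2]           # hypothetical prior scores h_i
e1 = [0.9, 0.4, 0.1]               # hypothetical evidential scores on E1
e2 = [0.2, 0.7, 0.5]               # hypothetical evidential scores on E2

# evaluate E1 before E2 ...
p_a = normalize([combine(s, h) for s, h in zip(e1, priors)])
p_a = normalize([combine(s, p) for s, p in zip(e2, p_a)])
# ... or E2 before E1: the final posterior is the same
p_b = normalize([combine(s, h) for s, h in zip(e2, priors)])
p_b = normalize([combine(s, p) for s, p in zip(e1, p_b)])
assert all(abs(a - b) < 1e-12 for a, b in zip(p_a, p_b))
```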
To characterize inferential updating, we now introduce the following principle:

Regularity: No hypothesis is ever conclusively ruled out by any evidence unless the evidence logically refutes the hypothesis; i.e., the posterior probability of any hypothesis is always greater than 0.

We can then show the following (see Appendix C):

Characterization of inferential updating. The only legitimate updating procedure that satisfies Regularity is inferential updating. I.e., given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following formula:

p_E(H) = Ev[E|H]·p(H) / Σᵢ Ev[E|Hᵢ]·p(Hᵢ)

Inferential updating satisfies Regularity; it will never result in any hypothesis having a posterior probability of 0. On the other hand, in Appendix C, I show that an updating procedure that uses an additive combination function and an additive normalization function must violate Regularity; most of the time, any such updating rule must assign a posterior probability of 0 to some hypotheses. But this does not mean that such an updating rule should never be used. As we will see in the next section, sometimes we may want to be able to exclude certain hypotheses from consideration, i.e., assign them a posterior probability of 0. Nevertheless, we do not want to exclude more hypotheses than is warranted by the data. The updating procedure ought to be conservative and exclude as few hypotheses as possible at every step. In other words, any updating procedure that violates Regularity should plausibly still satisfy the following principle:

¹⁰ Note that not every updating rule that has been suggested in the literature is legitimate in this sense of the word.
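The formula above can be sketched in a few lines (with the likelihood as the evidential measure, this is exactly Bayes' rule; the numbers are hypothetical):

```python
def inferential_update(priors, ev_scores):
    # p_E(H_i) is proportional to Ev[E|H_i] * p(H_i); the normalization step
    # rescales multiplicatively, so no positive score is ever sent to 0 (Regularity)
    joint = [e * p for e, p in zip(ev_scores, priors)]
    total = sum(joint)
    return [j / total for j in joint]

posterior = inferential_update([0.5, 0.5], [0.8, 0.2])   # -> [0.8, 0.2]
```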
For example, Douven and Wenmackers (2017) consider a rule according to which p_E(H) = c·(p(H)·p(E|H) + f(E, H)), where c is a normalization constant and f(E, H) is a "bonus" assigned to H in case H is the best explanation of E. This updating rule is not legitimate because it is neither purely additive nor purely multiplicative. On the other hand, the class of rules considered in Douven (2016) is legitimate.

Conservativeness: The updating procedure assigns a posterior probability of 0 to as few hypotheses as possible, given the combination function, the normalization procedure, and the evidence available.

We are now in a position to characterize predictive updating:

Characterization of predictive updating. The only legitimate updating procedure that violates Regularity but satisfies Conservativeness is predictive updating. I.e., given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following procedure:

Step 1. For each i, calculate q(Hᵢ) = p(Hᵢ) + Ev[E|Hᵢ].
Step 2. Transform q to p_E as follows: for each i, p_E(Hᵢ) = 0 or p_E(Hᵢ) = q(Hᵢ) + d, where d is the unique number such that d is minimal and, for all i, p_E(Hᵢ) ≥ 0 and Σᵢ p_E(Hᵢ) = 1.

4 Discussion of inferential and predictive updating

4.1 The difference between inferential updating and predictive updating

Inferential updating and predictive updating differ in that the former updating rule obeys Regularity while the latter rule does not. Is Regularity a reasonable constraint? In some contexts it is, but in others it is not. Suppose our main priority is to identify the hypothesis that is true or (if none of the hypotheses is true) the hypothesis that is closest to the truth according to some appropriate measure of closeness to the truth.
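The two steps above can be sketched as follows. Finding the minimal d while zeroing as few hypotheses as possible amounts to a simple search over the number of excluded hypotheses (this is, in effect, a Euclidean projection of the q scores onto the probability simplex; the numbers are hypothetical):

```python
def predictive_update(priors, ev_scores):
    # Step 1: q_i = p(H_i) + Ev[E|H_i]
    q = [p + e for p, e in zip(priors, ev_scores)]
    order = sorted(q, reverse=True)
    # Step 2: find the minimal d that keeps as many hypotheses as possible positive
    for m in range(len(q)):                      # m = number of excluded hypotheses
        survivors = order[: len(q) - m]
        d = (1 - sum(survivors)) / len(survivors)
        if survivors[-1] + d >= 0:               # all surviving scores stay non-negative
            break
    return [qi + d if qi + d > 0 else 0.0 for qi in q]

posterior = predictive_update([0.5, 0.3, 0.2], [0.4, 0.1, -0.6])
# the worst-scoring hypothesis is excluded; the rest sum to 1
```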
Given this goal, it is reasonable to be risk-averse and open-minded: we do not want to rule out any hypothesis as potentially being the hypothesis that is true. Even if a lot of evidence strongly suggests that a hypothesis is false, there is always the possibility that the evidence is unrepresentative or misleading. And so Regularity is a reasonable constraint in this context.

However, suppose we do not care about which of our hypotheses is true or closest to the truth; our goal is not inferential, but predictive. We wish to find, as efficiently as possible, the subset of hypotheses that can be expected to be as predictively accurate as possible. In this context, there is no theoretical justification for requiring that the updating rule obey Regularity; on the contrary, there are good reasons why we might want an updating rule that violates Regularity. In particular, suppose the posterior distribution will be used in order to make a weighted probabilistic prediction, i.e. the goal is for Σᵢ p(D|Hᵢ)·p_E(Hᵢ) to be as accurate on future data D as possible. In that case, it would seem inadvisable to assign positive probability to any hypothesis that has shown itself to be very predictively inaccurate, since the predictions made by such a hypothesis would likely throw off the weighted prediction. On the other hand, we do not want to go to the opposite extreme and base the prediction on the single hypothesis that has performed best on the evidence, as that is liable to lead to overfitting (Forster and Sober, 1994). Predictive updating enables one to set the probabilities of predictively inaccurate hypotheses to 0 in a principled (and conservative) way. Let's consider a specific example.
When the hypotheses under consideration make probabilistic predictions and the goal is maximal predictive accuracy, it is natural to use a strictly proper scoring rule as the measure of evidential favoring (Gneiting and Raftery, 2007). For various reasons, the most popular scoring rule in applied research is probably the Continuous Ranked Probability Score (CRPS). Suppose we have a set of competing statistical models M₁, M₂, etc., and for each model, let p_{M_i} be the marginal (cumulative) probability forecast distribution corresponding to M_i. Suppose, moreover, that p_{M_i} has finite first moment, that X, X₁, and X₂ are independent and identically distributed random variables that follow the distribution p_{M_i}, and that x is the actual observed outcome. Then the CRPS can be written in the following way (where the expectations are taken relative to p_{M_i}):

CRPS(p_{M_i}, x) = E|X − x| − (1/2)·E|X₁ − X₂|   (4.1)

As (4.1) makes clear, the CRPS is a statistical generalization of absolute error. As Gneiting and Raftery (2007) point out, a significant benefit of the CRPS is that it is easily interpretable, since the outputs of (4.1) can be reported in the same units as the measurements. For example, suppose the measurements are in terms of meters. Then the CRPS score of a model on an observation will be a representation of how many meters inaccurate the model's predictions are of that observation, on average (since the prediction is a probability distribution rather than a single number, the average is needed). If we let Ev[x|p_{M_i}] = a·CRPS(p_{M_i}, x), where a is some constant, and assign prior probabilities to all the models, then predictive updating can be used to assign posterior probabilities to all the models.¹¹
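Equation (4.1) lends itself to a simple Monte Carlo estimate from forecast samples. A minimal sketch, with two hypothetical Gaussian forecasts (a forecast centered on the observation should score lower, i.e. better, than one centered far away):

```python
import random

def crps_sample(draws, x):
    # Monte Carlo estimate of (4.1): E|X - x| - (1/2) E|X1 - X2|,
    # computed from i.i.d. draws from the forecast distribution
    n = len(draws)
    term1 = sum(abs(d - x) for d in draws) / n
    term2 = sum(abs(a - b) for a in draws for b in draws) / (n * n)
    return term1 - 0.5 * term2

random.seed(0)
x_obs = 2.0                                           # the observed outcome, in meters
near = [random.gauss(2.0, 0.5) for _ in range(800)]   # forecast centered on x_obs
far = [random.gauss(5.0, 0.5) for _ in range(800)]    # forecast centered 3 m away
crps_near, crps_far = crps_sample(near, x_obs), crps_sample(far, x_obs)
assert 0 < crps_near < crps_far   # lower CRPS = better, in the units of the data
```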
Importantly, given sufficient evidence (and depending on how the constant a is chosen), many of the models will receive a posterior probability of 0. These posterior probabilities can then be used for model selection or for making a weighted prediction using all the models. Of course, it is an empirical question whether predictive updating is better (for predictive purposes) than inferential updating (including standard Bayesian updating). An empirical evaluation of predictive updating will have to wait for a different occasion, however. In this section I have simply tried to suggest one concrete way in which predictive updating may be implemented.

¹¹ If the models contain parameters, then the probability distributions over those parameters may be updated using either inferential or predictive updating.

4.2 The relationship between inferential updating and other updating procedures

As was already mentioned in the introduction to the paper, standard Bayesian updating is clearly a special case of inferential updating: more precisely, we get Bayesian updating if and only if Ev[E|H] ∝ p(E|H), i.e. if and only if the evidential measure is proportional to the likelihood. What Vassend (2019a) calls "quasi-Bayesian updating" is also a special case of inferential updating; indeed, quasi-Bayesian updating is simply inferential updating with an evidential measure that has been suitably calibrated to a verisimilitude measure. Similarly, Douven's (2016) IBE-based updating rule is also clearly a kind of inferential updating. Perhaps more interestingly, Bissiri et al.'s (2016) general Bayesian updating is also a special case of inferential updating. More precisely, we have:

General Bayesian updating is a special case of inferential updating.
Suppose the evidential measure Ev is a strictly decreasing function f of some loss function, L(E, H), such that for all E₁ and E₂, Ev satisfies the following conditions:

1. Ev[E₁|H, E₂] = Ev[E₁|H] = f(L(E₁, H)).
2. Ev[E₁, E₂|H] = f(L(E₁, H) + L(E₂, H)).

Then inferential updating has the following form, for some constant c:

p(H|E) = e^(−c·L(E,H))·p(H) / Σᵢ e^(−c·L(E,Hᵢ))·p(Hᵢ)

A sketch of the proof, which is straightforward, is given in Appendix E. Although general Bayesian updating is a special case of inferential updating, the reverse is not the case because, as was previously mentioned, many reasonable evidential measures cannot be written as a function of an additive loss function. Suppose, for example, that the hypotheses under consideration are real-valued functions f_i, and that the evidential measure is of the form Ev[(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)|f_i] = Minimum(|y₁ − f_i(x₁)|, |y₂ − f_i(x₂)|, ..., |yₙ − f_i(xₙ)|). It is clear in this case that the evidential measure cannot be written as a function of an additive loss function, simply because the Minimum operator is not additive.

A diagram depicting the relationship between inferential updating, predictive updating, and various updating rules that have been suggested in the literature is given in Figure 1.

[Figure 1: Overview of various updating rules. The class of legitimate updating rules comprises inferential updating, with quasi-Bayesian (Vassend, 2019a), standard Bayesian, IBE-based (Douven, 2016), and general Bayesian (Bissiri et al., 2016) updating as special cases, together with predictive updating.]

5 Conclusion

The primary purpose of this paper has been to justify a set of very general synchronic and diachronic inductive norms.
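The special-case claim can be illustrated numerically: with Ev = exp(−c·L) for an additive loss, inferential updating on each piece of evidence in turn agrees with a single general-Bayesian update on the summed loss (a sketch; the losses and the constant c are hypothetical):

```python
import math

def inferential(priors, scores):
    # inferential updating with an arbitrary evidential measure Ev
    joint = [s * p for s, p in zip(scores, priors)]
    z = sum(joint)
    return [j / z for j in joint]

c = 2.0                        # the constant from the theorem (illustrative value)
L1 = [0.3, 1.1, 0.7]           # hypothetical losses L(E1, H_i)
L2 = [0.9, 0.2, 0.4]           # hypothetical losses L(E2, H_i)
priors = [1 / 3, 1 / 3, 1 / 3]

# with Ev = exp(-c * L), updating on E1 and then on E2 ...
p1 = inferential(priors, [math.exp(-c * l) for l in L1])
p12 = inferential(p1, [math.exp(-c * l) for l in L2])
# ... agrees with one general-Bayesian update on the summed loss
p_joint = inferential(priors, [math.exp(-c * (a + b)) for a, b in zip(L1, L2)])
assert all(abs(x - y) < 1e-12 for x, y in zip(p12, p_joint))
```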
The resulting normative framework can be put to both philosophical and scientific use. In philosophy of science, a standard way of analyzing scientific methodology is by seeing whether the methodology makes sense from a Bayesian perspective. For example, in this way, Sober (2015) analyzes parsimony inference,¹² Dawid et al. (2015) analyze no-alternatives arguments in physics, Schupbach (2018) analyzes robustness analysis, and Myrvold (2016) evaluates the epistemic value of unification. Since the preceding analyses take place in a Bayesian framework, they inherit the limitations and assumptions of Bayesianism. In the broader normative framework developed in this paper, it is possible to check whether the analyses still hold up when those assumptions are lifted. For example, Myrvold (2016) shows that more unifying hypotheses will be more confirmed by evidence than less unifying hypotheses, other things being equal. Since his analysis is Bayesian, he implicitly uses the likelihood as his measure of evidential favoring. A natural question to ask is whether his result still holds if the likelihood is replaced with an arbitrary measure of evidential favoring. The perhaps surprising answer is yes, although a proper demonstration of this fact must be reserved for a different time.

¹² Sober uses a likelihoodist approach, which is Bayesianism without the priors.

The normative framework developed in this paper can also be used for scientific inference. Indeed, implicitly it already has been: as shown in Section 4.2, the general Bayesian updating rule suggested by Bissiri et al. (2016) is a special case of inferential updating, and general Bayesian updating is gaining in popularity in the statistical community.
But inferential updating is more general than general Bayesian updating, and allows for the use of evidential measures that cannot be represented in Bissiri et al.'s (2016) framework. One example is the phylogenetic parsimony measure discussed by Vassend (2019a). Predictive updating can also be applied in scientific inference problems, for example through the use of strictly proper scoring rules as suggested in Section 4.1. Of course, it is ultimately an empirical question whether predictive updating performs better than inferential updating. An answer to this question must wait until later; in this paper, my goal has been to provide a general normative framework for inductive inference that is as flexible as possible while obeying basic theoretical desiderata.

References

Aczél, J. (2006). Lectures on Functional Equations and Their Applications. Dover Books on Mathematics. Dover Publications.

Amari, S.-I. (2009). alpha-Divergence is Unique, Belonging to Both f-Divergence and Bregman Divergence Classes. IEEE Transactions on Information Theory 55(11), 4925–4931.

Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. Wiley, New York, NY.

Bissiri, P. G., C. Holmes, and S. Walker (2016). A General Framework for Updating Belief Distributions. Journal of the Royal Statistical Society. Series B (Methodological) 78(5), 1103–1130.

Box, G. E. P. (1980). Sampling and Bayes' Inference in Scientific Modelling and Robustness. Journal of the Royal Statistical Society. Series A (General) 143(4), 383–430.

Dawid, R., S. Hartmann, and J. Sprenger (2015). The No Alternatives Argument. British Journal for the Philosophy of Science 66(1), 213–234.

Douven, I. (2016). Explanation, Updating, and Accuracy. Journal of Cognitive Psychology 28(8), 1004–1012.

Douven, I. and S. Wenmackers (2017).
Inference to the Best Explanation versus Bayes's Rule in a Social Setting. British Journal for the Philosophy of Science 68(2), 535–570.

Forster, M. R. (1995, September). Bayes and Bust: Simplicity as a Problem for a Probabilist's Approach to Confirmation. British Journal for the Philosophy of Science 46(3), 399–424.

Forster, M. R. and E. Sober (1994). How To Tell When Simpler, More Unified, or Less Ad Hoc Theories Will Provide More Accurate Predictions. The British Journal for the Philosophy of Science 45(1), 1–35.

Gelman, A. and C. R. Shalizi (2013). Philosophy and the Practice of Bayesian Statistics. British Journal of Mathematical and Statistical Psychology 66, 8–38.

Gneiting, T. and A. E. Raftery (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102(477), 359–378.

Greaves, H. and D. Wallace (2006). Justifying Conditionalization: Conditionalization Maximizes Epistemic Utility. Mind 115(459), 607–632.

Grünwald, P. and T. van Ommen (2017). Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It. Bayesian Analysis 12(4), 1069–1103.

Jeffrey, R. (1983). The Logic of Decision (Second ed.). Cambridge University Press, Cambridge.

Joyce, J. (1998). A Non-Pragmatic Vindication of Probabilism. Philosophy of Science 65(4), 575–603.

Joyce, J. (2009). Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In F. Huber and C. Schmidt-Petri (Eds.), Degrees of Belief. Synthese.

Key, J. T., L. R. Pericchi, and A. F. M. Smith (1999). Bayesian Model Choice: What and Why? In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Eds.), Bayesian Statistics 6, pp. 343–370. Oxford: Oxford University Press.

Kopytov, V. M. and N. Y. Medvedev (1996). Right-Ordered Groups. Siberian School of Algebra and Logic.
Springer.

Leitgeb, H. and R. Pettigrew (2010). An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy. Philosophy of Science 77, 236–272.

Levinstein, B. A. (2012). Leitgeb and Pettigrew on Accuracy and Updating. Philosophy of Science 79(3), 413–424.

Myrvold, W. (2016). On the Evidential Import of Unification. Unpublished manuscript.

Pettigrew, R. (2016). Accuracy and the Laws of Credence. Oxford University Press.

Predd, J. B., R. Seiringer, E. H. Lieb, D. N. Osherson, H. V. Poor, and S. R. Kulkarni (2009). Probabilistic Coherence and Proper Scoring Rules. IEEE Transactions on Information Theory 55(10), 4786–4792.

Schupbach, J. N. (2018). Robustness Analysis as Explanatory Reasoning. British Journal for the Philosophy of Science 69(1), 275–300.

Shaffer, M. J. (2001). Bayesian Confirmation of Theories That Incorporate Idealizations. Philosophy of Science 68(1), 36–52.

Sober, E. (2015). Ockham's Razors: A User's Manual. Cambridge University Press.

Sprenger, J. (2009). Statistics Between Inductive Logic and Empirical Science. Journal of Applied Logic 7(2), 239–250.

Sprenger, J. (forthcoming). Conditional Degree of Belief. To appear in Philosophy of Science.

Tanton, J. (2005). Encyclopedia of Mathematics. Science Encyclopedia. Facts on File.

Vassend, O. B. (2019a). A Verisimilitude Framework for Inductive Inference, with an Application to Phylogenetics. To appear in British Journal for the Philosophy of Science.

Vassend, O. B. (2019b). New Semantics for Bayesian Inference: The Interpretive Problem and Its Solutions. To appear in Philosophy of Science.

Walker, S. G. (2013). Bayesian Inference with Misspecified Models. Journal of Statistical Planning and Inference 143(10), 1621–1633.

Zhang, T. (2006).
From ε-Entropy to KL-Entropy: Analysis of Minimum Information Complexity Density Estimation. The Annals of Statistics 34(5), 2180–2210.

A Characterization of the combination function

The goal of this section is to show the characterization of the combination function in Section 3.1. There are two cases to consider: k = 0 and k ≠ 0. Since the two cases are very similar, I will only consider the case where k ≠ 0. So suppose that for some non-zero k, we have:

∂²c(x, y)/∂x∂y = k   (A.1)

Taking the antiderivative with respect to x, it follows that:

∂c(x, y)/∂y = kx + C(y) + D   (A.2)

where C(y) is a function of y, but not x, and D is some real number. Taking the antiderivative of (A.2) with respect to y, we get:

c(x, y) = kxy + ∫C(y)dy + Dy + G(x) + F   (A.3)

where G is a function of x and F is some real number. Moreover, exchanging the labels x and y in (A.3) gives us:

c(y, x) = kyx + ∫C(x)dx + Dx + G(y) + F   (A.4)

But since c(x, y) = c(y, x), (A.3) and (A.4) must be equal, which means that kxy + ∫C(y)dy + Dy + G(x) + F = kxy + ∫C(x)dx + Dx + G(y) + F, and hence ∫C(y)dy + Dy + G(x) = ∫C(x)dx + Dx + G(y). Rearranging, we get:

G(x) = ∫C(x)dx + Dx + G(y) − ∫C(y)dy − Dy   (A.5)

But since G(x) does not depend on y, the only way for (A.5) to be true is if G(y) − ∫C(y)dy − Dy is equal to some constant number, c. Hence, ∫C(y)dy + Dy = G(y) − c. Plugging this back into (A.3) (and absorbing the constant c into F), we get:

c(x, y) = kxy + G(x) + G(y) + F   (A.6)

Without loss of generality, we may assume that G(0) = 0, because if G(0) = A for some non-zero A, then we can just put G′(x) = G(x) − A and F′ = F + 2A, and we get c(x, y) = kxy + G′(x) + G′(y) + F′, with G′(0) = 0 (i.e. we simply absorb the constant A into F′).
Now the fact that c is associative and commutative means that c(c(x, y), z) = c(c(y, z), x), and hence (A.6) implies that, for all x, y, and z:

k(kxy + G(x) + G(y) + F)z + G(kxy + G(x) + G(y) + F) + G(z) + F = k(kyz + G(y) + G(z) + F)x + G(kyz + G(y) + G(z) + F) + G(x) + F   (A.7)

Simplifying, we have:

[G(x) + G(y) + F]kz + G[kxy + G(x) + G(y) + F] + G(z) = [G(y) + G(z) + F]kx + G[kyz + G(y) + G(z) + F] + G(x)   (A.8)

Note that because c is twice differentiable, so is G. Taking the derivative of each side of (A.8) with respect to z (using the chain rule on the right-hand side) gives:

[G(x) + G(y) + F]k + G′(z) = G′(z)kx + G′[kyz + G(y) + G(z) + F]·(ky + G′(z))   (A.9)

Next, taking the derivative of each side of (A.9) with respect to x gives:

G′(x)k = G′(z)k   (A.10)

Hence, since k ≠ 0, it follows that G′(x) = G′(z). But since G(x) does not depend on z and G(z) does not depend on x, this means that G′(x) must be a constant number, i.e. G′(x) = a for some constant a. Since we are assuming that G(0) = 0, it follows that G(x) = ax. Next, the fact that the three-way combination c(x, y, z) = c(c(x, y), z) implies:

kxyz + ax + ay + az + F = k(kxy + ax + ay + F)z + a(kxy + ax + ay + F) + az + F   (A.11)

Comparing the terms that contain xyz,¹³ we see that k = 1, and hence:

ax + ay = axz + ayz + Fz + axy + a²x + a²y + aF   (A.12)

Comparing the terms that contain z, we see that a(x + y) + F = 0 for all x and y. The only way this can be true is if a = F = 0. Hence we have, finally, that c(x, y) = xy.

B Characterization of the normalization step

The goal of this section is to show the characterization of the normalization step in Section 3.2.
Let S₁ = {aᵢ} be an arbitrary set of n numbers, with normalization function f_{S₁}. Consider the set S₂ = {1/aᵢ} and the set S₃ = {1ᵢ}, which consists of n copies of 1. Then condition (1) implies that, for all i, f(c(f(c(1/aᵢ, aᵢ)), 1)) = f(c(1/aᵢ, f(c(aᵢ, 1)))), where the various f's are relative to the relevant sets. For example, in f(c(1/aᵢ, aᵢ)), f is a rescaling function defined on the set {c(1/aᵢ, aᵢ)}. Note that we are abusing notation here: strictly speaking, the various f's are not the same function, since they are defined over different sets. However, to avoid needless clutter, I use f without subscripts.

According to the characterization of the combination function, the combination function is either multiplicative or additive. Since the derivations are very similar, I will only show that the normalization function must be multiplicative given that the combination function is multiplicative. So suppose that the combination function is c(a, b) = ab. Then we get: f(f((1/aᵢ)·aᵢ)·1) = f((1/aᵢ)·f(aᵢ·1)). Thus, we have: f(f(1)) = f((1/aᵢ)·f(aᵢ)), i.e. f((1/aᵢ)·f(aᵢ)) is a constant. But since f is one-to-one, that means (1/aᵢ)·f(aᵢ) must also be a constant. That is, there exists a constant k such that, for all aᵢ in S₁, (1/aᵢ)·f(aᵢ) = k. Hence f(aᵢ) = k·aᵢ for all aᵢ. Since S₁ was an arbitrary set, it follows that in general the normalization procedure must be multiplicative given that the combination function is multiplicative.

¹³ Which we can do, as before, by successively differentiating with respect to x, y, and z. This proof method is sometimes called "equating coefficients" (Tanton, 2005, p. 169).
C Characterization of inferential updating

The goal in this section is to show that the only legitimate updating rule that satisfies Regularity is inferential updating. According to the results in Sections 3.1 and 3.2, any legitimate updating rule must either have (1) a multiplicative combination step and a multiplicative normalization step, or (2) an additive combination step and an additive normalization step. It is easy to show that it is possible for an updating rule that satisfies (1) to satisfy Regularity, and that, indeed, the resulting updating rule is inferential updating. In order to show that inferential updating is the only updating rule that satisfies Regularity, it suffices to show that there is no updating rule satisfying (2) that also satisfies Regularity.

Suppose, for the sake of contradiction, that there is some updating rule that satisfies both (2) and Regularity. In order for Regularity to be obeyed, it has to be the case that given any set of non-zero prior probabilities over a set of hypotheses, h₁, h₂, ..., hₙ, and given any set of evidential scores for the hypotheses, e₁, e₂, ..., eₙ, the posteriors are also all non-zero. Thus, if N is the normalization function, then the following must be true for all hᵢ:

N(eᵢ + hᵢ) > 0   (C.1)

Since the normalization function is assumed to satisfy (2), (C.1) implies that the following is true for all i, where d is an additive normalization constant:

eᵢ + hᵢ + d > 0   (C.2)

Since the posterior probabilities must sum to 1, we also have:

Σᵢ(eᵢ + hᵢ + d) = 1   (C.3)

and therefore d = −(1/n)Σᵢeᵢ. And so we have, for all hᵢ:

eᵢ + hᵢ − (1/n)Σⱼeⱼ > 0   (C.4)

But it's obvious that (C.4) will not in general be true. For example, suppose e₁ is the smallest eᵢ (and that the eᵢ are not all equal). Then r = e₁ − (1/n)Σᵢeᵢ < 0. Now suppose it's also the case that h₁ < −r.
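A small numerical instance of the contradiction (all numbers hypothetical): with additive combination, (C.3) forces a particular d, and the worst-scoring hypothesis can then be pushed below zero.

```python
h = [0.2, 0.3, 0.5]      # priors h_i, summing to 1
e = [0.1, 0.6, 0.8]      # additive evidential scores e_i

n = len(h)
d = -sum(e) / n          # the normalization constant forced by (C.3)
posteriors = [ei + hi + d for ei, hi in zip(e, h)]

assert abs(sum(posteriors) - 1.0) < 1e-12   # the values sum to 1 ...
assert min(posteriors) < 0                  # ... but H_1 gets a negative "probability"
```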
Then we have:

e₁ + h₁ − (1/n)Σᵢeᵢ = r + h₁ < 0   (C.5)

Consequently, additive combination and additive normalization jointly violate Regularity. So there can be no updating procedure that satisfies both (2) and Regularity.

D Characterization of predictive updating

The goal in this section is to show that the only legitimate updating rule that violates Regularity but satisfies Conservativeness is predictive updating. It is clear that any updating rule that satisfies Conservativeness but violates Regularity must be additive. This is because any multiplicative updating rule that satisfies Conservativeness clearly also satisfies Regularity.

So suppose the updating rule is additive and satisfies Conservativeness. Then the goal is to show that the updating rule must be equivalent to predictive updating. Since the rule is additive, it must have the following form, where p_E is the posterior probability distribution, Hᵢ is a hypothesis, hᵢ is the prior probability of the hypothesis, eᵢ is the evidential score of the hypothesis, and d is a normalization constant:

p_E(Hᵢ) = 0 if hᵢ + eᵢ is sufficiently low; p_E(Hᵢ) = hᵢ + eᵢ + d otherwise.   (D.1)

If the updating rule is conservative, then as few hypotheses as possible should be assigned a posterior probability of 0. It remains to show that this uniquely happens when d is minimal. Suppose there are n hypotheses. Without loss of generality, suppose the hypotheses are ordered such that 0 ≤ p_E(H₁) ≤ p_E(H₂) ≤ ... ≤ p_E(Hₙ). Then there is some index m such that p_E(Hᵢ) = 0 for i ≤ m and p_E(Hᵢ) > 0 for i > m. Note that the updating procedure is conservative if and only if m is minimal, because m is minimal if and only if a minimal number of hypotheses have a posterior probability of 0.
In order for the posterior probabilities to be probabilistic, we must have:

Σᵢ p_E(Hᵢ) = Σ_{i>m}(hᵢ + eᵢ) + (n − m)d = 1   (D.2)

Now suppose we have a different updating rule resulting in some posterior p′_E that is not conservative: i.e., there is an index m′ > m such that p′_E(Hᵢ) = 0 for i ≤ m′ and p′_E(Hᵢ) > 0 for i > m′. Then p′_E must satisfy the following constraint for some normalization constant d′:

Σ_{i>m′}(hᵢ + eᵢ) + (n − m′)d′ = 1   (D.3)

Comparing (D.2) and (D.3) and remembering that m′ > m, we see that:

0 < Σ_{i=m+1}^{m′}(hᵢ + eᵢ) = (n − m′)d′ − (n − m)d   (D.4)

And hence:

d < ((n − m′)/(n − m))·d′ < d′   (D.5)

Hence, d < d′. What the above proof shows is that any conservative updating rule has a smaller additive normalization constant than any non-conservative updating rule. To finish the proof, we show that there is just one conservative updating rule. Here we can use (D.4) again. If both updating rules are conservative, then we have m = m′, and hence, making the necessary amendments in (D.4), we have:

0 = Σ_{i=m+1}^{m′}(hᵢ + eᵢ) = (n − m)d′ − (n − m)d   (D.6)

Hence it follows that d′ = d. But then the two updating rules are equivalent. Hence, there is only one conservative updating rule, namely the one that uses a minimal additive normalization constant. This is predictive updating.

E General Bayesian updating is a special case of inferential updating

The goal in this section is to show that Bissiri et al.'s (2016) general Bayesian updating is a special case of inferential updating.
For some normalization constant $k$, we have:

$$p(H \mid E_1, E_2) = k \cdot \mathrm{Ev}[E_1 \mid H, E_2]\,\mathrm{Ev}[E_2 \mid H]\,p(H) = k \cdot f(L(E_1, H))\,f(L(E_2, H))\,p(H) \quad \text{(E.1)}$$

But we also have:

$$p(H \mid E_1, E_2) = k \cdot \mathrm{Ev}[E_1, E_2 \mid H]\,p(H) = k \cdot f(L(E_1, H) + L(E_2, H))\,p(H) \quad \text{(E.2)}$$

Comparing (E.1) and (E.2), we see that $f$ obeys the following functional equation for all $x$ and $y$: $f(x)f(y) = f(x+y)$. Let $g(x) = \log f(x)$. Then $g(x+y) = g(x) + g(y)$, which is the well-known Cauchy equation, whose solution is $g(x) = -cx$ for some positive constant $c$ (Aczél, 2006, p. 31), since $f$, and therefore $g$, is strictly decreasing. Consequently $f(x) = e^{-cx}$, and hence $p(H \mid E) = k \cdot e^{-c L(E,H)}\,p(H)$, which is Bissiri et al.'s (2016) general Bayesian updating rule.

F An alternative characterization of the combination step

In both everyday and scientific contexts, it is common to think of evidence algebraically: multiple lines of evidence combine to provide stronger evidence; some evidence favors a hypothesis, while other evidence goes against it; a piece of evidence here can cancel out a piece of evidence there; and some purported evidence has no effect at all. In other words, evidential favoring has all the hallmarks of a mathematical group.

Now, suppose, as we have been doing up to now, that we use real numbers to represent evidential scores. Then the set of all possible evidential scores, $G$, together with the combination function plausibly form a mathematical group. Indeed, they plausibly form an Archimedean group, because intuitively there is no maximal evidential score. That is, if we use $\bullet$ to denote the combination function, i.e. $e_1 \bullet e_2 = c(e_1, e_2)$, then it is plausible that $(G, \bullet)$ satisfies the following axioms:

1. Closure.
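The endpoint of the derivation, $p(H \mid E) \propto e^{-c\,L(E,H)}\,p(H)$, is easy to check numerically. Below is a small sketch; the priors and loss values are invented for illustration. Note that with $L$ the negative log-likelihood and $c = 1$, the rule reduces to standard Bayesian conditioning:

```python
import math

def general_bayes_posterior(priors, losses, c=1.0):
    """General-Bayesian-style updating: posterior_i is proportional to
    exp(-c * L_i) * prior_i, where L_i is the loss hypothesis i incurs
    on the evidence and c is the positive constant from the derivation."""
    unnorm = [h * math.exp(-c * L) for h, L in zip(priors, losses)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# With L_i = -log(likelihood_i) and c = 1, this is ordinary Bayes' rule:
post = general_bayes_posterior([0.5, 0.5], [-math.log(0.8), -math.log(0.2)])
```

For these invented likelihoods (0.8 and 0.2 under equal priors), the posterior is (0.8, 0.2), exactly what standard Bayesian updating delivers.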
For all possible evidential scores $e_1$ and $e_2$, $e_1 \bullet e_2$ is also a possible evidential score.

2. Associativity. For all possible evidential scores $e_1$, $e_2$, and $e_3$, $(e_1 \bullet e_2) \bullet e_3 = e_1 \bullet (e_2 \bullet e_3)$.

3. Identity. There exists a possible evidential score $i$ such that for all $e$, $i \bullet e = e \bullet i = e$. That is, there exists a real number that represents evidence that has no effect (either favorable or unfavorable).

4. Inverse. For each possible evidential score $e$, there exists a possible evidential score $e'$ such that $e \bullet e' = e' \bullet e = i$. That is, every evidential score could potentially (in principle) be cancelled out by other countervailing evidence.[14]

5. Commutativity. For all possible evidential scores $e_1$ and $e_2$, $e_1 \bullet e_2 = e_2 \bullet e_1$. That is, the order in which the evidence is considered is irrelevant.

6. Archimedean property. For all possible evidential scores $e_1$ and $e_2$, there exists an integer $n$ such that $e_1 < e_2 \bullet e_2 \bullet \ldots \bullet e_2$ ($n$ times).

Suppose, in addition, that the set of evidential scores is totally ordered: for all evidential scores $e_1$ and $e_2$, either $e_1 > e_2$ or $e_1 \le e_2$.[15] Then we can use the following important result from group theory (see Kopytov and Medvedev, 1996, p. 33, for a proof):

Hölder's theorem. Every Archimedean totally ordered group is order-isomorphic to a subgroup of the additive group of real numbers with the natural order.

The fact that $(G, \bullet)$ is order-isomorphic to a subgroup of the additive group of real numbers with the natural order means there exists some subgroup $(S, +)$ of the real numbers and a one-to-one function $g$ from $(G, \bullet)$ to $(S, +)$ that obeys the following equation for all $e_1$ and $e_2$ in $G$: $g(e_1 \bullet e_2) = g(e_1) + g(e_2)$. Since $g$ is one-to-one, it has an inverse, $f$. Hence, for all $e_1$ and $e_2$ in $G$, we can write: $e_1 \bullet e_2 = f(g(e_1) + g(e_2))$.
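For a concrete instance of this construction (my own illustration, not an example from the paper): the positive reals under multiplication form a totally ordered Archimedean group, and $g = \log$ is precisely the order-isomorphism that Hölder's theorem promises, with inverse $f = \exp$:

```python
import math

# Positive reals under multiplication, mapped to the additive reals by
# g = log (the order-isomorphism), with inverse f = exp.
g, f = math.log, math.exp

def combine(e1, e2):
    # e1 . e2 = f(g(e1) + g(e2)); here this is ordinary multiplication.
    return f(g(e1) + g(e2))
```

In this instance the identity element is $i = 1$ (evidence with no effect), the inverse of $e$ is $1/e$, and $g$ turns combination into addition, which is exactly what licenses the additive representation used in the proofs.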
In the main text, I showed that the normalization procedure must be either additive or multiplicative, given that the combination function is either multiplicative or additive. But, arguably, it is not unreasonable to simply assume that the normalization must be either multiplicative or additive. Indeed, all updating rules that have been proposed in the literature have implicitly relied on a normalization procedure that is either multiplicative or additive. In particular, the normalization procedure implicit in both standard Bayesian updating and Jeffrey updating (Jeffrey, 1983) is multiplicative, and the normalization procedure implicit in Leitgeb and Pettigrew's (2010) alternative to Jeffrey updating is additive.

Finally, it is reasonable to assume, as we did in the main text, that the normalization procedure commutes with the combination function in the sense that, for all $a$ and $b$, we have: $N(a \bullet N(b)) = N(N(a) \bullet b) = N(a \bullet b)$.

We can now give the following characterization of the combination function:

[14] A referee points out that this is a bit of an idealization, since a piece of evidence and a defeater of that evidence will not typically cancel each other out precisely.
[15] A referee rightly points out that this assumption is also idealized.

Alternative characterization of the combination function. Suppose the combination function $c(x, y)$ satisfies the following requirements:

1. The set of all evidential scores, $G$, and the combination function $c(x, y) = x \bullet y$ together form a totally ordered Archimedean group.

2. The combination function commutes with the normalization function $N$ in the sense that, for all $a$ and $b$: $N(a \bullet N(b)) = N(N(a) \bullet b) = N(a \bullet b)$.

Then $c$ must have one of the following two forms:

1. If the normalization function is additive, then $c(x, y) = x + y$.

2.
If the normalization function is multiplicative, then $c(x, y) = xy$.

Proof. The fact that the combination function commutes with the normalization function implies that, for every $e$ with inverse $e^{-1}$:

$$N(e \bullet e^{-1}) = N(N(e) \bullet e^{-1}) = N(f(g(N(e)) + g(e^{-1}))) \quad \text{(F.1)}$$

Therefore, for all $e$, $N(f(g(N(e)) + g(e^{-1}))) = N(i)$, where $i$ is the identity element of the group. Since $N$ is one-to-one, this means that $f(g(N(e)) + g(e^{-1})) = k$ for some constant $k$ that does not depend on $e$. Furthermore, since $f$ is one-to-one, this in turn implies that $g(N(e)) + g(e^{-1}) = k'$ for some constant $k'$ that does not depend on $e$. For the same reason, (F.1) also implies that $g(e) + g(e^{-1}) = k''$ for some constant $k''$ that does not depend on $e$. Hence we have, finally, that $g(N(e)) - g(e) = K$, where $K = k' - k''$. Hence, $g(N(e)) = g(e) + K$.

If the normalization procedure is multiplicative, then for some normalization constant $a$, we have $g(ae) = g(e) + K$. Note that $a$ depends on the set to which $e$ belongs. If $\{e_i\}$ is the set, then

$$a = \frac{1}{\sum e_i} \quad \text{(F.2)}$$

Hence, depending on the other members of the set to which $e$ belongs, $a$ can be any number in the half-open interval $(0, \frac{1}{e}]$. Thus we have, for all $e$ and all $a$ in $(0, \frac{1}{e}]$, that $g(ae) = g(e) + K$, where $K$ is a constant that may depend on $a$ but does not depend on $e$. Similarly, we have, for some normalization constant $b$, that $g(bae) = g(ae) + K' = g(e) + K''$. Here, $b$ can be any number in the range $(0, \frac{1}{ae}]$, or in other words in $(0, \infty)$. But if we let $y = ab$ and $x = e$, then the preceding means that for all $x$ and $y$ in $(0, \infty)$ we have:

$$g(yx) = g(x) + K'' \quad \text{(F.3)}$$

where $K''$ depends on $y$, but not on $x$. Interchanging the roles of $y$ and $x$, we also have:

$$g(xy) = g(y) + K''' \quad \text{(F.4)}$$

where $K'''$ depends on $x$, but not on $y$.
Comparing the above equations, we see that $g(x) + K'' = g(y) + K'''$. This implies the following:

$$g(xy) = g(x) + g(y) + C \quad \text{(F.5)}$$

where $C$ is a constant that depends on neither $x$ nor $y$.

Now note that $f(2g(i)) = i \bullet i = i = f(g(i))$. Since $f$ is one-to-one, this implies that $g(i) = 0$. Next, (F.5) implies that $g(i) = g(1 \cdot i) = g(1) + g(i) + C$. Thus $g(1) = -C$. Using (F.5) again, we have $g(1) = g(i \cdot \frac{1}{i}) = g(i) + g(\frac{1}{i}) = g(\frac{1}{i})$. But since $g$ is one-to-one, this implies that $\frac{1}{i} = 1$, so that $i = 1$. Hence $-C = g(1) = g(i) = 0$, so $C = 0$. Finally, then, we have, for all $x > 0$ and $y > 0$:

$$g(xy) = g(x) + g(y) \quad \text{(F.6)}$$

Now put $r(x) = g(e^x)$. Then (F.6) becomes, for all real $x$ and $y$:

$$r(x + y) = r(x) + r(y) \quad \text{(F.7)}$$

This is the Cauchy functional equation, whose only solution is $r(x) = cx$ for an arbitrary constant $c$ (Aczél, 2006, p. 31). Hence, $g(x) = r(\log x) = \log x^c$. Since $f$ is the inverse of $g$, we have that $f(x) = e^{x/c}$. Finally, then, we have:

$$x \bullet y = f(g(x) + g(y)) = e^{(\log x^c + \log y^c)\frac{1}{c}} = e^{(c \log(xy))\frac{1}{c}} = xy \quad \text{(F.8)}$$

That is, the combination function is multiplicative: $c(x, y) = xy$.
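The conclusion in (F.8) can be sanity-checked numerically for an arbitrary positive constant (the value of $c$ below is arbitrary, my own choice for illustration): with $g(x) = \log x^c = c \log x$ and its inverse $f(x) = e^{x/c}$, the combination $f(g(x) + g(y))$ collapses to plain multiplication, whatever $c$ is:

```python
import math

c = 2.5  # any positive constant from the Cauchy-equation solution r(x) = c*x
g = lambda x: c * math.log(x)   # g(x) = log x^c
f = lambda y: math.exp(y / c)   # f is the inverse of g

def combine(x, y):
    # x . y = f(g(x) + g(y)), which (F.8) says equals xy for every c > 0
    return f(g(x) + g(y))
```

Changing `c` rescales the additive representation but leaves the combination function itself unchanged, which is why the characterization does not depend on the particular isomorphism Hölder's theorem supplies.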
