Justifying the Norms of Inductive Inference


Authors: Olav Benjamin Vassend

September 17, 2019

Abstract

Bayesian inference is limited in scope because it cannot be applied in idealized contexts where none of the hypotheses under consideration is true and because it is committed to always using the likelihood as a measure of evidential favoring, even when that is inappropriate. The purpose of this paper is to study inductive inference in a very general setting where finding the truth is not necessarily the goal and where the measure of evidential favoring is not necessarily the likelihood. I use an accuracy argument to argue for probabilism, and I develop a new kind of argument to argue for two general updating rules, both of which are reasonable in different contexts. One of the updating rules has standard Bayesian updating, Bissiri et al.'s (2016) general Bayesian updating, Douven's (2016) IBE-based updating, and Vassend's (2019a) quasi-Bayesian updating as special cases. The other updating rule is novel.

Contents

1 Introduction
2 Why credibility functions should be probabilistic
3 Deriving the updating rules
  3.1 The combination step
  3.2 The normalization step
  3.3 Characterizations of inferential and predictive updating
4 Discussion of inferential and predictive updating
  4.1 The difference between inferential updating and predictive updating
  4.2 The relationship between inferential updating and other updating procedures
5 Conclusion
A Characterization of the combination function
B Characterization of the normalization step
C Characterization of inferential updating
D Characterization of predictive updating
E General Bayesian updating is a special case of inferential updating
F An alternative characterization of the combination step

1 Introduction

Bayesians hold that inductive inference requires two ingredients. First, a prior probability function defined on the hypotheses under consideration. Second, a likelihood function, which assigns a probability to the evidence conditional on each hypothesis. Intuitively, the prior probability assigned to a hypothesis represents how plausible it is that the hypothesis is true before the evidence has been taken into account. The likelihood, on the other hand, is a measure of evidential favoring: if H1's likelihood on the evidence is greater than H2's likelihood on the same evidence, then the evidence favors H1 over H2. Given a prior and likelihood, Bayesians hold that the prior probability of each hypothesis should be updated to a posterior probability through the use of Bayes's formula, so that the posterior probability of H is proportional to the prior probability of H multiplied by its likelihood.

Bayesianism has become the most common formal framework used by philosophers of science to study scientific methodology, and it is also an influential framework for statistical inference. But it rests on an assumption that is often violated in scientific practice, namely that one of the hypotheses under consideration is true.[1] Suppose none of the hypotheses under consideration is true, so that the goal is instead to find the hypothesis that is, in some sense, best. Depending on what is meant by "best," the likelihood may not be an appropriate measure of evidential favoring.
For example, suppose the goal is to identify the hypothesis whose expected maximal prediction error on future data is as low as possible. Then, as Vassend (2019a) shows, the likelihood is not an appropriate measure of evidential favoring, because the hypothesis that has the best likelihood score on the evidence will in general not be the hypothesis that has the lowest expected maximal prediction error on future data. In this context, a more reasonable measure of evidential favoring may be one according to which the evidence favors H1 over H2 if and only if H1's maximal prediction error on the evidence is lower than H2's maximal prediction error on the evidence. The fact that Bayesianism is tied to using the likelihood as a measure of evidential favoring is therefore a limitation of the framework.

[1] This limitation is well known, but often ignored. For discussion of the problem, see, e.g., Box (1980); Bernardo and Smith (1994); Forster and Sober (1994); Forster (1995); Key et al. (1999); Shaffer (2001); Sprenger (2009); Gelman and Shalizi (2013); Vassend (2019b); Walker (2013); and Sprenger (forthcoming).

The goal of this paper is to study inductive inference in a very general setting. Suppose our goal is to identify the best hypothesis H (where "best" does not necessarily mean "true"). Let p be a function that assigns a number between 0 and 1 (inclusive) to each hypothesis, such that p(H) is interpreted as representing a prior judgment of how plausible it is that H is best (in the relevant sense) out of the hypotheses under consideration. In the rest of the paper, I will refer to any such function as a "credibility function". Suppose, moreover, that Ev[E|H] is an evidential measure that is sensible given the purpose at hand. Then the questions to consider are as follows:

(1) What norms should p obey?
(2) How should p(H) and Ev[E|H] be combined in order to produce a posterior score p_E(H) that represents how plausible it is that H is best in light of E and the prior information?

As we will see, one of the standard Bayesian arguments for probabilism generalizes, so that, given widely applicable conditions, p and p_E ought to be probability functions. The more interesting results concern updating. I will show that, depending on what the goal is, the prior probability function and evidential measure should be combined in one of the following two ways in order to produce a posterior probability:

Inferential updating. Given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following formula:

$$ p_E(H) = \frac{\mathrm{Ev}[E \mid H]\, p(H)}{\sum_i \mathrm{Ev}[E \mid H_i]\, p(H_i)} $$

Predictive updating. Given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following procedure:

Step 1. For each i, calculate q(H_i) = p(H_i) + Ev[E | H_i].

Step 2. Transform q to p_E as follows: for each i, p_E(H_i) = 0 or p_E(H_i) = q(H_i) + d, where d is the unique number such that d is minimal and, for all i, p_E(H_i) ≥ 0 and Σ_i p_E(H_i) = 1.

The justification for the names of the two updating procedures will become clearer later. Inferential updating is clearly a generalization of Bayesian updating. Indeed, Bayesian updating is just inferential updating with the likelihood used as the measure of evidential favoring.[2] What separates inferential updating from predictive updating is the former rule's commitment to Regularity: inferential updating will never assign a probability of 0 to any hypothesis, whereas predictive updating typically will. In Section 4, we'll see that a commitment to Regularity is sometimes reasonable and sometimes not.
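Concretely, the two rules can be sketched as follows (a minimal Python sketch; the prior and the evidential scores are hypothetical, and Step 2 of predictive updating is implemented on the natural reading that the surviving scores are all shifted by d while the remaining scores are clipped to 0, i.e., q is projected onto the probability simplex):

```python
def inferential_update(p, ev):
    # Multiplicative combination of prior and evidential score,
    # followed by multiplicative (Bayes-style) normalization.
    scores = [e * h for e, h in zip(ev, p)]
    total = sum(scores)
    return [s / total for s in scores]

def predictive_update(p, ev):
    # Step 1: additive combination q(Hi) = p(Hi) + Ev[E|Hi].
    q = [h + e for h, e in zip(p, ev)]
    # Step 2: find the minimal uniform shift d such that clipping the
    # shifted scores at 0 yields a probability vector.
    u = sorted(q, reverse=True)
    d, running = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        shift = (1.0 - running) / j
        if uj + shift > 0:  # the j largest scores survive under this shift
            d = shift
    return [max(qi + d, 0.0) for qi in q]

p  = [0.5, 0.3, 0.2]    # hypothetical prior probabilities
ev = [0.9, 0.15, 0.05]  # hypothetical evidential scores Ev[E|Hi]

print(inferential_update(p, ev))  # every posterior stays positive
print(predictive_update(p, ev))   # the worst hypothesis drops to 0
```

On these numbers, inferential updating keeps all three hypotheses in play, while predictive updating assigns the third hypothesis a posterior probability of 0, illustrating the contrast over Regularity described above.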
The plan for the rest of the paper is as follows. In Section 2, I sketch an argument for why any credibility function ought to be probabilistic, regardless of whether the goal is truth or something else. Since the argument is a straightforward adaptation of Pettigrew's (2016) accuracy argument for probabilism, the section is brief. In Section 3, I give characterizations of inferential and predictive updating from a set of plausible assumptions. The strategy is to divide inductive updating into two steps: in the first step, the prior plausibility of a hypothesis is combined with the hypothesis's score on the evidence according to some measure of evidential favoring in order to produce a posterior score. In the second step, the posterior scores are normalized so that they are probabilistic. As we'll see, the requirement that the combination step and normalization step commute in certain desirable ways, together with a few other plausible assumptions, results in the conclusion that the combination step and normalization step must both be either multiplicative or additive. The characterizations of inferential and predictive updating are then just a few short steps away. I end the paper with a discussion of inferential and predictive updating, including their relationship to each other and to other updating rules.

[2] Predictive updating, on the other hand, may remind the reader of the alternative to Jeffrey conditionalization derived by Leitgeb and Pettigrew (2010). The two rules do indeed share several features in common, although they are also importantly different. In fact, it is possible to derive a special case of predictive updating by using a proof strategy that resembles the one in Leitgeb and Pettigrew (2010).
2 Why credibility functions should be probabilistic

Before we can show that credibility functions ought to be probabilistic, we need to get clearer on what this claim amounts to. Let H be a set of hypotheses and suppose the goal is to identify the hypothesis in H that is best rather than true (where "best" can mean anything we like). One complication that arises when "true" is replaced by "best" is that whereas there is only one true hypothesis, there may be several that are best.[3] For example, if "best" means "having a minimal maximum expected prediction error," then there may be several hypotheses that are tied for best. Note, however, that this is more a theoretical possibility than a practical one, since it is quite unlikely that multiple hypotheses would have (say) exactly the same predictive accuracy score, especially if the number of hypotheses is large. I will henceforth assume that at most one hypothesis out of the hypotheses under consideration is best. Note that if we make this assumption, then the hypotheses will also be mutually exclusive in the sense that in any subset of hypotheses at most one hypothesis can be best.

Another theoretical possibility is that none of the hypotheses under consideration is best. This can, for example, happen if the hypothesis space is infinite and does not contain a single best hypothesis, but rather an infinite sequence of hypotheses in ascending order of goodness.[4] To preclude this possibility, we must also assume that at least one of the hypotheses under consideration is best.

Provided we make the above assumptions (i.e., that exactly one of the hypotheses in H is best), then there is nothing mathematical or philosophical that prevents us from treating H as a sample space. That is,
H consists of hypotheses that are exhaustive in the sense that one of the hypotheses is best, and mutually exclusive in the sense that at most one of the hypotheses is best in any collection of hypotheses. Note also that there is a natural σ-algebra on H. More precisely, union (or disjunction) and intersection (or conjunction) are defined in the normal way, the identity element for conjunction (i.e., the top element of the algebra) is H, and the complement (negation) of any set A formed through unions and intersections of subsets of H is defined in the following way: ¬A := H − A. The main difference from the definition given in most philosophical treatments of Bayesianism is that the top element is now H rather than the tautology. This makes a big interpretive difference, but no difference to the mathematics.

[3] I thank X for pointing this out to me.

[4] I thank a referee for pointing out this possibility.

Given the above set-up, we can now define what it means for a function on the algebra, H*, generated by H to be probabilistic in the following way:

Probability axioms. A function p defined on H* is probabilistic if and only if it satisfies the following requirements:

1. p(H) = 1.
2. p(A) ≥ 0 for every element A of H*.
3. p(A ∨ B) = p(A) + p(B) − p(A & B), for all elements A and B of H*.

Note that credibility functions automatically satisfy 2, since we have defined them to have a range between 0 and 1, so the real question is whether they ought to satisfy 1 and 3.

One of the standard arguments for why regular credence functions (or degrees of belief) ought to be probabilistic is the accuracy argument (Joyce (1998), Joyce (2009), Pettigrew (2016), Predd et al. (2009)). Briefly, the argument is as follows:[5] the ideal credence function to have is the function that assigns 1 to the hypothesis that is true and 0 to all hypotheses that are false.
Suppose now that we have a divergence measure (satisfying certain reasonable properties) that quantifies the distance between the ideal function and any other candidate credence function. It can then be shown that any credence function that is not probabilistic will be dominated by some probabilistic function, in the sense that the probabilistic function is guaranteed to have a smaller divergence from the ideal function. Since it is irrational to choose an option that is known to be dominated, it follows that it is irrational to use a non-probabilistic credence function.

An interesting fact about the accuracy argument for probabilism is that it does not depend for its validity on any specific interpretation of the credence function, nor does it depend on the assumption that the ideal credibility function is the function that assigns 1 to the hypothesis that is true and 0 to all hypotheses that are false. Indeed, nothing in the accuracy argument prevents us from designating the ideal credibility function otherwise. Hence, we can easily adapt the argument to a context where the goal is to identify the hypothesis that is best rather than true. In such a context, the ideal function would clearly be one that assigns 1 to the hypothesis that is best and 0 to all other hypotheses. We can then formulate the following version of the accuracy argument:

[5] There are several versions of the argument; here, I present a variant of Pettigrew's (2016) version.

P1: The ideal credibility function is the function that assigns 1 to the hypothesis that is best and 0 to all other hypotheses.

P2: Given any non-probabilistic function, there is a probabilistic function that is guaranteed to have a smaller divergence from the ideal function (given that the divergence measure has certain reasonable properties).
P3: Given any probabilistic function, there does not exist any function that is guaranteed to have a smaller divergence from the ideal function (given that the divergence measure has certain reasonable properties).

P4: If P1-P3, then non-probabilistic credibility functions are irrational.

C: Non-probabilistic credibility functions are irrational.

P2 and P3 are mathematical theorems (proven by Predd et al. (2009)) that hold regardless of what we choose as the ideal function. P1 and P4, on the other hand, are intuitively reasonable general rational principles. The main question that may be raised about the generalized version of the accuracy argument is whether the conditions on the divergence measure are still reasonable when truth is no longer the goal. For example, P2 and P3 require the assumption that the divergence measure belongs to the class of Bregman divergences. Is this a reasonable requirement to make? My only response to this question is that I do not see how this assumption (and other necessary mathematical assumptions) are more plausible if truth is the goal than if the goal is to identify the hypothesis that is best in some other sense. So, at least in my eyes, the generalized accuracy argument is at least as plausible as the original argument. In any case, my main goal in this paper is not to give a careful analysis of the accuracy argument. From now on, I will assume that any credibility function ought to be probabilistic. That is, I will assume that if p is a function that assigns a number between 0 and 1 to each hypothesis H that represents how plausible it is that H is best (in some sense), then p ought to be probabilistic. In the next section, I turn to the main question of the paper: given a probability function p and given a piece of evidence E, how should p be updated in light of E?
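To make P2's dominance claim concrete, here is a small numerical illustration (the numbers are hypothetical) using the squared Euclidean distance, i.e., the Brier divergence, which is a Bregman divergence. The non-probabilistic credibility function (0.6, 0.6) over two exclusive and exhaustive hypotheses is beaten by the probabilistic function (0.5, 0.5) no matter which hypothesis is best:

```python
def brier(a, b):
    # Squared Euclidean distance, a simple Bregman divergence.
    return sum((x - y) ** 2 for x, y in zip(a, b))

nonprob = (0.6, 0.6)   # credibilities sum to 1.2: not probabilistic
prob    = (0.5, 0.5)   # a probabilistic competitor

ideals = [(1.0, 0.0), (0.0, 1.0)]  # H1 is best, or H2 is best

# prob is strictly closer to the ideal function whichever hypothesis is best:
for ideal in ideals:
    assert brier(prob, ideal) < brier(nonprob, ideal)
```

Whatever the facts turn out to be, the probabilistic function has a strictly smaller divergence from the ideal (0.5 versus 0.52), so the non-probabilistic function is dominated.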
3 Deriving the updating rules

Suppose we have a credibility function defined on a hypothesis set H that is probabilistic in the sense of the preceding section. Suppose, also, that we have an evidential measure function Ev[E|H] defined on the set of evidence and the set of hypotheses under consideration. Note that we are not assuming that Ev[E|H] is probabilistic (e.g., Σ_i Ev[E|H_i] need not sum to 1).

It is widely accepted that if the goal is to find the true hypothesis in a partition of hypotheses and the evidential measure is the likelihood, i.e., Ev[E|H] = p(E|H), then any probability function over the hypotheses ought to be updated through Bayesian updating:

$$ p_E(H) = \frac{p(E \mid H)\, p(H)}{\sum_i p(E \mid H_i)\, p(H_i)} $$

The natural generalization of Bayesian updating is what I have called inferential updating in the introduction. However, it is not clear why the prior probability function and the evidential measure should always be combined in a Bayesian-like manner, regardless of what the evidential measure is and regardless of what the purpose of updating is. Unfortunately, whereas the accuracy argument for probabilism does not make any assumptions about how the credibility function is interpreted, the standard accuracy argument for Bayesian updating (Greaves and Wallace, 2006) relies on properties that are unique to the likelihood, in particular the fact that the likelihood forms a joint distribution with the prior. Thus, the standard accuracy argument does not generalize to cases where the evidential measure is not the likelihood. Other standard arguments for Bayesian updating have the same limitation (e.g., Dutch book arguments). A different kind of approach is therefore needed.

Bissiri et al. (2016) come up with a different approach.
They show that provided the evidential measure is a function of an additive loss function, L(E, H), such that Ev[E1 & E2 | H] = f(L(E1, H) + L(E2, H)), and given that a few other assumptions are met, the updating procedure must have the following form, where c is some constant:

$$ p_E(H) = \frac{e^{-c \cdot L(E, H)}\, p(H)}{\sum_i e^{-c \cdot L(E, H_i)}\, p(H_i)} \qquad (3.1) $$

Bissiri et al. (2016) call the above updating procedure "general Bayesian updating." General Bayesian updating traces back to Zhang (2006) and has been increasingly influential in statistics in recent years.[6]

[6] See Grünwald and van Ommen (2017) for a thorough discussion of general Bayesian updating and related updating rules.

Although Bissiri et al.'s (2016) argument for general Bayesian updating is interesting, it has several limitations. One problem is that, as Vassend (2019b) argues, the probabilities in (3.1) cannot be interpreted in the standard Bayesian way as plausibilities of truth. But if the probabilities are not standard credibility functions, then the decision-theoretic framework assumed by Bissiri et al. (2016) would seem to lack justification. The argument also makes certain mathematical assumptions that seem hard to justify from a philosophical point of view. In particular, the authors base their argument in part on the use of statistical divergence measures, and they assume that the divergence belongs to the class of f-divergences.[7] This assumption rules out many standard divergence measures, including all Bregman divergences aside from the Kullback-Leibler divergence (Amari, 2009).[8] A final limitation of Bissiri et al.'s (2016) derivation is that there are many reasonable evidential measures that cannot be written as a function of an additive loss function.
Indeed, even the likelihood will only have such a form if the evidence is independent conditional on H_i, for all i.[9] Thus, although their argument is interesting, a more general approach that makes less restrictive and more philosophically defensible assumptions is desirable. That is the goal of this section. Later we will see that Bissiri et al.'s (2016) updating rule may be derived as a special case.

To start, note that ordinary Bayesian updating can be decomposed into two steps:

Combination step. For each i, calculate p*(H_i) = p(E | H_i) p(H_i).

Normalization step. Transform p* to p′ as follows: for each i, p′(H_i) = p*(H_i) / p(E).

In the first step, the prior plausibility of the hypothesis is combined with the evidential score (i.e., likelihood) of the hypothesis in order to produce an overall judgment of the hypothesis's posterior plausibility. In the second step, the posterior plausibilities of all the hypotheses are rescaled in such a way that they jointly obey the probability axioms, i.e., such that all the posterior plausibility scores fall between 0 and 1, inclusive, and jointly sum to 1.

Bayesian updating is a special case of a much broader class of updating rules that decompose into a combination step and a normalization step. The purpose of the remainder of this paper will be to study this class of updating rules. The combination step requires a combination function, c, that takes as its input a prior probability, p(H), and a set of evidential scores, Ev[E1 | H], Ev[E2 | H, E1], Ev[E3 | H, E1, E2], etc., and that assigns a total score to H, taking into consideration both its prior probability and its performance on the evidence. The normalization step then transforms
those scores into probabilities. In other words, on an abstract level, our purpose will be to study updating procedures that decompose in the following way:

Combination step: For each hypothesis, H_i, a set of evidential scores and a prior probability are combined using some combination function c in order to produce an overall posterior score for H_i.

Normalization step: The posterior scores of all the H_i are transformed using some function N such that they jointly satisfy the probability axioms.

In the next two subsections, the combination step and the normalization step are analyzed in detail. The goal is to show that, given reasonable assumptions, the combination function c and the normalization function N both have a very limited set of possible functional forms.

[7] They also give an alternative derivation that does not make this assumption. However, the alternative derivation makes other suspect assumptions. In particular, it assumes that the normalization procedure is multiplicative, which we'll see later in this paper can be put into question.

[8] Recall that Bregman divergences play a crucial role in the accuracy argument for probabilism. The justification for the focus on Bregman divergences is their tight connection to strict propriety (see Predd et al. (2009)).

[9] If p(E1, E2 | H) = p(E1 | H) p(E2 | H), we can write p(E1, E2 | H) = e^{log p(E1|H) + log p(E2|H)}, i.e., the likelihood is of the form required by Bissiri et al. (2016). But if p(E1, E2 | H) ≠ p(E1 | H) p(E2 | H), then we cannot write the likelihood in this way.
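Rule (3.1) already fits this two-step mold: its combination step multiplies the prior by exp(-c · L(E, H)), and its normalization step is the usual multiplicative rescaling. Here is a minimal sketch (the losses are hypothetical; when L is the negative log-likelihood and c = 1, the rule reduces to ordinary Bayesian updating):

```python
import math

def general_bayesian_update(p, losses, c=1.0):
    # Inferential updating with evidential measure Ev[E|Hi] = exp(-c * L(E, Hi)).
    scores = [math.exp(-c * L) * h for L, h in zip(losses, p)]
    total = sum(scores)
    return [s / total for s in scores]

p = [0.5, 0.5]                        # hypothetical prior
likelihoods = [0.8, 0.2]              # hypothetical p(E|Hi)
losses = [-math.log(l) for l in likelihoods]

print(general_bayesian_update(p, losses))         # ordinary Bayes: ~[0.8, 0.2]
print(general_bayesian_update(p, losses, c=0.5))  # smaller c tempers the evidence
```

Varying the constant c interpolates between ignoring the evidence (c = 0) and weighting it ever more heavily, which is one reason the rule has attracted interest in statistics.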
3.1 The combination step

Let e1 and e2 represent the evidential scores of a hypothesis H on some evidence, and let h represent H's prior probability; then there are two candidate forms for the combination function that arguably stand out as being particularly plausible:

Additive combination: c(e1, e2, h) = e1 + e2 + h

Multiplicative combination: c(e1, e2, h) = e1 · e2 · h

Note that e1 and e2 here may represent either conditional or unconditional evidential scores. For example, e1 may represent Ev[E1 | H], i.e., the unconditional evidential score of H on E1, or it may represent Ev[E1 | H, E2], i.e., the conditional evidential score of H on E1 given that E2 has already been taken into account. Note, also, that to say that the combination function is additive or multiplicative is not the same as saying that the evidential measure is additive or multiplicative in the sense that Ev[E1, E2 | H] = Ev[E1 | H] + Ev[E2 | H] or Ev[E1, E2 | H] = Ev[E1 | H] · Ev[E2 | H]. The latter assumptions are much stronger, and amount to assuming that E1 and E2 are independent conditional on H (relative to the evidential measure Ev).

If we make a few reasonable assumptions, we can prove that the combination function must be multiplicative or additive. First of all, suppose we have evidential scores e1 and e2, and a prior probability h. Clearly, the order in which we combine the evidential scores and the prior should not matter for the final result we get. That is not to say that the order in which the evidence is received does not matter; it may. For example, if we flip a coin and the outcomes are six heads in a row and then six tails in a row, then the order of the outcomes strongly suggests that the outcomes are probabilistically dependent.
Nevertheless, the order in which we evaluate the available pieces of evidence in order to produce an overall judgment should not influence the overall judgment at which we arrive. For that reason, the combination function should be commutative: c(e1, e2) = c(e2, e1). Furthermore, it clearly should not matter whether we first combine e1 and e2 and then combine the result of that with e3, or whether we combine e2 with e3 and then combine the result with e1, or whether we combine all three pieces of evidence at the same time. In other words, c should be associative: c(e1, c(e2, e3)) = c(c(e1, e2), e3) = c(e1, e2, e3).

The final reasonable requirement is more quantitative. Clearly, the impact that e1 has on H's overall evidential score, after e2 has already been taken into account, should not depend on the impact that e2 has on H. That is not to say that a piece of evidence E2 should not influence the impact that a different piece of evidence E1 has on H's evidential score; it may well, but if it does, it should do so through Ev[E1 | H, E2]. A piece of evidence may influence the evidential impact conferred by another piece of evidence, but the evidential scores themselves should not influence each other. In other words, the requirement is that the impact that, for example, e1 = Ev[E1 | H, E2] makes on H's total evidential score should not depend on the impact that e2 = Ev[E2 | H] makes on H's total evidential score, nor vice versa.

Given that we are willing to suppose that the combination function is twice differentiable, the preceding requirement may be naturally formalized as constraints on the partial derivatives of the combination function. Let c(x, y) be the combination function as a function of variables x and y.
Then the impact that the evidential score e1 makes on H's total evidential score is plausibly the value of the partial derivative of c(x, y) with respect to x, evaluated at x = e1. If this partial derivative at x = e1 is a large number, then setting x to e1 makes a large difference to H's overall evidential score; if it is 0, then e1 makes no difference. The requirement that the impact that e1 makes should not depend on the impact that e2 makes, nor vice versa, for any e1 and e2, may then be formalized in terms of a constraint on the higher-order partial derivatives of c, namely that for some constant k the following equation be obeyed:

$$ \frac{\partial^2 c(x, y)}{\partial x \, \partial y} = k $$

The above equation formalizes the idea that the impact that x makes, i.e., ∂c/∂x, should not depend on the impact that y makes, i.e., ∂c/∂y, where x and y represent any possible evidential scores. We can now show the following (the derivation is in Appendix A):

Characterization of the combination function. Suppose the combination function, c(x, y), satisfies the following requirements:

1. c is commutative.
2. c is associative.
3. c is twice differentiable.
4. c's partial derivatives satisfy the following equation, for some number k:

$$ \frac{\partial^2 c(x, y)}{\partial x \, \partial y} = k $$

Then c must have one of the following two forms:

1. If k = 0, then c(x, y) = x + y.
2. If k ≠ 0, then c(x, y) = xy.

Hence, it follows that the combination function must be additive or multiplicative. Of course, this conclusion is only as plausible as the assumptions from which it is derived, and some people may be uncomfortable with some of the assumptions that have been made, in particular the condition on the partial derivatives of the combination function. As it happens, it is possible to derive the conclusion from quite different assumptions.
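The derivative condition lends itself to a quick numerical sanity check: a central finite-difference estimate of the mixed partial is constant at 0 for the additive form and constant at 1 for the multiplicative form (a small sketch; the sample points are arbitrary):

```python
def mixed_partial(c, x, y, h=1e-4):
    # Central finite-difference estimate of d^2 c / (dx dy).
    return (c(x + h, y + h) - c(x + h, y - h)
            - c(x - h, y + h) + c(x - h, y - h)) / (4 * h * h)

additive       = lambda x, y: x + y
multiplicative = lambda x, y: x * y

for x, y in [(0.2, 0.7), (1.5, 3.0), (10.0, 0.1)]:
    assert abs(mixed_partial(additive, x, y) - 0.0) < 1e-6
    assert abs(mixed_partial(multiplicative, x, y) - 1.0) < 1e-6
```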
Hence, in order to show the robustness of the conclusion, I provide an alternative characterization of the combination function in Appendix F.

3.2 The normalization step

After the combination function has produced a posterior plausibility score, the posterior score must be normalized to be a probability. In theory, normalizing a set of numbers means transforming the numbers in such a way that they are all between 0 and 1 and jointly sum to 1, while at the same time retaining as much of their internal structure as possible. In practice, this means that the most extreme numbers in the set may be forced to take the value 0, while the remaining numbers in the set are rescaled by some function, f. In other words, normalization in general takes the following functional form:

$$ N(x) = \begin{cases} 0 & \text{if } x \text{ is sufficiently low} \\ f(x) & \text{otherwise} \end{cases} \qquad (3.2) $$

For example, in the normalization step of standard Bayesian updating, N(x) = f(x) (i.e., no non-zero numbers are normalized to 0), and if the set to be normalized is {a_1, a_2, ..., a_n}, then f(x) = x / Σ_i a_i. Note that both N and f are relative to the set that is being normalized; hence, if we need to be precise, we should write N_S and f_S, where the subscript indicates the set that is being normalized. Nevertheless, I will typically leave off the subscripts in order to avoid clutter.

Clearly, f should be a one-to-one function. Indeed, except in the case where x and y are both normalized to 0, it should be the case that if x < y then f(x) < f(y). Furthermore, it is clear that the function f ought to commute with the combination function. Suppose we have scores e1, e2, and h.
Then we should arrive at the same posterior probability regardless of whether we do either of the following: first we combine h and e₂, normalize, then combine the normalized result with e₁ and normalize again; or we first combine h and e₁, normalize, and then combine that normalized result with e₂ before normalizing again. In symbols, we require, for all possible scores x, y, and z, that f(c(x, f(c(y, z)))) = f(c(f(c(x, y)), z)). The justification for this requirement is, again, that the order in which we evaluate our evidence, which is arbitrary, should not have an influence on our final judgment. By combining just the preceding two requirements, we can show the following:

Characterization of the normalization procedure. Suppose we have a normalization procedure as in (3.2) that satisfies the following requirements:

1. f commutes with the combination function c: for all x, y, and z, f(c(x, f(c(y, z)))) = f(c(f(c(x, y)), z)).
2. f is one-to-one: for all x and y, f(x) = f(y) if and only if x = y.

Then the normalization process must have one of the following forms, for some constant k that depends on the set, S, of numbers being normalized:

1. If the combination function is multiplicative, then, for all x in S, f(x) = k·x.
2. If the combination function is additive, then, for all x in S, f(x) = x + k.

The proof, which again is straightforward, is in Appendix B.

3.3 Characterizations of inferential and predictive updating

The results so far show that any updating procedure needs to have either (1) a multiplicative combination step and a multiplicative normalization step, or (2) an additive combination step and an additive normalization step. Call an updating procedure that satisfies either (1) or (2) a legitimate updating procedure.¹⁰
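In the multiplicative case, the commuting requirement amounts to the familiar order-invariance of sequential updating. A minimal sketch with hypothetical scores:

```python
def combine(x, y):                 # multiplicative combination step
    return x * y

def normalize(scores):             # multiplicative normalization: f(x) = k*x, k = 1/sum
    total = sum(scores)
    return [s / total for s in scores]

priors = [0.5, 0.3, 0.2]           # hypothetical prior scores h_i
e1 = [0.9, 0.4, 0.1]               # hypothetical evidential scores on E1
e2 = [0.2, 0.7, 0.5]               # hypothetical evidential scores on E2

# evaluate E1 before E2 ...
p_a = normalize([combine(s, h) for s, h in zip(e1, priors)])
p_a = normalize([combine(s, p) for s, p in zip(e2, p_a)])
# ... or E2 before E1: the final posterior is the same
p_b = normalize([combine(s, h) for s, h in zip(e2, priors)])
p_b = normalize([combine(s, p) for s, p in zip(e1, p_b)])
assert all(abs(a - b) < 1e-12 for a, b in zip(p_a, p_b))
```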
To characterize inferential updating, we now introduce the following principle:

Regularity: No hypothesis is ever conclusively ruled out by any evidence unless the evidence logically refutes the hypothesis; i.e., the posterior probability of any hypothesis is always greater than 0.

We can then show the following (see Appendix C):

Characterization of inferential updating. The only legitimate updating procedure that satisfies Regularity is inferential updating. I.e., given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following formula:

p_E(H) = Ev[E|H]·p(H) / Σᵢ Ev[E|Hᵢ]·p(Hᵢ)

Inferential updating satisfies Regularity; it will never result in any hypothesis having a posterior probability of 0. On the other hand, in Appendix C, I show that an updating procedure that uses an additive combination function and an additive normalization function must violate Regularity; most of the time, any such updating rule must assign a posterior probability of 0 to some hypotheses. But this does not mean that such an updating rule should never be used. As we will see in the next section, sometimes we may want to be able to exclude certain hypotheses from consideration, i.e., assign them a posterior probability of 0. Nevertheless, we do not want to exclude more hypotheses than is warranted by the data. The updating procedure ought to be conservative and exclude as few hypotheses as possible at every step. In other words, any updating procedure that violates Regularity should plausibly still satisfy the following principle:

¹⁰ Note that not every updating rule that has been suggested in the literature is legitimate in this sense of the word.
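The formula above can be sketched in a few lines (with the likelihood as the evidential measure, this is exactly Bayes' rule; the numbers are hypothetical):

```python
def inferential_update(priors, ev_scores):
    # p_E(H_i) is proportional to Ev[E|H_i] * p(H_i); the normalization step
    # rescales multiplicatively, so no positive score is ever sent to 0 (Regularity)
    joint = [e * p for e, p in zip(ev_scores, priors)]
    total = sum(joint)
    return [j / total for j in joint]

posterior = inferential_update([0.5, 0.5], [0.8, 0.2])   # -> [0.8, 0.2]
```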
For example, Douven and Wenmackers (2017) consider a rule according to which p_E(H) = c·(p(H)·p(E|H) + f(E, H)), where c is a normalization constant and f(E, H) is a "bonus" assigned to H in case H is the best explanation of E. This updating rule is not legitimate because it is neither purely additive nor purely multiplicative. On the other hand, the class of rules considered in Douven (2016) is legitimate.

Conservativeness: The updating procedure assigns a posterior probability of 0 to as few hypotheses as possible, given the combination function, the normalization procedure, and the evidence available.

We are now in a position to characterize predictive updating:

Characterization of predictive updating. The only legitimate updating procedure that violates Regularity but satisfies Conservativeness is predictive updating. I.e., given evidential measure Ev and prior probability function p, update p to the posterior p_E by way of the following procedure:

Step 1. For each i, calculate q(Hᵢ) = p(Hᵢ) + Ev[E|Hᵢ].
Step 2. Transform q to p_E as follows: for each i, p_E(Hᵢ) = 0 or p_E(Hᵢ) = q(Hᵢ) + d, where d is the unique number such that d is minimal and, for all i, p_E(Hᵢ) ≥ 0 and Σᵢ p_E(Hᵢ) = 1.

4 Discussion of inferential and predictive updating

4.1 The difference between inferential updating and predictive updating

Inferential updating and predictive updating differ in that the former updating rule obeys Regularity while the latter rule does not. Is Regularity a reasonable constraint? In some contexts it is, but in others it is not. Suppose our main priority is to identify the hypothesis that is true or (if none of the hypotheses is true) the hypothesis that is closest to the truth according to some appropriate measure of closeness to the truth.
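The two steps above can be sketched as follows. Finding the minimal d while zeroing as few hypotheses as possible amounts to a simple search over the number of excluded hypotheses (this is, in effect, a Euclidean projection of the q scores onto the probability simplex; the numbers are hypothetical):

```python
def predictive_update(priors, ev_scores):
    # Step 1: q_i = p(H_i) + Ev[E|H_i]
    q = [p + e for p, e in zip(priors, ev_scores)]
    order = sorted(q, reverse=True)
    # Step 2: find the minimal d that keeps as many hypotheses as possible positive
    for m in range(len(q)):                      # m = number of excluded hypotheses
        survivors = order[: len(q) - m]
        d = (1 - sum(survivors)) / len(survivors)
        if survivors[-1] + d >= 0:               # all surviving scores stay non-negative
            break
    return [qi + d if qi + d > 0 else 0.0 for qi in q]

posterior = predictive_update([0.5, 0.3, 0.2], [0.4, 0.1, -0.6])
# the worst-scoring hypothesis is excluded; the rest sum to 1
```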
Given this goal, it is reasonable to be risk-averse and open-minded: we do not want to rule out any hypothesis as potentially being the hypothesis that is true. Even if a lot of evidence strongly suggests that a hypothesis is false, there is always the possibility that the evidence is unrepresentative or misleading. And so Regularity is a reasonable constraint in this context.

However, suppose we do not care about which of our hypotheses is true or closest to the truth; our goal is not inferential, but predictive. We wish to find, as efficiently as possible, the subset of hypotheses that can be expected to be as predictively accurate as possible. In this context, there is no theoretical justification for requiring that the updating rule obey Regularity; on the contrary, there are good reasons why we might want an updating rule that violates Regularity. In particular, suppose the posterior distribution will be used in order to make a weighted probabilistic prediction, i.e. the goal is for Σᵢ p(D|Hᵢ)·p_E(Hᵢ) to be as accurate on future data D as possible. In that case, it would seem inadvisable to assign positive probability to any hypothesis that has shown itself to be very predictively inaccurate, since the predictions made by such a hypothesis would likely throw off the weighted prediction. On the other hand, we do not want to go to the opposite extreme and base the prediction on the single hypothesis that has performed best on the evidence, as that is liable to lead to overfitting (Forster and Sober, 1994). Predictive updating enables one to set the probabilities of predictively inaccurate hypotheses to 0 in a principled (and conservative) way. Let's consider a specific example.
When the hypotheses under consideration make probabilistic predictions and the goal is maximal predictive accuracy, it is natural to use a strictly proper scoring rule as the measure of evidential favoring (Gneiting and Raftery, 2007). For various reasons, the most popular scoring rule in applied research is probably the Continuous Ranked Probability Score (CRPS). Suppose we have a set of competing statistical models M₁, M₂, etc., and for each model, let p_{M_i} be the marginal (cumulative) probability forecast distribution corresponding to M_i. Suppose, moreover, that p_{M_i} has finite first moment, that X, X₁, and X₂ are independent and identically distributed random variables that follow the distribution p_{M_i}, and that x is the actual observed outcome. Then the CRPS can be written in the following way (where the expectations are taken relative to p_{M_i}):

CRPS(p_{M_i}, x) = E|X − x| − (1/2)·E|X₁ − X₂|   (4.1)

As (4.1) makes clear, the CRPS is a statistical generalization of absolute error. As Gneiting and Raftery (2007) point out, a significant benefit of the CRPS is that it is easily interpretable, since the outputs of (4.1) can be reported in the same units as the measurements. For example, suppose the measurements are in terms of meters. Then the CRPS score of a model on an observation will be a representation of how many meters inaccurate the model's predictions are of that observation, on average (since the prediction is a probability distribution rather than a single number, the average is needed). If we let Ev[x|p_{M_i}] = a·CRPS(p_{M_i}, x), where a is some constant, and assign prior probabilities to all the models, then predictive updating can be used to assign posterior probabilities to all the models.¹¹
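Equation (4.1) lends itself to a simple Monte Carlo estimate from forecast samples. A minimal sketch, with two hypothetical Gaussian forecasts (a forecast centered on the observation should score lower, i.e. better, than one centered far away):

```python
import random

def crps_sample(draws, x):
    # Monte Carlo estimate of (4.1): E|X - x| - (1/2) E|X1 - X2|,
    # computed from i.i.d. draws from the forecast distribution
    n = len(draws)
    term1 = sum(abs(d - x) for d in draws) / n
    term2 = sum(abs(a - b) for a in draws for b in draws) / (n * n)
    return term1 - 0.5 * term2

random.seed(0)
x_obs = 2.0                                           # the observed outcome, in meters
near = [random.gauss(2.0, 0.5) for _ in range(800)]   # forecast centered on x_obs
far = [random.gauss(5.0, 0.5) for _ in range(800)]    # forecast centered 3 m away
crps_near, crps_far = crps_sample(near, x_obs), crps_sample(far, x_obs)
assert 0 < crps_near < crps_far   # lower CRPS = better, in the units of the data
```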
Importantly, given sufficient evidence (and depending on how the constant a is chosen), many of the models will receive a posterior probability of 0. These posterior probabilities can then be used for model selection or for making a weighted prediction using all the models. Of course, it is an empirical question whether predictive updating is better (for predictive purposes) than inferential updating (including standard Bayesian updating). An empirical evaluation of predictive updating will have to wait for a different occasion, however. In this section I have simply tried to suggest one concrete way in which predictive updating may be implemented.

¹¹ If the models contain parameters, then the probability distributions over those parameters may be updated using either inferential or predictive updating.

4.2 The relationship between inferential updating and other updating procedures

As was already mentioned in the introduction to the paper, standard Bayesian updating is clearly a special case of inferential updating: more precisely, we get Bayesian updating if and only if Ev[E|H] ∝ p(E|H), i.e. if and only if the evidential measure is proportional to the likelihood. What Vassend (2019a) calls "quasi-Bayesian updating" is also a special case of inferential updating; indeed, quasi-Bayesian updating is simply inferential updating with an evidential measure that has been suitably calibrated to a verisimilitude measure. Similarly, Douven's (2016) IBE-based updating rule is also clearly a kind of inferential updating. Perhaps more interestingly, Bissiri et al.'s (2016) general Bayesian updating is also a special case of inferential updating. More precisely, we have:

General Bayesian updating is a special case of inferential updating.
Suppose the evidential measure Ev is a strictly decreasing function f of some loss function, L(E, H), such that for all E₁ and E₂, Ev satisfies the following conditions:

1. Ev[E₁|H, E₂] = Ev[E₁|H] = f(L(E₁, H)).
2. Ev[E₁, E₂|H] = f(L(E₁, H) + L(E₂, H)).

Then inferential updating has the following form, for some constant c:

p(H|E) = e^(−c·L(E,H))·p(H) / Σᵢ e^(−c·L(E,Hᵢ))·p(Hᵢ)

A sketch of the proof, which is straightforward, is given in Appendix E. Although general Bayesian updating is a special case of inferential updating, the reverse is not the case because, as was previously mentioned, many reasonable evidential measures cannot be written as a function of an additive loss function. Suppose, for example, that the hypotheses under consideration are real-valued functions f_i, and that the evidential measure is of the form Ev[(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)|f_i] = Minimum(|y₁ − f_i(x₁)|, |y₂ − f_i(x₂)|, ..., |yₙ − f_i(xₙ)|). It is clear in this case that the evidential measure cannot be written as a function of an additive loss function, simply because the Minimum operator is not additive.

A diagram depicting the relationship between inferential updating, predictive updating, and various updating rules that have been suggested in the literature is given in Figure 1.

[Figure 1: Overview of various updating rules. The class of legitimate updating rules comprises inferential updating, with quasi-Bayesian (Vassend, 2019a), standard Bayesian, IBE-based (Douven, 2016), and general Bayesian (Bissiri et al., 2016) updating as special cases, together with predictive updating.]

5 Conclusion

The primary purpose of this paper has been to justify a set of very general synchronic and diachronic inductive norms.
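The special-case claim can be illustrated numerically: with Ev = exp(−c·L) for an additive loss, inferential updating on each piece of evidence in turn agrees with a single general-Bayesian update on the summed loss (a sketch; the losses and the constant c are hypothetical):

```python
import math

def inferential(priors, scores):
    # inferential updating with an arbitrary evidential measure Ev
    joint = [s * p for s, p in zip(scores, priors)]
    z = sum(joint)
    return [j / z for j in joint]

c = 2.0                        # the constant from the theorem (illustrative value)
L1 = [0.3, 1.1, 0.7]           # hypothetical losses L(E1, H_i)
L2 = [0.9, 0.2, 0.4]           # hypothetical losses L(E2, H_i)
priors = [1 / 3, 1 / 3, 1 / 3]

# with Ev = exp(-c * L), updating on E1 and then on E2 ...
p1 = inferential(priors, [math.exp(-c * l) for l in L1])
p12 = inferential(p1, [math.exp(-c * l) for l in L2])
# ... agrees with one general-Bayesian update on the summed loss
p_joint = inferential(priors, [math.exp(-c * (a + b)) for a, b in zip(L1, L2)])
assert all(abs(x - y) < 1e-12 for x, y in zip(p12, p_joint))
```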
The resulting normative framework can be put to both philosophical and scientific use. In philosophy of science, a standard way of analyzing scientific methodology is by seeing whether the methodology makes sense from a Bayesian perspective. For example, in this way, Sober (2015) analyzes parsimony inference,¹² Dawid et al. (2015) analyze no-alternatives arguments in physics, Schupbach (2018) analyzes robustness analysis, and Myrvold (2016) evaluates the epistemic value of unification. Since the preceding analyses take place in a Bayesian framework, they inherit the limitations and assumptions of Bayesianism. In the broader normative framework developed in this paper, it is possible to check whether the analyses still hold up when those assumptions are lifted. For example, Myrvold (2016) shows that more unifying hypotheses will be more confirmed by evidence than less unifying hypotheses, other things being equal. Since his analysis is Bayesian, he implicitly uses the likelihood as his measure of evidential favoring. A natural question to ask is whether his result still holds if the likelihood is replaced with an arbitrary measure of evidential favoring. The perhaps surprising answer is yes, although a proper demonstration of this fact must be reserved for a different time.

¹² Sober uses a likelihoodist approach, which is Bayesianism without the priors.

The normative framework developed in this paper can also be used for scientific inference. Indeed, implicitly it already has been: as shown in Section 4.2, the general Bayesian updating rule suggested by Bissiri et al. (2016) is a special case of inferential updating, and general Bayesian updating is gaining in popularity in the statistical community.
But inferential updating is more general than general Bayesian updating, and allows for the use of evidential measures that cannot be represented in Bissiri et al.'s (2016) framework. One example is the phylogenetic parsimony measure discussed by Vassend (2019a). Predictive updating can also be applied in scientific inference problems, for example through the use of strictly proper scoring rules as suggested in Section 4.1. Of course, it is ultimately an empirical question whether predictive updating performs better than inferential updating. An answer to this question must wait until later; in this paper, my goal has been to provide a general normative framework for inductive inference that is as flexible as possible while obeying basic theoretical desiderata.

References

Aczél, J. (2006). Lectures on Functional Equations and Their Applications. Dover Books on Mathematics. Dover Publications.

Amari, S.-I. (2009). alpha-Divergence is Unique, Belonging to Both f-Divergence and Bregman Divergence Classes. IEEE Transactions on Information Theory 55(11), 4925–4931.

Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. Wiley, New York, NY.

Bissiri, P. G., C. Holmes, and S. Walker (2016). A General Framework for Updating Belief Distributions. Journal of the Royal Statistical Society. Series B (Methodological) 78(5), 1103–1130.

Box, G. E. P. (1980). Sampling and Bayes' Inference in Scientific Modelling and Robustness. Journal of the Royal Statistical Society. Series A (General) 143(4), 383–430.

Dawid, R., S. Hartmann, and J. Sprenger (2015). The No Alternatives Argument. British Journal for the Philosophy of Science 66(1), 213–234.

Douven, I. (2016). Explanation, Updating, and Accuracy. Journal of Cognitive Psychology 28(8), 1004–1012.

Douven, I. and S. Wenmackers (2017).
Inference to the Best Explanation versus Bayes's Rule in a Social Setting. British Journal for the Philosophy of Science 68(2), 535–570.

Forster, M. R. (1995, September). Bayes and Bust: Simplicity as a Problem for a Probabilist's Approach to Confirmation. British Journal for the Philosophy of Science 46(3), 399–424.

Forster, M. R. and E. Sober (1994). How To Tell When Simpler, More Unified, or Less Ad Hoc Theories Will Provide More Accurate Predictions. The British Journal for the Philosophy of Science 45(1), 1–35.

Gelman, A. and C. R. Shalizi (2013). Philosophy and the Practice of Bayesian Statistics. British Journal of Mathematical and Statistical Psychology 66, 8–38.

Gneiting, T. and A. E. Raftery (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102(477), 359–378.

Greaves, H. and D. Wallace (2006). Justifying Conditionalization: Conditionalization Maximizes Epistemic Utility. Mind 115(459), 607–632.

Grünwald, P. and T. van Ommen (2017). Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It. Bayesian Analysis 12(4), 1069–1103.

Jeffrey, R. (1983). The Logic of Decision (Second ed.). Cambridge University Press, Cambridge.

Joyce, J. (1998). A Non-Pragmatic Vindication of Probabilism. Philosophy of Science 65(4), 575–603.

Joyce, J. (2009). Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In F. Huber and C. Schmidt-Petri (Eds.), Degrees of Belief. Synthese.

Key, J. T., L. R. Pericchi, and A. F. M. Smith (1999). Bayesian Model Choice: What and Why? In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Eds.), Bayesian Statistics 6, pp. 343–370. Oxford: Oxford University Press.

Kopytov, V. M. and N. Y. Medvedev (1996). Right-Ordered Groups. Siberian School of Algebra and Logic.
Springer.

Leitgeb, H. and R. Pettigrew (2010). An Objective Justification of Bayesianism II: The Consequences of Minimizing Inaccuracy. Philosophy of Science 77, 236–272.

Levinstein, B. A. (2012). Leitgeb and Pettigrew on Accuracy and Updating. Philosophy of Science 79(3), 413–424.

Myrvold, W. (2016). On the Evidential Import of Unification. Unpublished manuscript.

Pettigrew, R. (2016). Accuracy and the Laws of Credence. Oxford University Press.

Predd, J. B., R. Seiringer, E. H. Lieb, D. N. Osherson, H. V. Poor, and S. R. Kulkarni (2009). Probabilistic Coherence and Proper Scoring Rules. IEEE Transactions on Information Theory 55(10), 4786–4792.

Schupbach, J. N. (2018). Robustness Analysis as Explanatory Reasoning. British Journal for the Philosophy of Science 69(1), 275–300.

Shaffer, M. J. (2001). Bayesian Confirmation of Theories That Incorporate Idealizations. Philosophy of Science 68(1), 36–52.

Sober, E. (2015). Ockham's Razors: A User's Manual. Cambridge University Press.

Sprenger, J. (2009). Statistics Between Inductive Logic and Empirical Science. Journal of Applied Logic 7(2), 239–250.

Sprenger, J. (forthcoming). Conditional Degree of Belief. To appear in Philosophy of Science.

Tanton, J. (2005). Encyclopedia of Mathematics. Science Encyclopedia. Facts on File.

Vassend, O. B. (2019a). A Verisimilitude Framework for Inductive Inference, with an Application to Phylogenetics. To appear in British Journal for the Philosophy of Science.

Vassend, O. B. (2019b). New Semantics for Bayesian Inference: The Interpretive Problem and Its Solutions. To appear in Philosophy of Science.

Walker, S. G. (2013). Bayesian Inference with Misspecified Models. Journal of Statistical Planning and Inference 143(10), 1621–1633.

Zhang, T. (2006).
From ε-Entropy to KL-Entropy: Analysis of Minimum Information Complexity Density Estimation. The Annals of Statistics 34(5), 2180–2210.

A Characterization of the combination function

The goal of this section is to show the characterization of the combination function in Section 3.1. There are two cases to consider: k = 0 and k ≠ 0. Since the two cases are very similar, I will only consider the case where k ≠ 0. So suppose that for some non-zero k, we have:

∂²c(x, y)/∂x∂y = k   (A.1)

Taking the antiderivative with respect to x, it follows that:

∂c(x, y)/∂y = kx + C(y) + D   (A.2)

where C(y) is a function of y, but not x, and D is some real number. Taking the antiderivative of (A.2) with respect to y, we get:

c(x, y) = kxy + ∫C(y)dy + Dy + G(x) + F   (A.3)

where G is a function of x and F is some real number. Moreover, exchanging the labels x and y in (A.3) gives us:

c(y, x) = kyx + ∫C(x)dx + Dx + G(y) + F   (A.4)

But since c(x, y) = c(y, x), (A.3) and (A.4) must be equal, which means that kxy + ∫C(y)dy + Dy + G(x) + F = kxy + ∫C(x)dx + Dx + G(y) + F, and hence ∫C(y)dy + Dy + G(x) = ∫C(x)dx + Dx + G(y). Rearranging, we get:

G(x) = ∫C(x)dx + Dx + G(y) − ∫C(y)dy − Dy   (A.5)

But since G(x) does not depend on y, the only way for (A.5) to be true is if G(y) − ∫C(y)dy − Dy is equal to some constant number, c. Hence, ∫C(y)dy + Dy = G(y) − c. Plugging this back into (A.3) (and absorbing the constant c into F), we get:

c(x, y) = kxy + G(x) + G(y) + F   (A.6)

Without loss of generality, we may assume that G(0) = 0, because if G(0) = A for some non-zero A, then we can just put G′(x) = G(x) − A and F′ = F + 2A, and we get c(x, y) = kxy + G′(x) + G′(y) + F′, with G′(0) = 0 (i.e. we simply absorb the constant A into F′).
Now the fact that c is associative and commutative means that c(c(x, y), z) = c(c(y, z), x), and hence (A.6) implies that, for all x, y, and z:

k(kxy + G(x) + G(y) + F)z + G(kxy + G(x) + G(y) + F) + G(z) + F = k(kyz + G(y) + G(z) + F)x + G(kyz + G(y) + G(z) + F) + G(x) + F   (A.7)

Simplifying, we have:

[G(x) + G(y) + F]kz + G[kxy + G(x) + G(y) + F] + G(z) = [G(y) + G(z) + F]kx + G[kyz + G(y) + G(z) + F] + G(x)   (A.8)

Note that because c is twice differentiable, so is G. Taking the derivative of each side of (A.8) with respect to z (using the chain rule on the right-hand side) gives:

[G(x) + G(y) + F]k + G′(z) = G′(z)kx + G′[kyz + G(y) + G(z) + F]·(ky + G′(z))   (A.9)

Next, taking the derivative of each side of (A.9) with respect to x gives:

G′(x)k = G′(z)k   (A.10)

Hence, since k ≠ 0, it follows that G′(x) = G′(z). But since G(x) does not depend on z and G(z) does not depend on x, this means that G′(x) must be a constant number, i.e. G′(x) = a for some constant a. Since we are assuming that G(0) = 0, it follows that G(x) = ax. Next, the fact that the three-way combination c(x, y, z) = c(c(x, y), z) implies:

kxyz + ax + ay + az + F = k(kxy + ax + ay + F)z + a(kxy + ax + ay + F) + az + F   (A.11)

Comparing the terms that contain xyz,¹³ we see that k = 1, and hence:

ax + ay = axz + ayz + Fz + axy + a²x + a²y + aF   (A.12)

Comparing the terms that contain z, we see that a(x + y) + F = 0 for all x and y. The only way this can be true is if a = F = 0. Hence we have, finally, that c(x, y) = xy.

B Characterization of the normalization step

The goal of this section is to show the characterization of the normalization step in Section 3.2.
Let S₁ = {aᵢ} be an arbitrary set of n numbers, with normalization function f_{S₁}. Consider the set S₂ = {1/aᵢ} and the set S₃ = {1ᵢ}, which consists of n copies of 1. Then condition (1) implies that, for all i, f(c(f(c(1/aᵢ, aᵢ)), 1)) = f(c(1/aᵢ, f(c(aᵢ, 1)))), where the various f's are relative to the relevant sets. For example, in f(c(1/aᵢ, aᵢ)), f is a rescaling function defined on the set {c(1/aᵢ, aᵢ)}. Note that we are abusing notation here: strictly speaking, the various f's are not the same function, since they are defined over different sets. However, to avoid needless clutter, I use f without subscripts.

According to the characterization of the combination function, the combination function is either multiplicative or additive. Since the derivations are very similar, I will only show that the normalization function must be multiplicative given that the combination function is multiplicative. So suppose that the combination function is c(a, b) = ab. Then we get: f(f((1/aᵢ)·aᵢ)·1) = f((1/aᵢ)·f(aᵢ·1)). Thus, we have: f(f(1)) = f((1/aᵢ)·f(aᵢ)), i.e. f((1/aᵢ)·f(aᵢ)) is a constant. But since f is one-to-one, that means (1/aᵢ)·f(aᵢ) must also be a constant. That is, there exists a constant k such that, for all aᵢ in S₁, (1/aᵢ)·f(aᵢ) = k. Hence f(aᵢ) = k·aᵢ for all aᵢ. Since S₁ was an arbitrary set, it follows that in general the normalization procedure must be multiplicative given that the combination function is multiplicative.

¹³ Which we can do, as before, by successively differentiating with respect to x, y, and z. This proof method is sometimes called "equating coefficients" (Tanton, 2005, p. 169).
C Characterization of inferential updating

The goal in this section is to show that the only legitimate updating rule that satisfies Regularity is inferential updating. According to the results in Sections 3.1 and 3.2, any legitimate updating rule must either have (1) a multiplicative combination step and a multiplicative normalization step, or (2) an additive combination step and an additive normalization step. It is easy to show that it is possible for an updating rule that satisfies (1) to satisfy Regularity, and that, indeed, the resulting updating rule is inferential updating. In order to show that inferential updating is the only updating rule that satisfies Regularity, it suffices to show that there is no updating rule satisfying (2) that also satisfies Regularity.

Suppose, for the sake of contradiction, that there is some updating rule that satisfies both (2) and Regularity. In order for Regularity to be obeyed, it has to be the case that given any set of non-zero prior probabilities over a set of hypotheses, h₁, h₂, ..., hₙ, and given any set of evidential scores for the hypotheses, e₁, e₂, ..., eₙ, the posteriors are also all non-zero. Thus, if N is the normalization function, then the following must be true for all hᵢ:

N(eᵢ + hᵢ) > 0   (C.1)

Since the normalization function is assumed to satisfy (2), (C.1) implies that the following is true for all i, where d is an additive normalization constant:

eᵢ + hᵢ + d > 0   (C.2)

Since the posterior probabilities must sum to 1, we also have:

Σᵢ(eᵢ + hᵢ + d) = 1   (C.3)

and therefore d = −(1/n)Σᵢeᵢ. And so we have, for all hᵢ:

eᵢ + hᵢ − (1/n)Σⱼeⱼ > 0   (C.4)

But it's obvious that (C.4) will not in general be true. For example, suppose e₁ is the smallest eᵢ (and that the eᵢ are not all equal). Then r = e₁ − (1/n)Σᵢeᵢ < 0. Now suppose it's also the case that h₁ < −r.
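A small numerical instance of the contradiction (all numbers hypothetical): with additive combination, (C.3) forces a particular d, and the worst-scoring hypothesis can then be pushed below zero.

```python
h = [0.2, 0.3, 0.5]      # priors h_i, summing to 1
e = [0.1, 0.6, 0.8]      # additive evidential scores e_i

n = len(h)
d = -sum(e) / n          # the normalization constant forced by (C.3)
posteriors = [ei + hi + d for ei, hi in zip(e, h)]

assert abs(sum(posteriors) - 1.0) < 1e-12   # the values sum to 1 ...
assert min(posteriors) < 0                  # ... but H_1 gets a negative "probability"
```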
Then we have:

e₁ + h₁ − (1/n)Σᵢeᵢ = r + h₁ < 0   (C.5)

Consequently, additive combination and additive normalization jointly violate Regularity. So there can be no updating procedure that satisfies both (2) and Regularity.

D Characterization of predictive updating

The goal in this section is to show that the only legitimate updating rule that violates Regularity but satisfies Conservativeness is predictive updating. It is clear that any updating rule that satisfies Conservativeness but violates Regularity must be additive. This is because any multiplicative updating rule that satisfies Conservativeness clearly also satisfies Regularity.

So suppose the updating rule is additive and satisfies Conservativeness. Then the goal is to show that the updating rule must be equivalent to predictive updating. Since the rule is additive, it must have the following form, where p_E is the posterior probability distribution, Hᵢ is a hypothesis, hᵢ is the prior probability of the hypothesis, eᵢ is the evidential score of the hypothesis, and d is a normalization constant:

p_E(Hᵢ) = 0 if hᵢ + eᵢ is sufficiently low; p_E(Hᵢ) = hᵢ + eᵢ + d otherwise.   (D.1)

If the updating rule is conservative, then as few hypotheses as possible should be assigned a posterior probability of 0. It remains to show that this uniquely happens when d is minimal. Suppose there are n hypotheses. Without loss of generality, suppose the hypotheses are ordered such that 0 ≤ p_E(H₁) ≤ p_E(H₂) ≤ ... ≤ p_E(Hₙ). Then there is some index m such that p_E(Hᵢ) = 0 for i ≤ m and p_E(Hᵢ) > 0 for i > m. Note that the updating procedure is conservative if and only if m is minimal, because m is minimal if and only if a minimal number of hypotheses have a posterior probability of 0.
In order for the posterior probabilities to be probabilistic, we must have:

Σᵢ p_E(Hᵢ) = Σ_{i>m}(hᵢ + eᵢ) + (n − m)d = 1   (D.2)

Now suppose we have a different updating rule resulting in some posterior p′_E that is not conservative: i.e., there is an index m′ > m such that p′_E(Hᵢ) = 0 for i ≤ m′ and p′_E(Hᵢ) > 0 for i > m′. Then p′_E must satisfy the following constraint for some normalization constant d′:

Σ_{i>m′}(hᵢ + eᵢ) + (n − m′)d′ = 1   (D.3)

Comparing (D.2) and (D.3) and remembering that m′ > m, we see that:

0 < Σ_{i=m+1}^{m′}(hᵢ + eᵢ) = (n − m′)d′ − (n − m)d   (D.4)

And hence:

d < ((n − m′)/(n − m))·d′ < d′   (D.5)

Hence, d < d′. What the above proof shows is that any conservative updating rule has a smaller additive normalization constant than any non-conservative updating rule. To finish the proof, we show that there is just one conservative updating rule. Here we can use (D.4) again. If both updating rules are conservative, then we have m = m′, and hence, making the necessary amendments in (D.4), we have:

0 = Σ_{i=m+1}^{m′}(hᵢ + eᵢ) = (n − m)d′ − (n − m)d   (D.6)

Hence it follows that d′ = d. But then the two updating rules are equivalent. Hence, there is only one conservative updating rule, namely the one that uses a minimal additive normalization constant. This is predictive updating.

E General Bayesian updating is a special case of inferential updating

The goal in this section is to show that Bissiri et al.'s (2016) general Bayesian updating is a special case of inferential updating.
For some normalization constant $k$, we have:

$$p(H \mid E_1, E_2) = k \cdot \mathrm{Ev}[E_1 \mid H, E_2]\,\mathrm{Ev}[E_2 \mid H]\,p(H) = k \cdot f(L(E_1, H))\,f(L(E_2, H))\,p(H) \quad \text{(E.1)}$$

But we also have:

$$p(H \mid E_1, E_2) = k \cdot \mathrm{Ev}[E_1, E_2 \mid H]\,p(H) = k \cdot f(L(E_1, H) + L(E_2, H))\,p(H) \quad \text{(E.2)}$$

Comparing (E.1) and (E.2), we see that $f$ obeys the following functional equation for all $x$ and $y$: $f(x)f(y) = f(x+y)$. Let $g(x) = \log f(x)$. Then $g(x+y) = g(x) + g(y)$, which is the well-known Cauchy equation, whose solution is $g(x) = -cx$ for some positive constant $c$ (Aczél, 2006, p. 31), since $f$, and therefore $g$, is strictly decreasing. Consequently $f(x) = e^{-cx}$, and hence $p(H \mid E) = k \cdot e^{-c L(E,H)}\,p(H)$, which is Bissiri et al.'s (2016) general Bayesian updating rule.

F An alternative characterization of the combination step

In both everyday and scientific contexts, it is common to think of evidence algebraically: multiple lines of evidence combine to provide stronger evidence; some evidence favors a hypothesis, while other evidence goes against it; a piece of evidence here can cancel out a piece of evidence there; and some purported evidence has no effect at all. In other words, evidential favoring has all the hallmarks of a mathematical group.

Now, suppose, as we have been doing up to now, that we use real numbers to represent evidential scores. Then the set of all possible evidential scores, $G$, together with the combination function plausibly form a mathematical group. Indeed, they plausibly form an Archimedean group, because intuitively there is no maximal evidential score. That is, if we use $\bullet$ to denote the combination function, i.e. $e_1 \bullet e_2 = c(e_1, e_2)$, then it is plausible that $(G, \bullet)$ satisfies the following axioms:

1. Closure.
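The endpoint of the derivation, $p(H \mid E) \propto e^{-c\,L(E,H)}\,p(H)$, is easy to check numerically. Below is a small sketch; the priors and loss values are invented for illustration. Note that with $L$ the negative log-likelihood and $c = 1$, the rule reduces to standard Bayesian conditioning:

```python
import math

def general_bayes_posterior(priors, losses, c=1.0):
    """General-Bayesian-style updating: posterior_i is proportional to
    exp(-c * L_i) * prior_i, where L_i is the loss hypothesis i incurs
    on the evidence and c is the positive constant from the derivation."""
    unnorm = [h * math.exp(-c * L) for h, L in zip(priors, losses)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# With L_i = -log(likelihood_i) and c = 1, this is ordinary Bayes' rule:
post = general_bayes_posterior([0.5, 0.5], [-math.log(0.8), -math.log(0.2)])
```

For these invented likelihoods (0.8 and 0.2 under equal priors), the posterior is (0.8, 0.2), exactly what standard Bayesian updating delivers.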
For all possible evidential scores $e_1$ and $e_2$, $e_1 \bullet e_2$ is also a possible evidential score.

2. Associativity. For all possible evidential scores $e_1$, $e_2$, and $e_3$, $(e_1 \bullet e_2) \bullet e_3 = e_1 \bullet (e_2 \bullet e_3)$.

3. Identity. There exists a possible evidential score $i$ such that for all $e$, $i \bullet e = e \bullet i = e$. That is, there exists a real number that represents evidence that has no effect (either favorable or unfavorable).

4. Inverse. For each possible evidential score $e$, there exists a possible evidential score $e'$ such that $e \bullet e' = e' \bullet e = i$. That is, every evidential score could potentially (in principle) be cancelled out by other countervailing evidence.[14]

5. Commutativity. For all possible evidential scores $e_1$ and $e_2$, $e_1 \bullet e_2 = e_2 \bullet e_1$. That is, the order in which the evidence is considered is irrelevant.

6. Archimedean property. For all possible evidential scores $e_1$ and $e_2$, there exists an integer $n$ such that $e_1 < e_2 \bullet e_2 \bullet \ldots \bullet e_2$ ($n$ times).

Suppose, in addition, that the set of evidential scores is totally ordered: for all evidential scores $e_1$ and $e_2$, either $e_1 > e_2$ or $e_1 \le e_2$.[15] Then we can use the following important result from group theory (see Kopytov and Medvedev, 1996, p. 33, for a proof):

Hölder's theorem. Every Archimedean totally ordered group is order-isomorphic to a subgroup of the additive group of real numbers with the natural order.

The fact that $(G, \bullet)$ is order-isomorphic to a subgroup of the additive group of real numbers with the natural order means there exists some subgroup $(S, +)$ of the real numbers and a one-to-one function $g$ from $(G, \bullet)$ to $(S, +)$ that obeys the following equation for all $e_1$ and $e_2$ in $G$: $g(e_1 \bullet e_2) = g(e_1) + g(e_2)$. Since $g$ is one-to-one, it has an inverse, $f$. Hence, for all $e_1$ and $e_2$ in $G$, we can write: $e_1 \bullet e_2 = f(g(e_1) + g(e_2))$.
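For a concrete instance of this construction (my own illustration, not an example from the paper): the positive reals under multiplication form a totally ordered Archimedean group, and $g = \log$ is precisely the order-isomorphism that Hölder's theorem promises, with inverse $f = \exp$:

```python
import math

# Positive reals under multiplication, mapped to the additive reals by
# g = log (the order-isomorphism), with inverse f = exp.
g, f = math.log, math.exp

def combine(e1, e2):
    # e1 . e2 = f(g(e1) + g(e2)); here this is ordinary multiplication.
    return f(g(e1) + g(e2))
```

In this instance the identity element is $i = 1$ (evidence with no effect), the inverse of $e$ is $1/e$, and $g$ turns combination into addition, which is exactly what licenses the additive representation used in the proofs.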
In the main text, I showed that the normalization procedure must be either additive or multiplicative, given that the combination function is either multiplicative or additive. But, arguably, it is not unreasonable to simply assume that the normalization must be either multiplicative or additive. Indeed, all updating rules that have been proposed in the literature have implicitly relied on a normalization procedure that is either multiplicative or additive. In particular, the normalization procedure implicit in both standard Bayesian updating and Jeffrey updating (Jeffrey, 1983) is multiplicative, and the normalization procedure implicit in Leitgeb and Pettigrew's (2010) alternative to Jeffrey updating is additive.

Finally, it is reasonable to assume, as we did in the main text, that the normalization procedure commutes with the combination function in the sense that, for all $a$ and $b$, we have: $N(a \bullet N(b)) = N(N(a) \bullet b) = N(a \bullet b)$.

We can now give the following characterization of the combination function:

[14] A referee points out that this is a bit of an idealization, since a piece of evidence and a defeater of that evidence will not typically cancel each other out precisely.
[15] A referee rightly points out that this assumption is also idealized.

Alternative characterization of the combination function. Suppose the combination function $c(x, y)$ satisfies the following requirements:

1. The set of all evidential scores, $G$, and the combination function $c(x, y) = x \bullet y$ together form a totally ordered Archimedean group.

2. The combination function commutes with the normalization function $N$ in the sense that, for all $a$ and $b$: $N(a \bullet N(b)) = N(N(a) \bullet b) = N(a \bullet b)$.

Then $c$ must have one of the following two forms:

1. If the normalization function is additive, then $c(x, y) = x + y$.

2.
If the normalization function is multiplicative, then $c(x, y) = xy$.

Proof. The fact that the combination function commutes with the normalization function implies that, for every $e$ with inverse $e^{-1}$:

$$N(e \bullet e^{-1}) = N(N(e) \bullet e^{-1}) = N(f(g(N(e)) + g(e^{-1}))) \quad \text{(F.1)}$$

Therefore, for all $e$, $N(f(g(N(e)) + g(e^{-1}))) = N(i)$, where $i$ is the identity element of the group. Since $N$ is one-to-one, this means that $f(g(N(e)) + g(e^{-1})) = k$ for some constant $k$ that does not depend on $e$. Furthermore, since $f$ is one-to-one, this in turn implies that $g(N(e)) + g(e^{-1}) = k'$ for some constant $k'$ that does not depend on $e$. For the same reason, (F.1) also implies that $g(e) + g(e^{-1}) = k''$ for some constant $k''$ that does not depend on $e$. Hence we have, finally, that $g(N(e)) - g(e) = K$, where $K = k' - k''$. Hence, $g(N(e)) = g(e) + K$.

If the normalization procedure is multiplicative, then for some normalization constant $a$, we have $g(ae) = g(e) + K$. Note that $a$ depends on the set to which $e$ belongs. If $\{e_i\}$ is the set, then

$$a = \frac{1}{\sum e_i} \quad \text{(F.2)}$$

Hence, depending on the other members of the set to which $e$ belongs, $a$ can be any number in the half-open interval $(0, \frac{1}{e}]$. Thus we have, for all $e$ and all $a$ in $(0, \frac{1}{e}]$, that $g(ae) = g(e) + K$, where $K$ is a constant that may depend on $a$ but does not depend on $e$. Similarly, we have, for some normalization constant $b$, that $g(bae) = g(ae) + K' = g(e) + K''$. Here, $b$ can be any number in the range $(0, \frac{1}{ae}]$, or in other words in $(0, \infty)$. But if we let $y = ab$ and $x = e$, then the preceding means that for all $x$ and $y$ in $(0, \infty)$ we have:

$$g(yx) = g(x) + K'' \quad \text{(F.3)}$$

where $K''$ depends on $y$, but not on $x$. Interchanging the roles of $y$ and $x$, we also have:

$$g(xy) = g(y) + K''' \quad \text{(F.4)}$$

where $K'''$ depends on $x$, but not on $y$.
Comparing the above equations, we see that $g(x) + K'' = g(y) + K'''$. This implies the following:

$$g(xy) = g(x) + g(y) + C \quad \text{(F.5)}$$

where $C$ is a constant that depends on neither $x$ nor $y$.

Now note that $f(2g(i)) = i \bullet i = i = f(g(i))$. Since $f$ is one-to-one, this implies that $g(i) = 0$. Next, (F.5) implies that $g(i) = g(1 \cdot i) = g(1) + g(i) + C$. Thus $g(1) = -C$. Using (F.5) again, we have $g(1) = g(i \cdot \frac{1}{i}) = g(i) + g(\frac{1}{i}) = g(\frac{1}{i})$. But since $g$ is one-to-one, this implies that $\frac{1}{i} = 1$, so that $i = 1$. Hence $-C = g(1) = g(i) = 0$, so $C = 0$. Finally, then, we have, for all $x > 0$ and $y > 0$:

$$g(xy) = g(x) + g(y) \quad \text{(F.6)}$$

Now put $r(x) = g(e^x)$. Then (F.6) becomes, for all real $x$ and $y$:

$$r(x + y) = r(x) + r(y) \quad \text{(F.7)}$$

This is the Cauchy functional equation, whose only solution is $r(x) = cx$ for an arbitrary constant $c$ (Aczél, 2006, p. 31). Hence, $g(x) = r(\log x) = \log x^c$. Since $f$ is the inverse of $g$, we have that $f(x) = e^{x/c}$. Finally, then, we have:

$$x \bullet y = f(g(x) + g(y)) = e^{(\log x^c + \log y^c)\frac{1}{c}} = e^{(c \log(xy))\frac{1}{c}} = xy \quad \text{(F.8)}$$

That is, the combination function is multiplicative: $c(x, y) = xy$.
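The conclusion in (F.8) can be sanity-checked numerically for an arbitrary positive constant (the value of $c$ below is arbitrary, my own choice for illustration): with $g(x) = \log x^c = c \log x$ and its inverse $f(x) = e^{x/c}$, the combination $f(g(x) + g(y))$ collapses to plain multiplication, whatever $c$ is:

```python
import math

c = 2.5  # any positive constant from the Cauchy-equation solution r(x) = c*x
g = lambda x: c * math.log(x)   # g(x) = log x^c
f = lambda y: math.exp(y / c)   # f is the inverse of g

def combine(x, y):
    # x . y = f(g(x) + g(y)), which (F.8) says equals xy for every c > 0
    return f(g(x) + g(y))
```

Changing `c` rescales the additive representation but leaves the combination function itself unchanged, which is why the characterization does not depend on the particular isomorphism Hölder's theorem supplies.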
