Information vs. Uncertainty as the Foundation for a Science of Environmental Modeling

Grey S. Nearing 1,2,3 & Hoshin V. Gupta 4

1 NASA Goddard Space Flight Center, Hydrological Sciences Laboratory; Greenbelt, MD; grey.s.nearing@nasa.gov
2 National Center for Atmospheric Research, Research Applications Laboratory; Boulder, CO; grey@ucar.edu
3 University of Maryland Baltimore County, Department of Computer Science and Electrical Engineering; Catonsville, MD; grey@umbc.edu
4 University of Arizona, Department of Hydrology and Atmospheric Sciences; Tucson, AZ; hoshin@email.arizona.edu

Abstract: Information accounting provides a better foundation for hypothesis testing than does uncertainty quantification. A quantitative account of science is derived under this perspective that alleviates the need for epistemic bridge principles, solves the problem of ad hoc falsification criteria, and deals with verisimilitude by facilitating a general approach to process-level diagnostics. Our argument is that the well-known inconsistencies of both Bayesian and classical statistical hypothesis tests are due to the fact that probability theory is an insufficient logic of science. Information theory, as an extension of probability theory, is required to provide a complete logic on which to base quantitative theories of empirical learning. The organizing question in this case becomes not whether our theories or models are more or less true, or about how much uncertainty is associated with a particular model, but instead whether there is any information available from experimental data that might allow us to improve the model. This becomes a formal hypothesis test, provides a theory of model diagnostics, and suggests a new approach to building dynamical systems models.

Keywords: Information Theory, Philosophy of Science, Hypothesis Testing, Model Evaluation
1. Motivation and Overview

The ostensible objective of this paper is to outline a conceptual framework for testing and evaluating models in a way that facilitates reliable scientific learning. An important example of this is in the Earth and Environmental Sciences, where improved process understanding is critical to making informed decisions under nonstationary climatological and anthropogenic conditions. Put simply, complex systems models must produce accurate and reliable predictions because they are in some sense isomorphic with the systems that they represent, rather than because they are developed or calibrated to agree with observations.

An optimistic perspective might be that our understanding of how to evaluate complex systems models is in a pre-normal phase. There are effectively innumerable reports, published across all fields of science, that propose and/or apply varied and non-commensurate methods for model evaluation, benchmarking, intercomparison, etc. Oreskes & Belitz (2001) recognize a fundamentally subjective component to model evaluation. Related to the Environmental Sciences in particular, Beck et al. (2009) outlined in their NSF white paper a 'Grand Challenge' related to the need for "radically novel procedures and algorithms … to rectify the chronic, historical deficit of the past four decades in engaging complex models systematically and successfully with field data for the purposes of learning and discovery."

A more pessimistic view comes from recognizing that the problem of reconciling complex systems models with observations is fundamentally hard. The confirmation holism problem (Severo, 2012), which is an aspect of the Duhem-Quine thesis (Harding, 1976), proposes that scientists are only able to test models in toto, and that it is impossible to test individual scientific hypotheses.
Lenhard & Winsberg (2011) argued that a consequence of this is that we "are likely to continue to be unable to attribute the various sources of [climate model] successes and failures to their internal modeling assumptions." Leaving aside whether the situation is quite so hopeless, it is certainly true that we currently lack any fundamental theory of how to diagnose individual components of reductionist-type models of systems that are composed of many interacting components and processes.

We are optimists. The remainder of this essay will argue that meaningful and general solutions to the related problems of model evaluation and model diagnostics require rather fundamental adjustment to both the philosophy and practice of model-based science. Our argument is that many of the practical challenges that modelers face on a day-to-day basis are due to certain deeply rooted logical inconsistencies (e.g., Nearing et al., 2016b) in our standard suite of quantitative methods of inference. These inconsistencies turn out to be largely related to the fact that all of our statistical methods are built fundamentally around a concept of uncertainty (imprecise doxastic states) and that uncertainty is strictly non-quantifiable (i.e., all probability distributions are wrong). We propose that it is possible to develop a coherent method of science that does not require explicit treatment of uncertainty. Instead of using methods of inference that treat uncertainty (e.g., probabilistic or statistical hypothesis tests), we base our methods of inference on measures of information. In particular, we measure whether observation data contain information that may be used to improve a model.
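The proposed test — whether observation data contain information that could be used to improve a model — can be sketched numerically. The following is our own illustration, not an implementation from this paper: the quadratic system, the linear model, and the histogram-based mutual-information estimator are all placeholder assumptions. The idea is that if a model's residuals share mutual information with an observed variable, the data still contain information the model has not exploited.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Simple histogram estimate of I(X;Y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                           # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
forcing = rng.normal(size=5000)
truth = forcing + 0.5 * forcing**2           # hypothetical true system response
obs = truth + 0.1 * rng.normal(size=5000)    # observed response
model = forcing                              # linear model misses the quadratic term

residual = obs - model
mi = mutual_information(residual, forcing)
# mi is well above zero: the forcing still informs the residuals, so the
# data contain information that could be used to improve the model.
```

A perfect model would leave residuals carrying (up to estimator bias) no mutual information with any observable, which is the information-theoretic analogue of the hypothesis test the authors develop later.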
We do this by first noting that probability theory lacks the ability to describe a certain fundamental component of epistemic behavior during empirical inquiry, and therefore cannot be a complete logic of science. This deficiency is remedied with information theory. The basic insight, which we adopt from Jaynes (2003; p21), is that any logic of hypothesis testing acts on statements about what we know about states of affairs in the world (epistemic propositions), rather than on statements about states of affairs in the world (ontic propositions). Notice that this is exactly what occurs under all types of probabilistic inference – probabilities are epistemic statements, and these are the quantities that are manipulated during a hypothesis test or application of Bayes' theorem. Accordingly, if predictions derived from scientific hypotheses are statements about states of affairs (i.e., the model makes a prediction about what will happen), then any system of logic capable of supporting a scientific method must necessarily include a theory for relating such ontic propositions with epistemic propositions (i.e., what we learn from predictive models). This effectively means assigning probabilities to model predictions and also to observation data. We will refer to methods for assigning quantitative translations between ontic and epistemic propositions as epistemic bridges¹. Epistemic bridges are things like error functions, performance metrics, and/or likelihood functions. The challenge – as we will discuss in Section 2 and associated Appendices – is that ad hoc assignment of epistemic bridges can cause substantial errors during inference (e.g., Beven et al., 2008). This problem is resolved by noting that there exists a fully developed logical theory that relates ontic with epistemic propositions.
This theory is not a bridge principle – it is a necessary and unique consequence of the calculitic system of logic that supports and includes probability theory. This is described in accessible detail by Knuth (2004), and taking this fact seriously allows us to avoid the need for any ad hoc metrics or any choice of adequacy or falsification criteria. Section 3 discusses how improving the underlying logic of hypothesis testing allows us to resolve at least many of the hard practical problems related to hypothesis testing and model evaluation.

We offer no opinion on scientific realism; however, we do point out that any quantitative method for assigning degrees of belief about truth-values of models will necessarily contain contradiction, simply because all models are approximations. And so we see it as inevitable that any formal and quantitative account of inference will not treat models as truth-apt (Bayesianism, for example, does assign doxastic measures to models, but fails to do so reliably). Because we have given up any pretense of searching for truth-values of models, it is therefore necessary to somehow draw a distinction between models that are empirically justifiable and those that are somehow isomorphic with real-world systems. Section 4 applies the theory from Section 3 to this purpose.

In the first sentence of this essay we said that our ostensible objective is to outline a conceptual framework for evaluating models. However, there is no way to test any hypothesis except by first embedding it into one or more predictive models (Cartwright, 1983).

¹ The term bridge principle refers to any explicit connection between different bodies of theory. Here we use the term epistemic bridge to refer to a proposition that relates any ontological theory with any epistemological theory.
This means that any theory of model evaluation is actually a theory of science writ large, and any theory of science is a theory of model evaluation – there is no distinction between these two things. Thus, the real objective of this essay is to propose a theory of science. Specifically, while it is true that there are no criteria (e.g., falsifiability) sufficient to differentiate a hypothesis as either scientific or non-scientific (Laudan, 1983), we can demarcate combinations of models plus experiments as either scientific or non-scientific depending on whether the experiment is sufficient to indicate (either formally or informally) the information content of the hypothesis about the experimental data. The information content of a model is defined here as the fractional ability of model predictions to inform variability in experimental data.

2. Uncertainty

2.1. Epistemological Foundations: Logic and Probability Theory

Before we can undertake any meaningful investigation, scientific or otherwise, it is necessary to have a clearly defined logical environment. As Quine (1986; chapter 6) put it, changing our logic is akin to changing the question. Without a well-specified system of logic, we cannot ask a well-formed question and there is no hope of consistency. As alluded to above, our argument is that many of the scientific community's practical challenges related to model evaluation are actually due to primitive logical inconsistencies (Nearing et al., 2016b). Any scientific method or practice that we derive will depend completely on our logical primitives. We will use here what is unarguably the most common system of logic, based on the axioms that Russell (1912; chapter VII) – perhaps controversially (e.g., Bueno and Colyvan, 2004) – called self-evident.
In this context, all propositions are either true or false (excluded middle) but not both (non-contradiction), and although we often do not know the truth-values of a given set of propositions, we may have some ability to formulate consistent doxastic states. This system of logic, which we will hereafter refer to as the classical logic, leads to a standard propositional algebra²; however, this logic remains insufficient to describe any dynamic learning process because it does not contain a variable that is subject to change. To describe a dynamic learning process we need a calculus, and Cox (1946) laid the groundwork for a demonstration (Terenin and Draper, 2015) that the only scalar calculus that is consistent with the classical logic is probability theory, at least under certain relatively weak assumptions (Van Horn, 2003). For an accessible treatment of this see Jaynes (2003; chapter 1). While there certainly exist other epistemic calculi that do not derive from the classical logic (e.g., Kosko, 1990, Chang, 1958, Rescher, 1968), it is necessary that we have some logical calculus if we want to formalize any method of science, and for the rest of this essay we will work under the classical logic and its unique scalar extension into the probability calculus.

Like all formal systems of logic, the probability calculus describes relationships between propositions. Note two things: 1) probability distributions are always conditional on a priori information (Jaynes, 2003; p43) – at the very least a formal and explicit statement of a set of competing propositions – and 2) the fact that all probabilities are conditional must be explicit in any coherent formulation of scientific inference (Howson, 2000; p239). In other words, probability distributions are – fundamentally – expressions of the logical doxastic consequences of available information.
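Point (1) above — that probability distributions are conditional on an explicit statement of the set of competing propositions — can be illustrated with a minimal numerical sketch. The coin-flip setting and all numbers below are our own hypothetical example: the same data and the same likelihood for a given proposition yield different posteriors when the stated hypothesis space changes.

```python
import numpy as np

# Same coin-flip data, two different explicitly stated hypothesis spaces.
data = [1, 1, 1, 0, 1, 1]  # 5 heads, 1 tail

def posterior(biases, data):
    """Uniform prior over the listed competing propositions (coin biases)."""
    heads, n = sum(data), len(data)
    prior = np.full(len(biases), 1.0 / len(biases))
    like = np.array([b**heads * (1 - b)**(n - heads) for b in biases])
    post = prior * like
    return post / post.sum()

# Hypothesis space A: fair coin vs. a heads-biased coin
post_a = posterior([0.5, 0.8], data)
# Hypothesis space B: the same, plus a tails-biased alternative
post_b = posterior([0.5, 0.8, 0.2], data)

# P(bias = 0.5 | data) differs between the two spaces, even though the
# data, and the likelihood of that proposition, are identical in both.
```

This is the sense in which a probability is a doxastic consequence of all information conditioned on — including the proposition set itself — rather than a property of the world alone.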
It is also important to understand that although probability distributions are expressions of uncertainty, in that they describe relative doxastic states about the truth-values of various competing propositions, probability distributions cannot be related to actual uncertainty in any systematic way. This problem is well known (e.g., Knight, 1921, Taleb, 2010, Beven, 2016). The difference between having an expression of uncertainty and having an expression of uncertainty that can be related to real uncertainty is critical, and the remainder of Section 2 outlines why our inability to estimate uncertainty means that probability theory is insufficient to support a coherent account of science. Section 3 then describes how information theory fills the gap, and how this solves many of our day-to-day practical problems related to testing models.

² What we call the propositional algebra is more often called a propositional calculus. It is, however, not a calculus since it does not contain any dynamic variable. It is actually an algebra, and we will refer to it as such so as to differentiate between this and a truly calculitic logic like probability theory.

2.2. Types of Uncertainty that Arise in Model/Hypothesis Testing

To see why uncertainty is not a useful concept in science we must first understand what a model-based experiment looks like. An example of a common model-based experiment in the geosciences is illustrated in Figure 1. Here our model consists of various process representations denoted by 𝒽∗. These are Hempel & Oppenheim's (1948) explanantia, and are statements that describe the dynamics of both the system under study and the measurement devices used to collect experimental data.
The experiment also includes statements about various aspects of the system that are observable (at least in principle); these are denoted by random variables (e.g., u, θ, y), which we refer to as phenomenological³. Finally, the experiment includes data that contain information about some of these phenomenological components. Note that data, represented here as z∗, are actually ontic in nature; for example, these are often the binary states of some computer memory. Data must be related to phenomenology via explanantia that represent measurement processes, and it is important to remember that all measuring processes – probably including human perception and cognition (e.g., Tononi, 2011) – are just dynamic physical systems. It is only after this model building process – i.e., after assigning one or more phenomenological variables and measurement models – that data facilitate statements about states of affairs. Such statements are artifacts of models, not of data themselves. To put this another way, recognizing a fundamental separation between phenomenological model building vs. hypothesis-driven model building (e.g., Chalmers, 2013; chapter 1) seems to be an error. In other words, we take Quine's (1951) all-inclusive epistemic fabric quite seriously, except – as pointed out above and by Quine (1986) – to note that we must have some system of logic to even formulate a question.

Figure 1: A probabilistic science experiment involving a complex systems model. Process components (explanantia) are labeled 𝒽∗, phenomenological components – those that are observable – are labeled by random variables u, θ, y, and experimental data are labeled z∗.

The particular example of an experiment in Figure 1 recognizes three types of phenomenological components – model parameters θ, boundary conditions u, and system responses y.
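The experimental structure of Figure 1 can be made concrete with a small generative sketch. Everything below is our own illustration — the distributions, the quadratic process model, and the additive measurement models are hypothetical placeholders, not taken from the paper. It shows how data arise only at the end of a chain of process and measurement explanantia, so that what is usually called "observation error" lives in the measurement model, not in the data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Phenomenological components (observable in principle)
theta = 0.7                                 # model parameter
u = rng.normal(size=1000)                   # boundary conditions (forcings)

# Process explanantia h_y: system dynamics mapping (u, theta) -> y
y = theta * u + 0.3 * u**2                  # hypothetical process model

# Measurement explanantia h_z: map phenomenology to ontic data records
z_u = u + 0.05 * rng.normal(size=1000)      # imperfect measurement of forcings
z_y = y + 0.10 * rng.normal(size=1000)      # imperfect measurement of responses

# The scientist only ever sees (z_u, z_y); u and y are accessible
# solely through the measurement explanantia assumed above.
```

The point of the sketch is structural: the noise terms sit inside the measurement explanantia, which is why the argument below insists that what we typically call measurement error is really measurement-model error.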
More generally, any science experiment includes some experimental inputs (here θ and u) simply because by definition there are no accessible closed systems, and an experiment proceeds by developing and testing explanantia to predict responses of a perturbed system (perturbations may be natural or induced). We must keep track of the perturbations, in the form of information stored in data (here z_u and z_θ), and this information can only be interpreted through some measurement model that relates data with various phenomenological components. It is the act of interpreting data through a measurement model that admits something like observation error, which is actually a form of model error. Specifically, what we typically call measurement error is actually error in a measurement model. It is important that we aren't tempted to reify the idea that data themselves contain error. Data are just physical manifestations of portions of the universe, and the best the scientist can do is to accurately and precisely predict their interactions with experimental data using a descriptive (and predictive) model.

³ The term phenomenological is used to refer to things that are, in principle, observable. Whether or not an entity is in principle observable is completely an aspect of the model itself. We make no a priori (model independent) claims about observability; the theory itself that is embedded in the model tells us whether a particular entity of a particular model is at least potentially observable, either directly or indirectly.

An experiment contains two primary types of uncertainty: uncertainty related to data and uncertainty related to explanantia (the various components of the model).
Related to data, experimental perturbation data will generally not contain sufficient information to fully determine experimental response data. Variability in response data that cannot be explained by information (here variability) in perturbation data might come from real ontic (i.e., quantum⁴) randomness that manifests at relevant scales (Albrecht and Phillips, 2014), or simply from incomplete information about the actual perturbations to our system(s). Related to explanantia, uncertainty comes from the fact that we can never expect to perfectly characterize the dynamical processes in either the system under study or our measurement devices. In the context of any particular experiment, we might call the former type of uncertainty aleatory and the latter epistemic, where aleatory uncertainty refers to intrinsic randomness (here intrinsic relative to the experiment) and epistemic uncertainty is due to lack of perfect knowledge (here relative to the hypotheses, which are the explanatory objects) (Kiureghian and Ditlevsen, 2009).

Note that the probability distributions around all of the process components (i.e., 𝒽∗) in Figure 1 represent epistemic uncertainty, and it is by marginalizing over these distributions that we are able to obtain probability distributions over phenomenological components, and finally over response data z_y conditional on control data z_u and z_θ as:

ℳ(z_y | z_u, z_θ) = ∫ 𝒽_z(z_y | y) 𝒽_y(y | u, θ) 𝒽_u(u | z_u) 𝒽_θ(θ | z_θ) d𝒽_z d𝒽_y d𝒽_u d𝒽_θ dμ_{u,θ,y}.   [1]

These predictions about actual data, like p(z_y | z_u, z_θ), are the only things that a scientist can actually test. Further, aleatory uncertainty will manifest in p(z_y | z_u, z_θ) only if the various explanantia (i.e., the various 𝒽_z, 𝒽_y, 𝒽_u, and 𝒽_θ
) are themselves probabilistic, and this is fundamentally different from recognizing epistemic uncertainty in the choice of any specific set of explanantia, since we may have a correct or incorrect probability distribution conditional on experimental perturbations.

2.3. The Problem with Uncertainty

There are apparently two ways to test models based on their predictions: either by inter-comparing several models or by treating each hypothetical model individually. In reality, these approaches are not different, because any family of models that we might want to compare is itself the model that we end up testing. However, to facilitate a discussion of the two main approaches to testing models that currently exist, the following subsections draw a distinction between Bayesian (e.g., Howson and Urbach, 1989) and falsificationist (e.g., Mayo, 1996) types of inference. This distinction is described by Gelman and Shalizi (2013), who recognize that the statistics community sometimes incorrectly refers to the former as inductive and the latter as hypothetico-deductive. The latter authors argue that there are certain inconsistencies between these two normative accounts vs. the way that they are actually applied in practice. Their poignant and illuminating treatment of this discrepancy between theory and practice nevertheless does not solve any underlying problem – instead of building a coherent normative theory of inference, they offer a largely ad hoc account of using two imperfect methods in tandem to help mitigate the deficiencies of each. Our argument in the following is that these differences between theory and practice betray deep underlying logical inconsistencies in both of the existing normative theories (Bayesianism and falsificationism), and that such logical inconsistencies can be attacked directly by using a more complete system of logic.

2.3.1.
The Problem with Uncertainty for Bayesian Inference

Bayesian inference compares some number of competing models by first assigning over this family a probability mass function or probability density function, and then conditioning that probability distribution on observation data. That is, it is suggested that we may recognize competing versions of things like 𝒽_z, 𝒽_y, 𝒽_u, and 𝒽_θ as somehow representative of epistemic uncertainty, and that modifying our doxastic states distributed across these alternatives based on reconciliation with observation data will lead us to reliable inferences.

The problem with this approach is that it is impossible to guarantee that any probabilistic inferences will be correct, or even consistent in any meaningful sense. Bayesian posteriors converge almost certainly to a true model, but this can only occur if a true model is assigned finite probability by the prior. This would require that the various epistemic distributions in Figure 1 over competing explanantia like 𝒽_z, 𝒽_y, 𝒽_u, and 𝒽_θ be constructed such that they each place finite probability on at least one true process representation. This alone is obviously unrealistic in all conceivable situations; however, the problem is further complicated by the fact that any "true" explanantia must correctly represent aleatory uncertainty for each experiment that we put to it. So not only must we have a "true" process model, we also require a "true" statistical model of information missing from our experimental data for each and every set of experiments.

⁴ Quantum mechanics is the only known potential source of ontic randomness – all other randomness is epistemic (Jaynes, 2003; ch 10).
When these conditions are not met, probabilistic inference over families of models is not guaranteed to converge (Berk, 1966), and if inference does converge, it may not converge to models that give the best predictions (Oreskes and Belitz, 2001, Grünwald and Langford, 2007, Müller, 2013). Even if we ignore Hume's (1748) problem and suppose that there does exist some space-time stationarity in the physics of the universe, so that true models remain true, this does not mean that the best approximations over past data will be the best, or even reasonable, approximations over future data. For that, the error properties of our models would also have to be correctly characterized, which is indistinguishable from having a perfect model.

This is exactly the problem of verisimilitude (Popper, 2014), which we will discuss further in Section 4. There is no sense in which any individual proposition may be said to be more or less "truthlike", only that families of propositions (here models) contain as derivatives more or fewer true propositions. Popper proposed his theory of verisimilitude precisely to account for the difference between predictive adequacy and the "truthlikeness" of a model. The theory fails exactly because it only accounts for the nature of predictive statements that may be derived from a model. In simplest terms, the problem is that in a Bayesian framework we have no way to understand the relationship between our family of models and the true nature of either phenomenology or process in the system – we just search for the best among alternatives conditional on data. Gelman and Shalizi (2013) point out that the way scientists currently deal with this is by using a falsificationist type approach in conjunction with a Bayesian approach; to state it loosely, falsification is used for actual model testing and Bayesian methods are used for calibration.
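The claim that misspecifying one model component biases the probabilities placed on other components can be made concrete with a toy Bayesian calculation. This is our own illustrative setup — the 10% sensor gain error, the grid posterior, and all numbers are hypothetical — in the spirit of the kind of example attributed above to Beven (2008).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0, 1, n)
y = 2.0 * x                                # true process: slope 2
z = 1.1 * y + 0.1 * rng.normal(size=n)     # sensor has an unmodeled 10% gain error

# Bayesian inference over the process slope a, *assuming* a perfect
# (identity) measurement model: z = a * x + N(0, 0.1^2)
a_grid = np.linspace(1.0, 3.0, 2001)
loglik = np.array([(-0.5 * ((z - a * x) / 0.1) ** 2).sum() for a in a_grid])
post = np.exp(loglik - loglik.max())
post /= post.sum()

a_map = a_grid[post.argmax()]
# a_map lands near 2.2, not 2.0: the error in the measurement model is
# silently absorbed into the posterior over the process parameter. The
# posterior is sharp and confident, and wrong about the process.
```

Nothing inside the inference flags the problem: conditional on the stated (misspecified) measurement model, the posterior is a perfectly coherent doxastic consequence of the information supplied.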
The falsificationist approach is discussed in the next subsection (Section 2.3.2).

This issue causes real problems during inference. In a complex modeling chain, where epistemic uncertainty exists about a number of interacting model components, any misspecification of the distribution over one component will unavoidably bias the probabilities placed on other components. An example of this was given by Beven (2008), and that example is expanded in Appendix A of this article. In these examples, errors in the specification of measurement models like 𝒽_u and 𝒽_z result in errors in the probabilities assigned to process descriptions 𝒽_y. Because we cannot ever guarantee that any of our epistemic distributions meet criteria for reliable convergence to true models, we have at least the potential for inconsistency in any Bayesian inference problem. Given this, it is impossible to assign anything like reliable posterior probabilities over any particular family of hypotheses. The best that we can do is assign probabilities that are conditional on the other imperfect model components, which are themselves necessarily distributional and almost certainly misspecified. This is exactly the confirmation holism aspect of the Duhem-Quine thesis as it manifests in the context of Bayesian inference.

To summarize the problem, it is that we are asking about truth-values of families of models that we know are all approximations. There is a fundamental logical contradiction inherent to the Bayesian philosophy itself. Remember that probabilities are doxastic measures associated with the truth-values of a given set of propositions, and so this logical error exists in any type of experiment where we attempt to assign probabilities to models. There is a contradiction between (i) the fact that we generally don't have access to anything like a "true" model,
and (ii) the assignment of finite probabilities to models that we know with near certainty are incorrect. In fact, probabilistic inference methods assign doxastic states conditional on all information input to the problem, and given that any family of models is itself a model, there is no way to connect inferences over families of models with real epistemic uncertainty. To reiterate what we said in the introduction, we are not advocating a strict anti-realist perspective on models – models may be truth-apt (Cartwright, 1983) in a meaningful way – it's just that since all models that we might actually construct are false (i.e., models invariably represent reality in some simplified or incomplete manner), it might be worthwhile to recognize this formally in our method of science.

2.3.2. The Problem with Uncertainty for Statistical Hypothesis Testing

Consider next the case where we treat each model individually. Popper's (1959) motivation in developing a falsificationist perspective was the asymmetry between asserting truth vs. assessing falsehood. In particular, given Hume's problem (i.e., that we can never guarantee that any generalization of data will apply outside of the domain of observation), all we can do under the classical logic described in Section 2.1 is to reject models that disagree with those data that we actually have in hand. The general strategy of naïve falsification might therefore be thought of as rejecting models that conflict with data, and while this would indeed solve both the practical and logical inconsistency problems discussed in Section 2.3.1 – since it doesn't require that we express any degree of belief in the truth of any model – it does require that we have a logically consistent way to reject models.
However, if we took this purely deductive approach and restricted ourselves to falsifying models directly via modus tollens⁵, then essentially all models would be immediately falsified given only a small number of data. This is true even if we tested a perfect model, since the best a modeler can ever hope to do in the presence of aleatory uncertainty is to build a model that predicts with the accuracy and precision warranted by the (partial) information content of the experimental perturbation data. Somewhat ironically, while confirmation holism tells us that no hypotheses can be rejected, it is also the case that all models will be rejected given any data.

The following point is crucial: Bayesian methods cannot deal with epistemic uncertainty, in that they cannot tell us anything about differences between model families and truth, and deductive methods cannot deal with aleatory uncertainty. This means (as Popper was well aware) that we must therefore apply some probabilistic or possibilistic version of falsification, and the result is that in general we are unable to strictly falsify any model. There is no longer any a priori criterion for rejecting models, and as Neyman (1957) put it, rejecting a model is an act of will rather than an act of rationality, because the choice of rejection criteria is necessarily ad hoc. The problem is, however, worse than ad hoc rejection criteria – to have any criterion at all, ad hoc or otherwise, we must have some calculitic logic that allows us to develop some measure that may be associated with a given model and used as the quantitative basis for rejection. But now we are back where we started – we can't strictly falsify models, so instead we must associate with them some doxastic measure in the presence of incomplete experimental information.
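Both horns of this dilemma can be seen in a few lines of simulation. The setup below is our own hypothetical illustration: a "perfect" model of a noisy system is falsified essentially immediately under strict modus tollens, while any probabilistic rejection rule requires a threshold whose choice is ad hoc.

```python
import numpy as np

rng = np.random.default_rng(7)

# A "perfect" model of a noisy system: it knows the true mean exactly,
# but aleatory variability means observations never equal the prediction.
prediction = 5.0                             # the model's (correct) point prediction
obs = prediction + rng.normal(size=100)      # observations with irreducible noise

# Strict modus tollens: prediction != observation => model is false.
# Even this perfect model is "falsified" by essentially the first datum.
strictly_falsified = bool(np.any(obs != prediction))

# Probabilistic falsification instead needs a rejection threshold, and the
# threshold is ad hoc: the verdict on the same perfect model changes with it.
frac_loose = np.mean(np.abs(obs - prediction) > 1.0)    # roughly a third "fail"
frac_strict3 = np.mean(np.abs(obs - prediction) > 3.0)  # almost none "fail"
```

The deductive rule rejects everything; the probabilistic rule rejects whatever the analyst's threshold says to reject. Neither gives a principled, model-independent criterion, which is the point of the Neyman quotation above.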
That being said, it is clear that some models are useful, and so one might suggest that it matters little if the pedant (us) calls them false. We propose, however, that the practice of assigning degrees of belief to the Boolean truth-values associated with hypotheses or models that we know are incorrect is a primitive logical error that gives rise to essentially all of the practical problems we currently face related to evaluating the accuracy, precision, and reliability of complex systems models. Our point is that we should not use any method of inference that either implicitly or explicitly asks about the truth-value of any explanantia; instead we wish to employ a quantitative method that treats this non-realism strictly. We should be measuring degrees of isomorphism, not degrees of belief. In other words, our objective is a method of science that a) does not require (but may allow for) comparing competing explanantia, and b) does not assign degrees of probability or confidence to models. Certainly, we want this method to be fully coherent with probability theory, given Cox's theorem and that we will use the classical logic: we do not necessarily wish to deny non-contradiction, although we certainly leave open that possibility (e.g., Kosko, 1990).

Footnote 5: The modus tollens says that if proposition $P$ implies proposition $Q$ and $Q$ is false, then $P$ is false: $((P \rightarrow Q) \land \neg Q) \rightarrow \neg P$.

3. The Solution: An Information-Based Science

It is helpful to first notice that there is a significant component of empirical logic that is missing from probability theory. Let's see this by example. Suppose that we plan to conduct a series of experiments – to be concrete, we will roll two dice. Before observing the outcome of the roll, our epistemic framework behaves multiplicatively: there are 6 × 6 = 36 possible outcomes.
After observing the outcome, our epistemic framework behaves additively: we now require 1 + 1 = 2 pieces of information (the actual outcomes of each experiment) to capture everything we now know about the state of the dice. In general, doxastic states – the measures of belief we place on all explicitly stated possibilities – behave according to a product rule before observing experimental outcomes and according to a sum rule afterwards. Knuth (2005) showed that this logarithmic collapse of doxastic states – from multiplicative to additive behavior over multiple experiments – is unique and general in the context of a distributive and unitary calculitic logic like probability theory. Given that this logarithmic change in behavior effected by observation is not actually a part of probability theory, it is difficult to see why probability theory should be a sufficient logic for conducting any type of empirical investigation. Notice that this argument for the incompleteness of probability theory is different from some others. For example, Pearl (2001) claimed that "the bulk of human knowledge is organized around causal, not probabilistic relationships" and that probability theory lacks causality. Pearl wishes to require that epistemic theory be designed to accommodate a particular a priori ontological view of the world (that physical laws are causal). Of course, we might counter-argue, as Hume did, that indeed all we may know are propensities. Whereas Pearl requires scientific epistemology to accommodate his a priori ontology, our argument is instead that no matter how we construct theories and models, probabilistic logic itself is incomplete because it does not recognize a fundamental characteristic of belief itself – how the behavior of related belief states changes during experiment. To state this symbolically, two experiments result in data $z_1$ and $z_2$
and our epistemic state changes due to observation as:

$p(z_1, z_2) = p(z_2 | z_1)\, p(z_1) \;\rightarrow\; h(z_2 | z_1) + h(z_1) = h(z_1, z_2)$, [2]

where $h(\cdot)$ is some measure of the information necessary to describe the outcome of our experiment. Given that this is a fundamental and necessary consequence of the epistemic environment described in Section 2.1, the question is how to account for it in practice.

3.1. An Information-Based Scientific Method

Let's start by asking what is probably the most straightforward question a scientist can ask in the context of any particular experiment: Is there any information in my experimental data that could be used to improve my model? If the answer is yes, then we reject the model in a strict sense, because we know that this model has the potential to be improved given only currently available experimental data. If the answer is no, then this means that we have not discovered any potential to mitigate deficiencies in the model. Figure 2 illustrates a scientific method structured around this question. Again we have some experimental perturbation data and some experimental response data, and again these are not immediately treated as representative of anything in particular (data are ontic states of affairs rather than propositions about states of affairs). The perturbation and response data are actually related according to the physics of whatever system we are trying to model, and thus variability in the perturbation data at least partially informs variability in the response data. Since the objective of science is apparently to understand process relationships (Davies, 1998), ideally we would measure the information about the relationship between experimental perturbations and responses that is actually contained in those data, and then compare this to the information about this same relationship as it is described by our model.
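The dice example and the collapse in Equation [2] can be checked numerically. A minimal sketch (our illustration), using two independent fair dice so that the conditional term $h(z_2 | z_1)$ reduces to the marginal $h(z_2)$:

```python
import math

# Two fair dice: before observation, beliefs combine multiplicatively;
# the negative logarithm of probability combines additively, which is the
# product-to-sum collapse of Equation [2] for independent experiments.
p_one = 1 / 6            # probability of any single die outcome
p_joint = p_one * p_one  # product rule: 36 equally likely joint outcomes

h_one = -math.log2(p_one)      # information needed to describe one outcome (bits)
h_joint = -math.log2(p_joint)  # information needed for the joint outcome

# Sum rule after the logarithm: h(z1, z2) = h(z1) + h(z2) for independent dice.
print(abs(h_joint - (h_one + h_one)) < 1e-12)  # True
```

For dependent experiments the same identity holds with $h(z_2 | z_1)$ in place of the second marginal term, exactly as written in Equation [2].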
We typically cannot do this directly, and so we do what is called science: we use tests of model predictions to reflect on the viability of model explanantia. We consider the act of using predictions to test hypotheses an essential component of any definition of science. In this case, we want to compare information about process relationships contained in data vs. models, but will instead measure information about experimental response data that is available purely from relationships mined from data vs. information about experimental response data available from the hypothetical model that we want to test. Specifically, we will measure the information about experimental response data ($z_o$) contained in experimental perturbation data ($z_u$) as some quantity $I(z_o; z_u)$ (for simplicity we will only notate $z_u$, but the conditioning data could include any experimental perturbation data, including past response data), and the information about experimental response data contained in predictions made by model $m: z_u \rightarrow z_m$, which maps $z_u$ onto a prediction $z_m$, as some quantity $I(z_o; z_m)$. If we use a self-equitable measure to quantify information (Kinney and Atwal, 2014), then information from the model is always bounded by the actual information content of the input data by the Data Processing Inequality (Gong et al., 2013):

$I(z_o; z_u) \geq I(z_o; z_m)$. [3]

Supposing that we could estimate these two quantities, $I(z_o; z_u)$ and $I(z_o; z_m)$, the sign of their difference is sufficient to know whether information from the model is less than information from the data. Again, a "yes" answer indicates potential to improve the model without collecting any new data.

Figure 2: An information-based science experiment. Experimental perturbation data partially informs experimental response data according to the physics of the real system, including all measurement devices.
The model attempts to emulate this relationship, and, by the data processing inequality, model outputs contain no more information about experimental response data than is contained in experimental perturbation data. If the model provides less information than we are able to extract empirically from experimental perturbation/response data pairs, then the model is improvable given only currently available data.

The challenge is to measure the information content of experimental perturbation data, $I(z_o; z_u)$, which requires that we know exactly the joint distribution between control/response data pairs. We obviously do not know this relationship exactly, because if we did then we would not need to test any further models. Instead we return to the actual desideratum, which is to test relationships rather than predictions, and ask simply whether the model provides as much information about the relationship between experimental perturbation and response data as we are actually able to extract by building a relationship purely from that data itself (Nearing and Gupta, 2015). Although we cannot estimate $I(z_o; z_u)$ directly, we can bound it conservatively. For example, the current authors have approached this using nonparametric regression (e.g., Nearing et al., 2016a, Nearing and Gupta, 2015). Given an arbitrary empirically-derived mapping $r: z_u \rightarrow z_o$, then by another application of the data processing inequality we have $I(z_o; z_u) \geq I(z_o; r(z_u))$, such that the information missing from model $m$ is bounded:

$\mathcal{E} = I(z_o; z_u) - I(z_o; z_m)$; [4.1]
$\hat{\mathcal{E}} = I(z_o; r(z_u)) - I(z_o; z_m)$; [4.2]
$\mathcal{E} \geq \hat{\mathcal{E}}$. [4.3]

If $r$ has known convergence properties over large function classes (e.g., Hornik, 1991), then in principle, and as long as the metric $I$ itself admits a convergent estimator, we have a bounded and convergent estimate of the information content of the experimental data.
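The bound in Equations [4.1]–[4.3] can be made concrete with a toy discrete example (entirely hypothetical data; a simple plug-in mutual information estimator, with an identity map standing in for the nonparametric regression rather than the estimators used in the cited papers). A deliberately lossy model leaves a positive estimated information gap:

```python
import random
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

random.seed(0)
# Hypothetical experiment: response z_o is the perturbation z_u plus coarse noise.
z_u = [random.randint(0, 9) for _ in range(20000)]
z_o = [u + random.choice([-1, 0, 1]) for u in z_u]

z_m = [u // 5 for u in z_u]  # a deliberately lossy "model" of the response
r_u = list(z_u)              # stand-in for the empirical regression: identity map

# Estimated information missing from the model (the quantity in Equation [4.2]):
E_hat = mutual_information(z_o, r_u) - mutual_information(z_o, z_m)
print(E_hat > 0)  # the lossy model leaves recoverable information on the table
```

The sign test is the whole inference: a positive gap means the model is improvable using only data already in hand, with no statement of belief in the model's truth required.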
In practice, of course, we must always approximate by either optimizing or integrating over regression hyperparameters, and so our results are not actually convergent; but unlike purely probabilistic methods, we have a logically consistent theory that we can at least approximate – and we are not in principle precluded from accomplishing our objectives, as we were when conducting an experiment like that in Figure 1. Not only that, this method is robust (i.e., bounded) to approximation in all cases except overfitting, which can be mitigated at least empirically by partitioning the experimental data into calibration and evaluation records when calculating $I(z_o; r(z_u))$. It is worth pointing out that we really want to have a theoretical convergence proof for whatever strategy we use to estimate $I(z_o; z_u)$ from data. Ideally this would include theoretical convergence rates as well as an expression for the asymptotic distribution over the statistical estimator $I(z_o; r(z_u))$ in the limit of an increasing number of data, and perhaps even a series expansion that yields ordered bounds. However, such theoretical demonstration may or may not be possible for any particular estimator, and we will leave this as an open challenge. Again, however, the point is that the basic theory is consistent and approximable even without a demonstration of asymptotic behavior. To summarize, a meaningful model evaluation experiment might seek to establish whether the model contains (at least) as much information about the relationship between experimental perturbation/response data as do the data themselves. Because we cannot directly measure information about relationships, we instead measure information about experimental response data that is contained in experimental perturbation data, and compare that with the information contained in model predictions.
Of course, we cannot measure the information contained in experimental perturbation data without a model of the relationships underlying those data, and so our objective is to develop an empirical relationship between experimental perturbation data and experimental response data that is completely theory-free, or at least based solely on logical theory rather than ontological theory, and which seeks to extract as much information about the perturbation/response relationship as possible from the experimental data. We then measure the information about response data provided by whatever relationship we are able to extract purely from data vs. from our hypothetical model, and this conservatively bounds the information missing from the hypothetical model.

3.2. Measures of Information for Hypothesis Testing

In Section 3.1 we only required that information be quantified by a self-equitable metric so that we can employ the data processing inequality (Kinney and Atwal, 2014). This is, however, insufficient for logical consistency, since we must account for Knuth's (2005) description of the logarithmic relationship between information and probability (Equation [2]). This constraint is helpful because it reduces the apparent freedom of choice about what information metric to use. Under probability theory, everything that we know about any potential data $z_o$ is described by some probability distribution $p(z_o)$ – this distribution is an expression of available information before applying any explanatory or predictive hypotheses or conditioning on any exogenous data, like model inputs. Here we don't care about whether a true outcome is assigned finite probability; we only care that $p(z_o)$ represents the distribution of our actual experimental response data $z_o$ in the absence of any other data or any hypotheses – this is a frequency distribution. Next we obtain some data, say $z_u$
, and appropriately condition $p(z_o)$ to obtain $p(z_o | z_u)$. The resulting reduction in the new information that we would need to identify any particular occurrence of $z_o$ is, by Equation [2], the difference $h(z_o) - h(z_o | z_u)$. We can compare this difference with the reduction in new information needed to fully determine $z_o$ if, instead of conditioning on $z_u$, we were to condition on model predictions $z_m$: $h(z_o) - h(z_o | z_m)$. The information that is available in model inputs but missing from model predictions is $h(z_o | z_m) - h(z_o | z_u)$, and, according to the data processing inequality, this quantity is always non-negative as long as the conditional and shared information measures behave additively as $I(z_o; z_u) = h(z_o) - h(z_o | z_u)$. This is precisely the relationship that is satisfied by the standard Shannon-type entropy and mutual information metrics (Cover and Thomas, 1991). It is useful to develop an epistemological perspective on the uniqueness of Shannon's entropy and mutual information measures, but first let's define them. Entropy is the expected amount of information about a random variable that can be gained by observing that variable, and is defined as:

$h(z_o) = -\int p(z_o = \zeta) \ln p(z_o = \zeta)\, d\zeta$. [5.1]

So, entropy is (in some sense) a measure of the variability in a probability distribution over a random variable (footnote 6). Following from this, mutual information is the expected reduction in entropy of one random variable that is caused by conditioning on a correlated variable; e.g.:

$I(z_o; z_u) = h(z_o) - h(z_o | z_u)$, [5.2]

$I(z_o; z_u) = \int\!\!\int p(z_o = \zeta, z_u = \nu) \ln \frac{p(z_o = \zeta | z_u = \nu)}{p(z_o = \zeta)}\, d\zeta\, d\nu$. [5.3]

So, mutual information is the expected amount of information about one variable that is contained in a realization of a correlated variable, and $I(z_o; z_u)$ specifically measures the expected reduction in the amount of entropy that would be achieved by directly observing $z_o$
depending on whether we had first conditioned on $z_u$. The quantity $I(z_o; z_m)$ is similar, except that here we are concerned with model predictions, such that the model predictions are independent of experimental response data conditional on experimental perturbation data. Notice that $I(\cdot)$ is a special case of integration over a transformed ratio of a conditional distribution to its marginal:

$I(z_o; z_u) = E\left[ f\!\left( \frac{p(z_o | z_u)}{p(z_o)} \right) \right]$. [6]

There are three important component attributes of this statistic. The first is the probability ratio itself, which is simply the expression of the relationship between marginal and conditional knowledge about $z_o$ given $z_u$. The second important aspect is the integration itself. In general the probability ratio is high-dimensional (infinite-dimensional in the case of continuous random variables), and the integration collapses such distributions to a metric. The integration is purely for tractability, and thus we may want to integrate several different statistics to understand different aspects of the probability ratio. The third important aspect of this statistic is the log-transform. While the conversion from a distribution to a metric is general as long as we allow for any transform of the probability ratio inside of our integration, we will specifically obtain a self-equitable metric as long as our transform is convex (Csiszár, 1972). Here, however, we must use Shannon's logarithm, $f(u) = -u \ln u$, to be consistent with Equation [2]. Further, it is important to stress that the above information metrics are not bridge principles. Bridge principles are tools used to join together different bodies of theory (Carnielli and Coniglio, 2007), and typically, model evaluation metrics are treated as epistemic bridges that relate the model's ontological predictions with appropriate epistemic consequences (i.e., probability distributions).
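For discrete samples, the metrics in Equation [5] reduce to simple plug-in sums. A minimal sketch (our illustration, on hypothetical data) verifying that the double-sum form of Equation [5.3] agrees with the entropy-difference form of Equation [5.2]:

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Discrete plug-in version of Equation [5.1], in bits."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def conditional_entropy(xs, ys):
    """h(X | Y) via the chain rule: h(X, Y) - h(Y)."""
    return entropy(list(zip(xs, ys))) - entropy(ys)

def mutual_information(xs, ys):
    """Discrete plug-in version of Equation [5.3], in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Toy data: the response is determined by the parity of the perturbation.
z_u = [0, 0, 1, 1, 2, 2, 3, 3]
z_o = [u % 2 for u in z_u]

lhs = mutual_information(z_o, z_u)                      # Equation [5.3]
rhs = entropy(z_o) - conditional_entropy(z_o, z_u)      # Equation [5.2]
print(abs(lhs - rhs) < 1e-12)  # True: the two forms coincide
```

The agreement is algebraic, not statistical: both forms are computed from the same empirical joint distribution, which is the additivity property the text requires of any admissible information measure.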
In contrast, the above information metrics derive uniquely from the same system of logic that supports probability theory in the first place. That is, if our epistemic theory admits probabilities, then we do not need any bridge principles to quantify model performance; we are forced to use information theory. Moreover, by using this theory, we get the scientific method outlined in Figure 2, which avoids essentially all of the pitfalls discussed in Section 2. It is interesting to notice that linearized epistemic bridges are very common (e.g., Tian et al., 2016, Taylor, 2001, Gupta et al., 2009), and these are chosen apparently for mathematical convenience. What we have argued above is that there is a real (not arbitrary) linearity in our epistemological theory that is exploited to result in additive measures of model performance. This linearity does not result from convenience, but rather results necessarily as a fundamental property of distributive logic.

Footnote 6: We won't pretend that statistical entropy is in any way related to uncertainty, even though we have used that language in previous publications. The reason that this is incorrect is again that distributions like $p(z^*)$ that are integrated to get quantities like $h(\cdot)$ suffer from the usual degeneracies, and therefore have no systematic relationship with uncertainty. We simply drop the concept of uncertainty altogether – it is not useful.

While Nearing & Gupta (2015) showed that the standard linearized evaluation metrics (mean-squared error, Pearson product-moment correlation coefficient, additive mean bias) are special cases of the integration in Equation [6] that come from assigning specific integrating functions $f(\cdot)$
and specific parametric forms of the conditional and marginal probabilities, the only integrating function that makes Equation [6] coherent with Equation [2] is Shannon's: $f(u) = -u \ln(u)$. In particular, the metrics in Equation [5] are apparently unique under the following choices:

• First, the choice to use a two-valued logic that includes non-contradiction, so that propositions are either true or false but not both. An example of an alternative to this is Fuzzy Logic (Kosko, 1990).

• Second, the choice to represent our dynamic beliefs about various propositions using a scalar belief metric (Van Horn, 2003). Then by Cox's theorem we have probability theory, and by Knuth's theorem probabilities collapse logarithmically over repeated experiments.

• Third, the choice to work with metrics rather than distributions. The integrations that turn distributions into statistics result in Shannon's information theory.

This does not, however, mean that there are no choices left to make. The fact that we are quantifying information by taking an expected value (i.e., an integration over the distributions) is important. Any statistical procedure requires some representative sample, and so we must integrate our statistics over some finite number of experimental trials – and the choice of integration domain matters. That is, our results will depend on which experimental data we consider. Ideally we would consider all existing experimental data that are in any way relevant to our hypotheses, but in practice this is obviously impossible. Again, however, the point is that we can make reliable (bounded) hypothetico-deductive inferences in the context of whatever experimental data we do decide to use. Moreover, we can actually use differences between integration domains to quantify process stationarity (e.g., Nearing et al., in prep).
Another important choice is about which aspect(s) of our experimental data we want to investigate. There is apparently nothing that prevents us from manipulating data before calculating information statistics. For example, we may be concerned only with extreme events, and we may therefore assign a phenomenological variable $y$ to represent some aspect of our definition of "extreme". Or we may care about time-series matching, in which case $y$ might represent some aspect(s) of a power spectrum. In all cases, our model will predict the particular aspect of the experimental data that we want to test, and we may also actually manipulate the experimental data themselves (e.g., calculate their power spectrum). Notice that manipulating experimental results does not actually constitute applying a data model or measurement model – it is simply physical manipulation of data via some information-processing device like a computer, and we will simply test the extent to which our holistic model accounts for whatever eventual data we put it up against. The result of this is apparently a theory of hypothesis testing that solves two basic problems: 1) we can test models that do not inherently admit probabilistic predictions (thereby avoiding the degeneracy problem related to separating aleatory uncertainty), and 2) we do not require any ad hoc epistemic bridges or ad hoc falsification criteria. Our falsification criteria here are never perfect (obtained through high-dimensional nonparametric regression), but if calculated reliably, they always provide a lower bound on the available information, and therefore, at least in theory, we can test models without any chance of Type I error. A toy application of this theory is given in Appendix B. Note that this example extends the inconsistency example in Appendix A.

4. Right Answers for the Right Reasons
In Section 3.1, we made a distinction between testing models based on their ability to inform process relationships vs. their ability to make informative predictions. The risk in doing the latter is that we may build models that are informative over particular data sets for the wrong reasons – either by chance or by kludging. Kludging happens when informative predictions are derived from model behavior that is not isomorphic with real system dynamics (Clark, 1987). Of course, the potential for kludging is not unique to the theory we've proposed. For example, Popper (2014) dealt with the fact that our best scientific theories and models are often strictly false by defining a concept of verisimilitude, or approximation of truth. According to Popper, one theory is a better approximation of truth than another if it entails more true (presumably phenomenological) statements. This fails to solve the problem, however, because testing the truth of entailed propositions is exactly the definition of testing a model's ability to make true predictions. As far as we can tell, the problem of assessing truthlikeness or verisimilitude is intractable. Instead, we might approach the problem from a reductionist perspective. Under the assumption that there does exist regularity in the universe, and that it is the objective of a scientist to discover this regularity, we not only want informative models, we want models that are reliably informative because they generate predictions using processes that are isomorphic with whatever dynamic aspects of the system are actually process-stationary. This is what the reductionist means when we say that we want right answers for the right reasons, and it is the only feasible concept of reliability that the current authors can imagine. So can we at least measure process isomorphism in physical systems?
Again, we can't measure information about individual process relationships embedded in a model, but we can measure information transferred through individual process relationships. This is not unlike measuring information provided by model predictions, except that here we don't have anything like experimental perturbation data, since in any complex model with many interacting process relationships the inputs to each process component are themselves modeled variables. We cannot simply dissect a complex systems model and run each component separately, because the model must include interactions between all of the various hypothetical process relationships. Although environmental models often do not contain explicit field equations, the governing conservation equations will generally include many directly or indirectly interacting effects. So we will not ask whether particular process relationships embedded in a model individually produce informative predictions; instead we will treat models as networks of interacting processes. Perhaps those inclined to Pearl's perspective on causality might call these causal networks, and recognize that they cannot be evaluated in the same way as holistic models because of the lack of experimental perturbation data on the model side.

Figure 3: A process network representing a typical ecohydrology model. Nodes represent phenomenological variables and edges represent components of explanantia.

More generally, because we are using probability theory we will treat process networks as so-called Bayesian networks. An example of a Bayesian network for a typical ecohydrology model is given in Figure 3. Each node in this network represents a phenomenological variable with some spatiotemporal extent, and we represent variability (not uncertainty) in each node using probability distributions that are conditional on all variables upstream in the network.
The edges in the network represent explanantia that manifest as probabilistic conditioning relationships. Again, these probabilities do not represent uncertainty but rather partial informativeness – the conditioning variable informs observed or modeled variability in the conditioned variable.

(Figure 3 legend: P = precipitation; Rn = net radiation; SM = soil moisture; An = carbon assimilation; LAI = leaf area index; gs = stomatal resistance; NEE = net ecosystem exchange; Qh = sensible heat flux; Qe = latent heat flux.)

Even if all equations in the model are deterministic, each variable is still probabilistic conditional on only a subset of whatever other variables determine its value at any point during the simulation. Our objective is to quantify the influence that each variable has on all others in the real world, and then to see whether the model reliably simulates these partially informative relationships. To quantify the influence that one variable, say $y_i$, has on another variable, say $y_j$, in a dynamic (time-evolving) Markovian (i.e., local or causal) system, we would measure the expected effect of conditioning $y_j$ at time $t+s$ on the value of $y_i$ at time $t$, given that all of the variables in the model other than $y_i$ (notate these as $y_{\sim i}$) also had particular values at time $t$. We therefore measure the influence of $y_i$ on $y_j$ in a Markov system over some time lag $s$ as:

$I(y_j^{t+s}; y_i^t \,|\, y_{\sim i}^t) = \int p(y_{\sim i}^t) \int\!\!\int p(y_j^{t+s}, y_i^t \,|\, y_{\sim i}^t) \ln \frac{p(y_j^{t+s} \,|\, y_i^t, y_{\sim i}^t)}{p(y_j^{t+s} \,|\, y_{\sim i}^t)}\, dy_j^{t+s}\, dy_i^t\, dy_{\sim i}^t$. [7.1]

This metric typically has to be approximated due to a problem of dimensionality ($y_{\sim i}$ is high-dimensional), and the most common approximation is called transfer entropy (Schreiber, 2000):

$I(y_j^{t+s}; y_i^t \,|\, y_j^t) = \int p(y_j^t) \int\!\!\int p(y_j^{t+s}, y_i^t \,|\, y_j^t) \ln \frac{p(y_j^{t+s} \,|\, y_i^t, y_j^t)}{p(y_j^{t+s} \,|\, y_j^t)}\, dy_j^{t+s}\, dy_i^t\, dy_j^t$.
[7.2]

Transfer entropy relies on a strong Markovian approximation whereby the time history of the whole model, $y_{\sim i}^t$, is substituted by the time history of the target variable, $y_j^t$. The result is that this metric only ever requires integration over a 3-dimensional probability distribution, and is therefore generally feasible to estimate given a realistic number of data. Equations [7] calculate the strength of directed interactions between each pair of phenomenological variables in our model at any spatiotemporal scale (if desired, we can substitute the time indexes for spatial indexes). If we had complete observations of all phenomenological variables, we could construct similar process networks from data and compare the strengths of interactions derived from data with the hypothetical strengths of interactions from our model. The model may either over-estimate or under-estimate the informativeness of any one variable about any other, and both over-estimation and under-estimation are in this case undesirable. Of course, we must deal with the very real problem that we will never be able to directly observe every phenomenological component of our modeled system, and therefore cannot expect to actually construct Bayesian networks or calculate all necessary transfer entropy metrics directly from observations. However, we can do this with whatever (partial) observations we do actually have available, and again the point is that we have a coherent and general theory that we can approximate in practice. Further, we propose that we can even take advantage of partial observations by using data assimilation-based system identification, which is the application of Bayes' theorem to condition the internal states of a model on information in whatever observations we do actually have available (e.g., Bulygina and Gupta, 2011, Wilkinson et al., 2011).
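A plug-in version of the transfer entropy in Equation [7.2] is straightforward for discrete time series. The following sketch (a toy coupled system of our own construction, using first-order conditioning rather than full time histories) recovers the expected directional asymmetry of information flow:

```python
import random
from collections import Counter
from math import log2

def entropy(xs):
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def cond_entropy(xs, ys):
    # h(X | Y) = h(X, Y) - h(Y)
    return entropy(list(zip(xs, ys))) - entropy(ys)

def transfer_entropy(source, target, lag=1):
    """Plug-in estimate of Equation [7.2]: information the source's present
    carries about the target's future beyond the target's own present."""
    fut = target[lag:]
    tgt = target[:-lag]
    src = source[:-lag]
    return cond_entropy(fut, tgt) - cond_entropy(fut, list(zip(tgt, src)))

random.seed(1)
y_i = [random.randint(0, 1) for _ in range(5000)]
# y_j copies y_i with a one-step delay plus occasional bit flips,
# so information flows from i to j but not from j to i.
y_j = [0] + [b if random.random() < 0.9 else 1 - b for b in y_i[:-1]]

print(transfer_entropy(y_i, y_j) > transfer_entropy(y_j, y_i))  # True
```

Comparing the two directions is the point of the diagnostic: a model's process network should reproduce not just the magnitudes but the directed asymmetries of these transfers.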
The idea is to measure how model phase-space trajectories change (Wikle and Berliner, 2007) when conditioned on information in observation data, and how this implies changes to process-level information transfers within the model. A simple example of this theory is given in Appendix C, while Drewery et al. (in prep) give several more sophisticated examples. We will not go into further detail here, except to say that this type of Bayesian system identification does not suffer from the problem of inconsistency under degeneracy, because we are simply asking how observations project onto the phase trajectories of whatever model we have proposed. We are not using data assimilation to search for a "true" phase trajectory or "true" Bayesian information transfer network; we are using data assimilation to ask how much our observations can inform changes to modeled phase trajectories and modeled process-level information transfer. It is worth pointing out that this diagnostic approach (with or without data assimilation) does not tell us whether we are missing any important processes in the model. However, it does help us understand whether whatever processes we have hypothesized and modeled interact in ways that are apparently isomorphic with observed interactions. This is not a formal approach to gaining knowledge, only an approach to gaining diagnostic insight into the structure and functioning of complex models.

5. Discussion: Science Arbitration and Predicting with Limited Information

In the preceding sections we argued for a particular perspective on evaluating models for the purpose of scientific learning. However, another purpose of building science-informed models is to make predictions that contribute to societal management and decision strategies.
This is called science arbitration (Pielke, 2007), and it might seem inevitable that successful and comprehensive model-based arbitration requires communicating predictive uncertainty to decision makers. We argue that this is a somewhat misleading prescription. Instead, what the scientist actually wants to do is to communicate to decision-makers predictions that represent the scientist's best available information. Such predictions will necessarily be distributional, but will not be representative of uncertainty in any meaningful sense. The scientist should also communicate all choices that were made during model development and evaluation, for example by using formal decision trees (Beven, 2016). So how do we predict with our best available information? The first thing that we must do is to ensure that we do not pretend to have more information than we actually have. We should not over-constrain our models. The practice of producing deterministic predictions and then appending to these "uncertainty distributions" or "error distributions" is very strange indeed. This practice essentially boils down to over-estimating our available information in the first place and then adding more information about that overestimation. Instead, a better organizing question seems to be about how we might construct models that do not require that we pretend to have more information than we actually have from theories, hypotheses, and data. Methodologically, this perspective implies a very different approach to model development. In particular, we suspect that this perspective will lead to a strategy for constructing models of complex systems that does not use differential equations to represent conservation laws.
Instead, we expect that in the not-too-distant future the general approach to constructing models of conservative macro-scale dynamical systems will involve imposing conservation laws, expressed as symmetry constraints via Noether's theorem, on maximum entropy distributions. The motivation for using maximum entropy distributions is that the basic project when developing a model under a calculitic epistemic theory is to construct a joint distribution between all modeled variables over the entire (perhaps spatiotemporal) domain of the simulation. In the case where we know absolutely nothing about the system, this distribution will be non-informative. Any theory, hypothesis, or data that we have or want to test will provide information about the behavior of the system, and should be used to constrain this joint distribution. In a discrete model (e.g., like what a finite element or finite difference approach produces), this joint distribution would be a directed Bayesian network. The structure of the creative project in science is then to develop hypotheses about dynamic systems that act directly as epistemic constraints on Bayesian networks, instead of as flux or source/sink terms in governing or field equations.

6. Conclusion

To reduce the situation to a vague analogy, quantifying uncertainty is very much like trying to measure cold or dark: uncertainty is the absence of knowledge, and it seems natural that we might instead develop theories around the things that we actually have access to: heat, light, or information. The constructive question, which we have started to answer here, is about how we accomplish our scientific inference and arbitration objectives in this context. After writing this paper, we struggle somewhat to understand why uncertainty quantification has enjoyed such a central place in both the philosophy and practice of science.
The idea of assigning truth-values to hypotheses or models, and trying to assess our limited ability to do so, seems a somewhat contrived or fantastical objective. It seems much more natural to try to understand and quantify what information we have available from data about our models. In general, information-based approaches to inference, arbitration, and communication seem much more coherent, straightforward, and intuitive than uncertainty-based approaches. The proposal is to avoid associating doxastic states directly with things that we have any fundamental uncertainty about, like hypotheses, models, or future data. Instead, we recognize that probabilities are expressions of partial information rather than expressions of partial uncertainty, and we use probabilities simply to quantify partially informative relationships between ontic states (i.e., actually-existing hypothetical and experimental data). It is essentially indisputable in the modern era that probability theory is a system of logic, and the theory we propose is absolutely built around this fact. However, our theory only ever applies probabilities to represent counting distributions. The reason for this is that under empiricism we gain information by actual experience, and so when we account for the doxastic effects of experiment we are fundamentally counting. More specifically, we do assign doxastic measures, but only to phenomenological entities and not to theoretical entities, and then only based on empirically available information. By refusing to explicitly treat explanantia as truth-apt (although leaving open the possibility for any type of realism the scientist might prefer), we are taking an approach that is quite similar to the old frequentist interpretation of probability theory.
This is perhaps not surprising, since frequentism is a special case of the Cox-Jaynes interpretation of probabilities as expressions of the rational doxastic consequences of partial information (Terenin and Draper, 2015). Although Howson (2000) did not call on Cox or Jaynes directly, that same mature theory of probability as epistemic logic is what he was referring to when he said: "The 300-year-old programme for an inductive logic based on formal probability has arrived finally at maturity ... now, for the first time in its long history, it can display its own explanatory credentials as an authentic species of logic, kindred to deductive logic." The development of this theory is almost certainly one of humanity's greatest achievements; however, it is not a complete description of empirical inquiry without Knuth's (2005) demonstration of the relationship between the logic of questions (probability theory) and the logic of answers (information theory). Only now, in the beginning of the 21st century, are we entering an era where we have access to what seems to be a complete logic of inquiry, and although the problem of deriving non-ad hoc hypothesis tests is quite old, it does seem that the time is ripe for progress. To restate our central claim, the only question that a scientist can coherently ask and reliably answer is: "Does my model capture all of the information that is present in my experimental data?"

Appendix A: An Example of Bayesian Inconsistency

The purpose of this toy example is to show that probabilistic inference is inconsistent under degeneracy. In particular, incorrect phenomenological distributions result in not only incorrect, but actually contradictory (under different assumptions), inferences over ensembles. We draw inspiration for this example from Beven et al.
(2008), who showed that incorrect likelihood functions result in incorrect parameter estimates; their likelihood function is effectively an approximation of the result of integrating Equation [1], and if we get this approximation wrong (in their example, by using an incorrect phenomenological distribution over 𝑦 appended to each parameterized model), then we cannot expect to obtain either correct or consistent parameter inferences. This is exactly an example of the Bayesian reliability problem that we are discussing here, and we demonstrate this same effect on assigning probabilities to competing models after marginalizing over parameters. Later, in Appendix B, we extrapolate this example to demonstrate that information theory allows us to obtain reliable measures of information missing from experimental data and from model predictions, and to do this we use a synthetic experiment where the correct answer is known exactly. The synthetic inference problem was constructed using daily precipitation (in [mm]) and daily potential evaporation (in [mm]) data from the Leaf River catchment in Mississippi, USA to simulate a 1000-day streamflow record (also in [mm]) with the HyMod (Boyle, 2000) model that included three slow-flow tanks. The output time series from this simulation was taken as synthetic 'truth' data, and the record was split into a 500-day warm-up period and a 500-day observation period. The model outflow fraction parameters were sampled uniformly over [0,1]; the soil moisture storage parameter was sampled uniformly over [0,1000] mm, and the infiltration exponent uniformly over [0,10]. The ensemble ℳ consisted of two competing model structures: a three-bucket Nash cascade and the abc-hydrology model, both run at a daily timestep. In both models, the parameters are outflow ratios sampled uniformly over [0,1]; 500 parameter sets were sampled for each model.
Additionally, 500 different time series of daily precipitation were sampled from each of three different measurement distributions, all stationary iid Gaussian with standard deviations of 0.01, 0.1, and 0.5. We then calculated probabilities for each of the 500 × 500 = 250,000 model simulations from each of the three input distributions, using each of three different measurement distributions over streamflow with the same standard deviations as before: 0.01, 0.1, and 0.5. In total there were nine experiments, each consisting of 250,000 runs of each of two models.

Figure A.1: Inconsistent results in a synthetic multi-model inverse problem. Model probabilities depend on the choice of phenomenological distributions over precipitation and streamflow. Error bars represent one standard deviation over bootstrapped samples of the observations.

Probabilities associated with each of the two competing models were calculated by bootstrapping daily observations from the observation period and then marginalizing over all 250,000 parameter/boundary condition combinations in each of the nine experiments. The relative probabilities placed over the two-model ensemble are presented in Figure A.1. Ten bootstrapped observation samples were used to calculate the means and standard deviations that are plotted; bootstrapping was done to avoid undue influence of any individual observation, since probabilities are multiplicative over observations and we necessarily used a finite data record. The take-away from Figure A.1 is that we get completely different results in even the relative rankings of the two models depending on the choice of measurement distributions. In some cases the abc model is assigned greater than 50% probability on average (across bootstrap samples), and in others the Nash cascade is assigned greater than 50% probability.
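The mechanics of this experiment can be illustrated with a much smaller stand-in problem. In the sketch below, two toy one-parameter "model structures" (illustrative recession curves of our own devising, not the actual Nash cascade or abc-hydrology code) are assigned ensemble probabilities by Monte Carlo marginalization over a uniform parameter prior under an iid Gaussian measurement distribution; the bootstrap loop over observations is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic 'truth': a noisy recession curve standing in for streamflow
t = np.arange(200.0)
obs = np.exp(-0.03 * t) + rng.normal(0.0, 0.02, t.size)

def model_a(k):  # exponential recession (structure 1, illustrative)
    return np.exp(-k * t)

def model_b(k):  # hyperbolic recession (structure 2, illustrative)
    return 1.0 / (1.0 + k * t)

def ensemble_probabilities(y, sigma, n_par=500):
    """P(M | y) for the two-model ensemble, marginalizing a uniform
    parameter prior on [0, 0.1] by Monte Carlo under iid Gaussian errors."""
    ks = rng.uniform(0.0, 0.1, n_par)
    log_marg = []
    for model in (model_a, model_b):
        # log-likelihood of each parameter draw
        ll = np.array([-0.5 * np.sum((y - model(k)) ** 2) / sigma**2
                       for k in ks])
        # log of the Monte Carlo average of the likelihoods (log-mean-exp)
        log_marg.append(np.logaddexp.reduce(ll) - np.log(n_par))
    log_marg = np.array(log_marg)
    p = np.exp(log_marg - log_marg.max())
    return p / p.sum()

# the same data yield different model probabilities under different
# assumed measurement distributions -- the effect displayed in Figure A.1
p_tight = ensemble_probabilities(obs, sigma=0.02)
p_loose = ensemble_probabilities(obs, sigma=0.5)
```

Even in this minimal setting, the probabilities placed over the ensemble shift with the assumed measurement standard deviation, which is the degeneracy-driven sensitivity that the full experiment exposes at scale.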
There is no consistency. More generally, we will never in practice find ourselves in a situation where our models have only a single uncertain component, and for any situation where our modeling chain has interacting degeneracies in uncertainty distributions, inference results will always at least have the potential to be inconsistent.

Appendix B: Consistency of Inferences under Information Theory

Here we re-examine the synthetic example given in Appendix A from the information-based perspective outlined in Section 3. This is an example of what the information-based method looks like as it approximates correct answers to the inference problem under uncertainty. As in Appendix A, daily precipitation [mm] and potential evaporation [mm] were used to create synthetic 'true' streamflow data using HyMod (𝑧ₒ), and synthetic precipitation forcings were then generated by sampling perturbations from iid Gaussian distributions to generate several different forcing data sets (𝑧ᵤ). This time, instead of sampling from three forcing distributions, we sampled from several with increasing standard deviations, in order to provide more resolution in the effect of limited information content of forcing data. Just as in Appendix A, several three-bucket Nash cascades and abc-hydrology models were used to simulate daily streamflow over a 10,000-day period using these perturbed forcings. True measures of information missing from each set of synthetic precipitation data were obtained by running the perturbed forcings through the real system (HyMod) and calculating the entropy of the synthetic observations conditional on each of the resulting streamflow series. These quantities are notated as 𝐻(𝑧ₒ|𝑧ᵤ).
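The conditional entropies used here (the entropy of the observations given a simulated series) can be estimated with a simple plug-in calculation, H(z | z_cond) = H(z, z_cond) - H(z_cond), on a 2-D histogram. The sketch below is our illustration on synthetic data, not the paper's code; as more noise enters the conditioning series, more information is missing:

```python
import numpy as np

def conditional_entropy(z, z_cond, bins=10):
    """Plug-in estimate (in bits) of H(z | z_cond) from paired samples,
    computed as H(z, z_cond) - H(z_cond) on a 2-D histogram."""
    joint, _, _ = np.histogram2d(z, z_cond, bins=bins)
    p = joint / joint.sum()

    def entropy(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    # p.sum(axis=0) is the marginal distribution of z_cond
    return entropy(p.ravel()) - entropy(p.sum(axis=0))

# 'observations' conditioned on two simulations with different error levels:
# the noisier conditioning series leaves more information missing
rng = np.random.default_rng(0)
z_obs = rng.normal(size=20000)
h_small = conditional_entropy(z_obs, z_obs + rng.normal(0.0, 0.1, 20000))
h_large = conditional_entropy(z_obs, z_obs + rng.normal(0.0, 1.0, 20000))
```

The same calculation, applied to synthetic observations against streamflow implied by perturbed forcings, is what produces the "true" missing-information values in this appendix.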
True measures of the information missing from each model were calculated by running each synthetic precipitation data series through one of our competing models, either a three-bucket Nash cascade or an abc-hydrology model, each with different parameters, to generate model predictions 𝑧ₘ. Parameter samples were the same as those used in Appendix A. The difference between 𝐻(𝑧ₒ|𝑧ᵤ) and the resulting model-conditional entropies 𝐻(𝑧ₒ|𝑧ₘ) quantifies the entropy due to model error according to Equation [4.2]. As in Appendix A, there were two model structures and 500 parameter samples.

Figure B.1: Convergence of mutual information statistics during calibration and testing data records after training neural networks to estimate 𝐼(𝑧ₒ; 𝑓(𝑧ᵤ)).

Again, the objective of this example is to demonstrate that we can accurately estimate information missing from the model using Equation [4.1]. The quantities 𝐼(𝑧ₒ; 𝑓(𝑧ᵤ)) were calculated using single-layer feed-forward neural networks to regress each set of (lagged) precipitation data onto streamflow observation data. We used a ninety-day lag period, so that there were ninety precipitation inputs used to predict each streamflow output. It is important that our regression models are not overfit when used to bound 𝐼(𝑧ₒ; 𝑧ᵤ) ≥ 𝐼(𝑧ₒ; 𝑓(𝑧ᵤ)). To assess this, neural networks were trained on an increasing fraction of the total data record, and the trained networks were used to predict both in-sample and out-of-sample data points. The objective is for the in-sample and out-of-sample information statistics to converge, which is what we see in Figure B.1. In particular, the in-sample and out-of-sample statistics converged to within 5% of each other with about 5000 training data.

Figure B.2: Real vs. estimated information missing from (i) experimental perturbation data and (ii) model predictions.
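The regression bound used here follows from the data-processing inequality: the mutual information between observations and any deterministic function of the forcings cannot exceed the mutual information between observations and the forcings themselves. The sketch below substitutes ordinary least-squares regression and a histogram MI estimate for the paper's single-layer neural networks (all variable names and the synthetic forcing model are our own), and checks in-sample vs. out-of-sample agreement in the spirit of Figure B.1:

```python
import numpy as np

def mutual_information(a, b, bins=10):
    """Plug-in estimate (in bits) of I(a; b) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    outer = np.outer(p.sum(axis=1), p.sum(axis=0))  # product of marginals
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / outer[nz]))

rng = np.random.default_rng(0)
n, lag = 6000, 5

# synthetic 'streamflow' as a lagged moving average of 'precipitation'
u = rng.exponential(1.0, n)
z = np.convolve(u, np.ones(lag) / lag, mode="valid")
z = z + rng.normal(0.0, 0.1, z.size)

# design matrix of lagged forcings; the fitted regression output plays
# the role of the neural network output f(z_u)
X = np.column_stack([u[i:i + z.size] for i in range(lag)])
half = z.size // 2
coef, *_ = np.linalg.lstsq(X[:half], z[:half], rcond=None)

mi_in = mutual_information(z[:half], X[:half] @ coef)    # in-sample
mi_out = mutual_information(z[half:], X[half:] @ coef)   # out-of-sample
```

When the in-sample and out-of-sample statistics agree, the regression is not overfit and the out-of-sample statistic can safely serve as the lower bound on the mutual information between observations and forcings.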
Next, the out-of-sample predictions made by these trained networks were used to calculate the 𝐼(𝑧ₒ; 𝑓(𝑧ᵤ)) statistics, which bound the 𝐼(𝑧ₒ; 𝑧ᵤ) statistics that we really want. The mutual information between Nash cascade predictions and each different input data set was then used in Equation [4.1] to bound the information missing from the model; by construction, the resulting estimate is a lower bound on the true missing information ℰ. Figure B.2 compares 'real' vs. estimated missing information (from experimental data and model predictions) as a function of the standard deviation of forcing perturbations. Error bars represent one standard deviation over 30 repeated experiments using different Nash cascade parameterizations (again, parameters were sampled uniformly over [0,1]). Unlike in Appendix A, we treat each model individually, and there is no need to marginalize over parameters (although this could be done if we wanted to test a model that included a parameter distribution). Information missing from the model predictions is always underestimated, as expected, but in this case the underestimation is generally less than 5% relative error.

Appendix C: An Example of Process-Level Diagnostics

This appendix includes a demonstration of the verisimilitude theory outlined in Section 4, applied to a simple rainfall-runoff model like those used in the previous two appendices. In this case we used the HyMod model with real-world precipitation and streamflow data from the same Leaf River catchment. Since we are interested in understanding the realism of the internal mechanics of this model, we need to have some understanding of the model itself. A conceptual illustration of the HyMod rainfall-runoff simulator is given in Figure C.1; in this model a watershed is conceptualized as a set of water mass stores with linear outflow rates.

Figure C.1: A conceptual diagram of the HyMod rainfall-runoff simulator.
Markov state variables are labeled 𝑥∗, the precipitation and potential evaporation boundary conditions are labeled 𝑢, and simulated streamflow is 𝑦. Model parameters (not notated) control the height of a nonlinear soil moisture storage tank, the outflow ratios from each storage bucket, and the fractional partitioning of soil moisture into surface runoff and subsurface runoff.

Figure C.2 shows what this same model looks like when conceptualized as a Bayesian network. The same phenomenological variables are labeled, and the network connections illustrate the paths of information flow during one timestep of numerical integration. The only observed variable is streamflow, which is observed at the integration timestep (daily). Our objective is to use information from that single observed time series to learn about overall process-level deficiencies in the model.

Figure C.2: A Bayesian network illustration of the same model as in Figure C.1.

Predictive accuracy (as quantified using mean-squared error) of all of these procedures (calibration, data assimilation, and system identification) is illustrated in Figure C.3. It is, of course, entirely inappropriate to use a squared error metric in any way during the process of conducting science (see Section 3.2), but for illustrative purposes it is sufficient, and sufficiently simple, to make the point. The point is that calibration helps improve model performance during both the calibration and evaluation (forecasting) periods. Data assimilation, which only updates initial states for prediction, only improved model accuracy during the calibration period, where we had observations available to assimilate.
System identification, on the other hand, assimilated observations in the same way as data assimilation; however, it encoded the information from those observations into the structure of the Bayesian network, which facilitated better predictions not only during the calibration period, but also during the evaluation period. More to the point of what we want to do here, we can measure the process-level information flows within the model before vs. after system identification to understand what assimilated information from our observation data has to tell us about deficiencies in the model structure. We apply Equation [7.2] to calculate information flows along each edge of the Bayesian network illustrated in Figure C.2, both before and after system identification. The differences between the values of the prior and posterior transfer metrics are illustrated in Figure C.4. This figure shows differences in information transfers between pairs of modeled variables that were affected by assimilating streamflow observations.

Figure C.3: Squared error metrics from calibration and evaluation periods. Results show how system identification stores information from data assimilation into a predictive model by turning that model into a Bayesian network (like in Figure C.2) and then updating the network itself.

The purpose of making these plots is first to locate (and compare the relative magnitudes of) individual deficiencies in the model. The story in this example is that the model strongly underestimates the role of precipitation on the soil moisture storage variable, which suggests that the biggest weakness in this model is the infiltration function. Additionally, we see that the model underestimates the influence between the soil moisture state and the surface and subsurface storage states, implying that soil moisture should play a larger role in mediating surface runoff and subsurface flow.
Figure C.4: Absolute differences between pairwise directed information transfers in HyMod before vs. after assimilating daily streamflow observations over a period of three years. The degree of red shading reflects larger differences in process-level relationships indicated by observations.

References:

Albrecht, A. and Phillips, D. (2014) 'Origin of probabilities and their application to the multiverse', Physical Review D, 90(12), pp. 123514.
Beck, M. B., Gupta, H., Rastetter, E., Shoemaker, C., Tarboton, D., Butler, R., Edelson, D., Graber, H., Gross, L. and Harmon, T. (2009) 'Grand challenges of the future for environmental modeling', White Paper, National Science Foundation, Arlington, Virginia.
Berk, R. H. (1966) 'Limiting behavior of posterior distributions when the model is incorrect', The Annals of Mathematical Statistics, 37(1), pp. 51-58.
Beven, K. J. (2016) 'Facets of uncertainty: Epistemic error, non-stationarity, likelihood, hypothesis testing, and communication', Hydrological Sciences Journal, (9), pp. 1652-1665.
Beven, K. J., Smith, P. J. and Freer, J. E. (2008) 'So just why would a modeller choose to be incoherent?', Journal of Hydrology, 354(1), pp. 15-32.
Boyle, D. P. (2000) Multicriteria Calibration of Hydrologic Models. University of Arizona, Department of Hydrology and Water Resources, Tucson, AZ.
Bueno, O. and Colyvan, M. (eds.) (2004) Logical Non-Apriorism and the Law of Non-Contradiction. Oxford, UK: Oxford University Press.
Bulygina, N. and Gupta, H. (2011) 'Correcting the mathematical structure of a hydrological model via Bayesian data assimilation', Water Resources Research, 47(5), pp. W05514, doi:10.1029/2010WR009614.
Carnielli, W. and Coniglio, M. E. (2007) Bridge Principles and Combined Reasoning.
Cartwright, N. (1983) How the Laws of Physics Lie. New York, NY: Cambridge University Press.
Chalmers, A. F.
(2013) What Is This Thing Called Science? Hackett Publishing.
Chang, C. C. (1958) 'Algebraic analysis of many valued logics', Transactions of the American Mathematical Society, 88(2), pp. 467-490.
Clark, A. (1987) 'The kludge in the machine', Mind & Language, 2(4), pp. 277-300.
Cover, T. M. and Thomas, J. A. (1991) Elements of Information Theory. New York, NY: Wiley-Interscience.
Cox, R. T. (1946) 'Probability, frequency and reasonable expectation', American Journal of Physics, 14, pp. 1-13.
Csiszár, I. (1972) 'A class of measures of informativity of observation channels', Periodica Mathematica Hungarica, 2(1), pp. 191-213.
Davies, P. C. W. 'Why is the physical world so comprehensible?', Complexity, Entropy and the Physics of Information. Santa Fe, NM, USA: Santa Fe Institute, pp. 61-70.
Gelman, A. and Shalizi, C. R. (2013) 'Philosophy and the practice of Bayesian statistics', British Journal of Mathematical and Statistical Psychology, 66(1), pp. 8-38.
Gong, W., Gupta, H. V., Yang, D., Sricharan, K. and Hero, A. O. (2013) 'Estimating epistemic & aleatory uncertainties during hydrologic modeling: An information theoretic approach', Water Resources Research, 49(4), pp. 2253-2273.
Grünwald, P. and Langford, J. (2007) 'Suboptimal behavior of Bayes and MDL in classification under misspecification', Machine Learning, 66(2-3), pp. 119-149.
Gupta, H. V., Kling, H., Yilmaz, K. K. and Martinez, G. F. (2009) 'Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling', Journal of Hydrology, 377(1), pp. 80-91.
Harding, S. (ed.) (1976) Can Theories Be Refuted?: Essays on the Duhem-Quine Thesis. Boston, MA, USA: D. Reidel Publishing Co.
Hempel, C. G. and Oppenheim, P. (1948) 'Studies in the logic of explanation', Philosophy of Science, pp. 135-175.
Hornik, K.
(1991) 'Approximation capabilities of multilayer feedforward networks', Neural Networks, 4(2), pp. 251-257.
Howson, C. (2000) Hume's Problem: Induction and the Justification of Belief. Oxford, UK: Clarendon Press.
Howson, C. and Urbach, P. (1989) Scientific Reasoning: The Bayesian Approach. Chicago, IL: Open Court Publishing.
Hume, D. (1748) Philosophical Essays Concerning Human Understanding. London: A. Millar.
Jaynes, E. T. (2003) Probability Theory: The Logic of Science. New York, NY: Cambridge University Press.
Kinney, J. B. and Atwal, G. S. (2014) 'Equitability, mutual information, and the maximal information coefficient', Proceedings of the National Academy of Sciences, 111(9), pp. 3354-3359.
Kiureghian, A. D. and Ditlevsen, O. (2009) 'Aleatory or epistemic? Does it matter?', Structural Safety, 31(2), pp. 105-112.
Knight, F. H. (1921) Risk, Uncertainty and Profit. Boston, MA: Hart, Schaffner & Marx; Houghton Mifflin Company.
Knuth, K. H. (2004) 'What is a question?', arXiv preprint physics/0403089.
Knuth, K. H. (2005) 'Lattice duality: The origin of probability and entropy', Neurocomputing, 67, pp. 245-274.
Kosko, B. (1990) 'Fuzziness vs. probability', International Journal of General Systems, 17(2-3), pp. 211-240.
Laudan, L. (1983) 'The demise of the demarcation problem', Physics, Philosophy and Psychoanalysis. Springer, pp. 111-127.
Lenhard, J. and Winsberg, E. (2011) 'Holism and entrenchment in climate model validation', Science in the Context of Application. Springer, pp. 115-130.
Mayo, D. G. (1996) Error and the Growth of Experimental Knowledge. University of Chicago Press.
Müller, U. K. (2013) 'Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix', Econometrica, 81(5), pp. 1805-1849.
Nearing, G. S. and Gupta, H. V.
(2015) 'The quantity and quality of information in hydrologic models', Water Resources Research, 51(1), pp. 524-538.
Nearing, G. S., Mocko, D. M., Peters-Lidard, C. D., Kumar, S. V. and Xia, Y. (2016a) 'Benchmarking NLDAS-2 soil moisture and evapotranspiration to separate uncertainty contributions', Journal of Hydrometeorology, pp. 745-759.
Nearing, G. S., Tian, Y., Gupta, H. V., Clark, M. P., Harrison, K. W. and Weijs, S. V. (2016b) 'A philosophical basis for hydrologic uncertainty', Hydrological Sciences Journal, 61(9), pp. 1666-1678.
Neyman, J. (1957) '"Inductive behavior" as a basic concept of philosophy of science', Revue de l'Institut International de Statistique, pp. 7-22.
Oreskes, N. and Belitz, K. (2001) 'Philosophical issues in model assessment', Model Validation: Perspectives in Hydrological Science, 23.
Pearl, J. (2001) 'Bayesianism and causality, or, why I am only a half-Bayesian', Foundations of Bayesianism. Springer, pp. 19-36.
Pielke, R. A. (2007) The Honest Broker: Making Sense of Science in Policy and Politics. Cambridge: Cambridge University Press.
Popper, K. (2014) Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.
Popper, K. R. (1959) The Logic of Scientific Discovery. Hutchinson & Co.
Quine, W. (1951) 'Two dogmas of empiricism', The Philosophical Review, 60(1), pp. 20-43.
Quine, W. V. O. (1986) Philosophy of Logic. Harvard University Press.
Rescher, N. (1968) 'Many-valued logic', Topics in Philosophical Logic. Springer, pp. 54-125.
Russell, B. (1912) The Problems of Philosophy. Home University Library of Modern Knowledge.
Schreiber, T. (2000) 'Measuring information transfer', Physical Review Letters, 85(2), pp. 461.
Severo, R. P. (2012) 'Confirmation holism and underdetermination in Quine's thought', Filosofia Unisinos, 13(2), pp. 96.
Taleb, N. N.
(2010) The Black Swan: The Impact of the Highly Improbable. New York: Random House.
Taylor, K. E. (2001) 'Summarizing multiple aspects of model performance in a single diagram', Journal of Geophysical Research: Atmospheres, 106(D7), pp. 7183-7192.
Terenin, A. and Draper, D. (2015) 'Rigorizing and extending the Cox-Jaynes derivation of probability: Implications for statistical practice', arXiv preprint arXiv:1507.06597.
Tian, Y., Nearing, G. S., Peters-Lidard, C. D., Harrison, K. W. and Tang, L. (2016) 'Performance metrics, error modeling, and uncertainty quantification', Monthly Weather Review, 144(2), pp. 607-613.
Tononi, G. (2011) 'Integrated information theory of consciousness: An updated account', Archives Italiennes de Biologie, 150(2-3), pp. 56-90.
Van Horn, K. S. (2003) 'Constructing a logic of plausible inference: A guide to Cox's theorem', International Journal of Approximate Reasoning, 34(1), pp. 3-24.
Wikle, C. K. and Berliner, L. M. (2007) 'A Bayesian tutorial for data assimilation', Physica D: Nonlinear Phenomena, 230(1-2), pp. 1-16, doi:10.1016/j.physd.2006.09.017.
Wilkinson, R. D., Vrettas, M., Cornford, D. and Oakley, J. E. (2011) 'Quantifying simulator discrepancy in discrete-time dynamical simulators', Journal of Agricultural, Biological, and Environmental Statistics, 16(4), pp. 554-570.