Overcoming model simplifications when quantifying predictive uncertainty
Authors: George M. Mathews, John Vial
Affiliation: Data61, CSIRO, Australia

Abstract. It is generally accepted that all models are wrong -- the difficulty is determining which are useful. Here, a useful model is considered as one that is capable of combining data and expert knowledge, through an inversion or calibration process, to adequately characterize the uncertainty in predictions of interest. This paper derives conditions that specify which simplified models are useful and how they should be calibrated. To start, the notion of an optimal simplification is defined. This relates the model simplifications to the nature of the data and predictions, and determines when a standard probabilistic calibration scheme is capable of accurately characterizing uncertainty. Furthermore, two additional conditions are defined for suboptimal models that determine when the simplifications can be safely ignored. The first allows a suboptimally simplified model to be used in a way that replicates the performance of an optimal model. This is achieved through the judicious selection of a prior term for the calibration process that explicitly includes the nature of the data, predictions and modelling simplifications. The second considers the dependency structure between the predictions and the available data to gain insight into when the simplifications can be overcome by using the right calibration data. Furthermore, the derived conditions are related to the commonly used calibration schemes based on Tikhonov and subspace regularization. To allow concrete insights to be obtained, the analysis is performed under a linear expansion of the model equations and where the predictive uncertainty is characterized via second order moments only.

Key words. uncertainty quantification, model calibration, inverse modelling, model simplification, model inadequacy, structural error, hydrogeology, groundwater.

1. Introduction.

This paper considers the problem of assessing uncertainty in a prediction made for a particular system based on the combination of specific measurement data and expert domain knowledge. The focus of this work is environmental systems, such as river basins and groundwater systems; however, the analysis is likely to be applicable to a much wider set of domains. Correctly addressing such environmental prediction problems is fundamental to ensuring these important systems are managed appropriately and sustainably.

Probabilistic Bayesian methods provide a theoretically consistent set of rules to combine such site-specific data and prior knowledge through the use of a system model. However, these methods often fail when naively applied to a model that does not capture the full complexity of the system. A possible solution is the incorporation of more detail and structural variations in the system model such that more sources of uncertainty are included. However, it is important to admit that this modelling ad infinitum is not a solution, as unlimited resources are never available for detailed model construction and execution, and simplifications must be made, at least at some level.

Deciding on what should be included in a model of a system and what may be ignored is not straightforward. For instance, in groundwater hydrology there is no consensus on what is an appropriate level of parameterization detail, e.g. within the hydraulic conductivity field [24, 48, 47, 51].
In addition to this are the related decisions of what processes should be explicitly represented and what can be ignored; or, alternatively, when it is reasonable to lump many different processes together under a semi-physical or non-physical black box model that directly represents input-output relationships [17, 46]. What is necessary is an understanding of the effects of simplifying assumptions made during model development, and appropriate ways of dealing with them when calibrating the model and generating predictions.

A general, qualitative notion of model adequacy was explored by [23] in terms of the issues faced within the surface water, groundwater, unsaturated zone, and terrestrial hydrometeorology modelling communities. The work considered intermediate stages within the modelling process, from the initial perceptual understanding through to the construction of a computational model, and introduced a pluralistic definition of model adequacy. On one extreme is the engineering viewpoint, which defines a structurally adequate model as one that can reproduce the input-output relationship of the system, with well characterized uncertainties (error models). On the other extreme is the physical science viewpoint, which requires an adequate model to be consistent with the underlying physical system [23].

This paper will consider systems that may be exposed to future disturbances and for which limited data is available. Within these problems a well characterized regression model (the engineering viewpoint) cannot be constructed and some physical insight into the system is required [23]. Furthermore, focus is given to environmental management problems, where the role of a model is to help inform a subsequent decision problem related to risk management, e.g. engineering design [18], groundwater management [15], or climate change [40]. This requires the specification of a probability distribution over the predictions of interest such that decision theoretic methods can be applied to quantify the risk and determine the optimal management strategy [5, 43].

1.1. Related Work.

The issue of mismatch between reality and a numerical representation of a system has received considerable attention from many different perspectives. The presence of simplifications may be identified in the data as additional misfit that is not consistent with measurement errors alone. These additional errors are referred to as: model error [30, 51, 29], model structural error [6], structural noise [16], model inadequacy [25, 23], model discrepancy [19, 44], modelization uncertainties [45], and others.

Within the system identification and control community [27, 28] the concept is often referred to as system under-modelling [33, 37], and two main probabilistic approaches have been developed: stochastic embedding [22, 21, 29] and model error modelling [26]. These allow subjective information about the discrepancy between the transfer functions that describe the dynamics of the real system and that of a numerical model to be explicitly defined and combined with a time series dataset collected from the system. This in turn allows prediction uncertainty to include uncertainty due to modelling errors and measurement errors contained within the data. These approaches generally allow a very flexible black box representation of the system dynamics to be used.
In the statistical modelling community, computational simulators have been considered as an explicit approximation to the real physical system they are attempting to represent. The mismatch is represented as an additional error term and modelled probabilistically, generally with explicit space-time correlation [25, 13, 19, 20, 39, 40]. Due to the computational complexity of the methods for high dimensional models, approximate Bayesian methods have also been developed [20]. Such simulators may also be decomposed and internal error terms included to represent the structural errors within smaller submodels [44].

Other work within hydrology has examined model simplifications and how these affect the calibration and prediction processes. In particular, it has been shown that model parameters that are designed to represent particular physical properties may be forced to undertake surrogate roles to compensate for the simplifications during the model calibration process. This surrogacy has the potential to introduce additional biases into the predictions that are unaccounted for by typical probabilistic methods. This has been analyzed by explicitly modelling the simplification process in [31, 16, 14, 50, 51].

Furthermore, several strategies have been considered to overcome the approximations inherent in a simplified model. These include the generalized likelihood uncertainty estimation (GLUE) method [7], which modifies the data likelihood function such that less information is extracted from the data. For a dynamical system, the assumption of a deterministic dynamic transition function can be relaxed and a stochastic process or transition function used instead [49, 11] (such models are often referred to as data assimilation methods). Similarly, the assumption that the model parameters are time invariant may be relaxed such that they can change over time to better match the observed system behavior [36].

It is noted that many of these methods rely on a noticeable discrepancy between the measured data and model predicted values. However, the adverse effects of model simplifications may not necessarily cause any additional misfit between the data and model output, and thus may go unnoticed during model calibration and prediction [51]. Thus it is critical to have a theoretical understanding of model adequacy that goes beyond data fit.

1.2. Approach and Contributions.

This paper considers a subjective Bayesian framework and formally defines what constitutes an adequate or appropriate model by explicitly considering: (i) the nature of the predictions, (ii) the available data and prior knowledge, (iii) the model simplification strategy, and (iv) the calibration scheme. It is shown that these are intrinsically linked, and that model calibration should take into account the nature of the data, the predictions, and any inherent limitations of the simplified computational model. It is noted that this departs from classical guidance on calibration and inversion that focuses on data misfit only [30, 45, 2].

A dual model approach is used, where a "reference" [31] or "reality" [51] model is used to describe how the system is believed to function. Such a high fidelity reference model has been used in the past to characterize the performance of a simplified model.
Here, the approach is extended to explicitly determine when and how a simplified model can be used to generate an accurate, or at least conservative, estimate of uncertainty in a prediction of interest. To allow concrete insights to be produced, only linear(ized) problems are considered and the beliefs are restricted to be Gaussian, where the second order moment is sufficient to capture uncertainty.

Within this framework, model simplifications are represented as a subspace projection that restricts the flexibility of the simplified model. This prevents the model from being able to fully represent the complexity of how the real system is believed to function. It is shown that if the model simplifications are ignored and standard probabilistic calibration methods are used, the model may generate overconfident and non-conservative predictions. Generally, this is avoided only when the simplification strategy is optimal.

The key contribution of this paper is the characterization of two new calibration and prediction schemes for suboptimally simplified models that avoid this underestimation of uncertainty. The first scheme allows a simplified model to be used in a way that replicates the performance of an optimal scheme through the appropriate modification of the prior, or regularization, term. In the second scheme, focus is given to linking the predictions with the available data, such that insight is gained into when the simplifications can be overcome by gathering the appropriate set of data.

Specifically, this paper fully defines the general problem of assessing uncertainty with a simplified model in Section 2. Section 3 summarizes the optimal benchmark solution based on the high fidelity reference model. The general representation of model simplifications based on linear projection is defined in Section 4. A typical probabilistic calibration scheme is considered in Section 5 that ignores the effects of model simplifications within the calibration and prediction process. This section defines the concept of an optimal simplification that determines when such a scheme is adequate for assessing uncertainty. Section 6 considers suboptimally simplified models and introduces two new calibration and prediction schemes, and defines the conditions under which they are adequate for assessing predictive uncertainty. Discussions and directions for future research are covered in Section 7. A worked example involving a simplified groundwater prediction problem is given in Appendix A.

2. Problem Definition.

Consider a system where a modeller is tasked with generating a probability distribution over a prediction of a specific feature of the system to inform a future decision process. For instance, in a groundwater system, the prediction of interest may be the reduction in water levels that will be experienced by an aquifer due to an increase in future extractions. The prediction of interest will be denoted by the vector $p \in \mathbb{R}^{D_p}$.

To aid the modeller, a limited number of measurements have been made on the system and will be represented by $d \in \mathbb{R}^{D_d}$. Furthermore, it is considered that expert prior knowledge exists on how the system functions, for instance the physical processes involved, and allows the available data and prediction of interest to be related to the properties of the underlying system.
This information shall be considered to define a high fidelity or reference model for the system, with parameters denoted by the vector $x \in \mathbb{R}^{D_x}$. Furthermore, the relationship between the data $d$, predictions $p$, and the underlying parameters $x$ that the modeller believes exists shall be represented by the pair of equations

(1) $d = G(x) + \text{errors}$,
(2) $p = Y(x) + \text{errors}$.

It is considered that these functions include the operation of known physical laws, such as conservation of mass, energy and momentum, while the vector $x$ includes a detailed representation of the forcing terms, initial conditions, material properties, etc. These requirements result in a very high dimensional vector $x$ and complex functions $G$ and $Y$. Furthermore, it is considered that epistemic uncertainty exists as to the value of $x$ for the system under investigation. In addition, the error terms on the right of (1) and (2) allow for the inclusion of additional uncertainty, for instance to capture measurement errors introduced by the data acquisition method, or uncertainty that may exist in how a physical process actually functions [45].

This allows the modeller's complete belief about how the system behaves to be captured by three probability distribution functions

(3) $p(x), \quad p(d \mid x), \quad p(p \mid x)$.

These in turn can be used to generate a posterior distribution over the predictions of interest, denoted as $p(p \mid d)$, using the standard rules of probability theory. It is noted that a major issue with pursuing this type of subjective Bayesian approach is the requirement that these probability distributions must represent every detail that is believed to be present in the physical system. This is not possible, and simplifications must be made.

2.1. Simplifications.

It is considered that the process of modelling, for instance as outlined in [23], transforms the detailed knowledge of the modeller and produces the desired simplified computational representation of the problem that can be solved within the typical computational constraints. Specifically, the output of the modelling process is (or should be) an approximate representation of the modeller's beliefs that can be processed numerically to generate a prediction posterior distribution that is somehow similar to the optimal, but computationally intractable, distribution $p(p \mid d)$ generated by the high fidelity model. The modelling process is depicted in Figure 1, where the outputs are denoted by the set of approximate beliefs

(4) $\hat{p}(v), \quad \hat{p}(d \mid v), \quad \hat{p}(p \mid v)$.

These probability distributions are defined over a simplified description of the system, denoted by the parameter vector $v \in \mathbb{R}^{D_v}$.

[Figure 1. The process of modelling (perceptual model, conceptual model, mathematical model, computational model) transforms the modeller's subjective expert beliefs $p(x), p(d \mid x), p(p \mid x) \to p(p \mid d)$ into the simplified beliefs $\hat{p}(v), \hat{p}(d \mid v), \hat{p}(p \mid v) \to \hat{p}(p \mid d)$ that can be processed numerically to generate a conservative posterior probability distribution over the predictions of interest.]

Furthermore, it is considered that the approximate posterior $\hat{p}(p \mid d)$ should not just be similar to the posterior $p(p \mid d)$; it should be in some sense conservative, such that the uncertainty is not underestimated [15].
Thus, the key question addressed in this paper is now specified as: How can the subjective beliefs of a modeller, defined for the high fidelity reference model, be transformed to generate approximate probabilities that define a simplified computational model, such that the computed prediction posterior $\hat{p}(p \mid d)$ is a conservative approximation of the optimal reference posterior $p(p \mid d)$?

This question explicitly links the typically separate problems of model construction, calibration and prediction. Furthermore, answers will be provided under the restriction that the uncertainties are described by multivariate Gaussian distributions and the dependencies are linear. This will allow more specific and concrete insights to be gained into the effects of simplifications and how they may be overcome. Extensions to more general non-Gaussian and nonlinear systems are left for future work.

3. Optimal Inference and Prediction.

The optimal Bayesian prediction scheme, which does not consider any simplifications, is now briefly reviewed [5, 45, 2]. This is based on the high fidelity reference model. Let the predictions of interest, the available data, and the parameters of the high fidelity model be defined by the random vectors $p \in \mathbb{R}^{D_p}$, $d \in \mathbb{R}^{D_d}$, and $x \in \mathbb{R}^{D_x}$, where $D_p$, $D_d$ and $D_x$ are their respective dimensions. It is considered that the dimension of $x$ is excessively large, while the data is limited and the number of predictions is small (perhaps only one), such that $D_x \gg D_d, D_p$.

Prior knowledge of the parameter vector is represented by the Gaussian probability density function $p(x)$, with mean of zero and known second order moment $\Sigma_x$, such that

(5) $p(x) = \mathcal{N}(x; 0, \Sigma_x)$.

It is noted that the requirement of a zero mean distribution simplifies the analysis and can be met in general with a transformation into increments.

Now, the believed relationship between the system properties $x$ and the data vector $d$ is considered to be linear with coefficient matrix $G$. This can be considered as a linearized approximation of (1); however, the approximation introduced by the linearization is considered out of scope in this paper. In addition, the uncertainty that is believed to exist in this relationship is represented by the error term $\delta$, such that $d = Gx + \delta$. Furthermore, the uncertainty in the value of $\delta$ is represented by a zero mean Gaussian density with known covariance matrix $\Sigma_\delta$. In the simplest case, the matrix $\Sigma_\delta$ represents errors within the measurement process. This allows the modeller's conditional belief of the measured data given the underlying parameters to be defined by the Gaussian density

(6) $p(d \mid x) = \mathcal{N}(d; Gx, \Sigma_\delta)$.

The prediction of interest $p$ is also considered to be linearly dependent on $x$ with a coefficient matrix $Y$, and an additive error denoted by $\rho$, such that $p = Yx + \rho$. In addition, it is considered that the uncertainty in the value of this error is a zero mean Gaussian density with known covariance $\Sigma_\rho$. It is noted that this requires $\delta$ and $\rho$ to be independent; if any correlation exists, it must be included in $x$. Finally, the conditional belief of the prediction given the system properties can now be defined as

(7) $p(p \mid x) = \mathcal{N}(p; Yx, \Sigma_\rho)$.

Note that if the system is believed to be well described by a deterministic model, the covariance of the prediction error could be considered negligible.
The calibration or inversion stage seeks to determine the posterior belief over the parameters $x$ given the available data $d$:

$p(x \mid d) = \dfrac{p(x)\, p(d \mid x)}{\int p(x)\, p(d \mid x)\, dx}$.

Here, the posterior density $p(x \mid d) = \mathcal{N}(x; \mu_{x|d}, \Sigma_{x|d})$ is Gaussian with mean and covariance given by

(8) $\mu_{x|d} = \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} d$,
(9) $\Sigma_{x|d} = \Sigma_x - \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} G \Sigma_x$.

The coefficient matrix of the data $d$ in (8) is sometimes referred to as the optimal estimator, or gain matrix, and will be denoted by $E = \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1}$.

Propagating the posterior belief into the prediction of interest requires integration over the underlying parameters:

$p(p \mid d) = \int p(p \mid x)\, p(x \mid d)\, dx$.

The prediction posterior density $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$ is Gaussian with mean and covariance given by

$\mu_{p|d} = Y \mu_{x|d} = Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} d$,
$\Sigma_{p|d} = Y \Sigma_{x|d} Y^\top + \Sigma_\rho = Y \Sigma_x Y^\top + \Sigma_\rho - Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} G \Sigma_x Y^\top$.

The overall calibration and prediction scheme is now summarized below. This sets the benchmark for the other schemes considered in this paper.

Definition 1 (Optimal Scheme). The optimal scheme is denoted by the set of linear Gaussian probability density functions $p(x)$, $p(d \mid x)$ and $p(p \mid x)$, parameterized by the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, and $\Sigma_\rho$, as defined in (5)-(7). For a given dataset $d$, the posterior belief over the predictions of interest is the Gaussian density $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$, with mean and covariance defined by the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ respectively:

(10) $\mu_{p|d} = \mu(\Sigma_x, G, \Sigma_\delta, Y, \Sigma_\rho, d) = Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} d$,
(11) $\Sigma_{p|d} = \Sigma(\Sigma_x, G, \Sigma_\delta, Y, \Sigma_\rho) = Y \Sigma_x Y^\top + \Sigma_\rho - Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} G \Sigma_x Y^\top$.

Note that the covariance matrix of the posterior is not a function of the data $d$.

[Figure 2. Bayesian network depicting the structure of the densities $p(x)$, $p(d \mid x)$, $p(p \mid x)$ that employ the high fidelity model: the node $x$ (with prior covariance $\Sigma_x$) has arrows to $d$ (via $G$, $\Sigma_\delta$) and to $p$ (via $Y$, $\Sigma_\rho$).]

The structure of the above prediction scheme, encapsulating the prior densities $p(x)$, $p(d \mid x)$, $p(p \mid x)$ based on the high fidelity model, is depicted graphically in Figure 2.
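To make the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ of Definition 1 concrete, the following NumPy sketch implements (10) and (11) directly. It is illustrative only: the variable names mirror the notation above, and the toy problem (dimensions and random matrices) is hypothetical rather than taken from the paper's worked example.

```python
import numpy as np

def posterior_moments(Sigma_x, G, Sigma_delta, Y, Sigma_rho, d):
    """Mean and covariance of the prediction posterior p(p|d),
    i.e. the functions mu(.) and Sigma(.) of Definition 1,
    equations (10) and (11)."""
    S = G @ Sigma_x @ G.T + Sigma_delta            # G Sigma_x G^T + Sigma_delta
    K = np.linalg.solve(S, G @ Sigma_x).T          # Sigma_x G^T S^{-1} (S symmetric)
    mu = Y @ K @ d                                 # equation (10)
    Sigma = (Y @ Sigma_x @ Y.T + Sigma_rho
             - Y @ K @ G @ Sigma_x @ Y.T)          # equation (11)
    return mu, Sigma

# hypothetical toy problem: D_x = 5 parameters, D_d = 3 data, D_p = 1 prediction
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma_x = A @ A.T                                  # arbitrary SPD prior covariance
G = rng.standard_normal((3, 5))
Y = rng.standard_normal((1, 5))
Sigma_delta, Sigma_rho = 0.1 * np.eye(3), 0.01 * np.eye(1)
d = rng.standard_normal(3)

mu, Sigma = posterior_moments(Sigma_x, G, Sigma_delta, Y, Sigma_rho, d)
```

The same two-argument pattern is reused below: every scheme in this paper differs only in which covariance and system matrices are passed to these two functions.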
4. Model Simplifications.

The simplified computational model is considered to contain an approximate system simulator, with exposed parameters $v$ and a linear structure analogous to that defined for the high fidelity model:

(12) $d \approx \tilde{G} v + \delta$,
(13) $p \approx \tilde{Y} v + \rho$.

Here $\tilde{G}$ and $\tilde{Y}$ are the linear data and prediction matrices of a simplified system simulator. The exact form of the approximations within these relationships is of critical interest and will be explored by explicitly representing the simplifications involved.

To achieve this, consider initially that the parameters $v$ are somehow physically inspired and have some meaning to the modeller in describing aspects of the system. This allows the parameters of the simplified model to be used to describe a restricted version of the high fidelity model, with parameters $x_0 \in \mathbb{R}^{D_x}$ determined by the parameter vector of the simplified model $v_0 \in \mathbb{R}^{D_v}$ via a matrix $C \in \mathbb{R}^{D_x \times D_v}$:

(14) $x_0 = C v_0$.

This explicit representation of model simplifications allows aspects of the system to be ignored [16, 51] and allows for the incorporation of arbitrary linear parameterization schemes, such as spatial and temporal homogeneity [30].

With this definition, the simplified data and prediction matrices may be rewritten in terms of the matrices of the high fidelity model:

(15) $\tilde{G} = GC$,
(16) $\tilde{Y} = YC$.

The matrix $C$ will be referred to as a simplification matrix and provides a way of linking the simplified simulator to the original high fidelity reference model of how the system functions. Finally, it is important to note that physical intuition is not strictly required when defining the simple model: for any $G$, $Y$ and $\tilde{G}$, $\tilde{Y}$ such that the stacked matrix $\begin{bmatrix} G \\ Y \end{bmatrix}$ has full row rank, there is always a $C$ that satisfies (15) and (16). However, this $C$ may not be unique, and thus there are potentially many different ways to interpret the meaning of the simple model. Here, it is considered that the modeller specifies the interpretation, e.g. by specifying $C$ directly.

4.1. Representing Unmodelled Complexity.

A simplification matrix $C \in \mathbb{R}^{D_x \times D_v}$ divides the space $\mathbb{R}^{D_x}$ into two perpendicular subspaces: $\operatorname{columnspace}(C)$ and $\operatorname{cokernel}(C)$. The column space contains the set of parameter vectors that can be explicitly represented by a low dimensional vector. The second subspace, $\operatorname{cokernel}(C)$, contains vectors that include some degree of complexity that cannot be represented by a low dimensional vector. Before defining these further, the following assumption is made.

Assumption 1. A simplification matrix $C$ has full column rank, i.e. $\operatorname{rank}(C) = D_v$.

It is noted that if a simplification matrix $C$ does not have full column rank, then there will be parameter combinations of the simple model that have the same influence on the high fidelity model, which implies that the simple model is not as simple as it could be for the same expressive power.

Now, consider the singular value decomposition of $C$ under the above assumption:

(17) $C = U S V^\top = \begin{bmatrix} U_C & U_{\bar{C}} \end{bmatrix} \begin{bmatrix} S_C \\ 0 \end{bmatrix} V_C^\top = U_C S_C V_C^\top$,

where $U_C$ is the collection of orthogonal unit vectors that span the column space of $C$ and, similarly, $U_{\bar{C}}$ spans the subspace perpendicular to this, the cokernel of $C$. This enables the parameter vector of the high fidelity model to be expanded as

(18) $x = Cv + U_{\bar{C}} u$,

where $v$ is the parameter vector of the simple model and $u$ is a random vector that captures all the unmodelled complexity. Finally, it is noted that this expansion is unique and depends only on the definition of the simplification matrix $C$.

Under the above expansion, the data and prediction are related to the two components $v$ and $u$ through

(19) $d = Gx + \delta = \tilde{G} v + \delta + \underbrace{G U_{\bar{C}} u}_{\eta}$,
(20) $p = Yx + \rho = \tilde{Y} v + \rho + \underbrace{Y U_{\bar{C}} u}_{\epsilon}$.

From these equations, it is noted that the unmodelled components of the system will be expressed in the data whenever $G U_{\bar{C}} \neq 0$ and in the predictions whenever $Y U_{\bar{C}} \neq 0$. These additional error terms are denoted as $\eta$ and $\epsilon$ respectively, and they play a very different role to the other error terms $\delta$ and $\rho$, as they are correlated via $u$.

It is noted that the additional error $\eta$ introduced by the model simplifications was explicitly considered in [30] and [45], where it was used to define a composite measurement error model for the sum $\delta + \eta$. However, the effects of the simplifications on the prediction, including the explicit correlation between $\eta$ and $\epsilon$, were not addressed.
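The subspace split in (17) and the induced error maps in (19)-(20) are mechanical to compute with a full SVD. A minimal sketch, again with hypothetical toy matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
D_x, D_v = 6, 2
C = rng.standard_normal((D_x, D_v))        # simplification matrix (full column rank)

# full SVD of C, equation (17): the columns of U split into U_C and U_Cbar
U, s, Vt = np.linalg.svd(C)
U_C, U_Cbar = U[:, :D_v], U[:, D_v:]       # column space / cokernel bases

G = rng.standard_normal((3, D_x))
Y = rng.standard_normal((1, D_x))

# unmodelled complexity enters the data via G U_Cbar and the predictions
# via Y U_Cbar, i.e. eta = (G U_Cbar) u and epsilon = (Y U_Cbar) u
eta_map = G @ U_Cbar
eps_map = Y @ U_Cbar
print(np.allclose(eta_map, 0), np.allclose(eps_map, 0))  # False False: C is suboptimal
```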
In an ideal world, it could be argued that the objective of any modelling exercise should be to construct a model that is a simplified version of how the system is believed to function, yet still captures the full complexity of the available data and predictions of interest; that is, one which has a $C$ such that $\eta = \epsilon = 0$. Such a simplification matrix always exists, and will be referred to as an optimal simplification.

Definition 2 (Optimal Simplification). Let $G$ and $Y$ be the data and prediction matrices for a high fidelity system model. Then a simplification matrix $C$ is optimal if

(21) $G U_{\bar{C}} = 0$ and $Y U_{\bar{C}} = 0$,

where $U_{\bar{C}}$ spans the cokernel of $C$, as defined in (17).

It is noted that the above definition of optimal simplification differs from that of [16, 14, 50] in that it explicitly considers the nature of the predictions. An optimal simplification matrix, $C^*$, can be found for any given data and prediction matrices $G$ and $Y$ by first taking a singular value decomposition of the stacked matrix

(22) $Z = \begin{bmatrix} G \\ Y \end{bmatrix} = \begin{bmatrix} U_Z & U_{\bar{Z}} \end{bmatrix} \begin{bmatrix} S_Z & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_Z^\top \\ V_{\bar{Z}}^\top \end{bmatrix}$.

An optimal simplification matrix is then given by $C^* = V_Z$. It is considered that an optimally simplified model is difficult, if not impossible, to obtain in practice, and the remainder of the paper will focus on understanding and overcoming the issues caused by suboptimal simplifications.
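The construction $C^* = V_Z$ from (22) is a few lines of code; the final checks confirm the optimality conditions (21). The toy matrices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
D_x, D_d, D_p = 8, 3, 1
G = rng.standard_normal((D_d, D_x))
Y = rng.standard_normal((D_p, D_x))

# stack data and prediction matrices, equation (22)
Z = np.vstack([G, Y])
r = np.linalg.matrix_rank(Z)                 # number of nonzero singular values
_, _, Vt = np.linalg.svd(Z)
C_star = Vt[:r].T                            # optimal simplification: C* = V_Z

# the cokernel of C* is spanned by the remaining right singular vectors of Z
U_Cbar = Vt[r:].T
print(np.allclose(G @ U_Cbar, 0), np.allclose(Y @ U_Cbar, 0))  # True True: (21) holds
```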
5. Naive Use of Simplified Models.

Here, a prediction scheme is defined that first infers the parameters of the simple model and then propagates them into the prediction, but ignores the potential for unmodelled complexity to be expressed in the data or predictions. This represents a standard probabilistic Bayesian approach to model calibration and prediction. It is shown that this scheme is generally not conservative, as the uncertainty in the predictions is underestimated.

5.1. Prior Information.

The scheme has the same basic structure as the optimal benchmark scheme. In particular, it includes an explicit representation of a prior over the parameters, $\hat{p}^n(v)$, a data likelihood, $\hat{p}^n(d \mid v)$, and a conditional density for the prediction, $\hat{p}^n(p \mid v)$. Here the superscript $n$ denotes approximate distributions associated with this naive method. The conditional densities $\hat{p}^n(d \mid v)$ and $\hat{p}^n(p \mid v)$ are defined directly with the simplified data and prediction matrices $\tilde{G}$ and $\tilde{Y}$ respectively. Furthermore, the covariances are set to those used in the optimal scheme, i.e.

(23) $\hat{p}^n(d \mid v) = \mathcal{N}(d; \tilde{G} v, \Sigma_\delta)$,
(24) $\hat{p}^n(p \mid v) = \mathcal{N}(p; \tilde{Y} v, \Sigma_\rho)$.

This formulation of the data and prediction densities is equivalent to ignoring the approximations in (12) and (13) introduced by the model simplifications.

Now, to be somewhat rigorous, the specification of a prior distribution on the parameters $v$ is performed by explicitly considering the nature of the simplification and the uncertainty in the parameters of the high fidelity model. This is accomplished by considering a transformation that propagates the prior distribution over the parameters of the high fidelity model $x$ into the joint space formed by the parameter vector of the simple model $v$ and the vector $u$ that denotes the unmodelled complexity. This is defined in the following proposition.

Proposition 1 (Uncertainty Propagation). Let $p(x) = \mathcal{N}(x; 0, \Sigma_x)$ be a zero mean Gaussian density, with covariance $\Sigma_x$, that captures the prior uncertainty in the parameters of a high fidelity model. Also, let $C$ be a simplification matrix that describes a simplified model. Then, the joint probability density function for $v$ and $u$, representing the modelled and unmodelled components of the simple model respectively, is the zero mean correlated Gaussian density

$p(v, u) = \mathcal{N}\!\left( \begin{bmatrix} v \\ u \end{bmatrix}; \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_v & \Sigma_{vu} \\ \Sigma_{vu}^\top & \Sigma_u \end{bmatrix} \right)$,

where the covariance terms are given by $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$, $\Sigma_u = U_{\bar{C}}^\top \Sigma_x U_{\bar{C}}$, and $\Sigma_{vu} = C^\dagger \Sigma_x U_{\bar{C}}$. Here, $(\cdot)^\dagger$ is the pseudoinverse operator and $U_{\bar{C}}$ spans the cokernel of $C$ as defined in (17).

See Appendix B for a proof.

This proposition defines the covariance matrix $\Sigma_v$, which represents the uncertainty in the parameters of the simplified model, as the orthogonal projection of $\Sigma_x$ onto the column space of $C$. This is the most rigorous approach to specifying the prior over the parameters of the simple model, as it preserves the uncertainty that is believed to exist and is consistent with the physical intuition used to construct the simplified model. This marginal density is used to define the prior of the naive prediction scheme, $\hat{p}^n(v) = \mathcal{N}(v; 0, \Sigma_v)$.

5.2. Naive Calibration and Prediction.

The naive calibration and prediction scheme is now defined; it generates a posterior via the standard rules of probability theory. This is similar to the optimal scheme, but without consideration of the simplifications within the model's data and prediction equations.

Definition 3 (Naive Scheme). A naive prediction scheme is denoted by the simplified linear Gaussian density functions $\hat{p}^n(v)$, $\hat{p}^n(d \mid v)$ and $\hat{p}^n(p \mid v)$, parameterized by the matrices $\Sigma_v$, $\tilde{G}$, $\Sigma_\delta$, $\tilde{Y}$, and $\Sigma_\rho$. Furthermore, the prior covariance matrix $\Sigma_v$ is defined in terms of a high fidelity model using Proposition 1, that is,

(25) $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$,

where $\Sigma_x$ denotes a covariance matrix that describes the uncertainty in the parameters of the high fidelity model, and $C$ denotes the simplification matrix that links the parameters of the simple and high fidelity models. For a given dataset $d$, the posterior belief over the predictions of interest generated by this naive scheme is the Gaussian density $\hat{p}^n(p \mid d) = \mathcal{N}(p; \hat{\mu}^n_{p|d}, \hat{\Sigma}^n_{p|d})$, with mean and covariance defined as

(26) $\hat{\mu}^n_{p|d} = \mu(\Sigma_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho, d)$,
(27) $\hat{\Sigma}^n_{p|d} = \Sigma(\Sigma_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho)$,

where the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ are given in Definition 1.
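Proposition 1 and Definition 3 translate directly into code. The sketch below assembles the naive scheme from a given $C$; it reuses the posterior_moments function from the Section 3 snippet, and the pseudoinverse realizes the projection $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$ of (25):

```python
import numpy as np

def naive_scheme(Sigma_x, G, Sigma_delta, Y, Sigma_rho, C, d):
    """Naive calibration and prediction (Definition 3): the prior Sigma_v
    is the projection of Sigma_x given by Proposition 1, while the
    simplifications in equations (12)-(13) are otherwise ignored."""
    C_pinv = np.linalg.pinv(C)
    Sigma_v = C_pinv @ Sigma_x @ C_pinv.T        # equation (25)
    G_t, Y_t = G @ C, Y @ C                      # equations (15)-(16)
    return posterior_moments(Sigma_v, G_t, Sigma_delta, Y_t, Sigma_rho, d)
```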
5.3. Performance.

To assess the performance of the naive scheme, the notion of conservativeness, discussed in Section 2, is first defined by considering the squared error in the mean prediction. This is performed for a pair of probability density functions in Definition 4 and extended to calibration and prediction schemes in Definition 5. A further generalization of these definitions, which explicitly incorporates the utility function of the subsequent decision problem, is proposed in Section 7.

Definition 4 (Conservative Density). Let $p(p)$ and $\hat{p}(p)$ be density functions defined over the random vector $p$. Furthermore, let $\hat{\mu}_p$ be the mean of $\hat{p}(p)$. Then $\hat{p}(p)$ is defined as a conservative approximation of the reference density $p(p)$ if the approximate density's expected mean squared error, $E_{\hat{p}(p)}\{(\hat{\mu}_p - p)(\hat{\mu}_p - p)^\top\}$, does not increase when the expectation is instead taken with respect to the reference density $p(p)$:

(28) $E_{\hat{p}(p)}\{(\hat{\mu}_p - p)(\hat{\mu}_p - p)^\top\} \succeq E_{p(p)}\{(\hat{\mu}_p - p)(\hat{\mu}_p - p)^\top\}$,

where $A \succeq B$ requires that $A - B$ is positive semi-definite.

It is noted that condition (28) can be rewritten in terms of the means $\mu_p$, $\hat{\mu}_p$ and covariances $\Sigma_p$, $\hat{\Sigma}_p$ of the two densities, yielding the condition $\hat{\Sigma}_p \succeq \Sigma_p + (\hat{\mu}_p - \mu_p)(\hat{\mu}_p - \mu_p)^\top$. An example of a conservative density that satisfies Definition 4 is given in Figure 3.

[Figure 3. The Gaussian probability density function $\hat{p}(x)$, with mean $\hat{\mu}$ and standard deviation $\hat{\sigma}$, is a conservative estimate of the Gaussian density $p(x)$, with mean $\mu$ and standard deviation $\sigma$, as $\hat{\sigma}^2 \geq \sigma^2 + (\mu - \hat{\mu})^2$.]

The notion of conservativeness is now generalized to a prediction scheme. This is performed by considering an expectation over typical data.

Definition 5 (Conservative Scheme). Let the densities $p(x)$, $p(d \mid x)$, and $p(p \mid x)$ denote the prior information of a reference scheme. Also, let $p(p \mid d)$ denote the posterior density this scheme generates for a given dataset $d$. Furthermore, let $\hat{p}(p \mid d)$ denote a posterior density generated by an approximate scheme. For the given dataset $d$, the degree of conservativeness of the approximate posterior is denoted by the function

(29) $\Omega(d) = E_{\hat{p}(p|d)}\{(\hat{\mu}_{p|d} - p)(\hat{\mu}_{p|d} - p)^\top\} - E_{p(p|d)}\{(\hat{\mu}_{p|d} - p)(\hat{\mu}_{p|d} - p)^\top\}$.

Now, the approximate scheme is defined as conservative if the expectation of the function $\Omega(d)$ is positive semi-definite:

(30) $E_{p(d)}\{\Omega(d)\} \succeq 0$.

Here, the density $p(d)$ denotes how probable different datasets are under the prior knowledge of the reference scheme and is given by $p(d) = \int p(x)\, p(d \mid x)\, dx$.

It is noted that for the linear Gaussian densities considered in this work, the covariances of the posterior distributions are independent of the data, and (30) can be simplified to

(31) $\hat{\Sigma}_{p|d} \succeq \Sigma_{p|d} + E_{p(d)}\{(\hat{\mu}_{p|d} - \mu_{p|d})(\hat{\mu}_{p|d} - \mu_{p|d})^\top\}$.

Thus, a scheme is considered conservative if it generates a posterior covariance matrix $\hat{\Sigma}_{p|d}$ that is inflated with respect to $\Sigma_{p|d}$ by an amount dependent on the average squared difference in the posterior means.

It is now of interest to consider the performance of the naive scheme and determine when it is conservative with respect to the optimal benchmark scheme. This is performed in the following proposition.
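Since all posteriors here are Gaussian, conservativeness reduces to the positive semi-definiteness test (31), which can be checked numerically with an eigenvalue computation. A small sketch (the tolerance is an arbitrary choice):

```python
import numpy as np

def is_conservative(Sigma_hat, Sigma_ref, mean_sq_diff, tol=1e-10):
    """Check condition (31): Sigma_hat >= Sigma_ref + mean_sq_diff in the
    positive semi-definite ordering, where mean_sq_diff is the expected
    squared difference of the posterior means, E{(mu_hat - mu)(mu_hat - mu)^T}.
    A symmetric matrix is PSD iff its smallest eigenvalue is non-negative
    (up to numerical tolerance)."""
    M = Sigma_hat - (Sigma_ref + mean_sq_diff)
    return np.linalg.eigvalsh(M).min() >= -tol
```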
Proposition 2 (Performance of Naive Scheme). Suppose the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, $\Sigma_\rho$ define an optimal calibration and prediction scheme, with posterior density denoted by $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$ for an arbitrary dataset $d$. Let $C$ denote a simplification matrix such that the matrices $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$, $\tilde{G} = GC$, $\Sigma_\delta$, $\tilde{Y} = YC$, $\Sigma_\rho$ define a naive prediction scheme for a simplified model. Now, consider the posterior density generated by the naive scheme, $\hat{p}^n(p \mid d) = \mathcal{N}(p; \hat{\mu}^n_{p|d}, \hat{\Sigma}^n_{p|d})$.

1. If $C$ is an optimal simplification, then the generated mean and covariance of the naive prediction scheme are equivalent to those of the optimal prediction scheme, that is, $\hat{\mu}^n_{p|d} = \mu_{p|d}$ and $\hat{\Sigma}^n_{p|d} = \Sigma_{p|d}$ for all $d$.

2. If $C$ is a suboptimal simplification, then the naive scheme is conservative, provided the following condition holds:

(32) $Y U_{\bar{C}} = \tilde{Y} E^n G U_{\bar{C}}$,

where $E^n = \Sigma_v \tilde{G}^\top (\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta)^{-1}$ denotes the estimator matrix of the naive scheme and $U_{\bar{C}}$ spans the cokernel of $C$ as defined in (17).

3. If $C$ is a suboptimal simplification, condition (32) does not hold, $u$ and $v$ are considered independent, and the covariance of the unmodelled complexity is non-zero, $\Sigma_u \neq 0$, then the naive scheme is strictly non-conservative.

See Appendix C for a proof.

Under a suboptimal simplification, the non-conservative nature of the prediction scheme is caused by the influence of unmodelled complexity on the data and/or predictions. If it influences the predictions, then there is a direct bias effect. If it influences the data, then the estimated parameters of the simple model become biased, as they are forced to take on surrogate roles to compensate for the simplifications; this bias in the estimated parameters is then propagated to the prediction. Neither of these effects is taken into account by the naive prediction scheme, and the uncertainty is underestimated.

The condition in (32), which guarantees the scheme is conservative, requires that these two sources of error in the prediction caused by the unmodelled complexity exactly cancel each other out. This is unlikely to hold in practical scenarios without careful attention to the data, predictions, simplifications, and the type of prior knowledge available. This condition will form the basis of the data driven prediction scheme introduced in Section 6.

[Figure 4. Bayesian network depicting the structure of the densities $\hat{p}^n(v)$, $\hat{p}^n(d \mid v)$, and $\hat{p}^n(p \mid v)$ employed by the naive scheme. Also displayed in dashed lines is the dependency on the unmodelled complexity $u$ under the conditions of an optimal simplification. Note that under an optimal simplification, the unmodelled complexity does not have any direct effect on the data or predictions.]

Finally, it is noted that in the general case, when $v$ and $u$ are not independent and (32) does not hold, the naive scheme is still not guaranteed to be conservative. However, there are further special cases which may produce conservative posterior densities. The conditions that delineate the strictly conservative from the non-conservative scenarios are likely to be of limited interest in realistic problems and have not been enumerated here.
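Condition (32) is directly checkable whenever the high fidelity matrices are available (in practice $G$ and $Y$ are rarely known explicitly, which motivates the schemes of Section 6). A sketch of such a test:

```python
import numpy as np

def naive_is_safe(Sigma_x, G, Sigma_delta, Y, C, atol=1e-10):
    """Test condition (32): Y U_Cbar == Y_tilde E^n G U_Cbar, under which
    the naive scheme of Definition 3 is conservative (Proposition 2)."""
    D_v = C.shape[1]
    U_Cbar = np.linalg.svd(C)[0][:, D_v:]        # cokernel basis, equation (17)
    C_pinv = np.linalg.pinv(C)
    Sigma_v = C_pinv @ Sigma_x @ C_pinv.T        # equation (25)
    G_t, Y_t = G @ C, Y @ C
    S = G_t @ Sigma_v @ G_t.T + Sigma_delta
    E_n = Sigma_v @ G_t.T @ np.linalg.inv(S)     # naive estimator matrix
    return np.allclose(Y @ U_Cbar, Y_t @ E_n @ G @ U_Cbar, atol=atol)
```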
5.4. Summary.

This section has formally defined a probabilistic calibration and prediction scheme that embeds the standard separation of modelling, calibration and prediction. In particular, physical knowledge is used to define a prior probability distribution for the parameters included in the simplified model. Also, the conditional data and prediction probability distributions are defined by ignoring the simplifying assumptions used to construct the data and prediction equations. Furthermore, it is shown that a true characterization of predictive uncertainty typically requires an optimally simplified model. For suboptimally simplified models, it is shown that this scheme generally underestimates the uncertainty in the predictions and is not conservative.

The dependency structure of the probability distributions for this naive scheme is depicted in Figure 4. Also displayed is the dependency on the unmodelled complexity for an optimally simplified model. It is noted that under this condition the data and predictions are conditionally independent of the unmodelled complexity $u$ given the model parameters $v$.

Finally, it is noted that the non-conservative, or overconfident, nature of predictions generated by naively applying probabilistic Bayesian methods to simplified models is not a new finding. In particular, the result can be considered as a generalization of [16, 14, 51] and is consistent with the arguments of [7].

6. Overcoming Model Simplifications.

It is of interest now to understand what to do about the issue of overconfidence when a suboptimal model is used. Firstly, it is noted that this issue may not be a problem: the prediction uncertainty may turn out to be larger than expected, and this in itself may provide sufficient information to allow a given management decision to be made. However, the situation becomes more difficult when it is necessary to determine an accurate, or at least conservative, prediction probability distribution, for instance as required by [15]. In this scenario the modeller may opt to:

• Expand the sources of uncertainty considered in the model through less simplistic modelling assumptions, with the hope of producing an optimally simplified model. This is not just about representing greater spatial/temporal detail in the model [24]; more importantly, the goal should be the inclusion of more sources of uncertainty. As an example, if there exists uncertainty over the presence of some important structural feature, a multi-hypothesis, or multi-model, analysis may be needed [34, 35, 38].

• Directly model the additional errors introduced by the model simplifications, for instance by defining an additional probabilistic model for $p(\eta, \epsilon)$, introduced in (19) and (20), that captures the uncertainty the modeller has in how well the simplistic model represents the behavior of the system. Such approaches have been developed in [25, 13, 19, 20, 39, 40].

• Within the context of a dynamical system, remove the deterministic assumption on how the system evolves over time. This allows the simplifications to be represented by an error model within the modelled transition function. Such a scheme is employed by [49]. Alternatively, the assumption that the model parameters are time invariant can be relaxed such that the parameters can change over time to better match the underlying system behavior and inject greater uncertainty into the predictions [36].

• Modify the way that the data is used by changing the likelihood model $p(d \mid v)$ such that less information is propagated into the prediction via the model parameters. The generalized likelihood uncertainty estimation method [7] can be considered an example of this approach.

In the remainder of this section, two additional approaches are defined, along with explicit conditions that determine when they are appropriate. Firstly, a scheme is defined that allows the posterior of the ideal prediction scheme to be reproduced using a suboptimally simplified model through the appropriate modification of the prior covariances.
Secondly, structural considerations of the data and predictions are examined, and an approach is developed that allows the simplifications to be overcome through the use of the right calibration data.

6.1. Optimal Use of Simplified Models.

The naive scheme developed previously was not conservative for a suboptimally simplified model. Here a prediction scheme is defined that is capable of reproducing the optimal scheme, even under suboptimal simplifications. To start, consider the probability densities $\hat{p}^o(v)$, $\hat{p}^o(d \mid v)$, $\hat{p}^o(p \mid v)$ to be defined in a similar manner to the naive scheme, but parameterized in terms of new covariance matrices $\hat{\Sigma}^o_v$, $\hat{\Sigma}^o_\delta$ and $\hat{\Sigma}^o_\rho$, such that

$\hat{p}^o(v) = \mathcal{N}(v; 0, \hat{\Sigma}^o_v)$,
$\hat{p}^o(d \mid v) = \mathcal{N}(d; \tilde{G} v, \hat{\Sigma}^o_\delta)$,
$\hat{p}^o(p \mid v) = \mathcal{N}(p; \tilde{Y} v, \hat{\Sigma}^o_\rho)$.

Here the covariance matrices are considered adjustable, such that they can be chosen in a way that compensates for the simplifications and allows the optimal posterior to be reproduced. Thus, it is of interest to select $\hat{\Sigma}^o_v$, $\hat{\Sigma}^o_\delta$ and $\hat{\Sigma}^o_\rho$ such that the posterior density produced by their combination, $\hat{p}^o(p \mid d)$, replicates the posterior of the optimal scheme, i.e.

(33) $\hat{p}^o(p \mid d) = p(p \mid d)$ for all $p, d$.

The posterior density $\hat{p}^o(p \mid d)$ is Gaussian, with mean and covariance defined in a similar fashion to the optimal scheme. Thus, the above condition will be satisfied when the means and covariances are equivalent, which occurs when

(34a) $\tilde{Y} \hat{\Sigma}^o_v \tilde{G}^\top [\tilde{G} \hat{\Sigma}^o_v \tilde{G}^\top + \hat{\Sigma}^o_\delta]^{-1} = Y \Sigma_x G^\top [G \Sigma_x G^\top + \Sigma_\delta]^{-1}$

and

(34b) $\tilde{Y} \hat{\Sigma}^o_v \tilde{Y}^\top + \hat{\Sigma}^o_\rho - \tilde{Y} \hat{\Sigma}^o_v \tilde{G}^\top [\tilde{G} \hat{\Sigma}^o_v \tilde{G}^\top + \hat{\Sigma}^o_\delta]^{-1} \tilde{G} \hat{\Sigma}^o_v \tilde{Y}^\top = Y \Sigma_x Y^\top + \Sigma_\rho - Y \Sigma_x G^\top [G \Sigma_x G^\top + \Sigma_\delta]^{-1} G \Sigma_x Y^\top$.

If these conditions can be met, they define a set of optimal covariance matrices for the simplified probability density functions such that, when they are used to generate a posterior, the performance of the optimal scheme is replicated.

6.1.1. Highly Parameterized Models.

The above conditions will now be specialized for a highly parameterized model, where the calibration problem is under-constrained. Such models are recommended by [24] and [51], and the results will explicitly determine when such approaches are appropriate. This will be performed by focusing attention only on the prior covariance matrix for the parameters of the simple model, such that $\hat{\Sigma}^o_\delta$ and $\hat{\Sigma}^o_\rho$ remain the same as those used in the naive and optimal schemes, i.e. $\hat{\Sigma}^o_\delta = \Sigma_\delta$ and $\hat{\Sigma}^o_\rho = \Sigma_\rho$. Such a scheme is now defined.

Definition 6 (Optimally Compensated Scheme). An optimally compensated scheme is denoted by the simplified linear Gaussian probability density functions $\hat{p}^o(v)$, $\hat{p}^o(d \mid v)$ and $\hat{p}^o(p \mid v)$, parameterized by the matrices $\hat{\Sigma}^o_v$, $\tilde{G}$, $\Sigma_\delta$, $\tilde{Y}$, and $\Sigma_\rho$. To fully define the prior covariance matrix $\hat{\Sigma}^o_v$, let $C$ denote a simplification matrix that links the simplified model to a high fidelity model with data and prediction matrices denoted by $G$ and $Y$, such that $\tilde{Y} = YC$ and $\tilde{G} = GC$. Furthermore, let $\Sigma_x$ denote the covariance matrix that describes the uncertainty in the parameters of this high fidelity model. The prior covariance matrix $\hat{\Sigma}^o_v$ is now defined as

(35) $\hat{\Sigma}^o_v = R \Sigma_x R^\top$, where $R = \tilde{Z}^\dagger Z$, and $Z = \begin{bmatrix} G \\ Y \end{bmatrix}$, $\tilde{Z} = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$.
For a given dataset $d$, the posterior belief over the predictions of interest generated by this scheme is the Gaussian density $\hat{p}^o(p \mid d) = \mathcal{N}(p; \hat{\mu}^o_{p|d}, \hat{\Sigma}^o_{p|d})$, with mean and covariance defined as

(36) $\hat{\mu}^o_{p|d} = \mu(\hat{\Sigma}^o_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho, d)$,
(37) $\hat{\Sigma}^o_{p|d} = \Sigma(\hat{\Sigma}^o_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho)$,

where the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ are given in Definition 1.

This scheme is now shown to be equivalent to the optimal scheme when applied to a suboptimal but highly parameterized model, where the number of free parameters is at least as large as the number of linearly independent measurements and predictions of interest.

Proposition 3 (Performance of Compensated Scheme). Suppose the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, $\Sigma_\rho$ define an optimal calibration and prediction scheme, with posterior density denoted by $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$ for an arbitrary dataset $d$. Let $C$ denote a suboptimal simplification matrix, and let the matrices $\hat{\Sigma}^o_v = R \Sigma_x R^\top$, $\tilde{G} = GC$, $\Sigma_\delta$, $\tilde{Y} = YC$, and $\Sigma_\rho$ define an optimally compensated scheme, where $R = \tilde{Z}^\dagger Z$ and $Z = \begin{bmatrix} G \\ Y \end{bmatrix}$, $\tilde{Z} = ZC = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$. Now, consider the posterior density generated by the optimally compensated scheme, $\hat{p}^o(p \mid d) = \mathcal{N}(p; \hat{\mu}^o_{p|d}, \hat{\Sigma}^o_{p|d})$. If the simplification is chosen such that

(38) $\operatorname{rank}(ZC) = \operatorname{rank}(Z)$,

then the posterior mean and covariance generated by the optimally compensated scheme are equivalent to those of the optimal scheme: $\hat{\mu}^o_{p|d} = \mu_{p|d}$ and $\hat{\Sigma}^o_{p|d} = \Sigma_{p|d}$ for all $d$.

See Appendix D for a proof.

This demonstrates that, given a model simplified in a suboptimal fashion, the optimal performance can be recovered through the adjustment of the assumed prior uncertainties. The main rank condition in (38) is fairly easy to achieve in practice and is met when, e.g., the number of parameters is not smaller than the number of linearly independent measurements and predictions, and the simplified model matrix $\tilde{Z} = ZC$ has full (row) rank.

A model may be highly parameterized due to the degree of spatial variability included in the model, e.g. in the modelled material properties (as in [51]). However, the above result can also apply to models of dynamical systems that use stochastic transition models (sometimes referred to as data assimilation methods). In these models, the additional dynamic error terms that are incorporated at each time increment can be similarly viewed as a set of additional model parameters.

The general structure of the optimally compensated calibration and prediction scheme is depicted with a Bayesian network in Figure 5. It is noted that the difference between this scheme and the naive scheme lies only in how the prior covariance matrix for the parameters of the simple model is specified. Both are defined as a transformation of the covariance matrix $\Sigma_x$ that captures the uncertainty in the parameters of the high fidelity model. Recall that the naive scheme uses $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$, while the optimally compensated scheme uses $\hat{\Sigma}^o_v = R \Sigma_x R^\top$. The matrix $R$ is dependent on the simplification matrix $C$, but it is also dependent on the data and prediction matrices $G$ and $Y$ of the high fidelity model. It is this explicit dependency on the data and predictions that allows the prior covariance matrix $\hat{\Sigma}^o_v$ to compensate for the errors in the data and prediction equations of the simplified model.
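Definition 6 and Proposition 3 can be verified numerically on a toy problem. The sketch below (hypothetical matrices; posterior_moments is the function from the Section 3 snippet) builds $R = \tilde{Z}^\dagger Z$, forms the compensating prior (35), and confirms that the compensated posterior matches the optimal one when the rank condition (38) holds:

```python
import numpy as np

rng = np.random.default_rng(3)
D_x, D_d, D_p, D_v = 10, 3, 1, 6        # D_v >= D_d + D_p, so (38) can hold
A = rng.standard_normal((D_x, D_x))
Sigma_x = A @ A.T
G = rng.standard_normal((D_d, D_x))
Y = rng.standard_normal((D_p, D_x))
Sigma_delta, Sigma_rho = 0.1 * np.eye(D_d), 0.01 * np.eye(D_p)
C = rng.standard_normal((D_x, D_v))     # a (generically) suboptimal simplification
d = rng.standard_normal(D_d)

Z = np.vstack([G, Y])
Z_t = Z @ C                             # stacked simplified matrices [G_tilde; Y_tilde]
assert np.linalg.matrix_rank(Z_t) == np.linalg.matrix_rank(Z)   # condition (38)

R = np.linalg.pinv(Z_t) @ Z
Sigma_v_o = R @ Sigma_x @ R.T           # compensating prior, equation (35)

mu_opt, Sig_opt = posterior_moments(Sigma_x, G, Sigma_delta, Y, Sigma_rho, d)
mu_cmp, Sig_cmp = posterior_moments(Sigma_v_o, G @ C, Sigma_delta, Y @ C, Sigma_rho, d)
print(np.allclose(mu_opt, mu_cmp), np.allclose(Sig_opt, Sig_cmp))   # True True
```

The equality follows because $\tilde{Z} R = Z$ under (38), so every term in (10)-(11) built from $\hat{\Sigma}^o_v$, $\tilde{G}$, $\tilde{Y}$ collapses to its high fidelity counterpart.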
To gain additional intuition as to how the compensating prior covariance matrix $\hat{\Sigma}^o_v$ is related to the $\Sigma_v$ used by the naive scheme, consider for the moment that $v$ and $u$ are independent. This enables the prior covariance matrix for the parameters of the high fidelity model to be decomposed as $\Sigma_x = C \Sigma_v C^\top + U_{\bar{C}} \Sigma_u U_{\bar{C}}^\top$. With this, the prior matrix $\hat{\Sigma}^o_v$ can be rewritten as

(39) $\hat{\Sigma}^o_v = \Sigma_v + \Sigma_+$,

where $\Sigma_+ = \tilde{Z}^\dagger Z U_{\bar{C}} \Sigma_u U_{\bar{C}}^\top Z^\top \tilde{Z}^{\dagger\top}$ is a positive semi-definite matrix. This means that for $\hat{\Sigma}^o_v$ to compensate for the simplifications, the parameters must be allowed greater flexibility than would be given by the naive use of $\Sigma_v$, as $\hat{\Sigma}^o_v \succeq \Sigma_v$.

[Figure 5. Bayesian network depicting the structure of the densities $\hat{p}^o(v)$, $\hat{p}^o(d \mid v)$, and $\hat{p}^o(p \mid v)$ employed by the optimally compensated scheme. Also displayed in dashed lines is the dependency on the unmodelled complexity $u$ under the conditions of a suboptimal simplification. Note that under such a simplification, the unmodelled complexity has a direct effect on the data and/or predictions.]

6.1.2. Weighted Least Squares Inversion.

In addition to the Bayesian arguments provided above, now consider a regularized weighted least squares formulation commonly used for calibration and inversion problems [30, 45]. With the assumption of linear models, the mean vector $\hat{\mu}^o_{v|d}$ is also the maximum a posteriori estimate of the posterior parameter distribution of the simplified model, $\hat{p}^o(v \mid d) \propto \hat{p}^o(d \mid v)\, \hat{p}^o(v)$, and may be equivalently defined as the solution to the regularized weighted least squares optimization problem

(40) $\hat{\mu}^o_{v|d} = \arg\min_v\; (d - \tilde{G} v)^\top \Sigma_\delta^{-1} (d - \tilde{G} v) + v^\top (\hat{\Sigma}^o_v)^{-1} v$.

Note that this explicitly employs the simplified forward model $\tilde{G} v$. Furthermore, the prediction generated by this estimate, $\hat{\mu}^o_{p|d} = \tilde{Y} \hat{\mu}^o_{v|d}$, is the mean and mode of the prediction posterior $\hat{p}^o(p \mid d)$. Due to the optimality of the compensated scheme, this point prediction is also the optimal minimum error variance prediction that the high fidelity model would generate.

This demonstrates that a simplified, but highly parameterized, model can be calibrated through the careful selection of a Tikhonov regularizer to generate the optimal prediction. Additionally, this optimal regularizer can also be used to characterize the predictive uncertainty of this point prediction through the use of the covariance formula in (37).
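The estimate (40) can be computed by solving the regularized normal equations. A sketch, assuming $\hat{\Sigma}^o_v$ is invertible (a compensating prior built from a rank-deficient $R$ need not be, in which case a pseudoinverse or square-root formulation would be required):

```python
import numpy as np

def regularized_wls(G_t, Sigma_delta, Sigma_v_o, d):
    """Solve the Tikhonov-regularized weighted least squares problem (40):
        min_v (d - G_t v)^T Sigma_delta^{-1} (d - G_t v) + v^T Sigma_v_o^{-1} v.
    The minimizer satisfies the normal equations
        (G_t^T Sigma_delta^{-1} G_t + Sigma_v_o^{-1}) v = G_t^T Sigma_delta^{-1} d.
    Note: Sigma_v_o must be invertible for this formulation."""
    W = np.linalg.inv(Sigma_delta)
    H = G_t.T @ W @ G_t + np.linalg.inv(Sigma_v_o)
    return np.linalg.solve(H, G_t.T @ W @ d)
```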
6.1.3. Summary.

The introduced optimally compensated prediction scheme allows suboptimal model simplifications to be overcome by modifying the prior probability distribution for the parameters of the simplified model. This explicitly links: (i) the modelling problem, through the selection of a simplification matrix $C$; (ii) the nature of the available data, via $G$; (iii) the predictions of interest, defined by $Y$; and (iv) how the simple model is calibrated, as the required regularization term $\hat{\Sigma}^o_v$ is explicitly dependent on all three. It is noted, however, that this explicit dependency on the high fidelity model will make the generation of $\hat{\Sigma}^o_v$ problematic in practice, as the matrices $G$ and $Y$ will not generally be available. Nevertheless, it formally defines how a prior regularization term for a suboptimal model should be selected.

6.2. Data Driven Predictions.

It was noted that the main barrier to applying the optimally compensated scheme defined above is the difficulty in determining the covariance matrix $\hat{\Sigma}^o_v$ that compensates for the simplifications. Here a new scheme is proposed that overcomes this issue by moving away from optimality and focusing on conservativeness.

In particular, a specific class of predictions will be considered that allows the simplifications to be hidden behind the data. Thus, the objective is to examine how model simplifications can be overcome through the collection of the right data. The intuition here is that data of a similar type to the predictions is often available; for example, groundwater head measurements are often available when it is of interest to predict heads, and similarly stream flow measurements are often available when predictions of stream flows are of interest.

[Figure 6. Dependency structure of a prediction $p$ where the unmodelled complexity $u$ is hidden behind the data. Here the node $g$ denotes an uncorrupted version of the data and is given by the deterministic relationship $g = Gx = \tilde{G} v + G U_{\bar{C}} u$.]

The class of predictions that will be considered here incorporates two important principles:

• The effects of all unmodelled components of the system on the predictions are captured by the data.

• Any component of the system that affects the predictions, and is not captured by the data, is explicitly represented in the simplified model.

From these principles it is considered that the prediction of interest can be explicitly decomposed into an intermediate data term that is dependent on $g = Gx$ and a term wholly dependent on the parameters of the simple model, that is

(41) $p = \bar{A} g + \bar{B} v + \rho$,

where $\bar{A}$ and $\bar{B}$ are arbitrary matrices. Predictions with this structure are conditionally independent of $u$ given $g$ and $v$, i.e. $p(p \mid g, v, u) = p(p \mid g, v)$. This structure is represented by the Bayesian network in Figure 6 and requires the prediction matrix to have the following form:

(42) $Y = \bar{A} G + \bar{B} C^\dagger$.

It will be shown in the sequel that even with this restricted form it is not possible to overcome the simplifications, and further constraints must be imposed.

6.2.1. Naive Scheme.

Now, consider the naive calibration and prediction scheme introduced in Definition 3. Recall that this method will be conservative for a suboptimal simplification when condition (32), introduced in Proposition 2, is satisfied. Specifically, this requires

$Y U_{\bar{C}} - \tilde{Y} E^n G U_{\bar{C}} = 0$,

where $E^n = \Sigma_v \tilde{G}^\top (\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta)^{-1}$. For a prediction matrix $Y$ with structure consistent with (42), this condition can be rewritten as

$Y U_{\bar{C}} - \tilde{Y} E^n G U_{\bar{C}} = \bar{A} \left[ I - \tilde{G} \Sigma_v \tilde{G}^\top [\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta]^{-1} \right] G U_{\bar{C}} - \bar{B} \Sigma_v \tilde{G}^\top [\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta]^{-1} G U_{\bar{C}} = 0$.

Now, under non-trivial conditions, this holds when

(43a) $\bar{B} \Sigma_v \tilde{G}^\top = 0$, and
(43b) $\tilde{G} \Sigma_v \tilde{G}^\top [\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta]^{-1} = I$.

Condition (43a) requires the random vector $\bar{b} = \bar{B} v$ to be uncorrelated with $\tilde{g} = \tilde{G} v$ under the prior covariance $\Sigma_v$. Furthermore, condition (43b) requires $\tilde{G} \Sigma_v \tilde{G}^\top \gg \Sigma_\delta$ and occurs when the data is perfect, such that $\Sigma_\delta \approx 0$, or when the prior knowledge in the subspace that is informed by the data (i.e. $\operatorname{rowspace}(\tilde{G})$) is very weak, such that $\tilde{G} \Sigma_v \tilde{G}^\top \to \infty$.

These conditions ensure that: (i) the subset of parameters of the simple model that directly influence the predictions, represented by $\bar{b} = \bar{B} v$, cannot be estimated from the data; and (ii) the simple model can exactly reproduce the data. It is important to note, however, that these conditions are unlikely to hold in practical scenarios, and the naive scheme is not guaranteed to be conservative, even for predictions with the structure considered in (41).
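Both parts of (43) are simple numerical tests for a candidate model; a small sketch (tolerances are arbitrary choices):

```python
import numpy as np

def check_43a(B_bar, Sigma_v, G_t, atol=1e-10):
    """Condition (43a): b = B_bar v must be uncorrelated with
    g = G_t v under the prior covariance Sigma_v."""
    return np.allclose(B_bar @ Sigma_v @ G_t.T, 0, atol=atol)

def check_43b(Sigma_v, G_t, Sigma_delta, atol=1e-6):
    """Condition (43b): G_t Sigma_v G_t^T [G_t Sigma_v G_t^T + Sigma_delta]^{-1} = I,
    i.e. the data are effectively noise-free relative to the prior."""
    S = G_t @ Sigma_v @ G_t.T
    return np.allclose(S @ np.linalg.inv(S + Sigma_delta),
                       np.eye(S.shape[0]), atol=atol)
```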
These conditions ensure that: (i) the subset of parameters of the simple model that directly influence the predictions, represented by $\bar{b} = \bar{B}v$, cannot be estimated from the data; and (ii) the simple model can exactly reproduce the data. It is important to note, however, that these conditions are unlikely to hold in practical scenarios, and the naive scheme is not guaranteed to be conservative, even for predictions with the structure considered in (41).

6.2.2. Data Driven Scheme. To overcome the above issues, it is proposed to ensure conservativeness through the judicious removal of information, such that the conditions defined in (43) can be satisfied. This will occur in two main areas:
- An easily computable inflation of the prior covariance matrix $\Sigma_v$.
- The application of a preprocessing or filtering step that may discard or combine some components of the data.

Additionally, the class of predictions must be restricted further than those allowed by (41).

To start, let $F \in \mathbb{R}^{D_{d'} \times D_d}$ denote a data filtering matrix such that

(44)  $d' = Fd$,

where $D_{d'} \le D_d$. Similarly define the transformed data matrices as $G' = FG$ and $\tilde{G}' = F\tilde{G}$, and the transformed data error covariance as $\Sigma'_\delta = F\Sigma_\delta F^\top$. The purpose of the filter $F$ will be elaborated on later. Note, however, that filtering is optional, and the identity matrix can be used, $F = I$.

Now, consider the singular value decomposition of $\tilde{G}'$

(45)  $\tilde{G}' = F\tilde{G} = [U_1\ U_2]\begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} V_1^\top \\ V_2^\top \end{bmatrix} = U_1 S_1 V_1^\top$.

The last expression, $U_1 S_1 V_1^\top$, only includes the nonzero singular values and will be referred to as the compact SVD. With this expansion, the vectors $v_1 = V_1^\top v$ and $v_2 = V_2^\top v$ define the rowspace and nullspace components of the parameters of the simple model for the given filtered data matrix $\tilde{G}' = F\tilde{G}$.

To meet condition (43b), the prior covariance will be inflated such that all information pertaining to the rowspace of $\tilde{G}'$, i.e. the subspace spanned by $V_1$, is removed. In addition, any correlation between the random vectors $v_1$ and $v_2$ will be ignored. A calibration and prediction scheme that embodies these properties is now defined.

DEFINITION 7 (Data Driven Scheme). For a given filtering matrix $F \in \mathbb{R}^{D_{d'}\times D_d}$ with $D_{d'} \le D_d$, a data driven scheme is denoted by the set of simplified linear Gaussian probability density functions $\hat{p}^d(v)$, $\hat{p}^d(d'|v)$ and $\hat{p}^d(p|v)$, parameterized by the matrices $\hat{\Sigma}^d_v$, $\tilde{G}' = F\tilde{G}$, $\Sigma'_\delta = F\Sigma_\delta F^\top$, $\tilde{Y}$, and $\Sigma_\rho$. To fully define the prior covariance matrix $\hat{\Sigma}^d_v$, let the covariance matrix $\Sigma_v$ denote the uncertainty in the parameters of the simplified model, defined in Proposition 1 as $\Sigma_v = C^\dagger\Sigma_x C^{\dagger\top}$. Furthermore, let the matrices $V_1$ and $V_2$ each denote a set of orthonormal column vectors that form a basis for the rowspace and nullspace of $\tilde{G}'$, e.g. as defined in (45). The prior covariance matrix $\hat{\Sigma}^d_v$ is now defined by the limit

(46)  $\hat{\Sigma}^d_v = \lim_{\alpha\to\infty}\ \alpha V_1 V_1^\top + V_2 V_2^\top\Sigma_v V_2 V_2^\top$.

For a given dataset $d$, the posterior belief over the predictions of interest generated by the data driven scheme is the Gaussian density $\hat{p}^d(p|d) = \mathcal{N}(p;\hat{\mu}^d_{p|d},\hat{\Sigma}^d_{p|d})$, with mean and covariance given by

$\hat{\mu}^d_{p|d} = \mu(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho, d)$,  $\hat{\Sigma}^d_{p|d} = \Sigma(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho)$,

where the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ are given in Definition 1.
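The limit in (46) can be explored numerically. In the hedged sketch below, a random stand-in for the filtered simplified data matrix and a hypothetical diagonal prior are assumed; the prior covariance is built with a finite but growing $\alpha$, and the resulting estimator matrix is seen to approach the pseudoinverse of $\tilde{G}'$, foreshadowing (48) in Proposition 4.

```python
import numpy as np

rng = np.random.default_rng(2)
Dv, Dd = 4, 2
Gt = rng.standard_normal((Dd, Dv))            # filtered simplified data matrix (full row rank)
Sig_v = np.diag([1.0, 2.0, 0.5, 1.5])         # hypothetical simplified prior covariance
Sig_d = 0.1 * np.eye(Dd)                      # filtered data error covariance

U, s, Vt = np.linalg.svd(Gt)
V1, V2 = Vt[:Dd].T, Vt[Dd:].T                 # rowspace / nullspace bases of Gt

for alpha in [1e2, 1e5, 1e8]:
    # Prior (46) with a finite alpha: rowspace information is progressively removed.
    Sig_dv = alpha * V1 @ V1.T + V2 @ V2.T @ Sig_v @ V2 @ V2.T
    Ed = Sig_dv @ Gt.T @ np.linalg.inv(Gt @ Sig_dv @ Gt.T + Sig_d)
    # The gap to the pseudoinverse shrinks as alpha grows.
    print(alpha, np.abs(Ed - np.linalg.pinv(Gt)).max())
```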
Finally, a more specific class of predictions is considered than that defined in (42), such that condition (43a) will hold by construction. Specifically, the prediction must depend on the uncorrupted version of the filtered data, denoted by $g' = FGx$, or on the components of the parameters of the simple model that lie in the nullspace of $\tilde{G}'$, denoted by $v_2 = V_2^\top v$. The prediction cannot depend on any rowspace components directly. Predictions with this structure can be written in the form

(47)  $p = Ag' + Bv_2 + \rho$,

where $A$ and $B$ are arbitrary matrices. Furthermore, the dependency structure of this restricted class of predictions is depicted in Figure 7. The key difference between this and the structure depicted in Figure 6 is the explicit separation of $v$ into $v_1$ and $v_2$ and the restriction that $v_1$ cannot have a direct influence on the prediction. That is, the prediction $p$ is conditionally independent of $u$ and $v_1$ given $g'$ and $v_2$, or equivalently $p(p|g',v_1,v_2,u) = p(p|g',v_2)$. A prediction with this restricted structure has a prediction matrix of the form

$Y = AFG + BV_2^\top C^\dagger$.

It is noted that the matrices $A$ and $B$ are arbitrary; the important characteristic is the structure that links the data and predictions to the simplification. This structure not only ensures that the simplifications are hidden behind the data, but also that any surrogate roles the parameter vector $v_1$ is forced to undertake during calibration cannot adversely affect the predictions.

FIG. 7. Dependency structure of a prediction p where the unmodelled complexity u is hidden behind the data. Any induced parameter surrogacy in the parameters $v_1$ cannot contaminate the predictions. Here the node $g'$ denotes an uncorrupted version of the filtered data and is given by the deterministic relationship $g' = FGx = F\tilde{G}V_1 v_1 + FGU_{\bar C}u$.

It is now demonstrated that the data driven scheme is conservative for predictions that have this special structure.

PROPOSITION 4 (Performance of Data Driven Scheme). Suppose the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, $\Sigma_\rho$ define an optimal scheme with posterior denoted by $p(p|d) = \mathcal{N}(p;\mu_{p|d},\Sigma_{p|d})$ for an arbitrary dataset $d$. Furthermore, let $C$ denote a suboptimal simplification matrix, and let the matrices $F$, $\hat{\Sigma}^d_v$, $\tilde{G}' = FGC$, $\Sigma'_\delta = F\Sigma_\delta F^\top$, $\tilde{Y} = YC$, and $\Sigma_\rho$ denote a data driven scheme, where $\hat{\Sigma}^d_v$ is as specified in Definition 7. Additionally, let $\hat{\mu}^d_{p|d}$ and $\hat{\Sigma}^d_{p|d}$ denote the mean and covariance of the posterior produced by this scheme for the dataset $d$.

1. If the filtering matrix $F$ is selected such that $\tilde{G}' = F\tilde{G}$ has full row rank, then the estimator matrix $E^d$ of the data driven scheme is equivalent to the pseudoinverse of $\tilde{G}'$ and can be expressed in terms of the compact SVD

(48)  $E^d = (\tilde{G}')^\dagger = V_1 S_1^{-1} U_1^\top$.

In addition, the posterior mean and covariance can be rewritten as

$\hat{\mu}^d_{p|d} = \tilde{Y}E^d d'$,  $\hat{\Sigma}^d_{p|d} = \tilde{Y}E^d\Sigma'_\delta E^{d\top}\tilde{Y}^\top + \tilde{Y}W\Sigma_v W\tilde{Y}^\top + \Sigma_\rho$,

where $W = I - V_1 V_1^\top$.

2. If $\tilde{G}' = F\tilde{G}$ has full row rank, and the high fidelity prediction matrix has the form

(49)  $Y = AFG + BV_2^\top C^\dagger$,

then the data driven scheme is conservative with respect to the optimal scheme.

See Appendix E for a proof.
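The conservativeness claim of Proposition 4 can be checked deterministically: when the prediction matrix is built with the structure (49), the expected squared prediction error computed from the high fidelity quantities equals the posterior covariance of the data driven scheme. The sketch below assumes small hypothetical sizes, a random high fidelity model, and no filtering ($F = I$); all names are illustrative, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 6 fine cells grouped in pairs into 3 zones, 2 data, 1 prediction.
Dx, Dv, Dd, Dp = 6, 3, 2, 1
C = np.kron(np.eye(Dv), np.ones((2, 1)))       # simplification matrix
G = rng.standard_normal((Dd, Dx))              # high fidelity data matrix
A0 = rng.standard_normal((Dx, Dx))
Sig_x = A0 @ A0.T + np.eye(Dx)                 # prior covariance of x
Sig_d = 0.05 * np.eye(Dd)                      # data error covariance
Sig_r = np.array([[0.01]])                     # prediction error variance

F = np.eye(Dd)                                 # no filtering
Gt = F @ G @ C                                 # filtered simplified data matrix (full row rank)
_, _, Vt = np.linalg.svd(Gt)
V1, V2 = Vt[:Dd].T, Vt[Dd:].T                  # rowspace / nullspace bases
W = np.eye(Dv) - V1 @ V1.T                     # projector onto the nullspace of Gt

# Prediction matrix with the structure required by (49).
A = rng.standard_normal((Dp, Dd))
B = rng.standard_normal((Dp, Dv - Dd))
Cd = np.linalg.pinv(C)
Y = A @ F @ G + B @ V2.T @ Cd

Yt = Y @ C
Ed = np.linalg.pinv(Gt)                        # estimator (48)
Sig_v = Cd @ Sig_x @ Cd.T

# Posterior covariance of the data driven scheme (Proposition 4, part 1).
Sig_post = Yt @ Ed @ Sig_d @ Ed.T @ Yt.T + Yt @ W @ Sig_v @ W @ Yt.T + Sig_r

# Expected squared error of the scheme's mean against the true prediction,
# computed exactly from Sig_x, Sig_d, Sig_r.
M = Y - Yt @ Ed @ F @ G
Sig_err = M @ Sig_x @ M.T + Yt @ Ed @ F @ Sig_d @ F.T @ Ed.T @ Yt.T + Sig_r

print(np.allclose(Sig_post, Sig_err))          # True: conservative (with equality)
```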
Before providing additional intuition as to the importance of this result, and the role of the filtering term $F$, it is first demonstrated that the data driven scheme is a generalization of the truncated singular value decomposition calibration method.

6.2.3. Truncated SVD Inversion. The truncated singular value decomposition inversion, or calibration, method is a commonly used technique [2, 32, 51] that generates the solution to the parameter estimation problem using a truncated decomposition of the data matrix. It will be demonstrated that the truncated SVD scheme can be replicated with the appropriate selection of a filtering matrix.

To start, consider the SVD of the simplified data matrix

(50)  $\tilde{G} = \tilde{U}\tilde{S}\tilde{V}^\top$.

Now, let $\tilde{U}_t$ denote the columns of $\tilde{U}$ that correspond to the largest $k$ nonzero singular values. Furthermore, consider the filtering matrix defined by

(51)  $F_{\mathrm{TSVD}} = \tilde{U}_t^\top$.

With this definition, the filtered data matrix $\tilde{G}' = F_{\mathrm{TSVD}}\tilde{G}$ becomes

(52)  $\tilde{G}' = F_{\mathrm{TSVD}}\tilde{G} = \tilde{U}_t^\top\tilde{U}\tilde{S}\tilde{V}^\top = \tilde{S}_t\tilde{V}_t^\top$,

where $\tilde{V}_t$, $\tilde{S}_t$ similarly denote truncated versions of $\tilde{V}$, $\tilde{S}$. Now, as $F_{\mathrm{TSVD}}\tilde{G}$ has full row rank (only nonzero singular values are included), the results of Proposition 4(1) allow the estimator matrix for the scheme to be given by the pseudoinverse of $\tilde{G}' = F_{\mathrm{TSVD}}\tilde{G}$, and thus it can be written as

(53)  $E^d_{\mathrm{TSVD}} = \tilde{V}_t\tilde{S}_t^{-1}$.

The estimated parameter vector of the simple model is now related to the data vector via the inverse of the truncated data matrix $\tilde{G}$, i.e.

(54)  $\hat{\mu}_{v|d'} = E^d_{\mathrm{TSVD}}d' = \tilde{V}_t\tilde{S}_t^{-1}\tilde{U}_t^\top d$.

This demonstrates that the truncated SVD inversion method can be replicated with the selection of an appropriate filtering matrix.

It is important to note, however, that this does not mean that the predictions generated by the truncated SVD method will be conservative. To guarantee this, the prediction must have the form specified in condition (49) of Proposition 4, which in this case requires the prediction matrix to have the form

(55)  $Y = A\tilde{U}_t^\top G + B\tilde{V}_{\bar t}^\top C^\dagger$,

where $\tilde{V}_{\bar t}$ contains the columns of $\tilde{V}$ that were removed by the truncation. If this is not obeyed, the predictions may be overconfident. Furthermore, this condition is implicitly dependent on the truncation point $k$, such that it may hold for some values and not others. This dependency explains in part the results of the simulation studies performed by [51], which demonstrated the difficulty of choosing a truncation point such that predictive uncertainty is accurately estimated by a simplified model.
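The equivalence in (51)-(54) is easy to confirm numerically. The sketch below assumes a random simplified data matrix and an arbitrary truncation point; it shows that the data driven estimator obtained with the filter $F_{\mathrm{TSVD}} = \tilde{U}_t^\top$ reproduces the classical truncated SVD estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
Dv, Dd, k = 5, 4, 2                           # hypothetical sizes and truncation point
Gt = rng.standard_normal((Dd, Dv))            # simplified data matrix G-tilde
d = rng.standard_normal(Dd)                   # synthetic data vector

U, s, Vt = np.linalg.svd(Gt)
F_tsvd = U[:, :k].T                           # filtering matrix (51)
Gp = F_tsvd @ Gt                              # filtered data matrix (52), full row rank

E_tsvd = np.linalg.pinv(Gp)                   # estimator (53): V_t S_t^{-1}
mu_v = E_tsvd @ F_tsvd @ d                    # data driven estimate (54)

# Classical truncated SVD solution V_t S_t^{-1} U_t^T d for comparison.
mu_classic = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T @ d
print(np.allclose(mu_v, mu_classic))          # True
```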
6.2.4. Selection of Data Filtering. A key element of the defined data driven prediction scheme is the filtering matrix $F$. For the results of Proposition 4 to hold, and the scheme to be conservative, two main requirements must be met:

1. $F$ must be selected such that $\tilde{G}' = F\tilde{G}$ has linearly independent rows. This is a fairly trivial requirement: dependent (e.g. duplicated) measurements must be combined, e.g. by averaging. Note that the information content is preserved, as the error covariance is also transformed. In addition, measurements that are insensitive to the model parameters must be dropped.

2. $F$ must be chosen such that the induced separation of the parameter vector $v$, into the components $v_1 = V_1^\top v$ that are estimable and lie in the rowspace of $F\tilde{G}$, and the components $v_2 = V_2^\top v$ that are not and lie in the nullspace of $F\tilde{G}$, forces all components of the parameter vector that have a direct influence on the prediction to be contained in the vector $v_2$, and thus not updated during the calibration process. This ensures that these parameters do not take on any surrogate roles, and that the predictions are not corrupted.

In addition to these two requirements, the filtering matrix can be selected to improve the predictive performance by removing data components that are only weakly informative about the parameters. This can be considered identical to selecting an appropriate truncation point in an SVD calibration scheme, such that highly informative prior knowledge can be used instead of data that only imposes weak constraints, e.g. as recommended by [32].

However, before searching for an optimal filter, it is important to note that for a given type of data, predictions, and simplification, defined by the triple $G$, $Y$, $C$, there may not be an $F$ that allows the prediction to be cast in the form required by Proposition 4, and the use of the data driven scheme is then not guaranteed to be conservative. This reinforces the fact that choosing a model simplification and performing calibration with a particular dataset must be considered an integrated task that is explicitly dependent on the types of predictions that are required.

FIG. 8. Bayesian network depicting the structure of the densities $\hat{p}^d(v)$, $\hat{p}^d(d|v)$, and $\hat{p}^d(p|v)$ employed by the data driven scheme. Also displayed in dashed lines are the dependencies on unmodelled complexity u of a suboptimal model under condition (49) of Proposition 4. Note also that the simplified prediction matrix is given by $\tilde{Y} = A\tilde{G}' + BV_2^\top$; however, the actual values of A and B are not needed.

6.2.5. Summary. This section has introduced a calibration and prediction scheme that is a generalization of the typically used truncated SVD scheme. Furthermore, it has been demonstrated that the scheme is guaranteed to produce a conservative prediction posterior when the unmodelled complexity is hidden behind the data. The general structure of a prediction problem that obeys this condition is depicted graphically in Figure 8. This represents the probability densities that define the data driven scheme and includes the dependency structure of the predictions (as required by condition (49) and depicted in Figure 7), as well as the structure of the inflated prior covariance matrix $\hat{\Sigma}^d_v$ (as specified in (46)).

The important contribution here is the derived structural condition between the data, predictions and simplifications that determines when it is adequate to use the data driven scheme, with a simplified model, to assess predictive uncertainty. It is noted that, unlike the optimally compensated scheme, this scheme can be used without needing the explicit numerical values within the data and prediction matrices of the high fidelity reference model. It is noted, however, that structural information from them is still required to ensure Proposition 4 is met.
Although this still may not be straightforward in practice, structural considerations are generally significantly easier to handle than numerical ones.

7. Discussion and Conclusions. This paper has considered what constitutes a simplified, but useful, model. In particular, it has examined how simplified models can be used to combine data and expert knowledge within a calibration or inversion process to generate a prediction with a conservative estimate of uncertainty. The concept of an optimal simplified model was defined, which determines when standard probabilistic calibration methods are adequate to quantify predictive uncertainty. The main contribution is the introduction of two new calibration and prediction schemes, along with conditions that explicitly define when they are appropriate for generating a conservative estimate of uncertainty with suboptimal models. These conditions explicitly relate the nature of the calibration data, the predictions of interest, and the simplifications within the model.

The first scheme allows the optimal posterior distribution to be generated, the simplifications being overcome by adjusting how the beliefs of the modeller are used to define a prior term. The scheme is only applicable to highly parameterized models. Furthermore, it requires the prior covariance for the parameters of the simple model to be generated using the data and prediction matrices of a high fidelity reference model. This is a significant limitation, as if these matrices could be generated for a practical problem, they could be used directly to produce a prediction posterior without the need to consider a simplified model. Nevertheless, the value of this scheme lies in defining the ideal calibration process for a simplified model, and it demonstrates that it is possible to overcome suboptimal simplifications through the judicious selection of a prior regularization matrix.

The second, data driven, scheme is designed for predictions that are strongly related to the data, such that the unmodelled complexity affects both in the same way. This scheme does not require the data and prediction matrices of the high fidelity model to be available for model calibration in the way the optimally compensated scheme does. However, it is only guaranteed to be conservative if the predictions, data and simplification have the required structural form. The key insight provided by this scheme is an understanding of how model simplifications can be overcome with the use of the right calibration data.

It was also demonstrated that this data driven scheme is a generalization of the popular truncated singular value decomposition inversion scheme [2]. The generalization allows greater flexibility in filtering the data to ensure that the predictions are of the appropriate form for a given simplified model.

Both calibration schemes have been applied to a prototypical groundwater prediction problem in Appendix A.

Finally, it is noted that each of the two newly defined calibration and prediction schemes has conditions that are linked to the data and prediction matrices of a high fidelity reference model. For any practical problem it is unlikely that these conditions can be directly assessed, and further subjective judgment will be needed. To make this process easier, two areas of further work can be pursued.
Firstly, more synthetic experimental analyses are required to demonstrate how the two schemes can be applied in more complex problems, e.g. as in [51]. The second area is to understand the effect of partial non-satisfaction of these conditions, and to determine how the schemes can be made robust to this.

It is also noted that the use of Bayesian networks, for instance as depicted in Figure 10, may be of great benefit in determining when the required structural conditions are likely to be obeyed, for instance, when a given simplified model is optimal, or when it is appropriate to use the data driven scheme. Such structural considerations may also help frame the arguments put forward in modelling projects (e.g. performed within environmental impact assessment studies) as to why a given modelling and calibration approach is adequate for a given prediction problem. For instance, these arguments can be framed using a two stage process. The first stage may conceptually link the complex system to an optimally simplified model; this stage would consider the features and characteristics of the system that ideally should be modelled. The second stage may then put forward arguments as to how any further simplifications used to produce a suboptimal numerical model will be handled by the calibration process.

Lastly, several other important areas of further research are identified:
- Nonlinear simulators. The analysis within this paper has required a linear relationship between the system properties and the data and predictions. This has allowed generic insights to be obtained, but is a significant limitation, and further work is needed to relax it. One approach is to consider higher order expansions of the models such that some of the nonlinearities can be included; for example, second order expansions have been considered in [8, 31, 12]. Alternatively, more direct probabilistic formulations may also be possible that, e.g., generalize the data driven scheme and only exploit the structural constraints within the problem.
- Nonlinear parameterizations. In the developed analytical framework, only linear relationships between the parameters of the high fidelity model and the simplified model were considered. This should be extended to consider nonlinear parameterizations.
- Over constrained calibration problems. The results obtained for the optimally compensated prediction scheme required that the simplified model is under constrained by the data, and they provide no insight to aid the calibration of over constrained problems. Nevertheless, other approaches may be possible that reproduce the optimal, or at least a conservative, result. Similar modifications may also be possible for the data driven scheme.
- Conservativeness. This paper has employed a definition of conservativeness based on the mean and covariance of the model predictions. Generalizations of this to non-Gaussian distributions have been considered in e.g. [3, 1]. However, it is perhaps more appropriate to consider a generalization of conservativeness that explicitly includes the subsequent decision problem (i.e. engineering design or environmental management). This can be performed using a decision theoretic approach [5] that includes the utility function of this subsequent decision problem and defines an approximate density $\hat{p}(p)$ as conservative if the expected utility is not overestimated, e.g. $E_{\hat{p}(p)}\{U(a,p)\} \le E_{p(p)}\{U(a,p)\}$ for all $a$.
Here $U(a,p)$ is the utility function that encodes the gain under action $a$ when the consequence $p$ occurs. Such a generalization would also allow the incorporation of the decision problem into how modelling and calibration should be performed. It is noted that the above generalization is equivalent to Definition 4 when $U(a,p)$ has the form of a negative weighted squared difference, i.e. $U(a,p) = -(a-p)^\top A(a-p)$ for an arbitrary positive semidefinite matrix $A$.

Appendix A. Example: Groundwater Head Prediction.

The two newly defined calibration and prediction schemes are now applied to a prototypical groundwater prediction problem similar to that considered in [51]. The scenario is depicted in Figure 9 and consists of a 1D confined aquifer with a single observation well. Of interest is the predicted hydraulic head within the aquifer to the right of the observation well. It is noted that this is a very rudimentary problem; however, the objective here is to demonstrate the differences between the schemes and how the conditions that guarantee conservativeness relate to a specific problem.

FIG. 9. Groundwater prediction problem of interest. The aquifer is considered to be unit width into the page.

A.1. Prior Beliefs. For this example, the expert modeller has the following beliefs about the system (taken from [51] where possible):
1. A constant head boundary condition is believed to exist at the left side, corresponding to a discharge location. The head at this location, $h_0$, is believed to be normally distributed with mean 1.0 m and standard deviation 0.75 m above the upper confining layer.
2. No areal recharge or leakage is believed to take place within the domain of interest.
3. The thickness of the aquifer, $b$, is known to be constant over the domain, with value 10 m.
4. The rate $q$ of water flowing through the aquifer is known to be 0.5 m³/day.
5. The system is believed to be in steady state. Furthermore, a numerical simulation of the system with cell length $\ell = 10$ m is believed to be adequate to capture the spatial variation in the head field that is of interest.
6. The hydraulic conductivity of the aquifer is believed to be heterogeneous, with a mean value of 2.5 m/day. A set of 10 cells, each of length $\ell = 10$ m, is considered adequate to describe the heterogeneity, where the hydraulic conductivity of each cell, $K_i$, is believed to be log-normally distributed such that $\log_{10}K_i$ has mean $\log_{10}2.5 \approx 0.398$, and a spatial correlation described by an exponential variogram with sill $0.1 \approx (0.316)^2$ and range 300 m.
7. The error in the data acquisition method is believed to be small and normally distributed, with mean zero and standard deviation 0.1 m.

It is noted that these beliefs capture what is believed to realistically describe the system (or at least an optimally simplified model). Further simplifying assumptions will be considered in Appendix A.4.

Based on the above prior knowledge, the system properties can be defined as the vector that captures the constant boundary head and the set of hydraulic conductivities

(56)  $x = [h_0, \log_{10}(K_1), \ldots, \log_{10}(K_{10})]^\top$.

Furthermore, the uncertainty in the system properties is fully captured by the mean and covariance matrix that define the Gaussian distribution $p(x) = \mathcal{N}(x;\mu_x,\Sigma_x)$.
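A sketch of how $\mu_x$ and $\Sigma_x$ could be assembled from these beliefs follows. Two conventions are assumed here rather than stated in the text: the exponential variogram is interpreted with the practical-range convention, $C(h) = \mathrm{sill}\cdot\exp(-3h/\mathrm{range})$ (a convention that appears consistent with the prior covariance values quoted later in (60)), and the boundary head $h_0$ is taken to be uncorrelated with the conductivities.

```python
import numpy as np

# Prior for x = [h0, log10 K_1, ..., log10 K_10] built from the stated beliefs.
n_cells, ell = 10, 10.0
centers = ell * (np.arange(n_cells) + 0.5)          # cell centres (m)
sill, vrange = 0.1, 300.0
h = np.abs(centers[:, None] - centers[None, :])     # separation distances
# Assumed practical-range convention for the exponential variogram.
Sig_K = sill * np.exp(-3.0 * h / vrange)            # log-conductivity covariance

mu_x = np.concatenate([[1.0], np.full(n_cells, np.log10(2.5))])
Sig_x = np.zeros((n_cells + 1, n_cells + 1))
Sig_x[0, 0] = 0.75**2                               # boundary head variance
Sig_x[1:, 1:] = Sig_K                               # h0 assumed independent of K
```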
To define the data and prediction likelihood functions, consider the vector $x$ as known and define the means with the following nonlinear functions, corresponding to the finite difference solution to Darcy's equations [4]:

(57)  $G(x) = h_0 + \sum_{i=1}^{5}\dfrac{q\ell}{bK_i}$,

(58)  $Y(x) = h_0 + \sum_{i=1}^{10}\dfrac{q\ell}{bK_i}$

(59)  $\phantom{Y(x)} = G(x) + \sum_{i=6}^{10}\dfrac{q\ell}{bK_i}$.

Note that the variables $\ell$, $b$ and $q$ are all known constants. Furthermore, as the numerical simulation at this discretization is believed to be adequate, the data error $\delta$ is completely captured by errors in the measurement process, such that $\Sigma_\delta = (0.1)^2$. In addition, the prediction error variance is taken to be zero, $\Sigma_\rho = 0$. It is noted that the prediction equation has been rewritten on line (59) to explicitly include the data equation as an additive term. This form will be important when judging whether the data driven scheme is appropriate for a particular system simplification.

A.2. Data Generation. The measured head at the observation well is $d = 2.5$ m. This is generated by the data equation $d = G(x_t)$ with no added measurement error. The system properties $x_t$ are the same as the prior mean $\mu_x$, with the exception that the boundary head is set to $h_0 = 1.5$ m. This difference corresponds to $2/3$ of the prior standard deviation.

A.3. Linearized Solution. From the nonlinear functions $G(x)$ and $Y(x)$, a pair of linear functions are constructed by linearizing about the prior mean with a first order Taylor expansion [30, 10]: $G(x) \approx G(\mu_x) + \nabla_x G(\mu_x)[x - \mu_x]$, and similarly for $Y$. Furthermore, a transform into zero mean increments is performed, such that $\Delta x = x - \mu_x$, $\Delta d = d - G(\mu_x)$, and $\Delta p = p - Y(\mu_x)$. Under the linearized approximation, the data and prediction equations become

$\Delta d \approx G\Delta x + \delta$,  $\Delta p \approx Y\Delta x + \rho$,

where the data and prediction matrices are defined by the Jacobian matrices evaluated at the prior mean, $G = \nabla_x G(\mu_x)$ and $Y = \nabla_x Y(\mu_x)$.

The solutions to the full nonlinear prediction problem and the approximate linearized version defined above are given in Figure 11(a). The non-Gaussian posterior density of the nonlinear problem has been produced using a standard Metropolis MCMC algorithm [9] to generate samples that are then smoothed using a kernel density estimator [41]. Overlaid with this is the Gaussian posterior produced from the linearized problem. It is noted that the posterior under the nonlinear equations is slightly more peaked and non-symmetric when compared to the posterior generated by the linearized scheme.

It is noted that the nonlinearity is only present because the log conductivity is considered to be normally distributed. If, for instance, the hydraulic resistance (inverse conductivity) were represented directly, the equations would become linear. However, this is not pursued here, and the error caused by assuming a linear approximation of the nonlinear equations is considered out of scope.
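The Jacobians in A.3 follow from the analytic derivative $d(1/K)/d(\log_{10}K) = -\ln(10)/K$. A minimal sketch of the linearization, with a finite-difference spot-check, is given below; the function name `head` and the layout of the code are illustrative choices, not from the original text.

```python
import numpy as np

# Linearization of the 1D Darcy finite-difference equations (57)-(59)
# about the prior mean.
q, ell, b = 0.5, 10.0, 10.0
mu_logK = np.full(10, np.log10(2.5))
K = 10.0**mu_logK

def head(x, n):
    """Head after n cells: h0 + sum_i q*ell/(b*K_i), with x = [h0, log10 K]."""
    return x[0] + np.sum(q * ell / (b * 10.0**x[1:n + 1]))

mu_x = np.concatenate([[1.0], mu_logK])

# Jacobians G = dG/dx and Y = dY/dx evaluated at the prior mean.
dK = -np.log(10.0) * q * ell / (b * K)          # d head / d log10(K_i) per cell
G = np.concatenate([[1.0], np.where(np.arange(10) < 5, dK, 0.0)])[None, :]
Y = np.concatenate([[1.0], dK])[None, :]

# Finite-difference spot-check of one Jacobian entry.
e = np.zeros(11); e[3] = 1e-6
print(np.isclose((head(mu_x + e, 5) - head(mu_x, 5)) / 1e-6, G[0, 3]))  # True
```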
A.4. Assumed Simplifications. Consider the following simplifying assumptions:
A1: The constant head boundary condition is assumed known and set to the prior value of 1.0 m.
A2: Cells 1-5 are grouped together and assumed to be a homogeneous unit. This set of cells will be referred to as Zone A.
A3: Cells 6-10 are grouped together and assumed to be a homogeneous unit. This set of cells will be referred to as Zone B.

This enables the parameters of the simple model to be defined by the vector of log conductivities of the two zones,

$v = [\log_{10}K_A, \log_{10}K_B]^\top$.

This simplification scheme corresponds to a $C$ matrix whose first row (for $h_0$) is zero, whose rows 2-6 select $\log_{10}K_A$, and whose rows 7-11 select $\log_{10}K_B$:

$C = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix}$.

The use of increments allows the state of the high fidelity model to be defined in terms of an increment $\Delta v$, such that $\Delta x = C\Delta v$. Furthermore, the simplified data equation becomes $\tilde{G}(\Delta v) = G(\mu_x + C\Delta v)$, and similarly for the prediction equation. Now, the linearized forms of the simplified data and prediction equations become

$\Delta d \approx \tilde{G}\Delta v + \delta$,  $\Delta p \approx \tilde{Y}\Delta v + \rho$.

Here the simplified data and prediction matrices can be written in terms of the high fidelity model as $\tilde{G} = GC$ and $\tilde{Y} = YC$, where $G$ and $Y$ are the Jacobian matrices of the high fidelity model, $G = \nabla_x G(\mu_x)$ and $Y = \nabla_x Y(\mu_x)$.

It is now of interest to determine whether this simplification is optimal. If it is, the naive prediction scheme used with the simplified model will produce the same results as the optimal scheme applied to the high fidelity model. For the simplification to be optimal, the unmodelled complexity should have no effect on the data or predictions. This occurs when the following conditions hold:

$GU_{\bar C} = 0$ and $YU_{\bar C} = 0$,

where the columns of $U_{\bar C}$ define an orthonormal basis for the cokernel of $C$, as defined in (17). The unmodelled complexity consists of $h_0$ and the two 4-dimensional parameter vectors that describe the small scale heterogeneity in the conductivity of the two zones. It is noted that this condition does not hold, and thus the simplification is not optimal. Furthermore, the failure of this condition to hold is completely due to assumption A1, which considers the boundary condition $h_0$ to be known. If this assumption were removed and $h_0$ included in the simplified model, while at the same time retaining the homogeneity assumptions A2 and A3, then the simplification would become optimal (under the linearization considered). This optimal simplification will not be considered, and it is expected that the naive method will not be conservative.

FIG. 10. Bayesian networks depicting the structure of the problem with (a) the reference model and (b) the simplified model. The unmodelled complexity in the simplified model includes the boundary head $h_0$ and the small scale complexity within each zone, denoted by the two random vectors $K^-_A$ and $K^-_B$, each with four elements. The dependencies these have are denoted by dashed lines.

Finally, the structure of the high fidelity prediction problem and the simplified problem is depicted graphically with two Bayesian networks in Figure 10. It is noted that the suboptimality of the simplification can be observed in Figure 10(b), as the data and predictions are not conditionally independent of the unmodelled components ($h_0$) given the model parameters ($K_A$ and $K_B$).
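The optimality check is a one-line computation once $U_{\bar C}$ is available. The sketch below builds $C$ for assumptions A1-A3, recomputes the Jacobians so that it stands alone, and evaluates $GU_{\bar C}$ and $YU_{\bar C}$; consistent with the text, the nonzero entries come entirely from the $h_0$ direction.

```python
import numpy as np
from scipy.linalg import null_space

# Simplification matrix C for assumptions A1-A3 (11 fine parameters -> 2 zones).
C = np.zeros((11, 2))
C[1:6, 0] = 1.0      # cells 1-5 -> Zone A
C[6:11, 1] = 1.0     # cells 6-10 -> Zone B
U_cbar = null_space(C.T)   # orthonormal basis for the cokernel of C (11 x 9)

# Jacobians from the linearization of (57)-(58), recomputed for self-containment.
q, ell, b, K = 0.5, 10.0, 10.0, 2.5
dK = -np.log(10.0) * q * ell / (b * K)
G = np.concatenate([[1.0], np.full(5, dK), np.zeros(5)])[None, :]
Y = np.concatenate([[1.0], np.full(10, dK)])[None, :]

# Optimality check: both must vanish for an optimal simplification. The
# within-zone zero-sum directions contribute nothing (G and Y are constant
# within each zone), so any violation is due to the known-h0 assumption A1.
print(np.abs(G @ U_cbar).max(), np.abs(Y @ U_cbar).max())  # nonzero -> suboptimal
```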
A.5. Prediction Results. Here, the performance of the two new calibration and prediction schemes is considered, with the resulting posterior density functions for the predicted head displayed in Figure 11(b). The results are also compared with the naive scheme, which represents a typical probabilistic calibration scheme for a simplified model. It is important to note that no "ground truth" value for the prediction is given for comparison purposes, as the objective is to produce the full probability distribution and not to make a single point prediction.

A.5.1. Naive Scheme. The naive scheme considers the uncertainty in the parameters included in the simple model, but ignores any errors introduced by the simplification. As the simplification is suboptimal, it should not be expected that the prediction posterior will be conservative. The results in Figure 11(b) demonstrate that this is true for this scenario, as the posterior is overly narrow.

To understand the cause of the overconfidence in this scenario, consider the prior covariance in the parameters $\Delta v$, as defined in Proposition 1:

(60)  $\Sigma_v = C^\dagger\Sigma_x C^{\dagger\top} \approx 10^{-2}\begin{bmatrix} 8.58 & 6.19 \\ 6.19 & 8.58 \end{bmatrix}$.

FIG. 11. Prediction probability density functions under (a) the reference model and (b) the simplified model. The optimally compensated scheme in (b) reproduces the posterior distribution of the linearized version of the reference model in (a). The naive scheme is non-conservative and underestimates the uncertainty, while the data driven scheme is conservative.

Due to the large range of 300 m used for the variogram of the log conductivity field, the correlation between the two parameters maintained by the simple model is considerable. In addition, the known flow rate and the assumption of a known boundary head $h_0$ allow the head measurement to be used to very accurately, but incorrectly, estimate the conductivity $K_A$. Furthermore, due to the large correlation that $K_A$ has with $K_B$, this scheme produces an overconfident probability density function for the parameter $K_B$, which in turn causes the predicted head to be overconfident and non-conservative.

A.5.2. Optimally Compensated Scheme. Now consider the optimally compensated calibration and prediction scheme. Firstly, it is noted that the simplified problem is highly parameterized, as $D_v \ge D_p + D_d$, with $D_v = 2$ and $D_p = D_d = 1$. The major difference between the naive scheme and the optimally compensated scheme is the modification of the prior covariance used for the parameters of the simple model. In particular, the modified prior of this scheme is not just defined in terms of the simplification matrix, but also in terms of the data and prediction matrices, via the matrix $R = \tilde{Z}^\dagger Z$, where

$Z = \begin{bmatrix} G \\ Y \end{bmatrix}$ and $\tilde{Z} = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$.

The optimally compensating prior covariance becomes

$\hat{\Sigma}^o_v = R\Sigma_x R^\top \approx 10^{-2}\begin{bmatrix} 27.44 & 6.19 \\ 6.19 & 8.58 \end{bmatrix}$.

It is noted that the only difference between the values of the naive prior covariance $\Sigma_v$ and $\hat{\Sigma}^o_v$ is the inflation of the marginal variance of the first component, corresponding to $\log K_A$. This causes less information to be propagated from the data into $\log K_B$, and enables the prediction posterior to replicate the results of the high fidelity model. The key difficulty with this scheme is that it requires the data and prediction matrices of the high fidelity model in order to produce the inflated prior covariance matrix $\hat{\Sigma}^o_v$ that compensates for the suboptimal simplifications.
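A sketch assembling both priors from the earlier assumptions is given below. Under the variogram convention assumed in A.1, the naive prior reproduces the values in (60); the compensated prior shows the same structure (only the $\log K_A$ variance is inflated), though the exact inflated value depends on linearization constants that are not fully recoverable from the text, so the printed number may differ from the quoted 27.44.

```python
import numpy as np

# Prior of the high fidelity model (assumed practical-range variogram, h0
# independent of K, as in the earlier snippets).
n, ell = 10, 10.0
centers = ell * (np.arange(n) + 0.5)
Sig_K = 0.1 * np.exp(-3.0 * np.abs(centers[:, None] - centers[None, :]) / 300.0)
Sig_x = np.zeros((n + 1, n + 1))
Sig_x[0, 0] = 0.75**2
Sig_x[1:, 1:] = Sig_K

# Jacobians and simplification matrix, as before.
q, b, K = 0.5, 10.0, 2.5
dK = -np.log(10.0) * q * ell / (b * K)
G = np.concatenate([[1.0], np.full(5, dK), np.zeros(5)])[None, :]
Y = np.concatenate([[1.0], np.full(10, dK)])[None, :]
C = np.zeros((11, 2)); C[1:6, 0] = 1.0; C[6:11, 1] = 1.0
Cd = np.linalg.pinv(C)

Sig_v = Cd @ Sig_x @ Cd.T                     # naive prior, cf. (60)

Z = np.vstack([G, Y]); Zt = Z @ C
R = np.linalg.pinv(Zt) @ Z
Sig_ov = R @ Sig_x @ R.T                      # compensated prior: only the
print(np.round(Sig_v * 100, 2))               # K_A marginal variance inflates
print(np.round(Sig_ov * 100, 2))
```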
A.5.3. Data Driven Scheme. To apply the data driven calibration and prediction scheme, it is first of interest to determine whether it is appropriate for the problem; in particular, does the prediction have the right structural relationship with the data and the simplifications, as specified in Proposition 4? This will be checked first using the algebraic expression, and then using the structural characteristics of the network depicted in Figure 10(b).

The required algebraic condition is that, for some $A$ and $B$, the prediction matrix can be written as

(61)  $Y = AFG + BV_2^\top C^\dagger$.

For this problem the data is scalar, and thus the filtering matrix will be set to unity, $F = 1$. Also, from the prediction equation in (59), it is clear that the prediction matrix $Y$ is equal to the data matrix $G$ plus an additional term, which will be denoted by $T$. This allows the Jacobian matrix $Y$ to be written as

(62)  $Y = G + T$.

Thus, the matrix $A$ in (61) will be set to unity, $A = 1$. Now it is of interest to consider the second term in (61). For the prediction to have the required form, the matrix $T = Y - G$ must be equivalent to $BV_2^\top C^\dagger$, and thus must satisfy two conditions:

(63)  $(Y - G)U_{\bar C} = 0$,
(64)  $(Y - G)CV_1 = (\tilde{Y} - \tilde{G})V_1 = 0$.

The first condition, (63), requires the difference between the data and the prediction to be independent of the unmodelled complexity. For this scenario this can be checked algebraically using the high fidelity prediction and data matrices, and it is satisfied. Now consider the second condition, (64); this requires the difference between the data and the prediction to be independent of the rowspace components of $\tilde{G}$. This can be easily checked using just the simplified data and prediction matrices, and it is also satisfied. Thus, it is guaranteed that the data driven scheme is conservative for this problem.

As an alternative to this algebraic check, the conditional independence requirements can be checked using the structure of the Bayesian network. In particular, the requirement of (61) is equivalent to the requirement that the prediction $p = h_{10}$ is conditionally independent of all unmodelled components, $u = [h_0, K^-_A, K^-_B]$, given the uncorrupted data $g = h_5$ and the nullspace component of the parameter vector, $v_2 = K_B$. From the structure of the network, these two nodes are the only parents of the prediction, and thus the required independence is satisfied. This graphical approach provides much greater intuition into when the requirements are met.

When the data driven scheme is applied, it is noted that the available prior knowledge is modified in two ways. Firstly, the prior information on $K_A$ is ignored, and this parameter is estimated from the data only. Secondly, the correlation between $K_A$ and $K_B$ is also ignored; this causes the posterior over the parameter $K_B$ to be the same as the prior. These modifications are encapsulated in the data driven prior covariance matrix

$\hat{\Sigma}^d_v = \lim_{\alpha\to\infty}\ \alpha V_1 V_1^\top + V_2 V_2^\top\Sigma_v V_2 V_2^\top = 10^{-2}\begin{bmatrix} \infty & 0 \\ 0 & 8.58 \end{bmatrix}$.

The posterior generated by this scheme is conservative and is depicted in Figure 11(b). It is noted that the posterior is slightly shifted relative to that produced by the optimal scheme, and it has a larger variance.
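The algebraic checks (63) and (64) take only a few lines. The sketch below reuses the Jacobians and simplification matrix from the earlier snippets (illustrative reconstructions, not the authors' code) and confirms both conditions hold for this problem.

```python
import numpy as np
from scipy.linalg import null_space

# Structural check for the data driven scheme: conditions (63) and (64).
q, ell, b, K = 0.5, 10.0, 10.0, 2.5
dK = -np.log(10.0) * q * ell / (b * K)
G = np.concatenate([[1.0], np.full(5, dK), np.zeros(5)])[None, :]
Y = np.concatenate([[1.0], np.full(10, dK)])[None, :]
C = np.zeros((11, 2)); C[1:6, 0] = 1.0; C[6:11, 1] = 1.0

U_cbar = null_space(C.T)                   # cokernel basis of C
Gt, Yt = G @ C, Y @ C                      # simplified matrices
_, _, Vt = np.linalg.svd(Gt)
V1 = Vt[:1].T                              # rowspace of G-tilde (the K_A direction)

T = Y - G
print(np.allclose(T @ U_cbar, 0))          # condition (63): True
print(np.allclose((Yt - Gt) @ V1, 0))      # condition (64): True
```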
A high fidelity model was considered to represent the true beliefs of a modeller, and a suboptimal simplification was developed to represent a computational model.

The key difference between the three schemes is in the specification of the prior covariance matrix for the parameters of the simplified model. In particular, the naive scheme generates this by directly projecting the prior distribution that exists on the parameters of the high fidelity model onto the parameters of the simplified model. The optimally compensated scheme performs the projection using full knowledge of the data and prediction matrices of the high fidelity model. Lastly, the data driven scheme uses the same matrix as the naive scheme but throws away some of the information it contains.

The conditions under which the schemes are conservative (or optimal) have also been highlighted. The naive scheme should only be applied to optimally simplified models; this condition was not satisfied here, and the posterior was shown to be overconfident and non-conservative. The optimally compensated scheme should only be applied to highly parameterized models. And the data driven scheme is conservative only when the predictions and data have a similar dependency on the unmodelled complexity. These last two conditions were shown to hold for the problem considered, and the generated posterior distributions were conservative.

Appendix B. Proof of Proposition 1.

Proof. For a given simplification matrix, the parameters of the high fidelity model, $x$, can be expanded using (18) as

$x = Cv + U_{\bar C}u = [C\ U_{\bar C}]\begin{bmatrix} v \\ u \end{bmatrix}$.

Furthermore, from the decomposition in (17), the matrix $[C\ U_{\bar C}]$ can be rewritten as

$[C\ U_{\bar C}] = [U_C\ U_{\bar C}]\begin{bmatrix} S_C V_C^\top & 0 \\ 0 & I \end{bmatrix}$.

This is nonsingular and has an inverse that simplifies to

$[C\ U_{\bar C}]^{-1} = \begin{bmatrix} C^\dagger \\ U_{\bar C}^\top \end{bmatrix}$,

where $C^\dagger = V_C S_C^{-1} U_C^\top$. Thus, the transformed covariance matrix in the space of $[v\ u]$ has the form

$\begin{bmatrix} \Sigma_v & \Sigma_{vu} \\ \Sigma_{vu}^\top & \Sigma_u \end{bmatrix} = \begin{bmatrix} C^\dagger \\ U_{\bar C}^\top \end{bmatrix}\Sigma_x\begin{bmatrix} C^{\dagger\top} & U_{\bar C} \end{bmatrix}$.

Appendix C. Proof of Proposition 2.

Proof. Under the conditions of part (1), the matrix $C$ denotes an optimal simplification, and thus $GU_{\bar C} = YU_{\bar C} = 0$, where $U_{\bar C}$ is an orthonormal basis for the cokernel of $C$, e.g. as given in (17). Using the expansion of the parameter vector, $x = Cv + U_{\bar C}u$, the covariance $\Sigma_x$ becomes

(65)  $\Sigma_x = C\Sigma_v C^\top + U_{\bar C}\Sigma_u U_{\bar C}^\top + C\Sigma_{vu}U_{\bar C}^\top + U_{\bar C}\Sigma_{vu}^\top C^\top$.

Substituting (65) into the posterior mean and covariance of the optimal scheme, defined in (10) and (11), and noting that $GU_{\bar C} = YU_{\bar C} = 0$, the following forms are obtained:

$\mu_{p|d} = YC\Sigma_v C^\top G^\top(GC\Sigma_v C^\top G^\top + \Sigma_\delta)^{-1}d$,
$\Sigma_{p|d} = YC\Sigma_v C^\top Y^\top + \Sigma_\rho - YC\Sigma_v C^\top G^\top(GC\Sigma_v C^\top G^\top + \Sigma_\delta)^{-1}GC\Sigma_v C^\top Y^\top$.

Noting that $\tilde{G} = GC$ and $\tilde{Y} = YC$, these expressions are equivalent to the mean and covariance of the naive scheme, i.e. $\hat{\mu}^n_{p|d} = \mu_{p|d}$ and $\hat{\Sigma}^n_{p|d} = \Sigma_{p|d}$ for all $d$. This proves part (1) of the proposition.

To proceed to parts (2) and (3): for the naive scheme to be conservative, the mean and covariance of the posterior must obey (30), which simplifies to

(66)  $\hat{\Sigma}^n_{p|d} \succeq E_{p(d,p)}\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\}$.

Also, the expectation over $d, p$ is equivalent to an expectation over the independent variables $x, \delta, \rho$, where $d = Gx + \delta$ and $p = Yx + \rho$. Thus, to prove part (2) of the proposition, it will now be shown that this holds under the special condition $YU_{\bar C} = \tilde{Y}E^n GU_{\bar C}$, defined in (32).
To start, note that $\hat{\mu}^n_{p|d} = \tilde{Y}E^n d$, $d = Gx + \delta$ and $x = Cv + U_{\bar C}u$; thus the difference $\hat{\mu}^n_{p|d} - p$ can be written as

(67)  $\hat{\mu}^n_{p|d} - p = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]v + \tilde{Y}E^n\delta - \rho + [\tilde{Y}E^n GU_{\bar C} - YU_{\bar C}]u$
(68)  $\phantom{\hat{\mu}^n_{p|d} - p} = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]v + \tilde{Y}E^n\delta - \rho$,

where (68) has used condition (32). Now, as $v = C^\dagger x$, $\delta$ and $\rho$ are all independent and zero mean, the expected squared difference of $\hat{\mu}^n_{p|d} - p$ can be written in terms of the covariance matrices

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]\Sigma_v[\tilde{Y}E^n\tilde{G} - \tilde{Y}]^\top + \tilde{Y}E^n\Sigma_\delta E^{n\top}\tilde{Y}^\top + \Sigma_\rho$.

Expanding the right hand side with the naive estimator matrix $E^n = \Sigma_v\tilde{G}^\top(\tilde{G}\Sigma_v\tilde{G}^\top + \Sigma_\delta)^{-1}$ and simplifying produces the result

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = \tilde{Y}\Sigma_v\tilde{Y}^\top + \Sigma_\rho - \tilde{Y}\Sigma_v\tilde{G}^\top(\tilde{G}\Sigma_v\tilde{G}^\top + \Sigma_\delta)^{-1}\tilde{G}\Sigma_v\tilde{Y}^\top = \hat{\Sigma}^n_{p|d}$.

Thus, the special condition $YU_{\bar C} = \tilde{Y}E^n GU_{\bar C}$ is sufficient for (66) to be satisfied (with equality), ensuring the naive scheme is conservative. This proves part (2).

To prove part (3), consider equation (67) above and assume $v$ and $u$ are independent and $\tilde{Y}E^n GU_{\bar C} \ne YU_{\bar C}$; then the expected squared difference of $\hat{\mu}^n_{p|d} - p$ becomes

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]\Sigma_v[\tilde{Y}E^n\tilde{G} - \tilde{Y}]^\top + \tilde{Y}E^n\Sigma_\delta E^{n\top}\tilde{Y}^\top + \Sigma_\rho + [\tilde{Y}E^n GU_{\bar C} - YU_{\bar C}]\Sigma_u[\tilde{Y}E^n GU_{\bar C} - YU_{\bar C}]^\top$.

Defining $M = \tilde{Y}E^n GU_{\bar C} - YU_{\bar C} \ne 0$, this simplifies to

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = \hat{\Sigma}^n_{p|d} + M\Sigma_u M^\top$.

Now, as $M$ is a non-zero matrix and $\Sigma_u$ is a non-zero positive semi-definite covariance matrix, $M\Sigma_u M^\top$ must be non-zero and positive semi-definite. Thus, the required condition (66) does not hold, since $\hat{\Sigma}^n_{p|d} \not\succeq \hat{\Sigma}^n_{p|d} + M\Sigma_u M^\top$, and the scheme is strictly not conservative. This proves part (3).

Appendix D. Proof of Proposition 3.

Proof. To prove optimality of the compensated scheme, it is necessary to show that the conditions defined by (34) are satisfied by the prior covariance matrix $\hat{\Sigma}^o_v$ defined by this scheme. It is noted that these are satisfied when $\tilde{G}\hat{\Sigma}^o_v\tilde{G}^\top = G\Sigma_x G^\top$, $\tilde{Y}\hat{\Sigma}^o_v\tilde{Y}^\top = Y\Sigma_x Y^\top$, and $\tilde{Y}\hat{\Sigma}^o_v\tilde{G}^\top = Y\Sigma_x G^\top$. These conditions can be combined and rewritten as

(69)  $\tilde{Z}\hat{\Sigma}^o_v\tilde{Z}^\top = Z\Sigma_x Z^\top$,

where $Z = \begin{bmatrix} G \\ Y \end{bmatrix}$ and $\tilde{Z} = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$. Now, with the definition $\hat{\Sigma}^o_v = \tilde{Z}^\dagger Z\Sigma_x Z^\top\tilde{Z}^{\dagger\top}$, condition (69) becomes

$\tilde{Z}\tilde{Z}^\dagger Z\Sigma_x Z^\top\tilde{Z}^{\dagger\top}\tilde{Z}^\top = Z\Sigma_x Z^\top$,

which holds when $\tilde{Z}\tilde{Z}^\dagger Z = Z$.

Now, to demonstrate that $\tilde{Z}\tilde{Z}^\dagger Z = Z$ is satisfied: the condition $\mathrm{rank}(ZC) = \mathrm{rank}(Z)$ implies that $\mathrm{columnspace}(\tilde{Z}) = \mathrm{columnspace}(ZC) = \mathrm{columnspace}(Z)$ [42, 3.16]. Furthermore, it is noted that $\mathrm{columnspace}(\tilde{Z}) = \mathrm{columnspace}(\tilde{Z}^{\dagger\top})$ [42, 7.52(l)], and thus $\mathrm{columnspace}(Z) = \mathrm{columnspace}(\tilde{Z}^{\dagger\top})$. This implies that $Z^\top(I - \tilde{Z}\tilde{Z}^\dagger) = 0$ [42, 2.34]. Furthermore, as $\tilde{Z}$ is real, $(\tilde{Z}\tilde{Z}^\dagger)^\top = \tilde{Z}\tilde{Z}^\dagger$. Thus, taking the transpose of $Z^\top(I - \tilde{Z}\tilde{Z}^\dagger) = 0$ produces $(I - \tilde{Z}\tilde{Z}^\dagger)Z = 0$, and hence $\tilde{Z}\tilde{Z}^\dagger Z = Z$.
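The key identity in this proof can be spot-checked numerically. The sketch below uses arbitrary assumed sizes; with a generic $Z$ and a simplification $C$ for which $\mathrm{rank}(ZC) = \mathrm{rank}(Z)$, the identity $\tilde{Z}\tilde{Z}^\dagger Z = Z$ holds to machine precision.

```python
import numpy as np

# Numerical spot-check of the identity used in the proof of Proposition 3:
# rank(ZC) = rank(Z) implies Zt @ pinv(Zt) @ Z = Z, where Zt = Z @ C.
rng = np.random.default_rng(4)
Z = rng.standard_normal((3, 8))               # stacked [G; Y], generically rank 3
C = np.kron(np.eye(4), np.ones((2, 1)))       # 8 -> 4 simplification
Zt = Z @ C                                    # 3 x 4, generically rank 3 = rank(Z)

print(np.linalg.matrix_rank(Zt) == np.linalg.matrix_rank(Z))   # True
print(np.allclose(Zt @ np.linalg.pinv(Zt) @ Z, Z))             # True
```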
Appendix E. Proof of Proposition 4.

Proof. To start, note that the inverse of the prior covariance matrix $\hat{\Sigma}^d_v$ simplifies as follows:

(70)  $[\hat{\Sigma}^d_v]^{-1} = \lim_{\alpha\to\infty}[V_2 V_2^\top\Sigma_v V_2 V_2^\top + \alpha V_1 V_1^\top]^{-1}$
(71)  $\phantom{[\hat{\Sigma}^d_v]^{-1}} = V_2(V_2^\top\Sigma_v V_2)^{-1}V_2^\top$.

Now, the estimator matrix is defined as $E^d = \hat{\Sigma}^d_v\tilde{G}'^\top(\tilde{G}'\hat{\Sigma}^d_v\tilde{G}'^\top + \Sigma'_\delta)^{-1}$. Using the matrix inversion identity [42, 15.1(a)], the inverse definition in (71), the compact SVD $\tilde{G}' = U_1 S_1 V_1^\top$, and noting that, as $\tilde{G}'$ has full row rank, $U_1^{-1} = U_1^\top$, the estimator matrix simplifies to

$E^d = [[\hat{\Sigma}^d_v]^{-1} + \tilde{G}'^\top\Sigma_\delta'^{-1}\tilde{G}']^{-1}\tilde{G}'^\top\Sigma_\delta'^{-1} = V_1 S_1^{-1}U_1^\top = \tilde{G}'^\dagger$.

This can be used to directly define the mean of the prediction posterior

$\hat{\mu}^d_{p|d} = \mu(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho, d') = \tilde{Y}E^d d'$.

With similar substitutions as above, and using the Woodbury identity [42, 15.3(b)(i)], the covariance of the prediction posterior simplifies to

$\hat{\Sigma}^d_{p|d} = \Sigma(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho)$
$= \tilde{Y}\hat{\Sigma}^d_v\tilde{Y}^\top + \Sigma_\rho - \tilde{Y}\hat{\Sigma}^d_v\tilde{G}'^\top(\tilde{G}'\hat{\Sigma}^d_v\tilde{G}'^\top + \Sigma'_\delta)^{-1}\tilde{G}'\hat{\Sigma}^d_v\tilde{Y}^\top$
$= \tilde{Y}[[\hat{\Sigma}^d_v]^{-1} + \tilde{G}'^\top\Sigma_\delta'^{-1}\tilde{G}']^{-1}\tilde{Y}^\top + \Sigma_\rho$
$= \tilde{Y}[V_2 V_2^\top\Sigma_v V_2 V_2^\top + V_1 S_1^{-1}U_1^\top\Sigma'_\delta U_1 S_1^{-1}V_1^\top]\tilde{Y}^\top + \Sigma_\rho$
$= \tilde{Y}W\Sigma_v W\tilde{Y}^\top + \tilde{Y}E^d\Sigma'_\delta E^{d\top}\tilde{Y}^\top + \Sigma_\rho$,

where $W = V_2 V_2^\top = I - V_1 V_1^\top$ represents a projection onto the nullspace of $\tilde{G}'$. This concludes the proof of part (1) of the proposition.

For the scheme to be conservative, the mean and covariance of the posterior must obey

(72)  $\hat{\Sigma}^d_{p|d} \succeq E_{p(d,p)}\{(p - \hat{\mu}^d_{p|d})(p - \hat{\mu}^d_{p|d})^\top\}$.

Also, the expectation over $d, p$ is equivalent to an expectation over the independent variables $x, \delta, \rho$, where $d = Gx + \delta$ and $p = Yx + \rho$. Now, the difference between the prediction $p = Yx + \rho$ and the mean $\hat{\mu}^d_{p|d}$ simplifies to

$p - \hat{\mu}^d_{p|d} = Yx + \rho - \tilde{Y}E^d Fd$
$= [\tilde{Y} - \tilde{Y}E^d\tilde{G}']v - \tilde{Y}E^d F\delta + \rho + [YU_{\bar C} - \tilde{Y}E^d G'U_{\bar C}]u$
$= \tilde{Y}Wv - \tilde{Y}E^d F\delta + \rho$.

Furthermore, as $v = C^\dagger x$, $\delta$ and $\rho$ are all independent and zero mean, the difference $p - \hat{\mu}^d_{p|d}$ has an expected value of zero and a covariance equivalent to $\hat{\Sigma}^d_{p|d}$:

$E\{(p - \hat{\mu}^d_{p|d})(p - \hat{\mu}^d_{p|d})^\top\} = \tilde{Y}W\Sigma_v W\tilde{Y}^\top + \tilde{Y}E^d\Sigma'_\delta E^{d\top}\tilde{Y}^\top + \Sigma_\rho = \hat{\Sigma}^d_{p|d}$.

Thus, the required condition (72) is satisfied (with equality), and the prediction scheme is conservative. This proves part (2) of the proposition.

REFERENCES

[1] J. Ajgl and M. Šimandl, On conservativeness of posterior density fusion, in 2013 16th International Conference on Information Fusion (FUSION), July 2013, pp. 85–92.
[2] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems, Academic Press, Waltham, MA, 2nd ed., Feb. 2012.
[3] T. Bailey, S. Julier, and G. Agamennoni, On conservative fusion of information with unknown non-Gaussian dependence, in 2012 15th International Conference on Information Fusion (FUSION), July 2012, pp. 1876–1883.
[4] J. Bear and A. H.-D. Cheng, Modeling Groundwater Flow and Contaminant Transport, Springer, Dordrecht; London, 2010.
[5] J. Berger, Statistical Decision Theory and Bayesian Analysis, Springer, New York, 2nd ed., 1985.
[6] K. Beven, On the concept of model structural error, Water Science and Technology, 52 (2005), pp. 167–175.
[7] K. Beven and J. Freer, Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology, Journal of Hydrology, 249 (2001), pp.
11–29, https://doi.org/10.1016/S0022-1694(01)00421-8.
[8] M. J. Box, Bias in nonlinear estimation, Journal of the Royal Statistical Society, Series B, 33 (1971), pp. 171–201.
[9] S. Brooks, Handbook of Markov Chain Monte Carlo, Taylor & Francis, Boca Raton, 2011.
[10] J. Carrera, A. Alcolea, A. Medina, J. Hidalgo, and L. J. Slooten, Inverse problem in hydrogeology, Hydrogeology Journal, 13 (2005), pp. 206–222, https://doi.org/10.1007/s10040-004-0404-7.
[11] M. P. Clark and J. A. Vrugt, Unraveling uncertainties in hydrologic model calibration: Addressing the problem of compensatory parameters, Geophysical Research Letters, 33 (2006), L06406, https://doi.org/10.1029/2005GL025604.
[12] R. L. Cooley and S. Christensen, Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media, Advances in Water Resources, 29 (2006), pp. 639–656, https://doi.org/10.1016/j.advwatres.2005.07.012.
[13] P. S. Craig, M. Goldstein, J. C. Rougier, and A. H. Seheult, Bayesian forecasting for complex systems using computer simulators, Journal of the American Statistical Association, 96 (2001), pp. 717–729, https://doi.org/10.1198/016214501753168370.
[14] J. Doherty and S. Christensen, Use of paired simple and complex models to reduce predictive bias and quantify uncertainty, Water Resources Research, 47 (2011), https://doi.org/10.1029/2011WR010763.
[15] J. Doherty and C. T. Simmons, Groundwater modelling in decision support: reflections on a unified conceptual framework, Hydrogeology Journal, 21 (2013), pp. 1531–1537, https://doi.org/10.1007/s10040-013-1027-7.
[16] J. Doherty and D. Welter, A short exploration of structural noise, Water Resources Research, 46 (2010), https://doi.org/10.1029/2009WR008377.
[17] R. Ferdowsian, D. J. Pannell, C. McCarron, A. Ryder, and L. Crossing, Explaining groundwater hydrographs: separating atypical rainfall events from time trends, Soil Research, 39 (2001), pp. 861–876.
[18] R. A. Freeze, J. Massmann, L. Smith, T. Sperling, and B. James, Hydrogeological decision analysis: 1. A framework, Ground Water, 28 (1990), pp. 738–766, https://doi.org/10.1111/j.1745-6584.1990.tb01989.x.
[19] M. Goldstein and J. Rougier, Probabilistic formulations for transferring inferences from mathematical models to physical systems, SIAM Journal on Scientific Computing, 26 (2004), pp. 467–487, https://doi.org/10.1137/S106482750342670X.
[20] M. Goldstein and J. Rougier, Bayes linear calibrated prediction for complex systems, Journal of the American Statistical Association, 101 (2006), pp. 1132–1143, https://doi.org/10.1198/016214506000000203.
[21] G. C. Goodwin, M. Gevers, and B. Ninness, Quantifying the error in estimated transfer functions with application to model order selection, IEEE Transactions on Automatic Control, 37 (1992), pp. 913–928, https://doi.org/10.1109/9.148344.
[22] G. C. Goodwin and M. E. Salgado, A stochastic embedding approach for quantifying uncertainty in the estimation of restricted complexity models, International Journal of Adaptive Control and Signal Processing, 3 (1989), pp. 333–356, https://doi.org/10.1002/acs.4480030405.
[23] H. V.
Gupta, M. P. Clark, J. A. Vrugt, G. Abramowitz, and M. Ye, Towards a comprehensive assessment of model structural adequacy, Water Resources Research, 48 (2012), W08301, https://doi.org/10.1029/2011WR011044.
[24] R. J. Hunt, J. Doherty, and M. J. Tonkin, Are models too simple? Arguments for increased parameterization, Ground Water, 45 (2007), pp. 254–262, https://doi.org/10.1111/j.1745-6584.2007.00316.x.
[25] M. C. Kennedy and A. O'Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63 (2001), pp. 425–464, https://doi.org/10.1111/1467-9868.00294.
[26] L. Ljung, Model validation and model error modeling, in Proceedings of the Åström Symposium on Control, Lund, Sweden, 1999, pp. 15–42.
[27] L. Ljung, System Identification: Theory for the User, Prentice Hall, Upper Saddle River, NJ, 2nd ed., Jan. 1999.
[28] L. Ljung, Perspectives on system identification, Annual Reviews in Control, 34 (2010), pp. 1–12, https://doi.org/10.1016/j.arcontrol.2009.12.001.
[29] L. Ljung, G. C. Goodwin, and J. C. Agüero, Stochastic embedding revisited: a modern interpretation, 2014.
[30] D. McLaughlin and L. R. Townley, A reassessment of the groundwater inverse problem, Water Resources Research, 32 (1996), pp. 1131–1161, https://doi.org/10.1029/96WR00160.
[31] D. McLaughlin and E. F. Wood, A distributed parameter approach for evaluating the accuracy of groundwater model predictions: 2. Application to groundwater flow, Water Resources Research, 24 (1988), pp. 1048–1060, https://doi.org/10.1029/WR024i007p01048.
[32] C. Moore and J. Doherty, Role of the calibration process in reducing model predictive error, Water Resources Research, 41 (2005), W05020, https://doi.org/10.1029/2004WR003501.
[33] B. Ninness and G. C. Goodwin, Estimation of model quality, Automatica, 31 (1995), pp. 1771–1797, https://doi.org/10.1016/0005-1098(95)00108-7.
[34] E. Poeter, All models are wrong, how do we know which are useful?, Ground Water, 45 (2007), pp. 390–391, https://doi.org/10.1111/j.1745-6584.2007.00350.x.
[35] E. P. Poeter and M. C. Hill, MMA, A Computer Code for Multi-Model Analysis, U.S. Geological Survey Techniques and Methods 6-E3, 2007.
[36] P. Reichert and J. Mieleitner, Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time-dependent parameters, Water Resources Research, 45 (2009), W10402, https://doi.org/10.1029/2009WR007814.
[37] W. Reinelt, A. Garulli, and L. Ljung, Comparing different approaches to model error modeling in robust identification, Automatica, 38 (2002), pp. 787–803, https://doi.org/10.1016/S0005-1098(01)00269-2.
[38] R. Rojas, S. Kahunde, L. Peeters, O. Batelaan, L. Feyen, and A. Dassargues, Application of a multimodel approach to account for conceptual model and scenario uncertainties in groundwater modelling, Journal of Hydrology, 394 (2010), pp. 416–435, https://doi.org/10.1016/j.jhydrol.2010.09.016.
[39] J. Rougier, Probabilistic inference for future climate using an ensemble of climate model evaluations, Climatic Change, 81 (2007), pp. 247–264, https://doi.org/10.1007/s10584-006-9156-9.
[40] J. Rougier and M. Crucifix, Uncertainty in climate science and climate policy, arXiv:1411.6878 [physics], 2014.
[41] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, New York, 1st ed., Aug. 1992.
[42] G. A. F. Seber, A Matrix Handbook for Statisticians, Wiley-Interscience, Hoboken, NJ, 2008.
[43] J. Q. Smith, Bayesian Decision Analysis: Principles and Practice, Cambridge University Press, Sept. 2010.
[44] M. Strong and J. Oakley, When is a model good enough? Deriving the expected value of model improvement via specifying internal model discrepancies, SIAM/ASA Journal on Uncertainty Quantification, 2 (2014), pp. 106–125, https://doi.org/10.1137/120889563.
[45] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1st ed., 2005.
[46] J. R. von Asmuth, K. Maas, M. Bakker, and J. Petersen, Modeling time series of ground water head fluctuations subjected to multiple stresses, Ground Water, 46 (2008), pp. 30–40, https://doi.org/10.1111/j.1745-6584.2007.00382.x.
[47] C. I. Voss, Editor's message: Groundwater modeling fantasies - part 1, adrift in the details, Hydrogeology Journal, 19 (2011), pp. 1281–1284, https://doi.org/10.1007/s10040-011-0789-z.
[48] C. I. Voss, Editor's message: Groundwater modeling fantasies - part 2, down to earth, Hydrogeology Journal, 19 (2011), pp. 1455–1458, https://doi.org/10.1007/s10040-011-0790-6.
[49] J. A. Vrugt, C. G. H. Diks, H. V. Gupta, W. Bouten, and J. M. Verstraten, Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation, Water Resources Research, 41 (2005), W01017, https://doi.org/10.1029/2004WR003059.
[50] T. A. Watson, J. E. Doherty, and S. Christensen, Parameter and predictive outcomes of model simplification, Water Resources Research, 49 (2013), pp. 3952–3977, https://doi.org/10.1002/wrcr.20145.
[51] J. T. White, J. E. Doherty, and J. D. Hughes, Quantifying the predictive consequences of model error with linear subspace analysis, Water Resources Research, 50 (2014), pp. 1152–1173, https://doi.org/10.1002/2013WR014767.