Overcoming model simplifications when quantifying predictive uncertainty
Authors: George M. Mathews, John Vial
Affiliation: Data61, CSIRO, Australia

Abstract. It is generally accepted that all models are wrong -- the difficulty is determining which are useful. Here, a useful model is considered as one that is capable of combining data and expert knowledge, through an inversion or calibration process, to adequately characterize the uncertainty in predictions of interest. This paper derives conditions that specify which simplified models are useful and how they should be calibrated. To start, the notion of an optimal simplification is defined. This relates the model simplifications to the nature of the data and predictions, and determines when a standard probabilistic calibration scheme is capable of accurately characterizing uncertainty. Furthermore, two additional conditions are defined for suboptimal models that determine when the simplifications can be safely ignored. The first allows a suboptimally simplified model to be used in a way that replicates the performance of an optimal model. This is achieved through the judicious selection of a prior term for the calibration process that explicitly includes the nature of the data, predictions and modelling simplifications. The second considers the dependency structure between the predictions and the available data to gain insight into when the simplifications can be overcome by using the right calibration data. Furthermore, the derived conditions are related to the commonly used calibration schemes based on Tikhonov and subspace regularization. To allow concrete insights to be obtained, the analysis is performed under a linear expansion of the model equations and where the predictive uncertainty is characterized via second order moments only.

Key words. uncertainty quantification, model calibration, inverse modelling, model simplification, model inadequacy, structural error, hydrogeology, groundwater.

1. Introduction.

This paper considers the problem of assessing uncertainty in a prediction made for a particular system based on the combination of specific measurement data and expert domain knowledge. The focus of this work is environmental systems, such as river basins and groundwater systems; however, the analysis is likely to be applicable to a much wider set of domains. Correctly addressing such environmental prediction problems is fundamental to ensuring these important systems are managed appropriately and sustainably.

Probabilistic Bayesian methods provide a theoretically consistent set of rules to combine such site-specific data and prior knowledge through the use of a system model. However, these methods often fail when naively applied to a model that does not capture the full complexity of the system. A possible solution is the incorporation of more detail and structural variations in the system model such that more sources of uncertainty are included. However, it is important to admit that this modelling ad infinitum is not a solution, as unlimited resources are never available for detailed model construction and execution, and simplifications must be made, at least at some level.

Deciding on what should be included in a model of a system and what may be ignored is not straightforward. For instance, in groundwater hydrology there is no consensus on what is an appropriate level of parameterization detail, e.g. within the hydraulic conductivity field [24, 48, 47, 51].
In addition to this are the related decisions of what processes should be explicitly represented and what can be ignored; or, alternatively, when it is reasonable to lump many different processes together under a semi-physical or non-physical black box model that directly represents input-output relationships [17, 46]. What is necessary is an understanding of the effects of simplifying assumptions made during model development, and appropriate ways of dealing with them when calibrating the model and generating predictions.

A general, qualitative notion of model adequacy was explored by [23] in terms of the issues faced within the surface water, groundwater, unsaturated zone, and terrestrial hydrometeorology modelling communities. The work considered intermediate stages within the modelling process, from the initial perceptual understanding through to the construction of a computational model, and introduced a pluralistic definition of model adequacy. On one extreme is the engineering viewpoint, which defines a structurally adequate model as one that can reproduce the input-output relationship of the system, with well characterized uncertainties (error models). On the other extreme is the physical science viewpoint, which requires an adequate model to be consistent with the underlying physical system [23].

This paper will consider systems that may be exposed to future disturbances and for which limited data is available. Within these problems a well characterized regression model (the engineering viewpoint) cannot be constructed and some physical insight into the system is required [23]. Furthermore, focus is given to environmental management problems, where the role of a model is to help inform a subsequent decision problem related to risk management, e.g. engineering design [18], groundwater management [15], or climate change [40]. This requires the specification of a probability distribution over the predictions of interest such that decision theoretic methods can be applied to quantify the risk and determine the optimal management strategy [5, 43].

1.1. Related Work.

The issue of mismatch between reality and a numerical representation of a system has received considerable attention from many different perspectives. The presence of simplifications may be identified in the data as additional misfit that is not consistent with measurement errors alone. These additional errors are referred to as: model error [30, 51, 29], model structural error [6], structural noise [16], model inadequacy [25, 23], model discrepancy [19, 44], modelization uncertainties [45], and others.

Within the system identification and control community [27, 28] the concept is often referred to as system under-modelling [33, 37], and two main probabilistic approaches have been developed: stochastic embedding [22, 21, 29] and model error modelling [26]. These allow subjective information about the discrepancy between the transfer functions that describe the dynamics of the real system and that of a numerical model to be explicitly defined and combined with a time series dataset collected from the system. This in turn allows prediction uncertainty to include uncertainty due to modelling errors and measurement errors contained within the data. These approaches generally allow a very flexible black box representation of the system dynamics to be used.
In the statistical modelling community, computational simulators have been considered as an explicit approximation to the real physical system they are attempting to represent. The mismatch is represented as an additional error term and modelled probabilistically, generally with explicit space-time correlation [25, 13, 19, 20, 39, 40]. Due to the computational complexity of the methods for high dimensional models, approximate Bayesian methods have also been developed [20]. Such simulators may also be decomposed and internal error terms included to represent the structural errors within smaller submodels [44].

Other work within hydrology has examined model simplifications and how these affect the calibration and prediction processes. In particular, it has been shown that model parameters that are designed to represent particular physical properties may be forced to undertake surrogate roles to compensate for the simplifications during the model calibration process. This surrogacy has the potential to introduce additional biases into the predictions that are unaccounted for by typical probabilistic methods. This has been analyzed by explicitly modelling the simplification process in [31, 16, 14, 50, 51].

Furthermore, several strategies have been considered to overcome the approximations inherent in a simplified model. These include the generalized likelihood uncertainty estimation (GLUE) method [7], which modifies the data likelihood function such that less information is extracted from the data. For a dynamical system, the assumption of a deterministic dynamic transition function can be relaxed and a stochastic process or transition function used instead [49, 11] (such models are often referred to as data assimilation methods). Similarly, the assumption that the model parameters are time invariant may be relaxed such that they can change over time to better match the observed system behavior [36].

It is noted that many of these methods rely on a noticeable discrepancy between the measured data and model predicted values. However, the adverse effects of model simplifications may not necessarily cause any additional misfit between the data and model output, and thus may go unnoticed during model calibration and prediction [51]. Thus it is critical to have a theoretical understanding of model adequacy that goes beyond data fit.

1.2. Approach and Contributions.

This paper considers a subjective Bayesian framework and formally defines what constitutes an adequate or appropriate model by explicitly considering: (i) the nature of the predictions, (ii) the available data and prior knowledge, (iii) the model simplification strategy, and (iv) the calibration scheme. It is shown that these are intrinsically linked, and that model calibration should take into account the nature of the data, the predictions, and any inherent limitations of the simplified computational model. It is noted that this departs from classical guidance on calibration and inversion that focuses on data misfit only [30, 45, 2].

A dual model approach is used, where a "reference" [31] or "reality" [51] model is used to describe how the system is believed to function. Such a high fidelity reference model has been used in the past to characterize the performance of a simplified model.
Here, the approach is extended to explicitly determine when and how a simplified model can be used to generate an accurate, or at least conservative, estimate of uncertainty in a prediction of interest. To allow concrete insights to be produced, only linear(ized) problems are considered and the beliefs are restricted to be Gaussian, where the second order moment is sufficient to capture uncertainty.

Within this framework, model simplifications are represented as a subspace projection that restricts the flexibility of the simplified model. This prevents the model from being able to fully represent the complexity of how the real system is believed to function. It is shown that if the model simplifications are ignored and standard probabilistic calibration methods are used, the model may generate overconfident and non-conservative predictions. Generally, this is avoided only when the simplification strategy is optimal.

The key contribution of this paper is the characterization of two new calibration and prediction schemes for suboptimally simplified models that avoid this underestimation of uncertainty. The first scheme allows a simplified model to be used in a way that replicates the performance of an optimal scheme through the appropriate modification of the prior, or regularization, term. In the second scheme, focus is given to linking the predictions with the available data, such that insight is gained into when the simplifications can be overcome by gathering the appropriate set of data.

Specifically, this paper fully defines the general problem of assessing uncertainty with a simplified model in Section 2. Section 3 summarizes the optimal benchmark solution based on the high fidelity reference model. The general representation of model simplifications based on linear projection is defined in Section 4. A typical probabilistic calibration scheme is considered in Section 5 that ignores the effects of model simplifications within the calibration and prediction process. This section defines the concept of an optimal simplification that determines when such a scheme is adequate for assessing uncertainty. Section 6 considers suboptimally simplified models and introduces two new calibration and prediction schemes, and defines the conditions under which they are adequate for assessing predictive uncertainty. Discussions and directions for future research are covered in Section 7. A worked example involving a simplified groundwater prediction problem is given in Appendix A.

2. Problem Definition.

Consider a system where a modeller is tasked with generating a probability distribution over a prediction of a specific feature of the system to inform a future decision process. For instance, in a groundwater system, the prediction of interest may be the reduction in water levels that will be experienced by an aquifer due to an increase in future extractions. The prediction of interest will be denoted by the vector $p \in \mathbb{R}^{D_p}$.

To aid the modeller, a limited number of measurements have been made on the system and will be represented by $d \in \mathbb{R}^{D_d}$. Furthermore, it is considered that expert prior knowledge exists on how the system functions, for instance the physical processes involved, and allows the available data and prediction of interest to be related to the properties of the underlying system.
This information shall be considered to define a high fidelity or reference model for the system, with parameters denoted by the vector $x \in \mathbb{R}^{D_x}$. Furthermore, the relationship between the data $d$, predictions $p$, and the underlying parameters $x$ that the modeller believes exists shall be represented by the pair of equations

(1) $d = G(x) + \text{errors}$,
(2) $p = Y(x) + \text{errors}$.

It is considered that these functions include the operation of known physical laws, such as conservation of mass, energy and momentum, while the vector $x$ includes a detailed representation of the forcing terms, initial conditions, material properties, etc. These requirements result in a very high dimensional vector $x$ and complex functions $G$ and $Y$. Furthermore, it is considered that epistemic uncertainty exists as to the value of $x$ for the system under investigation. In addition, the error terms on the right of (1) and (2) allow for the inclusion of additional uncertainty, for instance to capture measurement errors introduced by the data acquisition method, or uncertainty that may exist in how a physical process actually functions [45].

This allows the modeller's complete belief about how the system behaves to be captured by three probability distribution functions

(3) $p(x), \quad p(d \mid x), \quad p(p \mid x)$.

These in turn can be used to generate a posterior distribution over the predictions of interest, denoted as $p(p \mid d)$, using the standard rules of probability theory. It is noted that a major issue with pursuing this type of subjective Bayesian approach is the requirement that these probability distributions must represent every detail that is believed to be present in the physical system. This is not possible, and simplifications must be made.

2.1. Simplifications.

It is considered that the process of modelling, for instance as outlined in [23], transforms the detailed knowledge of the modeller and produces the desired simplified computational representation of the problem that can be solved within the typical computational constraints. Specifically, the output of the modelling process is (or should be) an approximate representation of the modeller's beliefs that can be processed numerically to generate a prediction posterior distribution that is somehow similar to the optimal, but computationally intractable, distribution $p(p \mid d)$ generated by the high fidelity model. The modelling process is depicted in Figure 1, where the outputs are denoted by the set of approximate beliefs

(4) $\hat{p}(v), \quad \hat{p}(d \mid v), \quad \hat{p}(p \mid v)$.

These probability distributions are defined over a simplified description of the system, denoted by the parameter vector $v \in \mathbb{R}^{D_v}$.

[Figure 1. The process of modelling (perceptual model, conceptual model, mathematical model, computational model) transforms the modeller's subjective expert beliefs $p(x), p(d \mid x), p(p \mid x) \to p(p \mid d)$ into the simplified beliefs $\hat{p}(v), \hat{p}(d \mid v), \hat{p}(p \mid v) \to \hat{p}(p \mid d)$ that can be processed numerically to generate a conservative posterior probability distribution over the predictions of interest.]

Furthermore, it is considered that the approximate posterior $\hat{p}(p \mid d)$ should not just be similar to the posterior $p(p \mid d)$; it should be in some sense conservative, such that the uncertainty is not underestimated [15].
Thus, the key question addressed in this paper is now specified as: How can the subjective beliefs of a modeller, defined for the high fidelity reference model, be transformed to generate approximate probabilities that define a simplified computational model, such that the computed prediction posterior $\hat{p}(p \mid d)$ is a conservative approximation of the optimal reference posterior $p(p \mid d)$?

This question explicitly links the typically separate problems of model construction, calibration and prediction. Furthermore, answers will be provided under the restriction that the uncertainties are described by multivariate Gaussian distributions and the dependencies are linear. This will allow more specific and concrete insights to be gained into the effects of simplifications and how they may be overcome. Extensions to more general non-Gaussian and nonlinear systems are left for future work.

3. Optimal Inference and Prediction.

The optimal Bayesian prediction scheme, which does not consider any simplifications, is now briefly reviewed [5, 45, 2]. This is based on the high fidelity reference model. Let the predictions of interest, the available data, and the parameters of the high fidelity model be defined by the random vectors $p \in \mathbb{R}^{D_p}$, $d \in \mathbb{R}^{D_d}$, and $x \in \mathbb{R}^{D_x}$, where $D_p$, $D_d$ and $D_x$ are their respective dimensions. It is considered that the dimension of $x$ is excessively large, while the data is limited and the number of predictions is small (perhaps only one), such that $D_x \gg D_d, D_p$.

Prior knowledge of the parameter vector is represented by the Gaussian probability density function $p(x)$, with mean of zero and known second order moment $\Sigma_x$, such that

(5) $p(x) = \mathcal{N}(x; 0, \Sigma_x)$.

It is noted that the requirement of a zero mean distribution simplifies the analysis and can be met in general with a transformation into increments.

Now, the believed relationship between the system properties $x$ and the data vector $d$ is considered to be linear with coefficient matrix $G$. This can be considered as a linearized approximation of (1); however, the approximation introduced by the linearization is considered out of scope in this paper. In addition, the uncertainty that is believed to exist in this relationship is represented by the error term $\delta$, such that $d = Gx + \delta$. Furthermore, the uncertainty in the value of $\delta$ is represented by a zero mean Gaussian density with known covariance matrix $\Sigma_\delta$. In the simplest case, the matrix $\Sigma_\delta$ represents errors within the measurement process. This allows the modeller's conditional belief of the measured data given the underlying parameters to be defined by the Gaussian density

(6) $p(d \mid x) = \mathcal{N}(d; Gx, \Sigma_\delta)$.

The prediction of interest $p$ is also considered to be linearly dependent on $x$ with a coefficient matrix $Y$, and an additive error denoted by $\rho$, such that $p = Yx + \rho$. In addition, it is considered that the uncertainty in the value of this error is a zero mean Gaussian density with known covariance $\Sigma_\rho$. It is noted that this requires $\delta$ and $\rho$ to be independent; if any correlation exists, it must be included in $x$. Finally, the conditional belief of the prediction given the system properties can now be defined as

(7) $p(p \mid x) = \mathcal{N}(p; Yx, \Sigma_\rho)$.

Note that if the system is believed to be well described by a deterministic model, the covariance of the prediction error could be considered negligible.
The calibration or inversion stage seeks to determine the posterior belief over the parameters $x$ given the available data $d$:

$p(x \mid d) = \dfrac{p(x)\, p(d \mid x)}{\int p(x)\, p(d \mid x)\, dx}$.

Here, the posterior density $p(x \mid d) = \mathcal{N}(x; \mu_{x|d}, \Sigma_{x|d})$ is Gaussian with mean and covariance given by

(8) $\mu_{x|d} = \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} d$,
(9) $\Sigma_{x|d} = \Sigma_x - \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} G \Sigma_x$.

The coefficient matrix of the data $d$ in (8) is sometimes referred to as the optimal estimator, or gain matrix, and will be denoted by $E = \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1}$.

Propagating the posterior belief into the prediction of interest requires integration over the underlying parameters:

$p(p \mid d) = \int p(p \mid x)\, p(x \mid d)\, dx$.

The prediction posterior density $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$ is Gaussian with mean and covariance given by

$\mu_{p|d} = Y \mu_{x|d} = Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} d$,
$\Sigma_{p|d} = Y \Sigma_{x|d} Y^\top + \Sigma_\rho = Y \Sigma_x Y^\top + \Sigma_\rho - Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} G \Sigma_x Y^\top$.

The overall calibration and prediction scheme is now summarized below. This sets the benchmark for the other schemes considered in this paper.

Definition 1 (Optimal Scheme). The optimal scheme is denoted by the set of linear Gaussian probability density functions $p(x)$, $p(d \mid x)$ and $p(p \mid x)$, parameterized by the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, and $\Sigma_\rho$, as defined in (5)-(7). For a given dataset $d$, the posterior belief over the predictions of interest is the Gaussian density $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$, with mean and covariance defined by the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ respectively:

(10) $\mu_{p|d} = \mu(\Sigma_x, G, \Sigma_\delta, Y, \Sigma_\rho, d) = Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} d$,
(11) $\Sigma_{p|d} = \Sigma(\Sigma_x, G, \Sigma_\delta, Y, \Sigma_\rho) = Y \Sigma_x Y^\top + \Sigma_\rho - Y \Sigma_x G^\top (G \Sigma_x G^\top + \Sigma_\delta)^{-1} G \Sigma_x Y^\top$.

Note that the covariance matrix of the posterior is not a function of the data $d$.

[Figure 2. Bayesian network depicting the structure of the densities $p(x)$, $p(d \mid x)$, $p(p \mid x)$ that employ the high fidelity model: the node $x$ (with prior covariance $\Sigma_x$) has arrows to $d$ (via $G$, $\Sigma_\delta$) and to $p$ (via $Y$, $\Sigma_\rho$).]

The structure of the above prediction scheme, encapsulating the prior densities $p(x)$, $p(d \mid x)$, $p(p \mid x)$ based on the high fidelity model, is depicted graphically in Figure 2.
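To make the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ of Definition 1 concrete, the following NumPy sketch implements (10) and (11) directly. It is illustrative only: the variable names mirror the notation above, and the toy problem (dimensions and random matrices) is hypothetical rather than taken from the paper's worked example.

```python
import numpy as np

def posterior_moments(Sigma_x, G, Sigma_delta, Y, Sigma_rho, d):
    """Mean and covariance of the prediction posterior p(p|d),
    i.e. the functions mu(.) and Sigma(.) of Definition 1,
    equations (10) and (11)."""
    S = G @ Sigma_x @ G.T + Sigma_delta            # G Sigma_x G^T + Sigma_delta
    K = np.linalg.solve(S, G @ Sigma_x).T          # Sigma_x G^T S^{-1} (S symmetric)
    mu = Y @ K @ d                                 # equation (10)
    Sigma = (Y @ Sigma_x @ Y.T + Sigma_rho
             - Y @ K @ G @ Sigma_x @ Y.T)          # equation (11)
    return mu, Sigma

# hypothetical toy problem: D_x = 5 parameters, D_d = 3 data, D_p = 1 prediction
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
Sigma_x = A @ A.T                                  # arbitrary SPD prior covariance
G = rng.standard_normal((3, 5))
Y = rng.standard_normal((1, 5))
Sigma_delta, Sigma_rho = 0.1 * np.eye(3), 0.01 * np.eye(1)
d = rng.standard_normal(3)

mu, Sigma = posterior_moments(Sigma_x, G, Sigma_delta, Y, Sigma_rho, d)
```

The same two-argument pattern is reused below: every scheme in this paper differs only in which covariance and system matrices are passed to these two functions.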
4. Model Simplifications.

The simplified computational model is considered to contain an approximate system simulator, with exposed parameters $v$ and a linear structure analogous to that defined for the high fidelity model:

(12) $d \approx \tilde{G} v + \delta$,
(13) $p \approx \tilde{Y} v + \rho$.

Here $\tilde{G}$ and $\tilde{Y}$ are the linear data and prediction matrices of a simplified system simulator. The exact form of the approximations within these relationships is of critical interest and will be explored by explicitly representing the simplifications involved.

To achieve this, consider initially that the parameters $v$ are somehow physically inspired and have some meaning to the modeller in describing aspects of the system. This allows the parameters of the simplified model to be used to describe a restricted version of the high fidelity model, with parameters $x_0 \in \mathbb{R}^{D_x}$ determined by the parameter vector of the simplified model $v_0 \in \mathbb{R}^{D_v}$ via a matrix $C \in \mathbb{R}^{D_x \times D_v}$:

(14) $x_0 = C v_0$.

This explicit representation of model simplifications allows aspects of the system to be ignored [16, 51] and allows for the incorporation of arbitrary linear parameterization schemes, such as spatial and temporal homogeneity [30].

With this definition, the simplified data and prediction matrices may be rewritten in terms of the matrices of the high fidelity model:

(15) $\tilde{G} = GC$,
(16) $\tilde{Y} = YC$.

The matrix $C$ will be referred to as a simplification matrix and provides a way of linking the simplified simulator to the original high fidelity reference model of how the system functions. Finally, it is important to note that physical intuition is not strictly required when defining the simple model: for any $G$, $Y$ and $\tilde{G}$, $\tilde{Y}$ such that the stacked matrix $\begin{bmatrix} G \\ Y \end{bmatrix}$ has full row rank, there is always a $C$ that satisfies (15) and (16). However, this $C$ may not be unique, and thus there are potentially many different ways to interpret the meaning of the simple model. Here, it is considered that the modeller specifies the interpretation, e.g. by specifying $C$ directly.

4.1. Representing Unmodelled Complexity.

A simplification matrix $C \in \mathbb{R}^{D_x \times D_v}$ divides the space $\mathbb{R}^{D_x}$ into two perpendicular subspaces: $\operatorname{columnspace}(C)$ and $\operatorname{cokernel}(C)$. The column space contains the set of parameter vectors that can be explicitly represented by a low dimensional vector. The second subspace, $\operatorname{cokernel}(C)$, contains vectors that include some degree of complexity that cannot be represented by a low dimensional vector. Before defining these further, the following assumption is made.

Assumption 1. A simplification matrix $C$ has full column rank, i.e. $\operatorname{rank}(C) = D_v$.

It is noted that if a simplification matrix $C$ does not have full column rank, then there will be parameter combinations of the simple model that have the same influence on the high fidelity model, which implies that the simple model is not as simple as it could be for the same expressive power.

Now, consider the singular value decomposition of $C$ under the above assumption:

(17) $C = U S V^\top = \begin{bmatrix} U_C & U_{\bar{C}} \end{bmatrix} \begin{bmatrix} S_C \\ 0 \end{bmatrix} V_C^\top = U_C S_C V_C^\top$,

where $U_C$ is the collection of orthogonal unit vectors that span the column space of $C$ and, similarly, $U_{\bar{C}}$ spans the subspace perpendicular to this, the cokernel of $C$. This enables the parameter vector of the high fidelity model to be expanded as

(18) $x = Cv + U_{\bar{C}} u$,

where $v$ is the parameter vector of the simple model and $u$ is a random vector that captures all the unmodelled complexity. Finally, it is noted that this expansion is unique and depends only on the definition of the simplification matrix $C$.

Under the above expansion, the data and prediction are related to the two components $v$ and $u$ through

(19) $d = Gx + \delta = \tilde{G} v + \delta + \underbrace{G U_{\bar{C}} u}_{\eta}$,
(20) $p = Yx + \rho = \tilde{Y} v + \rho + \underbrace{Y U_{\bar{C}} u}_{\epsilon}$.

From these equations, it is noted that the unmodelled components of the system will be expressed in the data whenever $G U_{\bar{C}} \neq 0$ and in the predictions whenever $Y U_{\bar{C}} \neq 0$. These additional error terms are denoted as $\eta$ and $\epsilon$ respectively, and they play a very different role to the other error terms $\delta$ and $\rho$, as they are correlated via $u$.

It is noted that the additional error $\eta$ introduced by the model simplifications was explicitly considered in [30] and [45], where it was used to define a composite measurement error model for the sum $\delta + \eta$. However, the effects of the simplifications on the prediction, including the explicit correlation between $\eta$ and $\epsilon$, were not addressed.
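The subspace split in (17) and the induced error maps in (19)-(20) are mechanical to compute with a full SVD. A minimal sketch, again with hypothetical toy matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
D_x, D_v = 6, 2
C = rng.standard_normal((D_x, D_v))        # simplification matrix (full column rank)

# full SVD of C, equation (17): the columns of U split into U_C and U_Cbar
U, s, Vt = np.linalg.svd(C)
U_C, U_Cbar = U[:, :D_v], U[:, D_v:]       # column space / cokernel bases

G = rng.standard_normal((3, D_x))
Y = rng.standard_normal((1, D_x))

# unmodelled complexity enters the data via G U_Cbar and the predictions
# via Y U_Cbar, i.e. eta = (G U_Cbar) u and epsilon = (Y U_Cbar) u
eta_map = G @ U_Cbar
eps_map = Y @ U_Cbar
print(np.allclose(eta_map, 0), np.allclose(eps_map, 0))  # False False: C is suboptimal
```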
In an ideal world, it could be argued that the objective of any modelling exercise should be to construct a model that is a simplified version of how the system is believed to function, yet still captures the full complexity of the available data and predictions of interest; that is, one which has a $C$ such that $\eta = \epsilon = 0$. Such a simplification matrix always exists, and will be referred to as an optimal simplification.

Definition 2 (Optimal Simplification). Let $G$ and $Y$ be the data and prediction matrices for a high fidelity system model. Then a simplification matrix $C$ is optimal if

(21) $G U_{\bar{C}} = 0$ and $Y U_{\bar{C}} = 0$,

where $U_{\bar{C}}$ spans the cokernel of $C$, as defined in (17).

It is noted that the above definition of optimal simplification differs from that of [16, 14, 50] in that it explicitly considers the nature of the predictions. An optimal simplification matrix, $C^*$, can be found for any given data and prediction matrices $G$ and $Y$ by first taking a singular value decomposition of the stacked matrix

(22) $Z = \begin{bmatrix} G \\ Y \end{bmatrix} = \begin{bmatrix} U_Z & U_{\bar{Z}} \end{bmatrix} \begin{bmatrix} S_Z & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_Z^\top \\ V_{\bar{Z}}^\top \end{bmatrix}$.

An optimal simplification matrix is then given by $C^* = V_Z$. It is considered that an optimally simplified model is difficult, if not impossible, to obtain in practice, and the remainder of the paper will focus on understanding and overcoming the issues caused by suboptimal simplifications.
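The construction $C^* = V_Z$ from (22) is a few lines of code; the final checks confirm the optimality conditions (21). The toy matrices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
D_x, D_d, D_p = 8, 3, 1
G = rng.standard_normal((D_d, D_x))
Y = rng.standard_normal((D_p, D_x))

# stack data and prediction matrices, equation (22)
Z = np.vstack([G, Y])
r = np.linalg.matrix_rank(Z)                 # number of nonzero singular values
_, _, Vt = np.linalg.svd(Z)
C_star = Vt[:r].T                            # optimal simplification: C* = V_Z

# the cokernel of C* is spanned by the remaining right singular vectors of Z
U_Cbar = Vt[r:].T
print(np.allclose(G @ U_Cbar, 0), np.allclose(Y @ U_Cbar, 0))  # True True: (21) holds
```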
5. Naive Use of Simplified Models.

Here, a prediction scheme is defined that first infers the parameters of the simple model and then propagates them into the prediction, but ignores the potential for unmodelled complexity to be expressed in the data or predictions. This represents a standard probabilistic Bayesian approach to model calibration and prediction. It is shown that this scheme is generally not conservative, as the uncertainty in the predictions is underestimated.

5.1. Prior Information.

The scheme has the same basic structure as the optimal benchmark scheme. In particular, it includes an explicit representation of a prior over the parameters, $\hat{p}^n(v)$, a data likelihood, $\hat{p}^n(d \mid v)$, and a conditional density for the prediction, $\hat{p}^n(p \mid v)$. Here the superscript $n$ denotes approximate distributions associated with this naive method. The conditional densities $\hat{p}^n(d \mid v)$ and $\hat{p}^n(p \mid v)$ are defined directly with the simplified data and prediction matrices $\tilde{G}$ and $\tilde{Y}$ respectively. Furthermore, the covariances are set to those used in the optimal scheme, i.e.

(23) $\hat{p}^n(d \mid v) = \mathcal{N}(d; \tilde{G} v, \Sigma_\delta)$,
(24) $\hat{p}^n(p \mid v) = \mathcal{N}(p; \tilde{Y} v, \Sigma_\rho)$.

This formulation of the data and prediction densities is equivalent to ignoring the approximations in (12) and (13) introduced by the model simplifications.

Now, to be somewhat rigorous, the specification of a prior distribution on the parameters $v$ is performed by explicitly considering the nature of the simplification and the uncertainty in the parameters of the high fidelity model. This is accomplished by considering a transformation that propagates the prior distribution over the parameters of the high fidelity model $x$ into the joint space formed by the parameter vector of the simple model $v$ and the vector $u$ that denotes the unmodelled complexity. This is defined in the following proposition.

Proposition 1 (Uncertainty Propagation). Let $p(x) = \mathcal{N}(x; 0, \Sigma_x)$ be a zero mean Gaussian density, with covariance $\Sigma_x$, that captures the prior uncertainty in the parameters of a high fidelity model. Also, let $C$ be a simplification matrix that describes a simplified model. Then, the joint probability density function for $v$ and $u$, representing the modelled and unmodelled components of the simple model respectively, is the zero mean correlated Gaussian density

$p(v, u) = \mathcal{N}\!\left( \begin{bmatrix} v \\ u \end{bmatrix}; \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Sigma_v & \Sigma_{vu} \\ \Sigma_{vu}^\top & \Sigma_u \end{bmatrix} \right)$,

where the covariance terms are given by $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$, $\Sigma_u = U_{\bar{C}}^\top \Sigma_x U_{\bar{C}}$, and $\Sigma_{vu} = C^\dagger \Sigma_x U_{\bar{C}}$. Here, $(\cdot)^\dagger$ is the pseudoinverse operator and $U_{\bar{C}}$ spans the cokernel of $C$ as defined in (17).

See Appendix B for a proof.

This proposition defines the covariance matrix $\Sigma_v$, which represents the uncertainty in the parameters of the simplified model, as the orthogonal projection of $\Sigma_x$ onto the column space of $C$. This is the most rigorous approach to specifying the prior over the parameters of the simple model, as it preserves the uncertainty that is believed to exist and is consistent with the physical intuition used to construct the simplified model. This marginal density is used to define the prior of the naive prediction scheme, $\hat{p}^n(v) = \mathcal{N}(v; 0, \Sigma_v)$.

5.2. Naive Calibration and Prediction.

The naive calibration and prediction scheme is now defined; it generates a posterior via the standard rules of probability theory. This is similar to the optimal scheme, but without consideration of the simplifications within the model's data and prediction equations.

Definition 3 (Naive Scheme). A naive prediction scheme is denoted by the simplified linear Gaussian density functions $\hat{p}^n(v)$, $\hat{p}^n(d \mid v)$ and $\hat{p}^n(p \mid v)$, parameterized by the matrices $\Sigma_v$, $\tilde{G}$, $\Sigma_\delta$, $\tilde{Y}$, and $\Sigma_\rho$. Furthermore, the prior covariance matrix $\Sigma_v$ is defined in terms of a high fidelity model using Proposition 1, that is,

(25) $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$,

where $\Sigma_x$ denotes a covariance matrix that describes the uncertainty in the parameters of the high fidelity model, and $C$ denotes the simplification matrix that links the parameters of the simple and high fidelity models. For a given dataset $d$, the posterior belief over the predictions of interest generated by this naive scheme is the Gaussian density $\hat{p}^n(p \mid d) = \mathcal{N}(p; \hat{\mu}^n_{p|d}, \hat{\Sigma}^n_{p|d})$, with mean and covariance defined as

(26) $\hat{\mu}^n_{p|d} = \mu(\Sigma_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho, d)$,
(27) $\hat{\Sigma}^n_{p|d} = \Sigma(\Sigma_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho)$,

where the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ are given in Definition 1.
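Proposition 1 and Definition 3 translate directly into code. The sketch below assembles the naive scheme from a given $C$; it reuses the posterior_moments function from the Section 3 snippet, and the pseudoinverse realizes the projection $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$ of (25):

```python
import numpy as np

def naive_scheme(Sigma_x, G, Sigma_delta, Y, Sigma_rho, C, d):
    """Naive calibration and prediction (Definition 3): the prior Sigma_v
    is the projection of Sigma_x given by Proposition 1, while the
    simplifications in equations (12)-(13) are otherwise ignored."""
    C_pinv = np.linalg.pinv(C)
    Sigma_v = C_pinv @ Sigma_x @ C_pinv.T        # equation (25)
    G_t, Y_t = G @ C, Y @ C                      # equations (15)-(16)
    return posterior_moments(Sigma_v, G_t, Sigma_delta, Y_t, Sigma_rho, d)
```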
5.3. Performance.

To assess the performance of the naive scheme, the notion of conservativeness, discussed in Section 2, is first defined by considering the squared error in the mean prediction. This is performed for a pair of probability density functions in Definition 4 and extended to calibration and prediction schemes in Definition 5. A further generalization of these definitions, which explicitly incorporates the utility function of the subsequent decision problem, is proposed in Section 7.

Definition 4 (Conservative Density). Let $p(p)$ and $\hat{p}(p)$ be density functions defined over the random vector $p$. Furthermore, let $\hat{\mu}_p$ be the mean of $\hat{p}(p)$. Then $\hat{p}(p)$ is defined as a conservative approximation of the reference density $p(p)$ if the approximate density's expected mean squared error, $E_{\hat{p}(p)}\{(\hat{\mu}_p - p)(\hat{\mu}_p - p)^\top\}$, does not increase when the expectation is instead taken with respect to the reference density $p(p)$:

(28) $E_{\hat{p}(p)}\{(\hat{\mu}_p - p)(\hat{\mu}_p - p)^\top\} \succeq E_{p(p)}\{(\hat{\mu}_p - p)(\hat{\mu}_p - p)^\top\}$,

where $A \succeq B$ requires that $A - B$ is positive semi-definite.

It is noted that condition (28) can be rewritten in terms of the means $\mu_p$, $\hat{\mu}_p$ and covariances $\Sigma_p$, $\hat{\Sigma}_p$ of the two densities, yielding the condition $\hat{\Sigma}_p \succeq \Sigma_p + (\hat{\mu}_p - \mu_p)(\hat{\mu}_p - \mu_p)^\top$. An example of a conservative density that satisfies Definition 4 is given in Figure 3.

[Figure 3. The Gaussian probability density function $\hat{p}(x)$, with mean $\hat{\mu}$ and standard deviation $\hat{\sigma}$, is a conservative estimate of the Gaussian density $p(x)$, with mean $\mu$ and standard deviation $\sigma$, as $\hat{\sigma}^2 \geq \sigma^2 + (\mu - \hat{\mu})^2$.]

The notion of conservativeness is now generalized to a prediction scheme. This is performed by considering an expectation over typical data.

Definition 5 (Conservative Scheme). Let the densities $p(x)$, $p(d \mid x)$, and $p(p \mid x)$ denote the prior information of a reference scheme. Also, let $p(p \mid d)$ denote the posterior density this scheme generates for a given dataset $d$. Furthermore, let $\hat{p}(p \mid d)$ denote a posterior density generated by an approximate scheme. For the given dataset $d$, the degree of conservativeness of the approximate posterior is denoted by the function

(29) $\Omega(d) = E_{\hat{p}(p|d)}\{(\hat{\mu}_{p|d} - p)(\hat{\mu}_{p|d} - p)^\top\} - E_{p(p|d)}\{(\hat{\mu}_{p|d} - p)(\hat{\mu}_{p|d} - p)^\top\}$.

Now, the approximate scheme is defined as conservative if the expectation of the function $\Omega(d)$ is positive semi-definite:

(30) $E_{p(d)}\{\Omega(d)\} \succeq 0$.

Here, the density $p(d)$ denotes how probable different datasets are under the prior knowledge of the reference scheme and is given by $p(d) = \int p(x)\, p(d \mid x)\, dx$.

It is noted that for the linear Gaussian densities considered in this work, the covariances of the posterior distributions are independent of the data, and (30) can be simplified to

(31) $\hat{\Sigma}_{p|d} \succeq \Sigma_{p|d} + E_{p(d)}\{(\hat{\mu}_{p|d} - \mu_{p|d})(\hat{\mu}_{p|d} - \mu_{p|d})^\top\}$.

Thus, a scheme is considered conservative if it generates a posterior covariance matrix $\hat{\Sigma}_{p|d}$ that is inflated with respect to $\Sigma_{p|d}$ by an amount dependent on the average squared difference in the posterior means.

It is now of interest to consider the performance of the naive scheme and determine when it is conservative with respect to the optimal benchmark scheme. This is performed in the following proposition.
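Since all posteriors here are Gaussian, conservativeness reduces to the positive semi-definiteness test (31), which can be checked numerically with an eigenvalue computation. A small sketch (the tolerance is an arbitrary choice):

```python
import numpy as np

def is_conservative(Sigma_hat, Sigma_ref, mean_sq_diff, tol=1e-10):
    """Check condition (31): Sigma_hat >= Sigma_ref + mean_sq_diff in the
    positive semi-definite ordering, where mean_sq_diff is the expected
    squared difference of the posterior means, E{(mu_hat - mu)(mu_hat - mu)^T}.
    A symmetric matrix is PSD iff its smallest eigenvalue is non-negative
    (up to numerical tolerance)."""
    M = Sigma_hat - (Sigma_ref + mean_sq_diff)
    return np.linalg.eigvalsh(M).min() >= -tol
```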
Proposition 2 (Performance of Naive Scheme). Suppose the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, $\Sigma_\rho$ define an optimal calibration and prediction scheme, with posterior density denoted by $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$ for an arbitrary dataset $d$. Let $C$ denote a simplification matrix such that the matrices $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$, $\tilde{G} = GC$, $\Sigma_\delta$, $\tilde{Y} = YC$, $\Sigma_\rho$ define a naive prediction scheme for a simplified model. Now, consider the posterior density generated by the naive scheme, $\hat{p}^n(p \mid d) = \mathcal{N}(p; \hat{\mu}^n_{p|d}, \hat{\Sigma}^n_{p|d})$.

1. If $C$ is an optimal simplification, then the generated mean and covariance of the naive prediction scheme are equivalent to those of the optimal prediction scheme, that is, $\hat{\mu}^n_{p|d} = \mu_{p|d}$ and $\hat{\Sigma}^n_{p|d} = \Sigma_{p|d}$ for all $d$.

2. If $C$ is a suboptimal simplification, then the naive scheme is conservative, provided the following condition holds:

(32) $Y U_{\bar{C}} = \tilde{Y} E^n G U_{\bar{C}}$,

where $E^n = \Sigma_v \tilde{G}^\top (\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta)^{-1}$ denotes the estimator matrix of the naive scheme and $U_{\bar{C}}$ spans the cokernel of $C$ as defined in (17).

3. If $C$ is a suboptimal simplification, condition (32) does not hold, $u$ and $v$ are considered independent, and the covariance of the unmodelled complexity is non-zero, $\Sigma_u \neq 0$, then the naive scheme is strictly non-conservative.

See Appendix C for a proof.

Under a suboptimal simplification, the non-conservative nature of the prediction scheme is caused by the influence of unmodelled complexity on the data and/or predictions. If it influences the predictions, then there is a direct bias effect. If it influences the data, then the estimated parameters of the simple model become biased, as they are forced to take on surrogate roles to compensate for the simplifications; this bias in the estimated parameters is then propagated to the prediction. Neither of these effects is taken into account by the naive prediction scheme, and the uncertainty is underestimated.

The condition in (32), which guarantees the scheme is conservative, requires that these two sources of error in the prediction caused by the unmodelled complexity exactly cancel each other out. This is unlikely to hold in practical scenarios without careful attention to the data, predictions, simplifications, and the type of prior knowledge available. This condition will form the basis of the data driven prediction scheme introduced in Section 6.

[Figure 4. Bayesian network depicting the structure of the densities $\hat{p}^n(v)$, $\hat{p}^n(d \mid v)$, and $\hat{p}^n(p \mid v)$ employed by the naive scheme. Also displayed in dashed lines is the dependency on the unmodelled complexity $u$ under the conditions of an optimal simplification. Note that under an optimal simplification, the unmodelled complexity does not have any direct effect on the data or predictions.]

Finally, it is noted that in the general case, when $v$ and $u$ are not independent and (32) does not hold, the naive scheme is still not guaranteed to be conservative. However, there are further special cases which may produce conservative posterior densities. The conditions that delineate the strictly conservative from the non-conservative scenarios are likely to be of limited interest in realistic problems and have not been enumerated here.
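Condition (32) is directly checkable whenever the high fidelity matrices are available (in practice $G$ and $Y$ are rarely known explicitly, which motivates the schemes of Section 6). A sketch of such a test:

```python
import numpy as np

def naive_is_safe(Sigma_x, G, Sigma_delta, Y, C, atol=1e-10):
    """Test condition (32): Y U_Cbar == Y_tilde E^n G U_Cbar, under which
    the naive scheme of Definition 3 is conservative (Proposition 2)."""
    D_v = C.shape[1]
    U_Cbar = np.linalg.svd(C)[0][:, D_v:]        # cokernel basis, equation (17)
    C_pinv = np.linalg.pinv(C)
    Sigma_v = C_pinv @ Sigma_x @ C_pinv.T        # equation (25)
    G_t, Y_t = G @ C, Y @ C
    S = G_t @ Sigma_v @ G_t.T + Sigma_delta
    E_n = Sigma_v @ G_t.T @ np.linalg.inv(S)     # naive estimator matrix
    return np.allclose(Y @ U_Cbar, Y_t @ E_n @ G @ U_Cbar, atol=atol)
```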
5.4. Summary.

This section has formally defined a probabilistic calibration and prediction scheme that embeds the standard separation of modelling, calibration and prediction. In particular, physical knowledge is used to define a prior probability distribution for the parameters included in the simplified model. Also, the conditional data and prediction probability distributions are defined by ignoring the simplifying assumptions used to construct the data and prediction equations. Furthermore, it is shown that a true characterization of predictive uncertainty typically requires an optimally simplified model. For suboptimally simplified models, it is shown that this scheme generally underestimates the uncertainty in the predictions and is not conservative.

The dependency structure of the probability distributions for this naive scheme is depicted in Figure 4. Also displayed is the dependency on the unmodelled complexity for an optimally simplified model. It is noted that under this condition the data and predictions are conditionally independent of the unmodelled complexity $u$ given the model parameters $v$.

Finally, it is noted that the non-conservative, or overconfident, nature of predictions generated by naively applying probabilistic Bayesian methods to simplified models is not a new finding. In particular, the result can be considered as a generalization of [16, 14, 51] and is consistent with the arguments of [7].

6. Overcoming Model Simplifications.

It is of interest now to understand what to do about the issue of overconfidence when a suboptimal model is used. Firstly, it is noted that this issue may not be a problem: the prediction uncertainty may turn out to be larger than expected, and this in itself may provide sufficient information to allow a given management decision to be made. However, the situation becomes more difficult when it is necessary to determine an accurate, or at least conservative, prediction probability distribution, for instance as required by [15]. In this scenario the modeller may opt to:

• Expand the sources of uncertainty considered in the model through less simplistic modelling assumptions, with the hope of producing an optimally simplified model. This is not just about representing greater spatial/temporal detail in the model [24]; more importantly, the goal should be the inclusion of more sources of uncertainty. As an example, if there exists uncertainty over the presence of some important structural feature, a multi-hypothesis, or multi-model, analysis may be needed [34, 35, 38].

• Directly model the additional errors introduced by the model simplifications, for instance by defining an additional probabilistic model for $p(\eta, \epsilon)$, introduced in (19) and (20), that captures the uncertainty the modeller has in how well the simplistic model represents the behavior of the system. Such approaches have been developed in [25, 13, 19, 20, 39, 40].

• Within the context of a dynamical system, remove the deterministic assumption on how the system evolves over time. This allows the simplifications to be represented by an error model within the modelled transition function. Such a scheme is employed by [49]. Alternatively, the assumption that the model parameters are time invariant can be relaxed such that the parameters can change over time to better match the underlying system behavior and inject greater uncertainty into the predictions [36].

• Modify the way that the data is used by changing the likelihood model $p(d \mid v)$ such that less information is propagated into the prediction via the model parameters. The generalized likelihood uncertainty estimation method [7] can be considered an example of this approach.

In the remainder of this section, two additional approaches are defined, along with explicit conditions that determine when they are appropriate. Firstly, a scheme is defined that allows the posterior of the ideal prediction scheme to be reproduced using a suboptimally simplified model through the appropriate modification of the prior covariances.
Secondly, structural considerations of the data and predictions are examined, and an approach is developed that allows the simplifications to be overcome through the use of the right calibration data.

6.1. Optimal Use of Simplified Models.

The naive scheme developed previously was not conservative for a suboptimally simplified model. Here a prediction scheme is defined that is capable of reproducing the optimal scheme, even under suboptimal simplifications. To start, consider the probability densities $\hat{p}^o(v)$, $\hat{p}^o(d \mid v)$, $\hat{p}^o(p \mid v)$ to be defined in a similar manner to the naive scheme, but parameterized in terms of new covariance matrices $\hat{\Sigma}^o_v$, $\hat{\Sigma}^o_\delta$ and $\hat{\Sigma}^o_\rho$, such that

$\hat{p}^o(v) = \mathcal{N}(v; 0, \hat{\Sigma}^o_v)$,
$\hat{p}^o(d \mid v) = \mathcal{N}(d; \tilde{G} v, \hat{\Sigma}^o_\delta)$,
$\hat{p}^o(p \mid v) = \mathcal{N}(p; \tilde{Y} v, \hat{\Sigma}^o_\rho)$.

Here the covariance matrices are considered adjustable, such that they can be chosen in a way that compensates for the simplifications and allows the optimal posterior to be reproduced. Thus, it is of interest to select $\hat{\Sigma}^o_v$, $\hat{\Sigma}^o_\delta$ and $\hat{\Sigma}^o_\rho$ such that the posterior density produced by their combination, $\hat{p}^o(p \mid d)$, replicates the posterior of the optimal scheme, i.e.

(33) $\hat{p}^o(p \mid d) = p(p \mid d)$ for all $p, d$.

The posterior density $\hat{p}^o(p \mid d)$ is Gaussian, with mean and covariance defined in a similar fashion to the optimal scheme. Thus, the above condition will be satisfied when the means and covariances are equivalent, which occurs when

(34a) $\tilde{Y} \hat{\Sigma}^o_v \tilde{G}^\top [\tilde{G} \hat{\Sigma}^o_v \tilde{G}^\top + \hat{\Sigma}^o_\delta]^{-1} = Y \Sigma_x G^\top [G \Sigma_x G^\top + \Sigma_\delta]^{-1}$

and

(34b) $\tilde{Y} \hat{\Sigma}^o_v \tilde{Y}^\top + \hat{\Sigma}^o_\rho - \tilde{Y} \hat{\Sigma}^o_v \tilde{G}^\top [\tilde{G} \hat{\Sigma}^o_v \tilde{G}^\top + \hat{\Sigma}^o_\delta]^{-1} \tilde{G} \hat{\Sigma}^o_v \tilde{Y}^\top = Y \Sigma_x Y^\top + \Sigma_\rho - Y \Sigma_x G^\top [G \Sigma_x G^\top + \Sigma_\delta]^{-1} G \Sigma_x Y^\top$.

If these conditions can be met, they define a set of optimal covariance matrices for the simplified probability density functions such that, when they are used to generate a posterior, the performance of the optimal scheme is replicated.

6.1.1. Highly Parameterized Models.

The above conditions will now be specialized for a highly parameterized model, where the calibration problem is under-constrained. Such models are recommended by [24] and [51], and the results will explicitly determine when such approaches are appropriate. This will be performed by focusing attention only on the prior covariance matrix for the parameters of the simple model, such that $\hat{\Sigma}^o_\delta$ and $\hat{\Sigma}^o_\rho$ remain the same as those used in the naive and optimal schemes, i.e. $\hat{\Sigma}^o_\delta = \Sigma_\delta$ and $\hat{\Sigma}^o_\rho = \Sigma_\rho$. Such a scheme is now defined.

Definition 6 (Optimally Compensated Scheme). An optimally compensated scheme is denoted by the simplified linear Gaussian probability density functions $\hat{p}^o(v)$, $\hat{p}^o(d \mid v)$ and $\hat{p}^o(p \mid v)$, parameterized by the matrices $\hat{\Sigma}^o_v$, $\tilde{G}$, $\Sigma_\delta$, $\tilde{Y}$, and $\Sigma_\rho$. To fully define the prior covariance matrix $\hat{\Sigma}^o_v$, let $C$ denote a simplification matrix that links the simplified model to a high fidelity model with data and prediction matrices denoted by $G$ and $Y$, such that $\tilde{Y} = YC$ and $\tilde{G} = GC$. Furthermore, let $\Sigma_x$ denote the covariance matrix that describes the uncertainty in the parameters of this high fidelity model. The prior covariance matrix $\hat{\Sigma}^o_v$ is now defined as

(35) $\hat{\Sigma}^o_v = R \Sigma_x R^\top$, where $R = \tilde{Z}^\dagger Z$, and $Z = \begin{bmatrix} G \\ Y \end{bmatrix}$, $\tilde{Z} = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$.
For a given dataset $d$, the posterior belief over the predictions of interest generated by this scheme is the Gaussian density $\hat{p}^o(p \mid d) = \mathcal{N}(p; \hat{\mu}^o_{p|d}, \hat{\Sigma}^o_{p|d})$, with mean and covariance defined as

(36) $\hat{\mu}^o_{p|d} = \mu(\hat{\Sigma}^o_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho, d)$,
(37) $\hat{\Sigma}^o_{p|d} = \Sigma(\hat{\Sigma}^o_v, \tilde{G}, \Sigma_\delta, \tilde{Y}, \Sigma_\rho)$,

where the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ are given in Definition 1.

This scheme is now shown to be equivalent to the optimal scheme when applied to a suboptimal but highly parameterized model, where the number of free parameters is at least as large as the number of linearly independent measurements and predictions of interest.

Proposition 3 (Performance of Compensated Scheme). Suppose the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, $\Sigma_\rho$ define an optimal calibration and prediction scheme, with posterior density denoted by $p(p \mid d) = \mathcal{N}(p; \mu_{p|d}, \Sigma_{p|d})$ for an arbitrary dataset $d$. Let $C$ denote a suboptimal simplification matrix, and let the matrices $\hat{\Sigma}^o_v = R \Sigma_x R^\top$, $\tilde{G} = GC$, $\Sigma_\delta$, $\tilde{Y} = YC$, and $\Sigma_\rho$ define an optimally compensated scheme, where $R = \tilde{Z}^\dagger Z$ and $Z = \begin{bmatrix} G \\ Y \end{bmatrix}$, $\tilde{Z} = ZC = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$. Now, consider the posterior density generated by the optimally compensated scheme, $\hat{p}^o(p \mid d) = \mathcal{N}(p; \hat{\mu}^o_{p|d}, \hat{\Sigma}^o_{p|d})$. If the simplification is chosen such that

(38) $\operatorname{rank}(ZC) = \operatorname{rank}(Z)$,

then the posterior mean and covariance generated by the optimally compensated scheme are equivalent to those of the optimal scheme: $\hat{\mu}^o_{p|d} = \mu_{p|d}$ and $\hat{\Sigma}^o_{p|d} = \Sigma_{p|d}$ for all $d$.

See Appendix D for a proof.

This demonstrates that, given a model simplified in a suboptimal fashion, the optimal performance can be recovered through the adjustment of the assumed prior uncertainties. The main rank condition in (38) is fairly easy to achieve in practice and is met when, e.g., the number of parameters is not smaller than the number of linearly independent measurements and predictions, and the simplified model matrix $\tilde{Z} = ZC$ has full (row) rank.

A model may be highly parameterized due to the degree of spatial variability included in the model, e.g. in the modelled material properties (as in [51]). However, the above result can also apply to models of dynamical systems that use stochastic transition models (sometimes referred to as data assimilation methods). In these models, the additional dynamic error terms that are incorporated at each time increment can be similarly viewed as a set of additional model parameters.

The general structure of the optimally compensated calibration and prediction scheme is depicted with a Bayesian network in Figure 5. It is noted that the difference between this scheme and the naive scheme lies only in how the prior covariance matrix for the parameters of the simple model is specified. Both are defined as a transformation of the covariance matrix $\Sigma_x$ that captures the uncertainty in the parameters of the high fidelity model. Recall that the naive scheme uses $\Sigma_v = C^\dagger \Sigma_x C^{\dagger\top}$, while the optimally compensated scheme uses $\hat{\Sigma}^o_v = R \Sigma_x R^\top$. The matrix $R$ is dependent on the simplification matrix $C$, but it is also dependent on the data and prediction matrices $G$ and $Y$ of the high fidelity model. It is this explicit dependency on the data and predictions that allows the prior covariance matrix $\hat{\Sigma}^o_v$ to compensate for the errors in the data and prediction equations of the simplified model.
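Definition 6 and Proposition 3 can be verified numerically on a toy problem. The sketch below (hypothetical matrices; posterior_moments is the function from the Section 3 snippet) builds $R = \tilde{Z}^\dagger Z$, forms the compensating prior (35), and confirms that the compensated posterior matches the optimal one when the rank condition (38) holds:

```python
import numpy as np

rng = np.random.default_rng(3)
D_x, D_d, D_p, D_v = 10, 3, 1, 6        # D_v >= D_d + D_p, so (38) can hold
A = rng.standard_normal((D_x, D_x))
Sigma_x = A @ A.T
G = rng.standard_normal((D_d, D_x))
Y = rng.standard_normal((D_p, D_x))
Sigma_delta, Sigma_rho = 0.1 * np.eye(D_d), 0.01 * np.eye(D_p)
C = rng.standard_normal((D_x, D_v))     # a (generically) suboptimal simplification
d = rng.standard_normal(D_d)

Z = np.vstack([G, Y])
Z_t = Z @ C                             # stacked simplified matrices [G_tilde; Y_tilde]
assert np.linalg.matrix_rank(Z_t) == np.linalg.matrix_rank(Z)   # condition (38)

R = np.linalg.pinv(Z_t) @ Z
Sigma_v_o = R @ Sigma_x @ R.T           # compensating prior, equation (35)

mu_opt, Sig_opt = posterior_moments(Sigma_x, G, Sigma_delta, Y, Sigma_rho, d)
mu_cmp, Sig_cmp = posterior_moments(Sigma_v_o, G @ C, Sigma_delta, Y @ C, Sigma_rho, d)
print(np.allclose(mu_opt, mu_cmp), np.allclose(Sig_opt, Sig_cmp))   # True True
```

The equality follows because $\tilde{Z} R = Z$ under (38), so every term in (10)-(11) built from $\hat{\Sigma}^o_v$, $\tilde{G}$, $\tilde{Y}$ collapses to its high fidelity counterpart.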
To gain additional intuition as to how the compensating prior covariance matrix $\hat{\Sigma}^o_v$ is related to the $\Sigma_v$ used by the naive scheme, consider for the moment that $v$ and $u$ are independent. This enables the prior covariance matrix for the parameters of the high fidelity model to be decomposed as $\Sigma_x = C \Sigma_v C^\top + U_{\bar{C}} \Sigma_u U_{\bar{C}}^\top$. With this, the prior matrix $\hat{\Sigma}^o_v$ can be rewritten as

(39) $\hat{\Sigma}^o_v = \Sigma_v + \Sigma_+$,

where $\Sigma_+ = \tilde{Z}^\dagger Z U_{\bar{C}} \Sigma_u U_{\bar{C}}^\top Z^\top \tilde{Z}^{\dagger\top}$ is a positive semi-definite matrix. This means that for $\hat{\Sigma}^o_v$ to compensate for the simplifications, the parameters must be allowed greater flexibility than would be given by the naive use of $\Sigma_v$, as $\hat{\Sigma}^o_v \succeq \Sigma_v$.

[Figure 5. Bayesian network depicting the structure of the densities $\hat{p}^o(v)$, $\hat{p}^o(d \mid v)$, and $\hat{p}^o(p \mid v)$ employed by the optimally compensated scheme. Also displayed in dashed lines is the dependency on the unmodelled complexity $u$ under the conditions of a suboptimal simplification. Note that under such a simplification, the unmodelled complexity has a direct effect on the data and/or predictions.]

6.1.2. Weighted Least Squares Inversion.

In addition to the Bayesian arguments provided above, now consider a regularized weighted least squares formulation commonly used for calibration and inversion problems [30, 45]. With the assumption of linear models, the mean vector $\hat{\mu}^o_{v|d}$ is also the maximum a posteriori estimate of the posterior parameter distribution of the simplified model, $\hat{p}^o(v \mid d) \propto \hat{p}^o(d \mid v)\, \hat{p}^o(v)$, and may be equivalently defined as the solution to the regularized weighted least squares optimization problem

(40) $\hat{\mu}^o_{v|d} = \arg\min_v\; (d - \tilde{G} v)^\top \Sigma_\delta^{-1} (d - \tilde{G} v) + v^\top (\hat{\Sigma}^o_v)^{-1} v$.

Note that this explicitly employs the simplified forward model $\tilde{G} v$. Furthermore, the prediction generated by this estimate, $\hat{\mu}^o_{p|d} = \tilde{Y} \hat{\mu}^o_{v|d}$, is the mean and mode of the prediction posterior $\hat{p}^o(p \mid d)$. Due to the optimality of the compensated scheme, this point prediction is also the optimal minimum error variance prediction that the high fidelity model would generate.

This demonstrates that a simplified, but highly parameterized, model can be calibrated through the careful selection of a Tikhonov regularizer to generate the optimal prediction. Additionally, this optimal regularizer can also be used to characterize the predictive uncertainty of this point prediction through the use of the covariance formula in (37).
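The estimate (40) can be computed by solving the regularized normal equations. A sketch, assuming $\hat{\Sigma}^o_v$ is invertible (a compensating prior built from a rank-deficient $R$ need not be, in which case a pseudoinverse or square-root formulation would be required):

```python
import numpy as np

def regularized_wls(G_t, Sigma_delta, Sigma_v_o, d):
    """Solve the Tikhonov-regularized weighted least squares problem (40):
        min_v (d - G_t v)^T Sigma_delta^{-1} (d - G_t v) + v^T Sigma_v_o^{-1} v.
    The minimizer satisfies the normal equations
        (G_t^T Sigma_delta^{-1} G_t + Sigma_v_o^{-1}) v = G_t^T Sigma_delta^{-1} d.
    Note: Sigma_v_o must be invertible for this formulation."""
    W = np.linalg.inv(Sigma_delta)
    H = G_t.T @ W @ G_t + np.linalg.inv(Sigma_v_o)
    return np.linalg.solve(H, G_t.T @ W @ d)
```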
6.1.3. Summary.

The introduced optimally compensated prediction scheme allows suboptimal model simplifications to be overcome by modifying the prior probability distribution for the parameters of the simplified model. This explicitly links: (i) the modelling problem, through the selection of a simplification matrix $C$; (ii) the nature of the available data, via $G$; (iii) the predictions of interest, defined by $Y$; and (iv) how the simple model is calibrated, as the required regularization term $\hat{\Sigma}^o_v$ is explicitly dependent on all three. It is noted, however, that this explicit dependency on the high fidelity model will make the generation of $\hat{\Sigma}^o_v$ problematic in practice, as the matrices $G$ and $Y$ will not generally be available. Nevertheless, it formally defines how a prior regularization term for a suboptimal model should be selected.

6.2. Data Driven Predictions.

It was noted that the main barrier to applying the optimally compensated scheme defined above is the difficulty in determining the covariance matrix $\hat{\Sigma}^o_v$ that compensates for the simplifications. Here a new scheme is proposed that overcomes this issue by moving away from optimality and focusing on conservativeness.

In particular, a specific class of predictions will be considered that allows the simplifications to be hidden behind the data. Thus, the objective is to examine how model simplifications can be overcome through the collection of the right data. The intuition here is that data of a similar type to the predictions is often available; for example, groundwater head measurements are often available when it is of interest to predict heads, and similarly stream flow measurements are often available when predictions of stream flows are of interest.

[Figure 6. Dependency structure of a prediction $p$ where the unmodelled complexity $u$ is hidden behind the data. Here the node $g$ denotes an uncorrupted version of the data and is given by the deterministic relationship $g = Gx = \tilde{G} v + G U_{\bar{C}} u$.]

The class of predictions that will be considered here incorporates two important principles:

• The effects of all unmodelled components of the system on the predictions are captured by the data.

• Any component of the system that affects the predictions, and is not captured by the data, is explicitly represented in the simplified model.

From these principles it is considered that the prediction of interest can be explicitly decomposed into an intermediate data term that is dependent on $g = Gx$ and a term wholly dependent on the parameters of the simple model, that is

(41) $p = \bar{A} g + \bar{B} v + \rho$,

where $\bar{A}$ and $\bar{B}$ are arbitrary matrices. Predictions with this structure are conditionally independent of $u$ given $g$ and $v$, i.e. $p(p \mid g, v, u) = p(p \mid g, v)$. This structure is represented by the Bayesian network in Figure 6 and requires the prediction matrix to have the following form:

(42) $Y = \bar{A} G + \bar{B} C^\dagger$.

It will be shown in the sequel that even with this restricted form it is not possible to overcome the simplifications, and further constraints must be imposed.

6.2.1. Naive Scheme.

Now, consider the naive calibration and prediction scheme introduced in Definition 3. Recall that this method will be conservative for a suboptimal simplification when condition (32), introduced in Proposition 2, is satisfied. Specifically, this requires

$Y U_{\bar{C}} - \tilde{Y} E^n G U_{\bar{C}} = 0$,

where $E^n = \Sigma_v \tilde{G}^\top (\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta)^{-1}$. For a prediction matrix $Y$ with structure consistent with (42), this condition can be rewritten as

$Y U_{\bar{C}} - \tilde{Y} E^n G U_{\bar{C}} = \bar{A} \left[ I - \tilde{G} \Sigma_v \tilde{G}^\top [\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta]^{-1} \right] G U_{\bar{C}} - \bar{B} \Sigma_v \tilde{G}^\top [\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta]^{-1} G U_{\bar{C}} = 0$.

Now, under non-trivial conditions, this holds when

(43a) $\bar{B} \Sigma_v \tilde{G}^\top = 0$, and
(43b) $\tilde{G} \Sigma_v \tilde{G}^\top [\tilde{G} \Sigma_v \tilde{G}^\top + \Sigma_\delta]^{-1} = I$.

Condition (43a) requires the random vector $\bar{b} = \bar{B} v$ to be uncorrelated with $\tilde{g} = \tilde{G} v$ under the prior covariance $\Sigma_v$. Furthermore, condition (43b) requires $\tilde{G} \Sigma_v \tilde{G}^\top \gg \Sigma_\delta$ and occurs when the data is perfect, such that $\Sigma_\delta \approx 0$, or when the prior knowledge in the subspace that is informed by the data (i.e. $\operatorname{rowspace}(\tilde{G})$) is very weak, such that $\tilde{G} \Sigma_v \tilde{G}^\top \to \infty$.

These conditions ensure that: (i) the subset of parameters of the simple model that directly influence the predictions, represented by $\bar{b} = \bar{B} v$, cannot be estimated from the data; and (ii) the simple model can exactly reproduce the data. It is important to note, however, that these conditions are unlikely to hold in practical scenarios, and the naive scheme is not guaranteed to be conservative, even for predictions with the structure considered in (41).
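Both parts of (43) are simple numerical tests for a candidate model; a small sketch (tolerances are arbitrary choices):

```python
import numpy as np

def check_43a(B_bar, Sigma_v, G_t, atol=1e-10):
    """Condition (43a): b = B_bar v must be uncorrelated with
    g = G_t v under the prior covariance Sigma_v."""
    return np.allclose(B_bar @ Sigma_v @ G_t.T, 0, atol=atol)

def check_43b(Sigma_v, G_t, Sigma_delta, atol=1e-6):
    """Condition (43b): G_t Sigma_v G_t^T [G_t Sigma_v G_t^T + Sigma_delta]^{-1} = I,
    i.e. the data are effectively noise-free relative to the prior."""
    S = G_t @ Sigma_v @ G_t.T
    return np.allclose(S @ np.linalg.inv(S + Sigma_delta),
                       np.eye(S.shape[0]), atol=atol)
```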
These conditions ensure that: (i) the subset of parameters of the simple model that directly influence the predictions, represented by $\bar{b} = \bar{B}v$, cannot be estimated from the data; and (ii) the simple model can exactly reproduce the data. It is important to note, however, that these conditions are unlikely to hold in practical scenarios, and the naive scheme is not guaranteed to be conservative, even for predictions with the structure considered in (41).

6.2.2. Data Driven Scheme. To overcome the above issues, it is proposed to ensure conservativeness through the judicious removal of information, such that the conditions defined in (43) can be satisfied. This will occur in two main areas:
- An easily computable inflation of the prior covariance matrix $\Sigma_v$.
- The application of a preprocessing or filtering step that may discard or combine some components of the data.

Additionally, the class of predictions must be restricted further than those allowed by (41).

To start, let $F \in \mathbb{R}^{D_{d'} \times D_d}$ denote a data filtering matrix such that

(44)  $d' = Fd$,

where $D_{d'} \le D_d$. Similarly define the transformed data matrices as $G' = FG$ and $\tilde{G}' = F\tilde{G}$, and the transformed data error covariance as $\Sigma'_\delta = F\Sigma_\delta F^\top$. The purpose of the filter $F$ will be elaborated on later. Note, however, that filtering is optional, and the identity matrix can be used, $F = I$.

Now, consider the singular value decomposition of $\tilde{G}'$

(45)  $\tilde{G}' = F\tilde{G} = [U_1\ U_2]\begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} V_1^\top \\ V_2^\top \end{bmatrix} = U_1 S_1 V_1^\top$.

The last expression, $U_1 S_1 V_1^\top$, only includes the nonzero singular values and will be referred to as the compact SVD. With this expansion, the vectors $v_1 = V_1^\top v$ and $v_2 = V_2^\top v$ define the rowspace and nullspace components of the parameters of the simple model for the given filtered data matrix $\tilde{G}' = F\tilde{G}$.

To meet condition (43b), the prior covariance will be inflated such that all information pertaining to the rowspace of $\tilde{G}'$, i.e. the subspace spanned by $V_1$, is removed. In addition, any correlation between the random vectors $v_1$ and $v_2$ will be ignored. A calibration and prediction scheme that embodies these properties is now defined.

DEFINITION 7 (Data Driven Scheme). For a given filtering matrix $F \in \mathbb{R}^{D_{d'}\times D_d}$ with $D_{d'} \le D_d$, a data driven scheme is denoted by the set of simplified linear Gaussian probability density functions $\hat{p}^d(v)$, $\hat{p}^d(d'|v)$ and $\hat{p}^d(p|v)$, parameterized by the matrices $\hat{\Sigma}^d_v$, $\tilde{G}' = F\tilde{G}$, $\Sigma'_\delta = F\Sigma_\delta F^\top$, $\tilde{Y}$, and $\Sigma_\rho$. To fully define the prior covariance matrix $\hat{\Sigma}^d_v$, let the covariance matrix $\Sigma_v$ denote the uncertainty in the parameters of the simplified model, defined in Proposition 1 as $\Sigma_v = C^\dagger\Sigma_x C^{\dagger\top}$. Furthermore, let the matrices $V_1$ and $V_2$ each denote a set of orthonormal column vectors that form a basis for the rowspace and nullspace of $\tilde{G}'$, e.g. as defined in (45). The prior covariance matrix $\hat{\Sigma}^d_v$ is now defined by the limit

(46)  $\hat{\Sigma}^d_v = \lim_{\alpha\to\infty}\ \alpha V_1 V_1^\top + V_2 V_2^\top\Sigma_v V_2 V_2^\top$.

For a given dataset $d$, the posterior belief over the predictions of interest generated by the data driven scheme is the Gaussian density $\hat{p}^d(p|d) = \mathcal{N}(p;\hat{\mu}^d_{p|d},\hat{\Sigma}^d_{p|d})$, with mean and covariance given by

$\hat{\mu}^d_{p|d} = \mu(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho, d)$,  $\hat{\Sigma}^d_{p|d} = \Sigma(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho)$,

where the functions $\mu(\cdot)$ and $\Sigma(\cdot)$ are given in Definition 1.
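The limit in (46) can be explored numerically. In the hedged sketch below, a random stand-in for the filtered simplified data matrix and a hypothetical diagonal prior are assumed; the prior covariance is built with a finite but growing $\alpha$, and the resulting estimator matrix is seen to approach the pseudoinverse of $\tilde{G}'$, foreshadowing (48) in Proposition 4.

```python
import numpy as np

rng = np.random.default_rng(2)
Dv, Dd = 4, 2
Gt = rng.standard_normal((Dd, Dv))            # filtered simplified data matrix (full row rank)
Sig_v = np.diag([1.0, 2.0, 0.5, 1.5])         # hypothetical simplified prior covariance
Sig_d = 0.1 * np.eye(Dd)                      # filtered data error covariance

U, s, Vt = np.linalg.svd(Gt)
V1, V2 = Vt[:Dd].T, Vt[Dd:].T                 # rowspace / nullspace bases of Gt

for alpha in [1e2, 1e5, 1e8]:
    # Prior (46) with a finite alpha: rowspace information is progressively removed.
    Sig_dv = alpha * V1 @ V1.T + V2 @ V2.T @ Sig_v @ V2 @ V2.T
    Ed = Sig_dv @ Gt.T @ np.linalg.inv(Gt @ Sig_dv @ Gt.T + Sig_d)
    # The gap to the pseudoinverse shrinks as alpha grows.
    print(alpha, np.abs(Ed - np.linalg.pinv(Gt)).max())
```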
Finally, a more specific class of predictions is considered than that defined in (42), such that condition (43a) will hold by construction. Specifically, the prediction must depend on the uncorrupted version of the filtered data, denoted by $g' = FGx$, or on the components of the parameters of the simple model that lie in the nullspace of $\tilde{G}'$, denoted by $v_2 = V_2^\top v$. The prediction cannot depend on any rowspace components directly. Predictions with this structure can be written in the form

(47)  $p = Ag' + Bv_2 + \rho$,

where $A$ and $B$ are arbitrary matrices. Furthermore, the dependency structure of this restricted class of predictions is depicted in Figure 7. The key difference between this and the structure depicted in Figure 6 is the explicit separation of $v$ into $v_1$ and $v_2$ and the restriction that $v_1$ cannot have a direct influence on the prediction. That is, the prediction $p$ is conditionally independent of $u$ and $v_1$ given $g'$ and $v_2$, or equivalently $p(p|g',v_1,v_2,u) = p(p|g',v_2)$. A prediction with this restricted structure has a prediction matrix of the form

$Y = AFG + BV_2^\top C^\dagger$.

It is noted that the matrices $A$ and $B$ are arbitrary; the important characteristic is the structure that links the data and predictions to the simplification. This structure not only ensures that the simplifications are hidden behind the data, but also that any surrogate roles the parameter vector $v_1$ is forced to undertake during calibration cannot adversely affect the predictions.

FIG. 7. Dependency structure of a prediction p where the unmodelled complexity u is hidden behind the data. Any induced parameter surrogacy in the parameters $v_1$ cannot contaminate the predictions. Here the node $g'$ denotes an uncorrupted version of the filtered data and is given by the deterministic relationship $g' = FGx = F\tilde{G}V_1 v_1 + FGU_{\bar C}u$.

It is now demonstrated that the data driven scheme is conservative for predictions that have this special structure.

PROPOSITION 4 (Performance of Data Driven Scheme). Suppose the matrices $\Sigma_x$, $G$, $\Sigma_\delta$, $Y$, $\Sigma_\rho$ define an optimal scheme with posterior denoted by $p(p|d) = \mathcal{N}(p;\mu_{p|d},\Sigma_{p|d})$ for an arbitrary dataset $d$. Furthermore, let $C$ denote a suboptimal simplification matrix, and let the matrices $F$, $\hat{\Sigma}^d_v$, $\tilde{G}' = FGC$, $\Sigma'_\delta = F\Sigma_\delta F^\top$, $\tilde{Y} = YC$, and $\Sigma_\rho$ denote a data driven scheme, where $\hat{\Sigma}^d_v$ is as specified in Definition 7. Additionally, let $\hat{\mu}^d_{p|d}$ and $\hat{\Sigma}^d_{p|d}$ denote the mean and covariance of the posterior produced by this scheme for the dataset $d$.

1. If the filtering matrix $F$ is selected such that $\tilde{G}' = F\tilde{G}$ has full row rank, then the estimator matrix $E^d$ of the data driven scheme is equivalent to the pseudoinverse of $\tilde{G}'$ and can be expressed in terms of the compact SVD

(48)  $E^d = (\tilde{G}')^\dagger = V_1 S_1^{-1} U_1^\top$.

In addition, the posterior mean and covariance can be rewritten as

$\hat{\mu}^d_{p|d} = \tilde{Y}E^d d'$,  $\hat{\Sigma}^d_{p|d} = \tilde{Y}E^d\Sigma'_\delta E^{d\top}\tilde{Y}^\top + \tilde{Y}W\Sigma_v W\tilde{Y}^\top + \Sigma_\rho$,

where $W = I - V_1 V_1^\top$.

2. If $\tilde{G}' = F\tilde{G}$ has full row rank, and the high fidelity prediction matrix has the form

(49)  $Y = AFG + BV_2^\top C^\dagger$,

then the data driven scheme is conservative with respect to the optimal scheme.

See Appendix E for a proof.
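The conservativeness claim of Proposition 4 can be checked deterministically: when the prediction matrix is built with the structure (49), the expected squared prediction error computed from the high fidelity quantities equals the posterior covariance of the data driven scheme. The sketch below assumes small hypothetical sizes, a random high fidelity model, and no filtering ($F = I$); all names are illustrative, not from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 6 fine cells grouped in pairs into 3 zones, 2 data, 1 prediction.
Dx, Dv, Dd, Dp = 6, 3, 2, 1
C = np.kron(np.eye(Dv), np.ones((2, 1)))       # simplification matrix
G = rng.standard_normal((Dd, Dx))              # high fidelity data matrix
A0 = rng.standard_normal((Dx, Dx))
Sig_x = A0 @ A0.T + np.eye(Dx)                 # prior covariance of x
Sig_d = 0.05 * np.eye(Dd)                      # data error covariance
Sig_r = np.array([[0.01]])                     # prediction error variance

F = np.eye(Dd)                                 # no filtering
Gt = F @ G @ C                                 # filtered simplified data matrix (full row rank)
_, _, Vt = np.linalg.svd(Gt)
V1, V2 = Vt[:Dd].T, Vt[Dd:].T                  # rowspace / nullspace bases
W = np.eye(Dv) - V1 @ V1.T                     # projector onto the nullspace of Gt

# Prediction matrix with the structure required by (49).
A = rng.standard_normal((Dp, Dd))
B = rng.standard_normal((Dp, Dv - Dd))
Cd = np.linalg.pinv(C)
Y = A @ F @ G + B @ V2.T @ Cd

Yt = Y @ C
Ed = np.linalg.pinv(Gt)                        # estimator (48)
Sig_v = Cd @ Sig_x @ Cd.T

# Posterior covariance of the data driven scheme (Proposition 4, part 1).
Sig_post = Yt @ Ed @ Sig_d @ Ed.T @ Yt.T + Yt @ W @ Sig_v @ W @ Yt.T + Sig_r

# Expected squared error of the scheme's mean against the true prediction,
# computed exactly from Sig_x, Sig_d, Sig_r.
M = Y - Yt @ Ed @ F @ G
Sig_err = M @ Sig_x @ M.T + Yt @ Ed @ F @ Sig_d @ F.T @ Ed.T @ Yt.T + Sig_r

print(np.allclose(Sig_post, Sig_err))          # True: conservative (with equality)
```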
Before providing additional intuition as to the importance of this result, and the role of the filtering term $F$, it is first demonstrated that the data driven scheme is a generalization of the truncated singular value decomposition calibration method.

6.2.3. Truncated SVD Inversion. The truncated singular value decomposition inversion, or calibration, method is a commonly used technique [2, 32, 51] that generates the solution to the parameter estimation problem using a truncated decomposition of the data matrix. It will be demonstrated that the truncated SVD scheme can be replicated with the appropriate selection of a filtering matrix.

To start, consider the SVD of the simplified data matrix

(50)  $\tilde{G} = \tilde{U}\tilde{S}\tilde{V}^\top$.

Now, let $\tilde{U}_t$ denote the columns of $\tilde{U}$ that correspond to the largest $k$ nonzero singular values. Furthermore, consider the filtering matrix defined by

(51)  $F_{\mathrm{TSVD}} = \tilde{U}_t^\top$.

With this definition, the filtered data matrix $\tilde{G}' = F_{\mathrm{TSVD}}\tilde{G}$ becomes

(52)  $\tilde{G}' = F_{\mathrm{TSVD}}\tilde{G} = \tilde{U}_t^\top\tilde{U}\tilde{S}\tilde{V}^\top = \tilde{S}_t\tilde{V}_t^\top$,

where $\tilde{V}_t$, $\tilde{S}_t$ similarly denote truncated versions of $\tilde{V}$, $\tilde{S}$. Now, as $F_{\mathrm{TSVD}}\tilde{G}$ has full row rank (only nonzero singular values are included), the results of Proposition 4(1) allow the estimator matrix for the scheme to be given by the pseudoinverse of $\tilde{G}' = F_{\mathrm{TSVD}}\tilde{G}$, and thus it can be written as

(53)  $E^d_{\mathrm{TSVD}} = \tilde{V}_t\tilde{S}_t^{-1}$.

The estimated parameter vector of the simple model is now related to the data vector via the inverse of the truncated data matrix $\tilde{G}$, i.e.

(54)  $\hat{\mu}_{v|d'} = E^d_{\mathrm{TSVD}}d' = \tilde{V}_t\tilde{S}_t^{-1}\tilde{U}_t^\top d$.

This demonstrates that the truncated SVD inversion method can be replicated with the selection of an appropriate filtering matrix.

It is important to note, however, that this does not mean that the predictions generated by the truncated SVD method will be conservative. To guarantee this, the prediction must have the form specified in condition (49) of Proposition 4, which in this case requires the prediction matrix to have the form

(55)  $Y = A\tilde{U}_t^\top G + B\tilde{V}_{\bar t}^\top C^\dagger$,

where $\tilde{V}_{\bar t}$ contains the columns of $\tilde{V}$ that were removed by the truncation. If this is not obeyed, the predictions may be overconfident. Furthermore, this condition is implicitly dependent on the truncation point $k$, such that it may hold for some values and not others. This dependency explains in part the results of the simulation studies performed by [51], which demonstrated the difficulty of choosing a truncation point such that predictive uncertainty is accurately estimated by a simplified model.
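The equivalence in (51)-(54) is easy to confirm numerically. The sketch below assumes a random simplified data matrix and an arbitrary truncation point; it shows that the data driven estimator obtained with the filter $F_{\mathrm{TSVD}} = \tilde{U}_t^\top$ reproduces the classical truncated SVD estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
Dv, Dd, k = 5, 4, 2                           # hypothetical sizes and truncation point
Gt = rng.standard_normal((Dd, Dv))            # simplified data matrix G-tilde
d = rng.standard_normal(Dd)                   # synthetic data vector

U, s, Vt = np.linalg.svd(Gt)
F_tsvd = U[:, :k].T                           # filtering matrix (51)
Gp = F_tsvd @ Gt                              # filtered data matrix (52), full row rank

E_tsvd = np.linalg.pinv(Gp)                   # estimator (53): V_t S_t^{-1}
mu_v = E_tsvd @ F_tsvd @ d                    # data driven estimate (54)

# Classical truncated SVD solution V_t S_t^{-1} U_t^T d for comparison.
mu_classic = Vt[:k].T @ np.diag(1.0 / s[:k]) @ U[:, :k].T @ d
print(np.allclose(mu_v, mu_classic))          # True
```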
6.2.4. Selection of Data Filtering. A key element of the defined data driven prediction scheme is the filtering matrix $F$. For the results of Proposition 4 to hold, and the scheme to be conservative, two main requirements must be met:

1. $F$ must be selected such that $\tilde{G}' = F\tilde{G}$ has linearly independent rows. This is a fairly trivial requirement: dependent (e.g. duplicated) measurements must be combined, e.g. by averaging. Note that the information content is preserved, as the error covariance is also transformed. In addition, measurements that are insensitive to the model parameters must be dropped.

2. $F$ must be chosen such that the induced separation of the parameter vector $v$, into the components $v_1 = V_1^\top v$ that are estimable and lie in the rowspace of $F\tilde{G}$, and the components $v_2 = V_2^\top v$ that are not and lie in the nullspace of $F\tilde{G}$, forces all components of the parameter vector that have a direct influence on the prediction to be contained in the vector $v_2$, and thus not updated during the calibration process. This ensures that these parameters do not take on any surrogate roles, and that the predictions are not corrupted.

In addition to these two requirements, the filtering matrix can be selected to improve the predictive performance by removing data components that are only weakly informative about the parameters. This can be considered identical to selecting an appropriate truncation point in an SVD calibration scheme, such that highly informative prior knowledge can be used instead of data that only imposes weak constraints, e.g. as recommended by [32].

However, before searching for an optimal filter, it is important to note that for a given type of data, predictions, and simplification, defined by the triple $G$, $Y$, $C$, there may not be an $F$ that allows the prediction to be cast in the form required by Proposition 4, and the use of the data driven scheme is then not guaranteed to be conservative. This reinforces the fact that choosing a model simplification and performing calibration with a particular dataset must be considered an integrated task that is explicitly dependent on the types of predictions that are required.

FIG. 8. Bayesian network depicting the structure of the densities $\hat{p}^d(v)$, $\hat{p}^d(d|v)$, and $\hat{p}^d(p|v)$ employed by the data driven scheme. Also displayed in dashed lines are the dependencies on unmodelled complexity u of a suboptimal model under condition (49) of Proposition 4. Note also that the simplified prediction matrix is given by $\tilde{Y} = A\tilde{G}' + BV_2^\top$; however, the actual values of A and B are not needed.

6.2.5. Summary. This section has introduced a calibration and prediction scheme that is a generalization of the typically used truncated SVD scheme. Furthermore, it has been demonstrated that the scheme is guaranteed to produce a conservative prediction posterior when the unmodelled complexity is hidden behind the data. The general structure of a prediction problem that obeys this condition is depicted graphically in Figure 8. This represents the probability densities that define the data driven scheme and includes the dependency structure of the predictions (as required by condition (49) and depicted in Figure 7), as well as the structure of the inflated prior covariance matrix $\hat{\Sigma}^d_v$ (as specified in (46)).

The important contribution here is the derived structural condition between the data, predictions and simplifications that determines when it is adequate to use the data driven scheme, with a simplified model, to assess predictive uncertainty. It is noted that, unlike the optimally compensated scheme, this scheme can be used without needing the explicit numerical values within the data and prediction matrices of the high fidelity reference model. It is noted, however, that structural information from them is still required to ensure Proposition 4 is met.
Although this still may not be straightforward in practice, structural considerations are generally significantly easier to handle than numerical ones.

7. Discussion and Conclusions. This paper has considered what constitutes a simplified, but useful, model. In particular, it has examined how simplified models can be used to combine data and expert knowledge within a calibration or inversion process to generate a prediction with a conservative estimate of uncertainty. The concept of an optimal simplified model was defined, which determines when standard probabilistic calibration methods are adequate to quantify predictive uncertainty. The main contribution is the introduction of two new calibration and prediction schemes, along with conditions that explicitly define when they are appropriate for generating a conservative estimate of uncertainty with suboptimal models. These conditions explicitly relate the nature of the calibration data, the predictions of interest, and the simplifications within the model.

The first scheme allows the optimal posterior distribution to be generated, the simplifications being overcome by adjusting how the beliefs of the modeller are used to define a prior term. The scheme is only applicable to highly parameterized models. Furthermore, it requires the prior covariance for the parameters of the simple model to be generated using the data and prediction matrices of a high fidelity reference model. This is a significant limitation, as if these matrices could be generated for a practical problem, they could be used directly to produce a prediction posterior without the need to consider a simplified model. Nevertheless, the value of this scheme lies in defining the ideal calibration process for a simplified model, and it demonstrates that it is possible to overcome suboptimal simplifications through the judicious selection of a prior regularization matrix.

The second, data driven, scheme is designed for predictions that are strongly related to the data, such that the unmodelled complexity affects both in the same way. This scheme does not require the data and prediction matrices of the high fidelity model to be available for model calibration in the way the optimally compensated scheme does. However, it is only guaranteed to be conservative if the predictions, data and simplification have the required structural form. The key insight provided by this scheme is an understanding of how model simplifications can be overcome with the use of the right calibration data.

It was also demonstrated that this data driven scheme is a generalization of the popular truncated singular value decomposition inversion scheme [2]. The generalization allows greater flexibility in filtering the data to ensure that the predictions are of the appropriate form for a given simplified model.

Both calibration schemes have been applied to a prototypical groundwater prediction problem in Appendix A.

Finally, it is noted that each of the two newly defined calibration and prediction schemes has conditions that are linked to the data and prediction matrices of a high fidelity reference model. For any practical problem it is unlikely that these conditions can be directly assessed, and further subjective judgment will be needed. To make this process easier, two areas of further work can be pursued.
Firstly, more synthetic experimental analyses are required to demonstrate how the two schemes can be applied in more complex problems, e.g. as in [51]. The second area is to understand the effect of partial non-satisfaction of these conditions, and to determine how the schemes can be made robust to this.

It is also noted that the use of Bayesian networks, for instance as depicted in Figure 10, may be of great benefit in determining when the required structural conditions are likely to be obeyed, for instance, when a given simplified model is optimal, or when it is appropriate to use the data driven scheme. Such structural considerations may also help frame the arguments put forward in modelling projects (e.g. performed within environmental impact assessment studies) as to why a given modelling and calibration approach is adequate for a given prediction problem. For instance, these arguments can be framed using a two stage process. The first stage may conceptually link the complex system to an optimally simplified model; this stage would consider the features and characteristics of the system that ideally should be modelled. The second stage may then put forward arguments as to how any further simplifications used to produce a suboptimal numerical model will be handled by the calibration process.

Lastly, several other important areas of further research are identified:
- Nonlinear simulators. The analysis within this paper has required a linear relationship between the system properties and the data and predictions. This has allowed generic insights to be obtained, but is a significant limitation, and further work is needed to relax it. One approach is to consider higher order expansions of the models such that some of the nonlinearities can be included; for example, second order expansions have been considered in [8, 31, 12]. Alternatively, more direct probabilistic formulations may also be possible that, e.g., generalize the data driven scheme and only exploit the structural constraints within the problem.
- Nonlinear parameterizations. In the developed analytical framework, only linear relationships between the parameters of the high fidelity model and the simplified model were considered. This should be extended to consider nonlinear parameterizations.
- Over constrained calibration problems. The results obtained for the optimally compensated prediction scheme required that the simplified model is under constrained by the data, and they provide no insight to aid the calibration of over constrained problems. Nevertheless, other approaches may be possible that reproduce the optimal, or at least a conservative, result. Similar modifications may also be possible for the data driven scheme.
- Conservativeness. This paper has employed a definition of conservativeness based on the mean and covariance of the model predictions. Generalizations of this to non-Gaussian distributions have been considered in e.g. [3, 1]. However, it is perhaps more appropriate to consider a generalization of conservativeness that explicitly includes the subsequent decision problem (i.e. engineering design or environmental management). This can be performed using a decision theoretic approach [5] that includes the utility function of this subsequent decision problem and defines an approximate density $\hat{p}(p)$ as conservative if the expected utility is not overestimated, e.g. $E_{\hat{p}(p)}\{U(a,p)\} \le E_{p(p)}\{U(a,p)\}$ for all $a$.
Here $U(a,p)$ is the utility function that encodes the gain under action $a$ when the consequence $p$ occurs. Such a generalization would also allow the incorporation of the decision problem into how modelling and calibration should be performed. It is noted that the above generalization is equivalent to Definition 4 when $U(a,p)$ has the form of a negative weighted squared difference, i.e. $U(a,p) = -(a-p)^\top A(a-p)$ for an arbitrary positive semidefinite matrix $A$.

Appendix A. Example: Groundwater Head Prediction.

The two newly defined calibration and prediction schemes are now applied to a prototypical groundwater prediction problem similar to that considered in [51]. The scenario is depicted in Figure 9 and consists of a 1D confined aquifer with a single observation well. Of interest is the predicted hydraulic head within the aquifer to the right of the observation well. It is noted that this is a very rudimentary problem; however, the objective here is to demonstrate the differences between the schemes and how the conditions that guarantee conservativeness relate to a specific problem.

FIG. 9. Groundwater prediction problem of interest. The aquifer is considered to be unit width into the page.

A.1. Prior Beliefs. For this example, the expert modeller has the following beliefs about the system (taken from [51] where possible):
1. A constant head boundary condition is believed to exist at the left side, corresponding to a discharge location. The head at this location, $h_0$, is believed to be normally distributed with mean 1.0 m and standard deviation 0.75 m above the upper confining layer.
2. No areal recharge or leakage is believed to take place within the domain of interest.
3. The thickness of the aquifer, $b$, is known to be constant over the domain, with value 10 m.
4. The rate $q$ of water flowing through the aquifer is known to be 0.5 m³/day.
5. The system is believed to be in steady state. Furthermore, a numerical simulation of the system with cell length $\ell = 10$ m is believed to be adequate to capture the spatial variation in the head field that is of interest.
6. The hydraulic conductivity of the aquifer is believed to be heterogeneous, with a mean value of 2.5 m/day. A set of 10 cells, each of length $\ell = 10$ m, is considered adequate to describe the heterogeneity, where the hydraulic conductivity of each cell, $K_i$, is believed to be log-normally distributed such that $\log_{10}K_i$ has mean $\log_{10}2.5 \approx 0.398$, and a spatial correlation described by an exponential variogram with sill $0.1 \approx (0.316)^2$ and range 300 m.
7. The error in the data acquisition method is believed to be small and normally distributed, with mean zero and standard deviation 0.1 m.

It is noted that these beliefs capture what is believed to realistically describe the system (or at least an optimally simplified model). Further simplifying assumptions will be considered in Appendix A.4.

Based on the above prior knowledge, the system properties can be defined as the vector that captures the constant boundary head and the set of hydraulic conductivities

(56)  $x = [h_0, \log_{10}(K_1), \ldots, \log_{10}(K_{10})]^\top$.

Furthermore, the uncertainty in the system properties is fully captured by the mean and covariance matrix that define the Gaussian distribution $p(x) = \mathcal{N}(x;\mu_x,\Sigma_x)$.
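A sketch of how $\mu_x$ and $\Sigma_x$ could be assembled from these beliefs follows. Two conventions are assumed here rather than stated in the text: the exponential variogram is interpreted with the practical-range convention, $C(h) = \mathrm{sill}\cdot\exp(-3h/\mathrm{range})$ (a convention that appears consistent with the prior covariance values quoted later in (60)), and the boundary head $h_0$ is taken to be uncorrelated with the conductivities.

```python
import numpy as np

# Prior for x = [h0, log10 K_1, ..., log10 K_10] built from the stated beliefs.
n_cells, ell = 10, 10.0
centers = ell * (np.arange(n_cells) + 0.5)          # cell centres (m)
sill, vrange = 0.1, 300.0
h = np.abs(centers[:, None] - centers[None, :])     # separation distances
# Assumed practical-range convention for the exponential variogram.
Sig_K = sill * np.exp(-3.0 * h / vrange)            # log-conductivity covariance

mu_x = np.concatenate([[1.0], np.full(n_cells, np.log10(2.5))])
Sig_x = np.zeros((n_cells + 1, n_cells + 1))
Sig_x[0, 0] = 0.75**2                               # boundary head variance
Sig_x[1:, 1:] = Sig_K                               # h0 assumed independent of K
```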
To define the data and prediction likelihood functions, consider the vector $x$ as known and define the means with the following nonlinear functions, corresponding to the finite difference solution to Darcy's equations [4]:

(57)  $G(x) = h_0 + \sum_{i=1}^{5}\dfrac{q\ell}{bK_i}$,

(58)  $Y(x) = h_0 + \sum_{i=1}^{10}\dfrac{q\ell}{bK_i}$

(59)  $\phantom{Y(x)} = G(x) + \sum_{i=6}^{10}\dfrac{q\ell}{bK_i}$.

Note that the variables $\ell$, $b$ and $q$ are all known constants. Furthermore, as the numerical simulation at this discretization is believed to be adequate, the data error $\delta$ is completely captured by errors in the measurement process, such that $\Sigma_\delta = (0.1)^2$. In addition, the prediction error variance is taken to be zero, $\Sigma_\rho = 0$. It is noted that the prediction equation has been rewritten on line (59) to explicitly include the data equation as an additive term. This form will be important when judging whether the data driven scheme is appropriate for a particular system simplification.

A.2. Data Generation. The measured head at the observation well is $d = 2.5$ m. This is generated by the data equation $d = G(x_t)$ with no added measurement error. The system properties $x_t$ are the same as the prior mean $\mu_x$, with the exception that the boundary head is set to $h_0 = 1.5$ m. This difference corresponds to $2/3$ of the prior standard deviation.

A.3. Linearized Solution. From the nonlinear functions $G(x)$ and $Y(x)$, a pair of linear functions are constructed by linearizing about the prior mean with a first order Taylor expansion [30, 10]: $G(x) \approx G(\mu_x) + \nabla_x G(\mu_x)[x - \mu_x]$, and similarly for $Y$. Furthermore, a transform into zero mean increments is performed, such that $\Delta x = x - \mu_x$, $\Delta d = d - G(\mu_x)$, and $\Delta p = p - Y(\mu_x)$. Under the linearized approximation, the data and prediction equations become

$\Delta d \approx G\Delta x + \delta$,  $\Delta p \approx Y\Delta x + \rho$,

where the data and prediction matrices are defined by the Jacobian matrices evaluated at the prior mean, $G = \nabla_x G(\mu_x)$ and $Y = \nabla_x Y(\mu_x)$.

The solutions to the full nonlinear prediction problem and the approximate linearized version defined above are given in Figure 11(a). The non-Gaussian posterior density of the nonlinear problem has been produced using a standard Metropolis MCMC algorithm [9] to generate samples that are then smoothed using a kernel density estimator [41]. Overlaid with this is the Gaussian posterior produced from the linearized problem. It is noted that the posterior under the nonlinear equations is slightly more peaked and non-symmetric when compared to the posterior generated by the linearized scheme.

It is noted that the nonlinearity is only present because the log conductivity is considered to be normally distributed. If, for instance, the hydraulic resistance (inverse conductivity) were represented directly, the equations would become linear. However, this is not pursued here, and the error caused by assuming a linear approximation of the nonlinear equations is considered out of scope.
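The Jacobians in A.3 follow from the analytic derivative $d(1/K)/d(\log_{10}K) = -\ln(10)/K$. A minimal sketch of the linearization, with a finite-difference spot-check, is given below; the function name `head` and the layout of the code are illustrative choices, not from the original text.

```python
import numpy as np

# Linearization of the 1D Darcy finite-difference equations (57)-(59)
# about the prior mean.
q, ell, b = 0.5, 10.0, 10.0
mu_logK = np.full(10, np.log10(2.5))
K = 10.0**mu_logK

def head(x, n):
    """Head after n cells: h0 + sum_i q*ell/(b*K_i), with x = [h0, log10 K]."""
    return x[0] + np.sum(q * ell / (b * 10.0**x[1:n + 1]))

mu_x = np.concatenate([[1.0], mu_logK])

# Jacobians G = dG/dx and Y = dY/dx evaluated at the prior mean.
dK = -np.log(10.0) * q * ell / (b * K)          # d head / d log10(K_i) per cell
G = np.concatenate([[1.0], np.where(np.arange(10) < 5, dK, 0.0)])[None, :]
Y = np.concatenate([[1.0], dK])[None, :]

# Finite-difference spot-check of one Jacobian entry.
e = np.zeros(11); e[3] = 1e-6
print(np.isclose((head(mu_x + e, 5) - head(mu_x, 5)) / 1e-6, G[0, 3]))  # True
```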
A.4. Assumed Simplifications. Consider the following simplifying assumptions:
A1: The constant head boundary condition is assumed known and set to the prior value of 1.0 m.
A2: Cells 1-5 are grouped together and assumed to be a homogeneous unit. This set of cells will be referred to as Zone A.
A3: Cells 6-10 are grouped together and assumed to be a homogeneous unit. This set of cells will be referred to as Zone B.

This enables the parameters of the simple model to be defined by the vector of log conductivities of the two zones,

$v = [\log_{10}K_A, \log_{10}K_B]^\top$.

This simplification scheme corresponds to a $C$ matrix whose first row (for $h_0$) is zero, whose rows 2-6 select $\log_{10}K_A$, and whose rows 7-11 select $\log_{10}K_B$:

$C = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix}$.

The use of increments allows the state of the high fidelity model to be defined in terms of an increment $\Delta v$, such that $\Delta x = C\Delta v$. Furthermore, the simplified data equation becomes $\tilde{G}(\Delta v) = G(\mu_x + C\Delta v)$, and similarly for the prediction equation. Now, the linearized forms of the simplified data and prediction equations become

$\Delta d \approx \tilde{G}\Delta v + \delta$,  $\Delta p \approx \tilde{Y}\Delta v + \rho$.

Here the simplified data and prediction matrices can be written in terms of the high fidelity model as $\tilde{G} = GC$ and $\tilde{Y} = YC$, where $G$ and $Y$ are the Jacobian matrices of the high fidelity model, $G = \nabla_x G(\mu_x)$ and $Y = \nabla_x Y(\mu_x)$.

It is now of interest to determine whether this simplification is optimal. If it is, the naive prediction scheme used with the simplified model will produce the same results as the optimal scheme applied to the high fidelity model. For the simplification to be optimal, the unmodelled complexity should have no effect on the data or predictions. This occurs when the following conditions hold:

$GU_{\bar C} = 0$ and $YU_{\bar C} = 0$,

where the columns of $U_{\bar C}$ define an orthonormal basis for the cokernel of $C$, as defined in (17). The unmodelled complexity consists of $h_0$ and the two 4-dimensional parameter vectors that describe the small scale heterogeneity in the conductivity of the two zones. It is noted that this condition does not hold, and thus the simplification is not optimal. Furthermore, the failure of this condition to hold is completely due to assumption A1, which considers the boundary condition $h_0$ to be known. If this assumption were removed and $h_0$ included in the simplified model, while at the same time retaining the homogeneity assumptions A2 and A3, then the simplification would become optimal (under the linearization considered). This optimal simplification will not be considered, and it is expected that the naive method will not be conservative.

FIG. 10. Bayesian networks depicting the structure of the problem with (a) the reference model and (b) the simplified model. The unmodelled complexity in the simplified model includes the boundary head $h_0$ and the small scale complexity within each zone, denoted by the two random vectors $K^-_A$ and $K^-_B$, each with four elements. The dependencies these have are denoted by dashed lines.

Finally, the structure of the high fidelity prediction problem and the simplified problem is depicted graphically with two Bayesian networks in Figure 10. It is noted that the suboptimality of the simplification can be observed in Figure 10(b), as the data and predictions are not conditionally independent of the unmodelled components ($h_0$) given the model parameters ($K_A$ and $K_B$).
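The optimality check is a one-line computation once $U_{\bar C}$ is available. The sketch below builds $C$ for assumptions A1-A3, recomputes the Jacobians so that it stands alone, and evaluates $GU_{\bar C}$ and $YU_{\bar C}$; consistent with the text, the nonzero entries come entirely from the $h_0$ direction.

```python
import numpy as np
from scipy.linalg import null_space

# Simplification matrix C for assumptions A1-A3 (11 fine parameters -> 2 zones).
C = np.zeros((11, 2))
C[1:6, 0] = 1.0      # cells 1-5 -> Zone A
C[6:11, 1] = 1.0     # cells 6-10 -> Zone B
U_cbar = null_space(C.T)   # orthonormal basis for the cokernel of C (11 x 9)

# Jacobians from the linearization of (57)-(58), recomputed for self-containment.
q, ell, b, K = 0.5, 10.0, 10.0, 2.5
dK = -np.log(10.0) * q * ell / (b * K)
G = np.concatenate([[1.0], np.full(5, dK), np.zeros(5)])[None, :]
Y = np.concatenate([[1.0], np.full(10, dK)])[None, :]

# Optimality check: both must vanish for an optimal simplification. The
# within-zone zero-sum directions contribute nothing (G and Y are constant
# within each zone), so any violation is due to the known-h0 assumption A1.
print(np.abs(G @ U_cbar).max(), np.abs(Y @ U_cbar).max())  # nonzero -> suboptimal
```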
A.5. Prediction Results. Here, the performance of the two new calibration and prediction schemes is considered, with the resulting posterior density functions for the predicted head displayed in Figure 11(b). The results are also compared with the naive scheme, which represents a typical probabilistic calibration scheme for a simplified model. It is important to note that no "ground truth" value for the prediction is given for comparison purposes, as the objective is to produce the full probability distribution and not to make a single point prediction.

A.5.1. Naive Scheme. The naive scheme considers the uncertainty in the parameters included in the simple model, but ignores any errors introduced by the simplification. As the simplification is suboptimal, it should not be expected that the prediction posterior will be conservative. The results in Figure 11(b) demonstrate that this is true for this scenario, as the posterior is overly narrow.

To understand the cause of the overconfidence in this scenario, consider the prior covariance in the parameters $\Delta v$, as defined in Proposition 1:

(60)  $\Sigma_v = C^\dagger\Sigma_x C^{\dagger\top} \approx 10^{-2}\begin{bmatrix} 8.58 & 6.19 \\ 6.19 & 8.58 \end{bmatrix}$.

FIG. 11. Prediction probability density functions under (a) the reference model and (b) the simplified model. The optimally compensated scheme in (b) reproduces the posterior distribution of the linearized version of the reference model in (a). The naive scheme is non-conservative and underestimates the uncertainty, while the data driven scheme is conservative.

Due to the large range of 300 m used for the variogram of the log conductivity field, the correlation between the two parameters maintained by the simple model is considerable. In addition, the known flow rate and the assumption of a known boundary head $h_0$ allow the head measurement to be used to very accurately, but incorrectly, estimate the conductivity $K_A$. Furthermore, due to the large correlation that $K_A$ has with $K_B$, this scheme produces an overconfident probability density function for the parameter $K_B$, which in turn causes the predicted head to be overconfident and non-conservative.

A.5.2. Optimally Compensated Scheme. Now consider the optimally compensated calibration and prediction scheme. Firstly, it is noted that the simplified problem is highly parameterized, as $D_v \ge D_p + D_d$, with $D_v = 2$ and $D_p = D_d = 1$. The major difference between the naive scheme and the optimally compensated scheme is the modification of the prior covariance used for the parameters of the simple model. In particular, the modified prior of this scheme is not just defined in terms of the simplification matrix, but also in terms of the data and prediction matrices, via the matrix $R = \tilde{Z}^\dagger Z$, where

$Z = \begin{bmatrix} G \\ Y \end{bmatrix}$ and $\tilde{Z} = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$.

The optimally compensating prior covariance becomes

$\hat{\Sigma}^o_v = R\Sigma_x R^\top \approx 10^{-2}\begin{bmatrix} 27.44 & 6.19 \\ 6.19 & 8.58 \end{bmatrix}$.

It is noted that the only difference between the values of the naive prior covariance $\Sigma_v$ and $\hat{\Sigma}^o_v$ is the inflation of the marginal variance of the first component, corresponding to $\log K_A$. This causes less information to be propagated from the data into $\log K_B$, and enables the prediction posterior to replicate the results of the high fidelity model. The key difficulty with this scheme is that it requires the data and prediction matrices of the high fidelity model in order to produce the inflated prior covariance matrix $\hat{\Sigma}^o_v$ that compensates for the suboptimal simplifications.
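A sketch assembling both priors from the earlier assumptions is given below. Under the variogram convention assumed in A.1, the naive prior reproduces the values in (60); the compensated prior shows the same structure (only the $\log K_A$ variance is inflated), though the exact inflated value depends on linearization constants that are not fully recoverable from the text, so the printed number may differ from the quoted 27.44.

```python
import numpy as np

# Prior of the high fidelity model (assumed practical-range variogram, h0
# independent of K, as in the earlier snippets).
n, ell = 10, 10.0
centers = ell * (np.arange(n) + 0.5)
Sig_K = 0.1 * np.exp(-3.0 * np.abs(centers[:, None] - centers[None, :]) / 300.0)
Sig_x = np.zeros((n + 1, n + 1))
Sig_x[0, 0] = 0.75**2
Sig_x[1:, 1:] = Sig_K

# Jacobians and simplification matrix, as before.
q, b, K = 0.5, 10.0, 2.5
dK = -np.log(10.0) * q * ell / (b * K)
G = np.concatenate([[1.0], np.full(5, dK), np.zeros(5)])[None, :]
Y = np.concatenate([[1.0], np.full(10, dK)])[None, :]
C = np.zeros((11, 2)); C[1:6, 0] = 1.0; C[6:11, 1] = 1.0
Cd = np.linalg.pinv(C)

Sig_v = Cd @ Sig_x @ Cd.T                     # naive prior, cf. (60)

Z = np.vstack([G, Y]); Zt = Z @ C
R = np.linalg.pinv(Zt) @ Z
Sig_ov = R @ Sig_x @ R.T                      # compensated prior: only the
print(np.round(Sig_v * 100, 2))               # K_A marginal variance inflates
print(np.round(Sig_ov * 100, 2))
```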
A.5.3. Data Driven Scheme. To apply the data driven calibration and prediction scheme, it is first of interest to determine whether it is appropriate for the problem; in particular, does the prediction have the right structural relationship with the data and the simplifications, as specified in Proposition 4? This will be checked first using the algebraic expression, and then using the structural characteristics of the network depicted in Figure 10(b).

The required algebraic condition is that, for some $A$ and $B$, the prediction matrix can be written as

(61)  $Y = AFG + BV_2^\top C^\dagger$.

For this problem the data is scalar, and thus the filtering matrix will be set to unity, $F = 1$. Also, from the prediction equation in (59), it is clear that the prediction matrix $Y$ is equal to the data matrix $G$ plus an additional term, which will be denoted by $T$. This allows the Jacobian matrix $Y$ to be written as

(62)  $Y = G + T$.

Thus, the matrix $A$ in (61) will be set to unity, $A = 1$. Now it is of interest to consider the second term in (61). For the prediction to have the required form, the matrix $T = Y - G$ must be equivalent to $BV_2^\top C^\dagger$, and thus must satisfy two conditions:

(63)  $(Y - G)U_{\bar C} = 0$,
(64)  $(Y - G)CV_1 = (\tilde{Y} - \tilde{G})V_1 = 0$.

The first condition, (63), requires the difference between the data and the prediction to be independent of the unmodelled complexity. For this scenario this can be checked algebraically using the high fidelity prediction and data matrices, and it is satisfied. Now consider the second condition, (64); this requires the difference between the data and the prediction to be independent of the rowspace components of $\tilde{G}$. This can be easily checked using just the simplified data and prediction matrices, and it is also satisfied. Thus, it is guaranteed that the data driven scheme is conservative for this problem.

As an alternative to this algebraic check, the conditional independence requirements can be checked using the structure of the Bayesian network. In particular, the requirement of (61) is equivalent to the requirement that the prediction $p = h_{10}$ is conditionally independent of all unmodelled components, $u = [h_0, K^-_A, K^-_B]$, given the uncorrupted data $g = h_5$ and the nullspace component of the parameter vector, $v_2 = K_B$. From the structure of the network, these two nodes are the only parents of the prediction, and thus the required independence is satisfied. This graphical approach provides much greater intuition into when the requirements are met.

When the data driven scheme is applied, it is noted that the available prior knowledge is modified in two ways. Firstly, the prior information on $K_A$ is ignored, and this parameter is estimated from the data only. Secondly, the correlation between $K_A$ and $K_B$ is also ignored; this causes the posterior over the parameter $K_B$ to be the same as the prior. These modifications are encapsulated in the data driven prior covariance matrix

$\hat{\Sigma}^d_v = \lim_{\alpha\to\infty}\ \alpha V_1 V_1^\top + V_2 V_2^\top\Sigma_v V_2 V_2^\top = 10^{-2}\begin{bmatrix} \infty & 0 \\ 0 & 8.58 \end{bmatrix}$.

The posterior generated by this scheme is conservative and is depicted in Figure 11(b). It is noted that the posterior is slightly shifted relative to that produced by the optimal scheme, and it has a larger variance.
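The algebraic checks (63) and (64) take only a few lines. The sketch below reuses the Jacobians and simplification matrix from the earlier snippets (illustrative reconstructions, not the authors' code) and confirms both conditions hold for this problem.

```python
import numpy as np
from scipy.linalg import null_space

# Structural check for the data driven scheme: conditions (63) and (64).
q, ell, b, K = 0.5, 10.0, 10.0, 2.5
dK = -np.log(10.0) * q * ell / (b * K)
G = np.concatenate([[1.0], np.full(5, dK), np.zeros(5)])[None, :]
Y = np.concatenate([[1.0], np.full(10, dK)])[None, :]
C = np.zeros((11, 2)); C[1:6, 0] = 1.0; C[6:11, 1] = 1.0

U_cbar = null_space(C.T)                   # cokernel basis of C
Gt, Yt = G @ C, Y @ C                      # simplified matrices
_, _, Vt = np.linalg.svd(Gt)
V1 = Vt[:1].T                              # rowspace of G-tilde (the K_A direction)

T = Y - G
print(np.allclose(T @ U_cbar, 0))          # condition (63): True
print(np.allclose((Yt - Gt) @ V1, 0))      # condition (64): True
```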
A high fidelity model was considered to represent the true beliefs of a modeller, and a suboptimal simplification was developed to represent a computational model.

The key difference between the three schemes is in the specification of the prior covariance matrix for the parameters of the simplified model. In particular, the naive scheme generates this by directly projecting the prior distribution that exists on the parameters of the high fidelity model onto the parameters of the simplified model. The optimally compensated scheme performs the projection using full knowledge of the data and prediction matrices of the high fidelity model. Lastly, the data driven scheme uses the same matrix as the naive scheme but throws away some of the information it contains.

The conditions under which the schemes are conservative (or optimal) have also been highlighted. The naive scheme should only be applied to optimally simplified models; this condition was not satisfied here, and the posterior was shown to be overconfident and non-conservative. The optimally compensated scheme should only be applied to highly parameterized models. And the data driven scheme is conservative only when the predictions and data have a similar dependency on the unmodelled complexity. These last two conditions were shown to hold for the problem considered, and the generated posterior distributions were conservative.

Appendix B. Proof of Proposition 1.

Proof. For a given simplification matrix, the parameters of the high fidelity model, $x$, can be expanded using (18) as

$x = Cv + U_{\bar C}u = [C\ U_{\bar C}]\begin{bmatrix} v \\ u \end{bmatrix}$.

Furthermore, from the decomposition in (17), the matrix $[C\ U_{\bar C}]$ can be rewritten as

$[C\ U_{\bar C}] = [U_C\ U_{\bar C}]\begin{bmatrix} S_C V_C^\top & 0 \\ 0 & I \end{bmatrix}$.

This is nonsingular and has an inverse that simplifies to

$[C\ U_{\bar C}]^{-1} = \begin{bmatrix} C^\dagger \\ U_{\bar C}^\top \end{bmatrix}$,

where $C^\dagger = V_C S_C^{-1} U_C^\top$. Thus, the transformed covariance matrix in the space of $[v\ u]$ has the form

$\begin{bmatrix} \Sigma_v & \Sigma_{vu} \\ \Sigma_{vu}^\top & \Sigma_u \end{bmatrix} = \begin{bmatrix} C^\dagger \\ U_{\bar C}^\top \end{bmatrix}\Sigma_x\begin{bmatrix} C^{\dagger\top} & U_{\bar C} \end{bmatrix}$.

Appendix C. Proof of Proposition 2.

Proof. Under the conditions of part (1), the matrix $C$ denotes an optimal simplification, and thus $GU_{\bar C} = YU_{\bar C} = 0$, where $U_{\bar C}$ is an orthonormal basis for the cokernel of $C$, e.g. as given in (17). Using the expansion of the parameter vector, $x = Cv + U_{\bar C}u$, the covariance $\Sigma_x$ becomes

(65)  $\Sigma_x = C\Sigma_v C^\top + U_{\bar C}\Sigma_u U_{\bar C}^\top + C\Sigma_{vu}U_{\bar C}^\top + U_{\bar C}\Sigma_{vu}^\top C^\top$.

Substituting (65) into the posterior mean and covariance of the optimal scheme, defined in (10) and (11), and noting that $GU_{\bar C} = YU_{\bar C} = 0$, the following forms are obtained:

$\mu_{p|d} = YC\Sigma_v C^\top G^\top(GC\Sigma_v C^\top G^\top + \Sigma_\delta)^{-1}d$,
$\Sigma_{p|d} = YC\Sigma_v C^\top Y^\top + \Sigma_\rho - YC\Sigma_v C^\top G^\top(GC\Sigma_v C^\top G^\top + \Sigma_\delta)^{-1}GC\Sigma_v C^\top Y^\top$.

Noting that $\tilde{G} = GC$ and $\tilde{Y} = YC$, these expressions are equivalent to the mean and covariance of the naive scheme, i.e. $\hat{\mu}^n_{p|d} = \mu_{p|d}$ and $\hat{\Sigma}^n_{p|d} = \Sigma_{p|d}$ for all $d$. This proves part (1) of the proposition.

To proceed to parts (2) and (3): for the naive scheme to be conservative, the mean and covariance of the posterior must obey (30), which simplifies to

(66)  $\hat{\Sigma}^n_{p|d} \succeq E_{p(d,p)}\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\}$.

Also, the expectation over $d, p$ is equivalent to an expectation over the independent variables $x, \delta, \rho$, where $d = Gx + \delta$ and $p = Yx + \rho$. Thus, to prove part (2) of the proposition, it will now be shown that this holds under the special condition $YU_{\bar C} = \tilde{Y}E^n GU_{\bar C}$, defined in (32).
To start, note that $\hat{\mu}^n_{p|d} = \tilde{Y}E^n d$, $d = Gx + \delta$ and $x = Cv + U_{\bar C}u$; thus the difference $\hat{\mu}^n_{p|d} - p$ can be written as

(67)  $\hat{\mu}^n_{p|d} - p = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]v + \tilde{Y}E^n\delta - \rho + [\tilde{Y}E^n GU_{\bar C} - YU_{\bar C}]u$
(68)  $\phantom{\hat{\mu}^n_{p|d} - p} = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]v + \tilde{Y}E^n\delta - \rho$,

where (68) has used condition (32). Now, as $v = C^\dagger x$, $\delta$ and $\rho$ are all independent and zero mean, the expected squared difference of $\hat{\mu}^n_{p|d} - p$ can be written in terms of the covariance matrices

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]\Sigma_v[\tilde{Y}E^n\tilde{G} - \tilde{Y}]^\top + \tilde{Y}E^n\Sigma_\delta E^{n\top}\tilde{Y}^\top + \Sigma_\rho$.

Expanding the right hand side with the naive estimator matrix $E^n = \Sigma_v\tilde{G}^\top(\tilde{G}\Sigma_v\tilde{G}^\top + \Sigma_\delta)^{-1}$ and simplifying produces the result

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = \tilde{Y}\Sigma_v\tilde{Y}^\top + \Sigma_\rho - \tilde{Y}\Sigma_v\tilde{G}^\top(\tilde{G}\Sigma_v\tilde{G}^\top + \Sigma_\delta)^{-1}\tilde{G}\Sigma_v\tilde{Y}^\top = \hat{\Sigma}^n_{p|d}$.

Thus, the special condition $YU_{\bar C} = \tilde{Y}E^n GU_{\bar C}$ is sufficient for (66) to be satisfied (with equality), ensuring the naive scheme is conservative. This proves part (2).

To prove part (3), consider equation (67) above and assume $v$ and $u$ are independent and $\tilde{Y}E^n GU_{\bar C} \ne YU_{\bar C}$; then the expected squared difference of $\hat{\mu}^n_{p|d} - p$ becomes

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = [\tilde{Y}E^n\tilde{G} - \tilde{Y}]\Sigma_v[\tilde{Y}E^n\tilde{G} - \tilde{Y}]^\top + \tilde{Y}E^n\Sigma_\delta E^{n\top}\tilde{Y}^\top + \Sigma_\rho + [\tilde{Y}E^n GU_{\bar C} - YU_{\bar C}]\Sigma_u[\tilde{Y}E^n GU_{\bar C} - YU_{\bar C}]^\top$.

Defining $M = \tilde{Y}E^n GU_{\bar C} - YU_{\bar C} \ne 0$, this simplifies to

$E\{(\hat{\mu}^n_{p|d} - p)(\hat{\mu}^n_{p|d} - p)^\top\} = \hat{\Sigma}^n_{p|d} + M\Sigma_u M^\top$.

Now, as $M$ is a non-zero matrix and $\Sigma_u$ is a non-zero positive semi-definite covariance matrix, $M\Sigma_u M^\top$ must be non-zero and positive semi-definite. Thus, the required condition (66) does not hold, since $\hat{\Sigma}^n_{p|d} \not\succeq \hat{\Sigma}^n_{p|d} + M\Sigma_u M^\top$, and the scheme is strictly not conservative. This proves part (3).

Appendix D. Proof of Proposition 3.

Proof. To prove optimality of the compensated scheme, it is necessary to show that the conditions defined by (34) are satisfied by the prior covariance matrix $\hat{\Sigma}^o_v$ defined by this scheme. It is noted that these are satisfied when $\tilde{G}\hat{\Sigma}^o_v\tilde{G}^\top = G\Sigma_x G^\top$, $\tilde{Y}\hat{\Sigma}^o_v\tilde{Y}^\top = Y\Sigma_x Y^\top$, and $\tilde{Y}\hat{\Sigma}^o_v\tilde{G}^\top = Y\Sigma_x G^\top$. These conditions can be combined and rewritten as

(69)  $\tilde{Z}\hat{\Sigma}^o_v\tilde{Z}^\top = Z\Sigma_x Z^\top$,

where $Z = \begin{bmatrix} G \\ Y \end{bmatrix}$ and $\tilde{Z} = \begin{bmatrix} \tilde{G} \\ \tilde{Y} \end{bmatrix}$. Now, with the definition $\hat{\Sigma}^o_v = \tilde{Z}^\dagger Z\Sigma_x Z^\top\tilde{Z}^{\dagger\top}$, condition (69) becomes

$\tilde{Z}\tilde{Z}^\dagger Z\Sigma_x Z^\top\tilde{Z}^{\dagger\top}\tilde{Z}^\top = Z\Sigma_x Z^\top$,

which holds when $\tilde{Z}\tilde{Z}^\dagger Z = Z$.

Now, to demonstrate that $\tilde{Z}\tilde{Z}^\dagger Z = Z$ is satisfied: the condition $\mathrm{rank}(ZC) = \mathrm{rank}(Z)$ implies that $\mathrm{columnspace}(\tilde{Z}) = \mathrm{columnspace}(ZC) = \mathrm{columnspace}(Z)$ [42, 3.16]. Furthermore, it is noted that $\mathrm{columnspace}(\tilde{Z}) = \mathrm{columnspace}(\tilde{Z}^{\dagger\top})$ [42, 7.52(l)], and thus $\mathrm{columnspace}(Z) = \mathrm{columnspace}(\tilde{Z}^{\dagger\top})$. This implies that $Z^\top(I - \tilde{Z}\tilde{Z}^\dagger) = 0$ [42, 2.34]. Furthermore, as $\tilde{Z}$ is real, $(\tilde{Z}\tilde{Z}^\dagger)^\top = \tilde{Z}\tilde{Z}^\dagger$. Thus, taking the transpose of $Z^\top(I - \tilde{Z}\tilde{Z}^\dagger) = 0$ produces $(I - \tilde{Z}\tilde{Z}^\dagger)Z = 0$, and hence $\tilde{Z}\tilde{Z}^\dagger Z = Z$.
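The key identity in this proof can be spot-checked numerically. The sketch below uses arbitrary assumed sizes; with a generic $Z$ and a simplification $C$ for which $\mathrm{rank}(ZC) = \mathrm{rank}(Z)$, the identity $\tilde{Z}\tilde{Z}^\dagger Z = Z$ holds to machine precision.

```python
import numpy as np

# Numerical spot-check of the identity used in the proof of Proposition 3:
# rank(ZC) = rank(Z) implies Zt @ pinv(Zt) @ Z = Z, where Zt = Z @ C.
rng = np.random.default_rng(4)
Z = rng.standard_normal((3, 8))               # stacked [G; Y], generically rank 3
C = np.kron(np.eye(4), np.ones((2, 1)))       # 8 -> 4 simplification
Zt = Z @ C                                    # 3 x 4, generically rank 3 = rank(Z)

print(np.linalg.matrix_rank(Zt) == np.linalg.matrix_rank(Z))   # True
print(np.allclose(Zt @ np.linalg.pinv(Zt) @ Z, Z))             # True
```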
Appendix E. Proof of Proposition 4.

Proof. To start, note that the inverse of the prior covariance matrix $\hat{\Sigma}^d_v$ simplifies as follows:

(70)  $[\hat{\Sigma}^d_v]^{-1} = \lim_{\alpha\to\infty}[V_2 V_2^\top\Sigma_v V_2 V_2^\top + \alpha V_1 V_1^\top]^{-1}$
(71)  $\phantom{[\hat{\Sigma}^d_v]^{-1}} = V_2(V_2^\top\Sigma_v V_2)^{-1}V_2^\top$.

Now, the estimator matrix is defined as $E^d = \hat{\Sigma}^d_v\tilde{G}'^\top(\tilde{G}'\hat{\Sigma}^d_v\tilde{G}'^\top + \Sigma'_\delta)^{-1}$. Using the matrix inversion identity [42, 15.1(a)], the inverse definition in (71), the compact SVD $\tilde{G}' = U_1 S_1 V_1^\top$, and noting that, as $\tilde{G}'$ has full row rank, $U_1^{-1} = U_1^\top$, the estimator matrix simplifies to

$E^d = [[\hat{\Sigma}^d_v]^{-1} + \tilde{G}'^\top\Sigma_\delta'^{-1}\tilde{G}']^{-1}\tilde{G}'^\top\Sigma_\delta'^{-1} = V_1 S_1^{-1}U_1^\top = \tilde{G}'^\dagger$.

This can be used to directly define the mean of the prediction posterior

$\hat{\mu}^d_{p|d} = \mu(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho, d') = \tilde{Y}E^d d'$.

With similar substitutions as above, and using the Woodbury identity [42, 15.3(b)(i)], the covariance of the prediction posterior simplifies to

$\hat{\Sigma}^d_{p|d} = \Sigma(\hat{\Sigma}^d_v, \tilde{G}', \Sigma'_\delta, \tilde{Y}, \Sigma_\rho)$
$= \tilde{Y}\hat{\Sigma}^d_v\tilde{Y}^\top + \Sigma_\rho - \tilde{Y}\hat{\Sigma}^d_v\tilde{G}'^\top(\tilde{G}'\hat{\Sigma}^d_v\tilde{G}'^\top + \Sigma'_\delta)^{-1}\tilde{G}'\hat{\Sigma}^d_v\tilde{Y}^\top$
$= \tilde{Y}[[\hat{\Sigma}^d_v]^{-1} + \tilde{G}'^\top\Sigma_\delta'^{-1}\tilde{G}']^{-1}\tilde{Y}^\top + \Sigma_\rho$
$= \tilde{Y}[V_2 V_2^\top\Sigma_v V_2 V_2^\top + V_1 S_1^{-1}U_1^\top\Sigma'_\delta U_1 S_1^{-1}V_1^\top]\tilde{Y}^\top + \Sigma_\rho$
$= \tilde{Y}W\Sigma_v W\tilde{Y}^\top + \tilde{Y}E^d\Sigma'_\delta E^{d\top}\tilde{Y}^\top + \Sigma_\rho$,

where $W = V_2 V_2^\top = I - V_1 V_1^\top$ represents a projection onto the nullspace of $\tilde{G}'$. This concludes the proof of part (1) of the proposition.

For the scheme to be conservative, the mean and covariance of the posterior must obey

(72)  $\hat{\Sigma}^d_{p|d} \succeq E_{p(d,p)}\{(p - \hat{\mu}^d_{p|d})(p - \hat{\mu}^d_{p|d})^\top\}$.

Also, the expectation over $d, p$ is equivalent to an expectation over the independent variables $x, \delta, \rho$, where $d = Gx + \delta$ and $p = Yx + \rho$. Now, the difference between the prediction $p = Yx + \rho$ and the mean $\hat{\mu}^d_{p|d}$ simplifies to

$p - \hat{\mu}^d_{p|d} = Yx + \rho - \tilde{Y}E^d Fd$
$= [\tilde{Y} - \tilde{Y}E^d\tilde{G}']v - \tilde{Y}E^d F\delta + \rho + [YU_{\bar C} - \tilde{Y}E^d G'U_{\bar C}]u$
$= \tilde{Y}Wv - \tilde{Y}E^d F\delta + \rho$.

Furthermore, as $v = C^\dagger x$, $\delta$ and $\rho$ are all independent and zero mean, the difference $p - \hat{\mu}^d_{p|d}$ has an expected value of zero and a covariance equivalent to $\hat{\Sigma}^d_{p|d}$:

$E\{(p - \hat{\mu}^d_{p|d})(p - \hat{\mu}^d_{p|d})^\top\} = \tilde{Y}W\Sigma_v W\tilde{Y}^\top + \tilde{Y}E^d\Sigma'_\delta E^{d\top}\tilde{Y}^\top + \Sigma_\rho = \hat{\Sigma}^d_{p|d}$.

Thus, the required condition (72) is satisfied (with equality), and the prediction scheme is conservative. This proves part (2) of the proposition.

REFERENCES

[1] J. Ajgl and M. Šimandl, On conservativeness of posterior density fusion, in 2013 16th International Conference on Information Fusion (FUSION), July 2013, pp. 85–92.
[2] R. C. Aster, B. Borchers, and C. H. Thurber, Parameter Estimation and Inverse Problems, Academic Press, Waltham, MA, 2nd ed., Feb. 2012.
[3] T. Bailey, S. Julier, and G. Agamennoni, On conservative fusion of information with unknown non-Gaussian dependence, in 2012 15th International Conference on Information Fusion (FUSION), July 2012, pp. 1876–1883.
[4] J. Bear and A. H.-D. Cheng, Modeling Groundwater Flow and Contaminant Transport, Springer, Dordrecht; London, 2010.
[5] J. Berger, Statistical Decision Theory and Bayesian Analysis, Springer, New York, 2nd ed., 1985.
[6] K. Beven, On the concept of model structural error, Water Science and Technology, 52 (2005), pp. 167–175.
[7] K. Beven and J. Freer, Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology, Journal of Hydrology, 249 (2001), pp.
11–29, https://doi.org/10.1016/S0022-1694(01)00421-8.
[8] M. J. Box, Bias in nonlinear estimation, Journal of the Royal Statistical Society, Series B, 33 (1971), pp. 171–201.
[9] S. Brooks, Handbook of Markov Chain Monte Carlo, Taylor & Francis, Boca Raton, 2011.
[10] J. Carrera, A. Alcolea, A. Medina, J. Hidalgo, and L. J. Slooten, Inverse problem in hydrogeology, Hydrogeology Journal, 13 (2005), pp. 206–222, https://doi.org/10.1007/s10040-004-0404-7.
[11] M. P. Clark and J. A. Vrugt, Unraveling uncertainties in hydrologic model calibration: Addressing the problem of compensatory parameters, Geophysical Research Letters, 33 (2006), L06406, https://doi.org/10.1029/2005GL025604.
[12] R. L. Cooley and S. Christensen, Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media, Advances in Water Resources, 29 (2006), pp. 639–656, https://doi.org/10.1016/j.advwatres.2005.07.012.
[13] P. S. Craig, M. Goldstein, J. C. Rougier, and A. H. Seheult, Bayesian forecasting for complex systems using computer simulators, Journal of the American Statistical Association, 96 (2001), pp. 717–729, https://doi.org/10.1198/016214501753168370.
[14] J. Doherty and S. Christensen, Use of paired simple and complex models to reduce predictive bias and quantify uncertainty, Water Resources Research, 47 (2011), https://doi.org/10.1029/2011WR010763.
[15] J. Doherty and C. T. Simmons, Groundwater modelling in decision support: reflections on a unified conceptual framework, Hydrogeology Journal, 21 (2013), pp. 1531–1537, https://doi.org/10.1007/s10040-013-1027-7.
[16] J. Doherty and D. Welter, A short exploration of structural noise, Water Resources Research, 46 (2010), https://doi.org/10.1029/2009WR008377.
[17] R. Ferdowsian, D. J. Pannell, C. McCarron, A. Ryder, and L. Crossing, Explaining groundwater hydrographs: separating atypical rainfall events from time trends, Soil Research, 39 (2001), pp. 861–876.
[18] R. A. Freeze, J. Massmann, L. Smith, T. Sperling, and B. James, Hydrogeological decision analysis: 1. A framework, Ground Water, 28 (1990), pp. 738–766, https://doi.org/10.1111/j.1745-6584.1990.tb01989.x.
[19] M. Goldstein and J. Rougier, Probabilistic formulations for transferring inferences from mathematical models to physical systems, SIAM Journal on Scientific Computing, 26 (2004), pp. 467–487, https://doi.org/10.1137/S106482750342670X.
[20] M. Goldstein and J. Rougier, Bayes linear calibrated prediction for complex systems, Journal of the American Statistical Association, 101 (2006), pp. 1132–1143, https://doi.org/10.1198/016214506000000203.
[21] G. C. Goodwin, M. Gevers, and B. Ninness, Quantifying the error in estimated transfer functions with application to model order selection, IEEE Transactions on Automatic Control, 37 (1992), pp. 913–928, https://doi.org/10.1109/9.148344.
[22] G. C. Goodwin and M. E. Salgado, A stochastic embedding approach for quantifying uncertainty in the estimation of restricted complexity models, International Journal of Adaptive Control and Signal Processing, 3 (1989), pp. 333–356, https://doi.org/10.1002/acs.4480030405.
[23] H. V.
Gupta, M. P. Clark, J. A. Vrugt, G. Abramowitz, and M. Ye, Towards a comprehensive assessment of model structural adequacy, Water Resources Research, 48 (2012), W08301, https://doi.org/10.1029/2011WR011044.
[24] R. J. Hunt, J. Doherty, and M. J. Tonkin, Are models too simple? Arguments for increased parameterization, Ground Water, 45 (2007), pp. 254–262, https://doi.org/10.1111/j.1745-6584.2007.00316.x.
[25] M. C. Kennedy and A. O'Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63 (2001), pp. 425–464, https://doi.org/10.1111/1467-9868.00294.
[26] L. Ljung, Model validation and model error modeling, in Proceedings of the Åström Symposium on Control, Lund, Sweden, 1999, pp. 15–42.
[27] L. Ljung, System Identification: Theory for the User, Prentice Hall, Upper Saddle River, NJ, 2nd ed., Jan. 1999.
[28] L. Ljung, Perspectives on system identification, Annual Reviews in Control, 34 (2010), pp. 1–12, https://doi.org/10.1016/j.arcontrol.2009.12.001.
[29] L. Ljung, G. C. Goodwin, and J. C. Agüero, Stochastic embedding revisited: a modern interpretation, 2014.
[30] D. McLaughlin and L. R. Townley, A reassessment of the groundwater inverse problem, Water Resources Research, 32 (1996), pp. 1131–1161, https://doi.org/10.1029/96WR00160.
[31] D. McLaughlin and E. F. Wood, A distributed parameter approach for evaluating the accuracy of groundwater model predictions: 2. Application to groundwater flow, Water Resources Research, 24 (1988), pp. 1048–1060, https://doi.org/10.1029/WR024i007p01048.
[32] C. Moore and J. Doherty, Role of the calibration process in reducing model predictive error, Water Resources Research, 41 (2005), W05020, https://doi.org/10.1029/2004WR003501.
[33] B. Ninness and G. C. Goodwin, Estimation of model quality, Automatica, 31 (1995), pp. 1771–1797, https://doi.org/10.1016/0005-1098(95)00108-7.
[34] E. Poeter, All models are wrong, how do we know which are useful?, Ground Water, 45 (2007), pp. 390–391, https://doi.org/10.1111/j.1745-6584.2007.00350.x.
[35] E. P. Poeter and M. C. Hill, MMA, A Computer Code for Multi-Model Analysis, U.S. Geological Survey Techniques and Methods 6-E3, 2007.
[36] P. Reichert and J. Mieleitner, Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time-dependent parameters, Water Resources Research, 45 (2009), W10402, https://doi.org/10.1029/2009WR007814.
[37] W. Reinelt, A. Garulli, and L. Ljung, Comparing different approaches to model error modeling in robust identification, Automatica, 38 (2002), pp. 787–803, https://doi.org/10.1016/S0005-1098(01)00269-2.
[38] R. Rojas, S. Kahunde, L. Peeters, O. Batelaan, L. Feyen, and A. Dassargues, Application of a multimodel approach to account for conceptual model and scenario uncertainties in groundwater modelling, Journal of Hydrology, 394 (2010), pp. 416–435, https://doi.org/10.1016/j.jhydrol.2010.09.016.
[39] J. Rougier, Probabilistic inference for future climate using an ensemble of climate model evaluations, Climatic Change, 81 (2007), pp. 247–264, https://doi.org/10.1007/s10584-006-9156-9.
[40] J. Rougier and M. Crucifix, Uncertainty in climate science and climate policy, arXiv:1411.6878 [physics], 2014.
[41] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, New York, 1st ed., Aug. 1992.
[42] G. A. F. Seber, A Matrix Handbook for Statisticians, Wiley-Interscience, Hoboken, NJ, 2008.
[43] J. Q. Smith, Bayesian Decision Analysis: Principles and Practice, Cambridge University Press, Sept. 2010.
[44] M. Strong and J. Oakley, When is a model good enough? Deriving the expected value of model improvement via specifying internal model discrepancies, SIAM/ASA Journal on Uncertainty Quantification, 2 (2014), pp. 106–125, https://doi.org/10.1137/120889563.
[45] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1st ed., 2005.
[46] J. R. von Asmuth, K. Maas, M. Bakker, and J. Petersen, Modeling time series of ground water head fluctuations subjected to multiple stresses, Ground Water, 46 (2008), pp. 30–40, https://doi.org/10.1111/j.1745-6584.2007.00382.x.
[47] C. I. Voss, Editor's message: Groundwater modeling fantasies - part 1, adrift in the details, Hydrogeology Journal, 19 (2011), pp. 1281–1284, https://doi.org/10.1007/s10040-011-0789-z.
[48] C. I. Voss, Editor's message: Groundwater modeling fantasies - part 2, down to earth, Hydrogeology Journal, 19 (2011), pp. 1455–1458, https://doi.org/10.1007/s10040-011-0790-6.
[49] J. A. Vrugt, C. G. H. Diks, H. V. Gupta, W. Bouten, and J. M. Verstraten, Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation, Water Resources Research, 41 (2005), W01017, https://doi.org/10.1029/2004WR003059.
[50] T. A. Watson, J. E. Doherty, and S. Christensen, Parameter and predictive outcomes of model simplification, Water Resources Research, 49 (2013), pp. 3952–3977, https://doi.org/10.1002/wrcr.20145.
[51] J. T. White, J. E. Doherty, and J. D. Hughes, Quantifying the predictive consequences of model error with linear subspace analysis, Water Resources Research, 50 (2014), pp. 1152–1173, https://doi.org/10.1002/2013WR014767.