Model Error Embedding with Orthogonal Gaussian Processes
Mridula Kuppa
Department of Aerospace Engineering, University of Illinois Urbana-Champaign, Champaign, IL 61801, USA
vkuppa2@illinois.edu

Khachik Sargsyan*
Sandia National Laboratories, Livermore, CA 94551, USA
ksargsy@sandia.gov

Marco Panesi
Mechanical and Aerospace Engineering, University of California Irvine, Irvine, CA 92697, USA
mpanesi@uci.edu

Habib N. Najm
Sandia National Laboratories, Livermore, CA 94551, USA
hnnajm@sandia.gov

February 23, 2026

Abstract

Computational models of complex physical systems often rely on simplifying assumptions which inevitably introduce model error, with consequent predictive errors. Given data on model observables, the estimation of parameterized model-error representations, along with other model parameters, would ideally be done while separating the contributions of each of the two sets of parameters, in order to ensure meaningful stand-alone model predictions. This work builds an embedded model error framework using a weight-space representation of Gaussian processes (GPs) to flexibly capture model-error spatiotemporal correlations and enable inference with GP-embedding in non-linear models. To disambiguate model and model-error/bias parameters, we extend an existing orthogonal GP method to the embedded model-error setting and derive appropriate orthogonality constraints. To address the increased dimensionality introduced by the GP representation, we employ the likelihood-informed subspace method. The construction is demonstrated on linear and non-linear examples, where it effectively corrects model predictions to match data trends. Extrapolation beyond the training data recovers the prior predictive distribution, and the orthogonality constraints lead to meaningful stand-alone model predictions and nearly uncorrelated posteriors between model and model-error parameters.

Keywords: model error embedding · Gaussian processes · orthogonality · likelihood-informed subspace

1 Introduction

Computational models are ubiquitous in science and engineering, enabling the prediction of complex physical phenomena arising in fluid and solid mechanics, chemical kinetics, biological systems, and many other domains. These models are typically derived from theoretical principles or constructed using empirical relations when theoretical understanding is incomplete. In many applications, the underlying physics is not fully understood or experimental data are limited due to cost, complexity, or safety considerations. As a result, simplifying assumptions are often introduced during model development, inevitably leading to a degree of model inadequacy (also referred to in this work as model error, model discrepancy, or bias).

* Corresponding author

Moreover, computational models depend on a set of independent variables, such as space or time, and a collection of input parameters that must be specified to evaluate the model. For example, permeability must be prescribed in groundwater simulations to predict hydraulic head, and Arrhenius rate coefficients must be specified in chemical kinetics models to simulate the evolution of species concentrations.
These parameters are typically inferred through model calibration, wherein limited observational data, obtained either from experiments or from higher-fidelity simulations, are used to learn the appropriate parameter values. In this work, we focus on the Bayesian formulation of such inverse problems while explicitly accounting for the model error.

Bayesian approaches to model calibration have been an active area of research for several decades [25, 45, 46, 48]. In particular, Gaussian processes (GPs) have been widely used to define priors over unknown functions and also as cheap emulators for expensive computer models [8, 12, 20, 29, 35, 38, 49]. Often, early work on Bayesian model calibration either subsumed model error into the observational error term or neglected it altogether by assuming it to be negligible. A key advance in explicitly accounting for model inadequacy was introduced by Kennedy and O'Hagan [27] (hereafter KOH). Their framework augments the computational model with a GP representation of the discrepancy between the model predictions and reality, and jointly infers both the model parameters and the discrepancy function using observational data. While the flexibility of the additive GP allows the capture of the spatial correlation structure of model error [6, 22, 28], several studies have highlighted key challenges with this formulation, most notably the difficulty in controlling how data inform the model parameters versus how they inform the GP [3, 4, 9, 17, 40, 51, 57]. This makes it difficult to assess the overall predictive capability of the computational model. In addition, without careful prior design [10], the GP-augmented predictions may not adhere to the underlying physical constraints and governing equations.

Recent theoretical work has shown that under simplifying assumptions of an inexpensive computer code, deterministic data, and a known covariance function for the GP bias, KOH calibration produces maximum likelihood estimates of the model parameters that depend on the reproducing kernel Hilbert space (RKHS) norm based on the bias covariance [56, 57]. This implies that the estimated parameter values are tied to the prior specification of the bias, which results in the KOH approach being $L_2$-inconsistent, meaning that the parameter values minimizing the $L_2$ loss are not recovered even in the infinite-data regime [56]. Several modifications have been proposed to address this limitation, including modular approaches that separately fit the model and the GP [2, 7]; prior tuning strategies that shape the discrepancy GP using physical knowledge about the parameters [10]; orthogonalizing the discrepancy GP with respect to model gradients, leading to the orthogonal Gaussian process (OGP) framework [43, 44]; and scaled GPs that reconcile the $L_2$ calibration of [56] with the KOH framework [18]. Among these, the OGP approach is particularly appealing because it provides a foundationally principled mechanism to disentangle the roles of the model and discrepancy within joint Bayesian calibration.

A more physically informed approach to model discrepancy is the embedded/internal model correction framework, in which stochastic correction terms are embedded within the computational model [5, 36, 39, 42, 53] rather than being additive on model outputs. In the embedded model-error formulation of [53], a statistical model is embedded within specific physical model components where approximations are known to have been made.
Estimation of parameters in this model error construction provides predictions with uncertainty accounting for model error, while also serving to provide diagnostic information on the importance of different approximations to the resulting predictive uncertainty. The simplest version of this approach involves enriching existing model parameters with additive random variables. Further, when polynomial chaos expansions (PCEs) are used to construct a surrogate for the dependence of model quantities of interest (QoIs) on uncertain model parameters, additional stochastic dimensions (germs) representing model error are employed to augment select model parameters, and the associated PCE coefficients are inferred alongside the physical model parameters. Subsequent uncertainty propagation yields predictions that remain consistent with the underlying physics while enabling the attribution of discrepancy to specific submodels. This framework has been successfully applied in diverse contexts, including chemical ignition modeling [19, 53], scramjet design [24], materials modeling [21], hypersonic flow modeling [31], and has been extended to account for noisy observations [52].

Most of the existing work on embedded model error has focused on random variable embeddings, which typically ignore the spatial and/or temporal dependence of the model error. A recent exception is the work of [15], which employs a 2-degree-of-freedom GP embedding, relying on the Karhunen-Loève expansion (KLE) [11, 26, 50], to represent model error, leveraging the flexibility of GPs to capture some spatial correlation in the context of nonlocal operator regression.

While there is an extensive literature on regression and generative modeling for non-linear data – such as warped GPs [54], deep GPs [14], and multifidelity GPs [41] – the problem considered in this work is fundamentally different. In our setting, the computational model is already known, and the objective is to calibrate this model using observational data while explicitly accounting for model error. Thus, rather than learning a purely data-driven surrogate, we aim to infer model parameters and model discrepancy in a Bayesian framework. Similarly, while there is significant work [32-34] on Bayesian learning of random fields, e.g. spatially dependent model parameters/inputs, the disambiguation challenges that emerge from the combination of physical parameterization and embedded random-field model error constructions distinguish the present context.

One reason for the relatively limited literature on GP-based embedded model error is the difficulty of inferring function-space GPs when they are embedded within non-linear, often non-invertible, computational models. To address this challenge, we propose adopting the weight-space representation of Gaussian processes [58], in which the GP is expressed as a weighted sum of eigenfunctions of the underlying covariance kernel. This representation enables inference of a GP embedded within a non-linear model without requiring invertibility. Nevertheless, Bayesian inference remains computationally challenging due to the increased dimensionality of the parameter space, as the GP weights must be inferred jointly with the computational model parameters. We argue that recent advances in dimensionality reduction of high-dimensional Bayesian inference problems [13, 33, 55] can be leveraged to mitigate these challenges.
However, the issue of disambiguation between the computational model and the embedded model bias persists and must be carefully addressed in this embedded model error setting.

This work has three main contributions. First, we extend the general embedded model error framework of [53] to increase the flexibility of model error representations by adopting the weight-space perspective of Gaussian processes. Second, to address the disambiguation issue and disentangle the contributions of the model from those of the embedded GP, we adapt the OGP methodology of [43] to the embedded model error setting. This ensures that the model's standalone predictions remain meaningful (with respect to a specified loss function), which is especially valuable when extrapolating beyond the data inputs. And lastly, we show how the likelihood-informed subspace (LIS) approach [13, 55], which enables efficient dimensionality reduction for Bayesian inference problems, can be employed to retain a sufficiently large set of basis functions in the weight-space representation of GPs for model error embedding.

The remainder of the paper is organized as follows. Section 2 reviews Gaussian processes, highlighting the equivalence between the function-space and weight-space perspectives. Section 3 presents the additive model error formulation along with the corresponding OGP construction. In Section 4, we introduce our embedded model error methodology using the weight-space GP formulation and describe two approaches for enforcing orthogonality in the embedded setting: linearized OGP (LOGP) and regularized OGP (ROGP). Section 5 reviews highlights of the LIS approach, which enables efficient dimensionality reduction for Bayesian inference. In Section 6, three examples – a linear model, a non-linear model with interacting submodels, and an advection-diffusion-reaction PDE – demonstrate the features of the proposed methodology. Finally, Section 7 provides concluding remarks and directions for future work.

2 Gaussian Processes

The weight-space perspective of Gaussian processes is particularly convenient for GP-based model error embedding within non-linear computational models, as it enables working with a finite set of weights associated with a fixed set of basis functions, in contrast to the non-parametric function-space view. While the two perspectives are equivalent in principle, it is useful to first examine how they compare in practice, both in terms of predictive capability and inference using data. We study this comparison in the regression setting with a training dataset $\mathcal{D} = \{(\mathbf{x}_i, y_i)\,|\, i = 1, \ldots, N\}$, with inputs $\mathbf{x}_i \in \mathcal{X} \subset \mathbb{R}^D$ and outputs $y_i \in \mathbb{R}$. The objective is to learn the relationship between $\mathbf{x}$ and $y$ in order to make predictions at new, unseen inputs. Let $X \in \mathbb{R}^{N \times D}$ denote the matrix of all training inputs and $\mathbf{y} \in \mathbb{R}^N$ the vector of corresponding training outputs. We next summarize how the function-space and weight-space views achieve this goal. For a detailed comparison, we refer the reader to [58].

2.1 Function-Space View

In the function-space (FS) formulation, we begin by defining a prior distribution over functions. A Gaussian process is fully specified by a mean function $m(\mathbf{x})$ and a covariance function $k(\mathbf{x}, \mathbf{x}')$:

$$ m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})], \qquad (1) $$
$$ k(\mathbf{x}, \mathbf{x}') = \mathbb{E}[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))]. \qquad (2) $$

We write $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$ to indicate that for any finite collection of inputs $\mathbf{x}_1, \ldots, \mathbf{x}_N \in \mathcal{X}$, the vector $[f(\mathbf{x}_1), \ldots, f(\mathbf{x}_N)]$ follows a multivariate normal distribution with the given mean and covariance functions.
Assuming independent and identically distributed (iid) Gaussian noise with zero mean and known variance $\sigma_d^2$, we model the observations as

$$ y = f(\mathbf{x}) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_d^2). \qquad (3) $$

With a set of test inputs $X_* \in \mathbb{R}^{N_t \times D}$ and $\mathbf{f}_* := [f(\mathbf{x}_1^*), \ldots, f(\mathbf{x}_{N_t}^*)]^\top$, the joint prior distribution of the noisy observations $\mathbf{y}$ and the latent function values at the test inputs is given by [58]

$$ \begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{bmatrix} K(X, X) + \sigma_d^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right), $$

where the covariance matrices $K(\cdot,\cdot)$ are obtained by evaluating the kernel function $k(\mathbf{x}, \mathbf{x}')$ at all relevant pairs of points, and $I$ denotes the $N \times N$ identity matrix. Conditioning this joint Gaussian prior on the observed data yields the posterior predictive distribution [58]

$$ \mathbf{f}_* \,|\, X, \mathbf{y}, X_* \sim \mathcal{N}\left(\bar{\mathbf{f}}_*, \mathrm{cov}(\mathbf{f}_*)\right), \qquad (4) $$

with

$$ \bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_d^2 I\right]^{-1} \mathbf{y}, \qquad (5) $$
$$ \mathrm{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_d^2 I\right]^{-1} K(X, X_*). \qquad (6) $$

2.2 Weight-Space View

The weight-space (WS) GP interpretation begins with Mercer's theorem [30], which states that any positive definite kernel function can be written as an (in general infinite) series expansion in the kernel eigenfunctions and eigenvalues:

$$ k(\mathbf{x}, \mathbf{x}') = \sum_{i=0}^{\infty} \lambda_i \phi_i(\mathbf{x}) \phi_i(\mathbf{x}'). $$

The eigenfunctions, $\phi_i$, and eigenvalues, $\lambda_i$, are obtained from the integral equation

$$ \int_{\mathcal{X}} k(\mathbf{x}, \mathbf{x}')\, \phi(\mathbf{x})\, d\mu(\mathbf{x}) = \lambda\, \phi(\mathbf{x}'), $$

where $\mu$ denotes a measure over $\mathcal{X}$. These eigenfunctions form an orthogonal basis and may be normalized using

$$ \int_{\mathcal{X}} \phi_i(\mathbf{x})\, \phi_j(\mathbf{x})\, d\mu(\mathbf{x}) = \delta_{ij}. $$

Any $L_2$ function that lies in the span of $\{\phi_i\}$ can therefore be expressed as [37]

$$ f(\mathbf{x}) = \sum_{i=0}^{\infty} \phi_i(\mathbf{x})\, w_i, \qquad (7) $$

where $w_i$ are the coefficients (weights) associated with the basis functions. In practice, the infinite series is truncated to the first $m$ terms:

$$ f_m(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}, \qquad (8) $$

where $\boldsymbol{\phi}(\mathbf{x}) = [\phi_0(\mathbf{x}), \ldots, \phi_{m-1}(\mathbf{x})]^\top$ and $\mathbf{w} \in \mathbb{R}^m$. Placing a Gaussian prior on the weights, $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \Sigma_p)$ with $\Sigma_p = \mathrm{diag}(\lambda_0, \ldots, \lambda_{m-1})$, and using Eqs. (1)-(2), we obtain the corresponding mean and covariance functions in the FS view [58]:

$$ m(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^\top \mathbb{E}[\mathbf{w}] = 0, \qquad (9) $$
$$ k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^\top \mathbb{E}[\mathbf{w}\mathbf{w}^\top]\, \boldsymbol{\phi}(\mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^\top \Sigma_p\, \boldsymbol{\phi}(\mathbf{x}') = \sum_{i=0}^{m-1} \lambda_i \phi_i(\mathbf{x}) \phi_i(\mathbf{x}'). \qquad (10) $$

Therefore, we see that truncation introduces a local variance deficit at location $\mathbf{x}$, given by

$$ \Delta\mathrm{Var}(\mathbf{x}) = \mathrm{Var}[f(\mathbf{x})] - \mathrm{Var}[f_m(\mathbf{x})] = \sum_{i=m}^{\infty} \lambda_i \phi_i(\mathbf{x})^2. \qquad (11) $$

To illustrate this deficit, consider a one-dimensional input $x$ distributed according to $\mathcal{N}(0, 2)$ and the squared exponential (SQE) covariance function $k(x, x') = \sigma^2 \exp\left(-(x - x')^2 / (2\ell^2)\right)$ with $\sigma = 1$. For this setting, analytical expressions for the eigenfunctions and eigenvalues are available [58]. Figure 1 compares plus/minus one standard deviation obtained from the FS view with that obtained from the truncated WS representation for $m = 10$, 20, and 40. The variance deficit is smallest near $x = 0$ and increases as we move away. This behaviour of the deficit is tied to the sampling distribution of $x$ as well as the kernel function.
For shorter correlation lengths, the eigenvalues decay more slowly, resulting in a faster loss of variance and requiring a larger number of terms to accurately approximate the FS view.

Figure 1: Panels (a) and (b) show the comparison of plus/minus one standard deviation obtained using the FS view (dashed black) and the WS view with m = 10 (red), 20 (green), and 40 (blue) for correlation lengths ℓ = 0.2 and ℓ = 1, respectively.

In the WS representation, the noisy observations are used to compute the posterior distribution of the weights via Bayes' theorem:

$$ \pi(\mathbf{w} \,|\, \mathbf{y}) \propto \pi(\mathbf{y} \,|\, \mathbf{w})\, \pi(\mathbf{w}). $$

Using the statistical model in Eq. (3), the likelihood is written as

$$ \pi(\mathbf{y} \,|\, \mathbf{w}) = \frac{1}{(2\pi\sigma_d^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_d^2}\, \|\mathbf{y} - \Phi^\top \mathbf{w}\|^2\right), $$

where $\Phi$ is the $m \times N$ matrix formed by stacking the basis evaluations $\boldsymbol{\phi}(\mathbf{x}_i)$ for all training points. Since both the likelihood and the prior are Gaussian, the posterior is also Gaussian, given by [58]

$$ \pi(\mathbf{w} \,|\, \mathbf{y}) \sim \mathcal{N}(\bar{\mathbf{w}}, A^{-1}), \qquad A = \sigma_d^{-2} \Phi \Phi^\top + \Sigma_p^{-1}, \qquad \bar{\mathbf{w}} = \sigma_d^{-2} A^{-1} \Phi \mathbf{y}. $$

Finally, pushing forward the posterior distribution of the weights through the linear model Eq. (8) yields the push-forward posterior (PFP) distribution at a new input $\mathbf{x}_*$ [58]:

$$ f_* \,|\, \mathbf{x}_*, X, \mathbf{y} \sim \mathcal{N}\left(\frac{1}{\sigma_d^2}\, \boldsymbol{\phi}(\mathbf{x}_*)^\top A^{-1} \Phi \mathbf{y},\;\; \boldsymbol{\phi}(\mathbf{x}_*)^\top A^{-1} \boldsymbol{\phi}(\mathbf{x}_*)\right). \qquad (12) $$

The collapse of the prior predictive standard deviation caused by truncation at a finite number of basis functions, seen in Figure 1, translates to the PFP in Eq. (12) when the test input $\mathbf{x}_*$ is far away from the training data, irrespective of the Gaussian noise present in the data. Figure 2 shows the PFP distributions with three standard deviations obtained using the FS and WS formulations. We take $N = 20$ data points generated using the function $f(x) = 1 + x + \sin x$ with $x \sim \mathcal{N}(0, 2)$. Two cases are compared, one with an iid noise variance of $10^{-10}$ and the other with $0.1$. The number of basis functions is fixed to $m = 40$ in the WS view. We see an exact correspondence between FS and WS in the vicinity of the training data, with a collapse in prediction uncertainty in the WS perspective as we move far away from the data points.

Figure 2: Comparison of PFP mean and plus/minus three standard deviations using FS and WS formulations. GP covariance function hyperparameters are $\sigma = 1$ and $\ell = 1$. In panels (a) and (b) the noise variance is set to $10^{-10}$ and in panels (c) and (d) it is set to 0.1.

To summarize this section, the WS view allows us to deal with a finite set of weights for a fixed set of basis functions while maintaining a close correspondence with the nonparametric FS view, as long as we do not extrapolate too far away from the training data points. With this precaution in mind, the WS view can be used in practice as a flexible GP-based embedded model error representation for both linear and non-linear computational models.
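The WS machinery above is compact enough to sketch in code. The following is a minimal illustration, not the paper's implementation: it approximates the SQE eigenpairs by Nyström/Monte Carlo quadrature (the paper's illustration uses the analytical SQE eigenpairs of [58], which are numerically preferable for large $m$), then forms the weight posterior and the PFP of Eq. (12). All function and variable names are our own, and we read $x \sim \mathcal{N}(0, 2)$ as variance 2.

```python
import numpy as np

def sqe_kernel(x1, x2, sig=1.0, ell=1.0):
    # Squared-exponential covariance k(x, x') = sig^2 exp(-(x - x')^2 / (2 ell^2))
    return sig**2 * np.exp(-(x1[:, None] - x2[None, :])**2 / (2.0 * ell**2))

def nystrom_eigenpairs(kernel, x_quad, w_quad, m):
    """Approximate the leading m Mercer eigenpairs of `kernel` under the measure
    represented by quadrature nodes x_quad and weights w_quad (summing to 1)."""
    W = np.sqrt(w_quad)
    evals, evecs = np.linalg.eigh(W[:, None] * kernel(x_quad, x_quad) * W[None, :])
    idx = np.argsort(evals)[::-1][:m]
    lam, phi_nodes = evals[idx], evecs[:, idx] / W[:, None]
    # Nystrom extension: phi_i(x) = (1/lam_i) sum_j w_j k(x, x_j) phi_i(x_j)
    phi = lambda x: (kernel(x, x_quad) * w_quad[None, :]) @ phi_nodes / lam[None, :]
    return lam, phi

# Eigenpairs of the SQE kernel (sig = ell = 1) under mu = N(0, 2), Monte Carlo nodes;
# m = 20 keeps the smallest retained eigenvalue comfortably above quadrature noise.
rng = np.random.default_rng(0)
x_quad = rng.normal(0.0, np.sqrt(2.0), 2000)
w_quad = np.full(x_quad.size, 1.0 / x_quad.size)
m = 20
lam, phi = nystrom_eigenpairs(lambda a, b: sqe_kernel(a, b, 1.0, 1.0), x_quad, w_quad, m)

# Data as in the paper's illustration: f(x) = 1 + x + sin(x), iid noise variance 0.1.
sig_d = np.sqrt(0.1)
x_tr = rng.normal(0.0, np.sqrt(2.0), 20)
y_tr = 1.0 + x_tr + np.sin(x_tr) + sig_d * rng.normal(size=20)

# Weight posterior: A = Phi Phi^T / sig_d^2 + Sigma_p^{-1}, wbar = A^{-1} Phi y / sig_d^2
Phi = phi(x_tr).T                                   # m x N basis evaluations
A = Phi @ Phi.T / sig_d**2 + np.diag(1.0 / lam)
wbar = np.linalg.solve(A, Phi @ y_tr) / sig_d**2

# Push-forward posterior, Eq. (12): mean and variance at new inputs.
x_new = np.linspace(-10.0, 10.0, 201)
Phi_n = phi(x_new).T
pfp_mean = Phi_n.T @ wbar
pfp_var = np.einsum('ij,ji->i', Phi_n.T, np.linalg.solve(A, Phi_n))
```

Far from the data, `pfp_var` collapses toward the truncated prior variance $\sum_{i<m} \lambda_i \phi_i(x)^2$, reproducing the deficit of Eq. (11).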
3 Additive Model Error

In this section, we first recall the general Bayesian framework for handling model error using additive GP representations. We then introduce the OGP formulation, which addresses the well-known issue of confounding between model parameters and the bias term.

3.1 General Formulation

In a typical model calibration problem, we are given $N$ observations, collected in a vector $\mathbf{y} \in \mathbb{R}^N$, corresponding to a matrix of $N$ inputs, $X \in \mathbb{R}^{N \times D}$. The data are assumed to be contaminated by iid Gaussian noise with known standard deviation $\sigma_d$, such that

$$ y(\mathbf{x}) = f_t(\mathbf{x}) + \varepsilon(\sigma_d), \qquad (13) $$

where $f_t(\mathbf{x})$ denotes the unknown true function and $\varepsilon(\sigma_d) \sim \mathcal{N}(0, \sigma_d^2)$. The goal is to infer the model parameters $\boldsymbol{\lambda} \in \Lambda \subset \mathbb{R}^p$ of a computational model $f(\mathbf{x}; \boldsymbol{\lambda})$ and subsequently make predictions at arbitrary points distinct from the data inputs. Kennedy and O'Hagan [27] proposed an additive GP approach to explicitly model the discrepancy between the truth and the predictions from the computational model. Their data model is given by

$$ y(\mathbf{x}) = \underbrace{f(\mathbf{x}; \boldsymbol{\lambda}) + \delta(\mathbf{x})}_{g(\mathbf{x}; \boldsymbol{\lambda})} + \varepsilon(\sigma_d), \qquad (14) $$

where $\delta(\mathbf{x}) \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}'))$ represents the additive model error (a non-zero mean could be used, but we adopt the zero-mean form for simplicity). Let $\mathbf{f}_{\boldsymbol{\lambda}}$ denote the vector of model evaluations at the input locations $X$. Using Bayes' rule, the posterior distribution of the model parameters becomes

$$ \pi(\boldsymbol{\lambda} \,|\, \mathbf{y}) \propto \pi(\mathbf{y} \,|\, \boldsymbol{\lambda})\, \pi(\boldsymbol{\lambda}), $$

where $\pi(\boldsymbol{\lambda})$ is the parameter prior and $\pi(\mathbf{y} \,|\, \boldsymbol{\lambda})$ is the likelihood. From (14), the data satisfy

$$ y \sim \mathcal{GP}\left(f(\mathbf{x}; \boldsymbol{\lambda}),\; k(\mathbf{x}, \mathbf{x}') + \sigma_d^2 \delta_{\mathbf{x}\mathbf{x}'}\right), $$

where $\delta_{\mathbf{x}\mathbf{x}'}$ is the Kronecker delta function. To avoid clutter in the notation, we do not show the GP hyperparameters and assume that they are given, or estimated, e.g. by maximizing the marginal likelihood. The parameter posterior may thus be written as

$$ \pi(\boldsymbol{\lambda} \,|\, \mathbf{y}) \propto \frac{\pi(\boldsymbol{\lambda})}{|K + \sigma_d^2 I|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{y} - \mathbf{f}_{\boldsymbol{\lambda}})^\top (K + \sigma_d^2 I)^{-1} (\mathbf{y} - \mathbf{f}_{\boldsymbol{\lambda}})\right), \qquad (15) $$

where $K$ is the $N \times N$ matrix with elements $k(\mathbf{x}_i, \mathbf{x}_j)$. Having obtained the parameter posteriors, the posterior distribution of the bias function at some arbitrary location, $\mathbf{x}'$, can now be obtained with [43]

$$ \pi(\delta(\mathbf{x}') \,|\, \mathbf{y}) = \int \pi(\delta(\mathbf{x}') \,|\, \mathbf{y}, \boldsymbol{\lambda})\, \pi(\boldsymbol{\lambda} \,|\, \mathbf{y})\, d\boldsymbol{\lambda}, \qquad (16) $$

where the distribution $\pi(\delta(\mathbf{x}') \,|\, \mathbf{y}, \boldsymbol{\lambda})$ follows from Eqs. (4)-(6) with $\mathbf{y}$ replaced by $\mathbf{y} - \mathbf{f}_{\boldsymbol{\lambda}}$ in Eq. (5). With this straightforward KOH formulation, we see from Eqs. (15) and (16) that there is no built-in mechanism to control how data inform the parameters $\boldsymbol{\lambda}$ versus how they inform the bias $\delta(\mathbf{x})$. This makes it difficult to compare the contributions of the various parameters to the predictive performance of the model and, more importantly, to assess the overall predictive capability of the computational model. In the next section, we describe the OGP formulation of [43], which provides a principled mechanism to address this issue within a joint Bayesian framework.
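For concreteness, Eq. (15) can be evaluated as follows. This is a minimal sketch with our own placeholder names (`model_f`, `kernel`, `log_prior` are assumptions, not the paper's code), using a Cholesky factorization for the determinant and solve.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def koh_log_posterior(lam, x, y, model_f, kernel, sigma_d, log_prior):
    """Unnormalized log of Eq. (15): Gaussian marginal likelihood with
    covariance K + sigma_d^2 I, plus the log prior on the model parameters."""
    C = kernel(x, x) + sigma_d**2 * np.eye(len(y))   # total data covariance
    r = y - model_f(x, lam)                          # data-model residual
    cf = cho_factor(C, lower=True)
    quad = r @ cho_solve(cf, r)                      # r^T C^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))    # log |C|
    return log_prior(lam) - 0.5 * (quad + logdet)
```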
3.2 Formulation with Orthogonal Gaussian Processes

We note from Eq. (15) that the parameter posterior depends on the bias covariance matrix $K$. The method introduced in [43] modifies this covariance based on a "best" parameter value, $\boldsymbol{\lambda}^*$, defined according to a suitable loss function, so that the parameter posterior lies in the vicinity of $\boldsymbol{\lambda}^*$. Specifically, the bias covariance is modified to enforce orthogonality between the discrepancy and the gradient of the computer model evaluated at $\boldsymbol{\lambda}^*$.

Assuming both the true function and the computer model to be deterministic, the approach begins by defining a loss functional that can uniquely identify the best parameter value. Using the $L_2$ loss weighted by a measure $\mu$ over $\mathcal{X}$ [43],

$$ \mathcal{L}_{L_2(\mu)}\{f_t(\cdot) - f(\cdot; \boldsymbol{\lambda})\} = \int_{\mathcal{X}} \left(f_t(\mathbf{x}) - f(\mathbf{x}; \boldsymbol{\lambda})\right)^2 d\mu(\mathbf{x}), \qquad (17) $$

the optimal parameter $\boldsymbol{\lambda}^*$ is defined as

$$ \boldsymbol{\lambda}^* = \operatorname*{argmin}_{\boldsymbol{\lambda} \in \Lambda}\; \mathcal{L}_{L_2(\mu)}\{f_t(\cdot) - f(\cdot; \boldsymbol{\lambda})\}, $$

where $\Lambda$ denotes an admissible parameter space. The true function may then be approximated as $f_t(\mathbf{x}) \approx f(\mathbf{x}; \boldsymbol{\lambda}^*) + \delta(\mathbf{x})$, where $\delta(\mathbf{x})$ represents the model discrepancy. Substituting the above approximation into Eq. (17) yields

$$ \mathcal{L}_{L_2(\mu)}\{f_t(\cdot) - f(\cdot; \boldsymbol{\lambda})\} = \int_{\mathcal{X}} \left(f(\mathbf{x}; \boldsymbol{\lambda}^*) + \delta(\mathbf{x}) - f(\mathbf{x}; \boldsymbol{\lambda})\right)^2 d\mu(\mathbf{x}). $$

Because this loss function is minimized at $\boldsymbol{\lambda}^*$, its gradient with respect to $\boldsymbol{\lambda}$ must vanish at $\boldsymbol{\lambda}^*$, leading to the orthogonality condition [43]

$$ \int_{\mathcal{X}} \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, \delta(\mathbf{x})\, d\mu(\mathbf{x}) = \mathbf{0}. \qquad (18) $$

This condition results in $p$ linear constraints on the discrepancy process:

$$ \mathcal{L}_k(\delta) := \int_{\mathcal{X}} \frac{\partial f}{\partial \lambda_k}(\mathbf{x}; \boldsymbol{\lambda})\Big|_{\boldsymbol{\lambda}^*}\, \delta(\mathbf{x})\, d\mu(\mathbf{x}) = 0, \qquad k = 1, \ldots, p. \qquad (19) $$

Starting with the prior $\delta(\mathbf{x}) \sim \mathcal{GP}(0, k(\mathbf{x}, \mathbf{x}'))$, the modified covariance obtained by conditioning the prior GP on these linear constraints is given by [43, 44]

$$ k_{\boldsymbol{\lambda}^*}(\mathbf{x}, \mathbf{x}') = k(\mathbf{x}, \mathbf{x}') - \mathbf{h}_{\boldsymbol{\lambda}^*}(\mathbf{x})^\top H_{\boldsymbol{\lambda}^*}^{-1}\, \mathbf{h}_{\boldsymbol{\lambda}^*}(\mathbf{x}'), \qquad (20) $$

where

$$ \mathbf{h}_{\boldsymbol{\lambda}^*}(\mathbf{x}) = \int_{\mathcal{X}} \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}'; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, k(\mathbf{x}, \mathbf{x}')\, d\mu(\mathbf{x}') \in \mathbb{R}^p, \qquad (21) $$
$$ H_{\boldsymbol{\lambda}^*} = \int_{\mathcal{X}} \int_{\mathcal{X}} \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}'; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}; \boldsymbol{\lambda})^\top\big|_{\boldsymbol{\lambda}^*}\, k(\mathbf{x}', \mathbf{x})\, d\mu(\mathbf{x}')\, d\mu(\mathbf{x}) \in \mathbb{R}^{p \times p}. \qquad (22) $$

In practice, $\boldsymbol{\lambda}^*$ is obtained by solving the least squares (LS) problem using the observed data $\mathbf{y}$ and the computer model. The integrals in Eqs. (21)-(22) are typically computed numerically, except in certain special cases where closed-form expressions are available (see [44]). Because $k_{\boldsymbol{\lambda}^*}$ incorporates the orthogonality condition implied by Eq. (18), adopting the prior $\delta(\mathbf{x}) \sim \mathcal{GP}(0, k_{\boldsymbol{\lambda}^*}(\mathbf{x}, \mathbf{x}'))$ prevents the discrepancy from being confounded with the best-fit parameter value $\boldsymbol{\lambda}^*$. This results in a posterior distribution $\pi(\boldsymbol{\lambda} \,|\, \mathbf{y})$ that lies in the vicinity of $\boldsymbol{\lambda}^*$. Also, the posterior distribution in the FS view using the OGP framework has the same form as Eq. (15), with $K$ replaced by $K_{\boldsymbol{\lambda}^*}$.

The WS view may also be used by expanding the discrepancy in the eigenfunctions of $k_{\boldsymbol{\lambda}^*}$. To see how orthogonality modifies the basis functions in the WS GP representation, $\delta_{\mathbf{w}}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}$, $\mathbf{w} \in \mathbb{R}^m$, consider a linear model $f(\mathbf{x}; \boldsymbol{\lambda}) = \mathbf{a}(\mathbf{x})^\top \boldsymbol{\lambda}$, where $\mathbf{a}(\mathbf{x}) = [a_1(\mathbf{x}), \cdots, a_p(\mathbf{x})]^\top$. Substituting the GP basis expansion and model gradients in Eq. (18) results in

$$ \int_{\mathcal{X}} \phi_k(\mathbf{x})\, a_i(\mathbf{x})\, d\mu(\mathbf{x}) = 0, \qquad k = 0, \cdots, m-1, \quad i = 1, \cdots, p. $$

Thus, each basis function is orthogonalized with respect to the $a_i(\mathbf{x})$ functions. In the next section, we build upon this framework for orthogonality in model error embedding with WS GPs.
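Numerically, Eqs. (20)-(22) amount to a few quadrature-weighted matrix products. The sketch below is our own illustration (reusing `sqe_kernel` and `nystrom_eigenpairs` from the Section 2 sketch): it builds $k_{\boldsymbol{\lambda}^*}$ from a kernel, a model-gradient function, and quadrature nodes/weights for $\mu$.

```python
import numpy as np

def ogp_kernel(kernel, grad_f, x_quad, w_quad):
    """Orthogonalized covariance k_{lam*} of Eq. (20), with the integrals of
    Eqs. (21)-(22) approximated by quadrature over mu.

    kernel(x1, x2) -> (n1, n2) covariance matrix
    grad_f(x)      -> (n, p) model gradient w.r.t. lambda, evaluated at lam*
    """
    Gw = grad_f(x_quad) * w_quad[:, None]              # quadrature weights folded in
    H = Gw.T @ kernel(x_quad, x_quad) @ Gw             # Eq. (22), p x p
    Hinv = np.linalg.inv(H)
    h = lambda x: kernel(x, x_quad) @ Gw               # Eq. (21), rows h(x)^T
    return lambda x1, x2: kernel(x1, x2) - h(x1) @ Hinv @ h(x2).T

# Example gradient for the linear model f(x; lam) = lam0 + lam1 * x above.
grad_lin = lambda x: np.column_stack([np.ones_like(x), x])
```

Passing the returned kernel back through `nystrom_eigenpairs` then yields basis functions satisfying the discrete analogue of $\int \phi_k\, a_i\, d\mu = 0$.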
4 Model Error Embedding with Orthogonal Gaussian Processes

As proposed in [53], the idea of internal model correction is to embed model error terms within the computer model so that propagation of model error satisfies the constraints of the model, leading to physically consistent predictions. Along the same lines, we extend the methodology of [53] and embed our model $f$ with a WS GP to make the model error characterization more flexible. The data model can be written as

$$ y = \underbrace{\tilde{f}(\mathbf{x}; \boldsymbol{\lambda}, \delta_{\mathbf{w}}(\mathbf{x}))}_{g(\mathbf{x}; \boldsymbol{\lambda}, \mathbf{w})} + \varepsilon(\sigma_d), \qquad (23) $$

where $\delta_{\mathbf{w}}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}$ is the WS GP whose basis functions correspond to a covariance function $k(\mathbf{x}, \mathbf{x}')$. In the case of $\tilde{f}$ being a linear model, we focus on additive embedding in order to retain linearity in $(\boldsymbol{\lambda}, \mathbf{w})$. As a result, this case reduces to the KOH framework. Similarly, our framework for additive model error embedding with OGP, when applied to linear models, simply recovers the OGP framework of [43] described in the previous section. Therefore, for the subsequent development of our methodology we focus only on non-linear models.

Following Bayes' rule, the joint posterior of $(\boldsymbol{\lambda}, \mathbf{w})$, according to the data model in Eq. (23), is given by

$$ \pi(\boldsymbol{\lambda}, \mathbf{w} \,|\, \mathbf{y}) \propto \frac{\pi(\boldsymbol{\lambda}, \mathbf{w})}{(2\pi\sigma_d^2)^{N/2}} \exp\left(-\frac{\sum_{i=1}^{N} \left(y_i - \tilde{f}\left(\mathbf{x}_i; \boldsymbol{\lambda}, \boldsymbol{\phi}(\mathbf{x}_i)^\top \mathbf{w}\right)\right)^2}{2\sigma_d^2}\right). \qquad (24) $$

Straightforward sampling of the above posterior will again not avoid confounding of model parameters and bias weights. To achieve the required disambiguation, we propose two approaches – one based on model linearization (LOGP) and the other based on regularization (ROGP) – that are built on the same OGP foundations introduced in Section 3.2.
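The unnormalized log of Eq. (24) is straightforward to evaluate for any embedded model; a minimal sketch with our own placeholder names (`f_tilde`, `phi`, `log_prior`):

```python
import numpy as np

def embedded_log_posterior(lam, w, x, y, f_tilde, phi, sigma_d, log_prior):
    """Unnormalized log of Eq. (24) for the GP-embedded model
    g(x; lam, w) = f_tilde(x, lam, delta_w(x)), with delta_w(x) = phi(x)^T w."""
    delta = phi(x) @ w                        # embedded WS GP at the data inputs
    r = y - f_tilde(x, lam, delta)            # residuals
    return (log_prior(lam, w)
            - 0.5 * len(y) * np.log(2.0 * np.pi * sigma_d**2)
            - 0.5 * np.sum(r**2) / sigma_d**2)

# Example embedding, of the form used later in Section 6.2 (error in the exponential):
f_tilde = lambda x, lam, delta: np.sin(lam[0] * x) + np.exp(lam[1] * x + delta)
```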
4.1 Linearized OGP

Similar to OGP for additive model error, we start by defining a best parameter value, $\boldsymbol{\lambda}^*$, according to the $L_2$ loss weighted by a measure $\mu$ over $\mathcal{X}$:

$$ \boldsymbol{\lambda}^* = \operatorname*{argmin}_{\boldsymbol{\lambda} \in \Lambda}\; \mathcal{L}_{L_2(\mu)}\{f_t(\cdot) - f(\cdot; \boldsymbol{\lambda})\}. $$

To approximate the true function using the predictive function $\tilde{f}$, we first linearize $\tilde{f}$ using a first-order Taylor series expansion about $\mathbf{w}^*$ (yet to be defined) and $\boldsymbol{\lambda}^*$:

$$
\begin{aligned}
g(\mathbf{x}; \boldsymbol{\lambda}, \mathbf{w}) &\approx \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \delta_{\mathbf{w}^*}(\mathbf{x})) + \nabla_{\boldsymbol{\lambda}} \tilde{f}^\top\big|_{(\boldsymbol{\lambda}^*, \mathbf{w}^*)}\, \Delta\boldsymbol{\lambda} + \nabla_{\mathbf{w}} \tilde{f}^\top\big|_{(\boldsymbol{\lambda}^*, \mathbf{w}^*)}\, \Delta\mathbf{w} \\
&= \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \delta_{\mathbf{w}^*}(\mathbf{x})) + \nabla_{\boldsymbol{\lambda}} \tilde{f}^\top\big|_{(\boldsymbol{\lambda}^*, \mathbf{w}^*)}\, \Delta\boldsymbol{\lambda} + \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{w}^*)}\, \nabla_{\mathbf{w}} \delta_{\mathbf{w}}(\mathbf{x})^\top\big|_{\mathbf{w}^*}\, \Delta\mathbf{w} \\
&= \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \delta_{\mathbf{w}^*}(\mathbf{x})) + \nabla_{\boldsymbol{\lambda}} \tilde{f}^\top\big|_{(\boldsymbol{\lambda}^*, \mathbf{w}^*)}\, \Delta\boldsymbol{\lambda} + \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{w}^*)} \sum_{j=0}^{m-1} \phi_j(\mathbf{x})(w_j - w_j^*),
\end{aligned}
$$

where $\Delta\boldsymbol{\lambda} = \boldsymbol{\lambda} - \boldsymbol{\lambda}^*$, $\Delta\mathbf{w} = \mathbf{w} - \mathbf{w}^*$, $\nabla_{\boldsymbol{\lambda}} \tilde{f} := \nabla_{\boldsymbol{\lambda}} \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}, \delta_{\mathbf{w}}(\mathbf{x}))$ and $\partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f} := \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}, \delta_{\mathbf{w}}(\mathbf{x}))$. Now, setting $\mathbf{w}^* = \mathbf{0}$ and evaluating the above linearized function at $\boldsymbol{\lambda} = \boldsymbol{\lambda}^*$, we get

$$ f_t(\mathbf{x}) \approx f(\mathbf{x}; \boldsymbol{\lambda}^*) + \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})} \sum_{j=0}^{m-1} \phi_j(\mathbf{x})\, w_j = f(\mathbf{x}; \boldsymbol{\lambda}^*) + \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}\, \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}. \qquad (25) $$

With this separation of the computer model and model error contributions, the orthogonality constraint on $\delta_{\mathbf{w}}(\mathbf{x})$ can be derived by substituting Eq. (25) into the loss function:

$$ \mathcal{L}_{L_2(\mu)}\{f_t(\cdot) - f(\cdot; \boldsymbol{\lambda})\} = \int_{\mathcal{X}} \left(f(\mathbf{x}; \boldsymbol{\lambda}^*) + \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}\, \delta_{\mathbf{w}}(\mathbf{x}) - f(\mathbf{x}; \boldsymbol{\lambda})\right)^2 d\mu(\mathbf{x}). $$

Evaluating the gradient of the loss with respect to $\boldsymbol{\lambda}$ at $\boldsymbol{\lambda}^*$ results in the required constraints:

$$ \int_{\mathcal{X}} \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}\, \nabla_{\boldsymbol{\lambda}} f\big|_{\boldsymbol{\lambda}^*}\, \delta_{\mathbf{w}}(\mathbf{x})\, d\mu(\mathbf{x}) = \mathbf{0}. \qquad (26) $$

We see that, as in the additive GP case, we have $p$ linear constraints involving the embedded GP:

$$ \mathcal{L}_k(\delta_{\mathbf{w}}(\mathbf{x})) := \int_{\mathcal{X}} \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}\, \partial_{\lambda_k} f\big|_{\boldsymbol{\lambda}^*}\, \delta_{\mathbf{w}}(\mathbf{x})\, d\mu(\mathbf{x}) = 0, \qquad k = 1, \cdots, p. \qquad (27) $$

The modified covariance for LOGP, $k_{\boldsymbol{\lambda}^*}(\mathbf{x}, \mathbf{x}')$, has the same form as Eq. (20), but with the integrals in Eqs. (21)-(22) now also including the gradient $\partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}$:

$$ \mathbf{h}_{\boldsymbol{\lambda}^*}(\mathbf{x}) = \int_{\mathcal{X}} \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}(\mathbf{x}')\, \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}'; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, k(\mathbf{x}, \mathbf{x}')\, d\mu(\mathbf{x}') \in \mathbb{R}^p, \qquad (28) $$
$$ H_{\boldsymbol{\lambda}^*} = \int_{\mathcal{X}} \int_{\mathcal{X}} \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}(\mathbf{x}')\, \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}'; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*} \times \partial_{\delta_{\mathbf{w}}(\mathbf{x})} \tilde{f}\big|_{(\boldsymbol{\lambda}^*, \mathbf{0})}(\mathbf{x})\, \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}; \boldsymbol{\lambda})^\top\big|_{\boldsymbol{\lambda}^*}\, k(\mathbf{x}', \mathbf{x})\, d\mu(\mathbf{x}')\, d\mu(\mathbf{x}) \in \mathbb{R}^{p \times p}. \qquad (29) $$

We can now obtain the WS GP for $k_{\boldsymbol{\lambda}^*}(\mathbf{x}, \mathbf{x}')$ numerically, which can be used for inference in Eq. (24).

4.2 Regularized OGP

A different approach to enforce the orthogonality constraints is to use regularization terms with the prior. The advantage of this approach is that the constraints on the embedded GP can be applied without any linearization of the model. In addition, unlike the LOGP construction, this approach leaves the eigenfunctions of the original covariance kernel unchanged. Consequently, one may exploit analytical eigenfunctions, such as those available for the SQE kernel under a Gaussian measure on the inputs, thereby avoiding the numerical errors that often arise in estimating eigenfunctions. To construct the necessary regularization terms, we begin again by approximating the unknown true function with

$$ f_t(\mathbf{x}) \approx \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}). $$

Now, we expand the loss as

$$ \mathcal{L}_{L_2(\mu)}\{f_t(\cdot) - f(\cdot; \boldsymbol{\lambda})\} = \int_{\mathcal{X}} \left(\tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}) - f(\mathbf{x}; \boldsymbol{\lambda})\right)^2 d\mu(\mathbf{x}). $$

Evaluating the gradient of this loss with respect to $\boldsymbol{\lambda}$ at $\boldsymbol{\lambda}^*$ results in the required constraints

$$ \int_{\mathcal{X}} \left[\tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}) - f(\mathbf{x}; \boldsymbol{\lambda}^*)\right] \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, d\mu(\mathbf{x}) = \mathbf{0}. $$

Again, we have $p$ non-linear constraints in $\delta_{\mathbf{w}}(\mathbf{x})$:

$$ \mathcal{R}_k(\boldsymbol{\lambda}^*, \delta_{\mathbf{w}}(\mathbf{x})) := \int_{\mathcal{X}} \left[\tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, \boldsymbol{\phi}(\mathbf{x})^\top \mathbf{w}) - f(\mathbf{x}; \boldsymbol{\lambda}^*)\right] \partial_{\lambda_k} f(\mathbf{x}; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, d\mu(\mathbf{x}) = 0, \qquad k = 1, \cdots, p, \qquad (30) $$

which can be enforced as regularization terms along with the prior, resulting in a joint log-posterior written as

$$ \log \pi(\boldsymbol{\lambda}, \mathbf{w} \,|\, \mathbf{y}) \propto \log \pi(\boldsymbol{\lambda}, \mathbf{w}) - \sum_{i=1}^{N} \left[\frac{1}{2}\log(2\pi\sigma_d^2) + \frac{1}{2\sigma_d^2}\left\{y_i - \tilde{f}(\mathbf{x}_i; \boldsymbol{\lambda}, \boldsymbol{\phi}(\mathbf{x}_i)^\top \mathbf{w})\right\}^2\right] - \frac{1}{2}\sum_{k=1}^{p} \alpha_k\, \mathcal{R}_k(\boldsymbol{\lambda}^*, \delta_{\mathbf{w}}(\mathbf{x}))^2, \qquad (31) $$

where $\{\alpha_k\}$ denote the penalty parameters. The values of $\boldsymbol{\alpha} = \{\alpha_k\}$ must be decided according to how strongly one wishes to enforce the orthogonality constraint. The larger the value of $\alpha_k$, the closer the parameter posterior will be to $\lambda_k^*$. However, our numerical experiments also demonstrate that large values of $\boldsymbol{\alpha}$ introduce strong correlations among the GP weights, leading to larger compute times with Markov chain Monte Carlo (MCMC) sampling. Therefore, we recommend choosing $\boldsymbol{\alpha}$ so that the parameter posteriors are "close enough" to $\boldsymbol{\lambda}^*$ as deemed by the user, while avoiding an overly expensive MCMC sampling.
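A sketch of the penalized log-posterior of Eq. (31), again with our own placeholder names: the constraints $\mathcal{R}_k$ of Eq. (30) are approximated by quadrature over $\mu$, and we use the fact that, by construction of the embedding, $f(\mathbf{x}; \boldsymbol{\lambda}^*) = \tilde{f}(\mathbf{x}; \boldsymbol{\lambda}^*, 0)$.

```python
import numpy as np

def rogp_log_posterior(lam, w, x, y, f_tilde, phi, sigma_d, log_prior,
                       lam_star, grad_f, x_quad, w_quad, alpha):
    """Unnormalized log of Eq. (31): the embedded-model posterior of Eq. (24)
    penalized by the squared ROGP constraints R_k of Eq. (30)."""
    # Likelihood and prior, as in Eq. (24)
    delta = phi(x) @ w
    r = y - f_tilde(x, lam, delta)
    logpost = (log_prior(lam, w)
               - 0.5 * len(y) * np.log(2.0 * np.pi * sigma_d**2)
               - 0.5 * np.sum(r**2) / sigma_d**2)
    # R_k of Eq. (30) by quadrature over mu; f(x; lam*) = f_tilde(x, lam*, 0)
    delta_q = phi(x_quad) @ w
    mismatch = f_tilde(x_quad, lam_star, delta_q) - f_tilde(x_quad, lam_star, 0.0 * delta_q)
    R = ((w_quad * mismatch)[:, None] * grad_f(x_quad)).sum(axis=0)   # shape (p,)
    return logpost - 0.5 * np.sum(alpha * R**2)
```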
5 Likelihood-Informed Subspace

As demonstrated in Section 2.2, achieving a close correspondence between the WS and FS formulations requires retaining a sufficiently large number of basis functions and associated weights, especially when making predictions far from the training data. However, retaining many weights substantially increases the computational cost of MCMC sampling, since the resulting inverse problem becomes high-dimensional. In practice, given data size limitations, only a subset of the model parameters and GP weights are likely to be strongly informed by the data, while the remaining weights, typically those corresponding to higher-order basis functions, would have posteriors that remain close to their priors. This observation motivates the use of the Likelihood-Informed Subspace (LIS) method [13, 55], which allows the retention of a large basis set while reducing the effective dimensionality of the Bayesian inverse problem. Beyond dimensionality reduction, LIS provides a principled mechanism for combining the posterior of the data-informed parameters, obtained via projection onto the LIS, with the prior of the uninformed parameters, obtained via projection onto the complementary subspace (CS).

LIS has been developed for both linear and nonlinear model constructions [13, 55]. Notably, in the nonlinear setting, the method identifies a low-dimensional subspace in the parameter space, referred to as the global LIS, along which the likelihood most strongly alters the prior. This global LIS is estimated as the Monte Carlo mean of local LISs at several points in the parameter space. The posterior can then be approximated as the product of a reduced posterior supported on the global LIS and the prior marginalized onto its CS. With calibration parameters $\boldsymbol{\theta} = (\boldsymbol{\lambda}, \mathbf{w}) \in \mathbb{R}^{p+m}$, the posterior admits the approximation

$$ \pi(\boldsymbol{\theta} \,|\, \mathbf{y}) \approx \hat{\pi}(\boldsymbol{\theta} \,|\, \mathbf{y}) \propto \pi(\mathbf{y} \,|\, P_r \boldsymbol{\theta})\, \pi(\boldsymbol{\theta}) = \pi(\mathbf{y} \,|\, U_r \boldsymbol{\theta}_r)\, \pi_r(\boldsymbol{\theta}_r)\, \pi_\perp(\boldsymbol{\theta}_\perp) = \pi(\boldsymbol{\theta}_r \,|\, \mathbf{y})\, \pi_\perp(\boldsymbol{\theta}_\perp), \qquad (32) $$

where $P_r$ denotes the rank-$r$ projector onto the global LIS; $U_r$ denotes the global LIS basis, with $\boldsymbol{\theta}_r$ representing the corresponding coordinates, and $\boldsymbol{\theta}_\perp$ denotes the global CS coordinates. Thus, the effective dimension for MCMC sampling is reduced to $r \leq p + m$. The value of $r$ is chosen to retain all meaningful/significant likelihood-informed directions. We provide a summary of the method in Appendix A and refer the reader to [13, 55] for further details.
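The following is a minimal sketch of the global-LIS construction, paraphrasing [13, 55] for the Gaussian-prior, Gaussian-likelihood setting used here ($\Gamma_{\mathrm{pr}} = L_{\mathrm{pr}} L_{\mathrm{pr}}^\top$); the adaptive local-LIS averaging of [13] is reduced to a plain Monte Carlo loop, and all names are our own.

```python
import numpy as np

def global_lis(jacobian, theta_samples, L_pr, sigma_d, cutoff=0.1):
    """Sketch of the global LIS: eigendecompose the Monte Carlo average of the
    prior-preconditioned Gauss-Newton Hessian of the log-likelihood,
    L_pr^T J^T J L_pr / sigma_d^2, where J = jacobian(theta) is the forward-model
    Jacobian and Gamma_pr = L_pr L_pr^T is the Gaussian prior covariance."""
    d = L_pr.shape[1]
    H = np.zeros((d, d))
    for th in theta_samples:
        Jw = jacobian(th) @ L_pr                 # whitened Jacobian
        H += Jw.T @ Jw / sigma_d**2
    H /= len(theta_samples)
    evals, evecs = np.linalg.eigh(H)
    order = np.argsort(evals)[::-1]
    evals, V = evals[order], evecs[:, order]
    r = int(np.sum(evals > cutoff))              # retained likelihood-informed directions
    return V, evals, r                           # LIS basis: U_r = L_pr @ V[:, :r]
```

For a linear forward model, `jacobian` is constant and a single "sample" suffices, consistent with the equivalence of local and global LIS noted later for the linear example.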
6 Applications

Next we demonstrate the key features of our methodology through three examples – a linear model, a non-linear model with interacting sub-models, and an advection-diffusion-reaction (ADR) PDE. In each case, the goal is to calibrate a "fit" model with respect to a "truth" model while accounting for the model error using the embedded GP (additive GP in the linear case). In all the applications, we compare the push-forward posterior through the fit model on its own, denoted by PFP-f, the push-forward posterior through the GP-augmented fit model, denoted by PFP-g, and the posterior predictive, denoted by PP, which includes both the GP-augmented fit model and the measurement noise. This comparison will allow us to demonstrate the key strengths of the proposed method, which are: (i) the orthogonality constraints on the embedded GP lead to predictive distributions obtained with the fit model on its own that are meaningful, in the sense that they closely follow predictions obtained with the least-squares estimates of the model parameters; (ii) to the extent allowed by the fit model structure, the embedded GP provides flexibility in capturing the spatial correlations of the model error and hence addressing the structural deficiencies of the fit model. The No-U-Turn-Sampler (NUTS) algorithm [23] implemented in the Python package PyMC [1] is used for all the MCMC results obtained with ROGP. For LOGP with LIS, we use a combination of emcee [16] and PyMC.

6.1 Linear Model

Let the truth model, $f_t(x) = 2 + 2x + 3x^2 - 5x^3$, and the fit model, $f(x; \boldsymbol{\lambda}) = \lambda_0 + \lambda_1 x$, be defined on $x \in [-3, 3]$, with $\boldsymbol{\lambda} = (\lambda_0, \lambda_1)$ denoting the calibration parameters. The measurement data are generated by evaluating the truth model and adding iid Gaussian noise with standard deviation $\sigma_d = 0.2$. We first demonstrate the model calibration using the KOH method to highlight its inability to disambiguate between model and bias predictions. The GP-augmented fit model is given by

$$ g(x; \boldsymbol{\lambda}, \mathbf{w}) = \lambda_0 + \lambda_1 x + \boldsymbol{\phi}(x)^\top \mathbf{w}. $$

We start with $N = 20$ data points across the $x$-axis, drawn randomly from a uniform distribution on $[-1, 1]$, and use an SQE kernel with $\ell = 0.3$ and $\sigma_f = 1$. We restrict the data set to this domain in order to show the behaviour of predictive distributions when extrapolating away from the data. The GP expansion uses $m = 20$ weights (this choice is based on a convergence study that is detailed later in this section). A Gaussian prior is placed on the model parameters, $\pi(\boldsymbol{\lambda}) \sim \mathcal{N}([-2, 4], \mathrm{diag}(1, 1))$. As described previously in Section 2.2, the prior on the GP weights is Gaussian with a diagonal covariance matrix consisting of the kernel eigenvalues, in order to maintain equivalence with the corresponding function-space view. We also note that the posteriors in this linear model case are obtained analytically, for both KOH and OGP, since both the prior and the likelihood are Gaussian.

Figures 3a and 3b present the marginal posterior distributions of the parameters $\boldsymbol{\lambda}$. The solid black lines denote the optimal parameter values obtained by solving the corresponding least squares problem. While the marginal posteriors indicate a degree of information gain relative to the priors, becoming narrower, they still indicate a significant degree of uncertainty despite the highly informative data set, and are further away from the LS estimates in comparison to the OGP results in Figure 4 discussed below. Figure 3c shows the PFP and PP distributions. Because the marginal parameter posteriors are weakly informed, the resulting predictions from the fit model are not reliable. As expected, the GP-augmented fit model, shown in orange, accurately captures the trend of the data given the flexibility of the GP correction, and lastly, the posterior predictive, plotted in green, fully "covers" all the data points.

Figure 3: Frames (a) and (b) show marginal priors, posteriors and least-squares estimates for λ0 and λ1 obtained with KOH. Panel (c) shows the truth function, data points and push-forward posterior predictions. PFP lines show the mean prediction and bands show ± three standard deviations.

Figure 4: Frames (a) and (b) show marginal priors, posteriors and least-squares estimates for λ0 and λ1 obtained with OGP. Panel (c) shows the truth function, data points and push-forward posterior predictions. PFP lines show the mean prediction and bands show ± three standard deviations.
Figure 4 shows the results for marginal posteriors, PFP, and PP distributions obtained with the OGP method. The parameter posteriors are now better informed by the data and lie in the vicinity of the LS parameter values. Accordingly, the push-forward posterior through the fit model on its own provides meaningful predictions, as the model now has full latitude in learning from the data, while letting the additive GP capture the residual discrepancy. We note that, given the infinite support of our prior, the LS estimate is expected to be on top of the MAP estimate without model error consideration in the infinite-data limit, making it an ideal reference.

It is also instructive to compare the joint distributions of the model parameters and GP weights obtained using the two methods, as shown in Figure 5. In the KOH formulation, a clear correlation emerges between the parameter λ0 and the weight w0, illustrating the conflation between the fit model and the GP component. In contrast, the OGP-based formulation yields joint distributions in which the model parameters and GP weights are effectively uncorrelated. This decoupling not only reflects the intended separation between model and bias contributions but also leads to more efficient MCMC sampling, as will be described in the next example.

Figure 5: Frames (a) and (b) show the joint posterior distributions of model parameters and GP weights for N = 20 and m = 20 obtained with KOH and OGP, respectively.

The emergence of only mild correlations under OGP is easy to see in this linear case. The linear model can be rewritten as

$$ \mathbf{y} = \begin{bmatrix} G^\top & \Phi^\top \end{bmatrix} \begin{bmatrix} \boldsymbol{\lambda} \\ \mathbf{w} \end{bmatrix} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \Sigma_{\mathrm{obs}}), $$

where $G^\top = [\mathbf{1} \;\; X] \in \mathbb{R}^{N \times 2}$ and $\Phi^\top \in \mathbb{R}^{N \times m}$ is obtained by stacking the basis function evaluations at all data points. With a Gaussian prior

$$ \pi(\boldsymbol{\lambda}, \mathbf{w}) = \mathcal{N}\left(\begin{bmatrix} \boldsymbol{\mu}_{\boldsymbol{\lambda}} \\ \boldsymbol{\mu}_{\mathbf{w}} \end{bmatrix}, \begin{bmatrix} \Sigma_{\boldsymbol{\lambda}} & 0 \\ 0 & \Sigma_{\mathbf{w}} \end{bmatrix}\right) $$

and a Gaussian likelihood, the posterior is also Gaussian, with the precision matrix given by

$$ \Sigma_{\mathrm{pos}}^{-1} = \begin{bmatrix} G \\ \Phi \end{bmatrix} \Sigma_{\mathrm{obs}}^{-1} \begin{bmatrix} G^\top & \Phi^\top \end{bmatrix} + \begin{bmatrix} \Sigma_{\boldsymbol{\lambda}}^{-1} & 0 \\ 0 & \Sigma_{\mathbf{w}}^{-1} \end{bmatrix} = \begin{bmatrix} G \Sigma_{\mathrm{obs}}^{-1} G^\top + \Sigma_{\boldsymbol{\lambda}}^{-1} & G \Sigma_{\mathrm{obs}}^{-1} \Phi^\top \\ \Phi \Sigma_{\mathrm{obs}}^{-1} G^\top & \Phi \Sigma_{\mathrm{obs}}^{-1} \Phi^\top + \Sigma_{\mathbf{w}}^{-1} \end{bmatrix}. $$

The applied orthogonality constraint $\int_{\mathcal{X}} \nabla_{\boldsymbol{\lambda}} f(\mathbf{x}; \boldsymbol{\lambda})\big|_{\boldsymbol{\lambda}^*}\, \delta(\mathbf{x})\, d\mu(\mathbf{x}) = \mathbf{0}$ is equivalent to $G \Sigma_{\mathrm{obs}}^{-1} \Phi^\top = 0$ (since $\boldsymbol{\varepsilon}$ is iid and $\Sigma_{\mathrm{obs}} \propto I$), given that $\nabla_{\boldsymbol{\lambda}} f(\mathbf{x}; \boldsymbol{\lambda}) = [1 \;\; x]^\top$. Therefore, the off-diagonal terms in the posterior precision are zero, and hence the cross-correlations between the model parameters and GP weights in the posterior distribution are also zero. As will be demonstrated in the next example, orthogonalization also leads to reduced correlations between model parameters and the GP weights in the non-linear model error embedding case. When $\Sigma_{\mathrm{obs}}$ is not diagonal, the cross-correlations are expected to be non-zero.
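This decoupling can be verified numerically with the earlier sketches (`sqe_kernel`, `nystrom_eigenpairs`, `ogp_kernel`). In the snippet below, which is our own illustration, we take $\mu$ to be the empirical measure of the data inputs so that the discrete constraint holds exactly; the off-diagonal block of the posterior precision then vanishes to machine precision.

```python
import numpy as np

# Linear example data on [-1, 1]; mu = empirical measure of the data inputs.
rng = np.random.default_rng(1)
N, m, sig_d = 20, 10, 0.2
x = rng.uniform(-1, 1, N)
y = 2 + 2*x + 3*x**2 - 5*x**3 + sig_d * rng.normal(size=N)
w_emp = np.full(N, 1.0 / N)

grad_f = lambda z: np.column_stack([np.ones_like(z), z])   # grad of lam0 + lam1 x
kern = lambda a, b: sqe_kernel(a, b, sig=1.0, ell=0.3)
k_star = ogp_kernel(kern, grad_f, x, w_emp)                # orthogonalized kernel
lam_eig, phi = nystrom_eigenpairs(k_star, x, w_emp, m)     # OGP WS basis

G = grad_f(x).T                                            # 2 x N
Phi = phi(x).T                                             # m x N
A = np.vstack([G, Phi])
prior_prec = np.diag(np.concatenate([np.ones(2), 1.0 / lam_eig]))
post_prec = A @ A.T / sig_d**2 + prior_prec
print(np.abs(post_prec[:2, 2:]).max())   # cross block ~ 0: lambda and w decouple
```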
We now examine convergence of the marginal posterior distributions of model parameters for both KOH and OGP as the number of GP basis functions increases. Results are shown in Figure 6. For KOH, shown with dashed lines, we observe that approximately 20 basis functions are sufficient to achieve converged marginal posteriors for $\boldsymbol{\lambda}$. However, the posterior distributions change substantially when increasing the number of basis functions from 5 to ≥ 20. This behaviour is a direct consequence of the confounding between the model parameters and the GP weights inherent in this approach. In contrast, the posterior distributions obtained using OGP, shown with solid lines, converge faster, remaining largely unchanged across different numbers of basis functions, and staying close to the LS solution. This is again attributed to the orthogonality constraint in OGP.

Figure 6: Panels (a) and (b) show the convergence in marginal posteriors of λ0 and λ1, respectively, with increasing number of basis functions. In both plots, the dashed lines show results with KOH and solid lines show results with OGP.

The number of requisite GP basis functions can also be selected by analyzing the PP standard deviation, which is a reliable convergence indicator since it accounts for the combined effects of model parameters and GP weights. Figure 7 shows the convergence of the PP standard deviation with increasing number of basis functions, for both interpolation and extrapolation. In the interpolation region, the PP standard deviation for both KOH and OGP is effectively converged with $m = 20$. However, in the extrapolation region, the standard deviation is significantly larger for KOH. This increase arises from the high uncertainty in the model parameters, as shown in Figure 6. Because this uncertainty scales with $x$, it leads to a growing standard deviation as predictions move farther away from $x = 0$. In contrast, for OGP, the uncertainty in the extrapolation region is driven solely by the additive GP. As the number of basis functions increases, this uncertainty approaches the prior standard deviation, $\sigma_f = 1$, thereby recovering equivalence with the FS view of the GP.

Figure 7: Panels (a) and (b) show convergence in posterior predictive standard deviation with increasing number of basis functions in interpolation and extrapolation regions, respectively. In both plots, the dashed lines show results with KOH and solid lines show results with OGP.

To illustrate the asymptotic behavior of KOH and OGP, Figure 8 shows the convergence of the marginal posterior distributions of $\boldsymbol{\lambda}$ as the number of data points increases. For this study, we use $m = 5$ basis functions, which allows the posterior distributions to converge more rapidly with increasing data because the calibration problem involves only seven parameters. The posterior modes obtained using KOH, shown with dashed lines, do not approach the LS solution even when the number of data points is increased to 1000. This observation is consistent with the theoretical results of [56], where they show that the maximum likelihood estimate in the KOH framework converges asymptotically to a value that can differ substantially from the LS solution. Moreover, the posterior distributions remain relatively broad even for $N = 1000$. In contrast, the posterior distributions obtained with OGP, shown with solid lines, remain tightly concentrated around the LS solution for each value of $N$ (the LS solutions themselves are omitted from the figure to avoid visual clutter).
Moreover, the zoomed-in OGP posteriors shown in Fig. 9 indicate that the support does get narrower with increasing data, as would be expected given the added information content. Further, the KOH diffuse posteriors are accompanied by strong correlations between the model parameters and the GP weights, specifically between λ0 and w0, and between λ1 and w1, as shown in Figure 10a. By contrast, as shown in Figure 10b, the OGP posteriors show no correlations between model parameters and GP weights. On the other hand, it is noteworthy that particularly strong correlations emerge among OGP weights whose indices differ by two, which is, to some extent, also the case for some of the KOH GP weights.

Figure 8: Panels (a) and (b) show the convergence in marginal posteriors of λ0 and λ1, respectively, with increasing number of data points for m = 5. In both plots, the dashed lines show results with KOH and solid lines show results with OGP.

Figure 9: Zoomed-in view of the OGP marginal posteriors from Fig. 8.

Next, we demonstrate how the LIS approach enables us to use an arbitrarily large number of basis functions in the WS GP. We start with $m = 400$ basis functions in the GP, giving us a total of 402 calibration parameters. With linear models, the local LIS from one point in the parameter space to another does not change, because the Hessian of the log-likelihood function is independent of the parameter value. Hence the global LIS and local LIS are equivalent. For $N = 20$ data points, the rank obtained for LIS, with an eigenvalue cut-off of 0.1, is $r = 10$. This indicates that the data effectively inform only 10 directions in the high-dimensional parameter space. Thus, it is sufficient to perform a reduced inference of dimension 10 in the LIS and sample from the prior projected onto the CS for the remaining 392 directions. Finally, the posterior samples from the LIS and the prior samples from the CS can be combined via Eq. (32) to approximate the full high-dimensional posterior. The PP distribution, and the push-forward posterior through the GP component, denoted by PFP-GP, obtained with LIS, are shown in Figure 11a. For comparison, we also plot in Figure 11b the results obtained with $m = 10$ basis functions but without employing LIS. We observe that the inclusion of a larger set of basis functions removes the oscillatory behaviour of the predictions seen in Figure 11b, which is an artifact of truncating the GP with very few weights, as evidenced in PFP-GP. These oscillations are also visible in Figure 7b. The PFP-GP distribution of LIS cleanly attains the prior standard deviation of $\sigma_f = 1$ as we extrapolate outside the data regime, without any additional cost for inference. It is also informative to examine how increasing the number of data points affects the prediction intervals. This is illustrated in Figure 12 for $N = 500$.
The LIS rank with the 0.1 eigenvalue cut-off increases to $r = 12$, as the availability of more data can inform a larger set of directions in the parameter space. With a denser data set, the ± three standard deviation bands of PFP-GP contract to nearly zero within the interpolation region because of the iid noise model. However, due to the inclusion of a large set of GP basis functions, the GP mean fully accounts for the structural deficiencies of the fit model. Recall that random variable embedding (see [53]), on the other hand, retains finite prediction intervals as a compromise for model error flexibility even as the data is increased. Lastly, even though the prediction intervals shrink in the interpolation region, the prior distributions are faithfully recovered in the extrapolation regime with LIS.

Figure 10: Panels (a) and (b) show the joint posterior distributions of model parameters and GP weights for N = 1000 and m = 5 obtained with KOH and OGP, respectively.

Figure 11: For N = 20, panel (a) shows the posterior predictive and push-forward posterior through the GP when LIS with r = 10 and m = 400 is used; panel (b) shows the predictions without LIS and m = 10. Solid lines show mean predictions and bands show ± three standard deviations.

Figure 12: For N = 500, panel (a) shows the posterior predictive and push-forward posterior through the GP when using LIS with r = 12 and m = 400. Panel (b) shows the predictions without LIS, and for m = 12. Solid lines show mean predictions and bands show ± three standard deviations.

6.2 Non-linear Model

Consider the truth model to be $f_t(x) = \exp\left(1 - 0.5x + x^2 + x^3\right)$ and the fit model to be

$$ f(x; \boldsymbol{\lambda}) = \underbrace{\sin(\lambda_0 x)}_{S_1} + \underbrace{\exp(\lambda_1 x)}_{S_2}, $$

defined on $x \in [-2, 2]$ with $\boldsymbol{\lambda} = (\lambda_0, \lambda_1)$. The fit model can be thought of as a combination of two sub-models, $S_1$ and $S_2$, whose predictive capability can be compared with GP-embedded corrections. First, consider embedding in $S_1$, e.g. as

$$ \tilde{f}(x; \boldsymbol{\lambda}, \delta_{\mathbf{w}}(x)) = \sin(\lambda_0 x + \delta_{\mathbf{w}}(x)) + \exp(\lambda_1 x). $$

It is evident that embedding model error within this particular submodel will not significantly improve the predictions, since the true underlying function is exponential and therefore structurally mismatched with the submodel. In practice, however, the form of the data-generating process is rarely known. In such cases, indeed, introducing embedded model error into the individual submodels that comprise the fit model provides a systematic way to assess the relative contribution and importance of each submodel to the overall model error. The fact that model error embedding in $S_1$ does not much improve predictions provides diagnostic information as to the dominant role of this misspecified term in subsequent predictive errors. In fact, its detrimental effects will still be evident in the present case when embedding in $S_2$, which we focus on now.
Let us employ GP embedding in $S_2$ as follows:

$$ \tilde{f}(x; \boldsymbol{\lambda}, \delta_{\mathbf{w}}(x)) = \sin(\lambda_0 x) + \exp(\lambda_1 x + \delta_{\mathbf{w}}(x)), $$

with a Gaussian prior $\pi(\boldsymbol{\lambda}) \sim \mathcal{N}([-3, 0], \mathrm{diag}(1, 1))$ on the model parameters, and an SQE kernel with $\sigma_f = 1$ and $\ell = 0.3$ for the GP. First we analyse the performance of conventional inference without the application of any orthogonality constraint. The convergence of model parameter posteriors with increasing number of basis functions, and the PFP distributions for $N = 50$ data points, are shown in Fig. 13. Similar to KOH with the linear model, the $\boldsymbol{\lambda}$ posteriors are quite broad because of strong correlations with the GP weights, as evident in Fig. 14. Also, the minimum effective sample size (ESS) for a total MCMC chain length of 40000 with a burn-in period of 10000 was 12500 for $\boldsymbol{\lambda}$ and 11000 for $\mathbf{w}$. These values are lower in comparison to both LOGP and ROGP discussed below. Lastly, the resulting PFP-f distribution does not capture the data well, as the resulting posterior of λ1, which controls the data trend, does not include the LS solution in its support.

Figure 13: For N = 50, frames (a) and (b) show marginal priors, posteriors and least-squares estimates for (λ0, λ1) with conventional inference. Panel (c) shows the data and posterior push-forward predictions with m = 40. PFP lines show the mean prediction and bands show ± three standard deviations.

Figure 14: Joint posterior distribution obtained with conventional inference.

As shown in the linear model example, to decide the appropriate number of basis functions in the GP embedding, one can either use LIS or perform a convergence study of the PP standard deviation. We note that the LIS formulation of [13, 55], while in principle applicable for nonlinear models, is in fact feasible as-is only for LOGP in the present context, as it relies on the assumption of Gaussian priors. Extending this formulation to the ROGP setting, which incorporates additional regularization terms in the prior, would require non-trivial modifications to the existing LIS framework. Consequently, we employ LIS to retain a large number of basis functions for LOGP, while for ROGP we instead perform a convergence study to select an appropriate basis size.

Given the non-linear model, we need to first discover the global LIS, which is approximated as the Monte Carlo mean of a set of local LISs evaluated at a set of parameter values from the posterior distribution. To this end, we use the adaptive procedure of [13] (see Appendix A for further details) and use the emcee package [16] for subchain simulations. This package allows us to set the initial state of the chain, which is a necessary requirement of the adaptive procedure; however, any other similarly-equipped MCMC algorithm can also be employed. After the global LIS is discovered, we use the NUTS algorithm [23] implemented in PyMC [1] for sampling in this global LIS.
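Once the global LIS is available, full-space posterior samples per Eq. (32) are assembled by complementing the LIS MCMC samples with prior draws in the CS. The following is a sketch under the same whitened-coordinate, zero-mean Gaussian-prior assumptions as the `global_lis` sketch above, not the paper's implementation:

```python
import numpy as np

def combine_lis_cs(z_r_samples, V, r, L_pr, rng):
    """Approximate full-space posterior samples per Eq. (32): MCMC samples z_r of
    the LIS coordinates (whitened space, shape (S, r)) are complemented by
    standard-normal prior draws z_perp in the complementary subspace (CS).
    Assumes a zero-mean prior; add the prior mean otherwise."""
    V_r, V_perp = V[:, :r], V[:, r:]
    z_perp = rng.standard_normal((z_r_samples.shape[0], V.shape[1] - r))
    z_full = z_r_samples @ V_r.T + z_perp @ V_perp.T   # whitened coordinates
    return z_full @ L_pr.T                             # back to (lambda, w) space
```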
The posterior distributions for LOGP with $m = 400$ are shown in Fig. 15. The parameter posteriors were approximated using LIS posteriors of rank $r = 13$ for an eigenvalue cut-off of 0.1. First, note that the CS distributions are quite narrow and centered at 0, indicating that the parameter posteriors are determined entirely by the LIS. The posterior of $\lambda_0$ is not in the vicinity of its LS solution, which may be attributed to the linearization of the model for the orthogonality constraints, as well as to the small data set size. However, this is not an issue, since the posterior of $\lambda_1$, which decides the data trend through the exponential term, is indeed closer to its LS solution, resulting in a meaningful PFP-$f$ distribution, as shown in Fig. 15c. The ESS values for the LIS parameters (18000) are much higher than those obtained above using conventional inference without orthogonality constraints. This improvement is also evident in the joint posterior distribution plots shown in Fig. 16a. The weak correlations observed among the LIS parameters arise because the LIS basis diagonalizes the Hessian of the log-likelihood [13], leading to highly efficient MCMC sampling in the LIS space. The joint posterior distributions between the model parameters and the first ten GP weights, estimated using samples from the LIS and CS, are shown in Fig. 16b. Weak correlations between the model parameters and the GP weights are observed, which can be attributed to the enforced orthogonality. However, strong correlations are present among the GP weights themselves.

Figure 15: For $N = 50$, $m = 400$ and $r = 13$, frames (a) and (b) show the marginal LIS posterior (red), CS prior (green) and full marginal posterior approximation (LIS+CS, blue) for $\lambda_0$ and $\lambda_1$ respectively; note that the LIS+CS marginal lies on top of the LIS marginal. Panel (c) shows the data and push-forward posterior predictions.

Figure 16: Frame (a) shows the joint posterior distributions of the LIS parameters and frame (b) shows the joint posterior distributions of the model parameters and GP weights for $N = 50$.

With a larger data set of size $N = 1000$, the resulting parameter posteriors and joint distributions are shown in Figs. 17 and 18 respectively. Both model parameters now have posterior distributions more concentrated around the LS solution than in the $N = 50$ case. We also observe much stronger correlations between the GP weights, according to Fig. 18b. However, these can still be captured accurately, as MCMC sampling happens in the LIS, where the parameters have uncorrelated posteriors, as seen in Fig. 18a. Overall, LOGP in conjunction with LIS serves as a very efficient strategy for GP error embedding in non-linear models.

Figure 17: For $N = 1000$, $m = 400$ and $r = 15$, frames (a) and (b) show the marginal LIS posterior (red), CS prior (green) and full marginal posterior approximation (LIS+CS, blue) for $\lambda_0$ and $\lambda_1$ respectively; note that the LIS+CS marginal lies on top of the LIS marginal.

Figure 18: Frame (a) shows the joint posterior distributions of the LIS parameters and frame (b) shows the joint posterior distributions of the model parameters and GP weights for $N = 1000$.
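The full-space posteriors shown in Figs. 16b and 18b are assembled from LIS and CS samples via the decomposition $\theta = U_r \theta_r + U_\perp \theta_\perp$ detailed in Appendix A. A minimal numpy sketch of this reconstruction (array names hypothetical) is:

```python
import numpy as np

def assemble_full_space(U_r, U_perp, theta_r, theta_perp):
    # theta_r: (n_samples, r) LIS MCMC samples;
    # theta_perp: (n_samples, p+m-r) draws from the CS prior.
    # Returns (n_samples, p+m) samples theta = U_r th_r + U_perp th_perp.
    return theta_r @ U_r.T + theta_perp @ U_perp.T
```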
The posterior distributions with ROGP are shown in Figs. 19a and 19b. We observe convergence to the LS solution with just $m = 10$ basis functions; however, a convergence study of the PP standard deviation showed that at least $m = 20$ basis functions are required for predictions in both the interpolation and extrapolation regions. A meaningful PFP-$f$ distribution, corresponding to the LS prediction, is recovered, as shown in Fig. 19c, in the high $x > 0$ region: there the problematic sine term $S_1$ becomes less significant relative to the exponential term $S_2$, and its detrimental effects are ameliorated.

Figure 19: For $N = 50$, frames (a) and (b) show the marginal priors, posteriors and least-squares estimates for $\lambda_0$ and $\lambda_1$ obtained with ROGP ($\alpha_1 = 10^8$, $\alpha_2 = 50^2$). Panel (c) shows the data and posterior push-forward predictions with $m = 40$; PFP lines show the mean prediction and bands show $\pm$ three standard deviations.

The joint posterior distributions for the $m = 40$ case, for both weak ($\alpha_1 = 10^7$, $\alpha_2 = 50$) and strong ($\alpha_1 = 10^8$, $\alpha_2 = 50^2$) $\alpha$-enforcement of the orthogonality constraints, are shown in Fig. 20. For the weaker constraints, we observe correlations between the parameters and the weight $w_1$, which also results in a negative correlation between $\lambda_0$ and $\lambda_1$. These correlations mostly disappear when the stronger constraints are used; however, strong correlations then emerge between the GP weights. Further, with the weaker constraints we get an ESS of 31000 for $\lambda$ and 15000 for $w$, both significantly higher than the numbers reported above for conventional unconstrained inference. With the stronger constraints, on the other hand, the strong correlations among $w$ led to lower ESS values for it than under conventional inference without orthogonality, thereby increasing the computational cost of sampling.

Figure 20: Panels (a) and (b) show the joint posterior distributions obtained using ROGP with weak ($\alpha_1 = 10^7$, $\alpha_2 = 50$) and strong ($\alpha_1 = 10^8$, $\alpha_2 = 50^2$) constraints, respectively, for $N = 50$.

The resulting parameter posteriors and joint distributions for $N = 1000$ data points are shown in Figs. 21 and 22 respectively. The support of $\lambda_1$ is narrower than in the $N = 50$ case, and the posterior mode of $\lambda_0$ has shifted to the right of the LS estimate. Note that the same penalty parameter values of the strong $\alpha$-enforcement were used as in the $N = 50$ case; hence, the orthogonality constraint on $\lambda_0$ turned out to be slightly weaker. This also results in a positive correlation between $\lambda_0$ and $\lambda_1$, as seen in Fig. 22. We observe much stronger correlations between the GP weights, as was also the case with LOGP. However, with ROGP we directly sample these GP weights, which is much more challenging than operating in the LIS. Consequently, the penalty parameter must be tuned on a case-by-case basis to enforce sufficient orthogonality while avoiding the introduction of excessive correlations among the GP weights.
We emphasize that the primary objectives are (a) achieving a good fit between the stand-alone model and the data, and (b) ensuring that the GP-embedded model also fits the data well, providing sufficient correction of the residual discrepancy. In our current setup, orthogonality helps us achieve goal (a). Therefore, to decide the penalty (hyper)parameter, one can set up an outer optimization loop where, for each proposed value of $\alpha$ in Eq. (31), one first solves the Bayesian problem and then evaluates some statistic of the joint parameter posterior, say the mean, and/or the magnitudes of the errors in (a) and (b). One can then choose the penalty according to a user-defined combined error measure, which may also account for some measure of problem complexity, in order to avoid solving an unnecessarily hard problem without much gain in predictive accuracy; a sketch of such a loop is given at the end of this section.

Figure 21: Parameter posteriors obtained with ROGP ($\alpha_1 = 10^8$, $\alpha_2 = 50^2$) for $N = 1000$ and $m = 40$.

Figure 22: Joint posterior distributions of the model parameters and the first ten weights obtained with ROGP ($\alpha_1 = 10^8$, $\alpha_2 = 50^2$) for $N = 1000$ and $m = 40$.

Finally, the posterior predictive distributions, along with the push-forward posteriors through the GP (PFP-GP), when extrapolating outside the data regime using LOGP and ROGP, are shown in Fig. 23. First, note that the prior predictive distributions differ between the two methods because LOGP employs basis functions derived from a modified covariance kernel, whereas ROGP uses basis functions corresponding to the unmodified kernel. As a result, the LOGP prior predictive distribution is narrower than that of ROGP, particularly over the interval [0, 1], reflecting the influence of the modified covariance. In contrast, the PFP-GP distributions for both LOGP and ROGP match closely and successfully recover the push-forward prior distributions in regions sufficiently far from the data.

Figure 23: Panels (a) and (b) show the prior predictive and posterior predictive distributions obtained with LOGP, and panels (c) and (d) show the same for ROGP. Solid lines correspond to mean predictions and bands show $\pm$ three standard deviations.
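The penalty-selection loop described above can be sketched as follows; all helper names are hypothetical, and the error statistics and their combination are user choices.

```python
import numpy as np

def choose_penalty(alphas, solve_bayes, error_standalone, error_embedded,
                   combine=lambda e_a, e_b: e_a + e_b):
    # For each candidate penalty alpha: solve the Bayesian problem (e.g., run
    # MCMC with the regularized prior of Eq. (31)), then score goal (a), the
    # stand-alone model fit, and goal (b), the GP-embedded model fit.
    scores = {}
    for alpha in alphas:
        posterior = solve_bayes(alpha)
        scores[alpha] = combine(error_standalone(posterior),
                                error_embedded(posterior))
    best = min(scores, key=scores.get)   # user-defined combined error measure
    return best, scores
```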
6.3 Advection-Diffusion-Reaction PDE

Lastly, we consider a linear non-homogeneous advection–diffusion–reaction PDE problem adapted from [47]. We define the truth model as
$$\mathcal{L}(u) := \frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} - \frac{\partial^2 u}{\partial x^2} - u = e^{-t}\left[2\pi\cos(2\pi x) + 2(2\pi^2 - 1)\sin(2\pi x)\right]$$
on the interval $x \in [0, 1]$, $0 \le t \le 1$, with initial and boundary conditions given by $u(x, 0) = \sin(2\pi x)$, $u(0, t) = 0$, and $u(1, t) = 0$. The fit PDE model considered is
$$\mathcal{L}(u) = e^{-t}(2\pi^2 - 1)\sin(\lambda x),$$
which is missing the cosine term and half of the sine term. Embedding a GP in the source term gives a GP-augmented fit model of the form
$$\mathcal{L}(u) = e^{-t}\left[(2\pi^2 - 1)\sin(\lambda x) + \delta_w(x)\right].$$
Given data from the truth model, $\mathcal{D} = \{u(x_i, t_i) \,|\, i = 1, \dots, N\}$, our objective is to infer both the fit model parameter $\lambda$ and the GP weights $w$. It is noteworthy that, being in the WS context, by inferring the GP weights we effectively learn the structural form of the missing source term in the fit model, thereby enabling model discovery.

We consider randomly sampled data points from the spatio-temporal field of the truth model, with measurement noise $\sigma_d = 0.02$. In this example, for ease of demonstration, we only show results with ROGP, as LOGP would require the evaluation of a 4-D integral in Eq. (29). Nonetheless, with an appropriate quadrature rule and finite-difference based gradient computations, LOGP can also be implemented.

For this problem, it was necessary to optimize the SQE kernel hyperparameters in order to accurately capture the missing source term with the GP. To this end, we optimized the hyperparameter values by maximizing the log posterior in Eq. (31), using 10 basis functions in the GP representation and 100 data points from the truth model. The optimization resulted in $\sigma_f = 100$ and $l = 0.6$, and we use these values for the subsequent inference of the GP weights and model parameters.

Figure 24: Panel (a) shows the marginal prior, posterior and least-squares estimate of $\lambda$. Panel (b) presents a zoomed-in view of the marginal posterior.

Figure 24 shows the marginal posterior distribution of $\lambda$, compared to the least-squares estimate. The ROGP posterior aligns very closely with both the least-squares solution and the true value of $2\pi$ used in the truth model.
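The forward model underlying this inference can be sketched with a simple explicit finite-difference scheme; this is a minimal illustration, not the authors' solver, and the basis matrix `Phi` and grid parameters are assumptions.

```python
import numpy as np

def solve_fit_pde(lam, w, Phi, nx=101, nt=20001, T=1.0):
    # u_t = -u_x + u_xx + u + e^{-t} [ (2 pi^2 - 1) sin(lam x) + Phi(x) w ],
    # u(x,0) = sin(2 pi x), u(0,t) = u(1,t) = 0; Phi is an (nx, m) KL basis matrix.
    x = np.linspace(0.0, 1.0, nx)
    dx, dt = x[1] - x[0], T / (nt - 1)
    assert dt <= 0.5 * dx**2, "explicit scheme stability requirement"
    u = np.sin(2 * np.pi * x)                              # initial condition
    s_spatial = (2 * np.pi**2 - 1) * np.sin(lam * x) + Phi @ w
    for k in range(nt - 1):
        t = k * dt
        ux = (u[2:] - u[:-2]) / (2 * dx)                   # centered advection
        uxx = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2       # centered diffusion
        u[1:-1] += dt * (-ux + uxx + u[1:-1] + np.exp(-t) * s_spatial[1:-1])
        u[0] = u[-1] = 0.0                                 # Dirichlet boundaries
    return x, u
```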
Joint distributions of the parameters are shown in Fig. 25. While $\lambda$ remains uncorrelated with the GP weights, strong correlations are observed between some of the GP weights, specifically $w_0$–$w_2$ and $w_1$–$w_3$, which contribute to capturing the sine and cosine terms in the missing source. Figure 26 compares the true and recovered source terms. In particular, Fig. 26a compares the mean and three-standard-deviation intervals of the learned GP with the true missing source term. The slight difference between the learned GP and the true missing term is expected, because the least-squares estimate of $\lambda$ is not exactly equal to $2\pi$. The full source term, which also accounts for the contribution of $\lambda$, is in excellent agreement with the true source term, as shown in Fig. 26b. Incorporating the learned source term into subsequent predictions reduced the maximum absolute error from 0.4 (in the absence of the GP source term) to 0.008.

Figure 25: Joint posterior distributions of $\lambda$ and the first few GP weights.

Figure 26: Panel (a) compares the inferred GP and the true missing source term, $s_{GP}(x) := 2\pi\cos(2\pi x) + (2\pi^2 - 1)\sin(2\pi x)$. Panel (b) compares the inferred source term and the spatial component of the true source term, $s_{full}(x) := 2\pi\cos(2\pi x) + 2(2\pi^2 - 1)\sin(2\pi x)$. Bands correspond to $\pm$ three standard deviations.
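As a quick consistency check on these definitions, the spatial part of the truth source minus the fit-model source at $\lambda = 2\pi$ equals the missing target $s_{GP}(x)$ that the embedded GP is asked to learn; a short numpy verification:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)
s_full = 2 * np.pi * np.cos(2 * np.pi * x) + 2 * (2 * np.pi**2 - 1) * np.sin(2 * np.pi * x)
s_fit = (2 * np.pi**2 - 1) * np.sin(2 * np.pi * x)   # fit-model source at lambda = 2 pi
s_gp = 2 * np.pi * np.cos(2 * np.pi * x) + (2 * np.pi**2 - 1) * np.sin(2 * np.pi * x)
assert np.allclose(s_full - s_fit, s_gp)             # missing source equals s_GP
```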
7 Conclusions

We have developed a flexible framework for embedded model error by adopting the weight-space representation of GPs. WS GPs allow us to work with a finite set of weights, associated with the eigenfunctions of the underlying covariance kernel, enabling inference when the GP is embedded within non-linear models. Predictions with WS and FS GPs were compared, showing that the two match very closely near the training data. However, inevitable basis truncation led to a collapse of the predictive distribution with the WS GP when predictions were made far enough from the data, highlighting the need for a judicious choice of the number of basis functions, based on the extent of extrapolation, for accurate extrapolation relative to the FS GP.

To disambiguate the contributions of model parameters and model bias, we adapted the OGP formulation, originally developed for additive model error [43], to the embedded model error setting in general non-linear models. The key idea is to enforce orthogonality between the model bias and the model gradient evaluated at optimal parameter values defined by a chosen loss function. We proposed two approaches: one based on model linearization (LOGP) and the other based on regularization of the bias prior distribution without any linearization (ROGP). In LOGP, linear constraints are used to modify the prior covariance of the embedded GP, and eigenfunctions of the modified covariance are computed numerically. In contrast, with ROGP, the model evaluated at the optimal parameter values is considered directly in the loss function, leading to non-linear constraints on the embedded GP. These non-linear constraints are introduced as regularization terms, weighted by a penalty parameter, alongside the prior distribution of the embedded GP.

The WS GP representation increases the dimensionality of the inference problem, which we addressed using the LIS method [13, 55] to identify a low-dimensional subspace where the likelihood most strongly informs the prior. This substantially reduced the effective dimension of the inference and enabled efficient MCMC sampling.

The proposed framework was demonstrated on three examples: a linear algebraic model, a non-linear algebraic model with interacting sub-models, and a linear non-homogeneous advection–diffusion–reaction PDE. For the linear case, both LOGP and ROGP reduce to the standard additive OGP formulation. Convergence studies were used to determine the number of WS basis functions, while the use of LIS allowed large GP representations with minimal additional computational cost. The orthogonality constraints, with iid Gaussian noise on the data, not only successfully captured the least-squares estimates in the supports of the parameter posteriors, but also translated to zero off-diagonal terms in the posterior covariance, resulting in uncorrelated model parameters and GP weights. In the non-linear example, model parameters remained weakly correlated with the GP weights, although correlations among the GP weights themselves were observed. For LOGP, LIS can be applied directly, yielding uncorrelated LIS parameters and efficient sampling. For ROGP, the penalty parameter controls the strength of orthogonality enforcement and influences posterior correlations; hence, it must be chosen to balance constraint enforcement and sampling efficiency. In all cases, extrapolation far from the training data recovered the prior predictive distribution when a sufficiently rich WS representation was used. In the ADR example, MAP-based hyperparameter optimization was necessary for the embedded GP to accurately capture the missing source term, and subsequent predictions with the embedded GP led to a substantial reduction in the maximum absolute error.

Future work can include several directions. The proposed framework can be naturally extended to constructions including both internal and external GP corrections, for which orthogonality constraints can be derived for each GP; the internal GP would make the model more flexible while adhering to the physics of the problem, while the external GP would capture residual model error. As noted earlier, the LIS framework must be modified for use with the ROGP approach, specifically by accounting for the regularization terms in the construction of the global LIS. While the methodology in this work was demonstrated using a weighted $L^2$ loss, other loss functions with unique minimizers for the model parameters can also be considered, for instance the loss function based on Sobolev-space inner products as in [43], as well as user-defined losses such as those based on moment matching. Orthogonality constraints for these cases can be derived following the general procedure proposed in this work.

A Likelihood-Informed Subspace Summary

We summarize in the following the LIS construction of [13, 55]. Recall that the model to be calibrated is given by
$$y = \tilde f(x; \lambda, \phi(x)^\top w) + \varepsilon$$
with calibration parameters $\theta = \{\lambda, w\} \in \mathbb{R}^{p+m}$ and prior $\pi(\theta) = N(\theta \,|\, \mu_p, \Sigma_p)$. For notational convenience, let $\tilde f(\theta) = [\tilde f(x_1; \theta), \dots, \tilde f(x_N; \theta)]$. Further, let the covariance matrix of the zero-mean measurement noise $\varepsilon$ be $\Sigma_d$.
For non-linear models, the global LIS basis is built by forming a Monte Carlo approximation of the expected prior-preconditioned Gauss–Newton Hessian (ppGNH),
$$S = \frac{1}{n}\sum_{k=1}^{n} L^\top H(\theta^{(k)}) L \qquad (33)$$
where $\theta^{(k)} \sim \pi(\theta \,|\, y)$ and $L$ is obtained from a symmetric decomposition of the prior covariance, $\Sigma_p = LL^\top$. The local Hessian $H(\theta^{(k)})$ is approximated by
$$H(\theta^{(k)}) = \nabla_\theta \tilde f(\theta^{(k)})^\top \, \Sigma_d^{-1} \, \nabla_\theta \tilde f(\theta^{(k)}).$$
Let $(\delta_i^2, v_i)$ denote the eigenvalue–eigenvector pairs of the ppGNH, ordered such that $\delta_i^2 > \delta_{i+1}^2$. The LIS basis vectors are given by $u_i = L v_i$, and the rank-$r$ LIS basis is $U_r = [u_1, \dots, u_r]$. The associated rank-$r$ projector is $P_r = U_r W_r^\top$, where $W_r = [w_1, \dots, w_r]$ with $w_i = L^{-\top} v_i$. Choosing $V_\perp$ such that $[V_r \; V_\perp]$ forms an orthonormal basis of $\mathbb{R}^{p+m}$, the projector onto the CS becomes $P_\perp = I - P_r = U_\perp W_\perp^\top$, where $U_\perp = L V_\perp$ and $W_\perp = L^{-\top} V_\perp$. Further, since $P_r$ is self-adjoint with respect to the inner product induced by the prior precision, any parameter vector admits the decomposition
$$\theta = P_r \theta + P_\perp \theta = U_r \theta_r + U_\perp \theta_\perp,$$
with $\theta_r = W_r^\top \theta \in \mathbb{R}^r$ and $\theta_\perp = W_\perp^\top \theta \in \mathbb{R}^{p+m-r}$. The prior distribution factorizes accordingly: $\pi(\theta) = \pi_r(\theta_r)\, \pi_\perp(\theta_\perp)$, where $\pi_r(\theta_r) = N(\theta_r \,|\, W_r^\top \mu_p, I_r)$ and $\pi_\perp(\theta_\perp) = N(\theta_\perp \,|\, W_\perp^\top \mu_p, I_\perp)$. Finally, the posterior admits the approximation
$$\pi(\theta \,|\, y) \approx \hat\pi(\theta \,|\, y) \propto \pi(y \,|\, P_r \theta)\, \pi(\theta) = \pi(y \,|\, U_r \theta_r)\, \pi_r(\theta_r)\, \pi_\perp(\theta_\perp) = \pi(\theta_r \,|\, y)\, \pi_\perp(\theta_\perp) \qquad (34)$$
so that MCMC sampling is required only for the $r$ LIS coordinates. Thus, the effective dimension of the inverse problem is reduced from $p + m$ to $r \le p + m$, and typically $r \ll p + m$. To retain all meaningful likelihood-informed directions, one chooses $r$ such that the $r$-th eigenvalue falls below one, i.e. $\delta_r^2 < 1$; in our embedded GP applications (see Section 6), we select $r$ such that $\delta_r^2 \le 0.1$.

Finally, the posterior samples needed to approximate the ppGNH in Eq. (33) are obtained using Algorithm 1 of [13]. This is an adaptive procedure for finding the global LIS: we start with the posterior mode and its corresponding local LIS. A subspace MCMC chain in this local LIS, initialized at the projection of the posterior mode onto the LIS, is simulated until an uncorrelated sample is obtained. The current LIS is then updated using this new sample and Eq. (33), and a new subspace MCMC chain is started. The adaptation is terminated after a maximum allowable number of Hessian evaluations, or based on subspace distance metrics. Once the global LIS is discovered, a much longer MCMC chain can be initialized to obtain posterior samples from this global LIS.
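A minimal numpy sketch of this construction (names hypothetical; the Jacobians $\nabla_\theta \tilde f(\theta^{(k)})$ are assumed precomputed) is:

```python
import numpy as np

def global_lis(jacobians, L, Sigma_d_inv, r):
    # jacobians: list of (N, p+m) model Jacobians at posterior samples theta^(k).
    d = L.shape[1]
    S = np.zeros((d, d))
    for J in jacobians:
        H = J.T @ Sigma_d_inv @ J            # local Gauss-Newton Hessian
        S += L.T @ H @ L
    S /= len(jacobians)                      # ppGNH, Eq. (33)
    eigval, V = np.linalg.eigh(S)
    V = V[:, np.argsort(eigval)[::-1]]       # order by descending delta_i^2
    L_invT = np.linalg.inv(L).T
    U_r, W_r = L @ V[:, :r], L_invT @ V[:, :r]        # u_i = L v_i, w_i = L^{-T} v_i
    U_perp, W_perp = L @ V[:, r:], L_invT @ V[:, r:]  # complement (CS) counterparts
    return U_r, W_r, U_perp, W_perp
```

The returned bases feed directly into the LIS+CS sample reconstruction sketched in Section 6.2.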
Acknowledgments

MK, KS, and HNN acknowledge support by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), Scientific Discovery through Advanced Computing (SciDAC) program, under the FASTMath Institute. MK and MP also acknowledge support by the U.S. Department of Defense, Office of Naval Research (ONR), under MURI grant no. N00014-21-1-2475. This article has been co-authored by employees of National Technology and Engineering Solutions of Sandia, LLC under Contract No. DE-NA0003525 with the U.S. Department of Energy (DOE). The employees co-own right, title and interest in and to the article and are responsible for its contents. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this article or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan, https://www.energy.gov/downloads/doe-public-access-plan. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ONR or the US government.

References

[1] O. Abril-Pla, V. Andreani, C. Carroll, L. Dong, C. J. Fonnesbeck, M. Kochurov, R. Kumar, J. Lao, C. C. Luhmann, O. A. Martin, M. Osthege, R. Vieira, T. Wiecki, and R. Zinkov, PyMC: A modern and comprehensive probabilistic programming framework in Python, PeerJ Computer Science, 9 (2023), https://doi.org/10.7717/peerj-cs.1516.

[2] D. Andrés Arcones, M. Weiser, P.-S. Koutsourelakis, and J. F. Unger, Model bias identification for Bayesian calibration of stochastic digital twins of bridges, Applied Stochastic Models in Business and Industry, 41 (2025), p. e2897.

[3] P. D. Arendt, D. W. Apley, and W. Chen, Quantification of model uncertainty: Calibration, model discrepancy, and identifiability, Journal of Mechanical Design, 134 (2012), p. 100908, https://doi.org/10.1115/1.4007390.

[4] P. D. Arendt, D. W. Apley, and W. Chen, A preposterior analysis to predict identifiability in the experimental calibration of computer models, IIE Transactions, 48 (2016), pp. 75–88, https://doi.org/10.1080/0740817X.2015.1064554.

[5] R. Bandy, T. Portone, and R. Morrison, Stochastic model correction for the adaptive vibration isolation round-robin challenge, in IMAC, A Conference and Exposition on Structural Dynamics, Springer, 2024, pp. 53–62.

[6] M. Bayarri, J. O. Berger, M. C. Kennedy, A. Kottas, R. Paulo, J. Sacks, J. A. Cafeo, C.-H. Lin, and J. Tu, Predicting vehicle crashworthiness: Validation of computer models for functional and hierarchical data, Journal of the American Statistical Association, 104 (2009), pp. 929–943, https://doi.org/10.1198/jasa.2009.ap06623.

[7] M. J. Bayarri, J. O. Berger, and F. Liu, Modularization in Bayesian analysis, with emphasis on analysis of computer models, Bayesian Analysis, 4 (2009), pp. 119–150, https://doi.org/10.1214/09-BA404.

[8] J. Bernardo, J. Berger, A. Dawid, A. Smith, et al., Regression and classification using Gaussian process priors, Bayesian Statistics, 6 (1998), p. 475.

[9] J. L. Brown and L. B. Hund, Estimating material properties under extreme conditions by using Bayesian model calibration with functional outputs, Journal of the Royal Statistical Society: Series C (Applied Statistics), 67 (2018), pp. 1023–1045, https://doi.org/10.1111/rssc.12273.

[10] J. Brynjarsdóttir and A. O'Hagan, Learning about physical parameters: The importance of model discrepancy, Inverse Problems, 30 (2014), p. 114007.
[11] K. Chowdhary and H. N. Najm, Bayesian estimation of Karhunen–Loève expansions; a random subspace approach, Journal of Computational Physics, 319 (2016), pp. 280–293, http://dx.doi.org/10.1016/j.jcp.2016.02.056.

[12] N. Cressie, The origins of kriging, Mathematical Geology, 22 (1990), pp. 239–252, https://doi.org/10.1007/BF00889887.

[13] T. Cui, J. Martin, Y. M. Marzouk, A. Solonen, and A. Spantini, Likelihood-informed dimension reduction for nonlinear inverse problems, Inverse Problems, 30 (2014), p. 114015.

[14] A. Damianou and N. D. Lawrence, Deep Gaussian processes, in Artificial Intelligence and Statistics, PMLR, 2013, pp. 207–215.

[15] Y. Fan, H. Najm, Y. Yu, S. Silling, and M. D'Elia, Embedded Nonlocal Operator Regression (ENOR): Quantifying model error in learning nonlocal operators, International Journal for Uncertainty Quantification, 15 (2025), pp. 61–94, https://doi.org/10.1615/Int.J.UncertaintyQuantification.2025056958.

[16] D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman, emcee: the MCMC hammer, Publications of the Astronomical Society of the Pacific, 125 (2013), p. 306.

[17] R. B. Gramacy, D. Bingham, J. P. Holloway, M. J. Grosskopf, C. C. Kuranz, E. Rutter, M. Trantham, and R. P. Drake, Calibrating a large computer experiment simulating radiative shock hydrodynamics, The Annals of Applied Statistics, 9 (2015), pp. 1141–1168.

[18] M. Gu and L. Wang, Scaled Gaussian stochastic process for computer model calibration and prediction, SIAM/ASA Journal on Uncertainty Quantification, 6 (2018), pp. 1555–1583.

[19] L. Hakim, G. Lacaze, M. Khalil, K. Sargsyan, H. Najm, and J. Oefelein, Probabilistic parameter estimation in a 2-step chemical kinetics model for n-dodecane jet autoignition, Combustion Theory and Modelling, 22 (2018), pp. 446–466, https://doi.org/10.1080/13647830.2017.1403653.

[20] M. S. Handcock and M. L. Stein, A Bayesian analysis of kriging, Technometrics, 35 (1993), pp. 403–410.

[21] A. Hegde, E. Weiss, W. Windl, H. N. Najm, and C. Safta, A Bayesian calibration framework with embedded model error for model diagnostics, International Journal for Uncertainty Quantification, 14 (2024), pp. 37–70, https://doi.org/10.1615/Int.J.UncertaintyQuantification.2024051602.

[22] D. Higdon, M. Kennedy, J. C. Cavendish, J. A. Cafeo, and R. D. Ryne, Combining field data and computer simulations for calibration and prediction, SIAM Journal on Scientific Computing, 26 (2004), pp. 448–466, https://doi.org/10.1137/S1064827503426693.

[23] M. D. Hoffman, A. Gelman, et al., The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, Journal of Machine Learning Research, 15 (2014), pp. 1593–1623.

[24] X. Huan, C. Safta, K. Sargsyan, G. Geraci, M. S. Eldred, Z. Vane, G. Lacaze, J. C. Oefelein, and H. N. Najm, Global sensitivity analysis and quantification of model error for large eddy simulation in scramjet design, in 19th AIAA Non-Deterministic Approaches Conference, 2017, p. 1089.
[25] J. P. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Springer, 2005.

[26] K. Karhunen, Zur Spektraltheorie stochastischer Prozesse, Ann. Acad. Sci. Fennicae, 37 (1946).

[27] M. C. Kennedy and A. O'Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63 (2001), pp. 425–464.

[28] M. C. Kennedy, A. O'Hagan, and N. Higgins, Bayesian analysis of computer code outputs, in Quantitative Methods for Current Environmental Issues, C. W. Anderson, V. Barnett, P. C. Chatwin, and A. H. El-Shaarawi, eds., Springer London, 2002, pp. 227–243, https://doi.org/10.1007/978-1-4471-0657-9_11.

[29] G. S. Kimeldorf and G. Wahba, A correspondence between Bayesian estimation on stochastic processes and smoothing by splines, The Annals of Mathematical Statistics, 41 (1970), pp. 495–502.

[30] H. König, Eigenvalue Distribution of Compact Operators, vol. 16, Birkhäuser, 2013.

[31] M. Kuppa, R. Ghanem, and M. Panesi, Stochastic operator learning for chemistry in non-equilibrium flows, Journal of Computational Physics, (2025), p. 114381.

[32] J. Martin, L. C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34 (2012), pp. A1460–A1487, https://doi.org/10.1137/110845598.

[33] Y. M. Marzouk and H. N. Najm, Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, Journal of Computational Physics, 228 (2009), pp. 1862–1902.

[34] Y. M. Marzouk, H. N. Najm, and L. A. Rahn, Stochastic spectral methods for efficient Bayesian solution of inverse problems, Journal of Computational Physics, 224 (2007), pp. 560–586, https://doi.org/10.1016/j.jcp.2006.10.010.

[35] G. Matheron, Principles of geostatistics, Economic Geology, 58 (1963), pp. 1246–1266, https://doi.org/10.2113/gsecongeo.58.8.1246.

[36] R. E. Morrison, T. A. Oliver, and R. D. Moser, Representing model inadequacy: A stochastic operator approach, SIAM/ASA Journal on Uncertainty Quantification, 6 (2018), pp. 457–496.

[37] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

[38] R. M. Neal, Bayesian Learning for Neural Networks, vol. 118, Springer Science & Business Media, 2012.

[39] T. A. Oliver, G. Terejanu, C. S. Simmons, and R. D. Moser, Validating predictions of unobserved quantities, Computer Methods in Applied Mechanics and Engineering, 283 (2015), pp. 1310–1335.

[40] J. Oreluk, L. Sheps, and H. Najm, Bayesian model calibration for vacuum-ultraviolet photoionisation mass spectrometry, Combustion Theory and Modelling, 26 (2022), pp. 513–540, https://doi.org/10.1080/13647830.2022.2030495.

[41] P. Perdikaris, M. Raissi, A. Damianou, N. D. Lawrence, and G. E. Karniadakis, Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473 (2017), p. 20160751.
[42] P. Pernot and F. Cailliez, A critical review of statistical calibration/prediction models handling data inconsistency and model inadequacy, AIChE Journal, 63 (2017), pp. 4642–4665.

[43] M. Plumlee, Bayesian calibration of inexact computer models, Journal of the American Statistical Association, 112 (2017), pp. 1274–1285.

[44] M. Plumlee and V. R. Joseph, Orthogonal Gaussian process models, Statistica Sinica, (2018), pp. 601–619.

[45] D. Poole and A. E. Raftery, Inference for deterministic simulation models: the Bayesian melding approach, Journal of the American Statistical Association, 95 (2000), pp. 1244–1255.

[46] A. E. Raftery, G. H. Givens, and J. E. Zeh, Inference from a deterministic population dynamics model for bowhead whales, Journal of the American Statistical Association, 90 (1995), pp. 402–416.

[47] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Inferring solutions of differential equations using noisy multi-fidelity data, Journal of Computational Physics, 335 (2017), pp. 736–746.

[48] R. Romanowicz, K. Beven, and J. Tawn, Evaluation of predictive uncertainty in nonlinear hydrological models using a Bayesian approach, Statistics for the Environment, 2 (1994), pp. 297–317.

[49] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, Design and analysis of computer experiments, Statistical Science, 4 (1989), pp. 409–423.

[50] S. Sakamoto and R. Ghanem, Polynomial chaos decomposition for the simulation of non-Gaussian nonstationary stochastic processes, Journal of Engineering Mechanics, 128 (2002), pp. 190–201.

[51] T. J. Santner, B. J. Williams, W. I. Notz, and B. J. Williams, The Design and Analysis of Computer Experiments, vol. 1, Springer, 2003.

[52] K. Sargsyan, X. Huan, and H. N. Najm, Embedded model error representation for Bayesian model calibration, International Journal for Uncertainty Quantification, 9 (2019).

[53] K. Sargsyan, H. N. Najm, and R. Ghanem, On the statistical calibration of physical models, International Journal of Chemical Kinetics, 47 (2015), pp. 246–276.

[54] E. Snelson, Z. Ghahramani, and C. Rasmussen, Warped Gaussian processes, Advances in Neural Information Processing Systems, 16 (2003).

[55] A. Spantini, A. Solonen, T. Cui, J. Martin, L. Tenorio, and Y. Marzouk, Optimal low-rank approximations of Bayesian linear inverse problems, SIAM Journal on Scientific Computing, 37 (2015), pp. A2451–A2487.

[56] R. Tuo and C. Jeff Wu, A theoretical framework for calibration in computer models: Parametrization, estimation and convergence properties, SIAM/ASA Journal on Uncertainty Quantification, 4 (2016), pp. 767–795.

[57] R. Tuo and C. J. Wu, Efficient calibration for imperfect computer models, The Annals of Statistics, 43 (2015), pp. 2331–2352.

[58] C. K. Williams and C. E. Rasmussen, Gaussian Processes for Machine Learning, vol. 2, MIT Press, 2006.