Smoothness and other hyperparameter estimation for inverse problems related to data assimilation

Smo othness and other h yp erparameter estimation for in v erse problems related to data assimilation Baptiste Simandoux 1 , Nik olas Kantas 1 , and Dan Crisan 1 1 Departmen t of Mathematics, Imp erial College, London, SW7 2AZ, UK. Abstract W e consider Ba y esian inv erse problems arising in data assimilation for dynamical sys- tems go verned b y partial and stochastic partial diﬀeren tial equations. The space–time dep enden t ﬁeld is inferred join tly with static parameters of the prior and likelihoo d densities. P articular emphasis is placed on the hyperparameter controlling the prior smo othness and regularit y , which is critical in ensuring well-posedness, shaping p osterior structure, and determining predictive uncertain ty . Commonly it is assumed to b e kno wn and ﬁxed a priori; how ev er in this pap er w e will adopt a hierarchical Ba yesian framew ork in whic h smo othness and other h yp erparameters are treated as unknown and assigned h yp erpriors. P osterior inference is performed using Metrop olis-within-Gibbs sampling suitable to high dimensions, for which hyperparameter estimation in volv es little com- putational ov erhead. The metho dology is demonstrated on inv erse problems for the Na vier–Stokes e quations and the sto chastic advection–diﬀusion equation, under sparse and dense observ ation regimes, using Gaussian priors with diﬀeren t cov ariance structure. Numerical results show that jointly estimating the smo othness substan tially reduces the errors in uncertain ty quantiﬁcation and parameter estimation induced b y smoothness missp eciﬁcation, by achieving p erformance comparable to scenarios in which the true smo othness is known. 1 In tro duction In meteorology , o ceanograph y and atmospheric sciences, h ydrology , geophysics, v arious dy- namical space-time dep enden t quantities need to b e estimated and used for prediction; see [6, 20, 5, 4] for an o verview. They are typically mo delled with physical mo dels from ﬂuid dynamics formulated as partial diﬀeren tial equations (PDEs) and more recently sto c hastic PDEs (SPDEs), describing the evolution of the atmospheric quantities suc h as v elo cit y , tem- p erature, pressure, turbulence of wind or o cean curren ts, salinit y and man y others [62, 3, 35]. In the last few dec ades, increasingly larger space–time data sets are obtained from remote sensing satellites or in situ sensors for example [24]. Satellites pro duce high-resolution im- ages of geophysical v ariables, suc h as stratospheric ozone, sea surface height winds, that are used for many tasks ranging from the estimation of seasonal cycles to the identiﬁca- tion of long-term trends [54, 53]. Data assimilation pro vides a framew ork to combine this observ ational data with dynamical mo dels to estimate and predict accurately those states. F orecasting, one of the core aims of n umerical weather prediction, can b e achiev ed using an inferred initial condition of the state b y propagating its gov erning dynamics forw ard. As 1 these mo del tra jectories quic kly deviate from their true state, we incorp orate observ ations to correct this drift, see [46, 4, 6] for more details. In addition to states, referred to as dynamical v ariables, there are static parameters that determine the mo del equations and prop erties such as smo othness, noise amplitude, rotation angles among others. These also need to b e estimated from data and in this pap er we will consider joint Bay esian inference for dynamic and static v ariables [1, 7]. Data assimilation problems are naturally phrased as Bay esian inv erse problems: this has b een common notably in the ﬁelds of meteorology , atmospheric and o ceanic sciences, see [6, 20] for examples. Motiv ated b y these applications, w e adopt a Bay esian p erspective to infer from noisy observ ations a v ector ﬁeld v = ( v ( t, x ); t ∈ [0 , T ] , x ∈ X ) in a suitable separable v ector space ( V , ∥ · ∥ V ). Bay esian in verse problems hav e gro wn signiﬁcantly in p opularit y ov er the last decade, driv en by the av ailabilit y of simulation-based algorithms and large computational p o wer [25]. Within this framew ork, we aim to learn a p osterior distribution µ given data y via the abstract Ba yes rule: dµ dµ θ 0 ( v ) = l ( y , v , ϕ ) . (1) Here θ , ϕ denote sets of v arious constants used to sp ecify the prior µ 0 and the likelihoo d l resp ectiv ely . An estimate of v is given b y samples of this p osterior. The prior acts as regulariser and imp oses certain smo othness and regularity conditions on v so that µ is well p osed [17, 9, 30, 55]. Diﬀerent regularisers can b e chosen dep ending on the approac h used, for example Gaussian priors corresp ond to a Tikhonov regulariser [23] or a T otal V ariation prior can b e used to preserve edges [31]. Here we will use Gaussian priors as in [29] but our metho dology is generic for other priors and regularisers, see [17] for an ov erview of diﬀerent options. θ is usually a concatenation of several parameters: v ariance amplitude, spatial range, correlation le ngth or order of a diﬀerential op erator for example. In this pap er w e will pay particular atten tion to determining the smo othness/roughness of v , denoting the relev an t parameter as α . One requiremen t for the a priori spatial smo othness α is to ensure p osterior w ell-p osedness of the in v erse problem itself. Beyond this α is crucial in setting v arious spatial prop erties of the p osterior and subsequen t predictions. In many cases in data assimilation estimation of α is omitted and is assumed to b e ﬁxed and known [16, 26, 45, 48, 52]. Similar c hallenges in setting these regularisation parameters also app ear in image pro cessing prob- lems, see [60, 37]. In this pap er w e prop ose to estimate α and demonstrate the b eneﬁts of the added mo del ﬂexibilit y brought by the h yp erpriors. The remaining parameters of θ and the ones of ϕ will also need to b e estimated. While in some cases these ma y b e kno wn a priori via some oﬄine calibration metho ds, esp ecially for ϕ , they often require joint estimation with v , particularly when not physically measurable [1, 11]. T o that end, w e will adopt a hierarchical mo deling approach [1, 7, 19, 27, 43, 44], treating ( θ , ϕ ) as v ariables also to b e inferred and assigned with h yp er-prior π ( ϕ, θ ). This allo ws for a ﬂexible and expressiv e w ay to specify priors and models. The resulting joint prior distribution on v , θ , ϕ is c hosen to lead to a well-posed p osterior distribution by conditioning on the observed data [55]. It is common in hierarc hical in verse problems for v to emplo y ensem ble Kalman techniques [27, 49]. Whilst fast and c heap to implement, the latter are not theoretically justiﬁed and are biased. They ha ve p erformed well in providing p oin t estimates like the maximum a p osteriori (MAP) estimator, ho wev er they do not capture the uncertain ty around these estimates [2]. This motiv ates our choice to use Mon te Carlo metho ds for parameter estimation and in particular Mark ov Chain Mon te Carlo (MCMC) as 2 in [7, 8, 9, 10, 32, 34, 40, 55]. T o estimate ϕ, θ along with v , we will use Metrop olis-within- Gibbs (MwG) sampling. F or v , w e will use appropriate MCMC algorithms that are eﬃcient and robust to the high dimensionality of v , namely the preconditioned Crank–Nicolson (pCN) [10, 22, 32]. These are computationally intensiv e methods especially when used for n umerical discretisations of v and the relev an t (S)PDEs, but they are theoretically justiﬁed and robust under mesh reﬁnement [22]. W e will demonstrate the merit of our approac h in t wo case studies. First, we will con- sider an in v erse problem using the Navier-Stok es equations, similar to [8, 9, 26, 55]. Secondly w e consider the sto c hastic advection-diﬀusion equation, p opular for environmen tal monitor- ing, for example in [54, 63, 52, 14, 36, 65]. F or µ 0 , we will use the p opular Gaussian pro cess prior used in many data assimilation settings [55, 25, 29, 42, 26]. F or the Navier-Stok es case, the co v ariance of µ 0 will b e set as β 2 A − α , where α and β c haracterise the smo othness and scaling resp ectiv ely and A is the Stok es op erator. In the sto c hastic advection-diﬀusion case w e consider as prior the Whittle-Mat´ ern cov ariance [61], widely used in spatial statistics [38, 52]. In b oth cases the p osterior is highly sensitive to choices of its hyperparameters [40, 41] and in particular α . 1.1 Relations with existing literature F rom a theoretical standp oin t a ﬁxed α may yield sub-optimal contraction rates, unless the prior captures the ﬁne prop erties of the truth like its regularit y [29]. As this is rarely known, determining the regularit y/smo othness of v systematically is necessary for accurate infer- ence of v . Alternativ ely to Hierarc hical Ba yesian metho ds, one could use empirical Bay es to estimate θ , including α from the data, t ypically by maximizing the marginal lik eliho o d of θ prior to p osterior inference [58, 28]. Whilst computationally cheaper, this is not a fully Ba yesian metho d and it ignores the parameters’ uncertain t y , p oten tially underestimating our p osterior uncertain ty . Hierarc hical Ba yesian metho ds considering smo othness were studied theoretically for the linear diagonalizable in verse problems in [28], obtaining adaptiv e op- timal conv ergence rates, under mild assumptions on α ’s hyper-prior. The sp eciﬁc form of the parameterisation for θ can aﬀect the mixing of MwG [1, 43, 64]. Non-centered parame- terisations of the laten t space and h yp erparameters allo w for faster mixing in MwG, and a non-cen tered parameterisation was also used for pCN algorithms, with [1] sho wing that it is robust to discretisation. W e note that to the b est of our kno wledge, there are no case studies related to nonlinear inv erse problems prior to this work inv estigating and demonstrating the eﬃciency of this approach when also estimating the smo othness parameter α . W e will b e using the MwG algorithm along with pCN as in [7]. Parameter learning for the Matern prior using MwG is studied in [48] but α := 3 − d/ 2 was ﬁxed. F or linear mo dels the MAP estimate has b een well studied theoretically in [28, 15], but there is a lack of case studies on in verse problems with PDEs/SPDEs, esp ecially when learning the smo othness parameter. Our pap er addresses this gap and is intended to illustrate that one can recreate similar p erformance to knowing α . In parallel w ork, [41] provides a case study with a p oin t estimate (MAP) for ﬁxed hyperparameters, without considering lik elihoo d parameters ϕ . W e will also demonstrate how learning α leads to b etter uncertain ty quantiﬁcation. Finally our case studies include diﬀerent mo dels and observ ation regimes. 1.2 Con tributions • W e prop ose an eﬃcient MwG algorithm for in verse problems of PDEs and SPDEs. W e use pCN for the state and Gibbs sampling or Gibbs MCMC kernels for the h y- 3 p erparameters, as in [7]. W e remark that the computational ov erhead of considering h yp erparameters is minimal. • W e pay particular attention to the eﬀect of the smo othness parameter on the state’s inference and prediction, often ov erlo ok ed in the literature as men tioned earlier. T o that end, w e demonstrate its impro vemen ts in uncertaint y quantiﬁcation b oth in es- timation and prediction compared to standard metho ds relying on ﬁxed smo othness assumptions. W e also lo ok into the smoothness’ inﬂuence on the rest of the parameters’ estimation. 1.3 Organisation The structure of this pap er is as follows: in section 2 w e presen t the problem form ulation, our notation along with the tw o diﬀerent problems to b e studied. Section 3 describ es the Metrop olis-within-Gibbs algorithm we are considering in the high-dimensional setting and its implementation for eac h case study . Subsequently , section 4 contains n umerical results. Finally , in section 5 we will conclude and discuss future work. The App endix contains supplemen tary material and additional n umerical exp erimen ts. 2 Problem F orm ulation Adopting a Bay esian p ersp ectiv e, we will estimate v from y b y sampling from its p osterior distribution, introduced in (1). T o do so, we ﬁrst sp ecify a prior for v conditional on param- eter θ , where α ∈ θ controls the spatial regularity of the prior realizations. W e will mo del the observ ations y as noisy measuremen ts of the latent ﬁeld v at ﬁxed lo cations and discrete in terv als for the observ ation times. As a result, the lik eliho o d function will dep end on the dynamics of v and the observ ation pro cess. F or simplicit y w e will only consider an Eule- rian observ ation setting with a regularly spaced grid for the observ ation lo cations and use Gaussian additive noise, but we p oin t out that our metho dology straightforw ardly extends to more general non-linear non Gaussian observ ation schemes and irregular grids as we are using MCMC. Recall that ϕ, θ are the parameters of the likelihoo d and prior resp ectiv ely . As men- tioned abov e, inference for θ and ϕ is performed join tly with v . T ypically , the parameters θ , ϕ are low dimensional whilst the dimension of v is high, so the latter dictates the algorithm’s computational cost. Below w e will in tro duce the observ ation equation and t wo diﬀerent dy- namical mo dels we will consider in our n umerical study: the 2D Navier-Stok es equation and the sto c hastic advection-diﬀusion equation. 2.1 Observ ations W e mo del the data as noisy measurements of the v elo cit y ﬁeld v on a ﬁxed grid of p oints: x 1 , ..., x Υ , for Υ ≥ 1. W e assume that the observ ation noise is Gaussian and its standard deviation τ is known. These are obtained regularly at interv als δ time units apart. y n,υ = v ( x υ , nδ ) + τ ζ n,υ , ζ n,υ iid ∼ N (0 , 1) , 1 ≤ υ ≤ Υ , 1 ≤ n ≤ T . So the likelihoo d of this data conditioned on the initial condition v 0 tak es the form: l ( y ; v ) = 1 (2 π τ 2 ) T · Υ 2 T Y n =1 Υ Y υ =1 exp  − 1 2 τ 2 ( y n,υ − v ( x υ , nδ )) 2  . (2) 4 2.2 Case study 1: Inferring the initial condition of deterministic Na vier- Stok es equations The domain we consider is a 2D T orus T = [0 , 2 π ] 2 with initial ﬂow v 0 ∈ V and let v : T × [0 , ∞ ) → R 2 , v ( x, t ) = ( v 1 ( x, t ) , v 2 ( x, t )) T . One can write the incompressible Navier- Stok es PDE as lifted in a suitable Hilb ert space (see section A in the App endix for details) dv dt + η Av + B ( v , v ) = P ( f ) , v (0 , · ) = v 0 , (3) where η > 0 is a known viscosit y parameter, A is the Stok es op erator, B is the symmetric bilinear form and P ( f ) is the forcing term. Let Ψ( · , t ) : V → V t ≥ 0 denote the semigroup of solution op erators for the ODE (3) through t time units. Due to the nature of the obser- v ations we are interested in the discrete-time semigroup where v ( · , nδ ) = G ( n ) δ ( · ) = Ψ( · , nδ ) corresp onding to time instances t = nδ , of lag δ > 0 and n = 0 , ..., T , with conv entions G (0) δ = I , G (1) δ = G δ , G ( n ) δ = G δ ∗ G ( n − 1) δ . F or simplicity we will assume ϕ = { η , P ( f ) , τ } to b e known. Notice then that in this problem the Navier-Stok es dynamics enter in the p oste- rior only in the likelihoo d function, formulated abov e in (2). Also all the information for v is captured in v 0 as v ( x υ , nδ ) = G ( n ) δ ( v 0 )( x υ ) so w e will consider this as an inference problem for v 0 . T o clarify , since the dynamics are deterministic, forecasting can b e p erformed b y propagating them from samples of v 0 to the desired future time. An appropriate orthonormal basis for V is comprised of ψ k ( x ) = k ⊥ 2 π | k | exp( ik · x ) , k ∈ Z 2 \ { 0 } , where k ⊥ = ( − k 2 , k 1 ) T . Using this basis we obtain the F ourier decomp osition where v 0 ( x ) = P k ∈ Z 2 \{ 0 } u k ψ k ( x ) with u k = ⟨ v 0 , ψ k ⟩ . Since v 0 is real v alued, u k = − u − k so w e only need to consider half the F ourier co eﬃcien ts. W e split the domain b y deﬁning: Z 2 ↑ = { k = ( k 1 , k 2 ) ∈ Z 2 \ { 0 } : k 1 + k 2 > 0 } ∪ { k ∈ Z 2 \ { 0 } : k 1 + k 2 = 0 , k 1 > 0 } , and imp osing u k = − u − k for k ∈ { Z 2 \ { 0 }} \ Z 2 ↑ . W e represent the initial vector ﬁeld v 0 b y its F ourier co eﬃcien ts, and equiv alen tly: v 0 ↔ { u k } k ∈ Z 2 ↑ . v 0 will denote the concatenated v ector of the real and imaginary part of its F ourier co eﬃcien ts. Since w e are only inferring v 0 , we set the prior on v 0 to b e µ 0 = N (0 , β 2 A − α ) with h yp erparameters θ := ( α > 1 / 2 , β > 0) as earlier. This prior, a Gaussian distribution on this Hilb ert space, is a conv enien t and ﬂexible prior, see for details [13]. In the F ourier domain w e hav e: v 0 ∼ µ 0 ⇐ ⇒ Re( u k ) , Im( u k ) iid ∼ N (0 , 1 2 β 2 | k | − 2 α ) , k ∈ Z 2 ↑ . The prior admits a Karhunen–Lo ` eve expansion: µ θ 0 = Law   X k ∈ Z 2 \{ 0 } β √ 2 | k | − α ξ k ψ k   , (4) with θ = ( α, β 2 ) , Re( ξ k ) , Im( ξ k ) iid ∼ N (0 , 1) , k ∈ Z 2 ↑ ; ξ k = − ξ − k , k ∈ { Z 2 \ { 0 }} \ Z 2 ↑ , pro viding the Non-Centered Parameterisation ( θ , { ξ k } k ∈ Z 2 ↑ ). W e choose a zero-mean prior for simplicity; results will still hold for any mean function in the Cameron-Martin space of µ 0 [8, 9, 55]. Under this prior, u k are initially assumed 5 indep enden t normally distributed, with a particular rate of deca y for their v ariances as | k | increases so that the prior is trace class i.e. X k ∈ Z 2 \{ 0 } β √ 2 | k | − α < ∞ . This v ariance decay is particularly relev ant for the Na vier-Stokes mo del, for which the dissi- pation will transfer energy from high to low frequencies un til v 0 en ters the PDE’s attractor [47]. F or appropriate forcing and initialisation, the sp ectrum of v 0 tends to concen trate at lo w frequencies. This means higher frequencies will b e less informativ e in the posterior. Here the role of the smo othness parameter α is crucial: higher v alues w ould corresp ond to lo wer n umber of F ourier mo des capturing most of the prior and p osterior and since the viscosity constan t η is ﬁxed and known, α sets the eﬀectiv e dimension. Since ϕ is known, we use the following joint p osterior probability measure µ on V : µ ( v 0 , α, β 2 ) = 1 Z ( y ) l ( y ; v 0 ) µ α,β 2 0 ( v 0 ) π ( α ) π ( β 2 ) , (5) where the sup erscript in µ α,β 2 0 denotes conditioning with α , β 2 and π ( α ) , π ( β 2 ) are indep en- den t hyper-priors. The w ell-p osedness of µ is cov ered by the results in [8, 9, 55], which deal with the inﬁnite dimensional v ariable v 0 , and has b een established with the requiremen t that α > 1 / 2 or equiv alently A − α b e a trace-class op erator. As long as the hyper-prior π ( α ) do es not violate this requiremen t and Z ( y ) is ﬁnite, the p osterior will b e well posed. F or this reason w e will use an in verse Gamma distribution for β 2 and an appropriate uniform for α . 2.3 Case study 2: Data assimilation with the Sto c hastic Adv ection Diﬀu- sion equation W e now study a mo del whereby the state’s dynamics follo w a linear SPDE on V := T 2 . d v ( t ) = ( − µ T ∇ v t + ∇ · Σ ∇ v t − ζ v t )d t + d ϵ t , where µ T ∇ v t mo dels transp ort eﬀects, advection in weather applications, with drift µ , ∇ · Σ ∇ v t is a diﬀusion term that can incorp orate anisotropy and − ζ v t , ζ > 0 is a damping term. The sto c hastic noise ϵ t is a temp orally white and spatially correlated Gaussian pro cess, with Whittle-Mat ´ ern co v ariance. Σ − 1 = 1 ρ 2 1  cos( ψ ) sin( ψ ) − γ sin( ψ ) γ cos( ψ )  T  cos( ψ ) sin( ψ ) − γ sin( ψ ) γ cos( ψ )  where ρ 1 > 0 , γ > 0 and ψ ∈ [0 , π / 2]. Here, ρ 1 acts as a range parameter and controls the amoun t of diﬀusion. γ and ψ control the amount and the direction of anisotropy; with γ = 1, isotropic diﬀusion is obtained. F or this problem: ϕ = ( ζ , ρ 1 , γ , ψ , µ 1 , µ 2 , τ ). Contrary to the previous setting, the noise in the dynamics means w e cannot easily recov er v t directly from v 0 . T o do this we need to simulate the whole path of v t . As a result, this is an inv erse problem on the whole path { v t } t ∈ [0 ,T ] . F orecasting b ey ond the time of the last observ ation T is p ossible after inferring { v t } T t ≥ 0 giv en observ ations deﬁned as in section 2.1. This ﬁltering setting and the prior are deﬁned on the path space { v t } T t ≥ 0 . So we let our prior include the initial condition and the noise increments in the dynamics, whic h will dep end on θ . W e set v 0 P := ϵ 0 and describ e in detail the distribution of ϵ b elo w. 6 2.3.1 Whittle-Mat ´ ern prior The Mat ´ ern cov ariance b et w een lo cations u, v is r ( u, v ) = σ 2 Γ( ν )2 ν − 1 ( ρ 0 ∥ v − u ∥ ) ν K ν ( ρ 0 ∥ v − u ∥ ) , where K ν is the mo diﬁed Bessel function of second kind and order ν > 0 , ρ 0 > 0 is a scaling parameter and σ 2 is the marginal v ariance. Using a Gaussian ﬁeld with Whittle- Mat ´ ern co v ariance, the noise { ϵ t } T t =0 is quite a common choice in practice and applications [38, 52, 54, 36]. A Gaussian ﬁeld x ( u ) with Mat´ ern cov ariance is a solution to ( ρ 2 0 − ∆) α/ 2 x ( u ) = σ W ( u ) , u ∈ R d , α = ν + d/ 2 , ρ 0 > 0 , ν > 0 where the inno v ation pro cess W is a spatial Gaussian white noise [33]. As in [52] we w ill infer σ 2 , ho wev er in our case we will also infer α ([52] has it set with ν = 1). ( ρ 2 0 − ∆) α/ 2 is a pseudo-diﬀeren tial op erator with F ourier transform: {F ( ρ 2 0 − ∆) α/ 2 Λ } ( k ) = ( ρ 2 0 + ∥ k ∥ 2 ) α 2 ( F Λ)( k ) (6) where Λ is a function on R d for which the right-hand side of (6) has a well-deﬁned inv erse F ourier transform. ρ con trols the spatial correlation, a deﬁning feature of the Mat ´ ern ﬁeld. Here θ = ( α, σ 2 , ρ 0 ), so ( ρ 2 0 + ∥ k ∥ 2 ) α 2 cannot b e precomputed, and we consider indep enden t h yp erpriors for θ . As a result in this case every mo del parameter is b eing estimated. So the join t p osterior we sample from corresp onds to: µ ( v , ϕ, θ ) = 1 Z ( y ) l ϕ ( y ; v ) µ θ 0 ( v ) π ( θ ) π ( ϕ ) . (7) 3 Mon te Carlo Metho ds for ﬁltering and in v erse problems 3.1 Mark o v Chain Mon te Carlo MCMC is an iterative pro cedure simulating a long run of an ergo dic time-homogeneous µ - in v arian t Marko v chain to sample from the target distribution µ . After a burn-in p erio d the samples of the chain form an empirical appro ximation of µ . Several Marko v chain metho ds are av ailable to sample from a given target p osterior distribution µ , suc h as the Gibbs sampler or the Metrop olis-Hastings algorithm; see [59] for an ov erview. Our setting requires sampling from distributions deﬁned on a Hilb ert space and the p os- teriors lie in an inﬁnite-dimensional state-space. F or inv erse problems on v , one cannot easily use conditional dep endence to develop Gibbs-type samplers to up date blo c ks of co eﬃcien ts and improv e eﬃciency . In practice we use a high-dimensional discretisation, but our metho d still needs to b e robust to mesh reﬁnement. Sampling directly from the full conditional of v giv en ϕ, θ , y is imp ossible so we use a MCMC suited to high dimensions. Random-W alk Metrop olis algorithms are not a v alid option for targets with Gaussian priors as their sp ec- tral gaps will prov ably collapse with increasing dimensions [22]. Autoregressiv e proposals that are inv arian t to the prior hav e b een prop osed in [59]. These were recen tly renamed as preconditioned Crank–Nicolson (pCN) and introduced for inference of high-dimensional in verse problems in [10, 55]. [22] establishes that in contrast to Random walk metho ds, the sp ectral gap of pCN and hence the mixing of the metho d do es not collapse as the dimension gro ws and o vercomes some of the slow mixing issues encountered b y standard MCMCs in high dimensions. W e ﬁx ϕ, θ to the current estimates and run iterations of pCN on the full conditional of v . The pseudo-co de is presented in Algorithm 1. 7 Algorithm 1 pCN K pC N ( v 0 , · ): µ -inv ariant MCMC for High Dimensional In verse Prob- lems Inputs : • Prior µ 0 • step-size ρ close to 1 ( ∼ 1 − 1 / (n umber of v ariables to estimate) 2 ) • Data y , lik eliho o d and its parameters ϕ conditioned on initial condition v 0 : l ( y , v 0 , ϕ ) • Num b er of iterations T with burn-in p erio d bur n in Initialize : v 0 (0) ∼ µ 0 . F or t = 1 , . . . , T : 1. Prop ose: ˜ µ = ρ · v 0 ( t − 1) + p 1 − ρ 2 · ξ , ξ ∼ µ 0 2. Accept v 0 ( t ) = ˜ v 0 with probabilit y: 1 ∧ l ( y , ˜ v 0 , ϕ ) l ( y , v 0 ( t − 1) , ϕ ) otherwise v 0 ( t ) = v 0 ( t − 1). Output: ( v 0 ( t )) T t = burn in form estimates of v 0 as samples from K pC N ( v 0 (0)). 3.2 Metrop olis-within-Gibbs Giv en the hierarc hical structure of the mo del, we will use Metrop olis-within-Gibbs (MwG). Gibbs sampling alternates sampling from the full conditional distributions of each v , θ , ϕ , conditional on the curren t v alues of the remaining v ariables and y . Indep enden t hyperpriors for θ and ϕ will b e used so that up dates of v , ϕ, θ will b e simulating in sequence from the follo wing densities: π ( v | y , θ , ϕ ) ∝ l ( y | v , ϕ ) π ( v | θ ) (8) π ( ϕ | y , v , θ ) ∝ l ( y | v , ϕ ) π ( v | θ ) π ( ϕ ) (9) π ( θ | v ) ∝ π ( v | θ ) π ( θ ) (10) One could use either direct sampling from each conditional ab o v e or if this is not p ossible a small num b er of Metrop olis iterations targeting eac h of the ab o v e densities. 1 The latter is referred to as MwG. Sampling directly using conjugate relationships is recommended when a v ailable to accelerate sampling. W e now discuss the particular case of estimating the hyperparameters θ in the prior. Unlik e v and ϕ , θ is not informed by the data, only b y the estimate of v . So in its Gibbs up date of π ( v , θ , ϕ | v ) the costly likelihoo d do es not need to b e computed. 3.3 Sampling for Case study 1 W e pro ceed with the Navier-Stok es inv erse problem from section 2.2 and the observ ations in section 2.1. F or the hyperparameters, θ = ( α, β 2 ) and the join t prior is π ( v | α, β 2 ) π ( α ) π ( β 2 ). 1 One may even go further and implement MwG for each co ordinate or blo c ks of co ordinates of the v ariables in v , θ , ϕ ). 8 Algorithm 2 Metrop olis-within-Gibbs sampler K M w G ( v 0 , α, β 2 , · ) for Case study 1 A dditional inputs to Algorithm 1: • P arameter hyper-priors: P β = In v-Gamma( a, b ) , U α • prop ortion of steps p er parameter: p v 0 , p β 2 , p α (= 1 − p v 0 − p β 2 ) • additional learning rate: ρ α Initialize : v 0 (0) ∼ µ 0 , β 2 (0) ∼ P β , α (0) ∼ U α . F or t = 1 , . . . , T : 1. Sample p ∼ U ([0 , 1]). 2. IF p < p u , v 0 ( t ) ∼ K pC N ( v 0 ( t − 1) , · | α ( t − 1) , β 2 ( t − 1)), ELSE IF p < p u + p β 2 , β 2 ( t ) ∼ In v-Gamma( a + n 2 2 , b + v 0 ( t − 1) A α v 0 ( t − 1) 2 ), ELSE prop ose ˆ α = ρ α · α ( t − 1) + p 1 − ρ 2 α · u where u ∼ U α , and accept α ( t ) = ˆ α with probabilit y A , or reject and set α ( t ) = α ( t − 1). 3. Set the remaining parameters to their previous v alue e.g. α ( t ) = α ( t − 1) Output: ( v 0 , α, β 2 ) T t = burn in form estimates of ( v 0 , α, β 2 ) as samples from K M w G ( v 0 (0) , α (0) , β 2 (0) , · ). In Alg. 2 we present a random scan MwG where with probability p v , p β 2 , p α , w e choose one of the conditionals (or when appropriate the MCMC kernels targeting them) in (8)-(10) to iterate the corresp onding v ariable, see step 2 in Algorithm 2. F or up dating v , the full conditional is as follows: π ( v 0 | y , α, β 2 ) ∝ l ( y | v 0 ) π ( v 0 | α, β 2 ) ∝ 1 (2 π τ 2 ) T · Υ 2 T Y n =1 Υ Y υ =1 exp  − 1 2 τ 2 ( y n,υ − G ( n ) δ ( v 0 )( x υ )) 2  × 1 √ 2 π d/ 2 | β | d | A − α | 1 2 exp( − 1 2 β 2 v 0 A α v 0 ) , so giv en α, β 2 w e will target this densit y using Alg. 1, where we adjust ρ for an a verage acceptance probabilit y ( ∼ 0 . 2 − 0 . 3). Each iteration requires running a PDE solver un til the time of the latest observ ation to compute l ( y | v , ϕ ), which is the most computationally exp ensiv e step in our algorithm. W e will set the h yp erprior π ( β 2 ) := IG( a, b ) (Inv erse Gamma distribution with shap e parameter a and scale parameter b ) to get a conjugate full conditional: π ( β 2 | y , α, v 0 ) ∝ π ( v 0 | α, β 2 ) π ( β 2 ) = IG(ˆ a, ˆ b ) with ˆ a = a + d 2 , ˆ b = b + 1 2 v 0 A α v 0 , where d corresp onds to the dimension of the mesh for v . This is equal to the n umber of random v ariables we are ev aluating: d = n ( n − 1) 2 b ecause of the restrictions ∇ · v = 0 and u k = − u − k . Deriv ations for this expression can b e found in the App endix in section B. 9 F or α we will use a uniform prior U α on compact domain and sample from pCN type MCMC k ernel targeting the follo wing full conditional: π ( α | y , β 2 , v 0 ) ∝ π ( v 0 | α, β 2 ) π ( α ) . The prop osal will b e ˆ α = ρ α α + (1 − ρ α ) P where P ∼ U α and hence is in v arian t to the uniform prior. As a result, the acceptance ratio A for this MCMC k ernel is given b y: A = 1 ∧ π ( v 0 | ˆ α, β 2 ) π ( v 0 | α, β 2 ) Q ( ˆ α, α ) Q ( α, ˆ α ) = µ 0 ( v 0 | ˆ α, β 2 ) µ 0 ( v 0 | α, β 2 ) = | A − α | 1 2 | A − ˆ α | 1 2 exp  1 2 β 2 ( v 0 A α v 0 − v 0 A ˆ α v 0 )  3.4 Sampling for Case study 2 W e now lo ok to the problem in section 2.3. The underlying dynamics are linear and the noise is Gaussian, so Kalman ﬁltering provides eﬃcient and quic k estimates of v t giv en ( y n ) nδ ≤ t and ϕ, θ . How ever estimating the system’s parameters ( ϕ, θ ) is a c hallenging problem. This study extends the w ork in [52] by also inferring α and hence redesigning the hyperpriors. Compared to the previous case study , the use of a SPDE in the mo delling means that after discretising in time one has a higher dimensional Gaussian prior on { ϵ t } T t ≥ 0 . W e will follow [51] for sampling p ( v | y , ϕ, θ ) using the forward ﬁltering bac kw ard sampling (FFBS) algorithm [18]. Whilst this is not the pCN algorithm, it uses Kalman ﬁltering recursions and ac hieves suﬃcien tly fast mixing. In [51] smo othness was deterministically ﬁxed to α = 2 and w as known. Also improp er priors were used, which w ould cause unidentiﬁabilit y issues for v arying α . Redesigning this requires the prior parameters for the remaining θ to b e rescaled according to v alues of α . So w e will set these priors instead to: ρ 0 ∼ U [0 . 01 , 5 · ( α − d/ 2) − 2 ] , σ 2 ∼ In vGamma(1 , 1) , γ ∼ In vGamma(5 , 5) , ζ ∼ InvGamma(1 , 1) , τ 2 ∼ U [0 , 100]. W e fo cused our eﬀorts on the noise parameters since they are the most sensitive to c hanges in α . The b oundaries on the uniform prior of ρ 0 w ere chosen so that ρ 0 w ould b e identiﬁable from its corresp onding Mat ´ ern ﬁeld. As for σ 2 w e chose an In verse Gamma h yp erprior as it could b e used in a conjugate relationship to sample directly from it. W e also chose In verse Gamma hyperpriors for γ and ζ to allow the whole state space to b e explored whilst p enalising large mov es. The rest are as in [52]: µ x , µ y ∼ U [ − 0 . 5 , 0 . 5] , ψ ∼ U [0 , π / 2] , ρ 1 ∼ U [0 , 100]. W e choose again a w eakly informative prior on α to highlight the adv antages of estimating it in a challenging case: U [1 , 4]. 4 Numerical results 2 4.1 Case study 1: inv erse problem for Na vier-Stok es W e are interested in comparing pCN and MwG for similar exp erimen ts to [26], where α is assumed to b e known and set to either 2 or 2 . 2. W e pro duce ﬁgures in the study to compare the p erformance in initial condition estimation of the MwG to pCN, assuming correct and incorrect v alues of α . F or the PDE solver, we will use a sp ectral Galerkin metho d. W e consider the ﬁnite dimensional pro jection onto the Hilb ert space spanned b y { ψ k } k ∈ L n where n ∈ N \ { 0 } and L n = { ( k 1 , k 2 ) ∈ Z 2 \ { 0 } : max( | k 1 | , | k 2 | ) < n/ 2 } . W e refer to the size of the implied mesh, d u , as the dimensionalit y of u. W e observ e that A is diagonalized in U with eigenv alues { λ k } k ∈ Z 2 \{ 0 } , where λ k = | k | 2 . Periodic b oundary conditions allow the use 2 The co de for the exp eriments can b e found in https://github.com/bs519/Hyperpar_est . 10 of sp ectral Galerkin methods when solving the Navier-Stok es PDE n umerically with F ast F ourier T ransforms (FFTs). When facing diﬀerent b oundary conditions or forcing, w e would use Finite Elements tec hniques or other metho ds [21]. Conv olutions arising from pro ducts in the nonlinear term are computed via FFTs on our mesh with additional antialiasing using double-sized zero padding. In addition, exp onen tial time diﬀerencing is used as in [12] to deal with the resulting stiﬀ system, whereby an analytical integration is used for the linear part, together with an explicit numerical Euler integration scheme for the nonlinear part. F or the mesh n = 32, the num b er of observ ations is set to T = 5 and for the observ ation grid Υ = 16. W e ha v e set f ( x ) = ∇ ⊥ cos((5 , 5) ′ · x ) , τ 2 = 0 . 2. W e consider N = 8 · 10 5 iterations and hav e found the optimal learning rate to b e around 0 . 999 and 0 . 998 in a stationary ( η = 0 . 1 , δ = 1) and ”chaotic” regime ( η = 0 . 02 , δ = 0 . 02) resp ectiv ely . F or the true initial condition and observ ations, we use a similar synthetic dataset, generated from a random sample of the prior with α = 2 . 2 , β 2 = 1 . 2. W e use π ( β 2 ) = IG(1 . 5 , 2 . 5). Stationary regime Figure 1: (Stationary regime) T race (left) and densit y (righ t) plots of prior parameters. The trace plots show the MCMC estimates (blue) captures the true v alue (green) in its distribution, with its cumulativ e av erage (red) con verging to a v alue close to the truth. Similarly the density plots display the p osterior’s estimate (blue) p eaking around the true v alue (red) of b oth parameters. The prior corresp onds to the blac k dashed line. F or b oth parameters, the true v alue lies in the supp ort of the densities of the estimates. The diagnostics plots for pCN and MwG are in the App endix in section C.1. Run times are shown in T able C.1. W e remark that as exp ected the runtimes are close to that of Algorithm 1, with α, β 2 kno wn. In Fig. 1 w e plot the trace plots and densities of our parameter estimation and the results pro ve the success of the parameter estimation even with uninformative priors. W e now study the algorithm’s uncertaint y quan tiﬁcation of the initial condition. In Fig. 2 w e plot the v orticity of the initial condition’s estimates against the true v alue. W e observ e similar estimates from the algorithms knowing the correct α and 11 Figure 2: (Stationary regime) V orticity of the initial condition v 0 with the truth (far left) and MCMC estimates: knowing the true α = 2 . 2 v alue (cen tre left), underestimating with α = 1 . 5 (centre), o verestimating with α = 3 (cen tre right) and estimating α using MwG (far right). The crosses indicate the p ositions where the v ector ﬁeld is observed. W e observe similar estimates from the algorithms kno wing the correct smo othness and estimating it, alb eit giving a smo other estimate. estimating it. The estimates reco ver the true initial condition relativ ely accurately , alb eit giving a smo other estimate. W e now plot in Fig. 3 the v ariance of the estimates’ vorticit y to fully ev aluate the algorithm’s uncertaint y quan tiﬁcation. W e see from the scales that the v ariance is highly impacted by the choice of α . W e recov er the order of v ariance when the true α is kno wn only with the MwG. When α is ov er and underestimated the uncertaint y is under and o ver estimated, resp ectfully . Practitioners often choose to sac riﬁce uncertaint y quantiﬁcation for impro ved p oin t estimates and vice-v ersa. This could b e used within our framework by adjusting the priors of α . Note how ev er we will use a high viscosit y and this low ers the system’s energy , and in doing so, this stationary regime renders our v ariance estimation less crucial and the predictive p o w er of all the algorithms similar after a few timesteps. W e repeat this b elo w for the ”c haotic” regime with low er viscosity and more frequen t and informative observ ations to in vestigate the diﬀerences. Chaotic regime W e obtain similarly go od results in parameter estimation as in the stationary case, see Fig. 9. This is also the case for the vorticit y p oint estimates sho wn in Fig. 10. These plots can b e found in the App endix in section C.2. In Fig. 4 w e presen t the results as in Fig. 3 previ- ously . The scale of the v ariance of the v orticit y estimates is as b efore at unobserv ed lo cations, ho wev er there is spatial correlation in tro duced b y the observ ations. The lik eliho o d is p eaky around the informative observ ation lo cations, which reduces the v ariance of the p osterior of the velocity at these lo cations, as w ell as the p osterior’s dep endence on the prior. F or higher v alues of α this spatial correlation is more pronounced as the corresp onding prior v ariance is smo othed. So a higher α leads to a less informative prior v ariance, additionally increasing the p osterior’s dep endence on the likelihoo d. This is why the v ariance is impacted by α as it w as in the stationary setting only at unobserved lo cations. This can b e additionally seen in the plots of the tra jectories of the velocity’s estimates at observed times. Due to the chaotic regime having high energy , the tra jectory plots sho w that the diﬀerence in v ariances do es not v anish in time and in turn impacts the estimate’s predictive p o wer. α = 3 leads to an underestimated v ariance at unobserv ed lo cations: the conﬁdence interv al do es not capture the true v alue after v ery few time steps past the observ ations. At observed lo cations the forecast p erforms w ell, meaning the likelihoo d is enough to correct the missp eciﬁed prior. As for α = 1 . 5, the v ariance is o verestimated leading to p oor uncertain ty quan tiﬁcation at all times. At observed lo cations its forecast p erforms well un til the last observ ation, its pre- 12 Figure 3: (Stationary regime) Heat map of the v ariance of the vorticit y of v 0 vs α (top) in the stationary setting, as previously arranged, with: α = 2 . 2 v alue (far left), α = 1 . 5 (cen tre left), α = 3 (centre righ t) and estimating α using MwG (far righ t). Note the scale of the v ariance of v 0 ’s estimates b eing inv ersely prop ortional to α . W e also plot the tra jectory of the v elo cit y in the 2nd dimension of those estimates at obse rv ed (middle) and unobserved (b ottom) lo cations. The vertical dashed line indicates the time of the last observ ation. The algorithms kno wing the true α and estimating it via MwG exhibit similar b eha viour and uncertain ty quantiﬁcation. 13 Figure 4: (Chaotic regime) Heat map of the v ariance of the velocity squared of v 0 vs α (top) in the c haotic setting, as previously arranged, with: α = 2 . 2 v alue (far left), α = 1 . 5 (cen tre left), α = 3 (centre righ t) and estimating α using MwG (far righ t). The white circles in the top panels indicate the p ositions where the vector ﬁeld is observed. W e also plot the tra jectory of the velocity in the 2nd dimension of those estimates at observed (middle) and unobserved (b ottom) lo cations. The v ertical dashed line indicates the time of the last observ ation. Note the scale of the v ariance of the prediction estimates being in v ersely prop ortional to α . 14 Figure 5: T race (left) and density (righ t) plot of smo othness parameter estimates in case study 2. The dashed line corresp onds to the median estimate ≈ 1 . 1, close the true v alue generating the data α = ν + 1 = 2. dictiv e p o wer degrades quic kly as the v ariance explo des. Then pCN with α = 1 . 5 p erforms p oorly at uncertaint y quantiﬁcation, particularly for unobserved states and prediction. This sho ws that even where the likelihoo d is most inﬂuen tial, the prior still inﬂuences the algo- rithm’s predictive pow er. Finally the MwG repro duces similar p erformance to the algorithm assuming the correct v alue of α and p erforms similarly w ell even in prediction. 4.2 Case study 2: advection-diﬀusion SPDE W e reuse the co de and example provided in [51] using the R pack age ’spate’. The dis- cretisation grid is of size n 2 with n = 32. W e set T = 20 observ ations, generated from the whole grid (Υ = n 2 ) with Gaussian noise τ 2 = 0 . 01. ξ is generated from parameters ρ 0 = 0 . 1 , σ 2 = 0 . 2 , ζ = 0 . 5 , ρ 1 = 0 . 1 , γ = 2 , α = π / 4 , µ 1 = 0 . 2 , µ 2 = − 0 . 2. W e pick N = 10 6 with 10% burn-in and 5000 to b e b oth the burn-in p eriod and the minimal num ber of MCMC samples after this burn-in to estimate the prop osal matrix. W e pic k as starting v alues ρ 0 = 0 . 2 , σ 2 = 0 . 1 , ζ = 0 . 25 , ρ 1 = 0 . 2 , γ = 1 , α = 0 . 3 , µ 1 = 0 , µ 2 = 0 , τ 2 = 0 . 005 and initiate α correctly . As before we compare the p erformance of the algorithm additionally estimating the smo othness to the standard algorithm assuming correct and incorrect v alues of α . Results on α and the remaining parameters are shown in Fig. 5 and Fig. 6 respec - tiv ely . The estimation of α as 2 . 1 is close to the true v alue 2 and the parameter space is w ell explored. Assuming the wrong v alue of α leads to very p o or parameter estimation with parameter samples for σ 2 , ζ , ρ 1 , γ , µ x , µ y , τ in non-represen tative v alues. How ev er es timating α allows for parameter estimation results very close to the accurate p erformance of the stan- dard algorithm kno wing the correct α , as the distributions of their estimates are cen tered or capture well the true v alue. This is additionally supp orted by the parameters’ trace plots in Fig. 11 in section D in the App endix. 5 Conclusion In this w ork, w e consider tw o case studies that share common c haracteristics with op erational data assimilation problems. In b oth cases, w e ha ve illustrated the merits of estimating the smo othness parameter together with other parameters of the prior and lik eliho o d as w ell as noticed p erformance similar to knowing the true v alue of α . At the same time, we 15 (a) α = 2 (b) Estimated α (c) α = 1 . 5 (d) α = 2 . 5 Figure 6: Histogram of the parameters estimated when the algorithm knows the true α = 2 v alue (top left), underestimates α = 1 . 5 (b ottom left), o verestimates α = 2 . 5 (b ottom right) and estimates α (top right). The dashed line indicates the true v alue and the full line represen ts the median of the samples. Assuming the wrong v alue of α leads to very p oor parameter estimation with parameter samples for σ 2 , ζ , ρ 1 , γ , µ x , µ y , τ in non-represen tative v alues. Samples for µ x , µ y are still distributed according to their prior meaning no informa- tion w as gained from the data for µ . How ev er estimating α and knowing the correct α allo ws for similar accurate parameter estimation results, as the distributions of their estimates are cen tered around or capture w ell the true v alue. 16 sho wed how missp ecifying α can lead to p o or uncertain ty quantiﬁcation or estimation of other parameters in ϕ, θ . Our case studies co vered diﬀerent Gaussian priors (Matern/Laplacian), diﬀeren t dynamics (linear, nonlinear stationary or c haotic), observ ation regimes (dense or sparse observ ation grids). So w e b eliev e using hierarchical mo delling for ϕ, θ together with MwG can b e very helpful in practice. Ev en when little is kno wn ab out the smo othness of the state, it can b e inferred and by exploring the parameter space it adds ﬂexibility to the mo del. W e ha ve also established that in certain cases practitioners ma y c ho ose to sacriﬁce uncertaint y quantiﬁcation for improv ed p oint estimates and vice-versa. In real NWP op erational problems the state dimension may reac h O (10 7 − 10 9 ), whereas our pro of of concept lo oks at O (10 3 ). Still we highlight that the ov erhead of estimating α as well as the rest of θ and ϕ can b e minimal. Finally , this approac h could b e generalized/extended to more elab orate priors: e.g. Matern with Cauc hy noise [56] or the more general α -stable distributions [57], with Studen t’s t priors as in [50] or sparsity-promoting priors as in [16, 39]. Ac kno wledgemen ts B.S. was supp orted by the Additional F unding Programme for Mathematical Sciences, de- liv ered b y EPSR C (EP/V521917/1) and the Heilbronn Institute for Mathematical Research. Conﬂict of in terest The authors state that none of the work describ ed in this publication app ears to hav e b een inﬂuenced b y any known conﬂicting ﬁnancial motiv es or p ersonal ties. 17 References [1] S. Agapiou, J. M. Bardsley, O. P ap aspiliopoulos, and A. M. Stuar t , A nal- ysis of the Gibbs Sampler for Hier ar chic al Inverse Pr oblems , SIAM/ASA Journal on Uncertain ty Quantiﬁcation, 2 (2014), pp. 511–544. [2] S. A gra w al, H. Kim, D. Sanz-Alonso, and A. Strang , A V ariational Infer enc e Appr o ach to Inverse Pr oblems with Gamma Hyp erpriors , SIAM/ASA Journal on Un- certain ty Quantiﬁcation, 10 (2022), pp. 1533–1559. [3] D. Beletsky, D. J. Schw ab, P. J. Roebber, M. J. McCormick, G. S. Miller, and J. H. Sa ylor , Mo deling wind-driven cir culation during the Mar ch 1998 se diment r esusp ension event in L ake Michigan , Journal of Geoph ysical Researc h (Oceans), 108 (2003), p. 3038. [4] A. F. Bennett , Inverse Mo deling of the Oc e an and Atmospher e , Cambridge Univ ersity Press, Cam bridge, 2002. [5] M. Bocquet, C. A. Pires, and L. Wu , Beyond Gaussian Statistic al Mo deling in Ge ophysic al Data Assimilation , Monthly W eather Review, 138 (2010), pp. 2997–3023. [6] A. Carrassi, M. Bocquet, J. Demaeyer, C. Grudzien, P. Raanes, and S. V an- nitsem , Data assimilation for chaotic dynamics , Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (V ol. IV), (2022), pp. 1–42. [7] V. Chen, M. M. Dunlop, O. P ap aspiliopoulos, and A. M. Stuar t , Dimension- R obust MCMC in Bayesian Inverse Pr oblems , Mar. 2019. [8] S. L. Cotter, M. D ashti, J. C. R obinson, and A. M. Stuar t , Bayesian inverse pr oblems for functions and applic ations to ﬂuid me chanics , In verse Problems, 25 (2009), p. 115008. [9] S. L. Cotter, M. D ashti, and A. M. Stuar t , Appr oximation of Bayesian Inverse Pr oblems for PDEs , SIAM Journal on Numerical Analysis, 48 (2010), pp. 322–345. [10] S. L. Cotter, G. Rober ts, A. Stuar t, and D. White , MCMC Metho ds for F unc- tions: Mo difying Old Algorithms to Make Them F aster , Statistical Science, 28 (2012). [11] J. Coullon and Y. Pokern , Markov chain Monte Carlo for a hyp erb olic Bayesian inverse pr oblem in tr aﬃc ﬂow mo deling , Data-Centric Engineering, 3 (2022). [12] S. Cox and P. Ma tthews , Exp onential Time Diﬀer encing for Stiﬀ Systems , Journal of Computational Physics, 176 (2002), pp. 430–455. [13] G. D a Pra to and J. Zabczyk , Sto chastic e quations in inﬁnite dimensions , no. 152 in Encyclop edia of mathematics and its applications, Cambridge Univ ersity Press, Cam- bridge, United Kingdom, second edition ed., 2014. [14] J. A. Duan, A. E. Gelf and, and C. F. Sirmans , Mo deling sp ac e-time data using sto chastic diﬀer ential e quations , Bay esian Analysis, 4 (2009). [15] M. M. Dunlop, T. Helin, and A. M. Stuar t , Hyp erp ar ameter Estimation in Bayesian MAP Estimation: Par ameterizations and Consistency , The SMAI Journal of computational mathematics, 6 (2020), pp. 69–100. 18 [16] M. Emzir, S. Lasanen, Z. Purisha, L. Roininen, and S. S ¨ arkk ¨ a , Non-stationary multi-layer e d Gaussian priors for Bayesian inversion , Inv erse Problems, 37 (2020), p. 15002. [17] H. W. Engl, M. Hanke, and A. Neubauer , R e gularization of inverse pr oblems , v ol. 375, Kluw er Academic Publishers Group, Dordrech t, mathematics and its applica- tions ed., 1996. [18] S. Fr ¨ uhwir th-Schna tter , Data Augmentation and Dynamic Line ar Mo dels , Journal of Time Series Analysis, 15 (1994), pp. 183–202. [19] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Veht ari, and D. B. R ubin , Bayesian Data A nalysis , Chapman and Hall/CR C, New Y ork, 3 ed., July 2015. [20] M. Ghil and P. Malanotte-Rizzoli , Data Assimilation in Mete or olo gy and Oc e ano gr aphy , Adv ances in geoph ysics, 33 (1991), pp. 141–266. [21] V. Giraul t and P.-A. Ra viar t , Finite Element Metho ds for Navier-Stokes Equa- tions , vol. 5 of Springer Series in Computational Mathematics, Springer, Berlin, Heidel- b erg, 1986. [22] M. Hairer, A. M. Stuar t, and S. J. Vollmer , Sp e ctr al gaps for a Metr op olis- Hastings algorithm in inﬁnite dimensions , The Annals of Applied Probability , 24 (2014). [23] T. Hein , On Tikhonov r e gularization in Banach sp ac es – optimal c onver genc e r ates r esults , Applicable Analysis, 88 (2009), pp. 653–667. [24] T. Hey and A. Trefethen , The Data Deluge: An e-Scienc e Persp e ctive , in Grid Computing: Making the global infrastructure a reality , Wiley Online Library , 1 ed., Mar. 2003, pp. 809–824. [25] J. P. Kaipio and E. Somersalo , Statistic al and Computational Inverse Pr oblems , v ol. 160 of Applied Mathematical Sciences, Springer, New Y ork, NY, 2005. [26] N. Kant as, A. Beskos, and A. Jasra , Se quential Monte Carlo Metho ds for High-Dimensional Inverse Pr oblems: A Case Study for the Navier–Stokes Equations , SIAM/ASA Journal on Uncertaint y Quan tiﬁcation, 2 (2014), pp. 464–489. [27] H. Kim, D. Sanz-Alonso, and A. Strang , Hier ar chic al ensemble Kalman metho ds with sp arsity-pr omoting gener alize d gamma hyp erpriors , F oundations of Data Science, 5 (2023), pp. 366–388. [28] B. T. Knapik, B. T. Szab ´ o, A. W. v an der V aar t, and J. H. v an Zanten , Bayes pr o c e dur es for adaptive infer enc e in inverse pr oblems for the white noise mo del , Probabilit y Theory and Related Fields, 164 (2016), pp. 771–813. [29] B. T. Knapik, A. W. v. d. V aar t, and J. H. v. Zanten , Bayesian inverse pr oblems with Gaussian priors , Annals of Statistics, (2012). [30] D. Konen and R. Nickl , Data assimilation with the 2D Navier-Stokes e quations: Optimal Gaussian asymptotics for the p osterior me asur e , July 2025. [31] M. Lassas and S. Sil t anen , Can one use total variation prior for e dge-pr eserving Bayesian inversion? , In verse Problems, 20 (2004), p. 1537. 19 [32] K. J. H. La w , Pr op osals which sp e e d-up function-sp ac e MCMC , Journal of Computa- tional and Applied Mathematics, 262 (2014), pp. 127–138. [33] F. Lindgren, H. R ue, and J. Lindstr ¨ om , An explicit link b etwe en Gaussian ﬁelds and Gaussian Markov r andom ﬁelds: the sto chastic p artial diﬀer ential e quation ap- pr o ach , Journal of the Ro yal Statistical So ciet y: Series B (Statistical Metho dology), 73 (2011), pp. 423–498. [34] D. Luengo, L. Mar tino, M. Bugallo, V. El vira, and S. S ¨ arkk ¨ a , A survey of Monte Carlo metho ds for p ar ameter estimation , EURASIP Journal on Adv ances in Signal Pro cessing, 2020 (2020), p. 25. [35] A. J. Majda and J. Harlim , Filtering Complex T urbulent Systems , Cambridge Uni- v ersity Press, Cambridge, 2012. [36] A. Malmber g, A. Arellano, D. P. Ed w ards, N. Fl yer, D. Nychka, and C. Wikle , Interp olating ﬁelds of c arb on monoxide data using a hybrid statistic al- physic al mo del , The Annals of Applied Statistics, 2 (2008), pp. 1231–1248. [37] R. Molina, A. Ka tsa ggelos, and J. Ma teos , Bayesian and r e gularization meth- o ds for hyp erp ar ameter estimation in image r estor ation , IEEE T ransactions on Image Pro cessing, 8 (1999), pp. 231–246. [38] F. Monard, R. Nickl, and G. P. P a ternain , Statistic al Guar ante es for Bayesian Unc ertainty Quantiﬁc ation in Nonline ar Inverse Pr oblems with Gaussian Pr o c ess Pri- ors , The Annals of Statistics, 49 (2021), pp. 3255–3298. [39] K. Monterrubio-G ´ omez, L. R oininen, S. W ade, T. D amoulas, and M. Giro- lami , Posterior infer enc e for sp arse hier ar chic al non-stationary mo dels , Computational Statistics & Data Analysis, 148 (2020), p. 106954. [40] R. M. Neal , Monte Carlo Implementation of Gaussian Pr o c ess Mo dels for Bayesian R e gr ession and Classiﬁc ation , Jan. 1997. arXiv:ph ysics/9701026. [41] N. H. Nelsen, H. O whadi, A. M. Stuar t, X. Y ang, and Z. Zou , Bilevel optimiza- tion for le arning hyp erp ar ameters: Applic ation to solving PDEs and inverse pr oblems with Gaussian pr o c esses , Oct. 2025. [42] R. Nickl and E. S. Titi , On p osterior c onsistency of data assimilation with Gaussian pr o c ess priors: The 2D-Navier–Stokes e quations , The Annals of Statistics, 52 (2024). [43] O. P ap aspiliopoulos and G. Rober ts , Non-Center e d Par ameterisations for Hier- ar chic al Mo dels and Data Augmentation , Bay esian Statistics, 7 (2003), pp. 307–326. [44] O. P ap aspiliopoulos, G. O. R ober ts, and M. Sk ¨ old , A Gener al F r amework for the Par ametrization of Hier ar chic al Mo dels , Statistical Science, 22 (2007), pp. 59–73. [45] F. Pons Llopis, N. Kant as, A. Beskos, and A. Jasra , Particle Filtering for Sto chastic Navier–Stokes Signal Observe d with Line ar A dditive Noise , SIAM Journal on Scien tiﬁc Computing, 40 (2018), pp. A1544–A1565. [46] S. Reich and C. Cotter , Pr ob abilistic F or e c asting and Bayesian Data Assimilation , Cam bridge Universit y Press, Cambridge, 2015. 20 [47] J. C. R obinson , Inﬁnite-Dimensional Dynamic al Systems: A n Intr o duction to Dissi- p ative Par ab olic PDEs and the The ory of Glob al A ttr actors. Cambridge T exts in Applie d Mathematics , Applied Mechanics Reviews, 56 (2003), pp. B54–B55. [48] L. Roininen, M. Girolami, S. Lasanen, and M. Markkanen , Hyp erpriors for Mat ´ ern ﬁelds with applic ations in Bayesian inversion , In verse Problems and Imaging, 13 (2018), pp. 1–29. [49] D. Sanz-Alonso and N. W aniorek , Hier ar chic al Bayesian inverse pr oblems: A high- dimensional statistics viewp oint , SIAM Review, 67 (2025), pp. 543–575. [50] A. Senchuko v a, F. Uribe, and L. Roininen , Bayesian inversion with Student’st priors b ase d on Gaussian sc ale mixtur es , In verse Problems, 40 (2024), p. 105013. [51] F. Sigrist, H. R. K ¨ unsch, and W. A. St ahel , sp ate: an R Package for Statistic al Mo deling with SPDE Base d Sp atio-T emp or al Gaussian Pr o c esses . [52] F. Sigrist, H. R. K ¨ unsch, and W. A. St ahel , Sto chastic Partial Diﬀer ential Equa- tion Base d Mo del ling of L ar ge Sp ac e–Time Data Sets , Journal of the Ro yal Statistical So ciet y Series B: Statistical Metho dology , 77 (2015), pp. 3–33. [53] M. L. Stein , Sp atial V ariation of T otal Column Ozone on a Glob al Sc ale , The Annals of Applied Statistics, 1 (2007), pp. 191–210. [54] J. R. Stroud, M. L. Stein, B. M. Lesht, D. J. Schw ab, and D. Beletsky , An Ensemble Kalman Filter and Smo other for Satel lite Data Assimilation , Journal of the American Statistical Asso ciation, 105 (2010), pp. 978–990. [55] A. M. Stuar t , Inverse pr oblems: A Bayesian p ersp e ctive , Acta Numerica, 19 (2010), pp. 451–559. [56] J. Suuronen, N. K. Chada, and L. Roininen , Cauchy Markov r andom ﬁeld priors for Bayesian inversion , Statistics and Computing, 32 (2022), p. 33. [57] J. Suur onen, T. Soto, N. K. Chada, and L. R oininen , Bayesian inversion with α -stable priors , Inv erse Problems, 39 (2023), p. 105007. [58] A. L. Teckentr up , Conver genc e of Gaussian Pr o c ess R e gr ession with Estimate d Hyp er-Par ameters and Applic ations in Bayesian Inverse Pr oblems , SIAM/ASA Journal on Uncertain ty Quantiﬁcation, 8 (2020), pp. 1310–1337. [59] L. Tierney , Markov Chains for Exploring Posterior Distributions , The Annals of Statistics, 22 (1994), pp. 1701–1728. [60] A. F. Vidal, V. D. Bor toli, M. Pereyra, and A. Durmus , Maximum Likeliho o d Estimation of R e gularization Par ameters in High-Dimensional Inverse Pr oblems: An Empiric al Bayesian Appr o ach Part I: Metho dolo gy and Exp eriments , SIAM Journal on Imaging Sciences, (2020). [61] P. Whittle , On Stationary Pr o c esses in the Plane , Biometrik a, 41 (1954), pp. 434–449. [62] C. K. Wikle, R. F. Milliff, D. Nychka, and L. M. Berliner , Sp atiotemp or al Hier ar chic al Bayesian Mo deling T r opic al Oc e an Surfac e Winds , Journal of the American Statistical Asso ciation, 96 (2001), pp. 382–397. 21 [63] K. Xu and C. K. Wikle , Estimation of p ar ameterize d sp atio-temp or al dynamic mo dels , Journal of Statistical Planning and Inference, 137 (2007). [64] Y. Yu and X.-L. Meng , T o Center or Not to Center: That Is Not the Question—An A ncil larity–Suﬃciency Interwe aving Str ate gy (ASIS) for Bo osting MCMC Eﬃciency , Journal of Computational and Graphical Statistics, 20 (2011), pp. 531–570. [65] Y. Zheng and B. H. Aukema , Hier ar chic al dynamic mo deling of outbr e aks of moun- tain pine b e etle using p artial diﬀer ential e quations , En vironmetrics, 21 (2010), pp. 801– 816. 22 A The Na vier Stok es equation W e introduce here the Navier-Stok es equation for incompressible ﬂo w. Let v ( x, t ) denote the velocity vector ﬁeld at p oin t x and time t . The domain w e consider is a 2D T orus T = [0 , 2 π ] 2 with initial ﬂow v 0 ∈ T and let v : T × [0 , ∞ ) → R 2 , v ( x, t ) = ( v 1 ( x, t ) , v 2 ( x, t )) T . The dynamics for v are as follo ws: ∂ t v − η ∆ v + ( v · ∇ ) v = f − ∇ p, v ( x, 0) = v 0 ( x ) (11) ∇ · v = 0 , Z T v j ( x, · ) dx = 0 , j = 1 , 2 , (12) where η > 0 is the viscosity parameter, p : T × [0 , ∞ ) → R is the pressure function, and f : T → R 2 is an exogenous time-homogeneous forcing. F or simplicity we consider the mean ﬂo w to b e 0 and assume p erio dic b oundary conditions: v j ( · , 0 , t ) = v j ( · , 2 π , t ) , v j (0 , · , t ) = v j (2 π , · , t ) , j = 1 , 2 ∀ t ≥ 0. Here v 0 is the initial velocity ﬁeld to b e estimated. W e no w deﬁne its domain as the following Hilb ert space V with closure V with resp ect to L 2 ( T ) 2 norm: V :=  2 π -p eriodic trigonometric p olynomials u : T → R 2 | ∇ · u = 0 , Z T u ( x ) dx = 0  . Let P : ( L 2 ( T )) 2 → V denote the Leray–Helmholtz orthogonal pro jector. P pro jects f on to the space of incompressible functions. In our con text this means the pressure term in (11) is remov ed. W e obtain the equation as in (3), rep eated b elo w for conv enience: dv dt + η Av + B ( v , v ) = P ( f ) , v (0) = v 0 , (13) where B ( v , ˜ v ) = 1 2 P (( v · ∇ ) ˜ v ) + 1 2 P (( ˜ v · ∇ ) v ) and A = − P ∆ is the Stokes op erator. If this set of equations is pro jected on a ﬁnite dimensional subspace, then it is an ODE. B Conjugate prior deriv ation In this section we explicitly show the conjugate relationship of the parameter β 2 , in Case study 1 used in section 3.3, exploiting the conjugate prior with Inv erse Gamma distribution for the Gaussian likelihoo d. A similar conjugate relationship holds for σ 2 in case 2. π ( β 2 | y , α, v 0 ) ∝ √ 2 √ 2 π d | β | d | A − α | 1 2 exp( − 1 2 β 2 v 0 A α v 0 ) · b a Γ( a ) ( β 2 ) − a − 1 exp  − b β 2  ∝ b a Γ( a ) ( β 2 ) − a − 1 ( β 2 ) − d 2 exp  − b β 2 − 1 2 β 2 v 0 A α v 0  ∝ b a Γ( a ) ( β 2 ) − ( a + d 2 ) − 1 exp − b + 1 2 v 0 A α v 0 β 2 ! ∝ ( b + 1 2 v 0 A α v 0 ) a + d 2 Γ( a + d 2 ) ( β 2 ) − ( a + d 2 ) − 1 exp − b + 1 2 v 0 A α v 0 β 2 ! 23 C Case study 1: diagnostics and run times C.1 Diagnostics (a) Acceptance ratio (b) V elo cit y trace plot (c) Auto correlation plot Figure 7: pCN diagnostics: the cumulativ e a verage (red) of the acceptance ratio (left) is around 0 . 23 as desired. The acceptance ratio (blue) do es not alternate b et ween 0 and 1 despite the blue background, it is div erse as desired. The trace plot (middle) of the real part of the v elo cit y in the ﬁrst dimension at the p oin t [2,0] of our grid shows the MCMC estimate (blue) oscillates around the true v alue (green), with its cumulativ e av erage (red) con verging to a v alue close to the truth. The auto correlation plot of this quantit y shows that the MCMC chain decorrelates after 1 . 5 · 10 5 iterations. Fig. 7 diagnostics for pCN show that N = 8 · 10 5 iterations are suﬃcien t for the MCMC to pro vide goo d estimates. In Fig. 7a we can observ e that the acceptance ratio is stable after the burn-in p erio d of 8 · 10 4 and b et ween the desired v alues of 0.2 and 0.3. F o cusing on the real v alue of a single F ourier coeﬃcient, in Fig. 7b the MCMC estimate con verges to − 0 . 54 very close to the true v alue − 0 . 56. Additionally its auto correlation plot in Fig. 7c suggests that despite high decorrelation lags, the high num ber of MCMC iterations will lead to a reasonable eﬀective sample size. F or MwG, we increase the n umber of MCMC mo ves to the traditional MCMC algorithm to take into accoun t the Gibbs sampler’s mov es on the h yp erparameters: N = 2 . 4 · 10 6 for p v = p β 2 = p α = 1 3 . T o fully observe the b eha viour of the parameter estimation, we use longer runs: N = 6 · 10 6 . This is solely for our study and w ould not b e needed in practice. W e found the optimal step-size for α to b e 0 . 96 for b oth regimes. T race plots of the v elo cit y estimates are sho wn in Fig. 8. 24 Figure 8: MwG diagnostics: the cum ulativ e av erage of the acceptance ratio of the initial condition and of α is around 0 . 23 as desired. The acceptance ratio is diverse as desired. The trace plot of the real part of the velocity in the ﬁrst dimension at v arious p oin ts shows each MCMC estimate (blue) captures the true v alue (green) in its distribution, with its cum ulative a verage (red) conv erging to a v alue mostly close to the truth. 25 Run times Algorithm Regime Mo del Conﬁguration Execution time PDE coun t pCN Stationary N-S N = 8 · 10 5 25 hours 8 · 10 7 MwG Stationary N-S N = 6 · 10 6 85 hours 2 · 10 8 pCN Chaotic N-S N = 8 · 10 5 12.5 hours 4 · 10 6 MwG Chaotic N-S N = 6 · 10 6 49 hours 5 · 10 6 MwG N/A Sto c h. AD α := 2 11 hours N/A MwG N/A Sto c h. AD U (1 , 4) 12.5 hours N/A T able 1: Run times of the MwG algorithm estimating θ , ϕ and of the pCN algorithm kno wing θ , ϕ . The MwG with p u = p β 2 = p α = 1 / 3 is sligh tly slow er than pCN due to additional mem- ory requiremen ts for storing h yp erparameter samples. T aking this into accoun t and lo oking at the n umber of times the PDE solver was ran, the run times highligh t that estimating θ do es not signiﬁcan tly increase the run times ov er the known θ case. The computational run times are dominated b y the runs of the PDE solver when ev aluating the likelihoo d, which w as conﬁrmed when proﬁling the co de. C.2 Chaotic regime plots Figure 9: T race (left) and densit y (righ t) plots of prior parameters. The trace plots show the MCMC estimates (blue) captures the true v alue (green) in its distribution, with its cum ulative av erage (red) conv erging to a v alue close to the truth. Similarly the densit y plots display the p osterior’s estimate (blue) p eaking around the true v alue (red) of b oth parameters. The prior corresp onds to the black dashed line. F or b oth parameters, the true v alue lies in the supp ort of the densities of the estimates. F or β , the challenging hea vy-tailed Gamma prior is shrunk around the true v alue to form the p osterior. 26 Figure 10: V orticit y of the initial condition v 0 with the truth (far left) and MCMC estimates: kno wing the true α = 2 . 2 v alue (centre left), underestimating α = 1 . 5 (centre), ov erestimat- ing α = 3 (cen tre right) and estimating α using MwG (far right). The crosses indicate the p ositions where the v ector ﬁeld is observed. W e observe similar estimates from the algo- rithms kno wing the correct smo othness and estimating it, alb eit giving a smo oth estimate. 27 D Case Study 2: trace plots (a) α = 2 (b) Estimated α (c) α = 1 . 5 (d) α = 2 . 5 Figure 11: T race plots of the parameters estimated when the algorithm knows the true α = 2 v alue (top left), underestimates α = 1 . 5 (b ottom left), ov erestimates α = 2 . 5 (b ottom righ t) and estimates α (top right). The dashed line indicates the burn-in p eriod. Assuming the wrong v alue of α leads to very p o or parameter estimation with σ 2 , ζ , ρ 1 , γ , µ x , µ y , τ in non-represen tative v alues. Samples for µ x , µ y are still distributed according to their prior indicating no information was gained from the data for µ . How ev er estimating α and kno wing the correct α allows for similar accurate parameter estimation results, as the samples are cen tered around or capture w ell the true v alue. 28

Smoothness and other hyperparameter estimation for inverse problems related to data assimilation

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment