Appraisal of data-driven and mechanistic emulators of nonlinear hydrodynamic urban drainage simulators

Appraisal of data-driven and mechanistic emulators of nonlinear hydrodynamic urban drainage simulators J uan P ablo Carbajal 1 , J oão P aulo Leitão 1 , Carlo Albert 1 , and Jörg Rieckermann 1 1 Swiss F ederal Institute of Aquatic Science and T echnology , Eaw ag , Überlandstrasse 133, 8600 Dübendorf , Switzerland F ebruary 6, 2017 Abstract Many model based scientiﬁc and engineering methodologies, such as sys- tem identiﬁcation, sensitivity analysis, optimization and control, require a large number of model evaluations. In particular , model based real-time con- trol of urban water infrastructures and online ﬂood alarm systems require fast prediction of the network response at different actuation and/or param- eter values . General purpose urban drainage simulators are too slow for this application. F ast surrogate models, so-called emulators, provide a solution to this efﬁciency demand. Emulators are attractive, because they sacriﬁce unneeded accuracy in favor of speed. However , they have to be ﬁne-tuned to predict the system behavior satisfactorily . Also, some emulators fail to ex- trapolate the system behavior beyond the training set. Although, there are many strategies for developing emulators , up until now the selection of the emulation strategy remains subjective. In this paper , we therefore compare the performance of two families of emulators for open channel ﬂows in the context of urban drainage simulators . W e compare emulators that explic- itly use knowledge of the simulator’s equations, i.e. mechanistic emulators based on Gaussian Processes, with purely data-driven emulators using ma- trix factorization . Our results suggest that in many urban applications, naive data-driven emulation outperforms mechanistic emulation. Nevertheless, we discuss scenarios in whic h we think that mechanistic emulation might be fa- vorable for i) extrapolation in time and ii) dealing with sparse and unevenly sampled data. W e also provide many references to advances in the ﬁeld of Machine Learning that have not yet permeated into the Bayesian environ- mental science community . 1 Introduction F or many real-world systems with a nonlinear response, model based tasks such as sensitivity analysis, learning model parameters from data (i.e. sys- tem identiﬁcation or model calibration), and real-time control, are hampered by the long runtime of the employed numerical simulators. Even if runtimes 1 are short, these methods require a large number of model runs, which can take a prohibitive long time. One way of speeding up these tasks is to build fast surrogate models , so called emulators , to replace the computationally ex- pensive simulators. An emulator is a numerical model that is tailored to ap- proximate the results of a computationally expensive simulator with a huge reduction in the time needed to run a simulation [ O’Hagan , 2006 ], i.e. it is a metamodel. These ideas also belong to the technique of Reduced-Order Models (ROM), specially for models based on P artial Differential Equations (PDE) [ Baur et al. , 2014 , Quarteroni et al. , 2016 ], and emulation as described below . T o ground ideas, imagine that the ﬂow at the outlet of a drainage network is limited using a ﬂow limiting gate or by activating water storage systems (Fig . 1 ). The position of the gate and the activation of storage is controlled us- ing a model predictive controller [ Xi et al. , 2013 ]. Such a scenario is relevant in performance optimization of w ater treatment plants [ Fu et al. , 2008 ]. The signals used to control the ﬂows could be the current intensity and duration of rain events from several rain gauges within the catchment. The controller needs to estimate an optimal course of action by predicting the ﬂows induced by the rain and many possible actuations. This optimization generally re- quires thousands of model runs, which can take a prohibitive long time when running a physically detailed simulator of the sewer network, such as a EP A Storm W ater Management Model (SWMM) model [ Rossman , 2010 ]. How- ever , the simulator is just used to estimate the relation between the rain, the actuation, and the ﬂow . The full details of the simulator might not be required to obtain an accurate estimation of this relation. Feedbac k control might further reduce the required accuracy of the estimated relation. Emulation and interpolation are equivalent problems . An emulator is built using the best available simulator to sample the space of actuations and/or parameters (henceforth the latter will include actuations). The train- ing data is then used to build an interpolation function which should predict values at unseen parameters with an acceptable degree of accuracy , which is case dependent. That is, we reconstruct an unknown function F : R | γ |+ 1 → R , that takes a parameter vector of size | γ | and a time instant, and generates the value of the magnitude of interest. When this function is evaluated at the inputs used for training, the results are the same as the training outputs 1 , i.e. this is the meaning of interpolation of the training data adopted herein. Stated in this way , no distinction is needed between the parameters and the time components in the input. However , knowing that the data is generated by a dynamical system, we separate time from the other parameter compo- nents. Thus, we can ﬁnd one interpolant in time and one in parameter space, which might be coupled to each other . This parameter -time coupling emerges naturally in mechanistic emulation, as will be shown in Sec. 2.2 . When the simulator is based on differential equations , the link between Gaussian Processes (GP) and linear stochastic differential equations (SDE) [ Pog- gio and Girosi , 1990 , Steinke and Schölkopf , 2008 , Albert , 2012 , Särkkä et al. , 2013 , González et al. , 2014 , Solin , 2014 ], permits the creation of GP based emulators that include knowledge about the simulator dynamics; these are called mechanistic emulators (MEMs). Conceptually , mechanistic emulation seeks a function that interpolates the training data whithin a class of func- tions deﬁned by an SDE. The importance of GPs for MEMs stems from the fact that they are the formal solution of this SDE. Hence when the simu- lator is linear the emulator gives exact results; while for nonlinear simula- 1 Herein considered noiseless since they are generated by a deterministic simulator . 2 Figure 1: An imaginary drainage network with ﬂow limiting gate and/or water storage systems. The interesting singal is the level/ﬂow at the outlet of the network, marked with a circle. Isometric tiles from www.kenney.nl . 3 tors, the MEM will provide only an approximation. Increasing the accuracy of this approximation and the efﬁciency of the methods are two fundamental challenges in GP based emulation [ O’Hagan , 2006 , Rasmussen and Williams , 2006 , Ch. 8]. Reichert et al. [ 2011 ] enumerated four overlapping approaches for devel- oping emulators of dynamic simulators: i) Gaussian Processes ii) Basis function decompositions iii) State space transition function approximation iv) Stochastic linear model conditioned on data using Kalman smoothing . In particular approaches i) and iv) are two different implementations of the same problem [ Steinke and Schölkopf , 2008 ]. Roughly speaking the Kalman smoothing algorithm used in iv) is an iterative implementation of the con- ditioning of the GP in i) . The iteration in iv) avoids the ill-conditioned co- variance matrices [ Hansen , 1998 ] involved in GP when sampling rates are high [ Steinke and Schölkopf , 2008 , Reichert et al. , 2011 ] and it is faster than direct matrix inversion in a serial implementation. The GP approach i) is bet- ter suited for parallelization, speedups and energy saving via approximated computing [ Angerer et al. , 2015 ]. Approaches i) (or iv) ) and ii) are similar with respect to their implemen- tation. That is, approach ii) can be implemented using GP regression [ Ras- mussen and Williams , 2006 , sec. 2.7]. Therefore, the essential difference between i) and ii) is that the former explicitly introduces mechanistic knowl- edge. It will be shown here that approach i) is currently constrained to linear mechanistic knowledge, while popular methods based on maximum entropy [ Victor and J ohannesma , 1986 , Christakos , 1998 , Harte , 2011 ] can handle nonlinear knowledge. The difference between GP based mechanistic emulation and maximum entropy methods is that in the latter , the mecha- nistic knowledge is added as constraints on the moments of the predictive or posterior distribution, while in the former its is added as the dynamics of the prior model. Adding constraints to predictive distributions require expert knowledge available at the level of the emerging behavior of the simulator , while adding dynamic information requires knowledge about the constitutive elements parts of the simulator . The latter is likely to be readily available from the development of the simulator itself . Herein we compare the performance of GP emulators built using approaches i) , which we call mechanistic emulation , and ii) which we call data-driven emu- lation . The basis function that will be used for data-driven emulation will be derived solely from the data using matrix factorization, i.e. they will not ex- plicitly include mechanistic knowledge. In this article we use singular value decomposition (SVD) and nonnegative matrix factorization (NMF) to extract these bases (Sec . 2.1 ), but more general basis extraction methods like Proper Orthogonal Decomposition (POD) could be used [ Hesthaven et al. , 2016 ]. W e compare results in an academic emulation problem to highlight the differ- ences between the approaches (Sec. 3.1 - 3.2 ). W e also provide emulation ex- amples pertinent to the ﬁelds of hydrology and urban water management (Sec. 3.3 - 3.4 ). In all of these , da ta-driven emulation outperforms mechanistic emulation. The objective of this comparison is to provide intuition about the suitability of each approach, which is not available to date to the best of our knowledge, to highlight the need of enhancements of our emulators, and to motivate research questions in the ﬁeld of emulation. This is relevant, be- cause, as described above, many applications in the ﬁeld of urban drainage 4 and ﬂood predictions to date are hampered by slow models . T o a lesser de- gree, this is one of the few NMF applications in hydrology [ Alexandrov and V esselinov , 2014 ]. 2 Methods and Materials In the subsequent sections we ﬁrstly describe the two emulation approaches used herein (Sec. 2.1 - 2.2 ). Aiming at a wide readership, mathematical de- tail and rigor are kept at a minimum required level. References are provided for the interested reader . Secondly , we describe the datasets used for training and testing the emulators (Sec . 2.3 ). These include models of two small catc h- ments in Switzerland, that we use to thoroughly evaluate the performance of the emulators. W e use the word data to refer to the input and output pairs provided by the simulator being emulated, e.g. in the case of a hydrodynamic simula- tor , inputs could be time and physical parameters of a sewer network, and outputs water levels or ﬂows . 2.1 Data-driven emulators In this approach we make the following assumptions about the simulation data used to build the emulator: a) The data contains the most signiﬁcant dynamic features of the system response and these can be used as a time varying basis to reproduce the data. b) Unseen system responses are well approximated by a linear combina- tion of the features in a) , i.e. there exist a solution manifold that can be well approximated with low dimensional sets [ Quarteroni et al. , 2016 , Hesthaven et al. , 2016 ]. c) There exists a "smooth" mapping between inputs and the coefﬁcients of the linear combinations of features. With these assumptions in mind we deﬁne the approximation strategy: y ( t , γ ) ' q X i = 1 β i ( γ ) φ i ( t ), (1) β i ( γ ) ∼ G P ¡ m i ( γ ), K i ( γ , γ 0 ) ¢ (2) where y ( t , γ ) is the output of the simulator at time t and parameters γ , β i is a mapping between these parameters and the components of the output in the basis function set © φ i ( t ) ª . Following Rasmussen and Williams [ 2006 , Sec . 2.7] this could be generalized to a full GP regression problem. However , as stated here the problem is simpler and it is justiﬁed by the performance it provides in the examples showcased in Sec. 3 . The procedure to build a data-driven emulator follows: i. Extract the ﬁrst q ≤ n trn features © φ i ( t ) ª using the training set © y i ( t , γ i ) ª n trn i = 1 . This gives a q × n trn matrix B of coefﬁcients ( q coefﬁcients for each sim- ulation used for training) and the basis evaluated at the observed time points Φ i j = φ j ( t i ), i = 1, . . . , N , j = 1, . . . , q . ii. Interpolate the B coefﬁcients from step i. using a GP to obtain a func- tion of the inputs B = f ( γ ). 5 iii. Evaluate f ( γ ) on the parameters in test set, use eq. ( 1 ) to predict out- puts, and compare them with the output test set © y i ( t , γ i ) ª n tst i = 1 . The features © φ i ( t ) ª are extracted from the training data itself via matrix factorization, which allows us to impose some general constraints on the fea- tures, e.g. nonnegativity . Herein we use the singular value decomposition (SVD) and nonnegative matrix factorization (NMF), which are explained in later pragraphs. These methods provide the basis evaluated only at the observed time points, Φ , and to predict at unobserved times we linearly interpolate them over time. The implications of this interpolation will be discussed in Sec . 4 . This approach decouples the interpolation in time with the interpolation in parameter space. SVD is a robust factorization to calculate the eigenvectors of the covari- ance of the data matrix, i.e. the principal components. Given the matrix Y ∈ R N × n , SVD calculates the orthogonal matrices Φ ∈ R N × n and W ∈ R n × n , and the element-wise nonnegative diagonal matrix Σ ∈ R n × n + with only Σ i i 6= 0. These matrices fulﬁll the relation Y = ΦΣ W > . (3) Using this factorization to calculate the covariance of the data matrix gives Y Y > = ΦΣΣ > Φ > . (4) Showing that Φ are eigenvectors of the covariance matrix, i.e the principal components of the data. A decomposition of Y in the form of eq. ( 1 ) is obtained by deﬁning B as the follows: B = Σ W > , (5) Y = Φ B . (6) The quality of approximation of the data degrades graciously with decreas- ing number of principal components . Hence, only the ﬁrst q < n trn principal components are used in general. This reduces the size of the representation of the training data. In the context of ROM and PDE, this decomposion is also know as Proper Orthogonal Decomposition (POD). NMF provides an approximate minimum norm positive decomposition of the data [ Kim and P ark , 2008 ] 2 . F ormally it solves the following problem: Problem 1 (NMF) . Given the matrix Y ∈ R N × n and q ∈ N with q ≤ n , ﬁnd matrices Φ ∈ R N × q + and B ∈ R q × n + such that they minimize k Y − Φ B k 2 + a k Φ k 2 + b k B k 2 . (7) Where k · k is the F robenius matrix norm and R + is the set of nonnegative real numbers . Intuitively , NMF works as SVD but constraining the principal components and the mixing coefﬁcients to be nonnegative. The decomposition controls the norm of the basis and its coefﬁcients using two regularization terms parametrized with weights a and b , which are manually tuned. The non- negative basis provided by NMF might be readily interpreted in physical terms in hydrological applications, yet the method is not widespread in the community . 2 function nmf_bpas of GNU Octa ve’s linear-algebra package http://octave.sourceforge.net/ linear- algebra/ . 6 2.2 GP based mechanistic emulator (MEM) The predictive mean of a GP in x conditioned on data observed at x 0 , has the following structure y ( x ) = K ( x , x 0 ) α + m ( x ), (8) α = ¡ K ( x 0 , x 0 ) + κ I ¢ − 1 ¡ y ( x 0 ) − m ( x 0 ) ¢ . (9) where K and m stand for the covariance and the mean function of the prior GP , respectively . The weights vector α is learned from the data at the con- ditioning step, requiring the inversion of the matrix obtained by evaluating the covariance function at the observed inputs, shown in eq. ( 9 ). The regular - ization parameter κ encodes our trust on the prior knowledge and the error , if any , of the observations y ( x 0 ) 3 . A MEM is the GP in time and simulator parameters associated with an input-driven linear stochastic ordinary differential equation (SODE) with a multidimensional state space. Each component of the state space is deﬁned by an observed simulator’s parameter vector ( γ i ). Extra components are re- served for prediction at unseen simulator’s parameter vectors 4 . The process of building a MEM requires the deﬁnition of i. A linear prior model. ii. The covariance and mean functions of the GP . iii. A mapping from the parameters of the simulator to the parameters of the GP . In the following sections we obtain the mathematical expression of the covari- ance and mean functions of a ﬁrst order linear time invariant (L TI) SODE, and explain the construction of the mapping between parameter spaces. L TI systems often arise, for example, from a ﬁnite element modeling of partial differential equations. F or a rigorous and more general development, that include time varying parameters, see Albert [ 2012 ]. The development for periodic difference equa- tions was presented in Steinke and Schölkopf [ 2008 ] and extended to a more general setting in González et al. [ 2014 ]. A L TI SODE in the vector -valued function s ( t ) : R + → R M , (10) with initial condition s (0) = s 0 ∼ N ( ¯ s 0 , Σ 0 ) , is deﬁned by the equation d s d t ( t ) = A s ( t ) + u ( t ) + ξ ( t ), (11) where A ∈ R M × M , u ( t ) is a deterministic exogenous actuation, and ξ ( t ) ∼ N ¡ 0, Σ ξ ( t , t 0 ) ¢ is a Gaussian noise term with covariance function Σ ξ ( t , t 0 ) = Σ δ ( t − t 0 ) Σ ∈ R M × M . (12) The general solution of this equation is s ( t ) = e A t s 0 + Z t 0 e A ( t − τ ) ( u ( τ ) + ξ ( τ ) ) d τ . (13) 3 In the MEMs used herein, to obtain interpolation of the training data, we set the regularization parameter to the machine epsilon. 4 typically only one component is reserved for this, but more could be used in a parallel setting . 7 where the ﬁrst term is the solution of the homogenous ODE, i.e. with u ( t ) and ξ both zero. The second term is the response of the ODE to the noisy actuation. As mentioned above, this system depends on the simulator’s parameters. In general this means A ( γ ), Σ ξ ( γ , γ 0 ), u ( t , γ ), and potentially s 0 ( γ ). 2.2.1 Covariance function W e calculate the covariance function of the trajectories in eq. ( 13 ) using the formula K ( t , r ) = E h ( s ( t ) − E [ s ( t )] ) ( s ( r ) − E [ s ( r )] ) > i , (14) where s > indicates transposition and E is the ensemble average over initial conditions and noise realizations. The ensemble average of the solutions , E [ s ( t )], is solely determined by the average of the homogeneous part (depending only on the initial condition) and the deterministic actuation u , because the contribution of the noise term ξ vanishes due to its zero mean value: E [ s ( t )] = e A t ¯ s 0 + Z t 0 e A ( t − τ ) u ( τ )d τ . (15) That is, the mean function is the solution to the deterministic (noise-free) inhomogeneous ODE. From eqs. ( 13 ) and ( 15 ) we obtain the factors in the expectation in eq. ( 14 ), s ( t ) − E [ s ( t )] = e A t ( s − ¯ s 0 ) + Z t 0 e A ( t − τ ) ξ ( τ )d τ . (16) Inserting this in the expression for the covariance function, eq. ( 14 ), we ob- tain four terms. The ﬁrst term is the covariance of the initial conditions . The second term is a product of the integral of the noise term. The last two terms are products of the initial conditions and the noise term; these terms will vanish due to the independence of the initial condition and the noise term, and due to the zero mean of the latter . Finally , we obtain: K ( t , r ) = e A t Σ 0 e A > r + Z t 0 Z r 0 e A ( t − τ ) Σ δ ( τ − ρ ) e A > ( r − ρ ) d ρ d τ = e A t Σ 0 e A > r + Z min( t , r ) 0 e A ( t − µ ) Σ e A > ( r − µ ) d µ . (17) The second term can be recognized as the property of the covariance under a linear transformation: s = G v Σ s = G ( G Σ v ) > , (18) where the matrix product should be interpreted as the application of the integral ( A v )( t ) = Z ∞ 0 A ( t , µ ) v ( µ )d µ , (19) ( A B )( t , r ) = Z ∞ 0 A ( t , µ ) B ( µ , r )d µ . (20) In our case G stands for the Green’s function of the linear operator d d t − A , (21) 8 which is G( t , r ) = H ( t − r ) e A ( t − r ) , (22) with H the Heaviside or Step function. Its transpose (adjoint) is G † ( t , r ) = H ( r − t ) e A > ( r − t ) . (23) giving ³ G ¡ G Σ ξ ¢ † ´ ( t , r ) = Z ∞ 0 G( t , µ ) µ Z ∞ 0 G( µ , τ ) Σ δ ( τ − r )d τ ¶ † d µ = Z ∞ 0 G( t , µ ) Σ G † ( µ , r )d µ = Z ∞ 0 H ( t − µ ) H ( r − µ ) e A ( t − µ ) Σ e A > ( r − µ ) d µ = Z min( t , r ) 0 e A ( t − µ ) Σ e A > ( r − µ ) d µ (24) This implies that if the differential operator has a known Green’s func- tion, then the covariance function of the GP can be calculated by transform- ing the covariance function Σ of the noisy input ξ . A more general and formal treatment of this process is described in Kimeldorf and W ahba [ 1970 ] and the relation between differential operators and kernels is summarized in Steinke and Schölkopf [ 2008 ]. This covariance function computes the statistical in- teractions between the components of the L TI SODE. In other words, it pro- vides the coupling of components of what is know as multi-output GP in the machine learning community and cokriging in geostatistics [ Rasmussen and Williams , 2006 , Sec. 9.1]. In Fricker et al. [ 2013 ] different structures of the coupling between output components were studied, MEMs automatically de- rive the coupling structure from the available prior knowledge. 2.2.2 Linear prior T o determine the elements in the system matrix A in ( 11 ), we select a linear model for each output in the training data corresponding to a simulator’s parameter vector , which we call the linear proxy : L i : R + × R q → R d , (25) t , θ i 7→ z ( t , θ i ). (26) The q dimensional parameter vector θ i of the proxy depends on the simula- tor’s parameters used for the i th simulation, i.e. θ i ( γ i ). The proxy’s output dimension d is equal to dimension of each output in the training data, usu- ally d = 1. The concatenated set of proxies deﬁnes the linear prior of the MEM. Herein, the proxies will be m dimensional linear time invariant (L TI) ODEs, i.e . in state space form d s d t ( t ) = A ( θ ) s ( t ) + u ( t , θ ) A ∈ R m × m , u ∈ R m (27) z ( t ) = C s ( t ) C ∈ R d × m (28) Here all proxies have the same dimension m . Although this is not required by the method, it is the simplest structure of MEMs [ Albert , 2012 ]. Nev- ertheless, proxies with different dimensions might be better suited for dy- namical systems with bifurcations, e.g. training data containing a mixture of 9 oscillating and converging time series, due to the presence of a Supercritical Andronov-Hopf bifurcation [ Kuznetsov , 2006 ] in the simulator . The emulator’s linear prior is constructed by aggregating the proxies and by coupling them with a noise term with a covariance that depends on the simulator’s parameters , i.e d S d t ( t ) = A ( Θ ) S ( t ) + U ( t , Θ ) + ξ ( t , Γ ), (29) Z ( t ) = C ( Θ ) S ( t ), (30) A ∈ R m n × m n , U , ξ ∈ R m n × 1 C ∈ R d × mn . Where Θ = © θ i ª and Γ = © γ i ª contain the corresponding n parameter vectors. The matrix A is block diagonal, C is a horizontal concatenation, and U a vertical concatenation: A ( Θ ) =     A ( θ 1 ) . . . A ( θ n )     , U ( t , Θ ) =     u ( t , θ 1 ) . . . u ( t , θ n )     , C ( Θ ) = £ C ( θ 1 ) . . . C ( θ n ) ¤ . (31) 2.2.3 P arameter mapping T o evaluate the matrices in eq. ( 31 ) we need a mapping from simulator’s parameters γ to proxy’s parameters θ , , i.e. Γ 7→ Θ . It can be an ad-hoc function derived from knowledge about the simulator , as was done in [ Albert , 2012 , Machac et al. , 2016a , b ] or it can be learned directly from the data. The latter is especially useful if proxies with different dimensions are combined to form the linear prior of the emulator . A proxy’s parameter θ i can be learned from the data via optimization, e.g . least squares ﬁt of the data y i ( T , γ i ) using z ( T , θ i ). Alternatively , Θ can be left as hyperparameters in the GP and estimated by maximizing the likelihood. In any case, the obtained pairs ( γ i , θ i ) are then used to learn a mapping between the simulator’s and emulator’ s parameter spaces. 2.2.4 W arped MEM It is possible to add some nonlinear knowledge to a mechanistic emulator when the simulator’ s equations can be approximated with a Wiener model, i.e. a time independent nonlinear function applied to the states of a linear dynamical system. To do this we use W arped GPs [ Snelson et al. , 2003 ]. This modiﬁcation does not affect the structure of the emulator , only the data on which it is trained. In addition to the linear dynamical system parame- ters deﬁning the prior , the parameters of the nonlinear function need to be learned as well unless this mapping is explicitly given, e.g. from the lin- earization of a nonlinear ODE via a change of variables as in the Bernoulli ODE [ Hairer et al. , 1993 ]. 10 2.3 Datasets description Simulated data used to build emulators are provided in a set of scalar signals y i ( t j , γ i ) with i = 1, . . . , n , all of them sampled at the same time points, i.e. T = © t j ª , j = 1, . . . , N . F or all emulators reported here , the time series in the datasets were subsampled as much as possible, without deteriorating their relevant dynamic features, e.g . oscillations and/or peaks. In all cases, the datasets are randomly separated in a training set of size n trn and a test set of size n tst , with n trn + n tst = n . T able 1 summarizes the properties of the data used. Dataset Nonlinear DS I & II W artegg Adliswil # parameters, | γ | 1 2 8 Time samples, N (used/total) 6/40 52/2880 193/601 Training set, n trn 10 200 128 T est set, n tst 190 700 128 T able 1: Summary of datasets used herein. 2.3.1 Nonlinear dynamical system dataset I & II T o illustrate the virtues of MEMs, we ﬁrst consider a didactical example us- ing data generated by the model d x d t = a ( x 0 ) x + b ( x 0 ), x (0) = x 0 , (32) a ( x ) = − a 0 e − a 1 ( x − sign ( x ) ) 2 , (33) b ( x ) = b 0 tanh ( x ). (34) The parameters values are a 0 = 12.616, a 1 = 5, b 0 = 2. The time evolution of this contrived system is linear . The parameters deﬁning the evolution depend nonlinearly on the initial condition, i.e . the pa- rameters of the linear system are nonlinear functions of the initial condition. Therefore an emulator of this system takes time and the initial condition ( | γ | = 1) as inputs. The behavior of this system meets the above assumptions underlying a MEM with a linear time-invariant prior . Thus an emulator without coupling noise solves this system exactly . The ﬁrst dataset consists of simulated time series with 40 output obser - vations for 200 initial conditions x 0 ∈ [ − 1, 1]. The second dataset is built by mixing the trajectories of the dynamical system in eq. ( 32 ), speciﬁcally: ˆ x ( t , x 0 ) = Z α x ( t , x 0 + α ) N α ( x 0 , σ 2 )d α , (35) where σ = 0.5. Although the nonlinearities in the dynamical system still depend only on the initial condition, the smoothing couples neighboring tra- jectories. F or the mechanistic emulation we will use a 1 dimensional linear proxy: d s d t ( t ) = θ 1 ( s 0 ) s ( t ) + θ 2 ( s 0 ), (36) 11 and use the knowledge from the ﬁrst simulator to set s 0 = x 0 , θ 1 ( s 0 ) = a ( s 0 ), and θ 2 ( s 0 ) = b ( s 0 ). Since each proxy is the solution of the system in eq. ( 32 ), no coupling of the components of the MEM is needed in the ﬁrst case , i.e. Σ = I . F or the second simulator we will couple the components using a 1 dimensional Matérn covariance function and the parameters mapping will be learned from the data. 2.3.2 W artegg catchment dataset This dataset was generated from a SWMM model of a 2.64 km 2 urban catch- ment located in the city of Lucerne in the canton of Lucerne, Switzerland. This model has been calibrated satisfactorily to observed rainfall-runoff and used for hydrological studies of the site [ T okarczyk et al. , 2015 , detailed model description provided therein]. The dataset consists of 900 time series with 2880 data points, simulating 24 h of water levels in an open outlet during different rain events. T o drive the system into a highly nonlinear behavior we synthetically generated rain events covering a wide range of return periods. These events were generated following a block rain model [ Gujer , 2007 , Ch. 13.2] parametrized by inten- sity ( I ) and duration ( d ), i.e. γ = ( I , d ) and | γ | = 2. The intensity of the event spans 30 values in the range I ∈ [10, 100] mm h − 1 , with 30 different durations ( d ∈ [10, 240] min) for each intensity . F or the mechanistic emulation we will use a 1 dimensional linear proxy for the water level: d h d t ( t ) = θ 1 ( γ ) h ( t ) + θ 2 ( γ ) R ( t , γ ) (37) where R ( t , γ ) is the rain event used in the simulation. The values of θ 1 ( γ ) and θ 2 ( γ ) are obtained by a least squares ﬁt of the data. Due to the low performance achieved by standard MEMs in this dataset, the actual MEM presented in Sec. 3.3 uses a Wiener model as proxy . This is done using W arped GPs (Sec. 2.2.4 ), with a nonlinearity given by ˆ h ( t ) = a ( γ ) tanh( b ( γ ) h ( t ) + c ( γ )), where the parameters a , b and c are learned from the data. 2.3.3 Adliswil catchment dataset This dataset was generated from a SWMM model of a 1.63 km 2 urban catch- ment located in the city of Adliswil in the canton of Zurich, Switzerland. The dataset was used in Machac et al. [ 2016a ](detailed model description pro- vided therein) to create an emulator which was then used to speed-up the calibration (identiﬁcation) of the parameters in the simulator . The dataset consist of 256 time series with 601 data points, all of them corresponding to a single rain event, but with different 8 dimensional in- put parameter vectors ( | γ | = 8). The time series are simulations of inﬂow to the local WWTP , and the parameters describe the physical properties of the sewer network. F or the mechanistic emulation we will use the same linear proxy as in Machac et al. [ 2016a ], a 1 dimensional ODE for the discharge: d Q d t ( t ) = θ 1 ( γ ) Q ( t ) + θ 2 ( γ ) R ( t ) (38) where R ( t ) is the measured rain event. Two MEMs will be built, the ﬁrst uses the values of θ 1 ( γ ) and θ 2 ( γ ) using the relation from Machac et al. [ 2016a ], and the second will obtain them from a least squares ﬁt of the data. 12 2.4 P erformance assessment T o asses the quality of an emulator we compute the following emulation er- rors on the data reserved for testing: i. Maximum absolute error (MAE) e MAE ( γ j ) = max i ¯ ¯ ˆ y ( t i , γ j ) − y ( t i , γ j ) ¯ ¯ (39) where ˆ y ( t , γ ) and y ( t , γ ) are the emulated and simulated responses, re- spectively . ii. Root mean square error (RMSE) e RMSE ( γ j ) = v u u t 1 N N X i = 1 ¡ ˆ y ( t i , γ j ) − y ( t i , γ j ) ¢ 2 (40) Error e MAE measures the error in reproducing extreme values and/or peaks in the simulated signals, while error e RMSE measures an overall quality of the emulation. 3 Results 3.1 Nonlinear system dataset This dataset is used to highlight the value of the mechanistic over data- driven emulation. It is suited for illustration purposes, since the surface to be reconstructed can be plotted ( | γ | = 1). Fig . 2 shows the response of the system described in eqs. ( 32 )-( 34 ) as a surface R 2 → R . Fig. 2a illustrates the weakness of the SVD emulator , which does not exploit the simulator’s parameter dependence . Moreover predictions at unseen time points are provided by linear interpolation of the SVD basis. T o build the MEM we used the exact simulator’s parameter dependence. Al- though training data is sparse in the time direction, since the MEM encodes the right time evolution, the reconstruction quality is high, as can be seen in Fig. 2b . Although the SVD emulator could be improved by using an ex- ponential basis for time interpolation [ Franz and Gehler , 2006 ], i.e . the one provided by the covariance function of the MEM, the recovered parameter dependence will still be poor due to the low sampling of the parameter space . This example shows that mechanistic emulation is ideal for situations where there is good prior knowledge and the training data is sparse. This is even more striking when data corresponding to different simulator parame- ters is sampled at different times, e.g . with adaptive stepsize simulators. In this case mechanistic emulation can be applied directly , while SVD emulation becomes complicated as a matrix completion problem needs to be solved be- fore factorization can be applied [see Oh , 2010 , for an SVD relevant analysis]. In a similar fashion the Kalman smoothing algorithm described in Reichert et al. [ 2011 ] also requires a ﬁrst step in which all the data is interpolated into the same temporal grid, e.g . via interpolation. 3.2 Nonlinear system dataset II Since the structure of the proxy of a MEM does not change once its dimension m is set, we can optimize the proxy’s parameters to reduce epistemic biases. The example shown in Fig. 3 illustrates the effect of mismatches between 13 (a) SVD (b) MEM Figure 2: Nonlinear system emulation of didactical model. The colored surface shows the behaviour of the simulator output. Red dots show the training data. The emulated surface is shown with a black wireframe. In this contrived scenario a MEM ( b ) outperforms an SVD emulator ( a ). The failure of SVD is mainly due to the linear interpolation in the time direction. 14 the dynamical system and the proxy , i.e. epistemic bias, and how it can be mitigated. In Fig . 3a we show the performance of a MEM built with the same proxy as in Fig. 2b , but used to emulate the nonlinear dynamical system described at the end of Sec . 2.3.1 . Comparing with Fig . 2b , we see that in the regions where there is no observed data, the emulation is biased. A MEM with proxies ﬁtted to the data is shown in Fig. 3b . In this case the epistemic bias is considerably reduced although we did not used mechanistic knowledge to improve the parameters mapping. Hence, if the mapping between the simulator parameters and the lin- ear proxy’s parameters cannot be exactly determined from the mechanistic knowledge, it is better to ﬁt the proxy’ s parameters to the data instead of using and ad-hoc calculation based on on one’s best guess or expert opinion. The latter can be improved a posteriori by studying the parameters mapping emerging from the ﬁt. 3.3 W artegg catchment dataset Results of these simulations can be seen in Fig. 4 . F or rains shorter than a certain duration (about 20 min for the intensity used in the plot, 41 mm h − 1 ) the water level response shows the typical wave of runoff in an open channel. After some critical duration, which corresponds to the catchment’s time of concentration, the water level becomes constant and remains ﬁxed for the duration of the rain, deﬁning the triangular region marked in the plot. After the rain, the water level goes back to a lower ﬁxed level and remains there for a period of time which is a nonlinear function of the rain duration, e.g. between 4-6 h for a 4 h event. This shows the nonlinear nature of the storage involved in the effective discharge of the network. Finally there is a slow decay in the level fueled by the residual water in the network, the rate of this decay also depends nonlinearly on the rain’ s duration. Fig . 5 shows the distribution of the test error of a MEM 5 and a NMF emulator with 7 components. T able 2 summarizes the mean of the error dis- tributions. The NMF emulator outperforms the mechanistic emulator . In previous trials, an SVD emulator was also built and provided results compa- rable with NMF (not shown here), however it produced negative predictions just before the steep increase of the water level at the beginning of the rain event. Emulator MAE (m, %) RMSE (m, %) NMF(1 × 10 4 ) 3 . 2 × 10 − 2 , 7.7 0 . 94 × 10 − 2 , 4.1 MEM(1 × 10 2 ) 22 . 6 × 10 − 2 , 53.7 4 . 4 × 10 − 2 , 17.4 T able 2: Mean emulation errors corresponding to the W artegg catchment dataset. The value in parenthesis is the simulation speed-up factor obtained with respect to the original simulator , e.g. if SWMM simulation takes 3 s, MNF emulator takes 0.3 ms. Fig . 6 shows the quality of the NMF emulation for three different rain intensities. 3.4 Adliswil catchment dataset In [ Machac et al. , 2016a ] a SWMM model of the Adliswil catchment was em- ulated using a MEM with a prior derived from a simpliﬁed version of the 5 W arped MEM, see Sec. 2.3.2 . MEMs with linear proxies were unable to reduce average RMSE below 25%. 15 (a) MEM (b) MEM ﬁtted proxy Figure 3: Nonlinear system emulation II. The colored surface shows the surface to re- construct. In this second contrived scenario the MEM ( b ) performs the same as the SVD emulator ( a ). Red dots show the training data. The emulated surface is shown with a black wireframe. 16 Figure 4: Simulator output for a ﬁxed intensity (41 mm h − 1 ) and varying duration. The black bold lines label different runoff regimes: free-surface ﬂows, runoff with activated storage and emptying of storage in the catchment. Figure 5: T est error distribution of the NMF emulator . The violin plots show the distri- bution of the logarithm of the error for the maximum absolute error (MAE) and the root mean square error (RMSE). Light colored plots corresponds to the NMF emulator , solid colored plots to the MEM. The mean(median) error is indicated with ﬁlled(empty) circles. See T able 2 for their numerical values. 17 Figure 6: NMF emulation quality . The colored surface shows the simulation outputs for three different rain intensities. The NMF emulated surface is shown with a black wire- frame. 18 simulator’s equations, with the aim of running a system identiﬁcation task from a single rain event. Figure 7 shows the distribution of the test error of a MEM and a SVD emulator with 6 components. Both errors are calculated on the same test set. Figure 7: T est error distribution of the MEM from Machac et al. [ 2016a ] and a SVD emu- lator . The violin plots show the error distribution for the mean maximum absolute error (MAE) and the root mean square error (RMSE). Light colored plots corresponds to the SVD emulator , solid colored plots to the MEM. The mean(median) error is indicated with ﬁlled(empty) circles . See T able 3 for their numerical values. T able 3 summarizes the mean of the error distributions. In this example, the SVD emulator also performs better than its mechanistic counter part. Although the time interpolation of SVD does not respect the dynamics of the system, the density of data is enough to provide a good estimation. The simplicity of the SVD emulator , when compared with the mechanistic one, makes this approach much more compelling for this application. 4 Discussion The results presented in the previous sections seem to suggest that MEMs are not useful for emulating complicated hydrological or hydrodynamic sim- ulators and that they remain as an academic curiosity . However MEMs still have many properties that can be exploited for better and faster emulation, which should not be ignored only due to the early state of the method. These known advantages include the suitability for parallelization and, speedups and energy saving via approximated computing [ Angerer et al. , 2015 ]. 19 Emulator MAE (l s − 1 , %) RMSE (l s − 1 , %) SVD(5 × 10 4 ) 18.3, 6.2 3.19, 2.6 MEM(1 × 10 3 ) 23.2, 7.9 4.94, 4.0 MEM-ﬁt (1 × 10 3 ) 20.2, 6.9 3.42, 2.8 T able 3: Mean emulation errors corresponding to the Adliswil catchment dataset. MEM refers to an emulator in Machac et al. [ 2016a ], while MEM-ﬁt to an emulator with the same proxy structure but parameters ﬁtted to the data. The value in parenthesis is the simulation speed-up factor obtained with respect to the original simulator , e.g. if SWMM simulation takes 3 s, SVD emulator takes 60 µs. T able 4 summarizes the steps involved in the two approaches presented here. There we indicate the relation between the method and the character - istics of the dataset. On one side, although data-driven emulators are compu- tationally and conceptually simpler than MEMs, they are more sensitive to the sparseness of the data. On the other side, MEM’s performance is limited by the linearity of the prior which might fail to express our knowledge about the nonlinear simulator . If good prior knowledge is available, the mechanis- tic emulation can incorporate correct dependencies , which could be exploited to reduce the amount of data needed to achieve a desired performance. Steps 1 and 2 of mechanistic emulation are the most sensitive to the mis- match between prior and simulator . T o mitigate this , we keep the model structure provided by the prior and learn the parameter values from the data, thus providing the best linear proxy for the dataset (Sec. 3.2 ). This generally provides sharper error distributions than using the parameter val- ues obtained directly from the prior [ Albert , 2012 ]. Note that mitigating this simulator -prior mismatch is not a question of data quantity , since more data will override the prior and all mechanistic insights it provided, thus render- ing mechanistic knowledge unnecessary [ Steinke and Schölkopf , 2008 ]. More data would also increase the memory and computational resources needed by the emulator . Increasing the density of time samples will also reduce the condition number of the covariance matrices and the inversion problem at the training phase will be ill-posed, unless iterative condition methods are used [ Reichert et al. , 2011 ]. In step 2 of a data-driven emulator we need to factorize the data. If the data is very sparse the generalization quality of the factorization is expected to be poor . F or data that is sampled at different temporal grids the situa- tion becomes even more delicate since data preprocessing is required to build up a grid, e.g . via interpolation or matrix completion. Matrix factorization also provide features only at the observed inputs , therefore an interpola- tion method is required when emulating unseen time points at step 5. The Kalman smoothing implementation of the mechanistic emulation used in Re- ichert et al. [ 2011 ] will be similarly affected. Optimizing this interpolation can be as hard as using a MEM directly (Sec. 3.1 ). Equation ( 15 ) shows that the mean function of the prior GP is given by the solution to the noise-free linear ODE obtained by removing the noisy term of the SDE ( 11 ). This suggests a simple improvement of the emulator in which the mean function is replaced with a better approximator of the data. This is the underlying idea behind the work of González et al. [ 2014 ], in which the actuation affecting the mean of the GP is replaced by a signal generated by the nonlinear part of the model applied to a surrogate trajec- tory . In that work however , the surrogate trajectory , e.g. built with matrix factorization, does not encode mechanistic knowledge explicitly . This knowl- 20 Step Mechanistic emulation Data-driven emulation 1 Deﬁne prior – 2 Obtain emulator parameters F actorize the data 3 Deﬁne/Build parameters mapping Build parameters mapping 4 Conditioning – 5 (Emulation) Matrix · vector Matrix · vector + time interpolation T able 4: Steps involved in the two emulation approaches described in this article. Data- driven approaches are simpler than mechanistic ones. Step 2 and 5 of data-driven ap- proaches suffer if data is sparse or unevenly sampled. Steps 1 and 2 of the mechanistic approach suffer from the lack of expressiveness of linear priors . edge could be introduced via the analytical solution of a nonlinear differen- tial equation, such as the nonlinear Bernoulli ODE, or analytical approxi- mations of more general nonlinear differential equations [ Adomian , 1991 ]. These enhancements would only affect the mean function of the predictive GP , improving extrapolation quality and thus allowing for more sparse train- ing sets. However , the estimation of uncertainties remains limited by the covariance function associated with the linear prior . This can be improved by including the mean function parameters in the conditioning step [ Rasmussen and Williams , 2006 , sec . 2.7]. All these observations are derived from more general results on the re- ducibility and emulation readinness of general simulators and not only valid for the especiﬁc simulators we used here. This is specially true for the data- driven approach, see for example Ch. 5 of Quarteroni et al. [ 2016 ]. 5 Conclusions W e provided a comparison of mechanistic and data-driven emulation in sev- eral examples pertinent to the ﬁeld of hydrology and urban water manage- ment. In all of these, data-driven emulation outperforms mechanistic emu- lation. The current state of MEMs makes them advantageous to fully data- driven emulators, when the training data is sparse and unevenly sampled. This is the case when many simulation runs with high temporal resolution are prohibitively expensive, or when adaptive stepsize simulators are used. If the only objective of emulation is to obtain a fast tool to replace a simula- tor , there seem to be no advantage in using mechanistic knowledge besides the case of sparse and unevenly sampled data mentioned before. The gain obtained from enhancements of MEMs discussed here, such as Wiener model proxies, Nonlinear mean functions, and hybrid mechanistic/data-driven em- ulators should be quantiﬁed in relation to the test error of an inexpensive data-driven emulator . Acknowledgements The authors would like to thank Prof. P eter Reichert for his support during the development of this article. W e thank Dr . David Machac for his emu- lation results used in Figure 7 . W e thank Dr . Frank Blumensaat for shar- 21 ing with us the SWMM model ﬁle of the W artegg catchment resulting from many data collection campaings . W e thank the developers of GNU Octave , Sage and Inkscape for their excellent software tools, which were used for this article. W e thank the reviewers for their help improving this manuscript. Funding The research leading to these results has received funding from the European Union’ s Horizon 2020 researc h and innovation programme un- der grant agreement No 641931 (Centaur). Author contributions JPC developed the software , carried out sim- ulations, data analysis. JPC & JR wrote this manuscript. JPL adapted SWMM model ﬁles for the simulations and helped interpreting the results . The work was done under the active supervision of CA & JR . All authors copy-edited this manuscript. References Anthony O’Hagan. Bayesian analysis of computer code outputs: A tutorial. Reliability Engineering & System Safety , 91(10-11):1290–1300, 2006. doi: 10.1016/j.ress.2005.11.025. Ulrike Baur , P eter Benner , and Lihong F eng. Model order reduction for lin- ear and nonlinear systems: a system-theoretic perspective. Arc hives of Computational Methods in Engineering , 21(4):331–358, 2014. Alﬁo Quarteroni, Andrea Manzoni, and F ederico Negri. Reduced Basis Meth- ods for P artial Differential Equations , volume 92 of UNITEXT . Springer International Publishing , Cham, 2016. ISBN 978-3-319-15430-5. doi: 10.1007/978- 3- 319- 15431- 2. Y u-Geng Xi, De-W ei Li, and Shu Lin. Model Predictive Control — Status and Challenges. Acta Automatica Sinica , 39(3):222–236, mar 2013. ISSN 18741029. doi: 10.1016/S1874- 1029(13)60024- 5. Guangtao Fu, Soon-Thiam Khu, and David Butler . Multiobjective optimisa- tion of urban wastewater systems using parego: a comparison with nsga ii. In 11th International Conference on Urban Drainage , 11 ICUD , Edin- burgh, Scotland, 2008. L.A. Rossman. Storm W ater Management Model User’s Manual V ersion 5.0. T echnical Report EPA/600/R-05/040, National Risk Management Researc h Laboratory , Ofﬁce of Research and Development, U .S. Environmental Pro- tection Agency , 2010. T omaso P oggio and F . Girosi. Networks for approximation and learning. Proceedings of the IEEE , 78(9):1481–1497, 1990. ISSN 00189219. doi: 10.1109/5.58326. Florian Steinke and Bernhard Schölkopf . Kernels, regularization and dif- ferential equations. Pattern Recognition , 41(11):3271–3286, 2008. ISSN 00313203. doi: 10.1016/j.patcog .2008.06.011. Carlo Albert. A mechanistic dynamic emulator. Nonlinear Analysis: Real W orld Applications , 13(6):2747–2754, 2012. 22 Simo Särkkä, Arno Solin, and Jouni Hartikainen. Spatiotemporal Learning via Inﬁnite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering. IEEE Signal Processing Magazine , 30(4):51–61, jul 2013. ISSN 1053-5888. doi: 10. 1109/MSP .2013.2246292. Ja vier González, Ivan V uja ˇ ci ´ c, and Ernst Wit. Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations. P at- tern Recognition Letters , 45:26–32, 2014. Arno Solin. Explicit Link Between P eriodic Covariance Functions and State Space Models. Proc . of the Seventeenth Int. Conf . on Artiﬁcial Intelligence and Statistics , 33:904–912, 2014. C E Rasmussen and C K I Williams. Gaussian Processes for Machine Learn- ing . Adaptive Computation And Machine Learning. MIT Press, 2006. P . Reichert, G . White, M.J . Bayarri, and E.B . Pitman. Mec hanism-based em- ulation of dynamic simulation models: Concept and application in hydrol- ogy . Computational Statistics & Data Analysis , 55(4):1638–1655, 2011. doi: 10.1016/j.csda.2010.10.011. P er Christian Hansen. Rank-Deﬁcient and Discrete Ill-P osed Problems . Soci- ety for Industrial and Applied Mathematics , jan 1998. ISBN 978-0-89871- 403-6. doi: 10.1137/1.9780898719697. C .M. Angerer , R. P olig , D . Zegarac, H. Giefers, C . Hagleitner , C . Bekas, and A. Curioni. A fast, hybrid, power -efﬁcient high-precision solver for large linear systems based on low-precision hardware. Sustainable Computing: Informatics and Systems , 2015. ISSN 2210-5379. doi: 10.1016/j.suscom. 2015.10.001. J D Victor and P Johannesma. Maximum-entropy approximations of stochas- tic nonlinear transductions: an extension of the Wiener theory . Biological cybernetics , 54(4-5):289–300, 1986. ISSN 0340-1200. George Christakos . Spatiotemporal information systems in soil and environ- mental sciences. Geoderma , 85(2-3):141–179, 1998. ISSN 00167061. doi: 10.1016/S0016- 7061(98)00018- 4. J . Harte. Maximum Entropy and Ecology: A Theory of Abundance, Distribu- tion, and Energetics . Oxford Series in Ecology and Evolution. OUP Oxford, 2011. ISBN 9780191621161. Jan S . Hesthaven, Gianluigi Rozza, and Benjamin Stamm. Reduced Basis Methods , pages 27–43. Springer International Publishing, Cham, 2016. ISBN 978-3-319-22470-1. doi: 10.1007/978- 3- 319- 22470- 1_3. Boian S . Alexandrov and V elimir V . V esselinov . Blind source separation for groundwater pressure analysis based on nonnegative matrix factorization. W ater Resources Research , 50(9):7332–7347, 2014. ISSN 1944-7973. doi: 10.1002/2013WR015037. Jingu Kim and Haesun Park. T oward faster nonnegative matrix factoriza- tion: A new algorithm and comparisons. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining , ICDM ’08, pages 353–362, W ashington, DC , USA, 2008. IEEE Computer Society . ISBN 978-0-7695- 3502-9. doi: 10.1109/ICDM.2008.149. 23 George S . Kimeldorf and Grace W ahba. A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines. The Annals of Mathematical Statistics , 41(2):495–502, apr 1970. ISSN 0003-4851. doi: 10.1214/aoms/1177697089. Thomas E Fricker , Jeremy E Oakley , and Nathan M Urban. Multivariate Gaussian Process Emulators With Nonseparable Covariance Structures. T echnometrics , 55(1):47–56, feb 2013. Y . A. Kuznetsov . Andronov-Hopf bifurcation. Scholarpedia , 1(10):1858, 2006. revision #90964. David Machac, Peter Reichert, Jörg Rieckermann, and Carlo Albert. F ast mechanism-based emulator of a slow urban hydrodynamic drainage sim- ulator . Environmental Modelling & Software , 78:54–67, 2016a. doi: 10.1016/j.envsoft.2015.12.007. David Machac , P eter Reichert, and Carlo Albert. Emulation of dynamic sim- ulators with application to hydrology . Journal of Computational Physics , 313:352–366, May 2016b. Edward Snelson, Carl Edward Rasmussen, and Zoubin Ghahramani. W arped gaussian processes . In In Advances in Neural Inf ormation Processing Sys- tems (NIPS , page 2003. MIT Press, 2003. E. Hairer , S . P . Nørsett, and G . W anner . Solving Ordinary Differential Equa- tions I (2Nd Revised. Ed.): Nonstiff Problems . Springer-V erlag New Y ork, Inc., New Y ork, NY , USA, 1993. ISBN 0-387-56670-8. P . T okarczyk, J . P . Leitao, J . Rieckermann, K. Schindler , and F . Blumensaat. High-quality observation of surface imperviousness for urban runoff mod- elling using uav imagery . Hydrology and Earth System Sciences , 19(10): 4215–4228, 2015. doi: 10.5194/hess- 19- 4215- 2015. Willi Gujer . Siedlungswasserwirtschaft . Springer , Berlin, 3. edition, 2007. Matthias O Franz and P eter V Gehler . How to choose the covariance for gaussian process regression independently of the basis. In Proc . Gaussian Processes in Practice W orkshop , 2006. Sewoong Oh. Matrix completion: Fundamental limits and efﬁcient algo- rithms . PhD thesis, Stanford University , 2010. G Adomian. A review of the decomposition method and some recent results for nonlinear equations. Computers & Mathematics with Applications , 21 (5):101–127, 1991. ISSN 08981221. doi: 10.1016/0898- 1221(91)90220- X. Octave community. GNU Octave 4.0.2, 2016. URL www.gnu.org/software/ octave/ . Last accessed Sept. 1, 2016. The Sage Development T eam. Sage Mathematics Software (V ersion 7.2), 2016. URL http://www.sagemath.org . Last accessed Sept. 1, 2016. Inkscape community. Inkscape 0.91, 2016. URL http://www.inkscape. org/ . Last accessed Sept. 1, 2016. 24

Appraisal of data-driven and mechanistic emulators of nonlinear hydrodynamic urban drainage simulators

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment