Linearly Constrained Gaussian Processes with Boundary Conditions

One goal in Bayesian machine learning is to encode prior knowledge into prior distributions, to model data efficiently. We consider prior knowledge from systems of linear partial differential equations together with their boundary conditions. We cons…

Authors: Markus Lange-Hegermann

Markus Lange-Hegermann
Department of Electrical Engineering and Computer Science
OWL University of Applied Sciences and Arts
markus.lange-hegermann@th-owl.de

Abstract

One goal in Bayesian machine learning is to encode prior knowledge into prior distributions, to model data efficiently. We consider prior knowledge from systems of linear partial differential equations together with their boundary conditions. We construct multi-output Gaussian process priors with realizations in the solution set of such systems; in particular, only such solutions can be represented by Gaussian process regression. The construction is fully algorithmic via Gröbner bases and does not employ any approximation. It builds these priors by combining two parametrizations via a pullback: the first parametrizes the solutions of the system of differential equations and the second parametrizes all functions adhering to the boundary conditions.

1 Introduction

Gaussian processes (Rasmussen and Williams, 2006) are very data efficient. Hence, they are the prime regression technique for small datasets and are applied when data is rare or expensive to produce. Applications range from robotics (Lima et al., 2018), biology (Honkela et al., 2015), global optimization (Osborne et al., 2009), anomaly detection (Berns et al., 2020), and hyperparameter search (Thornton et al., 2013) to engineering (Thewes et al., 2015). A Gaussian process can be viewed as a suitable probability distribution on a set of functions, which we can condition on observations using Bayes' rule. This avoids overfitting. Due to the self-conjugacy of the Gaussian distribution, the posterior is again Gaussian. The mean function of the posterior is used for regression and the variance quantifies uncertainty.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s).

For a suitable covariance function of the prior, the posterior can approximate any behavior present in data, even in noisy or unstructured data. Any prior knowledge about the regression problem should be incorporated into the prior. Then, the preciously rare measurement data can be used to refine and improve on this prior knowledge, instead of needing to relearn it. The prior knowledge is usually encoded into the covariance structure of the Gaussian process, cf. (Rasmussen and Williams, 2006, §4) or (Duvenaud, 2014). Gaussian process regression differs in philosophy from deep learning: the latter thrives on extracting knowledge from a lot of data but struggles with one-shot learning and with encoding prior knowledge, which is usually done via pretraining on similar data.

Prior knowledge is often given by physical laws. In particular, it is important to include linear differential equations into machine learning frameworks. Gaussian processes that adhere to such a set of linear differential equations were constructed several times in the literature (Graepel, 2003; Macêdo and Castro, 2008; Särkkä, 2011; Scheuerer and Schlather, 2012; Wahlström et al., 2013; Solin et al., 2018; Jidling et al., 2017; Raissi et al., 2017; Raissi and Karniadakis, 2018; Jidling et al., 2018). All realizations and the mean function of the posterior strictly¹ satisfy these physical laws. Such Gaussian processes exist if and only if the set of linear differential equations describes a controllable system². Their construction can be completely automatized by symbolic algorithms from algebraic system theory, which again strongly build on Gröbner bases (Lange-Hegermann, 2018).
While the above approaches are exact, there are also approximate approaches to include partial differential equations in Gaussian processes and, more generally, machine learning. For example, various forms of posterior regularization (Ganchev et al., 2010; Song et al., 2016; Yuan et al., 2020) can flexibly consider any differential equation. The paper Raissi et al. (2018) constructed Gaussian processes on numerical difference approximation schemes of differential equations. Gaussian processes have been used to estimate conservation laws (Raissi and Karniadakis, 2017; Nguyen and Peraire, 2015, 2016). In (Yang et al., 2018), a Gaussian process prior is approximated from an MCMC scheme built on numerical simulations.

¹ For notational simplicity, we refrain from using the phrase "almost surely" in this paper, e.g. by assuming separability.
² Controllable systems have "big" sets of solutions, even after adding boundary conditions. As there is no unique solution, one does regression or control in those systems, instead of (numerically) solving them as is usually done for systems with "small" solution sets.

Usually, differential equations come with boundary conditions. Hence, a description of boundary conditions in a machine learning framework is highly desirable. In the special case of ODEs, boundary conditions behave as data points, hence one only needs finite-dimensional data to specify them. These data points can be trivially included into a Gaussian process (Kocijan et al., 2004; Calderhead et al., 2009; Barber and Wang, 2014; John et al., 2019) and other machine learning methods (Chen et al., 2018; Raissi, 2018; Särkkä and Solin, 2019). This paper claims no originality for ODEs. For boundary conditions of PDEs, one would need functions (specified by infinite-dimensional data) to describe the boundary conditions.
Solving this problem exactly and without any approximation is the main contribution of this paper: the construction of (non-stationary) Gaussian process priors combining differential equations and general boundary conditions. This construction is again based on symbolically building parametrizations using Gröbner bases, as in (Lange-Hegermann, 2018). More precisely, given a controllable system of linear differential equations with rational (or, as a special case, constant) coefficients, defined by an operator matrix, and boundary conditions defined by the zero set of a polynomial ideal, we construct a Gaussian process prior on the corresponding set of smooth solutions. In particular, a regression model constructed from this Gaussian process prior has only solutions of the system as realizations. We need no approximations.

Using the results of this paper, one can add information to Gaussian processes by (i) conditioning on data points (Bayes' rule), (ii) restricting to solutions of linear operator matrices (Lange-Hegermann, 2018), and (iii) adding boundary conditions (this paper). Since these constructions are compatible, we can combine strict, global information from equations and boundary conditions with noisy, local information from observations. This paper is an example of how symbolic techniques can help data-driven machine learning. All results are mathematically proven in the appendices and the algorithms are demonstrated on toy examples with only one or two data points, an extreme form of one-shot learning. The code for reproduction of the results is given in Appendix C and the (very small amount of) data is completely given in the text of this paper.

The novelty in this paper does not lie in either of its techniques, which are well-known either in algebraic system theory or in machine learning.
Rather, this paper combines these techniques and thereby presents a novel framework to deal with learning from data in the presence of linear controllable partial differential equations and boundary conditions. We found it hard to compare to the state of the art, as there currently is no comparable technique, except the superficially similar paper (Graepel, 2003) discussed in Remark 5.5 and a plethora of machine learning techniques designed for ordinary differential equations. The only exception is Gulian et al. (2020), which considers inhomogeneous linear differential equations with boundary conditions using the spectral decomposition of a covariance function (Solin and Kok, 2019), where the right-hand side is specified approximately by data. These approaches allow to approximately specify a prior for the solution of the differential equation, instead of specifying the prior for the parametrizing function as in this paper.

We recall Gaussian processes and their connection to linear operators in Section 2 and summarize the construction of Gaussian processes adhering to linear operators in Section 3. Describing boundary conditions as parametrizations is surprisingly simple (Section 4). Theorem 5.2 describes the core construction of this paper, which allows to check whether and how two parametrizations are combinable. In Section 6 we construct boundary conditions with non-zero right-hand sides using the fundamental theorem on homomorphisms.

2 Operators and Gaussian Processes

A Gaussian process g = GP(µ, k) is a probability distribution on the evaluations of functions ℝ^d → ℝ^ℓ such that function values g(x₁), …, g(x_n) are jointly Gaussian. It is specified by a mean function

    µ : ℝ^d → ℝ^ℓ : x ↦ E(g(x))

and a positive semidefinite covariance function

    k : ℝ^d × ℝ^d → ℝ^(ℓ×ℓ) : (x, x′) ↦ E( (g(x) − µ(x)) (g(x′) − µ(x′))^T ).
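As a minimal numerical sketch (illustrative, not from the paper's code), standard zero-mean Gaussian process conditioning with a squared exponential covariance fits in a few lines of numpy; all function names here are our own:

```python
import numpy as np

def k_se(x, xp):
    """Squared exponential covariance between point sets of shape (n, d) and (m, d)."""
    sq = np.sum((x[:, None, :] - xp[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq)

def gp_posterior(X, y, Xs, jitter=1e-10):
    """Posterior mean and covariance at test points Xs of a zero-mean GP,
    conditioned on noiseless observations y = g(X)."""
    K = k_se(X, X) + jitter * np.eye(len(X))   # k(X, X), regularized for stability
    Ks = k_se(Xs, X)                           # k(x, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha                          # y k(X, X)^{-1} k(x, X)^T, mu = 0
    cov = k_se(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov
```

At the training inputs the posterior mean interpolates the observations and the posterior variance collapses to (numerically) zero, which is the noiseless conditioning used throughout the paper.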
All higher moments exist and are uniquely determined by µ and k; all higher cumulants are zero. We often restrict the domain of a Gaussian process to a subset of ℝ^d. Assume the regression model y_i = g(x_i) and condition on observations (x_i, y_i) ∈ ℝ^(1×d) × ℝ^(1×ℓ) for i = 1, …, n. Denote by k(x, X) ∈ ℝ^(ℓ×ℓn) resp. k(X, X) ∈ ℝ^(ℓn×ℓn) the (covariance) matrices obtained by concatenating the matrices k(x, x_j) resp. the positive semidefinite block partitioned matrix with blocks k(x_i, x_j). Write µ(X) resp. y ∈ ℝ^(1×ℓn) for the row vector obtained by concatenating the rows µ(x_i) resp. y_i. The posterior is the Gaussian process

    GP( µ(x) + (y − µ(X)) k(X, X)^(−1) k(x, X)^T ,
        k(x, x′) − k(x, X) k(X, X)^(−1) k(x′, X)^T ).

Its mean function can be used as regression model and its variance as model uncertainty.

Gaussian processes are the linear objects among stochastic processes and their rich connection with linear operators is present everywhere in this paper. In particular, the class of Gaussian processes is closed under linear operators once mild assumptions hold. Now, we formalize and generalize the following well-known example of differentiating a Gaussian process.

Example 2.1. Let g = GP(0, k(x, x′)) be a scalar univariate Gaussian process with differentiable realizations. Then

    (∂/∂x)_* g := GP( 0, ∂²/(∂x ∂x′) k(x, x′) )

is the Gaussian process of derivatives of realizations of the Gaussian process g. One can interpret this Gaussian process (∂/∂x)_* g as taking derivatives as measurement data and producing a regression model of derivatives. Taking a one-sided derivative (∂/∂x) k(x, x′) yields the cross-covariance between a function and its derivative. See (Cramér and Leadbetter, 2004, §5.2) for a proof and (Wu et al., 2017) resp. Cobb et al. (2018) for applications in Bayesian optimization resp.
vector field modeling.

Given a set of functions G ⊆ { f : X → Y } and a map b : Y → Z, the pushforward is b_* G = { b ∘ f | f ∈ G } ⊆ { f : X → Z }. A pushforward of a stochastic process g : Ω → (X → Y) by b : Y → Z is b_* g : Ω → (X → Z) : ω ↦ (b ∘ g(ω)).

Lemma 2.2. Let F and G be spaces of functions defined on X ⊆ ℝ^d with the product σ-algebra of function evaluations. Let g = GP(µ(x), k(x, x′)) with realizations in F and B : F → G a linear, measurable operator which commutes with expectation w.r.t. the measure induced by g on F and by B_* g on G. Then, the pushforward B_* g of g under B is again Gaussian with

    B_* g = GP( B µ(x), B k(x, x′) (B′)^T ),

where B′ denotes the operation of B on functions with argument x′. Call B_* g the pushforward Gaussian process of g under B.

We postpone the proof to the appendix. Lemma 2.2 is often stated without the assumption that B commutes with expectation, but also without proof. If such a more general version of Lemma 2.2 holds, the author would be very interested to see a reference. Special cases have been discussed in the literature, often only for mean square differentiability (Papoulis and Pillai, 2002, after (9.87) resp. (10.78) in the first and second resp. third edition; Adler, 1981, Thm 2.2.2; Agrell, 2019; Da Veiga and Marrel, 2012, §2.3; Berlinet and Agnan, 2004, Thm. 9).

Consider change points and change surfaces as an application of Lemma 2.2, following (Garnett et al., 2009, 2010; Lloyd et al., 2014; Herlands et al., 2016).

Example 2.3. Let ρ₁, ρ₂ : ℝ^d → [0, 1] be a partition of unity, i.e., ρ₁(x) + ρ₂(x) = 1 for all x ∈ ℝ^d. Usually, both ρ₁ and ρ₂ are close to 0 or close to 1 over most of ℝ^d. Such a partition of unity induces a linear operator

    ρ = ( ρ₁  ρ₂ ) : F^(2×1) → F^(1×1) : ( f₁ ; f₂ ) ↦ ρ₁ f₁ + ρ₂ f₂,

where F is a space of functions ℝ^d → ℝ.
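The pushforward covariance of Example 2.1 can be checked symbolically before continuing: for a one-dimensional squared exponential kernel, differentiating k once in each argument gives the covariance of the derivative process, and a one-sided derivative gives the cross-covariance. A small sympy sketch (illustrative, not part of the paper's code; the symbol xp plays the role of x′):

```python
import sympy as sp

x, xp = sp.symbols("x xp")

# one-dimensional squared exponential covariance, cf. Eq. (1)
k = sp.exp(-(x - xp) ** 2 / 2)

# covariance of the derivative process (d/dx)_* g from Example 2.1
k_deriv = sp.simplify(sp.diff(k, x, xp))

# one-sided derivative: cross-covariance between g and its derivative
k_cross = sp.diff(k, x)
```

Here k_deriv simplifies to (1 − (x − x′)²) exp(−(x − x′)²/2), a valid covariance of the derivative process, while k_cross = −(x − x′) exp(−(x − x′)²/2) is not symmetric, as expected of a cross-covariance.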
Given two independent Gaussian processes g₁ = GP(0, k₁), g₂ = GP(0, k₂) with realizations in F, we have

    ρ_* g := ( ρ₁  ρ₂ )_* ( g₁ ; g₂ ) = GP( 0, ρ₁(x) k₁(x, x′) ρ₁(x′) + ρ₂(x) k₂(x, x′) ρ₂(x′) ).

Thereby, we model change points (for d = 1) or change surfaces (for d > 1) at positions where ρ₁ changes from being close to 0 to being close to 1. This example is the basis for boundary conditions in Section 4: when setting g₂ to zero, ρ_* g is close to zero where ρ₂ ≈ 1 and close to g₁ where ρ₁ ≈ 1.

3 Solution Sets of Operator Equations

We consider linear ordinary and partial differential equations defined on the set of smooth functions. Let F = C^∞(X, ℝ) be the real vector space of smooth functions from X ⊆ ℝ^d to ℝ with the usual Fréchet topology³. The squared exponential covariance function

    k_F(x_i, x_j) = exp( −(1/2) Σ_{a=1}^{d} (x_{i,a} − x_{j,a})² )    (1)

induces a Gaussian process prior g_F = GP(0, k_F) with realizations dense in the space of smooth functions F = C^∞(X, ℝ) w.r.t. this topology.

The following three rings of linear operators R model operator equations. These rings are ℝ-algebras such that F is a topological R-(left-)module, i.e., F is a topological ℝ-vector space of functions X → ℝ for X ⊆ ℝ^d that also is an R-(left-)module such that the elements of R operate continuously on F.

Example 3.1. The polynomial⁴ ring R = ℝ[∂x₁, …, ∂x_d] models linear differential equations with constant coefficients, as ∂x_i acts on F = C^∞(X, ℝ) via the partial derivative w.r.t. x_i.

Example 3.2. The polynomial ring R = ℝ[x₁, …, x_d] models algebraic equations via multiplication on F. This ring is relevant for boundary conditions.
To combine linear differential equations with constant coefficients with boundary conditions, or to model linear differential equations with polynomial⁵ coefficients, consider the following ring.

Example 3.3. The Weyl algebra R = ℝ[x₁, …, x_d]⟨∂x₁, …, ∂x_d⟩ has the non-commutative relation ∂x_i x_j = x_j ∂x_i + δ_ij representing the product rule of differentiation, where δ_ij is the Kronecker delta.

Operators defined over these three rings satisfy the assumptions of Lemma 2.2: multiplication commutes with expectations, and the dominated convergence theorem implies that expectation commutes with derivatives, as realizations of g_F are continuously differentiable. These three rings also operate continuously on F: the Fréchet topology is constructed to make derivation continuous, and multiplication is bounded (if X is bounded and bounded away from infinity) and hence continuous, as F is Fréchet.

³ For Gaussian processes on Fréchet spaces see (Zapała, 2002; Osswald, 2012). The topology is generated by the separating family ‖f‖_{a,b} := sup_{i ∈ ℤ^d_{≥0}, |i| ≤ a} sup_{z ∈ [−b,b]^d} |∂^i f(z)| of seminorms for a, b ∈ ℤ_{≥0} on F (Treves, 1967, §10).
⁴ Partial derivatives commute (symmetry of second derivatives) and generate a commutative polynomial ring.
⁵ No major changes for rational, holonomic, or meromorphic coefficients.

3.1 Parametrizations

For A ∈ R^(ℓ′×ℓ) define the solution set

    sol_F(A) := { f ∈ F^(ℓ×1) | Af = 0 },

the nullspace of the operator matrix A. We say that a Gaussian process is in a function space if its realizations are contained in said space. The following tautological lemma is a version of the fundamental theorem of homomorphisms. It describes the interplay of Gaussian processes and solution sets of operators.

Lemma 3.4 (Lange-Hegermann, 2018, Lemma 2.2). Let g = GP(µ, k) be a Gaussian process in F^(ℓ×1).
Then g is a Gaussian process in the solution set sol_F(A) of A ∈ R^(ℓ′×ℓ) if and only if both µ is contained in sol_F(A) and A_*(g − µ) is the constant zero process.

To construct Gaussian processes with realizations in the solution set sol_F(A) of an operator matrix A ∈ R^(ℓ′×ℓ), one looks for a B ∈ R^(ℓ×ℓ″) with AB = 0 (Jidling et al., 2017). If g = GP(0, k) is a Gaussian process in F^(ℓ″×1), then the realizations of B_* g are contained in sol_F(A) by Lemma 3.4, as

    A_*(B_* g) = (AB)_* g = 0_* g = 0.

In practice, one would like that any solution in sol_F(A) can be approximated by B_* g to arbitrary precision, i.e., that the realizations of the Gaussian process B_* g are dense in sol_F(A). To this end, we call B ∈ R^(ℓ×ℓ″) a parametrization of sol_F(A) if sol_F(A) = B F^(ℓ″×1).

Proposition 3.5 (Lange-Hegermann, 2018, Prop. 2.4). Let B ∈ R^(ℓ×ℓ″) be a parametrization of sol_F(A) for A ∈ R^(ℓ′×ℓ). Take the Gaussian process g_F^(ℓ″×1) of ℓ″ i.i.d. copies of g_F, the Gaussian process with squared exponential covariance⁶ function k_F from Eq. (1). Then, the realizations of B_* g_F^(ℓ″×1) are dense in sol_F(A).

Proof. By construction, realizations of g_F^(ℓ″×1) are dense in F^(ℓ″×1). The operator B induces a surjective continuous (F is a topological R-module) map. Surjective continuous maps map dense sets to dense sets.

⁶ Or any other covariance with realizations dense in F.

3.2 Algorithmically constructing parametrizations

We summarize the algorithm which decides whether a parametrization of a system of linear differential equations exists and computes it in the positive case. To construct the parametrization B, we are led to compute the nullspace⁷ of the map A : F^(ℓ×1) → F^(ℓ′×1). This is not feasible, as F is too "big" to allow computations. Instead, we compute the nullspace of A : R^(ℓ×1) → R^(ℓ′×1),
a symbolic computation, only using operations over R without involvement of F.

⁷ We avoid calling nullspaces kernels, due to confusion with symmetric positive semidefinite functions. While a left resp. right nullspace is a module, we abuse notation and denote any matrix as left resp. right nullspace if its rows resp. columns generate the nullspace as an R-module.

Theorem 3.6. Let A ∈ R^(ℓ′×ℓ). Let B be the right nullspace of A and A′ the left nullspace of B. Then sol_F(A′) is the largest subset of sol_F(A) that is parametrizable, and B parametrizes sol_F(A′).

A well-known and trivial special case of this theorem are linear equations in finite-dimensional vector spaces, with R = F = ℝ the field of real numbers. In that case, sol_F(A) can be found by applying the Gaussian algorithm to the homogeneous system of linear equations Ab = 0 and writing a basis for the solutions b as columns of a matrix B. This matrix B is the (right) nullspace of A. There are no additional equations satisfied by the above solutions, i.e. A = A′ generates the (left) nullspace of B. In general, the left nullspace A′ of the right nullspace B of A is not necessarily A. E.g., for the univariate polynomial ring R = ℝ[x] and the matrix A = ( x ) we have B = ( 0 ) and A′ = ( 1 ).

Corollary 3.7. In Theorem 3.6, sol_F(A) is parametrizable if and only if the rows of A and A′ generate the same row module. Since AB = 0, this is the case if all rows of A′ are contained in the row module generated by the rows of A. In this case, sol_F(A) is parametrized by B.

For a formal proof we refer to the literature (Zerz et al., 2010, Thm. 2; Zerz, 2000, Thm. 3, Alg. 1, Lemma 1.2.3; Oberst, 1990, §7.(24); Quadrat, 2013, 2010; Barakat, 2010; Seiler and Zerz, 2010; Chyzak et al., 2005; Robertz, 2015). Luckily, there is a high-level description of the parametrizable systems.
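In the finite-dimensional special case R = F = ℝ, the two nullspace computations of Theorem 3.6 can be traced with plain linear algebra. A sympy sketch (illustrative, matrix chosen by us): B is the right nullspace of a real matrix A, and A′ the left nullspace of B; over a field the rows of A′ always lie in the row space of A, matching Corollary 3.7.

```python
import sympy as sp

# a full-row-rank real matrix A, standing in for an operator matrix
A = sp.Matrix([[1, 1, 0],
               [0, 1, 1]])

# right nullspace B of A: columns span { b : A b = 0 }
B = sp.Matrix.hstack(*A.nullspace())

# left nullspace A' of B: rows span { a : a B = 0 }
A_prime = sp.Matrix.vstack(*[v.T for v in B.T.nullspace()])

# AB = 0 by construction; over the field R, the rows of A' are
# contained in the row module of A, so sol(A) is parametrizable (Cor. 3.7)
```

Over an operator ring R this row-module membership test is exactly where Gröbner bases replace Gaussian elimination.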
Theorem 3.8 (Oberst, 1990, §7.(21)). A system sol_F(A) is parametrizable iff it is controllable.

The intuition for controllability is that one can partition the functions of the system into state and input, such that any chosen state can be reached by suitably manipulating the inputs. In particular, controllable systems (except the trivial system) are far away from having a unique solution. If A is not parametrizable, then the solution set sol_F(A′) is the subset of controllable behaviors in sol_F(A).

Reduced Gröbner bases generalize the reduced echelon form from linear systems to systems of polynomial (and hence linear operator) equations, by bringing them into a standard form. They are computed by Buchberger's algorithm, which is a generalization of the Gaussian and Euclidean algorithms and a special case of the Knuth-Bendix completion algorithm for rewriting systems. The generalization of Gröbner bases to vectors of polynomials is straightforward.

Gröbner bases make the above theorems algorithmic. Similar to the reduced echelon form, Gröbner bases allow to compute all solutions over R of a homogeneous system and to compute, if it exists, a particular solution over R of an inhomogeneous system. Solving homogeneous systems is the same as computing the right resp. left nullspace (of A resp. B). Solving inhomogeneous equations decides whether an element (the rows of A′) is contained in a module (the row module of A). A formal description of Gröbner bases exceeds the scope of this note. We refer to the excellent literature (Sturmfels, 2005; Eisenbud, 1995; Adams and Loustaunau, 1994; Greuel and Pfister, 2008; Gerdt, 2005; Buchberger, 2006). Not only do they generalize the Gaussian algorithm for linear polynomials, but also the Euclidean algorithm for univariate polynomials.
In addition to polynomial rings, Gröbner bases also exist for the Weyl algebra (Robertz, 2006, 2008; Chyzak et al., 2007; Levandovskyy, 2005; Levandovskyy and Schönemann, 2003) and many further rings. The algorithms used in this paper are usually readily available functions implemented in various computer algebra systems (Decker et al., 2019; Grayson and Stillman, 1992). While Gröbner bases depend on the choice of a term order, similar to reordering columns in the Gaussian algorithm, any term order leads to correct results.

Gröbner bases solve problems of high complexity, like ExpSpace completeness (Mayr, 1989; Mayr and Meyer, 1982; Bayer and Stillman, 1988). In practice, this is less of a problem, as the Gröbner basis computations only involve the operator equations, but no data. Hence we view the complexity of the Gröbner basis computations as O(1): they need to be performed only once to construct the covariance function. In particular, the Gröbner basis computations of every example in this paper terminate instantaneously. For larger examples, the data-dependent O(n³) of the Gaussian processes is the computationally restricting subalgorithm.

Example 3.9 (Lange-Hegermann, 2018, Example 4.4). We construct a prior for smooth tangent fields on the sphere without sources and sinks, using the Weyl algebra R = ℝ[x, y, z]⟨∂x, ∂y, ∂z⟩. I.e., we are interested in

    sol_F(A) = { v ∈ C^∞(S², ℝ³) | Av = 0 }  for  A := ( x    y    z
                                                          ∂x   ∂y   ∂z ).

The right nullspace

    B := ( −z∂y + y∂z
            z∂x − x∂z
           −y∂x + x∂y )

can be checked to yield a parametrization of sol_F(A). For a demonstration of this covariance function, see Figure 1.

4 Boundary conditions

Differential equations and boundary conditions go hand in hand in applications.
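Before turning to boundary conditions, the claim AB = 0 in Example 3.9 admits a quick symbolic sanity check (a sketch, not the paper's Gröbner basis computation): sympy does not compute in the Weyl algebra, but applying B to a generic smooth potential g and checking that both rows of A annihilate the result confirms that the image of B lies in sol_F(A). Surjectivity of B is not checked here.

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
g = sp.Function("g")(x, y, z)  # generic smooth potential

# v = B g, the parametrized field from Example 3.9
v = sp.Matrix([
    -z * sp.diff(g, y) + y * sp.diff(g, z),
     z * sp.diff(g, x) - x * sp.diff(g, z),
    -y * sp.diff(g, x) + x * sp.diff(g, y),
])

# first row of A: tangency to the sphere, x v1 + y v2 + z v3
tangency = sp.expand(x * v[0] + y * v[1] + z * v[2])

# second row of A: divergence-freeness
divergence = sp.expand(sp.diff(v[0], x) + sp.diff(v[1], y) + sp.diff(v[2], z))
```

Both expressions cancel identically, for every choice of g.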
Here, we recall a general method to incorporate boundary conditions into Gaussian processes, a slight generalization of (Graepel, 2003, Section 3), closely related to vertical rescaling. Boundary conditions in ODEs are equivalent to conditioning on data points (John et al., 2019). We recall the creation of priors for homogeneous boundary conditions for PDEs from Graepel (2003); for the inhomogeneous case see Section 6. Such boundary conditions fix the function values and/or their derivatives at a subset of the domain X exactly. We restrict ourselves to zero sets of polynomials. For more complicated, approximate boundary conditions see Solin and Kok (2019), and for asymptotic boundaries see Tan (2018).

Denote again by F = C^∞(X, ℝ) the set of smooth functions defined on X ⊂ ℝ^d compact. Let R′ ⊂ ℝ^X be a Noetherian ring of functions and a subring of R, and let M ⊆ X be implicitly defined as M = V(I) := { m ∈ X | f(m) = 0 for all f ∈ I } for an ideal I ⊴ R′ of equations. An important example for this setting is the Weyl algebra R = ℝ[x₁, …, x_d]⟨∂x₁, …, ∂x_d⟩ and its subring the polynomial ring R′ = ℝ[x₁, …, x_d].

Proposition 4.1. A row B′ = ( f₁ … f_ℓ″ ) whose entries generate the ideal I parametrizes all solutions of a homogeneous boundary condition f|_M = 0 for a function f ∈ F via B′ : F^(ℓ″×1) → F.

Proof. Let on the one hand p ∈ ℝ^d such that f_i(p) = 0 for all 1 ≤ i ≤ ℓ″. Then (B′g)(p) = 0 for all g ∈ F^(ℓ″×1). On the other hand, let p ∈ ℝ^d such that there is a 1 ≤ j ≤ ℓ″ with f_j(p) ≠ 0, and parametrize h ∈ F locally as h(x) = B′ · (h(x)/f_j(x)) e_j for e_j the j-th standard basis vector, since locally f_j(x) ≠ 0. For a global parametrization, patch the local parametrizations via a partition of unity.
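Proposition 4.1 can be illustrated on the boundary condition of Example 4.4 below: for any smooth h, the function x²·h vanishes at x = 0 together with its first x-derivative. A minimal sympy check (illustrative, not from the paper's code):

```python
import sympy as sp

x, y = sp.symbols("x y")
h = sp.Function("h")(x, y)  # arbitrary smooth parametrizing function

f = x**2 * h  # B' = ( x^2 ) applied to h

value_at_boundary = f.subs(x, 0)              # f(0, y)
deriv_at_boundary = sp.diff(f, x).subs(x, 0)  # (d/dx f)(x, y) at x = 0
```

Both expressions vanish identically, since every term of ∂/∂x (x² h) = 2x h + x² ∂h/∂x retains a factor x.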
To encode boundary conditions for ℓ > 1 functions, we use a direct sum matrix B′ ∈ (R′)^(ℓ×ℓℓ″), e.g.,

    B′ = ( B′₁  0
            0   B′₂ )

for ℓ = 2, where B′₁ and B′₂ are rows over R′ describing the boundaries.

Example 4.2. Functions F = C^∞([0,1]², ℝ) with Dirichlet boundary conditions f(0, y) = f(1, y) = f(x, 0) = f(x, 1) = 0 are parametrized by B′ = ( x(x−1) y(y−1) ).

Example 4.3. Functions F = C^∞(ℝ³, ℝ) with boundary condition f(0, 0, z) = 0 are parametrized by B′ = ( x  y ).

Example 4.4. Consider F = C^∞(ℝ², ℝ) with boundary conditions f(0, y) = ( ∂/∂x f(x, y) )|_{x=0} = 0. Such functions are parametrized by B′ = ( x² ), since

    ( ∂/∂x (x² f(x, y)) )|_{x=0} = ( 2x f(x, y) + x² ∂/∂x f(x, y) )|_{x=0} = 0.

5 Intersecting parametrizations

Now, we combine parametrizations B₁ ∈ R^(ℓ×ℓ″) and B₂ ∈ R^(ℓ×ℓ‴), e.g. from differential equations and boundary conditions, by intersecting their images B₁F^(ℓ″) ∩ B₂F^(ℓ‴).

Example 5.1. Actually, the Dirichlet boundary condition of Example 4.2 is an intersection of the images of the boundary conditions parametrized by ( x ), ( x−1 ), ( y ), and ( y−1 ).

The following theorem is the main contribution of this paper. It constructs a parametrization of intersections of parametrizations algorithmically.

Theorem 5.2 (Intersecting parametrizations). Let B₁ ∈ R^(ℓ×ℓ″₁) and B₂ ∈ R^(ℓ×ℓ″₂). Denote by

    C := ( C₁ ; C₂ ) ∈ R^((ℓ″₁+ℓ″₂)×m)

the right nullspace of the matrix

    B := ( B₁  B₂ ) ∈ R^(ℓ×(ℓ″₁+ℓ″₂)).

Then B₁C₁ = −B₂C₂ parametrizes the intersection B₁F^(ℓ″₁) ∩ B₂F^(ℓ″₂).

For the proof cf. the appendix. The computations are again Gröbner basis computations over the ring R.

Example 5.3. We rephrase the computation of divergence-free fields on the sphere from Example 3.9.
This is the intersection of divergence-free fields, the zero set of A₁ := ( ∂x  ∂y  ∂z ), and fields tangent to the sphere, the zero set of A₂ := ( x  y  z ), respectively parametrized by

    B₁ = (  0    ∂z  −∂y          B₂ = (  0   z  −y
          −∂z    0    ∂x                −z   0   x
           ∂y  −∂x    0  )               y  −x   0 ).

The right nullspace of ( B₁  B₂ ) is

    C = ( C₁ ; C₂ ) = ( x    ∂x   0
                        y    ∂y   0
                        z    ∂z   0
                        ∂x   0    x
                        ∂y   0    y
                        ∂z   0    z ).

The matrix ( B₁  B₂ ) is the left nullspace of C. Now,

    B₁C₁ = −B₂C₂ = (  z∂y − y∂z   0  0
                     −z∂x + x∂z   0  0
                      y∂x − x∂y   0  0 )

is equivalent⁸ to the matrix B from Example 3.9.

Example 5.4. We continue with the divergence-free fields on the sphere from Examples 3.9 and 5.3. These are parametrized by

    B₁ := ( −z∂y + y∂z
             z∂x − x∂z
            −y∂x + x∂y ).

Functions vanishing at the equator (boundary condition f(x, y, 0) = 0) are parametrized by

    B₂ := ( z 0 0
            0 z 0
            0 0 z ).

The nullspace of ( B₁  B₂ ) is

    C := ( C₁ ; C₂,₁ ; C₂,₂ ; C₂,₃ ) = ( −z²
                                          z²∂y − yz∂z − 2y
                                         −z²∂x + xz∂z + 2x
                                          yz∂x − xz∂y ).

The left nullspace of C is generated not only by ( B₁  B₂ ), but additionally by the relation D := ( 0  x  y  z ). This relation D tells us that the solutions parametrized by C₂ are vector fields tangent to a sphere around the origin, which they remain after being multiplied by the scalar matrix B₂. We gladly accept this additional condition. Now,

    B₁C₁ = −B₂C₂ = ( −z³∂y + yz²∂z + 2yz
                      z³∂x − xz²∂z − 2xz
                     −yz²∂x + xz²∂y )

parametrizes the divergence-free fields on the sphere vanishing at the equator, see Figure 1.

Remark 5.5. (Graepel, 2003) also constructs a Gaussian process prior for a system Af = y of linear differential equations with boundary conditions.
It assumes an arbitrary Gaussian process prior on f and uses a variant of Lemma 2.2 to compute the cross-covariance between y and f, which allows to condition the model p(f) on data for y. This model in no way ensures that f is constrained to solutions of Af = y, even if e.g. y = 0 is known. Furthermore, conditioning p(f) on data for f is just done w.r.t. the (uninformative) Gaussian process prior chosen for f.

⁸ The matrices B₁ and B₂ each have a non-zero nullspace, corresponding to the two trivial columns in B₁C₁.

As in this paper, (Graepel, 2003) uses Proposition 4.1 and a pushforward to construct a prior for f supported on solutions of the homogeneous boundary condition. No effort to combine differential equations and boundary conditions as in Theorem 5.2 is necessary, since the differential equations are not satisfied anyway. The case of inhomogeneous boundary conditions is solved by taking a particular solution as mean function. Finding such a particular solution is simple, as only the boundary conditions must be satisfied; in contrast to Section 6 of this paper, where the differential equations also need to be satisfied.

6 Inhomogeneous boundary conditions

So far, we have only considered homogeneous equations and boundary conditions, i.e., with right-hand sides zero. The fundamental theorem of homomorphisms (cf. Lemma 3.4) extends this to the inhomogeneous case, by taking a particular solution as mean function. While simple theoretically, finding a particular solution can be quite hard in practice. We restrict ourselves to examples.

Example 6.1. Consider smooth divergence-free fields on the 2-sphere X = S², i.e., f ∈ F^(3×1) with

    Af = ( x    y    z
           ∂x   ∂y   ∂z ) f = 0

and inhomogeneous boundary condition f₃(x, y, 0) = y. The function µ = ( 0  −z  y )^T is a particular solution. Hence, we take it as mean function.
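That µ = (0, −z, y)^T is indeed a particular solution in Example 6.1 is quickly verified symbolically (an illustrative sketch, not the paper's code): both rows of A annihilate µ, and the third component restricts to y on the equator z = 0.

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
mu = sp.Matrix([0, -z, y])  # candidate mean function from Example 6.1

tangency = sp.expand(x * mu[0] + y * mu[1] + z * mu[2])  # first row of A
divergence = (sp.diff(mu[0], x) + sp.diff(mu[1], y)
              + sp.diff(mu[2], z))                       # second row of A
boundary = mu[2].subs(z, 0)                              # should equal y
```

Here tangency = −yz + zy = 0, the divergence vanishes term by term, and the boundary value is exactly y.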
The matrix $B_1C_1 = -B_2C_2$ from Example 5.4 parametrizes functions with the corresponding homogeneous boundary condition $f_3(x, y, 0) = 0$ of functions vanishing at the equator. Hence, assuming mean zero and squared exponential covariance $k_{\mathcal{F}}$, the Gaussian process
$$\mathcal{GP}\left(\mu,\ (B_1C_1)\, k_{\mathcal{F}}\, ((B_1C_1)')^T\right)$$
is a prior distribution on the solutions of the equations and boundary conditions by Lemma 3.4, which we demonstrate in Figure 1.

Figure 1: On the left, the posterior mean of conditioning the prior in Example 3.9 at two opposite points on the equator with tangent vectors pointing north. Without sources and sinks, the tangent vectors flow south away from the data. In the middle, the posterior mean from Example 5.4 of a divergence-free tangent field on the sphere which is zero at the equator (red), conditioned on a single observation at the north pole. Notice how the flow parallel to the equator in middle latitudes, orthogonal to the observation, avoids sinks or sources. On the right, the posterior mean from Example 6.1 of a divergence-free tangent field on the sphere with the given boundary condition (red) at the equator, conditioned on a single observation at the north pole. Data is displayed artificially bigger.

Figure 2: A plot of the model from Example 6.2, conditioned on the vector $(0, 1)$ at the point $(0.5, 0.5)$, which is plotted artificially bigger and in red.

Example 6.2. Consider smooth divergence-free fields on the square $X = [0, 1] \times [0, 1]$ such that no flow in or out of $X$ is possible at the lower and upper boundary of $X$ and there is a constant flow of strength 1 in $x$-direction at the left and right boundary. The divergence-freeness is modelled by the right kernel $B_1 = \begin{pmatrix} \partial_y \\ -\partial_x \end{pmatrix}$ of $A = \begin{pmatrix} \partial_x & \partial_y \end{pmatrix}$. We model the conditions on the flow by the constant mean function $\mu : (x, y) \mapsto \begin{pmatrix} 1 & 0 \end{pmatrix}^T$
describing flow in $x$-direction, and the boundary condition parametrized by
$$B_2 = \begin{pmatrix} x(x-1) & 0 \\ 0 & y(y-1) \end{pmatrix}.$$
The nullspace of $\begin{pmatrix} B_1 & B_2 \end{pmatrix}$ is
$$C := \begin{pmatrix} C_1 \\ C_{2,1} \\ C_{2,2} \end{pmatrix} = \begin{pmatrix} x^2y^2 - x^2y - xy^2 + xy \\ -y^2\partial_y + y\partial_y - 2y + 1 \\ x^2\partial_x - x\partial_x + 2x - 1 \end{pmatrix}$$
and leads to the parametrization
$$P := B_1C_1 = -B_2C_2 = \begin{pmatrix} x(x-1)\left(-1 + y^2\partial_y + y(-\partial_y + 2)\right) \\ -y(y-1)\left(-1 + x^2\partial_x + x(-\partial_x + 2)\right) \end{pmatrix}.$$
Hence, assuming a squared exponential covariance $k_{\mathcal{F}}$ for the parametrizing function, the Gaussian process $\mathcal{GP}\left(\mu,\ P\, k_{\mathcal{F}}\, (P')^T\right)$ is a prior of smooth divergence-free fields on $X$ with the above flow conditions. We demonstrate this prior in Figure 2.

7 Conclusion

This paper incorporates prior knowledge into machine learning frameworks. It presents a novel framework to

1. describe parametrizations for boundary conditions,
2. combine parametrizations by intersecting their images, and
3. build Gaussian process priors with realizations in the solution set of a system of linear differential equations with boundary conditions,

without any assumptions or approximations. These priors have been demonstrated on geometric problems and lead to reasonable models with one or two (cf. Figure 1) data points. The author thanks the reviewers for their constructive feedback and is interested in further work on encoding physical or system-theoretic properties in Gaussian process priors.

A Proof of Lemma 2.2

Before giving the proof of Lemma 2.2, we recall the definition (if it exists) of the $\ell$-th cumulant function
$$\kappa_\ell(g)\left(x^{(1)}, \ldots, x^{(\ell)}\right) = \sum_{\pi \in \mathrm{part}(\ell)} (-1)^{|\pi|-1}(|\pi|-1)! \prod_{\tau \in \pi} \mathrm{E}\!\left[\prod_{i \in \tau} g\left(x^{(i)}\right)\right]$$
of a stochastic process $g$, where $\mathrm{part}(\ell)$ is the set of partitions of $\{1, \ldots, \ell\}$ and $|\pi|$ denotes the cardinality of $\pi$. In particular, the first two cumulant functions $\kappa_1$ resp. $\kappa_2$ are equal to the mean resp. covariance function.
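As a worked instance of this definition: for $\ell = 2$ the partitions are $\{\{1,2\}\}$ and $\{\{1\},\{2\}\}$, so

```latex
\kappa_2(g)\left(x^{(1)}, x^{(2)}\right)
  = \mathrm{E}\!\left[g\!\left(x^{(1)}\right) g\!\left(x^{(2)}\right)\right]
  - \mathrm{E}\!\left[g\!\left(x^{(1)}\right)\right]
    \mathrm{E}\!\left[g\!\left(x^{(2)}\right)\right],
```

the covariance function: the partition $\{\{1,2\}\}$ ($|\pi| = 1$) contributes the first term and $\{\{1\},\{2\}\}$ ($|\pi| = 2$, prefactor $(-1)^{1} \cdot 1! = -1$) the second. For $\ell = 1$ the only partition is $\{\{1\}\}$, recovering the mean $\kappa_1(g)\left(x^{(1)}\right) = \mathrm{E}\!\left[g\!\left(x^{(1)}\right)\right]$.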
Furthermore, $g$ is Gaussian iff all but the first two cumulant functions vanish.

The stochastic process $B_*g$ exists, as $\mathcal{F}$ is an $R$-module and the realizations of $g$ are all contained in $\mathcal{F}$. The compatibility with expectations proves the following formula for the cumulant functions $\kappa_\ell(B_*g)$ of $B_*g$, where $B^{(i)}$ denotes the operation of $B$ on functions with argument $x^{(i)} \in \mathbb{R}^d$:
$$\begin{aligned}
\kappa_\ell(B_*g)\left(x^{(1)}, \ldots, x^{(\ell)}\right)
&= \sum_{\pi \in \mathrm{part}(\ell)} (-1)^{|\pi|-1}(|\pi|-1)! \cdot \prod_{\tau \in \pi} \mathrm{E}\!\left[\prod_{i \in \tau} (B_*g)\left(x^{(i)}\right)\right] \\
&= \sum_{\pi \in \mathrm{part}(\ell)} (-1)^{|\pi|-1}(|\pi|-1)! \cdot \prod_{\tau \in \pi} \left(\prod_{i \in \tau} B^{(i)}\right) \mathrm{E}\!\left[\prod_{i \in \tau} g\left(x^{(i)}\right)\right]
&& \text{(as $B$ commutes with expectation)} \\
&= \sum_{\pi \in \mathrm{part}(\ell)} (-1)^{|\pi|-1}(|\pi|-1)! \cdot \hat{B} \prod_{\tau \in \pi} \mathrm{E}\!\left[\prod_{i \in \tau} g\left(x^{(i)}\right)\right]
&& \text{(as $\pi$ is a partition; $\hat{B} := \textstyle\prod_{i \in \{1,\ldots,\ell\}} B^{(i)}$)} \\
&= \hat{B} \sum_{\pi \in \mathrm{part}(\ell)} (-1)^{|\pi|-1}(|\pi|-1)! \cdot \prod_{\tau \in \pi} \mathrm{E}\!\left[\prod_{i \in \tau} g\left(x^{(i)}\right)\right]
&& \text{(as $B$ is linear)} \\
&= \hat{B}\, \kappa_\ell(g)\left(x^{(1)}, \ldots, x^{(\ell)}\right).
\end{aligned}$$
As $g$ is Gaussian, the higher ($\ell \ge 3$) cumulants $\kappa_\ell(g)$ vanish, hence the higher ($\ell \ge 3$) cumulants $\kappa_\ell(B_*g)$ vanish, which implies that $B_*g$ is Gaussian. The formulas for the mean resp. covariance function follow from the above computation for $\ell = 1$ resp. $\ell = 2$.

B Proof of Theorem 5.2

Before giving the proof of Theorem 5.2, we recall some definitions and facts from homological algebra and category theory (nLab authors, 2020; Mac Lane, 1998; Weibel, 1994; Cartan and Eilenberg, 1999). A collection of two morphisms with the same source $A_1 \xleftarrow{\alpha_1} B \xrightarrow{\alpha_2} A_2$ is a span, and a collection of two morphisms with the same range $C_1 \xrightarrow{\gamma_1} D \xleftarrow{\gamma_2} C_2$ is a cospan.
Given a cospan $C_1 \xrightarrow{\gamma_1} D \xleftarrow{\gamma_2} C_2$, an object $P$ together with two morphisms $\delta_1 : P \to C_1$ and $\delta_2 : P \to C_2$ is called a pullback, if $\gamma_1 \circ \delta_1 = \gamma_2 \circ \delta_2$ and for every $P'$ with two morphisms $\delta'_1 : P' \to C_1$ and $\delta'_2 : P' \to C_2$ such that $\gamma_1 \circ \delta'_1 = \gamma_2 \circ \delta'_2$ there exists a unique morphism $\pi : P' \to P$ such that $\delta_1 \circ \pi = \delta'_1$ and $\delta_2 \circ \pi = \delta'_2$. Pullbacks are the generalization of intersections.

Given a span $A_1 \xleftarrow{\alpha_1} B \xrightarrow{\alpha_2} A_2$, an object $P$ together with two morphisms $\beta_1 : A_1 \to P$ and $\beta_2 : A_2 \to P$ is called a pushout, if $\beta_1 \circ \alpha_1 = \beta_2 \circ \alpha_2$ and for every $P'$ with two morphisms $\beta'_1 : A_1 \to P'$ and $\beta'_2 : A_2 \to P'$ such that $\beta'_1 \circ \alpha_1 = \beta'_2 \circ \alpha_2$ there exists a unique morphism $\pi : P \to P'$ such that $\pi \circ \beta_1 = \beta'_1$ and $\pi \circ \beta_2 = \beta'_2$. Pullbacks and pushouts exist in the category of finitely presented modules.

Given an $R$-module $M$, an epimorphism $M \twoheadleftarrow R^m$ is a free cover of $M$ and a monomorphism $M \rightarrowtail R^m$ is a free hull of $M$. Every finitely presented $R$-module has a free cover, but it has a free hull iff it corresponds to a controllable system.

Given an $R$-module $M$, the contravariant hom-functor $\hom_R(-, M)$ is given by the hom-set $\hom_R(A, M) = \{\psi : A \to M \mid \psi \text{ an } R\text{-module homomorphism}\}$ when applied to an $R$-module $A$; applied to an $R$-module homomorphism $\varphi : A \to B$, it gives $\hom_R(\varphi, M) : \hom_R(B, M) \to \hom_R(A, M) : \beta \mapsto \beta \circ \varphi$. If $R$ is commutative, then $\hom_R(-, M)$ is a functor to the category of $R$-modules; otherwise, it is a functor to the category of Abelian groups.

By Corollary 3.7, the assumptions of Theorem 5.2 ensure that we have a parametrization $C$ of the system defined by $B$. As $C$ is the nullspace of $B$, we have $B_1C_1 = -B_2C_2$. The parametrization of an intersection of parametrizations $B_1\mathcal{F}^{\ell''_1} \cap B_2\mathcal{F}^{\ell''_2}$ is given by the image of the pullback $P$ of the cospan $\mathcal{F}^{\ell''_1} \xrightarrow{B_1} \mathcal{F}^{\ell} \xleftarrow{B_2} \mathcal{F}^{\ell''_2}$ in $\mathcal{F}^{\ell}$ by (Eisenbud, 1995, 15.10.8.a).
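In the simplest, constant-coefficient situation (matrices over a field instead of operator matrices over $R$), this pullback construction reduces to plain linear algebra: the right nullspace of the block matrix $\begin{pmatrix} B_1 & B_2 \end{pmatrix}$ consists of stacked vectors $(C_1; C_2)$ with $B_1C_1 = -B_2C_2$, and $B_1C_1$ spans the intersection of the images. A self-contained sketch with exact rational arithmetic (the example matrices are our own, not from the paper):

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form over the rationals; returns (R, pivot_cols)."""
    R = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if piv is None:
            continue
        R[r], R[piv] = R[piv], R[r]
        R[r] = [v / R[r][c] for v in R[r]]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                R[i] = [a - R[i][c] * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def nullspace(M):
    """Basis (list of column vectors) of the right nullspace of M."""
    R, pivots = rref(M)
    cols = len(M[0])
    basis = []
    for f in (c for c in range(cols) if c not in pivots):
        v = [Fraction(0)] * cols
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]
        basis.append(v)
    return basis

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# im B1 = the xy-plane, im B2 = span{(1,1,0), (0,0,1)}; the intersection
# is the line spanned by (1,1,0).
B1 = [[1, 0], [0, 1], [0, 0]]
B2 = [[1, 0], [1, 0], [0, 1]]
block = [r1 + r2 for r1, r2 in zip(B1, B2)]   # the block matrix (B1 B2)
(C,) = nullspace(block)                        # one-dimensional nullspace
C1, C2 = C[:2], C[2:]
w = matvec(B1, C1)                             # spans im B1 ∩ im B2
assert w[0] == w[1] and w[2] == 0 and w[0] != 0
assert matvec(B1, C1) == [-x for x in matvec(B2, C2)]   # B1*C1 = -B2*C2
```

Over an operator ring, the paper replaces this nullspace computation by syzygy computations via Gröbner bases; the constant case only illustrates the shape of the argument.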
The approach of Theorem 5.2 computes a subset⁹ of this image via a free cover $P \twoheadleftarrow \mathcal{F}^m$ of this pullback $P$, as the image of $B_1C_1 = -B_2C_2$, as depicted in the following commutative diagram:

(Commutative diagram: $\mathcal{F}^m$ maps to $\mathcal{F}^{\ell''_1}$ via $C_1$ and to $\mathcal{F}^{\ell''_2}$ via $-C_2$, factoring through the pullback $P$; composing with $B_1$ resp. $B_2$ into $\mathcal{F}^{\ell}$ yields the same map.)

As in Theorem 3.6 and Corollary 3.7, the computation is done dually over the ring $R$. There, the cospan $R^{1 \times \ell''_1} \xrightarrow{C_1} R^{1 \times m} \xleftarrow{C_2} R^{1 \times \ell''_2}$ defines a free hull $Q \xrightarrow{C} R^{1 \times m}$ of the pushout $Q$ of the span $R^{1 \times \ell''_1} \xleftarrow{B_1} R^{1 \times \ell} \xrightarrow{B_2} R^{1 \times \ell''_2}$. Applying the dualizing hom-functor $\hom_R(-, \mathcal{F})$ then transforms this to the function space $\mathcal{F}$. Even though all operations in this proof are algorithmic (Barakat and Lange-Hegermann, 2011), Theorem 5.2 describes a computationally more efficient algorithm.

⁹ To get the full image, we need $\mathcal{F}$ to be an injective module.

C Code

The following computations have been performed in Maple with the OreModules package (Chyzak et al., 2007).

Example C.1 (General code for GP regression).

> # code for GP regression
> GP:=proc(Kf,
>   points,yy,epsilon)
>   local n,m,kf,K,s1,s2,alpha,KStar;
>   n:=nops(points);
>   m:=RowDimension(Kf);
>   s1:=map(
>     a->[x1=a[1],y1=a[2],z1=a[3]],
>     points);
>   s2:=map(
>     a->[x2=a[1],y2=a[2],z2=a[3]],
>     points);
>   kf:=convert(Kf,listlist);
>   K:=convert(
>     evalf(
>       map(
>         a->map(
>           b->convert(
>             subs(a,subs(b,kf)),
>             Matrix),
>           s2),
>         s1)),
>     Matrix):
>   alpha:=yy.(K+epsilon^2)^(-1);
>   KStar:=map(
>     a->subs(a,kf),
>     s1):
>   KStar:=subs(
>     [x2=x,y2=y,z2=z],KStar):
>   KStar:=convert(
>     map(op,KStar),Matrix):
>   return alpha.KStar;
> end:

Example C.2 (Code for Example 3.9).
> restart;
> with(OreModules):
> with(LinearAlgebra):
> Alg:=DefineOreAlgebra(diff=[Dx,x],
>   diff=[Dy,y], diff=[Dz,z],
>   diff=[Dx1,x1], diff=[Dy1,y1],
>   diff=[Dz1,z1], diff=[Dx2,x2],
>   diff=[Dy2,y2], diff=[Dz2,z2],
>   polynom=[x,y,z,x1,x2,y1,y2,z1,z2]):
> A:=<<x,Dx>|...
> mu:=<1,0>;

    [ 1 ]
    [ 0 ]

> B2:=<<(x-1)*x,0>|<0,(y-1)*y>>;

    [ (x-1)*x      0    ]
    [    0     (y-1)*y  ]

> # combine
> B:=<B1|B2>:
> C:=Involution(
>   SyzygyModule(
>     Involution(B,Alg),
>     Alg),
>   Alg);

    [ x^2*y^2-x^2*y-x*y^2+x*y ]
    [ -Dy*y^2+Dy*y-2*y+1      ]
    [ Dx*x^2-Dx*x+2*x-1       ]

> # the new parametrization
> P:=Mult(B1,C[1,1],Alg);

    [  x*(-1+Dy*y^2+(-Dy+2)*y)*(x-1)  ]
    [ -(y-1)*y*(-1+Dx*x^2+(-Dx+2)*x)  ]

> # covariance for
> # parametrizing function
> SE:=exp(-1/2*(x1-x2)^2
>   -1/2*(y1-y2)^2):
> Kg:=unapply(
>   DiagonalMatrix([SE]),
>   (x1,y1,x2,y2)):
> # prepare covariance
> P2:=ApplyMatrix(P,
>   [xi(x,y)], Alg):
> P2:=convert(P2,list):
> l1:=[x=x1,y=y1,
>   Dx=Dx1,Dy=Dy1]:
> l2:=[x=x2,y=y2,
>   Dx=Dx2,Dy=Dy2]:
> # construct covariance
> # apply from one side
> Kf:=convert(
>   map(
>     b->subs(
>       [xi(x1,y1)=b[1]],
>       subs(l1,P2)),
>     convert(
>       Kg(x1,y1,x2,y2),
>       listlist)),
>   Matrix):
> # apply from other side
> Kf:=convert(
>   expand(
>     map(
>       b->subs(
>         [xi(x2,y2)=b[1]],
>         subs(l2,P2)),
>       convert(
>         Transpose(Kf),
>         listlist))),
>   Matrix):
> # code for GP regression
> GP:=proc(Kf,
>   points,yy,epsilon)
>   local n,m,kf,K,s1,s2,alpha,KStar;
>   n:=nops(points);
>   m:=RowDimension(Kf);
>   s1:=map(
>     a->[x1=a[1],y1=a[2]],
>     points);
>   s2:=map(
>     a->[x2=a[1],y2=a[2]],
>     points);
>   kf:=convert(Kf,listlist);
>   K:=convert(
>     evalf(
>       map(
>         a->map(
>           b->convert(
>             subs(a,subs(b,kf)),
>             Matrix),
>           s2),
>         s1)),
>     Matrix):
>   alpha:=yy.(K+epsilon^2)^(-1);
>   KStar:=map(
>     a->subs(a,kf),
>     s1):
>   KStar:=subs(
>     [x2=x,y2=y],KStar):
>   KStar:=convert(
>     map(op,KStar),Matrix):
>   return alpha.KStar;
> end:
> p:=[1/2,1/2]:
> mu_p:=Transpose(
>   subs(
>     [x=p[1],y=p[2]],
>     mu)):
> gp:=unapply(
>   factor(simplify(
>     convert(
>       GP(Kf,[p],<0|1>-mu_p,1e-5),
>       list)))
>   +convert(mu,list),
>   (x,y));

$$(x, y) \mapsto e^{-0.25 + 0.5x - 0.5x^2 + 0.5y - 0.5y^2} \cdot \Big[\, 1 + 16x\big(y^4(x-1) + y^3(x-2.5)(x-1) + y^2(0.5x + 1 - 1.5x^2) - y(x-1)(x-2.33) + (x-1)^2\big),$$
$$-16y\big(x^4(y-1) + x^3(y-2.5)(y-1) + x^2(0.5y + 1 - 1.5y^2) - x(y-1)(y-2.33) + (y-1)^2\big) \,\Big]$$

References

Adams, W. W. and Loustaunau, P. (1994). An Introduction to Gröbner Bases. Graduate Studies in Mathematics. American Mathematical Society.

Adler, R. (1981). The Geometry of Random Fields. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics.

Agrell, C. (2019). Gaussian Processes with Linear Operator Inequality Constraints. arXiv preprint arXiv:1901.03134.

Barakat, M. (2010). Purity Filtration and the Fine Structure of Autonomy. In Proceedings of the 19th International Symposium on Mathematical Theory of Networks and Systems - MTNS 2010, pages 1657-1661, Budapest, Hungary.

Barakat, M. and Lange-Hegermann, M. (2011). An axiomatic setup for algorithmic homological algebra and an alternative approach to localization. J. Algebra Appl., 10(2):269-293.

Barber, D. and Wang, Y. (2014). Gaussian processes for bayesian estimation in ordinary differential equations. In International conference on machine learning, pages 1485-1493.

Bayer, D. and Stillman, M. (1988). On the Complexity of Computing Syzygies. Journal of Symbolic Computation, 6(2-3):135-147.

Berns, F., Lange-Hegermann, M., and Beecks, C. (2020). Towards gaussian processes for automatic and interpretable anomaly detection in industry 4.0. In Proceedings of the International Conference on Innovative Intelligent Industrial Production and Logistics - Volume 1: IN4PL, pages 87-92. INSTICC, SciTePress.

Bertinet, A. and Agnan, T. C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics.
Kluwer Academic Publishers.

Buchberger, B. (2006). An Algorithm for Finding the Basis Elements of the Residue Class Ring of a Zero Dimensional Polynomial Ideal. J. Symbolic Comput., 41(3-4):475-511. Translated from the 1965 German original by Michael P. Abramson.

Calderhead, B., Girolami, M., and Lawrence, N. D. (2009). Accelerating bayesian inference over nonlinear differential equations with gaussian processes. In Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 21, pages 217-224. Curran Associates, Inc.

Cartan, H. and Eilenberg, S. (1999). Homological algebra. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ. With an appendix by David A. Buchsbaum. Reprint of the 1956 original.

Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural ordinary differential equations. In Advances in neural information processing systems, pages 6571-6583.

Chyzak, F., Quadrat, A., and Robertz, D. (2005). Effective algorithms for parametrizing linear control systems over Ore algebras. Appl. Algebra Engrg. Comm. Comput., 16(5):319-376.

Chyzak, F., Quadrat, A., and Robertz, D. (2007). OreModules: a Symbolic Package for the Study of Multidimensional Linear Systems. In Applications of time delay systems, volume 352 of Lecture Notes in Control and Inform. Sci., pages 233-264. Springer, Berlin.

Cobb, A. D., Everett, R., Markham, A., and Roberts, S. J. (2018). Identifying sources and sinks in the presence of multiple agents with gaussian process vector calculus. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1254-1262.

Cramér, H. and Leadbetter, M. R. (2004). Stationary and Related Stochastic Processes. Sample function properties and their applications. Reprint of the 1967 original.

Da Veiga, S. and Marrel, A.
(2012). Gaussian Process Modeling with Inequality Constraints. In Annales de la Faculté des sciences de Toulouse: Mathématiques, volume 21, pages 529-555.

Decker, W., Greuel, G.-M., Pfister, G., and Schönemann, H. (2019). Singular 4-1-2: A computer algebra system for polynomial computations. http://www.singular.uni-kl.de.

Duvenaud, D. (2014). Automatic Model Construction with Gaussian Processes. PhD thesis, University of Cambridge.

Eisenbud, D. (1995). Commutative Algebra with a View Toward Algebraic Geometry, volume 150 of Graduate Texts in Mathematics. Springer-Verlag.

Ganchev, K., Gillenwater, J., Taskar, B., et al. (2010). Posterior regularization for structured latent variable models. Journal of Machine Learning Research, 11(Jul):2001-2049.

Garnett, R., Osborne, M. A., Reece, S., Rogers, A., and Roberts, S. J. (2010). Sequential bayesian prediction in the presence of changepoints and faults. The Computer Journal, 53(9):1430-1446.

Garnett, R., Osborne, M. A., and Roberts, S. J. (2009). Sequential Bayesian Prediction in the Presence of Changepoints. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 345-352. ACM.

Gerdt, V. P. (2005). Involutive Algorithms for Computing Gröbner bases. In Computational commutative and non-commutative algebraic geometry, volume 196 of NATO Sci. Ser. III Comput. Syst. Sci., pages 199-225.

Graepel, T. (2003). Solving Noisy Linear Operator Equations by Gaussian Processes: Application to Ordinary and Partial Differential Equations. In Proceedings of the Twentieth International Conference on International Conference on Machine Learning, ICML'03, pages 234-241. AAAI Press.

Grayson, D. R. and Stillman, M. E. (1992). Macaulay2, a software system for research in algebraic geometry. http://www.math.uiuc.edu/Macaulay2/.

Greuel, G. and Pfister, G. (2008).
A Singular Introduction to Commutative Algebra. Springer, Berlin, extended edition. With contributions by Olaf Bachmann, Christoph Lossen and Hans Schönemann.

Gulian, M., Frankel, A., and Swiler, L. (2020). Gaussian process regression constrained by boundary value problems. arXiv preprint arXiv:2012.11857.

Herlands, W., Wilson, A., Nickisch, H., Flaxman, S., Neill, D., Van Panhuis, W., and Xing, E. (2016). Scalable gaussian processes for characterizing multidimensional change surfaces. In Artificial Intelligence and Statistics, pages 1013-1021.

Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H., Reid, G., Lawrence, N., and Rattray, M. (2015). Genome-wide Modeling of Transcription Kinetics Reveals Patterns of RNA Production Delays. Proceedings of the National Academy of Sciences, 112(42):13115-13120.

Jidling, C., Hendriks, J., Wahlström, N., Gregg, A., Schön, T. B., Wensrich, C., and Wills, A. (2018). Probabilistic Modelling and Reconstruction of Strain. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 436:141-155.

Jidling, C., Wahlström, N., Wills, A., and Schön, T. B. (2017). Linearly Constrained Gaussian Processes. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 1215-1224. Curran Associates, Inc.

John, D., Heuveline, V., and Schober, M. (2019). GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 3152-3162, Long Beach, California, USA. PMLR.

Kocijan, J., Murray-Smith, R., Rasmussen, C. E., and Girard, A. (2004).
Gaussian process model based predictive control. In Proceedings of the 2004 American control conference, volume 3, pages 2214-2219. IEEE.

Lange-Hegermann, M. (2018). Algorithmic Linearly Constrained Gaussian Processes. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, pages 2137-2148. Curran Associates, Inc.

Levandovskyy, V. (2005). Non-commutative Computer Algebra for polynomial algebras: Gröbner bases, applications and implementation. PhD thesis, University of Kaiserslautern.

Levandovskyy, V. and Schönemann, H. (2003). PLURAL: a Computer Algebra System for Noncommutative Polynomial Algebras. In Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, pages 176-183 (electronic). ACM.

Lima, G. S., Bessa, W. M., and Trimpe, S. (2018). Depth Control of Underwater Robots using Sliding Modes and Gaussian Process Regression. In Proceedings of the 15th Latin American Robotics Symposium, João Pessoa, Brazil.

Lloyd, J. R., Duvenaud, D., Grosse, R., Tenenbaum, J., and Ghahramani, Z. (2014). Automatic Construction and Natural-language Description of Nonparametric Regression Models. In Twenty-eighth AAAI conference on artificial intelligence.

Macêdo, I. and Castro, R. (2008). Learning Divergence-free and Curl-free Vector Fields with Matrix-valued Kernels. Instituto Nacional de Matematica Pura e Aplicada, Brasil, Tech. Rep.

Mac Lane, S. (1998). Categories for the working mathematician, volume 5 of Graduate Texts in Mathematics. Springer-Verlag, New York, second edition.

Mayr, E. (1989). Membership in Polynomial Ideals over Q is Exponential Space Complete. In STACS 89 (Paderborn, 1989), volume 349 of Lecture Notes in Comput. Sci., pages 400-406. Springer, Berlin.

Mayr, E. W. and Meyer, A. R. (1982).
The Complexity of the Word Problems for Commutative Semigroups and Polynomial Ideals. Advances in mathematics, 46(3):305-329.

Nguyen, N. C. and Peraire, J. (2015). Gaussian functional regression for linear partial differential equations. Computer Methods in Applied Mechanics and Engineering, 287:69-89.

Nguyen, N. C. and Peraire, J. (2016). Gaussian functional regression for output prediction: Model assimilation and experimental design. Journal of Computational Physics, 309:52-68.

nLab authors (2020). The nLab. (http://ncatlab.org/nlab/) [Online; accessed 12-April-2020].

Oberst, U. (1990). Multidimensional Constant Linear Systems. Acta Appl. Math., 20(1-2):1-175.

Osborne, M. A., Garnett, R., and Roberts, S. J. (2009). Gaussian Processes for Global Optimization. In 3rd international conference on learning and intelligent optimization (LION3), pages 1-15.

Osswald, H. (2012). Malliavin Calculus for Lévy Processes and Infinite-Dimensional Brownian Motion. Cambridge Tracts in Mathematics. Cambridge University Press.

Papoulis, A. and Pillai, S. U. (2002). Probability, Random Variables, and Stochastic Processes. McGraw-Hill Higher Education, 4 edition.

Quadrat, A. (2010). Systèmes et Structures: Une approche de la théorie mathématique des systèmes par l'analyse algébrique constructive. Habilitation thesis.

Quadrat, A. (2013). Grade Filtration of Linear Functional Systems. Acta Appl. Math., 127:27-86.

Raissi, M. (2018). Deep hidden physics models: Deep learning of nonlinear partial differential equations. Journal of Machine Learning Research, 19(25):1-24.

Raissi, M. and Karniadakis, G. E. (2017). Machine learning of linear differential equations using Gaussian processes.

Raissi, M. and Karniadakis, G. E. (2018). Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125-141.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2017). Machine learning of linear differential equations using gaussian processes. Journal of Computational Physics, 348:683-693.

Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2018). Numerical gaussian processes for time-dependent and nonlinear partial differential equations.

Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.

Robertz, D. (2003-2008). JanetOre: A Maple Package to Compute a Janet Basis for Modules over Ore Algebras.

Robertz, D. (2006). Formal Computational Methods for Control Theory. PhD thesis, RWTH Aachen.

Robertz, D. (2015). Recent progress in an algebraic analysis approach to linear systems. Multidimensional Syst. Signal Process., 26(2):349-388.

Särkkä, S. (2011). Linear Operators and Stochastic Partial Differential Equations in Gaussian Process Regression. In International Conference on Artificial Neural Networks, pages 151-158. Springer.

Särkkä, S. and Solin, A. (2019). Applied stochastic differential equations, volume 10. Cambridge University Press.

Scheuerer, M. and Schlather, M. (2012). Covariance Models for Divergence-free and Curl-free Random Vector Fields. Stochastic Models, 28(3):433-451.

Seiler, W. M. and Zerz, E. (2010). The Inverse Syzygy Problem in Algebraic Systems Theory. PAMM, 10(1):633-634.

Solin, A. and Kok, M. (2019). Know your boundaries: Constraining gaussian processes by variational harmonic features. arXiv preprint arXiv:1904.05207.

Solin, A., Kok, M., Wahlström, N., Schön, T. B., and Särkkä, S. (2018). Modeling and Interpolation of the Ambient Magnetic Field by Gaussian Processes. IEEE Transactions on Robotics, 34(4):1112-1127.

Song, Y., Zhu, J., and Ren, Y. (2016). Kernel bayesian inference with posterior regularization. In Lee, D. D., Sugiyama, M., Luxburg, U.
V., Guyon, I., and Garnett, R., editors, Advances in Neural Information Processing Systems 29, pages 4763-4771. Curran Associates, Inc.

Sturmfels, B. (2005). What is... a Gröbner Basis? Notices of the AMS, 52(10):2-3.

Tan, M. H. Y. (2018). Gaussian process modeling with boundary information. Statistica Sinica, pages 621-648.

Thewes, S., Lange-Hegermann, M., Reuber, C., and Beck, R. (2015). Advanced Gaussian Process Modeling Techniques. In Design of Experiments (DoE) in Powertrain Development. Expert.

Thornton, C., Hutter, F., Hoos, H. H., and Leyton-Brown, K. (2013). Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 847-855. ACM.

Treves, F. (1967). Topological Vector Spaces, Distributions and Kernels. Dover books on mathematics. Academic Press.

Wahlström, N., Kok, M., Schön, T. B., and Gustafsson, F. (2013). Modeling Magnetic Fields using Gaussian Processes. In Proceedings of the 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

Weibel, C. A. (1994). An introduction to homological algebra. Cambridge Studies in Advanced Mathematics. Cambridge University Press.

Wu, J., Poloczek, M., Wilson, A. G., and Frazier, P. (2017). Bayesian Optimization with Gradients. In Advances in Neural Information Processing Systems, pages 5267-5278.

Yang, X., Tartakovsky, G., and Tartakovsky, A. (2018). Physics-informed kriging: A physics-informed gaussian process regression method for data-model convergence. arXiv preprint arXiv:1809.03461.

Yuan, Y., Yang, X. T., Zhang, Z., and Zhe, S. (2020). Macroscopic traffic flow modeling with physics regularized gaussian process: A new insight into machine learning applications. arXiv preprint arXiv:2002.02374.
Zapała, A. M. (2002). Construction and Basic Properties of Gaussian Measures on Fréchet Spaces. Stochastic analysis and applications, 20(2):445-470.

Zerz, E. (2000). Topics in Multidimensional Linear Systems Theory, volume 256 of Lecture Notes in Control and Information Sciences. London.

Zerz, E., Seiler, W. M., and Hausdorf, M. (2010). On the Inverse Syzygy Problem. Communications in Algebra, 38(6):2037-2047.
