A Role for Symmetry in the Bayesian Solution of Differential Equations


Authors: Junyang Wang, Jon Cockayne, Chris J. Oates

Newcastle University, UK; Alan Turing Institute, UK

September 24, 2019

Abstract

The interpretation of numerical methods, such as finite difference methods for differential equations, as point estimators suggests that formal uncertainty quantification can also be performed in this context. Competing statistical paradigms can be considered and Bayesian probabilistic numerical methods (PNMs) are obtained when Bayesian statistical principles are deployed. Bayesian PNMs have the appealing property of being closed under composition, such that uncertainty due to different sources of discretisation in a numerical method can be jointly modelled and rigorously propagated. Despite recent attention, no exact Bayesian PNM for the numerical solution of ordinary differential equations (ODEs) has been proposed. This raises the fundamental question of whether exact Bayesian methods for (in general nonlinear) ODEs even exist. The purpose of this paper is to provide a positive answer for a limited class of ODEs. To this end, we work at a foundational level, where a novel Bayesian PNM is proposed as a proof-of-concept. Our proposal is a synthesis of classical Lie group methods, to exploit underlying symmetries in the gradient field, and non-parametric regression in a transformed solution space for the ODE. The procedure is presented in detail for first- and second-order ODEs and relies on a certain strong technical condition – the existence of a solvable Lie algebra – being satisfied. Numerical illustrations are provided.

1 Introduction

Numerical methods underpin almost all scientific, engineering and industrial output. In the abstract, a numerical task can be formulated as the approximation of a quantity of interest Q : Y → Q, subject to a finite computational budget.
The true underlying state y† ∈ Y is typically high- or infinite-dimensional, so that only limited information

    A : Y → A    (1)

is provided and exact computation of Q(y†) is prohibited. For example, numerical integration aims to approximate an integral Q(y†) = ∫ y†(t) dt given the values A(y†) = {(x_i, y†(x_i))}_{i=1}^n of the integrand y† on a finite number of abscissae {x_i}_{i=1}^n. Similarly, a numerical approximation to the solution Q(y†) = y† of a differential equation

    dy/dx = f(x, y(x)),   y(x_0) = y_0,

must be based on at most a finite number of evaluations of f, the gradient field. In this viewpoint a numerical method corresponds to a map b : A → Q, as depicted in Figure 1a, where b(a) represents an approximation to the solution of the differential equation based on the information a ∈ A.

The increasing ambition and complexity of contemporary applications is such that the computational budget can be extremely small compared to the precision that is required at the level of the quantity of interest. As such, in many important problems it is not possible to reduce the numerical error to a negligible level. Fields acutely associated with this challenge include climate forecasting (Wedi, 2014), computational cardiology (Chabiniok et al., 2016) and molecular dynamics (Perilla et al., 2015). In the presence of non-negligible numerical error, it is unclear how scientific interpretation of the output of computation can proceed. For example, a posteriori analysis of traditional numerical methods can be used to establish hard upper bounds on the numerical error, but these bounds typically depend on an unknown global constant. In the case of ODEs, this may be the maximum value of a norm ‖f‖ of the gradient field (see e.g. Estep, 1995). If ‖f‖ were known, it would be possible to provide a hard bound on numerical error.
However, in the typical numerical context all that is known is that ‖f‖ is finite. One could attempt to approximate ‖f‖ with cubature, but that itself requires a numerical cubature method whose error is required to obey a known bound. In general, therefore, there are no hard error bounds without global information being a priori provided (Larkin, 1974). Our aim in this paper is to consider, as an alternative to traditional numerical analysis, an exact Bayesian framework for solution uncertainty quantification in the ordinary differential equation context.

1.1 Probabilistic Numerical Methods

The field of probabilistic numerics dates back to Larkin (1972) and a modern perspective is provided in Hennig et al. (2015); Oates and Sullivan (2019). Under the abstract framework just described, numerical methods can be interpreted as point estimators in a statistical context, where the state y† can be thought of as a latent variable in a statistical model, and the 'data' consist of information A(y†) that does not fully determine the quantity of interest Q(y†) but is indirectly related to it. Hennig et al. (2015) provide an accessible introduction and survey of the field. In particular, they illustrated how PNM can be used to quantify uncertainty due to discretisation in important scientific problems, such as astronomical imaging.

Figure 1: Diagrams for a numerical method. (a) The traditional viewpoint of a numerical method is equivalent to a map b from a finite-dimensional information space A to the space of the quantity of interest Q. (b) The probabilistic viewpoint treats approximation of Q(y†) in a statistical context, described by a map B(µ, ·) from A to the space of probability distributions on Q. The probabilistic numerical method (A, B) is Bayesian if and only if (b) is a commutative diagram.
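To make the abstract maps concrete, the following is a minimal sketch (not from the paper) of the numerical integration example: an information operator A returning the finite information {(x_i, y(x_i))}, and a numerical method b : A → Q, here taken to be the trapezoidal rule. The function names and node choice are illustrative assumptions.

```python
import numpy as np

def information_operator(y, nodes):
    """A : Y -> A, returning the finite information {(x_i, y(x_i))}."""
    return [(x, y(x)) for x in nodes]

def b(a):
    """A numerical method b : A -> Q: the trapezoidal rule, mapping the
    finite information a to a point estimate of Q(y) = the integral of y."""
    xs = np.array([x for x, _ in a])
    ys = np.array([v for _, v in a])
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs) / 2.0))

# Point estimation of Q(y) = integral of y(t) = t^2 over [0, 1] (true value 1/3)
a = information_operator(lambda t: t**2, np.linspace(0.0, 1.0, 101))
estimate = b(a)
```

Note that b(a) is a single point estimate with no accompanying quantification of the error incurred by using only 101 evaluations; the probabilistic viewpoint in Figure 1b replaces b with a map into probability distributions precisely to express that residual uncertainty.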
Let the notation Σ_Y denote a σ-algebra on the space Y and let P_Y denote the set of probability measures on (Y, Σ_Y). A probabilistic numerical method (PNM) is a procedure which takes as input a 'belief' distribution µ ∈ P_Y, representing epistemic uncertainty with respect to the true (but unknown) value y†, along with a finite amount of information, A(y†) ∈ A. The output is a distribution B(µ, A(y†)) ∈ P_Q on (Q, Σ_Q), representing epistemic uncertainty with respect to the quantity of interest Q(y†) after the information A(y†) has been processed. For example, a PNM for an ordinary differential equation (ODE) takes an initial belief distribution defined on the solution space of the differential equation, together with information arising from a finite number of evaluations of the gradient field, plus the initial condition of the ODE, to produce a distribution over either the solution space of the ODE, or perhaps some derived quantity of interest. In this paper, the measurability of A and Q will be assumed.

Despite computational advances in this emergent field, until recently there had not been an attempt to establish rigorous statistical foundations for PNM. In Cockayne et al. (2019) the authors argued that Bayesian principles can be adopted. In brief, this framework requires that the input belief distribution µ carries the semantics of a Bayesian agent's prior belief and that the output of a PNM agrees with the inferences drawn when the agent is rational. To be more precise recall that, in this paper, information is provided in a deterministic manner [1] through (1) and thus Bayesian inference corresponds to conditioning of µ on the level sets of A. Let Q_# : P_Y → P_Q denote the push-forward map associated to Q, i.e. Q_#(µ)(S) = µ(Q⁻¹(S)) for all S ∈ Σ_Q. Let {µ^a}_{a∈A} ⊂ P_Y denote the disintegration, assumed to exist [2], of µ ∈ P_Y along the map A.
[1] It is of course possible to perform Bayesian inference in the noisy-data context, but for the ODEs considered in this paper we assume that one can obtain noiseless evaluations of the gradient field.

[2] The reader unfamiliar with the concept of a disintegration can treat µ^a as a technical notion of the 'conditional distribution of y given A(y) = a' when reading this work. The disintegration theorem, Thm. 1 of Chang and Pollard (1997), guarantees existence and uniqueness up to an A_#µ-null set under the weak requirement that Y is a metric space, Σ_Y is the Borel σ-algebra, µ is Radon, Σ_A is countably generated and Σ_A contains all singletons {a} for a ∈ A.

Definition 1. A probabilistic numerical method (A, B), with A : Y → A and B : P_Y × A → P_Q, for a quantity of interest Q : Y → Q, is Bayesian if and only if B(µ, a) = Q_#(µ^a) for all µ ∈ P_Y and all a ∈ A.

This definition is intuitive; the output of the PNM should coincide with the marginal distribution for Q(y†) according to the disintegration element µ^a ∈ P_Y, based on the information a ∈ A that was provided. The definition is equivalent to the statement that Figure 1b is a commutative diagram. In Cockayne et al. (2019) the map A was termed an information operator and the map B was termed a belief update operator; we adhere to these definitions in our work. The Bayesian approach to PNM confers several important benefits:

• The input µ and output B(µ, a) belief distributions can be interpreted, respectively, as a prior and (marginal) posterior [3]. As such, they automatically inherit the stronger formal semantics and philosophical foundations that underpin the Bayesian framework and, in this sense, are well-understood (see e.g. Gelman and Shalizi, 2013).

• The definition of Bayesian PNM is operational.
Thus, if we are presented with a prior µ and information a then there is a unique Bayesian PNM and it is constructively defined.

• The class of Bayesian PNM is closed under composition, such that uncertainty due to different sources of discretisation can be jointly modelled and rigorously propagated. This point will not be discussed further in this work, but we refer the interested reader to Section 5 of Cockayne et al. (2019).

Nevertheless, the strict definition of Bayesian PNM limits the scope to design convenient computational algorithms and indeed several proposed PNM are not Bayesian (see Table 1 in Cockayne et al., 2019). The challenge is two-fold; for a Bayesian PNM, the elicitation of an appropriate prior distribution µ and the exact computation of its disintegration {µ^a}_{a∈A} must both be addressed. In the next section we argue that – perhaps as a consequence of these constraints – a strictly Bayesian PNM for the numerical solution of an ODE does not yet exist.

1.2 Existing Work for ODEs

The first PNM (of any flavour) for the numerical solution of ODEs, of which we are aware, was due to Skilling (1992). Two decades later, this problem is receiving renewed critical attention as part of the active development of PNM. The aim of this section is to provide a high-level overview of existing work and to argue that existing methods do not adhere to the definition of Bayesian PNM.

[3] Indeed, if the set Y^a = {y ∈ Y : A(y) = a} is not measure zero under µ, then µ^a is the conditional distribution defined by restricting µ to the subset Y^a; µ^a(y) = 1[y ∈ Y^a] µ(y)/µ(Y^a). The theory of disintegrations generalises the conditional distribution µ^a to cases where Y^a is a null set.

Notation: The notational convention used in this paper is that the non-italicised y denotes a generic function, whereas the italicised y denotes a scalar value taken by the function y.
The notation y† is reserved for the true solution to an ODE. Throughout, the underlying state space Y is taken to be a space occupied by the true solution of the ODE, i.e. y† ∈ Y.

1.2.1 Skilling (1992)

The first paper on this topic, of which we are aware, was Skilling (1992). This will serve as a prototypical PNM for the numerical solution of an ODE. Originally described as 'Bayesian' by the author, we will argue that, at least in the strict sense of Definition 1, it is not a Bayesian PNM. Consider a generic univariate first-order initial value problem

    dy/dx = f(x, y(x)),   x ∈ [x_0, x_T],   y(x_0) = y_0.   (2)

Throughout this paper all ODEs that we consider will be assumed to be well-defined and admit a unique solution y† ∈ Y, where Y is some pre-specified set. In this paper the quantity of interest Q(y†) will either be the solution curve y† itself or the value y†(x_T) of the solution at a specific input (in this section it will be the former). The approach outlined in Skilling (1992) allows for a general prior µ ∈ P_Y. The gradient field f is treated as a 'black box' oracle that can be queried at a fixed computational cost. Thus we are provided with evaluations of the gradient field

    [f(x_0, y_0), ..., f(x_n, y_n)]ᵀ ∈ R^{n+1}

for certain input pairs {(x_i, y_i)}_{i=0}^n. This approach of treating evaluations of the gradient field as 'data' will be seen to be a common theme in existing PNM for ODEs, and theoretical support for this framework is rooted in the field of information-based complexity (Traub and Woźniakowski, 1992). Let a_i = f(x_i, y_i) and a^i = [a_0, ..., a_i]. The selection of the input pairs (x_i, y_i) on which f is evaluated is not constrained and several possibilities, of increasing complexity, were discussed in Skilling (1992).
To fix ideas, the simplest such approach is to proceed iteratively as follows:

(0.1) The first pair (x_0, y_0) is fully determined by the initial condition of the ODE.

(0.2) The oracle then provides one piece of information, a_0 = f(x_0, y_0).

(0.3) The prior µ is updated according to a_0, leading to a belief distribution µ_0 which is just the disintegration element µ^{a_0}.

(1) A discrete time step x_1 = x_0 + h, where h = (x_T − x_0)/n > 0, is performed and a particular point estimate y_1 = ∫ y(x_1) dµ_0(y) for the unknown true value y†(x_1) is obtained. This specifies the second pair (x_1, y_1).

The process continues similarly, such that at time step i − 1 we have a belief distribution µ_{i−1} = B(µ, a^{i−1}) ∈ P_Y, where the general belief update operator B is yet to be defined, and the following step is performed:

(i) Let x_i = x_{i−1} + h and set y_i = ∫ y(x_i) dµ_{i−1}(y).

The final output is a probability distribution µ_n = B(µ, a^n) ∈ P_Y. Now, strictly speaking, the method just described is not a PNM in the concrete sense that we have defined. Indeed, the final output µ_n is a deterministic function of the values a^n of the gradient field that were obtained. However, in the absence of additional assumptions on the global smoothness of the gradient field, the values of f(x, y) outside any open neighbourhood of the true solution curve C = {(x, y) : y = y†(x), x ∈ [x_0, x_T]} do not determine the solution of the ODE and, conversely, the solution of the ODE provides no information about the values of the gradient field outside any open neighbourhood of the true solution curve C. Thus it is not possible, in general, to write down an information operator A : Y → A that reproduces the information a^n when applied to the solution curve y†(·) of the ODE.
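The iterative scheme above can be sketched in code. The following is a schematic illustration, not the method of Skilling (1992) itself: it assumes a zero-mean Gaussian process prior with squared-exponential kernel (so the posterior mean is available in closed form), treats each oracle evaluation a_i as a noisy observation of the derivative dy/dx(x_i) with noise scale sigma, and uses the posterior mean at x_i as the point estimate y_i. All parameter choices (kernel, length-scale l, sigma) are illustrative assumptions.

```python
import numpy as np

def k(a, b, l=1.0):        # squared-exponential kernel, cov(y(a), y(b))
    return np.exp(-(a - b)**2 / (2 * l**2))

def k_d(a, b, l=1.0):      # d/db k(a, b): cov(y(a), y'(b))
    return ((a - b) / l**2) * k(a, b, l)

def k_dd(a, b, l=1.0):     # d^2/(da db) k(a, b): cov(y'(a), y'(b))
    return (1.0 / l**2 - (a - b)**2 / l**4) * k(a, b, l)

def skilling_sketch(f, x0, y0, xT, n, sigma=1e-2, l=1.0):
    """Schematic iterative scheme: at each step the GP posterior mean at x_i
    supplies the point estimate y_i, then the oracle is queried for
    a_i = f(x_i, y_i), which is treated as a noisy derivative observation."""
    h = (xT - x0) / n
    xs = [x0]                       # abscissae of derivative observations
    data = [f(x0, y0)]              # oracle values a_0, a_1, ...
    ys = [y0]

    def posterior_mean(x):
        X = np.array(xs)
        n_d = len(X)
        # Gram matrix over [value obs y(x0) = y0] + [derivative obs at xs]
        G = np.zeros((1 + n_d, 1 + n_d))
        G[0, 0] = k(x0, x0, l)
        G[0, 1:] = k_d(x0, X, l)
        G[1:, 0] = k_d(x0, X, l)
        G[1:, 1:] = k_dd(X[:, None], X[None, :], l) + sigma**2 * np.eye(n_d)
        z = np.concatenate(([y0], data))
        kx = np.concatenate(([k(x, x0, l)], k_d(x, X, l)))
        return kx @ np.linalg.solve(G, z)

    for i in range(1, n + 1):
        xi = x0 + i * h
        yi = posterior_mean(xi)     # point estimate of y†(x_i) under mu_{i-1}
        xs.append(xi)
        data.append(f(xi, yi))
        ys.append(yi)
    return np.array(ys)
```

For a gradient field that does not depend on its second argument, e.g. f(x, y) = cos(x), the derivative observations are exact and the returned trajectory tracks sin(x) closely; this is exactly the special case in which, as discussed below, treating the information as noisy mis-specifies the likelihood.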
The approach taken in Skilling (1992) was therefore to posit an approximate information operator Â and a particular belief update operator B, which are now described. The approximate information operator is motivated by the intuition that if y†(x_i) is well-approximated by y_i at the abscissa x_i, then dy†/dx(x_i) should be well-approximated by f(x_i, y_i). That is, the following approximate information operator Â was constructed:

    Â(y) = [dy/dx(x_0), ..., dy/dx(x_n)]ᵀ.   (3)

Of course, Â(y†) ≠ a^n in general. To acknowledge the approximation error, Skilling (1992) proposed to model the information with an approximate likelihood:

    dµ_n/dµ_0(y) = ∏_{i=1}^n dµ_i/dµ_{i−1}(y)   (4)

    dµ_i/dµ_{i−1}(y) ∝ exp( −(1/(2σ²)) (dy/dx(x_i) − f(x_i, y_i))² )   (5)

This was referred to in Skilling (1992) as simply a "likelihood" and, together with µ_0 = µ^{a_0}, the output µ_n is completely specified. Here σ is a fixed positive constant; in principle a non-diagonal covariance matrix can also be considered.

The negative consequences of basing inferences on an approximate information operator Â are potentially twofold. First, recall that values of the gradient field that are not contained on the true solution curve of the ODE do not, in principle, determine the true solution curve y†. It is therefore unclear whether these values should be taken into account at all. Second, in the special case where the gradient field f does not depend on its second argument, the quantities dy/dx(x_i) and f(x_i, y_i) are identical. From this perspective, µ_n represents inference under a mis-specified likelihood, since information is treated as erroneous when it is in fact exact. The use of a mis-specified likelihood violates the likelihood principle and implies violation of the Bayesian framework.
This confirms, through a different argument, that the approach of Skilling (1992) cannot be Bayesian in the strict sense of Definition 1.

1.2.2 Schober et al. (2014, 2019); Teymur et al. (2016, 2018)

After Skilling (1992), several authors have proposed improvements to the above method. The approach of Schober et al. (2014) considered Eq. (5) in the σ ↓ 0 limit. In order that exact conditioning can be performed in this limit, the input belief distribution µ was restricted to be a k-times integrated Wiener measure on the solution space of the ODE. The tractability of the integrated Wiener measure leads to a closed-form characterisation of the posterior and enables computation to be cast as a Kalman filter. This direction of research can be motivated by the following fact: for k ∈ {1, 2} the authors prove that if the input pair (x_1, y_1) is taken as y_1 = ∫ y(x_1) dµ_0(y), as indicated in Section 1.2.1, then the smoothing estimate ŷ_1 = ∫ y(x_1) dµ_1(y), i.e. the posterior mean for y(x_1) based on information a^1, coincides with the deterministic approximation to y†(x_1) that would be provided by a k-th order Runge-Kutta method. As such, theoretical guarantees such as local convergence order are inherited. For k = 3 it was shown that the same conclusion can be made to hold, provided that the input pair (x_1, y_1) is selected in a manner that is no longer obviously related to µ_0. In all cases the identification with a classical Runge-Kutta method does not extend beyond iteration n = 1. Similar connections to multistep methods of Nordsieck and Adams form were identified, respectively, in Schober et al. (2019) and Teymur et al. (2016, 2018). The approach of Schober et al.
(2014) is not Bayesian in the sense of Definition 1, which can again be deduced from dependence on values of the gradient field away from the true solution curve, so that the likelihood principle is violated.

1.2.3 Kersting and Hennig (2016)

The subsequent work of Kersting and Hennig (2016) attempted to elicit an appropriate non-zero covariance matrix for use in Eq. (5), in order to encourage uncertainty estimates to be better calibrated. Their proposal consisted of the approximate likelihood

    dµ_i/dµ_{i−1}(y) ∝ exp( −(1/2) ((dy/dx(x_i) − m_i)/σ_i)² )   (6)

    m_i = ∫ f(x_i, y(x_i)) dµ_{i−1}(y)   (7)

    σ_i² = ∫ (f(x_i, y(x_i)) − m_i)² dµ_{i−1}(y).   (8)

This can be viewed as the predictive marginal likelihood for the value f(x_i, y(x_i)) based on µ_{i−1}. From a practical perspective, the approach is somewhat circular, as the integrals in Eq. (7) and (8) involve the black-box gradient field f and therefore cannot be computed. The authors suggested a number of ways that these quantities could be numerically approximated [4], which involve evaluating f(x_i, y_i) at one or more values y_i that must be specified.

[4] One such method is Bayesian quadrature, another PNM wherein the integrand f is modelled as uncertain until it is evaluated. This raises separate philosophical challenges, as one must then ensure that the statistical models used for y(·) and f(x_i, ·) are logically consistent. In Kersting and Hennig (2016) these functions were simply modelled as independent.

The overall approach again violates the likelihood principle and is therefore not Bayesian in the sense of Definition 1.

1.2.4 Chkrebtii et al. (2016)

The original work of Chkrebtii et al. (2016) is somewhat related to Kersting and Hennig (2016); however, instead of using the mean of the current posterior as input to the gradient field, the input pair (x_i, y_i) was selected by sampling y_i from the marginal distribution for y(x_i) implied by µ_{i−1}. The approximate likelihood in this approach was taken as follows:

    dµ_i/dµ_{i−1}(y) ∝ exp( −(1/2) ((dy/dx(x_i) − f(x_i, y_i))/σ_i)² )

    m_i = ∫ dy/dx(x_i) dµ_{i−1}(y)

    σ_i² = ∫ (dy/dx(x_i) − m_i)² dµ_{i−1}(y).

Compared to Eq. (6), (7) and (8), this approach does not rely on integrals over the unknown gradient field. However, the approach also relies on the approximate information operator in Eq. (3) and is thus not Bayesian according to Definition 1.

1.2.5 Conrad et al. (2017); Abdulle and Garegnani (2018)

The approaches proposed in Conrad et al. (2017); Abdulle and Garegnani (2018) are not motivated in the Bayesian framework, but instead seek to introduce a stochastic perturbation into a classical numerical method. Both methods focus on the quantity of interest Q(y†) = y†(x_T). In the simple context of Eq. (2), the method of Conrad et al. (2017) augments the explicit Euler method with a stochastic perturbation:

    y_i = y_{i−1} + h f(x_{i−1}, y_{i−1}) + h² ε_i,   x_i = x_{i−1} + h,   i = 1, ..., n.

The distribution of the sequence (ε_i)_{i=1}^n must be specified. In the simplest case, where the ε_i are modelled as independent, say with ε_i ∼ ρ, the canonical flow map Φ_i : R → R of the explicit Euler method, defined as Φ_i(z) = z + h f(x_i, z), is replaced by the probabilistic counterpart Ψ_i : P_R → P_R given by

    Ψ_i(ν)(dz) = ∫ ρ( (dz − Φ_i(z̃)) / h² ) ν(dz̃),

through which stochasticity can be propagated. The output of the method of Conrad et al. (2017) is then B = Ψ_n ∘ ··· ∘ Ψ_1 δ(y_0), where δ(z) denotes an atomic distribution on z ∈ R.
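The perturbed Euler recursion above is straightforward to simulate. The following is a minimal sketch assuming independent standard Gaussian perturbations ε_i ~ N(0, 1) (the choice of ρ is an illustrative assumption); the output measure B is approximated by Monte Carlo over sample paths.

```python
import numpy as np

def perturbed_euler(f, x0, y0, xT, n, rng):
    """One sample path of the randomised explicit Euler method:
    y_i = y_{i-1} + h f(x_{i-1}, y_{i-1}) + h^2 eps_i, with eps_i ~ N(0, 1)."""
    h = (xT - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y = y + h * f(x, y) + h**2 * rng.standard_normal()
        x = x + h
    return y

# Monte Carlo approximation of the output measure B for dy/dx = y, y(0) = 1.
# With zero-mean perturbations the mean of B equals the deterministic Euler
# estimate (1 + h)^n of y(1) = e.
rng = np.random.default_rng(0)
samples = [perturbed_euler(lambda x, y: y, 0.0, 1.0, 1.0, 50, rng)
           for _ in range(2000)]
```

The spread of the samples is the method's expression of discretisation uncertainty; as noted next, for this linear ODE the 'true' per-step errors are monotone and growing, which the independent ε_i cannot capture.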
For the case where each ρ_i has zero mean, it can be shown that the mean of B equals Φ_n ∘ ··· ∘ Φ_1(y_0), which is exactly the deterministic approximation produced with the explicit Euler method. This framework can be practically problematic, since ε_i is charged with modelling the extrapolation error and such errors are not easily modelled as independent random variables – Section 2.8 of Higham (2002) is devoted to this point. Thus, if for example f(x, y) = y, the true linearisation error at step i is e^{x_i} − e^{x_{i−1}}, so that the 'true' sequence (ε_i)_{i=1}^n in this case is monotonic and exponentially unbounded. The challenge of designing a stochastic model for the sequence (ε_i)_{i=1}^n that reflects the highly structured nature of the error remains unresolved. On the other hand, the mathematical properties of this method are now well-understood (Lie et al., 2018, 2019). The proposal of Abdulle and Garegnani (2018) was to instead consider randomisation of the inputs {x_i}_{i=0}^T in the context of a classical numerical method, also outside of the Bayesian framework.

1.2.6 Cockayne et al. (2019); Tronarp et al. (2019)

The survey just presented begs the question of whether a Bayesian PNM for ODEs can exist at all. A first step toward this goal was taken in Cockayne et al. (2019), where it was argued that an information operator can be constructed if the vector field f is brought to the left-hand side in Eq. (2). Specifically, they proposed the information operator

    Ã(y) = [dy/dx(x_0) − f(x_0, y(x_0)), ..., dy/dx(x_n) − f(x_n, y(x_n))]ᵀ

for which the 'data' are trivial: ã^n = 0. It was rigorously established that the approximate likelihood

    dµ_{i,σ}/dµ_{i−1,σ}(y) = exp( −(1/(2σ²)) (dy/dx(x_i) − f(x_i, y(x_i)))² )

leads to an exact Bayesian PNM in the limit: µ_{n,σ} →_F µ^{ã^n} as σ ↓ 0, for Ã_#µ-almost all ã^n ∈ R^{n+1}.
Here →_F denotes convergence in an integral probability metric defined by a suitable set F of test functions (see Sec. 4 of Cockayne et al., 2019). However, the dependence of the information operator Ã on f means that this cannot be used as the basis for a practical method. Indeed, unless f depends linearly on its second argument and conjugacy properties of the prior can be exploited, the posterior cannot easily be characterised. Approximate techniques from nonlinear filtering were proposed to address this challenge in Tronarp et al. (2019).

1.3 Our Contribution

The comprehensive literature review in the previous section reveals not only that no Bayesian PNM has yet been proposed, but also that such an endeavour may be fundamentally difficult. Indeed, a theme that has emerged with existing PNM for ODEs, which can be traced back to Skilling (1992), is the use of approximate and subjective forms for the likelihood. The complex, implicit relationship between the latent ODE solution y† and the data f(x_i, y_i) arising from the gradient field appears to preclude use of an exact likelihood. Of course, violation of the likelihood principle is not traditionally a concern in the design of a numerical method, yet if the strictly richer output that comes with a Bayesian PNM is desired, then clearly adherence to the likelihood principle is important. It is therefore natural to ask the question, "under what conditions can exact Bayesian inference for ODEs be made?". This paper presents a proof-of-concept PNM for the numerical solution of a (limited) class of ODEs that (a) is Bayesian in the sense of Definition 1 and (b) can in principle be implemented. The method being proposed is indicated in Figure 2 and its main properties are as follows:

• The classical theory of Lie groups is exploited, for the first time in the context of PNM, to understand when an ODE of the form in Eq.
(2) can be transformed into an ODE whose gradient field is a function of the independent state variable only, reducing the ODE to an integral.

• For ODEs that admit a solvable Lie algebra, our proposal can be shown to simultaneously perform exact Bayesian inference on both the original and the Lie-transformed ODE. Crucially, as we explain later, to identify a Lie algebra only high-level a priori information about the ODE is required. The case of first- and second-order ODEs is presented in detail, but the method itself is general.

• In general the specification of prior belief can be difficult. The prior distributions that we construct are guaranteed to respect aspects of the structure of the ODE. As such, our priors are, to some extent, automatically adapted to the ODE at hand as opposed to being arbitrarily posited.

• In addition to the benefits conferred in the Bayesian framework, detailed in Section 1.1 and in Cockayne et al. (2019), the method being proposed can be computationally realised. On the other hand, there is a cost in terms of the run-time of the method that is substantially larger than that of existing, non-Bayesian approaches (especially classical numerical methods). As such, we consider this work to be a proof-of-concept rather than an applicable Bayesian PNM.

The remainder of the paper is structured as follows: Section 2 is dedicated to a succinct review of Lie group methods for ODEs. In Section 3, Lie group methods are exploited to construct priors over the solution space of the ODE whenever a solvable Lie algebra is admitted, and exact Bayesian inference is performed on a transformed version of the original ODE which takes the form of an integral. Numerical experiments are reported in Section 4. Finally, some conclusions and recommendations for future research are drawn in Section 5.
Figure 2: Schematic of our proposed approach. An n-th order ODE dy/dx = f(x, y(x)) that admits a solvable Lie algebra can be transformed, via the Lie transform (x, y) ↦ (r, s), into n integrals of the form ds/dr = g(r), to which exact Bayesian probabilistic numerical methods can be applied. The posterior measure on the transformed space is then pushed back through the inverse transformation (r, s) ↦ (x, y) onto the original domain of interest.

2 Background

This section provides a succinct overview of classical Lie group methods, introduced in the 19th century by Sophus Lie in the differential equation context (Hawkins, 2012). Lie developed the fundamental notion of a Lie group of transformations, which roughly correspond to maps that take one solution of the ODE to another. This provided a formal generalisation of certain algebraic techniques, such as dimensional analysis and transformations based on spatial symmetries, that can sometimes be employed to algebraically reduce the order of an ODE. This section proceeds as follows: first, in Section 2.1 we introduce a one-parameter Lie group of transformations and then, in Section 2.2, we explain what it means for a curve or a surface to be transformation-invariant. In Section 2.3 we recall consequences of Lie's theory in the ODE context. Last, in Section 2.4 the generalisation to multi-parameter Lie groups is indicated. Our development is heavily influenced by Bluman and Anco (2002) and we refer the reader to their book when required.

2.1 One-Parameter Lie Groups of Transformations

The purpose of this section is to recall essential definitions, together with the first fundamental theorem of Lie, which relates a Lie group of transformations to its infinitesimal generator. In what follows we consider a fixed domain D ⊂ R^d and denote a generic state variable as x = (x_1, ..., x_d) ∈ D.
Definition 2 (One-Parameter Group of Transformations). A one-parameter group of transformations on D is a map X : D × S → D, defined on D × S for some set S ⊂ R, together with a bivariate map φ : S × S → S, such that the following hold:

(1) For each ε ∈ S, the transformation X(·, ε) is a bijection on D.

(2) (S, φ) forms a group with law of composition φ.

(3) If ε_0 is the identity element in (S, φ), then X(·, ε_0) is the identity map on D.

(4) For all x ∈ D and ε, δ ∈ S, if x* = X(x, ε) and x** = X(x*, δ), then x** = X(x, φ(ε, δ)).

In what follows we continue to use the shorthand notation x* = X(x, ε). The notion of a Lie group additionally includes smoothness assumptions on the maps that constitute a group of transformations. Recall that a real-valued function is analytic if it can be locally expressed as a convergent power series.

Definition 3 (One-Parameter Lie Group of Transformations). Let X, together with φ, form a one-parameter group of transformations on D. Then we say that X, together with φ, form a one-parameter Lie group of transformations on D if, in addition, the following hold:

(5) S is a (possibly unbounded) interval in R.

(6) For each ε ∈ S, X(·, ε) is infinitely differentiable in D.

(7) For each x ∈ D, X(x, ·) is an analytic function on S.

(8) φ is analytic on S × S.

Without loss of generality it will be assumed, through re-parametrisation if required, that S contains the origin and ε = 0 is the identity element in (S, φ). The definition is illustrated through three examples:

Example 1 (Translation in the x-Axis). The one-parameter transformation x*_1 = x_1 + ε, x*_2 = x_2, for ε ∈ R, forms a Lie group of transformations on D = R² with group composition law φ(ε, δ) = ε + δ.
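The conditions of Definition 2 can be checked symbolically for Example 1. The following is a small illustration (not part of the paper, and assuming sympy is available) that verifies the identity and composition properties for the translation group, and differentiates the transformation at ε = 0, anticipating the infinitesimal of Definition 4 below.

```python
import sympy as sp

x1, x2, eps, delta = sp.symbols('x1 x2 epsilon delta', real=True)

# Example 1: translation in the x-axis, X((x1, x2), eps) = (x1 + eps, x2)
def X(x, e):
    return (x[0] + e, x[1])

x_star = X((x1, x2), eps)      # x*  = X(x, eps)
x_star2 = X(x_star, delta)     # x** = X(x*, delta)

# Property (4): composing transformations matches the group law
# phi(eps, delta) = eps + delta
composed = X((x1, x2), eps + delta)
assert sp.simplify(x_star2[0] - composed[0]) == 0

# Property (3): eps = 0 acts as the identity map on D
assert X((x1, x2), 0) == (x1, x2)

# Differentiating at eps = 0 yields (1, 0), the infinitesimal of Definition 4
xi = tuple(sp.diff(c, eps).subs(eps, 0) for c in x_star)
```

The same computation applied to the rotation group of Example 2 recovers the infinitesimal (−x_2, x_1) stated in Example 5.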
The one-parameter transformation $x_1^* = x_1 \cos(\epsilon) - x_2 \sin(\epsilon)$, $x_2^* = x_1 \sin(\epsilon) + x_2 \cos(\epsilon)$, for $\epsilon \in \mathbb{R}$, again forms a Lie group of transformations on $D = \mathbb{R}^2$ with group composition law $\phi(\epsilon, \delta) = \epsilon + \delta$.

Example 3 (Cyclic Group $C_p$). Let $D = \{1, 2, 3, \ldots, p\}$ and let $S = \mathbb{Z}$. For $n \in D$ and $m \in S$, let $X(n, m) = n + m \pmod{p}$. Then $X$, together with $\phi(a, b) = a + b$, defines a one-parameter group of transformations on $D$, but is not a Lie group of transformations since (5) is violated.

The first fundamental theorem of Lie establishes that a Lie group of transformations can be characterised by its infinitesimal generator, defined next:

Definition 4 (Infinitesimal Transformation). Let $X$ be a one-parameter Lie group of transformations. Then the transformation
$$x^* = x + \epsilon\, \xi(x), \qquad \xi(x) = \frac{\partial X(x, \epsilon)}{\partial \epsilon}\bigg|_{\epsilon = 0},$$
is called the infinitesimal transformation associated to $X$, and the map $\xi$ is called an infinitesimal.

Definition 5 (Infinitesimal Generator). The infinitesimal generator of a one-parameter Lie group of transformations $X$ is defined to be the operator $\mathrm{X} = \xi \cdot \nabla$, where $\xi$ is the infinitesimal associated to $X$ and $\nabla = (\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_d})$ is the gradient.

Example 4 (Ex. 1, continued). For Ex. 1 we have
$$\xi(x) = \left( \frac{d x_1^*}{d\epsilon}\bigg|_{\epsilon=0}, \frac{d x_2^*}{d\epsilon}\bigg|_{\epsilon=0} \right) = (1, 0),$$
so the infinitesimal generator for translation in the $x$-axis is $\mathrm{X} = \frac{\partial}{\partial x_1}$.

Example 5 (Ex. 2, continued). Similarly, the infinitesimal generator for the rotation group is $\mathrm{X} = -x_2 \frac{\partial}{\partial x_1} + x_1 \frac{\partial}{\partial x_2}$.

The first fundamental theorem of Lie provides a constructive route to obtain the infinitesimal generator from the transformation itself:

Theorem 1 (First Fundamental Theorem of Lie; see pages 39-40 of Bluman and Anco (2002)).
A one-parameter Lie group of transformations $X$ is characterised by the initial value problem
$$\frac{dx^*}{d\tau} = \xi(x^*), \qquad x^* = x \text{ when } \tau = 0, \qquad (9)$$
where $\tau(\epsilon)$ is a parametrisation of $\epsilon$ which satisfies $\tau(0) = 0$ and, for $\epsilon \neq 0$,
$$\tau(\epsilon) = \int_0^{\epsilon} \frac{\partial \phi(a, b)}{\partial b}\bigg|_{(a, b) = (\delta^{-1}, \delta)} \, d\delta.$$
Here $\delta^{-1}$ denotes the group inverse element for $\delta$.

Since Eq. (9) is translation-invariant in $\tau$, it follows that without loss of generality we can assume a parametrisation $\tau(\epsilon)$ such that the group action becomes $\phi(\tau_1, \tau_2) = \tau_1 + \tau_2$ and, in particular, $\tau^{-1} = -\tau$. In the remainder of the paper, for convenience we assume that all Lie groups are parametrised such that the group action is $\phi(\epsilon_1, \epsilon_2) = \epsilon_1 + \epsilon_2$.

The next result can be viewed as a converse to Theorem 1, as it shows how to obtain the transformation from the infinitesimal generator. All proofs are reserved for Supplemental Section A.1.

Theorem 2. A one-parameter Lie group of transformations with infinitesimal generator $\mathrm{X}$ is equivalent to $x^* = e^{\epsilon \mathrm{X}} x$, where $e^{\epsilon \mathrm{X}} x = \sum_{k=0}^{\infty} \frac{\epsilon^k}{k!} \mathrm{X}^k x$.

The following is immediate from the proof of Theorem 2:

Corollary 1. If $F$ is infinitely differentiable, then $F(x^*) = e^{\epsilon \mathrm{X}} F(x)$.

2.2 Invariance Under Transformation

In this section we explain what it means for a curve or a surface to be invariant under a Lie group of transformations, and how this notion relates to the infinitesimal generator.

Definition 6 (Invariant Function). A function $F : D \to \mathbb{R}$ is said to be invariant under a one-parameter Lie group of transformations $x^* = X(x, \epsilon)$ if $F(x^*) = F(x)$ for all $x \in D$ and $\epsilon \in S$.

Based on the results in Section 2.1, one might expect that invariance under a transformation can be expressed in terms of the infinitesimal generator of the transformation. This is indeed the case:

Theorem 3.
A differentiable function $F : D \to \mathbb{R}$ is invariant under a one-parameter Lie group of transformations with infinitesimal generator $\mathrm{X}$ if and only if $\mathrm{X} F(x) = 0$ for all $x \in D$.

Theorem 4. For a function $F : D \to \mathbb{R}$ and a one-parameter Lie group of transformations $x^* = X(x, \epsilon)$, the relation $F(x^*) = F(x) + \epsilon$ holds for all $x \in D$ and $\epsilon \in S$ if and only if $\mathrm{X} F(x) = 1$ for all $x \in D$.

The following definition is fundamental to the method proposed in Section 3:

Definition 7 (Canonical Coordinates). Consider a coordinate system $r = (r_1(x), \ldots, r_n(x))$ on $D$. Then any one-parameter Lie group of transformations $x^* = X(x, \epsilon)$ induces a transformation of the coordinates, $r_i^* = r_i(x^*)$. The coordinate system $r$ is called canonical for the transformation if $r_1^* = r_1, \ldots, r_{n-1}^* = r_{n-1}$ and $r_n^* = r_n + \epsilon$.

Example 6 (Ex. 2, continued). For the rotation group in Ex. 2, we have canonical coordinates
$$r_1(x_1, x_2) = \sqrt{x_1^2 + x_2^2}, \qquad r_2(x_1, x_2) = \arctan(x_2 / x_1).$$

In canonical coordinates, a one-parameter Lie group of transformations can be viewed as a straightforward translation in the $r_n$-axis. The existence of canonical coordinates is established in Thm. 2.3.5-2 of Bluman and Anco (2002). Note that Thms. 3 and 4 imply that $\mathrm{X} r_i = 0$ for $i = 1, 2, \ldots, n-1$ and $\mathrm{X} r_n = 1$.

Definition 8 (Invariant Surface). For a function $F : D \to \mathbb{R}$, a surface defined by $F(x) = 0$ is said to be invariant under a one-parameter Lie group of transformations $x^* = X(x, \epsilon)$ if and only if $F(x^*) = 0$ whenever $F(x) = 0$, for all $x \in D$ and $\epsilon \in S$.

The invariance of a surface, as for a function, can be cast in terms of an infinitesimal generator:

Corollary 2. A surface $F(x) = 0$ is invariant under a one-parameter Lie group of transformations with infinitesimal generator $\mathrm{X}$ if and only if $\mathrm{X} F(x) = 0$ whenever $F(x) = 0$.
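The definitions above are mechanical enough to check with a computer algebra system. The following sketch (using sympy; variable names are ours) recovers the infinitesimal of Example 5 for the rotation group of Example 2 by differentiating at $\epsilon = 0$, and verifies the canonical coordinates of Example 6 via Theorems 3 and 4.

```python
import sympy as sp

x1, x2, eps = sp.symbols('x1 x2 epsilon', real=True)

# Rotation group (Example 2): (x1, x2) -> (x1*, x2*)
x1s = x1*sp.cos(eps) - x2*sp.sin(eps)
x2s = x1*sp.sin(eps) + x2*sp.cos(eps)

# Infinitesimal (Definition 4): differentiate the group action at epsilon = 0
xi = (sp.diff(x1s, eps).subs(eps, 0), sp.diff(x2s, eps).subs(eps, 0))
assert xi == (-x2, x1)   # generator X = -x2 d/dx1 + x1 d/dx2 (Example 5)

# Action of the generator X on a scalar expression
def X(F):
    return -x2*sp.diff(F, x1) + x1*sp.diff(F, x2)

# Canonical coordinates (Example 6): X r1 = 0 and X r2 = 1
r1 = sp.sqrt(x1**2 + x2**2)
r2 = sp.atan(x2/x1)
assert sp.simplify(X(r1)) == 0       # r1 is invariant (Theorem 3)
assert sp.simplify(X(r2) - 1) == 0   # r2 translates by epsilon (Theorem 4)
```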
2.3 Symmetry Methods for ODEs

The aim of this section is to relate Lie transformations to the ODEs that admit them. These techniques form the basis for our proposed method in Section 3. For an ODE of the form in Eq. (2), one can consider the action of a transformation on the coordinates $(x, y)$; i.e. a special case of the above framework where the generic coordinates $x_1$ and $x_2$ are respectively the independent ($x$) and dependent ($y$) variables of the ODE. It is clear that such a transformation also induces a transformation of the derivatives $y_m := \frac{d^m y}{dx^m}$. Indeed, consider a one-parameter Lie group of transformations $(x^*, y^*) = (X(x, y; \epsilon), Y(x, y; \epsilon))$. Then it follows from the chain rule that $y_m^* := \frac{d^m y^*}{d(x^*)^m}$ is a function of $x, y, y_1, \ldots, y_m$, and we denote $y_m^* = Y_m(x, y, y_1, \ldots, y_m; \epsilon)$. As an explicit example:
$$y_1^* = \frac{dy^*}{dx^*} = \frac{\frac{\partial Y(x, y; \epsilon)}{\partial x} + y_1 \frac{\partial Y(x, y; \epsilon)}{\partial y}}{\frac{\partial X(x, y; \epsilon)}{\partial x} + y_1 \frac{\partial X(x, y; \epsilon)}{\partial y}} =: Y_1(x, y, y_1; \epsilon)$$
In general:
$$y_m^* = \frac{\frac{\partial y_{m-1}^*}{\partial x} + y_1 \frac{\partial y_{m-1}^*}{\partial y} + y_2 \frac{\partial y_{m-1}^*}{\partial y_1} + \cdots + y_m \frac{\partial y_{m-1}^*}{\partial y_{m-1}}}{\frac{\partial X(x, y; \epsilon)}{\partial x} + y_1 \frac{\partial X(x, y; \epsilon)}{\partial y}} =: Y_m(x, y, y_1, \ldots, y_m; \epsilon)$$
In this sense a transformation defined on $(x, y)$ can be naturally extended to a transformation on $(x, y, y_1, y_2, \ldots)$, as required.

Definition 9 (Admitted Transformation). An $m$th order ODE $F(x, y, y_1, \ldots, y_m) = 0$ is said to admit a one-parameter Lie group of transformations $(x^*, y^*) = (X(x, y; \epsilon), Y(x, y; \epsilon))$ if the surface $F$ defined by the ODE is invariant under the Lie group of transformations, i.e. if $F(x^*, y^*, y_1^*, \ldots, y_m^*) = 0$ whenever $F(x, y, y_1, \ldots, y_m) = 0$.

Example 7. Clearly any ODE of the form $\frac{dy}{dx} = F(x)$ admits the transformation $(x^*, y^*) = (x, y + \epsilon)$.
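To make the formula for $Y_1$ concrete, the following sketch (sympy, our notation) applies it to the vertical translation group of Example 7, under which slopes are unchanged, and to the rotation group of Example 2 acting on the $(x, y)$ plane.

```python
import sympy as sp

x, y, y1, eps = sp.symbols('x y y1 epsilon', real=True)

def Y1(X, Y):
    """First extended transformation y1* = Y1(x, y, y1; eps)."""
    num = sp.diff(Y, x) + y1*sp.diff(Y, y)
    den = sp.diff(X, x) + y1*sp.diff(X, y)
    return sp.simplify(num/den)

# Translation group of Example 7: (x, y) -> (x, y + eps)
assert Y1(x, y + eps) == y1   # slopes are unchanged by vertical translation

# Rotation group of Example 2 acting on (x, y)
expr = Y1(x*sp.cos(eps) - y*sp.sin(eps), x*sp.sin(eps) + y*sp.cos(eps))
expected = (sp.sin(eps) + y1*sp.cos(eps))/(sp.cos(eps) - y1*sp.sin(eps))
assert sp.simplify(expr - expected) == 0
```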
Our next task is to understand how the infinitesimal generator of a transformation can be extended to act on derivatives $y_m$.

Definition 10 (Extended Infinitesimal Transformation). The $m$th extended infinitesimals of a one-parameter Lie group of transformations $(x^*, y^*) = (X(x, y; \epsilon), Y(x, y; \epsilon))$ are defined as the functions $\xi, \eta, \eta^{(1)}, \ldots, \eta^{(m)}$ for which the following equations hold:
$$x^* = X(x, y; \epsilon) = x + \epsilon\, \xi(x, y) + O(\epsilon^2)$$
$$y^* = Y(x, y; \epsilon) = y + \epsilon\, \eta(x, y) + O(\epsilon^2)$$
$$y_1^* = Y_1(x, y, y_1; \epsilon) = y_1 + \epsilon\, \eta^{(1)}(x, y, y_1) + O(\epsilon^2)$$
$$\vdots$$
$$y_m^* = Y_m(x, y, y_1, \ldots, y_m; \epsilon) = y_m + \epsilon\, \eta^{(m)}(x, y, y_1, y_2, \ldots, y_m) + O(\epsilon^2)$$

It can be shown straightforwardly via induction that
$$\eta^{(m)}(x, y, y_1, y_2, \ldots, y_m) = \frac{d^m \eta}{dx^m} - \sum_{k=1}^{m} \frac{m!}{(m-k)! \, k!} \, y_{m-k+1} \frac{d^k \xi}{dx^k} \qquad (10)$$
where $\frac{d}{dx}$ denotes the full derivative with respect to $x$, i.e. $\frac{d}{dx} = \frac{\partial}{\partial x} + y_1 \frac{\partial}{\partial y} + \sum_{k=2}^{m+1} y_k \frac{\partial}{\partial y_{k-1}}$. It follows that $\eta^{(m)}$ is a polynomial in $y_1, y_2, \ldots, y_m$ whose coefficients are linear combinations of $\xi$, $\eta$ and their partial derivatives up to $m$th order.

Definition 11 (Extended Infinitesimal Generator). The $m$th extended infinitesimal generator is defined as
$$\mathrm{X}^{(m)} = \xi(x, y) \frac{\partial}{\partial x} + \eta(x, y) \frac{\partial}{\partial y} + \eta^{(1)}(x, y, y_1) \frac{\partial}{\partial y_1} + \cdots + \eta^{(m)}(x, y, y_1, \ldots, y_m) \frac{\partial}{\partial y_m},$$
i.e. $\mathrm{X}^{(m)} = (\xi, \eta, \eta^{(1)}, \ldots, \eta^{(m)}) \cdot \nabla$, where $\nabla = (\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial y_1}, \ldots, \frac{\partial}{\partial y_m})$ is the extended gradient.

The following corollaries are central to the actual computation of the Lie groups admitted by an ODE.

Corollary 3. A differentiable function $F : D_m \to \mathbb{R}$, where $D_m$ is the phase space containing elements of the form $(x, y, y_1, \ldots, y_m)$, is invariant under a one-parameter Lie group of transformations with extended infinitesimal generator $\mathrm{X}^{(m)}$ if and only if $\mathrm{X}^{(m)} F(x, y, y_1, \ldots$
$, y_m) = 0$ for all $(x, y, y_1, \ldots, y_m) \in D_m$.

Corollary 4 (Infinitesimal Criterion for Symmetries Admitted by an ODE). A one-parameter Lie group of transformations is admitted by the $m$th order ODE $F(x, y, y_1, \ldots, y_m) = 0$ if and only if its extended infinitesimal generator $\mathrm{X}^{(m)}$ satisfies $\mathrm{X}^{(m)} F(x, y, y_1, \ldots, y_m) = 0$ whenever $F(x, y, y_1, \ldots, y_m) = 0$.

2.4 Multi-Parameter Lie Groups and Lie Algebras

To leverage the full power of Lie symmetry methods for ODEs of order $m \geq 2$, we need to consider multiple Lie symmetries, which are collectively described by a Lie algebra. Fortunately, the notion of a multi-parameter Lie group of transformations is a natural generalisation of the one-parameter case. Thus, this last section of background material concerns the generalisation of the definitions in Section 2.1 to the case of a multi-parameter Lie group. The associated Lie algebra will also be defined.

Definition 12 (Multi-Parameter Lie Group of Transformations). The set of transformations $x^* = X(x, \epsilon)$, where $x_i^* = X_i(x, \epsilon)$ and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_r) \in S \subset \mathbb{R}^r$, is called an $r$-parameter Lie group of transformations if it satisfies the same axioms as in the one-parameter case, but with law of composition $\phi(\epsilon, \delta) = (\phi_1(\epsilon, \delta), \ldots, \phi_r(\epsilon, \delta))$ and (without loss of generality) $\epsilon = (0, 0, \ldots, 0)$ as the group identity element.

Definition 13 (Infinitesimal Matrix). The appropriate generalisation of the infinitesimal transformation is the infinitesimal matrix $\Xi = [\xi_{ij}]$, whose entries are defined as
$$\xi_{ij}(x) = \frac{\partial X_j(x, \epsilon)}{\partial \epsilon_i}\bigg|_{\epsilon = 0}.$$

Definition 14 (Infinitesimal Generator). An $r$-parameter Lie group of transformations is associated with $r$ infinitesimal generators $\mathrm{X}_i$, defined as $\mathrm{X}_i = \mathrm{X}_i(x) = \sum_{j=1}^{d} \xi_{ij}(x) \frac{\partial}{\partial x_j}$.

The first fundamental theorem of Lie can be generalised to the multi-parameter case.
In particular, it can be shown that an $r$-parameter Lie group of transformations is characterised by the set of its $r$ infinitesimal generators. The generalisation is straightforward and so, for brevity, we refer the reader to pages 39-40 of Bluman and Anco (2002).

Next we explain how the collection of infinitesimal generators forms a Lie algebra. This relies on the basic facts that the set $\mathcal{D}$ of differential operators on $D$ is a vector space over $\mathbb{R}$ (i.e. $\lambda \mathrm{X} + \mu \mathrm{Y} \in \mathcal{D}$ for all $\mathrm{X}, \mathrm{Y} \in \mathcal{D}$ and $\lambda, \mu \in \mathbb{R}$) and that differential operators can be composed (i.e. $\mathrm{X}\mathrm{Y} \in \mathcal{D}$ for all $\mathrm{X}, \mathrm{Y} \in \mathcal{D}$).

Definition 15 (Commutator). The commutator of two infinitesimal generators $\mathrm{X}_i$ and $\mathrm{X}_j$ is defined as $[\mathrm{X}_i, \mathrm{X}_j] = \mathrm{X}_i \mathrm{X}_j - \mathrm{X}_j \mathrm{X}_i$.

Theorem 5 (Second Fundamental Theorem of Lie; see page 78 of Bluman and Anco (2002)). Consider an $r$-parameter Lie group of transformations and let $L$ denote the linear span of the infinitesimal generators $\mathrm{X}_1, \ldots, \mathrm{X}_r$ in $\mathcal{D}$. Let $[\cdot, \cdot] : L \times L \to \mathcal{D}$ denote the unique bilinear operator that agrees with Def. 15 on the set of infinitesimal generators, i.e.
$$\left[ \sum_{i=1}^r \lambda_i \mathrm{X}_i \,,\, \sum_{j=1}^r \mu_j \mathrm{X}_j \right] = \sum_{i=1}^r \sum_{j=1}^r \lambda_i \mu_j (\mathrm{X}_i \mathrm{X}_j - \mathrm{X}_j \mathrm{X}_i). \qquad (11)$$
Then $[\cdot, \cdot]$ maps into $L$, i.e. the right-hand side of Eq. (11) belongs to $L$ for all $\lambda, \mu \in \mathbb{R}^r$.

Example 8. Consider the two-parameter group of transformations on $D = \mathbb{R}^2$ given by $(x^*, y^*) = (x + x\epsilon + x^2 \delta, \, y + y\epsilon + y^2 \delta)$. The infinitesimal generators corresponding to $\delta$ and $\epsilon$, respectively, are
$$\mathrm{X}_1 = x^2 \frac{\partial}{\partial x} + y^2 \frac{\partial}{\partial y}, \qquad \mathrm{X}_2 = x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y}.$$
It can be directly verified that $[\mathrm{X}_1, \mathrm{X}_2] = -\mathrm{X}_1$.

The space $L$, defined in Thm. 5, satisfies the properties of an $r$-dimensional Lie algebra, defined next:

Definition 16 (Lie Algebra).
An $r$-dimensional vector space $L$ over $\mathbb{R}$, together with a bilinear operator $[\cdot, \cdot] : L \times L \to L$, is called an $r$-dimensional Lie algebra if the following hold:

(1) Alternativity: $[\mathrm{X}, \mathrm{X}] = 0$ for all $\mathrm{X} \in L$.
(2) Jacobi identity: $[\mathrm{X}, [\mathrm{Y}, \mathrm{Z}]] + [\mathrm{Y}, [\mathrm{Z}, \mathrm{X}]] + [\mathrm{Z}, [\mathrm{X}, \mathrm{Y}]] = 0$ for all $\mathrm{X}, \mathrm{Y}, \mathrm{Z} \in L$.

In general, for the methods presented in Section 3 to be applied, existence of an $n$-parameter Lie group of transformations is not in itself sufficient; we require the existence of an $n$-dimensional solvable Lie sub-algebra, defined next:

Definition 17 (Normal Lie Sub-algebra). Consider a Lie sub-algebra $J$ of a Lie algebra $L$ with bilinear operator $[\cdot, \cdot]$, i.e. a subset $J \subset L$ that, when equipped with the restriction of $[\cdot, \cdot]$ to $J \times J$, is itself a Lie algebra and, in particular, satisfies $[\mathrm{X}, \mathrm{Y}] \in J$ for all $\mathrm{X}, \mathrm{Y} \in J$. Then $J$ is said to be normal if, in addition, $[\mathrm{X}, \mathrm{Y}] \in J$ for all $\mathrm{X} \in J$, $\mathrm{Y} \in L$.

Definition 18 (Solvable Lie Algebra). An $r$-dimensional Lie algebra $L$ is called solvable if there exists a chain of sub-algebras $L_1 \subset L_2 \subset \cdots \subset L_{r-1} \subset L_r =: L$ such that $L_{i-1}$ is a normal sub-algebra of $L_i$ for $i = 2, 3, \ldots, r$.

For low-order ODEs, the existence requirement for an admitted Lie group of transformations is more restrictive than the requirement that the associated Lie algebra is solvable. Indeed, we have the following result:

Theorem 6. All two-dimensional Lie algebras are solvable.

This completes our review of background material. The exact Bayesian PNM developed in Section 3 for an $n$th order ODE requires the existence of an admitted $n$-parameter Lie group of transformations with a solvable Lie algebra. In practice we therefore require some high-level information on the gradient field $f$, in order to establish which transformations of the ODE may be admitted.
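Commutators, and hence solvability, can be checked mechanically. The sketch below (sympy, our notation) verifies $[\mathrm{X}_1, \mathrm{X}_2] = -\mathrm{X}_1$ for Example 8 by acting on an arbitrary test function; since the commutator lies in $\mathrm{span}\{\mathrm{X}_1\}$, the chain $\mathrm{span}\{\mathrm{X}_1\} \subset \mathrm{span}\{\mathrm{X}_1, \mathrm{X}_2\}$ witnesses solvability, consistent with Theorem 6.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = sp.Function('f')(x, y)   # arbitrary test function

def X1(F):  # X1 = x^2 d/dx + y^2 d/dy
    return x**2*sp.diff(F, x) + y**2*sp.diff(F, y)

def X2(F):  # X2 = x d/dx + y d/dy
    return x*sp.diff(F, x) + y*sp.diff(F, y)

# Commutator (Definition 15) applied to f; second derivatives of f cancel
bracket = sp.expand(X1(X2(f)) - X2(X1(f)))

# Example 8 claims [X1, X2] = -X1
assert sp.simplify(bracket + X1(f)) == 0
```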
In addition, the requirement of a solvable Lie algebra limits the class of ODEs for which our exact Bayesian methods can be employed. Nevertheless, this class of ODEs is sufficiently broad to have merited extensive theoretical research (Bluman and Anco, 2002) and the development of software (Baumann, 2013).

3 Methods

In this section our novel Bayesian PNM is presented. The method relies on high-level information about the gradient field $f$ and, in Section 3.1, we discuss how such information can be exploited to identify any Lie transformations that are admitted by the ODE. In the case of a first order ODE, any non-trivial transformation is sufficient for our method, and an explicit information operator is provided for this case, together with recommendations for prior construction, in Section 3.2. Together, the prior and the information operator uniquely determine a Bayesian PNM, as explained in Section 1.1. In the general case of an $m$th order ODE, we require that an $m$-dimensional solvable Lie algebra is admitted by the ODE. The special case $m = 2$ is treated in detail, with an explicit information operator and guidance for prior construction provided in Section 3.3. In Supplemental Section A.2 the selection of input pairs $(x_i, y_i)$ to the gradient field is discussed.

3.1 From an ODE to its Admitted Transformations

For the methods proposed in this paper, transformations admitted by the ODE, together with their infinitesimal generators, must first be obtained. The algorithm for obtaining infinitesimal generators follows as a consequence of Cor. 4. Indeed, suppose we have an $m$th order ODE of the form $y_m - f(x, y, y_1, \ldots, y_{m-1}) = 0$. Then, by Cor. 4, a transformation with infinitesimal generator $\mathrm{X}$ is admitted by the ODE if and only if:
$$\mathrm{X}^{(m)} (y_m - f(x, y, y_1, \ldots, y_{m-1})) = 0 \quad \text{whenever} \quad y_m = f(x, y, y_1, \ldots, y_{m-1}). \qquad (12)$$
In infinitesimal notation, Eq.
(12) is equivalent to
$$\eta^{(m)}(x, y, y_1, \ldots, y_{m-1}, y_m) = \xi \frac{\partial f}{\partial x} + \eta \frac{\partial f}{\partial y} + \sum_{k=1}^{m-1} \eta^{(k)} \frac{\partial f}{\partial y_k}, \qquad (13)$$
where the substitution $y_m = f(x, y, y_1, \ldots, y_{m-1})$ is understood. The direct solution of Eq. (13) recovers any transformations that are admitted.

In the common scenario where $f(x, y, y_1, \ldots, y_{m-1})$ is a polynomial in $y_1, y_2, \ldots, y_{m-1}$, the algorithm just described, for identification of admitted transformations, can be fully automated (cf. Baumann, 2013). Indeed, from Def. 10 it follows that the extended infinitesimals $\eta^{(k)}$, for $k \in \{1, 2, \ldots, m\}$, are polynomial in $y_1, y_2, \ldots, y_k$. Thus, after substituting $y_m = f(x, y, y_1, \ldots, y_{m-1})$, Eq. (12) too must be a polynomial in $y_1, y_2, \ldots, y_{m-1}$. Moreover, the coefficients of this polynomial must vanish, because (12) holds for arbitrary values of $x, y, y_1, \ldots, y_{m-1}$. This argument, of setting coefficients to zero, leads to a system of linear partial differential equations (overdetermined when $m \geq 2$) for $\xi(x, y)$ and $\eta(x, y)$, which can be exactly solved to retrieve all the infinitesimal generators of the ODE. The same strategy can often be applied beyond the polynomial case, and explicit worked examples of this procedure are now provided:

Example 9 (First Order ODE). Consider the class of all first order ODEs of the form
$$\frac{dy}{dx} = f(x, \mathrm{y}(x)), \qquad f(x, y) = F\left(\frac{y}{x}\right).$$
From Eq. (10), we have $\eta^{(1)} = \eta_x + (\eta_y - \xi_x) y_1 - \xi_y (y_1)^2$, so Eq. (13) becomes
$$\eta_x + (\eta_y - \xi_x) f - \xi_y f^2 = \xi \frac{\partial f}{\partial x} + \eta \frac{\partial f}{\partial y}$$
and thus
$$\eta_x + (\eta_y - \xi_x) F\left(\frac{y}{x}\right) - \xi_y F\left(\frac{y}{x}\right)^2 = -\xi F'\left(\frac{y}{x}\right) \frac{y}{x^2} + \eta F'\left(\frac{y}{x}\right) \frac{1}{x}.$$
For this equation to hold for general $F$, the coefficients of $F$, $F^2$, $F'$ and the constant term must each vanish:
$$\eta_x = 0, \qquad \eta_y - \xi_x = 0, \qquad \xi_y = 0, \qquad -\xi \frac{y}{x^2} + \eta \frac{1}{x} = 0.$$
This is now a linear system of partial differential equations in $(\xi, \eta)$ which is easily solved; namely $\xi = x$, $\eta = y$.
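A minimal sympy check (our notation) confirms that $\xi = x$, $\eta = y$ satisfies the determining equation of Example 9 for an arbitrary $F$:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
F = sp.Function('F')       # arbitrary function of r = y/x
f = F(y/x)                 # gradient field f(x, y) = F(y/x)

xi, eta = x, y             # candidate infinitesimals from Example 9

# eta^(1) with y1 replaced by f: the m = 1 determining equation, Eq. (13)
lhs = (sp.diff(eta, x) + (sp.diff(eta, y) - sp.diff(xi, x))*f
       - sp.diff(xi, y)*f**2)
rhs = xi*sp.diff(f, x) + eta*sp.diff(f, y)

assert sp.simplify(lhs - rhs) == 0   # the symmetry is admitted
```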
The associated infinitesimal generator is $\mathrm{X} = x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y}$.

Example 10 (Second Order ODE). The infinitesimal generators for the second order ODE
$$(x - \mathrm{y}(x)) \frac{d^2 y}{dx^2} + 2 \frac{dy}{dx} \left( \frac{dy}{dx} + 1 \right) + \left( \frac{dy}{dx} \right)^{3/2} = 0 \qquad (14)$$
are derived in Supplementary Section A.1.

3.2 The Case of a First Order ODE

In this section we present our approach for a first order ODE. This allows some of the more technical details associated with the general case to be omitted, owing to the fact that any one-dimensional Lie algebra is trivial. The main result that will allow us to construct an exact Bayesian PNM is as follows:

Theorem 7 (Reduction of a First Order ODE to an Integral). If a first order ODE
$$\frac{dy}{dx} = f(x, \mathrm{y}(x)) \qquad (15)$$
admits a one-parameter Lie group of transformations, then there exist coordinates $r(x, y)$, $s(x, y)$ such that
$$\frac{ds}{dr} = G(r) \qquad (16)$$
for some explicit function $G(r)$.

Note that the transformed ODE in Eq. (16) is nothing more than an integral, for which exact Bayesian PNM have already been developed (e.g. Briol et al., 2019; Karvonen et al., 2018). At a high level, as indicated in Fig. 2, our proposed Bayesian PNM performs inference for the solution $\mathrm{s}(r)$ of Eq. (16) and then transforms the resultant posterior back into the original $(x, y)$-coordinate system. Our PNM is therefore based on the information operator
$$A(\mathrm{y}) = [G(r_0), \ldots, G(r_n)]^\top \in \mathcal{A} = \mathbb{R}^{n+1} \qquad (17)$$
which corresponds indirectly to $n + 1$ evaluations of the original gradient field $f$ at certain input pairs $(x_i, y_i)$. The selection of the inputs $r_i$ is discussed in Section A.2. The transformation of a first order ODE is clearly illustrated in the following:

Example 11 (Ex. 9, continued). Consider the first order ODE
$$\frac{dy}{dx} = f(x, \mathrm{y}(x)), \qquad f(x, y) = F\left(\frac{y}{x}\right).$$
Recall from Ex.
9 that this ODE admits the one-parameter Lie group of transformations $x^* = \alpha x$, $y^* = \alpha y$, $\alpha > 0$, and that the associated infinitesimal generator is $\mathrm{X} = x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y}$. Solving the pair of partial differential equations $\mathrm{X} r = 0$, $\mathrm{X} s = 1$ yields the canonical coordinates
$$s = \log y, \qquad r = \frac{y}{x}.$$
The transformed ODE is then
$$\frac{ds}{dr} = \frac{F(r)}{-r^2 + r F(r)} =: G(r).$$
Thus an evaluation of $G(r)$ corresponds to an evaluation of $f(x, y)$ at an input $(x, y)$ such that $r = \frac{y}{x}$.

Two important points must now be addressed. First, the approach just described cannot be Bayesian unless it corresponds to a well-defined prior distribution $\mu \in \mathcal{P}_Y$ in the original coordinate system $Y$. This precludes standard (e.g. Gaussian process) priors in general, as such priors assign mass to functions in $(r, s)$-space that do not correspond to well-defined functions in $(x, y)$-space (see Fig. 3). Second, any prior that is used ought to be consistent with the Lie group of transformations that the ODE is known to admit. To address each of these important points, we propose two general principles for prior construction in this work.

The first principle is the implicit prior principle. This ensures that a prior specified in the transformed coordinates $(r, s)$ can be safely transformed into a well-defined distribution on $Y$. For such an implicit prior to be well-defined, we need to understand when a function in $(r, s)$-space maps to a well-defined function in the original $(x, y)$ domain of interest. Let $S$ denote the image of $Y$ under the canonical coordinate transformation.

Principle 1 (Implicit Prior). A distribution $\nu \in \mathcal{P}_S$ on the transformed solution space $S$ corresponds to a well-defined implicit prior $\mu \in \mathcal{P}_Y$ provided that $x(r, \mathrm{s}(r))$ is strictly monotone as a function of $r$.
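The transformed gradient in Example 11 can be re-derived mechanically. The following sympy sketch (our notation) applies the chain rule along a solution with $y' = F(y/x)$ and recovers $G(r) = F(r)/(-r^2 + rF(r))$:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
r = sp.symbols('r', positive=True)
F = sp.Function('F')
y = sp.Function('y', positive=True)(x)

# Canonical coordinates of Example 11
s_expr = sp.log(y)
r_expr = y/x

# ds/dr = (ds/dx)/(dr/dx) along a solution, i.e. with y' = F(y/x)
yp = F(y/x)
ds_dx = sp.diff(s_expr, x).subs(sp.Derivative(y, x), yp)
dr_dx = sp.diff(r_expr, x).subs(sp.Derivative(y, x), yp)
G = sp.simplify((ds_dx/dr_dx).subs(y, r*x))   # substitute y = r x

expected = F(r)/(-r**2 + r*F(r))
assert sp.simplify(G - expected) == 0
```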
[Figure 3 depicts a curve in the $(r, s)$ plane whose image under the inverse Lie transform is not the graph of a function in the $(x, y)$ plane.]

Figure 3: Illustration of the implicit prior principle: a prior elicited for the function $\mathrm{s}(r)$ in the transformed coordinate system $(r, s)$ must be supported on functions $\mathrm{s}(r)$ that correspond to well-defined functions $\mathrm{y}(x)$ in the original coordinate system $(x, y)$. Thus the situation depicted would not be allowed.

Example 12 (Ex. 11, continued). For the ODE in Ex. 11, with canonical coordinates $s = \log y$, $r = \frac{y}{x}$, if $x \in [x_0, x_T] = [1, x_T]$ and $y \in (0, \infty)$, then the region in the $(r, s)$ plane corresponding to $[1, x_T] \times (0, \infty)$ in the $(x, y)$ plane is $(0, \infty) \times \mathbb{R}$. Now,
$$\frac{d x(r, \mathrm{s}(r))}{dr} = \frac{\partial x}{\partial r} + \frac{\partial x}{\partial s} \frac{d \mathrm{s}(r)}{dr} = \frac{r \mathrm{s}'(r) \exp(\mathrm{s}(r)) - \exp(\mathrm{s}(r))}{r^2}.$$
Thus $\frac{dx}{dr} > 0$ if and only if $\mathrm{s}'(r) > \frac{1}{r}$, and the implicit prior principle requires that we respect the constraint $\log(r) \leq \mathrm{s}(r) \leq \log(r) + \log(x_T)$ for all $r > 0$. The set $S$ must therefore consist of differentiable functions $\mathrm{s}$ defined on $r \in (0, \infty)$ and satisfying $\log(r) \leq \mathrm{s}(r) \leq \log(r) + \log(x_T)$.

Now we turn to the second important point, namely that the prior ought to encode knowledge about any Lie transformations that are known to be admitted by the ODE. In working on the transformed space $S$, it becomes clear how to construct a prior measure in which this knowledge is encoded. Our second principle for prior specification states that equal prior weight should be afforded to all curves that are identical up to a Lie transformation:

Principle 2 (Invariant Prior). A distribution $\nu \in \mathcal{P}_S$ on the transformed solution space $S$ is said to be invariant provided that $\nu(S) = \nu(S + \epsilon)$, where the elements of $S + \epsilon$ are the elements of $S$ after a vertical translation, i.e. $\mathrm{s}(\cdot) \mapsto \mathrm{s}(\cdot) + \epsilon$, and both $S, S + \epsilon \in \Sigma_S$.
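The computation in Example 12 is easy to verify. The sketch below (sympy, our notation) recovers $\frac{dx}{dr}$ from $x = \exp(s)/r$ and spot-checks that its sign agrees with the criterion $\mathrm{s}'(r) > \frac{1}{r}$:

```python
import sympy as sp

r = sp.symbols('r', positive=True)
s = sp.Function('s', real=True)(r)

# Invert the canonical coordinates of Example 11: y = exp(s), x = y/r
x_of_r = sp.exp(s)/r

dxdr = sp.diff(x_of_r, r)
claimed = (r*sp.diff(s, r)*sp.exp(s) - sp.exp(s))/r**2
assert sp.simplify(dxdr - claimed) == 0

# Sign of dx/dr matches the criterion s'(r) > 1/r at representative points
for sprime, r_val in [(2.0, 1.0), (0.5, 1.0)]:   # s'(r) above/below 1/r
    val = claimed.subs([(sp.diff(s, r), sprime), (s, 0)]).subs(r, r_val)
    assert (val > 0) == (sprime > 1/r_val)
```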
Our recommendation is that, when possible, both the implicit prior principle and the invariant prior principle should be enforced. However, in practice it seems difficult to satisfy both principles, and our empirical results in Section 4 are based on implicit priors that are not invariant.

3.3 The Case of a Second Order ODE

In this section we present our approach for a second order ODE. The study of second order ODEs is particularly important, since Newtonian mechanics is based on ODEs of second order. The presentation is again simplified relative to the general case of an $m$th order ODE, this time by virtue of the fact that any two-dimensional Lie algebra is guaranteed to be solvable (Thm. 6). The main result that will allow us to construct an exact Bayesian PNM is as follows:

Theorem 8 (Reduction of a Second Order ODE to Two Integrals). If a second order ODE
$$\frac{d^2 y}{dx^2} = f\left(x, \mathrm{y}(x), \frac{dy}{dx}\right) \qquad (18)$$
admits a two-parameter Lie group of transformations, then there exist coordinates $r(x, y)$, $s(x, y)$ such that
$$\frac{ds}{dr} = G(r) \qquad (19)$$
for some implicitly defined function $G$. The function $G$ is explicitly related to the solution of a second equation of the form
$$\frac{d\tilde{s}}{d\tilde{r}} = H(\tilde{r}) \qquad (20)$$
for some explicit function $H(\tilde{r})$.

Note that the ODE in Eq. (18) is reduced to two integrals, namely Eq. (19) and Eq. (20). At a high level, our proposed Bayesian PNM performs inference for the solution $\mathrm{s}(r)$ of Eq. (19) and then transforms the resultant posterior back into the original $(x, y)$-coordinate system. However, because $G$ in Eq. (19) depends on the solution $\tilde{\mathrm{s}}(\tilde{r})$ of Eq. (20), we must also estimate $\tilde{\mathrm{s}}(\tilde{r})$, and for this we need to evaluate $H$. Our PNM is therefore based on the information operator
$$A(\mathrm{y}) = [G(r_0), \ldots, G(r_n), H(\tilde{r}_0), \ldots, H(\tilde{r}_n)]^\top \in \mathcal{A} = \mathbb{R}^{2(n+1)}$$
which corresponds indirectly to $2(n+1)$ evaluations of $f$, the original gradient field.
The extension of our approach to a general $m$th order ODE proceeds analogously, with $\mathcal{A} = \mathbb{R}^{m(n+1)}$. The use of Thm. 8 is illustrated in Example 14 in the Supplement.

The two principles of prior construction that we advocated in the case of a first order ODE apply equally to the case of a second- and higher-order ODE. It therefore remains only to discuss the selection of the specific inputs $r_i$ (and $\tilde{r}_i$ in the case of a second order ODE) that are used to define the information operator $A$. This discussion is again reserved for Supplemental Section A.2.

4 Numerical Illustration

In this section the proposed Bayesian PNM is empirically illustrated. Recall that we are not advocating these methods for practical use; rather, they serve as a proof-of-concept, demonstrating that exact Bayesian inference can in principle be performed for ODEs, albeit at considerable effort — a non-trivial finding that helps to shape ongoing research and discussion in this nascent field. The case of a first order ODE is considered in Section 4.1 and the second order case is contained in Section 4.2. In both cases, scope is limited to verifying the correctness of the procedures, as well as indicating how conjugate prior distributions can be constructed.

4.1 A First Order ODE

This section illustrates the empirical performance of the proposed method for a first order ODE.

ODE: To limit scope, we consider first order ODEs of the form
$$\frac{dy}{dx} = F\left(\frac{\mathrm{y}(x)}{x}\right), \qquad x \in [1, x_T], \qquad \mathrm{y}(1) = y_0. \qquad (21)$$
Note that the admitted transformation and associated canonical coordinates for this class of ODE have already been derived in Ex. 9, Ex. 11 and Ex. 12.

Prior: In constructing a prior $\mu \in \mathcal{P}_Y$ we refer to the implicit prior principle in Sec. 3.2. Indeed, recall from Ex. 11 that the ODE in Eq. (21) can be transformed into an ODE of the form
$$\frac{ds}{dr} = G(r), \qquad r \in (0, \infty), \qquad \mathrm{s}(y_0) = \log(y_0).$$
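As a sanity check on this reduction (our own check, separate from the Bayesian method), take the concrete choice $F(r) = r^{-1} + r$ used in the experiments below, with $y_0 = 1$ and $x_T = 5$. From Ex. 11, $G(r) = F(r)/(rF(r) - r^2)$, which here simplifies to $r^{-1} + r$, and the initial value problem has the closed-form solution $\mathrm{y}(x) = x\sqrt{1 + 2\log x}$ (our derivation, not stated in the paper). Numerically integrating $G$ and mapping back through the canonical coordinates recovers this solution:

```python
import numpy as np

def G(r):                      # transformed gradient for F(r) = 1/r + r
    F = 1.0/r + r
    return F/(r*F - r**2)      # = 1/r + r, since r*F - r^2 = 1

# Integrate ds/dr = G(r) from r0 = y0/x0 = 1 with s(r0) = log(y0) = 0
rT = np.sqrt(1.0 + 2.0*np.log(5.0))   # value of r = y/x at x = 5
grid = np.linspace(1.0, rT, 200001)
g = G(grid)
s_T = np.sum(0.5*(g[1:] + g[:-1])*np.diff(grid))   # composite trapezoid

# Map back: y = exp(s), x = exp(s)/r; should recover x = 5
y_T = np.exp(s_T)
x_back = y_T/rT
print(x_back, y_T)
```

The recovered endpoint agrees with the closed form: $x \approx 5$ and $\mathrm{y}(5) = 5\sqrt{1 + 2\log 5}$.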
Then our approach constructs a distribution $\nu \in \mathcal{P}_S$ where, from Ex. 12, $S$ is the set of differentiable functions $\mathrm{s}$ defined on $r \in (0, \infty)$ and satisfying
$$\log(r) \leq \mathrm{s}(r) \leq \log(r) + \log(x_T). \qquad (22)$$
To ensure monotonicity in the implicit prior principle, we require $\frac{dx}{dr} > 0$, which translates into the requirement that
$$\frac{ds}{dr} > \frac{1}{r}. \qquad (23)$$
If Eq. (23) holds, then $\nu$ induces a well-defined distribution $\mu \in \mathcal{P}_Y$. Note that the constraints in Eq. (22) and Eq. (23) preclude the direct use of standard prior models, such as Gaussian processes. However, it is nevertheless possible to design priors that are convenient for a given set of canonical coordinates. Indeed, for the canonical coordinates $r$, $s$ in our example, we can consider a prior of the form
$$\mathrm{s}(r) = \log(r) + \log(x_T)\, \zeta(r)$$
where the function $\zeta : (0, \infty) \to \mathbb{R}$ satisfies
$$\zeta(y_0) = 0, \qquad \zeta(r) \leq 1, \qquad \frac{d\zeta}{dr} \geq 0. \qquad (24)$$
For this experiment, the approach of López-Lopera et al. (2018) was used as a prior model for the monotone, bounded function $\zeta$; this requires that a number, $N$, of basis functions is specified — for brevity we defer the detail to Appendix A.3.

Figure 4: Experimental results, first order ODE: the black curves represent samples from the posterior, whilst the exact solution is indicated in red. The blue curves represent a constraint on the $(r, s)$ domain that arises when the implicit prior principle is applied. The number $n$ of gradient evaluations is indicated. Top: results in the $(r, s)$ domain. Bottom: results in the $(x, y)$ domain.
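The details of the López-Lopera et al. (2018) construction are deferred to Appendix A.3, but the role of $\zeta$ can be illustrated with a simpler stand-in of our own devising: draw non-negative random increments on a grid of knots, accumulate and normalise them, so that every sample satisfies the conditions in Eq. (24) by construction. All names and distributional choices below are ours, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_zeta(r_grid, N=10, y0=1.0):
    """Sample a random function with zeta(y0) = 0, zeta <= 1, zeta' >= 0.

    Stand-in construction: zeta is a normalised cumulative sum of
    non-negative random increments on N bins to the right of y0, which
    is monotone by construction and rescaled to stay below 1.
    """
    knots = np.linspace(y0, r_grid.max(), N + 1)
    w = rng.gamma(shape=2.0, scale=1.0, size=N)    # non-negative increments
    cum = np.concatenate([[0.0], np.cumsum(w)])
    cum = cum/(cum[-1] + 1.0)                      # keep strictly below 1
    return np.interp(r_grid, knots, cum)           # piecewise-linear zeta

# A sample path of the prior s(r) = log(r) + log(x_T) * zeta(r)
x_T = 5.0
r = np.linspace(1.0, 2.0, 200)
zeta = sample_zeta(r)
s = np.log(r) + np.log(x_T)*zeta

# Every sample respects the band of Eq. (22)
assert np.all(s >= np.log(r)) and np.all(s <= np.log(r) + np.log(x_T))
```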
The prior just described incorporates the symmetry structure of the ODE, in the sense that the independent variable $r = \frac{y}{x}$ is the first canonical coordinate of the infinitesimal generator of the Lie group of transformations admitted by the ODE in Eq. (21). In other words, $r$ is a variable fixed by the Lie group of transformations of the ODE (in this case $x^* = \alpha x$, $y^* = \alpha y$, so $r^* = r$). Because the prior is defined on functions $\mathrm{s}(r)$ of $r$, the prior itself is also unchanged by the Lie group of transformations of the ODE, so that the prior effectively incorporates the symmetry structure of the ODE.

Results: To obtain empirical results we consider the ODE with $F(r) = r^{-1} + r$ and $y_0 = 1$, $x_T = 5$. The posterior distributions obtained as the number $n$ of data points was increased were sampled and plotted in the $(r, s)$ and $(x, y)$ planes in Fig. 4. In each case a basis of size $N = 2n$ was used. Observe that the implicit prior principle ensures that all curves in the $(x, y)$ plane are well-defined functions (i.e. there is at most one $y$ value for each $x$ value). Observe also that the posterior mass appears to contract to the true solution $\mathrm{y}^\dagger$ of the ODE as the number of evaluations $n$ of the gradient field is increased.

4.2 A Second Order ODE

This section illustrates the empirical performance of the proposed method for a second order ODE.

ODE: Consider again the second order nonlinear ODE in Eq. (31), together with the initial conditions $\mathrm{y}(x_0) = y_0$, $\frac{dy}{dx}(x_0) = y_0'$.

Prior: It is shown in Ex. 14 in the Supplement that Eq. (31) can be reduced to a first order ODE in $(s, r)$ with
$$-\frac{1}{x_0} - r \leq s \leq -\frac{1}{x_T} - r.$$
The implicit prior principle in this case requires that $\frac{ds}{dr} > -1$. Thus we are led to consider a parametrisation of the form
$$\mathrm{s}(r) = -\frac{1}{x_0} - r + \left(\frac{1}{x_0} - \frac{1}{x_T}\right) \zeta(r)$$
where the function $\zeta$ again satisfies the conditions in Eq. (24). The approach of López-Lopera et al.
(2018) was therefore again used as a prior model. For this example an additional level of analytic tractability is possible, as described in detail in Ex. 14 in the Supplement. Thus we need only consider an information operator of the form A(y) = [G(r₀), ..., G(rₙ)].

Results: The posterior distributions that were obtained are plotted in the (r, s) plane and the (x, y) plane in Fig. 5. A basis of size N = 2n was used, with [y₀, y′₀] = [−10, 1] and [x₀, x_T] = [5, 10]. Observe that the implicit prior principle ensures that all curves in the (x, y) plane are well-defined functions (i.e. there is at most one y value for each x value). The true solution appears to be smoother than the samples, even for 50 gradient evaluations, which suggests that the prior was somewhat conservative in this context.

[Figure 5: Experimental results, second order ODE: The black curves represent samples from the posterior in the (r, s) plane (left) and (x, y) plane (right), whilst the exact solution is indicated in red. The blue curves represent a constraint on the domain that arises when the implicit prior principle is applied. The number of gradient evaluations was n = 50.]

5 Conclusion

This paper presented a foundational perspective on PNM. It was first argued that there did not exist a Bayesian PNM for the numerical solution of ODEs. Then, to address this gap, a prototypical Bayesian PNM was developed. The Bayesian perspective that we have put forward sheds light on foundational issues which will need to be addressed going forward:

Foundation of PNM: As explained in Section 1.2, existing PNM for ODEs each take the underlying state space Y to be the solution space of the ODE.
This appears to be problematic, in the sense that a generic evaluation f(xᵢ, yᵢ) of the gradient field cannot be cast as information A(y†) about the solution y† of the ODE unless the point (xᵢ, yᵢ) lies exactly on the solution curve {(x, y†(x)) : x ∈ [x₀, x_T]}. As a consequence, all existing PNM of which we are aware violate the likelihood principle and are therefore not strictly Bayesian. The assumption of a solvable Lie algebra, used in this work, can be seen as a mechanism to ensure the existence of an exact information operator A, so that the likelihood principle can be obeyed. However, for a general ODE it might be more natural to take the underlying state space to be a set F of permitted gradient fields and the quantity of interest Q(f) to map a gradient field f to the solution of the associated ODE. This would make the information operator A trivial, but evaluation of the push-forward Q_#µ would require the exact solution operator of the ODE. The reliance on access to an oracle solver Q makes this philosophically somewhat distinct from PNM.

Limitations of Bayesian PNM: The proposed method was intended as a proof-of-concept and it is therefore useful to highlight the aspects in which it is limited. First, when an mth order ODE admits an r-parameter Lie group of transformations with r > m, there is an arbitrariness to the particular m-dimensional sub-group of transformations that are selected. Second, the route to obtain transformations admitted by the ODE demands that some aspects of the gradient field f are known, in contrast to other work in which f is treated as a black-box. For instance, in Ex. 11 we used the fact that f can be expressed as f(x, y) = F(y/x), although knowledge of the form of F was not required. Third, the class of ODEs for which a solvable Lie algebra is admitted is relatively small.
On the other hand, references such as Bluman and Anco (2002) document important cases where our method could be applied. Fourth, the principles for prior construction that we identified do not entail a unique prior and, as such, the question of prior elicitation must still be addressed.

Outlook: The goal of providing rigorous and exact statistical uncertainty quantification for the solution of an ODE is, we believe, important and will continue to be addressed. Traditional numerical methods have benefitted from a century of research effort and, in comparison, Bayesian PNM is an under-developed field. For example, the limited existing work on PNM for ODEs, such as Skilling (1992); Schober et al. (2014); Chkrebtii et al. (2016); Kersting and Hennig (2016); Schober et al. (2019); Kersting et al. (2018); Tronarp et al. (2019), does not attempt to provide adaptive error control (though we note promising ongoing research in that direction by Chkrebtii and Campbell, 2019). Nevertheless, the case for developing Bayesian numerical methods, which shares some parallels with the case for Bayesian statistics as opposed to other inferential paradigms, is clear, as argued in Diaconis (1988) and Hennig et al. (2015). The insights we have provided in this paper serve to highlight the foundational issues pertinent to Bayesian PNM for ODEs. Indeed, our proof-of-concept highlights that performing exact Bayesian inference for ODEs may be extremely difficult. This in turn provides motivation for the continued development of 'approximately Bayesian' approaches to PNM, which in Sec. 1.2 we surveyed in detail.

Acknowledgements: The authors are grateful to Mark Craddock, François-Xavier Briol and Tim Sullivan for discussion of this work, as well as to an Associate Editor and two Reviewers for their challenging but constructive feedback.
JW was supported by the EPSRC Centre for Doctoral Training in Cloud Computing for Big Data at Newcastle University, UK. CJO was supported by the Lloyd's Register Foundation programme on data-centric engineering at the Alan Turing Institute, UK. This material was based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Assyr Abdulle and Giacomo Garegnani. Random time step probabilistic methods for uncertainty quantification in chaotic and geometric numerical integration. 2018.

Gerd Baumann. Symmetry Analysis of Differential Equations with Mathematica. Springer Science & Business Media, 2013.

G.W. Bluman and S.C. Anco. Symmetry and Integration Methods for Differential Equations. Springer, 2002.

François-Xavier Briol, Chris J Oates, Mark Girolami, Michael A Osborne, and Dino Sejdinovic. Probabilistic integration: A role in statistical computation? (with discussion and rejoinder). Statistical Science, 34(1):1–42, 2019.

Radomir Chabiniok, Vicky Y Wang, Myrianthi Hadjicharalambous, Liya Asner, Jack Lee, Maxime Sermesant, Ellen Kuhl, Alistair A Young, Philippe Moireau, Martyn P Nash, et al. Multiphysics and multiscale modelling, data–model fusion and integration of organ physiology in the clinic: ventricular cardiac mechanics. Interface Focus, 6(2):20150083, 2016.

Kathryn Chaloner and Isabella Verdinelli. Bayesian experimental design: A review. Statistical Science, pages 273–304, 1995.

Joseph T Chang and David Pollard. Conditioning as disintegration. Statistica Neerlandica, 51(3):287–317, 1997.

O A Chkrebtii and D A Campbell.
Adaptive step-size selection for state-space based probabilistic differential equation solvers. Statistics and Computing, 2019. To appear.

Oksana Chkrebtii, David A Campbell, Mark A Girolami, and Ben Calderhead. Bayesian uncertainty quantification for differential equations (with discussion). Bayesian Analysis, 11(4):1239–1267, 2016.

Jon Cockayne, Chris Oates, Tim Sullivan, and Mark Girolami. Probabilistic meshless methods for partial differential equations and Bayesian inverse problems. 2016.

Jon Cockayne, Chris Oates, Tim Sullivan, and Mark Girolami. Bayesian probabilistic numerical methods. SIAM Review, 2019. To appear.

Patrick R Conrad, Mark Girolami, Simo Särkkä, Andrew Stuart, and Konstantinos Zygalakis. Statistical analysis of differential equations: introducing probability measures on numerical solutions. Statistics and Computing, 27(4):1065–1082, 2017.

P Diaconis. Bayesian numerical analysis. Statistical Decision Theory and Related Topics IV(1), 1988.

D. Estep. A posteriori error bounds and global error control for approximation of ordinary differential equations. SIAM Journal on Numerical Analysis, 32(1):1–48, 1995.

Andrew Gelman and Cosma Rohilla Shalizi. Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1):8–38, 2013.

Thomas Hawkins. Emergence of the Theory of Lie Groups: An Essay in the History of Mathematics 1869–1926. Springer Science & Business Media, 2012.

Philipp Hennig, Michael A Osborne, and Mark Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A, 471(2179):20150142, 2015.

Nicholas J Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 2002.

Toni Karvonen, Chris J Oates, and Simo Särkkä. A Bayes–Sard cubature method.
In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018), pages 5882–5893, 2018.

H. Kersting, T.J. Sullivan, and P. Hennig. Convergence rates of Gaussian ODE filters. arXiv:1807.09737, 2018.

Hans Kersting and Philipp Hennig. Active uncertainty calibration in Bayesian ODE solvers. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016), pages 309–318, 2016.

F M Larkin. Probabilistic error estimates in spline interpolation and quadrature. In Information Processing 74 (Proc. IFIP Congress, Stockholm, 1974), pages 605–609, 1974.

FM Larkin. Gaussian measure in Hilbert space and applications in numerical analysis. The Rocky Mountain Journal of Mathematics, 2:379–421, 1972.

Han Cheng Lie, TJ Sullivan, and Aretha L Teckentrup. Random forward models and log-likelihoods in Bayesian inverse problems. SIAM/ASA Journal on Uncertainty Quantification, 6(4):1600–1629, 2018.

H.C. Lie, A.M. Stuart, and T.J. Sullivan. Strong convergence rates of probabilistic integrators for ordinary differential equations. Statistics and Computing, 2019. To appear.

Andrés F López-Lopera, François Bachoc, Nicolas Durrande, and Olivier Roustant. Finite-dimensional Gaussian approximation with linear inequality constraints. SIAM/ASA Journal on Uncertainty Quantification, 6(3):1224–1255, 2018.

C J Oates and T J Sullivan. A modern retrospective on probabilistic numerics. Statistics and Computing, 2019. To appear.

C J Oates, J Cockayne, D Prangle, T J Sullivan, and M Girolami. Multivariate Algorithms and Information-Based Complexity, chapter Optimality Criteria for Probabilistic Numerical Methods. De Gruyter, Berlin/Boston, 2019. To appear.

Antony Overstall, David Woods, and Ben Parker. Bayesian optimal design for ordinary differential equation models with application in biological science.
Journal of the American Statistical Association, 2019. To appear.

A. Pakman and L. Paninski. Exact Hamiltonian Monte Carlo for truncated multivariate Gaussians. Journal of Computational and Graphical Statistics, 23:518–542, 2014.

Juan R Perilla, Boon Chong Goh, C Keith Cassidy, Bo Liu, Rafael C Bernardi, Till Rudack, Hang Yu, Zhe Wu, and Klaus Schulten. Molecular dynamics simulations of large macromolecular complexes. Current Opinion in Structural Biology, 31:64–74, 2015.

Michael Schober, David K Duvenaud, and Philipp Hennig. Probabilistic ODE solvers with Runge-Kutta means. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pages 739–747, 2014.

Michael Schober, Simo Särkkä, and Philipp Hennig. A probabilistic model for the numerical solution of initial value problems. Statistics and Computing, 2019. To appear.

John Skilling. Bayesian solution of ordinary differential equations. In Maximum Entropy and Bayesian Methods, pages 23–37. Springer, 1992.

Onur Teymur, Kostas Zygalakis, and Ben Calderhead. Probabilistic linear multistep methods. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), pages 4321–4328, 2016.

Onur Teymur, Han Cheng Lie, Tim Sullivan, and Ben Calderhead. Implicit probabilistic integrators for ODEs. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems (NeurIPS 2018), 2018.

J.F. Traub and Woźniakowski. Perspectives on information-based complexity. Bulletin of the American Mathematical Society, 26(1):29–52, 1992.

Filip Tronarp, Hans Kersting, Simo Särkkä, and Philipp Hennig. Probabilistic solutions to ordinary differential equations as non-linear Bayesian filtering: A new perspective. Statistics and Computing, 2019. To appear.

Nils P Wedi.
Increasing horizontal resolution in numerical weather prediction and climate simulations: illusion or panacea? Philosophical Transactions of the Royal Society A, 372(2018):20130289, 2014.

A Supplementary Material

This electronic supplement to the paper by Wang, Cockayne and Oates contains proofs for theoretical results in the main text (Sec. A.1), a brief discussion on Bayesian experimental design in the context of designing a Bayesian probabilistic numerical method (Sec. A.2), and computational detail that was suppressed from the main text (Sec. A.3).

A.1 Proof of Theoretical Results

Proof of Theorem 2. From Taylor's theorem we have that

$$x^* = X(x, \epsilon) = \sum_{k=0}^{\infty} \frac{\epsilon^k}{k!} \left. \frac{\partial^k X(x, \epsilon)}{\partial \epsilon^k} \right|_{\epsilon = 0}.$$

For any differentiable function F we have that

$$\frac{dF(x^*)}{d\epsilon} = \sum_{i=1}^{d} \frac{\partial F(x^*)}{\partial x^*_i} \frac{dx^*_i}{d\epsilon} = \sum_{i=1}^{d} \xi_i \frac{\partial F(x^*)}{\partial x^*_i} = X F(x^*)$$

and similarly

$$\frac{d^k F(x^*)}{d\epsilon^k} = X^k F(x^*).$$

Thus

$$\left. \frac{\partial^k X(x, \epsilon)}{\partial \epsilon^k} \right|_{\epsilon = 0} = X^k x$$

so that the stated result is recovered.

Proof of Theorem 3. The result is established as follows:

F invariant ⇔ F(x*) = 0 whenever F(x) = 0
⇔ $e^{\epsilon X} F(x) = 0$ whenever F(x) = 0 (Cor. 1)
⇔ $F(x) + \epsilon X F(x) + O(\epsilon^2) = 0$ whenever F(x) = 0 (Taylor)
⇔ $X F(x) = 0$ whenever F(x) = 0

where the last line follows since the coefficient of the O(ε) term in the Taylor expansion must vanish. This completes the proof.

Proof of Theorem 4. From Cor. 1, we have that $F(x^*) = F(x) + \epsilon X F(x) + O(\epsilon^2)$. The result follows from inspection of the ε coefficient.

Proof of Theorem 6. Suppose L is a two-dimensional Lie algebra generated by linearly independent infinitesimal generators X₁ and X₂. Let Y = [X₁, X₂] = aX₁ + bX₂ and let J be the one-dimensional subalgebra generated by Y.
Suppose Z = cX₁ + dX₂ is some element of L. Then

$$[Y, Z] = [Y, cX_1 + dX_2] = c[Y, X_1] + d[Y, X_2] = cb[X_2, X_1] + da[X_1, X_2] = (ad - bc)Y \in J.$$

So J ⊂ L is normal, thus L is solvable, as claimed.

Proof of Theorem 7. Let the infinitesimal generator associated with the Lie group of transformations be denoted X. From the remarks after Def. 7, we can obtain canonical coordinates by solving the pair of first order partial differential equations Xr = 0, Xs = 1. By the chain rule we have

$$\frac{ds}{dr} = \frac{s_x + s_y y'}{r_x + r_y y'} =: G(r, s).$$

From the definition of canonical coordinates, the Lie group of transformations is r* = r, s* = s + ε in the transformed coordinate system, so

$$\frac{ds^*}{dr^*} = G(r^*, s^*) \implies \frac{ds}{dr} = G(r, s + \epsilon)$$

for all ε, which implies G(r, s) = G(r) and thus Eq. (15) becomes ds/dr = G(r), as claimed.

Proof of Theorem 8. Let the infinitesimal generators associated with the Lie group of transformations be denoted X₁ and X₂. Recall from Thm. 6 that any two-dimensional Lie algebra is solvable. Thus, without loss of generality we may assume

$$[X_1, X_2] = \lambda X_1 \quad (25)$$

for some λ ∈ R. The infinitesimal generators X₁ and X₂ each correspond to a one-parameter Lie group of transformations, denoted x* = X₁(x, ε₁) and x† = X₂(x, ε₂). Let v(x, y), w(x, y, y₁) be invariant functions of X₁ and its extension X₁⁽¹⁾, respectively, so that v(x*, y*) = v(x, y) and w(x*, y*, y₁*) = w(x, y, y₁), where w has a non-trivial dependence on its third argument. It follows from the definition of invariance that

$$\frac{dw^*}{dv^*} = \frac{dw}{dv},$$

which is equivalent to

$$X_1^{(1)} \frac{dw}{dv} = 0 \quad (26)$$

by Cor. 3. Now, because Eq. (26) is a homogeneous partial differential equation, the general solution dw/dv can be expressed as a function of the two solutions v(x, y) and w(x, y, y₁).
Therefore

$$\frac{dw}{dv} = Z(v, w) \quad (27)$$

for some undetermined function Z. Since Eq. (18) admits X₂, and Eq. (27) is the same ODE when expressed in terms of x, y, y₁, it must be the case that

$$X_2^{(2)} \left( \frac{dw}{dv} - Z(v, w) \right) = 0 \quad \text{whenever} \quad \frac{dw}{dv} = Z(v, w).$$

Then from Cor. 4 it follows that X₂⁽¹⁾ is admitted by the first order ODE in Eq. (27). Thus we are now faced with a first order ODE that admits a one-parameter Lie group of transformations, as in Thm. 7. Now, from Eq. (25), we have X₁X₂v = X₂X₁v + λX₁v = 0. Thus X₂v is an invariant of X₁ and so X₂v = h(v) for some function h. Similarly X₁⁽¹⁾X₂⁽¹⁾w = X₂⁽¹⁾X₁⁽¹⁾w + λX₁⁽¹⁾w = 0, so that X₂⁽¹⁾w = g(v, w) for some function g. This implies

$$X_2^{(1)} = h(v) \frac{\partial}{\partial v} + g(v, w) \frac{\partial}{\partial w}.$$

Proceeding as in Thm. 7, denote the canonical coordinates of X₂⁽¹⁾ = h(v)∂/∂v + g(v, w)∂/∂w by r̃(v, w), s̃(v, w), such that X₂⁽¹⁾r̃ = 0, X₂⁽¹⁾s̃ = 1. In canonical coordinates, Eq. (27) becomes

$$\frac{d\tilde{s}}{d\tilde{r}} = H(\tilde{r}). \quad (28)$$

This is again an integral, with solution

$$\tilde{s}(\tilde{r}) = \int^{\tilde{r}} H(t)\, dt + C. \quad (29)$$

We can rewrite Eq. (29) in terms of v, w to obtain an equation of the form

$$I(v, w) = 0 \quad (30)$$

which satisfies X₁⁽¹⁾(I(v, w)) = 0 whenever I(v, w) = 0, since recall v, w are invariants of X₁⁽¹⁾. For the final step, we recall that v = v(x, y) and w = w(x, y, y₁), so that Eq. (30) represents a first order ODE in y, which admits X₁. Thus we can apply Thm. 15 a second time to obtain canonical coordinates r(x, y), s(x, y) for X₁. In these coordinates, Eq. (30) reduces to the form ds/dr = G(r), where G is implicitly defined.

Example 13 (Deriving the Infinitesimal Generators for the Second Order ODE in Eq. 10). Consider the second order nonlinear ODE

$$(x - y(x)) \frac{d^2 y}{dx^2} + 2 \frac{dy}{dx} \left( \frac{dy}{dx} + 1 \right) + \left( \frac{dy}{dx} \right)^{3/2} = 0.$$
(31)

Using Corollary 4, we have

$$\left( \xi \frac{\partial}{\partial x} + \eta \frac{\partial}{\partial y} + \eta^{(1)} \frac{\partial}{\partial y_1} + \eta^{(2)} \frac{\partial}{\partial y_2} \right) \left( y_2 + \frac{2y_1(y_1 + 1) + y_1^{3/2}}{x - y} \right) = 0$$

which implies

$$-\xi\, \frac{2y_1(y_1+1) + y_1^{3/2}}{(x-y)^2} + \eta\, \frac{2y_1(y_1+1) + y_1^{3/2}}{(x-y)^2} + \eta^{(1)} \left( \frac{4y_1 + 2 + \tfrac{3}{2} y_1^{1/2}}{x - y} \right) + \eta^{(2)} = 0.$$

Recall that

$$\eta^{(1)} = \eta_x + (\eta_y - \xi_x) y_1 + \xi_y y_1^2$$

and

$$\eta^{(2)} = \eta_{xx} + (2\eta_{xy} - \xi_{xx}) y_1 + (\eta_{yy} - 2\xi_{xy}) y_1^2 - \xi_{yy} y_1^3 + (\eta_y - 2\xi_x) y_2 - 2\xi_y y_1 y_2.$$

Also notice we can replace y₂ via the original differential equation, i.e.

$$y_2 = -\frac{2y_1(y_1+1) + y_1^{3/2}}{x - y}.$$

Substituting for η⁽¹⁾, η⁽²⁾ and y₂ via the above expressions, multiplying both sides by (x − y)² and rearranging the terms as powers of y₁ yields the rather long equation:

$$\begin{aligned}
&\big( 2x\eta_x - 2y\eta_x + x^2\eta_{xx} - 2xy\eta_{xx} + y^2\eta_{xx} \big) \\
&+ y_1 \big( -2\xi + 2\eta + 4x\eta_x - 4y\eta_x + 2x\eta_y - 2y\eta_y - 2x\xi_x + 2y\xi_x + 2x^2\eta_{xy} - 4xy\eta_{xy} + 2y^2\eta_{xy} \\
&\qquad\quad - x^2\xi_{xx} + 2xy\xi_{xx} - y^2\xi_{xx} - 2x\eta_y + 4x\xi_x + 2y\eta_y - 4y\xi_x \big) \\
&+ y_1^2 \big( -2\xi + 2\eta + 4x\eta_y - 4y\eta_y - 4x\xi_x + 4y\xi_x + 2x\xi_y - 2y\xi_y + x^2\eta_{yy} - 2xy\eta_{yy} + y^2\eta_{yy} \\
&\qquad\quad - 2x^2\xi_{xy} + 4xy\xi_{xy} - 2y^2\xi_{xy} - 2x\eta_y + 4x\xi_x + 2y\eta_y - 4y\xi_x + 4x\xi_y - 4y\xi_y \big) \\
&+ y_1^3 \big( 4x\xi_y - 4y\xi_y - x^2\xi_{yy} + 2xy\xi_{yy} - y^2\xi_{yy} + 4x\xi_y - 4y\xi_y \big) \\
&+ y_1^{1/2} \big( \tfrac{3}{2}x\eta_x - \tfrac{3}{2}y\eta_x \big) \\
&+ y_1^{3/2} \big( -\xi + \eta + \tfrac{3}{2}x\eta_y - \tfrac{3}{2}y\eta_y - \tfrac{3}{2}x\xi_x + \tfrac{3}{2}y\xi_x - x\eta_y + 2x\xi_x + y\eta_y - 2y\xi_x \big) \\
&+ y_1^{5/2} \big( \tfrac{3}{2}x\xi_y - \tfrac{3}{2}y\xi_y + 2x\xi_y - 2y\xi_y \big) = 0.
\end{aligned}$$

This expression on the left hand side must vanish, so comparing the coefficients of powers of y₁ gives the determining equations:

$$2x\eta_x - 2y\eta_x + x^2\eta_{xx} - 2xy\eta_{xx} + y^2\eta_{xx} = 0 \quad (32)$$

$$-2\xi + 2\eta + 4x\eta_x - 4y\eta_x + 2x\eta_y - 2y\eta_y - 2x\xi_x + 2y\xi_x + 2x^2\eta_{xy} - 4xy\eta_{xy} + 2y^2\eta_{xy} - x^2\xi_{xx} + 2xy\xi_{xx} - y^2\xi_{xx} - 2x\eta_y + 4x\xi_x + 2y\eta_y - 4y\xi_x = 0$$

$$-2\xi + 2\eta + 4x\eta_y - 4y\eta_y - 4x\xi_x + 4y\xi_x + 2x\xi_y - 2y\xi_y + x^2\eta_{yy} - 2xy\eta_{yy} + y^2\eta_{yy} - 2x^2\xi_{xy} + 4xy\xi_{xy} - 2y^2\xi_{xy} - 2x\eta_y + 4x\xi_x + 2y\eta_y - 4y\xi_x + 4x\xi_y - 4y\xi_y = 0$$

$$4x\xi_y - 4y\xi_y - x^2\xi_{yy} + 2xy\xi_{yy} - y^2\xi_{yy} + 4x\xi_y - 4y\xi_y = 0 \quad (33)$$

$$\tfrac{3}{2}x\eta_x - \tfrac{3}{2}y\eta_x = 0 \quad (34)$$

$$-\xi + \eta + \tfrac{3}{2}x\eta_y - \tfrac{3}{2}y\eta_y - \tfrac{3}{2}x\xi_x + \tfrac{3}{2}y\xi_x - x\eta_y + 2x\xi_x + y\eta_y - 2y\xi_x = 0$$

$$\tfrac{3}{2}x\xi_y - \tfrac{3}{2}y\xi_y + 2x\xi_y - 2y\xi_y = 0 \quad (35)$$

It is immediately obvious from (34) that η_x = 0 and from (35) that ξ_y = 0. Consequently (32) and (33) vanish. The remaining determining equations simplify to:

$$-2\xi + 2\eta + 2x\xi_x - 2y\xi_x - x^2\xi_{xx} + 2xy\xi_{xx} - y^2\xi_{xx} = 0 \quad (36)$$

$$-2\xi + 2\eta + 2x\eta_y - 2y\eta_y + x^2\eta_{yy} - 2xy\eta_{yy} + y^2\eta_{yy} = 0 \quad (37)$$

$$-\xi + \eta + \tfrac{1}{2}x\eta_y - \tfrac{1}{2}y\eta_y + \tfrac{1}{2}x\xi_x - \tfrac{1}{2}y\xi_x = 0 \quad (38)$$

These remaining partial differential equations in ξ(x, y) and η(x, y) are linear and, recall, η(x, y) is independent of x and ξ(x, y) is independent of y, respectively. To solve these partial differential equations we can therefore express ξ(x) = Σ_{n=0}^∞ aₙxⁿ and η(y) = Σ_{m=0}^∞ b_m y^m. Consequently (36) becomes:

$$-2\sum_{n=0}^{\infty} a_n x^n + 2\sum_{m=0}^{\infty} b_m y^m + 2\sum_{n=1}^{\infty} n a_n x^n - 2y \sum_{n=1}^{\infty} n a_n x^{n-1} - \sum_{n=2}^{\infty} n(n-1) a_n x^n + 2y \sum_{n=2}^{\infty} n(n-1) a_n x^{n-1} - y^2 \sum_{n=2}^{\infty} n(n-1) a_n x^{n-2} = 0$$

Comparing the constant term implies b₀ = a₀. Comparing the terms containing y implies:

$$2b_1 - 2\sum_{n=1}^{\infty} n a_n x^{n-1} + 2\sum_{n=2}^{\infty} n(n-1) a_n x^{n-1} = 0$$

Comparing coefficients of xⁿ gives b₁ = a₁ and, for n ≥ 2, either n = n(n − 1) or aₙ = 0. Of course, n = n(n − 1) has solution n = 2 for n ≥ 2, so aₙ = 0 for n ≥ 3. Comparing the terms containing y² implies b₂ = a₂. Notice (37) is symmetric with (36), in the sense that swapping ξ with η and x with y in (36) gives (37). So, by symmetry, (37) gives b₀ = a₀, b₁ = a₁, b₂ = a₂ and bₙ = 0 for n ≥ 3. Eq. (38) gives no additional solutions.
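The conclusion of this calculation can be checked symbolically. The sketch below (our own check, not code from the paper) substitutes the quadratic infinitesimals ξ = a₀ + a₁x + a₂x², η = a₀ + a₁y + a₂y² into the simplified determining equations (36)-(38), and verifies that the corresponding vector fields x²∂ₓ + y²∂_y, x∂ₓ + y∂_y and ∂ₓ + ∂_y close into a three-dimensional Lie algebra (note that the direct computation yields [X₁, X₃] = −2X₂):

```python
import sympy as sp

# Substitute the quadratic infinitesimals into the determining equations.
x, y, a0, a1, a2 = sp.symbols('x y a0 a1 a2')
xi = a0 + a1*x + a2*x**2
eta = a0 + a1*y + a2*y**2

xi_x, xi_xx = sp.diff(xi, x), sp.diff(xi, x, 2)
eta_y, eta_yy = sp.diff(eta, y), sp.diff(eta, y, 2)

eq36 = -2*xi + 2*eta + 2*x*xi_x - 2*y*xi_x - x**2*xi_xx + 2*x*y*xi_xx - y**2*xi_xx
eq37 = -2*xi + 2*eta + 2*x*eta_y - 2*y*eta_y + x**2*eta_yy - 2*x*y*eta_yy + y**2*eta_yy
eq38 = -xi + eta + sp.Rational(1, 2)*(x - y)*(eta_y + xi_x)

assert sp.expand(eq36) == 0 and sp.expand(eq37) == 0 and sp.expand(eq38) == 0

# The corresponding generators, applied to a generic test function f(x, y).
f = sp.Function('f')(x, y)
vec = lambda a, b: (lambda F: a*sp.diff(F, x) + b*sp.diff(F, y))
X1, X2, X3 = vec(x**2, y**2), vec(x, y), vec(1, 1)
bracket = lambda A, B: A(B(f)) - B(A(f))   # commutator [A, B] acting on f

assert sp.expand(bracket(X1, X2) + X1(f)) == 0      # [X1, X2] = -X1
assert sp.expand(bracket(X2, X3) + X3(f)) == 0      # [X2, X3] = -X3
assert sp.expand(bracket(X1, X3) + 2*X2(f)) == 0    # [X1, X3] = -2*X2
print("determining equations and commutation relations verified")
```

Such mechanical checks are routine in symmetry analysis; see e.g. Baumann (2013) for fully automated treatments.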
Therefore, the example ODE admits a three parameter Lie group of transformations with infinitesimals

$$\xi = a_0 + a_1 x + a_2 x^2, \qquad \eta = a_0 + a_1 y + a_2 y^2,$$

where a₂, a₁ and a₀ are arbitrary constants. The infinitesimal generators corresponding to a₂, a₁ and a₀ are, respectively,

$$X_1 = x^2 \frac{\partial}{\partial x} + y^2 \frac{\partial}{\partial y} \quad (39)$$

$$X_2 = x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y}$$

$$X_3 = \frac{\partial}{\partial x} + \frac{\partial}{\partial y}, \quad (40)$$

which generate a three dimensional Lie algebra.

Example 14 (Ex. 13, continued). Recall from Ex. 13 that the second order nonlinear ODE in Eq. (31) admits a three parameter Lie group of transformations with infinitesimal generators X₁, X₂, X₃ defined in Eqs. (39)-(40). These generators can be verified to satisfy

$$[X_1, X_2] = -X_1, \qquad [X_1, X_3] = -2X_2, \qquad [X_2, X_3] = -X_3.$$

The pairs X₁, X₂ and X₂, X₃ each form a two dimensional (and therefore solvable, by Thm. 6) Lie sub-algebra and can be used as the basis for our method. For the derivations below we proceed with the arbitrary choice X₁, X₂. Following the proof of Thm. 8, first we seek a solution v = v(x, y) to the first order linear PDE X₁v = 0, i.e. we must solve

$$x^2 \frac{\partial v}{\partial x} + y^2 \frac{\partial v}{\partial y} = 0.$$

This has general solution v = f(1/y − 1/x) for some arbitrary function f, and we pick the particular solution v(x, y) = 1/y − 1/x. Next we seek a solution w = w(x, y, y₁) to the first order linear PDE X₁⁽¹⁾w = 0, i.e. we must solve

$$x^2 \frac{\partial w}{\partial x} + y^2 \frac{\partial w}{\partial y} + 2(y - x) y_1 \frac{\partial w}{\partial y_1} = 0.$$

Again, we pick a particular solution, w(x, y, y₁) = y₁(x/y)². In accordance with Eq. (27), we can re-write the original ODE (31) in terms of the coordinates v and w to obtain

$$\frac{dw}{dv} = \begin{cases} \dfrac{w^{3/2} + 2w(w+1)}{v(w-1)}, & x/y \geq 0 \\[2ex] \dfrac{-w^{3/2} + 2w(w+1)}{v(w-1)}, & x/y < 0 \end{cases} \quad (41)$$

Next we express X₂⁽¹⁾ in terms of v and w and find its canonical coordinates r̃(v, w), s̃(v, w).
To this end, we have

$$X_2^{(1)} = x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y} = -v \frac{\partial}{\partial v},$$

which has canonical coordinates r̃(v, w) = w, s̃(v, w) = −log(v). Re-writing Eq. (41) in terms of r̃, s̃ leads to the analogue of Eq. (28):

$$\frac{d\tilde{s}}{d\tilde{r}} = \frac{1 - \tilde{r}}{\pm \tilde{r}^{3/2} + 2\tilde{r}(\tilde{r} + 1)} =: H(\tilde{r}) \quad (42)$$

This example exhibits the convenient feature that Eq. (42) can be directly integrated to give

$$\tilde{s}(\tilde{r}) = -\log\big( 2\tilde{r} \pm \sqrt{\tilde{r}} + 2 \big) + \tfrac{1}{2}\log(\tilde{r}) + C,$$

which can be re-written in terms of x, y, y₁ to give

$$\log\left( \frac{1}{y} - \frac{1}{x} \right) = \log\left( 2\sqrt{\frac{y_1 x^2}{y^2}} \pm 1 + \frac{2}{\sqrt{y_1 x^2 / y^2}} \right) - C \quad (43)$$

for some integration constant C. The final step, to remove the y₁ dependence, requires canonical coordinates for X₁. These can be selected as r(x, y) = 1/y − 1/x, s(x, y) = −1/y. Then Eq. (43) becomes

$$\frac{r \mp \exp(-C)}{2\exp(-C)} = \left( \frac{ds/dr}{1 + ds/dr} \right)^{1/2} + \left( \frac{1 + ds/dr}{ds/dr} \right)^{1/2}$$

which is equivalent to Eq. (19) for some function G.

A.2 Design of the Training Set

The performance of the proposed Bayesian PNM is not our main focus in this work, as we consider the method to be (only) a proof-of-concept. However, for completeness, we acknowledge that performance will depend on the locations at which the gradient field is evaluated; the so-called training set. In this section we discuss how these inputs could be optimally selected. To simplify the presentation, we focus on the case of a first order ODE, as in Eq. (15), where the inputs r₀, ..., rₙ must be selected. The design of a PNM can be viewed as an instance of statistical experimental design (Chaloner and Verdinelli, 1995). In Sec. 3 of Cockayne et al. (2019) a connection between PNM and decision-theoretic experimental design was exposed. Such methods require that a loss function L : Q × Q → R is provided, where L(q, q†) quantifies the loss when q is used as an estimate for the true quantity of interest q†.
Further detail was provided in Oates et al. (2019). To avoid repetition, in the remainder we focus instead on approximate experimental design, where a loss function is not explicitly needed. Recall that the output of a PNM is the distribution µₙ = B(µ, aₙ) ∈ P_Q. Then one can specify a functional ℓ : P_Q → R and compute

$$\tau(r_0, \ldots, r_n) = \int \ell\big( B(\mu, A(\mathrm{y}; r_0, \ldots, r_n)) \big)\, d\mu(\mathrm{y}) \quad (44)$$

where A(·; r₀, ..., rₙ) is the information operator in Eq. (17) with the dependence on r₀, ..., rₙ made explicit. For the choice ℓ(ν) = log det(Cov_{Q̃∼ν}[Q̃]), a configuration (r₀, ..., rₙ) for which τ(r₀, ..., rₙ) is minimised is said to be D-optimal. The functional ℓ plays the role of an approximation to posterior expected loss, and other choices for ℓ lead to other approximate notions of optimal experimental design. For instance, an A-optimal design was used for the Bayesian solution of a partial differential equation in Cockayne et al. (2016). For further background on experimental design we refer the reader to Chaloner and Verdinelli (1995). Importantly, Eq. (44) does not depend on the information A(y†) and can therefore be evaluated prior to the experiment being performed. However, in general, the numerical approximation of Eq. (44), and the task of finding a minimal configuration, is practically difficult. The reader is referred to Overstall et al. (2019) for further discussion of experimental design in the PNM context.

A.3 Computational Detail

In this Appendix we set out in detail the prior construction that was used for the numerical illustrations of Sec. 4 in the main text. Recall that for both the first order ODE example in Sec. 4.1 and the second order ODE example in Sec. 4.2 we required a non-parametric prior over functions ζ which satisfy the constraints given in Eq. (24), namely that

$$\zeta(r_0) = 0, \qquad \zeta(r) \leq 1, \qquad \frac{d\zeta}{dr} \geq 0.$$
(45)

Moreover, bearing in mind the posterior computation that is to follow, we require in addition that the prior conveniently facilitates the conditioning calculations involved. It is clear that standard non-parametric priors such as Gaussian processes do not satisfy the boundedness or monotonicity constraints, whilst a nonlinear transformation of such a process would fail to make conditioning on data straight-forward. In fact, the construction of such flexible priors remains an active area of research. To proceed, we adopted an approach recently proposed in López-Lopera et al. (2018). In brief, the main idea is to construct an N-dimensional parametric distribution over functions for which Eq. (45) is satisfied. This distribution, being finite-dimensional, allows for the possibility of tractable conditioning operations, whilst the flexibility to take N arbitrarily large provides a means of ensuring that the salient uncertainty is accurately represented. More specifically, the function ζ is parametrised as

$$\zeta(r) = \sum_{j=1}^{N} z_j \phi_j(r) \quad (46)$$

where the φⱼ are basis functions

$$\phi_j(r) = \begin{cases} 1 - \left| \dfrac{r - t_j}{h} \right|, & \left| \dfrac{r - t_j}{h} \right| \leq 1 \\ 0, & \text{otherwise} \end{cases}$$

for equally spaced points tⱼ with increment h, as recommended in López-Lopera et al. (2018). A prior on ζ can be induced via a prior on the coefficients z₁, ..., z_N, with N taken to be substantially larger than the number n of datapoints on which the zᵢ are to be conditioned. The specific construction of a prior on the coefficients is required to encode the constraints in Eq. (45) and to admit tractable conditioning; these issues are discussed in the remainder. First, we consider the boundedness and monotonicity constraints in Eq. (45). At the level of the coefficients, it is straight-forward to check that this requires that the prior support is restricted to the set

$$Z = \{ z \in \mathbb{R}^N : 0 \leq z_1 \leq z_2 \leq \cdots \leq z_N \leq 1 \}.$$
For convenience, we elected to use a prior that was obtained by restricting a standard Gaussian measure N(0, I) on R^N to the set Z. Second, we consider how to condition on a dataset. Recall that information is provided on the values of the gradient ζ′(rᵢ) = bᵢ of the function ζ, evaluated at a finite number of locations rᵢ of the canonical coordinate r, together with the initial condition ζ(r₀) = b₀. Thus the information can be described by the linear system of constraints Φz = b, where

$$\Phi = \begin{bmatrix} \phi_1(r_0) & \ldots & \phi_N(r_0) \\ \phi'_1(r_1) & \ldots & \phi'_N(r_1) \\ \vdots & & \vdots \\ \phi'_1(r_n) & \ldots & \phi'_N(r_n) \end{bmatrix}, \qquad b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{bmatrix}.$$

The posterior can therefore be characterised as the restriction of N(0, I) to the set Z ∩ D, where D = {z ∈ R^N : Φz = b}. Finally, we discuss how posterior computation was performed. The key observation is that an equivalent characterisation of the posterior is first to restrict N(0, I) to D and then to further restrict to Z. This is advantageous since the linear nature of the data implies that the restriction z|D of N(0, I) to D is again a Gaussian with a closed form, denoted N(µ, Σ). It is important to note that Σ is singular (rank ρ = N − n − 1) and so Σ = UΛ²Uᵀ, where U is an orthogonal matrix and Λ is a diagonal matrix with ρ non-zero entries on the diagonal. Thus we can express z|D in the form z = µ + UΛz̃, where z̃ ∼ N(0, I) is a standard Gaussian on R^ρ. Let M = UΛ and let mᵢ = [m_{i,1}, ..., m_{i,ρ}] denote the ith row of M. Then we have the relation z ∈ Z ⟺ z̃ ∈ Z̃, where

$$\tilde{Z} = \{ \tilde{z} \in \mathbb{R}^\rho : 0 \leq m_1 \tilde{z} + \mu_1,\;\; 0 \leq (m_{i+1} - m_i)\tilde{z} + (\mu_{i+1} - \mu_i),\;\; 0 \leq -m_N \tilde{z} + (1 - \mu_N) \}$$

or, equivalently, Z̃ = {z̃ ∈ R^ρ : Fz̃ + g ≥ 0} for

$$F = \begin{bmatrix} m_1 \\ m_2 - m_1 \\ \vdots \\ m_N - m_{N-1} \\ -m_N \end{bmatrix}, \qquad g = \begin{bmatrix} \mu_1 \\ \mu_2 - \mu_1 \\ \vdots \\ \mu_N - \mu_{N-1} \\ 1 - \mu_N \end{bmatrix}.$$
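The closed form for z|D can be sketched as follows, using a generic random Φ with full row rank as a stand-in for the basis-function matrix above: conditioning N(0, I) on Φz = b yields mean Φᵀ(ΦΦᵀ)⁻¹b and singular covariance I − Φᵀ(ΦΦᵀ)⁻¹Φ, which factorises as UΛ²Uᵀ so that every sample z = µ + UΛz̃ satisfies the data exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_rows = 8, 3                      # hypothetical sizes for illustration
Phi = rng.standard_normal((n_rows, N))
b = rng.standard_normal(n_rows)

G = np.linalg.inv(Phi @ Phi.T)        # (Phi Phi^T)^{-1}
mu = Phi.T @ G @ b                    # conditional mean
Sigma = np.eye(N) - Phi.T @ G @ Phi   # conditional covariance (a projection)

# The mean satisfies the constraint exactly ...
assert np.allclose(Phi @ mu, b)
# ... and the covariance is annihilated by Phi, with rank N - n_rows.
assert np.allclose(Phi @ Sigma, 0.0)
assert np.linalg.matrix_rank(Sigma) == N - n_rows

# Factor Sigma = U Lambda^2 U^T and draw z = mu + U Lambda z_tilde.
w, U = np.linalg.eigh(Sigma)
Lam = np.sqrt(np.clip(w, 0.0, None))
z = mu + U @ (Lam * rng.standard_normal(N))
assert np.allclose(Phi @ z, b)        # every sample hits the data exactly
print("conditioned samples satisfy Phi z = b")
```

In the text ρ = N − n − 1 because Φ has n + 1 rows; here the row count is arbitrary since the Φ above is a random placeholder.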
The computational task is thus reduced to sampling the restriction of the ρ-dimensional standard Gaussian random variable z̃ to the (non-null) set Z̃. The development of computational methods to sample from such (potentially high-dimensional) distributions is itself an active area of research, and for this work we employed the Hamiltonian Monte Carlo method of Pakman and Paninski (2014), as recommended specifically for this purpose in López-Lopera et al. (2018).
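In low dimensions the constrained draw can even be illustrated with naive rejection sampling; this is only a stand-in for the exact Hamiltonian Monte Carlo method of Pakman and Paninski (2014) used here, and the constraint matrices F, g below are hypothetical (rejection becomes hopeless as ρ grows or the set Z̃ shrinks):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 2
F = np.eye(rho)               # hypothetical constraints: each coordinate >= -0.5
g = np.array([0.5, 0.5])

# Draw z_tilde ~ N(0, I_rho) and keep only draws inside Z_tilde.
samples = []
while len(samples) < 100:
    zt = rng.standard_normal(rho)
    if np.all(F @ zt + g >= 0.0):     # accept iff F z_tilde + g >= 0
        samples.append(zt)
samples = np.array(samples)

assert np.all(samples @ F.T + g >= 0.0)   # every retained draw obeys Z_tilde
print("collected", len(samples), "constrained samples")
```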
