GMLS-Nets: A framework for learning from unstructured data



Nathaniel Trask^{1,+}, Ravi G. Patel^{1}, Ben J. Gross^{2}, Paul J. Atzberger^{2,†}

^1 Sandia National Laboratories, Center for Computing Research
^2 University of California Santa Barbara
^† atzberg@gmail.com   ^+ natrask@sandia.gov   http://atzberger.org/

September 5, 2019

Abstract

Data fields sampled on irregularly spaced points arise in many applications in the sciences and engineering. For regular grids, Convolutional Neural Networks (CNNs) have been used successfully to gain benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds based on Generalized Moving Least Squares (GMLS). GMLS is a non-parametric technique for estimating linear, bounded functionals from scattered data, and has recently been used in the literature for solving partial differential equations. By parameterizing the GMLS estimator, we obtain learning methods for operators with unstructured stencils. In GMLS-Nets the necessary calculations are local, readily parallelizable, and the estimator is supported by a rigorous approximation theory. We show how the framework may be used on unstructured physical data sets to perform functional regression, to identify associated differential operators, and to regress quantities of interest. The results suggest these architectures to be an attractive foundation for data-driven model development in scientific machine learning applications.

1 Introduction

Many scientific and engineering applications require processing data sets sampled on irregularly spaced points: consider, e.g., GIS data associating geospatial locations with measurements, LIDAR data characterizing object geometry via point clouds, or scientific simulations with unstructured meshes. This need is amplified by the recent surge of interest in scientific machine learning (SciML) [2], which targets the application of data-driven techniques to the sciences.
In this setting, data typically takes the form of, e.g., synthetic simulation data from meshes, or sensor data associated with data sites evolving under unknown or partially known dynamics. Such data is often scarce or highly constrained, and it has been proposed that successful SciML strategies will leverage prior knowledge to enhance the information gained from it [1, 2]. One may exploit physical properties and invariances such as transformation symmetries, conservation structure, or mathematical knowledge such as solution regularity [1, 3, 7]. This new application space necessitates ML architectures capable of utilizing such knowledge. Implementations in TensorFlow and PyTorch are available at https://github.com/rgp62/gmls-nets and https://github.com/atzberg/gmls-nets.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

For data sampled on regular grids, Convolutional Neural Networks (CNNs) are widely used to exploit translation invariance and hierarchical structure to extract features from data. Here we generalize this technique to the SciML setting by introducing GMLS-Nets, based on the scattered-data approximation theory underlying Generalized Moving Least Squares (GMLS). Similar to how CNNs learn stencils which benefit from weight sharing, GMLS-Nets operate by using local reconstructions to learn operators between function spaces. The resulting architecture is similarly interpretable, serves as an effective generalization of CNNs to unstructured data, and provides mechanisms to incorporate knowledge of the underlying physics. In this work we show how GMLS-Nets may be used in a SciML setting.
Our results show that GMLS-Nets are an effective tool to discover partial differential equations (PDEs), which may be used as a foundation to construct data-driven models while preserving physical invariants like conservation principles. We also show they may be used to improve traditional scientific components, such as time integrators, and that they can regress engineering quantities of interest from scientific simulation data. Finally, we briefly show that GMLS-Nets perform reasonably relative to convolutional networks on traditional computer vision benchmarks. These results indicate the promise of GMLS-Nets for supporting data-driven modeling efforts in SciML applications.

1.1 Generalized Moving Least Squares (GMLS)

Generalized Moving Least Squares (GMLS) is a non-parametric functional regression technique that constructs approximations of linear, bounded functionals from scattered samples of an underlying field by solving local least-squares problems. On a Banach space $V$ with dual space $V^*$, we aim to recover an estimate of a given target functional $\tau_{\tilde{x}}[u] \in V^*$ acting on $u = u(x) \in V$, where $x, \tilde{x}$ denote associated locations in a compactly supported domain $\Omega \subset \mathbb{R}^d$. We assume $u$ is characterized by an unstructured collection of sampling functionals, $\Lambda(u) := \{\lambda_j(u)\}_{j=1}^N \subset V^*$. To construct this estimate, we consider $P \subset V$ and seek an element $p^* \in P$ which provides an optimal reconstruction of the samples in the following weighted-$\ell_2$ sense:

$$p^* = \underset{p \in P}{\operatorname{argmin}} \; \sum_{j=1}^{N} \left( \lambda_j(u) - \lambda_j(p) \right)^2 \omega(\lambda_j, \tau_{\tilde{x}}). \qquad (1)$$

Here $\omega(\lambda_j, \tau_{\tilde{x}})$ is a positive, compactly supported kernel function establishing spatial correlation between the target functional and the sampling set.
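The local weighted least-squares problem of Eqn. 1 reduces, for a polynomial basis and point samples, to a small weighted normal-equations solve per target point. The sketch below is a hypothetical helper (`gmls_reconstruct` is our name, not from the paper's released code); it uses a 1D centered monomial basis and the weight kernel $W_\epsilon(r) = (1 - r/\epsilon)_+^{\bar p}$ introduced later in the text.

```python
import numpy as np

def gmls_reconstruct(x_target, x_samples, u_samples, degree=2, eps=0.5, p=4):
    """Solve the local weighted least-squares problem of Eqn. 1 in 1D for the
    basis P = span{1, x, x^2, ...} centered at x_target, with point-evaluation
    sampling functionals. Returns the optimal coefficient vector a(u)."""
    r = np.abs(x_samples - x_target)
    w = np.maximum(1.0 - r / eps, 0.0) ** p            # W_eps(r) = (1 - r/eps)_+^p
    Phi = np.vander(x_samples - x_target, degree + 1, increasing=True)  # phi_i(x_j)
    # Weighted normal equations: (Phi^T W Phi) a = Phi^T W u
    A = Phi.T @ (w[:, None] * Phi)
    b = Phi.T @ (w * u_samples)
    return np.linalg.solve(A, b)
```

Because the basis is centered at the target point, the entries of $a(u)$ directly approximate the field value and its scaled derivatives there, which is what makes the encoding useful downstream.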
If one associates locations $X_h := \{x_j\}_{j=1}^N \subset \Omega$ with $\Lambda(u)$, then one may consider radial kernels $\omega = W_\epsilon(\|x_j - \tilde{x}\|_2)$ with support $r < \epsilon$. Assuming the basis $P = \mathrm{span}\{\phi_1, \ldots, \phi_{\dim(P)}\}$ and denoting $\Phi(x) = \{\phi_i(x)\}_{i=1,\ldots,\dim(P)}$, the optimal reconstruction may be written in terms of an optimal coefficient vector $a(u)$:

$$p^* = \Phi(x)^\intercal a(u). \qquad (2)$$

Provided one has knowledge of how the target functional acts on $P$, the final GMLS estimate may be obtained by applying the target functional to the optimal reconstruction:

$$\tau^h_{\tilde{x}}[u] = \tau_{\tilde{x}}(\Phi)^\intercal a(u). \qquad (3)$$

Sufficient conditions for the existence of solutions to Eqn. 1 depend only upon the unisolvency of $\Lambda$ over $V$, the distribution of samples $X_h$, and mild conditions on the domain $\Omega$; they are independent of the choice of $\tau_{\tilde{x}}$. For theoretical underpinnings and recent applications, we refer readers to [9, 25-27].

GMLS has primarily been used to obtain point estimates of differential operators in order to develop meshfree discretizations of PDEs. The abstraction of GMLS, however, provides a mathematically rigorous approximation-theory framework which may be applied to a wealth of problems, whereby one may tailor the choice of $\tau_{\tilde{x}}$, $\Lambda$, $P$ and $\omega$ to a given application. In the current work, we will assume the action of $\tau_{\tilde{x}}$ on $P$ is unknown, and introduce a parameterization $\tau_{\tilde{x},\xi}(\Phi)$, where $\xi$ denotes hyperparameters to be inferred from data.

This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Work supported by DOE Grant ASCR PhILMs DE-SC0019246.
Classically, GMLS is restricted to linear, bounded target functionals; we will also consider a novel nonlinear extension via estimates of the form

$$\tau^h_{\tilde{x}}[u] = q_{\tilde{x},\xi}(a(u)), \qquad (4)$$

where $q_{\tilde{x},\xi}$ is a family of nonlinear operators parameterized by $\xi$ acting upon the GMLS reconstruction. Where unambiguous, we will drop the $\tilde{x}$ dependence of operators and simply write, e.g., $\tau^h[u] = q_\xi(a(u))$. We have recently used related nonlinear variants of GMLS to develop solvers for PDEs on manifolds in [9].

For simplicity, in this work we specialize as follows: let $\Lambda$ be point evaluations on $X_h$; let $P$ be $\pi_m(\mathbb{R}^d)$, the space of $m$th-order polynomials; and let $W_\epsilon(r) = (1 - r/\epsilon)_+^{\bar p}$, where $f_+$ denotes the positive part of a function $f$ and $\bar p \in \mathbb{N}$. We stress, however, that this framework supports much broader application. Consider, e.g., learning from flux data related to $H(\mathrm{div})$-conforming discretizations, where one may select as sampling functional $\lambda_i(u) = \int_{f_i} u \cdot dA$, or consider the physical constraints that may be imposed by selecting $P$ to be divergence-free or to satisfy a differential equation.

We now illustrate the connection between GMLS and convolutional networks in the case of a uniform grid, $X_h \subset \mathbb{Z}^d$. Consider a sampling functional $\lambda_j(u) = u(x_j) - u(x_i)$, and assume the parameterization $\tau_{\tilde{x},\xi}(\Phi) = \left[\xi_1, \ldots, \xi_{\dim(P)}\right]$, with $x_{i,j} = x_i - x_j$. Then the GMLS estimate at a point $x_i$ is given explicitly by

$$\tau^h_{x_i}[u] = \sum_{\alpha,\beta,j} \xi_\alpha \left( \sum_k \phi_\alpha(x_k)\, W(x_{ik})\, \phi_\beta(x_k) \right)^{-1} \phi_\beta(x_j)\, W(x_{i,j})\, (u_j - u_i). \qquad (5)$$

Contracting the terms involving $\alpha$, $\beta$ and $k$, we may write $\tau^h_{x_i}[u] = \sum_j c(\tau, \Lambda)_{ij} (u_j - u_i)$. The collection of stencil coefficients at $x_i \in X_h$ is $\{c(\tau, \Lambda)_{ij}\}_j$. Therefore, one application of GMLS is to build stencils similar to those of convolutional networks.
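To make the stencil construction of Eqn. 5 concrete, the sketch below (hypothetical code; it uses plain point evaluations $\lambda_j(u) = u(x_j)$ rather than the difference functionals in the text) assembles the coefficients $c_j$ for a fixed target functional $\tau = d^2/dx^2$ and checks that the resulting stencil reproduces the second derivative of a quadratic exactly.

```python
import numpy as np

def gmls_stencil(x_i, x_nbrs, degree=2, eps=0.25, p=4):
    """Stencil coefficients c_j so that tau^h[u] = sum_j c_j u(x_j), where the
    target functional is tau = d^2/dx^2 at x_i; for the centered monomial basis
    {1, x, x^2} its action on the basis is tau(Phi) = [0, 0, 2]."""
    d = x_nbrs - x_i
    w = np.maximum(1.0 - np.abs(d) / eps, 0.0) ** p     # W_eps kernel
    Phi = np.vander(d, degree + 1, increasing=True)     # phi_beta(x_j)
    M = Phi.T @ (w[:, None] * Phi)                      # Gram matrix over P
    tau_Phi = np.zeros(degree + 1)
    tau_Phi[2] = 2.0                                    # d^2/dx^2 applied to x^2
    # c = tau(Phi)^T M^{-1} Phi^T W: the contraction over alpha, beta, k in Eqn. 5
    return tau_Phi @ np.linalg.solve(M, w * Phi.T)
```

Because $x^2 \in P$, the GMLS estimate is exact on quadratics, and the stencil annihilates constants; this is the polynomial-reproduction property that underlies the approximation theory cited above.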
A major distinction is that GMLS can handle scattered data sets, and a judicious selection of $\Lambda$, $P$ and $\omega$ can be used to inject prior information. Alternatively, one may interpret the regression over $P$ as an encoding in a low-dimensional space well-suited to characterizing common operators. For continuous functions, for example, an operator's action on the space of polynomials is often sufficient to obtain a good approximation. We also remark that, unlike CNNs, there is often less need to handle boundary effects; GMLS-Nets are capable of learning one-sided stencils.

1.2 GMLS-Nets

From an ML perspective, GMLS estimation consists of two parts: (i) the data is encoded via the coefficient vector $a(u)$, providing a compression of the data in terms of $P$; (ii) the operator is regressed over $P^*$, which is equivalent to finding a function $q_\xi : a(u) \to \mathbb{R}$. We propose GMLS-Layers encoding this process in Figure 1.

This architecture accepts input channels, indexed by $\alpha$, which consist of components of the data vector-field $[u]_\alpha$ sampled over the scattered points $X_h$. We allow for different sampling points for each channel, which may be helpful for heterogeneous data. Each input channel is then used to obtain an encoding of the input field as the vector $a(u)$ identifying the optimal representer in $P$.

We next select our parameterization of the functional via $q_\xi$, which may be any family of functions trainable by back-propagation. We consider two cases in this work, appropriate for linear and nonlinear operators. In the linear case we take $q_{x_i}(a) = \xi^\intercal a$, which is sufficient to exactly reproduce differential operators. For the nonlinear case we parameterize with a multi-layer perceptron (MLP), $q_\xi(a) = \mathrm{MLP}(a)$. Note that in the case of a linear activation function, the single-layer MLP model reduces to the linear model.
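The two-step structure of a linear GMLS-Layer (encode $a(u)$ at every target point via Eqn. 2, then apply the learnable functional $q_\xi(a) = \xi^\intercal a$) can be sketched as follows. This is an illustrative NumPy mock-up with hypothetical names, not the released TensorFlow/PyTorch implementation; note that boundary points simply receive one-sided neighborhoods, consistent with the remark above.

```python
import numpy as np

def gmls_layer(x, u, xi, degree=2, eps=0.35):
    """Minimal linear GMLS-Layer in 1D: at every target point, encode the field
    as the local polynomial coefficient vector a(u) (Eqn. 2), then apply the
    learnable linear functional q_xi(a) = xi^T a. Shapes: x (N,), u (N,),
    xi (degree+1,)."""
    out = np.empty_like(u)
    for i, xt in enumerate(x):
        d = x - xt
        w = np.maximum(1.0 - np.abs(d) / eps, 0.0) ** 4
        Phi = np.vander(d, degree + 1, increasing=True)
        a = np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * u))
        out[i] = xi @ a
    return out
```

With $\xi = [0, 1, 0]$ the layer extracts the first-derivative coefficient of the local reconstruction, i.e. it acts as a learnable $\partial_x$ estimator on the point cloud.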
Nonlinearity may thus be handled within a single nonlinear GMLS-Layer, or by stacking multiple linear GMLS-Layers with intermediate ReLUs, the latter mapping more directly onto traditional CNN construction.

Figure 1: GMLS-Nets. Scattered data inputs are processed by learnable operators $\tau[u]$ parameterized via GMLS estimators. A local reconstruction is built about each data point and encoded as a coefficient vector via Eqn. 2. The coefficient mapping $q(a)$ of Eqn. 4 provides the learnable action of the operator. GMLS-Layers can be stacked to obtain deeper architectures and combined with other neural network operations to perform classification and regression tasks (inset; SD: scattered data, MP: max-pool, MLP: multi-layer perceptron).

We next introduce pooling operators applicable to unstructured data, whereby for each point $x_i$ in a given target point cloud $X_h^{\mathrm{target}}$,

$$\phi(x_i) = F(\{x_j \mid x_j \in X_h,\ |x_j - x_i| < \epsilon\}).$$

Here $F$ represents the pooling operator (e.g. max, average, etc.). With this collection of operators, one may construct architectures similar to CNNs by stacking GMLS-Layers together with pooling layers and other NN components. Strided GMLS-Layers, generalizing strided CNN stencils, may be constructed by choosing target sites on a second, smaller point cloud.

1.3 Relation to other work

Many recent works aim to generalize CNNs away from the limitations of data on regular grids [4, 6]. This includes work on handling inputs in the form of directed and undirected graphs [23], processing graphical data sets in the form of meshes and point clouds [20, 29], and handling scattered sub-samplings of images [6, 8]. Broadly, these works either (i) use the spectral theory of graphs and generalize convolution in the frequency domain [6], or (ii) develop localized notions similar to convolution operations and kernels in the spatial domain [24]. GMLS-Nets are most closely related to the second approach.
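Returning to the pooling operator defined above: it reduces, for each target point, the feature values at all source points within radius $\epsilon$ with a symmetric reduction $F$. A minimal sketch (hypothetical helper, assuming point coordinates stored as rows of an array):

```python
import numpy as np

def pool(x_target, x_source, values, eps, F=np.max):
    """Pooling on scattered data: for each target point, reduce the values at
    all source points within radius eps using the operator F (max, mean, ...)."""
    out = []
    for xt in x_target:
        mask = np.linalg.norm(x_source - xt, axis=-1) < eps
        out.append(F(values[mask]))
    return np.array(out)
```

Choosing `x_target` as a smaller point cloud than `x_source` gives exactly the strided down-sampling described in the text.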
The closest works include SplineCNNs [8], MoNet [11, 15], KP-Conv [24], and SpiderCNN [28]. In each of these methods a local spatial convolution kernel is approximated by a parameterized family of functions: open/closed B-splines [8], a Gaussian correlation kernel [11, 15], or a kernel function based on a learnable combination of radial ReLUs [24]. SpiderCNNs share many similarities with GMLS-Nets, using a kernel based on a learnable degree-three Taylor polynomial taken in product with a learnable radial piecewise-constant weight function [28]. A key distinction of GMLS-Nets is that operators are regressed directly over the dual space $V^*$ without constructing shape/kernel functions. Both approaches provide ways to approximate the action of a processing operator that aggregates over scattered data. We also mention other meshfree learning frameworks, PointNet [19, 20] and Deep Sets [29], but these are aimed primarily at set-based data and geometric processing tasks for segmentation and classification. Additionally, Radial Basis Function (RBF) networks are built upon similar approximation theory [5, 18].

Related work on operator regression in a SciML context includes [3, 7, 13, 14, 16, 17, 21, 22]. In PINNs [17, 21], a versatile framework based on DNNs is developed to regress both linear and nonlinear PDE models while exploiting physics knowledge. In [3] and PDE-Nets [14], CNNs are used to learn stencils to estimate operators. In [7, 22], dictionary learning is used along with sparse optimization methods to identify dynamical systems and infer physical laws associated with time-series data. In [16], regression is performed over a class of nonlinear pseudodifferential operators, formed by composing neural-network-parameterized Fourier multipliers and pointwise functionals. GMLS-Nets can be used in conjunction with the above methods.
GMLS-Nets have the distinction of moving beyond reliance on CNNs on regular grids; they do not need moment conditions to impose accuracy and interpretability of filters for estimating differential operators [14], and they do not require strong assumptions about the particular form of the PDE or a pre-defined dictionary as in [17, 22]. We expect that prior knowledge exploited globally in PINNs-type methods may also be incorporated into GMLS-Layers. In particular, the ability to regress natively over solver degrees of freedom will be particularly useful for SciML applications.

2 Results

2.1 Learning differential operators and identifying governing equations

Figure 2: Regression of Differential Operators. GMLS-Nets can accurately learn both linear and nonlinear operators; shown are the cases of the 1D/2D Laplacians and Burgers' equation. Inhomogeneous operators can also be learned by including the location $x$ as one of the input channels. Training and test data consist of random input functions, in 1D at $10^2$ nodes on $[0,1]$ and in 2D at 400 nodes in $[0,1] \times [0,1]$. Each random input function follows a Gaussian distribution with $u(x) = \sum_k \xi_k \exp(i 2\pi k \cdot x / L)$ and $\xi_k \sim \exp(-\alpha_1 k^2)\,\eta(0,1)$. Training and test data are generated with $\alpha_1 = 0.1$ by applying the operators computed with spectral accuracy, for $N_{train} = 5 \times 10^4$ and $N_{test} = 10^4$.

Many data sets arising in the sciences are generated by processes for which there are expected governing laws expressible in terms of ordinary or partial differential equations. GMLS-Nets provide natural features to regress such operators from observed state trajectories or responses to fluctuations. We consider the two settings

$$\frac{\partial u}{\partial t} = \mathcal{L}[u(t,x)] \quad \text{and} \quad \mathcal{L}[u(x)] = -f(x). \qquad (6)$$

Here $\mathcal{L}[u]$ can be a linear or nonlinear operator.
When the data are snapshots of the system state $u^n = u(t_n)$ at discrete times $t_n = n\Delta t$, we use estimators based on

$$\frac{u^{n+1} - u^n}{\Delta t} = \mathcal{L}[\{u^k\}_{k \in K}; \xi]. \qquad (7)$$

In the case that $K = \{n+1\}$, this corresponds to using an implicit Euler scheme to model the dynamics. Many other choices are possible, and later we shall discuss estimators with conservation properties. The learning capabilities of GMLS-Nets for regressing differential operators are shown in Fig. 2. As we shall discuss in more detail, this can be used to identify the underlying dynamics and obtain governing equations.

2.2 Long-time integrators: discretization for native data-driven modeling

Figure 3: Top: Advection-diffusion solution when $\Delta t = \Delta t_{CFL}$. The true-model solution and regressed solution both agree with the analytic solution. Bottom: Solution for under-resolved dynamics with $\Delta t = 10\Delta t_{CFL}$. The implicit integrator causes the FDM/FVM of the true operator to be overly dissipative. The regressed operator matches well with the FVM operator, matching the phase almost exactly.

Table 1: The $\ell_2$-error for the data-driven finite difference model (FDM) and finite volume model (FVM) for the advection-diffusion equation, compared against classical discretizations using the exact operators. For the conservative data-driven finite volume model, there is an order-of-magnitude better accuracy for large-timestep integration.

| $\Delta t / \Delta t_{CFL}$ | $\mathcal{L}_{FDM,ex}$ | $\mathcal{L}_{FDM}$ | $\mathcal{L}_{FVM,ex}$ | $\mathcal{L}_{FVM}$ |
|---|---|---|---|---|
| 0.1 | 0.00093 | 0.00015 | 0.00014 | 0.00010 |
| 1   | 0.0011  | 0.00093 | 0.0011  | 0.00011 |
| 10  | 0.0083  | 0.0014  | 0.0083  | 0.00035 |

The GMLS framework provides useful ways to target and sample arbitrary functionals. In a data-transfer context, this has been leveraged to couple heterogeneous codes. For example, one may sample the flux degrees of freedom of a Raviart-Thomas finite element space and target cell-integral degrees of freedom of a finite volume code to perform native data transfer.
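Returning to the snapshot estimator of Eqn. 7 with $K = \{n+1\}$: when the learned operator is linear and translation-invariant, fitting it amounts to a linear least-squares problem over snapshot pairs. The sketch below is hypothetical code (least squares stands in for the paper's gradient-descent training of a GMLS-Layer); it recovers the discrete diffusion stencil exactly when the data are generated by implicit Euler.

```python
import numpy as np

def regress_implicit_stencil(U, dt, width=1):
    """Fit a translation-invariant stencil xi so that
    (u^{n+1} - u^n)/dt ~= sum_k xi_k u^{n+1}_{i+k}   (implicit Euler, Eqn. 7),
    by linear least squares over snapshot pairs. U has shape (n_times, n_x)
    and is periodic in x."""
    rows, rhs = [], []
    for n in range(U.shape[0] - 1):
        up = U[n + 1]
        # column k holds u^{n+1}_{i+k} for offsets k = -width..width
        feats = np.stack([np.roll(up, -k) for k in range(-width, width + 1)], axis=1)
        rows.append(feats)
        rhs.append((U[n + 1] - U[n]) / dt)
    A, b = np.vstack(rows), np.concatenate(rhs)
    xi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xi
```

When the snapshots come from implicit Euler applied to the periodic discrete Laplacian, the fitted stencil is $[1, -2, 1]\,\nu/\Delta x^2$, i.e. the generating operator is identified exactly.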
This avoids the need to perform intermediate projections/interpolations [12]. Motivated by this, we demonstrate that GMLS may be used to learn discretization-native data-driven models, whereby dynamics are learned in the natural degrees of freedom for a given model. This provides access to structure-preserving properties such as conservation, e.g., conservation of mass in a physical system.

We take as a source of training data the following analytic solution to the 1D unsteady advection-diffusion equation with advection and diffusion coefficients $a$ and $\nu$ on the interval $\Omega = [0, 30]$:

$$u_{ex}(x,t) = \frac{1}{a\sqrt{4\pi\nu t}} \exp\left( -\frac{\left(x - (x_0 + a t)\right)^2}{4\nu t} \right). \qquad (8)$$

To construct a finite difference model (FDM), we assume a node set $N = \{x_0 = 0, x_1, \ldots, x_{N-1}, x_N = 30\}$. To construct a finite volume model (FVM), we construct the set of cells $C = \{[x_i, x_{i+1}] \mid x_i, x_{i+1} \in N,\ i \in \{0, \ldots, N-1\}\}$, with associated cell measure $\mu(c_i) = |x_{i+1} - x_i|$ and set of oriented boundary faces $F_i = \partial c_i = \{x_{i+1}, -x_i\}$. We then assume, for uniform timestep $\Delta t = t^{n+1} - t^n$, the implicit Euler update for the FDM given by

$$\frac{u_i^{n+1} - u_i^n}{\Delta t} = \mathcal{L}_{FDM}[u^{n+1}; \xi]. \qquad (9)$$

To obtain conservation we use the FVM update

$$\frac{u_i^{n+1} - u_i^n}{\Delta t} = \frac{1}{\mu(c_i)} \sum_{f \in F_i} \int_f \mathcal{L}_{FVM}[u^{n+1}; \xi] \cdot dA. \qquad (10)$$

For the advection-diffusion equation in the limit $\Delta t \to 0$, $\mathcal{L}_{FDM,ex} = a \cdot \nabla u + \nu \nabla^2 u$ and $\mathcal{L}_{FVM,ex} = a u + \nu \nabla u$. By construction, for any choice of hyperparameters $\xi$ the FVM will be locally conservative. In this sense, the physics of mass conservation is enforced strongly via the discretization, and we parameterize only an empirical closure for the fluxes; GMLS naturally enables such native flux regression. We use a single linear GMLS-Net layer to parameterize both $\mathcal{L}_{FDM}$ and $\mathcal{L}_{FVM}$, and train over a single timestep by using Eqn. 8 to evaluate the exact time increment in Eqns. 9-10.
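The key structural property of the update in Eqn. 10 is exact local conservation for any flux parameters $\xi$, because face fluxes telescope when summed over cells. The sketch below is a hypothetical explicit, periodic stand-in (the paper uses implicit Euler with GMLS-parameterized fluxes); the toy linear flux is only illustrative, and conservation holds regardless of its coefficients.

```python
import numpy as np

def fvm_step(u, flux_params, dx, dt):
    """One explicit step of a 1D periodic FVM update in the spirit of Eqn. 10:
    each cell gains what its left face carries in and loses what its right face
    carries out, so the total mass sum(u)*dx is preserved exactly for ANY flux
    parameterization. Here: a toy linear flux on face-neighbor values."""
    a, b = flux_params
    uL, uR = np.roll(u, 1), u                       # cells j-1 and j around face j
    F = a * 0.5 * (uL + uR) + b * (uR - uL) / dx    # toy advective + gradient flux
    return u - dt / dx * (np.roll(F, -1) - F)       # right face minus left face
```

This is what the text means by enforcing the physics "strongly via the discretization": only the closure for the flux is learned, while $\sum_i \mu(c_i) u_i$ is invariant by construction.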
We perform gradient descent to minimize the RMS of the residual with respect to $\xi$. For the FDM and FVM we use a cubic and quartic polynomial space, respectively. Recall that to resolve the diffusive and advective timescales one would select a timestep of roughly

$$\Delta t_{CFL} = \min\left( \frac{\Delta x}{2a},\ \frac{\Delta x^2}{4\nu} \right).$$

After regressing the operator, we solve the extracted scheme to advance from $\left\{u_i^0 = u(x_i, t_0)\right\}_i$ to $\left\{u_i^{t_{final}}\right\}_i$. As implicit Euler is unconditionally stable, one may select $\Delta t \gg \Delta t_{CFL}$ at the expense of introducing numerical dissipation, "smearing" the solution. We consider $\Delta t \in \{0.1\,\Delta t_{CFL},\ \Delta t_{CFL},\ 10\,\Delta t_{CFL}\}$ and compare the learned FDM/FVM dynamics to those obtained with a standard discretization (i.e., letting $\mathcal{L}_{FDM} = \mathcal{L}_{FDM,ex}$).

From Fig. 3 we observe that for $\Delta t / \Delta t_{CFL} \le 1$ both the regressed and reference models agree well with the analytic solution. However, for $\Delta t = 10\,\Delta t_{CFL}$, while the reference models are overly dissipative, the regressed models match the analytic solution. Inspection of the $\ell_2$-norm of the solutions at $t_{final}$ in Table 1 indicates that, as expected, the classical solutions corresponding to $\mathcal{L}_{FDM,ex}$ and $\mathcal{L}_{FVM,ex}$ converge as $O(\Delta t)$. The regressed FDM is consistently more accurate than the exact operator. Most interestingly, the error of the regressed FVM is roughly independent of $\Delta t$, providing a $20\times$ improvement in accuracy over the classical model. This preliminary result suggests that GMLS-Nets offer promise as a tool to develop non-dissipative implicit data-driven models. We suggest that this is due to the ability of GMLS-Nets to regress higher-order differential-operator corrections to the discrete-time dynamics, similar to, e.g., Lax-Friedrichs/Lax-Wendroff schemes.

2.3 Data-driven modeling from molecular dynamics
In science and engineering applications, there are often high-fidelity descriptions of the physics based on molecular dynamics. One would like to extract continuum descriptions to allow for predictions over longer time/length-scales, or to reduce computational costs. Coarse-grained modeling efforts have similar aims while retaining molecular degrees of freedom. Each seeks lower-fidelity models that are able to accurately predict important statistical moments of the high-fidelity model over longer timescales.

As an example, consider a mean-field continuum model derived by coarse-graining a molecular dynamics simulation. Classically, one may pursue homogenization analysis to carefully derive such a continuum model, but such techniques are typically problem-specific and can become technical. We illustrate here how GMLS-Nets can be used to extract a conservative continuum PDE model from particle-level simulation data. Brownian motion has as its infinitesimal generator the unsteady diffusion equation [10]. As a basic example, we will extract a 1D diffusion equation to predict the long-term density of a cloud of particles undergoing pseudo-1D Brownian motion.

Figure 4: GMLS-Nets can be trained with molecular-level data to infer continuum dynamical models. Data are simulations of Brownian motion with periodic boundary conditions on $\Omega = [0,1]$ and diffusivity $D = 1$ (top-left, unconstrained trajectory). Starting with an initial density given by a Heaviside function, we construct histograms over time to estimate the particle density (upper-right, solid lines) and perform further filtering to remove sampling noise (upper-right, dashed lines). The GMLS-Net is trained using the FVM estimator of Eqn. 10. A predictive continuum model is obtained for the density evolution. Long-term agreement is found between the particle-level simulation (bottom, solid lines) and the inferred continuum model (bottom, dashed lines).
We consider the periodic domain $\Omega = [0,1] \times [0, 0.1]$, and generate a collection of $N_p$ particles with initial positions $x_p(t=0)$ drawn from the uniform distribution $U[0, 0.5] \times U[0, 0.1]$. Due to this initialization and the domain geometry, the particle density is statistically one-dimensional. We estimate the density field $\rho(x,t)$ along the first dimension by constructing a collection $C$ of $N$ uniform-width cells and building a histogram,

$$\rho(x,t) = \sum_{c \in C} \sum_{p=1}^{N_p} \mathbb{1}_{x_p(t) \in c}\, \mathbb{1}_{x \in c}. \qquad (11)$$

Here $\mathbb{1}_{x \in A}$ is the indicator function taking unit value for $x \in A$ and zero otherwise. We evolve the particle positions $x_p(t)$ under 2D Brownian motion (the density remains statistically 1D as the particles evolve). In the limit $N_p/N \to \infty$, the particle density satisfies a diffusion equation, and we can scale the Brownian motion increments to obtain a unit diffusion coefficient in this limit. As the ratio $N_p/N$ is finite, there is substantial noise in the extracted density field. We obtain a low-pass-filtered density, $\tilde{\rho}(x,t)$, by convolving $\rho(x,t)$ with a Gaussian kernel of width twice the histogram bin width.

We use the FVM scheme in the same manner as in the previous section. In particular, we regress a flux that matches the increment $(\tilde{\rho}(x, t=10) - \tilde{\rho}(x, t=12))/2\Delta t$. This window was selected since regression at $t = 0$ is ineffective: the density there approximates a Heaviside function, and such near-discontinuities are poorly represented by polynomials and consequently are not expected to train well. Additionally, we train over a time interval of $2\Delta t$; in general $k\Delta t$ steps can be used to help mollify high-frequency temporal noise. To show how the GMLS-Net's inferred operator can be used to make predictions, we evolve the regressed FVM for one hundred timesteps and compare to the density field obtained from the particle solver.
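The density estimate of Eqn. 11 and the subsequent low-pass filtering can be sketched as below. This is a hypothetical helper; the truncated discrete Gaussian kernel and the bin-width/σ choices are illustrative rather than the paper's exact settings.

```python
import numpy as np

def density_histogram(x_particles, n_bins, smooth_sigma_bins=2.0):
    """Estimate a 1D particle density on [0, 1] by histogram (Eqn. 11), then
    low-pass filter it by convolving with a (truncated) discrete Gaussian
    kernel, mirroring the noise-reduction step described in the text."""
    counts, _ = np.histogram(x_particles, bins=n_bins, range=(0.0, 1.0))
    # discrete Gaussian kernel with standard deviation smooth_sigma_bins (in bins)
    k = np.arange(-3 * int(smooth_sigma_bins), 3 * int(smooth_sigma_bins) + 1)
    ker = np.exp(-0.5 * (k / smooth_sigma_bins) ** 2)
    ker /= ker.sum()
    smoothed = np.convolve(counts.astype(float), ker, mode="same")
    return counts, smoothed
```

Because the kernel is normalized, a flat density profile passes through the filter unchanged away from the boundaries, so the smoothing removes sampling noise without biasing the bulk estimate.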
We apply Dirichlet boundary conditions $\rho(0,t) = \rho(1,t) = 1$ and initial conditions matching the histogram $\rho(x, t=0)$. Again, the FVM is conservative by construction, and it is easily shown for all $t$ that $\int_\Omega \rho\, dx = N_p$. A time series summarizing the evolution of the density in both the particle solver and the regressed continuum model is provided in Fig. 4. While this is a basic example, it illustrates the potential of GMLS-Nets for constructing continuum-level models from molecular data. These techniques could also have an impact on data-driven approaches for numerical methods, such as projective integration schemes.

2.4 Image processing: MNIST benchmark

Figure 5: MNIST Classification. GMLS-Layers are substituted for convolution layers in a basic two-layer architecture (Conv2d + ReLU + MaxPool + Conv2d + ReLU + MaxPool + FC). The Conv-2L test uses all Conv-Layers, Hybrid-2L has a GMLS-Layer followed by a Conv-Layer, and GMLS-2L uses all GMLS-Layers. The GMLS-Nets used a polynomial basis of monomials. The filters in GMLS are by design more limited than a general Conv-Layer and correspond here to estimated derivatives of the data set (top-right). Despite these restrictions, the GMLS-Net still performs reasonably well on this basic classification task (bottom table).

While image processing is not the primary application area we intend, GMLS-Nets can be used for tasks such as classification. For the common MNIST benchmark task, we compare the use of GMLS-Nets with CNNs in Figure 5. The CNNs use kernel size 5, zero-padding, max-pool reduction 2, channel sizes 16 and 32, and an FC layer giving a linear map to the soft-max prediction of the categories. The GMLS-Nets use the same architecture, with a GMLS polynomial basis of monomials in $x, y$ up to degree $p_{order} = 4$. We find that, despite the features extracted by GMLS-Nets being more restricted than those of a general CNN, there is only a modest decrease in accuracy on the basic MNIST task.
We do expect larger differences on more sophisticated image tasks. This basic test illustrates how GMLS-Nets with a polynomial basis extract features closely associated with taking derivatives of the data field. We emphasize that for other choices of basis for $p^*$ and sampling functionals $\lambda_j$, other features may be extracted. For polynomials with terms in dictionary order, coefficients are shown in Fig. 5. Notice the clear trends and directional dependence on increases and decreases in the image intensity, indicating $c[1] \sim \partial_x$ and $c[2] \sim \partial_y$. Given the history of PDE modeling, for many classification and regression tasks arising in the sciences and engineering, we expect such derivative-based features extracted by GMLS-Nets will be useful.

2.5 GMLS-Nets on unstructured fluid simulation data

We consider the application of GMLS-Nets to unstructured data sets representative of scientific machine learning applications. Many hydrodynamic flows can be experimentally characterized using velocimetry measurements. While velocity fields can be estimated even for complex geometries, in such measurements one often does not have direct access to other fields, such as the pressure. However, integrated quantities of interest, such as drag, are fundamental for performing engineering analysis and depend upon both the velocity and the pressure. This limits the level of characterization that can be accomplished using velocimetry data alone. We construct GMLS-Net architectures that allow for prediction of the drag directly from unstructured fluid velocity data, without any direct measurement of the pressure.

We illustrate the ideas using flow past a cylinder of radius $L$. This provides a well-studied canonical problem whose drag is fully characterized experimentally in terms of the Reynolds number, $Re = UL/\nu$.
For incompressible flow past a cylinder, one may apply dimensional analysis to relate the drag $F_d$ to the Reynolds number via the drag coefficient $C_d$:

$$\frac{2 F_d}{\rho U_\infty^2 A} = C_d\!\left( \frac{U L}{\nu} \right). \qquad (12)$$

Here $U_\infty$ is the free-stream velocity, $A$ is the frontal area of the cylinder, and $C_d : \mathbb{R} \to \mathbb{R}$. In practice, such analysis requires engineering judgement to identify the relevant dimensionless groups; once identified, this allows one to collapse the relevant experimental parameters $(\rho, U_\infty, A, L, \nu)$ onto a single curve.

Figure 6: GMLS-Nets are trained on a CFD data set of flow velocity fields. Top: Training set of the drag coefficient plotted as a function of Reynolds number (small black dots), with GMLS-Net predictions for a test set (large red dots). Bottom: Flow velocity fields corresponding to the smallest (left) and largest (right) Reynolds numbers in the test set.

For the purposes of training a GMLS-Net, we construct a synthetic data set by solving the Reynolds-averaged Navier-Stokes (RANS) equations with a steady-state finite volume code. Let $L = \rho = 1$ and consider $U \in [0.1, 20]$ and $\nu \in [10^{-2}, 10^8]$. We consider a $k$-$\epsilon$ turbulence model with inlet conditions consistent with a 10% turbulence intensity and a mixing length corresponding to the inlet size. From the solution, we extract the velocity field $u$ at cell centers to obtain an unstructured point cloud $X_h$. We compute $C_d$ directly from the simulations. We then obtain an unstructured data set of 400 velocity-field features $(u)_i$ over $X_h$, with associated labels $C_d$. We emphasize that although $U_\infty$ and $\nu$ are used to generate the data, they are not included as features, and the Reynolds number is therefore hidden. We remark that the $k$-$\epsilon$ model is well known to perform poorly for flows with strong curvature, such as recirculation zones.
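The nondimensionalization in Eqn. 12 is a one-line computation; the sketch below (a hypothetical helper, not part of the paper's pipeline) maps a dimensional drag measurement to the drag coefficient.

```python
def drag_coefficient(F_d, rho, U_inf, A):
    """Drag coefficient per Eqn. 12: C_d = 2 F_d / (rho * U_inf^2 * A)."""
    return 2.0 * F_d / (rho * U_inf**2 * A)
```

This is the collapse the network is implicitly asked to discover: the label depends on the flow field only through the hidden dimensionless group $Re = UL/\nu$.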
Here, in our proof-of-concept demonstration, we treat the RANS k-ε solution as ground truth for simplicity, despite its shortcomings, and acknowledge that a more physical study would consider ensemble averages of LES/DNS data in 3D. We aim here simply to illustrate the potential utility of GMLS-Nets in a scientific setting for processing such unstructured data sets.

As an architecture, we provide two input channels for the two velocity components to three stacked GMLS layers. The first layer acts on the cell centers, and intermediate pooling layers down-sample to random subsets of X_h. We conclude with a linear activation layer to extract the drag coefficient as a single scalar output. We randomly select 80% of the samples for training and use the remainder as a test set. We quantify accuracy using the root-mean-square error (RMSE), which we find to be below 1.5%.

The excellent predictive capability demonstrated in Fig. 6 highlights the ability of GMLS-Nets to regress engineering quantities of interest directly from velocity flow data; the GMLS-Net architecture is able to identify a latent low-dimensional parameter space of the kind typically found by hand using dimensional analysis. This similarity relationship across Reynolds numbers is identified despite the fact that the network does not have direct access to the viscosity parameter. These initial results indicate some of the potential of GMLS-Nets in processing unstructured data sets for scientific machine learning applications.

3 Conclusions

We have introduced GMLS-Nets for processing scattered data sets leveraging the framework of GMLS. GMLS-Nets allow for generalizing convolutional networks to scattered data, while still benefiting from underlying translational invariances and weight sharing.
The GMLS layers provide feature extractors that are particularly natural for regressing differential operators, developing dynamical models, and predicting quantities of interest associated with physical systems. GMLS-Nets were demonstrated to be capable of obtaining dynamical models for long-time integration beyond the limits of traditional CFL conditions, of making predictions of density evolution of molecular systems, and of predicting quantities of interest in fluid mechanics directly from flow data. These initial results indicate some promising capabilities of GMLS-Nets for use in data-driven modeling in scientific machine learning applications.

References

[1] P. J. Atzberger. "Importance of the Mathematical Foundations of Machine Learning Methods for Scientific and Engineering Applications". In: SciML2018 Workshop, position paper, https://arxiv.org/abs/1808.02213 (2018).
[2] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm, Manish Parashar, Abani Patra, James Sethian, Stefan Wild, and Karen Willcox. "Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence". In: (2018).
[3] Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, and Michael P. Brenner. "Learning data-driven discretizations for partial differential equations". In: Proceedings of the National Academy of Sciences 116.31 (2019), pp. 15344–15349. ISSN: 0027-8424. DOI: 10.1073/pnas.1814058116.
[4] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. "Geometric Deep Learning: Going beyond Euclidean data". In: IEEE Signal Processing Magazine 34.4 (2017), pp. 18–42. ISSN: 1053-5888. DOI: 10.1109/MSP.2017.2693418.
[5] D. S. Broomhead and D. Lowe. "Multivariable Functional Interpolation and Adaptive Networks". In: Complex Systems 2.1 (1988), pp. 321–355.
[6] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun.
"Spectral networks and locally connected networks on graphs". In: International Conference on Learning Representations (ICLR2014), CBLS, April 2014. 2014.
[7] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". In: Proceedings of the National Academy of Sciences 113.15 (2016), pp. 3932–3937.
[8] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller. "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, pp. 869–877.
[9] B. J. Gross, N. Trask, P. Kuberry, and P. J. Atzberger. "Meshfree Methods on Manifolds for Hydrodynamic Flows on Curved Surfaces: A Generalized Moving Least-Squares (GMLS) Approach". In: arXiv:1905.10469 (2019).
[10] Ioannis Karatzas and Steven E. Shreve. "Brownian Motion and Stochastic Calculus". In: Springer, 1998, pp. 47–127.
[11] Thomas N. Kipf and Max Welling. "Semi-Supervised Classification with Graph Convolutional Networks". In: ArXiv abs/1609.02907 (2016).
[12] Paul Allen Kuberry, Pavel B. Bochev, and Kara J. Peterson. A virtual control meshfree coupling method for non-coincident interfaces. Tech. rep. Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States), 2018.
[13] I. E. Lagaris, A. Likas, and D. I. Fotiadis. "Artificial neural networks for solving ordinary and partial differential equations". In: IEEE Transactions on Neural Networks 9.5 (1998), pp. 987–1000.
[14] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. "PDE-Net: Learning PDEs from Data". In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. Stockholmsmässan, Stockholm, Sweden: PMLR, 2018, pp. 3208–3216.
[15] Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein.
"Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs". In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, pp. 5425–5434.
[16] Ravi G. Patel and Olivier Desjardins. "Nonlinear integro-differential operator regression with neural networks". In: ArXiv abs/1810.08552 (2018).
[17] M. Raissi, P. Perdikaris, and G. E. Karniadakis. "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations". In: Journal of Computational Physics 378 (2019), pp. 686–707.
[18] T. Poggio and F. Girosi. "Networks for approximation and learning". In: Proceedings of the IEEE 78.9 (1990), pp. 1481–1497.
[19] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
[20] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc., 2017, pp. 5099–5108.
[21] Maziar Raissi and George Em Karniadakis. "Hidden physics models: Machine learning of nonlinear partial differential equations". In: Journal of Computational Physics 357 (2018), pp. 125–141. ISSN: 0021-9991.
[22] Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. "Data-driven discovery of partial differential equations". In: Science Advances 3.4 (2017).
[23] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. "The Graph Neural Network Model". In: Trans. Neur. Netw. 20.1 (Jan. 2009), pp. 61–80. ISSN: 1045-9227.
[24] Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J. Guibas.
"KPConv: Flexible and Deformable Convolution for Point Clouds". In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2019.
[25] Nathaniel Trask, Pavel Bochev, and Mauro Perego. "A conservative, consistent, and scalable meshfree mimetic method". In: arXiv preprint arXiv:1903.04621 (2019).
[26] Nathaniel Trask, Mauro Perego, and Pavel Bochev. "A high-order staggered meshless method for elliptic problems". In: SIAM Journal on Scientific Computing 39.2 (2017), A479–A502.
[27] Holger Wendland. Scattered Data Approximation. Vol. 17. Cambridge University Press, 2004.
[28] Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. "SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters". In: Computer Vision – ECCV 2018. Ed. by Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss. Cham: Springer International Publishing, 2018, pp. 90–105. ISBN: 978-3-030-01237-3.
[29] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R. Salakhutdinov, and Alexander J. Smola. "Deep Sets". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc., 2017, pp. 3391–3401.

A Derivation of Gradients of the Operator τ_{x_i}[u].

A.1 Parameters of the operator ˜τ.

We give here some details on the derivation of the gradients for the learnable GMLS operator τ[u] and intermediate steps. These can be used in implementations for back-propagation and other applications. GMLS works by mapping data to a local polynomial fit in a region Ω_i around x_i, with p*(x) ≈ u(x) for x ∈ Ω_i. To find the optimal fitting polynomial p*(x) ∈ V for the function u(x), we consider the case with sampling functionals λ_j(x) = δ(x − x_j) and weight function w_ij = w(x_i − x_j).
In a region around a reference point x*, the optimization problem can be expressed parametrically in terms of coefficients a as

$$ a^*(x_i) = \operatorname*{arg\,min}_{a \in \mathbb{R}^m} \sum_j \left( u_j - p(x_j)^T a \right)^2 w_{ij}. $$

We write for short p(x_j) = p(x_j, x_i), where the basis elements in fact depend on x_i. Typically, for polynomials we use p(x_j, x_i) = p(x_j − x_i). This is important in the case we want to take derivatives in the input values x_i of the expressions. Setting the derivative of the objective J with respect to each coefficient a_ℓ to zero, ∂J/∂a_ℓ = 0, implies

$$ \left[ \sum_j p(x_j)\, w_{ij}\, p(x_j)^T \right] a = \sum_j w_{ij}\, p(x_j)\, u_j. $$

Let

$$ M = \sum_j p(x_j)\, w_{ij}\, p(x_j)^T, \qquad r = \sum_j w_{ij}\, p(x_j)\, u_j; $$

then we can rewrite the coefficients as the solution of the linear system M a*(x_i) = r. This is sometimes written more explicitly for analysis and computations as a*(x_i) = M^{-1} r. We can represent a general linear operator ˜τ(x_i) using the a* representation as

$$ \tilde{\tau}(x_i) = q(x_i)^T a^*(x_i). $$

Typically, the weights will not be spatially dependent, q(x_i) = q_0. Throughout, we shall denote this simply as q and assume there is no spatial dependence, unless otherwise indicated.

A.2 Derivatives of ˜τ in x_i, a(x_i), and q.

The derivative in x_i is given by

$$ \frac{\partial}{\partial x_i} a^*(x_i) = \frac{\partial M^{-1}}{\partial x_i}\, r + M^{-1} \frac{\partial r}{\partial x_i}. $$

In the notation, we denote p(x_j) = p(x_j, x_i), where the basis elements can depend on the particular x_i. These terms can be expressed as

$$ \frac{\partial M^{-1}}{\partial x_i} = -M^{-1} \frac{\partial M}{\partial x_i} M^{-1}, $$

where

$$ \frac{\partial M}{\partial x_i} = \sum_j \left[ \left( \frac{\partial}{\partial x_i} p(x_j, x_i) \right) p(x_j, x_i)^T w_{ij} + p(x_j, x_i) \left( \frac{\partial}{\partial x_i} p(x_j, x_i) \right)^T w_{ij} + p(x_j, x_i)\, p(x_j, x_i)^T \frac{\partial w_{ij}}{\partial x_i} \right]. $$

The derivatives of r are given by

$$ \frac{\partial r}{\partial x_i} = \sum_j \left[ \left( \frac{\partial}{\partial x_i} p(x_j) \right) u_j w_{ij} + p(x_j)\, u_j \frac{\partial w_{ij}}{\partial x_i} \right]. $$
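The linear system above can be sketched concretely. The following minimal numpy illustration (our own sketch, not the authors' implementation) assembles M and r for a 1D quadratic basis over a scattered stencil, solves M a* = r, and applies a fixed q selecting the second-derivative functional; this is also the mechanism by which GMLS layers extract derivative-like features:

```python
import numpy as np

# Sketch of the GMLS linear system: M = sum_j p(x_j) w_ij p(x_j)^T,
# r = sum_j w_ij p(x_j) u_j, solve M a* = r, then tau = q^T a*.
rng = np.random.default_rng(1)
xi = 0.0                                        # stencil center x_i
xj = rng.uniform(-0.5, 0.5, size=40)            # scattered 1D neighbors x_j
u = 4.0 + xj + 3.0 * xj**2                      # samples of u(x) = 4 + x + 3 x^2

# Quadratic basis p(x) = [1, x - xi, (x - xi)^2] and Gaussian weights w_ij.
P = np.column_stack([np.ones_like(xj), xj - xi, (xj - xi) ** 2])
w = np.exp(-((xj - xi) ** 2))

M = P.T @ (w[:, None] * P)                      # normal matrix M
r = P.T @ (w * u)                               # right-hand side r
a_star = np.linalg.solve(M, r)                  # a* = M^{-1} r

# A fixed q extracts the second-derivative functional: u''(xi) = 2 a*[2].
q = np.array([0.0, 0.0, 2.0])
tau = q @ a_star
print(tau)  # 6.0 here, since the quadratic fit reproduces u exactly
```

Because u is itself quadratic, the weighted fit is exact and the functional recovers u''(x_i) = 6 to machine precision; for general u the approximation theory of GMLS controls the error.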
The full derivative of the linear operator ˜τ can be expressed as

$$ \frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = \left( \frac{\partial}{\partial x_i} q(x_i)^T \right) a^*(x_i) + q(x_i)^T \left( \frac{\partial}{\partial x_i} a^*(x_i) \right). $$

In the constant case q(x_i) = q_0, the derivative of ˜τ simplifies to

$$ \frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = q_0^T \left( \frac{\partial}{\partial x_i} a^*(x_i) \right). $$

The derivatives of the other terms follow more readily. For the derivative of the linear operator ˜τ in the coefficients a(x_i), we have

$$ \frac{\partial}{\partial a(x_i)} \tilde{\tau}(x_i) = q(x_i). $$

For derivatives of the linear operator ˜τ in the mapping coefficient values q, we have

$$ \frac{\partial}{\partial q(x_i)} \tilde{\tau}(x_i) = a(x_i). $$

In the case of nonlinear operators ˜τ = q(a(x_i)), there are further dependencies beyond just x_i and a(x_i), and less explicit expressions. For example, when using MLPs there may be a hierarchy of trainable weights w. The derivatives of the nonlinear operator can be expressed as

$$ \frac{\partial}{\partial w} \tilde{\tau}(x_i) = \frac{\partial q}{\partial w}(a(x_i)). $$

Here, one relies on back-propagation algorithms for the evaluation of ∂q/∂w. Similarly, given the generality of q(a), for derivatives in a and x_i one can use back-propagation methods on q and the chain rule with the expressions derived in the linear case for the a and x_i dependencies.
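The matrix-inverse derivative identity used above, ∂M⁻¹/∂x_i = −M⁻¹ (∂M/∂x_i) M⁻¹, can be verified numerically with a central finite difference. This is a small self-contained check of ours on an arbitrary parameterized matrix M(t), not part of the paper:

```python
import numpy as np

# Check d(M^{-1})/dt = -M^{-1} (dM/dt) M^{-1} on a parameterized 2x2 matrix.
def M(t):
    # An arbitrary symmetric positive-definite matrix depending on t.
    return np.array([[2.0 + t, 0.5], [0.5, 3.0 + t**2]])

def dM(t):
    # Elementwise derivative of M(t) with respect to t.
    return np.array([[1.0, 0.0], [0.0, 2.0 * t]])

t, h = 0.7, 1e-6
fd = (np.linalg.inv(M(t + h)) - np.linalg.inv(M(t - h))) / (2.0 * h)
Minv = np.linalg.inv(M(t))
analytic = -Minv @ dM(t) @ Minv
print(np.max(np.abs(fd - analytic)))  # close to zero
```

The same identity underlies the ∂M⁻¹/∂x_i term in the back-propagation expressions, with t replaced by the point coordinates x_i.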
