GMLS-Nets: A framework for learning from unstructured data



Nathaniel Trask^{1,+}, Ravi G. Patel^{1}, Ben J. Gross^{2}, Paul J. Atzberger^{2,†}

^1 Sandia National Laboratories, Center for Computing Research
^2 University of California Santa Barbara
^† atzberg@gmail.com   ^+ natrask@sandia.gov   http://atzberger.org/

September 5, 2019

Abstract

Data fields sampled on irregularly spaced points arise in many applications in the sciences and engineering. For regular grids, Convolutional Neural Networks (CNNs) have been used successfully to gain benefits from weight sharing and invariances. We generalize CNNs by introducing methods for data on unstructured point clouds based on Generalized Moving Least Squares (GMLS). GMLS is a non-parametric technique for estimating linear, bounded functionals from scattered data, and has recently been used in the literature for solving partial differential equations. By parameterizing the GMLS estimator, we obtain learning methods for operators with unstructured stencils. In GMLS-Nets the necessary calculations are local, readily parallelizable, and the estimator is supported by a rigorous approximation theory. We show how the framework may be used on unstructured physical data sets to perform functional regression, to identify associated differential operators, and to regress quantities of interest. The results suggest these architectures to be an attractive foundation for data-driven model development in scientific machine learning applications.

1 Introduction

Many scientific and engineering applications require processing data sets sampled on irregularly spaced points: consider, e.g., GIS data associating geospatial locations with measurements, LIDAR data characterizing object geometry via point clouds, or scientific simulations with unstructured meshes. This need is amplified by the recent surge of interest in scientific machine learning (SciML) [2], which targets the application of data-driven techniques to the sciences.
In this setting, data typically takes the form of, e.g., synthetic simulation data from meshes, or sensor data associated with data sites evolving under unknown or partially known dynamics. Such data is often scarce or highly constrained, and it has been proposed that successful SciML strategies will leverage prior knowledge to enhance the information gained from it [1, 2]. One may exploit physical properties and invariances such as transformation symmetries, conservation structure, or mathematical knowledge such as solution regularity [1, 3, 7]. This new application space necessitates ML architectures capable of utilizing such knowledge. Implementations in TensorFlow and PyTorch are available at https://github.com/rgp62/gmls-nets and https://github.com/atzberg/gmls-nets.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

For data sampled on regular grids, Convolutional Neural Networks (CNNs) are widely used to exploit translation invariance and hierarchical structure to extract features from data. Here we generalize this technique to the SciML setting by introducing GMLS-Nets, based on the scattered-data approximation theory underlying Generalized Moving Least Squares (GMLS). Similar to how CNNs learn stencils which benefit from weight sharing, GMLS-Nets operate by using local reconstructions to learn operators between function spaces. The resulting architecture is similarly interpretable, serves as an effective generalization of CNNs to unstructured data, and provides mechanisms to incorporate knowledge of the underlying physics. In this work we show how GMLS-Nets may be used in a SciML setting.
Our results show that GMLS-Nets are an effective tool to discover partial differential equations (PDEs), which may be used as a foundation to construct data-driven models while preserving physical invariants like conservation principles. We also show they may be used to improve traditional scientific components, such as time integrators, and that they can regress engineering quantities of interest from scientific simulation data. Finally, we briefly show that GMLS-Nets perform reasonably relative to convolutional networks on traditional computer vision benchmarks. These results indicate the promise of GMLS-Nets for supporting data-driven modeling efforts in SciML applications.

1.1 Generalized Moving Least Squares (GMLS)

Generalized Moving Least Squares (GMLS) is a non-parametric functional regression technique that constructs approximations of linear, bounded functionals from scattered samples of an underlying field by solving local least-squares problems. On a Banach space $V$ with dual space $V^*$, we aim to recover an estimate of a given target functional $\tau_{\tilde{x}}[u] \in V^*$ acting on $u = u(x) \in V$, where $x, \tilde{x}$ denote associated locations in a compactly supported domain $\Omega \subset \mathbb{R}^d$. We assume $u$ is characterized by an unstructured collection of sampling functionals, $\Lambda(u) := \{\lambda_j(u)\}_{j=1}^N \subset V^*$. To construct this estimate, we consider $P \subset V$ and seek an element $p^* \in P$ which provides an optimal reconstruction of the samples in the following weighted-$\ell_2$ sense:

$$p^* = \underset{p \in P}{\operatorname{argmin}} \; \sum_{j=1}^{N} \left( \lambda_j(u) - \lambda_j(p) \right)^2 \omega(\lambda_j, \tau_{\tilde{x}}). \qquad (1)$$

Here $\omega(\lambda_j, \tau_{\tilde{x}})$ is a positive, compactly supported kernel function establishing spatial correlation between the target functional and the sampling set.
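The local weighted least-squares problem of Eqn. 1 reduces, for a polynomial basis and point samples, to a small weighted normal-equations solve per target point. The sketch below is a hypothetical helper (`gmls_reconstruct` is our name, not from the paper's released code); it uses a 1D centered monomial basis and the weight kernel $W_\epsilon(r) = (1 - r/\epsilon)_+^{\bar p}$ introduced later in the text.

```python
import numpy as np

def gmls_reconstruct(x_target, x_samples, u_samples, degree=2, eps=0.5, p=4):
    """Solve the local weighted least-squares problem of Eqn. 1 in 1D for the
    basis P = span{1, x, x^2, ...} centered at x_target, with point-evaluation
    sampling functionals. Returns the optimal coefficient vector a(u)."""
    r = np.abs(x_samples - x_target)
    w = np.maximum(1.0 - r / eps, 0.0) ** p            # W_eps(r) = (1 - r/eps)_+^p
    Phi = np.vander(x_samples - x_target, degree + 1, increasing=True)  # phi_i(x_j)
    # Weighted normal equations: (Phi^T W Phi) a = Phi^T W u
    A = Phi.T @ (w[:, None] * Phi)
    b = Phi.T @ (w * u_samples)
    return np.linalg.solve(A, b)
```

Because the basis is centered at the target point, the entries of $a(u)$ directly approximate the field value and its scaled derivatives there, which is what makes the encoding useful downstream.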
If one associates locations $X_h := \{x_j\}_{j=1}^N \subset \Omega$ with $\Lambda(u)$, then one may consider radial kernels $\omega = W_\epsilon(\|x_j - \tilde{x}\|_2)$ with support $r < \epsilon$. Assuming the basis $P = \mathrm{span}\{\phi_1, \ldots, \phi_{\dim(P)}\}$ and denoting $\Phi(x) = \{\phi_i(x)\}_{i=1,\ldots,\dim(P)}$, the optimal reconstruction may be written in terms of an optimal coefficient vector $a(u)$:

$$p^* = \Phi(x)^\intercal a(u). \qquad (2)$$

Provided one has knowledge of how the target functional acts on $P$, the final GMLS estimate may be obtained by applying the target functional to the optimal reconstruction:

$$\tau^h_{\tilde{x}}[u] = \tau_{\tilde{x}}(\Phi)^\intercal a(u). \qquad (3)$$

Sufficient conditions for the existence of solutions to Eqn. 1 depend only upon the unisolvency of $\Lambda$ over $V$, the distribution of samples $X_h$, and mild conditions on the domain $\Omega$; they are independent of the choice of $\tau_{\tilde{x}}$. For theoretical underpinnings and recent applications, we refer readers to [9, 25-27].

GMLS has primarily been used to obtain point estimates of differential operators in order to develop meshfree discretizations of PDEs. The abstraction of GMLS, however, provides a mathematically rigorous approximation-theory framework which may be applied to a wealth of problems, whereby one may tailor the choice of $\tau_{\tilde{x}}$, $\Lambda$, $P$ and $\omega$ to a given application. In the current work, we will assume the action of $\tau_{\tilde{x}}$ on $P$ is unknown, and introduce a parameterization $\tau_{\tilde{x},\xi}(\Phi)$, where $\xi$ denotes hyperparameters to be inferred from data.

This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Work supported by DOE Grant ASCR PhILMs DE-SC0019246.
Classically, GMLS is restricted to linear, bounded target functionals; we will also consider a novel nonlinear extension via estimates of the form

$$\tau^h_{\tilde{x}}[u] = q_{\tilde{x},\xi}(a(u)), \qquad (4)$$

where $q_{\tilde{x},\xi}$ is a family of nonlinear operators parameterized by $\xi$ acting upon the GMLS reconstruction. Where unambiguous, we will drop the $\tilde{x}$ dependence of operators and simply write, e.g., $\tau^h[u] = q_\xi(a(u))$. We have recently used related nonlinear variants of GMLS to develop solvers for PDEs on manifolds in [9].

For simplicity, in this work we specialize as follows: let $\Lambda$ be point evaluations on $X_h$; let $P$ be $\pi_m(\mathbb{R}^d)$, the space of $m$th-order polynomials; and let $W_\epsilon(r) = (1 - r/\epsilon)_+^{\bar p}$, where $f_+$ denotes the positive part of a function $f$ and $\bar p \in \mathbb{N}$. We stress, however, that this framework supports much broader application. Consider, e.g., learning from flux data related to $H(\mathrm{div})$-conforming discretizations, where one may select as sampling functional $\lambda_i(u) = \int_{f_i} u \cdot dA$, or consider the physical constraints that may be imposed by selecting $P$ to be divergence-free or to satisfy a differential equation.

We now illustrate the connection between GMLS and convolutional networks in the case of a uniform grid, $X_h \subset \mathbb{Z}^d$. Consider a sampling functional $\lambda_j(u) = u(x_j) - u(x_i)$, and assume the parameterization $\tau_{\tilde{x},\xi}(\Phi) = \left[\xi_1, \ldots, \xi_{\dim(P)}\right]$, with $x_{i,j} = x_i - x_j$. Then the GMLS estimate at a point $x_i$ is given explicitly by

$$\tau^h_{x_i}[u] = \sum_{\alpha,\beta,j} \xi_\alpha \left( \sum_k \phi_\alpha(x_k)\, W(x_{ik})\, \phi_\beta(x_k) \right)^{-1} \phi_\beta(x_j)\, W(x_{i,j})\, (u_j - u_i). \qquad (5)$$

Contracting the terms involving $\alpha$, $\beta$ and $k$, we may write $\tau^h_{x_i}[u] = \sum_j c(\tau, \Lambda)_{ij} (u_j - u_i)$. The collection of stencil coefficients at $x_i \in X_h$ is $\{c(\tau, \Lambda)_{ij}\}_j$. Therefore, one application of GMLS is to build stencils similar to those of convolutional networks.
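To make the stencil construction of Eqn. 5 concrete, the sketch below (hypothetical code; it uses plain point evaluations $\lambda_j(u) = u(x_j)$ rather than the difference functionals in the text) assembles the coefficients $c_j$ for a fixed target functional $\tau = d^2/dx^2$ and checks that the resulting stencil reproduces the second derivative of a quadratic exactly.

```python
import numpy as np

def gmls_stencil(x_i, x_nbrs, degree=2, eps=0.25, p=4):
    """Stencil coefficients c_j so that tau^h[u] = sum_j c_j u(x_j), where the
    target functional is tau = d^2/dx^2 at x_i; for the centered monomial basis
    {1, x, x^2} its action on the basis is tau(Phi) = [0, 0, 2]."""
    d = x_nbrs - x_i
    w = np.maximum(1.0 - np.abs(d) / eps, 0.0) ** p     # W_eps kernel
    Phi = np.vander(d, degree + 1, increasing=True)     # phi_beta(x_j)
    M = Phi.T @ (w[:, None] * Phi)                      # Gram matrix over P
    tau_Phi = np.zeros(degree + 1)
    tau_Phi[2] = 2.0                                    # d^2/dx^2 applied to x^2
    # c = tau(Phi)^T M^{-1} Phi^T W: the contraction over alpha, beta, k in Eqn. 5
    return tau_Phi @ np.linalg.solve(M, w * Phi.T)
```

Because $x^2 \in P$, the GMLS estimate is exact on quadratics, and the stencil annihilates constants; this is the polynomial-reproduction property that underlies the approximation theory cited above.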
A major distinction is that GMLS can handle scattered data sets, and a judicious selection of $\Lambda$, $P$ and $\omega$ can be used to inject prior information. Alternatively, one may interpret the regression over $P$ as an encoding in a low-dimensional space well-suited to characterizing common operators. For continuous functions, for example, an operator's action on the space of polynomials is often sufficient to obtain a good approximation. We also remark that, unlike CNNs, there is often less need to handle boundary effects; GMLS-Nets are capable of learning one-sided stencils.

1.2 GMLS-Nets

From an ML perspective, GMLS estimation consists of two parts: (i) the data is encoded via the coefficient vector $a(u)$, providing a compression of the data in terms of $P$; (ii) the operator is regressed over $P^*$, which is equivalent to finding a function $q_\xi : a(u) \to \mathbb{R}$. We propose GMLS-Layers encoding this process in Figure 1.

This architecture accepts input channels, indexed by $\alpha$, which consist of components of the data vector-field $[u]_\alpha$ sampled over the scattered points $X_h$. We allow for different sampling points for each channel, which may be helpful for heterogeneous data. Each input channel is then used to obtain an encoding of the input field as the vector $a(u)$ identifying the optimal representer in $P$.

We next select our parameterization of the functional via $q_\xi$, which may be any family of functions trainable by back-propagation. We consider two cases in this work, appropriate for linear and nonlinear operators. In the linear case we take $q_{x_i}(a) = \xi^\intercal a$, which is sufficient to exactly reproduce differential operators. For the nonlinear case we parameterize with a multi-layer perceptron (MLP), $q_\xi(a) = \mathrm{MLP}(a)$. Note that in the case of a linear activation function, the single-layer MLP model reduces to the linear model.
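The two-step structure of a linear GMLS-Layer (encode $a(u)$ at every target point via Eqn. 2, then apply the learnable functional $q_\xi(a) = \xi^\intercal a$) can be sketched as follows. This is an illustrative NumPy mock-up with hypothetical names, not the released TensorFlow/PyTorch implementation; note that boundary points simply receive one-sided neighborhoods, consistent with the remark above.

```python
import numpy as np

def gmls_layer(x, u, xi, degree=2, eps=0.35):
    """Minimal linear GMLS-Layer in 1D: at every target point, encode the field
    as the local polynomial coefficient vector a(u) (Eqn. 2), then apply the
    learnable linear functional q_xi(a) = xi^T a. Shapes: x (N,), u (N,),
    xi (degree+1,)."""
    out = np.empty_like(u)
    for i, xt in enumerate(x):
        d = x - xt
        w = np.maximum(1.0 - np.abs(d) / eps, 0.0) ** 4
        Phi = np.vander(d, degree + 1, increasing=True)
        a = np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * u))
        out[i] = xi @ a
    return out
```

With $\xi = [0, 1, 0]$ the layer extracts the first-derivative coefficient of the local reconstruction, i.e. it acts as a learnable $\partial_x$ estimator on the point cloud.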
Nonlinearity may thus be handled within a single nonlinear GMLS-Layer, or by stacking multiple linear GMLS-Layers with intermediate ReLUs, the latter mapping more directly onto traditional CNN construction.

Figure 1: GMLS-Nets. Scattered data inputs are processed by learnable operators $\tau[u]$ parameterized via GMLS estimators. A local reconstruction is built about each data point and encoded as a coefficient vector via Eqn. 2. The coefficient mapping $q(a)$ of Eqn. 4 provides the learnable action of the operator. GMLS-Layers can be stacked to obtain deeper architectures and combined with other neural network operations to perform classification and regression tasks (inset; SD: scattered data, MP: max-pool, MLP: multi-layer perceptron).

We next introduce pooling operators applicable to unstructured data, whereby for each point $x_i$ in a given target point cloud $X_h^{\mathrm{target}}$,

$$\phi(x_i) = F(\{x_j \mid x_j \in X_h,\ |x_j - x_i| < \epsilon\}).$$

Here $F$ represents the pooling operator (e.g. max, average, etc.). With this collection of operators, one may construct architectures similar to CNNs by stacking GMLS-Layers together with pooling layers and other NN components. Strided GMLS-Layers, generalizing strided CNN stencils, may be constructed by choosing target sites on a second, smaller point cloud.

1.3 Relation to other work

Many recent works aim to generalize CNNs away from the limitations of data on regular grids [4, 6]. This includes work on handling inputs in the form of directed and undirected graphs [23], processing graphical data sets in the form of meshes and point clouds [20, 29], and handling scattered sub-samplings of images [6, 8]. Broadly, these works either (i) use the spectral theory of graphs and generalize convolution in the frequency domain [6], or (ii) develop localized notions similar to convolution operations and kernels in the spatial domain [24]. GMLS-Nets are most closely related to the second approach.
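Returning to the pooling operator defined above: it reduces, for each target point, the feature values at all source points within radius $\epsilon$ with a symmetric reduction $F$. A minimal sketch (hypothetical helper, assuming point coordinates stored as rows of an array):

```python
import numpy as np

def pool(x_target, x_source, values, eps, F=np.max):
    """Pooling on scattered data: for each target point, reduce the values at
    all source points within radius eps using the operator F (max, mean, ...)."""
    out = []
    for xt in x_target:
        mask = np.linalg.norm(x_source - xt, axis=-1) < eps
        out.append(F(values[mask]))
    return np.array(out)
```

Choosing `x_target` as a smaller point cloud than `x_source` gives exactly the strided down-sampling described in the text.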
The closest works include SplineCNNs [8], MoNet [11, 15], KP-Conv [24], and SpiderCNN [28]. In each of these methods a local spatial convolution kernel is approximated by a parameterized family of functions: open/closed B-splines [8], a Gaussian correlation kernel [11, 15], or a kernel function based on a learnable combination of radial ReLUs [24]. SpiderCNNs share many similarities with GMLS-Nets, using a kernel based on a learnable degree-three Taylor polynomial taken in product with a learnable radial piecewise-constant weight function [28]. A key distinction of GMLS-Nets is that operators are regressed directly over the dual space $V^*$ without constructing shape/kernel functions. Both approaches provide ways to approximate the action of a processing operator that aggregates over scattered data. We also mention other meshfree learning frameworks, PointNet [19, 20] and Deep Sets [29], but these are aimed primarily at set-based data and geometric processing tasks for segmentation and classification. Additionally, Radial Basis Function (RBF) networks are built upon similar approximation theory [5, 18].

Related work on operator regression in a SciML context includes [3, 7, 13, 14, 16, 17, 21, 22]. In PINNs [17, 21], a versatile framework based on DNNs is developed to regress both linear and nonlinear PDE models while exploiting physics knowledge. In [3] and PDE-Nets [14], CNNs are used to learn stencils to estimate operators. In [7, 22], dictionary learning is used along with sparse optimization methods to identify dynamical systems and infer physical laws associated with time-series data. In [16], regression is performed over a class of nonlinear pseudodifferential operators, formed by composing neural-network-parameterized Fourier multipliers and pointwise functionals. GMLS-Nets can be used in conjunction with the above methods.
GMLS-Nets have the distinction of moving beyond reliance on CNNs on regular grids; they do not need moment conditions to impose accuracy and interpretability of filters for estimating differential operators [14], and they do not require strong assumptions about the particular form of the PDE or a pre-defined dictionary as in [17, 22]. We expect that prior knowledge exploited globally in PINNs-type methods may also be incorporated into GMLS-Layers. In particular, the ability to regress natively over solver degrees of freedom will be particularly useful for SciML applications.

2 Results

2.1 Learning differential operators and identifying governing equations

Figure 2: Regression of Differential Operators. GMLS-Nets can accurately learn both linear and nonlinear operators; shown are the cases of the 1D/2D Laplacians and Burgers' equation. Inhomogeneous operators can also be learned by including the location $x$ as one of the input channels. Training and test data consist of random input functions, in 1D at $10^2$ nodes on $[0,1]$ and in 2D at 400 nodes in $[0,1] \times [0,1]$. Each random input function follows a Gaussian distribution with $u(x) = \sum_k \xi_k \exp(i 2\pi k \cdot x / L)$ and $\xi_k \sim \exp(-\alpha_1 k^2)\,\eta(0,1)$. Training and test data are generated with $\alpha_1 = 0.1$ by applying the operators computed with spectral accuracy, for $N_{train} = 5 \times 10^4$ and $N_{test} = 10^4$.

Many data sets arising in the sciences are generated by processes for which there are expected governing laws expressible in terms of ordinary or partial differential equations. GMLS-Nets provide natural features to regress such operators from observed state trajectories or responses to fluctuations. We consider the two settings

$$\frac{\partial u}{\partial t} = \mathcal{L}[u(t,x)] \quad \text{and} \quad \mathcal{L}[u(x)] = -f(x). \qquad (6)$$

Here $\mathcal{L}[u]$ can be a linear or nonlinear operator.
When the data are snapshots of the system state $u^n = u(t_n)$ at discrete times $t_n = n\Delta t$, we use estimators based on

$$\frac{u^{n+1} - u^n}{\Delta t} = \mathcal{L}[\{u^k\}_{k \in K}; \xi]. \qquad (7)$$

In the case that $K = \{n+1\}$, this corresponds to using an implicit Euler scheme to model the dynamics. Many other choices are possible, and later we shall discuss estimators with conservation properties. The learning capabilities of GMLS-Nets for regressing differential operators are shown in Fig. 2. As we shall discuss in more detail, this can be used to identify the underlying dynamics and obtain governing equations.

2.2 Long-time integrators: discretization for native data-driven modeling

Figure 3: Top: Advection-diffusion solution when $\Delta t = \Delta t_{CFL}$. The true-model solution and regressed solution both agree with the analytic solution. Bottom: Solution for under-resolved dynamics with $\Delta t = 10\Delta t_{CFL}$. The implicit integrator causes the FDM/FVM of the true operator to be overly dissipative. The regressed operator matches well with the FVM operator, matching the phase almost exactly.

Table 1: The $\ell_2$-error for the data-driven finite difference model (FDM) and finite volume model (FVM) for the advection-diffusion equation, compared against classical discretizations using the exact operators. For the conservative data-driven finite volume model, there is an order-of-magnitude better accuracy for large-timestep integration.

| $\Delta t / \Delta t_{CFL}$ | $\mathcal{L}_{FDM,ex}$ | $\mathcal{L}_{FDM}$ | $\mathcal{L}_{FVM,ex}$ | $\mathcal{L}_{FVM}$ |
|---|---|---|---|---|
| 0.1 | 0.00093 | 0.00015 | 0.00014 | 0.00010 |
| 1   | 0.0011  | 0.00093 | 0.0011  | 0.00011 |
| 10  | 0.0083  | 0.0014  | 0.0083  | 0.00035 |

The GMLS framework provides useful ways to target and sample arbitrary functionals. In a data-transfer context, this has been leveraged to couple heterogeneous codes. For example, one may sample the flux degrees of freedom of a Raviart-Thomas finite element space and target cell-integral degrees of freedom of a finite volume code to perform native data transfer.
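Returning to the snapshot estimator of Eqn. 7 with $K = \{n+1\}$: when the learned operator is linear and translation-invariant, fitting it amounts to a linear least-squares problem over snapshot pairs. The sketch below is hypothetical code (least squares stands in for the paper's gradient-descent training of a GMLS-Layer); it recovers the discrete diffusion stencil exactly when the data are generated by implicit Euler.

```python
import numpy as np

def regress_implicit_stencil(U, dt, width=1):
    """Fit a translation-invariant stencil xi so that
    (u^{n+1} - u^n)/dt ~= sum_k xi_k u^{n+1}_{i+k}   (implicit Euler, Eqn. 7),
    by linear least squares over snapshot pairs. U has shape (n_times, n_x)
    and is periodic in x."""
    rows, rhs = [], []
    for n in range(U.shape[0] - 1):
        up = U[n + 1]
        # column k holds u^{n+1}_{i+k} for offsets k = -width..width
        feats = np.stack([np.roll(up, -k) for k in range(-width, width + 1)], axis=1)
        rows.append(feats)
        rhs.append((U[n + 1] - U[n]) / dt)
    A, b = np.vstack(rows), np.concatenate(rhs)
    xi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xi
```

When the snapshots come from implicit Euler applied to the periodic discrete Laplacian, the fitted stencil is $[1, -2, 1]\,\nu/\Delta x^2$, i.e. the generating operator is identified exactly.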
This avoids the need to perform intermediate projections/interpolations [12]. Motivated by this, we demonstrate that GMLS may be used to learn discretization-native data-driven models, whereby dynamics are learned in the natural degrees of freedom for a given model. This provides access to structure-preserving properties such as conservation, e.g., conservation of mass in a physical system.

We take as a source of training data the following analytic solution to the 1D unsteady advection-diffusion equation with advection and diffusion coefficients $a$ and $\nu$ on the interval $\Omega = [0, 30]$:

$$u_{ex}(x,t) = \frac{1}{a\sqrt{4\pi\nu t}} \exp\left( -\frac{\left(x - (x_0 + a t)\right)^2}{4\nu t} \right). \qquad (8)$$

To construct a finite difference model (FDM), we assume a node set $N = \{x_0 = 0, x_1, \ldots, x_{N-1}, x_N = 30\}$. To construct a finite volume model (FVM), we construct the set of cells $C = \{[x_i, x_{i+1}] \mid x_i, x_{i+1} \in N,\ i \in \{0, \ldots, N-1\}\}$, with associated cell measure $\mu(c_i) = |x_{i+1} - x_i|$ and set of oriented boundary faces $F_i = \partial c_i = \{x_{i+1}, -x_i\}$. We then assume, for uniform timestep $\Delta t = t^{n+1} - t^n$, the implicit Euler update for the FDM given by

$$\frac{u_i^{n+1} - u_i^n}{\Delta t} = \mathcal{L}_{FDM}[u^{n+1}; \xi]. \qquad (9)$$

To obtain conservation we use the FVM update

$$\frac{u_i^{n+1} - u_i^n}{\Delta t} = \frac{1}{\mu(c_i)} \sum_{f \in F_i} \int_f \mathcal{L}_{FVM}[u^{n+1}; \xi] \cdot dA. \qquad (10)$$

For the advection-diffusion equation in the limit $\Delta t \to 0$, $\mathcal{L}_{FDM,ex} = a \cdot \nabla u + \nu \nabla^2 u$ and $\mathcal{L}_{FVM,ex} = a u + \nu \nabla u$. By construction, for any choice of hyperparameters $\xi$ the FVM will be locally conservative. In this sense, the physics of mass conservation is enforced strongly via the discretization, and we parameterize only an empirical closure for the fluxes; GMLS naturally enables such native flux regression. We use a single linear GMLS-Net layer to parameterize both $\mathcal{L}_{FDM}$ and $\mathcal{L}_{FVM}$, and train over a single timestep by using Eqn. 8 to evaluate the exact time increment in Eqns. 9-10.
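The key structural property of the update in Eqn. 10 is exact local conservation for any flux parameters $\xi$, because face fluxes telescope when summed over cells. The sketch below is a hypothetical explicit, periodic stand-in (the paper uses implicit Euler with GMLS-parameterized fluxes); the toy linear flux is only illustrative, and conservation holds regardless of its coefficients.

```python
import numpy as np

def fvm_step(u, flux_params, dx, dt):
    """One explicit step of a 1D periodic FVM update in the spirit of Eqn. 10:
    each cell gains what its left face carries in and loses what its right face
    carries out, so the total mass sum(u)*dx is preserved exactly for ANY flux
    parameterization. Here: a toy linear flux on face-neighbor values."""
    a, b = flux_params
    uL, uR = np.roll(u, 1), u                       # cells j-1 and j around face j
    F = a * 0.5 * (uL + uR) + b * (uR - uL) / dx    # toy advective + gradient flux
    return u - dt / dx * (np.roll(F, -1) - F)       # right face minus left face
```

This is what the text means by enforcing the physics "strongly via the discretization": only the closure for the flux is learned, while $\sum_i \mu(c_i) u_i$ is invariant by construction.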
We perform gradient descent to minimize the RMS of the residual with respect to $\xi$. For the FDM and FVM we use a cubic and quartic polynomial space, respectively. Recall that to resolve the diffusive and advective timescales one would select a timestep of roughly

$$\Delta t_{CFL} = \min\left( \frac{\Delta x}{2a},\ \frac{\Delta x^2}{4\nu} \right).$$

After regressing the operator, we solve the extracted scheme to advance from $\left\{u_i^0 = u(x_i, t_0)\right\}_i$ to $\left\{u_i^{t_{final}}\right\}_i$. As implicit Euler is unconditionally stable, one may select $\Delta t \gg \Delta t_{CFL}$ at the expense of introducing numerical dissipation, "smearing" the solution. We consider $\Delta t \in \{0.1\,\Delta t_{CFL},\ \Delta t_{CFL},\ 10\,\Delta t_{CFL}\}$ and compare the learned FDM/FVM dynamics to those obtained with a standard discretization (i.e., letting $\mathcal{L}_{FDM} = \mathcal{L}_{FDM,ex}$).

From Fig. 3 we observe that for $\Delta t / \Delta t_{CFL} \le 1$ both the regressed and reference models agree well with the analytic solution. However, for $\Delta t = 10\,\Delta t_{CFL}$, while the reference models are overly dissipative, the regressed models match the analytic solution. Inspection of the $\ell_2$-norm of the solutions at $t_{final}$ in Table 1 indicates that, as expected, the classical solutions corresponding to $\mathcal{L}_{FDM,ex}$ and $\mathcal{L}_{FVM,ex}$ converge as $O(\Delta t)$. The regressed FDM is consistently more accurate than the exact operator. Most interestingly, the error of the regressed FVM is roughly independent of $\Delta t$, providing a $20\times$ improvement in accuracy over the classical model. This preliminary result suggests that GMLS-Nets offer promise as a tool to develop non-dissipative implicit data-driven models. We suggest that this is due to the ability of GMLS-Nets to regress higher-order differential-operator corrections to the discrete-time dynamics, similar to, e.g., Lax-Friedrichs/Lax-Wendroff schemes.

2.3 Data-driven modeling from molecular dynamics
In science and engineering applications, there are often high-fidelity descriptions of the physics based on molecular dynamics. One would like to extract continuum descriptions to allow for predictions over longer time/length-scales, or to reduce computational costs. Coarse-grained modeling efforts have similar aims while retaining molecular degrees of freedom. Each seeks lower-fidelity models that are able to accurately predict important statistical moments of the high-fidelity model over longer timescales.

As an example, consider a mean-field continuum model derived by coarse-graining a molecular dynamics simulation. Classically, one may pursue homogenization analysis to carefully derive such a continuum model, but such techniques are typically problem-specific and can become technical. We illustrate here how GMLS-Nets can be used to extract a conservative continuum PDE model from particle-level simulation data. Brownian motion has as its infinitesimal generator the unsteady diffusion equation [10]. As a basic example, we will extract a 1D diffusion equation to predict the long-term density of a cloud of particles undergoing pseudo-1D Brownian motion.

Figure 4: GMLS-Nets can be trained with molecular-level data to infer continuum dynamical models. Data are simulations of Brownian motion with periodic boundary conditions on $\Omega = [0,1]$ and diffusivity $D = 1$ (top-left, unconstrained trajectory). Starting with an initial density given by a Heaviside function, we construct histograms over time to estimate the particle density (upper-right, solid lines) and perform further filtering to remove sampling noise (upper-right, dashed lines). The GMLS-Net is trained using the FVM estimator of Eqn. 10. A predictive continuum model is obtained for the density evolution. Long-term agreement is found between the particle-level simulation (bottom, solid lines) and the inferred continuum model (bottom, dashed lines).
We consider the periodic domain $\Omega = [0,1] \times [0, 0.1]$, and generate a collection of $N_p$ particles with initial positions $x_p(t=0)$ drawn from the uniform distribution $U[0, 0.5] \times U[0, 0.1]$. Due to this initialization and the domain geometry, the particle density is statistically one-dimensional. We estimate the density field $\rho(x,t)$ along the first dimension by constructing a collection $C$ of $N$ uniform-width cells and building a histogram,

$$\rho(x,t) = \sum_{c \in C} \sum_{p=1}^{N_p} \mathbb{1}_{x_p(t) \in c}\, \mathbb{1}_{x \in c}. \qquad (11)$$

Here $\mathbb{1}_{x \in A}$ is the indicator function taking unit value for $x \in A$ and zero otherwise. We evolve the particle positions $x_p(t)$ under 2D Brownian motion (the density remains statistically 1D as the particles evolve). In the limit $N_p/N \to \infty$, the particle density satisfies a diffusion equation, and we can scale the Brownian motion increments to obtain a unit diffusion coefficient in this limit. As the ratio $N_p/N$ is finite, there is substantial noise in the extracted density field. We obtain a low-pass-filtered density, $\tilde{\rho}(x,t)$, by convolving $\rho(x,t)$ with a Gaussian kernel of width twice the histogram bin width.

We use the FVM scheme in the same manner as in the previous section. In particular, we regress a flux that matches the increment $(\tilde{\rho}(x, t=10) - \tilde{\rho}(x, t=12))/2\Delta t$. This window was selected since regression at $t = 0$ is ineffective: the density there approximates a Heaviside function, and such near-discontinuities are poorly represented by polynomials and consequently are not expected to train well. Additionally, we train over a time interval of $2\Delta t$; in general $k\Delta t$ steps can be used to help mollify high-frequency temporal noise. To show how the GMLS-Net's inferred operator can be used to make predictions, we evolve the regressed FVM for one hundred timesteps and compare to the density field obtained from the particle solver.
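The density estimate of Eqn. 11 and the subsequent low-pass filtering can be sketched as below. This is a hypothetical helper; the truncated discrete Gaussian kernel and the bin-width/σ choices are illustrative rather than the paper's exact settings.

```python
import numpy as np

def density_histogram(x_particles, n_bins, smooth_sigma_bins=2.0):
    """Estimate a 1D particle density on [0, 1] by histogram (Eqn. 11), then
    low-pass filter it by convolving with a (truncated) discrete Gaussian
    kernel, mirroring the noise-reduction step described in the text."""
    counts, _ = np.histogram(x_particles, bins=n_bins, range=(0.0, 1.0))
    # discrete Gaussian kernel with standard deviation smooth_sigma_bins (in bins)
    k = np.arange(-3 * int(smooth_sigma_bins), 3 * int(smooth_sigma_bins) + 1)
    ker = np.exp(-0.5 * (k / smooth_sigma_bins) ** 2)
    ker /= ker.sum()
    smoothed = np.convolve(counts.astype(float), ker, mode="same")
    return counts, smoothed
```

Because the kernel is normalized, a flat density profile passes through the filter unchanged away from the boundaries, so the smoothing removes sampling noise without biasing the bulk estimate.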
We apply Dirichlet boundary conditions $\rho(0,t) = \rho(1,t) = 1$ and initial conditions matching the histogram $\rho(x, t=0)$. Again, the FVM is conservative by construction, and it is easily shown for all $t$ that $\int_\Omega \rho\, dx = N_p$. A time series summarizing the evolution of the density in both the particle solver and the regressed continuum model is provided in Fig. 4. While this is a basic example, it illustrates the potential of GMLS-Nets for constructing continuum-level models from molecular data. These techniques could also have an impact on data-driven approaches for numerical methods, such as projective integration schemes.

2.4 Image processing: MNIST benchmark

Figure 5: MNIST Classification. GMLS-Layers are substituted for convolution layers in a basic two-layer architecture (Conv2d + ReLU + MaxPool + Conv2d + ReLU + MaxPool + FC). The Conv-2L test uses all Conv-Layers, Hybrid-2L has a GMLS-Layer followed by a Conv-Layer, and GMLS-2L uses all GMLS-Layers. The GMLS-Nets used a polynomial basis of monomials. The filters in GMLS are by design more limited than a general Conv-Layer and correspond here to estimated derivatives of the data set (top-right). Despite these restrictions, the GMLS-Net still performs reasonably well on this basic classification task (bottom table).

While image processing is not the primary application area we intend, GMLS-Nets can be used for tasks such as classification. For the common MNIST benchmark task, we compare the use of GMLS-Nets with CNNs in Figure 5. The CNNs use kernel size 5, zero-padding, max-pool reduction 2, channel sizes 16 and 32, and an FC layer giving a linear map to the soft-max prediction of the categories. The GMLS-Nets use the same architecture, with a GMLS polynomial basis of monomials in $x, y$ up to degree $p_{order} = 4$. We find that, despite the features extracted by GMLS-Nets being more restricted than those of a general CNN, there is only a modest decrease in accuracy on the basic MNIST task.
We do expect larger differences on more sophisticated image tasks. This basic test illustrates how GMLS-Nets with a polynomial basis extract features closely associated with taking derivatives of the data field. We emphasize that for other choices of basis for $p^*$ and sampling functionals $\lambda_j$, other features may be extracted. For polynomials with terms in dictionary order, coefficients are shown in Fig. 5. Notice the clear trends and directional dependence on increases and decreases in the image intensity, indicating $c[1] \sim \partial_x$ and $c[2] \sim \partial_y$. Given the history of PDE modeling, for many classification and regression tasks arising in the sciences and engineering, we expect such derivative-based features extracted by GMLS-Nets will be useful.

2.5 GMLS-Nets on unstructured fluid simulation data

We consider the application of GMLS-Nets to unstructured data sets representative of scientific machine learning applications. Many hydrodynamic flows can be experimentally characterized using velocimetry measurements. While velocity fields can be estimated even for complex geometries, in such measurements one often does not have direct access to other fields, such as the pressure. However, integrated quantities of interest, such as drag, are fundamental for performing engineering analysis and depend upon both the velocity and the pressure. This limits the level of characterization that can be accomplished using velocimetry data alone. We construct GMLS-Net architectures that allow for prediction of the drag directly from unstructured fluid velocity data, without any direct measurement of the pressure.

We illustrate the ideas using flow past a cylinder of radius $L$. This provides a well-studied canonical problem whose drag is fully characterized experimentally in terms of the Reynolds number, $Re = UL/\nu$.
For incompressible flow past a cylinder, one may apply dimensional analysis to relate the drag $F_d$ to the Reynolds number via the drag coefficient $C_d$:

$$\frac{2 F_d}{\rho U_\infty^2 A} = C_d\!\left( \frac{U L}{\nu} \right). \qquad (12)$$

Here $U_\infty$ is the free-stream velocity, $A$ is the frontal area of the cylinder, and $C_d : \mathbb{R} \to \mathbb{R}$. In practice, such analysis requires engineering judgement to identify the relevant dimensionless groups; once identified, this allows one to collapse the relevant experimental parameters $(\rho, U_\infty, A, L, \nu)$ onto a single curve.

Figure 6: GMLS-Nets are trained on a CFD data set of flow velocity fields. Top: Training set of the drag coefficient plotted as a function of Reynolds number (small black dots), with GMLS-Net predictions for a test set (large red dots). Bottom: Flow velocity fields corresponding to the smallest (left) and largest (right) Reynolds numbers in the test set.

For the purposes of training a GMLS-Net, we construct a synthetic data set by solving the Reynolds-averaged Navier-Stokes (RANS) equations with a steady-state finite volume code. Let $L = \rho = 1$ and consider $U \in [0.1, 20]$ and $\nu \in [10^{-2}, 10^8]$. We consider a $k$-$\epsilon$ turbulence model with inlet conditions consistent with a 10% turbulence intensity and a mixing length corresponding to the inlet size. From the solution, we extract the velocity field $u$ at cell centers to obtain an unstructured point cloud $X_h$. We compute $C_d$ directly from the simulations. We then obtain an unstructured data set of 400 velocity-field features $(u)_i$ over $X_h$, with associated labels $C_d$. We emphasize that although $U_\infty$ and $\nu$ are used to generate the data, they are not included as features, and the Reynolds number is therefore hidden. We remark that the $k$-$\epsilon$ model is well known to perform poorly for flows with strong curvature, such as recirculation zones.
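The nondimensionalization in Eqn. 12 is a one-line computation; the sketch below (a hypothetical helper, not part of the paper's pipeline) maps a dimensional drag measurement to the drag coefficient.

```python
def drag_coefficient(F_d, rho, U_inf, A):
    """Drag coefficient per Eqn. 12: C_d = 2 F_d / (rho * U_inf^2 * A)."""
    return 2.0 * F_d / (rho * U_inf**2 * A)
```

This is the collapse the network is implicitly asked to discover: the label depends on the flow field only through the hidden dimensionless group $Re = UL/\nu$.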
Here, in our proof-of-concept demonstration, we treat the RANS k-ε solution as ground truth for simplicity, despite its shortcomings, and acknowledge that a more physical study would consider ensemble averages of LES/DNS data in 3D. We aim here simply to illustrate the potential utility of GMLS-Nets in a scientific setting for processing such unstructured data sets.

As an architecture, we provide two input channels for the two velocity components to three stacked GMLS layers. The first layer acts on the cell centers, and intermediate pooling layers down-sample to random subsets of X_h. We conclude with a linear activation layer to extract the drag coefficient as a single scalar output. We randomly select 80% of the samples for training and use the remainder as a test set. We quantify accuracy using the root-mean-square error (RMSE), which we find to be below 1.5%.

The excellent predictive capability demonstrated in Fig. 6 highlights the ability of GMLS-Nets to regress engineering quantities of interest directly from velocity flow data; the GMLS-Net architecture is able to identify a latent low-dimensional parameter space of the kind typically found by hand using dimensional analysis. This similarity relationship across Reynolds numbers is identified despite the fact that the network does not have direct access to the viscosity parameter. These initial results indicate some of the potential of GMLS-Nets in processing unstructured data sets for scientific machine learning applications.

3 Conclusions

We have introduced GMLS-Nets for processing scattered data sets leveraging the framework of GMLS. GMLS-Nets allow for generalizing convolutional networks to scattered data, while still benefiting from underlying translational invariances and weight sharing.
The GMLS layers provide feature extractors that are particularly natural for regressing differential operators, developing dynamical models, and predicting quantities of interest associated with physical systems. GMLS-Nets were demonstrated to be capable of obtaining dynamical models for long-time integration beyond the limits of traditional CFL conditions, of making predictions of density evolution of molecular systems, and of predicting quantities of interest in fluid mechanics directly from flow data. These initial results indicate some promising capabilities of GMLS-Nets for use in data-driven modeling in scientific machine learning applications.

References

[1] P. J. Atzberger. "Importance of the Mathematical Foundations of Machine Learning Methods for Scientific and Engineering Applications". In: SciML2018 Workshop, position paper, https://arxiv.org/abs/1808.02213 (2018).
[2] Nathan Baker, Frank Alexander, Timo Bremer, Aric Hagberg, Yannis Kevrekidis, Habib Najm, Manish Parashar, Abani Patra, James Sethian, Stefan Wild, and Karen Willcox. "Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence". In: (2018).
[3] Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, and Michael P. Brenner. "Learning data-driven discretizations for partial differential equations". In: Proceedings of the National Academy of Sciences 116.31 (2019), pp. 15344–15349. ISSN: 0027-8424. DOI: 10.1073/pnas.1814058116.
[4] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. "Geometric Deep Learning: Going beyond Euclidean data". In: IEEE Signal Processing Magazine 34.4 (2017), pp. 18–42. ISSN: 1053-5888. DOI: 10.1109/MSP.2017.2693418.
[5] D. S. Broomhead and D. Lowe. "Multivariable Functional Interpolation and Adaptive Networks". In: Complex Systems 2.1 (1988), pp. 321–355.
[6] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun.
"Spectral networks and locally connected networks on graphs". In: International Conference on Learning Representations (ICLR2014), CBLS, April 2014. 2014.
[7] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. "Discovering governing equations from data by sparse identification of nonlinear dynamical systems". In: Proceedings of the National Academy of Sciences 113.15 (2016), pp. 3932–3937.
[8] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller. "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels". In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, pp. 869–877.
[9] B. J. Gross, N. Trask, P. Kuberry, and P. J. Atzberger. "Meshfree Methods on Manifolds for Hydrodynamic Flows on Curved Surfaces: A Generalized Moving Least-Squares (GMLS) Approach". In: arXiv:1905.10469 (2019).
[10] Ioannis Karatzas and Steven E. Shreve. "Brownian Motion and Stochastic Calculus". In: Springer, 1998, pp. 47–127.
[11] Thomas N. Kipf and Max Welling. "Semi-Supervised Classification with Graph Convolutional Networks". In: ArXiv abs/1609.02907 (2016).
[12] Paul Allen Kuberry, Pavel B. Bochev, and Kara J. Peterson. A virtual control meshfree coupling method for non-coincident interfaces. Tech. rep. Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States), 2018.
[13] I. E. Lagaris, A. Likas, and D. I. Fotiadis. "Artificial neural networks for solving ordinary and partial differential equations". In: IEEE Transactions on Neural Networks 9.5 (1998), pp. 987–1000.
[14] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. "PDE-Net: Learning PDEs from Data". In: Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. Stockholmsmässan, Stockholm, Sweden: PMLR, 2018, pp. 3208–3216.
[15] Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein.
"Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs". In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, pp. 5425–5434.
[16] Ravi G. Patel and Olivier Desjardins. "Nonlinear integro-differential operator regression with neural networks". In: ArXiv abs/1810.08552 (2018).
[17] M. Raissi, P. Perdikaris, and G. E. Karniadakis. "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations". In: Journal of Computational Physics 378 (2019), pp. 686–707.
[18] T. Poggio and F. Girosi. "Networks for approximation and learning". In: Proceedings of the IEEE 78.9 (1990), pp. 1481–1497.
[19] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017.
[20] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas. "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc., 2017, pp. 5099–5108.
[21] Maziar Raissi and George Em Karniadakis. "Hidden physics models: Machine learning of nonlinear partial differential equations". In: Journal of Computational Physics 357 (2018), pp. 125–141. ISSN: 0021-9991.
[22] Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. "Data-driven discovery of partial differential equations". In: Science Advances 3.4 (2017).
[23] Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. "The Graph Neural Network Model". In: Trans. Neur. Netw. 20.1 (Jan. 2009), pp. 61–80. ISSN: 1045-9227.
[24] Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J. Guibas.
"KPConv: Flexible and Deformable Convolution for Point Clouds". In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2019.
[25] Nathaniel Trask, Pavel Bochev, and Mauro Perego. "A conservative, consistent, and scalable meshfree mimetic method". In: arXiv preprint arXiv:1903.04621 (2019).
[26] Nathaniel Trask, Mauro Perego, and Pavel Bochev. "A high-order staggered meshless method for elliptic problems". In: SIAM Journal on Scientific Computing 39.2 (2017), A479–A502.
[27] Holger Wendland. Scattered Data Approximation. Vol. 17. Cambridge University Press, 2004.
[28] Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. "SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters". In: Computer Vision – ECCV 2018. Ed. by Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss. Cham: Springer International Publishing, 2018, pp. 90–105. ISBN: 978-3-030-01237-3.
[29] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R. Salakhutdinov, and Alexander J. Smola. "Deep Sets". In: Advances in Neural Information Processing Systems 30. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc., 2017, pp. 3391–3401.

A Derivation of Gradients of the Operator τ_{x_i}[u].

A.1 Parameters of the operator ˜τ.

We give here some details on the derivation of the gradients for the learnable GMLS operator τ[u] and intermediate steps. These can be used in implementations for back-propagation and other applications. GMLS works by mapping data to a local polynomial fit in a region Ω_i around x_i, with p*(x) ≈ u(x) for x ∈ Ω_i. To find the optimal fitting polynomial p*(x) ∈ V for the function u(x), we consider the case with sampling functionals λ_j(x) = δ(x − x_j) and weight function w_ij = w(x_i − x_j).
In a region around a reference point x*, the optimization problem can be expressed parametrically in terms of coefficients a as

$$ a^*(x_i) = \operatorname*{arg\,min}_{a \in \mathbb{R}^m} \sum_j \left( u_j - p(x_j)^T a \right)^2 w_{ij}. $$

We write for short p(x_j) = p(x_j, x_i), where the basis elements in fact depend on x_i. Typically, for polynomials we use p(x_j, x_i) = p(x_j − x_i). This is important in the case we want to take derivatives in the input values x_i of the expressions. Setting the derivative of the objective J with respect to each coefficient a_ℓ to zero, ∂J/∂a_ℓ = 0, implies

$$ \left[ \sum_j p(x_j)\, w_{ij}\, p(x_j)^T \right] a = \sum_j w_{ij}\, p(x_j)\, u_j. $$

Let

$$ M = \sum_j p(x_j)\, w_{ij}\, p(x_j)^T, \qquad r = \sum_j w_{ij}\, p(x_j)\, u_j; $$

then we can rewrite the coefficients as the solution of the linear system M a*(x_i) = r. This is sometimes written more explicitly for analysis and computations as a*(x_i) = M^{-1} r. We can represent a general linear operator ˜τ(x_i) using the a* representation as

$$ \tilde{\tau}(x_i) = q(x_i)^T a^*(x_i). $$

Typically, the weights will not be spatially dependent, q(x_i) = q_0. Throughout, we shall denote this simply as q and assume there is no spatial dependence, unless otherwise indicated.

A.2 Derivatives of ˜τ in x_i, a(x_i), and q.

The derivative in x_i is given by

$$ \frac{\partial}{\partial x_i} a^*(x_i) = \frac{\partial M^{-1}}{\partial x_i}\, r + M^{-1} \frac{\partial r}{\partial x_i}. $$

In the notation, we denote p(x_j) = p(x_j, x_i), where the basis elements can depend on the particular x_i. These terms can be expressed as

$$ \frac{\partial M^{-1}}{\partial x_i} = -M^{-1} \frac{\partial M}{\partial x_i} M^{-1}, $$

where

$$ \frac{\partial M}{\partial x_i} = \sum_j \left[ \left( \frac{\partial}{\partial x_i} p(x_j, x_i) \right) p(x_j, x_i)^T w_{ij} + p(x_j, x_i) \left( \frac{\partial}{\partial x_i} p(x_j, x_i) \right)^T w_{ij} + p(x_j, x_i)\, p(x_j, x_i)^T \frac{\partial w_{ij}}{\partial x_i} \right]. $$

The derivatives of r are given by

$$ \frac{\partial r}{\partial x_i} = \sum_j \left[ \left( \frac{\partial}{\partial x_i} p(x_j) \right) u_j w_{ij} + p(x_j)\, u_j \frac{\partial w_{ij}}{\partial x_i} \right]. $$
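The linear system above can be sketched concretely. The following minimal numpy illustration (our own sketch, not the authors' implementation) assembles M and r for a 1D quadratic basis over a scattered stencil, solves M a* = r, and applies a fixed q selecting the second-derivative functional; this is also the mechanism by which GMLS layers extract derivative-like features:

```python
import numpy as np

# Sketch of the GMLS linear system: M = sum_j p(x_j) w_ij p(x_j)^T,
# r = sum_j w_ij p(x_j) u_j, solve M a* = r, then tau = q^T a*.
rng = np.random.default_rng(1)
xi = 0.0                                        # stencil center x_i
xj = rng.uniform(-0.5, 0.5, size=40)            # scattered 1D neighbors x_j
u = 4.0 + xj + 3.0 * xj**2                      # samples of u(x) = 4 + x + 3 x^2

# Quadratic basis p(x) = [1, x - xi, (x - xi)^2] and Gaussian weights w_ij.
P = np.column_stack([np.ones_like(xj), xj - xi, (xj - xi) ** 2])
w = np.exp(-((xj - xi) ** 2))

M = P.T @ (w[:, None] * P)                      # normal matrix M
r = P.T @ (w * u)                               # right-hand side r
a_star = np.linalg.solve(M, r)                  # a* = M^{-1} r

# A fixed q extracts the second-derivative functional: u''(xi) = 2 a*[2].
q = np.array([0.0, 0.0, 2.0])
tau = q @ a_star
print(tau)  # 6.0 here, since the quadratic fit reproduces u exactly
```

Because u is itself quadratic, the weighted fit is exact and the functional recovers u''(x_i) = 6 to machine precision; for general u the approximation theory of GMLS controls the error.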
The full derivative of the linear operator ˜τ can be expressed as

$$ \frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = \left( \frac{\partial}{\partial x_i} q(x_i)^T \right) a^*(x_i) + q(x_i)^T \left( \frac{\partial}{\partial x_i} a^*(x_i) \right). $$

In the constant case q(x_i) = q_0, the derivative of ˜τ simplifies to

$$ \frac{\partial}{\partial x_i} \tilde{\tau}(x_i) = q_0^T \left( \frac{\partial}{\partial x_i} a^*(x_i) \right). $$

The derivatives of the other terms follow more readily. For the derivative of the linear operator ˜τ in the coefficients a(x_i), we have

$$ \frac{\partial}{\partial a(x_i)} \tilde{\tau}(x_i) = q(x_i). $$

For derivatives of the linear operator ˜τ in the mapping coefficient values q, we have

$$ \frac{\partial}{\partial q(x_i)} \tilde{\tau}(x_i) = a(x_i). $$

In the case of nonlinear operators ˜τ = q(a(x_i)), there are further dependencies beyond just x_i and a(x_i), and less explicit expressions. For example, when using MLPs there may be a hierarchy of trainable weights w. The derivatives of the nonlinear operator can be expressed as

$$ \frac{\partial}{\partial w} \tilde{\tau}(x_i) = \frac{\partial q}{\partial w}(a(x_i)). $$

Here, one relies on back-propagation algorithms for the evaluation of ∂q/∂w. Similarly, given the generality of q(a), for derivatives in a and x_i one can use back-propagation methods on q and the chain rule with the expressions derived in the linear case for the a and x_i dependencies.
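The matrix-inverse derivative identity used above, ∂M⁻¹/∂x_i = −M⁻¹ (∂M/∂x_i) M⁻¹, can be verified numerically with a central finite difference. This is a small self-contained check of ours on an arbitrary parameterized matrix M(t), not part of the paper:

```python
import numpy as np

# Check d(M^{-1})/dt = -M^{-1} (dM/dt) M^{-1} on a parameterized 2x2 matrix.
def M(t):
    # An arbitrary symmetric positive-definite matrix depending on t.
    return np.array([[2.0 + t, 0.5], [0.5, 3.0 + t**2]])

def dM(t):
    # Elementwise derivative of M(t) with respect to t.
    return np.array([[1.0, 0.0], [0.0, 2.0 * t]])

t, h = 0.7, 1e-6
fd = (np.linalg.inv(M(t + h)) - np.linalg.inv(M(t - h))) / (2.0 * h)
Minv = np.linalg.inv(M(t))
analytic = -Minv @ dM(t) @ Minv
print(np.max(np.abs(fd - analytic)))  # close to zero
```

The same identity underlies the ∂M⁻¹/∂x_i term in the back-propagation expressions, with t replaced by the point coordinates x_i.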
