Neural network augmented wave-equation simulation
Authors: Ali Siahkoohi, Mathias Louboutin, Felix J. Herrmann
School of Computational Science and Engineering, Georgia Institute of Technology
{alisk, mlouboutin3, felix.herrmann}@gatech.edu

Abstract

Accurate forward modeling is important for solving inverse problems. An inaccurate wave-equation simulation, as a forward operator, will offset the results obtained via inversion. In this work, we consider the case where we deal with incomplete physics. One proxy for incomplete physics is an inaccurate discretization of the Laplacian in finite-difference simulation of the wave equation. We exploit the intrinsic one-to-one similarities between timestepping algorithms and Convolutional Neural Networks (CNNs), and propose to intersperse CNNs between low-fidelity timesteps. Augmenting low-fidelity timestepping algorithms with neural networks may allow us to take large timesteps while limiting numerical dispersion artifacts. While simulating the wave equation with a low-fidelity timestepping algorithm, by correcting the wavefield several times during propagation, we hope to limit the numerical dispersion artifacts introduced by a poor discretization of the Laplacian. As a proof of concept, we demonstrate this principle by correcting for numerical dispersion while keeping the velocity model fixed and varying the source locations to generate training and testing pairs for our supervised learning algorithm.

1 Introduction

In inverse problems, we rely heavily on having accurate forward modeling operators. Often, we cannot afford to be physically or numerically accurate: numerical inaccuracy may stem from the computational complexity of accurate methods, or from incomplete knowledge of the underlying data generation process. In either case, motivated by Ruthotto and Haber [1], we propose to intersperse CNNs between the timesteps of an acoustic wave-equation simulation.
We mimic incomplete/inaccurate physics by simulating the wave equation with the finite-difference method while utilizing a poor (second-order) discretization of the Laplacian. Conventional methods for solving partial differential equations (PDEs), e.g., the finite-difference and finite-element methods, given enough computational resources, are able to simulate high-fidelity solutions to PDEs. On one hand, as long as the Courant-Friedrichs-Lewy condition for stability is satisfied, finite-difference methods are able to compute solutions to a PDE, regardless of the medium parameters, with arbitrary precision. On the other hand, the finite-element method requires careful meshing of the medium in order to carry out the simulation. Another motivation behind this work is to exploit the fact that in seismic applications, wave simulations are usually carried out for specific families of velocity models and source/receiver distributions. We hope our proposed method meets halfway between the two mentioned extremes, i.e., being too generic (finite-difference method) and being too problem specific (finite-element method). There have been several attempts to exploit learning methods in wave-equation simulation. Raissi [2] approximates the solution to a nonlinear PDE with a neural network: given points on the computational grid as input, the network computes the solution of the PDE, and training data is obtained by computing the solution of the PDE at several points. Moseley et al. [3] completely ignore the Laplacian and solely rely on predicting the next timestep from the previous two timesteps by learning the action of the spatially varying velocity and the Laplacian. While possible in principle, their approach needs to train for long times to provide reasonable simulations on relatively simple models. Siahkoohi et al.
[4], instead of ignoring the physics, rely on low-fidelity wave-equation simulation and, by exploiting transfer learning, utilize a single CNN to correct wavefield snapshots simulated on a "nearby" velocity model for numerical dispersion at any given timestep. In this work, we extend the ideas in Siahkoohi et al. [4] and propose using multiple CNNs, interspersed between low-fidelity timesteps. Finally, Rizzuti et al. [5] propose interspersing Krylov-subspace iterations with neural nets while inverting the Helmholtz equation. They show improvement in convergence by "propagating" an approximated wavefield, obtained from a limited number of iterations, with the aid of a trained convolutional neural net. Their technique can be seen as the frequency-domain counterpart of our proposed method. Our paper is organized as follows. First, we describe our formulation for learned wave simulation in detail. Next, we introduce our training objective function. Due to the dependencies among the CNN parameters, we devised a training heuristic, which we describe. Before explaining our numerical experiments, we state the CNN architecture used and the training details. Finally, we describe the three numerical experiments we conduct and discuss the effectiveness of the proposed method.

2 Theory

We describe how we augment low-fidelity physics with learning techniques to handle incomplete and/or inaccurate physics, where the low-fidelity physics is modeled via the finite-difference method with a poor discretization of the Laplacian. To ensure accuracy, the temporal and spatial discretizations in high-fidelity wave-equation simulations have to be chosen very fine, typically one to two orders of magnitude smaller than the Nyquist sampling rate.
As mentioned earlier, we will utilize a poor (only second-order) discretization of the Laplacian to carry out low-fidelity wave-equation simulations, but the scheme can be extended to other proxies of incomplete or inaccurate physics.

2.1 Simulations by timestepping

After discretization of the acoustic wave equation, a single timestep of the scalar wavefield, simulated on 0 ≤ t ≤ T, can be written as

    u_{j+1} = 2u_j − u_{j−1} + δt² c² Δu_j,   j = 0, 1, …, N − 1,   (1)

where u_j is the high-fidelity scalar wavefield at the j-th timestep, δt is the temporal discretization (timestep), c is the spatially varying velocity in the medium, and Δ is the high-order discretization of the Laplacian. Similar to Equation 1, the low-fidelity timestepping equation can be formulated as

    ū_{j+1} = 2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j,   j = 0, 1, …, M − 1,   (2)

where ū_j is the low-fidelity scalar wavefield, δT is the coarse timestep, c̄ is the coarse spatially varying velocity, and Δ̄ is the coarse (only second-order) discretized Laplacian. Motivated by Ruthotto and Haber [1], we consider every timestep as a single layer in a neural network, where the discretized Laplacian is a linear operator followed by the (nonlinear) action of the spatially varying velocity. Moreover, noticing the additional terms in Equation 1, each timestep is similar to a residual block, as introduced by Szegedy et al. [6]. Figures 1a and 1b schematically depict a single timestep as a block, corresponding to the high- and low-fidelity discretizations of the wave equation, respectively. The similarity of the high- and low-fidelity timestepping methods to CNNs can be perceived from Figures 2a and 2b, respectively, where red and yellow blocks correspond to high- and low-fidelity timestepping equations. As can be seen, high-fidelity simulation of the wave equation up to time t = T requires a large number of high-fidelity timesteps.
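As an illustration, a minimal NumPy sketch of this leapfrog timestepping with the low-fidelity (second-order, five-point) Laplacian stencil might look as follows. The grid size, velocity, and step sizes below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def laplacian_2nd_order(u, dx):
    """Second-order (five-point) discretization of the Laplacian with
    zero Dirichlet boundaries -- the 'low-fidelity' stencil of Equation 2."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (
        u[2:, 1:-1] + u[:-2, 1:-1] +
        u[1:-1, 2:] + u[1:-1, :-2] -
        4.0 * u[1:-1, 1:-1]
    ) / dx**2
    return lap

def timestep(u_cur, u_prev, c, dt, dx):
    """One step of Equations 1/2: u_{j+1} = 2u_j - u_{j-1} + dt^2 c^2 Lap(u_j)."""
    return 2.0 * u_cur - u_prev + (dt**2) * (c**2) * laplacian_2nd_order(u_cur, dx)

# Usage on a toy homogeneous medium (values chosen to satisfy the CFL condition).
nx = nz = 101
c = np.full((nz, nx), 1500.0)    # 1500 m/s everywhere (illustrative)
dx, dt = 10.0, 0.001
u_prev = np.zeros((nz, nx))
u_cur = np.zeros((nz, nx))
u_cur[nz // 2, nx // 2] = 1.0    # impulsive source at the center
for _ in range(100):
    u_cur, u_prev = timestep(u_cur, u_prev, c, dt, dx), u_cur
```

Swapping in a higher-order stencil for `laplacian_2nd_order` (and a finer `dt`) would give the high-fidelity counterpart of Equation 1.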
On the other hand, Figure 2b shows that the low-fidelity simulation can be done with far fewer low-fidelity timesteps, due to the coarse time sampling, and each timestep is cheaper than a high-fidelity timestep due to the coarse discretization of the Laplacian. Although computationally cheap, the low-fidelity wave-equation simulations suffer from numerical dispersion artifacts.

Figure 1: Comparing a single low- and high-fidelity timestep. a) High-fidelity timestep. b) Low-fidelity timestep.

Figure 2: Comparing low- and high-fidelity discretized wave-equation simulations. a) High-fidelity simulation. b) Low-fidelity simulation.

2.2 Learned wave simulations

Depending on the domain of application, we can assume wave simulations are typically carried out for specific families of velocity models and source/receiver distributions. This motivates us to deploy a data-driven wave simulation algorithm that is coupled with low-fidelity, cheap physics, in the hope of recovering high-fidelity wave simulations on a family of velocity models. In our method, we propagate the coarse-grained wavefields according to Equation 2 with a coarsened Laplacian. After every k timesteps, where k is a hyperparameter, we apply a correction with a CNN, G_{θ_i}, parameterized by θ_i, to the wavefield obtained at the j-th timestep, and proceed with the timestepping. The proposed data-driven timestepping wave simulation method is formalized in Equation 3:

    ū_{j+1} = G_{θ_i}(2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j),  i = ⌊j/k⌋,  if j ≡ k − 1 (mod k),
    ū_{j+1} = 2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j,  otherwise,   (3)

where j = 0, 1, …, M − 1. The schematic representation of Equation 3 is illustrated in Figure 3. Yellow blocks represent low-fidelity timesteps (see Equation 2 and Figure 1b) and blue blocks correspond to the CNNs, G_{θ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋.
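Equation 3 can be sketched as a plain loop that interleaves low-fidelity steps with network corrections. The names `step_fn` and `networks` are illustrative; the toy `step` in the usage omits the Laplacian term of Equation 2 purely to keep the example self-contained:

```python
import numpy as np

def augmented_propagate(u0, u1, networks, step_fn, k, M):
    """Equation 3 as a loop: take M low-fidelity timesteps via `step_fn`,
    and after every k-th step pass the new wavefield through the CNN
    G_{theta_i}, i = floor(j / k).  `networks` is a list of callables."""
    u_prev, u_cur = u0, u1
    for j in range(M):
        u_next = step_fn(u_cur, u_prev)   # 2 u_j - u_{j-1} + dT^2 c^2 Lap(u_j)
        if j % k == k - 1:                # j = k - 1 (mod k): correction step
            u_next = networks[j // k](u_next)
        u_prev, u_cur = u_cur, u_next
    return u_cur

# Usage with identity stand-ins for the trained CNNs and a toy linear step.
step = lambda u, up: 2.0 * u - up          # Laplacian term omitted in this toy
nets = [lambda u: u for _ in range(4)]     # stand-ins for G_{theta_0..3}
u_final = augmented_propagate(np.zeros(8), np.ones(8), nets, step, k=3, M=12)
```

With k = 3 and M = 12, corrections fire at j = 2, 5, 8, 11, indexing the four networks in order, which matches the ⌊(M − 1)/k⌋ + 1 CNNs of the paper's notation.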
The CNNs correct for the effects of inaccurate physics, i.e., numerical dispersion in our experiments, at every k-th low-fidelity timestep. In this work, the parameters θ_i are not shared among the CNNs. We have not explored the possibility of sharing weights among the CNNs at different stages of wave propagation. Note that although the parameters of the CNNs are not shared, they are not independent: after the j-th timestep, CNN G_{θ_i}, i = ⌊j/k⌋, corrects for errors in the wavefield introduced by low-fidelity timestepping as well as imperfections present in the output of the (i − 1)-th CNN, which have been propagated through timestepping. Therefore, a small perturbation in the parameters of a CNN in the initial stages of neural network augmented timestepping causes noticeable differences in the input of the CNNs in later stages of the wave propagation.

Figure 3: A schematic representation of the proposed method.

The described dependencies among the CNN parameters introduce difficulties in optimizing the parameters of the CNNs. Below we describe our heuristic for training the CNNs.

2.3 Training objective

During training, we train all the CNNs toward the high-fidelity solution of the wave equation at the corresponding timestep, obtained by solving Equation 1. As can be seen from Equation 3, after the j-th timestep, CNN G_{θ_i}, i = ⌊j/k⌋, is tasked with correcting the effects of low-fidelity timestepping. During training, the i-th CNN maps its input, 2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j, to u_{j+1}, the result of the j-th timestep of high-fidelity timestepping, obtained via Equation 1. Define the function F̄_k(·) as the action of k low-fidelity timesteps, i.e., F̄_k(·) represents k consecutive low-fidelity timestepping blocks, as depicted in Figure 1b. Clearly, F̄_k is only a function of k, δT, c̄, and Δ̄. Using the defined notation, we can write the input to the i-th CNN, û_i, i = 0, 1, …
, ⌊(M − 1)/k⌋, as follows:

    û_i = F̄_k(G_{θ_{i−1}}(û_{i−1})),  i = 1, 2, …, ⌊(M − 1)/k⌋,   û_0 = F̄_k(q),   (4)

where q is the source. Also, let u_{τ_i} denote the wavefield obtained at the j-th timestep of high-fidelity timestepping (Equation 1), where τ_i = j = (k + 1)i − 1. The input-output pair for the i-th CNN is (û_i, u_{τ_i}). We can generate multiple training pairs for the CNNs by simulating (û_i, u_{τ_i}) pairs for various velocity models and source locations. Assume we have n pairs of training data for the CNNs, namely (û_i^{(p)}, u_{τ_i}^{(p)}), p = 0, 1, …, n − 1. The objective function for optimizing the i-th CNN can be written as follows:

    L_i = (1/n) Σ_{p=0}^{n−1} ‖ G_{θ_i}(û_i^{(p)}) − u_{τ_i}^{(p)} ‖_1,  i = 0, 1, …, ⌊(M − 1)/k⌋.   (5)

In the past, in a similar attempt, we used Generative Adversarial Networks [GANs, 7] to train a CNN to remove numerical dispersion from wavefield snapshots [4, 8]. In this work, we choose the ℓ1 norm as the misfit function for two reasons. First, training GANs is computationally expensive, since it requires training an additional neural network that discerns between high-fidelity wavefield snapshots and corrected ones; the computational complexity of the method proposed in this work is already significantly higher than our previous attempts [4, 8], because it involves training multiple CNNs, so to limit the computation time we chose the ℓ1 misfit function. Second, in a numerical experiment performed by Hu et al. [9], the ℓ1 norm misfit function yields the second-best results, after a misfit function combining GANs with the ℓ1 norm. In the next section, we describe our heuristic for training the CNNs.

2.4 Training heuristic

To overcome the complexities caused by dependencies between the parameters of the CNNs, we optimize the objective functions L_i with the heuristic described below.
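Before detailing the heuristic, note that the per-CNN misfit of Equation 5 is a plain sample-average ℓ1 loss; a minimal sketch, with an identity stand-in for the network:

```python
import numpy as np

def l1_objective(net, inputs, targets):
    """Equation 5: L_i = (1/n) sum_p || G_theta_i(u_hat_i^(p)) - u_tau_i^(p) ||_1."""
    n = len(inputs)
    return sum(np.abs(net(x) - y).sum() for x, y in zip(inputs, targets)) / n

# Usage with toy 4x4 "wavefield" pairs and an identity "network".
net = lambda u: u
ins = [np.zeros((4, 4)), np.ones((4, 4))]
outs = [np.ones((4, 4)), np.ones((4, 4))]
loss = l1_objective(net, ins, outs)   # (16 + 0) / 2 = 8.0
```

In the actual implementation this loss would be expressed in TensorFlow so that Adam can differentiate through the network parameters.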
We minimize each L_i, i = 0, 1, …, ⌊(M − 1)/k⌋, with respect to θ_i while keeping the rest of the parameters fixed. We keep updating all the sets of parameters in a cyclic fashion, i.e., once we have updated all the parameters via L_i, i = 0, 1, …, ⌊(M − 1)/k⌋, we start over and update them again, in order, until a stopping criterion is reached. We will describe the stopping criterion used in our experiments later. We minimize the objective functions in Equation 5 with a variant of Stochastic Gradient Descent known as the Adam optimizer [10], with momentum parameter β = 0.9 and a linearly decaying stepsize with initial value μ = 2 × 10⁻⁴. During each iteration of Adam, the gradient of L_i is approximated using a single randomly selected training pair. These pairs are selected without replacement; once all the training pairs have been used, we start over, again picking training pairs without replacement from the entire training set. The optimization is carried out for a predetermined total number of iterations, where each iteration consists of drawing a random training pair, without replacement, and updating the parameters of a CNN. Additionally, while optimizing θ_i with the rest of the parameters fixed, we update θ_i for a number of iterations, which we refer to as mini-iterations, before proceeding to the next set of parameters. Algorithm 1 summarizes the steps for optimizing the objective functions in Equation 5.

Algorithm 1 Heuristic for optimizing the CNNs G_{θ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋.
1.  INPUT:
      MaxItr         // total number of iterations to carry out the optimization
      MaxMiniItr     // mini-iterations before proceeding to the next CNN
      F̄_k(·)         // k consecutive low-fidelity timestepping blocks
      q^{(p)}, p = 0, 1, …
, n − 1   // sources corresponding to different training pairs
      u_{τ_i}^{(p)}, p = 0, 1, …, n − 1, i = 0, 1, …, ⌊(M − 1)/k⌋   // high-fidelity snapshots
      θ_i^0, i = 0, 1, …, ⌊(M − 1)/k⌋   // randomly initialized parameters
2.  ItrNum ← 0
3.  FOR i = 0 : ⌊(M − 1)/k⌋ DO
4.      θ_i ← θ_i^0
5.  FOR p = 0 : n − 1 DO
6.      û_0^{(p)} = F̄_k(q^{(p)})
7.  WHILE ItrNum < MaxItr DO
8.      FOR i = 0 : ⌊(M − 1)/k⌋ DO
9.          IF i > 0 DO
10.             FOR p = 0 : n − 1 DO
11.                 û_i^{(p)} = F̄_k(G_{θ_{i−1}}(û_{i−1}^{(p)}))
12.         FOR MiniItrNum = 1 : MaxMiniItr DO
13.             p ← SampleWithoutReplacement({0, 1, …, n − 1})
14.             θ_i ← arg min_{θ_i} ‖ G_{θ_i}(û_i^{(p)}) − u_{τ_i}^{(p)} ‖_1
15.             ItrNum ← ItrNum + 1
16. RETURN θ_i, i = 0, 1, …, ⌊(M − 1)/k⌋

2.5 CNN architecture

Motivated by our previous attempts at numerical dispersion removal from wavefield snapshots [4, 8], we use the exact architecture provided by Johnson et al. [11], which includes residual blocks, the main building block of ResNets, introduced by He et al. [12], for all the CNNs G_{θ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋.

2.6 Training details and implementation

While CNNs are known to generalize well, i.e., to maintain the quality of their performance when applied to unseen data, they can only be applied successfully to a data set drawn from the same distribution as the training data. Because of the Earth's heterogeneity and the complex geological structures present in realistic-looking models, training a neural network that generalizes well when applied to another velocity model can become challenging. While we have successfully demonstrated that transfer learning [13] can be used in situations where the neural network is initially trained on data from a proximal survey [4], in this contribution we chose, as a proof of concept, to keep the velocity model fixed and vary the source locations to generate different training/testing pairs.
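The cyclic heuristic of Algorithm 1 can be sketched as a runnable loop. Here `train_fn` stands in for one Adam update on CNN `i`, and `F_k` for k low-fidelity timesteps; both names are illustrative. The sketch reuses one permutation per visit to a CNN, a simplification of the paper's restart-without-replacement sampling:

```python
import random

def train_cyclic(networks, train_fn, F_k, sources, targets, max_itr, max_mini_itr):
    """Cyclic heuristic of Algorithm 1: update one CNN at a time for
    `max_mini_itr` mini-iterations, regenerate the next CNN's inputs from
    its predecessor's outputs, and cycle until `max_itr` iterations are spent."""
    n = len(sources)
    itr = 0
    while itr < max_itr:
        inputs = [F_k(q) for q in sources]        # u_hat_0^(p) = F_k(q^(p))
        for i in range(len(networks)):
            if i > 0:                              # u_hat_i = F_k(G_{i-1}(u_hat_{i-1}))
                inputs = [F_k(networks[i - 1](x)) for x in inputs]
            order = random.sample(range(n), n)     # draw pairs without replacement
            for m in range(max_mini_itr):
                p = order[m % n]
                train_fn(i, inputs[p], targets[i][p])   # one Adam-style update
                itr += 1
                if itr >= max_itr:
                    return networks
    return networks

# Usage with identity stand-ins; we only record which CNN each update touched.
calls = []
nets = [lambda u: u, lambda u: u]
train_cyclic(nets, lambda i, x, y: calls.append(i), lambda u: u,
             sources=[1.0, 2.0], targets=[[1.0, 2.0], [1.0, 2.0]],
             max_itr=8, max_mini_itr=2)
```

With two CNNs, two mini-iterations each, and eight total iterations, the update schedule cycles 0, 0, 1, 1, 0, 0, 1, 1, mirroring the outer WHILE/FOR structure of Algorithm 1.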
We use the Marmousi velocity model; out of 401 available shot locations with 7.5 m spacing, we allocate half of the shot locations to training and use the rest to generate testing pairs for evaluation purposes. The maximum simulation time in our experiments is 1.1 s. We designed and implemented our deep architectures in TensorFlow¹. To carry out our wave-equation simulations with finite differences, we used Devito² [14, 15]. We used the functionality of the Operator Discretization Library³ to wrap Devito operators into TensorFlow layers. Our implementation can be found on GitHub⁴. We ran our algorithm on an Amazon Web Services g3.4xlarge instance, where we optimize the CNN parameters on an NVIDIA Tesla M60 GPU and Devito utilizes 16 CPU cores to perform the finite-difference wave-equation simulations. Initially, we simulate the high-fidelity training wavefield snapshots, u_{τ_i}^{(p)}, p = 0, 1, …, n − 1, i = 0, 1, …, ⌊(M − 1)/k⌋, only once, at the beginning, and store them. To limit CPU-GPU communication, before utilizing the GPU to update θ_i, we generate the inputs to the i-th CNN, û_i^{(p)}, p = 0, 1, …, n − 1, all at once, and store them. Afterwards, the i-th CNN can be (re)trained using the stored input/output wavefield snapshot pairs for several mini-iterations.

3 Numerical experiments

We want to show that neural networks, when augmented with inaccurate physics, e.g., a poor discretization of the Laplacian, are able to approximate the wavefields obtained with an accurate approximation of the wave equation. To demonstrate this, we conduct three numerical experiments in which we keep the velocity model fixed and vary the source locations to generate different training/testing pairs. The experiments differ in the number of CNNs used throughout the learned wave propagation.
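The half/half split of the 401 shot locations described in Section 2.6 can be written in a few lines. The alternating-index scheme below is an assumption for illustration; the paper only states that half the shots are used for training (consistent with the 201 training pairs in Table 1):

```python
# Illustrative split of the 401 shot locations (7.5 m spacing) into
# training and testing halves; the alternating-index scheme is an
# assumption -- the paper only states a 50/50 allocation.
spacing = 7.5
shot_x = [i * spacing for i in range(401)]   # 0 m .. 3000 m along the surface
train_shots = shot_x[0::2]                   # 201 locations -> training pairs
test_shots = shot_x[1::2]                    # 200 locations -> testing pairs
```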
We use three, five, and ten CNNs while keeping the total number of iterations fixed. This implies that an experiment with more CNNs optimizes each CNN with a smaller number of iterations per CNN, because iterations per CNN × number of CNNs = total number of iterations. A neural network augmented wave simulator with n1 CNNs needs more training iterations, and possibly more training data, to perform as well as one with n2 CNNs, when n1 > n2. For a fixed total number of iterations, the number of iterations per CNN is inversely proportional to the number of CNNs utilized. Therefore, the first n2 CNNs in the neural network augmented wave simulator with n1 CNNs will perform worse than the CNNs in the wave propagator with n2 CNNs. Consequently, the error accumulated by the poor performance of the first n2 CNNs, combined with the artifacts introduced by low-fidelity timestepping, makes matters worse for the later CNNs in the more complex learned wave propagator. Therefore, in our experiments, since the total number of iterations is fixed, we expect to see the quality of dispersion removal degrade as the number of CNNs in a learned wave propagator increases. Table 1 summarizes the total number of iterations, iterations per CNN, training pairs, training time, and number of tunable parameters for the three different experiments.

CNNs | Iterations | Iterations per CNN | Pairs per CNN | Time        | Param. count
3    | 100500     | 33500              | 201           | 17.99 hours | 34150272
5    | 100500     | 20100              | 201           | 19.79 hours | 56917120
10   | 100500     | 10050              | 201           | 49.24 hours | 113834240

Table 1: Summary of details of the three neural network augmented wave-equation simulation experiments.
¹ https://www.tensorflow.org/
² https://www.devitoproject.org/
³ https://odlgroup.github.io/odl/
⁴ https://github.com/alisiahkoohi/NN-augmented-wave-sim

As described earlier and presented in Table 1, the MaxItr variable used in the WHILE condition (line 7) of Algorithm 1 is set to 100500 for all our experiments. Figures 4-6 show the values of the objective function in Equation 5 (orange) and the wavefield-correction signal-to-noise ratio (SNR) evaluated on testing data pairs during training (blue), for the experiments with three, five, and ten CNNs, respectively. Note that the SNR curves were not used to determine when to stop training; they are depicted for demonstration purposes only. Figures 4a, 4c, and 4e show the wavefield-correction SNR for the first, second, and third CNN, respectively, in the neural network augmented wave simulator with three CNNs. Similarly, Figures 4b, 4d, and 4f depict the training objective values throughout training for the first, second, and third CNN. As can be seen from the objective function curves, the training heuristic has been effective and the objective function values have a decreasing trend. Note that each CNN has been trained for 33500 iterations, on average, out of a total of 100500 iterations. Several equispaced spikes can be noticed in the objective function value curves; for instance, see the objective function value curve of the third CNN, in Figure 4f, at 6030, 8040, 10050, and 12060 iterations. The mini-batch size we use in this experiment is 10. These spikes occur at the moments in training when we have started retraining the third CNN after updating the first and second CNNs. As discussed before, a change in the parameters of the CNNs preceding a given CNN changes that CNN's input, and consequently the objective function becomes large when we start retraining the CNNs in the later stages again.
Similar objective function value and SNR curves for the other two neural network augmented wave propagators, utilizing five and ten CNNs, can be found in Figures 5 and 6, respectively. The first column of Figures 5 and 6, from top to bottom, shows the SNR of the wavefield correction obtained by the first through the last CNN, evaluated on testing pairs during training. The second column of Figures 5 and 6 shows the objective function value curves throughout the optimization of Equation 5, for the propagators with five and ten CNNs, respectively. In both columns, from top to bottom, the curves correspond to the CNNs from the beginning to the end of the learned wave propagators, in order. We make two main observations from Figures 5 and 6. First, the objective function values decrease overall, validating the effectiveness of the introduced heuristic. Spikes in the objective function value curves can again be seen; these correlate with the stages in training when Algorithm 1 revisits a CNN after updating the rest of the CNN parameters. As explained before, the spikes are caused by changes in the parameters of the CNNs preceding a given CNN, which in turn alter that CNN's input training wavefields. Second, due to the decrease in iterations per CNN as the number of CNNs grows, the SNR curves converge to a lower value when the number of CNNs increases.

Figure 4: Neural network augmented wave simulation with three CNNs. First column, from top to bottom: SNR curves, evaluated on testing pairs during training, for a) the first to e) the last CNN, in order. Second column, from top to bottom: training objective function value curves, evaluated on training pairs, for b) the first to f) the last CNN, in order.

Next, we demonstrate the corrected wavefields in the three conducted experiments, evaluated for one testing shot location.
For each experiment, we show the high-fidelity wavefield snapshots, u_{τ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋, where i iterates over the CNNs, the numerically dispersed low-fidelity wavefields, and the wavefield snapshots corrected by the CNNs. To evaluate the performance of each correction, we also depict the correction error, i.e., the difference between the high-fidelity and corrected wavefield snapshots.

Figure 5: Neural network augmented wave simulation with five CNNs. First column, from top to bottom: SNR curves, evaluated on testing pairs during training, for a) the first to i) the last CNN, in order. Second column, from top to bottom: training objective function value curves, evaluated on training pairs, for b) the first to j) the last CNN, in order.

Figure 7 shows the mentioned wavefield snapshots for the neural network augmented wave simulator with three CNNs. The first column shows the high-fidelity wavefields obtained by solving Equation 1, the second column depicts low-fidelity simulations obtained by solving Equation 2, the third column shows the result of neural network augmented wavefield simulation, and the fourth column is the learned wave simulation error, i.e., the difference between the first and third columns of Figure 7. Similarly, Figures 8-10 show the high-fidelity, low-fidelity, and learned wave simulation wavefield snapshots in the first three columns, in order, for the neural network augmented wave simulators with five and ten CNNs, respectively. As expected, for the reasons stated before, we observe that the quality of neural network augmented wave-equation simulation degrades as the number of CNNs increases. On the other hand, the high quality of the learned wave simulation with few CNNs (see Figure 7) suggests that the quality of the simulation with more CNNs might be improved by increasing the number of iterations.
As can be seen in the last columns of Figures 9 and 10, the learned wave simulation with ten CNNs has the lowest quality. The learned wave simulation is least accurate for the direct wave, which happens to be the event with the largest amplitudes. Also, although most of the numerical dispersion has been removed and the phase has been recovered, the residual consists mostly of amplitude differences.

3.1 Performance comparison: single CNN low-to-high-fidelity mapping

In order to evaluate the effectiveness of the proposed method, we also train a single CNN, similar to our previous attempts to remove numerical dispersion from wavefield snapshots [4, 8], and compare its numerical dispersion removal with the proposed method. To be more precise, for each presented neural network augmented wave-equation simulation experiment, where we use three, five, or ten CNNs, we train a single CNN, G_θ, with the same architecture as that used in the learned wave propagators, to remove numerical dispersion from all the low-fidelity wavefield

Figure 6: Neural network augmented wave simulation with ten CNNs. First column, from top to bottom: SNR curves, evaluated on testing pairs during training, for a) the first to s) the last CNN, in order. Second column, from top to bottom: training objective function value curves, evaluated on training pairs, for b) the first to t) the last CNN, in order.

Figure 7: Neural network augmented wave simulation with three CNNs. First column, from top to bottom: a, e, i) high-fidelity wavefield snapshots, in order. Second column, from top to bottom: b, f, j) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order.
Third column, from top to bottom: c, g, k) results of neural network augmented wave-equation simulation: the output of the first, second, and last CNN, in order. Fourth column, from top to bottom: d, h, l) difference between the first and third columns, in order.

snapshots simulated by solving Equation 2 for j ≡ k − 1 (mod k), on the training shot locations. As in the previous examples, here we also use half of the available shot locations to simulate training pairs, and the rest are used to evaluate the performance of the trained CNN. The input to G_θ during training can be written as follows (compare with Equation 4):

    ũ_i = F̄_k(ũ_{i−1}),  i = 1, 2, …, ⌊(M − 1)/k⌋,   ũ_0 = F̄_k(q).   (6)

The desired output for this CNN is the high-fidelity wavefield snapshots simulated for the training shot locations, u_{τ_i}^{(p)}, p = 0, 1, …, n − 1, i = 0, 1, …, ⌊(M − 1)/k⌋. The objective function for this CNN can be written as follows:

    L = (1 / (n(⌊(M − 1)/k⌋ + 1))) Σ_{i=0}^{⌊(M−1)/k⌋} Σ_{p=0}^{n−1} ‖ G_θ(ũ_i^{(p)}) − u_{τ_i}^{(p)} ‖_1.   (7)

We minimize objective function 7 over θ with the Adam optimizer, using the same maximum number of iterations as before, this time combining all the training pairs associated with the different CNNs in the learned wavefield simulation examples. As mentioned before, in order to compare with the proposed method, we minimize objective function 7 over three different sets of input-output pairs, each corresponding to one of our presented experiments with a varying number of CNNs. Table 2 summarizes the total number of iterations, training pairs, training time, and number of tunable parameters for the three different cases, which differ in the number of timesteps at which we choose to correct the numerical dispersion. These selected timesteps are the timesteps that the CNNs operated on in our three previous examples.
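The pooled objective of Equation 7, which averages the ℓ1 misfit of one shared network over all corrected timesteps and sources, can be sketched as follows; the identity "network" and toy arrays in the usage are illustrative stand-ins:

```python
import numpy as np

def single_cnn_objective(net, inputs_by_step, targets_by_step):
    """Equation 7: one CNN G_theta trained on snapshot pairs pooled over
    all corrected timesteps i and all sources p (mean l1 misfit)."""
    total, count = 0.0, 0
    for ins, outs in zip(inputs_by_step, targets_by_step):   # loop over i
        for x, y in zip(ins, outs):                          # loop over p
            total += np.abs(net(x) - y).sum()
            count += 1
    return total / count

# Usage: two timesteps, two sources each, identity stand-in for G_theta.
ins = [[np.zeros(2), np.ones(2)], [np.ones(2), np.ones(2)]]
outs = [[np.ones(2), np.ones(2)], [np.ones(2), np.ones(2)]]
loss = single_cnn_objective(lambda u: u, ins, outs)   # (2 + 0 + 0 + 0) / 4 = 0.5
```

The only structural difference from Equation 5 is the extra loop over timesteps: the n(⌊(M − 1)/k⌋ + 1) normalization falls out of counting every (i, p) pair once.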
Figure 8: Neural network augmented wave simulation with five CNNs. First column, from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column, from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column, from top to bottom: c) to s) results of neural network augmented wave-equation simulation: the output of the first to the last CNN, in order. Fourth column, from top to bottom: d) to t) difference between the first and third columns, in order.

# Timesteps to correct | Iterations | Training pairs | Time        | Param. count
3                      | 100500     | 603            | 13.85 hours | 11383424
5                      | 100500     | 1005           | 13.96 hours | 11383424
10                     | 100500     | 2010           | 14.56 hours | 11383424

Table 2: Summary of details of the three single-CNN experiments.

Figure 9: Neural network augmented wave simulation with ten CNNs, first part. First column, from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column, from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column, from top to bottom: c) to s) results of neural network augmented wave-equation simulation: the output of the first to the fifth CNN, in order. Fourth column, from top to bottom: d) to t) difference between the first and third columns, in order.

Figure 10: Neural network augmented wave simulation with ten CNNs, second part. First column, from top to bottom: a) to q) high-fidelity wavefield snapshots, in order.
Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of neural network augmented wave-equation simulation; output of the sixth to the last CNN, in order. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.

The slight difference in runtime among the three cases in Table 2 is partly due to the different number of training pairs that need to be generated. Figure 11 depicts the wavefield snapshot correction SNR curves, evaluated on testing pairs during training, and the value of objective function 7, in the single CNN low-to-high-fidelity mapping experiment, as a function of the number of iterations. Figures 11a, 11c, and 11e show the SNR curves when the CNN is trained on wavefield snapshots corresponding to three, five, and ten timesteps, respectively. Similarly, Figures 11b, 11d, and 11f depict the training objective function value (Equation 7) when the CNN is trained on wavefield snapshots corresponding to three, five, and ten timesteps, respectively. The SNR curves in the first column of Figure 11 show the evolution of the wavefield correction SNR evaluated on randomly selected testing wavefields drawn from the wavefield snapshots combined from different timesteps. Figures 11a, 11c, and 11e therefore indicate that the three different CNNs converge to a wavefield correction SNR of around 20 dB, regardless of the number of timesteps they are correcting for. This does not, however, suggest that the performance will stay the same as we increase the number of timesteps to be corrected. By comparing Figure 11a with the first column of Figure 4 (SNR curves for neural network augmented wave-equation simulation with three CNNs), we observe that, on average, the two methods perform equally well.
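The wavefield correction SNR reported in these curves can be computed as below. The exact convention used in the paper is not stated here, so this sketch assumes the common definition in decibels with respect to the high-fidelity reference snapshot.

```python
import numpy as np

def correction_snr(reference, estimate):
    """Signal-to-noise ratio (dB) of a corrected snapshot relative to the
    high-fidelity reference: SNR = 20 log10(||ref|| / ||ref - est||)."""
    ref = np.ravel(reference)
    err = np.linalg.norm(ref - np.ravel(estimate))
    return 20.0 * np.log10(np.linalg.norm(ref) / err)
```

Under this convention, an estimate whose residual has one tenth the norm of the reference scores 20 dB, matching the plateau reported above.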
Figure 11: Single CNN low-to-high-fidelity mapping. a) Wavefield snapshot correction SNR curve and b) objective function value curve when the CNN is trained on wavefield snapshot pairs corresponding to learned wavefield simulation with three CNNs. c) Wavefield snapshot correction SNR curve and d) objective function value curve when the CNN is trained on wavefield snapshot pairs corresponding to learned wavefield simulation with five CNNs. e) Wavefield snapshot correction SNR curve and f) objective function value curve when the CNN is trained on wavefield snapshot pairs corresponding to learned wavefield simulation with ten CNNs.

Finally, we show the wavefields corrected by the single CNN low-to-high-fidelity mapping method, for comparison with our proposed method. Figures 12 and 13 show the corrected wavefields for the cases where three and five timesteps need to be corrected. Figures 14 and 15 show the corrected wavefields for the case where ten timesteps need to be corrected, in two parts. In Figures 12–15, the first, second, third, and fourth columns depict the high-fidelity wavefield snapshots, the low-fidelity wavefield snapshots, the corrected low-fidelity wavefield snapshots, and the error in numerical dispersion removal, respectively. In each column, from top to bottom, the simulation time increases. To compare our proposed method with single CNN low-to-high-fidelity mapping, compare Figure 7 with 12, 8 with 13, 9 with 14, and 10 with 15. As can be seen, the single CNN low-to-high-fidelity mapping method maintains the quality of its performance as the number of timesteps to be corrected increases. On the other hand, with the maximum number of iterations kept fixed, the performance of neural network augmented wave-equation simulation drops as the number of CNNs increases.
Also, by comparing Tables 1 and 2, we observe that for a fixed maximum number of iterations, the training time needed for single CNN low-to-high-fidelity mapping grows very slowly with the number of timesteps to be corrected, compared to the training time required for neural network augmented wave-equation simulation.

Figure 12: Single CNN low-to-high-fidelity mapping with three timesteps to be corrected. First column from top to bottom: a, e, i) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b, f, j) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c, g, k) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d, h, l) difference between the first and third columns, in order.

4 Conclusions

Our numerical experiments demonstrate that, given suitable training data, a well-trained neural network augmented wave-equation simulator is capable of approximating wavefield snapshots produced by high-fidelity simulation. In this work, as a proxy for inaccurate physics, we simulate the wave equation with the finite-difference method using a poor discretization of the Laplacian. Although not yet computationally competitive with high-fidelity wave simulation, we showed that the learned wave simulator can cope with inaccurate physics. An important observation is that the training time of the proposed method quickly becomes very long, and to achieve high accuracy it may not be feasible to use many CNNs. On the other hand, the training time required for the single CNN low-to-high-fidelity mapping experiments, conducted for the sake of comparison, grows very slowly as the number of timesteps to be corrected increases.
In future work, we intend to initialize the CNN parameters in the proposed method with the parameters of a CNN trained by the single CNN low-to-high-fidelity mapping algorithm. This initialization may significantly reduce the training time needed for the neural network augmented wave-equation simulation method, and may make it possible to fine-tune each CNN to the specific timestep it is assigned to correct.

5 Acknowledgments

The authors thank Xiaowei Hu for his open-access repository on GitHub (https://github.com/xhujoy/CycleGAN-tensorflow). Our software implementation builds on this work.

Figure 13: Single CNN low-to-high-fidelity mapping with five timesteps to be corrected. First column from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.

Figure 14: Single CNN low-to-high-fidelity mapping with ten timesteps to be corrected, first part. First column from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.
Figure 15: Single CNN low-to-high-fidelity mapping with ten timesteps to be corrected, second part. First column from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.

References

[1] Lars Ruthotto and Eldad Haber. Deep neural networks motivated by partial differential equations. CoRR, abs/1804.04272, 2018. URL http://arxiv.org/abs/1804.04272.
[2] Maziar Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research, 19(1):932–955, 2018.
[3] Benjamin Moseley, Andrew Markham, and Tarje Nissen-Meyer. Fast approximate simulation of seismic waves with deep learning. arXiv preprint arXiv:1807.06873, 2018.
[4] Ali Siahkoohi, Mathias Louboutin, and Felix J. Herrmann. The importance of transfer learning in seismic modeling and imaging. 2019. Submitted to GEOPHYSICS in February 2019.
[5] Gabrio Rizzuti, Ali Siahkoohi, and Felix J. Herrmann. Learned iterative solvers for the Helmholtz equation. In EAGE Annual Conference Proceedings, 06 2019. doi: 10.3997/2214-4609.201901542. URL https://slim.gatech.edu/Publications/Public/Conferences/EAGE/2019/rizzuti2019EAGElis/rizzuti2019EAGElis.pdf. (EAGE, Copenhagen).
[6] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.
In Proceedings of the Thirty-First Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-17), volume 4, pages 4278–4284, 2017. URL http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14806.
[7] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[8] Ali Siahkoohi, Mathias Louboutin, Rajiv Kumar, and Felix J. Herrmann. Deep-convolutional neural networks in prestack seismic: Two exploratory examples. SEG Technical Program Expanded Abstracts 2018, pages 2196–2200, 2018. doi: 10.1190/segam2018-2998599.1. URL https://library.seg.org/doi/abs/10.1190/segam2018-2998599.1.
[9] Tao Hu, Zhizhong Han, Abhinav Shrivastava, and Matthias Zwicker. Render4Completion: Synthesizing multi-view depth maps for 3D shape completion. arXiv preprint arXiv:1904.08366, 2019.
[10] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
[11] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In Computer Vision – European Conference on Computer Vision (ECCV) 2016, pages 694–711. Springer International Publishing, 2016. doi: 10.1007/978-3-319-46475-6_43. URL https://link.springer.com/chapter/10.1007%2F978-3-319-46475-6_43.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016. doi: 10.1109/CVPR.2016.90. URL https://ieeexplore.ieee.org/document/7780459.
[13] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks?
In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.
[14] M. Louboutin, M. Lange, F. Luporini, N. Kukreja, P. A. Witte, F. J. Herrmann, P. Velesko, and G. J. Gorman. Devito: an embedded domain-specific language for finite differences and geophysical exploration. CoRR, abs/1808.01995, Aug 2018. URL https://arxiv.org/abs/1808.01995.
[15] F. Luporini, M. Lange, M. Louboutin, N. Kukreja, J. Hückelheim, C. Yount, P. Witte, P. H. J. Kelly, G. J. Gorman, and F. J. Herrmann. Architecture and performance of Devito, a system for automated stencil computation. CoRR, abs/1807.03032, Jul 2018. URL http://arxiv.org/abs/1807.03032.