Neural network augmented wave-equation simulation
Authors: Ali Siahkoohi, Mathias Louboutin, Felix J. Herrmann
School of Computational Science and Engineering, Georgia Institute of Technology
{alisk, mlouboutin3, felix.herrmann}@gatech.edu

Abstract

Accurate forward modeling is important for solving inverse problems. An inaccurate wave-equation simulation, as a forward operator, will offset the results obtained via inversion. In this work, we consider the case where we deal with incomplete physics. One proxy for incomplete physics is an inaccurate discretization of the Laplacian in finite-difference simulation of the wave equation. We exploit the intrinsic one-to-one similarities between timestepping algorithms and Convolutional Neural Networks (CNNs), and propose to intersperse CNNs between low-fidelity timesteps. Augmenting low-fidelity timestepping algorithms with neural networks may allow us to take large timesteps while limiting numerical dispersion artifacts. While simulating the wave equation with a low-fidelity timestepping algorithm, by correcting the wavefield several times during propagation, we hope to limit the numerical dispersion artifacts introduced by a poor discretization of the Laplacian. As a proof of concept, we demonstrate this principle by correcting for numerical dispersion while keeping the velocity model fixed and varying the source locations to generate training and testing pairs for our supervised learning algorithm.

1 Introduction

In inverse problems, we rely heavily on having accurate forward modeling operators. Often, we cannot afford to be physically or numerically accurate: numerical inaccuracy may stem from the computational complexity of accurate methods, or from incomplete knowledge of the underlying data generation process. In either case, motivated by Ruthotto and Haber [1], we propose to intersperse CNNs between the timesteps of an acoustic wave-equation simulation.
We mimic incomplete/inaccurate physics by simulating the wave equation with the finite-difference method while utilizing a poor (second-order) discretization of the Laplacian. Conventional methods for solving partial differential equations (PDEs), e.g., the finite-difference and finite-element methods, given enough computational resources, are able to simulate high-fidelity solutions to PDEs. On one hand, as long as the Courant-Friedrichs-Lewy condition for stability is satisfied, finite-difference methods are able to compute solutions to a PDE, regardless of the medium parameters, with arbitrary precision. On the other hand, the finite-element method requires careful meshing of the medium in order to carry out the simulation. Another motivation behind this work is to exploit the fact that in seismic applications, wave simulations are usually carried out for specific families of velocity models and source/receiver distributions. We hope our proposed method meets halfway between the two mentioned extremes, i.e., being too generic (finite-difference method) and being too problem specific (finite-element method). There have been several attempts to exploit learning methods in wave-equation simulation. Raissi [2] approximates the solution to a nonlinear PDE with a neural network: given points on the computational grid as input, the network computes the solution of the PDE, and training data is obtained by computing the solution of the PDE at several points. Moseley et al. [3] completely ignore the Laplacian and solely rely on predicting the next timestep from the previous two timesteps by learning the action of the spatially varying velocity and the Laplacian. While possible in principle, their approach needs to train for long times to provide reasonable simulations on relatively simple models. Siahkoohi et al.
[4], instead of ignoring the physics, rely on low-fidelity wave-equation simulation and, by exploiting transfer learning, utilize a single CNN to correct wavefield snapshots simulated on a "nearby" velocity model for numerical dispersion at any given timestep. In this work, we extend the ideas in Siahkoohi et al. [4] and propose using multiple CNNs, interspersed between low-fidelity timesteps. Finally, Rizzuti et al. [5] propose interspersing Krylov-subspace iterations with neural nets while inverting the Helmholtz equation. They show improvement in convergence by "propagating" an approximated wavefield, obtained from a limited number of iterations, with the aid of a trained convolutional neural net. Their technique can be seen as the frequency-domain counterpart of our proposed method. Our paper is organized as follows. First, we describe our formulation for learned wave simulation in detail. Next, we introduce our training objective function. Due to the dependencies among the CNN parameters, we devised a training heuristic, which we describe. Before explaining our numerical experiments, we state the CNN architecture used and the training details. Finally, we describe the three numerical experiments we conduct and discuss the effectiveness of the proposed method.

2 Theory

We describe how we augment low-fidelity physics with learning techniques to handle incomplete and/or inaccurate physics, where the low-fidelity physics is modeled via the finite-difference method with a poor discretization of the Laplacian. To ensure accuracy, the temporal and spatial discretizations in high-fidelity wave-equation simulations have to be chosen very fine, typically one to two orders of magnitude smaller than the Nyquist sampling rate.
As mentioned earlier, we will utilize a poor (only second-order) discretization of the Laplacian to carry out low-fidelity wave-equation simulations, but the scheme can be extended to other proxies of incomplete or inaccurate physics.

2.1 Simulations by timestepping

After discretization of the acoustic wave equation, a single timestep of the scalar wavefield, simulated on 0 ≤ t ≤ T, can be written as

    u_{j+1} = 2u_j − u_{j−1} + δt² c² Δu_j,   j = 0, 1, …, N − 1,   (1)

where u_j is the high-fidelity scalar wavefield at the j-th timestep, δt is the temporal discretization (timestep), c is the spatially varying velocity in the medium, and Δ is the high-order discretization of the Laplacian. Similar to Equation 1, the low-fidelity timestepping equation can be formulated as

    ū_{j+1} = 2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j,   j = 0, 1, …, M − 1,   (2)

where ū_j is the low-fidelity scalar wavefield, δT is the coarse timestep, c̄ is the coarse spatially varying velocity, and Δ̄ is the coarse (only second-order) discretized Laplacian. Motivated by Ruthotto and Haber [1], we consider every timestep as a single layer in a neural network, where the discretized Laplacian is a linear operator followed by the (nonlinear) action of the spatially varying velocity. Moreover, noticing the additional terms in Equation 1, each timestep is similar to a residual block, as introduced by Szegedy et al. [6]. Figures 1a and 1b schematically depict a single timestep as a block, corresponding to the high- and low-fidelity discretizations of the wave equation, respectively. The similarity of the high- and low-fidelity timestepping methods to CNNs can be perceived from Figures 2a and 2b, respectively, where red and yellow blocks correspond to high- and low-fidelity timestepping equations. As can be seen, high-fidelity simulation of the wave equation up to time t = T requires a large number of high-fidelity timesteps.
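As an illustration, a minimal NumPy sketch of this leapfrog timestepping with the low-fidelity (second-order, five-point) Laplacian stencil might look as follows. The grid size, velocity, and step sizes below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def laplacian_2nd_order(u, dx):
    """Second-order (five-point) discretization of the Laplacian with
    zero Dirichlet boundaries -- the 'low-fidelity' stencil of Equation 2."""
    lap = np.zeros_like(u)
    lap[1:-1, 1:-1] = (
        u[2:, 1:-1] + u[:-2, 1:-1] +
        u[1:-1, 2:] + u[1:-1, :-2] -
        4.0 * u[1:-1, 1:-1]
    ) / dx**2
    return lap

def timestep(u_cur, u_prev, c, dt, dx):
    """One step of Equations 1/2: u_{j+1} = 2u_j - u_{j-1} + dt^2 c^2 Lap(u_j)."""
    return 2.0 * u_cur - u_prev + (dt**2) * (c**2) * laplacian_2nd_order(u_cur, dx)

# Usage on a toy homogeneous medium (values chosen to satisfy the CFL condition).
nx = nz = 101
c = np.full((nz, nx), 1500.0)    # 1500 m/s everywhere (illustrative)
dx, dt = 10.0, 0.001
u_prev = np.zeros((nz, nx))
u_cur = np.zeros((nz, nx))
u_cur[nz // 2, nx // 2] = 1.0    # impulsive source at the center
for _ in range(100):
    u_cur, u_prev = timestep(u_cur, u_prev, c, dt, dx), u_cur
```

Swapping in a higher-order stencil for `laplacian_2nd_order` (and a finer `dt`) would give the high-fidelity counterpart of Equation 1.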
On the other hand, Figure 2b shows that the low-fidelity simulation can be done with far fewer low-fidelity timesteps, due to the coarse time sampling, and each timestep is cheaper than a high-fidelity timestep due to the coarse discretization of the Laplacian. Although computationally cheap, the low-fidelity wave-equation simulations suffer from numerical dispersion artifacts.

Figure 1: Comparing a single low- and high-fidelity timestep. a) High-fidelity timestep. b) Low-fidelity timestep.

Figure 2: Comparing low- and high-fidelity discretized wave-equation simulations. a) High-fidelity simulation. b) Low-fidelity simulation.

2.2 Learned wave simulations

Depending on the domain of application, we can assume wave simulations are typically carried out for specific families of velocity models and source/receiver distributions. This motivates us to deploy a data-driven wave simulation algorithm that is coupled with low-fidelity, cheap physics, in the hope of recovering high-fidelity wave simulations on a family of velocity models. In our method, we propagate the coarse-grained wavefields according to Equation 2 with a coarsened Laplacian. After every k timesteps, where k is a hyperparameter, we apply a correction with a CNN, G_{θ_i}, parameterized by θ_i, to the wavefield obtained at the j-th timestep, and proceed with the timestepping. The proposed data-driven timestepping wave simulation method is formalized in Equation 3:

    ū_{j+1} = G_{θ_i}(2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j),  i = ⌊j/k⌋,  if j ≡ k − 1 (mod k),
    ū_{j+1} = 2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j,  otherwise,   (3)

where j = 0, 1, …, M − 1. The schematic representation of Equation 3 is illustrated in Figure 3. Yellow blocks represent low-fidelity timesteps (see Equation 2 and Figure 1b) and blue blocks correspond to the CNNs, G_{θ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋.
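Equation 3 can be sketched as a plain loop that interleaves low-fidelity steps with network corrections. The names `step_fn` and `networks` are illustrative; the toy `step` in the usage omits the Laplacian term of Equation 2 purely to keep the example self-contained:

```python
import numpy as np

def augmented_propagate(u0, u1, networks, step_fn, k, M):
    """Equation 3 as a loop: take M low-fidelity timesteps via `step_fn`,
    and after every k-th step pass the new wavefield through the CNN
    G_{theta_i}, i = floor(j / k).  `networks` is a list of callables."""
    u_prev, u_cur = u0, u1
    for j in range(M):
        u_next = step_fn(u_cur, u_prev)   # 2 u_j - u_{j-1} + dT^2 c^2 Lap(u_j)
        if j % k == k - 1:                # j = k - 1 (mod k): correction step
            u_next = networks[j // k](u_next)
        u_prev, u_cur = u_cur, u_next
    return u_cur

# Usage with identity stand-ins for the trained CNNs and a toy linear step.
step = lambda u, up: 2.0 * u - up          # Laplacian term omitted in this toy
nets = [lambda u: u for _ in range(4)]     # stand-ins for G_{theta_0..3}
u_final = augmented_propagate(np.zeros(8), np.ones(8), nets, step, k=3, M=12)
```

With k = 3 and M = 12, corrections fire at j = 2, 5, 8, 11, indexing the four networks in order, which matches the ⌊(M − 1)/k⌋ + 1 CNNs of the paper's notation.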
The CNNs correct for the effects of inaccurate physics, i.e., numerical dispersion in our experiments, at every k-th low-fidelity timestep. In this work, the parameters θ_i are not shared among the CNNs. We have not explored the possibility of sharing weights among the CNNs at different stages of wave propagation. Note that although the parameters of the CNNs are not shared, they are not independent: after the j-th timestep, CNN G_{θ_i}, i = ⌊j/k⌋, corrects for errors in the wavefield introduced by low-fidelity timestepping as well as imperfections present in the output of the (i − 1)-th CNN, which have been propagated through timestepping. Therefore, a small perturbation in the parameters of a CNN in the initial stages of neural network augmented timestepping causes noticeable differences in the input of the CNNs in later stages of the wave propagation.

Figure 3: A schematic representation of the proposed method.

The described dependencies among the CNN parameters introduce difficulties in optimizing the parameters of the CNNs. Below we describe our heuristic for training the CNNs.

2.3 Training objective

During training, we train all the CNNs toward the high-fidelity solution of the wave equation at the corresponding timestep, obtained by solving Equation 1. As can be seen from Equation 3, after the j-th timestep, CNN G_{θ_i}, i = ⌊j/k⌋, is tasked with correcting the effects of low-fidelity timestepping. During training, the i-th CNN maps its input, 2ū_j − ū_{j−1} + δT² c̄² Δ̄ū_j, to u_{j+1}, the result of the j-th timestep of high-fidelity timestepping, obtained via Equation 1. Define the function F̄_k(·) as the action of k low-fidelity timesteps, i.e., F̄_k(·) represents k consecutive low-fidelity timestepping blocks, as depicted in Figure 1b. Clearly, F̄_k is only a function of k, δT, c̄, and Δ̄. Using the defined notation, we can write the input to the i-th CNN, û_i, i = 0, 1, …
, ⌊(M − 1)/k⌋, as follows:

    û_i = F̄_k(G_{θ_{i−1}}(û_{i−1})),  i = 1, 2, …, ⌊(M − 1)/k⌋,   û_0 = F̄_k(q),   (4)

where q is the source. Also, let u_{τ_i} denote the wavefield obtained at the j-th timestep of high-fidelity timestepping (Equation 1), where τ_i = j = (k + 1)i − 1. The input-output pair for the i-th CNN is (û_i, u_{τ_i}). We can generate multiple training pairs for the CNNs by simulating (û_i, u_{τ_i}) pairs for various velocity models and source locations. Assume we have n pairs of training data for the CNNs, namely (û_i^{(p)}, u_{τ_i}^{(p)}), p = 0, 1, …, n − 1. The objective function for optimizing the i-th CNN can be written as follows:

    L_i = (1/n) Σ_{p=0}^{n−1} ‖ G_{θ_i}(û_i^{(p)}) − u_{τ_i}^{(p)} ‖_1,  i = 0, 1, …, ⌊(M − 1)/k⌋.   (5)

In the past, in a similar attempt, we used Generative Adversarial Networks [GANs, 7] to train a CNN to remove numerical dispersion from wavefield snapshots [4, 8]. In this work, we choose the ℓ1 norm as the misfit function for two reasons. First, training GANs is computationally expensive, since it requires training an additional neural network that discerns between high-fidelity wavefield snapshots and corrected ones; the computational complexity of the method proposed in this work is already significantly higher than our previous attempts [4, 8], because it involves training multiple CNNs, so to limit the computation time we chose the ℓ1 misfit function. Second, in a numerical experiment performed by Hu et al. [9], the ℓ1 norm misfit function yields the second-best results, after a misfit function combining GANs with the ℓ1 norm. In the next section, we describe our heuristic for training the CNNs.

2.4 Training heuristic

To overcome the complexities caused by dependencies between the parameters of the CNNs, we optimize the objective functions L_i with the heuristic described below.
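Before detailing the heuristic, note that the per-CNN misfit of Equation 5 is a plain sample-average ℓ1 loss; a minimal sketch, with an identity stand-in for the network:

```python
import numpy as np

def l1_objective(net, inputs, targets):
    """Equation 5: L_i = (1/n) sum_p || G_theta_i(u_hat_i^(p)) - u_tau_i^(p) ||_1."""
    n = len(inputs)
    return sum(np.abs(net(x) - y).sum() for x, y in zip(inputs, targets)) / n

# Usage with toy 4x4 "wavefield" pairs and an identity "network".
net = lambda u: u
ins = [np.zeros((4, 4)), np.ones((4, 4))]
outs = [np.ones((4, 4)), np.ones((4, 4))]
loss = l1_objective(net, ins, outs)   # (16 + 0) / 2 = 8.0
```

In the actual implementation this loss would be expressed in TensorFlow so that Adam can differentiate through the network parameters.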
We minimize each L_i, i = 0, 1, …, ⌊(M − 1)/k⌋, with respect to θ_i while keeping the rest of the parameters fixed. We keep updating all the sets of parameters in a cyclic fashion, i.e., once we have updated all the parameters via L_i, i = 0, 1, …, ⌊(M − 1)/k⌋, we start over and update them again, in order, until a stopping criterion is reached. We will describe the stopping criterion used in our experiments later. We minimize the objective functions in Equation 5 with a variant of Stochastic Gradient Descent known as the Adam optimizer [10], with momentum parameter β = 0.9 and a linearly decaying stepsize with initial value μ = 2 × 10⁻⁴. During each iteration of Adam, the gradient of L_i is approximated using a single randomly selected training pair. These pairs are selected without replacement; once all the training pairs have been used, we start over, again picking training pairs without replacement from the entire training set. The optimization is carried out for a predetermined total number of iterations, where each iteration consists of drawing a random training pair, without replacement, and updating the parameters of a CNN. Additionally, while optimizing θ_i with the rest of the parameters fixed, we update θ_i for a number of iterations, which we refer to as mini-iterations, before proceeding to the next set of parameters. Algorithm 1 summarizes the steps for optimizing the objective functions in Equation 5.

Algorithm 1 Heuristic for optimizing the CNNs G_{θ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋.
1.  INPUT:
      MaxItr         // total number of iterations to carry out the optimization
      MaxMiniItr     // mini-iterations before proceeding to the next CNN
      F̄_k(·)         // k consecutive low-fidelity timestepping blocks
      q^{(p)}, p = 0, 1, …
, n − 1   // sources corresponding to different training pairs
      u_{τ_i}^{(p)}, p = 0, 1, …, n − 1, i = 0, 1, …, ⌊(M − 1)/k⌋   // high-fidelity snapshots
      θ_i^0, i = 0, 1, …, ⌊(M − 1)/k⌋   // randomly initialized parameters
2.  ItrNum ← 0
3.  FOR i = 0 : ⌊(M − 1)/k⌋ DO
4.      θ_i ← θ_i^0
5.  FOR p = 0 : n − 1 DO
6.      û_0^{(p)} = F̄_k(q^{(p)})
7.  WHILE ItrNum < MaxItr DO
8.      FOR i = 0 : ⌊(M − 1)/k⌋ DO
9.          IF i > 0 DO
10.             FOR p = 0 : n − 1 DO
11.                 û_i^{(p)} = F̄_k(G_{θ_{i−1}}(û_{i−1}^{(p)}))
12.         FOR MiniItrNum = 1 : MaxMiniItr DO
13.             p ← SampleWithoutReplacement({0, 1, …, n − 1})
14.             θ_i ← arg min_{θ_i} ‖ G_{θ_i}(û_i^{(p)}) − u_{τ_i}^{(p)} ‖_1
15.             ItrNum ← ItrNum + 1
16. RETURN θ_i, i = 0, 1, …, ⌊(M − 1)/k⌋

2.5 CNN architecture

Motivated by our previous attempts at numerical dispersion removal from wavefield snapshots [4, 8], we use the exact architecture provided by Johnson et al. [11], which includes residual blocks, the main building block of ResNets, introduced by He et al. [12], for all the CNNs G_{θ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋.

2.6 Training details and implementation

While CNNs are known to generalize well, i.e., to maintain the quality of their performance when applied to unseen data, they can only be applied successfully to a data set drawn from the same distribution as the training data. Because of the Earth's heterogeneity and the complex geological structures present in realistic-looking models, training a neural network that generalizes well when applied to another velocity model can become challenging. While we have successfully demonstrated that transfer learning [13] can be used in situations where the neural network is initially trained on data from a proximal survey [4], in this contribution we chose, as a proof of concept, to keep the velocity model fixed and vary the source locations to generate different training/testing pairs.
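The cyclic heuristic of Algorithm 1 can be sketched as a runnable loop. Here `train_fn` stands in for one Adam update on CNN `i`, and `F_k` for k low-fidelity timesteps; both names are illustrative. The sketch reuses one permutation per visit to a CNN, a simplification of the paper's restart-without-replacement sampling:

```python
import random

def train_cyclic(networks, train_fn, F_k, sources, targets, max_itr, max_mini_itr):
    """Cyclic heuristic of Algorithm 1: update one CNN at a time for
    `max_mini_itr` mini-iterations, regenerate the next CNN's inputs from
    its predecessor's outputs, and cycle until `max_itr` iterations are spent."""
    n = len(sources)
    itr = 0
    while itr < max_itr:
        inputs = [F_k(q) for q in sources]        # u_hat_0^(p) = F_k(q^(p))
        for i in range(len(networks)):
            if i > 0:                              # u_hat_i = F_k(G_{i-1}(u_hat_{i-1}))
                inputs = [F_k(networks[i - 1](x)) for x in inputs]
            order = random.sample(range(n), n)     # draw pairs without replacement
            for m in range(max_mini_itr):
                p = order[m % n]
                train_fn(i, inputs[p], targets[i][p])   # one Adam-style update
                itr += 1
                if itr >= max_itr:
                    return networks
    return networks

# Usage with identity stand-ins; we only record which CNN each update touched.
calls = []
nets = [lambda u: u, lambda u: u]
train_cyclic(nets, lambda i, x, y: calls.append(i), lambda u: u,
             sources=[1.0, 2.0], targets=[[1.0, 2.0], [1.0, 2.0]],
             max_itr=8, max_mini_itr=2)
```

With two CNNs, two mini-iterations each, and eight total iterations, the update schedule cycles 0, 0, 1, 1, 0, 0, 1, 1, mirroring the outer WHILE/FOR structure of Algorithm 1.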
We use the Marmousi velocity model; out of 401 available shot locations with 7.5 m spacing, we allocate half of the shot locations to training and use the rest to generate testing pairs for evaluation purposes. The maximum simulation time in our experiments is 1.1 s. We designed and implemented our deep architectures in TensorFlow¹. To carry out our wave-equation simulations with finite differences, we used Devito² [14, 15]. We used the functionality of the Operator Discretization Library³ to wrap Devito operators into TensorFlow layers. Our implementation can be found on GitHub⁴. We ran our algorithm on an Amazon Web Services g3.4xlarge instance, where we optimize the CNN parameters on an NVIDIA Tesla M60 GPU and Devito utilizes 16 CPU cores to perform the finite-difference wave-equation simulations. Initially, we simulate the high-fidelity training wavefield snapshots, u_{τ_i}^{(p)}, p = 0, 1, …, n − 1, i = 0, 1, …, ⌊(M − 1)/k⌋, only once, at the beginning, and store them. To limit CPU-GPU communication, before utilizing the GPU to update θ_i, we generate the inputs to the i-th CNN, û_i^{(p)}, p = 0, 1, …, n − 1, all at once, and store them. Afterwards, the i-th CNN can be (re)trained using the stored input/output wavefield snapshot pairs for several mini-iterations.

3 Numerical experiments

We want to show that neural networks, when augmented with inaccurate physics, e.g., a poor discretization of the Laplacian, are able to approximate the wavefields obtained with an accurate approximation of the wave equation. To demonstrate this, we conduct three numerical experiments in which we keep the velocity model fixed and vary the source locations to generate different training/testing pairs. The experiments differ in the number of CNNs used throughout the learned wave propagation.
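The half/half split of the 401 shot locations described in Section 2.6 can be written in a few lines. The alternating-index scheme below is an assumption for illustration; the paper only states that half the shots are used for training (consistent with the 201 training pairs in Table 1):

```python
# Illustrative split of the 401 shot locations (7.5 m spacing) into
# training and testing halves; the alternating-index scheme is an
# assumption -- the paper only states a 50/50 allocation.
spacing = 7.5
shot_x = [i * spacing for i in range(401)]   # 0 m .. 3000 m along the surface
train_shots = shot_x[0::2]                   # 201 locations -> training pairs
test_shots = shot_x[1::2]                    # 200 locations -> testing pairs
```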
We use three, five, and ten CNNs while keeping the total number of iterations fixed. This implies that an experiment with more CNNs optimizes each CNN with a smaller number of iterations per CNN, because iterations per CNN × number of CNNs = total number of iterations. A neural network augmented wave simulator with n1 CNNs needs more training iterations, and possibly more training data, to perform as well as one with n2 CNNs, when n1 > n2. For a fixed total number of iterations, the number of iterations per CNN is inversely proportional to the number of CNNs utilized. Therefore, the first n2 CNNs in the neural network augmented wave simulator with n1 CNNs will perform worse than the CNNs in the wave propagator with n2 CNNs. Consequently, the error accumulated by the poor performance of the first n2 CNNs, combined with the artifacts introduced by low-fidelity timestepping, makes matters worse for the later CNNs in the more complex learned wave propagator. Therefore, in our experiments, since the total number of iterations is fixed, we expect to see the quality of dispersion removal degrade as the number of CNNs in a learned wave propagator increases. Table 1 summarizes the total number of iterations, iterations per CNN, training pairs, training time, and number of tunable parameters for the three different experiments.

CNNs | Iterations | Iterations per CNN | Pairs per CNN | Time        | Param. count
3    | 100500     | 33500              | 201           | 17.99 hours | 34150272
5    | 100500     | 20100              | 201           | 19.79 hours | 56917120
10   | 100500     | 10050              | 201           | 49.24 hours | 113834240

Table 1: Summary of details of the three neural network augmented wave-equation simulation experiments.
¹ https://www.tensorflow.org/
² https://www.devitoproject.org/
³ https://odlgroup.github.io/odl/
⁴ https://github.com/alisiahkoohi/NN-augmented-wave-sim

As described earlier and presented in Table 1, the MaxItr variable used in the WHILE condition (line 7) of Algorithm 1 is set to 100500 for all our experiments. Figures 4-6 show the values of the objective function in Equation 5 (orange) and the wavefield-correction signal-to-noise ratio (SNR) evaluated on testing data pairs during training (blue), for the experiments with three, five, and ten CNNs, respectively. Note that the SNR curves were not used to determine when to stop training; they are depicted for demonstration purposes only. Figures 4a, 4c, and 4e show the wavefield-correction SNR for the first, second, and third CNN, respectively, in the neural network augmented wave simulator with three CNNs. Similarly, Figures 4b, 4d, and 4f depict the training objective values throughout training for the first, second, and third CNN. As can be seen from the objective function curves, the training heuristic has been effective and the objective function values have a decreasing trend. Note that each CNN has been trained for 33500 iterations, on average, out of a total of 100500 iterations. Several equispaced spikes can be noticed in the objective function value curves; for instance, see the objective function value curve of the third CNN, in Figure 4f, at 6030, 8040, 10050, and 12060 iterations. The mini-batch size we use in this experiment is 10. These spikes occur at the moments in training when we have started retraining the third CNN after updating the first and second CNNs. As discussed before, a change in the parameters of the CNNs preceding a given CNN changes that CNN's input, and consequently the objective function becomes large when we start retraining the CNNs in the later stages again.
Similar objective function value and SNR curves for the other two neural network augmented wave propagators, utilizing five and ten CNNs, can be found in Figures 5 and 6, respectively. The first column of Figures 5 and 6, from top to bottom, shows the SNR of the wavefield correction obtained by the first through the last CNN, evaluated on testing pairs during training. The second column of Figures 5 and 6 shows the objective function value curves throughout the optimization of Equation 5, for the propagators with five and ten CNNs, respectively. In both columns, from top to bottom, the curves correspond to the CNNs from the beginning to the end of the learned wave propagators, in order. We make two main observations from Figures 5 and 6. First, the objective function values decrease overall, validating the effectiveness of the introduced heuristic. Spikes in the objective function value curves can again be seen; these correlate with the stages in training when Algorithm 1 revisits a CNN after updating the rest of the CNN parameters. As explained before, the spikes are caused by changes in the parameters of the CNNs preceding a given CNN, which in turn alter that CNN's input training wavefields. Second, due to the decrease in iterations per CNN as the number of CNNs grows, the SNR curves converge to a lower value when the number of CNNs increases.

Figure 4: Neural network augmented wave simulation with three CNNs. First column, from top to bottom: SNR curves, evaluated on testing pairs during training, for a) the first to e) the last CNN, in order. Second column, from top to bottom: training objective function value curves, evaluated on training pairs, for b) the first to f) the last CNN, in order.

Next, we demonstrate the corrected wavefields in the three conducted experiments, evaluated for one testing shot location.
For each experiment, we show the high-fidelity wavefield snapshots, u_{τ_i}, i = 0, 1, …, ⌊(M − 1)/k⌋, where i iterates over the CNNs, the numerically dispersed low-fidelity wavefields, and the wavefield snapshots corrected by the CNNs. To evaluate the performance of each correction, we also depict the correction error, i.e., the difference between the high-fidelity and corrected wavefield snapshots.

Figure 5: Neural network augmented wave simulation with five CNNs. First column, from top to bottom: SNR curves, evaluated on testing pairs during training, for a) the first to i) the last CNN, in order. Second column, from top to bottom: training objective function value curves, evaluated on training pairs, for b) the first to j) the last CNN, in order.

Figure 7 shows the mentioned wavefield snapshots for the neural network augmented wave simulator with three CNNs. The first column shows the high-fidelity wavefields obtained by solving Equation 1, the second column depicts low-fidelity simulations obtained by solving Equation 2, the third column shows the result of neural network augmented wavefield simulation, and the fourth column is the learned wave simulation error, i.e., the difference between the first and third columns of Figure 7. Similarly, Figures 8-10 show the high-fidelity, low-fidelity, and learned wave simulation wavefield snapshots in the first three columns, in order, for the neural network augmented wave simulators with five and ten CNNs, respectively. As expected, for the reasons stated before, we observe that the quality of neural network augmented wave-equation simulation degrades as the number of CNNs increases. On the other hand, the high quality of the learned wave simulation with few CNNs (see Figure 7) suggests that the quality of the simulation with more CNNs might be improved by increasing the number of iterations.
As can be seen in the last columns of Figures 9 and 10, the learned wave simulation with ten CNNs has the lowest quality. The learned wave simulation is least accurate for the direct wave, which happens to be the event with the largest amplitudes. Also, although most of the numerical dispersion has been removed and the phase has been recovered, the residual consists mostly of amplitude differences.

3.1 Performance comparison: single CNN low-to-high-fidelity mapping

In order to evaluate the effectiveness of the proposed method, we also train a single CNN, similar to our previous attempts to remove numerical dispersion from wavefield snapshots [4, 8], and compare its numerical dispersion removal with the proposed method. To be more precise, for each presented neural network augmented wave-equation simulation experiment, where we use three, five, or ten CNNs, we train a single CNN, G_θ, with the same architecture as that used in the learned wave propagators, to remove numerical dispersion from all the low-fidelity wavefield

Figure 6: Neural network augmented wave simulation with ten CNNs. First column, from top to bottom: SNR curves, evaluated on testing pairs during training, for a) the first to s) the last CNN, in order. Second column, from top to bottom: training objective function value curves, evaluated on training pairs, for b) the first to t) the last CNN, in order.

Figure 7: Neural network augmented wave simulation with three CNNs. First column, from top to bottom: a, e, i) high-fidelity wavefield snapshots, in order. Second column, from top to bottom: b, f, j) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order.
Third column, from top to bottom: c, g, k) results of neural network augmented wave-equation simulation: the output of the first, second, and last CNN, in order. Fourth column, from top to bottom: d, h, l) difference between the first and third columns, in order.

snapshots simulated by solving Equation 2 for j ≡ k − 1 (mod k), on the training shot locations. As in the previous examples, here we also use half of the available shot locations to simulate training pairs, and the rest are used to evaluate the performance of the trained CNN. The input to G_θ during training can be written as follows (compare with Equation 4):

    ũ_i = F̄_k(ũ_{i−1}),  i = 1, 2, …, ⌊(M − 1)/k⌋,   ũ_0 = F̄_k(q).   (6)

The desired output for this CNN is the high-fidelity wavefield snapshots simulated for the training shot locations, u_{τ_i}^{(p)}, p = 0, 1, …, n − 1, i = 0, 1, …, ⌊(M − 1)/k⌋. The objective function for this CNN can be written as follows:

    L = (1 / (n(⌊(M − 1)/k⌋ + 1))) Σ_{i=0}^{⌊(M−1)/k⌋} Σ_{p=0}^{n−1} ‖ G_θ(ũ_i^{(p)}) − u_{τ_i}^{(p)} ‖_1.   (7)

We minimize objective function 7 over θ with the Adam optimizer, using the same maximum number of iterations as before, this time combining all the training pairs associated with the different CNNs in the learned wavefield simulation examples. As mentioned before, in order to compare with the proposed method, we minimize objective function 7 over three different sets of input-output pairs, each corresponding to one of our presented experiments with a varying number of CNNs. Table 2 summarizes the total number of iterations, training pairs, training time, and number of tunable parameters for the three different cases, which differ in the number of timesteps at which we choose to correct the numerical dispersion. These selected timesteps are the timesteps that the CNNs operated on in our three previous examples.
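The pooled objective of Equation 7, which averages the ℓ1 misfit of one shared network over all corrected timesteps and sources, can be sketched as follows; the identity "network" and toy arrays in the usage are illustrative stand-ins:

```python
import numpy as np

def single_cnn_objective(net, inputs_by_step, targets_by_step):
    """Equation 7: one CNN G_theta trained on snapshot pairs pooled over
    all corrected timesteps i and all sources p (mean l1 misfit)."""
    total, count = 0.0, 0
    for ins, outs in zip(inputs_by_step, targets_by_step):   # loop over i
        for x, y in zip(ins, outs):                          # loop over p
            total += np.abs(net(x) - y).sum()
            count += 1
    return total / count

# Usage: two timesteps, two sources each, identity stand-in for G_theta.
ins = [[np.zeros(2), np.ones(2)], [np.ones(2), np.ones(2)]]
outs = [[np.ones(2), np.ones(2)], [np.ones(2), np.ones(2)]]
loss = single_cnn_objective(lambda u: u, ins, outs)   # (2 + 0 + 0 + 0) / 4 = 0.5
```

The only structural difference from Equation 5 is the extra loop over timesteps: the n(⌊(M − 1)/k⌋ + 1) normalization falls out of counting every (i, p) pair once.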
Figure 8: Neural network augmented wave simulation with five CNNs. First column, from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column, from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column, from top to bottom: c) to s) results of neural network augmented wave-equation simulation: the output of the first to the last CNN, in order. Fourth column, from top to bottom: d) to t) difference between the first and third columns, in order.

# Timesteps to correct | Iterations | Training pairs | Time        | Param. count
3                      | 100500     | 603            | 13.85 hours | 11383424
5                      | 100500     | 1005           | 13.96 hours | 11383424
10                     | 100500     | 2010           | 14.56 hours | 11383424

Table 2: Summary of details of the three single-CNN experiments.

Figure 9: Neural network augmented wave simulation with ten CNNs, first part. First column, from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column, from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column, from top to bottom: c) to s) results of neural network augmented wave-equation simulation: the output of the first to the fifth CNN, in order. Fourth column, from top to bottom: d) to t) difference between the first and third columns, in order.

Figure 10: Neural network augmented wave simulation with ten CNNs, second part. First column, from top to bottom: a) to q) high-fidelity wavefield snapshots, in order.
Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of neural network augmented wave-equation simulation; output of the sixth to the last CNN, in order. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.

The slight difference in runtime among the three cases in Table 2 is partly due to the different number of training pairs that need to be generated. Figure 11 depicts the wavefield snapshot correction SNR curves, evaluated on testing pairs during training, and the value of objective function 7, in the single CNN low-to-high-fidelity mapping experiment, as a function of the number of iterations. Figures 11a, 11c, and 11e show the SNR curves when the CNN is trained on wavefield snapshots corresponding to three, five, and ten timesteps, respectively. Similarly, Figures 11b, 11d, and 11f depict the training objective function value (Equation 7) when the CNN is trained on wavefield snapshots corresponding to three, five, and ten timesteps, respectively. The SNR curves in the first column of Figure 11 show the evolution of the wavefield correction SNR evaluated on randomly selected testing wavefields drawn from the wavefield snapshots combined from different timesteps. Figures 11a, 11c, and 11e therefore indicate that the three different CNNs converge to a wavefield correction SNR of around 20 dB, regardless of the number of timesteps they are correcting for. This does not, however, suggest that the performance will stay the same as we increase the number of timesteps to be corrected. By comparing Figure 11a with the first column of Figure 4 (SNR curves for neural network augmented wave-equation simulation with three CNNs), we observe that, on average, the two methods perform equally well.
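The wavefield correction SNR reported in these curves can be computed as below. The exact convention used in the paper is not stated here, so this sketch assumes the common definition in decibels with respect to the high-fidelity reference snapshot.

```python
import numpy as np

def correction_snr(reference, estimate):
    """Signal-to-noise ratio (dB) of a corrected snapshot relative to the
    high-fidelity reference: SNR = 20 log10(||ref|| / ||ref - est||)."""
    ref = np.ravel(reference)
    err = np.linalg.norm(ref - np.ravel(estimate))
    return 20.0 * np.log10(np.linalg.norm(ref) / err)
```

Under this convention, an estimate whose residual has one tenth the norm of the reference scores 20 dB, matching the plateau reported above.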
Figure 11: Single CNN low-to-high-fidelity mapping. a) Wavefield snapshot correction SNR curve and b) objective function value curve when the CNN is trained on wavefield snapshot pairs corresponding to learned wavefield simulation with three CNNs. c) Wavefield snapshot correction SNR curve and d) objective function value curve when the CNN is trained on wavefield snapshot pairs corresponding to learned wavefield simulation with five CNNs. e) Wavefield snapshot correction SNR curve and f) objective function value curve when the CNN is trained on wavefield snapshot pairs corresponding to learned wavefield simulation with ten CNNs.

Finally, we show the wavefields corrected by the single CNN low-to-high-fidelity mapping method, for comparison with our proposed method. Figures 12 and 13 show the corrected wavefields for the cases where three and five timesteps need to be corrected. Figures 14 and 15 show the corrected wavefields for the case where ten timesteps need to be corrected, in two parts. In Figures 12–15, the first, second, third, and fourth columns depict the high-fidelity wavefield snapshots, the low-fidelity wavefield snapshots, the corrected low-fidelity wavefield snapshots, and the error in numerical dispersion removal, respectively. In each column, from top to bottom, the simulation time increases. To compare our proposed method with single CNN low-to-high-fidelity mapping, compare Figure 7 with 12, 8 with 13, 9 with 14, and 10 with 15. As can be seen, the single CNN low-to-high-fidelity mapping method maintains the quality of its performance as the number of timesteps to be corrected increases. On the other hand, with the maximum number of iterations kept fixed, the performance of neural network augmented wave-equation simulation drops as the number of CNNs increases.
Also, by comparing Tables 1 and 2, we observe that for a fixed maximum number of iterations, the training time needed for single CNN low-to-high-fidelity mapping grows very slowly with the number of timesteps to be corrected, compared to the training time required for neural network augmented wave-equation simulation.

Figure 12: Single CNN low-to-high-fidelity mapping with three timesteps to be corrected. First column from top to bottom: a, e, i) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b, f, j) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c, g, k) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d, h, l) difference between the first and third columns, in order.

4 Conclusions

Our numerical experiments demonstrate that, given suitable training data, a well-trained neural network augmented wave-equation simulator is capable of approximating wavefield snapshots produced by high-fidelity simulation. In this work, as a proxy for inaccurate physics, we simulate the wave equation with the finite-difference method using a poor discretization of the Laplacian. Although not yet computationally competitive with high-fidelity wave simulation, we showed that the learned wave simulator can cope with inaccurate physics. An important observation is that the training time of the proposed method quickly becomes very long, and to achieve high accuracy it may not be feasible to use many CNNs. On the other hand, the training time required for the single CNN low-to-high-fidelity mapping experiments, conducted for the sake of comparison, grows very slowly as the number of timesteps to be corrected increases.
In future work, we intend to initialize the CNN parameters in the proposed method with the parameters of a CNN trained by the single CNN low-to-high-fidelity mapping algorithm. This initialization may significantly reduce the training time needed for the neural network augmented wave-equation simulation method, and may make it possible to fine-tune each CNN to the specific timestep it is assigned to correct.

5 Acknowledgments

The authors thank Xiaowei Hu for his open-access repository on GitHub (https://github.com/xhujoy/CycleGAN-tensorflow). Our software implementation builds on this work.

Figure 13: Single CNN low-to-high-fidelity mapping with five timesteps to be corrected. First column from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.

Figure 14: Single CNN low-to-high-fidelity mapping with ten timesteps to be corrected, first part. First column from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.
Figure 15: Single CNN low-to-high-fidelity mapping with ten timesteps to be corrected, second part. First column from top to bottom: a) to q) high-fidelity wavefield snapshots, in order. Second column from top to bottom: b) to r) low-fidelity wavefield snapshots simulated by solving Equation 2 with the same simulation time as the high-fidelity wavefields, in order. Third column from top to bottom: c) to s) result of single CNN low-to-high-fidelity mapping. Fourth column from top to bottom: d) to t) difference between the first and third columns, in order.

References

[1] Lars Ruthotto and Eldad Haber. Deep neural networks motivated by partial differential equations. CoRR, abs/1804.04272, 2018. URL http://arxiv.org/abs/1804.04272.
[2] Maziar Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research, 19(1):932–955, 2018.
[3] Benjamin Moseley, Andrew Markham, and Tarje Nissen-Meyer. Fast approximate simulation of seismic waves with deep learning. arXiv preprint arXiv:1807.06873, 2018.
[4] Ali Siahkoohi, Mathias Louboutin, and Felix J. Herrmann. The importance of transfer learning in seismic modeling and imaging. 2019. Submitted to GEOPHYSICS in February 2019.
[5] Gabrio Rizzuti, Ali Siahkoohi, and Felix J. Herrmann. Learned iterative solvers for the Helmholtz equation. In EAGE Annual Conference Proceedings, 06 2019. doi: 10.3997/2214-4609.201901542. URL https://slim.gatech.edu/Publications/Public/Conferences/EAGE/2019/rizzuti2019EAGElis/rizzuti2019EAGElis.pdf. (EAGE, Copenhagen).
[6] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.
In Proceedings of the Thirty-First Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-17), volume 4, pages 4278–4284, 2017. URL http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14806.
[7] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Nets. Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[8] Ali Siahkoohi, Mathias Louboutin, Rajiv Kumar, and Felix J. Herrmann. Deep-convolutional neural networks in prestack seismic: Two exploratory examples. SEG Technical Program Expanded Abstracts 2018, pages 2196–2200, 2018. doi: 10.1190/segam2018-2998599.1. URL https://library.seg.org/doi/abs/10.1190/segam2018-2998599.1.
[9] Tao Hu, Zhizhong Han, Abhinav Shrivastava, and Matthias Zwicker. Render4Completion: Synthesizing multi-view depth maps for 3D shape completion. arXiv preprint arXiv:1904.08366, 2019.
[10] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
[11] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In Computer Vision – European Conference on Computer Vision (ECCV) 2016, pages 694–711. Springer International Publishing, 2016. doi: 10.1007/978-3-319-46475-6_43. URL https://link.springer.com/chapter/10.1007%2F978-3-319-46475-6_43.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016. doi: 10.1109/CVPR.2016.90. URL https://ieeexplore.ieee.org/document/7780459.
[13] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks?
In Advances in Neural Information Processing Systems, pages 3320–3328, 2014.
[14] M. Louboutin, M. Lange, F. Luporini, N. Kukreja, P. A. Witte, F. J. Herrmann, P. Velesko, and G. J. Gorman. Devito: an embedded domain-specific language for finite differences and geophysical exploration. CoRR, abs/1808.01995, Aug 2018. URL https://arxiv.org/abs/1808.01995.
[15] F. Luporini, M. Lange, M. Louboutin, N. Kukreja, J. Hückelheim, C. Yount, P. Witte, P. H. J. Kelly, G. J. Gorman, and F. J. Herrmann. Architecture and performance of Devito, a system for automated stencil computation. CoRR, abs/1807.03032, Jul 2018. URL http://arxiv.org/abs/1807.03032.