Approaching geoscientific inverse problems with vector-to-image domain transfer networks


Authors: Eric Laloy, Niklas Linde, Diederik Jacques

Eric Laloy (Belgian Nuclear Research Centre, SCK•CEN; corresponding author, elaloy@sckcen.be), Niklas Linde (University of Lausanne), and Diederik Jacques (Belgian Nuclear Research Centre, SCK•CEN)

November 11, 2020

Abstract

We present vec2pix, a deep neural network designed to predict categorical or continuous 2D subsurface property fields from one-dimensional measurement data (e.g., time series), thereby offering a new approach to solve inverse problems. The performance of the method is investigated through two types of synthetic inverse problems: (1) a crosshole ground penetrating radar (GPR) tomography experiment in which GPR travel times are used to infer a 2D velocity field, and (2) a multi-well pumping experiment within an unconfined aquifer in which time series of transient hydraulic heads are used to retrieve a 2D hydraulic conductivity field. For each type of problem, both a multi-Gaussian and a binary channelized subsurface domain with long-range connectivity are considered. Using a training set of 20,000 examples (implying as many forward model evaluations), the method is found to recover a 2D model that is in much closer agreement with the true model than the closest training model in the forward-simulated data space. Further testing with smaller training sample sizes shows only a moderate reduction in performance when using only 5,000 training examples. Even if the recovered models are visually close to the true ones, the data misfits associated with their forward responses are generally larger than the noise level used to contaminate the true data. If finding a model that fits the data to the noise level is required, then vec2pix-based inversion models can be used as initial inputs for more traditional multiple-point statistics inversion.
Uncertainty of the inverse solution is partially assessed using deep ensembles, in which the network is trained repeatedly with random initialization. Overall, this study advances understanding of how to use deep learning to infer subsurface models from indirect measurement data.

1 Introduction

Deep learning [DL; see, e.g., the textbook by 9] is currently having a profound impact on the Earth sciences [27, 32, 30]. Important advances have been made for clustering and classification tasks [e.g., 37], forward proxy modeling [39], and learning tailor-made model encodings of complex geological priors into low-dimensional latent variables for geostatistical inversion [15, 16] and simulation [22, 16, 3] purposes. In remote sensing and geophysics, significant emphasis has been placed on how to turn low-resolution images into high-resolution images using concepts of super resolution [34]. Lately, researchers in active seismics have started to approach inversion by transferring reflection data represented as images into geological images [1, 23, 35]. However, most geoscientific data do not lend themselves to a spatial representation that is visually similar to the type of final model that is sought. One example is hydrological time series (pressure, temperature, and concentration) measured at one or several locations that are only indirectly related to the underlying hydraulic conductivity field through a non-linear function. For the 2D-to-2D image transfer case, DL architectures have been proposed within the influential pix2pix (and follow-up cycleGAN) image-to-image translation framework [12, 38].
Using a deep neural network (DNN) to turn geoscientific data vectors such as time series into 2D or 3D subsurface models is a challenging task because, in contrast to the pix2pix image-to-image translation framework, there is no low-level information shared between the two considered domains. Independently of our work, Earp and Curtis [5] recently proposed a 2D-to-2D DNN for travel time tomography that does not require common low-level information to be present, which, in principle, makes it amenable to 1D-to-2D domain transfer. In this contribution, we propose a 1D-to-2D network that takes one or multiple time series, or other data represented in a data vector, and maps them into a subsurface model. This implies that we bypass conventional inversion by instead learning a mapping between 1D measurement data and a corresponding 2D subsurface model. Our presented examples focus on inferring ground-penetrating radar (GPR) velocity and hydraulic conductivity for given 2D channelized and multi-Gaussian prior models, but the potential of the approach is much wider than this. To assess uncertainty in the inverse solutions, we use the recently developed deep ensembles approach [6]. The results produced by our so-called vec2pix algorithm demonstrate the feasibility of 1D-to-2D transfer, thereby allowing for many possible applications in hydrology, geophysics, and Earth system science.

The remainder of this paper is organized as follows. Section 2 summarizes related work and how it differs from our method. Section 3 describes our proposed domain transfer network and its training, together with the considered inverse problems. Section 4 presents our domain transfer inversion results. In Section 5, we discuss our main findings and outline current limitations and possible future developments. Finally, Section 6 provides a conclusion.
2 Related Work

Inversion using image-to-image domain transfer networks has been proposed in the context of 2D seismic inversion [1, 23, 35]. In subsurface hydrology, the study by Sun [31] is a first step towards inverting steady-state groundwater flow data with image-to-image domain translation. Both Mosser et al. [23] and Sun [31] added loss functions to the cycleGAN network by Zhu et al. [38] to promote reconstruction of paired images. The works listed so far require that the two considered 2D domains share some low-level information. As written above, independently of our work, Earp and Curtis [5] proposed a 2D-to-2D transfer network for inversion of 2D travel time tomography data which does not have that requirement. In this study, we explicitly cast the problem within a vector-to-image transform framework. Our network is, however, conceptually similar to that of Earp and Curtis [5] in the sense that the 1D input processed by our network is projected into a 2D space at some point (see Section 3.2). The main differences between our work and the study by Earp and Curtis [5] are as follows. First, our work is rooted within an informative prior framework aiming at obtaining solutions with a high degree of prescribed geological realism [18]. While Earp and Curtis [5] use a completely uncorrelated prior parameter space, we consider the common case in hydrogeology and hydrogeophysics where prior information on the considered subsurface structure is available in the form of a geologically-based prior model. Compared to Earp and Curtis [5], this allows us to work with significantly higher-dimensional output model domains and, perhaps more importantly, with far fewer training samples for learning the weights and biases of our network. Indeed, we consider (at most) 20,000 training samples and model sizes of 128 × 64 and 160 × 256.
In contrast, Earp and Curtis [5] consider small 8 × 8 and 16 × 16 model domains, and use as many as 2.5 million training samples (i.e., 125 times more). This has a high impact on computational feasibility, as obtaining one training sample requires one forward model evaluation. Second, Earp and Curtis [5] focus on 2D travel time tomography only, while we also consider a rather nonlinear transient groundwater flow problem. Lastly, we use a different neural network architecture.

3 Methods

3.1 Vector-to-image transfer network

Let us denote by Y the measurement data (vector) domain, and by X the model (2D subsurface property field) domain. Our model consists of the mapping function G_YX:

G_YX : R^Y → R^X. (1)

The G_YX operator predicts the model x̃ corresponding to the measurement data vector y it is fed with: x̃ = G_YX(y). At training time, G_YX is learned using an l1 reconstruction loss:

min_{G_YX} L_rec(G_YX, y, x) = E_{x ∼ p_x} [ ‖x − G_YX(y)‖_1 ]. (2)

3.2 Implementation

As stated above, the main methodological difference between our network architecture and most of those used previously within the 2D-to-2D transform paradigm [12, 38, 23, 31] is that our code processes a 1D (input) vector to output a 2D array. Our generator network architecture is based on Zhu et al. [38] and follows the state of the art in computer vision. To make our vec2pix generator suitable to the 1D-to-2D (G_YX) domain transfer, we first project the input data vector onto an increasingly large number of lower-dimensional representations (or latent spaces, or manifolds) using a series of 1D convolutions with an increasing number of channels (or filters; see Figure 1 and Appendix A).
Then we apply a reshape operation to convert the final 1D representations into 2D representations before (i) further processing this information through a series of so-called "ResNet" residual blocks [11] and (ii) projecting the derived latent spaces into increasingly larger-dimensional representations while reducing their number, until the final 2D model is produced. This is achieved by using a combination of 2D transposed convolutions and a final 2D convolution (Figure 1). The key step of going from a 1D to a 2D domain therefore consists in a simple yet practical reshaping operation. Our generator is detailed in Appendix A. Note that projecting the input data onto an increasingly large number of low-dimensional representations allows our network to learn many different features from the input data. If not all of the different low-dimensional representations are needed to perform the mapping between the considered domains, then during training the network is expected to learn only the relevant representations.

The Adam optimization solver [14] was used for training. We used a learning rate of 0.00001 for the multi-Gaussian case and a learning rate of 0.0002 for the categorical case (see Section 3.3 for details about these two cases), and values of 0.5 and 0.999 for the β1 and β2 momentum parameters. For the multi-Gaussian case, the vec2pix model realizations were post-processed with a median filter (see Section 3.3 for details). Unless stated otherwise, the number of epochs used in training is 200 for the GPR case studies (see Section 3.3.1) and 300 for the flow case studies (see Section 3.3.2), and the batch size is 25. For every experiment, we first used 20,000 examples of x_i-y_i pairs. To study the sensitivity of our results to the training set size, we also considered training with 5,000 and 10,000 training examples for every case study (see Section 4.6).
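The shape bookkeeping of the G_1D → reshape → G_2D pipeline described above can be sketched as follows; the channel counts, input vector length, and 8 × 8 reshape target are made-up placeholders for illustration, not the actual vec2pix dimensions:

```python
import numpy as np

batch = 25
y = np.random.randn(batch, 1, 1024)   # input data vectors, 1 channel

# Stand-in for G_1D: each strided 1D convolution halves the length while
# doubling the channel count (activations here are random placeholders,
# not real convolution outputs).
h = y
for channels in (32, 64, 128, 256):
    length = h.shape[-1] // 2
    h = np.random.randn(batch, channels, length)

# h is now (25, 256, 64): many low-dimensional 1D representations.
# The reshape operation r turns them into 2D feature maps, which the
# ResNet blocks and transposed 2D convolutions of G_2D then upsample.
h2d = h.reshape(batch, 256, 8, 8)
print(h2d.shape)  # -> (25, 256, 8, 8)
```

The reshape is cheap and differentiable, which is why this single operation suffices to bridge the 1D and 2D parts of the network.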
We used an additional small validation set of 100 pairs (unseen by the training algorithm) to monitor the evolution of the loss function during training (see Figure 2). The indices of the input-output training pairs are shuffled at the beginning of every epoch to help the gradient descent escape local minima. Furthermore, to make training robust to the noise in the data, that is, to account for the data measurement error during training, each true data vector used for training was corrupted with a new Gaussian white noise realization prior to every epoch. With respect to performance evaluation, an independent test set made of 1000 examples was used to assess the performance of the proposed approach. Hence, inversion performance is assessed by evaluating how well each of those 1000 test models is recovered when the trained G_YX transformer is fed with the corresponding noise-contaminated data.

Figure 1: Simplified illustration of the vec2pix architecture. The gray signifies the input data vector (left) and output image or model (right). The 1D input data are standardized such that they have zero mean and unit variance. The output models are in [−1, 1]. The orange and violet colors denote the 1D and 2D parts of the network, respectively. The sketch is not to scale, and the depths of the convolutions are either represented by a single extra unit length in the horizontal direction (central orange rectangle and ResNet blocks) or are not represented at all (orange and violet trapezoids). More specifically, y and x̃ are the input vector and reconstructed model, respectively, G_1D denotes the series of 1D convolutions with increasing depths, r represents the 1D-to-2D reshape operation, G_2D signifies the series of 2D transposed convolutions and final 2D convolution with decreasing depths, and ResNet refers to the ensemble of "ResNet" residual blocks.
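The epoch-wise shuffling and noise-corruption strategy described above, together with the l1 reconstruction loss of Eq. (2), can be sketched as follows; array sizes are reduced placeholders and the network forward pass and Adam update are replaced by stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder training set (reduced sizes for this sketch; the paper uses
# up to 20,000 pairs and, e.g., 625 travel times per GPR data vector).
n_train, n_data, H, W = 1000, 625, 16, 8
Y_clean = rng.normal(size=(n_train, n_data))   # noise-free data vectors
X = rng.normal(size=(n_train, H, W))           # corresponding 2D models

def l1_loss(x_true, x_pred):
    """Batch-mean l1 reconstruction loss, as in Eq. (2)."""
    return np.abs(x_true - x_pred).reshape(len(x_true), -1).sum(axis=1).mean()

sigma = 0.5                    # training noise std (0.5 ns for the GPR data)
indices = np.arange(n_train)
n_epochs, batch_size = 2, 25

for epoch in range(n_epochs):
    # Fresh Gaussian white noise realization at every epoch (noise-robust training)
    Y_noisy = Y_clean + sigma * rng.normal(size=Y_clean.shape)
    # Shuffle input-output pair indices to help escape local minima
    rng.shuffle(indices)
    for start in range(0, n_train, batch_size):
        idx = indices[start:start + batch_size]
        y_batch, x_batch = Y_noisy[idx], X[idx]
        x_pred = np.zeros_like(x_batch)   # stand-in for G_YX(y_batch)
        loss = l1_loss(x_batch, x_pred)   # would drive the Adam update
```

Because a different noise realization is drawn each epoch, the network never sees the same corrupted data vector twice, which discourages it from fitting any particular noise pattern.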
Figure 2: Convergence of the (mean l1) reconstruction loss for an independent test set of 100 samples and Case study 4: transient pressure data and binary channelized domain. The Tr5k, Tr10k and Tr20k labels denote training with 5,000, 10,000 and 20,000 training samples, respectively.

3.3 Synthetic Inverse Problems

To test vec2pix, we consider both crosshole ground penetrating radar (GPR) data and transient pressure data acquired during pumping. As for prior geologic models, we consider two common cases: a 2D multi-Gaussian prior and a 2D binary channelized aquifer prior. Regarding the latter, the DeeSse (DS) MPS algorithm [20] was used to generate the training and test models from the channelized aquifer training image proposed by Zahner et al. [36]. To produce the multi-Gaussian realizations for training and test purposes, we used the circulant embedding method [4]. For the multi-Gaussian case, the vec2pix predictions were postprocessed by application of a median filter with a kernel size of either 3 (GPR case) or 5 (hydraulic case) pixels in each spatial direction. No postprocessing was applied to the vec2pix predictions for the categorical case, except for thresholding before running the forward solver.

3.3.1 Crosshole GPR data

Crosshole GPR imaging uses a transmitter antenna to emit a high-frequency electromagnetic wave at a location in one borehole and a receiver antenna to record the arriving energy at a location in another borehole. The considered synthetic measurement data are first-arrival travel times for several transmitter and receiver locations. These data contain information about the GPR velocity distribution between the boreholes.
The GPR velocity primarily depends on dielectric permittivity, which is strongly influenced by water content and, consequently, porosity in saturated media [28]. The considered model domain is of size 128 × 64 with a cell size of 0.1 m, and our setup consists of two vertical boreholes, located 6.4 m apart, at the left- and right-hand sides of the domain. Sources (left) and receivers (right) are located between 0.5 and 12.5 m depth with 0.5 m spacing (Figures 3a and 3c), leading to a total dataset y = d_GPR of 625 travel times. The nonlinear ray-based forward response is simulated with the pyGIMLi toolbox [29] using the Dijkstra method. The measurement error used to corrupt the data is zero-mean uncorrelated Gaussian with a standard deviation of 0.5 ns, which is typical for high-quality GPR field data. For the binary channelized aquifer case, the channel and background facies are assigned velocities of 0.06 m/ns and 0.08 m/ns, respectively (Figure 3c). For the multi-Gaussian case, a zero-mean anisotropic Gaussian covariance model was selected, with a variance (sill) of 0.5, integral scales in the horizontal and vertical directions of 2 m (20 pixels) and 4 m (40 pixels), respectively, and an anisotropy angle of 60°. The model realizations were then scaled into [−1, 1] using the minimum and maximum pixel values over the 20,000 training models, before the following relationship was used to convert a scaled model, x, into a velocity model: x_VEL = 0.06 + 0.02(1 − x) m/ns (Figure 3a). For illustrative purposes, the simulated data vectors corresponding to the models depicted in Figures 3a and 3c are shown in Figures 4a and 4c.

3.3.2 Transient pumping data

Our second type of data consists of transient piezometric heads induced by pumping. The 80 × 128 aquifer domain lies in the x-y plane with a grid cell size of 1 m and a thickness of 10 m.
For the binary channelized aquifer case, channel and matrix materials (see Figure 3d) are assigned hydraulic conductivity values, K_s, of 1 × 10⁻³ m/s and 1 × 10⁻⁵ m/s, respectively. For the multi-Gaussian case, the same geostatistical parameters as for the GPR setup are used for log10(K_s), except that the mean is −3 and the variance 0.1. The assumed specific storage and specific yield of the aquifer are 0.0003 m⁻¹ and 0.3 (-), respectively. The MODFLOW-NWT code [25] is used to simulate unconfined transient groundwater flow with no-flow boundaries at the upper and lower sides and a lateral head gradient of 0.01 (-) with water flowing in the x-direction. Four wells sequentially extract water, each for 20 days, at a rate of 0.001 m³/s (red dots in Figures 3b and 3d). The measurement data were formed by acquiring daily simulated heads in the four pumping wells (red dots in Figures 3b and 3d) and nine measuring wells (white crosses in Figures 3b and 3d) during the 80-day simulation period. The synthetic measurement data y = d_Flow comprise 1040 heads. The standard deviation of the Gaussian white noise used to contaminate these data is set to 0.01 m. Figures 4b and 4d display the concatenated data vectors corresponding to the models depicted in Figures 3b and 3d.

Figure 3: Four considered synthetic case studies. The GPR tomography cases with (a) multi-Gaussian and (c) channelized bimodal subsurface structures, with red triangles and orange squares representing the GPR source and receiver positions, respectively. The transient pumping cases with (b) multi-Gaussian and (d) channelized bimodal subsurface structures, with red dots and white crosses representing the pumping/observation and observation-only wells, respectively. The models displayed in subfigures (a-d) are randomly chosen from the 20,000 training models considered for each of the four examples.
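As a small illustration of how the measurement vector d_Flow of Section 3.3.2 is assembled, daily heads at all monitored wells can be concatenated and noise-corrupted as below; the head values are placeholders, not MODFLOW-NWT output:

```python
import numpy as np

rng = np.random.default_rng(1)

# Daily heads at 13 wells (4 pumping + 9 observation-only) over the
# 80-day simulation period (placeholder values, not a real simulation).
n_wells, n_days = 13, 80
heads = rng.normal(loc=10.0, scale=0.1, size=(n_wells, n_days))

# Concatenate the per-well time series into one data vector ...
d_flow = heads.reshape(-1)
# ... and corrupt it with Gaussian white noise of standard deviation 0.01 m.
d_noisy = d_flow + 0.01 * rng.normal(size=d_flow.size)
print(d_flow.size)  # -> 1040, matching the 1040 heads of y = d_Flow
```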
Figure 4: Simulated data corresponding to the training models depicted in Figure 3.

3.4 Uncertainty Quantification

Most high-dimensional inverse problems are under-determined, implying that the mapping from noise-contaminated data to a model is non-unique. For this reason, it is important to explore different mappings such that uncertainty in the inverse solution can be assessed. Recent work on understanding the loss landscape of high-dimensional deep networks [7, 6] has shown that building deep ensembles by training the network multiple times using random initialization of weights and biases works well empirically, and is currently the most viable strategy for exploring multi-modal landscapes. In particular, it has been found to outperform Monte Carlo dropout and various subspace sampling strategies, which often drastically underestimate uncertainty [6]. Here we adopt such an ensemble framework to investigate predictive uncertainty, using a small ensemble of five trained models. That said, we stress that uncertainty quantification in the context of deep learning is still largely an unresolved problem.

4 Results

For each of the four considered case studies, we investigate the performance of G_YX based on the independent i = 1, ..., 1000 test pairs of model, x_i, and data, y_i. For each test model, x_i, the root-mean-square error (RMSE) between the associated data y_i and the j = 1, ..., 20000 training data vectors y_j is computed, and the minimum RMSE over the resulting 20,000 values is retained as the smallest distance in data space between the considered test model and the training set. On this basis, we specifically compare the true and predicted model for cases where: 1. The true model is taken as the most different test model from the set of training models in the data space. 2.
The true model is taken as the second most different test model from the set of training models in the data space. 3. A representative model of the test set is chosen, and this procedure is repeated four times. Cases 1 and 2 serve to highlight the capacity of vec2pix to generalize for cases that are distinctively different from the training data. This leads to six cases in which differences between true models and those predicted by vec2pix are scrutinized. For each case we perform two predictions (#1 and #2) based on two different noise realizations used to corrupt the true measurement data. This is done to assess the impact of the measurement data noise realization on prediction accuracy, which should be limited because of our robust training strategy. Thus, the smaller the differences between these two model predictions, the better. These two predictions are not to be confused with our main uncertainty quantification estimates which, as stated earlier, are based on deep ensembles (see Section 4.5). Furthermore, the complete distribution of 1000 RMSEs between the test data and the data simulated by feeding the forward solver with the models predicted by G_YX for these test data is also considered. In addition, two similarity indices between true and generated vec2pix models are computed for the 1000 test examples: the l1-norm, and the widely used structural similarity index (SSIM) [33]:

SSIM(u, v) = [(2 μ_u μ_v + c1)(2 σ_uv + c2)] / [(μ_u² + μ_v² + c1)(σ_u² + σ_v² + c2)], (3)

where u and v denote two N_p × N_p windows subsampled from x and x̃, respectively, μ and σ² are the means and variances of u and v, σ_uv represents the covariance between u and v, and c1 = 0.01 and c2 = 0.03 are two small constants [33]. Averaged over all u and v sliding windows, the mean SSIM ranges from −1 to 1, with 1 implying that the two compared images are identical. Similarly to Sun [31] and Earp and Curtis [5], we set N_p = 7.
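Eq. (3) translates directly into code for a single pair of windows; this sketch omits the averaging over all N_p × N_p sliding windows that yields the reported mean SSIM:

```python
import numpy as np

def ssim_window(u, v, c1=0.01, c2=0.03):
    """SSIM between two N_p x N_p windows, following Eq. (3)."""
    mu_u, mu_v = u.mean(), v.mean()
    var_u, var_v = u.var(), v.var()
    cov_uv = ((u - mu_u) * (v - mu_v)).mean()
    num = (2 * mu_u * mu_v + c1) * (2 * cov_uv + c2)
    den = (mu_u**2 + mu_v**2 + c1) * (var_u + var_v + c2)
    return num / den

w = np.arange(49, dtype=float).reshape(7, 7) / 49.0  # an N_p = 7 window
print(ssim_window(w, w))        # identical windows give SSIM = 1
print(ssim_window(w, 1.0 - w))  # anti-correlated windows score much lower
```

The constants c1 and c2 follow the values stated above; the standard SSIM formulation additionally scales them by the dynamic range of the images, which is not needed here since the models live in a fixed rescaled domain.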
4.1 Case study 1: crosshole GPR data and multi-Gaussian model domain

The vec2pix results for the GPR first-arrival travel time tomography within a multi-Gaussian domain are presented in Figure 5 and Table 1 for the six selected true models, while Table 2 lists the corresponding performance statistics for the 1000 test examples. The produced vec2pix models always induce a lower data misfit and are more similar to the true test models than the corresponding closest training models in data space (Tables 1 and 2). For instance, the vec2pix models display a two times smaller l1-norm than the closest training models in data space (Table 1). Also, the SSIMs of the vec2pix models are 10% to 30% larger than those of the closest training models in data space (Tables 1 and 2). Using 20,000 training examples, it is consequently a better option to train and use vec2pix to invert the "measurement" data than to simply pick the training model with the best corresponding data fit. This shows that vec2pix can generalize. The data RMSEs of the forward-simulated vec2pix realizations are globally in the 0.5 ns - 0.9 ns range with a median of 0.58 ns (Table 2), which is reasonably close to the "true" noise level of 0.5 ns used to contaminate the data ("true" measurement error). Overall, as compared to using the closest training model, vec2pix allows for a reduction in data RMSE by a factor of two for the considered multi-Gaussian problem (Table 1). However, the vec2pix models are a bit too smooth.

Figure 5: Results for crosshole GPR data in a multi-Gaussian domain. (a-d) The true model is the most different test model from the 20,000 training models in data space: (a) true model, (b) closest training model in data space, (c, d) predicted models from two different noise realizations (Pred #1 and Pred #2).
The same plotting style is adopted for cases (e-h) where the true model is the second most different test model in data space, and (i-l), (m-p), (q-t) and (u-x) for four representative test models. Table 1 lists the prediction quality statistics associated with the models displayed in the (a-x) subfigures.

Table 1: Statistics of the results obtained for crosshole GPR data in a multi-Gaussian domain. The letters (a-x) refer to the models in Figure 5. RMSE_data denotes the RMSE in data space, l1 refers to the l1-norm and SSIM to the structural similarity index. The l1-norm is calculated in terms of velocity (m/ns) while the SSIM is computed in the rescaled [0, 1] domain. "Closest TR" means the closest training model in data space, and "Predicted #1" and "Predicted #2" signify predicted models from two different noise realizations.

Model              RMSE_data (ns)   l1 (m/ns)   SSIM (-)
(a) True                0.5             0          1
(b) Closest TR          1.73           18.92       0.72
(c) Predicted #1        0.72            8.07       0.92
(d) Predicted #2        0.73            8.10       0.92
(e) True                0.5             0          1
(f) Closest TR          1.44           15.35       0.77
(g) Predicted #1        0.63            6.26       0.93
(h) Predicted #2        0.68            6.60       0.92
(i) True                0.5             0          1
(j) Closest TR          0.98           13.31       0.79
(k) Predicted #1        0.56            6.26       0.93
(l) Predicted #2        0.59            5.89       0.94
(m) True                0.5             0          1
(n) Closest TR          1.01           12.37       0.82
(o) Predicted #1        0.60            7.20       0.93
(p) Predicted #2        0.63            7.23       0.92
(q) True                0.5             0          1
(r) Closest TR          1.16           13.17       0.78
(s) Predicted #1        0.58            5.89       0.93
(t) Predicted #2        0.61            6.65       0.92
(u) True                0.5             0          1
(v) Closest TR          1.05           12.65       0.81
(w) Predicted #1        0.63            7.83       0.90
(x) Predicted #2        0.67            7.55       0.91

Table 2: Statistics for the case of crosshole GPR data in a multi-Gaussian domain when considering the 1000 independent test models.
The comparison is made between the predicted model (Predicted #1) and the closest training model in data space (Closest TR) within the training set of 20,000 examples, using the RMSE in data space (RMSE_data), the l1-norm (l1) and the structural similarity index (SSIM). For each metric, the minimum (Min), median (Median) and maximum (Max) values are reported together with the 10th, 25th, 75th and 90th percentiles (P10, P25, P75, P90). The "TR size" variable signifies the number of training examples used to train vec2pix.

Model       TR size   Min    P10    P25    Median   P75    P90    Max
RMSE_data (ns)
Closest TR   20,000   0.77   0.92   0.97   1.03     1.10   1.16   1.73
Predicted    20,000   0.50   0.55   0.56   0.58     0.61   0.63   0.92
Predicted    10,000   0.53   0.59   0.61   0.64     0.68   0.73   1.12
Predicted     5,000   0.56   0.63   0.67   0.81     1.08   1.39   2.24
l1 (m/ns)
Closest TR   20,000   6.92  10.57  11.41  12.36    13.56  14.56  19.27
Predicted    20,000   3.68   5.19   5.80   6.50     7.35   8.28  12.16
Predicted    10,000   4.29   5.77   6.47   7.24     8.16   9.05  13.48
Predicted     5,000   4.20   6.48   7.11   7.97     8.83  10.00  13.14
SSIM (-)
Closest TR   20,000   0.66   0.75   0.78   0.80     0.83   0.84   0.91
Predicted    20,000   0.87   0.91   0.92   0.93     0.94   0.95   0.96
Predicted    10,000   0.84   0.89   0.91   0.92     0.93   0.94   0.96
Predicted     5,000   0.84   0.88   0.89   0.91     0.92   0.93   0.95

4.2 Case study 2: crosshole GPR data and binary channelized domain

Results for travel time tomography within a binary channelized domain are displayed in Figure 6, Table 3 and Table 4. These results are in line with those obtained for the multi-Gaussian case: the predicted test models show lower data RMSE, lower l1 and larger SSIM statistics than the closest training models in data space. Also, the predicted test models look visually close to their true counterparts.
With a median data RMSE of 0.87 ns and minimum and maximum data RMSEs of 0.58 ns and 1.98 ns (Table 4), the predicted test models induce data RMSE values that are significantly larger than the "true" noise level of 0.5 ns. Nevertheless, vec2pix permits a reduction in data RMSE by a factor of two to three compared to the closest corresponding training model (Table 3). The associated SSIM indices are smaller than for the multi-Gaussian case: the P10 and median SSIM values are 0.72 and 0.81 (Table 4), against 0.91 and 0.93 for the multi-Gaussian case (Table 2). Globally, despite leading to larger data RMSEs compared to the prescribed noise level of 0.5 ns, the vec2pix models are similar to the true ones (Figure 6 and Table 3).

Figure 6: Results for crosshole GPR data in a binary channelized domain. (a-d) The true model is the most different test model from the 20,000 training models in data space: (a) true model, (b) closest training model in data space, (c, d) predicted models from two different noise realizations (Pred #1 and Pred #2). The same plotting style is adopted for cases (e-h) where the true model is the second most different test model in data space, and (i-l), (m-p), (q-t) and (u-x) for four representative test models. Table 3 lists the prediction quality statistics associated with the models displayed in the (a-x) subfigures.

Table 3: Statistics of the results obtained for crosshole GPR data in a binary channelized domain. The naming convention is the same as in Table 1, and the letters (a-x) refer to the models in Figure 6. RMSE_data denotes the RMSE in data space, l1 refers to the l1-norm and SSIM to the structural similarity index. The l1-norm is calculated in terms of velocity (m/ns) while the SSIM is computed in the rescaled [0, 1] domain. "Closest TR" means the closest training model in data space, and "Predicted #1" and "Predicted #2" signify predicted models from two different noise realizations.
Model              RMSE_data (ns)   l1 (m/ns)   SSIM (-)
(a) True                0.5             0          1
(b) Closest TR          3.50           45.36       0.40
(c) Predicted #1        1.05           11.04       0.71
(d) Predicted #2        1.06           10.98       0.72
(e) True                0.5             0          1
(f) Closest TR          3.06           42.42       0.41
(g) Predicted #1        1.31           14.42       0.67
(h) Predicted #2        1.26           15.20       0.66
(i) True                0.5             0          1
(j) Closest TR          2.46           49.28       0.39
(k) Predicted #1        1.24            9.90       0.78
(l) Predicted #2        0.90           10.50       0.77
(m) True                0.5             0          1
(n) Closest TR          2.14           20.06       0.60
(o) Predicted #1        0.99           10.70       0.74
(p) Predicted #2        0.98            9.46       0.77
(q) True                0.5             0          1
(r) Closest TR          1.20           14.92       0.70
(s) Predicted #1        0.66            4.90       0.87
(t) Predicted #2        0.61            4.52       0.88
(u) True                0.5             0          1
(v) Closest TR          2.54           24.66       0.64
(w) Predicted #1        0.96            5.24       0.86
(x) Predicted #2        0.90            5.88       0.84

Table 4: Statistics for the case of crosshole GPR data in a binary channelized domain when considering the 1000 independent test models. The comparison is made between the predicted model (Predicted #1) and the closest training model in data space (Closest TR) within the training set of 20,000 examples, using the RMSE in data space (RMSE_data), the l1-norm (l1) and the structural similarity index (SSIM). For each metric, the minimum (Min), median (Median) and maximum (Max) values are reported together with the 10th, 25th, 75th and 90th percentiles (P10, P25, P75, P90). The "TR size" variable signifies the number of training examples used to train vec2pix.
Metric           Model        TR size    Min     P10     P25    Median    P75     P90     Max
RMSE data (ns)   Closest TR   20,000     0.77    1.31    1.53    1.78     2.09    2.35    3.50
                 Predicted    20,000     0.58    0.70    0.77    0.87     1.00    1.13    1.98
                 Predicted    10,000     0.56    0.79    0.89    1.02     1.17    1.40    2.36
                 Predicted     5,000     0.58    0.83    0.94    1.11     1.32    1.57    2.59
l1 (m/ns)        Closest TR   20,000     4.82   13.88   17.59   22.42    28.97   35.29   56.74
                 Predicted    20,000     2.06    4.46    6.04    7.76    10.03   12.48   24.88
                 Predicted    10,000     3.06    5.80    7.48    9.70    12.53   15.16   25.20
                 Predicted     5,000     2.78    6.18    8.36   10.89    14.60   17.46   27.32
SSIM (-)         Closest TR   20,000     0.32    0.50    0.56    0.63     0.70    0.75    0.90
                 Predicted    20,000     0.59    0.72    0.77    0.81     0.85    0.88    0.94
                 Predicted    10,000     0.57    0.69    0.73    0.78     0.82    0.86    0.92
                 Predicted     5,000     0.54    0.66    0.71    0.77     0.81    0.85    0.93

4.3 Case study 3: transient pressure data and multi-Gaussian domain

For the transient pumping experiment within a multi-Gaussian domain, the vec2pix models are visually close to the true ones (Figure 7), even if they appear slightly too smooth. The RMSEs in data space produced by the vec2pix models are overall similar to those produced by the closest training models (Tables 5 and 6), and are mostly distributed in the 0.02 m - 0.03 m range, which is to be compared with the "true" noise level of 0.01 m. However, the model reconstruction statistics, l1-norm and SSIM, are substantially better for the vec2pix models than for the closest training models in data space (Tables 5 and 6). Indeed, the vec2pix models display 40% to 60% smaller l1-norms and 15% to 25% larger SSIMs.

Table 5: Statistics of the results obtained for transient subsurface pressure data in a multi-Gaussian domain. The naming convention is the same as before and the letters (a-x) refer to the models in Figure 7. RMSE data denotes the RMSE in data space, l1 refers to the l1-norm and SSIM to the structural similarity index. The l1-norm is calculated in terms of log10 Ks (-) while the SSIM is computed in the rescaled [0, 1] domain.
Closest TR means the closest training model in data space, and Predicted #1 and Predicted #2 signify predicted models from two different noise realizations.

Model             RMSE data (m)   l1 (-)   SSIM (-)
(a) True              0.010          0        1
(b) Closest TR        0.060       2807        0.79
(c) Predicted #1      0.023       1837        0.90
(d) Predicted #2      0.021       1788        0.90
(e) True              0.010          0        1
(f) Closest TR        0.049       2886        0.77
(g) Predicted #1      0.054       1559        0.92
(h) Predicted #2      0.041       1629        0.91
(i) True              0.010          0        1
(j) Closest TR        0.027       2848        0.75
(k) Predicted #1      0.018       1612        0.93
(l) Predicted #2      0.021       1503        0.93
(m) True              0.010          0        1
(n) Closest TR        0.022       3575        0.76
(o) Predicted #1      0.017       1913        0.89
(p) Predicted #2      0.023       1820        0.90
(q) True              0.010          0        1
(r) Closest TR        0.024       3122        0.74
(s) Predicted #1      0.017       1954        0.88
(t) Predicted #2      0.019       2048        0.88
(u) True              0.010          0        1
(v) Closest TR        0.020       3644        0.76
(w) Predicted #1      0.016       1524        0.91
(x) Predicted #2      0.013       1442        0.91

Table 6: Statistics for the case of transient subsurface pressure data in a multi-Gaussian domain when considering the 1000 independent test models. The comparison is made between the predicted model (Predicted #1) and the closest training model in data space (Closest TR) within the training set of 20,000 examples, using the RMSE in data space (RMSE data), the l1-norm (l1) and the structural similarity index (SSIM). For each metric, the minimum (Min), median (Median) and maximum (Max) values are reported together with the 10th, 25th, 75th and 90th percentiles (P10, P25, P75, P90). The TR size variable signifies the number of training examples used to train vec2pix.
Metric          Model        TR size    Min     P10     P25    Median    P75     P90     Max
RMSE data (m)   Closest TR   20,000    0.015   0.019   0.021   0.023    0.026   0.029   0.061
                Predicted    20,000    0.011   0.016   0.019   0.024    0.031   0.039   0.070
                Predicted    10,000    0.011   0.017   0.020   0.025    0.034   0.046   0.091
                Predicted     5,000    0.013   0.021   0.026   0.033    0.043   0.055   0.096
l1 (-)          Closest TR   20,000    1819    2408    2616    2848     3088    3321    3984
                Predicted    20,000    1273    1527    1623    1753     1894    2035    2408
                Predicted    10,000    1176    1586    1703    1832     1983    2130    2597
                Predicted     5,000    1203    1600    1734    1929     2108    2294    3055
SSIM (-)        Closest TR   20,000    0.58    0.71    0.74    0.76     0.79    0.81    0.86
                Predicted    20,000    0.81    0.86    0.88    0.89     0.90    0.92    0.94
                Predicted    10,000    0.80    0.86    0.87    0.88     0.90    0.91    0.93
                Predicted     5,000    0.80    0.85    0.86    0.88     0.89    0.90    0.93

Figure 7: Results for transient subsurface pressure data in a multi-Gaussian domain. (a-d) The true model is the most different test model from the 20,000 training models in data space: (a) true model, (b) closest training model in data space, (c, d) predicted models from two different noise realizations (Pred #1 and Pred #2). The same plotting style is used for cases (e-h), where the true model is the second most different test model in data space, and for (i-l), (m-p), (q-t) and (u-x), which show four representative test models. Table 5 lists the prediction quality statistics associated with the models displayed in the (a-x) subfigures.

4.4 Case study 4: transient pressure data and binary channelized domain

The hydraulic case with a binary channelized domain is by far the most challenging, as the relationship between a binary channelized model and the resulting simulated transient flow data is highly nonlinear.
As a consequence, across the 20,000 training models, the signal-to-noise ratio (SNR), defined as the ratio of the average data RMSE obtained by drawing prior realizations from the training image by MPS simulation to the noise level, lies in the 60 - 100 range. The vec2pix models are in better visual agreement with the true model than the closest training models in data space (Figure 8). This is confirmed by two to three times smaller l1-norms and 10% to 220% larger SSIM indices (Tables 7 and 8). Even if vec2pix produces models that are of much better quality than the closest training models in data space, the resulting RMSEs in data space are often not better than those produced by the closest training models. This is because a change of facies in the surroundings of a pumping well (red dots in Figure 3d) can dramatically affect the corresponding simulated data.

Table 7: Statistics of the results obtained for transient subsurface pressure data in a binary channelized domain. The naming convention is the same as before and the letters (a-x) refer to the models in Figure 8. RMSE data denotes the RMSE in data space, l1 refers to the l1-norm and SSIM to the structural similarity index. The l1-norm is calculated in terms of log10 Ks (-) while the SSIM is computed in the rescaled [0, 1] domain. Closest TR means the closest training model in data space, and Predicted #1 and Predicted #2 signify predicted models from two different noise realizations.
Model             RMSE data (m)   l1 (-)   SSIM (-)
(a) True              0.010          0        1
(b) Closest TR        0.385       9908        0.21
(c) Predicted #1      0.589       2504        0.67
(d) Predicted #2      0.590       2594        0.67
(e) True              0.010          0        1
(f) Closest TR        0.281       8180        0.28
(g) Predicted #1      0.363       2536        0.64
(h) Predicted #2      0.215       2246        0.65
(i) True              0.010          0        1
(j) Closest TR        0.052       8028        0.32
(k) Predicted #1      0.229       3068        0.64
(l) Predicted #2      0.067       2950        0.65
(m) True              0.010          0        1
(n) Closest TR        0.037       2754        0.71
(o) Predicted #1      0.021       1338        0.81
(p) Predicted #2      0.033       1420        0.80
(q) True              0.010          0        1
(r) Closest TR        0.027       3398        0.68
(s) Predicted #1      0.014       1638        0.80
(t) Predicted #2      0.014       1624        0.80
(u) True              0.010          0        1
(v) Closest TR        0.067       9774        0.20
(w) Predicted #1      0.379       3418        0.60
(x) Predicted #2      0.416       3334        0.60

Table 8: Statistics for the case of transient subsurface pressure data in a binary channelized domain when considering the 1000 independent test models. The comparison is made between the predicted model (Predicted #1) and the closest training model in data space (Closest TR) within the training set of 20,000 examples, using the RMSE in data space (RMSE data), the l1-norm (l1) and the structural similarity index (SSIM). For each metric, the minimum (Min), median (Median) and maximum (Max) values are reported together with the 10th, 25th, 75th and 90th percentiles (P10, P25, P75, P90). The TR size variable signifies the number of training examples used to train vec2pix.
Metric          Model        TR size    Min     P10     P25    Median    P75     P90     Max
RMSE data (m)   Closest TR   20,000    0.011   0.024   0.030   0.043    0.064   0.092   0.385
                Predicted    20,000    0.011   0.016   0.020   0.031    0.083   0.248   0.713
                Predicted    10,000    0.011   0.018   0.023   0.039    0.105   0.276   8.981
                Predicted     5,000    0.011   0.020   0.028   0.051    0.128   0.308   8.981
l1 (-)          Closest TR   20,000     548    2335    3251    4359     5665    7049   11160
                Predicted    20,000     316    1064    1436    1990     2559    3222    6634
                Predicted    10,000     348    1174    1608    2154     2838    3504    6692
                Predicted     5,000     456    1316    1796    2415     3171    4015    6750
SSIM (-)        Closest TR   20,000    0.11    0.37    0.47    0.55     0.64    0.72    0.89
                Predicted    20,000    0.36    0.62    0.67    0.73     0.79    0.84    0.93
                Predicted    10,000    0.34    0.59    0.66    0.71     0.77    0.82    0.93
                Predicted     5,000    0.34    0.56    0.62    0.70     0.76    0.81    0.91

Figure 8: Results for transient subsurface pressure data in a binary channelized domain. (a-d) The true model is the most different test model from the 20,000 training models in data space: (a) true model, (b) closest training model in data space, (c, d) predicted models from two different noise realizations (Pred #1 and Pred #2). The same plotting style is used for cases (e-h), where the true model is the second most different test model in data space, and for (i-l), (m-p), (q-t) and (u-x), which show four representative test models. Table 7 lists the prediction quality statistics associated with the models displayed in the (a-x) subfigures.

4.5 Predictive Uncertainty

To assess predictive uncertainty using deep ensembles [7], we trained the networks five times using random initialization and present the results obtained for the multi-Gaussian (Figure 9 and Table 9) and binary channelized hydraulic cases (Figure 10 and Table 10). These estimates should only be considered qualitatively, as they are based on a small ensemble of five members and the resulting errors only reflect errors induced by training the neural networks.
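The ensemble mean and standard-deviation maps used in this analysis reduce to pixel-wise statistics over the member predictions. The following is a minimal numpy sketch; the function name `ensemble_stats` and the toy five-member ensemble are ours, for illustration only:

```python
import numpy as np

def ensemble_stats(predictions):
    """Pixel-wise mean and standard deviation over an ensemble of predicted
    property fields; `predictions` has shape (n_members, n_rows, n_cols),
    one member per independently (randomly) initialized training run."""
    p = np.asarray(predictions, dtype=float)
    return p.mean(axis=0), p.std(axis=0)

# Toy 5-member ensemble of 2 x 2 fields (values are made up):
members = [np.full((2, 2), v) for v in (0.8, 1.0, 1.2, 0.9, 1.1)]
mean_map, std_map = ensemble_stats(members)
```

The standard-deviation map plays the role of the STD uncertainty maps shown in Figures 9 and 10.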
For the multi-Gaussian case, for instance, it is difficult to see a clear pattern, except for an expected general tendency of larger uncertainties away from the measurement points. For the categorical case, the predictive uncertainty is, as expected, largest at boundaries between the two categories [c.f., Figure 3 in 36], with the thickness of the uncertainty bands being largest at the sides of the domain, that is, furthest away from the measurement points. Moreover, all members of the considered ensembles capture the same main spatial patterns (Figures 9 and 10) and show comparable performance (Tables 9 and 10).

Table 9: Statistics of the ensemble-based uncertainty quantification results obtained for transient subsurface pressure data in a multi-Gaussian domain. The True and Mean models are those depicted in Figure 9. The mean model is obtained from the deep ensemble models #1 to #5. RMSE data denotes the RMSE in data space, l1 refers to the l1-norm and SSIM to the structural similarity index. The l1-norm is calculated in terms of log10 Ks (-) while the SSIM is computed in the rescaled [0, 1] domain.

Model          RMSE data (m)   l1 (-)   SSIM (-)
(a) True           0.010          0        1
(b) Mean           0.032       1858        0.71
(d) Model #1       0.023       1837        0.73
(e) Model #2       0.040       1965        0.68
(f) Model #3       0.025       1949        0.67
(g) Model #4       0.032       1876        0.73
(h) Model #5       0.028       2073        0.67
(i) True           0.010          0        1
(j) Mean           0.068       1597        0.73
(l) Model #1       0.054       1559        0.76
(m) Model #2       0.046       1779        0.68
(n) Model #3       0.055       1513        0.72
(o) Model #4       0.059       1660        0.71
(p) Model #5       0.061       1922        0.64
(q) True           0.010          0        1
(r) Mean           0.016       1571        0.77
(t) Model #1       0.016       1525        0.75
(u) Model #2       0.020       1572        0.75
(v) Model #3       0.015       1795        0.72
(w) Model #4       0.018       1492        0.77
(x) Model #5       0.016       1752        0.74

Table 10: Statistics of the ensemble-based uncertainty quantification results obtained for transient subsurface pressure data in a binary channelized domain.
The True and Mean models are those depicted in Figure 10. The mean model is obtained from the deep ensemble models #1 to #5. RMSE data denotes the RMSE in data space, l1 refers to the l1-norm and SSIM to the structural similarity index. The l1-norm is calculated in terms of log10 Ks (-) while the SSIM is computed in the rescaled [0, 1] domain.

Model          RMSE data (m)   l1 (-)   SSIM (-)
(a) True           0.010          0        1
(b) Mean           0.577       2492        0.67
(d) Model #1       0.590       2594        0.67
(e) Model #2       0.095       2908        0.66
(f) Model #3       0.267       2254        0.71
(g) Model #4       0.523       2912        0.64
(h) Model #5       0.523       1782        0.73
(i) True           0.010          0        1
(j) Mean           0.449       2321        0.62
(l) Model #1       0.216       2246        0.65
(m) Model #2       0.426       2558        0.63
(n) Model #3       0.183       2486        0.63
(o) Model #4       0.187       1988        0.67
(p) Model #5       0.233       2356        0.63
(q) True           0.010          0        1
(r) Mean           0.377       3046        0.60
(t) Model #1       0.416       3334        0.60
(u) Model #2       0.252       3422        0.58
(v) Model #3       0.271       3226        0.59
(w) Model #4       0.049       2520        0.64
(x) Model #5       0.397       2728        0.63

Figure 9: Predictive uncertainty of inverse mapping using deep ensembles associated with the previously selected six true models for the case with transient subsurface pressure data and a multi-Gaussian domain. The True label indicates the true model, while the Mean and STD labels denote the mean predicted model and its associated standard deviation map, calculated over an ensemble of five members. Table 9 lists the prediction quality statistics associated with the mean model and the five ensemble models.

Figure 10: Predictive uncertainty of inverse mapping using deep ensembles associated with the previously selected six true models for the case with transient subsurface pressure data and a binary channelized domain. The True label indicates the true model, while the Mean and STD labels denote the mean predicted model and its associated standard deviation map, calculated over an ensemble of five members.
Table 10 lists the prediction quality statistics associated with the mean model and the five ensemble models.

4.6 Effect of the Size of the Training Set

The experiments described in sections 4.1 to 4.4 were repeated using 5000 and 10,000 training examples to train vec2pix. The resulting performance statistics for the same true models as those considered before are reported in Tables 2, 4, 6 and 8. Compared to using a training set size of 20,000, reducing the size of the training set degrades performance only moderately. For instance, it remains a better strategy to train vec2pix with 5000 training examples than to select the closest model in data space among 20,000 examples. Figures 11 and 12 depict models produced by vec2pix when trained with 5000 training examples for the case of a binary channelized model domain. Even if the results are not as good as the corresponding results in Figures 6 and 8 for 20,000 training examples, we still find that the produced models are visually close to the true models. Typical evolutions of the l1 loss during training are depicted in Figure 2 for the three training set sizes and the case of transient flow within a binary channelized model domain (Case study 4, section 4.4). For every training set size, the major l1 reduction occurs within the first 50 - 100 training epochs.

Figure 11: Results for crosshole GPR data in a binary channelized domain when vec2pix is trained with 5000 training examples only. (a-d) The true model is the most different test model from the set of 20,000 training models in data space: (a) true model, (b) closest training model in data space, (c, d) predicted models from two different noise realizations (Pred #1 and Pred #2).
The same plotting style is used for cases (e-h), where the true model is the second most different test model in data space, and for (i-l), (m-p), (q-t) and (u-x), which show four representative test models.

Figure 12: Results for transient subsurface pressure data in a binary channelized domain when vec2pix is trained with 5000 training examples only. (a-d) The true model is the most different test model from the set of 20,000 training models in data space: (a) true model, (b) closest training model in data space, (c, d) predicted models from two different noise realizations (Pred #1 and Pred #2). The same plotting style is used for cases (e-h), where the true model is the second most different test model in data space, and for (i-l), (m-p), (q-t) and (u-x), which show four representative test models.

4.7 Pure Regression Versus Adversarial Learning

Previous geoscientific applications of deep domain transfer based on the pix2pix - cycleGAN framework relied on adversarial training [8] of G_YX. We did so herein too and, as described below, noticed no real added value of using adversarial learning. We tested the Wasserstein generative adversarial network [WGAN, 2] training framework, in which the Wasserstein distance between the probability distributions of the true and generated data is minimized. Furthermore, we used the Wasserstein GAN with gradient penalty [WGANGP, 10], which has been shown to further stabilize training compared to the WGAN. The full model now consists of the mapping function G_YX together with an associated critic function D_X:

$$G_{YX}: \mathbb{R}^{Y} \rightarrow \mathbb{R}^{X}, \qquad D_X: \mathbb{R}^{X} \rightarrow [0, 1]. \quad (4)$$

The critic or discriminator, D_X, tries to distinguish between the true and predicted models, x and x̃. For this experiment we used a fully convolutional D_X, such as in Laloy et al. [16].
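The gradient-penalty term of WGANGP, introduced below, is evaluated at random interpolates between real and generated models. A minimal numpy sketch of drawing such interpolates follows; the name `gp_interpolates` is ours, and the full penalty additionally requires automatic differentiation of D_X in a deep learning framework:

```python
import numpy as np

def gp_interpolates(x_real, x_fake, rng):
    """Points drawn uniformly on straight lines between real and generated
    models; the WGAN-GP penalty pushes the norm of the critic's gradient
    towards 1 at these points."""
    eps = rng.uniform(size=(x_real.shape[0], 1, 1))  # one draw per sample
    return eps * x_real + (1.0 - eps) * x_fake

# Toy batch of three 4 x 4 "models":
x_hat = gp_interpolates(np.zeros((3, 4, 4)), np.ones((3, 4, 4)),
                        np.random.default_rng(0))
```

Each interpolate stays inside the convex hull of its real/generated pair, which is what makes the gradient constraint meaningful between the two distributions.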
At training time, G_YX and D_X are jointly learned using the sum of two losses: an adversarial loss and a reconstruction loss. The motivation for using an adversarial loss is to ensure that a realistic-looking x̃ model is predicted for any given y vector, while, as for pure regression training, the reconstruction loss is required to enforce that each x̃ is in close agreement with its corresponding x. The WGANGP objective function is given by

$$\mathcal{L}_{\mathrm{WGAN}}(G_{YX}, D_X, \mathbf{y}, \mathbf{x}) = \mathbb{E}_{\mathbf{y} \sim p_{\mathbf{y}}}\left[D_X\left(G_{YX}(\mathbf{y})\right)\right] - \mathbb{E}_{\mathbf{x} \sim p_{\mathbf{x}}}\left[D_X(\mathbf{x})\right] + \lambda_{\mathrm{GP}}\, \mathbb{E}_{\hat{\mathbf{x}} \sim p_{\hat{\mathbf{x}}}}\left[\left(\left\|\nabla_{\hat{\mathbf{x}}} D_X(\hat{\mathbf{x}})\right\|_2 - 1\right)^2\right], \quad (5)$$

where p_x̂ is obtained by sampling uniformly along straight lines between pairs of points drawn from the data distribution, p_x, and the generator distribution, p_x̃. This means that the x̂ models are interpolations between the real, x, and generated, x̃, models. The penalty coefficient, λ_GP, is set to 10 [10]. Combining equations (2) and (5), the full objective function for training vec2pix becomes

$$\min_{G_{YX}} \max_{D_X} \left\{ \mathcal{L}_{\mathrm{WGAN}}(G_{YX}, D_X, \mathbf{y}, \mathbf{x}) + \lambda_{\mathrm{rec}} \mathcal{L}_{\mathrm{rec}}(G_{YX}, \mathbf{y}, \mathbf{x}) \right\}, \quad (6)$$

where λ_rec determines the relative importance of each objective. Extensive testing revealed that λ_rec needs to be set to a rather large value, λ_rec ≥ 10^5, to obtain the most accurate reconstructions. Given the actual values taken by L_WGAN(G_YX, D_X, y, x) and λ_rec L_rec(G_YX, y, x), this means that the adversarial loss has virtually no influence on the total loss function, and therefore, adversarial training is not needed for the problems considered herein.

5 Discussion

We have introduced vec2pix, a deep neural network for predicting 2D subsurface property fields from one-dimensional measurement data (e.g., time series). Our approach is illustrated using (1) synthetic crosshole first-arrival GPR travel times for recovering a 2D velocity field, and (2) time series of transient hydraulic heads to infer a 2D hydraulic conductivity field.
For each problem, both a multi-Gaussian and a binary channelized subsurface domain with long-range connectivity are considered. Training vec2pix is achieved using (at most) 20,000 training examples. For every considered case, our method is found to retrieve a model that is much closer to the true model than the closest training model in data space. Even if these recovered models generally look similar to the true models, the data RMSEs obtained when forward simulating the vec2pix models are higher than the prescribed noise level (that is, the Gaussian white noise used to contaminate the true data). This is particularly true for our fourth case study, which considers a transient pumping experiment within a channelized subsurface domain for which the relationship between model and simulated pressure data is highly nonlinear and, to some extent, not unique. If data fitting to the noise level is needed, we suggest that the solution derived by vec2pix could be used as a starting point for a multiple-point statistics (MPS) based inversion such as sequential geostatistical resampling [e.g., 21]. The computational cost incurred by this additional MPS-based step will largely depend on the quality of the vec2pix-derived solutions.

Uncertainty quantification based on deep ensembles, obtained by training the network repeatedly with random initialization, provides at least qualitatively meaningful results. Although we only considered ensembles of five members, we note that the uncertainty grows as expected with distance from the measurement points. For the binary channelized case, we reproduce similar patterns of uncertainty as found with completely different inversion approaches [36], with the uncertainty being highest at channel boundaries. More work with larger deep ensembles is needed to understand to what extent the uncertainty quantification can be interpreted more quantitatively.
We clarify that the uncertainty assessed herein is not directly based on the data misfit; that is, the produced ensemble models may not induce a data misfit that is in accordance with the underlying noise model. As suggested above, for a purely Bayesian interpretation of uncertainty it is necessary to use the vec2pix results as a starting point for an MCMC-based MPS inversion.

A training set of 20,000 examples was used as the baseline in this study. Considering a larger training base would likely further improve prediction quality, but at the cost of a larger computational demand, since obtaining one training sample requires one forward model evaluation. When reducing the training set to only 10,000 and 5000 training examples, we find that the resulting reduction in performance is rather moderate. This suggests that vec2pix could be used with fewer than 20,000 training examples, either for less complex subsurface heterogeneities than those considered in this work or if less accurate results are acceptable. In this trade-off between the cost of building the training set and vec2pix performance, the actual choice of the training size needs to be determined based on the problem at hand and the study objectives. Furthermore, we stress that parallel computing can be used to build the training set, since this is done offline. The resulting speedup can thus be large if many parallel cores are available. Physics-informed deep networks [e.g., 26] have been shown to drastically reduce the training needs when building proxy forward models. Including such concepts for inverse mapping could further reduce the training demand.

Prior studies relying on the pix2pix - cycleGAN framework [23, 31] used adversarial training [8], of which the current standard is the Wasserstein generative adversarial network [WGAN, 2].
We evaluated this alternative herein, using the state-of-the-art Wasserstein GAN with gradient penalty [WGANGP, 10] method. Doing so, the total loss function used to train G_YX becomes a weighted sum of two losses: the WGANGP loss (equation (5)) and the reconstruction loss (equation (2)). We observed that for the considered case studies, adversarial training is not needed. Indeed, to achieve the most accurate results, the relative weight of the reconstruction loss compared to the WGANGP loss needs to be so large that the WGANGP loss has negligible influence on the total loss function.

As an alternative to the vec2pix architecture, we tested, for the flow problem, whether it is better to reshape the 1D input vector to 2D at the entry of the network instead of achieving this reshaping in the center of our "diabolo"-like network (Figure 1). Since padding the input 2D matrix is necessary, we considered two padding options: zero-padding and replication padding. Our results consistently showed a 10% reduction in performance based on the l1-norm.

In field applications, the measurements presented to vec2pix will be contaminated with measurement errors. This is why we trained vec2pix with noise-contaminated data. However, limited testing showed that whether or not the training data are noise-corrupted does not lead to important differences at test time, when the test data are noise-contaminated. That said, we have used realistic, but low, noise levels to corrupt our data, and the situation may change if larger noise values are prescribed. Furthermore, real-world applications will bring more complexities, such as forward model errors and the degree of inadequacy of the prior geologic model (training image). This warrants further investigations with real data. Lastly, our approach requires retraining vec2pix for each measurement configuration (measurement locations and acquisition times).
This limitation does not apply to GAN-based inversion [16, 17], where a GAN is trained once for a given training image and inversions of different direct and indirect measurement datasets can be performed in the latent space of the GAN [16, 24]. However, GAN-based inversion still has a substantial computational demand when done probabilistically [16], while the nonlinearity of the GAN transform may prevent deterministic gradient-based inversion [17] from being effective.

6 Conclusion

We introduce vec2pix, a deep neural network for predicting categorical and continuous 2D subsurface property fields from one-dimensional measurement data (e.g., time series), thereby offering an alternative approach to solve inverse problems. The method is illustrated using (1) synthetic first-arrival GPR travel times to infer a 2D velocity field, and (2) synthetic time series of transient hydraulic heads to retrieve a 2D hydraulic conductivity field. For each problem type, both a multi-Gaussian and a binary channelized subsurface domain with long-range connectivity are considered. Using a training set of 20,000 examples, our approach always recovers a 2D model that is much closer to the true model than the closest training model in the forward-simulated data space. This remains true, despite a moderate decrease in performance, when using only 5000 training examples. The inferred models generally look visually similar to the true ones, but the data misfits obtained when forward simulating these models are generally larger than the noise level used to corrupt the true data. To assess uncertainty, we have used a small deep ensemble, whereby the network is trained multiple times with random initialization. Qualitatively speaking, these uncertainties are in agreement with the expected uncertainty patterns. This work opens up new perspectives on how to use deep learning to infer subsurface models from indirect measurement data.
7 Acknowledgements

We are grateful for helpful suggestions offered by Arnaud Doucet (deep ensembles) and Shiran Levy (choice of training rates). We are also thankful for the comments offered by the three anonymous reviewers. Upon acceptance of the manuscript, the code will be made available at https://github.com/elaloy/vec2pix. Part of our code is inspired by the cycleGAN-and-pix2pix code by Jun-Yan Zhu (https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix).

8 Appendix: Network details

The G_YX network is made of convolutions, transposed convolutions and a series of "ResNet" residual blocks [11]. We use 6 residual blocks for cases involving binary images (or models) and 9 residual blocks for cases involving continuous images. The used activation functions are either the rectified linear unit, ReLU (max(0, x)), or the hyperbolic tangent, Tanh, and we use reflection padding in the first and last layers of G_YX. Let c2d7-s1-kin1-kout64-p0 denote a 7 × 7 2D Convolution-InstanceNorm-ReLU layer with k_in = 1 incoming channels (or filters), k_out = 64 outgoing channels, stride 1 and zero padding (c1d denotes the analogous 1D layer). We call co2d7-s1-kin1-kout64-p0 the same layer without normalization and with a Tanh activation function. Furthermore, tc2d signifies a 2D Transposed Convolution-InstanceNorm-ReLU layer, op1 means an output padding of 1, and R2d-k512 represents a residual block that contains two 3 × 3 2D convolutional layers with InstanceNorm and k = 512 channels on both layers, and a ReLU activation function on the first layer. Lastly, Re(z_r, z_c) and Fla mean reshaping a vector into a z_r × z_c array and flattening a 2D array, respectively.
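As a quick sanity check of the reshaping notation, the following sketch computes the bottleneck dimensions implied by three stride-2 downsampling stages (an overall factor of 8); the function name and the 64 × 64 example grid are ours, for illustration only:

```python
def bottleneck_dims(x_rows, x_cols, downsampling=8):
    """Bottleneck sizes implied by three stride-2 stages (factor 8): the
    reshaped 2D feature map is z_r x z_c with z_r = X_r / 8 and
    z_c = X_c / 8, and the incoming data vector y is zero-padded to
    length X_r * X_c / 8."""
    assert x_rows % downsampling == 0 and x_cols % downsampling == 0
    z_r = x_rows // downsampling
    z_c = x_cols // downsampling
    padded_len = (x_rows * x_cols) // downsampling  # padded size of y
    return z_r, z_c, padded_len

# Hypothetical 64 x 64 model grid:
z_r, z_c, n_pad = bottleneck_dims(64, 64)
```

The dimensions are consistent: a padded vector of length X_r X_c / 8, downsampled by 8 through the three 1D stride-2 convolutions, yields X_r X_c / 64 positions, exactly the z_r × z_c spatial size of the reshaped map.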
From input to output layer, our generator is built as follows:
• [c1d7-s1-kin1-kout64-p0]
• [c1d3-s2-kin64-kout128-p1]
• [c1d3-s2-kin128-kout256-p1]
• [c1d3-s2-kin256-kout512-p1]
• Re(z_r, z_c)
• N_res × [R2d-k512]
• [tc2d3-s2-kin512-kout256-p1-op1]
• [tc2d3-s2-kin256-kout128-p1-op1]
• [tc2d3-s2-kin128-kout64-p1-op1]
• [co2d7-s1-kin64-kout1-p0]
where z_r = X_r/8 and z_c = X_c/8, with X_r and X_c the numbers of rows and columns of a model X; the incoming data vector y is padded with zeros such that its size matches X_r X_c / 8; and N_res is the selected number of residual blocks (6 or 9, see above).

References

[1] Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke, 2018. Deep-learning tomography. The Leading Edge, 37(1):58-66. https://library.seg.org/doi/10.1190/tle37010058.1.
[2] Arjovsky, M., Chintala, S., and L. Bottou, 2017. Wasserstein GAN. arXiv preprint arXiv:1701.07875.
[3] Chan, S., and A. Elsheikh, 2019. Parametric generation of conditional geological realizations using generative neural networks. Computational Geosciences, 23(5), 925-952. https://doi.org/10.1007/s10596-019-09850-7.
[4] Dietrich, C. R., and G. H. Newsam, 1997. Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM Journal on Scientific Computing, 18, 1088-1107.
[5] Earp, S., and A. Curtis, 2020. Probabilistic neural-network based 2D travel time tomography. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-04921-8.
[6] Fort, S., Hu, H., and B. Lakshminarayanan, 2019. Deep ensembles: a loss landscape perspective. arXiv preprint.
[7] Fort, S., and S. Jastrzebski, 2019. Large scale structure of neural network loss landscapes. The Annual Conference on Neural Information Processing Systems (NIPS), Vancouver; 2019.
[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Y. Bengio, 2014. Generative adversarial networks. The Annual Conference on Neural Information Processing Systems (NIPS), Montréal; 2014.
[9] Goodfellow, I., Bengio, Y., and A. Courville. Deep learning. MIT Press; 2016. http://www.deeplearningbook.org.
[10] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and A. Courville, 2017. Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028.
[11] He, K., Zhang, X., Ren, S., and J. Sun, 2016. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). http://dx.doi.org/10.1109/CVPR.2016.90. Also available as arXiv preprint.
[12] Isola, P., Zhu, J.-Y., Zhou, T., and A. A. Efros, 2016. Image-to-image translation with conditional adversarial networks. arXiv preprint.
[13] Jetchev, N., Bergmann, U., and R. Vollgraf, 2016. Texture synthesis with spatial generative adversarial networks. arXiv preprint arXiv:1611.08207.
[14] Kingma, D. P., and J. L. Ba, 2015. ADAM: a method for stochastic optimization. The International Conference on Learning Representations (ICLR), San Diego.
[15] Laloy, E., R. Hérault, J. Lee, D. Jacques, and N. Linde, 2017. Inversion using a new low-dimensional representation of complex binary geological media based on a deep neural network. Advances in Water Resources, 110, 387-405. https://doi.org/10.1016/j.advwatres.2017.09.029.
[16] Laloy, E., R. Hérault, D. Jacques, and N. Linde, 2018. Training-image based geostatistical inversion using a spatial generative adversarial neural network. Water Resources Research, 54(1):381-406. https://doi.org/10.1002/2017WR022148.
[17] Laloy, E., N. Linde, C. Ruffino, R. Hérault, G. Gasso, and D. Jacques, 2019.
Gradient-based deterministic inv ersion of geophysical data with Generative Adversarial Networks: is it feasible? Computers and Geosciences, 23(5), 1193–1215, https://doi.org/10.1007/s10596-019-09875-y . [18] Linde, N., Renard, P ., Mukerji, T ., and J. Caers. 2015. Geological realism in hydrogeolog- ical and geophysical in verse modeling: A re view . Advances in W ater Resources, 86, 86–101. https://doi.org/10.1016/j.advw atres.2015.09.019. [19] Lucic, M., Kurach, K., Michalski, M., Gelly , S., and O. Bousquet, 2017. Are GANs created aqual? A large-scale study . arXiv preprint arXi v:1711.10337. [20] Mariethoz, G., Renard, P ., and J. Straubhaar , 2010. The Direct Sampling method to perform multiple-point geostatistical simulations. W ater Resources Research, 46, W11536. http://dx.doi.org/10.1029/2008WR007621. [21] Mariethoz, G., P . Renard, and J. Caers, 2010. Bayesian in verse problem and optimization with iterative spatial resampling. W ater Resources Research, 46, W11530. http://dx.doi.org/10.1029/2010WR009274. [22] Mosser , L., Dubrule, O., and M.J. Blunt, 2017. Reconstruction of three-dimensional porous media using generati ve adversarial neural networks. Ph ysical Revie w E., 96. https://doi.org/10.1103/PhysRe vE.96.043309. [23] Mosser , L., W . Kimman, J. Dramsch, S. Purves, A. De la Fuente, and G. Ganssle, 2018. Rapid seismic do- main transfer: Seismic velocity in version and modeling using deep generativ e neural networks, arXiv preprint [24] Mosser , L., Dubrule, O., and M.J. Blunt, 2020. Stochastic seismic w av eform in version using generati ve adv ersarial networks as a geological prior , Mathematical Geosciences, 52, 53–79. https://doi.org/10.1007/s11004-019-09832-6. [25] Niswonger , R.G., Sorab, P ., and I. Motomu, 2011. MODFLO W -NWT , A Ne wton formulation for MODFLO W - 2005. U.S. Geological Surve y T echniques and Methods 6-A37, 44 p. [26] Raissi, M., Perdikaris, P ., Karniadakis, G. E. 2019. 
Physics-informed neural networks: A deep learning frame work for solving forward and in verse problems in volving nonlinear partial dif ferential equations. Journal of Computational Physics, 378, 686–707. https://doi.org/10.1016/j.jcp.2018.10.045. [27] Reichstein, M.,Camps-V alls, G., Stev ens, B., Jung, M., Denzler , J.,Carv alhais, N., and Prabhat, 2019. Deep learning and process understanding for data-dri ven Earth system science. Nature, 566(7743), 195–204. https://doi.org/10.1038/s41586-019-0912-1. [28] Roth, K., Shulin, R., Flühler , H., and W . Attinger . 1990. Calibration of time domain reflectometry for water content measurement using a composite dielectric approach. W ater Resources Research, 26, 2267–2273. [29] Rücker , C., Günther , T ., and F .M. W agner , 2017. pyGIMLi: An open-source library for modelling and in version in geophysics. Computers and Geosciences, 109, 106–123. https://doi.org/10.1016/j.cageo.2017.07.011. 29 A P R E P R I N T - N OV E M B E R 1 1 , 2 0 2 0 [30] Shen, C., 2018. A transdisciplinary re view of deep learning research and its rele vance for water resources scientists. W ater Resources Research, 54(11), 8558–8593. https://doi.org/10.1029/2018WR022643 [31] Sun, A. Y ., 2018. Discovering state-parameter mappings in subsurface models using generativ e adversarial networks. Geophysical Research Letters, 45: 11137–11146. https://doi.org/10.1029/2018GL080404. [32] Sun, A. Y ., and B. R. Scanlon, 2019. How can Big Data and machine learning benefit en vironment and water management: a survey of methods, applications, and future directions. En vironmental Research Letters, 14, 073001. https://doi.org/10.1088/1748-9326/ab1b7d. [33] W ang, Z., Bovik, A. C., Sheikh, H. R., and E. P . Simoncelli, 2004. Image quality assessment: From error visibility to structural similarity . IEEE Transactions on Image Processing, 13(4), 600–612. [34] W ang, Y . T eng, Q., He, X., Feng, J., and T . Zhang, 2019. 
CT -image of rock samples super resolution using 3D con volutional neural netw ork. Computers & Geosciences, 133, 104314. https://doi.org/10.1016/j.cageo.2019.104314. [35] Y ang, F . and J. Ma, 2019. Deep-learning in version: A next-generation seismic velocity model building method, 2019. Geophysics, 84(4):1J A–Z21. https://doi.org/10.1190/geo2018-0249.1. [36] Zahner , T ., Lochbühler , T ., Mariethoz, G., and N. Linde, 2016. Image synthesis with graph cuts: a fast model proposal mechanism in probabilistic in version, Geophysical Journal International, 204(2), 1179–1190. http://dx.doi.org/10.1093/gji/ggv517. [37] Zhang, L., Zhang, L., and B. Du, 2016. Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine, 4(2), 22–40. http://dx.doi.org/10.1109/MGRS.2016.2540798. [38] Zhu, J., T . Park, P . Isola, and A. A. Efros, 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks: IEEE International Conference on Computer V ision (ICCV), 2242–2251. arXiv preprint [39] Zhu, Y ., N. Zabaras, 2018. Bayesian deep conv olutional encoder-decoder networks for surrogate modeling and un- certainty quantification. Journal of Computational Physics, 366, 415–437. https://doi.org/10.1016/j.jcp.2018.04.018. 30
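The size bookkeeping implied by the encoder layers above can be verified with a short sketch. This is an illustration only: the 64 × 64 model size is a hypothetical example, channel counts are omitted since they do not affect spatial size, and the standard convolution output-length formula is assumed.

```python
def conv1d_out_len(L, k, s, p):
    """Output length of a 1D convolution with kernel k, stride s, padding p."""
    return (L + 2 * p - k) // s + 1

# Hypothetical model size: a 64 x 64 subsurface field (X_r = X_c = 64).
X_r, X_c = 64, 64
z_r, z_c = X_r // 8, X_c // 8

# The data vector y is zero-padded to length X_r * X_c / 8 before the encoder.
L = X_r * X_c // 8  # 512

# Encoder: [c1d7-s1-p0], then three [c1d3-s2-p1] downsampling blocks.
for (k, s, p) in [(7, 1, 0), (3, 2, 1), (3, 2, 1), (3, 2, 1)]:
    L = conv1d_out_len(L, k, s, p)

print(L, z_r * z_c)  # prints: 64 64
```

The final 1D feature map thus has exactly z_r × z_c = 64 positions, which the Re(z_r, z_c) step reshapes into a 2D grid of size (X_r/8, X_c/8) before the residual blocks and the transposed-convolution decoder upsample it back to the full model size.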
