Machine Learning Approach to Earthquake Rupture Dynamics

Simulating dynamic rupture propagation is challenging due to the uncertainties involved in the underlying physics of fault slip, stress conditions, and frictional properties of the fault. A trial and error approach is often used to determine the unkn…

Authors: Sabber Ahamed, Eric G. Daub

Confidential manuscript submitted to Journal of Geophysical Research

Sabber Ahamed (1), Eric G. Daub (1, 2)
(1) Center for Earthquake Research and Information (CERI), University of Memphis, TN
(2) Alan Turing Institute, London, United Kingdom

Key Points:
• Two machine learning algorithms are used to predict if an earthquake can break through a fault with a geometric heterogeneity.
• Models built from the algorithms can predict rupture propagation or arrest with more than 81% accuracy.
• Machine learning can identify the underlying complex data patterns that determine the physics of earthquake rupture propagation.

Corresponding author: Sabber Ahamed, sabbers@gmail.com

Abstract

Simulating dynamic rupture propagation is challenging due to the uncertainties involved in the underlying physics of fault slip, stress conditions, and frictional properties of the fault. A trial and error approach is often used to determine the unknown parameters describing rupture, but running many simulations usually requires human review to determine how to adjust parameter values and is thus not very efficient. To reduce the computational cost and improve our ability to determine reasonable stress and friction parameters, we take advantage of the machine learning approach. We develop two models for earthquake rupture propagation using the artificial neural network (ANN) and the random forest (RF) algorithms to predict if a rupture can break a geometric heterogeneity on a fault. We train the models using a database of 1600 dynamic rupture simulations computed numerically. Fault geometry, stress conditions, and friction parameters vary in each simulation. We cross-validate and test the predictive power of the models using an additional 400 simulated ruptures, respectively.
Both RF and ANN models predict rupture propagation with more than 81% accuracy, and model parameters can be used to infer the underlying factors most important for rupture propagation. Both of the models are computationally efficient such that the 400 test predictions require a fraction of a second, leading to potential applications of dynamic rupture that have previously not been possible due to the computational demands of physics-based rupture simulations.

1 Introduction

Damage due to earthquakes poses a threat to humans worldwide. Seismic hazard analysis is used to estimate the possible ground motion at a given location in a given period based on historical earthquake data and decay of ground motion intensity with distance. However, current approaches to seismic hazard analysis are largely empirical and may not capture the full range of ground shaking in future large earthquakes due to a lack of sufficient historical geological data. This leads to large uncertainties in hazard estimates. Ideally, this lack of data can be mitigated by employing physical models that supplement existing data with additional scenario events that can quantify the expected variability of ground shaking to provide a robust estimate of hazard and risk.

Earthquake faults are incredibly complex systems that span a vast range of length and time scales, making it challenging to construct physical models that resolve all of the relevant physics. Information on the state of stress of the faults can be obtained from past earthquake focal plane mechanisms, geologic observations, or direct well-bores or drill holes [Zoback, 2010]. However, these stress analysis techniques typically neither constrain the full stress tensor nor cover a broad range of locations and depths. Therefore, direct in situ measurements of the stresses and displacements during rupture are rarely available.
Even if the stress state and microscopic physics governing earthquake slip are known with certainty, multiscale modeling of earthquakes poses a vast computational challenge due to the range of length and timescales involved. Due to these limitations, physical modeling is not routinely used directly in estimating earthquake hazard. However, physics-based approaches like dynamic earthquake rupture simulations are frequently suggested as a way that such physical constraints could be incorporated into seismic hazard estimation [Harris, 1998; Harris and Day, 1999; Graves et al., 2011; Olsen et al., 2009; de la Puente et al., 2009].

Dynamically simulating earthquake rupture is challenging due to uncertainty regarding the underlying physics of earthquake slip. The stress conditions and frictional properties of faults are not well constrained [Duan and Oglesby, 2006; Peyrat et al., 2001; Ripperger et al., 2008; Kame et al., 2003], although they, together with fault geometry, control the rupture process and determine the dynamics of slip as well as the resulting ground motions. Because earthquake rupture is a highly nonlinear process, determining parameter values is often done by making simplifying assumptions or taking a trial and error approach, which usually incurs expensive computational and numerical costs [Douilly et al., 2015; Ripperger et al., 2008; Peyrat et al., 2001]. This limits the applicability of the simulations, because the high computational expense means they cannot be easily integrated into other calculations such as inversions or seismic hazard analysis.
While rupture simulations compute the slip and ground motions with a high level of detail, seismic hazard analysis usually only requires more general characteristics of an earthquake such as moment magnitude or peak ground velocity [Petersen et al., 2014]. Thus, rupture simulations may only need to approximate such characteristics to be useful in the hazard analysis. Machine learning is a promising approach for reducing the computational cost involved in estimating such attributes of complex data sets. Many recent studies show that a machine learning approach can be beneficial in a variety of applications including seismic event detection, hazard analysis, and fault detection from unprocessed seismic data. For example, Rouet-Leduc et al. [2017] used a random forest-based algorithm to identify signals with predictive power in laboratory-generated acoustic measurements. Their model was able to accurately forecast future failure events using only a window of data much shorter than the time between events. Perol et al. [2018] developed a convolutional neural network algorithm, ConvNetQuake, to detect earthquakes. ConvNetQuake showed the ability to identify 20 times more earthquakes accurately than were contained in the catalog listed by the Oklahoma Geological Survey. Last et al. [2016] explored machine learning methods to predict the largest possible magnitude of an earthquake likely to occur next year based on the previously recorded seismic events in Israel and its neighboring countries. The model was able to predict the maximum earthquake magnitude with an AUC (area under the curve) of 69.8%. Araya-Polo et al. [2017] developed a neural network based model that can identify faults from raw seismic data, bypassing the expensive multistep seismic processing, which holds significant promise for hydrocarbon exploration.
These models illustrate the variety of ways that machine learning can take advantage of geophysical data to help us understand the complex structure and dynamics of the earth's crust. Another recent application of machine learning algorithms is to predict ground motion. Paolucci et al. [2018] used a neural network to predict broadband earthquake ground motions from 3D physics-based numerical simulations. The authors showed that their algorithm could reproduce the high-frequency shaking based on the low-frequency simulation results, illustrating how physics-based models might be integrated into hazard modeling.

In this paper, we describe a workflow for using machine learning to predict if an earthquake can break a fault with geometric heterogeneity, a complex problem that is dependent on geometry, stress, and fault friction coupled to elastic wave propagation. We start by presenting the data generated from dynamic rupture simulations and normalizing the data (zero mean and unit standard deviation) for training models. Then we examine two models built from the random forest and neural network and trained by a large number of earthquake rupture simulations. Finally, we discuss the potential to use machine learning models in predicting real surface fault rupture and estimating seismic hazard.

[Figure 1 image: fault schematic with the nucleation point marked; axes: distance along strike (km), distance across fault (km).]

Figure 1. The schematic diagram showing the fault geometry for the preliminary set of rupture models (vertical scale exaggerated). The domain is 32 km in length along the strike of the fault and 24 km wide across the fault. The diagram shows a zoomed view of the fault for better visualization of the barrier. Rupture initiates at the nucleation patch (red), which is 10 km from the curved geometric barrier.
The half-width and height of the barrier, stress tensor and friction parameters are varied randomly over the simulations.

2 Rupture simulations and data preprocessing

We produce a set of 2,000 rupture simulations based on the geometry illustrated in Fig. 1 using fdfault [Daub, 2017], a finite difference code for numerical simulation of elastodynamic fracture and friction problems. A dynamic rupture model solves the elastodynamic wave equation coupled to a friction law describing the failure process. The simulation dynamically determines fault slip based on the initial stress conditions, the elastodynamic wave equations, and frictional failure on that fault.

The fault in the simulation is planar, with a Gaussian geometric heterogeneity at the center of the fault. Rupture is nucleated 10 km to the left of the barrier and propagates from the hypocenter towards the barrier. The rupture on the fault is governed by the linear slip-weakening law (Fig. 3). The fault starts to break when the shear stress ($\tau$) exceeds the peak strength $\tau_s = \mu_s \sigma_n$, where $\mu_s$ and $\sigma_n$ are the static friction coefficient and normal stress, respectively. Over a critical slip distance $d_c$, the friction coefficient reduces linearly to constant dynamic friction $\mu_d$.

Fault geometry, stress state, and friction are all critical considerations for whether or not a rupture propagating from left to right can break the barrier. For right-lateral slip on the fault, the fault geometry is such that the closest side of the barrier is a restraining bend, which inhibits rupture, while the far side is a releasing bend that promotes rupture. Due to the fault geometry, all three stress components influence the shear and normal tractions on the restraining and releasing bends.
For example, the normal stress increases around the restraining bend by an amount that depends on the bending angle, while it decreases on the releasing bend, which makes the effect of the barrier on rupture propagation challenging to analyze systematically. In this project, the half-width of the barrier varies between 1 and 2.1 km while the height is set in the range of 0-10% of its half-width. These values are large enough that the barrier has a non-negligible influence on the actual shear and normal tractions. Thus the combination of the stress state with the complexity of the fault geometry makes it difficult to predict the ability of the rupture to break the barrier due to these nonlinear effects.

Rupture simulations frequently use the shear over normal stress or the S-ratio [Andrews, 1976; Das and Aki, 1977] to characterize the ability of a rupture to propagate on a fault. The S-ratio is calculated as $S = (\tau_s - \tau_i)/(\tau_i - \tau_d)$, where $\tau_i$ is the initial shear stress, and $\tau_s$ and $\tau_d$ are the peak and residual shear stresses, respectively (Fig. 3). For this particular problem, however, we find that neither metric can reliably predict if the rupture can break the barrier for a given set of parameters. We find that the S-ratio does offer some utility as a discriminant for rupture, but only in the sense that ruptures with large values of S are more likely to arrest. Smaller values of the S-ratio traditionally indicate a rupture that is closer to failure, but we find that for this particular problem ruptures with small S-ratios still arrest in many situations. For example, Fig. 2 shows the smoothed probability density function of the S-ratio for rupture arrest and propagation estimated from the training set of rupture data.
From the figure, it is clear that rupture propagation and arrest have large overlapping regions that make the S-ratio a poor discriminant for this particular problem. This is because of the role of the fault geometry, combined with the out-of-plane normal stress component.

[Figure 2 image: kernel density versus S factor for the Arrest and Propagation classes.]

Figure 2. Smoothed probability density function of S-ratio for rupture arrest and propagation. Ruptures that arrest typically have a large S-ratio and a smaller value indicates that rupture is close to failure. The magnitudes of S-ratio for rupture arrest and propagation have a similar range and have a broad overlapping region that indicates that the S-ratio has limited power as a discriminant.

For simple planar faults with uniform stress and strength conditions, the S-ratio and shear over normal stress are indicative of the stress situation over the entire fault, and thus predictive of rupture over the whole fault. For a rough fault, on the other hand, the actual tractions on the barrier can be significantly different from those for a flat fault, and thus ruptures that would ordinarily propagate may arrest. Thus, while this might appear at the outset to be a rather simple rupture problem, it captures the essential physics of rupture propagation on more complex fault geometries and provides a useful test case. Therefore, it allows us to determine if a machine learning algorithm can predict the ability of a rupture to propagate under a given set of conditions.

The 2000 simulations are divided into training (1600) and test (400) sets. The 1600 training simulations are used to train and validate the models, while the remaining 400 test simulations are used to test the performance of the models.
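
The two quantities used throughout this section, the linear slip-weakening friction coefficient and the S-ratio, can be sketched in a few lines of Python. This is a minimal illustration rather than the fdfault implementation: the function names and the numerical values are hypothetical, and the normal stress is taken by magnitude because compression is negative in the sign convention used here.

```python
def friction_coefficient(slip, mu_s, mu_d, d_c):
    """Linear slip-weakening: friction falls linearly from mu_s to mu_d over d_c."""
    if slip >= d_c:
        return mu_d
    return mu_s - (mu_s - mu_d) * slip / d_c


def s_ratio(tau_i, mu_s, mu_d, sigma_n):
    """S = (tau_s - tau_i) / (tau_i - tau_d) with tau_s = mu_s*|sigma_n|, tau_d = mu_d*|sigma_n|."""
    sigma = abs(sigma_n)          # compression is negative, so use the magnitude
    tau_s = mu_s * sigma          # peak (static) strength
    tau_d = mu_d * sigma          # residual (dynamic) strength
    return (tau_s - tau_i) / (tau_i - tau_d)


# Hypothetical values chosen only for illustration (not taken from the simulations).
mu_s, mu_d, d_c = 0.7, 0.3, 0.4   # friction coefficients and slip-weakening distance (m)
sigma_n = -100.0                  # normal stress in MPa (negative = compression)
tau_i = 50.0                      # initial shear stress in MPa

print("friction at half of d_c:", friction_coefficient(0.5 * d_c, mu_s, mu_d, d_c))
print("S-ratio:", s_ratio(tau_i, mu_s, mu_d, sigma_n))
```

A smaller S means the initial shear stress sits closer to the peak strength, which is the traditional reading of the ratio discussed above.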
We control the following eight parameters to vary the rupture behavior in each simulation:

1. The geometric barrier half-width follows a uniform distribution between 1 and 2.1 km.
2. The barrier height follows a uniform distribution between 0 and 10% of the barrier half-width.
3. The fault normal stress (in-plane, or IP) follows a uniform distribution between -10 and -160 MPa (negative indicates compression, hence a smaller value means higher stress). Note that this sets the normal stress on the flat part of the fault, while the normal stress on the barrier varies spatially due to the geometry.
4. The other normal stress component, which we term the out-of-plane (OP) normal stress, follows a uniform distribution within ±25% of the fault normal stress.
5. The dynamic friction coefficient ($\mu_d$) follows a uniform distribution between 0.2 and 0.6.
6. The friction drop (static minus dynamic friction) follows a uniform distribution between 0.2 and $0.8 - \mu_d$.
7. The shear stress is set to be proportional to the normal stress. The proportionality coefficient is the dynamic friction coefficient plus 2% of the friction drop ($\mu_s - \mu_d$), plus a random number times 13% of the friction drop. In other words, the initial stress is chosen to be in a relatively narrow range of values where it is not trivial to predict whether or not the rupture will be able to propagate. Stress is always higher than the dynamic strength, but never very close to the static strength. This small range reflects the idea that earthquakes occur on faults once the shear stress on the fault reaches the minimum stress needed to rupture the entire fault, so such values should be representative of realistic values of the initial shear stress on natural faults.
8. The slip-weakening distance ($d_c$) follows a normal distribution centered at 0.4 m with a standard deviation of 0.05 m.

[Figure 3 image: shear stress versus slip, falling from $\tau_s = \mu_s \sigma_n$ to $\tau_d = \mu_d \sigma_n$ over the slip distance $d_c$.]

Figure 3. Slip-weakening friction for an earthquake fault. The fault begins to slip when the shear stress reaches or exceeds the peak strength $\tau_s$. Over a critical slip distance $d_c$, the shear strength decreases linearly from $\tau_s$ to a constant dynamic sliding strength $\tau_d$. The shear strength is linearly proportional to the (possibly time-varying) normal stress $\sigma_n$, and the friction coefficient varies with slip between $\mu_s$ and $\mu_d$.

Table 1 lists five sample ruptures before preprocessing, showing the parameter values and the corresponding output, 0 or 1. An output value of 1 means that the rupture propagated through the restraining geometric barrier while 0 indicates that the rupture arrested before reaching the center of the barrier. All the parameters of the dataset are normalized to have zero mean and unit standard deviation. Note that this puts the problem in the form of a standard classification problem, for which many algorithms have been developed [Carbonell et al., 1983; Michie et al., 1996].

Table 1. Five sample training examples before preprocessing

Height (km)  Half-width (km)  OP stress (MPa)  Shear stress (MPa)  IP stress (MPa)  Fric. drop  Dyn. fric.  d_c (m)  Output
0.104        1.146            -102.509         58.619              -117.766         0.484       0.217       0.296    0
0.088        1.304            -136.062         51.391              -126.715         0.346       0.448       0.406    1
0.099        1.260            -117.559         40.972              -115.529         0.293       0.502       0.389    1
0.116        1.191            -128.169         94.021              -157.830         0.572       0.203       0.409    0
0.018        1.108            -106.350         29.149              -101.379         0.253       0.325       0.398    1

3 Classification Strategy

We use two different classification algorithms to create predictive models to determine if a rupture can break the fault in our model.
One is the random forest decision tree algorithm [Breiman, 2001] and the other is the artificial neural network [Rosenblatt, 1958; Rumelhart et al., 1988]. These two algorithms have been selected due to their flexibility in handling a range of classification problems with a large feature space as well as the fact that their estimated parameter values can provide insight into the underlying dynamics. Another important aspect is that, while many learning algorithms such as nearest neighbor algorithms may work well for a particular problem, they may not necessarily tell us much about the underlying physics.

Since our training dataset is imbalanced (65% of ruptures arrest while 35% of ruptures propagate), cost-sensitive learning [Chen et al., 2004] can make our models better suited to learning from the imbalanced data. In particular, imposing a strong penalty on misclassification of the minority class can improve performance. We assign a high weight (i.e., higher misclassification cost) to the minority class (rupture propagation) using the 'balanced' class strategy, where the class weights are given by the number of total samples / (number of classes × number of samples in each class). This formula gives weights of 0.768 and 1.431 for the rupture arrest and propagation classes, respectively. These parameters are used to weight the training examples in computing the cost function, which is then minimized when training the model.

4 Evaluation Metrics

Since the data set is imbalanced, we use recall, precision, F1 score, and accuracy as the evaluation criteria. Accuracy is not always a good indicator of model performance in these cases, as poor performance on the rare class can be masked by good performance on the much larger numbers of the more common class.
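
The class imbalance that motivates these metrics is the same one the 'balanced' class weights of Section 3 correct for, and that weighting formula is short enough to write out. In the sketch below the exact class counts are not stated in the text, so the 1041/559 split is an assumption chosen to match the quoted 65%/35% proportions of 1600 training ruptures, and the function name is hypothetical.

```python
def balanced_class_weights(class_counts):
    """weight = n_samples / (n_classes * n_samples_in_class) for every class."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Assumed counts (not given in the text): roughly 65 % arrest, 35 % propagation.
counts = {"arrest": 1041, "propagation": 559}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under these assumed counts the weights round to the 0.768 and 1.431 quoted in Section 3. scikit-learn applies the same formula when `class_weight='balanced'` is passed to a classifier.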
Recall measures the proportion of actual positive examples that the model correctly predicts as positive, and so evaluates model performance on all positive samples. Precision, on the other hand, measures the proportion of true positive examples out of the total number of predicted positives, and so describes the quality of the positive predictions. The F1 score combines precision and recall by taking their harmonic mean, so a model must exhibit both high precision and high recall to obtain a good F1 score. The F1 score is calculated as: 2 × (precision × recall) / (precision + recall).

We also examine the confusion matrix to evaluate model performance. The matrix gives detailed information on true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions. TP and TN are the examples where the rupture propagates or arrests, respectively, and the models predict them correctly. The FPs are situations where a rupture was predicted to propagate but actually arrested, and the FNs are examples where an arrest was predicted, but the rupture propagated. All of the metrics provide us with a better understanding of model performance.

5 Random forest classifier

Ensemble machine learning algorithms have been widely used because of their excellent accuracy, robustness, and ease of use. The method combines multiple independent learning algorithms (weak learners) to achieve better predictive performance than could be obtained from any of the single learners alone [Opitz and Maclin, 1999; Polikar, 2006; Rokach, 2010]. The ensemble makes a final prediction based on a majority vote of the individual components of the group.
There are two types of ensembles commonly used: bagging and boosting. Bagging [Breiman, 1996] involves having each learner in the ensemble vote with equal weight, while in boosting [Dietterich, 2000] the prediction is made by taking a weighted vote of the predictions. Weights are proportional to each learner's accuracy on its training set. Although ensemble methods can often perform better than a single learner, they require increased storage and extensive computation, and they have a complex structure to interpret due to the involvement of multiple classifiers in decision making.

[Figure 4 image: several decision trees, each taking a subset $x_i$ of $X$ and producing a prediction $y_i$, combined into the final class $Y$.]

Figure 4. The schematic diagram shows the ensemble of multiple decision trees known as the random forest. In each tree, the root node is at the top, and each internal node separates the data based on a decision using one of the input variables. An edge connects child nodes with a parent node. We use the ID3 algorithm [Quinlan, 1986] to generate a single decision tree. In each tree, the input parameters ($x_i$) are a subset of the complete training data ($X$) and have a subset of features, which are selected by random sampling with replacement. The random forest is the combination of multiple decision trees combined to make the final predicted class ($Y$).

A fast algorithm such as decision trees is commonly used in ensemble methods to reduce the computational cost. Fig. 4 shows a schematic of the ensemble of multiple single decision trees. Random forest (RF) is a bagging type of ensemble classifier [Breiman, 2001] that uses many single trees to make predictions. RF can be used for both classification and regression. In this project, we use the ID3 algorithm [Quinlan, 1986], which uses information gain as the splitting criterion for a single decision tree.
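
Information gain, the splitting criterion named above, is the reduction in label entropy produced by a candidate split. The following is a minimal sketch of that calculation and not the code used for the models: the function names are hypothetical, and the binary split expressed as a boolean mask is an assumption made for simplicity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left_mask):
    """Entropy of the parent node minus the size-weighted entropy of its two children."""
    left = [y for y, goes_left in zip(labels, left_mask) if goes_left]
    right = [y for y, goes_left in zip(labels, left_mask) if not goes_left]
    if not left or not right:
        return 0.0  # a split that moves nothing carries no information
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


# Toy labels: 0 = arrest, 1 = propagation (hypothetical data).
labels = [0, 0, 1, 1]
perfect_split = [True, True, False, False]   # separates the two classes completely
useless_split = [True, False, True, False]   # leaves both children equally mixed

print("gain of perfect split:", information_gain(labels, perfect_split))
print("gain of useless split:", information_gain(labels, useless_split))
```

A tree-growing algorithm evaluates many candidate splits this way and keeps the one with the highest gain at each node.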
For the random forest classification, we use the scikit-learn [Pedregosa et al., 2011] library for implementing the algorithm in Python. Three parameters are optimized to improve the model performance: (1) maximum depth: the maximum depth of the tree, (2) minimum samples split: the minimum number of samples required to split an internal node, and (3) number of estimators: the number of trees in the forest. To find the best parameters, we perform a grid search over the specified parameter values using the cross-validation technique to assess model performance. The best parameter value of maximum depth is 10, minimum samples split is 40, and the number of estimators is 20. Cross-validation involves dividing the training data into two parts: the training data is used to estimate the model parameters and the validation data to determine how well the model generalizes to unseen data. By optimizing the model performance on the cross-validation data, we select the values of these model parameters.

5.1 Important features

[Figure 5 image: bar chart of importance (%) by feature: OP Stress 21.34%, Dynamic fric 18.8%, IP Stress 15.88%, Fric Drop 15.59%, Shear Stress 14.39%, Height 6.07%, Width 4.09%, $d_c$ 3.83%.]

Figure 5. The bar chart shows the relative importance (in %) of the input features given by a random forest classifier. OP normal stress has the highest influence (21.34%) on the decision, followed by the dynamic friction coefficient (18.8%). IP normal stress, friction drop, and shear stress components show similar importance (15%). Geometric features such as height and half-width of the barrier and slip-weakening distance ($d_c$) are identified as being less influential in making a decision. This bar chart and associated scripts, data are available under CC-BY Ahamed et al. [2017].

The random forest algorithm allows the evaluation of the important features of a classification task.
The most important features tend to occur higher in the decision trees, so in this manner we can see which features are most predictive of rupture or arrest. Fig. 5 shows the weighting of important features by percentage. Stress components, dynamic friction, and friction drop are the most important features to determine if an earthquake can break through a fault. These features account for 86% of the total predictive power. This result is consistent with many studies that have shown that earthquake rupture initiation, propagation, and termination are highly sensitive to stress and friction properties [Duan and Oglesby, 2007; Peyrat et al., 2001; Ripperger et al., 2008; Kame et al., 2003]. OP normal stress alone has the greatest (21.34%) classification contribution, followed by dynamic friction (18.80%). IP normal and shear stress components and friction drop have almost equal importance (15%).

On the other hand, the geometric features height and half-width have a less significant effect on the predictive power, indicating that the fault geometry is less important. This is likely because the stress and friction are important both with and without a geometrical heterogeneity on the fault, and regardless of how strong the heterogeneous fault geometry is, as long as the fault is loaded sufficiently close to failure the rupture will still be able to propagate. Additionally, the influence of the OP normal stress is due entirely to the varying geometry, and the algorithm may find that the fault geometry varies less widely across the simulations, making the OP normal stress the variable that gives the most reliable predictions. The OP normal stress also varies more widely than the other stress parameters, so it may exhibit a stronger influence when constructing predictive models.
This does not mean that fault geometry is unimportant, as it is well known that fault roughness influences the residual stress heterogeneities on the fault, which has a significant effect on earthquake recurrence probability [Zielke et al., 2017]. Instead, this result suggests that if the geometry does not vary too much, the stress and frictional conditions are more useful to make predictions about the rupture process. Simulations using more complex fault geometries such as fractal faults may show a stronger sensitivity to features based on the fault geometry when compared to the simple barrier considered here.

5.2 Classification result

We use the 400 test simulations to estimate the generalization error of the random forest model. Table 2 shows the confusion matrix that contains information about actual and predicted classifications performed by the model. The RF classifier accurately predicts 218 test cases as TN. Similarly, 107 example ruptures are correctly identified as TP. On the other hand, 21 and 54 test cases were inaccurately classified as FP or FN, respectively.

Table 2. Confusion matrix given by the random forest model based on 400 testing data

                               Negative   Positive
Negative (Rupture arrested)    TN = 221   FN = 51
Positive (Rupture propagated)  FP = 24    TP = 104

Table 3. Random forest classification results based on 400 testing data

Class                          Precision  Recall  F1 score  Number
Negative (Rupture arrested)    0.91       0.80    0.85      272
Positive (Rupture propagated)  0.66       0.84    0.74      128
Average/Total                  0.83       0.81    0.82      400

Table 3 shows the classification result using three evaluation metrics. The high score of recall (0.80) for the positive class and the high precision (0.91) for the negative class indicate that the number of actual positive examples (128) misclassified as negative is small (21).
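
The per-class scores in Table 3 can be recomputed from confusion counts with the formulas of Section 4. The sketch below uses the counts quoted in the text (218, 107, 21, 54) rather than those in Table 2, under the assumed reading that 54 arrested ruptures were predicted to propagate and 21 propagating ruptures were predicted to arrest. The function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed reading of the counts quoted in the text (see the paragraph above).
arrest_correct = 218           # arrested ruptures predicted to arrest
propagation_correct = 107      # propagating ruptures predicted to propagate
arrest_as_propagation = 54     # arrested ruptures predicted to propagate
propagation_as_arrest = 21     # propagating ruptures predicted to arrest

# Scores with rupture propagation as the positive class.
propagation_scores = precision_recall_f1(
    tp=propagation_correct, fp=arrest_as_propagation, fn=propagation_as_arrest
)
# Scores with rupture arrest as the positive class.
arrest_scores = precision_recall_f1(
    tp=arrest_correct, fp=propagation_as_arrest, fn=arrest_as_propagation
)

total = (arrest_correct + propagation_correct
         + arrest_as_propagation + propagation_as_arrest)
accuracy = (arrest_correct + propagation_correct) / total

print("propagation (precision, recall, F1):", [round(v, 2) for v in propagation_scores])
print("arrest      (precision, recall, F1):", [round(v, 2) for v in arrest_scores])
print("accuracy:", accuracy)
```

With this reading, the scores round to the values in Table 3 and the overall accuracy is 325/400, about 81%.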
A slightly lower F1 score for the positive class indicates that the model performance on the rupture propagation class is not as good as the performance on the rupture arrest class. Adding more rupture propagation data may improve the positive class performance by helping the algorithm distinguish more subtle patterns in the dataset.

6 Artificial Neural Network

Artificial neural networks (ANN) are inspired by how neurons are connected in the brain [Rosenblatt, 1958]. A neural network consists of several units interconnected and organized in layers. The individual units are also known as neurons. Each neuron performs a weighted sum of its inputs, so the network output can fit complex functions by combining a large number of weights. A layer can be connected to an arbitrary number of further hidden layers of arbitrary size before being combined in the output layer. Hidden layers introduce complexity to the model, and given sufficient data to train the model, these additional layers can improve performance. However, increasing the number of hidden layers does not always help. Additional hidden layers can lead to overfitting [Hinton et al., 2012; Lawrence and Giles, 2000; Lawrence et al., 1997], where the network will memorize the training data but generalize poorly to new data. Therefore, selecting the number of layers and units in each layer is one of the challenges of constructing ANNs that perform well on complex data sets.

[Figure 6 image: network schematic mapping inputs $x_i$ to the output $y_i = f(x_i)$.]

Figure 6. Schematic diagram illustrating an artificial neural network. Normalized parameter values ($x_i$) are fed into the input layer, which are then combined into a hidden layer (center) using a set of parameter weights estimated from the training data.
The resulting values from the hidden la yer are then combined into an output la y er , which computes a probability . Models may also contain multiple hidden la yers, whic h enable the model to build more comple x combinations of the input parameters. Figure. 6 illus trates the schematic diagram of the neural network topology w e use in this work. The netw ork has one hidden lay er with 12 units. The eight input parameters are mapped to these 12 units, producing a 12 × 8 w eight matrix f or the model. As each input enters a unit, the output of the previous unit is multiplied b y its w eight. The unit then adds all these ne w in puts, which determines the output v alue of the intermediate unit. W e then apply a nonlinear activation function ReLu [ Hahnloser e t al. , 2000 ] to the output weight, –11– Confidential manuscript submitted to Jour nal of Geophysical R esearc h which passes all the v alues greater than zero and set an y neg ative output to be zero. Finall y , the hidden la y ers combine the 12 outputs with the output lay er and use the resulting w eight to make predictions. W e use keras [ Chollet et al. , 2015 ], a Python deep lear ning librar y , to build the model. In the ANN, 480 training ex amples (30 % of the training set) are used for validating the model. T o prev ent o v er fitting (high variance, theref ore, poor performance on unseen ex- amples), we use se v eral strategies: (1) cross-v alidation (a technique to assess model perf or - mance) (2) L-2 regularization that penalizes larg e w eights and effectivel y reduces the v ar i- ance of the model, hence prev ents o verfitting and (3) ear ly stopping tec hnique that stops the training when the difference betw een training and v alidation error s tart increasing rather than decreasing. In our case, if the v alidation accuracy does not impro v e in 20 consecutiv e train- ing steps, the earl y stopping technique halts the training. 
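
The topology and the early stopping rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the keras model used in this work: the weights are random placeholders rather than trained values, the bias terms and the sigmoid on the output unit are assumed (the text states only that the output layer computes a probability), and L2 regularization is left out. All function names are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN = 8, 12  # eight fault parameters, twelve hidden units


def standardize(rows):
    """Scale every column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Return the predicted probability of propagation for one input vector."""
    # Hidden layer: weighted sum of the inputs, then ReLU (negatives become 0).
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output unit: weighted sum of hidden outputs, then a sigmoid (assumed).
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def early_stopping_epoch(val_accuracy, patience=20):
    """Return the step at which training halts, or None if it never halts."""
    best, stale = -math.inf, 0
    for step, acc in enumerate(val_accuracy):
        if acc > best:
            best, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None


rng = random.Random(0)  # placeholder weights, not trained values
w_hidden = [[rng.gauss(0.0, 0.1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.gauss(0.0, 0.1) for _ in range(N_HIDDEN)]

raw_inputs = [[rng.uniform(0.0, 1.0) for _ in range(N_INPUTS)] for _ in range(5)]
inputs = standardize(raw_inputs)
probability = forward(inputs[0], w_hidden, b_hidden, w_out, 0.0)
```

Here w_hidden has the 12 × 8 shape quoted in the text, and early_stopping_epoch takes a list of per-step validation accuracies, returning the first step at which 20 consecutive steps have passed without improvement.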
6.1 Network parameters

The weights learned by the neural network allow us to gain some insight into the combinations of input parameters that are most predictive of the ability of the rupture to propagate. To visualize the weights, we have constructed a matrix plot of the weights versus neural units, illustrated in Fig. 7. The left panel shows the model weights mapping the eight inputs (horizontal) to the twelve hidden units (vertical). The right panel shows the weights that combine the hidden units into the single output unit. The color scale indicates the range of weights. The weights in the model range from negative to positive values, which indicate inhibitory or excitatory influences.

If the output layer weight is positive for a particular unit, then the parameter combination is a good predictor of rupture propagation, while a negative weight indicates the parameter combination is a good predictor of rupture arrest. Similarly, positive weights mapping the inputs to the hidden units signify that large values of that input favor rupture. If the output unit has a negative weight, then large weights for a hidden unit indicate that large values of that parameter are predictive of arrest. The weights provide insight into which parameter combinations are most predictive of rupture.

All the weights related to height, half-width and slip-weakening distance d_c are relatively small when compared to those of the stress and friction parameters (Fig. 7). This means these three parameters have a reduced influence on the final prediction, while the remaining five parameters have a much stronger contribution. We note the consistency with the random forest algorithm, which gave the same importance rank for the parameters; that is, height, half-width, and d_c are the least significant.

The weights illustrated in Fig. 7 also provide insights into the parameter combinations and their influence on determining rupture propagation. Interestingly, we find that all of the units with large output weights show similar patterns for the combined parameter values. For example, unit-4 has a large negative output weight. The unit has large negative weights for shear stress, friction drop, and slip-weakening distance, while height, half-width, OP and IP normal stress, and dynamic friction coefficient have positive weights. This indicates that if a fault has low shear stress and low friction drop, but high compressive OP and IP normal stress, high dynamic friction coefficient, height, and half-width, then it is likely that rupture would not propagate but arrest. On the other hand, unit-8 has a substantial positive output weight. The OP and IP normal stress and the dynamic friction coefficient have large negative weights, while friction drop, slip-weakening distance, and shear stress have high positive weights. Note that these are essentially the opposite values from those in unit-4, indicating a single underlying pattern in the data. Therefore, based on our input data, the model has determined exactly how to best combine the various input parameters in a more sophisticated way than simply looking at shear over normal stress or the S parameter. Additionally, the model provides a method for integrating the effect of the out-of-plane normal stress, which is known to be important for complex faults but is not accounted for quantitatively when examining the S-factor.

Figure 7. The illustration shows the parameters learned by the ANN for the rupture model. The network has one hidden layer with twelve nodes. The left panel shows the weights that map the inputs to the hidden units. The eight input parameters are on the horizontal scale, and the twelve hidden units are on the vertical scale. All input parameters are normalized to have zero mean and unit standard deviation. The colors in each row indicate how the parameters are combined to form each hidden unit. The right panel shows the weights that are applied to each hidden unit to form the single output unit. A substantial positive value of the output unit indicates that the particular feature combination is predictive of propagation, while a large negative weight of the output unit indicates that the combination is predictive of arrest. These weights provide a physical understanding of the parameters selected by the neural network and give insight into the physics of rupture. This illustration and associated scripts are available under CC-BY Ahamed et al. [2017].

Our model is a simplification, and there is just one barrier with one shape and size, rather than many barriers with a much broader range of shapes and sizes. Although the fault geometry (half-width and height) has a relatively small influence in determining rupture, variations in barrier size still play an important role for the local stress state on the fault surface. It is likely that when considering a full fractal fault, the fault geometry plays a more pronounced role, as there are many more places where the geometry could cause the rupture to arrest. Chester and Chester [2000] used an analytical model of a wavy frictional fault and found that the fault roughness has the most significant impact in determining the orientation and magnitude of principal stress.
In a nonplanar fault geometry, narrower barriers have a more substantial stress perturbation in the sense that the angle near the bend is sharper for barriers of the same height, so the variation in traction at the releasing and restraining bends (Fig. 1) is more prominent. If the barrier is broad, in contrast, the angle around the bend changes gradually, and the stress perturbation at the restraining and releasing bends is less noticeable. Although the fault geometry is not as predictive as the stress and friction, the geometry still does play a role in understanding the rupture process, and the methods developed here give a straightforward way to account for it quantitatively.

We confirm that the parameter combinations found by the ANN approach are robust by repeating the fitting procedure. We build an additional 15 neural network models with the same training data set but different initial weights to see if the models find the same features that are predictive of rupture. The average testing accuracy of the models is 83%, which is very similar to the results shown in Table 5. Although the final weights vary slightly from model to model, they exhibit high correlation when we sort them in ascending order based on their output weight. Fig. 8 shows the coefficient of determination (R2 score) among the parameters (in ascending order) learned by the models. Model-11 has the smallest correlation with the other models, while models 10 and 12 have high correlations. Even though model-11 has the lowest correlation coefficient of 0.71 with model-1, it is high enough that it still contains similar patterns as the other models. The highly correlated weights indicate that the models are picking up on consistent features regardless of the random way that the model is initialized.

Figure 8. The illustration shows the coefficient of determination (R2 score) among the weights learned by fifteen neural network models. Weights are initialized by setting a random seed value for each different model. A single R2 score is calculated by first sorting the output weights, and then correlating the sorted weights with all other model realizations. Then we construct a correlation map from the R2 scores of the fifteen models to see if we obtain similar parameters independent of how the algorithm is initialized. Model-11 has the lowest correlation among all models, while the other models are strongly correlated. This indicates that the model picks up on robust features that are meaningful for understanding the rupture process. This illustration and associated scripts are available under CC-BY Ahamed et al. [2017].

6.2 Classification result

Table 4. Confusion matrix given by the neural network model on 400 testing data

                               Negative    Positive
Negative (Rupture arrested)    TN = 223    FN = 49
Positive (Rupture propagated)  FP = 26     TP = 102

Table 5. Neural network classification results based on 400 testing data

Class                          Precision  Recall  F1 score  Support
Negative (Rupture arrested)    0.90       0.82    0.86      272
Positive (Rupture propagated)  0.68       0.80    0.73      128
Average/Total                  0.83       0.81    0.82      400

We use the same 400 test data to validate the ANN model. Table 4 shows the confusion matrix (CM) that contains information about actual and predicted classifications. Interestingly, the CM of the RF model is essentially the same as that of the ANN model, with two additional cases correctly predicted relative to the RF. The classification report (Table 5) produced by the ANN model also shows nearly the same precision, recall, and F1 scores as the RF model.
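
The scores in Table 5 can be recomputed from the counts in Table 4. The sketch below assumes that each row of Table 4 holds the true class (the row sums, 272 and 128, match the last column of Table 5), so the off-diagonal cells are assigned by row rather than by their printed labels. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tn: int  # arrested ruptures predicted to arrest
    fp: int  # arrested ruptures predicted to propagate
    fn: int  # propagated ruptures predicted to arrest
    tp: int  # propagated ruptures predicted to propagate


def class_scores(correct, missed, spurious):
    """Return (precision, recall, F1) for one class."""
    precision = correct / (correct + spurious)
    recall = correct / (correct + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def report(c):
    """Accuracy plus per-class scores from the four confusion counts."""
    total = c.tn + c.fp + c.fn + c.tp
    return {
        "accuracy": (c.tn + c.tp) / total,
        # For the arrest class, misses are arrested ruptures predicted to
        # propagate, and spurious hits are propagated ruptures predicted to arrest.
        "arrested": class_scores(c.tn, missed=c.fp, spurious=c.fn),
        "propagated": class_scores(c.tp, missed=c.fn, spurious=c.fp),
    }


# Counts read from Table 4 under the row-is-true-class assumption.
ann = ConfusionCounts(tn=223, fp=49, fn=26, tp=102)
scores = report(ann)
print(scores["accuracy"])  # 0.8125
```

Rounded to two decimals, the per-class tuples reproduce the precision, recall, and F1 columns of Table 5.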
The overall testing accuracy of the model is 81.25%. This result suggests that several common classification algorithms are capable of accurately predicting the rupture results to the same level given the same input data. This also suggests that the misclassified test data are due to the limited number of training examples provided to the algorithms.

7 Misclassification analysis

Figure 9. The illustration shows the classified training data in two-dimensional space. We used principal component analysis (PCA) to represent all eight features in 2D space. PCA is a technique that uses singular value decomposition to linearly reduce the data and project it to a lower dimensional space. (a) 2D representation (PCA-1 vs. PCA-2) of training data used for RF. Red and blue dots represent correctly predicted rupture arrest and propagation, respectively, while black and green dots are false negatives (FN) and false positives (FP), respectively. (b) 2D representation (PCA-1 vs. PCA-2) of training data used for ANN. Symbols are the same as in Fig. 9(a). (c) The normalized mean value for each feature of the correctly and incorrectly classified data. In both the RF and ANN models, rupture propagation and arrest are distinguishable based on the lower dimensional projection of the data. False positives are in the transition between true positives and true negatives. When projected into the two-dimensional space, some of the false positives overlap with the bulk of the true positives, while most of the false negatives are located in the same region of parameter space as most ruptures that arrest. This illustration and associated scripts are available under CC-BY Ahamed et al. [2017].

Although the models generally provide robust predictions on the data, they perform less well on a number of false positives where rupture was predicted to propagate but arrested instead. To understand the model performance on the false positives and false negatives, we reduced all the training data into 2D using principal component analysis (PCA). PCA is mathematically defined as an orthogonal linear transformation that transforms multidimensional data (eight dimensions in our case) to a lower dimension. Fig. 9(a) and (b) show the scatter plots of the reduced training data set. The horizontal and vertical axes represent PCA component-1 and -2, respectively. Most of the rupture propagation (true positives) and arrest (true negatives) data are distinguishable in the PCA plots for the RF (Fig. 9(a)) and ANN (Fig. 9(b)).

Fig. 9(c) shows the normalized mean value of each feature of TP, FP, TN, and FN. This gives some insight that the misclassified examples tend to have parameter values that are on the outside edge of the averages of those that are correctly predicted. False positives tend to be located in the transition between true positives and true negatives. Some of the FP also overlap with true positives. Interestingly, the false positives have a similar mean value of all features when compared to the true positives illustrated in Fig. 9(c). Likewise, the false negatives have similar mean feature values when compared to the true negatives. Most of the false negatives are located in the rupture arrest region. These examples likely were misclassified because we have insufficient data, and adding more data (especially more ruptures that propagated) could improve model performance.
Another possible way to solve the problem is to use a Bayesian neural network (BNN) [Gal, 2016; Mullachery et al., 2018]. Future efforts will focus on ways to improve model performance on the misclassified examples.

8 Discussion

In this study, we develop machine learning models using the random forest and artificial neural network algorithms to predict earthquake rupture on a geometrically complex fault. The models provide a robust way to learn the parameter combinations responsible for rupture propagation. Because of the complicated fault geometry, nonlinear rupture process, and unknown material properties, it is difficult to predict if an earthquake can break through a fault [Ohnaka et al., 1986; Kanamori et al., 1993; Marone, 1998; Abercrombie and Rice, 2005; Warren and Shearer, 2006]. Moreover, discontinuous particle velocities across fault zones and tractions acting on the fault are governed by nonlinear friction laws, and obtaining these parameters in situ is challenging [McGarr and Gay, 1978; Saffer and Marone, 2003; Saffer et al., 2001; Kohlstedt et al., 1995]. Therefore, modeling earthquake rupture with a heterogeneous fault surface and unknown material properties remains a challenging computational problem [Douilly et al., 2015; Ripperger et al., 2008; Peyrat et al., 2001]. Our machine learning approach, on the other hand, provides a method to understand the rupture physics on a complex fault and efficiently predict the result with high accuracy if a sufficient number of examples are provided.

Classification results produced by the ANN and RF models show that they can capture most of the underlying patterns of rupture propagation.
Interestingly, both models have a recall of 0.80 to 0.84 on the positive class (rupture propagation; Tables 3 and 5), meaning that both of the models successfully learned most of the underlying complex data patterns responsible for rupture propagation. Another exciting aspect of the ANN model is that it consistently finds the same hidden data patterns despite the various weight initializations. These underlying complex data patterns reflect the physics of rupture propagation. For example, if a fault with geometrical heterogeneity has higher out-of-plane normal and shear stress and low static and dynamic friction, then it is likely that an earthquake rupture can break through the fault, and our model provides a way to evaluate this quantitatively.

These models are highly efficient in determining whether a rupture is going to break through a fault. Both of the models take a fraction of a second to predict if a rupture can propagate given eight input parameters. This is a significant improvement over running a full rupture simulation, which takes about two hours of wall clock time on eight processors. Although the models outperform traditional approaches such as the S-parameter [Andrews, 1976; Das and Aki, 1977] or nondimensional prestress [Bruhat et al., 2016; Kaneko and Lapusta, 2010] in predicting earthquake rupture, this approach is not meant to replace all rupture simulations, but rather to help determine parameter values. This approach might be combined with information such as a historical fault database and paleoseismic rupture areas to choose a parameter range consistent with past earthquakes in a region. The model predictions might also be incorporated into other complex calculations such as inversions or probabilistic seismic hazard analysis (PSHA).
Since the models are computationally efficient, we can find a range of parameters likely to give a specific magnitude, which can be used to run specific scenario events for estimating ground motions. The method could also be adapted into a probabilistic model for earthquake size given a few physically relevant parameters that can account for epistemic uncertainty by robustly considering different parameter value choices. We plan to expand our results to more extensive data sets as well as more realistic geometries drawn from fault databases, and to develop methods that use machine learning to generate rupture models that are based on physics and can be used in PSHA.

9 Conclusion

We use the random forest and artificial neural network algorithms to predict if a rupture can propagate through a fault with a geometric heterogeneity. We first generate 2000 dynamic earthquake rupture simulations, varying the stress, the friction parameters, and the height and half-width of the restraining bend of the fault. In both of the models, 400 rupture examples are used to test the model performance. Both of the models can consistently predict rupture with more than 81% accuracy. Data patterns identified by the models reflect the physics of rupture propagation, and these patterns are robustly identified independently of how the models are initialized.

Computationally, the models are highly efficient. Once the training simulations are computed and the machine learning algorithms are trained, the models can make a prediction within a fraction of a second. This has the potential to allow the results of dynamic rupture simulations to be incorporated into other complex calculations such as inversions or probabilistic seismic hazard analysis, something that would not ordinarily be possible.
The method can also be applied to other complex rupture problems such as branching faults, fault stepovers, and other complex heterogeneities where the physics of earthquake rupture propagation is not fully understood. Machine learning provides a new way of approaching this complex, nonlinear problem, and helps scientists understand how the underlying geophysical parameters are related to the resulting slip and ground motions, thus helping us constrain future seismic hazard and risk.

10 Acknowledgement

This research was supported by the Southern California Earthquake Center (Contribution No. 7759). SCEC is funded by NSF Cooperative Agreement EAR-1033462 & USGS Cooperative Agreement G12AC20038. We also thank CERI and the University of Memphis HPC for providing support and computational resources. The source code for all graphs, plots, and models can be found in the GitHub repository: https://github.com/msahamed/machine_learning_earthquake_rupture.

References

Abercrombie, R. E., and J. R. Rice (2005), Can observations of earthquake scaling constrain slip weakening?, Geophysical Journal International, 162(2), 406–424.
Ahamed, S., E. Daub, and E. Choi (2017), Coupling long-term tectonic loading and short-term earthquake slip, doi:10.6084/m9.figshare.4604344.v4.
Andrews, D. (1976), Rupture velocity of plane strain shear cracks, Journal of Geophysical Research, 81(32), 5679–5687.
Araya-Polo, M., T. Dahlke, C. Frogner, C. Zhang, T. Poggio, and D. Hohl (2017), Automated fault detection without seismic processing, The Leading Edge.
Breiman, L. (1996), Bagging predictors, Machine learning, 24(2), 123–140.
Breiman, L. (2001), Random forests, Machine learning, 45(1), 5–32.
Bruhat, L., Z. Fang, and E. M. Dunham (2016), Rupture complexity and the supershear transition on rough faults, Journal of Geophysical Research: Solid Earth, 121(1), 210–224.
Carbonell, J. G., R. S. Michalski, and T. M. Mitchell (1983), An overview of machine learning, in Machine learning, pp. 3–23, Springer.
Chen, C., A. Liaw, and L. Breiman (2004), Using random forest to learn imbalanced data, University of California, Berkeley, 110, 1–12.
Chester, F. M., and J. S. Chester (2000), Stress and deformation along wavy frictional faults, Journal of Geophysical Research: Solid Earth, 105(B10), 23,421–23,430.
Chollet, F., et al. (2015), Keras, https://github.com/fchollet/keras.
Das, S., and K. Aki (1977), A numerical study of two-dimensional spontaneous rupture propagation, Geophysical journal international, 50(3), 643–668.
Daub, E. G. (2017), Finite difference code for earthquake faulting, https://github.com/egdaub/fdfault/commit/b74a11b71a790e4457818827a94b4b8d3aee7662.
de la Puente, J., J.-P. Ampuero, and M. Käser (2009), Dynamic rupture modeling on unstructured meshes using a discontinuous Galerkin method, Journal of Geophysical Research: Solid Earth, 114(B10).
Dietterich, T. G. (2000), An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization, Machine learning, 40(2), 139–157.
Douilly, R., H. Aochi, E. Calais, and A. Freed (2015), 3D dynamic rupture simulations across interacting faults: The Mw 7.0, 2010, Haiti earthquake, Journal of Geophysical Research: Solid Earth.
Duan, B., and D. D. Oglesby (2006), Heterogeneous fault stresses from previous earthquakes and the effect on dynamics of parallel strike-slip faults, Journal of Geophysical Research: Solid Earth, 111(5), doi:10.1029/2005JB004138.
Duan, B., and D. D. Oglesby (2007), Nonuniform prestress from prior earthquakes and the effect on dynamics of branched fault systems, Journal of Geophysical Research: Solid Earth (1978–2012), 112(B5).
Gal, Y. (2016), Uncertainty in deep learning, University of Cambridge.
Graves, R., T. H. Jordan, S. Callaghan, E. Deelman, E. Field, G. Juve, C. Kesselman, P. Maechling, G. Mehta, K. Milner, et al. (2011), Cybershake: A physics-based seismic hazard model for southern California, Pure and Applied Geophysics, 168(3-4), 367–381.
Hahnloser, R. H., R. Sarpeshkar, M. A. Mahowald, R. J. Douglas, and H. S. Seung (2000), Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit, Nature, 405(6789), 947.
Harris, R. A. (1998), Introduction to special section: Stress triggers, stress shadows, and implications for seismic hazard, Journal of Geophysical Research: Solid Earth, 103(B10), 24,347–24,358.
Harris, R. A., and S. M. Day (1999), Dynamic 3D simulations of earthquakes on en echelon faults, Geophysical Research Letters, 26(14), 2089–2092.
Hinton, G. E., N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov (2012), Improving neural networks by preventing co-adaptation of feature detectors, arXiv preprint arXiv:1207.0580.
Kame, N., J. R. Rice, and R. Dmowska (2003), Effects of prestress state and rupture velocity on dynamic fault branching, Journal of Geophysical Research: Solid Earth (1978–2012), 108(B5).
Kanamori, H., J. Mori, E. Hauksson, T. H. Heaton, L. K. Hutton, and L. M. Jones (1993), Determination of earthquake energy release and ML using TERRAscope, Bulletin of the Seismological Society of America, 83(2), 330–346.
Kaneko, Y., and N. Lapusta (2010), Supershear transition due to a free surface in 3D simulations of spontaneous dynamic rupture on vertical strike-slip faults, Tectonophysics, 493(3-4), 272–284.
Kohlstedt, D., B. Evans, and S. Mackwell (1995), Strength of the lithosphere: Constraints imposed by laboratory experiments, Journal of Geophysical Research: Solid Earth, 100(B9), 17,587–17,602.
Last, M., N. Rabinowitz, and G. Leonard (2016), Predicting the maximum earthquake magnitude from seismic data in Israel and its neighboring countries, PloS one, 11(1), e0146,101.
Lawrence, S., and C. L. Giles (2000), Overfitting and neural networks: conjugate gradient and backpropagation, in Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on, vol. 1, pp. 114–119, IEEE.
Lawrence, S., C. L. Giles, and A. C. Tsoi (1997), Lessons in neural network training: Overfitting may be harder than expected, in AAAI/IAAI, pp. 540–545.
Marone, C. (1998), Laboratory-derived friction laws and their application to seismic faulting, Annual Review of Earth and Planetary Sciences, 26(1), 643–696.
McGarr, A., and N. Gay (1978), State of stress in the Earth's crust, Annual Review of Earth and Planetary Sciences, 6(1), 405–436.
Michie, D., D. J. Spiegelhalter, and C. C. Taylor (1996), Machine learning, neural and statistical classification, Journal of the American Statistical Association, 91(433), 436–438.
Mullachery, V., A. Khera, and A. Husain (2018), Bayesian Neural Networks, ArXiv e-prints.
Ohnaka, M., Y. Kuwahara, K. Yamamoto, and T. Hirasawa (1986), Dynamic breakdown processes and the generating mechanism for high-frequency elastic radiation during stick-slip instabilities, in Earthquake Source Mechanics, vol. 37, pp. 13–24, AGU Washington, DC.
Olsen, K., S. Day, L. Dalguer, J. Mayhew, Y. Cui, J. Zhu, V. Cruz-Atienza, D. Roten, P. Maechling, T. Jordan, et al. (2009), Shakeout: Ground motion estimates using an ensemble of large earthquakes on the southern San Andreas fault with spontaneous rupture propagation, Geophysical Research Letters, 36(4).
Opitz, D. W., and R. Maclin (1999), Popular ensemble methods: An empirical study, J. Artif. Intell. Res. (JAIR), 11, 169–198.
Paolucci, R., F. Gatti, M. Infantino, C. Smerzini, A. Güney Özcebe, and M. Stupazzini (2018), Broadband ground motions from 3D physics-based numerical simulations using artificial neural networks, Bulletin of the Seismological Society of America, 108(3A), 1272–1286.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay (2011), Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, 12, 2825–2830.
Perol, T., M. Gharbi, and M. Denolle (2018), Convolutional neural network for earthquake detection and location, Science Advances, 4(2), e1700,578.
Petersen, M. D., M. P. Moschetti, P. M. Powers, C. S. Mueller, K. M. Haller, A. D. Frankel, Y. Zeng, S. Rezaeian, S. C. Harmsen, O. S. Boyd, N. Field, R. Chen, K. S. Rukstales, N. Luco, R. L. Wheeler, R. A. Williams, and A. H. Olsen (2014), Documentation for the 2014 update of the United States national seismic hazard maps, Tech. rep., Reston, VA, doi:10.3133/ofr20141091.
Peyrat, S., K. Olsen, and R. Madariaga (2001), Dynamic modeling of the 1992 Landers earthquake, Journal of Geophysical Research: Solid Earth (1978–2012), 106(B11), 26,467–26,482.
Polikar, R. (2006), Ensemble based systems in decision making, IEEE Circuits and systems magazine, 6(3), 21–45.
Quinlan, J. R. (1986), Induction of decision trees, Machine learning, 1(1), 81–106.
Ripperger, J., P. Mai, and J.-P. Ampuero (2008), Variability of near-field ground motion from dynamic earthquake rupture simulations, Bulletin of the Seismological Society of America, 98(3), 1207–1228.
Rokach, L. (2010), Ensemble-based classifiers, Artificial Intelligence Review, 33(1), 1–39.
Rosenblatt, F. (1958), The perceptron: A probabilistic model for information storage and organization in the brain, Psychological review, 65(6), 386.
Rouet-Leduc, B., C. Hulbert, N. Lubbers, K. Barros, C. J. Humphreys, and P. A. Johnson (2017), Machine learning predicts laboratory earthquakes, Geophysical Research Letters, doi:10.1002/2017GL074677, 2017GL074677.
Rumelhart, D. E., G. E. Hinton, R. J. Williams, et al. (1988), Learning representations by back-propagating errors, Cognitive modeling, 5(3), 1.
Saffer, D. M., and C. Marone (2003), Comparison of smectite- and illite-rich gouge frictional properties: Application to the updip limit of the seismogenic zone along subduction megathrusts, Earth and Planetary Science Letters, 215(1), 219–235.
Saffer, D. M., K. M. Frye, C. Marone, and K. Mair (2001), Laboratory results indicating complex and potentially unstable frictional behavior of smectite clay, Geophysical Research Letters, 28(12), 2297–2300.
Warren, L. M., and P. M. Shearer (2006), Systematic determination of earthquake rupture directivity and fault planes from analysis of long-period P-wave spectra, Geophysical Journal International, 164(1), 46–62.
Zielke, O., M. Galis, and P. M. Mai (2017), Fault roughness and strength heterogeneity control earthquake size and stress drop, Geophysical Research Letters, 44(2), 777–783.
Zoback, M. D. (2010), Reservoir geomechanics, Cambridge University Press.
