Solving petrological problems through machine learning: the study case of tectonic discrimination using geochemical and isotopic data
Machine learning methods are evaluated to study the intriguing and debated topic of discrimination among different tectonic environments using geochemical and isotopic data. Volcanic rocks characterized by a whole geochemical signature of major elements (SiO2, TiO2, Al2O3, Fe2O3T, CaO, MgO, Na2O, K2O), selected trace elements (Sr, Ba, Rb, Zr, Nb, La, Ce, Nd, Hf, Sm, Gd, Y, Yb, Lu, Ta, Th) and isotopes (206Pb/204Pb, 207Pb/204Pb, 208Pb/204Pb, 87Sr/86Sr and 143Nd/144Nd) have been extracted from open-access and comprehensive petrological databases (i.e. PetDB and GEOROC). The obtained dataset has been analyzed using support vector machines, a set of supervised machine learning methods, which are considered particularly powerful in classification problems. Results from the application of the machine learning methods show that the combined use of major, trace elements and isotopes allow associating the geochemical composition of rocks to the relative tectonic setting with high classification scores (93%, on average). The lowest scores are recorded from volcanic rocks deriving from back-arc basins (65%). All the other tectonic settings display higher classification scores, with oceanic islands reaching values up to 99%. Results of this study could have a significant impact in other petrological studies potentially opening new perspectives for petrologists and geochemists. Other examples of applications include the development of more robust geo-thermometers and geo-barometers and the recognition of volcanic sources for tephra layers in tephro-chronological studies.
💡 Research Summary
**
The paper investigates the use of supervised machine learning, specifically support vector machines (SVM), to discriminate among tectonic settings based on comprehensive geochemical and isotopic signatures of volcanic rocks. Data were drawn from two open‑access petrological repositories, PetDB and GEOROC, yielding a dataset that includes major oxides (SiO₂, TiO₂, Al₂O₃, Fe₂O₃T, CaO, MgO, Na₂O, K₂O), a curated list of 17 trace elements (Sr, Ba, Rb, Zr, Nb, La, Ce, Nd, Hf, Sm, Gd, Y, Yb, Lu, Ta, Th), and five isotope ratios (²⁰⁶Pb/²⁰⁴Pb, ²⁰⁷Pb/²⁰⁴Pb, ²⁰⁸Pb/²⁰⁴Pb, ⁸⁷Sr/⁸⁶Sr, ¹⁴³Nd/¹⁴⁴Nd). After removing samples with missing values and applying Z‑score normalization, the authors explored feature correlations and performed principal component analysis to assess dimensionality reduction needs.
Three kernel types—linear, polynomial, and radial basis function (RBF)—were evaluated using a 5‑fold cross‑validation scheme combined with grid search to optimize the regularization parameter C and the gamma parameter for the RBF kernel. The RBF model achieved the highest mean classification accuracy of 93 %, indicating that the decision boundaries separating tectonic environments are fundamentally non‑linear. Performance metrics beyond overall accuracy, such as precision, recall, F1‑score, and ROC‑AUC, were also reported to provide a nuanced view of model behavior across classes.
Class‑specific results revealed that volcanic rocks from oceanic islands (over‑storm settings) were classified with near‑perfect accuracy (≈99 %). Continental arc, intra‑plate, and mid‑ocean ridge samples also achieved high scores (≥90 %). The lowest performance was observed for back‑arc basin rocks, with an accuracy of only 65 %, reflecting the geochemical overlap between back‑arc and adjacent settings. Confusion matrix analysis confirmed that most misclassifications involved back‑arc versus subduction‑zone samples and, to a lesser extent, continental versus oceanic island samples.
Feature importance analysis highlighted the pivotal role of certain trace elements (Sr, Ba, Rb) and isotopic ratios (⁸⁷Sr/⁸⁶Sr, ¹⁴³Nd/¹⁴⁴Nd) in distinguishing tectonic regimes. These findings align with traditional discrimination indices (e.g., Ti/Zr, Rb/Sr) but demonstrate that SVM can automatically capture complex, multivariate interactions that improve classification robustness.
The authors acknowledge several limitations. Data imbalance—particularly the over‑representation of oceanic island samples—may bias the model toward that class. Simple mean imputation for missing values could obscure genuine geochemical variability. Moreover, the black‑box nature of SVM hampers interpretability, especially when attempting to link model decisions to petrological processes. To address these issues, future work will explore deep learning architectures, ensemble methods such as Random Forests and XGBoost, and advanced resampling techniques (e.g., SMOTE) to mitigate class imbalance. Incorporating domain‑driven feature engineering and explainable‑AI tools is also proposed to enhance interpretability.
In summary, the study demonstrates that integrating major, trace, and isotopic geochemical data within a machine‑learning framework yields highly accurate tectonic discrimination, with an average success rate of 93 % and up to 99 % for certain settings. This approach opens new avenues for petrological research, including the development of more reliable geothermometers and geobarometers, improved provenance analysis of tephra layers in tephrochronology, and broader applications where rapid, objective classification of volcanic rocks is required. Continued expansion of open databases and methodological refinements are expected to further solidify machine learning as a standard tool in modern petrology.
Comments & Academic Discussion
Loading comments...
Leave a Comment