Advancing resistivity-chargeability modeling for complex subsurface characterization using machine learning and deep learning

Subsurface lithological heterogeneity presents challenges for traditional geophysical methods, particularly in resolving nonlinear electrical resistivity and induced polarization (IP) relationships. This study introduces a data-driven machine learning and deep learning (ML/DL) framework for predicting 2D IP chargeability models from resistivity, depth, and station distance, reducing reliance on field IP surveys. The framework integrates ensemble regressors with a one-dimensional convolutional neural network (1D CNN) enhanced by global average pooling. Among the tested models, CatBoost achieved the highest prediction accuracy (R^2 = 0.942 training, 0.945 testing), closely followed by random forest, while the stacked ML/DL ensemble further improved performance, particularly for complex resistivity-IP behaviors. Overall accuracy ranged from R^2 = 0.882 to 0.947 with RMSE < 0.04. Integration with k-means clustering enhanced lithological discrimination, effectively delineating sandy silt, silty sand, and weathered granite influenced by saturation, clay content, and fracturing. This scalable approach provides a rapid solution for subsurface modeling in exploration, geotechnical, and environmental applications.

💡 Research Summary

The paper tackles a long‑standing problem in geophysical exploration: the nonlinear and heterogeneous relationship between electrical resistivity and induced‑polarization (IP) chargeability. Traditional methods rely on costly field IP surveys to obtain chargeability maps, yet these surveys are time‑consuming, environmentally intrusive, and often impractical in difficult terrain. To overcome these limitations, the authors develop a data‑driven machine‑learning and deep‑learning (ML/DL) framework that predicts two‑dimensional IP chargeability solely from conventional resistivity measurements, depth, and the distance between the measurement station and the target location.

Data preparation and problem formulation
A hybrid dataset of 5,000 samples is assembled from real field campaigns and physics‑based forward simulations. Each sample consists of three input features: (1) resistivity value, (2) investigation depth, and (3) station‑to‑target distance. The target output is the chargeability value on a 1‑meter grid, forming a 2‑D chargeability image. Standard preprocessing steps—log transformation, min‑max scaling, and missing‑value interpolation—are applied, and the data are split into training (70 %), validation (20 %), and test (10 %) subsets.
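The preprocessing and split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `log_minmax` and `split_indices`, the base-10 logarithm, the random seed, and the sample resistivity values are all hypothetical, and missing-value interpolation is omitted.

```python
import math
import random

def log_minmax(values):
    """Log10-transform positive values, then min-max scale them to [0, 1]."""
    logged = [math.log10(v) for v in values]
    lo, hi = min(logged), max(logged)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [(v - lo) / span for v in logged]

def split_indices(n, train=0.7, val=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Illustrative resistivity values in ohm-m (assumed, not taken from the paper).
resistivity = [10.0, 100.0, 1000.0]
scaled = log_minmax(resistivity)

# 70 % / 20 % / 10 % split of the 5,000 samples mentioned in the text.
train_idx, val_idx, test_idx = split_indices(5000)
```

In practice the scaling bounds would normally be computed on the training subset only and then reused for the validation and test subsets, so that no information leaks from held-out data.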

Machine‑learning models
Five state‑of‑the‑art ensemble regressors are evaluated: CatBoost, XGBoost, LightGBM, Random Forest, and Extra Trees. Hyper‑parameter tuning is performed via Bayesian optimization combined with five‑fold cross‑validation to avoid over‑fitting. CatBoost emerges as the top performer among the pure ML models, achieving a training R² of 0.942 and a test R² of 0.945, with an RMSE of 0.031. Random Forest follows closely (R² ≈ 0.938), while the other gradient‑boosting models (XGBoost and LightGBM) lag slightly behind.
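For reference, the two scores quoted for each model are the coefficient of determination (R²) and the root-mean-square error (RMSE). A small sketch of how they are computed is given here; the function names and the four chargeability values are illustrative assumptions, not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical normalized chargeability values and model predictions.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.12, 0.18, 0.33, 0.37]

score_r2 = r2_score(y_true, y_pred)
score_rmse = rmse(y_true, y_pred)
```

Because RMSE is expressed in the units of the target, the values reported in the summary are only meaningful relative to the (scaled) chargeability range used by the authors.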

Deep‑learning architecture
A one‑dimensional convolutional neural network (1D‑CNN) is designed to capture sequential patterns in the ordered input (distance → depth). The network comprises three Conv1D layers (64, 128, 256 filters) with ReLU activations, a Global Average Pooling (GAP) layer for dimensionality reduction, and three fully‑connected layers (256 → 128 → 1) that output the predicted chargeability. This architecture is lightweight compared with 2‑D CNNs, yet well‑suited for the 1‑D nature of the predictor variables. When used alone, the 1D‑CNN reaches R² = 0.902 and RMSE = 0.045 on the test set.
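To make the role of the convolution and the Global Average Pooling layer concrete, the sketch below runs a single convolutional layer and GAP by hand in plain Python. It is a toy forward pass, not the authors' network: the input ordering, the two filters, and their weights are assumptions chosen for readability, whereas the described model stacks three Conv1D layers with 64, 128, and 256 learned filters.

```python
def conv1d_relu(x, kernels, biases):
    """'Same'-padded 1D convolution followed by ReLU.

    x:       list of length L; each item is a list of C_in channel values.
    kernels: list of C_out filters, each shaped [K][C_in] with K odd.
    Returns a list of length L; each item is a list of C_out activations.
    """
    length, width = len(x), len(kernels[0])
    pad = width // 2
    zero = [0.0] * len(x[0])
    padded = [zero] * pad + x + [zero] * pad
    out = []
    for pos in range(length):
        window = padded[pos:pos + width]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias + sum(
                w * v
                for k_row, x_row in zip(kernel, window)
                for w, v in zip(k_row, x_row)
            )
            row.append(max(0.0, total))  # ReLU
        out.append(row)
    return out

def global_average_pool(feature_map):
    """Average each channel over the sequence axis: (L, C) -> (C,)."""
    length = len(feature_map)
    return [sum(channel) / length for channel in zip(*feature_map)]

# One scaled sample treated as a length-3, single-channel sequence
# (assumed order: distance, depth, resistivity).
sample = [[0.2], [0.5], [0.9]]
kernels = [
    [[1.0], [1.0], [1.0]],   # hypothetical filter: local sum
    [[-1.0], [0.0], [1.0]],  # hypothetical filter: local difference
]
biases = [0.0, 0.0]

features = conv1d_relu(sample, kernels, biases)  # shape (3, 2)
pooled = global_average_pool(features)           # shape (2,)
```

GAP collapses the sequence axis, so the dense layers that follow receive one value per filter regardless of the input length, which keeps the parameter count low.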

Stacked ensemble (ML + DL)
To exploit complementary strengths, the authors construct a stacked model that feeds the predictions of the ensemble regressors into the 1D‑CNN as additional features. This hybrid model pushes performance further, achieving an overall R² of 0.947 and an RMSE of 0.028 on the unseen test data. Notably, the stacked model reduces errors by more than 30 % in zones where resistivity‑chargeability relationships are highly nonlinear (e.g., high saturation with clay‑rich lithologies).
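One common way to build such stacked inputs is to append out-of-fold base-model predictions to the original features, so that the second-stage model never sees a prediction made on a sample the base model was trained on. The sketch below illustrates that idea only; whether the authors used out-of-fold predictions is not stated, and the two base learners (`fit_mean`, `fit_nearest`) are deliberately trivial hypothetical stand-ins for the tuned ensemble regressors.

```python
def kfold_indices(n, k=5):
    """Yield (train_idx, holdout_idx) pairs for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        holdout = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, holdout
        start += size

def fit_mean(X, y):
    """Hypothetical base learner: always predicts the training mean."""
    mean_y = sum(y) / len(y)
    return lambda row: mean_y

def fit_nearest(X, y):
    """Hypothetical base learner: 1-nearest-neighbour regression."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for x in X]
        return y[dists.index(min(dists))]
    return predict

def stack_features(X, y, base_fitters, k=5):
    """Append out-of-fold base predictions to each feature row."""
    oof = [[0.0] * len(base_fitters) for _ in X]
    for train, holdout in kfold_indices(len(X), k):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for j, fit in enumerate(base_fitters):
            model = fit(X_tr, y_tr)
            for i in holdout:
                oof[i][j] = model(X[i])
    return [list(row) + preds for row, preds in zip(X, oof)]

# Toy data: 10 samples with 3 scaled features each (illustrative only).
X = [[i / 9.0, (9 - i) / 9.0, 0.5] for i in range(10)]
y = [0.05 * i for i in range(10)]

# Each row now has 3 original features + 2 base-model predictions.
X_stacked = stack_features(X, y, [fit_mean, fit_nearest], k=5)
```

The augmented rows in `X_stacked` would then be passed to the second-stage model, here the 1D‑CNN. With real data the samples should be shuffled before folding; contiguous folds are used only to keep the sketch deterministic.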

Clustering for lithological discrimination
The predicted chargeability fields are subjected to k‑means clustering (k = 3). The resulting clusters correspond to (1) sandy‑silt mixtures, (2) silty‑sand with higher saturation, and (3) weathered granite characterized by low saturation, elevated clay content, and extensive fracturing. These clusters align with independent geological logs, demonstrating that the ML/DL‑derived chargeability adds discriminative power beyond resistivity alone.
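The clustering step can be illustrated with a bare-bones k‑means (Lloyd's algorithm) on scalar chargeability values. This is a sketch under assumptions: it clusters chargeability alone, uses a deterministic quantile-style initialization rather than whatever initialization the authors chose, and the nine input values are invented for illustration.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels

# Hypothetical predicted chargeability values forming three groups.
chargeability = [0.02, 0.03, 0.025, 0.10, 0.11, 0.12, 0.30, 0.28, 0.31]
centroids, labels = kmeans_1d(chargeability, k=3)
```

Mapping the resulting cluster indices to lithological units (sandy silt, silty sand, weathered granite) still requires independent geological information such as the logs mentioned above; the algorithm itself only groups similar values.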

Key contributions and implications

Cost‑effective chargeability estimation – The framework reduces reliance on extensive field IP surveys, offering rapid, high‑resolution chargeability maps from standard resistivity data.
Robust handling of nonlinearity – The stacked ML/DL model captures complex resistivity‑chargeability interactions that linear or single‑model approaches miss.
Enhanced lithological interpretation – Integration with unsupervised clustering reveals subtle variations in saturation, clay content, and fracture density, supporting more accurate subsurface characterization.
Scalability – The methodology is computationally efficient, making it suitable for large‑scale exploration, geotechnical site assessment, and environmental monitoring.

Future directions
The authors propose extending the architecture to three‑dimensional CNNs or transformer‑based sequence models to incorporate spatial context more fully. Real‑time streaming of resistivity data for online model updating, as well as transfer learning across different geological settings, are identified as promising avenues to broaden applicability.

In summary, this study demonstrates that a carefully engineered combination of ensemble machine‑learning regressors and a 1D convolutional neural network can predict IP chargeability from conventional resistivity data with a test R² of up to 0.947, reducing reliance on field IP surveys while delivering detailed lithological insights. The approach holds significant potential for accelerating decision‑making in mineral exploration, groundwater assessment, and environmental remediation projects.

📜 Original Paper Content