Automated rock joint trace mapping using a supervised learning model trained on synthetic data generated by parametric modelling
This paper presents a geology-driven machine learning method for automated rock joint trace mapping from images. The approach combines geological modelling, synthetic data generation, and supervised image segmentation to address limited real data and class imbalance. First, discrete fracture network models are used to generate synthetic jointed rock images at field-relevant scales via parametric modelling, preserving joint persistence, connectivity, and node-type distributions. Second, segmentation models are trained using mixed training and pretraining followed by fine-tuning on real images. The method is tested in box and slope domains using several real datasets. The results show that synthetic data can support supervised joint trace detection when real data are scarce. Mixed training performs well when real labels are consistent (e.g. the box domain), while fine-tuning is more robust when labels are noisy (e.g. the slope domain, where labels can be biased, incomplete, and inconsistent). Fully zero-shot prediction from the synthetic-trained model remains limited, but useful generalisation is achieved by fine-tuning on a small number of real images. Qualitative analysis shows clearer and more geologically meaningful joint traces than indicated by quantitative metrics alone. The proposed method supports reliable joint mapping and provides a basis for further work on domain adaptation and evaluation.
💡 Research Summary
The paper introduces a geology‑driven machine‑learning workflow for automatically mapping rock joint traces from photographic images. Recognizing that manually annotating thousands of field images is labor‑intensive, subjective, and often incomplete, the authors generate large synthetic datasets that faithfully reproduce the geometric and statistical properties of natural fracture networks.
Synthetic data creation starts with discrete fracture network (DFN) modelling. By parametrically controlling joint persistence, connectivity, and node‑type distributions, the authors produce 2‑D jointed rock images at field‑relevant scales (tens to hundreds of metres). Each synthetic image comes with a perfect pixel‑wise label, eliminating annotation noise and allowing systematic variation of geological scenarios (joint density, orientation, length distribution, etc.).
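The DFN-style generation process can be sketched in a few lines. The snippet below is a toy illustration only, not the authors' pipeline: joint traces are drawn from a small number of orientation sets with an exponential trace-length distribution and rasterized into a pixel-wise label mask. All function and parameter names (`generate_dfn_mask`, `orientation_sets`, `mean_length`) are hypothetical stand-ins for the persistence, connectivity, and density controls described in the paper.

```python
# Hypothetical sketch of DFN-style synthetic label generation; names and
# parameters are illustrative, not taken from the paper.
import math
import random

def generate_dfn_mask(size=128, n_joints=20,
                      orientation_sets=((30, 5), (120, 5)),
                      mean_length=40, seed=0):
    """Rasterize a simple 2-D discrete fracture network into a binary mask.

    Each joint is a straight trace whose orientation is drawn from one of a
    few sets (mean angle, spread, in degrees) and whose length follows an
    exponential distribution -- a toy stand-in for the geological controls
    (persistence, connectivity, density) used in the paper.
    """
    rng = random.Random(seed)
    mask = [[0] * size for _ in range(size)]
    for _ in range(n_joints):
        mu, sigma = rng.choice(orientation_sets)
        theta = math.radians(rng.gauss(mu, sigma))
        length = min(rng.expovariate(1.0 / mean_length), size)
        cx, cy = rng.uniform(0, size), rng.uniform(0, size)
        # Sample points densely along the trace and mark the covered pixels,
        # yielding a perfect pixel-wise label for supervised training.
        n_steps = max(int(2 * length), 2)
        for i in range(n_steps + 1):
            t = i / n_steps - 0.5
            x = int(cx + t * length * math.cos(theta))
            y = int(cy + t * length * math.sin(theta))
            if 0 <= x < size and 0 <= y < size:
                mask[y][x] = 1
    return mask

mask = generate_dfn_mask()
density = sum(map(sum, mask)) / (128 * 128)  # fraction of joint pixels
```

Because every scenario parameter is explicit, scenario sweeps (varying density, orientation sets, or length distributions) reduce to looping over argument values, which is what makes the perfect-label property of synthetic data so convenient.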
For supervised segmentation, a U‑Net‑style convolutional neural network is employed. Three training regimes are compared: (1) pre‑training on synthetic data followed by fine‑tuning on a small set of real images, (2) mixed training that combines synthetic and real images in a single training phase, and (3) a pure zero‑shot approach that uses only the synthetic‑trained model for inference.
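The practical difference between the first two regimes is simply what the network sees per epoch. The sketch below (illustrative only, not the authors' code) contrasts a mixed-epoch sampler with a pretrain-then-fine-tune schedule; `real_fraction` and the epoch counts are assumed knobs, not values from the paper.

```python
# Illustrative contrast of training regimes (1) and (2); not the paper's code.
import random

def mixed_epoch(synthetic, real, real_fraction=0.3, seed=0):
    """Regime (2): one shuffled epoch mixing synthetic and real samples.

    Real images are oversampled with replacement so they are not drowned
    out by the much larger synthetic set.
    """
    rng = random.Random(seed)
    n_real = max(1, int(real_fraction * len(synthetic)))
    epoch = list(synthetic) + [rng.choice(real) for _ in range(n_real)]
    rng.shuffle(epoch)
    return epoch

def finetune_schedule(synthetic, real, pretrain_epochs=10, finetune_epochs=3):
    """Regime (1): synthetic-only pretraining, then real-only fine-tuning."""
    return ([("pretrain", synthetic)] * pretrain_epochs
            + [("finetune", real)] * finetune_epochs)

syn = [f"syn_{i}" for i in range(10)]   # placeholders for image/label pairs
real = ["real_0", "real_1"]
epoch = mixed_epoch(syn, real)
schedule = finetune_schedule(syn, real)
```

Regime (3), zero-shot inference, needs no schedule at all: the synthetic-trained weights are applied to real images directly.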
Experiments are conducted in two domains. The “box” domain contains relatively clean, consistent labels, while the “slope” domain suffers from biased, incomplete, and noisy annotations typical of field surveys. In the box domain, mixed training yields the highest Intersection‑over‑Union (IoU ≈ 0.71) and F1 scores, demonstrating that synthetic data can effectively supplement scarce real data when the ground truth is reliable. In the slope domain, mixed training tends to overfit the noisy labels; a fine‑tuned model trained on a few real images outperforms mixed training (IoU ≈ 0.64, F1 ≈ 0.71), indicating that fine‑tuning is more robust to label imperfections.
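The IoU and F1 scores quoted above are standard pixel-wise overlap measures for binary masks. As a reference for how they relate, a minimal computation (generic metric code, not the paper's evaluation script):

```python
def iou_f1(pred, truth):
    """Pixel-wise IoU and F1 (Dice) for binary masks given as flat 0/1 lists."""
    tp = sum(p and t for p, t in zip(pred, truth))        # joint pixel hit
    fp = sum(p and not t for p, t in zip(pred, truth))    # false joint pixel
    fn = sum(t and not p for p, t in zip(pred, truth))    # missed joint pixel
    denom = tp + fp + fn
    iou = tp / denom if denom else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if denom else 1.0
    return iou, f1

iou, f1 = iou_f1([1, 1, 0, 0], [1, 0, 1, 0])  # iou = 1/3, f1 = 0.5
```

Note that F1 (Dice) is always at least as large as IoU (F1 = 2·IoU/(1+IoU)), which is why the slope-domain F1 ≈ 0.71 sits above its IoU ≈ 0.64. Both ignore true negatives, which matters for thin joint traces where background pixels dominate.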
Zero‑shot performance remains modest, confirming that synthetic images alone cannot capture all the visual variability of real outcrops. Nevertheless, the synthetic‑trained model does learn generic joint patterns that transfer partially to real scenes, providing a useful initialization for subsequent fine‑tuning.
A key insight is that quantitative metrics (IoU, Dice) do not fully capture model usefulness; visual inspection reveals that the models often produce geologically plausible joint networks even when metric scores are modest, especially in cases where ground‑truth labels are incomplete. The authors therefore advocate a combined quantitative‑qualitative evaluation approach.
The contributions of the work are threefold: (1) a reproducible parametric DFN‑based synthetic data pipeline that can generate diverse, geologically realistic training sets; (2) an empirical comparison of training strategies that links label quality to the optimal approach (mixed training for clean labels, fine‑tuning for noisy labels); and (3) evidence that a small amount of real data can unlock the generalisation potential of a model pre‑trained on synthetic data, dramatically reducing the manual labeling burden.
Limitations include the current focus on 2‑D imagery (no integration of 3‑D point clouds or multi‑modal data), the exclusion of other structural features such as bedding or fault zones, and the still‑limited zero‑shot transfer capability. Future work is outlined to incorporate 3‑D volumetric data, apply domain‑adaptation techniques (e.g., adversarial training, style transfer), and model label uncertainty using Bayesian deep learning.
In summary, the study demonstrates that geology‑informed synthetic data, when combined with appropriate training strategies, can substantially alleviate data scarcity in rock joint trace mapping, offering a practical pathway toward more automated, consistent, and scalable structural characterization in geotechnical and geological engineering.