The California Bearing Ratio (CBR) is a key geotechnical indicator used to assess the load-bearing capacity of subgrade soils, especially in transportation infrastructure and foundation design. Traditional CBR determination relies on laboratory penetration tests. Although accurate, these tests are time-consuming and costly, and they can be impractical for large-scale projects or diverse soil profiles. Recent progress in artificial intelligence, especially machine learning (ML), has enabled data-driven approaches that model complex soil behavior quickly and accurately. This study introduces a comprehensive ML framework for CBR prediction using a dataset of 382 soil samples collected from various geoclimatic regions in Türkiye. The dataset includes physicochemical soil properties relevant to bearing capacity, allowing multidimensional feature representation in a supervised learning context. Twelve ML algorithms were tested: decision tree, random forest, extra trees, gradient boosting, XGBoost, k-nearest neighbors, support vector regression, multi-layer perceptron, AdaBoost, bagging, voting, and stacking regressors. Each model was trained, validated, and evaluated to assess its generalization and robustness. Among them, the random forest regressor performed best, achieving R² scores of 0.95 (training), 0.76 (validation), and 0.83 (test). These outcomes highlight the model's nonlinear mapping ability, making it a promising tool for predictive geotechnical tasks. The study supports the integration of data-centric models in geotechnical engineering, offering an effective alternative to traditional methods and promoting digital transformation in infrastructure analysis and design.
The California Bearing Ratio (CBR) test, first developed in 1929 by the State Highway Research Office in California to determine the bearing capacity of soils used in highway infrastructure, is a method for investigating the strength of highway and airport pavements [1], [2]. CBR is defined as the ratio of the resistance of the soil at a given penetration depth, measured with a 49.63 mm diameter piston driven into the soil at a rate of 1.27 mm/min, to the resistance of a standard crushed stone sample at the same penetration depth [3]. The CBR value of the soil is therefore calculated by comparing the loads measured at specific penetration depths with the standard loads established for crushed stone. CBR tests can be conducted either in the laboratory or in the field. In the laboratory, the test can be performed as a wet CBR or a dry CBR test. The purpose of the wet CBR test is to determine the minimum bearing capacity, which corresponds to the condition in which the voids are completely filled with water. CBR is calculated at penetration depths of 2.5 mm and 5 mm. The CBR value at the 2.5 mm penetration depth is normally used in the design phase. If the CBR value at the 5 mm penetration depth is greater than the value at 2.5 mm, the test is repeated. If the 5 mm value is still higher in the repeated test, the higher CBR value is used [4].
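The ratio and the selection rule described above can be illustrated with a minimal Python sketch. The standard loads for crushed stone (13.2 kN at 2.5 mm and 20.0 kN at 5 mm) are assumed here because the text does not state them; published values differ slightly between test standards. The function names are hypothetical.

```python
# Standard loads for the crushed-stone reference, in kN, keyed by penetration
# depth in mm. ASSUMED values (not given in the text); they vary by standard.
STANDARD_LOAD_KN = {2.5: 13.2, 5.0: 20.0}


def cbr_percent(measured_load_kn, penetration_mm):
    """CBR (%) = measured load / standard load at the same penetration * 100."""
    return 100.0 * measured_load_kn / STANDARD_LOAD_KN[penetration_mm]


def design_cbr(first_test, repeat_test=None):
    """Apply the 2.5 mm / 5 mm selection rule.

    Each test is a (load at 2.5 mm, load at 5 mm) pair in kN.
    Returns the design CBR in percent, or None when the 5 mm value exceeds
    the 2.5 mm value and no repeat test has been supplied yet.
    """
    cbr_25 = cbr_percent(first_test[0], 2.5)
    cbr_50 = cbr_percent(first_test[1], 5.0)
    if cbr_50 <= cbr_25:
        return cbr_25  # normal case: the 2.5 mm value governs
    if repeat_test is None:
        return None  # the test has to be repeated
    rep_25 = cbr_percent(repeat_test[0], 2.5)
    rep_50 = cbr_percent(repeat_test[1], 5.0)
    # If the 5 mm value is still higher in the repeat, the higher value is used.
    return max(rep_25, rep_50)
```

For example, `design_cbr((1.32, 1.5))` returns the 2.5 mm value directly, whereas `design_cbr((1.0, 2.0))` returns `None` to signal that a repeat test is needed.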
Although CBR testing provides useful information on the strength of road and airport pavements, it involves time-consuming and laborious procedures. This has led researchers to study indirect methods of obtaining the CBR value. The index and compaction characteristics of soils have frequently been used in statistical approaches to determine the CBR value [5]-[16]. In addition, with the development of computer technologies, many studies have estimated the CBR value using different artificial intelligence techniques. Early studies using different artificial neural network (ANN) architectures were followed by research based on various machine learning techniques [17]-[25].
Given the limitations of conventional testing methods and the increasing demand for rapid, reliable geotechnical assessments, this study explores the viability of machine learning algorithms for estimating CBR values from readily available soil parameters. The predictive models are built on a diverse and regionally representative dataset of 382 soil samples from various geographical locations across Türkiye. The dataset covers a broad spectrum of soil properties, which enables comprehensive input characterization for the machine learning algorithms. In this context, twelve distinct regression models were implemented and comparatively evaluated, including ensemble-based approaches (Random Forest, Gradient Boosting, XGBoost, AdaBoost, Bagging, Extra Trees, Voting, and Stacking), tree-based models (Decision Tree), kernel-based algorithms (Support Vector Regression), instance-based methods (K-Nearest Neighbours), and neural network architectures (Multi-Layer Perceptron). The aim is to assess the feasibility and predictive capability of these models in capturing the complex relationships between soil index properties and bearing performance, thereby offering a computational framework for efficient, scalable, and field-applicable CBR estimation.
The data used in this study consist of the CBR, Standard Proctor (SP), and index test results of 382 soil samples. These test results were obtained from the laboratory archives of the regional branches of the General Directorate of Türkiye Highways. The database contains the CBR value, maximum dry density (MDD), optimum moisture content (OMC), liquid limit (LL), plasticity index (PI), fines content (FC), sand content (SC), and gravel content (GC) of each sample. The soil samples cover diverse types of both fine-grained and coarse-grained soils. A statistical description of the data is given in Table 1.
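One possible way to represent a record of this database is sketched below. The field names follow the abbreviations above, but the units, the sanity checks, and the sample values are illustrative assumptions and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class SoilSample:
    """One database record. Units are ASSUMED; the text does not state them."""
    cbr: float  # California Bearing Ratio, % (prediction target)
    mdd: float  # maximum dry density (unit assumed, e.g. g/cm^3)
    omc: float  # optimum moisture content, %
    ll: float   # liquid limit, %
    pi: float   # plasticity index, %
    fc: float   # fines content, %
    sc: float   # sand content, %
    gc: float   # gravel content, %

    def features(self):
        """Input vector for a regressor: everything except the CBR target."""
        return [self.mdd, self.omc, self.ll, self.pi, self.fc, self.sc, self.gc]

    def issues(self, tol=1.0):
        """Return basic consistency problems (hypothetical checks)."""
        problems = []
        # Assumes FC, SC and GC are mass percentages of the whole sample.
        if abs(self.fc + self.sc + self.gc - 100.0) > tol:
            problems.append("gradation fractions do not sum to 100%")
        # PI is LL minus the plastic limit, so it cannot exceed LL.
        if self.pi > self.ll:
            problems.append("plasticity index exceeds liquid limit")
        if self.cbr <= 0:
            problems.append("CBR must be positive")
        return problems


# Made-up values, for illustration only.
sample = SoilSample(cbr=12.0, mdd=1.85, omc=14.0, ll=35.0, pi=15.0,
                    fc=40.0, sc=45.0, gc=15.0)
```

Checks of this kind are a cheap way to catch transcription errors in archived laboratory records before any model is trained.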
Predicting natural phenomena with high accuracy remains a challenge for artificial intelligence techniques because of the inherent complexity of these phenomena. Over the last decade, machine learning (ML) algorithms, a subset of artificial intelligence techniques, have achieved successful results in predicting them. In addition to their predictive accuracy, ML algorithms can make efficient use of computational resources such as CPUs and GPUs. In this study, the following methods, which have successful applications in the literature, were implemented in Python to estimate the California Bearing Ratio (CBR): XGBoost, Random Forest, Support Vector Regression (SVR), AdaBoost, Multi-layer Perceptron (MLP), K-nearest neighbors (K-NN), Bagging, Extra Trees, Voting, Gradient Boosting, Decision Tree, and Stacking. The study also employed the grid search method and ML techniques, hyper parametrizi
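As an illustration of how grid search can be combined with one of the listed regressors, the following sketch tunes a random forest with scikit-learn and reports R² on training, validation, and test subsets. The data are synthetic placeholders, and the 60/20/20 split, the parameter grid, and the 3-fold cross-validation are assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the 382 x 7 feature matrix (MDD, OMC, LL, PI, FC, SC, GC).
rng = np.random.default_rng(0)
X = rng.uniform(size=(382, 7))
y = 20 * X[:, 0] + 10 * X[:, 1] * X[:, 2] + rng.normal(scale=1.0, size=382)

# Assumed 60/20/20 split into training, validation, and test subsets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

# Small illustrative grid; each combination is scored by 3-fold CV on R^2.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training subset,
# so search.predict uses that refitted model.
subsets = {"train": (X_train, y_train),
           "validation": (X_val, y_val),
           "test": (X_test, y_test)}
scores = {name: r2_score(y_part, search.predict(X_part))
          for name, (X_part, y_part) in subsets.items()}

print(search.best_params_)
print(scores)
```

Reporting all three scores side by side, as the study does, makes any gap between training and held-out performance visible.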