Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms

Reading time: 20 minute
...

๐Ÿ“ Original Info

  • Title: Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms
  • ArXiv ID: 2601.01009
  • Date: 2026-01-03
  • Authors: Mojtaba Aliasghar-Mamaghani, Mohammadreza Khalafi

๐Ÿ“ Abstract

This paper employs a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessing the service life of civil infrastructure subjected to aggressive environments. The adopted methodology relies on several simple and complex standalone machine learning (ML) algorithms, with the primary objective of establishing confidence in the unbiased prediction of the underlying hidden correlations. The simple algorithms include linear regression (LR), k-nearest neighbors (KNN) regression, and kernel ridge regression (KRR). The complex algorithms entail support vector regression (SVR), Gaussian process regression (GPR), and two families of artificial neural networks, including a feedforward network (multilayer perceptron, MLP) and a gated recurrent unit (GRU). The MLP architecture cannot explicitly handle sequential data, a limitation addressed by the GRU. A comprehensive dataset is considered. The performance of ML algorithms is evaluated, with KRR, GPR, and MLP exhibiting high accuracy. Given the diversity of the adopted concrete mixture proportions, the GRU was unable to accurately reproduce the response in the test set. Further analyses elucidate the contributions of mixture compositions to the temporal evolution of chloride. The results obtained from the GPR model unravel latent correlations through clear and explainable trends. The MLP, SVR, and KRR also provide acceptable estimates of the overall trends. The majority of mixture components exhibit an inverse relation with chloride content, while a few components demonstrate a direct correlation. These findings highlight the potential of surrogate approaches for describing the physical processes involved in chloride ingress and the associated correlations, toward the ultimate goal of enhancing the service life of civil infrastructure.

๐Ÿ“„ Full Content

Corrosion is one of the predominant causes of longterm deterioration of civil infrastructure components. Concrete bridges are commonly employed construction systems within the daily commute networks. During the early stages of concrete setting, a protective film forms around the reinforcing materials due to the high alkalinity of the concrete medium. The formation of this protective layer places the reinforcing materials in a passive state, effectively reducing the risk of corrosion. Electrochemical oxidation-reduction reactions characterize the propensity for corrosion. The breakdown of the protective film leads to a substantial increase in the rate of corrosion reactions, a critical point after which significant serviceability issues occur [1][2][3][4][5].

Corrosion imposes several deleterious effects on concrete structures. In particular, it adversely affects the mechanical characteristics and deformability of the reinforcing materials, as well as the bond properties between steel and concrete [6][7][8][9][10][11][12]. Furthermore, the formation of corrosive products–which often exhibit a volumetric expansion 2 to 6.4 times greater than the original reactants [13]–induces substantial passive pressure in the surrounding concrete, leading to the development of tensile strains and, subsequently, deterioration or spalling in affected regions [14][15][16].

The serviceability of concrete structures is profoundly affected by the content of chloride at the level of the reinforcing materials. Chloride ions destroy the protective layer initially formed around the steel reinforcement, thereby depassivating the steel, and inducing a substantial increase in the corrosion rate [1,[17][18][19]5]. Several prior studies have proposed replacing conventional steel reinforcement with alternative materials to circumvent the serviceability issues associated with steel corrosion [20][21][22][23]. Despite the significant efforts made, reinforced and prestressed concrete structures remain predominant worldwide [12,[24][25][26].

Several previous studies have focused on the description of chloride ingress in the concrete pore network. The intrusion of chloride in concrete can be simply described by a closed-form equation on the basis of Fick’s law of diffusion [2,27]. Advanced finite element approaches have been proposed to capture several interacting phenomena involved in chloride ingress [28,17,18,29,5]. The potential caveat of such numerical methodologies pertains to the several interacting physical mechanisms inherent to the problem setting, a coupled procedure that often entails substantial computational expense [29,5].

In light of the aforementioned limitations, several studies have endeavored to develop surrogate frameworks on the basis of Machine Learning (ML) algorithms to describe the evolution of chloride in concrete. The primary motivation for such efforts lies in the considerable computational effort associated with physics-based modeling schemes, with the ultimate goal of delivering a cost-effective approach [30][31][32][33][34][35][36].

Various approaches have been adopted in the past. Simplified approaches have primarily focused on quantifying the chloride diffusion coefficient involved in the closed-form solution [32,30,34,31,33]. The capability of the physicsinformed approaches to reproduce chloride content has also been investigated [35]. Recent advances have further shown the ability of several ML algorithms to accurately reproduce the chloride content in concrete while concurrently accounting for various interacting phenomena [36].

The vast majority of previous efforts have focused on an isolated facet of the problem, with particular emphasis on determining the diffusivity coefficient entailed in the closedform solution. Considering the complexity inherent in the problem setting, the temporal evolution of chloride has been rarely investigated through ML techniques. The correlation between concrete mixture compositions and the temporal evolution of chloride is another important aspect that has received limited attention. This is of significant importance, as it directly characterizes the service life of civil infrastructure in extreme environments. The need for such capabilities has become increasingly urgent considering the growing concerns pertaining to the impacts of climate change and structural aging on the service life of structures in harsh environments. The availability of such knowledge substantially reinforces the understanding required for optimizing concrete mixture properties and, subsequently, for the design of resilient and long-lasting civil infrastructure.

In order to address the foregoing research need, this study employs various standalone ML algorithms to describe the temporal evolution of chloride in concrete. The methodology incorporates three simple ML algorithms, including linear regression, k-nearest neighbors regression, and kernel ridge regression, as well as four more complex algorithms, including support vector regression, Gaussian process regressor, and two families of artificial neural networks encompassing a feedforward network (i.e., multilayer perceptron, MLP), and a gated recurrent unit. The latter was particularly employed to investigate the sequential response. A wide range of concrete mixture properties, along with various environmental factors, is included in the dataset. The accuracy of the algorithms is evaluated. Further analyses elucidate the contributions of mixture compositions to the temporal evolution of chloride and, subsequently, the role of these features in enhancing the service life of civil infrastructure in harsh environments.

The development of the models is driven by a variety of standalone ML algorithms, including three relatively simple approaches and four more complex methods, with the objective of identifying the hidden correlations between the temporal evolution of chloride and concrete mixture properties. The primary motivation for employing multiple ML algorithms is to evaluate the capabilities of each model and to establish confidence in the unbiased prediction of the underlying hidden correlations.

The first class of algorithms is the linear regression (LR) method. The algorithm provides a simple estimate of the target variable by establishing a linear relation between the input features and the output. The model entails parameters that are estimated using the least squares method by minimizing the sum of squared residuals [37].

The second simple type of algorithm considered rely on k-nearest neighbors (KNN) regression. The specific method is a non-parametric, instance-based learning regressor that makes use of k points (closest neighbors) in the feature space to predict the output. In this study, the prediction is obtained using a weighted average of these neighbors [38,37].

Kernel ridge regression (KRR) is another ML algorithm adopted in this study. The method builds on ridge regression and makes use of kernel trick to implicitly map input features into a higher-dimensional space. The ridge component introduces a penalty parameter to prevent overfitting and improve generalization by shrinking the coefficients [39,37].

Support vector regression (SVR) is a commonly employed ML algorithm across a variety of fields in artificial intelligence. Kernel ridge regression (KRR) and SVR are both grounded in kernel-based learning frameworks, wherein a kernel function is adopted to establish correlations in highdimensional feature space; however, their respective loss functions and optimization objectives are different. The SVR formulation adopted in this study is the ๐œ–-SVR model, which permits errors to reside within the ๐œ–-insensitive tube while penalizing deviations outside this region [40][41][42].

Gaussian process regression (GPR) is another class of algorithms adopted in this study, a nonparametric Bayesian model, which also employs the kernel trick to characterize underlying patterns. The development of the model is driven by the assumption that the underlying function values, evaluated across the feature space, follow a joint Gaussian distribution. GPR further incorporates an independent noise term in the predictions, which is modeled as a Gaussian random variable, typically with zero mean. The kernel function employed in this study is the radial basis function [43,44].

The final class of algorithms relies on artificial neural networks (ANNs). ANNs were initially inspired by biological neural networks, with the objective of reproducing human-like performance [45,46]. The first type of ANN considered in this study is the multilayer perceptron (MLP), which aims to capture the rather complex relation between the input and output features. The MLP is a subclass of feedforward neural networks, with an architecture that enables the information to evolve in a forward direction. The architecture makes use of multiple layers, each with numerous neurons interconnected through synaptic, weighted links. Each neuron contains a computational unit that receives inputs from the preceding layer and subsequently delivers an output, typically following an operation through an activation function. The training of the algorithm is commonly conducted through a back-propagation procedure that employs a gradient-descent scheme for optimization. The latter is performed on the weight factors associated with the synaptic links so as to enhance the efficiency of the network.

The architecture of the commonly-employed MLP algorithms is inherently devoid of the capability to explicitly handle sequential modeling. Therefore, the final advanced ML algorithm adopted in this study relies on the Gated Recurrent Unit (GRU) [47]. The GRU is a subclass of Recurrent Neural Networks (RNNs) with gating mechanisms that adaptively capture temporal dependencies, while mitigating some of the caveats of the traditional RNNs, such as the vanishing gradient problem. The motivation for employing GRUs in this study stems from the inherent temporal dependencies present in the dataset.

The dataset adopted in this study relies on the experimental tests conducted on concrete specimens in the literature [48-50, 17, 51-55] and is available through an online repository [56]. The present study primarily focuses on diffusion and neglects advection. In this respect, the dataset is devoid of any experimental instance with a driving force governed by moisture flow (advection). The implication of this assumption on the existing dataset [56,36] was to exclude moisture-driven data (advection) and solely focus on concentration-gradient-driven data (diffusion). This assumption was pursued to enhance the reliability of results and forgo any potential error that might arise from advection.

The analyzed dataset contains a range of environmental factors and mixture properties that constitute the inventory of input and output variables. Within the context of this study, the quantity of interest is the chloride content in the porous network of concrete. The inventory of input variables includes environmental factors such as the content of surface chloride, time of exposure, temperature, as well as depth from the exposure surface, concomitant with concrete mixture compositions, including the content of water, sulfateresisting Portland cement (SPRC), ordinary Portland cement (OPC), water-to-binder ratio, fly ash (FA), silica fume (SF), ground granulated blast-furnace slag (GGBS), superplasticizer, and fine and coarse aggregates.

The development of the model and the implementation of the framework relied on established libraries in the literature [40,57]. The process involved in training included the separation of the dataset into training and test splits, with the respective ratios of 0.75 and 0.25. The dataset was then normalized to have a mean value of 0 and a standard deviation of 1.0. Subsequently, the fine-tuning process employed 10-fold cross-validation to mitigate the potential for overfitting.

The performance of the ML algorithms was appraised employing a variety of metrics, including the coefficient of determination (๐‘… 2 ), mean absolute error (MAE), mean square error (MSE) and its root (RMSE), and mean absolute percentage error (MAPE). The best performance is obtained when the coefficient of determination approaches unity, while values approaching zero indicate the best results for the other metrics.

The prediction capability of each ML algorithm is summarized in Figure 1. Considering the characteristics of each learner, as anticipated, linear regression (LR) provides a poor estimate, with a performance metric of ๐‘… 2 = 0.62. The k-nearest neighbors (KNN) regression demonstrates a relatively satisfactory estimate, achieving the respective ๐‘… 2 scores of 0.89 and 0.82 for the training and test splits.

The kernel ridge regression (KRR) and support vector regression (SVR) exhibit good predictive performance with the respective test-set ๐‘… 2 scores of 0.90 and 0.89. The Gaussian process regression (GPR) provides an accurate estimate, exhibiting a test-set score of ๐‘… 2 = 0.91.

Subsequently, considering the adopted artificial neural networks, the multilayer perceptron (MLP) provides good performance, with a test-set score of ๐‘… 2 = 0.90. On the other hand, the Gated recurrent unit (GRU) shows poor performance, achieving a score of ๐‘… 2 = 0.55.

Pursuant to the establishment of models and the evaluation of their accuracy in predicting the content of chloride, further investigations are conducted on the sensitivity of the concrete mixture compositions to the temporal evolution of chloride. In light of the foregoing evaluation metrics and the accuracy achieved, the GRU was delisted from the inventory of ML learners for subsequent analyses.

The impact of concrete mixture properties on the temporal evolution of chloride in concrete is elucidated by examining the correlations between concrete mixture constituents and the space-time domain. The investigation is carried out using a reference concrete mixture as a baseline [53], from which the influence of individual mixture properties is appraised by varying each parameter over its entire range observed in the dataset [56]. The reference concrete mixture consisted of 184 kg/m 3 of water, 460 kg/m 3 of Ordinary Portland cement, 100 kg/m 3 of fly ash, and fine and coarse aggregate contents of 700 and 1050 kg/m 3 , respectively, together with superplasticizer content of 1.8 kg/m 3 . The specimen was exposed to a surface chloride of 19.6 gr/l for a duration of 1.3 years under an average ambient temperature of 9 โ€ข C [53]. The reference point for the temporal plot was fixed at a location 10 mm away from the exposure surfaceunless otherwise specified.

Figure 2 illustrates the impact of concrete mixture water content on the temporal evolution of chloride using various ML algorithms-10 mm below the exposure surface. Given the simplicity of the LR algorithm, a linear response is captured by the LR method. The KNN algorithm also provides a simple representation of the overall trend, while other methods reveal a clearer pattern. In general, an increase in water content is found to increase chloride concentration over time for the LR, KNN, and MLP algorithms, while an inverse relationship is observed when employing the KRR, SVR, and GPR algorithms. Although an inconsistent relationship was found across different algorithms, the direct correlation seems more plausible when considering the physics of the problem. The sensitivity of the results to the sulfate-resisting Portland cement (SPRC) content is depicted in Figure 3. Considering the predictive capability of each ML learner, a relatively clear trend is manifested by the MLP, wherein an increase SPRC content leads to a reduction in chloride content over time. Similar trends, to some extent, are also observed for the KRR, SVR, and GPR algorithms, indicating an inverse relationship between the temporal evolution of chloride and SPRC content.

The correlation between the content of ordinary Portland cement (OPC) and the temporal evolution of chloride is depicted in Figure 4. Similar to preceding observations, the LR and KNN models provide simple estimates of the overall trend, while the KRR and GPR models manifest a clearer trend. In this respect, an increase in OPC content leads to a reduction in chloride concentration.

Figure 5 shows the impact of the water-to-binder ratio on the temporal evolution of chloride. The simplified ML learners provide a rough estimate of this parameter. The advanced models provide a clearer picture of the overall trend; however, the trend seems to alternate over time. In this respect, in general, there exists a direct correlation between chloride content and water-to-binder ratio in the first few years, after which it changes to an inverse correlation.

The impact of the fly ash (FA) content on the temporal evolution of chloride is depicted in Figure 6. The sensitivity of the model response to this parameter is generally similar across all advanced ML algorithms. The relationship is particularly clear in the KRR, SVR, and GPR models, which Figure 7 demonstrates the correlation between silica fume (SF) content and chloride concentration over time. Among ML algorithms, the KRR, GPR, and MLP models capture a clear trend of the overall response. As the silica fume content increases, the concentration of chloride in the concrete decreases. As deduced from the figure, the optimum performance corresponds to the case with the highest SF content, which represents the most effective configuration for service life enhancement.

The sensitivity of the results to the ground granulated blast-furnace slag (GGBS) content is illustrated in Figure 8. Consistent with previous observations, KRR and GPR provide a clear representation of the model response. In general, there exists an inverse correlation between high GGBS content and chloride concentration over time, where a significant increase in the former leads to a reduction in the latter; however, as deduced from the figure, chloride concentration is highly sensitive to GGBS content, with relatively small amounts producing a significant effect on chloride concentration over time. Similar trends are observed for SVR and, to some extent, the MLP.

Figure 9 shows the impact of superplasticizer on chloride concentration at a fixed point in time. Among the adopted ML algorithms, the GPR model provides a consistent trend, whereas the responses of other learners (SVR, KRR, and MLP) fluctuate over time. In this respect, an inverse relationship is predicted by GPR, while the MLP demonstrates a relatively direct correlation. The response of the KRR is The correlation between fine aggregates and chloride content is further investigated and shown in Figure 10. Among the adopted ML algorithms, the responses obtained from KRR, SVR, and GPR are relatively similar. In general, an increase in fine aggregate content is found to reduce chloride concentration over time. However, the response is sensitive to the specific aggregate content. In particular, the response of MLP is clearer for lower aggregate contents. Considering the overall responses, there appears to be an optimal value of fine aggregate content-within the adopted range-that delivers the most promising response.

Subsequently, the correlation between coarse aggregate content and chloride concentration is depicted in Figure 11. Among all ML algorithms, the KRR, GPR, and MLP models clearly reveal the underlying relationship, wherein an increase in coarse aggregate content leads to a corresponding increase in chloride concentration over time. However, there exists a specific quantity-just below the maximum adopted in this study-that produces the most adverse mixture, resulting in the highest chloride concentration. On the other hand, the optimal value for durability is associated with the minimum coarse aggregate content.

The evolution of chloride in concrete was successfully described using a variety of ML algorithms. The MLP delivered the most accurate results, achieving training and A significant observation pertains to the accuracy of the gated recurrent unit (GRU) network. As discussed in the foregoing sections, the fundamental premise of this approach lies in its ability to reproduce sequential data. Although the training set yielded a satisfactory performance with a score of ๐‘… 2 = 0.89, the test set accuracy was poor (๐‘… 2 = 0.55). This discrepancy indicates that the sequential dependency present in the data imposes a significant constraint on the model, thereby compromising its predictive accuracy. The challenges in training and testing can be associated with the diversity of concrete mixture proportions adopted in the dataset.

An important observation regarding the hidden correlations between concrete mixture constituents and chloride ingress pertains to the ability of each ML learner to capture the latent relationships between input and output features. In this regard, LR provided a rough linear estimate of the underlying pattern. Similarly, KNN produced a rather discrete and somewhat unclear representation. SVR and MLP captured a clear trend for the majority of mixture compositions, although in some cases they exhibited variations in the temporal pattern. The KRR also delivered an acceptable response, with a similar limitation, while GPR consistently produced a clear trend across the entire input features.

Pursuant to the assessment of all ML algorithms, the hidden correlation between input and output features was unraveled. An inverse relationship was observed between the temporal evolution of chloride content and several concrete mixture compositions, including sulfate-resisting Portland cement, ordinary Portland cement, fly ash, silica fume, and superplasticizer. A similar trend was relatively found for fine aggregate content. In contrast, a clear direct correlation was identified between chloride content and coarse aggregate content, whereby an increase in the latter resulted in an increase in the former. The correlations of chloride content with water content and the water-to-binder ratio were not very clear and alternated over time; however, the MLP provided a general direct correlation with respect to these features in the early stages of exposure. The evolution of chloride was highly sensitive to ground granulated blastfurnace slag (GGBS), with small variations in its quantity leading to substantial changes in the chloride content.

A noteworthy consideration pertains to the physical interpretation of the observed phenomena. While the adopted ML algorithms provide insight into the relationships between input and output features, the underlying physical mechanisms governing these observations require further investigation and constitute a topic for future study. In this regard, some of the observed trends may be partially explained by changes in the pore morphology of concrete over time. For example, an increase in coarse aggregate content may lead to higher overall porosity, which in turn facilitates chloride diffusion. In contrast, increasing the fly ash content can reduce concrete porosity and, consequently, diminish the temporal diffusion of chloride. In addition, secondary chemical reactions associated with mixture compositions may influence the temporal evolution of chloride ingress. A The description of the hidden correlations was successfully achieved using a variety of ML algorithms. However, the optimization of concrete mixture proportions represents an interesting direction for future research. Such investigations may presently be challenging due to the limited number of experimental measurements available in the literature, as well as the complex coupling among input features and space-time domain.

This study employs various machine learning (ML) algorithms to assess the impact of concrete mixture compositions on chloride ingress in concrete structures. The adopted standalone ML algorithms consist of three simple and four complex methods. The simple algorithms include linear regression (LR), k-nearest neighbors (KNN) regression, and kernel ridge regression (KRR). The complex algorithms comprise support vector regression (SVR), Gaussian process regression (GPR), and two families of artificial neural networks, including multilayer perceptron (MLP) and gated recurrent unit (GRU). The KRR, GPR, and MLP algorithms successfully predicted the chloride content in concrete, achieving a minimum test-set score of ๐‘… 2 = 0.90. The GRU was unable to accurately capture the sequential response in the test set, while achieving a good score on the training set. The latent relationships between the input features and the temporal evolution of chloride were appraised. The LR and KNN algorithms provided a rough estimate, while SVR, MLP, and KRR provided an acceptable and explainable response. On the other hand, GPR demonstrated the most promising performance by capturing clear and explainable trends. A clear inverse correlation was observed between chloride content and several mixture compositions, including the contents of sulfate-resisting Portland cement, ordinary Portland cement, fly ash, silica fume, and superplasticizer, as well as, to a lesser extent, fine aggregate. In contrast, a clear direct correlation was identified between coarse aggregate content and chloride ingress. The results were found to be sensitive to the content of ground-granulated blast-furnace slag. These findings emphasize the very premise of surrogate approaches for describing chloride ingress in concrete, while concurrently identifying the latent correlations toward the ultimate goal of extending the service life of civil infrastructure.

Mojtaba Aliasghar-Mamaghani: Writing -review & editing, Writing -original draft, Software, Methodology, Investigation, Formal analysis, Conceptualization. Mohammadreza Khalafi: Writing -review & editing, Writingoriginal draft, Software, Methodology, Investigation, Formal analysis, Conceptualization.

๐Ÿ“ธ Image Gallery

1-water.png 10-CoarseAgg.png 2-SRPC.png 3-OPC.png 4-WtoB.png 5-FA.png 6-SF.png 7-GGBS.png 8-Superplast.png 9-FineAgg.png Fig1-Accuracy.png cas-email.jpeg cas-url.jpeg

Reference

This content is AI-processed based on open access ArXiv data.

Start searching

Enter keywords to search articles

โ†‘โ†“
โ†ต
ESC
โŒ˜K Shortcut