A comparative study of transformer models and recurrent neural networks for path-dependent composite materials
Accurate modeling of Short Fiber Reinforced Composites (SFRCs) remains computationally expensive for full-field simulations. Data-driven surrogate models using Artificial Neural Networks (ANNs) have been proposed as an efficient alternative to numerical modeling, and Recurrent Neural Networks (RNNs) in particular are increasingly used for path-dependent multiscale modeling by predicting the homogenized response of a Representative Volume Element (RVE). Transformer models, developed more recently, offer scalability and efficient parallelization, yet they have not been systematically compared with RNNs in this field. In this study, we perform a systematic comparison between RNNs and transformer models trained on sequences of homogenized responses of SFRC RVEs. We study the effect of two types of hyperparameters, namely architectural hyperparameters (such as the number of GRU layers, hidden size, number of attention heads, and encoder blocks) and training hyperparameters (such as learning rate and batch size). Both sets of hyperparameters are tuned using Bayesian optimization. We then analyze scaling laws with respect to dataset size and inference accuracy in interpolation and extrapolation regimes. The results show that while transformer models remain competitive in accuracy on large datasets, RNNs achieve better accuracy on small datasets. Under extrapolation, the difference is pronounced: the RNN remains accurate, while the transformer model performs poorly. On the other hand, the transformer model is 7 times faster at inference, requiring 0.5 ms per prediction compared to 3.5 ms for the RNN model.
💡 Research Summary
This paper presents a systematic comparison between two deep learning architectures—Gated Recurrent Unit (GRU) based Recurrent Neural Networks (RNNs) and multi‑head self‑attention Transformers—for predicting the homogenized elasto‑plastic response of Short Fiber Reinforced Composites (SFRCs). The authors use a publicly available dataset comprising 547 high‑fidelity finite element / fast Fourier transform simulations of Representative Volume Elements (RVEs). Each sample provides a six‑dimensional strain path and the corresponding stress response, together with fiber orientation tensors. To mitigate the inherent scarcity of such data, a rotation‑based augmentation scheme is applied to the training and validation sets, expanding them up to 20‑fold (≈10 420 samples after augmentation).
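The rotation-based augmentation can be illustrated with a minimal numpy sketch. The paper does not publish its implementation, so the function names below are hypothetical; the core idea is standard: applying a random rotation R to a strain (or stress) tensor as ε' = R ε Rᵀ yields a new, physically consistent loading path, since the material response is equivariant under rotation when the orientation tensors are rotated accordingly.

```python
import numpy as np

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))      # fix column signs for a canonical Q
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]        # ensure det = +1 (proper rotation)
    return q

def rotate_sym_tensor(t, rot):
    # Second-order tensor transformation rule: t' = R t R^T
    return rot @ t @ rot.T

rng = np.random.default_rng(0)
# Example symmetric strain tensor (one timestep of a strain path)
strain = np.array([[0.010, 0.002, 0.000],
                   [0.002, -0.005, 0.001],
                   [0.000, 0.001, 0.003]])
R = random_rotation(rng)
strain_rot = rotate_sym_tensor(strain, R)
```

Because rotation preserves tensor invariants (trace, Frobenius norm, eigenvalues), the augmented samples carry genuinely new directional information without distorting the underlying constitutive behavior.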
Both models are trained on sequences of strain‑stress data. The RNN architecture stacks several GRU layers followed by a dropout layer (p = 0.5) and a linear output head. The Transformer replaces recurrence with sinusoidal positional encodings, multi‑head attention, and feed‑forward sub‑layers; the number of encoder blocks and attention heads are treated as hyperparameters.
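The two core building blocks named above—the GRU recurrence and the Transformer's sinusoidal positional encoding—can be sketched in plain numpy. This is not the authors' code; it is a minimal illustration of the standard formulations the paper relies on (the actual models would use a framework such as PyTorch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, W, U, b):
    """One GRU step: update gate z, reset gate r, candidate state n."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])
    n = np.tanh(W["n"] @ x + r * (U["n"] @ h) + b["n"])
    return (1.0 - z) * n + z * h      # interpolate old state and candidate

def sinusoidal_pe(seq_len, d_model):
    """Transformer positional encoding: sin/cos at geometric frequencies."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Demo: run a random 6-dimensional strain sequence through one GRU layer
rng = np.random.default_rng(1)
d_in, d_h = 6, 8                      # toy sizes, not the tuned ones
W = {k: 0.1 * rng.standard_normal((d_h, d_in)) for k in "zrn"}
U = {k: 0.1 * rng.standard_normal((d_h, d_h)) for k in "zrn"}
b = {k: np.zeros(d_h) for k in "zrn"}
h = np.zeros(d_h)
for _ in range(50):
    h = gru_cell(rng.standard_normal(d_in), h, W, U, b)
pe = sinusoidal_pe(50, d_h)           # added to token embeddings in practice
```

The contrast is visible even at this level: the GRU must consume the strain path one increment at a time, whereas the Transformer sees the whole sequence at once and recovers ordering only through the additive positional encoding.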
Hyperparameter optimization is performed via Bayesian Optimization (BO) with 200 trials per model. The search space includes learning rate, batch size, number of epochs, number of layers, hidden dimension, and, for the Transformer, the number of attention heads and encoder layers. Validation Root Mean Squared Error (RMSE) is used as the acquisition objective. BO converges to a best validation RMSE of 5.33 MPa for the RNN and 6.14 MPa for the Transformer. Sensitivity analysis shows the Transformer is highly dependent on the balance of attention heads and encoder layers, while the RNN degrades when hidden size or depth becomes excessive.
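The structure of such a tuning loop can be sketched as follows. The paper uses Bayesian Optimization; as a framework-free stand-in, the sketch below uses plain random search over a hypothetical search space loosely mirroring the one described (the exact ranges are not given here, so the values are illustrative only):

```python
import math
import random

# Hypothetical search space; the paper's actual ranges may differ
SPACE = {
    "learning_rate": (1e-4, 1e-2),         # sampled log-uniformly
    "batch_size": [16, 32, 64, 128],
    "num_layers": [1, 2, 3, 4],
    "hidden_size": [32, 64, 128, 256],
}

def sample_config(rng):
    lo, hi = SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": rng.choice(SPACE["batch_size"]),
        "num_layers": rng.choice(SPACE["num_layers"]),
        "hidden_size": rng.choice(SPACE["hidden_size"]),
    }

def search(objective, n_trials=50, seed=0):
    """Keep the config with the lowest validation RMSE (the BO objective)."""
    rng = random.Random(seed)
    best_cfg, best_rmse = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        rmse = objective(cfg)    # in practice: train the model, return val RMSE
        if rmse < best_rmse:
            best_cfg, best_rmse = cfg, rmse
    return best_cfg, best_rmse
```

A BO library would replace the independent `sample_config` draws with a surrogate model that proposes promising configurations based on past trials, which is what makes 200 trials sufficient for spaces of this size.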
The authors then evaluate scaling behavior by training both models on augmented datasets ranging from 1× (≈521 samples) to 20× (≈10 420 samples). For large datasets (>5 000 samples) both architectures achieve similar RMSE values around 3.5 MPa, yet the Transformer retains a higher maximum absolute error, indicating less robust performance across all timesteps.
Extrapolation capability is examined by testing the models on strain paths that lie outside the training distribution. The RNN maintains a respectable RMSE of 5.4 MPa, whereas the Transformer’s error balloons to 23.6 MPa, demonstrating a pronounced weakness in generalizing to unseen loading histories.
Inference speed is measured on a GPU with batch size = 1. The Transformer predicts a full sequence in ~0.5 ms, whereas the RNN requires ~3.5 ms, making the Transformer roughly seven times faster due to its parallel attention mechanism. This speed advantage is particularly relevant for real‑time multiscale simulations or online control applications.
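A latency measurement of this kind can be sketched with a simple timing harness. The predict function below is a hypothetical stand-in (a single matrix multiply over a 100-step path), not either of the paper's models; warm-up iterations and a median over repeated runs are the standard precautions, and on a real GPU one would additionally synchronize the device before reading the clock:

```python
import time
import numpy as np

def measure_latency(predict, x, n_warmup=10, n_runs=100):
    """Median wall-clock latency in ms for single-sample (batch size 1) inference."""
    for _ in range(n_warmup):
        predict(x)                # warm-up excludes one-off allocation/JIT costs
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict(x)
        times.append((time.perf_counter() - t0) * 1e3)
    return float(np.median(times))  # median is robust to scheduler spikes

# Hypothetical stand-in model: one matmul mapping 6 strain to 6 stress components
w = np.random.default_rng(0).standard_normal((6, 6))
x = np.random.default_rng(1).standard_normal((100, 6))   # 100-step strain path
latency_ms = measure_latency(lambda s: s @ w, x)
```

The ~7× gap reported in the paper comes from the architectures themselves: the Transformer evaluates all timesteps of the sequence in parallel, while the RNN must unroll its recurrence step by step.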
Overall, the study concludes that model choice should be guided by data availability and application requirements: RNNs excel with limited data and when extrapolation is critical, while Transformers become competitive on abundant data and offer substantial inference‑time benefits. The work also highlights the importance of Bayesian hyperparameter tuning and physics‑preserving data augmentation for both architectures.
Limitations include the focus on a single material system, the absence of physics‑informed or hybrid architectures, and a lack of uncertainty quantification. Future research directions suggested are: extending the comparison to other composite families, integrating physical constraints into Transformer attention, exploring hybrid RNN‑Transformer models, and employing Bayesian inference to quantify predictive uncertainties.