A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations
📝 Abstract
Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machinery failure, highlighting the need for robust RUL estimation methods. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability. This paper proposes a novel multimodal-RUL framework that jointly leverages image representations (ImR) and time-frequency representations (TFR) of multichannel, nonstationary vibration signals. The architecture comprises three branches: (1) an ImR branch and (2) a TFR branch, both employing multiple dilated convolutional blocks with residual connections to extract spatial degradation features; and (3) a fusion branch that concatenates these features and feeds them into an LSTM to model temporal degradation patterns. A multi-head attention mechanism then emphasizes salient features, and linear layers perform the final RUL regression. To enable effective multimodal learning, vibration signals are converted into ImR via the Bresenham line algorithm and into TFR using the Continuous Wavelet Transform. We also introduce multimodal Layer-wise Relevance Propagation (multimodal-LRP), a tailored explainability technique that significantly enhances model transparency. The approach is validated on the XJTU-SY and PRONOSTIA benchmark datasets. Results show that our method matches or surpasses state-of-the-art baselines under both seen and unseen operating conditions, while requiring ~28% less training data on XJTU-SY and ~48% less on PRONOSTIA. The model exhibits strong noise resilience, and multimodal-LRP visualizations confirm the interpretability and trustworthiness of predictions, making the framework highly suitable for real-world industrial deployment.
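The abstract says only that vibration signals are rasterized into images with the Bresenham line algorithm; it gives no image size, amplitude scaling, or channel layout. The following is a minimal sketch under assumed settings, not the authors' implementation: the function names `bresenham_line` and `rasterize_signal`, the 32×32 binary canvas, and the min-max amplitude scaling are all hypothetical choices made for illustration.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the integer grid points on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # combined error term covering all octants
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return points


def rasterize_signal(signal, width=32, height=32):
    """Draw a 1-D signal as a connected polyline on a binary image.

    Assumed layout (not from the paper): sample index maps to the column,
    min-max scaled amplitude maps to the row, and row 0 is the top.
    """
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    n = len(signal)
    cols = [round(i * (width - 1) / (n - 1)) for i in range(n)]
    rows = [round((hi - v) / span * (height - 1)) for v in signal]
    image = [[0] * width for _ in range(height)]
    # Connect consecutive samples so the trace has no gaps.
    for i in range(n - 1):
        for x, y in bresenham_line(cols[i], rows[i], cols[i + 1], rows[i + 1]):
            image[y][x] = 1
    return image
```

Calling `rasterize_signal(window)` on one windowed channel returns a nested list of 0/1 pixels. In a multichannel setting, one plausible (assumed) option is to rasterize each accelerometer channel separately and stack the results as image channels.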
📄 Content
A NOVEL MULTIMODAL RUL FRAMEWORK FOR REMAINING USEFUL LIFE ESTIMATION WITH LAYER-WISE EXPLANATIONS
Waleed Razzaq, School of Automation, University of Science and Technology China, Hefei, Anhui, waleedrazzaq@mail.ustc.edu.cn
Yun-Bo Zhao ∗, School of Automation, University of Science and Technology China, Hefei, Anhui, ybzhao@ustc.edu.cn
December 9, 2025
ABSTRACT
Background: Estimating the Remaining Useful Life (RUL) of mechanical systems is a critical aspect of Prognostics and Health Management (PHM) systems. Accurate RUL prediction enables timely maintenance, enhances system reliability, and reduces operational costs. Among mechanical components, rolling-element bearings are a leading cause of industrial machinery failure, underscoring the need for robust and reliable RUL estimation techniques.
Problem: Despite numerous existing approaches, many suffer from limitations such as poor generalizability, lack of robustness, high data requirements, and limited interpretability.
Methods: To address these challenges, this paper proposes a novel multimodal-RUL framework that integrates both image representations (ImR) and time-frequency representations (TFR) of one-dimensional nonstationary vibration signals obtained from multichannel accelerometers. The proposed multimodal-AI architecture comprises three distinct branches. The image and TFR branches extract spatial degradation features using multiple dilated convolutional blocks with residual connections, applied to the ImR and TFR inputs, respectively. These features are then merged in a fusion branch, where a Long Short-Term Memory (LSTM) network captures temporal degradation patterns. A multi-head attention (MHA) mechanism further refines these features by emphasizing the most informative aspects, and the result is processed through linear layers for RUL prediction. To facilitate effective multimodal learning, we introduce a comprehensive feature engineering framework. Vibration signals are rasterized into ImR using the Bresenham Line Drawing algorithm, while TFRs are derived via the Continuous Wavelet Transform. Additionally, we propose a layer-wise relevance propagation method tailored for multimodal architectures (multimodal-LRP), enhancing model transparency and interpretability.
Results: The effectiveness of the proposed method is validated using the XJTU-SY and PRONOSTIA benchmark datasets. Experimental results demonstrate that our approach matches and sometimes outperforms baseline models under both seen and unseen operating conditions. Notably, the model achieves competitive performance while requiring approximately 28% less training data for XJTU-SY and 48% less for PRONOSTIA. The results also suggest that the proposed model exhibits strong resilience to various types of noise. Furthermore, the multimodal-LRP explanations substantiate the interpretability and trustworthiness of the method, reinforcing its practical applicability in industrial settings.
Keywords: PHMs · RUL · Bresenham Line algorithm · multimodal-AI · LRP · Deep learning
∗Corresponding author. Email: ybzhao@ustc.edu.cn
arXiv:2512.06708v1 [cs.LG] 7 Dec 2025
A PREPRINT - DECEMBER 9, 2025
1 Introduction
Prognostic Health Management Systems (PHMs) are an integral part of complex industrial systems, ensuring safety and reliability by continuously monitoring and evaluating the health conditions of critical components. PHMs are essential for preventing severe operational hazards and ensuring accident-free processes. Remaining Useful Life (RUL) estimation is a key function of PHMs that determines the residual operational lifespan of a machine or its components. Nonstatic machines, particularly rotating machines with rolling-element bearings, are more susceptible to failure because they operate under extreme conditions, leading to wear and tear that degrades performance.
Several studies have indicated that approximately 40–50% of industrial machine failures are related to these bearings [1]. Therefore, an accurate RUL estimation system is essential for effective condition monitoring, mitigating risk, and preventing unexpected breakdowns that could disrupt production. In recent years, considerable effort has gone into developing RUL systems for rolling-element bearings, and the resulting methods fall broadly into two categories: physics-based and data-driven. Physics-based modeling describes bearing degradation processes through a set of equations derived from mathematical representations of physical systems. Gazizulin et al. [2] proposed a hybrid approach that integrates finite element (FE) modeling with damage mechanics and experimental validation, using continuum damage mechanics to simulate spall formation on bearing raceways and incorporating microstructural details through Poisson–Voronoi tessellation. To enhance accuracy, nonlinear dynamic modeling is combined with FE simulations to analyze the stress–strain response at the spall edge where the rolling element