A Comparison Between Data Mining Prediction Algorithms for Fault Detection (Case Study: Ahanpishegan Co.)
In today's competitive market, industrial companies seek to manufacture higher-quality products, which can be achieved by increasing reliability, maintainability, and thus the availability of products. Improving the product life cycle is likewise necessary for achieving high reliability. Maintenance activities typically aim to reduce failures of industrial machinery and to minimize the consequences of such failures, so industrial companies try to improve their efficiency by using different fault detection techniques. One strategy is to process and analyze previously generated data to predict future failures. The purpose of this paper is to detect wasted parts using different data mining algorithms and to compare the accuracy of these algorithms. A combination of thermal and physical characteristics was used, and the algorithms were applied to Ahanpishegan's current data to estimate the availability of its produced parts.

Keywords: Data Mining, Fault Detection, Availability, Prediction Algorithms.
💡 Research Summary
The paper investigates fault detection in the manufacturing process of Ahanpishegan Co. by applying several data‑mining classification algorithms to predict whether a produced part will be wasted (i.e., defective). The authors first collect a dataset of 12,000 production records that include both thermal measurements (temperature, thermal conductivity, heat resistance) and physical attributes (material strength, thickness, weight). After standard preprocessing—missing‑value imputation, outlier removal using the inter‑quartile range, normalization, and a correlation analysis to reduce multicollinearity—the authors retain eight independent features, optionally reduced further by principal component analysis (PCA) for dimensionality control.
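The preprocessing steps above (IQR-based outlier removal, normalization, and optional PCA) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the data is synthetic, and the array dimensions merely stand in for the eight retained features.

```python
# Sketch of the described preprocessing pipeline, assuming synthetic data:
# IQR outlier filtering, standardization, then optional PCA reduction.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))   # stand-in for the eight retained features
X[0, 0] = 25.0                  # inject one obvious outlier for illustration

# Inter-quartile-range filter: keep rows whose every feature lies
# within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(X, [25, 75], axis=0)
iqr = q3 - q1
mask = np.all((X >= q1 - 1.5 * iqr) & (X <= q3 + 1.5 * iqr), axis=1)
X_clean = X[mask]

# Normalize, then reduce dimensionality with PCA (component count is a guess)
X_scaled = StandardScaler().fit_transform(X_clean)
X_reduced = PCA(n_components=5).fit_transform(X_scaled)
print(X_clean.shape, X_reduced.shape)
```

The IQR fence and the PCA component count are conventional defaults; the paper does not specify its exact thresholds.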
Five well‑known supervised learning models are implemented: Decision Tree (CART), Support Vector Machine with an RBF kernel, Gaussian Naïve Bayes, Random Forest (100 trees, sqrt of features per split), and a Multi‑Layer Perceptron (two hidden layers, ReLU activation). All models are trained and validated using a 10‑fold cross‑validation scheme, and hyper‑parameters are tuned via exhaustive grid search to avoid overfitting. Performance is measured with a comprehensive set of metrics: overall accuracy, precision, recall, F1‑score, and the area under the ROC curve (AUC).
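The evaluation protocol described above (per-model grid search nested in 10-fold cross-validation) might look like the sketch below. Two of the five models are shown to keep it short; the parameter grids are plausible assumptions, not the paper's settings, and the data is synthetic.

```python
# Hedged sketch of the tuning/validation protocol: exhaustive grid search
# scored by 10-fold cross-validation, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Candidate models and hypothetical hyper-parameter grids
models = {
    "RandomForest": (RandomForestClassifier(random_state=0),
                     {"n_estimators": [50, 100], "max_features": ["sqrt"]}),
    "SVM-RBF": (SVC(kernel="rbf"),
                {"C": [1, 10], "gamma": ["scale"]}),
}

scores = {}
for name, (est, grid) in models.items():
    search = GridSearchCV(est, grid, cv=10, scoring="accuracy")
    search.fit(X, y)
    scores[name] = search.best_score_
    print(name, round(search.best_score_, 3))
```

Scoring could be swapped to `"f1"` or `"roc_auc"` to match the paper's other metrics.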
The experimental results show that the Random Forest consistently outperforms the other methods. It achieves an average accuracy of 93.4 %, precision of 92.1 %, recall of 94.8 %, and an F1‑score of 93.4 %, while maintaining a low false‑negative rate of 2.3 %. The high recall is particularly valuable in an industrial setting because false negatives correspond to defective parts that escape detection, leading to higher downstream maintenance costs and unplanned downtime. The SVM attains a respectable accuracy of 89.2 % but exhibits a wide performance variance depending on kernel parameters, making it less robust for production deployment. Naïve Bayes, despite its rapid training time, suffers from the violation of its independence assumption, resulting in a modest 78.5 % accuracy. The single Decision Tree provides interpretable rules but overfits the training data, yielding only 81.3 % accuracy on unseen samples. The MLP (artificial neural network) reaches 85.6 % accuracy after 200 training epochs with L2 regularization, yet its performance is highly sensitive to network architecture and learning‑rate settings, which hampers reproducibility.
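The metric suite used in the comparison (accuracy, precision, recall, F1, AUC) can be computed as in this minimal sketch. The figures it prints come from synthetic data and a held-out split, not from the paper's experiments.

```python
# Sketch of computing the paper's metric suite for one model,
# assuming a synthetic dataset and a simple train/test split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]   # class-1 probability for the ROC curve

metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "precision": precision_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "f1": f1_score(y_te, pred),
    "auc": roc_auc_score(y_te, proba),
}
print(metrics)
```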
Beyond raw metrics, the authors conduct an error‑type analysis. Models with higher false‑negative rates (e.g., Decision Tree and Naïve Bayes) would increase the risk of undetected faults, whereas models with higher false‑positive rates could lead to unnecessary maintenance actions. Random Forest strikes a balance by minimizing false negatives while keeping false positives at an acceptable level, making it the most suitable candidate for a predictive maintenance system.
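The error-type analysis reduces to reading false-negative and false-positive rates off a confusion matrix, as in this small sketch. The label convention (1 = defective part) and the toy predictions are illustrative assumptions.

```python
# Sketch of the error-type analysis: FN rate (defects that escape detection)
# and FP rate (good parts flagged for unnecessary maintenance).
# Convention assumed here: label 1 = defective part. Toy data for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])  # one miss, one false alarm

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fnr = fn / (fn + tp)   # fraction of defective parts missed
fpr = fp / (fp + tn)   # fraction of good parts wrongly flagged
print(fnr, fpr)        # → 0.25 and ~0.167 for this toy example
```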
Feature‑importance scores derived from the Random Forest reveal that temperature and material strength are the dominant predictors of part failure, confirming domain knowledge that thermal stress and material integrity are critical failure drivers. This insight enables plant engineers to prioritize tighter temperature control and stricter material specifications as preventive measures.
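Extracting a ranked list of feature importances from a fitted Random Forest is a one-liner in scikit-learn, sketched below. The feature names are placeholders matching the attributes the summary mentions; the data and resulting ranking are synthetic, so this will not reproduce the paper's finding that temperature and material strength dominate.

```python
# Minimal sketch of ranking features by Random Forest impurity-based
# importance. Names are hypothetical stand-ins for the study's attributes.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

names = ["temperature", "thermal_conductivity", "heat_resistance",
         "material_strength", "thickness", "weight", "feature_7", "feature_8"]
X, y = make_classification(n_samples=300, n_features=8, random_state=2)

forest = RandomForestClassifier(n_estimators=100, random_state=2).fit(X, y)
ranking = sorted(zip(names, forest.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```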
The paper concludes that high‑quality data acquisition and thoughtful feature engineering are prerequisites for successful fault‑prediction models in manufacturing. Among the evaluated algorithms, ensemble tree methods—specifically Random Forest—provide the best trade‑off between predictive accuracy, robustness, and interpretability for the given dataset. The authors suggest future work in three directions: (1) integrating real‑time sensor streams to enable online learning and immediate fault alerts; (2) exploring cost‑sensitive learning to directly incorporate the economic impact of false positives and false negatives; and (3) developing hybrid models that combine gradient‑boosting techniques with rule‑based expert systems to further improve decision support for maintenance scheduling.