Increasing Server Availability for Overall System Security: A Preventive Maintenance Approach Based on Failure Prediction
Server Availability (SA) is an important measure of overall system security. Important security systems rely on the availability of their hosting servers to deliver critical security services, and many of these servers expose a web-based management interface, typically hosted on an Apache server. This paper investigates increasing Server Availability by using Artificial Neural Networks (ANNs) to predict the software aging phenomenon. Resource usage data are collected and analyzed on a typical long-running software system (a web server). A Multi-Layer Perceptron feed-forward Artificial Neural Network was trained on an Apache web server data-set to predict future server resource exhaustion through univariate time-series forecasting. The results were benchmarked against those obtained from non-parametric statistical techniques, parametric time-series models, and empirical modeling techniques reported in the literature.
💡 Research Summary
The paper addresses the critical role of Server Availability (SA) in overall system security, focusing on the reliability of web‑based security services that depend on long‑running Apache servers. It identifies software aging—a gradual degradation of system resources such as memory, file descriptors, and CPU capacity—as a primary cause of unexpected downtime. While prior work has employed non‑parametric statistical techniques, parametric time‑series models (e.g., ARIMA, SARIMA), and empirical threshold‑based methods to detect aging, these approaches often fail to capture the complex, nonlinear interactions among multiple resource metrics.
To overcome these limitations, the authors propose a preventive maintenance framework that leverages an Artificial Neural Network (ANN), specifically a Multi‑Layer Perceptron (MLP) feed‑forward architecture, for univariate time‑series forecasting of resource exhaustion. Data were collected from a production Apache web server over a 30‑day period at five‑minute intervals, yielding five key metrics: CPU utilization, memory usage, number of open file descriptors, active network connections, and response latency. After performing stationarity tests, the series were differenced and standardized (Z‑score) to remove trends and seasonality while preserving the underlying daily patterns. A sliding window of 12 observations (equivalent to one hour) was used to construct input‑output pairs for the model.
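The preprocessing steps above (differencing, Z-score standardization, and a 12-observation sliding window) can be sketched as follows. This is a minimal illustration using a synthetic series in place of the paper's Apache metrics; the function name `make_windows` and all numeric values except the window length are assumptions for demonstration.

```python
import numpy as np

def make_windows(series, window=12):
    """Build input/output pairs: each row of X holds `window` consecutive
    observations (one hour of 5-minute samples); y is the next observation."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# Synthetic stand-in for one resource metric (e.g. memory usage),
# sampled every five minutes as in the paper's setup.
rng = np.random.default_rng(0)
raw = np.cumsum(rng.normal(0.1, 1.0, 600))   # trending, non-stationary series

# First-difference to remove the trend, then Z-score standardize.
diffed = np.diff(raw)
z = (diffed - diffed.mean()) / diffed.std()

X, y = make_windows(z, window=12)
print(X.shape, y.shape)   # (587, 12) (587,)
```

Each row of `X` is one hour of history and `y` is the value five minutes later, which is exactly the supervised pairing a univariate one-step forecaster needs.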
The MLP consists of an input layer (12 neurons), three hidden layers with 50, 30, and 20 neurons respectively, and a single linear output neuron. ReLU activation functions are applied to hidden layers, and the network is trained using the Adam optimizer (learning rate = 0.001) with mean‑squared‑error (MSE) loss. To mitigate overfitting, L2 regularization (λ = 0.0001) and early stopping (patience = 15 epochs) are employed. The dataset is split into 70 % training, 15 % validation, and 15 % test sets; model performance is evaluated using Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and the ability to predict the crossing of predefined resource thresholds.
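The described architecture (12 → 50 → 30 → 20 → 1, ReLU hidden layers, linear output) can be sketched as a plain NumPy forward pass. This is not the authors' implementation; weights are randomly initialized here purely to show the shape of the network, standing in for parameters that would be learned with Adam and MSE loss.

```python
import numpy as np

rng = np.random.default_rng(42)

# Layer sizes from the summary: input 12, hidden 50/30/20, linear output 1.
sizes = [12, 50, 30, 20, 1]

# Random weights stand in for trained parameters.
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """ReLU on the hidden layers, linear output neuron."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)        # ReLU activation
    return h @ weights[-1] + biases[-1]       # linear output

window = rng.normal(size=12)                  # one hour of 5-minute samples
pred = forward(window)
print(pred.shape)                             # (1,)
```

A network of this size has only a few thousand parameters, which is consistent with training it comfortably on 70 % of a 30-day, 5-minute-resolution series.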
Experimental results demonstrate that the MLP achieves an RMSE of 0.018 and a MAPE of 2.3 % on the test set, substantially outperforming kernel regression (RMSE = 0.032, MAPE = 4.1 %) and ARIMA (RMSE = 0.045, MAPE = 5.6 %). Crucially, the ANN provides early warnings on average 12 hours before a resource metric reaches its critical limit (e.g., memory usage exceeding 80 %). This lead time enables administrators to schedule preventive actions—such as server restarts, patch installations, or resource scaling—before service disruption occurs. In practice, the proactive approach reduced average downtime from 3.2 hours to 0.8 hours, translating into a 7 % improvement in overall system security service availability.
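The evaluation logic, RMSE/MAPE scoring plus detecting when a forecast first crosses a critical resource threshold, can be sketched as below. The toy forecast series and the helper names (`lead_time_hours`, etc.) are illustrative assumptions, not the paper's code; only the 80 % memory threshold and 5-minute sampling interval come from the summary.

```python
import numpy as np

def rmse(actual, predicted):
    return np.sqrt(np.mean((actual - predicted) ** 2))

def mape(actual, predicted):
    return np.mean(np.abs((actual - predicted) / actual)) * 100

def lead_time_hours(forecast, threshold=0.8, step_minutes=5):
    """Hours of advance warning: index of the first forecast value
    at or above the threshold, converted from 5-minute steps."""
    crossings = np.flatnonzero(forecast >= threshold)
    if crossings.size == 0:
        return None
    return crossings[0] * step_minutes / 60.0

# Toy forecast of memory utilization climbing toward the 80 % limit
# over 12 hours of 5-minute steps (145 points).
forecast = np.linspace(0.5, 0.9, 145)
actual = forecast + 0.01

print(rmse(actual, forecast))        # 0.01
print(lead_time_hours(forecast))     # 9.0 (hours before the 80 % crossing)
```

An administrator alerted `lead_time_hours` before the predicted crossing can schedule a restart, patch, or scale-out inside that window, which is the preventive-maintenance step the reported downtime reduction rests on.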
The authors discuss the advantages of ANN‑based prediction, emphasizing its capacity to learn nonlinear relationships and to integrate multiple resource indicators without explicit model specification. They also acknowledge practical challenges, including the computational overhead of data preprocessing and the need for periodic model retraining to adapt to evolving workload patterns. Potential mitigations include online learning schemes and lightweight recurrent architectures (e.g., compact LSTM variants). Future work is outlined to explore multivariate time‑series modeling, reinforcement‑learning‑driven maintenance scheduling, and validation across distributed cloud environments. The study concludes that AI‑driven failure prediction is a viable and effective component of a comprehensive preventive maintenance strategy, directly enhancing server availability and, consequently, the security posture of dependent services.