Continual Learning for Non-Stationary Regression via Memory-Efficient Replay
Data streams are rarely static in dynamic environments like Industry 4.0. Instead, they constantly change, making traditional offline models outdated unless they can quickly adjust to the new data. This need can be adequately addressed by continual learning (CL), which allows systems to gradually acquire knowledge without incurring the prohibitive costs of retraining them from scratch. Most research on continual learning focuses on classification problems, while very few studies address regression tasks. We propose the first prototype-based generative replay framework designed for online task-free continual regression. Our approach defines an adaptive output-space discretization model, enabling prototype-based generative replay for continual regression without storing raw data. Evidence obtained from several benchmark datasets shows that our framework reduces forgetting and provides more stable performance than other state-of-the-art solutions.
💡 Research Summary
The paper addresses the challenge of applying continual learning (CL) to regression problems in non‑stationary data streams, a scenario common in Industry 4.0 where models must adapt quickly without catastrophic forgetting. While most CL research focuses on classification, the authors propose the first prototype‑based generative replay framework specifically designed for online, task‑free continual regression. Their solution builds upon the TRIL3 architecture—originally created for tabular classification—by introducing three key innovations.
First, they replace discrete class labels with an Adaptive Output‑Space Discretization (AOSD) mechanism. The continuous target variable is dynamically partitioned into bins based on statistical properties of the incoming stream (e.g., moving averages and variance). These bins act as pseudo‑labels, enabling the prototype generation algorithm (XuIL‑VQ) to operate in a regression context.
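The paper does not spell out the exact binning rule, but the idea can be sketched as follows: track running statistics of the streamed target (here with Welford's online algorithm, an assumption on our part) and map each value to one of a fixed number of bins spread over the current mean ± 3 standard deviations. The class name and bin layout are illustrative, not the authors' implementation.

```python
import math

class AdaptiveOutputDiscretizer:
    """Hypothetical sketch of adaptive output-space discretization (AOSD):
    continuous targets become pseudo-labels for prototype-based replay."""

    def __init__(self, n_bins=8):
        self.n_bins = n_bins
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, y):
        # Welford's online update of mean and variance
        self.count += 1
        delta = y - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (y - self.mean)

    def pseudo_label(self, y):
        """Update running stats, then bin y over mean +/- 3 std."""
        self.update(y)
        std = math.sqrt(self.m2 / self.count) if self.count > 1 else 1.0
        lo, hi = self.mean - 3 * std, self.mean + 3 * std
        if hi <= lo:
            return 0
        frac = (y - lo) / (hi - lo)
        # clamp into [0, n_bins - 1]
        return min(self.n_bins - 1, max(0, int(frac * self.n_bins)))
```

Because the bin edges move with the stream's statistics, the same target value can map to different pseudo-labels before and after a distribution shift, which is exactly the adaptivity the method requires.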
Second, they integrate a Mixture Density Network (MDN) into the generative component. Instead of producing a single deterministic output for each prototype, the MDN learns a mixture of Gaussians conditioned on the prototype, thereby modeling the full predictive distribution of the target. This probabilistic treatment preserves uncertainty, improves the realism of synthetic samples, and mitigates the bias that would arise from using only mean values.
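Replay then works by drawing synthetic targets from the per-prototype mixture rather than emitting a single mean. A minimal sampling sketch, assuming a one-dimensional target and that the MDN has already produced mixture weights, means, and standard deviations for a prototype (the paper's exact parameterization is not given here):

```python
import random

def sample_mdn(weights, means, sigmas, rng=random):
    """Draw one synthetic target from a 1-D Gaussian mixture whose
    parameters (weights, means, sigmas) an MDN would predict for a
    given prototype. Hypothetical interface, not the authors' code."""
    # pick a mixture component with probability proportional to its weight
    u, acc = rng.random(), 0.0
    for w, mu, sd in zip(weights, means, sigmas):
        acc += w
        if u <= acc:
            return rng.gauss(mu, sd)
    # fall through (guards against weights summing to slightly < 1)
    return rng.gauss(means[-1], sigmas[-1])
```

Sampling from the full mixture preserves multimodality and spread in the replayed targets, which is the bias-mitigation point the paragraph above makes: replaying only mean values would collapse the predictive distribution.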
Third, the framework is entirely memory‑efficient: no raw data are stored. Only the compact set of prototypes and the MDN parameters are retained, which dramatically reduces storage requirements, eliminates privacy concerns, and makes the approach suitable for edge or IoT devices with limited resources.
The authors evaluate the method on several heterogeneous tabular regression benchmarks (e.g., UCI Energy, Bike Sharing, Power Consumption). Evaluation proceeds in two stages. In the first stage, the proposed method is compared against an offline Random Forest regressor and a conventional experience‑replay baseline. Results show a reduction in the average forgetting ratio of more than 15 % and competitive root‑mean‑square error (RMSE) performance. In the second stage, the framework is directly compared with CLeaR, the current state‑of‑the‑art continual regression system that relies on storing raw samples. Despite not storing any data, the new approach achieves equal or lower forgetting ratios and exhibits far more stable performance during update phases, especially when the data distribution shifts abruptly.
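The summary reports forgetting ratios and RMSE but not their precise formulas. One plausible formulation of a forgetting ratio for regression, sketched here as an assumption (the paper's exact metric may differ), is the relative RMSE degradation on previously seen data after the model has been updated on new data:

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def forgetting_ratio(rmse_before, rmse_after):
    """Relative RMSE increase on past data after an update, clipped at
    zero so improvements count as no forgetting. Illustrative only."""
    return max(0.0, (rmse_after - rmse_before) / rmse_before)
```

Under this definition, "a reduction in the average forgetting ratio of more than 15%" would mean the replay mechanism keeps post-update RMSE on old data proportionally closer to its pre-update value than the baselines do.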
Overall, the paper contributes (1) a novel way to discretize continuous outputs for prototype‑based replay, (2) a probabilistic generative model (MDN) that yields high‑quality synthetic regression data, and (3) a memory‑efficient architecture that respects privacy and hardware constraints. These advances make the method immediately applicable to real‑time predictive maintenance, demand forecasting, and other Industry 4.0 use cases where data streams are non‑stationary and resources are limited. Future work is outlined to extend the approach to multivariate time‑series, multimodal sensor fusion, and more sophisticated adaptive binning strategies for high‑dimensional output spaces.