Automated univariate time series forecasting with regression trees
This paper describes a methodology for automated univariate time series forecasting using regression trees and their ensembles: bagging and random forests. The key aspects addressed are the use of an autoregressive approach and recursive forecasts, how to select the autoregressive features, how to deal with trending series, and how to cope with seasonal behavior. Experimental results show forecast accuracy comparable with well-established statistical models such as exponential smoothing or ARIMA. Furthermore, publicly available software implementing all the proposed strategies has been developed and is described in the paper.
💡 Research Summary
This paper proposes a fully automated framework for univariate time‑series forecasting that leverages regression trees and two of their most popular ensemble extensions: bagging and random forests. The authors begin with a concise refresher on regression trees, emphasizing their non‑parametric nature, binary recursive partitioning, and built‑in feature selection. They then embed the tree learner into a classic autoregressive (AR) setting: a sliding window of past observations (lags) forms the feature vector, and the current observation is the target. Forecasts for horizons greater than one are generated using the recursive strategy, whereby each one‑step ahead prediction is fed back as an input for the next step.
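The autoregressive windowing and the recursive strategy can be sketched as follows. This is a minimal Python illustration using a single scikit-learn tree, not the paper's own implementation (which is an R package); the helper names `make_ar_dataset` and `recursive_forecast` are invented here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def make_ar_dataset(series, n_lags):
    """Turn a 1-D series into (lag-vector, target) pairs."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # sliding window of past observations
        y.append(series[t])             # current observation is the target
    return np.array(X), np.array(y)

def recursive_forecast(model, history, n_lags, horizon):
    """Feed each one-step-ahead prediction back in as an input for the next step."""
    window = list(history[-n_lags:])
    preds = []
    for _ in range(horizon):
        yhat = model.predict(np.array(window).reshape(1, -1))[0]
        preds.append(yhat)
        window = window[1:] + [yhat]    # slide the window forward over the prediction
    return np.array(preds)

series = np.sin(np.arange(80) * 0.3)    # toy stationary series
X, y = make_ar_dataset(series, n_lags=4)
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
print(recursive_forecast(tree, series, n_lags=4, horizon=6))
```

Any regressor with a `fit`/`predict` interface could be dropped in place of the single tree.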
A central challenge identified is that plain regression trees predict only values within the range of the training data, making them unsuitable for series with trends. To overcome this, the authors introduce three preprocessing strategies: (1) differencing, which removes deterministic trends by converting the series to first‑differences (and optionally second‑differences); (2) an additive transformation, where each target is centered by subtracting the mean of its associated lag vector, and optionally each lag vector is also centered; (3) a multiplicative transformation, analogous to the additive case but using division, designed for series with exponential growth. After training on the transformed series, forecasts are back‑transformed to the original scale.
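The additive transformation (strategy 2 above, with both lag vectors and targets centered) can be sketched as follows. This is an illustrative Python version under my own naming, not the paper's code; note that on a pure linear trend the transformed problem becomes trivial, so the tree extrapolates beyond the training range after back-transformation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_fit_predict(series, n_lags, horizon):
    # Training: center each target (and optionally each lag vector)
    # by the mean of its associated lag vector.
    X, y = [], []
    for t in range(n_lags, len(series)):
        lags = series[t - n_lags:t]
        X.append(lags - lags.mean())        # centered features
        y.append(series[t] - lags.mean())   # centered target
    model = DecisionTreeRegressor(random_state=0).fit(np.array(X), np.array(y))

    # Forecasting: back-transform each prediction by adding the level back.
    window = list(series[-n_lags:])
    preds = []
    for _ in range(horizon):
        lags = np.array(window)
        level = lags.mean()
        yhat = model.predict((lags - level).reshape(1, -1))[0] + level
        preds.append(yhat)
        window = window[1:] + [yhat]
    return np.array(preds)

trend = np.arange(100, dtype=float)         # a plain tree could not extrapolate this
print(additive_fit_predict(trend, n_lags=4, horizon=3))  # → [100. 101. 102.]
```

The multiplicative variant would divide by the lag-vector mean instead of subtracting it, and multiply it back at forecast time.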
Seasonality is addressed by careful lag selection that captures the periodicity of the series. An illustrative artificial quarterly series shows that a single‑lag tree can learn four distinct rules that perfectly reproduce the seasonal pattern, demonstrating the interpretability of tree‑based models.
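A toy reconstruction of that quarterly example in Python (the actual values in the paper's artificial series are not reproduced here; these are made up for illustration): with a single lag, the tree only needs four rules of the form "if the previous value was a, predict b" to reproduce the cycle exactly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

pattern = [10.0, 20.0, 30.0, 40.0]      # one seasonal cycle of 4 quarters
series = np.array(pattern * 10)         # repeated over 10 "years"

X = series[:-1].reshape(-1, 1)          # a single lag is the only feature
y = series[1:]
tree = DecisionTreeRegressor().fit(X, y)

# Each distinct lag value maps deterministically to the next quarter's value,
# so the fitted tree reproduces the pattern 20, 30, 40, 10.
print(tree.predict(np.array([[10.0], [20.0], [30.0], [40.0]])))
```

Inspecting the fitted tree's splits makes the four learned rules explicit, which is the interpretability point the paper highlights.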
The paper then describes how bagging and random forests are integrated. Bagging creates multiple deep trees on bootstrap samples and averages their predictions, reducing variance. Random forests add an extra layer of randomness by selecting a random subset of features at each split, further decorrelating the trees and improving robustness. Both ensembles work well with default hyper‑parameters, which aligns with the goal of minimal manual tuning.
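Swapping the single tree for the two ensembles is a one-line change in a setup like the sketches above. The snippet below uses scikit-learn's default hyper-parameters to mirror the "no manual tuning" point; it is a generic illustration, not the paper's R implementation.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
series = np.sin(np.arange(120) * 0.3) + rng.normal(0, 0.1, 120)

n_lags = 4
X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
y = series[n_lags:]

# Bagging: deep trees grown on bootstrap samples, predictions averaged
# (variance reduction). The default base estimator is a regression tree.
bag = BaggingRegressor(random_state=0).fit(X, y)

# Random forest: additionally samples a random subset of features at each
# split, further decorrelating the trees.
rf = RandomForestRegressor(random_state=0).fit(X, y)

print(bag.predict(X[-1:]), rf.predict(X[-1:]))
```

Either ensemble can then be passed to the same recursive forecasting loop used for a single tree.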
Empirical evaluation uses the M4 competition’s yearly dataset (23,000 series). Forecast accuracy is measured with MASE (Mean Absolute Scaled Error), a scale‑independent metric that divides the out‑of‑sample forecast error by the in‑sample error of a naïve benchmark. Results show that the additive transformation applied to both features and targets yields the best performance (Mean MASE = 3.387, Median = 2.468), followed closely by the additive transformation applied only to targets. Differencing also improves over the baseline (Mean = 4.020). In contrast, using the raw series without any transformation leads to substantially higher error (Mean = 7.902). These findings confirm that trend removal or level normalization is essential for tree‑based forecasts.
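For reference, MASE can be computed as below (a minimal sketch; for yearly data the seasonal period is m = 1, so the scaling term is the in-sample error of the plain naïve forecast):

```python
import numpy as np

def mase(train, actual, forecast, m=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the
    in-sample MAE of the (seasonal) naive forecast with period m."""
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

train = np.array([10.0, 12.0, 11.0, 13.0])
actual = np.array([14.0, 15.0])
naive = np.full(2, train[-1])   # benchmark forecast: repeat the last value
print(mase(train, actual, naive))  # ≈ 0.9, i.e. better than the in-sample naive error
```

Values below 1 indicate the forecast beats the average in-sample naïve error, which is what makes the metric comparable across series of different scales.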
All proposed methods are packaged into an R library released on CRAN. The package offers a simple API where users specify the number of lags, the transformation type, and the ensemble method; the library then handles data preparation, model training, recursive forecasting, and performance reporting automatically.
In summary, the contributions of the paper are fourfold: (1) a clear recipe for applying regression trees to univariate time series via an autoregressive formulation; (2) systematic strategies for handling trends (differencing, additive, multiplicative) and seasonality; (3) demonstration that bagging and random forests provide robust, out‑of‑the‑box performance without hyper‑parameter tuning; and (4) an open‑source implementation that makes the entire pipeline readily usable by practitioners. The work shows that tree‑based methods can achieve forecast accuracy comparable to classical statistical models such as ARIMA and exponential smoothing, while offering interpretability and ease of automation. Future directions suggested include extending the approach to multivariate series, automated hyper‑parameter search, and hybridization with other machine‑learning models.