One-Step or Two-Step Optimization and the Overfitting Phenomenon: A Case Study on Time Series Classification
Optimization has been developing at a fast rate over the last few decades. Bio-inspired optimization algorithms are metaheuristics inspired by nature, and they have been applied to problems in engineering, economics, and other domains, as well as to branches of information technology such as networking and software engineering. Time series data mining is a field of information technology that has its share of these applications too. In previous work we showed how bio-inspired algorithms such as genetic algorithms and differential evolution can be used to find the locations of the breakpoints used in the symbolic aggregate approximation representation of time series, and in other work we showed how particle swarm optimization, one of the best-known bio-inspired algorithms, can be used to assign weights to the different segments of that representation. In this paper we present, in two different approaches, a new meta-optimization process that produces optimal breakpoint locations together with optimal segment weights. The time series classification experiments we conducted give an interesting example of how overfitting, a frequently encountered problem in data mining in which a model fits the training set too closely, can interfere with the optimization process and mask the superior performance of an optimization algorithm.
💡 Research Summary
The paper investigates how to jointly optimize the break‑points and segment weights of the Symbolic Aggregate approXimation (SAX) representation for time‑series classification, and how the well‑known overfitting phenomenon can obscure the true performance of meta‑heuristic optimization algorithms. Building on earlier work that used genetic algorithms or differential evolution to locate optimal break‑points, or particle swarm optimization (PSO) to assign weights to SAX segments, the authors propose two new meta‑optimization schemes that aim to find both sets of parameters simultaneously.
In the “one‑step” approach, a single PSO run encodes both the break‑points and the weights in a combined particle vector (2 × M dimensions for M SAX segments). The fitness function merges a statistical measure of how well the break‑points approximate a Gaussian distribution with a classification‑accuracy term that reflects the impact of the weights. In the “two‑step” approach, the process is split: first PSO optimizes only the break‑points, then, keeping those break‑points fixed, a second PSO optimizes the segment weights. Both schemes use the same PSO hyper‑parameters (population size, inertia weight, cognitive and social coefficients) to ensure a fair comparison.
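The one‑step encoding can be sketched as a plain PSO over a combined 2 × M vector. This is a minimal illustrative sketch, not the authors' implementation: the `fitness` function below uses placeholder terms standing in for the paper's Gaussian‑fit measure on the break‑points and the classification‑accuracy term on the weights, and `M` and the PSO hyper‑parameters are arbitrary illustrative values.

```python
import random

M = 4  # number of SAX segments (illustrative)

def fitness(particle):
    # First M entries: break-points; last M entries: segment weights.
    breakpoints, weights = particle[:M], particle[M:]
    # Placeholder objectives standing in for the paper's combined fitness:
    # penalize unsorted break-points and weights far from uniform.
    gauss_term = sum(max(0.0, breakpoints[i] - breakpoints[i + 1])
                     for i in range(M - 1))
    acc_term = sum((w - 1.0 / M) ** 2 for w in weights)
    return gauss_term + acc_term  # lower is better

def pso(n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = 2 * M  # combined particle vector: break-points + weights
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # personal bests
    gbest = min(pbest, key=fitness)          # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=fitness)
    return gbest

best = pso()
```

The two‑step pipeline would instead run this loop twice with `dim = M`, first over the break‑points alone and then over the weights with the break‑points frozen.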
Experiments were conducted on a representative subset of the UCR Time Series Archive, covering 30 data sets of varying length, class imbalance, and noise level. For each data set the authors applied a 1‑Nearest‑Neighbor classifier to the SAX‑transformed series, using the MINDIST distance measure. Performance was evaluated on three disjoint partitions: training, validation, and test. The authors specifically tracked the gap between training accuracy and validation/test accuracy as an indicator of overfitting.
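For readers unfamiliar with the pipeline, here is a minimal sketch of the standard SAX transform and its MINDIST lower‑bounding distance, assuming the textbook 4‑symbol alphabet with fixed Gaussian break‑points (−0.67, 0, 0.67). The paper's contribution is precisely to replace these fixed break‑points (and add segment weights), which this sketch does not do.

```python
import math

BREAKPOINTS = [-0.67, 0.0, 0.67]  # standard Gaussian cut points, alphabet size 4

def znorm(series):
    # Z-normalize so the Gaussian break-points are meaningful.
    mu = sum(series) / len(series)
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / len(series)) or 1.0
    return [(x - mu) / sd for x in series]

def paa(series, w):
    # Piecewise Aggregate Approximation: mean of each of w equal segments.
    n = len(series)
    return [sum(series[i * n // w:(i + 1) * n // w])
            / (((i + 1) * n // w) - (i * n // w)) for i in range(w)]

def sax(series, w):
    # Map each PAA mean to the index of its Gaussian bin (0..3).
    return [sum(v > b for b in BREAKPOINTS) for v in paa(znorm(series), w)]

def mindist(s1, s2, n):
    # MINDIST between two SAX words of length w, for original series length n.
    w = len(s1)
    def cell(i, j):
        # Adjacent symbols contribute zero; otherwise the gap between
        # the enclosing break-points.
        if abs(i - j) <= 1:
            return 0.0
        return BREAKPOINTS[max(i, j) - 1] - BREAKPOINTS[min(i, j)]
    return math.sqrt(n / w) * math.sqrt(
        sum(cell(a, b) ** 2 for a, b in zip(s1, s2)))
```

A rising series such as `range(16)` with `w = 4` maps to the word `[0, 1, 2, 3]`, and a 1‑NN classifier simply picks the training series whose SAX word has the smallest `mindist` to the query.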
Results show a striking contrast. The two‑step method achieves very high training accuracy (average ≈ 96 %) but suffers a large drop on validation and test sets (average ≈ 78 %). The one‑step method attains slightly lower training accuracy (≈ 91 %) yet maintains a more stable validation/test performance (≈ 84 %). The authors attribute the discrepancy to the fact that, in the two‑step pipeline, the break‑points become overly tuned to the training data during the first stage; the subsequent weight‑optimization stage then reinforces these overly specific break‑points, leading to a model that does not generalize. By contrast, the one‑step formulation forces the optimizer to balance both objectives simultaneously, effectively regularizing the search space and reducing the tendency to overfit.
To mitigate overfitting, the paper proposes three practical enhancements. First, embed K‑fold cross‑validation inside the meta‑optimization loop so that each candidate solution is evaluated on unseen folds before being accepted. Second, augment the fitness function with regularization terms (e.g., L2 penalties on weights and a smoothness penalty on break‑point spacing) to discourage extreme parameter values. Third, employ early‑stopping criteria based on validation performance to halt the PSO iterations once improvement plateaus. When these measures are applied, the two‑step method’s test accuracy rises to about 86 %, narrowing the gap with the one‑step method and confirming that overfitting, rather than algorithmic superiority, was the primary cause of the earlier performance drop.
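Two of these safeguards, the regularized fitness and validation‑based early stopping, can be sketched generically. The function names, penalty coefficients `lam` and `mu`, and the `step`/`validate` callables below are hypothetical stand‑ins for whatever optimizer and evaluation the paper actually wraps.

```python
def regularized_fitness(error, weights, gaps, lam=0.01, mu=0.01):
    # error: classification error of a candidate solution;
    # weights: segment weights; gaps: spacing between consecutive break-points.
    l2 = lam * sum(w * w for w in weights)                       # L2 penalty
    smooth = mu * sum((g1 - g2) ** 2                             # smoothness
                      for g1, g2 in zip(gaps, gaps[1:]))         # of spacing
    return error + l2 + smooth

def optimize_with_early_stopping(step, validate, max_iters=200, patience=10):
    # step(): advances the optimizer one iteration, returns a candidate.
    # validate(candidate): accuracy on a held-out validation split.
    best_acc, best_cand, stale = -1.0, None, 0
    for _ in range(max_iters):
        cand = step()
        acc = validate(cand)
        if acc > best_acc:
            best_acc, best_cand, stale = acc, cand, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation performance has plateaued
    return best_cand, best_acc
```

The K‑fold safeguard would sit inside `validate`, averaging accuracy over folds the candidate never saw during fitness evaluation.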
The study concludes that meta‑optimization for time‑series representation must consider both the complexity of the objective function and the risk of overfitting. While the one‑step approach offers a safer, more robust baseline, the two‑step approach can be competitive if equipped with proper regularization and validation mechanisms. These findings are not limited to SAX or time‑series classification; they provide a general guideline for any scenario where multiple interdependent hyper‑parameters are tuned simultaneously using bio‑inspired meta‑heuristics.