Forecasting of Multiple Seasonal Categorical Time Series Using Fourier Series with Application to AQI Data of Kolkata


Multiple seasonalities have been widely studied in continuous time series using models such as TBATS, for instance in electricity demand forecasting. However, their treatment in categorical time series, such as air quality index (AQI) data, remains limited. Categorical AQI often exhibits distinct seasonal patterns at multiple frequencies, which are not captured by standard models. In this paper, we propose a framework that models multiple seasonalities using Fourier series and indicator functions, inspired by the TBATS methodology. The approach accommodates the ordinal nature of AQI categories while explicitly capturing daily, weekly and yearly seasonal cycles. Simulation studies demonstrate the empirical consistency of parameter estimates under the proposed model. We further illustrate its applicability using real categorical AQI data from Kolkata and compare forecasting performance with Markov models and machine learning methods. Results indicate that our approach effectively captures complex seasonal dynamics and provides improved predictive accuracy. The proposed methodology offers a flexible and interpretable framework for analyzing categorical time series exhibiting multiple seasonal patterns, with potential applications in air quality monitoring, energy consumption and other environmental domains.


💡 Research Summary

The paper addresses the problem of forecasting categorical time series that exhibit multiple seasonal patterns, using daily Air Quality Index (AQI) data from Kolkata as a case study. While multiple seasonality has been extensively studied for continuous-valued series (e.g., TBATS, Prophet), its treatment for ordinal categorical series remains under‑explored. The authors propose a framework that adapts the TBATS idea to ordinal outcomes by embedding Fourier series terms (sine and cosine functions) as covariates within an ordinal logistic regression model (TSOLR – Temporal Seasonal Ordinal Logistic Regression). The Fourier terms capture daily, weekly, and yearly cycles in a parsimonious, smooth manner, preserving interpretability because each coefficient directly reflects the contribution of a specific seasonal frequency.
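As a rough illustration of the Fourier covariates described above, the sketch below builds sine and cosine terms for several seasonal periods at each time index. The function names, the periods (7 and 365.25 days) and the orders (2 and 3) are assumptions chosen for the example, not values reported in the paper.

```python
import math


def fourier_terms(t, period, order):
    """Return sin/cos pairs for harmonics k = 1..order of one seasonal period."""
    terms = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


def seasonal_design_row(t, seasons):
    """Concatenate the Fourier terms of several (period, order) pairs at time t."""
    row = []
    for period, order in seasons:
        row.extend(fourier_terms(t, period, order))
    return row


# Hypothetical setup: daily time index with a weekly and a yearly cycle.
SEASONS = [(7.0, 2), (365.25, 3)]
X = [seasonal_design_row(t, SEASONS) for t in range(730)]

# Each row has 2 * (2 + 3) = 10 columns, regardless of the period lengths.
print(len(X), len(X[0]))
```

The column count grows with the chosen orders rather than with the period length, which is what keeps a yearly cycle cheap to represent.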

In addition to the Fourier‑based specification, the authors introduce an alternative indicator‑function model (ISOLR – Indicator Seasonal Ordinal Logistic Regression) that uses dummy variables for each hour of the day, day of the week, and month of the year. This more flexible, piecewise‑constant specification can represent abrupt, irregular seasonal effects (e.g., spikes during festivals), but it introduces many more parameters and therefore requires regularization (e.g., an L1 penalty) to avoid over‑parameterization.
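To make the indicator-function idea concrete, here is a minimal sketch of dummy coding for calendar effects. It assumes daily dates, so only day-of-week and month indicators are built (hour-of-day dummies would follow the same pattern), and it drops one reference level per factor; the function name and the choice of reference levels are illustrative assumptions.

```python
from datetime import date, timedelta


def indicator_row(d):
    """One-hot encode day of week and month, dropping one reference level each.

    Monday and January are the (assumed) reference levels, so they map to
    all-zero blocks.
    """
    dow = [1 if d.weekday() == j else 0 for j in range(1, 7)]   # Tue..Sun
    month = [1 if d.month == m else 0 for m in range(2, 13)]    # Feb..Dec
    return dow + month


start = date(2019, 1, 1)
rows = [indicator_row(start + timedelta(days=i)) for i in range(365)]

# 6 day-of-week columns + 11 month columns = 17 indicator columns.
print(len(rows[0]))
```

Every level of every factor gets its own coefficient here, which is why the parameter count grows quickly once finer calendar factors are added and why a penalty becomes useful.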

The methodology section details the likelihood formulation for the proportional odds model, the construction of Fourier covariates, and the selection of seasonal orders using information criteria (AIC, BIC) together with seasonal autocorrelation diagnostics. Estimation proceeds via maximum likelihood, with standard errors obtained from the observed Fisher information.
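The likelihood and information criteria mentioned above can be sketched as follows. This assumes one common parameterization of the proportional odds model, logit P(Y <= j) = theta_j - eta, where eta is the linear predictor built from the seasonal covariates; the paper's exact sign convention and notation may differ, and all names below are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probs(eta, cutpoints):
    """Category probabilities when P(Y <= j) = sigmoid(cutpoints[j] - eta).

    `cutpoints` must be strictly increasing; J cutpoints give J + 1 categories.
    """
    cum = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def neg_log_likelihood(ys, etas, cutpoints):
    """Negative log-likelihood of observed codes `ys` given predictors `etas`."""
    return -sum(math.log(category_probs(eta, cutpoints)[y])
                for y, eta in zip(ys, etas))


def aic_bic(nll, n_params, n_obs):
    """AIC and BIC computed from a minimized negative log-likelihood."""
    return 2.0 * nll + 2.0 * n_params, 2.0 * nll + n_params * math.log(n_obs)


# Illustrative values only: six categories need five cutpoints.
CUTS = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = category_probs(0.3, CUTS)
nll = neg_log_likelihood([0, 2, 5], [-0.5, 0.3, 1.2], CUTS)
print(probs, aic_bic(nll, n_params=7, n_obs=3))
```

In practice the negative log-likelihood would be minimized over the cutpoints and seasonal coefficients, and the criteria compared across candidate Fourier orders.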

A comprehensive data preparation pipeline is described: daily AQI values from 2019‑2024 are obtained from the Central Pollution Control Board, mapped to the six standard AQI categories (Good, Satisfactory, Moderate, Poor, Very Poor, Severe), and encoded as ordered integers 0‑5. Exploratory analysis reveals clear annual cycles (higher pollution in winter), weekly patterns (slightly better air quality on weekends), and a long‑term improvement trend, especially during the COVID‑19 lockdown in 2020. Transition probability matrices show strong persistence within categories.
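The encoding and persistence check described above might look like the sketch below. The breakpoints are assumed to follow the Indian National AQI bands (0-50, 51-100, 101-200, 201-300, 301-400, above 400), and the AQI values are made up for illustration; neither is taken from the paper's data.

```python
from bisect import bisect_left

CATEGORIES = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]
UPPER_BOUNDS = [50, 100, 200, 300, 400]  # assumed inclusive upper limits


def aqi_to_code(aqi):
    """Map a numeric AQI value to an ordered integer code 0..5."""
    return bisect_left(UPPER_BOUNDS, aqi)


def transition_matrix(codes, n_states=6):
    """Row-normalized counts of moves from one day's category to the next."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(codes, codes[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows for categories that never occur as a starting state stay zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


aqi_values = [42, 48, 75, 110, 180, 220, 310, 305, 190, 95]  # made-up series
codes = [aqi_to_code(v) for v in aqi_values]
P = transition_matrix(codes)
print([CATEGORIES[c] for c in codes])
print(P[2])
```

Large diagonal entries in such a matrix are what "strong persistence within categories" refers to.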

Simulation experiments assess parameter recovery under varying Fourier orders and sample sizes, confirming empirical consistency and illustrating the bias‑variance trade‑off when the order is misspecified.
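A heavily simplified version of such a parameter-recovery experiment is sketched below. It assumes three categories, a single yearly sine covariate, known cutpoints, and a grid search over one coefficient, so it illustrates the idea only and is not the authors' simulation design; every name and value is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(eta, cutpoints):
    """Category probabilities under P(Y <= j) = sigmoid(cutpoints[j] - eta)."""
    cum = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def simulate(n, beta, cutpoints, period, rng):
    """Draw n ordinal outcomes whose linear predictor is beta * sin(2*pi*t/period)."""
    xs, ys = [], []
    for t in range(n):
        x = math.sin(2.0 * math.pi * t / period)
        probs = category_probs(beta * x, cutpoints)
        y = rng.choices(list(range(len(probs))), weights=probs)[0]
        xs.append(x)
        ys.append(y)
    return xs, ys


def log_likelihood(beta, xs, ys, cutpoints):
    return sum(math.log(category_probs(beta * x, cutpoints)[y])
               for x, y in zip(xs, ys))


def grid_mle(xs, ys, cutpoints, grid):
    """Pick the grid value of beta with the highest log-likelihood."""
    return max(grid, key=lambda b: log_likelihood(b, xs, ys, cutpoints))


rng = random.Random(0)            # fixed seed for reproducibility
CUTPOINTS = [-1.0, 1.0]           # treated as known (a simplification)
TRUE_BETA = 1.0
GRID = [i / 10.0 for i in range(0, 21)]

xs, ys = simulate(4000, TRUE_BETA, CUTPOINTS, 365.25, rng)
beta_hat = grid_mle(xs, ys, CUTPOINTS, GRID)
print(beta_hat)
```

Repeating the run for increasing `n` and several seeds, and tracking how the spread of `beta_hat` around `TRUE_BETA` shrinks, is the kind of evidence "empirical consistency" refers to.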

For empirical validation, the authors fit TSOLR and ISOLR to the Kolkata AQI series and benchmark them against several alternatives: (i) first‑order Markov chains, (ii) discrete ARMA models, (iii) mixture transition distribution (MTD) models, (iv) random forests, and (v) long short‑term memory (LSTM) networks. Forecast performance is evaluated on a hold‑out test set using accuracy, mean absolute error (MAE), and Cohen’s kappa (to respect the ordinal nature). TSOLR consistently outperforms the Markov and discrete ARMA baselines by 3–5 percentage points in accuracy and yields higher kappa scores, indicating better ordinal ranking. Compared with machine‑learning models, TSOLR achieves comparable or slightly superior accuracy while offering far greater interpretability: the estimated Fourier coefficients reveal the magnitude and phase of daily, weekly, and yearly effects, and the odds ratios can be directly communicated to policymakers. ISOLR excels at capturing irregular spikes (e.g., festival days) but suffers from over‑fitting when regularization is omitted; with an L1 penalty its performance aligns closely with TSOLR.
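For reference, the three evaluation metrics can be computed on integer-coded categories as in the sketch below. The labels are invented for illustration, and the optional linear weighting is an assumption: the summary does not say whether the paper uses weighted or unweighted kappa.

```python
def accuracy(y_true, y_pred):
    """Share of exactly matching categories."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance in category steps between forecast and truth."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred, n_classes, weighted=False):
    """Cohen's kappa; weighted=True uses linear disagreement weights.

    Undefined (division by zero) if both label sets use a single class.
    """
    n = len(y_true)
    conf = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(y_true, y_pred):
        conf[a][b] += 1
    row = [sum(r) for r in conf]
    col = [sum(conf[i][j] for i in range(n_classes)) for j in range(n_classes)]
    observed = 0.0
    expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = abs(i - j) / (n_classes - 1) if weighted else float(i != j)
            observed += w * conf[i][j] / n
            expected += w * row[i] * col[j] / (n * n)
    return 1.0 - observed / expected


# Made-up test labels on a five-level ordinal scale.
y_true = [0, 1, 2, 2, 3, 4, 1, 2]
y_pred = [0, 1, 2, 3, 3, 4, 2, 2]
print(accuracy(y_true, y_pred), mean_absolute_error(y_true, y_pred))
print(cohen_kappa(y_true, y_pred, 5), cohen_kappa(y_true, y_pred, 5, weighted=True))
```

Unweighted kappa treats every misclassification alike, whereas MAE and the linearly weighted variant penalize a forecast more the further it lands from the true category.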

The paper’s contributions are threefold: (1) a novel statistical framework for multiple‑seasonal categorical time series, (2) a thorough empirical demonstration that Fourier‑based ordinal logistic regression yields both accurate forecasts and interpretable seasonal effects, and (3) a practical guide for model selection, estimation, and validation in environmental applications. The authors suggest future extensions such as incorporating exogenous covariates (meteorology, traffic), Bayesian hierarchical formulations, and multivariate categorical series (e.g., simultaneous modeling of multiple pollutants). Overall, the study provides a valuable bridge between sophisticated continuous‑time‑series methods and the practical needs of categorical environmental monitoring.

