Estimation of missing data by using the filtering process in a time series modeling

Estimation of missing data by using the filtering process in a time   series modeling
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This paper proposed a new method to estimate the missing data by using the filtering process. We used datasets without missing data and randomly missing data to evaluate the new method of estimation by using the Box - Jenkins modeling technique to predict monthly average rainfall for site 5504035 Lahar Ikan Mati at Kepala Batas, P. Pinang station in Malaysia. The rainfall data was collected from the $1^{st}$ January 1969 to $31^{st}$ December 1997 in the station. The data used in the development of the model to predict rainfall were represented by an autoregressive integrated moving - average (ARIMA) model. The model for both datasets was ARIMA$(1,0,0)(0,1,1)_s$. The result checked with the Naive test, which is the Thiel’s statistic and was found to be equal to $U=0.72086$ for the complete data and $U=0.726352$ for the missing data, which mean they were good models.


💡 Research Summary

The paper introduces a straightforward yet effective technique for handling missing observations in time‑series data, specifically targeting monthly rainfall records from the Lahar Ikan Mati station (site 5504035) in Kepala Batas, Penang, Malaysia. The authors first propose a “filtering process” that acts as a pre‑treatment step: for each missing point, a weighted moving‑average filter is applied to the surrounding observed values, producing an imputed value that respects the series’ inherent trend and seasonal structure. This approach avoids the pitfalls of naïve imputation methods (e.g., mean substitution or linear interpolation) which can distort autocorrelation patterns critical for downstream modeling.

To evaluate the method, the authors use two data sets: (1) the original, complete series spanning 1 January 1969 to 31 December 1997 (348 monthly observations), and (2) a version of the same series in which roughly 10 % of the points are randomly removed to simulate missingness. After applying the filtering‑based imputation to the second set, both series are subjected to the standard Box‑Jenkins workflow. Autocorrelation (ACF) and partial autocorrelation (PACF) analyses suggest a seasonal ARIMA structure, and the final model selected for both data sets is ARIMA(1,0,0)(0,1,1)_12. In this notation, the non‑seasonal part consists of a single autoregressive term, while the seasonal component includes one first‑order seasonal difference and one moving‑average term, with a period of 12 months to capture annual cycles.

Parameter estimation is performed via maximum likelihood, and model diagnostics confirm adequacy: Ljung‑Box tests on the residuals yield p‑values well above the 0.05 threshold, indicating no remaining autocorrelation; Q‑Q plots and normality tests support the assumption of Gaussian white‑noise errors; and variance homogeneity is observed across the residual series.

Predictive performance is quantified using Thiel’s U statistic, a relative error measure where values closer to zero denote superior forecasts. The complete data set achieves U = 0.72086, while the filtered‑imputed data set attains U = 0.726352. The negligible difference between these figures demonstrates that the proposed filtering process restores missing values without materially degrading the model’s forecasting ability.

The authors acknowledge several limitations. The weighting scheme in the moving‑average filter is chosen heuristically rather than through an optimization routine, leaving open the possibility of further improvement. Moreover, the study does not benchmark the proposed method against more sophisticated imputation techniques such as Kalman smoothing, wavelet‑based reconstruction, or machine‑learning models (e.g., recurrent neural networks). Finally, the experiments are confined to a modest missing‑data proportion (≈10 %); the robustness of the approach under higher missingness rates remains to be tested.

In summary, the paper contributes a pragmatic solution for missing‑data problems in seasonal time‑series analysis. By integrating a simple filter‑based imputation step with the conventional ARIMA modeling pipeline, the authors preserve the statistical properties of the original series and achieve forecasts that are virtually indistinguishable from those obtained with complete data. This methodology is especially relevant for climatology, hydrology, and other domains where long‑term observational records are essential but often plagued by gaps. Future work could explore adaptive weighting, automated filter selection, and comparative studies with advanced imputation frameworks to further validate and extend the utility of the approach.


Comments & Academic Discussion

Loading comments...

Leave a Comment