Prediction of Zoonosis Incidence in Human using Seasonal Auto Regressive Integrated Moving Average (SARIMA)
Zoonosis refers to the transmission of infectious diseases from animals to humans. The increasing incidence of zoonoses causes great losses of life, both human and animal, and also has a socioeconomic impact. This motivates the development of a system that can predict the future number of zoonosis occurrences in humans. This paper analyses and presents the use of the Seasonal Autoregressive Integrated Moving Average (SARIMA) method to develop a forecasting model that can predict the number of human zoonosis cases. The dataset used for model development is a time series of human tuberculosis occurrences in the United States, comprising fourteen years of monthly data obtained from a study published by the Centers for Disease Control and Prevention (CDC). Several trial SARIMA models were compared to obtain the most appropriate one, and diagnostic tests were then used to determine model validity. The results showed that SARIMA(9,0,14)(12,1,24)_12 is the best-fitting model. In terms of accuracy, the selected model achieved a Theil's U value of 0.062, implying that the model is highly accurate and fits the data closely. This also indicates the final model's capability to closely represent the historical tuberculosis dataset and to make predictions from it.
💡 Research Summary
The paper addresses the growing public‑health challenge posed by zoonotic diseases by developing a time‑series forecasting model capable of predicting future human incidence. Using a 14‑year monthly record (2000‑2013) of human tuberculosis (TB) cases in the United States, obtained from the Centers for Disease Control and Prevention (CDC), the authors explore the Seasonal Autoregressive Integrated Moving Average (SARIMA) methodology. Initial exploratory analysis reveals a strong seasonal pattern, prompting the authors to examine autocorrelation (ACF) and partial autocorrelation (PACF) functions to determine appropriate differencing orders. After applying a seasonal differencing of order one (D=1) to achieve stationarity, a systematic grid search evaluates numerous SARIMA configurations, with model selection guided by Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and residual diagnostics.
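A minimal sketch of the two preprocessing steps named above, in plain Python with no statistics library assumed: one seasonal difference (D = 1, s = 12) and the sample ACF used to inspect the result. The function names `seasonal_difference` and `sample_acf` are illustrative, not taken from the paper, and the demo series is synthetic, not the CDC data.

```python
import math
import random


def seasonal_difference(series, period=12):
    """One seasonal difference (D = 1): w_t = y_t - y_{t-period}."""
    if len(series) <= period:
        raise ValueError("series must be longer than the seasonal period")
    return [series[t] - series[t - period] for t in range(period, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Synthetic monthly series (hypothetical): trend + 12-month cycle + noise.
    random.seed(1)
    y = [100 + 0.2 * t + 8 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 1)
         for t in range(168)]
    w = seasonal_difference(y, period=12)
    print("raw ACF at lag 12:", round(sample_acf(y, 12)[12], 3))
    print("differenced ACF at lag 12:", round(sample_acf(w, 12)[12], 3))
```

In this synthetic example the differenced series is, by construction, a constant plus e_t - e_{t-12}, so its ACF shows a negative spike at lag 12. In a Box-Jenkins reading, such a spike points toward a seasonal moving-average term. Fitting and comparing candidate orders by AIC/BIC would typically be delegated to a library; for example, statsmodels' `SARIMAX` class takes `order=(p, d, q)` and `seasonal_order=(P, D, Q, s)`, and its fitted results expose `aic` and `bic`.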
The optimal configuration is identified as SARIMA(9,0,14)(12,1,24)_12. In the standard (p,d,q)(P,D,Q)_s notation, this model has nine non‑seasonal autoregressive terms (p=9), no non‑seasonal differencing (d=0), fourteen non‑seasonal moving‑average terms (q=14), twelve seasonal autoregressive terms (P=12), one seasonal difference (D=1), and twenty‑four seasonal moving‑average terms (Q=24), with a seasonal period of s=12 months. Diagnostic testing includes the Ljung‑Box test, which fails to reject the null hypothesis of white‑noise residuals (p > 0.05), indicating that no significant autocorrelation remains in the residuals. Residual normality is assessed with Q‑Q plots and the Shapiro‑Wilk test, and the results are consistent with Gaussian errors. These diagnostics show that the model captures the autocorrelation structure of the TB series, but residual checks alone cannot rule out overfitting. Read literally, the seasonal orders extend the autoregressive polynomial to lag 9 + 12×12 = 153 and the moving‑average polynomial to lag 14 + 24×12 = 302, whereas fourteen years of monthly data contain only 168 observations.
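The Ljung-Box check can be sketched in plain Python as follows. The statistic is Q = n(n+2) Σ_{k=1..h} r_k² / (n-k), compared against a chi-square distribution with h minus the number of fitted ARMA coefficients as degrees of freedom. That df adjustment is a common convention; the lag h and adjustment the authors used are not stated in the text. The chi-square tail below uses closed forms that are valid only for integer degrees of freedom, and all function names are illustrative.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (closed forms)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15 + ...)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def ljung_box(residuals, h, fitted_params=0):
    """Ljung-Box Q over lags 1..h and its p-value with h - fitted_params df."""
    n = len(residuals)
    df = h - fitted_params
    if not 1 <= h < n or df < 1:
        raise ValueError("need 1 <= h < n and h > fitted_params")
    r = sample_acf(residuals, h)
    q = n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))
    return q, chi2_sf(q, df)
```

A p-value above 0.05 corresponds to the "fails to reject white noise" outcome described above. Under the df convention used here, the selected model has p + q + P + Q = 59 coefficients, so the test would need h > 59 lags.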
Predictive performance is quantified using Theil’s U statistic, yielding a value of 0.062. Theil’s U equals 0 for a perfect forecast. In its bounded form (U1) it cannot exceed 1; in its benchmark form (U2), values below 1 mean that the forecasts outperform a naïve no‑change (random‑walk) forecast. Under either reading, a value of 0.062 indicates highly accurate forecasts. The authors present a 12‑month‑ahead forecast showing minimal deviation between predicted and observed TB counts, thereby illustrating the model’s practical utility for short‑term planning.
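Both common forms of Theil’s U can be computed in a few lines. The abstract does not say which form the authors used, so this plain-Python sketch shows both. The function names are illustrative, and U2 assumes no observed value is zero, because it divides by y_t.

```python
import math


def theil_u1(actual, forecast):
    """Bounded Theil's U (U1): 0 = perfect forecast, never exceeds 1."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    n = len(actual)
    rmse = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)
    # By the triangle inequality, rmse <= rms(forecast) + rms(actual), so U1 <= 1
    scale = (math.sqrt(sum(f * f for f in forecast) / n)
             + math.sqrt(sum(a * a for a in actual) / n))
    return rmse / scale


def theil_u2(actual, forecast):
    """Benchmark Theil's U (U2): < 1 beats the naive forecast y_hat_{t+1} = y_t."""
    if len(actual) != len(forecast) or len(actual) < 2:
        raise ValueError("need equal-length series with at least two points")
    idx = range(len(actual) - 1)
    # Squared relative errors of the model forecast vs. the naive no-change forecast
    num = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2 for t in idx)
    den = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2 for t in idx)
    return math.sqrt(num / den)
```

If the forecast equals the observations, both statistics are 0. If the forecast is the naïve no‑change forecast, U2 is exactly 1 by construction. This is why U2 values below 1 are read as beating the random‑walk benchmark.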
The study contributes two main insights. First, it supports SARIMA as a suitable tool for zoonotic‑disease forecasting when the data exhibit clear seasonal cycles, as is the case with TB incidence. Second, by providing an objective accuracy metric (Theil’s U) and thorough residual analysis, the work offers a transparent framework that public‑health officials can rely on for evidence‑based decision making, such as allocating resources, scheduling vaccination campaigns, or preparing healthcare facilities.
Nevertheless, the research has notable limitations. The dataset is confined to a single country and a single disease, which restricts the generalizability of the findings to other zoonoses or geographic contexts. Moreover, the model does not incorporate exogenous variables—such as climate fluctuations, population mobility, or vaccination coverage—that are known to influence infectious‑disease dynamics. Future investigations could extend the approach by employing SARIMAX (which allows external regressors), by comparing SARIMA with machine‑learning time‑series techniques (e.g., LSTM networks, Prophet), or by constructing a multivariate framework that simultaneously models several zoonotic diseases across multiple regions.
In conclusion, the paper demonstrates that a carefully specified SARIMA model can accurately forecast human TB incidence, serving as a proof‑of‑concept for broader zoonotic‑disease prediction efforts. Its methodological rigor, transparent diagnostics, and strong predictive performance suggest that SARIMA‑based tools could become integral components of national and international disease‑surveillance systems, enabling proactive public‑health interventions and more efficient allocation of limited resources.