Real-time Air Pollution prediction model based on Spatiotemporal Big data


Air pollution is one of the most pressing concerns for urban areas. Many countries have built monitoring stations that collect pollution values hourly. Recently, a research project in Daegu, Korea has monitored air quality in real time via sensors installed on taxis running across the whole city. The collected data is huge (1-second interval) and has both spatial and temporal dimensions. In this paper, based on this spatiotemporal big data, we propose a real-time air pollution prediction model that applies a Convolutional Neural Network (CNN) to the image-like spatial distribution of air pollution. For the temporal information in the data, we combine a Long Short-Term Memory (LSTM) unit for time series data with a neural network model for other air pollution impact factors, such as weather conditions, to build a hybrid prediction model. This model is simple in architecture but still delivers good prediction ability.


💡 Research Summary

The paper presents a novel real‑time air‑pollution prediction framework built on a massive spatiotemporal dataset collected from sensors mounted on taxis circulating throughout Daegu, South Korea. Between June 2017 and March 2018, approximately 33.3 million records were gathered at a 1‑second sampling rate, stored in a MySQL database, and pre‑processed using Apache Spark to handle the volume. For spatial modeling, the city is discretized into a 32 × 32 grid; the average pollutant concentration in each cell is treated as a pixel intensity, forming a grayscale “image” that captures the spatial distribution of pollution at a given timestamp.
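
The gridding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rasterize`, the `(lat, lon, value)` record layout, the bounding-box coordinates, and the choice to leave empty cells at 0.0 are all assumptions made for illustration. Only the 32 × 32 resolution and the per-cell averaging come from the summary.

```python
GRID = 32  # grid resolution stated in the summary


def rasterize(records, lat_min, lat_max, lon_min, lon_max, grid=GRID):
    """Average pollutant readings per cell of a grid x grid raster.

    records: iterable of (lat, lon, value) tuples for one time window
             (hypothetical layout, not taken from the paper).
    Returns a list of `grid` rows, each holding `grid` floats.
    Row 0 is the northern edge; cells without readings stay at 0.0
    (an assumption, since the summary does not say how gaps are handled).
    """
    sums = [[0.0] * grid for _ in range(grid)]
    counts = [[0] * grid for _ in range(grid)]
    for lat, lon, value in records:
        # Skip readings that fall outside the bounding box.
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue
        # Map coordinates to cell indices; clamp the far edge into the last cell.
        row = min(int((lat_max - lat) / (lat_max - lat_min) * grid), grid - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * grid), grid - 1)
        sums[row][col] += value
        counts[row][col] += 1
    # Per-cell mean becomes the "pixel intensity" of the grayscale image.
    return [
        [s / c if c else 0.0 for s, c in zip(sum_row, count_row)]
        for sum_row, count_row in zip(sums, counts)
    ]


# Illustrative bounding box roughly around Daegu (hypothetical values).
sample = [(35.87, 128.60, 42.0), (35.87, 128.60, 38.0)]
image = rasterize(sample, 35.7, 36.0, 128.4, 128.8)
```

At the data volume reported here, the same per-cell aggregation would more plausibly run as a grouped average in Spark; the pure-Python version only shows the mapping from coordinates to pixels.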

A Convolutional Neural Network (CNN) is employed to classify these images into four health‑impact categories: Good (0), Moderate (1), Unhealthy (2), and Hazardous (3). The CNN architecture consists of two convolution‑max‑pooling blocks, two fully‑connected layers, and a softmax output layer. Training uses data from September to December 2017, while January 2018 serves as the test set. Input values are normalized to

