City traffic forecasting using taxi GPS data: A coarse-grained cellular automata model

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv paper.

City traffic is a dynamic system of enormous complexity. Modeling and predicting city traffic flow remains a challenging task; the main difficulties are how to specify supply and demand and how to parameterize the model. In this paper we attempt to solve these problems with the help of a large amount of floating car data. We propose a coarse-grained cellular automata model that simulates vehicles moving on uniform grids whose cells are much larger than those of microscopic cellular automata models. The car-car interaction of the microscopic model is replaced by the coupling between vehicles and coarse-grained state variables in our model. To parameterize the model, flux-occupancy relations are fitted from the historical data at every grid; these serve as the coarse-grained fundamental diagrams coupling occupancy and speed. To evaluate the model, we feed it with the historical travel demands and trajectories obtained from the floating car data and use it to predict road speed one hour into the future. Numerical results show that our model can capture the traffic flow pattern of the entire city and make reasonable predictions. The current work can be considered a prototype for a model-based forecasting system for city traffic.

💡 Research Summary

The paper presents a novel coarse‑grained cellular automaton (CA) framework for city‑wide traffic forecasting, leveraging large‑scale floating‑car (taxi GPS) data collected in Beijing. Unlike traditional microscopic CA models that operate on sub‑meter cells and explicit car‑to‑car interaction rules, the proposed model aggregates space into uniform 100 m × 100 m grids, each capable of holding many vehicles simultaneously. Within each grid, traffic is described by two macroscopic state variables: average speed (V) and occupancy (N) for each of the four cardinal movement directions. The authors first preprocess the raw GPS logs to remove macroscopic (out‑of‑city), mesoscopic (large jumps >10 km) and microscopic (GPS jitter) errors, retain only “occupied” taxi records, segment trips at gaps ≥5 minutes, and apply OSRM map‑matching to obtain realistic road‑aligned trajectories.
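The trip-segmentation step described above can be illustrated with a minimal sketch. It assumes each taxi's records are already filtered to "occupied" status and sorted by time; the record layout `(timestamp, lat, lon)` and the function name `split_trips` are hypothetical and not taken from the paper. Only the 5-minute threshold comes from the summary.

```python
from datetime import datetime, timedelta

# Segmentation threshold stated in the summary: gaps >= 5 minutes start a new trip.
GAP = timedelta(minutes=5)

def split_trips(records, gap=GAP):
    """Split time-ordered (timestamp, lat, lon) records of one taxi into trips.

    A new trip begins whenever the time between consecutive records
    is greater than or equal to `gap`. The record layout is an assumption.
    """
    trips, current = [], []
    for rec in records:
        # Close the current trip if the pause since the last record is too long.
        if current and rec[0] - current[-1][0] >= gap:
            trips.append(current)
            current = []
        current.append(rec)
    if current:
        trips.append(current)
    return trips
```

Error filtering (out-of-city points, jumps over 10 km, GPS jitter) and OSRM map-matching would run before and after this step respectively; they are omitted here.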

To estimate V and N, instantaneous values are computed from the reconstructed trajectories and then temporally smoothed over a ten‑minute sliding window, mitigating sparsity (≈30 000 occupied taxis versus 65 536 grids). For each grid, the authors plot the empirical flux‑occupancy relationship (flux = N·V versus N) and fit a piecewise fundamental‑diagram‑like function. Two anchor points—P (free‑flow threshold) and Q (congestion onset)—are automatically selected after discarding the top 5 % flux outliers and isolating the left‑most 20 % of the remaining data. The fitted V(N) function is parameterized by four quantities: free‑flow speed V_f, effective capacity N_c, congested‑flow slope V_s, and maximum occupancy N_m. Below N_c, speed is constant at V_f; above N_c, speed follows a hyperbolic form a/(N‑b) derived from the four parameters (Equation 6).
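As a rough illustration of the piecewise speed-occupancy relation, the sketch below takes the hyperbola coefficients `a` and `b` as direct inputs. The summary does not say how Equation 6 derives them from V_f, N_c, V_s and N_m, so that mapping is deliberately left out. The function names, units (km/h, vehicles per grid) and numeric values are all hypothetical.

```python
def speed(n, v_f, n_c, a, b):
    """Coarse-grained V(N): constant free-flow speed up to n_c, hyperbolic above.

    a and b are assumed to be supplied by the calibration; how the paper
    derives them from its four parameters is not reproduced here.
    """
    if n <= n_c:
        return v_f
    return a / (n - b)

def flux(n, v_f, n_c, a, b):
    """Flux as defined in the summary: occupancy times speed."""
    return n * speed(n, v_f, n_c, a, b)

# Illustrative values only, chosen so the curve is continuous at n_c:
# a / (n_c - b) = 300 / (10 - 5) = 60 = v_f.
params = dict(v_f=60.0, n_c=10, a=300.0, b=5.0)
free_flow = speed(5, **params)    # below n_c: constant v_f
congested = speed(20, **params)   # above n_c: 300 / (20 - 5)
```

With these made-up parameters the flux rises linearly up to N_c and grows much more slowly beyond it, which is the qualitative shape a fundamental diagram is expected to have.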

Demand is supplied directly from the historical O‑D pairs and routes extracted from the taxi data, serving as a proof‑of‑concept that, given accurate demand, the model can reproduce observed traffic dynamics. Supply is represented implicitly: each grid’s four directional links act as road segments with the calibrated V(N) relations, without explicit turning restrictions or traffic signals. The CA update rule uses the current occupancy to compute target speeds via the calibrated V(N) curves, then moves vehicles accordingly at each one‑minute time step.
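The update rule can be sketched in a heavily simplified form. The version below ignores the four movement directions, lets a vehicle cross several cells per step using only the speed of its starting cell, and uses a single hypothetical V(N) curve for every grid; none of these simplifications are claimed to match the paper. The vehicle representation (`route`, `dist_m`) and all parameter values are assumptions.

```python
from collections import Counter

CELL_M = 100.0  # grid edge length from the summary
DT_S = 60.0     # one-minute time step from the summary

def speed_kmh(n, v_f=40.0, n_c=5, a=120.0, b=2.0):
    """Hypothetical calibrated V(N); continuous at n_c since 120 / (5 - 2) = 40."""
    return v_f if n <= n_c else a / (n - b)

def cell_of(vehicle):
    """Current grid id of a vehicle, clamped to the last cell of its route."""
    idx = min(int(vehicle["dist_m"] // CELL_M), len(vehicle["route"]) - 1)
    return vehicle["route"][idx]

def step(vehicles):
    """Advance all vehicles by one time step (synchronous update).

    Occupancy is counted once from the state at the start of the step,
    then each vehicle moves at the speed given by its starting cell's occupancy.
    """
    start_cells = [cell_of(v) for v in vehicles]
    occupancy = Counter(start_cells)
    for v, cell in zip(vehicles, start_cells):
        v["dist_m"] += speed_kmh(occupancy[cell]) / 3.6 * DT_S  # km/h -> m/s
    return occupancy
```

A fuller version would track occupancy per direction, look up per-grid parameters, and limit movement cell by cell so that a vehicle responds to congestion it encounters within a step.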

Simulation experiments feed the historical demand into the model and predict average speeds one hour ahead. At the city scale, the simulated spatiotemporal traffic patterns closely match those observed in the data. At the individual road‑segment level, the mean absolute error of the one‑hour‑ahead speed prediction is modest (typically under 10 km/h), demonstrating that the coarse‑grained CA can achieve reasonable forecasting accuracy while remaining computationally lightweight.
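For reference, the mean absolute error used as the accuracy measure above can be computed as in the sketch below. The speed values are invented for illustration and are not results from the paper.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed speeds (km/h)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Made-up segment speeds in km/h, purely illustrative.
predicted = [30.0, 40.0, 50.0]
observed = [35.0, 40.0, 44.0]
mae = mean_absolute_error(predicted, observed)  # (5 + 0 + 6) / 3
```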

The authors acknowledge limitations: the model relies solely on taxi trajectories, which may not fully represent the entire vehicle fleet; it omits traffic signals, turning rules, and heterogeneous vehicle types, potentially reducing fidelity under highly congested conditions. Future work is outlined to incorporate non‑taxi data, develop dynamic O‑D generation and route‑choice models, embed signal control and turning constraints, and implement online parameter updating for real‑time operation.

In conclusion, the study shows that a coarse‑grained cellular automaton, calibrated directly from floating‑car data, can serve as an effective backbone for city‑wide traffic forecasting systems, bridging the gap between detailed microscopic simulations and coarse macroscopic models while maintaining practical computational demands.
