Real-time Secondary Crash Likelihood Prediction Excluding Post Primary Crash Features

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Secondary crash likelihood prediction is a critical component of an active traffic management system to mitigate congestion and adverse impacts caused by secondary crashes. However, existing approaches mainly rely on post-crash features (e.g., crash type and severity) that are rarely available in real time, limiting their practical applicability. To address this limitation, we propose a hybrid secondary crash likelihood prediction framework that does not depend on post-crash features. A dynamic spatiotemporal window is designed to extract real-time traffic flow and environmental features from primary crash locations and their upstream segments. The framework includes three models: a primary crash model to estimate the likelihood of secondary crash occurrence, and two secondary crash models to evaluate traffic conditions at crash and upstream segments under different comparative scenarios. An ensemble learning strategy integrating six machine learning algorithms is developed to enhance predictive performance, and a voting-based mechanism combines the outputs of the three models. Experiments on Florida freeways demonstrate that the proposed hybrid framework correctly identifies 91% of secondary crashes with a low false alarm rate of 0.20. The Area Under the ROC Curve improves from 0.654, 0.744, and 0.902 for the individual models to 0.952 for the hybrid model, outperforming previous studies.


💡 Research Summary

The paper addresses the critical need for real‑time prediction of secondary crashes (crashes that occur within the spatio‑temporal influence zone of a primary crash) without relying on post‑crash attributes such as crash type, severity, or duration, which are rarely available in real time in an operational traffic management environment. The authors propose a hybrid prediction framework that uses only features observable in real time: traffic flow (speed, volume, and occupancy), weather conditions, and road geometry. These features are extracted from both the primary crash segment and its upstream segments using a dynamically sized spatio‑temporal window.
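
The window extraction can be illustrated with a minimal sketch that covers only the traffic‑flow part (weather and geometry would be joined separately). The paper sizes the window dynamically, but the sizing rule is not given here, so this sketch simply takes the number of upstream segments and the look‑back length as parameters that a caller could set per crash. The `DetectorRecord` type, the segment indexing (segment k‑1 directly upstream of segment k), and the function name `window_features` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class DetectorRecord:
    segment: int      # hypothetical indexing: segment k-1 lies directly upstream of segment k
    minute: int       # start of a 5-minute aggregation interval, minutes from a reference time
    speed: float
    volume: float
    occupancy: float

def window_features(records, crash_segment, crash_minute, n_upstream, lookback_min):
    """Average speed, volume, and occupancy on the crash segment and on n_upstream
    upstream segments, over the window [crash_minute - lookback_min, crash_minute]."""
    features = {}
    for offset in range(n_upstream + 1):
        segment = crash_segment - offset
        rows = [r for r in records
                if r.segment == segment
                and crash_minute - lookback_min <= r.minute <= crash_minute]
        if not rows:
            continue  # detector gap: leave these features missing instead of imputing
        tag = "crash" if offset == 0 else f"up{offset}"
        features[f"{tag}_speed"] = mean(r.speed for r in rows)
        features[f"{tag}_volume"] = mean(r.volume for r in rows)
        features[f"{tag}_occ"] = mean(r.occupancy for r in rows)
    return features

records = [
    DetectorRecord(10, 0, 60, 100, 8),
    DetectorRecord(10, 5, 40, 90, 20),
    DetectorRecord(9, 5, 55, 110, 10),
    DetectorRecord(10, 15, 20, 50, 35),  # falls after the window, so it is ignored
]
print(window_features(records, crash_segment=10, crash_minute=10, n_upstream=1, lookback_min=10))
```

Keeping the window bounds as explicit arguments makes it easy to plug in whatever dynamic sizing rule is used, without changing the aggregation logic.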

The framework consists of three sub‑models. The first, the Primary Crash Model, estimates the probability that a given primary crash will be followed by a secondary crash. The other two are secondary crash models (Model 1 and Model 2) that evaluate traffic conditions at the primary crash location and its upstream segments under two distinct comparative scenarios. Model 1 contrasts traffic conditions before secondary crashes with traffic conditions before primary crashes that did not lead to a secondary crash; Model 2 contrasts the same pre‑secondary‑crash conditions with traffic conditions observed on crash‑free days. Using these two reference conditions lets the framework capture the incremental risk contributed by congestion, shock‑wave propagation, and adverse weather.
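
The two comparative scenarios amount to two labeled datasets that share the same positive cases but use different negatives. The sketch below builds both, assuming that crash‑free controls are matched on segment and time of day and drawn from days with no crash on that segment; this matching rule, the `Window` type, and both function names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    segment: int
    day: str            # calendar date, e.g. "2020-01-06"
    time_of_day: int    # minute of the day at which the window ends
    features: tuple     # feature vector extracted from the spatio-temporal window

def matched_crash_free(target, candidates, crash_days):
    """Hypothetical matching rule: same segment and time of day, on a day on which
    no crash was recorded on that segment."""
    return [c for c in candidates
            if c.segment == target.segment
            and c.time_of_day == target.time_of_day
            and (c.segment, c.day) not in crash_days]

def build_comparison_sets(pre_secondary, pre_primary_only, candidates, crash_days):
    """Return (model1_rows, model2_rows) as lists of (features, label) pairs."""
    positives = [(w.features, 1) for w in pre_secondary]
    # Model 1: negatives are windows before primary crashes with no secondary crash.
    model1 = positives + [(w.features, 0) for w in pre_primary_only]
    # Model 2: negatives are matched windows from crash-free days.
    controls = [c for w in pre_secondary
                for c in matched_crash_free(w, candidates, crash_days)]
    model2 = positives + [(c.features, 0) for c in controls]
    return model1, model2

s = Window(5, "2020-01-06", 480, (30.0,))
p = Window(7, "2020-01-07", 600, (55.0,))
candidates = [Window(5, "2020-01-13", 480, (62.0,)),   # valid control
              Window(5, "2020-01-20", 480, (28.0,)),   # a crash occurred that day
              Window(5, "2020-01-27", 485, (60.0,))]   # different time of day
m1, m2 = build_comparison_sets([s], [p], candidates, crash_days={(5, "2020-01-20")})
print(m1, m2)
```

If several secondary‑crash windows match the same crash‑free window, this sketch will include that control more than once; a real pipeline might deduplicate or sample controls instead.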

Each sub‑model is trained using an ensemble of six machine‑learning algorithms, including XGBoost, Random Forest, Gradient Boosting Decision Trees, Support Vector Machines, Multi‑Layer Perceptrons, and Convolutional Neural Networks. To mitigate class imbalance (secondary crashes represent only 1–2 % of all crashes), the authors apply SMOTE oversampling and cost‑sensitive learning. The outputs of the six algorithms for a given sub‑model are averaged, producing a sub‑model score. A voting‑based mechanism then combines the three sub‑model scores into a final secondary‑crash‑likelihood estimate, effectively reducing individual model bias and improving robustness.
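
The two combination steps described above (averaging six algorithm outputs within a sub‑model, then voting across the three sub‑models) can be sketched as follows. The averaging follows the description in this summary; the per‑sub‑model threshold of 0.5 and the simple majority rule are assumptions, since the exact voting rule is not specified here. The algorithm keys, sub‑model names, and function names are hypothetical labels.

```python
from statistics import mean

# The six algorithms named in the summary; the string keys are hypothetical labels.
ALGORITHMS = ("xgboost", "random_forest", "gbdt", "svm", "mlp", "cnn")

def sub_model_score(probs):
    """Average the six algorithms' predicted secondary-crash probabilities."""
    missing = set(ALGORITHMS) - set(probs)
    if missing:
        raise ValueError(f"missing algorithm outputs: {sorted(missing)}")
    return mean(probs[name] for name in ALGORITHMS)

def hybrid_vote(sub_scores, thresholds=None):
    """Each sub-model votes 1 if its score reaches its threshold (assumed 0.5 by default).
    Returns the share of positive votes and a majority-vote alarm flag."""
    if thresholds is None:
        thresholds = {name: 0.5 for name in sub_scores}
    votes = [sub_scores[name] >= thresholds[name] for name in sub_scores]
    share = sum(votes) / len(votes)
    return share, share > 0.5

primary = sub_model_score({"xgboost": 0.5, "random_forest": 0.75, "gbdt": 0.25,
                           "svm": 0.5, "mlp": 0.5, "cnn": 0.5})
share, alarm = hybrid_vote({"primary": primary, "model1": 0.3, "model2": 0.8})
print(primary, share, alarm)
```

Exposing per‑sub‑model thresholds allows each sub‑model to be tuned separately, which matters when their score distributions differ, as their different AUCs suggest.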

The authors validate the approach on a comprehensive dataset from three Florida freeways (I‑4, I‑75, I‑95) covering four years (2019‑2022). The dataset includes 21,236 recorded crashes, high‑resolution microwave vehicle detection system (MVDS) traffic data aggregated to 5‑minute intervals, weather observations from nearby stations, and detailed road‑geometry attributes for 1,278 roadway segments. Secondary crashes are identified using a speed‑contour‑plot method that compares segment speeds before/after a primary crash with speeds on crash‑free days, applying a statistical threshold (0.25 × standard deviation) to isolate congestion caused by the crash.
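
A simplified version of the speed‑contour identification step is sketched below. It assumes that the 0.25 × standard deviation threshold means a cell is congested when its observed speed falls more than 0.25 standard deviations below the crash‑free mean for the same segment and interval; the exact formulation in the paper may differ. A full implementation would also check that flagged cells form a contiguous queue connected to the primary crash, which this sketch omits. The function names and data layout are hypothetical.

```python
from statistics import mean, stdev

def impacted_cells(observed, crash_free, k=0.25):
    """Return (segment, interval) cells whose observed speed after a primary crash falls
    below the crash-free mean by more than k standard deviations (assumed form)."""
    flagged = set()
    for cell, speed in observed.items():
        history = crash_free.get(cell, [])
        if len(history) < 2:
            continue  # stdev needs at least two crash-free observations
        if speed < mean(history) - k * stdev(history):
            flagged.add(cell)
    return flagged

def is_secondary(crash_cell, flagged):
    """Simplified rule: a later crash is secondary if it occurs in a flagged cell."""
    return crash_cell in flagged

observed = {(3, 0): 40.0, (3, 5): 64.0, (2, 5): 45.0}
crash_free = {(3, 0): [60.0, 62.0, 64.0],
              (3, 5): [60.0, 62.0, 64.0],
              (2, 5): [58.0]}            # too little history, so the cell is skipped
flagged = impacted_cells(observed, crash_free)
print(flagged, is_secondary((3, 0), flagged))
```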

Experimental results show that the hybrid framework correctly identifies 91 % of secondary crashes while maintaining a low false‑alarm rate of 0.20. The Area Under the ROC Curve (AUC) of the combined hybrid model reaches 0.952, substantially higher than that of any individual sub‑model (AUCs of 0.654, 0.744, and 0.902). Accuracy, precision, recall, and F1‑score also show marked improvements over prior studies that relied on post‑crash features.
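
The reported figures can be reproduced from predictions with the usual definitions, which this sketch assumes match the paper's: detection rate is the share of secondary crashes flagged, false‑alarm rate is the share of non‑secondary cases flagged, and AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as half). The function names are hypothetical, and the pairwise AUC loop is quadratic, so it suits small illustrations only.

```python
def detection_and_false_alarm(y_true, y_pred):
    """Return (detection rate, false-alarm rate) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly, ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0, 0]
pred = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.4, 0.1]
dr, far = detection_and_false_alarm(y, pred)
print(dr, far, auc(y, scores))
```

Both functions assume that positive and negative cases are present; with secondary crashes making up only 1–2 % of crashes, evaluation sets should be checked for this before computing the metrics.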

The discussion acknowledges limitations: the validation is confined to Florida, so geographic generalization remains to be tested; real‑time deployment would require addressing data latency, sensor failures, and system integration challenges; and additional features such as vehicle class or real‑time incident reports could further enhance performance. Nonetheless, the study provides a concrete, implementable solution for proactive traffic safety management, enabling traffic operators to issue early warnings or deploy mitigation strategies (e.g., dynamic lane control, variable speed limits) before secondary crashes materialize. The authors conclude that excluding post‑crash information and leveraging a multi‑model, multi‑algorithm ensemble yields a highly accurate, low‑false‑alarm secondary‑crash likelihood predictor suitable for integration into modern active traffic management systems.

