Advance Real-time Detection of Traffic Incidents in Highways using Vehicle Trajectory Data
A significant number of traffic crashes are secondary crashes that occur because of an earlier incident on the road. Early detection of traffic incidents is therefore crucial for road-user safety, with the potential to reduce the risk of secondary crashes. The wide availability of GPS devices nowadays makes it possible to track and record vehicle trajectories. The objective of this study is to use vehicle trajectory data for advance real-time detection of traffic incidents on highways using machine learning-based algorithms. The study uses three days of unevenly sequenced vehicle trajectory data and traffic incident data on I-10, one of the most crash-prone highways in Louisiana. Vehicle trajectories are converted to trajectories based on virtual detector locations to maintain spatial uniformity as well as to generate historical traffic data for machine learning algorithms. Trips matched with traffic incidents along their route are separated and, together with other trips with similar spatial attributes, used to build a database for modeling. Multiple machine learning algorithms, namely Logistic Regression, Random Forest, Extreme Gradient Boost, and Artificial Neural Network models, are used to detect a trajectory that is likely to face an incident in the downstream road section. Results suggest that the Random Forest model achieves the best performance for predicting an incident, with a reasonable recall value and discrimination capability.
💡 Research Summary
This paper investigates the use of GPS‑based vehicle trajectory data for advance, real‑time detection of traffic incidents on a high‑risk highway segment (I‑10 in Louisiana). The authors collected three days of low‑frequency (≤30 s) connected‑vehicle data (135,204 vehicles) from the Otonomo platform during the evacuation period of Hurricane Ida (August 27‑29, 2021) and paired it with 256 records of incidents that resulted in lane closures, taken from the Regional Integrated Transportation Information System (RITIS).
Data preprocessing involved filtering out trajectories that never entered the I‑10 corridor and segmenting the remaining raw points into individual trips, with a new trip starting whenever the time gap between successive points exceeded 15 minutes. This reduced the dataset to 11,674 trips, which were further classified as east‑bound (4,236) or west‑bound (7,438) based on heading angles.
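The gap-based trip segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `(timestamp, lat, lon)` tuple layout and the function name `split_into_trips` are assumptions, and only the 15-minute threshold comes from the summary.

```python
from datetime import datetime, timedelta

# Gap threshold stated in the summary; everything else here is illustrative.
MAX_GAP = timedelta(minutes=15)


def split_into_trips(points, max_gap=MAX_GAP):
    """Split one vehicle's GPS points into trips.

    `points` is a list of (timestamp, lat, lon) tuples (hypothetical layout).
    A new trip starts whenever the gap between successive points exceeds
    `max_gap`; a gap exactly equal to `max_gap` stays in the same trip.
    """
    trips = []
    current = []
    for point in sorted(points, key=lambda p: p[0]):
        if current and point[0] - current[-1][0] > max_gap:
            trips.append(current)
            current = []
        current.append(point)
    if current:
        trips.append(current)
    return trips
```

Applied per vehicle, this yields the list of trips that later steps operate on; filtering to the I‑10 corridor would happen before or after this step depending on how the raw data are stored.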
To impose spatial uniformity, the authors introduced a set of virtual detectors spaced every 1/16 mile (≈100 m) along both directions of the highway. For each trip, speed, speed standard deviation, and heading angle were interpolated at each detector location, producing detector‑based time series. Peak (6‑10 am, 3‑7 pm) and off‑peak periods were distinguished, and weather data (rain presence) were merged as an additional binary feature.
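One way to picture the conversion to virtual detectors is the sketch below. It assumes linear interpolation between the two GPS points that bracket each detector, and it assumes each trip has already been projected onto a one-dimensional milepost axis that increases in the direction of travel; the summary does not state the interpolation scheme, so both are illustrative choices. Only passage time and speed are interpolated here, and the function names are hypothetical.

```python
import math

DETECTOR_SPACING_MI = 1 / 16  # spacing stated in the summary


def detector_positions(start_mi, end_mi, spacing=DETECTOR_SPACING_MI):
    """Mileposts of virtual detectors lying within [start_mi, end_mi].

    Assumes detectors sit at integer multiples of `spacing`.
    """
    first = math.ceil(start_mi / spacing)
    last = math.floor(end_mi / spacing)
    return [k * spacing for k in range(first, last + 1)]


def interpolate_at_detectors(trip, spacing=DETECTOR_SPACING_MI):
    """Linearly interpolate passage time and speed at each detector.

    `trip` is a list of (milepost_mi, time_s, speed_mph) tuples sorted by
    non-decreasing milepost (hypothetical layout).
    Returns a list of (detector_mi, time_s, speed_mph) tuples.
    """
    if len(trip) < 2:
        return []
    out = []
    i = 0
    for d in detector_positions(trip[0][0], trip[-1][0], spacing):
        # Advance to the pair of GPS points that brackets detector d.
        while i < len(trip) - 2 and trip[i + 1][0] < d:
            i += 1
        (m0, t0, v0), (m1, t1, v1) = trip[i], trip[i + 1]
        w = 0.0 if m1 == m0 else (d - m0) / (m1 - m0)
        out.append((d, t0 + w * (t1 - t0), v0 + w * (v1 - v0)))
    return out
```

Heading angle and speed standard deviation could be carried through the same way, although averaging angles linearly needs care near the 0°/360° wrap-around.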
Incident‑trip matching was performed by locating the nearest virtual detector to each incident (the “event detector”) and checking whether a trip passed that detector within a two‑minute temporal window. This spatial‑temporal linkage identified the trips that experienced an incident, creating a highly imbalanced classification problem (few positive cases). The authors addressed the imbalance with Synthetic Minority Over‑sampling Technique (SMOTE), generating synthetic incident trips to achieve an approximate 1:4 incident‑to‑non‑incident ratio.
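The incident-trip matching step can be sketched as below. Several details are assumptions rather than facts from the paper: detectors are indexed as integer multiples of the 1/16-mile spacing, the two-minute window is treated as symmetric around the incident time, and the dictionary layout and function names are hypothetical.

```python
DETECTOR_SPACING_MI = 1 / 16  # spacing stated in the summary
WINDOW_S = 120.0              # two-minute window stated in the summary


def event_detector_index(incident_mi, spacing=DETECTOR_SPACING_MI):
    """Index of the virtual detector nearest to an incident milepost."""
    return round(incident_mi / spacing)


def match_incident_trips(trip_passages, incidents,
                         window_s=WINDOW_S, spacing=DETECTOR_SPACING_MI):
    """Return the ids of trips that passed an event detector near an incident.

    `trip_passages` maps trip_id -> {detector_index: passage_time_s}.
    `incidents` is a list of (milepost_mi, time_s) tuples.
    A symmetric window |passage - incident| <= window_s is an assumption.
    """
    matched = set()
    for incident_mi, incident_time_s in incidents:
        idx = event_detector_index(incident_mi, spacing)
        for trip_id, passages in trip_passages.items():
            passage_time = passages.get(idx)
            if passage_time is not None and \
                    abs(passage_time - incident_time_s) <= window_s:
                matched.add(trip_id)
    return matched
```

Trips in the returned set would form the positive class; because they are few, the class imbalance that the summary addresses with SMOTE arises directly from this labeling step.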
Four machine‑learning classifiers were trained and evaluated using five‑fold cross‑validation: Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGB), and an Artificial Neural Network (ANN). Performance metrics included accuracy, recall (sensitivity), F1‑score, and Area Under the ROC Curve (AUC). The Random Forest model achieved the best overall performance, with a recall of about 0.78 and AUC of 0.84, indicating strong ability to identify downstream incidents while maintaining reasonable false‑positive rates. Feature‑importance analysis revealed that peak‑period mean speed, speed variability, and changes in heading angle were the most predictive variables, underscoring the value of fine‑grained kinematic information for incident detection.
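For readers less familiar with the reported metrics, the sketch below shows how accuracy, recall, F1-score, and AUC are computed from binary labels, hard predictions, and classifier scores. It is a plain-Python illustration with hypothetical function names, not the evaluation code used in the study; the AUC uses the pairwise-ranking definition (the fraction of positive/negative pairs in which the positive receives the higher score, with ties counted as half).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = incident)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Recall matters most in this setting because a missed incident (false negative) is costlier than an unnecessary alert, which is why accuracy alone would be misleading on such an imbalanced dataset.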
The study demonstrates that even low‑frequency GPS data can be transformed into a uniform detector‑based representation that supports real‑time risk estimation. The two‑minute observation window proved sufficient to capture pre‑incident vehicle behavior, suggesting that early warning systems can operate with minimal latency. Random Forest’s relatively low computational overhead makes it suitable for deployment in traffic management centers.
Limitations include the exclusive focus on a single evacuation period, which may not reflect typical traffic patterns, and the lack of detailed analysis by incident type (e.g., stalled vehicle vs. collision). Future work is suggested to incorporate online learning for continuous model updating, explore multi‑vehicle cooperative anomaly detection (e.g., graph neural networks), and evaluate the framework across diverse road networks and weather conditions.
In summary, the paper presents a practical, data‑driven framework for downstream incident detection using vehicle trajectory data, validates the superiority of Random Forest for this task, and highlights the potential for improving driver safety and traffic‑management response through timely, individualized incident alerts.