SAVeD: A First-Person Social Media Video Dataset for ADAS-equipped vehicle Near-Miss and Crash Event Analyses

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The advancement of safety-critical research on driving behavior in vehicles equipped with Advanced Driver Assistance Systems (ADAS) requires real-world datasets that not only include diverse traffic scenarios but also capture high-risk edge cases such as near-miss events and system failures. However, existing datasets are largely limited to either simulated environments or human-driven vehicle data, lacking authentic ADAS vehicle behavior under risk conditions. To address this gap, this paper introduces SAVeD, a large-scale video dataset curated from publicly available social media content, explicitly focused on ADAS vehicle-related crashes, near-miss incidents, and disengagements. SAVeD features 2,119 first-person videos, capturing ADAS vehicle operations in diverse locations, lighting conditions, and weather scenarios. The dataset includes video frame-level annotations for collisions, evasive maneuvers, and disengagements, enabling analysis of both perception and decision-making failures. We demonstrate SAVeD’s utility through multiple analyses and contributions: (1) We propose a novel framework integrating semantic segmentation and monocular depth estimation to compute real-time Time-to-Collision (TTC) for dynamic objects. (2) We utilize the Generalized Extreme Value (GEV) distribution to model and quantify the extreme risk in crash and near-miss events across different roadway types. (3) We establish benchmarks for state-of-the-art VLLMs (VideoLLaMA2 and InternVL2.5 HiCo R16), showing that SAVeD’s detailed annotations significantly enhance model performance through domain adaptation in complex near-miss scenarios.


💡 Research Summary

The paper introduces SAVeD, a large‑scale first‑person video dataset specifically designed for analyzing ADAS‑equipped vehicle crashes, near‑miss incidents, and system disengagements. By automatically crawling over 300,000 videos from platforms such as YouTube, BiliBili, and Douyin using keywords like “ADAS crash” and “self‑driving near miss,” the authors curated 2,119 high‑quality clips that were manually trimmed and verified. The dataset is divided into three categories: 1,040 collision videos, 602 near‑miss videos, and 477 system‑error videos, totaling more than 1.1 million frames. Each video is annotated at the frame level across 27 dimensions, covering temporal markers, vehicle dynamics, ADAS status, environmental conditions, surrounding traffic, and human driver interventions.
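The six annotation groups above could be represented, for instance, as a per-frame record. The sketch below is purely illustrative: the field names and types are assumptions, not SAVeD's actual schema, which spans 27 dimensions across these groups.

```python
from dataclasses import dataclass

# Hypothetical frame-level annotation record; field names are illustrative
# assumptions, grouped by the six annotation categories described in the paper.
@dataclass
class FrameAnnotation:
    # Temporal markers
    frame_index: int
    event_label: str          # e.g. "collision" | "near_miss" | "system_error" | "none"
    # Vehicle dynamics
    ego_speed_mps: float
    # ADAS status
    adas_engaged: bool
    disengagement: bool
    # Environmental conditions
    weather: str              # e.g. "clear", "rain", "snow"
    lighting: str             # e.g. "day", "night"
    # Surrounding traffic
    num_nearby_vehicles: int
    # Human driver intervention
    driver_takeover: bool

# Example record for a single frame of a near-miss clip.
ann = FrameAnnotation(
    frame_index=482, event_label="near_miss", ego_speed_mps=13.4,
    adas_engaged=True, disengagement=False, weather="rain",
    lighting="night", num_nearby_vehicles=3, driver_takeover=True,
)
```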

Two technical contributions are presented. First, a framework that fuses state‑of‑the‑art semantic segmentation with monocular depth estimation to compute real‑time Time‑to‑Collision (TTC) for dynamic objects, enabling quantitative risk assessment during near‑miss events. Second, the use of the Generalized Extreme Value (GEV) distribution to statistically model extreme risk across different roadway types, providing probabilistic estimates of rare but severe incidents.
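A minimal sketch of how these two contributions could fit together, under stated assumptions: TTC for a closing object is its estimated depth divided by its closing speed (depth per frame from a monocular depth model, masked by the object's segmentation), and a GEV distribution is fitted to per-event minimum TTC values to estimate tail risk. The paper's actual pipeline, models, and per-roadway fitting are more involved; all data here is synthetic.

```python
import numpy as np
from scipy.stats import genextreme

def ttc_from_depth(depths, dt):
    """Per-frame TTC for one tracked object: depth / closing speed.

    `depths` is the object's estimated metric depth per frame; only
    closing motion (depth decreasing) yields a finite TTC.
    """
    d = np.asarray(depths, dtype=float)
    closing_speed = -np.diff(d) / dt          # m/s; positive when approaching
    with np.errstate(divide="ignore"):
        ttc = np.where(closing_speed > 0, d[1:] / closing_speed, np.inf)
    return ttc

# Toy example: an object closing from 30 m to 10 m over 4 s at 30 fps.
rng = np.random.default_rng(0)
depths = np.linspace(30.0, 10.0, 120) + rng.normal(0.0, 0.05, 120)
ttc = ttc_from_depth(depths, dt=1 / 30)

# Extreme-risk modeling: fit a GEV to simulated per-event minimum TTCs.
# Minima are handled via maxima of the negated sample, a standard trick.
event_min_ttc = rng.gamma(shape=2.0, scale=1.5, size=500)
c, loc, scale = genextreme.fit(-event_min_ttc)
p_under_1s = genextreme.sf(-1.0, c, loc=loc, scale=scale)  # P(min TTC < 1 s)
```

In practice the depth series would come from a segmentation-masked monocular depth estimate per frame, and one GEV would be fitted per roadway type as the paper describes.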

The authors also benchmark two cutting‑edge video‑language models (VLLMs), VideoLLaMA2 and InternVL2.5 HiCo R16, on SAVeD. Domain adaptation with the detailed annotations yields substantial performance gains—average improvements of over 12 % in mean average precision, accuracy, and F1 score compared with training on generic video corpora. The gains are especially pronounced for complex near‑miss scenarios, where the models better identify hazardous objects and generate more accurate textual descriptions.
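For reference, accuracy and per-class F1, two of the three metrics averaged in the 12% figure above, can be computed as follows. The labels are illustrative, not drawn from SAVeD's benchmark.

```python
def accuracy_and_f1(y_true, y_pred, positive):
    """Accuracy over all labels and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

# Toy predictions over four clips: one collision misclassified as a near miss.
y_true = ["near_miss", "collision", "near_miss", "none"]
y_pred = ["near_miss", "near_miss", "near_miss", "none"]
acc, f1 = accuracy_and_f1(y_true, y_pred, positive="near_miss")  # 0.75, 0.8
```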

SAVeD addresses critical gaps in existing resources, which are either simulation‑based, limited to human‑driven vehicles, or confined to textual crash reports lacking temporal and spatial richness. By providing authentic first‑person footage of ADAS behavior under high‑risk conditions, the dataset supports a wide range of research: perception and decision‑making failure analysis, reaction‑time studies, risk modeling, and the development and validation of safer ADAS algorithms. The dataset will be publicly released with ongoing expansion plans to cover more geographic regions, weather conditions, and driving contexts, positioning SAVeD as a foundational asset for future ADAS safety research and policy development.

