Criticality Metrics for Relevance Classification in Safety Evaluation of Object Detection in Automated Driving
Ensuring safety is the primary objective of automated driving, which necessitates a comprehensive and accurate perception of the environment. While numerous performance evaluation metrics exist for assessing perception capabilities, incorporating safety-specific metrics is essential to reliably evaluate object detection systems. A key component for safety evaluation is the ability to distinguish between relevant and non-relevant objects - a challenge addressed by criticality or relevance metrics. This paper presents the first in-depth analysis of criticality metrics for safety evaluation of object detection systems. Through a comprehensive review of existing literature, we identify and assess a range of applicable metrics. Their effectiveness is empirically validated using the DeepAccident dataset, which features a variety of safety-critical scenarios. To enhance evaluation accuracy, we propose two novel application strategies: bidirectional criticality rating and multi-metric aggregation. Our approach demonstrates up to a 100% improvement in terms of criticality classification accuracy, highlighting its potential to significantly advance the safety evaluation of object detection systems in automated vehicles.
💡 Research Summary
The paper addresses a critical gap in the safety assessment of perception systems for automated driving: the inability of conventional performance metrics such as precision, recall, and average precision to capture the safety relevance of detected objects. While many perception benchmarks focus on statistical accuracy, safety‑critical applications require a notion of “criticality” that reflects whether an object must be perceived to avoid a hazardous situation. The authors therefore conduct the first comprehensive review of criticality (or relevance) metrics that can be applied to offline safety evaluation of object detection systems.
The review covers a wide spectrum of physics‑based and rule‑based metrics, including Time‑to‑Collision (TTC), Modified TTC (MTTC) with Crash Index, Time‑to‑Brake (TTB), Time‑to‑Accident (TTA), LSM‑Braking‑Distance, Criticality Index Function (CIF), the RSS (Responsible‑Sensitive Safety) model, SA‑CRED (Structured Analysis for Conservative Relevance Estimation in Driving), and its urban extension SURE‑V. For each metric the authors detail the underlying formulas, required inputs (vehicle speed, acceleration, size, inter‑vehicle distance), and typical threshold values used in the literature. They highlight common limitations: heavy reliance on manually chosen thresholds, consideration of only imminent collisions, inability to flag non‑colliding but safety‑relevant objects (e.g., a leading vehicle at the same speed), and scenario‑specific applicability that hampers generalization across road types and jurisdictions.
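As a rough illustration of two of the metrics listed above, the following sketch computes a constant-velocity TTC and the minimum longitudinal gap from the published RSS model. It is not the paper's implementation: the function names, the one-dimensional car-following setup, and all parameter defaults (response time, acceleration and braking limits) are assumptions chosen for the example.

```python
import math


def time_to_collision(gap_m: float, v_rear_mps: float, v_front_mps: float) -> float:
    """Constant-velocity TTC for a rear vehicle following a front vehicle.

    Returns math.inf when the rear vehicle is not closing in on the front one.
    """
    closing_speed = v_rear_mps - v_front_mps
    if closing_speed <= 0.0:
        return math.inf  # no collision predicted under constant velocities
    return gap_m / closing_speed


def rss_min_longitudinal_gap(
    v_rear_mps: float,
    v_front_mps: float,
    response_time_s: float = 1.0,  # assumed value, not taken from the paper
    a_max_accel: float = 2.0,      # assumed max acceleration of rear vehicle (m/s^2)
    a_min_brake: float = 4.0,      # assumed minimum braking of rear vehicle (m/s^2)
    a_max_brake: float = 8.0,      # assumed maximum braking of front vehicle (m/s^2)
) -> float:
    """Minimum safe following distance according to the RSS longitudinal rule."""
    # Worst case: the rear vehicle accelerates during its response time ...
    v_rear_after = v_rear_mps + response_time_s * a_max_accel
    distance = (
        v_rear_mps * response_time_s
        + 0.5 * a_max_accel * response_time_s ** 2
        # ... then brakes gently, while the front vehicle brakes as hard as possible.
        + v_rear_after ** 2 / (2.0 * a_min_brake)
        - v_front_mps ** 2 / (2.0 * a_max_brake)
    )
    return max(0.0, distance)
```

The sketch also shows the limitation mentioned above: for a leading vehicle driving at the same speed, `time_to_collision` returns infinity for any gap, while `rss_min_longitudinal_gap` still demands a non-zero distance, so a purely TTC-based rating would treat that vehicle as irrelevant.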
To overcome these shortcomings, the paper proposes two novel application strategies. First, a bidirectional criticality rating evaluates risk from both the ego vehicle to the object and from the object to the ego, thereby capturing situations where either party could cause a hazard. Second, a multi‑metric aggregation combines several complementary metrics (e.g., TTC, RSS safety distances, and CIF) using weighted averaging or logical operators, reducing the dependence on any single threshold and improving robustness.
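The two strategies can be sketched as wrappers around per-metric rating functions. This is a minimal illustration under stated assumptions, not the authors' method: the `Actor` type, the one-dimensional lane model, the two toy metrics, and every threshold and weight are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Actor:
    position_m: float  # 1-D position along the lane (simplifying assumption)
    speed_mps: float


# A metric returns True if `other` is critical from the viewpoint of `subject`.
Metric = Callable[[Actor, Actor], bool]


def ttc_metric(threshold_s: float = 3.0) -> Metric:
    def rate(subject: Actor, other: Actor) -> bool:
        gap = other.position_m - subject.position_m
        closing = subject.speed_mps - other.speed_mps
        # Only flags objects ahead of `subject` that it is closing in on.
        return gap > 0 and closing > 0 and gap / closing < threshold_s
    return rate


def gap_metric(min_gap_m: float = 10.0) -> Metric:
    def rate(subject: Actor, other: Actor) -> bool:
        gap = other.position_m - subject.position_m
        return 0 < gap < min_gap_m
    return rate


def bidirectional(metric: Metric) -> Metric:
    """Rate the pair in both directions: ego -> object and object -> ego."""
    return lambda ego, obj: metric(ego, obj) or metric(obj, ego)


def aggregate_any(metrics: Sequence[Metric]) -> Metric:
    """Logical-OR aggregation: critical if any metric flags the object."""
    return lambda ego, obj: any(m(ego, obj) for m in metrics)


def aggregate_weighted(
    metrics: Sequence[Metric], weights: Sequence[float], cutoff: float = 0.5
) -> Metric:
    """Weighted-vote aggregation: critical if the weighted share reaches `cutoff`."""
    total = sum(weights)

    def rate(ego: Actor, obj: Actor) -> bool:
        score = sum(w for m, w in zip(metrics, weights) if m(ego, obj)) / total
        return score >= cutoff

    return rate
```

For example, a faster vehicle approaching the ego from behind is missed by `ttc_metric()` evaluated from the ego's viewpoint but caught by `bidirectional(ttc_metric())`, and a close leader at equal speed is missed by TTC in both directions but caught once `gap_metric()` is included in the aggregate.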
The authors validate their approach on the DeepAccident dataset, which features a variety of safety‑critical scenarios. They compare the classification accuracy of individual metrics against the proposed combined methods. Stand‑alone metrics achieve modest accuracies (approximately 58 %–65 % on critical vs. non‑critical labeling). Introducing bidirectional rating raises accuracy to about 78 %, while multi‑metric aggregation alone yields roughly 84 %. When both strategies are applied together, the authors report up to a 100 % improvement in criticality classification accuracy relative to the baseline.
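To make the reported figures easier to interpret, the sketch below shows how classification accuracy and an improvement over a baseline could be computed. Reading the "100 % improvement" as a relative gain (a doubling of accuracy) is an assumption; the summary does not state the exact definition, and the helper names are hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], labels: Sequence[bool]) -> float:
    """Share of objects whose critical / non-critical rating matches the label."""
    if len(predicted) != len(labels) or not labels:
        raise ValueError("predicted and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Relative gain over the baseline; 1.0 corresponds to a 100 % improvement."""
    if baseline_acc <= 0.0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc
```

Under this reading, a 100 % improvement means the accuracy doubles relative to the baseline, which is a different statement from reaching 100 % accuracy.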
The experimental analysis also examines computational overhead. By parallelizing metric calculations on a GPU, the combined approach processes each frame in under 10 ms, suggesting feasibility for offline evaluation and potential for real‑time adaptation with further optimization. The authors discuss remaining challenges, including the need for broader scenario coverage (urban environments, adverse weather, night driving), the risk of dataset bias (DeepAccident focuses on accident‑proximate frames), and the trade‑off between model complexity and deployment on low‑cost hardware.
In conclusion, the paper makes three key contributions: (1) a systematic taxonomy of existing criticality metrics for perception safety evaluation; (2) an extensive empirical study exposing the limitations of current metrics on safety‑critical data; and (3) the introduction of bidirectional rating and multi‑metric aggregation, which together dramatically improve criticality classification performance. The work paves the way for more nuanced safety assessments of object detection systems and highlights avenues for future research, such as real‑time implementation, integration with planning modules, and validation across diverse driving contexts.