Coverage based testing for V&V and Safety Assurance of Self-driving Autonomous Vehicles: A Systematic Literature Review

Self-driving Autonomous Vehicles (SAVs) are attracting growing interest from both industry and the general public. Technology and automobile companies are investing heavily in SAV research and development to secure an early lead in the future SAV market. One of the major hurdles to deploying SAVs on public roads is the public's lack of confidence in their safety. To assure safety and build that confidence, researchers around the world have applied coverage-based testing to the Verification and Validation (V&V) and safety assurance of SAVs. The objective of this paper is to investigate the coverage criteria proposed, and the coverage-maximizing techniques used, by researchers over the last decade to assure the safety of SAVs. To do so, we conduct a Systematic Literature Review (SLR). We present a classification of existing research based on the coverage criteria used, and we identify several research gaps and research directions to enable further work in this domain. This paper provides a body of knowledge on the safety assurance of SAVs, and we believe its results will help advance V&V and safety assurance of SAVs.

💡 Research Summary

The paper presents a systematic literature review (SLR) of coverage‑based testing techniques that have been employed for verification and validation (V&V) and safety assurance of self‑driving autonomous vehicles (SAVs) over the past decade. Recognizing that public confidence in autonomous driving hinges on demonstrable safety, the authors set out to catalogue the coverage criteria proposed in the literature and the methods used to maximize coverage, and to identify the gaps that hinder broader adoption.

The authors followed a rigorous SLR protocol: they searched five major bibliographic databases (IEEE Xplore, Scopus, Web of Science, ACM Digital Library, and Google Scholar) using a combination of keywords such as “autonomous vehicle,” “coverage,” “testing,” “verification,” “validation,” and “safety assurance.” After removing duplicates and applying inclusion/exclusion criteria, 212 papers published between 2013 and 2023 were screened, and 68 primary studies were retained for detailed analysis. Each study was evaluated against a quality checklist that examined clarity of objectives, methodological rigor, reproducibility of experiments, and relevance to safety standards.
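The duplicate-removal and quality-assessment steps above can be illustrated with a minimal Python sketch. The summary names the four checklist criteria but not how answers were scored or which cutoff was applied, so the yes/partial/no scale (1, 0.5, 0), the title-based duplicate check, and the cutoff of 2.0 are all hypothetical assumptions rather than the authors' protocol.

```python
from dataclasses import dataclass

# Hypothetical answer scale: the summary does not say how checklist items were scored.
SCORE = {"yes": 1.0, "partial": 0.5, "no": 0.0}

# The four checklist criteria named in the text.
CRITERIA = ("clear_objectives", "methodological_rigor", "reproducibility", "standards_relevance")


@dataclass
class Study:
    title: str
    answers: dict  # criterion -> "yes" | "partial" | "no"


def quality_score(study: Study) -> float:
    """Sum the per-criterion scores; unanswered criteria count as 'no'."""
    return sum(SCORE[study.answers.get(c, "no")] for c in CRITERIA)


def deduplicate(studies):
    """Drop studies whose case- and whitespace-normalized title was already seen."""
    seen, unique = set(), []
    for s in studies:
        key = " ".join(s.title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def retain(studies, min_score=2.0):
    """Keep deduplicated studies whose score reaches a (hypothetical) cutoff."""
    return [s for s in deduplicate(studies) if quality_score(s) >= min_score]


pool = [
    Study("Scenario Coverage for AV Testing",
          {"clear_objectives": "yes", "methodological_rigor": "yes",
           "reproducibility": "partial", "standards_relevance": "no"}),
    Study("scenario coverage  for AV testing", {}),  # duplicate of the first title
    Study("Neuron Coverage Revisited", {"clear_objectives": "partial"}),
]
kept = retain(pool)  # only the first study survives both filters
```

Real SLRs usually deduplicate on DOIs and have two reviewers score each study independently; the title match here only keeps the sketch self-contained.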

The review organizes the selected works into four principal categories: (1) coverage criteria, (2) coverage‑maximizing techniques, (3) experimental case studies, and (4) alignment with safety standards.

Coverage criteria are further subdivided into:

Scenario coverage, which captures the breadth of driving situations (e.g., intersections, lane changes, pedestrian encounters) and environmental variables (weather, lighting, road surface).
Functional coverage, targeting perception, decision‑making, and control modules individually.
Code/structural coverage, encompassing traditional software metrics (branch, MC/DC, path) as well as neural‑network‑specific metrics such as neuron coverage and surprise adequacy.
Risk‑based coverage, linking test objectives to Automotive Safety Integrity Levels (ASILs) from ISO 26262 or to hazard levels defined under ISO/PAS 21448 (SOTIF).
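Two of the criteria above can be made concrete with a short, dependency-free Python sketch. It computes neuron coverage in the style popularized by DeepXplore (the fraction of neurons activated above a threshold by at least one test input) and scenario coverage over a discretized parameter grid. The grid, parameter names, and threshold are hypothetical, and published variants differ in details such as per-layer activation scaling.

```python
from itertools import product


def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` for at least one input.

    `activations` is a list of per-input vectors (one float per neuron).
    Raw values are compared directly; some definitions rescale per layer first.
    """
    n_neurons = len(activations[0])
    covered = {j for vec in activations for j, a in enumerate(vec) if a > threshold}
    return len(covered) / n_neurons


def scenario_coverage(executed, grid):
    """Fraction of cells in a discretized scenario space hit by executed tests.

    `grid` maps each scenario parameter to its list of discrete values;
    `executed` is a list of dicts assigning one value to every parameter.
    """
    names = sorted(grid)
    all_cells = set(product(*(grid[n] for n in names)))
    hit = {tuple(s[n] for n in names) for s in executed}
    return len(hit & all_cells) / len(all_cells)


# Hypothetical scenario space: 3 weather conditions x 2 actor types = 6 cells.
grid = {"weather": ["clear", "rain", "fog"], "actor": ["pedestrian", "cyclist"]}
tests = [
    {"weather": "clear", "actor": "pedestrian"},
    {"weather": "rain", "actor": "pedestrian"},
    {"weather": "clear", "actor": "pedestrian"},  # repeated cell counts only once
]
print(scenario_coverage(tests, grid))  # 2 of 6 cells covered
print(neuron_coverage([[0.2, -1.0, 0.0], [0.0, 0.5, 0.0]]))  # neurons 0 and 1 fire
```

Both metrics reward breadth rather than depth: a test suite can reach full grid coverage while exercising each cell with a single, possibly benign, concrete scenario.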

Coverage‑maximizing techniques identified include:

Search‑based testing (genetic algorithms, particle swarm optimization, differential evolution), which explores high‑dimensional scenario parameter spaces.
Combinatorial testing using covering arrays to guarantee t‑wise interaction coverage while limiting test set size.
Reinforcement‑learning‑driven scenario generation, where reward functions prioritize high‑risk or low‑probability events, enabling rapid discovery of corner cases.
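Bayesian optimization, which fits a probabilistic surrogate model (typically a Gaussian process) to expensive simulation outcomes and uses its uncertainty estimates to select the most informative next test points.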
Hybrid simulation‑real‑world pipelines, integrating open‑source simulators (CARLA, LGSVL, AirSim) with hardware‑in‑the‑loop or test‑track experiments to narrow the gap between simulated and real‑world behavior.
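Combinatorial testing can be sketched with a naive greedy construction of a pairwise (t = 2) covering array, in which every pair of values for every pair of parameters appears in at least one test. This Python sketch enumerates the full Cartesian product as candidates, which only scales to small parameter spaces; dedicated tools such as NIST ACTS or Microsoft PICT use more efficient strategies. The scenario parameters are hypothetical.

```python
from itertools import combinations, product


def pairs_of(test):
    """All (param_i, value_i, param_j, value_j) pairs exercised by one test."""
    return {(i, test[i], j, test[j]) for i, j in combinations(range(len(test)), 2)}


def greedy_pairwise(domains):
    """Build a 2-way covering array greedily.

    `domains` is a list of value lists, one per parameter. At each step the
    candidate test that covers the most still-uncovered pairs is added, until
    every value pair of every parameter pair is covered.
    """
    candidates = list(product(*domains))
    uncovered = set()
    for c in candidates:
        uncovered |= pairs_of(c)
    suite = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


# Hypothetical scenario parameters: weather, lighting, road surface, actor.
domains = [
    ["clear", "rain", "fog"],
    ["day", "night"],
    ["dry", "wet"],
    ["pedestrian", "cyclist", "none"],
]
suite = greedy_pairwise(domains)
print(len(suite), "pairwise tests vs", 3 * 2 * 2 * 3, "exhaustive combinations")
```

The lower bound here is 9 tests (3 weather values x 3 actor values), so any pairwise suite is at most a quarter of the exhaustive 36; the saving grows rapidly as parameters are added.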

The experimental section summarizes empirical results from the literature. Studies that combined scenario coverage with risk‑based weighting reported an average 27 % increase in high‑risk event detection compared with random sampling. Reinforcement‑learning approaches produced up to three times more safety‑critical scenarios than baseline methods while using fewer simulation cycles. Hybrid pipelines demonstrated that a modest set of real‑world runs can calibrate simulator parameters, improving the fidelity of subsequent virtual tests.
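The calibration idea in the last sentence can be illustrated with a deliberately simplified, hypothetical example: estimating a simulator's tyre-road friction coefficient from a few real-world braking runs. The braking model, the parameter, and the measurements below are illustrative assumptions, not data or methods from the reviewed studies.

```python
G = 9.81  # gravitational acceleration, m/s^2


def calibrate_friction(runs):
    """Least-squares estimate of the tyre-road friction coefficient mu.

    Hypothetical simplified model: braking distance d = v**2 / (2 * mu * G),
    ignoring reaction time and brake ramp-up. With x = v**2 / (2 * G) the model
    is linear in k = 1 / mu (d = k * x), so the least-squares fit through the
    origin is k = sum(x * d) / sum(x * x).
    `runs` is a list of (speed_m_per_s, braking_distance_m) from real-world tests.
    """
    xs = [v * v / (2 * G) for v, _ in runs]
    ds = [d for _, d in runs]
    k = sum(x * d for x, d in zip(xs, ds)) / sum(x * x for x in xs)
    return 1.0 / k


def simulated_distance(v, mu):
    """Braking distance the calibrated (simplified) simulator would predict."""
    return v * v / (2 * mu * G)


# Synthetic measurements generated with mu = 0.7 plus small noise (illustrative only).
runs = [(10.0, 7.35), (15.0, 16.30), (20.0, 29.20)]
mu_hat = calibrate_friction(runs)
print(round(mu_hat, 2), round(simulated_distance(25.0, mu_hat), 1))
```

Once calibrated, the estimated parameter is fed back into the simulator so that subsequent virtual tests, such as the 25 m/s prediction above, are anchored to measured behavior rather than to default values.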

In the standards alignment discussion, the authors note that several works attempted to map coverage metrics to ISO 26262, ISO/PAS 21448, and UNECE R155 requirements. For example, an ASIL‑D functional module may be required to achieve ≥95 % functional coverage, and SOTIF‑level hazard analysis may be linked to a minimum scenario‑coverage threshold. However, inconsistencies in metric definitions across standards and the lack of a unified mapping framework impede seamless integration.
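A coverage gate of the kind such a mapping implies can be sketched in a few lines of Python, for example as a check that fails a CI build. Only the ASIL D threshold (95% functional coverage) comes from the example above; the thresholds for the other levels, the module names, and the per-module framing are hypothetical placeholders that a real project would derive from its own safety case.

```python
# Only the ASIL D entry reflects the example in the text (>= 95% functional
# coverage); the other thresholds are hypothetical placeholders.
MIN_FUNCTIONAL_COVERAGE = {"QM": 0.0, "A": 0.70, "B": 0.80, "C": 0.90, "D": 0.95}


def check_coverage(modules):
    """Return the names of modules whose functional coverage misses its ASIL threshold.

    `modules` maps module name -> (asil, achieved coverage in [0, 1]).
    """
    return sorted(
        name
        for name, (asil, cov) in modules.items()
        if cov < MIN_FUNCTIONAL_COVERAGE[asil]
    )


# Hypothetical coverage report produced by a test run.
report = {
    "emergency_braking": ("D", 0.93),
    "lane_keeping": ("B", 0.85),
    "infotainment": ("QM", 0.10),
}
failures = check_coverage(report)
print(failures)  # ['emergency_braking']
```

Keeping the thresholds in one table makes the inconsistency problem noted above visible: every entry has to be justified against a specific standard clause.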

The review identifies four major research gaps:

Fragmented coverage models – most studies focus on a single dimension (scenario, functional, or code) without a unified multi‑level framework that captures interdependencies.
Limited real‑world data – large‑scale, high‑quality driving logs are scarce, and manual labeling of complex scenarios remains costly.
Quantitative safety‑coverage linkage – there is a shortage of formal models that translate safety goals (e.g., target probability of failure) into concrete coverage targets.
Automation and CI/CD integration – existing tools lack mature pipelines for continuous, automated generation, execution, and analysis of coverage‑driven tests within an autonomous‑vehicle development lifecycle.

To address these gaps, the authors propose a research agenda that includes:

Developing a hierarchical, multi‑level coverage model that jointly optimizes scenario, functional, code, and risk dimensions.
Leveraging large‑scale naturalistic driving datasets combined with unsupervised clustering to automatically derive representative scenario families.
Introducing Bayesian reliability models that map safety objectives to required coverage levels, enabling probabilistic assurance arguments.
Building open, extensible CI/CD‑compatible testing frameworks that integrate scenario generation, simulation, hardware‑in‑the‑loop, and coverage reporting, thereby supporting rapid feedback loops.
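One textbook way to link a safety objective to a test budget, in the spirit of the Bayesian reliability models proposed above, is a zero-failure Beta-Binomial argument. Assuming each test is an independent Bernoulli trial with unknown failure probability p and a uniform Beta(1, 1) prior, n failure-free tests give a Beta(1, n + 1) posterior, so the confidence that p ≤ p0 is 1 − (1 − p0)^(n+1). The target p0 and confidence level in this Python sketch are hypothetical.

```python
import math


def posterior_confidence(n_failure_free, p0):
    """P(p <= p0 | n failure-free tests) under a uniform Beta(1, 1) prior.

    The posterior is Beta(1, n + 1), whose CDF at p0 is 1 - (1 - p0)**(n + 1).
    """
    return 1.0 - (1.0 - p0) ** (n_failure_free + 1)


def required_failure_free_tests(p0, confidence):
    """Smallest n such that posterior_confidence(n, p0) >= confidence."""
    n_plus_1 = math.ceil(math.log1p(-confidence) / math.log1p(-p0))
    return max(n_plus_1 - 1, 0)


# Hypothetical safety goal: per-test failure probability below 1e-3 with 95% confidence.
n = required_failure_free_tests(1e-3, 0.95)
print(n, round(posterior_confidence(n, 1e-3), 4))
```

The independence and representativeness assumptions are strong: failure-free runs drawn from a skewed scenario distribution say little about field failure rates, which is why such a test count would need to be tied to scenario-coverage targets rather than used on its own.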

The paper concludes that the systematic synthesis presented herein offers a comprehensive knowledge base for researchers and practitioners seeking to enhance the safety assurance of autonomous vehicles through coverage‑driven testing. By highlighting existing achievements, pinpointing critical deficiencies, and outlining concrete future directions, the review aims to catalyze the development of more robust, scalable, and standards‑aligned V&V practices for the next generation of self‑driving cars.