ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios

Reading time: 5 minutes

📝 Original Info

  • Title: ScenicRules: An Autonomous Driving Benchmark with Multi-Objective Specifications and Abstract Scenarios
  • ArXiv ID: 2602.16073
  • Date: 2026-02-17
  • Authors: The paper's author information was not provided.

📝 Abstract

Developing autonomous driving systems for complex traffic environments requires balancing multiple objectives, such as avoiding collisions, obeying traffic rules, and making efficient progress. In many situations, these objectives cannot be satisfied simultaneously, and explicit priority relations naturally arise. Also, driving rules require context, so it is important to formally model the environment scenarios within which such rules apply. Existing benchmarks for evaluating autonomous vehicles lack such combinations of multi-objective prioritized rules and formal environment models. In this work, we introduce ScenicRules, a benchmark for evaluating autonomous driving systems in stochastic environments under prioritized multi-objective specifications. We first formalize a diverse set of objectives to serve as quantitative evaluation metrics. Next, we design a Hierarchical Rulebook framework that encodes multiple objectives and their priority relations in an interpretable and adaptable manner. We then construct a compact yet representative collection of scenarios spanning diverse driving contexts and near-accident situations, formally modeled in the Scenic language. Experimental results show that our formalized objectives and Hierarchical Rulebooks align well with human driving judgments and that our benchmark effectively exposes agent failures with respect to the prioritized objectives. Our benchmark can be accessed at https://github.com/BerkeleyLearnVerify/ScenicRules/.

📄 Full Content

Real-world autonomous driving entails managing multiple, often conflicting objectives with varying priorities. Fig. 1a illustrates such a case from the perspective of a Waymo autonomous vehicle [1]. A scooter rider in the bike lane suddenly falls onto the road in front of the vehicle. To avoid a collision, the vehicle swerves into the adjacent left lane, nearly crossing into the opposite direction of traffic. In this situation, the vehicle cannot simultaneously satisfy the objectives of avoiding collision with the rider and maintaining lane position, and therefore prioritizes the former. Fig. 1b shows another example [2], where a Tesla autonomous vehicle enters the oncoming lane to avoid a large puddle and improve passenger comfort. Here, the objectives of staying on the correct side of the road and ensuring passenger comfort conflict, and the vehicle prioritizes the latter. These examples highlight the importance of explicitly specifying multiple objectives and their priority relations, along with the environment context, when evaluating autonomous driving systems.

To effectively evaluate autonomous driving systems in modern, complex traffic environments and capture realistic trade-offs among objectives, a benchmark must satisfy three key requirements.

First, it should provide a diverse set of objectives formalized as quantitative metrics and/or Boolean properties for evaluating agents and measuring the degree of violation. Second, it should include a specification framework that permits expressing priority relations among objectives. This framework must be interpretable, easy to manipulate, and adaptable across scenarios to reflect different preferences. Third, it should employ an expressive representation of traffic scenarios in which autonomous driving systems can be evaluated against multi-objective specifications, assessing their ability to balance competing goals.

To meet the first requirement, prior work has attempted to formalize traffic rules using Boolean properties or quantitative metrics [3]-[8]. However, these studies do not validate whether the formalized rules accurately capture violations in real driving behaviors. In this paper, we not only formalize a diverse set of driving rules, but also provide alternative definitions for selected rules, serving as a basis for examining how different formulations influence evaluation outcomes. We further design precise and fine-grained measures to assess the degree of rule violation.
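To make the distinction between a Boolean property and a quantitative degree of violation concrete, here is a minimal sketch in Python. The rule (a minimum clearance to another road user), the function name `clearance_violation`, the 2.0 m threshold, and the trajectories are all illustrative assumptions and are not taken from the benchmark.

```python
from math import hypot

def clearance_violation(ego_xy, other_xy, min_clearance=2.0):
    """Evaluate a hypothetical minimum-clearance rule over two time-aligned trajectories.

    Returns (violated, degree): `violated` is the Boolean property, and
    `degree` is the worst shortfall below `min_clearance` in metres
    (0.0 if the rule is never violated).
    """
    worst = 0.0
    for (ex, ey), (ox, oy) in zip(ego_xy, other_xy):
        gap = hypot(ex - ox, ey - oy)           # distance at this time step
        worst = max(worst, min_clearance - gap)  # keep the largest shortfall
    return worst > 0.0, worst

# Made-up positions: the ego passes within 1.5 m of the other agent at step 1.
ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
other = [(5.0, 0.0), (2.5, 0.0), (6.0, 0.0)]
violated, degree = clearance_violation(ego, other)
print(violated, degree)
```

The Boolean result only says that the rule was broken, while the degree lets two violating runs be ranked against each other.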

Second, most existing autonomous driving benchmarks either focus on a single objective [9]-[11] or consider multiple objectives without modeling their priority relations [12]-[17]. To address this limitation, we adopt the Rulebook structure [18], which captures the priority relations among multiple objectives. We further design a Hierarchical Rulebook framework that allows flexible adaptation to diverse driving contexts while remaining interpretable.
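The idea of comparing behaviors under prioritized objectives can be sketched as follows. This is a simplified reading of a rulebook-style ordering, assumed here for illustration: behavior x is preferred to behavior y if x is better on at least one rule and every rule on which x is worse is outranked by some rule on which x is better. The rule names, the `order` dictionary, the violation scores, and the helper names are hypothetical and do not reproduce the benchmark's Hierarchical Rulebook.

```python
def higher_priority(order, a, b):
    """True if rule `a` strictly outranks rule `b`.

    `order` maps each rule to the rules directly below it; the search
    follows those edges transitively.
    """
    stack, seen = [a], set()
    while stack:
        cur = stack.pop()
        for nxt in order.get(cur, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def prefers(order, x, y):
    """True if violation scores `x` are strictly preferred to `y` (lower is better)."""
    better = [r for r in x if x[r] < y[r]]
    worse = [r for r in x if x[r] > y[r]]
    if not better:
        return False
    # Every rule where x is worse must be dominated by a rule where x is better.
    return all(any(higher_priority(order, b, w) for b in better) for w in worse)

# Hypothetical priorities: collision avoidance > lane keeping > comfort.
order = {"no_collision": ["stay_in_lane"], "stay_in_lane": ["comfort"]}
swerve = {"no_collision": 0.0, "stay_in_lane": 1.2, "comfort": 0.3}
stay = {"no_collision": 0.8, "stay_in_lane": 0.0, "comfort": 0.1}
print(prefers(order, swerve, stay), prefers(order, stay, swerve))
```

With these made-up scores, swerving is preferred because its lane-keeping and comfort violations are both outranked by the collision rule. If the two conflicting rules were unordered, neither behavior would be preferred, which is what distinguishes a partial order from a single weighted sum.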

Finally, while many driving datasets collect large amounts of human driving data [9]-[14], [19]-[21], such data-heavy approaches suffer from incompleteness or are inefficient for verification. In our benchmark, we apply the concept of coreset selection [22], [23] to construct a compact yet representative set of scenarios that achieves broad coverage. We also reconstruct near-accident scenarios from real-world collision reports using a Large Language Model (LLM)-assisted pipeline, enabling evaluation of agents in critical situations. All scenarios are represented in the Scenic programming language [24], [25], which provides an expressive, yet abstract way to model complex traffic scenarios and stochastically generates diverse concrete scenarios with varying input parameters, enabling parameter-level coverage in simulation-based verification.
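As a rough illustration of how a compact yet representative subset can be chosen, the sketch below uses greedy k-center (farthest-point) selection, one common coreset heuristic. It is an assumed stand-in: the text above does not say which selection procedure the benchmark uses, and the one-dimensional-looking feature vectors and the function name `greedy_k_center` are invented for the example.

```python
from math import dist

def greedy_k_center(points, k, start=0):
    """Pick up to k indices so that every point is close to some chosen point.

    Starts from `start`, then repeatedly adds the point that is farthest
    from the current selection.
    """
    chosen = [start]
    # Distance from each point to its nearest chosen point so far.
    nearest = [dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        idx = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [min(nearest[i], dist(points[i], points[idx]))
                   for i in range(len(points))]
    return chosen

# Hypothetical scenario features: two near-duplicate pairs and one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (10.0, 0.0)]
print(greedy_k_center(features, k=3))  # prints [0, 4, 2]
```

The selection keeps one scenario from each cluster and the outlier, skipping the near-duplicates, which is the sense in which a small set can still cover the space.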

The main contributions of this paper are as follows.

• We propose ScenicRules, a benchmark that integrates Scenic programs with Rulebook specifications. To the best of our knowledge, this is the first benchmark to combine a multi-objective, priority-based specification framework (Rulebook) with an expressive scenario modeling notation (Scenic) for evaluating autonomous driving approaches and systems.

Fig. 1: Real-world autonomous driving examples with multiple conflicting objectives. (a) A Waymo autonomous vehicle prioritizes collision avoidance over lane keeping [1]. (b) A Tesla autonomous vehicle prioritizes passenger comfort over staying on the correct side [2].

• We collect and formalize a diverse set of autonomous driving objectives with precise, quantitative violation measures (Sec. IV-A).

• We design a Hierarchical Rulebook framework that not only encodes multiple prioritized objectives but also remains interpretable, extensible, and adaptable to diverse driving contexts (Sec. IV-B).

• We curate a representative and critical set of Scenic scenarios, forming a lightweight yet comprehensive testbed for evaluating auto

Reference

This content is AI-processed based on open access ArXiv data.
