Behavior-Centric Extraction of Scenarios from Highway Traffic Data and their Domain-Knowledge-Guided Clustering using CVQ-VAE
Approval of ADS depends on evaluating its behavior within representative real-world traffic scenarios. A common way to obtain such scenarios is to extract them from real-world data recordings. These can then be grouped and serve as basis on which the…
Authors: Niklas Roßberg, Sinan Hasirlioglu, Mohamed Essayed Bouzouraa
Niklas Roßberg¹, Sinan Hasirlioglu², Mohamed Essayed Bouzouraa², Wolfgang Utschick³ and Michael Botsch¹

©2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: t.b.d.

Abstract: Approval of Automated Driving Systems (ADS) depends on evaluating their behavior within representative real-world traffic scenarios. A common way to obtain such scenarios is to extract them from real-world data recordings. These can then be grouped and serve as a basis on which the ADS is subsequently tested. This poses two central challenges: how scenarios are extracted and how they are grouped. Existing extraction methods rely on heterogeneous definitions, hindering scenario comparability. For the grouping of scenarios, rule-based or Machine Learning (ML)-based methods can be utilized. However, while modern ML-based approaches can handle the complexity of traffic scenarios, unlike rule-based approaches, they lack interpretability and may not align with domain-knowledge. This work contributes to a standardized scenario extraction based on the Scenario-as-Specification concept, as well as a domain-knowledge-guided scenario clustering process. Experiments on the highD dataset demonstrate that scenarios can be extracted reliably and that domain-knowledge can be effectively integrated into the clustering process.
As a result, the proposed methodology supports a more standardized process for deriving scenario categories from highway data recordings and thus enables a more efficient validation process of automated vehicles.

I. INTRODUCTION

One of the key challenges in developing ADS is ensuring safe and reliable operation. To verify this, scenarios are collected from the system's intended Operational Design Domain (ODD). These scenarios represent the diverse traffic situations an ADS may encounter. The system can then be tested within these scenarios, either on proving grounds or in simulation. This approach offers the advantage that testing can be significantly accelerated and made more cost-effective compared to real-world testing [1]. Consequently, scenario-based testing has become a central paradigm in the research on ADS development and validation [2]–[4]. However, the foundation for such scenarios is raw traffic recordings, such as the highD dataset [5]. The first challenge, therefore, lies in extracting individual scenarios from these recordings [3]. A straightforward approach is to define the entire trajectory of a recorded vehicle as one scenario, or to segment it into consecutive, fixed-length snippets [6].

¹ Technische Hochschule Ingolstadt, 85049 Ingolstadt, Germany {niklas.rossberg, michael.botsch}@thi.de
² AUDI AG, 85045 Ingolstadt, Germany {sinan.hasirlioglu, essayed.bouzouraa}@audi.de
³ Technische Universität München, 80333 München, Germany wolfgang.utschick@tum.de

Fig. 1: Overview of the proposed method for knowledge-guided scenario extraction and clustering. Stages shown: Traffic Dataset; Scenario Preprocessing with 1) Behavior Change Detection, 2) Scenario Extraction, 3) Interaction Score Assignment; Scenario Clustering (Knowledge-Guided Clustering via CVQ-VAE).

Another approach is to detect events and extract individual scenarios each time an event occurs.
A wide range of studies determine such events by applying rule-based approaches. Examples include changes in the surrounding environment [2], [7], changes in the ego behavior such as lane-change maneuvers [8], [9], or violations of a minimum time headway [10]. However, these approaches lack standardization, which can result in scenarios derived by different methods being non-comparable, hindering the development of ADS.

To overcome this limitation, Bouzouraa and Hasirlioglu [4] propose the Scenario as Specification (SaS) approach. This concept, explained further in Section III-A, defines the start of a scenario by a change in the ego vehicle's behavior. However, detection is nontrivial given heterogeneous behavior changes and ambiguous onsets: lane-change durations vary, measurements are noisy, and maneuvers may be aborted. Therefore, a threshold-adaptive rule-based method to detect changes in the ego vehicle's behavior is developed and compared against other methods. Based on these detected changes, scenarios are subsequently extracted.

Once scenarios are available, the next challenge is grouping them into meaningful categories for testing and analysis, as shown in Fig. 1. Since these categories are not known a priori, this problem can be addressed using unsupervised machine learning. Scenarios may differ substantially in structure (e.g., number of vehicles or velocities) yet represent the same underlying maneuver, such as a cut-in. Standard clustering approaches may overemphasize superficial properties and miss the behavioral essence. Therefore, unsupervised clustering must be guided by domain-knowledge that captures what truly defines scenario similarity. This work addresses that challenge with an autoencoder-based methodology for domain-knowledge-guided clustering.
A Clustering Vector Quantized - Variational Autoencoder (CVQ-VAE) [11] is employed, and its latent representation is enriched with domain priors from behavior change detection. Taking expert knowledge, such as physical constraints [12] or interactions between traffic participants, into account when processing scenarios can lead to remarkable results, as shown in [13], [14]. Therefore, each vehicle is assigned an interaction score derived from the Directed Gradient - Social Force Model (DG-SFM) [15], which directs the model toward the influential agents and ensures that clusters reflect the underlying interaction logic rather than raw scenario complexity.

The key contributions of this paper are as follows:
1) A standardized rule-based methodology for highway scenario extraction based on ego vehicle behavior changes.
2) A domain-knowledge-guided scenario clustering approach that injects knowledge about vehicle interactions and ego behaviors into a CVQ-VAE.
3) An evaluation of behavior change detection and scenario clustering using ground-truth datasets, demonstrating the effectiveness of the methods.

II. RELATED WORK

To enable scenario-based validation of ADS, continuous traffic recordings must be segmented into comparable units and collected in a scenario library. Section II-A reviews different methods to segment traffic recordings and subsequently extract scenarios. Section II-B surveys approaches to cluster the extracted scenarios.

A. Extraction of Traffic Scenarios

For the extraction of scenarios, rule-based methods remain prevalent since they are simple, interpretable, and easy to scale. They typically define scenario boundaries via explicit triggers on ego-centric signals [8], [10], [16] or interactions [17], [18], and specify termination either by additional state conditions or by a fixed duration.
Single-feature triggers use thresholds on ego measures, e.g., a scenario starts when the time headway to the leading vehicle falls below a limit, with a fixed end time [10]. Similarly, predefined scenario types with fixed durations are instantiated once ego conditions such as “decelerating” and “leader present” hold [16]. Event-specific rules detect maneuvers directly, e.g., lane changes from lane-ids and lateral-velocity thresholds to set start/end around the lane crossing [8]. Tag-based pipelines generalise this idea by composing semantic tags over ego state, relative kinematics, and environment context to extract segments [19]–[21]. Interaction-centered rules trigger scenarios when ego and neighbor trajectories merge, diverge, or cross [18]. Recent work [17] refines the interaction-centered scenarios through improved interaction and relevance metrics to prioritise influential neighbors. While effective, these approaches can be sensitive to thresholds and struggle to generalise across diverse scenario types [22].

On the other hand, ML-based methods learn scenario structure directly from data and can therefore complement rule-based pipelines [23], [24]. Broadly, they fall into two groups: (i) latent-clustering approaches, which encode time steps, cluster the embeddings, and place boundaries where cluster assignments change [25]–[27]; (ii) probabilistic labeling approaches, which predict per-timestep probabilities for predefined scenario classes and derive segments from consecutive high-probability windows [24], [28]. Within the first group, Kreutz et al. [25] use k-means on self-supervised embeddings to detect latent regime shifts, while Chetouane et al. [27] compare alternative clustering algorithms for episode extraction. As supervised baselines, Elspas et al. [28] train fully convolutional networks on programmatically generated labels to obtain per-timestep probabilities. Montanari et al.
[24] couple a rule-based state machine with an RNN to aid the transition detection. A hybrid alternative first segments by maximising an energy objective and then classifies the resulting variable-length maneuvers [29]. Overall, these approaches lessen a-priori tuning and handle variable durations. Nevertheless, as demonstrated by [23], even simple rule-based baselines achieve competitive accuracy.

B. Clustering of Traffic Scenarios

The methodologies for clustering traffic scenarios found in the literature can roughly be grouped into two different approaches: (i) rule-based similarities [8], [19], [20], [30] and (ii) learning-based clustering [9], [16], [31]–[36].

Within rule-based methods, distance-driven approaches compare trajectories directly: Kerber et al. [8] align vehicle positions over a scenario, while Ries et al. [30] pre-filter by present object types and maneuvers and then measure similarity via Dynamic Time Warping (DTW) on feature distances. Tag-based pipelines categorize scenarios using semantic tags from extraction (ego state, relative kinematics, context) [19], [20].

Learning-based approaches replace hand-crafted metrics with embeddings and data-adaptive similarities. Random-Forest approaches derive proximities from tree paths: Kruber et al. [31] train with synthetic noise to capture real-data structure and cluster via path proximity. Balasubramanian et al. [16] extend this approach with open-set/open-world scenario discovery. Hauer et al. [34] combine DTW with k-means on latent features to group similar scenarios. Other methods embed scene graphs with contrastive learning before clustering [35]. Zeng et al. [36] introduce Toeplitz Inverse Covariance Clustering (TICC) for segmenting short windows into stable action clusters.
Autoencoder-based approaches, including the Vector Quantized - Variational Autoencoder (VQ-VAE) [37] and metric-guided variants, have gained traction for their ability to compress multi-agent spatiotemporal structure into compact, clustering-friendly representations [9], [11], [32], [33], [38], [39].

However, the integration of domain-knowledge into the clustering process of scenarios remains an open challenge. To the best of our knowledge, no existing work combines rule-based scenario extraction with explicit guidance from that extraction to drive the scenario clustering.

III. PRELIMINARIES

This section introduces important terms and the methods used and extended in this work. Section III-A outlines the scenario concept and its structure. Section III-B shows a method utilized to enrich scenarios with domain-knowledge.

A. Scenario Definition and Structure

The SaS approach treats scenarios as formal, testable specifications of the system [4], enabling their systematic use throughout development and for approval. It defines a scenario based on the target behavior of the ego vehicle. Target behavior means the specified correct behavior for the ADS in the current scenario. A new scenario arises only when the target behavior changes. In other words, changes in the static or dynamic environment constitute a new scenario only if they imply a different target behavior.

Example 1: The ego travels on a highway in the same lane, following a lead vehicle. Although external conditions may vary (overtakes, straight or curved road), it remains the same scenario as long as the target behavior “keep lane and maintain car-following” is unchanged.

Example 2: A cut-in forces the ego to decelerate to re-establish a safe time headway. The target behavior changes to “decelerate due to cut-in until a safe distance is restored,” and a new scenario is instantiated.

For further details of the SaS approach, see [4].
Real-world recordings cannot provide the target behavior. However, it is essential to test against real traffic scenarios [19]. To overcome this problem, scenarios are detected via observed behavior changes of the ego, referred to as ego behavior changes. This ensures comparability across scenarios from different sources. Since the focus lies on dynamic motion, the highD dataset [5] is selected as the traffic dataset.

The temporal extent of each scenario is constrained to a fixed observation horizon $T_{\text{obs}}$ following each detected ego behavior change. This event-anchored trimming aligns samples in time and thereby enables stable training with a shared codebook in the CVQ-VAE. The horizon $T_{\text{obs}}$ is chosen to capture the behavior change and the subsequent steady behavior. The spatial cardinality is also fixed: a constant number $N$ of vehicles per scenario is enforced. If fewer than $N$ are present, pseudo-vehicles, which do not influence the clustering, are added until $N$ is reached. Each trajectory is represented by $F$ features per vehicle, yielding a fixed-size tensor for each scenario $\xi$:

$$\xi \in \mathbb{R}^{N \times F \times T_{\text{obs}}}.$$

Concrete parameter choices are given in Section IV.

To inject domain-knowledge, a rule-based pseudo-class $\mathbf{s} \in \mathbb{R}^{S}$ encodes the ego's behavior change type (Section IV-B). In addition, an interaction score matrix $\mathbf{T} \in \mathbb{R}^{N \times T_{\text{obs}}}$ is computed per vehicle and time step using the DG-SFM [15] to quantify each neighbor's relevance to the ego maneuver. The resulting dataset is

$$\mathcal{D} = \left\{ \left( \xi^{(m)}, \mathbf{s}^{(m)}, \mathbf{T}^{(m)} \right) \right\}_{m=1}^{M},$$

comprising $M$ samples, each containing the scenario tensor $\xi$, its pseudo-class $\mathbf{s}$, and interaction scores $\mathbf{T}$.

B. Directed Gradient - Social Force Model

The DG-SFM [15] adapts the repulsive potential component of the Social Force Model (SFM) [40] from crowd dynamics to road traffic. The authors use it to align a Transformer's attention over scenario participants with a human-interpretable, physics-based interaction score.
The SFM represents interactions between agents through repulsive potentials, i.e., virtual forces that discourage agents from coming too close to each other. In road traffic, such potentials quantify how strongly one vehicle affects another's motion and therefore serve as an interpretable proxy for interaction [15]. DG-SFM modifies this repulsive potential by shaping it asymmetrically, stretching the influence area forward in the agent's direction of motion, which results in an “egg-shaped” interaction field [15].

Let $i$ denote the ego vehicle, and let $\mathcal{J}_i$ be the set of its neighboring vehicles. For each $\ell \in \{i, j\}$, let $r_\ell \in \mathbb{R}^2$ and $v_\ell \in \mathbb{R}^2$ denote the position and velocity of vehicle $\ell$, respectively. DG-SFM defines two complementary components: (i) how deeply $j$ intrudes into $i$'s directional personal space ($\hat{\beta}^{A}_{ij}$), and (ii) the short-horizon change of the interaction based on $j$'s potential field ($\hat{\beta}^{B}_{ij}$):

$$\hat{\beta}^{A}_{ij} = V_{\text{egg}}(r_j, r_i, v_i), \tag{1}$$

$$\hat{\beta}^{B}_{ij} = V_{\text{egg}}(r_i^{*}, r_j^{*}, v_j) - V_{\text{egg}}(r_i, r_j, v_j), \tag{2}$$

where the short-horizon extrapolation is given by $r_\ell^{*} = r_\ell + N_{\text{DG}}\, v_\ell\, \Delta t$ ($\ell \in \{i, j\}$), with $\Delta t > 0$ denoting the temporal resolution and $N_{\text{DG}}$ the number of discretization steps [15]. $V_{\text{egg}}$ is the direction-aware repulsive potential along the mover's heading (see [15]). The components are subsequently weighted and normalized to yield a deterministic interaction distribution over all neighbors.

IV. METHODOLOGY

This section presents our approach to scenario extraction and the integration of domain-knowledge into the subsequent clustering stage. The CVQ-VAE serves as the foundation for clustering the scenarios. The prior knowledge incorporated originates from two sources: (i) behavior change detection and (ii) the DG-SFM.

A. Problem Description

We consider continuous multi-agent highway recordings sampled at $\Delta t$. Event-anchored, fixed-size scenarios must be extracted from these recordings.
Each scenario is represented as $\xi \in \mathbb{R}^{N \times F \times T_{\text{obs}}}$, where $N = 9$, $F = 6$ $(x, y, v_x, v_y, a_x, a_y)$, and $T_{\text{obs}} = 4\,\text{s}$. Each scenario is further enriched with two sources of prior knowledge: (i) a rule-based pseudo-class $\mathbf{s} \in \mathbb{R}^{S}$ that one-hot encodes the ego's behavior change type, and (ii) an interaction score matrix $\mathbf{T} \in [0, 1]^{N \times T_{\text{obs}}}$ that quantifies per-agent relevance for each time step, derived from the DG-SFM [15]. The objective is to learn a representation

$$\hat{z} = h_\theta(\xi) \in \mathbb{R}^{d} \tag{3}$$

that captures both the ego's behavior and the interaction structure with surrounding vehicles. By incorporating the prior knowledge $(\mathbf{s}, \mathbf{T})$ obtained during scenario preprocessing, the learned representation is guided to reflect domain-relevant semantics, enabling data-driven yet interpretable clustering into traffic scenario categories $q \in \{1, \dots, Q\}$.

B. Behavior Change Detection and Scenario Extraction

Scenario extraction based on ego vehicle behavior changes first requires identifying such changes. Therefore, each car in the dataset is treated once as the ego vehicle. Discrete trajectories sampled at a constant time step $\Delta t$, with available longitudinal and lateral accelerations $(a_x(t), a_y(t))$ and lateral velocity $v_y(t)$, are assumed. From these signals, sequences of consistent motion states are derived using adaptive thresholding rules that distinguish between different modes of longitudinal and lateral behavior.

Longitudinal behavior: Longitudinal motion is classified using an adaptive threshold applied to the acceleration $a_x(t)$. The key idea is to ignore short, transient fluctuations and to identify only sustained deviations from previous behavior as genuine behavior changes. A transition from the zero state (coasting) to the acceleration state is triggered once $|a_x(t)| > \tau_{\text{up}}$ for at least $n_{\text{up}}$ consecutive frames. Multiple threshold-duration pairs $(\tau_{\text{up}}, n_{\text{up}})$ are applied in parallel, e.g., $(0.2, 100)$, $(0.3, 50)$, $(0.4, 25)$, in order to detect both mild but persistent accelerations and shorter but stronger bursts. Analogously, the transition back to the zero state occurs when $|a_x(t)| < \tau_{\text{down}}$ for $n_{\text{down}}$ frames. If the acceleration magnitude exceeds a high threshold $\tau_{\text{extreme}}$, the state immediately switches to extreme, capturing emergency-like maneuvers. The same applies to deceleration, but with negative thresholds.

Lateral behavior: Lateral maneuvers are detected by accumulating the lateral displacement

$$\Delta y = \sum_{t = t_0}^{t_1} v_y(t)\, \Delta t \tag{4}$$

over intervals where the sign of $v_y(t)$ remains constant. This ensures that only consistent lateral movements are considered. If the displacement magnitude exceeds a lane-change threshold $\tau_{\text{LC}}$, the interval is classified as a lane change; otherwise, it is labeled as keep lane. This criterion captures completed lane changes while filtering out small oscillations within the lane.

Post-processing: After the initial labeling, consecutive frames with the same state are combined into segments, and short segments ($n < 3$) are removed to improve robustness. Each segment is then assigned a composite label $\ell(t)$ that integrates both longitudinal behavior (e.g., zero, normal, extreme) and lateral behavior (e.g., keep lane, lane change). A behavior change occurs whenever this composite label changes. Formally, valid transition points $t_c$ are defined as

$$\mathcal{C} = \{\, t_c \mid \ell(t_c - \Delta t) \neq \ell(t_c + \Delta t) \,\}, \tag{5}$$

which mark the onset of new behaviors. $\mathcal{C}$ denotes the set of all detected change points $t_c$. Consecutive lane-change segments are further merged into a single composite segment, with their associated longitudinal acceleration state attached.

Scenario extraction: Each detected change point serves as the anchor for a fixed-length temporal window $[t_c - 50,\, t_c + 75]$, provided that sufficient trajectory data is available.
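The detection and extraction rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the values of `tau_down`, `n_down`, `tau_extreme` and `tau_lc` are placeholders that the text does not specify, short segments are absorbed into the preceding segment (the text only says they are removed), a change point is marked at the first frame of a new composite label, and the window offsets 50 and 75 are read as frame counts.

```python
from itertools import groupby


def longitudinal_states(a_x, up_pairs=((0.2, 100), (0.3, 50), (0.4, 25)),
                        tau_down=0.1, n_down=25, tau_extreme=3.0):
    """Per-frame longitudinal state from sustained threshold crossings.

    up_pairs are the (tau_up, n_up) examples from the text; tau_down, n_down
    and tau_extreme are assumed values chosen only for illustration.
    """
    state, states = "zero", []
    up_runs = [0] * len(up_pairs)   # consecutive frames with |a_x| > tau_up
    down_run = 0                    # consecutive frames with |a_x| < tau_down
    for a in a_x:
        up_runs = [run + 1 if abs(a) > tau else 0
                   for run, (tau, _) in zip(up_runs, up_pairs)]
        down_run = down_run + 1 if abs(a) < tau_down else 0
        if abs(a) > tau_extreme:
            state = "extreme"       # immediate switch, no duration required
        elif any(run >= n for run, (_, n) in zip(up_runs, up_pairs)):
            state = "accelerate" if a > 0 else "decelerate"
        elif down_run >= n_down:
            state = "zero"
        states.append(state)
    return states


def lateral_states(v_y, dt, tau_lc):
    """Label constant-sign intervals of v_y by accumulated displacement, Eq. (4)."""
    states = []
    for _, group in groupby(v_y, key=lambda v: (v > 0) - (v < 0)):
        run = list(group)
        displacement = sum(run) * dt
        label = "lane_change" if abs(displacement) > tau_lc else "keep_lane"
        states.extend([label] * len(run))
    return states


def drop_short_segments(labels, min_len=3):
    """Assumed handling: a segment shorter than min_len inherits the previous label."""
    out = []
    for label, group in groupby(labels):
        n = len(list(group))
        out.extend([out[-1] if n < min_len and out else label] * n)
    return out


def change_points(labels):
    """Frames at which the composite label differs from the previous frame."""
    return [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]


def extract_windows(num_frames, points, pre=50, post=75):
    """Keep only windows [t_c - pre, t_c + post] that lie fully inside the trajectory."""
    return [(t - pre, t + post) for t in points
            if t - pre >= 0 and t + post < num_frames]


# Synthetic ego trajectory at 25 Hz: one 4 s lateral drift of 2 m, no acceleration.
dt = 0.04
v_y = [0.0] * 100 + [0.5] * 100 + [0.0] * 100
a_x = [0.0] * 300
labels = list(zip(longitudinal_states(a_x), lateral_states(v_y, dt, tau_lc=1.5)))
labels = drop_short_segments(labels)
points = change_points(labels)
windows = extract_windows(len(labels), points)
print(points, windows)
```

Running the rules in parallel over several threshold-duration pairs keeps the detector sensitive to both slow and abrupt changes without a single hand-tuned threshold, which is the property the adaptive scheme relies on.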
This yields ego-event-centered scenarios that are temporally aligned to behavioral changes. For the entire time window, the following features are extracted for the ego vehicle and all surrounding vehicles: position $(x, y)$, velocity $(v_x, v_y)$, and acceleration $(a_x, a_y)$. The pseudo-class label $\mathbf{s}$ is constructed from the behavior states before and after each behavior change point $t_c$ and stored as a one-hot encoded vector. The total number of distinct pseudo-classes is denoted by $S$.

C. Enriching Scenarios with Domain-Knowledge

Beyond the pseudo-class label $\mathbf{s}$, each scenario is enriched with per-frame interaction scores that quantify how strongly each neighboring vehicle influences the ego behavior. For each time step $t$ and each neighbor $j \in \mathcal{J}_i(t)$ of ego $i$, the directional repulsive components $\hat{\beta}^{A}_{ij}(t)$ and $\hat{\beta}^{B}_{ij}(t)$ are computed [15]. These components are combined into a single interaction score

$$\beta_{ij}(t) = \tau_{\text{sum}}\, \hat{\beta}^{A}_{ij}(t) + (1 - \tau_{\text{sum}})\, \hat{\beta}^{B}_{ij}(t), \tag{6}$$

with $\tau_{\text{sum}} \in [0, 1]$. Scores are then normalized framewise over the present neighbors using a softmax, yielding normalized interaction scores $\pi_{ij}(t)$ for each neighbor $j$ at time step $t$ [15]. The normalized scores are assembled into the interaction matrix $\mathbf{T} \in [0, 1]^{N \times T_{\text{obs}}}$,

$$\mathbf{T} = \begin{bmatrix} \mathbf{t}(1) & \cdots & \mathbf{t}(T_{\text{obs}}) \end{bmatrix},$$

where $\mathbf{t}(t) \in [0, 1]^{N}$ denotes the column at time $t$ with components

$$t_n(t) = \begin{cases} 1, & \text{if } n \text{ is the ego row}, \\ \pi_{ij}(t), & \text{if } n \text{ corresponds to neighbor } j \in \mathcal{J}_i(t), \\ 0, & \text{otherwise (absent slot)}. \end{cases}$$

If fewer than $N$ vehicles are present, missing slots are padded with zeros to preserve shape. The ego row is always set to 1 for all $t$, thereby marking the identity of the ego.

Fig. 2: In the Scenario Preprocessing stage, ego-behavior changes are detected and used to extract scenarios from the traffic dataset. In addition, the interaction matrix $\mathbf{T}^{(m)}$ and the pseudo-class label vector $\mathbf{s}^{(m)}$ are computed for each scenario. For clustering, a CVQ-VAE with a predefined number of codebook entries $Q$ is employed. The model receives only the scenario trajectories $\xi^{(m)}$ as input, produces a discretized representation $z_q^{(m)}$, and predicts both the interaction matrix and the behavior class from this latent representation.

D. Model Architecture for Traffic Scenario Clustering

For the clustering step, the CVQ-VAE is employed. The CVQ-VAE provides a stable backbone onto which domain-knowledge is integrated. In contrast to conventional autoencoders with continuous latents, vector quantization discretizes the latent space into a finite set of codebook entries, as shown in Fig. 2, each of which can be interpreted as a cluster. This property makes the model particularly attractive for structured domains such as traffic scenarios, where discrete categories and stable cluster assignments are essential. Similar VQ-based approaches have demonstrated their effectiveness for categorical representation learning [9], [37], [38]. Here, the mechanism is leveraged to ensure a reliable partitioning of scenarios into a predefined number of clusters $Q$, while the proposed extensions ensure that these clusters are semantically meaningful and consistent with domain reasoning.

Given a scenario $\xi^{(m)} \in \mathbb{R}^{N \times F \times T_{\text{obs}}}$, the encoder $h_\theta$ maps the input to a continuous latent $\hat{z}^{(m)} = h_\theta(\xi^{(m)})$ with $\hat{z}^{(m)} \in \mathbb{R}^{d}$. Instead of passing $\hat{z}^{(m)}$ to the decoder, it is discretized by nearest-neighbor lookup in a finite codebook $\mathcal{Z} = \{z_1, \dots, z_Q\}$ with $z_q \in \mathbb{R}^{d}$:

$$z_q^{(m)} = \arg\min_{z \in \mathcal{Z}} \left\lVert \hat{z}^{(m)} - z \right\rVert_2^2, \quad q \in \{1, \dots, Q\}. \tag{7}$$

The decoder $g_\psi$ then reconstructs the input as $\hat{\xi}^{(m)} = g_\psi(z_q^{(m)}) \in \mathbb{R}^{N \times F \times T_{\text{obs}}}$.
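The nearest-neighbor lookup of Eq. (7) is what turns the encoder output into a cluster assignment, so a small sketch may help. The following is illustrative only: `quantize` is a hypothetical helper, the two-dimensional codebook is toy data, the encoder output is taken as given, and indices are 0-based whereas the text counts $q$ from 1.

```python
def quantize(z_hat, codebook):
    """Return (q, z_q): index and entry of the codebook vector closest to z_hat.

    Implements the squared-Euclidean nearest-neighbor lookup of Eq. (7);
    the index q doubles as the cluster assignment of the scenario.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    q = min(range(len(codebook)),
            key=lambda k: squared_distance(z_hat, codebook[k]))
    return q, codebook[q]


# Toy codebook with Q = 3 entries in d = 2 dimensions (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.9, 0.8], [-0.8, 1.5], [0.1, -0.2]]
assignments = [quantize(z, codebook)[0] for z in latents]
print(assignments)
```

Because every latent is snapped to one of $Q$ entries, the number of clusters is fixed by the codebook size rather than chosen after training.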
By construction, vector quantization maps each scenario to one of the $Q$ codebook entries, thereby inducing a clustering with a predefined number of traffic scenario categories $Q$. Further details are given in [38].

a) CVQ-VAE Loss: The baseline objective combines reconstruction, vector-quantization, and commitment terms:

$$\mathcal{L}_{\text{cvq}} = \left\lVert \xi^{(m)} - \hat{\xi}^{(m)} \right\rVert_2^2 + \left\lVert \operatorname{sg}[\hat{z}^{(m)}] - z_q^{(m)} \right\rVert_2^2 + \left\lVert \hat{z}^{(m)} - \operatorname{sg}[z_q^{(m)}] \right\rVert_2^2, \tag{8}$$

where $\operatorname{sg}[\cdot]$ denotes the stop-gradient operator.

b) Knowledge-guided Losses: To inject domain-knowledge, the discretized latent $z_q^{(m)}$ drives two linear prediction heads that do not feed into the decoder but act as auxiliary supervision signals for clustering.

Pseudo-class head: As depicted in Fig. 2, a linear classifier $f_{\text{cl}}(\cdot)$ is added to the CVQ-VAE. It maps the discretized latent scenario representation $z_q^{(m)}$ to logits in $\mathbb{R}^{S}$:

$$p^{(m)} = \operatorname{softmax}\!\left( f_{\text{cl}}(z_q^{(m)}) \right), \quad \text{with } f_{\text{cl}} : \mathbb{R}^{d} \to \mathbb{R}^{S}. \tag{9}$$

The classifier is trained to predict the one-hot pseudo-class label $\mathbf{s}^{(m)} \in \{0, 1\}^{S}$ that is constructed during scenario extraction. Therefore, a cross-entropy loss is employed, which directly measures the discrepancy between the predicted class distribution and the expert-defined pseudo-class. This pushes the latent representation $z_q^{(m)}$ toward linearly separable regions that correspond to the intended ego maneuver categories:

$$\mathcal{L}_{\text{cl}} = - \sum_{i=1}^{S} s_i^{(m)} \log p_i^{(m)}. \tag{10}$$

This objective teaches the model to preserve principal ego maneuver information in the latent vector $z_q$ and to assign each scenario to clusters with similar ego maneuvers, thereby discouraging clusters that mix heterogeneous ego behaviors.

Interaction head: Additionally, a linear projection head $f_{\text{int}}$, also shown in Fig. 2, maps $z_q^{(m)}$ into an interaction matrix prediction $\hat{\mathbf{T}}^{(m)} \in [0, 1]^{N \times T_{\text{obs}}}$,

$$\hat{\mathbf{T}}^{(m)} = \sigma\!\left( f_{\text{int}}(z_q^{(m)}) \right), \quad \text{with } f_{\text{int}} : \mathbb{R}^{d} \to \mathbb{R}^{N \times T_{\text{obs}}}, \tag{11}$$

where $\sigma(\cdot)$ denotes an elementwise sigmoid to ensure the $[0, 1]$ range and the reshape aligns with the target shape. A squared error penalizes deviations:

$$\mathcal{L}_{\text{int}} = \left\lVert \mathbf{T}^{(m)} - \hat{\mathbf{T}}^{(m)} \right\rVert_2^2. \tag{12}$$

Intuitively, the interaction head aligns clusters with a coherent spatiotemporal interaction pattern across agents. In doing so, the model is steered to recognize characteristic interaction structures and to discount incidental, low-relevance vehicles when forming clusters. Together with the pseudo-class head, this discourages mixing scenarios that differ in behavioral logic, even when their kinematics are superficially similar.

c) Total objective: The overall training loss is the weighted sum

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{cvq}} + \lambda_{\text{cl}}\, \mathcal{L}_{\text{cl}} + \lambda_{\text{int}}\, \mathcal{L}_{\text{int}}, \tag{13}$$

with $\lambda_{\text{cl}}, \lambda_{\text{int}} \geq 0$ controlling the strength of knowledge guidance. This converts clustering from a purely data-driven grouping into a knowledge-guided process. Traffic scenario clusters become discrete, interpretable, and aligned with domain semantics captured by $(\mathbf{s}, \mathbf{T})$.

V. EVALUATION

This section describes the dataset used in the experiments and outlines the implementation details. It further evaluates the performance of the behavior change detection and examines the impact of integrating domain-knowledge into the clustering process.

A. Dataset

In this work, the publicly available highD dataset [5] is used. It comprises highway trajectory recordings at 25 Hz collected at six locations in Germany. To ensure comparability across scenarios, the analysis is restricted to recordings with three lanes. For the clustering part of this study, we focus on categories in which the ego transitions from keep lane to lane change (KL → LC), thereby reducing the problem scope. The resulting selection contains 16,768 scenarios. Of these, 85% are used for training and 15% are held out as a validation split during training.

B. Behavior Change Detection

For validation, 100 vehicles and their trajectories from the highD dataset were manually annotated. Behavior changes were labeled whenever a lane change was observable or a sustained change in speed occurred. Because the exact onset is not always unambiguous, each event was represented by a 50-frame (= 2 s) window during which the change takes place. In total, 119 behavior changes were annotated. Composite labels were assigned as described in Section IV-B. A predicted change point $\hat{t}_c$ is counted as a true positive if it falls inside the corresponding ground-truth window and the composite label matches; otherwise it is counted as a false positive. Ground-truth events without a matching prediction contribute to false negatives. Precision, recall, and the counts (TP/FP/FN) are reported.

To benchmark the introduced rule-based behavior change detection, two alternative methods are introduced. First, the exponential moving average (EMA) change detector proposed in [29] is applied. The EMA segments the time series by dilating a temporal window around each time step $t$, computing the scaled window energy, and declaring events at local energy maxima. Window sizes of $\{30, 60, 90\}$ are evaluated to probe short-, mid-, and longer-range smoothing.

TABLE I: Evaluation of behavior-change detection on the manually annotated subset. Best values are in bold.

| Method | Precision ↑ | Recall ↑ | TP | FP | FN |
|---|---|---|---|---|---|
| Rule-based | **0.741** | **0.916** | **109** | **38** | **10** |
| EMA [29] | 0.196 | 0.244 | 29 | 119 | 90 |
| CVQ-VAE | 0.302 | 0.723 | 86 | 199 | 33 |

As a data-driven alternative, all trajectories from the highD dataset are partitioned into fixed-length snippets and embedded with a separate CVQ-VAE. The embeddings are assigned to 64 clusters during the unsupervised training. At test time, ego trajectories are again segmented into snippets and each snippet is assigned to a cluster.
A behavior change is declared whenever the assigned cluster switches between consecutive snippets of the same ego trajectory. The results for the approaches are presented in Table I.

The results show that the rule-based detection substantially outperforms the two alternative approaches. The EMA baseline detects at least one behavior change per trajectory by design, which leads to a large number of false positives. Moreover, it often fails to align with the relatively gradual behavior changes observed on highways. While the unsupervised CVQ-VAE clustering is conceptually well-suited to capture complex patterns, it achieves poor performance. The strong imbalance in highway data, where most vehicles exhibit near-constant motion, means that substantially larger and more diverse datasets would be required to detect behavior changes reliably. In contrast, highway behavior changes follow well-structured and domain-specific traffic dynamics that can be formalized explicitly, enabling the rule-based method to exploit domain-knowledge effectively. Consequently, mainly for smaller traffic datasets, expert-driven rules remain a reliable option for behavior change detection in this context.

C. Traffic Scenario Clustering

The previously described subset of data is clustered into $Q = 64$ clusters. To evaluate to what extent it is required to introduce domain-knowledge into the clustering process, a random subset of previously extracted scenarios was augmented. Specifically, 50 scenarios were generated in which an additional vehicle was inserted at a larger distance from the ego vehicle. This inserted vehicle maintains a constant velocity and does not perform a lane change. To ensure realistic motion patterns, the trajectory of the added vehicle was taken from the actual trajectory of a suitable vehicle in the dataset. An illustration of a single time step $t$ of such an augmented scenario is shown in Fig. 3. During augmentation, care was taken to ensure that the added vehicle has no influence on the observed ego behavior. Accordingly, its interaction was fixed to $t_{\text{aug}} = 0$ for all observation steps $T_{\text{obs}}$. The underlying idea is to evaluate how well the model captures behaviorally relevant motion patterns.

Fig. 3: Exemplary augmented scenario where the additional vehicle (green) has no influence on the ego vehicle (red).

Two metrics are employed to assess clustering quality and the contribution of domain-knowledge. The first metric is based on pseudo-classes that represent the ego behavior. For each latent codebook entry $z_q$, the linear classifier predicts the probability distribution over all pseudo-classes, as described in Eq. (9). If the model consistently predicts a single pseudo-class for a given cluster, this indicates that the cluster contains scenarios belonging to the same behavioral category. Conversely, if predictions are spread across multiple pseudo-classes, this suggests that the cluster mixes different behaviors. To quantify this effect, the Shannon entropy $H_{\text{avg}} = \mathbb{E}_{q = 1, \dots, Q}[H_q]$ is computed across all codebook entries [9]. The entropy $H_q$ for each entry $q$ is defined as

$$H_q = - \sum_{i=1}^{S} p_{\text{cl},i} \log_2 p_{\text{cl},i}, \tag{14}$$

where $p_{\text{cl},i}$ denotes the predicted probability of pseudo-class $S_i$ for the latent representation $\hat{z}^{(m)}$. The entropy ranges from complete purity ($H_q = 0$) to maximum impurity ($H_q = \log_2(10) \approx 3.322$) given $S = 10$ classes.

The second metric evaluates the consistency of cluster assignments for the augmented scenarios. For each augmented variant, we verify whether it is assigned to the same cluster as its corresponding original scenario. Since the augmentation was explicitly designed to have no effect on the ego behavior, consistent assignments are expected.
The fraction of correctly matched pairs over the total number of augmented scenarios defines an accuracy measure:

$$\text{accuracy} = \frac{\text{correct assignments}}{\text{total number of augmented scenarios}}. \tag{15}$$

In a first step, the CVQ-VAE was trained on the original scenarios with the knowledge-guided loss weights set to $\lambda_{\mathrm{cl}} = 0$ and $\lambda_{\mathrm{int}} = 0$, i.e., without incorporating domain-knowledge. During inference, the augmented scenarios were passed through the model, and their cluster assignments were evaluated. Based on these assignments, the two metrics were computed and reported in Table II under the column "no DK" (no domain-knowledge). In the next step, the weighting parameters were set to $\lambda_{\mathrm{cl}} = 1$ and $\lambda_{\mathrm{int}} = 1$, thereby integrating domain-knowledge into the training. The corresponding results are listed in Table II under the column "DK".

TABLE II: Clustering on the train split: cluster-purity (↓) and augmentation cluster accuracy (↑). DK = Domain-Knowledge. Best values per row are in bold.

| Backend      | no DK: purity ↓ | no DK: accuracy ↑ | DK: purity ↓ | DK: accuracy ↑ |
|--------------|-----------------|-------------------|--------------|----------------|
| CVQ-VAE      | 3.014           | 0.068             | **1.243**    | **0.568**      |
| k-means      | 3.014           | 0.091             | **1.127**    | **0.182**      |
| Hierarchical | 3.010           | 0.068             | **1.153**    | **0.250**      |

In addition to the CVQ-VAE codebook clustering, two further clustering approaches were applied in order to consolidate the evaluation. Specifically, k-means clustering and hierarchical agglomerative clustering were performed directly on the latent scenario representations $\hat{z}$ obtained from the CVQ-VAE encoder. Including these baseline methods allows for a fair assessment of whether improvements are specific to the CVQ-VAE quantization mechanism or can also be observed when applying standard clustering algorithms to the learned latent representations.

The results are summarized in Table II. Without domain-knowledge, all three clustering methods exhibit low augmentation accuracy, indicating that augmented scenarios are frequently assigned to clusters different from their original counterparts.
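The two metrics of Eq. (14) and Eq. (15) are simple enough to sketch directly. The snippet below is an illustrative sketch only: the function names and the toy inputs are hypothetical, and it assumes that the pseudo-class probabilities per codebook entry and the cluster indices of original and augmented scenarios are already available as plain Python lists.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (base 2) of one pseudo-class distribution, cf. Eq. (14)."""
    # Terms with p = 0 contribute nothing and are skipped to avoid log2(0).
    return sum(-p * math.log2(p) for p in probs if p > 0.0)


def average_entropy(class_probs_per_entry):
    """Mean entropy over all codebook entries (H_avg)."""
    entropies = [entropy_bits(p) for p in class_probs_per_entry]
    return sum(entropies) / len(entropies)


def augmentation_accuracy(original_clusters, augmented_clusters):
    """Share of augmented scenarios that keep the cluster of their original, cf. Eq. (15)."""
    pairs = list(zip(original_clusters, augmented_clusters))
    correct = sum(1 for orig, aug in pairs if orig == aug)
    return correct / len(pairs)


# Toy inputs (hypothetical values, S = 10 pseudo-classes)
pure = [1.0] + [0.0] * 9   # a single pseudo-class: complete purity
mixed = [0.1] * 10         # uniform distribution: maximum impurity, log2(10)
print(entropy_bits(pure), entropy_bits(mixed))
print(average_entropy([pure, mixed]))

# Four original/augmented pairs, three of which share a cluster index
print(augmentation_accuracy([3, 7, 7, 12], [3, 7, 5, 12]))
```

In this toy setup, the pure distribution yields zero entropy and the uniform one reaches the upper bound of $\log_2(10)$, matching the range stated for Eq. (14).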
In parallel, the high entropy values in the no-DK setting reveal that multiple ego maneuvers are mixed within single clusters, which demonstrates the difficulty of separating behaviors solely from the raw latent representations. Once domain-knowledge is injected, clustering quality improves consistently across all methods: the entropy-based purity measure decreases and the augmentation accuracy increases. The effect is most pronounced for CVQ-VAE codebook clustering. Here, augmentation accuracy rises from 0.068 to 0.568, demonstrating that directly integrating domain priors through the codebook and auxiliary heads provides stronger guidance than applying generic clustering algorithms to the latent space. Although its purity is not the best among all methods, the CVQ-VAE nevertheless achieves competitive results. This indicates that CVQ-VAE clusters, while slightly less homogeneous, are far more consistent in preserving the behavioral identity of scenarios across augmentations. These findings indicate that the dataset alone does not provide enough variability to support reliable data-driven clustering, making domain-knowledge indispensable for learning meaningful scenario semantics.

VI. CONCLUSION

In this work, we first introduced a rule-based approach for extracting highway scenarios by detecting ego-behavior changes, thereby providing a standardized and interpretable mechanism for deriving scenarios from highway data recordings. Second, we proposed a domain-knowledge-guided clustering framework that incorporates information about vehicle interactions and ego vehicle behavior changes. Experiments performed on the highD dataset show that training without such domain-knowledge produces behaviorally mixed clusters and low assignment consistency. When domain-knowledge is incorporated, the clusters become more coherent and the assignments more stable.
This demonstrates that, under limited scenario availability, injecting expert knowledge provides a clear advantage for clustering. The largest gains are achieved with CVQ-VAE codebook clustering. A current limitation of this study is the use of fixed-length windows, which can cut off or merge behaviors at segment boundaries. Future work will therefore focus on clustering variable-length scenarios with a varying number of traffic participants.

ACKNOWLEDGMENT

We appreciate the funding of this work by Audi AG.

REFERENCES

[1] N. Kalra and S. M. Paddock, "Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability?" Transportation Research Part A: Policy and Practice, 2016.
[2] J. Langner, H. Grolig, S. Otten, M. Holzäpfel, and E. Sax, "Logical scenario derivation by clustering dynamic-length-segments extracted from real-world-driving-data," in Proceedings of the 5th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS), 2019.
[3] C. Neurohr, L. Westhofen, T. Henning, T. de Graaff, E. Möhlmann, and E. Böde, "Fundamental considerations around scenario-based testing for automated driving," in IEEE Intelligent Vehicles Symposium (IV), 2020.
[4] M. E. Bouzouraa and S. Hasirlioglu, "Scenario as specification: Structuring the development and deployment of automated driving," in IEEE/ACM 1st International Workshop on Software Engineering for Autonomous Driving Systems (SE4ADS), 2025.
[5] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, "The highd dataset: A drone dataset of naturalistic vehicle trajectories on german highways for validation of highly automated driving systems," in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2018.
[6] N. Epple, T. Hankofer, and A. Riener, "Scenario classes in naturalistic driving: Autoencoder-based spatial and time-sequential clustering of surrounding object trajectories," in IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020.
[7] M. Schuldes, C. Glasmacher, and L. Eckstein, "scenario.center: Methods from real-world data to a scenario database," in 2024 IEEE Intelligent Vehicles Symposium (IV), 2024.
[8] J. Kerber, S. Wagner, K. Groh, D. Notz, T. Kühbeck, D. Watzenig, and A. Knoll, "Clustering of the scenario space for the assessment of automated driving," in IEEE Intelligent Vehicles Symposium (IV), 2020.
[9] M. Neumeier, S. Dorn, M. Botsch, and W. Utschick, "Reliable trajectory prediction and uncertainty quantification with conditioned diffusion models," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024.
[10] L. Balasubramanian, J. Wurst, M. Botsch, and K. Deng, "Traffic scenario clustering by iterative optimisation of self-supervised networks using a random forest activation pattern similarity," in IEEE Intelligent Vehicles Symposium (IV), 2021.
[11] C. Zheng and A. Vedaldi, "Online clustered codebook," in IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
[12] R. Egolf, A. Fertig, and M. Botsch, "Conditioned trajectory generation for realistic driving scenarios via a hybrid machine learning architecture," in 2025 IEEE Intelligent Transportation Systems Conference (ITSC), 2025.
[13] T. Elter, T. Dirndorfer, M. Botsch, and W. Utschick, "Interaction-aware prediction of occupancy regions based on a pomdp framework," in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), 2022.
[14] T. Elter, T. Dirndorfer, M. Botsch, and W. Utschick, "Validation of a pomdp framework for interaction-aware trajectory prediction in vehicle safety," in 2025 IEEE Intelligent Vehicles Symposium (IV), 2025.
[15] M. Baden, A. Abouelazm, C. Hubschneider, Y. Wu, D. Slieter, and J. M. Zöllner, "Tpk: Trustworthy trajectory prediction integrating prior knowledge for interpretability and kinematic feasibility," in IEEE Intelligent Vehicles Symposium (IV), 2025.
[16] L. Balasubramanian, J. Wurst, M. Botsch, and K. Deng, "Open-world learning for traffic scenarios categorisation," IEEE Transactions on Intelligent Vehicles (IV), 2023.
[17] C. Chang, J. Zhang, J. Ge, Z. Zhang, J. Wei, L. Li, and F.-Y. Wang, "Vistascenario: Interaction scenario engineering for vehicles with intelligent systems for transport automation," IEEE Transactions on Intelligent Vehicles (IV), 2024.
[18] C. King, T. Braun, C. Braess, J. Langner, and E. Sax, "Capturing the variety of urban logical scenarios from bird-view trajectories," in 7th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS), 2021.
[19] D. Guo, M. M. Sánchez, E. de Gelder, and T. P. van der Sande, "Scenario extraction from a large real-world dataset for the assessment of automated vehicles," in IEEE International Conference on Intelligent Transportation Systems (ITSC), 2023.
[20] E. d. Gelder, J. Manders, C. Grappiolo, J.-P. Paardekooper, O. O. d. Camp, and B. D. Schutter, "Real-world scenario mining for the assessment of automated vehicles," in IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020.
[21] L. Hartjen, R. Philipp, F. Schuldt, F. Howar, and B. Friedrich, "Classification of driving maneuvers in urban traffic for parametrization of test scenarios," 2019.
[22] J. Cai, W. Deng, H. Guang, Y. Wang, J. Li, and J. Ding, "A survey on data-driven scenario generation for automated vehicle testing," Machines, vol. 10, no. 11, p. 1101, 2022.
[23] A. Erdogan, B. Ugranli, E. Adali, A. Sentas, E. Mungan, E. Kaplan, and A. Leitner, "Real-world maneuver extraction for autonomous vehicle validation: A comparative study," in 2019 IEEE Intelligent Vehicles Symposium (IV), 2019.
[24] F. Montanari, H. Ren, and A. Djanatliev, "Scenario detection in unlabeled real driving data with a rule-based state machine supported by a recurrent neural network," in IEEE 93rd Vehicular Technology Conference (VTC), 2021.
[25] T. Kreutz, O. Esbel, M. Mühlhäuser, and A. S. Guinea, "Unsupervised driving event discovery based on vehicle can-data," in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), 2022.
[26] E. Rodríguez-Hernández, J. I. Vasquez, C. A. Duchanoy Martínez, and H. Taud, "Unsupervised driving situation detection in latent space for autonomous cars," Applied Sciences, vol. 12, no. 7, 2022.
[27] N. Chetouane and F. Wotawa, "On the application of clustering for extracting driving scenarios from vehicle data," Machine Learning with Applications (MLWA), 2022.
[28] P. Elspas, Y. Klose, S. Isele, J. Bach, and E. Sax, "Time series segmentation for driving scenario detection with fully convolutional networks," in Proceedings of the 7th International Conference on Vehicle Technology and Intelligent Transport Systems (VEHITS), 2021.
[29] A. Aboah et al., "Driver maneuver detection and analysis using time series segmentation and classification," Journal of Transportation Engineering, Part A: Systems, 2023.
[30] L. Ries, P. Rigoll, T. Braun, T. Schulik, J. Daube, and E. Sax, "Trajectory-based clustering of real-world urban driving sequences with multiple traffic objects," in IEEE International Intelligent Transportation Systems Conference (ITSC), 2021.
[31] F. Kruber, J. Wurst, E. S. Morales, S. Chakraborty, and M. Botsch, "Unsupervised and supervised learning with the random forest algorithm for traffic scenario clustering and classification," in IEEE Intelligent Vehicles Symposium (IV), 2019.
[32] J. Zhao, J. Fang, Z. Ye, and L. Zhang, "Large scale autonomous driving scenarios clustering with self-supervised feature extraction," in IEEE Intelligent Vehicles Symposium (IV), 2021.
[33] J. Wurst, L. Balasubramanian, M. Botsch, and W. Utschick, "Expert-lasts: Expert-knowledge guided latent space for traffic scenarios," in IEEE Intelligent Vehicles Symposium (IV), 2022.
[34] F. Hauer, I. Gerostathopoulos, T. Schmidt, and A. Pretschner, "Clustering traffic scenarios using mental models as little as possible," in IEEE Intelligent Vehicles Symposium (IV), 2020.
[35] M. Zipfl, M. Jarosch, and J. M. Zöllner, "Traffic scene similarity: a graph-based contrastive learning approach," in IEEE Symposium Series on Computational Intelligence (SSCI), 2023.
[36] Z. Zeng, S. Liu, Z. Bao, Q. Zhang, P. Wang, and Z. Hu, "An effective and robust driving scenario identification framework utilizing unsupervised covariance clustering," in IEEE Intelligent Vehicles Symposium (IV), 2025.
[37] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, "Neural discrete representation learning," in Neural Information Processing Systems (NIPS), 2017.
[38] N. Roßberg, M. Neumeier, S. Hasirlioglu, M. E. Bouzouraa, and M. Botsch, "Assessing the completeness of traffic scenario categories for automated highway driving functions via cluster-based analysis," in IEEE Intelligent Vehicles Symposium (IV), 2025.
[39] A. Fertig, L. Balasubramanian, and M. Botsch, "Clustering and anomaly detection in embedding spaces for the validation of automotive sensors," in 2024 IEEE Intelligent Vehicles Symposium (IV), 2024, pp. 1076–1083.
[40] D. Helbing and P. Molnar, "Social force model for pedestrian dynamics," Physical Review E, 1995.