Data is All You Need: Markov Chain Car-Following (MC-CF) Model
Car-following behavior is fundamental to traffic flow theory, yet traditional models often fail to capture the stochasticity of naturalistic driving. This paper introduces a new car-following modeling category called the empirical probabilistic parad…
Authors: Sungyong Chung, Yanlin Zhang, Nachuan Li
Sungyong Chung¹, Yanlin Zhang¹, Nachuan Li², Dana Monzer², and Alireza Talebpour*¹
¹ Department of Civil and Environmental Engineering, University of Illinois Urbana-Champaign
² Northwestern University Transportation Center, Northwestern University

Abstract

Car-following behavior is fundamental to traffic flow theory, yet traditional models often fail to capture the stochasticity of naturalistic driving. This paper introduces a new car-following modeling category called the empirical probabilistic paradigm, which bypasses conventional parametric assumptions. Within this paradigm, we propose the Markov Chain Car-Following (MC-CF) model, which represents state transitions as a Markov process and predicts behavior by randomly sampling accelerations from empirical distributions within discretized state bins. Evaluation of the MC-CF model trained on the Waymo Open Motion Dataset (WOMD) demonstrates that its variants significantly outperform physics-based models including IDM, Gipps, FVDM, and SIDM in both one-step and open-loop trajectory prediction accuracy. Statistical analysis of transition probabilities confirms that the model-generated trajectories are indistinguishable from real-world behavior, successfully reproducing the probabilistic structure of naturalistic driving across all interaction types. Zero-shot generalization on the Naturalistic Phoenix (PHX) dataset further confirms the model's robustness. Finally, microscopic ring road simulations validate the framework's scalability. By incrementally integrating unconstrained free-flow trajectories and high-speed freeway data (TGSIM) alongside a conservative inference strategy, the model drastically reduces collisions, achieving zero crashes in multiple equilibrium and shockwave scenarios, while successfully reproducing naturalistic and stochastic shockwave propagation.
Overall, the proposed MC-CF model provides a robust, scalable, and calibration-free foundation for high-fidelity stochastic traffic modeling, uniquely suited for the data-rich future of intelligent transportation.

Keywords: Markov Chain, Empirical, Car-Following Model, Waymo Open Motion Dataset

* Corresponding author. Email: ataleb@illinois.edu

1. INTRODUCTION

Car-following behavior analysis forms a cornerstone of traffic flow theory, providing the microscopic foundation for understanding traffic dynamics, from individual vehicle interactions to large-scale congestion patterns. For decades, this understanding has relied on models developed from limited, often experimental datasets. However, the recent emergence of large-scale, high-fidelity trajectory datasets such as the Waymo Open Motion Dataset (WOMD) [1, 2], the Lyft Level-5 Open Dataset [3], and the Argoverse Dataset [4] has transformed this landscape. These datasets capture millions of real-world interactions in complex urban environments at high temporal resolution, encompassing both automated vehicles (AVs) and human-driven vehicles (HDVs). This data revolution presents a unique opportunity to build a new generation of car-following models that are more accurate, robust, and representative of real-world driving complexity.

Traditionally, car-following research has been dominated by parametric, physics-based formulations such as the Gipps model [5] and the Intelligent Driver Model (IDM) [6], along with stochastic variants like the Stochastic IDM (SIDM) [7]. These models have been invaluable for their interpretability, as they frame driving behavior through well-defined rules and parameters related to safety, speed, and spacing. However, their reliance on simplified assumptions and a limited number of parameters restricts their capacity to exploit the richness of modern trajectory data.
As a result, they often fall short in capturing behavioral heterogeneity, the stochastic nature of human driving, and the emergent dynamics in mixed traffic of AVs and HDVs.

Recent developments in data-driven modeling, powered by machine learning techniques such as neural networks, have brought remarkable advances [8–14]. These models excel at capturing complex, nonlinear relationships directly from data, frequently achieving higher predictive accuracy than traditional approaches. However, they often function as black boxes, offering limited insight into the behavioral mechanisms behind their predictions. This opacity poses challenges for interpretability and safety validation in simulation environments.

To address this gap, this paper introduces a novel, purely data-driven car-following modeling category called the empirical probabilistic paradigm. This paradigm retains the transparency of classical state-based logic while abandoning restrictive physics- or psychology-based parametric assumptions about driver behavior. Building on recent empirical evidence from Li et al. [15], which provides rigorous statistical validation for the Markov property in car-following state transitions where the state is defined by follower speed, leader speed, and spacing, this new category represents car-following dynamics entirely through (i) empirical transition probabilities between discrete traffic states and (ii) state-conditional empirical acceleration distributions learned directly from trajectory data. As a result, behavioral interpretation is statistically grounded. Each traffic state is associated with an observable acceleration profile, enabling direct inference of mean response and variability across regimes without invoking latent parameters.

As a concrete realization of this paradigm, we propose the Markov Chain Car-Following (MC-CF) model and its variants.
Specifically, the MC-CF model learns state transition probabilities directly from trajectory data over a discretized state space and predicts behavior by sampling accelerations from the empirical acceleration distribution of the estimated next state. By synthesizing a statistically validated Markovian structure with pure data-driven sampling, the MC-CF model provides a simple yet powerful nonparametric framework explicitly designed to leverage large-scale naturalistic driving data.

We note that the MC-CF model is particularly suited for the data-rich era ahead. As AV deployment accelerates, the volume of high-quality driving data collected from onboard sensors is expected to grow exponentially. Unlike parametric models that require complete recalibration when new data arrive, the inherent structure of the MC-CF model allows continuous refinement. Specifically, new trajectory data can be seamlessly incorporated by updating transition probabilities and adding acceleration samples to each state's empirical acceleration distribution. This scalability and adaptability make the MC-CF model a future-proof tool for data-driven transportation analysis, capable of becoming more accurate and robust as more data becomes available.

The remainder of this paper is organized as follows. Section 2 reviews physics-based and data-driven car-following models and positions the proposed approach within the broader modeling landscape. Section 3 describes the trajectory dataset and preprocessing procedures used for empirical implementation. Section 4 presents the MC-CF paradigm, including state definition, transition probability estimation, and construction of state-conditional acceleration distributions. Section 5 evaluates the proposed framework against representative car-following model baselines in terms of trajectory accuracy with empirical data.
We then discuss model scalability in microscopic simulation and its implications for data-rich transportation systems. Finally, Section 6 concludes the paper and outlines directions for future research.

2. LITERATURE REVIEW

The landscape of car-following modeling has continuously evolved in pursuit of balancing empirical realism with structural transparency. Traditionally, this domain has been dominated by parametric, physics-based equations. While these classical models offer clear theoretical insights, they often struggle to capture the full spectrum of naturalistic human driving. In response, data-driven models emerged, offering a more accurate and reliable representation of real-world behaviors by directly leveraging trajectory datasets without restrictive parametric constraints [8]. However, these machine learning approaches are frequently criticized for functioning as black boxes; they offer limited insight into traffic flow theory and present challenges for safety validation and transferability across different traffic conditions [8]. To navigate these competing challenges, the modeling landscape currently spans classical physics-based models and complex machine learning architectures, both of which struggle to find an ideal middle ground between empirical flexibility and behavioral interpretability.

2.1. Physics-Based Car-Following Models

Physics-based car-following models have long served as foundational tools for simulating driver behavior and traffic dynamics by employing well-defined mathematical rules. For instance, the Gipps model [5], an extensively used framework, operates on the assumption that individuals have a limited range of desirable acceleration and deceleration rates, enforcing strict collision-avoidance constraints. However, it has been heavily criticized for its inability to replicate metastability, often resulting in unconditionally stable traffic states [16, 17].
Seeking to capture smoother, more continuous car-following dynamics, the IDM [6] became widely adopted. Despite its popularity, the IDM struggles to simulate reaction delays and lacks the psychological human factors necessary to reproduce heterogeneous, real-world reactions, sometimes yielding unrealistic vehicle dynamics [18, 19]. To address this deterministic rigidity, the SIDM [7] introduced time-varying stochastic fluctuations to reproduce flow oscillations and driver indifference. Nevertheless, adding random noise to deterministic equations is generally criticized, as it can inadvertently cause negative vehicle speeds [20].

Parallel to the IDM family, the Full Velocity Difference Model (FVDM) [21] was introduced to improve the generalized force model (GFM) [22] by incorporating the impacts of both positive and negative relative speeds. Variants of the FVDM, such as those utilizing constant time headways (FVDM-CTH) or non-linear sigmoids (FVDM-sigmoid), have been used to model AVs. Yet, empirical evaluations have shown that simpler models like Gipps can sometimes outperform these FVDM variants in AV modeling [23]. Extending these physics-based principles to connected environments, Van Arem [24] modeled cooperative adaptive cruise control systems utilizing vehicle-to-vehicle communication to allow for smaller following spacings. However, similar to its predecessors, this model enforces specific safety constraints and desired traffic variables without allowing for the flexible, heterogeneous objective prioritization seen in actual driving [25].

Recognizing the limitations of static parameters in these traditional equations, a recent evolution within physics-based modeling has employed Markovian frameworks to capture the dynamic, multi-modal nature of driving. These studies conceptualize driving as a sequence of transitions between discrete behavioral states or latent regimes.
For instance, Zaky and Gomaa [26] applied a Markov switching regression model to classify regimes such as stable following and braking. Zou et al. [27] utilized a Coupled Hidden Markov Model (CHMM) to segment driving into primitive patterns, while Yao et al. [28] introduced the concept of action-chains using coupled Markov chains. More recently, Zhang et al. [29] proposed a Factorial Hidden Markov Model (FHMM) integrated directly with the IDM, calibrating a unique set of IDM parameters for each latent driving regime. While this study significantly enhances accuracy by modeling regime switching with a larger set of calibrated parameters, it remains fundamentally anchored in the physics-based paradigm, as the underlying vehicle dynamics within each state are still dictated by rigid parametric equations.

2.2. Data-Driven Car-Following Models

To bypass the restrictive assumptions of parametric equations, researchers have increasingly turned to data-driven techniques. Papathanasopoulou and Antoniou [8] developed an improved weighted regression (loess) technique to model car-following behavior and found that it outperformed the Gipps model. The model was based on the dataset collected in Naples by Punzo et al. [30]. However, the model needs more validation before being integrated into traffic simulation [8]. He et al. [9] introduced a nonparametric car-following model using k-nearest neighbor (kNN) and field data, avoiding assumptions about driver behavior parameters or fundamental diagrams. Based on the idea that drivers respond to traffic stimuli in consistent, learned patterns, the model accurately reproduced empirical traffic features like stop-and-go oscillations. However, because the kNN approach predicts vehicle movement by taking the strict average of the most similar historical cases, it inherently smooths out the natural, heterogeneous variability of driver responses.
Consequently, to capture stochastic driving errors and test traffic instability, the model relies on the artificial injection of white Gaussian noise, rather than deriving the true probabilistic distribution of driving behaviors directly from the empirical data.

Artificial neural networks (ANNs) have also brought remarkable advances. Panwai and Dia [10] developed an ANN-based model that demonstrated superior performance compared to Gipps and psychophysical models. A more recent paper presented a deep neural network-based model using a Gated Recurrent Unit (GRU) [11], incorporating temporal dependencies to achieve higher simulation accuracy than Forward Neural Network (FNN)-based models and the IDM. Another direction focuses on learning-based models, such as sequence-to-sequence (seq2seq) approaches that account for memory effects and driver reaction delays [13]. While these data-driven models excel at capturing complex, nonlinear relationships and achieving high predictive accuracy, they frequently function as black boxes. This opacity poses significant challenges for behavioral interpretability and safety validation in simulation environments.

2.3. The Empirical Probabilistic Paradigm

To bridge the gap between the transparency of classical physics-based logic and the flexibility of data-driven approaches, this paper introduces a novel modeling category called the empirical probabilistic paradigm. Models in this class avoid restrictive physics-based parametric assumptions and instead derive behavioral dynamics directly from observed data distributions. The foundation for this paradigm relies on the rigorous statistical validation of state transitions. A recent study by Li et al. [15] confirmed that the Markov property holds for vehicle state transitions, where a state is explicitly defined by the leader vehicle's speed, the follower vehicle's speed, and the spacing between them.
Building directly on this validated principle, we propose the MC-CF model. Crucially, the MC-CF model diverges from the regime-switching approaches discussed in Section 2.1. Instead of identifying latent behavioral regimes and calibrating separate parametric equations within them, the MC-CF model learns the state transition probabilities directly from trajectory data over a discretized state space. Furthermore, rather than assuming any deterministic acceleration function for a given state, our model embraces the inherent stochasticity of driving by representing acceleration as an empirical probability distribution learned from the data for each discrete state. By making no assumptions other than the statistically validated Markov property of state transitions, the new modeling category provides a simple and transparent framework that derives both state dynamics and action distributions entirely from empirical evidence.

3. DATA DESCRIPTION

This study primarily draws on the WOMD [2] to train the proposed models. WOMD captures car-following interactions near the Waymo vehicles (AVs), and provides a data volume that is unprecedented compared to other datasets commonly used for car-following analysis, thereby enabling the introduction of the proposed empirical probabilistic paradigm. The dataset includes all interaction types: AV-following-HDV, HDV-following-AV, and HDV-following-HDV. For each interaction type, 90% of the extracted car-following pairs are used for training, and the remaining 10% are reserved for testing to evaluate the performance of the proposed and baseline models. In addition, we employ the Naturalistic Phoenix (PHX) dataset [31] to assess zero-shot generalization. Here, zero-shot generalization refers to a model's ability to generate motion predictions for time series originating from previously unseen datasets [32].
Specifically, for the zero-shot generalization analysis, the models are trained on the WOMD training set and validated on the entire PHX dataset.

From each dataset, we extract key variables relevant to car-following modeling, including follower and leader positions (x and y coordinates), speeds, accelerations, spacings, and relative speeds. Trajectories in both datasets are sampled at 10 Hz. Preprocessing involves filtering for valid car-following pairs based on the following criteria: a minimum duration of 10 seconds, a maximum spacing of 45 m, and a minimum speed threshold of 3 m/s (that is, the maximum speed within each car-following pair must exceed 3 m/s to ensure that at least one vehicle is moving). Time steps with accelerations outside the range of -10 m/s² to 5 m/s² are removed, as they likely stem from sensor noise; these accounted for approximately 0.81% of the WOMD and 1.26% of the PHX. Additionally, the first and last 2 seconds of each pair are discarded to minimize non-car-following behavior related to lane changes or merging. The resulting car-following pairs are then partitioned into three groups based on interaction type within each dataset.

3.1. Waymo Open Motion Dataset

The WOMD provides a large-scale benchmark for motion forecasting, comprising over 100,000 interactive 20-second scenes across 1,750 km of roadways in six US cities [2]. The dataset provides information including scenario IDs, unique tracking IDs for objects, object types (vehicles, pedestrians, and cyclists), Waymo vehicle identifiers, and detailed object attributes including position (x, y, z), dimensions (length, width, height), heading, and velocity. In addition, the dataset offers map-related features for each scenario represented as 3D polylines, including lane centers, lane boundaries, and road boundaries.
These map features also specify the travel direction associated with each lane center ID and describe its connections to other lane center IDs, such as exit lanes and adjacent left and right lanes. However, in contrast to the traditional NGSIM trajectory dataset [33], which is one of the most widely used in car-following studies, WOMD does not include assigned lane center IDs for each vehicle, a key element for identifying leader–follower pairs.

To systematically extract all possible car-following pairs, vehicles were assigned to lanes by mapping their position to the nearest point on lane centers. As vehicles moved, lane assignments were updated based on the closest point from the (i) current lane center ID, (ii) any exit lane center IDs, or (iii) neighboring left and right lane center IDs. After assigning lane centers to each vehicle at each time step, we identified leaders as the closest vehicles to the follower in the direction of travel, either within the same lane center as the follower or in lane centers connected to it. This procedure, consistent with the methodology described in [34], enabled us to extract all candidate car-following pairs from the dataset. Following extraction, we applied the preprocessing steps described earlier to obtain the final car-following pairs for analysis.

Table 1 summarizes the trajectory durations, in seconds, and the number of car-following pairs for each interaction type extracted from the WOMD. As noted, because each WOMD scenario has a maximum length of 20 seconds, the mean duration of car-following pairs is around 9.5 seconds across all interaction types.

TABLE 1: Descriptive statistics of trajectory durations in seconds and number of car-following (CF) pairs for different interaction types in WOMD.
| Dataset | Interaction Type | CF Pairs | Mean (s) | Std (s) | Max (s) | Min (s) |
|---|---|---|---|---|---|---|
| WOMD (Train) | AV-following-HDV | 3,778 | 9.28 | 2.67 | 15.7 | 5.1 |
| WOMD (Test) | AV-following-HDV | 420 | 9.34 | 2.80 | 15.8 | 5.4 |
| WOMD (Train) | HDV-following-AV | 4,896 | 9.65 | 2.74 | 15.7 | 5.8 |
| WOMD (Test) | HDV-following-AV | 544 | 9.67 | 2.78 | 15.7 | 6.0 |
| WOMD (Train) | HDV-following-HDV | 29,477 | 9.43 | 2.63 | 15.8 | 5.2 |
| WOMD (Test) | HDV-following-HDV | 3,276 | 9.48 | 2.65 | 15.7 | 5.9 |

Specifically, 4,198 car-following pairs were extracted for AV-following-HDV interactions, while 5,440 and 32,753 pairs were extracted for HDV-following-AV and HDV-following-HDV interactions, respectively.

3.2. Naturalistic Phoenix Dataset

The Naturalistic PHX Dataset was developed to capture detailed Waymo vehicle trajectories and behavioral interactions in real-world traffic conditions across the Phoenix metropolitan area. High-resolution aerial videos were collected using a stabilized camera mounted on a helicopter, covering major arterial corridors with varying geometric and control features. These raw videos were processed through a multi-stage extraction pipeline [35]. The resulting dataset provides precise, continuous trajectories at 0.1-second resolution. Compared with the WOMD, the PHX dataset offers longer trajectory durations, enabling the analysis of dynamic scenarios such as queue formation, dissipation, and heterogeneous car-following behaviors.

To ensure high-fidelity trajectory reconstruction, all vehicle positions in the PHX dataset were transformed into a unified, ground-fixed coordinate system. Following the data collection procedure in Ammourah et al. [35], the image coordinates of the moving aerial platform were mapped to a consistent spatial reference frame, ensuring high-precision tracking across the study area. Object detections extracted from the aerial video were projected into this same fixed coordinate system, ensuring spatial consistency across all frames.
Each detected vehicle was assigned to its corresponding lane region based on its centroid location. After establishing lane assignment, the longitudinal ordering of vehicles within each lane was used to identify leader–follower pairs dynamically over time. This procedure allowed every vehicle's immediate leader and follower at any given moment to be determined with high spatial accuracy, forming the foundation for robust model calibration and behavioral analysis. The resulting pairs were then further processed using the identical preprocessing steps described earlier to obtain the final set of car-following pairs used in the analysis.

Table 2 shows summary statistics of trajectory durations and the number of car-following pairs for different interaction types in the PHX dataset. Compared to WOMD, PHX provides substantially longer trajectories, with mean durations exceeding 25 seconds and greater variability in length, making it particularly suitable for evaluating zero-shot generalization. Given that AV behaviors may differ between WOMD and PHX [31], we restrict the zero-shot generalization analysis to the 106 HDV-following-HDV pairs in PHX in order to assess whether the proposed and baseline models can reproduce heterogeneous HDV behaviors in a completely new dataset.

TABLE 2: Descriptive statistics of trajectory durations in seconds and number of car-following (CF) pairs in PHX.

| Dataset | Interaction Type | CF Pairs | Mean (s) | Std (s) | Max (s) | Min (s) |
|---|---|---|---|---|---|---|
| PHX | AV-following-HDV | 16 | 26.41 | 19.28 | 74.6 | 6.1 |
| PHX | HDV-following-AV | 20 | 28.22 | 23.05 | 100.9 | 7.2 |
| PHX | HDV-following-HDV | 106 | 25.60 | 18.16 | 89.2 | 4.9 |

3.3. Third Generation Simulation Dataset

The Third Generation Simulation (TGSIM) dataset [35, 36] represents a recent effort to generate high-fidelity vehicle trajectories that capture interactions among HDVs and partially automated vehicles under naturalistic traffic conditions. In contrast to the primarily urban environments of the WOMD and PHX datasets, TGSIM includes extensive data from freeway segments, such as I–294, I–90, and I–94. In this study, we specifically utilize the I–294 dataset to evaluate the scalability of the proposed MC-CF model when augmented with high-speed freeway dynamics.

The trajectories were extracted using a unified pipeline incorporating image stabilization, deep-learning-based vehicle detection, multi-frame tracking, and Kalman-filter-based smoothing. The moving-helicopter approach used to collect the dataset provides long continuous trajectories of the vehicles, and the resulting trajectories cover a wide range of operational scenarios, including free-flow, slow-and-go, congested shockwaves, forced merges, discretionary lane changes, and AV responses to surrounding traffic.

4. METHODOLOGY

In this section, we present the novel MC-CF model, the first within the empirical probabilistic paradigm of car-following models. Instead of prescribing a deterministic acceleration function as in classical physics-based car-following models, this study adopts a probabilistic formulation that treats state transitions as a Markov chain. We also summarize the baseline models and the calibration method used for comparison.

4.1. Markov Chain Car-Following Model

The evolution of the follower vehicle is represented in discrete time with step size $\Delta t$. At each time step $t$, the state of the system is described by the triplet $s_t = (v_t, \Delta v_t, d_t)$, where $v_t$ is the speed of the follower, $\Delta v_t = v_t - v^{\text{lead}}_t$ is the relative speed with respect to the leader, and $d_t = x^{\text{lead}}_t - x_t - l$ is the spacing. Here, $l$ represents the average length of the leader and follower vehicles. The acceleration of the follower at time step $t$ is denoted by $a_t$. The stochastic dynamics of the system are modeled as a first-order Markov chain on the discretized state space $\mathcal{S}$.
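As an illustration of the state definition above, the following minimal sketch builds the state triplet from per-time-step positions and speeds. The function name `car_following_state` and the example numbers are hypothetical and are not taken from the paper; positions are assumed to be longitudinal coordinates along the lane.

```python
import numpy as np

def car_following_state(x_follower, v_follower, x_leader, v_leader, avg_length):
    """Return an (T, 3) array of states (v_t, dv_t, d_t), one row per time step.

    dv_t = v_t - v_lead_t          (follower speed minus leader speed)
    d_t  = x_lead_t - x_t - l      (spacing, with l the average vehicle length)
    """
    x_f = np.asarray(x_follower, dtype=float)
    v_f = np.asarray(v_follower, dtype=float)
    x_l = np.asarray(x_leader, dtype=float)
    v_l = np.asarray(v_leader, dtype=float)
    dv = v_f - v_l
    d = x_l - x_f - avg_length
    return np.column_stack([v_f, dv, d])

# Hypothetical two-step example sampled at 10 Hz (values are made up).
states = car_following_state(
    x_follower=[0.0, 1.0], v_follower=[10.0, 10.0],
    x_leader=[20.0, 20.8], v_leader=[8.0, 8.0],
    avg_length=4.5,
)
```

A positive relative speed in this convention means the follower is closing in on the leader.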
The continuous state space is partitioned into a three-dimensional grid of bins using the Freedman–Diaconis rule for automatic bandwidth selection [37, 38], which adapts the number of bins to the data distribution while minimizing bias. The bin width $h$ is computed as

\[ h = \frac{2 \cdot \mathrm{IQR}(X)}{n^{1/3}}, \tag{1} \]

where $\mathrm{IQR}(X)$ is the interquartile range of the data $X$ and $n$ is the sample size. The number of bins is then calculated as

\[ k = \frac{x_{\max} - x_{\min}}{h}. \tag{2} \]

The ranges for each dimension, relative speed $\Delta v_t \in [-10, 10]$ m/s, spacing $d_t \in [0, 45]$ m, and follower speed $v_t \in [0, 20]$ m/s, were set based on the dataset to ensure all data samples fall within the grid and are classified into a bin. Table 3 summarizes the binning results across the three interaction types. Due to the larger number of samples in HDV-following-HDV, the Freedman–Diaconis rule yields a much smaller step size, resulting in a significantly larger number of bins.

TABLE 3: Binning results for different car-following cases using the Freedman–Diaconis rule.

| Dimension | Range | AV-following-HDV: # bins | AV-following-HDV: Step size | HDV-following-AV: # bins | HDV-following-AV: Step size | HDV-following-HDV: # bins | HDV-following-HDV: Step size |
|---|---|---|---|---|---|---|---|
| Relative Speed (m/s) | (−10, 10) | 394 | ∼0.05 | 444 | ∼0.05 | 1,041 | ∼0.02 |
| Spacing (m) | (0, 45) | 109 | ∼0.41 | 139 | ∼0.32 | 357 | ∼0.13 |
| Follower Speed (m/s) | (0, 20) | 89 | ∼0.22 | 90 | ∼0.22 | 230 | ∼0.09 |

To characterize driving behavior, one could theoretically estimate transition probabilities directly between the discretized bins defined based on the Freedman–Diaconis rule (Table 3). However, given the high dimensionality of the state space, many bins may contain insufficient samples for reliable statistical estimation. To address this sparsity while preserving local behavioral fidelity, we employ a spatially constrained state clustering algorithm, detailed in Algorithm 1. This approach structurally refines the state space during the training phase.
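The binning rule in Eqs. (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `freedman_diaconis_bins` is hypothetical, the range endpoints are passed in explicitly (as the fixed per-dimension ranges in the text suggest), and rounding the bin count up to an integer is an assumption that the source does not state.

```python
import numpy as np

def freedman_diaconis_bins(samples, lower, upper):
    """Bin width h = 2 * IQR / n^(1/3) and bin count k = (upper - lower) / h.

    Rounding k up with ceil is an assumption made for this sketch.
    """
    x = np.asarray(samples, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    h = 2.0 * (q75 - q25) / np.cbrt(x.size)
    k = int(np.ceil((upper - lower) / h))
    return h, k

# Synthetic follower speeds in [0, 20] m/s, for illustration only.
rng = np.random.default_rng(0)
speeds = rng.uniform(0.0, 20.0, size=10_000)
h_speed, k_speed = freedman_diaconis_bins(speeds, lower=0.0, upper=20.0)
edges = np.linspace(0.0, 20.0, k_speed + 1)  # bin edges for np.digitize
```

Because $h$ shrinks with $n^{-1/3}$, a larger sample gives narrower bins, which matches the finer step sizes reported for the HDV-following-HDV case.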
We first define a minimum sample threshold $N_{\min} = 10$. This value was selected to maintain a balance between behavioral resolution and statistical reliability: an excessively high threshold risks merging distinct states that are far apart, while an overly low threshold may leave bins dominated by sensor noise. The algorithm identifies sparse bins that fall below this threshold and iteratively merges them into their nearest neighbor. Importantly, to ensure that the distance metric accounts for the different scales of the three state variables, the nearest neighbor search is performed using Euclidean distance in a normalized feature space, where each dimension is scaled by its range.

This process yields a refined set of robust state clusters $\mathcal{C}$ and a mapping function $M : \mathcal{S} \to \mathcal{C}$ that assigns a raw discretized state $s \in \mathcal{S}$ to a specific cluster $C \in \mathcal{C}$. Consequently, the effective state space of the MC-CF model becomes the set of these validated clusters. By design, every cluster contains at least $N_{\min}$ acceleration samples without the need for synthetic imputation.

Empirically, this procedure significantly mitigates state space sparsity. For instance, in the HDV-following-HDV case, the Freedman–Diaconis rule-based discretization yields a theoretical grid of over 85.4 million bins ($= 1041 \times 357 \times 230$). However, observed driving behaviors occupy only 1.46 million of these bins (approximately 1.7%). The clustering algorithm further merges these active bins into 123,235 clusters, achieving a compression ratio of 11.89. This process effectively isolates the relevant behavioral subspace, ensuring that every cluster meets the minimum acceleration sample requirement for reliable statistical estimation.

The driving dynamics are modeled as a Markov chain over the clusters.
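The merging idea described above can be sketched as follows. This is a simplified greedy variant written for illustration, not the batch procedure of Algorithm 1: it repeatedly takes the sparsest cluster and merges it into its nearest neighbor, measured by Euclidean distance between centroids scaled by each dimension's range. The function name `merge_sparse_bins` and the toy inputs are hypothetical, and every bin is assumed to hold at least one sample.

```python
import numpy as np

def merge_sparse_bins(centroids, counts, ranges, n_min=10):
    """Greedily merge clusters with fewer than n_min samples into their nearest neighbor.

    Returns (clusters, mapping): clusters maps a cluster id to
    [sample count, centroid, member bin indices]; mapping maps each
    original bin index to its final cluster id.
    """
    centroids = np.asarray(centroids, dtype=float)
    scale = np.asarray(ranges, dtype=float)  # per-dimension range used for normalization
    clusters = {i: [int(n), centroids[i].copy(), [i]] for i, n in enumerate(counts)}
    while len(clusters) > 1:
        src = min(clusters, key=lambda c: clusters[c][0])  # sparsest cluster
        n_src, mu_src, members_src = clusters[src]
        if n_src >= n_min:
            break  # every remaining cluster meets the threshold
        others = [c for c in clusters if c != src]
        dists = [np.linalg.norm((clusters[c][1] - mu_src) / scale) for c in others]
        dst = others[int(np.argmin(dists))]
        n_dst, mu_dst, members_dst = clusters[dst]
        clusters[dst] = [
            n_dst + n_src,
            (n_dst * mu_dst + n_src * mu_src) / (n_dst + n_src),  # count-weighted centroid
            members_dst + members_src,
        ]
        del clusters[src]
    mapping = {b: cid for cid, (_, _, members) in clusters.items() for b in members}
    return clusters, mapping

# Toy example: three bins with centroids (v, dv, d) and sample counts 3, 12, 20.
toy_clusters, toy_map = merge_sparse_bins(
    centroids=[[5.0, 0.0, 10.0], [5.2, 0.0, 10.0], [15.0, 0.0, 30.0]],
    counts=[3, 12, 20],
    ranges=[20.0, 20.0, 45.0],
)
```

In the toy example the sparse first bin is absorbed by the nearby second bin rather than the distant third, so the merged cluster keeps its local character.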
The transition probability is defined as the probability of moving from cluster $C$ to cluster $C'$ in one time step:

$$P(C' \mid C) = \frac{\sum_{(s_t, s_{t+1}) \in D} \mathbb{I}\big(M(s_t) = C \wedge M(s_{t+1}) = C'\big)}{\sum_{(s_t, s_{t+1}) \in D} \mathbb{I}\big(M(s_t) = C\big)}. \tag{3}$$

Simultaneously, we construct an empirical acceleration distribution $A(C)$ for each cluster using the pooled acceleration samples from the merged bins. Once the sample set is gathered, it is refined by removing outliers using the interquartile range (IQR) rule, retaining only values within the standard IQR bounds ($Q1 - 1.5 \cdot \mathrm{IQR}$ to $Q3 + 1.5 \cdot \mathrm{IQR}$). This process yields a robust, localized acceleration distribution $A(C)$ for each cluster, ensuring statistical validity while removing sensor noise.

In the inference phase, at time $t$, we determine the current cluster $C_t$ for state $s_t$ using the mapping function $M(s_t)$. However, to handle rare or previously unseen states that were not populated during training, we employ a nearest-neighbor fallback mechanism: $C_t$ is assigned to the cluster with the closest centroid in the normalized feature space. This ensures that a valid current cluster for the current state is always identified, preventing model failure in edge cases.

Algorithm 1: Spatially Constrained State Clustering Training Procedure
Input: Trajectory data $D$, bin edges $B_r, B_s, B_f$, minimum samples $N_{\min} = 10$
Output: Cluster map $M$, transition matrix $P$, acceleration distributions $A$
// 1. Initialization
Discretize $D$ into 3D grid bins using $B_r, B_s, B_f$;
Initialize set of clusters $\mathcal{C}$ where each unique grid bin is a cluster $C_i$;
for each cluster $C_i \in \mathcal{C}$ do
    Calculate sample count $n_i$, centroid $\mu_i$, and collected accelerations $A_i$;
    Calculate normalized centroid $\hat{\mu}_i$ based on feature ranges;
end
// 2. Iterative Merging
Identify sparse clusters $S = \{ C_i \in \mathcal{C} \mid n_i < N_{\min} \}$;
while $S \neq \emptyset$ do
    for each sparse cluster $C_{src} \in S$ do
        Find nearest neighbor cluster $C_{dst} \in \mathcal{C} \setminus \{ C_{src} \}$ minimizing $\lVert \hat{\mu}_{src} - \hat{\mu}_{dst} \rVert_2$;
        Record potential merge tuple $(C_{src}, C_{dst}, dist)$;
    end
    Sort merge tuples by source count (ascending) and distance (ascending);
    for each valid merge tuple $(C_{src}, C_{dst})$ do
        if $C_{src}$ or $C_{dst}$ has been removed in current batch then
            Continue;
        end
        // Merge source into destination
        $n_{dst} \leftarrow n_{dst} + n_{src}$;
        $\mu_{dst} \leftarrow \mathrm{WeightedAvg}(\mu_{dst}, \mu_{src})$;
        $\hat{\mu}_{dst} \leftarrow \mathrm{WeightedAvg}(\hat{\mu}_{dst}, \hat{\mu}_{src})$;
        $A_{dst} \leftarrow A_{dst} \cup A_{src}$;
        Remove $C_{src}$ from $\mathcal{C}$;
        Update map $M$: original_bins($C_{src}$) $\to C_{dst}$;
    end
    Update sparse set $S$ based on new counts;
end
// 3. Transition Probability Calculation
for each transition $(C_t, C_{t+1})$ in $D$ using $M$ do
    Increment transition count $T_{C_t \to C_{t+1}}$;
end
Compute row-normalized probabilities $P(C' \mid C)$;
return $M, P, A$

We utilize this framework in two modes. In the deterministic version, namely MC-CF (det), the most probable next cluster is selected, and the acceleration is the conditional mean:

$$C_{t+1} = \arg\max_{C'} P(C' \mid C_t), \qquad a_t = \mathbb{E}[A(C_{t+1})]. \tag{4}$$

In the stochastic version, denoted as MC-CF (stoch), the next cluster is sampled from the transition distribution, and the acceleration is drawn from the empirical distribution:

$$C_{t+1} \sim P(\cdot \mid C_t), \qquad a_t \sim A(C_{t+1}), \tag{5}$$

allowing the model to capture both the stochasticity of state evolution and the variability of driver responses within a given traffic state.

We evaluate the proposed models in two usage modes: one-step prediction and open-loop prediction.
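The two inference modes of Eqs. (4) and (5) can be sketched as follows. This is an illustrative sketch only: the nested-dictionary layout for $P$, the per-cluster lists of acceleration samples, and the function names `step_det` and `step_stoch` are assumptions made for the example, not the authors' data structures.

```python
import random
from statistics import mean

def step_det(P, accel, c_t):
    """Eq. (4): pick the most probable next cluster and return the mean
    of its empirical acceleration samples."""
    c_next = max(P[c_t], key=P[c_t].get)
    return c_next, mean(accel[c_next])

def step_stoch(P, accel, c_t, rng):
    """Eq. (5): sample the next cluster from the transition row, then draw
    one acceleration from that cluster's empirical samples."""
    nxt, probs = zip(*P[c_t].items())
    c_next = rng.choices(nxt, weights=probs, k=1)[0]
    return c_next, rng.choice(accel[c_next])

# Toy two-cluster model (hypothetical numbers).
P = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}
accel = {0: [-1.0, 0.0], 1: [0.5, 1.5]}
det_next, det_a = step_det(P, accel, 0)
sto_next, sto_a = step_stoch(P, accel, 0, random.Random(0))
```

Drawing directly from the stored samples, rather than from a fitted parametric density, is what keeps the stochastic mode free of distributional assumptions.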
In one-step prediction, the update is performed for a single step and compared directly with the observed states:

$$v_{t+1} = \max\left(v_t + a_t \Delta t,\ 0\right), \tag{6}$$

$$x_{t+1} = x_t + \tfrac{1}{2}\left(v_t + v_{t+1}\right)\Delta t, \tag{7}$$

$$d_{t+1} = x^{\mathrm{lead}}_{t+1} - x_{t+1} - l. \tag{8}$$

In open-loop prediction, the procedure is applied recursively to generate an entire follower trajectory given the sequence of ground truth leader states and the initial follower state, thereby assessing long-term predictive consistency.

4.2. Baseline Model Calibration and Implementation

To benchmark the proposed MC-CF model, six representative car-following models are considered: the IDM [6], the SIDM [7], Van Arem's model [24], the FVDM-CTH [21, 23], the FVDM-Sigmoid [21, 23], and the Gipps model [5]. Together, these models span a diverse range of behavioral paradigms. This collection provides a comprehensive baseline covering deterministic and stochastic dynamics, against which the proposed MC-CF model can be systematically evaluated.

4.2.1. IDM

The IDM proposed by Treiber et al. [6] specifies acceleration as

$$a_t = a_{\max}\left[1 - \left(\frac{v_t}{v_0}\right)^{\delta} - \left(\frac{s^*(v_t, \Delta v_t)}{d_t}\right)^2\right], \tag{9}$$

where $v_0$ is the desired speed, $a_{\max}$ is the maximum acceleration, $\delta$ is an acceleration exponent, $\Delta v_t$ is the follower speed ($v_t$) minus the leader speed ($v^{\mathrm{lead}}_t$), and $d_t$ is the spacing. The desired gap function is

$$s^*(v_t, \Delta v_t) = s_0 + v_t T + \frac{v_t \Delta v_t}{2\sqrt{a_{\max} b}}, \tag{10}$$

with $s_0$ the minimum gap, $T$ the desired time headway, and $b$ the comfortable deceleration.

4.2.2. SIDM

To capture oscillatory traffic dynamics, Treiber and Kesting [7] extended the IDM by introducing stochastic fluctuations (SIDM):

$$a_t = a^{\mathrm{IDM}}_t + \sigma \xi_t, \tag{11}$$

where $a^{\mathrm{IDM}}_t$ is the deterministic IDM acceleration, $\xi_t$ is a Gaussian random variable with zero mean and unit variance, and $\sigma$ controls the strength of stochasticity. This variant reflects heterogeneity in driver responses and reproduces traffic flow instabilities.

4.2.3. Van Arem Model

Van Arem et al. [24] proposed a control-oriented formulation for cooperative adaptive cruise control systems. The acceleration of the follower is determined as the minimum of a speed-based demand ($a_{t\_v}$) and a distance/speed-based demand ($a_{t\_d}$):

$$a_t = \min\left(a_{t\_v},\ a_{t\_d}\right). \tag{12}$$

The speed-based demand is given by

$$a_{t\_v} = k \cdot (v_{int} - v_t), \tag{13}$$

where $v_{int}$ is the driver's intended speed and $k$ is a positive feedback gain. The distance/speed-based demand is given by

$$a_{t\_d} = k_a a^{\mathrm{lead}}_t - k_v \Delta v_t + k_d \left(d_t - d^{\mathrm{ref}}_t\right), \tag{14}$$

where $a^{\mathrm{lead}}_t$ is the acceleration of the leader, $d^{\mathrm{ref}}_t$ is a dynamic desired spacing derived from time headway and safety considerations, and $k_a$, $k_v$, $k_d$ are positive feedback gains. The reference clearance $d^{\mathrm{ref}}_t$ is defined as the maximum of three values:

$$d^{\mathrm{ref}}_t = \max\left(r_{safe},\ r_{system},\ r_{\min}\right), \tag{15}$$

where $r_{\min}$ is the minimum spacing. The safe following distance ($r_{safe}$) is computed based on the speed of the follower and the deceleration capabilities of the leader ($d_p$) and follower ($d$):

$$r_{safe} = \frac{(\Delta v_t)^2}{2}\left(\frac{1}{d_p} - \frac{1}{d}\right), \tag{16}$$

and the system time-gap distance ($r_{system}$) is a function of the ego vehicle's speed and a system time headway setting $t_{system}$:

$$r_{system} = t_{system} \cdot v_t. \tag{17}$$

4.2.4. FVDM-CTH

The FVDM by Jiang et al. [21] combines a desired velocity rule with a velocity difference term. In the constant time headway variant proposed by Punzo et al. [23], namely FVDM-CTH, the acceleration is

$$a_t = K_1\left[V(d_t) - v_t\right] - K_2 \Delta v_t, \tag{18}$$

where $K_1$ and $K_2$ are positive feedback gains. The desired velocity is given by

$$V(d_t) = \begin{cases} 0, & d_t \le s_0, \\ \min\left(V_{\max},\ \dfrac{d_t - s_0}{T}\right), & d_t > s_0, \end{cases} \tag{19}$$

with $s_0$ the minimum spacing, $T$ the desired time headway, and $V_{\max}$ the maximum speed.

4.2.5. FVDM-Sigmoid

A smooth alternative, namely FVDM-Sigmoid, is to represent the desired velocity with a sigmoid profile [23]:

$$V(d_t) = \begin{cases} 0, & d_t \le s_0, \\ \dfrac{V_{\max}}{2}\left[1 - \cos\left(\dfrac{\pi (d_t - s_0)}{T V_{\max}}\right)\right], & s_0 < d_t < s_0 + T V_{\max}, \\ V_{\max}, & d_t \ge s_0 + T V_{\max}. \end{cases} \tag{20}$$

The acceleration structure remains the same as in the CTH formulation, with the sigmoid desired speed replacing the piecewise linear function.

4.2.6. Gipps

Finally, the Gipps model [5] balances free-flow acceleration with collision-avoidance constraints. The follower's speed at the next time step is determined by

$$v_{t+1} = \min\left\{ v_t + 2.5\, a_{\max} \tau \left(1 - \frac{v_t}{V_{\max}}\right)\sqrt{0.025 + \frac{v_t}{V_{\max}}},\ \ -b\left(\frac{\tau}{2} + \theta\right) + \sqrt{b^2\left(\frac{\tau}{2} + \theta\right)^2 + b\left[2(d_t - s_0) - \tau v_t + \frac{(v^{\mathrm{lead}}_t)^2}{\hat{b}}\right]} \right\}, \tag{21}$$

where $a_{\max}$ denotes the maximum acceleration, $b$ is the comfortable deceleration, $\tau$ is the reaction time, $\theta$ is an additional anticipation parameter, $s_0$ is the minimum spacing, $V_{\max}$ is the desired speed, and $\hat{b}$ represents the expected deceleration capability of the leader vehicle. The resulting acceleration is approximated as $a_t = (v_{t+1} - v_t)/\tau$.

The parameters for all six baseline car-following models were calibrated using a trajectory-based optimization approach. This method minimizes the root mean squared error (RMSE) between the simulated follower speed ($v_{sim}$) and the actual follower speed ($v_{true}$) over the entire time series for all segments in the training dataset. The objective function for calibration is:

$$\text{Minimize}\quad \mathrm{RMSE}_v = \sqrt{\frac{1}{\sum_{j=1}^{M} T_j} \sum_{j=1}^{M} \sum_{t=1}^{T_j} \left(v_{sim,j,t} - v_{true,j,t}\right)^2}, \tag{22}$$

where $M$ is the total number of distinct car-following pairs in the training dataset, $T_j$ is the number of time steps of the $j$-th car-following pair, and $v_{sim,j,t}$ is the simulated speed of the follower in pair $j$ at time $t$.
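As a concrete illustration of the calibration objective, the sketch below evaluates Eq. (22) for one baseline, the IDM of Eqs. (9) and (10), rolled forward with the kinematic update of Eqs. (6) to (8). It is a minimal sketch under stated assumptions: the parameter dictionary keys, the function names, and the small positive floor on the spacing are hypothetical choices made for the example; the acceleration clipping range of [-10, 5] m/s^2 is the one stated in the paper.

```python
import math

def idm_accel(v, dv, d, p):
    """IDM acceleration, Eqs. (9)-(10); dv is follower minus leader speed."""
    s_star = p["s0"] + v * p["T"] + v * dv / (2.0 * math.sqrt(p["a_max"] * p["b"]))
    return p["a_max"] * (1.0 - (v / p["v0"]) ** p["delta"] - (s_star / d) ** 2)

def simulate_speeds(x0, v_init, lead_x, lead_v, p, dt, length):
    """Roll the follower forward from its true initial state only,
    using Eqs. (6)-(8), so errors propagate along the trajectory."""
    x, v, speeds = x0, v_init, [v_init]
    for t in range(len(lead_x) - 1):
        d = max(lead_x[t] - x - length, 1e-3)  # assumed floor, avoids division by zero
        a = idm_accel(v, v - lead_v[t], d, p)
        a = min(max(a, -10.0), 5.0)            # acceleration range stated in the paper
        v_next = max(v + a * dt, 0.0)          # Eq. (6)
        x += 0.5 * (v + v_next) * dt           # Eq. (7)
        v = v_next
        speeds.append(v)
    return speeds

def pooled_speed_rmse(sim, true):
    """Eq. (22): squared errors pooled over all pairs and time steps,
    divided by the total number of time steps."""
    sq = sum((s - t) ** 2 for ps, pt in zip(sim, true) for s, t in zip(ps, pt))
    n = sum(len(pt) for pt in true)
    return math.sqrt(sq / n)
```

Note that the pooled form weights every time step equally, so longer car-following pairs contribute more to the objective than they would under an average of per-pair RMSE values.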
The minimization is performed by simulating the follower's trajectory over the entire time series using only the true initial conditions (position and speed) of the follower vehicle, allowing simulation errors to propagate throughout the time series. The optimization was performed using the Differential Evolution (DE) algorithm [39] to effectively explore the parameter space. Specifically, the DE algorithm was implemented using the SciPy optimization library (scipy.optimize.differential_evolution). The DE parameters were configured as follows: strategy = best1bin, population size = 15, mutation factor in [0.5, 1.0], recombination probability = 0.7, maximum iterations = 50, and convergence tolerance = 0.01. The calculated acceleration for all models was constrained within a practical range of [-10, 5] m/s^2 to reflect real-world vehicle limitations. Table 4 summarizes the parameter bounds used for each model. This rigorous setup ensures a fair and systematic baseline comparison against the proposed MC-CF models.

TABLE 4: Parameter Bounds Applied for Baseline Models Calibration.

| Model | Parameter | Symbol | Unit | Lower Bound | Upper Bound |
|---|---|---|---|---|---|
| IDM | Desired Speed | $v_0$ | m/s | 5.0 | 50.0 |
| | Desired Time Headway | $T$ | s | 0.5 | 3.0 |
| | Maximum Acceleration | $a_{\max}$ | m/s^2 | 0.1 | 5.0 |
| | Comfortable Decel. | $b$ | m/s^2 | 0.1 | 10.0 |
| | Minimum Spacing | $s_0$ | m | 0.5 | 10.0 |
| | Acceleration Exponent | $\delta$ | - | 1.0 | 10.0 |
| SIDM | IDM Parameters | - | - | - | - |
| | Noise Std Dev | $\sigma$ | m/s^2 | 0.01 | 2.0 |
| Van Arem | Acceleration Gain | $k_a$ | - | 0.1 | 5.0 |
| | Speed Gain | $k_v$ | s^-1 | 0.1 | 5.0 |
| | Spacing Gain | $k_d$ | s^-2 | 0.1 | 5.0 |
| | System Time Headway | $t_{system}$ | s | 0.5 | 3.0 |
| | Intended Speed | $v_{int}$ | m/s | 5.0 | 50.0 |
| | Minimum Spacing | $r_{\min}$ | m | 0.1 | 5.0 |
| | Leader Decel. Capability | $d_p$ | m/s^2 | 0.1 | 10.0 |
| | Follower Decel. Capability | $d$ | m/s^2 | 0.1 | 10.0 |
| | Intended Speed Gain | $k$ | - | 0.1 | 1.0 |
| FVDM-CTH | Spacing Gain | $K_1$ | s^-2 | 0.1 | 5.0 |
| | Velocity Gain | $K_2$ | s^-1 | 0.1 | 5.0 |
| | Minimum Spacing | $s_0$ | m | 0.1 | 10.0 |
| | Desired Time Headway | $T$ | s | 0.5 | 3.0 |
| | Maximum Speed | $V_{\max}$ | m/s | 5.0 | 50.0 |
| FVDM-Sigmoid | Same as FVDM-CTH | - | - | - | - |
| Gipps | Maximum Acceleration | $a_{\max}$ | m/s^2 | 0.5 | 3.0 |
| | Comfortable Deceleration | $b$ | m/s^2 | 1.0 | 4.0 |
| | Reaction Time | $\tau$ | s | 0.1 | 1.5 |
| | Anticipation Time | $\theta$ | s | 0.3 | 1.0 |
| | Minimum Spacing | $s_0$ | m | 0.1 | 10.0 |
| | Desired Speed | $V_{\max}$ | m/s | 5.0 | 50.0 |
| | Expected Leader Decel. | $\hat{b}$ | m/s^2 | 2.0 | 5.0 |

5. RESULTS AND DISCUSSION

In this section, we first compare the trajectory prediction accuracy of the proposed MC-CF (det) and MC-CF (stoch) models against the baseline models across the three interaction types: AV-following-HDV, HDV-following-AV, and HDV-following-HDV. All models are trained using the WOMD training dataset and evaluated on the WOMD test dataset. We then compare the distribution of probabilities of obtaining the generated follower trajectories from each trained model with that of the ground truth follower trajectories, where probabilities are computed using transition probabilities estimated from the WOMD training dataset. Next, we examine model performance on zero-shot generalization using the PHX dataset. Finally, we demonstrate the framework's scalability through microscopic ring road simulations.

5.1. Trajectory Prediction Analysis

We present the trajectory prediction performance of the proposed MC-CF (det, stoch) models in comparison with six baseline models (IDM, Van Arem, FVDM-CTH, FVDM-Sigmoid, Gipps, and SIDM). The proposed and baseline models are trained on the WOMD training dataset and tested on the WOMD test dataset separately for three interaction types: AV-following-HDV, HDV-following-AV, and HDV-following-HDV. For deterministic models, including MC-CF (det), we generate a single trajectory ($K = 1$) since they produce identical follower trajectories for a given leader trajectory. In contrast, stochastic models such as SIDM and MC-CF (stoch) can yield multiple plausible realizations conditioned on the same leader behavior.
To capture this variability, we generate $K \in \{1, 3, 6, 10, 15\}$ follower trajectories for each leader trajectory in the test dataset.

5.1.1. Evaluation Metrics

Model evaluation is conducted in two categories: one-step prediction and open-loop prediction. For the one-step prediction, we compute the RMSE for spacing ($RMSE(s)$), speed ($RMSE(v)$), and acceleration ($RMSE(a)$). These metrics quantify how accurately each model estimates the instantaneous spacing, speed, and acceleration of the follower vehicle. Given that each car-following pair contains, on average, more than 90 time steps, with 420, 544, and 3,276 test pairs for the AV-following-HDV, HDV-following-AV, and HDV-following-HDV cases, respectively (as shown in Table 1), the stochastic models are evaluated with $K = 1$ for the one-step prediction analysis. This provides a robust measure of their instantaneous predictive capability, allowing a direct comparison with the deterministic models.

Open-loop prediction, on the other hand, assesses long-term trajectory consistency and spatial accuracy. For stochastic models, evaluation is based on the best-matching realization among the $K$ generated trajectories, acknowledging that multiple follower trajectories may be plausible given the same leader trajectory in real-world driving.

We employ five complementary open-loop metrics, each capturing different aspects of trajectory quality. For a fair comparison between models, all open-loop metrics other than the overlapping rate ($OR$) are computed using only the car-following pairs without overlapping.
Specifically, because shorter predicted trajectories starting from the true initial point will naturally be closer to the ground truth, a valid pairwise comparison requires that the evaluation set include only those car-following pairs in the test dataset that (i) do not result in a crash for any deterministic model, (ii) have at least one non-crashing trajectory among the $K$ SIDM trajectories, and (iii) have at least one non-crashing trajectory among the $K$ MC-CF (stoch) trajectories. As a result, all open-loop metrics are computed over the same time horizon for all models, ensuring a fair comparison.

First, the dynamic time warping (DTW) distance computes the minimal cumulative cost required to align two sequences by allowing non-linear time stretching. DTW is particularly effective for time series that are out of phase, as it warps the time axis to find an optimal alignment between two sequences $X = [x_1, x_2, \ldots, x_m]$ and $Y = [y_1, y_2, \ldots, y_n]$ [40]. This capability is crucial in car-following scenarios where minor temporal shifts in driver behavior or model predictions can lead to large Euclidean distances, even if the overall shape of the trajectory is similar. The DTW distance between $X$ and $Y$ is given by

$$DTW(X, Y) = \min_{W} \sum_{(i,j) \in W} d(x_i, y_j), \tag{23}$$

where $W = [(i_1, j_1), \ldots, (i_M, j_M)]$ is a warping path that maps elements of $X$ to elements of $Y$, and $d(\cdot, \cdot)$ denotes the local cost, taken in this study as squared Euclidean distance.

We compute DTW for each of the $K'$ ($\le K$) generated non-crashing trajectories and then take the best-matching realization among them, where $K' = 1$ for deterministic models. The minimum DTW is calculated separately for spacing and speed, denoted as $minDTW(s)$ and $minDTW(v)$, respectively. For spacing, let the ground truth spacing sequence be $s = [s_1, \ldots, s_T]$ and the $k$-th predicted spacing be $\hat{s}_k = [\hat{s}_{1,k}, \ldots, \hat{s}_{T,k}]$.
The DTW cost for the $k$-th sample is

$$DTW_k(\hat{s}_k, s) = \min_{W} \sum_{(i,j) \in W} d(\hat{s}_{i,k}, s_j), \tag{24}$$

and the minimum DTW for spacing is then defined as

$$minDTW(s) = \min_{k \in \{1, \ldots, K'\}} DTW_k(\hat{s}_k, s). \tag{25}$$

Similarly, for speed, let the ground truth speed be $v = [v_1, \ldots, v_T]$ and the $k$-th predicted speed be $\hat{v}_k = [\hat{v}_{1,k}, \ldots, \hat{v}_{T,k}]$. We compute

$$minDTW(v) = \min_{k \in \{1, \ldots, K'\}} DTW_k(\hat{v}_k, v). \tag{26}$$

The third open-loop prediction metric, the minimum average displacement error (minADE), quantifies the mean spatial deviation between predicted and ground truth trajectories while accounting for stochastic predictions. For each non-crashing trajectory sample $k \in \{1, \ldots, K'\}$, the average displacement error (ADE) is computed as

$$ADE_k = \frac{1}{T} \sum_{t=1}^{T} \lVert \hat{y}_{t,k} - y_t \rVert_2, \tag{27}$$

where $y_t$ denotes the ground truth position of the follower vehicle at time step $t$, $\hat{y}_{t,k}$ represents the predicted position at time step $t$ for the $k$-th non-crashing trajectory, and $T$ represents the total number of time steps in the trajectory. For each car-following pair, the minADE across the $K'$ non-crashing trajectories is then taken as

$$minADE = \min_{k \in \{1, \ldots, K'\}} ADE_k. \tag{28}$$

Next, the minimum final displacement error ($minFDE$) measures the smallest final position error among the $K'$ non-crashing predicted trajectories:

$$minFDE = \min_{k \in \{1, \ldots, K'\}} \lVert \hat{y}_{T,k} - y_T \rVert_2. \tag{29}$$

Finally, the $OR$ evaluates safety-related consistency by computing the ratio of collision events to the total number of car-following pairs in the test dataset. A collision is counted when the follower's predicted position overlaps with the leader's position at any time step. To ensure a fair comparison between deterministic and stochastic models, the $OR$ is computed using a single generated trajectory ($K = 1$) for each car-following pair.
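The best-of-$K'$ accuracy metrics of Eqs. (23) to (29) can be sketched in a few lines. This is an illustrative sketch, assuming one-dimensional longitudinal positions (so the Euclidean norm reduces to an absolute difference) and a textbook dynamic-programming DTW with the squared local cost named in the text; the function names are hypothetical.

```python
def dtw(x, y):
    """Eq. (23): minimal cumulative squared-difference cost over warping paths."""
    m, n = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[m][n]

def ade(pred, truth):
    """Eq. (27) for scalar positions: mean absolute displacement."""
    return sum(abs(p - q) for p, q in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Displacement at the final time step, as used in Eq. (29)."""
    return abs(pred[-1] - truth[-1])

def best_of_k(metric, samples, truth):
    """Eqs. (25), (26), (28), (29): minimum of a metric over the
    non-crashing samples (a single sample for deterministic models)."""
    return min(metric(s, truth) for s in samples)
```

Because each score is a minimum over the available samples, adding samples to the same set can never increase it, which is why the best-of-$K$ values in stochastic models tend to fall as $K$ grows.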
Given the large size of the test dataset (Table 1), a single realization per stochastic model still gives a statistically meaningful estimate of the overlapping rate, indicating how often the model produces collisions across the full dataset.

Together, the three one-step prediction metrics and the five open-loop prediction metrics provide a comprehensive evaluation framework that captures both instantaneous accuracy and long-horizon trajectory accuracy, allowing fair comparison across deterministic and stochastic car-following models.

5.1.2. Prediction Results

Table 5 presents the trajectory prediction performance for the AV-following-HDV case, covering both one-step and open-loop prediction results for deterministic and stochastic car-following models. All performance measures are averaged over the evaluated car-following pairs in the WOMD test dataset. Since all models are evaluated on the same dataset under identical conditions, the averaged measures across pairs can be directly compared.

The results clearly highlight the effectiveness of the proposed MC-CF models, both deterministic and stochastic variants, each exhibiting distinct strengths. For the one-step prediction, which evaluates immediate next-step accuracy, MC-CF (det) achieves the lowest RMSE in both speed and acceleration (i.e., $RMSE(v)$, $RMSE(a)$), outperforming all baseline models and MC-CF (stoch). This suggests that, for instantaneous prediction, the conditional mean acceleration of the next cluster with the highest transition probability (Eq. 4) provides the most precise estimate of next-step speed and acceleration. In contrast, for spacing estimation (i.e., $RMSE(s)$), MC-CF (stoch) performs better than MC-CF (det). This can be attributed to the stochastic variability in leader behavior, which is not fully captured by the current state representation ($v_t$, $\Delta v_t$, and $d_t$).
Hence, introducing stochasticity in the acceleration prediction helps capture this uncertainty, leading to more accurate next-step spacing estimates.

TABLE 5: Trajectory prediction performance for AV-following-HDV.

| Type | Model | RMSE(s) | RMSE(v) | RMSE(a) | minDTW(s) | minDTW(v) | minADE | minFDE | OR |
|---|---|---|---|---|---|---|---|---|---|
| Deterministic | IDM | 0.0187 | 0.1755 | 1.8074 | 8.3772 | 4.0860 | 1.7073 | 3.4103 | 0.0024* |
| | Van-Arem | 0.0183 | 0.1570 | 1.5470 | 9.3201 | 4.1594 | 1.8277 | 3.9218 | 0.0071 |
| | FVDM-CTH | 0.0183 | 0.1575 | 1.5570 | 7.8822 | 3.9071 | 1.5886 | 3.5725 | 0.0095 |
| | FVDM-Sigmoid | 0.0183 | 0.1591 | 1.5732 | 7.9764 | 3.9329 | 1.6255 | 3.6235 | 0.0095 |
| | Gipps | 0.0171 | 0.0945 | 0.9357 | 9.4957 | 4.2409 | 2.0343 | 3.8006 | 0.0048 |
| | MC-CF (det) | 0.0168 | 0.0641* | 0.6077* | 10.4219 | 4.8604 | 1.6112 | 4.4190 | 0.0667 |
| Stochastic | SIDM (1) | 0.0094 | 0.1584 | 1.6033 | 8.4411 | 4.0760 | 1.7348 | 3.4357 | 0.0024* |
| | SIDM (3) | - | - | - | 8.3790 | 4.0423 | 1.7210 | 3.3996 | - |
| | SIDM (6) | - | - | - | 8.3461 | 4.0244 | 1.7136 | 3.3785 | - |
| | SIDM (10) | - | - | - | 8.3272 | 4.0137 | 1.7099 | 3.3663 | - |
| | SIDM (15) | - | - | - | 8.3078 | 4.0044 | 1.7060 | 3.3563 | - |
| | MC-CF (stoch, 1) | 0.0055* | 0.0788 | 0.7698 | 10.8026 | 4.9925 | 1.6319 | 4.5116 | 0.0595 |
| | MC-CF (stoch, 3) | - | - | - | 8.5198 | 3.9503 | 1.3461 | 3.5654 | - |
| | MC-CF (stoch, 6) | - | - | - | 7.5889 | 3.4932 | 1.2270 | 3.1517 | - |
| | MC-CF (stoch, 10) | - | - | - | 7.0738 | 3.2689 | 1.1652 | 2.9118 | - |
| | MC-CF (stoch, 15) | - | - | - | 6.6274* | 3.0995* | 1.1037* | 2.7329* | - |

Note: * indicates the minimum (best) value within each column. In case of ties, multiple entries are marked.

Moving to the open-loop prediction, where prediction errors can accumulate over time, MC-CF (stoch) outperforms all deterministic models, as well as the baseline SIDM, across all open-loop accuracy metrics ($minDTW(s)$, $minDTW(v)$, $minADE$, and $minFDE$) once the best of several samples is taken ($K \ge 6$ in Table 5). This demonstrates that stochastic acceleration sampling provides improved long-horizon consistency and adaptability to varying leader dynamics. However, we note that MC-CF (stoch) exhibits an overlapping rate of around 6%, higher than that of all other baselines, whose overlapping rates are below 1%.
Addressing this overlapping issue is essential for applying the model in microscopic simulations, as discussed further in Section 5.4.

It should be noted that, while FVDM-CTH achieves the best open-loop trajectory prediction performance for AVs among the deterministic baselines on most metrics ($minDTW(s)$, $minDTW(v)$, and $minADE$), it exhibits a substantially higher $RMSE(a)$ than the MC-CF variants. This implies that although traditional physics-based car-following models can reproduce visually plausible long-term trajectories with low crash rates and are often used for simulation or stability analysis, their acceleration patterns significantly deviate from the actual behavior of AVs. Consequently, relying on such models to analyze AV dynamics or evaluate traffic stability may lead to misleading or incorrect conclusions.

For both HDV-following-AV and HDV-following-HDV scenarios, as presented in Tables 6 and 7, respectively, the overall performance trends are consistent with those observed in the AV-following-HDV case, reaffirming the superior trajectory prediction accuracy of the proposed MC-CF framework. In the one-step prediction metrics, MC-CF (det) again achieves the lowest RMSE in both speed and acceleration across all models, highlighting its strong short-term predictive accuracy. This suggests that the deterministic variant effectively captures the immediate response behavior of human drivers, whether following AVs or other HDVs. The slight increase in $RMSE(a)$ for HDV-following-AV (0.6729 m/s^2) and HDV-following-HDV (0.6357 m/s^2) compared to the AV-following-HDV case (0.6077 m/s^2) reflects the greater variability in human driving behavior, particularly when drivers interact with the relatively uncommon AVs on the road. In contrast, for the open-loop metrics including $minDTW(s)$, $minDTW(v)$, $minADE$, and $minFDE$, MC-CF (stoch) consistently delivers the best performance in both HDV-following-AV and HDV-following-HDV scenarios.
However, the overlapping rate of MC-CF (stoch) is again notably higher than that of the physics-based baselines across all interaction types, which often incorporate explicit, hard-coded safety constraints. As discussed further in Section 5.4, this collision rate is not a fixed limitation of the architecture, but rather an issue that can be significantly mitigated through the straightforward addition of more diverse trajectory data.

TABLE 6: Trajectory prediction performance for HDV-following-AV.

| Type | Model | RMSE(s) | RMSE(v) | RMSE(a) | minDTW(s) | minDTW(v) | minADE | minFDE | OR |
|---|---|---|---|---|---|---|---|---|---|
| Deterministic | IDM | 0.0640 | 0.2124 | 2.3341 | 9.2262 | 3.6625 | 2.0941 | 4.2485 | 0.0000* |
| | Van-Arem | 0.0636 | 0.1670 | 1.6473 | 8.9135 | 3.6904 | 1.9973 | 4.3032 | 0.0110 |
| | FVDM-CTH | 0.0637 | 0.1720 | 1.7008 | 8.5489 | 3.5743 | 1.9093 | 4.1597 | 0.0147 |
| | FVDM-Sigmoid | 0.0637 | 0.1733 | 1.7141 | 8.5179 | 3.5975 | 1.8915 | 4.1412 | 0.0147 |
| | Gipps | 0.0628 | 0.1047 | 1.0450 | 8.8513 | 3.6436 | 2.0944 | 4.1735 | 0.0000* |
| | MC-CF (det) | 0.0627 | 0.0692* | 0.6729* | 12.9675 | 4.8780 | 2.1156 | 5.5320 | 0.0202 |
| Stochastic | SIDM (1) | 0.0272 | 0.1845 | 1.9673 | 9.1456 | 3.6269 | 2.0959 | 4.2512 | 0.0000* |
| | SIDM (3) | - | - | - | 8.7529 | 3.4543 | 2.0191 | 4.0444 | - |
| | SIDM (6) | - | - | - | 8.5731 | 3.3751 | 1.9886 | 3.9526 | - |
| | SIDM (10) | - | - | - | 8.4476 | 3.3198 | 1.9647 | 3.8928 | - |
| | SIDM (15) | - | - | - | 8.3501 | 3.2849 | 1.9478 | 3.8439 | - |
| | MC-CF (stoch, 1) | 0.0238* | 0.0863 | 0.8520 | 13.0722 | 4.9551 | 2.1032 | 5.4801 | 0.0221 |
| | MC-CF (stoch, 3) | - | - | - | 9.5669 | 3.7352 | 1.7273 | 4.2458 | - |
| | MC-CF (stoch, 6) | - | - | - | 8.5552 | 3.4034 | 1.5752 | 3.7839 | - |
| | MC-CF (stoch, 10) | - | - | - | 7.8877 | 3.1624 | 1.4769 | 3.4561 | - |
| | MC-CF (stoch, 15) | - | - | - | 7.5742* | 3.0313* | 1.4255* | 3.2846* | - |

Note: * indicates the minimum (best) value within each column. In case of ties, multiple entries are marked.

TABLE 7: Trajectory prediction performance for HDV-following-HDV.

| Type | Model | RMSE(s) | RMSE(v) | RMSE(a) | minDTW(s) | minDTW(v) | minADE | minFDE | OR |
|---|---|---|---|---|---|---|---|---|---|
| Deterministic | IDM | 0.0343 | 0.2463 | 2.7957 | 8.7098 | 3.2324 | 1.8831 | 3.7673 | 0.0015* |
| | Van-Arem | 0.0329 | 0.1543 | 1.5333 | 7.8201 | 3.0488 | 1.6038 | 3.5931 | 0.0085 |
| | FVDM-CTH | 0.0333 | 0.1792 | 1.7777 | 7.4429 | 3.0110 | 1.5510 | 3.5387 | 0.0159 |
| | FVDM-Sigmoid | 0.0333 | 0.1800 | 1.7857 | 7.4039 | 3.0254 | 1.5454 | 3.5211 | 0.0165 |
| | Gipps | 0.0316 | 0.1108 | 1.1092 | 9.2229 | 3.3744 | 1.9943 | 3.8893 | 0.0015* |
| | MC-CF (det) | 0.0314 | 0.0647* | 0.6357* | 9.3702 | 3.4945 | 1.6104 | 4.2171 | 0.0305 |
| Stochastic | SIDM (1) | 0.0233 | 0.2205 | 2.3588 | 8.6577 | 3.2725 | 1.8900 | 3.7888 | 0.0015* |
| | SIDM (3) | - | - | - | 8.5213 | 3.2067 | 1.8674 | 3.7216 | - |
| | SIDM (6) | - | - | - | 8.4549 | 3.1733 | 1.8559 | 3.6886 | - |
| | SIDM (10) | - | - | - | 8.4140 | 3.1532 | 1.8488 | 3.6685 | - |
| | SIDM (15) | - | - | - | 8.3862 | 3.1389 | 1.8439 | 3.6549 | - |
| | MC-CF (stoch, 1) | 0.0183* | 0.0828 | 0.8257 | 9.5923 | 3.5591 | 1.6117 | 4.2469 | 0.0314 |
| | MC-CF (stoch, 3) | - | - | - | 7.0735 | 2.7091 | 1.3019 | 3.2248 | - |
| | MC-CF (stoch, 6) | - | - | - | 6.1305 | 2.4037 | 1.1643 | 2.7818 | - |
| | MC-CF (stoch, 10) | - | - | - | 5.6169 | 2.2351 | 1.0837 | 2.5260 | - |
| | MC-CF (stoch, 15) | - | - | - | 5.2924* | 2.1337* | 1.0302* | 2.3622* | - |

Note: * indicates the minimum (best) value within each column. In case of ties, multiple entries are marked.

As the number of generated stochastic trajectories increases from 1 to 15, the proposed MC-CF (stoch) model shows consistent improvement across all open-loop metrics, as illustrated in Figures 1a-f. This steady improvement reflects the benefit of stochastic sampling in capturing a broader range of realistic driving behaviors. Since each open-loop metric is calculated based on the best trajectory among the $K'$ non-crashing generated trajectories, higher $K$ values naturally increase the chance of producing a trajectory closer to the observed ground truth. In contrast, deterministic models produce the identical trajectory for a given leader trajectory, which may substantially deviate from the actual follower trajectory observed in real-world data.
SIDM exhibits little improvement as $K$ increases, indicating limited stochasticity in its trajectory generation. In contrast, the MC-CF (stoch) model produces a much more diverse set of possible trajectories conditioned on the same leader trajectory. This diversity better reflects the inherent variability of real-world driving, where a follower's actual trajectory is only one realization among many plausible outcomes. Such variability arises from differences in individual driving styles as well as contextual factors like road geometry, surrounding traffic, and the presence of pedestrians or cyclists. Consequently, with $K \ge 10$, MC-CF (stoch) consistently outperforms SIDM across all open-loop metrics and interaction types, as the increased diversity among generated trajectories raises the likelihood that one closely matches the observed ground truth.

In addition, the pattern of error reduction shows a clear knee point in the performance curves. As shown in Figures 1b, d, and f, the largest gains in open-loop accuracy appear when $K$ increases from 1 to 6. After $K$ reaches 6, the improvement becomes modest, and the curves slowly level off as they approach $K = 15$. This indicates that although stochastic sampling helps the model explore a wide range of plausible outcomes, only a small set of generated trajectories (e.g., $K = 10$) is required to find a close match to the ground truth.

5.2. Transition Probability Analysis

In this subsection, we present a statistical analysis to examine whether the distribution of the probabilities of observing follower trajectories generated by the calibrated models aligns with that of the ground truth follower trajectories. As previously discussed, the probability of observing the ground truth trajectory given a leader trajectory is not one, owing to the inherently stochastic nature of car-following behavior.
Therefore, we evaluate and compare the distributions of trajectory probabilities from the generated and ground truth data. Following Zhang et al. [31], we use the geometric mean probability to represent the likelihood of a follower trajectory. Specifically, for each generated or ground truth follower trajectory with $T$ time steps (i.e., $T - 1$ transitions) in the WOMD test dataset, we compute:

$$\mathrm{GeomMeanProb} = \exp\left(\frac{1}{T - 1} \sum_{t=1}^{T-1} \log P(C_{t+1} \mid C_t)\right), \tag{30}$$

where the transition probability $P(C_{t+1} \mid C_t)$ is estimated from the WOMD training dataset. We then compare the distributions of these geometric mean probabilities using the Mann–Whitney test. The null hypothesis of the Mann–Whitney test assumes that the model-generated and ground truth transition probability distributions come from the same underlying population, that is, there is no significant difference between them. Note that for stochastic models, we generate one trajectory per leader trajectory since the WOMD test set already provides a sufficiently large number of car-following pairs for statistical comparison, as shown in Table 1.

The Mann–Whitney test results in Tables 8a-c indicate that for the proposed MC-CF (stoch) model, we consistently fail to reject the null hypothesis that the generated and ground truth transition probability distributions come from the same population ($p > 0.1$). This suggests that the transition dynamics produced by MC-CF (stoch) are statistically indistinguishable from those observed in the ground truth data across all interaction types.

In the AV-following-HDV case (Table 8a), MC-CF (stoch) shows strong alignment with the ground truth ($p = 0.640$), performing comparably to Van-Arem ($p = 0.693$) and the FVDM variants. In the HDV-following-AV scenario (Table 8b), while baseline models such as IDM, Gipps, and SIDM show statistically significant deviations ($p < 0.001$), MC-CF (stoch) maintains a high p-value ($p = 0.832$).
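The geometric mean probability of Eq. (30), and the rank comparison that underlies the Mann–Whitney test, can be sketched as follows. This is a minimal sketch under stated assumptions: the nested-dictionary layout for $P$, the function names, and the small probability floor `eps` for transitions never seen in training are hypothetical choices made for the example (the source does not say how unseen transitions are handled). In practice a library routine such as `scipy.stats.mannwhitneyu` would typically supply the p-value; whether the authors used it is not stated.

```python
import math

def geom_mean_prob(clusters, P, eps=1e-12):
    """Eq. (30): exponential of the mean log transition probability
    along a cluster sequence C_1, ..., C_T (T - 1 transitions)."""
    logs = [math.log(max(P.get(a, {}).get(b, 0.0), eps))  # eps is an assumed floor
            for a, b in zip(clusters, clusters[1:])]
    return math.exp(sum(logs) / len(logs))

def mann_whitney_u(xs, ys):
    """U statistic for sample xs: number of (x, y) pairs with x > y,
    counting ties as one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)

# Toy transition table (hypothetical numbers).
P = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}}
g = geom_mean_prob([0, 1, 1], P)  # transitions 0->1 (0.5) and 1->1 (0.75)
```

Under the null hypothesis, U is expected to lie near half the number of sample pairs, so values far from that midpoint signal that one set of geometric mean probabilities tends to exceed the other.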
Finally, in the HDV-following-HDV case (Table 8c), MC-CF (stoch) is one of only two models (along with FVDM-Sigmoid) for which the null hypothesis is not rejected (p = 0.199), demonstrating its robustness in capturing heterogeneous human driving behaviors.

Overall, the proposed MC-CF (stoch) model demonstrates a robust ability to reproduce the ground truth transition probability distributions across all three interaction types. While the FVDM-Sigmoid model also produces statistically indistinguishable transition distributions in all cases, it falls short in prediction accuracy, as shown in Section 5.1. Specifically, FVDM-Sigmoid exhibits significantly higher errors in all measures (RMSE(s), RMSE(v), RMSE(a), minADE, and minFDE) compared to MC-CF (stoch). This indicates that MC-CF (stoch) is the only framework evaluated that achieves both statistical validity in state transition dynamics and superior predictive accuracy, effectively balancing behavioral realism with kinematic precision.

FIGURE 1: Open-loop performance comparison of SIDM and MC-CF (stoch) across different interaction types as the number of generated stochastic trajectories (K) increases from 1 to 15. (a), (c), and (e) show SIDM results for the AV-following-HDV, HDV-following-AV, and HDV-following-HDV cases, respectively, while (b), (d), and (f) present the corresponding results for MC-CF (stoch).

TABLE 8: Comparison of model-generated and ground truth geometric mean probabilities using the Mann–Whitney test across three interaction types: (a) AV-following-HDV, (b) HDV-following-AV, (c) HDV-following-HDV.

(a) AV-following-HDV

| Model | Test Statistic | p-Value | Significant | Mean | Median |
|---|---|---|---|---|---|
| Ground Truth | – | – | – | 0.388 | 0.367 |
| IDM | 96189 | 0.019 | True | 0.367 | 0.329 |
| Van-Arem | 86605 | 0.693 | False | 0.390 | 0.348 |
| FVDM-CTH | 85696 | 0.513 | False | 0.396 | 0.354 |
| FVDM-Sigmoid | 86296 | 0.629 | False | 0.393 | 0.355 |
| Gipps | 108647 | 0.000 | True | 0.337 | 0.281 |
| SIDM | 97308 | 0.008 | True | 0.366 | 0.317 |
| MC-CF (det) | 83461 | 0.197 | False | 0.410 | 0.369 |
| MC-CF (stoch) | 89633 | 0.640 | False | 0.382 | 0.361 |

(b) HDV-following-AV

| Model | Test Statistic | p-Value | Significant | Mean | Median |
|---|---|---|---|---|---|
| Ground Truth | – | – | – | 0.367 | 0.338 |
| IDM | 166801 | 0.000 | True | 0.340 | 0.298 |
| Van-Arem | 149486 | 0.770 | False | 0.359 | 0.333 |
| FVDM-CTH | 144624 | 0.519 | False | 0.371 | 0.339 |
| FVDM-Sigmoid | 147478 | 0.925 | False | 0.366 | 0.329 |
| Gipps | 174002 | 0.000 | True | 0.331 | 0.274 |
| SIDM | 178042 | 0.000 | True | 0.323 | 0.269 |
| MC-CF (det) | 134397 | 0.009 | True | 0.388 | 0.366 |
| MC-CF (stoch) | 146871 | 0.832 | False | 0.360 | 0.339 |

(c) HDV-following-HDV

| Model | Test Statistic | p-Value | Significant | Mean | Median |
|---|---|---|---|---|---|
| Ground Truth | – | – | – | 0.211 | 0.179 |
| IDM | 6116751 | 0.000 | True | 0.193 | 0.162 |
| Van-Arem | 5199860 | 0.039 | True | 0.211 | 0.183 |
| FVDM-CTH | 5039080 | 0.000 | True | 0.218 | 0.186 |
| FVDM-Sigmoid | 5246143 | 0.144 | False | 0.213 | 0.182 |
| Gipps | 6568550 | 0.000 | True | 0.181 | 0.153 |
| SIDM | 6440721 | 0.000 | True | 0.186 | 0.154 |
| MC-CF (det) | 4570529 | 0.000 | True | 0.232 | 0.201 |
| MC-CF (stoch) | 5259628 | 0.199 | False | 0.207 | 0.182 |

Note: A p-value below 0.1 indicates a statistically significant difference.

TABLE 9: Trajectory prediction performance for HDV-following-HDV (Zero-Shot Generalization on PHX Dataset). RMSE(s), RMSE(v), and RMSE(a) are one-step prediction metrics; minDTW(s), minDTW(v), minADE, minFDE, and OR are open-loop prediction metrics.

| Model | RMSE(s) | RMSE(v) | RMSE(a) | minDTW(s) | minDTW(v) | minADE | minFDE | OR |
|---|---|---|---|---|---|---|---|---|
| Deterministic | | | | | | | | |
| IDM | 0.0703 | 0.2415 | 2.9318 | 76.1685 | 8.8606 | 7.5292 | 11.2900 | 0.0283* |
| Van-Arem | 0.0696 | 0.1796 | 1.3496 | 66.8863 | 7.1734 | 6.5011 | 10.8847 | 0.0377 |
| FVDM-CTH | 0.0697 | 0.1891 | 1.4581 | 57.9536 | 6.4863 | 5.6531 | 10.2218 | 0.0566 |
| FVDM-Sigmoid | 0.0697 | 0.1891 | 1.4577 | 57.3898 | 6.4440 | 5.6175 | 10.1114 | 0.0755 |
| Gipps | 0.0696 | 0.1660 | 1.1813 | 81.3445 | 9.6482 | 8.1977 | 11.7738 | 0.0283* |
| MC-CF (det) | 0.0693 | 0.1324 | 0.5983* | 60.6580 | 7.3296 | 5.4455 | 10.6478 | 0.1132 |
| Stochastic | | | | | | | | |
| SIDM (1) | 0.0355 | 0.2288 | 2.2997 | 75.5585 | 8.6362 | 7.4993 | 11.3479 | 0.0283* |
| SIDM (3) | - | - | - | 75.4169 | 8.5544 | 7.4790 | 11.2933 | - |
| SIDM (6) | - | - | - | 75.3266 | 8.5231 | 7.4659 | 11.2653 | - |
| SIDM (10) | - | - | - | 75.2699 | 8.5014 | 7.4577 | 11.2502 | - |
| SIDM (15) | - | - | - | 75.2262 | 8.4843 | 7.4531 | 11.2341 | - |
| MC-CF (stoch, 1) | 0.0313* | 0.1096* | 0.8118 | 61.3233 | 7.7300 | 5.5265 | 11.0829 | 0.0943 |
| MC-CF (stoch, 3) | - | - | - | 54.8897 | 6.5929 | 4.9122 | 9.5776 | - |
| MC-CF (stoch, 6) | - | - | - | 49.0836 | 5.8597 | 4.5823 | 8.4322 | - |
| MC-CF (stoch, 10) | - | - | - | 46.1996 | 5.3691 | 4.3422 | 7.8156 | - |
| MC-CF (stoch, 15) | - | - | - | 45.2316* | 4.9973* | 4.2497* | 7.5745* | - |

Note: * indicates the minimum (best) value within each column. In case of ties, multiple entries are marked.

5.3. Zero-Shot Generalization Analysis

In this subsection, we evaluate the models' zero-shot generalization capability. Specifically, we test the models trained on the HDV-following-HDV subset of the WOMD training dataset on 106 HDV-following-HDV car-following pairs from the PHX dataset, as summarized in Table 2, considering both one-step and open-loop performance metrics. We focus on the HDV-following-HDV case because (i) AV behaviors may differ between the two datasets, as noted by Zhang et al. [31], and (ii) the WOMD training dataset contains substantially more HDV-following-HDV samples than the AV-included cases, providing a robust basis for cross-dataset evaluation.

The zero-shot results in Table 9 reveal a pattern consistent with the findings from the WOMD test dataset. Despite being trained exclusively on the WOMD HDV-following-HDV data, the proposed MC-CF models maintain superior predictive performance across the one-step and open-loop accuracy metrics when evaluated on the PHX dataset, although their OR values in Table 9 are higher than those of the physics-based baselines. In particular, MC-CF (det) achieves the lowest RMSE for acceleration in one-step prediction, while MC-CF (stoch) attains the lowest RMSE values for spacing and speed and demonstrates continuous improvement in long-horizon metrics (minDTW(s), minDTW(v), minADE, and minFDE) as the number of stochastic samples increases, with the best results observed at K = 15.
In contrast, all conventional physics-based models, including IDM, Van-Arem, Gipps, both FVDM variants, and SIDM, exhibit significant degradation in both short-term accuracy and long-term trajectory fidelity, consistent with their limited performance on the WOMD test dataset.

These results reinforce the key insight that the proposed MC-CF models, particularly MC-CF (stoch) for trajectory generation, consistently outperform other models across datasets collected in entirely different environments, demonstrating strong cross-domain generalization. This suggests that its stochastic formulation allows the model to capture the inherent behavioral variability of human drivers rather than overfitting to dataset-specific dynamics. Overall, the findings highlight the robustness and transferability of the Markov chain framework, distinguishing it from conventional car-following models commonly used in transportation research.

FIGURE 2: Schematic of the ring road simulation geometry (L = 3,000 m). The periodic boundary condition ensures that x = 3,000 m and x = 0 m represent the same physical location, creating a closed-loop system for evaluating long-term vehicle interactions.

5.4. Microscopic Simulation and Model Scalability

While the previous sections demonstrate the MC-CF model's predictive accuracy on real-world short-horizon trajectories, a robust car-following model must also perform reliably in microscopic simulations where errors can accumulate over time. In this section, we evaluate the performance of the proposed model in a ring road simulation, as illustrated in Figure 2. This geometry represents a closed-loop system with a total length of L = 3,000 m (where the position x = 3,000 m is identical to x = 0 m), allowing us to assess long-term vehicle interactions and shockwave propagation over a 300-second horizon with a time step of ∆t = 0.1 s.
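The periodic boundary condition of the ring road can be sketched as follows. This is an illustration under stated assumptions rather than the simulator used in the paper: the function names, the trapezoidal position update, and the clipping of speed at zero are all hypothetical choices, and spacing is measured between vehicle reference points because the source does not specify vehicle lengths.

```python
RING_LENGTH = 3000.0  # m, ring circumference L
DT = 0.1              # s, simulation time step
HORIZON = 300.0       # s, simulated duration
N_STEPS = round(HORIZON / DT)


def wrap(x, length=RING_LENGTH):
    """Map a position onto [0, length) so that x = length coincides with x = 0."""
    return x % length


def spacing(x_follower, x_leader, length=RING_LENGTH):
    """Forward distance from follower to leader, measured along the ring."""
    return (x_leader - x_follower) % length


def step(x, v, a, dt=DT, length=RING_LENGTH):
    """Advance one vehicle by one time step (assumed kinematic update).

    Speed is clipped at zero so vehicles never reverse; position uses the
    average of the old and new speeds over the step.
    """
    v_new = max(v + a * dt, 0.0)
    x_new = wrap(x + 0.5 * (v + v_new) * dt, length)
    return x_new, v_new


# A leader that has just crossed x = 0 is still ahead of a follower near x = L.
gap = spacing(2990.0, 5.0)
x_next, v_next = step(2999.5, 10.0, 0.0)
```

Computing the spacing with the modulo operation keeps leader-follower pairs consistent when the leader has wrapped around and the follower has not.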
Since the WOMD contains the largest amount of HDV interaction data, we focus exclusively on a 100% HDV simulation environment. We specifically use MC-CF (stoch) for the simulation, leveraging its superior accuracy in open-loop trajectory prediction.

5.4.1. Simulation Setup

To demonstrate how incorporating diverse data and inference strategies systematically improves simulation stability and realism, we conduct an ablation study comparing the baseline physics-based models (IDM and SIDM) against four variants of the proposed MC-CF framework. The components of these variants are summarized in Table 10.

TABLE 10: Ablation study configurations for the evaluated MC-CF models.

| Model Name | WOMD CF Data | Free-flow (Solo) Data | Conservative Inference | Freeway (TGSIM) Data |
|---|---|---|---|---|
| MC-CF (Original) | ✓ | | | |
| MC-CF (Solo) | ✓ | ✓ | | |
| MC-CF (Solo+Conservative) | ✓ | ✓ | ✓ | |
| MC-CF (Solo+Conservative+TGSIM) | ✓ | ✓ | ✓ | ✓ |

TABLE 11: Configuration of the ring road simulation experiments.

| Experiment | Vehicles (N) | v_start (m/s) | Perturbation Profile (Target Vehicle) |
|---|---|---|---|
| Normal Equilibrium | 200 | 5.84 | - |
| Standard Shockwave | 200 | 5.84 | At t = 50 s: −1 m/s² for 5 s, hold speed for 10 s, then 1 m/s² for 5 s |
| Severe Shockwave | 200 | 5.84 | At t = 50 s: −1 m/s² for 10 s, hold speed for 30 s, then 1 m/s² for 10 s |
| High-Speed Shockwave | 40 | 30.00 | Same severe perturbation profile as the Severe Shockwave experiment |

The MC-CF (Original) model is trained purely on WOMD car-following pairs and is identical to the MC-CF (stoch) model evaluated throughout the preceding sections.

The MC-CF (Solo) model is augmented with unconstrained solo driving trajectories (free-flow data) to provide appropriate acceleration distributions for large spacings. To incorporate free-flow behavior, we extracted trajectories of vehicles without leaders, or with spacings exceeding 45 m for WOMD and 150 m for TGSIM.
For these cases, we assigned a ghost leader with ∆v = 0 m/s and spacing d = 45 m for the MC-CF (Solo) and MC-CF (Solo+Conservative) models, and d = 150 m for the MC-CF (Solo+Conservative+TGSIM) model. To minimize the impact of urban noise in the WOMD, free-flow trajectories were sampled only from scenarios without traffic signals, stop signs, or crosswalks, which represented approximately 10% of the total WOMD training set scenarios.

The MC-CF (Solo+Conservative) model incorporates a safety-oriented inference mode. Specifically, it computes the Time-to-Collision (TTC), defined as spacing / |relative velocity|, when the follower is approaching the leader. If TTC < 3.0 s, the model samples randomly from only the bottom 5th percentile of the empirical acceleration distribution. If 3.0 ≤ TTC < 10.0 s, it samples from the bottom 30th percentile. Otherwise, it samples from the full distribution. The rationale behind this conservative mode is that the dataset inevitably includes accelerations associated with lane-changing maneuvers, which may otherwise appear as aggressive car-following behavior. Since these outliers can induce artificial crashes that critically disrupt a closed-loop simulation, the conservative filter acts as a necessary safeguard. It should be noted that even when the conservative filter is applied, the sampled acceleration is still drawn from real-world data rather than being a hard-coded value.

Finally, the MC-CF (Solo+Conservative+TGSIM) model is further augmented with high-speed freeway data from the TGSIM dataset. This addition extends the valid spacing threshold to 150 m (originally 45 m), the relative speed range to [−30, 30] m/s (originally [−10, 10] m/s), and the valid follower speed range to [0, 40] m/s (originally [0, 20] m/s). This model allows us to evaluate how expanding the state space with higher-speed samples impacts overall performance.
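The TTC-based conservative inference mode described above can be sketched as follows. The thresholds (3.0 s and 10.0 s) and the 5% and 30% cut-offs come from the text; the function names, the sign convention for relative speed (leader speed minus follower speed), and the reading of "bottom p-th percentile" as the lowest p% of the sorted empirical samples in a state bin are assumptions made for illustration.

```python
import random


def time_to_collision(spacing, rel_speed):
    """TTC = spacing / |relative velocity| while the follower is closing in.

    rel_speed is v_leader - v_follower (assumed convention); a non-negative
    value means the gap is not shrinking, so TTC is treated as infinite.
    """
    if rel_speed >= 0.0:
        return float("inf")
    return spacing / abs(rel_speed)


def conservative_sample(accels, spacing, rel_speed, rng=random):
    """Draw one acceleration from the empirical samples of a state bin.

    accels holds the observed accelerations (m/s^2) for the current bin.
    Only the lowest 5% (TTC < 3 s) or lowest 30% (3 s <= TTC < 10 s) of the
    samples are eligible; otherwise the full set is used.
    """
    ttc = time_to_collision(spacing, rel_speed)
    if ttc < 3.0:
        percent = 5
    elif ttc < 10.0:
        percent = 30
    else:
        percent = 100
    ordered = sorted(accels)
    # Integer arithmetic avoids floating-point surprises; keep at least one sample.
    k = max(1, (percent * len(ordered)) // 100)
    return rng.choice(ordered[:k])


# Hypothetical bin with 100 evenly spaced accelerations from -5.0 to 4.9 m/s^2.
bin_accels = [i * 0.1 for i in range(-50, 50)]
rng = random.Random(0)
urgent = conservative_sample(bin_accels, spacing=10.0, rel_speed=-5.0, rng=rng)   # TTC = 2 s
caution = conservative_sample(bin_accels, spacing=30.0, rel_speed=-5.0, rng=rng)  # TTC = 6 s
relaxed = conservative_sample(bin_accels, spacing=30.0, rel_speed=0.0, rng=rng)   # not closing
```

Every returned value is one of the observed samples, which mirrors the point made in the text that the conservative filter restricts the sampling range without substituting a hard-coded deceleration.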
Note that the solo driving data from the TGSIM dataset is added only to this specific model.

We designed four distinct simulation experiments to stress-test the models. The first three experiments stay within the speed range of the urban WOMD dataset, while the last experiment is introduced to test the models' capabilities in handling high-speed conditions. The specific configurations, vehicle densities, and perturbation profiles for these scenarios are summarized in Table 11.

5.4.2. Simulation Results

For the stochastic models, 20 independent simulation trials were conducted per scenario to evaluate the mean and standard deviation of collision events. Table 12 summarizes these collision statistics.

TABLE 12: Collision statistics (Mean ± Std) across 20 simulation trials for the four experiments.

| Experiment | IDM | SIDM | MC-CF (Original) | MC-CF (Solo) | MC-CF (Solo+Cons) | MC-CF (Solo+Cons+TGSIM) |
|---|---|---|---|---|---|---|
| Normal Equilibrium | 0.00 ± 0.00 | 0.00 ± 0.00 | 12.05 ± 5.34 | 1.90 ± 1.48 | 0.00 ± 0.00 | 0.25 ± 0.55 |
| Standard Shockwave | 0.00 ± 0.00 | 0.00 ± 0.00 | 15.90 ± 5.30 | 6.00 ± 1.95 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Severe Shockwave | 0.00 ± 0.00 | 0.00 ± 0.00 | 37.05 ± 6.06 | 30.75 ± 4.83 | 8.05 ± 1.85 | 9.35 ± 2.94 |
| High-Speed Shockwave | 0.00 ± 0.00 | 0.00 ± 0.00 | 7.40 ± 3.08 | 11.10 ± 2.81 | 3.20 ± 1.47 | 0.70 ± 1.17 |

The results show that IDM and SIDM experience no crashes during the simulation. However, the MC-CF (Original) model clearly shows its limitations, producing numerous crashes and averaging 37.05 crashes in the Severe Shockwave test. Incorporating the solo driving dataset significantly reduces the number of crashes in the urban scenarios, and adding the conservative sampling effectively eliminates crashes in both the Normal Equilibrium and Standard Shockwave experiments. In the Severe Shockwave experiment, MC-CF (Solo+Conservative) significantly reduces the number of collisions, though an average of 8.05 crashes per simulation remains.
Notably, while adding the TGSIM dataset significantly reduces high-speed crashes (a mean of 0.70 crashes compared to 3.20 without the TGSIM dataset), it slightly degrades performance in the lower-speed, dense urban scenarios compared to the MC-CF (Solo+Conservative) model. This can be attributed to the limited number of samples and the differing data quality in TGSIM compared to the massive WOMD dataset. However, as demonstrated in the high-speed experiment, adding even a small amount of freeway data drastically improves performance in high-speed regimes. If larger amounts of high-fidelity naturalistic freeway data (e.g., Waymo highway datasets) are collected in the future, we expect the model to perform exceptionally and organically self-correct across all speed regimes without such trade-offs.

Figure 3 illustrates the trajectories of the 200 vehicles on a ring road for an arbitrarily selected random seed in the Normal Equilibrium test. We specifically start the simulation with an initial speed of 5.84 m/s, which corresponds to the equilibrium speed obtained from the IDM simulation. As shown in Figures 3a and 3b, the IDM and SIDM maintain a perfectly uniform flow. However, we note that such perfect uniformity is highly unrealistic in real-world traffic consisting of heterogeneous human drivers.

The MC-CF (Original) model struggles with the urban noise embedded in its training data, resulting in 14 crashes in the plotted equilibrium scenario. It is also interesting to observe that although both the IDM and MC-CF (Original) are calibrated/trained on the same dataset, they do not agree on the equilibrium speed. Considering that the MC-CF framework demonstrated significantly higher accuracy in both one-step and open-loop trajectory predictions (Section 5.1), this discrepancy suggests that macroscopic conclusions (e.g., equilibrium speed or capacity) drawn from calibrated parametric models may be misleading.
By adding solo driving data, the MC-CF (Solo) model effectively reduces the number of crashes. More importantly, applying the conservative inference mode (MC-CF (Solo+Conservative) and MC-CF (Solo+Conservative+TGSIM)) eliminates crashes in the plotted run, matching the safety of the parametric models while simultaneously exhibiting realistic and stochastic speed fluctuations.

To empirically validate the structural realism of these stochastic speed fluctuations, we compare them with real-world human driving data from the Nagoya Dome ring road experiment [41]. In this experiment, HDVs traveled continuously on a circular track with a radius of 50 m (circumference approximately 314 m). As illustrated by the ground truth trajectories in Figure 4, naturalistic driving inherently exhibits continuous stochastic speed fluctuations and forward-moving shockwaves. This empirical observation directly contradicts the perfectly homogeneous flow predicted by the IDM and SIDM (Figures 3a-b). In contrast, the augmented MC-CF models (Figures 3e-f) successfully reproduce this phenomenon, sustaining stable collision-free flow while organically generating the speed variations observed in the real-world experiment.

The Standard Shockwave test evaluates the response of following vehicles in a platoon to a small perturbation introduced by the leader. As shown in Figures 5a-b, the trajectories produced by the IDM and SIDM exhibit backward shockwave propagation, yet the system fails to recover to the equilibrium speed within the 300-second horizon. The MC-CF (Original) model (Figure 5c) shows that vehicles recover their equilibrium speed in a shorter period of time. However, it fails to efficiently reduce the gap to the leader once the perturbation ends, leading to the formation of an unrealistic platoon.
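The perturbation imposed on the target vehicle in the shockwave experiments (Table 11) is a piecewise-constant acceleration profile, which can be sketched as below. The function name, its parameters, and the convention of returning `None` outside the perturbation window (meaning the vehicle follows its car-following model) are hypothetical; only the timings and magnitudes are taken from Table 11.

```python
def perturbation_accel(t, t_start=50.0, decel=-1.0, decel_time=5.0,
                       hold_time=10.0, accel=1.0, accel_time=5.0):
    """Commanded acceleration (m/s^2) of the target vehicle at time t (s).

    Defaults reproduce the Standard Shockwave profile: decelerate at
    -1 m/s^2 for 5 s, hold speed for 10 s, then accelerate at 1 m/s^2 for 5 s.
    Returns None outside the window, where the vehicle is not overridden.
    """
    tau = t - t_start
    if tau < 0.0:
        return None
    if tau < decel_time:
        return decel
    if tau < decel_time + hold_time:
        return 0.0
    if tau < decel_time + hold_time + accel_time:
        return accel
    return None


# Severe Shockwave profile from Table 11: 10 s braking, 30 s hold, 10 s recovery.
SEVERE = dict(decel_time=10.0, hold_time=30.0, accel_time=10.0)
severe_hold = perturbation_accel(75.0, **SEVERE)
```

In a simulation loop, a non-`None` return value would replace the sampled acceleration of the target vehicle for that time step, while all other vehicles continue to use the car-following model.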
FIGURE 3: Trajectory plots for the Normal Equilibrium test (N = 200, v_start = 5.84 m/s). Panels: (a) IDM, (b) SIDM, (c) MC-CF (Original), (d) MC-CF (Solo), (e) MC-CF (Solo+Conservative), (f) MC-CF (Solo+Conservative+TGSIM).

FIGURE 4: Ground truth trajectories from the Nagoya Dome ring road experiment (Session 1520, N = 20). The empirical data exhibits naturalistic speed fluctuations and forward-moving shockwaves, contrasting with the rigid uniformity predicted by traditional parametric models.

FIGURE 5: Trajectory plots for the Standard Shockwave test. Panels: (a) IDM, (b) SIDM, (c) MC-CF (Original), (d) MC-CF (Solo), (e) MC-CF (Solo+Conservative), (f) MC-CF (Solo+Conservative+TGSIM). MC-CF models with the conservative mode (e-f) eliminate collisions, while solo data (d-f) ensures proper gap reduction.

The MC-CF (Solo) model (Figure 5d) still exhibits collisions (7 crashes in the plotted trial) but successfully resolves the unrealistic platooning issue observed in the Original model. The inclusion of solo data enables followers to correctly recognize large gaps as free-flow conditions, allowing them to accelerate smoothly, close the gap, and return the system to equilibrium. When the conservative mode is applied alongside the solo data, collisions drop to zero (Figures 5e and 5f). This demonstrates the model's capability to maintain safety as long as the simulation remains within regions of the state space that are sufficiently represented in the training data.

In the Severe Shockwave experiment (Figure 6), the IDM and SIDM produce mathematically perfect backward shockwaves that lack naturalistic variation. The MC-CF (Original) model (Figure 6c) fails entirely, resulting in heavy pile-ups (31 crashes in the plotted example) and failing once again to reduce the forward gap to the leader. The MC-CF (Solo) model (Figure 6d) solves the gap recovery issue but continues to suffer from major pile-ups.
FIGURE 6: Trajectory plots for the Severe Shockwave test. Panels: (a) IDM, (b) SIDM, (c) MC-CF (Original), (d) MC-CF (Solo), (e) MC-CF (Solo+Conservative), (f) MC-CF (Solo+Conservative+TGSIM). IDM (a) and SIDM (b) show rigid shockwaves, while the MC-CF models with the conservative mode (e-f) display realistic backward and forward wave propagation.

In contrast, both the MC-CF (Solo+Conservative) and MC-CF (Solo+Conservative+TGSIM) models (Figures 6e-f) dramatically reduce the number of crashes (5 and 6 crashes in their respective examples) and reproduce highly realistic traffic dynamics. Specifically, they generate a backward-propagating shockwave initiated by the controlled vehicle, followed by a forward-moving shockwave as vehicles dynamically accelerate to close the gap once the leader accelerates.

A key advantage of the MC-CF framework over deterministic models is its ability to capture the inherent stochasticity of human driving. Figure 7 displays the results of four different random seeds using the MC-CF (Solo+Conservative) model under the exact same severe shockwave perturbation. Visually, the trajectory plots demonstrate that the propagation speed, recovery time, and intensity of the shockwave vary noticeably between trials. This confirms that the model naturally replicates the stochasticity observed in real-world congested traffic, where identical leader behavior can trigger different macroscopic traffic outcomes depending on the probabilistic responses of the following vehicles.

FIGURE 7: Four random seed evaluations of the MC-CF (Solo+Conservative) model under the Severe Shockwave test. Panels: (a) Trial 1, (b) Trial 2, (c) Trial 3, (d) Trial 4. Despite identical leader perturbations, shockwave propagation exhibits naturalistic, stochastic variability.

Finally, we tested the framework's scalability to high-speed environments through the High-Speed Shockwave test.
Because the WOMD dataset lacks high-speed freeway interactions, Figure 8a shows that the MC-CF (Solo+Conservative) model exhibits unrealistic decelerations and phantom braking, resulting in collisions (3 crashes in the plotted example). By integrating TGSIM data, the MC-CF (Solo+Conservative+TGSIM) model successfully populates the high-speed state space. As illustrated in Figure 8b, this addition allows vehicles to maintain high speeds reliably, form realistic platoons, and smoothly respond to the shockwave, dropping the crashes to 0 in the visualized example. This highlights the scalability of the proposed framework, as its performance naturally improves with the incorporation of new and diverse data.

We note that the current variants of the MC-CF model are not perfect, as they still suffer from occasional crashes in the Severe Shockwave and High-Speed Shockwave scenarios. However, the presented framework achieves these realistic traffic dynamics without relying on predefined physical equations, instead sampling accelerations purely from empirical data distributions. As AVs collect increasingly vast, high-fidelity datasets across diverse operational design domains including freeways, the proposed model can organically ingest these new samples into its state bins without requiring structural recalibration. This inherent scalability ensures that the model will continuously improve, generating increasingly robust and realistic trajectories across both urban and freeway scenarios.

FIGURE 8: Trajectory plots for the High-Speed Shockwave test. (a) MC-CF (Solo+Conservative) exhibits phantom braking due to a lack of high-speed training data, while (b) MC-CF (Solo+Conservative+TGSIM) successfully maintains high speeds and forms realistic platoons.

6. CONCLUSION

The enduring study of car-following behavior remains central to transportation engineering, informing critical applications from microscopic simulation to macroscopic traffic flow analysis. The emergence of extensive, high-fidelity datasets like the WOMD has the potential to fundamentally reshape our understanding of these dynamics. Building on this foundation, this paper introduces a new modeling category called the empirical probabilistic paradigm. Within this class of models, we propose the MC-CF model, which empirically derives state transitions and acceleration distributions directly from real-world trajectory data, thereby obviating the need for explicit calibration or restrictive parametric assumptions.

Our comprehensive evaluation underscores the superior performance of the proposed MC-CF model. The deterministic variant, MC-CF (det), consistently achieved the lowest RMSE in next-step follower acceleration prediction, significantly outperforming traditional physics-based models (IDM, Van-Arem, FVDM variants, Gipps, and SIDM). This immediate predictive accuracy was robustly demonstrated on both the WOMD test dataset and on the entirely unseen PHX dataset, highlighting its exceptional zero-shot generalization capabilities.

Furthermore, the stochastic variant, MC-CF (stoch), demonstrated superior performance in open-loop trajectory prediction. When given only initial conditions, MC-CF (stoch) surpassed all baseline models across trajectory-level metrics (minDTW, minADE, minFDE), while keeping the overlapping rate below 6%. Notably, transition probability analysis showed that MC-CF (stoch) successfully reproduces the ground truth transition probability distributions across all examined interaction types (AV-following-HDV, HDV-following-AV, and HDV-following-HDV). It also achieved the highest performance in zero-shot generalization for trajectory-level prediction.
These results highlight that MC-CF (stoch) not only delivers accurate trajectory predictions but also effectively captures the underlying probabilistic structure of real-world driving behavior.

Beyond trajectory prediction, the proposed framework's inherent scalability allows it to evolve into a robust microscopic simulation tool. As demonstrated through the ring road ablation study, the straightforward aggregation of diverse data, such as unconstrained free-flow trajectories and high-speed freeway samples from TGSIM, alongside a safety-oriented conservative inference strategy, successfully mitigated the limitations of the baseline model. The enhanced MC-CF framework significantly reduced collision rates and accurately reproduced naturalistic and stochastic shockwave propagation across both urban and high-speed regimes. This future-proof structure ensures that as larger, high-fidelity datasets covering diverse operational design domains become available, the model can continuously self-correct and improve.

Future work will leverage our model's robust representation of driving behavior for probabilistic stability analysis. This is a critical next step, given that most existing stability studies rely heavily on calibrated car-following models that we have demonstrated to deviate significantly from empirical ground truth.

DECLARATION OF CONFLICTING INTERESTS

All other authors declare no potential conflicts of interest with respect to the research, authorship, and publication of this article.

ACKNOWLEDGMENTS

This work was supported by the National Science Foundation under Grant No. 2047937.

DECLARATION OF GENERATIVE AI AND AI-ASSISTED TECHNOLOGIES IN THE WRITING PROCESS

The authors acknowledge the use of AI-assisted tools (such as ChatGPT) for language editing and grammar refinement during manuscript preparation. No AI tool was used for generating novel content, data analysis, or drawing conclusions.
All responsibility for the accuracy and integrity of the manuscript remains with the authors.

AUTHOR CONTRIBUTIONS

Sungyong Chung: Conceptualization, data curation, formal analysis, investigation, methodology, software, visualization, Writing – original draft. Yanlin Zhang: Conceptualization, methodology, validation, Writing – review & editing. Nachuan Li: Validation, visualization, Writing – review & editing. Dana Monzer: Validation, Writing – review & editing. Alireza Talebpour: Conceptualization, funding acquisition, methodology, project administration, supervision, validation, Writing – review & editing.

REFERENCES

[1] Chen, K., R. Ge, H. Qiu, R. Al-Rfou, C. R. Qi, X. Zhou, Z. Yang, S. Ettinger, P. Sun, Z. Leng, M. Mustafa, I. Bogun, W. Wang, M. Tan, and D. Anguelov, WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2024.
[2] Ettinger, S., S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, et al., Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 9710–9719.
[3] Houston, J., G. Zuidhof, L. Bergamini, Y. Ye, L. Chen, A. Jain, S. Omari, V. Iglovikov, and P. Ondruska, One thousand and one hours: Self-driving motion prediction dataset. In Conference on Robot Learning, PMLR, 2021, pp. 409–418.
[4] Chang, M.-F., J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, et al., Argoverse: 3d tracking and forecasting with rich maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8748–8757.
[5] Gipps, P. G., A behavioural car-following model for computer simulation. Transportation Research Part B: Methodological, Vol. 15, No. 2, 1981, pp. 105–111.
[6] Treiber, M., A. Hennecke, and D. Helbing, Congested traffic states in empirical observations and microscopic simulations. Physical Review E, Vol. 62, No. 2, 2000, p. 1805.
[7] Treiber, M. and A. Kesting, The intelligent driver model with stochasticity-new insights into traffic flow oscillations. Transportation Research Procedia, Vol. 23, 2017, pp. 174–187.
[8] Papathanasopoulou, V. and C. Antoniou, Towards data-driven car-following models. Transportation Research Part C: Emerging Technologies, Vol. 55, 2015, pp. 496–509.
[9] He, Z., L. Zheng, and W. Guan, A simple nonparametric car-following model driven by field data. Transportation Research Part B: Methodological, Vol. 80, 2015, pp. 185–201.
[10] Panwai, S. and H. Dia, Neural agent car-following models. IEEE Transactions on Intelligent Transportation Systems, Vol. 8, No. 1, 2007, pp. 60–70.
[11] Wang, X., R. Jiang, L. Li, Y. Lin, X. Zheng, and F.-Y. Wang, Capturing car-following behaviors by deep learning. IEEE Transactions on Intelligent Transportation Systems, Vol. 19, No. 3, 2017, pp. 910–920.
[12] Khodayari, A., A. Ghaffari, R. Kazemi, and R. Braunstingl, A modified car-following model based on a neural network model of the human driver effects. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, Vol. 42, No. 6, 2012, pp. 1440–1449.
[13] Ma, L. and S. Qu, A sequence to sequence learning based car-following model for multi-step predictions considering reaction delay. Transportation Research Part C: Emerging Technologies, Vol. 120, 2020, p. 102785.
[14] Sutskever, I., O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, Vol. 27, 2014.
[15] Li, Z., H. Meng, C. Ma, K. Ma, and X. Li, Assessing Markov Property in Driving Behaviors: Insights from Statistical Tests. arXiv preprint arXiv:2501.10625, 2025.
[16] Higgs, B., M. Abbas, and A. Medina, Analysis of the Wiedemann car following model over different speeds using naturalistic data. In Procedia of RSS Conference, 2011, Vol. 1, pp. 1–22.
[17] Krauss, S., Microscopic modeling of traffic flow: investigation of collision free vehicle dynamics, 1998.
[18] Saifuzzaman, M. and Z. Zheng, Incorporating human-factors in car-following models: A review of recent developments and research needs. Transportation Research Part C: Emerging Technologies, Vol. 48, 2014, pp. 379–403.
[19] Albeaik, S., A. Bayen, M. T. Chiri, X. Gong, A. Hayat, N. Kardous, A. Keimer, S. T. McQuade, B. Piccoli, and Y. You, Limitations and Improvements of the Intelligent Driver Model (IDM). SIAM Journal on Applied Dynamical Systems, Vol. 21, No. 3, 2022, pp. 1862–1892.
[20] Zheng, F., C. Liu, X. Liu, S. E. Jabari, and L. Lu, Analyzing the impact of automated vehicles on uncertainty and stability of the mixed traffic flow. Transportation Research Part C: Emerging Technologies, Vol. 112, 2020, pp. 203–219.
[21] Jiang, R., Q. Wu, and Z. Zhu, Full velocity difference model for a car-following theory. Physical Review E, Vol. 64, No. 1, 2001, p. 017101.
[22] Helbing, D. and B. Tilch, Generalized force model of traffic dynamics. Physical Review E, Vol. 58, No. 1, 1998, p. 133.
[23] Punzo, V., Z. Zheng, and M. Montanino, About calibration of car-following dynamics of automated and human-driven vehicles: Methodology, guidelines and codes. Transportation Research Part C: Emerging Technologies, Vol. 128, 2021, p. 103165.
[24] Van Arem, B., C. J. Van Driel, and R. Visser, The impact of cooperative adaptive cruise control on traffic-flow characteristics. IEEE Transactions on Intelligent Transportation Systems, Vol. 7, No. 4, 2006, pp. 429–436.
[25] Ames, A. D., X. Xu, J. W. Grizzle, and P. Tabuada, Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control, Vol. 62, No. 8, 2016, pp. 3861–3876.
[26] Zaky, A. B. and W. Gomaa, Car following regime taxonomy based on Markov switching. In 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), IEEE, 2014, pp. 1329–1334.
[27] Zou, Y., T. Zhu, Y. Xie, Y. Zhang, and Y. Zhang, Multivariate analysis of car-following behavior data using a coupled hidden Markov model. Transportation Research Part C: Emerging Technologies, Vol. 144, 2022, p. 103914.
[28] Yao, X., S. C. Calvert, and S. P. Hoogendoorn, Identification of driving heterogeneity using action-chains. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2023, pp. 6001–6006.
[29] Zhang, C., C. Wu, and L. Sun, Markov Regime-Switching Intelligent Driver Model for Interpretable Car-Following Behavior. arXiv preprint arXiv:2506.14762, 2025.
[30] Punzo, V., D. J. Formisano, and V. Torrieri, Nonstationary Kalman Filter for Estimation of Accurate and Consistent Car-Following Data. Transportation Research Record, Vol. 1934, No. 1, 2005, pp. 2–12.
[31] Zhang, Y., S. Chung, N. Li, D. Monzer, H. S. Mahmassani, S. H. Hamdar, and A. Talebpour, Can the Waymo Open Motion Dataset Support Realistic Behavioral Modeling? A Validation Study with Naturalistic Trajectories. arXiv preprint arXiv:2509.03515, 2025.
[32] Wu, W., X. Feng, Z. Gao, and Y. Kan, Smart: Scalable multi-agent real-time motion generation via next-token prediction. In Advances in Neural Information Processing Systems, 2024, Vol. 37, pp. 114048–114071.
[33] U.S. Department of Transportation Federal Highway Administration, Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data. http://doi.org/10.21949/1504477, 2016, [Dataset]. Provided by ITS DataHub through Data.transportation.gov.
[34] Chung, S., A. Talebpour, and S. H. Hamdar, Characterizing Lane-Changing Behavior in Mixed Traffic. arXiv preprint arXiv:2512.07219, 2025.
[35] Ammourah, R., P. Beigi, B. Fan, S. H. Hamdar, J. Hourdos, C.-C. Hsiao, R. James, M. Khajeh-Hosseini, H. S. Mahmassani, D. Monzer, et al., Introduction to the third generation simulation dataset: Data collection and trajectory extraction. Transportation Research Record, Vol. 2679, No. 1, 2025, pp. 1768–1784.
[36] Talebpour, A., H. S. Mahmassani, and S. H. Hamdar, Third generation simulation data (tgsim): A closer look at the impacts of automated driving systems on human behavior, 2024.
[37] Freedman, D. and P. Diaconis, On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, Vol. 57, No. 4, 1981, pp. 453–476.
[38] Masserano, L., A. F. Ansari, B. Han, X. Zhang, C. Faloutsos, M. W. Mahoney, A. G. Wilson, et al., Enhancing foundation models for time series forecasting via Wavelet-based tokenization. arXiv preprint arXiv:2412.05244, 2024.
[39] Storn, R. and K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, Vol. 11, No. 4, 1997, pp. 341–359.
[40] Zhang, Y. and A. Talebpour, Characterizing human–automated vehicle interactions: An investigation into car-following behavior. Transportation Research Record, Vol. 2678, No. 5, 2024, pp. 812–826.
[41] Tadaki, S.-i., M. Kikuchi, M. Fukui, A. Nakayama, K. Nishinari, A. Shibata, Y. Sugiyama, T. Yosida, and S. Yukawa, Phase transition in traffic jam experiment on a circuit. New Journal of Physics, Vol. 15, No. 10, 2013, p. 103034.