Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data
A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle’s decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
💡 Research Summary
The paper addresses a fundamental challenge for autonomous driving: how to learn a driving policy that both mimics human behavior on a per‑vehicle level and reproduces realistic traffic flow at the network level, when only partial data are available. Conventional supervised or imitation‑learning pipelines rely on dense, high‑resolution on‑board sensor data from many instrumented vehicles, which is expensive to collect and often lacks contextual information about surrounding traffic. Conversely, reinforcement‑learning (RL) approaches typically train policies in simulators that are calibrated to match macroscopic traffic statistics (e.g., average speed, flow, density) but have no direct link to individual driver actions, leading to policies that may be unrealistic at the microscopic level.
To bridge this “micro‑macro” gap, the authors propose a two‑stage learning framework that jointly leverages (i) a small set of instrumented vehicles providing microscopic observations and actions, and (ii) roadside infrastructure delivering aggregate traffic descriptors. The goal is to reconstruct the unobserved states of the majority of vehicles and to learn a single shared policy πθ that is (a) microscopically consistent with the observed actions and (b) macroscopically aligned with target traffic statistics when deployed fleet‑wide.
Stage I – Hidden‑State Generator.
A neural generator Gϕ receives a partial snapshot s_obs (states of the observed vehicles) together with a macroscopic feature vector Ψ(S) (e.g., fleet‑average speed, mean spacing, min/max spacing, speed bounds). It outputs estimates of the hidden vehicles’ states ŝ_hid, forming a completed initial scene Ŝ₀ = {s_obs, ŝ_hid}. The generator loss combines two terms:
- Macro alignment λ_macro·d(Ψ(Ŝ₀), Ψ(S)), forcing the reconstructed scene to reproduce the known aggregate statistics.
- Reconstruction consistency λ_rec·d(ŝ_{hid,t+1}, s^{πθ}_{hid,t+1}), which ties the generator’s predictions to the hidden states that would result if the current policy πθ were applied. This term can be omitted if the generator is trained separately.
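To make Stage I concrete, here is a minimal numpy sketch of the macro‑alignment term of the generator loss. The feature map Ψ (built from a toy (position, speed) state encoding) and the squared‑L2 choice of d are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def psi(states):
    """Illustrative macroscopic feature vector Ψ(S): fleet-average speed,
    mean/min/max spacing, and speed bounds. Each row of `states` is
    a (position, speed) pair -- a toy encoding, not the paper's."""
    pos, spd = states[:, 0], states[:, 1]
    spacing = np.diff(np.sort(pos))          # gaps between consecutive vehicles
    return np.array([spd.mean(), spacing.mean(),
                     spacing.min(), spacing.max(),
                     spd.min(), spd.max()])

def generator_macro_loss(s_obs, s_hid_hat, psi_target, lam_macro=1.0):
    """Macro-alignment term λ_macro · d(Ψ(Ŝ0), Ψ(S)), with d = squared L2."""
    completed = np.vstack([s_obs, s_hid_hat])   # completed scene Ŝ0 = {s_obs, ŝ_hid}
    return lam_macro * float(np.sum((psi(completed) - psi_target) ** 2))

# Toy scene: 3 observed vehicles plus 2 hidden ones proposed by a generator Gϕ.
s_obs = np.array([[0.0, 10.0], [30.0, 12.0], [60.0, 11.0]])   # (position, speed)
s_hid_hat = np.array([[15.0, 11.0], [45.0, 11.5]])
psi_target = psi(np.vstack([s_obs, s_hid_hat]))               # pretend target stats
print(generator_macro_loss(s_obs, s_hid_hat, psi_target))     # → 0.0 at a perfect match
```

Any differentiable distance d would work here; the loss is zero exactly when the completed scene reproduces the target aggregate statistics, and grows as the reconstruction drifts from them.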
Stage II – Policy Learning with Macro‑Micro Scores.
Episodes start from the generator‑completed state Ŝ₀ and roll out the dynamics under πθ. Two trajectory‑level scores are computed:
- Microscopic score r_micro(τ;θ) = −Σ_{i∈I_obs} Σ_t d(πθ(O(S_t, i)), u_{it}), penalizing deviations between the policy’s actions for observed vehicles and the ground‑truth actions.
- Macroscopic score r_macro(τ;θ) = −Σ_t D_macro(Ψ(S^{πθ}_t), Ψ(S)), penalizing differences between the simulated aggregate descriptors and the target descriptors over the whole rollout.
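The two scores above can be sketched in a few lines of numpy. The scalar observation encoding, the linear toy policy, and the squared‑error choices for d and D_macro are assumptions for illustration only:

```python
import numpy as np

def r_micro(policy, obs_seq, actions_seq):
    """Microscopic score: -Σ_{i∈I_obs} Σ_t d(πθ(o_it), u_it), d = squared error."""
    return -sum((policy(o) - u) ** 2
                for obs_t, acts_t in zip(obs_seq, actions_seq)
                for o, u in zip(obs_t, acts_t))

def r_macro(psi_rollout, psi_target):
    """Macroscopic score: -Σ_t D_macro(Ψ(S_t), Ψ(S)), D_macro = squared L2."""
    return -sum(float(np.sum((p - psi_target) ** 2)) for p in psi_rollout)

# Hypothetical linear policy acting on scalar observations.
policy = lambda o: 0.5 * o
obs_seq = [[1.0, 2.0], [3.0]]        # observations of observed vehicles, per step
actions_seq = [[0.5, 1.0], [2.0]]    # their ground-truth actions
print(r_micro(policy, obs_seq, actions_seq))   # → -0.25

psi_rollout = [np.array([10.0, 5.0]), np.array([11.0, 5.0])]  # Ψ(S_t) along a rollout
psi_target = np.array([10.0, 5.0])                            # target Ψ(S)
print(r_macro(psi_rollout, psi_target))        # → -1.0
```

Both scores are at most zero: a policy that exactly matches the logged actions of observed vehicles and reproduces the target aggregates at every step achieves r_micro = r_macro = 0.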
The overall objective is J(θ) = E_{τ∼(πθ,Gϕ)}[r_micro(τ;θ) + λ·r_macro(τ;θ)], maximized over the policy parameters θ, where λ trades off microscopic fidelity against macroscopic alignment and the expectation is taken over rollouts initialized by the generator Gϕ.
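The summary does not specify how the combined objective is optimized. Purely as an illustration, a gradient‑free (zeroth‑order) ascent on a score of the form r_micro + λ·r_macro could look like the following sketch; the random‑search optimizer, the value of λ, and the toy quadratic rollout are assumptions, not the paper's method:

```python
import numpy as np

def J(theta, rollouts, lam=1.0):
    """Monte-Carlo estimate of J(θ): mean of r_micro + λ·r_macro over
    sampled rollouts. `rollouts(θ)` returns (r_micro, r_macro) pairs."""
    return float(np.mean([r_mic + lam * r_mac for r_mic, r_mac in rollouts(theta)]))

def random_search_step(theta, rollouts, sigma=0.1, lr=0.05, n=8, seed=0):
    """One zeroth-order ascent step on J: probe symmetric perturbations of θ
    and move along the directions that raise the combined score."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n):
        eps = rng.normal(size=theta.shape)
        grad += (J(theta + sigma * eps, rollouts) -
                 J(theta - sigma * eps, rollouts)) / (2 * sigma) * eps
    return theta + lr * grad / n

# Toy check on a quadratic surrogate where both scores peak at θ = 1.
toy_rollouts = lambda th: [(-(th[0] - 1.0) ** 2, -(th[0] - 1.0) ** 2)]
theta = np.array([0.0])
for _ in range(200):
    theta = random_search_step(theta, toy_rollouts)
```

In practice one would likely use a policy‑gradient or differentiable‑simulation estimator instead; the zeroth‑order update is shown only because it makes the role of the combined score explicit.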