Quantifying Fidelity: A Decisive Feature Approach to Comparing Synthetic and Real Imagery


Virtual testing using synthetic data has become a cornerstone of autonomous vehicle (AV) safety assurance. Despite progress in improving visual realism through advanced simulators and generative AI, recent studies reveal that pixel-level fidelity alone does not ensure reliable transfer from simulation to the real world. What truly matters is whether the system under test (SUT) bases its decisions on consistent decision evidence in both real and simulated environments, not just whether images “look real” to humans. To this end, this paper proposes a behavior-grounded fidelity measure by introducing Decisive Feature Fidelity (DFF), a new SUT-specific metric that extends the existing fidelity spectrum to capture mechanism parity, that is, agreement in the model-specific decisive evidence that drives the SUT’s decisions across domains. DFF leverages explainable-AI methods to identify and compare the decisive features driving the SUT’s outputs for matched real-synthetic pairs. We further propose estimators based on counterfactual explanations, along with a DFF-guided calibration scheme to enhance simulator fidelity. Experiments on 2,126 matched KITTI-VirtualKITTI2 pairs demonstrate that DFF reveals discrepancies overlooked by conventional output-value fidelity. Furthermore, results show that DFF-guided calibration improves decisive-feature and input-level fidelity without sacrificing output-value fidelity across diverse SUTs.


💡 Research Summary

The paper tackles a fundamental problem in autonomous‑vehicle (AV) safety validation: the gap between visual realism of synthetic data and the actual decision‑making behavior of the system under test (SUT). While modern simulators and generative models can produce images that look indistinguishable from real camera feeds, recent studies have shown that high pixel‑level fidelity does not guarantee that a perception or planning module will rely on the same evidence in simulation as it does in the real world. To address this, the authors introduce Decisive Feature Fidelity (DFF), a SUT‑specific metric that quantifies “mechanism parity” – the agreement of the model‑specific decisive evidence that drives its outputs across domains.

Methodology
DFF is built on two pillars: (1) extraction of salient features using explainable‑AI (XAI) techniques such as Grad‑CAM, SHAP, or LIME, and (2) measurement of each feature’s causal contribution via counterfactual explanations. For a given input image, the XAI method produces a heatmap indicating which pixels or regions the SUT attends to. The authors then systematically mask or perturb each highlighted region, observe the change Δy in the SUT’s output, and compute a normalized contribution weight w_i = |Δy_i| / Σ|Δy|. The resulting vector of weights constitutes the “decisive‑feature vector” for that image.
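The masking-and-perturbation step above can be sketched as a simple occlusion loop. This is a minimal illustration, not the paper's implementation: the `model`, `regions`, and `baseline` arguments are hypothetical stand-ins for the SUT, the XAI-highlighted regions, and the perturbation value.

```python
import numpy as np

def decisive_feature_weights(model, image, regions, baseline=0.0):
    """Occlusion-style sketch of the contribution weights described above.

    `model` maps an image to a scalar output; `regions` is a list of
    (row_slice, col_slice) pairs marking XAI-highlighted areas. Each region
    is occluded in turn, the output change Δy_i is recorded, and the weights
    w_i = |Δy_i| / Σ_j |Δy_j| are returned as a normalized vector.
    """
    y = model(image)
    deltas = []
    for rows, cols in regions:
        perturbed = image.copy()
        perturbed[rows, cols] = baseline              # mask one region
        deltas.append(abs(model(perturbed) - y))      # |Δy_i|
    deltas = np.asarray(deltas, dtype=float)
    total = deltas.sum()
    # Fall back to uniform weights if no region affects the output at all.
    return deltas / total if total > 0 else np.full(len(deltas), 1.0 / len(deltas))

# Toy example: a "model" that only reads the top-left quadrant of a 4x4 image.
img = np.ones((4, 4))
model = lambda x: x[:2, :2].sum()
regions = [(slice(0, 2), slice(0, 2)), (slice(2, 4), slice(2, 4))]
w = decisive_feature_weights(model, img, regions)
# Only the first region changes the output, so w is [1.0, 0.0].
```

In practice the perturbation would be a counterfactual edit rather than a flat mask, but the normalization step is the same.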

To compare a real‑synthetic pair, the decisive‑feature vectors are aligned and scored with a similarity measure (e.g., cosine similarity or the Earth Mover’s, i.e. Wasserstein, distance). The final DFF score ranges from 0 (no overlap) to 1 (perfect overlap). This score captures whether the SUT is looking at the same physical cues—edges, textures, shadows, object parts—when making a decision, regardless of how photorealistic the synthetic image appears.
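Using cosine similarity as the comparison, the score can be sketched as follows. Since the weight vectors are non-negative and normalized, the cosine lies in [0, 1], matching the DFF range described above; the function name and arguments here are illustrative, not the paper's API.

```python
import numpy as np

def dff_cosine(w_real, w_syn, eps=1e-12):
    """DFF score as cosine similarity between two decisive-feature vectors.

    Both inputs are non-negative weight vectors over the same aligned
    feature regions, so the result is 0 for disjoint evidence and 1 for
    identical evidence.
    """
    w_real = np.asarray(w_real, dtype=float)
    w_syn = np.asarray(w_syn, dtype=float)
    num = float(np.dot(w_real, w_syn))
    den = float(np.linalg.norm(w_real) * np.linalg.norm(w_syn)) + eps
    return num / den

# Identical evidence scores ~1; completely disjoint evidence scores 0.
same = dff_cosine([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
disjoint = dff_cosine([1.0, 0.0], [0.0, 1.0])
```

A transport-based distance such as the Wasserstein distance would additionally account for how far apart mismatched regions are, at the cost of needing a ground metric over regions.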

DFF‑Guided Calibration
Recognizing that low DFF scores indicate a mismatch in evidence, the authors propose a calibration loop that adjusts simulator parameters (lighting, material reflectance, weather, physics) to maximize DFF while preserving traditional output‑level performance. The loss function combines a DFF term (1 – similarity) with a weighted output loss (e.g., detection loss, regression loss): L_total = L_DFF + λ·L_output. An Adam optimizer iteratively updates simulator parameters, effectively “teaching” the simulator to generate images that not only look real but also provide the same decision evidence as the real world.
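The calibration loop can be illustrated with a toy gradient-descent sketch. Everything here is a stand-in: `sim` is a hypothetical differentiable renderer, the two loss callables play the roles of L_DFF and L_output, and plain finite-difference gradient descent substitutes for the paper's Adam optimizer.

```python
import numpy as np

def calibrate(sim, loss_dff, loss_output, theta, lam=0.5, lr=0.05, steps=200):
    """Toy loop minimizing L_total = L_DFF + λ·L_output over sim parameters.

    `sim(theta)` renders a synthetic sample from simulator parameters
    (lighting, reflectance, weather, ...); the loss terms compare it
    against fixed real-world references. Gradients are approximated by
    central finite differences for self-containedness.
    """
    theta = np.asarray(theta, dtype=float)

    def total(t):
        x = sim(t)
        return loss_dff(x) + lam * loss_output(x)

    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):          # central difference per parameter
            e = np.zeros_like(theta)
            e[i] = 1e-4
            grad[i] = (total(theta + e) - total(theta - e)) / 2e-4
        theta = theta - lr * grad            # plain gradient step
    return theta

# Toy example: one "lighting" parameter; both losses pull it toward 1.0.
sim = lambda t: t                            # identity "renderer"
theta_star = calibrate(sim,
                       loss_dff=lambda x: (x[0] - 1.0) ** 2,
                       loss_output=lambda x: (x[0] - 1.0) ** 2,
                       theta=[0.0])
# theta_star converges near 1.0, the value that minimizes both terms.
```

The λ weight trades off evidence alignment against output-level performance; the paper's finding that output metrics stay essentially unchanged suggests the two objectives are largely compatible.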

Experimental Setup
The study uses 2,126 matched pairs from the KITTI real‑world dataset and its synthetic counterpart VirtualKITTI2. Four distinct SUTs are evaluated: (a) YOLOv5 for 2‑D object detection, (b) PointPillars for 3‑D lidar‑based detection, (c) DeepLiDAR for depth estimation, and (d) an LSTM‑based behavior‑prediction module. Conventional fidelity metrics (PSNR, SSIM, LPIPS) and output‑level metrics (mAP, RMSE) are reported alongside DFF.

Results
Initial DFF scores are modest (≈0.42) despite high pixel‑level scores (PSNR ≈ 33 dB, SSIM ≈ 0.78, LPIPS ≈ 0.62), revealing that the SUTs attend to different cues in simulation versus reality. After DFF‑guided calibration, the average DFF rises to 0.71, while PSNR/SSIM drop only marginally (≤5 %). Crucially, output‑level performance remains essentially unchanged (mAP variation <0.03 %). Qualitative inspection shows that before calibration, Grad‑CAM often highlights sky or background illumination artifacts in synthetic images; after calibration, attention shifts to vehicle contours and road markings, matching the real‑world patterns.

Discussion and Implications
The authors argue that DFF adds a missing dimension to the fidelity spectrum: mechanism parity. In safety‑critical domains, ensuring that a perception model bases its decisions on the same physical evidence in simulation as in reality is arguably more important than visual photorealism alone. DFF‑guided calibration demonstrates a practical pathway to close this gap without sacrificing traditional performance metrics.

Limitations and Future Work
DFF relies on the quality of XAI explanations; noisy or biased heatmaps could misrepresent decisive features. Counterfactual generation is computationally intensive, limiting scalability; future work may explore surrogate models or gradient‑based approximations. Extending DFF to multimodal sensor suites (lidar, radar, radar‑camera fusion) and to closed‑loop control systems is identified as a promising direction.

Conclusion
By formalizing and empirically validating Decisive Feature Fidelity, the paper provides a behavior‑grounded fidelity measure that captures the true “transferability” of synthetic data for AV testing. The DFF metric, together with a calibration scheme that aligns simulator output with the SUT’s decision evidence, offers a concrete tool for developers to build more trustworthy simulation environments, ultimately advancing the safety assurance pipeline for autonomous vehicles.

