Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which reveal accuracy from a post-hoc view but ignore the actual effect the predictor has on self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor should not only pursue accuracy, but also capture all possible directions a neighboring agent might move, to support the SDV's cautious decision-making. Given that existing metrics hardly account for this standard, we propose a comprehensive pipeline that adaptively evaluates a predictor's performance along two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined into a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics, as its scores correlate better with the autonomous vehicle's driving performance. The pipeline thus offers a robust way to select the predictor that potentially contributes most to the SDV's driving performance.
Trajectory prediction is a fundamental component of autonomous driving systems. It allows the planner to anticipate the future movements of surrounding agents, including vehicles, pedestrians, and cyclists, enabling proactive and safe decision-making (Cui et al. 2021). A typical self-driving vehicle (SDV) pipeline consists of perception, planning, and control modules (Rosique et al. 2019). The perception module detects nearby agents and uses a trajectory predictor to forecast their possible future behaviors. The planner then generates the ego vehicle's trajectory based on these predictions, traffic rules, and map information. High-quality predictions are crucial to ensure safety, efficiency, and passenger comfort.

Figure 1: Illustration of how the predictor informs the planner in an SDV system. From the scene observation to the final control, it highlights two steps: the predictor estimates the future trajectories of surrounding agents, then the planner leverages this information to plan the ego vehicle's future trajectory.
Trajectory prediction has been widely studied, with approaches ranging from physics-based predictors such as Constant Velocity (CV) (Schöller et al. 2020; Isele et al. 2024) to learning-based methods, which have significantly improved prediction by modeling interactions and multimodality (Girgis et al. 2021; Nayakanti et al. 2022). Transformer-based predictors, such as MTR (Shi et al. 2022), can naturally incorporate high-definition (HD) map context during trajectory forecasting, and recent graph-based models, e.g., LaneRCNN (Zeng et al. 2021), Path-Aware Graph Attention (Da and Zhang 2022), and GOHOME (Gilles et al. 2022), explicitly encode HD map structure to further enhance the plausibility of their multimodal outputs.
Given the variety of methodologies, a practical question is how to select a predictor that provides real-world driving planners with a reliable reference (Da et al. 2024). The most prevalent strategies rely on displacement metrics, such as Average Displacement Error (ADE) and Final Displacement Error (FDE), which measure the distance between predictions and ground truth. However, the predictor is not an isolated module: its output guides the planner, affecting the vehicle's decisions about braking, accelerating, or changing lanes. If evaluation criteria fail to capture the differences among predictors that cause downstream changes, predictors that perform well on benchmark metrics may still lead to unsafe or inefficient driving behaviors in practice.

Figure 3: The correlation analysis between ADE and driving performance. The prediction's ADE shows a weak and non-insightful relationship with the actual driving performance, i.e., for a predictor evaluated by an error-based metric, low error does not necessarily mean better driving performance.
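For concreteness, the displacement metrics discussed above can be sketched as follows. This is a minimal illustration, not the evaluation code used in this work: the function names `displacement_errors` and `min_displacement_errors` are hypothetical, trajectories are assumed to be equal-length lists of (x, y) positions, and the best-of-K variant (often reported as minADE/minFDE for multimodal predictors) is a common convention assumed here rather than something specified in the text.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory against the ground truth.

    Both inputs are equal-length lists of (x, y) positions, one per future step.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at every step.
    dists = [math.dist(p, g) for p, g in zip(pred, truth)]
    ade = sum(dists) / len(dists)  # average over all steps
    fde = dists[-1]                # error at the final step only
    return ade, fde

def min_displacement_errors(modes, truth):
    """Best-of-K (ADE, FDE) over K candidate trajectories (assumed convention)."""
    errors = [displacement_errors(m, truth) for m in modes]
    # The best mode is chosen per metric, so the two minima may come
    # from different candidate trajectories.
    return min(e[0] for e in errors), min(e[1] for e in errors)

# Toy example with made-up coordinates.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
mode_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
mode_b = [(0.0, 3.0), (1.0, 0.0), (2.0, 0.0)]

ade_a, fde_a = displacement_errors(mode_a, truth)
min_ade, min_fde = min_displacement_errors([mode_a, mode_b], truth)
```

In this toy case `mode_b` ends exactly on the ground-truth endpoint, so the best-of-K FDE is zero even though neither mode matches the full path. Both numbers are computed purely from geometric distance and carry no information about how a planner would react to either mode.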
In our preliminary study¹, we verified that the above error-based measures indeed fall short in representing the downstream impact of predictors on planners (Weng et al. 2023; Phong et al. 2023; Shridhar et al. 2020): as shown in Figure 3, ADE hardly provides meaningful insight, since low displacement error does not imply improved SDV driving performance, and vice versa. This leads to the first challenge: relying solely on error-based measures is not enough to quantify a predictor's performance in terms of its real impact on the SDV.
An analysis of these SDV driving scenarios and trajectories suggests that an important factor neglected by displacement error is the complexity of the scenario. As shown in Figure 2, in relatively simple scenarios, such as highway driving, where surrounding vehicles and the ego vehicle share a steady, uniform motion, the predictor only needs to extrapolate future paths accurately from past observations. By contrast, in complex scenarios, such as intersections, the predictor must not only cover the correct trajectory in its prediction, but also anticipate every plausible maneuver a neighboring agent might take. Such comprehensive coverage is essential for planners to make early, conservative decisions under uncertainty (Grewal, Tonella, and Stocco 2024). Since real-world systems lack access to real-time ground truth feedback, and high error tolerance can be dangerous, generating a diverse ensemble of possible futures becomes critical. Thus the second challenge arises: how to reasonably quantify prediction diversity so that the planner receives rich information for safer, more robust decision-making in challenging scenarios.

¹ In total, 9059 real-world scenarios were tested.
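To make the notion of prediction diversity more tangible, the sketch below computes one simple spread measure: the mean pairwise distance between the final positions of the predicted modes. This is an illustrative assumption only. It is not the diversity measure proposed in this work, and the function name `endpoint_spread` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations

def endpoint_spread(modes):
    """Mean pairwise distance between the final positions of K predicted modes.

    Each mode is a list of (x, y) positions. Returns 0.0 when fewer than
    two modes are given, since no pair exists to compare.
    """
    endpoints = [m[-1] for m in modes]
    pairs = list(combinations(endpoints, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Toy intersection case: the neighbor may go straight, turn left, or turn right.
straight = [(0.0, 0.0), (10.0, 0.0)]
left = [(0.0, 0.0), (5.0, 5.0)]
right = [(0.0, 0.0), (5.0, -5.0)]

diverse = endpoint_spread([straight, left, right])
# Three identical modes cover only one maneuver, so the spread collapses to zero.
collapsed = endpoint_spread([straight, straight, straight])
```

A predictor that outputs three copies of the "straight" mode can still achieve a low displacement error whenever the agent does go straight, yet its spread is zero, which is the kind of gap between accuracy and coverage that the second challenge refers to.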
Recent works introduced diversity and uncertainty-aware metrics, such as Average Minimum Volume (AMV), energy scores (Shahroudi, Lepson, and Kull 2024), and lane-aware distances (Greer, Deo, and Trivedi 2021), to encourage predictors to produce multiple plausible futures. However, these metrics are typically computed in an open-loop setting, that do not account for