Don't double it: Efficient Agent Prediction in Occlusions

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, refer to the original arXiv paper.

Occluded traffic agents pose a significant challenge for autonomous vehicles, as hidden pedestrians or vehicles can appear unexpectedly, yet this problem remains understudied. Existing learning-based methods, while capable of inferring the presence of hidden agents, often produce redundant occupancy predictions where a single agent is identified multiple times. This issue complicates downstream planning and increases computational load. To address this, we introduce MatchInformer, a novel transformer-based approach that builds on the state-of-the-art SceneInformer architecture. Our method improves upon prior work by integrating Hungarian Matching, a state-of-the-art object matching algorithm from object detection, into the training process to enforce a one-to-one correspondence between predictions and ground truth, thereby reducing redundancy. We further refine trajectory forecasts by decoupling an agent’s heading from its motion, a strategy that improves the accuracy and interpretability of predicted paths. To better handle class imbalances, we propose using the Matthews Correlation Coefficient (MCC) to evaluate occupancy predictions. By considering all entries in the confusion matrix, MCC provides a robust measure even in sparse or imbalanced scenarios. Experiments on the Waymo Open Motion Dataset demonstrate that our approach improves reasoning about occluded regions and produces more accurate trajectory forecasts than prior methods.

💡 Research Summary

The paper tackles a critical yet under‑explored problem in autonomous driving: predicting hidden traffic agents that are occluded from the vehicle’s sensors. Existing learning‑based approaches, exemplified by SceneInformer, can infer the presence of such agents but often generate multiple, redundant occupancy predictions for the same underlying object. This redundancy inflates the number of candidate trajectories that downstream planners must evaluate, increasing computational load and potentially degrading safety.

Key Contributions

Hungarian Matching‑Based Training – The authors integrate the Hungarian algorithm into the loss computation pipeline. After the transformer decoder produces predictions for a set of anchor points, a cost matrix combining L2 positional error and classification confidence is built. Hungarian matching then finds an optimal one‑to‑one assignment between predictions and ground‑truth agents (or a “no‑object” label). This forces a strict one‑to‑one correspondence, eliminating duplicate detections during training and encouraging the network to output a single, high‑confidence prediction per real agent.
Decoupled Heading and Motion – Instead of predicting absolute future positions directly, the model predicts a heading angle (θ) as sin θ and cos θ to avoid discontinuities, and predicts relative displacements in a local vehicle‑centric frame. The local displacements are rotated by the predicted heading to obtain global trajectories. This separation of rotation from translation stabilizes training, improves multimodal trajectory modeling, and yields more interpretable motion forecasts.
Multiclass Occupancy and MCC Evaluation – The binary occupied/unoccupied output of SceneInformer is extended to four classes (car, pedestrian, bicycle, background). Because occluded occupancy is heavily imbalanced (few occupied cells among many empty ones), the authors adopt the Matthews Correlation Coefficient (MCC) as the primary metric. MCC incorporates true/false positives and negatives, providing a balanced assessment even when one class dominates.
Data Preparation and Anchor Generation – Using ray‑casting from the ego sensor, the method distinguishes visible from occluded regions. Anchor points are placed at the latest observed positions of visible agents and on a 1.5 m grid covering occluded space, providing a dense set of potential hidden agents for the decoder to reason about.
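
The matching step in the first contribution can be illustrated with a small sketch. It is not the authors' implementation: the cost weights `w_pos` and `w_cls`, the dictionary layout, and the function names are assumptions made for illustration. The optimal assignment is found here by exhaustive search over permutations rather than by the Hungarian algorithm itself. On inputs this small both give the same minimum-cost assignment; a practical implementation would use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`.

```python
import math
from itertools import permutations

def match_cost(pred, gt, w_pos=1.0, w_cls=1.0):
    """Cost of pairing one prediction with one ground-truth agent.

    Combines L2 positional error with classification confidence.
    The weights w_pos and w_cls are illustrative assumptions.
    """
    dist = math.hypot(pred["xy"][0] - gt["xy"][0], pred["xy"][1] - gt["xy"][1])
    conf = pred["probs"][gt["cls"]]
    return w_pos * dist - w_cls * conf

def optimal_assignment(preds, gts):
    """Return {gt_index: pred_index} with minimum total cost.

    Brute force over permutations: only viable for tiny inputs, but it
    reaches the same optimum a Hungarian solver would.
    """
    best_cost, best = float("inf"), {}
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, dict(enumerate(perm))
    return best

# Three predictions; the second nearly duplicates the first.
preds = [
    {"xy": (10.0, 2.0), "probs": {"car": 0.90, "pedestrian": 0.03, "bicycle": 0.02, "background": 0.05}},
    {"xy": (10.5, 2.2), "probs": {"car": 0.60, "pedestrian": 0.05, "bicycle": 0.05, "background": 0.30}},
    {"xy": (30.0, -4.0), "probs": {"car": 0.10, "pedestrian": 0.70, "bicycle": 0.10, "background": 0.10}},
]
gts = [
    {"xy": (10.1, 2.0), "cls": "car"},
    {"xy": (29.5, -4.0), "cls": "pedestrian"},
]

assignment = optimal_assignment(preds, gts)

# Unmatched predictions get the "no-object" (background) training target.
pred_to_gt = {p: g for g, p in assignment.items()}
targets = [
    gts[pred_to_gt[i]]["cls"] if i in pred_to_gt else "background"
    for i in range(len(preds))
]
print(assignment, targets)
```

Because only one prediction can be matched to the car, the near-duplicate receives the background target. This is the mechanism by which one-to-one matching discourages duplicate detections during training.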

Architecture Overview

Encoder: Mirrors SceneInformer, encoding observed agent trajectories (separate MLPs per vehicle, pedestrian, bicycle) and road‑layout polylines.
Decoder: Receives the anchor points, processes them with a transformer decoder, and feeds the output to four MLP heads: (i) class probabilities, (ii) Δx, Δy, sin θ, cos θ, (iii) mode probabilities for multimodal futures, and (iv) per‑mode local displacement sequences.
Loss: After Hungarian matching, the loss aggregates classification cross‑entropy, L2 position error, heading regression (via sin/cos), and trajectory mode losses.
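
How the outputs of heads (ii) and (iv) might be combined into a global trajectory can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function name `decode_trajectory` is hypothetical, the local frame is assumed to have its x-axis along the agent's heading, and each local displacement is assumed to be measured from the agent's refined starting position rather than from the previous step.

```python
import math

def decode_trajectory(anchor_xy, offset_xy, sin_t, cos_t, local_steps):
    """Turn decoupled head outputs into a global-frame trajectory.

    anchor_xy   : anchor point (x, y)
    offset_xy   : predicted (dx, dy) refinement of the anchor
    sin_t, cos_t: raw heading outputs, not necessarily unit length
    local_steps : displacements (dx, dy) in the agent's local frame
    """
    # Raw network outputs need not lie on the unit circle, so renormalise.
    norm = math.hypot(sin_t, cos_t) or 1.0
    s, c = sin_t / norm, cos_t / norm
    # Refined agent position: anchor plus predicted offset.
    x0 = anchor_xy[0] + offset_xy[0]
    y0 = anchor_xy[1] + offset_xy[1]
    # Rotate each local displacement by the heading, then translate.
    return [(x0 + c * dx - s * dy, y0 + s * dx + c * dy) for dx, dy in local_steps]

# An agent heading along +y (90 degrees) that moves straight ahead.
heading_deg = math.degrees(math.atan2(2.0, 0.0))
path = decode_trajectory((5.0, 5.0), (0.5, 0.0), 2.0, 0.0, [(1.0, 0.0), (2.0, 0.0)])
print(heading_deg, path)
```

Encoding the angle as a (sin θ, cos θ) pair avoids the jump at ±180° that a directly regressed angle would have, and `atan2` recovers the angle when it is needed.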

Experimental Results
Evaluated on the Waymo Open Motion Dataset, MatchInformer achieves:

Occupancy Prediction: MCC improvements of roughly 7–12 % over SceneInformer, indicating more reliable detection of hidden agents.
Trajectory Prediction: Lower minADE (≈0.37 m vs 0.42 m) and minFDE (≈0.71 m vs 0.78 m), demonstrating more accurate future path forecasts.
Redundancy Reduction: The number of candidate trajectories fed to the planner drops by >30 %, directly reducing planning latency and computational demand.
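
For readers unfamiliar with the trajectory metrics quoted above, the sketch below shows one common way to compute minADE and minFDE for a single agent with several predicted modes. The function name and the toy trajectories are illustrative, and taking the two minima independently over modes is an assumption about the convention; the paper's exact evaluation protocol may differ.

```python
import math

def min_ade_fde(modes, gt):
    """Return (minADE, minFDE) over K predicted modes for one agent.

    modes: list of K trajectories, each a list of (x, y) points
    gt   : ground-truth trajectory with the same number of points
    """
    ades, fdes = [], []
    for traj in modes:
        errs = [math.dist(p, g) for p, g in zip(traj, gt)]
        ades.append(sum(errs) / len(errs))  # average displacement error
        fdes.append(errs[-1])               # final displacement error
    return min(ades), min(fdes)

gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0)],  # accurate early, drifts at the end
    [(1.0, 0.5), (2.0, 0.5), (3.0, 0.5)],  # constant lateral offset
]
min_ade, min_fde = min_ade_fde(modes, gt)
print(min_ade, min_fde)
```

In this toy case the best average error comes from the first mode and the best final error from the second, which is why the two metrics are reported separately.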

Discussion
The integration of Hungarian matching, a staple of modern object detection (e.g., DETR), into occlusion‑aware prediction is novel and proves effective. However, the current cost function only accounts for position and class confidence; incorporating velocity, acceleration, or scene‑context cues could further refine assignments, especially in dense traffic. The heading‑decoupled representation works well for moderate turns but may need additional regularization for abrupt maneuvers. Finally, while MCC offers a balanced metric, safety‑critical deployments might benefit from risk‑weighted measures that prioritize false negatives (missed hidden agents) over false positives.
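
To make the point about imbalance concrete, the sketch below computes MCC from confusion-matrix counts and contrasts it with plain accuracy. It uses the binary occupied/empty form as a simplification; the paper evaluates four classes, for which a multiclass generalization of MCC exists. The cell counts are invented for illustration and are not results from the paper.

```python
import math

def mcc(tp, fp, fn, tn):
    """Binary Matthews Correlation Coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# 1000 occluded cells, only 10 of which are actually occupied.
always_empty = dict(tp=0, fp=0, fn=10, tn=990)  # predicts "empty" everywhere
useful_model = dict(tp=8, fp=4, fn=2, tn=986)   # finds 8 of the 10 agents

for name, counts in [("always_empty", always_empty), ("useful_model", useful_model)]:
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

The trivial predictor reaches 99% accuracy yet scores an MCC of zero, while the model that actually finds hidden agents is clearly separated from it. MCC still weighs false positives and false negatives symmetrically, which is the limitation noted above for safety-critical use.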

Conclusion
MatchInformer advances occluded agent prediction by reducing duplicate occupancy forecasts, improving trajectory realism through heading separation, and providing a robust evaluation framework with MCC. On the Waymo Open Motion Dataset it improves on prior methods in both occluded-region reasoning and trajectory accuracy, and it offers a practical pathway for autonomous systems to reason more reliably about hidden road users, ultimately contributing to safer and more efficient autonomous driving.
