Correcting and Quantifying Systematic Errors in 3D Box Annotations for Autonomous Driving

Notice: This research summary and analysis were automatically generated using AI. For absolute accuracy, please refer to the original arXiv source.

Accurate ground truth annotations are critical to supervised learning and to evaluating the performance of autonomous vehicle systems. These vehicles are typically equipped with active sensors, such as LiDAR, which scan the environment in predefined patterns. 3D box annotation based on data from such sensors is challenging in dynamic scenarios, where objects are observed at different timestamps and hence at different positions. Without proper handling of this phenomenon, systematic errors are prone to being introduced into the box annotations. Our work is the first to discover such annotation errors in widely used, publicly available datasets. Through our novel offline estimation method, we correct the annotations so that they follow physically feasible trajectories and achieve spatial and temporal consistency with the sensor data. For the first time, we define metrics for this problem, and we evaluate our method on the Argoverse 2, MAN TruckScenes, and our proprietary datasets. Our approach increases the quality of box annotations by more than 17% in these datasets. Furthermore, we quantify the annotation errors and find that the original annotations are misplaced by up to 2.5 m, with highly dynamic objects being the most affected. Finally, we test the impact of these errors on benchmarking and find that it is larger than the improvements state-of-the-art methods typically achieve over the previous state of the art, showing that accurate annotations are essential for correct interpretation of performance. Our code is available at https://github.com/alexandre-justo-miro/annotation-correction-3D-boxes.


💡 Research Summary

The paper tackles a previously overlooked source of systematic error in 3D bounding‑box annotations for autonomous‑driving datasets: the temporal mismatch between LiDAR point‑cloud acquisition and the single reference timestamp at which boxes are usually defined. Because rotating LiDAR sensors sweep over a 100 ms interval, a moving object can travel several meters during that period. If annotators fit a box to points captured at an arbitrary moment within the sweep—without compensating for the object’s motion—the resulting box is displaced from the true object location at the reference time. The authors demonstrate that this phenomenon is widespread in public datasets such as Argoverse 2 and MAN TruckScenes, with positional errors up to 2.5 m, especially for highly dynamic objects.

To correct these errors, the authors formulate the problem as estimating, for each track and each annotation sample, a planar-motion state vector consisting of position (x, y), yaw (θ), linear speed (s), yaw rate (ω), and acceleration (a). Box dimensions (L, W, H) are kept fixed because size errors are comparatively minor. The core of the solution is a motion‑model‑based optimization that enforces physically plausible trajectories while aligning boxes with the underlying sensor data.
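The per-sample state described above could be represented as follows. This is a minimal sketch; the class and field names are illustrative and not taken from the paper's code:

```python
from dataclasses import dataclass

@dataclass
class BoxState:
    """Per-sample optimization variables for one track (planar motion).

    Box dimensions (L, W, H) are held fixed by the method and are
    therefore not part of the state.
    """
    x: float         # position x [m]
    y: float         # position y [m]
    yaw: float       # heading theta [rad]
    speed: float     # linear speed s [m/s]
    yaw_rate: float  # omega [rad/s]
    accel: float     # a [m/s^2]
```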

Motion model. The Constant Turn Rate and Acceleration (CTRA) model is adopted because it captures linear speed, acceleration, and angular motion simultaneously. The CTRA equations provide a closed‑form state transition over any time interval Δt, and a numerically stable approximation is used for near‑zero yaw rates. This model supplies a penalty term (ε_m) that measures the deviation between the predicted state at the next timestamp and the optimization variable, weighted by an inverse covariance matrix.
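The closed-form CTRA transition can be sketched as below. This is a standalone implementation of the standard CTRA equations, not the authors' code; the threshold `omega_eps` for switching to the straight-line fallback is an assumed value:

```python
import math

def ctra_step(x, y, yaw, v, omega, a, dt, omega_eps=1e-4):
    """Propagate a CTRA state (x, y, yaw, speed, yaw rate, accel) by dt.

    Uses the closed-form CTRA transition; falls back to the
    constant-acceleration straight-line limit when |omega| is near zero,
    where the exact equations divide by omega^2 and become unstable.
    """
    yaw_next = yaw + omega * dt
    v_next = v + a * dt
    if abs(omega) < omega_eps:
        # Straight-line (omega -> 0) limit of CTRA.
        d = v * dt + 0.5 * a * dt * dt
        x_next = x + d * math.cos(yaw)
        y_next = y + d * math.sin(yaw)
    else:
        inv_w2 = 1.0 / (omega * omega)
        x_next = x + inv_w2 * (
            v_next * omega * math.sin(yaw_next) + a * math.cos(yaw_next)
            - v * omega * math.sin(yaw) - a * math.cos(yaw)
        )
        y_next = y + inv_w2 * (
            -v_next * omega * math.cos(yaw_next) + a * math.sin(yaw_next)
            + v * omega * math.cos(yaw) - a * math.sin(yaw)
        )
    return x_next, y_next, yaw_next, v_next, omega, a
```

In the optimization, the penalty ε_m would compare the state predicted by such a step against the optimization variable at the next timestamp, weighted by an inverse covariance.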

Sensor‑data consistency. For each original box the authors inflate its longitudinal extent proportionally to the estimated speed (1 m + ΔT_sensor·s) and its width by a constant 5 m to safely capture all points belonging to the object. Points that fall inside the inflated region are associated with the track. These points are then motion‑compensated to the reference timestamp using the current estimate of the track’s dynamics (Eqs. 10‑12). After compensation, the inlier ratio (ε_inlier) is defined as one minus the fraction of associated points that lie outside the corrected box. A second term (ε_fitness) rewards points that are close to any box face, encouraging tight fitting.
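The inlier term can be sketched as follows, assuming points have already been motion-compensated to the reference timestamp and the box is parameterized as (center, yaw, length, width, height). This is an illustrative reimplementation, not the paper's code:

```python
import math

def point_in_box(p, cx, cy, cz, yaw, length, width, height):
    """Test whether point p = (x, y, z) lies inside a yaw-oriented 3D box."""
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the point into the box's local frame.
    dx = c * (p[0] - cx) + s * (p[1] - cy)
    dy = -s * (p[0] - cx) + c * (p[1] - cy)
    dz = p[2] - cz
    return (abs(dx) <= length / 2 and abs(dy) <= width / 2
            and abs(dz) <= height / 2)

def inlier_ratio(points, box):
    """epsilon_inlier: one minus the fraction of associated,
    motion-compensated points outside the corrected box -- equivalently,
    the fraction of points inside it."""
    if not points:
        return 1.0
    inside = sum(point_in_box(p, *box) for p in points)
    return inside / len(points)
```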

Objective function. The total loss L is a weighted sum of the motion‑consistency term, the inlier‑ratio term, and the fitness term. Because the problem contains continuous variables, non‑differentiable components (e.g., point‑in‑box tests), and potentially many local minima, the authors employ a derivative‑free global optimizer. Pattern Search (PS) is chosen as it converges quickly when a reliable initial guess (the original annotation) is available and has been shown to outperform other meta‑heuristics in similar settings.
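Pattern Search itself is simple to sketch. Below is a generic compass-search variant; it is not the paper's implementation, and the shrink factor and tolerance are assumed defaults:

```python
def pattern_search(loss, x0, step=1.0, shrink=0.5, tol=1e-6):
    """Derivative-free compass/pattern search.

    Polls +/- step along each coordinate, accepts the first improving
    move, and shrinks the step when no poll improves. Works with
    non-differentiable losses such as point-in-box counts.
    """
    x = list(x0)
    best = loss(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                val = loss(cand)
                if val < best:
                    x, best = cand, val
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= shrink
    return x, best
```

In the paper's setting, `loss` would be the weighted sum of the motion-consistency, inlier-ratio, and fitness terms for one track, and the original annotation would supply the initial guess `x0`.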

New evaluation metrics. The paper introduces two novel quantitative measures: (1) Time Alignment Error, which directly quantifies the distance between the corrected box center and the ground‑truth object position at the reference time, and (2) Motion Consistency Loss, which captures how well the sequence of corrected boxes adheres to the CTRA dynamics. These metrics allow objective comparison of annotation quality before and after correction.

Experiments. The method is evaluated on three datasets: Argoverse 2, MAN TruckScenes, and a proprietary collection. Results show a reduction of average positional error from ~1.8 m to <0.4 m, with the most pronounced gains for objects moving faster than 30 m/s. Overall annotation quality, measured by the proposed metrics, improves by more than 17 % across all datasets. To assess downstream impact, state‑of‑the‑art 3D object detectors (e.g., PointPillars, CenterPoint) are trained and evaluated using both original and corrected annotations. When evaluated on corrected labels, mAP improves by 2–3 %, but this gain is smaller than the performance gap introduced by the annotation errors themselves, confirming that the errors dominate benchmark results.

Contributions. The work makes four primary contributions: (1) identification and systematic quantification of LiDAR‑temporal‑induced annotation errors in widely used datasets, (2) a principled, motion‑model‑driven offline correction framework, (3) the definition of new, task‑agnostic metrics for evaluating dynamic‑object annotation quality, and (4) an empirical demonstration that annotation errors can outweigh the incremental improvements reported by recent perception algorithms.

Future directions include extending the formulation to full 3D (handling elevation changes), integrating the correction pipeline into real‑time annotation tools, and studying the effect of corrected labels on downstream modules such as tracking, prediction, and planning. By improving the fidelity of ground‑truth data, the paper argues that the autonomous‑driving research community can obtain more reliable performance assessments and accelerate the development of safer, more robust perception systems.

