TaCarla: A comprehensive benchmarking dataset for end-to-end autonomous driving


Collecting a high-quality dataset is a critical task that demands meticulous attention to detail, as overlooking certain aspects can render the entire dataset unusable. Autonomous driving remains a prominent research area that requires further exploration to improve both the perception and planning performance of vehicles. However, existing datasets are often incomplete. For instance, datasets that include perception information generally lack planning data, while planning datasets typically consist of extensive driving sequences in which the ego vehicle predominantly drives forward, offering limited behavioral diversity. In addition, many real-world datasets make model evaluation difficult, especially for planning tasks, since they lack a proper closed-loop evaluation setup. The CARLA Leaderboard 2.0 challenge, which provides a diverse set of scenarios to address the long-tail problem in autonomous driving, has emerged as a valuable alternative platform for developing perception and planning models under both open-loop and closed-loop evaluation. Nevertheless, existing datasets collected on this platform have their own limitations; some are tailored to narrow, task-specific sensor configurations. To support end-to-end autonomous driving research, we collected a new dataset comprising over 2.85 million frames using the CARLA simulation environment across the diverse Leaderboard 2.0 challenge scenarios. Our dataset is designed not only for planning but also supports dynamic object detection, lane divider detection, centerline detection, traffic light recognition, prediction tasks, and vision-language-action models. We demonstrate its versatility by training various models on it. Moreover, we provide numerical rarity scores that quantify how rarely the current state occurs in the dataset.


💡 Research Summary

The paper introduces TaCarla, a large‑scale, comprehensive benchmarking dataset for end‑to‑end autonomous driving research built on the CARLA Leaderboard 2.0 platform. Existing datasets either focus on perception (e.g., KITTI, nuScenes) and lack planning data, or they provide planning data but suffer from limited behavioral diversity, sensor configurations, or sub‑optimal expert policies that cause oscillations. To address these gaps, the authors collected over 2.85 million frames (recorded at 10 Hz) across all 36 Leaderboard 2.0 scenarios using the CARLA 0.9.15 simulator.

Key design choices:

  1. Expert Policy – The robust PDM rule‑based expert is used for data collection, eliminating the oscillation problems observed in Bench2Drive’s RL‑based expert.
  2. Sensor Suite – The dataset adopts the nuScenes sensor configuration (6 RGB cameras, 5 radars, 1 LiDAR) providing 360° coverage, which is more versatile than the limited front‑only setups of prior CARLA datasets. Additional modalities include bird’s‑eye‑view (BEV) RGB, depth, instance segmentation, and semantic segmentation images.
  3. Rich Annotations – For each frame the dataset supplies 3D bounding boxes for seven dynamic object classes (pedestrians, cars, police, ambulance, firetruck, cross‑bike, construction), 2D traffic‑light boxes, lane‑divider and centerline maps, BEV lane guidance, depth maps, and rule‑based textual descriptions to support LLM‑driven research.
  4. Scenario Diversity & Rarity Scoring – The authors provide a normalized rarity score for each state, quantifying how rarely a particular configuration appears in the dataset. This enables researchers to identify and focus on long‑tail events. Comparative heatmaps (Figure 2) and scenario counts (Table 2) demonstrate that TaCarla contains far more lane‑change, merging, and multi‑agent interactions than Bench2Drive or PDM‑Lite.
  5. Weather Variability – Weather parameters (cloudiness, fog, precipitation, etc.) are recorded and bucketed into four intensity levels, ensuring models can be trained under diverse environmental conditions.
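The paper's exact rarity formula is not reproduced in this summary, but the idea of a normalized per-state rarity score can be sketched as an inverse-frequency measure over discretized states. Everything below (the `rarity_scores` helper and the state labels) is a hypothetical illustration, not the authors' implementation:

```python
from collections import Counter

def rarity_scores(states):
    """Assign each discretized driving state a rarity score in [0, 1].

    Hypothetical sketch: states are counted, and each state's score is
    normalized against the most frequent one, so the most common state
    scores 0.0 and rarer states approach 1.0. TaCarla's actual scoring
    may use a different normalization.
    """
    counts = Counter(states)
    max_count = max(counts.values())
    return {state: 1.0 - count / max_count for state, count in counts.items()}

# Toy example: "cruise" dominates, so lane changes and merges score as rare.
scores = rarity_scores(["cruise", "cruise", "cruise", "lane_change", "merge"])
```

A score like this lets a training pipeline oversample long-tail states (e.g., by weighting each frame with its rarity) instead of letting forward-driving frames dominate the loss.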

The paper also presents extensive baseline experiments. State‑of‑the‑art models are trained for each task: PointPillars/CenterPoint for 3D object detection, SCNN and CurveLanes for 2D lane detection, OpenLane‑V2 for 3D lane/centerline detection, Faster R‑CNN variants for traffic‑light recognition, TrajectoryCNN for motion prediction, and the original PDM planner for closed‑loop planning. Performance metrics, training times, and inference speeds are reported, showing that the richer sensor suite and multi‑task annotations lead to measurable improvements over prior CARLA datasets.

All data, annotations, visualization tools, and baseline code are released publicly via GitHub and HuggingFace, facilitating immediate reproducibility and community adoption. By combining a robust expert policy, a widely used sensor configuration, extensive multi‑task labels, and a systematic rarity quantification, TaCarla positions itself as a one‑stop benchmark for both modular and end‑to‑end autonomous driving research, potentially becoming the new standard for evaluating perception, prediction, and planning under realistic, long‑tail driving conditions.

