DrivIng: A Large-Scale Multimodal Driving Dataset with Full Digital Twin Integration


Perception is a cornerstone of autonomous driving, enabling vehicles to understand their surroundings and make safe, reliable decisions. Developing robust perception algorithms requires large-scale, high-quality datasets that cover diverse driving conditions and support thorough evaluation. Existing datasets often lack a high-fidelity digital twin, limiting systematic testing, edge-case simulation, sensor modification, and sim-to-real evaluations. To address this gap, we present DrivIng, a large-scale multimodal dataset with a complete geo-referenced digital twin of a ~18 km route spanning urban, suburban, and highway segments. Our dataset provides continuous recordings from six RGB cameras, one LiDAR, and high-precision ADMA-based localization, captured across day, dusk, and night. All sequences are annotated at 10 Hz with 3D bounding boxes and track IDs across 12 classes, yielding ~1.2 million annotated instances. Alongside the benefits of a digital twin, DrivIng enables a 1-to-1 transfer of real traffic into simulation, preserving agent interactions while enabling realistic and flexible scenario testing. To support reproducible research and robust validation, we benchmark DrivIng with state-of-the-art perception models and publicly release the dataset, digital twin, HD map, and codebase.


💡 Research Summary

The paper introduces DrivIng, a comprehensive multimodal driving dataset that uniquely couples a large‑scale real‑world recording with a fully geo‑referenced digital twin of the same environment. The dataset spans an approximately 18 km route covering urban streets, suburban neighborhoods, and highway segments, captured continuously under three lighting conditions (day, dusk, night). The sensor suite consists of six high‑resolution RGB cameras (1920×1080, 20 fps, providing 360° coverage), a 128‑beam LiDAR (Robosense Ruby Plus, 20 Hz, 240 m range), and a high‑precision ADMA‑based GPS/IMU (100 Hz, centimeter‑level accuracy). In total, 63 k synchronized frames were recorded, yielding about 378 k images and 63 k LiDAR sweeps.
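
To make the multi-rate setup concrete, the sketch below shows one plausible way to associate each LiDAR sweep with the nearest camera frame and GPS/IMU sample by timestamp. The function name, inputs, and tolerance are illustrative assumptions, not the dataset's published tooling.

```python
import numpy as np

def sync_to_lidar(lidar_ts, camera_ts, imu_ts, max_offset=0.025):
    """Associate each LiDAR sweep with the nearest camera frame and IMU sample.

    lidar_ts, camera_ts, imu_ts: 1-D arrays of timestamps in seconds.
    max_offset: reject matches farther apart than this (25 ms is roughly
                half a 20 fps camera period).
    Returns a list of (lidar_idx, camera_idx, imu_idx) index triples.
    """
    matches = []
    for i, t in enumerate(lidar_ts):
        cam_idx = int(np.argmin(np.abs(camera_ts - t)))
        imu_idx = int(np.argmin(np.abs(imu_ts - t)))
        if abs(camera_ts[cam_idx] - t) <= max_offset:
            matches.append((i, cam_idx, imu_idx))
    return matches
```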

Annotations are provided at 10 Hz with 3D bounding boxes, unique track IDs, and class labels for 12 object categories (car, van, bus, truck, trailer, cyclist, e‑scooter, motorcycle, pedestrian, other‑pedestrian, animal, other). The labeling effort resulted in roughly 1.2 million annotated instances, and each frame contains on average 20.6 objects (day), 15.0 (dusk), and 12.8 (night). The authors performed multiple rounds of visual inspection to ensure high label quality and applied Gaussian blurring to faces and license plates for privacy.
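
As a rough illustration of how such per-frame annotations might be consumed, the snippet below reads a hypothetical JSON frame containing 3D boxes, class labels, and track IDs. The file name and schema are assumptions for illustration only; the summary does not specify the release format.

```python
import json
from collections import Counter

# Hypothetical per-frame annotation layout (field names are illustrative,
# not the dataset's actual schema): each frame lists 3D boxes, each with a
# class label and a track ID that stays stable across the 10 Hz sequence.
with open("frame_000123.json") as f:
    frame = json.load(f)

class_counts = Counter(obj["category"] for obj in frame["objects"])
for obj in frame["objects"]:
    x, y, z, l, w, h, yaw = obj["box3d"]   # center, size, heading in the ego frame
    track_id = obj["track_id"]             # identity preserved across frames
print(class_counts)
```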

The standout contribution is the creation of a high‑fidelity digital twin: the entire recorded route is reconstructed in a simulation platform (CARLA) with exact geometry, HD map layers, and synchronized traffic agents whose trajectories are transferred 1‑to‑1 from the real world. This enables researchers to replay real traffic scenarios in simulation, modify environmental conditions (e.g., weather, lighting, sensor parameters), and conduct systematic, reproducible evaluations of perception algorithms. Unlike prior large datasets such as KITTI, nuScenes, or Waymo, which lack such a twin, DrivIng bridges the sim‑to‑real domain gap, facilitating robust testing of edge cases and cooperative perception setups.
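
A minimal sketch of this kind of trajectory replay with the standard CARLA Python API is shown below, assuming the digital-twin map is already loaded on a CARLA server running in synchronous mode and that each agent's trajectory is available as timestamped poses. The blueprint ID and pose layout are illustrative assumptions, not the authors' released tooling.

```python
import carla

# Connect to a CARLA server that already has the digital-twin map loaded.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

blueprint = world.get_blueprint_library().find("vehicle.audi.etron")

def replay_track(poses):
    """Replay one recorded agent. `poses` is a hypothetical list of
    (x, y, z, yaw_deg) tuples sampled at the annotation rate (10 Hz)."""
    x, y, z, yaw = poses[0]
    actor = world.spawn_actor(
        blueprint,
        carla.Transform(carla.Location(x, y, z), carla.Rotation(yaw=yaw)))
    actor.set_simulate_physics(False)   # poses are imposed, not simulated
    for x, y, z, yaw in poses[1:]:
        actor.set_transform(
            carla.Transform(carla.Location(x, y, z), carla.Rotation(yaw=yaw)))
        world.tick()                    # advance the synchronous simulation
    actor.destroy()
```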

To demonstrate utility, the authors benchmark state‑of‑the‑art 3D object detection and tracking models implemented in MMDetection3D on the real‑world portion of the dataset. Results show performance variations across lighting conditions, underscoring the importance of multimodal, multi‑time‑of‑day data. The paper also releases a nuScenes‑format converter, the full codebase, and the digital twin assets, encouraging immediate adoption by the community.
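
The snippet below sketches a generic MMDetection3D inference call on a single LiDAR sweep, which is the kind of workflow such a benchmark relies on. The config and checkpoint paths are placeholders, since the summary does not name the exact models or files used.

```python
from mmdet3d.apis import init_model, inference_detector

# Placeholder paths: the summary does not state which configs or checkpoints
# were benchmarked, so this only illustrates the generic MMDetection3D workflow.
model = init_model("path/to/config.py", "path/to/checkpoint.pth", device="cuda:0")

# Run a LiDAR-based detector on a single point-cloud file (e.g. a .bin sweep).
result, _ = inference_detector(model, "path/to/sweep.bin")
boxes = result.pred_instances_3d.bboxes_3d    # predicted 3D boxes
scores = result.pred_instances_3d.scores_3d   # per-box confidence scores
```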

Compared to existing datasets, DrivIng offers (1) continuous long‑range trajectories rather than fragmented short clips, (2) a full 360° camera rig plus dense LiDAR, (3) precise GPS/IMU positioning for exact map alignment, (4) a publicly available digital twin enabling 1‑to‑1 scenario replay, and (5) extensive benchmark baselines. Limitations include the single‑vehicle platform (Audi Q8 e‑tron) and a fixed sensor layout, which may restrict direct transfer to other vehicle configurations. Moreover, while the twin captures static infrastructure and dynamic traffic, it does not yet model complex weather phenomena (rain, snow, fog) or detailed road surface conditions, which are important for full domain transfer.

Future work suggested by the authors includes expanding the sensor suite to multiple vehicles, enriching the twin with weather and road‑surface physics, and integrating real‑time sim‑real feedback loops for cooperative perception and multi‑agent planning research. Overall, DrivIng represents a significant step toward unified, reproducible autonomous‑driving research by providing a dataset that is simultaneously large‑scale, richly annotated, and tightly coupled with a high‑fidelity simulation environment.

