Segway DRIVE Benchmark: Place Recognition and SLAM Data Collected by A Fleet of Delivery Robots


Visual place recognition and simultaneous localization and mapping (SLAM) have recently begun to be used in real-world autonomous navigation tasks like food delivery. Existing datasets for SLAM research are often not representative of in situ operations, leaving a gap between academic research and real-world deployment. In response, this paper presents the Segway DRIVE benchmark, a novel and challenging dataset suite collected by a fleet of Segway delivery robots. Each robot is equipped with a global-shutter fisheye camera, a consumer-grade IMU synced to the camera on chip, two low-cost wheel encoders, and a removable high-precision lidar for generating reference solutions. As they routinely carry out tasks in office buildings and shopping malls while collecting data, the dataset spanning a year is characterized by planar motions, moving pedestrians in scenes, and changing environment and lighting. Such factors typically pose severe challenges and may lead to failures for SLAM algorithms. Moreover, several metrics are proposed to evaluate metric place recognition algorithms. With these metrics, sample SLAM and metric place recognition methods were evaluated on this benchmark. The first release of our benchmark has hundreds of sequences, covering more than 50 km of indoor floors. More data will be added as the robot fleet continues to operate in real life. The benchmark is available at http://drive.segwayrobotics.com/#/dataset/download.


💡 Research Summary

The paper introduces the Segway DRIVE benchmark, a large‑scale dataset designed to evaluate visual place recognition and simultaneous localization and mapping (SLAM) algorithms in realistic indoor delivery‑robot scenarios. Existing SLAM benchmarks such as KITTI, Oxford RobotCar, and NCLT are predominantly outdoor or limited‑scope indoor collections that do not capture the full complexity of commercial service‑robot operation. To fill this gap, the authors collected data over a year from a fleet of Segway delivery robots operating in office buildings and shopping malls across five to eight distinct locations. The first release contains hundreds of sequences, covering more than 50 km of indoor floors.

Sensor Suite and Calibration
Each robot carries an Intel RealSense ZR300 visual‑inertial (VI) sensor equipped with a global‑shutter fisheye camera (166.5° field‑of‑view, 30 fps capture, 10 fps stored) and a BMI055 IMU (250 Hz accelerometer, 200 Hz gyroscope). Two wheel encoders provide differential‑steering odometry. For ground‑truth generation, a Hokuyo UTM‑30LX 2‑D lidar is temporarily mounted; its scans are processed offline with a loop‑closure algorithm achieving ~5 cm positional accuracy. Camera‑IMU hardware synchronization ensures precise timestamps, and all intrinsic, extrinsic, and noise‑characteristic parameters are supplied (Kalibr for camera‑IMU, CAD drawings for wheel‑to‑camera transforms, Allan‑variance analysis for IMU noise).
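As a rough illustration of the Allan‑variance analysis used to characterize the IMU noise (a minimal sketch, not the authors' tooling; the 200 Hz sampling rate and the file name in the usage comment are assumptions), the overlapping Allan deviation of a stationary gyroscope log can be computed as follows:

```python
import numpy as np

def allan_deviation(rate_samples, dt, taus):
    """Overlapping Allan deviation of a rate signal (e.g., gyro in rad/s).

    rate_samples : 1-D array of readings from a stationary sensor
    dt           : sample period in seconds (e.g., 1/200 for a 200 Hz gyro)
    taus         : iterable of averaging times (s) at which to evaluate
    """
    # Integrate the rate signal to an angle (or velocity) signal.
    theta = np.cumsum(rate_samples) * dt
    n = theta.size
    adev = []
    for tau in taus:
        m = int(round(tau / dt))          # cluster size in samples
        if m < 1 or 2 * m >= n:
            adev.append(np.nan)
            continue
        # Overlapping second differences of the integrated signal.
        d = theta[2 * m:] - 2.0 * theta[m:-m] + theta[:-2 * m]
        avar = np.sum(d ** 2) / (2.0 * tau ** 2 * d.size)
        adev.append(np.sqrt(avar))
    return np.asarray(adev)

# Hypothetical usage: the white-noise density is read off the deviation
# curve at tau = 1 s, bias instability near the curve's minimum.
# gyro_z = np.loadtxt("static_gyro_z.txt")   # placeholder file name
# taus = np.logspace(-2, 3, 60)
# sigma = allan_deviation(gyro_z, dt=1.0 / 200.0, taus=taus)
```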

Data Organization
Each sequence is delivered as a ROS bag containing three primary topics: /cam0/image_raw (fisheye images at ~10 Hz), /imu0 (IMU data at 200 Hz, with accelerometer readings interpolated to the gyroscope timestamps), and /tf0 (camera poses derived from wheel odometry, expressed in an Earth‑fixed frame). Wheel encoder ticks are integrated with a differential‑steering model, yielding 2‑D odometry that is lifted to 3‑D poses under the assumption of zero vertical translation, roll, and pitch. Ground‑truth trajectories are provided as CSV files containing the camera pose in an Earth‑fixed world frame, generated from the lidar scans after careful time‑offset estimation (via SLERP‑based angular‑rate correlation) and extrinsic calibration (using the camodocal and OOMACT toolboxes).
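To make the odometry convention concrete, the following minimal sketch integrates encoder increments with a differential-steering model and lifts the resulting planar pose to a 4x4 transform with zero z, roll, and pitch. The wheel radius, track width, and tick resolution are placeholders, not the dataset's calibration values.

```python
import numpy as np

# Placeholder robot parameters; the dataset ships its own calibration.
WHEEL_RADIUS = 0.20      # m
TRACK_WIDTH = 0.50       # m, distance between the two drive wheels
TICKS_PER_REV = 4096

def ticks_to_dist(dticks):
    """Convert an encoder tick increment to travelled arc length (m)."""
    return 2.0 * np.pi * WHEEL_RADIUS * dticks / TICKS_PER_REV

def integrate_diff_drive(x, y, yaw, dticks_left, dticks_right):
    """One step of the differential-steering model in the odometry frame."""
    dl = ticks_to_dist(dticks_left)
    dr = ticks_to_dist(dticks_right)
    ds = 0.5 * (dl + dr)                 # forward displacement
    dyaw = (dr - dl) / TRACK_WIDTH       # heading change
    # Mid-point integration of the planar motion.
    x += ds * np.cos(yaw + 0.5 * dyaw)
    y += ds * np.sin(yaw + 0.5 * dyaw)
    yaw += dyaw
    return x, y, yaw

def planar_pose_to_se3(x, y, yaw):
    """Lift (x, y, yaw) to a 4x4 pose with zero z, roll, and pitch."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[0, 3], T[1, 3] = x, y
    return T
```

Mid-point integration keeps the heading error small over the short intervals between encoder readings; more elaborate models are possible but not assumed here.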

Real‑World Challenges Captured
The dataset deliberately includes conditions that stress modern SLAM pipelines:

  1. Stationary periods where visual depth estimation is ambiguous.
  2. Rapid rotations causing motion blur and wheel slip.
  3. Dynamic environments with moving pedestrians and objects.
  4. Repetitive architectural elements leading to false place matches.
  5. Long‑term illumination and structural changes (different times of day, refurbishment over months).
  6. Rough flooring that excites high‑frequency IMU noise.
  7. Low‑texture, reflective, or shadowed surfaces that degrade feature tracking.

In any visual‑inertial navigation system (VINS), global position and yaw are unobservable, and the robots' predominantly planar, low‑excitation motion can make additional states (such as scale) weakly observable. The inclusion of wheel odometry explicitly addresses these observability issues by providing a reliable relative‑motion prior, as illustrated in the sketch below.
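As an illustration of how such a prior can be used, the sketch below evaluates a planar relative-pose residual between two estimated poses and the odometry-measured increment, the kind of constraint a graph-based or sliding-window estimator might add. The interfaces and the weighting matrix are assumptions, not the paper's implementation.

```python
import numpy as np

def wrap_angle(a):
    """Wrap an angle to (-pi, pi]."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def relative_se2(pose_i, pose_j):
    """Relative planar motion from pose_i to pose_j, each given as (x, y, yaw)."""
    xi, yi, ti = pose_i
    xj, yj, tj = pose_j
    c, s = np.cos(ti), np.sin(ti)
    dx, dy = xj - xi, yj - yi
    # Express the translation of j in the frame of i.
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap_angle(tj - ti)])

def wheel_odom_residual(pose_i, pose_j, odom_meas, sqrt_info=np.eye(3)):
    """Residual between the estimated relative motion and the wheel-odometry
    measurement (x, y, yaw); sqrt_info is a placeholder weighting matrix."""
    err = relative_se2(pose_i, pose_j) - odom_meas
    err[2] = wrap_angle(err[2])
    return sqrt_info @ err
```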

Evaluation Metrics
The authors adopt two classic SLAM error measures from the literature: Relative Pose Error (RPE), split into Relative Translation Error (RTE) and Relative Rotation Error (RRE), for odometry assessment, and Absolute Trajectory Error (ATE) for global map consistency. For metric place recognition, they propose new metrics that count the number of successful localizations (Nₚ) and estimate false‑positive rates by checking whether the relative motion between consecutive recognitions is consistent with the motion derived from wheel odometry. The method assumes that false positives are independent and that the differential‑drive model propagates pose uncertainty accurately.
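A minimal sketch of that consistency test is shown below, assuming the recognized poses and the wheel-odometry poses at the recognition times are available as planar (x, y, yaw) tuples and using fixed translation and rotation gates in place of the paper's propagated odometry uncertainty:

```python
import numpy as np

def wrap_angle(a):
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def relative_se2(p, q):
    """Planar relative motion from pose p to pose q, each (x, y, yaw)."""
    c, s = np.cos(p[2]), np.sin(p[2])
    dx, dy = q[0] - p[0], q[1] - p[1]
    return np.array([c * dx + s * dy, -s * dx + c * dy, wrap_angle(q[2] - p[2])])

def count_consistent_localizations(reco_poses, odom_poses,
                                   trans_gate=0.30, rot_gate=np.deg2rad(5.0)):
    """Count recognitions whose motion since the previous recognition agrees
    with the wheel-odometry motion over the same interval.

    reco_poses / odom_poses : lists of (x, y, yaw) at the recognition times
    trans_gate / rot_gate   : hypothetical acceptance thresholds
    """
    n_consistent = 0
    for k in range(1, len(reco_poses)):
        d_reco = relative_se2(reco_poses[k - 1], reco_poses[k])
        d_odom = relative_se2(odom_poses[k - 1], odom_poses[k])
        diff = d_reco - d_odom
        if (np.hypot(diff[0], diff[1]) < trans_gate
                and abs(wrap_angle(diff[2])) < rot_gate):
            n_consistent += 1
    return n_consistent
```

In practice the gates would be derived from the propagated odometry covariance rather than from fixed constants.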

Baseline Experiments
Several state‑of‑the‑art algorithms were benchmarked: visual‑inertial odometry (ORB‑SLAM3, VINS‑Mono), lidar‑based SLAM (LOAM, Cartographer), and metric place recognition approaches (SeqSLAM, NetVLAD‑based retrieval). Across the board, performance degraded sharply in the presence of dynamic people, illumination shifts, and abrupt turns. VINS methods exhibited drift in yaw and global position, consistent with the observability limits of planar motion; adding wheel odometry mitigated but did not eliminate the issue. Lidar‑based methods maintained higher absolute accuracy but relied on the temporarily mounted lidar, which is not available on production robots. Metric place recognition showed reasonable recall but suffered from higher false‑positive rates in repetitive corridors.

Significance and Future Directions
Segway DRIVE offers a realistic, reproducible testbed for indoor robot navigation research. By providing raw sensor streams, calibrated parameters, high‑quality ground truth, and evaluation scripts, it enables systematic comparison of algorithms under conditions that closely mirror commercial deployment. The benchmark highlights three research frontiers: (1) robust sensor fusion that compensates for unobservable states in planar motion, (2) algorithms resilient to dynamic, low‑texture, and illumination‑varying environments, and (3) scalable metric place‑recognition pipelines that can operate without reliance on lidar. The authors plan to expand the dataset with additional locations, longer time spans, and potentially higher‑frequency sensors, fostering continued progress toward reliable autonomous delivery robots.

