Vision-only UAV State Estimation for Fast Flights Without External Localization Systems: A2RL Drone Racing Finalist Approach
Fast flights with aggressive maneuvers in cluttered GNSS-denied environments require fast, reliable, and accurate UAV state estimation. In this paper, we present an approach for onboard state estimation of a high-speed UAV using a monocular RGB camera and an IMU. Our approach fuses data from Visual-Inertial Odometry (VIO), an onboard landmark-based camera measurement system, and an IMU to produce an accurate state estimate. Using onboard measurement data, we estimate and compensate for VIO drift through a novel mathematical drift model. State-of-the-art approaches often rely on more complex hardware (e.g., stereo cameras or rangefinders) and use uncorrected drifting VIO velocities, orientation, and angular rates, leading to errors during fast maneuvers. In contrast, our method corrects all VIO states (position, orientation, linear and angular velocity), resulting in accurate state estimation even during rapid and dynamic motion. Our approach was thoroughly validated through 1600 simulations and numerous real-world experiments. Furthermore, we applied the proposed method in the A2RL Drone Racing Challenge 2025, where our team advanced to the final four out of 210 teams and earned a medal.
💡 Research Summary
The paper addresses the challenging problem of estimating the full six‑degree‑of‑freedom (6 DOF) state of a high‑speed unmanned aerial vehicle (UAV) in GNSS‑denied, cluttered environments using only a single monocular RGB camera and an inertial measurement unit (IMU). While visual‑inertial odometry (VIO) is the standard solution for such minimal sensor suites, it suffers from cumulative drift in position, orientation, linear velocity, and angular velocity, especially under aggressive maneuvers that cause motion blur, rapid illumination changes, and high angular rates.
To overcome these limitations, the authors propose a multi‑layered fusion architecture consisting of four main components:
- VIO Backbone – The system employs VINS‑Mono (or alternatively OpenVINS) as the primary VIO engine, delivering raw estimates of pose, linear velocity, and angular velocity at ~10 Hz.
- Landmark‑Based Visual Measurement – Known race‑gate landmarks are pre‑placed on the track. A lightweight detector extracts the 2‑D image locations of these gates and, using their known 3‑D geometry, produces absolute position and yaw estimates at 30 Hz. Outlier rejection is performed via RANSAC and adaptive weighting to maintain robustness when gates are partially occluded or temporarily invisible.
- VIO Drift Model – The authors introduce a novel drift state vector comprising translational drift, linear‑velocity drift, yaw drift, and yaw‑rate drift. The dynamics include an “artificial friction” term that damps drift growth when visual measurements are unavailable, preventing unbounded error accumulation. This model serves as the process model of an Extended Kalman Filter (EKF), allowing the filter to predict and correct drift continuously.
- State Estimator (Fusion Layer) – High‑rate IMU data (400 Hz) are fused directly with the corrected VIO outputs, the landmark measurements, and the drift model. By feeding IMU angular‑rate measurements into the attitude estimation loop, the system reduces latency and improves responsiveness to rapid attitude changes, a known weakness of pure VIO pipelines. The final fused state (position, orientation, linear velocity, angular velocity) is output at 100 Hz to a model‑predictive control (MPC) reference tracker and controller.
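The drift model and fusion layer described above can be sketched as a small EKF whose state is the drift itself, with the corrected output obtained by subtracting the estimated drift from the raw VIO states. The state layout, friction constants, noise values, and function names below are illustrative assumptions for the sketch, not the paper's actual parameters or implementation:

```python
import numpy as np

# Illustrative drift-state EKF sketch.
# Drift state x = [p_drift (3), v_drift (3), yaw_drift, yawrate_drift] -> 8 dims.
DT = 0.01      # 100 Hz filter rate (assumed)
LAM_V = 0.5    # "artificial friction" on velocity drift (assumed value)
LAM_W = 0.5    # "artificial friction" on yaw-rate drift (assumed value)

def make_F(dt=DT):
    F = np.eye(8)
    F[0:3, 3:6] = dt * np.eye(3)         # position drift integrates velocity drift
    F[3:6, 3:6] *= np.exp(-LAM_V * dt)   # friction damps velocity drift
    F[6, 7] = dt                         # yaw drift integrates yaw-rate drift
    F[7, 7] = np.exp(-LAM_W * dt)        # friction damps yaw-rate drift
    return F

def predict(x, P, Q):
    F = make_F()
    return F @ x, F @ P @ F.T + Q

def update_landmark(x, P, z, R):
    # z = VIO pose minus landmark-based absolute pose: a direct observation
    # of [position drift (3), yaw drift] whenever a gate is visible.
    H = np.zeros((4, 8))
    H[0:3, 0:3] = np.eye(3)
    H[3, 6] = 1.0
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(8) - K @ H) @ P
    return x, P

def corrected_state(vio_p, vio_v, vio_yaw, vio_yawrate, x):
    # Fused output: subtract the estimated drift from the raw VIO states.
    return vio_p - x[0:3], vio_v - x[3:6], vio_yaw - x[6], vio_yawrate - x[7]
```

In flight, such a filter would predict at the VIO/IMU rate and run `update_landmark` only when a gate detection arrives; between detections the friction terms keep the extrapolated drift bounded rather than letting it grow without limit.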
The complete pipeline runs on a lightweight onboard computer (e.g., NVIDIA Jetson Nano) and respects the strict payload constraints of the A2RL Drone Racing Challenge, which permits only a single camera, an IMU, and a modest compute unit.
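As one concrete illustration of the outlier rejection mentioned for the landmark component, a minimal RANSAC sketch over corner correspondences is shown below. It assumes the detector returns gate-corner pixels that can be compared against corners reprojected from the current state estimate: a consistent set of detections should agree on a single image-space offset, so corners that disagree (occluded or spurious) are rejected. All names and thresholds here are illustrative assumptions, not the paper's actual scheme:

```python
import random
import numpy as np

def ransac_offset(predicted, detected, n_iters=100, tol_px=8.0, seed=0):
    """Hypothetical RANSAC over gate-corner pairs.

    predicted: (N, 2) corner pixels reprojected from the current estimate.
    detected:  (N, 2) corner pixels from the gate detector.
    Returns the consensus pixel offset and a boolean inlier mask.
    """
    rng = random.Random(seed)
    best_inliers = np.zeros(len(predicted), dtype=bool)
    for _ in range(n_iters):
        i = rng.randrange(len(predicted))       # minimal sample: one pair
        offset = detected[i] - predicted[i]     # offset hypothesis
        err = np.linalg.norm(detected - (predicted + offset), axis=1)
        inliers = err < tol_px
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine the offset over the best inlier set.
    offset = (detected[best_inliers] - predicted[best_inliers]).mean(axis=0)
    return offset, best_inliers
```

In a full pipeline, the inlier correspondences would then feed a PnP-style pose solver to produce the absolute position and yaw measurement; the adaptive weighting mentioned in the paper could down-weight measurements with few inliers.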
Experimental Validation
The authors conduct two extensive validation campaigns. First, a physics‑based simulator generates 1600 flight scenarios covering a wide range of speeds (up to 15 m s⁻¹), accelerations (up to 7 g), lighting conditions, and motion‑blur levels. Compared with two recent state‑of‑the‑art methods that either ignore VIO velocity drift or rely on stereo cameras and rangefinders, the proposed approach reduces orientation RMSE by 70%, linear‑velocity RMSE by 16%, and angular‑velocity RMSE by a factor of eight.
Second, real‑world experiments are performed on the official A2RL race track. Ten competing UAVs fly the same gate sequence while the authors’ vehicle follows the same trajectory using the proposed estimator. The system maintains an average positional error below 0.12 m and a yaw error under 2°, even during tight turns where VIO alone diverges significantly. The estimator’s latency remains below 10 ms, enabling the downstream MPC to generate feasible thrust and torque commands for aggressive maneuvers.
Competition Outcome
The method was deployed in the 2025 A2RL Drone Racing Challenge. Out of 210 teams, the authors’ UAV reached the final four and earned a medal, demonstrating that a classical, analytically grounded estimator can compete with end‑to‑end deep‑learning solutions in a high‑performance racing scenario.
Contributions and Impact
- A comprehensive drift model that simultaneously corrects translational, rotational, linear‑velocity, and angular‑velocity drift.
- Direct integration of high‑rate IMU data into the attitude estimation loop, reducing VIO latency.
- A robust landmark‑based measurement scheme that can be swapped with any other absolute visual cue (e.g., AprilTags, QR codes).
- Open‑source release of code and datasets, ensuring reproducibility and facilitating future research.
Future Directions
The authors suggest extending the framework to event‑camera VIO, incorporating dynamic (moving) landmarks, and learning the drift‑model parameters online to adapt to changing sensor biases or payload variations. Overall, the paper provides a practical, theoretically sound solution for vision‑only UAV state estimation under the most demanding speed and payload constraints.