Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation
Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer strong baseline performance, single-model pipelines struggle to remain robust across the extreme scale transitions that occur during descent, where helipads appear small at high altitude and large near touchdown. To address this limitation, we propose a scale-adaptive dual-expert perception framework that decomposes the detection task into far-range and close-range regimes. Two YOLOv8 experts are trained on scale-specialized versions of the HelipadCat dataset, enabling one model to excel at detecting small, low-resolution helipads and the other to provide high-precision localization when the target dominates the field of view. During inference, both experts operate in parallel, and a geometric gating mechanism selects the expert whose prediction is most consistent with the AAV’s viewpoint. This adaptive routing prevents the degradation commonly observed in single-detector systems when operating across wide altitude ranges. The dual-expert perception module is evaluated in a closed-loop landing environment that integrates CARLA’s photorealistic rendering with NASA’s GUAM flight-dynamics engine. Results show substantial improvements in alignment stability, landing accuracy, and overall robustness compared to single-detector baselines. By introducing a scale-aware expert routing strategy tailored to the landing problem, this work advances resilient vision-based perception for autonomous descent and provides a foundation for future multi-expert AAV frameworks.
💡 Research Summary
The paper addresses a critical challenge in autonomous aerial vehicle (AAV) landing: the extreme scale variation of helipads as the vehicle descends from high altitude to touchdown. While state‑of‑the‑art object detectors such as YOLOv8 provide strong baseline performance, a single‑model pipeline degrades when the same object appears both as a few‑pixel target at altitude and as a large, dominant region near the ground. To overcome this, the authors propose a scale‑adaptive dual‑expert perception framework that explicitly separates the detection task into far‑range (small helipad) and near‑range (large helipad) regimes.
Two YOLOv8 models are trained independently on scale‑specialized subsets of the HelipadCat dataset. The far‑range expert uses images up‑scaled to 832 × 832 pixels, emphasizing features useful for detecting tiny, low‑resolution helipads. The near‑range expert is trained on 512 × 512 images where the helipad occupies a substantial portion of the frame, allowing the network to learn precise boundary localization. Both models share identical hyper‑parameters (batch size = 16, 100 epochs, SGD) to ensure a fair comparison and to isolate the effect of scale specialization.
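The scale-specialized subsets described above can be sketched as a simple partition of the annotated frames by how much of the image the helipad occupies. The 10 % area threshold and the annotation format below are illustrative assumptions, not values taken from the paper:

```python
def split_by_scale(annotations, area_threshold=0.10):
    """Partition annotated frames into far-range (small helipad) and
    near-range (large helipad) training subsets.

    annotations: list of (image_id, bbox_area_fraction) pairs, where
    bbox_area_fraction is the helipad box area divided by image area.
    The 10% threshold is illustrative, not taken from the paper.
    """
    far_range = [img for img, frac in annotations if frac < area_threshold]
    near_range = [img for img, frac in annotations if frac >= area_threshold]
    return far_range, near_range
```

Each subset would then be used to train its own YOLOv8 expert at the corresponding input resolution (832 × 832 for far-range, 512 × 512 for near-range).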
During inference, both experts run in parallel on each monocular RGB frame captured by a downward‑facing camera. A deterministic geometric gating mechanism evaluates the Euclidean distance between each predicted bounding‑box centre and the image centre (the camera’s optical axis). The expert whose prediction is closest to the optical axis is selected, effectively routing the input to the model best aligned with the current viewpoint. This hard selection avoids reliance on raw confidence scores, which are known to be poorly calibrated across scale transitions. The chosen bounding box then passes through a short temporal‑smoothing filter (e.g., a three‑frame moving average) to suppress frame‑to‑frame jitter.
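The gating and smoothing steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions (the function and expert names, the box format, and the deque-based history are mine, not from the paper's code):

```python
import math
from collections import deque

def gate_and_smooth(detections, image_size, history, window=3):
    """Select the expert prediction closest to the optical axis and
    smooth the selected centre over the last `window` frames.

    detections: dict mapping expert name -> (cx, cy, w, h) in pixels,
    or None if that expert produced no detection.
    image_size: (width, height) of the frame.
    history: deque of recently selected centres (mutated in place).
    Returns the smoothed (cx, cy), or None if no expert fired.
    """
    img_cx, img_cy = image_size[0] / 2, image_size[1] / 2
    candidates = [(name, box) for name, box in detections.items() if box]
    if not candidates:
        return None
    # Hard gate: pick the box whose centre is nearest the image centre,
    # ignoring the experts' raw confidence scores entirely.
    _, (cx, cy, _, _) = min(
        candidates,
        key=lambda item: math.hypot(item[1][0] - img_cx, item[1][1] - img_cy),
    )
    history.append((cx, cy))
    while len(history) > window:
        history.popleft()
    # Temporal smoothing: moving average over the retained frames.
    n = len(history)
    return (sum(p[0] for p in history) / n, sum(p[1] for p in history) / n)
```

A hard argmin over geometric distance like this is deterministic and cheap, which is why it sidesteps the confidence-calibration problem the summary mentions.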
The perception module is embedded in a closed‑loop landing simulation that couples CARLA’s photorealistic rendering with NASA’s GUAM flight‑dynamics engine. This integrated environment enables controlled variation of altitude, approach angle, illumination, and weather while providing realistic vehicle dynamics. Ten randomized landing trials are conducted, comparing three configurations: (1) a single YOLOv8 detector trained on the full dataset, (2) a conventional confidence‑based fusion of multiple detectors, and (3) the proposed dual‑expert framework.
Results demonstrate substantial improvements. Detection success at high altitude rises from 92 % (single model) to 98 % (dual‑expert). Bounding‑box jitter, measured as root‑mean‑square pixel deviation, drops from 4.3 px to 1.2 px, a 72 % reduction. Final landing position error shrinks from an average of 0.45 m to 0.18 m, indicating more accurate visual‑servoing and depth control. Notably, the dual‑expert system eliminates the “detection loss” that typically occurs during the critical transition between far‑range and near‑range regimes.
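The jitter figure quoted above is a root-mean-square pixel deviation; one common way to compute it is the RMS distance of per-frame box centres from their mean position. A minimal sketch (the paper's exact formulation may differ, e.g. it could measure frame-to-frame differences instead):

```python
import math

def rms_jitter(centres):
    """RMS deviation (in pixels) of detected box centres from their
    mean position across a trial -- one plausible reading of the
    paper's jitter metric, not its confirmed definition.
    """
    n = len(centres)
    mean_x = sum(c[0] for c in centres) / n
    mean_y = sum(c[1] for c in centres) / n
    return math.sqrt(
        sum((c[0] - mean_x) ** 2 + (c[1] - mean_y) ** 2 for c in centres) / n
    )
```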
The paper’s contributions are threefold: (i) a systematic method for creating scale‑specific training subsets and training dedicated YOLOv8 experts, (ii) a geometry‑based hard gating strategy combined with temporal smoothing that provides stable, jitter‑free expert selection, and (iii) a high‑fidelity simulation‑to‑simulation evaluation that validates end‑to‑end landing performance rather than isolated detection metrics. By adapting the Mixture‑of‑Experts (MoE) concept to a single visual modality focused on scale, the work fills a gap left by prior MoE research, which has largely targeted heterogeneous data sources or multi‑domain tasks.
Future directions suggested include adding intermediate‑scale experts for a three‑stage routing scheme, replacing the deterministic gate with a lightweight learned router (e.g., a small MLP) to enable smoother transitions, and extending the framework to multimodal sensor fusion (LiDAR, radar) within an MoE architecture. Real‑world flight tests are also proposed to assess the simulation‑to‑real transfer and to explore robustness under adverse weather, motion blur, and sensor noise. Overall, the study demonstrates that scale‑aware expert routing can dramatically improve the reliability of vision‑based AAV landing pipelines, paving the way for more resilient autonomous aerial operations in GPS‑denied or visually degraded environments.