FlyAware: Inertia-Aware Aerial Manipulation via Vision-Based Estimation and Post-Grasp Adaptation


Aerial manipulators (AMs) are gaining increasing attention in automated transportation and emergency services due to their superior dexterity compared to conventional multirotor drones. However, their practical deployment is challenged by the complexity of time-varying inertial parameters, which are highly sensitive to payload variations and manipulator configurations. Inspired by human strategies for interacting with unknown objects, this letter presents a novel onboard framework for robust aerial manipulation. The proposed system integrates a vision-based pre-grasp inertia estimation module with a post-grasp adaptation mechanism, enabling real-time estimation and adaptation of inertial dynamics. For control, we develop an inertia-aware adaptive control strategy based on gain scheduling, and assess its robustness via frequency-domain system identification. Our study provides new insights into post-grasp control for AMs, and real-world experiments validate the effectiveness and feasibility of the proposed framework.


💡 Research Summary

FlyAware introduces a two‑stage, onboard framework for aerial manipulators (AMs) that tackles the long‑standing challenge of time‑varying inertial parameters caused by payload changes and manipulator reconfiguration. The first stage, “Pre‑Sensing,” leverages foundation models—Grounded‑SAM for zero‑shot segmentation and IST‑Net for 9‑DoF pose and size estimation—from a single RGB‑D view and a natural‑language description of the target object. The segmented point cloud yields an oriented tight bounding box, from which a raw volume is derived. Because real objects rarely fill a rectangular box, a GPT‑4 based multimodal reasoning module refines this estimate by predicting a volume scaling factor (β), a diagonal inertia scaling matrix (α), and an average material density (ρ̂). These three quantities are combined to compute an initial mass (m̂ = ρ̂·β·V_bbox) and an initial inertia tensor (Ĵ = α·(m̂/12)·diag(w²+h², ℓ²+h², ℓ²+w²), the per‑axis‑scaled inertia of a solid cuboid). This vision‑language pipeline provides a physically plausible inertial guess within sub‑second latency, dramatically faster than traditional post‑grasp identification methods that require tens of seconds of excitation.
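The mass and inertia formulas above can be sketched as a short routine. This is a minimal illustration, not the paper's implementation: the function name, argument layout, and the per‑axis application of α are assumptions; the cuboid inertia coefficient m/12 is standard.

```python
import numpy as np

def estimate_inertia(dims, density, beta, alpha):
    """Pre-grasp inertial guess from an oriented tight bounding box.

    dims    : (l, w, h) box dimensions in metres (hypothetical input format)
    density : average material density rho_hat [kg/m^3] from the reasoning module
    beta    : predicted volume scaling factor in (0, 1]
    alpha   : per-axis inertia scaling factors (3-vector)
    """
    l, w, h = dims
    v_bbox = l * w * h
    m_hat = density * beta * v_bbox                  # m_hat = rho_hat * beta * V_bbox
    # Solid-cuboid inertia about the centroid, scaled per axis by alpha.
    j_hat = np.diag(alpha) @ ((m_hat / 12.0) * np.diag([w**2 + h**2,
                                                        l**2 + h**2,
                                                        l**2 + w**2]))
    return m_hat, j_hat
```

For a 0.2 × 0.1 × 0.1 m box at 1000 kg/m³ with β = 0.8 and α = 1, this yields m̂ = 1.6 kg, matching the m̂ = ρ̂·β·V_bbox relation directly.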

The second stage, “Post‑Grasp Adaptation,” activates when the manipulator makes contact. A disturbance observer (DOB) monitors the external force on the end‑effector; a sustained force above a calibrated threshold signals a successful grasp. Once detected, the DOB supplies a real‑time estimate of the external force, which is used to linearly correct the mass estimate and to scale the inertia tensor using the previously obtained α. The adaptation loop converges in less than 0.5 s, reducing the mass error to an average of 3 % and the inertia error to under 5 % across eight diverse objects (cans, boxes, spray bottles, etc.).
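A simplified sketch of the post‑grasp correction logic follows. The threshold value, the correction gain `k`, and the use of the vertical DOB force alone are illustrative assumptions; the paper's DOB formulation and its use of α for inertia scaling are more involved.

```python
G = 9.81  # gravitational acceleration [m/s^2]

def postgrasp_update(m_hat, j_hat, f_ext_z, k=1.0, threshold=0.5):
    """Refine the pre-grasp estimates once the disturbance observer (DOB)
    reports a sustained external force above the calibrated threshold.

    f_ext_z : steady-state vertical DOB force on the end-effector [N]
    k       : linear correction gain (hypothetical; 1.0 = full correction)
    Returns (mass, inertia); a no-op while no grasp is detected.
    """
    if abs(f_ext_z) < threshold:            # below threshold: no grasp yet
        return m_hat, j_hat
    m_meas = abs(f_ext_z) / G               # payload mass implied by the DOB force
    m_new = m_hat + k * (m_meas - m_hat)    # linear correction of the mass estimate
    j_new = j_hat * (m_new / m_hat)         # rescale inertia with the mass ratio
    return m_new, j_new
```

Because the inertia is rescaled elementwise, `j_hat` may be a scalar or a NumPy tensor; either way the correction converges in a single step when `k = 1.0`.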

With updated inertial parameters, the control layer employs an inertia‑aware gain‑scheduled (IA‑GS) adaptive controller. The total inertia matrix Jₜ(θ) (a function of the manipulator joint angles and the payload) is recomputed at each control cycle. Control gains for translational and rotational loops are scheduled proportionally to the inverse of the current inertia magnitude (e.g., Kₚ = k₀·(J_ref / Jₜ)). This design directly compensates for mass, CoM, and MoI variations without relying solely on error signals, thereby improving robustness to sensor noise and external disturbances. Frequency‑domain system identification confirms that the IA‑GS controller maintains consistent gain margins (≤ 3 dB variation) across payloads ranging from 0.2 kg to 1.2 kg and under various manipulator configurations.
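The scheduling law Kₚ = k₀·(J_ref / Jₜ) can be written compactly; a minimal sketch, assuming diagonal inertia matrices and per‑axis gains (the function name and argument shapes are hypothetical):

```python
import numpy as np

def scheduled_gains(k0, j_ref, j_t):
    """Inertia-aware gain scheduling: each axis gain scales with the inverse
    of the current inertia magnitude, Kp = k0 * (J_ref / J_t).

    k0    : nominal gain vector, tuned at the reference inertia
    j_ref : diagonal of the reference inertia matrix J_ref
    j_t   : diagonal of the current total inertia J_t(theta), recomputed
            from joint angles and payload at each control cycle
    """
    return np.asarray(k0) * np.asarray(j_ref) / np.asarray(j_t)
```

In a control loop this would be re-evaluated every cycle after updating Jₜ(θ), so the gains track inertia changes directly rather than waiting for tracking errors to accumulate.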

Experimental validation is performed on a quadrotor equipped with a 3‑DOF delta parallel manipulator. The perception pipeline achieves an average size estimation accuracy of 90.6 % and density estimation within 7 % of ground truth. Post‑grasp adaptation further reduces mass estimation error to 3 % on average. Flight tests demonstrate that trajectory tracking error remains below 5 cm and attitude error below 2° during pick‑and‑place tasks, even when the payload is changed mid‑mission. Compared against a Kalman‑filter‑based post‑grasp estimator, FlyAware reduces convergence time from >20 s to <1 s and improves control performance by roughly 30 %.

The paper’s contributions are threefold: (1) a vision‑language based pre‑grasp inertial estimation pipeline that operates in real time; (2) a DOB‑driven post‑grasp mass and inertia refinement that converges within half a second; (3) an inertia‑aware gain‑scheduled adaptive controller that directly incorporates updated inertial parameters, yielding provable robustness via frequency‑domain analysis. Limitations include dependence on an internet connection for GPT‑4 inference, reduced scaling‑factor accuracy for highly irregular geometries, and current confinement to a 3‑DOF delta manipulator. Future work will explore on‑board large‑language model deployment, physics‑based correction for non‑box‑like objects, and extension to multi‑DOF serial manipulators and cooperative aerial‑ground robot teams.

