FloPE: Flower Pose Estimation for Precision Pollination
This study presents Flower Pose Estimation (FloPE), a real-time flower pose estimation framework for computationally constrained robotic pollination systems. Robotic pollination has been proposed to supplement natural pollination and help ensure global food security as natural pollinator populations decline. However, flower pose estimation for pollination is challenging because of natural variability, flower clustering, and the high accuracy demanded by the flowers' fragility during pollination. This method leverages 3D Gaussian Splatting to generate photorealistic synthetic datasets with precise pose annotations, enabling effective knowledge distillation from a high-capacity teacher model to a lightweight student model for efficient inference. The approach was evaluated on both single- and multi-arm robotic platforms, achieving a mean pose estimation error of 0.6 cm and 19.14 degrees at low computational cost. Our experiments validate the effectiveness of FloPE, achieving up to a 78.75% pollination success rate and outperforming prior robotic pollination techniques.
💡 Research Summary
The paper introduces FloPE (Flower Pose Estimation), a real-time 6-DoF pose estimation framework designed for computationally constrained robotic pollination systems. Recognizing that declining natural pollinator populations threaten global food security, the authors aim to enable low-cost manipulators to perform precision pollination without damaging delicate flowers. The core contributions are threefold.

First, they employ 3D Gaussian Splatting (3DGS) to reconstruct photorealistic plant models from videos of real plants. Using COLMAP, they obtain a colored point cloud, initialize Gaussian primitives, and train a 3DGS model that can render high-fidelity images from arbitrary camera viewpoints. A custom 3D annotation tool lets a human operator label the 6-DoF pose of each flower in the reconstructed model within minutes, providing accurate ground-truth pose data for synthetic image generation.

Second, they adopt a teacher-student knowledge distillation pipeline. Grounding DINO, prompted with "white flower," generates pseudo-bounding boxes on each rendered image; the Segment Anything Model (SAM) refines these boxes into segmentation masks. The resulting annotations train a lightweight YOLOv11-nano detector, which runs on edge hardware (e.g., an Nvidia Jetson) at over 30 fps, delivering real-time flower detection and mask extraction.

Third, they design a pose regression network based on PoseNet that predicts rotational features in ℝ⁹. The 9-dimensional vector is reshaped into a 3×3 matrix and orthogonalized via Singular Value Decomposition (SVD) to obtain a valid rotation matrix in SO(3). This representation avoids the quaternion double cover and Euler-angle singularities, enabling smooth L2-loss optimization. Because many flowers exhibit radial symmetry, yaw angles are zeroed before the loss is computed.
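The SVD orthogonalization step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique (projecting an unconstrained 9-vector onto the nearest rotation matrix), not the authors' implementation; the function name and random input are ours.

```python
import numpy as np

def project_to_so3(r9: np.ndarray) -> np.ndarray:
    """Map a raw 9-D network output to the nearest valid rotation in SO(3)."""
    M = r9.reshape(3, 3)
    U, _, Vt = np.linalg.svd(M)
    # Flip the sign of the last singular direction if needed so that
    # det(R) = +1 (a proper rotation, not a reflection).
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

raw = np.random.randn(9)          # stand-in for an unconstrained prediction
R = project_to_so3(raw)
assert np.allclose(R @ R.T, np.eye(3), atol=1e-8)   # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)            # determinant +1
```

Because the projection is differentiable almost everywhere, a plain L2 loss between projected and ground-truth matrices optimizes smoothly, which is the property the summary attributes to this representation.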
Depth images together with camera intrinsics map 2D pixel coordinates to 3D flower centroids, which are then transformed into world coordinates using the known camera pose. To mitigate sensor noise and regression errors, an Extended Kalman Filter (EKF) refines both position and orientation. The EKF operates on the ℝ⁹ rotation representation, updating the state and then projecting it back onto SO(3) via SVD, preserving manifold constraints while providing continuous filtering. A global flower state manager matches new detections to existing tracks by Euclidean proximity, updating each flower's EKF-refined pose.

The system's "Commander" module orchestrates exploration and pollination phases. During exploration, the robot collects multi-view RGB-D data to improve pose estimates; once a flower's pose reaches a confidence threshold, the robot switches to pollination mode, first moving to a coarse pose and then employing visual servoing for fine alignment, discarding the global estimate in favor of real-time feedback.

Experimental validation on a single-arm UR5 and the multi-arm Stickbug platform demonstrates a mean positional error of 0.6 cm and a mean angular error of 19.14°, achieving a pollination success rate of 78.75%, substantially higher than the 66% baseline reported for existing systems. The approach runs efficiently on low-power devices, meeting real-time constraints while maintaining high accuracy. All code, synthetic datasets, and trained models are publicly released, supporting reproducibility and future research in precision-agriculture robotics.
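The depth-to-world mapping in the pipeline above can be sketched with a standard pinhole camera model. This is an illustrative sketch under assumed intrinsics and camera pose, not the authors' code; the matrix `K` and the identity camera rotation are hypothetical values chosen for the example.

```python
import numpy as np

def backproject(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Pixel (u, v) with metric depth -> 3D point in the camera frame."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def to_world(p_cam: np.ndarray, R_wc: np.ndarray, t_wc: np.ndarray) -> np.ndarray:
    """Transform a camera-frame point with the known camera-to-world pose."""
    return R_wc @ p_cam + t_wc

# Hypothetical intrinsics for illustration only.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A detection at the principal point lands on the optical axis.
p_cam = backproject(320.0, 240.0, 0.5, K)              # -> [0, 0, 0.5]
p_world = to_world(p_cam, np.eye(3), np.array([1.0, 0.0, 0.0]))
```

In the full system each such point would then seed or update an EKF track, with new detections associated to existing tracks by Euclidean distance as the summary describes.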