Autonomous Grasping On Quadruped Robot With Task Level Interaction


Quadruped robots are increasingly used in various applications due to their high mobility and ability to operate in diverse terrains. However, most available quadruped robots are primarily focused on mobility and lack object manipulation capabilities. Equipping a quadruped robot with a robotic arm and gripper introduces a challenge in manual control, especially in remote scenarios that require complex commands. This research aims to develop an autonomous grasping system on a quadruped robot using a task-level interaction approach. The system includes hardware integration of a robotic arm and gripper onto the quadruped robot’s body, a layered control system designed using ROS, and a web-based interface for human-robot interaction. The robot is capable of autonomously performing tasks such as navigation, object detection, and grasping using GraspNet. Testing was conducted through real-world scenarios to evaluate navigation, object selection and grasping, and user experience. The results show that the robot can perform tasks accurately and consistently, achieving a grasping success rate of 75% across 12 trials. Therefore, the system demonstrates significant potential in enhancing the capabilities of quadruped robots as service robots in real-world environments.


💡 Research Summary

The paper presents an integrated system that equips a quadruped robot with a lightweight robotic arm (OpenManipulator‑X) and a gripper, enabling autonomous object grasping through a task‑level human‑robot interaction (HRI) framework. The hardware architecture combines the Lite3 quadruped platform (DeepRobotics) and the arm, both coordinated by a central perception host built on an NVIDIA Jetson Orin NX. The quadruped side includes an RK3588‑based motion host that drives the leg actuators and aggregates data from a wide‑angle camera, a LiDAR sensor, an ultrasonic radar, and a RealSense D435i depth camera for navigation. The arm side incorporates the OpenCR controller, Dynamixel smart actuators, and a second RealSense D435i mounted on the end‑effector for close‑range perception. All components are mechanically integrated using a custom‑designed 3‑D‑printed mount, ensuring balanced weight distribution and collision‑free operation.

Software is built on ROS 2 with a layered control scheme. A ROS‑WebSocket bridge provides a web‑based dashboard where users can select a target room, initiate navigation, and choose objects via click or drag‑select. High‑level commands are decomposed into a finite‑state machine (FSM) that orchestrates three task levels: (1) navigation to the designated room, (2) visual search and approach to the selected object, and (3) manipulation (grasp and transport). The FSM allows the operator to intervene at key states while the robot autonomously handles low‑level locomotion and arm motions.
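The three-level FSM described above can be sketched as a simple transition table; the state and event names below are illustrative assumptions, not identifiers from the paper, and operator intervention is modeled as an "abort" event that returns the robot to idle.

```python
from enum import Enum, auto

class TaskState(Enum):
    # Hypothetical state names mirroring the paper's three task levels.
    IDLE = auto()
    NAVIGATE = auto()         # level 1: navigate to the selected room
    SEARCH_APPROACH = auto()  # level 2: visually search for and approach the object
    MANIPULATE = auto()       # level 3: grasp and transport the object
    DONE = auto()

# Allowed (state, event) -> next-state transitions. The operator can
# intervene at key states, modeled here as an "abort" event.
TRANSITIONS = {
    (TaskState.IDLE, "room_selected"): TaskState.NAVIGATE,
    (TaskState.NAVIGATE, "arrived"): TaskState.SEARCH_APPROACH,
    (TaskState.SEARCH_APPROACH, "object_reached"): TaskState.MANIPULATE,
    (TaskState.MANIPULATE, "object_delivered"): TaskState.DONE,
    (TaskState.NAVIGATE, "abort"): TaskState.IDLE,
    (TaskState.SEARCH_APPROACH, "abort"): TaskState.IDLE,
    (TaskState.MANIPULATE, "abort"): TaskState.IDLE,
}

def step(state: TaskState, event: str) -> TaskState:
    """Advance the FSM; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

In a ROS 2 deployment, each `step` call would typically be driven by topic callbacks from the web dashboard (user commands) and from the navigation/manipulation nodes (completion events).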

For perception, the system employs YOLOv8n, a lightweight real‑time object detector, on RGB streams from the quadruped’s front camera. Detected objects are displayed with bounding boxes; the user selects the desired target, after which a CSRT tracker maintains the object’s pose during approach. Navigation relies on HDL‑Graph‑SLAM and HDL‑Localization to generate a LiDAR‑based 3‑D map, while the ROS navigation stack and a PID controller adjust linear and angular velocities based on visual tracking errors, ensuring the robot stops in a stable “seated” posture optimal for arm operation.
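The visual-servoing control loop can be sketched as two independent PID controllers, one for angular velocity (centering the tracked bounding box horizontally) and one for linear velocity (closing the distance to the target). This is a minimal sketch; the gains and error definitions below are illustrative assumptions, not values from the paper.

```python
class PID:
    """Minimal PID controller; gains are illustrative, not from the paper."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per velocity component: the angular command is driven by the
# horizontal pixel offset of the tracked box from the image center, and the
# linear command by the remaining distance to the desired standoff.
angular_pid = PID(kp=0.004, ki=0.0, kd=0.001)
linear_pid = PID(kp=0.5, ki=0.0, kd=0.05)
```

Each control cycle, the CSRT tracker supplies the current bounding box, the pixel and range errors are computed, and the two `update` outputs are published as the angular and linear components of the velocity command.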

Manipulation begins once the robot is positioned near the object. The arm’s camera captures synchronized RGB, depth, and intrinsic matrix data, which are fed into GraspNet to generate a dense set of 6‑DOF grasp candidates across the entire scene. To focus on the user‑selected object, a masking step retains only pixels inside the bounding box. The candidate set is then filtered in three stages: (i) the top 20 grasps by confidence score, (ii) the grasp closest to the object’s centroid, and (iii) orientation adjustment to respect the arm’s joint limits. The final grasp pose, originally expressed in the camera frame, is transformed into the robot base frame using the manipulator’s kinematic model, then passed to MoveIt! for trajectory planning and execution. Successful grasps are followed by a return‑to‑origin maneuver where the robot transports the object back to the starting location and releases it.
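The candidate-filtering and frame-transform steps above can be sketched as follows. This is a simplified illustration under stated assumptions: grasp orientation handling (stage iii) is omitted for brevity, and the function names, array shapes, and `top_k=20` parameterization are reconstructions of the described pipeline, not the authors' code.

```python
import numpy as np

def select_grasp(grasps: np.ndarray, scores: np.ndarray,
                 centroid: np.ndarray, top_k: int = 20) -> int:
    """Stages (i) and (ii) of the filter: keep the top_k grasps by
    confidence, then pick the one closest to the object centroid.

    grasps:   (N, 3) grasp positions in the camera frame
    scores:   (N,) GraspNet confidence scores
    centroid: (3,) object centroid from the masked depth pixels
    Returns the index of the selected grasp.
    """
    top = np.argsort(scores)[::-1][:top_k]          # (i) best top_k by score
    dists = np.linalg.norm(grasps[top] - centroid, axis=1)
    return int(top[np.argmin(dists)])               # (ii) nearest to centroid

def camera_to_base(p_cam: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Transform a grasp point from the camera frame to the robot base
    frame using a 4x4 homogeneous transform obtained from the arm's
    kinematic model (forward kinematics to the camera mount)."""
    p_h = np.append(p_cam, 1.0)
    return (T_base_cam @ p_h)[:3]
```

The pose returned by `camera_to_base` (together with the selected grasp's orientation, adjusted for joint limits in stage iii) would then be handed to MoveIt! as the end-effector goal for trajectory planning.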

Experimental validation was conducted in a real indoor environment on the 9th floor of the Institut Teknologi Sepuluh Nopember (ITS). Twelve trials were performed with various objects (e.g., bolt sets). Navigation and object approach succeeded in all trials, while the grasping subsystem achieved a 75% success rate (9 of 12). Failure cases were attributed to (a) grasp candidates lying outside the arm’s reachable workspace, (b) depth sensor noise causing inaccurate centroid estimation, and (c) object geometries that were inherently difficult for the parallel‑jaw gripper. User feedback collected via questionnaires highlighted the intuitive nature of the web interface, the clarity of real‑time visual feedback, and the convenience of issuing only high‑level commands to accomplish complex tasks.

The authors claim three primary contributions: (1) a modular hardware integration framework that can be adapted to other quadruped platforms or manipulators, (2) a task‑level HRI system that abstracts complex motion sequences into simple user actions, and (3) an autonomous manipulation pipeline that combines YOLOv8n detection with GraspNet grasp generation and a three‑stage filtering strategy, achieving a notable grasp success rate in real‑world conditions. The paper also discusses limitations and outlines future work, including the incorporation of more advanced 3‑D point‑cloud‑based grasp planners, multi‑object simultaneous manipulation, cooperative multi‑robot task allocation, and long‑duration power and thermal management studies.

In summary, this work demonstrates that equipping a quadruped robot with a manipulator and leveraging task‑level interaction can transform a purely mobile platform into a versatile service robot capable of autonomous navigation, perception, and object handling, thereby expanding the applicability of legged robots in service‑oriented and assistive scenarios.

