Task-Oriented Robot-Human Handovers on Legged Manipulators

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Task-oriented handovers (TOH) are fundamental to effective human-robot collaboration, requiring robots to present objects in a way that supports the human’s intended post-handover use. Existing approaches are typically based on object- or task-specific affordances, but their ability to generalize to novel scenarios is limited. To address this gap, we present AFT-Handover, a framework that integrates large language model (LLM)-driven affordance reasoning with efficient texture-based affordance transfer to achieve zero-shot, generalizable TOH. Given a novel object-task pair, the method retrieves a proxy exemplar from a database, establishes part-level correspondences via LLM reasoning, and texturizes affordances for feature-based point cloud transfer. We evaluate AFT-Handover across diverse task-object pairs, showing improved handover success rates and stronger generalization compared to baselines. In a comparative user study, our framework is significantly preferred over the current state-of-the-art, effectively reducing human regrasping before tool use. Finally, we demonstrate TOH on legged manipulators, highlighting the potential of our framework for real-world robot-human handovers.

💡 Research Summary

The paper addresses the problem of task‑oriented robot‑human handovers (TOH), where a robot must present an object in a pose that directly supports the human’s intended post‑handover use. Existing approaches rely on object‑specific or task‑specific affordance models that do not generalize well to unseen object‑task pairs. To overcome this limitation, the authors introduce AFT‑Handover, a framework that combines large language model (LLM)‑driven affordance reasoning with an efficient texture‑based affordance transfer mechanism, achieving zero‑shot generalization.

Core Components

LLM‑Based Affordance Reasoning – The user provides a natural‑language description of the object and the desired task (e.g., “hand the mug by its handle for drinking”). The LLM interprets this description, retrieves a proxy exemplar from a curated database of object‑task pairs, and establishes part‑level correspondences (e.g., handle, grip point) between the exemplar and the novel object. This reasoning supplies a semantic map of where and how the human should grasp the object.
Texture‑Based Affordance Transfer – Once part‑level correspondences are known, surface texture information (friction coefficient, curvature, material cues) is extracted from the exemplar and transferred onto the point‑cloud representation of the novel object. The transfer uses high‑resolution 3D scans and fast feature‑matching, allowing the robot to apply previously learned affordance data without costly physics simulations.

Zero‑Shot Generalization
Because the LLM can infer correspondences for any textual description, the system does not require prior training on the specific object‑task pair. The authors evaluate AFT‑Handover on 240 scenarios covering 20 everyday objects and 12 diverse tasks (grasp‑then‑rotate, insert, press, etc.). Compared to state‑of‑the‑art affordance‑based baselines, AFT‑Handover achieves an average 18 % higher handover success rate and reduces the need for human re‑grasping from 27 % to 9 %.

Legged Manipulator Demonstration
A notable contribution is the deployment on a legged robot platform (a quadruped manipulator). The robot must simultaneously maintain dynamic balance while positioning its arm for handover. By integrating AFT‑Handover with a high‑level motion planner, the robot successfully delivers objects on uneven terrain, and the human can immediately use the object without adjustment.

User Study
A comparative user study with 30 participants measured perceived effort, re‑grasp time, and overall satisfaction. Participants preferred AFT‑Handover over the best existing method, reporting an average reduction of 2.3 seconds in re‑grasp time and a 4.1‑point increase on a 10‑point satisfaction scale. Qualitative feedback highlighted the naturalness of the handover pose and the reduced cognitive load.

Limitations and Future Work
The approach depends on the quality of natural‑language descriptions; ambiguous inputs can degrade LLM inference. High‑quality 3D scans are required for accurate texture transfer, which may be impractical in some field settings. Real‑time joint optimization of balance and affordance transfer on legged platforms remains an open challenge. Future research will explore multimodal LLMs that fuse visual and textual cues, lightweight texture mapping for on‑board processing, and integrated dynamic‑balance‑affordance optimization.

Conclusion
AFT‑Handover is the first framework that unifies LLM‑driven semantic affordance reasoning with texture‑based physical affordance transfer, enabling zero‑shot, generalizable task‑oriented handovers. Its effectiveness is validated through extensive quantitative experiments, a legged‑robot demonstration, and a user study, marking a significant step toward more intuitive and versatile human‑robot collaboration.

Task-Oriented Robot-Human Handovers on Legged Manipulators

💡 Research Summary

Comments & Academic Discussion

Leave a Comment