Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration

Reading time: 5 minutes

📝 Original Info

  • Title: Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
  • ArXiv ID: 2512.04453
  • Date: 2025-12-04
  • Authors: Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati

📝 Abstract

To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
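The abstract's rule for asking clarifying questions can be illustrated with a standard value-of-information check. The sketch below is not BALI's implementation and makes several assumptions. The belief is a discrete distribution over a few candidate goals. Answers are noise-free, so each goal implies exactly one answer. The interruption cost, `interruption_cost`, is a hypothetical scalar measured in bits. All function names are hypothetical. The probabilities in the usage lines are made up and loosely follow the red-or-white-sauce question from the paper's Figure 1.

```python
import math
from collections import defaultdict
from typing import Callable, Dict


def entropy(belief: Dict[str, float]) -> float:
    """Shannon entropy, in bits, of a belief mapping goal -> probability."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def expected_information_gain(belief: Dict[str, float],
                              answer_of: Callable[[str], str]) -> float:
    """Expected entropy reduction from asking one question.

    Assumes a noise-free answer model (hypothetical simplification):
    answer_of(goal) is the answer a human pursuing `goal` would give.
    """
    # Group the probability mass by the answer each goal would produce.
    by_answer: Dict[str, Dict[str, float]] = defaultdict(dict)
    for goal, p in belief.items():
        by_answer[answer_of(goal)][goal] = p

    expected_posterior_entropy = 0.0
    for goals in by_answer.values():
        p_answer = sum(goals.values())
        if p_answer == 0:
            continue  # an answer that can never occur contributes nothing
        # Posterior after hearing this answer: renormalize the consistent goals.
        posterior = {g: p / p_answer for g, p in goals.items()}
        expected_posterior_entropy += p_answer * entropy(posterior)
    return entropy(belief) - expected_posterior_entropy


def should_ask(belief: Dict[str, float], answer_of: Callable[[str], str],
               interruption_cost: float) -> bool:
    """Ask only when the expected information gain exceeds the interruption cost."""
    return expected_information_gain(belief, answer_of) > interruption_cost


# Illustrative beliefs (made-up numbers) for the Figure 1 sauce question.
sauce_answer = {"red-sauce pasta": "red", "white-sauce pasta": "white"}
uncertain = {"red-sauce pasta": 0.5, "white-sauce pasta": 0.5}
confident = {"red-sauce pasta": 0.95, "white-sauce pasta": 0.05}
print(should_ask(uncertain, sauce_answer.get, interruption_cost=0.3))  # True: gain = 1.0 bit
print(should_ask(confident, sauce_answer.get, interruption_cost=0.3))  # False: gain ≈ 0.29 bits
```

In this example the answer fully separates the two sauces, so the expected gain equals the current entropy. An even split justifies interrupting, but a near-certain belief does not. A fuller model would likely add answer noise and calibrate the cost so it sits on the same scale as the gain.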

💡 Deep Analysis

Figure 1

📄 Full Content

Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati
debasmita.ghose@yale.edu
Yale University, New Haven, CT, USA
Figure 1: Illustrative cooking task. (1) Human says “I want a quick Italian dinner” and boils water; the robot considers options that fit this preference (pasta vs. rice) and leans toward pasta. (2) Robot fetches pasta. (3) Human adds pasta to the pot, reinforcing a pasta goal. (4) Robot is uncertain about which pasta dish is intended since several variants are possible with the available ingredients, so it asks, “Do you want red or white sauce?” Human replies “red,” and the robot fetches a tomato as a result. (5) Human blends the tomato to make sauce. (6) Robot serves red-sauce pasta, completing the collaboration.
Abstract
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action–Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals.
We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
HRI ’26, Edinburgh, Scotland, UK. © 2026 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-XXXX-X/2026/06 https://doi.org/XXXXXXX.XXXXXXX
CCS Concepts
• Computing methodologies → Reasoning about belief and knowledge.
Keywords
Human Goal Prediction, Open-Ended Goal Discovery
ACM Reference Format
Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati. 2026. Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration. In Proceedings of ACM/IEEE International Conference on Human-Robot Interaction (HRI ’26). ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX
1 Introduction
For robots to be effective partners in real-world human–robot collaboration, they must be able to anticipate and support the human’s end goals [29]. Yet these goals are often unknown, difficult to articulate, or challenging for the robot to interpret, especially in long-horizon tasks where many intermediate actions overlap across different possible outcomes [4].
Further, in many everyday settings, the space of potential goals is effectively unbounded: the same ingredients in a kitchen could be combined into countless recipes depending on preferences; the same tools and parts in an assembly task could produce different products; or a household reorganization could yield multiple equally plausible end states. To collaborate effectively in such settings, a robot must reason over this open-ended goal space while minimizing the burden on its partner, asking clarifying questions only when needed rather than demanding step-by-step instructions or demonstrations [37, 40, 55].
To address the above challenges, we propose Bidirectional Action–Language Inference (BALI) for goal prediction, an approach for inferring human goals in collaborative tasks from both language and action. For example, consider a collaborative cooking task. When a person says “I want a sweet, healthy breakfast” and then begins fetching oats and fruit, the robot must reason that the likely goals include preparing a fruit oatmeal or a fruit parfait. BALI leverages ambiguous verbal instructions and observed human actions to infer likely goals and gu
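The breakfast example can be sketched as a simple Bayesian filter. The stated preference sets a prior over goals, and each observed fetch reweights that prior. This is a swapped-in illustration, not the paper's method. BALI reasons in a receding-horizon planning tree over an open-ended goal space. This sketch instead assumes a fixed, hypothetical candidate list (`CANDIDATES`) and made-up preference scores. It also uses a hypothetical observation model: a goal that uses the fetched item explains the fetch with probability 0.95, and any other goal explains it with the noise term `epsilon = 0.05`.

```python
from typing import Dict, Set

# Hypothetical candidate goals and the ingredients each would use. BALI reasons
# over an open-ended goal space; a fixed dict is a simplification for illustration.
CANDIDATES: Dict[str, Set[str]] = {
    "fruit oatmeal": {"oats", "fruit", "milk"},
    "fruit parfait": {"yogurt", "fruit", "oats"},
    "pancakes": {"flour", "eggs", "milk", "syrup"},
}


def language_prior(preference_scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize hypothetical scores for how well each goal fits the stated
    preference (e.g., "I want a sweet, healthy breakfast") into a prior."""
    total = sum(preference_scores.values())
    return {g: s / total for g, s in preference_scores.items()}


def action_likelihood(item: str, ingredients: Set[str], epsilon: float = 0.05) -> float:
    """P(human fetches `item` | goal): high if the goal uses the item, otherwise
    a small noise term (hypothetical observation model)."""
    return 1.0 - epsilon if item in ingredients else epsilon


def update(belief: Dict[str, float], item: str) -> Dict[str, float]:
    """One Bayes-rule update of the goal belief after observing a fetched item."""
    unnormalized = {g: p * action_likelihood(item, CANDIDATES[g]) for g, p in belief.items()}
    z = sum(unnormalized.values())
    return {g: p / z for g, p in unnormalized.items()}


# Illustrative scores, not taken from the paper.
scores = {"fruit oatmeal": 0.4, "fruit parfait": 0.4, "pancakes": 0.2}
belief = language_prior(scores)
for fetched in ["oats", "fruit"]:
    belief = update(belief, fetched)
print({g: round(p, 3) for g, p in belief.items()})
# {'fruit oatmeal': 0.5, 'fruit parfait': 0.5, 'pancakes': 0.001}
```

After the oats and fruit are fetched, pancakes is nearly ruled out while oatmeal and parfait remain tied. This mirrors the paper's point that both stay plausible. According to the abstract, residual ambiguity of this kind is what BALI may resolve by asking a clarifying question.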


Reference

This content is AI-processed based on open access ArXiv data.
