📝 Original Info
Title: Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
ArXiv ID: 2512.04453
Date: 2025-12-04
Authors: Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati
📝 Abstract
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
📄 Full Content
Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati
debasmita.ghose@yale.edu
Yale University
New Haven, CT, USA
[Figure 1 panels, in order: human says “I want a quick Italian dinner”; 1. Human Action: Boil Water; 2. Robot Action: Get Pasta; 3. Human Action: Add Pasta to the Pot; robot asks “Do you want red or white sauce?”, human answers “Red sauce”; 4. Robot Action: Fetch Tomato; 5. Human Action: Blend Tomato to make Red Sauce; 6. Robot Action: Serve Red Sauce Pasta]
Figure 1: Illustrative cooking task. (1) Human says “I want a quick Italian dinner” and boils water; the robot considers options
that fit this preference (pasta vs. rice) and leans toward pasta. (2) Robot fetches pasta. (3) Human adds pasta to the pot, reinforcing
a pasta goal. (4) Robot is uncertain about which pasta dish is intended since several variants are possible with the available
ingredients, so it asks, “Do you want red or white sauce?” Human replies “red,” and the robot fetches a tomato as a result. (5)
Human blends the tomato to make sauce. (6) Robot serves red-sauce pasta, completing the collaboration.
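The decision in step (4) of Figure 1, asking only when the answer is worth the interruption, can be illustrated with a small sketch. This is not the authors' implementation: BALI reasons over a receding-horizon planning tree and an open-ended goal space, whereas the sketch below assumes a tiny fixed set of candidate goals, made-up probabilities, and hypothetical function names (`update`, `expected_information_gain`, `should_ask`). It only shows how a belief over goals might be updated from an observed action and how an expected-information-gain test against an interruption cost could gate a clarifying question.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {goal: w / total for goal, w in weights.items()}


def update(belief, likelihood):
    """Bayes rule: posterior(goal) is proportional to prior(goal) * P(obs | goal)."""
    return normalize({g: belief[g] * likelihood.get(g, 0.0) for g in belief})


def entropy(belief):
    """Shannon entropy of the belief, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def expected_information_gain(belief, answer_model):
    """Expected entropy reduction from asking a question.

    answer_model[answer][goal] = P(answer | goal); all values are assumptions.
    """
    gain = entropy(belief)
    for likelihood in answer_model.values():
        p_answer = sum(belief[g] * likelihood.get(g, 0.0) for g in belief)
        if p_answer > 0:
            gain -= p_answer * entropy(update(belief, likelihood))
    return gain


def should_ask(belief, answer_model, interruption_cost):
    """Ask only if the expected gain (bits) exceeds the assumed cost (bits)."""
    return expected_information_gain(belief, answer_model) > interruption_cost


# Illustrative numbers only: a prior after "I want a quick Italian dinner".
prior = {"red_sauce_pasta": 0.4, "white_sauce_pasta": 0.4, "fried_rice": 0.2}
# Assumed likelihood of observing "add pasta to the pot" under each goal.
saw_pasta = {"red_sauce_pasta": 0.9, "white_sauce_pasta": 0.9, "fried_rice": 0.0}
belief = update(prior, saw_pasta)

# Assumed answer model for "Do you want red or white sauce?".
answers = {
    "red": {"red_sauce_pasta": 1.0, "white_sauce_pasta": 0.0},
    "white": {"red_sauce_pasta": 0.0, "white_sauce_pasta": 1.0},
}
print(belief, should_ask(belief, answers, interruption_cost=0.3))
```

With these assumed numbers the belief after the pasta action is split evenly between the two pasta dishes, so the question is expected to resolve about one bit of uncertainty and clears the 0.3-bit cost. If the belief were already concentrated on one dish, the expected gain would fall below the cost and the robot would act without asking.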
Abstract
To collaborate with humans, robots must infer goals that are often
ambiguous, difficult to articulate, or not drawn from a fixed set.
Prior approaches restrict inference to a predefined goal set,
rely only on observed actions, or depend exclusively on explicit
instructions, making them brittle in real-world interactions. We
present BALI (Bidirectional Action–Language Inference) for goal
prediction, a method that integrates natural language preferences
with observed human actions in a receding-horizon planning tree.
BALI combines language and action cues from the human, asks
clarifying questions only when the expected information gain from
the answer outweighs the cost of interruption, and selects supportive
actions that align with inferred goals. We evaluate the approach
in collaborative cooking tasks, where goals may be novel to the
robot and unbounded. Compared to baselines, BALI yields more
stable goal predictions and significantly fewer mistakes.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
HRI ’26, Edinburgh, Scotland, UK
© 2026 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-XXXX-X/2026/06
https://doi.org/XXXXXXX.XXXXXXX
CCS Concepts
• Computing methodologies → Reasoning about belief and knowledge.
Keywords
Human Goal Prediction, Open-Ended Goal Discovery
ACM Reference Format:
Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati. 2026.
Open-Ended Goal Inference through Actions and Language for Human-
Robot Collaboration. In Proceedings of ACM/IEEE International Conference
on Human-Robot Interaction (HRI ’26). ACM, New York, NY, USA, 10 pages.
https://doi.org/XXXXXXX.XXXXXXX
arXiv:2512.04453v1 [cs.RO] 4 Dec 2025
HRI ’26, March 16–19, 2026, Edinburgh, Scotland, UK
1 Introduction
For robots to be effective partners in real-world human–robot
collaboration, they must be able to anticipate and support the
human’s end goals [29]. Yet these goals are often unknown, difficult
to articulate, or challenging for the robot to interpret, especially
in long-horizon tasks where many intermediate actions overlap
across different possible outcomes [4]. Further, in many everyday
settings, the space of potential goals is effectively unbounded: the
same ingredients in a kitchen could be combined into countless
recipes depending on preferences; the same tools and parts in an
assembly task could produce different products; or a household
reorganization could yield multiple equally plausible end states. To
collaborate effectively in such settings, a robot must reason over
this open-ended goal space while minimizing the burden on its
partner, asking clarifying questions only when needed rather than
demanding step-by-step instructions or demonstrations [37, 40, 55].
To address the above challenges, we propose Bidirectional Action
Language Inference (BALI) for goal prediction, an approach
for inferring human goals in collaborative tasks from both language
and action. For example, consider a collaborative cooking
task. When a person says “I want a sweet, healthy breakfast” and
then begins fetching oats and fruit, the robot must reason that the
likely goals include preparing a fruit oatmeal or a fruit parfait. BALI
leverages ambiguous verbal instructions and observed human actions
to infer likely goals and gu