Goal Inference from Open-Ended Dialog
Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Embodied AI agents are quickly becoming important and common tools in society, and they should be able to learn and accomplish a wide range of user goals and preferences efficiently and robustly. Large Language Models (LLMs) are a natural fit because they support rich, open-ended dialog between the human and the agent. In this thesis, we argue that embodied agents engaged in open-ended dialog during task assistance should: 1) extract goals from conversations in the form of natural language (NL), since NL is the most intuitive medium for humans to communicate their task preferences; and 2) quantify and maintain uncertainty about these goals, so that the agent acts only on goals it is sufficiently certain about. We present an online method for embodied agents to learn and accomplish diverse user goals. Offline methods such as RLHF can represent various goals but require large datasets; our approach achieves similar flexibility with online efficiency. We extract natural-language goal representations from conversations with LLMs: we prompt an LLM to role-play as a human with different goals and use the corresponding likelihoods to run Bayesian inference over potential goals. As a result, our method can represent uncertainty over complex goals inferred from unrestricted dialog. We evaluate in a text-based grocery-shopping domain and an AI2-Thor robot simulation, comparing our method to ablation baselines that lack either explicit goal representation or probabilistic inference.


💡 Research Summary

The paper “Goal Inference from Open‑Ended Dialog” introduces GOOD (Goals for Open‑ended Dialogue), an online framework that enables embodied agents to infer user goals from unrestricted natural‑language conversations while explicitly maintaining uncertainty over those goals. The authors argue that (1) extracting goals in natural language is the most intuitive way for humans to communicate preferences, and (2) an agent should only act on goals it is highly certain about, to avoid unsafe or undesired behavior.

GOOD consists of four interacting modules: (i) a Conversation module that uses a large language model (LLM) to generate robot queries and synthetic human responses based on a “human profile” (preferences, allergies, etc.); (ii) an Inference module that performs Bayesian belief updates over a finite set of candidate goals G; (iii) a Goal Management module that dynamically adds new goals to G when the posterior probability of an “unspecified” placeholder exceeds a threshold, and prunes low‑probability goals, thereby keeping the hypothesis space tractable; and (iv) an Action module that selects the top‑k most probable goals, asks the LLM to generate an action plan, and executes it in the environment while checking for task completion. The key technical contribution lies in the Inference module: the likelihood P(u | g) – the probability of a user utterance u given a goal g – is modeled by prompting the LLM to role‑play as a person who holds goal g and returning the log‑likelihood of the utterance. This yields a principled, language‑aware likelihood without hand‑crafted models.
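The Bayesian belief update at the heart of the Inference module can be sketched as follows. This is a minimal illustration rather than the authors' implementation: `toy_loglik` is a stand‑in for the paper's LLM role‑play likelihood, which would prompt the model to act as a person holding goal g and return the log‑probability it assigns to the utterance.

```python
import math

def normalize(log_weights):
    """Turn unnormalized log-weights into a probability distribution."""
    m = max(log_weights.values())
    exps = {g: math.exp(lw - m) for g, lw in log_weights.items()}
    z = sum(exps.values())
    return {g: v / z for g, v in exps.items()}

def update_belief(prior, utterance, log_likelihood_fn):
    """One Bayesian step: P(g | u) is proportional to P(u | g) * P(g)."""
    log_post = {g: math.log(p) + log_likelihood_fn(utterance, g)
                for g, p in prior.items() if p > 0}
    return normalize(log_post)

# Stand-in for the LLM role-play likelihood: utterances that mention the
# goal's keyword are scored as more probable under that goal.
def toy_loglik(utterance, goal):
    keyword = goal.split()[0]
    return -1.0 if keyword in utterance.lower() else -4.0

goals = ["chocolate cake", "vanilla cake", "carrot cake"]
belief = {g: 1.0 / len(goals) for g in goals}
belief = update_belief(belief, "Something chocolate would be great", toy_loglik)
```

Working in log space and normalizing against the maximum keeps the update numerically stable, which matters when the real likelihoods are LLM token log‑probabilities spanning many orders of magnitude.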

The method is evaluated in two domains. In a text‑based grocery‑shopping simulation, users express nuanced preferences such as “gluten‑free chocolate cake.” GOOD quickly infers these preferences, updates its belief, and selects the correct items with fewer dialogue turns than baselines. In the AI2‑Thor simulated home‑robot domain, the agent must locate and deliver objects while respecting hidden constraints (e.g., user dislikes certain textures). GOOD again demonstrates superior success rates and lower uncertainty before acting.

Baselines include (1) a version that represents goals as engineered feature vectors and does not perform Bayesian updates, and (2) a version that uses natural‑language goals but relies only on the most recent utterance without probabilistic reasoning. GOOD outperforms both in goal‑identification accuracy, dialogue efficiency, and overall task success. The explicit uncertainty quantification allows the robot to defer action until it is sufficiently confident, reducing unsafe or irrelevant actions and even uncovering implicit preferences (e.g., allergies) that the user never stated directly.

Key contributions:

  1. First online framework that integrates natural‑language goal representations with Bayesian inference.
  2. Novel use of LLM role‑play as a likelihood model for language‑based human behavior.
  3. Dynamic goal‑set management that keeps inference tractable in open‑ended settings.
  4. Empirical validation across two realistic domains showing improved flexibility and data efficiency compared to offline RLHF‑style methods.
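The dynamic goal‑set management in contribution 3 can be sketched as below. This is a hedged illustration of the add/prune idea under assumed thresholds: the threshold values and the `propose_goal_fn` helper (standing in for asking the LLM to propose a new candidate goal from the conversation so far) are hypothetical, not taken from the paper.

```python
UNSPECIFIED = "<unspecified>"  # placeholder goal for "none of the above"

def manage_goals(belief, propose_goal_fn, add_threshold=0.3, prune_threshold=0.01):
    """Expand the hypothesis space when probability mass accumulates on the
    placeholder goal, prune negligible goals, and renormalize."""
    belief = dict(belief)
    if belief.get(UNSPECIFIED, 0.0) > add_threshold:
        new_goal = propose_goal_fn()
        if new_goal not in belief:
            # Split the placeholder's mass with the newly proposed goal.
            belief[new_goal] = belief[UNSPECIFIED] / 2
            belief[UNSPECIFIED] = belief[UNSPECIFIED] / 2
    # Prune low-probability goals, but always keep the placeholder alive.
    belief = {g: p for g, p in belief.items()
              if p >= prune_threshold or g == UNSPECIFIED}
    z = sum(belief.values())
    return {g: p / z for g, p in belief.items()}

belief = {"deliver mug": 0.5, UNSPECIFIED: 0.45,
          "deliver fork": 0.005, "deliver bowl": 0.045}
belief = manage_goals(belief, propose_goal_fn=lambda: "deliver gluten-free snack")
```

Keeping the placeholder in the set after every step is what lets the agent notice, on later utterances, that none of its current hypotheses explains the user's language.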

Limitations include dependence on LLM output stability (temperature, prompt sensitivity) and reliance on synthetic human responses for experiments. Future work will involve real‑user studies, multimodal extensions (vision, speech), and systematic tuning of LLM inference parameters to improve robustness. Overall, the paper presents a compelling approach to aligning embodied agents with human intent in a scalable, uncertainty‑aware manner.

