From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents


Embodied agents operating in multi-agent, partially observable, and decentralized environments must plan and act despite pervasive uncertainty about hidden objects and collaborators' intentions. Recent advances in applying Large Language Models (LLMs) to embodied agents have addressed many long-standing challenges, such as high-level goal decomposition and online adaptation. Yet uncertainty is still primarily mitigated through frequent inter-agent communication, which incurs substantial token and time costs and can disrupt established workflows when human partners are involved. We introduce PCE, a Planner-Composer-Evaluator framework that converts the fragmented assumptions latent in LLM reasoning traces into a structured decision tree. Internal nodes encode environment assumptions and leaves map to actions; each path is then scored by scenario likelihood, goal-directed gain, and execution cost to guide rational action selection without heavy communication. Across two challenging multi-agent benchmarks (C-WAH and TDW-MAT) and three diverse LLM backbones, PCE consistently outperforms communication-centric baselines in success rate and task efficiency while showing comparable token usage. Ablation results indicate that the performance gains obtained by scaling model capacity or reasoning depth persist even when PCE is applied, while PCE consistently raises the baseline across both capacity and reasoning-depth scales, confirming that structured uncertainty handling complements both forms of scaling. A user study further demonstrates that PCE produces communication patterns that human partners perceive as more efficient and trustworthy. Together, these results establish a principled route for turning latent LLM assumptions into reliable strategies for uncertainty-aware planning.


💡 Research Summary

The paper introduces PCE (Planner‑Composer‑Evaluator), a novel framework that transforms the implicit assumptions embedded in Large Language Model (LLM) reasoning traces into a structured decision‑tree representation for uncertainty‑aware planning in multi‑agent embodied environments. In partially observable, decentralized settings, agents must act despite hidden objects and unknown collaborator intentions. Existing LLM‑based planners mitigate this uncertainty primarily through frequent natural‑language communication, which incurs high token and latency costs and can disrupt human‑in‑the‑loop workflows.

PCE tackles the problem by first letting a conventional LLM planner generate a chain‑of‑thought (CoT) reasoning trace. The trace often contains sentences prefixed with “Assumption:” that articulate hypotheses about unseen items, the state of containers, or the likely actions of teammates. The Composer module parses these statements, extracts each hypothesis, and assembles them into a binary decision tree: internal nodes encode individual assumptions (e.g., “the cabinet contains a cupcake” vs. “the cabinet is empty”), while leaf nodes correspond to concrete actions—either physical actions (move, pick, open) or communication actions (send a query). Each root‑to‑leaf path therefore represents a coherent scenario—a particular combination of assumptions leading to a specific action plan.
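The Composer stage described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's implementation: the `Node` fields, the regex for "Assumption:" sentences, and the `choose_action` callback that assigns a leaf action to each combination of assumption outcomes are all assumptions made here for clarity.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Node:
    # Internal node when `assumption` is set; leaf node when `action` is set.
    assumption: Optional[str] = None
    action: Optional[str] = None
    yes: Optional["Node"] = None   # branch where the assumption holds
    no: Optional["Node"] = None    # branch where it does not

def extract_assumptions(trace: str) -> List[str]:
    """Collect the sentences the LLM prefixed with 'Assumption:' in its CoT trace."""
    return [m.strip() for m in re.findall(r"Assumption:\s*([^\n.]+)", trace)]

def compose_tree(assumptions: List[str],
                 choose_action: Callable[[Tuple[bool, ...]], str]) -> Node:
    """Build a binary tree with one level per assumption.

    Each root-to-leaf path fixes a truth value for every assumption; the
    leaf's action is supplied by `choose_action(outcomes)`, where `outcomes`
    is the tuple of booleans chosen along the path.
    """
    def build(i: int, outcomes: Tuple[bool, ...]) -> Node:
        if i == len(assumptions):
            return Node(action=choose_action(outcomes))
        return Node(
            assumption=assumptions[i],
            yes=build(i + 1, outcomes + (True,)),
            no=build(i + 1, outcomes + (False,)),
        )
    return build(0, ())
```

With one extracted assumption, the tree has a single internal node whose `yes` branch might lead to a physical action (open the cabinet) and whose `no` branch to a communication action (query a teammate), mirroring the cabinet/cupcake example in the text.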

The Evaluator scores every path using three components: (1) Likelihood, a probabilistic estimate of the joint assumptions derived from prior probabilities and the LLM’s confidence cues; (2) Goal‑directed Gain, the expected reward for achieving the current sub‑goals under that scenario; and (3) Execution Cost, which aggregates physical travel distance, time, and the token cost of any communication actions. A weighted utility function U = α·Likelihood + β·Gain – γ·Cost (with α, β, γ tuned empirically) ranks the paths, and the highest‑utility leaf action is selected for execution. Communication is treated as an ordinary action; it is only chosen when its expected utility exceeds that of purely physical alternatives, thereby dramatically reducing unnecessary dialogue.
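The Evaluator's scoring rule can be expressed directly as code. The weights and field names below are placeholders (the paper tunes α, β, γ empirically; concrete values are not given in this summary), and treating communication as just another leaf action means it competes on the same utility scale as physical actions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Path:
    likelihood: float  # joint probability of the scenario's assumptions
    gain: float        # expected sub-goal reward under that scenario
    cost: float        # travel/time cost plus token cost of any communication
    action: str        # leaf action executed if this path is selected

def utility(p: Path, alpha: float = 1.0, beta: float = 1.0,
            gamma: float = 0.5) -> float:
    # U = alpha * Likelihood + beta * Gain - gamma * Cost
    return alpha * p.likelihood + beta * p.gain - gamma * p.cost

def select_action(paths: List[Path], **weights) -> str:
    """Return the leaf action of the highest-utility root-to-leaf path."""
    return max(paths, key=lambda p: utility(p, **weights)).action
```

Because a communication leaf carries an explicit token cost, a query is only selected when its expected utility beats every purely physical alternative, which is the mechanism by which PCE suppresses unnecessary dialogue.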

The authors evaluate PCE on two challenging benchmarks: C‑WAH (Communicative Watch‑And‑Help) and TDW‑MAT (ThreeDWorld Multi‑Agent Transport), each requiring agents to coordinate under limited perception. Experiments span three LLM backbones of varying scale: GPT‑4o mini, GPT‑OSS:20B, and Gemma‑3:4B. Across all configurations, PCE consistently outperforms communication‑centric baselines (e.g., ProAgent, CoELA) in success rate (+8–12 percentage points), average episode length (‑15 %), and token consumption (‑10 %). Importantly, scaling model size or increasing CoT depth alone yields diminishing returns, whereas adding PCE on top of any scale provides an additive boost, confirming that structured uncertainty handling is orthogonal to raw model capacity.

Ablation studies reveal that removing the assumption extraction step drops success rates by ~10 %, and omitting the cost term leads to a 35 % surge in token usage due to gratuitous communication. The user study (30 participants) shows that humans perceive PCE‑driven agents as more efficient and trustworthy, citing fewer unnecessary questions and clearer decision rationales.

The paper situates PCE among related work: it differs from Tree‑of‑Thoughts (which builds a tree over reasoning steps) and CoTS (which relies on iterative inter‑agent dialogue for tree search). PCE’s tree is built over environmental hypotheses rather than reasoning steps, and communication is an optional leaf rather than a prerequisite for search.

Limitations include dependence on the quality of LLM‑generated assumptions—erroneous hypotheses can pollute the tree—and the potential combinatorial explosion of assumption branches, which the current implementation mitigates with top‑N likelihood pruning. Future directions propose learning a dedicated probability model for assumptions, more sophisticated pruning or Monte‑Carlo tree search, and extending the framework to dynamic goal re‑planning and longer horizons.
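The top‑N likelihood pruning mentioned above is a standard way to bound the exponential growth in assumption branches. A minimal sketch follows; the scenario representation as `(likelihood, path_id)` pairs and the choice of N are assumptions made here, since the summary does not specify the implementation.

```python
from typing import List, Tuple

def prune_top_n(scenarios: List[Tuple[float, str]], n: int) -> List[Tuple[float, str]]:
    """Keep only the n most likely scenarios before utility scoring.

    With k binary assumptions there are up to 2**k root-to-leaf paths;
    ranking by joint likelihood and keeping the top n caps the number of
    paths the Evaluator must score.
    """
    return sorted(scenarios, key=lambda s: s[0], reverse=True)[:n]
```

A natural refinement, in line with the future directions above, would replace this hard cutoff with Monte‑Carlo tree search or a learned probability model over assumptions.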

In summary, PCE offers a practical, model‑agnostic pipeline that leverages the latent knowledge already present in LLM reasoning to perform principled, uncertainty‑aware planning, substantially lowering communication overhead while improving overall task performance and human‑agent interaction quality.

