Affordances Enable Partial World Modeling with LLMs
Full models of the world require knowledge of immense detail. Pre-trained large models, trained on vast amounts of internet-scale data, have been hypothesized to contain much of this knowledge, but using them directly in a search procedure is inefficient and inaccurate. Conversely, partial models focus on making high-quality predictions for a subset of states and actions: those linked through affordances that achieve user intents~\citep{khetarpal2020can}. Can we posit large models as partial world models? We provide a formal answer to this question, proving that agents achieving task-agnostic, language-conditioned intents necessarily possess predictive partial world models informed by affordances. In the multi-task setting, we introduce distribution-robust affordances and show that partial models can be extracted to significantly improve search efficiency. Empirical evaluations in tabletop robotics tasks demonstrate that our affordance-aware partial models reduce the search branching factor and achieve higher rewards compared to full world models.
💡 Research Summary
The paper “Affordances Enable Partial World Modeling with LLMs” investigates how large language models (LLMs) can be used not as full, exhaustive world simulators but as components of a partial world model that is guided by affordances. The authors begin by formalizing the notions of temporally‑extended intents and ζ‑affordances. An intent Iₒ(s, τ) describes a probability distribution over full trajectories τ that an abstract action (option) o should produce when started from state s. An intent is ζ‑satisfiable if the distance between the intended distribution and the true transition distribution is below a threshold ζ. An affordance is then defined as a state‑option pair (s, o) for which the corresponding intent is ζ‑satisfiable.
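The definitions above can be sketched in code. This is an illustrative rendering, not the paper's implementation: we assume discrete trajectory distributions encoded as dictionaries and pick total variation as the distance measure, since the summary does not specify which metric the ζ threshold is applied to.

```python
# Hedged sketch of zeta-satisfiability and the affordance set.
# Assumptions (ours, not the paper's): trajectory distributions are
# finite dicts {trajectory: probability}, and the distance between the
# intended and true distributions is total variation distance.

def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)

def is_zeta_satisfiable(intent_dist: dict, true_dist: dict, zeta: float) -> bool:
    """An intent I_o(s, .) is zeta-satisfiable when its trajectory
    distribution lies within zeta of the true transition distribution."""
    return total_variation(intent_dist, true_dist) <= zeta

def affordance_set(states, options, intent_of, dynamics_of, zeta):
    """Affordances are exactly the (s, o) pairs whose intent is
    zeta-satisfiable; intent_of and dynamics_of are hypothetical
    accessors returning the two distributions for a pair."""
    return {(s, o) for s in states for o in options
            if is_zeta_satisfiable(intent_of(s, o), dynamics_of(s, o), zeta)}
```

Under this encoding, shrinking ζ shrinks the affordance set monotonically, matching the intuition that a stricter satisfiability threshold affords fewer state-option pairs.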
The key theoretical contribution is twofold. Theorem 1 proves that any agent capable of achieving a set of language‑conditioned, task‑agnostic intents must implicitly possess a partial world model that predicts outcomes only for those (s, o) pairs that belong to the affordance set. The proof leverages the fact that intents are probabilistic trajectory specifications: satisfying an intent pins down the model's predictions on the afforded pairs, while the agent needs no accurate predictions elsewhere. Theorem 2 shows that planning with such a partial model reduces the effective branching factor of search: instead of considering all |S|·|O| state‑option pairs, the planner expands only the |AF| affordance‑validated pairs, a per‑step saving that compounds exponentially with search depth and dramatically lowers the sample complexity of Monte‑Carlo Tree Search and other look‑ahead methods.
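The branching-factor reduction of Theorem 2 can be illustrated with a minimal successor-expansion sketch (function names and data structures are ours, not the paper's): a full model queries the dynamics for every option at a state, while the partial model queries only options whose (state, option) pair lies in the affordance set.

```python
# Illustrative sketch of affordance-filtered expansion (Theorem 2).
# `model(state, option)` stands in for any one-step dynamics predictor.

def expand_full(state, options, model):
    """Full world model: query the dynamics for every option,
    giving |O| successors per state."""
    return [(o, model(state, o)) for o in options]

def expand_partial(state, options, model, affordances):
    """Partial world model: query only affordance-validated options,
    shrinking the per-state branching factor to the afforded subset."""
    return [(o, model(state, o)) for o in options
            if (state, o) in affordances]
```

Because every level of a search tree multiplies by the branching factor, trimming options per state with the affordance filter yields exponentially fewer nodes over the full search depth.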
To move from theory to practice, the authors extend the affordance framework to a multi‑task setting. They distinguish between task‑agnostic intents (rooted in the robot’s embodiment, e.g., “pick up any visible block”) and task‑specific intents (dependent on the current goal, e.g., “stack the red block on the blue block”). By introducing distribution‑robust affordances, they ensure that some affordances hold with high probability across the entire task distribution, while others are only valid for particular task instances.
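One simple way to realize distribution-robust affordances is a sampling-based estimate: keep a (s, o) pair only if its intent is satisfied in a high fraction of tasks drawn from the task distribution. The sketch below is our illustrative construction, not the paper's; the sampling scheme and the tolerance parameter `delta` are assumptions.

```python
# Hedged sketch: estimate distribution-robust affordances by Monte
# Carlo sampling over the task distribution. A pair is kept if the
# intent holds in at least a (1 - delta) fraction of sampled tasks.
# `sample_task` and `satisfied_in` are hypothetical callables.

def robust_affordances(pairs, sample_task, satisfied_in,
                       n_tasks=100, delta=0.1):
    tasks = [sample_task() for _ in range(n_tasks)]
    robust = set()
    for s, o in pairs:
        # Count the tasks in which this intent is satisfied.
        hits = sum(bool(satisfied_in(task, s, o)) for task in tasks)
        if hits / n_tasks >= 1.0 - delta:
            robust.add((s, o))
    return robust
```

Task-agnostic intents such as "pick up any visible block" would survive this filter across nearly all sampled tasks, while task-specific intents like "stack the red block on the blue block" would pass only for the task instances that call for them.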
Experimentally, the approach is evaluated on a suite of tabletop robotic tasks (block sorting, stacking, color‑based placement, etc.) sampled from a common distribution. A pre‑trained GPT‑4 model is used in two roles: (1) as an affordance classifier that, given a textual description of the current scene and a candidate option, decides whether the option is likely to satisfy its intent; (2) as a generative dynamics predictor that outputs a distribution over the next state when the option is deemed an affordance. The planner queries the LLM only for affordance‑validated options, thereby limiting expensive LLM calls.
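The two-role LLM usage above can be sketched as a single planning step. The functions `llm_is_afforded` and `llm_predict_next` stand in for the prompted GPT-4 calls described in the summary; their names and signatures are hypothetical, not the paper's code.

```python
# Sketch of the planner's two-role LLM query pattern: a cheap
# affordance check gates the expensive generative dynamics prediction,
# so the LLM is only asked to predict next states for validated options.

def plan_step(scene_description, candidate_options,
              llm_is_afforded, llm_predict_next):
    """Return predicted successors for affordance-validated options only."""
    successors = {}
    for option in candidate_options:
        # Role 1: affordance classifier (yes/no on scene + option).
        if llm_is_afforded(scene_description, option):
            # Role 2: generative dynamics predictor for the next state.
            successors[option] = llm_predict_next(scene_description, option)
    return successors
```

In this pattern the number of expensive dynamics queries scales with the number of afforded options rather than all candidates, which is the mechanism behind the reported drop in LLM invocations.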
Results show that the affordance‑aware partial model reduces the average branching factor by roughly 62 % compared with a full world model that queries the LLM for every option. Correspondingly, cumulative reward across episodes improves by about 18 % and the total number of LLM invocations drops by ~70 %. These gains demonstrate that the partial‑world approach retains the rich knowledge encoded in LLMs while avoiding their inefficiencies and hallucination‑related errors in irrelevant parts of the state‑action space.
The paper’s strengths lie in its rigorous formalization of intents and affordances for multi‑task RL, the clear theoretical link between language‑conditioned goals and partial models, and the practical demonstration that LLMs can serve as selective, cost‑effective predictors. Limitations include reliance on hand‑crafted prompts for the LLM, the need to set the ζ threshold empirically, and evaluation confined to relatively simple tabletop domains. Future work is suggested on automatic confidence estimation for affordance predictions, multimodal extensions (e.g., vision‑language integration), and deployment on real‑world robots with continuous control.
Overall, the work proposes a compelling paradigm shift: rather than treating LLMs as monolithic world simulators, they are leveraged as affordance filters and partial dynamics models, enabling efficient, scalable planning across diverse tasks while preserving the broad commonsense knowledge that large language models encode.