Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents

Humans learn abstractions and use them to plan efficiently, quickly generalizing across tasks – an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by cognitive-science accounts of how people form abstractions and intuitive theories of the world, Theory-Based RL (TBRL) systems such as TheoryCoder exhibit strong generalization through effective use of abstractions. However, they rely heavily on human-provided abstractions and sidestep the abstraction-learning problem. We introduce TheoryCoder-2, a new TBRL agent that leverages LLMs’ in-context learning ability to actively learn reusable abstractions rather than relying on hand-specified ones, synthesizing abstractions from experience and integrating them into a hierarchical planning process. We conduct experiments on diverse environments, including BabyAI, MiniHack, and VGDL games such as Sokoban. We find that TheoryCoder-2 is significantly more sample-efficient than baseline LLM agents augmented with classical planning-domain construction or reasoning-based planning, as well as prior program-synthesis agents such as WorldCoder. TheoryCoder-2 solves complex tasks that the baselines fail to solve, while requiring only minimal human prompting, unlike prior TBRL systems.


💡 Research Summary

The paper introduces TheoryCoder‑2, an advancement over the original TheoryCoder TBRL system, that automatically learns reusable high‑level abstractions for hierarchical planning using large language models (LLMs). While TheoryCoder achieved strong performance by combining a learned low‑level Python world model with hand‑crafted PDDL abstractions, it required human engineers to write those abstractions for each new domain, limiting scalability. TheoryCoder‑2 removes this bottleneck by leveraging the in‑context learning capability of LLMs to synthesize PDDL domain and problem files directly from interaction data.
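To make the idea concrete, here is an illustrative sketch of the kind of PDDL abstraction such a system might synthesize for a Sokoban-like domain. The domain name, predicates, and operator are invented for this example and are not taken from the paper.

```pddl
; Hypothetical synthesized abstraction for a Sokoban-style environment.
; All names here are illustrative, not from the paper.
(define (domain sokoban-abstract)
  (:predicates (agent ?o) (box ?o)
               (at ?o ?loc) (adjacent ?from ?to) (clear ?loc))
  ; One high-level operator: push a box one cell into a clear location.
  (:action push-box
    :parameters (?a ?b ?from ?to)
    :precondition (and (agent ?a) (box ?b)
                       (at ?b ?from) (adjacent ?from ?to) (clear ?to))
    :effect (and (at ?b ?to) (not (at ?b ?from))
                 (clear ?from) (not (clear ?to)))))
```

An operator like this abstracts away the agent's low-level navigation: the symbolic planner only reasons about which box goes where, and the grounding step (described below in the method) finds the primitive actions that realize each push.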

The method works as follows: at the start of an episode the agent collects a few random transitions (R_random) and presents the initial state, the transitions, and a minimal few‑shot prompt (describing the goal and a toy example of an abstract operator) to the LLM. The LLM generates a set of abstract operators D and predicates P in PDDL syntax. Fast Downward then produces a high‑level plan Π_H using D and P. Each high‑level operator ω_k is grounded by a breadth‑first search over the learned low‑level world model T̂ (a Python program that predicts state changes). The resulting concrete action sequences π_k are executed, and any prediction errors are recorded as R_p (state‑action‑next‑state triples) and R_a (action‑operator associations). These error logs are fed back to the LLM to refine T̂. The updated world model and the abstract library are retained for future episodes, allowing the agent to reuse previously learned abstractions and extend them when necessary.
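The grounding step can be sketched in a few lines of Python. The snippet below is a minimal, self-contained illustration, not the paper's implementation: the world model T̂ is replaced by a hand-written toy function over a 4×4 grid, and the high-level operator's postcondition is just a goal predicate on the state. All function and variable names are assumptions for this example.

```python
from collections import deque

# Toy stand-in for the learned low-level world model T_hat. In the paper
# this is a synthesized Python program; here the state is simply the
# agent's (x, y) position on a 4x4 grid.
ACTIONS = ["up", "down", "left", "right"]
DELTAS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def t_hat(state, action):
    x, y = state
    dx, dy = DELTAS[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < 4 and 0 <= ny < 4:  # stay inside the grid
        return (nx, ny)
    return state                     # bumping a wall is a no-op

def ground_operator(state, subgoal, max_depth=20):
    """Ground one high-level operator: breadth-first search over t_hat
    until the operator's postcondition (subgoal predicate) holds."""
    frontier = deque([(state, [])])
    visited = {state}
    while frontier:
        s, plan = frontier.popleft()
        if subgoal(s):
            return plan              # concrete action sequence pi_k
        if len(plan) >= max_depth:
            continue
        for a in ACTIONS:
            ns = t_hat(s, a)
            if ns not in visited:
                visited.add(ns)
                frontier.append((ns, plan + [a]))
    return None                      # operator not groundable from here

plan = ground_operator((0, 0), lambda s: s == (3, 3))
print(plan)  # a shortest action sequence from (0, 0) to (3, 3)
```

Because BFS explores states in order of plan length, the returned sequence is a shortest grounding of the operator under T̂; if the world model mispredicts during execution, the mismatch would be logged (R_p, R_a in the paper's notation) and used to refine the model.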

A curriculum of episodes groups similar environments (BabyAI, MiniHack, and VGDL games such as Sokoban, Labyrinth, Maze) by difficulty. The agent first learns abstractions in easy tasks, then reuses and augments them in harder tasks. Ablation studies that shuffle the curriculum order demonstrate that progressive learning dramatically improves sample efficiency.

Experiments compare TheoryCoder‑2 against several baselines: LLM + π (LLM directly generates primitive actions), LLM + Planner (LLM assists a classical planner but still relies on hand‑crafted abstractions), and WorldCoder (a prior program‑synthesis agent). Evaluation metrics include token cost (a proxy for sample efficiency), wall‑clock compute time, and solution rate (percentage of tasks solved on the first attempt). TheoryCoder‑2 consistently outperforms all baselines, achieving 2–5× lower token consumption and faster runtimes while solving a higher proportion of levels. Notably, on complex Sokoban configurations it reaches over 80% success, whereas baselines plateau far below that.

The paper also discusses limitations: reliance on large LLMs makes performance sensitive to model size; PDDL’s discrete logical format restricts applicability to continuous control problems; and the initial few‑shot prompt, though minimal, still requires human design. The authors suggest future work on lightweight model transfer, richer abstraction languages (e.g., STRIPS extensions or direct code generation), and automated prompt optimization.

In summary, TheoryCoder‑2 demonstrates that automatic abstraction synthesis combined with hierarchical symbolic planning can dramatically improve the efficiency and generalization of LLM‑augmented agents, moving AI systems closer to human‑like learning of reusable concepts and planning strategies.

