Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback

Reading time: 5 minutes

📝 Original Info

  • Title: Agent2World: Learning to Generate Symbolic World Models via Adaptive Multi-Agent Feedback
  • ArXiv ID: 2512.22336
  • Date: 2025-12-26
  • Authors: Mengkang Hu, Bowei Xia, Yuran Wu, Ailing Yu, Yude Zou, Qiguang Chen, Shijian Wang, Jiarui Jin, Kexin Li, Wenxiang Jiao, Yuan Lu, Ping Luo

📝 Abstract

Symbolic world models (e.g., PDDL domains or executable simulators) are central to model-based planning, but training LLMs to generate such world models is limited by the lack of large-scale verifiable supervision. Current approaches rely primarily on static validation methods that fail to catch behavior-level errors arising from interactive execution. In this paper, we propose Agent2World, a tool-augmented multi-agent framework that achieves strong inference-time world-model generation and also serves as a data engine for supervised fine-tuning, by grounding generation in multi-agent feedback. Agent2World follows a three-stage pipeline: (i) a Deep Researcher agent performs knowledge synthesis via web search to address specification gaps; (ii) a Model Developer agent implements executable world models; and (iii) a specialized Testing Team conducts adaptive unit testing and simulation-based validation. Agent2World demonstrates superior inference-time performance across three benchmarks spanning both Planning Domain Definition Language (PDDL) and executable code representations, achieving consistent state-of-the-art results. Beyond inference, the Testing Team serves as an interactive environment for the Model Developer, providing behavior-aware adaptive feedback that yields multi-turn training trajectories. The model fine-tuned on these trajectories substantially improves world-model generation, yielding an average relative gain of 30.95% over the same model before training. Project page: https://agent2world.github.io.

💡 Deep Analysis

Figure 1: Comparison of Agent2World and previous world-model generation paradigms.

📄 Full Content

Preprint

Mengkang Hu♠♡∗, Bowei Xia♠♢∗, Yuran Wu♠, Ailing Yu♠, Yude Zou♠, Qiguang Chen♣, Shijian Wang♡, Jiarui Jin♡, Kexin Li♢, Wenxiang Jiao♡, Yuan Lu♡, Ping Luo♠†

♠The University of Hong Kong ♡Xiaohongshu Inc. ♢UESTC ♣Harbin Institute of Technology
1 INTRODUCTION

In recent years, researchers have explored symbolic world models, formal representations of an environment's dynamics and constraints, which are widely used in model-based planning (Guan et al., 2023; LeCun, 2022; Craik, 1967). The task of symbolic world-model generation involves automatically synthesizing these models from natural-language descriptions, eliminating the need for domain experts to manually design and specify complex rules and dynamics. Large language models (LLMs) (Guo et al., 2025; Zhao et al., 2023; Bai et al., 2023) have made this automation possible by combining two key capabilities: commonsense knowledge about how the world works, and code-generation abilities that formalize this knowledge into executable representations (Chen et al., 2025a). However, learning to generate such models from natural language remains difficult: correctness is behavioral and execution-dependent, while large-scale, verifiable supervision is scarce.

As illustrated in Figure 1, prior work in this domain largely follows two paradigms: (i) direct generation of symbolic world models, and (ii) scripted workflows that couple generation with iterative verification and repair. Across both PDDL-style domains (Guan et al., 2023; Hu et al., 2025a) and executable code world models (Dainese et al., 2024), the second paradigm typically couples generation with a pre-specified verification interface (e.g., parsers, planners, validators, or fixed sets of evaluation trajectories). While such static validation improves syntactic validity, it misses behavior-level errors that only appear under interactive execution (e.g., inconsistent state updates or unreachable goals).

Figure 1: Comparison of AGENT2WORLD and previous world-model generation paradigms.

Furthermore, existing studies on generating symbolic world models with LLMs have primarily focused on training-free methods for one particular type of world model (Yu et al., 2025; Kong et al., 2025; Zhang et al., 2025), rather than fundamentally enhancing the world-modeling capabilities of the LLMs themselves.

In this paper, we propose AGENT2WORLD, a tool-augmented multi-agent framework that evaluates and improves world models through interactive execution. Given a natural-language description, AGENT2WORLD coordinates multiple LLM-based agents with access to external tools (e.g., web retrieval and code execution) to iteratively produce an executable world model. At a high level,

∗Equal contribution. Corresponding to mkhu@connect.hku.hk, pluo.lhi@gmail.com.
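The distinction between static and behavior-level validation can be made concrete with the toaster example from Figure 1. The following toy world model is an invented illustration (not the paper's code): its `press` action is syntactically valid and would pass a parser or type check, yet it violates a dynamics constraint that only a simulation rollout exposes.

```python
# Illustrative only: a toy "toaster" world model whose bug passes static
# validation but is caught by simulation-based (behavior-level) testing.

class ToasterWorld:
    def __init__(self):
        self.state = {"power": False, "bread": False, "toasted": False}

    def plug_in(self):
        self.state["power"] = True

    def insert_bread(self):
        self.state["bread"] = True

    def press(self):
        # BUG: ignores the power precondition. Syntactically valid code,
        # so parsers/validators (static checks) accept it unchanged.
        if self.state["bread"]:
            self.state["toasted"] = True

def simulate(actions):
    """Simulation-based validation: roll out actions, check an invariant."""
    world = ToasterWorld()
    for act in actions:
        getattr(world, act)()
        # Invariant: bread cannot be toasted while the toaster is unpowered.
        if world.state["toasted"] and not world.state["power"]:
            return f"behavior-level error after '{act}': toasted without power"
    return "ok"

print(simulate(["insert_bread", "press"]))            # exposes the bug
print(simulate(["plug_in", "insert_bread", "press"]))  # valid trajectory
```

This is exactly the class of error a fixed verification interface misses: the faulty rollout only arises under a particular interactive action sequence, which is why the paper's Testing Team adapts its tests rather than replaying a fixed trajectory set.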


Reference

This content is AI-processed based on open access ArXiv data.
