Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning
Existing Tool-Integrated Reasoning (TIR) models have effectively extended the question-answering capabilities of LLMs by incorporating external tools. However, real-world scenarios present numerous open-ended problems where fixed tools often fail to meet task requirements. Furthermore, the lack of self-optimization mechanisms means that erroneous tool outputs can mislead the LLM’s responses, and the construction of existing tools entails significant manual effort, which constrains their applicability. Recognizing that the reasoning traces of LLMs encapsulate implicit problem-solving capabilities, we propose UCT, a novel training-free framework that transforms agents from tool users into tool creators by harvesting reasoning experiences and distilling them into reusable assets, enabling adaptive tool creation and self-updating during inference. We also introduce a memory consolidation mechanism that maintains the tool library, ensuring high reusability of retained experiential memory for subsequent reasoning tasks. This automated tool construction paradigm continuously improves tool quality during reasoning, allowing the overall agent system to progress without additional training. Extensive experiments demonstrate that our method serves as a novel paradigm for enhancing the capabilities of TIR models. In particular, significant performance gains of +20.86%$\uparrow$ and +23.04%$\uparrow$ on benchmarks spanning multi-domain mathematical and scientific reasoning tasks validate the agent’s self-evolving capability.
💡 Research Summary
The paper addresses a fundamental limitation of current Tool‑Integrated Reasoning (TIR) systems: they rely on a static set of hand‑crafted or ad‑hoc tools, which hampers generalization, introduces error propagation, and requires costly manual engineering. The authors observe that the reasoning traces produced by large language models (LLMs) contain latent problem‑solving knowledge that can be harvested and turned into reusable assets. To this end they propose UCT (Training‑Free Experience Reuse), a framework that enables an LLM‑based agent to evolve from a mere tool user into a tool creator during inference, without any additional model training.
UCT consists of three tightly coupled components.
- Online Task Loop – built on the ReAct paradigm, the policy model receives the user query, the interaction history, and any observations from previously executed tools. At each step it decides among three actions: generate a thought, invoke an existing tool (core or previously created), or request a new tool. If a required tool is missing or fails, a “build ticket” is issued.
- Online Build Loop – triggered by a build ticket, this isolated pipeline generates the tool’s source code along with a corresponding test suite and executes both inside a sandbox. The generated code is then sent to a critic (a separate code‑review model). If the tests or the critic’s review fail, the feedback is returned to the ReAct model, which iteratively refines the code until it passes both functional testing and quality review. Once accepted, the tool is registered in the “created tools” library and can be immediately invoked for the current task.
- Offline Memory Consolidation – runs asynchronously on usage logs. It merges duplicate tools, classifies them by functionality, prunes rarely used assets, and updates metadata. This process keeps the tool library compact, searchable, and ready for future queries, effectively turning transient code snippets into long‑term experiential knowledge.
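The online task loop described above can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the class names (`Agent`, `BuildTicket`), the action strings, and the ticket fields are all hypothetical stand-ins for the ReAct-style step that chooses among emitting a thought, invoking an existing tool, or issuing a build ticket when a tool is missing.

```python
# Hypothetical sketch of UCT's online task loop; all names are
# illustrative, not the paper's API.
from dataclasses import dataclass, field


@dataclass
class BuildTicket:
    name: str  # requested tool name
    spec: str  # natural-language description of the required behavior


@dataclass
class Agent:
    tools: dict = field(default_factory=dict)   # core + previously created tools
    history: list = field(default_factory=list) # interaction history / observations

    def step(self, action: str, payload):
        """Dispatch one ReAct-style step; returns an observation or a ticket."""
        if action == "thought":
            self.history.append(("thought", payload))
            return None
        if action == "call_tool":
            name, args = payload
            if name not in self.tools:
                # Missing tool: issue a build ticket for the build loop
                # instead of failing outright.
                return BuildTicket(name=name, spec=f"tool '{name}' for args {args}")
            obs = self.tools[name](*args)
            self.history.append(("observation", obs))
            return obs
        raise ValueError(f"unknown action: {action}")


agent = Agent(tools={"add": lambda a, b: a + b})
print(agent.step("call_tool", ("add", (2, 3))))                   # 5
print(isinstance(agent.step("call_tool", ("area", (4,))), BuildTicket))  # True
```

In a full system, the returned `BuildTicket` would hand control to the online build loop, and the newly registered tool would then be callable in the same episode.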
The key novelty lies in the training‑free nature of the system: all new capabilities arise from inference‑time experience, not from gradient‑based fine‑tuning. The framework also enforces strict quality control (sandbox execution + critic review), which mitigates the instability often seen in prior ad‑hoc code generation approaches.
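The sandbox-plus-critic quality gate can be illustrated with a short sketch. Everything here is an assumption for illustration: `run_in_sandbox` stands in for real sandboxed execution by `exec`-ing in an isolated namespace, `critic` is a toy quality check (a real critic would be a code-review model), and the candidate list simulates the iterative refinement that would normally come from feeding failure feedback back to the generator.

```python
# Illustrative quality-control loop: sandboxed test execution followed by
# a critic review. Names and checks are hypothetical stand-ins.
def run_in_sandbox(source: str, test: str) -> tuple[bool, str]:
    ns: dict = {}
    try:
        exec(source, ns)  # isolated namespace stands in for a real sandbox
        exec(test, ns)    # generated test suite; raises on failure
        return True, "ok"
    except Exception as e:
        return False, repr(e)


def critic(source: str) -> tuple[bool, str]:
    # Toy stand-in for a code-review model: require a docstring.
    ok = '"""' in source
    return ok, "ok" if ok else "missing docstring"


def build_tool(candidates, test):
    """Accept the first candidate passing both the tests and the critic."""
    for source in candidates:
        passed, feedback = run_in_sandbox(source, test)
        if not passed:
            continue  # feedback would drive the next refinement round
        approved, feedback = critic(source)
        if approved:
            ns: dict = {}
            exec(source, ns)
            return ns["circle_area"]  # register the accepted tool
    return None


buggy = "def circle_area(r):\n    return 3.14159 * r\n"
fixed = 'def circle_area(r):\n    """Area of a circle."""\n    return 3.14159 * r * r\n'
test = "assert abs(circle_area(2) - 12.56636) < 1e-3"

tool = build_tool([buggy, fixed], test)
print(round(tool(1), 3))  # 3.142
```

The first candidate fails its functional test inside the sandbox, so only the corrected version reaches the critic and gets registered, mirroring the refine-until-accepted loop the paper describes.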
Experiments are conducted on a newly released benchmark, TRBench (959 multimodal tool‑use problems), as well as standard mathematical, scientific, and visual‑question‑answering datasets. UCT achieves absolute accuracy gains of 20.86% and 23.04% over baseline ReAct and Chain‑of‑Thought methods across domains. Qualitative analysis shows that the generated tools are reusable: a tool created for one geometry problem can be applied to other geometry tasks without re‑generation. Moreover, the majority of tools pass their tests on the first round, indicating the effectiveness of the built‑in verification loop.
Limitations are acknowledged. Current tool generation is focused on Python code, leaving non‑code APIs (e.g., REST calls, database queries) under‑explored. The sandbox environment introduces additional computational overhead and security considerations. Future work aims to broaden the modality of tools (image processing, simulation), integrate meta‑reinforcement learning to improve the tool‑creation policy, and explore more scalable memory‑management strategies.
In summary, UCT demonstrates that an LLM‑driven agent can self‑evolve by harvesting its own reasoning experience, automatically constructing high‑quality tools, and consolidating them into a persistent library. This training‑free, self‑optimizing paradigm opens a pathway toward more autonomous, adaptable AI systems that continuously improve without external supervision.