Improving Language Agents through BREW

Reading time: 5 minutes
...

📝 Original Info

  • Title: Improving Language Agents through BREW
  • ArXiv ID: 2511.20297
  • Date: 2025-11-26
  • Authors: **Shashank Kirtania, Param Biyani, Priyanshu Gupta, Yasharth Bajpai, Roshni Iyer, Sumit Gulwani, Gustavo Soares (Microsoft)**

📝 Abstract

Large Language Model (LLM)-based agents are increasingly applied to tasks requiring structured reasoning, tool use, and environmental adaptation, such as data manipulation, multistep planning, and computer-use automation. However, despite their versatility, current training paradigms that optimize model weights, such as PPO and GRPO, remain relatively impractical due to their high computational overhead for rollout convergence. In addition, the resulting agent policies are difficult to interpret, adapt, or incrementally improve. To address this, we investigate creating and refining structured memory of an agent's experiential learning from its environment as an alternative route to agent optimization. We introduce BREW (Bootstrapping expeRientially-learned Environmental knoWledge), a framework for optimizing agents on downstream tasks via knowledge base (KB) construction and refinement. In our formulation, we introduce an effective method for partitioning agent memory for more efficient retrieval and refinement. BREW uses task graders and behavior rubrics to learn insights while leveraging state-space search to ensure robustness to the noise and non-specificity of natural language. Empirical results on real-world, domain-grounded benchmarks -- OSWorld, $\tau^2$Bench, and SpreadsheetBench -- show that BREW achieves a $10-20\%$ improvement in task precision and a $10-15\%$ reduction in API/tool calls, leading to faster execution, all while maintaining computational efficiency on par with base models. Unlike prior work where memory is treated as static context, we establish the KB as a modular and controllable substrate for agent optimization -- an explicit lever for shaping behavior in a transparent, interpretable, and extensible manner.
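To make the abstract's notion of a concept-partitioned experiential memory concrete, here is a minimal sketch assuming a simple Python representation; the names (`ConceptEntry`, `KnowledgeBase`, `retrieve`) and the keyword-overlap retrieval are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a concept-partitioned KB of experiential insights.
# Class and method names are illustrative, not taken from the paper.

@dataclass
class ConceptEntry:
    concept: str                                  # e.g. "Compress and Extract Files"
    insights: list[str] = field(default_factory=list)

class KnowledgeBase:
    def __init__(self):
        self.entries: dict[str, ConceptEntry] = {}

    def add(self, concept: str, insight: str) -> None:
        """Store an insight under its concept partition."""
        self.entries.setdefault(concept, ConceptEntry(concept)).insights.append(insight)

    def retrieve(self, query: str, top_k: int = 3) -> list[ConceptEntry]:
        """Naive keyword-overlap retrieval; a real system would likely use embeddings."""
        q = set(query.lower().split())
        scored = sorted(
            self.entries.values(),
            key=lambda e: len(q & set(e.concept.lower().split())),
            reverse=True,
        )
        return scored[:top_k]

kb = KnowledgeBase()
kb.add("Compress and Extract Files",
       "To compress: select files or folder -> right-click -> Compress... -> choose .zip or .tar.gz")
print([e.concept for e in kb.retrieve("zip a folder of files")])
```

Partitioning insights under concepts in roughly this way is what the abstract credits for more efficient retrieval and refinement of the agent's memory.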

💡 Deep Analysis

📄 Full Content

Improving Language Agents through BREW

Shashank Kirtania, Param Biyani, Priyanshu Gupta, Yasharth Bajpai, Roshni Iyer, Sumit Gulwani, Gustavo Soares
Microsoft
{t-skirtania,t-pbiyani,priyansgupta,ybajpai,iyerroshni,sumitg,gustavo.soares}@microsoft.com

1 Introduction

Large Language Model (LLM) based agents are rapidly being deployed for structured reasoning, tool use, and autonomous interaction in real-world environments [16]. From computer-use and spreadsheet automation to software engineering pipelines, these agents drive tasks such as multi-step planning, data manipulation, and adaptive workflows [24, 13, 37, 2, 22]. For example, a language agent might help automate a multi-step workflow like collecting data from different sources, cleaning or validating it, and then uploading it onto a dedicated server, all while adjusting its plan if the format or structure of the data changes unexpectedly [36, 40, 28, 3]. Yet, despite these successes, top-performing agents generally score underwhelmingly on challenging real-world benchmarks, well behind human experts [39, 4, 32, 19]. As an example, consider the following scenario:

Case Study on Computer Use Agents: a computer-use agent in an Ubuntu environment tasked with automating software installation across multiple sessions.

{agent-alignment, correctness} "Would the behavior and edits of the agent remain robust if the same task were performed on a slightly different system setup?" …

Example Rubric: "How well does the agent handle unexpected states or failures in the environment? Does it adapt or recover?"
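The case study and example rubric suggest how a trajectory could be scored on both correctness and behavioral alignment. The sketch below is a guess at such a grading step, assuming hypothetical helpers `llm_judge` and `task_grader` and a simple weighted blend; none of these names are BREW's actual interfaces.

```python
# Hypothetical sketch of combining a task-specific grader with a behavior rubric.
# llm_judge and task_grader are assumed placeholders for illustration only.

RUBRIC = ("How well does the agent handle unexpected states or failures "
          "in the environment? Does it adapt or recover?")

def llm_judge(rubric: str, trajectory: list[str]) -> float:
    """Placeholder for an LLM call that scores a trajectory against a rubric in [0, 1]."""
    raise NotImplementedError

def task_grader(trajectory: list[str]) -> bool:
    """Placeholder for an environment-specific correctness check
    (e.g. the requested setting was actually enabled)."""
    raise NotImplementedError

def grade_trajectory(trajectory: list[str], alpha: float = 0.5) -> float:
    """Blend binary task success with rubric-based behavioral alignment."""
    correctness = 1.0 if task_grader(trajectory) else 0.0
    alignment = llm_judge(RUBRIC, trajectory)
    return alpha * correctness + (1 - alpha) * alignment
```

A blend like this would reward trajectories that not only complete the task but also behave robustly, which is the property the rubric above probes.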
Figure 1: BREW architecture overview using examples from the OSWorld dataset. Step 1 indicates the trajectory generation process with agent alignment to human-validated rubrics and correctness using a task-specific grader. Steps 2–4 indicate the Reflector Agent, which learns key concepts and corresponding insights from trajectories. Step 5 indicates the Integrator Agent, which integrates knowledge from the Reflector Agent to bootstrap the KB. We introduce Expand-and-Gather MCTS for finding the best KB configuration by a reward-guided search …
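The caption's Expand-and-Gather MCTS is a reward-guided search over KB configurations; as a simplified stand-in, the sketch below performs a greedy hill-climb over candidate KB states. `propose_edits` and `evaluate_kb` are hypothetical placeholders, and the reading of "Expand" and "Gather" in the comments is an assumption rather than the paper's definition.

```python
# Simplified, hypothetical stand-in for a reward-guided search over KB states.
# The real method is MCTS-based; this greedy loop only illustrates the idea.

def propose_edits(kb_state: dict, k: int = 4) -> list[dict]:
    """Expand: generate k candidate KB states (e.g. add, merge, or drop insights)."""
    raise NotImplementedError

def evaluate_kb(kb_state: dict, tasks: list) -> float:
    """Gather: run the agent with this KB on held-out tasks and return mean grader reward."""
    raise NotImplementedError

def refine_kb(kb_state: dict, tasks: list, iterations: int = 10) -> dict:
    best, best_reward = kb_state, evaluate_kb(kb_state, tasks)
    for _ in range(iterations):
        candidates = propose_edits(best)
        rewards = [evaluate_kb(c, tasks) for c in candidates]
        i = max(range(len(rewards)), key=rewards.__getitem__)
        if rewards[i] > best_reward:          # keep only KB states that improve reward
            best, best_reward = candidates[i], rewards[i]
    return best
```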

Reference

This content is AI-processed based on open access ArXiv data.
