SWE-Exp: Experience-Driven Software Issue Resolution

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

Recent advances in large language model (LLM) agents have shown remarkable progress in software issue resolution, leveraging advanced techniques such as multi-agent collaboration and Monte Carlo Tree Search (MCTS). However, current agents act as memoryless explorers, treating each problem separately without retaining or reusing knowledge from previous repair experiences. This leads to redundant exploration of failed trajectories and missed chances to adapt successful issue resolution methods to similar problems. To address this problem, we introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories, enabling continuous learning across issues. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Specifically, it extracts reusable issue resolution knowledge at different levels, from high-level problem comprehension to specific code changes. Experiments show that SWE-Exp achieves a Pass@1 resolution rate of 73.0% on SWE-Bench Verified using the state-of-the-art LLM Claude 4 Sonnet, significantly outperforming prior results under other agent frameworks. Our approach establishes a new paradigm in which automated software engineering agents systematically accumulate and leverage repair expertise, fundamentally shifting from trial-and-error exploration to strategic, experience-driven issue resolution.


💡 Research Summary

The paper addresses a critical limitation of current large language model (LLM)‑based software issue‑resolution agents: they operate as memoryless explorers, treating each bug in isolation and discarding knowledge gained from previous repair attempts. This leads to redundant exploration of failed trajectories, missed opportunities to reuse successful strategies, and an inability to evolve more sophisticated problem‑solving tactics over time.

To overcome these shortcomings, the authors propose SWE‑Exp, an experience‑driven framework that systematically captures, distills, and reuses knowledge from both successful and failed repair attempts. The system builds a multi‑faceted “experience bank” that stores information at three abstraction levels: (1) high‑level problem understanding (e.g., root‑cause summaries), (2) fault‑localization patterns (common code‑level symptoms and their diagnostic cues), and (3) concrete modification strategies (code snippets, defensive‑copy patterns, etc.).
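To make the three-level structure concrete, here is a minimal Python sketch of what a single experience-bank entry might hold. The class and field names (`Experience`, `ExperienceBank`, `comprehension`, `localization`, `modification`, `tags`) are hypothetical and chosen only for illustration; the paper's actual storage format may differ. The entries reuse the shared-attrs scenario discussed in this summary, and the failed entry shows how negative experience can sit alongside positive experience.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass(frozen=True)
class Experience:
    """Hypothetical experience-bank entry covering the three abstraction levels."""
    issue_id: str
    outcome: Outcome
    comprehension: str  # level 1: high-level problem understanding / root cause
    localization: str   # level 2: fault-localization cue
    modification: str   # level 3: concrete modification strategy
    tags: tuple[str, ...] = ()


@dataclass
class ExperienceBank:
    entries: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.entries.append(exp)

    def by_outcome(self, outcome: Outcome) -> list[Experience]:
        # Positive and negative experiences are stored together and filtered on demand.
        return [e for e in self.entries if e.outcome is outcome]


bank = ExperienceBank()
bank.add(Experience(
    issue_id="example-1",
    outcome=Outcome.SUCCESS,
    comprehension="Widgets share one attrs dict, so a mutation leaks across instances.",
    localization="shared attrs dict mutated",
    modification="make defensive copy of attrs before mutation",
    tags=("mutable-state",),
))
bank.add(Experience(
    issue_id="example-2",
    outcome=Outcome.FAILURE,
    comprehension="Patched only the widget named in the report.",
    localization="symptom-level fix in a single widget",
    modification="avoid copying attrs in only one widget",
))
print(len(bank.by_outcome(Outcome.SUCCESS)))  # 1
```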

Methodology Overview

  1. Trajectory Collection – Using a dual‑agent architecture (an Instructor and an Assistant) combined with Monte Carlo Tree Search (MCTS), the authors generate repair trajectories across 500 real‑world GitHub issues from the SWE‑Bench Verified benchmark. Each trajectory is recorded as a sequence of tuples (dₜ, aₜ, sₜ₊₁, fₜ), where dₜ denotes the high‑level directive, aₜ the concrete action (e.g., edit, test), sₜ₊₁ the resulting repository state, and fₜ the environment feedback. Trajectories are annotated with success/failure labels and, for failures, detailed cause categories (localization error, misguided modification, insufficient comprehension).

  2. Experience Extraction – Raw trajectories are noisy and lengthy; therefore, the authors apply a hybrid extraction pipeline. LLM‑based summarization condenses high‑level directives, while rule‑based pattern matching identifies recurring fault‑localization cues and code‑mutation patterns. The output is a set of reusable templates: (i) a concise problem‑cause description, (ii) a localization hint (e.g., “shared attrs dict mutated”), and (iii) a concrete code transformation (e.g., “make defensive copy of attrs before mutation”). Both positive (successful) and negative (failed) experiences are retained, enabling the system to learn what not to do as well as what works.

  3. Experience Retrieval – When a new issue p′ arrives, the system queries the experience bank using a combination of vector similarity (derived from issue title, description, and code context) and categorical matching of failure causes. The retrieval function returns a small set E′ of the most relevant experiences, each containing a strategy, applicability conditions, and a code snippet.

  4. Experience‑Guided Resolution – The Instructor agent consumes E′ to formulate a high‑level plan (e.g., “focus on mutable default arguments”). The Assistant agent then executes low‑level actions, guided by the retrieved templates, while continuously feeding back test results to the Instructor for plan refinement. This closed loop reduces blind exploration and steers the search toward previously validated solutions.
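Steps 1 and 2 can be sketched together: a trajectory is a list of (dₜ, aₜ, sₜ₊₁, fₜ) steps with an outcome label and, for failures, one of the three cause categories, and a distillation pass pairs an LLM summary with a rule-based cue extractor. Everything here is an illustrative assumption rather than the authors' pipeline: `summarize` is a placeholder for an LLM call (a trivial lambda in the usage lines), the single regex rule that records which files `edit` actions touched is hypothetical, and the file paths are invented.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FailureCause(Enum):
    LOCALIZATION_ERROR = "localization error"
    MISGUIDED_MODIFICATION = "misguided modification"
    INSUFFICIENT_COMPREHENSION = "insufficient comprehension"


@dataclass(frozen=True)
class Step:
    directive: str   # d_t: high-level directive from the Instructor
    action: str      # a_t: concrete action, e.g. an edit or a test run
    next_state: str  # s_{t+1}: resulting repository state (a short summary here)
    feedback: str    # f_t: environment feedback such as test output


@dataclass
class Trajectory:
    issue_id: str
    steps: list[Step]
    resolved: bool
    cause: Optional[FailureCause] = None  # set only when resolved is False


# Hypothetical rule: an action of the form "edit <path>" localizes the fault to <path>.
EDIT_RULE = re.compile(r"^edit\s+(\S+)")


def localization_cues(traj: Trajectory) -> list[str]:
    """Rule-based pass: files touched by edit actions, in first-seen order."""
    files: list[str] = []
    for step in traj.steps:
        match = EDIT_RULE.match(step.action)
        if match and match.group(1) not in files:
            files.append(match.group(1))
    return files


def distill(traj: Trajectory, summarize: Callable[[str], str]) -> dict:
    """Condense one labeled trajectory into a compact experience record.

    `summarize` is a placeholder for an LLM call over the joined directives.
    """
    directives = " | ".join(step.directive for step in traj.steps)
    return {
        "issue_id": traj.issue_id,
        "polarity": "positive" if traj.resolved else "negative",
        "cause": traj.cause.value if traj.cause is not None else None,
        "comprehension": summarize(directives),
        "localization": localization_cues(traj),
    }


# Usage with invented data and a trivial stand-in "summarizer" (keeps the last directive).
traj = Trajectory(
    issue_id="example-1",
    steps=[
        Step("Inspect how widgets build attrs", "view forms/widgets.py", "unchanged", "ok"),
        Step("Copy attrs before mutating", "edit forms/widgets.py", "patched", "tests pass"),
    ],
    resolved=True,
)
record = distill(traj, summarize=lambda text: text.split(" | ")[-1])
print(record["localization"], record["polarity"])  # ['forms/widgets.py'] positive
```

Keeping failed trajectories in the same record format, with `polarity` set to "negative" and a cause attached, is what lets the bank encode what not to do as well as what works.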
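The scoring in step 3 can be approximated as below. This is a sketch under stated assumptions: token-count cosine similarity stands in for whatever vector representation the system derives from the issue title, description, and code context; categorical matching is modeled as a fixed bonus per shared category tag (`tag_bonus` and the tag names are hypothetical, not values from the paper); and the three bank entries are made up.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a crude stand-in for an issue embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(issue_text: str, issue_tags: set[str], experiences: list[dict],
             k: int = 2, tag_bonus: float = 0.2) -> list[dict]:
    """Return the k experiences with the highest similarity plus category-match score."""
    query = bag_of_words(issue_text)

    def score(exp: dict) -> float:
        similarity = cosine(query, bag_of_words(exp["text"]))
        return similarity + tag_bonus * len(issue_tags & exp["tags"])  # bonus per shared tag

    return sorted(experiences, key=score, reverse=True)[:k]


experiences = [
    {"id": "exp-a", "text": "widget attrs dict mutated across instances", "tags": {"mutable-state"}},
    {"id": "exp-b", "text": "queryset ordering lost after union", "tags": {"orm"}},
    {"id": "exp-c", "text": "checkbox widget attrs checked leaks to later widgets", "tags": {"mutable-state"}},
]
top = retrieve("CheckboxInput widget mutates attrs dict", {"mutable-state"}, experiences)
print([e["id"] for e in top])  # ['exp-a', 'exp-c']
```

An additive bonus keeps the text signal dominant while letting a category match separate near ties; a weighted product or a hard category filter would be equally plausible choices.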
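The closed loop in step 4 reduces to a bounded plan, act, observe cycle. In this hypothetical sketch the Instructor and Assistant are plain callables (in the system described here both are LLM agents operating on a real repository), a result string equal to "tests pass" marks success, and the round limit `max_rounds` is likewise an assumption.

```python
from typing import Callable


def resolve(
    issue: str,
    experiences: list[str],
    instructor: Callable[[str, list[str], list[str]], str],
    assistant: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[bool, list[str]]:
    """Bounded Instructor/Assistant loop driven by retrieved experiences."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        directive = instructor(issue, experiences, feedback)  # plan from E' plus feedback so far
        result = assistant(directive)                         # execute and run tests
        feedback.append(result)                               # feed results back to the Instructor
        if result == "tests pass":
            return True, feedback
    return False, feedback


# Toy stand-ins for the two LLM agents.
def toy_instructor(issue: str, experiences: list[str], feedback: list[str]) -> str:
    # Follow the top experience first; after a failure, broaden the plan.
    return experiences[0] if not feedback else "apply the same fix to sibling widgets"


def toy_assistant(directive: str) -> str:
    return "tests pass" if "sibling" in directive else "1 test failing"


ok, log = resolve("attrs leak between widgets", ["copy attrs in the reported widget"],
                  toy_instructor, toy_assistant)
print(ok, log)  # True ['1 test failing', 'tests pass']
```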

Empirical Results
Using Claude 4 Sonnet as the underlying LLM, SWE‑Exp achieves a Pass@1 of 73.0% on the SWE‑Bench Verified benchmark, a substantial improvement over prior state‑of‑the‑art MCTS‑based agents (≈55%). Ablation studies show that:

  • Incorporating experience‑driven fault localization alone raises localization accuracy by ~10 percentage points.
  • Adding experience‑driven code modification alone contributes ~7 percentage points.
  • The combination yields the full 18‑percentage‑point gain in Pass@1, confirming that both levels of experience are complementary.
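Pass@1, as the term is commonly used for SWE-bench-style evaluation, counts an issue as resolved only if the single patch the agent submits passes that issue's tests, so the metric is a simple ratio. Assuming the 500-instance SWE-Bench Verified set mentioned above, the reported 73.0% corresponds to 365 resolved issues, as this small check illustrates (the outcome list is synthetic).

```python
def pass_at_1(resolved: list[bool]) -> float:
    """Share of issues whose single submitted patch passes the issue's tests."""
    return sum(resolved) / len(resolved)


outcomes = [True] * 365 + [False] * 135  # synthetic: 365 of 500 issues resolved
print(f"{pass_at_1(outcomes):.1%}")  # 73.0%
```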

The authors also present a motivating example from the Django codebase (issue django‑11964) where a naïve symptom‑focused patch copies a dictionary in a single widget, whereas an experience‑informed agent recognizes the underlying design flaw (shared mutable attrs) and applies a defensive‑copy pattern that resolves the bug across all related widgets.
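The difference between the two patches in this example can be reproduced in a few lines. The classes below are hypothetical stand-ins, not Django's actual widget code: `BaseWidget` hands out the very `attrs` dict the caller passed in, so one checked checkbox makes every widget sharing that dict appear checked, while `SafeBaseWidget` applies the defensive-copy pattern both at construction and on every render.

```python
class BaseWidget:
    """Shared base that stores and returns the caller's attrs dict (no copying)."""
    def __init__(self, attrs=None):
        self.attrs = attrs if attrs is not None else {}

    def get_context(self, value):
        return {"attrs": self.attrs, "value": value}


class SafeBaseWidget(BaseWidget):
    """Defensive-copy base: copy on construction and again on every render."""
    def __init__(self, attrs=None):
        super().__init__(dict(attrs) if attrs is not None else None)

    def get_context(self, value):
        return {"attrs": dict(self.attrs), "value": value}


class CheckboxMixin:
    """Identical rendering logic for both variants; it mutates the context attrs."""
    def get_context(self, value):
        ctx = super().get_context(value)
        if value:
            ctx["attrs"]["checked"] = True  # harmless only if the base returned a copy
        return ctx


class CheckboxWidget(CheckboxMixin, BaseWidget):
    pass


class SafeCheckboxWidget(CheckboxMixin, SafeBaseWidget):
    pass


def render_all(widget_cls, values):
    shared = {"class": "toggle"}  # one dict passed to every widget
    widgets = [widget_cls(shared) for _ in values]
    checked = ["checked" in w.get_context(v)["attrs"] for w, v in zip(widgets, values)]
    return checked, shared


print(render_all(CheckboxWidget, [True, False, False]))
# ([True, True, True], {'class': 'toggle', 'checked': True})
print(render_all(SafeCheckboxWidget, [True, False, False]))
# ([True, False, False], {'class': 'toggle'})
```

Because `CheckboxMixin` is identical in both variants, the behavior change comes entirely from the shared base. That mirrors the point of the example: fixing the shared pattern, rather than copying inside one widget, is what covers all related widgets.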

Contributions and Impact

  • Framework Innovation: Introduces a systematic pipeline for collecting, structuring, and reusing repair experiences across heterogeneous repositories.
  • Experience‑Driven Agent Design: Demonstrates how a dual‑agent system can leverage retrieved knowledge to improve both fault localization and patch generation.
  • State‑of‑the‑Art Performance: Sets a new benchmark on SWE‑Bench Verified, validating the practical benefits of experience accumulation.

Future Directions
The paper suggests scaling the experience bank to cover more languages and frameworks, investigating lifelong learning mechanisms to update experiences online, and exploring cross‑project transfer learning to further reduce the need for extensive per‑project data.

In summary, SWE‑Exp transforms automated software engineering from isolated trial‑and‑error searches into a continuous learning process, where each solved (or unsolved) issue enriches a shared knowledge base that guides future repairs. This paradigm shift promises more efficient, robust, and scalable automated debugging solutions.

