Outcome-Conditioned Reasoning Distillation for Resolving Software Issues

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Software issue resolution in large repositories is a long-range decision process: choices made during localization shape the space of viable edits, and missteps can compound into incorrect patches. Despite this, many LLM-based repair pipelines still operate in a reset-and-solve manner, producing fresh reasoning for every new issue instead of carrying forward what worked in past fixes. This is wasteful because repositories routinely contain earlier issues with overlapping structure, failure modes, or constraints, where prior repair experience could provide useful guidance. Existing approaches typically harvest this signal through forward-time trial procedures, such as repeated refinement or search, incurring high inference cost while still risking divergence from the eventual correct patch. We present an Outcome-Conditioned Reasoning Distillation (O-CRD) framework that uses resolved in-repository issues with verified patches as supervision. Starting from a historical fix, the method reconstructs a stage-wise repair trace backward from the verified outcome, then reuses the distilled guidance at inference time to steer file/function localization and patch synthesis, without fine-tuning or online search. On SWE-Bench Lite, this approach increases Pass@1 by 10.4% with GPT-4o, 8.6% with DeepSeek-V3, and 10.3% with GPT-5, indicating that outcome-conditioned reuse of verified repairs can replace costly forward exploration for software issue resolution.


💡 Research Summary

The paper tackles the inefficiency inherent in current large‑language‑model (LLM) based software bug repair pipelines, which typically treat each new issue as an isolated problem and regenerate a full reasoning trace from scratch. This “reset‑and‑solve” approach ignores the wealth of information embedded in previously resolved bugs within the same repository—information that often shares code structure, failure modes, or constraints with new bugs. Existing attempts to reuse such historical knowledge rely on forward‑time exploration methods such as iterative refinement or Monte‑Carlo Tree Search (MCTS). While these methods can recover from early mistakes, they incur substantial inference cost, generate many speculative reasoning paths, and still risk diverging from the correct patch.

To address these shortcomings, the authors propose Outcome‑Conditioned Reasoning Distillation (O‑CRD), a three‑stage framework that extracts reusable, outcome‑conditioned repair guidance from verified past fixes and injects it as in‑context guidance during the repair of new bugs, without any fine‑tuning or online search.

Stage 1 – Repository‑Level Exemplar Mining
O‑CRD first builds a repository‑specific historical bug dataset containing resolved issues, their ground‑truth patches, and timestamps. For a target bug, it retrieves candidate exemplars using a two‑step process: (1) lightweight textual similarity via All‑MiniLM‑L6‑v2 embeddings to obtain the top‑5 most similar past issues, and (2) a semantic compatibility assessment performed by an LLM acting as a judge. The LLM evaluates each candidate against a rubric that captures failure symptoms, affected components, and implied repair intent, selecting the single most semantically aligned exemplar while ensuring temporal leakage is avoided (only bugs fixed before the target’s creation time are considered).
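The retrieval step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed` stands in for an All-MiniLM-L6-v2 sentence encoder (in practice, e.g., the `sentence-transformers` library), and the dictionary keys (`text`, `created_at`, `fixed_at`) are assumed field names. The temporal-leakage filter and top-k cosine ranking follow the description in Stage 1; the subsequent LLM-judge selection is not shown.

```python
def retrieve_candidates(target, history, embed, k=5):
    """Stage 1a (sketch): lightweight textual retrieval of exemplar bugs.

    `embed(text)` should return a dense vector (here, any sequence of floats);
    it is a placeholder for an All-MiniLM-L6-v2 encoder. Only issues fixed
    before the target's creation time are eligible, avoiding temporal leakage.
    Returns the top-k eligible issues by cosine similarity to the target.
    """
    # Temporal-leakage filter: keep only issues resolved before the target existed.
    eligible = [h for h in history if h["fixed_at"] < target["created_at"]]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    target_vec = embed(target["text"])
    ranked = sorted(
        eligible,
        key=lambda h: cosine(target_vec, embed(h["text"])),
        reverse=True,
    )
    return ranked[:k]
```

The top-k candidates would then be passed to the LLM judge, which applies the rubric over failure symptoms, affected components, and implied repair intent to pick a single exemplar.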

Stage 2 – Exemplar Guardian
Because superficial similarity does not guarantee that the reasoning from a past bug will transfer, O‑CRD introduces a conservative filter called the Exemplar Guardian. This LLM‑based evaluator scores the candidate on five dimensions: Root‑Cause Similarity, Causal‑Chain Transferability, Fix‑Strategy Applicability, Contextual Alignment, and Debugging‑Technique Relevance. It outputs a brief justification, a confidence estimate, and a continuous transferability score.
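The Guardian's aggregation can be sketched as below. The five dimension names come from the paper; everything else here is an illustrative assumption: per-dimension scores in [0, 1] as elicited from the LLM judge, a mean aggregation, and a 0.6 acceptance threshold are not the paper's exact rule.

```python
# Dimension names as described in the summary; scoring scheme is assumed.
GUARDIAN_DIMENSIONS = [
    "root_cause_similarity",
    "causal_chain_transferability",
    "fix_strategy_applicability",
    "contextual_alignment",
    "debugging_technique_relevance",
]

def guardian_decision(scores, threshold=0.6):
    """Aggregate per-dimension scores (each assumed in [0, 1]) into a single
    transferability score and an accept/reject decision for the exemplar.
    Mean aggregation and the 0.6 threshold are illustrative choices."""
    missing = [d for d in GUARDIAN_DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing Guardian dimensions: {missing}")
    transferability = sum(scores[d] for d in GUARDIAN_DIMENSIONS) / len(GUARDIAN_DIMENSIONS)
    return transferability, transferability >= threshold
```

Under this sketch, an exemplar that scores well on surface similarity but poorly on Causal‑Chain Transferability can still be rejected, which is the conservative behavior the Guardian is designed for.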

