SWE-Spot: Building Small Repo-Experts with Repository-Centric Learning
The deployment of coding agents in privacy-sensitive and resource-constrained environments drives demand for capable open-weight Small Language Models (SLMs). However, SLMs suffer from a fundamental capability gap: unlike frontier large models, they lack the strong inference-time generalization needed to work with complicated, unfamiliar codebases. We identify that the prevailing Task-Centric Learning (TCL) paradigm, which scales exposure across disparate repositories, fails to address this limitation. In response, we propose Repository-Centric Learning (RCL), a paradigm shift that prioritizes vertical repository depth over horizontal task breadth: SLMs must internalize the “physics” of a target software environment through parametric knowledge acquisition, rather than attempting to recover it via costly inference-time search. Following this paradigm, we design a four-unit Repository-Centric Experience that transforms static codebases into interactive learning signals, and use it to train SWE-Spot-4B, a family of highly compact repo-specialized expert models. SWE-Spot-4B breaks established scaling trends, outperforming much larger open-weight models (e.g., CWM by Meta, Qwen3-Coder-30B) and matching or surpassing efficiency-focused commercial models (e.g., GPT-4.1-mini, GPT-5-nano) across multiple SWE tasks. Further analysis reveals that RCL yields higher training sample efficiency and lower inference costs, underscoring that for building efficient intelligence, repository mastery is a distinct and necessary dimension that complements general coding capability.
💡 Research Summary
The paper addresses the growing demand for open‑weight small language models (SLMs) that can be deployed locally in privacy‑sensitive and resource‑constrained environments. While large proprietary models excel at code‑related tasks, they are costly and unsuitable for on‑premise use. Existing efforts to close the capability gap have largely followed a Task‑Centric Learning (TCL) paradigm: they collect massive amounts of task‑specific data (e.g., GitHub issue‑resolution trajectories) from many unrelated repositories and train models to become general‑purpose bug‑fixers. The authors argue that this approach fails for SLMs because such models lack the inference‑time search and reasoning power of their larger counterparts. Consequently, they over‑fit to surface‑level patterns (patch templates, shell syntax) and cannot internalize the deep, repository‑specific semantics needed to navigate unfamiliar codebases efficiently.
To overcome this limitation, the authors propose Repository‑Centric Learning (RCL), a paradigm shift that emphasizes vertical depth within a single target repository rather than horizontal breadth across many repositories. RCL mirrors how human developers acquire expertise: by repeatedly interacting with the same codebase, understanding its architecture, evolution history, and runtime behavior. The paper operationalizes RCL through four distinct units of Repository‑Centric Experience (RCX):
- Software Design – the model actively explores a module and produces a structured report describing its functionality, design rationale, and interactions, thereby learning implicit architectural intent.
- Contextual Implementation – instead of simple Fill‑in‑the‑Middle, the model is given only a functional goal and must retrieve relevant cross‑file context before generating compliant code, fostering awareness of global conventions and APIs.
- Evolutionary Replay – historical commits and pull‑request histories are mined to re‑introduce past bugs; the model must diagnose and fix them, learning the repository’s evolution, design trade‑offs, and typical failure patterns.
- Semantic‑Runtime Alignment – using the same historical bugs, the model generates test cases that capture expected semantics, training it to detect and reconcile mismatches between specifications and actual runtime behavior.
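The Evolutionary Replay unit can be pictured as a simple mining pass over repository history: select past bug-fix commits, pair each with its parent (pre-fix) snapshot, and ask the model to rediscover the fix. The sketch below illustrates this idea under stated assumptions; the `Commit` record, the `LOG` data, and the heuristic of matching commit messages starting with "fix" are illustrative placeholders, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """Minimal stand-in for a git commit record (hypothetical fields)."""
    sha: str
    message: str
    parent: Optional[str]


# Illustrative commit history, newest first.
LOG = [
    Commit("c3", "Fix crash in parser when input is empty", "c2"),
    Commit("c2", "Add CLI flag for verbose output", "c1"),
    Commit("c1", "Initial commit", None),
]


def mine_replay_samples(log):
    """Turn bug-fix commits into replay tasks.

    For each commit whose message suggests a bug fix, the parent
    snapshot serves as the 'buggy' starting state the model must
    diagnose, and the commit itself is the reference solution.
    """
    samples = []
    for c in log:
        if c.parent and c.message.lower().startswith("fix"):
            samples.append({
                "buggy_state": c.parent,    # checkout target: code before the fix
                "reference_fix": c.sha,     # ground-truth patch for supervision
                "issue_text": c.message,    # proxy for the original bug report
            })
    return samples
```

A real pipeline would walk `git log` of the target repository and use richer signals (linked issues, changed test files) to identify fix commits; the structure of the resulting training sample is the same.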
These experiences are generated automatically using a strong teacher model (Gemini‑2.5‑Pro) and then used to train a 4‑billion‑parameter student model (Qwen3‑4B‑Instruct‑2507) via supervised fine‑tuning. The authors introduce a Repository‑Centric Evaluation (RCE) suite that evaluates a model on four software‑engineering tasks (issue resolution, feature implementation, test generation, codebase QA) within the same repository, following a strict temporal protocol. Experiments on four popular open‑source projects (e.g., Django) and two internal codebases show that SWE‑Spot‑4B consistently outperforms much larger open‑weight models (CWM, Qwen3‑Coder‑30B) by 12‑18 % absolute accuracy and matches or exceeds efficiency‑focused commercial models such as GPT‑5‑nano and GPT‑4.1‑mini. Moreover, SWE‑Spot achieves a 30 % reduction in inference token usage and demonstrates higher sample efficiency: TCL‑trained models need roughly five times more data to reach comparable performance.
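The strict temporal protocol mentioned above amounts to a leakage guard: every evaluation item must post-date everything the model was trained on. A minimal sketch of such a split, assuming hypothetical task records with a `created` date (the cutoff date and task data are illustrative, not from the paper):

```python
from datetime import date

# Illustrative task records; in practice these would be issues, PRs,
# or commits mined from the target repository.
TASKS = [
    {"id": "t1", "created": date(2023, 5, 1)},
    {"id": "t2", "created": date(2024, 2, 10)},
    {"id": "t3", "created": date(2024, 6, 3)},
]

CUTOFF = date(2024, 1, 1)  # training data may only predate this


def temporal_split(tasks, cutoff):
    """Partition tasks so that evaluation items strictly post-date all
    training data, ruling out leakage of solutions (e.g., the actual
    fix commit) into the training corpus."""
    train = [t for t in tasks if t["created"] < cutoff]
    test = [t for t in tasks if t["created"] >= cutoff]
    return train, test
```

The same date-based partition applies to the repository snapshot itself: the model trains on the codebase as it existed before the cutoff and is evaluated on tasks filed afterward.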
Ablation studies reveal that each RCX unit contributes synergistically; removing any unit degrades performance, and the test‑generation unit notably improves issue‑resolution accuracy, confirming cross‑task transfer. The gains persist even when strong context retrieval and test‑time scaling are added, indicating that RCL induces deep parametric knowledge rather than merely leveraging larger context windows.
The paper concludes that RCL is a necessary complement to TCL for small models, enabling them to internalize repository‑specific knowledge and thereby close the capability gap with large models. The authors release code, data, and model weights, inviting further research on multi‑repository extensions, meta‑learning for rapid adaptation, and mitigation of teacher‑model bias. This work establishes that “small models can master a codebase if they are taught to master the codebase,” opening a path toward efficient, locally deployable coding agents.