SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs
Arihant Tripathy
SERC, IIIT-Hyderabad
Hyderabad, Telangana, India
arihant.tripathy@research.iiit.ac.in
Ch Pavan Harshit
SERC, IIIT-Hyderabad
Hyderabad, Telangana, India
pavan.harshit@research.iiit.ac.in
Karthik Vaidhyanathan
SERC, IIIT-Hyderabad
Hyderabad, Telangana, India
karthik.vaidhyanathan@iiit.ac.in
Abstract
Context. Autonomous agents powered by Large Language Models (LLMs) are increasingly used for software engineering, but their reliance on large, proprietary models limits deployment on local hardware. This has spurred interest in Small Language Models (SLMs), but their practical effectiveness and efficiency within complex agentic frameworks for automated issue resolution remain poorly understood.
Goal. We investigate the performance, energy efficiency, and resource consumption of four leading agentic issue resolution frameworks when deliberately constrained to using SLMs. Our goal is to understand the viability of these systems for this task in resource-limited settings and characterize the resulting trade-offs.
Method. We conduct a controlled evaluation of four leading agentic frameworks (SWE-Agent, OpenHands, Mini SWE Agent, AutoCodeRover) using two SLMs (Gemma-3 4B, Qwen-3 1.7B) on the SWE-bench Verified Mini benchmark. On fixed hardware, we measure energy, duration, token usage, and memory over 150 runs per configuration.
Results. We find that framework architecture is the primary driver of energy consumption. The most energy-intensive framework, AutoCodeRover (Gemma), consumed 9.4x more energy on average than the least energy-intensive, OpenHands (Gemma). However, this energy is largely wasted. Task resolution rates were near-zero (4% for AutoCodeRover, 0% for all others), demonstrating that current frameworks, when paired with SLMs, consume significant energy on unproductive reasoning loops. The SLM’s limited reasoning was the bottleneck for success, but the framework’s design was the bottleneck for efficiency.
Conclusions. Current agentic frameworks, designed for powerful LLMs, fail to operate efficiently with SLMs. We find that framework architecture is the primary driver of energy consumption, but this energy is largely wasted due to the SLMs’ limited reasoning. Achieving viable, low-energy solutions requires a paradigm shift from passive orchestration to new architectures that actively manage the SLM’s weaknesses.
CCS Concepts
• Software and its engineering → Software creation and management; • Hardware → Impact on the environment; Power estimation and optimization; • General and reference → Empirical studies; Experimentation.
Keywords
Agentic Issue Resolution, Empirical Study, Energy Efficiency, Small Language Models
1 Introduction
Autonomous software engineering agents, powered by Large Language Models (LLMs), have emerged as a transformative paradigm in software development [9, 21], demonstrating impressive capabilities in resolving real-world code issues on benchmarks like SWE-bench. However, this success comes at a cost: massive, cloud-hosted models create significant barriers to local deployment due to their high computational costs and energy consumption [22]. As the environmental impact of AI systems has become a central concern in the push toward sustainable software engineering practices [5], resource efficiency, in terms of computation, cost, and power consumption, has emerged as a critical factor for widespread adoption.
The challenge of resource efficiency has intensified interest in Small Language Models (SLMs)—open-weight models with billions, rather than hundreds of billions, of parameters [7, 8], offering a path toward democratizing autonomous agents via consumer-grade hardware, albeit with inherent limitations in reasoning and instruction-following capabilities. The effectiveness of these SLMs within complex agentic frameworks remains largely unmeasured. An agentic framework defines how a language model structures its reasoning, coordinates external tools, and executes problem-solving workflows [14, 18]. It remains uncertain whether existing frameworks, originally tuned to leverage the capabilities of large models, can maintain their effectiveness when driven by more limited ones.
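As a rough illustration of the control structure such a framework imposes, the plan-act-observe loop can be sketched as below. This is a minimal sketch, not code from any of the evaluated systems; `query_model` and `run_tool` are hypothetical stand-ins for the model call and the tool dispatcher:

```python
# Minimal sketch of an agentic plan-act-observe loop. Real frameworks
# (SWE-Agent, OpenHands, etc.) add richer prompting, sandboxed execution,
# and termination heuristics; the names below are illustrative only.

def query_model(history):
    # Hypothetical stand-in for a (small) language model call.
    # Returns a canned action so the sketch is self-contained.
    return {"action": "submit", "argument": "patch.diff"}

def run_tool(action, argument):
    # Hypothetical tool dispatcher (shell command, file edit, test run, ...).
    return f"ran {action} on {argument}"

def agent_loop(issue, max_steps=10):
    """The framework, not the model, decides how observations are fed
    back into the context and when the loop stops."""
    history = [("issue", issue)]
    for _ in range(max_steps):
        step = query_model(history)
        if step["action"] == "submit":
            return step["argument"]          # candidate patch produced
        observation = run_tool(step["action"], step["argument"])
        history.append((step["action"], observation))
    return None                              # step budget exhausted
```

The `max_steps` budget is one place where framework design directly governs energy use: a weak model that never reaches "submit" still burns the full budget on every run.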
Agentic frameworks extend beyond single-turn code generation by enabling language models to autonomously plan, execute, and verify the sequence of steps required to resolve software issues. Systems such as SWE-Agent, OpenHands, and AutoCodeRover have demonstrated strong performance on benchmarks like SWE-bench, which evaluates real-world GitHub issues as problem statements [11, 20, 23, 25]. While typically built around large proprietary models, emerging work suggests SLMs may offer resource-efficient alternatives for narrow, well-defined agentic tasks [3, 12]. However, it remains unclear whether existing agentic architectures