SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs

📝 Abstract

Context. LLM-based autonomous agents in software engineering rely on large, proprietary models, limiting local deployment. This has spurred interest in Small Language Models (SLMs), but their practical effectiveness and efficiency within complex agentic frameworks for automated issue resolution remain poorly understood. Goal. We investigate the performance, energy efficiency, and resource consumption of four leading agentic issue resolution frameworks when deliberately constrained to using SLMs. We aim to assess the viability of these systems for this task in resource-limited settings and characterize the resulting trade-offs. Method. We conduct a controlled evaluation of four leading agentic frameworks (SWE-Agent, OpenHands, Mini SWE Agent, AutoCodeRover) using two SLMs (Gemma-3 4B, Qwen-3 1.7B) on the SWE-bench Verified Mini benchmark. On fixed hardware, we measure energy, duration, token usage, and memory over 150 runs per configuration. Results. We find that framework architecture is the primary driver of energy consumption. The most energy-intensive framework, AutoCodeRover (Gemma), consumed 9.4x more energy on average than the least energy-intensive, OpenHands (Gemma). However, this energy is largely wasted. Task resolution rates were near-zero, demonstrating that current frameworks, when paired with SLMs, consume significant energy on unproductive reasoning loops. The SLM's limited reasoning was the bottleneck for success, but the framework's design was the bottleneck for efficiency. Conclusions. Current agentic frameworks, designed for powerful LLMs, fail to operate efficiently with SLMs. We find that framework architecture is the primary driver of energy consumption, but this energy is largely wasted due to the SLMs' limited reasoning. Viable low-energy solutions require shifting from passive orchestration to architectures that actively manage SLM weaknesses.
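The headline comparison in the Results (one configuration consuming several times more energy on average than another) comes down to a per-configuration mean and a ratio of means. The following is a minimal sketch of that arithmetic, not the authors' analysis pipeline. The configuration names, the readings in `runs`, and the helper functions are all hypothetical and chosen only for illustration. They are not data from the study.

```python
from statistics import mean

# Hypothetical per-run energy readings in joules, keyed by
# (framework, model). These values are illustrative only and are
# NOT measurements from the study.
runs = {
    ("FrameworkA", "ModelX"): [120.0, 100.0, 110.0],
    ("FrameworkB", "ModelX"): [900.0, 1100.0, 1000.0],
}


def mean_energy(runs_by_config):
    """Return the average energy per run for each configuration."""
    return {config: mean(values) for config, values in runs_by_config.items()}


def energy_ratio(means):
    """Return (most intensive, least intensive, ratio of their means)."""
    highest = max(means, key=means.get)
    lowest = min(means, key=means.get)
    return highest, lowest, means[highest] / means[lowest]


def energy_per_resolved(total_energy, resolved_tasks):
    """Energy spent per resolved task; infinite when nothing was resolved."""
    return total_energy / resolved_tasks if resolved_tasks else float("inf")


means = mean_energy(runs)
highest, lowest, ratio = energy_ratio(means)
print(f"{highest} uses {ratio:.1f}x the mean energy of {lowest}")
```

The `energy_per_resolved` helper illustrates why a near-zero resolution rate matters: when no tasks are resolved, the energy cost per successful fix is unbounded, however low the average energy per run.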

📄 Full Content

Arihant Tripathy, SERC, IIIT-Hyderabad, Hyderabad, Telangana, India, arihant.tripathy@research.iiit.ac.in
Ch Pavan Harshit, SERC, IIIT-Hyderabad, Hyderabad, Telangana, India, pavan.harshit@research.iiit.ac.in
Karthik Vaidhyanathan, SERC, IIIT-Hyderabad, Hyderabad, Telangana, India, karthik.vaidhyanathan@iiit.ac.in

Abstract

Context. Autonomous agents powered by Large Language Models (LLMs) are increasingly used for software engineering, but their reliance on large, proprietary models limits deployment on local hardware. This has spurred interest in Small Language Models (SLMs), but their practical effectiveness and efficiency within complex agentic frameworks for automated issue resolution remain poorly understood.

Goal. We investigate the performance, energy efficiency, and resource consumption of four leading agentic issue resolution frameworks when deliberately constrained to using SLMs. Our goal is to understand the viability of these systems for this task in resource-limited settings and characterize the resulting trade-offs.

Method. We conduct a controlled evaluation of four leading agentic frameworks (SWE-Agent, OpenHands, Mini SWE Agent, AutoCodeRover) using two SLMs (Gemma-3 4B, Qwen-3 1.7B) on the SWE-bench Verified Mini benchmark. On fixed hardware, we measure energy, duration, token usage, and memory over 150 runs per configuration.

Results. We find that framework architecture is the primary driver of energy consumption. The most energy-intensive framework, AutoCodeRover (Gemma), consumed 9.4x more energy on average than the least energy-intensive, OpenHands (Gemma). However, this energy is largely wasted. Task resolution rates were near-zero (4% for AutoCodeRover, 0% for all others), demonstrating that current frameworks, when paired with SLMs, consume significant energy on unproductive reasoning loops. The SLM's limited reasoning was the bottleneck for success, but the framework's design was the bottleneck for efficiency.

Conclusions. Current agentic frameworks, designed for powerful LLMs, fail to operate efficiently with SLMs. We find that framework architecture is the primary driver of energy consumption, but this energy is largely wasted due to the SLMs' limited reasoning. Achieving viable, low-energy solutions requires a paradigm shift from passive orchestration to new architectures that actively manage the SLM's weaknesses.

CCS Concepts

• Software and its engineering → Software creation and management; • Hardware → Impact on the environment; Power estimation and optimization; • General and reference → Empirical studies; Experimentation.

Keywords

Agentic Issue Resolution, Empirical Study, Energy Efficiency, Small Language Models

1 Introduction

Autonomous software engineering agents, powered by Large Language Models (LLMs), have emerged as a transformative paradigm in software development [9, 21], demonstrating impressive capabilities in resolving real-world code issues on benchmarks like SWE-bench. However, this success comes at a cost: massive, cloud-hosted models create significant barriers to local deployment due to their high computational costs and energy consumption [22]. As the environmental impact of AI systems has become a central concern in the push toward sustainable software engineering practices [5], resource efficiency, in terms of computation, cost, and power consumption, has emerged as a critical factor for widespread adoption.

The challenge of resource efficiency has intensified interest in Small Language Models (SLMs), open-weight models with billions, rather than hundreds of billions, of parameters [7, 8], which offer a path toward democratizing autonomous agents via consumer-grade hardware, albeit with inherent limitations in reasoning and instruction-following capabilities. The effectiveness of these SLMs within complex agentic frameworks remains largely unmeasured. An agentic framework defines how a language model structures its reasoning, coordinates external tools, and executes problem-solving workflows [14, 18]. It remains uncertain whether existing frameworks, originally tuned to leverage the capabilities of large models, can maintain their effectiveness when driven by more limited ones.

Agentic frameworks extend beyond single-turn code generation by enabling language models to autonomously plan, execute, and verify the sequence of steps required to resolve software issues. Systems such as SWE-Agent, OpenHands, and AutoCodeRover have demonstrated strong performance on benchmarks like SWE-bench, which evaluates real-world GitHub issues as problem statements [11, 20, 23, 25]. While typically built around large proprietary models, emerging work suggests SLMs may offer resource-efficient alternatives for narrow, well-defined agentic tasks [3, 12]. However, it remains unclear whether existing agentic architectures

📸 Image Gallery

correlation_all_scaffolds.png
energy_usage_boxplot.png
failure_reasons_stacked.png
token_usage_histograms.png

Reference

This content is AI-processed based on open access ArXiv data.
