SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

📝 Original Info

  • Title: SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
  • ArXiv ID: 2511.05459
  • Date: 2025-11-07
  • Authors: Not listed in the provided information.

📝 Abstract

Evaluating large language models (LLMs) for software engineering has been limited by narrow task coverage, language bias, and insufficient alignment with real-world developer workflows. Existing benchmarks often focus on algorithmic problems or Python-centric bug fixing, leaving critical dimensions of software engineering underexplored. To address these gaps, we introduce SWE-Compass, a comprehensive benchmark that unifies heterogeneous code-related evaluations into a structured and production-aligned framework. SWE-Compass spans 8 task types, 8 programming scenarios, and 10 programming languages, with 2000 high-quality instances curated from authentic GitHub pull requests and refined through systematic filtering and validation. We benchmark ten state-of-the-art LLMs under two agentic frameworks, SWE-Agent and Claude Code, revealing a clear hierarchy of difficulty across task types, languages, and scenarios. Moreover, by aligning evaluation with real-world developer practices, SWE-Compass provides a rigorous and reproducible foundation for diagnosing and advancing agentic coding capabilities in large language models.
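
To make the evaluation axes described in the abstract concrete, the following is a minimal sketch of how a SWE-Compass-style instance could be represented and how resolution rates might be aggregated per task type, language, or scenario. The field names, class names, and sample values here are assumptions for illustration only; they are not the benchmark's actual data format or harness code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical schema for one benchmark instance; field names are illustrative,
# not SWE-Compass's actual format.
@dataclass
class BenchmarkInstance:
    instance_id: str        # identifier derived from the source pull request
    repo: str               # GitHub repository the PR was curated from
    language: str           # one of the 10 programming languages covered
    task_type: str          # one of the 8 task types (e.g., bug fixing)
    scenario: str           # one of the 8 programming scenarios
    problem_statement: str  # natural-language description given to the agent
    resolved: bool          # whether the agent's patch passed validation

def resolved_rate_by(instances: list[BenchmarkInstance], key: str) -> dict[str, float]:
    """Aggregate resolution rates along one axis (task_type, language, or scenario)."""
    totals: dict[str, int] = defaultdict(int)
    solved: dict[str, int] = defaultdict(int)
    for inst in instances:
        group = getattr(inst, key)
        totals[group] += 1
        solved[group] += int(inst.resolved)
    return {g: solved[g] / totals[g] for g in totals}

if __name__ == "__main__":
    # Toy, fabricated results used only to show the aggregation shape.
    runs = [
        BenchmarkInstance("demo-1", "org/repo-a", "Python", "bug_fix",
                          "backend", "Fix crash on empty input.", True),
        BenchmarkInstance("demo-2", "org/repo-b", "Go", "feature",
                          "cli", "Add a --verbose flag.", False),
    ]
    print(resolved_rate_by(runs, "task_type"))  # {'bug_fix': 1.0, 'feature': 0.0}
```

Grouping results along these axes is what allows the "hierarchy of difficulty across task types, languages, and scenarios" mentioned above to be reported in a single, comparable form.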

💡 Deep Analysis

📄 Full Content

Reference

This content is AI-processed based on open access ArXiv data.
