CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments

February 22, 2026



📝 Original Info

Title: CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments
ArXiv ID: 2510.26852
Date: 2025-10-30
Authors: Not available (the paper does not provide author information)

📝 Abstract

Current evaluation of Large Language Model (LLM) code agents predominantly focuses on generating functional code in single-turn scenarios, which fails to assess an agent's capability for continuous code optimization and multi-turn iterative development. To bridge this gap, we introduce CATArena, a framework designed to evaluate the evolutionary capabilities of code agents via iterative tournaments. Agents engage in multi-turn tournaments and continuously refine their code through self-reflection and peer-learning based on comprehensive execution feedback. For evaluation, we propose a dual-metric system to decouple static generation proficiency from evolutionary potential. Extensive experiments reveal that an agent's evolutionary potential is not strictly correlated with its initial proficiency. Our analysis further reveals that current agents struggle to leverage peer-learning and self-reflection concurrently for effective performance gains. Furthermore, the results validate CATArena's high extensibility and resistance to variance tasks, establishing it as a continuous and reliable standard for assessing the evolutionary capability of LLM code agents.
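The tournament loop and the two-part scoring described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a "strategy" is reduced to a single number, `play_match`, `run_round`, `run_tournament`, and `dual_metrics` are hypothetical names, initial proficiency is taken to be the first-round score, and evolutionary gain is taken to be the last-round score minus the first-round score. The paper's actual games, feedback format, and metric definitions may differ.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Toy assumption: a strategy is one number and the higher number wins.
def play_match(a: float, b: float) -> Tuple[float, float]:
    if a > b:
        return 1.0, 0.0
    if a < b:
        return 0.0, 1.0
    return 0.5, 0.5

def run_round(strategies: Dict[str, float]) -> Dict[str, float]:
    """Play one round-robin round and return average points per match."""
    totals = {name: 0.0 for name in strategies}
    for x, y in combinations(strategies, 2):
        score_x, score_y = play_match(strategies[x], strategies[y])
        totals[x] += score_x
        totals[y] += score_y
    opponents = len(strategies) - 1
    return {name: total / opponents for name, total in totals.items()}

def run_tournament(
    initial: Dict[str, float],
    refine: Callable[[str, Dict[str, float], Dict[str, float]], float],
    rounds: int,
) -> Dict[str, List[float]]:
    """Alternate between playing a round and letting each agent revise."""
    strategies = dict(initial)
    history: Dict[str, List[float]] = {name: [] for name in strategies}
    for _ in range(rounds):
        scores = run_round(strategies)
        for name, score in scores.items():
            history[name].append(score)
        # Every agent sees the finished round (all strategies and scores)
        # before submitting its next version.
        strategies = {
            name: refine(name, strategies, scores) for name in strategies
        }
    return history

def dual_metrics(history: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Illustrative split: first-round score vs. change by the last round."""
    return {
        name: {
            "initial_proficiency": scores[0],
            "evolution_gain": scores[-1] - scores[0],
        }
        for name, scores in history.items()
    }

if __name__ == "__main__":
    def refine(name, strategies, scores):
        # Hypothetical peer-learning rule: one agent copies the best peer
        # and improves on it; the others never change.
        if name == "weak_learner":
            return max(strategies.values()) + 1
        return strategies[name]

    start = {"strong_static": 3.0, "mid_static": 2.0, "weak_learner": 1.0}
    print(dual_metrics(run_tournament(start, refine, rounds=3)))
```

In this toy setup the agent that starts weakest ends with the largest gain, which mirrors the abstract's point that the two quantities need not move together; it says nothing about the magnitudes reported in the paper.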

