CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments

Reading time: 1 minute
...

📝 Original Info

  • Title: CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments
  • ArXiv ID: 2510.26852
  • Date: 2025-10-30
  • Authors: ** 정보 없음 (논문에 저자 정보가 제공되지 않음) **

📝 Abstract

Current evaluation for Large Language Model (LLM) code agents predominantly focus on generating functional code in single-turn scenarios, which fails to evaluate the agent's capability for continuous code optimization and multi-turn iterative development. To bridge this gap, we introduce CATArena, a framework designed to evaluate the evolutionary capabilities of code agents via iterative tournaments. Agents engage in multi-turn tournaments and continuously refine their code through self-reflection and peer-learning based on comprehensive execution feedback. For evaluation, we propose a dual-metric system to decouple static generation proficiency from evolutionary potential. Extensive experiments reveal that an agent's evolutionary potential is not strictly correlated with its initial proficiency. Our analysis further reveals that current agents struggle to concurrently leverage both peer-learning and self-reflection for effective performance gains. Furthermore, the results validate CATArena's high extensibility and resistance to variance tasks, establishing it as a continuous and reliable standard for assessing the evolutionary capability of LLM code agents.

💡 Deep Analysis

📄 Full Content

Reference

This content is AI-processed based on open access ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut