Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based process. In contrast, real-world software development is organized as a collaborative activity carried out by teams following shared methodologies, with clear role separation, communication, and review. In this work, we present a fully automated multi-agent system that explicitly models software engineering as an organizational process, replicating the structure of an engineering team. Built on top of agyn, an open-source platform for configuring agent teams, our system assigns specialized agents to roles such as coordination, research, implementation, and review, provides them with isolated sandboxes for experimentation, and enables structured communication. The system follows a defined development methodology for working on issues, including analysis, task specification, pull request creation, and iterative review, and operates without any human intervention. Importantly, the system was designed for real production use and was not tuned for SWE-bench. When evaluated post hoc on SWE-bench 500, it resolves 72.2% of tasks, outperforming single-agent baselines using comparable language models. Our results suggest that replicating team structure, methodology, and communication is a powerful paradigm for autonomous software engineering, and that future progress may depend as much on organizational design and agent infrastructure as on model improvements.

💡 Research Summary

The paper introduces a fully automated multi‑agent system for autonomous software engineering that explicitly mirrors the organizational structure of a real development team. Built on the open‑source Agyn platform, the system defines four specialized agents (Manager, Researcher, Engineer, and Reviewer), each equipped with role‑specific prompts, toolsets, isolated execution sandboxes, and tailored language‑model configurations. The Manager orchestrates the workflow, dynamically deciding which agent to invoke next based on intermediate results rather than following a rigid, pre‑defined pipeline. The Researcher, powered by a large general‑purpose LLM (e.g., GPT‑5), performs deep issue analysis and repository exploration, and produces a structured task specification. The Engineer uses a smaller, code‑specialized model (e.g., GPT‑5‑Codex) to edit code, run tests, and iteratively refine solutions within its own sandbox. The Reviewer creates pull requests on GitHub, conducts inline code reviews, leaves comments, and either approves or requests changes, thereby providing a concrete acceptance signal for the system.
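
The Manager's dynamic routing can be illustrated with a minimal sketch. This is not the Agyn implementation: the `State` fields, the `next_role` policy, and the stubbed `run_agent` (which stands in for real LLM calls and simply requests changes on the first review round) are all hypothetical names chosen for illustration. The sketch only shows how the next agent might be selected from intermediate results instead of from a fixed pipeline order.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Role(Enum):
    RESEARCHER = "researcher"
    ENGINEER = "engineer"
    REVIEWER = "reviewer"


@dataclass
class State:
    """Shared intermediate results (hypothetical fields, not Agyn's schema)."""
    spec: Optional[str] = None      # task specification from the Researcher
    patch: Optional[str] = None     # latest change from the Engineer
    approved: bool = False          # acceptance signal from the Reviewer
    review_rounds: int = 0
    log: List[str] = field(default_factory=list)


def next_role(state: State) -> Optional[Role]:
    """Manager policy: choose the next agent from the current state."""
    if state.spec is None:
        return Role.RESEARCHER
    if state.patch is None:
        return Role.ENGINEER
    if not state.approved:
        return Role.REVIEWER
    return None  # approved: nothing left to do


def run_agent(role: Role, state: State) -> None:
    """Stub agents; a real system would call role-specific models and tools."""
    if role is Role.RESEARCHER:
        state.spec = "structured task specification"
    elif role is Role.ENGINEER:
        state.patch = f"patch v{state.review_rounds + 1}"
    else:
        state.review_rounds += 1
        if state.review_rounds >= 2:
            state.approved = True
        else:
            state.patch = None  # changes requested: route back to the Engineer
    state.log.append(role.value)


def resolve_issue(max_steps: int = 10) -> State:
    """Run the manager loop until approval or the step budget is exhausted."""
    state = State()
    for _ in range(max_steps):
        role = next_role(state)
        if role is None:
            break
        run_agent(role, state)
    return state
```

Calling `resolve_issue()` with these stubs visits the Researcher once and then alternates between Engineer and Reviewer until the Reviewer approves, which is the iterative review loop the summary describes.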

All agents interact with real GitHub primitives through custom tools that automate branch management, pull‑request creation, and inline commenting. Each agent’s sandbox is provisioned via the Nix package manager, allowing on‑the‑fly installation of project dependencies while keeping environments isolated to avoid cross‑contamination of experiments. This design reflects how human developers work with local environments while coordinating through shared artifacts.
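
As a rough illustration of on‑the‑fly dependency provisioning, the sketch below builds a `nix-shell` invocation that runs one command with a given set of packages available. The summary does not say which Nix commands the system actually uses, so the choice of `nix-shell` with `--pure`, `-p`, and `--run`, as well as the helper name `nix_shell_command`, are assumptions made for this example. The function only constructs the argument list and does not execute anything.

```python
from typing import List, Sequence


def nix_shell_command(packages: Sequence[str], command: str,
                      pure: bool = True) -> List[str]:
    """Build an argv for running `command` with `packages` provided by Nix.

    Hypothetical helper: it assumes the standard `nix-shell` CLI, where
    `-p` lists packages, `--run` executes a command non-interactively,
    and `--pure` clears most of the caller's environment for isolation.
    """
    if not packages:
        raise ValueError("at least one package is required")
    argv = ["nix-shell"]
    if pure:
        argv.append("--pure")
    argv += ["-p", *packages, "--run", command]
    return argv
```

An agent runtime could pass the resulting list to `subprocess.run` inside that agent's own working directory, so that packages installed for one experiment do not leak into another agent's environment.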

The authors evaluate the system post‑hoc on the SWE‑bench 500 benchmark, which consists of real GitHub issues paired with test suites. Importantly, the system was not tuned for SWE‑bench; it was originally built for production use and deployed in day‑to‑day engineering workflows. Under a fully automated setting, the multi‑agent team resolves 72.2% of the benchmark tasks, outperforming the lightweight single‑agent baseline mini‑SWE‑agent (≈64.8%) by 7.4 percentage points, despite using comparable underlying models. The performance gain is attributed primarily to the organizational decomposition: heterogeneous subtasks receive appropriate context windows and model capacities, and the iterative, reviewer‑driven feedback loop mirrors real development cycles, improving robustness and correctness.
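
To make the reported gap concrete, the percentages can be converted into task counts. The sketch assumes the benchmark contains exactly 500 tasks, as the name "SWE‑bench 500" suggests; because the baseline figure is only given approximately, its count is an estimate.

```python
TOTAL_TASKS = 500  # assumption: "SWE-bench 500" contains 500 tasks


def resolved_count(rate_percent: float, total: int = TOTAL_TASKS) -> int:
    """Convert a resolution rate in percent into a whole number of tasks."""
    return round(rate_percent / 100 * total)


team_rate, baseline_rate = 72.2, 64.8    # figures quoted in the summary

team_tasks = resolved_count(team_rate)            # 361
baseline_tasks = resolved_count(baseline_rate)    # 324 (approximate)
gap_pp = round(team_rate - baseline_rate, 1)      # 7.4 percentage points
gap_tasks = team_tasks - baseline_tasks           # 37 tasks
```

Under that assumption, the 7.4‑point difference corresponds to roughly 37 additional resolved tasks out of 500.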

Key contributions include: (1) the open‑source Agyn platform for configuring and orchestrating multi‑agent systems with explicit communication and sandboxing; (2) a concrete role‑based team architecture with differentiated prompts, tools, and model allocations; (3) custom GitHub‑native tooling enabling autonomous pull‑request creation and inline review; (4) empirical evidence that a production‑oriented, benchmark‑agnostic multi‑agent system can achieve competitive benchmark performance; and (5) an artifact release comprising forked repositories, opened issues, pull requests, and full communication traces.

The study argues that replicating team structure, methodology, and communication is as crucial to autonomous software engineering progress as improvements in LLM quality. By treating software engineering as an organizational process rather than a monolithic code‑generation task, the work opens a path toward more scalable, cost‑effective, and reliable autonomous development systems. Future research directions highlighted include richer coordination protocols, dynamic role reassignment, cost‑aware model scheduling, and deeper integration of human‑in‑the‑loop oversight when needed.
