VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

In this study, we propose VibeCodeHPC, a multi-agent system based on large language models (LLMs) for the automatic tuning of high-performance computing (HPC) programs on supercomputers. VibeCodeHPC adopts Claude Code as its backend and provides an integrated environment that facilitates program development in supercomputer settings. The system not only brings the Vibe Coding paradigm – program development through natural language interaction with users – to HPC programming, but also enables autonomous performance optimization with minimal user intervention through a sophisticated multi-agent design. To achieve these objectives, VibeCodeHPC implements three core functionalities: (1) configuration capabilities tailored to the unique development environments of supercomputers, (2) collaborative operation among multiple LLM agents with distinct roles – Project Manager (PM), System Engineer (SE), Programmer (PG), and Continuous Deliverer (CD), and (3) long-term autonomous operation through agent activity monitoring and dynamic deployment mechanisms. This paper highlights one of the most powerful features of VibeCodeHPC: fully automated code optimization through autonomous operation without user intervention. Specifically, it demonstrates the performance optimization of CPU-based codes on GPU-equipped systems for matrix multiplication and a Poisson equation solver using Jacobi’s iterative method. The results show that the multi-agent configuration employed in VibeCodeHPC enables faster and more reliable development of higher-performance code compared to a single-agent setup.


💡 Research Summary

The paper introduces VibeCodeHPC, a multi‑agent system built on top of Anthropic’s Claude Code and designed to automatically tune high‑performance computing (HPC) applications on supercomputers. Existing LLM‑driven code generators focus on general‑purpose programming and lack support for the intricate requirements of HPC, such as architecture‑specific optimizations, parallel programming models (MPI, OpenMP, CUDA, OpenACC), batch‑job submission, module loading, and SSH access. To address this, the authors propose a role‑based agent architecture that distributes the cognitive load across agents and mitigates the context‑window limits of a single LLM session.

Four specialized agents are defined: Project Manager (PM) orchestrates the overall workflow, decides on optimization strategies, and aggregates results; System Engineer (SE) gathers hardware specifications, batch‑scheduler details, and module information, then supplies this context to other agents; Programmer agents (PG) perform the actual code generation and transformation, targeting CPU (OpenMP, SIMD) and GPU (CUDA, OpenACC) kernels, as well as hybrid MPI‑plus‑X configurations; Continuous Deliverer (CD) handles version‑control commands, script anonymization, and final reporting. Communication among agents occurs via a lightweight messaging layer, and a monitoring subsystem (implemented with tmux) provides real‑time visibility into each agent’s activity.
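
To make the division of labor concrete, here is a minimal Python sketch of role‑based message routing with an idle‑agent check, assuming a simple in‑process mailbox per agent. The class names, the `PG1.1`‑style agent IDs, and the heartbeat timeout are hypothetical; the paper's actual messaging layer and tmux‑based monitor are not reproduced here.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PM = "Project Manager"
    SE = "System Engineer"
    PG = "Programmer"
    CD = "Continuous Deliverer"


@dataclass
class Message:
    sender: str
    recipient: str
    body: str


@dataclass
class Agent:
    name: str
    role: Role
    inbox: deque = field(default_factory=deque)
    last_active: float = 0.0  # hypothetical heartbeat timestamp (seconds)


class MessageBus:
    """Hypothetical in-process messaging layer; not VibeCodeHPC's real API."""

    def __init__(self) -> None:
        self.agents: dict[str, Agent] = {}

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def send(self, sender: str, recipient: str, body: str) -> None:
        if recipient not in self.agents:
            raise KeyError(f"unknown agent: {recipient}")
        self.agents[recipient].inbox.append(Message(sender, recipient, body))

    def broadcast(self, sender: str, role: Role, body: str) -> int:
        """Send `body` to every agent with `role` except the sender; return count."""
        targets = [a.name for a in self.agents.values()
                   if a.role is role and a.name != sender]
        for name in targets:
            self.send(sender, name, body)
        return len(targets)

    def idle_agents(self, now: float, timeout: float) -> list[str]:
        """Monitoring hook: names of agents silent for longer than `timeout`."""
        return [a.name for a in self.agents.values()
                if now - a.last_active > timeout]


if __name__ == "__main__":
    bus = MessageBus()
    for name, role in [("PM", Role.PM), ("SE1", Role.SE), ("PG1.1", Role.PG),
                       ("PG1.2", Role.PG), ("CD", Role.CD)]:
        bus.register(Agent(name, role))
    bus.send("PM", "SE1", "collect node specs and available modules")
    bus.broadcast("PM", Role.PG, "baseline: port the CPU loop to CUDA")
    bus.agents["PM"].last_active = 100.0
    print(bus.idle_agents(now=120.0, timeout=30.0))  # ['SE1', 'PG1.1', 'PG1.2', 'CD']
```

Addressing agents by role rather than by name lets the PM reach every PG at once, which is convenient when the number of PG agents changes at run time; the `idle_agents` check is one simple way a monitor could flag agents that need attention.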

The workflow begins on a user’s local Linux terminal. The user supplies the target source code and optionally a detailed requirement definition document that enumerates target hardware, performance goals, and constraints (e.g., prohibited libraries, precision requirements). When a comprehensive document is provided, VibeCodeHPC enters a fully autonomous optimization loop: the PM initiates a cycle, SE fetches system information, PG generates a candidate implementation, the code is transferred via SFTP, compiled, and submitted to the supercomputer’s batch system (PBS, Slurm, etc.). Execution results and performance metrics are automatically collected, fed back to SE and PG, and used by the PM to select the next optimization direction. The loop repeats until the performance targets are met or the user intervenes.
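
The control flow of this loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose` stands in for the PM and PG choosing the next variant from past results, `evaluate` stands in for the SFTP transfer, compilation, and batch submission, and the `gflops` metric name is hypothetical. The block‑size cost model in the usage part is synthetic and only exercises the control flow.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Trial:
    variant: Any             # e.g. a code version or a launch parameter
    gflops: Optional[float]  # None if the build or job failed
    log: str                 # compiler/job output fed back to SE and PG


def tune(propose: Callable[[list], Optional[Any]],
         evaluate: Callable[[Any], Trial],
         target_gflops: float,
         max_iters: int = 20) -> tuple[Optional[Trial], list]:
    """Hypothetical autonomous loop: propose, build and run, record, decide."""
    history: list[Trial] = []
    best: Optional[Trial] = None
    for _ in range(max_iters):
        variant = propose(history)   # pick the next direction from past trials
        if variant is None:          # nothing left to try
            break
        trial = evaluate(variant)    # stands in for SFTP + compile + job submit
        history.append(trial)
        if trial.gflops is not None and (best is None or trial.gflops > best.gflops):
            best = trial
        if best is not None and best.gflops >= target_gflops:
            break                    # performance goal met
    return best, history


# --- Synthetic usage example (toy cost model, not measured data) ---
BLOCK_SIZES = [32, 64, 128, 256, 512]


def propose_next_block(history: list) -> Optional[int]:
    tried = {t.variant for t in history}
    return next((b for b in BLOCK_SIZES if b not in tried), None)


def toy_evaluate(block: int) -> Trial:
    if block > 256:  # pretend the job fails for oversized blocks
        return Trial(block, None, "launch failed (synthetic)")
    return Trial(block, 1000.0 - abs(block - 256), "ok")


if __name__ == "__main__":
    best, history = tune(propose_next_block, toy_evaluate, target_gflops=900.0)
    print(best.variant, len(history))  # 256 4
```

Passing `propose` and `evaluate` as callables keeps the loop independent of the batch scheduler (PBS, Slurm) and of how candidates are generated; in a real deployment `evaluate` would block until the submitted job finishes and its output is retrieved.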

Two benchmark applications were used for evaluation: (1) dense matrix multiplication, where the system rewrites a CPU‑only implementation into a CUDA kernel and tunes its launch parameters; and (2) a Poisson equation solver using Jacobi iteration, which requires careful MPI‑OpenMP hybridization and data‑layout optimization. The authors compare a single‑agent setup (Claude Code alone) with the full multi‑agent configuration. Results show that the multi‑agent system converges roughly 1.8× faster and achieves 12 to 18% higher final performance. The SE’s accurate hardware‑topology data and the CD’s automated Git handling significantly improve reproducibility and developer productivity. Moreover, a requirement‑definition template lets users with limited HPC expertise use the system without manual prompt engineering.
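
For reference, the Jacobi method for the Poisson problem fits in a few lines. The sketch below is a plain serial Python version, not the authors' baseline code, assuming -∇²u = f on the unit square with u = 0 on the boundary and the standard 5‑point stencil, so each sweep computes u_new[i][j] = (u[i−1][j] + u[i+1][j] + u[i][j−1] + u[i][j+1] + h²·f[i][j]) / 4. Because every update reads only the previous iterate, all grid points can be updated independently; that is what lets a sweep map onto CUDA threads or onto domain‑decomposed MPI ranks with halo exchange. A serial version like this can also serve as a correctness oracle for a tuned port.

```python
import math


def jacobi_poisson(f, n, tol=1e-8, max_iter=20_000):
    """Solve -laplacian(u) = f on the unit square with u = 0 on the boundary.

    Grid: (n + 2) x (n + 2) points including the boundary, spacing h = 1/(n + 1).
    Returns (u, iterations, converged).
    """
    h = 1.0 / (n + 1)
    h2 = h * h
    rhs = [[f(i * h, j * h) for j in range(n + 2)] for i in range(n + 2)]
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    for it in range(1, max_iter + 1):
        u_new = [row[:] for row in u]  # boundary rows/columns stay at 0
        max_change = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # 5-point stencil: reads only the previous iterate `u`,
                # so every (i, j) can be updated independently.
                v = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                            + h2 * rhs[i][j])
                max_change = max(max_change, abs(v - u[i][j]))
                u_new[i][j] = v
        u = u_new
        if max_change < tol:
            return u, it, True
    return u, max_iter, False


def manufactured_rhs(x, y):
    # Chosen so that the exact solution is u = sin(pi x) sin(pi y).
    return 2 * math.pi ** 2 * math.sin(math.pi * x) * math.sin(math.pi * y)


if __name__ == "__main__":
    n = 16
    u, iters, ok = jacobi_poisson(manufactured_rhs, n)
    h = 1.0 / (n + 1)
    err = max(abs(u[i][j] - math.sin(math.pi * i * h) * math.sin(math.pi * j * h))
              for i in range(n + 2) for j in range(n + 2))
    print(ok, iters, f"max error = {err:.2e}")
```

Running the example should report convergence with a maximum error on the order of 1e-3, dominated by the O(h²) discretization error rather than by the iteration tolerance.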

Key contributions include: (1) a novel role‑based multi‑LLM framework tailored for HPC code synthesis and tuning; (2) seamless integration of supercomputer‑specific operations (SSH, batch scheduling, module management) into the LLM workflow; (3) an autonomous feedback loop driven by performance measurements that minimizes human intervention; (4) empirical evidence that multi‑agent collaboration outperforms single‑agent approaches on realistic CPU‑GPU workloads. Future work will explore broader parallel programming models (e.g., oneAPI), energy‑aware optimization objectives, and meta‑tuning of the LLMs themselves to further enhance the system’s adaptability and efficiency.

