A Dual-Loop Agent Framework for Automated Vulnerability Reproduction
Automated vulnerability reproduction from CVE descriptions requires generating executable Proof-of-Concept (PoC) exploits and validating them in target environments. This process is critical in software security research and practice, yet remains time-consuming and demands specialized expertise when performed manually. While LLM agents show promise for automating this task, existing approaches often conflate exploring attack directions with fixing implementation details, which leads to unproductive debugging loops when reproduction fails. To address this, we propose CVE2PoC, an LLM-based dual-loop agent framework following a plan-execute-evaluate paradigm. The Strategic Planner analyzes vulnerability semantics and target code to produce structured attack plans. The Tactical Executor generates PoC code and validates it through progressive verification. The Adaptive Refiner evaluates execution results and routes failures to one of two loops: the Tactical Loop for code-level refinement, or the Strategic Loop for attack strategy replanning. This dual-loop design enables the framework to escape ineffective debugging by matching remediation to failure type. Evaluation on two benchmarks covering 617 real-world vulnerabilities demonstrates that CVE2PoC achieves 82.9% and 54.3% reproduction success rates on SecBench.js and PatchEval, respectively, outperforming the best baseline by 11.3% and 20.4%. Human evaluation confirms that generated PoCs achieve code quality comparable to human-written exploits in readability and reusability.
💡 Research Summary
The paper addresses the labor‑intensive task of reproducing vulnerabilities from CVE descriptions by proposing CVE2PoC, a dual‑loop LLM‑based agent framework that separates strategic planning from tactical execution. Existing approaches rely on static analysis, fuzzing, or single‑cycle LLM agents. The latter conflate attack‑direction errors with implementation bugs, which leads to inefficient debugging and low success rates, especially for logic‑type or non‑crash vulnerabilities.
CVE2PoC introduces three core modules:
- Strategic Planner – parses the natural‑language CVE text together with the target codebase to construct a structured attack plan. The plan encodes vulnerability type, trigger conditions, required environment, input/output flow, and success criteria, ensuring that the agent has a coherent exploitation strategy before any code is written.
- Tactical Executor – translates the attack plan into executable PoC code using a modern LLM (e.g., GPT‑4) and a tool‑calling interface. It then subjects the generated code to a progressive multi‑layer verification pipeline: syntactic checks, static data‑flow/type analysis, dynamic execution with detailed logging, and differential testing between vulnerable and patched versions (or simulated oracles when patches are unavailable). Each layer reports precise failure points rather than a binary pass/fail.
- Adaptive Refiner – consumes verification results and diagnoses the failure source. If the issue is an implementation error (e.g., wrong API parameters, syntax mistakes), the system stays within the Tactical Loop, refining the code while preserving the original strategy. If the failure stems from a flawed attack strategy (e.g., misidentified vulnerability class, inappropriate vector), control returns to the Strategic Loop for replanning. A sparse experience index stores past successes and failures, enabling efficient retrieval of relevant cases without inflating the LLM context window.
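The progressive verification idea described for the Tactical Executor can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LayerResult`, `run_pipeline`, and `first_failure` are hypothetical, only the syntactic layer is implemented (and only for Python PoCs), and the later layers are assumed to be supplied by the caller as plain functions. The point it shows is that layers run in order, stop at the first failure, and return the failing layer together with a detail string instead of a single pass/fail flag.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A check takes PoC source text and returns (passed, detail).
Check = Callable[[str], Tuple[bool, str]]


@dataclass
class LayerResult:
    layer: str      # name of the verification layer, e.g. "syntax"
    passed: bool
    detail: str     # where and why the layer failed, or a short success note


def syntax_check(poc_source: str) -> Tuple[bool, str]:
    """Layer 1 (illustrative): does the PoC parse as Python?"""
    try:
        compile(poc_source, "<poc>", "exec")  # parses only, does not run the PoC
        return True, "parsed"
    except SyntaxError as err:
        return False, f"line {err.lineno}: {err.msg}"


def run_pipeline(poc_source: str,
                 layers: List[Tuple[str, Check]]) -> List[LayerResult]:
    """Run layers in order and stop at the first one that fails."""
    results: List[LayerResult] = []
    for name, check in layers:
        passed, detail = check(poc_source)
        results.append(LayerResult(name, passed, detail))
        if not passed:
            break  # later layers are meaningless once an earlier one fails
    return results


def first_failure(results: List[LayerResult]) -> Optional[LayerResult]:
    """Return the failing layer, or None if every layer passed."""
    return next((r for r in results if not r.passed), None)
```

A caller would append further layers (static analysis, dynamic execution, differential testing) to the `layers` list as `(name, check)` pairs; the result of `first_failure` is the kind of localized signal a refiner could use to decide what to fix.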
The framework operates through two feedback loops. The Tactical Loop iterates between the Executor and Refiner, fixing code‑level bugs. When tactical fixes are insufficient, the Strategic Loop re‑engages the Planner to generate a new attack plan. This separation prevents wasted effort on fundamentally unsound strategies and avoids premature abandonment of promising approaches due to minor bugs.
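The control flow of the two loops can be sketched as nested iteration. This is a simplified illustration under stated assumptions, not the paper's algorithm: `plan_fn`, `execute_fn`, and `diagnose_fn` stand in for the Planner, Executor, and Refiner, the `Failure` enum is a hypothetical label for the Refiner's diagnosis, and the iteration budgets are arbitrary placeholders.

```python
from enum import Enum


class Failure(Enum):
    NONE = "none"            # PoC reproduced the vulnerability
    TACTICAL = "tactical"    # code-level bug; keep the plan, fix the code
    STRATEGIC = "strategic"  # the attack plan itself is wrong; replan


def reproduce(plan_fn, execute_fn, diagnose_fn,
              max_strategic=3, max_tactical=5):
    """Return a working PoC, or None if both budgets are exhausted.

    plan_fn(feedback) -> plan
    execute_fn(plan, previous_poc, feedback) -> (poc, report)
    diagnose_fn(report) -> Failure
    """
    feedback = None
    for _ in range(max_strategic):          # Strategic Loop: new attack plan
        plan = plan_fn(feedback)
        poc = None
        for _ in range(max_tactical):       # Tactical Loop: refine the code
            poc, report = execute_fn(plan, poc, feedback)
            failure = diagnose_fn(report)
            if failure is Failure.NONE:
                return poc
            feedback = report               # carry the diagnosis forward
            if failure is Failure.STRATEGIC:
                break                       # stop debugging an unsound plan
        # A strategic failure or an exhausted tactical budget both trigger replanning.
    return None
```

The early `break` is what distinguishes this from a single generate-and-test loop: a strategic diagnosis abandons the current plan immediately instead of spending the remaining tactical budget on it.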
Evaluation was performed on two public benchmarks: SecBench.js (387 npm package vulnerabilities) and PatchEval (230 vulnerabilities across Go, JavaScript, and Python), for 617 vulnerabilities in total. CVE2PoC achieved reproduction success rates of 82.9% and 54.3%, respectively, surpassing the best prior baselines by 11.3% and 20.4%. On average, successful PoCs required only 3.2 tactical iterations and 0.7 strategic replannings, indicating effective error diagnosis. In a human assessment by 20 security professionals, CVE2PoC‑generated PoCs scored 4.15/5 in readability, 4.08/5 in reusability, and 4.12/5 in maintainability, compared with 3.62/5, 3.71/5, and 3.68/5 for manually written PoCs.
Key contributions are: (1) the first dual‑loop architecture that cleanly separates strategic exploitation planning from tactical code generation, (2) a progressive multi‑layer verification pipeline that reliably detects whether a PoC truly triggers the vulnerability, and (3) an adaptive refinement mechanism with sparse experience indexing that routes failures to the appropriate loop while keeping LLM context efficient.
The authors argue that future work should extend the framework to broader language ecosystems, automate environment provisioning for more complex software stacks, develop vulnerability‑specific oracles for richer validation, and explore multi‑agent collaboration for compound attack scenarios. CVE2PoC demonstrates that integrating strategic reasoning into LLM‑driven security automation can substantially improve both success rates and code quality, moving the field beyond naïve generate‑and‑test pipelines toward more intelligent, diagnosis‑driven exploitation synthesis.