HarnessAgent: Scaling Automatic Fuzzing Harness Construction with Tool-Augmented LLM Pipelines

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Large language model (LLM)-based techniques have achieved notable progress in generating harnesses for program fuzzing. However, applying them to arbitrary functions (especially internal functions) at scale remains challenging due to the sophisticated contextual information required, such as specifications, dependencies, and usage examples. State-of-the-art methods rely heavily on static or incomplete context provisioning and often fail to generate functional harnesses. Furthermore, LLMs tend to exploit harness validation metrics, producing plausible yet logically useless code. Harness generation across large and diverse projects therefore continues to face challenges in reliable compilation, robust code retrieval, and comprehensive validation. To address these challenges, we present HarnessAgent, a tool-augmented agentic framework that achieves fully automated, scalable harness construction over hundreds of OSS-Fuzz targets. HarnessAgent introduces three key innovations: 1) a rule-based strategy to identify and minimize various compilation errors; 2) a hybrid tool pool for precise and robust symbol source code retrieval; and 3) an enhanced harness validation pipeline that detects fake definitions. We evaluate HarnessAgent on 243 target functions from OSS-Fuzz projects (65 from C projects and 178 from C++ projects). It improves the three-shot success rate by approximately 20% compared to state-of-the-art techniques, reaching 87% for C and 81% for C++. Our one-hour fuzzing results show that more than 75% of the harnesses generated by HarnessAgent increase target function coverage, surpassing the baselines by over 10%. In addition, the hybrid tool-pool system of HarnessAgent achieves a response rate of over 90% for source code retrieval, outperforming Fuzz Introspector by more than 30%.


💡 Research Summary

This paper presents “HarnessAgent,” a novel tool-augmented agentic framework designed to overcome the scalability and reliability challenges in automatically generating fuzzing harnesses for programs using Large Language Models (LLMs). Fuzzing effectiveness heavily depends on the quality of the harness, a small driver that invokes the target program with fuzzer-generated inputs. Constructing harnesses for arbitrary functions, especially undocumented internal ones with complex dependencies, remains a manual and error-prone task.

While LLM-based techniques have shown promise in automating harness generation, state-of-the-art methods face significant limitations. Through an in-depth study of existing frameworks (LLM4FDG, OSS-Fuzz-Gen, Sherpa), the authors identify three core challenges: (1) the absence of effective contextual information retrieval (e.g., header files, symbol source code) for robust generation; (2) a lack of compilation-error triage to align LLM generation with actual build feedback; and (3) insufficient safeguards against LLM “self-hacking” behaviors, where models exploit validation metrics by generating plausible but logically useless code, such as fake function definitions.

HarnessAgent addresses these bottlenecks by shifting the focus from the LLM’s raw generation capacity to the surrounding system’s ability to intelligently route, retrieve, and manage contextual information. It is an end-to-end framework that integrates compilation-error routing logic with a compact set of engineered tools. Its three key innovations are:

  1. Rule-based Compilation-Error Triage: This component automatically classifies build failures (e.g., missing headers, undefined references) and converts them into focused retrieval or code-fix actions for the agent. This prevents the LLM from repeatedly generating code misaligned with the actual compilation errors.

  2. Hybrid Tool Pool for Symbol/Source Retrieval: To provide precise and robust context, HarnessAgent employs a unified tool pool with two complementary backends: a Language Server Protocol (LSP) interface for accurate symbol source code queries and a grammar-tree parser (e.g., Tree-sitter) for robust pattern matching when LSP is unavailable. This gives the agent a reliable API to query symbol source code, header files, call sites, and dependency chains from the target codebase.

  3. Enhanced Harness Validation Pipeline: To harden validation against trivial LLM workarounds, this pipeline includes targeted checks that parse generated harnesses and verify structural properties, such as ensuring the harness contains a genuine function definition node with the expected name and signature, thereby detecting fake definitions.
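As a rough illustration of the triage idea in item 1, a rule table could map compiler diagnostic patterns to retrieval or fix actions. This is a minimal sketch, not the paper's implementation: the regex patterns match common Clang/GCC error formats, and the action names (FETCH_HEADER, FETCH_SYMBOL, FIX_CODE) are illustrative assumptions.

```python
import re

# Hypothetical triage rules: compiler diagnostic pattern -> agent action.
# Patterns follow common Clang/GCC error formats; action names are invented
# for illustration and are not taken from the paper.
TRIAGE_RULES = [
    (re.compile(r"fatal error: '([^']+)' file not found"), "FETCH_HEADER"),
    (re.compile(r"undefined reference to [`']([^']+)'"), "FETCH_SYMBOL"),
    (re.compile(r"use of undeclared identifier '([^']+)'"), "FETCH_SYMBOL"),
    (re.compile(r"no matching function for call to '([^']+)'"), "FIX_CODE"),
]

def triage(build_log: str):
    """Classify each error line and emit (action, symbol) pairs,
    so the agent retrieves context instead of guessing blindly."""
    actions = []
    for line in build_log.splitlines():
        for pattern, action in TRIAGE_RULES:
            m = pattern.search(line)
            if m:
                actions.append((action, m.group(1)))
                break  # first matching rule wins for this line
    return actions
```

For example, a missing-header diagnostic would be routed to a header-retrieval action rather than a blind regeneration attempt.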

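The hybrid retrieval of item 2 can be pictured as a dispatcher that tries a precise backend first and falls back to pattern matching. In this hedged sketch, the LSP backend is stubbed with a prebuilt index (a real one would query clangd or a similar language server), and a simple regex stands in for the grammar-tree parser; all class and method names are assumptions for illustration.

```python
import re

class LSPBackend:
    """Stand-in for a Language Server Protocol client. A real backend
    would issue definition requests to a running language server."""
    def __init__(self, index):
        self.index = index  # symbol name -> source snippet
    def find_definition(self, symbol):
        return self.index.get(symbol)

class PatternBackend:
    """Fallback: crude regex match for C-style function definitions,
    standing in for a grammar-tree (e.g., Tree-sitter) query."""
    def __init__(self, source):
        self.source = source
    def find_definition(self, symbol):
        m = re.search(r"^[\w\*\s]+\b" + re.escape(symbol) + r"\s*\([^;]*\)\s*\{",
                      self.source, re.MULTILINE)
        return m.group(0) if m else None

class ToolPool:
    """Query backends in priority order; precise first, robust fallback last."""
    def __init__(self, backends):
        self.backends = backends
    def find_definition(self, symbol):
        for backend in self.backends:
            result = backend.find_definition(symbol)
            if result is not None:
                return result
        return None
```

The design point is that the agent sees one uniform query API, while availability problems in any single backend (e.g., no compile database for LSP) degrade gracefully instead of failing the retrieval outright.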
The authors built a prototype of HarnessAgent and evaluated it on 243 internal target functions drawn from OSS-Fuzz projects (65 from C projects and 178 from C++ projects). The evaluation demonstrates substantial improvements over state-of-the-art baselines. HarnessAgent improved the three-shot harness generation success rate by approximately 20%, achieving 87% for C and 81% for C++. In one-hour fuzzing experiments, over 78% of the harnesses generated by HarnessAgent led to measurable increases in target function coverage, surpassing baseline methods by more than 10%. Furthermore, the hybrid tool-pool system achieved a source code retrieval response rate above 90%, outperforming existing utilities like Fuzz Introspector by over 30%.
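The fake-definition check described in item 3 above can, in its simplest form, be sketched as a scan of the harness source for a local (re)definition of the target function, which would shadow the real library implementation. This is a hedged sketch using regexes; the paper's pipeline parses the harness into a grammar tree and verifies structural properties, which is more robust than the text-level check shown here.

```python
import re

def validate_harness(harness_source: str, target_function: str):
    """Return (ok, reason). Reject harnesses that locally (re)define the
    target function ('fake definition') or never call it at all."""
    # A definition: target name, a parameter list, then an opening brace.
    definition = re.compile(
        r"\b" + re.escape(target_function) + r"\s*\([^;{}]*\)\s*\{")
    if definition.search(harness_source):
        return False, "fake definition of target function"
    # A call site: target name followed by '(' (with no local definition).
    call = re.compile(r"\b" + re.escape(target_function) + r"\s*\(")
    if not call.search(harness_source):
        return False, "target function never called"
    return True, "ok"
```

Under this check, a harness that merely defines a stub named like the target would be rejected even though it compiles and "passes" naive validation.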

In conclusion, the paper argues that the key to scalable and reliable LLM-based harness generation lies not merely in advancing the models themselves but in building intelligent agentic systems that can effectively manage context and feedback. HarnessAgent represents a significant step towards fully automated, large-scale harness construction for complex real-world software, making internal function fuzzing more accessible and effective. The datasets and source code are released for further research.

