VulReaD: Knowledge-Graph-guided Software Vulnerability Reasoning and Detection
Software vulnerability detection (SVD) is a critical challenge in modern systems. Large language models (LLMs) offer natural-language explanations alongside predictions, but most work focuses on binary evaluation, and explanations often lack semantic consistency with Common Weakness Enumeration (CWE) categories. We propose VulReaD, a knowledge-graph-guided approach for vulnerability reasoning and detection that moves beyond binary classification toward CWE-level reasoning. VulReaD leverages a security knowledge graph (KG) as a semantic backbone and uses a strong teacher LLM to generate CWE-consistent contrastive reasoning supervision, enabling student model training without manual annotations. Students are fine-tuned with Odds Ratio Preference Optimization (ORPO) to encourage taxonomy-aligned reasoning while suppressing unsupported explanations. Across three real-world datasets, VulReaD improves binary-detection F1 by 8–10% and multi-class CWE classification by 30% in Macro-F1 and 18% in Micro-F1 over state-of-the-art baselines. Results show that LLMs outperform deep learning baselines in binary detection and that KG-guided reasoning enhances CWE coverage and interpretability.
💡 Research Summary
Software vulnerability detection (SVD) remains a critical security challenge, yet most existing approaches treat it as a binary classification problem—simply labeling code as vulnerable or not. This binary focus neglects the need for precise identification of the underlying weakness, which is formally captured by the Common Weakness Enumeration (CWE) taxonomy. Recent work has explored large language models (LLMs) for SVD because of their strong code‑understanding and natural‑language generation capabilities. However, explanations generated by LLMs often suffer from hallucinations or misalignment with CWE categories, especially for rare vulnerability types.
VulReaD addresses these gaps by introducing a security knowledge graph (KG) that encodes hierarchical and semantic relationships among CWE entries, associated code constructs, and mitigation concepts. The KG serves three purposes: (1) it enriches existing benchmark datasets (PrimeVul, DiverseVul, R2Vul) with structured CWE‑related attributes; (2) it guides a powerful teacher LLM to produce contrastive reasoning pairs for each code sample—one CWE‑consistent rationale and one deliberately flawed rationale generated via label‑flipping; and (3) it provides a semantic backbone during inference to ensure that model predictions are grounded in the taxonomy.
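The contrastive-pair generation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `kg` lookup, the `teacher` callable, and the prompt wording are all hypothetical stand-ins for VulReaD's KG interface and teacher LLM, and label-flipping is shown only for the simple vulnerable-vs-benign case.

```python
from dataclasses import dataclass

@dataclass
class ReasoningPair:
    code: str
    cwe_id: str
    chosen: str    # CWE-consistent rationale from the teacher
    rejected: str  # deliberately flawed rationale produced under a flipped label

def build_pair(code, cwe_id, kg, teacher):
    """Generate one contrastive supervision pair (hypothetical API).

    kg:      maps a CWE id to KG-derived attributes used to ground the prompt
             (assumed interface, not the paper's actual KG schema).
    teacher: callable prompt -> rationale, e.g. a strong instruction-tuned LLM.
    """
    context = kg[cwe_id]  # KG attributes keep the rationale taxonomy-aligned
    # Label-flipping: ask the teacher to argue for the wrong label,
    # yielding a plausible-but-incorrect rationale to reject during training.
    flipped = "benign" if cwe_id != "benign" else "vulnerable"
    chosen = teacher(f"Explain why this code exhibits {cwe_id} ({context}):\n{code}")
    rejected = teacher(f"Explain why this code is {flipped}:\n{code}")
    return ReasoningPair(code, cwe_id, chosen, rejected)
```

In this sketch the same teacher produces both rationales; only the label in the prompt changes, so each code sample yields one preferred and one dispreferred explanation without any human annotation.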
The student model is fine‑tuned using parameter‑efficient adapters combined with Odds Ratio Preference Optimization (ORPO). ORPO maximizes the probability ratio between the correct and incorrect rationales, encouraging the model to prefer factually grounded, CWE‑aligned explanations while suppressing misleading ones. This training pipeline eliminates the need for costly human preference annotations across the long‑tailed CWE space.
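The ORPO objective can be illustrated with a small numeric sketch. This is a simplified, self-contained version assuming sequence likelihoods are summarized as average per-token log-probabilities; the weight `lam` is a hypothetical default, not a value reported in the paper.

```python
import math

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    """Simplified ORPO objective over one contrastive rationale pair.

    logp_chosen / logp_rejected: average per-token log-likelihood the student
    assigns to the CWE-consistent and flawed rationales (values < 0).
    lam: weight on the odds-ratio term (hypothetical default).
    """
    def log_odds(logp):
        # log odds of a sequence: log(p / (1 - p)) computed from log p
        p = math.exp(logp)
        return logp - math.log(1.0 - p)

    # Odds-ratio term: -log sigmoid(log-odds gap between chosen and rejected),
    # which shrinks as the model prefers the grounded rationale.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    l_or = -math.log(1.0 / (1.0 + math.exp(-ratio)))

    # Standard NLL on the chosen rationale preserves generation quality,
    # so no separate SFT stage or reference model is needed.
    l_sft = -logp_chosen
    return l_sft + lam * l_or
```

A model that favors the CWE-consistent rationale incurs a lower loss than one that favors the flawed rationale, which is the preference signal that replaces human annotations here.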
Empirical evaluation across three real‑world datasets shows that VulReaD consistently outperforms state‑of‑the‑art baselines. In binary detection, it improves F1 scores by roughly 8–10%. For multi‑class CWE classification, it achieves an average 30% gain in macro‑F1 and an 18% gain in micro‑F1. Notably, traditional deep‑learning models attain only about 30% average F1, whereas LLM‑based methods, especially the Qwen2.5‑7B‑Instruct student, deliver substantially higher performance. The KG‑guided reasoning also expands CWE coverage and markedly enhances the interpretability of generated explanations.
The paper makes four key contributions: (1) a shift from binary detection to CWE‑level vulnerability reasoning; (2) a knowledge‑guided dataset distillation process that creates paired, taxonomy‑consistent rationales without manual labeling; (3) the integration of ORPO with lightweight fine‑tuning to produce grounded explanations; and (4) a comprehensive comparative study demonstrating that (i) LLMs generally surpass deep‑learning baselines in binary SVD, and (ii) KG‑guided reasoning substantially boosts accuracy and interpretability in multi‑class CWE detection. These findings underscore the importance of coupling large language models with structured security knowledge to build trustworthy, explainable vulnerability analysis systems.