A metric for software vulnerabilities classification

Vulnerability discovery and exploit detection are two broad areas of study in software engineering. This preliminary work combines existing methods with machine learning techniques to define a metric for classifying vulnerable computer programs. First, a feature set is defined; then two models are tested against real-world vulnerabilities. A relationship between the classifier choice and the features is also outlined.


💡 Research Summary

The paper tackles the problem of quantifying software vulnerabilities by constructing a classification metric that blends traditional vulnerability discovery methods with modern machine learning techniques. The authors begin by assembling a comprehensive feature set drawn from both static and dynamic analyses as well as meta‑information associated with each vulnerability. The static features include code complexity metrics (cyclomatic complexity, nesting depth, function length), control‑flow graph properties, and presence of input validation routines. Dynamic features capture runtime behavior such as system‑call frequency, memory allocation patterns, and execution traces obtained from sandboxed runs. Meta‑features comprise publication year, affected product families, and CVE identifiers. In total, 25 features are defined after a preliminary statistical correlation study that confirms each feature’s relevance to two target labels: CVSS‑based severity and exploit availability (as recorded in Exploit‑DB).
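The correlation-based feature screening described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the feature names, synthetic data, and the 0.1 threshold are assumptions chosen only to demonstrate the filtering step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the paper's feature groups (names are illustrative).
feature_names = [
    "cyclomatic_complexity", "nesting_depth", "function_length",   # static
    "syscall_frequency", "alloc_pattern_score",                    # dynamic
    "publication_year",                                            # meta
]

# Synthetic feature matrix and binary label (e.g., "public exploit exists").
X = rng.normal(size=(500, len(feature_names)))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

def correlation_filter(X, y, threshold=0.1):
    """Keep feature indices whose absolute Pearson correlation with the label
    exceeds the threshold -- a simple stand-in for the paper's correlation study."""
    keep = []
    for j in range(X.shape[1]):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= threshold:
            keep.append(j)
    return keep

selected = correlation_filter(X, y)
print([feature_names[j] for j in selected])
```

In this toy setup only the features that actually drive the label should survive the filter; in the paper the same idea is applied to all 25 candidate features against both target labels.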

Using this feature matrix, two classification models are built and evaluated on a real‑world dataset consisting of 3,200 CVE entries from 2010‑2022, each annotated with whether a public exploit exists. The dataset is split 70 % training, 15 % validation, and 15 % testing. The first model is a Random Forest (RF) ensemble, chosen for its robustness to over‑fitting and its inherent interpretability. The second model is a four‑layer Deep Neural Network (DNN) designed to capture non‑linear interactions among features. Both models are trained with identical hyper‑parameter search procedures and evaluated using accuracy, precision, recall, F1‑score, and ROC‑AUC.
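A minimal scikit-learn sketch of the evaluation pipeline above, under stated assumptions: the data here is synthetic (the real dataset of 3,200 CVE entries is not public in this summary), and `MLPClassifier` stands in for the paper's four-layer DNN.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 25))  # 25 features, as in the paper
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.7, size=1000) > 0).astype(int)

# 70% train, 15% validation, 15% test -- the paper's split.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, train_size=0.70, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # Four hidden layers, echoing the paper's "four-layer DNN" (sizes assumed).
    "DNN": MLPClassifier(hidden_layer_sizes=(64, 32, 16, 8),
                         max_iter=500, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "acc": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
    }
    print(name, scores[name])
```

The validation split would normally drive the hyper-parameter search the authors mention; it is left unused here to keep the sketch short.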

Results show that the DNN marginally outperforms the RF (accuracy 86.1 % vs. 84.3 %; F1‑score 82.8 % vs. 80.3 %; ROC‑AUC 0.91 vs. 0.89) but at the cost of substantially longer training time (approximately three times longer) and reduced transparency. Feature importance analysis, performed with SHAP values, reveals that static code metrics—particularly average function complexity, nesting depth, and the presence of explicit input‑validation calls—contribute the most to classification decisions. Dynamic metrics such as system‑call frequency and memory‑allocation patterns improve performance mainly for specific vulnerability families (e.g., privilege‑escalation and remote‑code‑execution bugs). Meta‑features like publication year and product line have comparatively minor impact.
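The feature-importance analysis can be approximated as below. Note the substitution: the paper uses SHAP values, while this sketch uses scikit-learn's permutation importance as a readily available stand-in; the data and the "static features dominate" setup are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 5))
# Label driven mostly by the first two ("static") features, mimicking the
# paper's finding that static code metrics dominate.
y = (2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Permutation importance: shuffle one feature at a time and measure the
# drop in score -- a model-agnostic proxy for SHAP-style attribution.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("importance ranking (most to least):", ranking)
```

In this toy example the dominant feature should rank first, mirroring how the authors identify average function complexity and nesting depth as the strongest contributors.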

The authors discuss the interplay between classifier choice and feature composition. When static features dominate, a tree‑based model like RF offers a favorable trade‑off between performance and explainability, making it suitable for operational security teams that need to justify patch‑prioritization. Conversely, when a richer set of dynamic features is available, the DNN’s ability to model complex, non‑linear relationships yields modest gains that may justify its higher computational cost in research or high‑value asset contexts.

Finally, the paper outlines future work: automated feature engineering via AutoML pipelines, incorporation of transformer‑based sequence models for richer runtime trace analysis, and extension of the metric to a multi‑label framework that simultaneously predicts severity and exploit difficulty. By presenting a systematic methodology for vulnerability metric construction and demonstrating its practical viability on a sizable real‑world dataset, the study contributes a valuable tool for risk assessment, automated patch scheduling, and the overall improvement of vulnerability databases.
