SynAT: Enhancing Security Knowledge Bases via Automatic Synthesizing Attack Tree from Crowd Discussions
Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their security knowledge bases to safeguard against attacks and vulnerabilities. However, due to the time lag in the official release of security information, these security knowledge bases may not be well maintained, and using them to protect software systems against emergent security risks can be challenging. On the other hand, the security posts on online knowledge-sharing platforms contain many crowd security discussions, and the knowledge in those posts can be used to enhance the security knowledge bases. This paper proposes SynAT, an automatic approach to synthesize attack trees from crowd security posts. Given a security post, SynAT first uses a Large Language Model (LLM) and prompt learning to restrict the scope of sentences that may contain attack information; then it uses a transition-based event and relation extraction model to extract the events and relations simultaneously from that scope; finally, it applies heuristic rules to synthesize the attack trees from the extracted events and relations. An experimental evaluation is conducted on 5,070 Stack Overflow security posts, and the results show that SynAT outperforms all baselines in both event and relation extraction, and achieves the highest tree similarity in attack tree synthesis. Furthermore, SynAT has been applied to enhance HUAWEI’s security knowledge base as well as the public security knowledge bases CVE and CAPEC, which demonstrates SynAT’s practicality.
💡 Research Summary
The paper introduces SynAT, an automated pipeline that synthesizes attack trees from crowd-sourced security discussions on platforms such as Stack Overflow and GitHub. Traditional security knowledge bases like CVE suffer from a delay between vulnerability discovery and official publication, so the authors aim to close this gap by extracting emerging attack information directly from developer conversations. SynAT operates in three stages. First, a large language model (LLM) with automatically generated prompts identifies the sentences likely to contain attack-related content, which improves precision over simple keyword filters. Second, a transition-based joint event and relation extraction model processes the selected sentences, detecting attack events (trigger, target, instrument) and the logical connections between them (AND, OR) in a single pass. Learning the two tasks jointly captures inter-event dependencies and yields F1 scores of 80.93% for events and 87.81% for relations. Third, a set of heuristic rules maps the extracted events and relations into a structured attack tree: the attack goal is linked to its methods, an AND node requires all of its child events, an OR node requires any one of them, and additional steps prune duplicates and enforce depth limits.
The authors evaluated SynAT on 5,070 Stack Overflow security posts and 2,350 GitHub issue reports, comparing it against several baselines, including rule-based extractors and state-of-the-art relation models. SynAT outperformed all baselines in both event and relation extraction and achieved the best tree similarity, with reported figures of 10.24% for average Hamming distance and 7.93% for tree-edit distance similarity. Practical impact is demonstrated by integrating 1,354 newly synthesized attack trees into Huawei’s internal security knowledge base and by enriching public repositories such as CVE and CAPEC with previously undocumented attack scenarios.
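To make the AND/OR semantics of the third stage concrete, the following is a minimal Python sketch of assembling an attack tree from extracted relations and checking whether the goal is reachable. It is an illustration under stated assumptions, not the authors' heuristic rules: the `(parent, gate, child)` triple format, the names `AttackNode`, `build_tree`, and `is_achievable`, and the sample attack steps are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AttackNode:
    """One node of an attack tree: the goal or an attack step (hypothetical type)."""
    label: str
    gate: str = "LEAF"  # "AND", "OR", or "LEAF"
    children: List["AttackNode"] = field(default_factory=list)


def build_tree(goal: str, relations: List[Tuple[str, str, str]]) -> AttackNode:
    """Assemble a tree from (parent, gate, child) triples (assumed input format).

    Duplicate edges are dropped, and a child that already appears on the
    path from the root is skipped so the result stays a tree.
    """
    gates: Dict[str, str] = {}
    kids: Dict[str, List[str]] = {}
    for parent, gate, child in relations:
        gates.setdefault(parent, gate)   # first gate seen for a parent wins
        siblings = kids.setdefault(parent, [])
        if child not in siblings:        # prune duplicate edges
            siblings.append(child)

    def expand(label: str, path: Set[str]) -> AttackNode:
        seen = path | {label}
        children = [expand(c, seen) for c in kids.get(label, []) if c not in seen]
        gate = gates[label] if children else "LEAF"
        return AttackNode(label, gate, children)

    return expand(goal, set())


def is_achievable(node: AttackNode, feasible: Set[str]) -> bool:
    """AND needs every child to succeed; OR needs at least one."""
    if node.gate == "LEAF":
        return node.label in feasible
    results = [is_achievable(child, feasible) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


# Hypothetical relations, as an extraction stage might emit them.
relations = [
    ("steal session cookie", "OR", "inject script via XSS"),
    ("steal session cookie", "OR", "sniff traffic"),
    ("sniff traffic", "AND", "join victim network"),
    ("sniff traffic", "AND", "site served over HTTP"),
    ("steal session cookie", "OR", "sniff traffic"),  # duplicate, pruned
]
tree = build_tree("steal session cookie", relations)
print(is_achievable(tree, {"join victim network"}))
print(is_achievable(tree, {"join victim network", "site served over HTTP"}))
```

In this example the first check fails because the AND node "sniff traffic" has only one of its two children satisfied, while the second succeeds because both are. A depth limit, which the summary also mentions, could be added by passing a remaining-depth counter to `expand`.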
The paper also discusses limitations: dependence on well‑crafted LLM prompts, computational cost of the transition model, and the domain‑specific nature of the heuristic rules. Future work includes extending to multilingual forums, real‑time streaming data, and learning the synthesis rules automatically. Overall, SynAT offers a novel, effective solution for keeping security knowledge bases up‑to‑date by converting informal crowd discussions into formal, actionable attack‑tree representations.