The Role of AI in Modern Penetration Testing
Penetration testing is a cornerstone of cybersecurity, traditionally driven by manual, time-intensive processes. As systems grow in complexity, there is a pressing need for more scalable and efficient testing methodologies. This systematic literature review examines how Artificial Intelligence (AI) is reshaping penetration testing, analyzing 58 peer-reviewed studies from major academic databases. Our findings reveal that while AI-assisted pentesting is still in its early stages, notable progress is underway, particularly through Reinforcement Learning (RL), which was the focus of 77% of the reviewed works. Most research centers on the discovery and exploitation phases of pentesting, where AI shows the greatest promise in automating repetitive tasks, optimizing attack strategies, and improving vulnerability identification. Real-world applications remain limited but encouraging, including the European Space Agency’s PenBox and various open-source tools. These demonstrate AI’s potential to streamline attack path analysis, navigate complex network topologies, and reduce manual workload. However, challenges persist: current models often lack flexibility and are underdeveloped for the reconnaissance and post-exploitation phases of pentesting. Applications involving Large Language Models (LLMs) remain relatively under-researched, pointing to a promising direction for future exploration. This paper offers a critical overview of AI’s current and potential role in penetration testing, providing valuable insights for researchers, practitioners, and organizations aiming to enhance security assessments through advanced automation or to identify gaps in existing research.
💡 Research Summary
The paper conducts a systematic literature review of 58 peer‑reviewed studies published between 2020 and 2025 to assess how artificial intelligence is reshaping penetration testing. Following the Kitchenham‑Charters methodology, the authors searched the ACM Digital Library, IEEE Xplore, Scopus, and SpringerLink using combined keywords for “penetration testing” and various AI terms (ML, RL, LLM, etc.). After duplicate removal and the application of strict inclusion/exclusion criteria, 58 papers remained for analysis.
Four research questions guide the review. RQ1 asks how AI is currently applied; the answer shows that real‑world deployments are scarce, with the European Space Agency’s PenBox being the most notable example, limited to space‑system security. Other prototypes include RL‑based tools for dynamic XSS detection, network discovery, attack‑path optimization, and a deep‑neural‑network policy generator (ASAP). A unique LLM‑driven system, PenHeal, addresses the remediation phase by generating vulnerability reports and guiding fixes.
RQ2 investigates which AI methodologies dominate. Reinforcement learning accounts for roughly 77% of the surveyed works, primarily used to model attacker behavior, optimize attack paths, and explore unknown network topologies. Large language models appear in only four studies, focusing on social‑engineering assistance, reporting, or remediation. One paper also explores synthetic media for creating fake identities in social‑engineering attacks.
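To make the dominant RL framing concrete, the sketch below shows how attack‑path optimization can be cast as a reinforcement‑learning problem: hosts are states, lateral movements are actions, and reaching a high‑value target yields a reward. This is a minimal tabular Q‑learning toy on a hypothetical network graph, not the formulation of any specific surveyed tool; all host names, rewards, and hyperparameters here are illustrative assumptions.

```python
import random

# Illustrative "attack graph": nodes are hosts, edges are possible pivots.
# Real RL pentesting systems learn over far richer state/action spaces.
GRAPH = {
    "dmz": ["web", "mail"],
    "web": ["db", "dmz"],
    "mail": ["dmz"],
    "db": ["domain_controller", "web"],
    "domain_controller": [],  # goal: terminal state
}
GOAL = "domain_controller"

def q_learn(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, nbrs in GRAPH.items() for a in nbrs}
    for _ in range(episodes):
        state = "dmz"
        while state != GOAL:
            actions = GRAPH[state]
            if rng.random() < eps:
                action = rng.choice(actions)          # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            # Reward reaching the goal; small step penalty favors short paths.
            reward = 10.0 if action == GOAL else -1.0
            future = max((q[(action, a)] for a in GRAPH[action]), default=0.0)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = action
    return q

def best_path(q, start="dmz"):
    """Follow the greedy policy from the entry point to the goal."""
    path, state = [start], start
    while state != GOAL:
        state = max(GRAPH[state], key=lambda a: q[(state, a)])
        path.append(state)
    return path

if __name__ == "__main__":
    print(best_path(q_learn()))
```

After training, the greedy policy recovers the shortest pivot chain (dmz → web → db → domain_controller). The surveyed RL work extends this basic idea with partial observability, learned network discovery, and much larger action spaces.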
RQ3 maps AI contributions to the four NIST‑defined pentesting phases: Preparation, Discovery, Exploitation, and Reporting. The majority (≈73%) target the Discovery phase, followed by Exploitation. Preparation and Reporting receive far less attention (seven and five papers, respectively), highlighting a research gap in AI‑supported reconnaissance and post‑exploitation documentation.
RQ4 evaluates benefits and limitations. Benefits include automation of repetitive tasks, speed and scalability improvements, and the ability to uncover novel vulnerabilities beyond signature‑based methods. Limitations involve model rigidity, false‑positive/negative rates, ethical and legal concerns, the risk of malicious misuse (especially for locally‑run RL tools), and dependence on external APIs for LLM‑based solutions, raising cost and privacy issues.
The authors conclude that AI‑assisted penetration testing is still in its infancy but shows strong momentum, especially in reinforcement‑learning research. They recommend expanding AI applications to the under‑explored Preparation and Reporting stages, integrating multimodal data, and developing human‑AI collaborative frameworks. Future work should also address model interpretability, robustness, and governance to mitigate misuse while harnessing AI’s potential to make security assessments more efficient and comprehensive.