CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation
Advancements in large language models (LLMs) have enabled the development of intelligent educational tools that support inquiry-based learning across technical domains. In cybersecurity education, where accuracy and safety are paramount, systems must go beyond surface-level relevance to provide information that is both trustworthy and domain-appropriate. To address this challenge, we introduce CyberBOT, a question-answering chatbot that leverages a retrieval-augmented generation (RAG) pipeline to incorporate contextual information from course-specific materials and validate responses using a domain-specific cybersecurity ontology. The ontology serves as a structured reasoning layer that constrains and verifies LLM-generated answers, reducing the risk of misleading or unsafe guidance. CyberBOT has been deployed in a large graduate-level course at Arizona State University (ASU), where more than one hundred students actively engage with the system through a dedicated web-based platform. Computational evaluations in lab environments demonstrate CyberBOT's capabilities, and a forthcoming field study will evaluate its pedagogical impact. By integrating structured domain reasoning with modern generative capabilities, CyberBOT illustrates a promising direction for developing reliable and curriculum-aligned AI applications in specialized educational contexts.
💡 Research Summary
CyberBOT is a question-answering chatbot designed for cybersecurity education that combines retrieval-augmented generation (RAG) with a domain-specific cybersecurity ontology to improve the factual accuracy and safety of large language model (LLM) outputs. The system follows a three-stage pipeline: (1) an Intent Interpreter analyzes multi-turn conversational history to infer the student's underlying intent and rewrites the user query into a knowledge-intensive version; (2) a Retriever searches a curated knowledge base (expert-crafted QA pairs and course materials from ASU's CSE 546 Cloud Computing class) using FAISS-based similarity search on embeddings generated by BAAI's bge-large-en-v1.5 model, returning the top three relevant document chunks; (3) a Generator (Llama 3.3 70B) produces an initial answer conditioned on the retrieved context, after which an Ontology Verifier (also Llama 3.3 70B) checks the answer against a handcrafted cybersecurity ontology. The ontology encodes entities, relationships, and logical constraints (e.g., attacks must be linked to vulnerabilities, and mitigation strategies must map to security policies) and yields a validation score between 0 and 1. Answers below a predefined confidence threshold are rejected or regenerated, ensuring that only ontology-aligned responses reach the learner.
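The retrieve-generate-verify loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `embed` function is a hash-based placeholder standing in for bge-large-en-v1.5, a numpy dot product stands in for the FAISS index, and the `threshold` and `max_retries` values are assumptions (the summary does not give concrete numbers).

```python
import numpy as np

def embed(text, dim=64):
    # Placeholder embedding: the real system uses BAAI's bge-large-en-v1.5;
    # a deterministic hash-seeded vector lets this sketch run offline.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class Retriever:
    """Top-k similarity search (numpy stand-in for the FAISS index)."""
    def __init__(self, chunks):
        self.chunks = chunks
        self.index = np.stack([embed(c) for c in chunks])

    def search(self, query, k=3):
        scores = self.index @ embed(query)          # cosine similarity (unit vectors)
        top = np.argsort(scores)[::-1][:k]          # indices of the k best chunks
        return [self.chunks[i] for i in top]

def answer_with_verification(query, retriever, generate, verify,
                             threshold=0.7, max_retries=2):
    """Generate an answer, then deliver it only if the ontology verifier's
    score clears the threshold; otherwise regenerate, then reject.
    (threshold and max_retries are illustrative, not from the paper.)"""
    context = retriever.search(query, k=3)
    for _ in range(max_retries + 1):
        answer = generate(query, context)
        if verify(answer) >= threshold:
            return answer
    return "Unable to produce an ontology-aligned answer."
```

In the deployed system both `generate` and `verify` would be calls to Llama 3.3 70B (the verifier prompted with the ontology's entities and constraints); here they are left as injectable callables.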
The architecture is containerized with Docker, runs on an A100 80 GB GPU, and exposes a Streamlit web UI. Interaction logs and learning histories are stored in a lightweight SQLite database, enabling future personalization. The system is currently deployed for over one hundred graduate students in the Spring 2025 offering of CSE 546 at Arizona State University, where students can query concepts, request step-by-step explanations for assignments, or seek code-level guidance.
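A minimal sketch of the interaction logging, assuming a single `interactions` table; the summary does not specify the actual schema, so the table and column names here are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema for the interaction log; the deployment
# would use an on-disk database file rather than :memory:.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS interactions (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id  TEXT NOT NULL,
        question TEXT NOT NULL,
        answer   TEXT NOT NULL,
        ts       TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def log_interaction(conn, user_id, question, answer):
    # Record one QA turn; accumulated rows form the learning history
    # that later personalization could draw on.
    conn.execute(
        "INSERT INTO interactions (user_id, question, answer) VALUES (?, ?, ?)",
        (user_id, question, answer),
    )
    conn.commit()

log_interaction(conn, "student42", "What is VPC peering?",
                "VPC peering connects two virtual private clouds...")
```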
For evaluation, the authors used the CyberQ dataset (≈3,500 open‑ended cybersecurity QA pairs) split into Zero‑shot, Few‑shot, and Ontology‑Driven subsets. They measured both QA‑centric metrics (BERTScore, METEOR, ROUGE‑1/2) and RAG‑centric metrics (Faithfulness, Answer Relevancy, Context Precision/Recall, Context Entity Recall) via the RAGAS framework. CyberBOT achieved an average BERTScore of 0.933 and Context Recall of 0.994 across all subsets, indicating high answer quality and effective retrieval. Notably, the Ontology‑Driven subset showed the strongest alignment, confirming that ontology‑based validation substantially reduces hallucinations and improves semantic consistency.
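To make the RAG-centric metrics concrete, here is a toy approximation of context recall: the share of reference-answer sentences that are supported by some retrieved chunk. RAGAS computes support with an LLM judge; the lexical-overlap heuristic below is purely illustrative, and the `support` cutoff is an assumption.

```python
def token_overlap(a, b):
    # Fraction of a's tokens that also appear in b (crude support signal).
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), 1)

def context_recall(reference_sentences, retrieved_chunks, support=0.5):
    """Rough proxy for RAGAS context recall: a reference sentence counts as
    covered if any retrieved chunk overlaps it by at least `support`."""
    covered = sum(
        any(token_overlap(s, c) >= support for c in retrieved_chunks)
        for s in reference_sentences
    )
    return covered / max(len(reference_sentences), 1)
```

A score near 1.0, like CyberBOT's reported 0.994, means the retriever almost always surfaces the material the reference answer depends on.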
The paper’s contributions are threefold: (1) introducing a novel RAG‑plus‑ontology QA architecture that mitigates factual errors in a high‑risk domain; (2) constructing a course‑specific knowledge base and a detailed cybersecurity ontology that can be reused or extended for other curricula; (3) providing a real‑world deployment case with substantial student engagement, laying groundwork for systematic field studies on learner trust, satisfaction, and learning outcomes.
Limitations include the manual effort required to build and maintain the ontology, handling of queries that fall outside the ontology’s scope, and the computational cost and latency associated with large LLM inference. Future work will explore automatic ontology expansion via knowledge‑graph mining, multimodal integration of code and network traffic data, and adaptive learning models that incorporate real‑time student feedback to further personalize the tutoring experience. The authors anticipate that this hybrid approach can be generalized beyond cybersecurity to any specialized educational setting where accuracy and safety are paramount.