Insider Threat Detection Using GCN and Bi-LSTM with Explicit and Implicit Graph Representations


📝 Original Info

  • Title: Insider Threat Detection Using GCN and Bi-LSTM with Explicit and Implicit Graph Representations
  • ArXiv ID: 2512.18483
  • Date: 2025-12-20
  • Authors: Rahul Yumlembam, Biju Issac, Seibu Mary Jacob, Longzhi Yang, Deepa Krishnan

📝 Abstract

Insider threat detection (ITD) is challenging due to the subtle and concealed nature of malicious activities performed by trusted users. This paper proposes a post-hoc ITD framework that integrates explicit and implicit graph representations with temporal modelling to capture complex user behaviour patterns. An explicit graph is constructed using predefined organisational rules to model direct relationships among user activities. To mitigate noise and limitations in this hand-crafted structure, an implicit graph is learned from feature similarities using the Gumbel-Softmax trick, enabling the discovery of latent behavioural relationships. Separate Graph Convolutional Networks (GCNs) process the explicit and implicit graphs to generate node embeddings, which are concatenated and refined through an attention mechanism to emphasise threat-relevant features. The refined representations are then passed to a bidirectional Long Short-Term Memory (Bi-LSTM) network to capture temporal dependencies in user behaviour. Activities are flagged as anomalous when their probability scores fall below a predefined threshold. Extensive experiments on CERT r5.2 and r6.2 datasets demonstrate that the proposed framework outperforms state-of-the-art methods. On r5.2, the model achieves an AUC of 98.62, a detection rate of 100%, and a false positive rate of 0.05. On the more challenging r6.2 dataset, it attains an AUC of 88.48, a detection rate of 80.15%, and a false positive rate of 0.15, highlighting the effectiveness of combining graph-based and temporal representations for robust ITD.
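The implicit-graph step in the abstract can be illustrated with a minimal sketch of Gumbel-Softmax edge sampling. This is not the authors' implementation: the cosine-similarity edge probability, the temperature `tau`, and the function names (`gumbel_softmax`, `sample_implicit_adjacency`) are assumptions chosen for illustration, and a trainable model would use an automatic-differentiation framework rather than plain Python.

```python
import math
import random


def gumbel_softmax(logits, tau=0.5, rng=random):
    """Relaxed one-hot sample from unnormalised log-probabilities."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def sample_implicit_adjacency(features, tau=0.5, seed=0):
    """Sample a soft, symmetric adjacency matrix from feature similarity.

    Assumption: the edge probability is cosine similarity rescaled to (0, 1).
    """
    rng = random.Random(seed)
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = (cosine(features[i], features[j]) + 1.0) / 2.0
            p = min(max(p, 1e-6), 1.0 - 1e-6)
            # Two categories per node pair: "edge" and "no edge".
            probs = gumbel_softmax([math.log(p), math.log(1.0 - p)], tau, rng)
            adj[i][j] = adj[j][i] = probs[0]
    return adj


# Hypothetical activity feature vectors, for illustration only.
features = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
adjacency = sample_implicit_adjacency(features)
```

Lower values of `tau` push each sampled edge weight towards 0 or 1, while higher values keep the weights soft. The relaxation is what lets a discrete edge decision be trained with gradients.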

💡 Deep Analysis

Figure 1

📄 Full Content

Insider threats are among the most critical challenges in modern cybersecurity. Trusted individuals with legitimate access to sensitive systems can execute malicious activities that are notoriously difficult to detect using conventional methods. Current approaches often rely on either explicit graphs, which capture observable relationships, or implicit graphs, which identify hidden patterns. However, each has limitations when used alone, leading to suboptimal detection performance.

The framework introduced in this paper overcomes these limitations by combining explicit and implicit graph representations with temporal modelling via a Bi-LSTM network. This hybrid approach captures both observable and hidden relational patterns while analysing the sequential dynamics of user behaviour. Tested on the CERT insider threat datasets, the framework achieved strong results. On the r5.2 dataset, it reached an AUC of 98.62, a detection rate of 100%, and a false positive rate of 0.05, surpassing state-of-the-art methods such as LAN and DeepLog. On the more challenging r6.2 dataset, it demonstrated robust performance with an AUC of 88.48 and a detection rate of 80.15%.
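To make the reported metrics concrete, the sketch below computes a detection rate and a false positive rate from per-activity probability scores. It assumes the usual definitions (detection rate = TP / (TP + FN), false positive rate = FP / (FP + TN)) and follows the abstract's rule of flagging an activity when its score falls below a threshold. The scores, labels, and threshold are made-up illustration values, not data from the paper.

```python
def detection_metrics(scores, labels, threshold):
    """Return (detection_rate, false_positive_rate).

    labels: 1 = malicious, 0 = normal.
    An activity is flagged as anomalous when its score is below the threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score < threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Hypothetical scores and ground-truth labels, for illustration only.
scores = [0.02, 0.91, 0.40, 0.88, 0.05, 0.75]
labels = [1, 0, 0, 0, 1, 0]
dr, fpr = detection_metrics(scores, labels, threshold=0.5)
print(dr, fpr)  # 1.0 0.25
```

In this toy example both malicious activities are flagged, and one of the four normal activities is flagged as well.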

This research has the potential to significantly enhance insider threat detection systems in environments such as critical infrastructure, financial services, and government organizations, where timely and accurate detection is essential to safeguard sensitive data and assets. By improving detection accuracy and reducing false positives, this work lays a strong foundation for developing more reliable and efficient solutions to address insider threats.

The increasing threat of malicious insider activities within organisations is a critical concern. These threats are challenging to detect due to insiders’ legitimate network access. Organisations routinely collect network logs that record a range of information: login/logout times, opened files, removable device connections and disconnections, web browsing history, and email communication. Analysing these logs can reveal consistent patterns in a user’s daily behaviour, under the assumption that a user’s activities are largely consistent over time. Numerous studies have been conducted on insider threat detection, and they can generally be grouped into three main categories. The first category focuses on identifying various patterns of insider threats and conducting anomaly detection. These methods aim to establish baseline behaviours for users to differentiate between normal users and potential insider threats, employing approaches such as machine learning and deep learning. The second category emphasises transforming user behaviours into sequence data, capturing the temporal relationships among log entries. Techniques such as recurrent neural networks (RNN) and long short-term memory (LSTM) are frequently used to capture temporal aspects of activities. The third category constructs graphs that model the relationships between users or activities, capturing their underlying relational structure.

In this work, we focus on combining graph-based and temporal methods due to their ability to capture complex and subtle patterns in relational data, along with the temporal aspects of the data. Most previous work has approached the problem by either constructing implicit graphs or explicit graphs. In modelling relationships among activities, recent studies have approached implicit graph structure learning as a process of learning similarity metrics within the node embedding space. This approach assumes that node attributes inherently contain valuable information for deducing the implicit topological structure of the graph [2]. In explicit graph learning, recent work constructs the graph based on predefined rules that consider different relationships between activities [12], [13], [3], [4]. This method captures direct, observable interactions, providing a clear and structured representation of user activities and interactions.
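As a rough illustration of rule-based explicit graph construction, the sketch below links activity nodes using two rules. The rules are hypothetical, since this excerpt does not list the rules actually used: (1) consecutive activities by the same user on the same day, and (2) activities by different users on the same PC on the same day. The `Activity` record and its field names are likewise assumptions.

```python
from collections import namedtuple

# Hypothetical activity record; the field names are illustrative assumptions.
Activity = namedtuple("Activity", "user pc day kind")


def build_explicit_edges(activities):
    """Return sorted (earlier_index, later_index) edges from two example rules."""
    edges = set()
    last_by_user_day = {}  # (user, day) -> index of that user's latest activity
    seen_by_pc_day = {}    # (pc, day) -> indices of activities seen on that PC
    for idx, act in enumerate(activities):
        # Rule 1 (assumed): link consecutive activities of a user within a day.
        user_key = (act.user, act.day)
        if user_key in last_by_user_day:
            edges.add((last_by_user_day[user_key], idx))
        last_by_user_day[user_key] = idx
        # Rule 2 (assumed): link activities of different users on a shared PC.
        pc_key = (act.pc, act.day)
        for other in seen_by_pc_day.get(pc_key, []):
            if activities[other].user != act.user:
                edges.add((other, idx))
        seen_by_pc_day.setdefault(pc_key, []).append(idx)
    return sorted(edges)


# Toy log in chronological order, for illustration only.
log = [
    Activity("u1", "pc1", 1, "logon"),
    Activity("u1", "pc1", 1, "usb"),
    Activity("u2", "pc1", 1, "logon"),
    Activity("u1", "pc2", 2, "email"),
]
edges = build_explicit_edges(log)
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```

Because every edge comes from a fixed rule, the resulting structure is easy to inspect, but it can only encode relationships that someone thought to write down.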

Relying solely on either an implicit graph or an explicit graph poses significant challenges. Using only an implicit graph assumes that its optimized structure represents a “variation” or substructure tailored for the downstream task. Graph structure learning aims to capture this optimized graph through a similarity metric, but it may fail to include the valuable contextual information embedded in explicit relationships. On the other hand, an explicit graph captures direct, observable interactions but may overlook hidden patterns and subtle relationships that are crucial for detecting anomalies. Explicit graphs also often contain noise and may be incomplete, and so provide an inadequate representation of the data. In contrast, the implicit graph, derived from feature similarities, is optimized for downstream prediction tasks and can identify relationships that the explicit graph might overlook.
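The idea of processing the two graphs separately and then combining them can be sketched with one standard GCN propagation step per graph, followed by concatenation of the node embeddings. This is a minimal sketch, not the paper's architecture: it assumes the common symmetric normalisation with self-loops and a ReLU activation, uses hand-picked weights instead of trained ones, and omits the attention and Bi-LSTM stages.

```python
import math


def normalise_adjacency(adj):
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    propagated = matmul(matmul(normalise_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


def fuse_embeddings(explicit_adj, implicit_adj, features, w_exp, w_imp):
    """Run one GCN layer per graph and concatenate the node embeddings."""
    h_exp = gcn_layer(explicit_adj, features, w_exp)
    h_imp = gcn_layer(implicit_adj, features, w_imp)
    return [e + i for e, i in zip(h_exp, h_imp)]


# Toy inputs, for illustration only: 3 activity nodes with 2 features each.
features = [[1.0, 0.5], [0.2, 0.8], [0.9, 0.1]]
explicit_adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
implicit_adj = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.2], [0.9, 0.2, 0.0]]
w_exp = [[1.0, 0.0], [0.0, 1.0]]
w_imp = [[0.5, -0.5], [0.5, 0.5]]
fused = fuse_embeddings(explicit_adj, implicit_adj, features, w_exp, w_imp)
```

Each fused row holds the explicit-graph embedding followed by the implicit-graph embedding for one node, so later stages can weight the two views differently.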

Therefore, this work combines both explicit and implicit graphs to leverage the strengths of each, ensuring a comprehensive represe


Reference

This content is AI-processed based on open access ArXiv data.
