A Study of Dynamic Call Tracing and Graph-Based Representations for Automated WebShell Family Classification

February 23, 2026

Reading time: 6 minutes

...

📝 Original Info

Title: A Study of Dynamic Call Tracing and Graph-Based Representations for Automated WebShell Family Classification
ArXiv ID: 2512.05288
Date: 2025-12-04
Authors: Feijiang Han

📝 Abstract

Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection (i.e., distinguishing malicious samples from benign ones), we argue that it is time to transition from passive detection to in-depth analysis and proactive defense. One promising direction is the automation of WebShell family classification, which involves identifying the specific malware lineage in order to understand an adversary's tactics and enable a precise, rapid response. This crucial task, however, remains a largely unexplored area that currently relies on slow, manual expert analysis. To address this gap, we present the first systematic study to automate WebShell family classification. Our method begins with extracting dynamic function call traces to capture inherent behaviors that are resistant to common encryption and obfuscation. To enhance the scale and diversity of our dataset for a more stable evaluation, we augment these real-world traces with new variants synthesized by Large Language Models. These augmented traces are then abstracted into sequences, graphs, and trees, providing a foundation to benchmark a comprehensive suite of representation methods. Our evaluation spans classic sequence-based embeddings (CBOW, GloVe), transformers (BERT, SimCSE), and a range of structure-aware algorithms, including Graph Kernels, Graph Edit Distance, Graph2Vec, and various Graph Neural Networks. Through extensive experiments on four real-world, family-annotated datasets under both supervised and unsupervised settings, we establish a robust baseline and provide practical insights into the most effective combinations of data abstractions, representation models, and learning paradigms for this challenge.
This foundational work is a crucial step toward automating threat intelligence, accelerating incident response, and ultimately enhancing the resilience of the digital services that society depends on.

💡 Deep Analysis

Deep Dive into A Study of Dynamic Call Tracing and Graph-Based Representations for Automated WebShell Family Classification.

📄 Full Content

Malicious WebShells have evolved from simple scripts into strategic assets used in sophisticated attacks that directly threaten critical public services in sectors like healthcare and finance, endangering the sensitive data of millions. To counter this pervasive threat, the research community has achieved considerable success in developing automated techniques for WebShell detection (Tu et al. 2014; Aboaoja et al. 2022; Ma, Han, and Zhou 2024; Feng et al. 2024; Han et al. 2025c).

While successful, this focus on binary classification (malicious vs. benign) provides only a foundational first line of defense and offers limited actionable intelligence for subsequent security operations. A more proactive and robust security posture requires not just knowing that a server is compromised, but understanding the specific nature of the threat itself. This necessitates WebShell family classification: the task of identifying the specific variant or lineage of the malware. Automating this process is crucial as it unlocks a deeper level of threat intelligence, helping security teams attribute attacks, anticipate an adversary’s next moves, and mount a faster, more targeted incident response (Zhao et al. 2024). For instance, an automated system can reduce response time from hours of manual expert analysis to mere seconds, enabling security operation centers (SOCs) to trigger specific defense playbooks tailored to a family’s known tactics before significant damage, like data exfiltration, occurs. This critical task, however, remains largely unexplored in the research community, with current practices relying on time-consuming manual analysis.

We argue that automating this task is technically feasible for two primary reasons. First, WebShells within the same family often share distinct behavioral characteristics due to code reuse (Wrench and Irwin 2015; Starov et al. 2016). Second, this malicious behavior can be captured in the program’s dynamic function call traces even when the source code is obfuscated (De Goër et al. 2018; Xu and Chen 2023). This insight forms our core hypothesis: by learning to recognize these fundamental behavioral patterns, a model can effectively group and track WebShell families, even when they are protected by surface-level obfuscation.
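To make the hypothesis concrete, the sketch below compares call traces by the overlap of their call bigrams. It is only an illustration and not the paper's method: the traces are invented, the PHP function names are chosen for the example, and Jaccard similarity over n-grams is a stand-in for the learned representations the study benchmarks.

```python
def ngrams(calls, n=2):
    """Return the set of length-n windows over a list of function names."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard similarity between the n-gram sets of two call traces."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two traces too short to form any n-gram count as identical
    return len(ga & gb) / len(ga | gb)


# Hypothetical traces: two variants of one family that differ only in an
# extra decoding layer, plus an unrelated upload-style sample.
variant_a = ["base64_decode", "gzinflate", "eval", "system"]
variant_b = ["str_rot13", "base64_decode", "gzinflate", "eval", "system"]
other = ["move_uploaded_file", "chmod", "fopen", "fwrite"]

same_family = jaccard(variant_a, variant_b)
different_family = jaccard(variant_a, other)
print(same_family, different_family)
```

In this toy case the extra obfuscation layer changes one bigram out of four, so the two variants stay close while the unrelated sample shares nothing with them.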

However, family classification is inherently more challenging than binary detection, as it requires models that can capture the nuanced behavioral patterns that differentiate families, not just generic malicious traits. This challenge motivates the foundational research question of our work: What data structures and representation methods are most effective for capturing these family-specific behaviors?

To answer this question, this paper presents the first systematic study to benchmark WebShell family classification.

We conduct a large-scale empirical evaluation of diverse data abstractions and representation learning methods designed to capture WebShell behavior. Our goal is to establish a robust foundation and a practical guide for this critical task.

Our contributions are as follows:

• A Comprehensive Methodological Framework. We design and execute the first large-scale benchmark for this task. To ensure a robust evaluation, we introduce a data synthesis framework leveraging a Large Language Model (LLM) to augment our real-world data with diverse, behaviorally-consistent function call traces. We abstract this enriched dataset into three fundamental data structures (sequences, graphs, and trees) and systematically evaluate a diverse spectrum of representation learning methods, from classic word embeddings and transformers to structure-aware algorithms like Graph Kernels and various Graph Neural Networks (GNNs).

• A Robust Empirical Baseline. Through extensive experiments on four real-world datasets with both supervised and unsupervised classification, we establish the first robust, data-driven performance baseline for WebShell family classification. This provides a crucial point of comparison for all future work in this emerging area.

• Actionable Insights for the Security Community. Our analysis delivers a clear hierarchy of performance, demonstrating that structural representations (especially trees) are decisively more effective than sequential ones, and that GNNs are the premier modeling architecture. These findings offer immediate, practical guidance for practitioners and researchers aiming to build effective classification systems.

• A Practical Guide to Implementation. We distill our findings into a set of best practices for implementation, detailing optimal strategies for model selection and hyperparameter configuration.

Ultimately, this work provides both a foundational benchmark and a practical guide, empowering the community to move beyond simple detection and build the next generation of intelligent, fine-grained defense systems.
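The three abstractions named in the contributions (sequences, graphs, and trees) can be sketched from a single trace as follows. This is a minimal illustration under stated assumptions: the excerpt does not give the trace format, so the `(depth, function name)` pairs, the helper names, and the sample trace are all hypothetical.

```python
from collections import Counter

# Hypothetical trace format (an assumption, not taken from the paper):
# (call depth, function name) pairs in execution order, depth 0 = top level.
trace = [
    (0, "base64_decode"),
    (0, "eval"),
    (1, "system"),
    (1, "file_put_contents"),
]


def to_sequence(trace):
    """Sequence view: function names in call order."""
    return [name for _, name in trace]


def to_graph(trace):
    """Graph view: weighted directed edges (a, b), counting b called right after a."""
    names = to_sequence(trace)
    return Counter(zip(names, names[1:]))


def to_tree(trace):
    """Tree view: nested (name, children) nodes rebuilt from call depths.

    Assumes a well-formed trace where depth never jumps by more than one.
    """
    root = ("<root>", [])
    stack = [root]  # stack[d] is the parent of the next call at depth d
    for depth, name in trace:
        node = (name, [])
        del stack[depth + 1:]          # drop frames deeper than the parent
        stack[depth][1].append(node)   # attach to the current parent
        stack.append(node)             # this call may parent deeper calls
    return root


sequence = to_sequence(trace)
graph = to_graph(trace)
tree = to_tree(trace)
```

The sequence keeps only call order, the graph keeps which call follows which, and the tree keeps caller and callee nesting, which is the kind of structure the findings above report as most useful.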

The primary goal of WebShell family classification is to automatically categorize a given malicious WebShell into o

…(Full text truncated)…

📄 Read Full PDF on ArXiv