PageRank in Malware Categorization

📝 Abstract

In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank computes ranks of web pages based on structural information and can also compute ranks of instructions that represent the structural information of the instructions in malware analysis methods. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve the categorization accuracy.

📄 Content

PageRank in Malware Categorization

BooJoong Kang, Suleiman Yerima, Kieran McLaughlin, Sakir Sezer
Queen’s University Belfast
Northern Ireland Science Park, Queen’s Road, Queen’s Island, Belfast, Northern Ireland, United Kingdom, BT3 9DT
+44 (0) 28 9097 1745
{b.kang, s.yerima, kieran.mclaughlin, s.sezer}@qub.ac.uk

ABSTRACT
In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank computes ranks of web pages based on structural information and can also compute ranks of instructions that represent the structural information of the instructions in malware analysis methods. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve the categorization accuracy.

CCS Concepts
• Security and privacy ➝ Intrusion/anomaly and malware mitigation ➝ Malware and its mitigation.

Keywords
Malware categorization; malware classification; PageRank; dynamic analysis.

  1. INTRODUCTION
Malware detection and classification play a very important part in malware defense. Malware categorization is a malware classification technique that classifies malware into certain categories [1]. For example, multiple malware samples can be classified as a single malware family or a single type of malware (e.g. Trojan). Malware categorization can be used to discover similar malware, or groups of unknown malware, and analysts can use this additional information in further investigations of the malware. Since malware categorization can also be extended to malware detection, it can play an important role in malware defense.

Previous studies on malware categorization investigated various characteristics of malware and proposed categorization methods utilizing those characteristics. Malware is a program that consists of instructions, and the instructions define the behavior of the malware. Therefore, many existing methods use various forms of instruction information, such as instruction sequence, frequency, etc. Malware variants in the same malware family tend to reuse the original code and to be written in the same development environment, such as editors and compilers. Compiling the reused code with the same compiler will produce the same result, i.e. the same low-level instructions in the same structure. Some malware samples that are classed as the same malware type (e.g. Trojan) have similar purposes and sometimes behave in the same way. Because of their similar functionality, such malware may share similar code.

Over the last few years, many research efforts have been devoted to developing automatic malware categorization systems. Various features have also been researched, including instruction frequency [4-7], instruction sequence [8-11], control flow graphs [12-14] and so on. Since a PageRank-based software analysis method [15] has been proposed, there is a need to investigate PageRank in malware analysis.

In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank [2] is a graph-based ranking technique that computes a rank for each node, representing the importance of that node, based on the structural information between nodes. A Windows-based malware sample can be disassembled into code that consists of assembly instructions (hereafter, instructions). From this code, graphs can be generated in which a node is an instruction and an edge represents a sequence of two instructions. Ranks of instructions can be computed using PageRank, and the ranks will differ between different malware. We investigate a number of existing PageRank algorithms [2-3] and compare the performance of the algorithms for malware categorization.

The remainder of this paper is organized as follows. Section 2 summarizes the related work. Section 3 describes our proposed malware categorization method with a number of existing PageRank algorithms. Section 4 evaluates our proposed method. Finally, Section 6 concludes the paper and outlines avenues for future work.
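
The instruction-graph idea described in the introduction can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that nodes are opcode mnemonics, that an unweighted directed edge links each pair of consecutive instructions in a trace, and that ranks are computed with plain power-iteration PageRank using a damping factor of 0.85. The function names, the example trace and the fixed vocabulary used to build the feature vector are all hypothetical.

```python
def build_instruction_graph(instructions):
    """Directed graph with one node per instruction and an edge a -> b
    for every consecutive pair (a, b) in the trace (assumed encoding)."""
    graph = {ins: set() for ins in instructions}
    for src, dst in zip(instructions, instructions[1:]):
        graph[src].add(dst)
    return graph


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Plain power-iteration PageRank over a dict of node -> successor set.
    Rank held by nodes without outgoing edges is spread over all nodes."""
    nodes = sorted(graph)
    n = len(nodes)
    if n == 0:
        return {}
    ranks = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(ranks[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * ranks[v] / len(graph[v])
                for w in graph[v]:
                    new_ranks[w] += share
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in nodes)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


def feature_vector(ranks, vocabulary):
    """Ranks in a fixed instruction order; unseen instructions get 0.0."""
    return [ranks.get(ins, 0.0) for ins in vocabulary]


# Hypothetical disassembled trace, used only for illustration.
trace = ["push", "mov", "call", "mov", "pop", "ret"]
ranks = pagerank(build_instruction_graph(trace))
features = feature_vector(ranks, ["call", "mov", "pop", "push", "ret", "xor"])
```

A vector such as `features`, computed per sample over a shared vocabulary, is the kind of input a machine learning classifier could consume. Whether the paper weights edges or uses a different PageRank variant is not stated in this excerpt.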
  2. RELATED WORK
For many years, malware categorization has been done by human analysts, but manual analysis is time-consuming and labor-intensive [1]. Thus there has been a need for automatic malware categorization methods. One of the most essential parts of malware categorization is feature extraction, and several features have been proposed. Since Bilar [4] discovered that the distribution of instruction frequency varies between different groups of malware, several methods based on instruction frequency have been proposed. Rad et al. [5] compute the Minkowski-form distance between instruction frequency vectors to measure function similarities of malware variants. Ye et al. [1] apply term frequency

This content is AI-processed based on ArXiv data.
