Control Flow Change in Assembly as a Classifier in Malware Analysis


📝 Abstract

Classical signature-based malware detection methods fail to detect new malware and are not always effective against new obfuscation techniques. Moreover, new malware is easily created, and old malware can be recoded to produce new variants. Classical antivirus software therefore becomes steadily less effective against these threats. Malware is also hand-tailored to bypass network security and antivirus products. Because analysts do not have enough time to dissect suspected malware by hand, automated approaches have been developed. To cope with the mass of new malware, statistical and machine learning methods have proved to be a good approach to classifying programs, especially when multiple approaches are combined to estimate the likelihood that software is malicious. Existing approaches mostly analyze the opcodes or mnemonics of the disassembly and their distribution. In this paper, we focus on the control flow change (CFC) itself and investigate whether it is significant for detecting malware. Within the scope of this work, only relative control flow changes are considered, as these are easier to extract with the first chosen disassembler library and lie within a range of 256 addresses. These features are analyzed as raw features and as n-grams of length 2, 4 and 6; the even more abstract feature of the number of occurrences of the n-grams is also used. Statistical methods as well as the Naive-Bayes algorithm were used to find out whether CFC carries significant data. We also test our approach with real-world datasets.


📄 Content

Control Flow Change in Assembly as a Classifier in Malware Analysis

Andree Linke
School of Computer Science, University College Dublin, Ireland
andree.linkee@ucdconnect.ie

Nhien-An Le-Khac
School of Computer Science, University College Dublin, Ireland
an.lekhac@ucd.ie

Keywords: Malware analysis, Control flow change, Naïve-Bayes analysis, n-gram signatures
I. INTRODUCTION
The world of computer crime is constantly expanding. As new technology constantly enters our lives, the opportunities to make money by exploiting technological vulnerabilities rise in the same way. At the same time, classical antivirus (AV) products seem to fail against newly coded malware [1], which incorporates rootkit technologies and is encoded to subvert AV products. Classical AV relies heavily on file signatures. Providing them is a reactive process: finding a piece of malware, creating a signature (for example by hashing or extracting byte sequences) and pushing the signature into file/system scanners. For institutions like the police or military, this approach is no longer feasible, as attackers have become more proficient and better equipped, and institutions face a constant stream of sophisticated attacks. Therefore, new automated methods of discerning between wanted software (so-called “goodware”) and unwanted software (“malware”) ought to be explored to combat the stream of malware. Interesting approaches have been taken in the past and have led to systems for the automatic detection and categorization of malware, such as sandboxes or intrusion prevention systems. More recent approaches use statistical analysis [2] or machine learning [3] to find discriminators for categorization. The analysis of microprocessor operation codes (opcodes) has been the subject of some research, and some approaches have been suggested for analysing the control flow; in this paper we focus on the relative change of control flow in static disassembly. This approach has not yet been proposed in the literature, so our work aims to test whether control flow change can be used to differentiate between goodware and malware. The precondition for our approach is that the software in question is not packed, encrypted or encoded. Software unpacking, decryption or decoding is beyond the scope of this work; however, simple steps to sort out such samples have been taken.
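To make the n-gram features named in the abstract more concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that the relative control flow changes of a sample have already been extracted as a list of signed displacements, that n-grams are taken with an overlapping sliding window, and that the function names `ngrams` and `ngram_counts` and the example values are purely hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous, overlapping n-grams of seq as tuples.

    The overlapping sliding window is an assumption of this sketch.
    """
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_counts(offsets, lengths=(2, 4, 6)):
    """Count the occurrences of each n-gram, separately per n-gram length."""
    return {n: Counter(ngrams(offsets, n)) for n in lengths}


# Hypothetical sequence of relative control flow changes (signed displacements).
offsets = [5, -12, 5, -12, 30, 5, -12]
counts = ngram_counts(offsets)

# Print the most frequent 2-gram together with its occurrence count.
print(counts[2].most_common(1))
```

The resulting occurrence counts per n-gram could then serve as input features for a statistical test or a Naïve-Bayes classifier; how the features are actually encoded in this work is not specified here.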
The rest of this paper is organised as follows: Section 2 presents the background of our research and related work in this area. We present our approach in Section 3. We describe and analyse the results in Section 4. Finally, we conclude and discuss future work in Section 5.
II. BACKGROUND
A. Windows PE files
The PE file format is the main format of Microsoft Windows executable files, dynamic link libraries and object code. It contains all the information needed by the program loader of the Windows operating system to build the process object, the memory layout and the needed library call structures. It is derived from the Unix COFF file format. The architectures supported by the PE file format are IA-32, IA-64, x86-64 and ARM. This work focuses on the IA-32 architecture. The full documentation of the PE file format can be found in Microsoft’s “Microsoft PE and COFF Specification” [4]. The code of the executable can be extracted from the sections part of the PE file in r

This content is AI-processed based on ArXiv data.
