Discovering Communities of Malapps on Android-based Mobile Cyber-physical Systems

📝 Original Info

  • Title: Discovering Communities of Malapps on Android-based Mobile Cyber-physical Systems
  • ArXiv ID: 1804.01641
  • Date: 2023-06-15
  • Authors: Dan Su, Jiqiang Liu, Wei Wang, Xiaoyang Wang, Xiaojiang Du, Mohsen Guizani

📝 Abstract

Android-based devices such as smartphones have become ideal mobile cyber-physical systems (MCPS) thanks to their powerful processors and variety of sensors. In recent years, an explosively and continuously growing number of malicious applications (malapps) have posed a great threat to Android-based MCPS as well as to users' privacy. The effective detection of malapps is an emerging yet crucial task. How to establish relationships among malapps, discover their potential communities, and explore their evolution has become a challenging issue in effective malapp detection. To address this issue, we propose an automated community detection method for Android malapps that builds a relation graph from their static features. First, we construct a large feature set to profile the behaviors of malapps. Second, we propose an E-N algorithm that combines the epsilon graph and the k-nearest neighbor (k-NN) graph for graph construction. It addresses both the incomplete graph caused by the epsilon method and the noise generated by the k-NN graph. Finally, a community detection method, Infomap, is employed to explore the underlying structure of the relation graph and obtain the communities of malapps. We evaluate our community detection method on 3996 malapp samples. Extensive experimental results show that our method outperforms traditional clustering methods and achieves its best performance with a Rand statistic of 94.93% and an accuracy of 79.53%.

📄 Full Content

The Android system and its readily available application distribution mechanism attract both developers and attackers. Malapps are developed to steal personal information or to gain control of devices illegally. More severely, an attack can easily extend from the user's cyber world to the physical world, which poses a great threat to users' privacy and property. Symantec [2] reported that the volume of new Android malapp variants reached 3.6 thousand. It also blocked 18.4 million mobile malapp infections in 2016. The significant growth of malapps targeting Android-based MCPS increasingly calls for efficient methods that can automatically profile and categorize malapps.

To cut costs and accelerate development, attackers tend to inject malicious components into an existing malapp and publish it after reassembling. Because code reuse is widely adopted in malapp programming, most of the large number of malapps that appear every day are variants of existing malapp families [3]. Samples in the same family share similar behaviors and show similar vulnerabilities. Malapp family classification can quickly filter out variants of existing malapps and leave the others for further analysis, which expedites the malapp detection process. In addition, exploring the relations among malapps and studying the evolution of families helps to better understand malapp development and to forecast its trend. Categorizing malapps is therefore very helpful for maintaining the security of the Android ecosystem.

Although existing work has focused on characterizing malapp families, some problems remain unsolved. First, the boundaries of malapp families are difficult to define. Some behaviors are common among malapps, such as connecting to the Internet or sending phone identifiers to remote servers. These shared behaviors make different families look similar and blur their boundaries. How to precisely characterize malapps and find the families' patterns remains unaddressed. Second, most existing work on categorizing malapp families is based on supervised machine-learning classification methods. The high precision achieved by some of these methods depends on a large labelled dataset for training. The requirement for labelled data, however, limits the ability to detect new malapp families. On the other hand, traditional unsupervised clustering methods, e.g., k-means, evaluate the similarities among malapps too coarsely to capture the implicit information and the underlying relations of malapps.

As new malapps appear very frequently, alternative methods that are able to discover novel malapp families have become a necessity rather than an option. In this work, we treat malapps from a correlation perspective. The relationship between malapps and their associated families is like that between individuals and organizations. Since individuals in the same organization are related to each other by various interdependencies, the more similar the samples' behaviors are, the more likely they are to belong to the same organization. Community detection techniques can be leveraged to discover relations between interacting individuals. For example, in a social network, people in the same community share the same schools or hobbies. In a biological protein network, communities are functional modules of interacting proteins [4]. Similarly, in a malapp relation graph, malapps can be regarded as vertices and the relationships can be represented by weighted edges. Malapps that behave similarly gather into the same community. Newly arising malapps deviate from existing families and form their own groups. Thus both variants of known families and novel malapp families can be discovered.
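The idea of malapps as vertices joined by weighted edges can be illustrated with a small Python sketch. It is not the paper's E-N algorithm, whose exact rule is not given in this excerpt. It assumes Jaccard similarity over feature sets as the edge weight, keeps edges whose weight reaches a threshold `epsilon`, and then links any vertex left isolated to its `k` most similar neighbors. The function names, the sample apps, and the parameter values are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_relation_graph(features, epsilon=0.5, k=1):
    """Return {frozenset({u, v}): weight} describing an undirected weighted graph.

    Assumed hybrid rule, for illustration only: an epsilon threshold first,
    then a k-NN fallback for vertices that the threshold left isolated.
    """
    names = sorted(features)
    weights = {
        frozenset(pair): jaccard(features[pair[0]], features[pair[1]])
        for pair in combinations(names, 2)
    }

    # Epsilon step: keep only sufficiently similar pairs.
    edges = {pair: w for pair, w in weights.items() if w >= epsilon}

    # k-NN fallback: connect each isolated vertex to its k most similar
    # neighbors, ignoring pairs with zero similarity.
    connected = set().union(*edges)
    for name in names:
        if name in connected:
            continue
        candidates = sorted(
            ((w, pair) for pair, w in weights.items() if name in pair and w > 0),
            key=lambda item: item[0],
            reverse=True,
        )
        for w, pair in candidates[:k]:
            edges[pair] = w
    return edges


# Hypothetical static features (Android permission names) for four samples.
samples = {
    "app_a": {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    "app_b": {"SEND_SMS", "READ_CONTACTS", "INTERNET", "READ_PHONE_STATE"},
    "app_c": {"INTERNET", "CAMERA"},
    "app_d": {"RECORD_AUDIO"},
}
graph = build_relation_graph(samples, epsilon=0.5, k=1)
for pair, weight in graph.items():
    print(sorted(pair), weight)
```

In this toy input, `app_a` and `app_b` pass the threshold, `app_c` is attached through the fallback, and `app_d` shares no feature with any other sample and stays isolated, which loosely mirrors how a novel sample would sit apart from known families. A community detection algorithm such as Infomap would then run on the resulting edge list.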

To deal with the security issue on Android-based MCPS, in this paper we propose a framework for malapp classification based on the community concept. The main idea is to build a relation graph for malapps and apply community detection methods for community discovery. First, we disassemble the Android Package (APK) file of each malapp and extract information to characterize its behaviors. The features fall into 11 categories, which profile each malapp from different aspects. Second, we calculate the weight between each pair of malapps based on the features. The weights represent similarities among malapps. Variants in the same family tend to have higher weights because of their similar behaviors. Third, we generate an undirected weighted graph for all the malapps based on the weights. We call this graph the relation graph since it represents the relations among malapps. Finally, we employ Infomap, a community detection algorithm, to extract the highly-inner-connected communities in the graph. Community detection methods are able to

Reference

This content is AI-processed based on open access ArXiv data.