MalFlows: Context-aware Fusion of Heterogeneous Flow Semantics for Android Malware Detection
Static analysis, a fundamental technique in Android app examination, enables the extraction of control flows, data flows, and inter-component communications (ICCs), all of which are essential for malware detection. However, existing methods struggle to leverage the semantic complementarity across different types of flows for representing program behaviors, and their context-unaware nature further hinders the accuracy of cross-flow semantic integration. We propose and implement MalFlows, a novel technique that achieves context-aware fusion of heterogeneous flow semantics for Android malware detection. Our goal is to leverage complementary strengths of the three types of flow-related information for precise app profiling. We adopt a heterogeneous information network (HIN) to model the rich semantics across these program flows. We further propose flow2vec, a context-aware HIN embedding technique that distinguishes the semantics of HIN entities as needed based on contextual constraints across different flows and learns accurate app representations through the joint use of multiple meta-paths. The representations are finally fed into a channel-attention-based deep neural network for malware classification. To the best of our knowledge, this is the first study to comprehensively aggregate the strengths of diverse flow-related information for assessing maliciousness within apps. We evaluate MalFlows on a large-scale dataset comprising over 20 million flow instances extracted from more than 31,000 real-world apps. Experimental results demonstrate that MalFlows outperforms representative baselines in Android malware detection, and meanwhile, validate the effectiveness of flow2vec in accurately learning app representations from the HIN constructed over the heterogeneous flows.
💡 Research Summary
The paper “MalFlows: Context-aware Fusion of Heterogeneous Flow Semantics for Android Malware Detection” presents a novel framework for detecting Android malware by integrating and analyzing three fundamental types of program flows extracted via static analysis: control flows, data flows, and inter-component communications (ICCs). The authors identify a key limitation in existing methods: they either utilize only a subset of these flows or fail to effectively model the semantic complementarity and contextual relationships between entities across different flows when attempting fusion. This often leads to incomplete behavioral profiles and reduced detection accuracy.
Motivated by a large-scale statistical analysis of over 31,000 real-world apps, the research first demonstrates that benign and malicious apps exhibit distinct and complementary patterns across all three flow types. Malware more frequently guards APIs under sensitive triggering conditions (e.g., network access), uses different sets of source and sink APIs for data flows, and registers for specific system-level Intent actions in ICCs. This empirical evidence underscores the necessity of a holistic, integrated analysis.
To address this, MalFlows introduces a three-stage pipeline, which the paper then evaluates:
- Data Modeling with a Heterogeneous Information Network (HIN): Using static analyzers like FlowDroid and IccTA, the system extracts entities (apps, methods, API classes, conditions, components, Intent actions) and relations (calls, guards, data-flows-to, contains) from APK files. These are structured into a large HIN, which explicitly captures the rich, typed interactions between heterogeneous elements from different flows.
- Context-Aware Embedding via flow2vec: This is the core technical innovation. To learn meaningful, low-dimensional representations of apps from the HIN, the authors propose flow2vec, a novel HIN embedding technique. Unlike conventional methods that treat a node (e.g., an API) identically across all contexts, flow2vec is context-aware. It performs random walks guided by multiple pre-defined meta-paths (e.g., App->Condition->API for control flows, App->SourceAPI->SinkAPI for data flows). Crucially, it jointly uses these meta-paths and distinguishes the semantic role of a node based on the specific meta-path context in which it appears. This allows the model to learn app representations that respect the constraints and dependencies across different flow types, preventing semantic confusion.
- Attention-Based Classification: The learned app representations (which fuse semantics from all three flows) are fed into a deep neural network classifier equipped with a channel attention mechanism. Here, each “channel” corresponds to the representation learned from a specific flow type (control, data, ICC). The attention module dynamically learns to assign different importance weights to each channel for a given app, allowing the model to focus on the most discriminative flow semantics for the final malicious/benign classification.
- Evaluation: The system is evaluated on a dataset of 31,301 apps (16,667 benign, 14,634 malicious) spanning multiple years, comprising over 20 million flow instances. MalFlows is compared against a range of representative baselines, including flow-based detectors (MUDFLOW, ICCDetector), heterogeneous graph-based models (HinDroid, GEMNN), and other ML-based approaches. Experimental results show that MalFlows consistently outperforms all baselines in terms of F1-score, accuracy, precision, and recall. Ablation studies further confirm the individual effectiveness of the HIN modeling, the context-aware flow2vec embedding (over context-unaware alternatives), and the channel attention mechanism.
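To make the HIN and meta-path ideas above more concrete, here is a minimal Python sketch of meta-path-guided walks over a toy typed graph. It is an illustration under stated assumptions, not the authors' flow2vec: the `HIN` class, the node names, the way relations are chained into meta-paths, and the trick of prefixing each non-app node with its meta-path name to keep contexts apart are all hypothetical simplifications. The summary does not specify how flow2vec encodes context internally.

```python
import random
from collections import defaultdict


class HIN:
    """Toy heterogeneous information network: typed nodes, typed directed edges."""

    def __init__(self):
        self.node_type = {}            # node id -> node type
        self.adj = defaultdict(list)   # (source node, relation) -> target nodes

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, relation, dst):
        self.adj[(src, relation)].append(dst)


def metapath_walk(hin, start, metapath, rng):
    """Follow one meta-path, given as [(relation, expected target type), ...].

    Returns the visited nodes, or None if the walk dead-ends.
    """
    walk, current = [start], start
    for relation, dst_type in metapath:
        candidates = [n for n in hin.adj.get((current, relation), [])
                      if hin.node_type[n] == dst_type]
        if not candidates:
            return None
        current = rng.choice(candidates)
        walk.append(current)
    return walk


def tag_walk(hin, walk, path_name):
    """Assumed context encoding: prefix every non-app node with the meta-path
    name, so the same API seen in two flow types yields two distinct tokens."""
    return [n if hin.node_type[n] == "app" else f"{path_name}|{n}" for n in walk]


def build_corpus(hin, metapaths, apps, walks_per_app=2, seed=0):
    """Generate context-tagged walks for every app and every meta-path."""
    rng = random.Random(seed)
    corpus = []
    for app in apps:
        for name, path in metapaths.items():
            for _ in range(walks_per_app):
                walk = metapath_walk(hin, app, path, rng)
                if walk is not None:
                    corpus.append(tag_walk(hin, walk, name))
    return corpus


# Hypothetical toy data: one app, one guarded API call, one source-to-sink flow.
hin = HIN()
hin.add_node("app1", "app")
hin.add_node("cond:network", "condition")
hin.add_node("api:getDeviceId", "api")
hin.add_node("api:sendTextMessage", "api")
hin.add_edge("app1", "contains", "cond:network")
hin.add_edge("cond:network", "guards", "api:sendTextMessage")
hin.add_edge("app1", "contains", "api:getDeviceId")
hin.add_edge("api:getDeviceId", "data-flows-to", "api:sendTextMessage")

# Loosely mirrors App->Condition->API and App->SourceAPI->SinkAPI.
METAPATHS = {
    "control": [("contains", "condition"), ("guards", "api")],
    "data": [("contains", "api"), ("data-flows-to", "api")],
}

corpus = build_corpus(hin, METAPATHS, ["app1"])
for sentence in corpus:
    print(sentence)
```

In this sketch `api:sendTextMessage` appears as `control|api:sendTextMessage` in one walk and as `data|api:sendTextMessage` in another, so a downstream skip-gram-style model (not shown) would learn separate vectors for the guarded-call role and the data-sink role, while the shared `app1` token ties both contexts back to the same app.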
In conclusion, MalFlows represents a significant advance as the first work to comprehensively aggregate control, data, and ICC flow semantics through a context-aware fusion strategy. By leveraging a HIN for explicit relationship modeling and a novel embedding technique for context-sensitive representation learning, it achieves superior accuracy in detecting sophisticated Android malware, effectively capturing complex malicious patterns that span multiple behavioral dimensions.
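The channel-attention step described in the summary can be illustrated with a small, dependency-free Python sketch. It assumes one embedding per flow type (control, data, ICC) and a softmax over per-channel scores; the scoring vectors in `score_weights` stand in for learned parameters and are purely hypothetical, as is the choice to concatenate the reweighted channels. The paper's actual network design may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(channels, score_weights):
    """Reweight per-flow embeddings of one app.

    channels:      dict of channel name -> embedding (list of floats)
    score_weights: dict of channel name -> scoring vector (hypothetical
                   stand-in for parameters a real model would learn)
    Returns (attention weight per channel, concatenated reweighted embedding).
    """
    names = list(channels)
    # One scalar score per channel: dot product of embedding and scoring vector.
    scores = [sum(w * x for w, x in zip(score_weights[n], channels[n]))
              for n in names]
    alphas = softmax(scores)
    fused = []
    for name, alpha in zip(names, alphas):
        fused.extend(alpha * x for x in channels[name])
    return dict(zip(names, alphas)), fused


# Toy app representation: made-up 2-dimensional embeddings per flow type.
app_channels = {
    "control": [1.0, 0.5],
    "data": [0.2, 0.1],
    "icc": [0.0, 0.0],
}
toy_score_weights = {name: [1.0, 1.0] for name in app_channels}

weights, fused = channel_attention(app_channels, toy_score_weights)
print(weights)
print(fused)
```

With these toy values the control channel receives the largest weight, so its features dominate the fused vector that a classifier would consume; in a trained model the weights would vary from app to app.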