A Deep Dive into Function Inlining and its Security Implications for ML-based Binary Analysis

Notice: This research summary and analysis were automatically generated using AI. For accuracy, please refer to the original arXiv source.

Function inlining is a widely used transformation in modern compilers that replaces a call site with the callee’s body when deemed profitable. While this transformation improves performance, it significantly alters static features such as machine instructions and control-flow graphs, which are crucial to binary analysis. Yet, despite its broad impact, the security implications of function inlining remain underexplored to date. In this paper, we present the first comprehensive study of function inlining through the lens of machine learning-based binary analysis. To this end, we dissect the inlining decision pipeline within LLVM’s cost model and explore combinations of compiler options that aggressively push the function inlining ratio beyond standard optimization levels, which we term extreme inlining. We focus on five ML-assisted binary analysis tasks for security, using 20 unique models to systematically evaluate their robustness under extreme inlining scenarios. Our extensive experiments reveal several significant findings: i) function inlining, though benign in intent, can directly or indirectly affect ML model behaviors and can be exploited to evade discriminative or generative ML models; ii) ML models relying on static features can be highly sensitive to inlining; iii) subtle compiler settings can be leveraged to deliberately craft evasive binary variants; and iv) inlining ratios vary substantially across applications and build configurations, undermining assumptions of consistency in the training and evaluation of ML models.


💡 Research Summary

This paper presents the first comprehensive study investigating the security implications of function inlining, a common compiler optimization, on machine learning (ML)-based binary analysis. The core premise is that while inlining improves performance by replacing function calls with the callee’s body, it fundamentally alters static features of the binary—such as machine instruction sequences and control-flow graphs (CFGs)—which are crucial inputs for ML models performing security tasks like vulnerability detection, malware classification, and code similarity analysis.
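To make the effect concrete, here is a minimal sketch (illustrative only, not the paper's methodology) of why inlining perturbs static features: when a callee's body is spliced into its caller, the call edge disappears and the caller's instruction sequence and basic-block structure absorb the callee's. The toy model below represents a program as a call graph and per-function block counts, and shows how a single inlining decision changes both features that ML models typically consume.

```python
# Illustrative toy model (not from the paper): inlining one call site
# removes a call-graph edge and folds the callee's basic blocks into
# the caller, changing both features an ML model might extract.

def inline_call(call_edges, block_counts, caller, callee):
    """Return new (call_edges, block_counts) after inlining callee into caller."""
    # The direct call edge vanishes from the static call graph.
    new_edges = [e for e in call_edges if e != (caller, callee)]
    # The callee's blocks are duplicated into the caller's CFG
    # (minus one: the call/return boundary merges two blocks).
    new_counts = dict(block_counts)
    new_counts[caller] += block_counts[callee] - 1
    return new_edges, new_counts

edges = [("main", "parse"), ("main", "log"), ("parse", "log")]
blocks = {"main": 4, "parse": 6, "log": 2}

edges2, blocks2 = inline_call(edges, blocks, "main", "parse")
print(edges2)          # ("main", "parse") edge is gone
print(blocks2["main"]) # main's CFG grew by parse's blocks
```

Even this single decision shifts two common static features at once; compounded across a whole binary, the feature distribution a model was trained on can drift substantially.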

The research delves deep into the decision-making pipeline of the LLVM compiler. It analyzes the complex cost model that evaluates whether inlining a function is beneficial, weighing projected performance gains against code size inflation. The authors go beyond standard optimization levels (-O1 to -O3) to explore a wide array of fine-grained compiler flags that influence inlining decisions. Through this analysis, they identify specific combinations of these flags that can aggressively boost the inlining ratio far beyond typical levels, a strategy they term “extreme inlining.”
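The paper's exact flag combinations are not reproduced here, but as a hedged illustration of the mechanism: Clang forwards tuning knobs to LLVM's inliner via `-mllvm`, and its cost threshold (`-inline-threshold`, default around 225 at `-O2`) can be raised by orders of magnitude so that nearly every call site is judged profitable to inline. A sketch of such an "extreme inlining" build (file and output names are hypothetical):

```shell
# Illustrative only: push LLVM's inline cost thresholds far beyond the
# -O2 defaults so the inliner accepts much larger callees.
clang -O2 \
      -mllvm -inline-threshold=100000 \
      -mllvm -inlinehint-threshold=100000 \
      -g -o app_extreme app.c   # -g keeps DWARF for ground-truth inlining info
```

Because these are ordinary, documented compiler options, producing such a variant requires no binary rewriting or obfuscation tooling.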

To assess the practical security impact, the authors conduct extensive experiments across five critical ML-assisted security tasks: Binary Code Similarity Detection (BCSD), Function Symbol Name Prediction, Malware Detection, Malware Family Classification, and Vulnerability Detection. They train and evaluate 20 unique state-of-the-art models on these tasks. The experimental setup involves comparing model performance on binaries compiled with standard settings versus those compiled with “extreme inlining” configurations.

The findings are significant and reveal a serious blind spot in current ML-based security analysis:

  1. Evasion Vulnerability: Function inlining, despite being a benign and intended optimization, can be exploited to evade both discriminative and generative ML models. By simply recompiling code with different inlining settings, an adversary can create binary variants that ML models fail to recognize correctly.
  2. Model Sensitivity: ML models that rely heavily on static features (e.g., raw bytes, CFGs) are highly sensitive to inlining transformations, leading to significant performance degradation.
  3. Practical Attack Vector: Subtle compiler setting changes, which are trivial for an attacker to apply, are sufficient to craft these evasive variants, making this a low-cost, high-impact attack method.
  4. Inconsistency in Assumptions: The inlining ratio varies substantially depending on the application’s coding style and build configuration. This variability undermines the common assumption of consistent feature distributions between training and evaluation data, posing a fundamental challenge for reliable ML model deployment.

The paper also contributes methodologies for using DWARF debugging information to establish ground truth for inlining and releases datasets and tools to support future research. In conclusion, this work sounds a critical alarm, demonstrating that legitimate compiler optimizations can unintentionally create new attack surfaces for evading ML-based security systems. It calls for the development of more robust models that account for compiler-induced variations and for a deeper integration of compilation-aware strategies in the ML for security pipeline.
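Regarding the DWARF-based ground truth: DWARF records each spliced-in callee instance as a `DW_TAG_inlined_subroutine` debugging information entry (DIE), alongside `DW_TAG_subprogram` entries for out-of-line functions. A minimal sketch (an illustrative helper, not the authors' released tooling) of how an inlining ratio could be estimated from those tags:

```python
# Illustrative sketch: estimate an inlining ratio from DWARF DIE tags.
# DW_TAG_inlined_subroutine marks each inlined callee instance;
# DW_TAG_subprogram marks out-of-line function instances.
from collections import Counter

def inlining_ratio(die_tags):
    """Fraction of subroutine instances that were inlined, given an
    iterable of DWARF DIE tag names."""
    counts = Counter(die_tags)
    inlined = counts["DW_TAG_inlined_subroutine"]
    total = inlined + counts["DW_TAG_subprogram"]
    return inlined / total if total else 0.0

print(inlining_ratio(["DW_TAG_subprogram", "DW_TAG_inlined_subroutine",
                      "DW_TAG_inlined_subroutine", "DW_TAG_variable"]))

# On a real binary, the tags could be gathered with pyelftools, e.g.:
# from elftools.elf.elffile import ELFFile
# with open("a.out", "rb") as f:
#     dwarf = ELFFile(f).get_dwarf_info()
#     tags = [die.tag for cu in dwarf.iter_CUs() for die in cu.iter_DIEs()]
```

Any such measurement requires compiling with debug info (`-g`), which is consistent with using DWARF as ground truth during dataset construction.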

