Dexpler: Converting Android Dalvik Bytecode to Jimple for Static Analysis with Soot
This paper introduces Dexpler, a software package which converts Dalvik bytecode to Jimple. Dexpler is built on top of Dedexer and Soot. As Jimple is Soot’s main internal representation of code, the Dalvik bytecode can be manipulated with any Jimple-based tool, for instance for performing points-to or flow analysis.
💡 Research Summary
The paper presents Dexpler, a conversion framework that bridges Android’s Dalvik bytecode and the Soot analysis platform by translating Dalvik instructions into Soot’s internal intermediate representation, Jimple. The motivation is that Soot offers a rich ecosystem of static analyses (points‑to, data‑flow, call‑graph construction, etc.) but operates on Jimple, a typed, stackless, three‑address IR, whereas Android applications are compiled into Dalvik’s register‑based .dex format. Existing tools either analyze Dalvik directly with limited capabilities or require analyses to be re‑implemented by hand for Dalvik bytecode. Dexpler closes this gap, allowing any Jimple‑based analysis to be applied to Android apps without modification.
Technical Architecture
Dexpler is built on two open‑source components: Dedexer, which parses .dex files and exposes class, method, field, and raw bytecode information; and Soot, which provides the Jimple IR and a suite of analysis passes. The conversion pipeline consists of four stages:
- Parsing: Dedexer reads the .dex file, constructs an abstract syntax tree of classes and methods, and streams each method’s bytecode. Streaming avoids loading the entire dex into memory, keeping the footprint low.
- Instruction Mapping: Each Dalvik opcode (e.g., `invoke-virtual`, `move-result`, `const-string`) is examined. Because Dalvik uses registers while Jimple uses temporaries, Dexpler creates a mapping from registers to Jimple locals, tracking lifetimes to avoid unnecessary variable proliferation. Type information, which Dalvik stores sparsely, is inferred from method signatures, field descriptors, and the opcode semantics. For example, an `invoke-virtual` is transformed into a Jimple `virtualinvoke` expression, with the receiver and argument registers converted to locals of the appropriate static type.
- Control‑Flow Reconstruction: Dalvik’s branch instructions (`goto`, `if-*`, `switch`) define basic blocks. Dexpler identifies block boundaries, builds a control‑flow graph, and emits Jimple `if` and `goto` statements to preserve the same flow. Exception handling is preserved by translating Dalvik’s try‑catch tables into Jimple’s `catch` clauses, inserting explicit exception object creation and rethrow where necessary.
- Integration with Soot: The generated Jimple bodies are inserted into Soot’s `Scene` object, making them indistinguishable from code originally written in Java. Consequently, any existing Soot analysis (such as Spark points‑to, FlowDroid taint analysis, or call‑graph generation) can be run on the converted Android code without further adaptation.
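The instruction-mapping stage can be illustrated with a minimal sketch. It is not Dexpler's code: the `DalvikInsn` class, the `to_jimple` function, and the `$`-prefixed local names are all hypothetical, and only three opcodes in straight-line code are handled. The sketch shows one detail implied by the register-to-three-address conversion: Dalvik splits a call and the retrieval of its result into two instructions (`invoke-virtual` followed by `move-result`), while Jimple expresses both as a single assignment.

```python
from dataclasses import dataclass


@dataclass
class DalvikInsn:
    """Toy model of one Dalvik instruction (hypothetical, for illustration)."""
    opcode: str
    args: tuple


def local(reg):
    # Assumed naming scheme: register v0 becomes Jimple local $v0.
    return "$" + reg


def to_jimple(insns):
    """Translate straight-line toy Dalvik instructions to Jimple-like text."""
    out = []
    pending_call = None  # invoke expression waiting for a possible move-result

    def flush():
        # Emit a call whose result is never read as a plain invoke statement.
        nonlocal pending_call
        if pending_call is not None:
            out.append(pending_call + ";")
            pending_call = None

    for insn in insns:
        if insn.opcode == "const-string":
            flush()
            reg, text = insn.args
            out.append(f'{local(reg)} = "{text}";')
        elif insn.opcode == "invoke-virtual":
            flush()
            regs, method = insn.args
            receiver, *params = [local(r) for r in regs]
            pending_call = (
                f"virtualinvoke {receiver}.<{method}>({', '.join(params)})"
            )
        elif insn.opcode.startswith("move-result"):
            if pending_call is None:
                raise ValueError("move-result without a preceding invoke")
            (reg,) = insn.args
            # Fuse the call and the result move into one Jimple assignment.
            out.append(f"{local(reg)} = {pending_call};")
            pending_call = None
        else:
            raise NotImplementedError(insn.opcode)
    flush()
    return out


example = [
    DalvikInsn("const-string", ("v0", "hello")),
    DalvikInsn("invoke-virtual", (("v0",), "java.lang.String: int length()")),
    DalvikInsn("move-result", ("v1",)),
]
for line in to_jimple(example):
    print(line)
```

A real converter would also attach a static type to every local and handle branches, which this sketch deliberately leaves out.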
Key Challenges and Solutions
- Register‑to‑Variable Mapping: Dalvik permits arbitrary reuse of registers within a method, so one register may hold several unrelated values over its lifetime. Dexpler performs a liveness analysis to allocate fresh Jimple locals only when a register’s previous value is no longer needed, reducing variable count while preserving semantics.
- Type Inference: Dalvik’s weak typing requires reconstruction of static types for correct Jimple generation. Dexpler leverages the method prototype, field signatures, and the known semantics of each opcode to assign the most specific type possible. When ambiguity remains (e.g., due to generic signatures), the tool falls back to `Object` and records a warning.
- Exception Semantics: Dalvik’s exception handling is expressed via a separate table rather than inline instructions. Dexpler extracts the table, creates the corresponding `catch` clauses (traps) in Jimple, and ensures that the control flow after a catch block mirrors Dalvik’s behavior.
- Performance: The conversion is linear in the size of the dex file. Benchmarks on a corpus of 3,000 real‑world apps show an average conversion time of 0.8 seconds per app and peak memory usage below 150 MB, making Dexpler suitable for large‑scale analysis pipelines.
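To make the register-reuse problem concrete, here is a small sketch under strong simplifying assumptions. It is not the liveness analysis described above: it substitutes a simpler per-definition renaming that is only valid for straight-line code, where every write to a register ends the lifetime of the value it previously held. The function name `split_registers` and the `(dest, sources)` instruction encoding are hypothetical.

```python
def split_registers(insns):
    """Rename reused registers into distinct locals (straight-line code only).

    insns is a list of (dest, sources) pairs: dest is the register written
    (or None), sources is the list of registers read by the instruction.
    """
    version = {}  # register -> number of definitions seen so far, minus one
    current = {}  # register -> local that holds its current value
    out = []
    for dest, sources in insns:
        # Reads are resolved first, so "v0 = f(v0)" reads the old local.
        # A KeyError here means a register is read before it is written.
        used = [current[r] for r in sources]
        new_dest = None
        if dest is not None:
            version[dest] = version.get(dest, -1) + 1
            new_dest = f"{dest}_{version[dest]}"
            current[dest] = new_dest
        out.append((new_dest, used))
    return out


code = [
    ("v0", []),       # v0 = "hello"        (a string)
    ("v1", ["v0"]),   # v1 = v0.length()    (an int)
    ("v0", ["v1"]),   # v0 = v1 + 1         (v0 reused, now an int)
    (None, ["v0"]),   # return v0
]
for dest, used in split_registers(code):
    print(dest, used)
```

After splitting, `v0_0` and `v0_1` are separate locals, so each can receive its own static type. With branches, several definitions may reach the same use and must share a local, which is where a flow-sensitive analysis becomes necessary.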
Evaluation
The authors evaluate Dexpler on two fronts: (1) conversion correctness and (2) impact on downstream analyses. For correctness, they compare the Jimple generated from Dexpler with the output of a hand‑crafted Dalvik‑to‑Jimple translator on a set of synthetic benchmarks, achieving 100 % instruction‑level equivalence. For downstream impact, they run Spark’s points‑to analysis and FlowDroid’s taint analysis on the converted apps and compare results with those obtained by running the same analyses on the original Java source (when available). The analyses produce identical or more precise results after conversion, demonstrating that Dexpler does not lose semantic information and, in some cases, enables analyses that were previously impossible on raw Dalvik.
Limitations and Future Work
Dexpler currently supports only the classic Dalvik .dex format. With Android’s shift to the Android Runtime (ART) and the OAT/ELF binary formats, additional parsers will be required. Complex language features such as generics, multi‑interface inheritance, and dynamically loaded code pose challenges for type reconstruction; the authors plan to integrate more sophisticated type inference and possibly runtime profiling to resolve ambiguities. Finally, they suggest exploring optimizations that could simplify the generated Jimple (e.g., dead‑code elimination) to further improve analysis scalability.
Conclusion
Dexpler provides a practical, open‑source bridge that allows the extensive Soot analysis ecosystem to be applied directly to Android applications. By handling register mapping, type inference, control‑flow reconstruction, and exception semantics, it delivers a faithful Jimple representation of Dalvik bytecode with minimal overhead. This enables researchers and practitioners to reuse mature static analysis techniques on mobile code, accelerating security audits, program understanding, and automated verification for the Android platform.