Superset Decompilation


Authors: Chang Liu, Yihao Sun, Thomas Gilray, Kristopher Micinski

Chang Liu, Syracuse University, cliu57@syr.edu
Yihao Sun, Syracuse University, ysun67@syr.edu
Thomas Gilray, Washington State University, thomas.gilray@wsu.edu
Kristopher Micinski, Syracuse University, kkmicins@syr.edu

Abstract

Reverse engineering tools remain monolithic and imperative compared to the advancement of modern compiler architectures: analyses are tied to a single mutable representation, making them difficult to extend or refine, and forcing premature choices between soundness and precision. We observe that decompilation is the reverse of compilation and can be structured as a sequence of modular passes, each performing a granular and clearly defined interpretation of the binary at a progressively higher level of abstraction. We formalize this as provenance-guided superset decompilation (PGSD), a framework that monotonically derives facts about the binary into a relation store. Instead of committing early to a single interpretation, the pipeline retains ambiguous interpretations as parallel candidates with provenance, deferring resolution until the final selection phase. Manifold implements PGSD as a declarative reverse engineering framework that lifts Linux ELF binaries to C99 through a granular intermediate representation in 35K lines of Rust and Datalog. On GNU coreutils, Manifold's output quality matches Ghidra, IDA Pro, angr, and RetDec on multiple metrics while producing fewer compiler errors, and generalizes across compilers and optimization levels.

CCS Concepts
• Software and its engineering → Software reverse engineering; Compilers; • Theory of computation → Constraint and logic programming; • Computing methodologies → Knowledge representation and reasoning.

Keywords
Reverse Engineering, Datalog

1 Introduction

Reverse engineering (RE) is a cornerstone of software security, program understanding, and related fields.
In RE, an analyst performs an iterative, hypothesis-driven exploration of an (often binary) artifact to explicate its higher-level behavior [43]. This process interleaves a range of related tasks, both manual (e.g., reading the code and making comments, renaming variables based on application-specific intuition) and automated (e.g., decompilation, static analysis), often driven by an RE tool such as IDA Pro [19] or Ghidra [30]. Despite the capabilities of these tools, a profound architectural divide lies between the compiler and decompiler communities. While modern compilers have embraced modular, multi-pass architectures built around well-defined intermediate representations (IRs) [23, 24, 26], decompilers and reverse engineering frameworks [1, 2, 11, 18, 30, 38, 41] remain fundamentally monolithic. These tools typically consist of a massive codebase ranging from 150K to over 1M lines of C++ or Java code operating over a single IR or a limited set of IRs (Table 1). Consequently, core tasks such as control-flow recovery, type reconstruction, and variable inference are tightly entangled in a single, mutable program representation. This lack of modularity makes extensibility a persistent challenge: integrating new analyses requires navigating deeply coupled code with little infrastructure for incremental development [12], and extending any single feature risks breaking others.

In this work, we argue that the result of decompilation, and RE broadly, should be a forest of increasingly higher-level candidate decompilations, derived via a tower of logic-defined rules. Just as compilers incrementally lower code through a chain of modular passes [23, 24, 26, 35], we envision decompilation restructured as a sequence of IR lifting passes which perform analysis-directed translation via logical rules.
However, unlike compilation, decompilation must cope with missing information and genuine ambiguity—even disassembly of stripped binaries is undecidable in general [15, 16]. This creates a fundamental tension: sound, verified decompilation [14, 42], while theoretically appealing, demands formal semantics that rarely exist for production compilers or complex ISAs like x86-64. Meanwhile, reverse engineering is inherently exploratory: analysts routinely entertain counterfactual interpretations: "what if this data region encodes executable code?" (useful for deobfuscation) or "what if control flow diverts mid-instruction?" (useful for identifying ROP gadgets [37]). Existing monolithic tools force a premature commitment to a single interpretation, discarding alternatives that an analyst may later need to revisit. The result is a workflow where the tool's rigidity actively impedes the analyst's natural hypothesis-driven reasoning.

To address these dual requirements of modularity and exploratory flexibility, we introduce declarative decompilation: a framework for specifying and implementing practical, extensible, and scalable decompilers using logic programming. In §3, we formalize provenance-guided superset decompilation (PGSD), our specific approach to logic-defined decompilation. As a proof of concept, we present Manifold, the first declarative decompiler capable of lifting Linux ELF binaries to C99 by ascending through CompCert intermediate representations. In our architectural design, each pass is either a self-contained Datalog program or an imperative module that only derives new facts. Both declare their input and output relations and operate over a shared, monotonic fact store — a central database where facts from all IR levels coexist. Because the store grows monotonically, passes compose naturally: no pass invalidates another's conclusions, and adding a new analysis amounts to writing a new pass without modifying existing ones. Prior work has demonstrated the effectiveness of Datalog for binary disassembly [16] and class hierarchy recovery [36]; Manifold extends this declarative approach to the full reverse engineering pipeline.

Table 1: Architectural comparison of decompilers. Manifold is the first decompiler that combines a declarative specification language, multi-level intermediate representations, and a modular nano-pass architecture. IDA Pro and Binary Ninja are closed-source and proprietary.

| Decompiler | Architecture | IR Approach | Specification | Extensibility | Language | Decompiler LOC | Open Source |
| Ghidra [30] | Monolithic | Single (P-code) | Imperative | Java/Python plugins | C++/Java | ~300K | Yes |
| IDA [18] | Monolithic | Single (microcode) | Imperative | C++/Python | C++ | N/A (proprietary) | No |
| RetDec [1] | Monolithic | Single (LLVM) | Imperative | LLVM passes (C++) | C++ | ~200K | Yes |
| angr [38] | Monolithic | Multiple (VEX → AIL) | Imperative | Python | Python/C/C++ | ~150K† | Yes |
| Binary Ninja [41] | Monolithic | Multiple (BNIL) | Imperative | Python | C++ | N/A (proprietary) | No |
| Manifold | Nano-pass | Multiple (CompCert) | Declarative | Datalog rules, Rust | Rust | ~35K | Yes |

† Estimate, includes decompiler-relevant components.

In summary, this paper makes the following contributions:
• A declarative decompilation architecture where individual nano-passes invert specific CompCert compiler transformations using Datalog inference rules (§3, §4).
• A formal framework for provenance-guided superset decompilation (PGSD). This framework models the IR as a graph and defines the semantics of analysis passes operating over a shared, monotonic relation store (§3).
• The Manifold system, an implementation that systematically lifts x86-64 ELF binaries to C through the CompCert IR stack. The system comprises approximately 35K lines of Rust and Datalog and successfully scales to general-purpose C code (§4).
• An empirical evaluation showing that Manifold matches established decompilers on coreutils in function recovery, type accuracy, and struct reconstruction, and generalizes across compilers and optimization levels (§5).

2 Background

Manifold is written in a combination of Ascent Datalog (a Rust-embedded Datalog EDSL [34]) and Rust, but our formalism (§3) is based on provenance semirings, a unifying extension of Datalog to track (and compute over) the provenance of each inferred fact. We begin by discussing the essential background related to this work.

2.1 Datalog and Declarative Program Analysis

Datalog is a declarative language originally designed for deductive reasoning that has found broad application in program analysis and reverse engineering. A Datalog program comprises an extensional database (EDB) of base facts and an intensional database (IDB) of derived facts, deduced through Horn clauses of the form:

    Head(...) ← Body_1(...), Body_2(...), ..., Body_n(...)

The head fact is derived when all body predicates are jointly satisfied, with shared variables acting as implicit relational joins. Rules are applied iteratively until a fixpoint is reached, naturally capturing the recursive, transitive reasoning that program analysis demands. Declarative frameworks building on this semantics have achieved state-of-the-art results from source-level pointer analysis [39] to binary disassembly [16] and smart contract decompilation [17]. In binary analysis, the EDB is populated directly from program artifacts such as instructions and control-flow edges, and recursive rules propagate facts across the program.

    clight_stmt(n, Sset(d, ê)) ← csh_stmt(n, Sset(d, e)), var_type(r, τ) for each r ∈ vars(e), ê = typed_expr(e).
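The fixpoint semantics can be made concrete with a hand-rolled evaluator. The sketch below is ours, in plain Rust rather than Ascent: it evaluates the classic reachability rules reach(x, y) ← edge(x, y) and reach(x, z) ← edge(x, y), reach(y, z) by naive iteration until no new tuple is derived.

```rust
use std::collections::BTreeSet;

/// Naive bottom-up Datalog evaluation for:
///   reach(x, y) <- edge(x, y).
///   reach(x, z) <- edge(x, y), reach(y, z).
/// Rules are re-applied until an iteration derives nothing new (a fixpoint).
fn transitive_closure(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    let edge: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    let mut reach = edge.clone(); // non-recursive base rule
    loop {
        let mut new_facts = Vec::new();
        for &(x, y) in &edge {
            for &(y2, z) in &reach {
                // the shared variable y acts as an implicit relational join
                if y == y2 && !reach.contains(&(x, z)) {
                    new_facts.push((x, z));
                }
            }
        }
        if new_facts.is_empty() {
            break; // fixpoint: immediate consequences add no new tuples
        }
        reach.extend(new_facts);
    }
    reach
}

fn main() {
    let reach = transitive_closure(&[(1, 2), (2, 3), (3, 4)]);
    assert!(reach.contains(&(1, 4)));
    assert_eq!(reach.len(), 6);
}
```

Production engines such as Ascent use semi-naive evaluation, which joins only against newly derived facts, but the fixpoint being computed is the same.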
However, traditional Datalog engines require facts to be encoded in flat, tabular form, which introduces friction when analyses manipulate the rich, nested data structures found in compiler IRs: recursive types such as Tpointer(Arc) cannot be directly expressed as Datalog terms. Ascent [34] removes this barrier with a "Bring Your Own Data Structures" (BYOD) feature that natively embeds Rust types into the engine, allowing relations over user-defined algebraic types and native Rust expressions within rule bodies. Manifold relies on this capability throughout the pipeline. The rule displayed at the end of §2.1 lifts a Csharpminor assignment Sset(d, e) to Clight by resolving the types of all variables in e: for each variable r, the rule queries var_type(r, τ) to obtain a type candidate; typed_expr then rewrites e under that assignment to produce the fully typed Clight expression ê. Since a variable may admit multiple type candidates, distinct combinations yield distinct Clight statements, later resolved via a disambiguation phase (§4).

2.2 Background: Provenance Semirings

Definition 2.1 (Commutative Semiring). A commutative semiring is a tuple (K, ⊕, ⊗, 0̄, 1̄) such that (K, ⊕, 0̄) and (K, ⊗, 1̄) are commutative monoids, ⊗ distributes over ⊕, and 0̄ ⊗ a = a ⊗ 0̄ = 0̄ for all a ∈ K. A semiring is zero-divisor-free if a ⊗ b = 0̄ implies a = 0̄ ∨ b = 0̄.

Definition 2.2 (K-Relation). Let U be a finite attribute set. A K-relation over U is a function R : Tuples(U) → K with finite support, i.e., {t | R(t) ≠ 0̄} is finite. When K = B, this is just an ordinary finite relation.

Figure 1: End-to-end overview of Manifold.

Definition 2.3 (Provenance Polynomial Semiring). Let X = {x_1, x_2, ...} be a countable set of provenance tokens.
The provenance polynomial semiring N[X] is the commutative semiring of multivariate polynomials with coefficients in N and indeterminates from X, with ⊕ = +, ⊗ = ×, 0̄ = 0, and 1̄ = 1.

Concretely, a monomial x_{i_1} ··· x_{i_k} records one derivation path through base facts x_{i_1}, ..., x_{i_k}, while ⊕ aggregates alternative derivations into a polynomial. The semiring N[X] is zero-divisor-free and universal (Proposition 2.4): for every commutative semiring K and valuation v : X → K, there is a unique homomorphism h_v : N[X] → K extending v that commutes with Datalog evaluation. This means a pipeline can evaluate once over N[X] and recover any coarser annotation by applying the appropriate homomorphism afterward. The remainder of this section builds on this foundation to formalize Manifold's candidate-tracking semantics.

Proposition 2.4 (Universality). For every commutative semiring K and valuation v : X → K, there is a unique semiring homomorphism h_v : N[X] → K extending v. Moreover, for any positive Datalog program P and any N[X]-annotated input database I,

    P_K(h_v(I)) = h_v(P_{N[X]}(I)),

where P_K denotes the semantics of P evaluated over K.

In Manifold, we evaluate each pass once over N[X], recording full derivation provenance, and recover coarser annotations such as simple derivability or candidate multiplicity by applying the appropriate homomorphism.

3 Provenance-Guided Superset Decompilation

Perhaps the single hardest problem confronted by decompilation is that compilation is not injective; register allocation maps many virtual-register assignments to the same machine locations, stack layout commits abstract variables to concrete frame offsets, and linearization replaces structured control flow with jumps. A single binary may therefore be consistent with many distinct higher-level programs.
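The semiring machinery of §2.2 is what lets this ambiguity be tracked rather than discarded. As a concrete illustration (our sketch, not Manifold's provenance machinery), the following Rust code evaluates a head annotation with two alternative derivations, (a ⊗ b) ⊕ c, over both the counting semiring N and the Boolean semiring, and checks that the valuation n ↦ (n ≠ 0) commutes with evaluation, a special case of Proposition 2.4.

```rust
/// Minimal commutative-semiring interface (Definition 2.1); `plus` is ⊕
/// (alternative derivations) and `times` is ⊗ (joint use in one derivation).
/// Two instances: the Boolean semiring (derivability) and the counting
/// semiring N (number of derivations).
trait Semiring: Copy {
    fn plus(self, other: Self) -> Self;  // ⊕
    fn times(self, other: Self) -> Self; // ⊗
}

impl Semiring for bool {
    fn plus(self, o: Self) -> Self { self || o }
    fn times(self, o: Self) -> Self { self && o }
}

impl Semiring for u64 {
    fn plus(self, o: Self) -> Self { self + o }
    fn times(self, o: Self) -> Self { self * o }
}

/// Annotation of a head fact with two alternative derivations: the first
/// joins body facts annotated a and b, the second uses c alone: (a ⊗ b) ⊕ c.
fn head_annotation<K: Semiring>(a: K, b: K, c: K) -> K {
    a.times(b).plus(c)
}

fn main() {
    // Counting semiring: 2 derivations of a, 3 of b, 1 of c give 2*3 + 1 = 7.
    let n = head_annotation(2u64, 3, 1);
    assert_eq!(n, 7);
    // The homomorphism n -> (n != 0) into Bool commutes with evaluation:
    let b = head_annotation(2u64 != 0, 3 != 0, 1 != 0);
    assert_eq!(b, n != 0);
}
```

Evaluating once over the most informative semiring and mapping down afterward is exactly the "evaluate over N[X], recover coarser annotations later" discipline described above.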
We introduce Provenance-Guided Superset Decompilation (PGSD), a practical approach that addresses this ambiguity by viewing decompilation as a structured search through a space of higher-level explanations of a binary. PGSD organizes decompilation around a poset of progressively higher-level IRs, using relational, declarative passes to lift facts from lower to higher representations. Each pass derives monotonically increasing knowledge about the binary, either by deriving auxiliary analysis facts or by performing analysis-directed transformations to a higher-level IR. Unlike in verified decompilation, we do not assume access to a formal specification of the compiler (or various IRs) or its inverse image. Our reason for this is largely practical: formal specifications of production compilers (or assembly languages such as x86-64) are rarely available, and in practice, real binaries often contain code that lies outside of any one compiler's image (e.g., due to link-time optimization). PGSD is thus complementary to verified decompilation: rather than proving inversion of any particular compiler, PGSD enumerates a forest of candidate decompilations, each tracking explicit provenance for how the particular decompilation was performed.

Our Datalog-based approach represents decompilation as an increasing sequence of databases, building an annotated relation store of candidate liftings. Passes compose naturally, so candidate generation and analyses share a common semantics. Figure 1 illustrates the pipeline using a running example: a C program manipulating struct Point{int x; int y;} with a conditional over one field, compiled to a Linux ELF binary with GCC. The decompiler first disassembles the binary (currently via Capstone, though we also support the Datalog disassembler ddisasm) and encodes instructions, symbols, and ABI information as relational facts.
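To make this fact encoding concrete, here is a toy Rust sketch; the Insn tuple shape, the rbp-offset string encoding, and the lift function are our simplifications for illustration, not Manifold's actual schemas. One flattened instruction fact is recognized and lifted to a Mach-level candidate tuple.

```rust
// A toy sketch of the first lifting step: disassembled instructions become
// flat relational facts, and a pattern over those facts emits Mach-level
// candidate tuples. Relation shapes and fields are illustrative only.
#[derive(Debug, PartialEq)]
enum Mach {
    // store of a register into a stack slot: Msetstack(src, offset, type)
    Msetstack(&'static str, i64, &'static str),
}

// instruction(addr, mnemonic, src, dst) as a flattened EDB fact
type Insn = (u64, &'static str, &'static str, &'static str);

/// Recognizer playing the role of one Datalog rule:
///   mach_inst(a, Msetstack(r, ofs, Tany64)) <- instruction(a, "mov", r, rbp_slot(ofs)).
fn lift(insns: &[Insn]) -> Vec<(u64, Mach)> {
    insns
        .iter()
        .filter_map(|&(addr, mnem, src, dst)| {
            // toy operand encoding: "rbp-8" means -0x8(%rbp)
            let ofs = dst.strip_prefix("rbp")?.parse::<i64>().ok()?;
            (mnem == "mov").then(|| (addr, Mach::Msetstack(src, ofs, "Tany64")))
        })
        .collect()
}

fn main() {
    // mov %rdi, -0x8(%rbp), encoded as a flat tuple
    let facts = vec![(0x401000u64, "mov", "DI", "rbp-8")];
    let mach = lift(&facts);
    assert_eq!(mach, vec![(0x401000, Mach::Msetstack("DI", -8, "Tany64"))]);
}
```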
We leverage CompCert's IR hierarchy, whose low-level representations are close to assembly [26]. Consider classify, which takes a Point* p and returns p->y+1 if positive, else p->y-1. During compilation, p->y lowers to (p+4), then register allocation yields movss 0x4(%rax), %xmm0. Decompilation reverses these lowerings through successive abstraction layers: at Mach, raw instructions lift to typed operations, mov %rdi, -0x8(%rbp) becomes Msetstack(DI, -8, Tany64); mov (%rax), %eax becomes Mload(MInt32, Aindexed(0), [AX], AX). At Linear/LTL, raw stack offsets are replaced with typed local variable slots and SP-relative addressing is normalized to BP-based offsets, yielding Lgetstack(Local, -8, Tlong, AX). At RTL, register allocation is reversed: hardware registers become pseudo-registers, producing Iload(MFloat32, Aindexed(4), [p], float_var). At Csharpminor, the offset becomes explicit pointer arithmetic: Eload(MFloat32, Ebinop(Oaddl, Evar(p), Econst(4))). Finally, at Clight, type inference determines that p points to an int at offset 0 and a float at offset 4, and struct recovery emits p->ofs_4 in place of raw pointer arithmetic.

This example also illustrates the precision/soundness tradeoffs and analysis dependencies during decompilation. Determining signedness requires inspecting downstream usage: the loaded value flows through test %eax, %eax and later a signed branch (jle), confirming signed int. Distinguishing struct fields from array elements requires combining struct recovery with type information; differently-typed values at offsets 0 and 4 rule out a uniform array. Overly conservative analysis reduces p->ofs_4 to (float)(p+4): technically correct, but losing structural context.

3.1 Semantic Domains for PGSD

Definition 3.1 (IR Hierarchy). An IR hierarchy is a finite partially ordered set (L, ⪯) whose elements are intermediate representations.
The order relation ⪯ is interpreted so that ℓ ⪯ ℓ′ means that ℓ′ is at least as high-level as ℓ; equivalently, that facts at level ℓ′ may be derived from facts at level ℓ by some sequence of passes. In our reversal of CompCert, the IR dependency graph is the totally-ordered chain mirroring CompCert's compilation stages: x86-64 < Asm < Mach < LTL < RTL < Cminor < Cshminor < Clight. This chain should be understood as a reference hierarchy of representations: it organizes the forms of facts produced by Manifold, but does not imply that each pass is a formally specified inverse of the corresponding CompCert pass. A central subtlety is that higher-level candidates need not correspond one-to-one with concrete instruction addresses. We therefore separate the identity of a program point from its representation at a particular level.

Definition 3.2 (Nodes). Let N = Addr ∪ SynAddr be the set of nodes. Elements of Addr are concrete machine addresses. Elements of SynAddr are synthetic nodes introduced during lifting, such as control-flow join points or regions lifted from no single address.

Definition 3.3 (Annotated Relation Store). For each IR level ℓ ∈ L, let R_ℓ = {r_1 : τ_1, ..., r_{k_ℓ} : τ_{k_ℓ}} be a finite relation schema. The global schema is R = ∪_{ℓ ∈ L} R_ℓ together with any auxiliary analysis relations. An annotated relation store over R with annotations in a semiring K is a mapping D : r ↦ D(r) such that each relation r : τ is interpreted as a K-relation D(r) : Tuples(τ) → K. The value D(r)(t) ∈ K is the annotation of tuple t in relation r. When K = N[X], this annotation records all derivation paths of t.

Definition 3.4 (Statement Candidates). For each IR level ℓ and node n ∈ N, the store D induces a set of statement candidates

    Cand_ℓ(n) = {(s, κ) | s ∈ S_ℓ, κ = D(r_ℓ)(n, s) ≠ 0̄},

where S_ℓ is the set of statements at level ℓ and r_ℓ is the principal statement relation for that level.
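A minimal Rust sketch of Definition 3.4 over the counting semiring, with illustrative types (Manifold's store is richer): the candidate set at a node is exactly the statements whose annotation is nonzero.

```rust
use std::collections::HashMap;

type Node = u64;
type Stmt = &'static str;

/// Sketch of Definition 3.4 over the counting semiring N: the candidate set
/// at node n consists of every statement s whose annotation in the
/// principal statement relation is nonzero (κ ≠ 0̄).
fn candidates(rel: &HashMap<(Node, Stmt), u64>, n: Node) -> Vec<(Stmt, u64)> {
    rel.iter()
        .filter(|(&(m, _), &k)| m == n && k != 0)
        .map(|(&(_, s), &k)| (s, k))
        .collect()
}

fn main() {
    let mut clight_stmt = HashMap::new();
    // one node, two competing typed interpretations of the same load
    clight_stmt.insert((0x401000, "x = p->ofs_4;"), 2u64);
    clight_stmt.insert((0x401000, "x = (float)(p + 4);"), 1u64);
    // annotation 0̄ means "not derivable": excluded from the candidate set
    clight_stmt.insert((0x401000, "dead"), 0u64);
    assert_eq!(candidates(&clight_stmt, 0x401000).len(), 2);
}
```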
A single node may admit multiple candidates, since one lower-level artifact may support more than one higher-level interpretation. The annotation κ records the evidence for each candidate. For the rest of this section, we assume a fixed finite active domain of constants and a fixed finite schema. This is the setting induced by one binary together with the finite number of synthetic nodes and analysis facts generated during decompilation. Under these assumptions, all relations have finite support and every positive Datalog evaluation terminates after finitely many tuple insertions. The formal object manipulated by Manifold is an annotated store of candidate facts at each IR level. We therefore formalize the decompiler directly as a candidate-generating pipeline over that store, rather than as an exact inverse semantics for any particular compiler.

3.2 Passes, Pipeline, and Conditional Candidate Completeness

We model the decompiler as a sequence of passes which monotonically derive new facts into the global annotated store. Semantically, all passes are the same kind of object: each is a positive Datalog program, which derives new facts from existing ones. Passes differ only in which relations they derive, and therefore what role they play in the pipeline: some passes derive new analysis information (e.g., identifying patterns for stack canaries), while others derive new higher-level IRs.

Definition 3.5 (Pass). A pass is a tuple P = (I, O, Δ) where:
• I ⊆ R is the set of input relation names;
• O ⊆ R is the set of output relation names;
• Δ is a finite set of positive Datalog rules whose body atoms use relations in I and whose head uses relations in O.

Some passes derive auxiliary analysis information, while others derive principal candidate facts at higher IR levels. Semantically, however, both are the same kind of object: positive Datalog programs over the shared annotated store.

Definition 3.6 (Immediate Consequence Operator).
Let P = (I, O, Δ) be a pass and let D be an annotated relation store. The immediate consequence operator T_P produces a new store by keeping all relations outside O unchanged and, for each output relation r ∈ O and tuple t, setting:

    T_P(D)(r)(t) = ⊕_{ρ ∈ Δ, head(ρ) = r(ū)} ⊕_{θ : θ(ū) = t} ⊗_{a ∈ body(ρ)} D(rel(a))(θ(args(a))).

Here rel(a) and args(a) extract the relation symbol and argument tuple of the atom a. Thus ⊗ combines evidence along a single derivation and ⊕ aggregates alternative derivations.

Proposition 3.7 (Monotonicity). For every pass P, the operator T_P is monotone with respect to the pointwise extension of annotations.

Proof. Positive rules use no negation, and both ⊕ and ⊗ preserve the pointwise order on annotations. Hence, enlarging input annotations can only enlarge derived annotations. □

Lifting rules are recognizers, not formal inverses. They express patterns that we treat as evidence for higher-level constructs.

Definition 3.8 (Implemented Derivation Step). Fix a pipeline of passes. An implemented derivation step is an instance of a rule in some pass whose body facts are present in the store and whose head adds a new annotated fact. A derivation for a fact (r, t) is a finite tree of implemented derivation steps rooted at (r, t) and whose leaves are extensional input facts from the initial store D_0.

Definition 3.9 (Coverage Witness). Let s be a candidate statement at IR level ℓ and node n. A coverage witness for (n, s) is a derivation, in the sense of Definition 3.8, whose root is the fact r_ℓ(n, s). This notion is intentionally internal to the implemented pass library. It does not claim that the witness corresponds to the true compilation history of the binary; it claims only that the current library of rules and auxiliary analyses derive the candidate.

Definition 3.10 (Decompilation Pipeline).
A decompilation pipeline is a sequence of passes P_1, ..., P_m together with an initial annotated store D_0 produced by disassembly. The store evolves as

    D_0 →_{P_1} D_1 →_{P_2} · · · →_{P_m} D_m,

where D_j is the least fixed point reached by iterating the immediate consequence operator of pass P_j starting from D_{j−1}. The initial store D_0 is extensional; our implementation handles this by converting the disassembly into input tables.

Proposition 3.11 (Finite Convergence). Each pass in the pipeline converges after finitely many iterations.

The main guarantee that the implementation supports is a coverage theorem for the implemented rule set:

Theorem 3.12 (Coverage of Implemented Derivations). Let ⟨P_1, ..., P_m⟩ be a decompilation pipeline with initial store D_0, and let D_m be the final store. If a candidate fact (r, t) admits a coverage witness with leaves in D_0, then D_m(r)(t) ≠ 0̄. In particular, if a statement candidate (n, s) at level ℓ admits a coverage witness, then (s, κ) ∈ Cand_ℓ(n) for some κ ≠ 0̄.

Proof. Induction on the height of the witness derivation tree. For the base case: a leaf is an extensional fact in D_0, so its annotation is nonzero by construction. For the inductive step: consider a derivation node justified by some rule, h ← b_1, ..., b_k. By the induction hypothesis, each body fact b_i has a nonzero annotation in the store at the point where the corresponding pass is evaluated. The annotation contributed by this rule instance is

    ⊗_{i=1}^{k} D(b_i),

which is nonzero because N[X] is zero-divisor-free (§2.2). Since this value is one summand in the ⊕ for the head h, the head receives a nonzero annotation. Repeating this argument up the derivation tree yields the claim.
□

4 Implementation

The framework presented in §3 raises three practical questions: how ambiguity is concretely represented, how passes interact with each other and the relation store, and how this ambiguity is ultimately resolved. The implementation of Manifold addresses these questions with concrete mechanisms: a superset IR representation that preserves provenance, a modular pass architecture that separates lifting from analysis while sharing a common relation store, and a Clang-guided disambiguation phase that selects a coherent C program from the accumulated candidates.

Superset Representation. Adapting CompCert's IRs for reverse engineering requires several structural changes. In forward compilation, each IR carries invariants established by construction: RTL assumes single-definition pseudo-registers, Mach assumes a compiler-chosen stack layout, and Clight assumes a fully typed AST. Raw binaries satisfy none of these. Manifold borrows CompCert's statement constructors—Msetstack, Iop, Sassign, Sset—but stores them as candidate tuples in a central relation store, DecompileDB, which maps string-keyed relation names to type-erased vectors of tuples. Type erasure decouples passes: each needs only agree on a relation name and tuple type, and the scheduler can compose or parallelize them freely. The store supports two complementary access patterns: Ascent-based decompile passes swap entire relations out of the store, run Datalog to fixpoint, and swap the enlarged relations back without copying; analysis passes instead accumulate facts incrementally, appending individual tuples or replacing entire relations as needed. Each IR level defines a principal statement relation (e.g., mach_inst or clight_stmt) and a separate edge relation (e.g., clight_succ).
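A string-keyed, type-erased store of this shape can be sketched in a few lines of safe Rust using std::any::Any; the struct and method names below are our illustration, not Manifold's actual DecompileDB API.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Sketch of a string-keyed, type-erased relation store: each relation is a
/// Vec<T> for some tuple type T, stored behind Box<dyn Any>. Passes need
/// only agree on the relation name and tuple type.
#[derive(Default)]
struct RelStore {
    relations: HashMap<String, Box<dyn Any>>,
}

impl RelStore {
    /// Append one tuple, creating the relation on first use (append-only).
    fn insert<T: 'static>(&mut self, rel: &str, tuple: T) {
        self.relations
            .entry(rel.to_string())
            .or_insert_with(|| Box::new(Vec::<T>::new()))
            .downcast_mut::<Vec<T>>()
            .expect("relation used at two different tuple types")
            .push(tuple);
    }

    /// Swap a whole relation out (e.g., to hand it to a Datalog pass)
    /// without copying, mirroring the swap-in/swap-out access pattern.
    fn take<T: 'static>(&mut self, rel: &str) -> Vec<T> {
        self.relations
            .remove(rel)
            .map(|b| *b.downcast::<Vec<T>>().expect("type mismatch"))
            .unwrap_or_default()
    }
}

fn main() {
    let mut db = RelStore::default();
    // two candidate statements for the same node coexist as parallel tuples
    db.insert("mach_inst", (0x401000u64, "Msetstack(DI, -8, Tany64)"));
    db.insert("mach_inst", (0x401000u64, "Mop(Omove, DI)"));
    let rel: Vec<(u64, &str)> = db.take("mach_inst");
    assert_eq!(rel.len(), 2);
}
```

The downcast calls make the single type-agreement obligation explicit: a pass that reads a relation at the wrong tuple type fails loudly rather than silently corrupting the store.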
A single node—represented uniformly as u64, with a reserved range for synthetic nodes introduced during lifting—may map to multiple statement tuples, encoding the candidate set of Definition 3.4. The store also hosts auxiliary analysis relations: register mappings, stack slot descriptors, type-evidence candidates, struct layout hypotheses, and function signatures. All relations—statements, edges, and analysis facts—are subject to the same append-only discipline formalized in Proposition 3.7: higher-level passes add new facts without retracting lower-level ones. When inference is inconclusive, the pipeline retains all competing interpretations as parallel tuples, allowing candidates to propagate without premature commitment until the disambiguation phase constructs a single C program.

Decompile Passes. Following the design in §3.2, passes fall into two categories. Decompile passes consume known IR facts and analysis results to advance the reverse engineering process, lifting from one CompCert IR level to the next while retaining candidates where the mapping is ambiguous. Analysis passes do not lift the IR directly but derive auxiliary information that subsequent decompile passes rely on to construct higher-level representations.

The pipeline begins by recognizing x86-64 mnemonics and emitting typed Mach constructors. For example, a MOV to a stack-relative address becomes Msetstack, a base-plus-displacement load becomes Mload, and auxiliary relations map platform register names to CompCert identifiers. The next pass normalizes stack accesses into typed slot descriptors. This step distinguishes between Local, Incoming, and Outgoing slots, and identifies callee-save spills for later suppression. Finally, the linear instruction stream is reorganized into a control-flow graph. Recovering pseudo-registers from the fixed hardware allocation requires the most effort.
A union-find-based algorithm traces def-use chains across the CFG to group physical-register uses into equivalence classes, each receiving a fresh pseudo-register. The Datalog rules then lift LTL instructions by substitution. Meanwhile, when x86 instructions deviate from CompCert's convention, such as IDIV, which writes both quotient and remainder, individual rules emit distinct pseudo-register outputs; when decomposition requires multiple statements, a synthetic node preserves the single-statement-per-node invariant. RTL operations are restructured into expression trees by arity-based dispatch: nullary to Econst, moves to Evar, unary to Eunop, binary to Ebinop. Two preparatory passes then bridge to Clight: a Csharpminor conversion translates addressing modes into explicit pointer arithmetic, and a structuring pass computes the dominator trees to detect loops, if-then-else regions, and switch chains. The final pass emits typed C-level statements by enumerating all feasible type assignment combinations, and each combination produces a distinct Clight candidate.

Analysis Passes. Unlike decompile passes, which directly advance the pipeline toward higher-level representations, analysis passes enrich the reverse engineering process with auxiliary information that subsequent decompile passes consume. They may combine Ascent rules with imperative logic, and, like decompile passes, they write candidate results to the shared relation store when analysis cannot resolve to a single answer. For example, stack frame analysis computes def-use chains for stack variables by walking the CFG backward through stack-pointer adjustments. An RTL optimization phase iterates over copy propagation, dead store elimination, and variable live-range merging until a fixpoint is reached. Type inference combines opcode-driven emission, pointer-evidence accumulation, and constraint propagation seeded by external signatures.
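The union-find grouping used in pseudo-register recovery can be sketched as follows; the use-site indices and def-use edges are illustrative stand-ins for (address, register) pairs, not Manifold's actual analysis.

```rust
/// Minimal union-find with path compression: use sites connected by def-use
/// edges collapse into equivalence classes, and each distinct class receives
/// a fresh pseudo-register.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

fn main() {
    // Use sites 0..5; def-use edges link sites that must carry the same value.
    let mut uf = UnionFind::new(5);
    for &(d, u) in &[(0, 1), (1, 2), (3, 4)] {
        uf.union(d, u);
    }
    // Two distinct roots remain: classes {0,1,2} and {3,4},
    // i.e., two fresh pseudo-registers.
    let mut roots: Vec<usize> = (0..5).map(|i| uf.find(i)).collect();
    roots.sort();
    roots.dedup();
    assert_eq!(roots.len(), 2);
}
```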
Struct recovery identifies candidates from multi-offset pointer dereference patterns and discards degenerate layouts. Function signature reconciliation merges definition-site and call-site evidence to determine parameter counts, types, and return types.

The pass architecture extends naturally beyond CompCert's image. As a concrete example, CompCert lacks variable-length arrays, so we added a single analysis pass that recognizes VLA allocations at the assembly level—a dynamic SUB of RSP by a register operand, followed by a capture of the adjusted stack pointer—and emits a Mbuiltin(alloca, size_reg) into the Mach-level store. No downstream pass requires modification: the existing built-in infrastructure carries alloca through every subsequent IR, surfacing at Clight as a call to alloca. This illustrates the general extension pattern: a new analysis pass derives facts into the shared store, and the pipeline consumes them through existing paths.

Selecting a Representative. PGSD produces a forest of candidate decompilations: at each CFG node, multiple candidates may coexist. Extracting a single coherent C program from this forest requires resolving various interdependent choices (e.g., the type of a load at one node constrains which candidate statements are consistent at its use sites). Rather than encode the whole C language syntax check in Datalog, Manifold uses Clang as a type-checking oracle. The search proceeds as a parallelized, error-directed greedy enumeration in which each function is assigned to an independent worker with no shared state. For each function, an initial configuration is constructed by selecting the first edge-consistent candidate at every CFG node and submitting it to Clang. Compiler errors drive subsequent refinements: error messages are parsed to identify the offending node and determine a directed fix—for instance, a "pointer to integer conversion" diagnostic steers the variable's declared type toward the pointer candidate.
When no directed fix is available, the remaining candidates for that node are tried in turn. A replacement is accepted only if it strictly reduces the total error count. The search terminates when Clang reports zero errors or when the per-function step budget is exhausted, and the selected statements are combined with the structuring results into C code.

5 Evaluation

We evaluate Manifold along three research questions:

• RQ1: How does the quality of Manifold's C output compare to that of existing decompilers?
• RQ2: How effective is the candidate selection phase at resolving ambiguity?
• RQ3: How does Manifold's pipeline scale with binary size and complexity?

5.1 Experiment Setup

Our primary benchmark is GNU Coreutils 9.10, compiled with GCC 11.4.0 at -O3, targeting x86-64 Linux ELF binaries. Binaries are dynamically linked and not stripped. Of the full suite, we evaluate 101 programs, excluding utilities whose implementations share a single source file (e.g., base64 and base32 via basenc.c), which makes automated source-to-decompilation comparison infeasible. We supplement Coreutils with the Assemblage dataset [27], which provides function-level source for real-world binaries beyond system utilities. We compare against IDA Pro 8.3 [19], Ghidra 12.0.3 [30], angr 9.2.162 [38], and RetDec 5.0 [1], all with a 10-minute timeout per binary. All experiments run on a server with an AMD EPYC 7713P (128 logical cores) and 500 GB RAM under Ubuntu 22.04.

We evaluate along two dimensions. For output quality, we assess correctness: function recovery, signature and struct accuracy, and Clang front-end errors, along with CodeBLEU [33] similarity to the original source, a metric that combines lexical overlap, syntax-tree matching, and dataflow comparison to capture code similarity. We also test generalization across compilers and optimization levels.
For scalability, we profile CPU time and peak memory usage across binaries of varying sizes, as retaining multiple candidates throughout the pipeline incurs additional memory and computational overhead.

5.2 Decompilation Output Quality

Structural Correctness. To address RQ1, we assess the quality of decompiled output along two complementary dimensions. Since the binaries are not stripped, all decompilers benefit from symbol-table information such as function names and boundaries.

Figure 2: Function and struct-related statistics from decompilers. IDA and Ghidra do not recover structs by default. IDA types are sanitized to match C types.

Figure 3: Per-binary CodeBLEU score distributions across decompilers.

For tools like IDA and Ghidra, this advantage extends further: their heuristics leverage symbol metadata to propagate variable names, resolve library signatures, and seed type recovery. In contrast, Manifold's pipeline consumes only the disassembly and functions produced by Capstone, and all subsequent lifting is driven by the declarative passes themselves. This unstripped setting therefore provides a stronger baseline advantage to the conventional tools than to Manifold, making their function identification, argument counts, types, and return values more accurate. With this context, we first measure targeted correctness: whether functions, types, and structs are recovered accurately according to the following metrics:

• Function recovery: matched to source first by exact name, then by a coarse signature-fingerprint match on argument count and per-argument type class.
• Return type: matched after a normalization that resolves aliases (e.g., size_t to unsigned long), strips qualifiers, and maps decompiler-generated struct names through a struct alias map.
• Argument count and type: exact match of argument count; types are compared positionally after the same normalization as in the previous step.
• Struct recovery: matched by layout: field count and per-field type.

Figure 2 summarizes these metrics across all five decompilers. On function recovery, all tools perform comparably at approximately 0.9 accuracy, reflecting the benefit of unstripped symbol tables. Return-type accuracy shows more variation: Manifold and IDA both achieve 0.54, while angr trails at 0.33. On argument count, Manifold falls behind the more mature tools: IDA, Ghidra, and RetDec each reach approximately 0.94, whereas Manifold achieves 0.69. Argument-type accuracy is low across the board: IDA leads at 0.54, RetDec and Ghidra follow, Manifold falls behind at 0.2, and angr trails at 0.1, the latter largely because angr defaults to coarse types such as unsigned long that inflate coverage at the expense of precision. Struct recovery remains challenging for all tools: Manifold achieves 0.03, RetDec achieves 0.02, and IDA and Ghidra do not recover structs by default. The gap in argument and type accuracy is largely attributable to signature coverage: IDA and Ghidra ship with extensive type libraries spanning thousands of standard and platform-specific functions, whereas Manifold relies on a manually curated signature file supplemented by ABI-level inference. Note that IDA's reported types are sanitized to match standard C type names before comparison.

Beyond per-feature accuracy, we evaluate how faithfully each decompiler preserves the structural properties of the original program. Table 2 reports five additional metrics: control-flow complexity (CC Ratio), nesting depth (Depth Ratio), code volume (Statement Ratio), inter-procedural call structure (CG F1), and surface-level control-flow construct usage (CF Sim). A ratio of 1.0 indicates perfect alignment with the original source.
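As a concrete reading of two of these metrics, the following sketch shows how the CC Ratio and CF Sim could be computed. The formulas are our own interpretation of the metric descriptions (cyclomatic complexity as decision points plus one; CF Sim as cosine similarity over construct-count vectors), not necessarily the paper's exact evaluation scripts.

```rust
// Hedged sketch of two structural metrics from Table 2.
// cc_ratio: ratio of decompiled to original cyclomatic complexity,
// where cyclomatic complexity = decision points + 1.
fn cc_ratio(decompiled_decisions: usize, source_decisions: usize) -> f64 {
    (decompiled_decisions as f64 + 1.0) / (source_decisions as f64 + 1.0)
}

// cf_sim: cosine similarity between control-flow construct count
// vectors, e.g. counts of [if, while, for, switch, goto] keywords.
// Note it compares keyword vectors only, so a goto-based loop scores
// low against a while loop even when behaviorally identical.
fn cf_sim(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb: f64 = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

A ratio of 1.0 from `cc_ratio` and a similarity of 1.0 from `cf_sim` both indicate perfect alignment with the original source.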
Manifold achieves the best CC Ratio (1.32) and Depth Ratio (1.20), compared to 1.44–1.60 and 1.50–1.67 for the other tools, indicating that its output most closely preserves the original program's decision-point count and nesting structure. It also leads on CG F1 at 0.359, reflecting accurate recovery of inter-procedural call edges. However, Manifold exhibits the highest Statement Ratio (3.97 vs. 2.76–3.58) and the lowest CF Sim (0.625 vs. 0.807). The elevated statement count stems from two sources: the lack of expression folding, where operations that mature decompilers collapse into a single compound expression are instead emitted as separate assignments; and the structuring pass's use of goto + if encodings for loops, which inflates the count relative to a single while construct. The low CF Sim follows directly from the same choice: CF Sim compares control-flow keyword vectors without recognizing semantic equivalence, so a goto-based loop is penalized even when behaviorally identical to a while loop. IDA and Ghidra tie for the highest CF Sim at 0.807, consistent with their mature pattern-matching heuristics for loop and switch recovery.

Source-Level Similarity. To complement the structural-correctness assessment for RQ1, we report CodeBLEU [33] scores to capture overall syntactic and semantic similarity between the decompiled code and the original source, as well as per-feature metrics. CodeBLEU averages are shown in Figure 3 for Coreutils and Table 3 for the Assemblage dataset. Since Assemblage provides only function-level source rather than complete source code, we use it exclusively for source-similarity scoring. We randomly selected 1000 binaries, each containing at least 10 functions, and computed

Table 2: Structural similarity metrics across decompilers, averaged over Coreutils. CC and Depth measure cyclomatic complexity and maximum nesting depth.
CG F1 is the call-graph edge F1 score. CF Sim denotes the cosine similarity between control-flow construct vectors.

Decompiler  CC Ratio  Depth Ratio  Stmt Ratio  CG F1  CF Sim
Ghidra      1.60      1.67         3.00        0.337  0.807
RetDec      1.50      1.50         3.58        0.331  0.705
angr        1.44      1.50         2.76        0.344  0.805
IDA         1.51      1.50         3.00        0.344  0.807
Manifold    1.32      1.20         3.97        0.359  0.625

Table 3: Function-level CodeBLEU scores on the Assemblage dataset.

Decompiler  Manifold  angr  Ghidra  RetDec  IDA
CodeBLEU    0.22      0.24  0.27    0.25    0.28

Table 4: CodeBLEU scores for Coreutils binaries across decompilers.

Binary   IDA   Ghidra  RetDec  angr  Manifold
hostid   0.36  0.29    0.38    0.29  0.36
whoami   0.36  0.32    0.39    0.30  0.34
ls       0.25  0.23    0.21    0.23  0.20
sort     0.23  0.22    0.19    0.22  0.20
Average  0.30  0.27    0.28    0.26  0.28

the average CodeBLEU score for each function associated with these binaries. Figure 3 and Table 3 show that the tools cluster into three tiers. IDA leads on both Coreutils (0.30) and Assemblage (0.28). Then Ghidra and RetDec form a middle group, though RetDec drops from 0.28 on Coreutils to 0.25 on Assemblage. Manifold and angr share the third tier with no significant difference: angr scores 0.24 on both datasets, while Manifold scores 0.25 on Coreutils and 0.22 on Assemblage. Part of this gap is attributable to naming rather than structural quality: Manifold derives variable names from register assignments, producing synthetic identifiers that widen the lexical distance beyond what structural differences alone would suggest. Meanwhile, Manifold performs best on compact, well-structured utilities where structural fidelity dominates — scoring 0.36 on hostid and 0.34 on whoami, on par with IDA — and weakest on larger utilities with heavy library usage where name and variable recovery matter more, dropping to 0.20 on both ls and sort while IDA maintains 0.25 and 0.23 respectively (Table 4).

Cross-Compiler Robustness.
To assess whether Manifold's CompCert-derived pipeline generalizes beyond a single compiler, we evaluate on the same Coreutils suite compiled with both GCC and Clang under five optimization levels each (-O0 through -O3 and -Os).

Figure 4: Decompiler metrics under different compilers and optimization levels.

Decompilation of binaries in all ten configurations completed successfully without timeouts. We observe stability across compilers and optimizations: the mean CodeBLEU varies by only 0.04 across all ten compiler–optimization combinations. Per-binary variability is similarly low, indicating that output quality is dominated by the inherent complexity of each utility rather than by the compiler or optimization level, as shown in Figure 4. This suggests that Manifold's rule-based lifting captures general properties of the x86-64 compilation process—stack discipline, calling conventions, common instruction-selection patterns—rather than compiler-specific idioms. The result is practically significant: analysts rarely know which compiler produced a given binary, and a decompiler whose output quality is sensitive to that choice would impose an additional burden on the reverse-engineering workflow.

Syntactic Validity. Although the ultimate goal of a decompiler is typically to recover program behavior rather than produce strictly compilable C code [6, 10, 45], assessing recompilability provides a measurement of syntactic and semantic correctness. To address RQ2, we pass each decompiler's output through the clang-sys crate and aggregate the resulting diagnostics across all Coreutils binaries (Figure 9). Figure 7 shows the distribution of candidate statements per node in Manifold's Clight IR. Roughly 60% of nodes carry one candidate. In many cases this is simply because the lifting is unambiguous: a single x86 instruction maps to exactly one Clight statement with no type or structural alternatives.
While the pipeline generates candidate divergence from several sources, such as type inference, signature reconciliation, and control-flow structuring, several factors ensure most ambiguity is resolved prior to statement emission. First, total soundness, such as allowing all type candidates, would lead to a memory explosion. In practice, type-inference and refinement passes (§3.2) propagate constraints that narrow most register types down to a precise set of candidates. Second, the interprocedural signature pass converges on a single signature per function, eliminating call-node divergence. Third, the structuring pass enforces a deterministic CFG decomposition, collapsing multiple body nodes into compound statements (e.g., Sifthenelse, Sloop) under a single representative node. Lastly, synthetic nodes introduced during lifting correspond to exactly one statement by construction. The remaining 40% of nodes with multiple candidates reflect genuinely ambiguous statements that we are unable to resolve during reverse engineering; these are subsequently resolved during the selection phase (§4).

Figure 5: Running time for decompilers. Manifold parallelizes via Rayon across all available cores; angr parallelizes per function across all available cores.

Figure 6: Running time and maximum memory consumption of binaries of varying sizes.

Manifold produces the fewest total Clang errors, substantially fewer than Ghidra and IDA. The dominant error categories across all tools are undeclared identifiers and missing type or declaration errors, reflecting gaps in external header recovery rather than fundamental structural defects. Manifold's low error count follows directly from the Clang-guided selection phase (§4): because candidate selection is driven by Clang's own type checker, the final output is largely validated before emission.
The residual errors fall into two categories: missing external declarations for library functions whose signatures are not captured by the current ABI model and function-signature analysis, and imprecision in individual analysis passes (e.g., overly coarse type evidence or incomplete expression folding). Both are addressable within the existing modular architecture by refining or adding passes.

5.3 Scalability, Runtime, and Memory Usage

To evaluate RQ3, we first compare per-binary wall-clock time on Coreutils across all five decompilers (Figure 5), then profile Manifold on larger binaries to characterize its scaling behavior (Figure 6). For most input binaries, wall-clock time grows roughly linearly with binary size, but several programs deviate sharply. jq (3 MB) takes roughly 460s, longer than binaries twice its size, because its core operates on a recursive tagged-union type in which the same register serves as both a pointer and an integer depending on context. This ambiguity propagates through the IRs, causing the Clight pass to emit multiple statement candidates per node and roughly doubling the candidate relation size.

Figure 7: Distribution of candidate statement counts per node.

Figure 8: Candidate statements versus binary node count.

Peak memory (Figure 6) follows a similar pattern. Most binaries stay under 15 GB, but jq and tmux reach 40–50 GB. The monotonic relation store retains all candidates at every IR level, and as Figure 8 shows, binaries with more nodes generally produce more candidates per node, so peak RSS grows faster than binary size alone would suggest. The tmux outlier traces to the Clight pass, which scans the full type-candidate relation for every statement to resolve variable types, a cost that grows with the product of statements and candidates (Section 2.1).
Despite these costs, Manifold completes all Coreutils binaries in time comparable to existing decompilers, indicating that memory consumption, rather than computation time, is the primary scalability bottleneck of the approach. As part of our future work, we plan to optimize memory usage by extending Manifold to use top-𝑘 proofs and various analysis-informed heuristics to preemptively cut off likely-unproductive paths at lower-level IRs.

Figure 9: Error distribution from Clang validation.

6 Related Work

Monolithic vs. Modular Decompilation. Decompilation traditionally relies on monolithic architectures centered around a single, heavily mutable intermediate representation (IR). Industry standards like IDA Pro [18], alongside open-source frameworks like Ghidra [30] and angr [2, 38], operate over massive imperative codebases where analyses are tightly coupled, hindering extensibility and masking fidelity issues [7, 12, 31]. Binary lifters like RetDec [1] and McSema [11] translate binaries to LLVM IR, but as Liu et al. [28] demonstrate, LLVM's heavyweight, compiler-centric encoding often degrades reverse-engineering precision. In contrast, modern compiler design embraces structural decomposition via nanopasses [23, 24, 35]. Manifold applies this philosophy to reverse engineering, utilizing the stratified, formally specified IRs of the CompCert verified compiler [26] to decouple lifting into isolated logical steps. While prior work explores formally verified decompilation into logic [14, 29, 42], we focus on the architectural modularity of the lifting pipeline itself, using CompCert's IRs as a blueprint for extensibility rather than strict formal verification.

Declarative Binary Analysis. Datalog has proven highly effective for complex program analysis, separating relational specification from execution strategy via engines like Soufflé [21] and Ascent [34].
Frameworks like DOOP [39] scale context-sensitive pointer analysis to millions of lines of code. In the binary domain, ddisasm [16] demonstrated that reassembleable disassembly can be elegantly modeled as a Datalog inference problem over superset candidates [4]. Other tools apply logic programming to specific niches, such as C++ class recovery [36], compositional taint analysis [5], or smart-contract decompilation [17]. Manifold generalizes this declarative approach: rather than restricting Datalog to disassembly or isolated analyses, we design the entire decompilation pipeline as a sequence of passes operating over a relation store.

Type Recovery and Machine Learning. Reconstructing high-level variables and composite types is critical for binary readability. Traditional constraint-based approaches (e.g., TIE [25], Retypd [32]) and heuristic methods [13, 46] typically operate as isolated, post-hoc passes over a fixed IR. More recently, neural networks and Large Language Models (LLMs) have been heavily applied to predict types and variable names [9, 22, 47], translate binaries directly to source [20, 40], and drive agentic reverse-engineering workflows [3, 8, 44, 48]. However, these approaches are often bolted onto opaque, monolithic decompiler backends. Manifold embeds type inference directly within the analysis-pass pipeline, allowing type analysis to interact with data-flow and memory-layout recovery.

7 Conclusion

We presented declarative decompilation, a novel approach that restructures reverse engineering as a sequence of modular, logic-defined lifting passes over a shared monotonic relation store. We formalized this approach as provenance-guided superset decompilation (PGSD), which retains all candidate interpretations with explicit derivation witnesses rather than committing early to a single analysis result.
We implemented our approach in Manifold, a system of 35K lines of Rust and Datalog that lifts Linux ELF binaries to C through the CompCert IRs. Our evaluation on GNU Coreutils and Assemblage binaries shows that Manifold matches established decompilers in function recovery, type accuracy, and struct reconstruction, produces fewer compiler-reported errors, and generalizes across compilers and optimization levels. The modular pass architecture enables straightforward extension: adding variable-length array support, for instance, required a single analysis pass with no modifications to other passes.

Several directions for future work remain: expression folding and structured loop recovery would directly reduce the elevated statement counts and improve control-flow similarity; top-k provenance pruning and analysis-informed candidate cutoffs at lower IR levels would address the memory scalability bottleneck observed on type-ambiguous binaries. The pass architecture naturally accommodates richer targets such as additional ISA support and more precise type analysis, requiring only additional passes rather than architectural changes.

References

[1] Avast Software. 2017. RetDec: Retargetable Machine-Code Decompiler Based on LLVM. https://github.com/avast/retdec. Accessed: 2025-09-13.
[2] Zion Leonahenahe Basque, Ati Priya Bajaj, Wil Gibbs, Jude O'Kain, Derron Miao, Tiffany Bao, Adam Doupé, Yan Shoshitaishvili, and Ruoyu Wang. 2024. Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation. In 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 361–378. https://www.usenix.org/conference/usenixsecurity24/presentation/basque
[3] Zion Leonahenahe Basque, Samuele Doria, Ananta Soneji, Wil Gibbs, Adam Doupé, Yan Shoshitaishvili, Eleonora Losiouk, Ruoyu Wang, and Simone Aonzo. [n. d.].
Decompiling the Synergy: An Empirical Study of Human–LLM Teaming in Software Reverse Engineering. ([n. d.]).
[4] Erick Bauman, Zhiqiang Lin, and Kevin W. Hamlen. 2018. Superset Disassembly: Statically Rewriting x86 Binaries Without Heuristics. In Network and Distributed System Security Symposium.
[5] Denis Bueno and CTADL Team. 2025. CTADL: Compositional Taint Analysis in Datalog. Sandia National Laboratories. https://github.com/sandialabs/ctadl
[6] Kevin Burk, Fabio Pagani, Christopher Kruegel, and Giovanni Vigna. 2022. Decomperson: How Humans Decompile and What We Can Learn From It. In 31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA, 2765–2782. https://www.usenix.org/conference/usenixsecurity22/presentation/burk
[7] Ying Cao, Runze Zhang, Ruigang Liang, and Kai Chen. 2024. Evaluating the Effectiveness of Decompilers. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024). Association for Computing Machinery, 491–502. doi:10.1145/3650212.3652144
[8] Guoqiang Chen, Huiqi Sun, Daguang Liu, Zhiqi Wang, Qiang Wang, Bin Yin, Lu Liu, and Lingyun Ying. 2025. ReCopilot: Reverse Engineering Copilot in Binary Analysis. arXiv preprint arXiv:2505.16366 (2025).
[9] Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. 2022. Augmenting Decompiler Output with Learned Variable Names and Types. In 31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA, 4327–4343. https://www.usenix.org/conference/usenixsecurity22/presentation/chen-qibin
[10] Cristina Garcia Cifuentes. 1994. Reverse compilation techniques. https://api.semanticscholar.org/CorpusID:110021381
[11] Sandeep Dasgupta, Sushant Dinesh, Deepan Venkatesh, Vikram S. Adve, and Christopher W. Fletcher. 2020. Scalable validation of binary lifters.
In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (London, UK) (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 655–671. doi:10.1145/3385412.3385964
[12] Luke Dramko, Jeremy Lacomis, Edward J. Schwartz, Bogdan Vasilescu, and Claire Le Goues. 2024. A taxonomy of C decompiler fidelity issues. In Proceedings of the 33rd USENIX Conference on Security Symposium (Philadelphia, PA, USA) (SEC '24). USENIX Association, USA, Article 22, 18 pages.
[13] Khaled Elwazeer, Kapil Anand, Aparna Kotha, Matthew Smithson, and Rajeev Barua. 2013. Scalable variable and data type detection in a binary rewriter. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (2013). https://api.semanticscholar.org/CorpusID:10227723
[14] Daniel Engel, Freek Verbeek, and Binoy Ravindran. 2023. BIRD: A Binary Intermediate Representation for Formally Verified Decompilation of x86-64 Binaries. In Proceedings of the 17th International Conference on Tests and Proofs (TAP 2023) (Lecture Notes in Computer Science). Springer, 3–20.
[15] Daniel Engel, Freek Verbeek, and Binoy Ravindran. 2024. On the Decidability of Disassembling Binaries. In Theoretical Aspects of Software Engineering: 18th International Symposium, TASE 2024, Guiyang, China, July 29 – August 1, 2024, Proceedings (Guiyang, China). Springer-Verlag, Berlin, Heidelberg, 127–145. doi:10.1007/978-3-031-64626-3_8
[16] Antonio Flores-Montoya and Eric Schulte. 2020. Datalog Disassembly. In 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 1075–1092. https://www.usenix.org/conference/usenixsecurity20/presentation/flores-montoya
[17] Neville Grech, Lexi Brent, Bernhard Scholz, and Yannis Smaragdakis. 2019. Gigahorse: Thorough, Declarative Decompilation of Smart Contracts. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). 1176–1186.
doi:10.1109/ICSE.2019.00120
[18] Hex-Rays. 2024. IDA Pro and Hex-Rays Decompiler. Hex-Rays SA. https://hex-rays.com/ida-pro/
[19] Hex-Rays. 2025. IDA Pro Disassembler and Debugger. Hex-Rays SA, Liège, Belgium. https://hex-rays.com/ida-pro/
[20] Peiwei Hu, Ruigang Liang, and Kai Chen. 2024. DeGPT: Optimizing Decompiler Output with LLM. In Proceedings 2024 Network and Distributed System Security Symposium.
[21] Herbert Jordan, Bernhard Scholz, and Pavle Subotić. 2016. Soufflé: On Synthesis of Program Analyzers. In Computer Aided Verification. Springer International Publishing, 422–430.
[22] Jeremy Lacomis, Pengcheng Yin, Edward Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. 2019. DIRE: A Neural Approach to Decompiled Identifier Naming. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 628–639.
[23] C. Lattner and V. Adve. 2004. LLVM: a compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, 2004. CGO 2004. 75–86. doi:10.1109/CGO.2004.1281665
[24] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 2–14. doi:10.1109/CGO51591.2021.9370308
[25] Jonghyup Lee, Thanassis Avgerinos, and David Brumley. 2011. TIE: Principled Reverse Engineering of Types in Binary Programs. In Network and Distributed System Security Symposium. https://api.semanticscholar.org/CorpusID:630135
[26] Xavier Leroy. 2009. Formal verification of a realistic compiler. Commun. ACM 52, 7 (July 2009), 107–115.
doi:10.1145/1538788.1538814
[27] Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt, and Kristopher Micinski. 2024. Assemblage: Automatic Binary Dataset Construction for Machine Learning. arXiv:2405.03991 [cs.CR]
[28] Zhibo Liu, Yuanyuan Yuan, Shuai Wang, and Yuyan Bao. 2022. SoK: Demystifying Binary Lifters Through the Lens of Downstream Applications. In 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 1100–1119.
[29] Magnus O. Myreen, Michael J. C. Gordon, and Konrad Slind. 2012. Decompilation into logic—improved. In 2012 Formal Methods in Computer-Aided Design (FMCAD). IEEE, 78–81.
[30] National Security Agency. 2019. Ghidra Software Reverse Engineering Framework. https://ghidra-sre.org/. Accessed: 2025-09-13.
[31] Nico Naus, Freek Verbeek, Dale Walker, and Binoy Ravindran. 2023. A Formal Semantics for P-Code. In Verified Software. Theories, Tools and Experiments. Springer International Publishing, 111–128.
[32] Matt Noonan, Alexey Loginov, and David Cok. 2016. Polymorphic type inference for machine code. SIGPLAN Not. 51, 6 (June 2016), 27–41. doi:10.1145/2980983.2908119
[33] Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. 2020. CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv:2009.10297 [cs.SE] https://arxiv.org/abs/2009.10297
[34] Arash Sahebolamri, Langston Barrett, Scott Moore, and Kristopher Micinski. 2023. Bring Your Own Data Structures to Datalog. Proc. ACM Program. Lang. 7, OOPSLA2, Article 264 (Oct. 2023), 26 pages. doi:10.1145/3622840
[35] Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig. 2004. A Nanopass Infrastructure for Compiler Education. In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP '04). ACM, 201–212. doi:10.1145/1016850.1016878
[36] Edward J. Schwartz, Cory F. Cohen, Michael Duggan, Jeffrey Gennari, Jeffrey S.
Havrilla, and Charles Hines. 2018. Using Logic Programming to Recover C++ Classes and Methods from Compiled Executables. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (Toronto, Canada) (CCS '18). Association for Computing Machinery, New York, NY, USA, 426–441. doi:10.1145/3243734.3243793
[37] Hovav Shacham. 2007. The Geometry of Innocent Flesh on the Bone: Return-into-Libc without Function Calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security (Alexandria, Virginia, USA) (CCS '07). Association for Computing Machinery, New York, NY, USA, 552–561. doi:10.1145/1315245.1315313
[38] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In IEEE Symposium on Security and Privacy.
[39] Yannis Smaragdakis and Martin Bravenboer. 2011. Using Datalog for Fast and Easy Program Analysis. In Proceedings of the First International Conference on Datalog Reloaded (Datalog'10). Springer-Verlag, 245–251.
[40] Hanzhuo Tan, Qi Luo, Jing Li, and Yuqun Zhang. 2024. LLM4Decompile: Decompiling Binary Code with Large Language Models. arXiv preprint (2024).
[41] Vector 35. 2024. Binary Ninja: A Reverse Engineering Platform. https://binary.ninja/. Accessed: 2025-09-13.
[42] Freek Verbeek, Joshua Bockenek, Zhoulai Fu, and Binoy Ravindran. 2022. Formally verified lifting of C-compiled x86-64 binaries. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2022). Association for Computing Machinery, 934–949. doi:10.1145/3519939.3523702
[43] Daniel Votipka, Seth M. Rabin, Kristopher Micinski, Jeffrey S. Foster, and Michelle M. Mazurek. 2020. An Observational Investigation of Reverse Engineers' Processes.
In Proceedings of the 29th USENIX Conference on Security Symposium (SEC '20). USENIX Association, USA, Article 106, 18 pages.
[44] Wai Kin Wong, Daoyuan Wu, Huaijin Wang, Zongjie Li, Zhibo Liu, Shuai Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2025. DecLLM: LLM-Augmented Recompilable Decompilation for Enabling Programmatic Use of Decompiled Code. In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). 1841–1864.
[45] Khaled Yakdan, Sergej Dechand, Elmar Gerhards-Padilla, and Matthew Smith. 2016. Helping Johnny to analyze malware: A usability-optimized decompiler and malware analysis user study. In 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 158–177.
[46] Zhuo Zhang, Yapeng Ye, Wei You, Guanhong Tao, Wen-chuan Lee, Yonghwi Kwon, Yousra Aafer, and Xiangyu Zhang. 2021. OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary. In 2021 IEEE Symposium on Security and Privacy (SP). 813–832. doi:10.1109/SP40001.2021.00051
[47] Chang Zhu, Ziyang Li, Anton Xue, Ati Priya Bajaj, Wil Gibbs, Yibo Liu, Rajeev Alur, Tiffany Bao, Hanjun Dai, Adam Doupé, Mayur Naik, Yan Shoshitaishvili, Ruoyu Wang, and Aravind Machiry. 2024. TYGR: Type Inference on Stripped Binaries using Graph Neural Networks. In 33rd USENIX Security Symposium (USENIX Security 24). USENIX Association, Philadelphia, PA, 4283–4300. https://www.usenix.org/conference/usenixsecurity24/presentation/zhu-chang
[48] Muqi Zou, Hongyu Cai, Hongwei Wu, Zion Leonahenahe Basque, Arslan Khan, Berkay Celik, Jing Tian, Antonio Bianchi, Ruoyu Wang, and Dongyan Xu. 2025. D-LiFT: Improving LLM-Based Decompiler Backend via Code Quality-driven Fine-tuning. arXiv preprint arXiv:2506.10125 (2025).