Generating Bijections between HOAS and the Natural Numbers
A provably correct bijection between higher-order abstract syntax (HOAS) and the natural numbers enables one to define a “not equals” relationship between terms and also to have an adequate encoding of sets of terms, and maps from one term family to another. Sets and maps are useful in many situations and are preferably provided in a library of some sort. I have released a map and set library for use with Twelf which can be used with any type for which a bijection to the natural numbers exists. Since creating such bijections is tedious and error-prone, I have created a “bijection generator” that generates such bijections automatically together with proofs of correctness, all in the context of Twelf.
💡 Research Summary
The paper addresses a long‑standing practical problem in higher‑order abstract syntax (HOAS). HOAS represents binding structures elegantly by using meta‑level functions, but it provides no built‑in mechanism for comparing terms for inequality, building sets of terms, or constructing maps from one term family to another. The author proposes a systematic solution: a provably correct bijection between a HOAS‑encoded term language and the natural numbers ℕ. Each HOAS term is assigned a unique natural‑number code, and an inverse decoding function is provided. With these, one can define a decidable “not‑equal” predicate, encode finite sets as bit‑vectors, and implement associative maps as arrays indexed by the codes.
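In Twelf, inequality is most naturally an inductively defined relation on unary naturals, and an injective encoding lets every term type reuse that relation. The sketch below transliterates a typical three‑rule definition into Python and lifts it to terms through an arbitrary encoder. The names `nat_ne`, `term_ne` and `to_unary`, as well as the three‑constant encoder, are illustrative assumptions. They are not declarations from the paper or its library.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

T = TypeVar("T")


# Unary naturals, mirroring the usual LF declarations  z : nat.  s : nat -> nat.
@dataclass(frozen=True)
class Z:
    pass


@dataclass(frozen=True)
class S:
    pred: "Nat"


Nat = Union[Z, S]


def to_unary(k: int) -> Nat:
    """Build s (s (... z)) with k applications of s."""
    n: Nat = Z()
    for _ in range(k):
        n = S(n)
    return n


def nat_ne(m: Nat, n: Nat) -> bool:
    """Inductive 'not equal', one branch per rule:
    z <> s N;   s M <> z;   s M <> s N  if  M <> N."""
    if isinstance(m, Z) and isinstance(n, S):
        return True
    if isinstance(m, S) and isinstance(n, Z):
        return True
    if isinstance(m, S) and isinstance(n, S):
        return nat_ne(m.pred, n.pred)
    return False  # z, z: no rule applies


def term_ne(encode: Callable[[T], int], a: T, b: T) -> bool:
    """Compare codes; detects every pair of distinct terms only if encode is injective."""
    return nat_ne(to_unary(encode(a)), to_unary(encode(b)))


# Usage with a hypothetical encoder for a three-constant signature.
codes = {"unit": 0, "true": 1, "false": 2}
print(term_ne(codes.__getitem__, "true", "false"))  # True
print(term_ne(codes.__getitem__, "unit", "unit"))   # False
```

The design point is that a single, easily verified inequality relation on ℕ serves every term type. Only the encoder changes from one type to another.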
The core technical contribution is a “bijection generator” implemented in Twelf, a logical framework based on the LF type theory. The generator takes as input a Twelf signature that defines a HOAS language (including variables, constants, application, lambda abstraction, and possibly recursive data types). From this signature it automatically produces both directions of the bijection, encode : term → nat and decode : nat → term. Because LF functions cannot perform case analysis, Twelf expresses each direction as a relation (a type family) whose totality it checks. The construction proceeds by structural recursion on the syntax tree. Base constructors (variables, constants) receive small, fixed codes, and compound constructors are encoded using standard pairing functions such as Cantor’s pairing or Gödel‑style tuple encodings. Lambda abstractions are handled specially: the generator records the lexical depth (the “scope level”) as part of the code. This guarantees that α‑equivalent terms receive the same code and that terms with different binding structure receive different codes. Recursive types (lists, trees, etc.) are encoded with a variable‑length scheme reminiscent of Fibonacci or Elias gamma coding, so that code length grows with term size while the encoding remains injective.
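The encode/decode scheme described above can be sketched for the untyped λ‑calculus. This is a stand‑in, not the generator's output, and it differs from the paper's approach in three ways. First, it replaces HOAS binders with de Bruijn indices, a first‑order representation in which α‑equivalent terms are literally identical. Second, it uses Cantor pairing for application. Third, it distinguishes the three constructors by the code's residue mod 3. All names are hypothetical.

```python
from dataclasses import dataclass
from math import isqrt
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    index: int  # de Bruijn index: how many enclosing binders to skip


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    body: "Term"


Term = Union[Var, App, Lam]


def pair(x: int, y: int) -> int:
    """Cantor pairing: a bijection N x N -> N."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(n: int) -> Tuple[int, int]:
    """Inverse of pair, using an exact integer square root."""
    w = (isqrt(8 * n + 1) - 1) // 2
    y = n - w * (w + 1) // 2
    return w - y, y


def encode(t: Term) -> int:
    # Constructor tag = code mod 3; payload = code div 3.
    if isinstance(t, Var):
        return 3 * t.index
    if isinstance(t, App):
        return 3 * pair(encode(t.fun), encode(t.arg)) + 1
    return 3 * encode(t.body) + 2


def decode(n: int) -> Term:
    # Every recursive call gets a number strictly smaller than n, so decoding terminates.
    q, tag = divmod(n, 3)
    if tag == 0:
        return Var(q)
    if tag == 1:
        f, a = unpair(q)
        return App(decode(f), decode(a))
    return Lam(decode(q))


k = Lam(Lam(Var(1)))    # \x. \y. x
print(encode(k))        # 35
print(decode(35) == k)  # True
```

Every natural number decodes to some term, and every term encodes to a number that decodes back to it. The mapping is therefore a bijection between ℕ and all de Bruijn terms, open ones included. Restricting it to closed terms would require tracking binder depth, which this sketch omits.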
Correctness is proved entirely within Twelf, and two theorems are generated automatically:

(1) Surjectivity: for every natural number n there exists a term t such that encode(t) = n.
(2) Injectivity: if encode(t₁) = encode(t₂), then t₁ = t₂.

The surjectivity proof uses an inductive definition of a decoding function that pattern‑matches on the structure of the natural number, viewed as a binary tree of pairings. The injectivity proof proceeds by structural induction on terms. It shows that equal codes force equal outermost constructors and, recursively, equal sub‑terms. The proofs also handle the subtle case of bound variables. The scope‑level component of the code guarantees that two terms differing only by a renaming of bound variables are mapped to the same number, while terms that are not α‑equivalent are mapped to different numbers.
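The arithmetic facts behind these proofs can be written out for Cantor's pairing π. The summary mentions π as one of the pairing functions used, but whether the generator uses exactly this function is an assumption. The inverse formula shows that π is a bijection. The inequality shows that a sub‑code never exceeds the code it was extracted from, which is what makes a decoder that pattern‑matches on numbers well‑founded. The last two lines record why the two round‑trip equations yield injectivity and surjectivity.

```latex
\begin{aligned}
\pi(x, y) &= \frac{(x+y)(x+y+1)}{2} + y \\
\pi^{-1}(n) &= (w - y,\; y), \quad
  w = \left\lfloor \frac{\sqrt{8n+1} - 1}{2} \right\rfloor, \quad
  y = n - \frac{w(w+1)}{2} \\
\pi(x, y) &\ge \max(x, y) \\
\bigl(\forall t.\ \mathrm{decode}(\mathrm{encode}(t)) = t\bigr)
  &\;\Rightarrow\; \bigl(\mathrm{encode}(t_1) = \mathrm{encode}(t_2) \Rightarrow t_1 = t_2\bigr) \\
\bigl(\forall n.\ \mathrm{encode}(\mathrm{decode}(n)) = n\bigr)
  &\;\Rightarrow\; \bigl(\forall n.\ \exists t.\ \mathrm{encode}(t) = n\bigr)
\end{aligned}
```

For example, n = 4 gives w = 2 and y = 1, so π⁻¹(4) = (1, 1); indeed π(1, 1) = 3 + 1 = 4.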
Beyond the theoretical machinery, the author releases a reusable library of sets and maps over any type that admits a bijection to ℕ. The set library implements finite subsets as bit‑vectors indexed by the natural‑number codes. Membership is therefore a single bit test, and union and intersection are bitwise operations whose cost is proportional to the length of the vectors rather than constant. The map library represents finite maps as arrays (or sparse vectors) indexed by the same codes, providing lookup, insertion, and deletion with the same asymptotic guarantees as ordinary array‑based maps. Because the bijection is generated automatically, users need only declare their HOAS signature, after which the library applies directly with no hand‑written encoding logic.
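Here is a minimal Python sketch of the two data structures. It assumes only some injective encoder from terms to naturals, here a hypothetical dictionary over four constants. An arbitrary‑precision int serves as the bit‑vector and a dict as the sparse vector. The class and method names are illustrative and are not the released Twelf library's API.

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")
V = TypeVar("V")


class CodeSet(Generic[T]):
    """Finite set stored as a bit-vector: bit k is 1 iff the element with code k is present."""

    def __init__(self, encode: Callable[[T], int]) -> None:
        self.encode = encode
        self.bits = 0  # a Python int acts as an unbounded bit-vector

    def add(self, x: T) -> None:
        self.bits |= 1 << self.encode(x)

    def __contains__(self, x: T) -> bool:
        return bool((self.bits >> self.encode(x)) & 1)  # single bit test

    def union(self, other: "CodeSet[T]") -> "CodeSet[T]":
        result = CodeSet(self.encode)
        result.bits = self.bits | other.bits  # cost linear in vector length
        return result

    def intersection(self, other: "CodeSet[T]") -> "CodeSet[T]":
        result = CodeSet(self.encode)
        result.bits = self.bits & other.bits
        return result


class CodeMap(Generic[T, V]):
    """Finite map keyed by codes; a dict stands in for a sparse vector."""

    def __init__(self, encode: Callable[[T], int]) -> None:
        self.encode = encode
        self.slots: Dict[int, V] = {}

    def insert(self, key: T, value: V) -> None:
        self.slots[self.encode(key)] = value

    def lookup(self, key: T) -> Optional[V]:
        return self.slots.get(self.encode(key))

    def delete(self, key: T) -> None:
        self.slots.pop(self.encode(key), None)


# Usage with a hypothetical encoder over four constants.
codes = {"zero": 0, "succ": 1, "plus": 2, "times": 3}
a = CodeSet(codes.__getitem__)
b = CodeSet(codes.__getitem__)
for c in ("zero", "plus"):
    a.add(c)
for c in ("plus", "times"):
    b.add(c)
print("plus" in a.intersection(b))  # True
print("succ" in a.union(b))         # False

env = CodeMap(codes.__getitem__)    # e.g. a typing environment
env.insert("zero", "nat")
print(env.lookup("zero"), env.lookup("succ"))  # nat None
```

Python's ints grow on demand, which mirrors the fact that codes are unbounded. The price is that a set containing a single large code occupies as many bits as that code.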
The paper also discusses several engineering challenges. First, encoding binders (higher‑order constructor arguments) while preserving α‑equivalence required a careful design of the scope‑level component; naïve approaches either broke injectivity or failed to identify α‑equivalent terms. Second, recursive types pose a risk of non‑termination in the encoding process. The author addresses this by imposing a well‑founded size measure and using a variable‑length coding scheme that guarantees termination of both encode and decode. Third, Twelf’s meta‑language lacks built‑in arithmetic on large natural numbers, so the generator emits auxiliary lemmas about pairing functions and about the arithmetic properties needed for the correctness proofs. The author validates the approach on several representative HOAS languages: the simply‑typed λ‑calculus, a small imperative language with block‑scoped variables, and a language of arithmetic expressions with let‑bindings. In each case, the automatically generated bijection and the accompanying set/map library were used to implement a decidable inequality test and to construct finite environments for type‑checking, demonstrating both correctness and practical efficiency.
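The termination argument in the second challenge can be illustrated on lists of naturals. The sketch swaps in a different classical pairing, (x, y) ↦ 2^x(2y + 1), which is a bijection from ℕ × ℕ onto the positive integers. It is not claimed to be the paper's scheme, and `encode_list`/`decode_list` are hypothetical names. Decoding strips the trailing zero bits of n to recover the head of the list and then continues with a strictly smaller number. That strictly decreasing number is the kind of well‑founded measure that makes decode total.

```python
from typing import List


def encode_list(xs: List[int]) -> int:
    """[] -> 0;  x :: rest -> 2**x * (2 * code(rest) + 1).
    (x, y) -> 2**x * (2*y + 1) is a bijection from N x N onto the positive integers."""
    code = 0
    for x in reversed(xs):  # build the code from the tail outwards
        code = (2 ** x) * (2 * code + 1)
    return code


def decode_list(n: int) -> List[int]:
    out: List[int] = []
    while n > 0:
        x = (n & -n).bit_length() - 1  # trailing zeros = exponent of 2 in n
        odd = n >> x                   # the odd factor 2*rest + 1
        out.append(x)
        n = (odd - 1) // 2             # rest < odd <= n: the measure strictly decreases
    return out


print(encode_list([2, 0]))  # 12
print(decode_list(12))      # [2, 0]
```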
In conclusion, the work delivers a fully automated pipeline—from a HOAS signature to a certified bijection, and from that bijection to ready‑to‑use set and map data structures—all within the Twelf ecosystem. This eliminates the tedious, error‑prone manual construction of encodings that has historically limited the adoption of HOAS in applications requiring collections of terms. By providing both the theoretical foundations (injectivity and surjectivity proofs) and a practical library, the paper opens the door for broader use of HOAS in program transformation, mechanized metatheory, and certified software development, where reasoning about collections of syntactic objects is essential.