Counting and generating lambda terms
Lambda calculus is the basis of functional programming and higher order proof assistants. However, little is known about combinatorial properties of lambda terms, in particular, about their asymptotic distribution and random generation. This paper tries to answer questions like: How many terms of a given size are there? What is a “typical” structure of a simply typable term? Despite their ostensible simplicity, these questions still remain unanswered, whereas solutions to such problems are essential for testing compilers and optimizing programs whose expected efficiency depends on the size of terms. Our approach toward the afore-mentioned problems may be later extended to any language with bound variables, i.e., with scopes and declarations. This paper presents two complementary approaches: one, theoretical, uses complex analysis and generating functions, the other, experimental, is based on a generator of lambda-terms. Thanks to de Bruijn indices, we provide three families of formulas for the number of closed lambda terms of a given size and we give four relations between these numbers which have interesting combinatorial interpretations. As a by-product of the counting formulas, we design an algorithm for generating lambda terms. Performed tests provide us with experimental data, like the average depth of bound variables and the average number of head lambdas. We also create random generators for various sorts of terms. Thereafter, we conduct experiments that answer questions like: What is the ratio of simply typable terms among all terms? (Very small!) How are simply typable lambda terms distributed among all lambda terms? (A typable term almost always starts with an abstraction.) In this paper, abstractions and applications have size 1 and variables have size 0.
Research Summary
The paper tackles two fundamental problems about lambda‑calculus terms: (1) how many closed lambda‑terms of a given size exist, and (2) how to generate such terms uniformly at random. The authors use de Bruijn indices, which replace each variable name by a number that identifies its binder by counting the enclosing abstractions up to it. Terms that differ only in the names of their bound variables then share a single representation, so no α‑conversion is needed and terms can be counted as ordinary labelled trees. Size is defined so that each abstraction (λ) and each application contributes 1, while a variable contributes 0.
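The representation and size measure described above can be sketched as follows. This is an illustrative sketch only: the class names `Var`, `Lam`, `App` and the helpers `size` and `is_closed` are assumptions introduced here, as is the choice of 1‑based indices (index 1 refers to the innermost enclosing abstraction).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    index: int  # de Bruijn index; assumed 1-based (1 = innermost binder)


@dataclass(frozen=True)
class Lam:
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def size(t: Term) -> int:
    """Abstractions and applications count 1, variables count 0."""
    if isinstance(t, Var):
        return 0
    if isinstance(t, Lam):
        return 1 + size(t.body)
    return 1 + size(t.fun) + size(t.arg)


def is_closed(t: Term, depth: int = 0) -> bool:
    """A term is closed when every index points to an enclosing abstraction."""
    if isinstance(t, Var):
        return 1 <= t.index <= depth
    if isinstance(t, Lam):
        return is_closed(t.body, depth + 1)
    return is_closed(t.fun, depth) and is_closed(t.arg, depth)


# K = λx.λy.x is written λλ2; S = λx.λy.λz.x z (y z) is written λλλ((3 1) (2 1)).
K = Lam(Lam(Var(2)))
S = Lam(Lam(Lam(App(App(Var(3), Var(1)), App(Var(2), Var(1))))))
```

Under this measure `size(K)` is 2 and `size(S)` is 6 (three abstractions plus three applications), while `Lam(Var(2))` is not closed because index 2 has no binder.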
Using this size model, the paper gives three families of counting formulas for the number aₙ of closed terms of size n. Passing under an abstraction makes one more index available, so closed terms cannot be counted by a recurrence in n alone. Writing T_{n,m} for the number of terms of size n whose free de Bruijn indices lie in {1, …, m}, so that aₙ = T_{n,0}, the direct recurrence is
T_{0,m} = m,   T_{n+1,m} = T_{n,m+1} + Σ_{k=0}^{n} T_{k,m}·T_{n‑k,m},
where the first summand counts the terms that begin with an abstraction and the sum counts those that begin with an application. The first values are a₀, …, a₅ = 0, 1, 3, 14, 82, 579. Further formulas come from the generating functions T_m(z) = Σ_n T_{n,m} zⁿ, which satisfy the quadratic functional equation T_m(z) = m + z·T_{m+1}(z) + z·T_m(z)², coupling each level m to the next. Because variables have size 0, aₙ grows faster than any exponential: for even n, n/2 leading abstractions followed by an application tree with n/2 internal nodes already give at least (n/2)^{n/2} terms. The closed‑term series therefore has zero radius of convergence and, unlike the Catalan numbers, aₙ admits no estimate of the form C·ρ⁻ⁿ·n^{‑3/2}.
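As a sanity check on closed‑term counts in this size model, the sketch below counts terms of size n whose free de Bruijn indices lie in 1..m and reads off the closed terms at m = 0. The function name `count` is an assumption of this sketch, and the recurrence is derived directly from the size definition (a variable has size 0, an abstraction adds one index to the scope); it is not claimed to be the authors' code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(n: int, m: int) -> int:
    """Number of terms of size n with free de Bruijn indices in 1..m."""
    if n == 0:
        return m  # a term of size 0 is one of the m variables in scope
    # Leading abstraction: the body has size n - 1 and one more index in scope.
    abstractions = count(n - 1, m + 1)
    # Leading application: the remaining size n - 1 is split between both sides.
    applications = sum(count(k, m) * count(n - 1 - k, m) for k in range(n))
    return abstractions + applications


closed_counts = [count(n, 0) for n in range(6)]
print(closed_counts)  # [0, 1, 3, 14, 82, 579]
```

The three closed terms of size 2 are λλ1, λλ2 and λ(1 1), which matches `count(2, 0)`.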
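From the counting results they construct an exact‑size uniform sampler for closed lambda‑terms, in the style of the classical recursive (counting‑based) method. The sampler works top‑down. A term of positive size becomes either an abstraction or an application with a particular split of the remaining size between its two subterms, each option being chosen with probability proportional to the pre‑computed number of terms of that shape; a term of size 0 is a variable whose index is drawn uniformly from those in scope. Memoising the counts avoids recomputation, although the table needed for size n is quadratic in n, since counts are required for every smaller size k with up to n − k indices in scope. Because every choice is weighted by an exact count, the output is uniformly distributed over all terms of the requested size. The implementation is provided in both Haskell and OCaml, and the source code is released for reproducibility.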
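A minimal sketch of such a counting‑based sampler follows. It assumes the size model above and 1‑based de Bruijn indices, represents terms as nested tuples, and uses the hypothetical names `count`, `sample` and `show`. It illustrates the general technique rather than reproducing the authors' implementation.

```python
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def count(n, m):
    """Number of terms of size n with free de Bruijn indices in 1..m."""
    if n == 0:
        return m
    return count(n - 1, m + 1) + sum(
        count(k, m) * count(n - 1 - k, m) for k in range(n)
    )


def sample(n, m=0, rng=random):
    """Draw a term of size n with free indices in 1..m, uniformly at random."""
    if count(n, m) == 0:
        raise ValueError("no term of this size and scope exists")
    if n == 0:
        return ("var", rng.randint(1, m))
    # Pick a rank among all terms, then find which shape that rank falls in.
    r = rng.randrange(count(n, m))
    abstractions = count(n - 1, m + 1)
    if r < abstractions:
        return ("lam", sample(n - 1, m + 1, rng))
    r -= abstractions
    for k in range(n):
        block = count(k, m) * count(n - 1 - k, m)
        if r < block:
            return ("app", sample(k, m, rng), sample(n - 1 - k, m, rng))
        r -= block
    raise AssertionError("unreachable: counts are inconsistent")


def show(t):
    """Render a term in de Bruijn notation, e.g. λλ(1 2)."""
    if t[0] == "var":
        return str(t[1])
    if t[0] == "lam":
        return "λ" + show(t[1])
    return "(" + show(t[1]) + " " + show(t[2]) + ")"


print(show(sample(10, 0, random.Random(2024))))
```

Each shape is chosen with probability equal to its share of `count(n, m)` and each subterm is then drawn uniformly, so every term of the requested size has probability 1 / `count(n, m)`. The recursion depth grows with the term size, so very large sizes would need an iterative formulation.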
Experiments with the sampler yield several empirical observations. First, the proportion of simply‑typable terms among all closed terms drops sharply as size grows: for n≈20 the ratio is about 1 %, while for n≈40 it falls below 10⁻⁴. Consequently, almost all large lambda‑terms are untypable. Second, typable terms almost always start with one or more leading abstractions; the average number of initial λ‑binders is 2–3. Third, the average depth of variable bindings grows roughly logarithmically with term size, and the average number of “head” abstractions (those that appear before the first application) is small but non‑negligible. These measurements describe the typical structure of random lambda‑terms and provide concrete data for benchmarking type‑inference engines.
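For very small sizes the typable fraction can be computed exactly by enumerating every closed term and running simple type inference on each. The sketch below does this with a textbook unification algorithm. The names `terms`, `unify` and `is_typable`, the tuple encoding of terms, and the use of integers as type variables are all assumptions of this illustration, which is practical only for small n.

```python
import itertools


def terms(n, m):
    """Yield every term of size n whose free de Bruijn indices lie in 1..m."""
    if n == 0:
        for i in range(1, m + 1):
            yield ("var", i)
        return
    for body in terms(n - 1, m + 1):
        yield ("lam", body)
    for k in range(n):
        for f in terms(k, m):
            for a in terms(n - 1 - k, m):
                yield ("app", f, a)


# Types: an int is a type variable, a pair (a, b) is the arrow type a -> b.
def resolve(t, subst):
    while isinstance(t, int) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    t = resolve(t, subst)
    if isinstance(t, int):
        return t == v
    return occurs(v, t[0], subst) or occurs(v, t[1], subst)


def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if isinstance(a, int):
        if a == b:
            return True
        if occurs(a, b, subst):
            return False  # occurs check: no finite simple type exists
        subst[a] = b
        return True
    if isinstance(b, int):
        return unify(b, a, subst)
    return unify(a[0], b[0], subst) and unify(a[1], b[1], subst)


def is_typable(term):
    subst, fresh = {}, itertools.count()

    def infer(t, env):
        # env[-i] is the type assigned to de Bruijn index i.
        if t[0] == "var":
            return env[-t[1]]
        if t[0] == "lam":
            a = next(fresh)
            body = infer(t[1], env + [a])
            return None if body is None else (a, body)
        f = infer(t[1], env)
        x = None if f is None else infer(t[2], env)
        if x is None:
            return None
        r = next(fresh)
        return r if unify(f, (x, r), subst) else None

    return infer(term, []) is not None


for n in range(1, 6):
    closed = list(terms(n, 0))
    typable = sum(is_typable(t) for t in closed)
    print(n, typable, len(closed))
```

At size 2, for example, λλ1 and λλ2 are typable while the self‑application λ(1 1) is rejected by the occurs check, so two of the three closed terms are typable.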
The paper also presents four combinatorial identities linking the various counting sequences, such as alternative recurrences and convolution formulas. These identities reveal a deep connection between the lambda‑term enumeration and known combinatorial families (e.g., Dyck paths, non‑plane binary trees) while highlighting the novel constraints introduced by de Bruijn indexing.
Finally, the authors suggest that their methodology, which combines size‑aware counting formulas, generating functions, and exact‑size uniform sampling, may later be extended to any language with bound variables, that is, with scopes and declarations; this would include, for example, π‑calculus, ML‑style let‑bindings, and systems with explicit lifetimes. By providing both counting formulas and a practical random generator, the work connects the combinatorial study of lambda‑calculus with its applications in compiler testing, program optimisation, and probabilistic analysis of functional programs.