On counting untyped lambda terms
We present several results on counting untyped lambda terms, i.e., on telling how many terms belong to such or such class, according to the size of the terms and/or to the number of free variables.
💡 Research Summary
The paper investigates the combinatorial enumeration of untyped λ‑terms using de Bruijn indices, which replace named variables with natural numbers: index k refers to the k‑th enclosing λ‑abstraction. The authors define Tₙ,ₘ as the set of λ‑terms of size n whose free indices all lie in {1,…,m}. They derive a two‑part recurrence: a λ‑abstraction contributes Tₙ,ₘ₊₁ (its body may refer to one more binder, so the index bound grows by one) and an application contributes the product of the counts of two smaller terms, yielding
$$T_{n+1,m} = T_{n,m+1} + \sum_{k=0}^{n} T_{n-k,m} \cdot T_{k,m}.$$
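With the base cases T₀,ₘ = 0 and T₁,ₘ = m (a term of size 1 is a single index), the recurrence holds for n ≥ 1 and can be evaluated efficiently, because every count on its right‑hand side concerns a strictly smaller size (the abstraction case trades one unit of size for the larger bound m + 1).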
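A minimal Python sketch of this recurrence, assuming (as the base cases suggest) that every index, λ and application node has size 1. The tuple representation and the names `count_terms` and `enumerate_terms` are hypothetical choices for this example; the brute‑force generator is only there to cross‑check the memoized count on small sizes.

```python
from functools import lru_cache

# Hypothetical representation chosen for this sketch:
#   ("var", i)        de Bruijn index i >= 1
#   ("lam", body)     λ-abstraction
#   ("app", f, a)     application
# Assumed size convention: every index, λ and application node counts 1.


@lru_cache(maxsize=None)
def count_terms(n: int, m: int) -> int:
    """Number of terms of size n whose free indices lie in 1..m (T_{n,m})."""
    if n <= 0:
        return 0                       # T_{0,m} = 0
    if n == 1:
        return m                       # T_{1,m} = m: a lone index
    k = n - 1                          # size left below the root node
    abstractions = count_terms(k, m + 1)   # the body may use index m + 1
    applications = sum(count_terms(k - j, m) * count_terms(j, m)
                       for j in range(k + 1))
    return abstractions + applications


def enumerate_terms(n: int, m: int):
    """Yield every term of size n with free indices in 1..m (brute force)."""
    if n < 1:
        return
    if n == 1:
        for i in range(1, m + 1):
            yield ("var", i)
        return
    for body in enumerate_terms(n - 1, m + 1):
        yield ("lam", body)
    for j in range(1, n - 1):          # function of size j, argument of size n-1-j
        for f in enumerate_terms(j, m):
            for a in enumerate_terms(n - 1 - j, m):
                yield ("app", f, a)


print([count_terms(n, 0) for n in range(1, 7)])   # [0, 1, 2, 4, 13, 42]
```

Memoization keeps the evaluation to a polynomial number of arithmetic operations, whereas the generator is exponential and only practical for small sizes.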
Focusing first on closed terms (m = 0), the authors show that the sequence Tₙ,₀ dominates the Motzkin numbers Mₙ (Mₙ < Tₙ₊₁,₀). Since Motzkin numbers count unary‑binary trees, a bijection between such trees and a family of closed λ‑terms in which every index equals 1 explains the inequality. Consequently, Tₙ,₀ grows at least as fast as the Motzkin numbers, i.e. at least like 3ⁿ up to a polynomial factor (for comparison, the Catalan numbers grow like 4ⁿ·n^{‑3/2}).
For a fixed size n, the authors treat the dependence on m as a polynomial P_Tₙ(m) = |Tₙ,ₘ|. They prove the recurrence
$$P_{T_0}(m) = 0, \qquad P_{T_1}(m) = m,$$
$$P_{T_{n+1}}(m) = P_{T_n}(m+1) + \sum_{k=1}^{n-1} P_{T_k}(m) \cdot P_{T_{n-k}}(m).$$
The degree of P_Tₙ is ⌈n/2⌉, and the leading coefficients exhibit a striking pattern: for odd indices the leading coefficient θ_{2q+1} equals the Catalan number C_q, while for even indices θ_{2q} equals the binomial coefficient (2q‑1 choose q). These results are obtained by translating the recurrence into functional equations for generating functions. The generating function of the odd‑indexed leading coefficients, Od(z) = ∑θ_{2i+1}z^i, satisfies Od(z) = 1 + z·Od(z)², whose solution is the Catalan generating function C(z) = (1 − √(1−4z))/(2z). The generating function of the even‑indexed leading coefficients, Ev(z) = ∑_{i≥1}θ_{2i}z^i, satisfies the linear equation Ev(z) = z·Od(z) + 2z·Od(z)·Ev(z), which yields the binomial expression for θ_{2q}.
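To see the degree ⌈n/2⌉ and the leading‑coefficient pattern concretely, the sketch below computes the polynomials P_Tₙ(m) exactly, as integer coefficient lists (p[i] is the coefficient of mⁱ, a representation chosen for this example; the helper names are hypothetical), directly from the recurrence, and compares the leading coefficients with C_q and (2q‑1 choose q).

```python
from math import comb

# Polynomials in m are coefficient lists: p[i] is the coefficient of m**i.


def padd(p, q):
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return r


def pmul(p, q):
    if not p or not q:
        return []                      # product with the zero polynomial
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def pshift(p):
    """Coefficients of p(m + 1), using (m + 1)**i = sum_j C(i, j) * m**j."""
    r = [0] * len(p)
    for i, c in enumerate(p):
        for j in range(i + 1):
            r[j] += c * comb(i, j)
    return r


def term_polys(N):
    """[P_T0, ..., P_TN] from P_T0 = 0, P_T1 = m and the recurrence above."""
    P = [[], [0, 1]]
    for n in range(1, N):
        s = pshift(P[n])               # P_Tn(m + 1): the λ case
        for k in range(1, n):          # application case: convolution
            s = padd(s, pmul(P[k], P[n - k]))
        P.append(s)                    # this is P_T(n+1)
    return P


def catalan(q):
    return comb(2 * q, q) // (q + 1)


P = term_polys(10)
print(P[5])                            # [13, 17, 6, 2], i.e. 2m³ + 6m² + 17m + 13
for n in range(1, 11):
    print(n, len(P[n]) - 1, P[n][-1])  # size n, degree, leading coefficient
```

Exact integer arithmetic is used throughout, so the comparison with the closed forms involves no rounding.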
Beyond the leading term, the paper derives explicit formulas for the second and third coefficients (τ and δ). For example, τ_{2q+1} = (2q‑1)·(2q‑2 choose q‑1) = q(2q‑1)·C_{q‑1} for q ≥ 1, and δ_{2q+1} = q²·2^{‑q} + … (a more involved combination of Catalan numbers). These coefficients are expressed through convolutions of the leading coefficients of smaller polynomials, reflecting how λ‑terms are built from subterms.
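The second coefficients can be read off the same polynomials. This sketch recomputes P_Tₙ(m) from the recurrence and compares τ_{2q+1}, taken here to be the coefficient of m^q in P_T_{2q+1} (the one just below the leading term), with the closed form (2q‑1)·(2q‑2 choose q‑1); the function names are hypothetical.

```python
from math import comb


def term_polys(N):
    """Coefficient lists (index i = coefficient of m**i) of P_T0 .. P_TN."""
    P = [[], [0, 1]]                      # P_T0 = 0, P_T1 = m
    for n in range(1, N):
        # P_Tn(m + 1), expanded with the binomial theorem (comb(i, j) = 0 if j > i)
        s = [sum(c * comb(i, j) for i, c in enumerate(P[n]))
             for j in range(len(P[n]))]
        # convolution sum_{k=1}^{n-1} P_Tk(m) * P_T(n-k)(m)
        for k in range(1, n):
            for i, a in enumerate(P[k]):
                for j, b in enumerate(P[n - k]):
                    if i + j >= len(s):
                        s.extend([0] * (i + j + 1 - len(s)))
                    s[i + j] += a * b
        P.append(s)
    return P


def tau(P, n):
    """Second coefficient of P_Tn: the one just below the leading term."""
    return P[n][-2]


def tau_closed_form(q):
    """(2q - 1) * C(2q - 2, q - 1), i.e. q * (2q - 1) * Catalan(q - 1), for q >= 1."""
    return (2 * q - 1) * comb(2 * q - 2, q - 1)


P = term_polys(21)
print([tau(P, 2 * q + 1) for q in range(6)])
print([tau_closed_form(q) for q in range(1, 6)])
```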
The second major part of the work treats normal forms. Two families are introduced: Gₙ,ₘ (normal forms that do not start with a λ, i.e. an index applied to zero or more normal forms) and Fₙ,ₘ (all normal forms, whether or not they start with a λ). Their combinatorial specifications are
$$G_{n+1,m} = \sum_{k=0}^{n} G_{n-k,m} \cdot F_{k,m}, \qquad F_{n+1,m} = F_{n,m+1} + G_{n+1,m}.$$
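A sketch of these two recurrences, assuming the same size convention as for general terms (each index, λ and application node counts 1), so that G₀,ₘ = F₀,ₘ = 0 and G₁,ₘ = m; these base cases are an assumption added for the example, and the function names are hypothetical. A brute‑force filter that rejects any term containing a β‑redex cross‑checks the counts on small sizes.

```python
from functools import lru_cache

# Assumed base cases: G_{0,m} = F_{0,m} = 0 and G_{1,m} = m (a lone index).


@lru_cache(maxsize=None)
def G(n: int, m: int) -> int:
    """Normal forms of size n that do not start with λ (free indices in 1..m)."""
    if n <= 0:
        return 0
    if n == 1:
        return m
    # G_{n,m} = sum_{k=0}^{n-1} G_{n-1-k,m} * F_{k,m}: a neutral term applied to a normal form
    return sum(G(n - 1 - k, m) * F(k, m) for k in range(n))


@lru_cache(maxsize=None)
def F(n: int, m: int) -> int:
    """All normal forms of size n (free indices in 1..m)."""
    if n <= 0:
        return 0
    return F(n - 1, m + 1) + G(n, m)


# Brute-force cross-check on explicit terms ("var", i) | ("lam", b) | ("app", f, a).
def terms(n: int, m: int):
    if n < 1:
        return
    if n == 1:
        for i in range(1, m + 1):
            yield ("var", i)
        return
    for body in terms(n - 1, m + 1):
        yield ("lam", body)
    for j in range(1, n - 1):
        for f in terms(j, m):
            for a in terms(n - 1 - j, m):
                yield ("app", f, a)


def is_normal(t) -> bool:
    """True when t contains no β-redex (a λ-abstraction in function position)."""
    if t[0] == "var":
        return True
    if t[0] == "lam":
        return is_normal(t[1])
    return t[1][0] != "lam" and is_normal(t[1]) and is_normal(t[2])


print([F(n, 0) for n in range(1, 6)])   # closed normal forms: [0, 1, 2, 4, 10]
```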
Analogous to the term case, the authors define polynomials P_NFₙ(m) and Qₙ(m) for the counts of F and G respectively. The degree of each polynomial again equals ⌈n/2⌉, and the leading coefficients of the odd‑indexed polynomials (n = 2q + 1) are the Catalan numbers C_q, mirroring the earlier result for general terms.
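The same coefficient‑list approach yields P_NFₙ(m) and Qₙ(m). The sketch below again assumes the base case Q₁(m) = m and uses hypothetical helper names; it checks that P_NFₙ has degree ⌈n/2⌉ and that the odd‑indexed leading coefficients are Catalan numbers.

```python
from math import comb


def padd(p, q):
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return r


def pmul(p, q):
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def pshift(p):
    """Coefficients of p(m + 1)."""
    r = [0] * len(p)
    for i, c in enumerate(p):
        for j in range(i + 1):
            r[j] += c * comb(i, j)
    return r


def nf_polys(N):
    """Coefficient lists of P_NF_n (F counts) and Q_n (G counts) for n = 0..N."""
    F = [[] for _ in range(N + 1)]     # P_NF_0 = 0
    G = [[] for _ in range(N + 1)]     # Q_0 = 0
    G[1] = [0, 1]                      # assumed base case: Q_1(m) = m
    F[1] = [0, 1]                      # P_NF_1(m) = P_NF_0(m + 1) + Q_1(m) = m
    for n in range(2, N + 1):
        g = []
        for k in range(n):             # Q_n = sum_{k=0}^{n-1} Q_{n-1-k} * P_NF_k
            g = padd(g, pmul(G[n - 1 - k], F[k]))
        G[n] = g
        F[n] = padd(pshift(F[n - 1]), G[n])   # P_NF_n = P_NF_{n-1}(m + 1) + Q_n
    return F, G


F, G = nf_polys(9)
print(F[5])                            # [10, 10, 3, 2], i.e. 2m³ + 3m² + 10m + 10
print([F[2 * q + 1][-1] for q in range(5)])   # leading coefficients of odd-indexed P_NF
```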
Throughout the paper, generating functions for the sequences of leading, second‑leading, and third‑leading coefficients are introduced (Od, Ev, Sod, etc.). By solving the functional equations they obtain closed forms such as
$$Od(z) = C(z), \qquad Ev(z) = \frac{z\,C(z)}{1 - 2z\,C(z)},$$
and similar expressions for the other series, which reveal that the coefficient sequences correspond to known integer sequences (e.g., OEIS A002699 and A144395) or to new ones not yet catalogued.
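The closed forms for Od and Ev can be checked numerically by solving the functional equations on truncated power series. This sketch uses plain fixed‑point iteration, a numerical technique swapped in for illustration rather than the paper's derivation, and the function names are hypothetical. The linear equation Ev = z·Od + 2z·Od·Ev is equivalent to Ev = zC/(1 − 2zC); since 1 − 2zC(z) = √(1−4z), this equals (1/√(1−4z) − 1)/2, whose coefficients are ½·(2q choose q) = (2q‑1 choose q).

```python
from math import comb


def series_mul(a, b, n):
    """First n coefficients of the product of two power series (lists of length >= n)."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


def od_series(n):
    """First n coefficients of Od(z), solving Od = 1 + z * Od**2 by iteration."""
    od = [0] * n
    for _ in range(n):          # each pass fixes at least one more coefficient
        sq = series_mul(od, od, n)
        od = [1] + sq[:n - 1]   # 1 + z * Od(z)**2, truncated to n terms
    return od


def ev_series(n):
    """First n coefficients of Ev(z), solving Ev = z*Od + 2*z*Od*Ev by iteration."""
    zc = [0] + od_series(n)[:n - 1]   # z * Od(z) = z * C(z)
    ev = [0] * n
    for _ in range(n):
        prod = series_mul(zc, ev, n)
        ev = [zc[k] + 2 * prod[k] for k in range(n)]
    return ev


print(od_series(8))   # [1, 1, 2, 5, 14, 42, 132, 429]
print(ev_series(8))   # [0, 1, 3, 10, 35, 126, 462, 1716]
```

Because z·C(z) has no constant term, each iteration determines one more coefficient exactly, so n passes suffice for n coefficients.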
The authors conclude that the enumeration of λ‑terms, closed terms, and normal forms is tightly linked to classical combinatorial families (Catalan, Motzkin) and that the polynomial structure in the number of free variables provides a powerful tool for analyzing the distribution of term sizes. These results have immediate implications for random λ‑term generation, average‑case analysis of reduction strategies, and probabilistic modelling of λ‑calculus. The paper also opens several avenues for future work, such as deriving explicit formulas for higher‑order coefficients, exploring connections with derivatives of the Catalan generating function, and extending the methodology to typed λ‑calculi or other rewrite systems.