Parametric Compositional Data Types

In previous work we have illustrated the benefits that compositional data types (CDTs) offer for implementing languages and in general for dealing with abstract syntax trees (ASTs). Based on Swierstra’s data types à la carte, CDTs are implemented as a Haskell library that enables the definition of recursive data types and functions on them in a modular and extendable fashion. Although CDTs provide a powerful tool for analysing and manipulating ASTs, they lack a convenient representation of variable binders. In this paper we remedy this deficiency by combining the framework of CDTs with Chlipala’s parametric higher-order abstract syntax (PHOAS). We show how a generalisation from functors to difunctors enables us to capture PHOAS while still maintaining the features of the original implementation of CDTs, in particular its modularity. Unlike previous approaches, we avoid so-called exotic terms without resorting to abstract types: this is crucial when we want to perform transformations on CDTs that inspect the recursively computed CDTs, e.g. constant folding.

💡 Research Summary

The paper addresses a limitation of compositional data types (CDTs), a Haskell library built on Swierstra’s “data types à la carte” approach: the lack of a convenient representation for variable binders in abstract syntax trees (ASTs). CDTs excel at the modular definition of recursive data types and the functions that operate on them, but binding has to be encoded either with explicit names, which requires handling α‑conversion manually, or with naïve higher‑order abstract syntax (HOAS), which admits exotic terms: values that are well‑typed in the meta‑language but correspond to no term of the object language, typically because the function inside a binder inspects its argument.
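The exotic‑term problem can be made concrete with a small sketch of naïve HOAS. The type Exp, its constructors, and the helper bodyIsLit are hypothetical names chosen for illustration and are not taken from the paper or the CDT library.

```haskell
-- Naive HOAS: a binder is a meta-level function over terms themselves.
data Exp
  = Lit Int
  | Add Exp Exp
  | Lam (Exp -> Exp)
  | App Exp Exp

-- An ordinary term, representing \x -> x + 1.
ordinary :: Exp
ordinary = Lam (\x -> Add x (Lit 1))

-- An exotic term: the body pattern-matches on its argument, which no
-- lambda abstraction of the object language can do. It still type-checks.
exotic :: Exp
exotic = Lam (\x -> case x of
                      Lit _ -> Lit 0
                      _     -> x)

-- Illustrative helper: apply a binder to an argument and report whether
-- the resulting body is a literal.
bodyIsLit :: Exp -> Exp -> Bool
bodyIsLit (Lam f) arg = case f arg of
  Lit _ -> True
  _     -> False
bodyIsLit _ _ = False
```

Here the body of exotic changes shape depending on the argument it is given, so it does not denote a single object‑language abstraction, yet the Haskell type checker accepts it.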

To overcome this, the authors integrate Chlipala’s parametric higher‑order abstract syntax (PHOAS) into the CDT framework. PHOAS encodes binders as Haskell functions while parametrising the type of bound variables, thereby guaranteeing capture‑avoiding substitution and eliminating exotic terms through the type system. However, a binder of type a -> b uses the variable type a in a contravariant (argument) position and the sub‑term type b in a covariant (result) position. A signature with binders therefore needs two type parameters of opposite variance, which an ordinary Functor, covariant in a single parameter, cannot express.
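As a rough illustration of the PHOAS idea on its own, before any compositional machinery, the following sketch defines a small monolithic term type. The names Exp, Closed, inc, countVars and pretty are assumptions made for this example only.

```haskell
{-# LANGUAGE RankNTypes #-}

-- PHOAS: the type of variables is a parameter `a` of the term type.
data Exp a
  = Var a
  | Lit Int
  | Add (Exp a) (Exp a)
  | Lam (a -> Exp a)
  | App (Exp a) (Exp a)

-- A closed term must be well-typed for every choice of variable type.
type Closed = forall a. Exp a

-- Represents \x -> x + 1.
inc :: Closed
inc = Lam (\x -> Add (Var x) (Lit 1))

-- Instantiating a = () is enough to count variable occurrences.
countVars :: Exp () -> Int
countVars (Var _)   = 1
countVars (Lit _)   = 0
countVars (Add l r) = countVars l + countVars r
countVars (Lam f)   = countVars (f ())
countVars (App f x) = countVars f + countVars x

-- Instantiating a = String lets a printer supply generated names.
pretty :: Closed -> String
pretty e = go e 0
  where
    go :: Exp String -> Int -> String
    go (Var x)   _ = x
    go (Lit n)   _ = show n
    go (Add l r) i = "(" ++ go l i ++ " + " ++ go r i ++ ")"
    go (Lam f)   i = let x = "x" ++ show i
                     in "(\\" ++ x ++ " -> " ++ go (f x) (i + 1) ++ ")"
    go (App f x) i = "(" ++ go f i ++ " " ++ go x i ++ ")"
```

Each consumer picks its own variable type, while the producer of a Closed term cannot depend on that choice.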

The core technical contribution is the generalisation from Functor to Difunctor. A difunctor is a type constructor with two type arguments that is contravariant in the first and covariant in the second; its dimap operation maps over both at once, transforming the variable type on the input side and the recursive sub‑term type on the output side. By redefining each signature as a Difunctor Sig a b, where a denotes the type of bound variables and b the type of the recursive children, the authors obtain a term type:

data Term f a = In (f a (Term f a))

This definition mirrors the original CDT encoding but now cleanly separates the binder from the recursion. The paper shows how to lift the standard catamorphism (cata) and paramorphism (para) to their difunctorial counterparts (cataD, paraD), preserving the modular recursion schemes that make CDTs attractive. Moreover, by providing Applicative and Monad instances for difunctorial signatures, effectful analyses (state, errors, logging) can be woven into transformations without breaking modularity.
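A minimal sketch of how a difunctor class, a term type and a catamorphism could fit together is given below. It is written under assumptions: the Var constructor for variable occurrences is added by this sketch and does not appear in the one‑line definition above, the type names Trm and Term and the signature Sig are hypothetical, and the fold is called cata here rather than cataD.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A difunctor is contravariant in its first argument, covariant in its second.
class Difunctor f where
  dimap :: (a' -> a) -> (b -> b') -> f a b -> f a' b'

-- Terms over a signature f with variables of type a.
-- The Var constructor is an assumption of this sketch.
data Trm f a = In (f a (Trm f a)) | Var a

-- Closed terms are polymorphic in the variable type.
newtype Term f = Term { unTerm :: forall a. Trm f a }

-- Hypothetical signature: lambda calculus with literals and addition.
data Sig a b
  = Lam (a -> b)
  | App b b
  | Lit Int
  | Add b b

instance Difunctor Sig where
  dimap f g (Lam h)   = Lam (g . h . f)
  dimap _ g (App x y) = App (g x) (g y)
  dimap _ _ (Lit n)   = Lit n
  dimap _ g (Add x y) = Add (g x) (g y)

-- Catamorphism: the variable type is instantiated to the result type r,
-- so a variable occurrence simply yields the value bound to it.
cata :: Difunctor f => (f r r -> r) -> Term f -> r
cata alg (Term t) = go t
  where
    go (In s)  = alg (dimap id go s)
    go (Var x) = x

-- Semantic values for a small evaluator.
data Val = IntV Int | FunV (Val -> Val)

evalAlg :: Sig Val Val -> Val
evalAlg (Lam f)                 = FunV f
evalAlg (App (FunV f) x)        = f x
evalAlg (App _ _)               = error "applied a non-function"
evalAlg (Lit n)                 = IntV n
evalAlg (Add (IntV m) (IntV n)) = IntV (m + n)
evalAlg (Add _ _)               = error "added non-integers"

-- Represents (\x -> x + 1) 41.
example :: Term Sig
example = Term (In (App (In (Lam (\x -> In (Add (Var x) (In (Lit 1))))))
                        (In (Lit 41))))

result :: Int
result = case cata evalAlg example of
  IntV n -> n
  FunV _ -> error "expected an integer"
```

In this sketch the algebra for Lam receives a ready‑made function from values to values, so the evaluator never manipulates names or environments.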

A crucial advantage of this design is that exotic terms are ruled out at compile time. Closed terms are universally quantified over the variable type (forall a. Term f a), so by parametricity the function inside a binder cannot inspect the variable it receives and can only place it in the term it builds. Unlike earlier approaches that introduced abstract wrapper types or relied on GADTs to hide the variable, the difunctor‑PHOAS integration achieves this safety without abstract types, so transformations can still pattern‑match on the terms they compute.

The authors demonstrate the practicality of their approach with a constant‑folding optimisation. The optimisation traverses the AST, evaluates pure arithmetic sub‑expressions, and replaces them with literal nodes. Because the traversal is expressed as a difunctorial catamorphism, the optimiser can inspect recursively computed sub‑terms while the type system guarantees that all binders are well‑scoped. In contrast, a naïve HOAS implementation would either need runtime checks for exotic terms or would be forced to abandon the compositional style, losing the benefits of modularity.
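The following sketch shows one way such a constant folder could look. It is an illustration under assumptions rather than the paper's code: the Var constructor, the signature Sig, the fold cataT (which takes a separate function for variables) and the printer render are all hypothetical names introduced here.

```haskell
{-# LANGUAGE RankNTypes #-}

class Difunctor f where
  dimap :: (a' -> a) -> (b -> b') -> f a b -> f a' b'

-- Term type with an assumed Var constructor for variable occurrences.
data Trm f a = In (f a (Trm f a)) | Var a
newtype Term f = Term { unTerm :: forall a. Trm f a }

data Sig a b
  = Lam (a -> b)
  | App b b
  | Lit Int
  | Add b b

instance Difunctor Sig where
  dimap f g (Lam h)   = Lam (g . h . f)
  dimap _ g (App x y) = App (g x) (g y)
  dimap _ _ (Lit n)   = Lit n
  dimap _ g (Add x y) = Add (g x) (g y)

-- Fold that keeps the variable type a and maps variables with `var`.
cataT :: Difunctor f => (f a r -> r) -> (a -> r) -> Trm f a -> r
cataT alg var (In s)  = alg (dimap id (cataT alg var) s)
cataT _   var (Var x) = var x

-- The algebra inspects the already folded children: if both operands of an
-- addition are literals, the node is replaced by their sum.
foldAlg :: Sig a (Trm Sig a) -> Trm Sig a
foldAlg (Add (In (Lit m)) (In (Lit n))) = In (Lit (m + n))
foldAlg s                               = In s

constFold :: Term Sig -> Term Sig
constFold (Term t) = Term (cataT foldAlg Var t)

-- Printer used only to observe the result; binders get generated names.
render :: Term Sig -> String
render (Term t) = go 0 t
  where
    go :: Int -> Trm Sig String -> String
    go _ (Var x) = x
    go i (In s)  = case s of
      Lit n   -> show n
      Add l r -> "(" ++ go i l ++ " + " ++ go i r ++ ")"
      App f x -> "(" ++ go i f ++ " " ++ go i x ++ ")"
      Lam h   -> let x = "x" ++ show i
                 in "(\\" ++ x ++ " -> " ++ go (i + 1) (h x) ++ ")"

-- Represents \x -> x + (1 + 2).
example :: Term Sig
example = Term (In (Lam (\x ->
  In (Add (Var x) (In (Add (In (Lit 1)) (In (Lit 2))))))))
```

Because Trm is an ordinary algebraic data type rather than an abstract one, foldAlg can pattern‑match on the folded operands directly, and folding under a binder happens by post‑composing the fold with the binder's function.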

The paper also situates its contribution relative to related work. Prior PHOAS‑based DSLs often combine GADTs with abstract types to avoid exotic terms, which complicates the code base and hampers extensibility. Other approaches, such as nominal techniques or de Bruijn indices, either introduce boilerplate or make α‑conversion explicit. By contrast, the difunctor‑based method retains the original CDT’s plug‑and‑play extensibility: new language constructs are added simply by defining additional difunctor signatures and composing them with :+:. Existing analyses and transformations remain unchanged, illustrating true modularity.
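The extensibility claim can be sketched as follows. The signatures Lam, Arith and Cond, the class Eval, the smart constructors and the assumed Var constructor are hypothetical and chosen for illustration; the actual library offers more machinery (for example automatic injections) that this sketch replaces with explicit Inl and Inr.

```haskell
{-# LANGUAGE RankNTypes, TypeOperators #-}

class Difunctor f where
  dimap :: (a' -> a) -> (b -> b') -> f a b -> f a' b'

data Trm f a = In (f a (Trm f a)) | Var a
newtype Term f = Term { unTerm :: forall a. Trm f a }

-- Coproduct of two difunctor signatures.
infixr 6 :+:
data (f :+: g) a b = Inl (f a b) | Inr (g a b)

instance (Difunctor f, Difunctor g) => Difunctor (f :+: g) where
  dimap f g (Inl x) = Inl (dimap f g x)
  dimap f g (Inr y) = Inr (dimap f g y)

-- Two independently defined signatures.
data Lam   a b = Lam (a -> b) | App b b
data Arith a b = Lit Int | Add b b

instance Difunctor Lam where
  dimap f g (Lam h)   = Lam (g . h . f)
  dimap _ g (App x y) = App (g x) (g y)

instance Difunctor Arith where
  dimap _ _ (Lit n)   = Lit n
  dimap _ g (Add x y) = Add (g x) (g y)

-- An evaluation algebra defined per signature through a type class.
data Val = IntV Int | FunV (Val -> Val)

class Eval f where
  evalAlg :: f Val Val -> Val

instance (Eval f, Eval g) => Eval (f :+: g) where
  evalAlg (Inl x) = evalAlg x
  evalAlg (Inr y) = evalAlg y

instance Eval Lam where
  evalAlg (Lam f)          = FunV f
  evalAlg (App (FunV f) x) = f x
  evalAlg (App _ _)        = error "applied a non-function"

instance Eval Arith where
  evalAlg (Lit n)                 = IntV n
  evalAlg (Add (IntV m) (IntV n)) = IntV (m + n)
  evalAlg (Add _ _)               = error "added non-integers"

eval :: (Difunctor f, Eval f) => Term f -> Val
eval (Term t) = go t
  where
    go (In s)  = evalAlg (dimap id go s)
    go (Var x) = x

-- Extension: a new construct with its own instances; nothing above changes.
data Cond a b = IfZero b b b

instance Difunctor Cond where
  dimap _ g (IfZero c t e) = IfZero (g c) (g t) (g e)

instance Eval Cond where
  evalAlg (IfZero (IntV 0) t _) = t
  evalAlg (IfZero (IntV _) _ e) = e
  evalAlg (IfZero _ _ _)        = error "condition is not an integer"

type Sig = Lam :+: Arith :+: Cond

-- Smart constructors that insert the injections by hand.
lam :: (a -> Trm Sig a) -> Trm Sig a
lam h = In (Inl (Lam h))

app, add :: Trm Sig a -> Trm Sig a -> Trm Sig a
app f x = In (Inl (App f x))
add x y = In (Inr (Inl (Add x y)))

lit :: Int -> Trm Sig a
lit n = In (Inr (Inl (Lit n)))

ifZero :: Trm Sig a -> Trm Sig a -> Trm Sig a -> Trm Sig a
ifZero c t e = In (Inr (Inr (IfZero c t e)))

-- Represents (\x -> if x == 0 then 10 else x + 1) 0.
example :: Term Sig
example = Term (app (lam (\x -> ifZero (Var x) (lit 10) (add (Var x) (lit 1))))
                    (lit 0))

result :: Int
result = case eval example of
  IntV n -> n
  FunV _ -> error "expected an integer"
```

Adding Cond required only a data declaration and two instances; eval and the instances for Lam and Arith were left untouched.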

Finally, the authors outline future directions: porting the difunctor‑PHOAS framework to other languages with strong type systems (e.g., Scala, OCaml), integrating richer effect systems for more sophisticated optimisations (inlining, dead‑code elimination), and coupling the approach with proof assistants to formally verify transformation correctness.

In summary, the paper presents a clean, type‑safe, and modular solution for handling binders within compositional data types by elevating the underlying functorial abstraction to difunctors. This unifies the strengths of CDTs (modular syntax and recursion schemes) with the safety guarantees of PHOAS, eliminating exotic terms without sacrificing extensibility. The result is a powerful foundation for building robust language implementations, domain‑specific languages, and program‑transformation pipelines that require both flexibility and rigorous type‑level correctness.