Functorial Data Migration

In this paper we present a simple database definition language: that of categories and functors. A database schema is a small category and an instance is a set-valued functor on it. We show that morphisms of schemas induce three “data migration functors”, which translate instances from one schema to the other in canonical ways. These functors parameterize projections, unions, and joins over all tables simultaneously and can be used in place of conjunctive and disjunctive queries. We also show how to connect a database and a functional programming language by introducing a functorial connection between the schema and the category of types for that language. We begin the paper with a multitude of examples to motivate the definitions, and near the end we provide a dictionary whereby one can translate database concepts into category-theoretic concepts and vice-versa.


💡 Research Summary

The paper proposes a categorical foundation for databases by treating a schema as a small category and an instance as a set‑valued functor on that category. Objects correspond to tables and morphisms to columns, in particular foreign keys; composing morphisms corresponds to following a chain of foreign keys from table to table, and equations between composite paths express constraints. An instance assigns a set of rows to each object and a function between row sets to each morphism, thereby encoding the entire database state as a functor. Natural transformations between such functors model updates or migrations.
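As a minimal sketch of this encoding (not the paper's notation), a schema can be written down as objects plus named arrows, and an instance as one row set per object and one dictionary per arrow. The table names `Employee` and `Department`, the arrow `worksIn`, and the helpers `is_instance` and `follow` are all hypothetical, and path equations are left out for brevity.

```python
# Hypothetical toy schema: two tables and one foreign-key column.
schema = {
    "objects": ["Employee", "Department"],
    "arrows": {"worksIn": ("Employee", "Department")},  # name -> (source, target)
}

# An instance: a set of row ids per object, and a function (dict) per arrow.
instance = {
    "rows": {
        "Employee": {"e1", "e2", "e3"},
        "Department": {"d1", "d2"},
    },
    "maps": {
        "worksIn": {"e1": "d1", "e2": "d1", "e3": "d2"},
    },
}


def is_instance(schema, inst):
    """Check that every arrow is interpreted as a total function between row sets."""
    if set(inst["rows"]) != set(schema["objects"]):
        return False
    if set(inst["maps"]) != set(schema["arrows"]):
        return False
    for name, (src, tgt) in schema["arrows"].items():
        f = inst["maps"][name]
        if set(f) != inst["rows"][src]:         # defined on every source row
            return False
        if not set(f.values()) <= inst["rows"][tgt]:  # lands in the target table
            return False
    return True


def follow(inst, path, row):
    """Apply a path of arrows to a row; this is composition in the schema."""
    for name in path:
        row = inst["maps"][name][row]
    return row
```

Here `is_instance(schema, instance)` holds because `worksIn` is total and lands in `Department`; dropping an employee from the `worksIn` dictionary would make the check fail, which is the dictionary analogue of a dangling foreign key.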

The central technical contribution is the observation that any functor F : S → T between schemas induces three canonical “data migration functors”: the pullback Δ_F, which sends T‑instances to S‑instances by precomposition with F; its left adjoint Σ_F (the left pushforward, a left Kan extension); and its right adjoint Π_F (the right pushforward, a right Kan extension). Both Σ_F and Π_F send S‑instances to T‑instances. Δ_F re‑indexes a T‑instance along F, which gives projection‑like operations. Σ_F combines S‑data by colimits, implementing union‑like operations. Π_F combines S‑data by limits, realizing joins and product‑like constructions. Each functor acts on all tables simultaneously. Together they can stand in for conjunctive and disjunctive queries, roughly the projection, union, and join fragment of SQL, but they act at the level of whole schemas rather than individual queries, so complex queries can be expressed as compositions of functors and query equivalence can be reasoned about systematically.
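The following sketch illustrates the three functors on dictionary‑encoded instances. It is an illustration under simplifying assumptions, not the paper's construction: `delta` handles a general F by sending each S‑arrow to a path of T‑arrows, while `sigma_discrete` and `pi_discrete` cover only schemas with no arrows, where the Kan extensions reduce to a disjoint union and a Cartesian product over the objects mapped to each target. All names and data are hypothetical.

```python
from itertools import product


def delta(S, F_obj, F_arr, T_inst):
    """Pullback: build an S-instance from a T-instance by precomposing with F.

    F_obj maps S-objects to T-objects; F_arr maps S-arrows to paths of T-arrows.
    """
    rows = {s: set(T_inst["rows"][F_obj[s]]) for s in S["objects"]}
    maps = {}
    for a, (src, _tgt) in S["arrows"].items():
        maps[a] = {}
        for r in rows[src]:
            x = r
            for t_arrow in F_arr[a]:  # follow the image path in T
                x = T_inst["maps"][t_arrow][x]
            maps[a][r] = x
    return {"rows": rows, "maps": maps}


def sigma_discrete(F_obj, T_objects, S_rows):
    """Left pushforward, arrow-free case: disjoint union over each fiber of F."""
    out = {t: set() for t in T_objects}
    for s, t in F_obj.items():
        out[t] |= {(s, r) for r in S_rows[s]}  # tag rows by their source table
    return out


def pi_discrete(F_obj, T_objects, S_rows):
    """Right pushforward, arrow-free case: product over each fiber of F."""
    out = {}
    for t in T_objects:
        fiber = sorted(s for s in F_obj if F_obj[s] == t)
        out[t] = {
            tuple(zip(fiber, combo))
            for combo in product(*(S_rows[s] for s in fiber))
        }
    return out


# Delta example: T has Employee -> Department -> Employee; S has one table E
# with a single arrow "boss", which F sends to the path worksIn, then manager.
T_inst = {
    "rows": {"Employee": {"e1", "e2", "e3"}, "Department": {"d1", "d2"}},
    "maps": {
        "worksIn": {"e1": "d1", "e2": "d1", "e3": "d2"},
        "manager": {"d1": "e1", "d2": "e3"},
    },
}
S = {"objects": ["E"], "arrows": {"boss": ("E", "E")}}
S_inst = delta(S, {"E": "Employee"}, {"boss": ["worksIn", "manager"]}, T_inst)

# Sigma / Pi example: two arrow-free tables A and B both mapped to one table C.
fold = {"A": "C", "B": "C"}
ab_rows = {"A": {"a1", "a2"}, "B": {"b1"}}
union_like = sigma_discrete(fold, ["C"], ab_rows)
join_like = pi_discrete(fold, ["C"], ab_rows)
```

In this toy data, `S_inst` carries a single derived column `boss` on the employee rows, `union_like["C"]` holds the three tagged rows of A and B, and `join_like["C"]` holds the two pairs of one A‑row with one B‑row. When schemas have arrows, Σ_F and Π_F require general colimits and limits, which this sketch does not attempt.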

Beyond pure database theory, the authors connect the categorical model to functional programming languages. By interpreting a programming language’s type system as a category of types, a functor from the type category to the schema category yields a type‑safe mapping between program values and database rows. This mapping can be checked at compile time, guaranteeing that code respects the database schema and that schema evolution can be reflected automatically in the program.

The paper also supplies a “dictionary” translating between database terminology and categorical concepts: table ↔ object, column ↔ morphism, primary key ↔ isomorphism, foreign key ↔ domain‑codomain pair of a morphism, query ↔ functor expression, trigger ↔ natural transformation, schema evolution ↔ functor between categories, etc. This dictionary serves as a practical bridge for both database practitioners and category theorists.

Overall, the work demonstrates that categorical constructions provide a unified, mathematically rigorous framework for data migration, query formulation, and integration with typed programming languages. By abstracting projection, union, and join into adjoint functors, the authors offer a powerful alternative to traditional query languages, with clear benefits for schema evolution, interoperability, and formal verification of data‑centric software systems.

