Inducing Probabilistic Programs by Bayesian Program Merging

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

This report outlines an approach to learning generative models from data. We express models as probabilistic programs, which allows us to capture abstract patterns within the examples. By choosing our language for programs to be an extension of the algebraic data type of the examples, we can begin with a program that generates all and only the examples. We then introduce greater abstraction, and hence generalization, incrementally to the extent that it improves the posterior probability of the examples given the program. Motivated by previous approaches to model merging and program induction, we search for such explanatory abstractions using program transformations. We consider two types of transformation: Abstraction merges common subexpressions within a program into new functions (a form of anti-unification). Deargumentation simplifies functions by reducing the number of arguments. We demonstrate that this approach finds key patterns in the domain of nested lists, including parameterized sub-functions and stochastic recursion.


💡 Research Summary

The paper presents a novel approach to learning generative models from a small set of examples by representing models as probabilistic programs. The authors build on the concept of Bayesian model merging, extending it to the richer class of probabilistic programs—a technique they call Bayesian program merging. The workflow begins with a trivial program that exactly reproduces every training example: each example is translated into a sequence of constructor calls of an algebraic data type, and all such programs are combined under a uniform‑choice distribution. This initial model has perfect likelihood but zero generalization because it never generates unseen data.
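The data-incorporation step described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names (`to_expr`, `incorporate`) and the nested-list encoding are hypothetical stand-ins for translating examples into constructor calls and wrapping them in a uniform choice.

```python
import random

def to_expr(example):
    """Translate a nested-list example into a constructor-call expression."""
    if isinstance(example, list):
        return ("list", [to_expr(e) for e in example])
    return ("const", example)

def eval_expr(expr):
    """Evaluate a constructor-call expression back into data."""
    tag, body = expr
    if tag == "list":
        return [eval_expr(e) for e in body]
    return body

def incorporate(examples):
    """Initial model: a uniform choice among the incorporated examples.
    It reproduces the training data exactly and generates nothing else."""
    exprs = [to_expr(e) for e in examples]
    return lambda: eval_expr(random.choice(exprs))

model = incorporate([[1, [2, 3]], [4]])
sample = model()  # always one of the two training examples
```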

To improve the model, the authors define a search objective based on the Bayesian posterior probability P(M | D), which decomposes into a prior favoring shorter programs and a likelihood term measuring how well the program explains the data. The likelihood is estimated via selective model averaging, allowing the inference of parameters for stochastic primitives (e.g., Gaussian distributions for colors). The search space is explored using two families of program transformations—Abstraction and Deargumentation—applied iteratively within a beam‑search framework.
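The objective and search loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: a description-length prior over programs represented as nested tuples, a likelihood supplied externally, and a generic beam search; the helper names (`program_length`, `log_prior`) are hypothetical.

```python
def program_length(program):
    """Count expression nodes in a nested-tuple program representation."""
    if isinstance(program, tuple):
        return 1 + sum(program_length(p) for p in program)
    return 1

def log_prior(program):
    # Shorter programs get higher prior probability: log P(M) ∝ -length(M).
    return -program_length(program)

def log_posterior(program, log_likelihood):
    # log P(M | D) = log P(M) + log P(D | M) + const.
    return log_prior(program) + log_likelihood

def beam_search(initial, transforms, score, width=5, steps=10):
    """Iteratively apply program transformations, keeping the
    `width` highest-scoring candidates at each step."""
    beam = [initial]
    for _ in range(steps):
        candidates = set(beam)
        for prog in beam:
            for t in transforms:
                candidates.update(t(prog))
        beam = sorted(candidates, key=score, reverse=True)[:width]
    return beam[0]
```

The trade-off the summary describes falls out of `log_posterior`: a transformation is accepted when the prior gain from a shorter program outweighs any likelihood it gives up.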

Abstraction identifies repeated sub‑expressions across the program and extracts them into new functions. This is achieved through anti‑unification, which finds the most general pattern that subsumes multiple concrete instances. By turning common sub‑trees into reusable functions, the program length shrinks, the prior probability rises, and the model gains compositional structure. Deargumentation reduces the number of arguments a function takes, either by merging similar arguments or by replacing explicit arguments with stochastic sampling inside the function body (e.g., replacing a list of concrete colors with a Gaussian‑sampled color). This further compresses the program and often improves the posterior.

The authors implement these ideas in a subset of the Church probabilistic programming language, supporting lambda abstraction, conditionals, stochastic primitives (uniform‑choice, flip, Gaussian), and user‑defined data constructors. Their experimental domain consists of “colored trees” represented as nested lists: each node carries a color and a size, and nodes may have a variable number of child branches ending in flowers. Starting from the data‑incorporation program, the system automatically discovers:

  1. A color function that samples a color from a Gaussian distribution, thus abstracting away individual color constants.
  2. A branch function that decides, via a flip, how many sub‑branches to create and recurses to generate deeper structure.
  3. A flower function that always creates three petal nodes, capturing the regularity that flowers have three petals.
  4. Additional helper functions for petal shading and branch information.

These functions are composed recursively to generate full trees, reproducing the observed distribution while also being capable of generating novel trees that respect the learned regularities. Quantitatively, the posterior probability of the final program exceeds that of the initial uniform‑choice program despite a slight drop in likelihood, because the dramatic reduction in program length yields a much higher prior.
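The kind of program the system discovers for this domain can be rendered, loosely, in Python. The function names (`color`, `branch`, `flower`) follow the summary above, but the numeric parameters here are illustrative placeholders, not the learned values, and the original is expressed in Church rather than Python.

```python
import random

def color(mean=0.5, sd=0.1):
    # Abstracted color: sampled from a Gaussian rather than
    # listed as per-example constants (the Deargumentation step).
    return random.gauss(mean, sd)

def flower():
    # Regularity captured by the learned program: flowers have three petals.
    return ("flower", [("petal", color()) for _ in range(3)])

def branch(depth=0, p_split=0.4, max_depth=4):
    # Stochastic recursion: a flip decides whether this branch
    # splits into sub-branches or terminates in a flower.
    if depth < max_depth and random.random() < p_split:
        return ("branch", color(), [branch(depth + 1) for _ in range(2)])
    return ("branch", color(), [flower()])

tree = branch()  # a novel tree respecting the learned regularities
```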

The paper discusses limitations: the current system assumes data can be expressed as algebraic data types (lists, trees) and does not yet handle unstructured feature vectors; the transformation heuristics are hand‑crafted, which may not scale to more complex domains; and the beam width and scoring heuristics influence performance but are not systematically optimized. Future work is suggested in extending the type system, automating the discovery of transformation costs, and integrating meta‑learning to adapt transformation strategies across tasks.

In conclusion, Bayesian program merging demonstrates that a principled Bayesian objective combined with simple program‑level transformations can automatically induce meaningful abstractions, recursive functions, and stochastic parameterizations from very limited data. This bridges the gap between symbolic program induction and probabilistic modeling, offering a promising direction for learning highly expressive generative models without requiring large datasets.

