Even with AI, Bijection Discovery is Still Hard: The Opportunities and Challenges of OpenEvolve for Novel Bijection Construction
Evolutionary program synthesis systems such as AlphaEvolve, OpenEvolve, and ShinkaEvolve offer a new approach to AI-assisted mathematical discovery. These systems use teams of large language models (LLMs) to generate candidate solutions to a problem as human-readable code. These candidate solutions are then ‘evolved’ with the goal of improving them beyond what an LLM can produce in a single shot. While existing mathematical applications have mostly focused on problems of establishing bounds (e.g., sphere packing), the program synthesis approach is well suited to any problem where the solution takes the form of an explicit construction. With this in mind, in this paper we explore the use of OpenEvolve for combinatorial bijection discovery. We describe the results of applying OpenEvolve to three bijection construction problems involving Dyck paths, two of which are known and one of which is open. We find that while systems like OpenEvolve show promise as a valuable tool for combinatorialists, the problem of finding novel, research-level bijections remains a challenging task for current frontier systems, reinforcing the need for human mathematicians in the loop. We also describe lessons learned for others in the field who are interested in exploring the use of these systems.
💡 Research Summary
This paper investigates the capabilities and limitations of OpenEvolve, a state‑of‑the‑art evolutionary program synthesis platform, for the discovery of combinatorial bijections. OpenEvolve builds on the idea of assembling a team of large language models (LLMs) that first generate human‑readable code candidates for a given problem and then iteratively improve those candidates through evolutionary operators such as mutation, crossover, and selection. While prior work with similar systems (AlphaEvolve, ShinkaEvolve, etc.) has focused mainly on problems that can be expressed as bounds or optimization targets (e.g., sphere‑packing density), the authors argue that the program‑synthesis paradigm is especially well‑suited to problems whose solutions are explicit constructions. A bijection—an explicit one‑to‑one correspondence between two combinatorial families—fits this mold perfectly.
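When candidates are Python programs, one simple way to implement a crossover operator is to splice a top-level function definition from one parent into the other. The sketch below illustrates that idea under this assumption; it is not OpenEvolve's actual implementation, and the function name `crossover` and its parameters are hypothetical. It relies only on the standard-library `ast` module (`ast.unparse` requires Python 3.9 or later).

```python
import ast


def crossover(parent_a: str, parent_b: str, name: str) -> str:
    """Child = parent_a with its top-level function `name` replaced by parent_b's version.

    If parent_b lacks `name`, the child is parent_a re-emitted unchanged by ast.unparse;
    if only parent_a lacks it, the donor function is appended to parent_a.
    """
    tree_a = ast.parse(parent_a)
    tree_b = ast.parse(parent_b)
    donor = next(
        (node for node in tree_b.body
         if isinstance(node, ast.FunctionDef) and node.name == name),
        None,
    )
    if donor is None:
        return ast.unparse(tree_a)
    new_body, replaced = [], False
    for node in tree_a.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            new_body.append(donor)  # swap in the whole function definition
            replaced = True
        else:
            new_body.append(node)
    if not replaced:
        new_body.append(donor)
    tree_a.body = new_body
    return ast.unparse(tree_a)  # ast.unparse needs Python 3.9+


# Example: splice parent_b's `step` into parent_a, keeping parent_a's `helper`.
parent_a = "def step(x):\n    return x + 1\n\ndef helper():\n    return 0\n"
parent_b = "def step(x):\n    return 2 * x\n"
print(crossover(parent_a, parent_b, "step"))
```

Splicing whole `FunctionDef` nodes means the child always parses, so crossover itself never introduces syntax errors; whether the spliced function still makes sense in its new context has to be checked separately by the test suite.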
The authors select three bijection problems involving Dyck paths. Two are classic, well‑studied bijections: (1) Dyck paths ↔ binary trees and (2) Dyck paths ↔ balanced parenthesis strings. The third problem is open: it asks for a bijection between Dyck paths and a certain constrained lattice‑path family, for which no published construction exists. By choosing both known and open cases, the study can measure OpenEvolve’s ability to reproduce existing mathematics and its potential to generate genuinely new research‑level constructions.
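For concreteness, here is one textbook form of the two known bijections as a minimal Python sketch. As assumptions of this sketch, a Dyck path is a string over `U` (up) and `D` (down), and a binary tree is either `None` (empty) or a pair `(left, right)`. The tree map uses the standard first-return decomposition: a nonempty path factors uniquely as `U A D B` with `A`, `B` Dyck paths, and maps to the node whose left subtree comes from `A` and right subtree from `B`. This is not necessarily the construction OpenEvolve produced, and the function names are hypothetical.

```python
from typing import Optional, Tuple

# A binary tree is None (empty) or a pair (left, right).
Tree = Optional[Tuple["Tree", "Tree"]]


def dyck_to_parens(path: str) -> str:
    """Dyck path over {U, D} -> balanced parenthesis string (U -> '(', D -> ')')."""
    return path.replace("U", "(").replace("D", ")")


def dyck_to_tree(path: str) -> Tree:
    """First-return decomposition U A D B -> (tree(A), tree(B)); assumes a valid Dyck path."""
    if not path:
        return None
    height = 0
    for i, step in enumerate(path):
        height += 1 if step == "U" else -1
        if height == 0:  # first return to the axis happens at index i
            return (dyck_to_tree(path[1:i]), dyck_to_tree(path[i + 1:]))
    raise ValueError("path never returns to the axis")


def tree_to_dyck(tree: Tree) -> str:
    """Inverse map: node (L, R) -> 'U' + path(L) + 'D' + path(R)."""
    if tree is None:
        return ""
    left, right = tree
    return "U" + tree_to_dyck(left) + "D" + tree_to_dyck(right)


print(dyck_to_parens("UUDDUD"))                # (())()
print(tree_to_dyck(dyck_to_tree("UUDDUD")))    # UUDDUD
```

The parenthesis map is a letter-by-letter relabeling, while the tree map is recursive; `tree_to_dyck` undoes `dyck_to_tree` because the first-return factorization is unique.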
Experimental setup: each problem is encoded as a natural‑language prompt describing the combinatorial objects, the required one‑to‑one mapping, and any auxiliary constraints (e.g., preservation of length, height, or endpoint). Four distinct LLMs are assembled into an OpenEvolve team. The system first asks the LLMs to produce initial code solutions; roughly 10% of these pass a basic syntactic check and a simple functional test. Those that survive become the initial population for the evolutionary loop. Mutation is implemented by prompting an LLM to edit a random fragment of a candidate (e.g., replace a loop with a recursion, add a guard condition). Crossover swaps entire function definitions between two candidates. Selection is driven by three criteria: (a) syntactic correctness, (b) compliance with the bijection specification as verified by an automated test suite, and (c) performance metrics such as time and memory usage.
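Criterion (b) amounts to checking bijectivity exhaustively on all small inputs. Below is a minimal sketch of such a check, assuming candidates are Python functions from Dyck-path strings to strings and that the size of the target family is known for each semilength n; `check_bijection`, `dyck_paths`, and the other names are hypothetical and not part of OpenEvolve. Since both families are finite for fixed n, a map that lands in the target family, is injective, and whose image has exactly the target's size is a bijection.

```python
from itertools import product
from math import comb
from typing import Callable, Dict, Iterator


def dyck_paths(n: int) -> Iterator[str]:
    """All Dyck paths of semilength n over {U, D} (brute force, so small n only)."""
    for steps in product("UD", repeat=2 * n):
        height = 0
        for s in steps:
            height += 1 if s == "U" else -1
            if height < 0:
                break  # dips below the axis: not a Dyck path
        else:
            if height == 0:
                yield "".join(steps)


def catalan(n: int) -> int:
    return comb(2 * n, n) // (n + 1)


def check_bijection(candidate: Callable[[str], str],
                    in_target: Callable[[str], bool],
                    target_count: Callable[[int], int],
                    max_n: int = 7) -> Dict[str, object]:
    """Test `candidate` on every Dyck path of semilength 0..max_n; stop at the first failure."""
    for n in range(max_n + 1):
        seen = set()
        for path in dyck_paths(n):
            try:
                image = candidate(path)
            except Exception as exc:  # a crash counts as a failure
                return {"passed": False, "n": n, "reason": f"error on {path}: {exc!r}"}
            if not in_target(image):
                return {"passed": False, "n": n, "reason": f"{path} -> {image} not in target"}
            if image in seen:
                return {"passed": False, "n": n, "reason": f"collision at {image}"}
            seen.add(image)
        if len(seen) != target_count(n):  # injective into target + equal sizes => bijective
            return {"passed": False, "n": n, "reason": "image does not cover the target"}
    return {"passed": True, "n": max_n, "reason": "ok"}


def is_balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch not in "()":
            return False
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def to_parens(p: str) -> str:
    return p.replace("U", "(").replace("D", ")")


def sorted_parens(p: str) -> str:  # deliberately wrong candidate
    return "(" * (len(p) // 2) + ")" * (len(p) // 2)


print(check_bijection(to_parens, is_balanced, catalan))      # passes
print(check_bijection(sorted_parens, is_balanced, catalan))  # fails: two paths collide at n = 2
```

An exhaustive check up to a fixed size is evidence, not proof, of bijectivity for all n; it only rules out counterexamples of size at most `max_n`.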
Results for the two known bijections are encouraging. OpenEvolve successfully reproduces the textbook constructions, often generating code that mirrors the standard recursive transformations. In the Dyck‑to‑binary‑tree case, the evolutionary phase discovers a tail‑recursive version that reduces stack depth and improves runtime by about 15% compared with the initial candidate. For the Dyck‑to‑parenthesis mapping, similar modest optimizations are observed, and the system consistently produces correct, readable code across multiple runs.
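The summary reports a tail-recursive variant. Python does not eliminate tail calls, so as an analogous illustration (a substitution, not the code OpenEvolve generated) here is an explicit-stack version of the first-return map that uses no recursion at all. It assumes Dyck paths are `U`/`D` strings and binary trees are nested `(left, right)` pairs with `None` for the empty tree; the names `dyck_to_tree_iter` and `_right_chain` are hypothetical. It relies on the fact that a Dyck path factors uniquely into primes `U A_1 D U A_2 D ... U A_k D`, which the first-return map sends to the right-leaning chain `(T(A_1), (T(A_2), ... (T(A_k), None)))`.

```python
from typing import List, Optional, Tuple

Tree = Optional[Tuple["Tree", "Tree"]]  # None = empty tree, (left, right) = node


def _right_chain(lefts: List[Tree]) -> Tree:
    """Build (lefts[0], (lefts[1], ... (lefts[-1], None))) with a loop, not recursion."""
    tree: Tree = None
    for left in reversed(lefts):
        tree = (left, tree)
    return tree


def dyck_to_tree_iter(path: str) -> Tree:
    """First-return bijection from Dyck paths to binary trees, using an explicit stack."""
    stack: List[List[Tree]] = [[]]  # stack[-1]: left subtrees of the primes at the current depth
    for step in path:
        if step == "U":
            stack.append([])
        elif step == "D":
            if len(stack) == 1:
                raise ValueError("path goes below the axis")
            inner = stack.pop()  # subtrees for the primes strictly inside this U ... D
            stack[-1].append(_right_chain(inner))
        else:
            raise ValueError(f"unexpected step {step!r}")
    if len(stack) != 1:
        raise ValueError("path does not return to the axis")
    return _right_chain(stack[0])


print(dyck_to_tree_iter("UUDDUD"))  # ((None, None), (None, None))
deep = dyck_to_tree_iter("U" * 10000 + "D" * 10000)  # no RecursionError
```

Because the traversal keeps its own stack on the heap, a path such as `"U" * 10000 + "D" * 10000` converts without reaching the interpreter's recursion limit, which a naive recursive implementation would exceed.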
The open bijection, however, proves far more challenging. Initial candidates satisfy only a subset of the required constraints (e.g., they map Dyck paths to lattice paths with the correct length but violate the height restriction at certain steps). Over many generations, mutation and crossover fail to converge on a fully valid bijection. The authors identify three primary bottlenecks: (1) the problem’s logical complexity exceeds the LLMs’ ability to internalize all constraints from a single prompt; (2) current mutation operators explore only surface‑level syntactic changes, lacking deeper semantic restructuring; and (3) the selection mechanism, limited to the existing test suite, discards partially promising solutions that might become viable after more substantial modifications.
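Bottleneck (3) suggests replacing pass/fail selection with a graded score, so that a candidate satisfying most constraints ranks above one satisfying none. The sketch below is an illustration under stated assumptions, not the authors' method: it reports, for each constraint, the fraction of inputs on which it holds, plus the fraction of distinct images as a proxy for injectivity. The open problem's actual target family is not specified here, so the constraints in the example (length preservation and a height bound `K`) are hypothetical placeholders, as are the names `graded_score` and `max_height`.

```python
from typing import Callable, Dict, List


def graded_score(candidate: Callable[[str], str],
                 inputs: List[str],
                 constraints: Dict[str, Callable[[str, str], bool]]) -> Dict[str, float]:
    """Per-constraint pass rates, an injectivity proxy, and their mean as 'overall'."""
    if not inputs:
        raise ValueError("need at least one input")
    images = [candidate(x) for x in inputs]
    scores = {
        name: sum(check(x, y) for x, y in zip(inputs, images)) / len(inputs)
        for name, check in constraints.items()
    }
    # Fraction of distinct images: 1.0 exactly when the map is injective on `inputs`.
    scores["injectivity"] = len(set(images)) / len(inputs)
    scores["overall"] = sum(scores.values()) / len(scores)
    return scores


# Hypothetical usage: placeholder constraints, not the open problem's real ones.
K = 2  # hypothetical height bound


def max_height(path: str) -> int:
    height = best = 0
    for step in path:
        height += 1 if step == "U" else -1
        best = max(best, height)
    return best


constraints = {
    "same_length": lambda x, y: len(x) == len(y),
    "height_le_K": lambda x, y: max_height(y) <= K,
}
inputs = ["UDUDUD", "UDUUDD", "UUDDUD", "UUDUDD", "UUUDDD"]  # all Dyck paths of semilength 3
print(graded_score(lambda p: p, inputs, constraints))
```

With these placeholders, the identity map scores 0.8 on the height constraint, since only `UUUDDD` exceeds height 2, so it is ranked as partially promising rather than being discarded outright.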
A central theme of the paper is the “human‑in‑the‑loop” paradigm. The researchers manually inspected intermediate generations, flagged logical errors, and supplied targeted feedback (e.g., “ensure the height never exceeds k”). This feedback was incorporated into subsequent prompts, dramatically increasing the quality of later candidates. Without such intervention, the evolutionary process tended to stagnate, repeatedly generating variants that either re‑introduced previously eliminated bugs or introduced new, unrelated defects.
The discussion outlines several avenues for future improvement. First, integrating a domain‑specific language (DSL) for bijection specifications could give mutation operators a richer, semantically aware search space. Second, coupling OpenEvolve with formal proof assistants (Coq, Lean) would allow the selection phase to enforce mathematically rigorous correctness rather than relying solely on empirical testing. Third, more sophisticated prompt engineering and fine‑tuning of the underlying LLMs could raise the baseline quality of initial candidates, reducing the evolutionary burden. Finally, the authors advocate for a broader benchmark suite of combinatorial constructions to systematically evaluate and compare evolutionary synthesis systems.
In conclusion, the study demonstrates that OpenEvolve is capable of reproducing known combinatorial bijections and can modestly improve their implementations, confirming the promise of evolutionary program synthesis for explicit‑construction problems. Nonetheless, discovering novel, research‑level bijections remains beyond the reach of current frontier systems. Human expertise, especially in guiding the evolutionary loop and providing precise logical feedback, remains indispensable. The paper thus positions OpenEvolve as a powerful assistive tool for combinatorialists while emphasizing the continued necessity of human creativity and insight in mathematical discovery.