ImprovEvolve: Ask AlphaEvolve to Improve the Input Solution and Then Improvise
Recent advances in LLM-guided evolutionary computation, particularly AlphaEvolve, have demonstrated remarkable success in discovering novel mathematical constructions and solving challenging optimization problems. In this article, we present ImprovEvolve, a simple yet effective technique for enhancing LLM-based evolutionary approaches such as AlphaEvolve. Given an optimization problem, the standard approach is to evolve program code that, when executed, produces a solution close to the optimum. We propose an alternative program parameterization that maintains the ability to construct optimal solutions while reducing the cognitive load on the LLM. Specifically, we evolve a program (implementing, e.g., a Python class with a prescribed interface) that provides the following functionality: (1) propose a valid initial solution, (2) improve any given solution in terms of fitness, and (3) perturb a solution with a specified intensity. The optimum can then be approached by iteratively applying improve() and perturb() with a scheduled intensity. We evaluate ImprovEvolve on challenging problems from the AlphaEvolve paper: hexagon packing in a hexagon and the second autocorrelation inequality. For hexagon packing, the evolved program achieves new state-of-the-art results for 11, 12, 15, and 16 hexagons; a lightly human-edited variant further improves results for 14, 17, and 23 hexagons. For the second autocorrelation inequality, the human-edited program achieves a new state-of-the-art lower bound of 0.96258, improving upon AlphaEvolve’s 0.96102.
💡 Research Summary
ImprovEvolve is a novel framework that builds on the recent successes of AlphaEvolve, a large‑language‑model (LLM) guided evolutionary algorithm for solving hard mathematical optimization problems. Because AlphaEvolve evolves complete programs that directly output a candidate solution, the LLM must design an entire end‑to‑end optimization pipeline—including initialization, search strategy, and termination criteria—which places a heavy cognitive load on the model. ImprovEvolve addresses this bottleneck by redefining the evolutionary target: instead of a monolithic solver, the LLM evolves a Python class that implements three simple, well‑defined methods: (1) generate_config(seed) – produces a feasible initial configuration given a random seed; (2) improve(x) – takes any configuration x and returns a locally improved version with strictly better fitness; and (3) perturb(x, σ) – randomly perturbs x with a controllable intensity σ.
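To make the interface concrete, here is a minimal sketch of what an evolved class could look like. The toy objective (maximize f(x) = -(x - 3)²), the class name, and all method bodies are illustrative stand-ins, not the paper's evolved code; only the three-method shape follows the description above.

```python
import random

class EvolvedSolver:
    """Illustrative solver exposing the three-method interface.

    Toy problem: maximize f(x) = -(x - 3)^2 over the reals.
    The bodies are placeholder heuristics, not evolved code.
    """

    def fitness(self, x):
        return -(x - 3.0) ** 2

    def generate_config(self, seed):
        # Propose a feasible initial configuration from a seed.
        return random.Random(seed).uniform(-10.0, 10.0)

    def improve(self, x):
        # Hill climbing with a shrinking step size; returns a
        # configuration whose fitness is at least fitness(x).
        step = 1.0
        while step > 1e-6:
            if self.fitness(x + step) > self.fitness(x):
                x += step
            elif self.fitness(x - step) > self.fitness(x):
                x -= step
            else:
                step *= 0.5
        return x

    def perturb(self, x, sigma):
        # Random kick whose intensity is controlled by sigma.
        return x + random.gauss(0.0, sigma)
```

For a real task, each method would encode domain-specific geometry or analysis (e.g. collision resolution for hexagon packing), but the calling contract stays the same.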
The separation of concerns allows the LLM to focus on domain‑specific heuristics for each sub‑task rather than on the full algorithmic structure. The evolved class is then used inside a two‑stage validation scheme (Algorithm 1). In Stage A, K different seeds are sampled, each passed through generate_config followed by improve; the best resulting configuration x* is kept as the starting point for the global search. Stage B performs a basin‑hopping‑style loop: σ is scheduled from a large σ_max to a small σ_min (geometric decay), perturb is applied to the current best solution, improve refines the perturbed point, and a monotonic acceptance rule (temperature T = 0) updates the incumbent only if fitness does not decrease. This inner loop is essentially classic basin hopping, but the perturbation and local‑optimization operators are themselves evolved LLM‑generated functions, making the search highly problem‑aware.
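The two-stage scheme can be sketched as follows. The ToySolver stand-in and all parameter defaults (K, R, the σ bounds) are illustrative assumptions, not the paper's Algorithm 1 verbatim; the point is the structure: multi-start initialization, then basin hopping with a geometric σ schedule and T = 0 acceptance.

```python
import random

class ToySolver:
    """Tiny stand-in for an evolved solver: maximize f(x) = -(x - 3)^2."""

    def fitness(self, x):
        return -(x - 3.0) ** 2

    def generate_config(self, seed):
        return random.Random(seed).uniform(-10.0, 10.0)

    def improve(self, x):
        step = 1.0  # hill climbing with a shrinking step
        while step > 1e-6:
            if self.fitness(x + step) > self.fitness(x):
                x += step
            elif self.fitness(x - step) > self.fitness(x):
                x -= step
            else:
                step *= 0.5
        return x

    def perturb(self, x, sigma):
        return x + random.gauss(0.0, sigma)

def two_stage_search(solver, K=20, R=50, sigma_max=1.0, sigma_min=1e-3):
    """Stage A: multi-start initialization. Stage B: basin hopping with
    a geometrically decaying perturbation intensity and T = 0 acceptance."""
    # Stage A: K seeds -> generate_config -> improve; keep the best x*.
    best = max(
        (solver.improve(solver.generate_config(seed)) for seed in range(K)),
        key=solver.fitness,
    )
    best_fit = solver.fitness(best)

    # Stage B: schedule sigma geometrically from sigma_max down to sigma_min.
    decay = (sigma_min / sigma_max) ** (1.0 / max(R - 1, 1))
    sigma = sigma_max
    for _ in range(R):
        candidate = solver.improve(solver.perturb(best, sigma))
        cand_fit = solver.fitness(candidate)
        if cand_fit >= best_fit:  # monotonic acceptance: never go downhill
            best, best_fit = candidate, cand_fit
        sigma *= decay
    return best, best_fit
```

Because the acceptance rule is monotonic, the incumbent's fitness never decreases across rounds; the decaying σ shifts the loop from broad exploration early on to fine-grained refinement at the end.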
The outer evolutionary loop uses the open‑source GigaEvo framework with a MAP‑Elites archive. The only behavior descriptor is fitness, discretized into 150 bins. Each generation selects elite programs proportionally to fitness, then creates N offspring via an LLM‑based mutation operator that receives the parent source code, execution metrics, and pipeline insights (InsightsStage, LineageInsights). The mutation operator suggests code edits, which are immediately evaluated by running the inner basin‑hopping loop; successful offspring are inserted into the archive. To bootstrap evolution, five diverse initial programs are generated automatically with Gemini 3 Pro, after which the evolution itself is carried out with the cheaper Gemini 3 Flash Preview model, striking a balance between capability and cost.
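A minimal sketch of a fitness-only MAP-Elites archive of the kind described: fitness is the sole behavior descriptor, discretized into a fixed number of bins, and parents are drawn fitness-proportionally from the elites. The class name, the assumed [0, 1] fitness range, and the API are illustrative, not GigaEvo's actual interface.

```python
import random

class FitnessBinnedArchive:
    """MAP-Elites archive whose only behavior descriptor is fitness,
    discretized into `n_bins` bins over [fit_min, fit_max]."""

    def __init__(self, n_bins=150, fit_min=0.0, fit_max=1.0):
        self.n_bins = n_bins
        self.fit_min, self.fit_max = fit_min, fit_max
        self.cells = {}  # bin index -> (fitness, program)

    def _bin(self, fitness):
        t = (fitness - self.fit_min) / (self.fit_max - self.fit_min)
        return min(self.n_bins - 1, max(0, int(t * self.n_bins)))

    def insert(self, program, fitness):
        # An offspring claims a cell if it is empty or if the
        # offspring beats the cell's current occupant.
        b = self._bin(fitness)
        if b not in self.cells or fitness > self.cells[b][0]:
            self.cells[b] = (fitness, program)
            return True
        return False

    def select_parent(self, rng=random):
        # Fitness-proportional selection over the current elites.
        elites = list(self.cells.values())
        weights = [max(f, 1e-12) for f, _ in elites]
        return rng.choices(elites, weights=weights, k=1)[0][1]
```

In the evolutionary loop sketched above, `select_parent` would feed a parent program to the LLM mutation operator, and each evaluated offspring would go back through `insert`, so the archive keeps one elite per fitness band rather than a single global best.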
The authors evaluate ImprovEvolve on two notoriously difficult benchmarks previously tackled by AlphaEvolve. The first is the hexagon‑packing problem: arranging n unit hexagons inside a larger hexagon to minimize the side length of the container. ImprovEvolve achieves new state‑of‑the‑art side lengths for n = 11, 12, 15, 16, and, after minor human editing of the evolved class, also improves results for n = 14, 17, 23. The second benchmark is the second autocorrelation inequality, a high‑dimensional continuous optimization task where the objective is to maximize a lower bound. A human‑edited ImprovEvolve program reaches a bound of 0.96258, surpassing AlphaEvolve’s 0.96102.
Computationally, a single run on the hexagon‑packing task (≈3n variables) takes about 10 hours on a single machine, while the autocorrelation inequality (tens of thousands of parameters, each improve call invoking L‑BFGS) requires roughly 40 hours. The number of basin‑hopping rounds R can be reduced during evolution for faster fitness estimation and increased during final validation to obtain higher‑quality solutions.
Key contributions of the paper are: (1) a modular program interface that dramatically reduces the design burden on LLMs; (2) the integration of MAP‑Elites with a customized basin‑hopping inner loop, enabling efficient global‑local search; (3) empirical demonstration that the approach not only matches but exceeds the performance of the prior state‑of‑the‑art AlphaEvolve on challenging mathematical problems. The authors argue that this decomposition is broadly applicable to other high‑dimensional, non‑convex optimization domains such as physics simulations, engineering design, and combinatorial layout problems, where LLMs can supply problem‑specific heuristics while the evolutionary framework handles exploration and exploitation.