Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery
Large language models (LLMs) have enabled rapid progress in automatic heuristic discovery (AHD), yet most existing methods rely on static evaluation against fixed instance distributions, which invites overfitting and poor generalization under distribution shift. We propose Algorithm Space Response Oracles (ASRO), a game-theoretic framework that reframes heuristic discovery as program-level co-evolution between a solver and an instance generator. ASRO models their interaction as a two-player zero-sum game, maintains a growing strategy pool on each side, and iteratively expands both pools via LLM-based best-response oracles against mixed opponent meta-strategies, thereby replacing static evaluation with an adaptive, self-generated curriculum. Across multiple combinatorial optimization domains, ASRO consistently outperforms static-training AHD baselines built on the same program search mechanisms, achieving substantially improved generalization and robustness on diverse and out-of-distribution instances.
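The loop described in the abstract (grow two strategy pools, solve the restricted zero-sum game for mixed meta-strategies, then add a best response on each side) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar "programs", the quadratic `gap` payoff, the grid-search oracles that stand in for the LLM-based best-response oracles, and the use of fictitious play as the meta-strategy solver.

```python
def solve_meta_strategies(payoff, iters=2000):
    """Approximate mixed meta-strategies of a zero-sum matrix game by fictitious play.

    payoff[i][j] is the gap when solver i meets generator j.
    The solver (row player) minimizes the gap, the generator (column player) maximizes it.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = 1  # arbitrary initial play so the empirical mixtures are defined
    col_counts[0] = 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                    for i in range(n_rows)]
        col_vals = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                    for j in range(n_cols)]
        best_row = min(range(n_rows), key=row_vals.__getitem__)
        best_col = max(range(n_cols), key=col_vals.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


def gap(s, g):
    """Hypothetical toy payoff: a scalar 'solver' s does worse the further it is from g."""
    return (s - g) ** 2


# Hypothetical candidate space; in ASRO the oracle is an LLM proposing programs.
CANDIDATES = [i / 10 for i in range(11)]


def best_response_solver(generators, gen_mix):
    """Grid-search stand-in for the solver oracle: minimize expected gap."""
    return min(CANDIDATES,
               key=lambda s: sum(p * gap(s, g) for g, p in zip(generators, gen_mix)))


def best_response_generator(solvers, sol_mix):
    """Grid-search stand-in for the generator oracle: maximize expected gap."""
    return max(CANDIDATES,
               key=lambda g: sum(p * gap(s, g) for s, p in zip(solvers, sol_mix)))


def co_evolve(rounds=6):
    """Grow both pools by repeatedly best-responding to the opponent's meta-strategy."""
    solvers, generators = [0.0], [0.0]
    for _ in range(rounds):
        payoff = [[gap(s, g) for g in generators] for s in solvers]
        sol_mix, gen_mix = solve_meta_strategies(payoff)
        new_solver = best_response_solver(generators, gen_mix)
        new_generator = best_response_generator(solvers, sol_mix)
        if new_solver not in solvers:
            solvers.append(new_solver)
        if new_generator not in generators:
            generators.append(new_generator)
    # Meta-strategies over the final pools.
    payoff = [[gap(s, g) for g in generators] for s in solvers]
    sol_mix, gen_mix = solve_meta_strategies(payoff)
    return solvers, generators, sol_mix, gen_mix
```

Calling `co_evolve()` returns the two pools and their meta-strategies. In this toy game the generator pool picks up the two extreme instances (0.0 and 1.0), which in turn pushes the solver pool to add the midpoint solver 0.5. That is a small-scale picture of the adaptive curriculum effect the abstract describes, under the assumptions stated above.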
💡 Research Summary
The paper addresses a fundamental weakness of current LLM-driven automatic heuristic discovery (AHD) pipelines: they are tied to a static evaluation set. Because the generated heuristics are repeatedly tested on a fixed distribution of combinatorial optimization instances, they tend to overfit and hit a performance ceiling when the instance distribution shifts. To overcome this, the authors formulate heuristic discovery as a two-player zero-sum game between a solver program (which maps an instance to a solution) and an instance-generator program (which defines a stochastic distribution over problem instances). The payoff of a pair (solver s, generator g) is the expected normalized gap U(s,g)=Eₓ∼g