Hard Constraints Meet Soft Generation: Guaranteed Feasibility for LLM-based Combinatorial Optimization
Large language models (LLMs) have emerged as promising general-purpose solvers for combinatorial optimization (CO), yet they fundamentally lack mechanisms to guarantee solution feasibility, which is critical for real-world deployment. In this work, we introduce FALCON, a framework that ensures 100% feasibility through three key innovations: (i) \emph{grammar-constrained decoding} enforces syntactic validity, (ii) a \emph{feasibility repair layer} corrects semantic constraint violations, and (iii) \emph{adaptive Best-of-$N$ sampling} allocates inference compute efficiently. To train the underlying LLM, we introduce Best-anchored Objective-guided Preference Optimization (BOPO), which weights preference pairs by their objective gap, providing dense supervision without human labels. Theoretically, we prove convergence for BOPO and provide bounds on repair-induced quality loss. Empirically, across seven NP-hard CO problems, FALCON achieves perfect feasibility while matching or exceeding the solution quality of state-of-the-art neural and LLM-based solvers.
💡 Research Summary
The paper introduces FALCON, a novel framework that equips large language models (LLMs) with provable 100% feasibility guarantees when used as end‑to‑end solvers for combinatorial optimization (CO) problems. The authors identify a fundamental gap: while LLMs excel at generating solution sequences from natural‑language descriptions, they lack any mechanism to enforce hard constraints such as Hamiltonian cycles, capacity limits, or precedence relations. Existing approaches treat feasibility as a soft objective, relying on reward shaping during training, which leads to highly variable feasibility rates at inference time and makes LLM‑based solvers unsuitable for safety‑critical applications.
FALCON addresses this gap through three tightly integrated components:
- Grammar‑Constrained Decoding – For each CO problem, the authors design a context‑free grammar (CFG) that precisely captures the valid output format (e.g., a list of node indices for TSP, a set of routes for CVRP). During generation, a push‑down automaton (PDA) derived from the CFG is used to mask tokens that would violate the grammar, guaranteeing that every generated string belongs to the language of the grammar. The authors prove a “Format Validity Guarantee” (Theorem 3.3) and show that the per‑token overhead is O(|Σ|·|Q|), which is negligible compared to the transformer’s attention cost.
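The masking step can be illustrated with a minimal sketch. Everything below is invented for the example (the paper derives a push‑down automaton from a problem‑specific CFG; here a permutation‑style "grammar" for TSP is hard‑coded as "any node not yet visited"):

```python
# Minimal sketch of grammar-constrained decoding for a TSP-style output
# (a permutation of node indices). Names and the toy scorer are illustrative,
# not the paper's PDA construction.

def allowed_tokens(prefix, n_nodes):
    """Tokens the 'grammar' permits next: any node not yet in the prefix."""
    return set(range(n_nodes)) - set(prefix)

def constrained_decode(logits_fn, n_nodes):
    """Greedy decoding with a per-step token mask.
    logits_fn(prefix) -> list of raw scores, one per node token."""
    prefix = []
    for _ in range(n_nodes):
        scores = logits_fn(prefix)
        # Only grammar-legal tokens compete; all others are masked out.
        best = max(allowed_tokens(prefix, n_nodes), key=lambda t: scores[t])
        prefix.append(best)
    return prefix

# Toy scorer that always prefers low indices; the mask still
# forces a valid permutation of 0..4.
tour = constrained_decode(lambda p: [-t for t in range(5)], 5)  # -> [0, 1, 2, 3, 4]
```

In a real decoder the same effect is achieved at the logits level, by setting disallowed logits to negative infinity before the softmax, so any sampling strategy (greedy, top-p, beam) inherits the guarantee.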
- Feasibility Repair Layer – Syntactically correct strings may still be semantically infeasible. The authors formalize a repair operator R that maps any candidate solution x to a feasible solution R(x) while satisfying three properties: (i) feasibility (R(x) ∈ feasible region), (ii) idempotence (R(x)=x if x is already feasible), and (iii) bounded locality (the distance between x and R(x) is proportional to a violation magnitude v(x)). They instantiate problem‑specific repair algorithms for seven benchmark problems (TSP, CVRP, OP, MIS, MVC, PFSP, JSSP) and provide worst‑case time complexities. Theorem 3.7 bounds the objective degradation after repair by L_f·α·v(x), where L_f is the Lipschitz constant of the objective. Consequently, even when repairs are needed, the quality loss is tightly controlled.
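A toy repair operator for TSP makes the feasibility and idempotence properties concrete. This is a generic de‑duplication repair written for illustration, not necessarily the paper's problem‑specific algorithm:

```python
def repair_tour(x, n_nodes):
    """Map any candidate token sequence to a feasible TSP tour:
    keep the first occurrence of each valid node, then append the
    missing nodes. Idempotent: a feasible tour is returned unchanged,
    and the edit distance scales with the number of violations."""
    seen, tour = set(), []
    for t in x:
        if 0 <= t < n_nodes and t not in seen:
            seen.add(t)
            tour.append(t)
    # Append whatever the model left out, so the result is a full tour.
    tour.extend(t for t in range(n_nodes) if t not in seen)
    return tour

fixed = repair_tour([0, 2, 2, 7, 1], 5)  # duplicate 2 and out-of-range 7 dropped
# fixed == [0, 2, 1, 3, 4]; repairing again changes nothing (idempotence).
```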
- Adaptive Best‑of‑N Sampling – Fixed‑N sampling wastes compute on easy instances and may be insufficient for hard ones. FALCON measures “solution consistency” across K independent samples, showing (Lemma 3.10) that expected consistency equals the exponential of the negative Rényi‑2 entropy of the model’s output distribution. High consistency indicates the model is confident (easy instance), low consistency signals uncertainty (hard instance). An adaptive algorithm (Algorithm 2) uses a Bayesian confidence estimator (β‑Binomial) to decide when to stop sampling. Theorem 3.13 provides an upper bound on the expected number of samples, demonstrating that for easy instances the algorithm stops after the minimum budget N_min, while for hard instances it scales up to N_max. Because the repair layer already guarantees feasibility, only a single sample (N=1) is needed to achieve 100% feasibility, yielding up to O(log (1/δ)·p_f) speed‑up compared with naive rejection sampling.
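The consistency signal and the stopping rule can be sketched as follows. The empirical pairwise-agreement estimator below is standard (its expectation is the collision probability Σᵢ pᵢ², i.e., exp of the negative Rényi‑2 entropy, matching Lemma 3.10); the threshold schedule is a simplification invented here, not the paper's β‑Binomial estimator from Algorithm 2:

```python
from collections import Counter

def consistency(samples):
    """Empirical probability that two distinct samples coincide.
    Its expectation is the collision probability sum_i p_i^2,
    i.e., exp(-H_2) for the Renyi-2 entropy H_2."""
    counts = Counter(samples)
    n = len(samples)
    same = sum(c * (c - 1) for c in counts.values())
    return same / (n * (n - 1)) if n > 1 else 0.0

def adaptive_best_of_n(sample_fn, objective, n_min=4, n_max=32, tau=0.6):
    """Draw at least n_min samples; stop early once consistency exceeds
    tau (confident / easy instance), otherwise keep sampling up to n_max.
    Returns the best sample found and the number of samples used."""
    samples = [sample_fn() for _ in range(n_min)]
    while len(samples) < n_max and consistency([tuple(s) for s in samples]) < tau:
        samples.append(sample_fn())
    return min(samples, key=objective), len(samples)
```

With a fully deterministic sampler the consistency is 1.0 after the first n_min draws, so the loop stops at the minimum budget, mirroring the easy-instance behavior described above.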
Training with BOPO – Traditional preference‑based reinforcement learning for CO suffers from sparse rewards and requires human‑generated preference pairs. The authors propose Best‑anchored Objective‑guided Preference Optimization (BOPO), which automatically constructs preference pairs from any two solutions by comparing their objective values and weights each pair by the absolute objective gap. This yields dense supervision aligned with the true optimization goal without any human labeling. Under standard stochastic optimization assumptions, BOPO converges at rate O(1/√T) (Theorem 3.4). Empirically, BOPO outperforms reward shaping and advantage‑normalized methods in both convergence speed and final solution quality.
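The core of BOPO can be sketched as a best-anchored, gap-weighted pairwise loss. The logistic (DPO-style) pairwise term and the linear gap weighting below are assumptions made for illustration; the paper's exact loss may differ:

```python
import math

def bopo_loss(logps, objs):
    """Illustrative best-anchored objective-guided preference loss.
    logps[i]: model log-probability of sampled solution i.
    objs[i]:  objective value of solution i (lower is better).
    The best solution anchors every pair, and each pair's logistic
    preference term is weighted by its objective gap, so worse
    solutions with larger gaps exert proportionally more pressure."""
    best = min(range(len(objs)), key=lambda i: objs[i])
    loss = 0.0
    for i in range(len(objs)):
        if i == best:
            continue
        gap = objs[i] - objs[best]       # >= 0: dense, label-free signal
        margin = logps[best] - logps[i]  # model should prefer the anchor
        loss += gap * math.log(1.0 + math.exp(-margin))
    return loss
```

Increasing the model's margin in favor of the best solution drives the loss toward zero, which is the gradient signal a training loop would backpropagate (here plain floats stand in for differentiable tensors).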
Experimental Evaluation – The authors evaluate FALCON on seven NP‑hard problems spanning routing (TSP, CVRP, OP), graph (MIS, MVC), and scheduling (PFSP, JSSP). Across all benchmarks, FALCON achieves 100% feasibility, while the average optimality gap matches or improves upon state‑of‑the‑art neural CO solvers (e.g., GNN‑based heuristics) and recent LLM‑based approaches (e.g., GPT‑3.5, Codex). Notably, for capacity‑constrained CVRP and precedence‑constrained JSSP, the repair step introduces only a modest 1.2% increase in objective value on average. Adaptive sampling reduces inference time by roughly 30% compared with a fixed‑N baseline, confirming the efficiency of the confidence‑driven stopping rule.
Contributions and Impact – FALCON is the first LLM‑based CO framework that provides formal guarantees of both syntactic validity and semantic feasibility, together with provable bounds on quality degradation after repair. By integrating grammar‑constrained decoding, principled repair operators, adaptive sampling, and the BOPO training scheme, the system bridges the gap between the expressive power of LLMs and the reliability required for real‑world deployment in logistics, manufacturing, and other high‑stakes domains. The paper also opens avenues for future work on extending repair mechanisms to multi‑objective or dynamic constraints, scaling the approach to larger problem instances, and exploring multimodal inputs.