Aligning LLMs with Graph Neural Solvers for Combinatorial Optimization

Shaodi Feng (1), Zhuoyi Lin (2)†, Yaoxin Wu (3), Haiyan Yin (4), Yan Jin (5), Senthilnath Jayavelu (2,6), Xun Xu (2)

(1) National Yang Ming Chiao Tung University  (2) Institute for Infocomm Research (I2R), A*STAR  (3) Eindhoven University of Technology  (4) Centre for Frontier AI Research, A*STAR  (5) Huazhong University of Science and Technology  (6) National University of Singapore

ABSTRACT

Recent research has demonstrated the effectiveness of large language models (LLMs) in solving combinatorial optimization problems (COPs) by representing tasks and instances in natural language. However, purely language-based approaches struggle to accurately capture the complex relational structures inherent in many COPs, rendering them less effective on medium-sized or larger instances. To address these limitations, we propose AlignOPT, a novel approach that aligns LLMs with graph neural solvers to learn a more generalizable neural COP heuristic. Specifically, AlignOPT leverages the semantic understanding capabilities of LLMs to encode textual descriptions of COPs and their instances, while concurrently exploiting graph neural solvers to explicitly model the underlying graph structures of COP instances. Our approach facilitates a robust integration and alignment between linguistic semantics and structural representations, enabling more accurate and scalable COP solutions. Experimental results demonstrate that AlignOPT achieves state-of-the-art results across diverse COPs, underscoring its effectiveness in aligning semantic and structural representations. In particular, AlignOPT demonstrates strong generalization, effectively extending to previously unseen COP instances.
INTRODUCTION

Combinatorial optimization problems (COPs), which involve finding optimal solutions from finite sets of objects, underpin numerous real-world applications in logistics, scheduling, and network design (Bengio et al., 2021). Typical COPs, such as the Traveling Salesman Problem (TSP), Vehicle Routing Problem (VRP), and Knapsack Problem (KP), are notoriously challenging due to their NP-hard nature, requiring efficient heuristic or meta-heuristic solutions (Wang & Sheu, 2019; Konstantakopoulos et al., 2022; Lin et al., 2024). Traditionally, COPs have been approached through exact optimization methods and domain-specific heuristics. However, these methods often require extensive domain knowledge and manual tuning, making them less adaptable to new problem variants or different application contexts.

Recent studies indicate that large language models (LLMs) have emerged as powerful and versatile tools for tackling a diverse range of COPs. By framing COPs within natural language descriptions, LLM-based methods have demonstrated initial success in automatically solving optimization problems. Nevertheless, despite these advancements, the current capability of LLMs to directly generate solutions remains primarily restricted to relatively small-scale problem instances, such as TSP with fewer than 30 nodes (Yang et al., 2023; Iklassov et al., 2024). In addition, existing LLM-based solutions still encounter inherent limitations when addressing COPs characterized by complex underlying structures, particularly graph problems (Cappart et al., 2023; Bengio et al., 2021; Drakulic et al., 2024). Pure language models inherently lack explicit structural reasoning capabilities, making it difficult for them to effectively capture and represent intricate relational information in graphs.

† Corresponding author.
Consequently, these limitations can significantly degrade solution optimality and overall quality, substantially limiting the applicability of LLM-driven approaches in realistic, large-scale settings, particularly in fields such as logistics, transportation, and supply chain management, where typical problem instances involve hundreds to thousands of nodes (Bengio et al., 2021).

To address these challenges, we propose AlignOPT, a novel framework designed to integrate the complementary capabilities of LLMs and graph-based neural solvers for COPs. Specifically, LLMs provide robust semantic understanding and flexible representation of natural language instructions, while graph-based neural solvers explicitly capture the relational structures and topological dependencies inherent in COP instances. To effectively align these two modalities, AlignOPT introduces a multi-task pre-training strategy comprising two novel objectives: (1) a Text-Graph Contrastive (TGC) loss, designed to align semantic node embeddings from LLMs with structural embeddings from graph-based neural solvers, and (2) a Text-Graph Matching (TGM) loss, facilitating fine-grained multimodal node representation. By jointly optimizing these objectives, AlignOPT produces unified representations that enhance the accuracy and richness of COP embeddings. In this way, AlignOPT leverages guidance from LLMs exclusively during the pre-training stage to embed optimization knowledge into the graph neural solver (encoder). In the fine-tuning stage, AlignOPT fine-tunes the graph encoder along with a decoder trained via reinforcement learning to learn an effective optimization policy. Consequently, AlignOPT uses only the graph encoder and decoder for inference, processing inputs directly as graphs without relying on textual input or an LLM.
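To make the TGC objective concrete, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss between matched text and graph node embeddings. This is an illustrative stand-in, not the paper's exact loss: the function name, the temperature value, and the assumption that row i of each matrix embeds the same node are all hypothetical.

```python
import numpy as np

def text_graph_contrastive_loss(h_text, h_graph, tau=0.07):
    """Symmetric InfoNCE over matched (text, graph) node embeddings.

    h_text, h_graph: (n, d) arrays; row i of each embeds the same node,
    so the diagonal of the similarity matrix holds the positive pairs.
    """
    # L2-normalize so dot products are cosine similarities
    t = h_text / np.linalg.norm(h_text, axis=1, keepdims=True)
    g = h_graph / np.linalg.norm(h_graph, axis=1, keepdims=True)
    logits = t @ g.T / tau  # (n, n) temperature-scaled similarities

    def log_softmax(x, axis):
        x = x - x.max(axis=axis, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

    n = len(t)
    diag = np.arange(n)
    # text -> graph direction (rows) and graph -> text direction (columns)
    loss_t2g = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_g2t = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_t2g + loss_g2t)
```

Minimizing this loss pulls each node's semantic (LLM) and structural (graph encoder) embeddings together while pushing apart embeddings of different nodes, which is the alignment effect the TGC objective describes.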
This approach significantly reduces inference overhead and enhances computational efficiency, enabling AlignOPT to achieve superior generalization and solution quality across diverse COPs. Overall, the main contributions of this work to COP research can be summarized as follows.

• We introduce AlignOPT, a novel framework that explicitly aligns LLMs with graph-based neural solvers, bridging the gap between semantic and structural representations in COPs and moving beyond the single-modality reliance of current LLM-based models.

• AlignOPT performs multi-task pre-training across diverse text-attributed COPs, facilitating a more informative encoding process and subsequent fine-tuning. This enables the generation of effective and unified solutions for various COPs and adapts efficiently to unseen COPs without further reliance on LLMs during inference.

• Extensive experiments on synthetic COP instances and real-world benchmarks demonstrate the effectiveness of our proposed AlignOPT, achieving high performance gains over state-of-the-art solvers.

RELATED WORK

Neural Combinatorial Optimization. Constructive neural combinatorial optimization (NCO) methods aim to learn policies that iteratively construct solutions in an autoregressive manner. Early approaches primarily employed pointer networks (Vinyals et al., 2015; Bello et al., 2016), a class of recurrent neural networks (RNNs) that encode inputs and generate outputs through a sequence-to-sequence framework. Building on the Transformer architecture (Vaswani et al., 2017), the Attention Model (AM) (Kool et al., 2018) was subsequently developed to address vehicle routing problems (VRPs), demonstrating superior performance compared to traditional heuristic methods.
Following this, various strategies have been proposed to further improve Transformer-based NCO models by exploiting the inherent symmetries in combinatorial optimization problems (COPs) (Kwon et al., 2020; Kim et al., 2022; Fang et al., 2024) and incorporating efficient active search techniques (Hottung et al., 2021; Choo et al., 2022; Qiu et al., 2022). More recently, some work extends constructive NCO to one-for-all solvers that target multiple COPs with a single model (Zhou et al., 2024; Zheng et al., 2024; Berto et al.; Drakulic et al., 2024; Li et al.). However, these methods are constrained by specific problem structures, such as vehicle routing, which limits their representational scope and undermines the model's learning capacity. In contrast, our AlignOPT delves into general text-attributed COPs described in natural language. Leveraging the unified semantic representations inherent in LLMs, AlignOPT enables a general model to accommodate a wide range of COPs. Compared with GOAL (Drakulic et al., 2024), which proposes a unified encoder trained with supervised fine-tuning, AlignOPT goes further by (1) explicitly aligning this encoder with structured optimization insights derived from LLMs during pre-training, and (2) performing multi-task fine-tuning with reinforcement learning, ensuring superior generalization across diverse routing tasks during the fine-tuning stage. These enhancements explicitly encode generalized optimization reasoning from LLMs, enabling the model to robustly generalize to the diverse routing problems encountered in practice.

LLM for Combinatorial Optimization. Recent research on the application of LLMs to COPs has demonstrated promising and impactful results.
As early attempts, LLMs operate as black-box solvers that either directly generate feasible solutions from natural language problem descriptions (Abgaryan et al., 2024) or iteratively refine initial solutions through guided search mechanisms (Yang et al., 2023; Liu et al., 2024b). Notably, recent findings indicate that LLMs often exhibit limited generalization capabilities, tending instead to replicate memorized patterns from training data rather than performing robust, adaptable reasoning (Zhang et al., 2024; Iklassov et al., 2024). On the other hand, LLMs can be tasked with generating executable code that implements heuristic algorithms for solving COPs (Romera-Paredes et al., 2024; Liu et al., 2024a; Ye et al., 2024). Starting from a code template, LLMs iteratively refine algorithmic heuristics through an evolutionary process. While this approach demonstrates promising flexibility, it often requires substantial domain expertise and incurs high token usage for each specific problem instance. The work most relevant to ours is LNCS (Jiang et al., 2024), which integrates LLMs with an NCO model to unify the solution process across multiple COPs. However, LNCS sequentially combines LLMs and Transformer architectures, resulting in a notable modality gap when compared to specialized neural solvers designed explicitly for COPs. Moreover, LNCS heavily depends on the inference efficiency of LLMs, which is frequently constrained by significant computational requirements and limited context lengths, thus restricting scalability when performing inference on large-scale COPs. Instead, we propose AlignOPT to align LLMs, adept at semantic understanding, with graph-based neural solvers, proficient in capturing structural information, aiming to enhance solution quality and generalization capabilities. Note that after the pre-training of AlignOPT, LLMs are no longer required during the fine-tuning and inference stages.
This allows inference to be performed rapidly without the latency or cost associated with real-time LLM queries, significantly enhancing practical usability, scalability, and deployment feasibility.

PRELIMINARIES

Combinatorial Optimization Problems. Solving COPs involves identifying the optimal solution from a finite set of feasible candidates. Such problems are defined by their discrete nature, with solutions commonly represented as integers, sets, graphs, or sequences (Blum & Roli, 2003). Most COPs can be defined over a graph $\mathcal{G}$ with nodes and edges. Specifically, a COP $P = (S, f)$ can be formulated as follows:

$$\min_{X} f(X, P) \quad \text{s.t.} \quad c_j(X, P) \le 0, \quad j = 0, 1, \dots, J, \tag{1}$$

where $X = \{ x_i \in D_i \mid i = 1, \dots, n \}$ is a set of discrete variables, $f(X, P)$ is the objective function of the COP, and $c_j(X, P)$ denotes the problem-specific constraints on the variables $X$. Note that typical COPs (e.g., TSP, CVRP, KP) are NP-hard. As a result, identifying the optimal solution $s^*$ is computationally intractable in many practical scenarios. Therefore, a more tractable approach involves searching for a set of feasible solutions $S$ rather than striving for exact optimality. The set $S$ is formally defined as:

$$S = \big\{ s = \{ (x_1, v_1), \dots, (x_n, v_n) \} \,\big|\, v_i \in D_i, \; c(X, P) \le 0 \big\}, \tag{2}$$

where any solution $s$ satisfies $f(s, P) \ge f(s^*, P), \forall s \in S$.

Neural Construction Heuristics for COPs. Learning construction heuristics has become a widely adopted paradigm for addressing Vehicle Routing Problems (VRPs) (Bello et al., 2016; Kool et al., 2018; Kwon et al., 2020). In this framework, solutions are constructed incrementally by sequentially selecting valid nodes, a process effectively modeled as a Markov Decision Process (MDP). At each step, the agent observes a state composed of the problem instance and the current partial solution, and selects a valid node from the remaining candidates.
This process continues iteratively until a complete and feasible solution is obtained.

Figure 1: Overall workflow of AlignOPT. (a) AlignOPT first performs multi-task pre-training on diverse COPs to align semantic and structural node representations with the TGC and TGM losses. The LLM remains frozen and processes the TAIs to generate semantic node representations. (b) The encoder and decoder are then fine-tuned through reinforcement learning to solve COPs. Notably, LLMs are excluded during this phase to ensure computational efficiency, as the encoder has already been aligned with LLM-derived representations during pre-training. (c) The model architecture of the graph-based encoder, which applies a mixed attention mechanism that enables handling COPs represented as graphs.
The solution construction policy is typically parameterized by a neural network, such as a Long Short-Term Memory (LSTM) network or a Transformer, denoted by $\theta$. At each decision step, the policy infers a probability distribution over the valid nodes, from which one is sampled and appended to the partial solution. The overall probability of generating a tour $\pi$ is then factorized as

$$p_\theta(\pi \mid \mathcal{G}) = \prod_{t=1}^{T} p_\theta(\pi_t \mid \mathcal{G}, \pi_{1:t-1}).$$
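The MDP-style construction loop described above can be sketched for the TSP. This is an illustrative stand-in, not the paper's model: a trained decoder would supply the per-step logits, whereas here a simple nearest-neighbor score (negative distance) plays that role so the sketch runs on its own, and greedy argmax decoding replaces sampling.

```python
import numpy as np

def construct_tour(dist, start=0):
    """Autoregressive solution construction for TSP.

    At each MDP step, mask already-visited nodes and pick the
    highest-scoring valid node. The stand-in 'policy' scores a node by
    -distance from the current node; a learned decoder p_theta would
    supply these logits instead.
    """
    n = len(dist)
    visited = np.zeros(n, dtype=bool)
    tour = [start]
    visited[start] = True
    for _ in range(n - 1):
        logits = -dist[tour[-1]].copy()  # stand-in policy scores
        logits[visited] = -np.inf        # mask infeasible actions
        nxt = int(np.argmax(logits))     # greedy decoding step
        tour.append(nxt)
        visited[nxt] = True
    return tour

def tour_length(dist, tour):
    """Objective f(X, P): total length of the closed tour."""
    n = len(tour)
    return sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))
```

Masking visited nodes before the argmax is what enforces feasibility at every step, mirroring how neural construction heuristics restrict the probability distribution to valid nodes only.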
