Large-Scale Optimization Model Auto-Formulation: Harnessing LLM Flexibility via Structured Workflow


Large-scale optimization is a key backbone of modern business decision-making. However, building these models is often labor-intensive and time-consuming. We address this by proposing LEAN-LLM-OPT, a LightwEight AgeNtic workflow construction framework for LLM-assisted large-scale OPTimization auto-formulation. LEAN-LLM-OPT takes as input a problem description together with associated datasets and orchestrates a team of LLM agents to produce an optimization formulation. Specifically, upon receiving a query, two upstream LLM agents dynamically construct a workflow that specifies, step-by-step, how optimization models for similar problems can be formulated. A downstream LLM agent then follows this workflow to generate the final output. The agentic workflow leverages common modeling practices to structure the modeling process into a sequence of sub-tasks, offloading mechanical data-handling operations to auxiliary tools. This reduces the LLM’s burden in planning and data handling, allowing us to exploit its flexibility to address unstructured components. Extensive simulations show that LEAN-LLM-OPT, instantiated with GPT-4.1 and the open-source gpt-oss-20B, achieves strong performance on large-scale optimization modeling tasks and is competitive with state-of-the-art approaches. In addition, in a Singapore Airlines choice-based revenue management use case, LEAN-LLM-OPT demonstrates practical value by achieving leading performance across a range of scenarios. Along the way, we introduce Large-Scale-OR and Air-NRM, the first comprehensive benchmarks for large-scale optimization auto-formulation. The code and data of this work are available at https://github.com/CoraLiang01/lean-llm-opt.


💡 Research Summary

The paper tackles the pressing challenge of automating the formulation of large‑scale optimization models, a task that traditionally demands expert knowledge and extensive manual effort. The authors introduce LEAN‑LLM‑OPT, a lightweight agentic workflow construction framework for LLM‑assisted large‑scale optimization auto‑formulation, which orchestrates multiple large language model (LLM) agents to turn a natural‑language problem description and associated external datasets into a complete mathematical model and executable Python code.

Core Architecture

LEAN‑LLM‑OPT consists of three cooperating agents:

  1. Classification Agent – Parses the input query (text description plus metadata) and predicts the problem class (e.g., resource allocation, network revenue management, transportation).
  2. Workflow‑Generation Agent – Using the predicted class, it retrieves one or more similar reference problems from a curated Ref‑Data set (96 instances covering small‑ and large‑scale problems). It then dynamically constructs a step‑by‑step workflow that outlines how to handle the data, identify decision variables and constraints, select an appropriate model structure, and apply a code template.
  3. Model‑Generation Agent – Executes the generated workflow. At each step it calls auxiliary tools (file I/O APIs, data summarizers, simple statistical scripts) to extract the necessary parameters from the external datasets, and finally emits the mathematical formulation (objective and constraints) together with ready‑to‑run Python code (leveraging libraries such as PuLP or OR‑Tools).
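The three-agent relay can be sketched as plain Python, with each agent stubbed as a function. This is a minimal illustration of the control flow only, not the authors' code: in LEAN‑LLM‑OPT each function would be an LLM call with tool access, and the keyword table and workflow wording here are hypothetical.

```python
# Sketch of the three-agent pipeline (assumed structure, not the paper's code).
def classification_agent(query: str) -> str:
    # Predict the problem class from the textual description (stubbed
    # with a toy keyword table; the real agent is an LLM call).
    keywords = {"ship": "transportation", "fare": "network revenue management"}
    for kw, cls in keywords.items():
        if kw in query.lower():
            return cls
    return "resource allocation"

def workflow_generation_agent(problem_class: str) -> list[str]:
    # Retrieve similar reference problems and emit a step-by-step workflow.
    return [
        f"load the datasets relevant to a {problem_class} model",
        "identify decision variables and constraints",
        "select a model structure and code template",
        "instantiate parameters and emit the formulation plus solver code",
    ]

def model_generation_agent(workflow: list[str]) -> dict:
    # Execute the workflow step by step (here we only record the trace;
    # the real agent calls auxiliary tools at each step).
    return {"steps_executed": len(workflow), "artifacts": ["formulation", "code"]}

query = "Minimize the cost of shipping goods from plants to warehouses."
problem_class = classification_agent(query)
result = model_generation_agent(workflow_generation_agent(problem_class))
print(problem_class, result["steps_executed"])
```

The point of the relay is that no single call sees the whole problem: the downstream agent only executes an explicit plan, one bounded sub-task at a time.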

By decomposing a monolithic, long‑prompt reasoning task into a sequence of well‑defined sub‑tasks, the framework dramatically reduces the token burden on any single LLM call and allows the model to focus its reasoning power on high‑level decisions while off‑loading mechanical data handling to specialized tools.
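The tool off-loading idea can be made concrete with a small sketch: rather than pasting a raw dataset into the prompt, an auxiliary tool reads the file and returns a compact parameter summary, so the LLM reasons over a handful of numbers instead of thousands of tokens. The CSV schema and column names below are hypothetical, assumed for illustration.

```python
# Sketch of an auxiliary data-handling tool (assumed interface): it turns a
# raw CSV into a compact summary the LLM can consume in a short prompt.
import csv
import io

RAW = """plant,warehouse,unit_cost
P1,W1,4.0
P1,W2,6.5
P2,W1,5.0
P2,W2,3.5
"""

def summarize_costs(raw_csv: str) -> dict:
    # Parse the rows and extract only what the modeling step needs:
    # index sets (plants, warehouses) and the range of cost coefficients.
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    costs = [float(r["unit_cost"]) for r in rows]
    return {
        "n_rows": len(rows),
        "plants": sorted({r["plant"] for r in rows}),
        "warehouses": sorted({r["warehouse"] for r in rows}),
        "cost_range": (min(costs), max(costs)),
    }

print(summarize_costs(RAW))
```

The full coefficient table never enters the prompt; the generated code reads it directly from disk at solve time.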

Benchmark Construction

Two benchmark suites are introduced:

  • Ref‑Data – A reference library of 96 optimization instances (78 small‑scale, 18 large‑scale) spanning six problem families (resource allocation, facility location, assignment, transportation, network revenue management, sales‑based linear programming).
  • Large‑Scale‑OR – A test set of 101 real‑world‑style instances, deliberately emphasizing medium‑size (20‑99 variables, 26 % of the set) and large‑size (≥ 100 variables, 55 %) problems. Each instance includes a textual problem statement, one or more CSV/Excel datasets, and a ground‑truth formulation.
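The size taxonomy above implies a simple bucketing rule (small under 20 variables, medium 20–99, large 100 or more). The thresholds come from the benchmark description; the helper itself is my illustration.

```python
# Size buckets used by Large-Scale-OR, as described in the text:
# small < 20 variables, medium 20-99, large >= 100.
def size_class(n_variables: int) -> str:
    if n_variables < 20:
        return "small"
    if n_variables < 100:
        return "medium"
    return "large"
```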

These datasets fill a gap in the literature, where most existing benchmarks focus on tiny LPs that can be embedded directly in the prompt.

Experimental Evaluation

The authors instantiate LEAN‑LLM‑OPT with two LLM back‑ends: the commercial GPT‑4.1 and the open‑source gpt‑oss‑20B. Both models are run through the same agentic pipeline without any fine‑tuning. Results on Large‑Scale‑OR show:

  • Overall modeling accuracy (exact match of objective, constraints, and variable naming) of ≈ 76 % for both models, with GPT‑4.1 reaching ≈ 85 % on the hardest instances.
  • Superior performance compared to state‑of‑the‑art baselines such as ORLM, Gemini 3 Pro, and GPT‑5.2, whose accuracies drop below 50 % once input token length exceeds ~800 tokens.
  • Consistently high scores on established small‑scale benchmarks (NL4OPT, IndustryOR, Mamo), demonstrating that the workflow approach does not sacrifice performance on easier problems.

Real‑World Case Study

A practical deployment is presented for Singapore Airlines’ choice‑based revenue management. Two sub‑benchmarks are defined:

  • Air‑NRM‑CA (capacity allocation, 15 instances) – LEAN‑LLM‑OPT achieves the highest accuracy among all compared methods.
  • Air‑NRM‑NP (network planning, 21 instances) – The framework yields an average optimality gap of < 2 %, outperforming competing approaches on virtually every instance.
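For concreteness, one common convention for the optimality gap quoted above is the relative shortfall of the generated model's objective against the true optimum (for a maximization problem such as revenue management). The exact normalization used in the paper is my assumption.

```python
# Relative optimality gap (assumed convention): |z_opt - z_model| / |z_opt|.
def optimality_gap(z_model: float, z_opt: float) -> float:
    return abs(z_opt - z_model) / abs(z_opt)

# e.g. a generated model earning revenue 98.5 against an optimum of 100
print(f"{optimality_gap(98.5, 100.0):.1%}")
```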

An ablation study confirms that both the dynamic workflow generation and the auxiliary data‑handling tools are essential: removing either component reduces accuracy by more than 10 %.

Insights and Limitations

The study demonstrates that agentic workflow construction is an effective strategy to scale LLM‑driven optimization modeling to realistic problem sizes. By delegating low‑level data manipulation to tools, the LLM can concentrate on high‑level reasoning, mitigating the well‑known degradation of performance with long prompts. However, the current implementation focuses on linear and mixed‑integer linear programs; extending the approach to non‑linear, stochastic, or dynamic optimization remains an open research direction. Moreover, the quality and diversity of the reference dataset directly influence the workflow’s relevance, suggesting that continuous enrichment of Ref‑Data will be crucial for broader adoption.

Conclusion

LEAN‑LLM‑OPT offers a general, modular, and cost‑effective solution for automating large‑scale optimization model formulation. It bridges the gap between the flexibility of large language models and the rigor of structured optimization practice, achieving strong empirical results on both synthetic benchmarks and a high‑impact airline revenue‑management use case. The framework’s plug‑and‑play nature—compatible with any LLM that supports tool use—positions it as a promising foundation for future AI‑augmented decision‑support systems across diverse industries.

