AI Agent Systems for Supply Chains: Structured Decision Prompts and Memory Retrieval

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

This study investigates large language model (LLM)-based multi-agent systems (MASs) as a promising approach to inventory management, a key component of supply chain management. Although these systems have gained considerable attention for their potential to address the challenges associated with typical inventory management methods, key uncertainties regarding their effectiveness persist. Specifically, it is unclear whether LLM-based MASs can consistently derive optimal ordering policies and adapt to diverse supply chain scenarios. To address these questions, we examine an LLM-based MAS with a fixed-ordering strategy prompt that encodes the stepwise processes of the problem setting and a safety-stock strategy commonly used in inventory management. Our empirical results demonstrate that, even without detailed prompt adjustments, an LLM-based MAS can determine optimal ordering decisions in a restricted scenario. To enhance adaptability, we propose a novel agent called AIM-RM, which leverages similar historical experiences through similarity matching. Our results show that AIM-RM outperforms benchmark methods across various supply chain scenarios, highlighting its robustness and adaptability.


💡 Research Summary

This paper investigates the use of large language model (LLM)‑based multi‑agent systems (MAS) for inventory management in multi‑echelon supply chains. The authors first construct a baseline MAS in which each tier of the supply chain is represented by an autonomous LLM agent. Agents operate in four sequential steps each period—inventory replenishment, ordering and demand observation, production/shipping, and profit‑loss calculation—using a state vector that captures current inventory, backlogs, recent shipments, and lead‑time information. A “fixed‑ordering strategy” prompt (P_DM) is enriched with a step‑by‑step procedural description (P_SD) and a traditional safety‑stock formula (P_SS). In a restricted setting with stationary demand and fixed lead times, this configuration reproduces the optimal base‑stock policy without any prompt tuning, demonstrating that a modern LLM can solve a classic inventory problem out of the box.
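The safety‑stock and base‑stock policy that the prompt (P_SS) encodes can be sketched with the textbook formulas. This is a minimal illustration, not the paper's exact prompt content: the service‑level factor `z` and the function name are assumptions.

```python
import math

def base_stock_order(inv_position: float, mu_d: float, sigma_d: float,
                     lead_time: float, z: float = 1.65) -> float:
    """Order up to a base-stock level S built from the textbook
    safety-stock formula:

        S = mu_d * L + z * sigma_d * sqrt(L)

    where z is the service-level factor (1.65 ~ a 95% service level).
    The order quantity is whatever is needed to raise the current
    inventory position back up to S (never negative).
    """
    safety_stock = z * sigma_d * math.sqrt(lead_time)
    base_stock = mu_d * lead_time + safety_stock
    return max(0.0, base_stock - inv_position)
```

With stationary demand and fixed lead times (the paper's restricted scenario), following this rule period after period is exactly the optimal base‑stock policy the baseline LLM‑MAS reproduces.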

However, when demand variability, lead‑time uncertainty, or additional echelons are introduced, the baseline LLM‑MAS fails to maintain optimality, exposing a key limitation: the system’s performance is highly sensitive to scenario‑specific prompt engineering. To overcome this, the authors propose AIM‑RM, an adaptive agent that augments the decision prompt with a memory‑usage component (P_MU). AIM‑RM stores past experiences—state vectors, chosen order quantities, and resulting profit/loss—in a vector database. At decision time, it retrieves the K most similar historical states using Euclidean distance, inserts the retrieved cases into the prompt, and lets the LLM generate an order together with reasoning. This similarity‑based retrieval effectively provides the LLM with concrete examples of successful actions in comparable contexts, enabling it to generalize across diverse environments.
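The retrieval step described above can be sketched as follows. This is an illustrative implementation, assuming a plain in‑memory list of experiences; the tuple layout `(state_vector, order_qty, profit)`, the `max_dist` threshold, and the function names are assumptions, not the paper's exact schema or vector-database API.

```python
import math

def retrieve_similar(memory, state, k=3, max_dist=None):
    """Return up to k past experiences whose state vectors lie closest
    to the current state under Euclidean distance.

    memory -- list of (state_vector, order_qty, profit) tuples
    state  -- current state vector (same dimensionality)
    max_dist -- optional similarity threshold: discard cases farther
                than this, so dissimilar episodes never enter the prompt
    """
    def dist(s):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, state)))

    candidates = sorted(memory, key=lambda exp: dist(exp[0]))
    if max_dist is not None:
        candidates = [e for e in candidates if dist(e[0]) <= max_dist]
    return candidates[:k]

def format_examples(cases):
    """Render retrieved cases as plain text for insertion into the
    LLM decision prompt (the memory-usage component, P_MU)."""
    return "\n".join(
        f"state={s} -> ordered {order}, profit {profit}"
        for s, order, profit in cases
    )
```

The `k` and `max_dist` parameters correspond to the trade‑off discussed in the paper's sensitivity analysis: too many loosely similar cases dilute the guidance, while too few risks over‑fitting the decision to a single past episode.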

The experimental suite covers 12 scenarios varying in number of echelons (3‑ and 5‑tier), demand distributions (normal, Poisson, sudden spikes), and lead‑time patterns. Baseline methods include the classic base‑stock heuristic, centralized‑training‑decentralized‑execution (CTDE) reinforcement learning, and a prior LLM‑MAS without memory. Across all scenarios, AIM‑RM reduces total expected cost by 12%–27% relative to the best competing method, with especially pronounced gains under high demand volatility and uncertain lead times. Sensitivity analysis shows that the similarity threshold and the choice of K balance the trade‑off between over‑fitting to specific episodes and providing useful guidance.

The study concludes that LLM‑based MAS can achieve optimal or near‑optimal inventory policies when equipped with structured prompts and a memory‑retrieval mechanism. The proposed AIM‑RM demonstrates robustness, adaptability, and interpretability (the LLM also outputs its reasoning). Future work is suggested on integrating real‑time data streams, multi‑objective optimization (e.g., carbon emissions), and large‑scale field deployments to validate the approach in operational supply chains.

