MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
Large Language Models (LLMs) have achieved great success in many real-world applications, especially when serving as the cognitive backbone of Multi-Agent Systems (MAS) that orchestrate complex workflows in practice. Since many deployment scenarios preclude modifications to the MAS workflow, and since MAS performance is highly sensitive to the input prompts, prompt optimization emerges as a natural approach to improving performance. However, real-world prompt optimization for MAS is impeded by three key challenges: (1) the need for sample efficiency due to prohibitive evaluation costs, (2) topology-induced coupling among prompts, and (3) the combinatorial explosion of the search space. To address these challenges, we introduce MASPOB (Multi-Agent System Prompt Optimization via Bandits), a novel sample-efficient framework based on bandits. By leveraging Upper Confidence Bound (UCB) to quantify uncertainty, the bandit framework balances exploration and exploitation, maximizing gains within a strictly limited budget. To handle topology-induced coupling, MASPOB integrates Graph Neural Networks (GNNs) to capture structural priors, learning topology-aware representations of prompt semantics. Furthermore, it employs coordinate ascent to decompose the optimization into univariate sub-problems, reducing search complexity from exponential to linear. Extensive experiments across diverse benchmarks demonstrate that MASPOB achieves state-of-the-art performance, consistently outperforming existing baselines.
💡 Research Summary
The paper introduces MASPOB, a novel framework for optimizing prompts in Large Language Model (LLM)‑driven Multi‑Agent Systems (MAS) under a strict evaluation budget. The authors first formalize the problem: each agent in a MAS is guided by a role‑specific prompt, and the overall system performance depends on the interaction of these prompts along a directed acyclic graph (DAG) representing the workflow. The search space is the Cartesian product of all agents’ prompt candidate sets, which grows exponentially with the number of agents, while each evaluation requires a full end‑to‑end execution of the MAS, making the problem a costly combinatorial black‑box optimization.
To address three core challenges—sample efficiency, topology‑induced coupling, and combinatorial explosion—MASPOB combines three technical components.
- Topology‑aware surrogate: A Graph Attention Network (GAT) encodes the workflow graph. Prompt embeddings, obtained from a pre‑trained text encoder, serve as node features. Through attention‑weighted message passing, the GAT captures asymmetric influence among agents, producing a graph‑level representation that is fed to a multilayer perceptron to predict the expected performance µ(c) of a prompt combination c. This surrogate explicitly models how changes in an upstream prompt propagate downstream, providing a structural inductive bias absent in flat, single‑agent optimizers.
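The surrogate described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the DAG, embedding dimension, and all weights below are hypothetical (random rather than learned), and a single attention head stands in for the full GAT. It only shows the shape of the computation: prompt embeddings as node features, attention-weighted message passing along workflow edges, a graph-level readout, and an MLP head producing the predicted performance µ(c).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-agent workflow DAG: agent 0 -> agent 1 -> agent 2.
# adj[j, i] = 1 means agent i attends to agent j; self-loops included.
adj = np.array([[1, 1, 0],
                [0, 1, 1],
                [0, 0, 1]], dtype=float)

d = 8                          # assumed text-encoder embedding dimension
X = rng.normal(size=(3, d))    # prompt embeddings as node features

# One attention head; weights would be learned, random here for the sketch.
W = rng.normal(size=(d, d)) * 0.1   # shared linear transform
a = rng.normal(size=(2 * d,)) * 0.1 # attention scoring vector

def gat_layer(X, adj):
    """One attention-weighted message-passing step over the workflow DAG."""
    H = X @ W
    n = H.shape[0]
    scores = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(n):
            if adj[j, i]:  # node i attends to its upstream neighbor j
                scores[i, j] = np.concatenate([H[i], H[j]]) @ a
    # Softmax over each neighborhood -> asymmetric influence weights.
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha = e / e.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ H)

H = gat_layer(X, adj)
graph_repr = H.mean(axis=0)    # graph-level readout

# Two-layer MLP head predicting the expected performance mu(c).
W1, W2 = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d,)) * 0.1
mu_c = float(np.tanh(graph_repr @ W1) @ W2)
```

Because edges are directed, attention weights need not be symmetric, which is what lets the surrogate model upstream-to-downstream prompt influence.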
- Bandit‑based exploration‑exploitation: Prompt search is cast as a contextual linear bandit problem. An information matrix M accumulates outer products of concatenated prompt embeddings from evaluated combinations. The uncertainty of a new combination is estimated as σ(c) = √(Φ(c)ᵀM⁻¹Φ(c)). The acquisition function follows the Upper Confidence Bound (UCB) principle: UCB(c) = µ(c) + α·σ(c), where α controls the exploration weight. This formulation enables principled trade‑offs: promising regions are exploited, while under‑explored regions receive a bonus, ensuring efficient use of the limited budget.
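The linear-UCB bookkeeping is compact enough to sketch directly. In this hedged example the embedding dimension and the evaluated embeddings are synthetic, and µ(c) is passed in as a plain number standing in for the GAT surrogate's prediction; only the uncertainty and acquisition computations follow the formulas above.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6  # assumed dimension of the concatenated prompt embedding Phi(c)

# Information matrix M: identity (ridge regularization) plus accumulated
# outer products of embeddings of already-evaluated combinations.
M = np.eye(d)
for _ in range(10):
    phi = rng.normal(size=d)
    M += np.outer(phi, phi)

def ucb(phi, mu, M, alpha=1.0):
    """UCB(c) = mu(c) + alpha * sigma(c), sigma(c) = sqrt(phi^T M^{-1} phi)."""
    sigma = np.sqrt(phi @ np.linalg.solve(M, phi))
    return mu + alpha * sigma, sigma

phi_new = rng.normal(size=d)
score, sigma = ucb(phi_new, mu=0.5, M=M)
```

A useful sanity check on the design: once a combination is evaluated and its outer product is added to M, its σ shrinks, so the exploration bonus automatically decays for well-explored regions.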
- Coordinate ascent for scalable search: Exhaustive evaluation of all combinations is infeasible. MASPOB adopts a coordinate ascent strategy that iteratively optimizes one agent’s prompt while keeping the others fixed, selecting the prompt that maximizes the UCB score for that coordinate. This reduces per‑iteration complexity from O(∏|P_i|) to O(∑|P_i|), requiring only a linear number of surrogate evaluations. Because UCB evaluations involve only forward passes through the GAT, they are orders of magnitude cheaper than real MAS runs.
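The coordinate-wise sweep can be sketched as follows. The candidate sets and the scoring function here are placeholders (a cheap deterministic proxy instead of the real GAT-plus-UCB score), but the control flow is the point: each sweep scores Σ|P_i| combinations rather than ∏|P_i|.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical candidate prompt embeddings: 3 agents, 4 candidates each.
candidates = [rng.normal(size=(4, 5)) for _ in range(3)]

def ucb_score(combo):
    """Stand-in for the surrogate's UCB score of a prompt combination.
    A real system would run the GAT surrogate plus the linear-UCB bonus."""
    phi = np.concatenate([candidates[i][j] for i, j in enumerate(combo)])
    return float(np.sin(phi).sum())  # cheap deterministic proxy

def coordinate_ascent(n_sweeps=3):
    combo = [0] * len(candidates)        # start from candidate 0 per agent
    for _ in range(n_sweeps):
        for i in range(len(candidates)): # optimize one agent, freeze the rest
            combo[i] = max(
                range(len(candidates[i])),
                key=lambda j: ucb_score(combo[:i] + [j] + combo[i + 1:]),
            )
    return combo

combo = coordinate_ascent()
```

On termination the returned combination is coordinate-wise locally optimal: changing any single agent's prompt cannot improve the score.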
The overall algorithm proceeds as follows: (i) initialize the workflow graph and prompt embeddings; (ii) repeatedly perform coordinate ascent to propose a candidate combination; (iii) evaluate the candidate on the validation set to obtain a true performance score; (iv) update the GAT surrogate and the information matrix M; and (v) repeat until the evaluation budget T is exhausted. The best combination found is then tested on a held‑out test set.
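The outer loop (steps i–v) can be summarized in a short sketch. Both `propose` and `evaluate_mas` below are stubs, hypothetical stand-ins for the coordinate-ascent proposer and the costly end-to-end MAS run on the validation set; the sketch shows only how the budget, history, and information matrix interact.

```python
import numpy as np

rng = np.random.default_rng(3)

T = 5            # evaluation budget (kept tiny for the sketch)
d = 4            # assumed embedding dimension
M = np.eye(d)    # information matrix
history = []     # (embedding, true score) pairs for surrogate retraining

def propose():           # stub for coordinate ascent over UCB scores
    return rng.normal(size=d)

def evaluate_mas(phi):   # stub for a full end-to-end validation run
    return float(-np.sum((phi - 0.5) ** 2))

for t in range(T):
    phi = propose()                  # (ii) propose a candidate combination
    score = evaluate_mas(phi)        # (iii) costly true evaluation
    history.append((phi, score))     # (iv) data for updating the surrogate ...
    M += np.outer(phi, phi)          #      ... and the information matrix

best_phi, best_score = max(history, key=lambda h: h[1])  # report the best
```

In the full method the surrogate's GAT weights would also be retrained on `history` each iteration before the next proposal.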
Experiments cover six benchmarks spanning question answering (HotpotQA, DROP), code generation (HumanEval, MBPP), and mathematical reasoning (GSM8K, MATH). Under identical budgets, MASPOB consistently outperforms strong baselines: single‑agent prompt optimizers (OPRO, PromptBreeder), multi‑stage Bayesian optimizers (MIPRO), and ablations without the GAT or without coordinate ascent. The GAT contribution alone yields a 5–8% performance gain, confirming the importance of topology‑aware modeling. Moreover, the coordinate ascent component proves essential; removing it leads to rapid budget exhaustion and degraded results.
In summary, MASPOB delivers a sample‑efficient, topology‑aware, and computationally scalable solution to MAS prompt optimization. By integrating a graph‑neural surrogate with a linear‑UCB bandit and a coordinate‑wise search, it navigates the exponential combinatorial space while respecting strict evaluation constraints. The authors suggest future extensions to cyclic workflows, multi‑objective settings (e.g., latency, cost, safety), and online continual learning, indicating broad applicability of the approach to real‑world, safety‑critical multi‑agent deployments.