MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization
Prompt engineering is crucial for fully leveraging large language models (LLMs), yet most existing optimization methods follow a single trajectory, resulting in limited adaptability, gradient conflicts, and high computational overhead. We propose MAPGD (Multi-Agent Prompt Gradient Descent), a novel framework that reconceptualizes prompt optimization as a collaborative process among specialized agents. Each agent focuses on a distinct refinement dimension, such as instruction clarity, example selection, format structure, or stylistic adaptation, and their contributions are coordinated through semantic gradient embedding, conflict detection, and fusion. To further enhance robustness and stability, MAPGD introduces two new mechanisms: Hypersphere Constrained Gradient Clustering (HCGC), which enforces angular margin constraints for compact and well-separated clusters, and Channel Adaptive Agent Weighting (CAAW), which dynamically reweights agent contributions based on validation performance. Experiments on classification and reasoning benchmarks show that MAPGD consistently surpasses single-agent and random baselines in both accuracy and efficiency. Ablation studies confirm the effectiveness of gradient fusion, agent specialization, and conflict resolution. Together, these components establish MAPGD as a unified, gradient-based, and interpretable framework for robust prompt optimization with theoretical convergence guarantees.
💡 Research Summary
The paper introduces MAPGD (Multi‑Agent Prompt Gradient Descent), a novel framework that reconceptualizes prompt optimization as a collaborative process among several specialized agents. Traditional prompt‑tuning methods typically follow a single refinement trajectory, which can cause instability, conflicting update signals, and inefficient use of query budgets. MAPGD addresses these issues by decomposing the prompt into orthogonal refinement dimensions—such as instruction clarity, example selection, format structure, and stylistic adaptation—and assigning each dimension to a dedicated agent.
Each agent independently examines model errors and generates a natural‑language “pseudo‑gradient” that describes a targeted improvement (e.g., “make the instruction more concise”). Because these pseudo‑gradients live in discrete text space, MAPGD embeds them into a shared semantic vector space using a pretrained language model encoder, then normalizes them onto a unit hypersphere. Semantic similarity is measured by cosine similarity; gradients whose similarity falls below a predefined threshold are considered to be in conflict.
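The embedding-and-conflict step can be illustrated with plain NumPy. In this sketch the toy vectors stand in for encoder outputs of the textual pseudo-gradients, and the threshold value 0.3 is illustrative rather than taken from the paper.

```python
import numpy as np

def normalize(v):
    # Project an embedding onto the unit hypersphere.
    return v / np.linalg.norm(v)

def detect_conflicts(embeddings, threshold=0.3):
    """Flag pairs of pseudo-gradient embeddings whose cosine similarity
    falls below `threshold` (illustrative value, not from the paper)."""
    unit = np.array([normalize(e) for e in embeddings])
    sims = unit @ unit.T  # pairwise cosine similarity on the hypersphere
    conflicts = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            if sims[i, j] < threshold:
                conflicts.append((i, j))
    return conflicts

# Toy embeddings standing in for encoded pseudo-gradients.
g_clarity = np.array([1.0, 0.1, 0.0])   # "make the instruction more concise"
g_examples = np.array([0.9, 0.2, 0.1])  # "add a harder worked example"
g_style = np.array([-0.8, 0.1, 0.6])    # "use a more elaborate formal tone"
print(detect_conflicts([g_clarity, g_examples, g_style]))
# → [(0, 2), (1, 2)]
```

The clarity and example gradients point in nearly the same direction, while the style gradient opposes both, so it is flagged as conflicting with each.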
To resolve conflicts, MAPGD employs Hypersphere‑Constrained Gradient Clustering (HCGC). Gradients are clustered on the hypersphere via cosine K‑means, with the number of clusters adaptively bounded by the number of active gradients and a maximum limit. An angular margin constraint (n·α < β) is imposed so that intra‑cluster vectors are tightly packed while inter‑cluster vectors maintain a large angular separation. This geometric enforcement isolates incompatible refinement directions, preventing them from being merged inadvertently.
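A minimal spherical (cosine) K-means sketch conveys the clustering step; the adaptive cluster-count bound and the angular margin check of HCGC are omitted here, and the function name and toy data are my own.

```python
import numpy as np

def spherical_kmeans(X, k, iters=20, seed=0):
    """Cluster unit vectors by cosine similarity, renormalizing
    centroids onto the hypersphere after every update."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(X @ centroids.T, axis=1)  # nearest by cosine
        for c in range(k):
            members = X[labels == c]
            if len(members):
                m = members.sum(axis=0)
                centroids[c] = m / np.linalg.norm(m)  # back onto sphere
    return labels, centroids

# Two tight bundles of toy gradient embeddings pointing in opposite directions.
X = np.array([[1.0, 0.05, 0.0], [0.95, 0.1, 0.0],    # bundle A
              [-1.0, 0.05, 0.1], [-0.9, 0.0, 0.2]])  # bundle B
labels, cents = spherical_kmeans(X, k=2)
```

The two bundles land in separate clusters; HCGC's margin constraint would additionally verify that the angular gap between the resulting centroids is large enough before allowing any fusion.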
Within each coherent cluster, gradients are fused into a single representative direction. However, not all agents are equally reliable even when they agree semantically. MAPGD therefore introduces Channel‑Adaptive Agent Weighting (CAAW). Validation‑derived performance gains for each agent are transformed into softmax weights, controlled by a temperature‑like parameter λ. Agents that consistently contribute to validation improvement receive higher weights, while noisy agents are down‑weighted. The weighted fused gradients guide the generation of candidate prompts, which are evaluated under a budgeted bandit‑selection scheme to produce the next prompt iteration.
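The CAAW reweighting reduces to a temperature-scaled softmax over validation gains. The gains below are hypothetical numbers, and the exact transform in the paper may differ from this sketch.

```python
import numpy as np

def caaw_weights(val_gains, lam=1.0):
    """Turn per-agent validation gains into softmax weights,
    controlled by the temperature-like parameter `lam`."""
    z = np.asarray(val_gains, dtype=float) / lam
    z -= z.max()              # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Hypothetical validation gains for the four agents:
# clarity, examples, format, style.
gains = [0.04, 0.01, -0.02, 0.03]
w = caaw_weights(gains, lam=0.02)
print(w.round(3))  # → [0.532 0.119 0.026 0.323]
```

A smaller `lam` sharpens the distribution toward the best-performing agents, while a larger `lam` flattens it; the format agent's negative gain leaves it nearly muted.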
The authors prove that, under standard assumptions (unbiased pseudo‑gradient estimates, bounded variance, and proper learning‑rate schedule), MAPGD inherits the convergence rate of stochastic gradient descent, O(1/√T), and converges almost surely to a local optimum.
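Under those assumptions the guarantee mirrors the standard non-convex SGD bound; a sketch of the statement (using my own notation, with f the task loss and p_t the prompt iterate) is:

```latex
% Assumptions: \mathbb{E}[g_t] = \nabla f(p_t) (unbiased pseudo-gradients),
% \mathbb{E}\!\left[\|g_t - \nabla f(p_t)\|^2\right] \le \sigma^2 (bounded variance),
% step size \eta_t \propto 1/\sqrt{t}.
\min_{1 \le t \le T} \; \mathbb{E}\!\left[\|\nabla f(p_t)\|^2\right]
  \;=\; O\!\left(\tfrac{1}{\sqrt{T}}\right)
```

That is, the best iterate's expected gradient norm shrinks at the usual stochastic-approximation rate, which is what licenses the almost-sure convergence claim.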
Empirical evaluation spans classification tasks (GLUE, SuperGLUE) and reasoning benchmarks (e.g., GSM‑8K). MAPGD consistently outperforms the single‑agent baseline ProTeGi and random baselines, achieving 1.2–3.5 percentage‑point gains in accuracy while using fewer API calls. Ablation studies show that removing HCGC leads to entangled conflicting gradients and a noticeable drop in performance; removing CAAW causes noisy agents to dominate updates, resulting in unstable training.
Limitations include the fixed set of agent roles (no dynamic role creation or negotiation) and reliance on the quality of the underlying semantic encoder, which may degrade clustering quality in domain‑specific settings lacking appropriate embeddings. Future work could explore adaptive role discovery, richer inter‑agent communication protocols, and cost‑aware budgeting strategies.
In summary, MAPGD offers a principled, gradient‑inspired, multi‑agent framework for prompt optimization that balances exploration across diverse refinement dimensions with robust conflict resolution and adaptive weighting. It preserves interpretability by operating directly in natural‑language space while delivering theoretical convergence guarantees and practical performance improvements, marking a significant step forward for scalable, automated prompt engineering.