SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly
Recent advancements have increasingly focused on leveraging large language models (LLMs) to construct autonomous agents for complex problem-solving tasks. However, existing approaches predominantly employ a single-agent framework to generate search branches and estimate rewards during Monte Carlo Tree Search (MCTS) planning. This single-agent paradigm inherently limits exploration capabilities, often resulting in insufficient diversity among generated branches and suboptimal planning performance. To overcome these limitations, we propose Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly (SYMPHONY), a novel multi-agent planning framework that integrates a pool of heterogeneous language model-based agents. By leveraging diverse reasoning patterns across agents, SYMPHONY enhances rollout diversity and facilitates more effective exploration. Empirical results across multiple benchmark tasks show that SYMPHONY achieves strong performance even when instantiated with open-source LLMs deployable on consumer-grade hardware. When enhanced with cloud-based LLMs accessible via API, SYMPHONY demonstrates further improvements, outperforming existing state-of-the-art baselines and underscoring the effectiveness of heterogeneous multi-agent coordination in planning tasks.
💡 Research Summary
The paper introduces SYMPHONY, a novel multi‑agent planning framework that couples a heterogeneous pool of large language models (LLMs) with Monte Carlo Tree Search (MCTS). Existing LLM‑based planners typically rely on a single model, repeatedly sampled to generate roll‑outs. Because a single model’s stochasticity often yields highly similar outputs, the search tree suffers from low branch diversity, leading to sub‑optimal exploration and higher computational cost.
SYMPHONY addresses this by assembling multiple LLMs—both open‑source models that run on consumer hardware and cloud‑based APIs—each possessing distinct pre‑training data, architectural biases, and reasoning styles. The system treats each model as an independent agent. At every MCTS rollout step, an Upper Confidence Bound (UCB)‑driven scheduler selects an agent based on two statistics: the cumulative utility (average reward) Q̄ and the number of times the agent has been invoked N. The UCB formula Q̄ + α·√(ln N_total / (N + 1)) balances exploitation of historically strong agents with exploration of under‑used ones, guaranteeing a lower expected error than deterministic single‑agent selection (theoretical proof in Appendix B).
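The scheduling rule above can be sketched in a few lines. This is a minimal illustration of the stated UCB formula, not the paper's implementation: the agent labels, reward scale, and the `+1` smoothing for unvisited agents are assumptions made here for a self-contained example.

```python
import math


class UCBAgentScheduler:
    """UCB-driven selection over a pool of heterogeneous agents.

    Sketch of the scheduler described in the summary: each agent keeps
    a visit count N and a cumulative reward, and selection maximizes
    Q_bar + alpha * sqrt(ln N_total / (N + 1)).
    """

    def __init__(self, agents, alpha=1.0):
        self.agents = list(agents)
        self.alpha = alpha                       # exploration weight α
        self.counts = {a: 0 for a in agents}     # N: times agent invoked
        self.rewards = {a: 0.0 for a in agents}  # cumulative reward

    def select(self):
        # N_total starts at 1 so ln is defined before any updates.
        total = sum(self.counts.values()) + 1

        def ucb(agent):
            n = self.counts[agent]
            q_bar = self.rewards[agent] / n if n else 0.0  # average utility Q̄
            # (N + 1) in the denominator keeps unvisited agents finite
            # while still strongly favoring their exploration.
            return q_bar + self.alpha * math.sqrt(math.log(total) / (n + 1))

        return max(self.agents, key=ucb)

    def update(self, agent, reward):
        """Record the rollout reward obtained by the selected agent."""
        self.counts[agent] += 1
        self.rewards[agent] += reward
```

With this rule, an agent that keeps earning high rewards is picked more often, but a rarely used agent's exploration bonus eventually forces the scheduler to try it again.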
To improve value estimation, SYMPHONY introduces Entropy‑Modulated Confidence Scoring (EMCS). After an agent generates a rollout, the token‑level entropy of its output is computed; low‑entropy (high‑confidence) predictions receive larger weight in the back‑propagation of value, while high‑entropy (uncertain) predictions are down‑weighted. This mitigates over‑confidence in noisy LLM simulations and yields more stable node evaluations.
A further innovation is pool‑wise memory sharing via natural‑language reflections. When a trajectory fails, the selected agent writes a structured reflection summarizing the failure. This reflection is broadcast to all agents and stored in a fixed‑size FIFO buffer. Subsequent prompts incorporate the accumulated reflections, allowing agents to adapt their behavior without any parameter updates. The authors show that this lightweight “knowledge‑sharing” reduces repeated mistakes and improves coordination.
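The shared memory can be sketched as a bounded FIFO of reflection strings prepended to every agent's prompt. The buffer capacity and the prompt-prefix wording below are illustrative assumptions; the summary specifies only a fixed-size FIFO buffer broadcast to all agents.

```python
from collections import deque


class ReflectionMemory:
    """Fixed-size FIFO buffer of natural-language failure reflections
    shared across the whole agent pool (sketch; prompt format assumed)."""

    def __init__(self, capacity=8):
        # deque with maxlen silently evicts the oldest entry when full,
        # matching the FIFO discard behavior described in the summary.
        self.buffer = deque(maxlen=capacity)

    def broadcast(self, reflection: str):
        """Store a reflection written after a failed trajectory."""
        self.buffer.append(reflection)

    def as_prompt_prefix(self) -> str:
        """Render accumulated reflections for inclusion in agent prompts."""
        if not self.buffer:
            return ""
        lines = "\n".join(f"- {r}" for r in self.buffer)
        return f"Lessons from previous failed attempts:\n{lines}\n\n"
```

Because the reflections travel through prompts rather than weights, every agent, including newly added ones, benefits immediately without any parameter updates.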
Experiments span three benchmark domains: multi‑hop question answering (HotpotQA), sequential decision‑making (WebShop), and code generation (MBPP). Using only open‑source LLMs (e.g., Llama‑2, Mistral), SYMPHONY outperforms strong single‑model baselines by 3–5 percentage points in accuracy while requiring roughly 30 % fewer MCTS node expansions. Adding cloud APIs (e.g., GPT‑4o) further lifts performance by 7–10 pp, surpassing the current state‑of‑the‑art. Ablation studies confirm that removing UCB scheduling, EMCS, or the reflection memory each leads to substantial degradation, underscoring the necessity of all components.
The paper also discusses limitations: larger agent pools increase API latency and token‑cost, and the FIFO reflection buffer may discard older but still valuable insights. Future work is suggested on cost‑aware agent sampling, more sophisticated memory management (e.g., importance‑based retention), and extending SYMPHONY to real‑time, non‑textual environments such as robotics.
In summary, SYMPHONY demonstrates that heterogeneous LLM collaboration, when tightly integrated with principled search (MCTS) and adaptive mechanisms (UCB, entropy‑modulated scoring, shared reflections), can dramatically enhance exploration diversity, planning efficiency, and overall task performance. This work paves the way for scalable, robust LLM‑driven autonomous agents that no longer rely on a single monolithic model but instead harness the complementary strengths of a diverse model ecosystem.