Context Learning for Multi-Agent Discussion
Multi-Agent Discussion (MAD), in which multiple LLM instances collaboratively solve problems via structured discussion, has recently garnered increasing attention. However, we find that current MAD methods easily suffer from discussion inconsistency, where LLMs fail to reach a coherent solution due to the misalignment between their individual contexts. In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions at each discussion round via automatic information organization and refinement. Specifically, inspired by our theoretical insights on context instructions, M2CL trains the generators to control context coherence and output discrepancies via a carefully crafted self-adaptive mechanism. This enables LLMs to avoid premature convergence on majority noise and to progressively reach the correct consensus. We evaluate M2CL on challenging tasks, including academic reasoning, embodied tasks, and mobile control. The results show that M2CL significantly surpasses existing methods by 20%–50%, while enjoying favorable transferability and computational efficiency.
💡 Research Summary
The paper addresses a critical weakness in Multi‑Agent Discussion (MAD) systems: the tendency of multiple large language models (LLMs) to diverge or converge prematurely on incorrect “majority noise” because their static, hand‑crafted context prompts are misaligned. The authors propose Multi‑LLM Context Learning (M2CL), a framework that equips each LLM agent with a learnable context generator. This generator produces a fresh context instruction at every discussion round, conditioned on the task goal, the agent’s initial prompt, and the concatenated responses from all other agents in the previous round.
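The data flow just described, where each generator consumes the task goal, the agent's initial prompt, and the other agents' previous-round responses, can be sketched as follows. All names are hypothetical, and the learned generator is stubbed as a simple string template purely to illustrate the per-round wiring:

```python
# Sketch of the per-round context-generation loop described above.
# `generate_context` stands in for M2CL's learned generator; here it is
# a plain template, used only to show what inputs it conditions on.

def generate_context(task_goal, initial_prompt, peer_responses):
    """Build a round-specific context instruction for one agent."""
    peers = " | ".join(peer_responses) if peer_responses else "none yet"
    return (f"Goal: {task_goal}\n"
            f"Your role: {initial_prompt}\n"
            f"Peer responses last round: {peers}")

def discussion_round(task_goal, initial_prompts, last_responses):
    """Return one fresh context instruction per agent for this round."""
    contexts = []
    for i, prompt in enumerate(initial_prompts):
        # Each agent sees every response except its own from last round.
        peers = [r for j, r in enumerate(last_responses) if j != i]
        contexts.append(generate_context(task_goal, prompt, peers))
    return contexts

ctxs = discussion_round(
    "Solve 17 * 23",
    ["You are a careful verifier.", "You are a fast estimator."],
    ["I think 391.", "Roughly 400."],
)
```

Note that because the LLM weights stay frozen, all adaptation in the real method happens inside the generator that this stub replaces.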
The theoretical contribution is Theorem 4.1, which formalizes how attention activations depend on context. It shows that the sum of distances between each agent's activation and the correct activation can be bounded by (a) the inter‑agent activation divergence plus the deviation from the initial context, and (b) a term that depends only on the initial contexts. The theorem implies two design principles: (i) initial contexts should be diverse and approximately orthogonal in latent space to provide a comprehensive basis, and (ii) contexts must evolve during discussion to reduce inter‑agent discrepancies while preserving coherence.
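One plausible shape for the bound described above, written with hypothetical notation (the summary does not give the exact form): let $a_i$ be agent $i$'s attention activation, $a^*$ the correct activation, and $c_i^{0}$ the initial contexts, with activations $a_i^{0}$ induced by them.

```latex
\sum_{i=1}^{N} \bigl\| a_i - a^{*} \bigr\|
  \;\le\;
  \underbrace{\sum_{i < j} \bigl\| a_i - a_j \bigr\|}_{\text{inter-agent divergence}}
  \;+\;
  \underbrace{\sum_{i=1}^{N} \bigl\| a_i - a_i^{0} \bigr\|}_{\text{deviation from initial context}}
  \;+\;
  \underbrace{C\bigl(c_1^{0}, \dots, c_N^{0}\bigr)}_{\text{depends only on initial contexts}}
```

Under a bound of this shape, diverse near-orthogonal initial contexts shrink the last term, while evolving contexts during discussion shrink the first two, matching the two design principles.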
M2CL implements these principles in two stages. First, a lightweight initialization samples diverse prompts by clustering pre‑trained embeddings and selecting near‑orthogonal vectors. Second, a self‑adaptive balancing mechanism monitors two metrics each round: (a) context coherence (similarity of attention activations) and (b) output discrepancy (embedding distance between agents’ answers). When coherence becomes too high, indicating possible premature consensus, the mechanism deliberately diversifies the generated contexts; when output discrepancy remains large, it reinforces information fusion. The context generator is trained with a reinforcement‑learning style reward that combines answer correctness, reduced activation divergence, and minimized output distance. Importantly, the LLM parameters themselves remain frozen, so the overhead is limited to prompt‑level computation.
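The round-by-round check behind the self-adaptive balancing mechanism can be sketched in a few lines. The thresholds, function names, and the use of cosine similarity here are illustrative assumptions; the paper's actual metrics operate on attention activations and answer embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all pairs of agents."""
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

def balancing_action(activations, answer_embeddings,
                     coherence_hi=0.95, agreement_lo=0.5):
    """Decide how to steer the next round's generated contexts.

    - Very high activation coherence suggests possible premature
      consensus: diversify the contexts.
    - Low answer agreement (large output discrepancy) suggests the
      agents still disagree: reinforce information fusion.
    """
    coherence = mean_pairwise_similarity(activations)
    agreement = mean_pairwise_similarity(answer_embeddings)
    if coherence > coherence_hi:
        return "diversify"
    if agreement < agreement_lo:
        return "fuse"
    return "continue"
```

In the real method this decision would feed the generator's training signal, alongside the reward combining answer correctness, reduced activation divergence, and minimized output distance.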
Experiments span nine challenging benchmarks, including multi‑step mathematical proofs, scientific article summarization, embodied robot manipulation, and mobile GUI control, and involve three state‑of‑the‑art LLMs (GPT‑4, Claude‑2, LLaMA‑2‑70B). Across all settings, M2CL outperforms prior MAD methods such as Debate, Self‑Consistency, and Auto‑Debate by 20%–50% absolute accuracy, with the largest gains (≈45%) on complex GUI tasks. The additional runtime per round is only ~0.12 seconds, yielding a total overhead below 10%. Moreover, the learned context generators transfer to different LLM architectures without retraining, delivering an extra 5%–7% boost, demonstrating model‑agnostic generality.
The paper’s contributions are threefold: (1) a principled, theoretically‑grounded analysis of how context influences multi‑LLM reasoning; (2) a practical, scalable method for dynamically generating and adapting contexts that mitigates both divergence and premature convergence; and (3) extensive empirical validation showing substantial performance improvements with modest computational cost. Limitations include the current focus on text‑only contexts and the linear growth of context generators with the number of rounds, suggesting future work on multimodal extensions and more memory‑efficient context compression. Overall, M2CL represents a significant step toward reliable, collaborative AI systems where multiple LLMs can reason together coherently and efficiently.