Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup

February 23, 2026

Reading time: 6 minutes

...

📝 Original Info

Title: Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup
ArXiv ID: 2512.06256
Date: 2025-12-06
Authors: Aniruddha Maiti, Satya Nimmagadda, Kartha Veerya Jammuladinne, Niladri Sengupta, Ananya Jana

📝 Abstract

In this work, we report what happens when two large language models respond to each other for many turns without any outside input in a multi-agent setup. The setup begins with a short seed sentence. After that, each model reads the other's output and generates a response. This continues for a fixed number of steps. We used Mistral Nemo Base 2407 and Llama 2 13B hf. We observed that most conversations start coherently but later fall into repetition. In many runs, a short phrase appears and repeats across turns. Once repetition begins, both models tend to produce similar output rather than introducing a new direction in the conversation. This leads to a loop where the same or similar text is produced repeatedly. We describe this behavior as a form of convergence. It occurs even though the models are large, trained separately, and not given any prompt instructions. To study this behavior, we apply lexical and embedding-based metrics to measure how far the conversation drifts from the initial seed and how similar the outputs of the two models become as the conversation progresses.
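The turn-taking loop and a lexical drift measure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `run_exchange`, `make_toy_model`, `jaccard`, `drift_from_seed`, and `turn_to_turn_similarity` are hypothetical, the toy generators stand in for Mistral Nemo Base 2407 and Llama 2 13B hf, and Jaccard overlap on word sets is an assumed example of a lexical metric, since this excerpt does not name the exact metrics used.

```python
import re
from typing import Callable, List

# Hypothetical interface: a "model" maps the other side's last output to a reply.
Model = Callable[[str], str]


def run_exchange(seed: str, model_a: Model, model_b: Model, num_turns: int) -> List[str]:
    """Alternate two models for a fixed number of turns, starting from a seed.

    Each model sees only the plain text of the previous turn; there is no
    shared memory and no system prompt.
    """
    transcript = [seed]
    for turn in range(num_turns):
        speaker = model_a if turn % 2 == 0 else model_b
        transcript.append(speaker(transcript[-1]))
    return transcript


def make_toy_model(suffix: str, keep: int = 3) -> Model:
    """Toy stand-in for an LLM (assumption): echo the last few words plus a suffix."""
    def generate(text: str) -> str:
        return " ".join(text.split()[-keep:] + [suffix])
    return generate


def _tokens(text: str) -> set:
    """Lowercased word set used by the lexical metric."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap of word sets: 1.0 means identical vocabulary."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def drift_from_seed(transcript: List[str]) -> List[float]:
    """Lexical distance of every generated turn from the seed (0 = same words)."""
    return [1.0 - jaccard(transcript[0], turn) for turn in transcript[1:]]


def turn_to_turn_similarity(transcript: List[str]) -> List[float]:
    """Similarity between consecutive generated turns, i.e. between the two models."""
    generated = transcript[1:]
    return [jaccard(prev, cur) for prev, cur in zip(generated, generated[1:])]


if __name__ == "__main__":
    transcript = run_exchange(
        "The weather is nice today",
        make_toy_model("again"),
        make_toy_model("indeed"),
        num_turns=6,
    )
    for i, text in enumerate(transcript):
        print(i, text)
    print("drift from seed:", drift_from_seed(transcript))
    print("turn-to-turn similarity:", turn_to_turn_similarity(transcript))
```

With these toy generators the exchange quickly settles into a two-phrase loop, so the turn-to-turn similarity rises to 1.0 while the drift from the seed also reaches 1.0. In a real run, each `Model` would wrap a separate inference process, and an embedding-based similarity could replace `jaccard` without changing the loop.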

📄 Full Content

Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup

Aniruddha Maiti1[0000-0002-1142-6344], Satya Nimmagadda2, Kartha Veerya Jammuladinne1, Niladri Sengupta3, and Ananya Jana2

1 West Virginia State University, Institute, WV 25112 {aniruddha.maiti, kjammuladinne}@wvstateu.edu
2 Marshall University, Huntington, WV {jana, nimmagadda2}@marshall.edu
3 Fractal Analytics Inc., USA dinophysicsiitb@gmail.com

arXiv:2512.06256v1 [cs.CL] 6 Dec 2025

Abstract. In this work, we report what happens when two large language models respond to each other for many turns without any outside input in a multi-agent setup. The setup begins with a short seed sentence. After that, each model reads the other's output and generates a response. This continues for a fixed number of steps. We used Mistral Nemo Base 2407 and Llama 2 13B hf. We observed that most conversations start coherently but later fall into repetition. In many runs, a short phrase appears and repeats across turns. Once repetition begins, both models tend to produce similar output rather than introducing a new direction in the conversation. This leads to a loop where the same or similar text is produced repeatedly. We describe this behavior as a form of convergence. It occurs even though the models are large, trained separately, and not given any prompt instructions. To study this behavior, we apply lexical and embedding-based metrics to measure how far the conversation drifts from the initial seed and how similar the outputs of the two models become as the conversation progresses.

Keywords: Convergence in Multi-Agent, Agentic Conversation, Multi-Agent, Multi-LLM Interaction.

1 Introduction

Most evaluations of large language models rely on short prompts or single-turn completions. These tests measure correctness, fluency, or instruction-following capability in isolation. They do not reveal what happens when a model must respond to its own output or engage in a long exchange with other models. Prior work indicates that dialogue diversity tends to degrade over long-term simulations [6]. This suggests that the behavior of language models over time may expose failure modes that are not visible in one-step settings.

This work extends that idea to a two-model setup. Instead of using a single model recursively, we allow two different models to take turns responding to each other. Each model runs in its own process, with separate weights and tokenizers, and reads only the plain-text output of the other. There is no shared memory, no prompts, and no injected system instructions. The setup is minimal: the models are connected only through raw text files.

The models used are Mistral Nemo Base 2407 and Llama 2 13B hf. The original 12B-parameter Mistral model was developed by Mistral AI and NVIDIA. Since its release, users have developed a variety of fine-tuned versions of this base model for different use cases. The original 13-billion-parameter Llama 2 model was released by Meta. As with Mistral, users and developers have fine-tuned or adapted the original base model for a variety of purposes. Both models are large autoregressive transformers with different training sources and configurations. Their architectural similarity and their differences in training data make them a useful pair for studying how agent-like conversation progresses when they respond alternately based on the other model's output.

This setting raises a natural question: when two models interact only through language, how long can the conversation remain meaningful? Will they maintain topic and coherence, or will they collapse into repetition? If so, when and how does that collapse occur?

While exploring these questions, we found that model pairs begin with coherent dialogue, but often drift into repetition after a few turns. In some cases, the collapse is gradual. In others, it is sudden and marked by a repeated phrase. Once repetition sets in, it tends to persist in the following turns. This behavior appears even though the models are large, capable, and not aligned through any shared context or fine-tuning.

Studying this interaction gives insight into the stability and limits of generative systems. It also helps identify convergence behaviors that are easy to miss in prompt-based evaluations. Understanding how and when models fall into low-diversity states is important for tasks that involve long-form generation or multi-agent communication.

2 Related Work

Research on multi-agent large language models (LLMs) has expanded rapidly in the past two years. This is a recent trend that investigates how several LLMs interact. Earlier studies differed in goals, but most of them examined how multiple models exchanged messages, reasoned together, or produced stable output. The main lines of work are: frameworks for multi-agent interaction, evaluation methods, dialogue stability, and co

…(Full text truncated)…
