Tacit Coordination of Large Language Models

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the [Original Paper Viewer] below or the original arXiv source.

In tacit coordination games with multiple outcomes, purely rational solution concepts, such as Nash equilibria, provide no guidance for which equilibrium to choose. Schelling’s theory explains how, in these settings, humans coordinate by relying on focal points: solutions or outcomes that naturally arise because they stand out in some way as salient or prominent to all players. This work studies Large Language Models (LLMs) as players in tacit coordination games, and addresses how, when, and why focal points emerge. We compare and quantify the coordination capabilities of LLMs in cooperative and competitive games for which human experiments are available. We also introduce several learning-free strategies to improve the coordination of LLMs, with themselves and with humans. On a selection of heterogeneous open-source models, including Llama, Qwen, and GPT-oss, we discover that LLMs have a remarkable capability to coordinate and often outperform humans, yet fail on common-sense coordination that involves numbers or nuanced cultural archetypes. This paper constitutes the first large-scale assessment of LLMs’ tacit coordination within the theoretical and psychological framework of focal points.


💡 Research Summary

The paper investigates how large language models (LLMs) behave as agents in tacit coordination games, using Thomas Schelling’s focal‑point theory as a formal framework. In games with multiple Nash equilibria, pure rationality offers no guidance on which equilibrium will be selected. The authors therefore introduce a salience function S that assigns a non‑negative score to each equilibrium, and map these scores to a probability distribution over equilibria via a soft‑max with inverse temperature β. When β=0 the model chooses uniformly at random; as β→∞ the model deterministically picks the most salient equilibrium.
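The salience‑to‑probability mapping described above can be sketched as follows. This is a minimal illustration of the soft‑max construction (the function name and example scores are our own, not from the paper):

```python
import numpy as np

def focal_point_distribution(salience, beta):
    """Map non-negative salience scores to a choice distribution via soft-max.

    beta = 0 yields the uniform distribution; large beta concentrates
    probability on the most salient equilibrium.
    """
    logits = beta * np.asarray(salience, dtype=float)
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Three equilibria with illustrative salience scores 1.0, 2.0, 4.0
print(focal_point_distribution([1.0, 2.0, 4.0], beta=0.0))   # uniform over the three
print(focal_point_distribution([1.0, 2.0, 4.0], beta=10.0))  # nearly all mass on the third
```

The two limiting cases match the text: at β=0 every equilibrium gets probability 1/3, while at large β the distribution collapses onto the highest‑salience equilibrium.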

Two theoretical conditions for the existence of a unique focal‑point equilibrium are proved. First, if all players share the same salience function and a tiny i.i.d. noise η is added to break ties, the equilibrium with the highest noisy salience is unique with probability one. Second, even when players have different salience functions, a symmetry group Γ that partitions equilibria into orbits can guarantee a unique focal point if the players agree on a common ordering of the average salience across orbits. These results formalise the intuition that shared cultural or structural cues can collapse the multiplicity of equilibria into a single, “obvious” outcome.

To evaluate whether LLMs exhibit such behaviour, the authors replicate the Amsterdam and Nottingham human experiments originally reported by Bardsley et al. (2010). In those studies participants answered 14 multiple‑choice questions under three instructions: “pick” (no guidance), “guess” (estimate what a random partner would choose), and “coordinate” (explicitly aim for tacit coordination). Human performance is summarized using the Coordination Index (CI) – the probability that two randomly selected participants pick the same answer – and its normalized version (NCI), which scales CI by the number of available options.
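The CI and NCI described above can be computed directly from a list of observed answers. A minimal sketch, assuming NCI is CI rescaled by the option count so that uniform guessing scores about 1 (the paper’s exact normalization may differ):

```python
from collections import Counter

def coordination_index(answers, unbiased=True):
    """Probability that two randomly chosen participants gave the same answer."""
    n = len(answers)
    counts = Counter(answers).values()
    if unbiased:
        # draw two distinct participants without replacement
        return sum(c * (c - 1) for c in counts) / (n * (n - 1))
    # draw with replacement (plug-in estimate)
    return sum((c / n) ** 2 for c in counts)

def normalized_ci(answers, n_options, unbiased=True):
    """NCI: CI scaled by the number of options; uniform guessing scores ~1."""
    return n_options * coordination_index(answers, unbiased)

answers = ["heads", "heads", "heads", "tails"]
print(coordination_index(answers))  # 0.5
```

With three of four participants choosing “heads”, 6 of the 12 ordered participant pairs agree, giving CI = 0.5.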

The empirical study tests a heterogeneous suite of open‑source LLMs, including Meta Llama 3 (70B) and its 3.1 and 3.3 variants, Qwen 2/2.5 (72B), and GPT‑oss (20B and 120B). For each question the model is prompted 30 times under each of three random permutations of the answer order, yielding 90 trials per item. Results (see Figure 4) show that most LLMs achieve NCI scores comparable to, and often exceeding, those of human participants, especially under the “coordinate” instruction. This suggests that LLMs can implicitly infer the salient answer by leveraging the statistical regularities encoded during pre‑training, effectively performing a form of Theory‑of‑Mind reasoning without explicit training.
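The per‑item evaluation protocol (30 samples × 3 answer orderings = 90 trials) can be sketched as a driver loop. `query_model` is a hypothetical callable standing in for whatever LLM client is used; it takes the question and an ordered option list and returns one option:

```python
import random
from collections import Counter

def run_item(question, options, query_model, n_samples=30, n_orders=3, seed=0):
    """Collect 90 answers per item: n_samples for each of n_orders random orderings.

    query_model(question, options) is an assumed interface, not from the paper.
    """
    rng = random.Random(seed)
    answers = []
    for _ in range(n_orders):
        order = options[:]
        rng.shuffle(order)                    # fresh permutation per ordering
        for _ in range(n_samples):
            answers.append(query_model(question, order))
    return Counter(answers)                   # answer frequencies feed the CI/NCI
```

The resulting answer frequencies per item are what the CI and NCI statistics are computed from.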

Nevertheless, the models display systematic failures on tasks that rely on simple numeric salience (e.g., picking “1, 50, 100” in a 1‑100 number game) and on questions that require nuanced cultural archetypes. In these cases the LLMs’ choices are scattered, leading to lower NCI than humans. To mitigate such gaps, the authors propose three learning‑free interventions:

  1. Saliency Prompt – prepend a meta‑instruction such as “choose the answer that most people would consider obvious”.
  2. Answer Ordering Randomization – shuffle the list of options for each trial to eliminate positional bias.
  3. Self‑Consistency Voting – generate multiple answers per item and select the most frequent one (majority vote); since temperature‑zero decoding is deterministic, the diversity across samples comes from the randomized answer orderings.
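The three interventions compose naturally into a single decision procedure. A minimal sketch, assuming the same hypothetical `query_model` interface as above; the exact prompt wording is illustrative, not the paper’s:

```python
import random
from collections import Counter

# Intervention 1: a meta-instruction nudging the model toward the obvious answer
SALIENCY_PROMPT = ("You and an unseen partner win only if you choose the same option. "
                   "Choose the answer that most people would consider obvious.")

def coordinate(question, options, query_model, n_trials=9, seed=0):
    """Combine saliency prompting, order randomization, and majority voting."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_trials):
        order = options[:]
        rng.shuffle(order)                            # intervention 2: kill positional bias
        prompt = f"{SALIENCY_PROMPT}\n\n{question}"   # intervention 1: saliency prompt
        votes[query_model(prompt, order)] += 1
    return votes.most_common(1)[0][0]                 # intervention 3: self-consistency vote
```

Because the final answer is a majority vote over shuffled orderings, occasional position‑driven picks are averaged away.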

Applying these techniques reduces the average NCI gap to under 5 % relative to human performance, demonstrating that simple prompt engineering can substantially improve tacit coordination without any fine‑tuning.

The paper also discusses limitations. The salience function S is not directly observable inside the model; the analysis relies on external prompts as proxies. Cultural bias remains a concern: models trained predominantly on English‑centric data tend to favour Western salient cues, hurting performance on culturally diverse items. Moreover, the experiments focus on two‑player, one‑shot games, leaving open the question of how LLMs would coordinate in multi‑agent, multi‑round settings.

Future work is outlined along three axes: (i) extracting or learning an internal salience estimator from model activations, (ii) extending the framework to many‑agent and repeated‑interaction games, and (iii) designing meta‑learning or reinforcement‑learning schemes that explicitly optimise for focal‑point selection.

In sum, this study provides the first large‑scale, theory‑grounded assessment of LLMs’ ability to engage in tacit coordination. It shows that, despite being trained for language prediction, modern LLMs already possess a surprisingly human‑like capacity to converge on focal points, and that modest, learning‑free prompt strategies can bridge the remaining gaps. The work connects game‑theoretic concepts with contemporary AI, opening a pathway for designing more socially aware, cooperative AI agents.

