AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue
Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a game-theoretic framework that probes the asymmetry between information extraction (active deduction) and information containment (state maintenance) in dialogue. We propose two complementary tasks: AIDG-I, measuring pragmatic strategy in social deduction, and AIDG-II, measuring constraint satisfaction in a structured “20 Questions” setting. Across 439 games with six frontier LLMs, we observe a clear capability asymmetry: models perform substantially better at containment than deduction, with a 350 ELO advantage on defense (Cohen’s d = 5.47). We identify two bottlenecks driving this gap: (1) Information Dynamics, where confirmation strategies are 7.75x more effective than blind deduction (p < 0.00001), and (2) Constraint Adherence, where instruction-following degrades under conversational load, accounting for 41.3% of deductive failures. These findings suggest that while LLMs excel at local defensive coherence, they struggle with the global state tracking required for strategic inquiry.
💡 Research Summary
The paper introduces the Adversarial Information Deduction Game (AIDG), a game‑theoretic evaluation framework designed to probe the asymmetry between information extraction (active deduction) and information containment (state maintenance) in multi‑turn dialogues with large language models (LLMs). Two complementary tasks are defined: AIDG‑I, a free‑form social‑deduction game where a “Holder” protects a natural‑language secret while a “Seeker” tries to infer it, and AIDG‑II, a constrained “20 Questions” style game where the Holder can only answer with “yes”, “no”, or “maybe”.
In AIDG‑I, 20 realistic facts serve as secrets. The Seeker operates in two modes: (A) Confirmation, where it starts with a hypothesis close to the secret and seeks confirmation, and (B) Blind Deduction, where it begins with maximal uncertainty and reduces entropy through structured questioning. An automated Arbiter classifies leaks into explicit, confirmatory, semantic paraphrase, or implicit admission.
AIDG‑II uses a closed ontology of 100 concrete nouns. The Holder’s output is limited to the trinary token set, forcing defensive strategies into a simple signal. The Seeker must ask property‑based queries and may only make a final guess at the end. A turn‑decay multiplier M(t) = (17‑t)/8 rewards early successful guesses and penalizes inefficient search.
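The decay schedule can be read straight off the formula; a minimal sketch (the unit-score scaling around it is an illustrative assumption, not part of the paper):

```python
def turn_decay(t: int) -> float:
    """Turn-decay multiplier M(t) = (17 - t) / 8 from AIDG-II.

    A successful guess on turn 1 earns M(1) = 2.0; the multiplier
    falls linearly and reaches 0 at turn 17, so very late guesses
    contribute nothing.
    """
    return (17 - t) / 8

# Hypothetical use: scale a unit win score by the turn of the final guess.
for t in (1, 9, 17):
    print(t, 1.0 * turn_decay(t))  # 2.0 at t=1, 1.0 at t=9, 0.0 at t=17
```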
Six frontier LLMs—GPT‑5, Gemini‑2.5‑Flash, Qwen3‑235B, DeepSeek‑V3.1, Llama‑4‑Maverick, and Granite‑3.3‑8B—were evaluated across 439 games (289 AIDG‑I, 150 AIDG‑II) in a round‑robin tournament with five independent random seeds. All player models generated at temperature 0.0 and the Arbiter at 0.01 to keep behavior effectively deterministic. The dataset comprises 4,403 dialogue turns (average depth 9.5 for AIDG‑I, 11.1 for AIDG‑II).
Results show a pronounced defense advantage: the Holder role outperforms the Seeker role by an average of 350 ELO points (Cohen’s d = 5.47, p < 0.0001). Across all games, Holders won 86.6% of matches (380/439). Defensive ELO scores have a very small standard deviation (σ = 1.9), indicating consistent performance across architectures, whereas offensive ELO scores vary widely (σ = 53.3).
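Cohen’s d here is the mean ELO difference divided by the pooled standard deviation; a generic sketch (the sample vectors are illustrative placeholders, not the paper’s data):

```python
import statistics

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: difference of means over the pooled standard deviation."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Illustrative only: two groups whose means differ by one pooled SD give d = 1.
print(cohens_d([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))  # 1.0
```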
A key finding is the superiority of confirmation‑based attacks: Mode A yields a 21.9% success rate, whereas blind deduction (Mode B) succeeds only 3.5% of the time, an odds ratio of 7.75 (p < 0.00001). This demonstrates that even partial prior knowledge dramatically boosts extraction efficiency.
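The reported odds ratio can be recovered from the two published success rates (the small gap to 7.75 is rounding in the rates themselves):

```python
def odds_ratio(p_a: float, p_b: float) -> float:
    """Odds ratio between two success probabilities."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

# Mode A (confirmation, 21.9%) vs. Mode B (blind deduction, 3.5%).
print(round(odds_ratio(0.219, 0.035), 2))  # 7.73, matching the reported 7.75 up to rounding
```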
In AIDG‑II, the primary failure mode is constraint violation: 41.3% of games end with the Seeker disqualified for breaching the “no direct guessing” rule, with model‑specific disqualification rates ranging from 0% to 72%. This highlights that instruction‑following degrades under conversational load, especially when lexical constraints are imposed.
The authors interpret these findings as evidence that current LLMs excel at local defensive coherence (maintaining a consistent secret state) but struggle with global state tracking required for strategic inquiry over many turns. Two bottlenecks are identified: (1) Information Dynamics—confirmation strategies are far more effective than blind deduction, indicating a weakness in hypothesis generation and entropy reduction; (2) Constraint Adherence—instruction following collapses under multi‑turn pressure, accounting for a large share of deductive failures.
To address these gaps, the paper suggests (i) strengthening long‑term memory and state‑tracking mechanisms, (ii) multi‑turn prompt engineering or meta‑prompting to guide strategic reasoning, and (iii) incorporating constraint‑aware objectives via reinforcement learning from human feedback (RLHF) or rule‑based reward shaping. The authors also release the full dataset, interaction logs, and analysis scripts to promote reproducibility and encourage future work on multi‑turn strategic evaluation of LLMs.