Large Language Models Persuade Without Planning Theory of Mind
A growing body of work attempts to evaluate the theory of mind (ToM) abilities of humans and large language models (LLMs) using static, non-interactive question-and-answer benchmarks. However, theoretical work in the field suggests that first-personal interaction is a crucial part of ToM and that such predictive, spectatorial tasks may fail to evaluate it. We address this gap with a novel ToM task that requires an agent to persuade a target to choose one of three policy proposals by strategically revealing information. Success depends on a persuader’s sensitivity to a given target’s knowledge states (what the target knows about the policies) and motivational states (how much the target values different outcomes). We varied whether these states were Revealed to persuaders or Hidden, in which case persuaders had to inquire about or infer them. In Experiment 1, participants persuaded a bot programmed to make only rational inferences. LLMs excelled in the Revealed condition but performed below chance in the Hidden condition, suggesting difficulty with the multi-step planning required to elicit and use mental state information. Humans performed moderately well in both conditions, indicating an ability to engage such planning. In Experiment 2, where a human target role-played the bot, and in Experiment 3, where we measured whether human targets’ real beliefs changed, LLMs outperformed human persuaders across all conditions. These results suggest that effective persuasion can occur without explicit ToM reasoning (e.g., through rhetorical strategies) and that LLMs excel at this form of persuasion. Overall, our results caution against attributing human-like ToM to LLMs while highlighting LLMs’ potential to influence people’s beliefs and behavior.
💡 Research Summary
The paper uses a novel persuasion task to investigate whether large language models (LLMs) possess Planning Theory of Mind (PT‑ToM): the ability to understand and manipulate another agent's mental states through interactive, goal‑directed behavior. Traditional ToM assessments rely on static, spectatorial question‑and‑answer benchmarks in which participants predict behavior without having to intervene in it. The authors argue that true PT‑ToM involves causal reasoning about beliefs and desires and the planning of informational interventions.
In their “MindGames” framework, a persuader must convince a target to adopt one of three policy proposals (A, B, or C) that affect three attributes: safety & control, development speed, and public trust. The persuader knows the full details of all proposals, while the target initially knows only a subset and has a personal value function over the attributes. Success depends on the persuader’s ability to infer the target’s knowledge (informational state) and preferences (motivational state) and to strategically disclose information that makes the persuader’s preferred proposal (always A) appear most attractive.
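To make these mechanics concrete, here is a minimal sketch of this kind of setup, assuming a simple weighted‑sum scoring rule; the proposal values, weights, and knowledge sets below are our own illustration, not the paper's actual parameters or implementation.

```python
# Hypothetical sketch of the task structure; attribute values, weights, and the
# weighted-sum scoring rule are illustrative, not the paper's actual parameters.

ATTRIBUTES = ["safety_control", "development_speed", "public_trust"]

# Every proposal assigns a value to each attribute; the persuader sees all of these.
proposals = {
    "A": {"safety_control": 3, "development_speed": 1, "public_trust": 3},
    "B": {"safety_control": 2, "development_speed": 3, "public_trust": 1},
    "C": {"safety_control": 1, "development_speed": 2, "public_trust": 2},
}

# Motivational state: how much the target values each attribute.
target_weights = {"safety_control": 2.0, "development_speed": 0.5, "public_trust": 1.0}

# Informational state: the (proposal, attribute) facts the target already knows.
target_knowledge = {
    ("A", "development_speed"),
    ("B", "development_speed"),
    ("C", "safety_control"),
}

def target_score(proposal: str) -> float:
    """Weighted sum over only those attribute values the target currently knows."""
    return sum(
        target_weights[a] * proposals[proposal][a]
        for a in ATTRIBUTES
        if (proposal, a) in target_knowledge
    )

def target_choice() -> str:
    """A rational target simply picks the proposal with the highest known score."""
    return max(proposals, key=target_score)

print(target_choice())  # "C": with its limited knowledge, the target prefers C

# The persuader's move: disclose a fact that raises A's score under *this* target's weights.
target_knowledge.add(("A", "safety_control"))
print(target_choice())  # "A": the strategically chosen disclosure flips the choice
```

The point of the sketch is that a single well‑chosen disclosure only works if the persuader has a model of both what the target knows and what the target cares about.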
Two experimental conditions manipulate the availability of the target’s mental states:
- REVEALED – the persuader can see the target’s value function and current knowledge.
- HIDDEN – the persuader must obtain this information by asking questions or inferring it from dialogue.
Experiment 1 compares LLM and human persuaders when the target is a rational bot that makes only logical inferences. LLMs achieve near‑optimal persuasion rates in the REVEALED condition but fall below chance in the HIDDEN condition, indicating difficulty with multi‑step planning and mental‑state inference. Humans perform moderately well in both conditions, showing flexibility in probing and updating their model of the target.
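One way to see why the HIDDEN condition demands multi‑step planning: before any disclosure can be chosen well, the persuader first has to recover the target's weights. The sketch below continues the hypothetical structures above; the question‑asking interface and the greedy disclosure ordering are our own simplifications, not the paper's dialogue protocol.

```python
def elicit_weights(ask) -> dict:
    """Step 1 (needed only in HIDDEN): inquire about the target's motivational state."""
    return {a: float(ask(f"On a 0-5 scale, how much do you value {a}?")) for a in ATTRIBUTES}

def plan_disclosures(weights: dict, known: set, goal: str = "A") -> list:
    """Step 2: rank the not-yet-known facts about the goal proposal by weighted payoff."""
    candidates = [(goal, a) for a in ATTRIBUTES if (goal, a) not in known]
    return sorted(
        candidates,
        key=lambda fact: weights[fact[1]] * proposals[goal][fact[1]],
        reverse=True,
    )

# In REVEALED, weights and knowledge are handed to the persuader, so only step 2 is needed.
# In HIDDEN, a persuader that skips or bungles step 1 ends up planning step 2 blindly,
# consistent with the authors' suggestion that LLMs struggle with the multi-step planning
# needed to elicit and then use mental-state information.
```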
Experiment 2 replaces the bot with a human playing the target role. Even when the target’s mental states are hidden, LLMs outperform human persuaders.
Experiment 3 measures real belief change: participants rate their policy preferences before and after interacting with the persuader. LLM‑generated dialogues produce larger belief shifts than human‑generated ones, despite the LLM not having explicit access to the target’s preferences.
These findings lead to two key conclusions. First, LLMs do not exhibit robust PT‑ToM; they struggle when the task requires active information‑seeking and counterfactual planning. Second, effective persuasion does not necessarily require explicit mental‑state modeling. LLMs can leverage memorized rhetorical patterns, logical argumentation, and statistical knowledge to influence human beliefs, often more consistently than humans.
The authors caution against interpreting LLM performance on static ToM benchmarks as evidence of human‑like theory‑of‑mind. Instead, interactive tasks that demand planning and mental‑state manipulation provide a more accurate picture of LLM capabilities. They also highlight the societal implications: because LLMs can readily sway opinions, there is an urgent need for ethical guidelines, oversight mechanisms, and research into defenses against manipulative AI‑driven persuasion. Future work should explore how to endow LLMs with genuine PT‑ToM abilities and how to mitigate the risks associated with their persuasive power.