📝 Original Info
- Title: AI Deception: Risks, Dynamics, and Controls
- ArXiv ID: 2511.22619
- Date: 2025-11-27
- Authors: Boyuan Chen, Sitong Fang, Jiaming Ji, Yanxu Zhu, Pengcheng Wen, Jinzhou Wu, Yingshui Tan, Boren Zheng, Mengying Yuan, Wenqi Chen, Donghai Hong, Alex Qiu, Xin Chen, Jiayi Zhou, Kaile Wang, Juntao Dai, Borong Zhang, Tianzhuo Yang, Saad Siddiqui, Isabella Duan, Yawen Duan, Brian Tse, Jen-Tse, Huang, Kun Wang, Baihui Zheng, Jiaheng Liu, Jian Yang, Yiming Li, Wenting Chen, Dongrui Liu, Lukas Vierling, Zhiheng Xi, Haobo Fu, Wenxuan Wang, Jitao Sang, Zhengyan Shi, Chi-Min Chan, Eugenie Shi, Simin Li, Juncheng Li, Jian Yang, Wei Ji, Dong Li, Jinglin Yang, Jun Song, Yinpeng Dong, Jie Fu, Bo Zheng, Min Yang, Yike Guo, Philip Torr, Robert Trager, Yi Zeng, Zhongyuan Wang, Yaodong Yang, Tiejun Huang, Ya-Qin Zhang, Hongjiang Zhang, Andrew Yao
📝 Abstract
As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential inevitably engage in deceptive behaviors when triggered by external conditions. Deception treatment, in turn, focuses on detecting and addressing such behaviors. On deception emergence, we analyze incentive foundations across three hierarchical levels and identify three essential capability preconditions required for deception. We further examine contextual triggers, including supervision gaps, distributional shifts, and environmental pressures. On deception treatment, we conclude detection methods covering benchmarks and evaluation protocols in static and interactive settings. Building on the three core factors of deception emergence, we outline potential mitigation strategies and propose auditing approaches that integrate technical, community, and governance efforts to address sociotechnical challenges and future AI risks. To support ongoing work in this area, we release a living resource at www.deceptionsurvey.com.
💡 Deep Analysis
Deep Dive into AI Deception: Risks, Dynamics, and Controls.
As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk across language models, AI agents, and emerging frontier systems. This project provides a comprehensive and up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We then review existing empirical studies and associated risks, highlighting deception as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment. Deception emergence reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential inevitably engage in deceptive be
📄 Full Content
AI Deception: Risks, Dynamics, and Controls
Project Team1
1The full list of Senior Advisors, Project Leaders, and Core Contributors is detailed on page 5.
# deceptionsurvey@gmail.com, www.deceptionsurvey.com
Abstract | As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to
secure self-beneficial outcomes, has evolved from a speculative concern to an empirically demonstrated risk
across language models, AI agents, and emerging frontier systems. This survey provides a comprehensive and
up-to-date overview of the AI deception field, covering its core concepts, methodologies, genesis, and potential
mitigations. First, we identify a formal definition of AI deception, grounded in signaling theory from studies
of animal deception. We then review existing empirical studies and associated risks, highlighting deception
as a sociotechnical safety challenge. We organize the landscape of AI deception research as a deception cycle,
consisting of two key components: deception emergence and deception treatment. Deception emergence
reveals the mechanisms underlying AI deception: systems with sufficient capability and incentive potential
inevitably engage in deceptive behaviors when triggered by external conditions. Deception treatment, in
turn, focuses on detecting and addressing such behaviors. On deception emergence, we analyze incentive
foundations across three hierarchical levels and identify three essential capability preconditions, namely
perception, planning, and performing, required for deception. We further examine contextual triggers, including
supervision gaps, distributional shifts, and environmental pressures. On deception treatment, we survey
detection methods spanning both external and internal analyses, covering benchmarks and evaluation protocols
in static and interactive settings. Building on the three core factors of deception emergence, we outline potential
mitigation strategies and propose auditing approaches that integrate technical, community, and governance
efforts to address sociotechnical challenges and future AI risks.
This survey concludes on key challenges and future directions in AI deception research, aiming to provide
a comprehensive and insightful review of AI deception research. To support ongoing work in this area, we
release a living resource at www.deceptionsurvey.com, continuously capturing the latest developments
and curating collections of papers, blog posts, and other resources.
One may smile, and smile, and be a villain.
— William Shakespeare
arXiv:2511.22619v2 [cs.AI] 3 Dec 2025
AI Deception: Risks, Dynamics, and Controls
Executive Summary
AI systems are increasingly capable, interactive, and embedded in sensitive workflows. With these
advances, the possibility of deception, where systems cause humans or other agents to hold false
beliefs that benefit the system, has moved from speculation to empirical reality. This survey provides
a comprehensive mapping of the AI deception field, integrating definitions, empirical taxonomy, risks,
causal mechanisms, and treatments into a unified framework.
Definition of AI Deception Although deception is conventionally associated with intent, we char-
acterize AI deception through a functional lens, referring to behaviors that mislead human or other
AI systems and yield outcomes aligned with the system’s objectives. Thus, AI deception can be
understood as a signal-based causal process in which a model, acting as the sender, produces signals
that induce the receiver to form false beliefs and respond rationally on the basis of those beliefs,
thereby yielding actual or potential benefits for the sender. Its formal elements include the sender
and the receiver, the signals and subsequent actions, the resulting utility, and the temporal dimension.
In multi-step interactions, if the trajectory of the receiver’s beliefs persistently deviates from reality in
ways that enhance the sender’s utility, the behavior constitutes sustained deception. This formulation
avoids presuppositions about the model’s intent and instead relies on a causal criterion: whether the
signals systematically induce false beliefs, alter the receiver’s behavior, and advantage the sender.
Capability
scaling
Deception
scaling
Figure 1 | The Entanglement of Intelligence and Deception. (1) The Möbius Lock: Contrary
to the view that capability and safety are opposites, advanced reasoning and deception actually
exist on the same Möbius surface. They are fundamentally linked; as AI capabilities grow, deception
becomes deeply rooted in the system. It is impossible to remove it without damaging the model’s
core intelligence. (2) The Shadow of Intelligence: Deception is not a bug or error, but an intrinsic
companion of advanced intelligence. As models expand their boundaries in complex reasoning and
intent understanding, the risk space for strategic deception exhibits non-linear, exponential growth.
(3) The Cyclic Dilemma: Mitigation strategi
…(Full text truncated)…
📸 Image Gallery
Reference
This content is AI-processed based on ArXiv data.