A Unified Definition of Hallucination: It's The World Model, Stupid!

Reading time: 6 minutes

📝 Original Info

  • Title: A Unified Definition of Hallucination: It’s The World Model, Stupid!
  • ArXiv ID: 2512.21577
  • Date: 2025-12-25
  • Authors: Emmy Liu, Varun Gangal, Chelsea Zou, Michael Yu, Xiaoqi Huang, Alex Chang, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng

📝 Abstract

Despite numerous attempts at mitigation since the inception of language models, hallucinations remain a persistent problem even in today's frontier LLMs. Why is this? We review existing definitions of hallucination and fold them into a single, unified definition wherein prior definitions are subsumed. We argue that hallucination can be unified by defining it as simply inaccurate (internal) world modeling, in a form where it is observable to the user. For example, stating a fact which contradicts a knowledge base OR producing a summary which contradicts the source. By varying the reference world model and conflict policy, our framework unifies prior definitions. We argue that this unified view is useful because it forces evaluations to clarify their assumed reference "world", distinguishes true hallucinations from planning or reward errors, and provides a common language for comparison across benchmarks and discussion of mitigation strategies. Building on this definition, we outline plans for a family of benchmarks using synthetic, fully specified reference world models to stress-test and improve world modeling components.

📄 Full Content

A Unified Definition of Hallucination: It’s The World Model, Stupid!

Emmy Liu¹†, Varun Gangal²†, Chelsea Zou³†, Michael Yu⁴†, Xiaoqi Huang⁴†, Alex Chang⁴†, Zhuofu Tao⁴†, Karan Singh³†, Sachin Kumar⁵†, Steven Y. Feng³†

† DegenAI Labs · ¹ Carnegie Mellon University · ² Patronus AI · ³ Stanford University · ⁴ Independent Researcher · ⁵ The Ohio State University
Correspondence to: Emmy Liu, Varun Gangal, Steven Y. Feng.
https://github.com/DegenAI-Labs/HalluWorld
Preprint. February 4, 2026.

Abstract

Despite numerous attempts at mitigation since the inception of language models, hallucinations remain a persistent problem even in today’s frontier LLMs. Why is this? We review existing definitions of hallucination and fold them into a single, unified definition wherein prior definitions are subsumed. We argue that hallucination can be unified by defining it as simply inaccurate (internal) world modeling, in a form where it is observable to the user. For example, stating a fact which contradicts a knowledge base OR producing a summary which contradicts the source. By varying the reference world model and conflict policy, our framework unifies prior definitions. We argue that this unified view is useful because it forces evaluations to clarify their assumed reference “world”, distinguishes true hallucinations from planning or reward errors, and provides a common language for comparison across benchmarks and discussion of mitigation strategies. Building on this definition, we outline plans for a family of benchmarks using synthetic, fully specified reference world models to stress-test and improve world modeling components.

1. Introduction

Suppose a language model is given this passage in context: “Sherlock Holmes lives at 221B Baker Street in London” and is asked the question “Where does Sherlock Holmes live?” It responds with “Sherlock Holmes is a fictional character and has no real address”. Is this statement considered a hallucination? A summarization researcher might say yes, as the model contradicted the source. An open-domain QA researcher might say no, as it is true that Sherlock Holmes has no real-world address. Hallucination, as a term, has evolved since its first introduction, and now means different things in different contexts. This fragmentation of definitions, tailored to task-specific assumptions about what constitutes truth, makes it difficult to answer basic questions about hallucination such as: Can LLMs ever be expected to stop hallucinating? or Can we guarantee that hallucinations will happen a predictable percentage of the time? If we report that a technique “reduces hallucinations by 40%”, what is this actually measuring, and can we expect these improvements to transfer across contexts? These are not just philosophical questions, but have real impact on research directions and the public’s trust in deployed systems.

[Figure 1: Hallucination as inaccurate (internal) world modeling]

We argue that existing definitions of hallucination can be unified as inaccurate world modeling that is observable to the user through model outputs. More explicitly, we argue that every setting implicitly assumes (1) a reference world model encoding what is true, (2) a view function specifying what information the model can observe, and (3) a conflict policy determining how contradictions resolve. A model hallucinates when its outputs imply claims that are false according to the reference world model and conflict policy.
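To make this three-part decomposition concrete, here is a minimal sketch in Python. Everything in it (type names, signatures, the lambda policies) is our own illustration of the definition as stated, not an API from the paper or the HalluWorld repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Illustrative types only; the paper states the definition abstractly.
Claim = str  # an atomic claim implied by a model's output

@dataclass
class HallucinationSetting:
    # (1) Reference world model: assigns a truth value to each claim.
    world_truth: Callable[[Claim], bool]
    # (2) View function: the information the model is allowed to observe
    #     (e.g., the source document in summarization).
    view: str
    # (3) Conflict policy: given the world's verdict and the source's
    #     verdict on a claim, decide which one counts as "true".
    conflict_policy: Callable[[bool, bool], bool]

def hallucinated(setting: HallucinationSetting,
                 implied_claims: Iterable[Claim],
                 source_supports: Callable[[Claim], bool]) -> bool:
    """An output hallucinates iff it implies some claim that is false
    under the reference world model and the conflict policy."""
    return any(
        not setting.conflict_policy(setting.world_truth(c), source_supports(c))
        for c in implied_claims
    )

# The Sherlock Holmes example from the introduction: the source says
# Holmes lives at 221B Baker Street; the real world says he is fictional.
claim = "Sherlock Holmes has no real address"
source_supports = lambda c: c != claim  # the source contradicts this claim
world_truth = lambda c: c == claim      # but it is true in the real world

summarization = HallucinationSetting(world_truth, "source passage",
                                     conflict_policy=lambda world, src: src)
open_domain_qa = HallucinationSetting(world_truth, "source passage",
                                      conflict_policy=lambda world, src: world)

print(hallucinated(summarization, [claim], source_supports))   # True: source wins
print(hallucinated(open_domain_qa, [claim], source_supports))  # False: world wins
```

Note how the same output flips between hallucination and non-hallucination purely by swapping the conflict policy; this is exactly the fragmentation across summarization and open-domain QA that the Sherlock Holmes example illustrates.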
Prior definitions simply make different choices for these components, but all share this underlying structure.

We further argue that making this structure explicit has practical benefits. First, it forces benchmark designers to be more explicit about their assumptions, enabling better comparison across existing benchmarks. It also distinguishes hallucination from other errors that can arise even with a correct world model, and provides a common language for why certain mitigations help in some settings but not others. Lastly, it enables the creation of a class of larger-scale and practically useful benchmarks by using synthetic environments where the reference world is fully specified (game environments, simulated databases, or structured worlds). By having a clear definition, we can more easily generate evaluation instances where hallucination labels are fully determined rather than requiring human annotation (a sketch follows below).

We start by reviewing how definitions of hallucination have evolved over time (§2) before formalizing our framework and unifying existing definitions (§3). We argue in §4 for the benefits of unification, and outline in §5 how our framework enables the creation of scalable benchmarks for advancing hallucination research. Lastly, we discuss some alternative views (§6), a call to action (§7), and future
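As a companion sketch, here is one hedged way to realize the fully specified synthetic worlds mentioned above, where gold labels follow mechanically from an enumerated fact set. The toy world, relation names, and helper functions are invented for illustration and are not taken from the paper or its benchmarks.

```python
import random

# A toy, fully specified reference world: every fact is enumerated,
# so hallucination labels are determined without human annotation.
# Entities and relations are invented for illustration.
WORLD = {
    ("capital_of", "Atlantis"): "Poseidonis",
    ("capital_of", "Elbonia"): "Mudville",
    ("ruler_of", "Atlantis"): "Queen Mira",
}

def make_instance(rng: random.Random):
    """Sample a QA instance whose gold answer is fixed by WORLD."""
    key = rng.choice(sorted(WORLD))
    relation, entity = key
    question = f"What is the {relation.replace('_', ' ')} {entity}?"
    return key, question, WORLD[key]

def is_hallucination(key, model_answer: str) -> bool:
    """Under a 'world wins' conflict policy, any answer contradicting
    the reference world is a hallucination, by construction."""
    return model_answer.strip() != WORLD[key]

rng = random.Random(0)
key, question, gold = make_instance(rng)
print(question, "->", gold)
print(is_hallucination(key, "Poseidonis"))  # verdict depends on the sampled key
```

Because the world is closed and machine-readable, instances can be generated at scale and hallucination verdicts are deterministic, with no annotator in the loop.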

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
