TDGNet: Hallucination Detection in Diffusion Language Models via Temporal Dynamic Graphs

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Diffusion language models (D-LLMs) offer parallel denoising and bidirectional context, but hallucination detection for D-LLMs remains underexplored. Prior detectors developed for auto-regressive LLMs typically rely on single-pass cues and do not directly transfer to diffusion generation, where factuality evidence is distributed across the denoising trajectory and may appear, drift, or be self-corrected over time. We introduce TDGNet, a temporal dynamic graph framework that formulates hallucination detection as learning over evolving token-level attention graphs. At each denoising step, we sparsify the attention graph and update per-token memories via message passing, then apply temporal attention to aggregate trajectory-wide evidence for final prediction. Experiments on LLaDA-8B and Dream-7B across QA benchmarks show consistent AUROC improvements over output-based, latent-based, and static-graph baselines, with single-pass inference and modest overhead. These results highlight the importance of temporal reasoning on attention graphs for robust hallucination detection in diffusion language models.


💡 Research Summary

The paper addresses the largely unexplored problem of hallucination detection in diffusion‑based large language models (D‑LLMs). Unlike auto‑regressive LLMs, D‑LLMs generate text by iteratively denoising an entire sequence, using bidirectional attention at every diffusion step. Consequently, factuality cues are not confined to a single output snapshot; they evolve, appear, drift, or self‑correct across the denoising trajectory. Existing hallucination detectors for AR‑LLMs—whether output‑based (e.g., entropy, lexical similarity), latent‑based (e.g., EigenScore), or static‑graph approaches—cannot directly handle this temporal dimension because they rely on a single‑pass likelihood or a final hidden state.

To capture the dynamic nature of D‑LLM generation, the authors propose TDGNet, a Temporal Dynamic Graph Network. The core idea is to treat each diffusion step t as a graph snapshot G(t) whose nodes are the fixed token positions and whose directed edges are derived from the model’s attention matrix A(t). An edge (j→i) is kept only if the attention weight exceeds a sparsity threshold τ, yielding a sparse, evolving token‑interaction graph. Node attributes are obtained from the residual stream (projected to a low‑dimensional vector) and edge attributes are the averaged attention scores across heads.
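The sparsification step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function name `sparsify_attention` and the toy matrix are ours, and we assume rows of the head-averaged attention matrix index target tokens and columns index source tokens.

```python
import numpy as np

def sparsify_attention(A, tau=0.05):
    """Build a sparse directed edge list from a head-averaged attention
    matrix A (shape [L, L]; row i = target token, column j = source token).
    Keeps only edges j -> i whose attention weight exceeds the threshold tau,
    yielding one graph snapshot G(t)."""
    tgt, src = np.nonzero(A > tau)        # indices (i, j) of surviving edges
    edge_index = np.stack([src, tgt])     # row 0: source j, row 1: target i
    edge_weight = A[tgt, src]             # kept attention scores as edge attrs
    return edge_index, edge_weight

# toy example: 3 tokens, each attention row sums to 1
A = np.array([[0.70, 0.20, 0.10],
              [0.05, 0.90, 0.05],
              [0.40, 0.40, 0.20]])
edge_index, edge_weight = sparsify_attention(A, tau=0.15)
```

With `tau = 0.15`, six of the nine possible edges survive; the threshold trades graph density against the risk of dropping weak but informative token interactions.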

TDGNet processes the sequence of graphs {G(0)…G(T)} through three stages:

  1. Spatial Aggregation – A Message‑Passing Neural Network (MPNN) aggregates information from each token’s neighbors at step t, producing a message vector $\bar{m}_i(t)$. The message function $\psi$ is implemented as an MLP that takes the source node feature, target node feature, and edge feature as inputs.

  2. Temporal Memory Update – Each token i maintains a persistent memory state $s_i(t)$. The memory is updated with a recurrent unit (e.g., GRU) that ingests the aggregated message: $s_i(t) = \mathrm{GRU}\big(s_i(t-1), \bar{m}_i(t)\big)$. This mechanism accumulates how a token’s relational role changes throughout the denoising process.

  3. Trajectory Readout – A temporal attention layer assigns a scalar weight $\alpha_t$ to each diffusion step (learned via a simple linear time encoding followed by softmax). The final hallucination probability for a token is computed as $\sigma\big(\sum_t \alpha_t \cdot \mathrm{MLP}(s_i(t))\big)$. Sequence‑level detection can be obtained by averaging token scores or adding a dedicated readout token.
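The three stages above can be sketched end to end. This is an illustrative NumPy toy, not the paper's code: parameters are random stand-ins for learned weights, the attention matrices and residual-stream features are synthetic, and the gated recurrence is a simplified one-gate variant of the GRU the paper mentions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, T = 4, 8, 3                     # tokens, feature dim, diffusion steps (toy sizes)

def mlp(x, W, b):                     # one-layer MLP stand-in for psi / the readout
    return np.tanh(x @ W + b)

# hypothetical learned parameters (random here, trained in practice)
W_msg, b_msg = rng.normal(size=(2 * D + 1, D)), np.zeros(D)
W_z, b_z = rng.normal(size=(2 * D, D)), np.zeros(D)       # update gate (simplified GRU)
W_h, b_h = rng.normal(size=(2 * D, D)), np.zeros(D)       # candidate state
w_out, w_time = rng.normal(size=D), rng.normal(size=T)    # token readout + time encoding

s = np.zeros((L, D))                  # persistent per-token memories s_i
scores_per_step = []

for t in range(T):
    A = rng.dirichlet(np.ones(L), size=L)    # stand-in attention matrix A(t), rows = targets
    h = rng.normal(size=(L, D))              # stand-in residual-stream node features
    tgt, src = np.nonzero(A > 1.0 / L)       # sparsify: keep above-uniform edges
    # 1. spatial aggregation: psi(source, target, edge weight), averaged over neighbors
    m = np.zeros((L, D))
    for i in range(L):
        nbrs = src[tgt == i]
        if len(nbrs):
            inp = np.concatenate([h[nbrs],
                                  np.repeat(h[i][None], len(nbrs), 0),
                                  A[i, nbrs][:, None]], axis=1)
            m[i] = mlp(inp, W_msg, b_msg).mean(0)
    # 2. temporal memory update: gated recurrence over the aggregated message
    x = np.concatenate([s, m], axis=1)
    z = 1.0 / (1.0 + np.exp(-(x @ W_z + b_z)))            # update gate
    s = (1 - z) * s + z * np.tanh(x @ W_h + b_h)
    scores_per_step.append(mlp(s, w_out.reshape(D, 1), np.zeros(1)).squeeze(-1))

# 3. trajectory readout: temporal attention over steps, sigmoid per token
alpha = np.exp(w_time) / np.exp(w_time).sum()             # softmax over time encoding
p = 1.0 / (1.0 + np.exp(-sum(a * sc for a, sc in zip(alpha, np.stack(scores_per_step)))))
```

The final `p` holds one hallucination probability per token position; a sequence-level score would average these or read out a dedicated token, as the summary notes.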

The authors evaluate TDGNet on two open‑source D‑LLMs—LLaDA‑8B and Dream‑7B—across four QA benchmarks (Math, CSQA, HotpotQA, TriviaQA). Using AUROC as the primary metric, TDGNet consistently outperforms strong baselines: output‑based methods (semantic entropy, lexical similarity, log‑norm entropy, perplexity), latent‑based EigenScore, and static‑graph baselines such as Temporal Subgraph Voting (TSV) and CCS. Improvements range from 4 to 8.5 absolute AUROC points, with TDGNet achieving the highest average scores (e.g., 0.72 on LLaDA‑8B and 0.74 on Dream‑7B).

A detailed analysis of hallucination dynamics reveals four characteristic patterns: (1) Self‑Correction – early noisy tokens become factual later; (2) Correctness Decay – factual tokens drift into hallucinations; (3) Semantic Drift – meaning gradually diverges from the prompt; (4) Persistent Error – early mistakes remain unchanged. Visualizations of attention graphs across steps show how TDGNet distinguishes stable grounding (stable edge clusters) from unstable drift (edges shifting to irrelevant tokens).

Ablation studies confirm the necessity of each component: (a) removing sparsification (using fully connected graphs) degrades AUROC by ~4%; (b) discarding temporal attention and relying only on the final step reduces performance by ~6–8%; (c) eliminating the persistent memory and using a single‑step MPNN drops AUROC by >7%. These results demonstrate that both structural (graph) and temporal (memory + attention) modeling are essential.

From a systems perspective, TDGNet incurs modest overhead. Graph sparsification keeps the number of edges per step to roughly 5–10% of the total possible token pairs, so message passing runs in time linear in the number of edges. The overall inference cost is comparable to a single‑pass output‑based detector, with only a 10–15% increase in memory consumption. Consequently, TDGNet can be integrated into real‑time pipelines without prohibitive latency.
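The overhead claim is easy to make concrete. The sizes below are illustrative, not measurements from the paper; they only show how a ~7% edge-retention rate shrinks the per-step message-passing workload relative to a fully connected graph.

```python
# Dense vs. sparsified edge counts for a sequence of L tokens at one diffusion step.
# L is an illustrative sequence length; 0.07 sits in the reported 5-10% retention band.
L = 512
dense_edges_per_step = L * L                  # fully connected attention graph
sparse_edges_per_step = int(0.07 * L * L)     # edges surviving the threshold tau
# Message-passing cost scales with the edge count, hence the ~14x per-step saving.
print(dense_edges_per_step, sparse_edges_per_step)
```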

In summary, the paper introduces a novel paradigm for factuality verification in diffusion language models by explicitly modeling the evolving token‑interaction graph and aggregating temporal evidence through a dynamic graph neural network. TDGNet bridges the gap left by AR‑LLM detectors, delivering robust hallucination detection that leverages the unique bidirectional and iterative nature of D‑LLMs. The approach opens avenues for extending temporal‑graph‑based verification to multimodal diffusion models, tool‑augmented generation, and downstream safety frameworks.

