TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention


Object Hallucination (OH) has been acknowledged as one of the major trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent advancements in Large Language Models (LLMs) indicate that internal states, such as hidden states, encode the “overall truthfulness” of generated responses. However, it remains under-explored how internal states in LVLMs function and whether they could serve as “per-token” hallucination indicators, which is essential for mitigating OH. In this paper, we first conduct an in-depth exploration of LVLM internal states with OH issues and discover that (1) LVLM internal states are high-specificity per-token indicators of hallucination behaviors. Moreover, (2) different LVLMs encode universal patterns of hallucinations in common latent subspaces, indicating that there exist “generic truthful directions” shared by various LVLMs. Based on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt) that first learns the truthful direction of LVLM decoding and then applies truthful-guided inference-time intervention during LVLM decoding. We further propose ComnHallu to enhance both cross-LVLM and cross-data hallucination detection transferability by constructing and aligning hallucination latent subspaces. We evaluate TruthPrInt in extensive experimental settings, including in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks. Experimental results indicate that TruthPrInt significantly outperforms state-of-the-art methods. Code will be available at https://github.com/jinhaoduan/TruthPrInt.


💡 Research Summary

Object Hallucination (OH) – the generation of nonexistent visual elements by large vision‑language models (LVLMs) – remains a critical obstacle to trustworthy multimodal AI. Recent work on large language models (LLMs) has shown that hidden states encode a notion of “overall truthfulness,” but it is unclear whether LVLM internal representations carry similar signals at the token level and whether such signals can be leveraged for practical mitigation. This paper conducts a systematic investigation of LVLM hidden states, discovers that (1) the hidden state immediately preceding an object token is a highly specific per‑token indicator of hallucination, and (2) different LVLMs share universal hallucination patterns that reside in a common latent subspace, which we term a “generic truthful direction.”

To exploit these findings, the authors propose Truthful‑Guided Pre‑Intervention (TruthPrInt), a two‑stage framework. First, a per‑token hallucination detector is trained on a curated dataset of hidden states collected from MiniGPT‑4, LLaVA‑1.5, and mPLUG‑Owl2. The detector is a three‑layer MLP that receives the hidden state of the token before an object token and predicts whether the upcoming object token will be hallucinated. Although overall classification accuracy is modest, the detector achieves extremely low false‑positive rates (FPR ≈ 0.01) and a likelihood ratio for positive results (LR⁺) of 18‑22, indicating that it can reliably flag hallucination candidates while generating almost no false alarms.
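The detector described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the layer widths, the 4096-dimensional hidden size, and the random weights are all assumptions standing in for a trained three-layer MLP that maps the hidden state preceding an object token to a hallucination probability.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM = 4096  # assumed decoder hidden size (e.g. a 7B-scale LVLM)

# Randomly initialized stand-ins for trained MLP weights.
W1 = rng.standard_normal((HIDDEN_DIM, 256)) * 0.01
b1 = np.zeros(256)
W2 = rng.standard_normal((256, 64)) * 0.01
b2 = np.zeros(64)
W3 = rng.standard_normal((64, 1)) * 0.01
b3 = np.zeros(1)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hallucination_prob(hidden_state):
    """Probability that the *upcoming* object token will be hallucinated,
    predicted from the hidden state of the token immediately before it."""
    h = relu(hidden_state @ W1 + b1)
    h = relu(h @ W2 + b2)
    return float(sigmoid(h @ W3 + b3)[0])

# A synthetic vector stands in for a real decoder activation.
p = hallucination_prob(rng.standard_normal(HIDDEN_DIM))
```

In practice the threshold on `p` would be tuned so that, as reported, the detector trades recall for a very low false-positive rate.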

Second, TruthPrInt learns the “truthful direction” in the latent space of the LVLM’s decoder. During generation, the current hidden state is projected onto this direction; if its alignment falls below a pre-defined threshold, the upcoming token is flagged as likely hallucinated. The framework then intervenes, either rolling the decoder back to an earlier step and re-generating the next token under a projection that steers the hidden state toward the truthful direction, or directly replacing the token with the most truthful candidate. Because the intervention happens at inference time, it incurs only a modest computational overhead (≈ 5 % extra per token) and prevents the model from committing the hallucinated token in the first place.
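The alignment check and steering step can be sketched as follows. The cosine-similarity test, the threshold value, and the additive steering rule are illustrative assumptions; the paper's actual projection and rollback logic operate inside the decoder and are more involved.

```python
import numpy as np

rng = np.random.default_rng(1)

def truthful_alignment(hidden_state, truthful_dir):
    """Cosine alignment of a hidden state with the learned truthful direction."""
    return float(hidden_state @ truthful_dir /
                 (np.linalg.norm(hidden_state) * np.linalg.norm(truthful_dir)))

def steer(hidden_state, truthful_dir, alpha=0.5):
    """Shift the hidden state toward the truthful direction before re-decoding.
    The step size alpha is a hypothetical hyperparameter."""
    unit = truthful_dir / np.linalg.norm(truthful_dir)
    return hidden_state + alpha * np.linalg.norm(hidden_state) * unit

# Synthetic stand-ins for a learned direction and a decoder activation.
direction = rng.standard_normal(64)
state = rng.standard_normal(64)

THRESHOLD = 0.0  # assumed; in practice tuned on held-out data
if truthful_alignment(state, direction) < THRESHOLD:
    state = steer(state, direction)  # intervene before the token is emitted
```

Adding a positive multiple of the truthful unit vector strictly increases the cosine alignment, which is the property the intervention relies on.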

A major challenge for any hallucination detector is transferability across models and data distributions. To address this, the authors introduce ComnHallu, an unsupervised subspace alignment technique. For a source domain (training) and a target domain (testing), they compute the covariance of hidden states, perform eigen‑decomposition, and retain the top d′ eigenvectors to form independent subspaces K_S and K_T. These subspaces preserve the most variance – and thus the salient hallucination information – of each domain. A linear alignment matrix M = K_Tᵀ K_S maps K_S onto K_T, yielding an aligned subspace K_align_S. Projecting all hidden states onto the aligned subspaces aligns their distributions while retaining hallucination cues, enabling a detector trained on one LVLM to generalize to another and to out‑of‑distribution (OOD) datasets.
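The ComnHallu procedure above can be approximated with standard linear algebra. The sketch below follows the description literally, with the hidden-state matrices, dimensions, and the way source features are mapped into target coordinates via M all being assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def top_eigvecs(hidden_states, d_prime):
    """Top-d' eigenvectors of the hidden-state covariance (a PCA-style basis)."""
    X = hidden_states - hidden_states.mean(axis=0)
    cov = X.T @ X / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    return eigvecs[:, ::-1][:, :d_prime]     # keep the largest-variance directions

D, d_prime = 32, 8                           # toy dimensions
src = rng.standard_normal((200, D))          # source-domain hidden states
tgt = rng.standard_normal((200, D)) + 1.0    # target-domain hidden states (shifted)

K_S = top_eigvecs(src, d_prime)              # (D, d') source subspace
K_T = top_eigvecs(tgt, d_prime)              # (D, d') target subspace

M = K_T.T @ K_S                              # (d', d') linear alignment matrix

def project_source(h):
    """Source hidden state -> target-aligned coordinates via K_S then M."""
    return M @ (K_S.T @ h)

def project_target(h):
    """Target hidden state -> its own subspace coordinates."""
    return K_T.T @ h
```

A detector trained on `project_source` features can then be applied to `project_target` features, since both live in the same d'-dimensional aligned space.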

The experimental protocol spans both in‑domain and OOD evaluations. In‑domain tests use the CC‑Sbu‑Align dataset (≈ 3.4 k image‑caption pairs) to train and validate the per‑token detector. OOD benchmarks include CHAIR, POPE, and LLaVA‑Bench, and the evaluation covers five LVLMs: MiniGPT‑4, LLaVA‑1.5, mPLUG‑Owl2, Qwen‑VL, and InternVL‑2.5. Results show that TruthPrInt consistently outperforms state‑of‑the‑art baselines such as contrastive decoding and post‑processing pipelines. Across all models, the average F1 score improves by 8‑15 % and the hallucination rate drops by more than 30 % relative to baselines. Notably, the high‑specificity detector maintains an FPR of 0.01 while achieving LR⁺ ≈ 20, confirming its practical reliability.
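The reported specificity numbers can be related by the standard definition of the positive likelihood ratio; the implied true-positive rate below is back-of-envelope arithmetic, not a figure stated by the authors.

```python
def lr_plus(tpr, fpr):
    """Positive likelihood ratio: LR+ = sensitivity / false-positive rate."""
    return tpr / fpr

# An FPR of 0.01 together with LR+ ~= 20 implies a TPR of about 0.2:
# the detector misses many hallucinations but raises almost no false alarms.
implied_tpr = 20.0 * 0.01
```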

The paper’s contributions are threefold: (1) an empirical demonstration that LVLM hidden states are high‑specificity per‑token hallucination indicators; (2) the TruthPrInt framework that couples a learned truthful direction with inference‑time intervention to suppress hallucinations; (3) the ComnHallu subspace alignment method that endows the detector with cross‑model and cross‑data transferability. By focusing on token‑level signals rather than overall uncertainty, the work opens a new avenue for real‑time, low‑overhead mitigation of multimodal hallucinations. Limitations include the current focus on noun‑type object tokens, the need for further analysis of subspace dimensionality, and the absence of user‑feedback loops for dynamic truthful‑direction updates. Future work may extend the approach to verbs, adjectives, and more complex multimodal alignments, as well as explore continual learning schemes that adapt the truthful direction as models evolve.

In summary, TruthPrInt provides a principled, scalable solution to the longstanding problem of object hallucination in LVLMs, demonstrating that internal representations can be harnessed not only to detect but also to prevent hallucinations, and that these mechanisms can generalize across diverse models and datasets. This represents a significant step toward trustworthy, reliable vision‑language AI.

