Hallucination Begins Where Saliency Drops

Notice: This research summary and analysis were generated automatically with AI. For authoritative details, please refer to the original paper on arXiv.

Recent studies have examined attention dynamics in large vision-language models (LVLMs) to detect hallucinations. However, existing approaches remain limited in reliably distinguishing hallucinated from factually grounded outputs, as they rely solely on forward-pass attention patterns and neglect gradient-based signals that reveal how token influence propagates through the network. To bridge this gap, we introduce LVLMs-Saliency, a gradient-aware diagnostic framework that quantifies the visual grounding strength of each output token by fusing attention weights with their input gradients. Our analysis uncovers a decisive pattern: hallucinations frequently arise when preceding output tokens exhibit low saliency toward the prediction of the next token, signaling a breakdown in contextual memory retention. Leveraging this insight, we propose a dual-mechanism inference-time framework to mitigate hallucinations: (1) Saliency-Guided Rejection Sampling (SGRS), which dynamically filters candidate tokens during autoregressive decoding by rejecting those whose saliency falls below a context-adaptive threshold, thereby preventing coherence-breaking tokens from entering the output sequence; and (2) Local Coherence Reinforcement (LocoRE), a lightweight, plug-and-play module that strengthens attention from the current token to its most recent predecessors, actively counteracting the contextual forgetting behavior identified by LVLMs-Saliency. Extensive experiments across multiple LVLMs demonstrate that our method significantly reduces hallucination rates while preserving fluency and task performance, offering a robust and interpretable solution for enhancing model reliability. Code is available at: https://github.com/zhangbaijin/LVLMs-Saliency


💡 Research Summary

The paper tackles the persistent problem of hallucinations in large vision‑language models (LVLMs) by introducing a novel diagnostic metric, LVLMs‑Saliency, together with two inference‑time interventions that directly address the identified cause. Existing work has largely relied on forward‑pass attention maps, which reveal where the model looks but not how changes in the inputs affect the outputs; consequently, such methods cannot reliably differentiate hallucinated from factual tokens.
LVLMs‑Saliency is defined as the element‑wise product of the attention weight matrix A(l,h) and its gradient with respect to the loss, ∇A(l,h), for each layer l and head h, followed by a lower‑triangular mask to preserve causality. After averaging across heads and ℓ2‑normalizing, the resulting matrix S̄(l) quantifies how strongly each previously generated token influences the prediction of the next token. High saliency indicates strong contextual grounding; low saliency signals a breakdown in the model’s internal memory of its own output.
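The metric can be sketched as follows; the tensor shapes and the normalization granularity (whole‑matrix ℓ2 here) are assumptions, since the summary does not fix them:

```python
import numpy as np

def saliency_matrix(attn, grad):
    """Sketch of the LVLMs-Saliency metric for one layer.

    attn, grad: arrays of shape (num_heads, seq_len, seq_len) holding the
    attention weights A(l,h) and their gradients w.r.t. the loss.
    Shapes and whole-matrix l2-normalization are assumptions.
    """
    fused = attn * grad                        # element-wise fusion per head
    mask = np.tril(np.ones(attn.shape[-2:]))   # lower-triangular causal mask
    fused = fused * mask
    s = fused.mean(axis=0)                     # average across heads
    return s / (np.linalg.norm(s) + 1e-8)      # l2-normalize
```

Row P of the resulting matrix then gives, for each earlier position j, how strongly token j contributed to predicting token P.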
Empirical analysis on Qwen2‑VL‑7B and LLaVA‑1.5‑7B across 500 samples shows a clear pattern: correct tokens maintain a decaying yet consistently high saliency toward recent outputs, whereas hallucinated tokens exhibit a near‑collapse of saliency across all prior outputs. Prompt saliency varies little between correct and hallucinated cases, suggesting that the root cause lies in the output stream rather than the input prompt.
Based on this insight, the authors propose two complementary mechanisms.

  1. Saliency‑Guided Rejection Sampling (SGRS) – At each decoding step, the model draws a top‑K set of candidate tokens. For each candidate c, its saliency score S(c) is computed by averaging S̄(l) over a set of target layers and over all previously generated output positions. An adaptive threshold τ(P) is derived from the average saliency of the most recent W tokens: τ(P) = α·mean_{j∈H} S(x_j), where H is the window of the last W output positions. Candidates with S(c) < τ(P) are rejected and resampled; if all are rejected, the token with the highest saliency is forced. This gatekeeper prevents poorly grounded tokens from entering the sequence, directly addressing the context loss that precedes hallucination.
  2. Local Coherence Reinforcement (LocoRE) – After a token is accepted, the attention matrix for the next step is modified by a distance‑aware gain γ(P)_j = 1 + β·I(P−j ≤ w_s) applied to the keys of the most recent w_s output tokens. This operation boosts the attention weight from the current query to its immediate predecessors without any gradient computation or parameter update. By reinforcing recent context, LocoRE counteracts the saliency decay identified in the empirical analysis, ensuring that the model does not “forget” its own recent outputs.
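The SGRS gatekeeping step described above can be sketched as follows; the function and parameter names, the value of α, and the fallback of returning a single forced token are illustrative assumptions:

```python
import numpy as np

def sgrs_filter(candidates, saliency_of, recent_saliencies, alpha=0.5):
    """Sketch of Saliency-Guided Rejection Sampling (SGRS).

    candidates:        top-K candidate token ids for the current step
    saliency_of:       function mapping a candidate to its score S(c),
                       i.e. the mean saliency over target layers and
                       prior output positions
    recent_saliencies: saliency scores of the last W accepted tokens
    alpha:             threshold scale (value here is illustrative)
    """
    # Context-adaptive threshold: tau(P) = alpha * mean of recent saliency.
    tau = alpha * float(np.mean(recent_saliencies))
    scores = {c: saliency_of(c) for c in candidates}
    # Reject candidates whose saliency falls below the threshold.
    accepted = [c for c in candidates if scores[c] >= tau]
    if accepted:
        return accepted
    # If every candidate is rejected, force the highest-saliency token.
    return [max(scores, key=scores.get)]
```

In a real decoder this filter would sit between the top‑K selection and the final sampling step, so that sampling only ever sees candidates that clear τ(P).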
The two components form a closed‑loop system: SGRS filters tokens at entry, while LocoRE stabilizes the contextual links after entry.
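The LocoRE gain can be sketched per attention row as below; the values of β and w_s are illustrative, and restricting the boost to strict predecessors (1 ≤ P−j) plus renormalizing the row are assumptions the summary leaves open:

```python
import numpy as np

def locore_gain(attn_row, pos, beta=0.5, w_s=2):
    """Sketch of the LocoRE distance-aware gain.

    Applies gamma(P)_j = 1 + beta * I(1 <= P - j <= w_s) to one row of
    attention weights from the current query at position P, boosting the
    keys of the w_s most recent output tokens, then renormalizes so the
    row remains a distribution (the renormalization is an assumption).

    attn_row: attention weights from the query at pos to positions 0..P
    pos:      index P of the current query token
    """
    j = np.arange(len(attn_row))
    within_window = (pos - j >= 1) & (pos - j <= w_s)
    gamma = 1.0 + beta * within_window
    boosted = attn_row * gamma
    return boosted / boosted.sum()
```

Because the gain only rescales existing weights, it needs no gradients or parameter updates, which is what makes LocoRE a plug‑and‑play inference‑time module.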
Experiments span three LVLM families (Qwen2‑VL‑7B, LLaVA‑v1.5‑7/13B, Intern‑VL‑7/13B) and evaluate on (1) hallucination‑specific benchmarks (CHAIR, POPE), (2) general VQA datasets (VizWiz, ScienceQA), and (3) comprehensive multimodal suites (MM‑VET, MME). Compared with prior decoding‑based methods (OPERA, DOPRA), logit‑adjustment SFT approaches (LESS, CCA‑LLaVA), and attention‑rebalancing techniques (EAH, Farsight), the combined SGRS + LocoRE pipeline reduces hallucination rates by roughly 30% on average while preserving or slightly improving standard generation metrics (BLEU, ROUGE, CIDEr). Ablation studies show that SGRS alone already yields substantial gains, but adding LocoRE further improves long‑range coherence and prevents error propagation in extended captions.
The paper’s contributions are threefold: (1) a gradient‑aware saliency metric that provides an interpretable, token‑level signal of visual grounding; (2) a dynamic, saliency‑driven rejection sampling strategy that blocks low‑grounded tokens during decoding; and (3) a lightweight attention‑modulation module that reinforces recent context without extra training. By establishing a causal link between low output‑token saliency and hallucination, the work offers both a diagnostic tool and a practical mitigation technique, opening avenues for future research on saliency‑guided pre‑training, fine‑tuning, and cross‑modal grounding enhancements.
