FaithLens: A Cost-Efficient Model for Faithfulness Hallucination Detection in Large Language Models

Reading time: 5 minutes
...

📝 Original Info

  • Title: FaithLens: A Cost-Efficient Model for Faithfulness Hallucination Detection in Large Language Models
  • ArXiv ID: 2512.20182
  • Date: Pending
  • Authors: Shuzheng Si*♠♢, Qingyi Wang*‹, Haozhe Zhao*♣, Yuzhuo Bai♠, Guanqiao Chen♠, Kangyang Luo♠, Gang Chen♢, Fanchao Qi♠, Minjia Zhang♣, Baobao Chang♡, Maosong Sun♠ (* equal contribution)
  • Affiliations: ♠ Tsinghua University, ♢ DeepLang AI, ‹ Fudan University, ♣ University of Illinois Urbana-Champaign, ♡ Peking University

📝 Abstract

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
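The abstract states that the rewards cover both prediction correctness and explanation quality, but this excerpt does not spell out how those rules are computed. The sketch below is only a hypothetical illustration of that idea: the output format, the parse_output helper, the judge-supplied explanation_score, the 0.5 weight, and the -1.0 format penalty are all assumptions, not details from the paper.

```python
import re


def parse_output(text: str) -> tuple[str, str]:
    """Split a model output into (explanation, binary label).

    Assumes the model ends its chain-of-thought explanation with a final
    line such as "Answer: faithful" or "Answer: hallucinated".
    """
    stripped = text.strip()
    match = re.search(r"answer:\s*(faithful|hallucinated)\s*$", stripped, re.IGNORECASE)
    if match is None:
        return stripped, ""
    return stripped[: match.start()].strip(), match.group(1).lower()


def rule_based_reward(output: str, gold_label: str, explanation_score: float) -> float:
    """Reward = prediction correctness + weighted explanation quality.

    `explanation_score` in [0, 1] stands in for however explanation quality
    is judged (e.g., by a judge model); the weights here are illustrative.
    """
    explanation, label = parse_output(output)
    if label not in {"faithful", "hallucinated"} or not explanation:
        return -1.0  # malformed output: no parsable verdict or empty explanation
    correctness = 1.0 if label == gold_label else 0.0
    return correctness + 0.5 * explanation_score
```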

💡 Deep Analysis

Figure 1: The illustration of FaithLens. Given a document and a claim, FaithLens jointly determines whether the claim is faithful or hallucinated and provides the corresponding explanation for its decision, applicable across various tasks.

📄 Full Content

FaithLens: Detecting and Explaining Faithfulness Hallucination

Shuzheng Si*♠♢, Qingyi Wang*‹, Haozhe Zhao*♣, Yuzhuo Bai♠, Guanqiao Chen♠, Kangyang Luo♠, Gang Chen♢, Fanchao Qi♠, Minjia Zhang♣, Baobao Chang♡, and Maosong Sun♠
♠ Tsinghua University  ♢ DeepLang AI  ‹ Fudan University  ♣ University of Illinois Urbana-Champaign  ♡ Peking University

Abstract

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.

* Equal Contribution. The data and code will be available at https://github.com/S1s-Z/FaithLens. Email: ssz24@mails.tsinghua.edu.cn.

[Figure 1: The illustration of our FaithLens. Given a document doc and a claim c, FaithLens can jointly determine whether the claim is faithful or hallucinated and provide the corresponding explanations for its decision, applicable across various tasks.]

1 Introduction

Recent progress in large language models (LLMs) has revolutionized text generation (OpenAI, 2025). In practice, LLMs are widely used to generate coherent responses based on the provided contextual information, e.g., retrieval-augmented generation (RAG) (Wang et al., 2025). However, LLMs are prone to generating hallucinated claims that are inconsistent or irrelevant to the given context, i.e., faithfulness hallucinations (Bi et al., 2025; Si et al., 2025c). Therefore, detecting such hallucinations is critical for providing responsible LLM services.

To identify faithfulness hallucinations in LLM-generated outputs, recent works utilize the strong generalization abilities of LLMs and formulate it as a binary classification task (Wang et al., 2024). The first line of research leverages designed prompts to query advanced LLMs like GPT-4o (OpenAI, 2023) to check if generated outputs contain hallucinated claims (Liu et al., 2023c; Lei et al., 2023; Dhuliawala et al., 2024; Muhammed et al., 2025), e.g., SelfCheckGPT (Manakul et al., 2023). However, these methods are inefficient for real-world deployment because they rely on large and advanced models to achieve reliable detection performance.
Thus, many studies have focused on developing cost-efficient and specialized classifiers to detect hallucinations (Zha et al., 2023; Seo et al., 2025). For example, MiniCheck (Tang et al., 2024a) uses synthetic data generation techniques to train a 7B-parameter model, achieving performance comparable to GPT-4o. However, developing a detection model for real-world users still faces three key challenges. Specifically, (1) Lack of Explainability: Current methods typically treat faithfulness hallucination detection as a binary classification task, acting as a black box that only returns the final prediction without corresponding explanation (Tang et al., 2024a). This makes it difficult for users to localize errors and understand why tested claims are hallucinated, which limits the trustworthiness of detection models. (2) Inconsistent Generalization across Tasks: Previous methods are primarily designed for detecting task-specific hallucination (George and Stuhlmueller, 2023), e.g., summarization (Wan et al., 2024), and then fail to transfer across different tasks effectively. Even the models designed for general-purpose scenarios (Tang et al., 2024a; Lei et al., 2025; Seo et al., 2025) still perform unevenly on different tasks because each task may have unique hallucination patterns. For example, summarization hallucinations typically manifest as subtly distorted content from the context (Li and Yu, 2025), whereas RAG hallucinations often ignore the retrieved context and involve conf…
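To make the interface in Figure 1 concrete, here is a minimal sketch of how a (document, claim) pair could be turned into a joint verdict-plus-explanation call. The prompt wording, the generate callable, the "Answer:" convention, and the stub generator are placeholders for illustration only; they are not the released FaithLens prompt or API.

```python
from typing import Callable

# Illustrative prompt: ask for a step-by-step explanation followed by a
# final binary verdict line, mirroring the "CoT explanation + binary answer"
# output shown in Figure 1.
PROMPT_TEMPLATE = (
    "You are a faithfulness hallucination detector.\n\n"
    "Document:\n{doc}\n\n"
    "Claim:\n{claim}\n\n"
    "Explain step by step whether the claim is supported by the document, "
    "then end with a final line 'Answer: faithful' or 'Answer: hallucinated'."
)


def detect(doc: str, claim: str, generate: Callable[[str], str]) -> dict:
    """One detection call: returns the explanation and a boolean verdict."""
    output = generate(PROMPT_TEMPLATE.format(doc=doc, claim=claim))
    explanation, _, verdict = output.rpartition("Answer:")
    return {
        "explanation": explanation.strip() or output.strip(),
        "hallucinated": "hallucinated" in verdict.lower(),
    }


if __name__ == "__main__":
    # Stub generator standing in for the fine-tuned detection model.
    def stub(prompt: str) -> str:
        return ("The document never mentions a 2023 launch date, so the claim "
                "adds unsupported information.\nAnswer: hallucinated")

    print(detect("The product was announced in 2021.", "It launched in 2023.", stub))
```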


Reference

This content is AI-processed based on open access ArXiv data.
