Title: FaithLens: A Cost-Efficient Model for Faithfulness Hallucination Detection in Large Language Models
ArXiv ID: 2512.20182
Date: Pending
Authors: Shuzheng Si*♠♢, Qingyi Wang*‹, Haozhe Zhao*♣, Yuzhuo Bai♠, Guanqiao Chen♠, Kangyang Luo♠, Gang Chen♢, Fanchao Qi♠, Minjia Zhang♣, Baobao Chang♡, Maosong Sun♠
Affiliations: ♠ Tsinghua University; ♢ DeepLang AI; ‹ Fudan University; ♣ University of Illinois Urbana-Champaign; ♡ Peking University
📝 Abstract
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
💡 Deep Analysis
📄 Full Content
FaithLens: Detecting and Explaining Faithfulness Hallucination
Shuzheng Si*♠♢, Qingyi Wang*‹, Haozhe Zhao*♣, Yuzhuo Bai♠,
Guanqiao Chen♠, Kangyang Luo♠, Gang Chen♢, Fanchao Qi♠,
Minjia Zhang♣, Baobao Chang♡, and Maosong Sun♠
♠Tsinghua University
♢DeepLang AI
‹ Fudan University
♣University of Illinois Urbana-Champaign
♡Peking University
Abstract
Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.¹
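The abstract outlines a two-stage recipe: supervised fine-tuning on filtered synthetic data as a cold start, followed by rule-based reinforcement learning whose reward covers both prediction correctness and explanation quality. The sketch below illustrates one way such a combined reward could be wired up; the weighting `alpha`, the `judge_explanation` scorer, and the gating of explanation credit on a correct prediction are illustrative assumptions, not the paper's stated formulation.

```python
# Minimal sketch (assumptions noted above): a rule-based reward that credits
# a correct faithful/hallucinated prediction and, on top of that, the quality
# of the accompanying explanation.

def judge_explanation(explanation: str) -> float:
    """Hypothetical explanation-quality scorer in [0, 1].

    In practice this could be a rubric-based LLM judge or simple rules
    (e.g., does the explanation point to spans of the source document?).
    """
    return 1.0 if "document" in explanation.lower() else 0.5  # placeholder rule


def combined_reward(pred_label: str, gold_label: str,
                    explanation: str, alpha: float = 0.5) -> float:
    """Scalar reward in [0, 1] for RL fine-tuning (assumed weighting alpha)."""
    correct = 1.0 if pred_label == gold_label else 0.0
    # Only grant explanation credit when the prediction itself is correct,
    # so the policy cannot trade correctness for eloquence.
    return correct * ((1.0 - alpha) + alpha * judge_explanation(explanation))
```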
1 Introduction
Recent progress in large language models (LLMs) has revolutionized text generation (OpenAI, 2025). In practice, LLMs are widely used to generate coherent responses based on the provided contextual information, e.g., retrieval-augmented generation (RAG) (Wang et al., 2025). However, LLMs are prone to generating hallucinated claims that are inconsistent with or irrelevant to the given context, i.e., faithfulness hallucinations (Bi et al., 2025; Si et al., 2025c). Therefore, detecting such hallucinations is critical for providing responsible LLM services.
* Equal Contribution.
¹ The data and code will be available at https://github.com/S1s-Z/FaithLens. Email: ssz24@mails.tsinghua.edu.cn.
Figure 1: The illustration of our FaithLens. Given a document doc and a claim c, FaithLens can jointly determine whether the claim is faithful or hallucinated and provide the corresponding explanations for its decision, applicable across various tasks. (The figure panels, not reproduced here, show summarization, fixed-document QA, and RAG examples whose responses and source documents are fed into FaithLens, which returns a CoT explanation and a binary prediction.)
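As a concrete reading of the interface in Figure 1, the sketch below shows one way a (document, claim) pair could be packed into a prompt and the model output split into a free-text explanation plus a binary verdict. The prompt template, output convention, and parser here are illustrative assumptions rather than the paper's released format.

```python
# Illustrative sketch of the Figure 1 interface (assumed prompt/output format):
# (document, claim) in -> chain-of-thought explanation + binary prediction out.

from dataclasses import dataclass


@dataclass
class DetectionResult:
    explanation: str  # free-text reasoning grounded in the document
    faithful: bool    # True if the claim is judged faithful to the document


PROMPT_TEMPLATE = (
    "Document:\n{doc}\n\n"
    "Claim:\n{claim}\n\n"
    "Explain step by step whether the claim is faithful to the document, "
    "then give a final answer of 'faithful' or 'hallucinated' on the last line."
)


def parse_output(model_output: str) -> DetectionResult:
    """Split a model completion into explanation lines and a final verdict line."""
    *reasoning, verdict = model_output.strip().splitlines()
    return DetectionResult(
        explanation="\n".join(reasoning).strip(),
        faithful=verdict.strip().lower().startswith("faithful"),
    )
```

Keeping the verdict on a fixed final line makes the binary label trivially machine-readable while leaving the explanation free-form.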
To identify faithfulness hallucinations in LLM-generated outputs, recent works utilize the strong generalization abilities of LLMs and formulate detection as a binary classification task (Wang et al., 2024). The first line of research leverages designed prompts to query advanced LLMs like GPT-4o (OpenAI, 2023) to check whether generated outputs contain hallucinated claims (Liu et al., 2023c; Lei et al., 2023; Dhuliawala et al., 2024; Muhammed et al., 2025), e.g., SelfCheckGPT (Manakul et al., 2023). However, these methods are inefficient for real-world deployment because they rely on large and advanced models to achieve reliable detection performance. Thus, many studies have focused on developing cost-efficient and specialized classifiers to detect hallucinations (Zha et al., 2023; Seo et al., 2025). For example, MiniCheck (Tang et al., 2024a) uses synthetic data generation techniques to train a 7B-parameter model, achieving performance comparable to GPT-4o.

However, developing a detection model for real-world users still faces three key challenges. Specifically, (1) Lack of Explainability: Current methods typically treat faithfulness hallucination detection as a binary classification task, acting as a black box that only returns the final prediction without a corresponding explanation (Tang et al., 2024a). This makes it difficult for users to localize errors and understand why tested claims are hallucinated, which limits the trustworthiness of detection models. (2) Inconsistent Generalization across Tasks: Previous methods are primarily designed for detecting task-specific hallucination (George and Stuhlmueller, 2023), e.g., summarization (Wan et al., 2024), and thus fail to transfer effectively across different tasks. Even the models designed for general-purpose scenarios (Tang et al., 2024a; Lei et al., 2025; Seo et al., 2025) still perform unevenly on different tasks because each task may have unique hallucination patterns. For example, summarization hallucinations typically manifest as subtly distorted content from the context (Li and Yu, 2025), whereas RAG hallucinations often ignore the retrieved context and involve conf