Hallucination-Resistant Relation Extraction via Dependency-Aware Sentence Simplification and Two-tiered Hierarchical Refinement

Notice: This research summary and analysis were automatically generated using AI. For accuracy, please refer to the original ArXiv source.

Relation extraction (RE) enables the construction of structured knowledge for many downstream applications. While large language models (LLMs) have shown great promise in this task, they often struggle to reliably determine whether a relation exists, particularly in sentences with complex syntax or subtle semantics. For instance, we find that Qwen2.5-14B-Instruct incorrectly predicts a relation in 96.9% of NO-RELATION instances on SciERC, revealing a severe hallucination problem. To address these challenges, we propose DEPTH, a framework that integrates Dependency-aware sEntence simPlification and Two-tiered Hierarchical refinement into the relation extraction pipeline. Given a sentence and its candidate entity pairs, DEPTH operates in two stages: (1) the Grounding module extracts relations for each pair by leveraging their shortest dependency path, distilling the sentence into a minimal yet coherent relational context that reduces syntactic noise while preserving key semantics; (2) the Refinement module aggregates all local predictions and revises them based on a holistic understanding of the sentence, correcting omissions and inconsistencies. We further introduce a causality-driven reward model that mitigates reward hacking by disentangling spurious correlations, enabling robust fine-tuning via reinforcement learning with human feedback. Experiments on eight well-established benchmarks demonstrate that DEPTH reduces the average hallucination rate to 7.9% while achieving a 9.3% improvement in average F1 score over existing LLM-based extraction baselines.


💡 Research Summary

The paper tackles a critical shortcoming of large language models (LLMs) in relation extraction (RE): the tendency to hallucinate relations when none exist, especially in sentences with complex syntax. The authors introduce DEPTH, a two‑stage framework designed to dramatically reduce such false positives while improving overall extraction quality.

Stage 1 – Grounding (Dependency‑aware Sentence Simplification)
Given a sentence and a candidate entity pair (e₁, e₂), DEPTH first runs a modern dependency parser (e.g., spaCy) to obtain the full dependency tree. It then extracts the shortest dependency path (SDP) connecting the two entities, which is known to capture the essential relational semantics. The sentence is simplified by retaining only the words on the SDP and a small surrounding context, discarding extraneous clauses. The SDP is also rendered as a natural‑language description (e.g., “methodology → improve → accuracy”). This compact, SDP‑centered prompt is fed to the LLM, which now operates on a minimal yet coherent relational context, reducing syntactic noise and focusing attention on the core semantics required to decide whether a relation exists.
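The SDP step can be sketched with a plain breadth-first search over the dependency tree, viewed as an undirected graph. The toy edge list below is hand-built to keep the example self-contained; in practice a parser such as spaCy would supply the head–dependent edges, and the details of DEPTH's own simplification heuristics are not reproduced here.

```python
from collections import deque

def shortest_dependency_path(edges, start, end):
    """BFS over an undirected view of the dependency tree to find the
    shortest path of tokens connecting two entity head words."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for neighbor in graph.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Toy (head, dependent) edges for:
# "The methodology, which the authors developed, improves accuracy."
edges = [
    ("improves", "methodology"), ("methodology", "The"),
    ("methodology", "developed"), ("developed", "which"),
    ("developed", "authors"), ("authors", "the"),
    ("improves", "accuracy"),
]

print(shortest_dependency_path(edges, "methodology", "accuracy"))
# → ['methodology', 'improves', 'accuracy']
```

Note how the relative clause ("which the authors developed") falls off the path entirely: the simplified context handed to the LLM keeps only the tokens that link the two entities, which is exactly the syntactic-noise reduction the Grounding module relies on.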

Stage 2 – Refinement (Two‑tiered Hierarchical Re‑evaluation)
The Grounding module produces a local prediction for every candidate pair. DEPTH aggregates all these predictions and asks the LLM to re‑evaluate them under global sentence‑level constraints such as relational transitivity, mutual exclusivity, and logical consistency. By providing a holistic view, the model can add missing relations, delete spurious ones, and resolve contradictions, effectively performing self‑correction.
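Since the re-evaluation itself is performed by the LLM, the mechanical part of this stage is assembling the local predictions into one global prompt. The sketch below shows one plausible way to do that; the wording and field names are illustrative assumptions, not the paper's actual prompt template.

```python
def build_refinement_prompt(sentence, local_predictions):
    """Assemble per-pair Grounding outputs into a single prompt that asks
    the model to revise them jointly under sentence-level constraints."""
    lines = [f"Sentence: {sentence}", "Local predictions:"]
    for (e1, e2), relation in local_predictions.items():
        lines.append(f"- ({e1}, {e2}): {relation}")
    lines.append(
        "Re-evaluate all predictions jointly. Enforce transitivity, "
        "mutual exclusivity, and logical consistency; add missing "
        "relations, delete spurious ones, and output the revised list."
    )
    return "\n".join(lines)

preds = {
    ("DEPTH", "relation extraction"): "USED-FOR",
    ("SDP", "sentence"): "NO-RELATION",
}
prompt = build_refinement_prompt("DEPTH simplifies sentences via SDPs.", preds)
print(prompt)
```

The key design point is that every pair's local verdict is visible at once, so the model can trade them off against each other instead of judging each pair in isolation.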

Causality‑Driven Reward Modeling for RLHF
Beyond structural simplification, the authors address reward hacking—a common failure mode in reinforcement learning with human feedback (RLHF). Traditional reward models inadvertently learn to reward superficial cues (e.g., response length, stylistic patterns) that are not causally linked to correct relation judgments. The paper proposes a causal factorization of each prompt‑response pair into reward‑relevant components (x₁, y₁) and reward‑irrelevant components (x₂, y₂). The reward‑relevant part contains the task definition, simplified sentence, target entity pair, and the SDP description; the irrelevant part contains any extra narrative or stylistic elements. The reward model is trained exclusively on (x₁, y₁), thereby learning to assign high scores only when the truly causal signals of a correct relation are present. This robust reward model is then used in Proximal Policy Optimization (PPO) to fine‑tune the LLM, aligning it with the desired behavior while resisting spurious correlations.
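A minimal sketch of the idea, under two stated assumptions: the reward-relevant/irrelevant split is shown here as a hand-written key filter (in the paper it is a learned causal factorization), and the training objective is the standard Bradley-Terry pairwise preference loss commonly used for reward models. All field names are hypothetical.

```python
import math

PROMPT = {
    "task": "Decide the relation between the entity pair.",
    "sentence": "The methodology improves accuracy.",   # SDP-simplified
    "pair": ("methodology", "accuracy"),
    "sdp": "methodology -> improves -> accuracy",
    "style_note": "Answer verbosely and politely.",     # reward-irrelevant
}
RESPONSE = {"relation": "USED-FOR", "narrative": "Certainly! Here is ..."}

def split_causal(prompt, response):
    """Hypothetical factorization into reward-relevant (x1, y1) and
    reward-irrelevant parts; only (x1, y1) reaches the reward model."""
    relevant = {"task", "sentence", "pair", "sdp"}
    x1 = {k: v for k, v in prompt.items() if k in relevant}
    y1 = response["relation"]  # the relation judgment itself
    return x1, y1

def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).
    Scoring only (x1, y1) keeps length/style cues out of the reward."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

x1, y1 = split_causal(PROMPT, RESPONSE)
print(sorted(x1))  # style_note is excluded from the reward model's input
print(pairwise_reward_loss(2.0, 0.0) < pairwise_reward_loss(0.0, 2.0))
```

Because the narrative and stylistic fields never reach the reward model, a verbose but wrong answer cannot outscore a terse but correct one, which is precisely the reward-hacking failure mode the causal factorization is meant to block.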

Experimental Evaluation
DEPTH is evaluated on eight well‑established RE benchmarks, with a particular focus on SciERC, a scientific literature dataset that includes a large proportion of NO‑RELATION instances. Baseline LLMs (e.g., Qwen2.5‑14B‑Instruct) misclassify 96.9% of NO‑RELATION cases, essentially hallucinating a relation for almost every negative example. After applying DEPTH, the average hallucination rate drops to 7.9%, a more than tenfold reduction. Moreover, the average F1 score improves by 9.3% over existing LLM‑based extraction baselines, demonstrating that the framework not only curbs false positives but also enhances true positive detection. Ablation studies confirm that each component—SDP‑based simplification, global refinement, and causality‑aware reward modeling—contributes meaningfully to the overall gains.

Significance and Limitations
DEPTH represents one of the first systematic attempts to give LLMs the ability to reliably output “NO‑RELATION” when appropriate, a capability crucial for downstream knowledge‑graph construction, question answering, and enterprise‑scale document processing where false facts can be costly. By marrying syntactic insight (dependency parsing) with hierarchical reasoning and a rigorously designed reward signal, the approach bridges the gap between raw language generation and precise, trustworthy information extraction. However, the method relies on accurate dependency parses; parsing errors could propagate into the simplification stage. Additionally, the current experiments focus on English scientific texts, leaving open the question of how well the framework generalizes to other languages or domains with different syntactic characteristics. Future work may explore parser‑agnostic alternatives, multilingual extensions, and tighter integration of the refinement stage with external knowledge bases.

In summary, DEPTH delivers a compelling solution to the hallucination problem in LLM‑based relation extraction by (1) distilling each entity pair’s context to its shortest dependency path, (2) re‑evaluating predictions under global sentence constraints, and (3) training a causally sound reward model for RLHF. The resulting system markedly reduces spurious relation predictions while boosting overall extraction performance, paving the way for more reliable automated knowledge acquisition.

