Cross-Domain Fake News Detection on Unseen Domains via LLM-Based Domain-Aware User Modeling

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Cross-domain fake news detection (CD-FND) transfers knowledge from a source domain to a target domain and is crucial for real-world fake news mitigation. This task becomes particularly important yet more challenging when the target domain is previously unseen (e.g., the COVID-19 outbreak or the Russia-Ukraine war). However, existing CD-FND methods overlook such scenarios and consequently suffer from two key limitations: (1) insufficient modeling of high-level semantics in news and user engagements; and (2) scarcity of labeled data in unseen domains. Targeting these limitations, we find that large language models (LLMs) offer strong potential for CD-FND on unseen domains, yet their effective use is non-trivial. Two key challenges arise: (1) how to capture high-level semantics from both news content and user engagements using LLMs; and (2) how to make LLM-generated features more reliable and transferable for CD-FND on unseen domains. To tackle these challenges, we propose DAUD, a novel LLM-Based Domain-Aware framework for fake news detection on Unseen Domains. DAUD employs LLMs to extract high-level semantics from news content. It models users’ single- and cross-domain engagements to generate domain-aware behavioral representations. In addition, DAUD captures the relations between original data-driven features and LLM-derived features of news, users, and user engagements. This allows it to extract more reliable domain-shared representations that improve knowledge transfer to unseen domains. Extensive experiments on real-world datasets demonstrate that DAUD outperforms state-of-the-art baselines in both general and unseen-domain CD-FND settings.


💡 Research Summary

This paper, “Cross-Domain Fake News Detection on Unseen Domains via LLM-Based Domain-Aware User Modeling,” addresses a critical and practical challenge in combating online misinformation: detecting fake news in a previously unseen target domain (e.g., a sudden pandemic or geopolitical conflict) by leveraging knowledge from labeled source domains. The authors identify two key limitations in existing Cross-Domain Fake News Detection (CD-FND) methods: 1) insufficient modeling of high-level semantics in news content and user engagements, and 2) poor generalization to unseen domains due to label scarcity. While Large Language Models (LLMs) offer potential to overcome these limitations, their direct application is non-trivial due to challenges in capturing implicit user behavior semantics and ensuring the reliability of LLM-generated features for transfer learning.

To tackle these challenges, the authors propose a novel framework named DAUD (LLM-based Domain-Aware framework for fake news detection on Unseen Domains). DAUD consists of two synergistic core modules: the LDAE (LLM-based Domain-Aware Enhancement) module and the DSRA (Domain-Shared feature learning and Relation-aware Alignment) module.

The LDAE module is designed to harness LLMs for high-level semantic extraction. It first employs an LLM to generate concise summaries of news articles, capturing abstract, domain-invariant themes beyond surface-level text. It further introduces a Domain-Aware User Agent. This agent constructs a personalized user profile by aggregating LLM-generated summaries of all news articles a user has historically engaged with (e.g., commented on, reposted). Crucially, it considers engagements both within a single domain and across multiple domains, building a rich representation of a user’s cross-domain behavioral preferences. For unseen-domain news, this agent can then predict (via LLM prompting) how a user might engage with it, thereby augmenting sparse interaction data.
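The domain-grouped aggregation step above can be sketched as a prompt-construction helper. This is a minimal illustration, not the paper's actual implementation: the function name, prompt wording, and input format are all hypothetical, and the real system would pass the resulting prompt to an LLM.

```python
from collections import defaultdict

def build_user_profile_prompt(engagements):
    """Compose a user-profile prompt from LLM-generated summaries of the
    news a user engaged with, grouped by domain (illustrative sketch).

    `engagements` is a list of (domain, summary) pairs.
    """
    by_domain = defaultdict(list)
    for domain, summary in engagements:
        by_domain[domain].append(summary)

    # One section per domain, so the LLM can reason about both
    # single-domain and cross-domain behavioral preferences.
    sections = []
    for domain, summaries in sorted(by_domain.items()):
        bullets = "\n".join(f"- {s}" for s in summaries)
        sections.append(f"[{domain}]\n{bullets}")

    return (
        "Below are summaries of news items this user engaged with, "
        "grouped by domain. Describe the user's single-domain and "
        "cross-domain behavioral preferences.\n\n"
        + "\n\n".join(sections)
    )

prompt = build_user_profile_prompt([
    ("politics", "Claims about an election result."),
    ("health", "A report on vaccine side effects."),
    ("politics", "A debate fact-check."),
])
```

The same profile, prefixed to an unseen-domain article summary, would then let the agent predict that user's likely engagement, as described above.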

The DSRA module addresses the reliability and transferability of LLM-generated features. It operates on three levels—news, user, and engagement—by performing triple relation-aware alignment. Specifically, it models the relationships between: 1) original news text embeddings and LLM-generated news summary embeddings, 2) original user metadata embeddings and LLM-generated user profile embeddings, and 3) actual observed engagement features and LLM-predicted engagement features. Using mechanisms like cross-attention, DSRA aligns and fuses these paired representations. This process grounds the potentially hallucinatory LLM outputs in real data while enriching the data-driven features with high-level semantics. The refined features are then processed by domain-shared disentanglers to separate domain-specific noise from domain-invariant, veracity-related signals. These invariant features are used to train the final fake news classifier, enabling effective knowledge transfer to the unseen target domain.
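The cross-attention alignment step can be illustrated with a plain NumPy sketch, where data-driven features act as queries over their LLM-derived counterparts. This is a simplified assumption of how such a module might look: learned projection matrices, multi-head structure, and the disentanglers are omitted.

```python
import numpy as np

def cross_attention_fuse(orig_feats, llm_feats):
    """Align original data-driven features (queries) with LLM-derived
    features (keys/values) via scaled dot-product cross-attention, then
    fuse with a residual connection. Simplified single-head sketch.
    """
    d_k = orig_feats.shape[-1]
    scores = orig_feats @ llm_feats.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    attended = weights @ llm_feats  # LLM semantics grounded in real data
    return orig_feats + attended    # residual fusion of the pair

rng = np.random.default_rng(0)
orig = rng.normal(size=(4, 8))  # e.g., original news-text embeddings
llm = rng.normal(size=(3, 8))   # e.g., LLM summary embeddings
fused = cross_attention_fuse(orig, llm)
```

In the framework described above, one such alignment would run per level (news, user, engagement) before the fused features reach the domain-shared disentanglers.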

The framework was rigorously evaluated on three real-world datasets: PolitiFact, GossipCop, and CoAID. Experiments were conducted under both general CD-FND settings (where some target-domain labels are available) and the more challenging unseen-domain CD-FND setting (where no target-domain labels are used for training). DAUD was compared against a wide range of state-of-the-art baselines, including traditional ML models, graph neural networks, recent domain adaptation methods, and direct LLM prompting (e.g., GPT-3.5/4). The results demonstrated that DAUD significantly outperformed all baselines across all evaluation metrics (Accuracy, F1, etc.) in both settings. The performance gains were especially pronounced in the unseen-domain scenario, validating the core hypothesis that LLM-enhanced high-level semantics and relation-aware alignment are crucial for generalization to novel domains. Ablation studies further confirmed the contribution of each module (LDAE and DSRA) to the overall performance.

In conclusion, this paper makes a significant contribution by pioneering a framework that effectively leverages LLMs not just as feature extractors, but as reasoning agents for user modeling within a robust, relation-aware alignment architecture. DAUD provides a principled solution to the unseen-domain CD-FND problem, offering a powerful tool for real-world fake news mitigation where new topics and events constantly emerge without labeled data.

