Fake News Detection After LLM Laundering: Measurement and Explanation


With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing, contextually relevant fake news, which can contribute to disseminating misinformation. Although fake news detection for human-written text has been studied extensively, detecting LLM-generated fake news remains under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular determining whether adding a paraphrase step to the detection pipeline helps or impedes detection. The study contributes: (1) evidence that detectors struggle more to detect LLM-paraphrased fake news than human-written text; (2) an analysis of which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity); (3) via LIME explanations, a possible reason for detection failures: sentiment shift; (4) a worrisome trend for paraphrase-quality measurement: samples that exhibit sentiment shift despite a high BERTScore; and (5) a pair of datasets augmenting existing datasets with paraphrase outputs and scores. The datasets are available on GitHub.


💡 Research Summary

The paper “Fake News Detection After LLM Laundering: Measurement and Explanation” investigates how large language models (LLMs) affect the performance of existing fake‑news detectors when the news articles are paraphrased by LLMs. While a substantial body of work focuses on detecting human‑written misinformation, the authors argue that the rise of LLM‑generated or LLM‑paraphrased content creates a new, under‑explored threat.

Research Questions
The study poses five questions: (RQ1) how do detectors perform on human‑written versus LLM‑paraphrased fake news? (RQ2) which detectors are most robust to paraphrasing? (RQ3) which LLMs produce paraphrases that are hardest or easiest to detect? (RQ4) which generator yields the highest FBERT (BERTScore‑based) similarity? (RQ5) what can explainability tools reveal about detection failures?

Datasets and Pre‑processing
Two public corpora are used: a balanced COVID‑19 misinformation set (≈5.5 K real, ≈5 K fake articles) and the multi‑class LIAR dataset (≈12.8 K political statements with six truthfulness labels). Text is cleaned with NLTK, tokenized, and split into the provided train/validation/test partitions.

Paraphrasing Methods
Three LLM families generate paraphrases of both real and fake articles:

  1. Pegasus – a transformer‑based summarization model.
  2. GPT – accessed via OpenAI API.
  3. LLaMA – Meta’s open‑source model.

Each original article receives three paraphrased versions (one per model).
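The per-article fan-out described above can be sketched as a small loop over pluggable paraphrasers. The `paraphrasers` dict and the toy string transforms below are illustrative stand-ins for the actual Pegasus/GPT/LLaMA calls, which are not specified here:

```python
from typing import Callable, Dict

def paraphrase_article(article: str,
                       paraphrasers: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Return one paraphrased version of `article` per named paraphraser."""
    return {name: fn(article) for name, fn in paraphrasers.items()}

# Toy stand-ins for the three model families; in practice each entry would
# wrap a model or API call (e.g. a Hugging Face pipeline or an OpenAI request).
toy_paraphrasers = {
    "pegasus": lambda t: t.lower(),
    "gpt": lambda t: t.upper(),
    "llama": lambda t: " ".join(reversed(t.split())),
}

versions = paraphrase_article("Vaccines cause no harm", toy_paraphrasers)
```

The real pipeline simply substitutes genuine model calls for the lambdas while keeping the same one-original-to-three-paraphrases structure.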

Detection Models
A total of 17 detectors are evaluated:

  • Classical ML: Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine, each paired with three feature representations (TF‑IDF, CountVectorizer, and word embeddings).
  • Deep Learning: Convolutional Neural Network (CNN) and Long Short‑Term Memory network (LSTM).
  • Pre‑trained Transformers: BERT, T5, LLaMA.

All models are trained on the original datasets and then tested on both original and paraphrased texts.
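As a minimal sketch of the classical-ML arm (assuming scikit-learn; the four-document corpus below is invented for illustration, not taken from the paper), a TF‑IDF + Logistic Regression detector follows this train-on-originals, test-on-anything pattern:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus: label 1 = fake, 0 = real.
train_texts = [
    "miracle cure hidden by doctors",
    "shocking secret the government denies",
    "health agency publishes vaccination schedule",
    "study reports trial results in journal",
]
train_labels = [1, 1, 0, 0]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
detector.fit(train_texts, train_labels)

# The same fitted detector is then scored on original and paraphrased inputs.
preds = detector.predict(["doctors hide this miracle cure",
                          "journal publishes trial results"])
```

Swapping `TfidfVectorizer` for `CountVectorizer` or an embedding featurizer yields the other two feature variants the study evaluates.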

Evaluation Metrics
Standard classification metrics (accuracy, precision, recall, F1) are reported, with macro‑F1 emphasized for the imbalanced LIAR set. Paraphrase quality is measured by FBERT, the BERTScore F1: the harmonic mean of BERTScore precision and recall, which captures contextual semantic similarity.
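FBERT itself is just a harmonic mean; a pure-Python helper makes the definition concrete (the precision/recall values below are illustrative, not from the paper):

```python
def f_bert(precision_bert: float, recall_bert: float) -> float:
    """Harmonic mean of BERTScore precision and recall, i.e. BERTScore F1."""
    if precision_bert + recall_bert == 0:
        return 0.0
    return 2 * precision_bert * recall_bert / (precision_bert + recall_bert)

# Illustrative values for one original/paraphrase pair.
score = f_bert(0.92, 0.88)
```

Because the harmonic mean is dominated by the smaller of the two terms, a paraphrase must score well on both token-level precision and recall to obtain a high FBERT.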

Explainability Analysis
Local Interpretable Model‑agnostic Explanations (LIME) are generated for each detector’s predictions on human‑written and paraphrased samples. The authors also compute sentiment scores (using a pretrained sentiment analyzer) for each pair to assess whether sentiment shifts correlate with misclassifications.
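The sentiment-shift check can be paired with any off-the-shelf polarity scorer; the tiny word-count scorer below is a hypothetical stand-in for the pretrained analyzer the authors use, and the lexicons are invented for illustration:

```python
POSITIVE = {"safe", "effective", "good", "helpful"}
NEGATIVE = {"dangerous", "harmful", "bad", "deadly"}

def toy_polarity(text: str) -> int:
    """Hypothetical stand-in scorer: (#positive words) - (#negative words)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_shifted(original: str, paraphrase: str) -> bool:
    """Flag original/paraphrase pairs whose polarity sign flips."""
    a, b = toy_polarity(original), toy_polarity(paraphrase)
    return (a > 0) != (b > 0) or (a < 0) != (b < 0)

flag = sentiment_shifted("the vaccine is safe and effective",
                         "the vaccine is dangerous")
```

Running this flag over every original/paraphrase pair gives the sentiment-divergence signal that is then correlated with detector misclassifications.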

Key Findings

  1. Detectors struggle more with LLM‑paraphrased fake news – Across both datasets, all detectors achieve higher F1 on human‑written fake articles. Pegasus‑paraphrased texts are the most difficult, yielding the lowest F1 scores.

  2. Model‑specific patterns – The pre‑trained transformer models (BERT, T5, LLaMA) perform relatively poorly on human‑written fake news but handle GPT‑ and LLaMA‑paraphrased texts better. Conversely, traditional ML models (SVM, Logistic Regression, Random Forest) excel at detecting GPT/LLaMA paraphrases but still miss many Pegasus‑paraphrased items.

  3. Deep learning models (CNN, LSTM) show mixed results – They achieve low F1 on Pegasus and GPT paraphrases, suggesting that convolutional or recurrent architectures are vulnerable to sophisticated paraphrasing.

  4. Sentiment shift as a failure driver – LIME explanations frequently highlight sentiment‑related tokens. When paraphrasing changes the overall sentiment (e.g., from neutral to negative), detectors often flip their prediction, indicating that sentiment consistency is a hidden cue used by many models.

  5. High FBERT does not guarantee sentiment preservation – Some paraphrases receive high FBERT scores (indicating strong semantic overlap) yet exhibit substantial sentiment divergence. This uncovers a limitation of purely embedding‑based similarity metrics for evaluating paraphrase quality in the misinformation context.

Implications
The results suggest that adding a paraphrase step to a detection pipeline generally impedes detection rather than helping it, especially when the paraphraser introduces sentiment changes. Defensive strategies could involve (a) augmenting training data with sentiment‑preserving paraphrases, (b) explicitly modeling sentiment as a feature, or (c) designing detectors that are invariant to stylistic variations.

Resources
The authors release two enriched datasets (original + three LLM paraphrases + sentiment scores + FBERT scores) on GitHub, enabling reproducibility and future benchmarking of both attack and defense methods.

Conclusion
This work provides the first systematic measurement of how LLM‑paraphrased fake news interacts with a broad suite of detectors, uncovers sentiment shift as a key explanatory factor for detection failures, and highlights the inadequacy of current similarity metrics for capturing nuanced changes that affect misinformation detection. The released resources and insights lay groundwork for more robust, LLM‑aware fake‑news detection systems.

