Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection
In multimodal misinformation, deception usually arises not just from pixel-level manipulations in an image, but from the semantic and contextual claim jointly expressed by the image-text pair. Yet most deepfake detectors, engineered to detect pixel-level forgeries, do not account for claim-level meaning, despite their growing integration in automated fact-checking (AFC) pipelines. This raises a central scientific and practical question: Do pixel-level detectors contribute useful signal for verifying image-text claims, or do they instead introduce misleading authenticity priors that undermine evidence-based reasoning? We provide the first systematic analysis of deepfake detectors in the context of multimodal misinformation detection. Using two complementary benchmarks, MMFakeBench and DGM4, we evaluate: (1) state-of-the-art image-only deepfake detectors, (2) an evidence-driven fact-checking system that performs tool-guided retrieval via Monte Carlo Tree Search (MCTS) and engages in deliberative inference through Multi-Agent Debate (MAD), and (3) a hybrid fact-checking system that injects detector outputs as auxiliary evidence. Results across both benchmark datasets show that deepfake detectors offer limited standalone value, achieving F1 scores in the range of 0.26-0.53 on MMFakeBench and 0.33-0.49 on DGM4, and that incorporating their predictions into fact-checking pipelines consistently reduces performance by 0.04-0.08 F1 due to non-causal authenticity assumptions. In contrast, the evidence-centric fact-checking system achieves the highest performance, reaching F1 scores of approximately 0.81 on MMFakeBench and 0.55 on DGM4. Overall, our findings demonstrate that multimodal claim verification is driven primarily by semantic understanding and external evidence, and that pixel-level artifact signals do not reliably enhance reasoning over real-world image-text misinformation.
💡 Research Summary
The paper investigates whether image‑only deepfake detectors, which are designed to spot pixel‑level manipulations, actually help in verifying multimodal misinformation consisting of an image‑text claim, or whether they introduce misleading priors that degrade performance. To answer this, the authors conduct a systematic empirical study on two multimodal misinformation benchmarks: MMFakeBench and DGM4. Both datasets contain a mixture of authentic images paired with false captions, as well as synthetically generated or edited images, thereby covering both visual tampering and semantic/contextual distortion.
Three system families are evaluated. (1) State‑of‑the‑art image‑only deepfake detectors (e.g., FaceForensics++, DFDC‑ResNet, DiGEN‑Adapter) that output a binary “manipulated / not manipulated” score for the image. (2) An evidence‑centric automated fact‑checking pipeline that parses the claim, uses Monte‑Carlo Tree Search (MCTS) to select retrieval tools (web search, entity detection, image forensics, etc.), gathers external evidence, and then runs a Multi‑Agent Debate (MAD) where several agents propose, challenge, and evaluate evidence before a large language model (LLM) makes the final verdict. (3) A hybrid system that injects the deepfake detector’s probability as an additional evidence piece into the same evidence‑centric pipeline.
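The paper's exact MCTS formulation for tool selection isn't reproduced here, but the core idea of spending a limited retrieval budget on the most promising evidence sources can be illustrated with a simpler UCB-style bandit sketch. Everything below is hypothetical: the tool names, the reward interface `evidence_value(tool)`, and the budget are illustrative stand-ins, not the paper's implementation.

```python
import math
import random

# Hypothetical tool set; the paper mentions web search, entity
# detection, and image forensics among the retrieval tools.
TOOLS = ["web_search", "entity_detection", "image_forensics"]

def select_tool(counts, rewards, t, c=1.4):
    """UCB1 selection: trade off average evidence quality vs. exploration."""
    best, best_score = None, -float("inf")
    for tool in TOOLS:
        if counts[tool] == 0:
            return tool  # try every tool at least once
        score = rewards[tool] / counts[tool] + c * math.sqrt(math.log(t) / counts[tool])
        if score > best_score:
            best, best_score = tool, score
    return best

def run_budgeted_retrieval(evidence_value, budget=12):
    """Spend a fixed tool budget, learning which tool yields useful evidence.

    `evidence_value(tool)` stands in for actually running the tool and
    scoring the retrieved evidence (a hypothetical interface)."""
    counts = {t: 0 for t in TOOLS}
    rewards = {t: 0.0 for t in TOOLS}
    for t in range(1, budget + 1):
        tool = select_tool(counts, rewards, t)
        r = evidence_value(tool)
        counts[tool] += 1
        rewards[tool] += r
    return counts

# Toy example: for this claim, web search happens to be the most
# informative source, so the budget should concentrate there.
random.seed(0)
value = {"web_search": 0.9, "entity_detection": 0.4, "image_forensics": 0.2}
counts = run_budgeted_retrieval(lambda tool: value[tool] + random.uniform(-0.05, 0.05))
```

The design point this sketch captures is that tool selection is learned per claim rather than fixed: sources that return useful evidence get queried more often within the same budget.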
Results show that deepfake detectors alone achieve low F1 scores (0.26‑0.53 on MMFakeBench, 0.33‑0.49 on DGM4). Their performance collapses on cases where the image is genuine but the accompanying text is false, indicating that pixel‑level signals do not correlate with claim‑level truth. When the detector outputs are added to the fact‑checking pipeline, overall performance consistently drops by 0.04‑0.08 F1 points. The authors attribute this to non‑causal “authenticity priors”: a high confidence that an image is real leads the downstream LLM to under‑weight contradictory textual evidence, resulting in biased decisions.
Conversely, the evidence‑centric pipeline without detector cues attains the highest scores (≈0.81 F1 on MMFakeBench, ≈0.55 on DGM4). MCTS efficiently allocates limited tool budget to the most promising evidence sources, while MAD enables agents to surface inconsistencies, cross‑validate retrieved documents, and flag false visual claims. This structured reasoning outperforms naïve multimodal fusion models that rely only on joint embeddings.
The study draws three key implications. First, multimodal misinformation verification is driven primarily by semantic understanding and external evidence; pixel‑level artifact detection offers at best a marginal auxiliary signal. Second, indiscriminately feeding deepfake detector outputs into automated fact‑checking pipelines can be counterproductive, suggesting that system designers should treat detector scores as optional evidence, possibly with calibrated confidence or separate verification stages. Third, modular pipelines that combine tool selection (MCTS) and deliberative multi‑agent debate provide a robust framework for integrating heterogeneous evidence types and mitigating shortcut reasoning.
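The second implication, treating detector scores as optional evidence with calibrated confidence, can be made concrete with a small gating sketch. This is not the paper's design: the temperature value, thresholds, and evidence-dict format are all illustrative assumptions. The idea is to soften over-confident detector probabilities (standard post-hoc temperature scaling) and to abstain entirely when the calibrated score is indecisive, so no weak authenticity prior reaches the downstream verdict.

```python
import math

def calibrate(p_fake, temperature=2.0):
    """Temperature-scale a raw detector probability toward 0.5.

    Scaling the logit by T > 1 softens over-confident scores; a
    common post-hoc calibration trick (T here is an assumed value)."""
    eps = 1e-6
    p = min(max(p_fake, eps), 1 - eps)
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit / temperature))

def detector_evidence(p_fake, low=0.2, high=0.8):
    """Return an evidence item only when the calibrated score is decisive;
    otherwise abstain so the verdict rests on retrieved evidence alone."""
    q = calibrate(p_fake)
    if q >= high:
        return {"source": "deepfake_detector",
                "claim": "image is manipulated", "confidence": q}
    if q <= low:
        return {"source": "deepfake_detector",
                "claim": "image is pristine", "confidence": 1 - q}
    return None  # abstain: don't inject a weak authenticity prior

# A raw score of 0.6 is indecisive after calibration, so it is withheld;
# a near-certain score of 0.98 survives the gate.
withheld = detector_evidence(0.6)
kept = detector_evidence(0.98)
```

Gating plus calibration addresses the failure mode the results highlight: a moderately confident "image is real" score never reaches the LLM, so it cannot suppress contradictory textual evidence.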
The authors outline three directions for future work: (a) developing principled interfaces that dynamically weight detector confidence based on the context of the claim, (b) extending the framework to additional modalities such as video and audio, and (c) exploring human‑in‑the‑loop designs in which expert feedback can correct misleading priors introduced by visual authenticity detectors. Overall, the paper provides the first comprehensive assessment of deepfake detectors within multimodal fact‑checking and demonstrates that semantic, evidence‑driven approaches substantially outperform pixel‑centric methods.