Enhancing Fake-News Detection with Node-Level Topological Features
In recent years, the proliferation of misinformation and fake news has posed serious threats to individuals and society, spurring intense research into automated detection methods. Previous work showed that integrating content, user preferences, and propagation structure achieves strong performance, but it leaves all graph-level representation learning entirely to the GNN, keeping explicit topological cues hidden. To close this gap, we introduce a lightweight enhancement: for each node, we append two classical graph-theoretic metrics, degree centrality and local clustering coefficient, to its original BERT and profile embeddings, explicitly flagging hub roles and community cohesion. On the UPFD Politifact subset, this simple modification lifts macro F1 from 0.7753 (baseline) to 0.8344. Our study not only demonstrates the practical value of explicit topology features in fake-news detection but also provides an interpretable, easily reproducible template for fusing graph metrics into other information-diffusion tasks.
💡 Research Summary
The paper addresses a notable gap in graph‑neural‑network (GNN) based fake‑news detection systems, specifically the User Preference‑aware Fake News Detection (UPFD) framework, which integrates textual content, user profile information, and propagation structure but leaves all graph‑level representation learning to the GNN itself. Consequently, explicit topological cues—such as which users act as hubs or belong to tightly knit communities—remain hidden. To remedy this, the authors propose a lightweight augmentation: for every node in the retweet propagation graph they compute two classic graph‑theoretic metrics, degree centrality (a normalized measure of hubness) and local clustering coefficient (a measure of community cohesion). These scalar values are concatenated to the original node feature vector, which already contains a BERT‑encoded representation of the tweet history and a user‑profile embedding. The resulting enriched feature matrix is fed into a more expressive GNN encoder, namely a single‑layer Graph Isomorphism Network (GIN), followed by a global attention pooling layer that learns to weight nodes according to their relevance before a final fully‑connected classification head predicts “real” versus “fake”.
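Both metrics are cheap to compute directly from the adjacency structure. The following is a minimal pure-Python sketch (function and variable names are illustrative, not from the paper) of computing degree centrality and local clustering coefficient per node and appending them to existing feature vectors:

```python
def degree_centrality(adj):
    """Degree of each node, normalized by (n - 1), the standard definition."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def local_clustering(adj, v):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # clustering is undefined for degree < 2; use 0 by convention
    links = sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def augment_features(adj, feats):
    """Append the two scalar metrics to each node's existing feature vector."""
    dc = degree_centrality(adj)
    return {v: feats[v] + [dc[v], local_clustering(adj, v)] for v in adj}
```

In the paper the concatenation target is the BERT-plus-profile embedding; here `feats` stands in for any per-node vector, and `adj` is an undirected adjacency mapping from node to neighbor set.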
Experiments are conducted on the two benchmark datasets used by UPFD: Politifact (314 political news items, ~41k user nodes) and GossipCop (5,464 entertainment news items, ~314k user nodes). Training uses Adam (lr = 1e-3, weight decay = 5e-4), batch size 64, and 50 epochs. On the Politifact subset, the enhanced model raises macro-averaged F1 from 0.7753 (baseline UPFD) to 0.8344, with a comparable AUC increase (0.8839 → 0.9152). The authors attribute this gain to the fact that political misinformation tends to spread through ideologically homogeneous clusters, producing distinctive hub-and-cluster patterns that degree centrality and clustering coefficient capture directly. On GossipCop, by contrast, the enhancement does not help: macro F1 dips slightly from 0.9551 to 0.9451 while AUC holds at 0.9850, indicating that in the entertainment domain textual cues dominate and propagation structure is less discriminative.
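Macro-averaged F1, the headline metric above, averages per-class F1 so that the "real" and "fake" classes count equally regardless of class balance. A small self-contained sketch of the computation (not code from the paper):

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Mean of per-class F1 scores; each class weighted equally."""
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

This matches scikit-learn's `f1_score(..., average="macro")` for binary labels.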
To probe the relative contributions of content and structure, three ablation settings are evaluated: (1) “Original” (both content and structure intact), (2) “Feature‑Only” (graph edges randomized, content preserved), and (3) “Structure‑Only” (node features replaced with Gaussian noise, graph preserved). Results show that Politifact relies more heavily on structural information, whereas GossipCop depends primarily on textual features. A further exploratory analysis of five topological descriptors (average degree, degree centrality, clustering coefficient, graph density, node count) reveals that in Politifact fake news graphs are larger but sparser, while in GossipCop fake news graphs are denser and exhibit higher clustering. Correlation heatmaps indicate redundancy among average degree, density, and clustering, suggesting a compact feature set (e.g., node count plus degree centrality) could suffice for classification.
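The two degraded ablation settings can be sketched as simple transforms on an edge list and a feature matrix. This is a hedged illustration of the idea (the paper does not publish this exact code, and its randomization scheme may differ):

```python
import random

def feature_only(edges, num_nodes, seed=0):
    """'Feature-Only' setting: content preserved, edges replaced at random."""
    rng = random.Random(seed)
    return [(rng.randrange(num_nodes), rng.randrange(num_nodes)) for _ in edges]

def structure_only(feats, sigma=1.0, seed=0):
    """'Structure-Only' setting: graph preserved, features replaced with Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in row] for row in feats]
```

Running the same classifier on the original, `feature_only`, and `structure_only` variants then isolates how much each signal contributes, which is how the Politifact-vs-GossipCop contrast above is established.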
The paper’s contribution is twofold: (i) it demonstrates that adding just two interpretable, computationally cheap graph metrics can substantially improve GNN‑based fake‑news detection in domains where propagation structure is informative, and (ii) it provides an easily reproducible template that can be transplanted to other diffusion‑based tasks such as rumor detection or opinion dynamics modeling. By making structural signals explicit, the approach also enhances model interpretability, allowing practitioners to trace predictions back to concrete network roles (hub users, community tightness). The authors conclude that future systems should adopt a domain‑aware design, selectively enriching node representations with topological cues when the underlying diffusion patterns exhibit strong structural signatures.