Predicting Influential Users in Online Social Networks
Who are the influential people in an online social network? The answer to this question depends not only on the structure of the network, but also on details of the dynamic processes occurring on it. We classify these processes as conservative and non-conservative. A random walk on a network is an example of a conservative dynamic process, while information spread is non-conservative. The influence models used to rank network nodes can be similarly classified, depending on the dynamic process they implicitly emulate. We claim that in order to correctly rank network nodes, the influence model has to match the details of the dynamic process. We study a real-world network on the social news aggregator Digg, which allows users to post and vote for news stories. We empirically define influence as the number of in-network votes a user’s post generates. This influence measure, and the resulting ranking, arises entirely from the dynamics of voting on Digg, which represents non-conservative information flow. We then compare predictions of different influence models with this empirical estimate of influence. The results show that non-conservative models are better able to predict influential users on Digg. We find that the normalized alpha-centrality metric is one of the best predictors of influence. We also present a simple algorithm for computing this metric, along with the associated mathematical formulation and analytical proofs.
💡 Research Summary
The paper tackles the problem of identifying influential users in online social networks by emphasizing the importance of matching the influence‑ranking model to the underlying dynamic process that governs information flow. The authors introduce a dichotomy between conservative dynamics—where the quantity being transmitted is preserved (e.g., a random walk)—and non‑conservative dynamics—where information, opinions, or behaviors can be duplicated and spread (e.g., voting, retweeting). They argue that an influence model implicitly assumes a particular type of dynamics; therefore, a model designed for conservative processes will mis‑rank nodes when the real process is non‑conservative, and vice‑versa.
To test this hypothesis, the authors use data from Digg, a social news aggregator where users submit stories and other users vote for them. They define empirical influence as the total number of in‑network votes generated by a user’s submitted stories. This measure is purely a product of Digg’s voting mechanism, which is a classic example of non‑conservative information spread: a single vote can inspire many downstream votes without any loss of “vote mass”.
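The distinction between the two kinds of dynamics can be illustrated with a minimal sketch. The function names, the toy star graph, and the rule that a node with no out-links keeps its mass are assumptions made for illustration and do not come from the paper. A random-walk step splits a node's mass among its out-neighbors, so the total is preserved; a broadcast step copies the full mass to every out-neighbor, so the total can grow.

```python
def conservative_step(adj, mass):
    """One random-walk step: each node splits its mass equally among out-neighbors."""
    n = len(adj)
    new_mass = [0.0] * n
    for i in range(n):
        out_degree = sum(adj[i])
        if out_degree == 0:
            # Assumption: a node with no out-links keeps its mass.
            new_mass[i] += mass[i]
            continue
        for j in range(n):
            if adj[i][j]:
                new_mass[j] += mass[i] / out_degree
    return new_mass


def non_conservative_step(adj, mass):
    """One broadcast step: each node copies its full mass to every out-neighbor."""
    n = len(adj)
    new_mass = [0.0] * n
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                new_mass[j] += mass[i]
    return new_mass


# Hypothetical star graph: node 0 links to nodes 1, 2 and 3.
adj = [
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
start = [1.0, 0.0, 0.0, 0.0]

walk = conservative_step(adj, start)        # total mass stays at 1
spread = non_conservative_step(adj, start)  # total mass grows to 3
print(sum(walk), sum(spread))
```

In the conservative case the single unit of mass is shared out in thirds, while in the non-conservative case each of the three neighbors receives a full copy.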
The study evaluates several centrality measures:
- Structural metrics – degree, closeness, betweenness.
- Conservative dynamic metric – PageRank (random‑walk based).
- Non‑conservative dynamic metrics – α‑centrality and its normalized variant.
α‑centrality is defined as x = (I – αA)⁻¹ · 1, where A is the adjacency matrix, α a scalar controlling the attenuation of longer paths, and 1 a vector of ones. Small α emphasizes immediate neighbors, while larger α incorporates longer‑range influence. Normalization divides each node’s score by the sum of all scores, so that the scores sum to one. This leaves the ranking for a given α unchanged but keeps the scores on a common scale as α varies.
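As a worked illustration of the closed form, the sketch below solves (I – αA)x = 1 on a three-node chain with a small Gaussian-elimination routine and then normalizes the result. The helper names, the toy graph, the value α = 0.5, and the convention that A[i][j] = 1 denotes a link from i to j are assumptions chosen for the example; the paper may orient edges differently.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; alpha may equal 1/eigenvalue")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def alpha_centrality(adj, alpha):
    """Closed form x = (I - alpha*A)^-1 * 1, computed by solving a linear system."""
    n = len(adj)
    system = [
        [(1.0 if i == j else 0.0) - alpha * adj[i][j] for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(system, [1.0] * n)


def normalize(scores):
    """Divide every score by the total so that the scores sum to one."""
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical directed chain 0 -> 1 -> 2.
adj = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
scores = alpha_centrality(adj, alpha=0.5)
normalized = normalize(scores)
print(scores)  # [1.75, 1.5, 1.0]
```

Node 2 has no outgoing paths and scores 1; node 1 adds one attenuated step (1 + 0.5), and node 0 adds a further two-step path (1 + 0.5 + 0.25). Solving the system rather than forming the inverse explicitly is a common numerical choice, not something the source prescribes.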
Experimental Findings
- PageRank and the purely structural metrics show weak correlation with empirical influence (Pearson r ≈ 0.2–0.35).
- α‑centrality exhibits a substantially higher correlation (r ≈ 0.65) when α is set in the range 0.1–0.2.
- Normalized α‑centrality further improves performance, achieving the highest correlation (r ≈ 0.78) and the best top‑k ranking agreement (≈68 % of the top‑10 % influential users correctly identified).
These results confirm the authors’ central claim: non‑conservative models better capture influence in environments where information duplication dominates.
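The two evaluation measures quoted above, Pearson correlation and agreement among top-ranked users, could be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, the toy scores and vote counts are invented for illustration and are not Digg data, and the overlap definition is one plausible reading of "top-k ranking agreement".

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def top_k_overlap(predicted, empirical, fraction):
    """Share of the empirically top-ranked users that the prediction also ranks on top."""
    k = max(1, int(len(predicted) * fraction))

    def top(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])

    return len(top(predicted) & top(empirical)) / k


# Invented toy values: model scores and in-network vote counts for four users.
model_scores = [0.9, 0.1, 0.5, 0.3]
vote_counts = [120, 5, 40, 60]

r = pearson(model_scores, vote_counts)
overlap = top_k_overlap(model_scores, vote_counts, fraction=0.5)
print(r, overlap)
```

With these toy values the model and the vote counts agree on one of the two top users, so the overlap is 0.5.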
Algorithmic Contribution
The paper presents an iterative algorithm for computing α‑centrality efficiently on large graphs:
Initialize x^{0} = 1 (the all-ones vector)
Repeat: x^{t+1} = αA x^{t} + 1
until convergence
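Convergence is guaranteed when α·λ_max < 1, where λ_max is the largest-magnitude eigenvalue (spectral radius) of A. Under this condition the iterate x^{t} = Σ_{s=0}^{t} (αA)^{s} · 1 is a partial sum of a convergent Neumann series, and the authors provide a formal proof that its limit equals (I – αA)⁻¹ · 1. Each iteration costs one sparse matrix-vector product, so the total cost is O(m·k), with m the number of edges and k the number of iterations, making the method scalable to networks with millions of edges.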
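The iteration can be sketched in a few lines of Python over an edge list, so that each pass touches every edge once. The function name, the tolerance, the iteration cap, and the convention that an edge (i, j) means A[i][j] = 1 are assumptions made for this example rather than details taken from the paper.

```python
def alpha_centrality_iterative(n, edges, alpha, tol=1e-10, max_iter=1000):
    """Iterate x <- alpha * A x + 1 from x = 1 until successive iterates agree.

    n     -- number of nodes, labelled 0 .. n-1
    edges -- list of (i, j) pairs, read as A[i][j] = 1 (assumed convention)
    alpha -- attenuation factor; needs alpha * lambda_max < 1 to converge
    """
    x = [1.0] * n
    for _ in range(max_iter):
        # Start from the constant term, then add alpha * A x edge by edge.
        nxt = [1.0] * n
        for i, j in edges:
            nxt[i] += alpha * x[j]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence; alpha may be too large for this graph")


# Hypothetical directed chain 0 -> 1 -> 2.
chain_edges = [(0, 1), (1, 2)]
result = alpha_centrality_iterative(3, chain_edges, alpha=0.5)
print(result)  # [1.75, 1.5, 1.0]
```

On this chain the iterates are [1.5, 1.5, 1.0] and then [1.75, 1.5, 1.0], after which nothing changes. Looping over edges rather than a dense matrix is what keeps the per-iteration cost proportional to m.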
Implications and Applications
The findings have broad relevance:
- Marketing & Viral Campaigns: In platforms where sharing is non‑conservative, normalized α‑centrality can pinpoint seed users whose endorsement will cascade most effectively.
- Public Opinion & Misinformation: Detecting users who can amplify messages helps design interventions to curb the spread of false information.
- Network Design: Understanding whether a system behaves conservatively or non‑conservatively informs the choice of monitoring and control strategies.
Conversely, in domains where the process is fundamentally conservative (e.g., packet routing, disease transmission modeled as conserved particles), traditional random‑walk based centralities remain appropriate.
Conclusion
By systematically linking the nature of the underlying diffusion process to the choice of influence metric, the paper provides a principled framework for influence prediction. Empirical validation on Digg demonstrates that normalized α‑centrality, a non‑conservative centrality measure, outperforms both structural and conservative alternatives in ranking influential users. The accompanying scalable algorithm and rigorous convergence analysis make the approach ready for deployment in real‑world large‑scale social platforms.