Supervised Rank Aggregation for Predicting Influence in Networks
Much work in Social Network Analysis has focused on the identification of the most important actors in a social network. This has resulted in several measures of influence and authority. While most such sociometrics (e.g., PageRank) are driven by intuitions based on an actor's location in a network, asking for the “most influential” actors is in itself an ill-posed question unless it is put in context with a specific measurable task. Constructing a predictive task of interest in a given domain provides a mechanism to quantitatively compare different measures of influence. Furthermore, when we know what type of actionable insight to gather, we need not rely on a single network centrality measure. A combination of measures is more likely to capture various aspects of the social network that are predictive and beneficial for the task. Towards this end, we propose an approach to supervised rank aggregation, driven by techniques from Social Choice Theory. We illustrate the effectiveness of this method through experiments on Twitter and citation networks.
Research Summary
The paper tackles the longstanding problem of identifying influential actors in social networks by reframing influence as a concrete predictive task rather than an abstract, ill‑posed notion. The authors argue that without a specific downstream objective, traditional centrality measures (e.g., degree, PageRank) provide only heuristic rankings that cannot be objectively compared. To address this, they propose a supervised rank‑aggregation framework—Supervised Kemeny Ranking (SKR)—that learns how to combine multiple centrality scores in a way that directly optimizes predictive performance.
Data and Task
The primary dataset consists of a two‑week snapshot of Twitter activity surrounding the “Pepsi” controversy in November 2009. From this period the authors construct three directed graphs: a follower graph, a retweet graph, and a mention graph, each containing roughly 40 million nodes and over a billion edges. For each graph they compute three basic centrality metrics—indegree, outdegree, and PageRank—yielding a total of 13 distinct rankings (including weighted versions of retweet and mention counts). The prediction task is binary: given a user’s rankings at time t, predict whether the user will generate at least 100 retweets in the following week (≈10 % of the maximum observed retweet count). Model performance is evaluated with Area Under the ROC Curve (AUC) and Average Precision (AP) for the top‑k users, reflecting both overall discrimination and ranking quality at the head of the list.
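The two evaluation measures named above can be illustrated with a minimal sketch. It assumes that a higher score means "more likely to reach the retweet threshold" and that AP over the top-k users is normalized by the number of positives found within those k positions. The summary does not say which AP variant the authors use, so that normalization, and the function names `auc` and `average_precision_at_k`, are assumptions made for illustration. The quadratic AUC loop is written for clarity, not for graphs with millions of users.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs in which the positive item
    receives the higher score; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Mean of precision@rank over the positives found in the top k.
    Assumption: normalized by the number of hits inside the top k."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = 0
    total = 0.0
    for rank, i in enumerate(top, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data (hypothetical): four users, two of whom reach the threshold.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
example_ap = average_precision_at_k(example_scores, example_labels, k=4)
```

On the toy data, three of the four positive/negative pairs are ordered correctly, so `example_auc` is 0.75.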
Baseline Evaluation of Individual Measures
Table I shows that several individual metrics already achieve respectable predictive power (AUC > 80 %). The strongest single predictors are the weighted retweet indegree (AUC = 90.18 %) and weighted retweet outdegree (AUC = 86.80 %). Follower indegree and mention outdegree also perform reasonably well, but the Spearman correlation between past retweets and follower count is modest (≈0.43), indicating that each metric captures a distinct facet of influence. This motivates the need for a method that can fuse complementary signals rather than relying on any one measure.
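As a reminder of what the Spearman figure measures, the following sketch computes the rank correlation between two per-user score lists as the Pearson correlation of their average ranks. The helper names and the toy values are hypothetical and the numbers are not taken from the paper. The sketch also assumes neither list is constant, since a constant list would make the denominator zero.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy values: follower counts vs. past retweets for three users.
followers = [100, 2000, 50000]
retweets = [5, 40, 12]
rho = spearman(followers, retweets)
```

Here the two lists agree on which user is lowest but disagree on the top two, giving `rho` = 0.5. A value well below 1 on real data is what suggests that the measures carry complementary information.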
From Unsupervised to Supervised Rank Aggregation
Traditional rank‑aggregation methods from social choice theory—Borda count and Kemeny optimal aggregation—treat all input rankings equally. While Kemeny minimizes the sum of Kendall‑tau distances to the input rankings, it is NP‑hard for four or more rankings and does not incorporate any knowledge about which rankings are more predictive for the task at hand. Consequently, a naïve Kemeny or Borda aggregation can be sub‑optimal when some centrality measures are noisy.
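To make the unsupervised baselines concrete, here is a small sketch of the Kendall-tau distance, the Kemeny objective (sum of distances to all input rankings), and an unweighted Borda count. It assumes every input ranking is a full ordering of the same items, listed best first. The function names and the alphabetical tie-breaking in `borda` are illustrative choices, not details from the paper.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def kendall_tau_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of item pairs that rankings a and b place in opposite order."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_cost(candidate: Sequence[str], rankings: Sequence[Sequence[str]]) -> int:
    """Kemeny objective: total Kendall-tau distance from candidate to the inputs."""
    return sum(kendall_tau_distance(candidate, r) for r in rankings)


def borda(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Unweighted Borda count: position p (0-based) in a ranking of m items
    earns m - 1 - p points. Ties are broken alphabetically (an assumption)."""
    m = len(rankings[0])
    points: Dict[str, int] = {x: 0 for x in rankings[0]}
    for r in rankings:
        for p, x in enumerate(r):
            points[x] += m - 1 - p
    return sorted(points, key=lambda x: (-points[x], x))


# Three toy input rankings over the same three items.
toy_rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
borda_order = borda(toy_rankings)
borda_cost = kemeny_cost(borda_order, toy_rankings)
```

On this toy input the Borda order is `["a", "b", "c"]`, which disagrees with the second and third rankings on one pair each, for a Kemeny cost of 2. Every input ranking counts equally here, which is exactly the limitation the supervised variant targets.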
Supervised Kemeny Ranking (SKR) addresses both issues. First, the authors estimate a weight for each input ranking based on its performance on a held‑out training set (e.g., AUC or AP). These weights are then used to bias the pairwise majority relation that drives an Approximate Kemeny algorithm. The Approximate Kemeny procedure is essentially a quick‑sort that uses the weighted majority relation as its comparator: element i is considered “greater than” element j if the total weight of the rankings that place i before j exceeds the total weight of those that place j before i. Because the comparator may be non‑transitive, the final order depends on the sorting algorithm, but the authors prove that the quick‑sort implementation yields a locally Kemeny‑optimal ordering and satisfies the Extended Condorcet Criterion (ECC): whenever the items can be split into two groups such that every item in the first group beats every item in the second under the weighted majority relation, all items of the first group appear before all items of the second in the final list.
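The weighted comparator and the quick-sort procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are simply passed in (for instance, each ranking's training AUC), the pivot is chosen at random, an exact tie in weighted votes places the item after the pivot, and the names `make_preference` and `approx_kemeny` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def make_preference(
    rankings: Sequence[Sequence[str]], weights: Sequence[float]
) -> Callable[[str, str], bool]:
    """Build the weighted majority relation over full rankings (best first)."""
    positions = [{x: p for p, x in enumerate(r)} for r in rankings]

    def prefers(i: str, j: str) -> bool:
        # Positive margin: rankings placing i before j carry more total weight.
        margin = 0.0
        for pos, w in zip(positions, weights):
            margin += w if pos[i] < pos[j] else -w
        return margin > 0

    return prefers


def approx_kemeny(
    items: Sequence[str],
    prefers: Callable[[str, str], bool],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Quick-sort driven by a possibly non-transitive pairwise comparator."""
    if rng is None:
        rng = random.Random(0)  # fixed seed (assumption) for repeatable runs
    rest = list(items)
    if len(rest) <= 1:
        return rest
    pivot = rest.pop(rng.randrange(len(rest)))
    before: List[str] = []
    after: List[str] = []
    for x in rest:
        # Each item is compared with the pivot exactly once.
        (before if prefers(x, pivot) else after).append(x)
    return approx_kemeny(before, prefers, rng) + [pivot] + approx_kemeny(after, prefers, rng)


# Toy inputs (hypothetical): three rankings of four users.
toy = [["a", "b", "c", "d"], ["b", "a", "c", "d"], ["a", "c", "b", "d"]]
balanced = approx_kemeny("abcd", make_preference(toy, [0.9, 0.6, 0.8]))
skewed = approx_kemeny("abcd", make_preference(toy, [0.1, 1.0, 0.1]))
```

With the first set of weights the aggregate is `["a", "b", "c", "d"]`. When the second ranking is given a dominant weight, `b` overtakes `a` and the result becomes `["b", "a", "c", "d"]`. The recursive form is kept for readability; at the scale of millions of users an iterative or in-place sort with the same comparator would be the more practical choice.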
Computationally, Approximate Kemeny runs in O(r m log m) time (r = number of rankings, m = number of items), making it scalable to millions of users. The authors also compare against a “Local Kemenization” baseline (bubble‑sort based) and a point‑wise logistic regression that treats the centrality scores as features; both are outperformed by SKR.
Experimental Results
Across 20 stratified random splits (using 80 % of the users for training and 20 % for testing), SKR achieves an average AUC of ≈ 90.5 % and AP of ≈ 0.73, modestly surpassing the best single metric (weighted retweet indegree) and substantially beating unsupervised Kemeny, Borda, and logistic‑regression baselines. The improvement, though numerically small (≈ 0.3 % AUC, ≈ 0.04 % AP), is statistically significant given the large sample size and consistent across all splits. The authors replicate the methodology on a citation network, where the task is to predict future citation counts; SKR again yields the highest predictive scores, demonstrating the approach’s domain‑agnostic applicability.
Strengths, Limitations, and Future Directions
Key strengths of the work include: (1) a clear formulation of influence as a measurable prediction problem; (2) a novel integration of social‑choice theory with supervised learning, yielding a rank‑aggregation method that respects ECC while leveraging task‑specific performance information; (3) an efficient approximation algorithm that scales to tens of millions of nodes. Limitations are also acknowledged: the Twitter experiments focus on a single event and a short two‑week window, raising questions about generalizability to other topics, longer horizons, or different platforms. The binary threshold (≥ 100 retweets) is arbitrary; exploring regression‑style objectives or varying thresholds could provide richer insights. Finally, the authors omit computationally expensive centralities such as betweenness, but note that recent parallel approximations could be incorporated in future extensions.
In summary, the paper presents a compelling case for supervised rank aggregation as a practical tool for influence prediction in large‑scale networks. By learning to weight and combine diverse centrality signals, Supervised Kemeny Ranking delivers consistently better rankings than any individual metric or traditional unsupervised aggregation, and does so with computational efficiency suitable for real‑world, big‑data environments. Future work could broaden the set of base rankings, integrate content‑based features, and explore adaptive weighting schemes that evolve over time.