Supporting the Curation of Twitter User Lists


Twitter introduced lists in late 2009 as a means of curating tweets into meaningful themes. Lists were quickly adopted by media companies as a means of organising content around news stories. The curation of these lists is therefore important: they should contain the key information gatekeepers and present a balanced perspective on the story. Identifying members to add to a list on an emerging topic is a delicate process. From a network analysis perspective there are a number of views on the Twitter network that can be explored, e.g. followers, retweets, mentions, etc. We present a process for integrating these views in order to recommend authoritative commentators to include on a list. This process is evaluated on manually curated lists about unrest in Bahrain and the Iowa caucuses for the 2012 US election.


💡 Research Summary

The paper addresses the practical problem faced by newsrooms and social‑media monitoring agencies of expanding Twitter user lists that are used to track emerging stories. While Twitter’s “list” feature allows users to group accounts by topic, manually curating these lists for new events is time‑consuming and risks missing key contributors. The authors propose an automated, iterative recommendation framework that starts from a small, expert‑provided seed list and expands it by discovering relevant “candidate” accounts in the surrounding Twitter network.

The system operates in three phases: bootstrapping, recommendation, and update. In the bootstrapping phase, the algorithm retrieves the follower/friend relationships, list memberships, and recent tweets of each seed user, subject to API rate limits (max 1,000 links, lists, or tweets per user). This yields a core set (the seed) and a candidate set (all non‑core users encountered).

Four distinct network views are constructed from the collected data: (1) a directed core‑friend graph (core users and the accounts they follow), (2) a directed core‑mention graph (edges from a core user to any non‑core user mentioned, weighted by mention count), (3) a directed core‑retweet graph (edges from a core user to any non‑core user whose tweets were retweeted, weighted by retweet count), and (4) an undirected weighted co‑listed graph (edges between core and non‑core users that appear together on the same external Twitter list, weighted by the Jaccard similarity of list overlap).
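The four views described above can be sketched on toy data as follows. This is a minimal illustration, not the authors' implementation: the function name `build_views`, the input shapes, and the user and list names are all hypothetical, and the co‑listed weight is assumed to be the Jaccard similarity between the sets of lists that two users appear on.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_views(core, friends, mentions, retweets, list_members):
    """Build the four edge sets from already-collected data (hypothetical shapes)."""
    # (1) Directed, unweighted edges from each core user to the accounts they follow.
    friend_edges = {(u, v) for u in core for v in friends.get(u, [])}

    # (2) and (3): directed core -> non-core edges, weighted by how often they occur.
    def weighted(pairs):
        return Counter((u, v) for u, v in pairs if u in core and v not in core)

    mention_edges = weighted(mentions)
    retweet_edges = weighted(retweets)

    # (4) Undirected core/non-core edges, weighted by the Jaccard similarity of
    # the sets of lists each user appears on (an assumed reading of the summary).
    lists_of = {}
    for name, members in list_members.items():
        for user in members:
            lists_of.setdefault(user, set()).add(name)
    colisted_edges = {}
    for u in core:
        for v in lists_of:
            if v in core:
                continue
            w = jaccard(lists_of.get(u, set()), lists_of[v])
            if w > 0:
                colisted_edges[frozenset((u, v))] = w
    return friend_edges, mention_edges, retweet_edges, colisted_edges


# Toy data: two core users, two candidates, three external lists.
core = {"alice", "bob"}
friends = {"alice": ["bob", "carol"], "bob": ["carol", "dave"]}
mentions = [("alice", "carol"), ("alice", "carol"), ("bob", "dave"), ("bob", "alice")]
retweets = [("bob", "carol")]
list_members = {
    "list1": {"alice", "carol"},
    "list2": {"alice", "bob", "carol"},
    "list3": {"dave"},
}

friend_edges, mention_edges, retweet_edges, colisted_edges = build_views(
    core, friends, mentions, retweets, list_members
)
```

Keys of `colisted_edges` are `frozenset` pairs because that view is undirected, while the other three views use ordered `(source, target)` tuples.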

Each graph is scored with a centrality or authority measure. The core‑friend graph is scored twice: once with normalized in‑degree and once with HITS with priors (seed users receive a uniform prior probability, all others zero). The mention, retweet, and co‑listed graphs are each scored with weighted in‑degree. This yields five score vectors (normalized degree, HITS, and three weighted in‑degree scores). Rather than merging raw scores, which lie on different scales, the authors convert each score vector into a rank ordering and assemble a matrix X whose rows correspond to users and whose columns correspond to the five rankings. Singular Value Decomposition (SVD) is then performed on X; the first left singular vector provides an aggregated score that captures the dominant consensus across all views. Users are sorted by this aggregated score, and the top r (r = 50 in the experiments) are presented as recommendations. Optional filters (minimum tweet count, recent activity) can be applied to the recommendations.
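The aggregation step can be illustrated with a small, dependency‑free sketch. Several details here are assumptions rather than the paper's specification: ranks are encoded so that the best user in a view receives the value n and the worst receives 1, users absent from a view are given a score of zero, and the first left singular vector is obtained by power iteration on X Xᵀ instead of a library SVD routine (in practice a call such as `numpy.linalg.svd` would be the usual choice). All function names and scores are hypothetical.

```python
import math


def to_rank_scores(scores):
    """Map raw scores to rank values: best user gets n, worst gets 1 (ties by name)."""
    ordered = sorted(scores, key=lambda u: (-scores[u], u))
    n = len(ordered)
    return {u: n - i for i, u in enumerate(ordered)}


def first_left_singular_vector(X, iters=200):
    """Dominant left singular vector of X, via power iteration on X X^T."""
    n, m = len(X), len(X[0])
    u = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # One step of u <- X (X^T u), followed by normalization.
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]
        w = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return u
        u = [x / norm for x in w]
    return u


def aggregate(views, top_r=50):
    """Fuse per-view score dictionaries into one ranking of (user, score) pairs."""
    users = sorted(set().union(*views))
    # Users missing from a view are treated as scoring zero there (an assumption).
    columns = [to_rank_scores({u: view.get(u, 0.0) for u in users}) for view in views]
    X = [[col[u] for col in columns] for u in users]
    agg = first_left_singular_vector(X)
    ranked = sorted(zip(users, agg), key=lambda p: (-p[1], p[0]))
    return ranked[:top_r]


# Five toy score vectors, one per measure (values are made up).
views = [
    {"carol": 0.9, "dave": 0.5, "erin": 0.1},  # normalized in-degree
    {"carol": 0.8, "dave": 0.6, "erin": 0.2},  # HITS with priors
    {"carol": 5, "dave": 1},                   # weighted mention in-degree
    {"carol": 2, "erin": 3},                   # weighted retweet in-degree
    {"carol": 1.5, "dave": 0.5, "erin": 0.4},  # weighted co-listed degree
]
recommended = [user for user, _ in aggregate(views, top_r=3)]
```

Because X has only non‑negative entries and the iteration starts from a positive vector, the resulting scores stay non‑negative, so sorting them in descending order is well defined without fixing a sign.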

After recommendations are generated, a human curator (or in the experiments, an automated selector) adds the highest‑ranked candidates to the core set. The update phase then refreshes the local network representation by re‑fetching data for the expanded core, for the most frequently mentioned non‑core users, and for the previously rejected candidates. This refreshed network feeds back into the next recommendation cycle, allowing the system to adapt to evolving conversation dynamics while respecting API quotas.
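The bootstrap, recommend, select, and update cycle might be organised roughly as below. This is only a sketch under stated assumptions: `curate`, `fetch_network`, `recommend`, and `select` are hypothetical names, the toy `GRAPH` stands in for data that would really come from the Twitter API, and the toy recommender uses plain in‑degree from the core rather than the multi‑view aggregation.

```python
from collections import Counter


def curate(seed, fetch_network, recommend, select, iterations=6):
    """Run the iterative loop; returns the final core set and rejected candidates."""
    core = set(seed)
    rejected = set()
    network = fetch_network(core, set())  # bootstrapping phase
    for _ in range(iterations):
        # Recommendation phase: ranked non-core candidates.
        candidates = [u for u in recommend(network, core) if u not in core]
        # Selection: a curator (or automated selector) accepts some candidates.
        accepted = set(select(candidates))
        rejected |= set(candidates) - accepted
        rejected -= accepted
        core |= accepted
        # Update phase: refresh data for the expanded core and rejected users.
        network = fetch_network(core, rejected)
    return core, rejected


# Toy stand-ins for the real data source and ranking (all hypothetical).
GRAPH = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": ["e"], "e": []}


def fetch_network(core, extra):
    """Return who-follows-whom for the requested users only."""
    return {u: GRAPH.get(u, []) for u in core | extra}


def recommend(network, core):
    """Rank non-core users by how many core users follow them."""
    counts = Counter(v for u in core for v in network.get(u, []) if v not in core)
    return sorted(counts, key=lambda v: (-counts[v], v))


final_core, final_rejected = curate(
    {"a"}, fetch_network, recommend, select=lambda c: c[:1], iterations=2
)
```

Passing the phases in as callables keeps the loop independent of any particular API client or ranking method, which makes it straightforward to swap the automated selector for a human curator.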

The framework is evaluated on two real‑world Twitter lists curated by Storyful: (1) a list of 128 accounts covering the Iowa caucuses during the 2012 US presidential primaries, and (2) a list related to the political unrest in Bahrain. For the Iowa case, the full list is split into four disjoint subsets of 32 users each, and each subset is used in turn as a seed list. Each seed undergoes six full recommendation‑selection‑update iterations, automatically adding the top five users per iteration (30 new core users in total). Precision remains high throughout (0.88–0.97) and recall steadily improves (up to 0.48 by the sixth iteration), indicating that expanding the list does not significantly degrade relevance. Visualizations of the follower subgraph before and after expansion show that high‑profile accounts (e.g., the Governor of Iowa) are correctly incorporated. The Bahrain case examines whether a “silo” effect occurs when the seed list is biased toward a particular viewpoint; the multi‑view SVD aggregation mitigates this bias by surfacing users identified through mentions, retweets, or co‑listing rather than solely through follower counts.
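For readers who want to reproduce this style of evaluation, the sketch below computes set‑based precision and recall and checks the protocol arithmetic. It is an illustration only: the summary does not state exactly which users count toward each measure, so scoring the added users against the held‑out part of the ground‑truth list is an assumption, and all user names are made up.

```python
def precision_recall(selected, relevant):
    """Set-based precision and recall of `selected` against `relevant`."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Protocol arithmetic from the Iowa experiment.
seed_size = 128 // 4   # four disjoint seed subsets of the 128-account list
added_total = 6 * 5    # six iterations, five users added per iteration

# Toy example: a 10-user ground-truth list with a 2-user seed.
ground_truth = {f"u{i}" for i in range(10)}
seed = {"u0", "u1"}
held_out = ground_truth - seed          # users the system should rediscover
added = ["u2", "u3", "u4", "x1"]        # "x1" is not on the ground-truth list

precision, recall = precision_recall(added, held_out)
```

In this toy run three of the four added users are on the held‑out list, so precision is 3/4 and recall is 3/8.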

Key contributions include: (i) the definition of four complementary network views, notably the novel weighted co‑listed graph that captures crowd‑sourced curation signals; (ii) an SVD‑based rank aggregation method that efficiently fuses heterogeneous centrality measures without requiring parameter tuning; and (iii) a practical, API‑aware workflow that integrates human oversight, mirroring active‑learning paradigms. Several limitations are acknowledged. API rate limits force the exclusion of ultra‑high‑degree nodes, potentially omitting influential accounts. The evaluation relies on precision and recall against a static ground truth, which does not capture dimensions such as diversity, bias, or topical coverage. Finally, the system currently operates in batch mode rather than on a true real‑time stream.

Future work suggested includes incorporating textual analysis (topic modeling, sentiment) to complement structural signals, developing metrics for list diversity and bias, and extending the framework to online SVD or incremental graph updates for real‑time monitoring. Overall, the paper demonstrates that a multi‑view network approach, combined with simple linear algebraic aggregation, can substantially aid journalists and analysts in maintaining comprehensive, balanced Twitter lists for fast‑moving news events.

