Modelling of trends in Twitter using retweet graph dynamics
In this paper we model user behaviour in Twitter to capture the emergence of trending topics. For this purpose, we first extensively analyse tweet datasets of several different events. In particular, for these datasets, we construct and investigate the retweet graphs. We find that the retweet graph for a trending topic has a relatively dense largest connected component (LCC). Next, based on the insights obtained from the analyses of the datasets, we design a mathematical model that describes the evolution of a retweet graph by three main parameters. We then quantify, analytically and by simulation, the influence of the model parameters on the basic characteristics of the retweet graph, such as the density of edges and the size and density of the LCC. Finally, we put the model into practice, estimate its parameters and compare the resulting behaviour of the model to our datasets.
💡 Research Summary
The paper investigates the emergence of trending topics on Twitter by focusing on the structural evolution of retweet graphs rather than on tweet content. The authors first collect a diverse set of Twitter datasets covering twenty‑plus events from 2013‑2014, including riots, sports competitions, political incidents, and natural disasters. For each dataset they construct a directed retweet graph in which nodes represent users and a directed edge (u → v) indicates that user v retweeted a tweet originally posted by user u. Examining the graph at hourly intervals, they find a consistent pattern: the edge density (edges per node) of the largest weakly connected component (LCC) exceeds one at or shortly before the peak in overall tweet activity. This “densification” means that the users in the LCC are retweeted more than once on average. It appears ahead of the peak for some events (e.g., a protest) and coincides with the peak for others (e.g., a live sports event), so its timing is a potential early‑warning signal for trend escalation.
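The LCC edge density described above can be computed directly from a list of retweet pairs. The following is a minimal sketch, not the authors' code: the function name `lcc_edge_density`, the `(source_user, retweeter)` input format, and the choice to collapse repeated retweets between the same pair into one edge are all assumptions made for illustration.

```python
from collections import defaultdict


def lcc_edge_density(retweets):
    """Return (nodes, edges, edges/nodes) for the largest weakly connected component.

    `retweets` is an iterable of (source_user, retweeter) pairs (assumed format).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Assumption: repeated retweets between the same pair count as one edge.
    edges = set(retweets)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # direction is ignored: weak connectivity

    nodes_per_root = defaultdict(int)
    for x in list(parent):
        nodes_per_root[find(x)] += 1
    edges_per_root = defaultdict(int)
    for u, _ in edges:
        edges_per_root[find(u)] += 1

    if not nodes_per_root:
        return 0, 0, 0.0
    root = max(nodes_per_root, key=nodes_per_root.get)
    n, m = nodes_per_root[root], edges_per_root[root]
    return n, m, m / n


sample = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("x", "y")]
print(lcc_edge_density(sample))
```

Applying such a function to the retweets seen up to each hour would give the density trajectory whose crossing of one is the signal discussed above.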
To capture these dynamics, the authors extend the “super‑star” random graph model introduced by Bhamidi et al. (2012). Their growth process proceeds in discrete time steps; at each step exactly one of three possible events occurs:
- T1 (new component creation) – a brand‑new user posts an original tweet on the topic, adding a new isolated node. This occurs with probability λ / (1 + λ).
- T2 (new user joins existing component) – a new user retweets an existing user, thereby adding both a new node and a new edge. This occurs with probability p / (1 + λ).
- T3 (edge between existing users) – an existing user retweets another existing user, adding only an edge. This occurs with probability (1 - p) / (1 + λ).
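As a quick consistency check on the three event probabilities listed above, the short derivation below confirms that they sum to one and works out the expected growth in nodes and edges per step. It is a back-of-the-envelope sketch, not a result quoted from the paper: the notation |V_t|, |E_t| and Δ is introduced here, and it assumes that the event type is drawn independently at each step and that every T2 or T3 event contributes one edge. It concerns the whole graph, not only the LCC.

```latex
\begin{align*}
P(T1) + P(T2) + P(T3)
  &= \frac{\lambda}{1+\lambda} + \frac{p}{1+\lambda} + \frac{1-p}{1+\lambda} = 1, \\
\mathbb{E}\bigl[\Delta |V|\bigr]
  &= P(T1) + P(T2) = \frac{\lambda + p}{1+\lambda}, \\
\mathbb{E}\bigl[\Delta |E|\bigr]
  &= P(T2) + P(T3) = \frac{1}{1+\lambda}, \\
\frac{|E_t|}{|V_t|}
  &\longrightarrow \frac{1}{\lambda + p} \qquad (t \to \infty).
\end{align*}
```

Under these assumptions the whole-graph edge density exceeds one in the long run exactly when λ + p < 1.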
The parameter λ controls the rate at which fresh discussion components appear, while p is the share of retweet events that come from new users (T2) rather than from users already in the graph (T3). A T2 event attaches a new node to a single existing component, so only T3 events can join two existing components. For both T2 and T3 the source of the new edge is selected by a two‑stage preferential‑attachment mechanism: first a “message tree” (the retweet cascade rooted at an original tweet) is chosen with probability proportional to its current size, and then a source user within that tree is selected with probability proportional to that user's degree. In addition, a “super‑star” node has a fixed positive probability of being chosen as the source, reflecting the empirical observation that a small number of highly influential users attract a disproportionate share of retweets.
Analytically, the authors derive how λ and p affect the asymptotic size and edge density of the LCC. Larger λ yields many small, disconnected components, reducing the eventual LCC size. Lower p means more T3 events, which add edges without adding nodes and can merge existing components, leading to a larger and denser LCC. The super‑star parameter further amplifies densification by concentrating edges around a hub, which also advances the time at which the LCC density surpasses one.
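The growth process can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation. The names `simulate_retweet_graph`, `edge_density` and `q` (the super-star probability) are hypothetical. The summary does not say how the retweeting user is chosen in T3 or how degree-zero users are handled, so the code picks the retweeter uniformly at random and weights users by degree + 1.

```python
import random


def simulate_retweet_graph(steps, lam, p, q, seed=0):
    """Grow a retweet graph for `steps` events; return (n_users, edges).

    Edges are (source, retweeter) pairs. User 0 plays the super-star.
    """
    rng = random.Random(seed)
    degree = [0]   # degree[u] = number of edges touching user u
    trees = [[0]]  # trees[i] = users in message tree i (user 0 roots tree 0)
    edges = []

    def pick_source():
        # Assumption: with probability q the super-star is the source.
        if rng.random() < q:
            return 0, 0
        # Stage 1: choose a message tree proportionally to its size.
        sizes = [len(tree) for tree in trees]
        idx = rng.choices(range(len(trees)), weights=sizes)[0]
        # Stage 2: choose a user in that tree proportionally to degree + 1
        # (the +1 is an assumption so that degree-0 roots can be picked).
        weights = [degree[u] + 1 for u in trees[idx]]
        return rng.choices(trees[idx], weights=weights)[0], idx

    for _ in range(steps):
        r = rng.random() * (1 + lam)
        if r < lam:
            # T1: a new user posts an original tweet (new isolated node).
            degree.append(0)
            trees.append([len(degree) - 1])
            continue
        src, idx = pick_source()
        if r < lam + p:
            # T2: a new user retweets the chosen source.
            degree.append(0)
            dst = len(degree) - 1
        else:
            # T3: an existing user retweets the chosen source.
            # Assumption: the retweeter is uniform over existing users.
            dst = rng.randrange(len(degree))
            if dst == src:
                continue  # skip self-retweets; this step adds nothing
        edges.append((src, dst))
        degree[src] += 1
        degree[dst] += 1
        if dst not in trees[idx]:
            trees[idx].append(dst)
    return len(degree), edges


def edge_density(n_users, edges):
    """Edges per node for the whole graph (not only the LCC)."""
    return len(edges) / n_users


n_users, edges = simulate_retweet_graph(steps=3000, lam=0.3, p=0.5, q=0.05)
print(n_users, len(edges), round(edge_density(n_users, edges), 3))
```

Each retweet event adds one edge, while only T1 and T2 add a node, so the whole-graph density in a long run should settle near 1 / (λ + p). Rerunning with different values of `p` at fixed `lam` shows how the density responds. The per-step cost is linear in the number of trees, which is acceptable for a few thousand steps but would need cumulative-weight bookkeeping for larger runs.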
Parameter estimation is performed on each real dataset by fitting the observed LCC size and density trajectories to the model’s predictions. Simulations using the estimated λ and p reproduce the empirical curves with high fidelity, confirming that the model captures both the growth of the overall retweet graph and the critical densification phase. Notably, the model works across heterogeneous domains: a protest in the Netherlands, the World Cup speed‑skating event, the Eurovision Song Contest, and others all exhibit similar agreement between simulated and observed LCC dynamics.
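The summary does not spell out the fitting procedure, so the sketch below swaps in a simpler technique: it classifies each time-ordered tweet as a T1, T2 or T3 event and uses the event counts. Since P(T1) = λ / (1 + λ) and P(T2) + P(T3) = 1 / (1 + λ), the count ratios n1 / (n2 + n3) and n2 / (n2 + n3) estimate λ and p. The function name `estimate_lambda_p` and the `(user, retweeted_user)` input format are assumptions, and this is not the trajectory fit the authors describe.

```python
def estimate_lambda_p(events):
    """Estimate (lambda, p) from time-ordered (user, retweeted_user) pairs.

    `retweeted_user` is None for an original tweet (assumed input format).
    """
    seen = set()
    n1 = n2 = n3 = 0
    for user, source in events:
        if source is None:
            if user not in seen:
                n1 += 1  # T1: new user posts an original tweet
            # Original tweets by known users are outside the model; ignored.
        elif user not in seen:
            n2 += 1      # T2: new user retweets someone
        else:
            n3 += 1      # T3: known user retweets someone
        seen.add(user)
        if source is not None:
            seen.add(source)
    retweets = n2 + n3
    if retweets == 0:
        raise ValueError("need at least one retweet to estimate parameters")
    return n1 / retweets, n2 / retweets


sample_events = [
    ("a", None), ("b", "a"), ("c", None),
    ("a", "c"), ("d", "a"), ("b", "d"),
]
print(estimate_lambda_p(sample_events))
```

Such count-based estimates ignore the super-star parameter and the LCC trajectories, so they are better read as starting values for a fuller fit than as a replacement for it.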
The paper’s contributions are threefold: (1) identification of LCC densification as a robust precursor to trend peaks, (2) formulation of a parsimonious three‑parameter stochastic graph‑growth model that explains this phenomenon, and (3) demonstration that the model’s parameters can be estimated from real data, which could support early‑warning systems for emerging trends. The authors acknowledge a limitation: the model ignores the underlying follower‑followee network and treats retweets as independent of follower ties. They suggest that future work integrate follower structure, temporal activity patterns, and content‑based features into a more comprehensive predictive framework.