A stochastic model of the tweet diffusion on the Twitter network
We introduce a stochastic model that describes the diffusion of tweets on the Twitter network. By dividing the followers into generations, we describe the dynamics of tweet diffusion as a random multiplicative process. We confirm our model by directly observing the statistics of the multiplicative factors in the Twitter data.
Research Summary
The paper proposes a stochastic framework to describe how tweets spread across the Twitter network, emphasizing the platform's follower-centric, asymmetric structure. Instead of relying on classic epidemic models (SIR, SIS) that assume homogeneous mixing, the authors introduce a "generation" concept: the original tweeter is generation 0, users who follow the original tweeter constitute generation 1, followers of generation 1 form generation 2, and so on. Within this hierarchy, the diffusion process is modeled as a random multiplicative cascade. Mathematically, the number of retweets (or mentions) in generation i, denoted X_i, is expressed as X_i = M_i · X_{i-1}, where M_i is a random multiplicative factor capturing the combined effect of follower count, user activity, temporal delay, and other latent variables.
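The recursion X_i = M_i · X_{i-1} can be illustrated with a minimal Python sketch. It assumes the factors M_i are independent and log-normally distributed; the function name `simulate_generations` and the parameter values `mu` and `sigma` are hypothetical choices for illustration, not values taken from the paper. The counts are kept as real numbers for simplicity.

```python
import random


def simulate_generations(x0, n_generations, mu, sigma, rng):
    """Return [X_0, X_1, ..., X_n] with X_i = M_i * X_{i-1}.

    Each M_i is drawn independently from a log-normal distribution whose
    underlying normal has mean `mu` and standard deviation `sigma`
    (an assumed parameterization, not values from the paper).
    """
    counts = [float(x0)]
    for _ in range(n_generations):
        m_i = rng.lognormvariate(mu, sigma)  # random multiplicative factor
        counts.append(m_i * counts[-1])
    return counts


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(simulate_generations(x0=1, n_generations=5, mu=0.05, sigma=0.8, rng=rng))
```

Because the process is a product of random factors, log X_n is a sum of the terms log M_i, which is why a log-normal description of the factors is a natural thing to test.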
To validate the model, the authors collected a large dataset of over 100,000 tweets from 2013-2014 using the Twitter API, reconstructed the retweet trees for each tweet, and computed the empirical M_i values as the ratio X_i / X_{i-1}. Statistical analysis revealed that M_i follows a log-normal distribution with a mean slightly above one, and its variance grows with generation depth. This pattern indicates that most tweets die out quickly, but a small fraction experience "viral" amplification, leading to large retweet cascades.
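The estimation step can be sketched as follows, assuming the per-generation counts of a cascade are already available as a list. The helper names and the example counts are hypothetical, and skipping zero factors before taking logarithms is a choice made for this sketch rather than something the summary specifies.

```python
import math
import statistics


def multiplicative_factors(counts):
    """Empirical M_i = X_i / X_{i-1} for one cascade's per-generation counts."""
    return [cur / prev for prev, cur in zip(counts, counts[1:]) if prev > 0]


def fit_lognormal(factors):
    """Estimate (mu, sigma) of log M from strictly positive factors.

    Zero factors (a cascade that stopped) have no logarithm and are skipped;
    that choice is an assumption of this sketch.
    """
    logs = [math.log(m) for m in factors if m > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_mean(mu, sigma):
    """Mean of a log-normal variable: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma ** 2 / 2)


if __name__ == "__main__":
    counts = [1, 4, 8, 2]  # hypothetical counts for generations 0..3
    factors = multiplicative_factors(counts)
    mu, sigma = fit_lognormal(factors)
    print(factors, mu, sigma, lognormal_mean(mu, sigma))
```

In practice the factors from many cascades would be pooled, possibly per generation, before fitting, since a single cascade yields only a handful of ratios.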
Simulation experiments based on the estimated log-normal parameters reproduced the empirical distribution of cascade sizes, matching both the mean and the heavy-tailed variance observed in the data. The model also accommodates temporal effects: by adjusting the mean of M_i for early versus late time windows (e.g., within the first hour versus after 24 hours), the simulated dynamics align with the observed decay of retweet activity over time.
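A Monte Carlo experiment of this kind might look like the sketch below. It makes a strong simplifying assumption that is not from the paper: the early-versus-late adjustment is represented by giving each generation its own log-mean, so a decreasing sequence `mus` stands in for activity that fades over time. The function names and all parameter values are hypothetical.

```python
import random
import statistics


def cascade_size(mus, sigma, rng, x0=1.0):
    """Total size X_0 + X_1 + ... for one simulated cascade.

    `mus[i]` is the log-mean used for the factor of generation i + 1, so a
    decreasing sequence mimics activity that fades in later windows.
    """
    x, total = x0, x0
    for mu in mus:
        x *= rng.lognormvariate(mu, sigma)  # X_i = M_i * X_{i-1}
        total += x
    return total


def size_summary(n_runs, mus, sigma, seed=0):
    """Mean, median, and maximum size over `n_runs` simulated cascades."""
    rng = random.Random(seed)
    sizes = [cascade_size(mus, sigma, rng) for _ in range(n_runs)]
    return statistics.fmean(sizes), statistics.median(sizes), max(sizes)


if __name__ == "__main__":
    # Hypothetical log-means: growth in early generations, decay later.
    print(size_summary(n_runs=10_000, mus=[0.3, 0.0, -0.3, -0.6], sigma=0.8))
```

Comparing the mean with the median and the maximum gives a quick impression of how skewed the simulated size distribution is; a fuller comparison would set the whole simulated distribution against the empirical one.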
The authors discuss strengths and limitations. The main advantage is the explicit incorporation of Twitter's follower-based topology, allowing a compact yet expressive description of diffusion without enumerating every individual edge. The multiplicative formulation yields analytical tractability and facilitates Monte Carlo simulations. However, the model assumes a static follower network, ignoring the continual creation and deletion of follow relationships. It also abstracts away content-specific factors such as hashtags, sentiment, or multimedia, which are known to influence virality.
In conclusion, the paper provides a mathematically grounded, empirically validated model for tweet diffusion that bridges the gap between overly simplistic epidemic analogies and the complex reality of social media. Future work is suggested in three directions: (1) extending the framework to dynamic networks where follower links evolve over time, (2) integrating textual and multimedia features through natural-language processing to refine the distribution of M_i, and (3) applying the methodology to other platforms (e.g., Facebook, Instagram) to assess its generality. By demonstrating that a random multiplicative process captures the essential statistical properties of tweet cascades, the study offers a valuable tool for researchers and practitioners aiming to predict or control information spread in online social systems.