Regional properties of global communication as reflected in aggregated Twitter data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Twitter is a popular public conversation platform with a worldwide audience and diverse forms of connections between users. In this paper we introduce the concept of aggregated regional Twitter networks in order to characterize communication between geopolitical regions. We present the study of a follower graph and a mention graph created from an extensive data set collected during the second half of 2012. A k-shell decomposition reveals the global core-periphery structure, and by means of a modified Regional-SIR model we also consider basic information spreading properties.


💡 Research Summary

The paper investigates how global communication patterns emerge on Twitter by aggregating user‑level interactions into regional networks. Using a publicly available Twitter stream from the second half of 2012, the authors first identify “geo‑users” whose location can be mapped to a fixed geographic point. Users located in oceans or otherwise unmapped territories are discarded. The remaining users are grouped into 473 regions for the follower graph and 476 regions for the mention graph; the latter uses the larger set of 5.38 million geo‑users, while the former relies on 3.31 million.

Two directed, weighted adjacency matrices are constructed: F (followers) and M (mentions). In F, a link points from the followed user to the follower (i.e., the direction of interest), while in M a link points from the sender to the mentioned user. Self‑loops (users mentioning or following someone in the same region) are retained and later used to define several descriptive metrics. The matrices are highly sparse (≈ 0.49 for F, ≈ 0.29 for M) but dominated by intra‑regional activity: 57 % of follower links and 83 % of mention links stay within the same region.
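The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the `region_of` lookup, and the toy edge list are all hypothetical, and the sparse dictionary stands in for the matrices F or M.

```python
from collections import defaultdict

def aggregate_by_region(edges, region_of):
    """Sum user-level directed edges into region-level weights.

    edges: iterable of (source_user, target_user) pairs
    region_of: dict mapping user id -> region label (hypothetical input)
    Edges touching a user without a region are skipped, mirroring the
    discarding of unmapped users.
    """
    weights = defaultdict(int)
    for src, dst in edges:
        if src in region_of and dst in region_of:
            weights[(region_of[src], region_of[dst])] += 1
    return dict(weights)

def intra_regional_fraction(weights):
    """Share of the total link weight that sits on the diagonal (self-loops)."""
    total = sum(weights.values())
    diagonal = sum(w for (a, b), w in weights.items() if a == b)
    return diagonal / total if total else 0.0

# Toy data, not taken from the paper. "u5" has no region and is ignored.
region_of = {"u1": "A", "u2": "A", "u3": "B", "u4": "C"}
mentions = [("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u3", "u4"), ("u5", "u1")]
M = aggregate_by_region(mentions, region_of)
share_within = intra_regional_fraction(M)
```

On real data, `share_within` is the quantity reported in the text as 57 % for follower links and 83 % for mention links.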

To study the structural hierarchy, the authors first symmetrize the matrices (zero out diagonals, set any non‑zero off‑diagonal entry to 1), obtaining unweighted graphs $\hat F$ and $\hat M$. A k‑shell decomposition is then applied. The follower graph’s core contains 240 regions, each with at least 199 neighbours, whereas the mention graph’s core consists of 173 regions with a minimum of 135 neighbours. The peripheries differ markedly: $\hat F$ exhibits many small shells, while $\hat M$ shows one large peripheral shell (45 regions) plus several tiny shells. This suggests that follower relations are broadly reciprocal and densely connected, whereas mentions are more topic‑oriented and form tighter clusters.
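The symmetrization and peeling procedure can be illustrated with a short, dependency-free Python sketch. It assumes the standard k-shell definition (repeatedly remove nodes of degree at most k) and treats a pair of regions as linked if either direction has a non-zero weight; the function names and the toy weights are hypothetical.

```python
def symmetrize(weights):
    """Undirected, unweighted adjacency: drop self-loops and link i-j
    whenever either direction carries a non-zero weight."""
    adj = {}
    for (a, b), w in weights.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b and w != 0:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def k_shells(adj):
    """Return {node: shell index} by repeatedly peeling nodes of degree <= k."""
    remaining = {n: set(nb) for n, nb in adj.items()}
    shell = {}
    k = 0
    while remaining:
        peel = [n for n, nb in remaining.items() if len(nb) <= k]
        if not peel:
            k += 1  # nothing left at this level, move to the next shell
            continue
        for n in peel:
            shell[n] = k
            for m in remaining.pop(n):
                if m in remaining:
                    remaining[m].discard(n)
    return shell

# Toy weights, not from the paper: a triangle A-B-C with a pendant region D.
toy = {("A", "B"): 3, ("B", "C"): 1, ("C", "A"): 2, ("A", "D"): 5, ("A", "A"): 10}
shells = k_shells(symmetrize(toy))
```

The nodes with the largest shell index form the core; in the toy graph that is the triangle, while D sits in the periphery.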

The paper introduces asymmetry parameters to capture directional imbalances. Type I asymmetry for a region i is defined as TVI / TVO when TVO > 0 (incoming over outgoing volume); Type II is the reciprocal when TVI > 0. Most regions have values close to 1, but about 15 %–20 % deviate strongly, indicating that some areas act primarily as information sources (e.g., California) and others as sinks (e.g., United Kingdom).
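As a worked illustration of the two ratios, the sketch below computes them from a region-level weight dictionary. It assumes that TVI and TVO are the total incoming and outgoing inter-regional volumes with self-loops excluded; the summary does not state this explicitly, so treat that choice, the function name, and the toy numbers as assumptions.

```python
def asymmetry(weights, region):
    """Return (type_1, type_2) for one region.

    type_1 = TVI / TVO if TVO > 0, else None
    type_2 = TVO / TVI if TVI > 0, else None
    Self-loops are excluded from both volumes (an assumption).
    """
    tvi = sum(w for (a, b), w in weights.items() if b == region and a != region)
    tvo = sum(w for (a, b), w in weights.items() if a == region and b != region)
    type_1 = tvi / tvo if tvo > 0 else None
    type_2 = tvo / tvi if tvi > 0 else None
    return type_1, type_2

# Toy weights, not from the paper.
W = {("A", "B"): 6, ("B", "A"): 2, ("A", "A"): 50, ("C", "A"): 1}
a_type_1, a_type_2 = asymmetry(W, "A")
```

Here region A sends twice as much volume as it receives, so its Type I value is below 1 and its Type II value is above 1.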

Two normalized measures, Normalized Interest Measure (NIM) and Normalized Activity Measure (NAM), are also proposed. They estimate the probability that a randomly chosen inter‑regional link involves region i, conditioned on the existence of a link. Empirically, both NIM and NAM decline as self‑followers or self‑mentions increase, following an approximate power‑law decay. The decay rate differs between core and peripheral regions, hinting at distinct scaling regimes, though the limited data prevent a definitive functional form.

The authors further explore the distribution of off‑diagonal entries in F and M. After trimming extreme values, the remaining data follow a power law with exponents α_F = 1.44 ± 0.02 and α_M = 1.38 ± 0.03, consistent with earlier findings on weighted social networks.
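The summary does not say how the exponents were fitted. One common choice, shown here purely as an illustration, is the continuous maximum-likelihood estimator for a density p(x) ∝ x^(-α) above a cutoff x_min. The cutoff, the function name, and the synthetic sample are assumptions, and for integer-valued weights the continuous form is only an approximation.

```python
import math
import random

def fit_power_law_exponent(values, x_min):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x**(-alpha), x >= x_min.

    Returns (alpha, standard_error). Values below x_min are ignored.
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha = 1.0 + n / log_sum
    std_err = (alpha - 1.0) / math.sqrt(n)
    return alpha, std_err

# Synthetic check: draw from a known power law by inverse-transform sampling.
rng = random.Random(0)
true_alpha = 1.44
sample = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(20000)]
alpha_hat, err = fit_power_law_exponent(sample, x_min=1.0)
```

On synthetic data the estimate should land close to the exponent used for sampling, which makes this a convenient sanity check before applying the fit to trimmed matrix entries.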

Finally, a regional SIR (R‑SIR) model is adapted to the aggregated networks. Each region k holds a population N_k, split into susceptible (S_k), infected (I_k), and recovered (R_k) compartments. Contact rates between regions are proportional to the normalized edge weights $x_{kj} = (N_k N_j)^{-1} X_{kj}$, where X is either F or M. The dynamics are governed by a set of coupled differential equations, one group per region.
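To make the idea concrete, here is a minimal forward-Euler sketch of a generic multi-region SIR model. The specific form used (dS_j/dt = -β S_j Σ_k x_kj I_k, dI_j/dt = β S_j Σ_k x_kj I_k - γ I_j, dR_j/dt = γ I_j) is an assumed textbook metapopulation variant, not necessarily the modified model of the paper. The rates β and γ, the direction in which x_kj acts, and all toy numbers are hypothetical.

```python
def r_sir_step(S, I, R, X, N, beta, gamma, dt):
    """One forward-Euler step of an assumed regional SIR model.

    X[k][j] is the aggregated weight of links from region k to region j;
    x_kj = X[k][j] / (N[k] * N[j]) is the normalized contact rate.
    """
    n = len(N)
    new_S, new_I, new_R = S[:], I[:], R[:]
    for j in range(n):
        # Exposure of region j to infected users in every region k.
        force = sum(X[k][j] / (N[k] * N[j]) * I[k] for k in range(n))
        infections = beta * S[j] * force * dt
        recoveries = gamma * I[j] * dt
        new_S[j] = S[j] - infections
        new_I[j] = I[j] + infections - recoveries
        new_R[j] = R[j] + recoveries
    return new_S, new_I, new_R

# Toy setup, not from the paper: two regions, one seed in region 0.
N = [1000.0, 500.0]
X = [[5000.0, 300.0], [200.0, 2000.0]]
S, I, R = [999.0, 500.0], [1.0, 0.0], [0.0, 0.0]
for _ in range(400):
    S, I, R = r_sir_step(S, I, R, X, N, beta=0.2, gamma=0.2, dt=0.05)
```

Each step only moves users between compartments, so S_k + I_k + R_k stays equal to N_k for every region, and the seed in region 0 reaches region 1 through the off-diagonal weight.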

