Where are my followers? Understanding the Locality Effect in Twitter
Twitter is one of the most widely used applications on the Internet today, with more than 200M accounts created so far. Like other large-scale systems, Twitter can benefit from exploiting the Locality effect that exists among its users. In this paper we perform the first comprehensive study of the Locality effect in Twitter. For this purpose we have collected the geographical location of around 1M Twitter users and 16M of their followers. Our results demonstrate that language and cultural characteristics determine the level of Locality expected for different countries. Countries whose language is not English, such as Brazil, typically show a high intra-country Locality, whereas countries where English is an official or co-official language suffer from an external Locality effect. That is, their users have a larger number of followers in the US than within their own country. There are two reasons for this: first, the US is the dominant country in Twitter, accounting for around half of the users; and second, these countries share a common language and cultural characteristics with the US.
💡 Research Summary
The paper presents the first large‑scale investigation of the “locality” effect in Twitter, i.e., the tendency of followers to be geographically close to the users they follow. The authors collected geographic location data for roughly one million Twitter users (referred to as “friends”) and the locations of about 16.5 million of their followers, yielding over 100 million friend→follower links. Data were gathered via the Twitter REST API between January and April 2011, using a master‑slave distributed crawler to overcome the API’s rate limit (350 queries per hour per IP). User‑provided location strings were normalized with the Yahoo! Geocode API; entries lacking meaningful location information were discarded.
Two distance‑based metrics were defined. The link‑level distance measures the geographic distance for each individual friend→follower edge, while the user‑level distance computes the median distance between a user and all of its followers, thereby mitigating the bias introduced by highly popular users who have many long‑range followers. The authors argue that the user‑level metric better reflects a typical user’s locality.
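The two metrics can be illustrated with a minimal Python sketch. It assumes great-circle (haversine) distance between geocoded coordinates, which the summary does not confirm as the authors' exact distance measure. The names `haversine_km`, `link_distances`, and `user_median_distances` and the input layout are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def link_distances(links, locations):
    """Link-level metric: one distance per friend -> follower edge."""
    return [haversine_km(locations[friend], locations[follower])
            for friend, follower in links]


def user_median_distances(links, locations):
    """User-level metric: median distance from each friend to its followers."""
    per_user = {}
    for friend, follower in links:
        d = haversine_km(locations[friend], locations[follower])
        per_user.setdefault(friend, []).append(d)
    return {user: median(ds) for user, ds in per_user.items()}
```

Taking the median per user gives each user one value regardless of follower count, so a single celebrity account with millions of distant followers contributes one data point rather than millions of long links.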
Global analysis shows that 35 % of links are shorter than 1 000 km and 67 % are under 4 000 km, indicating a predominance of intra‑country or intra‑continent connections. Nevertheless, about 25 % of links exceed 6 500 km, revealing a substantial cross‑continent component; Twitter is therefore not a highly localized network at the link level. At the user level, 80 % of users have a median follower distance below 400 km, suggesting that most users experience strong locality, but popularity correlates positively with distance—more popular users tend to have farther‑reaching follower bases.
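Percentages such as "links shorter than 1 000 km" are points on an empirical cumulative distribution of distances. The following sketch shows one way to read such points off a list of distances; the function name `fraction_below` and the sample values are illustrative only and are not taken from the paper's data.

```python
from bisect import bisect_left


def fraction_below(distances_km, thresholds_km):
    """Return, for each threshold, the fraction of distances strictly below it."""
    ordered = sorted(distances_km)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one distance")
    # bisect_left counts the elements strictly smaller than the threshold.
    return {t: bisect_left(ordered, t) / n for t in thresholds_km}


# Toy input, not real measurements.
sample = [100, 500, 1000, 3000, 7000]
shares = fraction_below(sample, [1000, 4000])
```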
The study then focuses on the 15 countries contributing the most friends, which together account for roughly 90 % of the dataset. The United States dominates, providing about half of all friends, followers, and links. Countries are grouped by official language: English‑speaking (US, Canada, UK, Ireland, India, Australia) versus non‑English (Brazil, Spain, Germany, France, Italy, Indonesia, Japan, Netherlands). For each country, the authors compute the proportion of outgoing links that stay within the country, go to the US, or go elsewhere. Three distinct profiles emerge:
- US profile – over 70 % of its links are domestic, reflecting both its sheer size and a strong local culture.
- Local profile – countries such as Brazil, the Netherlands, Germany, Spain, and Indonesia retain a higher share of links domestically than to the US or other nations. Brazil is extreme, with nearly 80 % of its links staying within the country, underscoring the role of language and cultural cohesion.
- External (US‑oriented) profile – countries where English is an official language (UK, Australia, India) exhibit a large fraction of links directed to the US, indicating an “external locality” effect driven by the US’s dominance in Twitter’s user base and shared language.
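The per-country breakdown behind these profiles can be sketched as follows. This is a simplified illustration: the function names, the `country_of` mapping, and in particular the rule in `classify_profile` (compare the domestic share with the US share) are assumptions of this sketch, not the authors' stated classification procedure.

```python
from collections import Counter


def link_shares(links, country_of, country):
    """Share of a country's friend -> follower links that stay domestic,
    go to the US, or go elsewhere. Links with unknown locations are skipped."""
    counts = Counter()
    for friend, follower in links:
        if country_of.get(friend) != country:
            continue
        dest = country_of.get(follower)
        if dest is None:
            continue
        if dest == country:
            counts["domestic"] += 1
        elif dest == "US":
            counts["us"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"domestic": 0.0, "us": 0.0, "other": 0.0}
    return {key: counts[key] / total for key in ("domestic", "us", "other")}


def classify_profile(country, shares):
    """Hypothetical rule: label a country by comparing its two main shares."""
    if country == "US":
        return "US"
    return "external" if shares["us"] > shares["domestic"] else "local"
```

For a US friend, a US follower is counted as domestic, so the `us` share is always zero for the United States itself and the country is handled as its own profile.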
User‑level analysis of four representative nations (Brazil, US, UK, France) confirms these patterns. At the user level, Brazil shows 90 % intra‑country locality and the US about 60 %. The UK displays a split behavior: less popular users are locally oriented, while popular users have most of their followers in the US. France shows a similar but less pronounced split, with a bias toward intra‑country connections.
The authors conclude that while Twitter exhibits a noticeable intra‑country locality, the overall network is heavily shaped by the United States’ demographic weight. Consequently, any system‑level optimizations (caching, CDN placement, data‑center allocation) must consider per‑country language and cultural factors rather than relying on a global locality assumption. Limitations include reliance on self‑reported location strings (which may be inaccurate or missing), the age of the dataset (2011), and API‑induced sampling bias. Future work is suggested to incorporate up‑to‑date data, mobile GPS signals, and additional socio‑economic variables to build a more nuanced, dynamic model of locality in online social networks.