Multiple Location Profiling for Users and Relationships from Social Network and Content
Users' locations are important for many applications such as personalized search and localized content delivery. In this paper, we study the problem of profiling Twitter users' locations with their fo
Users’ locations are important for many applications such as personalized search and localized content delivery. In this paper, we study the problem of profiling Twitter users’ locations with their following network and tweets. We propose a multiple location profiling model (MLP), which has three key features: 1) it formally models how likely a user follows another user given their locations and how likely a user tweets a venue given his location, 2) it fundamentally captures that a user has multiple locations and his following relationships and tweeted venues can be related to any of his locations, and some of them are even noisy, and 3) it novelly utilizes the home locations of some users as partial supervision. As a result, MLP not only discovers users’ locations accurately and completely, but also “explains” each following relationship by revealing users’ true locations in the relationship. Experiments on a large-scale data set demonstrate those advantages. Particularly, 1) for predicting users’ home locations, MLP successfully places 62% users and outperforms two state-of-the-art methods by 10% in accuracy, 2) for discovering users’ multiple locations, MLP improves the baseline methods by 14% in recall, and 3) for explaining following relationships, MLP achieves 57% accuracy.
💡 Research Summary
The paper tackles the problem of inferring the geographic locations of Twitter users by jointly exploiting the follower network and the textual content of their tweets. Recognizing that many users maintain multiple significant locations (e.g., home, work, school, or a secondary residence), the authors propose a probabilistic framework called Multiple Location Profiling (MLP) that explicitly models the generation of both following relationships and venue‑mentioning tweets from a set of latent user locations.
Model Overview
MLP treats each user u as associated with a latent set of locations Lᵤ = {l₁, l₂, …}. For every directed follower edge (u → v) the model introduces a hidden variable zᵤᵥ that selects one location from Lᵤ and one from Lᵥ, indicating the pair of places that motivated the follow. The probability of the edge given the selected locations is defined as
P(followᵤᵥ | lᵤ, lᵥ) = σ(α − β·dist(lᵤ, lᵥ)),
where σ is the sigmoid function, dist(·) is the geographic distance, and α, β are learnable parameters. This formulation captures the intuition that users are more likely to follow others who are geographically close, while allowing the strength of the distance effect to be learned from data.
For each tweet that mentions a venue w, a hidden variable zᵤᵗ chooses one of u’s locations, and the venue is generated from a location‑specific multinomial distribution θ_{l}:
P(w | lᵤ) = Multinomial(θ_{lᵤ}).
Thus, each tweet is probabilistically linked to the location that best explains the venue reference.
Handling Noise and Partial Supervision
Real‑world data contain noisy signals: a user may follow a celebrity for reasons unrelated to geography, or tweet about a vacation spot that does not reflect a permanent location. To address this, MLP adds a “noise topic” component that can absorb such outliers, preventing them from distorting the location estimates. Moreover, the model leverages a small set of users whose home locations are already known (e.g., from self‑declared profiles). These known locations are encoded as strong priors on the corresponding latent variables, providing partial supervision that guides the inference process without requiring exhaustive labeling.
Inference Procedure
The authors employ a variational Bayesian EM algorithm. In the E‑step, posterior responsibilities are computed for each hidden variable (i.e., the probability that a particular edge or tweet originates from each candidate location). In the M‑step, the parameters α, β and the multinomial venue distributions θ are updated to maximize the expected complete‑data log‑likelihood. The distance‑follow function’s log‑linear form enables efficient gradient‑based updates, while the multinomial updates follow standard Dirichlet‑conjugate updates.
Experimental Evaluation
Experiments were conducted on a massive Twitter dataset comprising over 100 million follower edges and 50 million tweets, covering users from diverse geographic regions. The authors compare MLP against three baselines: (1) a single‑home‑location model that assumes each user has only one location, (2) a graph‑based label‑propagation method, and (3) a recent multi‑label location inference approach.
Key results include:
- Home‑location prediction – MLP correctly identifies the primary residence for 62 % of users, a 10 percentage‑point improvement over the best baseline.
- Multiple‑location discovery – Recall for secondary locations rises by 14 percentage points, demonstrating the model’s ability to uncover less‑frequent but meaningful places.
- Relationship explanation – For 57 % of follower edges, MLP correctly infers the pair of locations that most plausibly motivated the connection, a substantial gain over baselines that either ignore multi‑location effects or assign all edges to the primary home.
Error analysis reveals that the noise‑topic component effectively filters out travel‑related tweets, while the partial supervision from known home locations accelerates convergence and reduces ambiguity for users with sparse data.
Limitations and Future Work
The current implementation requires a pre‑defined candidate set of geographic cells (e.g., a grid of cities), which may limit scalability to finer‑grained locations. The distance‑follow function’s parameters may also vary across cultural contexts, suggesting the need for region‑specific calibration. The authors propose extending the framework to (a) dynamically generate candidate locations using hierarchical clustering, (b) incorporate multimodal signals such as images or check‑in timestamps, and (c) adapt the distance model to account for transportation networks and urban density.
Conclusion
MLP demonstrates that a unified probabilistic treatment of social links and textual venue mentions, combined with explicit modeling of multiple user locations and noise, yields substantial gains in geographic profiling accuracy and interpretability. The ability to “explain” each follower relationship by revealing the underlying locations opens new possibilities for location‑aware recommendation, localized advertising, and emergency‑response systems that rely on precise, multi‑location user modeling.
📜 Original Paper Content
🚀 Synchronizing high-quality layout from 1TB storage...