Human Mobility and Predictability enriched by Social Phenomena Information
The massive amounts of geolocation data collected from mobile phone records have sparked an ongoing effort to understand and predict the mobility patterns of human beings. In this work, we study the extent to which social phenomena are reflected in mobile phone data, focusing in particular on the cases of urban commute and major sports events. We illustrate how these events are reflected in the data, and show how information about the events can be used to improve predictability in a simple model for a mobile phone user’s location.
💡 Research Summary
The paper investigates how social phenomena manifest in mobile phone data and how incorporating external event information can improve human mobility prediction models. Using five months of anonymized Call Detail Records (CDRs) from a major Argentine mobile operator, the authors analyze roughly 40 million users with hourly granularity. Each record contains caller, callee, timestamp, direction, and the serving cell‑tower location.
First, a baseline “most‑frequent‑location” (MFL) model is built: for each of the 168 hourly slots in a week, the antenna most often used by a user in the training period (15 weeks) is taken as the predicted location. Tested over a two‑week horizon, the model achieves an average accuracy of about 35 %, with peaks above 50 % during weekday commuting hours. Predictability is higher for outgoing calls than incoming ones, and nighttime slots show the strongest predictability due to people staying at home. The authors note that treating each antenna as a distinct location introduces noise because a single physical place may be covered by multiple towers and vice‑versa.
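The most-frequent-location idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout `(user, timestamp, antenna)` and the function names `hour_of_week`, `fit_mfl`, `predict_mfl`, and `accuracy` are assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hour_of_week(ts):
    """Map a timestamp to one of 168 weekly slots (Monday 00h -> 0)."""
    return ts.weekday() * 24 + ts.hour


def fit_mfl(records):
    """records: iterable of (user, timestamp, antenna) tuples (assumed layout).

    Returns a dict mapping (user, weekly slot) to the antenna seen most
    often for that user in that slot during training.
    """
    counts = defaultdict(Counter)
    for user, ts, antenna in records:
        counts[(user, hour_of_week(ts))][antenna] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def predict_mfl(model, user, ts):
    """Return the predicted antenna, or None if the slot was never observed."""
    return model.get((user, hour_of_week(ts)))


def accuracy(model, records):
    """Fraction of test records predicted correctly, over covered records only."""
    hits = total = 0
    for user, ts, antenna in records:
        pred = predict_mfl(model, user, ts)
        if pred is None:
            continue  # slot not covered by the training data
        total += 1
        hits += pred == antenna
    return hits / total if total else None
```

Note that `predict_mfl` returns `None` for any (user, slot) pair absent from training, which is one way to make the notion of coverage explicit alongside accuracy.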
Second, the study quantifies urban commuting. Nighttime (21 h–5 h) and daytime (12 h–16 h) periods are defined, and for each user the antenna with the highest night‑time activity outside the city and the antenna with the highest day‑time activity inside the city are labeled as “home” and “work,” respectively. Users whose night‑time calls are ≥80 % outside the city and day‑time calls are ≥80 % inside are classified as commuters. Approximately three million commuters are identified. The Euclidean distance between home and work antennas yields an average commute radius (R_OC) of 7.8 km, far shorter than a random baseline of 32.9 km, confirming that real commuting patterns are spatially concentrated.
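The commuter rule above lends itself to a short sketch. The code below is an illustration under stated assumptions: calls are given per user as `(hour, antenna)` pairs, the hypothetical predicate `in_city` decides whether an antenna lies inside the city, the 21 h–5 h window is read as hours 21 to 23 plus 0 to 4, and antenna coordinates are assumed to be planar and expressed in kilometres so that a Euclidean distance is meaningful.

```python
from collections import Counter
from math import hypot

# Assumed reading of the time windows in the text (boundary hours are a guess).
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 5))
DAY_HOURS = set(range(12, 16))


def classify_commuter(calls, in_city, threshold=0.8):
    """calls: list of (hour, antenna) for a single user.

    Returns (home_antenna, work_antenna) if the user satisfies the
    80 % night-outside / 80 % day-inside rule, otherwise None.
    """
    night = [a for h, a in calls if h in NIGHT_HOURS]
    day = [a for h, a in calls if h in DAY_HOURS]
    if not night or not day:
        return None
    night_out = [a for a in night if not in_city(a)]
    day_in = [a for a in day if in_city(a)]
    if len(night_out) / len(night) < threshold:
        return None
    if len(day_in) / len(day) < threshold:
        return None
    home = Counter(night_out).most_common(1)[0][0]
    work = Counter(day_in).most_common(1)[0][0]
    return home, work


def commute_distance_km(home, work, coords):
    """Euclidean distance between two antennas; coords maps antenna -> (x, y) in km."""
    (x1, y1), (x2, y2) = coords[home], coords[work]
    return hypot(x2 - x1, y2 - y1)
```

Averaging `commute_distance_km` over all classified commuters would give a quantity comparable to the commute radius reported in the summary, although the exact definition used in the paper may differ.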
Third, the authors examine mobility around major sports events, focusing on Boca Juniors soccer matches. By aggregating call volumes before and after match times, they observe a sharp convergence of users toward the stadium in the hours leading up to a game and a rapid dispersal afterward. This pattern is absent on comparable days without matches, demonstrating that large‑scale events generate distinct, short‑term mobility signatures.
The core contribution lies in enriching the baseline model with external event data. The authors tag users as “Boca fans” if they make calls from antennas surrounding the stadium during match time slots across three consecutive matches (both home and away). For match days, the enriched model overrides the MFL prediction and forces the stadium‑antenna cluster as the predicted location for tagged fans. When evaluated on the same test set, the enriched model raises fan‑group prediction accuracy from 19 % (baseline) to 38 %, effectively doubling performance. Moreover, coverage expands from 63 % of events (baseline) to 100 % for match days, because the enriched model can predict locations that never appeared in the training data (e.g., an away‑match stadium 1,000 km away). The authors illustrate cases where the baseline would never predict a user’s presence in a different city, whereas the enriched model correctly anticipates travel due to the scheduled match.
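The tagging and override steps described above might look roughly as follows. This is a hedged sketch rather than the paper's implementation: time slots are abstract hashable values, `matches` is a hypothetical chronologically ordered list of `(match_slots, stadium_antennas)` pairs, `schedule` is a hypothetical mapping from a match slot to the stadium antenna cluster, and `baseline` is any callable `(user, slot) -> antenna or None`, such as a most-frequent-location predictor.

```python
def tag_fans(calls, matches, min_consecutive=3):
    """Tag users seen near the stadium during consecutive matches.

    calls: iterable of (user, slot, antenna).
    matches: ordered list of (set of match slots, set of stadium antennas).
    """
    attended = [set() for _ in matches]
    for user, slot, antenna in calls:
        for i, (slots, antennas) in enumerate(matches):
            if slot in slots and antenna in antennas:
                attended[i].add(user)

    fans = set()
    for user in set().union(*attended):
        run = 0
        for present in attended:
            # Count the current streak of consecutive matches attended.
            run = run + 1 if user in present else 0
            if run >= min_consecutive:
                fans.add(user)
                break
    return fans


def predict_enriched(baseline, fans, schedule, user, slot):
    """Override the baseline with the stadium cluster for tagged fans on match slots."""
    if user in fans and slot in schedule:
        return schedule[slot]
    return baseline(user, slot)
```

Because the override does not depend on the user's own history, it can return a location the baseline has never seen for that user, which is consistent with the coverage gain reported for match days.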
In the discussion, the paper acknowledges limitations of antenna‑level granularity and proposes clustering antennas to better approximate real places. It also suggests extending the approach to other external data sources—cultural festivals, holidays, vacation itineraries—and to community‑level tagging, where the behavior of one member informs predictions for the whole group. Systematically integrating such heterogeneous data streams is identified as a major future challenge.
Overall, the study demonstrates that mobile phone CDRs can capture social‑driven mobility patterns, and that augmenting simple frequency‑based prediction models with event schedules and user tagging yields substantial gains in accuracy and coverage. These findings have practical implications for urban planning, traffic management, epidemic modeling, and commercial services that rely on anticipatory mobility insights.