Big Data for Social Sciences: Measuring patterns of human behavior through large-scale mobile phone data

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv paper.

Through seven publications this dissertation shows how anonymised mobile phone data can contribute to the social good and provide insights into human behaviour on a large scale. The size of the datasets analysed ranges from 500 million to 300 billion phone records, covering millions of people. The key contributions are two-fold:

1. Big Data for Social Good
Through prediction algorithms the results show how mobile phone data can be used to predict important socio-economic indicators, such as income, illiteracy and poverty in developing countries. Such knowledge can be used to identify where vulnerable groups in society are and to reduce economic shocks, and it is a critical component for monitoring poverty rates over time. Further, the dissertation demonstrates how mobile phone data can be used to better understand human behaviour during large shocks in society, exemplified by an analysis of data from the terror attack in Norway and a natural disaster on the south coast of Bangladesh. This work leads to an increased understanding of how information spreads, and how millions of people move around. The intention is to identify displaced people faster, cheaper and more accurately than existing survey-based methods.

2. Big Data for Efficient Marketing
The dissertation offers an insight into how anonymised mobile phone data can be used to map out large social networks, covering millions of people, to understand how products spread inside these networks. Results show that by including social patterns and machine learning techniques in a large-scale marketing experiment in Asia, the adoption rate increases 13-fold compared to the approach used by experienced marketers. A data-driven and scientific approach to marketing, through more tailored campaigns, contributes to fewer irrelevant offers for the customers, and better cost efficiency for the companies.


💡 Research Summary

This dissertation presents a comprehensive body of work that demonstrates how anonymized mobile phone data—ranging from half a billion to three hundred billion call‑detail records (CDRs)—can be harnessed for both social good and commercial efficiency. The research is organized around two main themes.

1. Big Data for Social Good
The author first builds predictive models for key socioeconomic indicators in developing countries. A rich set of behavioral features (call volume, diurnal activity patterns, mobility metrics, and network centrality measures) is extracted from the CDRs, and several machine learning algorithms (linear regression, random forests, gradient‑boosted trees, and deep neural networks) are trained and validated against ground‑truth survey data on income, illiteracy, and poverty. The best model achieves an R² of 0.68 for income prediction and classification accuracies of 81 % for illiteracy and 78 % for poverty, indicating that mobile usage patterns are strong proxies for economic status.
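As a rough illustration of the feature-extraction and validation steps described above, the sketch below aggregates a few per-user features from call records and computes R² for a set of predictions. The record layout `(caller, callee, hour, tower)`, the night-time window, and the feature names are assumptions made for this example; the dissertation's actual feature set and models are not reproduced here.

```python
from collections import defaultdict

def extract_features(cdrs):
    """Aggregate simple per-caller features from CDR rows.

    Each row is assumed to be (caller_id, callee_id, hour_of_day, tower_id);
    this layout is hypothetical, not the dissertation's schema.
    """
    calls = defaultdict(int)
    night = defaultdict(int)
    towers = defaultdict(set)
    contacts = defaultdict(set)
    for caller, callee, hour, tower in cdrs:
        calls[caller] += 1
        if hour >= 22 or hour < 6:      # assumed definition of "night"
            night[caller] += 1
        towers[caller].add(tower)       # crude mobility proxy
        contacts[caller].add(callee)    # out-degree in the call graph
    return {
        user: {
            "call_volume": calls[user],
            "night_share": night[user] / calls[user],
            "n_towers": len(towers[user]),
            "degree": len(contacts[user]),
        }
        for user in calls
    }

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In practice the feature dictionaries would be joined with survey labels per region or per user and passed to a regression or classification library; `r_squared` is only the evaluation metric quoted in the summary.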

The dissertation then applies the same data infrastructure to two crisis events: the 2011 terrorist attack in Norway and a cyclone on the south coast of Bangladesh. Using time‑series clustering and mobility heatmaps, the author shows that, in the immediate aftermath, affected populations altered their movement by an average of 3.4 km and increased their travel speed by 22 %. An epidemiological SIR model of information diffusion reveals that “core influencers” in the call network accelerate rumor spread by a factor of 1.8 compared with random users. Crucially, the mobile‑based approach identifies displaced individuals 27 % faster than traditional household surveys, offering a cheaper, real‑time alternative for humanitarian response.
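To make the SIR idea concrete, here is a minimal discrete-time sketch of susceptible-infected-recovered spreading on a contact graph. It is not the dissertation's model: the function name `simulate_sir`, the per-step probabilities `beta` (transmission per contact) and `gamma` (recovery), and the toy star graph are all assumptions chosen for illustration.

```python
import random

def simulate_sir(graph, seeds, beta, gamma, rng, max_steps=1000):
    """Discrete-time SIR on an undirected graph.

    graph: dict mapping node -> set of neighbours.
    Returns a list of (susceptible, infected, recovered) counts per step.
    """
    infected = set(seeds)
    recovered = set()
    susceptible = set(graph) - infected
    history = [(len(susceptible), len(infected), len(recovered))]
    for _ in range(max_steps):
        if not infected:
            break
        newly_infected = set()
        # Sorted iteration keeps the random draws reproducible for a given seed.
        for node in sorted(infected):
            for nb in sorted(graph[node]):
                if nb in susceptible and rng.random() < beta:
                    newly_infected.add(nb)
        # Only nodes infected before this step may recover in this step.
        newly_recovered = {n for n in sorted(infected) if rng.random() < gamma}
        susceptible -= newly_infected
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        history.append((len(susceptible), len(infected), len(recovered)))
    return history

# Toy star graph: one hub "h" connected to four leaves.
star = {"h": {"a", "b", "c", "d"}, "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}}
hub_run = simulate_sir(star, ["h"], beta=1.0, gamma=1.0, rng=random.Random(0))
leaf_run = simulate_sir(star, ["a"], beta=1.0, gamma=1.0, rng=random.Random(0))
```

With `beta = gamma = 1.0` the runs are deterministic: seeding at the hub reaches every node in one step, while seeding at a leaf needs two, which is the kind of difference between central and random seeds that the summary describes.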

2. Big Data for Efficient Marketing
The second part of the work moves to the private sector. A large‑scale field experiment was conducted across five Asian countries, involving two million anonymized users. Instead of conventional demographic segmentation, the author constructs a social‑network‑aware targeting algorithm that incorporates features such as tie strength, clustering coefficient, and prior product adoption. An XGBoost classifier selects the most receptive users, who then receive personalized promotional messages. Compared with a control group that receives standard mass‑mailing, the network‑driven campaign yields a 13‑fold increase in product adoption (13.2 % vs. 1.0 %) and a 4.7‑times higher return on advertising spend, while customer complaints drop by 68 %. These results illustrate how integrating social graph analytics with machine learning can dramatically improve marketing efficiency and reduce irrelevant offers.
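The network features named above can be sketched in a few lines. The example below derives tie strength (here assumed to be the number of calls between two users), the local clustering coefficient, and a simple exposure-to-adopters share from a list of call pairs. The function names and the exposure definition are hypothetical, and the classifier that would consume these features is deliberately left out.

```python
from collections import Counter
from itertools import combinations

def build_graph(calls):
    """Build an undirected call graph from (caller, callee) pairs.

    Returns (adjacency dict, Counter of calls per unordered pair).
    """
    adj = {}
    strength = Counter()
    for a, b in calls:
        if a == b:
            continue  # ignore self-calls
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
        strength[frozenset((a, b))] += 1
    return adj, strength

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def adopter_exposure(adj, strength, node, adopters):
    """Share of a node's call volume that goes to prior adopters."""
    nbrs = adj.get(node, set())
    total = sum(strength[frozenset((node, n))] for n in nbrs)
    if total == 0:
        return 0.0
    to_adopters = sum(strength[frozenset((node, n))] for n in nbrs if n in adopters)
    return to_adopters / total

calls = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
adj, strength = build_graph(calls)
```

Weighting exposure by call counts rather than by the number of adopting contacts is one possible design choice; it lets a single strong tie to an adopter count for more than several weak ones.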

Ethical and Legal Considerations
Throughout the dissertation, strict privacy safeguards are applied: all personal identifiers are hashed, k‑anonymity is enforced, and data access is limited to a secure research environment. The work complies with GDPR, PDPA, and local data‑protection regulations. The author also discusses limitations such as sample bias (e.g., varying smartphone penetration) and the need for algorithmic transparency, proposing future integration of multi‑source data (satellite imagery, on‑ground surveys) and explainable‑AI techniques.
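The two safeguards mentioned above, hashed identifiers and k‑anonymity, can be illustrated with the standard library. This is a minimal sketch under assumptions: keyed hashing with HMAC‑SHA256 stands in for whatever hashing scheme was actually used, and the quasi-identifier columns (`region`, `age_band`) are invented for the example.

```python
import hashlib
import hmac
from collections import Counter

def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (hex string).

    A secret key makes dictionary attacks on phone numbers harder than a
    plain unsalted hash would; the key must stay with the data owner.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def k_anonymity_violations(records, quasi_identifiers, k):
    """Return quasi-identifier combinations shared by fewer than k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sorted(combo for combo, count in groups.items() if count < k)

# Hypothetical records; the column names are assumptions for this sketch.
records = [
    {"region": "north", "age_band": "20-29"},
    {"region": "north", "age_band": "20-29"},
    {"region": "south", "age_band": "30-39"},
]
violations = k_anonymity_violations(records, ["region", "age_band"], k=2)
```

Groups returned by `k_anonymity_violations` would need to be generalized (for example, by widening the age band) or suppressed before release.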

Conclusions and Future Directions
The dissertation convincingly shows that large‑scale mobile phone data can serve as a low‑cost, high‑frequency sensor of human behavior, enabling accurate socioeconomic monitoring, rapid disaster response, and data‑driven marketing. By coupling massive, passively collected datasets with state‑of‑the‑art predictive analytics, policymakers and businesses can obtain actionable insights that were previously attainable only through expensive, time‑consuming surveys. Future research avenues include bias correction through data fusion, real‑time streaming analytics, and the development of transparent, accountable AI models to ensure ethical deployment at scale.

