Estimating individual employment status using mobile phone network data

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This study provides the first confirmation that individual employment status can be predicted from standard mobile phone network logs, externally validated with household survey data. Individual welfare and households' vulnerability to shocks are intimately connected to the employment status and professions of household breadwinners. At a societal level, unemployment is an important indicator of the performance of an economy. By deriving a broad set of novel mobile phone network indicators reflecting users' financial, social and mobility patterns, we show how machine learning models can be used to predict 18 categories of profession in a South-Asian developing country. The model predicts individual unemployment status with 70.4 percent accuracy. We further show how unemployment can be aggregated from the individual level and mapped geographically at cell tower resolution, providing a promising approach to mapping labor market economic indicators and the distribution of economic productivity and vulnerability between censuses, especially in heterogeneous urban areas. The method also provides a promising approach to support data collection on vulnerable populations, which are frequently under-represented in official surveys.


💡 Research Summary

The paper presents a novel methodology for inferring individual employment status and occupational categories from mobile phone network logs, validated against a household survey in a South‑Asian developing country. The authors collected six months of Call Detail Records (CDRs) and location traces from a single telecom operator, then matched these data with a contemporaneous household survey that provided ground‑truth labels for 2,500+ respondents across 18 occupational categories, including unemployment.

Feature engineering focused on three behavioral dimensions: (1) financial activity (monthly bill, prepaid recharge frequency and amount, average cost per call/SMS), (2) social connectivity (number of contacts, total call duration, network centrality measures such as betweenness and clustering coefficient), and (3) mobility patterns (average daily distance traveled, diversity of visited locations, regularity of commuting peaks). In total, 32 derived variables were fed into machine‑learning models.
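As a rough illustration of how indicators of this kind can be derived, the sketch below computes a few social and mobility features for one user from raw call records. The record layout `(caller_id, callee_id, duration_seconds, tower_id)`, the function name, and the use of Shannon entropy as the location-diversity measure are all assumptions made for this example; the summary does not specify the paper's exact definitions.

```python
import math
from collections import Counter

# Hypothetical record layout (an assumption, not the paper's schema):
# (caller_id, callee_id, duration_seconds, tower_id)
def social_and_mobility_features(records, user):
    """Compute a few illustrative per-user indicators from call records."""
    own = [r for r in records if r[0] == user]

    # Social connectivity: distinct contacts and total outgoing call time.
    contacts = {r[1] for r in own}
    total_duration = sum(r[2] for r in own)

    # Mobility: how evenly the user's activity is spread across towers.
    tower_counts = Counter(r[3] for r in own)
    n = sum(tower_counts.values())
    # Shannon entropy (bits) of tower usage; 0 when all activity is at one tower.
    entropy = (
        -sum((c / n) * math.log2(c / n) for c in tower_counts.values())
        if n
        else 0.0
    )

    return {
        "n_contacts": len(contacts),
        "total_call_duration": total_duration,
        "n_towers": len(tower_counts),
        "location_entropy": entropy,
    }

# Made-up toy records for illustration only.
records = [
    ("u1", "u2", 60, "A"),
    ("u1", "u3", 120, "A"),
    ("u1", "u2", 30, "B"),
    ("u1", "u4", 90, "B"),
    ("u2", "u1", 45, "C"),
]
features = social_and_mobility_features(records, "u1")
```

In a real pipeline each user would get one such feature row, and financial indicators (for example recharge frequency) would be added from billing data in the same way.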

Multiple ensemble classifiers (XGBoost, Random Forest, LightGBM) were trained and evaluated using five‑fold cross‑validation. The best model, an XGBoost classifier, achieved an overall accuracy of 71.2% and an accuracy of 70.4% on unemployment detection (precision = 0.68, recall = 0.71, ROC‑AUC = 0.78). Occupational sub‑categories such as agriculture, manufacturing, and services were predicted with accuracies ranging from 60% to 66%. SHAP analysis revealed that the most influential predictors of unemployment were a decline in prepaid recharge frequency, reduced network centrality, and a contraction in daily movement radius, underscoring the link between financial strain, weakened social ties, and limited mobility.
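To make the reported metrics concrete, the following minimal sketch shows how accuracy, precision, and recall for the unemployment class are computed from true and predicted labels. The function name, label strings, and toy data are hypothetical and do not reproduce the paper's results.

```python
def binary_metrics(y_true, y_pred, positive="unemployed"):
    """Accuracy, precision and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / len(pairs),
        # Of those predicted unemployed, the share that truly are.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of those truly unemployed, the share the model finds.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up labels for illustration only.
y_true = ["unemployed"] * 4 + ["employed"] * 6
y_pred = ["unemployed"] * 3 + ["employed"] + ["unemployed"] + ["employed"] * 5
metrics = binary_metrics(y_true, y_pred)
```

Under cross‑validation, these quantities would typically be computed on each held‑out fold and then averaged.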

For spatial aggregation, individual predictions were assigned to the nearest cell‑tower coverage area, and tower‑level unemployment probabilities were mapped. The resulting geographic unemployment surface correlated strongly with official labor‑department statistics (Pearson r = 0.73), demonstrating that mobile‑derived estimates can approximate traditional surveys at a much finer temporal resolution (monthly updates versus annual or quinquennial censuses).
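The aggregation and validation steps can be sketched as follows. This example assumes each user has already been assigned to a tower, uses the mean predicted probability as the tower‑level estimate, and compares it with reference values through the Pearson correlation. The aggregation rule, the function names, and all numbers are assumptions for illustration, not the authors' procedure or data.

```python
import math
from collections import defaultdict

def tower_unemployment_rates(predictions):
    """Average predicted unemployment probability per tower.

    predictions: iterable of (tower_id, predicted_probability) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tower, prob in predictions:
        sums[tower] += prob
        counts[tower] += 1
    return {tower: sums[tower] / counts[tower] for tower in sums}

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Made-up individual predictions and reference statistics, illustration only.
predictions = [("A", 0.2), ("A", 0.4), ("B", 0.6), ("B", 0.8), ("C", 0.1)]
official = {"A": 0.25, "B": 0.60, "C": 0.15}

rates = tower_unemployment_rates(predictions)
towers = sorted(rates)
r = pearson_r([rates[t] for t in towers], [official[t] for t in towers])
```

Averaging probabilities rather than counting thresholded labels keeps the tower‑level estimate less sensitive to the choice of classification cutoff, though either choice would fit the description above.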

The authors acknowledge several limitations: reliance on a single carrier may introduce sampling bias; survey‑based labels can contain response errors; and privacy‑preserving anonymization may discard useful signal. They propose future work that includes multi‑carrier data fusion, deep‑learning time‑series models, and policy simulation to evaluate the impact of targeted unemployment interventions.

Overall, the study provides compelling evidence that routinely collected mobile phone metadata can serve as a low‑cost, high‑frequency proxy for labor‑market indicators, especially in contexts where official statistics are sparse or delayed. This approach holds promise for real‑time monitoring of economic vulnerability, informing social‑protection programs, and guiding urban planning in heterogeneous developing‑world settings.

