Predicting Well-Being with Mobile Phone Data: Evidence from Four Countries
We provide systematic evidence on the potential for estimating household well-being from mobile phone data. Using data from four countries (Afghanistan, Côte d'Ivoire, Malawi, and Togo), we conduct parallel, standardized machine learning experiments to assess which measures of welfare can be most accurately predicted, which types of phone data are most useful, and how much training data is required. We find that long-term poverty measures such as wealth indices (Pearson's rho = 0.20-0.59) and multidimensional poverty (rho = 0.29-0.57) can be predicted more accurately than consumption (rho = 0.04-0.54); transient vulnerability measures like food security and mental health are very difficult to predict. Models using call and text-message behavior are more predictive than those using metadata on mobile internet usage, mobile money transactions, and airtime top-ups. Predictive accuracy improves rapidly through the first 1,000-2,000 training observations, with continued gains beyond 4,500 observations. Model performance depends strongly on sample heterogeneity: nationally representative samples yield 20-70 percent higher accuracy than urban-only or rural-only samples.
💡 Research Summary
This paper provides a systematic, cross‑country assessment of how well mobile phone metadata can predict household well‑being in four low‑ and middle‑income countries: Afghanistan, Côte d'Ivoire, Malawi, and Togo. For each country the authors linked a household survey containing a range of welfare indicators—asset‑based wealth index, multidimensional poverty index, consumption, income, food‑security status, and mental‑health scores—to the phone number of at least one household member. From the mobile operator they obtained four categories of metadata: call and SMS logs, airtime top‑up records, mobile data usage, and mobile‑money transaction logs. The datasets differ in size (528–5,469 respondents), sampling frame (rural‑only, urban‑only, or nationally representative), survey mode (in‑person vs. phone), and observation window (one month vs. three months).
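To make the data linkage concrete, the sketch below derives a few behavioral features from a toy call-detail-record (CDR) table, the kind of per-subscriber aggregation that phone-based welfare models rely on. The column names and the specific features (call counts, unique contacts, nighttime fraction) are illustrative assumptions, not the paper's actual feature set.

```python
# Illustrative feature engineering from a toy CDR table.
# Column names and features here are hypothetical, not the paper's exact set.
import pandas as pd

cdr = pd.DataFrame({
    "caller": ["A", "A", "A", "B", "B"],
    "callee": ["X", "Y", "X", "Z", "Z"],
    "timestamp": pd.to_datetime([
        "2021-01-01 09:00", "2021-01-01 22:30", "2021-01-02 10:00",
        "2021-01-01 08:00", "2021-01-03 21:00",
    ]),
    "duration_s": [60, 120, 30, 300, 45],
})

# One feature row per subscriber: activity volume, network breadth,
# and the share of calls placed at night (here, 8 p.m. or later).
features = cdr.groupby("caller").agg(
    n_calls=("callee", "size"),
    n_contacts=("callee", "nunique"),
    total_duration_s=("duration_s", "sum"),
    night_frac=("timestamp", lambda t: (t.dt.hour >= 20).mean()),
)
print(features)
```

In a real pipeline this feature table would then be joined to the survey outcomes by the respondent's (hashed) phone number.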
The authors ran parallel machine‑learning experiments using three algorithms—LASSO regression, random forest, and gradient‑boosted trees—each tuned via five‑fold cross‑validation. The model with the lowest validation RMSE was selected and evaluated on a held‑out test set, with performance measured by Pearson’s correlation (ρ) between predicted and observed values.
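The selection-and-evaluation loop described above can be sketched as follows. This is a minimal reconstruction on synthetic data: the feature matrix, sample sizes, and hyperparameters are assumptions, not the paper's configuration, but the structure (five-fold CV to pick the lowest-RMSE model, then Pearson's ρ on a held-out test set) follows the text.

```python
# Sketch of the model-selection pipeline: 5-fold CV over three learners,
# pick the lowest validation RMSE, score on held-out data with Pearson's rho.
# Data are synthetic; settings are illustrative assumptions.
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))                  # stand-in phone features
y = X[:, :3].sum(axis=1) + rng.normal(size=600) # stand-in welfare outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {
    "lasso": Lasso(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Five-fold cross-validation; keep the model with the lowest RMSE.
rmse = {
    name: -cross_val_score(m, X_train, y_train, cv=5,
                           scoring="neg_root_mean_squared_error").mean()
    for name, m in candidates.items()
}
best = min(rmse, key=rmse.get)

# Final evaluation: Pearson's rho between predictions and observed values.
model = candidates[best].fit(X_train, y_train)
rho, _ = pearsonr(model.predict(X_test), y_test)
print(best, round(rho, 2))
```

Note that ρ is computed once, on data the winning model never saw during tuning, which mirrors the paper's held-out evaluation.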
Four research questions guided the analysis: (i) which welfare measures are most predictable, (ii) which phone‑data types contribute most, (iii) how many labeled observations are needed for reliable models, and (iv) how sample heterogeneity influences accuracy.
Key findings:
- Predictability of outcomes – Long‑term, structural poverty measures are the easiest to predict. Asset-index correlations range from ρ = 0.20 (Afghanistan) to 0.59 (Togo); multidimensional poverty yields ρ = 0.29–0.57. Consumption is moderately predictable (ρ = 0.04–0.54), while income shows essentially no correlation (ρ ≈ 0). Transient vulnerability indicators (food security, ρ = 0.04–0.17; mental health, ρ = 0.01–0.23) are poorly captured by phone behavior. Household size is modestly predictable (ρ = 0.13–0.34).
- Value of data types – Call and SMS logs, together with derived mobility features (e.g., tower transitions), provide the bulk of predictive power (ρ = 0.11–0.52). Mobile‑money activity, data usage, and recharge records are far less informative on their own (ρ = −0.01 to 0.33). Models that combine all four data streams consistently outperform those that rely on a single type, but the marginal gain from adding the less‑informative streams is modest.
- Training‑sample size – Predictive accuracy improves sharply up to about 1,000–2,000 labeled individuals, after which returns diminish but remain positive up to the largest sample examined (≈4,500 in Malawi). Random forest and gradient boosting show similar performance; LASSO lags behind except when data are extremely scarce.
- Impact of sample heterogeneity – Nationally representative samples (Togo, Côte d'Ivoire) achieve 20–70% higher correlations than comparable urban‑only or rural‑only subsamples. Within the same country, restricting the training set to only urban or only rural households reduces ρ by roughly 0.1–0.2 relative to the full national model. This pattern mirrors findings from satellite‑based poverty mapping, where the urban–rural split drives much of the explanatory power.
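The training-sample-size experiment above can be sketched as a learning curve: fix a held-out test set, refit the model on progressively larger labeled pools, and track Pearson's ρ at each size. Everything below is synthetic and illustrative; the sizes, model, and data are assumptions, not the paper's setup.

```python
# Learning-curve sketch: grow the training pool, keep the test set fixed,
# and record Pearson's rho at each size. Synthetic, illustrative data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 15))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000)  # stand-in outcome

X_test, y_test = X[4000:], y[4000:]   # fixed held-out test set

curve = {}
for size in [250, 500, 1000, 2000, 4000]:
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[:size], y[:size])
    rho, _ = pearsonr(model.predict(X_test), y_test)
    curve[size] = round(rho, 2)
print(curve)
```

On data like these the curve typically rises steeply over the first one to two thousand observations and then flattens, the qualitative pattern the paper reports.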
The discussion emphasizes that while mobile phone metadata can deliver low‑cost, high‑resolution estimates of long‑term poverty, they are ill‑suited for capturing short‑term shocks or mental‑health outcomes. The authors caution that their results may be driven by idiosyncrasies of the four case studies (different survey timings, modalities, and cultural contexts) and that the approach only predicts outcomes for phone owners, potentially excluding the poorest non‑subscribers. They call for future work that (a) expands the country pool with standardized surveys, (b) investigates bias introduced by non‑ownership, and (c) exploits longitudinal phone data to monitor welfare dynamics in near real‑time.
In sum, the paper demonstrates that (1) call/SMS and mobility features are the most valuable signals, (2) a modest training set of ~1,000 households suffices for reasonable accuracy, (3) heterogeneous, nationally representative samples dramatically boost performance, and (4) mobile phone data excel at predicting structural poverty but fall short for transient vulnerability measures. These insights guide both researchers designing phone‑based welfare estimation pipelines and policymakers considering the integration of such digital proxies into targeting and monitoring systems.