An analytical framework to nowcast well-being using mobile phone data

An intriguing open question is whether measurements made on Big Data recording human activities can yield us high-fidelity proxies of socio-economic development and well-being. Can we monitor and predict the socio-economic development of a territory …

Authors: Luca Pappalardo, Maarten Vanhoof, Lorenzo Gabrielli

Luca Pappalardo*† · Maarten Vanhoof‡ · Lorenzo Gabrielli† · Zbigniew Smoreda‡ · Dino Pedreschi* · Fosca Giannotti†

Abstract An intriguing open question is whether measurements made on Big Data recording human activities can yield us high-fidelity proxies of socio-economic development and well-being. Can we monitor and predict the socio-economic development of a territory just by observing the behavior of its inhabitants through the lens of Big Data? In this paper, we design a data-driven analytical framework that uses mobility measures and social measures extracted from mobile phone data to estimate indicators for socio-economic development and well-being. We discover that the diversity of mobility, defined in terms of entropy of the individual users’ trajectories, exhibits (i) significant correlation with two different socio-economic indicators and (ii) the highest importance in predictive models built to predict the socio-economic indicators. Our analytical framework opens an interesting perspective to study human behavior through the lens of Big Data by means of new statistical indicators that quantify and possibly “nowcast” the well-being and the socio-economic development of a territory.

Keywords Complex Systems · Human Mobility · Social Networks · Economic development

* Department of Computer Science, University of Pisa, Italy. E-mail: lpappalardo@di.unipi.it
† Institute of Information Science and Technologies (ISTI), National Research Council (CNR), Italy. E-mail: fosca.giannotti@isti.cnr.it
‡ SENSE, Orange Labs, France. E-mail: zbigniew.smoreda@orange.com

1 Introduction

Big Data, the masses of digital breadcrumbs produced by the information technologies that humans use in their daily activities, allow us to scrutinize individual and collective behavior at an unprecedented scale, detail, and speed.
Building on this opportunity we have the potential capability of creating a digital nervous system of our society, enabling the measurement, monitoring and prediction of relevant aspects of the socio-economic structure in quasi real time [21]. An intriguing question is whether and how measurements made on Big Data can yield us high-fidelity proxies of socio-economic development and well-being. Can we monitor and possibly predict the socio-economic development of our societies just by observing human behavior, for example human movements and social relationships, through the lens of Big Data?

This fascinating question, also stimulated by the United Nations in recent reports [2, 3], has attracted the interest of researchers from several disciplines, who started investigating the relations between human behavior and economic development based on large experimental datasets collected for completely different purposes [15, 34]. As a first result along this line, a seminal work exploited a nationwide mobile phone dataset to discover that the diversity of social contacts of the inhabitants of a municipality is positively associated with a socio-economic indicator of poverty, independently surveyed by the official statistics institutes [15]. This result suggests that social behavior, to some extent, is a proxy for the economic status of a given territory. However, little effort has been put into investigating how human mobility affects, and is affected by, the socio-economic development of a territory. Theoretical works suggest that human mobility is related to economic well-being, as it could nourish the economy and facilitate flows of people and goods, whereas constraints on the possibility to move freely can diminish economic opportunities [29].
So, it is reasonable to investigate the role of human mobility with respect to the socio-economic development of a given territory.

Our paper provides a twofold contribution. First, we design a data-driven analytical framework that uses Big Data to extract meaningful measures of human behavior and estimate indicators for the socio-economic development. The analytical framework we propose is repeatable on different countries and geographic scales since it is based on mobile phone data, the so-called CDR (Call Detail Records) of calling and texting activity of users. Mobile phone data, indeed, can be retrieved in every country due to their worldwide diffusion [7]: there are 6.8 billion mobile phone subscribers today over 7 billion people on the planet, with a penetration of 128% in the developed world and 90% in developing countries. CDR data have proven to be a high-fidelity proxy for individuals’ movements and social interactions [22, 37].

Second, we apply the analytical framework on large-scale mobile phone data and quantify the relations between human mobility, social interactions and economic development in France using municipality-level official statistics as external comparison measurements. We first define four individual measures over mobile phone data which describe different aspects of individual human behavior: the volume of mobility, the diversity of mobility, the volume of sociality and the diversity of sociality. Each individual measure is computed for each of the several million users in our dataset based on their locations and calls as recorded in the mobile phone data. In a second stage, we aggregate the four individual measures at the level of French municipalities and explore the correlations between the four aggregated measures and two external indicators of socio-economic development.
We find that the aggregated mobility diversity of individuals resident in the same municipality exhibits a superior degree of correlation with the socio-economic indicators, and we confirm these results against two different null models, an observation that allows us to reject the hypothesis that our discovery occurred by chance.

Next, we build regression and classification models to predict the external socio-economic indicators from the population density and the social and mobility measures aggregated at municipality scale. We show that the diversity of human mobility adds significant predictive power in both regression and classification models, far larger than the diversity of social contacts and demographic measures such as population density, a factor that is known to be correlated with the intensity of human activities [38, 54].

The importance of this finding is twofold. On one side, it offers a new stimulus to social research: diversity is a key concept not only for natural ecosystems but also for social ecosystems, and can be used to understand deeply the complexity of our interconnected society. On the other side, our results reveal the high potential of Big Data in providing representative, relatively inexpensive and readily available measures as proxies of economic development. Our analytical framework opens an interesting perspective to engineer official statistics processes to monitor human behavior through mobile phone data. New statistical indicators can be defined to describe and possibly “nowcast” the economic status of a territory, even when such measurements would be impossible using traditional censuses and surveys [2, 3].

The paper is organized as follows. Section 2 reviews the scientific literature relevant to our topic, Section 3 describes in detail the data-driven analytical framework we propose.
In Section 4, Section 5 and Section 6 we apply our analytical framework on a nationwide mobile phone dataset covering several weeks of call activity in France. We introduce the mobile phone data in Section 4.1, the measures of individual mobility behavior and individual social behavior in Section 4.2, and the computations of the measures on a nation-wide mobile phone dataset in Section 4.3. In Section 5 we describe the results of the correlation analysis and validate them against two null models. In Section 6 we present and validate predictive models for socio-economic development. In Section 7 we discuss the results and finally Section 8 concludes the paper describing the opportunities and the challenges that arise from our research.

2 Related work

The interest around the analysis of Big Data and the possibility to compile them into a comprehensive picture of human behavior have infected all branches of human knowledge, from sports [11] to economy [42]. However, two aspects in particular attracted the interest of scientists in the last decade, due to the striking abundance of data in those contexts: human mobility and social networks.

Studies from different disciplines document a stunning heterogeneity of human travel patterns [22, 39], and at the same time observe a high degree of predictability [51, 16]. The patterns of human mobility have been used to build generative models of individual human mobility and human migration flows [28, 48], to develop methods for profiling individuals according to their mobility patterns [40], to discover geographic borders according to recurrent trips of private vehicles [47], to predict the formation of social ties [10, 53], and to build classification models that predict the kind of activity associated with individuals’ trips solely on the basis of the observed displacements [31, 27, 46]. In the context of social network analysis, the observation of social interaction data provided by emails, mobile phones, and social media has revealed the complexity underlying the social structure [6]: hubs exist in our social networks who strongly contribute to the so-called small world phenomenon [5], and social networks are found to have a tendency to partition into social communities, i.e. clusters of densely connected sets of individuals [17].

Fig. 1 The data-driven analytical framework. Starting from mobile phone data (a) mobility and social measures are computed for each individual in the dataset (b). Each individual is then assigned to the territory where she resides (c) and the individual measures are aggregated at territorial level (d). Starting from the aggregated measures predictive models are constructed (e) in order to estimate and predict the socio-economic development of the territories (f).

The last few years have also witnessed a growing interest around the usage of Big Data to support official statistics in the measurement of individual and collective well-being [12, 52]. Even the United Nations, in two recent reports, stimulate the usage of Big Data to investigate the patterns of phenomena relative to people’s health and well-being [2, 3]. The vast majority of works in the context of Big Data for official statistics are based on the analysis of mobile phone data, the so-called CDR (Call Detail Records) of calling and texting activity of users. Mobile phone data, indeed, guarantee the repeatability of experiments on different countries and geographical scales since they can be retrieved nowadays in every country due to their worldwide diffusion [7]. A set of recent works use mobile phone data as a proxy for socio-demographic variables.
Deville et al., for example, show how the ubiquity of mobile phone data can be exploited to provide accurate and detailed maps of population distribution over national scales and any time period [14]. Brea et al. study the structure of the social graph of mobile phone users of Mexico and propose an algorithm for the prediction of the age of mobile phone users [9]. Another recent work uses mobile phone data to study inter-city mobility and develop a methodology to detect the fraction of residents, commuters and visitors within each city [19].

A lot of effort has been put in recent years into the usage of mobile phone data to study the relationships between human behavior and collective socio-economic development. The seminal work by Eagle et al. analyzes a nationwide mobile phone dataset and shows that, in the UK, regional communication diversity is positively associated with a socio-economic ranking [15]. Gutierrez et al. address the issue of mapping poverty with mobile phone data through the analysis of airtime credit purchases in Ivory Coast [24]. Blumenstock shows preliminary evidence of a relationship between individual wealth and the history of mobile phone transactions [8]. Decuyper et al. use mobile phone data to study food security indicators, finding a strong correlation between the consumption of vegetables rich in vitamins and airtime purchase [13]. Frias-Martinez et al. analyze the relationship between human mobility and the socio-economic status of urban zones, presenting which mobility indicators correlate best with socio-economic levels and building a model to predict the socio-economic level from mobile phone traces [18]. Pappalardo et al. analyze mobile phone data and extract meaningful mobility measures for cities, discovering interesting correlations between human mobility aspects and socio-economic indicators [41]. Lotero et al.
analyze the architecture of urban mobility networks in two Latin-American cities from the multiplex perspective. They discover that the socio-economic characteristics of the population have an extraordinary impact on the layer organization of these multiplex systems [33]. Amini et al. use mobile phone data to compare human mobility patterns of a developing country (Ivory Coast) and a developed country (Portugal). They show that cultural diversity in developing regions can present challenges to mobility models defined in less culturally diverse regions [4]. Smith-Clarke et al. analyze the aggregated mobile phone data of two developing countries and extract features that are strongly correlated with poverty indexes derived from official statistics census data [49].

Other recent works use different types of mobility data, e.g. GPS tracks and market retail data, to show that Big Data on human movements can be used to support official statistics and understand people’s purchase needs. Pennacchioli et al., for example, provide empirical evidence of the influence of purchase needs on human mobility, analyzing the purchases of an Italian supermarket chain to show a range effect of products: the more sophisticated the needs they satisfy, the more the customers are willing to travel [43]. Marchetti et al. perform a study on a regional level analyzing GPS tracks from cars in Tuscany to extract measures of human mobility at province and municipality level, finding a strong correlation between the mobility measures and a poverty index independently surveyed by the Italian official statistics institute [34].

Despite an increasing interest around this field, a view on the state-of-the-art cannot avoid noticing that there is no unified methodology to exploit Big Data for official statistics. It is also surprising that widely accepted measures of human mobility (e.g.
radius of gyration [22] and mobility entropy [51]) have not been used so far. We overcome these issues by providing an analytical framework as support for official statistics, which allows for a systematic evaluation of the relations between relevant aspects of human behavior and the development of a territory. Moreover, our paper shows how standard mobility measures, not exploited so far, are powerful tools for official statistics purposes.

3 The Analytical Framework

Our analytical framework is a knowledge and analytical infrastructure that uses Big Data to provide reliable measurements of socio-economic development, aiming at satisfying the increasing demand by policy makers for continuous and up-to-date information on the geographic distribution of poverty, inequality or life conditions. Figure 1 describes the structure of the methodology we propose. The analytical framework is based on mobile phone data, which guarantee the repeatability of the process on different countries and geographical scales. Mobile phone data are indeed ubiquitous and can be retrieved in every country due to their worldwide diffusion: nowadays the penetration of mobile phones is 128% in developed countries and 90% in developing countries, with 6.8 billion mobile phone subscribers today over 7 billion people on the planet [7]. In particular the call detail records (CDR), generally collected by mobile phone operators for billing and operational purposes, contain an enormous amount of information on how, when, and with whom people communicate. This wealth of information makes it possible to capture different aspects of human behavior and has stimulated the creativity of scientists from different disciplines, who demonstrated that mobile phone data are a high quality proxy for studying individual mobility and social ties [22, 37].
Starting from the collected mobile phone data (Figure 1(a)) a set of measures are computed which grasp the salient aspects of individuals’ mobility and social behavior (Figure 1(b)). This step is computationally expensive when the analytical framework is applied on massive data such as the CDRs of an entire country for a long period. To parallelize the computations and speed up the execution, a distributed processing platform such as Hadoop or Spark can be used. A wide set of mobility and social measures can be computed during this phase, and the set can be enlarged with new measures as soon as they are proven to be correlated with socio-economic development aspects of interest. In Section 4.2 we propose, as an example, a set of standard measures of individual mobility and sociality and show how they can be computed on mobile phone data.

As generally required by policy makers, official statistics about socio-economic development are available at the level of geographic units, e.g. regions, provinces, municipalities, districts or census cells. Therefore, the individuals in the dataset have to be mapped to the corresponding territory of residence, in order to perform an aggregation of the individual measures into a territorial measure (Figure 1(c) and 1(d)). When the city of residence or the address of the users is available in the data, this information can be easily used to assign each individual to the corresponding city of residence. Unfortunately these socio-demographic data are generally not available in mobile phone data for privacy and proprietary reasons. This issue can be solved, with a certain degree of approximation, by inferring the information from the data source. In the literature the phone tower where a user makes the highest number of calls during nighttime is usually considered her home phone tower [44].
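The home-tower heuristic just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the record layout `(timestamp, tower_id, caller_id)`, the function names, and the handling of users without nighttime calls are assumptions, while the default nighttime window (10 pm to 7 am) is the one reported in Section 4.3.

```python
from collections import Counter
from datetime import datetime


def is_night(ts, start_hour=22, end_hour=7):
    """Return True if ts falls in the nighttime window, which wraps past midnight."""
    return ts.hour >= start_hour or ts.hour < end_hour


def infer_home_towers(records, start_hour=22, end_hour=7):
    """Map each caller to the tower that served most of her nighttime calls.

    records: iterable of (timestamp, tower_id, caller_id) tuples (assumed layout).
    Callers with no nighttime call are left out of the result (assumption).
    """
    night_counts = {}
    for ts, tower, caller in records:
        if is_night(ts, start_hour, end_hour):
            night_counts.setdefault(caller, Counter())[tower] += 1
    # most_common(1) returns the tower with the highest nighttime call count
    return {caller: counts.most_common(1)[0][0]
            for caller, counts in night_counts.items()}
```

On real data, ties between towers and users who never call at night would need an explicit policy; the sketch simply keeps the first tower encountered among tied ones and drops users without nighttime activity.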
Then with standard Geographic Information System techniques it is possible to associate the phone tower to its territory (see Section 4.3).

The obtained aggregated measures are compared with the external socio-economic indicators to perform correlation analysis and to learn and evaluate predictive models (Figure 1(e)). The predictive models can be aimed at predicting the actual value of socio-economic development of the territory, e.g. by regression models (Section 6.1), or at predicting the class of socio-economic development, i.e. the level of development of a given geographic unit, as done by classification models (Section 6.2). Finally, the estimates and the predictions produced by the models are the output of the analytical framework (Figure 1(f)). The measures, the territorial aggregation and the predictive models of the analytical framework can be updated every time new mobile phone data become available, providing policy makers with up-to-date estimates of the socio-economic situation of a given territory, in contrast with indicators produced by official statistics institutes, which are generally released after months or even once a year.

In the following sections we apply the proposed analytical framework on a large-scale nation-wide mobile phone dataset and describe its implementation step by step: from the definition of measures on the data (Sections 4.1 and 4.2), to their computation and territorial aggregation (Section 4.3), and the construction of predictive models (Sections 5.1, 6.1 and 6.2).

4 Measuring Human Behavior

We now discuss steps (a), (b) and (c) in Figure 1, presenting the experimental setting which consists in the computation of the individual measures on the data and their aggregation at territorial level. First, we describe the mobile phone data we use as proxy for individual behavior, together with details about data preprocessing (Section 4.1).
Then we define the individual measures capturing diverse aspects of individual mobility and social behavior (Section 4.2). Finally we show how we compute the individual measures and aggregate them at municipality level (Section 4.3).

4.1 Mobile phone data

We have access to a set of Call Detail Records (CDR) gathered for billing purposes by the mobile phone operator Orange, recording 215 million calls made during 45 days by 20 million anonymized mobile phone users. CDRs collect geographical, temporal and interaction information on mobile phone use and show an enormous potential to empirically investigate human dynamics on a society wide scale [26]. Each time an individual makes a call the mobile phone operator registers the connection between the caller and the callee, the duration of the call and the coordinates of the phone tower communicating with the served phone, allowing to reconstruct the user’s time-resolved trajectory. Table 1 illustrates an example of the structure of CDRs.

(a)
timestamp           tower   caller    callee
2007/09/10 23:34    36      4F80460   4F80331
2007/10/10 01:12    36      2B01359   9H80125
2007/10/10 01:43    38      2B19935   6W1199
...                 ...     ...       ...

(b)
tower   latitude   longitude
36      49.54      3.64
37      48.28      1.258
38      48.22      -1.52
...     ...        ...

Table 1 Example of Call Detail Records (CDRs). Every time a user makes a call, a record is created with timestamp, the phone tower serving the call, the caller identifier and the callee identifier (a). For each tower, the latitude and longitude coordinates are available to map the tower on the territory (b).

In order to focus on individuals with reliable statistics, we carry out some preprocessing steps. First, we select only users with a call frequency higher than the threshold f = N/45 > 0.5, where N is the number of calls made by the user and 45 days is the length of our period of observation; that is, we delete all the users with fewer than one call every two days (on average over the observation period). Second, we reconstruct the mobility trajectories and the social network of the filtered users. We reconstruct the trajectory of a user based on the time-ordered list of cell phone towers from which she made her calls during the period of observation (see Figure 2). We then translate the CDR data into a social network representation by linking two users if at least one reciprocated pair of calls exists between them during the period of observation (i.e. A called B and B called A). This procedure eliminates a large number of one-way calls, most of which correspond to isolated events and do not represent meaningful communications [37]. Figure 3 shows a fraction of the social network centered on a single user. The resulting dataset contains the mobility trajectories of 6 million users and a call graph of 33 million edges.

Fig. 2 The detailed trajectory of a single user. The phone towers are shown as grey dots, and the Voronoi lattice in grey marks the approximate reception area of each tower. CDRs record the identity of the closest tower to a mobile user; thus, we cannot identify the position of a user within a Voronoi cell. The trajectory describes the user’s movements during 4 days (each day in a different color). The tower where the user made the highest number of calls during nighttime is depicted in bolder grey.

Fig. 3 A fraction of the call graph centered on a single user u. Nodes represent users, edges indicate reciprocated calls between the users, the size of the edges is proportional to the total number of calls between the users during the 45 days.
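The preprocessing of Section 4.1 (frequency filter, trajectory reconstruction, reciprocated call graph) can be illustrated with the following Python sketch. It is a small in-memory illustration under stated assumptions, not the authors' pipeline: the tuple layout `(timestamp, tower, caller, callee)` mirrors Table 1(a), the function names are hypothetical, and the choice to keep an edge only when both endpoints pass the frequency filter is an assumption.

```python
from collections import Counter, defaultdict


def filter_active_users(calls, n_days=45, min_freq=0.5):
    """Keep callers whose average number of calls per day exceeds min_freq."""
    n_calls = Counter(caller for _, _, caller, _ in calls)
    return {u for u, n in n_calls.items() if n / n_days > min_freq}


def build_trajectories(calls, users):
    """Time-ordered list of towers from which each retained user placed calls."""
    trajectories = defaultdict(list)
    for _, tower, caller, _ in sorted(calls, key=lambda c: c[0]):
        if caller in users:
            trajectories[caller].append(tower)
    return dict(trajectories)


def reciprocated_call_graph(calls, users):
    """Undirected weighted edges {frozenset({u, v}): total calls between u and v}.

    An edge is kept only if u called v and v called u at least once.
    Restricting both endpoints to the filtered users is an assumption.
    """
    directed = Counter(
        (caller, callee) for _, _, caller, callee in calls
        if caller in users and callee in users and caller != callee
    )
    edges = {}
    for (u, v), n_uv in directed.items():
        n_vu = directed.get((v, u), 0)
        if n_vu > 0:  # reciprocated pair of calls
            edges[frozenset((u, v))] = n_uv + n_vu
    return edges
```

For a dataset of the size described above, the same three steps would have to be expressed as distributed jobs; the sketch only makes the logic of each step explicit.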
4.2 Measure Definition

We introduce two measures of individual mobility behavior and two measures of individual social behavior, dividing them into two categories: measures of volume, and measures of diversity (see Table 2).

individual measures
  sociality      social volume        SV
                 social diversity     SD
  mobility       mobility volume      MV
                 mobility diversity   MD
socio-economic indicators
  demographic    population density   PD
  development    deprivation index    DI
                 per capita income    PCI

Table 2 Measures and indicators used in our study. Social volume, social diversity, mobility volume and mobility diversity are individual measures computed on mobile phone data. Population density, deprivation index, and per capita income are external socio-economic indicators provided by INSEE.

We define two measures that capture aspects of individual social interactions: social volume (SV), the number of social contacts of an individual; and social diversity (SD), the diversification of an individual’s calls over the social contacts. Within a social network, we can express the volume of social interactions by counting the number of links an individual possesses with others. This simple measure of connectivity is widely used in network science and is called the degree of an individual [36]. In a call graph the degree of an individual is the number of different individuals who are in contact by mobile phone calls with her. We can therefore see the degree as a proxy for the volume of sociality for each individual:

$$ SV(u) = \mathrm{degree}(u) \tag{1} $$

The degree distribution is well approximated by a power law function denoting a high heterogeneity in social networks with respect to the number of friendships [30, 37].
The social diversity of an individual u quantifies the topological diversity in a social network as the Shannon entropy associated with her communication behavior [15]:

$$ SD(u) = -\frac{\sum_{v=1}^{k} p_{uv} \log(p_{uv})}{\log(k)} \tag{2} $$

where k is the degree of individual u, $p_{uv} = V_{uv} / \sum_{v=1}^{k} V_{uv}$ and $V_{uv}$ is the number of calls between individual u and individual v during the period of observation. SD is a measure for the social diversification of each individual according to its own interaction pattern. In a more general way, individuals who always call the same few contacts reveal a low social diversification resulting in lower values for SD, whereas individuals who distribute their calls among many different contacts show high social diversification, i.e. higher SD. The distribution of SD across the population is peaked, as measured in GSM and landlines data [15].

Starting from the mobility trajectories of an individual, we define two measures to describe individual mobility: mobility volume (MV), the typical travel distance of an individual, and mobility diversity (MD), the diversification of an individual’s movements over her locations. The radius of gyration [22] provides a measure of mobility volume, indicating the characteristic distance traveled by an individual (see Figure 4). In detail, it characterizes the spatial spread of the phone towers visited by an individual u from the trajectories’ center of mass (i.e. the weighted mean point of the phone towers visited by an individual), defined as:

$$ MV(u) = \sqrt{\frac{1}{N} \sum_{i \in L} n_i (r_i - r_{cm})^2} \tag{3} $$

where L is the set of phone towers visited by the individual u, $n_i$ is the individual’s visitation frequency of phone tower i, $N = \sum_{i \in L} n_i$ is the sum of all the single frequencies, $r_i$ and $r_{cm}$ are the vectors of coordinates of phone tower i and center of mass respectively.
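As a worked illustration of Equations (2) and (3), the following Python sketch computes social diversity from a user's per-contact call counts and the radius of gyration from her per-tower visit counts. The input layouts and function names are assumptions made for the example; tower coordinates are assumed to be planar (for instance, latitude and longitude already projected to kilometres), and returning 0 for users with fewer than two contacts, where log(k) = 0 leaves Equation (2) undefined, is also an assumption.

```python
import math


def social_diversity(call_counts):
    """Normalized Shannon entropy of a user's calls over her contacts (Eq. 2).

    call_counts: dict mapping contact -> number of calls V_uv (assumed layout).
    Returns 0.0 when k < 2, where Eq. 2 is undefined (assumption).
    """
    k = len(call_counts)
    if k < 2:
        return 0.0
    total = sum(call_counts.values())
    entropy = -sum((v / total) * math.log(v / total)
                   for v in call_counts.values() if v > 0)
    return entropy / math.log(k)


def radius_of_gyration(visits):
    """Radius of gyration (Eq. 3) from {(x, y): n_i} tower visit counts.

    Coordinates are assumed planar, so Euclidean distance is meaningful.
    """
    n_total = sum(visits.values())
    # center of mass: visit-weighted mean of the tower coordinates
    cx = sum(n * x for (x, _), n in visits.items()) / n_total
    cy = sum(n * y for (_, y), n in visits.items()) / n_total
    spread = sum(n * ((x - cx) ** 2 + (y - cy) ** 2)
                 for (x, y), n in visits.items())
    return math.sqrt(spread / n_total)
```

A user who splits her calls evenly between two contacts reaches the maximum social diversity of 1, while a user who only ever visits one tower has a radius of gyration of 0.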
It is known that the distribution of the radius of gyration reveals heterogeneity across the population: most individuals travel within a short radius of gyration but others cover long distances on a regular basis, as measured on GSM and GPS data [22, 39].

Besides the volume of individual mobility, we define the diversity of individual mobility by using the Shannon entropy of an individual’s trips:

$$ MD(u) = -\frac{\sum_{e \in E} p(e) \log p(e)}{\log N} \tag{4} $$

where e = (a, b) represents a trip between an origin phone tower and a destination phone tower, E is the set of all the possible origin-destination pairs, p(e) is the probability of observing a movement between phone towers a and b, and N is the total number of trajectories of individual u (Figure 5). Analogously to SD, MD is high when a user performs many different trips from a variety of origins and destinations; MD is low when a user performs a small number of recurring trips. Seen from another perspective, the mobility diversity of an individual also quantifies the possibility to predict the individual’s future whereabouts. Individuals having a very regular movement pattern possess a mobility diversity close to zero and their whereabouts are rather predictable. Conversely, individuals with a high mobility diversity are less predictable. It is known that the distribution of the mobility diversity is peaked across the population and very stable across different social groups (e.g. age and gender) [51].

[Figure 4: panels (a) user A and (b) user B; in-figure labels: home location, center of mass, radius of gyration]

Fig. 4 The radius of gyration of two users in our dataset. The figure shows the spatial distribution of phone towers (circles). The size of circles is proportional to their visitation frequency, the red location indicates the most frequent location $L_1$ (the location where the user makes the highest number of calls during nighttime).
The cross indicates the position of the center of mass, the black dashed line indicates the radius of gyration. User A has a small radius of gyration because she travels between locations that are close to each other. User B has a high radius of gyration because the locations she visits are far apart from each other.

[Figure 5: panels (a) user X and (b) user Y]

Fig. 5 The mobility entropy of two users in our dataset. Nodes represent phone towers, edges represent trips between two phone towers, the size of nodes indicates the number of calls of the user managed by the phone tower, the size of edges indicates the number of trips performed by the user on the edge. User X has low mobility entropy because she distributes the trips on a few large preferred edges. User Y has high mobility entropy because she distributes the trips across many equal-sized edges.

4.3 Measure computation

We implement step (b) in Figure 1 by computing the four behavioral measures for each individual on the filtered CDR data. Due to the size of the dataset, we use the MapReduce paradigm implemented by Hadoop to distribute the computation across a cluster of coordinated nodes and reduce the time of computation. We find no relationship between the mobility and the social measures at individual level: the correlation between SV and MV, as well as the correlation between SD and MD, are close to zero. This suggests that the mobility measures and the sociality measures capture different aspects of individual behavior.

We apply step (c) in Figure 1 by aggregating the individual measures at the municipality level through a two-step process: (i) we assign to each user a home location, i.e.
the phone tower where the user performs the highest number of calls during nighttime (from 10 pm to 7 am) [44]; (ii) based on these home locations, we assign each user to the corresponding municipality with standard Geographic Information Systems techniques. Figure 6 shows the spatial distribution of Orange users in French municipalities. We aggregate SV, SD, MV and MD at the municipality level by taking the mean values across the population of users assigned to that municipality. We obtain 5,100 municipalities, each with its four associated aggregated measures.

5 Correlation Analysis

Here we realize step (d) in Figure 1 and study the interplay between human mobility, social interactions and socio-economic development at the municipality level. First, in Section 5.1 we introduce the external socio-economic indicators and investigate their correlation with the behavioral measures aggregated at the municipality level. Then in Section 5.2 we compare the results with two null models to reject the hypothesis that the correlations appear by chance.

5.1 Human Behavior versus Socio-Economic Development

As external socio-economic indicators, we use a dataset provided by the French National Institute of Statistics and Economic Studies (INSEE) about socio-economic indicators for all the French municipalities with more than 1,000 official residents. We collect data on population density (PD), per capita income (PCI), and a deprivation index (DI) constructed by selecting fundamental needs associated both with objective and subjective poverty [45]. The deprivation index is constructed by selecting among variables reflecting individual experience of deprivation: the different variables are combined into a single score by a linear combination with specific choices for the coefficients (see Appendix). The deprivation index is therefore a composite index: the higher its value, the lower the well-being of the municipality. Preliminary validation showed a high association between the French deprivation index and both income values and education level in French municipalities, partly supporting its ability to measure socio-economic development [45].

Fig. 6 The spatial distribution of users over French municipalities with more than 1,000 official residents. Each user is assigned to a municipality according to the geographic position of her home location. The color of municipalities, in a gradient from blue to red, indicates the number of Orange users assigned to that municipality. We observe that the number of users in the municipalities varies according to the density of the municipality.

We investigate the correlations between the aggregated measures and the external socio-economic indicators, finding two main results. First, the social volume is not correlated with the two socio-economic indicators (Figure 7(c) and (d)), while mobility volume is correlated with per capita income (Figure 7(b)). Second, we find that mobility diversity is a better predictor of socio-economic development than social diversity. Figure 7(e)-(h) shows the relations between the diversity measures and the socio-economic indicators. For mobility diversity clear tendencies appear: as the mean mobility diversity of municipalities increases, deprivation index decreases, while per capita income increases (Figure 7(e) and (f)). Social diversity, in contrast, exhibits a weaker correlation with the deprivation index than mobility diversity and no correlation with per capita income (Figure 7(g) and (h)).

Fig. 7 The relation between the aggregated measures and the socio-economic indicators: (a) mobility volume vs deprivation index; (b) mobility volume vs per capita income; (c) social volume vs deprivation index; (d) social volume vs per capita income; (e) mobility diversity vs deprivation index; (f) mobility diversity vs per capita income; (g) social diversity vs deprivation index; (h) social diversity vs per capita income. The color of a point indicates, in a gradient from blue to red, the density of points around it. We split the municipalities into ten equal-sized groups according to the deciles of the measures on the x axis. For each group, we compute the mean and the standard deviation of the measures on the y axis and plot them through the black error bars. $\rho$ indicates the Pearson correlation coefficient between the two measures. In all the cases the p-value of the correlations is < 0.001.

Figure 8 provides another way to observe the relations between the diversity measures and socio-economic development. We split the municipalities into ten groups according to the deciles of deprivation index. For each decile we compute the distributions of mean mobility diversity and mean social diversity across the municipalities in that decile. For mobility diversity, as the decile of deprivation index increases, the mean decreases and the variance increases, highlighting a change of the distribution across the groups. This is consistent with the observation made in the plot of Figure 7(e). Conversely, for the distribution of social diversity we do not observe a significant change in the mean and the variance.
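The decile-based summaries behind Figures 7 and 8 can be illustrated with a minimal sketch. It is not the code used in the study: the function names `pearson` and `decile_profile` are hypothetical, the data are synthetic, and we assume that "equal-sized groups" means cutting the municipalities, sorted by the x measure, into ten nearly equal slices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def decile_profile(xs, ys, n_groups=10):
    """Sort the points by x, cut them into n_groups nearly equal-sized groups
    and return (mean of y, standard deviation of y) for each group.
    Assumes at least n_groups points, so that no group is empty."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    profile = []
    for g in range(n_groups):
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        group = [y for _, y in pairs[start:stop]]
        mean = sum(group) / len(group)
        std = math.sqrt(sum((y - mean) ** 2 for y in group) / len(group))
        profile.append((mean, std))
    return profile


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic municipalities (assumption): deprivation falls as diversity rises.
    mobility_diversity = [rng.uniform(0.5, 0.9) for _ in range(1000)]
    deprivation = [4.0 - 3.0 * m + rng.gauss(0, 0.3) for m in mobility_diversity]
    print("rho:", pearson(mobility_diversity, deprivation))
    for mean, std in decile_profile(mobility_diversity, deprivation):
        print(mean, std)
```

Each (mean, standard deviation) pair corresponds to one black error bar in a panel of Figure 7; swapping the roles of the two measures gives the grouping by deprivation decile used for Figure 8.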
The observed variation of the mobility diversity distribution across the deciles is an interesting finding when compared to previous works such as Song et al. [50], which state that mobility predictability is very stable across different subpopulations delineated by personal characteristics like gender or age group. Figures 7 and 8 suggest that the diversity of human mobility aggregated at the municipality level is more strongly associated with the socio-economic indicators than with socio-demographic characteristics. The relation between mobility diversity and deprivation index is stronger and more evenly distributed over the different levels of deprivation index for municipalities.

5.2 Validation against Null Models

In order to test the significance of the correlations observed on the empirical data, we compare our findings with the results produced by two null models.

In null model NM1, we randomly distribute the users over the French municipalities. We first extract uniformly N users from the dataset and assign them to a random municipality with a population of N users.
We then aggregate the individual diversity measures of the users assigned to the same municipality. We repeat the process 100 times and take the mean of the aggregated values of each municipality produced in the 100 experiments.

Fig. 8 The distribution of mobility diversity (a) and social diversity (b) in the deciles of deprivation index. We split the municipalities into ten equal-sized groups computed according to the deciles of deprivation index. For each group, we plot the distributions of mean mobility diversity and mean social diversity. The blue dashed curve represents a fit of the distribution, the red dashed line represents the mean of the distribution.
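The NM1 procedure described above can be sketched as a size-preserving shuffle. This is only one possible reading of the description, not the original implementation: we assume that "randomly distributing the users" amounts to permuting the individual values over the users while keeping the number of users per municipality fixed. The function name `null_model_nm1` and its arguments are hypothetical.

```python
import random
from collections import Counter, defaultdict


def null_model_nm1(user_values, user_municipality, n_runs=100, seed=0):
    """Randomly redistribute users over municipalities, keeping sizes fixed.

    user_values: dict user -> individual measure (e.g. mobility diversity)
    user_municipality: dict user -> municipality identifier
    Returns dict municipality -> mean, over n_runs random reassignments,
    of the municipality-level mean of the measure.
    """
    rng = random.Random(seed)
    users = list(user_values)
    # One "slot" per user; shuffling values over slots preserves populations.
    slots = [user_municipality[u] for u in users]
    values = [user_values[u] for u in users]
    sizes = Counter(slots)
    accumulated = {m: 0.0 for m in sizes}
    for _ in range(n_runs):
        rng.shuffle(values)
        run_sum = defaultdict(float)
        for municipality, value in zip(slots, values):
            run_sum[municipality] += value
        for municipality, size in sizes.items():
            accumulated[municipality] += run_sum[municipality] / size
    return {m: accumulated[m] / n_runs for m in sizes}


if __name__ == "__main__":
    # Toy data (assumption): ten users split over two municipalities.
    values = {f"u{i}": i / 10 for i in range(10)}
    homes = {f"u{i}": "A" if i < 4 else "B" for i in range(10)}
    print(null_model_nm1(values, homes))
```

Because the shuffle breaks any link between where users live and how they move, the aggregated values of all municipalities drift toward the global mean, which is why no correlation with the socio-economic indicators is expected under this null model.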
In null model NM2, we randomly shuffle the values of the socio-economic indicators over the municipalities. We perform this procedure 100 times and take, for each municipality, the mean value of the socio-economic indicators computed over the 100 produced values. In contrast with empirical data, we find no correlation in the null models between the diversity measures and the socio-economic indicators, neither for mobility diversity nor for social diversity (Figure 9). Such a clear difference between the correlations observed over empirical data and the absence of correlations in observations on randomized data allows us to reject the hypothesis that our findings are obtained by chance.

6 Predictive Models

In this section we instantiate step (e) by building and validating both regression models (Section 6.1) and classification models (Section 6.2) to predict the external socio-economic indicators from the aggregated measures.

6.1 Regression Models

To learn more about the relationship between the aggregated measures and the socio-economic indicators we implement two multiple regression models M1 and M2. We use deprivation index as the dependent variable in model M1, per capita income as the dependent variable in model M2, and the four aggregated measures and population density as regressors for both models. We determine the regression line using the least squares method. Model M1 for deprivation index produces a coefficient of determination $R^2 = 0.43$, meaning that the regressors explain 43% of the variation in the deprivation index. Model M2 for per capita income explains 25% of the variation in the per capita income, producing a coefficient of determination $R^2 = 0.25$. Table 3 and Table 4 show the coefficients of the regression equations, the standard error of the coefficients and the p-values of the regressors for model M1 and model M2 respectively.
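To make the least squares fit and the coefficient of determination concrete, the following sketch fits a multiple regression with an intercept by solving the normal equations. It is an illustration on toy data, not the software used for models M1 and M2; the helper names `solve_linear_system` and `fit_ols` are hypothetical, and a production analysis would normally rely on a statistics package.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs stay untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares fit of y on the regressors in rows, with an intercept.

    rows: one tuple of regressor values per observation
    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy observations (assumption): two regressors, y = 1 + 2*x1 - 3*x2.
    rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
    y = [1, 3, -2, 0, 2, -6]
    print(fit_ols(rows, y))
```

In the setting of this section each row would hold the five regressors (PD, MD, SD, MV, SV) of one municipality, and `r_squared` is the share of the variation in the dependent variable explained by the fit.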
For both model M1 and M2 we have verified the absence of multicollinearity between the regressors, and the normality and the homoskedasticity of the regression residuals.

Model M1 (deprivation index), $R^2 = 0.4267$

             coefficients   std. error   p-value
PD                  0.247       0.005    < 2 × 10^-16
MD                 -2.980       0.0575   < 2 × 10^-16
SD                 -2.153       0.2027   < 2 × 10^-16
MV                  0.002       0.0002   5.35 × 10^-16
SV                  0.006       0.0027   0.013
intercept           4.078       0.1281   < 2 × 10^-16

Table 3 The linear regression model M1 for deprivation index. The coefficients column specifies the value of the slope calculated by the regression. The std. error column measures the variability in the estimate of the coefficients. The p-value column reports the p-value of the test that the coefficient is zero.

We quantify the contribution of each regressor to the multiple regression model by computing a relative importance metric [23]. Figure 10 shows the relative importance of the regressors produced by the LMG method [32] for both model M1 and model M2. We observe that mobility diversity gives the highest contribution to the regression, accounting for 54% and 65% of the importance for M1 and M2 respectively, while social diversity provides little contribution (0.7% for M1 and 0.3% for M2). Population density provides an important contribution in both models, while mobility volume is an important variable for model M2 only (20% of the variance).

Fig. 9 The relation between the socio-economic indicators and the diversity measures computed on null model NM1 (a-d) and null model NM2 (e-h). The color of a point indicates, in a gradient from blue to red, the density of points around it. We split the municipalities into ten equal-sized groups according to the deciles of the measures on the x axis. For each group, we compute the mean and the standard deviation of the measure on the y axis (the black error bars).

Model M2 (per capita income), $R^2 = 0.25$

             coefficients   std. error   p-value
PD                 781.94       74.84    < 2 × 10^-16
MD              22,773.47      729.05    < 2 × 10^-16
SD              18,451.79    2,569.05    7.82 × 10^-13
MV                 63.116        3.64    < 2 × 10^-16
SV                 191.16       34.62    3.56 × 10^-8
intercept      -18,933.66    1,624.36    < 2 × 10^-16

Table 4 The linear regression model M2 for per capita income. The coefficients column specifies the value of the slope calculated by the regression. The std. error column measures the variability in the estimate of the coefficients. The p-value column reports the p-value of the test that the coefficient is zero.

Fig. 10 The relative importance of the aggregated measures in the multiple regression models M1 (a) and M2 (b). We use the Lindeman, Merenda and Gold (LMG) method to quantify an individual regressor's contribution to the model. We observe that mobility diversity is the most important variable in the model, with a contribution of about 54% and 65% for model M1 and M2 respectively.

To validate the models we implement a cross validation procedure by performing 1,000 experiments. In each experiment we divide the dataset of municipalities into a training set (60%) and a test set (40%), compute model M1 and model M2 on the training set, and apply the obtained models to the test set. We evaluate the performance of the models on the test set using the root mean square error $RMSE = \sqrt{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2 / n}$, where $\hat{y}_i$ is the value predicted by the model and $y_i$ the actual value in the test set, and by computing the CV(RMSE), i.e. the RMSE normalized to the mean of the observed values. Figure 11 shows the variation of $R^2$ and CV(RMSE) across the 1,000 experiments.
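The validation loop can be sketched as a repeated random 60/40 split. For brevity the example fits a univariate least squares line instead of the five-regressor models M1 and M2; this simplification, the function names (`fit_line`, `rmse`, `cv_rmse`, `repeated_holdout`) and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def fit_line(xs, ys):
    """Univariate least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(predicted, actual):
    """Root mean square error between predictions and observed values."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def cv_rmse(predicted, actual):
    """RMSE normalized to the mean of the observed values."""
    return rmse(predicted, actual) / (sum(actual) / len(actual))


def repeated_holdout(xs, ys, n_experiments=1000, train_fraction=0.6, seed=0):
    """Return the CV(RMSE) on the test set for each random train/test split."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = round(train_fraction * len(xs))
    scores = []
    for _ in range(n_experiments):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predicted = [intercept + slope * xs[i] for i in test]
        scores.append(cv_rmse(predicted, [ys[i] for i in test]))
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data (assumption): a noisy linear relation.
    xs = [rng.uniform(0.5, 0.9) for _ in range(500)]
    ys = [4.0 - 3.0 * x + rng.gauss(0, 0.2) for x in xs]
    scores = repeated_holdout(xs, ys, n_experiments=100)
    print(min(scores), max(scores))
```

A narrow spread of the returned scores is what "stable across the experiments" means in practice: the error does not depend on which municipalities happen to fall in the training set.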
We observe that the prediction error of the models is stable across the experiments (Figure 11(a) and (c)), and that the error in the prediction is lower for model M1 (deprivation index). Finally, we compare the actual values of the socio-economic indicators and the values predicted by the models by computing the relative error, i.e. for each municipality $i$ we compute $(\hat{y}_i - y_i)/y_i$. We observe that the mean relative error computed across the municipalities is close to zero for both model M1 and model M2 (Figure 12).

Fig. 11 Validation of regression models. We perform 1,000 experiments, learning the model on a training set (60%) and evaluating it on a test set (40%). (a) The distribution of the adjusted coefficient of determination $R^2$ across the experiments for model M1. (b) The distribution of the root mean square error (RMSE) across the experiments for model M1. (c) The distribution of the adjusted $R^2$ across the experiments for model M2. (d) The distribution of RMSE for model M2.

Fig. 12 The distribution of the relative error $(\hat{y}_i - y_i)/y_i$ across the French municipalities for regression models M1 (a) and M2 (b).

6.2 Classification Models

Here, instead of predicting the value of deprivation or per capita income of municipalities, we want to classify the level of socio-economic development of municipalities. To this purpose we build two supervised classifiers C1 and C2 that assign each municipality to one of three possible categories: low level, medium level or high level of deprivation index (classifier C1) or per capita income (classifier C2). To transform the two continuous measures deprivation index and per capita income into discrete variables we partition the range of values at the terciles of the distribution (the 33rd and 66th percentiles). This produces, for each variable to predict, three equal-populated classes.
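The discretization step can be sketched with a rank-based split into three classes. The function name `tercile_labels` is hypothetical, and assigning classes by rank (so that ties are broken by sort order) is an assumption that guarantees equal-populated classes; the original study may have cut on the percentile values directly.

```python
def tercile_labels(values, labels=("low", "medium", "high")):
    """Assign each value to one of three (nearly) equal-populated classes.

    The values are ranked in increasing order; the lowest third receives
    labels[0], the middle third labels[1] and the highest third labels[2].
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    classes = [None] * n
    for rank, index in enumerate(order):
        classes[index] = labels[rank * 3 // n]
    return classes


if __name__ == "__main__":
    # Toy deprivation index values (assumption) for six municipalities.
    deprivation = [1.9, 0.6, 1.1, 1.4, 0.8, 2.3]
    print(tercile_labels(deprivation))
```

The resulting labels are the targets that a classifier such as C1 or C2 is trained to predict from the aggregated measures.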
We perform the classification using Random Forest classifiers on a training set (60% of the dataset) and validate the results on a test set (40% of the dataset). Classifier C1 for deprivation index reaches an overall accuracy of 0.61, while the overall accuracy of classifier C2 for per capita income is 0.54, against a random case accuracy of 0.33. Table 5 shows precision, recall and overall accuracy reached by classifier C1 and classifier C2 on the three classes of socio-economic development. We also evaluate the importance of every aggregated measure in classifying the level of socio-economic development of municipalities, using the Mean Decrease Gini measure. Similarly to the relative importance metrics for the regression models, in both classifier C1 and classifier C2 mobility diversity has the highest importance, followed by population density (Figure 13).

Fig. 13 The mean decrease in Gini coefficient of the variables used to learn the classifiers, for deprivation index (a) and per capita income (b). The mean decrease in Gini coefficient is a measure of how each variable contributes to the homogeneity of nodes and leaves in the resulting random forest.

Model C1, accuracy = 0.61

                      recall   precision
low deprivation       0.6230   0.6657
medium deprivation    0.4970   0.4918
high deprivation      0.7089   0.6721

Model C2, accuracy = 0.54

                      recall   precision
low income            0.6098   0.5700
medium income         0.3590   0.3993
high income           0.6552   0.6376

Table 5 Statistics by class for classifier C1 (deprivation index) and classifier C2 (per capita income). The recall is the number of municipalities for which the classifier predicts the correct class divided by the number of municipalities in that class. The precision is the number of municipalities for which the classifier predicts the correct class divided by the number of municipalities the classifier predicts to be in that class. We observe that the classes 'low' and 'high' are the best predicted classes.

7 Discussion of Results

The implementation of the analytical framework on mobile phone data produces three remarkable results.

First, the usage of the measures of mobility and social behavior together with the standard and commonly available socio-demographic information actually adds predictive power with respect to the external socio-economic indicators. Indeed, while a univariate regression that predicts deprivation index from population density is able to explain only 11% of the variance, by adding the four behavioral measures extracted from mobile phone data we can explain 43% of the variance (see Table 3). This outcome suggests that mobile phone data are able to provide precise and realistic measurements of the behavior of individuals in their complex social environment, which can be used within a knowledge infrastructure like our analytical framework to monitor socio-economic development.

Second, the diversification of human movements is the most important aspect for explaining the socio-economic status of a given territory, far more important than the diversification of social interactions and demographic features like population density. This result, which is evident from both the correlation analysis and the contribution of mobility diversity in the models (Figures 7, 10 and 13), is also important for practical reasons. Mobile phone providers do not generally release, for privacy reasons, information about the call interactions between users, i.e. the social dimension. Our result shows that this is a marginal problem, since the social dimension has a lower impact on the quality of the models than the mobility dimension (Figures 10 and 13).
Hence, the implementation of our analytical framework guarantees reliable results even when, as often occurs because of privacy and proprietary reasons, the social dimension is not available in the data.

The interpretation of the observed relation between mobility diversity and socio-economic indicators is, without a doubt, two-directional. It might be that a well-developed territory provides a wide range of activities, an advanced network of public transportation, a higher availability and diversification of jobs, and other elements that foster mobility diversity. It might equally be that a higher mobility diversification of individuals leads to a higher socio-economic development, as it could nourish the economy, establish economic opportunities and facilitate flows of people and goods. In any case this information is useful for policy makers, because a change in the diversification of individual movements is linked to a change in the socio-economic status of a territory.

The third remarkable result is that our regression and classification models exhibit good performance when used to predict the socio-economic development of other municipalities, whose data were not used in the learning process (Figure 11 and Table 5). This result is evident from the cross validation procedure: the accuracy and the prediction errors of the models do not depend on the training and test set selected. The models hence give a real possibility to continuously monitor the socio-economic development of territories and provide policy makers with an important tool for decision making.

8 Conclusions and Future Works

In this paper we design an analytical framework that uses mobile phone data to extract meaningful measures of human behavior and estimate indicators for socio-economic development.
We apply the analytical framework to a nationwide mobile phone dataset covering several weeks and find that the diversification of human movements is the best proxy for indicators of socio-economic development. We know that bio-diversity is crucial to the health of natural ecosystems, that the diversity of opinion in a crowd is essential to answer difficult questions [20] and that the diversity of social contacts is associated with socio-economic indicators of well-being [15]. The story narrated in this paper suggests that diversity is a relevant concept also in mobility ecosystems: the diversity of human mobility may be a reliable indicator of the variety of human activities, and a mirror of some aspects of socio-economic development and well-being.

We are aware that the computation of individual measures on CDR data (steps (a) and (b) in Figure 1) presents privacy issues. An important next step will be to incorporate a privacy-by-design approach. We intend to use a method to assess the privacy risk of users in order to detect risk cases where the privacy of users is violated, and to apply privacy enhancing techniques for data anonymization [35].

In our experiments we compare the measures of mobility and sociality with two external socio-economic indicators: per capita income and deprivation index. Per capita income is a simple indicator reporting the mean income of individuals resident in a given municipality, without any information about the distribution of wealth and inequality. In contrast, deprivation index is a composite indicator obtained as a linear combination of several different variables regarding economic and ecological aspects (see Appendix).
It would be interesting, as future work, to investigate the relation between the behavioral measures and socio-economic development in a multidimensional perspective, using the single variables composing the deprivation index to understand which aspects of socio-economic development best correlate with the measures of human behavior. This multidimensional approach is fostered by recent academic research and a number of concrete initiatives developed around the world [25, 1], which state that the measurement of well-being should be based on many different aspects besides the material living standards (income, consumption and wealth): health, education, personal activities, governance, social relationships, environment, and security. All these dimensions shape people's well-being, and yet many of them are missed by conventional income measures. Official statistics institutions are incorporating questions to capture people's life evaluations, hedonic experiences and priorities in their own surveys (see for example the Italian BES project developed by the Italian National Statistics Bureau [1]). When these measures become available, they will allow us to refine our study on the relation between measures extracted from Big Data and the socio-economic development of territories.

In the meantime, experiences like ours may contribute to shaping the discussion on how to measure some of the aspects of socio-economic development with Big Data, such as mobile phone call records, that are massively available everywhere on earth. If we learn how to use such a resource, we have the potential of creating a digital nervous system in support of a generalized and sustainable development of our societies.
This is crucial because the decisions policy makers (and we as individual citizens) make depend on what we measure, how good our measurements are and how well our measures are understood.

Appendix

As described in [45], the value of deprivation index for French municipalities is calculated in the following way:

$$\begin{aligned}
\text{deprivation} = {} & 0.11 \times \text{Overcrowding} \\
& + 0.34 \times \text{No access to electric heating} \\
& + 0.55 \times \text{Non-owner} \\
& + 0.47 \times \text{Unemployment} \\
& + 0.23 \times \text{Foreign nationality} \\
& + 0.52 \times \text{No access to a car} \\
& + 0.37 \times \text{Unskilled worker-farm worker} \\
& + 0.45 \times \text{Household with 6+ persons} \\
& + 0.19 \times \text{Low level of education} \\
& + 0.41 \times \text{Single-parent household}.
\end{aligned}$$

[Figure 14: (a) P(deprivation index) versus deprivation index; (b) P(per capita income) versus per capita income (log10).]

Fig. 14 Distribution of deprivation index (a) and per capita income (b) across French municipalities.

Acknowledgements The authors would like to thank Orange for providing the CDR data, and Giovanni Lima and Pierpaolo Paolini for the contribution developed during their master theses. We are grateful to Carole Pornet and colleagues for providing the socio-economic indicators and for computing the deprivation index for the French municipalities. This work has been partially funded by the following European projects: Cimplex (grant agreement 641191), PETRA (grant agreement 609042), SoBigData RI (grant agreement 654024).

References

1. Bes: il benessere equo e sostenibile in Italia. Technical report, ISTAT, 2014.
2. A world that counts: mobilizing the data revolution for sustainable development. Technical report, United Nations, 2014.
3. Indicators and a monitoring framework for the sustainable development goals. Technical report, United Nations, 2015.
4. A. Amini, K. Kung, C. Kang, S.
Sobolevsky, and C. Ratti. The impact of social segregation on human mobility in developing and urbanized regions. EPJ Data Science, 3, 2014.
5. L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four degrees of separation. In Proceedings of the 4th Annual ACM Web Science Conference, WebSci '12, pages 33–42, New York, NY, USA, 2012. ACM.
6. A.-L. Barabasi. Linked: The new science of networks. Perseus Publishing, 2002.
7. V. D. Blondel, A. Decuyper, and G. Krings. A survey of results on mobile phone datasets analysis, 2015.
8. J. Blumenstock. Calling for better measurement: Estimating an individual's wealth and well-being. In ACM KDD (Data Mining for Social Good), 2014.
9. J. Brea, J. Burroni, M. Minnoni, and C. Sarraute. Harnessing mobile phone social network topology to infer users demographic attributes. In Proceedings of the 8th Workshop on Social Network Mining and Analysis, SNAKDD'14. ACM, 2014.
10. E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11, pages 1082–1090. ACM, 2011.
11. P. Cintia, L. Pappalardo, D. Pedreschi, F. Giannotti, and M. Malvaldi. The harsh rule of the goals: data-driven performance indicators for football teams. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA'15. IEEE, 2015.
12. P. J. H. Daas, M. J. Puts, and B. Buelens. Big data and official statistics. In The 2013 New Techniques and Technologies for Statistics conference, 2013.
13. A. Decuyper, A. Rutherford, A. Wadhwa, J. Bauer, G. Krings, T. Gutierrez, V. D. Blondel, and M. A. Luengo-Oroz. Estimating food consumption and poverty indices with mobile phone data. CoRR, abs/1412.2595, 2014.
14. P. Deville, C. Linard, S. Martin, M. Gilbert, F. R. Stevens, A. E.
Gaughan, V. D. Blondel, and A. J. Tatem. Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences (PNAS), 111(45):15888–15893, 2014.
15. N. Eagle, M. Macy, and R. Claxton. Network diversity and economic development. Science, 328(5981):1029–1031, May 2010.
16. N. Eagle and A. S. Pentland. Eigenbehaviors: identifying structure in routine. Behavioral Ecology and Sociobiology, 63(7):1057–1066, 2009.
17. S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.
18. V. Frias-Martinez, V. Soto, J. Virseda, and E. Frias-Martinez. Can cell phone traces measure social development? In Third Conference on the Analysis of Mobile Phone Datasets, NetMob, 2013.
19. B. Furletti, L. Gabrielli, F. Giannotti, L. Milli, M. Nanni, D. Pedreschi, R. Vivio, and G. Garofalo. Use of mobile phone data to estimate mobility flows. Measuring urban population and inter-city mobility using big data in an integrated approach. In 47th SIS Scientific Meeting of the Italian Statistical Society, Cagliari, June 2014.
20. F. Galton. Vox populi. Nature, 75(7), 1907.
21. F. Giannotti, D. Pedreschi, A. Pentland, P. Lukowicz, D. Kossmann, J. L. Crowley, and D. Helbing. A planetary nervous system for social mining and collective awareness. EPJ Special Topics, 214:49–75, 2012.
22. M. C. González, C. A. Hidalgo, and A.-L. Barabási. Understanding individual human mobility patterns. Nature, 453(7196):779–782, June 2008.
23. U. Groemping. Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software, 17(1):1–27, 2006.
24. T. Gutierrez, G. Krings, and V. D. Blondel. Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets. CoRR, abs/1309.4496, 2013.
25. D. Helbing and S. Balietti. How to create an innovation accelerator. EPJ Special Topics, (195):101–136, 2011.
26.
C. A. Hidalgo and C. Rodriguez-Sickert. The dynamics of a mobile phone network. Physica A: Statistical Mechanics and its Applications, 387(12):3017–3024, 2008.
27. S. Jiang, J. F. Jr, and M. González. Clustering daily patterns of human activities in the city. Data Mining and Knowledge Discovery, 25:478–510, 2012.
28. D. Karamshuk, C. Boldrini, M. Conti, and A. Passarella. Human mobility models for opportunistic networks. IEEE Communications Magazine, 49(12):157–165, 2011.
29. M.-P. Kwan. Gender, the home-work link, and space-time patterns of nonemployment activities. Economic Geography, 75(4):370–394, 1999.
30. J. Leskovec and E. Horvitz. Planetary-scale views on a large instant-messaging network. In WWW, pages 915–924. ACM, 2008.
31. L. Liao, D. J. Patterson, D. Fox, and H. Kautz. Learning and inferring transportation routines. Artif. Intell., 171(5-6):311–331, Apr. 2007.
32. R. Lindeman, P. Merenda, and R. Gold. Introduction to bivariate and multivariate analysis. Scott, Foresman, 1980.
33. L. Lotero, A. Cardillo, R. Hurtado, and J. Gomez-Gardenes. Several multiplexes in the same city: The role of socio economic differences in urban mobility. Available at SSRN 2507816, 2014.
34. S. Marchetti, C. Giusti, M. Pratesi, N. Salvati, F. Giannotti, D. Pedreschi, S. Rinzivillo, L. Pappalardo, and L. Gabrielli. Small area model-based estimators using big data sources. Journal of Official Statistics, 31(2), 2015.
35. A. Monreale, S. Rinzivillo, F. Pratesi, F. Giannotti, and D. Pedreschi. Privacy-by-design in big data analytics and social mining. EPJ Data Science, 2014.
36. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
37. J. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, and A. L. Barabasi. Structure and tie strengths in mobile communication networks. Proc. Natl. Acad. Sci. USA, 104(18):7332–7336, 2007.
38. W.
P an, G. Ghoshal, C. Krumme, M. Cebrian, and A. P entland. Urban c haracteristics attributable to density -driven tie formation. Natur e Communic ations , 4, 2013. 39. L. P appalardo, S. Rinzivillo, Z. Qu, D. Pedresc hi, and F. Giannotti. Understanding the patterns of car trav el. EPJ Spe cial T opics , 215(1):61–73, 2013. 16 Luca Pappalardo * † et al. 40. L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedresc hi, F. Giannotti, and A.-L. Barab´ asi. Returners and explor- ers dichotom y in human mobility . Natur e Communic a- tions , 6(8166), 2015. 41. L. Pappalardo, Z. Smoreda, D. Pedresc hi, and F. Gian- notti. Using big data to study the link b etw een human mobility and so cio-economic developmen t. In Pr oc e e d- ings of the IEEE International Confer enc e on Big Data , 2015. 42. D. Pennacc hioli, M. Coscia, S. Rinzivillo, F. Giannotti, and D. P edreschi. The retail market as a complex system. EPJ Data Science , 3(1):33, 2014. 43. D. Pennacc hioli, M. Coscia, S. Rinzivillo, D. Pedresc hi, and F. Giannotti. Explaining the pro duct range effect in purchase data. In Pr o c e e dings of the IEEE Interna- tional Confer enc e on Big Data , IEEE Big Data 2015, pages 648–656, 2013. 44. S. Phithakkitnuk o on, Z. Smoreda, and P . Olivier. So cio- geograph y of human mobility: A study using longitudinal mobile phone data. PL oS ONE , 7(6):e39253, 06 2012. 45. C. Pornet, C. Delpierre, O. Dejardin, P . Grosclaude, L. Launay , L. Guittet, T. Lang, and G. Launoy . Con- struction of an adaptable europ ean transnational ecolog- ical depriv ation index: the frenc h v ersion. Journal of Epi- demiol Community Health , 66(11):982–9, 2012. 46. S. Rinzivillo, L. Gabrielli, M. Nanni, L. Pappalardo, D. Pedresc hi, and F. Giannotti. The purp ose of motion: Learning activities from individual mobility net w orks. In Pr o c e e dings of the 2014 International Confer enc e on Data Science and A dvanc e d Analytics , DSAA’14, 2014. 47. S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia, D. 
Pe- dreschi, and F. Giannotti. Disco vering the geographi- cal borders of human mobility . K¨ unstliche Intel ligenz , 26(3):253–260, 2012. 48. F. Simini, M. C. Gonz´ alez, A. Maritan, and A.-L. Barab´ asi. A universal mo del for mobilit y and migration patterns. Natur e , 484(7392):96–100, 2012. 49. C. Smith-Clarke, A. Mashhadi, and L. Capra. Po vert y on the cheap: Estimating p ov erty maps using aggregated mobile comm unication netw orks. In Pr o c e e dings of the SIGCHI Confer enc e on Human F actors in Computing Systems , pages 511–520. ACM, 2014. 50. C. Song, T. Koren, P . W ang, and A.-L. Barab´ asi. Mo d- elling the scaling prop erties of human mobility . Nature Physics , 6(10):818–823, Sept. 2010. 51. C. Song, Z. Qu, N. Blumm, and A.-L. Barab´ asi. Limits of predictability in human mobility . Scienc e , 327(5968):1018–1021, 2010. 52. P . Struijs and P . J. H. Daas. Qualit y approaches to big data in official statistics. In Eur op e an c onferenc e on Quality in Official Statistics , 2014. 53. D. W ang, D. Pedresc hi, C. Song, F. Giannotti, and A.-L. Barab´ asi. Human mobility , so cial ties, and link predic- tion. In Pr o c ee dings of the 17th ACM SIGKDD Inter- national Confer enc e on Know le dge Disc overy and Data Mining , KDD ’11, pages 1100–1108, New Y ork, NY, USA, 2011. ACM. 54. X.-Y. Y an, C. Zhao, Y. F an, Z. Di, and W.-X. W ang. Uni- v ersal predictability of mobility patterns in cities. Jour- nal of The Royal So ciety Interfac e , 11(100), 2014.
