Social and Spatial Clustering of People at Humanity's Largest Gathering

Macroscopic behavior of scientific and societal systems results from the aggregation of microscopic behaviors of their constituent elements, but connecting the macroscopic with the microscopic in human behavior has traditionally been difficult. Manif…

Authors: Ian Barnett, Tarun Khanna, Jukka-Pekka Onnela

Ian Barnett 1, Tarun Khanna 2, Jukka-Pekka Onnela 1,*
1 Department of Biostatistics, Harvard University, Boston, MA, USA
2 Harvard Business School, Boston, MA, USA
* onnela@hsph.harvard.edu

Abstract

Macroscopic behavior of scientific and societal systems results from the aggregation of microscopic behaviors of their constituent elements, but connecting the macroscopic with the microscopic in human behavior has traditionally been difficult. Manifestations of homophily, the notion that individuals tend to interact with others who resemble them, have been observed in many small and intermediate size settings. However, whether this behavior translates to truly macroscopic levels, and what its consequences may be, remains unknown. Here, we use call detail records (CDRs) to examine the population dynamics and manifestations of social and spatial homophily at a macroscopic level among the residents of 23 states of India at the Kumbh Mela, a 3-month-long Hindu festival. We estimate that the festival was attended by 61 million people, making it the largest gathering in the history of humanity. While we find strong overall evidence for both types of homophily for residents of different states, participants from low-representation states show considerably stronger propensity for both social and spatial homophily than those from high-representation states. These manifestations of homophily are amplified on crowded days, such as the peak day of the festival, which we estimate was attended by 25 million people. Our findings confirm that homophily, which here likely arises from social influence, permeates all scales of human behavior.
Introduction

When the behavior of each individual in a group depends on their interactions with others around them, the collective behavior of the group as a whole can be surprisingly different from what would be expected by simply extrapolating from that of the individual [1–3]. In particular, people think and behave differently in crowds than in small-scale settings [4–6], and this crowd behavior can occasionally lead to tragic events and even human stampedes [7–9]. Individuals tend to form groups spontaneously and engage in collective decision-making outside of such dramatic events as well, but the nature of this type of herding, and the extent to which it happens, depends on how outnumbered the group is compared to the reference population. For example, friendship networks of adolescents demonstrate greater social homophily if they are in the minority [10], whereas majority members do not share this preference [11]. This phenomenon is in line with the description by Simmel, who argued that individuals "resist being leveled" in a crowd [12]. If, however, the minority group is too small to form an independent community, it is possible for the minority to show heterophily rather than homophily [13]. This finding highlights the importance of the surrounding social context, in particular the relative size of the group. Social homophily can also lead to spatial homophily and thereby give rise to segregation [14, 15].

PLOS 1/15

While the term homophily is used to mean different things, we use it here to refer to the tendency for people who are similar to be associated with one another regardless of the mechanism that causes this association.
This use of the term is distinct from quantifying homophily by the frequency of associations among similar people, since people in the majority will have a greater frequency of associations with others in the majority simply because they have more opportunities to form them [16, 17]. While several studies have investigated homophily of racial groups on smaller scales, we explore how such homophilous tendencies might persist on a much larger, macroscopic scale. The behavior of individuals in a classroom cannot be used to extrapolate to the behavior of those packed into a crowd of millions.

The Kumbh Mela is a religious Hindu festival that has been celebrated for hundreds of years [18], and the 2013 Kumbh Mela, organized in Allahabad, stands out from all others today and throughout history due to its magnitude. As it is infeasible to collect demographic data from millions of participants, we turned to call detail records (CDRs), which have been used to investigate social networks, mobility patterns, and other massive events [19–24]. Cell phone operators routinely maintain records of communication events, mainly phone calls and text messages, for billing and research purposes. These communication metadata, at minimum, keep track of who contacts whom, when, and for how long (voice calls only). Using these CDRs, we first estimate the attendance of each of 23 states of India at the event before investigating the relationship between a state's attendance and the degree of both social homophily and spatial homophily among its attendees.

Methods

Data description

We had access to CDRs (footnote 1) for one Indian operator for the period from January 1 to March 31, 2013. This dataset contains records of 146 million (145,736,764) texts and 245 million (245,252,102) calls, for a total of 391 million (390,988,866) communication events.
Given the logistical impossibility of collecting demographic, linguistic, or cultural attributes of Kumbh participants at scale, we based our investigation of homophily on a marker that acts as a proxy for these covariates, namely cell phone area codes. The area codes correspond to different states (footnote 2) of India, and as a result of India's States Reorganization Act of 1956 these divisions summarize demographic variability along linguistic origin, ethnic agglomeration, and preexisting social bonds and boundaries. While CDRs readily lend themselves to studying social networks and social homophily, to investigate spatial homophily we additionally acquired access to the cell tower IDs at the Kumbh venue. Combined with the latitude and longitude of each of the 207 towers at the site (footnote 3), we were able to infer the caller's location (at the time of phone-based communication) with relatively high spatial resolution. The grid that divides the Kumbh site into regions around each cell tower, called the Voronoi tessellation, groups all points on the map closest to each cell tower. The bird's-eye view of Allahabad in Fig. 1 shows the estimated attendance on one of the busiest and most favorable days for ritual bathing in the Ganges river.

Footnote 1: Only summary statistics from the CDRs were provided to us: social network information and daily customer counts at various cell towers located at the Kumbh. Caller IDs were anonymized, and no individual-level characteristics were provided to us aside from billing area codes and whether a prepaid or postpaid plan was used.

Footnote 2: Though officially India has more than 23 states, we adhere instead to the 23 functional state divisions used by the service provider.

Footnote 3: In anticipation of the large influx of people at the Kumbh, temporary infrastructure was brought into the venue prior to the start of the festival so as to provide sufficient coverage for the large number of expected cell phone users.

Figure 1. Cell phone usage around the cell towers at the Kumbh during its busiest day. The heat map polygons represent the Voronoi tessellation around the cell towers that occupied the site of the Kumbh Mela event in Allahabad, India. Cell towers with no activity are removed from the analysis and their Voronoi cells are merged into those of neighboring active cell towers. Map data used to produce the river traces: Google, DigitalGlobe.

Attendance Estimation

Extrapolating population measures from CDRs has become feasible in recent years due to the rapid increase in the prevalence of cell phones. While CDRs provide raw counts of cell phone users, to estimate attendance these numbers need to be adjusted by (i) the overall prevalence of cell phones in India, (ii) the state-specific market shares of our provider, (iii) the probability of daily use for a person known to be present at the venue, and (iv) the probability of phone non-use during a person's entire stay at the venue.

First, regarding overall phone prevalence, 71.3% of people in India had a wireless subscription in 2013 [28].

Second, regarding market share, the number of unique handsets is counted on a daily basis for each of 23 distinct states of India (Table S1), as defined by the service provider, and each count is extrapolated using the service provider's market share in the given state. The service provider's market share varies widely from state to state (range 13.7% to 42.6%). It is important to use state-specific market share, because if average market share is used instead, the state-specific attendance counts can be off by more than a factor of 2. These handset counts are added together for each day before extrapolating to the general population.
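The first two adjustments can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the dictionary inputs, and the example counts are hypothetical, and the sketch assumes that each adjustment is applied by dividing by the relevant fraction. The daily-use and non-use adjustments described next are deliberately left out.

```python
def extrapolate_daily_attendance(handsets_by_state, market_share_by_state,
                                 phone_prevalence):
    """Scale one day's per-state handset counts to a population estimate.

    Assumption (not from the paper's code): each state's count is divided by
    the provider's market share in that state, the scaled counts are summed,
    and the sum is divided by the national phone prevalence.
    """
    if not 0.0 < phone_prevalence <= 1.0:
        raise ValueError("phone_prevalence must be in (0, 1]")
    subscribers = 0.0
    for state, handsets in handsets_by_state.items():
        share = market_share_by_state[state]
        if not 0.0 < share <= 1.0:
            raise ValueError(f"market share for {state!r} must be in (0, 1]")
        # Estimated phone owners from this state across all providers.
        subscribers += handsets / share
    # Extrapolate from phone owners to the general population.
    return subscribers / phone_prevalence


# Hypothetical counts for two made-up states, "A" and "B", whose market
# shares are set to the two ends of the range reported in the text.
handsets = {"A": 137, "B": 426}
shares = {"A": 0.137, "B": 0.426}
print(extrapolate_daily_attendance(handsets, shares, phone_prevalence=0.713))
```

With these made-up inputs, replacing both shares by their simple mean (0.2815) would scale state A's 137 handsets to about 487 subscribers instead of 1,000. That is an error of slightly more than a factor of 2, in line with the warning above about using an average market share.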
Third, regarding daily use, it is likely that many Kumbh attendees who use their phone at least once do not use their phone every day while at the festival. If not addressed, this would bias our population estimate downwards. By tracking phone activity, length of stay can be estimated based on the time period a person's phone is active while at the Kumbh. Based on this, we estimate the percentage of customers who use their phone on any given day during their stay, conditional on them using their phone at least once during their stay, to be 40.4%. (Note that this quantity applies to daily estimates, not to cumulative estimates. See S1 Text.)

Fourth, regarding non-use, the probability of a person not using his or her phone during the entire stay at the venue is difficult to account for; these individuals are not visible in the observed data, and yet the proportion of non-users could potentially be substantial given that many visitors from outside regions would have to pay roaming fees, which likely leads them to minimize their phone use. To overcome this difficulty, we first examine four available daily population projections [29], each for a different day, and calibrate the proportion of non-users such that our resulting daily estimate for that same day is most consistent with the four daily projections. We obtain an estimate of 40.6% for non-use (coincidentally similar to the 40.4% obtained above for daily use) and we use this estimate to adjust both cumulative and daily attendance.

Social Homophily

A social network is constructed between customers who used their phone at the Kumbh. A network edge is assumed between any two people who communicated with one another at any point over the course of the Kumbh.
To study how a state's extent of social homophily is related to its level of representation, defined as the number of people present from the state divided by the total Kumbh attendance, we select a measure that results in consistent estimates of homophily regardless of state representation. The measure of social homophily considered in refs. [13, 30], applied to our setting, would define homophily for any given state as the proportion of ties that involve two participants from that state, but because it measures absolute differences instead of relative differences, the homophily for states with small representation would be biased downwards due to their small proportions. A standard stochastic block model (SBM) approach [31] applied to our setting would assume an equal likelihood of forming network ties between any two participants from the same state. However, if this model is misspecified and there exists additional social structure within each state (within each block), as is almost certainly the case, then this approach is likely biased in the opposite direction and overestimates the social homophily in states with lower representation (footnote 4). The biases of both these methods are discussed in further detail in S1 Text.

To circumvent these problems, we shift our focus from dyads to same-state connected triples: sets of three nodes from the same state that are connected either by two edges, resulting in an open triple, or by three edges, resulting in a closed triple. The rationale behind this choice is that the three nodes in a connected triple can be assumed to belong to the same social group whether the triple is open or closed. By considering the propensity for same-state connected triples to be closed, we can gain insight into how densely connected the social groups are in which these triples are embedded.
This approach is a way of sampling pairs of nodes from the same social group even when the social groups themselves are unobserved. The proportion of triples that are closed provides a natural measure of social homophily (see Fig. 2). This measure is commonly referred to as the global clustering coefficient or the transitivity index [32], calculated over each state-specific network. Ignoring residents from the local state, whose phone use is likely different from that of all other states (footnote 5), there are 1,630,553 connected triples in the full Kumbh social network.

Figure 2. Schematics of homophily measures (A) and call detail records (B). For homophily measures (A), the three dotted lines represent spatial boundaries for the Voronoi tessellation around the cell towers, separating the shaded region into three Voronoi cells, in two examples (low and high homophily). The solid lines denote which nodes are in communication in the social network, either through voice call or text message. In the context of spatial homophily, two nodes are considered nearby if and only if they both are in the same spatial region (Voronoi cell) on the same day. The Voronoi cells range in size from as small as 1/4 km^2 to as large as 20 km^2. For the call detail records (B), analysis of spatial homophily uses all pairwise communication events involving at least one customer of our operator who is present at the Kumbh, whereas analysis of social homophily only considers the ties between customers of our operator.

Footnote 4: To see the reason for this, consider the case where state A sends only a single group of friends to the Kumbh, whereas state B sends 100 different groups of friends. A random pair selected from state A will have a much higher likelihood of being friends than will a random pair from state B, even if social homophily is equally strong within the friendship groups of the two states.

Footnote 5: When studying social homophily we ignore the attendees from the local state where the Kumbh is held, eastern Uttar Pradesh, because the social behavior of the locals is likely not comparable to that of attendees from the other 22 states. While visitors from other states are all present for the same purpose of participation in the Kumbh, this is not true for the locals, many of whom were employed to help run the Kumbh in various roles. Outsider phone usage will likely be exclusively for coordinating purposes at the event, due to the cost of roaming calls. On the other hand, locals use their phones much more freely and for everyday purposes.

Let $C_{ijk} = 1$ if the $(i, j, k)$ triple is closed and $C_{ijk} = 0$ if it is open, let $R_{ijk}$ be the state of the three nodes in the triple, and let $W_r$ be the proportion of the total cumulative Kumbh population by March 31, 2013, that belongs to state $r = 1, \ldots, 23$. Across the 22 non-local states, $W_r$ ranges from 0.018% to 7.45%, thus varying over 2.5 orders of magnitude. We fit the following regression model over all connected triples:

$$\operatorname{logit}\left(\Pr(C_{ijk} = 1)\right) = \beta_0 + \beta_1 \log_{10} W_{R_{ijk}} \qquad (1)$$

The model requires independence between observations for accurate inference, and because the same individual can be involved in multiple triples, this independence does not hold. The estimate $\hat{\beta}_1$ from (1) is still unbiased, but its standard error and the P-value for the two-sided test of the null hypothesis $\beta_1 = 0$ will not be correct if this dependence is ignored. Taking advantage of the large sample size, for accurate inference we select a random subset of triples in which we do not allow the same individual to appear in more than one triple.
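The triple-based measure and the independent-subset step can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the function names and the toy network are hypothetical, each set of three nodes is counted once (following the "sets of three nodes" wording above, whereas the usual transitivity index counts a triangle three times, once per center node), and the node-disjoint subset is drawn by a simple greedy pass over a random ordering.

```python
import random
from itertools import combinations


def same_state_triples(edges, state_of):
    """Map each same-state connected triple (a frozenset of 3 nodes) to
    True if it is closed (3 edges) or False if it is open (2 edges)."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    triples = {}
    for center, nbrs in neighbors.items():
        # Keep only neighbors from the same state as the center node.
        same = [n for n in nbrs if state_of[n] == state_of[center]]
        for a, b in combinations(same, 2):
            # center is tied to both a and b; the triple is closed if a-b exists.
            triples[frozenset((center, a, b))] = b in neighbors[a]
    return triples


def proportion_closed(triples):
    """Share of connected triples that are closed (None if there are none)."""
    return sum(triples.values()) / len(triples) if triples else None


def independent_subset(triples, seed=0):
    """Greedily pick triples in random order so that no node is reused."""
    order = list(triples)
    random.Random(seed).shuffle(order)
    used, chosen = set(), []
    for triple in order:
        if used.isdisjoint(triple):
            chosen.append(triple)
            used.update(triple)
    return chosen


# Toy network (hypothetical): nodes 1-4 are from state "X", node 7 from "Y".
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 7), (2, 7)]
state_of = {1: "X", 2: "X", 3: "X", 4: "X", 7: "Y"}
triples = same_state_triples(edges, state_of)
print(len(triples), proportion_closed(triples))
```

The greedy pass is one convenient way to obtain a subset in which every individual appears at most once; it does not necessarily give the largest such subset, and the sampling scheme actually used for the analysis may differ.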
Spatial Homophily

Let $n_{crd}$ be the number of customers near cell tower $c$ from state $r$ on day $d$ of the Kumbh, and let $N_{rd} = \sum_{c=1}^{C} n_{crd}$ be the total number of customers from state $r$ at the Kumbh on day $d$, where the sum is taken over all $C$ cell towers. To avoid double-counting, if a person uses multiple cell towers on the same day, only the first cell tower is recorded. The probability that any two given individuals from state $r$ are nearby on day $d$ is:

$$p_{rd} = \frac{1}{N_{rd}} \sum_{c=1}^{C} n_{crd} \, \frac{n_{crd} - 1}{N_{rd} - 1} \qquad (2)$$

Here two people are defined to be "nearby" on a particular day when they are both located in the same Voronoi cell on that day, using the cell tower designation mentioned above. The intuition behind equation 2 is that, given the location of one person, the probability that a different randomly selected person from their state is in the same Voronoi cell is $(n_{crd} - 1)/(N_{rd} - 1)$. The probability in equation 2 has the desirable property of not scaling with state representation $W_r$ if spatial homophily is kept constant (footnote 6). This property is essential if we wish to evaluate the relationship between spatial homophily and state attendance/representation. Finally, let $Q^A_r = \sum_{d=1}^{90} p_{rd} / 90$ be the probability that any two given individuals from state $r$ are nearby, averaged over all 90 days.

To evaluate busy, or high volume, days, we consider the three days with the highest attendance. We grouped each of these three days together with the two days that preceded it and the two days that followed it, leading to a set of 15 days we labeled as high volume days. The remaining 75 days were grouped together to form the set of low volume days.
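Equation 2 and the averaging over days translate directly into code. The following Python sketch is illustrative only: the function names and the tower counts are hypothetical, and it assumes the per-tower counts for one state and one day are already available as a list.

```python
def nearby_probability(counts_by_tower):
    """Equation 2: probability that two distinct customers from the same
    state fall in the same Voronoi cell on a given day.

    counts_by_tower holds n_crd for one state r and one day d, one entry
    per cell tower c.
    """
    total = sum(counts_by_tower)  # N_rd
    if total < 2:
        raise ValueError("need at least two customers to form a pair")
    # (1/N) * sum_c n_c * (n_c - 1) / (N - 1), written as a single fraction.
    same_cell_pairs = sum(n * (n - 1) for n in counts_by_tower)
    return same_cell_pairs / (total * (total - 1))


def average_nearby_probability(daily_counts):
    """Average of p_rd over the supplied days (all 90 days gives Q^A_r)."""
    values = [nearby_probability(counts) for counts in daily_counts]
    return sum(values) / len(values)


# Hypothetical state spread evenly over two towers, at three crowd sizes.
for scale in (1, 10, 100):
    print(scale, nearby_probability([2 * scale, 2 * scale]))
```

Holding the spread over towers fixed while multiplying the counts, the value rises only slightly and levels off near the sum of squared tower shares (0.5 in this made-up case), which is the size-invariance property the text relies on.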
We let $Q^H_r$ be the average of the $p_{rd}$ over the high volume days and $Q^L_r$ be the average of the $p_{rd}$ over the low volume days, and we defined $Q^D_r = Q^H_r / Q^L_r$ to be the ratio of spatial homophily when comparing high volume days to low volume days.

Footnote 6: To see this, suppose that we hold constant how a particular state's attendees are spread out over the cell towers of the Kumbh, i.e. suppose we fix $n_{crd}/N_{rd}$. If we then increase the number of people present at the Kumbh from that state, $p_{rd}$ will stay essentially unchanged with a negligible increase, because $(a \cdot n_{crd} - 1)/(a \cdot N_{rd} - 1) > (n_{crd} - 1)/(N_{rd} - 1)$ for any $a > 1$.

Results

Attendance Estimation

Since the extent of homophily for any given group can depend on the relative size of that group compared to others, we first estimate daily and cumulative attendance for participants from each state, which can then simply be added up to obtain overall attendance estimates. Existing estimates of the Kumbh's attendance vary widely, and most are obtained with heavy extrapolation based on rough head counts combined with the rate of flow at high traffic points leading to the Kumbh venue [25]. These estimates have the limitation that they only look at the primary entrances into the Kumbh and ignore traffic flow from secondary entrances. And while daily estimates can be inferred from traffic flow or satellite images, cumulative attendance is more difficult to obtain, because a satellite image cannot tell if the same people are present for many weeks, or if people stay only a short time before leaving to be replaced by newcomers.

Our estimates for the total daily and cumulative attendance are shown in Fig. 3. They clearly show a spike of attendance on each of the Kumbh's three primary bathing days. These days hold special religious significance and bathing on these days is seen to be particularly auspicious.
Based on the above numbers, we estimate the peak daily attendance of the 2013 Kumbh on February 10th to be 25 million, and the total cumulative attendance from January 1 to March 31 to be 60.6 million, which suggests that the event was the largest recorded gathering in humanity's history. A sensitivity analysis in Fig. 3 shows the cumulative attendance if the percent of customers that are non-users is varied from the estimated 40.6%. For example, if the percent of customers that are non-users is 45%, then the cumulative attendance sinks to 54 million, whereas if the percent of customers that are non-users is 35%, then the cumulative attendance rises to 69 million.

Figure 3. Estimates for daily and cumulative attendance at the Kumbh. The cumulative (A) and daily (B) attendance at the Kumbh is estimated from January 1st, 2013, to March 30th, 2013. Daily estimates are the number of unique handsets used, extrapolated by (i) the national prevalence of cell phones, (ii) the state-specific market share of the service provider, (iii) the likelihood of inactivity on a daily basis, and (iv) the proportion of individuals who never use their phone (non-users). Cumulative estimates are extrapolated only by (i), (ii), and (iv), which accounts for the apparent difference between daily and cumulative counts on January 1st. The sensitivity of total cumulative attendance to changes in (iv) shows the importance of accounting for this form of censoring in the data (C). The curve plotted is $f(x) = c/x$, where $c = 24467257$.

Social Homophily

We investigate social homophily among the residents of the 23 states, using state-specific attendance estimates, by constructing a social network of Kumbh attendees. The network nodes correspond to people and edges correspond to one or more pairwise communication events between people.
Note that only communication events involving the service provider's customers present at the Kumbh venue are observed (see Fig. 2), and both parties must be customers of the provider to be included in the network so that their state of residence can be ascertained. The resulting network contains 2,130,463 nodes and 8,204,602 ties. The network is constructed over the full three-month period using both text and call information combined, because the network would otherwise become too sparse if segmented.

When there is strong social homophily in a state, the connected triples in the social network among attendees from that state will have an increased likelihood of being closed. After fitting model (1), we find that there is a strong negative association between social homophily and state representation. The model fit has an estimate of $\hat{\beta}_1 = -0.208$, 95% CI $(-0.259, -0.157)$, implying that a ten-fold increase in $W_r$ multiplies the odds of a triple being closed by $e^{-0.208} \approx 0.81$, i.e. a decrease of about 19% in the odds. The analysis restricted to a subset of independent triples yields a P-value less than $10^{-20}$, and this significance remains robust to the subset selected. This analysis reduces sample size and sacrifices some statistical power by looking only at a subset of independent triples in order to allow for accurate statistical inference. Even then, the P-value remains highly significant, providing strong evidence that minority states at the Kumbh tend to show significantly greater social homophily than well-represented states.

Spatial Homophily

Does the finding that heavily outnumbered states are more tightly knit in their social networks apply to spatial homophily as well? We use our knowledge of which cell tower is used by a caller to approximate caller location.
Let $Q^A_r$ be the probability that any two given individuals from state $r$ are physically nearby, averaged over all 90 days of the Kumbh. The $Q^A_r$ and their confidence intervals are illustrated in Fig. 4, with $Q^A_r$ ranging between 0.0025 and 0.018, reflecting over a 7-fold difference in the propensity for spatial homophily across states, with a mean value of 0.013. States with low representation tend to be more spatially homophilous than states with high representation. In contrast, the local people from eastern Uttar Pradesh, where the Kumbh Mela takes place, alone make up a majority at the Kumbh, and they show significantly less spatial homophily. Overall, there is a strong negative correlation (Pearson's $\rho = -0.54$) between spatial homophily ($Q^A_r$) and average logarithmic daily representation at the Kumbh.

Figure 4. The spatial homophily and representation of the 23 mainland states of India at the Kumbh. The point estimates and 95% confidence intervals for $Q^A_r$, the probability that any two given customers from state $r$ are physically close to one another (A), and $Q^D_r$, the relative increase of state $r$'s spatial homophily on busy days compared to normal days (B), both demonstrate an inverse relationship with state representation. The states have been ranked first by representation at the Kumbh (C) and then by degree of spatial homophily (D) (see S1 Text for the list of state names). The heat map colors correspond to the rankings. The yellow star is the city of Allahabad, the location of the 2013 Kumbh Mela. The near inversion of colors when comparing the two panels demonstrates a clear negative association between state representation and spatial homophily.
The average spatial homophily $Q^A_r$ above was computed over the full three-month period, but it is conceivable that spatial homophily is a dynamic characteristic that varies from day to day, reflecting the changing compositions of different social groups. We conjectured that the extent of spatial homophily might be different on the three primary bathing days of February 10, February 15, and March 10 as compared to the other, less crowded days. To test this, we define $Q^D_r$ to be the ratio of spatial homophily on crowded, high volume days relative to spatial homophily on lower attendance days for state $r$. Fig. 4 shows that states with low representation tend to have a greater increase in spatial homophily on the high volume days. Participants from these underrepresented states appear particularly sensitive to increases in crowds, and they seem to group together more closely as the crowds build up. Some of the states with high representation are more robust to changes in the crowd size. In fact, there were seven states that had the opposite effect (though these effects were quite mild in comparison). There is a gap between the top four most represented states at the Kumbh (Uttar Pradesh East, Madhya Pradesh, Bihar, and Delhi) and the remaining states. These four well-represented states all showed less spatial homophily on the busier days. Overall there is a moderate negative correlation (Pearson's $\rho = -0.27$) between $Q^D_r$ and average logarithmic daily representation at the Kumbh.

Discussion

We used CDRs to estimate daily and cumulative attendance at the 2013 Kumbh Mela, which, according to our analyses, represents the largest gathering of people in recorded history. While participants from all states demonstrated social and spatial homophily, these phenomena were stronger for the states with low representation at the event and were further amplified on especially crowded days.
Given that a person may not use their phone immediately upon arriving or before leaving the Kumbh, it is likely that the duration of stay as estimated by their phone usage is truncated. To account for this censoring, a model for daily phone usage is required that can estimate the amount of censoring. We chose a simple model that assumed that each person had some independent probability of using their phone on each day. While this model is intuitive and provides suitable estimates for the amount of censoring, it may be the case that phone usage is captured better by a more complicated and involved model.

Though we consider the proportion of connected triples that are closed in the Kumbh social network as a way of measuring the homophilous tendencies of attendees from each state, we draw a distinction between this measure and what is more commonly known as triadic closure. In the social network context, triadic closure is the mechanism by which connections are formed through a mutual acquaintance. However, since we do not observe when the original network ties are formed, we cannot comment on triadic closure [26] as a causal mechanism for tie formation. Our observations avoid a causal connotation and focus instead on observed associative measures.

Our finding on spatial homophily is compatible with the phenomenon of "associative homophily," which states that at a social gathering a person is more likely to join or continue engagement with a group as long as that group contains at least one other person who is similar to her [27]. Because every group is likely to have at least one person from the majority, associative homophily plays a relatively weak role for someone in the majority, as she will be comfortable in almost every group.
On the other hand, a person in the minority may have to actively find a group that contains another person similar to him, inflating the minority group's apparent homophily. This framework offers one possible explanation for the tighter cohesion of the states at the Kumbh with low representation. In conclusion, whether at the individual, group, or state level, it appears that no one likes to be outnumbered. We all seek safety in numbers.

Supporting Information

S1 Text. Supplementary Text. Extended discussion of how some measures of homophily can be susceptible to confounding with the size of the subgroup. The names and corresponding market shares of the 23 mainland states of India are listed. Some intuition for how censoring takes effect is also included.

Figure S1. Stochastic block model edge probabilities by state. The $p_{kk}$ represent the probability that any two random nodes in state $k$ share an edge, assuming this probability is the same for all pairs of nodes in state $k$. The strong association between this probability and state representation is heavily biased under model misspecification, as is more likely the case here, exaggerating the result. The baseline probability is calculated assuming no block structure, i.e. all nodes have the same probability of being connected to one another regardless of state membership.

Figure S2. Simple illustration of the bias produced by the stochastic block model under model misspecification. Social groups are displayed in blue, and are assumed to all be of equal size. The probability that two people in the same social group share an edge is 0.20. The probability that two people in different social groups share an edge is 0.04. States A and B are constructed to have identical homophily, i.e. the probability of an edge between two people in the same social group is the same for both states.
The average edge probability displayed takes the average over all possible pairs of nodes in the state.

Figure S3. Schematic for estimation of the probability of phone usage on any given day. Each square represents a different day, and it is assumed that a person arrives at and departs from the Kumbh only once. The estimated proportion of days a phone is used is calculated as the total number of days a phone is used summed across all customers, divided by the length of stay summed across all customers.

Table S1. State acronyms and operator market share. The acronyms for the twenty-three telecommunications states in India used by the operator are listed. In addition, the market share of the operator is listed, measured as the percentage of the total number of people in the state with some form of subscription to a phone plan, taken from the month of January 2013.

S2 Information Supplementary Information. Network data taken over the full duration of the Kumbh Mela. Daily handset count data, stratified by state.

Acknowledgments

IB and JPO are supported by Harvard T.H. Chan School of Public Health Career Incubator Award to JPO. TK is supported by the HBS Division of Research. The authors declare no conflict of interests. The authors would like to thank Gautam Ahuja, Clare Evans, Gokul Madhavan, Daniel Malter, and Peter Sloot for contributing their helpful comments, suggestions, critiques and discussion, and would like to thank the operator for providing access to their data. A special thanks to the operator for both providing us access to their data as well as accommodating us on their campus grounds as we worked on the analysis. In particular, we wish to express our thanks to employees Rohit Dev and Vikas Singhal for their assistance.

References

1. Schelling, T. C. Micromotives and macrobehavior (WW Norton & Company, 2006).
2. Strandburg-Peshkin, A., Farine, D. R., Couzin, I. D. & Crofoot, M. C.
Shared decision-making drives collective movement in wild baboons. Science 348, 1358–1361 (2015).
3. Helbing, D., Farkas, I. & Vicsek, T. Simulating dynamical features of escape panic. Nature 407, 487–490 (2000).
4. Le Bon, G. The crowd: A study of the popular mind (Fischer, 1897).
5. Park, R. E. The crowd and the public. The University of Chicago Press (1972).
6. Blumer, H. Elementary collective behavior. New Outlines of the Principles of Sociology 170–177 (1946).
7. Ngai, K. M., Burkle, F. M., Hsu, A. & Hsu, E. B. Human stampedes: a systematic review of historical and peer-reviewed sources. Disaster medicine and public health preparedness 3, 191–195 (2009).
8. Greenough, P. G. et al. The kumbh mela stampede: disaster preparedness must bridge jurisdictions. BMJ 346 (2013).
9. Maclean, K. K. Power and Pilgrimage: The Kumbh Mela in Allahabad, 1765-1954. Ph.D. thesis, La Trobe University (2003).
10. González, M. C., Herrmann, H. J., Kertész, J. & Vicsek, T. Community structure and ethnic preferences in school friendship networks. Physica A: Statistical mechanics and its applications 379, 307–316 (2007).
11. Vermeij, L., Van Duijn, M. A. & Baerveldt, C. Ethnic segregation in context: Social discrimination among native dutch pupils and their ethnic minority classmates. Social Networks 31, 230–239 (2009).
12. Simmel, G. The metropolis and mental life (Pp. 409-24 in The Sociology of Georg Simmel, edited and translated by Kurt H. Wolff. New York: Free Press, 1964, 1903).
13. Currarini, S., Jackson, M. O. & Pin, P. An economic model of friendship: Homophily, minorities, and segregation. Econometrica 77, 1003–1045 (2009).
14. Schelling, T. C. Dynamic models of segregation. Journal of mathematical sociology 1, 143–186 (1971).
15. Hatna, E. & Benenson, I. Combining segregation and integration: Schelling model dynamics for heterogeneous population. arXiv preprint arXiv:1406.5215 (2014).
16.
McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a feather: Homophily in social networks. Annual review of sociology 415–444 (2001).
17. Hallinan, M. T. & Smith, S. S. The effects of classroom racial composition on students’ interracial friendliness. Social Psychology Quarterly 3–16 (1985).
18. Mehrotra, R. & Vera, F. Kumbh Mela: Mapping the Ephemeral Megacity (Hatje Cantz, Harvard South Asia Institute, 2015).
19. Blondel, V., Decuyper, A. & Krings, G. A survey of results on mobile phone datasets analysis. EPJ Data Science 4 (2015).
20. Onnela, J.-P. et al. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences 104, 7332–7336 (2007).
21. Onnela, J.-P. et al. Analysis of a large-scale weighted network of one-to-one human communication. New Journal of Physics 9, 179 (2007).
22. Gonzalez, M. C., Hidalgo, C. A. & Barabasi, A.-L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).
23. Wesolowski, A. et al. Quantifying the impact of human mobility on malaria. Science 338, 267–270 (2012).
24. Aleissa, F. et al. Wired to connect: Analyzing human communication and information sharing behavior during extreme events. KDD Workshop on Learning about Emergencies from Social Information (2014).
25. Sugden, J. How the kumbh mela crowds are counted. The Wall Street Journal (2013).
26. Simmel, G. & Wolff, K. H. The sociology of georg simmel, vol. 92892 (Simon and Schuster, 1950).
27. Ingram, P. & Morris, M. W. Do people mix at mixers? structure, homophily, and the “life of the party”. Administrative Science Quarterly 52, 558–585 (2007).
28. T. R. A. of India Annual report 2012-13. www.trai.gov.in/WriteReadData/UserFiles/Documents/AnuualReports/TRAI-English-Annual-Report-10032014.pdf (2013). Accessed: 2013-03-20.
29. Kumbh mela 2013 at a glance.
http://kumbhmelaallahabad.gov.in/english/kumbh at glance.html. Accessed: 2013-03-20.
30. Coleman, J. S. Relational analysis: the study of social organizations with survey methods. Human Organization 17, 28–36 (1958).
31. Holland, P. W., Laskey, K. B. & Leinhardt, S. Stochastic blockmodels: First steps. Social networks 5, 109–137 (1983).
32. Wasserman, S. & Faust, K. Social network analysis: Methods and applications, vol. 8 (Cambridge university press, 1994).
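
The pooled estimator described in the caption of Figure S3 (total days of phone use divided by total length of stay, both summed across customers) can be illustrated with a minimal Python sketch. The function names and the three example customers are hypothetical and are not taken from the paper or its data. The second function is an added illustration rather than the authors' procedure: if daily use is independent with probability p, the number of unobserved days before the first use and after the last use are each approximately geometric with mean (1 - p) / p, ignoring the finite length of the stay.

```python
from typing import Sequence


def estimate_usage_probability(days_used: Sequence[int],
                               stay_lengths: Sequence[int]) -> float:
    """Pooled proportion of days on which a phone is used.

    Follows the description in Figure S3: total days of use summed across
    customers, divided by total length of stay summed across customers.
    """
    if len(days_used) != len(stay_lengths):
        raise ValueError("days_used and stay_lengths must have equal length")
    total_stay = sum(stay_lengths)
    if total_stay <= 0:
        raise ValueError("total length of stay must be positive")
    return sum(days_used) / total_stay


def expected_censored_days(p: float) -> float:
    """Approximate expected number of unobserved days per stay.

    Assumption (not from the paper): with independent daily use probability p,
    the gaps before the first use and after the last use are each geometric
    with mean (1 - p) / p, and the finite length of the stay is ignored.
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return 2 * (1 - p) / p


# Hypothetical customers: days with phone use and length of stay in days.
days_used = [3, 2, 4]
stay_lengths = [5, 5, 8]

p_hat = estimate_usage_probability(days_used, stay_lengths)
print(p_hat)                          # 9 / 18 = 0.5
print(expected_censored_days(p_hat))  # 2 * 0.5 / 0.5 = 2.0
```

Pooling the sums across customers, rather than averaging per-customer ratios, weights each observed day equally, which matches the wording of the Figure S3 caption.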
