Bicycle cycles and mobility patterns - Exploring and characterizing data from a community bicycle program

This paper provides an analysis of human mobility data in an urban area using the amount of available bikes in the stations of the community bicycle program Bicing in Barcelona. The data was obtained by periodic mining of a KML-file accessible throug…

Authors: Andreas Kaltenbrunner (andreas.kaltenbrunner@barcelonamedia.org), Rodrigo Meza (rodrigo.meza@barcelonamedia.org), Jens Grivolla (jens.grivolla@barcelonamedia.org), Joan Codina (joan.codina@barcelonamedia.org), Rafael Banchs (rafael.banchs@barcelonamedia.org)

Barcelona Media Centre d'Innovació, Ocata 1, 08003 Barcelona, Spain

ABSTRACT
This paper provides an analysis of human mobility data in an urban area using the number of available bikes in the stations of the community bicycle program Bicing in Barcelona. The data was obtained by periodic mining of a KML file accessible through the Bicing website. Although in principle very noisy, after some preprocessing and filtering steps the data allows us to detect temporal patterns in mobility as well as to identify residential, university, business and leisure areas of the city. The results lead to a proposal for an improvement of the Bicing website, including a prediction of the number of available bikes in a certain station within the next minutes or hours. Furthermore, a model for identifying the most probable routes between stations is briefly sketched.

Categories and Subject Descriptors
G.3 [Probability and statistics]: Time series analysis; H.3.3 [Information storage and Retrieval]: Information Search and Retrieval - Clustering, Information filtering

General Terms
Measurement

Keywords
Mobility pattern, community bicycle program, urban behavior

1. INTRODUCTION
Human mobility patterns have received a certain amount of attention in recent studies. However, it is not a straightforward task to obtain data that allows a large-scale study, mostly due to privacy issues.
Notable exceptions where the authors were able to overcome those difficulties include the use of geotagged photos [3] and location data of mobile phones [9, 4], or the analysis of the circulation of individual banknotes [2] and civil aviation traffic [7] to reconstruct geospatial data of human displacements at different distance scales.

Some of these studies focus on the trajectories of individuals, which are reconstructed in several different ways. Large-distance displacements can be deduced from aviation traffic and have been applied to predict the spread of infectious diseases [7]. Another quite ingenious way used by the same authors to infer middle- and large-scale trajectories was to analyze the circulation data of banknotes provided by individual users of an online bill-tracking system [2]. This study showed that human travel distances can be described by a two-parameter continuous-time random walk model. Shorter distances have been analyzed in great detail in [4], using position data of individual mobile phone users. The authors showed that individuals follow simple and reproducible patterns of mobility in their everyday displacements, a fact that was not found in [2] for middle- and large-scale trajectories. Another type of short-distance pattern was analyzed in [3], where the focus changed from everyday life patterns to the behavior of tourists in foreign cities. Their spatio-temporal data was deduced from geo-referenced photos and the results obtained were contrasted with mobile phone usage.

A case where, in contrast to the studies described above, only aggregate spatio-temporal data is available (e.g. the number of persons at time x in place y), which does not allow the identification of individual trajectories, was analyzed in a recent study [9].
Data of aggregate mobile phone usage allowed the authors to construct activity cycles for different locations, with clear differences between working-day and weekend patterns, as well as a characterization of certain areas within the city by a cluster analysis.

Here we perform a similar study using a different type of aggregate data to infer human mobility patterns. The input spatio-temporal data, which has been obtained by a web mining process, is the number of bicycles in the approximately 400 different stations of Barcelona's community bicycle program Bicing. To our knowledge this is the first study using this type of mobility data.

The aims of this study are twofold. First, we want to obtain a description of the general patterns and activity cycles which can be deduced from this type of data. Second, we want to check whether knowledge of those patterns can lead to a prediction of future behavior, which would allow improving the current web service of Bicing and in turn increase user satisfaction with the system. Knowledge of those patterns could also lead to an optimization of the Bicing system itself, allowing the operator to predict shortage or overflow of bicycles in certain stations well in advance and adapt its redistribution schedule accordingly on the fly.

Prediction of Bicing activity is a problem related to traffic congestion control, which has traditionally been analyzed for vehicular traffic. See for example [6] for a review on this subject. Related problems have also been investigated in the context of web-server traffic congestion. A recent study [1] used linear fits of activity to predict web traffic hot-spots. Here we use a technique based on activity cycles, more closely related to [8], where different patterns reflecting a website's activity cycle were used to predict the number of comments a news item would receive.

To bridge the gap between studies using individual or aggregate displacement data we also briefly sketch a maximum entropy [5] based model which could allow the detection of probable trajectories from the aggregate data. Such problems have also been extensively studied in the context of vehicular traffic flow [6].

The rest of the paper is organized as follows. We first give a more detailed description of the Bicing system in section 1.1. Afterwards we describe details of the data retrieval (section 1.2) and basic quantities of the collected data (section 1.3). In the results part of the article we first describe the patterns of activity in some stations in section 2 and then take a global view, analyzing the activity cycle of the entire city measured by the number of bicycles in the stations (section 2.2) and their variation as a spatial distribution (section 2.3). Then in section 3 some clustering is performed and in section 4 we apply the findings to predict future activity. Finally, we present the model for reconstructing probable trajectories in section 5 and the conclusions in section 6.

1.1 Bicing
Bicing is an urban community bicycle program, managed and maintained in partnership by the city council of Barcelona and the Clear Channel Communications Corporation. Bicing is mainly oriented to cover small and medium daily routes of users within the Barcelona city area.

Users register in the system by paying a fixed amount for a yearly subscription and receive an RFID card that allows them unlimited usage through the year, where the first half hour of usage is free and subsequent half-hour intervals are charged at 0.30 euros up to a maximum of 2 hours. Exceeding this period is penalized with 3 euros per hour.
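
As a worked example of the tariff just described, the following minimal sketch computes the fee for a single trip. The text does not say how partial intervals are billed or whether the penalty is added to the regular fee, so the sketch assumes that every started half hour and every started penalty hour is charged in full and that the penalty comes on top of the regular fee. The function name `usage_fee_cents` is hypothetical.

```python
def usage_fee_cents(minutes):
    """Fee in euro cents for one trip of `minutes` minutes.

    Assumptions (not stated in the text): started intervals are charged
    in full, and the penalty is added to the regular fee.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if minutes <= 30:
        return 0  # the first half hour is free
    # Regular fee: 0.30 euros per started half hour, up to 2 hours in total.
    charged_minutes = min(minutes, 120) - 30
    half_hours = -(-charged_minutes // 30)  # ceiling division
    fee = half_hours * 30
    # Penalty: 3 euros per started hour beyond the 2-hour limit.
    if minutes > 120:
        extra_hours = -(-(minutes - 120) // 60)
        fee += extra_hours * 300
    return fee


print(usage_fee_cents(25))   # free trip
print(usage_fee_cents(45))   # one charged half hour
print(usage_fee_cents(150))  # regular maximum plus one penalty hour
```

Working in integer cents avoids floating-point rounding in the fee arithmetic.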

There are approximately 400 stations distributed throughout the city. Each station has a fixed number of slots, each of which is either empty (without a bicycle), occupied (holding a bicycle) or out of service, because either the slot itself or the bicycle it contains is marked as damaged. Whenever a subscriber needs to use a bicycle, they must take one from a station with occupied slots, travel to their destination station, and leave it there in a free slot. The system registers every time a user takes or parks a bike in a slot. Bicycles can be withdrawn from the stations from Monday to Friday between 5:00 and 24:00. On Saturday and Sunday the service is open 24h. Outside of these time windows the bicycles can only be returned but not withdrawn.

There are two cases in which the system does not allow a user to complete a route:
1. The origin station does not have any available bicycles.
2. The destination station does not have any empty slots to park in.

When any of these situations occurs, users needing a free slot or a bicycle have to choose between waiting at the station, going to another station or taking other means of transportation. In order to reduce these types of situations, there are trucks which move bicycles from highly loaded stations to empty ones. However, in practice users do not wait for these trucks, since they neither have a fixed schedule nor ensure a maximum response time to fix problems at a station.

To allow users to plan their routes in advance, the Bicing system provides on its website a map of stations (footnote 1), where users can check the status of the stations (number of available bikes and empty slots) close to their departure and arrival points. However, this information is only available at the specific moment when the user queries the system. The service does not provide a history of previous loads of the stations (footnote 2) or an expected load of the destination station at the time the user gets to it.

1.2 Data retrieval
The Bicing website provides an information service for users through the Google Maps API. It shows a map of Barcelona overlaid with small markers indicating station positions and the number of available bicycles and free slots for every station. Data is inserted into the map using JavaScript code with a string variable that contains a KML geospatial annotation document. This KML document defines the following information for each station:
1. station name
2. graphic icon to be inserted in the map
3. latitude and longitude
4. number of available bicycles
5. number of free slots

In order to analyze the dynamics of station loads, we have been collecting these KML documents every two minutes since May 15th, parsing them and storing in a MySQL database all the relevant information, such as the station name, location, available bicycles and free slots. As the Bicing network changes from time to time, new stations are added automatically to the database when they first appear in the KML files collected from the Bicing website.

Footnote 1: www.bicing.com/localizaciones/localizaciones.php
Footnote 2: A nice personal project (http://statistings.com) improves the service by providing the daily progression of the number of bicycles in the stations.

[Figure 1 plot. Station 13: pg maritim barceloneta (41.3845, 2.1957). Top panel: one trace per week from 15 May to 3 Jul 2008 (slots, bikes). Bottom panel: average bikes ± stdv. Axes: hour and weekday vs. number of bikes/slots.]
Figure 1: Sequence of the number of bicycles in the station (black line), and the total number of slots (red line) of an example station next to the beach.
Bottom row shows the average weekly pattern of this station. Gray areas correspond to mean ± one stdv.

1.3 Basic quantities of the data collected
Due to a problem in the Bicing web service, data after the 3rd of July was updated only once or twice a day and could not be used for our study. We therefore base our results on the data collected during the 7 weeks between 12:00, May 15th and 12:00, July 3rd, 2008. We also initially did not collect data during Bicing's closing hours on weekdays between 0:01 and 5:00, which restricts our analysis further to the time window between 5:00 and 24:00.

In total, we collected data from 377 stations with a total of approximately 8700 slots (three stations, which never contained any bicycles, were omitted from the analysis). The number of slots per station varies between 15 and 39 and the maximum number of bicycles in the stations observed in our data was 3657. Table 1 summarizes these numbers.

number of stations                  377 (374 with data)
number of slots                     ≈ 8700
slots per station                   [15 - 39]
max. number of bicycles observed    3657

Table 1: Principal quantities of the data collected.

2. ACTIVITY CYCLES
The number of bicycles available at the different stations allows us to infer activity cycles of Barcelona's population.

2.1 Local activity cycles
Before we begin calculating activity cycles we take a closer look at the data recovered from Bicing's web service. The top plot in Figure 1 shows an example of the recovered time-series data from a station close to the beach, a hospital and some office and university buildings. The collection started on Thursday, 15-05-2008 (bottom of the subfigure) and subsequent weeks are drawn with an offset towards the top of the figure. The black lines indicate the number of available bicycles.
For control purposes we also draw the sum of bicycles and empty slots (red line), which, if the station were 100% operational, should correspond to the total number of its slots. However, since some slots or bicycles are often marked as defective and cannot be used, the red lines show some fluctuations. Sometimes they experience a sudden drop during short time intervals (e.g. on the morning of Saturday, 17-05-2008), probably caused by a sporadic malfunction in Bicing's data collection system.

[Figure 2 plot. Six panels, each showing total slots, bikes Mo-Fr (± stdv) and bikes Sa-Su (± stdv) against time of day (6 to 24). Top row: St. 295: Diagonal (41.3875, 2.1236); St. 13: pg maritim barceloneta (41.3845, 2.1957); St. 9: Marques de l'Argentera (41.385, 2.1853). Bottom row: St. 38: Ramon Berenguer el Gran (41.3843, 2.1779); St. 332: Pl. Teresa de Claramunt (41.3631, 2.1398); St. 21: Rossello (41.4033, 2.1707).]
Figure 2: Average number of available bicycles during working days (black lines) and weekends (blue lines) for six example stations with different types of activity cycles. The red curve gives the average total number of slots in the station. Gray and blue areas correspond to mean ± one stdv.

Although the data is quite noisy, with some sudden drops in the number of bikes, maybe caused by replacement trucks which move bicycles from occupied stations to empty ones, the mean weekly activity pattern shown in the bottom subplot of Figure 1 averages out those fluctuations quite well. We have therefore chosen to ignore those unpredictable truck events in the rest of this study. The relatively small standard deviations (gray areas) show that the observed patterns are quite stable during the 7 weeks of data we analyzed. Note especially the near-zero deviation at the sharp rise in the morning which can be observed from Monday to Friday. The greater standard deviation of the Tuesday pattern is caused by the local holiday on June 24th, whose bicycle pattern is more similar to that of a typical Sunday. We clearly observe two different patterns for weekend and working days.

This is confirmed by a more detailed analysis of these two patterns in Figure 2, where weekend (blue lines) and weekday patterns (black lines) from six different stations are compared. To calculate those patterns we first delete all the elements of the time series where the total number of slots in the station is below a certain threshold (10). This eliminates most of the moments where we believe the data to be erroneous (e.g. the drops in the red line in Figure 1). We then average those filtered time series over the days of the corresponding categories and apply a median filter with window length 3 to filter the noise further.

We first focus only on the weekday patterns. The top middle subplot corresponds to the station analyzed in Figure 1 in more detail. We observe very different patterns in the different stations.
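
The pattern calculation described above (dropping samples whose total number of slots is below 10, averaging over the days of one category, then applying a median filter of window length 3) can be sketched as follows. This is a minimal illustration and not the authors' code: the sample layout `(time_bin, bikes, free_slots)`, the function names and the treatment of the two end points of the series are assumptions.

```python
from statistics import mean, median

MIN_TOTAL_SLOTS = 10  # threshold given in the text


def average_cycle(days, n_bins, min_total_slots=MIN_TOTAL_SLOTS):
    """Average the number of bikes per time bin over several days.

    `days` is a list of days; each day is a list of
    (time_bin, bikes, free_slots) samples (hypothetical layout).
    Samples whose total number of slots is below the threshold are dropped.
    """
    per_bin = [[] for _ in range(n_bins)]
    for day in days:
        for time_bin, bikes, free_slots in day:
            if bikes + free_slots >= min_total_slots:
                per_bin[time_bin].append(bikes)
    # Bins without any valid sample are reported as None.
    return [mean(values) if values else None for values in per_bin]


def median_filter3(values):
    """Median filter of window length 3; end points are kept unchanged (assumption)."""
    smoothed = list(values)
    for i in range(1, len(values) - 1):
        window = values[i - 1:i + 2]
        if None not in window:
            smoothed[i] = median(window)
    return smoothed


# Two toy days with four time bins each; the third sample of the first day
# reports only 2 slots in total and is therefore discarded.
days = [
    [(0, 5, 15), (1, 6, 14), (2, 0, 2), (3, 7, 13)],
    [(0, 7, 13), (1, 8, 12), (2, 20, 0), (3, 9, 11)],
]
cycle = average_cycle(days, n_bins=4)
print(cycle)
print(median_filter3(cycle))
```

In the toy data the spike in the third bin survives the averaging, because only one valid sample is left there, and is then removed by the median filter.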
Station #295 (top left) is close to a university and shows a quite narrow peak in the number of bicycles in the station between 8:00 and 13:00, typical for a university with morning classes only. The following two stations are also close to universities (top middle and right subplots). However, their observed patterns are somewhat different. All three stations show the initial rise in activity in the morning, sharp in station #13 (top middle) and less pronounced in #9 (top right). Station #13 is also close to some important office buildings and a hospital, which might explain the sharp rise in activity around 8:00, more typical of a fixed working schedule in companies or hospitals than of the varying starting hours of university classes. The location close to the beach probably causes the slower decay in the number of bikes in the afternoon hours, when beach traffic coincides with the departing students, office workers and hospital staff. Station #9 shows more variability. Although more spread out than in station #295, the morning peak is quite similar. However, this station experiences a second peak starting at 15:00 and reaching its maximum at 16:00. This might be caused either by people leaving the university to have lunch elsewhere or by a change of shift between students with morning and afternoon lessons. Finally, this station also experiences an increase in activity after 20:00, very probably caused by the popular nearby area of bars and restaurants called "Born".

Station #38 (bottom left) represents another pattern, quite different from the previous ones. It shows a drop in activity typical for residential areas, where people mainly withdraw bikes to move to their destinations (e.g. station #21 in the bottom right corner of Figure 2). However, at 8:00 in the morning the profile of this station changes to a pattern more characteristic of an office/university station.
Due to its proximity it might serve as a backup for nearby university stations which have run out of free slots to drop off the bikes. It is also situated right in the center between the "Born" and the "Barri Gòtic", which explains the increase in the number of bikes in the late evening, brought by people enjoying the nightlife in the city center. Finally, the patterns of stations #332 and #21 show cycles opposite to the previous ones, typical for residential areas, where people leave the region during the morning to return later in the afternoon or late evening.

The onset of activity in the weekend patterns (blue lines) occurs later than during working days, or is nearly absent, as can be observed for example in station #332 (bottom middle subplot), where only some minor activity is observed. Station #295 shows an interesting bimodal distribution on weekends, which might be caused by a nearby shopping center which attracts afternoon visitors on weekends.

Figure 3: Average of the total amount of bicycles available in the stations.

2.2 Global activity cycles
If, instead of the local cycles in particular stations, we look at the sum of bicycles available at all stations during a certain hour of the day, we get an idea of the global mobility cycle of Barcelona.

In Figure 3 we plot these average cycles of available bicycles for the working days from Monday to Friday (black curve) and the weekend (blue curve). To filter the worst noise out of the data, caused by malfunctions in the system, we use only measurements where the total sum of slots (free and occupied) in all the stations is greater than 8000, and furthermore we again apply a median filter with a window length of 3 to achieve smoother curves.

The fewer bicycles are available for rent in the stations, the more displacements using them are being performed. First, we analyze the traffic during working days (black line).
We observe a first local minimum (i.e. a local maximum in displacements) a little earlier than 8:00, and a second, lower one at 9:00. These two minima correspond to the typical starting hours in offices, which in Barcelona normally vary between 8:00 and 10:00. This is further confirmed by the fact that the curve reaches a local maximum at the latter hour (10:00), the time when late starters finally reach their working or study locations. A third, lower minimum is observed around 14:00, which might be caused by students who leave their classes. The number of available bicycles increases during people's lunch breaks (typically between 14:00 and 16:00), but once the local maximum at the end of this time span is reached it decays again. Finally, the global minimum in the number of available bicycles (the maximum in displacements) is reached slightly after 19:00, the typical finishing time of many working schedules.

The weekend pattern is different in the sense that it does not show the early morning minima. Instead we observe the maximum of available bicycles around 8:00, the turning point between late home-comers from the last parties and early birds starting their day with a bicycle ride. The use of the bikes steadily increases until their number in the stations reaches a local minimum at 14:00, just before lunch time, during which it increases again. Afterwards the number of available bicycles decays again and follows a pattern similar to that of working days, although the local maximum at 16:00 occurs slightly earlier and the global minimum slightly later (at 20:00) and is less pronounced than during working days. It is therefore difficult to separate working-day from weekend activity based only on afternoon activity, as can be observed as well for most of the stations presented in Figure 2.
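
The global cycle discussed in this section can be computed along the following lines: sum the bikes over all stations for each snapshot, discard snapshots in which the total number of slots does not exceed the threshold (8000 in the text), average per time bin and smooth with a median filter of window length 3. This is only a sketch under assumptions: the snapshot layout, the function names and the handling of the end points are hypothetical, and the toy example uses a much smaller threshold.

```python
from statistics import mean, median


def global_cycle(snapshots, n_bins, min_total_slots=8000):
    """Average city-wide number of available bikes per time bin.

    `snapshots` is a list of (time_bin, stations) pairs, where `stations`
    is a list of (bikes, free_slots) tuples, one per station (hypothetical layout).
    """
    per_bin = [[] for _ in range(n_bins)]
    for time_bin, stations in snapshots:
        total_bikes = sum(bikes for bikes, _ in stations)
        total_slots = sum(bikes + free for bikes, free in stations)
        # Snapshots with too few reported slots are treated as malfunctions.
        if total_slots > min_total_slots:
            per_bin[time_bin].append(total_bikes)
    return [mean(values) if values else None for values in per_bin]


def smooth3(values):
    """Median filter of window length 3; end points are kept unchanged (assumption)."""
    out = list(values)
    for i in range(1, len(values) - 1):
        window = values[i - 1:i + 2]
        if None not in window:
            out[i] = median(window)
    return out


# Toy city with two stations and a threshold of 10 instead of 8000.
snapshots = [
    (0, [(5, 5), (3, 7)]),
    (0, [(1, 0), (0, 1)]),  # only 2 slots reported: discarded
    (1, [(6, 4), (4, 6)]),
    (2, [(2, 8), (1, 9)]),
]
cycle = global_cycle(snapshots, n_bins=3, min_total_slots=10)
print(cycle)
print(smooth3(cycle))
```

Since fewer available bicycles mean more displacements, minima of the resulting curve correspond to peaks in usage.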

Note that initially we only collected data between 5:00 and 24:00, which corresponds to the opening hours of Bicing from Monday to Friday. However, although users are not allowed to withdraw a bicycle from a station outside of this time schedule, they can return one between 24:00 and 5:00 as well. This explains the difference in the number of bicycles available at the beginning and end of the cycle described above.

The small standard deviations (gray and blue areas in Figure 3) show that the observed cycles are quite stable throughout the period in which the data was collected. The weekend deviation is slightly greater than its working-day counterpart, which is caused by the greater number of working days in our data set (35 vs 14) and the more flexible personal time schedules on weekends.

2.3 Mobility patterns
To get a spatial picture of the mobility pattern in the city, we use these local activity cycles together with the stations' geo-coordinates (longitude and latitude) and place the difference in the number of bicycles in the stations compared to their number at 5:00 on the map of Barcelona for different times of the day. Afterwards we interpolate a 3D surface using this difference as color-encoded height (footnote 3). Red stands for a positive difference, i.e. more bikes can be found in these stations than at the beginning of the day, while blue regions show areas whose number of bicycles has been reduced. Green areas indicate a more or less constant relation between incoming and outgoing bicycles. Figure 4 shows such geo-patterns for 6 different hours using the stations' working-day cycles (footnote 4). At 6:30 (top left subfigure), no big difference from the initial distribution of bicycles in the stations can be observed. At 9:30, however (top right subfigure), just after the morning minimum in Figure 3, we observe quite a different picture. Several areas change color either into deep red or dark blue. Blue regions correspond to mainly residential areas, from which people move out, while the red hot-spots are found mainly close to university and business quarters.

Footnote 3: Alternatively, one can repeat the same procedure with other starting times (e.g. 16:00 to emphasize afternoon patterns).
Footnote 4: A similar but simpler spatio-temporal visualization by Fabien Girardin using just the evolution of bicycles in the stations during one day can be found at http://www.girardin.org/fabien/tracing/bicing/

Figure 4: Geographic mobility patterns: Black crosses indicate the location of Bicing stations and the color overlay the average variation during working days in the number of available bikes from the level at 5:00. Blue tones indicate regions which lose bikes while red tones indicate stations which increase their number of bicycles.

Interestingly, although the number of bicycles in the stations increases by roughly 400 until 12:00 in Figure 3, the snapshot of the geo-pattern (middle left subplot in Figure 4) at this moment in time does not change very much. The only noticeable difference is that in already red regions the number of bicycles increases slightly further. We can conclude that the morning peak in activity leads to quite a narrow band of stations with high bicycle concentration. The band crosses the city starting at its westernmost entrance, where one of the major university areas of the city lies, and follows the Diagonal through a business district towards Passeig de Gràcia, where it turns right and heads down, passing by one of the major business and shopping areas and the University of Barcelona, to meet the city center and later the sea. There it turns left again to follow the beach towards Port Olympic, leaving out one station in the also mainly residential area of Barceloneta and passing by several campuses of Universitat Pompeu Fabra.
Close to Port Olympic we also find important office buildings, as well as in a narrow band which grows from there northwards towards Glories. Another area which receives a big surplus in activity is Diagonal Mar, the easternmost point of Barcelona, also a region with important business activity and a large shopping center.

In the afternoon the picture changes. At 16:30 (middle right subfigure) a lot of bicycles have moved away from the previously described hot-spots, and the residential areas get some of their lost bikes back. Only the regions close to Port Olympic remain deeply red, probably now caused mainly by beach traffic. Diagonal Mar also maintains its bicycles. At 20:30 (bottom left), finally, those bikes head home again as well; only some regions in the city center still have a surplus of bicycles, probably caused by people enjoying Barcelona's nightlife. Those regions still maintain their bicycles at 23:30 (bottom right), when most of the remaining stations have recovered all their bikes and show green tones. Those stations will recover their bikes during the night.

3. CLUSTERING OF ACTIVITY
The human behavior patterns mentioned above suggest that some Bicing stations may show similar cycles depending on the activities occurring around them. To find such similarity patterns, we use the sequences of the average number of bicycles in the stations during working days (as shown for some example stations in Figure 2) and define similarity metrics between those sequences, which then allow us to generate clusters of stations with similar cycles. We use the following metrics:

absolute similarity: Let $p$, $q$ be two bicycle stations, let $T = \{t_i\}_{i=1..n}$ be the set of measure points where station loads are collected, and let $s_p(t_i)$ be the average number of available bicycles in station $p$ at measure point $t_i$. The absolute similarity between stations $p$ and $q$ is defined by:

\[ \mathrm{abs\_sim}(p, q) = \frac{1}{\sum_{t_i \in T} \left| s_p(t_i) - s_q(t_i) \right|} \qquad (1) \]

relative similarity: Let $p$, $q$, $T$ and $s_p(t_i)$ be defined as before. We define a new function $D_p(t_i)$ as:

\[ D_p(t_i) = \begin{cases} 1 & \text{if } s_p(t_{i+1}) - s_p(t_i) \geq 0, \\ -1 & \text{if } s_p(t_{i+1}) - s_p(t_i) < 0, \end{cases} \qquad (2) \]

which is basically a slightly modified signum function of the gradient of $s_p$. Then the relative similarity between two stations $p$ and $q$ is defined by:

\[ \mathrm{rel\_sim}(p, q) = \sum_{t_i \in T} \frac{1 + D_p(t_i) \times D_q(t_i)}{2} \qquad (3) \]

The greater those measures are, the more similar are the involved stations. On the one hand, absolute similarity tends to cluster stations according to the exact number of bicycles at every measure point, but does not recognize two stations with the same pattern of use but a different total number of slots. On the other hand, relative similarity clusters stations only according to the variations in their usage pattern, but is not useful for recognizing the shape of their cycles. For example, an always nearly empty station with little absolute variation in its number of bicycles would be in the same cluster as a station with a large-amplitude cycle if it showed the same variations in occupation pattern (receiving bicycles in the mornings, losing them at lunchtime, etc.).

Figure 5: Clusters grouped by their relative similarity define clear and meaningful zones.

For this reason, we first calculate the absolute similarity between every pair of stations and cluster them using the k-means algorithm. The result is a set of clusters of stations arranged according to their absolute number of bicycles. Afterwards, we use relative similarity to compare the previously generated clusters, using k-means again to generate new clusters of clusters (meta-clusters) with the same usage pattern, no matter the absolute number of bicycles in them.
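
The two similarity measures of Equations (1) to (3) translate directly into code. The sketch below takes the average cycles of two stations as equally long lists. Two details are assumptions, since the definitions leave them open: identical cycles (zero total difference) are given infinite absolute similarity, and the sum in Equation (3) runs over the first n-1 measure points only, because the direction at a point needs its successor.

```python
def abs_sim(s_p, s_q):
    """Absolute similarity, Equation (1): inverse of the summed absolute differences."""
    total = sum(abs(a - b) for a, b in zip(s_p, s_q))
    # Assumption: identical cycles are treated as infinitely similar.
    return float("inf") if total == 0 else 1.0 / total


def direction(s):
    """Equation (2): +1 where the cycle rises or stays flat, -1 where it falls."""
    return [1 if s[i + 1] - s[i] >= 0 else -1 for i in range(len(s) - 1)]


def rel_sim(s_p, s_q):
    """Relative similarity, Equation (3): number of steps with equal direction."""
    return sum((1 + a * b) / 2 for a, b in zip(direction(s_p), direction(s_q)))


small = [2, 4, 6, 5]      # small station
large = [12, 14, 16, 15]  # same shape, ten bikes more at every point
mirror = [6, 4, 2, 3]     # opposite direction at every step

print(abs_sim(small, large), rel_sim(small, large))
print(abs_sim(small, mirror), rel_sim(small, mirror))
```

The toy stations `small` and `large` illustrate the point made in the text: their relative similarity is maximal although their absolute similarity is low.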

To define the optimal number of clusters for the first iteration using absolute similarity, we calculated the internal similarity inside clusters using a variable number of clusters from 2 to the total number of stations. The optimal number of clusters is reached when the minimum internal similarity of all the clusters starts to decrease, that is, at 31 clusters. Once we have obtained these first 31 clusters, we group them using the relative similarities between every pair of clusters, finally obtaining a set of 7 meta-clusters. The geographic zones defined by those meta-clusters are shown in Figure 5 in different colors. The green cluster covers quite well the region with high morning activity detected in Figure 4. Station #295 (Figure 2, top left) is a typical example of this cluster. The pink zone corresponds to stations with a typical beach pattern, as for example station #13 in Figure 2 (top middle). The black, red and blue meta-clusters cover different types of residential-area stations. An example of a blue station can be found in #332 (Figure 2, bottom middle), and of a black one in station #21 (bottom right). Finally, the yellow and light blue meta-clusters show two different peripheral patterns, with a decaying tendency in their number of bicycles during the entire day. This prevalence of outgoing bicycles may require artificial replacements by the operator to ensure a minimum number of bicycles at these stations.
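
The stopping criterion for the number of clusters can be illustrated with the following sketch. The text does not define "internal similarity", so the code assumes it is the mean pairwise similarity of the members of a cluster, ignores clusters with a single member, and reads "starts to decrease" as the last k before the first drop of the minimum internal similarity. All function names are hypothetical, and the clustering itself is taken as given.

```python
from itertools import combinations
from statistics import mean


def internal_similarity(members, sim):
    """Mean pairwise similarity inside one cluster (assumed definition)."""
    pairs = list(combinations(members, 2))
    return mean([sim(a, b) for a, b in pairs]) if pairs else None


def min_internal_similarity(clusters, sim):
    """Smallest internal similarity over all clusters with at least two members."""
    values = [internal_similarity(c, sim) for c in clusters]
    values = [v for v in values if v is not None]
    return min(values) if values else None


def pick_number_of_clusters(score_by_k):
    """Return the last k before the minimum internal similarity first decreases."""
    ks = sorted(score_by_k)
    for k, next_k in zip(ks, ks[1:]):
        if score_by_k[next_k] < score_by_k[k]:
            return k
    return ks[-1]


# Toy example: items are numbers, similarity decays with their distance.
sim = lambda a, b: 1 / (1 + abs(a - b))
print(min_internal_similarity([[1, 2], [10, 14], [7]], sim))

# Hypothetical scores obtained by clustering with k = 2..6.
print(pick_number_of_clusters({2: 0.10, 3: 0.20, 4: 0.30, 5: 0.25, 6: 0.20}))
```

In practice `score_by_k` would be filled by running the clustering once per candidate k and evaluating `min_internal_similarity` on the result.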
[Figure 6 panels: (left) Station 352, Wednesday 2008-06-18, prediction based on current number of bikes, prediction error 7.02; (middle) Station 352, Wednesday 2008-06-18, prediction based on current number of bikes and average gradient, prediction error 2.23; both panels show available bikes against time of day. (right) Average prediction error in number of available bikes, with standard deviation, against offset in minutes.]

Figure 6: Prediction of bicycle availability, (left) using only the current value and (middle) adjusted by the average gradient over the other weeks. The right subfigure shows the average error depending on the time offset of the prediction (2 hours for the example day of station #352 shown in the middle and left subfigure).

Zones in Figure 5 without color overlay correspond to stations whose clusters could not be arranged into meta-clusters. They probably represent a mixture of several different clusters (e.g. the remaining stations #9 and #38 of Figure 2). Future research will try to uncover such combined patterns.

4. PREDICTION OF ACTIVITY

In this section, we present initial results on the prediction of the number of bicycles or free slots at a given station at a given time. We compare several simple prediction models, and establish evaluation measures as well as a baseline with which other (more complex) models can be compared. Our initial set of prediction models is based on the current state of the station as well as aggregate statistics of the station's usage patterns. As the simplest baseline we chose to predict the current state of the station (number of bikes or free slots) for any time in the future.
If there are currently 5 available bicycles, the system will predict that in 10 minutes there will still be 5 bicycles available. This corresponds to the best prediction one can make using only the present situation as displayed on the actual Bicing website.

The next set of models is based on extrapolating from the current state using the tendencies registered on other dates. To the current number of bikes we add the expected change based on the average gradient in the aggregate model. The aggregate model in this case can be based on all days other than the one for which predictions are made (in a real application setting this would obviously be limited to days prior to the current date), or can be limited to the same day of the week, or split between weekdays and weekends/holidays.

We evaluate the different models by measuring the mean error (difference between predicted and actual availability of bicycles) over all stations and all available dates. This is done for different time offsets, i.e. predicting 10 minutes, 20 minutes, or several hours into the future.

Figure 6 (left) shows an example of the fit obtained using the baseline model (i.e. predicting the current state 2 hours into the future) and (middle) a gradient-based prediction (using only data from the same day of the week) for one particular station and day. The blue curve corresponds to the actual number of bicycles in the station (filtered with a median filter), while the red one indicates the prediction. In this example we achieve a much lower prediction error (indicated by the light blue areas) using the gradient of the average activity cycle (green curve in Figure 6 middle). This is confirmed further by Figure 6 (right), where we compare the overall performance of our prediction algorithms as explained above.
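The two kinds of predictors described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code behind Figure 6: the function names are hypothetical, the average cycle is assumed to be an array sampled at the same interval as the observations, the expected change is taken as the difference of the average cycle between the two time points (the accumulated average gradient), negative predictions are clipped to zero, and the error is assumed to be the mean absolute difference.

```python
import numpy as np

def predict_baseline(current_bikes):
    """Baseline: the number of bikes is assumed not to change."""
    return float(current_bikes)

def predict_with_gradient(current_bikes, avg_cycle, t, offset):
    """Current value plus the change of the average cycle between t and t + offset."""
    avg_cycle = np.asarray(avg_cycle, dtype=float)
    if t + offset >= len(avg_cycle):
        raise ValueError("prediction target lies outside the average cycle")
    expected_change = avg_cycle[t + offset] - avg_cycle[t]
    # Assumption: a station cannot hold fewer than zero bikes.
    return float(max(current_bikes + expected_change, 0.0))

def mean_abs_error(predicted, actual):
    """Mean absolute difference between predicted and observed availability."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))

# Hypothetical average cycle of one station (one value per sampling step).
avg_cycle = [10, 12, 15, 14, 9, 6]
print(predict_baseline(8))
print(predict_with_gradient(8, avg_cycle, t=1, offset=2))
```

With 8 bikes observed at step 1, the baseline predicts 8 bikes two steps later, while the gradient-based variant adds the average change of +2 between steps 1 and 3 and predicts 10. Restricting `avg_cycle` to the same weekday, or to weekdays versus weekends, gives the other aggregate variants mentioned in the text.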
For very short periods (10 minutes) there is no notable difference between the baseline and the other models, which may partly be due to a large number of low-activity stations where predicting no change is the safest bet for very short time scales. However, we notice a significantly better performance of the prediction algorithms using the activity cycles for larger offsets.

Many enhancements and other approaches remain to be tested, including the incorporation of knowledge about interventions of Bicing trucks (by having better data available or by detecting them from the available data) and other events that deviate from the "normal" trend. Weather conditions and many other factors may also be taken into account.

5. PROBABLE ROUTE IDENTIFICATION

In this section, we deal with the problem of estimating the routes that are most likely to be taken by the users from the aggregated data that is available to us. Notice that in the actual context of the service provider this problem is not interesting at all, since individual bicycle movements among the stations can be tracked by the system administrator. However, when this information is not available, estimating the most popular routes from the aggregated data is a challenge. Basically, as we will discuss below, this problem is not solvable at all from the observation data alone. But we claim that it is possible to approximate suboptimal solutions for it by conditioning the observed data on some additional information, such as the distances among the stations, the average bicycle velocity, and some other common sense implications.

For the appropriate computation of route popularity we should be able to estimate the conditional probability of one bicycle arriving at station $j$ given that it departed from station $i$. In other words, a transition probability matrix $p_{j,i}$ should be estimated for the whole system.
As this problem is rather involved, we approach it by considering two important simplifications: first, we consider the transition probabilities to be time independent within the analysis time interval (we restrict our analysis to the morning behavior from 5:00 to 12:00), and second, we consider each bicycle to make only one trip during the morning interval. According to this, our problem is described by the following set of equations:

$$F_j = \sum_i p_{j,i} I_i \quad \text{for } j = 1 \ldots N \qquad (4)$$

$$\sum_j p_{j,i} = 1 \quad \text{for } i = 1 \ldots N \qquad (5)$$

where $N$ is the total number of stations, $I$ is the initial (e.g. at 5:00) distribution of bicycles and $F$ the final (e.g. at 12:00 in our experiments) distribution of bicycles. The first $N$ equations represent the process of aggregation of bicycles arriving from all stations into a final one, and the second set of equations guarantees that the total amount of bicycles is preserved. The system is described by $2N$ equations, but $N^2$ parameters have to be determined. It is therefore strongly underdetermined and admits an infinite number of solutions. Even for $N = 2$, the problem remains underdetermined since the resulting four equations are linearly dependent (summing Eqs. (4) over $j$ yields the same relation as the $I_i$-weighted sum of Eqs. (5) whenever the total number of bicycles is conserved).

In the following procedure we use some additional information and some common sense heuristics in order to provide an approximate solution to this problem. It is founded on three important observations:

• Users are more likely to use the service to cover intermediate distances. For short distances (less than 500 meters) the user will rather walk, and for long distances (more than approximately 6 kilometers) the user will rather use another transportation service.
• In the morning interval, most of the users will use the service to move from their home to their working and study area or to an alternative transportation system.
So during the morning it is expected (and confirmed by the data analyzed above) that some stations mainly serve either as departure or as arrival stations.
• Between the groups of departure and arrival stations we can also expect coupled stations, among which a great volume of users moves. Although this coupling is certainly a many-to-many relation, for simplicity and due to the computational cost of this analysis it is approximated as a one-to-one phenomenon.

The three previous observations allow us to propose the following maximum-entropy-based model for the transition probabilities we want to approximate:

$$p_{j,i} \approx f_1(\mathrm{dist}_{i,j})^{\lambda_1} \, f_2(\mathrm{sim}_{i,j})^{\lambda_2} \, f_3(\mathrm{coup}_{i,j})^{\lambda_3} \qquad (6)$$

which is actually a log-linear combination of features. The first feature $f_1(\mathrm{dist}_{i,j})$ represents a log-normal distribution of the distance between stations. Its parameters have been adjusted to provide a maximum value at 2 kilometers and to be negligible from 7 kilometers onwards. The second feature $f_2(\mathrm{sim}_{i,j})$ is a function of the cross-correlation coefficient between the involved stations' average cycles of bicycles (more exactly $f_2(\mathrm{sim}_{i,j}) = (1 - \mathrm{xcorr}(i, j))/2$), since bicycles are less likely to move between two departure (or two arrival) stations. The third feature, $f_3(\mathrm{coup}_{i,j})$, is a ternary function of presumably one-to-one coupled stations. To determine the degree of coupling between two stations we shift the average cycle of observed bicycles at departure stations with respect to the arrival stations by a time factor related to the distance between them (an average speed of 25 km/h was assumed), and we select those pairs of stations that best explain the one-to-one coupling assumption in terms of the total number of bicycles shared by both stations.
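How the first two features and the combination of Eq. (6) fit together can be sketched as below. All names are hypothetical and several details are assumptions that the text does not fix: the log-normal width `sigma = 0.4` is chosen only so that the maximum lies at 2 km and the value at 7 km is small, a distance of zero is mapped to the limit value 0, the cross-correlation coefficient is taken as the zero-lag Pearson correlation, and the coupling feature is simply passed in as a ready-made matrix. Matrices are indexed as `P[j, i]`, so the normalization of Eq. (5) acts on columns.

```python
import numpy as np

def f_dist(dist_km, sigma=0.4, mode_km=2.0):
    """Log-normal density of the distance with its maximum at mode_km."""
    d = np.atleast_1d(np.asarray(dist_km, dtype=float))
    mu = np.log(mode_km) + sigma ** 2  # mode of a log-normal is exp(mu - sigma^2)
    out = np.zeros_like(d)             # assumption: f_dist(0) = 0 (limit of the density)
    pos = d > 0
    out[pos] = np.exp(-(np.log(d[pos]) - mu) ** 2 / (2 * sigma ** 2)) / (
        d[pos] * sigma * np.sqrt(2 * np.pi))
    return out

def f_sim(cycle_i, cycle_j):
    """(1 - xcorr) / 2: close to 1 for opposite average cycles, 0 for identical ones."""
    xcorr = np.corrcoef(cycle_i, cycle_j)[0, 1]  # assumption: zero-lag Pearson correlation
    return (1.0 - xcorr) / 2.0

def transition_matrix(F1, F2, F3, lambdas):
    """Log-linear combination of Eq. (6) followed by the normalization of Eq. (5)."""
    l1, l2, l3 = lambdas
    P = np.asarray(F1, float) ** l1 * np.asarray(F2, float) ** l2 * np.asarray(F3, float) ** l3
    col_sums = P.sum(axis=0)
    col_sums = np.where(col_sums > 0, col_sums, 1.0)  # avoid division by zero
    return P / col_sums

print(f_dist([0.5, 2.0, 7.0]))
print(f_sim([1, 2, 3, 4], [4, 3, 2, 1]))
```

In such a sketch the exponents in `lambdas` remain free parameters to be fitted against the observed final distribution of Eq. (4).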
Every time a departure station $k$ is identified as coupled with an arrival station $m$, $f_3(\mathrm{coup}_{m,k})$ is set to 1 and the inverse relation $f_3(\mathrm{coup}_{k,m})$ to 0.1; for all other cases $f_3(\mathrm{coup}_{i,j})$ is set to 0.5. This last assumption helps to assign a probability mass, among other events, to the event of a bicycle remaining in the same station.

The weighting exponents are adjusted to best fit the available aggregated data by means of simplex-based optimization. The columns of the resulting transition probability matrix are normalized after each iteration in order to satisfy condition (5). For the transition probability from 5:00 to 12:00 on weekdays we obtained for our data the weights $\lambda_1 = 0.949$, $\lambda_2 = 1.138$ and $\lambda_3 = 1.116$.

To verify our model we compare in Figure 7 the actual average number of bicycles in the stations at 12:00 (in blue) with the predictions obtained from our model (in green) and a smoothed version of the prediction (in red). The stations are ranked in increasing order by their average number of bicycles. As seen from the figure, the prediction is very noisy; however, as illustrated by its smoothed version, it tends to follow the actual data. This result suggests that, although the prediction accuracy is poor, the assumptions and heuristics used to construct the approximated model can explain the general tendency of the data.

[Figure 7 plot: average number of bicycles in the station at 12:00 on working days; number of bikes against station rank number; series: original data, prediction, prediction average.]

Figure 7: Actual average bicycle distribution at 12:00 on weekdays (blue), obtained predictions (green) and smoothed predictions (red).

Figure 8: Most probable routes (green lines) on weekdays between departure (blue), arrival (red) and non-pattern (black) stations.

Finally, the most probable routes, according to our proposed model, were extracted from the optimized transition matrix. Figure 8 presents a spatial representation of all Bicing stations with the most probable routes among stations depicted. In the figure, the stations identified as departure and arrival stations are presented in blue and red, respectively, while the stations exhibiting different patterns are presented in black. The most probable routes (those for which the transition probabilities were larger than 0.03) are illustrated by the green lines interconnecting the stations. From the picture, several clusters of morning activity can be discovered.

It is important to mention that the most probable routes in this map are not necessarily related to heavy bicycle traffic. They mainly depict the relative strength of traffic with respect to the number of bicycles involved in the stations. This explains the lack of such probable routes in the city center, where most of the activity is to be expected, but where this traffic is also much more dispersed than on the borders of the system.

6. CONCLUSIONS

The approach of mining usage data from community bicycle services presents some advantages over similar studies analyzing cell phone data [9, 4]. The data we use is freely available online for everyone, so one can avoid the typical problems of finding a cell phone company willing to share its data with researchers and the associated privacy and confidentiality issues.

We have shown that this type of data allows one to infer the activity cycles of Barcelona's population as well as the spatio-temporal distribution of their displacements. There are clear patterns of user behavior by station and type of day.
From the temporal clustering of stations or by visualizing their average daily variation in activity it can be observed that the stations with similar behavior also correspond to adjacent areas in the map, showing residential, university and leisure areas. The cycles allow a prediction of the number of available bicycles in the stations which, for time windows greater than 20 minutes, is significantly better than the current approach on the Bicing website, where only the actual number of bicycles/free slots is shown. It is our intention to provide a prototype for an improved Bicing web interface which will allow such detailed predictions in the near future.

It would be interesting to contrast our results with more specific usage statistics. The Bicing system must internally produce more information that is not public, such as the origin/destination of individual users. Access to this data would allow us to better validate our algorithm for the detection of the most probable routes and to produce more precise models. Other information is not even available to the Bicing operator, e.g. the users that could not take/leave a bicycle because the station was empty/full. A survey aimed at obtaining a more detailed picture of the Bicing users and their motivations, currently being carried out by Jon Froehlich et al. (https://catalysttools.washington.edu/webq/survey/jfroehli/56481), could help uncover this information.

A growing number of community bicycle services are appearing worldwide (a large list of such services can be found on the bike-sharing world map at http://bike-sharing.blogspot.com), some of them with a web service similar to the one we used to obtain our data, which is sure to generate increasing interest in this research topic.

7. REFERENCES

[1] Y. Baryshnikov, E. G. Coffman, G. Pierre, D. Rubenstein, M. Squillante, and T. Yimwadsana. Predictability of web-server traffic congestion. In Proceedings of the WCW'05, pages 97–103, Washington, DC, USA, 2005. IEEE Computer Society.
[2] D. Brockmann, L. Hufnagel, and T. Geisel. The scaling laws of human travel. Nature, 439(7075):462–465, 2006.
[3] F. Girardin, F. Calabrese, F. D. Fiore, C. Ratti, and J. Blat. Digital footprinting: uncovering the presence and movements of tourists from user-generated content. IEEE Pervasive Computing, 2008. In print.
[4] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196):779–782, June 2008.
[5] S. Guiasu and A. Shenitzer. The principle of maximum entropy. The Mathematical Intelligencer, 7(1), 1985.
[6] S. P. Hoogendoorn and P. H. L. Bovy. State-of-the-art of vehicular traffic flow modelling. Proceedings of the IMECHE Part I Journal of Systems & Control in Engineer, 215:283–303, 2001.
[7] L. Hufnagel, D. Brockmann, and T. Geisel. Forecast and control of epidemics in a globalized world. PNAS, 101:15124–15129, 2004.
[8] A. Kaltenbrunner, V. Gómez, and V. López. Description and prediction of slashdot activity. In Proceedings of the 5th Latin American Web Congress (LA-WEB 2007), Santiago de Chile, 2007. IEEE Computer Society.
[9] J. Reades, F. Calabrese, A. Sevtsuk, and C. Ratti. Cellular census: Explorations in urban data collection. IEEE Pervasive Computing, 6(3):30–38, 2007.
