Reading Stockholm Riots 2013 in social media by text-mining
The riots in Stockholm in May 2013 were an event that reverberated in the world media for its dimension of violence that had spread through the Swedish capital. In this study we have investigated the role of social media in creating media phenomena v…
Authors: Andrzej Jarynowski, Amir Rostami
Andrzej Jarynowski 1,2,3, Amir Rostami 1
1) Department of Sociology, Stockholm University, Stockholm, Sweden
2) Department of Theory of Complex Systems, Smoluchowski Institute, Jagiellonian University, Cracow, Poland
3) Laboratory of Technics of Virtual Reality, CIOP – National Research Institute, Warsaw, Poland
andrzej.jarynowski@sociology.su.se

Abstract

The riots in Stockholm in May 2013 were an event that reverberated in the world media because of the scale of the violence that spread through the Swedish capital. In this study we have investigated the role of social media in creating media phenomena via text mining and natural language processing. We have focused on two channels of communication for our analysis: Twitter and Poloniainfo.se (Forum of the Polish community in Sweden). By counting word usage, our preliminary results show some hot topics driving the discussion, related mostly to the Swedish Police and Swedish Politics. Typical features of media intervention are presented. We have built networks of the most popular phrases, clustered by categories (geography, media institution, etc.). Sentiment analysis shows a negative connotation with the Police. The aim of this preliminary exploratory quantitative study was to generate questions and hypotheses, which we could carefully follow up with deeper, more qualitative methods.

Keywords: text-mining, cyberemotions, social networks, riots, sentiment analysis, language semantics

1. Introduction

Language and its products are the most important source of information about people's states of mind in the communication process (Babtist 2010). The semantic and structural analysis of texts could shed more light on human perception, interpretation and creation of social phenomena like riots.
Moreover, thinking in terms of networks and hierarchy (associations between words and terms) has gained ground in many disciplines (Mouge 2003), including psychology, criminology, anthropology and political science, having originated in physics and sociology. Our aim is to provide a qualitative analysis of how people discuss the Stockholm riots and what stimulates such activity. To do so, we investigate Forum posts (Poloniainfo.se) and Twitter during and after the riots. We chose these datasets because they are freely available, can be legally crawled from the Internet, and can both be treated as big data. Unfortunately, both media are biased, because a very particular category of people uses them, and conclusions based on those communities cannot be directly generalized to the whole population. However, some insights for future studies could be obtained, especially from Twitter (Poloniainfo.se is even less representative, and findings from there could only help us to understand the complexity of the riot phenomenon).

The scientific community has already experienced the power of social network media since the riots in Tottenham, north London, in August 2011 (theguardian&LSE 2011). Since then, Twitter and other social media have been carefully investigated for almost every social movement, such as STOP-ACTA (Jarynowski 2013) or the Smoleńsk crash (Sobkowicz 2013) in Poland, with computational tools constructed for this problem. We try some simple tools of NLP (natural language processing) and text-mining (Yuskiv 2006) to obtain some kind of hierarchical or network structure of the concepts mentioned by Internet users.

The initial disturbances in Husby, a suburb in the northern part of Stockholm, were triggered by the police shooting of an old man of Maghreb origin. Most of the discussion took place from 15.05 (the incident with the police), through the period of the actual riots (20-25.05), and shortly after that.
Both Forum and Twitter data were collected from 15.05 until the middle of July (15.07), so both data series are exactly 2 months long.

With the relatively high standard of living in Sweden (even a suburban area of Stockholm like Husby is a long way from poverty), economic grievances were probably not the primary motivating factor in these protests (Barker 2013). The riots in Stockholm were carried out by a combination of angry local youths, radical left-wing activists and hardened criminals (Rostami 2012). The Swedish Police have already intervened in many riots of similar size, such as in Rosengård (Malmö) in 2009, or in other neighborhoods where people with socio-economic problems are often concentrated (Nilsson 2011). Criminologists note that the segregation issues that drove riots in the 20th century turn in the 21st into alienation, rootlessness, unemployment, distrust and resentment against society (Hallsworth 2011). Our goal is a longitudinal text-mining analysis of public opinion (within social media) on the riots, in order to explain the phenomena theoretically described by criminologists, shed some light on the driving factors that heat up the discourse, and set up some hypotheses about the emergence of media phenomena.

2. Twitter

2.1. Data description and objectives

We analyze ~14k Tweets in different languages (mostly Swedish and English) tagged with the hashtag Husby. This implies an international perspective among the people who express their thoughts via Twitter. Because of the multilingual nature of these Tweets, we decided not to analyze their whole content, but only co-occurrence with other hashtags. We do not differentiate who was tweeting: ordinary users, mainstream media, non-mainstream media, bloggers, activists, or even the police. Only ~8k Tweets were included in our analysis (only those with more than one hashtag).
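The selection step described above (keep only Tweets carrying Husby plus at least one other hashtag, then count tags and their co-occurrences) could be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expression, the function names, the case folding, and the choice to leave the anchor tag Husby out of the counts are all assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_tags(text):
    """Return the sorted, de-duplicated, lowercased hashtags of one tweet."""
    return sorted({tag.lower() for tag in HASHTAG.findall(text)})


def count_tags_and_pairs(tweets, anchor="husby"):
    """Count tags and tag pairs in tweets that carry the anchor tag
    plus at least one other hashtag (assumed reading of the filter)."""
    tag_counts = Counter()
    pair_counts = Counter()
    kept = 0
    for text in tweets:
        tags = extract_tags(text)
        if anchor not in tags or len(tags) < 2:
            continue  # not tagged with the anchor, or no second hashtag
        kept += 1
        others = [t for t in tags if t != anchor]
        tag_counts.update(others)
        # Each unordered pair of co-occurring tags is one "2-gram".
        pair_counts.update(combinations(others, 2))
    return kept, tag_counts, pair_counts
```

Calling `tag_counts.most_common(20)` on the result would give a ranking of the kind the study reports; merging spelling variants of a tag would be a separate, manual decision.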
We choose the 20 most frequent tags: Svpol is the most frequent, with Sthlmriots, Migpol and the rest far behind (Table 1). We decided to analyze hashtags just as they are, although some hashtags could be merged into one category (e.g. by combining Sthlmriots with Sthlmriot).

No  Hashtag      English meaning         Count
1   Svpol        Swedish Politics        3897
2   Sthlmriots   Sthlmriots              1319
3   Migpol       Migration Politics      436
4   Sthlmriot    Sthlmriot               236
5   Stockholm    Stockholm               200
6   Aftonbladet  Press company           142
7   Nymo         Press company           124
8   Rinkeby      District of Stockholm   109
9   Polisen      Police                  108
10  Sweden       Sweden                  100
11  Upplopp      Riots                   92
12  Kista        District of Stockholm   89
13  Svtdebatt    Debate in Swedish TV    82
14  Vpol         Left Politics           80
15  Debatt       Debate                  76
16  08pol        Police PR Department    75
17  Expressentv  TV Program              72
18  Megafonen    Political Activists     71
19  Kravaller    Riots                   70
20  Tensta       District of Stockholm   69

Table 1: Most frequent tags

2.2. Longitudinal analysis

We found that, for example, Svpol (Fig. 1) and Migpol (Fig. 2) are tags that were in constant use for the whole period. The cumulative hashtag count curve for them is almost linear, which means that the occurrence probability is constant in time. Moreover, every second tweet from the beginning of the riots until the middle of July is associated with the Svpol hashtag. The rest of the hashtags died out after the riots finished.

[Figure: cumulative hashtag counts for svpol and sthlmriots, 15.05 to 15.07; axis labels: Tweets, Tags used]

Fig. 1: Longitudinal analysis of the two most frequent tags (Svpol, Sthlmriots)

Moreover, for tags like Debatt or Svtdebatt, we observed that people used them only around the event (Fig. 2), which is a very common phenomenon in the Twitter world. The hashtags of some media names, like Nymo, have a stepwise shape (Fig. 2), which is also characteristic of media. These institutions publish news that is likely to be retweeted.
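The cumulative curves discussed in this section can be computed from dated tag lists. The sketch below is only an illustration under assumptions: the record layout `(day, tags)`, both function names, and the "share after a cutoff date" summary are hypothetical and not taken from the study. A tag in constant use keeps accumulating after the riots, while a tag that died out has a share close to zero.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def cumulative_tag_counts(records, tag):
    """records: iterable of (day, list_of_tags).
    Returns [(day, cumulative number of tweets using `tag`)], sorted by day."""
    daily = defaultdict(int)
    for day, tags in records:
        if tag in tags:
            daily[day] += 1
    days = sorted(daily)
    totals = accumulate(daily[d] for d in days)
    return list(zip(days, totals))


def share_after(curve, cutoff):
    """Fraction of all uses of a tag that occur strictly after `cutoff`."""
    if not curve:
        return 0.0
    total = curve[-1][1]
    before = max((count for day, count in curve if day <= cutoff), default=0)
    return (total - before) / total
```

For a perfectly linear curve over the two-month window, the share after 25.05 would simply equal the fraction of the window that lies after that date; a stepwise media tag would instead show most of its growth inside a few days.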
This retweeting explains the large number of media hashtag uses within short intervals surrounded by quiet periods (Chmiel 2011). The frequency of usage of a given media hashtag could also be an indicator of how influential that medium is.

[Figure: cumulative hashtag counts for migpol, sthlmriot, stockholm, aftonbladet, nymo, rinkeby, polisen, sweden, upplopp, kista, svtdebatt, vpol, debatt, 08pol, expressentv, megafonen, kravaller and tensta, 15.05 to 15.07; axis label: Tags used]

Fig. 2: Longitudinal analysis of the rest of the most frequent tags

2.3. Association analysis

We also try to find associations between tags. We define a link between two tags when both occur in the same tweet. Hierarchical analysis shows the leading role of the dyad Svpol-Sthlmriots (Fig. 3, Table 2). Moreover, the 2-grams (co-occurrences of 2 terms in one tweet) of the main dual dyad, sthlmriots_svpol and svpol_migpol, are a few times more frequent than the other elements (Table 2). However, we cannot call the triangle Migpol, Svpol, Sthlmriots a triad, because the link between Migpol and Sthlmriots was observed only 37 times, so it is an order of magnitude weaker than the main dual dyad.

No  Hashtag pair      Count
1   sthlmriots_svpol  533
2   svpol_migpol      353
3   svpol_sthlmriot   107
4   migpol_nymo       50
5   vpol_svpol        47

Table 2: The most frequent 2-grams. Evidence of the importance of the dual dyad sthlmriots_svpol and svpol_migpol.

Let us define the construction of the network (Fig. 3). We set a lower threshold of 2 tweets needed to create a link (everything below that seems to be noise, because a link association should be repeated at least once to avoid random effects). The thickness of a link corresponds to its weight (the count of the given 2-gram). However, the thicknesses of the dual dyad links sthlmriots_svpol and svpol_migpol were reduced so that they do not cover the whole figure.

Fig. 3: Network of connections (the link thickness of the dual dyad sthlmriots_svpol and svpol_migpol was reduced so that other links remain visible)

Network analysis (Buda 2013) shows the leading role of the dyad Svpol-Sthlmriots and the quasi-triangle Migpol, Svpol, Sthlmriots. In addition, we can find clusters of geographical districts, tags related to debate, Swedish words describing riots, and media institutions.

2.4. Conclusions of Twitter analysis

The provided analysis shows many features already known from other studies (stepwise functions for the cumulative Tweet count of media organizations, or clustering of hashtags within a similar semantic field) and from general observation, but here they are presented in a more systematized way. The most important topic is politics. Svpol and other hashtags with the same meaning are definitely the most frequent hashtags. Moreover, only Svpol and Migpol seem to be used after the riots with the same frequency as before. It would be interesting to see how the co-occurrence of political hashtags with others changes with time. Another question could be addressed with sentiment analysis: how emotionally oriented are hashtags about politics?

3. Poloniainfo.se

3.1. Data description and objectives

Internet Forums like poloniainfo.se are not broadcasting media like Twitter, but relations between users are usually stronger and more personal. Quantitative research could go deeper due to the complex relations between users (Zbieg 2012), but on the other hand the amount of data is not as impressive as for Twitter. We look at the frequencies of words used by Forum users in the Topic about the Riots, in 525 Posts. We choose only Polish words in this analysis. Firstly, we found an extremely large number of personal and possessive third-person plural pronouns (Table 3). Everything seems to be described in terms of “Them”, more often called “others” in the ethnological literature (Bauman 1996).
“They” are native Swedes, represented mostly by the government and the police, and another “They”: the riot participants. This indicates an observational and only slightly biased way of looking at the riots among Forum users (Gustafsson in press). The Polish community did not take part in the riots and, on the other hand, has no significant influence on Swedish politics. This makes the medium neutral, as views both for and against the riots were presented there. However, Polish people identify themselves culturally with the Swedish establishment and describe the problems of Husby citizens as distinct from their own perspective.

Pronoun of interest  Polish  Freq.  Comparison pronoun  Polish            Freq.
them                 im      79     us                  nam               4
the                  tym     102    us                  nam               -
them/their/theirs    ich     72     ours/our            nasze/nasz/nasza  2
they                 oni     53     we                  my                1
these                ci      48     we                  my                -
them                 nich    38     us                  nas               13

Table 3: Orientation of conversation on “them”

3.2. Methodology and data mining

We tried to group the words used in the discussion into a few categories. To do so, we chose only those words that have one clear meaning somehow related to the topic. We found 386 different words that appear at least once in our sample and seem to carry some important meaning. Of these, around 300 were assigned to categories 1-10, with subcategories described by some keywords (Table 4). Each category is assigned the number of unique words that belong to the keyword families of that category or its subcategories.
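The counting rule just described (each category receives the number of unique words belonging to its keyword families) can be illustrated with a small sketch. Everything specific here is an assumption: the study coded Polish words by hand, whereas this example uses English stand-in stems, a hypothetical three-entry excerpt of a codebook, and simple prefix matching as a crude substitute for the manual "family of keywords" judgment.

```python
from collections import defaultdict

# Hypothetical codebook excerpt: category label -> keyword stems.
# The labels follow the paper's category names; the stems are invented
# English stand-ins for illustration only.
CODEBOOK = {
    "1.2 Unemployment": ["unemploy", "poor"],
    "8.0 Police-general": ["police", "militar"],
    "9.0 Riots-general": ["riot", "stone", "street"],
}


def categorize(words, codebook):
    """Map each category to the set of unique words matching one of its stems."""
    matches = defaultdict(set)
    for word in words:
        lowered = word.lower()
        for category, stems in codebook.items():
            if any(lowered.startswith(stem) for stem in stems):
                matches[category].add(lowered)
    return matches


def category_sizes(words, codebook):
    """Number of unique matching words per category."""
    found = categorize(words, codebook)
    return {category: len(unique) for category, unique in found.items()}
```

Because the sizes count unique words rather than occurrences, repeating "police" many times does not inflate a category; only distinct word forms do.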
1.1 Employment (work/employees, hardworking, rich, money, taxes)
1.2 Unemployment (unemployment, social help, poor)
2 Family (family)
3 Religion (Islam, religion)
4 Education (education, schools, learn, language)
5 Living (apartments/residents, district)
6.0 Politics-general (government, debate, party, politicians, democracy)
6.1 Politics-multikulti (invite, acclimatization, multikulti politics, hope, tolerance, asylum, get, arrives)
6.2 Politics-segregation (racist names, eugenics, racism, segregation, deportation, hate)
7.0 Identity-general (nation, Stockholm, society)
7.1 They (immigrants, Arabian, other nations, origin)
7.2 Swedes (Swedes, Swedish, Sweden, Europe, nobility)
8.0 Police-general (police, military)
8.1 Police-induce (killed, wounds, induce, bullets, weapon, shoot, Police in slang, knife, disarm)
8.2 Police-law (law, cutthroat, action in the name of law)
9.0 Riots-general (throw, riots, night, street, violence, stones, car)
9.1 Riots-pro (rebellion, youth, protest, vulnerable)
9.2 Vandalism (fires, vandalism, aggression)
10 External fields (other riots, problems, wars, media)

Table 4: Categories, subcategories and keywords describing them

3.3. Limitation of coding

The meaning of words used by people is very difficult to classify uniquely. In our task we propose 10 main categories with 14 subcategories and assign the observed words to keywords related to a descriptive category or subcategory. The classification is based on our subjective judgment. We tried to avoid words with many meanings.
We had problems with the following words: Stockholm/Sweden (these relate not only to the city/country, but also to a geographical location); Swedish/Swedes (these describe not only citizenship of Sweden, but also the background of the riots); all words classified into the categories pro riots or pro police (words connoted with law or rebellion mostly have a positive meaning, but not always); and all words classified into the category external forces (e.g. no names of media institutions such as radio, TV, press, or Internet companies were included in the investigation). Also, all of the categories have a very wide range of potential connotations and some of them could overlap (e.g. where should the border between employment and unemployment be?), but we tried to help ourselves with the keyword list (Table 4).

3.4. Results of categorization

From the results we can conclude that the main conflict revolves around identity, the police operation and the riots themselves, and work and living issues. Politics, education and family-related issues play a secondary role, but such topics were still discussed by Forum users. The main subject of discussion seems to be identity (the largest count of related words), which was already visible in the intensive use of pronouns (Table 3). Moreover, the motor of the conflict could be defined as Swedes-They. While there are some coding problems with identity (is the literal word “Swedes” really associated with category 7.2?), the second most frequent category, Employment, is probably the biggest single issue related to Stockholm's riots mentioned by Forum users. The smallest subcategory (Police-pro) is the one related to the positive side of the police operation. The Police were described with a negative or neutral connotation an order of magnitude more often.

Fig. 4: Categories and subcategories from most to least popular

Fig. 5: Categories and categories containing subcategories. * means the sum of all counts of subcategories for the given category

3.5. Sentiment analysis of 2-grams with Police

We extract all 2-grams in which the word “police” or “police officer” appears. We run sentiment analysis on each 2-gram. The sentence sentiment strength [1] can vary from -4 (very negative) to +4 (very positive). Most of the 2-grams are neutral, with a sentiment strength of 0. We analyze all those 2-grams that were found more than once. Polish stop words (words without meaning) were also excluded. To illustrate, we show a few of the most frequent 2-grams (Table 5). To get the effective power, we multiply frequency by sentiment strength. The overall power score for all 2-grams containing the word Police is slightly negative (-5).

[1] We use the tool SentiStrength (Thelwall 2012) on English translations of the Polish sentences.

Polish 2-gram          English 2-gram    No.  Sentiment strength  Power
szwedzka policja       Swedish police    10   0                   0
policja używa          police uses       3    0                   0
granatniki policjanci  police launchers  2    -1                  -2
jaka policja           what the police   2    0                   0
mogła policja          police could      2    0                   0
mordować policja       police murders    2    -3                  -6
…                      …                 …    …                   …
Sum of Power                                                      -5

Table 5: Most frequent 2-grams with Police, Policeman or Police officer, with their sentiment score and frequency.

3.6. Conclusions and future works for Poloniainfo.se

This work has only an exploratory function, but even from these preliminary results we can propose a few hypotheses, which should be checked by deeper investigation. The first one is related to the Police operation. Please note that we do not want to evaluate the professionalism of the Police, but only public opinion about their operation. The data-mining analysis we propose has many weaknesses, and it is usually very difficult to draw clear conclusions from it. In the Police case, the order-of-magnitude difference between positive cases and both negative and neutral cases seems to be more than a methodological bias or artifact.
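The power score reported in Table 5 is plain arithmetic: frequency times sentiment strength, summed over the 2-grams found more than once. A minimal sketch using only the six rows that Table 5 lists is given below; the function names and the `min_frequency` parameter are illustrative assumptions. Note that the listed rows alone sum to -8, while the table's total of -5 also covers the rows elided with "…", which are not reproduced here.

```python
# Rows listed in Table 5: (English 2-gram, frequency, sentiment strength).
# The rows elided in the table ("…") are unknown and deliberately absent.
TABLE_5_LISTED = [
    ("Swedish police", 10, 0),
    ("police uses", 3, 0),
    ("police launchers", 2, -1),
    ("what the police", 2, 0),
    ("police could", 2, 0),
    ("police murders", 2, -3),
]


def power(frequency, strength):
    """Effective power of one 2-gram: frequency times sentiment strength."""
    return frequency * strength


def total_power(rows, min_frequency=2):
    """Sum of power over 2-grams found more than once (frequency >= 2)."""
    return sum(
        power(frequency, strength)
        for _, frequency, strength in rows
        if frequency >= min_frequency
    )
```

With so few non-neutral 2-grams, a single strongly negative phrase dominates the sum, which is one reason a larger sample would be needed before reading much into the sign of the total.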
Why did Forum users, even though they identify themselves with the Swedish establishment, say almost nothing positive about the Police? Did the PR department of the Police work properly? One user wrote that the Swedish Police, as the best-paid organization in the EU, is at the same time one of the least effective. We propose to survey public evaluation of the Police operation. Moreover, sentiment and association analysis should also give some more insight, as the preliminary results show a slightly negative emotional orientation (Table 5). This sample is unfortunately too small for any conclusions, and bigger datasets should be used to estimate the actual sentiment power.

Employment also seems to be a relatively important topic (Fig. 4, 5). The work issue, with its connotations of taxes and salaries, should be investigated more carefully to see whether it really is a leading factor in discussions about the riots. In the frequency analysis it beats some aspects of identity (and even identity as a whole, if other coders did not include geographical words, e.g. Stockholm, in the identity category) as well as religion, living conditions and education. However, this could come from a bias: the Poles describing the riots are mostly guest workers, and work, as a single theme, is the most popular topic within the Polish community in Sweden.

4. General findings, limitation of both studies and speculations

The role of the Police and politics should be investigated, because in Twitter politics (Table 2) and in the Forum the Police (Fig. 5) are the main motors of opinion spread between people. Both studies are only preliminary and just explore the field in order to set up questions to be asked by pragmatic research. Neither dataset is representative of the whole society, and the opinions shared in both media are very specific to the people who use them. Moreover, a different methodology was used in each study due to differences in the data structure itself, and even in content.
Some aspects revealed in one study were omitted in the second one. For example, media institutions, which play an important role in the hashtag study, were not categorized anywhere in the Forum case. However, similar issues come out of both media. As a result of this preliminary analysis, hypotheses about the emergence of media phenomena appear, given that riots of similar size reported by the police went unnoticed (Nilsson 2012). For example: how did influential actors who use social media, such as bloggers, journalists and political activists, make this topic so popular that it is still being discussed (Fig. 1)? The next part of this project will focus on network analysis of the role of such actors.

While sociologists try to understand the mechanisms by which riots arise, the underlying psychological and sociological patterns, and the presentation of riots in the mass media, natural language processing and text mining are great supplementary tools for that. Criminologists, on the other hand, focus on recent causation theories and suggest various ways of controlling riots (Sarnecki 2001). They are aware of similar incidents and forecast an intensification of riots in the future due to the stratification of society, with an overrepresentation of the “urban underclass”. This indicates the need to adjust computational methods to such problems, and this paper is a preliminary approach to that.

Acknowledgments

We would like to thank Hernan Mondani, Andrzej Grabowski, Fredrik Liljeros, Anita Zbieg, Lauren Dean, and Clara Lindblom for discussions. AJ thanks Svenska Institutet for the invitation to Sweden.

References

Barker, V. (2013). Policing Membership in Husby: Four Factors to the Stockholm Riots. Border Criminologies Blog Post: 3 June.
Babtist, J.M. (2010). Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331 (6014).
Bauman, Z. (1996). From pilgrim to tourist – or a short history of identity. Questions of cultural identity, pp. 19-38. London.
Buda, A. and Jarynowski, A. (2013). Network Structure of Phonographic Market with Characteristic Similarities between Artists. In: Acta Physica Polonica A, vol. 123 (2).
Chmiel, A., Sienkiewicz, J., Thelwall, M., Paltoglou, G., Buckley, K., Kappas, A., & Hołyst, J. A. (2011). Collective emotions online and their influence on community life. PLoS ONE, 6 (7), e22207.
Gustafsson, S.M., Sikström, S. & Lindholm, T. (in press). Selection Bias in Choice of Words: Evaluations of ”I” and ”We” Differ between Contexts, but ”They” are Always Worse. Journal of Social Psychology and Language.
Hallsworth, S. and Brotherton, D. (2012). Urban Disorder and Gangs: A Critique and a Warning. London: Runnymede Trust.
Jarynowski, A., Zbieg, A. and Jankowski, J. (2013). Viral spread with or without emotions in online community. arXiv preprint arXiv:1302.3086.
Mouge, P. & Contractor, N. (2003). Theories of Communication Networks. Cambridge: Oxford University Press.
Nilsson, T. and Westerberg, A. I. (2012). Våldsamma upplopp i Sverige – från avvikelse till normalitet. MSB: Stockholm.
Theguardian & LSE. (2011). Reading the Riots. London.
Thelwall, M. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, vol. 63 (1).
Yuskiv, B. (2006). Content analysis. History development and world practice. Rowne.
Rostami, A. (2013). Tusen Fiender. Linnéuniversitetet: Linkopping.
Sarnecki, J. (2001). Delinquent networks. Cambridge: Cambridge University Press.
Sobkowicz, P. and Sobkowicz, A. (2012). Two-year study of emotion and communication patterns in a highly polarized political discussion forum. Social Science Computer Review, 30 (4), 448-469.
Zbieg, A., Żak, B., Jankowski, J., Michalski, R. and Ciuberek, S. (2012). Studying Diffusion of Viral Content at Dyadic Level. In: ASONAM 2012, Istanbul.