An Army of Me: Sockpuppets in Online Discussion Communities

Srijan Kumar, University of Maryland, srijan@cs.umd.edu
Justin Cheng, Stanford University, jcccf@cs.stanford.edu
Jure Leskovec, Stanford University, jure@cs.stanford.edu
V.S. Subrahmanian, University of Maryland, vs@cs.umd.edu

ABSTRACT
In online discussion communities, users can interact and share information and opinions on a wide variety of topics. However, some users may create multiple identities, or sockpuppets, and engage in undesired behavior by deceiving others or manipulating discussions. In this work, we study sockpuppetry across nine discussion communities, and show that sockpuppets differ from ordinary users in terms of their posting behavior, linguistic traits, as well as social network structure. Sockpuppets tend to start fewer discussions, write shorter posts, use more personal pronouns such as "I", and have more clustered ego-networks. Further, pairs of sockpuppets controlled by the same individual are more likely to interact on the same discussion at the same time than pairs of ordinary users. Our analysis suggests a taxonomy of deceptive behavior in discussion communities. Pairs of sockpuppets can vary in their deceptiveness, i.e., whether they pretend to be different users, or their supportiveness, i.e., if they support arguments of other sockpuppets controlled by the same user. We apply these findings to a series of prediction tasks, notably, to identify whether a pair of accounts belongs to the same underlying user or not. Altogether, this work presents a data-driven view of deception in online discussion communities and paves the way towards the automatic detection of sockpuppets.

Keywords
Malicious users; Antisocial behavior; Multiple account use

1. INTRODUCTION
Discussions are a core mechanism through which people interact with one another and exchange information, ideas, and opinions.
They take place on social networks such as Facebook, news aggregation sites such as Reddit, as well as news websites such as CNN.com. Nonetheless, the anonymity afforded by some discussion platforms has led to some users deceiving others using multiple accounts, or sockpuppets [13]. Sockpuppetry is often malicious and deceptive, and has been used to manipulate public opinion [2, 38] and vandalize content (e.g., on Wikipedia [37]).

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4913-0/17/04. http://dx.doi.org/10.1145/3038912.3052677

Figure 1: AVClub.com social network. Nodes represent users and edges connect users that reply to each other. Sockpuppets (red nodes) tend to interact with other sockpuppets, and are more central in the network than ordinary users (blue nodes).

Prior work on sockpuppetry and deceptive behavior has tended to focus on individual motivations [9, 17], or on identifying sockpuppets through their linguistic traits [6] (e.g., on Wikipedia [37, 40]). Further, given the difficulty of obtaining ground-truth data about sockpuppets, work has also tended to make assumptions about how sockpuppets behave, for example, assuming that they have similar usernames [26], that they are only used to support one another [47], or that they write similarly to each other [6]. Research has also generally not considered how the interactions between sockpuppets controlled by the same individual could be used to accurately and automatically identify sockpuppets. As such, improved methods for identifying sockpuppets, as well as deeper analyses of how sockpuppets interact with one another, may allow us to better understand, characterize, and automatically detect sockpuppetry.

The present work: Sockpuppetry in online discussion communities.
In this paper, we focus on identifying, characterizing, and predicting sockpuppetry in nine different online discussion communities. We broadly define a sockpuppet as a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account. By considering less easily manipulated behavioral traces such as IP addresses and user session data, we automatically identified 3,656 sockpuppets comprising 1,623 sockpuppet groups, where a group of sockpuppets is controlled by a single puppetmaster.

Studying these identified sockpuppets, we discover that sockpuppets differ from ordinary users in terms of how they write and interact with other sockpuppets. Sockpuppets have unique linguistic traits, for example, using more singular first-person pronouns (e.g., "I"), corroborating prior work on deception [2]. They also use fewer negation words, perhaps in an attempt to appear more impartial, as well as fewer standard English parts-of-speech such as verbs and conjunctions. Suggesting that sockpuppets write worse than ordinary users on average, we find that their posts are more likely to be downvoted, reported by the community, and deleted by moderators. Sockpuppets also start fewer discussions.

Examining pairs of sockpuppets controlled by the same puppetmaster, we find that sockpuppets are more likely to post at the same time and post in the same discussion than random pairs of ordinary users. As illustrated in Figure 1, by studying the network of user replies, we find that sockpuppets have a higher PageRank and higher local clustering coefficient than ordinary users, suggesting that they are more important in the network and tend to generate more communication between their neighbors. Further, we find that pairs of sockpuppets write more similarly to each other than to ordinary users, suggesting that puppetmasters tend not to have both "good" and "bad" accounts.
While prior work characterizes sockpuppetry as malicious [37, 14, 40], not all the sockpuppets we identified were malicious. In some sockpuppet groups, sockpuppets have display names significantly different from each other, but in other groups, they have more similar display names. Our findings suggest a dichotomy in how deceptive sockpuppets are: some are pretenders, which masquerade as separate users, while others are non-pretenders, that is, sockpuppets that are overtly visible to other members of the community. Pretenders tend to post in the same discussions and are more likely to have their posts downvoted, reported, or deleted compared to non-pretenders. In contrast, non-pretenders tend to post in separate discussions, and write posts that are longer and more readable.

Our analyses also suggest that sockpuppets may differ in their supportiveness of each other. Pairs of sockpuppets controlled by the same puppetmaster differ in whether they agree with each other in a discussion. While sockpuppets in a pair mostly remain neutral towards each other (or are non-supporters), 30% of the time, one sockpuppet in a pair is used to support the other (or is a supporter), while 10% of the time, one sockpuppet is used to attack the other (or is a dissenter). Studying both deceptiveness and supportiveness, we find that supporters also tend to be pretenders, but dissenters are not more likely to be pretenders, suggesting that deceptiveness is only important when sockpuppets are trying to create an illusion of public consensus.

Finally, we show how our previous observations can be used to develop models for automatically identifying sockpuppetry. We demonstrate robust performance in differentiating pairs of sockpuppets from pairs of ordinary users (ROC AUC = 0.90), as well as in the more difficult task of predicting whether an individual user account is a sockpuppet (ROC AUC = 0.68).
We discover that the strongest predictors of sockpuppetry relate to interactions between the two sockpuppet accounts, as well as the interactions between the sockpuppet and the community.

Altogether, our results begin to shed light on how sockpuppetry occurs in practice, and pave the way towards the development and maintenance of healthier online communities.

Community    Genre           # Users    # Sockpuppets   # Sock-groups
CNN          News            846,436    1,191           523
Breitbart    News            196,846    761             352
allkpop      Music           159,671    445             193
MLB          Sports          115,845    339             139
IGN          Games           266,976    314             142
Fox News     News            145,009    214             94
A.V. Club    Entertainment   37,332     199             90
The Hill     Politics        158,378    134             62
NPR          News            65,662     59              28

Table 1: Statistics of the nine online discussion communities.

2. DATA AND DEFINITIONS
We start by laying out the terminology that we use in the remainder of the paper. We then describe the data we used in our analysis and a robust method for automatically identifying sockpuppets.

Sockpuppetry. While in prior work the term sockpuppet has typically referred to a false online identity that is used for the purposes of deceiving others [26, 47], we adopt a broader definition. We define a sockpuppet as a user account controlled by an individual who has at least one other account. In other words, if an individual controls multiple user accounts, each account is a sockpuppet. The individual who controls these sockpuppets is referred to as the puppetmaster. We use the term sockpuppet group/pair to refer to all the sockpuppets controlled by a single puppetmaster. In each sockpuppet group, one sockpuppet typically has made significantly more comments than the others; we refer to this sockpuppet as the primary sockpuppet, and the other sockpuppets as secondary. Finally, we use ordinary user to refer to any user account that is not a sockpuppet.

We study sockpuppets in the context of online discussion communities.
In these communities, people can create accounts to comment on articles. In addition to writing or replying to posts, users can also vote on posts or report them for abuse. Moderators can, in turn, delete posts that do not conform to community standards. If a post is not a reply to another post, we call that post a root post. We define a discussion as all the posts that follow a given news article, and a sub-discussion as a root post and any replies to that post.

Data. The data consists of nine different online discussion communities that encompass a variety of topical interests, from news and politics to sports and entertainment (Table 1). Disqus, a commenting platform that hosted these discussions, provided us with a complete trace of user activity across nine communities that consisted of 2,897,847 users, 2,129,355 discussions, and 62,744,175 posts. Each user has a display name, which appears next to their posts, and an email address, which is private. (To respect user privacy, all email addresses were stripped of the domain names and analyzed in aggregate.) Each post is also associated with an anonymized IP address of the posting user.

Identifying sockpuppets. No explicit labels of sockpuppets exist in any of the discussion communities, so to identify sockpuppets, we use multiple signals that together suggest that accounts are likely to share the same owner: the IP address of a comment, the times at which comments are made, and the discussions they post in. Our approach draws on the approach adopted by Wikipedia administrators, who identify sockpuppets by finding accounts that make similar edits on the same Wikipedia article, at similar times and from the same IP address [1]. As we are primarily interested in identifying sockpuppets with high precision, the criteria we adopt are relatively conservative; relaxing these criteria may improve recall, but at the cost of more false positives.
Figure 2: Varying K_min, the minimum number of overlapping sessions between users for them to be identified as sockpuppets, for ordinary pairs and sockpuppet pairs. (a) Time difference between posts. (b) Difference in length of posts. For sockpuppet pairs (blue), the time between posts and the difference in post lengths reach a minimum value at K_min = 3.

To limit spurious detection of sockpuppets, we filter the data and remove any IP addresses used by many user accounts, as these accounts may simply be accessed from behind a country-wide proxy or intranet. We also do not consider user accounts that post from many different IP addresses, since they have a high chance of sharing an IP address with other accounts. Specifically, for each discussion community, we remove the top 5% most used IP addresses and the top 5% of accounts that have the most IP addresses.

Then, we identify sockpuppets as user accounts that post from the same IP address in the same discussion within T minutes for at least K_min different discussions. Here, we set T = 15 minutes (larger values result in empirically similar findings). To pick an appropriate value for K_min, we use two metrics that prior work has found indicative of sockpuppets: the time difference between posts made by two accounts, and the difference in the length of posts [6, 40, 20, 33]. Figure 2 plots these quantities for identified pairs of sockpuppets as well as random pairs of users. We observe that regardless of the value of K_min, the identified sockpuppets post more closely in time and write posts more similar in length, compared to a random pair of users. Further, we find that among sockpuppet pairs, these quantities achieve their minimum at K_min = 3, which means that sockpuppets are most reliably identified at that value of K_min.
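The pairing criterion above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `find_sockpuppet_pairs`, the tuple layout of `posts`, and the use of timestamps in seconds are all hypothetical, the window is treated as inclusive, and the top-5% filtering of IP addresses and accounts is assumed to have been applied beforehand.

```python
from collections import defaultdict
from itertools import combinations


def find_sockpuppet_pairs(posts, t_minutes=15, k_min=3):
    """Return account pairs that satisfy the pairing criterion.

    posts: iterable of (account, ip, discussion_id, timestamp_seconds).
    This record layout is an assumption made for illustration only.
    """
    window = t_minutes * 60

    # Bucket posts by (IP, discussion): only posts sharing both can match.
    buckets = defaultdict(list)
    for account, ip, discussion, ts in posts:
        buckets[(ip, discussion)].append((ts, account))

    # For every pair of distinct accounts, collect the distinct discussions
    # in which they posted from the same IP within the time window.
    shared = defaultdict(set)
    for (_ip, discussion), entries in buckets.items():
        for (ts_a, acc_a), (ts_b, acc_b) in combinations(entries, 2):
            if acc_a != acc_b and abs(ts_a - ts_b) <= window:
                shared[tuple(sorted((acc_a, acc_b)))].add(discussion)

    # Keep pairs that co-occur in at least k_min different discussions.
    return {pair for pair, seen in shared.items() if len(seen) >= k_min}
```

The pairwise comparison inside each bucket is quadratic in the bucket size, which is acceptable for a sketch; sorting each bucket by timestamp and sliding a window over it would scale better on a real trace.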
To summarize, we define sockpuppets as user accounts that post from the same IP address in the same discussion in close temporal proximity at least 3 times. We then define sockpuppet groups as maximal sets of accounts such that each account satisfies the above definition with at least one other account in the group. Overall, we identify a total of 1,623 sockpuppet groups, consisting of 3,656 sockpuppets from nine different online discussion communities (Table 1). As most sockpuppet groups contain two sockpuppets (Figure 3(a)), we focus our analyses on pairs of sockpuppets.

We give an example of an identified sockpuppet pair below, in an exchange involving three users: S_1 and S_2 comprise a pair of sockpuppets, while O is an ordinary user. After an unusually positive interaction between the two sockpuppets, O identifies them as being controlled by the same puppetmaster:

S_1: Possibly the best blog I've ever read major props to you.
→ S_2: Thanks. I knew Marvel fans would try to flame me, but they have nothing other than "oh that's your opinion" instead of coming up with their own argument.
→ O: Quit talking to yourself, *******. Get back on your meds if you're going to do that.

Figure 3: (a) Number of sockpuppet groups, i.e. sockpuppets belonging to the same puppetmaster, by sockpuppet group size (number of groups on a log scale). (b) Fraction of sockpuppet groups by the fraction of posts of the first sockpuppet account. The second sockpuppet in a group tends to be created shortly after the first.

Figure 4: Two hypotheses on how the similarity of sockpuppet pairs and ordinary users relate to each other. Top: Under the double life hypothesis, sockpuppet S_1 is similar to an ordinary user O, while S_2 deviates. Bottom: The alternative hypothesis is that both sockpuppet accounts are highly different from ordinary users.

3. CHARACTERIZING SOCKPUPPETRY
Having identified sockpuppets, we now turn to characterizing their behavior. We study when sockpuppets are created and how their language and social networks differ from ordinary users across all nine discussion communities.

Sockpuppets are created early. To understand when sockpuppets are created, we examine the activity of the first sockpuppet account in each sockpuppet pair. Figure 3(b) shows the fraction of the total number of posts made by the first sockpuppet before the second sockpuppet is created. The second sockpuppet tends to be created during the first 10% of the posts, with a median of 18% of posts written by the first sockpuppet before the second sockpuppet begins posting. In other words, sockpuppets tend to be created early in a user's lifetime, which may indicate that sockpuppet creation is premeditated and not a result of the user's interactions in the community.

Matching sockpuppets with ordinary users. On average, sockpuppets write more posts than ordinary users (699 vs. 19) and participate in more discussions (141 vs. 7). To control for this disparity, in all our subsequent analyses we use propensity score matching [34] to match sockpuppets with ordinary users that have similar numbers of posts and make posts to the same set of discussions.

3.1 Do puppetmasters lead double lives?
First, we explore an important question about how the behavior of sockpuppets S_1 and S_2 controlled by the same puppetmaster relates to that of an ordinary user O. Two possible hypotheses are illustrated in Figure 4. First is the double life hypothesis, where the puppetmaster maintains a distinct personality for each sockpuppet: one sockpuppet, S_1, behaves like an ordinary user, while the other, S_2, behaves more maliciously. Under this hypothesis we would expect the linguistic similarity of posts written by O and S_1 to be high, and that between S_1 and S_2, as well as between O and S_2, to be significantly lower. In the alternative hypothesis (Figure 4 (bottom)), both sockpuppets act maliciously. In this case, we might expect that the linguistic similarity between S_1 and S_2 would be low, but that between S_1 and O, and between S_2 and O, would be much lower.

Figure 5: Difference in properties of sockpuppet pairs and that of sockpuppet-ordinary pairs: (a) difference in average word length, (b) difference in ARI, (c) difference in number of function words, (d) difference in number of personal pronouns, (e) difference in assent words. Sockpuppet pairs are more similar to each other in several linguistic attributes.

To find out which is the case, we compare the language of pairs of sockpuppets, and that of each sockpuppet with an ordinary user. To control for user activity, we again match sockpuppets with ordinary users that have similar posting activity, and that participate in similar discussions. Specifically, for each user, we created a feature vector consisting of several linguistic features computed from that user's posts, including LIWC categories and sentiment, the average number of words in a post, and the average fraction of special characters. We then compute the cosine similarity of the feature vectors. We find that on average, the two sockpuppets are more similar to each other than either is to an ordinary user (p < 0.001).
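The comparison above relies on the cosine similarity of per-user feature vectors. A minimal sketch follows, assuming each user has already been reduced to a fixed-length list of numeric features; the helper name `cosine_similarity` and the convention of returning 0.0 for an all-zero vector are choices made here for illustration, not details taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention assumed here: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def sockpuppets_closer_than_ordinary(s1, s2, o):
    """True if S_1 and S_2 are more similar to each other than either is to O."""
    pair = cosine_similarity(s1, s2)
    return pair > cosine_similarity(s1, o) and pair > cosine_similarity(s2, o)
```

Because cosine similarity ignores vector magnitude, features on very different scales (for example word counts next to fractions) would normally be standardized first; how the original analysis scaled its features is not stated in the text.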
Figure 5 highlights that these observations hold for individual features as well: the differences between two sockpuppets' readability scores (ARI), average word length, number of function words, personal pronouns and assent words are smaller than those between either sockpuppet and an ordinary user. This suggests that the double life hypothesis (Figure 4 (top)) is less likely to be true than the alternative hypothesis (Figure 4 (bottom)).

In other words, this experiment suggests that puppetmasters do not lead double lives, and that it is generally not the case that individual sockpuppets controlled by the same puppetmaster behave differently. Rather, sockpuppets as a whole tend to write differently from ordinary users, and sockpuppets controlled by the same puppetmaster all tend to write similarly to each other.

3.2 Linguistic Traits of Sockpuppets
Having established that different sockpuppets controlled by the same puppetmaster behave consistently, we now turn our attention to quantifying their linguistic traits more precisely. Here, we focus on comparing various measures of similarity sim(S_i, O) of a sockpuppet S_i (i ∈ {1, 2}) and a matched ordinary user O. Specifically, we use LIWC word categories [31] to measure the fraction of each type of words written in all posts, and VADER [18] to measure sentiment of posts. We report the average values for sockpuppets and the corresponding p-values by performing paired t-tests for each sockpuppet and its matching ordinary user.

Do sockpuppets write differently from ordinary users? Linguistic traits have been used to identify trolls in discussions [11], vandalism on Wikipedia [32], and fake reviewers on e-commerce platforms [27]. For example, deceptive authors tend to increase their usage of function words, particles and personal pronouns [2, 5].
They use more first- and second-person singular personal pronouns (e.g., 'I', 'you'), while reducing their use of third-person singular personal pronouns (e.g., 'his', 'her'). They also tend to oversimplify their writing style by writing shorter sentences and writing words with fewer syllables.

Figure 6: (a) Histogram of the most active topic for each sockpuppet account (number of sockpuppets per topic: usa, world, politics, justice, opinion, blog, showbiz, health, tech, sport, travel, living, business). (b) Account edit entropy of primary and secondary sockpuppets. In a sockpuppet group, the secondary sockpuppets tend to be used alongside the primary sockpuppet.

We make similar observations with respect to sockpuppets. First, we observe that they tend to write posts that are more self-centered and use "I" more often than ordinary users (0.076 for sockpuppets vs 0.074 for ordinary users, p < 0.001). Sockpuppets also use "you" more often (0.017 vs 0.015, p < 0.01) but third-person singular personal pronouns and plural personal pronouns (i.e., 'we', 'he', 'she', and 'they') less (e.g., 0.016 vs 0.018 for 'he/she' words, p < 0.001), indicating that they tend to address other users in the community more directly. Similarly, we observe that sockpuppets also write shorter sentences than ordinary users (a mean of 12.4 vs. 12.9 words per sentence, p < 0.001). However, in contrast to prior work on deceptive writing, sockpuppets use a similar number of syllables per word (1.29 vs 1.28, p = 0.35).

Turning to differences in LIWC categories, we observe that sockpuppets also appear to write worse than ordinary users. They are more likely to swear (0.003 vs 0.002, p < 0.05) and use more punctuation (0.057 vs 0.055, p < 0.05), while using fewer alphabetic characters (0.769 vs 0.771, p < 0.05). Sockpuppets also use fewer standard English parts-of-speech, such as articles, verbs, adverbs and conjunctions (e.g.,
for articles, 0.062 vs 0.064, p < 0.001). However, while trolls wrote posts that were less readable [11], sockpuppets write posts with similar readability (automated readability index, or ARI, of 11.24 vs 11.41 for ordinary users, p = 0.09). Sockpuppets also tend to agree more in their posts (0.002 vs -0.012, p < 0.05), possibly to minimize conflict with others and support their other sockpuppet account. They also express less negative sentiment (0.022 vs 0.023, p < 0.001), though their overall sentiment, subtracting negative from positive sentiment, is similar to that of ordinary users (0.030 vs 0.028, p = 0.43).

3.3 Activity and Interactions
Next, we study how sockpuppets interact with the community at large, and how it responds to these sockpuppets.

Sockpuppets start fewer discussions, and post more in existing discussions. First, we note that sockpuppets start fewer discussions, and instead post more within existing discussions (65% of sockpuppets' posts are replies compared to 51% for ordinary users, p < 0.001). This shows that sockpuppets are mainly used to reply to other users.

Sockpuppets tend to participate in discussions with more controversial topics. Do sockpuppets create accounts to participate in certain topics? To answer this, we look at the topics of the discussions on CNN in which sockpuppets post. As shown in Figure 6(a), topics that tend to attract more controversy, such as usa, world, politics, justice and opinion, also attract the majority of sockpuppets, while other topics such as health and showbiz have comparatively fewer sockpuppets. This indicates that one of the main motivations for using sockpuppets is to build support for a particular position, corroborating prior work [6, 47].

Sockpuppets are treated harshly by the community. A community can provide feedback to a sockpuppet in three ways: other users can vote on or report their posts, and moderators can delete the posts.
Comparing the posts made by sockpuppets with those made by ordinary users, we find that sockpuppets' posts receive a greater fraction of downvotes (0.51 vs 0.43, p < 0.001), are reported more often (0.05 vs 0.026, p < 0.001) and are also deleted more often (0.11 vs 0.08, p < 0.001). Moreover, sockpuppets are also blocked by the moderators more often (0.09 vs 0.07, p < 0.001). Overall, this suggests that sockpuppets are making undesirable comments.

Sockpuppets in a pair interact with each other more. Pairs of sockpuppets also tend to post together in more sub-discussions compared to random pairs of ordinary users (6.57 vs 0.33, p < 0.001). Moreover, looking at when posts are made, pairs of sockpuppets also post more frequently on the same discussion within 15 minutes of each other (7.8 vs 4.28, p < 0.001). In other words, pairs of sockpuppets are significantly more likely to interact with one another, and post at the same time, than two ordinary users would.

Sockpuppets in a pair upvote each other more. Looking at votes, pairs of sockpuppets vote significantly more often on each other's posts than random pairs of ordinary users (9.35 vs 0.40 votes, p < 0.001). Among the two sockpuppets in a pair, the secondary sockpuppet votes more on the primary sockpuppet's posts than vice versa (14.2 vs 4.5 votes, p < 0.01). Moreover, the votes that pairs of sockpuppets give each other are more often positive than those exchanged by ordinary users (0.987 vs 0.952; p < 0.05). Altogether, sockpuppets in a pair use their votes to significantly inflate one another's 'popularity'.

Secondary sockpuppets are used in conjunction with primary sockpuppets. Puppetmasters, while controlling multiple sockpuppets, may either use multiple sockpuppets at the same time, or different sockpuppets at different times.
To quantify how a puppetmaster may switch between using different sockpuppets, we compute the fraction of consecutive posts made by a particular sockpuppet, and then compute the entropy of this distribution. That is, each uninterrupted run of posts by a sockpuppet contributes its share of that sockpuppet's total posts. This way we quantify how intertwined the usage of the two sockpuppet accounts is.

For instance, consider two sockpuppets controlled by the same puppetmaster. The puppetmaster first uses S_1 to write 5 posts, then uses S_2 to write 1 post, switches back to S_1 to write 4 more posts, and then finally switches back to S_2 to write 1 more post. The entropy of this post sequence for S_1 is $-\frac{5}{9}\log\frac{5}{9} - \frac{4}{9}\log\frac{4}{9}$, while that for S_2 is $-\frac{1}{2}\log\frac{1}{2} - \frac{1}{2}\log\frac{1}{2}$.

Figure 7: Comparison of the ego networks of sockpuppets and similar random users in the reply network: (a) average clustering coefficient, (b) average reciprocity, (c) fraction of initiated interactions.

Thus, a lower entropy signifies that a particular sockpuppet is not being used at the same time as the other sockpuppet, while higher entropy indicates that the sockpuppet is being used at the same time as another sockpuppet, with the puppetmaster constantly switching between the two.

Figure 6(b) shows the entropy of the primary and the secondary sockpuppets in a sockpuppet pair. We find that secondary accounts tend to have higher entropy, meaning that these sockpuppets are more likely to be used in conjunction with the primary sockpuppet, and thus may be used to support the primary account (e.g., in writing supportive replies). In contrast, primary sockpuppets have lower entropy, meaning they tend to be used more exclusively.

3.4 Reply network structure
Last, we examine the user-user interaction network of the entire discussion community.
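As a brief aside on the switching entropy of Section 3.3, the worked example there (5 posts by S_1, 1 by S_2, 4 by S_1, 1 by S_2) can be reproduced with the sketch below. It assumes the distribution is taken over uninterrupted runs of posts, as that example implies, and it assumes the natural logarithm since the base is not stated; the function name `switching_entropy` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import groupby


def switching_entropy(post_sequence):
    """Entropy of each account's run-length distribution.

    post_sequence: account labels in posting order, e.g. ["S1", "S1", "S2"].
    Each uninterrupted run of posts contributes the fraction of that
    account's posts it contains; the entropy is taken over these fractions.
    The natural logarithm is an assumption (the base is not specified).
    """
    runs = defaultdict(list)
    for account, group in groupby(post_sequence):
        runs[account].append(sum(1 for _ in group))

    entropies = {}
    for account, lengths in runs.items():
        total = sum(lengths)
        entropies[account] = -sum(
            (n / total) * math.log(n / total) for n in lengths
        )
    return entropies


# The sequence from the worked example in Section 3.3.
example = ["S1"] * 5 + ["S2"] + ["S1"] * 4 + ["S2"]
example_entropy = switching_entropy(example)
```

An account used in a single uninterrupted run gets entropy zero under this definition, which matches the reading that low entropy means the account is used exclusively rather than interleaved with the other one.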
To do this, we create a reply network, where a node represents a user and an edge from node A to node B indicates that A replied to B's post at least once. Figure 1 shows the reply network of The AV Club discussion community, with red nodes denoting the sockpuppets and blue nodes denoting ordinary accounts. Here, we observe that the nodes denoting sockpuppets are more central in the network. In particular, we find that sockpuppets tend to have higher PageRank than ordinary users (2 × 10^-4 vs 1 × 10^-6, p < 0.001).

To further understand the differences in how the sockpuppets interact with other users, we additionally compare the ego network of sockpuppets with that of ordinary users (Figure 7). We observe that both the number of nodes and the density of the ego networks of sockpuppets and ordinary users are similar (291.5 vs. 291.3 nodes, p = 0.97, and densities of 0.24 vs. 0.22, p < 0.01). However, the ego networks of sockpuppets are more tightly knit, as measured by the average clustering coefficient (Figure 7(a), 0.52 vs 0.49, p < 0.001). The nodes in a sockpuppet's ego network reply more to their neighbors, as measured by the average reciprocity (Figure 7(b), 0.48 vs 0.45, p < 0.001), with sockpuppets generally initiating more interactions (that is, they reply to more users than reply to them, Figure 7(c), 0.51 vs 0.46, p < 0.001). These observations suggest that sockpuppets are highly active in their local network, and also generate more activity among the other users.

4. TYPES OF SOCKPUPPETRY
Different types of sockpuppets exist, and their characteristics suggest that they may serve different purposes. Here, in contrast to prior work which assumes that sockpuppets usually pretend to be other users [37, 14, 40], we find that sockpuppets can differ in their deceptiveness: while many sockpuppets do pretend to be different users, a significant number do not.
When sockpuppets participate in the same discussions, they may also differ in their supportiveness: some sockpuppets may be used to support other sockpuppets of the same puppetmaster, while others are not.

Figure 8: Differences between pretenders and non-pretenders: (a) fraction of upvotes, (b) fraction of special characters in posts, (c) number of characters per sentence, (d) average sentiment, (e) usage of first person pronoun ("I").

Figure 9: The (a) display names and (b) email addresses of the sockpuppet accounts are more similar to each other, as measured by Levenshtein distance, compared to similar random pairs. (c) Number of sockpuppet pairs and random pairs by Levenshtein distance between display names. Based on the distance of display names, sockpuppets can be pretenders (high distance) or non-pretenders (low distance).

4.1 Deceptiveness: Pretenders vs. non-pretenders
A pair of sockpuppets can pretend to be two separate individuals, or may simply be two user accounts an individual uses in different contexts, without any masquerading. We refer to the former group of sockpuppets as pretenders, and the latter group as non-pretenders.
One way we might quantify the deceptiveness of a sockpuppet pair is to examine the similarity of display names and email addresses (we only examine the part of the email address before the @-sign). Display names are public and show up next to a user's comments, while email addresses are private and only visible to forum administrators. If a pair of sockpuppets wants to appear as two separate users, each may adopt a display name that is substantially different from the other in order to deceive community members. Puppetmasters may also adopt significantly different email addresses to avoid detection by system administrators. To quantify this difference, we measure the Levenshtein distance between two display names, as well as between the corresponding email addresses.

Figure 10: (a) Based on the fraction of common discussions between sockpuppet pairs, there are two types of sockpuppets: independent, which rarely post on the same discussion, and sock-only, which only post on the same discussions. (b) Increase in display name distance is highly correlated with the fraction of common discussions (correlation coefficient = 0.71, p < 10⁻⁴).

Figures 9(a) and 9(b) compare how display names and email addresses differ between pairs of sockpuppets, and between random pairs of ordinary users. We observe that sockpuppet pairs have both more similar display names and more similar email addresses than would be expected by comparing random pairs of users. This also serves as evidence that the sockpuppets we identified are likely to have been created by the same individual.
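The name and email comparison described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the choice to compare strings exactly as given (no lowercasing or other normalization) is an assumption, since the text does not say how names were preprocessed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def email_local_part(address: str) -> str:
    """Part of the email address before the @-sign, as in the text."""
    return address.split("@", 1)[0]


def pair_distances(name_a: str, email_a: str, name_b: str, email_b: str):
    """Hypothetical helper: (display-name distance, email distance) for a pair."""
    return (levenshtein(name_a, name_b),
            levenshtein(email_local_part(email_a), email_local_part(email_b)))
```

For a pair of accounts, `pair_distances` returns the two distances that Figures 9(a) and 9(b) summarize; the email domain is ignored, matching the restriction to the part before the @-sign.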
Further, we also observe that email addresses of sockpuppets are 50% more similar than those of ordinary accounts, while display names of sockpuppets are only 25% more similar than expected at random. This observation may be explained by the fact that sockpuppets put more effort into picking unique display names, which are public-facing, and less effort into picking unique email addresses, which are private-facing and less likely to be noticed.

But are all sockpuppets simply more likely to have more similar display names? Examining the distribution of the distances between display names in Figure 9(c), we find that the distribution for random pairs is unimodal, while for sockpuppets it is bimodal. This bimodality suggests that two types of sockpuppet pairs exist. The first type of sockpuppets has virtually identical display names (Levenshtein distance < 5), and these are what we call non-pretenders. The second type of sockpuppets has substantially different display names (Levenshtein distance ≥ 5), and we call them pretenders. Pretenders are likely to be created for deception and use different display names to avoid detection. Non-pretenders, on the other hand, have similar display names; this may implicitly signal to the community that they are controlled by the same individual, and thus they may be less likely to be malicious.

Across all communities, we find 947 pretender and 403 non-pretender sockpuppet groups. We observe that pretenders tend to participate in the same discussions. For example, Figure 10(a) plots the fraction of common discussions over all sockpuppet groups. We observe bimodality here as well, which may be partially explained by the bimodality of the distribution of display name distances: Figure 10(b) shows that the likelihood of a pair of sockpuppets participating in the same discussion increases as their display names become more different (fitted regression line shown in solid for clarity).
In other words, these observations suggest that sockpuppets that participate in many common discussions have very different display names (high Levenshtein distance) and are thus pretenders, while accounts that participate in few common discussions tend to have similar display names and are thus non-pretenders.

Figure 8 additionally illustrates other differences between pretenders and non-pretenders. We find that pretenders' posts are more likely to be reported (a 0.06 fraction of all pretenders' posts are reported vs. 0.03 for non-pretenders, p < 0.001), more likely to be deleted by moderators (0.11 vs. 0.08, p < 0.001), and receive a smaller fraction of upvotes (0.45 vs. 0.53, p < 0.001). Pretenders also write posts that contain more uppercase (0.07 vs. 0.05, p < 0.001) and special characters (0.06 vs. 0.05, p < 0.001), which suggests both shouting and swearing. In contrast, non-pretenders wrote posts which were longer (35.2 words vs. 38.6, p < 0.05) and more readable (ARI 11.15 vs. 11.58, p < 0.05). Pretenders' posts also contained more positive sentiment (0.04 vs. 0.013, p < 0.001) and agreement words (0.006 vs. 0.005, p < 0.001), suggesting that they tended to be more affable.

4.2 Supporters vs. Dissenters

Prior work suggests that a primary purpose of sockpuppets is to sway public opinion by creating consensus [35]. Thus, we focus our attention on sockpuppets participating in the same discussion, and examine how they interact with each other: do sockpuppets tend to support each other?

We study two ways in which a sockpuppet pair (S1, S2) may interact: directly, where one sockpuppet (S2) replies to another sockpuppet (S1), or indirectly, where one sockpuppet (S2) replies to a third user (O) who had replied to the first sockpuppet (S1).
We focus on the extent to which sockpuppet S2 agrees with sockpuppet S1, and measure agreement as the difference between the fraction of words categorized by LIWC as assenting and the fraction categorized as either negations or dissenting [45]. We additionally adjust the sign of agreement depending on who the replying sockpuppet is replying to. For example, S2 may write a post disagreeing with an ordinary user O. But if that ordinary user O in turn disagreed with the initial sockpuppet S1, then we assume that the replying sockpuppet S2 is instead in agreement with the initial sockpuppet S1.

We divide sockpuppets into three groups: supporters, who have a positive agreement score; non-supporters, who have an agreement score of zero; and dissenters, who have a negative agreement score. Across all communities, we find that 60% of the sockpuppets are non-supporters, while 30% are supporters. Only 10% of sockpuppets are dissenters. Examining these discussions, we find evidence that supporters tend to support the arguments of S1 (e.g., 'I agree' or 'so true'), and sometimes make additional arguments in their favor (e.g., 'That will cost him the election [...]'). On the other hand, dissenters tend to argue against S1 (e.g., 'That's not what you said [...]'). We hypothesize that one reason sockpuppets may disagree with each other, despite being controlled by the same puppetmaster, may simply be to attract more attention to the argument. In some cases, we observed a dissenter making easily refutable arguments (e.g., 'Ok if your [sic] so worried about being spied Throw away all your electronics.'), which may have served to discredit the opposing view.

Altogether, these observations suggest that within discussions, sockpuppets may adopt different roles. While most sockpuppets argue for other sockpuppets controlled by the same puppetmaster, a small but significant number argue against other sockpuppets instead.
Nonetheless, is there a relationship between deceptiveness and supportiveness? Table 2 shows that overall, users who support other users in discussions are most likely to be also pretending to be other users (74% of supporters are pretenders). Interestingly, when users dissent in a discussion, they are less likely to be a pretender. This suggests that pretending is most important when a puppetmaster is trying to create an illusion of consensus.

                 Pretender   Non-pretender
Supporter        0.74        0.26
Non-supporter    0.70        0.30
Dissenter        0.58        0.42

Table 2: 74% of supporters, 70% of non-supporters and 58% of dissenters are pretenders.

Feature Set   Features
Activity      Reply ego-network clustering coefficient and reciprocity; number of posts; proportion of reply posts; time between posts; tenure time
Community     Whether account is blocked; fraction of upvotes; fraction of reported and deleted posts
Post          Number of characters, syllables, words, sentences; fraction of punctuation, uppercase characters, etc.; number of syllables per word, words per sentence, etc.; readability metrics (e.g., ARI); LIWC (e.g., swear words); agreement, sentiment and emotion strength

Table 3: Three sets of features were used to identify sockpuppets and sockpuppet pairs.

5. DETECTING SOCKPUPPETS

Our previous analysis found that sockpuppets generally contribute worse content and engage in deceptive behavior. Thus, it would be useful to create automated tools that can help identify sockpuppets, and assist moderators in policing online communities. In this section, we consider two classification tasks, both of which relate to the prediction of sockpuppetry. First, can we distinguish sockpuppets from ordinary users? And second, can we identify pairs of sockpuppets in the communities?
Based on the observations and findings from the analyses in the previous sections, we identify three sets of features that may help in finding sockpuppets and sockpuppet pairs: activity features, community features, and post features. For each user U, we develop the following features:

Activity features: This set of features is derived from U's posting activity. Prior research has shown that the activity behavior of bots, spammers, and vandals is different from that of benign users [10, 24, 25, 12, 40]. Moreover, in our analysis, we have seen that sockpuppets make more posts and start fewer sub-discussions. Therefore, the activity features we consider include the number of posts, the proportion of posts that are replies, the mean time between two consecutive posts, and U's site tenure, or the number of days from U's first post. Further, we use features based on how U is situated in the reply network. Here, U's local network consists of U, the users whose posts U replied to, and the users that replied to U's posts. We then consider the clustering coefficient and reciprocity of this network. In addition, for the task of identifying pairs of sockpuppets, we use the number of common sub-discussions between these sockpuppets to measure how often the two comment together.

Community features: Interactions between a user and the rest of the community may also be indicative of sockpuppetry. Community feedback on an account's posts has been effective in identifying trolls and cheaters [11, 4], and we also observed that sockpuppets are treated more harshly than ordinary users. Thus, we consider the fraction of downvotes on posts U wrote, as well as the fraction that were reported or deleted, in addition to whether U was blocked.

Figure 11: Classification performance (AUC) to identify (a) sockpuppets from ordinary users (All 0.68, Activity 0.59, Community 0.54, Post 0.57, Baseline 0.50) and (b) pairs of sockpuppet accounts (All 0.91, Activity 0.86, Community 0.56, Post 0.80, Baseline 0.50). Activity features have the highest performance.

Post features: Finally, we also measure the linguistic features of U's posts. These features have been very effective in identifying sockpuppets [37, 6], authors of text [3, 20], deceptive writing styles [2], trolls [11], hoaxes [25], and vandalism [32]. In our analysis, we observe that sockpuppets do indeed write differently, for example, writing shorter sentences, using more swear words, and using more singular first-person pronouns. Thus, we incorporate linguistic features such as the number of characters, average word length, average number of syllables in words, number of big sentences, text readability (measured using ARI), and the different categories of LIWC features. Finally, we also consider sentiment, emotional valence, and agreement (as described previously).

5.1 Is an account a sockpuppet?

Given the posts a user has made, is it possible to distinguish a sockpuppet from an ordinary user? To control for user activity, we match sockpuppets and ordinary users on their total number of posts, as well as the discussions they post in. Matching gives a balanced dataset, where random guessing results in 50% accuracy. We perform 10-fold cross validation using a random forest classifier, then measure performance using ROC AUC.

Figure 11(a) shows results obtained individually with each feature set, as well as when considering all the features together. We observe that when using all features, the AUC is 0.68. Individually, all three feature sets perform similarly, with activity features slightly more predictive than the others (AUC = 0.59).

To find the relative advantage of adding features, we perform forward feature selection.
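Forward selection over feature sets can be sketched as a greedy loop that, at each step, adds whichever remaining set improves the score the most. The sketch below is generic and makes assumptions the text does not state: the stopping rule (stop when no candidate improves the score) is one common choice, the `score` callback is hypothetical (in this setting it would be something like the cross-validated ROC AUC of a classifier trained on the chosen sets), and the numbers in `toy_score` are made up purely to exercise the loop; they are not results from this work.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Greedily add the candidate that most improves score(selected).

    Returns the order in which candidates were added, together with the
    score reached after each addition.
    """
    selected: List[str] = []
    remaining = list(candidates)
    history: List[Tuple[str, float]] = []
    best_so_far = score(frozenset())
    while remaining:
        scored = [(score(frozenset(selected + [c])), c) for c in remaining]
        best_score, best_candidate = max(scored, key=lambda pair: pair[0])
        if best_score <= best_so_far:
            break  # no remaining candidate adds any lift
        selected.append(best_candidate)
        remaining.remove(best_candidate)
        history.append((best_candidate, best_score))
        best_so_far = best_score
    return history


# Made-up scores for three abstract feature sets, for illustration only.
_TOY = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.60, frozenset({"B"}): 0.55, frozenset({"C"}): 0.58,
    frozenset({"A", "B"}): 0.61, frozenset({"A", "C"}): 0.66,
    frozenset({"B", "C"}): 0.59, frozenset({"A", "B", "C"}): 0.66,
}


def toy_score(feature_sets: FrozenSet[str]) -> float:
    return _TOY[feature_sets]
```

With the toy scores, the loop picks `A`, then `C`, and stops because adding `B` gives no further lift; the per-step scores in the returned history show how much each added set contributes.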
We observe that activity and post features perform close to the final AUC of 0.68, and that there is not much lift from adding the community features. This means that to identify sockpuppets, their activity and the content of their posts matter the most.

5.2 Are two accounts sockpuppet pairs?

Next, we turn our attention to identifying pairs of sockpuppets. Given a sockpuppet and two other users, where one is a sockpuppet, can we predict which user is the sockpuppet?

For each sockpuppet pair (S1, S2), we choose a matching ordinary account O for sockpuppet S1. Again, this results in a balanced dataset, and the task is to identify which of these two pairs is a sockpuppet pair. Features used in this experiment are the differences in the individual feature values for the two accounts in each pair. We again evaluate performance using a random forest classifier.

Figure 11(b) shows the performance of the resulting classifier. We achieve a very high AUC of 0.91, suggesting that the interactions between sockpuppets are strongly predictive. Looking at the individual feature sets, we see that activity features again perform the best, with close to 0.86 AUC. Community features perform the poorest, with an AUC of 0.56.

Overall, our results suggest that activity-based features best predict sockpuppetry. While it is possible to differentiate sockpuppets from ordinary users, it is significantly easier to find other sockpuppets in the same group once at least one sockpuppet has been identified. Most importantly, the latter result suggests that the interactions between sockpuppets are the best way to identify them.

6. RELATED WORK

Our findings build on a rich vein of prior work in both deception and author identification.

Deception detection. Sockpuppetry is situated in the broader field of deception. Deception online is aided by the virtue of anonymity [13]. It can occur as deceptive content as well as a deceptive agent [41].
The behavior of people changes when they deceive; for example, they reduce communication [48] and change the focus of their presentation [39]. When writing deceptively to hide their identity, authors tend to increase use of particles and personal pronouns, write shorter sentences, and show nervousness [2, 5, 22, 8]. Our work adds to this line of research by finding evidence of deceptive writing styles and presentation by sockpuppets: pretender sockpuppets may pretend to be different people by using different display names, and they tend to write deceptively as well.

Motivations for sockpuppetry. Turning to research that studies sockpuppetry specifically, one line of work has studied their motivations. Sockpuppetry is often used to avoid being banned, to create false consensus [38, 2] and support a person or a position [9], or to vandalize content (e.g., on Wikipedia [37]). Relatedly, motivations for multiple account creation in online multiplayer games can either be benign (e.g., experimentation with different identities) or malicious (e.g., increasing in-game profit, cheating) [15, 16, 23, 9]. In our work, we find evidence for these motivations: sockpuppets in discussion communities sometimes support each other, and beyond malicious uses, some uses of sockpuppetry may be benign (e.g., a user may simply use different accounts to post in different topics).

Sockpuppetry and author identification. Another line of work has also identified sockpuppets using textual information, link analysis and temporal information, both in online discussion forums [6, 47] and social networks [14, 26, 47, 43]. However, definitions of sockpuppets from previous research have tended to make assumptions about the usernames that sockpuppets use (e.g., that they are similar [26]), their opinion towards topics (e.g., they have the same opinion [6]), and their interactions (e.g., that they reply in support of each other's posts [47]).
As such, these definitions tend to miss several types of sockpuppetry. In this work, we developed a robust methodology for identifying sockpuppets that makes fewer assumptions, and showed that a significant fraction of sockpuppets do use different names (i.e., the pretenders), and tend not to support each other in discussions (i.e., the non-supporters).

Sockpuppetry on Wikipedia has been studied extensively due to the availability of manually validated ground-truth data [37, 40, 44, 30]. However, in contrast to sockpuppetry in discussion communities, which is the focus of our work, sockpuppet editors on Wikipedia primarily edit articles, and their main purpose is not to interact with each other.

Closely related to sockpuppet detection is author identification, or the task of identifying the original author of a document [20, 21, 28, 29, 30, 33]. More recently, research has used multiple accounts of users across different social platforms to identify malicious users [42], and to identify accounts operated by the same user across different web platforms [19, 36, 46, 47]. In contrast to our work here, this line of research does not operate under the assumption of deception and thus may be less applicable to situations when authors try to obfuscate their writing [2, 5].

7. DISCUSSION AND CONCLUSION

Our findings shed light on how sockpuppets are used in practice in online discussion communities. By developing a robust methodology for identifying sockpuppets, we are able to comprehensively study their activity. Importantly, this methodology is able to identify sockpuppets that were created at significantly different times, use very different usernames or email addresses, write differently, or mostly post in different discussions.

Our work revealed differences in how sockpuppets write and behave in online communities. Sockpuppets use more singular first-person pronouns, write shorter sentences, and swear more.
They participate in discussions with more controversial topics, and are especially likely to interact with other sockpuppets. These differences allowed us to build predictive models that robustly differentiate pairs of sockpuppets from ordinary users, as well as identify individual user accounts that are sockpuppets.

Nonetheless, our analysis has limitations that would be interesting to explore in future work. First, using our heuristics, we are not able to identify sockpuppets that are also throwaway accounts and are only used once before being abandoned. Next, we studied sockpuppetry in discussion forums where users are pseudonymous. It would be interesting to study sockpuppetry where users post under real identities (e.g., Facebook), or in completely anonymous settings (e.g., 4chan). We also primarily studied pairs of sockpuppets, but understanding how larger groups of sockpuppets function may reveal additional ways in which sockpuppets may coordinate, and may allow us to observe more pronounced effects of sockpuppetry. Further, the behavior of sockpuppets in knowledge sharing platforms (e.g., StackOverflow) may be different from that in the opinion-expressing discussion platforms we studied; for instance, the primary purpose of sockpuppets in such platforms may be to give additional 'upvotes' to their answers. Such a study would bring additional insights into sockpuppetry. Furthermore, prior work found that trolling correlates with sadism [7]; understanding the role of personality traits in sockpuppetry would also be valuable future work. Finally, even more robust methodologies for identifying sockpuppets may uncover an even wider range of behavior. For example, sockpuppets may exist beyond a single discussion community: we found 14 different sockpuppet groups that were used in more than one online community.
Studying these types of sockpuppets may allow us to better characterize how a sockpuppet's behavior may change in different communities.

By developing better techniques to identify sockpuppets, our work can be used to improve the quality of discussions online, where each discussant can better trust that their interactions with others are genuine. Nonetheless, while it is possible to identify sockpuppetry, one should be careful not to assume that all sockpuppets are malicious. Our work suggests that a significant number of sockpuppets do not pretend to be other users and were simply used to participate in different discussions. This observation suggests that some users find it valuable to separate their activity in different spheres of interests.

8. ACKNOWLEDGEMENT

Parts of this work were supported by US Army Research Office under Grant Number W911NF1610342, NSF IIS-1149837, ARO MURI, DARPA NGS2, Stanford Data Science Initiative and Microsoft Research PhD fellowship. We would like to thank Disqus for sharing data with us for research and the anonymous reviewers for their helpful comments.

9. REFERENCES

[1] Wikipedia sockpuppet investigation policy. https://goo.gl/89ieoY. Accessed: 2016-10-24.
[2] S. Afroz, M. Brennan, and R. Greenstadt. Detecting hoaxes, frauds, and deception in writing style online. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 461–475. IEEE, 2012.
[3] S. Argamon and S. Levitan. Measuring the usefulness of function words for authorship attribution. In Proceedings of the 2005 ACH/ALLC Conference, 2005.
[4] J. Blackburn, R. Simha, N. Kourtellis, X. Zuo, M. Ripeanu, J. Skvoretz, and A. Iamnitchi. Branded with a scarlet c: cheaters in a gaming social network. In Proceedings of the 21st International Conference on World Wide Web, pages 81–90. ACM, 2012.
[5] M. Brennan, S. Afroz, and R. Greenstadt. Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. ACM Transactions on Information and System Security (TISSEC), 15(3):12, 2012.
[6] Z. Bu, Z. Xia, and J. Wang. A sock puppet detection algorithm on virtual spaces. Knowledge-Based Systems, 37:366–377, 2013.
[7] E. E. Buckels, P. D. Trapnell, and D. L. Paulhus. Trolls just want to have fun. Personality and Individual Differences, 67:97–102, 2014.
[8] D. B. Buller and J. K. Burgoon. Interpersonal deception theory. Communication Theory, 6(3):203–242, 1996.
[9] A. Caspi and P. Gorsky. Online deception: Prevalence, motivation, and emotion. CyberPsychology & Behavior, 9(1):54–59, 2006.
[10] K.-T. Chen and L.-W. Hong. User identification based on game-play activity patterns. In Proceedings of the 6th ACM SIGCOMM Workshop on Network and System Support for Games, pages 7–12. ACM, 2007.
[11] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec. Antisocial behavior in online discussion communities. In Ninth International AAAI Conference on Web and Social Media, 2015.
[12] J. P. Dickerson, V. Kagan, and V. Subrahmanian. Using sentiment to detect bots on twitter: Are humans more opinionated than bots? In Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, pages 620–627. IEEE, 2014.
[13] J. S. Donath et al. Identity and deception in the virtual community. Communities in Cyberspace, 1999.
[14] K. Gani, H. Hacid, and R. Skraba. Towards multiple identity detection in social networks. In Proceedings of the 21st International Conference on World Wide Web, pages 503–504. ACM, 2012.
[15] R. Gilbert, V. Thadani, C. Handy, H. Andrews, T. Sguigna, A. Sasso, and S. Payne. The psychological functions of avatars and alt(s): A qualitative study. Computers in Human Behavior, 32:1–8, 2014.
[16] R. L. Gilbert, J. A. Foss, and N. A. Murphy. Multiple personality order: Physical and personality characteristics of the self, primary avatar and alt. In Reinventing Ourselves: Contemporary Concepts of Identity in Virtual Worlds, pages 213–234. Springer, 2011.
[17] J. T. Hancock. Digital deception. Oxford Handbook of Internet Psychology, pages 289–301, 2007.
[18] C. J. Hutto and E. Gilbert. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Eighth International AAAI Conference on Weblogs and Social Media, 2014.
[19] P. Jain, P. Kumaraguru, and A. Joshi. @i seek 'fb.me': Identifying users across multiple online social networks. In Proceedings of the 22nd International Conference on World Wide Web, pages 1259–1268. ACM, 2013.
[20] F. Johansson, L. Kaati, and A. Shrestha. Detecting multiple aliases in social media. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 1004–1011. ACM, 2013.
[21] F. Johansson, L. Kaati, and A. Shrestha. Timeprints for identifying social media users with multiple aliases. Security Informatics, 4(1):7, 2015.
[22] P. Juola. Detecting stylistic deception. In Proceedings of the Workshop on Computational Approaches to Deception Detection, pages 91–96. Association for Computational Linguistics, 2012.
[23] Y. B. Kafai, D. A. Fields, and M. Cook. Your second selves: avatar designs and identity play in a teen virtual world. In Proceedings of DIGRA, volume 2007, 2007.
[24] S. Kumar, F. Spezzano, and V. Subrahmanian. Vews: A wikipedia vandal early warning system. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 607–616. ACM, 2015.
[25] S. Kumar, R. West, and J. Leskovec. Disinformation on the web: Impact, characteristics, and detection of wikipedia hoaxes. In Proceedings of the 25th International Conference on World Wide Web, pages 591–602. International World Wide Web Conferences Steering Committee, 2016.
[26] D. Liu, Q. Wu, W. Han, and B. Zhou. Sockpuppet gang detection on social media sites. Frontiers of Computer Science, 10(1):124–135, 2016.
[27] A. Mukherjee, V. Venkataraman, B. Liu, and N. Glance. What yelp fake review filter might be doing? In Seventh International AAAI Conference on Weblogs and Social Media, 2013.
[28] A. Narayanan, H. Paskov, N. Z. Gong, J. Bethencourt, E. Stefanov, E. C. R. Shin, and D. Song. On the feasibility of internet-scale author identification. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 300–314. IEEE, 2012.
[29] J. Novak, P. Raghavan, and A. Tomkins. Anti-aliasing on the web. In Proceedings of the 13th International Conference on World Wide Web, pages 30–39. ACM, 2004.
[30] P. P. Paul, M. Sultana, S. A. Matei, and M. Gavrilova. Editing behavior to recognize authors of crowdsourced content. In Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, pages 1676–1681. IEEE, 2015.
[31] J. W. Pennebaker, M. E. Francis, and R. J. Booth. Linguistic inquiry and word count: Liwc 2001. Mahway: Lawrence Erlbaum Associates, 71(2001):2001, 2001.
[32] M. Potthast, B. Stein, and R. Gerling. Automatic vandalism detection in wikipedia. In European Conference on Information Retrieval, pages 663–668. Springer, 2008.
[33] T. Qian and B. Liu. Identifying multiple userids of the same author. In EMNLP, pages 1124–1135, 2013.
[34] P. R. Rosenbaum and D. B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, pages 41–55, 1983.
[35] C. Seife. Virtual Unreality: Just Because the Internet Told You, how Do You Know It's True? Penguin, 2014.
[36] G. Silvestri, J. Yang, A. Bozzon, and A. Tagarelli. Linking accounts across social networks: the case of stackoverflow, github and twitter. In International Workshop on Knowledge Discovery on the WEB, pages 41–52, 2015.
[37] T. Solorio, R. Hasan, and M. Mizan. A case study of sockpuppet detection in wikipedia. In Workshop on Language Analysis in Social Media (LASM) at NAACL HLT, pages 59–68, 2013.
[38] B. Stone and M. Richtel. The hand that controls the sock puppet could get slapped. New York Times, 2007.
[39] C. L. Toma and J. T. Hancock. What lies beneath: The linguistic traces of deception in online dating profiles. Journal of Communication, 62(1):78–97, 2012.
[40] M. Tsikerdekis and S. Zeadally. Multiple account identity deception detection in social media using nonverbal behavior. IEEE Transactions on Information Forensics and Security, 9(8):1311–1321, 2014.
[41] M. Tsikerdekis and S. Zeadally. Online deception in social media. Communications of the ACM, 57(9):72–80, 2014.
[42] G. Venkatadri, O. Goga, C. Zhong, B. Viswanath, K. P. Gummadi, and N. Sastry. Strengthening weak identities through inter-domain trust transfer. In Proceedings of the 25th International Conference on World Wide Web, pages 1249–1259. International World Wide Web Conferences Steering Committee, 2016.
[43] B. Viswanath, A. Post, K. P. Gummadi, and A. Mislove. An analysis of social network-based sybil defenses. ACM SIGCOMM Computer Communication Review, 40(4):363–374, 2010.
[44] Z. Yamak, J. Saunier, and L. Vercouter. Detection of multiple identity manipulation in collaborative projects. In Proceedings of the 25th International Conference Companion on World Wide Web, pages 955–960. International World Wide Web Conferences Steering Committee, 2016.
[45] H. Yuan, D. O. Clifton, P. Stone, and H. H. Blumberg. Positive and negative words: Their association with leadership talent and effectiveness. The Psychologist-Manager Journal, 4(2):199, 2000.
[46] R. Zafarani and H. Liu. Connecting users across social media sites: a behavioral-modeling approach. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 41–49. ACM, 2013.
[47] X. Zheng, Y. M. Lai, K.-P. Chow, L. C. Hui, and S.-M. Yiu. Sockpuppet detection in online discussion forums. In Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP), 2011 Seventh International Conference on, pages 374–377. IEEE, 2011.
[48] L. Zhou and Y.-w. Sung. Cues to deception in online chinese groups. In Hawaii International Conference on System Sciences, Proceedings of the 41st Annual, pages 146–146. IEEE, 2008.
