Analysis of Social Voting Patterns on Digg

The social Web is transforming the way information is created and distributed. Blog authoring tools enable users to publish content, while sites such as Digg and Del.icio.us are used to distribute content to a wider audience. With content fast becomi…

Authors: Kristina Lerman, Aram Galstyan

Kristina Lerman and Aram Galstyan
University of Southern California
Information Sciences Institute
4676 Admiralty Way
Marina del Rey, California 90292, USA
{lerman,galstyan}@isi.edu

ABSTRACT

The social Web is transforming the way information is created and distributed. Blog authoring tools enable users to publish content, while sites such as Digg and Del.icio.us are used to distribute content to a wider audience. With content fast becoming a commodity, interest in using social networks to promote and find content has grown, both on the side of content producers (viral marketing) and consumers (recommendation). Here we study the role of social networks in promoting content on Digg, a social news aggregator that allows users to submit links to and vote on news stories. Digg's goal is to feature the most interesting stories on its front page, and it aggregates opinions of its many users to identify them. Like other social networking sites, Digg allows users to designate other users as "friends" and see what stories they found interesting. We studied the spread of interest in news stories submitted to Digg in June 2006. Our results suggest that the pattern of the spread of interest in a story on the network is indicative of how popular the story will become. Stories that spread mainly outside of the submitter's neighborhood go on to be very popular, while stories that spread mainly through the submitter's social neighborhood prove not to be very popular. This effect is visible already in the early stages of voting, and one can make a prediction about the potential audience of a story simply by analyzing where the initial votes come from.

Categories and Subject Descriptors
H.3.5 [INFORMATION STORAGE AND RETRIEVAL]: Online Information Services

General Terms
Human Factors, Measurement

Keywords
Information sharing and forwarding, Recommendation / collaborative filtering systems

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WOSN'08, August 18, 2008, Seattle, Washington, USA.
Copyright 2008 ACM 978-1-60558-182-8/08/08 ...$5.00.

1. INTRODUCTION

The social Web, a label that includes both social networking sites such as MySpace and Facebook and the social media sites such as Digg and Flickr, is changing the way content is created and distributed. Web-based authoring tools enable users to rapidly publish content, from stories and opinion pieces on weblogs, to photographs and videos on Flickr and YouTube, to advice on Yahoo Answers, to Web discoveries on Del.icio.us and Furl. User-generated content is fueling the rapid expansion of the Web, accounting for much of the new Web content. In addition to allowing users to share content, social Web sites also include a social networking component, which means that they allow users to mark other users as friends or contacts, and provide an interface to track their friends' activities, e.g., the new content they created.

With the commodification of content, content producers face a challenge of how to effectively promote and distribute their content. The challenge facing content consumers is how to efficiently identify interesting or relevant content in a vast stream of new user-generated content.

Social scientists (and marketers) have long recognized the central role social networks play in the spread of information [7]. Before the advent of electronic communication, 'word-of-mouth' recommendation was carried out mainly through telephone and letter communications, or personal interactions. Modern communications technologies have further elevated the role of social networks in product recommendation [4, 18], information dissemination [23, 8] and search [14, 11]. However, there are few empirical studies of information propagation on social networks. Even more interestingly, existing studies have produced conflicting results. One study of music recommendation, conducted in a laboratory setting, found that users' choice of music to listen to was significantly influenced by choices made by their peers [21]. However, a large-scale study of viral marketing on Amazon [12] showed word-of-mouth recommendations to be largely ineffective in leading to new purchases of products. Like the previous study of information propagation through email [23], it found that most recommendation chains terminate after just a few steps. The study did note the sensitivity of recommendation to price and category of product, leaving open the question whether social networks are an effective tool for disseminating information about, and helping users discriminate between, free (or similarly priced) products, e.g., helping users decide what blogs to read or movies to see.

Footnote 1: http://digg.com

We study the role of social networks in spreading news stories on the social news aggregator Digg (footnote 1), which allows users to post links to and vote on news stories. Digg became popular in part because, rather than relying on an editorial board, it aggregates opinions of its many users to identify the most interesting stories online.
Like other social Web sites, Digg also allows its users to create social networks by designating other users as friends, and makes it easy to track friends' activities. Through the Friends Interface, which acts as a social recommendation engine, a user can see the stories his friends found interesting.

We perform an empirical study of social recommendation on Digg, by examining the impact of social networks on voting. When a story is submitted, some votes come from the network neighbors of the submitter. New votes might also attract additional votes from the neighbors of the voters, and so on. This process is analogous to a diffusion, or spread, of activation on a network. Our results suggest that the pattern of the spread of activation through the network is indicative of how interesting the story is, or how popular it will become. If in the initial stages of voting most of the votes come from within the social neighborhood of the submitter, then the story will likely not prove popular with the general Digg audience. If, on the other hand, the votes come from users not directly linked to the submitter, the story will likely prove popular. In other words, stories which propagate mostly through the network effect do not carry sufficient interest for the users outside the submitter's community and will not become popular, while stories that mainly spread outside of the submitter's community will end up becoming popular. This effect is visible already in the early stages of voting, and one can make a prediction about the potential audience of a story simply by analyzing where the initial votes come from.

In Section 3 we describe Digg, its functionality and the data we collect from it. In Section 4 we empirically study the spread of interest in a story through the social network of Digg users.
In Section 5 we show that we can use the early stages of this spread to predict how popular the story will become.

2. RELATED WORK

Our findings are in line with conclusions of previous studies that showed that social networks play an important role in promoting and locating content [11, 13, 9]. In particular, Lerman [9] showed that users with larger social networks are more successful in getting their stories promoted to Digg's front page, even if the stories are not very interesting. These findings have implications for the design of social media and social networking sites. For example, some implementations of social recommendation may lead to the "tyranny of the minority," where a small group of active, well-connected users dominate the site [10]. Rather than being a liability, social networks can be used to, for example, more accurately assess the quality of content, as this paper shows.

Other researchers have used Digg's trove of empirical data to study dynamics of voting. Wu and Huberman [24] found that interest in a story peaks when the story first hits the front page, and then decays with time, with a half-life of about a day. Their study is complementary to ours, as they studied dynamics of stories after they hit the front page. Also, they do not identify a mechanism for the spread of interest in a story. We, on the other hand, propose, and empirically study, social networks as a mechanism for the spread of interest in a story. Crane and Sornette [3] analyzed a large number of videos posted on YouTube. By looking at the dynamics of the number of votes received by the videos, they found that they could identify high quality videos, whether they were selected by YouTube editors, or spontaneously became popular. Like Wu and Huberman, they looked at aggregate statistics, not the microscopic dynamics of the spread of interest in stories.

3. DIGG'S FUNCTIONALITY

The social news aggregator Digg relies on users to submit and moderate news stories. Each new story goes to the upcoming stories queue. The new submissions (there are 1-2 new submissions every minute) are displayed in reverse chronological order, 15 to the page, with the most recent story at the top. Each day Digg selects a handful of stories to feature on its front page. Digg's goal is to promote only the most interesting of the submitted stories, and it relies on the opinions of its users to identify them. Digg's automatic promotion algorithm looks at the voting patterns made within 24 hours of a story's submission. Although its details are kept secret and change on a regular basis [19], the promotion algorithm takes into account the number of votes a story receives and the rate at which it receives them, among other factors. In the data we collected, we did not see any front-page stories with fewer than 43 votes, nor did we see any stories in the upcoming queue with more than 42 votes.

Digg also allows users to designate others as friends and makes it easy to track friends' activities. The friendship relationship is asymmetric. When user A lists user B as a friend, user A is able to watch the activity of B but not vice versa. We call A the fan of B. Digg provides a Friends Interface, which summarizes a user's friends' recent activity: the number of stories his friends have submitted, commented on or voted on in the preceding 48 hours. Tracking activities of friends is a feature of many social Web sites and is one of their major draws.

Digg users vary widely in their activity levels. Some users casually browse the front page, voting on one or two stories. Others spend hours a day combing the Web for new stories to submit, and voting on stories they found on Digg.
Digg calculated a user's reputation based on how successful they were in getting their stories promoted to the front page. Until February 2007 [20], in order to encourage activity, Digg publicized users' reputation on the Top Users list. A look at the statistics of user activity showed that top-ranked users were disproportionately active: of the more than 15,000 front page stories submitted by the top 1000 Digg users as of June 2006, the top 3% of the users were responsible for 35% of the submissions and a similarly high fraction of the votes cast and comments made.

3.1 Digg dataset

For our study, we scraped Digg's Technology section with the aid of a tool provided by Fetch Technologies. On June 30, 2006, we scraped Digg's front page, collecting data about roughly 200 of the most recently promoted stories. For each story, we extracted the story's title, name of the submitter, time the story was submitted, as well as names of users who voted on the story. Although we do not have the time stamp of each vote, the votes are listed in chronological order, with the submitter's name appearing first on the list. In February 2008 we augmented this data with the final vote count (number of diggs) the stories received. In all, we collected information about votes from over 16,600 distinct users.

[Figure 1: Time series of the number of votes, since submission, received by randomly chosen front-page stories. The rectangle indicates votes received while the stories were in the upcoming stories queue, and dashes indicate transition to the front page. Axes: time (minutes) vs. number of votes.]

The basic dynamics of votes received by stories appears the same from story to story, as shown in Figure 1.
While in the upcoming queue (dotted rectangle in the figure), a story accumulates votes at some slow rate, but once it is promoted to the front page (indicated by dashes), it accumulates votes at a much faster rate. As the story ages, the accumulation of new votes slows down, and after a few days, the story's vote count saturates at some value. This value depends on how interesting the story is to the general Digg community. Some stories are very interesting, accumulating thousands of votes, while others are not so interesting, receiving fewer than 500 votes.

Figure 2(a) shows the histogram of the final vote counts received by the front-page stories in our sample. Twenty percent of the stories were not very interesting, receiving fewer than about 500 votes, and twenty percent were very interesting, receiving more than 1500 votes. This graph is very similar to one presented by Wu and Huberman [24], which showed votes received by almost 30,000 front-page stories on Digg submitted over a period of a year. In that dataset, ~20% of front-page stories received fewer than 400 votes, and another ~20% received between 400 and 600 votes. About 30% of the stories received more than 1,000 votes.

The distribution of user activity is skewed, as shown in Figure 2(b). While most of the users had one story promoted to the front page, a number of users were responsible for multiple submissions. These were also the users with highest reputation, the so-called top users. Voting statistics are even more skewed. While most of the users voted on only one story, some voted on many, and a few on well over a hundred stories.

3.2 Social networks

In addition to data about stories, we also extracted a snapshot of the social network of the top-ranked 1020 Digg users as of June 30, 2006. This data contained the names of each user's friends and fans.
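
The asymmetric friend and fan lists described in Section 3 can be held in a small directed graph. The following is only a minimal sketch: the class name `FanGraph`, its methods, and the user names are hypothetical and are not part of the tooling used in the study.

```python
from collections import defaultdict


class FanGraph:
    """Directed graph in which an edge a -> b means a lists b as a friend."""

    def __init__(self):
        self._friends = defaultdict(set)  # user -> users that user watches
        self._fans = defaultdict(set)     # user -> users watching that user

    def add_friend(self, user, friend):
        """Record that `user` lists `friend` as a friend, making `user` a fan of `friend`."""
        self._friends[user].add(friend)
        self._fans[friend].add(user)

    def friends_of(self, user):
        """Outgoing links: everyone `user` is watching."""
        return set(self._friends.get(user, ()))

    def fans_of(self, user):
        """Incoming links: everyone watching `user`."""
        return set(self._fans.get(user, ()))


# Hypothetical users: A and C both list B as a friend.
graph = FanGraph()
graph.add_friend("A", "B")
graph.add_friend("C", "B")
```

Storing both directions makes the fan lookup, which is the one needed to decide who can see a story, as cheap as the friend lookup.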
As a reminder, user A's friends are all the users that A is watching (outgoing links on the social network graph), while A's fans are all the users watching his activity (incoming links). Since the original social network did not contain information about all the voters in our dataset, we augmented this data in February 2008 by extracting names of fans of the 15,000+ additional users. Many of these users acquired new fans between June 2006 and February 2008. Although Digg does not provide information about the time a fan link was created, it does list these links in reverse chronological order, with the most recent appearing on top. In addition to a fan's name, Digg also gives the date the fan joined Digg. By eliminating fans who joined Digg after June 30, 2006, we believe we were able to faithfully reconstruct the fan links (incoming edges) for all the users in our dataset. The top users, those with most stories on the front page, tended to have more friends and fans than other users.

4. INFORMATION SPREAD IN NETWORKS

The Digg dataset allows us to empirically study the role of social networks in the spread of information. Before a story reaches the front page, it is visible only on the upcoming stories queue and through the Friends interface. Although some users browse the upcoming stories queue, the quantity of submissions there (more than 1500 daily at the time we collected data) makes browsing unmanageable to most users. Digg also offers visual interfaces to browse the upcoming and front page stories, Swarm and Stack. These visualizations are supposed to make it easier for users to identify more popular stories, but it is not clear how many users take advantage of them. Increasingly, many news sites and blogs are including a "Digg it" button to allow their readers to submit or vote on the story directly from the story's Web page.
Again, it is not clear how many users take advantage of this option. We believe that social networks play an important role in promoting stories on Digg. In a previous work we presented data to support the claim that users employ the Friends interface to filter the vast stream of new submissions to see the stories their friends liked. Below we study the microscopics of information spread.

4.1 Information cascades

At the time of submission, the story is visible only to the submitter's fans through the 'see the stories your friends submitted' part of the Friends interface. As the story receives new votes, it becomes visible to many more users through the 'see the stories my friends dugg' (footnote 2) part of the Friends interface. A story's influence is given by the number of users who can see it through the Friends interface. Figure 3(a) shows a histogram of the stories' influence. Slightly more than half of the stories in our sample were submitted by poorly connected users with fewer than ten fans. After stories received ten new votes, almost half of them were visible to at least 200 users through the Friends interface. After 30 votes, all the stories in our sample were visible to at least ten other users through the Friends interface, and the majority of the stories were visible to hundreds of users.

Footnote 2: In this paper, as on Digg, 'digg' is synonymous with 'vote.'

[Figure 2: Statistics of story and user activity: (a) Histogram of the number of votes received by stories (x: number of votes received; y: number of stories receiving x votes). (b) Histogram of the number of stories submitted and voted on (x: # submissions or votes; y: # users making x submissions/votes; series: votes, submissions; logarithmic axes).]

[Figure 3: Spread of interest in stories: (a) histogram of the story's influence, defined as the number of users who can see it through the Friends interface, after it received ten votes, and (b) the number of in-network votes the story received within the first ten votes. Panels in (a): at submission, after 10 votes, after 20 votes (x: story influence; y: number of stories). Panels in (b): after 10 votes, after 20 votes, after 30 votes (x: cascade size; y: number of cascades).]

Because we know the social network of Digg users, we can count how many votes came from within the network, that is, from fans of the previous voters. This is the story's cascade. Figure 3(b) shows the distribution of cascades in our sample. For 30% of the stories, at least half of the first 10 votes were in-network votes. Cascades grow with the number of votes cast. After 20 votes, 28% of the stories had at least 10 in-network votes, and after 30 votes, 36% of the stories had at least 10 in-network votes.

5. STORY INTERESTINGNESS

The total number of votes a story receives gives a measure of how interesting it is to Digg's audience. Digg attempts to predict whether the story will be found interesting by its audience when it makes a decision whether to promote a story to the front page. It uses a number of features in the prediction, such as the number of votes received and the rate at which the story receives them, and generally makes the prediction after 40 or so votes have been cast. It is especially challenging for Digg to predict how interesting a story submitted by one of the top users is. Top users are far more active and well connected than other users, meaning that they submit and vote on many more stories, some of which happen to be stories submitted by their friends. Since top users are more likely to be in the same network, their stories are more likely to get more votes and, therefore, be promoted to the front page.
In September 2006, a controversy about top user dominance [1, 2] caused Digg to modify the promotion algorithm to take into account "unique digging diversity of the individuals digging the story" [19]. Although this modification did result in changes in front page composition, it is not clear whether it affected the spread of interest in stories on the social networks on Digg. Rather than discounting the votes coming from fans, as Digg has chosen to do, we show that we can predict how interesting a story will be by monitoring its spread through the social network.

5.1 Social networks and interestingness

In a previous work [9] we showed that top Digg users were very successful in getting their stories promoted to the front page. We claimed that this could be explained by social browsing, i.e., the fact that Digg users use the Friends interface to find new interesting stories. We showed that social browsing, together with the observation that top users have more fans than other users, explains how less interesting stories submitted by top users are promoted to the front page. Here we study in detail how the spread of interest in a story through the social network relates to how interesting the story is.

[Figure 4: Distribution of the number of in-network votes stories receive vs how interesting they are. The plots show the number of in-network votes received within the first (not counting the submitter) six, 10 and 20 votes. Panels: after 6 votes, after 10 votes, after 20 votes (x: in-network votes; y: final votes).]

Figure 4 shows the total number of votes a story receives (interestingness) as a function of how many in-network votes it received within the first (not counting the submitter) six, 10 and 20 votes.
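
The in-network vote count plotted in Figure 4 can be computed from the chronological voter list and the fan lists. The sketch below is illustrative only: the function name, the dictionary layout and the user names are hypothetical, and it assumes that the submitter counts as a "previous voter", so that a vote from one of the submitter's fans is in-network.

```python
def in_network_votes(voters, fans, n):
    """Count in-network votes among the first n votes cast after the submitter.

    voters: user names in chronological order, with the submitter first.
    fans:   dict mapping a user name to the set of that user's fans.
    A vote is counted as in-network when the voter is a fan of the submitter
    or of any earlier voter (an assumption about the paper's definition).
    """
    if not voters:
        return 0
    # Users who can currently see the story through the Friends interface.
    visible_to = set(fans.get(voters[0], ()))
    count = 0
    for voter in voters[1:n + 1]:
        if voter in visible_to:
            count += 1
        # Every new vote exposes the story to that voter's fans as well.
        visible_to |= set(fans.get(voter, ()))
    return count


# Hypothetical fan lists and vote order.
fans = {"sub": {"u1", "u2"}, "u1": {"u3"}, "u4": {"u5"}}
voters = ["sub", "u1", "u4", "u3", "u5", "u9"]

first_two = in_network_votes(voters, fans, 2)   # only u1 is a fan of an earlier voter
first_five = in_network_votes(voters, fans, 5)  # u1, u3 and u5 are in-network
```

Because the data carries only the order of votes and not their time stamps, the sketch relies on list position alone.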
The graph in Figure 4 shows the median and width of the distribution of votes (except for the highest and lowest values) for that value of in-network votes. There is a clear inverse relationship between interestingness and the fraction of in-network votes, and this relationship is already visible early on, within the first 6-10 votes.

We define a story to be interesting if it receives at least 520 votes, and not interesting if it received fewer than 520 votes (footnote 3). As found in our previous work, many of the front-page stories submitted by best connected (top) users were deemed to be uninteresting, receiving fewer votes, and almost all of the stories submitted by poorly connected users were found to be highly interesting, with many gathering thousands of votes. One of the exceptions was a story submitted by a poorly connected user that gathered only 185 votes. One of the early voters for this story was kevinrose, the founder of Digg and the user with most fans. The extra visibility that kevinrose's vote gave to the story helped promote this uninteresting story to the front page.

Footnote 3: This threshold was chosen based on Figure 2(a), which indicated that 20% of the stories in the sample received fewer than 500 votes, suggesting 500 as the interestingness threshold. Two stories in our sample that were submitted by top users were close to this threshold, with 505 and 507 votes (with five in-network votes each). We made the decision to raise the interestingness threshold to 520 and keep these ambiguous cases in the sample.

These observations suggest that there are two mechanisms for the spread of interest in a story on Digg: interest-based and network-based. A highly interesting story will spread from many independent seed sites, as users unconnected to the network of the previous voters discover the story with some small probability and propagate it to their own fans.
A story that is interesting to a narrow community, however, will spread within that community only, without being picked up by unconnected users.

5.2 Predicting interestingness

The evidence presented in the section above suggests that it is possible to predict how interesting a story is by monitoring how interest in it spreads through the social network. Moreover, it should be possible to make the prediction relatively early, after the first 6-10 votes. Digg generally waits longer, until a story accumulates at least 40 votes. Such prediction is especially useful for stories submitted by top users, who tend to have bigger and more active social networks, which makes it more difficult to distinguish between a user's popularity and story interestingness.

[Figure 5: Decision tree classifier trained on the votes data. Node labels in the extracted figure: v10, v10, fans1. Leaf labels: yes(130/5), no(18/0), yes(30/8), no(29/13). Branch labels: <=4, >4, >8, <=8, <=85, >85.]

We trained a C4.5 (J48) decision tree classifier [22] on 207 stories in our dataset. Each story had three attributes: number of in-network votes within the first ten votes (v10), number of users watching the submitter (fans1) and a boolean attribute indicating whether the story was interesting or not. The story was judged interesting if it received more than 520 votes. Figure 5 shows the learned decision tree. Results of 10-fold validation indicate that this tree correctly classifies 174 of the examples, and misclassifies 33 examples.

We tested the learned model on stories extracted from the upcoming queue on June 30, 2006. This dataset consisted of 900 stories submitted within the same time period as the data analyzed above, but not yet promoted to the front page. We augmented this data by retrieving the final number of votes received by stories. From this set, we kept only the stories that were submitted by top users (with rank ≤ 100) and received at least 10 votes, leaving 48 stories.
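
The learned tree is small enough to write out as a function. The structure below is an assumption: it is one reading of the flattened labels of Figure 5 (root split on v10 at 4, a second v10 split at 8, then fans1 at 85), chosen because the four leaf counts sum to the 207 training stories and because it matches the inverse relationship between in-network votes and interestingness. The function name `predict_interesting` is hypothetical.

```python
def predict_interesting(v10, fans1):
    """Predict whether a story will be interesting (more than 520 votes).

    v10:   in-network votes within the first ten votes.
    fans1: number of users watching the submitter.
    The branch layout is a reconstruction of Figure 5, not a confirmed copy.
    """
    if v10 <= 4:
        return True        # leaf yes(130/5)
    if v10 > 8:
        return False       # leaf no(18/0)
    return fans1 <= 85     # leaf yes(30/8) if fans1 <= 85, else no(29/13)


# Leaf sizes read from the figure labels; they should cover all training stories.
leaf_sizes = [130, 18, 30, 29]
training_stories = sum(leaf_sizes)

# 10-fold validation result reported in the text: 174 correct, 33 misclassified.
cv_accuracy = 174 / (174 + 33)
```

Under this reading, a story with few early in-network votes is predicted interesting regardless of how well connected its submitter is, and the cross-validation figures correspond to an accuracy of roughly 84%.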
We used the learned classifier in Figure 5 to predict whether a story was interesting (received more than 520 votes). The classifier correctly predicted 36 examples (TP=4, TN=32) and made 12 errors (FP=11, FN=1; see footnote 4). It is difficult to compare the predictions made by our algorithm to those made by Digg, because some of the stories that Digg did not promote could have ended up receiving many votes and being deemed interesting. When we limit the comparison only to the stories that Digg did promote, of the 14 stories promoted by Digg, only five went on to receive more than 520 votes (P=TP/(TP+FP)=0.36), in other words, were judged as interesting by the Digg community. In contrast, our algorithm said that seven of these stories were interesting, and of these four received more than 520 votes (P=0.57).

6. CONCLUSION

We studied empirically the spread of interest in news stories on the social news aggregator Digg. We found that social networks play a significant role in promoting stories. In addition, we show that the pattern of social voting can be used for predicting how interesting the story will be. Although our study was carried out on data from Digg, we believe that its conclusions will apply to other social media sites that use social networks to promote content.

As future work, it will be interesting to analyze more thoroughly the role of the network's structural properties in the voting dynamics. Indeed, it is known that structural properties can have a significant impact on various dynamical processes on networks. For instance, it is known that the power-law degree distribution observed in many real-world networks can lead to a vanishing threshold for epidemics [17, 16] for certain models, in sharp contrast with the results for random Erdos-Renyi networks.
Furthermore, the presence of well-connected clusters of nodes can impact the transient dynamics of various influence propagation models [5]. This latter phenomenon can be especially important in networks with well-defined community structure [6, 15].

Footnote 4: The notation denotes true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).

7. REFERENCES

[1] M. Arrington. Troubles in Diggville. 09/06/2006. http://www.techcrunch.com/2006/09/06/troubles-in-diggville/
[2] M. Calore. Digg fights top users for control. Wired News, 09/07/2006.
[3] R. Crane and D. Sornette. Viral, quality, and junk videos on YouTube: Separating content from noise in an information-rich environment. In Proc. of AAAI Symposium on Social Information Processing, Menlo Park, CA, 2008. AAAI.
[4] P. Domingos and M. Richardson. Mining the network value of customers. In KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 57-66, New York, NY, USA, 2001. ACM Press.
[5] A. Galstyan and P. Cohen. Cascading dynamics in modular networks. Phys. Rev. E, 75:036109, 2007.
[6] M. Girvan and M. E. Newman. Community structure in social and biological networks. Proc Natl Acad Sci U S A, 99(12):7821-7826, June 2002.
[7] M. S. Granovetter. The strength of weak ties. The American Journal of Sociology, 78(6):1360-1380, 1973.
[8] D. Gruhl, D. Liben-Nowell, R. Guha, and A. Tomkins. Information diffusion through blogspace. SIGKDD Explor. Newsl., 6(2):43-52, December 2004.
[9] K. Lerman. Social information processing in social news aggregation. IEEE Internet Computing: special issue on Social Search, 11(6):16-28, 2007.
[10] K. Lerman. User participation in social media: Digg study. Proc. of the WI-IAT Workshop on Social Media Analysis (SMA07), 2007.
[11] K. Lerman, A. Plangprasopchok, and C. Wong. Personalizing results of image search on Flickr. In AAAI Workshop on Intelligent Techniques for Web Personalization, 2007.
[12] J. Leskovec, L.A. Adamic, and B.A. Huberman. The dynamics of viral marketing. In EC '06: Proceedings of the 7th ACM conference on Electronic commerce, pages 228-237, New York, NY, USA, 2006. ACM Press.
[13] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC'07), 2007.
[14] A. Mislove, K.P. Gummadi, and P. Druschel. Exploiting social networks for internet search. In Proceedings of the 5th Workshop on Hot Topics in Networks (HotNets'06), 2006.
[15] M. E. J. Newman. Modularity and community structure in networks. Proc Natl Acad Sci U S A, 103(23):8577-8582, June 2006.
[16] R. Pastor-Satorras and A. Vespignani. Epidemic dynamics and endemic states in complex networks. Phys. Rev. E, 63(6):066117, 2001.
[17] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Phys. Rev. Lett., 86(14):3200-3203, 2001.
[18] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 61-70, New York, NY, USA, 2002. ACM.
[19] K. Rose. Digg friends, 09/2006. http://diggtheblog.blogspot.com/2006/09/digg-friends.html
[20] K. Rose. A couple of updates, 02/01/2007. http://blog.digg.com/?p=60
[21] M. J. Salganik, P. S. Dodds, and D. J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311:854, 2006.
[22] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.
[23] F. Wu, B. Huberman, L. Adamic, and J. Tyler. Information flow in social groups. Physica A, 2003.
[24] F. Wu and B. A. Huberman. Novelty and collective attention. Proceedings of the National Academy of Sciences, 104(45):17599-17601, November 2007.

[Figure (caption not present in the extracted text): number of friends+1 vs. number of fans+1 on logarithmic axes; series: all users, top users.]
