Diversity of Online Community Activities
Web sites where users create and rate content as well as form networks with other users display long-tailed distributions in many aspects of behavior. Using behavior on one such community site, Essembly, we propose and evaluate plausible mechanisms t…
Authors: Tad Hogg, Gabor Szabo
Tad Hogg, HP Labs, Palo Alto, CA 94304
Gabor Szabo, HP Labs, Palo Alto, CA 94304
November 23, 2018

Abstract
Web sites where users create and rate content as well as form networks with other users display long-tailed distributions in many aspects of behavior. Using behavior on one such community site, Essembly, we propose and evaluate plausible mechanisms to explain these behaviors. Unlike purely descriptive models, these mechanisms rely on user behaviors based on information available locally to each user. For Essembly, we find the long tails arise from large differences among user activity rates and qualities of the rated content, as well as the extensive variability in the time users devote to the site. We show that the models not only explain overall behavior but also allow estimating the quality of content from their early behaviors.

1 Introduction
Participatory web sites facilitate their users creating, rating and sharing content. Examples include Digg[.com] for news stories, Flickr[.com] for photos and Wikipedia[.org] for encyclopedia articles. To aid users in finding content, many such sites employ collaborative filtering [19] to allow users to specify links to other users whose content or ratings are particularly relevant. These links can involve either people who already know each other (e.g., friends) or people who discover their common interests through participating in the web site. In addition to helping identify relevant content, the resulting networks enable users to find others with similar interests and establish trust in recommendations [13].

The availability of activity records from these sites has led to numerous studies of user behavior and the networks they create. Observed commonalities in these systems suggest general generative processes leading to these observations.
Examples include preferential attachment in forming networks and multiplicative processes leading to wide variation in user activity. While such models provide a broad understanding of the observations, they often lack a causal connection with plausible user behaviors based on user preferences and the information available to users in making their decisions [29, 5]. Moreover, observed behavior can arise from a variety of mechanisms [23]. For predicting the consequences of alternate designs of the web site, models including causal behavior are necessary.

Establishing such models is more difficult than simply observing behavior: due to the possibility of confounding factors in observations, many different causal models can produce the same observations. Ideally, such models would instead be based on intervention studies and randomized trials to identify important causal relationships. In contrast to the wide availability of observational data on user behavior, such intervention studies are difficult, though this situation is improving with the increasing feasibility of experiments in large virtual communities [4] and large-scale web-based experiments [27].

Nevertheless, identifying information readily available to users on a participatory web site can suggest plausible causal mechanisms. Such models provide specific hypotheses to test with future intervention experiments and also suggest improvements to overall system behavior by altering the user experience, e.g., available information or incentives. The simplest such approach considers the average behavior of users on a site [21]. Such models can indicate how system behavior relates to the average decisions of many users. By design, such models do not address a prominent aspect of observed online networks: the long tails in their distributions of links and activity.
Models including this diversity could be useful to improve the effectiveness of the web sites by allowing focus on significantly active users or especially interesting content, and enhancing user experience by leveraging the long tail in niche demand [3].

A key question with respect to the observed diversity is whether users, content and the networks are reasonably viewed as behaviors arising from a statistically homogeneous population, and hence well characterized by a mean and variance. Or is diversity of intrinsic characteristics among participants the dominant cause of the observed wide variation in behaviors? In the latter case, can these characteristics be estimated (quickly) from (a few) observations of behavior, allowing site design to use estimates of these characteristics, e.g., to highlight especially interesting content? Moreover, to the extent user diversity is important, what is a minimal characterization of this user variation sufficient to produce the observed long-tail distributions?

This paper considers these questions in the context of a politically oriented web community, Essembly¹. Unlike most such sites, Essembly provides multiple networks with differing nominal semantics, which is useful for distinguishing among some models. We consider plausible mechanisms users could be following to produce the observed long-tail behaviors both in their online activities and network characteristics.

In the remainder of this paper, we first describe Essembly and our data set in Sec. 2. We then separately examine highly variable behaviors for users, content rating and network formation, in Secs. 3, 4, and 5, respectively.
We suggest models to describe the observed characteristics of users, content, and the network, and consider their possible use during operation of the web site by helping identify user and content parameters early in their history. Finally we discuss implications and extensions to other participatory web sites.

In the three sections focusing on user behavior, resolve characteristics, and network structure, respectively, we first introduce the observations, then present a model describing these observations (subsections Model), and finally analyze the model parameters and predictions (subsections Behavior).

2 Essembly
Essembly is an online service helping users engage in political discussion through creating and voting on resolves reflecting controversial issues. Essembly provides three distinct networks for users: a social network, an ideological preference network, and an anti-preference network, called friends (those who know each other in person), allies (those who share similar ideologies) and nemeses (those who have opposing ideologies), respectively.

The distinct social and ideological networks enable users to distinguish between people they know personally and people encountered on the site with whom they tend to agree or disagree. Network links are formed by invitation only and each link must be approved by the invitee. Thus all three networks in Essembly are explicitly created by users. Essembly provides a ranked list of ideologically most similar or dissimilar users based on voting history, so users can identify potential allies or nemeses by comparing profiles. With regard to voting activity, the Essembly user interface presents several options for users to discover new resolves, for instance based on votes by network neighbors, recency, overall popularity, and degree of controversy.

¹ Essembly LLC at www.essembly.com
Our data set consists of anonymized voting records for Essembly between its inception in August 2005 and December 2006, together with the users and the links they had in the three networks at the end of this period. Our data set has 15,424 users. Essembly presents 10 resolves during the user registration process to establish an initial ideological profile used to help users find others with similar or different political views. To focus on user-created content, we consider the remaining 24,953 resolves, with a total of 1.3 million votes.

3 Users
Fig. 1 shows most users are active for only a short time (less than a day), as measured by the time between their first and last votes (this includes votes on the initial resolves during registration; users need not vote on all of them immediately). The 4762 users active for at least a day account for most of the votes and links, and we focus on these active users for our model. For these users, Fig. 1 shows an exponential fit to the activity distribution for intermediate times. Thus users who have sufficient interest in the system to participate for at least a few days behave approximately as if they decide to stop participating as a Poisson process. The additional decrease at long times (above 200 days or so) is due to the finite length of our data sample (about 500 days).

Figure 1: Distribution of activity times for users (horizontal axis: active time in days; vertical axis: fraction of users active at least that long, log scale). The line shows an exponential fit to the values between 10 and 200 days, proportional to $e^{-t/\tau}$ where τ = 124 days.

About a fifth of users have no votes on noninitial resolves. For the rest of the users, Fig. 2 shows the distribution of votes among users who voted at least once for noninitial resolves.
These votes are close to a Zipf distribution in the number of votes, with the number of users with v votes proportional to $v^{-\nu-1}$. The parameter estimates and confidence intervals in this and the other figures are maximum likelihood estimates [24, 17] assuming independent samples. This wide variation in user activity also occurs in other participatory web sites such as Digg [22].

The distribution of the number of votes per user arises from two factors: how long users participate before becoming inactive, and how often they vote while active.

3.1 Model
Fig. 3 summarizes our model for user behavior. This models the participation of users and their activities on the site while they are active. New users arrive in the system when they register, and we model this as a Poisson process with rate α; such users leave the system (i.e., become inactive) with rate 1/τ. Table 1 gives the values for these model parameters based on average arrival and activity times of active users.

User activities consist of voting, creating resolves, and forming links. User activity is clumped in time, with groups of many votes close in time separated by gaps of at least several hours. This temporal structure can be viewed as a sequence of user sessions.

Figure 2: Distribution of the number of users vs. the number of votes a user made (log-log axes: votes per user; number of users). The solid curve indicates a Zipf distribution fit to the values, with parameter ν = 0.45 ± 0.01. In this and other figures the range given with the parameter estimate is the 95% confidence interval. The plot does not include the 2984 users with zero votes.

Table 1: User activity parameters.

parameter                  value
new user rate              α = 9.3/day
activity time constant     τ = 124 days
resolve creation           q = 0.018 ± 0.0002
link creation              λ = 0.043 ± 0.0003
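The maximum likelihood estimate of ν quoted in Figure 2 can be illustrated with a short Python sketch. It assumes the continuous power-law approximation, treating the density of votes per user as proportional to $v^{-\nu-1}$ above a cutoff $v_{\min}$; the paper does not state its exact estimator or cutoff, and the function names are hypothetical. For integer vote counts a discrete estimator would be more accurate, so this is only an illustration of the idea.

```python
import math
import random


def fit_zipf_nu(values, v_min=1.0):
    """Continuous power-law MLE of nu for a density proportional to v**(-nu - 1).

    Assumption: the continuous approximation, using only values >= v_min.
    Returns (nu_hat, standard_error) from the closed form
    nu_hat = n / sum(log(v / v_min)) and the asymptotic error nu_hat / sqrt(n).
    """
    tail = [v for v in values if v >= v_min]
    n = len(tail)
    log_sum = sum(math.log(v / v_min) for v in tail)
    nu_hat = n / log_sum
    return nu_hat, nu_hat / math.sqrt(n)


def sample_power_law(nu, v_min, n, rng):
    """Inverse-CDF samples with P(V > v) = (v / v_min) ** (-nu) for v >= v_min."""
    return [v_min * (1.0 - rng.random()) ** (-1.0 / nu) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = sample_power_law(nu=0.45, v_min=1.0, n=20000, rng=rng)
    nu_hat, se = fit_zipf_nu(synthetic)
    # +/- 1.96 standard errors is an approximate 95% confidence interval.
    print(f"nu = {nu_hat:.3f} +/- {1.96 * se:.3f}")
```

On synthetic data drawn with ν = 0.45 the estimate lands close to 0.45, and ±1.96 standard errors gives an approximate 95% interval, the convention used in the figure captions.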
The averaged distributions of interevent times between activities of individuals show long-tail behavior, similar to other observed human activity patterns such as email communication or web site visits [30]. To model the number of votes per user in the long time limit, where we are only interested in the total number of accumulated votes for a particular user, this clumping of votes in time is not important. Specifically, we suppose each user has an average activity rate ρ while active on the site (cf. Fig. 1), given as $\rho_u = e_u/T_u$, where $\rho_u$ is user u's activity rate, $e_u$ is her number of events (i.e., votes, resolve creations and links), and $T_u$ is the time elapsed between her first and last vote. We suppose the $\rho_u$ values arise as independent choices from a distribution $P_{\mathrm{user}}(\rho_u)$ and are independent of the length of time a user is active on the site. Consistent with this assumption, activity rate and activity time are only weakly correlated (correlation coefficient −0.06 among active users).

We characterize user activities by fractions q and λ for creating resolves and forming links, respectively. The rate of voting on existing resolves for a user is then $\rho_u(1 - q - \lambda)$; voting is by far the most common of the three user activities. For simplicity, we treat these choices as independent and take q and λ to be the same for all users. Thus in our model, the variation among users is due to their differing overall activity rates $\rho_u$ and the amount of time $T_u$ they are active on the site.

Figure 3: Model of user behavior. People join the site as active users, who create resolves, vote on them and link to other active users. Users can eventually stop participating and become inactive.

3.2 Behavior
We estimate the model parameters from the observed user activities, and restrict attention to active users.
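A minimal sketch of this estimation step in Python, assuming a hypothetical event log of (user, time in days, kind) records with kind one of "vote", "resolve" or "link" (the actual format of the data set is not described). It computes $\rho_u = e_u/T_u$ for users active at least a day, estimates q and λ as pooled fractions of all events of those users (an assumption; the paper does not spell out its estimator), and fits a lognormal to the $\rho_u$ values using the mean and standard deviation of $\log \rho_u$, which are the lognormal maximum likelihood estimates.

```python
import math
from collections import defaultdict


def estimate_user_parameters(events, min_active_days=1.0):
    """Estimate rho_u, q, lambda and a lognormal fit of rho_u from an event log.

    `events` is a hypothetical list of (user_id, time_in_days, kind) tuples,
    with kind one of "vote", "resolve" or "link".
    """
    vote_times = defaultdict(list)                    # per-user vote times, for T_u
    counts = defaultdict(lambda: defaultdict(int))    # per-user event counts by kind
    for user, t, kind in events:
        counts[user][kind] += 1
        if kind == "vote":
            vote_times[user].append(t)

    rho = {}
    totals = defaultdict(int)                         # pooled counts over active users
    for user, times in vote_times.items():
        T_u = max(times) - min(times)                 # time between first and last vote
        if T_u < min_active_days:
            continue                                  # keep only active users
        e_u = sum(counts[user].values())              # votes + resolves + links
        rho[user] = e_u / T_u
        for kind, c in counts[user].items():
            totals[kind] += c

    n_events = sum(totals.values())
    q = totals["resolve"] / n_events                  # fraction of events creating resolves
    lam = totals["link"] / n_events                   # fraction of events forming links

    # Lognormal MLE: mean and population standard deviation of log(rho_u).
    logs = [math.log(x) for x in rho.values()]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))
    return rho, q, lam, mu, sigma
```

Pooling events across users matches the model's assumption that q and λ are the same for all users; a per-user average of fractions would weight lightly active users more heavily.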
Table 1 shows the estimates for the parameters q and λ governing activity choices. Fig. 4 shows the observed cumulative distribution of $\rho_u$ values and a fit to a lognormal distribution.

The heavy-tailed nature of the votes-per-user distribution (Fig. 2) can be attributed to the interplay between the user activity times $T_u$ and the broad lognormal distribution of the user activity rates $\rho_u$: the mixture of these two distributions results in a power law, as has also been shown in the context of web page links [16].

The distributions of activity times and rates presumably reflect the range of dedication of users to the site, where most users try the service for a very limited time but active users are also represented in the heavy tail. Such extended distributions of user activity rates are also seen in other activities, including use of web sites, e.g., Digg [22], and scientific productivity [28].

Figure 4: Cumulative distribution of activity rates, $\rho_u$, for the 4719 users who were active at least one day and voted on at least one noninitial resolve or formed at least one link (horizontal axis: user ρ value, log scale; vertical axis: fraction of users with smaller value). The plot includes a curve for a lognormal distribution fit, which is indistinguishable from the points, with parameters µ = 0.03 ± 0.05 and σ = 1.70 ± 0.04. The ρ values are in units of actions per day.

4 Resolves
A key question for user-created content is how user activities distribute among the available content. For Essembly, Fig. 5 shows the total number of votes per resolve. This distribution covers a wide range, with some resolves receiving many times as many votes as the median.

In Essembly, each resolve receives its first vote when it is created, i.e., the vote of the user introducing the resolve.
Thus the observed votes on a resolve are a combination of two user activities: creating a new resolve (giving the resolve its first vote) and subsequently other users choosing to vote on the resolve if they see it while visiting the site. We note that users do not see the distribution of previous votes until they cast their votes, so their judgement is not biased by earlier votes. After voting, they can see how other users voted on the resolve.

We consider a user's selection of an existing resolve to vote on as mainly due to a combination of two factors: the visibility and the interestingness of a resolve to a user. Visibility is the probability a user finds the resolve during a visit to the site. Interestingness is the conditional probability a user votes on the resolve given it is visible to that user. These two factors apply to a variety of web sites; for example, they provide a description of average behavior on Digg [21].

The design of the web site's user interface determines content visibility. Sites, including Essembly, typically emphasize recently created content and popular content (i.e., content receiving many votes over a period of time). Essembly also emphasizes controversial resolves. As with other networking sites, the user interface highlights resolves with these properties both globally and among the user's network neighbors. Users can also find resolves through a search interface. While we cannot observe which resolves people click on, we do record when they vote on them, indicating they found a resolve interesting enough to warrant spending time considering it.

Figure 5: Distribution of votes on resolves (log-log axes: votes per resolve; number of resolves). The solid curve indicates a double Pareto lognormal fit to the values, with parameters α = 2.4 ± 0.1, β = 2.5 ± 0.1, µ = 3.67 ± 0.02 and σ = 0.38 ± 0.02.
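The double Pareto lognormal distribution fitted in Figure 5 (its four parameters are explained later in this section) can be sampled through a standard representation: the logarithm of the variable is a normal variable plus an asymmetric Laplace variable, $\log V = \mu + \sigma Z + E_1/\alpha - E_2/\beta$, with Z standard normal and $E_1, E_2$ independent unit exponentials. The Python sketch below uses that representation with the Figure 5 estimates as defaults; the function name and seed are illustrative, and the samples are continuous values rather than integer vote counts.

```python
import math
import random


def sample_dpln(n, alpha=2.4, beta=2.5, mu=3.67, sigma=0.38, seed=0):
    """Draw n samples from a double Pareto lognormal distribution.

    Uses log V = mu + sigma * Z + E1 / alpha - E2 / beta with Z ~ N(0, 1) and
    E1, E2 ~ Exp(1). Defaults are the Figure 5 estimates.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        log_v = (mu + sigma * rng.gauss(0.0, 1.0)
                 + rng.expovariate(1.0) / alpha     # density's upper tail ~ v**(-alpha - 1)
                 - rng.expovariate(1.0) / beta)     # density's lower tail ~ v**(beta - 1)
        out.append(math.exp(log_v))
    return out


if __name__ == "__main__":
    votes = sorted(sample_dpln(50000))
    print("median:", round(votes[len(votes) // 2], 1))
    print("99th percentile:", round(votes[int(0.99 * len(votes))], 1))
```

The exponential terms shift the lognormal body: $\log V$ has mean $\mu + 1/\alpha - 1/\beta$ and variance $\sigma^2 + 1/\alpha^2 + 1/\beta^2$, which gives a quick check on any sampler or fit.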
In a similar vein, clickthrough rates have been extensively investigated in the context of web search and search engine optimization. Web search engines strive to provide users with relevant results for their queries, and rank the matching documents in decreasing order of perceived importance to the searcher. However, because search queries are not well defined and several possible optimal results may match a user's request, the top-ranked result is not always the most relevant to the user. Search engine logs provide data on which results users click on for given queries, and thus can reveal users' implicit relevance judgements of the results. It has been shown, however, that the probability of clicking on a given result is biased by the presentation order: a result with the same relevance as another but appearing in a higher position may get more clicks (this is also called "trust bias") [10, 9].

Eyetracking experiments have also shown that users scan through search results in a linear order from top to bottom, which further explains why results at the top are clicked with a larger probability [18]. Clickthroughs are analogous to votes cast on resolves in Essembly: in both cases the action indicates a user's preference, for an item found by a web search query or for a resolve, respectively. Predictive models have been developed to compensate for this position bias and reveal the true relevance of the search results to users [1, 8].

Figure 6: Distribution of votes vs. age of a resolve (log-log axes: resolve age; number of votes).

In Essembly, recency appears to be the most significant factor affecting visibility, in a manner very similar to how search engine users perceive ranked results. Fig. 6 shows how votes distribute according to the age of the resolve at the time of the vote. We define the age of a resolve as its ordinal position among resolves ordered by introduction time, counting back from the newest: an age 1 resolve is the newest resolve introduced, while the oldest resolve has age R, where R is the number of resolves. Most votes go to recent resolves with a small age.

The decay in votes with age is motivated by recency (decreasing visibility with age as a resolve moves down, and eventually off, the list of recent resolves). We offer no underlying model for this "aging function", but its overall power-law form corresponds to users' willingness to visit successive pages or scroll down a long list [16]. The values decrease as a power law, proportional to $a^{-s}$, where a is resolve age and s ≈ 0.5 up to about age 50. The step at age 50 is presumably due to a limit on the number of recent resolves readily accessible to users. For larger ages, the values in Fig. 6 decrease faster, with s ≈ 1. It has also been found that in search engine result pages the probability of clicking on a result decreases with the rank of the result as a power law, albeit with a different exponent (−1.6) [20, 12].

The combination of different ages in the data sample is a significant factor in producing the observed distributions [15]. In particular, a distribution of ages combined with a multiplicative process produces a lognormal distribution with power-law tails, the double Pareto lognormal distribution [26], which has four parameters. Two parameters, µ and σ, characterize the location and width of the center of the distribution.
The remaining parameters characterize the tails: α for the power-law decay in the upper tail, with the number of resolves with v votes proportional to $v^{-\alpha-1}$, and β for the power-law growth in the lower tail, with the number of resolves proportional to $v^{\beta-1}$. Fig. 5 shows a fit of this distribution to the numbers of votes different resolves received. For Essembly, the networks have only a modest influence on voting [14].

4.1 Model
Our model of resolve creation, described in Sec. 3, involves a fraction q of each user's activity on the site, on average, giving each resolve its first vote. For subsequent votes, we view a user's choice of resolve as due to an intrinsic interestingness property r of each resolve and its visibility. In general r could depend on the resolve's age and its popularity (especially among network neighbors, if neighbors influence a user to vote rather than just make a resolve more visible). However, for simplicity, we take r to be constant for a resolve. A key motivation for this choice is the observation that high or low rates of voting on a resolve tend to persist over time, when controlling for the age and number of votes the resolve already has. Thus treating interestingness as an intrinsic property of each resolve is a reasonable approximation for Essembly (as discussed further in Sec. 6). We further assume r is independent of the user, which amounts to considering general interest in resolves among the population rather than possible niche interests among subgroups of users. With these simplifications, we take the r values to arise as independent choices from a distribution $P_{\mathrm{resolve}}(r)$.

Visibility of a resolve depends on its age, its rank in number of votes compared with other resolves (popularity), and its controversy, both in general and among a user's neighbors.
For Essembly, resolve age appears to be the most significant factor, so we take visibility to be a function of age a alone, as determined by a function f(a). With these factors, we model the chance that the next vote on existing resolves goes to resolve j as being proportional to $r_j f(a_j)$, where $a_j$ is the age of the resolve at the time of the vote. The model's behavior is unchanged by an overall multiplicative constant, and we arbitrarily set f(1) = 1.

Table 2: Model of the distribution of votes among resolves in the time intervals between successive resolve introductions, shown here for the first four resolves. Each entry is the expected number of votes the resolve (row) receives during the interval (column).

resolve   I_1            I_2            I_3            I_4
1         f(1) r_1 v_1   f(2) r_1 v_2   f(3) r_1 v_3   f(4) r_1 v_4
2         -              f(1) r_2 v_2   f(2) r_2 v_3   f(3) r_2 v_4
3         -              -              f(1) r_3 v_3   f(2) r_3 v_4
4         -              -              -              f(1) r_4 v_4

4.2 Behavior
We would like to estimate the distribution $P_{\mathrm{resolve}}$ and the aging function f(a). To do so, we consider the votes (other than the first vote on each resolve) between successive resolve introductions. Specifically, let R be the number of resolves in our data sample. We index the resolves in the order they were introduced, from 1 to R.

Suppose i resolves have been introduced in Essembly up to a given time, and let $v_i$ be the number of votes made in the time interval $I_i$ between the introductions of resolves i and i + 1 (not including the two votes accompanying those resolve introductions). During this interval, the system has i existing resolves. When the number of existing resolves is large, we can treat the votes going to each resolve as approximately independent. In this case, the number of votes resolve j ≤ i receives during time interval $I_i$ is Poisson distributed with mean $v_i r_j f(i-j+1)$, because during this interval resolve j is of age i − j + 1. Table 2 illustrates these relationships.
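To make Table 2 concrete, the following Python sketch computes the expected vote count $v_i r_j f(i-j+1)$ for every resolve j and interval $I_i$ under the approximately independent Poisson model described above. The aging function $f(a) = a^{-s}$ with s = 0.5 is an assumption based on the power-law decay seen in Fig. 6 for ages up to about 50, and the r and v values in the usage example are made up for illustration.

```python
def power_aging(a, s=0.5):
    """Assumed aging function f(a) = a**(-s), so that f(1) = 1."""
    return a ** (-s)


def interval_means(r, f, v):
    """Expected non-initial votes for resolve j during interval I_i (Table 2).

    r[j - 1] holds r_j, v[i - 1] holds v_i, and f is the aging function.
    Returns a dict mapping (j, i) to v_i * r_j * f(i - j + 1) for all j <= i.
    """
    means = {}
    for i in range(1, len(v) + 1):       # interval I_i, between resolves i and i + 1
        for j in range(1, i + 1):        # only resolves 1..i exist during I_i
            age = i - j + 1              # resolve j has age i - j + 1 during I_i
            means[(j, i)] = v[i - 1] * r[j - 1] * f(age)
    return means


if __name__ == "__main__":
    # Illustrative values only: four resolves, roughly 50 votes per interval.
    table = interval_means(r=[0.04, 0.10, 0.02, 0.06], f=power_aging, v=[50, 55, 48, 52])
    for j in range(1, 5):
        row = [f"{table[(j, i)]:6.2f}" if j <= i else "     -" for i in range(1, 5)]
        print(j, *row)
```

Each row of the printed table corresponds to one resolve and each column to one interval, matching the layout of Table 2.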
We estimate the $r_j$ and f(a) values as those maximizing the likelihood of the observed numbers of votes on the resolves in these time intervals, treated as coming from independent Poisson distributions. This maximization does not have a simple closed form, but setting derivatives with respect to these parameters to zero does give simple relations between these values at the maximum:

$$r_j = \frac{v^{\mathrm{resolve}}_j}{\sum_{a=1}^{R-j+1} f(a)\, v_{a+j-1}} \qquad (1)$$

$$f(a) = \frac{v^{\mathrm{age}}_a}{\sum_{j=1}^{R-a+1} r_j\, v_{a+j-1}} \qquad (2)$$

where $v^{\mathrm{resolve}}_j$ is the number of votes resolve j has received, and $v^{\mathrm{age}}_a$ is the number of votes made to resolves of age a at the time of the vote, in both cases excluding the initial vote to each resolve. The resulting f(a) estimates from the numerical solution are similar to the distribution of votes vs. age in Fig. 6, and Fig. 7 shows the distribution of estimated r values and a lognormal fit.

Figure 7: Cumulative distribution of r values for the resolves as obtained from a maximum likelihood estimate for the observed data (horizontal axis: resolve r value, log scale; vertical axis: fraction of resolves with smaller value). The curve shows a lognormal distribution fit, with parameters µ = −3.11 ± 0.01 and σ = 0.69 ± 0.01.

With the wide variation in r values for resolves and in activity rates for users (Fig. 4), a natural question is whether these variations are related; in particular, whether the most active users tend preferentially to introduce resolves that are especially interesting to other users. While active users tend to introduce more resolves overall, the correlation between the activity rate of a user and the average r value of the resolves introduced by that user is small: −0.06. We find a modest correlation (0.16) between the time a user is active on the site and the mean r value of that user's introduced resolves.
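Equations (1) and (2) suggest a simple way to compute the maximum likelihood estimates numerically: start from constant values and alternately apply (1) and (2) until they stop changing. The Python sketch below does this for a small triangular table of counts. It assumes this alternating scheme (the paper does not say which numerical method it used), and the function name and 0-indexed argument layout are hypothetical. Because only the products $r_j f(a)$ enter the likelihood, the sketch rescales after each pass so that f(1) = 1, matching the convention in Sec. 4.1.

```python
def fit_r_and_f(counts, v, n_iter=5000):
    """Maximum likelihood r_j and f(a) by alternating Eqs. (1) and (2).

    Hypothetical layout: counts[i][j] (0-indexed, j <= i) is the number of
    non-initial votes that resolve j + 1 received during interval I_{i+1};
    v[i] is v_{i+1}. Returns (r, f) with r[j] = r_{j+1} and f[a] = f(a+1),
    scaled so that f(1) = 1.
    """
    R = len(v)
    # Numerators: votes per resolve (v^resolve) and votes per age (v^age).
    v_resolve = [sum(counts[i][j] for i in range(j, R)) for j in range(R)]
    v_age = [sum(counts[j + a][j] for j in range(R - a)) for a in range(R)]

    r = [1.0] * R
    f = [1.0] * R
    for _ in range(n_iter):
        # Eq. (1): resolve j+1 has age a+1 during interval j+a+1.
        r = [v_resolve[j] / sum(f[a] * v[j + a] for a in range(R - j))
             for j in range(R)]
        # Eq. (2): sum over the resolves that reach age a+1 within the data.
        f = [v_age[a] / sum(r[j] * v[j + a] for j in range(R - a))
             for a in range(R)]
        # Only the products r_j f(a) are identified; fix the scale with f(1) = 1.
        scale = f[0]
        f = [x / scale for x in f]
        r = [x * scale for x in r]
    return r, f


if __name__ == "__main__":
    # Noise-free synthetic counts from known (made-up) r and f values.
    r_true = [0.5, 1.0, 0.2, 0.8, 0.3]
    f_true = [1.0, 0.6, 0.4, 0.3, 0.25]
    v = [10.0, 20.0, 15.0, 30.0, 25.0]
    counts = [[v[i] * r_true[j] * f_true[i - j] for j in range(i + 1)]
              for i in range(len(v))]
    r_hat, f_hat = fit_r_and_f(counts, v)
    print([round(x, 4) for x in r_hat])
    print([round(x, 4) for x in f_hat])
```

With noise-free counts generated from known r and f, the iteration recovers them. Each pass here costs O(R²) in pure Python, which would be slow for R ≈ 25,000 resolves; the inner sums are correlations of f (or r) with v and could be computed more efficiently, for example with FFT-based methods. This sketch favors clarity.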
To relate this model to the vote distribution of Fig. 5, consider the votes received by resolve j up to and including the time it is of age A. According to our model, the number of votes, other than its first vote, this resolve receives is a Poisson variable $V_j(A)$ with mean

$$\mu_j(A) = r_j \sum_{a=1}^{A} f(a)\, v_{j+a-1}$$

At the end of our data set, resolve j is of age R − j + 1.

The persistence of votes on resolves, based on the wide variation of r values among resolves, gives rise to a multiplicative process with decay. To see this, note that in our model the number of votes between successive resolve introductions is geometrically distributed with mean $\hat{v} = (1-q-\lambda)/q \approx 52$. Furthermore, from Fig. 6, the aging function is approximately a power law, with $f(a) \approx a^{-s}$ and $\sum_{a=1}^{A} f(a) \sim A^{1-s}/(1-s)$. The expected number of votes up to age A is then $\mu_j(A) \sim r_j \hat{v} A^{1-s}/(1-s)$. After accumulating many votes (i.e., when A is large), the actual number of votes $V_j(A)$ will usually be close to this expected value. The change in votes to age A + 1 is

$$V_j(A+1) \approx r_j \hat{v} \frac{A^{1-s}}{1-s}(1+x) \approx V_j(A)(1+x)$$

where x is a nonnegative random variable with mean (1 − s)/A. Thus, except possibly for the votes a resolve receives shortly after its introduction, the growth in the number of votes is well described by a multiplicative process with decay.

That our model corresponds to a multiplicative process has two consequences. First, a sample obtained at a range of ages from a multiplicative process (with or without decay) leads to the double Pareto lognormal distribution seen in Fig. 5. In our case, the sample has a uniform range of ages from 1 to R, though with the decay older resolves accumulate votes more slowly than younger ones. A second consequence arises from the decay as resolves become less visible over time.
Thus our model provides one mechanism, using locally available information, that gives rise to dynamics governed by multiplicative random variation with decay. A similar process arises if the decay is due to any combination of decreasing interest in the content and loss of visibility with age, e.g., as seen on sites such as Digg [31] with current-events stories that become less relevant over time.

As one indication of the diversity of voting on resolves, Fig. 8 shows how the average r value for resolves receiving votes compares to the average for all resolves among those at least a given age. Randomization tests indicate that the average r values of resolves receiving votes are unlikely to be the same as those of all resolves at each of these ages, with p-values less than $10^{-3}$ in all cases. With increasing age, resolves continuing to receive votes tend to be those with especially high r values. This behavior indicates that high interestingness estimates for resolves persist over time, as a small subset of resolves continues to collect votes well after their introduction.

Figure 8: Ratio of means of r estimates for resolves receiving votes at or after various ages to the r estimates for all resolves of those ages (horizontal axis: minimum age; vertical axis: ratio of mean r values). Error bars indicate the standard error in the ratio of means from the standard deviation of the r values and the number of resolves in each category.

5 Links
Users' decisions of whom to link to and how they attend to the behavior of their neighbors can significantly affect the performance of participatory web sites. A common property of such networks is the wide range in the numbers of links made by users, i.e., the degree distribution of the network.
The structure of the networks is typical of those seen in online social networking sites, and the links created by users generally conform to their nominal semantics [14]. The degree distributions in all three Essembly networks are close to a truncated power law [25], with the number of users in the network with degree d proportional to $d^{-\tau} e^{-d/\kappa}$. Fig. 9 shows the distribution of degrees in the networks.

Figure 9: The number of users in the Essembly social network who have a given number of links of the indicated type (plus symbols are for the friends, circles for the allies, and squares for the nemeses networks, respectively; log-log axes: degree; number of users). The parameters of the best fits of truncated power laws on the three sets of data are given in the text.

These long-tail degree distributions are often viewed as due to a preferential attachment process in which users tend to form links with others in proportion to how many links they already have. Combined with a limitation on the number of links a user has, this process gives truncated power-law degree distributions [2]. For Essembly, this limitation arises from users becoming inactive, since such users no longer accept links. However, users in Essembly have no direct access to the number of links of other users. Thus we need to identify a mechanism users could use, based on information available to them.

The mechanism underlying preferential attachment likely differs between the friends network and the two ideological ones.
Since the links appear to follow their nominal semantics [14], links in the friends network are likely to be mainly between people who know each other (i.e., who did not find each other via Essembly), while ideological links (especially between people who are not also friends) require finding people by their ideological profile, which Essembly makes available. Building such a profile requires voting, so a user with many votes is more likely to have voted on the same resolves as other users. Such common votes allow ideological comparisons between users and therefore suggestions of potential users to link to. The need to build ideological profiles suggests that votes on common resolves are key to the number and type of links. That is, a user forming a link is more likely to have many common votes with other users who are very active (and hence have many votes). Thus forming links based on common votes is likely to lead people to link preferentially to highly active users, who will in turn tend to have many links.

One challenge to evaluating this mechanism is causation: resolves voted on by network neighbors are highlighted in the user interface, making them more visible and hence more likely to receive votes. Thus common votes increase the chances of forming a link by providing information to form a profile, and links increase the chance of common votes through the visibility of resolves. Separating these effects is especially challenging since our data set does not indicate when each link was formed.

We can partially address this challenge through two observations. First, Essembly presents “resolves in your network”, grouping the three networks together, so any influence of networks on resolve visibility should be similar for all networks. Second, Fig. 10 illustrates a distinction between the ideological networks in Essembly and the social network nominally linking people who know each other as friends. The figure shows that friends generally have many more resolves in common (i.e., resolves both users voted on) than random pairs of users who participate in at least one of the networks. The figure also shows that the two ideological networks (allies and nemeses) are similar to each other and have significantly more common resolves than the friends network.

These two observations suggest that the enhanced number of common votes for the friends network compared to random pairs is primarily due to the increased visibility of resolves that network neighbors vote on. Because Essembly presents resolves from all networks together, this enhancement is also likely to be the same for the ideological networks. Hence, the remaining increase in common votes in the ideological networks compared to the friends network suggests the additional commonality users require to form these links.

Fig. 11 shows that the types of links vary with user activity. For this plot, users with at least one network connection are grouped into quantiles by their number of votes. Each point on the plot is the average fraction of link types among users in that quantile, with the error bar indicating the standard deviation of this estimate of the mean (i.e., its standard error). Users with few votes tend to have most of their links to friends only, and so do not participate much in the ideological networks. On the other hand, users with many votes tend to have most of their links in the ideological networks and to people who are not also friends.
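The grouping used for Fig. 11 can be sketched in Python as below, assuming each user is given as a vote count plus one link-type label per linked neighbor. The label strings, the equal-size split by vote rank, and the function name are illustrative assumptions rather than the code used to produce the figure.

```python
import math
import statistics

# Hypothetical labels for the three link categories of Fig. 11.
LINK_TYPES = ("only_friends", "friends_and_ideological", "non_friends")


def link_type_fractions_by_quantile(users, n_quantiles=10):
    """users: list of (n_votes, labels) pairs, with one label from LINK_TYPES per
    linked neighbor. Users without links are dropped; the rest are split by vote
    rank into equal-size groups. Returns, per group, the largest vote count and,
    for each link type, the mean fraction and the standard error of that mean."""
    linked = sorted((u for u in users if u[1]), key=lambda u: u[0])
    size = len(linked)
    rows = []
    for q in range(n_quantiles):
        group = linked[q * size // n_quantiles:(q + 1) * size // n_quantiles]
        if not group:
            continue
        row = {"max_votes": group[-1][0]}
        for link_type in LINK_TYPES:
            fracs = [labels.count(link_type) / len(labels) for _, labels in group]
            sem = (statistics.stdev(fracs) / math.sqrt(len(fracs))
                   if len(fracs) > 1 else float("nan"))
            row[link_type] = (statistics.mean(fracs), sem)
        rows.append(row)
    return rows
```

Splitting by rank keeps the groups equal in size; users with the same vote count can, however, fall on either side of a group boundary.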
The same trend in link types occurs as a function of other measures of user activity, i.e., using quantiles based on the time a user is active or on the number of links a user has.

Figure 10: Cumulative distribution of the number of common votes among linked pairs in the networks, and among random pairs of users who are in at least one network. For each number of common votes, the curves show the fraction of pairs with more than that many resolves both users in the pair voted on. [Plot: fraction of links with more common votes vs. number of common votes, for random pairs, friends, allies, and nemeses.]

5.1 Model

In our model, user $u$ forms links at a rate $\lambda\rho_u$. Thus the number of links a user forms is a combination of activity rate and how long the user remains at the site. The wide variation in activity times and $\rho_u$ among users (Figs. 1 and 4) gives rise to a wide distribution of the number of links.

While the most common mechanisms designed to reproduce the observed power-law degree distributions use growth rules and the degree of vertices in link formation [11], in the following we propose a mechanism that describes link formation between two users using only the extent to which they share interests. Because links involve two people, an additional modeling issue is which pairs of users form links. In our model, we take the friends network to primarily reflect a preexisting social network. For the ideological networks, however, we take the choices to depend on common votes. Furthermore, only active users can form links. Specifically, we model the likelihood that a (non-friend) pair forms a link in an ideological network as proportional to the number of common votes they have.
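A minimal Python simulation of this link-count model, assuming links arrive as a Poisson process at rate $\lambda\rho_u$ during a user's activity time. The lognormal form for $\rho$, the exponential activity time with mean `tau`, the parameter values, and the function names are hypothetical stand-ins for the fitted distributions. The sketch also includes the closed form for the probability of no links: averaging $e^{-\lambda\rho t}$ over an exponential activity time with mean $\tau$ gives $\int_0^\infty \tau^{-1} e^{-t/\tau} e^{-\lambda\rho t}\,dt = 1/(1+\lambda\rho\tau)$.

```python
import random
import statistics


def poisson(rng, mean):
    """Poisson(mean) sample: count the arrivals of a unit-rate Poisson process
    (exponential gaps) that occur before time `mean`."""
    count, clock = 0, rng.expovariate(1.0)
    while clock < mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def simulate_link_counts(n_users, lam, tau, mu, sigma, seed=0):
    """Hypothetical user population: activity rate rho ~ lognormal(mu, sigma) and
    activity time t ~ exponential with mean tau; each user forms
    Poisson(lam * rho * t) links while active."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_users):
        rho = rng.lognormvariate(mu, sigma)
        t = rng.expovariate(1.0 / tau)
        counts.append(poisson(rng, lam * rho * t))
    return counts


def prob_no_links(rhos, lam, tau):
    """Closed form: averaging exp(-lam*rho*t) over an exponential t with mean tau
    gives 1 / (1 + lam*rho*tau); then average over the given rho values."""
    return statistics.mean(1.0 / (1.0 + lam * rho * tau) for rho in rhos)


# Usage with hypothetical parameters: heterogeneous rates (sigma > 0).
counts = simulate_link_counts(5_000, lam=1.0, tau=1.0, mu=0.0, sigma=1.0)
no_link_fraction = counts.count(0) / len(counts)
```

Setting `sigma=0` makes all rates equal, in which case the simulated fraction of users without links should approach `1 / (1 + lam * tau)`; with `sigma > 0` the spread of link counts widens, which is the qualitative effect the model relies on.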
Pairs already linked as friends can also add an ideological link. People form ideological links based on common votes, and only users active at the same time can form links. The first factor gives more links to those who vote a lot, since they are more likely to have votes in common with others. In effect, this leads to preferential attachment for forming links (those with more links are likely to be users with more votes, hence more overlap with others), even though the attachment probability does not explicitly depend on degrees. The activity constraint limits link growth, corresponding to descriptive models giving truncated power-law degree distributions [2].

Figure 11: Fraction of link types vs. number of votes on noninitial resolves. A linked pair of users is denoted “only friends” when their only link is in the friends network, “non-friends” when they are not linked in the friends network, and “friends & ideological” when they have a friends link as well as a link in the allies or nemeses networks. [Plot: fraction of link types vs. number of votes, log scale.]

5.2 Behavior

To verify whether users connect to each other based on similarities in their voting profiles, we propose the following simplified mechanism for link formation. Suppose that user A voted on $N_A$ resolves, while user B voted on $N_B$ resolves in total. Assuming that A and B form a link with a probability proportional to the number of votes the pair has in common, this probability will be $P_l(A,B) \propto N_A N_B$ if $N_A$ and $N_B$ are much smaller than the total number of resolves, and A and B vote independently of each other, picking resolves randomly from the pool of all available resolves. Caldarelli et al.
have shown that if vertices in a network possess intrinsic “fitnesses”, and the linking probability is proportional to the product of the fitnesses of the two vertices to be linked, then, in the particular case when the fitnesses are drawn from a power-law probability distribution, the resulting degree distribution has the same exponent as the fitness distribution [7]. We can consider the number of votes a person makes as the fitness of the vertex, and arrive by analogy at the same model as Ref. [7], resulting in an expected power-law exponent of $-1.45$ (Fig. 2).

Fitting truncated power laws to the degree distributions of the three networks shown in Fig. 9, we found the parameters $\tau_F = 1.25 \pm 0.04$, $\kappa_F = 27 \pm 4$; $\tau_A = 1.20 \pm 0.04$, $\kappa_A = 59 \pm 9$; and $\tau_N = 1.44 \pm 0.11$, $\kappa_N = 18 \pm 6$ for the friends, allies, and nemeses networks, respectively, where the ± values indicate 95% confidence intervals. The power-law exponents are in the range [1.20, 1.44], consistent with the exponent of Fig. 2. The truncation of the power laws seen in the degree distributions is most likely the result of vertices gradually becoming inactive over time.

An interesting consequence of the above is that, although the friends network on Essembly is not expected to result from shared votes made conspicuous by the web user interface, its exponent matches as well. This suggests that friendship links in real life may also form around shared interests, and that the range of interests people have may follow a probability distribution similar to the one shown in Fig. 2.

Unlike random graph models with this degree distribution [25], our mechanism based on common votes also gives significant transitivity, comparable to that observed for the allies network.
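The two steps of this argument can be sketched in Python. Under uniform, independent voting over $R$ resolves, the mean number of common votes is exactly $N_A N_B / R$, so a link probability proportional to common votes is proportional to the product $N_A N_B$, as in the fitness model of [7]. Sampling a graph from that model then shows degrees tracking fitness. The Pareto-distributed activity levels (tail exponent chosen for illustration, not the value in Fig. 2), the normalization by total fitness, and all names are assumptions of this sketch.

```python
import random


def expected_common_votes(n_a, n_b, n_resolves):
    """Mean overlap when A and B each vote on distinct resolves chosen uniformly
    at random: each of A's n_a resolves is among B's with probability
    n_b / n_resolves, so the mean is n_a * n_b / n_resolves."""
    return n_a * n_b / n_resolves


def simulated_common_votes(n_a, n_b, n_resolves, trials=2000, seed=0):
    """Monte Carlo check of expected_common_votes."""
    rng = random.Random(seed)
    pool = range(n_resolves)
    total = 0
    for _ in range(trials):
        total += len(set(rng.sample(pool, n_a)) & set(rng.sample(pool, n_b)))
    return total / trials


def fitness_model_degrees(fitness, seed=0):
    """Link each pair (i, j) with probability min(1, x_i * x_j / sum(x)), i.e.
    proportional to the product of fitnesses; return every vertex's degree.
    Without clipping, vertex i's expected degree is x_i * (1 - x_i / sum(x))."""
    rng = random.Random(seed)
    norm = sum(fitness)
    n = len(fitness)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, fitness[i] * fitness[j] / norm):
                degree[i] += 1
                degree[j] += 1
    return degree


# Hypothetical activity levels with a power-law tail (pdf ~ x**-2.5, x >= 1),
# standing in for users' vote counts as fitnesses.
rng = random.Random(1)
activity = [rng.paretovariate(1.5) for _ in range(1000)]
degrees = fitness_model_degrees(activity)
```

Because each vertex's expected degree is roughly proportional to its fitness, a power-law tail in the activity levels carries over to the degrees, which is the content of the result in [7].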
Here transitivity means that if users A and B have voted on many resolves in common, as have users B and C, then users A and C also tend to have significant overlap in the resolves they voted on.

A further consequence of our model, with ideological links depending on common votes, is the prediction of a change in the types of links users make as they vote. In particular, users with few votes will also have few common votes with other users, and hence their links will tend to be mostly friends. Users with many votes, on the other hand, will tend to have common votes with many others and hence, according to this model, tend to have mostly ideological links. This change in the type of links as a function of a user's number of votes or links occurs in Essembly, as seen in Fig. 11.

Finally, our model attributes the significant fraction of users who form no links to a combination of low activity rate $\rho$ and short activity time $t$. Specifically, in our model the probability that a user has no links is $e^{-\lambda\rho t}$. For active users, whose activity time distribution is roughly exponential with time constant $\tau$, the values in Table 1 and the distribution of $\rho$ values in Fig. 4 give the probability of no links, as the average value of $e^{-\lambda\rho t}$, equal to 23%. This compares with the 1242 out of 4762 active users (i.e., 26%) who have no links in our data set.

Figure 12: Estimates of $\rho$ values for several users as a function of the time since their first vote. Error bars show the 95% confidence intervals. [Plot: $\rho$ estimate vs. time (days).]

6 Online Estimation

Our model allows estimating parameters for new users and new resolves as they act in the system. In particular, we describe using the early history of resolves to estimate the number of votes a resolve will eventually have, as well as which resolve will likely receive the next vote. Fig.
12 shows estimates of user activity levels as a function of time since the user first voted. We see that user activity levels change with time, and in different ways. So users differ considerably not only in their average activity rates but also in how their interest in the site varies over time.

For resolves, using the model of Sec. 4, Fig. 13 shows how estimates for resolves, and their confidence intervals, change over time as more votes are observed. Other resolves show similar behavior. Thus the interestingness estimates of resolves appear to converge over time, as we expect.

In practice, however, the optimization procedure is computationally very costly due to the large number of parameters, which grows linearly with the number of resolves in the system. A further requirement of an online algorithm is that it be able to update the model parameters in real time as new users, votes and resolves enter the system. Thus it is not feasible to consider a growing number of resolves with constant resources. Instead we must limit the number of parameters, and thus the number of resolves to be optimized, to a constant value.

Figure 13: Estimates of $r$ values for two resolves as a function of their age. Error bars show the 95% confidence intervals. [Plot: $r$ estimate vs. resolve age.]

One such approach is to optimize parameters based on the last $K$ active resolves only, and keep the interestingness and aging parameters constant for resolves older than that. Interestingly, this method has the potential benefit of being able to track changes in interestingness and aging over time.

Another incremental approach uses the observation that old resolves, with a long track record of votes, have well-estimated interestingness, and similarly the aging function $f(a)$ for small ages is well estimated from prior experience with many resolves receiving votes at those ages.
Conversely, recently introduced resolves have had little time to accumulate votes, and $f(a)$ for large ages is poorly estimated because the system has had little experience with resolves that old. Furthermore, we can expect $f(a)$ to change slowly with time, since it is primarily due to how the user interface makes resolves visible to users. The maximum likelihood estimation for these parameters described in Sec. 4 requires a computationally expensive optimization to find the best choices for $r_j$ and $f(a)$ for all values. For new resolves, with $j$ close to $R$, Eq. (1) determines the $r_j$ values in terms of the values of $f(a)$ for small ages (i.e., $a = 1, \ldots, R - j + 1$), which are already well determined from the prior history of the system. So instead of an expensive reevaluation of all the $r$ and $f$ values, we can simply estimate the $r$ values of new resolves incrementally, assuming the $f(a)$ values for small ages do not change much. Conversely, as new resolves are introduced, the oldest resolves in the system advance to ever larger ages, allowing estimates of $f(a)$ for those ages from Eq. (2) by assuming the $r$ values of those old resolves do not change much with the introduction of new resolves.

Such estimates of model parameters, if extended to user behavior as well, can be useful guides for improving social web sites by identifying new users likely to become highly active or content likely to become popular. Since it is possible to estimate the statistical errors given the sample size, one can also perform risk assessment when giving the estimates. Newly posted content with high interestingness, for instance, can be quickly identified and given prominent attention on the online interface.
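The incremental scheme can be sketched in Python under an assumed model in which the number of votes a resolve receives at age $a$ is Poisson with mean $r_j f(a)$. This is a hypothetical stand-in: Eqs. (1) and (2) of Sec. 4 are not reproduced here and may differ. Under that assumption, holding $f$ fixed gives a closed-form maximum-likelihood $r_j$ for a new resolve, and holding the $r$ values fixed gives a closed-form $f(a)$ from the resolves that have reached age $a$. Ages are 0-based list indices, the function names are illustrative, and $r$ and $f$ are only determined up to a common scale factor, so one would typically fix $f$ at the first age to 1.

```python
def estimate_r(votes_by_age, f):
    """Maximum-likelihood r for one resolve with f held fixed, under the assumed
    model votes_by_age[a] ~ Poisson(r * f[a]): total votes / total exposure."""
    exposure = sum(f[a] for a in range(len(votes_by_age)))
    return sum(votes_by_age) / exposure


def estimate_f(age, votes, r):
    """Maximum-likelihood f[age] with the r values held fixed: pool every resolve
    that has reached `age` (votes[j] lists resolve j's votes per age so far)."""
    reached = [j for j, v in enumerate(votes) if len(v) > age]
    return sum(votes[j][age] for j in reached) / sum(r[j] for j in reached)


# Usage with hypothetical numbers: a new resolve observed at ages 0 and 1,
# with f for small ages taken as already known from older resolves.
f = [1.0, 0.5, 0.25]
r_new = estimate_r([4, 2], f)  # 6 votes / exposure 1.5 = 4.0

# Resolves that have reached age 2 update f[2] with their r values held fixed.
votes = [[3, 1, 2], [1, 1, 1], [0, 0]]
f_2 = estimate_f(2, votes, r=[2.0, 2.0, 1.0])  # (2 + 1) / (2.0 + 2.0) = 0.75
```

Each update touches only the new resolve or the single age being extended, which is what keeps the cost per step bounded; a full joint re-optimization could still be run occasionally to correct accumulated drift.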
7 Discussion

We described several extended distributions resulting from user behavior on Essembly, a web site where users create and rate content as well as form networks. These distributions are common in participatory web sites. From the extended distributions of user behavior we find an extremely heterogeneous population of users and resolves. We introduced a plausible mechanism describing user behavior based on locally available information, involving a combination of aging and a large variation among people and resolves.

We centered our investigations around three areas: the wide range in user activity levels in online participation; how online social networks form around topical interests; and the factors that influence the popularity of user-created content.

In particular, we found, first, that most users try the online services only briefly, so most of the activity arises from a relatively small fraction of users who account for the diverse behavior observed.

Second, we gave a plausible, quantitative explanation of the long-tailed degree distributions observed in online communities, based only on the observed activity patterns of users and the underlying collaborative mechanisms. Our observations suggest that different mechanisms underlie the formation of the social (friends) and ideological (allies and nemeses) networks, although these mechanisms give similar outcomes, e.g., for the qualitative form of the degree distribution. The implications may extend beyond purely online societies to other social connections where shared interests motivate relationship formation. Our model, however, does not address other significant properties of the networks, such as community structure and assortativity, and why they differ among the three networks [14].
Nor does our model address detailed effects on user behavior due to users' network neighbors.

Third, we proposed a model and algorithm that can describe and predict, through iterative refinement, how the popularity of user-generated submissions evolves in time, considering both their changing exposure online and their inherent interestingness. We found that the exposure that content receives depends largely on its recency, and decays with age.

The characteristics of our models plausibly apply to other web sites where user participation is self-directed and where content creation and social link formation play a dominant part in individual online activities. The Digg and Wikipedia user communities (whose activity data are publicly available) in particular may show similar behavior in their activity patterns. Our models could be extended to include the weak, but nevertheless statistically significant, correlations among user behaviors, such as between activity rate and the time users remain active on the site. Including such correlations, as well as some historical and demographic information on individual users, may improve the model predictions, as seen, for example, in models estimating customer purchase activities [6].

Consequences of our model include suggestions for identifying active users and interesting resolves early in their history, e.g., from persistence in voting rates over time, even before they accumulate enough votes to be rated as popular. Such identification could be useful to promote interesting content on the web site more rapidly, particularly in the case of niche interests. Beyond helping users find interesting content, designs informed by causal models could also help with derivative applications, such as collaborative filtering or developing trust and reputations, by quickly focusing on the most significant users or items.
Such applications raise significant questions about the relevant time scales. That is, observed behavior is noisy, so there is a tradeoff between using a long time to accumulate enough statistics to calibrate the model vs. using a short time to allow responsiveness faster than other proxies for user interest, such as popularity.

Our models raise additional questions about the population properties we used. One such question is how the resolve aging function relates to the user interface and to changing interests among the user population. Another question is how the wide distributions in user activity and resolve interestingness arise. The lognormal fits suggest that underlying multiplicative processes are involved. It would also be interesting to extend the model to identify niche resolves, if any, that is, resolves of high interest to small subgroups of users but not to the population as a whole. Automatically identifying such subgroups could help people find others with similar interests by supplementing comparisons based on ideological profiles.

A caveat on our results, as with other observational studies of web behavior, is that the evidence for mechanisms is based on correlations in observations. While the mechanisms proposed here are plausible causal explanations, since they rely on information and actions available to users, intervention experiments would give more confidence in distinguishing correlation from causal relationships. Our model provides testable hypotheses for such experiments. For example, if intrinsic interest in resolves is a major factor in users' selection of resolves, then deliberate changes in the number of votes may change visibility but will not affect interestingness. In that case, we would expect subsequent votes to return to the original trend.
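A small Python sketch of why multiplicative processes produce lognormal-like distributions: if a quantity is repeatedly multiplied by independent random factors, its logarithm is a sum of independent terms and so, by the central limit theorem, approximately normal. The uniform factors on [0.5, 1.5], the number of steps, the sample size, and the function name are arbitrary illustrative choices, not quantities estimated from Essembly.

```python
import math
import random
import statistics


def multiplicative_samples(n_samples=20_000, steps=50, seed=0):
    """Start each sample at 1 and multiply it by `steps` independent factors
    drawn uniformly from [0.5, 1.5]."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        value = 1.0
        for _ in range(steps):
            value *= rng.uniform(0.5, 1.5)
        samples.append(value)
    return samples


logs = [math.log(v) for v in multiplicative_samples()]
mu, sd = statistics.mean(logs), statistics.stdev(logs)
# For a normal distribution about 68% of values lie within one standard deviation
# of the mean; the log-values here should come close to that fraction.
within_one_sd = sum(abs(x - mu) < sd for x in logs) / len(logs)
```

The mean of the log-values should be close to `steps` times the mean of `log(factor)`, so the typical size of the product is set by the geometric mean of the factors rather than their arithmetic mean.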
One area for experimentation is to determine how users value content on various web sites. For example, if items are valued mainly because others value them (e.g., fashion items), then observed votes would cause, rather than just reflect, high value. In such cases, random initial variations in ratings would be amplified, and would show very different results if repeated or tried on separate subgroups of the population. If items all have similar values and differ mainly in visibility, e.g., through recency or popularity, then we would expect votes to depend on the rank order of votes (e.g., whether an item is the most popular) rather than on the absolute number of votes. If items have broad intrinsic value, then voting would show persistence over time and similar outcomes for independent subgroups. It would also be useful to identify aspects of the model that could be tested in small groups, thereby allowing detailed and well-controlled laboratory experiments comparing multiple interventions. Larger-scale experiments [4, 27] would also be useful to determine the generality of these mechanisms.

The key features of continual arrival of new users, existing users becoming inactive, and a wide range of activity levels among the user population and of interest in the content can apply in many contexts. For the distribution of how users rate content (e.g., votes on resolves in Essembly), generalizing to other situations will depend on the origin of perceived value to the users. At one extreme, which seems to apply to Essembly, the resolves themselves have a wide range of appeal to the user population, leading some items to consistently collect ratings at higher rates than others. At the other extreme, perceived value could be largely driven by popularity among the users, or subgroups of users, as seen in some cultural products [27].
In rapidly changing situations, e.g., current news events or blog posts, recency is important not only in providing visibility through the system's user interface, but also in determining the level of interest. In other situations, the level of interest in the items changes slowly, if at all, as appears to be the case for resolves in Essembly concerning broad political questions such as the benefits of free trade. All these situations can lead to long-tail distributions through a combination of a “rich get richer” multiplicative process and decay with age. But these situations have different underlying causal mechanisms and hence different implications for how changes in the site will affect user behavior. Thus, design and evaluation of participatory web sites can benefit from the availability of causal models.

Acknowledgments

We thank Chris Chan and Jimmy Kittiyachavalit of Essembly for their help in accessing the Essembly data. We have benefited from discussions with Michael Brzozowski, Dennis Wilkinson, and Tamás Sarlós.

References

[1] E. Agichtein, E. Brill, S. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In Proc. of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3–10, 2006.
[2] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small-world networks. Proc. of the Natl. Acad. Sci., 97:11149–11152, 2000.
[3] C. Anderson. The Long Tail: Why the Future of Business is Selling Less of More. Hyperion, 2006.
[4] W. S. Bainbridge. The scientific research potential of virtual worlds. Science, 317:472–476, 2007.
[5] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424:157–308, 2006.
[6] S. Borle, S. S. Singh, and D. C. Jain. Customer lifetime value measurement. Management Science, 54:100–112, 2008.
[7] G. Caldarelli, A. Capocci, P. De Los Rios, and M. A. Munoz. Scale-free networks from varying vertex intrinsic fitness. Physical Review Letters, 89:258702, 2002.
[8] B. Carterette and R. Jones. Evaluating search engines by modeling the relationship between relevance and clicks. In J. Platt et al., editors, Advances in Neural Information Processing Systems. NIPS, 2007.
[9] C. L. A. Clarke, E. Agichtein, S. Dumais, and R. W. White. The influence of caption features on clickthrough patterns in web search. In Proc. of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 135–142, 2007.
[10] N. Craswell, O. Zoeter, M. Taylor, and B. Ramsey. An experimental comparison of click position-bias models. In Proc. of the International Conference on Web Search and Web Data Mining, pages 87–94, NY, 2008. ACM.
[11] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks. Advances In Physics, 51:1079–1187, 2002.
[12] S. Fortunato, A. Flammini, F. Menczer, and A. Vespignani. Topical interests and the mitigation of search engine bias. Proc. of the Natl. Acad. of Sciences, 103:12684–12689, 2006.
[13] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In Proc. of the 13th Intl. World Wide Web Conf. (WWW2004), pages 403–412, New York, 2004. ACM.
[14] T. Hogg, D. M. Wilkinson, G. Szabo, and M. Brzozowski. Multiple relationship types in online communities and social networks. In Proc. of the AAAI Symposium on Social Information Processing, 2008.
[15] B. A. Huberman and L. A. Adamic. Growth dynamics of the World Wide Web. Nature, 401:131, 1999.
[16] B. A. Huberman, P. L. T. Pirolli, J. E. Pitkow, and R. M. Lukose. Strong regularities in World Wide Web surfing. Science, 280:95–97, 1998.
[17] A. James and M. J. Plank. On fitting power laws to ecological data. arxiv.org preprint 0712.0613, 2007.
[18] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In Proc. of the International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 154–161, 2005.
[19] C. Lam. SNACK: incorporating social network information in automated collaborative filtering. In Proc. of the 5th ACM Conference on Electronic Commerce (EC’04), pages 254–255. ACM Press, 2004.
[20] R. Lempel and S. Moran. Predictive caching and prefetching of query results in search engines. In Proc. of the International Conference on World Wide Web, pages 19–28, 2003.
[21] K. Lerman. Social information processing in social news aggregation. arxiv.org preprint cs.cy/0703087, 2007.
[22] K. Lerman. User participation in social media: Digg study. In IEEE/WIC/ACM Intl. Conf. on Web Intelligence and Intelligent Agent Technology, pages 255–258, 2007.
[23] M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1:226–251, 2004.
[24] M. E. J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46:323–351, 2005.
[25] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64:026118, 2001.
[26] W. J. Reed and M. Jorgensen. The double Pareto-lognormal distribution: A new parametric model for size distributions. Communications in Statistics: Theory and Methods, 33:1733–1753, 2004.
[27] M. J. Salganik, P. S. Dodds, and D. J. Watts. Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311:854–856, 2006.
[28] W. Shockley. On the statistics of individual variations of productivity in research laboratories. Proc. of the IRE, 45:279–290, 1957.
[29] A. Vázquez. Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E, 67:056104, 2003.
[30] A. Vázquez, J. G. Oliveira, Z. Dezso, K.-I. Goh, I. Kondor, and A.-L. Barabasi. Modeling bursts and heavy tails in human dynamics. Physical Review E, 73:036127, 2006.
[31] F. Wu and B. A. Huberman. Novelty and collective attention. Proc. of the Natl. Acad. Sci., 104:17599–17601, 2007.