Evaluating the effect of topic consideration in identifying communities of rating-based social networks
Finding meaningful communities in social networks has attracted the attention of many researchers. The community structure of complex networks reveals both their organization and hidden relations among their constituents. Most of the researches in th…
Authors: Ali Reihanian, Behrouz Minaei-Bidgoli, Muhammad Yousefnezhad
Ali Reihanian
Department of Information Technology
Mazandaran University of Science and Technology
Babol, Iran
areihanian@ustmb.ac.ir

Behrouz Minaei-Bidgoli
Department of Computer Science
Iran University of Science and Technology
Tehran, Iran
b_minaei@iust.ac.ir

Muhammad Yousefnezhad
Department of Computer Science
Nanjing University of Aeronautics and Astronautics
Nanjing, China
myousefnezhad@nuaa.edu.cn

Abstract: Finding meaningful communities in social networks has attracted the attention of many researchers. The community structure of complex networks reveals both their organization and hidden relations among their constituents. Most of the research in the field of community detection focuses mainly on the topological structure of the network without performing any content analysis. Nowadays, real-world social networks contain a vast range of information, including shared objects, comments, following information, etc. In recent years, a number of studies have proposed approaches which consider both the contents exchanged in the networks and the topological structures of the networks in order to find more meaningful communities. In this research, the effect of topic analysis on finding more meaningful communities in social networking sites in which users express their feelings toward different objects (like movies) by means of rating is demonstrated through extensive experiments.

Keywords: Content Analysis; Topical community; Community detection; Modularity; Purity

I. INTRODUCTION

With the emergence of social networks, people have been attracted to them and have been sharing valuable information by communicating with each other. For example, folksonomies are social tagging sites whose users collaboratively express their feelings and sentiments toward a particular resource, such as a movie or a piece of music, by means of descriptive keywords (tags) [1] or ratings. One of the most important issues in analyzing these kinds of networks is community detection. A community (also sometimes referred to as a module or cluster [2]) is a dense sub-network within a larger network, such as a close-knit group of friends in a social network [3]. The community structure of complex networks reveals both their organization and hidden relations among their constituents [4].

A large number of methods have been proposed to extract appropriate communities from networks consisting of a set of nodes (individuals) and weighted edges (connections). These methods consider only the graph structure of the network when finding communities, and no content analysis is used in the process. In contrast to this original definition of networks, today's real-world networks such as Facebook and Twitter contain a vast range of information, including shared objects, comments, etc. It is unreasonable for a community to be explained by a single entity, because community members generally interact with each other in a large number of distinguishable ways in various domains. One possible solution is to find topical clusters in which the nodes have the same topic of interest.
Each topical cluster represents one of the topics of interest in the network. Then, a community detection algorithm can be applied to these topical clusters to find the ultimate communities [5]. In this way, we can analyze and estimate the effect of topic consideration in community detection.

In this paper, the effect of topic analysis on finding more meaningful communities in social networking sites in which users express their feelings toward different objects (like movies) by means of rating is demonstrated through extensive experiments. To this end, a network is partitioned into different topical clusters in which the nodes have the same topic of interest. Then, a community detection algorithm is applied to the topical clusters in order to find more meaningful communities. This leads to communities in which the nodes are tightly connected and have the same topic of interest. This process is called topic-oriented community detection [5]. Finally, the results of community detection with topic consideration are compared with the results of community detection without considering the topics of interest. Quantitative evaluations reveal that the results of community detection improve when the topic of interest in the network is considered.

The remainder of the paper is organized as follows. In Section II, related works are reviewed. Section III explains topic-oriented community detection. In order to evaluate the effect of topic consideration in identifying the communities of rating-based social networks, extensive experiments are conducted on real-life data sets.
The descriptions of these data sets, the experimental results and their analyses are given in Section IV. Finally, the conclusions are given in Section V.

II. RELATED WORKS

Many studies have been carried out in the area of community detection. Most of them focus mainly on the topological structure or linkage patterns of networks. They merely consider the graph structure of the network for finding communities, while no content analysis is used in the process. According to the community detection strategies employed, the proposed methods can be classified into optimization-based methods and heuristic methods. Some of the optimization-based methods focus on optimizing an objective function [5]. One of the most important works in the literature is that of Newman and Girvan, who introduced modularity as an objective function [6]. A large amount of work has been done to optimize modularity, such as the methods developed in [7-9]. This function has been influential in the community detection literature and has been successful in many applications. Modularity is used to evaluate the quality of a particular division of a network into communities [5]. On the other hand, heuristic methods such as the GN algorithm [10] and the CPM algorithm [11] design a graph clustering algorithm based on intuitive assumptions [5]. Even though these studies have been successful in some applications, they focus mainly on the topological structure of the networks and therefore ignore the contents exchanged between members. As a result, the relationships between members in these studies are based mainly on the total number of communications.
In recent years, a number of studies have proposed approaches which consider both the contents exchanged in the networks and the topological structures of the networks in order to find more meaningful communities. Z. Zhao et al. proposed a topic-oriented community detection approach based on social objects' clustering and link analysis [5]. Their approach can identify topical communities which reflect the topics and the strengths of connections simultaneously. Zhu et al. combined classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community [12]. In other words, they combined topic modeling with link structure. A. Zhao and Ma proposed a framework that applies a semantically structured approach to Web service community modeling and discovery [13].

III. TOPIC-ORIENTED COMMUNITY DETECTION IN A SOCIAL NETWORK

As mentioned earlier, the goal of this paper is to demonstrate the effect of topic consideration on finding more meaningful communities in social networking sites in which users express their feelings toward different objects by means of rating. For this purpose, some components of the framework proposed in [5] are changed so that it is applicable to such social networks. This framework detects communities which have a unique topic of interest and connected members. Each community contains nodes of the network which have the same topic of interest.
This framework is implemented in four steps: preprocessing and annotating topic labels, clustering social objects, creating topical clusters, and applying a community detection algorithm to the topical clusters.

A. Preprocessing and annotating topic labels

In this step, the data sets are preprocessed and made ready to use. In this process, the social objects are recognized. Generally, people communicate with each other through social objects. These objects often imply the topics in which people are interested. Social objects can be classified into two kinds of situations [5]: 1) social objects which are attached to multiple members, and 2) social objects which are attached to one member.

In the first situation, the edges between members are built because of a social object. An example of this situation occurs in a movie rating network. In this network, an edge between two members is built when they rate the same movie. In other words, each movie (social object) is attached to multiple members, and the members of the movie rating network are connected to each other because they rated the same movie.

In the second situation, each social object is attached to only one member. Therefore, the social objects are considered to be the attributes of the members of the network. An example of this situation occurs in a paper citation network. In this network, papers (members) cite each other. Also, each paper contains text content (the title of the paper) which is a social object and can be considered as the attribute of the corresponding paper. Figure 1 shows the two different kinds of relations between the members of a network and social objects.
The network on the left side of Figure 1 is a movie rating network. As is clear, the edges between members are built because of the social objects. The network on the right side of Figure 1 is a paper citation network. In this network, each social object is the attribute of its corresponding paper. Since this paper analyzes social networking sites in which users express their feelings toward different objects, the first situation applies.

Fig. 1. Two different kinds of relations between the members of networks and social objects

So, in this step, the data sets are preprocessed and the social objects are recognized. Afterward, the topics of each social object in the data set are retrieved. Subsequently, each social object is labeled with its corresponding topic. In some cases, the topics of each social object can easily be retrieved manually, or there are corresponding tags which represent the topics of each social object. For cases in which a social object is represented by text and its labels cannot easily be retrieved, Z. Zhao et al. introduced a method which can annotate each social object with a topic label [5].

B. Clustering social objects

In this step, the social objects in a network are partitioned into different clusters. Each cluster represents a unique topic which is shared by its members. In other words, according to their labeled topics, social objects are partitioned into different clusters in such a way that each cluster includes members with the same topic. Different methods can be used to perform the social object clustering, depending on the type of social objects.
For example, a novel method has been proposed in [5] to cluster text social objects. This method combines the vector space model with Entropy Weighting K-Means (EWKM) [14]. Since the data sets used in this paper contain social objects with labeled topics, we manually partition these social objects into different clusters.

C. Creating topical clusters

Using the results generated in the previous step, we partition the members of the network into different topical clusters. In the first step, each social object was annotated with a topic label. In this step, members are partitioned into different topical clusters according to the topic labels of the social objects they are involved in. Thus, in this step we find clusters in which every member has the same topic of interest. Therefore, the total number of topical clusters is equal to the number of topics of interest in the network. A user can be a member of several topical clusters, since it is common for a user to be interested in several topics.

D. Applying a community detection algorithm to the topical clusters

This step aims to find communities in each of the topical clusters created in the previous step. Members in each topical cluster are connected to each other with different strengths. Based on the number of ratings on the same social objects, some members may have stronger connections, while others may have weak or no connections. This follows from the topic analysis performed in the framework.
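The clustering and weighting steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_topical_clusters`, the input shapes and the toy data are hypothetical, and the edge weight is taken to be the number of objects of a given topic that two users have both rated.

```python
from collections import defaultdict
from itertools import combinations


def build_topical_clusters(ratings, object_topic):
    """Group users by the topic of the objects they rated and weight user pairs.

    ratings: iterable of (user, social_object) pairs, one per rating.
    object_topic: dict mapping each social object to its topic label.
    Returns {topic: {(user_a, user_b): weight}} with user_a < user_b.
    """
    # Clustering social objects: collect the raters of each object,
    # grouped under that object's topic label.
    raters = defaultdict(lambda: defaultdict(set))
    for user, obj in ratings:
        topic = object_topic.get(obj)
        if topic is not None:  # objects without a topic label are skipped
            raters[topic][obj].add(user)

    # Creating topical clusters: inside one topic, two users gain one unit
    # of edge weight for every object of that topic that both of them rated
    # (assumed reading of "ratings on the same social objects").
    clusters = {}
    for topic, by_object in raters.items():
        weights = defaultdict(int)
        for users in by_object.values():
            for a, b in combinations(sorted(users), 2):
                weights[(a, b)] += 1
        clusters[topic] = dict(weights)
    return clusters


# Hypothetical toy data: three users, two Documentary movies, one Western.
ratings = [
    ("u1", "doc_a"), ("u2", "doc_a"), ("u1", "doc_b"), ("u2", "doc_b"),
    ("u3", "doc_b"), ("u1", "west_a"), ("u3", "west_a"),
]
object_topic = {"doc_a": "Documentary", "doc_b": "Documentary", "west_a": "Western"}
clusters = build_topical_clusters(ratings, object_topic)
print(clusters["Documentary"])
print(clusters["Western"])
```

In this toy input, u1 appears in both topical clusters, which mirrors the remark that a user can belong to several of them. Users who share no rated object with anyone else get no edge in this sketch, so they would have to be tracked separately if isolated members matter.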
Since the purpose of the framework is to detect communities which have a unique topic of interest and connected members, we should apply a community detection algorithm to the previously created topical clusters in order to identify the tightly connected members. Many community detection algorithms can be employed for this purpose, such as GN. Newman proposed an important algorithm to partition network graphs of links and nodes into subgraphs. He also introduced a concept called modularity. In the case of weighted networks, modularity is defined as follows [9]:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) \tag{1}$$

where $A_{ij}$ represents the weight of the edge between $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to vertex $i$, $c_i$ is the community to which vertex $i$ is assigned, the function $\delta(u, v)$ is 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2} \sum_{ij} A_{ij}$.

Since Newman's algorithm was very time-consuming, Blondel et al. suggested a modified version of the algorithm in order to make it faster, giving rise to what is known as the "Louvain method" [15]. This algorithm is a modularity maximization algorithm which iteratively optimizes the modularity in a local way and aggregates nodes of the same community [16]. In this paper, the "Louvain method" has been applied in order to find topical communities.

IV. EXPERIMENT AND ANALYSIS

In this section, the results of our research are presented. First, three real-life data sets and the performance metric used in the experiments are described. Then, the process of detecting topical communities in these data sets is discussed and its results are analyzed.
Finally, the results of topic-oriented community detection (with content analysis) are compared with the results of community detection without any content analysis.

A. Real-life data sets

We used three publicly available data sets in our experiments: Movielens, Book-Crossing and CIAO. The Movielens data set [17] is a rating data set collected from the Movielens web site (http://movielens.org). It consists of 100000 ratings given by 943 users to 1682 movies. The Book-Crossing data set [18] is a rating data set collected from the Book-Crossing community (http://www.bookcrossing.com). It contains 278858 users providing 1149780 ratings about 271379 books. The CIAO data set [19-22] is a rating data set collected from a product review site (http://ciao.com) in which users share their opinions about a product by means of rating or commenting. There are 35773 ratings in this data set, which are attached to 16850 products by 2248 users.

As described earlier in this paper, the topic-oriented community detection framework considers the results of topic analysis for finding more meaningful communities. So, in order to evaluate this framework, two aspects should be considered: topic and linkage structure. This means that the expected results should keep the members of each community on the same topic and strongly connected. Zhao et al. introduced a performance evaluation metric which considers both topic and linkage structure [5]. This metric is defined as follows:

$$PurQ_{\beta} = \frac{(1 + \beta^2)\,(Purity \cdot Q)}{\beta^2 \cdot Purity + Q} \tag{2}$$

As is clear from the above equation, $PurQ_{\beta}$ has three parameters: $Q$, $Purity$ and $\beta$. $Q$ denotes the modularity.
This parameter measures the communities from the perspective of the link structure. The larger the Q, the better the communities are divided from the perspective of topological structure. In our experiment, for each topical cluster, modularity is calculated by equation 1. Since the topic-oriented framework may generate more than one topical cluster for each data set, the total value of modularity in this framework is calculated as follows:

$$Q = \sum_{i=1}^{n} \frac{Weight_{TC_i}}{Weight_T} \, Q_{TC_i} \tag{3}$$

where $n$ is the number of generated topical clusters, $Q_{TC_i}$ is the value of modularity for the topical cluster $TC_i$, $Weight_{TC_i}$ is the sum of the weights of the edges in the topical cluster $TC_i$, and $Weight_T$ is the sum of the weights of the edges in the topical cluster which is created directly from the basic network (when no topical clustering has been performed). It should be noted that, since no analysis of the communications' content is performed in this framework, the weight of each relationship between two members is the number of ratings given to the same social objects by these two members.

In equation 2, Purity represents the purity of topics in the detected communities and is calculated as follows [5]:

$$Purity = \frac{1}{N_{cm}} \sum_{i=1}^{N_{cm}} \max_{1 \le j \le k} \left\{ \frac{n_{ij}}{n_i} \right\} \tag{4}$$

where $N_{cm}$ represents the number of detected communities, $n_{ij}$ refers to the number of nodes belonging to topic $j$ and community $i$, $n_i$ refers to the number of nodes in community $i$, and $k$ is the number of topics in the network. The higher the Purity, the better the communities are partitioned from the perspective of topics.

β is a parameter which adjusts the relative weight of Purity and Q, with $\beta \in [0, \infty]$. If we consider the purity of topics and the topology of the network equally important, the value of β should be set to 1.
With equation 2, PurQβ approaches Q as β grows and approaches Purity as β approaches 0, which is also visible in the β columns of Table II. Therefore, if we want to pay more attention to Q in comparison with Purity, the value of β should be set to a number between 1 and ∞. On the other hand, if we want to pay more attention to Purity in comparison with Q, the value of β should be set to a number between 0 and 1. In short, β is used in equation 2 to adjust the emphasis between topics and link structure [5].

B. Experiments

In order to identify the communities by applying the topic-oriented community detection framework to the three introduced data sets, four steps (according to Section III) were taken. The first step was to preprocess the data sets. For the Movielens and Book-Crossing data sets, movies and books were considered as the social objects. For the Movielens data set, the genres of the movies were extracted. These extracted genres are the same as the genres attached to each movie by IMDB (http://www.imdb.com). Then, all the movies in the genres of Documentary or Western were retrieved. The genre of a movie represents the general topic of the movie. In this step, we obtained 77 movies. For the Book-Crossing data set, we extracted the categories of 93 books from Amazon (http://www.amazon.com). As for the CIAO data set, products were considered as the social objects. Each product's category was attached to it in the data set. Thus, for the Book-Crossing data set and the CIAO data set, the categories represent the topics of each book or product.

The second step was to cluster the social objects. For the Movielens data set, the movies were partitioned into two clusters, Documentary and Western. The Documentary cluster contained 50 movies, while the Western one contained 27 movies.
As for the Book-Crossing data set, the books were partitioned into two clusters, Fiction and Non-Fiction. The Fiction cluster contained 80 books, while the Non-Fiction cluster contained 13. The products in the CIAO data set were partitioned into six clusters: DVDs, Books, Beauty, Music, Travel, and Food and Drink. The DVDs cluster contains 2057 products, the Books cluster 2803, the Beauty cluster 2333, the Music cluster 1801, the Travel cluster 3922 and the Food and Drink cluster 3937.

The third step was to create topical clusters. In each data set, the users who rated the social objects in each cluster were partitioned into topical clusters. For example, all users who rated the movies in the "Documentary" cluster were assigned to the "Documentary" topical cluster. The members of each topical cluster rated social objects which have the same topic. Thus, according to the number of topics, we obtained two topical clusters each for the Movielens and Book-Crossing data sets and six topical clusters for the CIAO data set. As mentioned earlier, since no analysis of the communications' content is performed in this framework, the weight of each relationship between two members is the number of ratings given to the same social objects (for example, two movies in the Documentary genre) by these two members.

The last step was to detect topical communities. We applied the "Louvain method" to each topical cluster created in the previous step. In order to calculate the modularity accurately, we applied the Louvain method to each topical cluster ten times and calculated the average of the resulting modularity values.
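As a rough illustration of how the modularity figures can be computed once a partition is available, the sketch below evaluates equation 1 for a given community assignment and then combines per-cluster values with the weighted sum used for the total modularity. It does not reproduce the Louvain method itself: the partition is assumed to come from such an algorithm, and the function names, input shapes and toy graph are hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Equation 1 for an undirected weighted graph.

    edges: dict {(u, v): weight}, each undirected edge listed once, no self-loops.
    community: dict {node: community label}.
    """
    m = sum(edges.values())  # m = (1/2) * sum_ij A_ij
    if m == 0:
        return 0.0
    strength = defaultdict(float)  # k_i: total weight attached to node i
    internal = defaultdict(float)  # weight of edges inside each community
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    total = defaultdict(float)     # sum of k_i over each community
    for node, k in strength.items():
        total[community[node]] += k
    # Community-wise rewriting of equation 1: sum_c [in_c / m - (tot_c / 2m)^2]
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


def total_modularity(cluster_q_and_weight, basic_weight):
    """Weighted sum of per-cluster Q values; each cluster contributes
    (its edge weight / edge weight of the basic network) * its Q."""
    return sum(w / basic_weight * q for q, w in cluster_q_and_weight)


# Toy graph: two unit-weight triangles joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_example = weighted_modularity(edges, community)
print(round(q_example, 4))  # 0.3571, i.e. 5/14

# Hypothetical per-cluster (Q, edge weight) pairs and basic-network weight.
print(total_modularity([(0.4, 30.0), (0.2, 10.0)], 50.0))
```

Averaging over repeated runs, as described above, would then amount to calling `weighted_modularity` once per Louvain run on the same cluster and taking the mean of the returned values.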
Table I gives the results achieved by applying the topic-oriented community detection framework to the Movielens, Book-Crossing and CIAO data sets. In this table, the columns "Topical Clusters", "No. of Edges" and "No. of Nodes" give the topical clusters created by applying the topic-oriented framework to the three data sets, and the number of edges and the number of nodes in each of these topical clusters, respectively. Moreover, the columns "Total Modularity" and "Purity" denote the overall modularity value (Q) and the Purity value for all of the topical communities of a data set.

TABLE I. THE RESULTS ACHIEVED BY APPLYING THE TOPIC-ORIENTED COMMUNITY DETECTION FRAMEWORK TO MOVIELENS, BOOK-CROSSING AND CIAO DATA SETS

| Data sets | Topical Clusters | No. of Edges | No. of Nodes | Total Modularity | Purity |
|---|---|---|---|---|---|
| Movielens | Documentary | 15833 | 352 | 0.1244 | 1 |
| | Western | 69369 | 491 | | |
| Book-Crossing | Fiction | 8531 | 1021 | 0.8469 | 1 |
| | Non-Fiction | 1587 | 191 | | |
| CIAO | DVDs | 53916 | 1356 | 0.3086 | 1 |
| | Books | 8999 | 904 | | |
| | Beauty | 5267 | 811 | | |
| | Music | 2076 | 569 | | |
| | Travel | 12905 | 867 | | |
| | Food & Drink | 29763 | 1193 | | |

As is clear from Table I, Purity has its maximum value in each of the three data sets. The reason is that the topical clusters created in each data set contain members which are interested in the same unique topic. Therefore, the purity of topics in each of the topical communities is 1 according to equation 4. It should be noted that a given user can be in several topical clusters, since it is common for people to be interested in several different topics. Thus, some of the members of the topical clusters in each data set may be the same. For example, consider the case in which a user rated several different movies.
Some of these movies were in the Documentary genre, and the others were in the Western genre. Therefore, this user belongs to both topical clusters in the Movielens data set.

C. Comparison

In order to demonstrate the superiority of the results of detecting communities with topic consideration, in this section we compare the results of topic-oriented community detection, which was implemented in Section IV-B, with the results of classical community detection, in which no content analysis is performed. In the classical community detection approach, a community detection algorithm is applied to a network (the basic network) in which the weight of an edge represents the number of communications between the relevant nodes. In this condition, no content analysis is done.

We first applied the "Louvain method" to the basic networks of the Movielens, Book-Crossing and CIAO data sets (implementing the classical community detection framework). Then we partitioned the basic networks of the three data sets into topical clusters. Each topical cluster includes members which have the same topic. Afterwards, the Louvain method was applied to these topical clusters (implementing the topic-oriented community detection framework discussed in Section IV-B). We then used PurQβ to evaluate the performance of both frameworks. The corresponding results are given in Table II. As shown in Table II, β was set to 0.5, 0.75, 1, 1.5 and 2, which represent different strengths for the topic and the link. Purity, Q and PurQβ were calculated for each of the two frameworks.
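The evaluation metric can be checked with a short sketch. It assumes the F-measure-style form PurQβ = (1 + β²)·Purity·Q / (β²·Purity + Q), which reproduces the Classical Movielens row of Table II from its Purity (0.9777) and Q (0.1086) values. The function names and the toy communities are hypothetical, and the sketch assigns exactly one topic label to each node, which is a simplification.

```python
from collections import Counter


def purity(communities, node_topic):
    """Mean over communities of the share held by the dominant topic.

    communities: list of non-empty lists of nodes.
    node_topic: dict {node: topic label} (one label per node in this sketch).
    """
    shares = []
    for members in communities:
        counts = Counter(node_topic[n] for n in members)
        shares.append(max(counts.values()) / len(members))
    return sum(shares) / len(shares)


def pur_q(purity_value, q, beta):
    """Assumed form of PurQ_beta combining Purity and modularity Q."""
    b2 = beta ** 2
    return (1 + b2) * purity_value * q / (b2 * purity_value + q)


# Toy example: one mixed community and one pure community.
node_topic = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Y"}
toy_purity = purity([["a", "b", "c"], ["d", "e"]], node_topic)
print(round(toy_purity, 4))  # (2/3 + 1) / 2

# Classical framework on Movielens, using Purity and Q as reported in Table II.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(pur_q(0.9777, 0.1086, beta), 4))
```

Under this form the score moves toward Q as β grows and toward Purity as β shrinks, so the β = 0.5 value is the largest of the three whenever Purity exceeds Q.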
According to Table II, both Modularity and Purity have higher values in the topic-oriented framework, since the basic network is partitioned into topical clusters and each identified community includes members who have the same topic of interest. Therefore, the topic-oriented community detection framework has a higher value of PurQβ for all five values of β.

TABLE II. COMPARISON OF MODULARITIES ACHIEVED BY APPLYING THE TOPIC-ORIENTED FRAMEWORK AND THE CLASSICAL COMMUNITY DETECTION FRAMEWORK TO EACH OF THE THREE MENTIONED DATA SETS

| Data set | Framework | Total Modularity | Total Purity | PurQβ (β=0.5) | PurQβ (β=0.75) | PurQβ (β=1) | PurQβ (β=1.5) | PurQβ (β=2) |
|---|---|---|---|---|---|---|---|---|
| Movielens | Classical | 0.1086 | 0.9777 | 0.3760 | 0.2519 | 0.1955 | 0.1495 | 0.1321 |
| Movielens | Topic-oriented | 0.1244 | 1 | 0.4154 | 0.2830 | 0.2213 | 0.1703 | 0.1509 |
| Book-Crossing | Classical | 0.8375 | 0.9050 | 0.8906 | 0.8795 | 0.8699 | 0.8572 | 0.8502 |
| Book-Crossing | Topic-oriented | 0.8469 | 1 | 0.9651 | 0.9389 | 0.9171 | 0.8888 | 0.8737 |
| CIAO | Classical | 0.2899 | 0.8279 | 0.6038 | 0.4963 | 0.4294 | 0.3624 | 0.3332 |
| CIAO | Topic-oriented | 0.3086 | 1 | 0.6906 | 0.5535 | 0.4716 | 0.3920 | 0.3581 |

V. CONCLUSION

This paper evaluates the effect of topic consideration on finding more meaningful communities in social networking sites in which users express their feelings toward different objects (like movies) by means of rating. To this end, the network is partitioned into different topical clusters in which the nodes have the same topic of interest. Then, a community detection algorithm is applied to the topical clusters in order to detect communities. After that, a comparison was performed between the results of topic-oriented community detection and the results of classical community detection, in which no content analysis is performed.
The experimental results indicate that the results of community detection improve when it is combined with topic analysis. There is plenty of room to study the community detection problem in real complex networks, which contain huge amounts of information of different natures. Therefore, in future work we plan to study the effect of other kinds of content in the network, such as content analysis of communications, on finding more meaningful communities in social networking sites in which the users express their feelings toward different objects with ratings.

REFERENCES

[1] Chakraborty, A., Ghosh, S., & Ganguly, N., 2012. Detecting overlapping communities in folksonomies, Proceedings of the 23rd ACM conference on Hypertext and social media. Publishing, pp. 213-218.
[2] Leskovec, J., Lang, K.J., & Mahoney, M., 2010. Empirical comparison of algorithms for network community detection, Proceedings of the 19th international conference on World wide web. Publishing, pp. 631-640.
[3] Newman, M. 2011. Communities, modules and large-scale structure in networks. Nature Physics, 8, 25-31.
[4] Lancichinetti, A., & Fortunato, S. 2012. Consensus clustering in complex networks. Scientific reports, 2.
[5] Zhao, Z., Feng, S., Wang, Q., Huang, J.Z., Williams, G.J., & Fan, J. 2012. Topic oriented community detection through social objects and link analysis in social networks. Knowledge-Based Systems, 26, 164-173.
[6] Newman, M.E., & Girvan, M. 2004. Finding and evaluating community structure in networks. Physical review E, 69, 026113.
[7] Arenas, A., Duch, J., Fernández, A., & Gómez, S. 2007. Size reduction of complex networks preserving modularity. New Journal of Physics, 9, 176.
[8] Leicht, E.A., & Newman, M.E. 2008. Community structure in directed networks. Physical review letters, 100, 118703.
[9] Newman, M.E. 2004. Analysis of weighted networks. Physical Review E, 70, 056131.
[10] Girvan, M., & Newman, M.E. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99, 7821-7826.
[11] Palla, G., Derényi, I., Farkas, I., & Vicsek, T. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435, 814-818.
[12] Zhu, Y., Yan, X., Getoor, L., & Moore, C. 2013. Scalable text and link analysis with mixed-topic link models. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Publishing, pp. 473–481.
[13] Zhao, A., & Ma, Y., 2012. A Semantically Structured Approach to Service Community Discovery, Semantics, Knowledge and Grids (SKG), 2012 Eighth International Conference on. Publishing, pp. 136-142.
[14] Jing, L., Ng, M., & Huang, J. 2007. An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data. IEEE Transactions on Knowledge and Data Engineering, 19, 1026–1041.
[15] Blondel, V.D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008, P10008.
[16] Wang, D., Kwon, K., Sohn, J., Joo, B.-G., & Chung, I.-J., 2014. Community Topical “Fingerprint” Analysis Based on Social Semantic Networks, Advanced Technologies, Embedded and Multimedia for Human-centric Computing. Publishing, pp. 83-91.
[17] Herlocker, J.L., Konstan, J.A., Borchers, A., & Riedl, J., 1999. An algorithmic framework for performing collaborative filtering, Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. Publishing, pp. 230-237.
[18] Ziegler, C.-N., McNee, S.M., Konstan, J.A., & Lausen, G., 2005. Improving recommendation lists through topic diversification, Proceedings of the 14th international conference on World Wide Web. Publishing, pp. 22-32.
[19] Tang, J., Gao, H., Hu, X., & Liu, H., 2013. Exploiting homophily effect for trust prediction, Proceedings of the sixth ACM international conference on Web search and data mining. Publishing, pp. 53-62.
[20] Tang, J., Gao, H., & Liu, H., 2012. mTrust: discerning multi-faceted trust in a connected world, Proceedings of the fifth ACM international conference on Web search and data mining. Publishing, pp. 93-102.
[21] Tang, J., Gao, H., Liu, H., & Das Sarma, A., 2012. eTrust: Understanding trust evolution in an online world, Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. Publishing, pp. 253-261.
[22] Tang, J., Hu, X., Gao, H., & Liu, H., 2013. Exploiting local and global social context for recommendation, Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. Publishing, pp. 2712-2718.